Machine learning is a subfield of computer science with the goal of exploring, studying, and developing learning systems, methods, and algorithms that can improve their performance with learning from data. The course is designed to give undergraduate students a onesemesterlong introduction to the main principles, algorithms, and applications of machine learning.
After completing the course, students will be able to:
Having successfully passed the following courses is necessary: (15122) and (21127 or 21128 or 15151) and (21325 or 36217 or 36218 or 36225 or 15359).
In general, familiarity with python programming, and a solid background in general CS, calculus, and probability theory are needed to deal successfully with course challenges. Some basic concepts in CS, calculus, and probability will be briefly revised (but not reexplained from scratch).
Talk to the teacher if you are unsure whether your background is suitable or not for the course.
Course grading will assigned based on the following weighting: 40% Homework, 15% Final exam, 10% Midterm exam, 20% Project, 15% Multiplechoice Quizzes. There will be about four homework assignments. The final exam will include questions about all the topics considered in the course, with an emphasis on the topics introduced after the midterm exam. Quizzes will consist in multiplechoice questions aimed to keep up with the topics of the course inbetween homeworks. The project will be in the form of a leadeboard game that will have to be completed in two stages, Tasks T1 and T2 that will be presented in the class by each student.
Grades will be curved only for classes featuring more than 20 participants.
In addition to the lecture handouts (that will be made available after each lecture), during the course additional material will be provided by the instructor to cover specific parts of the course.
A number of (optional) textbooks can be consulted to ease the understanding of the different topics (the relevant chapters will be pointed out by the teacher):
Dates  Topics  Slides  Useful References 
HW 


8/26  Introduction: Basic concepts; taxonomy of learning problems; workflow of ML approaches; interpretation views of ML problems; course road map; logistics; practical recommendations. 


8/28  Generalization, ML design, Canonical SL problem, Probability review: Example of generalization issues; overfitting; notion of regularization; design of an SL system; hypothesis and inductive bias; loss functions; introduction to learning theory; empirical vs. generalization errors; canonical SL problem: inductive bias, loss function and empirical risk minimization, parameter optimization; review summary of discrete probability theory.  QNA1 out  
8/29  Bayes Optimal Classifier, Decision Boundaries: probabilistic view, elements of decision theory, first review of probability concepts, Bayes rule, decision boundaries and classification errors, generative and discriminative modeling, introduction to probability distribution estimation 




9/4  Prediction and Classification with Estimated Probabilities: overview of conjugate priors for continous and discrete distributions; making predictions using different criteria; classification using estimated probabilities and MLE/MAP/Bayes; Gaussians and decision boundaries; complexity challenges and dependencies; introduction to Naive Bayes. 

QNA1 due, HW1 out  
9/5  Naive Bayes Classifier: review of Bayes classification using estimated probabilities and MLE/MAP/Bayes; review of multivariate Gaussians, notions related to the covariance matrices; quadratic and linear decision boundaries when using Gaussians; complexity challenges and feature dependencies; Naive Bayes models; discrete and continous features, MLE and MAP approaches; simplification rules for MAP estimation using smoothing parameters; case study for discrete features: text classification; case study for continous features: image classification. 




9/11  Logistic Regression, Gradientbased methods: Solution of the optimization problems for Logistic Regression; generalities on solving optimization problems; concave and convex functions; local and global optima; gradients; gradient descent / ascent; exact gradient stepsize and zigzig behavior; effect of step size / learning rate; well and illconditioned functions; gradient ascent for LRMLE; gradient ascent for LRMAP.  pdf Notebook 


9/12  Logistic regression, comparison among classifiers: More on gradient ascent for logistic regression, practical scenarios; MCAP case for gradient ascent with Gaussian priors; logistic regression with more than two classes, softmax function; decision boundaries for different classifiers; linear vs. nonlinear boundaries; number of parameters in LR vs. Naive Bayes; asymptotic results for LR vs. NB; overall comparison between LR and NB.  


9/18  MultiLayer Perceptrons 1: NN as composite functions; functional form of a NN; visualization of the output surface; loss minimization problem in the weight space; nonconvex optimization landscape for the error surface; stochastic and batch gradient descent; backpropagation and chain rule; backpropagation for a logistic unit; backpropagation for a network of logistic units.  pdf Notebook: MLP viz 

HW1 due, QNA2 out  
9/19  MultiLayer Perceptrons 2: more on backpropagation; forward and backward passes in the general case; properties and issues of backpropagation; design choices (momentum, learning rate, epochs); weight inizialization; overfitting and generalization issues; approaches for to regularization; overview of model selection and validation approaches in ML.  pdf Notebook: MLP for MNIST 



9/25  Support Vector Machines (SVM) I: Linear models for classification (again); score, functional margin, geometric margin; maxmargin classifiers; linearly separable case and hardmargin SVM optimization problem; support vectors and relationship with the weight vector; nonlinearly separable case and use of slack variables for elastic problem formulation; margin and non margin support vectors; penalty / tradeoff parameter; SVMs and hinge loss. 


9/26  Support Vector Machines (SVM) II: Review of softmargin SVM; SVM and hinge loss; solution of the SVM optimization problem; relaxations and Lagrangian formulation; Lagrangian function and dual problem; solution of the dual; Lagrange multipliers; functional relations between multipliers and SVM parameters; solving the nonlinearly separable case.  QNA2 due, HW2 out  


10/2  Kernel Methods II: Kernels and similarity measures; Hilbert spaces and inner products; kernelization and modularity; examples of kernel functions; number of dimensions in feature space; RBF kernel and infinite dimensionality; kernel trick in SVMs; training vs. classification time; overfitting issues; examples of kernelize SVMs at work. 


10/3  Kernel Methods 3: Computational and complexity issues using kernels with SVMs; SVMs vs. Logistic regression  


10/9  Ensemble methods, Boosting: Overview of ensemble methods; bagging; boosting; turn a wek classifier into a strong classifier; sequential weighting of training points; AdaBoost; combining weak classifers outputs using voting weights; decision stumps as weak classifiers; analysis and propertied of AdaBoost; robustness to overfitting. 

HW2 due  
10/10  Recitation on Decision Trees (it was on Wednesday)  
10/14  Fall break  
10/16  Fall break  QNA3 out 😕  
10/17  Fall break  
10/21  Midterm Exam  
10/23  Regression Models, Linear Regression 1: Regression problems; empirical predictor; hypothesis class and loss functions; linear regression with squared losses: Ordinary Least Squares (OLS); problem formulation and solution approaches; controlling model complexity (and avoiding singularities) using regularization; effects of different regularization approaches; Ridge regression; Lasso regression; comparison among Lpnorm regularizations. 

QNA3 due, HW3 out  
10/24  Linear Regression 2: Review of linear regression solutions; issues related to solving OLS: matrix inversion, computations, numerical instabilities; normal equations vs. SGD vs. algebraic methods; Ridge regression as constrained optimization, shrinking of weights; Lasso regression as constrained optimization, shrinking and zeroing of weights.  pdf (updated) 


10/24  kNN classifier, Model selection: kNN as a nonparametric distancebased method; kNN classifier vs. Optimal classifier; 1NN decision boundary and Voronoi tassellation of feature space; small vs. large K; nonparametric vs. parametric methods; overfitting issues; training error vs. testing error; training error vs. validation error; holdout method; crossvalidation methods: kfold CV, leaveoneout CV, random subsampling; design issues in CV; model selection; model selection using CV. 




10/30  Nonlinear Regression Models, Kernel Regression 1: Approaches to nonlinear regression; Taylor series, linear approximation, basis functions; solution with generic feature functions; polynomial regression; properties of other basis functions; examples, model complexity and selection of hyperparameters; bias vs. variance; model selection and evalution; use of kernels, example in ridge regression. 


10/31  Nonlinear Regression Models, Kernel Regression 2: Dual form of ridge regression; inner products; complexity of primal and dual forms; derivation and solution of the dual problem; dual variables as Lagrange multipliers; learned weights as weighted average of training examples; dual solution and feature maps; kernelization of the dual; kernels for learning and predictions; complexity of solutions using explicit and kernelized feature maps; concepts about Support Vector Regression (optional). 




11/6  Nonlinear Regression Models, Kernel Regression 3: Smmothing kernels as a result of closedform LS solutions; examples with Gaussian and other basis functions; localization and weighted averages of observations; biasvariance tradeoff; kernel regression as nonparametric regression; NadarayaWatson estimator; examples of kernels; role and impact of bandwidth; kNN as another nonparametric estimator; kernel regression and least squares formulations. 

HW3 due, HW4 out  
11/7  Unsupervised Learning, Principal Component Analysis (PCA): Large feature spaces; curse of dimensionality; dimensionality reduction by feature selection and and latent features; simple general model for dimensionality reduction / compression; subspace spanned by a vector basis; key ideas behind principal component analysis (PCA); representation of data; variance captured by projections; definition of mininum variance directions; PCA algorithm; examples; limitations of PCA, Kernel PCA. 




11/13  Latent variable models, Mixture Models, ExpectationMaximization (EM): Latent variables and associated relevant problems; MLE with complete data; from complete data to latent data; Gaussian Mixture Models (GMMs); MLE for latent data in GMMs; formulation and alternating optimization; proof for the convergence of alternating optimization; general form of EM. 


11/14  Gaussian Mixture Models, Clustering, and EM: Derivation of general form of EM algorithm; EM for GMMs; general properties and limitations of EM; study of special cases for GMMs, relations to clustering (Kmeans).  pdf Slides on GMMs (Barnabas Poczos, 10401) 




11/20  Class canceled  HW4 due, QNA4 out  
11/21  Review of Clustering and EM  


11/27  Learning Theory II: PAC bounds on continuous hypothesis spaces; set shattering; VC dimension; VC dimension for linear models, decision stumps, axisaligned rectangles, circles, ellipsis; generalization error bound and VC dimension; tightness of the bound; biasvariance and VCdimension; limitations of the VC dimensions.  QNA4 due  
11/28  Course wrapup, Q& A:  


12/4  Final Exam 
Topic  Files  Due Dates 

Homework 1: Classification, Naive Bayes and Logistic Regression    Sep 18 
Homework 2: Neural Networks and Support Vector Machines    Oct 9 
Homework 3: Regression    Nov 6 
Homework 4: Unsupervised Learning    Nov 20 
Homework is due on autolab by the posted deadline. Assignments submitted past the deadline will incur the use of late days.
You have 6 late days in total, but cannot use more than 2 late days per homework. No credit will be given for homework submitted more than 2 days after the due date. After your 6 late days have been used you will receive 20% off for each additional day late.
You can discuss the exercises with your classmates, but you should write up your own solutions. If you find a solution in any source other than the material provided on the course website or the textbook, you must mention the source. You can work on the programming questions in pairs, but theoretical questions are always submitted individually. Make sure that you include a README file with your andrew id and your collaborator's andrew id.
In general, for both assignments and exams, CMU's directives for academic integrity apply and must be duly followed.
The class includes both a midterm and a final exam. Both the exams will include
theory and pseudoprogramming questions.
During exams students are only allowed to consult 1page cheatsheet
(written in any desired format). No other material is allowed, including
textbooks, computers/smartphones, or copies of lecture handouts.
The midterm exam is set for October 21.
The final exam is set for December 4.
Name  Hours  Location  

Gianni Di Caro  gdicaro@cmu.edu  Thursdays 4:305:30pm + pass by my office at any time ...  M 1007 
Aliaa Essameldin  aeahmed@andrew.cmu.edu  TBD  M 1004 