Machine learning is a subfield of computer science concerned with exploring, studying, and developing learning systems, methods, and algorithms that can improve their performance by learning from data. The course is designed to give undergraduate students a one-semester-long introduction to the main principles, algorithms, and applications of machine learning.
After completing the course, students will be able to:
Successful completion of the following courses is required: (15-122) and (21-127 or 21-128 or 15-151) and (21-325 or 36-217 or 36-218 or 36-225 or 15-359).
In general, familiarity with Python programming and a solid background in general CS, calculus, and probability theory are needed to deal successfully with the course challenges. Some basic concepts in CS, calculus, and probability will be briefly revised (but not re-explained from scratch).
Talk to the teacher if you are unsure whether your background is suitable for the course.
Course grades will be assigned based on the following weighting:
There will be about five homework assignments and six QNAs. The final exam will include questions about all the topics considered in the course, with an emphasis on the topics introduced after the midterm exam. QNAs will consist of multiple-choice questions designed to keep students up to date with the topics of the course in between homeworks.
Note that grades will NOT be curved. The mapping between scores and letter grades will roughly follow the scheme below. However, final course scores will be converted to letter grades based on grade boundaries that will be precisely determined at the end of the semester, accounting for a number of aspects such as participation in lecture and recitation, exam performance, and overall grade trends. Note that precise grade cutoffs will not be discussed at any point during or after the semester.
In addition to the lecture handouts (which will be made available after each lecture), the instructor will provide additional material during the course to cover specific topics.
A number of (optional) textbooks can be consulted to ease the understanding of the different topics (the relevant chapters will be pointed out by the teacher):
Dates  Topics  Slides  References 
Assignments 


8/20  Introduction, General overview of ML  Course aims and logistics; introduction to ML; core concepts; taxonomy of main ML tasks and paradigms; introduction to supervised, unsupervised, and reinforcement learning. 


8/21  SL workflow, Empirical risk, Generalization, Canonical SL problem  SL workflow and design choices; classification example; hypothesis class; loss function; generalization; empirical risk minimization; canonical SL problem. 


8/22  Loss functions, Overfitting, Model selection and Generalization  Examples of loss functions for classification and regression; regression example, overfitting and model selection; generalization concepts.  
8/23  A first model: K-Nearest Neighbors (kNN)  Instance-based and nonparametric methods; basic concepts of kNN; kNN classifier vs. optimal classifier; decision regions and decision boundaries; 1-NN decision boundary and Voronoi tessellation of the feature space; small vs. large k; kNN regression; nonparametric vs. parametric methods.
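As a toy illustration of the kNN ideas in this lecture, the sketch below classifies a point by majority vote among its k nearest neighbors (a minimal 1-D example; the data and function name are invented for illustration, not course material):

```python
from collections import Counter

def knn_predict(train, x, k=3):
    """Classify x by majority vote among the k nearest training points.
    train: list of (feature, label) pairs with scalar features."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [(1.0, 'a'), (1.5, 'a'), (2.0, 'a'), (8.0, 'b'), (9.0, 'b')]
print(knn_predict(train, 1.2, k=3))  # 'a'
print(knn_predict(train, 8.5, k=3))  # 'b'
```

With a larger k the vote pools over more distant points, smoothing the decision boundary (the small-vs-large-k tradeoff mentioned above).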


8/24: QNA1 out  


8/28  Decision Trees 2  Entropy and information gain; purity of a labeled set; maximal gain for choosing the next attribute; overfitting issues and countermeasures; discrete vs. continuous features; continuous features, ranges, binary splits; decision tree regression, measures of purity.  
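The entropy and information-gain computations used for attribute selection can be sketched in a few lines (a minimal illustration with invented labels, not course material):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a labeled set, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy of the parent set minus the weighted entropy of the split groups."""
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)

labels = ['+', '+', '-', '-']
print(entropy(labels))                                     # 1.0 (maximally impure)
print(information_gain(labels, [['+', '+'], ['-', '-']]))  # 1.0 (a pure split)
```

The attribute whose split yields the largest gain is chosen next when growing the tree.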
8/29  Model selection and Cross-Validation  Overfitting, estimation of generalization error, and model selection; hold-out method; cross-validation methods: k-fold CV, leave-one-out CV, random subsampling; design issues in CV; general scheme for model optimization and selection using validation sets.  
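As an illustration of k-fold CV, this sketch partitions the data indices into folds so that each fold serves once as the validation set and the rest as training (the function name and toy sizes are invented for illustration):

```python
def kfold_indices(n, k):
    """Partition indices 0..n-1 into k contiguous folds; return a list of
    (train_indices, validation_indices) pairs, one per fold."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    return [(sum(folds[:i] + folds[i + 1:], []), folds[i]) for i in range(k)]

for train_idx, val_idx in kfold_indices(6, 3):
    print(val_idx)   # [0, 1] then [2, 3] then [4, 5]
```

A model's CV score is the average validation error over the k folds; with k = n this becomes leave-one-out CV.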
8/30  Recitation  Decision trees, kNN, cross-validation.  8/31: QNA1 due; HW1 out  


9/4  Linear regression 2, Regularization  Issues related to solving OLS: matrix inversion, computations, numerical instabilities; normal equations vs. SGD vs. algebraic methods; controlling model complexity (and avoiding singularities) using regularization; effects of different regularization approaches; Ridge regression as constrained optimization, shrinking of weights; Lasso regression as constrained optimization, shrinking and zeroing of weights; comparison among Lp-norm regularizations; comparative analysis over a number of test scenarios with linear and polynomial features; Elastic Net regression; a real-world regression scenario from data wrangling to model selection. 
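To illustrate the shrinkage effect of Ridge regularization, here is the closed-form solution for the simplest case of a single feature with no intercept (a toy sketch under these simplifying assumptions, not a general implementation):

```python
def ridge_1d(xs, ys, lam):
    """Closed-form ridge solution for one feature, no intercept:
    w = (x.y) / (x.x + lambda). lambda = 0 recovers ordinary least squares;
    larger lambda shrinks w toward zero."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # y = 2x exactly
print(ridge_1d(xs, ys, 0.0))    # 2.0 (the OLS fit)
print(ridge_1d(xs, ys, 14.0))   # 1.0 (weight shrunk by regularization)
```

Lasso replaces the squared penalty with an L1 penalty, which can drive weights exactly to zero rather than merely shrinking them.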


9/5  Gradient methods for optimization  Concave and convex functions, local and global optima; partial derivatives and their calculation; gradient vectors; geometric properties; general framework for iterative optimization; gradient descent / ascent; design choices: step size, convergence check, starting point; zigzag behavior of gradients and function conditioning properties; sum functions and stochastic approximations of gradients; batch, incremental, and mini-batch GD; properties of stochastic GD. 
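The iterative scheme above can be sketched generically; minimizing a simple quadratic shows gradient descent converging to the optimum (step size, iteration count, and test function are invented for illustration):

```python
def gradient_descent(grad, x0, step=0.1, iters=100):
    """Generic gradient descent: repeatedly move against the gradient."""
    x = x0
    for _ in range(iters):
        x = x - step * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))  # 3.0
```

Too large a step size makes the iterates overshoot or diverge; too small a step makes convergence slow, which is the step-size design choice discussed in lecture.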


9/6  Recitation  Linear regression, Regularization, Gradient methods.  Notebook for gradient descent  Sep 7: HW1 due, QNA2 out  


9/11  Break, no classes  
9/12  Nonparametric / Kernel Regression  Closed-form solutions for prediction from linear models, weighted linear combination of labels; smoothing kernels; examples with Gaussian and other basis functions; localization and weighted averages of observations; bias-variance tradeoff; kernel regression as nonparametric regression; Nadaraya-Watson estimator; examples of kernels; role and impact of bandwidth; kNN as another nonparametric estimator; kernel regression and least squares formulations. 
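The Nadaraya-Watson estimator covered in this lecture amounts to a kernel-weighted average of the observed labels; a minimal sketch with a Gaussian kernel and invented toy data:

```python
from math import exp

def nw_predict(xs, ys, x, bandwidth=1.0):
    """Nadaraya-Watson estimator: the prediction at x is a kernel-weighted
    average of observed labels, with weights decaying with distance."""
    weights = [exp(-((x - xi) / bandwidth) ** 2 / 2) for xi in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0, 2.0]
print(nw_predict(xs, ys, 1.0))   # 1.0 by symmetry of the weights
```

Shrinking the bandwidth makes the prediction more local (lower bias, higher variance), which is the bias-variance role of the bandwidth noted above.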


9/13  Recitation  Nonparametric methods, more on gradient methods  QNA2 due, HW2 out  


9/18  Estimating Probabilities 1, MLE/MAP  Probability estimation and Bayes classifiers; importance of and challenges in estimating probabilities from data; frequentist vs. Bayesian approaches to modeling probabilities; definition and properties of MLE, MAP, and Full Bayes approaches for parameter estimation; priors and conjugate probability distributions; examples with Bernoulli data and Beta priors. 


9/19  Estimating Probabilities 2, MLE/MAP  Review of concepts, overview of conjugate priors for continuous and discrete distributions; practical examples: Bernoulli-Beta, Binomial-Beta, Multinomial-Dirichlet, Gaussian-Gaussian; MLE vs. MAP vs. Full Bayes, pros and cons. 
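For the Bernoulli-Beta example, the MLE and MAP estimates have simple closed forms; the MAP estimate treats the Beta prior's parameters as pseudo-counts added to the data (a toy sketch, not course material):

```python
def bernoulli_mle(heads, tails):
    """MLE of the Bernoulli parameter: the empirical frequency of heads."""
    return heads / (heads + tails)

def bernoulli_map(heads, tails, alpha, beta):
    """MAP estimate under a Beta(alpha, beta) prior: pseudo-counts
    (alpha - 1, beta - 1) are added to the observed counts."""
    return (heads + alpha - 1) / (heads + tails + alpha + beta - 2)

print(bernoulli_mle(3, 1))                 # 0.75
print(round(bernoulli_map(3, 1, 2, 2), 4))  # 0.6667: the prior pulls toward 0.5
```

As the number of observations grows, the influence of the prior's pseudo-counts vanishes and MAP approaches MLE.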


9/20  Recitation  Review of linear algebra; matrix-vector notation; quadratic forms; multivariate Gaussians; isocontours; notions related to covariance matrices; making predictions using estimated probabilities and MLE/MAP/Bayes.  Sep 21: HW2 due, QNA3 out  


9/25  Classification with Naive Bayes models  Naive Bayes models; discrete and continuous features, MLE and MAP approaches; simplification rules for MAP estimation using smoothing parameters; case study for discrete features: text classification, bag-of-words; case study for continuous features: image classification. 
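A minimal bag-of-words Naive Bayes classifier with add-one (Laplace) smoothing, along the lines of the text-classification case study (the tiny corpus and function names are invented for illustration):

```python
from math import log
from collections import Counter, defaultdict

def train_nb(docs):
    """Count words per class for multinomial Naive Bayes.
    docs: list of (list_of_words, label) pairs."""
    counts, class_counts, vocab = defaultdict(Counter), Counter(), set()
    for words, label in docs:
        class_counts[label] += 1
        counts[label].update(words)
        vocab.update(words)
    return counts, class_counts, vocab

def predict_nb(model, words):
    """Pick the class with the highest log-posterior, using add-one smoothing."""
    counts, class_counts, vocab = model
    n_docs, scores = sum(class_counts.values()), {}
    for label in class_counts:
        total = sum(counts[label].values())
        scores[label] = log(class_counts[label] / n_docs)
        for w in words:
            scores[label] += log((counts[label][w] + 1) / (total + len(vocab)))
    return max(scores, key=scores.get)

docs = [(["cheap", "pills"], "spam"), (["cheap", "meds"], "spam"),
        (["meeting", "notes"], "ham")]
model = train_nb(docs)
print(predict_nb(model, ["cheap"]))    # spam
print(predict_nb(model, ["meeting"]))  # ham
```

The add-one smoothing keeps unseen words from zeroing out a class's posterior, the role of the smoothing parameters mentioned above.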


9/26  From Generative to Discriminative Classifiers, Logistic Regression (LR)  From generative to discriminative classifiers; logistic regression as a linear probabilistic classifier; decision boundaries; M(C)LE and M(C)AP models for probabilistic parameter estimation for LR; concave optimization problem; gradient ascent for LR-MLE; gradient ascent for LR-MAP; MCAP case for gradient ascent with Gaussian priors; logistic regression with more than two classes, softmax function; decision boundaries for different classifiers; linear vs. nonlinear boundaries; number of parameters in LR vs. Naive Bayes; asymptotic results for LR vs. NB; overall comparison between LR and NB.  
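Gradient ascent for LR-MLE in the simplest one-feature case can be sketched as follows (toy separable data and an invented learning rate; a sketch of the update rule, not a course implementation):

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(xs, ys, lr=0.5, epochs=200):
    """Gradient ascent on the log-likelihood of p(y=1|x) = sigmoid(w*x + b)."""
    w = b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = y - sigmoid(w * x + b)  # gradient of the log-likelihood w.r.t. the score
            w += lr * err * x
            b += lr * err
    return w, b

w, b = fit_logistic([-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1])
print(sigmoid(w * 2.0 + b) > 0.5)    # True: positive side classified as 1
print(sigmoid(w * -2.0 + b) < 0.5)   # True: negative side classified as 0
```

The concavity of the log-likelihood is what guarantees that this ascent has no bad local optima.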
9/27  Recitation  Logistic regression, Naive Bayes  QNA3 due, HW3 out  


10/3  Recitation  Ensemble methods  
10/4  Recitation  Review for midterm  QNA4 out  


10/9  Fall break, no classes  
10/10  Fall break, no classes  
10/11  Fall break, no classes  


10/15  Neural networks 2: MLPs  NN as composite functions; functional form of a NN; concepts about overfitting and complexity; visualization of the output surface; loss minimization problem in the weight space; nonconvex optimization landscape for the error surface; stochastic and batch gradient descent; backpropagation and chain rule; backpropagation for a logistic unit; backpropagation for a network of logistic units; forward and backward passes in the general case; properties and issues of backpropagation; design choices (momentum, learning rate, epochs); weight initialization.  
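Backpropagation for a single logistic unit is just the chain rule; the sketch below computes the analytic gradient of a squared loss and checks it against a finite difference (all values are invented for illustration):

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def loss_and_grad(w, x, y):
    """Forward pass then backward pass (chain rule) for one logistic unit
    with squared loss L = 0.5 * (sigmoid(w*x) - y)^2."""
    a = sigmoid(w * x)          # forward pass
    dL_da = a - y               # backward: dL/da
    da_dz = a * (1 - a)         # sigmoid derivative
    return 0.5 * (a - y) ** 2, dL_da * da_dz * x   # dL/dw by the chain rule

w, x, y = 0.3, 2.0, 1.0
loss, grad = loss_and_grad(w, x, y)
eps = 1e-6                      # numerical check via central difference
num = (loss_and_grad(w + eps, x, y)[0] - loss_and_grad(w - eps, x, y)[0]) / (2 * eps)
print(abs(grad - num) < 1e-8)   # True
```

In a full network the same backward multiplication of local derivatives is applied layer by layer, reusing the forward-pass activations.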
10/16  Midterm Exam  
10/17  Neural networks 3: CNNs  Overfitting and generalization issues; approaches to regularization; overview of model selection and validation approaches using NNs; design choices; cross-entropy loss; softmax activation layer; number of trainable parameters; SGD and epochs; issues with the choice of the activation function, sigmoid/tanh units and vanishing gradients; limitations of fully connected MLPs; general ideas about exploiting structure and locality in input data in convolutional neural networks; receptive fields; convolutional filters and feature maps; weight sharing; invariant properties of features; pooling layers for subsampling; incremental and hierarchical feature extraction by convolutional and pooling layers; output layer and softmax function; examples of CNN architectures.  
10/18  Recitation  Neural networks  HW3 due, Oct 20: QNA4 due, HW4 out  


10/23  Neural networks 5: Autoencoders, Transformers, GANs  Concepts and implementation of autoencoders for dimensionality reduction, compression, denoising; general ideas about transfer learning and generative networks; examples of transformers, ChatGPT; Generative Adversarial Networks.  
10/24  Recitation  Neural networks  
10/25  Recitation  Neural networks  


10/30  Unsupervised learning 2: Data Clustering  Characterization of clustering tasks; types of clustering; (flat) K-means clustering problem; role of centroids, cluster assignments, Voronoi diagrams; (naive) K-means algorithm, examples; computational complexity; convergence and local minima; K-means loss function; alternating optimization (expectation-maximization); assumptions and limitations of K-means; illustration of failing cases; kernel K-means; soft clustering; relationship to vector quantization and use of clustering for lossy compression; hierarchical clustering, linkage methods, assumptions, computational complexity. 
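The (naive) K-means algorithm alternates an assignment step and a mean-update step; a minimal 1-D sketch with invented data:

```python
def kmeans_1d(points, centroids, iters=10):
    """Naive Lloyd iteration in 1-D: alternate assigning each point to its
    nearest centroid and moving each centroid to its cluster's mean."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[j].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

points = [1.0, 1.2, 0.8, 9.0, 9.2, 8.8]
print([round(c, 3) for c in kmeans_1d(points, [0.0, 5.0])])   # [1.0, 9.0]
```

Each iteration can only decrease the K-means loss, so the algorithm converges, but possibly to a local minimum that depends on the initial centroids.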


10/31  Unsupervised learning 3: Mixture models / GMMs, Latent data models  Probabilistic clustering and limitations of hard partitioning methods; mixture models and density estimation; modeling with latent variables; Gaussian Mixture Models (GMMs); MLE for parameter estimation in GMMs; GMM solutions with complete data, form of decision boundaries; relationships between K-means solutions and GMM solutions; from complete data to latent data; MLE for latent data and parameter estimation in GMMs, problem formulation. 


11/1  Unsupervised learning 4: Expectation-Maximization (EM), EM for GMMs  MLE for latent data and parameter estimation in GMMs; concepts and properties of Expectation-Maximization (EM) as iterative alternating optimization; EM for GMMs and probabilistic clustering; general form of EM for likelihood function optimization in latent variable models; Q function as a lower bound of the likelihood; formalism and concepts behind the EM approach; properties and limitations. 
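A stripped-down EM iteration for a 1-D, 2-component GMM, with mixture weights and variances held fixed so only the means are re-estimated (an illustrative simplification of the full algorithm, with invented data):

```python
from math import exp, pi, sqrt

def gauss(x, mu, var):
    """Gaussian density at x."""
    return exp(-(x - mu) ** 2 / (2 * var)) / sqrt(2 * pi * var)

def em_means(points, mus, var=1.0, iters=20):
    """EM for a 2-component 1-D GMM with equal fixed weights and fixed
    variance: the E-step computes responsibilities, the M-step re-estimates
    the component means as responsibility-weighted averages."""
    for _ in range(iters):
        # E-step: posterior probability that each point came from component 0
        r0 = [gauss(p, mus[0], var) / (gauss(p, mus[0], var) + gauss(p, mus[1], var))
              for p in points]
        # M-step: responsibility-weighted means
        mus = [sum(r * p for r, p in zip(r0, points)) / sum(r0),
               sum((1 - r) * p for r, p in zip(r0, points)) / sum(1 - r for r in r0)]
    return mus

print([round(m, 2) for m in em_means([0.0, 0.5, 5.0, 5.5], [1.0, 4.0])])  # [0.25, 5.25]
```

With hard 0/1 responsibilities this reduces to the K-means mean-update, which is the K-means/GMM relationship discussed in lecture.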




11/6  Unsupervised learning 5: Nonparametric Density Estimation  Density estimation problem; parametric vs. nonparametric approaches; histogram density estimation; role of bin width; bias-variance tradeoff; general form of the local approximator; fixing the width: kernel methods; Parzen windows; smooth kernels; finite vs. infinite support; fixing the number of points: kNN methods; comparison between the approaches; role of the bandwidth and bias-variance tradeoff. 
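Kernel (Parzen-window) density estimation places a kernel bump on each data point and averages them; a minimal sketch with a Gaussian kernel (toy data invented for illustration):

```python
from math import exp, pi, sqrt

def kde(data, x, h):
    """Kernel density estimate at x with a Gaussian kernel of bandwidth h:
    the average of kernel bumps centered on the data points."""
    k = lambda u: exp(-0.5 * u * u) / sqrt(2 * pi)
    return sum(k((x - xi) / h) for xi in data) / (len(data) * h)

print(round(kde([0.0], 0.0, 1.0), 4))   # 0.3989, the standard normal peak
```

The bandwidth h plays the same bias-variance role as the histogram bin width: small h gives a spiky, high-variance estimate; large h oversmooths.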


11/7  Recitation  EM, Nonparametric density estimation  
11/8  Break, no classes  


11/13  Support Vector Machines (SVM) 1  Linear classifiers (deterministic); review of score and functional margin, geometric margin; max-margin classifiers; linearly separable case and hard-margin SVM optimization problem; general concepts about constrained optimization. 


11/14  Support Vector Machines (SVM) 2  Support vectors and relationship with the weight vector; non-linearly separable case and use of slack variables for elastic problem formulation; margin and non-margin support vectors; penalty / tradeoff parameter; solution of the SVM optimization problem; relaxations; Lagrangian function and dual problem; Lagrange multipliers and their interpretation; weak and strong duality concepts.  
11/15  Support Vector Machines (SVM) 3  SVM optimization problem, primal and dual; solution of the dual for the hard-margin case; functional relations between multipliers and SVM parameters; solving the non-linearly separable case (soft-margin); hinge loss and soft-margin SVM; regularized hinge loss; properties of linear classifiers.  
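The regularized hinge loss mentioned above can be evaluated directly for a 1-D linear classifier (a toy sketch with invented data; labels are in {-1, +1}):

```python
def hinge_loss(w, b, data, C=1.0):
    """Regularized hinge loss of a 1-D linear classifier on labeled data:
    0.5*w^2 + C * sum(max(0, 1 - y*(w*x + b))), with labels y in {-1, +1}."""
    margin_terms = sum(max(0.0, 1 - y * (w * x + b)) for x, y in data)
    return 0.5 * w * w + C * margin_terms

data = [(2.0, +1), (-2.0, -1)]
print(hinge_loss(1.0, 0.0, data))            # 0.5: both margins >= 1, only the norm term
print(round(hinge_loss(0.1, 0.0, data), 3))  # 1.605: small weights violate the margin
```

Minimizing this objective is equivalent to the soft-margin SVM problem, with C trading off margin width against slack.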


11/22  Learning Theory 1  The need for bounds on generalization errors; PAC model bounds; sample complexity; consistent but bad hypotheses; derivation of the PAC Haussler bound; use of a PAC bound; limitations of Haussler's bound; Hoeffding's bound for a hypothesis which is not consistent; PAC bounds and the bias-variance tradeoff; computing the sample complexity; sample complexity for the case of decision trees; DTs of fixed depth vs. number of leaves; sample complexity and number of points that allow consistent classification.  
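Haussler's PAC bound for a finite hypothesis class and a consistent learner gives a direct sample-complexity formula; a small sketch (the example numbers are invented):

```python
from math import ceil, log

def pac_sample_complexity(h_size, eps, delta):
    """Haussler's PAC bound for a finite hypothesis class and a consistent
    learner: m >= (1/eps) * (ln|H| + ln(1/delta)) examples suffice so that,
    with probability at least 1 - delta, any consistent hypothesis has true
    error at most eps."""
    return ceil((log(h_size) + log(1 / delta)) / eps)

# e.g. |H| = 2^10 hypotheses, eps = 0.1, delta = 0.05
print(pac_sample_complexity(2 ** 10, 0.1, 0.05))   # 100
```

Note that m grows only logarithmically in |H| and 1/delta, but linearly in 1/eps.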


11/27  Recitation  Learning theory, SVMs, Kernelization  
11/28  Recitation  Course review  
11/29  Future and Ethics of AI, Q&A  

Topic  Files  Dates 

QNA 1: General concepts, ML models and workflow, kNN    Out: Aug 24  Due: Aug 31 
Homework 1: Decision trees, kNN, Crossvalidation    Out: Aug 31  Due: Sep 7 
QNA 2: Linear regression, Regularization, Gradient methods    Out: Sep 7  Due: Sep 13 
Homework 2: Linear and nonlinear regression models, Nonparametric regression, Gradient methods for optimization    Out: Sep 13  Due: Sep 21 
QNA 3: Probabilistic models, Decision boundaries, MLE and MAP approaches    Out: Sep 21  Due: Sep 27 
Homework 3: Probabilistic models, MLE/MAP/Bayes, Naive Bayes, Logistic regression    Out: Sep 27  Due: Oct 7 
QNA 4: Ensemble methods (Bagging, Boosting, Random Forest)    Out: Oct 3  Due: Oct 18 
Homework 4: Neural networks models and applications, Deep learning    Out: Oct 18  Due: Nov 1 
Homework 5: Unsupervised Learning (Dimensionality reduction, Clustering, Mixture models, Nonparametric density estimation)    Out: Nov 1  Due: Nov 15 
QNA 5: SVMs, Kernels and kernelization    Out: Nov 15  Due: Nov 23 
QNA 6: Learning theory    Out: Nov 23  Due: Nov 30 
Each assignment, either a homework or a QNA, is due on Gradescope by the posted deadline. Assignments submitted past the deadline will incur the use of late days.
You have 4 late days in total, but you cannot use more than 1 late day per homework or QNA. No credit will be given for an assignment submitted more than 1 day after the due date. Once your 4 late days have been used, you will receive a 20% deduction for each additional day late.
You can discuss the exercises with your classmates, but you should write up your own solutions, both for the theory and programming questions.
Using any external sources of code, algorithms, or complete solutions in any way requires approval from the instructor before submitting the work. For example, you must get instructor approval before using an algorithm you found online to implement a function in a programming assignment.
Violations of the above policies will be reported as an academic integrity violation. In general, for both assignments and exams, CMU's directives for academic integrity apply and must be duly followed. Information about academic integrity at CMU may be found at https://www.cmu.edu/academicintegrity. Please contact the instructor if you ever have any questions regarding academic integrity or these collaboration policies.
The class includes both a midterm and a final exam. Both exams will include theory and, possibly, pseudo-programming questions.
During exams, students are only allowed to consult a 2-page cheat sheet (written in any desired format) as well as the lecture slides (previously downloaded offline). No other material is allowed, including textbooks, computers/smartphones, or general consultation of Internet repositories. Any violation of these policies will result in a grade of zero on the exam.
The midterm exam is set for October 16.
The final exam is set for TBD.
Name  Hours  Location  

Gianni Di Caro  gdicaro@cmu.edu  TBD + pass by my office at any time ...  M 1007 
Zhijie Xu  zhijie@andrew.cmu.edu  TBD  ARC 