Machine learning is a subfield of computer science with the goal of exploring, studying, and developing learning systems, methods, and algorithms that can improve their performance by learning from data. The course is designed to give undergraduate students a one-semester-long introduction to the main principles, algorithms, and applications of machine learning.
After completing the course, students will be able to:
Students must have passed the following courses: 15-122 and (21-127 or 21-128 or 15-151) and (21-325 or 36-217 or 36-218 or 36-225 or 15-359).
In general, familiarity with Python programming and a solid background in general CS, calculus, and probability theory are needed to deal successfully with the course challenges. Some basic concepts in CS, calculus, and probability will be briefly reviewed (but not re-explained from scratch).
Talk to the instructor if you are unsure whether your background is suitable for the course.
Course grades will be assigned based on the following weights: 40% homework, 30% final exam, 10% midterm exam, 20% multiple-choice quizzes. There will be about five homework assignments. The final exam will include questions about all the topics covered in the course, with an emphasis on the topics introduced after the midterm exam. Quizzes will consist of multiple-choice questions intended to help students keep up with the course topics in between homework assignments.
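As a rough illustration of how the weights above combine, the sketch below computes a raw weighted course score. It assumes each component is already summarized as a single percentage on a 0-100 scale (the syllabus does not say how individual homework or quiz scores are averaged), and the function name `weighted_score` is hypothetical. The result is only a raw score; it is not a letter grade.

```python
# Component weights as listed in the grading policy.
WEIGHTS = {
    "homework": 0.40,
    "final_exam": 0.30,
    "midterm_exam": 0.10,
    "quizzes": 0.20,
}


def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-component scores (each on a 0-100 scale) into a 0-100 course score.

    Hypothetical helper. Assumes each component score is already an average
    over that component (e.g. mean homework percentage).
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    # Weighted sum; the weights add up to 1, so the result stays on a 0-100 scale.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {"homework": 90.0, "final_exam": 80.0, "midterm_exam": 70.0, "quizzes": 85.0}
    print(weighted_score(example))
```

With these example inputs the contributions are 36 + 24 + 7 + 17, i.e. a raw score of 84.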
Note that grades will NOT be curved. The mapping between scores and letter grades will roughly follow the scheme below. However, final course scores will be converted to letter grades using grade boundaries that will be determined precisely at the end of the semester, taking into account a number of aspects, such as participation in lecture and recitation, exam performance, and overall grade trends. Precise grade cutoffs will not be discussed at any point during or after the semester.
In addition to the lecture handouts (which will be made available after each lecture), the instructor will provide additional material during the course to cover specific topics.
A number of (optional) textbooks can be consulted to ease the understanding of the different topics (the instructor will point out the relevant chapters):
Dates  Topics  Slides  Useful References  HW


8/22  Introduction, General overview of ML: Basic concepts; taxonomy of learning problems; workflow of ML approaches; interpretation views of ML problems; course road map; logistics; practical recommendations. 


8/24  ML design, SL workflow, Loss functions: Workflow of a supervised learning problem scenario; structure and challenges of a typical SL problem: features, hypothesis class, loss function, optimization; design choices and inductive biases; loss function to score the effectiveness of learning; loss functions for classification and regression; generalization. 


8/26  Empirical risk minimization, Overfitting, Generalization error: Review of concepts; model selection and overfitting; empirical vs. generalization errors; estimation of generalization error and model selection; different ways of estimating generalization error. Recitation: Review of basic concepts: calculus, linear algebra, probability theory. 
QNA1 out  


8/31  Decision Trees I, SL based on the Divide-and-Conquer Model: Learning by asking questions; structure of decision trees; expressiveness of DTs; hypothesis space and NP-hardness of finding the simplest consistent hypothesis; recursive dataset decomposition, divide-and-conquer; axis-parallel decision boundaries; greedy top-down heuristics. 


9/2  Decision trees II, Model selection and Cross-Validation: Greedy top-down heuristics for decision trees: ID3, C4.5; entropy and information gain; purity of a labeled set; maximal gain for choosing the next attribute; discrete vs. continuous features; overfitting issues and countermeasures; decision tree regression. Estimating generalization error and model selection; holdout method; cross-validation methods: k-fold CV, leave-one-out CV, random subsampling; design issues in CV; model selection; model selection using CV.  PDF of lecture given in class  QNA1 due (Sat 9/4), HW1 out (Fri 9/3)  


9/7  Estimating Probabilities 1: Probability estimation and Bayes classifiers; importance and challenges of estimating probabilities from data; frequentist vs. Bayesian approach to modeling probabilities; definition and properties of MLE, MAP, and Full Bayes approaches for parameter estimation; priors and conjugate probability distributions; examples with Bernoulli data and Beta priors.  pdf  Check your knowledge 


9/9  Estimating Probabilities 2: Review of concepts; overview of conjugate priors for continuous and discrete distributions; practical examples: Bernoulli-Beta, Binomial-Beta, Multinomial-Dirichlet, Gaussian-Gaussian; MLE vs. MAP vs. Full Bayes, pros and cons. 




9/12  Break, no classes  
9/14  Linear algebra and Multivariate Gaussians: Review of linear algebra; matrix-vector notation; quadratic forms; multivariate Gaussians; isocontours; notions related to covariance matrices; making predictions using estimated probabilities and MLE/MAP/Bayes.  
9/16  Prediction and Classification using Estimated Probabilities, Naive Bayes Classifier: Classification using estimated probabilities and MLE/MAP/Bayes; quadratic and linear decision boundaries using Gaussians; complexity challenges and feature dependencies; Naive Bayes models; discrete and continuous features, MLE and MAP approaches; simplification rules for MAP estimation using smoothing parameters; case study for discrete features: text classification; case study for continuous features: image classification. 

HW1 due (Sat 9/18), QNA2 out  


9/19  Linear Models: From Generative to Discriminative Classifiers; linear models for classification (and regression); properties of linear models; geometry of linear models; use for classification; score; functional margin; finding the best linear classifier; loss functions; log-logistic loss. 


9/21  Logistic Regression (LR) 1: Probabilistic discriminative models; logistic regression as linear probabilistic classifier; decision boundaries; M(C)LE and M(C)AP models for probabilistic parameter estimation for LR; optimization problem; concave and convex functions; local and global optima; introduction to partial derivatives and gradient vectors. 


9/23  Gradient-based optimization, Logistic Regression 2: Recap on concave and convex functions, local and global optima; partial derivatives and their calculation; gradient vectors; geometric properties; general framework for iterative optimization; gradient descent / ascent; design choices: step size, convergence check, starting point; zig-zag behavior of gradients and function conditioning properties; sum functions and stochastic approximations of gradients; batch, incremental, and mini-batch GD; properties of stochastic GD; gradient ascent for LR-MLE; gradient ascent for LR-MAP; MCAP case for gradient ascent with Gaussian priors; logistic regression with more than two classes, softmax function; decision boundaries for different classifiers; linear vs. non-linear boundaries; number of parameters in LR vs. Naive Bayes; asymptotic results for LR vs. NB; overall comparison between LR and NB.  pdf  Notebook  QNA2 due, HW2 out  


9/28  Support Vector Machines 2: Solution of the SVM optimization problem; relaxations; Lagrangian function and dual problem; Lagrange multipliers and their interpretation; solution of the dual for the hard-margin case; functional relations between multipliers and SVM parameters; solving the non-linearly separable case (soft-margin).  Solving SVM optimization problems  
9/30  Support Vector Machines 3: Hinge loss and soft-margin SVM; regularized hinge loss; properties of linear classifiers. Review for midterm 



10/3  Midterm Exam  
10/7  Kernel Methods, SVM Kernelization: Dual SVM problem formulation for the non-linearly separable case (soft-margin) and dot products; dot products and inner products; generalities on Hilbert spaces; kernel functions and implicit feature map definition; kernels and similarity measures; Hilbert spaces and inner products; Mercer's conditions for kernels; kernel matrix; kernelization and modularity; kernel trick, kernelizing algorithms; examples of kernel functions; RBF kernel and infinite dimensionality; SVM kernelization; kernels in logistic regression. 

HW2 due, QN3 out  


10/12  Fall break  
10/14  Fall break  


10/19  Linear Regression 2: Issues related to solving OLS: matrix inversion, computations, numerical instabilities; normal equations vs. SGD vs. algebraic methods; controlling model complexity (and avoiding singularities) using regularization.  
10/21  Linear Regression 3: Effects of different regularization approaches; Ridge regression as constrained optimization, shrinking of weights; Lasso regression as constrained optimization, shrinking and zeroing of weights; comparison among Lp-norm regularizations; kernelization of linear regression.  QN3 due, HW3 out  


10/26 (extra)  Probabilistic regression models: Statistical models of linear regression; discriminative modeling of the conditional distribution of the outputs; white Gaussian noise to explain variations; maximization of log-likelihood vs. solution of OLS; M(C)LE as unregularized least squares; use of priors on parameters; M(C)AP estimate as regularized LS; Gaussian prior and Ridge regression; Laplace prior and Lasso regression. 


10/26  Nonparametric / Kernel Regression: Closed-form solutions for prediction from linear models, weighted linear combination of labels; smoothing kernels; examples with Gaussian and other basis functions; localization and weighted averages of observations; bias-variance trade-off; kernel regression as nonparametric regression; Nadaraya-Watson estimator; examples of kernels; role and impact of bandwidth; kNN as another nonparametric estimator; kernel regression and least-squares formulations. 


10/27  Ensemble methods, Boosting, Bagging, Random Forests: General ideas behind combining models; voting/averaging vs. stacking models; bagging and boosting as forms of combining different experts; bagging: construction of the datasets by bootstrapping, properties of the base model, variance reduction goals, aggregation by averaging; random forests as bagging with randomization of the features of each model; boosting: sequential generation of the weighted datasets, base model as a weak learner, goals of combining multiple weak learners, how to compute voting weights in AdaBoost; decision stumps as weak classifiers; analysis and properties of AdaBoost; robustness to overfitting. 


10/28  Neural Networks 1: Linear units and perceptron; perceptron algorithm and properties; from perceptrons to artificial neural networks, biological analogy; structure of a unit; multi-layered feed-forward architectures (MLP); recurrent network models; sigmoid units; other activation functions; hidden layers and hierarchical feature learning and propagation; matrices and network parameters; basic overview of properties, design choices, concepts about overfitting and complexity. 




11/2  Neural Networks 3: Overfitting and generalization issues; approaches to regularization; overview of model selection and validation approaches using NNs; design choices; cross-entropy loss; softmax activation layer; number of trainable parameters; SGD and epochs; issues with the choice of the activation function, sigmoid/tanh units and vanishing gradients; limitations of fully connected MLPs; general ideas about exploiting structure and locality in input data in convolutional neural networks; receptive fields; convolutional filters and feature maps; weight sharing; invariant properties of features (to be continued).  
11/4  Neural Networks 4: Convolutional filters and feature maps; weight sharing; invariant properties of features; pooling layers for subsampling; incremental and hierarchical feature extraction by convolutional and pooling layers; output layer and softmax function; examples of CNN architectures; concepts and implementation of autoencoders for dimensionality reduction, compression, denoising; general ideas about transfer learning and generative networks.  


11/9  Unsupervised learning - Data Clustering: Characterization of clustering tasks; types of clustering; (flat) K-means clustering problem; role of centroids, cluster assignments, Voronoi diagrams; (naive) K-means algorithm, examples; computational complexity; convergence and local minima; K-means loss function; alternating optimization (expectation-maximization); assumptions and limitations of K-means; illustration of failing cases; kernel K-means; soft clustering; relationship to vector quantization and use of clustering for lossy compression; hierarchical clustering, linkage methods, assumptions, computational complexity. 


11/11  Recitation - Dimensionality reduction methods, Clustering  


11/16  Unsupervised learning - Probabilistic clustering, Latent variable models, Mixture models, Expectation-Maximization 2: MLE for latent data and parameter estimation in GMMs; concepts and properties of Expectation-Maximization (EM) as iterative alternating optimization; EM for GMMs and probabilistic clustering; general form of EM for likelihood function optimization in latent variable models; Q function as lower bound of likelihood; formalism and concepts behind the EM approach; properties and limitations. 


11/18  Nonparametric Density Estimation: Density estimation problem; parametric vs. nonparametric approaches; histogram density estimation; role of bin width; bias-variance trade-off; general form of the local approximator; fixing the width: kernel methods; Parzen windows; smooth kernels; finite vs. infinite support; fixing the number of points: kNN methods; comparison between the approaches; role of the bandwidth and bias-variance trade-off. 




11/23  Learning Theory II: PAC bounds on continuous hypothesis spaces; set shattering; VC dimension; VC dimension for linear models, decision stumps, axis-aligned rectangles, circles, ellipses; generalization error bound and VC dimension; tightness of the bound; bias-variance and VC dimension; limitations of the VC dimension.  
11/25  Learning Theory III, General review  


12/4  Final Exam 
Topic  Files  Due Dates 

Homework 1: kNN, Model selection, Decision trees, Bayes optimal classifier, MLE/MAP/Bayes, Naive Bayes    Sep 18 
Homework 2: Logistic regression, Decision boundaries, Gradient methods, Support Vector Machines, Kernelization    Oct 7 
Homework 3: Linear and nonlinear regression models, Ensemble models, Neural networks    Nov 7 
Homework 4: Deep networks, Unsupervised Learning (Dimensionality reduction, Clustering, Mixture models, Nonparametric density estimation)    Nov 21 
Homework is due by the posted deadline. Assignments submitted past the deadline will incur the use of late days.
You have 6 late days in total, but you cannot use more than 2 late days per homework or quiz. No credit will be given for an assignment submitted more than 2 days after the due date. After your 6 late days have been used, you will receive 20% off for each additional day late.
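The late-day rules above amount to a small bookkeeping procedure. The sketch below is one possible reading of them, not an official tool. It assumes that lateness is counted in whole days, that late days are consumed automatically in submission order, that a submission more than 2 days late gets zero credit without consuming late days, and that each late day not covered by the 6-day budget removes 20% of that assignment's score. The names `apply_late_policy` and `Outcome` are hypothetical.

```python
from dataclasses import dataclass

TOTAL_LATE_DAYS = 6      # total late-day budget (from the policy)
MAX_LATE_PER_ITEM = 2    # max late days per homework or quiz (from the policy)
PENALTY_PER_DAY = 0.20   # deduction per uncovered late day (from the policy)


@dataclass
class Outcome:
    credit_fraction: float  # fraction of the earned score that is kept
    late_days_used: int     # late days charged to the budget for this item


def apply_late_policy(days_late: list[int]) -> list[Outcome]:
    """Apply the late policy to submissions listed in chronological order.

    Hypothetical helper; assumptions: whole days, automatic use of late days,
    no late days consumed by a submission that earns no credit, and the 20%
    deduction applied to the assignment's score per uncovered day.
    """
    remaining = TOTAL_LATE_DAYS
    outcomes = []
    for d in days_late:
        if d < 0:
            raise ValueError("days_late must be non-negative")
        if d > MAX_LATE_PER_ITEM:
            # More than 2 days late: no credit (assumed not to consume late days).
            outcomes.append(Outcome(0.0, 0))
            continue
        used = min(d, remaining)   # cover as many days as the budget allows
        remaining -= used
        uncovered = d - used       # days late beyond the remaining budget
        outcomes.append(Outcome(max(0.0, 1.0 - PENALTY_PER_DAY * uncovered), used))
    return outcomes


if __name__ == "__main__":
    for outcome in apply_late_policy([2, 2, 1, 2, 3]):
        print(outcome)
```

In the usage example, the first three submissions use 5 of the 6 late days at full credit; the fourth uses the last late day and loses 20% for its second day; the fifth is 3 days late and earns no credit.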
You can discuss the exercises with your classmates, but you should write up your own solutions, both for the theory and programming questions.
Any use of external sources of code, algorithms, or complete solutions must be approved by the instructor before you submit the work. For example, you must get instructor approval before using an algorithm you found online to implement a function in a programming assignment.
Violations of the above policies will be reported as an academic integrity violation. In general, for both assignments and exams, CMU's directives for academic integrity apply and must be duly followed. Information about academic integrity at CMU may be found at https://www.cmu.edu/academicintegrity. Please contact the instructor if you ever have any questions regarding academic integrity or these collaboration policies.
The class includes both a midterm and a final exam. Both exams will include theory and pseudo-programming questions.
During exams, students may consult only a one-page cheat sheet (written in any format). No other material is allowed, including textbooks, computers/smartphones, or copies of lecture handouts.
The midterm exam is set for October 3.
The final exam is set for December 4.
Name  Email  Hours  Location  

Gianni Di Caro  gdicaro@cmu.edu  Thursdays 4:15pm-5:30pm + pass by my office at any time ...  M 1007 
Eduardo Feo-Flushing  efeoflus@andrew.cmu.edu  TBD  M 1004 