Product ID: 99819

Supplementary Print

Undergraduate

**TARGET AUDIENCE:**

Students with a mathematical background interested in data analysis, and instructors who could tailor this material for Python Jupyter Notebook labs, in courses such as applied linear algebra, mathematics for data science, mathematical modeling, or a mathematics capstone course.

**ABSTRACT:**

With linear algebra and optimization as a unifying theme, this module uses mathematical concepts including least-squares solutions, loss functions, covariance matrices, eigenvalues and eigenvectors, and separating hyperplanes to explain least-squares linear fitting, unsupervised clustering using k-means, dimensionality reduction using principal components, and binary classification of labeled data using support vector machines. To illustrate how data analysis works in practice, Python Jupyter Notebooks are used to analyze a variety of data sets connected to the city of Chicago.

**Table of Contents**

**1. Introduction**

**2. OLS Linear Fitting**

- 2.1 Least-Squares Solutions
- 2.2 Minimizing the OLS Loss Function via Normal Equations
- 2.3 Equivalence of Gradient-Based Optimization

**3. Unsupervised Clustering by k-Means**

- 3.1 Clustering of Data
- 3.2 Optimization by Gradient Descent
- 3.3 Optimization by Coordinate and Block Descent
- 3.4 k-Means Clustering by Block Descent LS Minimization
  - 3.4.1 Proof of Block Descent’s Stepwise Minimization of J

**4. Dimension Reduction**

- 4.1 Why Variance Matters in Reducing Dimensionality
- 4.2 Variance and Covariance for Mean-Centered Data
- 4.3 Covariance Matrix
- 4.4 Projected Variance
- 4.5 Maximization of Projected Variance

**5. Binary Classification Via Support Vector Machines**

- 5.1 Linearly Classifying Binary-labeled Data
- 5.2 Intuition Underlying SVM
- 5.3 Mathematical Formalism
- 5.4 Separating Hyperplanes
- 5.5 Signed Distance
- 5.6 Optimization for Linearly-Separable Data
- 5.7 Optimization for Non-Separable Data

**6. Application to Reality**

**7. Conclusion**

**8. Solution to Selected Exercises**

**References**

**Acknowledgments**

**About the Authors**

©2023 by COMAP, Inc.

UMAP Module

40 pages

- Linear Algebra

- Data Analysis

Multivariable Calculus and Linear Algebra

