Consortium for Mathematics and its Applications

Product ID: 99819
Supplementary Print
Undergraduate

Linear Algebra and Optimization in Data Analysis (UMAP)

Author: Paul Isihara and Student Team


TARGET AUDIENCE:

Students with a mathematical background interested in data analysis, and instructors who could tailor this material for Python Jupyter Notebook labs, in courses such as applied linear algebra, mathematics for data science, mathematical modeling, or a mathematics capstone course.

ABSTRACT:

With linear algebra and optimization as a unifying framework, this module uses least-squares solutions, loss functions, covariance matrices, eigenvalues and eigenvectors, and separating hyperplanes to explain four techniques: least-squares linear fitting, unsupervised clustering using k-means, dimensionality reduction using principal components, and binary classification of labeled data using support vector machines. To illustrate how data analysis works in practice, Python Jupyter Notebooks are used to analyze a variety of data sets connected to the city of Chicago.
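
As a rough illustration of two of the ideas named in the abstract, the following minimal sketch fits a line by solving the normal equations and finds the first principal component from the covariance matrix of mean-centered data. It is not taken from the module's notebooks: the synthetic data, the variable names, and the use of NumPy are all assumptions made for this example.

```python
import numpy as np

# Synthetic data (hypothetical, not from the module): y is roughly 2x + 1 plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Least-squares linear fit: solve the normal equations (A^T A) beta = A^T y.
A = np.column_stack([np.ones_like(x), x])   # columns: intercept term, x
beta = np.linalg.solve(A.T @ A, A.T @ y)    # beta[0] = intercept, beta[1] = slope

# First principal component: the eigenvector of the covariance matrix
# of the mean-centered data with the largest eigenvalue.
X = np.column_stack([x, y])
Xc = X - X.mean(axis=0)                     # mean-center each column
C = Xc.T @ Xc / (Xc.shape[0] - 1)           # 2 x 2 sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)        # eigh returns eigenvalues in ascending order
pc1 = eigvecs[:, -1]                        # direction of maximum projected variance
explained = eigvals[-1] / eigvals.sum()     # fraction of total variance along pc1

print("intercept, slope:", beta)
print("first principal direction:", pc1)
print("fraction of variance explained:", explained)
```

Because the synthetic points lie close to a line, nearly all of the variance falls along the first principal direction. Solving the normal equations directly is fine for a small, well-conditioned problem like this one; for larger or ill-conditioned problems, `np.linalg.lstsq` is usually the more stable choice.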

Table of Contents

1. Introduction

2. OLS Linear Fitting
2.1 Least-Squares Solutions
2.2 Minimizing the OLS Loss Function via Normal Equations
2.3 Equivalence of Gradient-Based Optimization

3. Unsupervised Clustering by k-Means
3.1 Clustering of Data
3.2 Optimization by Gradient Descent
3.3 Optimization by Coordinate and Block Descent
3.4 k-Means Clustering by Block Descent LS Minimization
3.4.1 Proof of Block Descent’s Stepwise Minimization of J

4. Dimension Reduction
4.1 Why Variance Matters in Reducing Dimensionality
4.2 Variance and Covariance for Mean-Centered Data
4.3 Covariance Matrix
4.4 Projected Variance
4.5 Maximization of Projected Variance

5. Binary Classification Via Support Vector Machines
5.1 Linearly Classifying Binary-labeled Data
5.2 Intuition Underlying SVM
5.3 Mathematical Formalism
5.4 Separating Hyperplanes
5.5 Signed Distance
5.6 Optimization for Linearly-Separable Data
5.7 Optimization for Non-Separable Data

6. Application to Reality

7. Conclusion

8. Solution to Selected Exercises

References

Acknowledgments

About the Authors

©2023 by COMAP, Inc.
UMAP Module
40 pages

Mathematics Topics:

Linear Algebra

Application Areas:

Data Analysis

Prerequisites:

Multivariable Calculus and Linear Algebra
