# MLCC: Linear regression

I am working through Google’s Machine Learning Crash Course. The notes in this post cover [1] and [2].

A lot of ML quickstarts dive right into jargon like model, feature, y', L2, etc., which makes it hard for me to learn the basics – “what are we doing and why?”

The crash course also presents some jargon, but at least explains each concept and links to a glossary, which makes it easier to learn.

After a few days of poking around, one piece of jargon seems irreducible: linear regression. In other words, this is the kind of basic ML concept I’ve been looking for. This is where I’d start if I were helping someone learn ML.

I probably learned about linear regression in the one statistics class I took in college, but I’ve forgotten it after years of string parsing 🙂

The glossary entry for linear regression describes it as “Using the raw output (y') of a linear model as the actual prediction in a regression model”, which is still too dense for me.

The linear regression module of the crash course is closer to my level:

> Linear regression is a method for finding the straight line … that best fits a set of points.

The crash course provides a good example of a line fitting a set of points describing cricket chirps per minute versus temperature:

The “linear” in “linear regression” refers to this straight line, as in linear equation. The “regression” refers to “regression to the mean”, a statistical observation that is unfortunately unrelated to statistical methods like the least squares technique described below, as explained humorously by John Seymour.

Math is Fun describes a technique called “least squares regression” for finding such a line. Google’s glossary also has an entry for least squares regression, which gives me confidence that I’m bridging my level (Math is Fun) with the novel concept of ML.

Helpful tip from StatQuest’s “Machine Learning Fundamentals: Bias and Variance”: differences are squared so that negative distances don’t cancel out positive distances.
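A toy illustration of that tip, with numbers I made up: two equal-and-opposite residuals sum to zero, which would falsely suggest a perfect fit, while their squares don’t cancel.

```python
# Residuals (differences between actual and predicted y values), made up
# to show why least squares squares them before summing.
residuals = [2.0, -2.0]

plain_sum = sum(residuals)                   # cancels to 0.0 -- misleading
squared_sum = sum(r * r for r in residuals)  # 8.0 -- reflects the real error

print(plain_sum, squared_sum)
```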

Math is Fun’s article on linear equations and the crash course’s video on linear regression reminded me of the slope-intercept form of a linear equation I learned about way back when: `y = mx + b`.

The crash course even describes this equation as a “model”: “By convention in machine learning, you’ll write the equation for a model slightly differently …”

All this helps me understand in the most basic sense:

- A “model” is just an equation
- “Training” and “learning” are just performing a regression calculation to generate that equation
- Performing these calculations regularly and on large data sets is tedious and error-prone, so we use a computer; hence “machine learning”
- “Prediction” and “inference” are just plugging x values into the equation