I am working through Google’s Machine Learning Crash Course. The notes in this post cover [2].

[2] introduces Colab, NumPy, Pandas and TensorFlow.

Colab is like a hosted Jupyter notebook and provides an easy way to play with Python ML libraries, among other things.

NumPy provides performant and user-friendly collections and operations for linear algebra.

Pandas provides tools for working with “dataframes”, which are like spreadsheets in memory.

## Digression into Google Sheets

I like building on my understanding. In this context, I want to learn Colab and NumPy by using them to work with the cricket chirp data introduced in [1].

[1] used cricket chirps per minute per temperature as an example, but didn’t provide raw data. Dolbear’s Law provides an equation we can use to generate data: T_{C} = 10 + (N_{60} − 40) / 7 ⇒ N_{60} = 7 * T_{C} − 30

Colab and NumPy provide an easy way to use this equation:

```python
import numpy as np
# Starts by generating temps, since chirps are dependent on temp.
# Starts at 5 because Dolbear’s formula results in a negative value below 5 degrees.
temps = np.arange(5, 36)
# Adds noise to avoid an obviously linear relationship.
# Copies the approach from “NumPy UltraQuick Tutorial” linked from [2].
# Sets low of -5, which limits the minimum chirps to zero.
# Uses size=temps.size (31 values) so noise aligns with temps.
noise = np.random.randint(low=-5, high=5, size=temps.size)
chirps = 7 * temps - 30 + noise
# Prints CSVs, since Google Sheets knows how to split CSVs on paste.
print(','.join([str(i) for i in temps]))
print(','.join([str(i) for i in chirps]))
```

Example chirps per minute:

7,13,15,27,31,38,45,57,57,67,76,85,89,94,100,109,116,120,131,134,144,149,158,165,170,176,187,189,197,208,215

Note this generates synthetic data for chirps per minute, but then I’ll use chirps to predict temperature, i.e. chirps is the feature and temperature is the label.

Copy the temps and chirps CSVs. In Sheets, *Edit > paste special > paste comma-separated text (CSV) as columns*.

To improve readability, cut the pasted content and *Edit > paste special > paste transposed* to convert row data to column data.

Add column headers, select everything and then *Insert > Chart*.

Select “Scatter chart” for the chart type. Under *Customize > Series*, check the trendline box. Select “Equation” for the label to get the regression equation. Check the R² box.

We can also use the SLOPE and INTERCEPT methods to calculate the equation.
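As a sanity check on Sheets’ SLOPE and INTERCEPT, here’s a sketch of the same least-squares fit in NumPy (assuming the example chirps above; `np.polyfit` with degree 1 returns the slope and intercept):

```python
import numpy as np

temps = np.arange(5, 36)
chirps = np.array([7, 13, 15, 27, 31, 38, 45, 57, 57, 67, 76, 85, 89, 94,
                   100, 109, 116, 120, 131, 134, 144, 149, 158, 165, 170,
                   176, 187, 189, 197, 208, 215])

# Degree-1 polynomial fit: chirps ≈ slope * temps + intercept,
# the same least-squares line Sheets' SLOPE and INTERCEPT compute.
slope, intercept = np.polyfit(temps, chirps, 1)
print('Slope: %.3f' % slope)
print('Intercept: %.3f' % intercept)
```

Since the data was generated as chirps = 7 * temp − 30 plus small noise, the slope should land near 7 and the intercept near −30.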

Slope, intercept and R², respectively, given the example chirps per minute from above:

Unfortunately, Sheets doesn’t have MSE, which I learned about in [1]. That leads me to wonder: “What’s the relationship between R² and MSE?” Per [3], we’re better off with MSE.
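The connection can be seen directly: for a least-squares fit, R² = 1 − SS_res/SS_tot, and dividing both sums by n gives R² = 1 − MSE / Var(label). A sketch with the synthetic data above (regressing temperature on chirps, matching the feature/label setup):

```python
import numpy as np

temps = np.arange(5, 36)
chirps = np.array([7, 13, 15, 27, 31, 38, 45, 57, 57, 67, 76, 85, 89, 94,
                   100, 109, 116, 120, 131, 134, 144, 149, 158, 165, 170,
                   176, 187, 189, 197, 208, 215])

# Least-squares fit of temperature (label) on chirps (feature).
slope, intercept = np.polyfit(chirps, temps, 1)
predicted_temps = slope * chirps + intercept

mse = np.mean((temps - predicted_temps) ** 2)
r2 = 1 - mse / np.var(temps)  # R² = 1 - MSE / Var(label)
print('MSE: %.3f' % mse)
print('R2: %.3f' % r2)
```

So R² is just MSE rescaled by the label’s variance, which is why a model can show near-100% R² while its MSE is still non-zero.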

## Digression into SciKit

[2] introduces Pandas after NumPy, but continuing the theme of building on understanding, I’d like to perform a linear regression in Colab, rather than copy-pasting into Sheets. I’ll follow [4] and [5] and defer Pandas until I need it for TensorFlow.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
actual_temps = np.arange(5, 36)
chirps = np.array([7,13,15,27,31,38,45,57,57,67,76,85,89,94,100,109,116,120,131,134,144,149,158,165,170,176,187,189,197,208,215])
# Fits temperature (label) as a function of chirps (feature).
model = LinearRegression()
model.fit(chirps[:, np.newaxis], actual_temps)
predicted_temps = model.predict(chirps[:, np.newaxis])
plt.scatter(chirps, actual_temps)
plt.plot(chirps, predicted_temps)
# Starts the y-axis at zero, even though the data starts at 5
plt.ylim(0)
print('Slope: %.3f' % model.coef_[0])
print('Intercept: %.3f' % model.intercept_)
print('MSE: %.3f' % mean_squared_error(actual_temps, predicted_temps))
print('R2: %.3f' % r2_score(actual_temps, predicted_temps))
```

Slope, intercept, MSE and R², respectively:

Note SciKit can calculate MSE and R². Perhaps in line with [3], note MSE is non-zero, but R² is close to 100% 🤔

As expected, Sheets is great for common stuff, but Colab/Jupyter shines for arbitrary calculation.

## TensorFlow

Coincidentally, TensorFlow’s fifth birthday was just a couple days ago 🥳

Continuing the theme of building on experience, I’m using the cricket chirp data for the synthetic exercise:

```python
my_feature = [float(i) for i in [7,13,15,27,31,38,45,57,57,67,76,85,89,94,100,109,116,120,131,134,144,149,158,165,170,176,187,189,197,208,215]]
my_label = [float(i) for i in range(5, 36)]
```

The following settings enabled the cricket chirp data to converge with an RMSE ~ 0.8, which seems like a sweet spot of accuracy vs training time:

- Learning rate: 0.01
- Epochs: 50
- Batch size: 1

Decreasing the learning rate (eg 0.001) and increasing the epochs (eg 500) converges with an RMSE ~0.5, but takes forever. Increasing the batch size increases choppiness of the error tail.
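These trade-offs can be reproduced in a from-scratch sketch (plain NumPy rather than the exercise’s tf.keras model, so the numbers won’t match exactly): mini-batch SGD on the chirp data, with `learning_rate`, `epochs`, and `batch_size` playing the same roles as above. The feature is scaled to [0, 1] first, since raw chirp counts in the hundreds make SGD diverge at these learning rates:

```python
import numpy as np

chirps = np.array([7, 13, 15, 27, 31, 38, 45, 57, 57, 67, 76, 85, 89, 94,
                   100, 109, 116, 120, 131, 134, 144, 149, 158, 165, 170,
                   176, 187, 189, 197, 208, 215], dtype=float)
temps = np.arange(5, 36, dtype=float)

# Scales the feature to [0, 1] so SGD is stable at small learning rates.
x = chirps / chirps.max()
y = temps

def train(learning_rate, epochs, batch_size, seed=0):
    """Fits y ≈ w * x + b by mini-batch SGD on MSE loss; returns final RMSE."""
    rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        # Shuffles each epoch, then walks through the data in mini-batches.
        order = rng.permutation(len(x))
        for start in range(0, len(x), batch_size):
            idx = order[start:start + batch_size]
            err = w * x[idx] + b - y[idx]
            # Gradients of MSE with respect to w and b.
            w -= learning_rate * 2 * np.mean(err * x[idx])
            b -= learning_rate * 2 * np.mean(err)
    return np.sqrt(np.mean((w * x + b - y) ** 2))

print('RMSE: %.3f' % train(learning_rate=0.01, epochs=50, batch_size=1))
```

Varying the three arguments shows the same trade-offs described above, e.g. a smaller learning rate with more epochs converges tighter but takes many more steps.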

The summary at the bottom of the synthetic data exercise seems generally useful:

- “Training loss should steadily decrease, steeply at first, and then more slowly until the slope of the curve reaches or approaches zero.
- If the training loss does not converge, train for more epochs.
- If the training loss decreases too slowly, increase the learning rate. Note that setting the learning rate too high may also prevent training loss from converging.
- If the training loss varies wildly (that is, the training loss jumps around), decrease the learning rate.
- Lowering the learning rate while increasing the number of epochs or the batch size is often a good combination.
- Setting the batch size to a *very* small batch number can also cause instability. First, try large batch size values. Then, decrease the batch size until you see degradation.
- For real-world datasets consisting of a very large number of examples, the entire dataset might not fit into memory. In such cases, you’ll need to reduce the batch size to enable a batch to fit into memory.”

For the real data, there’s a note about the “max” being anomalous relative to the different percentiles, which makes sense, but is a little abstract. The plot does a good job showing outliers.

Interesting that the RMSE for the real data is ~100, rather than the zero I was going for with the synthetic data. I guess the point is that we’re trying to minimize loss, rather than eliminate it.

[2] uses California housing data, but we can browse other datasets at https://datasetsearch.research.google.com/.

Great tip to use `corr` to see which features correlate with a label, as an alternative to trial and error.
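For the two-column chirp data this is trivial, but as a sketch of the idea (using NumPy’s `corrcoef` here; the exercise itself uses pandas’ DataFrame `corr` method, which computes the same Pearson correlations across all column pairs):

```python
import numpy as np

chirps = np.array([7, 13, 15, 27, 31, 38, 45, 57, 57, 67, 76, 85, 89, 94,
                   100, 109, 116, 120, 131, 134, 144, 149, 158, 165, 170,
                   176, 187, 189, 197, 208, 215])
temps = np.arange(5, 36)

# Pearson correlation between the candidate feature and the label.
# Values near +/-1 suggest a strong linear relationship, i.e. a
# promising feature for a linear model.
corr = np.corrcoef(chirps, temps)[0, 1]
print('Correlation: %.3f' % corr)
```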

## References

- [1] Google Machine Learning Crash Course: “Descending into ML”
- [2] Google Machine Learning Crash Course: “First Steps with TensorFlow”
- [3] University of Virginia Library: “Is R-squared Useless?”
- [4] Python Data Science Handbook: “In Depth: Linear Regression” excerpt
- [5] SciKit: “Linear Regression Example”