ML Foundations: week 2

Coursera’s Lab was running slowly, so I explored Google’s Colab as an alternative.

A few nice features: CPU and RAM usage indicators let me know if I’m close to a limit; the run, create and move buttons on each cell are convenient.

In Coursera, download the notebook and data files and unzip them.

In Colab, click on “File > Upload notebook” and upload the unzipped notebook.

Add a cell to install Turi Create:

pip install turicreate

Add another cell to authorize Colab to read files from Drive:

from google.colab import drive
drive.mount('/content/drive')

In Drive, select “upload folder” and upload the unzipped folder.

In Colab’s left rail, click on the little stylized folder icon (🗂) and browse Drive for the uploaded folder. Right-click on the folder and select “Copy path”.

Update the SFrame creation to use the copied path:

sales = turicreate.SFrame('/content/drive/MyDrive/home_data.sframe')

Credit to the “Bonus Method — My Drive” section of “Get Started: 3 Ways to Load CSV files into Colab” for describing the basics.

Aaand of course now that I’ve set up Colab, I see Coursera’s Lab is running faster 🤷‍♂️

Out of curiosity, I see the intercept is negative, indicating buyers require a minimum square footage. Solving for x when y=0, I see it’s ~180. I can plug that back into the model:

sqft_model.predict([{'sqft_living': 180}])
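
For reference, the zero crossing comes from setting the prediction to zero in y = intercept + slope * x, which gives x = -intercept / slope. Here’s a minimal sketch of that arithmetic. The coefficients are made up for illustration (chosen so the answer lands at 180); the real ones come from the fitted model, and `zero_crossing` is just a helper name I picked:

```python
def zero_crossing(intercept: float, slope: float) -> float:
    """Return the x where intercept + slope * x equals zero."""
    if slope == 0:
        raise ValueError("a flat line never crosses zero")
    return -intercept / slope


# Hypothetical coefficients, not the values from my notebook
intercept = -45000.0  # dollars
slope = 250.0  # dollars per square foot

print(zero_crossing(intercept, slope))
```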

Launch stages

Agreeing to incremental launch stages and the criteria that govern promotion from one stage to the next makes launch commitments, aka deadlines, less risky. For example, Google SRE’s launch plan documentation describes four: Early Access Preview, Alpha, Beta and General Availability (GA). Committing to a GA launch is easier after a successful Beta, which is easier after a successful Alpha, and so on.

Comparable to a deployment pipeline, promotion from one stage to the next ideally concerns stability rather than features. In other words, if the difference between Alpha and Beta is a length of time without issues, we can promote with confidence after that time, but if the difference is a set of new features, we don’t know how those features will perform.
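
To make the stability-based criterion concrete, here’s a rough sketch of a promotion check. Everything in it is an assumption for illustration: the `StageStatus` shape, the `can_promote` helper and the idea of a “quiet period” measured in days are mine, not something from the SRE documentation.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Stage names from the launch plan example, in promotion order
STAGES = ["Early Access Preview", "Alpha", "Beta", "General Availability"]


@dataclass
class StageStatus:
    stage: str
    entered_on: date
    last_issue_on: Optional[date] = None  # None means no issues so far


def can_promote(status: StageStatus, today: date, required_quiet_days: int) -> bool:
    """Promote only after a full quiet period with no issues in the current stage."""
    if status.stage == STAGES[-1]:
        return False  # GA is the last stage
    # The quiet period restarts whenever an issue is recorded
    quiet_since = max(status.entered_on, status.last_issue_on or status.entered_on)
    return today - quiet_since >= timedelta(days=required_quiet_days)


def next_stage(stage: str) -> str:
    """Return the stage that follows the given one."""
    return STAGES[STAGES.index(stage) + 1]
```

The check never looks at features, only at time without issues, which is the point: the decision to promote is knowable in advance.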

Control vs data planes

I recently became aware of a helpful dichotomy: control vs data plane. The former governs how the latter should be delivered.

I believe these terms come from the world of networking, but they’re now entering the world of application engineering via DevOps.

For example, I work on a product that delivers targeted configuration to apps. In this context, the targeting logic is the control plane, and the resulting configuration is the data plane. For contrast, the RESTful perspective would describe both as resources.

In this context, I can see if other patterns might apply. In particular, the best practice of a declarative control plane has been helpful lately. As Azure’s introduction to Infrastructure as Code states, the goal is to specify “what an environment requires and not necessarily the how.” Collocating control with code simplifies reasoning and minimizes the cost of switching between application and infrastructure logic, similar to the benefits of collocating documentation with code.
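
As a toy illustration of the split, here’s a sketch of declarative targeting rules (control plane) and the configuration they resolve to (data plane). The rule format, the `resolve_config` helper and the attribute names are all hypothetical; this is not how the product I work on is implemented.

```python
from typing import Any, Dict, List

# Control plane: declarative rules stating *what* each audience should get.
# Rules are checked in order; the first full match wins.
RULES: List[Dict[str, Any]] = [
    {"when": {"platform": "ios", "country": "US"}, "config": {"theme": "dark"}},
    {"when": {"platform": "ios"}, "config": {"theme": "light"}},
]
DEFAULT_CONFIG: Dict[str, Any] = {"theme": "system"}


def resolve_config(app: Dict[str, str]) -> Dict[str, Any]:
    """Evaluate the targeting rules and return the matching configuration.

    The returned dict is the data plane: what actually gets delivered to the app.
    """
    for rule in RULES:
        if all(app.get(key) == value for key, value in rule["when"].items()):
            return rule["config"]
    return DEFAULT_CONFIG


print(resolve_config({"platform": "ios", "country": "US"}))
```

Because the rules are plain data, they can live next to the application code and be reviewed in the same commit.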

Rotating cadence lead

Rotating leadership for recurring team “cadence” meetings is a beneficial pattern I’ve seen on several teams.

“Cadence” meetings help a team or project march to a rhythm. All teams I’ve been on have had them, but not all have called them “cadence”. In agile terms, a daily standup is one form. A form I like is Monday kickoff, Wednesday discussion, Friday retro.

The pattern is simple:

  1. As a team, define the responsibilities of the lead, eg keeping the meeting focused on the agenda, cancelling the meeting if there’s nothing on the agenda, etc
  2. Identify the consistent attendees of the meeting and rotate the lead role among them
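
The rotation itself can be as simple as a round-robin keyed on the date. A minimal sketch, assuming a weekly meeting; the `lead_for` helper, the names and the start date are made up for illustration:

```python
from datetime import date
from typing import List


def lead_for(meeting_date: date, attendees: List[str], rotation_start: date) -> str:
    """Pick the lead for a weekly meeting by cycling through the attendees."""
    if not attendees:
        raise ValueError("need at least one attendee to rotate")
    weeks_elapsed = (meeting_date - rotation_start).days // 7
    return attendees[weeks_elapsed % len(attendees)]


# Hypothetical team and start date
team = ["Ana", "Ben", "Chi"]
start = date(2021, 1, 4)

print(lead_for(date(2021, 1, 11), team, start))
```

Deriving the lead from the date means nobody has to maintain a schedule, though swaps for vacations would still need a manual override.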

Such a rotation has a few benefits:

  1. There’s no single point of failure for keeping the team organized
  2. All members of the team get leadership experience, and no single person is stuck with this form of glue work
  3. Sharing roles engenders empathy between roles. For example, experience motivating participation as a lead can encourage participation as a non-lead. As opposed to “taxation without representation” 🙂

A couple anti-patterns I’ve seen:

  1. Non-overlapping leads and attendees. For example, having the eng oncall rotation also lead a cadence including EMs, PMs, designers, etc who aren’t on the oncall rotation
  2. Having the rotation include optional attendees, which can result in last-minute adjustments

I’m also curious about rotating team leads, eg as part of the Engineer/Manager Pendulum, but I don’t have experience with that yet.

Bugs vs tasks vs goals

I am aware of three common approaches for tracking work:

  1. Bugs, eg GitHub issues, Jira tickets, etc
  2. Tasks, eg items in a list of things to do
  3. Goals, eg some end state

There are probably many more, but I commonly see teams struggle to reconcile these three.

Part of the challenge is they’re all related and required in some context, but none is sufficient on its own. Bugs are required because people external to a team need a way to request work. Further, some of this work is essential, so bugs can’t be ignored. Bugs generally represent unplanned work.

Tasks are required because we need a way to deconstruct large projects into more manageable pieces. Tasks generally represent planned work.

Goals are required to separate implementation details from an objective. One of my colleagues phrased it well: setting goals shouldn’t be controversial.

Sometimes tasks can be represented as bugs, but bugs by nature are relatively formal, which breaks down when tasks change at a high rate. Some teams strive for tasks that take no more than one iteration, which is tedious to represent as bugs. I like the pattern of stating goals for the week, and reviewing progress against those goals at the end of the week, but this is tedious to represent as bugs or tasks.

My fantasy is something like:

  1. Monday kickoff and Friday review focused on a simple, written (so we can remember on Fri) list of goals for the week
  2. A support rotation monitors bugs. Active work on bugs is represented as goals for the week
  3. Large projects have independent task tracking. Active work on tasks is represented as goals for the week

The fact that my goal for the week concerns planned or unplanned work matters less than communicating to the team what I’m working on and how it contributes to the team’s priorities.

Praise for Markdown eng docs

Google has a technical documentation system called “g3doc”. The presentation “The Knowledge: Towards a Culture of Engineering Documentation” at SREcon16 described it well, so this post just highlights a few details:

  1. Documentation is collocated with code
  2. Documentation is rendered from code-like Markdown

The first point enables me to include documentation changes and code changes in the same commit.

The second point is appealing because it reduces the cost of context switching between code and documentation. For example, I can edit both in the same editor.

I think part of the appeal is Google’s monorepo. Everything is path-indexed, but things under a “g3doc” dir are rendered into web pages. Searching the repo returns results for code and docs.

Outside of Google, I think GitHub’s rendering of Markdown content is comparable.

Project governance

I was recently looking for an organizational pattern to 1) help design documents gain visibility, and 2) build a community of senior engineers. We have OWNERS files, but they specify lists of people for ease of maintenance, which complicates the task of finding an appropriate person to review a design proposal for affected code. Engineers often have informal conversations about design options, but there’s no body of expertise to query before an impersonal inter-/intra-net search. I needed something in the middle.

This search made me aware of the Fuchsia project, and in particular, its use of the term “governance” for the type of patterns I was looking for. In short: an “eng council” provides “a small group of senior technical leaders responsible for providing a coherent technical vision”; a Request For Comments (RFC) process provides “a consistent and transparent path for making project-wide, technical decisions”; an API council provides “a group of people who are accountable for the quality and long-term health of the Fuchsia API Surface. The council will collaborate constructively with the people who create and modify Fuchsia’s APIs to help guide the evolution of those APIs.” The Fuchsia project recently revised its governance model as part of opening the project for external contributions.

Google has an AIP process, which is like RFCs for APIs.

My team had an API Council, but that focused on the external API surface rather than internal technical decisions. The external focus and the fact it operated at the highest level required more structure than could be justified for internal discussions. It was helpful to see this council in the context of governance, but I still needed a new structure.

With this in mind, I proposed an eng council structure for the team. We identified ~10 people with several years of diverse experience on the team. We have a weekly meeting, which we cancel if there’s nothing on the agenda.

Interestingly, it now appears team members are better able to find reviewers outside the process, perhaps because the range of reviewers is now better known. I’ve also heard from eng management that the process has helped teammates have more confidence in their projects. Also of interest, this governing body seems to have a life of its own and needs to be cared for. After several weeks of empty agendas I proposed cancelling the process in favor of exploring options, but several teammates expressed appreciation for a weekly checkpoint, even if cancellation is the common outcome.