Launch stages

Agreeing to incremental launch stages and the criteria that govern promotion from one stage to the next makes launch commitments, aka deadlines, less risky. For example, Google SRE’s launch plan documentation describes four: Early Access Preview, Alpha, Beta and General Availability (GA). Committing to a GA launch is easier after a successful Beta, which is easier after a successful Alpha, and so on.

Launch stages are comparable to a deployment pipeline: ideally, promotion from one stage to the next concerns stability rather than features. In other words, if the difference between Alpha and Beta is a length of time without issues, we can promote with confidence after that time, but if the difference is a set of new features, we don’t know how those features will perform.
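
As a concrete illustration of stability-gated promotion, here’s a minimal sketch; the stage names, fields and 14-day threshold are my own, not Google SRE’s:

```python
from dataclasses import dataclass

# Ordered launch stages (illustrative names).
STAGES = ["early_access_preview", "alpha", "beta", "ga"]

@dataclass
class StageStatus:
    stage: str
    days_in_stage: int
    open_sev_issues: int

def can_promote(status: StageStatus, required_quiet_days: int = 14) -> bool:
    """Gate promotion on stability, not features (criteria are made up)."""
    if status.stage == STAGES[-1]:
        return False  # already GA; nowhere to promote to
    return status.open_sev_issues == 0 and status.days_in_stage >= required_quiet_days

print(can_promote(StageStatus("beta", days_in_stage=21, open_sev_issues=0)))  # True
```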

Control vs data planes

I recently became aware of a helpful dichotomy: control vs data plane. The former governs how the latter should be delivered.

I believe these terms come from the world of networking, but they’re now entering the world of application engineering via DevOps.

For example, I work on a product that delivers targeted configuration to apps. In this context, the targeting logic is the control plane, and the resulting configuration is the data plane. By contrast, the RESTful perspective would describe both as resources.

With this dichotomy in mind, I can look for other patterns that might apply. In particular, the best practice of a declarative control plane has been helpful lately. As Azure’s introduction to Infrastructure as Code states, the goal is to specify “what an environment requires and not necessarily the how.” Collocating control with code simplifies reasoning and minimizes the cost of switching between application and infrastructure logic, similar to the benefits of collocating documentation with code.
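
To make “what, not how” concrete, here’s a toy sketch; the rule format and field names are my own, not the product’s. The control plane is a declarative list of targeting rules, and a generic resolver handles delivery:

```python
# Toy example: the control plane is a declarative list of targeting
# rules (the "what"); a generic resolver delivers configuration (the
# "how"). Rule format and field names are hypothetical.
RULES = [
    # (attributes to match, configuration to deliver)
    ({"country": "US"}, {"feature_x_enabled": True}),
    ({}, {"feature_x_enabled": False}),  # empty match = default
]

def resolve_config(app_attrs: dict) -> dict:
    """First rule whose attributes all match the app's attributes wins."""
    for match, config in RULES:
        if all(app_attrs.get(k) == v for k, v in match.items()):
            return config
    return {}

print(resolve_config({"country": "US"}))  # {'feature_x_enabled': True}
print(resolve_config({"country": "DE"}))  # {'feature_x_enabled': False}
```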

Rotating cadence lead

Rotating leadership for recurring team “cadence” meetings is a beneficial pattern I’ve seen on several teams.

“Cadence” meetings help a team or project march to a rhythm. All teams I’ve been on have had them, but not all have called them “cadence”. In agile terms, a daily standup is one form. A form I like is Monday kickoff, Wednesday discussion, Friday retro.

The pattern is simple:

  1. As a team, define the responsibilities of the lead, eg keeping the meeting focused on the agenda, cancelling the meeting if there’s nothing on the agenda, etc
  2. Identify the consistent attendees of the meeting and rotate the lead role among them (one deterministic way to pick the lead is sketched below)
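
For example, a minimal sketch of a calendar-driven rotation (attendee names are made up):

```python
import datetime

# Consistent attendees of the cadence meeting (hypothetical names).
ATTENDEES = ["asha", "ben", "carol", "deepak"]

def lead_for(date: datetime.date) -> str:
    """The ISO week number deterministically picks this week's lead."""
    week = date.isocalendar()[1]
    return ATTENDEES[week % len(ATTENDEES)]

print(lead_for(datetime.date.today()))
```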

Such a rotation has a few benefits:

  1. There’s no single point of failure for keeping the team organized
  2. All members of the team get leadership experience, and no single person is stuck with this form of glue work
  3. Sharing roles engenders empathy between roles. For example, experience motivating participation as a lead can encourage participation as a non-lead. As opposed to “taxation without representation” 🙂

A couple anti-patterns I’ve seen:

  1. Non-overlapping leads and attendees. For example, having the eng oncall rotation also lead a cadence including EMs, PMs, designers, etc who aren’t on the oncall rotation
  2. Having the rotation include optional attendees, which can result in last-minute adjustments

I’m also curious about rotating team leads, eg as part of the Engineer/Manager Pendulum, but I don’t have experience with that yet.

Bugs vs tasks vs goals

I am aware of three common approaches for tracking work:

  1. Bugs, eg Github issues, JIRA tickets, etc
  2. Tasks, eg items in a list of things to do
  3. Goals, eg some end state

There are probably many more, but I commonly see teams struggle to reconcile these three.

Part of the challenge is that they’re all related and each is required in some context, but none alone is sufficient. Bugs are required because people external to a team need a way to request work. Further, some of this work is essential, so bugs can’t be ignored. Bugs generally represent unplanned work.

Tasks are required because we need a way to deconstruct large projects into more manageable pieces. Tasks generally represent planned work.

Goals are required to separate implementation details from an objective. One of my colleagues phrased it well: setting goals shouldn’t be controversial.

Sometimes tasks can be represented as bugs, but bugs by nature are relatively formal, which breaks down when tasks change at a high rate. Some teams strive for tasks that take no more than one iteration, which is tedious to represent as bugs. I like the pattern of stating goals for the week, and reviewing progress against those goals at the end of the week, but this is tedious to represent as bugs or tasks.

My fantasy is something like:

  1. Monday kickoff and Friday review focused on a simple, written (so we can remember on Fri) list of goals for the week
  2. A support rotation monitors bugs. Active work on bugs is represented as goals for the week
  3. Large projects have independent task tracking. Active work on tasks is represented as goals for the week

Whether my goal for the week concerns planned or unplanned work matters less than communicating to the team what I’m working on and how it contributes to the team’s priorities.

Praise for Markdown eng docs

Google has a technical documentation system called “g3doc”. The presentation “The Knowledge: Towards a Culture of Engineering Documentation” at SRECon16 described it well, so this post just highlights a few details:

  1. Documentation is collocated with code
  2. Documentation is rendered from code-like Markdown

The first point enables me to include documentation changes and code changes in the same commit.

The second point is appealing because it reduces the cost of context switching between code and documentation. For example, I can edit both in the same editor.

I think part of the appeal is Google’s monorepo. Everything is path-indexed, but things under a “g3doc” dir are rendered into web pages. Searching the repo returns results for code and docs.
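
A toy sketch of that pipeline, assuming the third-party markdown package (g3doc’s actual implementation is surely more involved): render every Markdown file under a g3doc directory to HTML alongside the code.

```python
import pathlib

import markdown  # assumed dependency: pip install markdown

def render_docs(repo_root: str, out_root: str) -> None:
    """Render every Markdown file under any g3doc/ dir to HTML."""
    for md_file in pathlib.Path(repo_root).rglob("g3doc/**/*.md"):
        html = markdown.markdown(md_file.read_text())
        out_path = (pathlib.Path(out_root)
                    / md_file.relative_to(repo_root).with_suffix(".html"))
        out_path.parent.mkdir(parents=True, exist_ok=True)
        out_path.write_text(html)

render_docs(".", "rendered_docs")
```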

Outside of Google, I think Github’s rendering of Markdown content is comparable.

Project governance

I was recently looking for an organizational pattern to 1) help design documents gain visibility, and 2) build a community of senior engineers. We have OWNERS files, but they specify lists of people for ease of maintenance, which complicates the task of finding an appropriate person to review a design proposal for affected code. Engineers often have informal conversations about design options, but there’s no body of expertise to query short of an impersonal internet or intranet search. I needed something in the middle.

This search made me aware of the Fuchsia project, and in particular, its use of the phrase “governance” for the type of patterns I was looking for. In short: an “eng council” provides “a small group of senior technical leaders responsible for providing a coherent technical vision”; a Request For Comments (RFC) process provides “a consistent and transparent path for making project-wide, technical decisions”; an API council provides “a group of people who are accountable for the quality and long-term health of the Fuchsia API Surface. The council will collaborate constructively with the people who create and modify Fuchsia’s APIs to help guide the evolution of those APIs.” The Fuchsia project recently revised its governance model as part of opening the project for external contributions.

Google has an AIP (API Improvement Proposals) process, which is like RFCs for APIs.

My team had an API Council, but it focused on the external API surface rather than internal technical decisions. The external focus, and the fact that it operated at the highest level, required more structure than could be justified for internal discussions. It was helpful to see this council in the context of governance, but I still needed a new structure.

With this in mind, I proposed an eng council structure for the team. We identified ~10 people with several years of diverse experience on the team. We have a weekly meeting, which we cancel if there’s nothing on the agenda.

Interestingly, it now appears team members are better able to find reviewers outside the process, perhaps because the range of reviewers is now better known. I’ve also heard from eng management that the process has helped teammates have more confidence in their projects. Also of interest, this governing body seems to have a life of its own and needs to be cared for. After several weeks of empty agendas I proposed cancelling the process in favor of exploring options, but several teammates expressed appreciation for a weekly checkpoint, even if cancellation is the common outcome.

Technical-organizational balance ⚖️

I recently started focusing on technical work after several months of organizational work and it’s been a lot of fun. I explicitly missed a couple things: being directly involved in a project, especially focusing deeply on a problem, and working closely with a small team.

A manager friend half-jokingly described the switch as career-limiting. He also joked we’re not paid to work only on things we enjoy. I can see his point, but there must be a balance. We are paid to do things well, and I think that’s difficult when we don’t enjoy what we’re doing, at least in part.

This recent switch has me thinking of “The Engineer/Manager Pendulum” and the follow-up “Engineering Management: The Pendulum Or The Ladder”. All the quotes below are from these essays.

“If management isn’t a promotion, then returning to hands-on work isn’t a demotion, either.”

I prefer the term “organizational” to “management” for the non-technical work I do because most people think of the latter as people management. I wasn’t a people manager, but I was focused on project management, spent most of my time in meetings and learned to avoid any technical work in the critical path because I had no uninterrupted time to focus on it.

“A tech lead is a manager … but their first priority is achieving the task at hand, not grooming and minding the humans who work on it.”

The author provides appropriate advice:

“Stop writing code and engineering in the critical path”

The author mentions skill erosion after two years, but I experienced something similar after just a few months. Perhaps because I needed to make room in my head for a diversity of projects, I lost the context to go deep on any one of them. My activities were described as “leadership”, but I felt like those more directly involved were actually leading in a technical sense. I can see a need for leadership that stays out of the weeds, eg to avoid the sunk cost fallacy, but my role felt like an awkward middle-ground.

I think of this dichotomy as “technical” vs “organizational”. Both are important, but difficult to do well at the same time.

“Management is highly interruptive, and great engineering — where you’re learning things — requires blocking out interruptions. You can’t do these two opposite things at once.”

“Maker’s Schedule, Manager’s Schedule” is an essay I think of often on that topic.

Anyway, I think this feeling of fun is positive feedback that it was time for the pendulum to swing from organizational work back to technical work.

“… you can feel your skills getting rusty and your effectiveness dwindling. You owe it to yourself to figure out what makes you happy and build a portfolio of experiences that liberate you to do what you love.”

I find the phrase “career growth” often refers to increased prestige, rather than fulfillment.

“Try to resist the default narratives about promotions and titles and roles, they have nothing to do with what satisfies your soul.”

Impact

A professional koan: of the projects I can work on, which has the most impact for the business?

Thinking about this highlights the importance of a prioritized backlog. If impact is a component of the prioritization, then the highest-impact project is simply whatever sits at the top of the backlog.

A senior manager recently commented on some faulty advice they’d heard about service development being more important than client development. They clarified that impact determines importance. We should be working on projects with impact.

“The Secret to Growing Your Engineering Career If You Don’t Want to Manage” makes several good points.

“Many engineers become managers because management provides an obvious and well-defined leverage point to scale your impact.”

It’s relatively easy to produce more if I can delegate work out to a team.

“The less conventional paths outside of management require more creativity, and there are fewer available narratives of successful engineers outside of management for us to model ourselves after.”

I can attest to this, and explore it further in “Technical-organizational balance”, but I find it surprising given the idea of parallel tech and management tracks is ostensibly common practice.

“Your ability to decide where to spend your efforts to maximize your impact — what code to write, what software to build, and which business problems to tackle … You identify and solve problems that are core to the business, or you enable those around you to more effectively solve those core business problems.”

Hopefully our ability to identify projects with high impact improves over time.

“How to Grow as an Engineer (Working Remotely)” also touches on impact.

“It’s also not enough to just solve any problems. You need to be solving the right ones. You should constantly make sure there’s alignment between what you want and what the business needs…”

MLCC: Neural Networks

I am working through Google’s Machine Learning Crash Course. The notes in this post cover the “Neural Networks” module.

Does “deep learning” imply neural networks?

The introductory video refers to “deep neural networks”, so I’m wondering what the relationship is between deep learning and neural networks.

Yes, according to Quora’s “Does deep learning always mean neural network or can include other ML techniques?”.

“To give you some context, modern Convolutional Networks contain on orders of 100 million parameters and are usually made up of approximately 10-20 layers (hence deep learning)” – https://cs231n.github.io/neural-networks-1/

“Deep Learning is simply a subset of the architectures (or templates) that employs ‘neural networks’” – https://towardsdatascience.com/intuitive-deep-learning-part-1a-introduction-to-neural-networks-aaeb3a1500df (TDS)

“Deep learning” in Google’s glossary links to “deep model”: “A type of neural network containing multiple hidden layers.”

“However, until 2006 we didn’t know how to train neural networks to surpass more traditional approaches, except for a few specialized problems. What changed in 2006 was the discovery of techniques for learning in so-called deep neural networks.” – http://neuralnetworksanddeeplearning.com/about.html

Towards Data Science’s “Intuitive Deep Learning Part 1a: Introduction to Neural Networks” clarifies that “deep learning” is a subset of machine learning. I guess they’re both “learning”. I like the comparison of an algorithm to a recipe: in this context, ML optimizes a recipe, and deep learning is a subset of the optimization techniques.

When to use neural networks?

  • Small data with linear relationships → LSR
  • Large data with linear relationships → gradient descent
  • Large data with simple, nonlinear relationships → feature crosses
  • Large data with complex, nonlinear relationships → NN

“Neural nets will give us a way to learn nonlinear models without the use of explicit feature crosses” – https://developers.google.com/machine-learning/crash-course/introduction-to-neural-networks/playground-exercises

“Neural networks, a beautiful biologically-inspired programming paradigm which enables a computer to learn from observational data” – http://neuralnetworksanddeeplearning.com/index.html

NN “have the flexibility to model many complicated relationships between input and output” – https://towardsdatascience.com/intuitive-deep-learning-part-1a-introduction-to-neural-networks-aaeb3a1500df

“That’s not to say that neural networks aren’t good at solving simpler problems. They are. But so are many other algorithms. The complexity, resource-intensiveness and lack of interpretability in neural networks is sometimes a necessary evil, but it’s only warranted when simpler methods are inapplicable” – https://www.quora.com/What-kinds-of-machine-learning-problems-are-neural-networks-particularly-good-at-solving

Why are there multiple layers?

“each layer is effectively learning a more complex, higher-level function over the raw inputs” – https://developers.google.com/machine-learning/crash-course/introduction-to-neural-networks/anatomy

“A single-layer neural network can only be used to represent linearly separable functions … Most problems that we are interested in solving are not linearly separable.” – https://machinelearningmastery.com/how-to-configure-the-number-of-layers-and-nodes-in-a-neural-network/

The universal approximation theorem states that one hidden layer, given enough neurons, is sufficient to approximate any continuous function – https://machinelearningmastery.com/how-to-configure-the-number-of-layers-and-nodes-in-a-neural-network/

“How many hidden layers? Well if your data is linearly separable (which you often know by the time you begin coding a NN) then you don’t need any hidden layers at all. Of course, you don’t need an NN to resolve your data either, but it will still do the job.” – https://stats.stackexchange.com/questions/181/how-to-choose-the-number-of-hidden-layers-and-nodes-in-a-feedforward-neural-netw

“One hidden layer is sufficient for the large majority of problems.” – https://stats.stackexchange.com/questions/181/how-to-choose-the-number-of-hidden-layers-and-nodes-in-a-feedforward-neural-netw

“Even for those functions that can be learned via a sufficiently large one-hidden-layer MLP, it can be more efficient to learn it with two (or more) hidden layers” – https://machinelearningmastery.com/how-to-configure-the-number-of-layers-and-nodes-in-a-neural-network/

“Multi-layer” implies at least one hidden layer: “It has an input layer that connects to the input variables, one or more hidden layers” – https://machinelearningmastery.com/how-to-configure-the-number-of-layers-and-nodes-in-a-neural-network/

Chris Olah’s “Neural Networks, Manifolds, and Topology”, linked from the crash course, visualizes how data sets that intersect in n dimensions may be disjoint in n + 1 dimensions, which enables a linear solution. Other than that, though, Olah’s article was over my head. Articles like TDS are more my speed.

Why are some layers called “hidden”?

“The interior layers are sometimes called “hidden layers” because they are not directly observable from the systems inputs and outputs.” – https://machinelearningmastery.com/how-to-configure-the-number-of-layers-and-nodes-in-a-neural-network/

How many layers do I need?

Task 4 in the exercise recommends playing around with the hyperparameters to get a certain loss, but the combinatorial complexity makes me wonder if there’s an intuitive way to think about the role of layers and neurons. 🤔

“Regardless of the heuristics you might encounter, all answers will come back to the need for careful experimentation to see what works best for your specific dataset” – https://machinelearningmastery.com/how-to-configure-the-number-of-layers-and-nodes-in-a-neural-network/

“In sum, for most problems, one could probably get decent performance (even without a second optimization step) by setting the hidden layer configuration using just two rules: (i) number of hidden layers equals one; and (ii) the number of neurons in that layer is the mean of the neurons in the input and output layers.” – https://stats.stackexchange.com/questions/181/how-to-choose-the-number-of-hidden-layers-and-nodes-in-a-feedforward-neural-netw

“3 neurons are enough because the XOR function can be expressed as a combination of 3 half-planes (ReLU activation)” – https://developers.google.com/machine-learning/crash-course/introduction-to-neural-networks/playground-exercises. It seems narrowing the problem space to ReLU enables some deterministic optimization.
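
To convince myself, I worked out a tiny numpy version. Note this is my own construction, not the course’s: it uses two ReLU half-planes plus a linear output rather than the three neurons the quote describes.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def xor(x1, x2):
    h1 = relu(x1 + x2)      # half-plane: x1 + x2 > 0
    h2 = relu(x1 + x2 - 1)  # half-plane: x1 + x2 > 1
    return h1 - 2 * h2      # linear output combining the two

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor(a, b))  # prints 0, 1, 1, 0
```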

“The sigmoid and hyperbolic tangent activation functions cannot be used in networks with many layers due to the vanishing gradient problem” – https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/

“use as big of a neural network as your computational budget allows, and use other regularization techniques to control overfitting” – https://cs231n.github.io/neural-networks-1/#arch

“a model with 1 neuron in the first hidden layer cannot learn a good model no matter how deep it is. This is because the output of the first layer only varies along one dimension (usually a diagonal line), which isn’t enough to model this data set well” – https://developers.google.com/machine-learning/crash-course/introduction-to-neural-networks/playground-exercises

“A single layer with more than 3 neurons has more redundancy, and thus is more likely to converge to a good model” – https://developers.google.com/machine-learning/crash-course/introduction-to-neural-networks/playground-exercises

Two hidden layers with eight neurons in the first and two in the second performed well (~0.15 loss) on repeated runs.

Heuristics from spiral solution video:

  1. Tune the number of layers and nodes. Maxing out neurons in the first layer and tapering down over a couple of layers to the output is a reasonable start. Each neuron takes time to train, though, so reduce the total neuron count if training is too slow. This is reinforced by the practice exercise, which started with two layers of 20 and 12 neurons, and then tried to reduce the number of neurons while keeping loss stable.
  2. Reduce the learning rate to smooth the loss curve
  3. Add regularization to further smooth the loss curve
  4. Feature engineering helps with noisy data
  5. Try different activation functions. Ultimately, tanh had the best fit
  6. Iterate from step 1

Even after all this, tuning hyperparameters still seems combinatorially complex.
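
A quick back-of-the-envelope illustration of why it feels combinatorial (the knob values below are made up):

```python
import itertools

# A few plausible values per hyperparameter multiply quickly.
layers = [1, 2, 3]
neurons_per_layer = [2, 4, 8, 16]
learning_rates = [0.001, 0.01, 0.1]
activations = ["relu", "tanh", "sigmoid"]

combos = list(itertools.product(layers, neurons_per_layer, learning_rates, activations))
print(len(combos))  # 3 * 4 * 3 * 3 = 108 configurations to try
```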

Activation functions

A neural net consists of layers. Nodes in the bottom layer are linear equations. Nodes in a “hidden” layer transform a linear node into a non-linear node using an “activation function”. The crash course states “any mathematical function can serve as an activation function”.

A sigmoid is an example of an activation function. I remember from the module on logistic regression (notes) that we used a sigmoid to transform a linear equation into a probability.
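
Here’s a minimal sketch, mine rather than the course’s, of two common activation functions applied to the same linear outputs:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def relu(z):
    return np.maximum(z, 0)

z = np.linspace(-3, 3, 7)  # pretend these are outputs of linear nodes
print(sigmoid(z))          # squashed into (0, 1), usable as a probability
print(relu(z))             # negative values clipped to 0
```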

Why is it called a “neuron”?

The glossary definition for “neuron” is pretty good: 1) “taking in multiple input values and generating one output value”, and 2) “The neuron calculates the output value by applying an activation function.” Aside: this reminds me of lambda architecture. I appreciate TDS clarifying that neurons “often take some linear combination of the inputs”, like w1x1 + w2x2 + w3x3. I suppose this is what the glossary means by “a weighted sum of input values”.

TDS references a single image from the biological motivations section of Stanford’s CS231n, but I find both images from that section useful for comparison.

I like TDS’ definition: a “neural network” is “simply made out of layers of neurons, connected in a way that the input of one layer of neurons is the output of the previous layer of neurons”. In that context, the hidden layer diagrams from the crash course make sense.
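
Putting the pieces together, here’s a toy forward pass (shapes and weights are illustrative, not from any course exercise) where each layer’s output becomes the next layer’s input:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0)

x = rng.normal(size=3)             # 3 input features
W1 = rng.normal(size=(4, 3))       # hidden layer: 4 neurons, 3 inputs each
W2 = rng.normal(size=(1, 4))       # output layer: 1 neuron, 4 inputs

hidden = relu(W1 @ x)              # each neuron: activation(weighted sum)
output = W2 @ hidden               # output layer consumes the hidden output
print(hidden.shape, output.shape)  # (4,) (1,)
```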

Working agreement

I’ve had a good experience with something called a “working agreement” in the last couple teams I’ve worked on.

The main value I’ve seen is in the discussion, when everyone can have a voice regarding how a high-performing team operates. The process can help a new team gel. The resulting artifact provides a third benefit: resolving operational details for future reference. For example, rather than debate an issue anew, we can just reference the agreement, which all parties had a role in shaping.

I’ve found a scaffolding helpful for structuring the conversation:

  • Core values
  • Expectations
  • Norms
  • Agreements

Core values are simple ideals and can be aspirational. For example, “we provide each other psychological safety”, or “we do our best work together.” An artificial constraint of, say, five values can help motivate discussion, and improve accessibility of the resulting list.

Expectations provide an opportunity to state explicitly what we might be assuming. For example, “I assume best intentions” or “I assume folks have the big picture in mind, even for small changes.”

Norms provide an opportunity to express preferences. For example, “(given designing, writing and reviewing code requires focused attention) I benefit from no-meeting blocks”, or “I read emails, but aggressively purge (so I don’t mind a follow-up a day later).” Note that norms are becoming more operational.

Agreements build on and finalize the preceding sections. For example, “(To work efficiently and respectfully) we’ll respond to code review requests within a day, or communicate otherwise” or “We’ll use a single email address for the team, differentiated by suffix (so it’s easy to find all emails, but it’s also possible to filter).”

It can take some time to build rapport, especially if the team is new, so budget an hour for discussion, and likely an additional hour on another day to finalize. It’s helpful to share the scaffolding in advance, so folks can start adding ideas. Having a different person lead development of each section improves participation.

These teams have had a stated goal of reviewing the agreement periodically, but in practice I’ve found on-demand review is sufficient.