Software engineering notes

Archive for the ‘book’ Category

The Manager’s Path

leave a comment »

I’m an individual contributor, but I want to better understand management’s concerns, so I’m reading Camille Fournier’s excellent The Manager’s Path. These are my notes.

Many think a neutral relationship with management is good because at least it’s not negative, but there is such a thing as a positive relationship w mgmt.

1-1 mtngs:

  • two purposes:
    • Human connection
    • Private conversation, eg feedback
  • I agree w the above two, and would add a third:
    • To ensure time w otherwise busy ppl; the junior person has the priority
  • Not for status
  • Prepare an agenda. I like a living doc, linked from the mtng invite. Items can be added and referenced any time
  • “Regular 1-1s are like oil changes; if you skip them, plan to get stranded …”
  • “Try to keep notes in a shared document” 👍 I like to link an agenda doc from the mtng invite. (Same for most recurring mtngs.)

As you become more senior, feedback decreases.

Appreciate the fact that current peers turn into future jobs.

Uncertainty
– common every 5-10 yrs
– lots of uncertainty in the world
– ultimately, we have to rely on ourselves

People aren’t good at saying what they mean in a way others can understand, so we have to listen carefully to words, and non-verbal cues indicating the person feels understood.

“Be prepared to say anything complex a few times and in diferent ways.” I’ve found such repetition frustrating in the past. It’s validating to see this advice.

Effective teams have good onboarding documents. Have new hires update the docs as their initial contribution.

“What you measure, you improve.”

Beware alpha-geek tendencies. In particular, the tendency to lecture and debate.

Mentorship skills:
– keep an open mind, since the mentee brings fresh eyes
– listen and speak their language. If you can’t hear the question being asked, you can’t provide good answers
– use the mentorship to build your network

“Tech lead is not a job for the person who wants the freedom to focus deeply on the details of her own code.”

“… the tech lead role may be held by many different stages of engineer, and may be passed from one engineer to another without either person necessarily changing his functional job level.”

“… we know from the title that it is expected to be both a technical position and a leadership role.” In other words, it’s not necessarily superlative, ie TL != best.

“The tech lead is learning to be a strong technical project manager… and [is] learning how to handle difficult management and leadership situations”

“Realistically, it is very hard to grow past senior engineer 2 without ever having acted as a tech lead, even on the individual contributor track… people skills are what we’re asking the new tech lead to stretch, more than pure technical expertise.” This stands out to me because of the tension between manager and maker modes, to use Paul Graham’s terminology.

“Being a tech lead is an exercise in influencing without authority …” Including building a psychological skill set for managing associated stresses.

“From now on … balancing is likely to be on of your core challenges.”

Currently, it feels like I’m working two jobs, manager mode during the day, and maker mode in morning and evenings. Regular, project-specific “cadence” meetings have helped reduce ad hoc discussions, fwiw.

Ah, a few lines later: “Some days you’re on maker’s schedule, and some days your on manager’s schedule…It’s very difficult to get into the groove of writing code if you’re interrupted every hour by a meeting.”

“Part of your leadership is helping the other stakeholders … respect the team’s focus and set up meeting calendars that are not overwhelming for individual contributors.” I’m very happy to see this in a book about managing thought workers.

Main roles of tech lead

  • Architect and business analyst. Design the system enough to provide estimates and ensure requirements are met
  • Project planner. The goal is to maximize parallelization
  • Developer and lead. Write code, but not too much. The goal is the project (and team development), not individual tasks

“Sometimes tech leads are tempted to go to heroics and push through obstacles themselves… [but] you should communicate the obstacle first.” I can relate with the former and appreciate the actionable latter.

“Teams often fail because they overworked themselves on a feature their product manager would have been willing to compromise on.” So, communicate.

“… most managers will expect their tech leads to continue writing as much code as before … It’s generally a pure increase in responsibility …”

The goal of a project plan is a “degree of forethought, in places where you can reasonable make predictions and plan … The plan itself … is less important than the act of planning.”

Take time to explain. No one who’s not actively working on a project should be expected to immediately know and understand project details.

Do a premortem as part of project planning. How could the system fail, and what could we do to recover?

“Having the focus to build something big yourself is a distant memory.”

The agile principles can be a healthy alternative to rigid process 👍 I think they’re great.

“… no two great teams ever look exactly alike in process, tools or work style” The best thing I’ve seen is an appreciation of experimentation and iteratively building a style that works for the current team. A basic project plan, ie list of tasks, also seems like a universal business requirement. Put another way, revisiting that plan periodically seems like a reasonable, universal starting point.

Qualities of a great tech lead:

  • Understand the architecture
  • Help build, but involve others
  • Lead decisions, but do so collaboratively
  • Communicate

“You want to encourage others on your team to learn the entire system … but you don’t always need to be self-sacrificing” There’s the need for a sense of balance again.

“Your productivity is now less important than the productivity of the whole team.” But how to improve the productivity of the team without putting on a management hat? Fournier gives an example: “Represent the team in meetings.”

Possession of communication skills differentiates successful leaders.

“Practice repeating things back to people to ensure you understand them.” I like this! I think it pairs well w advice earlier in the book to listen and observe non-verbal cues.

Communicate and listen.

I’d add that the tech lead label can also make one a focal point for questions, eg support, which can disrupt focus work. I like the pattern of having a support rotation, but depending on the company, the convention may be to simply ping the TL.

“Respect the ‘maker schedule’ for reports” 👍 As a general rule, I appreciate biasing toward contiguous meeting blocks.

Autonomy … is an important element of motivation.” I see this w external contributions too. Maximizing an integrating team’s autonomy frees them to meet their goals w minimal bottlenecks.

Written by Erik

January 31, 2020 at 9:54 am

Posted in book, notes, org, Uncategorized

Site reliability

leave a comment »

Google’s SRE handbook, summarized in The Calculus of Service Availability, and the accompanying Art of SLOs workshop materials, are great. Here are a few things that stand out to me.

SLI vs SLO vs SLA

As defined in the chapter on Service Level Objectives:

  • SLI = Service Level Indicator. In other words, a specific metric to track. For example, the success rate of an endpoint. Have as few as possible, to simplify reasoning about service health.
  • SLO = Service Level Objective. A desired SLI value. For example, a 99% success rate.
  • SLA = Service Level Agreement. A contractual agreement between a customer and service provider defining compensation if SLOs are not met. Most free products do not need SLAs.

The “Indicators in Practice” section of the SLO chapter provides some helpful guidelines about what to measure:

  • User-facing serving systems –> availability, latency, and throughput
  • Storage systems –> latency, availability, and durability
  • Big data systems –> throughput and end-to-end latency

In the context of less is more, note each domain has 2-3 SLIs.

Reasonable SLOs

Naively, I’d think a perfect SLO would be something like 100% availability, but the “Embracing Risk” chapter clarifies all changes have costs. Striving for 100% availability would constrain all development to the point where the business might fail for lack of responsiveness to customer’s feature requests, or because it spent all its money on monitoring.

Additionally, customers might not notice the difference between 99% and 100%. For example, “a user on a 99% reliable smartphone cannot tell the difference between 99.99% and 99.999% service reliability!”

Related, a dependency’s SLOs can also provide a guideline. For example, if my service depends on a service with 99% availability, I can forget about 100% availability for my service.

I find downtime calculations (eg uptime.is) helpful for reasoning about appropriate SLOs:

  • 90% = 36d
  • 99% = 3d/yr
  • 99.9% = 8h/yr
  • 99.99% = 52m/yr
  • 99.999% = 5m/yr

The Art of SLOs participant handbook has an “outage math” section that provides similar data, broken down by year, quarter and 28 days.

So, if our strategy is to page a person and have them mitigate any issue in an hour, we might consider a 99.99% SLO. A 5 minute mitigation requirement is outside human ability, so our strategy should include something like canary automation. In this context, a small project involving a couple people on a limited budget should probably consider a goal of 90% or 99% availability.

I found it helpful to walk through an example scenario. The Art of SLOs participant handbook provides several example “user journeys”. For example, I work on a free API developers consume in their apps. This fits the general description of a “user-facing systems”, so availability, latency, and throughput are likely SLIs. Of these, most support requests concern availability and latency, so in the spirit of less is more, I’d focus on those.

I have an oncall rotation, pager automation (eg PagerDuty), and canary automation, but I’m also building on a service with a 99% availability.

We can reasonably respond to pages in 30 minutes, and fail out of problematic regions within 30 minutes after that, but we also have occasional capacity issues which can take a few hours to resolve.

So, 99% seems like a reasonable availability SLO.

A latency SLI seems more straightforward to me, perhaps because it can be directly measured in a running system. One guideline that comes to mind is the perception of immediacy for events that take less than 100ms.

Written by Erik

December 21, 2019 at 10:35 am

Posted in book, SRE

Deep Work by Cal Newport 📖

with one comment

I’ve found I need uninterrupted time to focus for software projects at work, and actually leave work unsatisfied if I’m unable to find this time. While investigating what this is all about, I came across Newport’s Deep Work.

The first half of the book makes a case for investing in focus time, especially for folks performing “knowledge work”, which can require workers to hold many details in mind to produce significant output. The second half of the book provides practical recommendations for how to set aside uninterrupted time.

Focus time

Like Paul Graham’s Maker’s Schedule, Manager’s Schedule, the first recommendation is to explicitly identify and reserve time for focused work, which I found validating.

In practice, I find a couple common sources of distractions:

  • random requests and discussions via email, chat, etc
  • random requests and discussions around my desk area

The book recommends avoiding interruptive electronic communications, especially social media, when trying to focus, and makes a case that most interruptions can wait.

Regarding in-person interruptions, the author mentions sequestering himself in a library. I might try something similar.

One challenge I see: focus requirements might be dependent on when as well as what. For example, the early days of a project might be meeting-heavy as requirements are clarified, and then focus-heavy as pieces are built.

My current experiment is to reserve 8-10 and 2-5 for focus work. During these times, I’ll sit away from my desk and ignore chat and email. This leaves a couple hours before and after lunch to catch up on electronic communication, have spontaneous discussions and provide a reasonable window for scheduling meetings.

The book cites research indicating our ability to work deeply maxes out around four hours, which also aligns with my experience. I previously experimented with reserving 1-5, but found it was too restrictive for natural interactions with colleagues, and I didn’t need the entire time anyway.

In an effort to provide “ownership” for my project, I took pains to be aware of all changes and support requests. This obviously doesn’t scale, but I appreciated reading a supportive quote from Tim Ferris:

Develop the habit of letting small bad things happen. If you don’t, you’ll never find time for the life-changing things.

Productivity

An interesting aspect of this book is it’s not simply presenting thoughts on the topic of deep work; it was motivated by the author’s need to optimize productivity. In this context, the book introduced me to 4DX, which is like agile boiled down to four points. As with agile, I find myself referencing it often.

In particular, I appreciate the emphasis on prioritization; during focused effort, we say “no” to many things so we can say “yes” to the most important things.

The book also mentions the idea of “results-driven reporting”, which defers meetings until there’s something new to report, as an alternative to status updates. This aligns with recommendations to prioritize requirements and blockers over status updates in cadence meetings from a recent talk on politically-charged projects.

Written by Erik

October 6, 2019 at 6:41 pm

Posted in book