Software engineering notes

Archive for the ‘tool’ Category

Site reliability


Google’s SRE handbook, summarized in The Calculus of Service Availability, and the accompanying Art of SLOs workshop materials, are great. Here are a few things that stand out to me.


As defined in the chapter on Service Level Objectives:

  • SLI = Service Level Indicator. In other words, a specific metric to track. For example, the success rate of an endpoint. Have as few as possible, to simplify reasoning about service health.
  • SLO = Service Level Objective. A desired SLI value. For example, a 99% success rate.
  • SLA = Service Level Agreement. A contractual agreement between a customer and service provider defining compensation if SLOs are not met. Most free products do not need SLAs.
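To make the definitions concrete, here is a minimal sketch (the function and variable names are mine, not the handbook's): the SLI is computed from raw counts, then compared against the SLO.

```javascript
// SLI: a specific metric to track, eg the success rate of an endpoint
function successRate(successCount, totalCount) {
  return totalCount === 0 ? 1 : successCount / totalCount
}

// SLO: a desired SLI value, eg a 99% success rate
const slo = 0.99

const sli = successRate(9950, 10000) // 0.995
console.log(sli >= slo ? 'SLO met' : 'SLO missed') // prints "SLO met"
```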

The “Indicators in Practice” section of the SLO chapter provides some helpful guidelines about what to measure:

  • User-facing serving systems → availability, latency, and throughput
  • Storage systems → latency, availability, and durability
  • Big data systems → throughput and end-to-end latency

In the spirit of less is more, note that each domain has only 2-3 SLIs.

Reasonable SLOs

Naively, I’d think a perfect SLO would be something like 100% availability, but the “Embracing Risk” chapter clarifies that all changes have costs. Striving for 100% availability would constrain all development to the point where the business might fail for lack of responsiveness to customers’ feature requests, or because it spent all its money on monitoring.

Additionally, customers might not notice the difference between 99% and 100%. For example, “a user on a 99% reliable smartphone cannot tell the difference between 99.99% and 99.999% service reliability!”

Related, a dependency’s SLOs can also provide a guideline. For example, if my service depends on a service with 99% availability, I can forget about 100% availability for my service.
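The dependency argument can be sketched numerically. Assuming serial (critical) dependencies, a service's availability is bounded above by the product of its own availability and each dependency's availability; the function name below is my own.

```javascript
// With serial dependencies, a service can do no better than the product of
// its own availability and each dependency's availability.
function compositeAvailability(...availabilities) {
  return availabilities.reduce((product, a) => product * a, 1)
}

// eg a service that is itself 99.9% available, on a 99% dependency:
const best = compositeAvailability(0.999, 0.99) // ≈ 0.989, so no 100% SLO
```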

I find downtime calculations (eg allowed downtime per year) helpful for reasoning about appropriate SLOs:

  • 90% = 36d/yr
  • 99% = 3d/yr
  • 99.9% = 8h/yr
  • 99.99% = 52m/yr
  • 99.999% = 5m/yr

The Art of SLOs participant handbook has an “outage math” section that provides similar data, broken down by year, quarter and 28 days.
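The outage math above reduces to a small calculation (the helper below is my own, not from the handbook):

```javascript
// Allowed downtime (in minutes) for an availability SLO over a window
function allowedDowntimeMinutes(availability, windowDays = 365) {
  return (1 - availability) * windowDays * 24 * 60
}

allowedDowntimeMinutes(0.99)        // ≈ 5256 min/yr, ie ~3.65 days
allowedDowntimeMinutes(0.9999)      // ≈ 52.6 min/yr
allowedDowntimeMinutes(0.9999, 28)  // ≈ 4 min per 28 days
```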

So, if our strategy is to page a person and have them mitigate any issue within an hour, we might consider a 99.99% SLO. A five-minute mitigation requirement is outside human ability, so our strategy should include something like canary automation. In this context, a small project involving a couple of people on a limited budget should probably consider a goal of 90% or 99% availability.

I found it helpful to walk through an example scenario. The Art of SLOs participant handbook provides several example “user journeys”. For example, I work on a free API developers consume in their apps. This fits the general description of “user-facing serving systems”, so availability, latency, and throughput are likely SLIs. Of these, most support requests concern availability and latency, so in the spirit of less is more, I’d focus on those.

I have an oncall rotation, pager automation (eg PagerDuty), and canary automation, but I’m also building on a service with 99% availability.

We can reasonably respond to pages in 30 minutes, and fail out of problematic regions within 30 minutes after that, but we also have occasional capacity issues which can take a few hours to resolve.

So, 99% seems like a reasonable availability SLO.

A latency SLI seems more straightforward to me, perhaps because it can be directly measured in a running system. One guideline that comes to mind is the perception of immediacy for events that take less than 100ms.
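As a sketch of how such an SLI might be computed (the nearest-rank percentile function and the 100ms threshold below are my own illustration, not from the handbook):

```javascript
// Nearest-rank percentile over a sample of request latencies (ms)
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b)
  const idx = Math.ceil((p / 100) * sorted.length) - 1
  return sorted[Math.max(0, idx)]
}

const latenciesMs = [12, 40, 45, 60, 80, 95, 110, 150, 400, 900]
const p90 = percentile(latenciesMs, 90) // 400

// SLO sketch: 90% of requests complete within 100ms
const sloMet = p90 <= 100 // false: the tail is too slow
```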

Written by Erik

December 21, 2019 at 10:35 am

Posted in book, tool

A nice data store


I don’t have a lot of experience with data stores, but I recently met one that seems nice and simple.

The store benefits from a few other abstractions:

  1. a service that just ingests and persists client events
  2. a query abstraction, like Hive
  3. trustworthy authentication and list membership infra

Given these, the store in question simplifies the process of utilizing data by abstracting a few common requirements:

  1. a simple config DSL specifies which query to run, the frequency to run it, the output table, deletion conditions, etc. Specifying config via files enables use of common source control tools.
  2. three predefined processing stages (raw-to-normalized, normalized-to-problem-specific, problem-specific-to-view-specific). New event sources, aggregations and views can be independently defined by adding new config files.
  3. common styling and libraries for data visualization
  4. access is generalized to a few tiers of increasing restriction, eg team, division, company. The lowest level might be freely granted to teams for their own business intelligence, and the highest level restricted to executives for making revenue-specific decisions.
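Assuming a DSL along the lines described above, a config file for one derived table might look something like this; every field name here is invented for illustration, not taken from the actual system.

```javascript
// Hypothetical pipeline config, one file per derived table, kept in
// source control like any other code
const config = {
  query: 'queries/daily_active_users.sql',   // which query to run
  schedule: 'every 24h',                     // frequency to run it
  outputTable: 'metrics.daily_active_users', // output table
  deleteAfterDays: 90,                       // deletion conditions
  stage: 'normalized-to-problem-specific',   // one of the three stages
  accessTier: 'team'                         // lowest of the access tiers
}
```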

In retrospect, this seems pretty straightforward. I’m remembering a tool from another team (basically Rails + D3) that had the same goals, but didn’t have the query, scheduling or ACL abstractions underneath. It was replaced by an external tool that was terrible to the point of being unusable, but more secure. Eventually, we dumped normalized data in a columnar store that was also secure and easier to use for our team’s business intelligence, but would’ve been insufficient for things like periodically updating charts. I guess it’s the combination of data store features and supporting infra that makes the magic happen.

Written by Erik

October 14, 2019 at 8:52 pm

Posted in pattern, tool

Praise for a task tracker 📉


I just wrapped up a three month project in a new tech stack. From the beginning to the end, I was regularly asked: “Will this be done in time?” I was new to the tech, so it was difficult to estimate. The best I could do was document tasks as they revealed themselves, and then point at the percentage of tasks complete as a measure of progress.

For the first couple months, I added more tasks than I closed. The scope of the project seemed like it would grow unbounded, but then it leveled off. Today, I closed the last task. Amazing, and it wasn’t that hard, because we at least had an agreed-upon task tracker. I just needed to feed it.

Some features I like:

  • Using the same system used for tracking bugs. This frees folks to use whatever tracker they prefer. One colleague prefers a simple spreadsheet of tasks in descending priority, for example. It also enabled folks to use familiar bug tracking tools, like subscribing to updates, assigning ownership, commenting, etc, to communicate about tasks
  • At least trying to present a burn-down chart, though it doesn’t yet know how to work with nested milestones. A burn-down chart is the most accurate tool I’ve found for estimation; even if we can’t estimate the completion time, we can say the job will be done when the number of tasks drops to zero, and easily follow along on the chart.
  • Organizing tasks in a variety of ways, notably, differentiating organization by project from organization by week
  • Organizing tasks by blocking relationship. In other words, enabling me to specify this task depends on that task. Identifying branch nodes in this tree as milestones is helpful for maintaining a sense of momentum
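The burn-down idea above can be sketched as a calculation over task open/close dates (my own representation, not the tracker's): the job is done when the open-task count drops to zero.

```javascript
// Count tasks still open at the end of each day, given open/close days
function burnDown(tasks, lastDay) {
  const series = []
  for (let day = 0; day <= lastDay; day++) {
    const open = tasks.filter(t =>
      t.opened <= day && (t.closed == null || t.closed > day)
    ).length
    series.push(open)
  }
  return series
}

const tasks = [
  {opened: 0, closed: 2},
  {opened: 0, closed: 5},
  {opened: 1, closed: 4},
  {opened: 3, closed: 5}  // scope grows before it levels off
]
burnDown(tasks, 5) // [2, 3, 2, 3, 2, 0]
```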

Some usage patterns I like:

  • Presenting the tracker in a periodic (agile, 4DX) cadence meeting and using it to structure discussion. Avoiding tracker maintenance was helpful for maintaining meeting pace
  • Agreeing on one approach to tracking helped focus attention and minimize maintenance cost, though an alternative popped in and out mid-project, and the project launch used a different tracker

Written by Erik

September 25, 2019 at 10:01 pm

Posted in org, tool



The joy of top-down rendering.


I want to present data, ideally as view = render(data).


I really like the view mechanics provided by choo/yo-yo/bel.

const html = require('bel')
const nanobus = require('nanobus')
const yo = require('yo-yo')

const bus = nanobus()
const render = yo.update.bind(yo, document.body)
const emit = bus.emit.bind(bus)

bus.on('change', (name) => {
  const state = {name: name.toUpperCase()}
  render(view(state, emit))
})

function view(state, emit){
  function onKeyUp(e){
    emit('change', e.target.value)
  }
  return html`<body>
    Hello, <input value="${state.name}" placeholder="name" onkeyup=${onKeyUp}>
  </body>`
}

render(view({name: ''}, emit))

Written by Erik

October 4, 2017 at 6:13 pm

Posted in pattern, tool


Object path



I want to reduce conditional assignment when setting nested keys in an object, ideally:

set(obj, 'a/b/c', value) // obj --> {a: {b: {c: value}}}

This is handy for data manipulation and abstracting path-based tools like LevelDB and Firebase Realtime Database.


Use object-path or lodash’s set/get.

Note: the tools mentioned above interpret numeric path segments as array indices, which may cause unexpected results when inserting arbitrary values, eg:

set(store, '', 'Kwan') // store.users.length --> 6

If this is an issue, consider:

function set(obj, path, val){
  path.split('/').reduce((parent, key, i, keys) => {
    if (i === keys.length - 1) {
      parent[key] = val
    } else if (typeof parent[key] !== 'object' || parent[key] === null) {
      parent[key] = {}
    }
    return parent[key]
  }, obj)
}

function get(obj, path){
  return path.split('/').reduce((parent, key) => {
    return parent !== null && typeof parent === 'object' ? parent[key] : undefined
  }, obj)
}


Inverting an object:

const posts = {1: {tags: {sports: true, news: true}}, 2: {tags: {news: true}}}
const byTag = {}
Object.entries(posts).forEach(([id, post]) => {
  Object.keys(post.tags).forEach(tag => {
    set(byTag, `${tag}/${id}`, true)
  })
})
// byTag --> { sports: { '1': true }, news: { '1': true, '2': true } }

Creating and querying a prefix tree:

const flatten = require('flat')

// populate tree
const emojis = {
  '🙂': 'smile',
  '😀': 'grinning',
  '😁': 'grin'
}
const tree = {}
Object.entries(emojis).forEach(([emoji, name]) => {
  let path = name.split('').join('/') + '/' + emoji
  set(tree, path, true)
})

// lookup prefix
const prefix = 'g'
const path = prefix.split('').join('/')
const subtree = get(tree, path) || {}
const matches = Object.entries(flatten(subtree)).map(([key, val]) => {
  return key.slice(-2)
})
console.log(matches) // --> ['😀', '😁']

Written by Erik

October 3, 2017 at 6:01 pm

Posted in pattern, tool


Adventures in blogging



I like the idea of shared learning and working together as a team, and in a way that’s independent of employment.

The desire for independence takes a couple forms:

  1. Minimizing loss of work due to employment boundaries. For example, learning documented on an internal blog may be lost when changing roles.
  2. Maximizing opportunities that would otherwise be constrained by company goals. For example, exploring professional growth areas that may never be a company priority.

Prakhar pointed me at Nathan Marz’s post on blogging, which provides excellent motivation specific to blogging.

Megha shared her experience at Write/Speak/Code, which recommends professional writing as an essential aspect of career development. I like Write/Speak/Code’s recommendation to seek feedback from trusted sources. Inline comments and explicit comment permission, like in Google Docs, would be ideal.

(Write/Speak/Code also brought Open Source Misfeasance to my awareness, esp the slide “open source is like being an adult – it seems magical until you realize nobody knows what the hell they’re doing.” 🙂)

I’m inspired by the meta-knowledge community on Github.

A New Yorker article on Jonathan Ledgard mentions he “carried two notebooks—red for his reporting notes, blue for thoughts and observations to use in fiction…”

Current approach

A simple note app (currently Google Keep) to capture and access thoughts with minimal fuss.

WordPress for blog hosting. It feels relatively old-school, but it’s stable, feature-rich and has a large community. Signal v Noise, Joel Spolsky and Bill Gates use it, so I’m in good company. A few features I appreciate in particular:

  • Full-text search, which is especially useful for finding and updating posts
  • Navigation by tag and category
  • A convenient native app
  • Easy navigation from viewing to editing
  • Basic stats and “likes” aren’t the reason I’m writing, but are reassuring

I separate professional from personal content to facilitate using the content at work, hence the desire for multi-account support.

Alternatives explored

  • Medium is clean and modern, but the monetization strategy of charging readers seems relatively high-friction, and it doesn’t have multi-account support
  • is clean, and has a sustainable business model and an interesting tie-in to content federation, but it’s missing comments, search, etc
  • Github Pages is simple, but relatively inconvenient for frequent content management
  • Forestry CMS to manage Github Pages is an improvement, but I still want for search, comments, etc
  • Siteleaf is good, but doesn’t provide preview in free mode, and renames files according to date front-matter (after import)
  • I tried in the past, but saving content was flaky
  • jekyll-admin seems interesting if/when it’s supported by Github Pages

Written by Erik

September 27, 2017 at 4:53 pm

Posted in tool


Source control



I want to snapshot incremental progress and designate a working version.

I want a tool that’s widely available and easy to reason about.


Use Git. [1]

Use a single repo [2] until it’s unwieldy [3].

[1]: Some folks (Facebook, Google) argue Mercurial is preferable because it provides an extensible API, but in my experience, Git has been sufficient for non-trivial work with tens of engineers.

[2]: Looking forward to things like GVFS for scaling.

[3]: Managing multiple interdependent repos has a cost, which is why I tend toward a monorepo, but I’ve also worked with a monorepo that took ages to sync. I’d be curious to see someone experiment with tools for managing several repos as an explicit alternative to a monorepo.

Written by Erik

September 19, 2017 at 4:13 pm

Posted in tool
