Software engineering notes


A nice data store 🏪


I don’t have a lot of experience with data stores, but I recently met one that seems nice and simple.

The store benefits from a few other abstractions:

  1. a service that just ingests and persists client events
  2. a query abstraction, like Hive
  3. trustworthy authentication and list membership infra

Given these, the store in question makes data easier to use by abstracting a few common requirements:

  1. a simple config DSL specifies which query to run, the frequency to run it, the output table, deletion conditions, etc. Specifying config via files enables use of common source control tools.
  2. three predefined processing stages (raw-to-normalized, normalized-to-problem-specific, problem-specific-to-view-specific). New event sources, aggregations and views can be independently defined by adding new config files.
  3. common styling and libraries for data visualization
  4. access is generalized to a few tiers of increasing restriction, eg team, division, company. The lowest level might be freely granted to teams for their own business intelligence, and the highest level restricted to executives for making revenue-specific decisions.
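
To make the list above a bit more concrete, here is a minimal sketch of what one job config and a tier check could look like. It uses a plain JavaScript object rather than the store's actual DSL, and every name in it (`dailySignups`, `schedule`, `deleteAfterDays`, `normalized_events`, `TIERS`, `canRead`) is hypothetical.

```javascript
// Hypothetical job config: field names are illustrative, not the store's real DSL.
const dailySignups = {
  stage: 'normalized-to-problem-specific',
  query: 'SELECT day, COUNT(*) AS signups FROM normalized_events GROUP BY day',
  schedule: 'daily',
  outputTable: 'signups_by_day',
  deleteAfterDays: 90,
  tier: 'team'
}

// Access tiers ordered from least to most restricted.
const TIERS = ['team', 'division', 'company']

// A reader may access a table if their clearance is at least the table's tier.
function canRead(readerTier, tableTier) {
  const reader = TIERS.indexOf(readerTier)
  const table = TIERS.indexOf(tableTier)
  return reader !== -1 && table !== -1 && reader >= table
}

console.log(canRead('division', dailySignups.tier)) // true
console.log(canRead('team', 'company')) // false
```

Keeping the config as data is what lets it live in source control and be reviewed like any other change.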

In retrospect, this seems pretty straightforward. I’m remembering a tool from another team (basically Rails + D3) that had the same goals, but didn’t have the query, scheduling or ACL abstractions underneath. It was replaced by an external tool that was terrible to the point of being unusable, but more secure. Eventually, we dumped normalized data in a columnar store that was also secure and easier to use for our team’s business intelligence, but would’ve been insufficient for things like periodically updating charts. I guess it’s the combination of data store features and supporting infra that makes the magic happen.

Written by Erik

October 14, 2019 at 8:52 pm

Posted in pattern, tool

Entropy


A colleague once relayed to me someone else’s observation that every syntax variation allowed by a language will eventually appear in a code base. Resisting the process of breaking down into what’s possible requires energy. The idea that “naming things is hard” seems a variation of this. If I could remember the originator, I’d call it ___’s Law. In the meantime, I think “entropy” is the general form.

With its Greek prefix en-, meaning “within”, and the trop- root here meaning “change”, entropy basically means “change within (a closed system)”.

https://www.merriam-webster.com/dictionary/entropy

In this context, static analysis tools like linters help limit what’s possible.
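
As one illustration, an ESLint config can rule out some of the variation a language allows. This is only a sketch, not a recommended rule set: the three rules are ESLint core rules, but the selection and the `lintConfig` name are assumptions made for the example.

```javascript
// Sketch of an ESLint "rules" section; the selection is illustrative only.
const lintConfig = {
  rules: {
    eqeqeq: 'error',        // require === and !== instead of == and !=
    'no-var': 'error',      // require let/const instead of var
    curly: ['error', 'all'] // require braces around every block body
  }
}

console.log(Object.keys(lintConfig.rules).length) // 3
```

In a real project this object would be exported from the ESLint config file, so the limits are enforced by tooling rather than by reviewer energy.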

An organizational approach I’ve seen a couple times is to embrace the range of possibility. For example, given a camp in favor of Java and another in favor of Scala, a former team avoided endless debate by supporting both until there was an obvious reason not to. Another example is Google Cloud’s reconciliation of REST and gRPC:

All our Cloud APIs expose a simple JSON REST interface that you can call directly or via our client libraries. Some of our latest generation of APIs also provide an RPC interface that lets clients make calls to the API using gRPC: many of our client libraries use this to provide even better performance when you use these APIs

https://cloud.google.com/apis/docs/overview#multiple-surfaces-rest-and-grpc

Another organizational strategy David Poll brilliantly described: products will express the org structure that created them (Conway’s Law); we can expend energy resisting this, eg review processes, and/or we can create orgs in the shape of the products we intend.

Written by Erik

September 25, 2019 at 10:35 pm

Posted in org, pattern

Better together SDK pattern


I’m a fan of an SDK product pattern I’ve heard people call “better together”. The idea is for SDKs to be decoupled, but complementary.

An example is an SDK that needs telemetry. One approach would be to add telemetry to the SDK, but this has a few problems: bloat, opacity, redundancy and coupling. An app may already have a telemetry SDK installed, so bundling another with an unrelated SDK bloats the app. Data logged inside the SDK is opaque to the app, which also complicates any SDK billing story. If the SDK does want to export telemetry data, it will need to build telemetry-specific logic redundant to the app’s telemetry provider. Any telemetry logic built by the SDK is coupled to the SDK.

The better-together pattern provides an alternative. To continue with the example above, an SDK requiring telemetry could detect if a telemetry provider is installed and publish events to it. A simplistic example would be to let the app pass a telemetry provider to the SDK’s constructor, eg:

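class SDK {
  constructor(telemetry = null) {
    // optional provider, supplied by the app
    this.telemetry = telemetry;
  }
  // …
  sayHi() {
    // only publish if the app installed a provider
    if (this.telemetry) {
      this.telemetry.logEvent('said_hi');
    }
  }
}
// …
const telemetry = new Telemetry();
const sdk = new SDK(telemetry);
sdk.sayHi();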

With this approach telemetry is only included in the app if the app owner wants it, minimizing bloat. Telemetry from the SDK is visible alongside the app’s other telemetry. The SDK can focus on whatever it does best. Telemetry is reusable elsewhere in the app.

One potential downside with this pattern concerns differentiating “internal” use-cases. Continuing with the telemetry example, the SDK may want to log events that are unrelated to the app’s functionality. I’ve seen three approaches: don’t differentiate, differentiate throughout, or don’t use the better-together pattern. The first approach treated all data as belonging to the app and namespaced all events published by the SDK, which worked well. The second approach was expensive due to technical complexity and eventually discontinued. The third approach was expensive due to redundant staffing, infra, UX, etc, but necessary so long as some parties don’t buy into the better-together pattern. I guess this stresses the “together” part of better-together 🙂
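
The first approach, namespacing every event the SDK publishes, can be sketched as a thin wrapper around whatever provider the app supplies. The `namespaced` helper, the `acme_sdk` prefix and the in-memory `provider` are hypothetical; only the `logEvent` method name comes from the example above.

```javascript
// Wrap an app-supplied telemetry provider so every event carries a prefix.
function namespaced(provider, prefix) {
  return {
    logEvent(name) {
      provider.logEvent(`${prefix}.${name}`)
    }
  }
}

// Stand-in provider that records events in memory (for illustration only).
const logged = []
const provider = { logEvent: (name) => logged.push(name) }

const sdkTelemetry = namespaced(provider, 'acme_sdk')
sdkTelemetry.logEvent('said_hi')
provider.logEvent('checkout_started') // the app's own event, no prefix

console.log(logged) // [ 'acme_sdk.said_hi', 'checkout_started' ]
```

The app still owns all the data; the prefix just makes the SDK's events easy to filter in or out.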

Written by Erik

September 25, 2019 at 8:20 am

Posted in org, pattern

View


The joy of top-down rendering.

Problem

I want to present data, ideally as view = render(data).

Solution

I really like the view mechanics provided by choo/yo-yo/bel.

const html = require('bel')
const nanobus = require('nanobus')
const yo = require('yo-yo')

const bus = nanobus()
const render = yo.update.bind(yo, document.body)
const emit = bus.emit.bind(bus)

bus.on('change', (name) => {
  const state = {}
  state.name = name.toUpperCase()
  render(view(state, emit))
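})

// Nothing is drawn until the first 'change' event, so emit one for the initial render
emit('change', '')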

function view(state, emit){
  return html`
    <body>
      Hello, <input value="${state.name}" placeholder="name" onkeyup=${onKeyUp}>
    </body>
  `
  function onKeyUp(e){
    emit('change', e.target.value)
  }
}

Written by Erik

October 4, 2017 at 6:13 pm

Posted in pattern, tool


Object path


Problem

I want to reduce conditional assignment when setting nested keys in an object, ideally:

{a:{b:{c:value}}} = set(a/b/c, value)

This is handy for data manipulation and abstracting path-based tools like LevelDB and Firebase Realtime Database.

Solution

Use object-path or lodash’s set/get.

Note: the tools mentioned above interpret numeric path segments as array indices, which may cause unexpected results when inserting arbitrary values, eg:

set(store, 'users.5.name', 'Kwan') // store.users.length --> 6

If this is an issue, consider:

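function set(obj, path, val){
  path.split('/').reduce((parent, key, i, keys) => {
    if (i === keys.length - 1) {
      // last segment: always assign, even over an existing object
      parent[key] = val
    } else if (typeof parent[key] !== 'object' || parent[key] === null) {
      // intermediate segment: create (or replace a non-object with) a container
      parent[key] = {}
    }
    return parent[key]
  }, obj)
}
function get(obj, path){
  return path.split('/').reduce((parent, key) => {
    // typeof null is 'object', so guard against null explicitly
    return parent !== null && typeof parent === 'object' ? parent[key] : undefined
  }, obj)
}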

Examples

Inverting an object:

const posts = {1: {tags: {sports: true, news: true}}, 2: {tags: {news: true}}}
const byTag = {}
Object.entries(posts).forEach(([id, post]) => {
  Object.keys(post.tags).forEach(tag => {
    set(byTag, `${tag}/${id}`, true)
  })
})
// byTag --> { sports: { '1': true }, news: { '1': true, '2': true } }

Creating and querying a prefix tree:

const flatten = require('flat')

// populate tree
const emojis = {
  '🙂': 'smile',
  '😀': 'grinning',
  '😁': 'grin'
}
const tree = {}
Object.entries(emojis).forEach(([emoji, name]) => {
  let path = name.split('').join('/') + '/' + emoji
  set(tree, path, true)
})

// lookup prefix
const prefix = 'g'
const path = prefix.split('').join('/')
const subtree = get(tree, path) || {}
const matches = Object.entries(flatten(subtree)).map(([key, val]) => {
  return key.slice(-2)
})
console.log(matches) // --> ["😀", "😁"]

Written by Erik

October 3, 2017 at 6:01 pm

Posted in pattern, tool


Client-side stream processing


Solution

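Given a bus and store (Event and Subscriber are defined in the bus post below; Store and Value aren’t shown, but Value is presumably an event carrying the key and val of data coming back from the store):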

struct Post {
  let id: String
  var text: String
  var likeState: Bool
}
protocol State {}
struct RootState : State {
  var userId: String? = nil
  var posts: [String:Post] = [:]
}
protocol Renderable {
  func render(_ state: State)
}
struct PostsImpression: Event {}
struct LikeRequested: Event {
  let postId: String
  let likeState: Bool
}
class Reducer : Subscriber {
  let store: Store
  let controller: Renderable
  var state: RootState
  init(store: Store, controller: Renderable, state: RootState){
    self.store = store
    self.controller = controller
    self.state = state
  }
  func onEvent(event: Event){
    switch event {
    case _ as PostsImpression:
      store.get("posts/\(state.userId!)")
      store.get("likes/\(state.userId!)")
    case let event as LikeRequested:
      store.set("likes/\(state.userId!)/\(event.postId)", event.likeState)
    case let event as Value where event.key.hasPrefix("likes"):
      let postId = event.key.components(separatedBy: "/").last!
      let likeState = event.val as! Bool
      state.posts[postId]?.likeState = likeState
      controller.render(state)
    case let event as Value where event.key.hasPrefix("posts"):
      let post = Post(
        id: event.key.components(separatedBy: "/").last!,
        text: event.val as! String,
        likeState: false) 
      state.posts[post.id] = post
      controller.render(state)
    default:
      break
    }
  }
}

Context

Redux’s reducer inspired me to think about this. Kleppmann’s blog post on turning the database inside out inspired me to think about stream processing in general.

Problem

Consolidate event processing from UI and data streams.
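
For comparison, here is a rough JavaScript sketch of the same idea using Node's EventEmitter as the bus: UI events and values arriving from the store flow through the same bus and end up in one state object. The event names, the in-memory `store` and the `render` stub are hypothetical stand-ins, not a port of the Swift code above.

```javascript
const EventEmitter = require('events')

const bus = new EventEmitter()
const state = { userId: 'u1', posts: {} }
const rendered = []
const render = (s) => rendered.push(JSON.stringify(s)) // stub view layer

// Stand-in store: publishes a 'value' event for every write.
const store = {
  data: {},
  set(key, val) {
    this.data[key] = val
    bus.emit('value', { key, val })
  }
}

// UI stream: a tap on "like" becomes a write to the store.
bus.on('likeRequested', ({ postId, likeState }) => {
  store.set(`likes/${state.userId}/${postId}`, likeState)
})

// Data stream: values from the store update state and trigger a render.
bus.on('value', ({ key, val }) => {
  const id = key.split('/').pop()
  if (key.startsWith('posts/')) {
    state.posts[id] = { id, text: val, likeState: false }
  } else if (key.startsWith('likes/') && state.posts[id]) {
    state.posts[id].likeState = val
  } else {
    return // nothing changed, skip rendering
  }
  render(state)
})

store.set('posts/u1/p1', 'hello')
bus.emit('likeRequested', { postId: 'p1', likeState: true })
console.log(state.posts.p1) // { id: 'p1', text: 'hello', likeState: true }
```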

Written by Erik

August 16, 2017 at 3:53 pm

Posted in pattern


Praise for the humble bus 🚌


Context

This is a stream-of-consciousness gush for a pattern I like. I start by stating some things I like followed by a pattern that produces these things and then attempt to state the problem being solved (in case other folks like me appreciate a problem statement).

I’m a fan of the unidirectional event flow first brought to my attention by React/Redux. Prakhar mentioned this is also called the yo-yo pattern (events bubble up, views render down). yo-yo.js provides a delightfully simple implementation. choo completes the yo-yo pattern by building on yo-yo.js and injecting an event bus into the view renderer.

Slightly related, I’m also enamored of the notion of an append-only log, reverently described by Jay Kreps and Martin Kleppmann in The Log and Turning the database inside-out with Apache Samza, respectively. Kleppmann provides additional, wonderful context in Designing Data-Intensive Applications.

In my experience, event logging from a client can be tricky to maintain. A couple helpful patterns: enable stdout-logging close to the event source, and explicitly enumerate events.
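
Both patterns fit in a few lines. This sketch assumes a Node EventEmitter as the bus; the `Events` table and the `publish` helper are hypothetical names.

```javascript
const EventEmitter = require('events')

// Explicit enumeration: the only events this client may publish.
const Events = Object.freeze({
  LIKE: 'like',
  SHARE: 'share'
})
const known = new Set(Object.values(Events))

const bus = new EventEmitter()

// Single choke point close to the event source: validate, log, then publish.
function publish(name, payload = {}) {
  if (!known.has(name)) {
    throw new Error(`unknown event: ${name}`)
  }
  console.log(`event=${name} payload=${JSON.stringify(payload)}`)
  bus.emit(name, payload)
}

publish(Events.LIKE, { postId: 'p1' }) // logs: event=like payload={"postId":"p1"}
```

A typo such as `publish('lik')` fails loudly at the source instead of silently producing a new event name downstream.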

Solution

In this context, I’ve developed deep appreciation for the simple pubsub pattern, and the notion of an "event bus" through which published events flow to subscribers. Although busses and logs (and indices) frequently appear together, the bus seems most primitive.

This pattern is nothing new, but here’s a simplistic implementation I find easy to reason about:

protocol Event {}
struct LikeEvent : Event {}
protocol Subscriber {
  func onEvent(event: Event)
}
class StdoutSubscriber : Subscriber {
  func onEvent(event: Event) {
    print(event)
  }
}
class Bus {
  var subscribers: [String:Subscriber] = [:]
  func sub(_ subscriber: Subscriber){
    self.subscribers[key(subscriber)] = subscriber
  }
  func unsub(_ subscriber: Subscriber){
    self.subscribers[key(subscriber)] = nil
  }
  func pub(_ event: Event){
    for subscriber in subscribers.values {
      subscriber.onEvent(event: event)
    }
  }
  func key(_ subscriber: Subscriber) -> String {
    return String(describing: type(of: subscriber))
  }
}
let bus = Bus()
bus.sub(StdoutSubscriber())
// ... on "like" button tap
bus.pub(LikeEvent())

Events are first-class in Node, so an easy equivalent to the above would be:

var EventEmitter = require('events')
var bus = new EventEmitter()
function stdoutSubscriber(event){
  console.log(`event=${event}`)
}
bus.on('event', stdoutSubscriber)
bus.emit('event', 'like')

Problem

Given all the above, I think the problem I find the bus solving is: reduce complexity in a distributed system by allowing event sources to publish, and event processors to subscribe, as plainly as possible.

Caveat

I think decoupling event production from processing does have a cost. We lose locality, which complicates reasoning. In cases where production and consumption can be colocated, eg async operations on a thread that’s safe to block (Finagle’s use of Scala’s composable futures is a great example), I think colocating them is worth considering.

Related

Node’s event emitter supports the notion of a "channel" (its event names). Kafka calls them "topics". This concept reminds me of Objective-C’s KVO and the Firebase Realtime Database, which allow me to subscribe to the stream of changes for a given "key" (or "path").
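
With Node's EventEmitter, the event name can play the role of the channel, so a subscriber can listen to a single key. A small sketch, where the `likes/p1` style of key and the `publishChange` helper are hypothetical:

```javascript
const EventEmitter = require('events')

const bus = new EventEmitter()

// Publish a change on the channel named after the key that changed.
function publishChange(key, val) {
  bus.emit(key, val)
}

const seen = []
bus.on('likes/p1', (val) => seen.push(val)) // subscribe to one key only

publishChange('likes/p1', true)
publishChange('likes/p2', true) // no subscriber on this channel, so it is dropped
publishChange('likes/p1', false)

console.log(seen) // [ true, false ]
```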

Written by Erik

August 13, 2017 at 3:30 pm

Posted in pattern
