Software engineering notes

Praise for the blobstore 📯

The phrase “blobstore” refers to a simple key-value store, optimized for large values. Amazon’s S3Google Cloud Storage, App Engine’s Blobstore and Twitter’s Blobstore are examples.

I’m writing to highlight what appears to be a fundamental role played by the blobstore pattern. This may be aspirational, but I’m curious to explore.

Caveat: I only have experience using blobstores and the related infra I mention below, eg CDNs, source control, document stores, CI, etc, not maintaining this infra.

I’ve bolded references to other key pieces of infrastructure.

Hosting

This is one of the more primitive, intuitive applications. Push a local file to the store and then retrieve it:

$ curl -T my_file https://blobs.example.com/my_file
$ curl https://blobs.example.com/my_file

Note how amenable the store is to a RESTful API. Related, many of the connections made in this post remind me of CouchDB.

Custom domains

Enable custom domains and we have general hosting:

$ curl -H "x-custom-domain: mydomain.com" https://blobs.example.com/my_file
$ curl https://mydomain.com/my_file

Artifact store

Push executable blobs into the store and we have an artifact store, like Twitter’s Packer and Google’s MPM.

CDN

Put a cache in front of the store and we have a CDN. Use SSDs for hosting and we may not need a cache. Or use an external CDN and the blobstore as an “origin server”.

Key-value

Key-value stores scale well because they require relatively little coordination: a key maps simply to a host and doesn’t need to be read before being written.

We improve hosting scalability by building on a key-value pattern, for example.

Document store

For better or worse, a blobstore is only aware of simple keys, so we can’t read into the values.

However, index the value and we have a document store (eg AWS DocumentDB). We can keep the blobstore pure by defining the document store as standalone infrastructure and depending on an event bus (eg Google Cloud Pub/Sub) and execution service (eg Google Cloud Functions):

  1. Document store subscribes to blobstore push events via event bus associating an executable
  2. Client pushes value to blobstore
  3. Blobstore writes value and fires event, eg “push ”
  4. Document store processes the path as described above

Now we can query values:

$ curl -d '{"k":"v"}' https://blobs.example.com/my_file
$ curl https://documents.example.com/my_file/?query="k>t"
> v

Search

Using an approach similar to building a document store, we can process the values stored in blobstore to populate a search index.

Source control

Put a review process and commit log in front of writes to the blobstore and you have scalable source controlUse a blobstore for Git’s object store, and you get scalable replication. Use a document-aware source control mechanism, like Google, and you get a scalable mono-repo.

Blobstore data is often immutable and versioned, so commits could simply be references to file versions.

CI

Subscribe to push events from source control, using an event bus and execution service and we have CI. We can listen for success events from our subscriber to get pre-submit quality control. We can write artifacts back to the blobstore as part of a continuous deploy pipeline.

Batch

We’ve already described a few cases of stream processing via an event bus, but we can also periodically iterate over values and stream them through an execution service to rebuild indices, produce artifacts, normalize data, etc.

Written by Erik

November 18, 2017 at 4:19 pm

Posted in infra

Tagged with , ,

%d bloggers like this: