Batch to SSTable

A pattern I’ve seen a couple times for immutable data:

  1. Generate the data using a batch process
  2. Store the data in an an indexed structure (like SSTable)
  3. Expose the structure through an API

The result is a key-value store with extremely high read performance.

The first time I heard about this was Twitter’s Manhattan database. Recently, I saw the pattern again at a different company. Ilya Grigorik wrote about it several years ago in the context of log-structured data, BigTable and LevelDB.

My takeaway is: this pattern is worth considering if:

  • my current store is having issues (no need to fix what’s not broken)
  • I have heavy read traffic
  • I can tolerate latency on updates

The context of log-structured makes me think that might open a door to write access too. Twitter’s post mentions a “heavy read, light write” use-case, although it also describes use of a B-tree structure rather then a simple sorted file for that case. Grigorik’s post mentions BigTable uses a “memtable” to facilitate writes.

Note Web’s IndexedDB has a similar access pattern to SSTable. If I think about remote updates as an infrequent write, then the pattern described here might be a common use-case for Web, which might bring this around full circle: Google crawls the Web in a batch process and updates an index which is read-heavy.

Getting started with Google App Engine Java SDK

A few days ago, I tried to use the App Engine Eclipse plugin, but ran into some issues, as described in an earlier post. These were probably due to my lack of experience with Java, Eclipse, and/or the AppEngine dev model, but I was blocked all the same. This time, I’ll start at a lower level, with the App Engine Java SDK.

My first stop was the App Engine Java overview page, which suggests “… you haven’t already, see the Java Getting Started Guide …”, so I hopped over.

The steps outlined in Installing the Java SDK worked well, and I was able to launch the dev server.

Next, I created my first project, the Guestbook app. The steps here were helpful too, and I was able to compile the app successfully (via the Using Apache Ant documentation), but I ran into trouble when I tried to run it:

$ ant runserver
Unable to find a $JAVA_HOME at "/usr", continuing with system-provided Java...
Buildfile: build.xml



[java] 2010-10-14 00:47:18.489 java[24218:903] [Java CocoaComponent compatibility mode]: Enabled
[java] 2010-10-14 00:47:18.492 java[24218:903] [Java CocoaComponent compatibility mode]: Setting timeout for SWT to 0.100000
[java] Oct 14, 2010 7:47:20 AM info
[java] INFO: Logging to JettyLogger(null) via
[java] Oct 14, 2010 7:47:20 AM readAppEngineWebXml
[java] SEVERE: Received exception processing /Users/foo/Sites/appengine/Guestbook/war/WEB-INF/appengine-web.xml
[java] Could not locate /Users/foo/Sites/appengine/Guestbook/war/WEB-INF/appengine-web.xml

Total time: 3 seconds

The missing file is located in /Users/foo/Sites/appengine/Guestbook/war/WEB-INF/classes/WEB-INF/appengine-web.xml, which seems to be intentional given the statement “All other files found in src/, such as the META-INF/ directory, are copied verbatim to war/WEB-INF/classes/”.

If I add the following to build.xml, so appengine-web.xml and web.xml are coped into the src/WEB-INF dir, then it works:

    <copy todir="war/WEB-INF">
      <fileset dir="src/WEB-INF">

The next step would be to Using the Users Service, but it’s getting alte, and I’z getting seelpy, so I’ll save that for another day.

To conclude w/ something uplifting, here’s a pic of a sleeping hedgehog.

Sleepy Hedgehog
Sleepy Hedgehog, credit: Andreas-photography

Getting started with Google Eclipse plugin

This post is a record of my first experience with Google’s plugin for Eclipse Helios (3.6)

First impression: anyone who can get Eclipse to install a plugin without multiple errors deserves commendation. Good job, Google.

Doh! Spoke too soon. After running step 5 in the Creating a Project section of the plugin documentation I got “The project cannot be built until build path errors are resolved … Unknown Java Problem”

Sigh. Ok. Searching Stack Overflow … OMG. I can’t believe Eclipse has been around as long as it has and it’s still simply un-runnable. Maybe it’s a Java thing. Searching … “Build path entry is missing: org.eclipse.jdt.launching.JRE_CONTAINER” … Wow. A couple hours later and no luck.

Back to Ruby for a little pick-me-up 🙂

update (Oct. 15)

A friend with more experience helped me sort this out:

  1. Find your JDK.  On Mac, 10.6 it’s in /System/Library/Frameworks/JavaVM.framework
  2. In Eclipse menu bar, go to Eclipse > Preferences… > Java > Installed JREs
  3. Click “Add…”
  4. Either click “Directory…” and browse to the location of the JDK from step 1, or just enter the path if you know it.  In my case, it was
  5. Give it a name.  Mine is “JDK 6”
  6. Click OK to save

Before trying again with the Google Eclipse plugin, I ran the software update (Help > Check for Updates) and restarted, for good luck.

Google I/O notes: “Transactions Across Datacenters (and Other Weekend Projects)”

Transactions Across Datacenters (and Other Weekend Projects)
– master/slave replication
— usually asynch
— weak/eventual consistency: granularity matters
— datastore: current
— this is how app engine “multi-home”s the datastore
– multi-master replication
— one of the most fascinating areas of computer science
— eventual consistency is the best we can do
— need serialization protocol
— a global timestamp
– 2 phase commit
— semi-distributed protocol: there is always a master coordinator
— ah! got an emergency call – gotta go – but this was the best talk ever…… 😦

Google I/O notes: “A Design for a Distributed Transaction Layer for Google App Engine”

A Design for a Distributed Transaction Layer for Google App Engine
– distributed algorithms are difficult to impossible to debug – they must be proved correct
– correctness and performance are at the heart of engineering
– what is your goal? start there and work backwards, but keep focused on the goal
//check out codecon next year
– invariance
— correctness requires invariance
— a sentence that doesn’t change when everything else is changing
— initialize invariants during construction
— isolation and atomicity
– scalability
— deconstruct what you’re doing and figure out how to spread it out everywhere
— distributed machines are unreliable, non-serial, non-sychronized
– transactions
— a “good” state is one in which all invariants are satisfied
— invariants must be temporarily violated
— a “transaction” is a set of operations that take us from one good state to another
— durable: state persist
— atomic and isolated: no in-between states
— consistent: only jump from one good state to the next
— in app engine, “entity groups” partition data
— you can’t run queries in a transactional app engine
– algorithm (read this whitepaper)
— very similar to two-phase commit (‘there are only so many good ideas’)
— 1) run client
— records version numbers
— 2) get write locks
— 3) check version
— 4) copy shadows
— details
— deadlock prevention
—- get locks in a certain order
— ongoing progress
—- 10-100 x reads to writes in web apps
— concurrent roll-forward
— proof of isolation
— light swtiches are idempotent
– eventual vs. strong vs. causal consistency
— app engine uses strong consistency
– local vs distributed transactions
— local transactions are cheaper
— no read-after-write, and no write-after-write, because writes are buffered – enforce hard rules for scalability
– be able to tell if a transaction has or has not happened; provide ids for each transaction
– questions
— will this be released as a library or built-in?
— it’ll be released as an opensource library called “tapioca”
— roll-forward vs. roll-back?
— when a write takes place, a “diff” is generated against the db as a shadow object. at the correct time, this shadow object is incorporated as a “roll-forward” of the db.
– use transactions anytime you are going to violate an invariant to ensure we return to a good state

google i/o ’09 notes: “From Spark Plug to Drive Train: Life of an App Engine Request”

I/O conference page
– desiging for scale and reliability
— web stack: 3 tier
— frontent
— applayer
— api
— persistence layer
— lamp stack evolution
— scalable? no. shared machine for db and webserver.
— split machines for db and server
—- 2 spof
— multi-web servers, single db machine
— dns round robin router
—- pretty scalable
— reverse proxy routing
—- cache static content
—- app-level load balencing; least-loaded instead of round robin
—- requires more machines
—- eg proball
— master-slave db
—- gets better read throughput
—- invisible to app
—- scalable? scales read rate w/ # of servers, but not write
—- spof for writes
—- master may die before replication
— partitioned db
—- requires re-arch of data model: no joins
— app engine stack
—- reverse proxy
—- app & static fiule serving
—- data store
—- memchache
—- api

– design motivations
— considerations
— build on existing google tech,
— integrated env
— small per-req footprints
— isolation btwn apps
— statelessness & specialization
— require partitioned data model

– life of a request
— routed to nearest data center using existing google infrastructure
— load-balancing and routing at reverse proxy layer
— static content
— served, if necessary, using existing google cache. static content is specified in app.yaml in the app
— dyn content
— app server
—- the app engine web server
—- serve many apps
—- concurrent
—- enforces isolation btwn apps and google, and statelessness
— flow
—- checks for cached runtime
—- executes the request
—- caches the runtime
—- app master: manages and schedules ops on app servers
— api requests
— app issues api call
— server accetpts
— blocks runtime
— app server issues call
— server returns resposne

– apis
— memcahceg
— distributed in-memory cache
— optimistic cache
— db queries, results of internet data fetch, etc.
— very specialized: just in-mem cache
— app engine data store
— built in big table (whitepaper avail)
— partitioned
— explicit indexes
— instant reads
— slower writes
— replicated >= 3 machines
— built on gfs (white paper available)
— mail api
— uses same routing as gmail
— the majority of the apis app engine uses are built on other apis used by much larger services = reliable

– recap
— built on existing google tech
— years, lots of money, much talent spent on optimization for scalable tech
— integrated env so
—- best practices
—- some restrictions
—- google tools easily avaolable
—- all logs in one place
—- no machine config
—- ez deployment
— small per-req footprints
— better utilization of app servers
— less mem usage
— limited cpu
— fast requests
— fairness to other apps
— agile routing and scheduling
— runtime caching
— request deadlines
— better use of resources
— isolation between apps
— reasons: safetly and predictability
— certain sys calls are unavailable
— statelessness & specialization
— how? use api calls
— why? performance, load balanced, fault tolerant
— partitioned data model
— indexes for all queries
— no schema
— super-fast reads; writes are a bit slower
— great for read-intensive apps, which include most apps

– metrics
— 80k apps
— 140m pageviews per day
— 200k developers
— whitehouse “open for questions” app
— handled 100k questions and 3.6M votes at 700 requests/sec peak
— instance of google moderator running on whitehouse servers

– questions
— moderator
— it’s an opensource project
— whitehouse engineers tweaked it for performance and security
— google provided support
— api for full-text search?
— they’re working on it
— loading big libraries?
— use runtime caching to store runtime for subsequent requests
— one-click magic for python like w/ gwt?
— not at this time
— how is static serving priced?
— by bandwidth
— data portability in/out bigtable?
— bulk uploader/downloader
— check out “app engine nitty-gritty” talk tomorrow
— differences btwn java and python capabilities?
— equivalence is important
— how many req/sec are req’d to maintain runtime in cache?
— no
— can we pin bigtable replication geographically?
— no, but they’re working on it