Transactions Across Datacenters (and Other Weekend Projects)
– master/slave replication
— usually asynchronous
— weak/eventual consistency: granularity matters
— datastore: this is the current approach
— this is how app engine "multi-homes" the datastore
– multi-master replication
— one of the most fascinating areas of computer science
— eventual consistency is the best we can do
— need serialization protocol
— a global timestamp
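A toy sketch (mine, not from the talk) of why a global timestamp gives you a serialization order across masters: concurrent writes merge as last-write-wins on the timestamp, so all replicas converge, but only eventually.

```python
# Toy illustration (invented names, not App Engine code): resolving concurrent
# writes in a multi-master setup with a global timestamp, i.e. last-write-wins.

def merge(replica_a, replica_b):
    """Merge two replicas of a key -> (timestamp, value) map.

    Whichever write carries the later global timestamp wins, so both
    replicas end up in the same state -- the "eventual consistency"
    mentioned in the notes.
    """
    merged = dict(replica_a)
    for key, (ts, value) in replica_b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged

a = {'profile': (10, 'written in DC-1')}
b = {'profile': (12, 'written in DC-2')}
assert merge(a, b) == merge(b, a) == {'profile': (12, 'written in DC-2')}
```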
– 2 phase commit
— semi-distributed protocol: there is always a master coordinator
— ah! got an emergency call – gotta go – but this was the best talk ever…
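For reference, a minimal sketch of the coordinator-driven two-phase commit described above; the class and function names are invented for illustration and are not an App Engine API.

```python
# Minimal two-phase commit sketch (invented names). One master coordinator
# asks every participant to prepare; only if all vote yes does it commit.

class Participant:
    """One node holding part of the data."""

    def prepare(self, txn_id):
        # Phase 1: durably promise to commit txn_id if asked (vote yes/no).
        return True

    def commit(self, txn_id):
        pass  # Phase 2: make the prepared changes visible

    def abort(self, txn_id):
        pass  # Phase 2: throw the prepared changes away


def two_phase_commit(participants, txn_id):
    """The single master coordinator drives both phases."""
    if all(p.prepare(txn_id) for p in participants):
        for p in participants:
            p.commit(txn_id)
        return True
    for p in participants:
        p.abort(txn_id)
    return False
```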
Google I/O notes: “A Design for a Distributed Transaction Layer for Google App Engine”
– distributed algorithms are difficult to impossible to debug – they must be proved correct
– correctness and performance are at the heart of engineering
– what is your goal? start there and work backwards, but keep focused on the goal
//check out codecon next year
– invariance
— correctness requires invariance
— a statement that stays true while everything else is changing
— initialize invariants during construction
— isolation and atomicity
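A small, generic example (not from the talk) of an invariant in code: it is established in the constructor, temporarily violated inside an operation, and checked again at the end.

```python
class LedgerPair:
    """Two accounts whose combined balance must stay constant (the invariant)."""

    def __init__(self, a, b):
        self.a, self.b = a, b
        self.total = a + b          # invariant initialized during construction

    def check_invariant(self):
        assert self.a + self.b == self.total

    def transfer(self, amount):
        self.a -= amount            # invariant temporarily violated here...
        self.b += amount            # ...and restored here
        self.check_invariant()      # back to a "good" state
```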
– scalability
— deconstruct what you’re doing and figure out how to spread it out everywhere
— distributed machines are unreliable, non-serial, non-synchronized
– transactions
— a “good” state is one in which all invariants are satisfied
— invariants must sometimes be temporarily violated mid-operation
— a “transaction” is a set of operations that take us from one good state to another
— ACID
— durable: state persists
— atomic and isolated: no in-between states
— consistent: only jump from one good state to the next
— in app engine, “entity groups” partition data
— you can't run queries inside an app engine transaction
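A minimal sketch of a local transaction in the old Python datastore API, assuming both entities are created under the same parent so they share an entity group; note that the transaction fetches by key only, since queries aren't allowed inside it.

```python
from google.appengine.ext import db

class Account(db.Model):
    balance = db.IntegerProperty(default=0)

def transfer(src_key, dst_key, amount):
    # Fetch by key only -- queries are not allowed inside a transaction.
    src, dst = db.get([src_key, dst_key])
    if src.balance < amount:
        raise ValueError('insufficient funds')  # exception rolls the txn back
    src.balance -= amount
    dst.balance += amount
    db.put([src, dst])

# Both accounts share a parent, so they live in the same entity group and a
# single local transaction can cover them.
root = db.Key.from_path('Family', 'smith')
alice = Account(parent=root, balance=100).put()
bob = Account(parent=root).put()
db.run_in_transaction(transfer, alice, bob, 25)
```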
– algorithm (read this whitepaper)
— very similar to two-phase commit (‘there are only so many good ideas’)
— 1) run client
—- records version numbers
— 2) get write locks
— 3) check version
— 4) copy shadows
— details
— deadlock prevention
—- acquire locks in a fixed global order
— ongoing progress
—- web apps typically see 10-100x more reads than writes
— concurrent roll-forward
— proof of isolation
— light switches are idempotent
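A toy, single-process sketch of the optimistic flow outlined above (record versions while the client runs, take write locks in a fixed order, re-check versions, then install shadow copies); all names are invented, and the whitepaper is the real reference.

```python
import threading

# Toy in-memory "datastore": name -> (version, value). Invented for illustration.
store = {'x': (1, 10), 'y': (1, 20)}
locks = {name: threading.Lock() for name in store}

def run_transaction(read_set, compute):
    # 1) run the client code, recording the version of everything it reads
    snapshot = {name: store[name] for name in read_set}
    writes = compute({name: value for name, (_, value) in snapshot.items()})

    # 2) take write locks in a fixed global order (deadlock prevention)
    ordered = sorted(writes)
    for name in ordered:
        locks[name].acquire()
    try:
        # 3) check that nothing we read changed underneath us
        for name, (version, _) in snapshot.items():
            if store[name][0] != version:
                return False  # conflict: caller retries
        # 4) install the shadow copies ("roll forward"), bumping versions
        for name in ordered:
            version, _ = store[name]
            store[name] = (version + 1, writes[name])
        return True
    finally:
        for name in ordered:
            locks[name].release()

# Move 5 units from x to y.
run_transaction(['x', 'y'], lambda v: {'x': v['x'] - 5, 'y': v['y'] + 5})
```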
– eventual vs. strong vs. causal consistency
— app engine uses strong consistency
– local vs distributed transactions
— local transactions are cheaper
— no read-after-write and no write-after-write, because writes are buffered until commit – enforce hard rules for scalability
– be able to tell if a transaction has or has not happened; provide ids for each transaction
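One way (my sketch, not necessarily the talk's) to make "has this transaction happened?" answerable: give each transaction a client-chosen id and record that id atomically with its effect, which also makes retries idempotent.

```python
# Invented illustration: record the transaction id together with its effect,
# so re-running the same id is a no-op and its status is queryable afterwards.
applied = {}   # txn_id -> result
counter = {'value': 0}

def apply_once(txn_id, delta):
    if txn_id in applied:              # already happened: do nothing
        return applied[txn_id]
    counter['value'] += delta          # effect and id recorded together
    applied[txn_id] = counter['value']
    return applied[txn_id]

apply_once('txn-42', 5)
apply_once('txn-42', 5)                # duplicate delivery, still counted once
assert counter['value'] == 5
```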
– questions
— will this be released as a library or built-in?
— it’ll be released as an opensource library called “tapioca”
— roll-forward vs. roll-back?
— when a write takes place, a "diff" against the db is generated as a shadow object. at the right time, that shadow object is applied to the db as a "roll-forward".
– use transactions anytime you are going to violate an invariant to ensure we return to a good state
Google I/O '09 notes: "From Spark Plug to Drive Train: Life of an App Engine Request"
I/O conference page
– designing for scale and reliability
— web stack: 3 tier
— frontend
— app layer
— api
— persistence layer
— lamp stack evolution
— scalable? no. shared machine for db and webserver.
— split machines for db and server
—- 2 single points of failure
— multi-web servers, single db machine
— dns round robin router
—- pretty scalable
— reverse proxy routing
—- cache static content
—- app-level load balancing; least-loaded instead of round robin
—- requires more machines
—- eg proball
— master-slave db
—- gets better read throughput
—- invisible to app
—- scalable? scales read rate w/ # of servers, but not write
—- spof for writes
—- master may die before replication
— partitioned db
—- requires re-arch of data model: no joins (see the sharding sketch after this list)
— app engine stack
—- reverse proxy
—- app & static file serving
—- data store
—- memcache
—- api
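For the "partitioned db" step above, a tiny sketch (invented names) of routing rows to shards by hashing the key; this is what forces the no-joins re-architecture, since related rows may live on different machines and have to be fetched one lookup at a time.

```python
import hashlib

NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]   # stand-ins for 4 db machines

def shard_for(key):
    # Stable hash so the same key always lands on the same shard.
    digest = hashlib.md5(key.encode('utf-8')).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def put(key, row):
    shards[shard_for(key)][key] = row

def get(key):
    return shards[shard_for(key)].get(key)

put('user:alice', {'name': 'Alice'})
put('order:1001', {'user': 'user:alice', 'total': 30})
# No cross-shard join: fetch the order, then fetch its user with a second get.
order = get('order:1001')
user = get(order['user'])
```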
– design motivations
— considerations
— build on existing google tech,
— integrated env
— small per-req footprints
— isolation btwn apps
— statelessness & specialization
— require partitioned data model
– life of a request
— routed to nearest data center using existing google infrastructure
— load-balancing and routing at reverse proxy layer
— static content
— served, if necessary, using existing google cache. static content is specified in app.yaml in the app (see the example after this list)
— dyn content
— app server
—- the app engine web server
—- serve many apps
—- concurrent
—- enforces isolation btwn apps and google, and statelessness
— flow
—- checks for cached runtime
—- executes the request
—- caches the runtime
—- app master: manages and schedules ops on app servers
— api requests
— app issues api call
— server accepts
— blocks runtime
— app server issues call
— server returns response
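For the static-content step above, a minimal app.yaml of that era (values are placeholders) showing how static files are declared so the frontends and cache can serve them without ever invoking the app.

```yaml
application: myapp        # placeholder app id
version: 1
runtime: python
api_version: 1

handlers:
# Served from Google's static-serving/cache infrastructure,
# never hitting the app server:
- url: /static
  static_dir: static
# Everything else is dynamic and goes to the app runtime:
- url: /.*
  script: main.py
```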
– apis
— memcache
— distributed in-memory cache
— optimistic cache
— db queries, results of internet data fetch, etc.
— very specialized: just an in-mem cache (see the sketch after this list)
— app engine data store
— built on bigtable (whitepaper avail)
— partitioned
— explicit indexes
— instant reads
— slower writes
— replicated >= 3 machines
— built on gfs (white paper available)
— mail api
— uses same routing as gmail
— the majority of the apis app engine uses are built on other apis used by much larger services = reliable
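A small example of the memcache pattern described above: check the cache first, fall back to the datastore on a miss, and cache the result with a short expiry. The model and key names are made up, while memcache.get/set and the db query are the standard Python APIs of that era.

```python
from google.appengine.api import memcache
from google.appengine.ext import db

class Greeting(db.Model):
    content = db.StringProperty()
    created = db.DateTimeProperty(auto_now_add=True)

def recent_greetings():
    contents = memcache.get('recent_greetings')
    if contents is not None:
        return contents                     # cache hit: no datastore work
    greetings = Greeting.all().order('-created').fetch(20)
    contents = [g.content for g in greetings]
    # "Optimistic" cache: fine if it gets evicted or goes slightly stale.
    memcache.set('recent_greetings', contents, time=60)
    return contents
```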
– recap
— built on existing google tech
— years, lots of money, much talent spent on optimization for scalable tech
— integrated env so
—- best practices
—- some restrictions
—- google tools easily available
—- all logs in one place
—- no machine config
—- ez deployment
— small per-req footprints
— better utilization of app servers
— less mem usage
— limited cpu
— fast requests
— fairness to other apps
— agile routing and scheduling
— runtime caching
— request deadlines
— better use of resources
— isolation between apps
— reasons: safety and predictability
— certain sys calls are unavailable
— statelessness & specialization
— how? use api calls
— why? performance, load balanced, fault tolerant
— partitioned data model
— indexes for all queries
— no schema
— super-fast reads; writes are a bit slower
— great for read-intensive apps, which include most apps
– metrics
— 80k apps
— 140m pageviews per day
— 200k developers
— whitehouse “open for questions” app
— handled 100k questions and 3.6M votes at 700 requests/sec peak
— instance of google moderator running on whitehouse servers
– questions
— moderator
— it’s an opensource project
— whitehouse engineers tweaked it for performance and security
— google provided support
— api for full-text search?
— they’re working on it
— loading big libraries?
— use runtime caching to store runtime for subsequent requests
— one-click magic for python like w/ gwt?
— not at this time
— how is static serving priced?
— by bandwidth
— data portability in/out bigtable?
— bulk uploader/downloader
— check out “app engine nitty-gritty” talk tomorrow
— differences btwn java and python capabilities?
— equivalence is important
— how many req/sec are req’d to maintain runtime in cache?
— no
— can we pin bigtable replication geographically?
— no, but they’re working on it