Software engineering notes

Posts Tagged ‘ec2’

hadoop summit 09 > applications track > Case Studies on EC2


– eHarmony

— matching people is an N^2 process: every user is potentially compared with every other user

— they run hadoop jobs on EC2 and keep the data in S3

— results downloaded from S3 and imported into BerkeleyDB

— S3 is a great place to store huge files for a long time because it’s so cheap

— switched from bash to ruby because ruby has better exception handling

— Elastic MapReduce replaced 150 lines of their ec2 management scripts
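To see why pairwise matching blows up, here's a minimal Python sketch that scores every unordered pair of users exactly once. The `compatibility` function and the user records are hypothetical placeholders, not eHarmony's actual matching model; the point is only that the number of comparisons grows as n * (n - 1) / 2.

```python
from itertools import combinations


def compatibility(a, b):
    # Hypothetical placeholder score: number of shared interests.
    return len(a["interests"] & b["interests"])


def all_pairs(users):
    # Score every unordered pair once: n * (n - 1) / 2 comparisons.
    return [(a["id"], b["id"], compatibility(a, b))
            for a, b in combinations(users, 2)]


def pair_count(n):
    # Number of unordered pairs among n users.
    return n * (n - 1) // 2


if __name__ == "__main__":
    users = [
        {"id": 1, "interests": {"hiking", "jazz"}},
        {"id": 2, "interests": {"jazz", "chess"}},
        {"id": 3, "interests": {"hiking"}},
    ]
    for row in all_pairs(users):
        print(row)
    for n in (1_000, 1_000_000):
        print(n, pair_count(n))
```

Going from a thousand users to a million multiplies the number of pairs by roughly a million, which is one reason a job like this gets spread across a hadoop cluster instead of running on a single box.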


– sharethis

— simplifies sharing online content to services like delicious

— they’re a small company, but they need to keep pace w/ the volume of the large publishers they support

— they’re 100% based on AWS

— aster + lamp stack + cascading running hadoop (to clean logs before pushing data into db) + s3 + sqs

— sharded search mostly used for business intel

— cascading makes writing hadoop jobs efficient, more so than pig

— in the hadoop book, the author of cascading wrote a case study on sharethis


– lookery

— started as an ad network on facebook

— built completely on aws

— they use a javascript-based tracker, similar to google analytics, to gather data

— data acquisition + data serving + reporting + billing –> all done in hadoop

— they use voldemort, a distributed key/value store, instead of memcache

— heavy use of hadoop streaming w/ python
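Hadoop streaming runs any executable that reads records on stdin and writes tab-separated key/value lines to stdout, with the framework sorting by key between the map and reduce steps. Below is a minimal sketch of that pattern in Python, counting page views from tracker logs. The log layout (timestamp, page url, visitor id, tab-separated) and the script name `pageviews.py` are assumptions for illustration, not lookery's actual code.

```python
import sys
from itertools import groupby


def mapper(lines):
    # Assumed (hypothetical) log format: timestamp<TAB>page_url<TAB>visitor_id
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip malformed records
        _, page, _ = fields
        yield f"{page}\t1"


def reducer(lines):
    # Hadoop sorts mapper output by key before the reducer sees it,
    # so all lines for one page arrive consecutively.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if "\t" in line)
    for page, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{page}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Run as "pageviews.py map" or "pageviews.py reduce".
    step = {"map": mapper, "reduce": reducer}[sys.argv[1]]
    for record in step(sys.stdin):
        print(record)
```

Locally the pipeline can be simulated with `cat log.tsv | python pageviews.py map | sort | python pageviews.py reduce`; on a cluster the same script would be handed to the streaming jar through its `-mapper` and `-reducer` options (where the jar lives depends on the hadoop version).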


– deepdyve

— a search engine

— having an elastic infrastructure allows for innovation

— using hadoop, indexing time dropped from 1 wk to 1 hr

— they can spin up new clusters and discard old ones

— ec2 + katta + zookeeper + hadoop + lucene –> they didn’t have to write most of the software they run

— query times are lower, user satisfaction is higher

— problems: unstable aws, session timeouts on zookeeper, slow aws provisioning

— with aws, they can run load tests to prepare for spikes

Written by Erik

June 10, 2009 at 1:51 pm

Posted in notes

barcamp san diego 5: “cloud computing on EC2”

– rightscale alternatives
— chef: an opensource ruby project, sponsored by att
— puppet
— cfengine
— way cheaper than rightscale

– ec2 alternatives
— eucalyptus: an opensource project for running a private cloud
— akamai now hosts applications on their edge servers

– load balancers
— haproxy: software based; allows us to route all traffic to a new cluster once it’s launched and running
— red5: hardware based
— algorithms: round robin
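Round robin simply hands each new request to the next backend in a fixed rotation. Here's a minimal Python sketch, including a `switch_to` method that cuts all traffic over to a freshly launched cluster in one step (the haproxy use case noted above). The class and method names are hypothetical; haproxy itself is configured with `balance roundrobin` in a backend section rather than written in code like this.

```python
from itertools import cycle
from threading import Lock


class RoundRobinBalancer:
    """Hypothetical helper that hands out backends in a fixed rotation."""

    def __init__(self, backends):
        self._lock = Lock()
        self.switch_to(backends)

    def switch_to(self, backends):
        # Replace the whole pool at once, e.g. to send all traffic
        # to a newly launched cluster once it is up and running.
        if not backends:
            raise ValueError("need at least one backend")
        with self._lock:
            self._rotation = cycle(list(backends))

    def next_backend(self):
        # Each call returns the next backend in order, wrapping around.
        with self._lock:
            return next(self._rotation)


if __name__ == "__main__":
    lb = RoundRobinBalancer(["10.0.0.11", "10.0.0.12"])
    print([lb.next_backend() for _ in range(3)])
    lb.switch_to(["10.0.1.21", "10.0.1.22"])
    print(lb.next_backend())
```

The lock keeps the rotation consistent when several request threads ask for a backend at once; a real balancer would also drop backends that fail health checks, which this sketch skips.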

Written by Erik

May 31, 2009 at 2:56 pm