notes from cloudera basic training > installing the cloudera hadoop distro locally and on ec2

ref: http://www.cloudera.com/hadoop
installing cloudera distro on a cluster
– motivation: hadoop is complocated to install
– cloudera uses Alternatives to manage a|b testing
– cloudera has created a “configurator” that will generate an rpm customized to your cluster
— generates the configuration files and a custom installer. Each can optionally be used together or separately
– alternatively, you can install an unconfigured distro
– for large scale deployment, use puppet, bcfg2, cfengine, etc. to manage the cluster
— cloudera’s tool can still be used to generate config scripts
– storing data in ebs takes advantage of locality and is much faster than s3
— ebs is more performant than normal hard drives