Software engineering notes

silicon valley hadoop user group 5-20-09: cloudera on automatic database import w/ sqoop

motivation
- hadoop is great for unstructured data
- hadoop is not great for structured data
- question: how to combine structured data from mysql with unstructured data in hadoop

DBInputFormat
- uses jdbc to connect to db

DBWritable
- a bridge from jdbc result set to mapper value
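A rough sketch of the bridge idea, not code from the talk. Hadoop's DBWritable interface declares readFields(ResultSet) and write(PreparedStatement); to keep this runnable without Hadoop on the classpath, the local RowBridge interface below is a stand-in with the same two methods. The Employee class, its id/name columns, and the fakeRow helper are all hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class DbWritableSketch {
    // Stand-in for Hadoop's DBWritable (assumption: same two methods).
    interface RowBridge {
        void readFields(ResultSet row) throws SQLException;
        void write(PreparedStatement stmt) throws SQLException;
    }

    // Hypothetical record type for a table with columns (id, name).
    static class Employee implements RowBridge {
        int id;
        String name;

        @Override
        public void readFields(ResultSet row) throws SQLException {
            // Copy one jdbc row into plain fields the mapper can use.
            id = row.getInt("id");
            name = row.getString("name");
        }

        @Override
        public void write(PreparedStatement stmt) throws SQLException {
            // Reverse direction: bind the fields to an INSERT/UPDATE statement.
            stmt.setInt(1, id);
            stmt.setString(2, name);
        }
    }

    // Demo-only fake row so the sketch runs without a database.
    // Only getInt and getString are supported; anything else throws.
    static ResultSet fakeRow(int id, String name) {
        InvocationHandler handler = (proxy, method, args) -> {
            switch (method.getName()) {
                case "getInt": return id;
                case "getString": return name;
                default: throw new UnsupportedOperationException(method.getName());
            }
        };
        return (ResultSet) Proxy.newProxyInstance(
                DbWritableSketch.class.getClassLoader(),
                new Class<?>[] {ResultSet.class},
                handler);
    }

    public static void main(String[] args) throws SQLException {
        Employee e = new Employee();
        e.readFields(fakeRow(7, "ada"));
        System.out.println(e.id + " " + e.name);
    }
}
```

In a real job the class would implement Hadoop's own DBWritable (and usually Writable as well), and DBInputFormat would supply the ResultSet rows.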

Sqoop
- SQL-to-Hadoop
- jdbc-based interface
- auto datatype generation
- uses mapreduce to read tables from db
- imports into hdfs and creates java file
- easy to import into hive
- serialized output is comma-separated
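To make the "auto datatype generation", "creates java file", and "comma-separated" bullets concrete, here is a minimal sketch of the general idea. It is not Sqoop's actual code: the type mapping, the generated class layout, and the names SqoopStyleSketch, javaTypeFor, generateClass, and toLine are assumptions for illustration only.

```java
import java.sql.Types;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class SqoopStyleSketch {
    // Illustrative SQL-to-Java type mapping (assumption, not Sqoop's real table).
    static String javaTypeFor(int sqlType) {
        switch (sqlType) {
            case Types.INTEGER: return "Integer";
            case Types.BIGINT:  return "Long";
            case Types.DOUBLE:  return "Double";
            case Types.VARCHAR: return "String";
            default:            return "Object";
        }
    }

    // Emit Java source for a record class with one field per column.
    static String generateClass(String className, Map<String, Integer> columns) {
        StringBuilder src = new StringBuilder("public class " + className + " {\n");
        for (Map.Entry<String, Integer> col : columns.entrySet()) {
            src.append("  private ")
               .append(javaTypeFor(col.getValue()))
               .append(' ')
               .append(col.getKey())
               .append(";\n");
        }
        return src.append("}\n").toString();
    }

    // Serialize one row as a comma-separated line.
    // Naive on purpose: commas inside values are not escaped.
    static String toLine(Object... fields) {
        StringJoiner line = new StringJoiner(",");
        for (Object field : fields) {
            line.add(String.valueOf(field));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Hypothetical table with columns (id, name), in declaration order.
        Map<String, Integer> columns = new LinkedHashMap<>();
        columns.put("id", Types.INTEGER);
        columns.put("name", Types.VARCHAR);
        System.out.print(generateClass("Employee", columns));
        System.out.println(toLine(7, "ada"));
    }
}
```

In practice the column names and types would come from jdbc metadata rather than a hand-built map.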

Written by Erik

May 20, 2009 at 6:16 pm

Posted in notes
