silicon valley hadoop user group 5-20-09: ibm research on hadoop over gpfs

- tested on jbod (just a bunch of disks)
- equivalent performance between hdfs and gpfs for non-trivial applications
- used Bonnie for filesys benchmarking
- cluster topology
-- standard hadoop uses local storage
--- cheap, scalable
-- full san uses central store
--- configurability of compute nodes
--- not as scalable
-- sub-cluster uses split storage
- conclusions
-- abstracting the filesys away from mapreduce was a good design choice
-- gpfs (and other cluster filesys) can match performance of hdfs
- scalability?
-- gpfs runs on thousands of nodes
- fault tolerance?
-- not tested yet
- how similar is gpfs to unix filesys?
-- consistency issues are handled in a similar way
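
the conclusion about abstracting the filesys from mapreduce can be illustrated with a small sketch. this is not hadoop's actual api: the `FileSystem` interface, the two backend classes, and `word_count` are all hypothetical names made up for illustration, and the backends are in-memory stand-ins rather than real hdfs or gpfs clients. the point is only that the job code depends on the interface, so the storage layer can be swapped without touching the job.

```python
from abc import ABC, abstractmethod
from collections import Counter


class FileSystem(ABC):
    """Hypothetical minimal storage interface (not Hadoop's real API)."""

    @abstractmethod
    def read_lines(self, path):
        """Yield the text lines stored at `path`."""


class LocalDiskFS(FileSystem):
    """In-memory stand-in for node-local storage (the hdfs-style case)."""

    def __init__(self, files):
        self._files = files  # path -> whole file contents as one string

    def read_lines(self, path):
        yield from self._files[path].splitlines()


class SharedClusterFS(FileSystem):
    """In-memory stand-in for a shared cluster filesys (the gpfs-style case)."""

    def __init__(self, blocks):
        self._blocks = blocks  # path -> list of text blocks

    def read_lines(self, path):
        for block in self._blocks[path]:
            yield from block.splitlines()


def word_count(fs, path):
    """Toy job: counts words using only the FileSystem interface."""
    counts = Counter()
    for line in fs.read_lines(path):
        counts.update(line.split())
    return counts


if __name__ == "__main__":
    local = LocalDiskFS({"/in": "a b a\nc"})
    shared = SharedClusterFS({"/in": ["a b a", "c"]})
    # Same job, two different storage backends.
    print(word_count(local, "/in"))
    print(word_count(shared, "/in"))
```

`word_count` never checks which backend it was given, which is the property that would let a cluster filesys stand in for hdfs under an unchanged job.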