Archive for the ‘MapReduce’ Category

Big Data: Hadoop is tuned for availability not efficiency

UC Berkeley Professor Joe Hellerstein has written a very interesting post on his blog about two very different big data deployments on Hadoop and Greenplum. Joe contrasts a recent Yahoo implementation on Hadoop to sort a petabyte using approximately 3800 [...]


MapReduce is now reaching mainstream science

Most of you will be aware of MapReduce, a framework developed by Google to analyze large data sets in parallel on clusters of computers. It is used for certain kinds of distributable problems that can be spread across a large number of computers. There [...]