Archive for the ‘MapReduce’ Category

Processing 192 million reads in less than 5 minutes

In a recent BMC Bioinformatics paper, Feng et al. describe PeakRanger, their new Hadoop-flavoured peak caller for ChIP-seq data. I must admit the paper is really fantastic, and it shows how powerful a Hadoop implementation can be. On [...]


Getting cited

I just checked a recent BMC Bioinformatics paper that gives an overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics. A very interesting read indeed. More interestingly, this peer-reviewed paper by Ronald Taylor cites one of my favorite and [...]


Some interesting Big Data Links

Closing the gap between big data and people who need it: According to Stefan Groschupf, the CEO of Datameer, there is “a gap between the data and the people that want to get close to it”, particularly those who want [...]


Delta downloads for big data in Bioinformatics

Most public bioinformatics databases don’t offer delta downloads or incremental updates. They don’t offer them because either scientists are not demanding them or database providers don’t see any value in them. This was not a big issue for a long time [...]


Efficient Graph Algorithms in MapReduce

In this video, Jimmy Lin talks about current best practices in designing large-scale graph algorithms in MapReduce. Most MapReduce-based graph processing algorithms have performance-limiting shortcomings, especially those related to partitioning, serializing, and distributing the graph. He presents [...]


Graph Processing with Pregel

Graph processing with Pregel is very similar to the MapReduce model, and by design Pregel is well suited for distributed implementations. Pregel is a scalable and fault-tolerant system with an expressive and flexible API.


MapReduce and Hadoop Algorithms in Bioinformatics Papers

Purely inspired by Atbrox’s list of academic papers for MapReduce & Hadoop Algorithms. Unlike computer science, where applications of MapReduce/Hadoop are very diverse, most published implementations in bioinformatics are still focused on the analysis and/or assembly of biological [...]


MapReduce goes evolutionary

Scientists from Texas A&M University have developed a new algorithm, MrsRF (MapReduce Speeds up Robinson-Foulds), for analyzing large collections of evolutionary trees using the MapReduce framework. Matthews et al. have used their MapReduce algorithm to compute the all-to-all Robinson-Foulds (RF) distance matrix [...]


Google's future scale computing

Forget about Internet scale; seriously, that is not enough for Google. Google engineers are now talking about future scale, which means the company is preparing to manage as many as 10 million servers in the future. Google Fellow Jeff Dean [...]


Hadoop Visualization

This is a visualization of the data transfers inside a Hadoop cluster. It shows live data from a cluster at the University of Nebraska–Lincoln.