MapReduce goes evolutionary

Scientists from Texas A&M University have developed a new algorithm MrsRF (MapReduce Speeds up Robinson-Foulds) for analyzing large collection of evolutionary trees using MapReduce framework. Matthews et. al, have used their MapReduce algorithm to compute all-to-all Robinson-Foulds (RF) distance matrix on multi-core computing platforms. Calculation of all possible Robinson-Foulds distance pairs is a computationally intensive task. The results show that a significant speedup can be achieved using MrsRF compared to the fastest sequential algorithms.
We studied the performance of our MrsRF algorithm on two large biological trees sets consisting of 20,000 trees of 150 taxa each and 33,306 trees of 567 taxa each. Our experiments show that MrsRF is a scalable approach reaching a speedup of over 18 on 32 total cores.

Phase 1 of the MrsRF algorithm. Two mappers and two reducers are used to process the input files.

Apart from speeding up the phylogenetic analysis, this study presents a new type of MapReducible problem where “the size of the input (t evolutionary trees) is much smaller than the size of the output (t × t RF matrix)”. Generally in MapReduce implementations the final output is smaller in size than the initial input. Another important thing which authors point out is getting best performance out of MapReduce implementation on a multi-core cluster depends on the cluster configuration. For instance, they tried their problem set with 32 total cores, a 16 nodes by 2 cores (16 × 2) cluster configuration which outperformed 8 × 4, 4 × 8, and 32 × 1 cluster configuration.
Overall their research makes a strong case for using MapReduce framework to design high-performance phylogenetic applications and it can be best for tackling the large evolutionary computational problems such as summarizing the big collections of evolutionary trees. An open-source implementation of MrsRF algorithm is freely available from the Google code.

Reference:
Matthews, S., & Williams, T. (2010). MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees BMC Bioinformatics, 11 (Suppl 1) DOI: 10.1186/1471-2105-11-S1-S15

Reblog this post [with Zemanta]
Share and Enjoy:
  • HackerNews
  • Twitter
  • Facebook
  • Google Buzz
  • LinkedIn
  • Posterous
  • Tumblr
  • Digg
  • Reddit
  • del.icio.us
  • DZone
  • FriendFeed
  • Suggest to Techmeme via Twitter
  • Print
  • RSS
  • Slashdot

10 Responses to “MapReduce goes evolutionary”
  1. 02.08.2010

    MapReduce goes evolutionary http://bit.ly/9Edncm

  2. MapReduce goes evolutionary http://goo.gl/fb/eeOO

  3. 02.08.2010

    RT @ResearchBlogs: MapReduce goes evolutionary http://goo.gl/fb/eeOO <– see http://dx.doi.org/10.1186/1471-2105-11-S1-S15

  4. 02.08.2010

    RT @ResearchBlogs: MapReduce goes evolutionary http://goo.gl/fb/eeOO <– see http://dx.doi.org/10.1186/1471-2105-11-S1-S15

  5. 02.08.2010

    MT @rdmpage/@ResearchBlogs: MapReduce goes evolutionary http://goo.gl/fb/eeOO http://dx.doi.org/10.1186/1471-2105-11-S1-S15

  6. topsy_top20k_en
    02.08.2010

    MapReduce goes evolutionary http://bit.ly/9Edncm

  7. 02.08.2010

    @teachmescience MapReduce speeds up phylogenetic analyses http://bit.ly/aSY918

  8. 02.08.2010

    RT @tweetmeme http://bit.ly/9Edncm – Further Phylogentic Analysis of Trees, Supertrees & co becomes tractable

  9. 02.10.2010

    RT @tweetmeme MapReduce goes evolutionary- by Fisheye Perspective http://bit.ly/9Edncm

  10. 02.21.2010

    MapReduce goes evolutionary http://ff.im/-gkbbU