Archive for the ‘Big Data’ Category

Processing 192 million reads in less than 5 minutes

In a recent BMC Bioinformatics paper, Feng et al. describe their new Hadoop-flavoured software PeakRanger, a peak caller for ChIP-seq data. I must admit the paper is really fantastic, and it shows how powerful a Hadoop implementation can be. On [...]


Some interesting Big Data Links

Closing the gap between big data and people who need it: According to Stefan Groschupf, the CEO of Datameer, there is “a gap between the data and the people that want to get close to it”, particularly those who want [...]


Knowing the data vs relying on it

It is good to know your data. No matter which area you work in, there is always an opportunity to make additional gains by closely observing the characteristics and quality of your data. By experimenting and looking carefully at the data [...]


Delta downloads for big data in Bioinformatics

Most public bioinformatics databases don’t offer delta downloads or incremental updates. They don’t offer them because either scientists are not demanding them or database providers don’t see any value in this. This was not a big issue for a long time [...]


Why is it the best time to be a Bioinformatician?

I am a new fan of Quora, a community-created, edited, and organized website which helps participants aggregate questions and answers on their topics of interest. There are always some interesting debates for the participants and in fact there are [...]


Funky NoSQL vs SQL

MongoDB is Web Scale; MySQL is Not ACID Compliant (via http://nosql.mypopescu.com/)


Beyond the data clouds

No doubt, cloud computing is hot at the moment. Everyone is jumping onto the bandwagon before it becomes too late for them. Currently data clouds seem to be a major focus for most of the companies and institutions adopting cloud [...]


Graph Processing with Pregel

Graph processing with Pregel is very similar to the MapReduce model, and by design Pregel is well suited for distributed implementations. Pregel is a scalable and fault-tolerant system with an expressive and flexible API.
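To give a feel for the vertex-centric model, here is a minimal single-process sketch in Python. It is not the Pregel API: the function name pregel_max and the dict-based graph are made up for illustration. It only mimics the idea of supersteps, in which each vertex reads the messages sent to it in the previous step, updates its value, and either messages its neighbours or stays silent (votes to halt). The toy task is propagating the maximum value to every vertex.

```python
from collections import defaultdict


def pregel_max(graph, values):
    """Propagate the maximum value to every vertex, superstep by superstep.

    graph:  dict mapping vertex -> list of out-neighbours (hypothetical layout)
    values: dict mapping vertex -> initial numeric value
    Returns the final values and the number of supersteps executed.
    """
    values = dict(values)  # work on a copy, leave the caller's dict alone

    # Superstep 0: every vertex sends its value to its neighbours.
    inbox = defaultdict(list)
    for vertex, neighbours in graph.items():
        for n in neighbours:
            inbox[n].append(values[vertex])
    supersteps = 1

    # Later supersteps: only vertices that received messages are active.
    while inbox:
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            best = max(messages)
            if best > values[vertex]:
                # Value changed: update it and tell the neighbours.
                values[vertex] = best
                for n in graph.get(vertex, []):
                    outbox[n].append(best)
            # Otherwise the vertex sends nothing, i.e. it votes to halt.
        inbox = outbox  # messages are delivered at the next superstep
        supersteps += 1

    return values, supersteps


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    start = {"a": 3, "b": 6, "c": 2, "d": 1}
    final, steps = pregel_max(graph, start)
    print(final, steps)
```

The computation stops when no messages are in flight. In a real distributed system the vertices would be partitioned across workers and the message exchange would happen over the network between supersteps, which this sketch leaves out.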


Where Does the Time Go in a Data Mining Project?

Robert Grossman is one of the prominent thinkers in the field of big data analytics. Recently I was going through his presentation about high performance and distributed data mining, where he points out one of the dirty secrets of deploying [...]


Bloom filters for bioinformatics

The Bloom filter was originally developed by Burton H. Bloom back in the seventies, and for a long time it was there without any major application. Google is credited with making the Bloom filter popular again. Only after Google used Bloom [...]