Big data, Big challenges
Image by Photo Blog 0001 via Flickr
The big question is whether the person on the other side of that machine will have the wherewithal to do something interesting with an almost limitless supply of genetic information.
Well, those are not the only problems we have: most of our freshly minted university graduates are not prepared to face this kind of data deluge. Why?
For the most part, university students have used rather modest computing systems to support their studies. They are learning to collect and manipulate information on personal computers or what are known as clusters, where computer servers are cabled together to form a larger computer. But even these machines fail to churn through enough data to really challenge and train a young mind meant to ponder the mega-scale problems of tomorrow.
I guess this is something we cannot blame on students alone. A lack of resources and of exposure to new technologies is one of the reasons. To tackle this issue, Google and IBM are now promoting Internet-scale research at places like the University of Washington and Purdue by giving students wide access to their powerful computational infrastructure. The idea is to encourage students to churn through data with the help of open-source tools like Hadoop, which is used for processing Internet-scale data sets. Hadoop is an open-source implementation of MapReduce, a software framework introduced by Google to support distributed computing on mega-scale data sets across clusters of computers. By the start of 2008, Google was processing over 20 petabytes of data per day through an average of 100,000 MapReduce jobs spread across its massive computing clusters, which gives a glimpse of Google’s Internet-scale capabilities. In a similar initiative to promote learning about cloud-based distributed computing, Amazon Web Services (AWS) is providing its on-demand infrastructure free of charge for educational purposes.
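To make the MapReduce idea concrete, here is a minimal single-machine sketch in Python. It is not Hadoop's or Google's API; the function names (`map_kmers`, `shuffle`, `reduce_counts`, `run_job`) and the toy DNA reads are assumptions made up for illustration. It only shows the three conceptual steps (map, group by key, reduce) that a real framework would distribute across a cluster.

```python
from collections import defaultdict


def map_kmers(read, k=3):
    """Map step: emit a (k-mer, 1) pair for every length-k substring of one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle step: group emitted values by key.

    A real framework does this between map and reduce, across machines.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: combine all values emitted for one key into a total."""
    return key, sum(values)


def run_job(reads, k=3):
    """Run map, shuffle and reduce in sequence on one machine (illustration only)."""
    mapped = [pair for read in reads for pair in map_kmers(read, k)]
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical toy input; real jobs would read from a distributed file system.
    reads = ["GATTACA", "TTACAGA"]
    for kmer, count in sorted(run_job(reads).items()):
        print(kmer, count)
```

Because each call to `map_kmers` depends only on its own read, and each call to `reduce_counts` only on its own key, both steps can in principle run on many machines at once, which is what makes the model suitable for very large data sets.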
So far we have talked about the next generation of data coming out of high-throughput technologies in different scientific disciplines, and we all agree that it will have a great impact on research infrastructure, research funding and beyond (if and only if it is managed properly). Furthermore, this data will need to be annotated with metadata, then archived and curated. Each of these is a mammoth task, which means the focus should be not only on one-time analysis but also on future reusability and interoperability.
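As a rough illustration of what "annotated with metadata, then archived" could look like in practice, here is a small Python sketch. The record fields (`dataset_id`, `description`, `file_format`, `sha256`, `keywords`) are hypothetical and do not follow any particular metadata standard; the checksum is included only as one common way to let a future reader verify that an archived file is unchanged.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical metadata record stored alongside an archived data file."""
    dataset_id: str
    description: str
    file_format: str
    sha256: str
    keywords: list = field(default_factory=list)


def annotate(dataset_id, description, file_format, payload, keywords=None):
    """Build a record for the raw bytes in `payload`, including a SHA-256 checksum."""
    digest = hashlib.sha256(payload).hexdigest()
    return DatasetRecord(dataset_id, description, file_format, digest, list(keywords or []))


def to_json(record):
    """Serialize the record to JSON so that other tools can read it later."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


if __name__ == "__main__":
    # Toy payload standing in for a real sequencing file.
    data = b">read1\nGATTACA\n"
    record = annotate("demo-001", "Toy sequencing run", "fasta", data, ["demo"])
    print(to_json(record))
```

Storing the description in a plain, widely readable format such as JSON is one simple choice that supports the reusability and interoperability goals mentioned above.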
In the following video, Roger Magoulas (Director of Research at O’Reilly) talks about Big Data in general, gives a glimpse into future technologies, and offers general advice to organizations interested in improving their proficiency in handling web-scale data.
9 Responses to “Big data, Big challenges”
-
[...] complicated. Science is constantly shifting landscape, both in terms of data type and quantity. We are s... abhishek-tiwari.com/2010/03/put-some-breathe-life-in-your-papers.html
-
[...] future scale computing Forget about the Internet scale, seriously that is not enough for Google. Googl... abhishek-tiwari.com/2009/10/googles-future-scale-computing.html

I have a slide in many of my decks that says “Data Management is NOT Data Storage”.
My 2 cents: creativity in science communication was never as relevant as it is now, when we are facing the problem of filter failure. Although I am not a big fan of impact factors (IF), it sounds like the visual impact of journals can be highly correlated with their IF. In terms of illustration appeal, journals like Nature, Cell, PNAS and PLoS are way ahead of their counterparts in other areas.
This comment was originally posted on Fisheye Perspective