Chemical Open Source For Free: How far it really matters

In a recent post, Peter Murray-Rust applauds that “Chemical software will be Open Source” and claims that “the simple future is inevitable”. I agree that open source is long overdue in this particular area. Community projects such as CDK, Taverna and KNIME are pushing open source in chemistry to the next level, where both academia and industry are making the most of it. One might ask: if everything is free and open source, how can software vendors be commercially viable? For several reasons, open source chemistry is more viable for the consumers of the software than for its producers. If they adopt the open source development and distribution model, companies can move to a service-oriented business model by offering component integration, consulting and support services, but the problem of a small user base persists. Unlike general open source offerings such as the various flavors of Linux and open source databases, which have a huge consumer base, the consumer base in chemistry and biology is unfortunately quite constrained and consists mostly of pharmaceutical companies and academia.
Academia already enjoys almost everything for free: most chemoinformatics toolkits (JChem, OEChem), workflow solutions (Pipeline Pilot) and many other commercial packages are freely available for academic use. At the same time, finding a commercial customer for open source software services in chemistry or biology is an arduous task. Pharmaceutical companies maintain huge bio-IT departments, and some of them have created a back door to exploit cheap and free software through their academic collaborations. Peter argues that anything offered to academia should be free while industry should be charged for it; does this include academic labs with industry connections? I don’t understand why academia, or anyone, should be offered such a liberty, especially when they are working for their industrial collaborators.
Further, in my opinion, Open Source and Free should not be equated with low or zero initial cost. Unlike academia-hosted community projects, which rely on the cheap labor of PhD students and postdocs, the transition to open source is never easy for commercial vendors, and they need to find a competitive way to survive. In either case, it is entirely justified for software producers to charge you for their time and resources.

19 Responses to “Chemical Open Source For Free: How far it really matters”
  1. 06.03.2009

    Not a direct response to your post, but I wanted to point out that the whole commercial software ecosystem is not that simple. For one, university tech transfer offices have a pretty strong hand in the models chosen. Second, the biology and chemistry communities have adopted very different approaches. Having worked at companies that have profited (although scientific software and profits are not words that go well together) from academic research, I think the models need to fundamentally change. In the biology world, especially informatics, most software is open source, and while there are service and value-add models around that software, the real money is in data management and in software that supports post-discovery science, where compliance, etc., become important issues. I do not think anyone can make money in basic bioinformatics, given the availability of good open source software. In chemistry the model has been different, and for whatever reason, there is not that much free software and too much of it is locked up in arcane…

  2. 06.03.2009

    Indeed, my opinion is quite simplistic, but I agree that the whole open source model is quite different for biology and chemistry. When it comes to biology, industry has adopted the best tools and techniques from the open source community and integrated them into its product lines. For example, Pipeline Pilot and InforSense provided support for BioPerl and BioJava long ago, but they never did the same with CDK or others. The reason is clear: there was never a good commercial offering in the bio toolkit segment, whereas in chemistry toolkits Daylight, ChemAxon and OpenEye came up with far superior products. Part of the problem is the quality of the software that open source chemistry is delivering.

  3. 06.04.2009

    Abhishek, can you back up the statement that commercial offerings are significantly better than, or even superior to, open source cheminformatics tools? I think I missed that publication…

  4. 06.04.2009

    Seriously, surely they are better in support; that is what you pay them for. But I have not seen any writing showing that their algorithms are better at making drugs, predicting properties, …

  5. 06.04.2009

    Chemical Open Source For Free: How far it really matters: In a recent post Peter Murray-Rust applauds that “Chem.. http://tinyurl.com/q6rlbh

  6. 06.04.2009

    Chemical Open Source For Free: How far it really matters- by … http://bit.ly/OchUk

  7. 06.04.2009

    My statement cannot be supported by any publication; however, there are some advantages to using commercial offerings. Don’t get me wrong, but even for simple algorithms like ALogP there is a difference. In my opinion, though, the key issue is how they handle large-scale data sets, i.e. data cartridges. Open source efforts have great collective potential, but they are quite scattered. Take CDK, MX, OpenBabel, pgchem::tigress, JOELib2 and ChemSQL: together they make a great collection, but on its own none of them is sufficient to create a really powerful chemoinformatics application.

  8. 06.04.2009

    That’s a support issue. *Not* a problem with the libraries. This sounds like a request for a consultant to glue stuff together for you. The fact that you feel such glue is missing (btw, this is exactly what Bioclipse is solving) does not make tools like OpenBabel or the CDK inferior. Also, ALogP being better in other tools… you cannot back that up either? If we had enough OpenData, that would have been easy to quantify…

  9. 06.04.2009

    Anyway… don’t worry about how easy you can use your black box, worry about how well you can validate the box you use. Open Source cheminformatics offers you transparent boxes. That’s what is important, not the price tag.

  10. 06.04.2009

    I have to disagree: CDK + OB + Postgres etc. can lead to powerful cheminformatics applications. As Egon pointed out, it’s mainly a matter of glue. While Bioclipse is certainly one sort of glue, there’s scope for other forms as well. At the same time, I will say that OSS cheminformatics does have holes: depending on the specific toolkit, one might need to employ another toolkit to fill them. This makes broader usage of OSS toolkits problematic.

  11. 06.04.2009

    Which brings in the interoperability promoted by the Blue Obelisk. Blue Obelisk members generally try (or tried) to keep the various tools non-overlapping, though code is increasingly duplicated simply because a different programming language or license is more convenient; Jmol, for example, has the same UFF code as OpenBabel. Still, it is indeed common to use several different open source cheminformatics tools.

  12. 06.04.2009

    Abhishek, as you seem to be a demanding end user (meant in the most positive way possible), maybe you can blog about the holes you see, and then the Blue Obelisk (and the rest of the open source cheminformatics) community can respond, fill the gaps, or point to existing tools? Yes, this support comes for free… no entrance fees or yearly contracts required.

  13. 06.04.2009

    Abhishek, that was kind of my point. In chemistry, you can make money from the underlying tools, e.g. Glide, but in biology, esp. informatics, you have to go up the stack and focus on pipelining and data management (noting that Pipeline Pilot’s origins are completely in cheminformatics).

  14. 06.04.2009

    Egon, I am already in the process of compiling a comparison table of all chemoinformatics toolkits, and will blog about it soon. Also, regarding support from the community, I must say that even commercial products have developed very strong communities around them. Whether a product is commercial or open, I don’t expect support from the producer but from the user group. And that is what really matters.

  15. 06.04.2009

    Cheminformatics does face challenges though, and I am not sure why. The same organization will primarily use an open source bioinformatics stack, but a proprietary cheminformatics stack with crazy licensing with only a few OSS tools in regular play, but that’s what Egon etc are trying to change.

  16. 06.04.2009

    And the reason for this, IMO, is that when companies saw the need for bioinformatics tools, they were able to find a comprehensive OSS stack. OSS cheminformatics had a much later start and hence has to prove itself to a higher degree. Another aspect I think plays a role: there are *many* more people able/willing to write bioinformatics code than cheminformatics code. (But of course this is a vicious cycle.)

  17. 06.04.2009

    Egon, wrt interoperability: that’s nice. But the fact is that if I’m using a low-level toolkit, I expect it to provide all the core components without having to integrate another toolkit. Yes, ideally, all OSS toolkits would have the same/similar interfaces, APIs etc. But that’s an unrealistic expectation IMO. Right now, if somebody uses the CDK and needs stereochemistry support, they are out of luck and will likely shift to OB. But if they’re a Java shop, they’d likely rather switch to JChem.

  18. 06.04.2009

    Rajarshi, I don’t disagree. Bioinformatics codes are a lot more generalized from a computer science point of view, and a number of people came from there. But even culturally, chemistry-oriented codes (I put MD in the same category) have often not been free for academics either. They’ve had source available, especially to academics, but it’s not an open source model. Quantum chemistry codes, which pre-date any bioinformatics apps, have almost never had a proper OSS model, with a combination of commercial codes (Gaussian) and not exactly easy-to-use free ones (GAMESS).

  19. 06.04.2009

    I have addressed some of your points in:
    http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=2057