BioInform has a report (subscription required, or get it from the Google cache) on a special session on scientific publishing organized by Scott Markel at ISMB. Since that is a topic of particular interest to bbgm and bbgm readers, I thought I’d give it a read and find out if there were any new insights there.
I won’t delve into all the specifics, but it is clear that there is a desire in the community to make scientific publications as machine readable as possible. As has been discussed many times here, scientific research is a treasure trove of information, and the content contained within papers and other forms of publishing should become part of the data that we try to mine and correlate. In the long term it’s not just the text but, as pointed out in the article, images and other datasets as well.
Mark Gerstein in particular brings up some points which make a ton of sense, and to a degree are somewhat obvious. That we still need to debate them is the part that makes me frown. Wouldn’t it be nice if we could take some of those ideas and thoughts for granted? For example:
Gerstein favors the idea of linking databases and journal articles so that scientists can track a given gene annotation in a database back to the published paper.
The part that people don’t seem to grasp, or at least the part that didn’t jump out at me, is that for text-based documents we already have a database: the world wide web. Via DOIs and other identifiers, each paper is an addressable resource. Given the right structure and the right APIs, it’s a data mashup waiting to happen. Or, if you structure things the right way and want access to a nice n-tuple data store, use something like Freebase or Talis as a backend platform.
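To make that a little more concrete, here is a minimal sketch of the idea in Python. It is only an illustration, not how any particular database does it: the gene identifier, the predicate names, the DOI, and the `papers_for` helper are all made up. The one real piece is the public `https://doi.org/` resolver, which turns a DOI into an addressable URL.

```python
from urllib.parse import quote

# Public DOI resolver; a DOI appended to this prefix becomes a URL.
DOI_RESOLVER = "https://doi.org/"

# A toy tuple store: (subject, predicate, object).
# Every identifier and predicate below is hypothetical.
TRIPLES = [
    ("gene:EXAMPLE1", "annotated_as", "hypothetical kinase"),
    ("gene:EXAMPLE1", "supported_by", "doi:10.1000/example.123"),
    ("gene:EXAMPLE2", "annotated_as", "hypothetical transporter"),
]


def doi_to_url(doi):
    """Turn a bare DOI into a resolvable URL, escaping unsafe characters."""
    return DOI_RESOLVER + quote(doi, safe="/")


def papers_for(gene, store):
    """Return the URLs of papers that back the annotations of one gene."""
    prefix = "doi:"
    return [
        doi_to_url(obj[len(prefix):])
        for subj, pred, obj in store
        if subj == gene and pred == "supported_by" and obj.startswith(prefix)
    ]


if __name__ == "__main__":
    for url in papers_for("gene:EXAMPLE1", TRIPLES):
        print(url)
```

The point is just that once the annotation and the paper share an identifier scheme, walking from one to the other is a lookup, whether the store is a list of tuples or something like Freebase or Talis.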
But I agree with Matt Cockerill as well. We need better authoring tools, and authoring tools need standards for markup and structure. I am not sure where this will come from or how it will be implemented. Unfortunately, we all still write our papers in Word. The LaTeX crowd, or the HTML crowd, would probably be happy providing some markup (it doesn’t have to be too heavy).
I won’t even debate open access vs. closed access. In my mind it is no longer a debate. It is good to hear publishers talking about online journals, and about integrating other media formats into publications.
Being able to quickly update gene annotations, for example, would be useful.