X3DNA-DSSR: a resource for structural bioinformatics of nucleic acids
(An NIGMS National Resource supported by NIH grant R24GM153869)

The value of tiny and self-contained software in the big-data era

Nowadays, “big data” and “big science” are hot topics. They all sound good and certainly come about for a reason. Yet, to transform data to information to knowledge to understanding to wisdom, sophisticated software tools are required. The programs can be big and complicated, or small and self-contained, fitting different purposes. As long as they can get the claimed job done in a robust fashion, size should not be a concern.

Over the years, however, I have seen a trend of bloated software with many (fragile) dependencies in bioinformatics. Some tools are so picky and hard to use/maintain that instead of serving, they become sort of a master. As a more representative example, I recently tried to install an open-source software associated with a paper published just a few years ago in a leading journal. The software has only a few dependencies, yet some of them have already become obsolete. I spent hours each time, on Mac OS X and two versions of Ubuntu Linux, but failed to get it running properly (always abort with error messages). The download page hosting the software has been inactive since around the publication of the paper. Presumably, the PhD student or postdoc who wrote the code had left the lab, and with a paper published, all is done!

As an active practitioner of bioinformatics for well over a decade, I can confidently claim to be well above average in familiarity with Linux/Mac OS X and associated shell programming and make etc tools, and various common scripting and compiled programming languages. Yet, once in a while, I get frustrated when I try to download and install a software tool attached to a paper I am interested in. As I see it, the vast majority of software programs from research labs are publication-oriented — as long a paper is published, it is finished.

From my experience, I always see software as engineering. It needs careful design and great attention to meticulous details. A sophisticated piece of scientific software is a combination of science and engineering. Expertise in domain knowledge is a must, and refined skills in computer programming is indispensable. The DSSR program I created and continuously refined over the past three years represents what a scientific software should be in my believe.

Among other unique features, DSSR is tiny (< 1mb), self-contained (without run-time dependencies) and runs on Windows, Mac OS X, and Linux. Getting DSSR up and running should take only minutes by any one with basic familiarity of common computer systems. I have no doubt that the beauty of being small as represented by DSSR will be gradually appreciated by the community.

Comment

« Characterization of H-type pseudoknots with DSSR · DSSR output in JSON format »

Created and maintained by Dr. Xiang-Jun Lu [律祥俊] (xiangjun@x3dna.org) · Powered by Textpattern
Dr. Lu is currently affiliated with the Bussemaker Laboratory at the Department of Biological Sciences, Columbia University.

Thank you for printing this article from http://home.x3dna.org/. Please do not forget to visit back for more 3DNA-related information. — Xiang-Jun Lu

Name	Remember
Email
Website
Message	Textile help

X3DNA-DSSR: a resource for structural bioinformatics of nucleic acids(An NIGMS National Resource supported by NIH grant R24GM153869)

The value of tiny and self-contained software in the big-data era

Comment

X3DNA-DSSR: a resource for structural bioinformatics of nucleic acids
(An NIGMS National Resource supported by NIH grant R24GM153869)