Computational Computer Science

by Jessica Collins

When you think of a chemistry or biology lab, you likely imagine a group of scientists performing experiments that involve microscopes and test tubes, putting in a lifetime of work to uncover small details of the natural world. While this is the case in most labs, nowadays there are many that are not only more technologically advanced but also involve no lab-bench experiments at all. Bioinformatics, or “dry lab biology” as it has been nicknamed, is an emerging field that involves no hands-on experimenting, just a high degree of computer science knowledge. The scientists, who frequently have backgrounds in information technology, write statistical algorithms both to simulate complex biological systems and to analyze the large amount of data that has already been published. This new method of discovery, which draws on databases of available information, has produced a vast number of advances in medicine.

The Pittsburgh Supercomputing Center (PSC), a group at the University of Pittsburgh and Carnegie Mellon University, received a $7.6 million grant from the National Science Foundation in October of this year to fund a project called the Data Exacell. The goal of the Data Exacell is to create a prototype of next-generation technology that closely couples analytical resources with storage technology. The project builds upon the Data Supercell, a data storage system with the world’s largest shared memory that was developed two years ago. Right now, one of the major issues in bioinformatics is retrieving and using specific information from large computer databases. With so much information available to the computer, it is difficult for scientists to pick out the specific pieces they want to use for their research. “High-performance computing used to be all about solving partial differential equations, now it’s much more about how you move information around,” said Nick Nystrom, PSC’s Director of Strategic Applications. The Data Exacell project will be tested using selected data-intensive research projects in diverse fields of science such as biology, astronomy and computer science.

The PSC is involved in a wide range of projects. One in particular, Blacklight, involves next-generation gene sequencing to examine the genomes of various organisms at high speeds. This computer system operates at much faster rates than traditional DNA sequencers by working with short sequence reads. Another PSC project involves connectomics, a field of neuroscience that uses electron microscopy to image sections of the brain, which are then reassembled in a computer database into a three-dimensional image. This year the Nobel Prize in Chemistry was awarded to three scientists, Martin Karplus, Michael Levitt, and Arieh Warshel, for developing computer models that uncover chemical processes by combining classical and quantum physics. The three Nobel Prize winners utilized the PSC systems to collaborate on their research.

With decades’ worth of published information, collaboration among research groups across the globe is more important than ever for advancing the fields of science and medicine. While complex data storage systems already exist, using that information effectively is an issue that is only now being addressed. The PSC plays a key role in developing technology that allows for such collaboration. “We are sort of a service organization for the community, providing scientists with infrastructure and the tools they need to do their work, whatever those may be,” said Markus Dittrich, Ph.D., head of the PSC’s National Resource for Biomedical Supercomputing.

For the scientific community, this type of technology means a more efficient method for both conducting and analyzing research. For the general public, this could lead to better drug treatment options, the possibility of gene sequencing, and doctors with a broader spectrum of readily accessible information. Students interested in the fields of biology and chemistry will likely need to develop a background in computer science to make an impact in research fields in the future. “I think what’s important for students to realize right now is that a lot of boundaries between the fields have broken down and you shouldn’t just think of yourself as a biologist. It’s good to acquire skills outside of that range,” said Dittrich.

The PSC is currently working with Pitt’s World History department to create a web-based interface that will allow its researchers to conduct data searches in real time. This application can extend across multiple fields of academia. With these developments in technology, science has huge potential to go beyond the limits of “wet lab” experimentation.