Research Areas

The overarching goal of our research is to predict and verify the biological function of genes and proteins within an organism by using computation to better harness the information found in diverse biological assays. We approach this broad goal from several perspectives that overlap and complement one another.

Genomic Data Integration

The recent explosion of whole-genome experimental methods and the growing push to make biological datasets publicly available have created a vast but unwieldy repository of raw biological knowledge. Our work in this area pursues methods that combine these diverse data (such as microarrays, two-hybrid assays, affinity precipitation, synthetic lethality, and co-localization) in a way that reflects each source's reliability and biological accuracy. This large-scale integration can then support a variety of tasks in computational biology, including gene and protein function prediction and the identification of biological networks and pathways.
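One standard way to let each source's reliability set its weight is a naive Bayes combination: every assay contributes a log-likelihood ratio, estimated against a gold standard, and these ratios are added to the prior log-odds. The Python sketch below assumes the sources are conditionally independent; the source names, the true- and false-positive rates in `SOURCES`, and the prior are hypothetical illustrations rather than estimates from real data, and the sketch is not meant to describe our specific published method.

```python
import math

# Hypothetical reliability of each source (assumed, not measured):
#   tp = P(source reports a link | genes are functionally related)
#   fp = P(source reports a link | genes are unrelated)
SOURCES = {
    "two_hybrid": {"tp": 0.30, "fp": 0.01},
    "coexpression": {"tp": 0.50, "fp": 0.10},
    "synthetic_lethality": {"tp": 0.10, "fp": 0.001},
}


def log_likelihood_ratio(source: str, observed: bool) -> float:
    """Evidence contributed by one source for a positive or a negative observation."""
    r = SOURCES[source]
    if observed:
        return math.log(r["tp"] / r["fp"])
    return math.log((1 - r["tp"]) / (1 - r["fp"]))


def posterior_related(evidence: dict, prior: float = 0.01) -> float:
    """Naive Bayes posterior that a gene pair is functionally related.

    `evidence` maps source name -> whether that source reported a link;
    sources absent from the dict are treated as unobserved and ignored.
    """
    log_odds = math.log(prior / (1 - prior))
    for source, observed in evidence.items():
        log_odds += log_likelihood_ratio(source, observed)
    return 1 / (1 + math.exp(-log_odds))
```

For example, `posterior_related({"two_hybrid": True, "coexpression": True})` returns roughly 0.6, far above the 1% prior, while negative observations from all three sources push the estimate below the prior. Because a reliable source has a large likelihood ratio, it moves the posterior much more than a noisy one. Correlated sources violate the independence assumption, which motivates models that capture dependencies between data types.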

Microarray Analysis

Among the many recently developed whole-genome assays, microarrays offer an inexpensive and easy way to take a "snapshot" of gene expression levels under a variety of conditions. Although microarrays can shed light on many biological mechanisms, the resulting data present special challenges for analysis: high levels of noise, missing values, and large heterogeneity between protocols and experimental methods all require robust techniques for analysis and visualization.
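Missing values are often handled by k-nearest-neighbour imputation: a missing entry for one gene is estimated from the genes whose expression profiles are most similar on the conditions both have observed. The sketch below is a minimal, unweighted version under stated assumptions: rows are genes, columns are conditions, `None` marks a missing value, distance is the root-mean-square difference over shared columns, and the function name `knn_impute` and the default `k` are illustrative choices.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries with the mean of the k most similar rows (genes).

    Similarity is root-mean-square distance over the columns observed in both
    rows. Returns a new matrix; the input is left unchanged.
    """
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        # Rank every other gene by distance on the shared, observed conditions.
        candidates = []
        for r, other in enumerate(matrix):
            if r == i:
                continue
            shared = [j for j in range(len(row))
                      if row[j] is not None and other[j] is not None]
            if not shared:
                continue
            dist = math.sqrt(sum((row[j] - other[j]) ** 2 for j in shared) / len(shared))
            candidates.append((dist, other))
        candidates.sort(key=lambda c: c[0])
        # For each missing condition, average the closest neighbours that observed it.
        for j in missing:
            values = [other[j] for _, other in candidates if other[j] is not None][:k]
            if values:
                filled[i][j] = sum(values) / len(values)
    return filled


data = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 5.0],
    [10.0, 10.0, 100.0],
]
result = knn_impute(data, k=2)  # result[0][2] == 4.0; the distant fourth gene is ignored
```

Borrowing values from similarly behaving genes, rather than filling with a row or column mean, keeps an outlying profile like the fourth row from distorting the estimate; weighting neighbours by similarity is a common refinement.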

Gene and Protein Function Prediction

Now that sequencing whole genomes is routine, providing the "code" that underlies biological mechanisms, the next key challenge in genomics is to understand how this "code" translates into specific gene function and regulation. Functional genomics aims to determine what genes do (gene function) and how they are controlled inside the cell (regulation). Experimental approaches to these problems have produced an explosion of functional genomics data, but these datasets are large, very noisy, and highly heterogeneous, which makes accurate analysis with existing computational methods extremely difficult. Novel computational methods developed specifically for biological data are essential to realize the potential of functional genomics, and we are developing such methods based on machine learning, statistics, and data mining.

Detection and Analysis of Chromosomal Abnormalities

Chromosomal copy number changes play an important role in cancer and in molecular evolution, and we are developing robust algorithms for accurately identifying chromosomal abnormalities on a genomic scale. In collaboration with biologists at the Lewis-Sigler Institute for Integrative Genomics, we are using these algorithms to study chromosomal abnormalities in the context of molecular evolution and cancer. Results of these experiments may shed light on how chromosomal aberrations are involved in carcinogenesis. Our goal is to develop both technologies that can uncover fundamental biology and methods that can be applied routinely in the clinic to identify medically relevant functional copy number changes.
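To make the task concrete, the simplest possible copy-number caller smooths array log2 ratios (test versus reference) along a chromosome with a running median and then thresholds the smoothed values. This is only an illustrative sketch, not the algorithms described above: the function name, the window size, and the ±0.3 gain/loss thresholds are arbitrary assumptions, and practical methods typically rely on proper segmentation and noise models.

```python
import statistics


def call_copy_number(log2_ratios, window=5, gain=0.3, loss=-0.3):
    """Label each probe 'gain', 'loss', or 'neutral' (hypothetical thresholds).

    Probes are assumed to be ordered by chromosomal position. Each value is
    replaced by the median of a centred window before thresholding, so isolated
    noisy probes are suppressed while runs of shifted probes survive.
    """
    half = window // 2
    calls = []
    for i in range(len(log2_ratios)):
        lo = max(0, i - half)
        hi = min(len(log2_ratios), i + half + 1)
        smoothed = statistics.median(log2_ratios[lo:hi])
        if smoothed >= gain:
            calls.append("gain")
        elif smoothed <= loss:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


# A five-probe amplified region is called as a gain ...
amplified = call_copy_number([0.0] * 5 + [1.0] * 5 + [0.0] * 5)
# ... while a single outlying probe is smoothed away.
outlier = call_copy_number([0.0] * 5 + [2.0] + [0.0] * 5)
```

The running median is chosen here because, unlike a running mean, it ignores a single extreme probe entirely while still following a run of consistently shifted probes.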

Identification of Biological Networks and Pathways

While it is important to understand and predict the function of individual genes, the complete biological story of many genes and proteins is far more complex. Many genes and proteins serve multiple, interacting roles within a cell that are better characterized within the framework of pathways or networks of biological processes. Our work in this area addresses a key question in systems biology: how to integrate the myriad genome-wide data being generated by the research community into meaningful predictions of biological pathways and networks.

Evaluation and Validation of Computational Predictions

In addition to generating predictions of gene function, regulatory interactions, and biological pathways, we recognize the need to validate these predictions, both to evaluate individual methods and to compare the relative performance of different methods. We work with the Gene Ontology Consortium and with curators of biological databases such as SGD to develop standards and methods for the fair evaluation and comparison of prediction methods.
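A basic ingredient of such evaluations is the precision and recall of predicted (gene, term) annotations against a gold standard. In the sketch below, predictions are scored only where the gold standard has an explicit positive or negative, so a prediction about an unannotated pair is not automatically counted as a false positive; this is one way to account for missing annotations often being unknown rather than false. The function `precision_recall` and the toy identifiers are hypothetical and do not describe a Gene Ontology Consortium or SGD procedure.

```python
def precision_recall(predicted, gold_positive, gold_negative):
    """Precision and recall of predicted (gene, term) pairs.

    Only predictions covered by the gold standard (positive or negative) count
    toward precision; recall is measured over all gold-standard positives.
    """
    evaluable = predicted & (gold_positive | gold_negative)
    tp = len(evaluable & gold_positive)
    fp = len(evaluable & gold_negative)
    fn = len(gold_positive - predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy, made-up annotations; ("gene4", "termD") is absent from the gold standard.
predicted = {("gene1", "termA"), ("gene2", "termB"), ("gene3", "termC"), ("gene4", "termD")}
gold_positive = {("gene1", "termA"), ("gene3", "termC"), ("gene5", "termC"), ("gene6", "termA")}
gold_negative = {("gene2", "termB")}
p, r = precision_recall(predicted, gold_positive, gold_negative)  # p == 2/3, r == 0.5
```

Here two predictions are confirmed, one contradicts an explicit negative, and the unannotated pair is ignored, giving precision 2/3; two of the four gold positives are missed, giving recall 1/2.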

Visualization

Effective visualization-based analysis is critical to unlocking the full potential of genomic data and to supporting the collaborative research that is commonplace in genomics. Currently available methods are designed to visualize a single dataset in limited ways and are often hampered by the limited resolution and size of traditional displays. We are developing methods that let experts drive analysis through visualization and iterative feedback. These methods are dynamic and scalable: they can be used on either desktop screens or large wall-size displays, thereby supporting both individual analysis and collaborative analysis by groups of investigators.