Research Areas
The overarching goal of our research is to predict and verify the biological function of genes and proteins within an organism by utilizing the power of computation to better harness the information found in diverse biological assays. We approach this broad goal from several angles that overlap and complement one another.
Genomic Data Integration
The recent explosion of whole-genome testing methodologies and the increasing push to make biological datasets publicly available have created a vast but unwieldy repository of raw biological knowledge. Our work in this area pursues methods that combine these various data (such as microarrays, two-hybrid assays, affinity precipitation, synthetic lethality, co-localization, etc.) in a manner that reflects each data source's reliability and biological accuracy. This large-scale integration of various data sources can then be used for a variety of tasks in computational biology, including gene/protein function prediction and identification of biological networks and pathways.
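As a rough illustration of reliability-weighted integration, the sketch below combines binary evidence from several assays with a naive Bayes update. It is a simplified stand-in and not the framework described in the publications listed here; the assay names, the reliability values, and the prior are all hypothetical numbers chosen for the example.

```python
import math

# Hypothetical reliabilities (assumed values, not measured ones):
# assay -> (P(evidence | functionally related), P(evidence | unrelated))
RELIABILITY = {
    "two_hybrid": (0.30, 0.05),
    "coexpression": (0.60, 0.30),
    "synthetic_lethality": (0.20, 0.01),
}


def posterior_related(observations, prior=0.01, reliability=RELIABILITY):
    """Return P(related | observations) for one gene pair.

    observations maps an assay name to True (evidence seen) or False
    (assay performed, no evidence). Assays that were not performed are
    simply left out. Assays are treated as conditionally independent,
    which is the naive Bayes assumption.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for assay, seen in observations.items():
        p_related, p_unrelated = reliability[assay]
        if seen:
            log_odds += math.log(p_related / p_unrelated)
        else:
            log_odds += math.log((1.0 - p_related) / (1.0 - p_unrelated))
    return 1.0 / (1.0 + math.exp(-log_odds))
```

In this sketch an assay with a high ratio of true to false positive rates (here the assumed synthetic lethality values) shifts the posterior far more than a noisy one, which is one simple way for an integration to reflect each source's reliability.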
- Myers et al. Discovery of biological networks from diverse functional genomic data. Genome Biology 6(13):R114, 2005.
- Troyanskaya et al. A Bayesian framework for combining heterogeneous data sources for gene function prediction (in S. cerevisiae). Proc Natl Acad Sci USA 100(14): 8348-53, 2003.
Microarray Analysis
Among the many recently developed whole-genome biological assays, microarrays are an inexpensive and easy method to take a "snapshot" of expression levels under a variety of conditions. While microarrays have the ability to shed light on a variety of biological mechanisms, the resulting data present special challenges for analysis. High levels of noise, missing values, and substantial heterogeneity among protocols and experimental methods require robust techniques for analysis and visualization.
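To make the missing-value problem concrete, here is a minimal sketch of nearest-neighbor imputation on a small genes-by-conditions matrix. It is an illustrative simplification under stated assumptions (missing entries are marked with None, distance is root mean squared difference over shared conditions, and the default k is arbitrary), not a reproduction of any published implementation.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries using the k most similar genes (rows).

    matrix is a list of rows, one per gene, with one value per condition.
    For each missing entry, the k rows closest to the incomplete row that
    have a value in that column are averaged.
    """
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_index, other in enumerate(matrix):
                if other_index == i or other[j] is None:
                    continue
                # Compare only conditions measured in both genes.
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                distance = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((distance, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            if nearest:
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled
```

Entries with no usable neighbor are left as None rather than guessed, so downstream code can still tell which values could not be estimated.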
- Hibbs et al. Visualization Methods for Statistical Analysis of Microarray Clusters. BMC Bioinformatics 6: 115, 2005.
- Myers et al. Accurate detection of aneuploidies in array CGH and gene expression microarray data. Bioinformatics 20(18): 3533-3543, 2004.
- Whitfield et al. Systemic and cell type-specific gene expression patterns in scleroderma skin. Proc Natl Acad Sci USA, 100(21):12319-24, 2003.
- Chi et al. Endothelial cell diversity revealed by global expression profiling. Proc Natl Acad Sci USA, 100(19):10623-8, 2003.
- Chen et al. Variation in gene expression patterns in human gastric cancers. Mol Biol Cell, 14(8):3208-15, 2003.
- Bohen et al. Variation in gene expression patterns in follicular lymphoma and the response to rituximab. Proc Natl Acad Sci USA, 100(4): 1926-30, 2003.
- Leung et al. Phospholipase A2, Group IIA expression in gastric adenocarcinoma is associated with prolonged survival and less frequent metastasis. Proc Natl Acad Sci USA, 99(25):16203-8, 2002.
- Troyanskaya et al. Nonparametric methods for identifying differentially expressed genes in microarray data. Bioinformatics 18: 1454-61, 2002.
- Troyanskaya et al. Missing value estimation methods for DNA microarrays. Bioinformatics 17: 520-525, 2001.
Gene and Protein Function Prediction
Now that sequencing the genomes of whole organisms is routine, providing the "code" underlying biological mechanisms, the next key challenge in genomics is to understand the translation from this "code" to specific gene function and regulation. Functional genomics aims to determine what these genes do (gene function) and how they are controlled inside the cell (regulation). Experimental approaches to these problems have led to an explosion of functional genomics data, but these datasets are large, very noisy, and highly heterogeneous, making accurate analysis by existing computational methods impossible. Novel computing methodologies developed specifically for biological data are essential to realize the potential of functional genomics. We are developing such methodologies based on machine learning, statistical, and data mining techniques.
- Barutcuoglu et al. Hierarchical Multi-label Prediction of Gene Function. Bioinformatics, 2006.
- Myers et al. Discovery of biological networks from diverse functional genomic data. Genome Biology 6(13):R114, 2005.
- Troyanskaya et al. A Bayesian framework for combining heterogeneous data sources for gene function prediction (in S. cerevisiae). Proc Natl Acad Sci USA 100(14): 8348-53, 2003.
Detection and Analysis of Chromosomal Abnormalities
Chromosomal copy number changes play an important role in cancer and in molecular evolution, and we are developing robust algorithms for identifying chromosomal abnormalities accurately on a genomic scale. In collaboration with biologists at the Lewis-Sigler Institute for Integrative Genomics, we are using these algorithms to study chromosomal abnormalities in the context of molecular evolution and cancer. Results of these experiments may shed light on how chromosomal aberrations are involved in carcinogenesis. Our goal is to develop both technologies that can uncover fundamental biology and methods that can be routinely applied clinically to identify medically relevant functional copy number changes.
- Myers et al. Accurate detection of aneuploidies in array CGH and gene expression microarray data. Bioinformatics 20(18): 3533-3543, 2004.
- Myers et al. Visualization-based discovery and analysis of genomic aberrations in microarray data. BMC Bioinformatics 6: 146, 2005.
Identification of Biological Networks and Pathways
While it is important to understand and predict the function of individual genes, the more complete biological story of many genes and proteins is much more complicated. Many genes/proteins serve multiple, interacting roles within a cell that can better be characterized within the framework of pathways or networks of biological processes. Our work in this area addresses a key issue in systems biology research: how to integrate the myriad genome-wide datasets being generated by the research community into meaningful biological pathway and network predictions.
- Myers et al. Discovery of biological networks from diverse functional genomic data. Genome Biology 6(13):R114, 2005.
Evaluation and Validation of Computational Predictions
In addition to generating predictions of gene function, regulatory interactions, and biological pathways, we recognize the need to validate these predictions in order to evaluate individual methods and compare performance across methods. We work in collaboration with the Gene Ontology Consortium and the curators of biological databases, such as SGD, to develop standards and methods that can be used for fair evaluation and comparison of prediction methods.
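One simple way to compare prediction methods against a shared standard is to rank each method's predictions by score and trace precision and recall down the ranking. The sketch below assumes a hypothetical gold standard given as a set of known positives and predictions given as (item, score) pairs; it illustrates the general idea only and is not the evaluation protocol developed in the work listed here.

```python
def precision_recall_curve(scored_predictions, gold_positives):
    """Return a list of (precision, recall) points, one per rank cutoff.

    scored_predictions: iterable of (item, score); higher score means
    more confident. gold_positives: set of items known to be true.
    """
    if not gold_positives:
        raise ValueError("gold standard must contain at least one positive")
    ranked = sorted(scored_predictions, key=lambda pair: pair[1], reverse=True)
    curve = []
    true_positives = 0
    for rank, (item, _score) in enumerate(ranked, start=1):
        if item in gold_positives:
            true_positives += 1
        precision = true_positives / rank
        recall = true_positives / len(gold_positives)
        curve.append((precision, recall))
    return curve
```

Running two methods through the same function with the same gold standard gives curves that can be compared directly, which is the property a fair comparison needs; the choice of gold standard itself remains the harder question.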
- Sealfon RSG et al. GOLEM: an interactive graph-based gene ontology navigation and analysis tool. BMC Bioinformatics 2006, 7:443.
- Myers et al. Finding function: evaluation methods for functional genomic data. BMC Genomics 2006, 7:187.
- Reguly et al. Comprehensive Curation and Analysis of Global Interaction Networks in Saccharomyces cerevisiae. Submitted to Journal of Biology, 2005.
- Brown et al. Global analysis of gene function in yeast by quantitative phenotypic profiling. Accepted to Molecular Systems Biology, 2005.
Visualization
Effective visualization-based analysis is critical to unlocking the full potential of genomic data and to supporting the collaborative research that is commonplace in genomics. Currently available methods are designed to visualize a single dataset in limited ways and are often hampered by the limited resolution and size of traditional displays. We are developing methodologies that enable experts to drive analysis through visualization and iterative feedback. These methods are dynamic and scalable: they can be used on either desktop screens or large wall-size displays, thereby supporting both individual analysis and collaborative analysis by groups of investigators.
- Sealfon RSG et al. GOLEM: an interactive graph-based gene ontology navigation and analysis tool. BMC Bioinformatics 2006, 7:443.
- Myers et al. Finding function: evaluation methods for functional genomic data. BMC Genomics 2006, 7:187.
- Myers et al. Discovery of biological networks from diverse functional genomic data. Genome Biology 6(13):R114, 2005.
- Wallace et al. Tools and Applications for Large Scale Display Walls. Special Issue on Large Format Displays, IEEE Computer Graphics and Applications. July/August 2005.
- Myers et al. Visualization-based discovery and analysis of genomic aberrations in microarray data. BMC Bioinformatics 6: 146, 2005.
- Hibbs et al. Visualization Methods for Statistical Analysis of Microarray Clusters. BMC Bioinformatics 6: 115, 2005.
- Li et al. Dynamic Scalable Visualization for Collaborative Scientific Applications. IPDPS 2005 Workshop on Next Generation Software Proceedings. 2005.