HumanBase: data-driven predictions of gene expression, function, regulation, and interactions in human

HumanBase applies machine learning algorithms to learn biological associations from massive genomic data collections. These integrative analyses reach beyond existing "biological knowledge" represented in the literature to identify novel, data-driven associations.

The Human Nephrogenesis Atlas

A spatial transcriptomic map for human nephrogenesis to study development and disease at a single-cell and network level within an anatomic framework.

AMBER: toolkit for designing high-performance neural network models

AMBER is a toolkit for automatically designing high-performance neural network models in genomics and bioinformatics. Deep convolutional neural network models designed by AMBER outperform equivalent models built without neural architecture search, including published models designed by experts.

FENRIR: tissue-specific enhancer functional networks for associating distal regulatory regions to disease

FENRIR integrates tissue-specific enhancer networks with disease GWAS or genes and reprioritizes ~48,000 enhancers.

DeepArk: deep learning models of regulatory activity for model species

DeepArk is a set of deep learning algorithms capable of predicting regulatory activity (e.g. transcription factor binding) from genomic sequences. DeepArk consists of four distinct neural networks for mouse (Mus musculus), fly (Drosophila melanogaster), worm (Caenorhabditis elegans), and zebrafish (Danio rerio).

Antigen Explorer: antigen combinations for precision cancer recognition

Antigen Explorer is an interactive resource for browsing antigen combinations for more precise tumor recognition. Leveraging expression data from TCGA and GTEx, the discrimination potential of all possible combinations of surface antigens was scored for 33 tumor types. Users can explore the top predictions and make interactive plots to evaluate an antigen pair against normal tissue cross-reactivity.

ExPecto: tissue-specific gene expression effect prediction for human mutations

ExPecto is a framework for ab initio, sequence-based prediction of the gene expression effects and disease risks of mutations.

ALZ: Exploring functional networks and expression of neuron types related to Alzheimer's disease

Alzheimer's disease, as with most other neurodegenerative diseases, is characterized by selective neuronal vulnerability, where some types of neurons are particularly susceptible to the disease, while others are resistant. Here, we provide genome-wide functional networks in human for 7 neuron types, including the vulnerable neurons from the entorhinal cortex layer II and hippocampus CA1, as well as resistant neurons from the hippocampus CA2, CA3, dentate gyrus, primary visual cortex, and primary somatosensory cortex. Our webserver also provides a portal to explore the mouse cell-type-specific gene expression profiles for these 7 neuron types across the lifetime of the healthy mouse (5, 12, 24 months).

Selene: library for deep-learning-based sequence models

Selene is an open source, PyTorch-based library for developing deep-learning-based sequence models. The library is supported by a command-line interface that allows users to easily train and evaluate new models with minimal code. Models developed in our group (e.g. DeepSEA, SeqWeaver, DeepArk) can be applied to make predictions about new variants and sequences using Selene.

Seqweaver: deep-learning framework for predicting the RNA-binding protein dysregulation

Seqweaver is a deep learning-based algorithmic framework for predicting the RNA-binding protein dysregulation effects of sequence alterations with single nucleotide sensitivity.

DeepSEA: Deep learning-based algorithmic framework for predicting chromatin effects

DeepSEA is a deep learning-based algorithmic framework for predicting the chromatin effects of sequence alterations with single nucleotide sensitivity. DeepSEA can accurately predict the epigenetic state of a sequence, including transcription factor binding, DNase I sensitivity, and histone marks in multiple cell types. It can be further utilized to predict the chromatin effects of sequence variants and prioritize regulatory variants.

DeepSEA is now also integrated into HumanBase.

ASD: Genome-wide predictions of Autism Spectrum Disorder-associated genes

Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by deficits in social communication and restricted, repetitive patterns of behavior. ASD has a strong genetic basis, but we still lack the full complement of autism-associated genes.

Here we present a genome-wide prediction of autism-associated genes based on known disease genes in the context of a human brain-specific gene interaction network.

URSAHD: Unveiling RNA Sample Annotation for Human Diseases

URSA (Unveiling RNA Sample Annotation), originally released in 2013, simultaneously estimates the probabilities that a given sample is associated with a particular tissue or cell type. Individual cell-type models were constructed from more than ten thousand manually curated samples from GEO and then aggregated using Bayesian Correction. This method has been shown to be effective for both array-based and sequence-based genome-scale experiments.

Now, in addition to tissues and cell types, URSAHD (Unveiling RNA Sample Annotation for Human Diseases) also measures hundreds of disease-specific signatures in a single gene expression profile. Each disease-specific model (i.e. a set of gene weights) was computed based on thousands of clinical samples from GEO.

GIANT: Genome-scale Integrated Analysis of gene Networks in Tissues

We present genome-wide functional interaction networks for 144 human tissues and cell types developed using a data-driven Bayesian methodology that integrates thousands of diverse experiments spanning tissue and disease states. Tissue-specific networks predict lineage-specific responses to perturbation, identify the changing functional roles of genes across tissues and illuminate relationships among diseases. We introduce NetWAS, which combines genes with nominally significant genome-wide association study (GWAS) P values and tissue-specific networks to identify disease-gene associations more accurately than GWAS alone. Our webserver, GIANT, provides an interface to human tissue networks through multi-gene queries, network visualization, analysis tools including NetWAS and downloadable networks. GIANT enables systematic exploration of the landscape of interacting genes that shape specialized cellular functions across more than a hundred human tissues and cell types.

GIANT is now also integrated into HumanBase.

YETI: Your Evidence Tailored Integration

Our method relies on a library of 237 Bayesian functional networks, each capturing the biology of a particular pathway or process. In aggregate, this functional library maps the entire human functional landscape. These genome-wide networks were constructed by integrating the growing public data compendium, currently including 35,300 experiments.

Based on the user's dataset, YETI selects relevant pathway/process networks by solving a global regularized linear regression problem. The resulting dataset-specific functional network represents an unbiased study of the user’s experiment that leverages the collection of public biological data.

IMP: Integrative Multi-species Prediction

IMP (Integrative Multi-species Prediction), originally released in 2012, is an interactive web server that enables molecular biologists to interpret experimental results and to generate hypotheses in the context of a large cross-organism compendium of functional predictions and networks. The system provides biologists with a framework to analyze their candidate gene sets in the context of functional networks, expanding or refining their sets using functional relationships predicted from integrated high-throughput data. IMP 2.0 integrates updated prior knowledge and data collections from the last three years in the seven supported organisms (Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Danio rerio, Caenorhabditis elegans, and Saccharomyces cerevisiae) and extends function prediction coverage to include human disease. IMP identifies homologs with conserved functional roles for disease knowledge transfer, allowing biologists to analyze disease contexts and predictions across all organisms. Additionally, IMP 2.0 implements a new flexible platform for experts to generate custom hypotheses about biological processes or diseases, making sophisticated data-driven methods easily accessible to researchers.

FNTM: Functional Networks of Tissues in Mouse

Functional Networks of Tissues in Mouse (FNTM) provides biomedical researchers with tissue-specific predictions of functional relationships between proteins in the most widely used model organism for human disease, the laboratory mouse. Users can explore FNTM-predicted functional relationships for their tissues and genes of interest or examine gene function and interaction predictions across multiple tissues, all through an interactive, multi-tissue network browser. FNTM makes predictions based on integration of a variety of functional genomic data, including over 13,000 gene expression experiments, and prior knowledge of gene function. FNTM is an ideal starting point for clinical and translational researchers considering a mouse model for their disease of interest, researchers already working with mouse models who are interested in discovering new genes related to their pathways or phenotypes of interest, and biologists working with other organisms to explore the functional relationships of their genes of interest in specific mouse tissue contexts.

SEEK: Search-based exploration of Expression compendia

SEEK is a computational gene co-expression search engine. SEEK provides biologists with a way to navigate the massive human expression compendium that now contains thousands of expression datasets. SEEK returns a robust ranking of co-expressed genes in the biological area of interest defined by the user's query genes. At the same time, it prioritizes thousands of expression datasets according to the user's query of interest. The unique strengths of SEEK include its support for multi-gene queries and cross-platform analysis, as well as its rich visualization features.

Sleipnir: Library for computational functional genomics

Sleipnir is a C++ library enabling efficient analysis, integration, mining, and machine learning over genomic data. This includes a particular focus on microarrays, since they make up the bulk of available data for many organisms, but Sleipnir can also integrate a wide variety of other data types, from pairwise physical interactions to sequence similarity or shared transcription factor binding sites. All analysis is done with attention to speed and memory usage, enabling the integration of hundreds of datasets covering tens of thousands of genes. In addition to the core library, Sleipnir comes with a variety of pre-made tools, providing solutions to common data processing tasks and examples to help you use Sleipnir in your own programs. Sleipnir is free, open source, fully documented, and ready to be used by itself or as a component in your computational biology analyses.

SPELL: Serial Pattern of Expression Levels Locator

SPELL is a query-driven search engine for large gene expression microarray compendia. Given a small set of query genes, SPELL identifies which datasets are most informative for these genes, then identifies, within those datasets, additional genes whose expression profiles are most similar to the query set. Both SGD and WormBase now manage their own instances of SPELL.
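The two-step idea described above (weight datasets by how informative they are for the query, then rank other genes by similarity to the query within those datasets) can be illustrated with a deliberately simplified sketch. This is not SPELL's actual implementation: the function names, the use of mean pairwise Pearson correlation among query genes as the dataset weight, and the toy data layout are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def query_search(datasets, query):
    """Hypothetical query-driven search.

    datasets: {dataset_name: {gene: [expression values]}}
    query:    list of query gene names
    Returns (dataset_weights, ranked_genes).
    """
    # Step 1 (assumed weighting): a dataset is informative if the query
    # genes are positively co-expressed with each other in it.
    weights = {}
    for name, data in datasets.items():
        pairs = [(a, b) for a, b in combinations(query, 2)
                 if a in data and b in data]
        if not pairs:
            weights[name] = 0.0
            continue
        coherence = mean(pearson(data[a], data[b]) for a, b in pairs)
        weights[name] = max(coherence, 0.0)

    total = sum(weights.values())
    if total == 0:
        return weights, []

    # Step 2: score every non-query gene by its weighted mean correlation
    # to the query genes across the informative datasets.
    genes = {g for data in datasets.values() for g in data} - set(query)
    scores = {}
    for gene in genes:
        score = 0.0
        for name, data in datasets.items():
            if gene not in data or weights[name] == 0:
                continue
            present = [q for q in query if q in data]
            score += weights[name] * mean(pearson(data[gene], data[q])
                                          for q in present)
        scores[gene] = score / total
    ranked = sorted(scores, key=scores.get, reverse=True)
    return weights, ranked


# Toy data: the query genes A and B agree in d1 but disagree in d2.
toy = {
    "d1": {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8],
           "C": [1, 2, 3, 5], "D": [4, 3, 2, 1]},
    "d2": {"A": [1, 2, 3, 4], "B": [4, 3, 2, 1],
           "C": [4, 1, 3, 2], "D": [1, 2, 3, 4]},
}
dataset_weights, ranked_genes = query_search(toy, ["A", "B"])
```

In this toy example only d1 receives a non-zero weight, so gene C (which tracks the query in d1) ranks above gene D (which is anti-correlated with it).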

KNNimpute: K-Nearest Neighbors Imputation

KNNimpute is an implementation of the k-nearest neighbors algorithm for estimation of missing values in microarray data. In our comparative study of several methods for missing value estimation, we determined that KNNimpute provides superior performance in a variety of situations.
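As a rough illustration of k-nearest-neighbors imputation on an expression matrix, the sketch below fills each missing value with a distance-weighted average taken from the k most similar genes. It is a minimal sketch, not the KNNimpute program itself: the function name, the use of None to mark missing values, the Euclidean distance over shared experiments, and the inverse-distance weighting are assumptions made for illustration.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries in a genes x experiments matrix (list of lists).

    For each missing entry, the k genes closest to the target gene (among
    genes that have a value in that experiment) contribute a weighted
    average, with weights inversely proportional to distance.
    """
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(matrix):
                if m == i or other[j] is None:
                    continue
                # Compare only experiments observed in both genes.
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if not nearest:
                continue  # no usable neighbor: leave the entry missing
            # Small constant avoids division by zero for identical profiles.
            weights = [1.0 / (d + 1e-9) for d, _ in nearest]
            filled[i][j] = (sum(w * v for w, (_, v) in zip(weights, nearest))
                            / sum(weights))
    return filled


# Toy matrix: gene 0 is missing its third measurement.
toy_matrix = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 5.0],
    [10.0, 20.0, 30.0],
]
imputed = knn_impute(toy_matrix, k=2)
```

Here the two nearest neighbors of gene 0 are genes 1 and 2, so the missing value is estimated from their third measurements, and the distant gene 3 is ignored.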

Take a stroll through our software cemetery

In-silico nano-dissection

Cell-lineage-specific transcripts are essential for differentiated tissue function in metazoan organisms. They are frequently found to be the cause of hereditary disease and mediate progression of acquired diseases. Identifying the tissue-specific transcriptome can guide disease gene identification in genetic studies and the development of organ-specific therapeutic targets. This server performs an in silico nano-dissection, which is an approach we developed to identify genes with novel cell-lineage-specific expression. This bioinformatics strategy leverages high-throughput functional genomics data from tissue homogenates to accurately predict genes enriched in specific cell types.

PILGRM: The Platform for Interactive Learning by Genomics Results Mining

PILGRM is for the biologist with a set of proteins relevant to a disease, biological function, or tissue of interest who wants to find additional players in that process. It uses a data-driven method that adds value to literature search results by mining compendia of publicly available gene expression datasets using lists of relevant and irrelevant genes (standards).

MEFIT: Microarray Experiment Functional Integration Technology

MEFIT is a system for microarray integration. As a framework, MEFIT uses the results of many microarray experiments in combination with known biological process annotations (drawn from the Gene Ontology, KEGG, MIPS, or a biologist's own pathways of interest) to predict new gene pair functional relationships within the given biological functions. In other words, MEFIT takes microarray results and known functional annotations as inputs and produces predicted gene pair functional relationships as output.

HEFalMp: A Human Experimental/FunctionAL MaPper

A functional map is a way of usefully exploring information from thousands of experimental results, focused on a specific query of interest. This might mean finding data that pertains to a single gene/protein, a group of related (or unrelated) genes, a pathway, process, or set of disease-related genes. Functional maps rely on data integration to summarize genomic data as functional relationship networks. Each network encodes how likely it is for every pair of genes in the genome to interact functionally, whether through a direct interaction, like protein binding, or an indirect functional relationship, like participating in the same cellular process. Functional mapping analyzes portions of these networks related to user-specified groups of genes and biological processes and displays the results as probabilities (for individual genes), functional association p-values (for groups of genes), or graphically (as an interaction network). HEFalMp contains information from roughly 15,000 microarray conditions, over 15,000 publications on genetic and physical protein interactions, and several types of DNA and protein sequence analyses. It allows the exploration of over 200 H. sapiens process-specific functional relationship networks, including a global, process-independent network capturing the most general functional relationships.

GRIFn: Gene Relationship Identification in Functional data

GRIFn is a system for evaluation of datasets and methods using a functional genomics gold standard based on curation by expert biologists. It allows users to assess the ability of their datasets or methods to recapitulate known biology both in a global sense and in the context of specific biological processes. GRIFn enables fair comparisons between various data types and methods.

HIDRA: Horizontally Integrated Dataset Relationship Analysis

HIDRA is a visualization and analysis framework for exploring multiple microarray datasets simultaneously. HIDRA allows users to quickly identify patterns common across many datasets as well as patterns unique to individual datasets. HIDRA is currently in beta testing and is still under development.

bioPIXIE: Biological Pathway Inference from eXperimental Interaction Evidence

bioPIXIE is a novel system for biological data integration and visualization for S. cerevisiae. It allows the user to discover interaction networks and pathways in which the user's gene(s) of interest participate. The system is based on a Bayesian algorithm for identification of biological networks based on integrated diverse genomic data.

GOLEM: Gene Ontology Local Exploration Map

GOLEM is a tool for viewing, navigating, and analyzing the hierarchical structure and annotations to the gene ontology. The visualization component allows a user to see the local graph structure around a GO term of interest and navigate to nearby nodes. GOLEM also provides the ability to look for statistical enrichment of GO terms in lists of genes and then observe the relationships between those terms. GOLEM is available both as an applet for use online and as a standalone download.
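The source does not say which statistical test GOLEM uses for enrichment. A common choice for asking whether a GO term is over-represented in a gene list is the one-sided hypergeometric test, sketched below as an assumption rather than as GOLEM's own method. The function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, overlap):
    """One-sided hypergeometric p-value for GO term over-representation.

    population: number of genes in the background set
    annotated:  number of background genes annotated to the term
    selected:   number of genes in the user's list
    overlap:    number of list genes annotated to the term
    Returns P(X >= overlap) when `selected` genes are drawn at random
    without replacement from the background.
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Toy example: 3 of 10 background genes carry the term, and all 3 genes
# in the user's list carry it too.
p_all_three = enrichment_p_value(10, 3, 3, 3)
```

When many GO terms are tested against the same gene list, the resulting p-values would normally also need a multiple-testing correction, which this sketch omits.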

ChARMView: Chromosomal Aberration Region Miner and Viewer

ChARMView is a visualization and analysis system for guided discovery of chromosomal abnormalities from microarray data. Our system facilitates manual or automated discovery of aneuploidies through dynamic visualization and integrated statistical analysis. ChARMView can be used with array CGH and gene expression microarray data, and multiple experiments can be viewed and analyzed simultaneously.

GeneVAnD: Genomic Visualization and Analysis of Datasets

GeneVAnD implements several visualization techniques that incorporate meaningful, noise-robust statistics for analyzing the results of clustering algorithms on microarray data. These include a rank-based visualization method that is more robust to noise, a difference display method to aid assessment of cluster quality and detection of outliers, and a projection of high-dimensional data into a three-dimensional space in order to examine relationships between clusters. Our methods are interactive and dynamically linked together for comprehensive analysis. Further, our approach applies to both protein and gene expression microarrays, and our architecture is scalable for use on both desktop/laptop screens and large-scale display devices.