ASD: Genome-wide predictions of Autism Spectrum Disorder-associated genes

Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by deficits in social communication and restricted, repetitive patterns of behavior. ASD has a strong genetic basis, but we still lack the full complement of autism-associated genes.

Here we present a genome-wide prediction of autism-associated genes based on known disease genes in the context of a human brain-specific gene interaction network.

DeepSEA: Deep learning-based algorithmic framework for predicting chromatin effects

DeepSEA is a deep learning-based algorithmic framework for predicting the chromatin effects of sequence alterations with single-nucleotide sensitivity. DeepSEA can accurately predict the epigenetic state of a sequence, including transcription factor binding, DNase I sensitivity, and histone marks, in multiple cell types. It can further be used to predict the chromatin effects of sequence variants and to prioritize regulatory variants.
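As a rough illustration of how a sequence model can score a variant, the sketch below one-hot encodes the reference and alternative sequences, runs both through a predictor, and reports the change in predicted probability and in log odds for one chromatin feature. The function names and `toy_gc_predictor` are hypothetical; the toy predictor is a stand-in for a trained network, and DeepSEA's actual model, input length, and output scores are not reproduced here.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as a 4 x L matrix with rows A, C, G, T; other characters (e.g. N) become all-zero columns."""
    seq = seq.upper()
    return [[1 if base == b else 0 for base in seq] for b in BASES]


def logit(p, eps=1e-6):
    """Log odds of p, clipped away from 0 and 1 so the result stays finite."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def variant_effect(predict, ref_seq, pos, alt):
    """Compare predictions for a reference sequence and a single-base substitution.

    predict: any callable mapping a one-hot matrix to a probability for one
    chromatin feature (hypothetical here; a trained model would be used in practice).
    Returns (probability difference, log-odds difference), both alt minus ref.
    """
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    p_ref = predict(one_hot(ref_seq))
    p_alt = predict(one_hot(alt_seq))
    return p_alt - p_ref, logit(p_alt) - logit(p_ref)


def toy_gc_predictor(x):
    """Hypothetical stand-in model: the 'probability' is the GC fraction of the sequence."""
    return (sum(x[1]) + sum(x[2])) / len(x[0])


diff, log_odds_diff = variant_effect(toy_gc_predictor, "ACGTA", 0, "G")
print(diff, log_odds_diff)
```

Reporting both a probability difference and a log-odds difference is a design choice of this sketch: the log-odds form amplifies changes near 0 or 1, where a small absolute shift in probability can still matter.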

FNTM: Functional Networks of Tissues in Mouse

Functional Networks of Tissues in Mouse (FNTM) provides biomedical researchers with tissue-specific predictions of functional relationships between proteins in the most widely used model organism for human disease, the laboratory mouse. Users can explore FNTM-predicted functional relationships for their tissues and genes of interest or examine gene function and interaction predictions across multiple tissues, all through an interactive, multi-tissue network browser. FNTM makes predictions based on integration of a variety of functional genomic data, including over 13,000 gene expression experiments, and prior knowledge of gene function. FNTM is an ideal starting point for clinical and translational researchers considering a mouse model for their disease of interest, researchers already working with mouse models who are interested in discovering new genes related to their pathways or phenotypes of interest, and biologists working with other organisms to explore the functional relationships of their genes of interest in specific mouse tissue contexts.

GIANT: Genome-scale Integrated Analysis of gene Networks in Tissues

We present genome-wide functional interaction networks for 144 human tissues and cell types developed using a data-driven Bayesian methodology that integrates thousands of diverse experiments spanning tissue and disease states. Tissue-specific networks predict lineage-specific responses to perturbation, identify the changing functional roles of genes across tissues and illuminate relationships among diseases. We introduce NetWAS, which combines genes with nominally significant genome-wide association study (GWAS) P values and tissue-specific networks to identify disease-gene associations more accurately than GWAS alone. Our webserver, GIANT, provides an interface to human tissue networks through multi-gene queries, network visualization, analysis tools including NetWAS and downloadable networks. GIANT enables systematic exploration of the landscape of interacting genes that shape specialized cellular functions across more than a hundred human tissues and cell types.
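The description above leaves the step that combines GWAS genes with a tissue network abstract. As a much simpler stand-in (not the NetWAS method itself), the sketch below treats genes with GWAS p-values below a threshold as seeds and reranks every gene by its average edge weight to those seeds in a tissue network. The neighbor-averaging score, the 0.01 threshold, the toy network, and the function names are all assumptions for illustration.

```python
def build_network(edges):
    """Build a symmetric weighted network from (gene, gene, weight) triples."""
    net = {}
    for a, b, w in edges:
        net.setdefault(a, {})[b] = w
        net.setdefault(b, {})[a] = w
    return net


def network_rerank(network, gwas_p, alpha=0.01):
    """Rank genes by mean edge weight to GWAS seed genes (p < alpha).

    A gene is never counted as its own seed, so nominally significant genes
    are scored only by their connections to the other seeds.
    """
    seeds = {g for g, p in gwas_p.items() if p < alpha}
    scores = {}
    for gene, neighbors in network.items():
        others = seeds - {gene}
        if others:
            scores[gene] = sum(neighbors.get(s, 0.0) for s in others) / len(others)
        else:
            scores[gene] = 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


net = build_network([("A", "B", 0.9), ("A", "C", 0.8), ("B", "C", 0.7),
                     ("C", "D", 0.1), ("A", "D", 0.05)])
pvals = {"A": 0.001, "B": 0.005, "C": 0.2, "D": 0.5}
ranking = network_rerank(net, pvals)
print(ranking)
```

In this toy example, gene C misses the GWAS threshold but scores 0.75 because it is strongly connected to both seeds, while D scores 0.025; this is the kind of network-informed reprioritization the paragraph describes.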

IMP: Integrative Multi-species Prediction

IMP (Integrative Multi-species Prediction), originally released in 2012, is an interactive web server that enables molecular biologists to interpret experimental results and to generate hypotheses in the context of a large cross-organism compendium of functional predictions and networks. The system provides biologists with a framework to analyze their candidate gene sets in the context of functional networks, expanding or refining their sets using functional relationships predicted from integrated high-throughput data. IMP 2.0 integrates updated prior knowledge and data collections from the last three years in the seven supported organisms (Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Danio rerio, Caenorhabditis elegans, and Saccharomyces cerevisiae) and extends function prediction coverage to include human disease. IMP identifies homologs with conserved functional roles for disease knowledge transfer, allowing biologists to analyze disease contexts and predictions across all organisms. Additionally, IMP 2.0 implements a new flexible platform for experts to generate custom hypotheses about biological processes or diseases, making sophisticated data-driven methods easily accessible to researchers.

In-silico nano-dissection

Cell-lineage-specific transcripts are essential for differentiated tissue function in metazoan organisms. They are frequently the cause of hereditary disease and mediate the progression of acquired diseases. Identifying the tissue-specific transcriptome can guide disease gene identification in genetic studies and the development of organ-specific therapeutic targets. This server performs in silico nano-dissection, an approach we developed to identify genes with novel cell-lineage-specific expression. This bioinformatics strategy leverages high-throughput functional genomics data from tissue homogenates to accurately predict genes enriched in specific cell types.

KNNimpute: K-Nearest Neighbors Imputation

KNNimpute is an implementation of the k-nearest neighbors algorithm for estimation of missing values in microarray data. In our comparative study of several different methods used for missing value estimation we determined that KNNimpute provides superior performance in a variety of situations.
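A minimal pure-Python sketch of k-nearest-neighbor imputation in the spirit described above: for each missing entry, find the k genes that are observed in that experiment and most similar over the other experiments, then take a distance-weighted average of their values. Missing values are represented as `None`; the root-mean-square distance over shared columns, the inverse-distance weights, and the function name `knn_impute` are assumptions of this sketch rather than details of the KNNimpute implementation.

```python
import math


def knn_impute(matrix, k=10, eps=1e-6):
    """Return a copy of a genes x experiments matrix with None entries estimated.

    For a missing entry (gene g, experiment j): among genes observed in column j,
    pick the k closest to g (root-mean-square difference over the other columns
    both genes observe) and average their column-j values with inverse-distance weights.
    Entries with no usable neighbor are left as None.
    """
    n_cols = len(matrix[0])
    imputed = [row[:] for row in matrix]
    for g, row in enumerate(matrix):
        for j in range(n_cols):
            if row[j] is not None:
                continue
            candidates = []
            for h, other in enumerate(matrix):
                if h == g or other[j] is None:
                    continue
                shared = [c for c in range(n_cols)
                          if c != j and row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((row[c] - other[c]) ** 2 for c in shared) / len(shared))
                candidates.append((dist, other[j]))
            if not candidates:
                continue
            candidates.sort(key=lambda t: t[0])
            nearest = candidates[:k]
            # Closer genes get larger weights; eps avoids dividing by zero for identical profiles.
            weights = [1.0 / (d + eps) for d, _ in nearest]
            imputed[g][j] = sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
    return imputed


data = [
    [0.0, 0.0, None],
    [1.0, 1.0, 2.0],
    [3.0, 3.0, 8.0],
]
filled = knn_impute(data, k=2)
print(filled[0][2])
```

Here the two neighbors lie at distances 1 and 3, so their values 2.0 and 8.0 receive weights of roughly 1 and 1/3, and the missing entry is estimated at about 3.5. Neighbors are always read from the original matrix, so earlier imputations never feed into later ones.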

SEEK: Search-based exploration of Expression compendia

SEEK is a computational gene co-expression search engine. SEEK provides biologists with a way to navigate the massive human expression compendium, which now contains thousands of expression datasets. SEEK returns a robust ranking of co-expressed genes in the biological area of interest defined by the user's query genes. At the same time, it prioritizes thousands of expression datasets according to their relevance to the user's query. The unique strengths of SEEK include its support for multi-gene queries and cross-platform analysis, as well as its rich visualization features.

SPELL: Serial Pattern of Expression Levels Locator

SPELL is a query-driven search engine for large gene expression microarray compendia. Given a small set of query genes, SPELL identifies which datasets are most informative for these genes, then within those datasets additional genes are identified with expression profiles most similar to the query set. Both SGD and WormBase now manage their own instances of SPELL.
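The two-stage search described above can be sketched as follows: weight each dataset by how coherently the query genes co-vary in it, then score every other gene by its weighted average correlation with the query. This is a simplified illustration under stated assumptions (mean pairwise Pearson correlation as the dataset weight, negative weights clipped to zero, no per-dataset normalization); SPELL's own normalization and weighting details are not reproduced, and `spell_like_search` is a hypothetical name.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation of two equal-length vectors; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def spell_like_search(datasets, query):
    """Weight datasets by query coherence, then rank non-query genes.

    datasets: dict dataset name -> dict gene -> expression vector
    query: iterable of query gene names (at least two must co-occur in a dataset)
    Returns (dataset weights, list of (gene, score) sorted best first).
    """
    query = set(query)
    weights = {}
    for name, data in datasets.items():
        present = sorted(q for q in query if q in data)
        pairs = list(itertools.combinations(present, 2))
        if not pairs:
            weights[name] = 0.0
            continue
        mean_r = sum(pearson(data[a], data[b]) for a, b in pairs) / len(pairs)
        # Datasets where the query genes are anti-correlated get no say.
        weights[name] = max(mean_r, 0.0)

    weighted_sum, weight_total = {}, {}
    for name, data in datasets.items():
        w = weights[name]
        if w == 0.0:
            continue
        present = [q for q in query if q in data]
        for gene, profile in data.items():
            if gene in query:
                continue
            r = sum(pearson(profile, data[q]) for q in present) / len(present)
            weighted_sum[gene] = weighted_sum.get(gene, 0.0) + w * r
            weight_total[gene] = weight_total.get(gene, 0.0) + w
    ranked = sorted(((g, weighted_sum[g] / weight_total[g]) for g in weighted_sum),
                    key=lambda kv: kv[1], reverse=True)
    return weights, ranked


datasets = {
    "d1": {"Q1": [1, 2, 3, 4], "Q2": [2, 4, 6, 8], "X": [1, 2, 3, 5], "Y": [4, 3, 2, 1]},
    "d2": {"Q1": [1, 2, 3, 4], "Q2": [4, 3, 2, 1], "X": [4, 3, 2, 1], "Y": [1, 2, 3, 4]},
}
weights, ranked = spell_like_search(datasets, ["Q1", "Q2"])
print(weights)
print(ranked)
```

In this toy compendium the query genes are perfectly correlated in `d1` and anti-correlated in `d2`, so only `d1` contributes, and gene X (which tracks the query there) ranks above Y.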

Sleipnir: Library for computational functional genomics

Sleipnir is a C++ library enabling efficient analysis, integration, mining, and machine learning over genomic data. This includes a particular focus on microarrays, since they make up the bulk of available data for many organisms, but Sleipnir can also integrate a wide variety of other data types, from pairwise physical interactions to sequence similarity or shared transcription factor binding sites. All analysis is done with attention to speed and memory usage, enabling the integration of hundreds of datasets covering tens of thousands of genes. In addition to the core library, Sleipnir comes with a variety of pre-made tools, providing solutions to common data processing tasks and examples to help you use Sleipnir in your own programs. Sleipnir is free, open source, fully documented, and ready to be used by itself or as a component in your computational biology analyses.

URSAHD: Unveiling RNA Sample Annotation for Human Diseases

URSA (Unveiling RNA Sample Annotation), originally released in 2013, simultaneously estimated the probabilities that a given sample is associated with each tissue or cell type. Individual cell-type models were constructed from more than ten thousand manually curated samples from GEO and then aggregated using Bayesian Correction. This method has been shown to be effective for both array-based and sequencing-based genome-scale experiments.

Now, in addition to tissues and cell types, URSAHD (Unveiling RNA Sample Annotation for Human Diseases) also measures hundreds of disease-specific signatures in a single gene expression profile. Each disease-specific model (i.e., a set of gene weights) was computed from thousands of clinical samples from GEO.

Take a stroll through our software cemetery

PILGRM: The Platform for Interactive Learning by Genomics Results Mining

PILGRM is for the biologist who has a set of proteins relevant to a disease, biological function, or tissue of interest and wants to find additional players in that process. It uses a data-driven method that adds value to literature search results by mining compendia of publicly available gene expression datasets using lists of relevant and irrelevant genes (standards).

HEFalMp: A Human Experimental/FunctionAL MaPper

A functional map is a way of usefully exploring information from thousands of experimental results, focused on a specific query of interest. This might mean finding data that pertains to a single gene/protein, a group of related (or unrelated) genes, a pathway, a process, or a set of disease-related genes. Functional maps rely on data integration to summarize genomic data as functional relationship networks. Each network encodes how likely every pair of genes in the genome is to interact functionally, whether through a direct interaction, like protein binding, or an indirect functional relationship, like participating in the same cellular process. Functional mapping analyzes the portions of these networks related to user-specified groups of genes and biological processes and displays the results as probabilities (for individual genes), functional association p-values (for groups of genes), or graphically (as an interaction network). HEFalMp contains information from roughly 15,000 microarray conditions, over 15,000 publications on genetic and physical protein interactions, and several types of DNA and protein sequence analyses, and it allows the exploration of over 200 H. sapiens process-specific functional relationship networks, including a global, process-independent network capturing the most general functional relationships.

HIDRA: Horizontally Integrated Dataset Relationship Analysis

HIDRA is a visualization and analysis framework for exploring multiple microarray datasets simultaneously. HIDRA allows users to quickly identify patterns common across many datasets as well as patterns unique to individual datasets. HIDRA is currently in beta testing and is still under development.

MEFIT: Microarray Experiment Functional Integration Technology

MEFIT is a system for microarray integration. As a framework, MEFIT uses the results of many microarray experiments in combination with known biological process annotations (drawn from the Gene Ontology, KEGG, MIPS, or a biologist's own pathways of interest) to predict new gene pair functional relationships within the given biological functions. In other words, MEFIT takes microarray results and known functional annotations as inputs and produces predicted gene pair functional relationships as output.
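The inputs-to-outputs description above maps naturally onto a naive Bayes classifier: evidence from each dataset (for example, a binned correlation for a gene pair) is treated as conditionally independent given whether the pair is functionally related, the probability tables are learned from gold-standard pairs drawn from known annotations, and new pairs receive a posterior probability. The sketch below assumes evidence has already been discretized into bins; the function names, the pseudocount, and the toy data are illustrative assumptions, and this is not MEFIT's code.

```python
import math


def train_naive_bayes(evidence, gold, n_bins, pseudocount=1.0):
    """Learn a prior and per-dataset P(bin | class) tables from gold-standard pairs.

    evidence: dict dataset -> dict gene pair -> bin index in range(n_bins)
    gold: dict gene pair -> True (related) or False (unrelated); needs both classes
    """
    n_pos = sum(1 for label in gold.values() if label)
    if n_pos in (0, len(gold)):
        raise ValueError("gold standard needs both positive and negative pairs")
    prior = n_pos / len(gold)
    tables = {}
    for ds, bins in evidence.items():
        # Pseudocounts keep unseen bins from producing zero probabilities.
        counts = {True: [pseudocount] * n_bins, False: [pseudocount] * n_bins}
        for pair, label in gold.items():
            if pair in bins:
                counts[label][bins[pair]] += 1
        tables[ds] = {c: [x / sum(counts[c]) for x in counts[c]] for c in (True, False)}
    return prior, tables


def posterior(pair, evidence, prior, tables):
    """P(related | evidence) for one pair; datasets lacking the pair are skipped."""
    log_odds = math.log(prior / (1 - prior))
    for ds, bins in evidence.items():
        if pair in bins:
            b = bins[pair]
            log_odds += math.log(tables[ds][True][b] / tables[ds][False][b])
    return 1 / (1 + math.exp(-log_odds))


gold = {("A", "B"): True, ("C", "D"): True, ("A", "C"): False, ("B", "D"): False}
evidence = {"array1": {("A", "B"): 1, ("C", "D"): 1, ("A", "C"): 0, ("B", "D"): 0, ("A", "D"): 1}}
prior, tables = train_naive_bayes(evidence, gold, n_bins=2)
print(round(posterior(("A", "D"), evidence, prior, tables), 3))  # 0.75
```

The unlabeled pair (A, D) falls in the bin where related pairs are three times as common as unrelated ones, so a prior of 0.5 is updated to 0.75. Working in log odds lets each dataset add its evidence independently, and a dataset that does not measure a pair simply contributes nothing.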

GRIFn: Gene Relationship Identification in Functional data

GRIFn is a system for evaluating datasets and methods using a functional genomics gold standard based on curation by expert biologists. It allows users to assess the ability of their datasets or methods to recapitulate known biology, both globally and in the context of specific biological processes. GRIFn enables fair comparisons between various data types and methods.
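One standard way to ask whether a dataset recapitulates known biology is to rank gene pairs by the dataset's score and measure how well that ranking separates gold-standard related pairs from unrelated ones. The sketch below computes the area under the ROC curve (AUC) from its pairwise-comparison definition, either globally or restricted to pairs within one process's gene set. AUC is an assumption here, not necessarily the statistic GRIFn reports, and the function name and toy data are hypothetical.

```python
def auc(scores, gold, genes=None):
    """Probability that a random related pair outscores a random unrelated pair (ties count half).

    scores: dict gene pair -> dataset score; gold: dict gene pair -> True/False.
    genes: optional set; if given, only pairs with both genes in it are evaluated,
    giving a process-specific rather than a global assessment.
    """
    pos, neg = [], []
    for pair, label in gold.items():
        if pair not in scores:
            continue
        if genes is not None and not set(pair) <= genes:
            continue
        (pos if label else neg).append(scores[pair])
    if not pos or not neg:
        raise ValueError("need at least one scored positive and one scored negative pair")
    wins = 0.0
    for s in pos:
        for t in neg:
            wins += 1.0 if s > t else 0.5 if s == t else 0.0
    return wins / (len(pos) * len(neg))


gold = {("A", "B"): True, ("C", "D"): True, ("A", "C"): False, ("B", "D"): False}
scores = {("A", "B"): 0.9, ("C", "D"): 0.4, ("A", "C"): 0.5, ("B", "D"): 0.1}
print(auc(scores, gold))                          # 0.75
print(auc(scores, gold, genes={"A", "B", "C"}))   # 1.0
```

Because every dataset or method is reduced to the same ranking-based score against the same gold standard, results from very different data types become directly comparable, which is the kind of fair comparison the paragraph describes.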

GOLEM: Gene Ontology Local Exploration Map

GOLEM is a tool for viewing, navigating, and analyzing the hierarchical structure of the Gene Ontology (GO) and its annotations. The visualization component allows a user to see the local graph structure around a GO term of interest and navigate to nearby nodes. GOLEM also provides the ability to look for statistical enrichment of GO terms in lists of genes and then observe the relationships between those terms. GOLEM is available both as an applet for use online and as a standalone download.
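Enrichment of a GO term in a gene list is commonly assessed with a one-sided hypergeometric test. The sketch below shows that calculation with a Bonferroni adjustment over the terms tested; the choice of test and correction, the function names, and the placeholder term names `term_A` and `term_B` are assumptions for illustration rather than a description of GOLEM's internals.

```python
from math import comb


def hypergeom_sf(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric: n genes drawn from N, of which K carry the term."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


def go_enrichment(query, annotations, background):
    """Test each term for over-representation in `query`.

    query, background: sets of gene names; annotations: dict term -> set of genes.
    Returns (term, overlap, raw p, Bonferroni-adjusted p) tuples sorted by raw p.
    """
    query = query & background
    N, n, m = len(background), len(query), len(annotations)
    results = []
    for term, genes in annotations.items():
        annotated = genes & background
        k = len(query & annotated)
        p = hypergeom_sf(k, n, len(annotated), N)
        results.append((term, k, p, min(1.0, p * m)))
    return sorted(results, key=lambda r: r[2])


background = {f"g{i}" for i in range(10)}
annotations = {"term_A": {"g0", "g1"}, "term_B": {"g2", "g3", "g4", "g5", "g6"}}
res = go_enrichment({"g0", "g1"}, annotations, background)
for row in res:
    print(row)
```

Here both query genes carry `term_A`, which only 2 of 10 background genes have, giving a raw p-value of 1/45; `term_B` has no overlap and a p-value of 1.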

bioPIXIE: Biological Pathway Inference from eXperimental Interaction Evidence

bioPIXIE is a novel system for biological data integration and visualization for S. cerevisiae. It allows the user to discover the interaction networks and pathways in which the user's gene(s) of interest participate. The system is based on a Bayesian algorithm that identifies biological networks from integrated, diverse genomic data.

ChARMView: Chromosomal Aberration Region Miner and Viewer

ChARMView is a visualization and analysis system for guided discovery of chromosomal abnormalities from microarray data. Our system facilitates manual or automated discovery of aneuploidies through dynamic visualization and integrated statistical analysis. ChARMView can be used with array CGH and gene expression microarray data, and multiple experiments can be viewed and analyzed simultaneously.
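To make the idea of automated aneuploidy discovery concrete, the sketch below scans windows of consecutive genes along one chromosome and asks whether a window's mean log ratio is more extreme than the means of randomly drawn gene sets of the same size. This sliding-window permutation test is a simplified stand-in chosen for illustration, not ChARMView's own segmentation algorithm; the window size, permutation count, seed, and function name are assumptions.

```python
import random


def window_scan(log_ratios, window=5, n_perm=1000, seed=0):
    """Scan consecutive-gene windows along one chromosome for shifted mean log ratios.

    log_ratios: per-gene values (e.g. log2 tumor/normal) ordered by chromosomal position.
    Returns (start index, window mean, two-sided permutation p-value) for every window.
    """
    if window > len(log_ratios):
        raise ValueError("window longer than the chromosome")
    rng = random.Random(seed)
    # Null distribution: |mean| of randomly chosen gene sets of the same size,
    # which ignores chromosomal position and therefore any contiguous shift.
    null = [abs(sum(rng.sample(log_ratios, window)) / window) for _ in range(n_perm)]
    results = []
    for start in range(len(log_ratios) - window + 1):
        m = sum(log_ratios[start:start + window]) / window
        exceed = sum(1 for x in null if x >= abs(m))
        # Add-one correction keeps p-values strictly positive.
        results.append((start, m, (exceed + 1) / (n_perm + 1)))
    return results


profile = [0.0] * 20 + [1.0] * 5 + [0.0] * 30  # toy chromosome with one gained segment
res = window_scan(profile)
print([start for start, _, p in res if p < 0.01])
```

The same scan applies to array CGH log ratios or to gene expression ratios ordered by genomic position; in practice one would also smooth noisy data and correct for the many overlapping windows tested.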

GeneVAnD: Genomic Visualization and Analysis of Datasets

GeneVAnD implements several visualization techniques that incorporate meaningful, noise-robust statistics for analyzing the results of clustering algorithms on microarray data. These include a rank-based visualization method that is more robust to noise, a difference display method to aid assessment of cluster quality and detection of outliers, and a projection of high-dimensional data into three-dimensional space to examine relationships between clusters. Our methods are interactive and dynamically linked together for comprehensive analysis. Further, our approach applies to both protein and gene expression microarrays, and our architecture scales for use on both desktop/laptop screens and large-scale display devices.