GeneVAnD Help

·         File Formats

·         Running GeneVAnD

·         Loading Data

·         Visualizing the Data

·         Searching for a Gene

Developer: Matthew Hibbs

For questions, comments, or bug reports, email us at genevand@genomics.princeton.edu.


File Formats

 

GeneVAnD expects microarray data formatted in the typical .pcl format.  The basic structure of this format is tab-delimited lines, each of which represents the expression levels for one gene.  The first line contains header information about each experimental condition.  The second row is optionally a list of experiment weights.  The first entry on each line is a unique identifier of the gene/protein, the second is often the common name of the gene/protein, the third entry is optionally a gene weight, and the expression levels follow thereafter.  GeneVAnD does not perform any normalization or missing value estimation on loaded data.  In order to perform PCA, any missing values are replaced with 0.  Users are highly encouraged to perform missing value estimation using other available software, such as KNNImpute.

 

Fig. 1 – Example of format for data files
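To check a data file before loading it, the layout described above can be sketched in a few lines of Python. This is only an illustration, not GeneVAnD's own loader. The function name `parse_pcl` is hypothetical, and the sketch assumes the common .pcl convention in which the optional gene weight column is headed `GWEIGHT` and the optional experiment weight row starts with `EWEIGHT`. Missing values are set to 0, mirroring the substitution described for PCA.

```python
import io


def parse_pcl(stream):
    """Parse a tab-delimited .pcl stream into (conditions, genes).

    Hypothetical helper, not part of GeneVAnD.
    """
    # Drop line endings but keep trailing tabs, since an empty last field
    # is a missing value rather than noise.
    lines = [line.rstrip("\r\n") for line in stream if line.strip()]
    header = lines[0].split("\t")

    # Assumption: a third header entry named GWEIGHT marks the optional
    # gene weight column.
    has_gweight = len(header) > 2 and header[2].strip().upper() == "GWEIGHT"
    first_value = 3 if has_gweight else 2
    conditions = header[first_value:]

    rows = lines[1:]
    # Assumption: an identifier of EWEIGHT marks the optional experiment
    # weight row, which is skipped here.
    if rows and rows[0].split("\t")[0].strip().upper() == "EWEIGHT":
        rows = rows[1:]

    genes = []
    for row in rows:
        fields = row.split("\t")
        values = []
        for i in range(len(conditions)):
            pos = first_value + i
            text = fields[pos].strip() if pos < len(fields) else ""
            try:
                values.append(float(text))
            except ValueError:
                # Missing or non-numeric entry: use 0, as described for PCA.
                values.append(0.0)
        genes.append({
            "id": fields[0],
            "name": fields[1] if len(fields) > 1 else "",
            "values": values,
        })
    return conditions, genes


# Small made-up example with one missing value per gene.
sample = (
    "YORF\tNAME\tGWEIGHT\tcond1\tcond2\tcond3\n"
    "EWEIGHT\t\t\t1\t1\t1\n"
    "YAL001C\tTFC3\t1\t0.5\t\t-0.2\n"
    "YAL002W\tVPS8\t1\t1.1\t0.3\t\n"
)
conditions, genes = parse_pcl(io.StringIO(sample))
```

Because GeneVAnD does not estimate missing values itself, a check like this can show how many entries would be set to 0 before you decide whether to run a tool such as KNNImpute first.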

 

Clustering files should contain one number per line that indicates which cluster each gene/protein belongs to in the order of the data file.  For example, if the 17th gene in the data file belongs to cluster 3, the 17th line of the clustering file should contain a 3.

 

Fig. 2 – Example of format for cluster files
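The pairing between a clustering file and a data file is purely positional, which the following Python sketch illustrates. The function names are hypothetical, and the sketch assumes that cluster labels are whole numbers and that empty lines can be ignored. GeneVAnD's own handling of such cases may differ.

```python
import io


def read_clusters(stream):
    """Return one integer cluster label per gene, in data-file order.

    Hypothetical helper. Assumes labels are integers and skips empty lines.
    """
    return [int(line.strip()) for line in stream if line.strip()]


def group_by_cluster(gene_ids, labels):
    """Map each cluster label to the gene identifiers assigned to it."""
    if len(gene_ids) != len(labels):
        # The two files must describe the same genes in the same order.
        raise ValueError("clustering file and data file differ in length")
    groups = {}
    for gene_id, label in zip(gene_ids, labels):
        groups.setdefault(label, []).append(gene_id)
    return groups


# Made-up example: the first and third genes belong to cluster 3.
labels = read_clusters(io.StringIO("3\n1\n3\n"))
groups = group_by_cluster(["YAL001C", "YAL002W", "YAL003W"], labels)
```

The length check is worth keeping in any script of this kind, since a clustering file with too few or too many lines would otherwise assign genes to the wrong clusters without warning.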

 

Running GeneVAnD

 

GeneVAnD requires both Java (version 1.4.2 or higher) and Java3D (version 1.3 or higher) in order to run.  These can be found at:



GeneVAnD is a computationally demanding program, and as such should be run on a suitably fast machine.  We recommend the following minimum system configurations for smooth execution:

 

PC-compatible Systems



Macintosh Systems



GeneVAnD is distributed as a .jar file.  When viewing larger data sets it may be necessary to run java with a command line option to allocate more memory than the default settings.  This can be done with a command line such as:

     java -Xmx512m -jar geneVAnD.jar

This will allocate 512MB of memory for GeneVAnD to use.

 

When GeneVAnD is first launched, all of the views will be blank.

 

Fig. 3 – GeneVAnD when first launched

 

 

Loading Data

 

To load data, choose File/Load Data… from the menu in the “GeneVAnD” window.  This will open a dialog box where you choose which data file you would like to load into GeneVAnD (such as the sample EisenPaperData.pcl available here).  Once the file is selected, GeneVAnD will load the contents of the data file, perform the PCA calculations, generate additional statistics, and finally generate the visualizations.

 

Fig. 4 – GeneVAnD with a data file loaded

 

Next, you will want to load the clustering that you wish to analyze.  This is done in the menu by choosing File/Load Clusters… which will open a dialog box asking for which clustering file you would like GeneVAnD to use (such as the sample EisenPaperClusters.txt available here).  Now the data will be displayed and color-coded by cluster.  The vertical color bars in the Expression Levels window correspond to the colors of the genes in the PCA display window.  These colors can be changed by clicking on the Color button associated with each cluster in the “GeneVAnD” window.

 

Fig. 5 – GeneVAnD with data and clusters loaded

 

 

Visualizing the Data

 

Expression Level Visualization

 

The “VAnD – Expression Levels” window contains two views of the same data, a context view on the left and a focus view on the right.  The region visible in the focus view is boxed in purple in the context view.  Right-clicking in the context pane centers the focus pane on the clicked area and moves the purple box to that location.  The window also contains options for setting cutoff values and choosing which of the three coloration options you wish to use (Classic, Difference, or Rank Based).

 

Fig. 6 – The Expression Levels window

 

Left-clicking on a gene’s expression profile in either pane will select that gene across all views.  A blue box is displayed around the selected genes in the Expression Level display, and a wire-frame blue sphere will surround the gene in the PCA Display.

 

Fig. 7 – Example of a selected gene highlighted in all views

 

The Expression Levels window can also display hierarchies as dendrograms in the context and focus panes.  In the focus pane the dendrograms of each cluster are shown (each cluster is independently hierarchically clustered), and in the context pane a dendrogram of the cluster averages is displayed (the averages are also hierarchically clustered independently).  To calculate and display these dendrograms, use the menu to check Display Options / Show Hierarchy.  You can also choose which metric to use during the hierarchical clustering calculation.  The first time a hierarchy is displayed using each metric it must be calculated, which will cause a wait dialog to appear that shows the progress of the calculation.

 

Fig. 8 – Hierarchies shown in the Expression Levels window

 

Double-clicking on a cluster average bar causes the cluster to collapse down to just its average.  This also causes the cluster to no longer appear in the PCA Display.  A cluster can also be collapsed by clicking on the Hide button in the Clusters tab.  Double-clicking a collapsed cluster or clicking on the Show button will cause the genes in the cluster to reappear.  It can be useful to collapse clusters in order to better view the dendrogram of the cluster averages.

 

Fig. 9 – Hierarchy of cluster averages with 4 clusters collapsed
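As a rough illustration of what a collapsed cluster represents, the Python sketch below computes one average profile per cluster. It assumes that a cluster average is the arithmetic mean of the member genes' expression levels in each condition. This help text does not state how GeneVAnD computes its averages, and the function name `cluster_averages` is hypothetical.

```python
def cluster_averages(profiles, labels):
    """Return {cluster label: per-condition mean profile}.

    Hypothetical helper. `profiles` holds one list of expression levels
    per gene, and `labels` holds the cluster label of each gene in the
    same order.
    """
    sums = {}
    counts = {}
    for profile, label in zip(profiles, labels):
        if label not in sums:
            sums[label] = [0.0] * len(profile)
            counts[label] = 0
        # Accumulate this gene's values condition by condition.
        for i, value in enumerate(profile):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [total / counts[label] for total in totals]
        for label, totals in sums.items()
    }


# Made-up example: two genes in cluster 1, one gene in cluster 2.
averages = cluster_averages(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 5.0]],
    [1, 1, 2],
)
```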

 

 

PCA Display Visualization

 

In the “VAnD – PCA Display” window, you can change which Principal Component corresponds to which axis and increase or decrease the amount of standard deviation inside the larger transparent spheres.

 

Fig. 10 – The PCA Display window

 

The mouse and keyboard controls for navigating the 3D space are as follows:

 

 

Searching for a Gene

 

You can select/highlight a desired gene by using the menu on the “GeneVAnD” window, Find/Find Gene…  This will open a dialog where you can provide a gene name.  After entering the desired name and clicking on the “Search” button, the gene will be selected in all views.  The name provided must correspond to one of the names in the first or second column of the input file.
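The lookup rule can be pictured with the short Python sketch below. It is an illustration only. The function name `find_gene` is hypothetical, and the sketch assumes an exact, case-sensitive match that returns the first hit, which may not be how GeneVAnD compares names.

```python
def find_gene(genes, query):
    """Return the index of the first gene matching `query`, or None.

    Hypothetical helper. `genes` is a list of (identifier, common name)
    pairs taken from the first and second columns of the data file.
    Assumes an exact, case-sensitive comparison.
    """
    for index, (gene_id, name) in enumerate(genes):
        if query == gene_id or query == name:
            return index
    return None


# Made-up example: search by common name, then by identifier.
genes = [("YAL001C", "TFC3"), ("YAL002W", "VPS8")]
by_name = find_gene(genes, "VPS8")
by_id = find_gene(genes, "YAL001C")
```

If a search in GeneVAnD selects nothing, check that the name you typed appears in the first or second column of the data file exactly as you entered it.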