Figure 1.

Three-stage segmental aneuploidy detection scheme. The edge detection filter estimates edge coordinates, which are then refined by the EM edge-placement algorithm. The resulting edges serve as input to the prediction significance test that analyzes statistical significance of spatial biases.
Figure 2.

Preliminary edge detection filtering process illustrated on gene expression data positioned along the chromosome. Bars above the coordinate axis represent overexpression; bars below represent underexpression. The input-output relation for each filter is given on the left, expressing the filter output as a function of its input, indexed by gene position on the chromosome, with a window size specific to each filter. Significant peaks are marked at the output of the differentiator.
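The filter equations themselves are not reproduced in this caption, so the following is only a minimal sketch of the general idea, assuming a windowed-mean difference as the differentiator and a simple local-maximum rule for peak picking. The function names, the `threshold` parameter, and the peak rule are illustrative assumptions, not the filters shown in the figure.

```python
def windowed_difference(values, window):
    """Mean of the `window` genes to the right minus the mean of the
    `window` genes to the left of each index (0.0 near the ends).
    ASSUMPTION: stands in for the smoothing and differentiator stages."""
    n = len(values)
    out = [0.0] * n
    for i in range(window, n - window + 1):
        left = sum(values[i - window:i]) / window
        right = sum(values[i:i + window]) / window
        out[i] = right - left
    return out


def candidate_edges(values, window, threshold):
    """Indices where the absolute differentiator output is a local peak
    at or above `threshold` (hypothetical cutoff)."""
    d = [abs(x) for x in windowed_difference(values, window)]
    edges = []
    for i in range(1, len(d) - 1):
        if d[i] >= threshold and d[i] >= d[i - 1] and d[i] > d[i + 1]:
            edges.append(i)
    return edges


# Noise-free toy profile: a 50-gene block of overexpression.
profile = [0.0] * 100 + [1.0] * 50 + [0.0] * 100
edges = candidate_edges(profile, window=10, threshold=0.5)
```

On real expression data the preliminary edges would be noisy, which is why the scheme in Figure 1 passes them to a refinement step.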
Figure 3.

Receiver operating characteristic (ROC) curves for the sign test, mean test, coefficient of variance, and combined tests with p-value cutoffs between 10⁻⁶ and 0.4. Performance was evaluated on synthetic data with simulated 50-gene aneuploidies and generated with
A combined mean and sign test shows the highest sensitivity at every false positive rate (FPR) tested.
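As a rough illustration of the two window tests compared here, the sketch below assumes a two-sided exact binomial sign test on the signs of the log-ratios in a window, and a mean permutation test that compares the window mean against means of randomly drawn gene sets of the same size. Both definitions, and all function and parameter names, are assumptions made for illustration; the rule used to combine the tests is not given in this caption and is not implemented.

```python
import math
import random


def sign_test_p_value(values):
    """Two-sided exact binomial test (p = 0.5) on the signs of `values`;
    zeros are ignored."""
    nonzero = [v for v in values if v != 0]
    n = len(nonzero)
    if n == 0:
        return 1.0
    k = sum(v > 0 for v in nonzero)
    # The null distribution is symmetric, so double the smaller tail.
    tail = sum(math.comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def mean_permutation_p_value(all_values, start, end, n_perm=2000, seed=0):
    """Fraction of random same-sized gene sets whose absolute mean is at
    least that of the window all_values[start:end] (add-one corrected)."""
    rng = random.Random(seed)
    window = all_values[start:end]
    size = len(window)
    observed = abs(sum(window) / size)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_values, size)
        if abs(sum(sample) / size) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Synthetic example: 500 genes with a 50-gene shifted segment.
rng = random.Random(1)
data = [rng.gauss(0.0, 0.3) for _ in range(500)]
for i in range(200, 250):
    data[i] += 1.0
p_sign = sign_test_p_value(data[200:250])
p_mean = mean_permutation_p_value(data, 200, 250)
```

Note that the permutation p-value cannot fall below 1 / (n_perm + 1), so the number of permutations has to be chosen with the intended cutoff in mind.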
Figure 4.

Effect of multiplicative noise on A. sensitivity and B. errors in edge coordinates (as % of total window size). Performance of the scheme in identifying a 50-gene aneuploidal segment was evaluated under varying degrees of noise.
was varied while the remaining terms were fixed at 0.1. Similarly,
were varied with
. Biological noise is typically under 0.65 for
and under 0.2 for
(Table 1). P-value cutoffs were set at 10⁻³ and 10⁻² for the sign and mean permutation tests, respectively, and the tests were combined as previously described. The detection scheme with the combined mean and sign window significance test identifies most windows (>90%) with high accuracy in the placement of edge coordinates (error < 0.1%) and is robust to high levels of spot, test, and reference noise (substantially higher than the noise levels common in biological data, shown in Table 1).
Figure 5.

Chromosomal maps showing a subset of predicted aneuploidies (sign test p-values of < 10⁻³ and mean permutation test p-values of < 10⁻²) and biologically relevant mapped chromosomal elements. Aneuploidies are color-coded: red indicates amplification and green indicates deletion. Predictions shown in different rows on the same chromosome correspond to different yeast strains (e.g. Chr II), and multiple predictions at the same chromosomal coordinate represent identical aneuploidies found in multiple strains (e.g. Chr XI). Proximity of predictions to LTR, transposon, and tRNA elements was evaluated through 10,000 random placements of same-sized regions on the chromosomal map and by finding the proportion of random regions with a shorter distance (d_rand) to homologous elements than the real predictions (d_obs).
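The random-placement procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: elements are reduced to single coordinates, distance is measured from the region boundary to the nearest element (0 if an element lies inside the region), and placements are uniform along one chromosome. The function names and the `chrom_length` and `seed` parameters are hypothetical.

```python
import random
from bisect import bisect_left


def nearest_distance(start, end, sorted_elements):
    """Distance from region [start, end] to the nearest element position;
    0 if an element falls inside the region."""
    if not sorted_elements:
        raise ValueError("at least one element position is required")
    i = bisect_left(sorted_elements, start)
    if i < len(sorted_elements) and sorted_elements[i] <= end:
        return 0
    candidates = []
    if i > 0:
        candidates.append(start - sorted_elements[i - 1])
    if i < len(sorted_elements):
        candidates.append(sorted_elements[i] - end)
    return min(candidates)


def proximity_p_value(start, end, elements, chrom_length,
                      n_random=10_000, seed=0):
    """Proportion of random same-sized regions with d_rand < d_obs."""
    rng = random.Random(seed)
    elements = sorted(elements)
    d_obs = nearest_distance(start, end, elements)
    size = end - start
    closer = 0
    for _ in range(n_random):
        s = rng.randint(0, chrom_length - size)  # uniform placement
        if nearest_distance(s, s + size, elements) < d_obs:
            closer += 1
    return closer / n_random


# A prediction overlapping an element cannot be beaten by any placement.
p_overlap = proximity_p_value(100, 200, [150], chrom_length=10_000)
# A prediction far from the only element is beaten by most placements.
p_far = proximity_p_value(0, 100, [5_000], chrom_length=10_000)
```

A small proportion indicates that the prediction lies closer to the elements than expected from random placement.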
Figure 6.

Gene expression levels plotted by chromosomal location in example segmental aneuploidies: A. anp1 (chromosome II, sign test p-value of < 10⁻¹⁰, mean permutation test p-value of 10⁻³) and B. prb1 (chromosome III, sign test p-value of < 10⁻¹⁰, mean permutation test p-value of < 10⁻⁴) heterozygous deletion mutants. Aneuploidies predicted by our method are identified by arrows and correspond to spatial expression biases.
Figure 7.

Overlapping amplification predictions in array CGH and gene expression microarray data for breast cancer. Amplifications predicted from gene expression data are shown below the chromosomal map, those predicted from array CGH data are shown above the map.
Table 1.

Estimated parameters for array CGH and expression human breast cancer data. Parameters were estimated as suggested by Rocke and Durbin (2001).