This page describes how to generate files of background-corrected probe intensities. These files can be processed by Perl and R scripts to produce GeneBASE estimates.
Instructions are available on the ProbeEffects portion of the download page.
A ProbeEffects run uses the following files:
- A log file, which records the progress of the computation and any error messages.
- Exon array annotation, including pgf, clf and probeset annotation files. These can be downloaded from the annotation page.
- The exon array data. We recommend a diverse set of samples for probe selection. The Affymetrix tissue panel data may be combined with a small number of exon arrays.
- An output file, which stores the resulting background-corrected, normalized probe intensities.
- The model parameters, which specify the choice of background-correction and normalization methods.
We provide several sample parameter files, which can be modified. Detailed descriptions of the parameters are given below.
***Note that flags and parameter values must be separated by tabs.***
- MAT background correction, scalar normalization.
- MAT background correction, no normalization.
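The sample parameter files themselves are not reproduced on this page. As a minimal sketch, the Python below writes a parameter file in a layout assumed from the parameter table: each bracketed section name on its own line, followed by one flag and its value per line, separated by a single tab. The section layout, every file and folder name, and the helper names `format_parameter_file` and `cel_list` are hypothetical; compare the output against one of the provided sample files before use. The values mirror the first sample (MAT background correction, scalar normalization).

```python
"""Sketch: build a ProbeEffects parameter file (layout is an assumption)."""


def format_parameter_file(sections):
    """Render {section: {flag: value}} as parameter-file text.

    ASSUMPTION: each "[section]" header sits on its own line, and each
    flag is followed by a single tab and then its value.
    """
    lines = []
    for section, params in sections.items():
        lines.append(f"[{section}]")
        for flag, value in params.items():
            value = str(value)
            if "\t" in value or "\n" in value:
                raise ValueError(f"value for {flag!r} must not contain tabs or newlines")
            lines.append(f"{flag}\t{value}")
    return "\n".join(lines) + "\n"


def cel_list(file_names):
    """Join cel file names with single commas and no spaces (exon_cel_files)."""
    names = [n.strip() for n in file_names if n.strip().lower().endswith(".cel")]
    for name in names:
        if "," in name or " " in name:
            raise ValueError(f"cel file name {name!r} contains a comma or space")
    return ",".join(names)


# Hypothetical paths; values mirror "MAT background correction, scalar normalization".
EXAMPLE = {
    "log": {"logfile": "probeEffects.log"},
    "exon_annotation": {
        "probeset_annotation": "probeset_annotation.csv",
        "pgf_file": "exon_array.pgf",
        "clf_file": "exon_array.clf",
    },
    "exon_data": {
        "folder": "cel_files",
        "exon_cel_files": cel_list(["sample1.cel", "sample2.cel"]),
    },
    "output": {
        "output_model_fit": "mat_model_fit.txt",
        "output_all_bkgd_correct_norm_probes": "true",
        "bkgd_correct_norm_probes_file": "bkgd_norm",
    },
    "model": {
        "array_type": "exon",
        "method": "mat",
        "train_model": "true",
        "mat_training_probe_type": "background",
        "normalization_method": "core_probe_scaling",
    },
}

if __name__ == "__main__":
    # Print rather than write, so no existing file is overwritten by accident.
    print(format_parameter_file(EXAMPLE), end="")
```

To save the result, the returned text can be written with `pathlib.Path("parameterFile.txt").write_text(...)`; passing `sorted(os.listdir(folder))` to `cel_list` builds the cel file list from a folder.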

| Section | Parameter | Description |
|---|---|---|
| [log] | logfile | The name of the file used to log progress, errors, etc. |
| [exon_annotation] | probeset_annotation | The probeset annotation file specifies the grouping of probesets into transcript clusters and the level of annotation supporting each probeset. See the annotation page to download. |
| | pgf_file | The pgf file specifies the grouping of probes into probesets. See the annotation page to download. |
| | clf_file | The clf file describes the position of each probe on the chip. Clf files with the description "crosshyb_x" also include the mapping of probes to off-targets, allowing an edit distance of "x" base pairs. To generate GeneBASE-xhyb estimates, a clf file with "crosshyb_x" must be specified. See the annotation page for details. |
| [exon_data] | folder | The folder storing the array cel files. |
| | exon_cel_files | A list of cel files, with the file names separated by single commas and no spaces. |
| [output] | output_model_fit | A file storing the MAT fitted coefficients and R-squared values. |
| | output_all_bkgd_correct_norm_probes | When set to "true", one file per array is output containing background-corrected, normalized probe intensities. |
| | bkgd_correct_norm_probes_file | A prefix for the set of files created for each array. |
| [model] | array_type | The type of array analyzed. Here a value of "exon" should be specified. |
| | method | Background-correction method. One of (mat, median_gc, none). |
| | train_model | This value should be set to "true" to output background-corrected, normalized probe intensities. |
| | mat_training_probe_type | The probes used for training the background model. One of (background, full). Defaults to background. |
| | normalization_method | Normalization method. One of (core_probe_scaling, none, quantile). The core_probe_scaling method applies a scalar to each array so that the median of background-corrected core probe intensities equals 100. The none method applies no normalization in addition to the background correction. The quantile method applies a quantile normalization (followed by background correction). |

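The two non-trivial normalization options can be illustrated on a small in-memory matrix. The Python below is a minimal sketch, not the ProbeEffects implementation: `core_probe_scaling` multiplies each array by 100 divided by the median of its core-probe intensities, and `quantile_normalize` is the standard quantile normalization (each array receives the mean of the sorted arrays, with ties broken by probe order). The `is_core` mask is a hypothetical input that would in practice come from the probeset annotation, and the sketch does not model where normalization falls relative to background correction.

```python
from statistics import median


def core_probe_scaling(arrays, is_core, target=100.0):
    """Scale each array so the median of its core-probe intensities equals target.

    arrays: list of per-array intensity lists, all in the same probe order.
    is_core: hypothetical boolean mask marking core probes.
    """
    scaled = []
    for values in arrays:
        core_median = median(v for v, core in zip(values, is_core) if core)
        if core_median <= 0:
            raise ValueError("core-probe median must be positive to scale")
        factor = target / core_median
        scaled.append([v * factor for v in values])
    return scaled


def quantile_normalize(arrays):
    """Give every array the same distribution: the mean of the sorted arrays."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: average intensity at each rank across arrays.
    reference = [sum(col[i] for col in sorted_arrays) / len(arrays) for i in range(n)]
    result = []
    for values in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n), key=lambda i: values[i])
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = reference[rank]
        result.append(normalized)
    return result
```

Scaling only changes each array by a constant factor, so relative probe intensities within an array are preserved; quantile normalization instead forces identical intensity distributions across arrays.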
To run the program with the parameter file "parameterFile.txt", enter the following command:
```
./ProbeEffects2.0 -par parameterFile.txt
```
The program writes a log file, named by the "logfile" parameter, which should be checked for errors.
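The format of the ProbeEffects log is not documented on this page, so the following Python is only a minimal sketch of a first-pass check. It assumes error messages contain the word "error" in some letter case (for example "ERROR: ..."); the function name `find_error_lines` and that convention are hypothetical. Reading the whole log remains the reliable check.

```python
import re

# ASSUMPTION: error messages contain the standalone word "error" in any case.
# The word boundaries skip plurals such as "0 errors", which also means a
# genuine message phrased only in the plural would be missed.
ERROR_PATTERN = re.compile(r"\berror\b", re.IGNORECASE)


def find_error_lines(log_text):
    """Return (line_number, line) pairs whose text mentions "error"."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if ERROR_PATTERN.search(line)
    ]
```

For example, `find_error_lines(open("probeEffects.log").read())` lists the flagged lines of a log with that (hypothetical) name.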
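When "output_all_bkgd_correct_norm_probes" is set to "true", the background-corrected, normalized probe intensities are written to one file per array, named with the prefix given by the "bkgd_correct_norm_probes_file" parameter. The file specified by "output_model_fit" stores the MAT fitted coefficients and R-squared values.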
Download the ProbeSelection program from the download page.
Detailed instructions for running probe selection can be found in the file ReadMe_ProbeSelection.