False Discovery Rate for 2x2 Contingency Tables Details
This tool estimates the False Discovery Rate (FDR) for 2x2 Contingency Tables, based on an extension to Fisher’s exact test.
When testing a large number of hypotheses, it can be helpful to estimate or control the false discovery rate (FDR), the expected proportion of tests called significant that are truly null. We extend Fisher's exact test to calculate the exact null distribution over a set of 2x2 contingency tables. We use these computations to determine pooled p-values and q-values.
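As a point of reference, the single-table building block can be sketched as follows. This is a minimal illustration of a two-sided Fisher's exact test for one 2x2 table, not the tool's own code: the function name `fisher_exact_two_sided`, the row-by-row reading of the counts (a, b, c, d), and the two-sided convention (summing all tables that are no more probable than the observed one) are assumptions made for this example. The pooled null distribution across many tables is not shown.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Assumption: counts are read row by row. Transposing the table
    (swapping b and c) gives the same p-value.
    """
    r1, r2 = a + b, c + d   # row totals
    c1 = a + c              # first column total
    n = r1 + r2             # grand total

    # With all margins fixed, the top-left cell k ranges over [lo, hi].
    lo, hi = max(0, c1 - r2), min(r1, c1)

    # Unnormalized hypergeometric weights; they share the denominator C(n, c1),
    # so comparing the integer weights avoids floating-point tie problems.
    weights = {k: comb(r1, k) * comb(r2, c1 - k) for k in range(lo, hi + 1)}
    observed = weights[a]

    # Sum every table that is at most as probable as the observed one.
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / comb(n, c1)


if __name__ == "__main__":
    print(fisher_exact_two_sided(3, 1, 1, 3))
```

Integer weights are used instead of floating-point probabilities so that tables with exactly equal probability are always treated as ties.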
The tool accepts a tab-delimited text file describing a set of contingency tables. Each line in the file describes one table by providing the table counts. For example, the line 34 78 98 70 describes one table with the four counts 34, 78, 98, and 70.
The user can specify any number of columns preceding the table counts containing table descriptions, and any number of columns after the table counts.
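To make the line format concrete, here is a small sketch of how one such line could be split into its parts. The function name `parse_table_line`, the parameter `skip_columns`, and the sample description text are hypothetical and chosen for this example; the tool's actual parser may differ.

```python
def parse_table_line(line, skip_columns=0):
    """Split one tab-delimited line into (description, counts, trailing).

    skip_columns is the number of leading description columns to ignore
    (a hypothetical parameter name). The four columns that follow are the
    table counts; anything after them is returned untouched.
    """
    fields = line.rstrip("\r\n").split("\t")
    description = fields[:skip_columns]
    count_fields = fields[skip_columns:skip_columns + 4]
    if len(count_fields) != 4:
        raise ValueError("expected four table counts after the description columns")
    counts = tuple(int(value) for value in count_fields)
    trailing = fields[skip_columns + 4:]
    return description, counts, trailing


if __name__ == "__main__":
    # One made-up description column, the four counts, one trailing column.
    print(parse_table_line("table-1\t34\t78\t98\t70\tcomment", skip_columns=1))
```

Trailing columns need no separate setting in this sketch because everything after the fourth count is simply passed through.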
We allow the following options for processing the data (for more details please refer to the technical report):
- Input file - specify the file containing the set of contingency tables.
- First row contains column headers - use this option to ignore the first line, if it contains column headers or any other text information.
- Table counts start on column - specify how many of the initial columns to ignore (when they contain textual description of the table). There is no need to explicitly specify the number of columns that should be ignored after the table counts.
- Compute positive FDR (pFDR) - use this option to switch between the computation of FDR (unchecked) and positive FDR (checked).
- Filter irrelevant tables - use this option to ignore tables that could not possibly achieve a significant p-value.
- Pi evaluation method - select the method for estimating the ratio of true nulls (pi) in the data.
  - Pi=1 - do not attempt to estimate pi.
  - sum of observed p-values / expected - estimate pi using the ratio of the number of observed tests with a specific p-value to the expected number of such tests under the null hypothesis. When filtering is selected, this is the only valid method.
  - 2*avg(p-value) - estimate pi using twice the average p-value.
- Output all computed statistics - select this option to include in the computed table all the statistics computed in the process. When this option is not used, only q-values are reported.
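Two of the options above lend themselves to a short illustration. The sketch below is one possible reading of them, not the tool's implementation: `min_attainable_p_value` computes the smallest two-sided p-value any table with the same margins could reach, `is_relevant` compares it to a significance threshold `alpha` (the threshold and its default are assumptions), and `estimate_pi_twice_mean` implements the 2*avg(p-value) estimate, capped at 1 (the cap is also an assumption). All names are hypothetical.

```python
from math import comb
from statistics import mean


def min_attainable_p_value(a, b, c, d):
    """Smallest two-sided Fisher p-value reachable with the margins of [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Unnormalized hypergeometric weights for every table with these margins.
    weights = [comb(r1, k) * comb(r2, c1 - k) for k in range(lo, hi + 1)]
    smallest = min(weights)
    # The least probable table(s) give the minimum p-value; ties are summed.
    return sum(w for w in weights if w == smallest) / comb(r1 + r2, c1)


def is_relevant(table, alpha=0.05):
    """True if a table with these margins could reach a p-value <= alpha."""
    return min_attainable_p_value(*table) <= alpha


def estimate_pi_twice_mean(p_values):
    """Estimate the proportion of true nulls as twice the mean p-value, capped at 1."""
    return min(1.0, 2.0 * mean(p_values))


if __name__ == "__main__":
    print(is_relevant((1, 0, 3, 4)))   # margins allow only two equally likely tables
    print(is_relevant((3, 1, 1, 3)))
    print(estimate_pi_twice_mean([0.1, 0.2, 0.3]))
```

The intuition for the last function is that p-values of true nulls average about 0.5, so twice the overall mean is close to 1 when nearly all tests are null and drops as more small p-values appear.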
Once you have selected an input file and all the appropriate options for your application, press Compute. The tool will report the progress of the computation and, once completed, present the results. This process may take a few minutes, depending on the number of tables and the size of their counts. We provide the following results:
- Data - in the Data pane we report, in addition to the input data, the following values:
  - p-value - p-values computed using Fisher's exact test.
  - pooled p-value - pooled p-values across the multiple tests.
  - pr(R(p)>0) (if pFDR is selected) - the probability that at least one test is rejected at p. This quantity is used in the computation of the pFDR.
  - filtering pi (if filtering is selected) - the estimated ratio of nulls among those tests that could possibly achieve the current p-value.
  - pFDR (if pFDR is selected) - the positive false discovery rate.
  - FDR (if pFDR is not selected) - the false discovery rate.
  - q-value - the q-value for the specific table.
  To use the data, press Copy to clipboard. Then you can paste the information into any application, such as Microsoft Excel.
- p-value histogram - the histogram of p-values. Use Zoom to focus on the more significant p-values.
- q-value vs. p-value - shows q-value as a function of p-value. Use Zoom to focus on the more significant p-values.
- #tests vs. q-value - the computed number of tests with a specific q-value. Use Zoom to focus on the more significant q-values.
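For readers who want to see how q-values relate to p-values, the following sketch uses the standard step-up estimate for continuous p-values: FDR(t) is approximated by pi * m * t / #{p <= t}, and each q-value is the smallest such estimate over all thresholds at or above its p-value. This is a generic approximation and not the exact pooled computation the tool performs; the function name `q_values` and the parameter `pi0` (the estimated ratio of true nulls, with Pi=1 as the default) are assumptions for this example.

```python
def q_values(p_values, pi0=1.0):
    """Step-up q-values from a list of p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum
    # FDR estimate over all thresholds at or above its own p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        fdr_estimate = pi0 * m * p_values[i] / rank
        running_min = min(running_min, fdr_estimate)
        q[i] = running_min
    return q


if __name__ == "__main__":
    for p, q in zip([0.01, 0.04, 0.03, 0.5], q_values([0.01, 0.04, 0.03, 0.5])):
        print(p, q)
```

The running minimum is what makes q-values non-decreasing in the p-value, which is the shape shown in the q-value vs. p-value graph.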
We provide a file containing a set of contingency tables as an example. To test our tool, save this file to your local computer, and then use it as the Input file in the tool.
Data description: The MHC locus on Chromosome 6 contains three HLA class I genes, A, B and C. These genes are very important for the adaptive immune response, and are remarkably variable, with thousands of known variations (alleles). Because these genes are located in the same region, they tend to be inherited together, such that certain pairs of HLA alleles tend to be correlated in different populations. Rousseau and colleagues recently looked for correlations between HLA alleles and observed adaptations in HIV. As a meta-analysis, they looked for correlations between HLA alleles using Fisher's exact test. Such correlations were useful in determining which HLA allele was likely responsible for an observed adaptation in HIV. The example file consists of contingency tables measuring the frequency of co-occurrence for each pair of HLA alleles in Rousseau et al.'s study population (more information available here).
(Rousseau et al. 2008, Journal of Virology, 82(13):6434-6446)
Our application is designed to handle large table sets (tested with about 100K tables) with large counts (tested with n=500K), but further scaling options are unavailable through the Silverlight user interface.
To access all the possible options, install the complete application from CodePlex.
Let us know what you think! If you have suggestions or encounter problems, please send an email to one of the authors below.