False Discovery Rate for 2x2 Contingency Tables Details
Overview
This tool estimates the False Discovery Rate (FDR) for 2x2 contingency tables, based on an extension to Fisher's exact test.
When testing a large number of hypotheses, it can be helpful to estimate or control the false discovery rate (FDR), the expected proportion of tests called significant that are truly null. We extend Fisher's exact test to calculate the exact null distribution over a set of 2x2 contingency tables. We use these computations to determine pooled pvalues and qvalues.
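As a rough illustration of the per-table building block, the sketch below computes a two-sided Fisher's exact pvalue for a single 2x2 table from the hypergeometric distribution. It is not the tool's implementation and does not cover the pooled null distribution across tables. The function name fisher_exact_two_sided, the cell layout (a b / c d) and the two-sided rule (sum the probabilities of all tables that are no more likely than the observed one) are assumptions made for this example.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact pvalue for the 2x2 table [[a, b], [c, d]].

    Assumption: cells are laid out row by row; the tool's own layout
    may differ.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)   # smallest feasible top-left count
    hi = min(row1, col1)       # largest feasible top-left count
    p_obs = prob(a)
    # Sum the probabilities of every table that is no more likely than
    # the observed one; the small tolerance guards against float ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


if __name__ == "__main__":
    print(fisher_exact_two_sided(3, 0, 0, 3))
```

Exact integer binomials are used so that large counts do not overflow before the final division.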
The tool accepts a tab-delimited text file describing a set of contingency tables. Each line in the file describes one table by providing the table counts. For example, the line 34 78 98 70 describes a table with the four cell counts 34, 78, 98 and 70.
The user can specify any number of columns preceding the table counts containing table descriptions, and any number of columns after the table counts.
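The input layout described above can be sketched as a small parser. This is only an illustration of the file format, not the tool's own reader; the function name parse_table_line and the parameter leading_columns (the number of description columns before the counts) are hypothetical.

```python
def parse_table_line(line, leading_columns=0):
    """Split one tab-delimited line into (descriptions, counts, trailing).

    leading_columns is a hypothetical parameter: the number of text
    columns that precede the four table counts.
    """
    fields = line.rstrip("\r\n").split("\t")
    raw_counts = fields[leading_columns:leading_columns + 4]
    if len(raw_counts) != 4:
        raise ValueError("expected four table counts on the line")
    counts = tuple(int(value) for value in raw_counts)
    # Anything after the four counts is carried along untouched.
    trailing = fields[leading_columns + 4:]
    return fields[:leading_columns], counts, trailing


if __name__ == "__main__":
    example = "alleleA\talleleB\t34\t78\t98\t70\tcomment"
    print(parse_table_line(example, leading_columns=2))
```

Columns after the counts need no explicit setting here because everything past the fourth count is simply passed through.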
Options
We allow the following options for processing the data (for more details please refer to the technical report):
 Input file - specify the file containing the set of contingency tables.
 First row contains column headers - use this option to ignore the first line if it contains column headers or any other text.
 Table counts start on column - specify how many of the initial columns to ignore (when they contain a textual description of the table). There is no need to specify the number of columns to ignore after the table counts.
 Compute positive FDR (pFDR) - use this option to switch between computing FDR (unchecked) and positive FDR (checked).
 Filter irrelevant tables - use this option to ignore tables that could not possibly achieve a significant pvalue.
 Pi evaluation method - select the method for estimating the proportion of true nulls (pi) in the data:
   Pi=1 - do not attempt to estimate pi.
   sum of observed pvalues / expected - estimate pi using the ratio of the number of observed tests with a specific pvalue to the expected number of such tests under the null hypothesis. When filtering is selected, this is the only valid method.
   2*avg(pvalue) - estimate pi using twice the average pvalue.
 Output all computed statistics - select this option to include in the output table all the statistics computed in the process. When this option is not used, only qvalues are reported.
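The 2*avg(pvalue) method listed above can be sketched in a few lines. The usual motivation is that, for continuous tests, null pvalues are uniform with mean 1/2, so twice the mean pvalue approximates the null proportion when non-null pvalues are small. The function name estimate_pi_twice_mean and the cap at 1 are assumptions for this example; the tool may handle these details differently.

```python
def estimate_pi_twice_mean(pvalues):
    """Estimate the proportion of true nulls as twice the mean pvalue.

    Assumption: the estimate is capped at 1, since a proportion cannot
    exceed 1; this cap is not taken from the tool itself.
    """
    if not pvalues:
        raise ValueError("at least one pvalue is required")
    mean_p = sum(pvalues) / len(pvalues)
    return min(1.0, 2.0 * mean_p)


if __name__ == "__main__":
    print(estimate_pi_twice_mean([0.001, 0.02, 0.4, 0.7, 0.9]))
```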
Once you have selected an input file and all the appropriate options for your application, press Compute. The tool will report the progress of the computation and, once completed, present the results. This process may take a few minutes, depending on the number of tables and the size of their counts. We provide the following results:

 Data - in the Data pane we report, in addition to the input data, the following values:
   pvalue - pvalues computed using Fisher's exact test.
   pooled pvalue - pooled pvalues across the multiple tests.
   pr(R(p)>0) (if pFDR is selected) - the probability that at least one test is rejected at p. This quantity is used in the computation of the pFDR.
   filtering pi (if filtering is selected) - the estimated ratio of nulls among those tests that could possibly achieve the current pvalue.
   pFDR (if pFDR is selected) - the positive false discovery rate.
   FDR (if pFDR is not selected) - the false discovery rate.
   qvalue - the qvalue for the specific table.
   To use the data, press Copy to clipboard. Then you can paste the information to any application, such as Microsoft Excel.
 pvalue histogram - the histogram of pvalues. Use Zoom to focus on the more significant pvalues.
 qvalue vs. pvalue - shows qvalue as a function of pvalue. Use Zoom to focus on the more significant pvalues.
 #tests vs. qvalue - the computed number of tests with a specific qvalue. Use Zoom to focus on the more significant qvalues.
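To make the relationship between pvalues and qvalues more concrete, the sketch below uses the standard step-up definition: the qvalue of a test is the smallest estimated FDR over all thresholds at or above its pvalue. It estimates the FDR at threshold t as pi * m * t / (number of pvalues <= t), which is the common continuous approximation and not the exact pooled null distribution that the tool computes. The function name qvalues_from_pvalues and the parameter pi are hypothetical.

```python
def qvalues_from_pvalues(pvalues, pi=1.0):
    """Return one qvalue per pvalue, in the input order.

    Assumption: FDR(t) is approximated by pi * m * t / #{p <= t};
    the tool itself relies on an exact null distribution instead.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest pvalue down so that each qvalue is the
    # minimum estimated FDR over all thresholds at or above it.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        fdr = pi * m * pvalues[index] / rank
        running_min = min(running_min, fdr)
        qvalues[index] = running_min
    return qvalues


if __name__ == "__main__":
    for p, q in zip([0.01, 0.04, 0.03, 0.5],
                    qvalues_from_pvalues([0.01, 0.04, 0.03, 0.5])):
        print(p, q)
```

Taking the running minimum keeps qvalues non-decreasing in the pvalue, which is why a qvalue vs. pvalue plot is monotone.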
Example
We provide a file containing a set of contingency tables as an example. To test our tool, save this file to your local computer, and then use it as the Input file in the tool.
Data description: The MHC locus on chromosome 6 contains three HLA class I genes, A, B and C. These genes are very important for the adaptive immune response and are remarkably variable, with thousands of known variants (alleles). Because these genes are located in the same region, they tend to be inherited together, so certain pairs of HLA alleles tend to be correlated in different populations. Rousseau and colleagues recently looked for correlations between HLA alleles and observed adaptations in HIV. As a meta-analysis, they looked for correlations between HLA alleles using Fisher's exact test. Such correlations were useful in determining which HLA allele was likely responsible for an observed adaptation in HIV. The example file consists of contingency tables measuring the frequency of co-occurrence for each pair of HLA alleles in Rousseau et al.'s study population (more information available here).
(Rousseau et al. 2008, Journal of Virology, 82(13):6434-6446)
Scalability
While our application is designed to handle large table sets (tested with about 100K tables) with large counts (tested with n=500K), further scaling options are unavailable through the Silverlight user interface.
To access all the possible options, install the complete application from CodePlex.
Contact Information
Let us know what you think! If you have suggestions or encounter problems, please send an email to one of the authors below.