Share this page
    Project Tuva Enhanced Video Player
    Project Tuva Enhanced Video Player
    Share this page E-mail this page Print this page RSS feeds
    Home  > eScience  > Microsoft Computational Biology Web Tools  > Fisher's Exact Test  > Fisher's Exact Test Details
    Fisher's Exact Test Tool Details

    Overview

    This tool rapidly computes Fisher's exact test of independence for 2x2 contingency tables. Below, we explain how Fisher's exact test can be efficiently computed. For a more detailed description, please refer to the technical report.

    Efficient hyper-geomteric probability computation

    Given a 2x2 contingency table, summarizing the co-occurrences of two boolean variables - X, and Y:

    Fisher showed that the probability of a table under the null hypothesis that X and Y are independent, follows the hypergeometric distribution:

    where teta_X=a+b, teta_Y=a+c, and n=a+b+c+d.

    There are two difficulties with the equation above. First, computing factorials for large numbers causes numerical overflow. Second, a direct computation requires 2n operations.

    We solve the overflow problem by selectively multiplying and dividing terms from the equation. That is, whenever the partial result grows beyond 1, we divide by a term in the denominator, and wherever the partial result shrinks below 1, we multiply by a term in the numerator. To reduce the overall runtime we factorize the factorials. As we have argued above, in many cases the marginals are not balanced. In that case there is a considerable difference between the minimal and the maximal marginal. If teta_Y = c + d is the minimal marginal we factorize the computation as follows:

    which requires 3(c + d) operations. In many cases, one of the entries of the table significantly dominates the rest, making the above computation a dramatic improvement over the straight forward implementation. Algorithm 1 provides pseudo-code for our approach.

    Efficient computation of Fisher's exact test

    Fisher suggested to compute a p-value by summing the probabilities of all the tables that are "as extreme" as the current table. A one-sided test sums all the tables that are more extreme in one direction, that is, all the tables consistent with the marginals that have smaller a (Left)

    or larger a (Right).

    The Fisher two-tailed p-value for a table t=[a,b,c,d] is defined as:

    which is the sum of all tables consistent with the marginals that are as likely as the current table. If we order the permutations by increasing value of a, we can see that, for two tables

    , we have

    Therefore, computing the probabilities of all the permutations can be done by computing the probability of the table with the minimal possible a, and then iteratively computing successor probabilities until reaching the maximal possible a.