
False Discovery Rate Control in Computational Biology

Implemented a simulation study comparing Benjamini-Hochberg (BH), Storey's q-value, and Independent Hypothesis Weighting (IHW) for large-scale multiple testing.

This seminar project was based on the paper 'A Practical Guide to Methods Controlling False Discoveries in Computational Biology' by Korthauer et al. The motivation is simple: modern biological datasets often require thousands or millions of hypothesis tests at once, and declaring significance from uncorrected p-values (for example, p < 0.05 for each test) then produces many false positives.
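The arithmetic behind that motivation is short. If m0 hypotheses are true nulls and each is tested at level alpha with no correction, the expected number of false positives is m0 · alpha, and under independence the chance of at least one false positive is 1 − (1 − alpha)^m0. The sketch below uses hypothetical function names and the worst case in which all 20,000 tests are null.

```python
import math


def expected_false_positives(m0: int, alpha: float) -> float:
    """E[V] = m0 * alpha: expected false positives among m0 true nulls tested at level alpha."""
    return m0 * alpha


def prob_at_least_one_false_positive(m0: int, alpha: float) -> float:
    """1 - (1 - alpha)^m0 for independent nulls, computed stably via log1p/expm1."""
    return -math.expm1(m0 * math.log1p(-alpha))


if __name__ == "__main__":
    m0, alpha = 20_000, 0.05  # worst case: every one of the 20,000 hypotheses is null
    print(expected_false_positives(m0, alpha))
    print(prob_at_least_one_false_positive(m0, alpha))
```

With these numbers the expectation is about 1,000 false positives, and at least one false positive is practically certain, which is why some form of FDR control is needed.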

I implemented an RMarkdown simulation with 20,000 hypotheses to compare three false discovery rate (FDR) control approaches: Benjamini-Hochberg, Storey's q-value, and Independent Hypothesis Weighting. The simulation includes an informative covariate, analogous to mean gene expression in RNA-seq, that affects both the probability that a hypothesis carries a true signal and how easily that signal is detected.
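A minimal pure-Python sketch of this kind of simulation, together with BH and a simplified Storey q-value, may help make the setup concrete. It is not the project's RMarkdown code: the uniform covariate, the signal-probability curve 0.05 + 0.25u, the effect-size curve 1 + 3u, the one-sided z-test, the nominal level 0.05, the fixed λ = 0.5 in the π0 estimate, and all function names are hypothetical choices. The Bioconductor qvalue package estimates π0 by smoothing over a grid of λ values by default, so this single-λ version is a simplification.

```python
import math
import random

# Hypothetical settings; the project's actual parameters are not stated.
M = 20_000
ALPHA = 0.05  # assumed nominal FDR level


def simulate(m=M, seed=1):
    """Return (covariate, is_signal, p_value) triples for m one-sided z-tests."""
    rng = random.Random(seed)
    data = []
    for _ in range(m):
        u = rng.random()  # informative covariate in [0, 1), e.g. scaled mean expression
        is_signal = rng.random() < 0.05 + 0.25 * u  # signal probability rises with u
        mu = (1.0 + 3.0 * u) if is_signal else 0.0  # detectability (effect size) rises with u
        z = rng.gauss(mu, 1.0)
        p = 0.5 * math.erfc(z / math.sqrt(2.0))  # one-sided p-value P(Z >= z) under N(0, 1)
        data.append((u, is_signal, p))
    return data


def bh_reject(pvals, alpha):
    """BH step-up: reject the k smallest p-values, k = max{i : p_(i) <= i * alpha / m}."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


def storey_qvalues(pvals, lam=0.5):
    """Simplified Storey q-values: pi0 from a single lambda, then q_(i) = min_{j>=i} pi0*m*p_(j)/j."""
    m = len(pvals)
    pi0 = min(1.0, sum(p > lam for p in pvals) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0  # q-values are capped at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvals[i] / rank)
        q[i] = running_min
    return q, pi0


if __name__ == "__main__":
    data = simulate()
    pvals = [p for _, _, p in data]
    q, pi0 = storey_qvalues(pvals)
    print("pi0 estimate:", round(pi0, 3))
    print("BH discoveries:", sum(bh_reject(pvals, ALPHA)))
    print("q-value discoveries:", sum(qi <= ALPHA for qi in q))
```

Because the estimated π0 is at most 1, the q-value procedure rejects every hypothesis BH rejects and possibly more; neither method looks at the covariate.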

The important idea is that biological hypotheses are not interchangeable: some have a higher prior chance of being true effects, or more power to be detected, than others. Covariate-aware methods such as IHW use this side information to give more weight to hypotheses with higher detection power, which increases discoveries while still controlling the FDR at the nominal level.
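The core mechanism can be illustrated with weighted BH, which applies the BH step-up rule to p_i / w_i for nonnegative weights that average 1. IHW itself is more involved: it bins the covariate, learns one weight per bin by maximizing discoveries, and uses cross-fitting across folds so that a hypothesis's weight is not learned from its own p-value. The sketch below skips the learning step and uses hand-set, hypothetical bin weights, so it shows the weighting idea rather than the IHW algorithm; all function names are illustrative.

```python
import math


def normalize_weights(raw):
    """Rescale nonnegative weights so they average exactly 1, as weighted BH requires."""
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def covariate_bin_weights(covariates, bin_weights):
    """Hypothetical rule: give each hypothesis the weight of its equal-width covariate bin in [0, 1)."""
    n_bins = len(bin_weights)
    raw = [bin_weights[min(int(u * n_bins), n_bins - 1)] for u in covariates]
    return normalize_weights(raw)


def weighted_bh_reject(pvals, weights, alpha):
    """BH step-up on p_i / w_i; a larger weight makes a hypothesis easier to reject."""
    m = len(pvals)
    adj = [p / w if w > 0 else math.inf for p, w in zip(pvals, weights)]
    order = sorted(range(m), key=lambda i: adj[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if adj[i] <= rank * alpha / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


if __name__ == "__main__":
    covariates = [0.9, 0.8, 0.1, 0.2]  # two high-covariate, two low-covariate hypotheses
    pvals = [0.01, 0.03, 0.2, 0.5]
    w = covariate_bin_weights(covariates, [0.5, 1.5])  # up-weight the high-covariate bin
    print("unweighted BH:", weighted_bh_reject(pvals, [1.0] * 4, 0.05))
    print("weighted BH:  ", weighted_bh_reject(pvals, w, 0.05))
```

In this toy example plain BH rejects only the first hypothesis, while weighting the high-covariate bin also rejects the second (0.03 / 1.5 = 0.02 is below the second BH threshold 2 · 0.05 / 4 = 0.025). The price is that low-covariate hypotheses become harder to reject, which is why the weights must come from the covariate rather than from the p-values.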

Highlights

Simulated 20,000 hypothesis tests with covariate-dependent signal probability.
Compared BH, q-value, and IHW under the same nominal alpha level.
Evaluated discoveries, true positives, false positives, observed FDR, and true positive rate.
Observed that IHW achieved the highest power in the simulation while keeping the observed FDR low.
Related the simulation to high-throughput settings such as RNA-seq, single-cell analysis, microbiome studies, ChIP-seq, GWAS, and gene set analysis.
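The evaluation metrics listed above follow directly from comparing each method's rejections with the simulated ground truth. A minimal sketch with hypothetical names follows. Note that the "observed FDR" of a single run is the false discovery proportion V / R (taken as 0 when nothing is rejected); the FDR itself is the expectation of this quantity over repeated simulations.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    discoveries: int
    true_positives: int
    false_positives: int
    observed_fdr: float  # false discovery proportion V / R for this run
    tpr: float  # true positive rate (power): TP / number of true signals


def evaluate(rejected, is_signal):
    """Compare a method's rejections against the known truth of a simulation."""
    n_rejected = sum(rejected)
    tp = sum(r and s for r, s in zip(rejected, is_signal))
    fp = n_rejected - tp
    n_signal = sum(is_signal)
    return Metrics(
        discoveries=n_rejected,
        true_positives=tp,
        false_positives=fp,
        observed_fdr=fp / n_rejected if n_rejected > 0 else 0.0,
        tpr=tp / n_signal if n_signal > 0 else 0.0,
    )


if __name__ == "__main__":
    # Toy run: 3 rejections, of which 2 are true signals; 3 true signals overall.
    print(evaluate([True, True, True, False, False], [True, True, False, True, False]))
```

Averaging `observed_fdr` and `tpr` over many simulated datasets gives estimates of each method's FDR and power at the chosen nominal level.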