Predicting Epistatic Interactions

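Mutual information is straightforward to estimate when both variables are discrete or both are continuous, but not when the data are mixed. The need for an estimator in the mixed case has led to the publication of many methods. We use the best known of these, from Ross et al. (2014). Given a discrete variable $\mathcal{X}$ and a continuous variable $\mathcal{Y}$, the mutual information between the two is estimated as $I(\mathcal{X};\mathcal{Y}) = \psi(N) - \langle \psi(N_x) \rangle + \psi(K) - \langle \psi(M) \rangle$. Here $\psi$ is the digamma function, $N$ is the total number of samples, $N_x$ is the number of samples that share a given sample's discrete value, $K$ is the chosen number of nearest neighbors, $M$ is the number of samples of any discrete value that lie within the distance to the sample's $K$-th nearest neighbor with the same discrete value, and $\langle \cdot \rangle$ denotes an average over all samples.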
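The estimator can be illustrated with a minimal sketch, assuming a one-dimensional continuous variable and absolute difference as the distance. The function names `digamma_int` and `mixed_mutual_information` are hypothetical, and the handling of small classes (capping the neighbor count and skipping labels that occur once) is an assumption of this sketch rather than something stated above. Because every argument of $\psi$ here is a positive integer, the digamma function is computed from harmonic numbers.

```python
import random
from collections import Counter

EULER_GAMMA = 0.5772156649015329


def digamma_int(n: int) -> float:
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def mixed_mutual_information(x, y, k=3):
    """Nearest-neighbor estimate of I(X;Y) in nats.

    x: sequence of discrete labels; y: sequence of floats (one-dimensional).
    """
    counts = Counter(x)
    # Assumption: a point whose label occurs once has no same-label neighbor, so skip it.
    keep = [i for i in range(len(x)) if counts[x[i]] > 1]
    n = len(keep)
    if n == 0:
        raise ValueError("every label occurs only once; nothing to estimate")
    sum_nx = sum_k = sum_m = 0.0
    for i in keep:
        n_x = counts[x[i]]
        k_i = min(k, n_x - 1)  # assumption: cap k for small classes
        # Distance to the k-th nearest neighbor that shares this point's label.
        same = sorted(abs(y[j] - y[i]) for j in keep if j != i and x[j] == x[i])
        d = same[k_i - 1]
        # M: points of any label within that distance (the point itself excluded).
        m_i = sum(1 for j in keep if j != i and abs(y[j] - y[i]) <= d)
        sum_nx += digamma_int(n_x)
        sum_k += digamma_int(k_i)
        sum_m += digamma_int(m_i)
    # psi(N) - <psi(N_x)> + <psi(K)> - <psi(M)>
    return digamma_int(n) + (sum_k - sum_nx - sum_m) / n


# Synthetic data for illustration only.
rng = random.Random(0)
labels = [0] * 100 + [1] * 100
separated = [rng.gauss(10.0 * c, 1.0) for c in labels]  # Y depends on X
unrelated = [rng.gauss(0.0, 1.0) for _ in labels]       # Y independent of X
print(mixed_mutual_information(labels, separated))
print(mixed_mutual_information(labels, unrelated))
```

For two equally sized, well-separated classes the estimate should approach $\ln 2$ nats, and for unrelated data it should fluctuate around zero. The brute-force neighbor search is quadratic in the number of samples, so this sketch is meant for reading rather than for genome-scale data.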

Epistatic genetic interactions are typically ignored in genome-wide association studies because of the underlying mathematical and computational complexity. One method, proposed by Hu et al. (2016), uses information theory and network theory to identify important single nucleotide polymorphisms (SNPs) that engage in epistatic behavior. Hu et al. use $\textit{Information Gain}$ (McGill 1954) to measure the amount of information about a discrete phenotype $\mathcal{P}$ that is gained only from the combined effect of two SNPs $A$ and $B$: $IG_D(A;B;\mathcal{P}) = I(A,B;\mathcal{P}) - I(A;\mathcal{P}) - I(B;\mathcal{P})$, where $I(\cdot)$ is the mutual information.
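As a worked example of the Information Gain formula, the sketch below uses plug-in (frequency-count) estimates of mutual information in bits. The function names `mutual_information` and `information_gain` are hypothetical, the plug-in estimator is an assumption of this sketch and not necessarily the one used by Hu et al., and the binary toy genotypes are chosen only to keep the arithmetic easy to follow.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete observations."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # sum over observed pairs of p(x,y) * log2( p(x,y) / (p(x) p(y)) )
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def information_gain(a, b, p):
    """IG(A;B;P) = I(A,B;P) - I(A;P) - I(B;P)."""
    joint_ab = list(zip(a, b))  # treat the SNP pair as one combined variable
    return (
        mutual_information(joint_ab, p)
        - mutual_information(a, p)
        - mutual_information(b, p)
    )


# Toy genotypes: every combination of two binary SNPs, observed once.
snp_a = [0, 0, 1, 1]
snp_b = [0, 1, 0, 1]

# Purely epistatic phenotype (XOR): neither SNP is informative alone.
xor_phenotype = [a ^ b for a, b in zip(snp_a, snp_b)]
print(information_gain(snp_a, snp_b, xor_phenotype))  # 1.0

# Phenotype determined by A alone: the pair adds nothing beyond A.
main_effect_phenotype = list(snp_a)
print(information_gain(snp_a, snp_b, main_effect_phenotype))  # 0.0
```

In the XOR case each SNP carries zero information on its own while the pair determines the phenotype completely, so the full bit is attributed to the interaction. In the second case all of the information is already explained by $A$, and the Information Gain is zero.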

We extend this approach with updated methods and adapt it to work with continuous phenotypes.