Motif Enrichment Analysis

After the motif scan, we can run motif enrichment analysis: choose two lists of regions, run Fisher’s exact test between the two sets for each motif cluster (or motif), and apply multiple-testing correction to the resulting p-values.
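To illustrate the procedure, here is a minimal sketch. It is not the ALLCools implementation. It assumes a boolean region-by-motif hit table, builds one 2x2 contingency table per motif (motif present/absent in the true vs. background regions), tests each table with scipy.stats.fisher_exact, and corrects the p-values with statsmodels' multipletests. The function name motif_fisher_enrichment and the choice of Benjamini-Hochberg correction (fdr_bh) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import fisher_exact
from statsmodels.stats.multitest import multipletests


def motif_fisher_enrichment(hits, true_regions, background_regions,
                            alternative="two-sided", method="fdr_bh"):
    """Hypothetical helper: Fisher's exact test per motif plus FDR correction.

    hits: boolean DataFrame, rows = regions, columns = motifs (or motif clusters),
          True where the motif occurs in the region (assumed input layout).
    """
    fg = hits.loc[true_regions]
    bg = hits.loc[background_regions]
    fg_hit = fg.sum(axis=0)  # number of true regions containing each motif
    bg_hit = bg.sum(axis=0)  # number of background regions containing each motif

    records = []
    for motif in hits.columns:
        # rows: true / background; columns: motif present / absent
        table = [[fg_hit[motif], len(fg) - fg_hit[motif]],
                 [bg_hit[motif], len(bg) - bg_hit[motif]]]
        oddsratio, p = fisher_exact(table, alternative=alternative)
        records.append((motif, oddsratio, p))

    result = pd.DataFrame(records, columns=["motif", "oddsratio", "p"]).set_index("motif")
    # multiple-testing correction across all motifs (Benjamini-Hochberg assumed)
    _, q, _, _ = multipletests(result["p"], method=method)
    result["q"] = q
    return result


# Toy usage with made-up data: six regions, two motifs
hits = pd.DataFrame({"m1": [True, True, True, False, False, False],
                     "m2": [False, True, False, True, True, False]},
                    index=[f"r{i}" for i in range(6)])
res = motif_fisher_enrichment(hits, ["r0", "r1", "r2"], ["r3", "r4", "r5"])
print(res)
```

Here m1 occurs in every true region and in no background region, so it receives the smallest p-value. m2 is slightly depleted in the true set (odds ratio 0.25) but is far from significant.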

Import

import numpy as np
import pandas as pd
import xarray as xr
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import fisher_exact
from statsmodels.stats.multitest import multipletests
from ALLCools.mcds import RegionDS

Load

dmr = RegionDS.open('HIP_small')
Using dmr as region_dim

Motif Enrichment Between Two Sets of Regions

region_dim = 'dmr'
region_state_da = 'dmr_state'
feature_dim = 'sample'
# this is a helper function to select the hypo- and hyper-DMRs of one sample
hypo_dmr, hyper_dmr = dmr.get_hypo_hyper_index('CA1')
result = dmr.motif_enrichment(true_regions=hypo_dmr,
                              background_regions=hyper_dmr,
                              region_dim=None,
                              motif_dim='motif-cluster',
                              motif_da=None,
                              alternative='two-sided')
result.head()
              oddsratio         p         q    log2OR      -lgq
motif-cluster
c1             0.369231  0.476190  0.901924 -1.437405   0.04483
c10            1.142857  1.000000  1.000000  0.192645  -0.00000
c100           0.474576  0.299267  0.901924 -1.075288   0.04483
c101           0.733333  0.701906  1.000000 -0.447459  -0.00000
c102           0.369231  0.476190  0.901924 -1.437405   0.04483
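The last two columns look like simple transforms of the first three: log2OR matches log2 of oddsratio and -lgq matches -log10 of q. A negative log2OR therefore means the motif cluster is less frequent in true_regions than in background_regions. The check below recomputes both columns in plain NumPy/pandas for two rows copied from the table above. It is a consistency check only, not a call into ALLCools. It also explains the "-0.00000" entries: -log10(1.0) is a negative zero.

```python
import numpy as np
import pandas as pd

# Two rows copied from the CA1 result above
result = pd.DataFrame(
    {"oddsratio": [0.369231, 1.142857], "q": [0.901924, 1.000000]},
    index=pd.Index(["c1", "c10"], name="motif-cluster"),
)
# Recompute the derived columns from oddsratio and q
result["log2OR"] = np.log2(result["oddsratio"])
result["-lgq"] = -np.log10(result["q"])  # q == 1.0 gives -0.0
print(result.round(6))
```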

Motif Enrichment For Each Sample

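Alternatively, RegionDS.sample_dmr_motif_enrichment() does the same in a single call: you pass only the sample name and it selects that sample's hypo- and hyper-DMRs itself. For CA1 it returns the same table as above.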

result = dmr.sample_dmr_motif_enrichment('CA1')
result.head()
              oddsratio         p         q    log2OR      -lgq
motif-cluster
c1             0.369231  0.476190  0.901924 -1.437405   0.04483
c10            1.142857  1.000000  1.000000  0.192645  -0.00000
c100           0.474576  0.299267  0.901924 -1.075288   0.04483
c101           0.733333  0.701906  1.000000 -0.447459  -0.00000
c102           0.369231  0.476190  0.901924 -1.437405   0.04483
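Before the per-sample helper can run the test, it has to obtain the sample's hypo- and hyper-DMR lists. The sketch below shows one way to pull such lists from a region-by-sample state array with xarray. It assumes that dmr_state encodes hypo-methylation as -1, hyper-methylation as 1 and no difference as 0. The helper hypo_hyper_index and the toy data are hypothetical, so check the RegionDS documentation for the actual encoding.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy dmr_state (assumed encoding: -1 = hypo, 1 = hyper, 0 = not differential)
dmr_state = xr.DataArray(
    np.array([[-1, 1], [1, -1], [-1, -1], [0, 1]]),
    dims=("dmr", "sample"),
    coords={"dmr": ["dmr0", "dmr1", "dmr2", "dmr3"], "sample": ["CA1", "ASC"]},
)


def hypo_hyper_index(state, sample):
    """Hypothetical helper: return (hypo, hyper) region IDs for one sample."""
    s = state.sel(sample=sample).to_pandas()  # Series indexed by DMR ID
    return s[s == -1].index, s[s == 1].index


hypo, hyper = hypo_hyper_index(dmr_state, "CA1")
print(list(hypo), list(hyper))
```

The two index lists play the same roles as true_regions and background_regions in the motif_enrichment call shown earlier.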

Motif Enrichment Between Sample Pairs

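You can also compare the hypo-DMRs of two different samples. Regions that are hypo-methylated in both samples (a_and_b) are left out, so the test contrasts the non-overlapping sets: hypo-DMRs found only in CA1 (a_not_b) against those found only in ASC (b_not_a).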
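Judging by their names, the three outputs of get_pairwise_differential_index are set operations on the two samples' hypo-DMRs: A only, shared, and B only. The minimal pandas sketch below shows that split using made-up DMR IDs. It illustrates the set logic only and is not the ALLCools code.

```python
import pandas as pd

# Hypothetical hypo-DMR IDs for sample A and sample B
a_hypo = pd.Index(["dmr0", "dmr2", "dmr5"])
b_hypo = pd.Index(["dmr2", "dmr3"])

a_not_b = a_hypo.difference(b_hypo)    # hypo in A only
a_and_b = a_hypo.intersection(b_hypo)  # hypo in both (excluded from the test)
b_not_a = b_hypo.difference(a_hypo)    # hypo in B only
print(list(a_not_b), list(a_and_b), list(b_not_a))
```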

a_not_b, a_and_b, b_not_a = dmr.get_pairwise_differential_index('CA1', 'ASC', dmr_type='hypo')

result = dmr.motif_enrichment(true_regions=a_not_b,
                              background_regions=b_not_a,
                              region_dim=None,
                              motif_dim='motif-cluster',
                              motif_da=None,
                              alternative='two-sided')
result.head()
              oddsratio         p   q    log2OR -lgq
motif-cluster
c1             0.491228  1.000000 1.0 -1.025535 -0.0
c10            0.736364  1.000000 1.0 -0.441510 -0.0
c100           0.658824  0.523942 1.0 -0.602036 -0.0
c101           0.721154  0.725386 1.0 -0.471621 -0.0
c102           0.491228  1.000000 1.0 -1.025535 -0.0
# this function does the same as the two steps above
result = dmr.pairwise_dmr_motif_enrichment('CA1', 'ASC', dmr_type='hypo')
result.head()
              oddsratio         p   q    log2OR -lgq
motif-cluster
c1             0.491228  1.000000 1.0 -1.025535 -0.0
c10            0.736364  1.000000 1.0 -0.441510 -0.0
c100           0.658824  0.523942 1.0 -0.602036 -0.0
c101           0.721154  0.725386 1.0 -0.471621 -0.0
c102           0.491228  1.000000 1.0 -1.025535 -0.0