Motif Enrichment Analysis
After the motif scan, we can run motif enrichment analysis: choose two lists of regions, run Fisher's exact test between the two sets for each motif cluster (or motif), and then correct the p-values for multiple testing.
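The per-motif test is easy to sketch. The example below is a minimal, hypothetical re-implementation and not ALLCools' internal code. It assumes motif occurrences are available as a boolean regions × motifs table (the function name `motif_fisher_enrichment` and this input format are illustrative only). For each motif it builds a 2×2 contingency table (region set × motif present/absent), runs `scipy.stats.fisher_exact`, and then applies Benjamini–Hochberg FDR correction with `statsmodels`' `multipletests`. BH is an assumed choice here; the library may use a different correction method.

```python
import numpy as np
import pandas as pd
from scipy.stats import fisher_exact
from statsmodels.stats.multitest import multipletests


def motif_fisher_enrichment(motif_hits, true_regions, background_regions,
                            alternative="two-sided", method="fdr_bh"):
    """Hypothetical sketch: Fisher's exact test of motif occurrence in two region sets.

    motif_hits: boolean DataFrame (regions x motifs), True if the motif
    occurs in the region (assumed input format, for illustration only).
    """
    fg = motif_hits.loc[true_regions]
    bg = motif_hits.loc[background_regions]
    records = {}
    for motif in motif_hits.columns:
        fg_hit = int(fg[motif].sum())
        bg_hit = int(bg[motif].sum())
        # 2x2 table: rows = region set, columns = motif present / absent;
        # an odds ratio > 1 means the motif is enriched in true_regions
        table = [[fg_hit, len(fg) - fg_hit],
                 [bg_hit, len(bg) - bg_hit]]
        oddsratio, p = fisher_exact(table, alternative=alternative)
        records[motif] = (oddsratio, p)
    result = pd.DataFrame.from_dict(records, orient="index",
                                    columns=["oddsratio", "p"])
    # correct for testing many motifs at once (BH-FDR assumed)
    result["q"] = multipletests(result["p"], method=method)[1]
    with np.errstate(divide="ignore"):
        result["log2OR"] = np.log2(result["oddsratio"])
    result["-lgq"] = -np.log10(result["q"])
    return result


# toy data: 200 regions, 3 motifs; c1 is made more frequent in the first 100
rng = np.random.default_rng(0)
regions = [f"dmr_{i}" for i in range(200)]
hits = pd.DataFrame(rng.random((200, 3)) < 0.3, index=regions,
                    columns=["c1", "c2", "c3"])
hits.loc[regions[:100], "c1"] = rng.random(100) < 0.8
res = motif_fisher_enrichment(hits, regions[:100], regions[100:])
print(res.round(3))
```

The output columns are chosen to mirror the `oddsratio`, `p`, `q`, `log2OR` and `-lgq` columns returned by `motif_enrichment` in the tables below.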
Import
import numpy as np
import pandas as pd
import xarray as xr
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import fisher_exact
from statsmodels.stats.multitest import multipletests
from ALLCools.mcds import RegionDS
Load
dmr = RegionDS.open('HIP_small')
# prints: Using dmr as region_dim
Motif Enrichment Between Two Sets of Regions
region_dim = 'dmr'
region_state_da = 'dmr_state'
feature_dim = 'sample'
# helper that selects the hypo- and hyper-DMRs of one sample
hypo_dmr, hyper_dmr = dmr.get_hypo_hyper_index('CA1')
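To make the helper's role concrete, here is a hypothetical sketch of the selection it performs, not the ALLCools implementation. It assumes the `dmr_state` variable is a region × sample matrix in which -1 marks a hypo-methylated DMR and 1 a hyper-methylated DMR for that sample (this encoding is an assumption). The function name `hypo_hyper_index` and the toy data are illustrative only.

```python
import numpy as np
import xarray as xr

# Toy dmr_state matrix (dmr x sample); the -1 / 0 / 1 encoding is an assumption
state = xr.DataArray(
    np.array([[-1, 1], [1, -1], [0, 0], [-1, 0]]),
    dims=("dmr", "sample"),
    coords={"dmr": ["dmr_0", "dmr_1", "dmr_2", "dmr_3"],
            "sample": ["CA1", "ASC"]},
)


def hypo_hyper_index(state, sample, feature_dim="sample"):
    """Hypothetical sketch: split regions by their state in one sample."""
    # 1D Series indexed by region for the chosen sample
    sample_state = state.sel({feature_dim: sample}).to_pandas()
    hypo = sample_state[sample_state == -1].index   # assumed hypo code
    hyper = sample_state[sample_state == 1].index   # assumed hyper code
    return hypo, hyper


hypo, hyper = hypo_hyper_index(state, "CA1")
print(list(hypo), list(hyper))  # ['dmr_0', 'dmr_3'] ['dmr_1']
```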
result = dmr.motif_enrichment(true_regions=hypo_dmr,
background_regions=hyper_dmr,
region_dim=None,
motif_dim='motif-cluster',
motif_da=None,
alternative='two-sided')
result.head()
motif-cluster | oddsratio | p | q | log2OR | -lgq
---|---|---|---|---|---
c1 | 0.369231 | 0.476190 | 0.901924 | -1.437405 | 0.04483
c10 | 1.142857 | 1.000000 | 1.000000 | 0.192645 | -0.00000
c100 | 0.474576 | 0.299267 | 0.901924 | -1.075288 | 0.04483
c101 | 0.733333 | 0.701906 | 1.000000 | -0.447459 | -0.00000
c102 | 0.369231 | 0.476190 | 0.901924 | -1.437405 | 0.04483
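The last two columns are log transforms of the others: `log2OR` is log2 of the odds ratio (negative means the motif cluster is depleted in the true regions) and `-lgq` is -log10 of the adjusted p-value. The sketch below recomputes them from the five rows shown above and then filters for significant motif clusters. The 0.05 cutoff is an illustrative assumption, not a library default.

```python
import numpy as np
import pandas as pd

# first five rows of the result table above
result = pd.DataFrame(
    {"oddsratio": [0.369231, 1.142857, 0.474576, 0.733333, 0.369231],
     "p": [0.476190, 1.000000, 0.299267, 0.701906, 0.476190],
     "q": [0.901924, 1.000000, 0.901924, 1.000000, 0.901924]},
    index=pd.Index(["c1", "c10", "c100", "c101", "c102"], name="motif-cluster"),
)

# the derived columns are log transforms of oddsratio and q
result["log2OR"] = np.log2(result["oddsratio"])
result["-lgq"] = -np.log10(result["q"])
print(result.round(5))

# keep clusters passing an (assumed) FDR cutoff, strongest enrichment first
significant = result[result["q"] < 0.05].sort_values("log2OR", ascending=False)
print(len(significant))  # 0 for these five rows
```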
Motif Enrichment For Each Sample
Alternatively, RegionDS.sample_dmr_motif_enrichment() does the same in a single call; for CA1 its result is identical to the one above.
result = dmr.sample_dmr_motif_enrichment('CA1')
result.head()
motif-cluster | oddsratio | p | q | log2OR | -lgq
---|---|---|---|---|---
c1 | 0.369231 | 0.476190 | 0.901924 | -1.437405 | 0.04483
c10 | 1.142857 | 1.000000 | 1.000000 | 0.192645 | -0.00000
c100 | 0.474576 | 0.299267 | 0.901924 | -1.075288 | 0.04483
c101 | 0.733333 | 0.701906 | 1.000000 | -0.447459 | -0.00000
c102 | 0.369231 | 0.476190 | 0.901924 | -1.437405 | 0.04483
Motif Enrichment Between Sample Pairs
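You can also compare the non-overlapping hypo-DMRs of two different samples. Here, hypo-DMRs found in CA1 but not in ASC (a_not_b) are tested against hypo-DMRs found in ASC but not in CA1 (b_not_a).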
a_not_b, a_and_b, b_not_a = dmr.get_pairwise_differential_index('CA1', 'ASC', dmr_type='hypo')
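The three returned index sets can be read as plain set operations on the two samples' hypo-DMRs. The sketch below illustrates that reading with `pandas.Index` set methods on hypothetical region names. The set semantics are inferred from the variable names, so treat this as an assumption about the helper rather than its actual code. Only the two exclusive sets are compared in the enrichment; the shared regions (`a_and_b`) are left out.

```python
import pandas as pd

# hypothetical hypo-DMR sets for sample A (e.g. CA1) and sample B (e.g. ASC)
hypo_a = pd.Index(["dmr_0", "dmr_1", "dmr_2"], name="dmr")
hypo_b = pd.Index(["dmr_2", "dmr_3"], name="dmr")

a_not_b = hypo_a.difference(hypo_b)    # hypo in A only
a_and_b = hypo_a.intersection(hypo_b)  # hypo in both samples
b_not_a = hypo_b.difference(hypo_a)    # hypo in B only
print(list(a_not_b), list(a_and_b), list(b_not_a))
# ['dmr_0', 'dmr_1'] ['dmr_2'] ['dmr_3']
```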
result = dmr.motif_enrichment(true_regions=a_not_b,
background_regions=b_not_a,
region_dim=None,
motif_dim='motif-cluster',
motif_da=None,
alternative='two-sided')
result.head()
motif-cluster | oddsratio | p | q | log2OR | -lgq
---|---|---|---|---|---
c1 | 0.491228 | 1.000000 | 1.0 | -1.025535 | -0.0
c10 | 0.736364 | 1.000000 | 1.0 | -0.441510 | -0.0
c100 | 0.658824 | 0.523942 | 1.0 | -0.602036 | -0.0
c101 | 0.721154 | 0.725386 | 1.0 | -0.471621 | -0.0
c102 | 0.491228 | 1.000000 | 1.0 | -1.025535 | -0.0
# this helper gives the same result as the two steps above
result = dmr.pairwise_dmr_motif_enrichment('CA1', 'ASC', dmr_type='hypo')
result.head()
motif-cluster | oddsratio | p | q | log2OR | -lgq
---|---|---|---|---|---
c1 | 0.491228 | 1.000000 | 1.0 | -1.025535 | -0.0
c10 | 0.736364 | 1.000000 | 1.0 | -0.441510 | -0.0
c100 | 0.658824 | 0.523942 | 1.0 | -0.602036 | -0.0
c101 | 0.721154 | 0.725386 | 1.0 | -0.471621 | -0.0
c102 | 0.491228 | 1.000000 | 1.0 | -1.025535 | -0.0