Cluster Differentially Methylated Genes
Import
import pandas as pd
from ALLCools.clustering import one_vs_rest_dmg
Parameters
mcds_paths = 'geneslop2k_frac.mcds'
cell_meta_path = '../step_by_step/100kb/L1.ClusteringResults.csv.gz'
cluster_col = 'L1'
obs_dim = 'cell'
var_dim = 'geneslop2k'
mc_type = 'CHN'
top_n = 1000
auroc_cutoff = 0.8
adj_p_cutoff = 0.001
fc_cutoff = 0.8
max_cluster_cells = 2000
max_other_fold = 5
cpu = 10
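As a rough illustration of what a one-vs-rest AUROC measures, and therefore what a threshold such as `auroc_cutoff` is applied to, the sketch below computes it for a single gene from rank sums. It is not the ALLCools implementation; the helper `one_vs_rest_auroc` and the toy values are hypothetical.

```python
import pandas as pd


def one_vs_rest_auroc(values: pd.Series, labels: pd.Series, cluster: str) -> float:
    """AUROC of `values` for separating cells in `cluster` from all other cells.

    Hypothetical helper for illustration only; not part of ALLCools.
    """
    is_cluster = (labels == cluster).to_numpy()
    n_pos = int(is_cluster.sum())
    n_neg = int((~is_cluster).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one cell inside and outside the cluster")
    # Average ranks handle ties; the rank sum of the cluster cells gives the
    # Mann-Whitney U statistic, and U / (n_pos * n_neg) equals the AUROC.
    ranks = values.rank(method="average").to_numpy()
    u_stat = ranks[is_cluster].sum() - n_pos * (n_pos + 1) / 2
    return float(u_stat / (n_pos * n_neg))


# Toy methylation fractions for one gene across six cells (made-up numbers).
toy_labels = pd.Series(["c0", "c0", "c0", "c1", "c1", "c1"])
toy_frac = pd.Series([0.9, 0.8, 0.7, 0.2, 0.3, 0.75])
auroc_c0 = one_vs_rest_auroc(toy_frac, toy_labels, "c0")
print(f"AUROC for c0 vs. rest: {auroc_c0:.3f}")
```

A value near 0.5 means the gene does not separate the cluster from the remaining cells, while values near 0 or 1 indicate strong separation in one direction or the other.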
Load
cell_meta = pd.read_csv(cell_meta_path, index_col=0)
cell_meta.head()
| | AllcPath | mCCCFrac | mCGFrac | mCGFracAdj | mCHFrac | mCHFracAdj | FinalReads | InputReads | MappedReads | DissectionRegion | ... | Sample | leiden | mCHFrac.1 | tsne_0 | tsne_1 | L1 | L1_proba | CellTypeAnno | umap_0 | umap_1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10E_M_0 | /gale/raidix/rdx-4/mapping/10E/CEMBA190625-10E... | 0.008198 | 0.822633 | 0.821166 | 0.041640 | 0.033718 | 1626504.0 | 4407752 | 2892347.0 | 10E | ... | 10E_190625 | 13 | 0.041640 | 57.602540 | -5.024663 | c11 | 0.864367 | MGE-Sst | 5.288734 | 9.726882 |
10E_M_1 | /gale/raidix/rdx-4/mapping/10E/CEMBA190625-10E... | 0.006019 | 0.743035 | 0.741479 | 0.024127 | 0.018218 | 2009998.0 | 5524084 | 3657352.0 | 10E | ... | 10E_190625 | 11 | 0.024127 | -45.191850 | -11.135287 | c7 | 0.669400 | CA3 | -3.702348 | 7.514084 |
10E_M_10 | /gale/raidix/rdx-4/mapping/10E/CEMBA190625-10E... | 0.006569 | 0.750172 | 0.748520 | 0.027665 | 0.021235 | 1383636.0 | 3455260 | 2172987.0 | 10E | ... | 10E_190625 | 11 | 0.027665 | -46.905564 | -8.491459 | c7 | 0.787267 | CA3 | -2.797569 | 7.604081 |
10E_M_101 | /gale/raidix/rdx-4/mapping/10E/CEMBA190625-10E... | 0.006353 | 0.760898 | 0.759369 | 0.026547 | 0.020323 | 2474670.0 | 7245482 | 4778768.0 | 10E | ... | 10E_190625 | 11 | 0.026547 | -53.480022 | -1.604433 | c7 | 0.526933 | CA3 | -0.310848 | 8.465321 |
10E_M_102 | /gale/raidix/rdx-4/mapping/10E/CEMBA190625-10E... | 0.005409 | 0.752980 | 0.751637 | 0.019497 | 0.014164 | 2430290.0 | 7004754 | 4609570.0 | 10E | ... | 10E_190625 | 7 | 0.019497 | -25.967990 | 13.813133 | c30 | 0.924000 | CA1 | 0.252257 | -3.450731 |
5 rows × 27 columns
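The names `max_cluster_cells` and `max_other_fold` suggest that each one-vs-rest comparison is run on a down-sampled set of cells. The sketch below shows one plausible reading of those two parameters on toy labels: cap the cluster at `max_cluster_cells`, then cap the remaining cells at `max_other_fold` times the retained cluster size. This is an assumption about the intent, not the ALLCools code, and the helper `sample_one_vs_rest` is hypothetical.

```python
import numpy as np
import pandas as pd


def sample_one_vs_rest(labels, cluster, max_cluster_cells, max_other_fold, seed=0):
    """Pick cell IDs for one cluster and a size-limited set of other cells.

    Hypothetical helper for illustration only; not part of ALLCools.
    """
    rng = np.random.default_rng(seed)
    in_cluster = labels.index[labels == cluster].to_numpy()
    others = labels.index[labels != cluster].to_numpy()
    # Cap the number of cells taken from the cluster of interest.
    if in_cluster.size > max_cluster_cells:
        in_cluster = rng.choice(in_cluster, size=max_cluster_cells, replace=False)
    # Cap the background at a fixed multiple of the retained cluster size.
    max_others = in_cluster.size * max_other_fold
    if others.size > max_others:
        others = rng.choice(others, size=max_others, replace=False)
    return in_cluster, others


# Toy cluster assignments: 10 cells in c0 and 90 cells in c1.
toy_labels = pd.Series(
    ["c0"] * 10 + ["c1"] * 90,
    index=[f"cell_{i}" for i in range(100)],
)
cluster_cells, other_cells = sample_one_vs_rest(
    toy_labels, "c0", max_cluster_cells=4, max_other_fold=5
)
print(len(cluster_cells), len(other_cells))
```

Capping both groups keeps the runtime per cluster bounded and limits how unbalanced each comparison can become.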
Calculate DMG
dmg_table = one_vs_rest_dmg(cell_meta,
                            group=cluster_col,
                            mcds_paths=mcds_paths,
                            obs_dim=obs_dim,
                            var_dim=var_dim,
                            mc_type=mc_type,
                            top_n=top_n,
                            adj_p_cutoff=adj_p_cutoff,
                            fc_cutoff=fc_cutoff,
                            auroc_cutoff=auroc_cutoff,
                            max_cluster_cells=max_cluster_cells,
                            max_other_fold=max_other_fold,
                            cpu=cpu)
Calculating cluster c0 DMGs.
Calculating cluster c1 DMGs.
Calculating cluster c10 DMGs.
Calculating cluster c11 DMGs.
Calculating cluster c12 DMGs.
/home/hanliu/miniconda3/envs/allcools_new/lib/python3.8/site-packages/xarray/core/indexing.py:1227: PerformanceWarning: Slicing is producing a large chunk. To accept the large
chunk and silence this warning, set the option
>>> with dask.config.set(**{'array.slicing.split_large_chunks': False}):
... array[indexer]
To avoid creating the large chunks, set the option
>>> with dask.config.set(**{'array.slicing.split_large_chunks': True}):
... array[indexer]
return self.array[key]
Calculating cluster c13 DMGs.
Calculating cluster c14 DMGs.
Calculating cluster c15 DMGs.
Calculating cluster c16 DMGs.
Calculating cluster c17 DMGs.
Calculating cluster c18 DMGs.
c17 Finished.
Calculating cluster c19 DMGs.
c16 Finished.
Calculating cluster c2 DMGs.
c15 Finished.
Calculating cluster c20 DMGs.
c14 Finished.
Calculating cluster c21 DMGs.
c13 Finished.
Calculating cluster c22 DMGs.
c12 Finished.
Calculating cluster c23 DMGs.
c11 Finished.
Calculating cluster c24 DMGs.
c10 Finished.
Calculating cluster c25 DMGs.
c18 Finished.
Calculating cluster c26 DMGs.
c21 Finished.
Calculating cluster c27 DMGs.
c19 Finished.
Calculating cluster c28 DMGs.
c22 Finished.
Calculating cluster c29 DMGs.
c20 Finished.
Calculating cluster c3 DMGs.
c24 Finished.
Calculating cluster c30 DMGs.
c23 Finished.
Calculating cluster c31 DMGs.
c25 Finished.
Calculating cluster c32 DMGs.
Calculating cluster c33 DMGs.
c27 Finished.
c26 Finished.
Calculating cluster c34 DMGs.
c29 Finished.
Calculating cluster c35 DMGs.
c28 Finished.
Calculating cluster c36 DMGs.
c1 Finished.
Calculating cluster c37 DMGs.
c30 Finished.
Calculating cluster c38 DMGs.
c31 Finished.
Calculating cluster c39 DMGs.
c33 Finished.
Calculating cluster c4 DMGs.
c34 Finished.
Calculating cluster c40 DMGs.
c32 Finished.
Calculating cluster c5 DMGs.
c37 Finished.
Calculating cluster c6 DMGs.
c35 Finished.
Calculating cluster c7 DMGs.
c36 Finished.
Calculating cluster c8 DMGs.
c38 Finished.
Calculating cluster c9 DMGs.
c39 Finished.
c40 Finished.
c2 Finished.
c3 Finished.
c0 Finished.
c7 Finished.
c9 Finished.
c8 Finished.
c6 Finished.
c5 Finished.
c4 Finished.
Save
dmg_table.to_hdf(f'{cluster_col}.OneVsRestDMG.hdf', key='data')