Cluster Differentially Methylated Genes

Import

import pandas as pd
from ALLCools.clustering import one_vs_rest_dmg

Parameters

mcds_paths = 'geneslop2k_frac.mcds'
cell_meta_path = '../step_by_step/100kb/L1.ClusteringResults.csv.gz'
cluster_col = 'L1'

obs_dim = 'cell'
var_dim = 'geneslop2k'
mc_type = 'CHN'

top_n = 1000
auroc_cutoff = 0.8
adj_p_cutoff = 0.001
fc_cutoff = 0.8
max_cluster_cells = 2000
max_other_fold = 5
cpu = 10
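
The two down-sampling parameters can be read as follows: `max_cluster_cells` caps the number of cells taken from the cluster being tested, and `max_other_fold` caps the background at that many times the cluster size. The sketch below illustrates this reading on toy labels. It is an assumption about what the parameters mean, not the ALLCools implementation, and `subsample_one_vs_rest` is a hypothetical helper.

```python
import numpy as np

def subsample_one_vs_rest(labels, cluster, max_cluster_cells=2000,
                          max_other_fold=5, seed=0):
    """Hypothetical helper: choose cell indices for one cluster and for the rest."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    in_idx = np.flatnonzero(labels == cluster)
    out_idx = np.flatnonzero(labels != cluster)
    # Cap the cluster itself at max_cluster_cells.
    if in_idx.size > max_cluster_cells:
        in_idx = rng.choice(in_idx, size=max_cluster_cells, replace=False)
    # Cap the background at max_other_fold times the (capped) cluster size.
    max_other = in_idx.size * max_other_fold
    if out_idx.size > max_other:
        out_idx = rng.choice(out_idx, size=max_other, replace=False)
    return np.sort(in_idx), np.sort(out_idx)

# Toy data: 10 cells in c0, 200 cells in c1.
labels = ['c0'] * 10 + ['c1'] * 200
in_idx, out_idx = subsample_one_vs_rest(labels, 'c0',
                                        max_cluster_cells=2000,
                                        max_other_fold=5)
print(len(in_idx), len(out_idx))  # 10 50
```

Under this reading, small clusters are compared against a proportionally small background, which keeps the comparison balanced and the per-cluster runtime bounded.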

Load Cell Metadata

cell_meta = pd.read_csv(cell_meta_path, index_col=0)
cell_meta.head()
           AllcPath                                           mCCCFrac  mCGFrac   mCGFracAdj  mCHFrac   mCHFracAdj  FinalReads  InputReads  MappedReads  DissectionRegion  ...  Sample      leiden  mCHFrac.1  tsne_0      tsne_1      L1   L1_proba  CellTypeAnno  umap_0     umap_1
10E_M_0    /gale/raidix/rdx-4/mapping/10E/CEMBA190625-10E...  0.008198  0.822633  0.821166    0.041640  0.033718    1626504.0   4407752     2892347.0    10E               ...  10E_190625  13      0.041640   57.602540   -5.024663   c11  0.864367  MGE-Sst       5.288734   9.726882
10E_M_1    /gale/raidix/rdx-4/mapping/10E/CEMBA190625-10E...  0.006019  0.743035  0.741479    0.024127  0.018218    2009998.0   5524084     3657352.0    10E               ...  10E_190625  11      0.024127   -45.191850  -11.135287  c7   0.669400  CA3           -3.702348  7.514084
10E_M_10   /gale/raidix/rdx-4/mapping/10E/CEMBA190625-10E...  0.006569  0.750172  0.748520    0.027665  0.021235    1383636.0   3455260     2172987.0    10E               ...  10E_190625  11      0.027665   -46.905564  -8.491459   c7   0.787267  CA3           -2.797569  7.604081
10E_M_101  /gale/raidix/rdx-4/mapping/10E/CEMBA190625-10E...  0.006353  0.760898  0.759369    0.026547  0.020323    2474670.0   7245482     4778768.0    10E               ...  10E_190625  11      0.026547   -53.480022  -1.604433   c7   0.526933  CA3           -0.310848  8.465321
10E_M_102  /gale/raidix/rdx-4/mapping/10E/CEMBA190625-10E...  0.005409  0.752980  0.751637    0.019497  0.014164    2430290.0   7004754     4609570.0    10E               ...  10E_190625  7       0.019497   -25.967990  13.813133   c30  0.924000  CA1           0.252257   -3.450731

5 rows × 27 columns

Calculate DMG

dmg_table = one_vs_rest_dmg(cell_meta,
                            group=cluster_col,
                            mcds_paths=mcds_paths,
                            obs_dim=obs_dim,
                            var_dim=var_dim,
                            mc_type=mc_type,
                            top_n=top_n,
                            adj_p_cutoff=adj_p_cutoff,
                            fc_cutoff=fc_cutoff,
                            auroc_cutoff=auroc_cutoff,
                            max_cluster_cells=max_cluster_cells,
                            max_other_fold=max_other_fold,
                            cpu=cpu)
Calculating cluster c0 DMGs.
Calculating cluster c1 DMGs.
Calculating cluster c10 DMGs.
Calculating cluster c11 DMGs.
Calculating cluster c12 DMGs.
/home/hanliu/miniconda3/envs/allcools_new/lib/python3.8/site-packages/xarray/core/indexing.py:1227: PerformanceWarning: Slicing is producing a large chunk. To accept the large
chunk and silence this warning, set the option
    >>> with dask.config.set(**{'array.slicing.split_large_chunks': False}):
    ...     array[indexer]

To avoid creating the large chunks, set the option
    >>> with dask.config.set(**{'array.slicing.split_large_chunks': True}):
    ...     array[indexer]
  return self.array[key]
Calculating cluster c13 DMGs.
Calculating cluster c14 DMGs.
Calculating cluster c15 DMGs.
Calculating cluster c16 DMGs.
Calculating cluster c17 DMGs.
Calculating cluster c18 DMGs.
c17 Finished.
Calculating cluster c19 DMGs.
c16 Finished.
Calculating cluster c2 DMGs.
c15 Finished.
Calculating cluster c20 DMGs.
c14 Finished.
Calculating cluster c21 DMGs.
c13 Finished.
Calculating cluster c22 DMGs.
c12 Finished.
Calculating cluster c23 DMGs.
c11 Finished.
Calculating cluster c24 DMGs.
c10 Finished.
Calculating cluster c25 DMGs.
c18 Finished.
Calculating cluster c26 DMGs.
c21 Finished.
Calculating cluster c27 DMGs.
c19 Finished.
Calculating cluster c28 DMGs.
c22 Finished.
Calculating cluster c29 DMGs.
c20 Finished.
Calculating cluster c3 DMGs.
c24 Finished.
Calculating cluster c30 DMGs.
c23 Finished.
Calculating cluster c31 DMGs.
c25 Finished.
Calculating cluster c32 DMGs.
Calculating cluster c33 DMGs.
c27 Finished.
c26 Finished.
Calculating cluster c34 DMGs.
c29 Finished.
Calculating cluster c35 DMGs.
c28 Finished.
Calculating cluster c36 DMGs.
c1 Finished.
Calculating cluster c37 DMGs.
c30 Finished.
Calculating cluster c38 DMGs.
c31 Finished.
Calculating cluster c39 DMGs.
c33 Finished.
Calculating cluster c4 DMGs.
c34 Finished.
Calculating cluster c40 DMGs.
c32 Finished.
Calculating cluster c5 DMGs.
c37 Finished.
Calculating cluster c6 DMGs.
c35 Finished.
Calculating cluster c7 DMGs.
c36 Finished.
Calculating cluster c8 DMGs.
c38 Finished.
Calculating cluster c9 DMGs.
c39 Finished.
c40 Finished.
c2 Finished.
c3 Finished.
c0 Finished.
c7 Finished.
c9 Finished.
c8 Finished.
c6 Finished.
c5 Finished.
c4 Finished.
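
To give a feel for what `auroc_cutoff` and `fc_cutoff` select, the sketch below computes both statistics for a single gene on toy methylation fractions. It assumes the test targets genes that are hypomethylated in the cluster, so the fold change is taken as cluster mean over rest mean and a value below `fc_cutoff` passes. That direction, and the helper names `auroc` and `fold_change`, are assumptions for illustration and not the ALLCools internals.

```python
import numpy as np

def auroc(cluster_values, rest_values):
    """Probability that a random cluster cell has a lower value than a
    random rest cell (ties count as one half)."""
    c = np.asarray(cluster_values, dtype=float)[:, None]
    r = np.asarray(rest_values, dtype=float)[None, :]
    # Compare every cluster cell with every rest cell.
    return float(((c < r) + 0.5 * (c == r)).mean())

def fold_change(cluster_values, rest_values):
    """Mean methylation fraction in the cluster divided by the mean in the rest."""
    return float(np.mean(cluster_values) / np.mean(rest_values))

# Toy mCH fractions of one gene (made-up numbers).
cluster_frac = [0.01, 0.02, 0.015]
rest_frac = [0.03, 0.04, 0.02, 0.05]

gene_auroc = auroc(cluster_frac, rest_frac)
gene_fc = fold_change(cluster_frac, rest_frac)

# Cutoffs from the Parameters section.
passes = (gene_auroc > 0.8) and (gene_fc < 0.8)
print(round(gene_auroc, 3), round(gene_fc, 3), passes)
```

An AUROC near 1 means the gene separates the cluster from the remaining cells almost perfectly, while a value near 0.5 means it carries no information about cluster membership.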

Save

dmg_table.to_hdf(f'{cluster_col}.OneVsRestDMG.hdf', key='data')
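
The saved table can be read back with `pd.read_hdf(f'{cluster_col}.OneVsRestDMG.hdf', key='data')`. A common next step is to keep only the strongest genes per cluster. The sketch below shows that step on a toy table. The column names `cluster` and `AUROC` are assumptions made for illustration, so check `dmg_table.columns` before adapting it, and `top_genes_per_cluster` is a hypothetical helper.

```python
import pandas as pd

# Toy stand-in for a DMG table; values and column names are made up.
toy = pd.DataFrame({
    'gene': ['g1', 'g2', 'g3', 'g4', 'g5'],
    'cluster': ['c0', 'c0', 'c0', 'c1', 'c1'],
    'AUROC': [0.95, 0.85, 0.90, 0.99, 0.81],
}).set_index('gene')

def top_genes_per_cluster(table, n, cluster_col='cluster', score_col='AUROC'):
    """Hypothetical helper: keep the n highest-scoring rows of each cluster."""
    ranked = table.sort_values(score_col, ascending=False)
    # head(n) on the grouped frame keeps the first n rows of every group.
    return ranked.groupby(cluster_col).head(n)

top2 = top_genes_per_cluster(toy, n=2)
print(top2)
```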