ALLCools.clustering.dmg
Module Contents
- _single_pairwise_dmg(cluster_l, cluster_r, top_n, adj_p_cutoff, delta_rate_cutoff, auroc_cutoff, adata_dir, dmg_dir)
Calculate DMGs (differentially methylated genes) between a pair of clusters stored as adata files.
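This page does not say which statistical test _single_pairwise_dmg runs or how the delta rate is defined. As an illustration only, the sketch below assumes a two-sided Wilcoxon rank-sum (Mann-Whitney U) test per feature with Benjamini-Hochberg correction, a delta rate equal to the difference in mean mC fraction, and an AUROC computed from ranks. A feature is kept only if it passes all three cutoffs. The helpers bh_adjust, auroc and pairwise_dmg_sketch are hypothetical and not part of ALLCools.

```python
import numpy as np
from scipy.stats import mannwhitneyu, rankdata


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (same order as the input)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: each adjusted value is the min over larger ranks.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def auroc(left, right):
    """Probability that a random left value exceeds a random right value (ties count half)."""
    ranks = rankdata(np.concatenate([left, right]))
    n_l, n_r = len(left), len(right)
    u_left = ranks[:n_l].sum() - n_l * (n_l + 1) / 2
    return u_left / (n_l * n_r)


def pairwise_dmg_sketch(x_left, x_right, top_n=10000, adj_p_cutoff=0.001,
                        delta_rate_cutoff=0.3, auroc_cutoff=0.9):
    """Hypothetical helper. x_left / x_right: cell-by-feature mC fractions of
    two clusters. Returns column indices passing all cutoffs, best p first."""
    n_feat = x_left.shape[1]
    pvals = [mannwhitneyu(x_left[:, j], x_right[:, j], alternative="two-sided").pvalue
             for j in range(n_feat)]
    adj_p = bh_adjust(pvals)
    # Assumed "delta rate": difference of mean mC fraction between clusters.
    delta = x_left.mean(axis=0) - x_right.mean(axis=0)
    # Direction-agnostic AUROC, so hypo- and hypermethylation both count.
    aur = np.array([auroc(x_left[:, j], x_right[:, j]) for j in range(n_feat)])
    aur = np.maximum(aur, 1 - aur)
    keep = (adj_p < adj_p_cutoff) & (np.abs(delta) > delta_rate_cutoff) & (aur > auroc_cutoff)
    idx = np.flatnonzero(keep)
    return idx[np.argsort(adj_p[idx], kind="stable")][:top_n]


# Toy data: feature 0 separates the clusters, feature 1 is noise.
rng = np.random.default_rng(0)
left = np.column_stack([0.8 + 0.05 * rng.random(50), rng.random(50)])
right = np.column_stack([0.2 + 0.05 * rng.random(50), rng.random(50)])
hits = pairwise_dmg_sketch(left, right)
```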
- class PairwiseDMG(max_cell_per_group=1000, top_n=10000, adj_p_cutoff=0.001, delta_rate_cutoff=0.3, auroc_cutoff=0.9, random_state=0, n_jobs=1, verbose=True)
- fit_predict(self, x, groups, var_dim, obs_dim='cell', outlier='Outlier', cleanup=True, selected_pairs: List[tuple] = None)
Provide the data and run the pairwise DMG analysis.
- Parameters
x – 2D cell-by-feature xarray.DataArray
groups – cluster labels
obs_dim – name of the cell dim
var_dim – name of the feature dim
outlier – name of the outlier group; if provided, cells with this label are ignored
cleanup – whether to delete the per-group adata files
selected_pairs – by default, pairwise DMG is calculated for all possible pairs of groups, which can be very time-consuming when there are many groups. Use this parameter to provide a list of cluster pairs (tuples) to compare instead.
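With n clusters, the default compares n(n - 1) / 2 pairs, so the cost grows quadratically with the number of groups. A minimal sketch of building that full list and narrowing it to a subset that could be passed as selected_pairs. It assumes pairs are unordered tuples of cluster labels; all_cluster_pairs is a hypothetical helper.

```python
from itertools import combinations
from typing import Iterable, List, Optional, Tuple


def all_cluster_pairs(groups: Iterable[str],
                      outlier: Optional[str] = "Outlier") -> List[Tuple[str, str]]:
    """Hypothetical helper: every unordered pair of distinct cluster labels,
    skipping the outlier label."""
    labels = sorted({g for g in groups if g != outlier})
    return list(combinations(labels, 2))


cell_labels = ["c1", "c2", "c3", "Outlier", "c1", "c4"]
pairs = all_cluster_pairs(cell_labels)  # 4 clusters -> 4 * 3 / 2 = 6 pairs
# A smaller subset, e.g. only the comparisons involving c1:
selected_pairs = [pair for pair in pairs if "c1" in pair]
```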
- _save_cluster_adata(self)
Save each group into a separate adata file to reduce memory usage during parallel computation.
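The constructor's max_cell_per_group and random_state suggest that each group is capped at a fixed size and sampled reproducibly before it is saved. A minimal sketch of that splitting step, assuming the labels come as a pandas Series indexed by cell ID; split_and_downsample is hypothetical, and writing each subset to its own adata file is left out.

```python
import numpy as np
import pandas as pd


def split_and_downsample(groups: pd.Series, max_cell_per_group=1000,
                         random_state=0, outlier="Outlier"):
    """Hypothetical helper: map each group label to at most
    max_cell_per_group cell IDs, sampled reproducibly."""
    rng = np.random.default_rng(random_state)
    subsets = {}
    for label, cells in groups.groupby(groups).groups.items():
        if label == outlier:
            continue  # outlier cells are not saved
        cells = np.asarray(cells)
        if cells.size > max_cell_per_group:
            cells = rng.choice(cells, size=max_cell_per_group, replace=False)
        subsets[label] = sorted(cells)
    return subsets


groups = pd.Series(["a"] * 5 + ["b"] * 2 + ["Outlier"],
                   index=[f"cell{i}" for i in range(8)])
subsets = split_and_downsample(groups, max_cell_per_group=3)
```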
- aggregate_pairwise_dmg(self, adata, groupby, obsm='X_pca')
Aggregate pairwise DMG results for each cluster. The DMGs of a cluster are ranked by the sum of AUROC * cluster_pair_similarity over its pairwise comparisons, so DMGs with a large AUROC between similar clusters get more weight.
- Parameters
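adata – AnnData object of the cells
groupby – name of the cluster label column in adata.obs
obsm – key in adata.obsm of the embedding used to compute cluster similarity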
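A sketch of the similarity weighting, for illustration only. It assumes cluster centroids are per-cluster medians in adata.obsm[obsm], that similarity is 1 minus the centroid distance divided by the largest centroid distance, and that the pairwise results sit in a table with the hypothetical columns gene, hypo_in, other and AUROC. None of these details are confirmed by this page, and aggregate_sketch is hypothetical.

```python
import numpy as np
import pandas as pd


def aggregate_sketch(embedding, labels, dmg_table):
    """Hypothetical helper.
    embedding: cells x dims array (e.g. adata.obsm["X_pca"]);
    labels: cluster of each cell (e.g. adata.obs[groupby]);
    dmg_table: one row per (gene, pair) with assumed columns
    'gene', 'hypo_in', 'other', 'AUROC'.
    """
    # Assumption: cluster centroid = per-cluster median of the embedding.
    centers = pd.DataFrame(embedding).groupby(np.asarray(labels)).median()
    diff = centers.values[:, None, :] - centers.values[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Assumption: similarity = 1 - distance scaled by the largest distance.
    sim = pd.DataFrame(1 - dist / dist.max(), index=centers.index, columns=centers.index)
    weights = [sim.loc[a, b] for a, b in zip(dmg_table["hypo_in"], dmg_table["other"])]
    scored = dmg_table.assign(score=dmg_table["AUROC"].to_numpy() * np.asarray(weights))
    # Per cluster: sum AUROC * similarity over all pairs, highest first.
    return {
        cluster: sub.groupby("gene")["score"].sum().sort_values(ascending=False)
        for cluster, sub in scored.groupby("hypo_in")
    }


embedding = np.array([[0.0], [0.0], [1.0], [1.0], [10.0], [10.0]])
labels = ["A", "A", "B", "B", "C", "C"]
dmg_table = pd.DataFrame({
    "gene": ["g1", "g1", "g2"],
    "hypo_in": ["A", "A", "A"],
    "other": ["B", "C", "C"],
    "AUROC": [0.95, 0.95, 1.0],
})
ranked = aggregate_sketch(embedding, labels, dmg_table)
```

In this toy case g1 separates cluster A from its close neighbour B, so it outranks g2, whose higher AUROC is only against the distant cluster C.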
- _single_ovr_dmg(cell_label, mcds, obs_dim, var_dim, mc_type, top_n, adj_p_cutoff, fc_cutoff, auroc_cutoff)
Run a single one-vs-rest DMG comparison.
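Once per-gene statistics for a cluster are available, the documented cutoffs could be applied as below. This is a sketch under assumptions: fc is taken as the cluster's mean mC fraction divided by that of the rest, a marker must fall below fc_cutoff (hypomethylated in the cluster), results are ranked by AUROC, and the column names and filter_ovr_dmg are hypothetical. Default values mirror those documented for one_vs_rest_dmg.

```python
import pandas as pd


def filter_ovr_dmg(stats, adj_p_cutoff=0.01, fc_cutoff=0.8, auroc_cutoff=0.8, top_n=1000):
    """Hypothetical helper. stats: one row per gene with assumed columns
    'pvals_adj', 'fc' and 'AUROC'."""
    passed = stats[
        (stats["pvals_adj"] < adj_p_cutoff)
        & (stats["fc"] < fc_cutoff)  # assumption: markers are hypomethylated
        & (stats["AUROC"] > auroc_cutoff)
    ]
    # Report at most top_n genes, most separable first.
    return passed.sort_values("AUROC", ascending=False).head(top_n)


stats = pd.DataFrame(
    {"pvals_adj": [1e-5, 1e-5, 0.2, 1e-6],
     "fc": [0.5, 1.3, 0.4, 0.7],
     "AUROC": [0.92, 0.95, 0.85, 0.99]},
    index=["g1", "g2", "g3", "g4"],
)
dmg_table = filter_ovr_dmg(stats)  # g2 fails fc, g3 fails adj_p
```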
- _one_vs_rest_dmr_runner(cell_meta, group, cluster, max_cluster_cells, max_other_fold, mcds_paths, obs_dim, var_dim, mc_type, top_n, adj_p_cutoff, fc_cutoff, auroc_cutoff, verbose=True)
One-vs-rest DMG runner.
- one_vs_rest_dmg(cell_meta, group, mcds=None, mcds_paths=None, obs_dim='cell', var_dim='gene', mc_type='CHN', top_n=1000, adj_p_cutoff=0.01, fc_cutoff=0.8, auroc_cutoff=0.8, max_cluster_cells=2000, max_other_fold=5, cpu=1, verbose=True)
Calculate cluster marker genes using a one-vs-rest strategy.
- Parameters
cell_meta – cell metadata containing cluster labels
group – the name of the cluster label column
mcds – cell-by-gene MCDS object for calculating DMG. Provide either mcds_paths or mcds.
mcds_paths – cell-by-gene MCDS paths for calculating DMG. Provide either mcds_paths or mcds.
obs_dim – dimension name of the cells
var_dim – dimension name of the features
mc_type – value used to select the methylation type along the mc_type dimension
top_n – report top N DMGs
adj_p_cutoff – adjusted P value cutoff to report significant DMG
fc_cutoff – mC fraction fold change cutoff to report significant DMG
auroc_cutoff – AUROC cutoff to report significant DMG
max_cluster_cells – maximum number of cells taken from a cluster; larger clusters are downsampled to this number
max_other_fold – maximum number of cells taken from the other (rest) groups, expressed as a fold of the number of cluster cells
cpu – number of CPUs to use
- Returns
dmg_table – pandas DataFrame of the one-vs-rest DMGs
- Return type
pandas.DataFrame
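max_cluster_cells and max_other_fold bound the size of each comparison. A sketch of how the cells for one cluster might be sampled, assuming the rest is capped at max_other_fold times the (possibly downsampled) cluster size. ovr_sample and its random_state argument are hypothetical, since one_vs_rest_dmg exposes no seed parameter.

```python
import numpy as np
import pandas as pd


def ovr_sample(labels: pd.Series, cluster, max_cluster_cells=2000,
               max_other_fold=5, random_state=0):
    """Hypothetical helper: choose cell IDs for one cluster-vs-rest test."""
    rng = np.random.default_rng(random_state)
    in_mask = (labels == cluster).to_numpy()
    in_cells = labels.index[in_mask].to_numpy()
    out_cells = labels.index[~in_mask].to_numpy()
    if in_cells.size > max_cluster_cells:
        in_cells = rng.choice(in_cells, size=max_cluster_cells, replace=False)
    # Assumption: the rest is capped at max_other_fold x the cluster size.
    max_other = in_cells.size * max_other_fold
    if out_cells.size > max_other:
        out_cells = rng.choice(out_cells, size=max_other, replace=False)
    return in_cells, out_cells


labels = pd.Series(["a"] * 10 + ["b"] * 100, index=[f"c{i}" for i in range(110)])
in_cells, out_cells = ovr_sample(labels, "a", max_cluster_cells=4, max_other_fold=5)
```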