`ALLCools.clustering.dmg`

Module Contents

_single_pairwise_dmg(cluster_l, cluster_r, top_n, adj_p_cutoff, delta_rate_cutoff, auroc_cutoff, adata_dir, dmg_dir): Calculate DMGs between a pair of cluster adata files.

class PairwiseDMG(max_cell_per_group=1000, top_n=10000, adj_p_cutoff=0.001, delta_rate_cutoff=0.3, auroc_cutoff=0.9, random_state=0, n_jobs=1, verbose=True)

fit_predict(self, x, groups, var_dim, obs_dim='cell', outlier='Outlier', cleanup=True, selected_pairs: List[tuple] = None)

Provide the data and perform the pairwise DMG analysis.

Parameters

x – 2D cell-by-feature xarray.DataArray
groups – cluster labels
obs_dim – name of the cell dimension
var_dim – name of the feature dimension
outlier – name of the outlier group; if provided, cells with this label are ignored
cleanup – whether to delete the group adata files after the run
selected_pairs – by default, pairwise DMG is calculated for all possible pairs of groups, which can be very time-consuming when the number of groups is large. Use this parameter to provide a list of cluster pairs and restrict the calculation to those pairs.
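To give a sense of why `selected_pairs` matters, the following is a minimal sketch of how the number of all-vs-all comparisons grows with the number of groups, and what a restricted pair list might look like. The helper `all_group_pairs` and the group labels are hypothetical and only illustrate the `List[tuple]` shape from the signature above; this is not the library's own pair enumeration.

```python
from itertools import combinations


def all_group_pairs(groups, outlier="Outlier"):
    """Hypothetical helper: every unordered pair of group labels,
    skipping the outlier label."""
    labels = sorted(set(groups) - {outlier})
    return list(combinations(labels, 2))


# Made-up cluster labels, one per cell
groups = ["c0", "c1", "c2", "c3", "Outlier", "c0", "c2"]

pairs = all_group_pairs(groups)
# n groups give n * (n - 1) / 2 pairs, so 4 groups give 6 pairs
print(len(pairs))

# A restricted list of pairs, in the List[tuple] shape of selected_pairs
selected_pairs = [("c0", "c1"), ("c2", "c3")]
print(len(selected_pairs))
```

With a few hundred groups the all-vs-all list reaches tens of thousands of pairs, which is the situation in which passing a shorter `selected_pairs` list is likely to help.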

_save_cluster_adata(self): Save each group into a separate adata file; this reduces memory usage during parallel computation.

_pairwise_dmg(self): Pairwise DMG runner; results are saved to self.dmg_table.

_cleanup(self): Delete the group adata files.

aggregate_pairwise_dmg(self, adata, groupby, obsm='X_pca')

Aggregate the pairwise DMG results for each cluster, and rank the DMGs of each cluster by the sum of AUROC * cluster_pair_similarity. This way, DMGs with a large AUROC between similar clusters get more weight.

Parameters

adata –
groupby –
obsm –
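The ranking rule described above (sum of AUROC * cluster_pair_similarity) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's implementation: the function `rank_dmgs_for_cluster`, the dictionary layouts, and all numbers are hypothetical, and the similarity values are simply supplied as inputs because the source does not say how they are derived from `obsm`.

```python
from collections import defaultdict


def rank_dmgs_for_cluster(cluster, pair_results, similarity):
    """Hypothetical sketch: score each gene by the sum of
    AUROC * cluster_pair_similarity over all pairs involving `cluster`,
    then return genes sorted from highest to lowest score."""
    scores = defaultdict(float)
    for pair, gene_auroc in pair_results.items():
        if cluster not in pair:
            continue  # this pair does not involve the cluster of interest
        weight = similarity[pair]
        for gene, auroc in gene_auroc.items():
            scores[gene] += auroc * weight
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked, dict(scores)


# Made-up pairwise results: {(cluster_a, cluster_b): {gene: AUROC}}
pair_results = {
    ("A", "B"): {"g1": 0.92},  # A and B are similar clusters
    ("A", "C"): {"g2": 0.99},  # A and C are dissimilar clusters
    ("B", "C"): {"g3": 0.95},  # does not involve A
}
# Made-up similarity for each cluster pair
similarity = {("A", "B"): 0.9, ("A", "C"): 0.2, ("B", "C"): 0.5}

ranked, scores = rank_dmgs_for_cluster("A", pair_results, similarity)
print(ranked)
```

In this toy input, `g1` ranks above `g2` for cluster A even though its AUROC is lower, because it separates A from a similar cluster and therefore carries a larger weight.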

_single_ovr_dmg(cell_label, mcds, obs_dim, var_dim, mc_type, top_n, adj_p_cutoff, fc_cutoff, auroc_cutoff): Single one-vs-rest DMG runner.

_one_vs_rest_dmr_runner(cell_meta, group, cluster, max_cluster_cells, max_other_fold, mcds_paths, obs_dim, var_dim, mc_type, top_n, adj_p_cutoff, fc_cutoff, auroc_cutoff, verbose=True): One-vs-rest DMG runner.

one_vs_rest_dmg(cell_meta, group, mcds=None, mcds_paths=None, obs_dim='cell', var_dim='gene', mc_type='CHN', top_n=1000, adj_p_cutoff=0.01, fc_cutoff=0.8, auroc_cutoff=0.8, max_cluster_cells=2000, max_other_fold=5, cpu=1, verbose=True)

Calculate cluster marker genes using a one-vs-rest strategy.

Parameters

cell_meta – cell metadata containing cluster labels
group – the name of the cluster label column
mcds – cell-by-gene MCDS object for calculating DMG. Provide either mcds_paths or mcds.
mcds_paths – cell-by-gene MCDS paths for calculating DMG. Provide either mcds_paths or mcds.
obs_dim – dimension name of the cells
var_dim – dimension name of the features
mc_type – value to select methylation type in the mc_type dimension
top_n – report top N DMGs
adj_p_cutoff – adjusted P value cutoff to report significant DMG
fc_cutoff – mC fraction fold change cutoff to report significant DMG
auroc_cutoff – AUROC cutoff to report significant DMG
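max_cluster_cells – the maximum number of cells taken from a group; larger groups are downsampled to this number
max_other_fold – the maximum number of cells taken from the other groups for the comparison, expressed as a fold of the number of cells in the cluster
cpu – number of CPUs to use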

Returns

dmg_table – pandas DataFrame of the one-vs-rest DMGs

Return type

pandas.DataFrame
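The role of `max_cluster_cells` and `max_other_fold` can be illustrated with a small downsampling sketch. It assumes that `max_other_fold` caps the number of "rest" cells at a multiple of the (downsampled) cluster size; that reading, the helper `downsample_one_vs_rest`, the `seed` argument, and the toy labels are all assumptions made for illustration and are not taken from the library.

```python
import random


def downsample_one_vs_rest(labels, cluster, max_cluster_cells=2000,
                           max_other_fold=5, seed=0):
    """Hypothetical sketch: pick cell indices for one cluster and for the
    rest, capping both group sizes before a one-vs-rest comparison."""
    rng = random.Random(seed)
    in_cluster = [i for i, label in enumerate(labels) if label == cluster]
    others = [i for i, label in enumerate(labels) if label != cluster]

    # Downsample a large cluster to at most max_cluster_cells cells
    if len(in_cluster) > max_cluster_cells:
        in_cluster = rng.sample(in_cluster, max_cluster_cells)

    # Assumed meaning: keep at most max_other_fold times as many other cells
    max_other = len(in_cluster) * max_other_fold
    if len(others) > max_other:
        others = rng.sample(others, max_other)
    return in_cluster, others


# Toy labels: 10 cells in cluster "A", 100 cells in cluster "B"
labels = ["A"] * 10 + ["B"] * 100
cluster_cells, other_cells = downsample_one_vs_rest(
    labels, "A", max_cluster_cells=4, max_other_fold=5
)
print(len(cluster_cells), len(other_cells))  # 4 20
```

Capping both sides in this way keeps each one-vs-rest test bounded in size, regardless of how unbalanced the cluster sizes are.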
