ALLCools.clustering.dmg

Module Contents

_single_pairwise_dmg(cluster_l, cluster_r, top_n, adj_p_cutoff, delta_rate_cutoff, auroc_cutoff, adata_dir, dmg_dir)

Calculate DMGs between a pair of adata files.

class PairwiseDMG(max_cell_per_group=1000, top_n=10000, adj_p_cutoff=0.001, delta_rate_cutoff=0.3, auroc_cutoff=0.9, random_state=0, n_jobs=1, verbose=True)
fit_predict(self, x, groups, var_dim, obs_dim='cell', outlier='Outlier', cleanup=True, selected_pairs: List[tuple] = None)

Provide the data and perform the pairwise DMG analysis.

Parameters
  • x – 2D cell-by-feature xarray.DataArray

  • groups – cluster labels

  • obs_dim – name of the cell dim

  • var_dim – name of the feature dim

  • outlier – name of the outlier group; if provided, cells with this label are ignored

  • cleanup – whether to delete the group adata files

  • selected_pairs – By default, pairwise DMG calculates all possible pairs between all the groups, which can be very time-consuming if the number of groups is large. With this parameter, you may provide a list of cluster pairs to restrict the calculation to those pairs.
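To illustrate why selected_pairs can matter, the sketch below counts the unordered pairs among a handful of groups and then keeps only a subset. It uses only the standard library. The cluster names and the helper all_pairs are hypothetical, and it assumes each pair is a 2-tuple of group labels, which is consistent with the List[tuple] annotation but is not confirmed by this page.

```python
from itertools import combinations


def all_pairs(groups, outlier="Outlier"):
    """Return every unordered pair of distinct group labels, skipping the outlier label."""
    labels = sorted(set(groups) - {outlier})
    return list(combinations(labels, 2))


# Hypothetical per-cell cluster labels
cell_groups = ["c1", "c2", "c1", "c3", "Outlier", "c4", "c2"]

pairs = all_pairs(cell_groups)
print(len(pairs))  # 4 clusters give 4 * 3 / 2 = 6 pairs

# Keep only the comparisons that involve cluster c1
selected_pairs = [pair for pair in pairs if "c1" in pair]
print(selected_pairs)
```

With n groups there are n * (n - 1) / 2 pairs, so 100 groups would already mean 4950 comparisons; a list like selected_pairs above is one way to cut that down.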

_save_cluster_adata(self)

Save each group into a separate adata file; this reduces memory usage during parallel computation.

_pairwise_dmg(self)

Pairwise DMG runner; the result is saved to self.dmg_table.

_cleanup(self)

Delete group adata files

aggregate_pairwise_dmg(self, adata, groupby, obsm='X_pca')

Aggregate pairwise DMG results for each cluster. The DMGs of a cluster are ranked by the sum of AUROC * cluster_pair_similarity, so that DMGs with a large AUROC between similar clusters receive more weight.

Parameters
  • adata

  • groupby

  • obsm
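The ranking rule described above (sum of AUROC * cluster_pair_similarity) can be sketched in plain Python. This is only an illustration of the weighting idea, not the library's implementation: the function rank_cluster_dmgs, the input layout, and all numbers are hypothetical, and how the pair similarity is derived from adata and obsm is not described on this page.

```python
from collections import defaultdict


def rank_cluster_dmgs(pairwise_auroc, pair_similarity, cluster):
    """Rank genes for `cluster` by the sum over its pairs of AUROC * pair similarity."""
    scores = defaultdict(float)
    for (this_cluster, other_cluster), gene_auroc in pairwise_auroc.items():
        if this_cluster != cluster:
            continue
        weight = pair_similarity[(this_cluster, other_cluster)]
        for gene, auroc in gene_auroc.items():
            scores[gene] += auroc * weight
    # Highest aggregated score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical pairwise results: (cluster, other cluster) -> {gene: AUROC}
pairwise_auroc = {
    ("A", "B"): {"g1": 0.95, "g2": 0.92},
    ("A", "C"): {"g1": 0.99, "g3": 0.91},
}
# Hypothetical similarities: A is close to B and far from C
pair_similarity = {("A", "B"): 0.9, ("A", "C"): 0.2}

ranked = rank_cluster_dmgs(pairwise_auroc, pair_similarity, "A")
for gene, score in ranked:
    print(gene, round(score, 3))
```

Here g2 outranks g3 even though their AUROC values are nearly equal, because g2 separates A from the similar cluster B while g3 only separates A from the dissimilar cluster C.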

_single_ovr_dmg(cell_label, mcds, obs_dim, var_dim, mc_type, top_n, adj_p_cutoff, fc_cutoff, auroc_cutoff)

Runner for a single one-vs-rest DMG comparison.

_one_vs_rest_dmr_runner(cell_meta, group, cluster, max_cluster_cells, max_other_fold, mcds_paths, obs_dim, var_dim, mc_type, top_n, adj_p_cutoff, fc_cutoff, auroc_cutoff, verbose=True)

One-vs-rest DMG runner.

one_vs_rest_dmg(cell_meta, group, mcds=None, mcds_paths=None, obs_dim='cell', var_dim='gene', mc_type='CHN', top_n=1000, adj_p_cutoff=0.01, fc_cutoff=0.8, auroc_cutoff=0.8, max_cluster_cells=2000, max_other_fold=5, cpu=1, verbose=True)

Calculate cluster marker genes using a one-vs-rest strategy.

Parameters
  • cell_meta – cell metadata containing cluster labels

  • group – the name of the cluster label column

  • mcds – cell-by-gene MCDS object for calculating DMG. Provide either mcds_paths or mcds.

  • mcds_paths – cell-by-gene MCDS paths for calculating DMG. Provide either mcds_paths or mcds.

  • obs_dim – dimension name of the cells

  • var_dim – dimension name of the features

  • mc_type – value to select methylation type in the mc_type dimension

  • top_n – report top N DMGs

  • adj_p_cutoff – adjusted P value cutoff to report significant DMG

  • fc_cutoff – mC fraction fold change cutoff to report significant DMG

  • auroc_cutoff – AUROC cutoff to report significant DMG

  • max_cluster_cells – the maximum number of cells from a group; larger groups are downsampled to this number

  • max_other_fold – the maximum fold of cells from the other groups relative to the number of cells in the cluster being compared

  • cpu – number of cpus

Returns

dmg_table, a pandas DataFrame of the one-vs-rest DMGs

Return type

pandas.DataFrame
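The roles of max_cluster_cells and max_other_fold can be illustrated with a small downsampling sketch. It assumes that the "rest" cells are capped at max_other_fold times the number of cells kept for the target cluster; this reading is an assumption, as the page does not spell out the exact rule. The helper downsample_one_vs_rest and the toy labels are hypothetical and are not part of ALLCools.

```python
import random


def downsample_one_vs_rest(cell_labels, cluster, max_cluster_cells, max_other_fold, seed=0):
    """Pick cells for one cluster and a capped random sample of all other cells."""
    rng = random.Random(seed)
    in_cluster = [cell for cell, label in cell_labels.items() if label == cluster]
    others = [cell for cell, label in cell_labels.items() if label != cluster]

    # Downsample a large target cluster to max_cluster_cells
    if len(in_cluster) > max_cluster_cells:
        in_cluster = rng.sample(in_cluster, max_cluster_cells)

    # Assumed rule: keep at most max_other_fold times as many "rest" cells
    max_other_cells = len(in_cluster) * max_other_fold
    if len(others) > max_other_cells:
        others = rng.sample(others, max_other_cells)
    return in_cluster, others


# Toy metadata: 30 cells in cluster A, 170 cells in cluster B
cell_labels = {f"cell{i}": ("A" if i < 30 else "B") for i in range(200)}

target, rest = downsample_one_vs_rest(
    cell_labels, "A", max_cluster_cells=10, max_other_fold=5
)
print(len(target), len(rest))
```

Capping both sides keeps each one-vs-rest comparison at a bounded size, regardless of how unbalanced the clusters are.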