ALLCools.clustering.feature_selection.feature_enrichment

Module Contents

_calculate_enrichment_score(raw_adata, labels)[source]

Enrichment score modified from [Zeisel et al., 2018] for normalized methylation fractions. Assumes the methylation values are posterior fractions calculated by MCDS.add_mc_frac().

_calculate_enrichment_score_cytograph(adata, labels)[source]

The original CEF algorithm from [Zeisel et al., 2018] for count-based data (RNA, ATAC).
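For intuition, the count-based score can be sketched as the product of two ratios: how often a feature is detected inside a cluster versus outside it, and how high its mean value is inside versus outside. The sketch below is an illustration, not the code of this module. The function name `enrichment_score_sketch` is hypothetical, and the pseudocounts `eps_frac` and `eps_mean` are assumed values; see the methods of [Zeisel et al., 2018] for the authoritative definition.

```python
import numpy as np


def enrichment_score_sketch(X, labels, eps_frac=0.1, eps_mean=0.01):
    """Illustrative cluster-by-feature enrichment score.

    X is a cells-by-features array of non-negative values and labels holds
    one cluster label per cell. The pseudocounts are assumptions that keep
    the ratios finite when a feature is absent outside a cluster.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    scores = np.empty((clusters.size, X.shape[1]))
    for i, cluster in enumerate(clusters):
        inside = labels == cluster
        # Fraction of cells with a non-zero value, inside vs. outside the cluster
        frac_in = (X[inside] > 0).mean(axis=0)
        frac_out = (X[~inside] > 0).mean(axis=0)
        # Mean value inside vs. outside the cluster
        mean_in = X[inside].mean(axis=0)
        mean_out = X[~inside].mean(axis=0)
        scores[i] = (
            (frac_in + eps_frac) / (frac_out + eps_frac)
            * (mean_in + eps_mean) / (mean_out + eps_mean)
        )
    return clusters, scores


# Toy data: feature 0 is specific to cluster "a", feature 1 to cluster "b"
X_toy = np.array([[5, 0], [4, 0], [0, 3], [0, 2]])
clusters, scores = enrichment_score_sketch(X_toy, ["a", "a", "b", "b"])
```

The sketch needs at least two clusters, because the "outside" group is otherwise empty.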

_plot_enrichment_result(qvals, enrichment, null_enrichment, alpha)[source]

Plot the distributions of the p-values, the q-values, and the number of CEFs.

_aggregate_enrichment(adata, enrichment, top_n, alpha, qvals, cluster_col)[source]

Aggregate the enrichment results and calculate q-values.
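The q-values here are FDR-corrected p-values. The page does not say which correction procedure is applied, so the sketch below assumes the common Benjamini-Hochberg procedure, written with NumPy only. The helper name `bh_qvalues` is hypothetical and is not part of this module.

```python
import numpy as np


def bh_qvalues(pvals):
    """Benjamini-Hochberg q-values for a 1-D array of p-values (assumed procedure)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working from the largest p-value downwards
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    # Put the q-values back in the original order, capped at 1
    q[order] = np.clip(ranked, 0.0, 1.0)
    return q


q = bh_qvalues([0.01, 0.04, 0.03, 0.5])
```

For a cluster-by-feature matrix of p-values, one option is to correct each row separately or to flatten the matrix first; which of these the module does is not stated here.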

cluster_enriched_features(adata: anndata.AnnData, cluster_col: str, top_n=200, alpha=0.05, stat_plot=True, method='mc')[source]

Calculate the top Cluster Enriched Features (CEFs) from a per-cell normalized dataset. This is a post-clustering feature selection step adapted from [Zeisel et al., 2018, La Manno et al., 2021] and their great [cytograph2](https://github.com/linnarsson-lab/cytograph2) package. For details about the CEF calculation, read the methods of [Zeisel et al., 2018]. Note that the original paper looks for cluster-specific highly expressed genes as CEFs; for methylation, we look for hypo-methylation as CEFs, so the score and the test are reversed.

Parameters
  • adata – AnnData object containing per-cell normalized values. For methylation fractions, the values need to be 1-centered (1 means the cell’s average methylation), like those produced by ALLCools.mcds.mcds.MCDS.add_mc_frac() with normalize_per_cell=True. For RNA and ATAC, you can use per-cell normalized counts. Do not log-transform the data before running this function.

  • cluster_col – The name of categorical variable in adata.obs

  • top_n – Select top N CEFs for each cluster

  • alpha – FDR corrected q-value cutoff

  • stat_plot – Whether to make summary plots for the CEF calculation

  • method – “mc” for methylation CEFs (looks for hypo-methylation); “rna” or “atac” for RNA, ATAC, or any other count-based data (uses the original cytograph algorithm and looks for higher values)
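To make the "1-centered" requirement on adata concrete, the sketch below divides each cell's values by that cell's own mean, so that 1 corresponds to the cell's average and values below 1 are lower than average. This only illustrates the scale of the expected input. It is not the posterior estimate computed by MCDS.add_mc_frac(), and the helper name `one_center_per_cell` is hypothetical.

```python
import numpy as np


def one_center_per_cell(frac):
    """Divide each row (cell) of a cells-by-features array by its own mean."""
    frac = np.asarray(frac, dtype=float)
    cell_mean = frac.mean(axis=1, keepdims=True)
    return frac / cell_mean


# Two cells, two features of raw methylation fractions
raw = np.array([[0.8, 0.4],
                [0.2, 0.6]])
centered = one_center_per_cell(raw)
```

After this step every cell has a mean of 1, which is the scale the “mc” method expects.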

Returns

  • Modifies adata in place, adding a dictionary to adata.uns under the key f"{cluster_col}_feature_enrichment"

  • The dictionary contains “qvals” (np.ndarray, cluster-by-feature q-values of the enrichment scores) and “cluster_order” (the cluster order of the rows of “qvals”)
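A minimal sketch of reading the stored result follows. It uses a small hypothetical dictionary in place of the real adata.uns entry so that it runs without ALLCools installed. The keys “qvals” and “cluster_order” come from the description above; the example values, the `feature_names` array, and the helper `significant_features` are assumptions for illustration. Ranking features by q-value is also an assumption and may differ from how the function itself picks its top_n features.

```python
import numpy as np

# In real use, the dictionary would come from a call such as:
#   cluster_enriched_features(adata, cluster_col="leiden", top_n=200,
#                             alpha=0.05, stat_plot=False, method="mc")
#   result = adata.uns["leiden_feature_enrichment"]
# "leiden" is a hypothetical column name. Below is a made-up stand-in.
result = {
    "qvals": np.array([[0.001, 0.20, 0.03],
                       [0.60, 0.004, 0.01]]),
    "cluster_order": ["c0", "c1"],
}
# Hypothetical feature names, standing in for adata.var_names
feature_names = np.array(["geneA", "geneB", "geneC"])


def significant_features(result, feature_names, alpha=0.05, top_n=200):
    """Per cluster, list up to top_n features with q-value below alpha."""
    selected = {}
    for row, cluster in zip(result["qvals"], result["cluster_order"]):
        # Rank features by q-value (assumed criterion), smallest first
        order = np.argsort(row)[:top_n]
        keep = order[row[order] < alpha]
        selected[cluster] = feature_names[keep].tolist()
    return selected


cefs = significant_features(result, feature_names)
```

The sketch assumes the columns of “qvals” follow the feature order of adata.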