ALLCools.clustering.feature_selection.feature_enrichment
Module Contents
- _calculate_enrichment_score(raw_adata, labels)
Enrichment score modified from [Zeisel et al., 2018] for normalized methylation fractions. Assumes the methylation values are posterior fractions calculated by MCDS.add_mc_frac().
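To make the idea of a "modified" score concrete, the sketch below reverses the count-based Zeisel score for 1-centered methylation fractions. A feature counts as hypo-methylated in a cell when its value is below 1, and a lower in-cluster mean raises the score. This is a hypothetical illustration, not the formula implemented in _calculate_enrichment_score. The function name hypo_enrichment_score and the pseudocounts eps_frac=0.1 and eps_mean=0.01 are assumptions.

```python
import numpy as np


def hypo_enrichment_score(values, labels, eps_frac=0.1, eps_mean=0.01):
    """Hypothetical hypo-methylation enrichment score (illustration only).

    values: cells x features array of 1-centered methylation fractions,
            where 1 is the cell's average methylation level.
    labels: cluster label for each cell (at least two clusters).
    Returns (clusters, scores) with scores shaped clusters x features.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    # A feature is called hypo-methylated in a cell when it is below the cell's average.
    hypo = values < 1
    scores = np.empty((clusters.size, values.shape[1]))
    for i, cluster in enumerate(clusters):
        inside = labels == cluster
        outside = ~inside
        frac_in = hypo[inside].mean(axis=0)    # share of in-cluster cells that are hypo
        frac_out = hypo[outside].mean(axis=0)  # same share among all other cells
        mean_in = values[inside].mean(axis=0)
        mean_out = values[outside].mean(axis=0)
        # The mean ratio is inverted relative to the count-based score:
        # a lower in-cluster methylation level gives a higher score.
        scores[i] = ((frac_in + eps_frac) / (frac_out + eps_frac)
                     * (mean_out + eps_mean) / (mean_in + eps_mean))
    return clusters, scores


clusters, scores = hypo_enrichment_score(
    [[0.2, 1.2], [0.3, 1.1], [1.5, 0.5], [1.4, 0.6]], ["A", "A", "B", "B"]
)
print(clusters, scores.argmax(axis=1))  # → ['A' 'B'] [0 1]
```

In this toy input, cluster A is hypo-methylated at the first feature and cluster B at the second, so each cluster's highest score lands on its own hypo-methylated feature.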
- _calculate_enrichment_score_cytograph(adata, labels)
The original CEF algorithm from [Zeisel et al., 2018] for count-based data (RNA, ATAC).
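The count-based score in Zeisel et al. (2018) multiplies two ratios for each feature and cluster: the fraction of cells with a nonzero value inside versus outside the cluster, and the mean value inside versus outside, each with a small pseudocount. A minimal NumPy sketch follows. The function name and the pseudocounts (0.1 and 0.01) are assumptions, and whether "outside" means the other clusters or all cells should be checked against cytograph2; this sketch uses the other clusters.

```python
import numpy as np


def cef_enrichment_counts(counts, labels, eps_frac=0.1, eps_mean=0.01):
    """Zeisel-style enrichment score for count-based data (sketch).

    counts: cells x features array of per-cell normalized, non-log values.
    labels: cluster label for each cell (at least two clusters).
    Returns (clusters, scores) with scores shaped clusters x features;
    a higher score means the feature is more enriched in that cluster.
    """
    counts = np.asarray(counts, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    nonzero = counts > 0
    scores = np.empty((clusters.size, counts.shape[1]))
    for i, cluster in enumerate(clusters):
        inside = labels == cluster
        outside = ~inside
        f_in = nonzero[inside].mean(axis=0)    # fraction of nonzero cells in the cluster
        f_out = nonzero[outside].mean(axis=0)  # fraction of nonzero cells elsewhere
        mu_in = counts[inside].mean(axis=0)
        mu_out = counts[outside].mean(axis=0)
        scores[i] = ((f_in + eps_frac) / (f_out + eps_frac)
                     * (mu_in + eps_mean) / (mu_out + eps_mean))
    return clusters, scores
```

Because the score is a product of ratios, a feature scores highest when it is both detected in more cells and more abundant inside the cluster than outside it.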
- _plot_enrichment_result(qvals, enrichment, null_enrichment, alpha)
Plot the distributions of p-values and q-values, and the number of CEFs.
- _aggregate_enrichment(adata, enrichment, top_n, alpha, qvals, cluster_col)
Aggregate enrichment results and calculate q-values.
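The helpers in this module work with a null enrichment distribution and FDR-corrected q-values. One common way to get from scores to q-values, sketched here under assumptions, is to compare each observed score against null scores (for example, scores recomputed after shuffling cluster labels) to obtain empirical p-values, then apply the Benjamini-Hochberg procedure. The helper names empirical_pvalues and bh_qvalues are hypothetical, and this is not necessarily how ALLCools builds its null or corrects its p-values.

```python
import numpy as np


def empirical_pvalues(observed, null):
    """One-sided empirical p-values: how often a null score is >= the observed one.

    observed: array of enrichment scores (any shape, e.g. clusters x features).
    null: array of null enrichment scores (flattened internally).
    """
    observed = np.asarray(observed, dtype=float)
    null_sorted = np.sort(np.ravel(null))
    # searchsorted(side="left") gives the index of the first null value >= observed,
    # so the number of null values >= observed is n_null minus that index.
    n_ge = null_sorted.size - np.searchsorted(null_sorted, observed, side="left")
    # The +1 terms keep p-values strictly positive (standard permutation correction).
    return (n_ge + 1) / (null_sorted.size + 1)


def bh_qvalues(pvals):
    """Benjamini-Hochberg q-values over all entries of pvals."""
    p = np.asarray(pvals, dtype=float)
    flat = p.ravel()
    m = flat.size
    order = np.argsort(flat)
    ranked = flat[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: each q-value is the minimum over itself and all larger p-values.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)
    return q.reshape(p.shape)


print(bh_qvalues([0.01, 0.04, 0.03, 0.2]).round(4))  # → [0.04   0.0533 0.0533 0.2   ]
```

The correction is applied across all cluster-feature pairs at once here; correcting within each cluster separately is an equally plausible design that this sketch does not attempt to match.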
- cluster_enriched_features(adata: anndata.AnnData, cluster_col: str, top_n=200, alpha=0.05, stat_plot=True, method='mc')
Calculate the top Cluster Enriched Features (CEFs) from a per-cell normalized dataset. This is a post-clustering feature selection step adapted from [Zeisel et al., 2018, La Manno et al., 2021] and their great [cytograph2](https://github.com/linnarsson-lab/cytograph2) package. For details about the CEF calculation, read the methods section of [Zeisel et al., 2018]. Note that the original paper looks for cluster-specific highly expressed genes as CEFs; for methylation, we look for hypo-methylated features as CEFs, so the score and the test are reversed.
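Conceptually, the final selection keeps, for each cluster, at most top_n features whose FDR-corrected q-value is below alpha. The sketch below assumes a cluster-by-feature score matrix in which higher means more enriched (for method='mc' it assumes the score is already oriented toward hypo-methylation) and ranks the passing features by score. select_top_cefs is a hypothetical helper, and ALLCools may rank or break ties differently.

```python
import numpy as np


def select_top_cefs(scores, qvals, top_n=200, alpha=0.05):
    """For each cluster row, return column indices of up to top_n features
    with q < alpha, ordered by decreasing enrichment score."""
    scores = np.asarray(scores, dtype=float)
    qvals = np.asarray(qvals, dtype=float)
    selected = []
    for score_row, q_row in zip(scores, qvals):
        passing = np.flatnonzero(q_row < alpha)  # features that pass the FDR cutoff
        # Sort the passing features by score, highest first (stable for ties).
        ranked = passing[np.argsort(-score_row[passing], kind="stable")]
        selected.append(ranked[:top_n])
    return selected


print(select_top_cefs([[5, 1, 3, 4]], [[0.01, 0.01, 0.2, 0.03]], top_n=2))  # → [array([0, 3])]
```

Feature 2 has the third-highest score but fails the q-value cutoff, so it is never considered; top_n then trims the remaining list.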
- Parameters
adata – AnnData object containing per-cell normalized values. For methylation fractions, the values need to be 1-centered (1 means the cell’s average methylation level), like those produced by ALLCools.mcds.mcds.MCDS.add_mc_frac() with normalize_per_cell=True. For RNA and ATAC, you can use per-cell normalized counts. Do not log-transform the data before running this function.
cluster_col – The name of the categorical cluster variable in adata.obs
top_n – Number of top CEFs to select for each cluster
alpha – FDR-corrected q-value cutoff
stat_plot – Whether to make summary plots of the CEF calculation
method – “mc” for methylation CEFs (look for hypo-methylation); “rna” and “atac” for RNA, ATAC, or any other count-based data (use the original cytograph algorithm and look for higher values)
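The requirement that 1 means "the cell’s average methylation" can be illustrated by dividing each cell's fractions by that cell's mean across features. This is only a simplified sketch of the idea: MCDS.add_mc_frac() with normalize_per_cell=True is the supported way to produce the input, and it works on posterior fractions, so its output can differ. center_per_cell is a hypothetical name.

```python
import numpy as np


def center_per_cell(frac):
    """Divide each cell's (row's) methylation fractions by that cell's mean,
    so that 1 marks the cell's average level and values < 1 are relatively hypo."""
    frac = np.asarray(frac, dtype=float)
    cell_mean = frac.mean(axis=1, keepdims=True)  # one mean per cell, kept as a column
    return frac / cell_mean


print(center_per_cell([[0.2, 0.6], [0.5, 0.5]]))
```

After centering, a globally hypo-methylated cell and a globally hyper-methylated cell are compared on the same relative scale, which is why the input is 1-centered rather than raw fractions.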
- Returns
Modifies adata in place, adding a dictionary to adata.uns under the key f"{cluster_col}_feature_enrichment".
The dictionary contains “qvals” (a cluster-by-feature np.ndarray of enrichment-score q-values) and “cluster_order” (the cluster order of the rows of “qvals”).
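Downstream code can read the stored result back out. The sketch below uses a hand-made dictionary in place of adata.uns[f"{cluster_col}_feature_enrichment"] and lists, per cluster, the features with q < alpha. It assumes the columns of “qvals” follow the order of adata.var_names, which the docstring does not state; significant_features and the toy data are hypothetical.

```python
import numpy as np


def significant_features(enrichment_result, feature_names, alpha=0.05):
    """Map each cluster to the feature names whose q-value is below alpha."""
    qvals = np.asarray(enrichment_result["qvals"])
    out = {}
    # Rows of qvals follow cluster_order; columns are assumed to follow feature_names.
    for cluster, row in zip(enrichment_result["cluster_order"], qvals):
        out[cluster] = [name for name, q in zip(feature_names, row) if q < alpha]
    return out


# Hypothetical stand-in for adata.uns[f"{cluster_col}_feature_enrichment"]
result = {
    "qvals": np.array([[0.001, 0.2, 0.03], [0.5, 0.01, 0.9]]),
    "cluster_order": ["c0", "c1"],
}
print(significant_features(result, ["geneA", "geneB", "geneC"]))
# → {'c0': ['geneA', 'geneC'], 'c1': ['geneB']}
```

With a real AnnData object, feature_names would typically be list(adata.var_names), subject to the column-order assumption above.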