`ALLCools.clustering.feature_selection.feature_enrichment`

Module Contents

_calculate_enrichment_score(raw_adata, labels): Enrichment score modified from [Zeisel et al., 2018] for normalized methylation fractions. Assumes the methylation value is the posterior fraction calculated by MCDS.add_mc_frac.

_calculate_enrichment_score_cytograph(adata, labels): The original CEF algorithm from [Zeisel et al., 2018] for count-based data (RNA, ATAC).

_plot_enrichment_result(qvals, enrichment, null_enrichment, alpha): Plot the distributions of p-values, q-values, and the number of CEFs.

_aggregate_enrichment(adata, enrichment, top_n, alpha, qvals, cluster_col): Aggregate enrichment results and calculate q-values.
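To make "q-values" concrete, here is a minimal sketch of FDR correction with the Benjamini-Hochberg procedure. This page does not say which FDR procedure ALLCools uses, so Benjamini-Hochberg is an assumption, and `bh_qvalues` is a hypothetical helper, not part of the module.

```python
import numpy as np


def bh_qvalues(pvals):
    """Benjamini-Hochberg q-values (assumed procedure; hypothetical helper)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)                       # indices of p-values, ascending
    scaled = p[order] * n / np.arange(1, n + 1)  # p * n / rank
    # Enforce monotonicity: a q-value cannot exceed that of a larger p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.clip(scaled, 0.0, 1.0)        # map back to the input order
    return q


pvals = [0.01, 0.04, 0.03, 0.5]
qvals = bh_qvalues(pvals)
```

Features whose q-value falls below `alpha` would then count as significant under this assumed procedure.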

cluster_enriched_features(adata: anndata.AnnData, cluster_col: str, top_n=200, alpha=0.05, stat_plot=True, method='mc')

Calculate the top Cluster Enriched Features (CEFs) from a per-cell normalized dataset. This is a post-clustering feature selection step adapted from [Zeisel et al., 2018, La Manno et al., 2021] and their [cytograph2](https://github.com/linnarsson-lab/cytograph2) package. For details about the CEF calculation, read the methods of [Zeisel et al., 2018]. Note that the original paper looks for cluster-specific highly expressed genes as CEFs; for methylation, we look for hypo-methylation as CEFs, so the score and test are reversed.

Parameters

adata – AnnData object containing per-cell normalized values. For methylation fractions, the values need to be 1-centered (1 means the cell’s average methylation), like those produced by ALLCools.mcds.mcds.MCDS.add_mc_frac() with normalize_per_cell=True. For RNA and ATAC, you can use per-cell normalized counts. Do not log-transform the data before running this function.
cluster_col – The name of categorical variable in adata.obs
top_n – Select top N CEFs for each cluster
alpha – FDR corrected q-value cutoff
stat_plot – Whether to make summary plots for the CEF calculation
method – “mc” for methylation CEFs (looks for hypo-methylation); “rna” or “atac” for RNA, ATAC, or any other count-based data (uses the original cytograph algorithm and looks for higher values)

Returns

Modifies adata in place, adding a dictionary to adata.uns under the key f"{cluster_col}_feature_enrichment".
The dictionary contains "qvals" (an np.ndarray of cluster-by-feature enrichment score q-values) and
"cluster_order" (the order of clusters in "qvals").
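The returned structure can be read as in the sketch below. To stay self-contained, it uses a plain dict as a stand-in for adata.uns with made-up q-values; the column name "leiden", the feature names, and the helper `significant_features` are hypothetical. It also assumes that rows of "qvals" follow "cluster_order" and columns follow the feature order, as "cluster-by-feature" suggests.

```python
import numpy as np

cluster_col = "leiden"  # hypothetical cluster column name

# Stand-in for adata.uns after a call such as:
#   cluster_enriched_features(adata, cluster_col=cluster_col, method="mc")
# The q-values below are made up for illustration.
uns = {
    f"{cluster_col}_feature_enrichment": {
        "qvals": np.array([[0.01, 0.20, 0.03, 0.50],
                           [0.60, 0.04, 0.30, 0.001]]),
        "cluster_order": np.array(["c0", "c1"]),
    }
}
feature_names = np.array(["f0", "f1", "f2", "f3"])  # stand-in for adata.var_names


def significant_features(uns, cluster_col, feature_names, alpha=0.05):
    """List features with q < alpha for each cluster, most significant first."""
    result = uns[f"{cluster_col}_feature_enrichment"]
    qvals = result["qvals"]
    selected = {}
    for i, cluster in enumerate(result["cluster_order"]):
        mask = qvals[i] < alpha
        order = np.argsort(qvals[i][mask])  # ascending q-value
        selected[str(cluster)] = feature_names[mask][order].tolist()
    return selected


cef = significant_features(uns, cluster_col, feature_names)
```

With a real AnnData object, `adata.uns` and `adata.var_names` would take the place of the stand-ins.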
