`ALLCools.clustering.balanced_pca`¶

Perform Balanced PCA as well as Reproducible PCA (A class allow user provide fitted model and just transform MCDS to adata with PCs)

Module Contents¶

log_scale(adata, method='standard', with_mean=False, with_std=True, max_value=10, scaler=None)[source]¶

Perform log transform and then scale the cell-by-feature matrix

Parameters

adata – adata with normalized, unscaled cell-by-feature matrix
method – the type of scaler to use: ‘standard’ for sklearn.preprocessing.StandardScaler; ‘robust’ for sklearn.preprocessing.RobustScaler.
with_mean – Whether scale with mean center
with_std – Whether scale the std
max_value – Whether clip large values after scale
scaler – A fitted sklearn scaler, if provided, will only use it to transform the adata.

Returns

Return type

adata.X is scaled in place, the fitted scaler object will be return if the scaler parameter is None.

significant_pc_test(adata: anndata.AnnData, p_cutoff=0.1, update=True, obsm='X_pca', downsample=50000)[source]¶

Perform two-sample Kolmogorov-Smirnov test for goodness of fit on two adjacent PCs, select top PCs based on the p_cutoff. Top PCs have significantly different distributions, while later PCs only capturing random noise will have larger p-values. An idea from [Zeisel et al., 2018].

Parameters

adata – adata with PC matrix calculated and stored in adata.obsm
p_cutoff – the p-value cutoff to select top PCs
update – Whether modify adata.obsm and only keep significant PCs
obsm – name of the PC matrix in adata.obsm
downsample – If the dataset is too large, downsample the cells before testing.

Returns

number of PCs selected

Return type

n_components

balanced_pca(adata: anndata.AnnData, groups: str = 'pre_clusters', max_cell_prop=0.1, n_comps=200, scale=False)[source]¶

Given a categorical variable (e.g., a pre-clustering label), perform balanced PCA by downsample cells in the large categories to make the overall population more balanced, so the PCs are expected to represent more variance among small categories.

Parameters

adata – adata after preprocessing and feature selection steps
groups – the name of the categorical variable in adata.obsm
max_cell_prop – any single category with cells > n_cell * max_cell_prop will be downsampled to this number.
n_comps – Number of components in PCA
scale – whether to scale the input matrix before PCA

Returns

Return type

adata with PC information stored in obsm, varm and uns like the scanpy.tl.pca() do.

get_pc_centers(adata: anndata.AnnData, group: str, outlier_label=None, obsm='X_pca')[source]¶

Get the cluster centroids of the PC matrix

Parameters

adata – adata with cluster labels in obs and PC in obsm
group – the name of cluster labels in adata.obs
outlier_label – if there are outlier labels in obs, will exclude outliers before calculating centroids.
obsm – the key of PC matrix in obsm

Returns

a dataframe for cluster centroids by PC

Return type

pc_center

class ReproduciblePCA(scaler, mc_type, adata=None, pca_obj=None, pc_loading=None, var_names=None, max_value=10)[source]¶

mcds_to_adata(self, mcds)[source]¶

Get adata from MCDS with only selected features

Parameters: mcds (Input raw count MCDS object) –
Returns: Adata with per-cell normalized mC fraction and selected features
Return type: adata

scale(self, adata)[source]¶

Perform {func}`log_scale <ALLCools.clustering.balanced_pca.log_scale>` with fitted scaler

Parameters: adata – Adata with per-cell normalized mC fraction and selected features
Returns
Return type: adata.X is transformed in place

pc_transform(self, adata)[source]¶

calculate the PC from adata.X and PC loading, store PCs in adata.obsm[“X_pca”] and loadings in adata.varm[“PCs”]

Parameters: adata – Adata with log_scale transformed mC fraction and selected features
Returns
Return type: PC information stored in adata.obsm and varm

mcds_to_adata_with_pc(self, mcds)[source]¶

From raw count MCDS to adata object with PCs using fitted scaler and PC loadings. Steps include select features, calculate per-cell normalized mC fractions, log_scale transform the data with fitted scaler, and finally add PC matrix.

Parameters: mcds – Raw count MCDS
Returns: anndata object with per-cell normalized, log scale transformed matrix in .X and PCs in adata.obsm[“X_pca”] and PC loadings in adata.varm[“PCs”]. The scale and PC are done with fitted model.
Return type: adata

dump(self, path)[source]¶: Save the ReproduciblePCA to path

ALLCools.clustering.balanced_pca

Contents

ALLCools.clustering.balanced_pca¶

Module Contents¶

`ALLCools.clustering.balanced_pca`¶