ALLCools.pseudo_cell
Package Contents
- generate_pseudo_cells_kmeans(adata, cluster_col='leiden', obsm='X_pca', cluster_size_cutoff=100, max_pseudo_size=25, aggregate_func='downsample')
Balance the clusters by merging or downsampling cells within each cluster. We first group the data by pre-defined clusters (cluster_col), then run k-means clustering iteratively on each cluster with size > cluster_size_cutoff. The resulting k-means clusters are called cell groups, and the iteration continues until the maximum cell group size < max_pseudo_size. Finally, we generate a new adata for the balanced dataset.
- Parameters
adata – Original AnnData object. Raw counts in X are recommended if aggregate_func is ‘sum’.
cluster_col – The column holding the pre-defined cluster labels used to group cells before downsampling or aggregation.
obsm – The obsm key name to use for performing k-means clustering within clusters.
cluster_size_cutoff – Clusters smaller than this cutoff will not be downsampled or aggregated.
max_pseudo_size – Maximum number of cells in one pseudo-cell group.
aggregate_func – How to combine the cells of one pseudo-cell group: ‘downsample’ randomly selects one cell from the group; ‘sum’ sums all values in the group; ‘mean’ takes the average of each feature in the group; ‘median’ takes the median of each feature in the group.
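The procedure described above can be illustrated with a small NumPy-only sketch. This is not the ALLCools implementation: the helper names (split_into_groups, aggregate, balance_clusters) are hypothetical, the splitting step uses a simple repeated 2-means bisection as a stand-in for the iterative k-means, groups are allowed to contain up to max_pseudo_size cells (the library's exact boundary may differ), and cells in small clusters are assumed to pass through unchanged as single-cell groups.

```python
import numpy as np


def split_into_groups(embedding, max_size, rng):
    """Bisect cells with 2-means until every group has at most max_size cells."""
    pending = [np.arange(embedding.shape[0])]
    groups = []
    while pending:
        idx = pending.pop()
        if idx.size <= max_size:
            groups.append(idx)
            continue
        points = embedding[idx]
        # Start from two distinct random cells as the initial centers.
        centers = points[rng.choice(idx.size, size=2, replace=False)]
        for _ in range(10):
            dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
            assign = dist.argmin(axis=1)
            for k in range(2):
                if np.any(assign == k):
                    centers[k] = points[assign == k].mean(axis=0)
        left, right = idx[assign == 0], idx[assign == 1]
        if left.size == 0 or right.size == 0:
            # Degenerate split (e.g. identical points): fall back to halving.
            left, right = idx[: idx.size // 2], idx[idx.size // 2:]
        pending.extend([left, right])
    return groups


def aggregate(X, groups, func, rng):
    """Collapse each cell group into one row according to func."""
    rows = []
    for idx in groups:
        if func == "downsample":
            rows.append(X[rng.choice(idx)])  # keep one random cell
        elif func == "sum":
            rows.append(X[idx].sum(axis=0))
        elif func == "mean":
            rows.append(X[idx].mean(axis=0))
        elif func == "median":
            rows.append(np.median(X[idx], axis=0))
        else:
            raise ValueError(f"unknown aggregate_func: {func!r}")
    return np.vstack(rows)


def balance_clusters(X, embedding, labels, cluster_size_cutoff=100,
                     max_pseudo_size=25, aggregate_func="downsample", seed=0):
    """Group by label, split large clusters, then aggregate each cell group."""
    rng = np.random.default_rng(seed)
    groups = []
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        if members.size > cluster_size_cutoff:
            for local in split_into_groups(embedding[members], max_pseudo_size, rng):
                groups.append(members[local])
        else:
            # Assumption: small clusters are kept cell by cell.
            groups.extend(np.array([i]) for i in members)
    return aggregate(X, groups, aggregate_func, rng), groups
```

In this sketch, X plays the role of adata.X, embedding the role of adata.obsm[obsm], and labels the role of the cluster_col values. With aggregate_func set to "sum", the column totals of the balanced matrix equal those of the input, which is one reason raw counts are the natural input for that mode.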
- generate_pseudo_cells_knn(adata, cluster_col='leiden', obsm='X_pca', target_pseudo_size=100, min_pseudo_size=None, ignore_small_cluster=False, n_components=None, aggregate_func='downsample', pseudo_ovlp=0)