ALLCools.pseudo_cell

Package Contents

generate_pseudo_cells_kmeans(adata, cluster_col='leiden', obsm='X_pca', cluster_size_cutoff=100, max_pseudo_size=25, aggregate_func='downsample')

Balance the clusters by merging or downsampling cells within each cluster. We first group the data by pre-defined clusters (cluster_col), then run k-means clustering iteratively on clusters with size > cluster_size_cutoff. The resulting k-means clusters are called cell groups, and splitting continues until the maximum cell group size < max_pseudo_size. Finally, we generate a new adata for the balanced dataset.
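The grouping and splitting steps described above can be illustrated with a minimal, standard-library-only sketch. It is not the ALLCools implementation: the helper names (two_means, split_into_cell_groups, balance_clusters) are hypothetical, a tiny k=2 k-means with recursive splitting stands in for whatever k-means routine the library actually uses, and the sketch returns lists of cell indices rather than a new adata.

```python
import random


def two_means(points, rng, n_iter=10):
    """Label each point 0 or 1 with a tiny k-means (k=2). Illustrative only."""
    centers = rng.sample(points, 2)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [
            min((0, 1), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members.
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def split_into_cell_groups(indices, coords, max_pseudo_size, rng):
    """Split a cluster until every cell group has fewer than max_pseudo_size cells."""
    if len(indices) < max_pseudo_size:
        return [indices]
    labels = two_means([coords[i] for i in indices], rng)
    left = [i for i, lab in zip(indices, labels) if lab == 0]
    right = [i for i, lab in zip(indices, labels) if lab == 1]
    if not left or not right:
        # Degenerate split (e.g. identical points): fall back to halving.
        half = len(indices) // 2
        left, right = indices[:half], indices[half:]
    return (split_into_cell_groups(left, coords, max_pseudo_size, rng)
            + split_into_cell_groups(right, coords, max_pseudo_size, rng))


def balance_clusters(cluster_labels, coords, cluster_size_cutoff=100,
                     max_pseudo_size=25, seed=0):
    """Return cell groups (lists of cell indices) for a balanced dataset."""
    if max_pseudo_size < 2:
        raise ValueError("max_pseudo_size must be at least 2 in this sketch")
    rng = random.Random(seed)
    by_cluster = {}
    for i, label in enumerate(cluster_labels):
        by_cluster.setdefault(label, []).append(i)
    groups = []
    for indices in by_cluster.values():
        if len(indices) > cluster_size_cutoff:
            groups.extend(split_into_cell_groups(indices, coords, max_pseudo_size, rng))
        else:
            # Small clusters are left untouched: every cell is its own group.
            groups.extend([i] for i in indices)
    return groups


# Toy data: one large cluster ("a") and one small cluster ("b") in 2-D.
data_rng = random.Random(1)
labels = ["a"] * 120 + ["b"] * 10
coords = [(data_rng.random(), data_rng.random()) for _ in labels]
groups = balance_clusters(labels, coords)
```

In this sketch, coords plays the role of the embedding stored under the obsm key, and each returned group would then be collapsed into one pseudo-cell according to aggregate_func.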

Parameters
  • adata – Original AnnData object; raw counts in X are recommended if aggregate_func is ‘sum’.

  • cluster_col – The cluster label column used to group cells for downsampling.

  • obsm – The obsm key name to use for performing k-means clustering within clusters.

  • cluster_size_cutoff – Clusters smaller than this cutoff will not be downsampled or aggregated.

  • max_pseudo_size – Maximum number of cells in one pseudo-cell group

  • aggregate_func – ‘downsample’ means randomly select one cell from each pseudo-cell group; ‘sum’ means sum up all values in a pseudo-cell group; ‘mean’ means take the average of each feature in a pseudo-cell group; ‘median’ means take the median of each feature in a pseudo-cell group.
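
To make the four aggregate_func options concrete, here is a small standard-library sketch of how one pseudo-cell group could be collapsed into a single feature vector. The function aggregate_group and its rng argument are hypothetical and are not part of ALLCools; the library operates on AnnData matrices, whereas this sketch uses plain lists of per-cell feature values.

```python
import random
import statistics


def aggregate_group(rows, aggregate_func="downsample", rng=None):
    """Collapse one pseudo-cell group (a list of per-cell feature lists)."""
    if aggregate_func == "downsample":
        # Keep one randomly chosen cell as the representative of the group.
        rng = rng or random.Random(0)
        return list(rng.choice(rows))
    columns = list(zip(*rows))  # one tuple of values per feature
    if aggregate_func == "sum":
        return [sum(col) for col in columns]
    if aggregate_func == "mean":
        return [statistics.mean(col) for col in columns]
    if aggregate_func == "median":
        return [statistics.median(col) for col in columns]
    raise ValueError(f"unknown aggregate_func: {aggregate_func!r}")


# Three cells, three features.
group = [[1, 0, 4], [3, 2, 4], [5, 1, 1]]
summed = aggregate_group(group, "sum")        # [9, 3, 9]
medians = aggregate_group(group, "median")    # [3, 1, 4]
picked = aggregate_group(group, "downsample")  # one of the three original cells
```

Summing is the option that preserves count totals across a group, which is consistent with the recommendation to supply raw counts in X when aggregate_func is ‘sum’.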

generate_pseudo_cells_knn(adata, cluster_col='leiden', obsm='X_pca', target_pseudo_size=100, min_pseudo_size=None, ignore_small_cluster=False, n_components=None, aggregate_func='downsample', pseudo_ovlp=0)