ALLCools.pseudo_cell.pseudo_cell_kmeans

Module Contents

_kmeans_division(matrix, cells, max_pseudo_size, max_k=50)
_calculate_pseudo_group(clusters, total_matrix, cluster_size_cutoff=100, max_pseudo_size=25)
_merge_pseudo_cell(adata, aggregate_func, pseudo_group_key)
generate_pseudo_cells(adata, cluster_col='leiden', obsm='X_pca', cluster_size_cutoff=100, max_pseudo_size=25, aggregate_func='downsample')

Balance the clusters by merging or downsampling cells within each cluster. We first group the data by pre-defined clusters (cluster_col), then run k-means clustering iteratively on each cluster with size > cluster_size_cutoff. The resulting k-means clusters are called cell groups, and the iteration continues until the maximum cell group size is < max_pseudo_size. Finally, we generate a new adata for the balanced dataset.
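The grouping step described above can be sketched in plain Python. This is a minimal illustration, not the ALLCools implementation: the helper names (kmeans_labels, split_into_groups, pseudo_groups) are hypothetical, the split uses repeated 2-means bisection instead of whatever k the library chooses (it ignores max_k), and a group is accepted once it holds at most max_pseudo_size cells, which is an assumption about the boundary case.

```python
import random


def kmeans_labels(points, k, n_iter=20, seed=0):
    """Tiny Lloyd's k-means on a list of coordinate tuples; returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; leave it in place if it has none.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def split_into_groups(points, ids, max_pseudo_size):
    """Bisect a cluster recursively until every group has at most max_pseudo_size cells."""
    if len(ids) <= max_pseudo_size:
        return [ids]
    labels = kmeans_labels(points, k=2)
    left = [i for i, lab in enumerate(labels) if lab == 0]
    right = [i for i, lab in enumerate(labels) if lab == 1]
    if not left or not right:
        # Degenerate split (e.g. identical points): fall back to halving by order.
        half = len(ids) // 2
        left, right = list(range(half)), list(range(half, len(ids)))
    groups = []
    for part in (left, right):
        groups += split_into_groups(
            [points[i] for i in part], [ids[i] for i in part], max_pseudo_size
        )
    return groups


def pseudo_groups(points, cluster_labels, cluster_size_cutoff=100, max_pseudo_size=25):
    """Return lists of cell indices; cells in small clusters each stay in their own group."""
    by_cluster = {}
    for idx, lab in enumerate(cluster_labels):
        by_cluster.setdefault(lab, []).append(idx)
    groups = []
    for ids in by_cluster.values():
        if len(ids) <= cluster_size_cutoff:
            groups += [[i] for i in ids]  # cluster is left untouched
        else:
            groups += split_into_groups([points[i] for i in ids], ids, max_pseudo_size)
    return groups


# Toy data: one cluster above the cutoff (120 cells) and one below it (10 cells).
rng = random.Random(1)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(130)]
cluster_labels = ["big"] * 120 + ["small"] * 10
groups = pseudo_groups(points, cluster_labels, cluster_size_cutoff=100, max_pseudo_size=25)
```

In this sketch the coordinates play the role of the embedding selected by obsm, and each returned index list corresponds to one cell group that would later be merged into a single pseudo-cell.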

Parameters
  • adata – Original AnnData object. Raw counts in X are recommended if aggregate_func is ‘sum’.

  • cluster_col – Name of the clustering label column used to group cells before downsampling or merging.

  • obsm – The obsm key name to use for performing k-means clustering within clusters.

  • cluster_size_cutoff – Clusters smaller than this cutoff are not downsampled or aggregated.

  • max_pseudo_size – Maximum number of cells in one pseudo-cell group

  • aggregate_func – How each pseudo-cell group is merged into one cell: ‘downsample’ randomly selects one cell from the group; ‘sum’ sums all values in the group; ‘mean’ takes the average of each feature in the group; ‘median’ takes the median of each feature in the group.
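To make the four aggregate_func options concrete, here is a small stdlib-only sketch that applies each one to a single pseudo-cell group. The function merge_group and its seed argument are hypothetical names for illustration; the sketch works on plain lists, not on an AnnData object, and does not reproduce how _merge_pseudo_cell is actually written.

```python
import random
import statistics


def merge_group(rows, aggregate_func="downsample", seed=0):
    """Collapse one pseudo-cell group (a list of per-cell feature lists) into one row."""
    if aggregate_func == "downsample":
        # Keep one randomly chosen cell and discard the rest of the group.
        return list(random.Random(seed).choice(rows))
    columns = list(zip(*rows))  # one tuple of values per feature
    if aggregate_func == "sum":
        return [sum(col) for col in columns]
    if aggregate_func == "mean":
        return [statistics.mean(col) for col in columns]
    if aggregate_func == "median":
        return [statistics.median(col) for col in columns]
    raise ValueError(f"unknown aggregate_func: {aggregate_func!r}")


# Three cells with three features each, standing in for one pseudo-cell group.
group = [[1, 0, 4], [3, 2, 4], [8, 1, 1]]

summed = merge_group(group, "sum")
averaged = merge_group(group, "mean")
middle = merge_group(group, "median")
picked = merge_group(group, "downsample")
```

Note that ‘sum’ scales with group size, which is why raw counts in X are the natural input for it, while ‘mean’ and ‘median’ stay on the scale of a single cell.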