ALLCools.clustering.ConsensusClustering
Module Contents
- _r1_normalize(cmat)
Adapted from https://github.com/SCCAF/sccaf/blob/develop/SCCAF/__init__.py
Normalize the confusion matrix by the total number of cells in each class: x(i,j) = max(cmat(i,j)/diagonal(i), cmat(j,i)/diagonal(j)). The confusion rate between i and j is the larger of the two ratios: i confused as j, or j confused as i.
- Parameters
cmat – the confusion matrix
- Returns
the normalized confusion matrix
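The formula above can be illustrated with a minimal NumPy sketch. This is not the library's code: the name `r1_normalize_sketch` is hypothetical, the input is assumed to be a square array with true classes on rows, and every diagonal entry is assumed to be non-zero.

```python
import numpy as np


def r1_normalize_sketch(cmat):
    """x(i, j) = max(cmat(i, j) / diagonal(i), cmat(j, i) / diagonal(j))."""
    cmat = np.asarray(cmat, dtype=float)
    diag = np.diag(cmat)  # assumed non-zero for every class
    # Divide row i by diagonal(i): how often class i is confused as class j.
    ratio = cmat / diag[:, None]
    # Keep the larger of the two directions, which makes the result symmetric.
    return np.maximum(ratio, ratio.T)


# Toy confusion matrix for three clusters (rows: true, columns: predicted).
r1_cmat = np.array([[90, 10, 0],
                    [5, 95, 0],
                    [0, 2, 50]])
r1_result = r1_normalize_sketch(r1_cmat)
print(r1_result.round(3))
```

Under these assumptions the diagonal of the result is 1 by construction, so only the off-diagonal entries carry information about confusion.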
- _r2_normalize(cmat)
Adapted from https://github.com/SCCAF/sccaf/blob/develop/SCCAF/__init__.py
Normalize the confusion matrix by the total number of cells: x(i,j) = (cmat(i,j) + cmat(j,i)) / N, where N is the total number of cells analyzed. The confusion rate between i and j is the number of cells of i confused as j plus the number of cells of j confused as i, divided by the total number of cells.
- Parameters
cmat – the confusion matrix
- Returns
the normalized confusion matrix
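As a rough illustration of this normalization, here is a short NumPy sketch. The name `r2_normalize_sketch` is hypothetical, and zeroing the diagonal is an assumption made here so that only between-cluster confusion remains; the description above does not state how the diagonal is treated.

```python
import numpy as np


def r2_normalize_sketch(cmat):
    """x(i, j) = (cmat(i, j) + cmat(j, i)) / N, with N the total cell count."""
    cmat = np.asarray(cmat, dtype=float)
    n_cells = cmat.sum()
    # Sum both confusion directions, then scale by the total number of cells.
    x = (cmat + cmat.T) / n_cells
    # Assumption: correctly classified cells are not counted as confusion.
    np.fill_diagonal(x, 0.0)
    return x


# Toy confusion matrix for three clusters (rows: true, columns: predicted).
r2_cmat = np.array([[90, 10, 0],
                    [5, 95, 0],
                    [0, 2, 50]])
r2_result = r2_normalize_sketch(r2_cmat)
print(r2_result.round(4))
```

Because this rate is scaled by the whole dataset, confusion between two small clusters yields a small value even when those clusters are hard to tell apart, which is one reason to consider it alongside the per-class rate.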
- _leiden_runner(g, random_states, partition_type, **partition_kwargs)
Run Leiden clustering len(random_states) times, once per random state, and return all cluster assignments as a pd.DataFrame.
- _split_train_test_per_group(x, y, frac, max_train, random_state)
Split the data into training and test sets within each cluster, making sure each cluster has enough cells for training.
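A per-cluster split of this kind might look like the sketch below. The exact rules are not documented here, so the following are assumptions: `frac` is the fraction of each cluster used for training, `max_train` caps the training cells per cluster, and every cluster keeps at least one training cell. The name `split_per_group_sketch` is hypothetical, and it returns a boolean mask rather than the split arrays.

```python
import numpy as np


def split_per_group_sketch(y, frac=0.5, max_train=500, random_state=0):
    """Return a boolean mask that marks the training cells."""
    rng = np.random.default_rng(random_state)
    y = np.asarray(y)
    train_mask = np.zeros(y.size, dtype=bool)
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        # Assumed rule: frac of the cluster, at least 1 and at most max_train.
        n_train = min(max(int(idx.size * frac), 1), max_train)
        train_mask[rng.choice(idx, size=n_train, replace=False)] = True
    return train_mask


split_y = np.array(["a"] * 10 + ["b"] * 4 + ["c"] * 2000)
split_mask = split_per_group_sketch(split_y, frac=0.5, max_train=500)
# Training and test labels are then split_y[split_mask] and split_y[~split_mask].
```

Sampling within each cluster keeps small clusters represented in the training set, while the cap stops very large clusters from dominating it.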
- single_supervise_evaluation(clf, x_train, y_train, x_test, y_test, r1_norm_step=0.05, r2_norm_step=0.05)
Run a single fit-and-merge cluster step.
- class ConsensusClustering(model=None, n_neighbors=25, metric='euclidean', min_cluster_size=10, leiden_repeats=200, leiden_resolution=1, target_accuracy=0.95, consensus_rate=0.7, random_state=0, train_frac=0.5, train_max_n=500, max_iter=50, n_jobs=-1)
- multi_leiden_clustering(self, partition_type=None, partition_kwargs=None, use_weights=True, n_iterations=-1)
Modified from scanpy. Perform Leiden clustering multiple times with different random states.
- _summarize_multi_leiden(self)
Summarize the multi_leiden results: generate raw clusters based on the Hamming distance between cells, splitting clusters at the cutoff (consensus_rate).
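The idea of comparing cells by Hamming distance across repeated runs can be sketched as follows. This is only an illustration: the name `consensus_groups_sketch` is hypothetical, and both the link threshold (distance at most 1 - consensus_rate) and the grouping of linked cells into connected components are assumptions, not the method this class uses. The dense pairwise computation is suitable for small inputs only.

```python
import numpy as np


def consensus_groups_sketch(labels, consensus_rate=0.7):
    """labels: (n_cells, n_runs) array with the cluster label of each run."""
    labels = np.asarray(labels)
    n_cells = labels.shape[0]
    # Hamming distance: fraction of runs in which two cells get different labels.
    dist = (labels[:, None, :] != labels[None, :, :]).mean(axis=2)
    # Assumed rule: link two cells when they disagree in few enough runs.
    linked = dist <= 1 - consensus_rate
    group = np.full(n_cells, -1)
    next_id = 0
    for start in range(n_cells):
        if group[start] >= 0:
            continue
        # Depth-first search collects every cell reachable through links.
        group[start] = next_id
        stack = [start]
        while stack:
            cell = stack.pop()
            for other in np.flatnonzero(linked[cell] & (group < 0)):
                group[other] = next_id
                stack.append(other)
        next_id += 1
    return dist, group


# Four cells clustered in four runs.
runs = np.array([[0, 0, 0, 0],
                 [0, 0, 0, 1],
                 [1, 1, 1, 1],
                 [1, 1, 1, 2]])
consensus_dist, consensus_group = consensus_groups_sketch(runs, consensus_rate=0.7)
print(consensus_group)
```

In this toy input the first two cells differ in one run out of four and so do the last two, so each pair ends up in its own raw group.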
- plot_leiden_cases(self, coord_data, coord_base='umap', plot_size=3, dpi=300, plot_n_cases=4, s=3)
Show the Leiden runs that differ most from each other, as measured by ARI.
- plot_before_after(self, coord_data, coord_base='umap', plot_size=3, dpi=300)
Plot the raw clusters from multi-leiden and the final clusters after merging.
- select_confusion_pairs(true_label, predicted_label, ratio_cutoff=0.001)
Select cluster pairs that are confused between the true and predicted labels, using ratio_cutoff as the threshold.
- Parameters
true_label – true cell labels
predicted_label – predicted cell labels
ratio_cutoff – ratio cutoff used to define confusion between clusters
- Returns
confused_pairs – list of cluster pair tuples
- Return type
list
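One way to pick confused pairs from two label vectors is sketched below. The helper name `confusion_pairs_sketch` is hypothetical, and the selection rule is an assumption: a pair is reported when the fraction of a true cluster's cells predicted as the other cluster exceeds `ratio_cutoff`. The function documented above may define the ratio differently.

```python
import numpy as np


def confusion_pairs_sketch(true_label, predicted_label, ratio_cutoff=0.001):
    """Return sorted, de-duplicated (cluster_a, cluster_b) tuples."""
    true_label = np.asarray(true_label)
    predicted_label = np.asarray(predicted_label)
    n = true_label.size
    # Map every label to an integer index shared by both vectors.
    clusters, inverse = np.unique(
        np.concatenate([true_label, predicted_label]), return_inverse=True
    )
    names = clusters.tolist()
    cmat = np.zeros((len(names), len(names)))
    np.add.at(cmat, (inverse[:n], inverse[n:]), 1)
    # Fraction of each true cluster assigned to each predicted cluster.
    row_sum = cmat.sum(axis=1, keepdims=True)
    ratio = np.divide(cmat, row_sum, out=np.zeros_like(cmat), where=row_sum > 0)
    pairs = set()
    for i, j in zip(*np.nonzero(ratio > ratio_cutoff)):
        if i != j:
            # Sort so that (a, b) and (b, a) count as the same pair.
            pairs.add(tuple(sorted((names[i], names[j]))))
    return sorted(pairs)


pairs_true = ["a", "a", "a", "a", "b", "b", "b", "b", "c", "c"]
pairs_pred = ["a", "a", "a", "b", "b", "b", "b", "b", "c", "c"]
confused_pairs_example = confusion_pairs_sketch(pairs_true, pairs_pred, ratio_cutoff=0.1)
print(confused_pairs_example)
```

Here one of the four "a" cells is predicted as "b", a ratio of 0.25 that exceeds the cutoff of 0.1, so only the pair of "a" and "b" is reported.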