ALLCools.clustering.ConsensusClustering

Module Contents

_r1_normalize(cmat)[source]

Adapted from https://github.com/SCCAF/sccaf/blob/develop/SCCAF/__init__.py

Normalize the confusion matrix by the number of cells in each class: x(i,j) = max(cmat(i,j)/diagonal(i), cmat(j,i)/diagonal(j)), where diagonal(i) is the diagonal entry cmat(i,i). The confusion rate between i and j is the larger of two ratios: the rate at which i is confused as j, and the rate at which j is confused as i.

Parameters
  • cmat – the confusion matrix

Returns

the normalized confusion matrix
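
The formula above can be illustrated with a small, dependency-free sketch. It is not the library's implementation: the name r1_normalize_sketch and the nested-list input are assumptions for illustration, and the sketch assumes every diagonal entry is nonzero.

```python
def r1_normalize_sketch(cmat):
    """Hypothetical helper: x(i,j) = max(cmat(i,j)/cmat(i,i), cmat(j,i)/cmat(j,j)).

    Assumes cmat is a square list of lists with nonzero diagonal entries.
    """
    n = len(cmat)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            rate = max(cmat[i][j] / cmat[i][i], cmat[j][i] / cmat[j][j])
            # The result is symmetric; the diagonal is left at 0.
            out[i][j] = out[j][i] = rate
    return out


# Toy confusion matrix (made-up counts): rows and columns are clusters 0..2.
cmat = [[90, 10, 0],
        [5, 80, 15],
        [0, 20, 60]]
r1 = r1_normalize_sketch(cmat)
```

Here clusters 1 and 2 get the highest rate, because 20 of cluster 2's cells fall on cluster 1 against only 60 on its own diagonal.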

_r2_normalize(cmat)[source]

Adapted from https://github.com/SCCAF/sccaf/blob/develop/SCCAF/__init__.py

Normalize the confusion matrix by the total number of cells: x(i,j) = (cmat(i,j) + cmat(j,i)) / N, where N is the total number of cells analyzed. The confusion rate between i and j is the number of cells of i confused as j plus the number of cells of j confused as i, divided by N.

Parameters
  • cmat – the confusion matrix

Returns

the normalized confusion matrix
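
A minimal sketch of this normalization follows, using plain Python lists. The name r2_normalize_sketch is hypothetical, and setting the diagonal to zero is an assumption, since the description only defines the rate between two different clusters.

```python
def r2_normalize_sketch(cmat):
    """Hypothetical helper: x(i,j) = (cmat(i,j) + cmat(j,i)) / N for i != j."""
    n = len(cmat)
    total = sum(sum(row) for row in cmat)  # N: all cells in the matrix
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # assumption: diagonal entries are left at 0
                out[i][j] = (cmat[i][j] + cmat[j][i]) / total
    return out


# Toy confusion matrix (made-up counts), 280 cells in total.
cmat = [[90, 10, 0],
        [5, 80, 15],
        [0, 20, 60]]
r2 = r2_normalize_sketch(cmat)
```

Unlike the per-class normalization, this rate is small for pairs of small clusters even when they are heavily confused, because the denominator is the whole dataset.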

_leiden_runner(g, random_states, partition_type, **partition_kwargs)[source]

Run Leiden clustering len(random_states) times, once per random state, and return all cluster assignments as a pd.DataFrame.

_split_train_test_per_group(x, y, frac, max_train, random_state)[source]

Split the cells of each cluster into training and test sets, making sure each cluster has enough cells for training.
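
The idea of a per-cluster split can be sketched as below. This is an illustration under stated assumptions, not the library's code: the helper name, the index-based return value, and the rule of keeping at least one training cell per cluster are all assumptions; only the parameter names frac, max_train and random_state come from the signature above.

```python
import random
from collections import defaultdict


def split_per_group_sketch(labels, frac=0.5, max_train=500, random_state=0):
    """Hypothetical helper: return (train_indices, test_indices) split per label."""
    rng = random.Random(random_state)
    by_group = defaultdict(list)
    for idx, label in enumerate(labels):
        by_group[label].append(idx)

    train, test = [], []
    for members in by_group.values():
        members = members[:]
        rng.shuffle(members)
        # Take `frac` of the group, at least 1 cell (assumption), at most max_train.
        n_train = min(max_train, max(1, int(len(members) * frac)))
        train.extend(members[:n_train])
        test.extend(members[n_train:])
    return sorted(train), sorted(test)


labels = ["a"] * 10 + ["b"] * 4 + ["c"] * 1
train_idx, test_idx = split_per_group_sketch(labels, frac=0.5, max_train=3)
```

With these toy labels, cluster "a" is capped at 3 training cells by max_train, "b" contributes 2, and the single cell of "c" goes to training.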

single_supervise_evaluation(clf, x_train, y_train, x_test, y_test, r1_norm_step=0.05, r2_norm_step=0.05)[source]

A single fit and merge cluster step

class ConsensusClustering(model=None, n_neighbors=25, metric='euclidean', min_cluster_size=10, leiden_repeats=200, leiden_resolution=1, target_accuracy=0.95, consensus_rate=0.7, random_state=0, train_frac=0.5, train_max_n=500, max_iter=50, n_jobs=-1)[source]
add_data(self, x)[source]
fit_predict(self, x, leiden_kwds=None)[source]
compute_neighbors(self)[source]

Calculate KNN graph

multi_leiden_clustering(self, partition_type=None, partition_kwargs=None, use_weights=True, n_iterations=-1)[source]

Modified from scanpy. Perform Leiden clustering multiple times with different random states.

_summarize_multi_leiden(self)[source]

Summarize the multi_leiden results. Generate a raw set of clusters based only on the Hamming distance between cells across the Leiden runs, splitting clusters with a cutoff derived from consensus_rate.
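
One way to picture a Hamming-distance consensus is the sketch below. The greedy seed-based grouping and the cutoff of 1 - consensus_rate are assumptions made for illustration; the description above names only the distance and the consensus_rate cutoff, and the function names are hypothetical.

```python
def hamming_fraction(a, b):
    """Fraction of Leiden runs in which two cells received different labels."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def consensus_groups_sketch(runs_per_cell, consensus_rate=0.7):
    """Hypothetical greedy grouping: cells join the seed cell's raw cluster
    when their Hamming distance to it is below 1 - consensus_rate."""
    cutoff = 1 - consensus_rate
    assignment = [None] * len(runs_per_cell)
    remaining = list(range(len(runs_per_cell)))
    group_id = 0
    while remaining:
        seed = remaining[0]
        for cell in remaining:
            # The seed always joins its own group, so the loop terminates.
            if cell == seed or hamming_fraction(
                runs_per_cell[seed], runs_per_cell[cell]
            ) < cutoff:
                assignment[cell] = group_id
        remaining = [c for c in remaining if assignment[c] is None]
        group_id += 1
    return assignment


# Each row holds one cell's cluster label in each of 5 toy Leiden runs.
runs_per_cell = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],  # differs from cell 0 in 1 of 5 runs
    [1, 1, 1, 1, 1],
    [1, 1, 1, 2, 2],  # differs from cell 2 in 2 of 5 runs
]
raw_clusters = consensus_groups_sketch(runs_per_cell, consensus_rate=0.7)
```

In this toy case cells 0 and 1 agree in 80% of runs and share a raw cluster, while cells 2 and 3 agree in only 60% of runs and are split.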

_create_model(self, n_estimators=1000)[source]

Initialize the default model

supervise_learning(self)[source]

Perform supervised learning and cluster merge process

final_evaluation(self)[source]

Final evaluation of the model and assign outliers

save(self, output_path)[source]

Save the model

plot_leiden_cases(self, coord_data, coord_base='umap', plot_size=3, dpi=300, plot_n_cases=4, s=3)[source]

Show the Leiden runs that differ most from each other, as measured by the adjusted Rand index (ARI)

plot_before_after(self, coord_data, coord_base='umap', plot_size=3, dpi=300)[source]

Plot the raw clusters from multi-leiden and final clusters after merge

plot_steps(self, coord_data, coord_base='umap', plot_size=3, dpi=300)[source]

Plot the supervised learning and merge steps

plot_merge_process(self, plot_size=3)[source]

Plot the change in accuracy during the merge process

select_confusion_pairs(true_label, predicted_label, ratio_cutoff=0.001)[source]

Select cluster pairs that are confused with each other between the true and predicted labels, using ratio_cutoff as the threshold

Parameters
  • true_label – true cell labels

  • predicted_label – predicted cell labels

  • ratio_cutoff – ratio cutoff used to decide whether two clusters are confused

Returns

confused_pairs – list of cluster pair tuples
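
A rough sketch of how such pairs could be selected is shown below. It is not the library's implementation: the function name is hypothetical, and dividing each off-diagonal count by the number of cells with that true label is an assumption about how the ratio is defined.

```python
from collections import Counter


def select_confusion_pairs_sketch(true_label, predicted_label, ratio_cutoff=0.001):
    """Hypothetical helper: return sorted, de-duplicated (cluster, cluster) pairs."""
    counts = Counter(zip(true_label, predicted_label))
    true_totals = Counter(true_label)
    pairs = set()
    for (t, p), n in counts.items():
        # Assumption: ratio = misassigned cells / all cells with true label t.
        if t != p and n / true_totals[t] > ratio_cutoff:
            pairs.add(tuple(sorted((t, p))))
    return sorted(pairs)


# Toy labels: 2 of 10 "a" cells are predicted as "b", 1 of 10 "c" cells as "a".
true_label = ["a"] * 10 + ["b"] * 10 + ["c"] * 10
predicted_label = ["a"] * 8 + ["b"] * 2 + ["b"] * 10 + ["c"] * 9 + ["a"] * 1
confused_pairs = select_confusion_pairs_sketch(
    true_label, predicted_label, ratio_cutoff=0.15
)
```

With a cutoff of 0.15, only the a/b pair (ratio 0.2) is reported; the c/a confusion (ratio 0.1) falls below the threshold.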