ALLCools.clustering.doublets

Package Contents

class MethylScrublet(sim_doublet_ratio=2.0, n_neighbors=None, expected_doublet_rate=0.1, stdev_doublet_rate=0.02, metric='euclidean', random_state=0, n_jobs=-1)
fit(self, mc, cov, clusters=None, batches=None)
simulate_doublets(self)

Simulate doublets by adding the counts of random observed cell pairs.
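
The idea of summing the counts of random cell pairs can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name simulate_doublets_sketch, the dense NumPy inputs, and the use of sim_doublet_ratio to set the number of simulated doublets are all assumptions made for the example.

```python
import numpy as np


def simulate_doublets_sketch(mc, cov, sim_doublet_ratio=2.0, random_state=0):
    """Hypothetical sketch: build synthetic doublets by summing cell pairs.

    mc, cov: (n_cells, n_features) arrays of methylated and total counts.
    Returns the simulated mc and cov matrices plus the parent index pairs.
    """
    rng = np.random.default_rng(random_state)
    n_obs = mc.shape[0]
    # Assumption: the number of simulated doublets scales with n_obs.
    n_sim = int(n_obs * sim_doublet_ratio)
    # Draw two parent cells per simulated doublet (with replacement,
    # so a cell may occasionally be paired with itself).
    pairs = rng.integers(0, n_obs, size=(n_sim, 2))
    # A doublet's counts are the element-wise sum of its parents' counts.
    mc_sim = mc[pairs[:, 0]] + mc[pairs[:, 1]]
    cov_sim = cov[pairs[:, 0]] + cov[pairs[:, 1]]
    return mc_sim, cov_sim, pairs
```

Because both mc and cov are summed, the methylation fraction of a simulated doublet is the coverage-weighted average of its two parents.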

pca(self)
get_knn_graph(self, data)
calculate_doublet_scores(self)
call_doublets(self, threshold=None)
plot(self)
_plot_cluster_dist(self)
coverage_doublets(allc_dict: dict, resolution: int = 100, cov_cutoff=2, region_alpha=0.01, tmp_dir='doublets_temp_dir', cpu=1, keep_tmp=False)

Quantify high-coverage genome bins in each cell for doublet evaluation.

Parameters
  • allc_dict – dict with cell_id as key, allc_path as value

  • resolution – genome bin size used for quantification, in bp

  • cov_cutoff – coverage cutoff; sites with cov_cutoff < cov <= 2 * cov_cutoff are counted

  • region_alpha – FDR adjusted P-value cutoff

  • tmp_dir – temporary dir to save the results

  • cpu – number of cpu to use

  • keep_tmp – whether to keep tmp_dir for debugging
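
To make the cov_cutoff window concrete, the following sketch counts, per genome bin, the sites whose coverage falls in cov_cutoff < cov <= 2 * cov_cutoff. It only illustrates the filter described above under stated assumptions: the function name count_high_cov_sites and its array inputs are hypothetical, and the real function reads ALLC files and applies the region_alpha test, neither of which is shown here.

```python
import numpy as np


def count_high_cov_sites(positions, cov, resolution=100, cov_cutoff=2):
    """Hypothetical sketch: count high-coverage sites per genome bin.

    positions: site coordinates on one chromosome (bp).
    cov: total coverage at each site.
    Returns a dict mapping bin index to the number of qualifying sites.
    """
    positions = np.asarray(positions)
    cov = np.asarray(cov)
    # Keep sites inside the window cov_cutoff < cov <= 2 * cov_cutoff.
    keep = (cov > cov_cutoff) & (cov <= 2 * cov_cutoff)
    # Assign each kept site to a fixed-width bin of `resolution` bp.
    bins = positions[keep] // resolution
    bin_ids, counts = np.unique(bins, return_counts=True)
    return dict(zip(bin_ids.tolist(), counts.tolist()))
```

With the default cov_cutoff=2, sites covered 3 or 4 times are counted, while sites with coverage of 2 or less, or 5 or more, are ignored.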