ALLCools._allc_to_region_count
Module Contents
- _bedtools_map(region_bed, site_bed, out_bed, chrom_size_path, save_zero_cov=True)
Use bedtools map to aggregate the sites in site_bed onto the regions of any BED file provided as region_bed.
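As a conceptual illustration only, the sketch below shows what mapping sites onto regions amounts to: for each region, sum the mc and cov of every overlapping site. It is plain Python, not the bedtools call this function makes; the tuple layouts and the helper name map_sites_to_regions are assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical in-memory layouts, chosen for this example only.
Site = Tuple[str, int, int, int, int]   # chrom, start, end, mc, cov
Region = Tuple[str, int, int, str]      # chrom, start, end, region_uid


def map_sites_to_regions(
    regions: List[Region], sites: List[Site], save_zero_cov: bool = True
) -> List[Tuple[str, int, int, str, int, int]]:
    """Sum mc and cov over all sites overlapping each region.

    Intervals are treated as half-open [start, end), as in BED files.
    """
    rows = []
    for chrom, start, end, uid in regions:
        mc_sum = 0
        cov_sum = 0
        for s_chrom, s_start, s_end, mc, cov in sites:
            # Two half-open intervals overlap when each starts before the other ends.
            if s_chrom == chrom and s_start < end and s_end > start:
                mc_sum += mc
                cov_sum += cov
        # Regions without coverage are kept only when save_zero_cov is set.
        if cov_sum > 0 or save_zero_cov:
            rows.append((chrom, start, end, uid, mc_sum, cov_sum))
    return rows
```

The quadratic loop keeps the logic visible; bedtools itself works on sorted files and is far more efficient.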
- _map_to_sparse_chrom_bin(site_bed, out_bed, chrom_size_path, bin_size=500)
Calculate regional counts for fixed-size chromosome bins. The output is sparse; bin_id is constructed from chrom_size_path and is therefore reproducible.
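One way a reproducible bin_id could be derived from chromosome sizes is sketched below. This page does not specify the scheme ALLCools actually uses, so the numbering here (bins counted consecutively across chromosomes in the order given) and the helper names chrom_bin_offsets, bin_id and sparse_bin_counts are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def chrom_bin_offsets(chrom_sizes: Dict[str, int], bin_size: int = 500) -> Dict[str, int]:
    """Return the first bin_id of each chromosome.

    Chromosomes are numbered in the order they appear in chrom_sizes, so the
    same sizes and bin_size always give the same ids.
    """
    offsets = {}
    total = 0
    for chrom, size in chrom_sizes.items():
        offsets[chrom] = total
        total += -(-size // bin_size)  # ceiling division: bins needed for this chromosome
    return offsets


def bin_id(chrom: str, pos: int, offsets: Dict[str, int], bin_size: int = 500) -> int:
    """Global id of the bin containing 0-based position pos."""
    return offsets[chrom] + pos // bin_size


def sparse_bin_counts(
    sites: Iterable[Tuple[str, int, int, int]],
    offsets: Dict[str, int],
    bin_size: int = 500,
) -> Dict[int, List[int]]:
    """Accumulate [mc, cov] per bin; bins without any site are never stored."""
    counts: Dict[int, List[int]] = {}
    for chrom, pos, mc, cov in sites:
        entry = counts.setdefault(bin_id(chrom, pos, offsets, bin_size), [0, 0])
        entry[0] += mc
        entry[1] += cov
    return counts
```

Because only bins that contain at least one site appear in the result, the output stays sparse even for small bin sizes.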
- allc_to_region_count(allc_path: str, output_prefix: str, chrom_size_path: str, mc_contexts: List[str], split_strand: bool = False, region_bed_paths: List[str] = None, region_bed_names: List[str] = None, bin_sizes: List[int] = None, cov_cutoff: int = 9999, save_zero_cov: bool = False, remove_tmp: bool = True, cpu: int = 1, binarize: bool = False)
Calculate mC and cov at the regional level. Regions can be provided in two forms:
1. BED files, provided by region_bed_paths, containing arbitrary regions; counts are calculated with bedtools map.
2. Fixed-size, non-overlapping genome bins, provided by bin_sizes.
Form 2 is much faster to calculate than form 1. The output file is in a 6-column BED-like format: chrom start end region_uid mc cov
- Parameters
allc_path – {allc_path_doc}
output_prefix – Path prefix of the output region count file.
chrom_size_path – {chrom_size_path_doc}
mc_contexts – {mc_contexts_doc}
split_strand – {split_strand_doc}
region_bed_paths – {region_bed_paths_doc}
region_bed_names – {region_bed_names_doc}
bin_sizes – {bin_sizes_doc}
cov_cutoff – {cov_cutoff_doc}
save_zero_cov – Whether to save regions that have 0 cov; only applies to the region count, not the chromosome count
remove_tmp – Whether to remove the temporary BED file
cpu – {cpu_basic_doc} This function parallelizes at the region level during the extraction step and will generate many small files if cpu > 1. Do not use cpu > 1 for single-cell region counts; for single-cell data, parallelizing at the cell level is better.
binarize – {binarize_doc}
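A minimal sketch of how a call might be assembled from the signature above. The file paths, the context strings "CGN" and "CHN", and the 100000 bp bin size are placeholders assumed for the example, and the helper names build_region_count_kwargs and run_region_count are hypothetical. The import is deferred so the argument builder can be inspected without ALLCools installed.

```python
from typing import Any, Dict


def build_region_count_kwargs(
    allc_path: str, output_prefix: str, chrom_size_path: str
) -> Dict[str, Any]:
    """Collect keyword arguments for allc_to_region_count (fixed-size bins only)."""
    return dict(
        allc_path=allc_path,
        output_prefix=output_prefix,
        chrom_size_path=chrom_size_path,
        mc_contexts=["CGN", "CHN"],  # assumed context patterns, adjust to your data
        bin_sizes=[100000],          # form 2: fixed-size, non-overlapping bins
        save_zero_cov=False,
        remove_tmp=True,
        cpu=1,                       # keep 1 for single-cell data, as advised above
    )


def run_region_count(**kwargs: Any) -> Any:
    """Call the real function; requires ALLCools and existing input files."""
    from ALLCools._allc_to_region_count import allc_to_region_count

    return allc_to_region_count(**kwargs)


# Hypothetical paths, shown for illustration only.
kwargs = build_region_count_kwargs(
    "cell.allc.tsv.gz", "output/cell", "genome.chrom.sizes"
)
# run_region_count(**kwargs)  # uncomment once the input files exist
```

For many single cells, running one such call per cell in separate processes follows the advice above to parallelize at the cell level rather than raising cpu.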