ALLCools._extract_allc
Module Contents
- _merge_cg_strand(in_path, out_path)
Merge the two strands after the context-extraction step in extract_allc. This is applied only to CG sites, so the context does not need to be checked.
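As a rough illustration of what merging CG strands means, the sketch below folds each minus-strand CG record into the plus-strand record one base upstream and sums the counts. It is not the module's implementation: the function name merge_cg_strand_rows, the in-memory row layout (chrom, pos, strand, context, mc, cov), and the choice to report unpaired minus-strand sites at pos - 1 are all assumptions made for the example.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical in-memory row layout: (chrom, pos, strand, context, mc, cov)
Row = Tuple[str, int, str, str, int, int]


def merge_cg_strand_rows(rows: Iterable[Row]) -> Iterator[Row]:
    """Fold minus-strand CG rows into the plus-strand row one base upstream.

    Assumes rows are sorted by (chrom, pos) and contain only CG sites.
    """
    pending = None  # plus-strand row waiting for a possible minus-strand partner
    for chrom, pos, strand, context, mc, cov in rows:
        if strand == "+":
            if pending is not None:
                yield pending
            pending = (chrom, pos, "+", context, mc, cov)
            continue
        if pending is not None and pending[0] == chrom and pending[1] == pos - 1:
            # Both strands of the same CpG were covered: add the counts.
            yield (chrom, pending[1], "+", pending[3], pending[4] + mc, pending[5] + cov)
            pending = None
        else:
            # No plus-strand partner: flush any older row first to keep the
            # output sorted, then report the site at the plus-strand coordinate.
            if pending is not None:
                yield pending
                pending = None
            yield (chrom, pos - 1, "+", context, mc, cov)
    if pending is not None:
        yield pending
```

Because only CG records reach this step, the sketch never inspects the context column, which matches the note above that no context check is needed.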
- _merge_gz_files(file_list, output_path)
Merge the small chunk files generated by _extract_allc_parallel, then remove the chunk files once the merge is done.
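One way such a merge can work is plain byte-level concatenation, since several gzip members appended back to back still form a valid gzip stream. The sketch below shows that idea under the assumption that the chunks are ordinary gzip files; the name merge_gz_chunks is hypothetical, and the actual function may merge the files differently.

```python
import os
import shutil
from typing import Sequence


def merge_gz_chunks(file_list: Sequence[str], output_path: str) -> None:
    """Concatenate gzip chunk files in the given order, then delete the chunks."""
    with open(output_path, "wb") as out:
        for path in file_list:
            with open(path, "rb") as chunk:
                # Append the raw compressed bytes; no decompression is needed.
                shutil.copyfileobj(chunk, out)
    # Delete the chunks only after every one has been copied successfully.
    for path in file_list:
        os.remove(path)
```

The order of file_list determines the order of records in the merged file, so the caller is responsible for passing the chunks in genome order.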
- _extract_allc_parallel(allc_path, output_prefix, mc_contexts, strandness, output_format, chrom_size_path, cov_cutoff, cpu, chunk_size=100000000, tabix=True)
Run extract_allc in parallel at the region level, then merge the region chunk files into the final output in order. It takes the same input and produces the same output as extract_allc, but generates many small intermediate files while running. Do not use this on small files.
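To make "region level" concrete, the sketch below splits each chromosome into consecutive windows of at most chunk_size bases and formats them as tabix-style region strings. The function name make_regions, the 1-based inclusive "chrom:start-end" format, and the dict input are assumptions for illustration, not a description of how the module builds its regions.

```python
from typing import Dict, List


def make_regions(chrom_sizes: Dict[str, int], chunk_size: int = 100_000_000) -> List[str]:
    """Split chromosomes into 1-based, inclusive 'chrom:start-end' regions."""
    regions = []
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, chunk_size):
            # The last window of a chromosome is truncated at its length.
            end = min(start + chunk_size, size)
            regions.append(f"{chrom}:{start + 1}-{end}")
    return regions


# Small sizes are used here only to keep the output readable.
print(make_regions({"chr1": 250, "chr2": 100}, chunk_size=100))
```

Each region can then be processed independently, and the per-region outputs merged back in the same order as the list.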
- extract_allc(allc_path: str, output_prefix: str, mc_contexts: Union[str, list], chrom_size_path: str, strandness: str = 'both', output_format: str = 'allc', region: str = None, cov_cutoff: int = 9999, tabix: bool = True, cpu=1, binarize=False)
Extract information (strand, context) from one ALLC file and save it in one of several formats.
- Parameters
allc_path – {allc_path_doc}
output_prefix – Path prefix of the output ALLC file.
mc_contexts – {mc_contexts_doc}
strandness – {strandness_doc}
output_format – Output format of the extracted information. Possible values: (1) allc, which keeps the ALLC format; (2) bed5, a 5-column BED format with the columns chrom, pos, pos, mc, cov.
chrom_size_path – {chrom_size_path_doc} If chrom_size_path is provided, it is used to extract the ALLC records in chromosome order; if region is provided, chrom_size_path is ignored.
region – {region_doc}
cov_cutoff – {cov_cutoff_doc}
tabix – Whether to generate a tabix index when the output format is ALLC. Only _extract_allc_parallel should set this to False.
cpu – {cpu_basic_doc} This function parallelizes at the region level and generates many small files if cpu > 1. Do not use cpu > 1 for single-cell region counts; for single-cell data, parallelizing at the cell level is better.
binarize – {binarize_doc}
- Returns
A list of output file paths, not including index files.
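The bed5 output format described under output_format can be pictured as a per-line column selection. The sketch below assumes the common tab-separated ALLC column order (chrom, pos, strand, context, mc, cov, methylated) and repeats pos for both coordinate columns, as the parameter description states; the helper name allc_line_to_bed5 is hypothetical and not part of the module.

```python
def allc_line_to_bed5(line: str) -> str:
    """Convert one tab-separated ALLC line to a bed5 line.

    Assumed ALLC columns: chrom, pos, strand, context, mc, cov, methylated.
    """
    chrom, pos, _strand, _context, mc, cov = line.rstrip("\n").split("\t")[:6]
    # bed5 columns as listed in the parameter description: chrom, pos, pos, mc, cov
    return "\t".join([chrom, pos, pos, mc, cov])


print(allc_line_to_bed5("chr1\t3001\t+\tCGA\t2\t5\t1\n"))
```

Strand and context are dropped in this format, so any strand or context filtering has to happen before the conversion.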