`ALLCools._extract_allc`

Module Contents

_merge_cg_strand(in_path, out_path): Merge the two strands of CG sites. This runs after the context-extraction step in extract_allc and is applied only to CG output, so it does not need to check the context.
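To make the strand merge concrete, here is a minimal in-memory sketch. It is not the library's implementation: the helper name `merge_cg_strand_rows` is hypothetical, and it assumes ALLC-style rows of (chrom, 1-based pos, strand, context, mc, cov) sorted by position, where the reverse-strand half of a CG pair sits one base after the forward-strand C. For simplicity it keeps the first context seen at each merged site.

```python
from collections import OrderedDict


def merge_cg_strand_rows(rows):
    """Fold reverse-strand CG rows onto the paired forward-strand position.

    rows: iterable of (chrom, pos, strand, context, mc, cov) tuples,
    assumed to contain CG sites only and to be sorted by position.
    """
    merged = OrderedDict()
    for chrom, pos, strand, context, mc, cov in rows:
        if strand == "-":
            # Assumption: the reverse-strand site is one base after the forward C.
            pos -= 1
        key = (chrom, pos)
        if key in merged:
            merged[key][1] += mc
            merged[key][2] += cov
        else:
            merged[key] = [context, mc, cov]
    # Report every merged site on the forward strand.
    return [
        (chrom, pos, "+", context, mc, cov)
        for (chrom, pos), (context, mc, cov) in merged.items()
    ]
```

Because the input is assumed to be CG-only, the sketch never inspects the context column, which mirrors the note above about not needing to check context.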

_check_strandness_parameter(strandness) → str

_check_out_format_parameter(out_format, binarize=False) → Tuple[str, Callable[[list], str]]

_merge_gz_files(file_list, output_path): Merge the small chunk files generated by _extract_allc_parallel, then remove the chunk files.
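One way such a merge can work is plain byte concatenation, since a gzip stream may consist of several members back to back. The sketch below illustrates that idea with the standard library only; the helper name `merge_gz_chunks` is hypothetical, and the library's own function may be implemented differently.

```python
import os
import shutil


def merge_gz_chunks(file_list, output_path):
    """Concatenate gzip chunk files in the given order, then delete the chunks."""
    with open(output_path, "wb") as out:
        for path in file_list:
            with open(path, "rb") as chunk:
                # Raw byte copy: concatenated gzip members form a valid gzip stream.
                shutil.copyfileobj(chunk, out)
    # Remove the chunks only after the merged file has been fully written.
    for path in file_list:
        os.remove(path)
```

The order of `file_list` determines the order of records in the merged file, so the caller is responsible for passing chunks in the intended order.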

_extract_allc_parallel(allc_path, output_prefix, mc_contexts, strandness, output_format, chrom_size_path, cov_cutoff, cpu, chunk_size=100000000, tabix=True): Run extract_allc in parallel at the region level, then merge the region chunk files into the final output in order. The inputs and outputs are the same as for extract_allc, but many small files are generated while it runs. Do not use this on small files.
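Region-level parallelism needs the genome split into fixed-size windows first. The following sketch shows one way to derive such windows from chromosome sizes. The helper name `iter_region_chunks` and the `chrom:start-end` string layout (1-based, inclusive) are assumptions for illustration, not the library's internals; only the default `chunk_size` value is taken from the signature above.

```python
def iter_region_chunks(chrom_sizes, chunk_size=100_000_000):
    """Yield region strings that tile each chromosome in order.

    chrom_sizes: dict mapping chromosome name to length, in the desired order.
    """
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, chunk_size):
            end = min(start + chunk_size, size)
            # Convert the 0-based window start to a 1-based inclusive coordinate.
            yield f"{chrom}:{start + 1}-{end}"
```

Yielding regions in chromosome order means that chunk outputs processed per region can later be merged in the same order to produce a sorted final file.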

extract_allc(allc_path: str, output_prefix: str, mc_contexts: Union[str, list], chrom_size_path: str, strandness: str = 'both', output_format: str = 'allc', region: str = None, cov_cutoff: int = 9999, tabix: bool = True, cpu=1, binarize=False)

Extract information (by strand and context) from a single ALLC file and save it in one of the supported output formats.

Parameters

allc_path – {allc_path_doc}
output_prefix – Path prefix of the output ALLC file.
mc_contexts – {mc_contexts_doc}
strandness – {strandness_doc}
output_format – Output format of the extracted information. Possible values:
    1. allc: keep the ALLC format.
    2. bed5: 5-column BED format (chrom, pos, pos, mc, cov).
chrom_size_path – {chrom_size_path_doc} If chrom_size_path is provided, it is used to extract the ALLC records in chromosome order; if region is provided, chrom_size_path is ignored.
region – {region_doc}
cov_cutoff – {cov_cutoff_doc}
tabix – Whether to generate a tabix index if the output format is ALLC. Only _extract_allc_parallel should set this to False.
cpu – {cpu_basic_doc} This function parallelizes at the region level and generates many small files if cpu > 1. Do not use cpu > 1 for single-cell region counts; for single-cell data, parallelizing at the cell level is better.
binarize – {binarize_doc}

Returns

A list of output file paths, not including index files.
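The bed5 layout listed under output_format (chrom, pos, pos, mc, cov) can be produced by a small line formatter of the shape `Callable[[list], str]`, which matches the return annotation of _check_out_format_parameter. The sketch below is hypothetical: the name `allc_fields_to_bed5` is invented here, and it assumes each input row is a list of strings whose first six columns are chrom, pos, strand, context, mc, cov.

```python
def allc_fields_to_bed5(fields):
    """Format one split ALLC row as a tab-separated bed5 line.

    fields: list of strings; extra trailing columns are ignored.
    """
    chrom, pos, _strand, _context, mc, cov = fields[:6]
    # bed5 repeats the position as both start and end, as described above.
    return "\t".join([chrom, pos, pos, mc, cov]) + "\n"
```

Keeping the formatter as a plain function of one row makes it easy to swap output formats without changing the code that reads and filters the ALLC records.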
