allcools extract
execute_command_and_return_markdown('allcools extract-allc -h')
$ allcools extract-allc -h
usage: allcools extract-allc [-h] --allc_path ALLC_PATH --output_prefix
OUTPUT_PREFIX --mc_contexts MC_CONTEXTS
[MC_CONTEXTS ...] --chrom_size_path
CHROM_SIZE_PATH [--strandness {both,split,merge}]
[--output_format {allc,bed5}] [--region REGION]
[--cov_cutoff COV_CUTOFF] [--cpu CPU]
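
The usage synopsis above can be turned into a concrete invocation. The following is a minimal sketch that assembles the argument list from Python using only flags shown in the help text; the helper name `build_extract_command` and all file paths are hypothetical placeholders, not part of ALLCools.

```python
def build_extract_command(allc_path, output_prefix, mc_contexts, chrom_size_path,
                          strandness="both", output_format="allc", cpu=1):
    """Assemble an argv list for `allcools extract-allc` (hypothetical helper)."""
    cmd = [
        "allcools", "extract-allc",
        "--allc_path", allc_path,
        "--output_prefix", output_prefix,
        "--mc_contexts", *mc_contexts,   # space-separated patterns, e.g. CGN CHN
        "--chrom_size_path", chrom_size_path,
        "--strandness", strandness,      # one of: both, split, merge
        "--output_format", output_format,  # one of: allc, bed5
        "--cpu", str(cpu),
    ]
    return cmd


# Placeholder paths; replace with real files before running.
command = build_extract_command(
    allc_path="sample.allc.tsv.gz",
    output_prefix="sample.extract",
    mc_contexts=["CGN", "CHN"],
    chrom_size_path="genome.chrom.sizes",
    strandness="merge",
)
print(" ".join(command))
# To actually run it: import subprocess; subprocess.run(command, check=True)
```

Passing the command as a list rather than a shell string avoids quoting problems when paths contain spaces.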
optional arguments:
-h, --help show this help message and exit
--strandness {both,split,merge}
What to do with strand information, possible values
are: 1. both: save +/- strand together in one file
without any modification; 2. split: save +/- strand
into two separate files, with suffix contain Watson
(+) and Crick (-); 3. merge: This will only merge the
count on adjacent CpG in +/- strands, only work for
CpG like context. For non-CG context, its the same as
both. (default: both)
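
To illustrate what `merge` means for CpG sites, here is a minimal sketch, not the ALLCools implementation. It assumes 1-based positions and that the `-` strand cytosine of a CpG sits exactly one base after its `+` strand partner, so its counts are added to the record at `pos - 1`. The function name and the tuple layout are hypothetical.

```python
def merge_cg_strands(records):
    """Sum mc/cov of the two strands of each CpG into the '+' strand position.

    records: iterable of (chrom, pos, strand, context, mc, cov) tuples.
    Returns a sorted list of (chrom, pos, mc, cov).
    """
    merged = {}
    for chrom, pos, strand, _context, mc, cov in records:
        # Assumption: the '-' strand C of a CpG is at (plus-strand C position + 1).
        key = (chrom, pos if strand == "+" else pos - 1)
        prev_mc, prev_cov = merged.get(key, (0, 0))
        merged[key] = (prev_mc + mc, prev_cov + cov)
    return [(chrom, pos, mc, cov) for (chrom, pos), (mc, cov) in sorted(merged.items())]


example = [
    ("chr1", 100, "+", "CGA", 3, 5),
    ("chr1", 101, "-", "CGT", 1, 4),   # partner of the site at position 100
    ("chr1", 200, "-", "CGG", 2, 2),   # '-' strand site with no '+' record
]
result = merge_cg_strands(example)
```

A site covered on only one strand still appears in the output, reported at the `+` strand coordinate.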
--output_format {allc,bed5}
Output format of extracted information, possible
values are: 1. allc: keep the allc format; 2. bed5:
5-column bed format, chrom, pos, pos, mc, cov;
(default: allc)
--region REGION Only extract records from certain genome region(s) via
tabix, multiple region can be provided in tabix form.
If region is not None, will not run in parallel
(default: None)
--cov_cutoff COV_CUTOFF
Max cov filter for a single site in ALLC. Sites with
cov > cov_cutoff will be skipped. (default: 99999)
--cpu CPU Number of processes to use in parallel. This function
parallel on region level and will generate a bunch of
small files if cpu > 1. Do not use cpu > 1 for single
cell region count. For single cell data, parallel on
cell level is better. (default: 1)
required arguments:
--allc_path ALLC_PATH
Path to 1 ALLC file (default: None)
--output_prefix OUTPUT_PREFIX
Path prefix of the output ALLC file. (default: None)
--mc_contexts MC_CONTEXTS [MC_CONTEXTS ...]
Space separated mC context patterns to extract from
ALLC. The context length should be the same as ALLC
file context. Context pattern follows IUPAC nucleotide
code, e.g. N for ATCG, H for ATC, Y for CT. (default:
None)
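
The IUPAC matching described for `--mc_contexts` can be sketched as below. This is an illustration of the pattern semantics only; the helper `context_matches` is hypothetical and ALLCools may implement the matching differently.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC_TO_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def context_matches(pattern, context):
    """Return True if a concrete context (e.g. 'CAG') fits an IUPAC pattern (e.g. 'CHN')."""
    if len(pattern) != len(context):
        # The help text requires the pattern length to equal the ALLC context length.
        return False
    regex = "".join(IUPAC_TO_REGEX[base] for base in pattern.upper())
    return re.fullmatch(regex, context.upper()) is not None


print(context_matches("CHN", "CAG"))  # H allows A, C, or T in the second position
print(context_matches("CHN", "CGA"))  # G is excluded by H
```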
--chrom_size_path CHROM_SIZE_PATH
Path to UCSC chrom size file. This can be generated
from the genome fasta or downloaded via UCSC
fetchChromSizes tools. All ALLCools functions will
refer to this file whenever possible to check for
chromosome names and lengths, so it is crucial to use
a chrom size file consistent to the reference fasta
file ever since mapping. ALLCools functions will not
change or infer chromosome names. (default: None)
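
Since the chrom size file must stay consistent with the reference fasta, one way to produce it is to derive it from that fasta directly. The sketch below assumes the usual two-column, tab-separated layout (chromosome name, length) and takes the name as the header text up to the first whitespace; the function name and the file names in the usage comment are hypothetical.

```python
def chrom_sizes_from_fasta(lines):
    """Compute {chrom_name: length} from an iterable of FASTA lines."""
    sizes = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the sequence ID, dropping any description after whitespace.
            name = line[1:].split()[0]
            sizes[name] = 0
        elif name is not None:
            sizes[name] += len(line)
    return sizes


fasta_lines = [">chr1 example description", "ACGT", "AC", ">chr2", "GG"]
sizes = chrom_sizes_from_fasta(fasta_lines)
table = "\n".join(f"{chrom}\t{length}" for chrom, length in sizes.items())
print(table)

# Usage with real files (placeholder names):
# with open("genome.fa") as fa, open("genome.chrom.sizes", "w") as out:
#     for chrom, length in chrom_sizes_from_fasta(fa).items():
#         out.write(f"{chrom}\t{length}\n")
```

Whichever way the file is produced, the chromosome names must match those used during mapping, because ALLCools does not rename or infer them.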