allcools extract

$ allcools extract-allc -h
usage: allcools extract-allc [-h] --allc_path ALLC_PATH --output_prefix
                             OUTPUT_PREFIX --mc_contexts MC_CONTEXTS
                             [MC_CONTEXTS ...] --chrom_size_path
                             CHROM_SIZE_PATH [--strandness {both,split,merge}]
                             [--output_format {allc,bed5}] [--region REGION]
                             [--cov_cutoff COV_CUTOFF] [--cpu CPU]

optional arguments:
  -h, --help            show this help message and exit
  --strandness {both,split,merge}
                        What to do with strand information, possible values
                        are: 1. both: save +/- strand together in one file
                        without any modification; 2. split: save +/- strand
                        into two separate files, with suffix contain Watson
                        (+) and Crick (-); 3. merge: This will only merge the
                        count on adjacent CpG in +/- strands, only work for
                        CpG like context. For non-CG context, its the same as
                        both. (default: both)
  --output_format {allc,bed5}
                        Output format of extracted information, possible
                        values are: 1. allc: keep the allc format; 2. bed5:
                        5-column bed format, chrom, pos, pos, mc, cov;
                        (default: allc)
  --region REGION       Only extract records from certain genome region(s) via
                        tabix, multiple region can be provided in tabix form.
                        If region is not None, will not run in parallel
                        (default: None)
  --cov_cutoff COV_CUTOFF
                        Max cov filter for a single site in ALLC. Sites with
                        cov > cov_cutoff will be skipped. (default: 99999)
  --cpu CPU             Number of processes to use in parallel. This function
                        parallel on region level and will generate a bunch of
                        small files if cpu > 1. Do not use cpu > 1 for single
                        cell region count. For single cell data, parallel on
                        cell level is better. (default: 1)

required arguments:
  --allc_path ALLC_PATH
                        Path to 1 ALLC file (default: None)
  --output_prefix OUTPUT_PREFIX
                        Path prefix of the output ALLC file. (default: None)
  --mc_contexts MC_CONTEXTS [MC_CONTEXTS ...]
                        Space separated mC context patterns to extract from
                        ALLC. The context length should be the same as ALLC
                        file context. Context pattern follows IUPAC nucleotide
                        code, e.g. N for ATCG, H for ATC, Y for CT. (default:
                        None)
  --chrom_size_path CHROM_SIZE_PATH
                        Path to UCSC chrom size file. This can be generated
                        from the genome fasta or downloaded via UCSC
                        fetchChromSizes tools. All ALLCools functions will
                        refer to this file whenever possible to check for
                        chromosome names and lengths, so it is crucial to use
                        a chrom size file consistent to the reference fasta
                        file ever since mapping. ALLCools functions will not
                        change or infer chromosome names. (default: None)
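
The help text above can be turned into a small worked example. The following is a minimal Python sketch, not part of ALLCools itself: `context_matches` and `build_extract_command` are hypothetical helpers introduced here for illustration, and the file names are placeholders. The first helper shows how the IUPAC codes named in the help text (N, H, Y) would match concrete contexts; it is not necessarily how ALLCools matches contexts internally. The second helper only assembles an argument list from the flags documented above and leaves out `--region`.

```python
import re
import shlex

# Plain bases plus the three IUPAC codes named in the help text (N, H, Y).
# The remaining IUPAC codes are left out to keep this sketch short.
IUPAC_TO_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ATCG]", "H": "[ATC]", "Y": "[CT]",
}


def context_matches(pattern, context):
    """Check whether a concrete context (e.g. 'CAG') fits a pattern (e.g. 'CHN')."""
    regex = "".join(IUPAC_TO_REGEX[code] for code in pattern.upper())
    return re.fullmatch(regex, context.upper()) is not None


def build_extract_command(allc_path, output_prefix, mc_contexts, chrom_size_path,
                          strandness="both", output_format="allc",
                          cov_cutoff=99999, cpu=1):
    """Assemble the argv list for `allcools extract-allc` from the documented flags."""
    return [
        "allcools", "extract-allc",
        "--allc_path", str(allc_path),
        "--output_prefix", str(output_prefix),
        "--mc_contexts", *mc_contexts,
        "--chrom_size_path", str(chrom_size_path),
        "--strandness", strandness,
        "--output_format", output_format,
        "--cov_cutoff", str(cov_cutoff),
        "--cpu", str(cpu),
    ]


# Hypothetical file names, for illustration only.
command = build_extract_command(
    allc_path="sample.allc.tsv.gz",
    output_prefix="sample",
    mc_contexts=["CGN", "CHN"],
    chrom_size_path="genome.chrom.sizes",
    strandness="merge",
)
print(shlex.join(command))
```

On a machine with ALLCools installed, the resulting list could be passed to `subprocess.run(command, check=True)`. Keeping the arguments as a list rather than a single string avoids shell quoting problems when paths contain spaces.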