allcools mcds


Warning

This command is deprecated. Consider using allcools generate-dataset instead, which is the unified method for generating methylation count matrices.

$ allcools generate-mcds -h
usage: allcools generate-mcds [-h] --allc_table ALLC_TABLE --output_prefix
                              OUTPUT_PREFIX --chrom_size_path CHROM_SIZE_PATH
                              --mc_contexts MC_CONTEXTS [MC_CONTEXTS ...]
                              [--split_strand]
                              [--bin_sizes BIN_SIZES [BIN_SIZES ...]]
                              [--region_bed_paths REGION_BED_PATHS [REGION_BED_PATHS ...]]
                              [--region_bed_names REGION_BED_NAMES [REGION_BED_NAMES ...]]
                              [--cov_cutoff COV_CUTOFF] [--cpu CPU]
                              [--max_per_mcds MAX_PER_MCDS]
                              [--cell_chunk_size CELL_CHUNK_SIZE]
                              [--dtype {uint8,uint16,uint32,uint64,int8,int16,int32,int64,bool}]

optional arguments:
  -h, --help            show this help message and exit
  --split_strand        If true, Watson (+) and Crick (-) strands will be
                        count separately (default: False)
  --bin_sizes BIN_SIZES [BIN_SIZES ...]
                        Fix-size genomic bins can be defined by bin_sizes and
                        chrom_size_path. Space separated sizes of genome bins,
                        each size will be count separately. (default: None)
  --region_bed_paths REGION_BED_PATHS [REGION_BED_PATHS ...]
                        Arbitrary genomic regions can be defined in several
                        BED files to count on. Space separated paths to each
                        BED files, The fourth column of the BED file should be
                        unique id of the regions. (default: None)
  --region_bed_names REGION_BED_NAMES [REGION_BED_NAMES ...]
                        Space separated names for each BED file provided in
                        region_bed_paths. (default: None)
  --cov_cutoff COV_CUTOFF
                        Max cov filter for a single site in ALLC. Sites with
                        cov > cov_cutoff will be skipped. (default: 9999)
  --cpu CPU             Number of processes to use in parallel. (default: 1)
  --max_per_mcds MAX_PER_MCDS
                        Maximum number of ALLC files to aggregate into 1 MCDS,
                        if number of ALLC provided > max_per_mcds, will
                        generate MCDS in chunks, with same prefix provided.
                        (default: 3072)
  --cell_chunk_size CELL_CHUNK_SIZE
                        Size of cell chunk in parallel aggregation. Do not
                        have any effect on results. Large chunksize needs
                        large memory. (default: 100)
  --dtype {uint8,uint16,uint32,uint64,int8,int16,int32,int64,bool}
                        Data type of MCDS count matrix. Default is np.uint32.
                        For single cell feature count, this can be set to
                        np.uint16 [0, 65536] to decrease file size. The values
                        exceed min/max will be clipped while keep the mc/cov
                        same, and a warning will be sent. (default: uint32)

required arguments:
  --allc_table ALLC_TABLE
                        Contain all the ALLC file information in two tab-
                        separated columns: 1. file_uid, 2. file_path. No
                        header (default: None)
  --output_prefix OUTPUT_PREFIX
                        Output prefix of the MCDS (default: None)
  --chrom_size_path CHROM_SIZE_PATH
                        Path to UCSC chrom size file. This can be generated
                        from the genome fasta or downloaded via UCSC
                        fetchChromSizes tools. All ALLCools functions will
                        refer to this file whenever possible to check for
                        chromosome names and lengths, so it is crucial to use
                        a chrom size file consistent to the reference fasta
                        file ever since mapping. ALLCools functions will not
                        change or infer chromosome names. (default: None)
  --mc_contexts MC_CONTEXTS [MC_CONTEXTS ...]
                        Space separated mC context patterns to extract from
                        ALLC. The context length should be the same as ALLC
                        file context. Context pattern follows IUPAC nucleotide
                        code, e.g. N for ATCG, H for ATC, Y for CT. (default:
                        None)
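
The required inputs described above can be prepared with a few lines of Python. The following is a minimal sketch, not part of ALLCools: the helper names (`expand_context`, `format_allc_table`, `build_command`) and all cell IDs and file paths are hypothetical. It shows which concrete contexts an IUPAC pattern such as `CHN` covers, how the headerless two-column `--allc_table` file is laid out, and how a command line is assembled from the flags listed in the help text.

```python
import itertools
import shlex

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_context(pattern):
    """Return every concrete context matched by an IUPAC pattern, sorted."""
    choices = [IUPAC_CODES[base] for base in pattern.upper()]
    return sorted("".join(combo) for combo in itertools.product(*choices))


def format_allc_table(allc_paths):
    """Return headerless TSV text with two columns: file_uid, file_path."""
    return "".join(
        f"{uid}\t{path}\n" for uid, path in sorted(allc_paths.items())
    )


def build_command(allc_table, output_prefix, chrom_size_path,
                  mc_contexts, bin_sizes=()):
    """Assemble an argument list using only flags shown in the help text."""
    cmd = [
        "allcools", "generate-mcds",
        "--allc_table", allc_table,
        "--output_prefix", output_prefix,
        "--chrom_size_path", chrom_size_path,
        "--mc_contexts", *mc_contexts,
    ]
    if bin_sizes:
        cmd += ["--bin_sizes", *(str(size) for size in bin_sizes)]
    return cmd


if __name__ == "__main__":
    # Hypothetical cell IDs and ALLC paths, for illustration only.
    cells = {"cell_1": "allc/cell_1.tsv.gz", "cell_2": "allc/cell_2.tsv.gz"}
    print(format_allc_table(cells), end="")
    print(expand_context("CGN"))
    print(shlex.join(build_command(
        "allc_table.tsv", "my_dataset", "genome.chrom.sizes",
        ["CGN", "CHN"], bin_sizes=[100000],
    )))
```

Writing the string returned by `format_allc_table` to a file gives a table in the layout that `--allc_table` describes. The patterns passed to `--mc_contexts` must have the same length as the context column of the ALLC files, so `CGN` and `CHN` here assume three-base contexts.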