allcools mcds
Warning
This command is deprecated. Please consider using allcools generate-dataset
as the uniform method to generate methylation count matrices.
execute_command_and_return_markdown('allcools generate-mcds -h')
$ allcools generate-mcds -h
usage: allcools generate-mcds [-h] --allc_table ALLC_TABLE --output_prefix
OUTPUT_PREFIX --chrom_size_path CHROM_SIZE_PATH
--mc_contexts MC_CONTEXTS [MC_CONTEXTS ...]
[--split_strand]
[--bin_sizes BIN_SIZES [BIN_SIZES ...]]
[--region_bed_paths REGION_BED_PATHS [REGION_BED_PATHS ...]]
[--region_bed_names REGION_BED_NAMES [REGION_BED_NAMES ...]]
[--cov_cutoff COV_CUTOFF] [--cpu CPU]
[--max_per_mcds MAX_PER_MCDS]
[--cell_chunk_size CELL_CHUNK_SIZE]
[--dtype {uint8,uint16,uint32,uint64,int8,int16,int32,int64,bool}]
optional arguments:
-h, --help show this help message and exit
--split_strand If true, Watson (+) and Crick (-) strands will be
count separately (default: False)
--bin_sizes BIN_SIZES [BIN_SIZES ...]
Fix-size genomic bins can be defined by bin_sizes and
chrom_size_path. Space separated sizes of genome bins,
each size will be count separately. (default: None)
--region_bed_paths REGION_BED_PATHS [REGION_BED_PATHS ...]
Arbitrary genomic regions can be defined in several
BED files to count on. Space separated paths to each
BED files, The fourth column of the BED file should be
unique id of the regions. (default: None)
--region_bed_names REGION_BED_NAMES [REGION_BED_NAMES ...]
Space separated names for each BED file provided in
region_bed_paths. (default: None)
--cov_cutoff COV_CUTOFF
Max cov filter for a single site in ALLC. Sites with
cov > cov_cutoff will be skipped. (default: 9999)
--cpu CPU Number of processes to use in parallel. (default: 1)
--max_per_mcds MAX_PER_MCDS
Maximum number of ALLC files to aggregate into 1 MCDS,
if number of ALLC provided > max_per_mcds, will
generate MCDS in chunks, with same prefix provided.
(default: 3072)
--cell_chunk_size CELL_CHUNK_SIZE
Size of cell chunk in parallel aggregation. Do not
have any effect on results. Large chunksize needs
large memory. (default: 100)
--dtype {uint8,uint16,uint32,uint64,int8,int16,int32,int64,bool}
Data type of MCDS count matrix. Default is np.uint32.
For single cell feature count, this can be set to
np.uint16 [0, 65536] to decrease file size. The values
exceed min/max will be clipped while keep the mc/cov
same, and a warning will be sent. (default: uint32)
required arguments:
--allc_table ALLC_TABLE
Contain all the ALLC file information in two tab-
separated columns: 1. file_uid, 2. file_path. No
header (default: None)
--output_prefix OUTPUT_PREFIX
Output prefix of the MCDS (default: None)
--chrom_size_path CHROM_SIZE_PATH
Path to UCSC chrom size file. This can be generated
from the genome fasta or downloaded via UCSC
fetchChromSizes tools. All ALLCools functions will
refer to this file whenever possible to check for
chromosome names and lengths, so it is crucial to use
a chrom size file consistent to the reference fasta
file ever since mapping. ALLCools functions will not
change or infer chromosome names. (default: None)
--mc_contexts MC_CONTEXTS [MC_CONTEXTS ...]
Space separated mC context patterns to extract from
ALLC. The context length should be the same as ALLC
file context. Context pattern follows IUPAC nucleotide
code, e.g. N for ATCG, H for ATC, Y for CT. (default:
None)
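
The required inputs described above can be prepared with a few lines of Python. The following is a minimal sketch, not part of ALLCools: the helper names (`expand_context`, `write_allc_table`, `build_command`) and all file paths are hypothetical. It writes the two-column, header-less `--allc_table` file, expands an IUPAC context pattern to show which concrete contexts it covers, and assembles an argument list using only flags listed in the help text. How ALLCools matches contexts internally may differ from this expansion.

```python
import csv
import itertools

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_context(pattern):
    """Return every concrete context covered by an IUPAC pattern such as 'CHN'."""
    choices = [IUPAC[base] for base in pattern.upper()]
    return ["".join(combo) for combo in itertools.product(*choices)]


def write_allc_table(cells, path):
    """Write a tab-separated table with no header: file_uid, file_path."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for file_uid, file_path in cells.items():
            writer.writerow([file_uid, file_path])


def build_command(allc_table, output_prefix, chrom_size_path,
                  mc_contexts, bin_sizes=None, cpu=1):
    """Assemble an argument list from flags that appear in the help text."""
    cmd = [
        "allcools", "generate-mcds",
        "--allc_table", str(allc_table),
        "--output_prefix", str(output_prefix),
        "--chrom_size_path", str(chrom_size_path),
        "--mc_contexts", *mc_contexts,
        "--cpu", str(cpu),
    ]
    if bin_sizes:
        cmd += ["--bin_sizes", *[str(size) for size in bin_sizes]]
    return cmd


if __name__ == "__main__":
    # Placeholder file names; replace them with real inputs.
    print(expand_context("CGN"))
    print(" ".join(build_command("allc_table.tsv", "my_dataset",
                                 "genome.chrom.sizes", ["CGN", "CHN"],
                                 bin_sizes=[100000], cpu=4)))
```

The list returned by `build_command` can be passed to `subprocess.run(cmd, check=True)` once the ALLC files, the table, and a chrom size file matching the mapping reference exist.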