`ALLCools.count_matrix.mcds`

Module Contents

DEFAULT_MCDS_DTYPE

clip_too_large_cov(data_df, machine_max)

_region_count_table_to_csr_npz(region_count_tables, region_id_map, output_prefix, compression=True, dtype=DEFAULT_MCDS_DTYPE)

Helper function for _aggregate_region_count_to_mcds.

Takes a list of region count table paths, reads them, aggregates them into a 2D sparse matrix, and saves the mC and COV matrices separately. This function does not handle path selection; it assumes all region count tables are of the same type. It returns the saved file path.
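The aggregation described above can be sketched roughly as follows. This is a minimal illustration, not the ALLCools implementation: the helper names `build_count_matrices` and `save_count_matrices`, the in-memory `(region_index, mc, cov)` table layout, and the `.mc.npz` / `.cov.npz` suffixes are all assumptions made for the example. Only `scipy.sparse.csr_matrix` and `scipy.sparse.save_npz` are real library calls.

```python
import numpy as np
from scipy import sparse


def build_count_matrices(cell_tables, n_regions, dtype=np.uint32):
    """Build cell-by-region mC and COV sparse matrices.

    cell_tables is a hypothetical in-memory stand-in for the region count
    tables: one list per cell, each holding (region_index, mc, cov) tuples.
    """
    rows, cols, mc_vals, cov_vals = [], [], [], []
    for cell_idx, table in enumerate(cell_tables):
        for region_idx, mc, cov in table:
            rows.append(cell_idx)
            cols.append(region_idx)
            mc_vals.append(mc)
            cov_vals.append(cov)
    shape = (len(cell_tables), n_regions)
    # The COO-style (data, (row, col)) constructor yields a CSR matrix.
    mc_matrix = sparse.csr_matrix((mc_vals, (rows, cols)), shape=shape, dtype=dtype)
    cov_matrix = sparse.csr_matrix((cov_vals, (rows, cols)), shape=shape, dtype=dtype)
    return mc_matrix, cov_matrix


def save_count_matrices(mc_matrix, cov_matrix, output_prefix, compression=True):
    """Save mC and COV separately; the file suffixes are assumptions."""
    mc_path = f"{output_prefix}.mc.npz"
    cov_path = f"{output_prefix}.cov.npz"
    sparse.save_npz(mc_path, mc_matrix, compressed=compression)
    sparse.save_npz(cov_path, cov_matrix, compressed=compression)
    return mc_path, cov_path
```

Keeping mC and COV in separate matrices with identical shape and index order means a later step can pair them by position alone.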

_csr_matrix_to_dataarray(matrix_table, row_name, row_index, col_name, col_index, other_dim_info)

Helper function for _aggregate_region_count_to_mcds.

This function aggregates sparse array files into a single xarray.DataArray, combining cell chunks and the mc/cov count types. The matrix_table provides all file paths; each row corresponds to one cell chunk, with separate mc and cov matrix paths.

_aggregate_region_count_to_mcds(output_dir, dataset_name, chunk_size=100, row_name='cell', cpu=1, dtype=DEFAULT_MCDS_DTYPE)

This function aggregates all the region count tables into a single MCDS.

generate_mcds(allc_table, output_prefix, chrom_size_path, mc_contexts, rna_table=None, split_strand=False, bin_sizes=None, region_bed_paths=None, region_bed_names=None, cov_cutoff=9999, cpu=1, remove_tmp=True, max_per_mcds=3072, cell_chunk_size=100, dtype=DEFAULT_MCDS_DTYPE, binarize=False, engine='zarr')

Generate MCDS from a list of ALLC files provided with file IDs.

Parameters

allc_table – {allc_table_doc}
output_prefix – Output prefix of the MCDS
chrom_size_path – {chrom_size_path_doc}
mc_contexts – {mc_contexts_doc}
rna_table – {rna_table_doc}
split_strand – {split_strand_doc}
bin_sizes – {bin_sizes_doc}
region_bed_paths – {region_bed_paths_doc}
region_bed_names – {region_bed_names_doc}
cov_cutoff – {cov_cutoff_doc}
cpu – {cpu_basic_doc}
remove_tmp – Whether to remove the temporary directory used for generating the MCDS
max_per_mcds – Maximum number of ALLC files to aggregate into one MCDS. If more ALLC files than max_per_mcds are provided, the MCDS is generated in chunks that share the given prefix.
cell_chunk_size – Size of each cell chunk in parallel aggregation. Does not affect the results. Larger chunk sizes need more memory.
dtype – Data type of the MCDS count matrix. Default is np.uint32. For single-cell feature counts, this can be set to np.uint16, which limits values to 0-65535. Values exceeding the maximum are clipped.
binarize – {binarize_doc}
engine – Storage engine for the dataset, either zarr or netcdf. Default is zarr.
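Two behaviors from the parameter notes, splitting the input by max_per_mcds and clipping counts to the dtype maximum, can be illustrated with a small sketch. The helper names `split_into_mcds_chunks` and `clip_to_dtype_max` are hypothetical and are not part of ALLCools; the sketch only shows the general idea under the assumption that counts are non-negative integers.

```python
import numpy as np


def split_into_mcds_chunks(allc_paths, max_per_mcds=3072):
    """Split a list of ALLC paths into groups of at most max_per_mcds.

    Hypothetical helper: each group would become one MCDS chunk.
    """
    return [
        allc_paths[i:i + max_per_mcds]
        for i in range(0, len(allc_paths), max_per_mcds)
    ]


def clip_to_dtype_max(counts, dtype=np.uint16):
    """Clip non-negative counts to the largest value dtype can hold.

    Hypothetical helper: clipping before the cast avoids integer
    wrap-around when a count exceeds the dtype range.
    """
    machine_max = np.iinfo(dtype).max  # 65535 for np.uint16
    return np.clip(counts, 0, machine_max).astype(dtype)
```

For example, `clip_to_dtype_max(np.array([0, 70000]))` keeps 0 and reduces 70000 to 65535, while `split_into_mcds_chunks` on 7 paths with `max_per_mcds=3` yields groups of 3, 3, and 1.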
