`ALLCools._merge_allc`¶

Some of the functions are modified from methylpy https://github.com/yupenghe/methylpy

Original Author: Yupeng He

Module Contents¶

log[source]¶

DEFAULT_MAX_ALLC = 150[source]¶

PROCESS[source]¶

class _ALLC(path, region)[source]¶

readline(self)[source]¶

close(self)[source]¶

_increase_soft_fd_limit()[source]¶

Increase soft file descriptor limit to hard limit, this is the maximum a process can do Use this in merge_allc, because for single cell, a lot of file need to be opened

Some useful discussion https://unix.stackexchange.com/questions/36841/why-is-number-of-open-files-limited-in-linux https://docs.python.org/3.6/library/resource.html https://stackoverflow.com/questions/6774724/why-python-has-limit-for-count-of-file-handles/6776345

_batch_merge_allc_files_tabix(allc_files, out_file, chrom_size_file, bin_length, cpu=10, binarize=False, snp=False)[source]¶

_merge_allc_files_tabix(allc_files, out_file, chrom_size_file, query_region=None, buffer_line_number=10000, binarize=False)[source]¶

_merge_allc_files_tabix_with_snp_info(allc_files, out_file, chrom_size_file, query_region=None, buffer_line_number=10000, binarize=False)[source]¶

merge_allc_files(allc_paths, output_path, chrom_size_path, bin_length=10000000, cpu=10, binarize=False, snp=False)[source]¶

Merge N ALLC files into 1 ALLC file.

Parameters

allc_paths – {allc_paths_doc}
output_path – Path to the output merged ALLC file.
chrom_size_path – {chrom_size_path_doc}
bin_length – Length of the genome bin in each parallel job, large number means more memory usage.
cpu – {cpu_basic_doc} The real CPU usage is ~1.5 times than this number, due to the sub processes of handling ALLC files using tabix/bgzip. Monitor the CPU and Memory usage when running this function.
binarize – {binarize_doc}
snp – {snp_doc}

ALLCools._merge_allc

Contents

ALLCools._merge_allc¶

Module Contents¶

`ALLCools._merge_allc`¶