ALLCools.sandbox.motif.cmotif
Module Contents
- generate_cmotif_database(bed_file_paths, reference_fasta, motif_files, output_dir, slop_b=None, chrom_size_path=None, cpu=1, sort_mem_gbs=5, path_to_fimo='', raw_score_thresh=8.0, raw_p_value_thresh=0.0002, top_n=300000, cmotif_bin_size=10000000)
Generate a lookup table recording which motifs each cytosine belongs to. BED files restrict the cytosine scan to specific regions: scanning motifs over the whole genome is very noisy, so it is better to scan only functional parts of the genome. The result files will be in the output directory (output_dir).
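As an illustration of how several BED files can collapse into one region set, here is a minimal pure-Python sketch. It is not the library's implementation, which is not shown in this page. The function name `merge_regions` and the sample intervals are hypothetical, and the sketch assumes that overlapping or touching intervals on the same chromosome are merged.

```python
from collections import defaultdict


def merge_regions(bed_records):
    """Merge overlapping or touching (chrom, start, end) intervals per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in bed_records:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_end is None or start > current_end:
                # Gap found: flush the interval built so far and start a new one.
                if current_end is not None:
                    merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
            else:
                # Overlapping or touching: extend the current interval.
                current_end = max(current_end, end)
        merged.append((chrom, current_start, current_end))
    return merged


# Hypothetical intervals standing in for records read from two BED files.
peaks_a = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 80)]
peaks_b = [("chr1", 300, 350), ("chr1", 500, 600)]
regions = merge_regions(peaks_a + peaks_b)
```

Only cytosines inside the merged regions would then be candidates for the motif scan.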
- Parameters
bed_file_paths – Paths of the BED files. Multiple BED files are merged into a final region set, and the motif scan only happens within the regions defined in these files.
reference_fasta – Path of the genome FASTA file to scan.
motif_files – MEME motif files that contain all the motif information.
output_dir – Output directory of the C-Motif database.
slop_b – Whether to add slop to both ends of the regions in the BED files.
chrom_size_path – Path to the chromosome sizes file. Needed if slop_b is not None.
cpu – Number of CPUs to use.
sort_mem_gbs – Maximum memory usage in GB when sorting BED files.
path_to_fimo – Path to the fimo executable, if fimo is not in PATH.
raw_score_thresh – Threshold of the raw motif match likelihood score; see the FIMO documentation for more information.
raw_p_value_thresh – Threshold of the raw motif match P-value; see the FIMO documentation for more information.
top_n – If too many motif matches are found, they are ordered by likelihood score and only the top matches are kept.
cmotif_bin_size – Bin size of a single file in the C-Motif database. It has no impact on the results; keeping the default is recommended.
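A call might be prepared as in the sketch below. The helper names `build_cmotif_kwargs` and `run_cmotif_database` and all file names are hypothetical; only the parameter names and the import path come from this page. The sketch assumes that bed_file_paths and motif_files accept lists of paths. The helper checks the one constraint documented above, that chrom_size_path is needed when slop_b is not None.

```python
def build_cmotif_kwargs(bed_file_paths, reference_fasta, motif_files, output_dir,
                        slop_b=None, chrom_size_path=None, cpu=1):
    """Collect keyword arguments and check the documented slop_b constraint."""
    if slop_b is not None and chrom_size_path is None:
        raise ValueError("chrom_size_path is needed when slop_b is not None")
    return {
        "bed_file_paths": list(bed_file_paths),
        "reference_fasta": reference_fasta,
        "motif_files": list(motif_files),
        "output_dir": output_dir,
        "slop_b": slop_b,
        "chrom_size_path": chrom_size_path,
        "cpu": cpu,
    }


def run_cmotif_database(kwargs):
    """Run the scan; requires ALLCools and fimo to be installed."""
    # Imported lazily so that build_cmotif_kwargs works without ALLCools.
    from ALLCools.sandbox.motif.cmotif import generate_cmotif_database
    generate_cmotif_database(**kwargs)


# Hypothetical input files; replace with real paths before running.
kwargs = build_cmotif_kwargs(
    bed_file_paths=["peaks_a.bed", "peaks_b.bed"],
    reference_fasta="genome.fa",
    motif_files=["motifs.meme"],
    output_dir="cmotif_db",
    cpu=4,
)
# run_cmotif_database(kwargs)  # uncomment once the inputs exist
```

Parameters not listed in the helper (for example raw_score_thresh or top_n) keep the defaults shown in the signature.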