Alternative Way to Create RegionDS
Parse methylpy DMRfind output
If you used the methylpy DMRfind function to identify DMRs, you can create a RegionDS by running methylpy_to_region_ds.
from ALLCools.mcds import RegionDS
from ALLCools.dmr.parse_methylpy import methylpy_to_region_ds
# DMR output of methylpy DMRfind
methylpy_dmr = '../../data/HIPBulk/DMR/snmC_CT/_rms_results_collapsed.tsv'
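# convert the methylpy DMR table into a RegionDS saved in output_dir
methylpy_to_region_ds(dmr_path=methylpy_dmr, output_dir='test_HIP_methylpy')
# open the saved RegionDS; region_dim names the region dimension ('dmr')
RegionDS.open('test_HIP_methylpy', region_dim='dmr')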
<xarray.RegionDS>
Dimensions:      (dmr: 2337497, sample: 10)
Coordinates:
  * dmr          (dmr) <U15 'snmC_CT-0' 'snmC_CT-1' ... 'snmC_CT-2337496'
    dmr_chrom    (dmr) <U5 'chr1' 'chr1' 'chr1' 'chr1' ... 'chrY' 'chrY' 'chrY'
    dmr_end      (dmr) int64 3001020 3003900 3006189 ... 90811943 90812481
    dmr_ndms     (dmr) int64 1 3 2 1 3 2 1 1 1 1 3 1 ... 11 5 4 2 5 7 2 6 9 1 4
    dmr_start    (dmr) int64 3001018 3003640 3005998 ... 90811941 90812266
  * sample       (sample) <U17 'snmC_ASC' 'snmC_CA1' ... 'snmC_ODC' 'snmC_OPC'
Data variables:
    dmr_da_frac  (sample, dmr) float64 ...
    dmr_state    (sample, dmr) int16 ...
Attributes:
    region_dim:          dmr
    region_ds_location:  /home/hanliu/pkg/ALLCools_pycharm/docs/allcools/clus...
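To give a feel for what information the methylpy table carries into the RegionDS, here is a small stdlib-only sketch. It is not the ALLCools implementation. It assumes the collapsed DMR table has the columns chr, start, end, number_of_dms, hypermethylated_samples, hypomethylated_samples, followed by one methylation_level_<sample> column per sample. The function name parse_methylpy_dmr, the prefix argument, and the +1/-1/0 state encoding are hypothetical and chosen only for illustration; how ALLCools actually fills dmr_state and dmr_da_frac may differ.

```python
import csv
import io
import math

LEVEL_PREFIX = "methylation_level_"  # assumed column prefix for per-sample levels


def parse_methylpy_dmr(tsv_text, prefix="dmr"):
    """Hypothetical parser for a methylpy-style collapsed DMR table (assumed layout)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    # Columns after the six fixed ones are assumed to be one level column per sample
    samples = [c[len(LEVEL_PREFIX):] if c.startswith(LEVEL_PREFIX) else c
               for c in header[6:]]
    regions = []
    state = {s: [] for s in samples}  # illustrative encoding: +1 hyper, -1 hypo, 0 otherwise
    level = {s: [] for s in samples}  # methylation level, NaN when missing
    for i, row in enumerate(reader):
        chrom, start, end, ndms, hyper, hypo = row[:6]
        regions.append({"id": f"{prefix}-{i}", "chrom": chrom,
                        "start": int(start), "end": int(end), "ndms": int(ndms)})
        # Sample lists are comma-separated; an empty field means no samples
        hyper_set = {s for s in hyper.split(",") if s}
        hypo_set = {s for s in hypo.split(",") if s}
        for s, value in zip(samples, row[6:]):
            state[s].append(1 if s in hyper_set else -1 if s in hypo_set else 0)
            level[s].append(float("nan") if value in ("", "NA") else float(value))
    return regions, state, level


# Toy two-sample table, made up for illustration
example = (
    "#chr\tstart\tend\tnumber_of_dms\thypermethylated_samples\thypomethylated_samples\t"
    "methylation_level_A\tmethylation_level_B\n"
    "chr1\t100\t102\t1\tA\tB\t0.9\t0.1\n"
    "chr1\t500\t560\t3\t\tA\tNA\t0.8\n"
)
regions, state, level = parse_methylpy_dmr(example, prefix="snmC_CT")
print(state)  # {'A': [1, -1], 'B': [-1, 0]}
print(math.isnan(level["A"][1]))  # True
```

The region IDs mimic the "<prefix>-<row>" pattern visible in the dmr coordinate above; where ALLCools takes its prefix from is not stated here.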
Create RegionDS from a BED file
You can also create an empty RegionDS from a BED file, recording only the region coordinates. You can then perform annotation, motif scanning and further analysis using the methods described in the following sections.
The BED file contains three required columns and one optional column:
chrom: required
start: required
end: required
region_id: optional but recommended. If not provided, RegionDS automatically generates f"{region_dim}_{i_row}" as the region_id. Each region_id must be unique.
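The column rules above can be sketched in plain Python. This is a hypothetical helper, not part of ALLCools: read_bed_regions reads chrom/start/end, uses the fourth column as region_id when present, otherwise falls back to f"{region_dim}_{i_row}", and rejects duplicate IDs. Counting i_row from 0 over data rows only (skipping blank, "#", "track" and "browser" lines) is an assumption; the output keys mirror the {region_dim}_chrom / _start / _end coordinate names shown in the RegionDS output further down.

```python
from collections import Counter


def read_bed_regions(lines, region_dim="bed_region"):
    """Hypothetical sketch: collect BED coordinates, filling in missing region IDs."""
    chroms, starts, ends, ids = [], [], [], []
    i_row = 0  # assumed: counts data rows only, starting at 0
    for line in lines:
        line = line.strip()
        # Skip blank lines and common BED header lines
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chroms.append(fields[0])
        starts.append(int(fields[1]))
        ends.append(int(fields[2]))
        # Use the 4th column as region_id if present, otherwise generate one
        ids.append(fields[3] if len(fields) > 3 else f"{region_dim}_{i_row}")
        i_row += 1
    duplicated = sorted(x for x, n in Counter(ids).items() if n > 1)
    if duplicated:
        raise ValueError(f"region_id must be unique, duplicated: {duplicated[:5]}")
    return {
        region_dim: ids,
        f"{region_dim}_chrom": chroms,
        f"{region_dim}_start": starts,
        f"{region_dim}_end": ends,
    }


with_ids = ["chr1\t45388517\t45388519\tsnmC_CT-40246",
            "chr1\t58086003\t58086005\tsnmC_CT-51693"]
print(read_bed_regions(with_ids)["bed_region"])
# ['snmC_CT-40246', 'snmC_CT-51693']
without_ids = [line.rsplit("\t", 1)[0] for line in with_ids]  # drop the ID column
print(read_bed_regions(without_ids)["bed_region"])
# ['bed_region_0', 'bed_region_1']
```

Generated IDs depend on row position, so they change whenever the file is re-sorted or filtered; supplying your own stable region_id avoids that.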
You also need to provide a chrom_size_path, which tells RegionDS the size of each chromosome.
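A chrom sizes file is a plain two-column text file: chromosome name and length in bp, one chromosome per line (the UCSC chrom.sizes layout). The sketch below reads such a file and, as a hypothetical pre-flight check, lists BED regions that fall on unknown chromosomes or run past a chromosome end. Both function names are illustrative; whether RegionDS performs a check like this itself is not stated here, and the sizes used are toy values, not mm10.

```python
def read_chrom_sizes(lines):
    """Parse 'chrom<TAB>size' lines into a dict that keeps file order."""
    sizes = {}
    for line in lines:
        line = line.strip()
        if line:
            chrom, size = line.split()[:2]
            sizes[chrom] = int(size)
    return sizes  # dicts preserve insertion order, so file order is kept


def out_of_bounds(regions, chrom_sizes):
    """Return regions whose chrom is unknown or that violate 0 <= start < end <= size."""
    bad = []
    for chrom, start, end in regions:
        size = chrom_sizes.get(chrom)
        if size is None or not (0 <= start < end <= size):
            bad.append((chrom, start, end))
    return bad


toy_sizes = read_chrom_sizes(["chr1\t1000", "chr2\t800"])  # toy values, not mm10
print(out_of_bounds([("chr1", 10, 20), ("chr2", 790, 900), ("chrUn", 0, 5)], toy_sizes))
# [('chr2', 790, 900), ('chrUn', 0, 5)]
```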
Important
About BED Sorting
Region order matters throughout genomic analysis. The best practice is to sort your BED file according to the chrom_size_path you provide. If your BED file is already sorted, you can set sort_bed=False (the default is True).
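Note that the example BED file below is in plain string order (chr1, chr10, chr11), while the sorted RegionDS output lists chr1, chr1, chr2, and so on. A minimal sketch of sorting by chrom-size-file order, then start, then end, follows. It assumes the order of chromosomes in the chrom sizes file defines the target order; the helper name is hypothetical and ALLCools' internal sort may differ in detail. The chr2 region is a toy entry added to show the difference.

```python
def sort_bed_by_chrom_order(regions, chrom_order):
    """Sort (chrom, start, end, ...) tuples by chrom-size-file order, then start, then end."""
    rank = {chrom: i for i, chrom in enumerate(chrom_order)}
    # A KeyError here means a region sits on a chromosome missing from chrom_order
    return sorted(regions, key=lambda r: (rank[r[0]], r[1], r[2]))


regions = [
    ("chr10", 96777313, 96777315, "snmC_CT-270457"),
    ("chr1", 58086003, 58086005, "snmC_CT-51693"),
    ("chr2", 100, 200, "toy-chr2"),  # toy region, not from the example file
    ("chr1", 45388517, 45388519, "snmC_CT-40246"),
]
chrom_order = ["chr1", "chr2", "chr10"]  # assumed order, as listed in a chrom sizes file

print([r[3] for r in sort_bed_by_chrom_order(regions, chrom_order)])
# ['snmC_CT-40246', 'snmC_CT-51693', 'toy-chr2', 'snmC_CT-270457']
print([r[0] for r in sorted(regions)])  # plain string sort puts chr10 before chr2
# ['chr1', 'chr1', 'chr10', 'chr2']
```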
# example BED file with region ID
!head test_from_bed_func.bed
chr1 45388517 45388519 snmC_CT-40246
chr1 58086003 58086005 snmC_CT-51693
chr10 96777313 96777315 snmC_CT-270457
chr10 97954303 97954318 snmC_CT-271472
chr10 106769860 106769862 snmC_CT-279004
chr10 111530721 111530723 snmC_CT-283627
chr10 116428091 116429149 snmC_CT-288520
chr11 10168273 10168460 snmC_CT-309845
chr11 19559808 19559926 snmC_CT-318359
chr11 42477074 42477076 snmC_CT-339956
bed_region_ds = RegionDS.from_bed(
bed='test_from_bed_func.bed',
location='test_from_bed_RegionDS',
chrom_size_path='../../data/genome/mm10.main.nochrM.chrom.sizes',
region_dim='bed_region',
# True by default, set to False if bed is already sorted
sort_bed=True)
# the RegionDS is stored at {location}
RegionDS.open('test_from_bed_RegionDS')
Using bed_region as region_dim
<xarray.RegionDS>
Dimensions:           (bed_region: 100)
Coordinates:
    bed_region_end    (bed_region) int64 45388519 58086005 ... 161863422
    bed_region_start  (bed_region) int64 45388517 58086003 ... 161863420
  * bed_region        (bed_region) <U15 'snmC_CT-40246' ... 'snmC_CT-2330818'
    bed_region_chrom  (bed_region) <U5 'chr1' 'chr1' 'chr2' ... 'chrX' 'chrX'
Data variables:
    *empty*
Attributes:
    chrom_size_path:     /home/hanliu/pkg/ALLCools_pycharm/docs/allcools/clus...
    region_dim:          bed_region
    region_ds_location:  /home/hanliu/pkg/ALLCools_pycharm/docs/allcools/clus...