{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Annotate RegionDS\n", "\n", "After obtaining a {{ RegionDS }} from DMR calling or from any other set of genomic regions, we can annotate the regions with other epigenomic profiles or genomic features stored in BigWig or BED format.\n", "\n", "For example, in this section we will annotate the DMR RegionDS with chromatin accessibility profiles and some general genomic features." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2022-01-09T23:39:53.936192Z", "start_time": "2022-01-09T23:39:52.372477Z" } }, "outputs": [], "source": [ "import pandas as pd\n", "import pathlib\n", "from ALLCools.mcds import RegionDS" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Open RegionDS" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2022-01-09T23:39:54.082631Z", "start_time": "2022-01-09T23:39:53.938643Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using dmr as region_dim\n" ] }, { "data": { "text/html": [ "
<xarray.RegionDS>\n", "Dimensions: (count_type: 2, dmr: 132, sample: 20)\n", "Coordinates:\n", " * count_type (count_type) <U3 'mc' 'cov'\n", " * dmr (dmr) <U9 'chr1-0' 'chr1-1' ... 'chr19-122' 'chr19-123'\n", " dmr_chrom (dmr) <U5 'chr1' 'chr1' 'chr1' ... 'chr19' 'chr19' 'chr19'\n", " dmr_end (dmr) int64 10002172 10003542 10003967 ... 5099203 5099952\n", " dmr_length (dmr) int64 2 305 54 2 2 2 2 ... 589 924 632 842 195 399 335\n", " dmr_ndms (dmr) int64 1 7 2 1 1 1 1 13 3 2 1 ... 2 1 2 7 13 19 9 9 3 6 13\n", " dmr_start (dmr) int64 10002170 10003237 10003913 ... 5098804 5099617\n", " * sample (sample) <U18 'snm3C_ASC' 'snm3C_CA1' ... 'snmC_ODC' 'snmC_OPC'\n", "Data variables:\n", " dmr_da (sample, dmr, count_type) uint32 4294967295 ... 4294967295\n", " dmr_da_frac (sample, dmr) float32 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0\n", " dmr_state (sample, dmr) int8 -1 0 0 1 -1 -1 -1 0 0 ... 0 0 0 0 0 0 0 0 0\n", "Attributes:\n", " region_dim: dmr\n", " region_ds_location: /home/hanliu/pkg/ALLCools_pycharm/docs/allcools/clus...\n", " chrom_size_path: /home/hanliu/pkg/ALLCools_pycharm/docs/allcools/clus...
<xarray.DataArray 'dmr_snATAC_da' (dmr: 132, snATAC: 10)>\n", "dask.array<open_dataset-03cdf5fb902771957a3508f7758e6aeedmr_snATAC_da, shape=(132, 10), dtype=float32, chunksize=(132, 1), chunktype=numpy.ndarray>\n", "Coordinates:\n", " * dmr (dmr) <U9 'chr1-0' 'chr1-1' 'chr1-2' ... 'chr19-122' 'chr19-123'\n", " dmr_chrom (dmr) <U5 'chr1' 'chr1' 'chr1' ... 'chr19' 'chr19' 'chr19'\n", " dmr_end (dmr) int64 10002172 10003542 10003967 ... 5099203 5099952\n", " dmr_length (dmr) int64 2 305 54 2 2 2 2 440 ... 589 924 632 842 195 399 335\n", " dmr_ndms (dmr) int64 1 7 2 1 1 1 1 13 3 2 1 ... 2 1 2 7 13 19 9 9 3 6 13\n", " dmr_start (dmr) int64 10002170 10003237 10003913 ... 5098804 5099617\n", " * snATAC (snATAC) <U4 'CA23' 'CGE' 'ASC' 'MGE' ... 'NonN' 'OPC' 'DG'
<xarray.DataArray 'dmr_genome-features_da' (dmr: 132, genome-features: 25)>\n", "dask.array<open_dataset-50fa93a20cd098fbd6ef8798079cfba4dmr_genome-features_da, shape=(132, 25), dtype=bool, chunksize=(132, 1), chunktype=numpy.ndarray>\n", "Coordinates:\n", " * dmr (dmr) <U9 'chr1-0' 'chr1-1' ... 'chr19-122' 'chr19-123'\n", " dmr_chrom (dmr) <U5 'chr1' 'chr1' 'chr1' ... 'chr19' 'chr19' 'chr19'\n", " dmr_end (dmr) int64 10002172 10003542 10003967 ... 5099203 5099952\n", " dmr_length (dmr) int64 2 305 54 2 2 2 2 ... 924 632 842 195 399 335\n", " dmr_ndms (dmr) int64 1 7 2 1 1 1 1 13 3 2 ... 1 2 7 13 19 9 9 3 6 13\n", " dmr_start (dmr) int64 10002170 10003237 10003913 ... 5098804 5099617\n", " * genome-features (genome-features) <U25 'CGI' ... 'transcript.protein_cod...
<xarray.RegionDS>\n", "Dimensions: (count_type: 2, dmr: 2, sample: 20, snATAC: 10)\n", "Coordinates:\n", " * count_type (count_type) <U3 'mc' 'cov'\n", " * dmr (dmr) <U9 'chr1-0' 'chr1-1'\n", " dmr_chrom (dmr) <U5 'chr1' 'chr1'\n", " dmr_end (dmr) int64 10002172 10003542\n", " dmr_length (dmr) int64 2 305\n", " dmr_ndms (dmr) int64 1 7\n", " dmr_start (dmr) int64 10002170 10003237\n", " * sample (sample) <U18 'snm3C_ASC' 'snm3C_CA1' ... 'snmC_OPC'\n", " * snATAC (snATAC) <U4 'CA23' 'CGE' 'ASC' 'MGE' ... 'NonN' 'OPC' 'DG'\n", "Data variables:\n", " dmr_da (sample, dmr, count_type) uint32 4294967295 ... 4294967295\n", " dmr_da_frac (sample, dmr) float32 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0\n", " dmr_state (sample, dmr) int8 -1 0 0 1 0 1 -1 0 -1 ... 0 1 0 0 1 0 0 0\n", " dmr_snATAC_da (dmr, snATAC) float32 0.09983 0.02372 ... 0.02372 0.03385\n", "Attributes:\n", " region_dim: dmr\n", " region_ds_location: /home/hanliu/pkg/ALLCools_pycharm/docs/allcools/clus...\n", " chrom_size_path: /home/hanliu/pkg/ALLCools_pycharm/docs/allcools/clus..." ], "text/plain": [ "