ALLCools.mcds.utilities

Module Contents

log
calculate_posterior_mc_frac(mc_da, cov_da, var_dim=None, normalize_per_cell=True, clip_norm_value=10)
calculate_posterior_mc_frac_lazy(mc_da, cov_da, var_dim, output_prefix, cell_chunk=20000, dask_cell_chunk=500, normalize_per_cell=True, clip_norm_value=10)

Run calculate_posterior_mc_frac on dask arrays and save the result directly to disk. This is highly memory efficient; use it for datasets larger than machine memory.

Parameters
  • mc_da
  • cov_da
  • var_dim
  • output_prefix
  • cell_chunk
  • dask_cell_chunk
  • normalize_per_cell
  • clip_norm_value
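
The page does not describe how the posterior fraction is computed, so the following is only a minimal sketch of one common approach, not the ALLCools implementation. It assumes a per-cell Beta prior fitted by the method of moments, a beta-binomial posterior mean per feature, optional division by the prior mean, and clipping at clip_norm_value. The helper names posterior_frac and iter_cell_chunks are hypothetical, and plain Python lists stand in for the dask-backed arrays. Under this assumption each cell depends only on its own counts, which is why cells can be processed a chunk at a time.

```python
from statistics import fmean, pvariance


def posterior_frac(mc_row, cov_row, normalize=True, clip=10.0):
    """Shrink one cell's raw mC fractions toward that cell's own mean.

    Hypothetical helper: assumes a Beta(a, b) prior per cell.
    """
    # Raw fractions, using only features that have coverage.
    raw = [m / c for m, c in zip(mc_row, cov_row) if c > 0]
    mean, var = fmean(raw), pvariance(raw)
    # Method-of-moments estimate of Beta(a, b) from the raw fractions.
    common = mean * (1 - mean) / var - 1
    a, b = mean * common, (1 - mean) * common
    # Posterior mean of the beta-binomial model for each feature.
    post = [(m + a) / (c + a + b) for m, c in zip(mc_row, cov_row)]
    if normalize:
        # Divide by the prior mean, then cap extreme values.
        prior_mean = a / (a + b)
        post = [min(p / prior_mean, clip) for p in post]
    return post


def iter_cell_chunks(mc, cov, cell_chunk):
    """Yield results for `cell_chunk` cells at a time (rows are cells)."""
    for start in range(0, len(mc), cell_chunk):
        stop = start + cell_chunk
        yield [posterior_frac(m, c) for m, c in zip(mc[start:stop], cov[start:stop])]


# Toy data: 2 cells x 4 features; the last feature of cell 2 has no coverage.
mc = [[2, 5, 1, 8], [0, 3, 3, 0]]
cov = [[10, 10, 10, 10], [5, 6, 4, 0]]

chunks = list(iter_cell_chunks(mc, cov, cell_chunk=1))
flat = [row for chunk in chunks for row in chunk]
```

In this sketch a feature with zero coverage falls back to the cell's prior mean, so its normalized value is 1. Chunking does not change the result, because no statistic is shared across cells.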

calculate_gch_rate(mcds, var_dim='chrom100k')
get_mean_dispersion(x, obs_dim)
highly_variable_methylation_feature(cell_by_feature_matrix, feature_mean_cov, obs_dim=None, var_dim=None, min_disp=0.5, max_disp=None, min_mean=0, max_mean=5, n_top_feature=None, bin_min_features=5, mean_binsize=0.05, cov_binsize=100)

Adapted from Scanpy. The main difference is that this function normalizes dispersion based on both mean bins and coverage (cov) bins.
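
To illustrate what normalizing by both mean and cov bins can look like, here is a minimal sketch that groups features by a (mean bin, cov bin) pair and z-scores the dispersion within each group. It is an assumption-laden illustration rather than the library's code: the function name normalized_dispersion is hypothetical, the dispersion values are taken as given, and only the mean_binsize and cov_binsize names are borrowed from the signature above. How the real function treats sparse bins (see bin_min_features) is not stated on this page and is not modeled here.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def normalized_dispersion(means, dispersions, covs,
                          mean_binsize=0.05, cov_binsize=100):
    """Z-score each feature's dispersion within its (mean, cov) bin."""
    # Group feature indices by their two-dimensional bin.
    groups = defaultdict(list)
    for i, (m, c) in enumerate(zip(means, covs)):
        key = (int(m // mean_binsize), int(c // cov_binsize))
        groups[key].append(i)

    out = [0.0] * len(means)
    for members in groups.values():
        values = [dispersions[i] for i in members]
        center, spread = fmean(values), pstdev(values)
        for i in members:
            # A bin with no spread (e.g. a single feature) is set to 0.
            out[i] = (dispersions[i] - center) / spread if spread > 0 else 0.0
    return out


# Toy data: features 0-2 share a bin; features 3 and 4 sit alone.
means = [0.41, 0.42, 0.44, 0.43, 0.81]
covs = [120, 150, 180, 450, 130]
disps = [1.0, 2.0, 3.0, 5.0, 4.0]

norm = normalized_dispersion(means, disps, covs)
```

Feature 3 has nearly the same mean as features 0 to 2 but much higher coverage, so it lands in a different bin and is not compared against them. Binning on mean alone would have placed all four together.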

determine_engine(dataset_paths)
obj_to_str(ds, coord_dtypes=None)
write_ordered_chunks(chunks_to_write, final_path, append_dim, engine='zarr', coord_dtypes=None, dtype=None)
convert_to_zarr(paths)

Convert xarray.Dataset objects stored in other backends to the zarr backend.

update_dataset_config(output_dir, add_ds_region_dim=None, change_region_dim=None, config=None, add_ds_sample_dim=None)