- Benchmarks for classification of genomic sequences
https://github.com/ML-Bioinfo-CEITEC/genomic_benchmarks
Accessing:
1. list_datasets
from genomic_benchmarks.data_check import list_datasets
list_datasets()
2. PyTorch
from genomic_benchmarks.dataset_getters.pytorch_datasets import HumanNontataPromoters
dset = HumanNontataPromoters(split='train', version=0)
dset[0]
output:
('CAATCTCACAGGCTCCTGGTTGTCTACCCATGGACCCAGAGGTTCTTTGACAGCTTTGGCAACCTGTCCTCTGCCTCTGCCATCATGGGCAACCCCAAAGTCAAGGCACATGGCAAGAAGGTGCTGACTTCCTTGGGAGATGCCATAAAGCACCTGGATGATCTCAAGGGCACCTTTGCCCAGCTGAGTGAACTGCACTGTGACAAGCTGCATGTGGATCCTGAGAACTTCAAGGTGAGTCCAGGAGATGT', 0)
3. Hugging Face Hub
https://github.com/ML-Bioinfo-CEITEC/genomic_benchmarks/blob/main/notebooks/How_To_Use_Datasets_From_HF.ipynb
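Each sample from the PyTorch dataset above is a (sequence, label) tuple holding a raw nucleotide string, so it usually needs a numeric encoding before it can be fed to a model. The following is a minimal sketch of one common option, one-hot encoding. The helper `one_hot_encode` and the `NUCLEOTIDES` ordering are hypothetical names chosen for illustration and are not part of genomic_benchmarks; the sample is a shortened stand-in for `dset[0]`, so the code runs without downloading any data.

```python
# Hypothetical helper for illustration; not part of genomic_benchmarks.
NUCLEOTIDES = "ACGT"  # assumed column order: A, C, G, T


def one_hot_encode(sequence):
    """Return one 4-element row per base; unknown bases (e.g. N) become all zeros."""
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    rows = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


# Shortened stand-in for a (sequence, label) sample such as dset[0].
sample = ("CAATCTCACA", 0)
sequence, label = sample
encoded = one_hot_encode(sequence)
print(len(encoded), encoded[0], label)  # prints: 10 [0, 1, 0, 0] 0
```

Mapping unknown bases to an all-zero row is only one possible convention; whether it suits a given model is a design choice.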
Resources for training on the genomic benchmark datasets: