PCR.PTML-Project Supporting Information

  1. G. Diaz, Humbert

Publisher: figshare

Publication year: 2024

Type: Dataset

CC BY 4.0

Abstract

The characterization of DNA sequence variants (variant calling) from ancient samples (aDNA) is a topic of major interest in modern science. However, important theoretical and experimental problems remain to be overcome, such as the low number of reliable samples and the possibility of contamination, among others. In particular, the main concern during the extraction of aDNA is to prevent contamination with modern materials. Consequently, it is crucial to be able to differentiate modern from ancient sequences in aDNA variant calling. In this context, computational techniques may play an important role. Most reported methods use alignment-dependent algorithms that rely on limited samples. More recently, Artificial Intelligence / Machine Learning (AI/ML) methods have been proposed as an alternative. However, almost all of them focus on the analysis of mammoth, Neanderthal, or other demographic aDNA problems and omit the study of aDNA paleo-microbiomes. In this work, we report for the first time the PCR.PTML methodology, which combines experimental Polymerase Chain Reaction (PCR) metagenomic analysis of aDNA sequences with Perturbation Theory (PT) and Machine Learning (ML) predictive modeling. First, we report the extraction and PCR analysis of a new set of putative microbiome 16S aDNA sequences from Miocene fossil amber. Next, we developed a variant calling PTML model able to discriminate 16S aDNA sequences of Miocene bacteria from modern bacterial sequences with more than 80% specificity and sensitivity in the training and validation series. We used both Linear Discriminant Analysis (LDA) and Artificial Neural Network (ANN) algorithms to explore 100,000 combinations of query and reference sequences in the search for a model.
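
The specificity and sensitivity figures quoted in the abstract can be illustrated with a minimal sketch. It assumes a binary labelling in which 1 marks an ancient (Miocene) sequence and 0 a modern one. The function name `sensitivity_specificity` and the toy labels are hypothetical, invented purely for illustration. They are not taken from the dataset and do not reproduce the authors' models.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for two equal-length binary label lists.

    Sensitivity = TP / (TP + FN): fraction of positive cases correctly recovered.
    Specificity = TN / (TN + FP): fraction of negative cases correctly recovered.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Toy labels (assumption: 1 = ancient, 0 = modern); NOT data from this record.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy case four of five ancient and four of five modern sequences are classified correctly, so both metrics equal 0.80. The same calculation would be applied separately to the training and validation series.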