speciateIT: 16S rRNA gene Classifier
speciateIT: a fast, clustering-free 16S ribosomal RNA gene sequence taxonomic assignment tool
Clustering of sequences into Operational Taxonomic Units (OTUs) has become a mainstream approach to facilitate taxonomic classification of large numbers of 16S rRNA gene sequences. This is partly due to the high computational cost of processing every sequence in increasingly large datasets. A primary focus of the field has been the development and improvement of OTU-based sequence clustering methods that rely on distances between each pair of sequences in a dataset. Following OTU-based clustering, representative sequences are commonly classified using tools such as the RDP Naïve Bayesian Classifier (Wang et al., 2007), and the resulting classification is transitively assigned to all sequences in that OTU. However, problems with this strategy exist (Nguyen et al., 2016). We have developed speciateIT, a novel per-sequence taxonomic assigner that quickly and accurately classifies millions of 16S rRNA gene sequences using higher-order Markov chain models built from a user-specified set of reference sequences, and therefore does not require OTU clustering.
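To illustrate the general idea of per-sequence classification with higher-order Markov chain models, here is a minimal Python sketch. It is not speciateIT's implementation: the model order, the pseudocount smoothing, the uniform fallback for unseen contexts, the function names, and the toy reference sequences are all assumptions made for illustration. One model is trained per taxon from its reference sequences, and each query is assigned to the taxon whose model gives it the highest log-likelihood, with no clustering step.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov_model(sequences, order=3, pseudocount=1.0):
    """Estimate log P(base | previous `order` bases) from reference sequences."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            # Skip positions that involve ambiguous bases such as N.
            if base in ALPHABET and all(c in ALPHABET for c in context):
                counts[context][base] += 1.0
    model = {}
    for context, base_counts in counts.items():
        total = sum(base_counts.values()) + pseudocount * len(ALPHABET)
        model[context] = {
            b: math.log((base_counts.get(b, 0.0) + pseudocount) / total)
            for b in ALPHABET
        }
    return model


def log_likelihood(seq, model, order=3):
    """Sum the log transition probabilities of `seq` under one taxon's model."""
    uniform = math.log(1.0 / len(ALPHABET))  # fallback for unseen contexts
    score = 0.0
    for i in range(order, len(seq)):
        context, base = seq[i - order:i], seq[i]
        score += model.get(context, {}).get(base, uniform)
    return score


def classify(seq, models, order=3):
    """Return the taxon whose model assigns `seq` the highest log-likelihood."""
    return max(models, key=lambda taxon: log_likelihood(seq, models[taxon], order))


# Toy reference set (hypothetical taxa and sequences, for illustration only).
references = {
    "taxon_a": ["ACGTACGTACGTACGT", "ACGTACGTACGAACGT"],
    "taxon_b": ["GGCCGGCCGGCCGGCC", "GGCCGGCCGGCTGGCC"],
}
models = {taxon: train_markov_model(seqs) for taxon, seqs in references.items()}

for query in ["ACGTACGTACGT", "GGCCGGCCGGCC"]:
    print(query, classify(query, models))
```

Because each query is scored independently against the per-taxon models, the work grows linearly with the number of sequences, in contrast to clustering methods that compare every pair of sequences.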
May 2018: Stay tuned for the availability of speciateIT. We are updating our repository.
The development of speciateIT is supported by the National Institute of Allergy and Infectious Diseases and the National Institute of General Medical Sciences of the National Institutes of Health under award numbers U19AI084044, R01AI116799 and R01GM103604.