Plasmids are known to contain genes encoding virulence factors and antibiotic resistance mechanisms. Their relevance in metagenomic data processing is steadily growing. However, with the increasing popularity and scale of metagenomics experiments, the number of reported plasmids is rapidly growing as well, amassing a considerable number of false positives due to undetected misassemblies. Here, our previously published database PLSDB provides a reliable resource for researchers to quickly compare their sequences against selected and annotated previous findings. Within two years, the size of this resource has more than doubled, from the initial 13,789 to 34,513 entries over the course of eight regular data updates. For this update, we aggregated community feedback into major changes to the database, featuring new analysis functionality as well as performance, quality, and accessibility improvements. New filtering steps, annotations, and preprocessing of existing records improve the quality of the provided data. Additionally, new features implemented in the web server ease user interaction and allow for a deeper understanding of custom uploaded sequences by visualizing similarity information. Lastly, an application programming interface was implemented along with a Python library to allow remote database queries in automated workflows. The latest release of PLSDB is freely accessible under.

Plasmids are extrachromosomal DNA sequences that are short in comparison to chromosomes and frequently found in circular form within prokaryotes. They can harbor a wide range of genes, such as those encoding antibiotic resistance and virulence factors (1, 2). Due to the appearance of such clinically relevant phenotypes, the analysis of plasmid sequences is widely acknowledged and often performed in the context of microbiome sequencing studies (3, 4). On the one hand, associative connections between clinical conditions and plasmids may allow untangling specific disease and treatment patterns. On the other hand, plasmid research also plays a significant role on a population level (5). Through several mechanisms, e.g., horizontal gene transfer via conjugation, antibiotic resistance may spread, calling for a readjustment of focus in pharmaceutical research toward new, innovative antibiotics (6). However, monitoring global distributions of plasmids within populations requires a general-purpose database that provides easy access to previously reported plasmids. Here, PLSDB (7) has supported researchers with an easy-to-use web interface since 2018.

The original PLSDB was created to complement NCBI's plasmid collection on RefSeq, which is partially incomplete, inconsistent, lacking in functionality, and contains several chromosomal sequences. PLSDB gathers data from NCBI and INSDC based on the query formulated by Orlek A et al. (8) and adds further filtering and annotation steps. The filtering focuses on deduplication, Mash distances (9), and identification of putative chromosomal sequences using 53 rps genes from PubMLST (10). The additional annotations consist of resistance and virulence factors from ARG-ANNOT (11), CARD (12), ResFinder (13) and VFDB (14). Apart from the dataset, PLSDB also provides a web interface to present the data in a simple but powerful manner. A core function of PLSDB is to allow users to upload their own sequences and compare them to the database contents, choosing from established search methods such as Mash (9) or blastn (15).

With whole metagenome shotgun sequencing rising in popularity and slowly superseding 16S rRNA sequencing, more plasmids are being discovered. Furthermore, dedicated algorithms for plasmid extraction from short-read sequencing are gaining attention, allowing for more efficient automated analysis of sequencing data (16–18). Similarly, new databases are emerging that try to manage the resulting flood of plasmid data. A new plasmid collection by Brooks et al., for example, tries to bundle NCBI plasmid information in a collection (19). Another recent database, mMGE (20), has the advantage of unifying phage and plasmid information in a single catalog. However, its database creation workflow is solely focused on the human microbiome.
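To make the Mash distances mentioned in the text more concrete, the following minimal Python sketch estimates the Jaccard index of two sequences from bottom-s MinHash sketches of their k-mers and converts it to a Mash distance, D = -ln(2j / (1 + j)) / k. It is an illustration only, not the PLSDB pipeline or the Mash tool itself: the function names are hypothetical, BLAKE2b stands in for the hash function used by Mash, and reverse-complement (canonical) k-mers are ignored for brevity.

```python
import hashlib
import math


def kmer_hashes(seq, k):
    """Return the set of 64-bit hashes of all k-mers in seq (forward strand only)."""
    seq = seq.upper()
    return {
        int.from_bytes(
            hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest(), "big"
        )
        for i in range(len(seq) - k + 1)
    }


def bottom_sketch(seq, k=21, size=1000):
    """Bottom-s MinHash sketch: the `size` smallest k-mer hashes, sorted."""
    return sorted(kmer_hashes(seq, k))[:size]


def jaccard_estimate(sketch_a, sketch_b, size=1000):
    """Estimate the Jaccard index from two bottom-s sketches."""
    set_a, set_b = set(sketch_a), set(sketch_b)
    # Bottom-s sketch of the union, then count hashes present in both inputs.
    union = sorted(set_a | set_b)[:size]
    if not union:
        return 0.0
    shared = sum(1 for h in union if h in set_a and h in set_b)
    return shared / len(union)


def mash_distance(jaccard, k):
    """Convert a Jaccard estimate into a Mash distance, clamped to [0, 1]."""
    if jaccard <= 0.0:
        return 1.0
    d = -math.log(2.0 * jaccard / (1.0 + jaccard)) / k
    return min(1.0, max(0.0, d))


def sequence_distance(seq_a, seq_b, k=21, size=1000):
    """Convenience wrapper: sketch both sequences and return their Mash distance."""
    a = bottom_sketch(seq_a, k, size)
    b = bottom_sketch(seq_b, k, size)
    return mash_distance(jaccard_estimate(a, b, size), k)
```

Identical sequences yield a distance of 0, and sequences sharing no k-mers yield 1. Using such a distance for deduplication or similarity search would involve choosing a cutoff, which is not specified here.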