Toward Open Science in Natural Products Research: Curation and Dissemination of Natural Products Data in Wikidata


As contemporary bioinformatic and chemoinformatic capabilities are reshaping natural products research, major benefits could result from an open database of structure-organism pairs. Those pairs allow the identification of distinct molecular structures found as components of heterogeneous chemical matrices originating from living organisms. Current databases with such information suffer from paywall restrictions, limited taxonomic scope, poorly standardized fields, and lack of interoperability. To ensure data quality, a reference to the work that describes each structure-organism relationship is mandatory. To fill this void, we collected and curated a set of structure-organism pairs from publicly available natural products databases to yield LOTUS (naturaL prOducTs occUrrences databaSe), which contains over 500,000 curated and referenced structure-organism pairs. We make all scripts used for data collection, curation, and dissemination available. To provide unrestricted access as well as standardized linkage to data from other resources, the extracted information is hosted on Wikidata. Disseminating these referenced structure-organism pairs through the Wikidata framework addresses many of the limitations of currently available databases and facilitates linkage to existing biological and chemical data resources. Collectively, this resource represents an important advancement in the design and deployment of a comprehensive and collaborative natural products knowledge base.
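
As an illustration of how referenced structure-organism pairs hosted on Wikidata might be retrieved, the following minimal sketch builds a SPARQL query for a single taxon. It is not part of the LOTUS scripts. The function name `build_pairs_query` is hypothetical, and the Wikidata property identifiers (P703 "found in taxon", P235 "InChIKey", P225 "taxon name", P248 "stated in") are assumptions about how the pairs are modeled, not details given in the text above. The sketch only constructs the query string and does not contact any server.

```python
# Assumed Wikidata property identifiers (not stated in the abstract above).
FOUND_IN_TAXON = "P703"  # "found in taxon": links a structure to an organism
INCHIKEY = "P235"        # "InChIKey": identifier of the chemical structure
TAXON_NAME = "P225"      # "taxon name": scientific name of the organism
STATED_IN = "P248"       # "stated in": the work supporting the statement


def build_pairs_query(taxon_name: str, limit: int = 10) -> str:
    """Return a SPARQL query for referenced structure-organism pairs of one taxon.

    Hypothetical helper for illustration only.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    # Escape backslashes and double quotes so the name is a valid string literal.
    safe_name = taxon_name.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        "SELECT ?structure ?inchikey ?taxon ?reference WHERE {",
        f'  ?taxon wdt:{TAXON_NAME} "{safe_name}" .',
        f"  ?structure wdt:{INCHIKEY} ?inchikey ;",
        f"             p:{FOUND_IN_TAXON} ?statement .",
        f"  ?statement ps:{FOUND_IN_TAXON} ?taxon ;",
        # Only statements carrying a reference are returned, mirroring the
        # requirement that every pair be referenced.
        f"             prov:wasDerivedFrom/pr:{STATED_IN} ?reference .",
        "}",
        f"LIMIT {limit}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_pairs_query("Arabidopsis thaliana", limit=5))
```

The resulting string relies on the prefixes (`wdt:`, `p:`, `ps:`, `pr:`, `prov:`) that the public Wikidata Query Service predefines; submitting it elsewhere would require declaring them explicitly.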

In eLife
