Data Questions
Last updated
Skeletal formulae are the usual way to depict organic molecules in publications, but some of these drawings are not machine-readable. The most common problem is stereochemistry that a human reader can interpret but that software cannot recognize correctly. To represent stereochemistry accurately in the database, we have processed the data differently depending on the situation.
Wavy lines represent either unknown stereochemistry (R/S or E/Z) or a mixture of the two possible stereoisomers at that position. To better reflect the chemical space of marine natural products and to reduce redundancy in the database, we decide whether a structure drawn with wavy lines is treated as one entry or as multiple entries according to the following rules:
Chair conformations, Haworth projections and Fischer projections are converted to wedge-dash diagrams, and stereochemistry is described with wedged bonds instead of axial and equatorial bonds.
If the authors indicate that a compound exhibits tautomerism, each tautomer is treated as a separate entry.
The 3D conformer is a three-dimensional representation of the compound. It is not an experimentally determined structure but a low-energy conformer computed by OpenEye OMEGA. Because many complex MNPs contain numerous undefined stereocenters and/or rotatable bonds, it is not meaningful to compute 3D conformers for every record. Therefore, following the criteria used for PubChem3D conformer models, CMNPD provides a 3D conformer for each compound record that satisfies all of the following conditions:
Not too large (with no more than 50 heavy atoms)
Not too flexible (with no more than 15 rotatable bonds)
Has only a single covalent unit (for salts, mixtures and polymers, only the largest fragment is kept in the calculation)
Consists of only supported elements (H, C, N, O, F, Si, P, S, Cl, Br, and I)
Contains only atom types recognized by the MMFF94s force field
Has fewer than six undefined atom or bond stereocenters
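The conditions above can be read as a simple eligibility filter. The sketch below is illustrative only: it assumes the per-fragment descriptors (heavy-atom count, rotatable bonds, elements, whether MMFF94s could type every atom, number of undefined stereocenters) have already been computed with a cheminformatics toolkit, and the `Fragment` class and `conformer_failures` function are hypothetical names, not part of CMNPD, PubChem or OMEGA.

```python
from __future__ import annotations

from dataclasses import dataclass

# Elements allowed by the criteria listed above.
SUPPORTED_ELEMENTS = frozenset({"H", "C", "N", "O", "F", "Si", "P", "S", "Cl", "Br", "I"})


@dataclass(frozen=True)
class Fragment:
    """Precomputed descriptors for one covalent unit (hypothetical record layout)."""

    heavy_atoms: int
    rotatable_bonds: int
    elements: frozenset[str]
    mmff94s_typed: bool  # True if every atom received an MMFF94s atom type
    undefined_stereo: int  # undefined atom + bond stereocenters


def conformer_failures(fragments: list[Fragment]) -> list[str]:
    """List the criteria a record fails; an empty list means it qualifies for a 3D conformer."""
    # Salts, mixtures and polymers: only the largest fragment is evaluated,
    # which also satisfies the "single covalent unit" condition.
    frag = max(fragments, key=lambda f: f.heavy_atoms)
    failures = []
    if frag.heavy_atoms > 50:
        failures.append("too large")
    if frag.rotatable_bonds > 15:
        failures.append("too flexible")
    if not frag.elements <= SUPPORTED_ELEMENTS:
        failures.append("unsupported element")
    if not frag.mmff94s_typed:
        failures.append("atom type not recognized by MMFF94s")
    if frag.undefined_stereo >= 6:
        failures.append("too many undefined stereocenters")
    return failures


# A sodium salt: the counterion is dropped and only the organic fragment is checked.
salt = [
    Fragment(32, 6, frozenset({"C", "H", "N", "O"}), True, 1),
    Fragment(1, 0, frozenset({"Na"}), False, 0),
]
print(conformer_failures(salt))  # []
```

Returning the list of failed criteria, rather than a single yes/no, makes it easy to see why a given record has no 3D conformer.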
All compounds with the same name are marked with "Homonym⚠️" in the Name and Classification section.
Use the data for these compounds with caution, because many factors can give rise to homonyms:
Some entries are tautomers
Some entries are homologues
Some entries are analogs
Some entries are enantiomers or diastereomers
Different authors use the same name in their articles to represent completely different substances
Some authors depict stereochemistry with axial and equatorial bonds of unclear orientation instead of wedged bonds, which makes the configuration ambiguous
As structure determination techniques have advanced, the structures of some compounds have been revised or their absolute configurations have been established
Note that CMNPD presents the chemical structures of marine natural products objectively, as depicted in the published literature. We do not guarantee the accuracy of the structures described in the original articles, nor do we indicate which of several reported structures is more reliable.
Structures that are considered inappropriate or that lack an absolute configuration are not treated as erroneous entries in CMNPD unless the reference is retracted by the publisher. However, a "†" symbol is added after the compound name to indicate that the structure has been revised, or its stereochemistry defined more precisely, in another paper. This feature allows you to track the stereochemical history of related compounds.
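Marking homonyms, as described earlier in this section, amounts to grouping records by name. The sketch below assumes each record is a dictionary with hypothetical `id` and `name` keys and that names are compared case-insensitively after collapsing whitespace; CMNPD's actual matching rule is not documented here. A flag only signals a name collision: deciding which of the causes listed above applies still requires checking the structures and references.

```python
from __future__ import annotations

from collections import defaultdict


def normalize_name(name: str) -> str:
    """Assumed matching rule: ignore case and collapse runs of whitespace."""
    return " ".join(name.split()).casefold()


def flag_homonyms(records: list[dict]) -> dict[str, bool]:
    """Map each record ID to True if at least one other record shares its normalized name."""
    ids_by_name = defaultdict(list)
    for rec in records:
        ids_by_name[normalize_name(rec["name"])].append(rec["id"])
    return {rid: len(ids) > 1 for ids in ids_by_name.values() for rid in ids}


# Hypothetical records; the names are placeholders, not real compounds.
records = [
    {"id": "rec-1", "name": "Compound X"},
    {"id": "rec-2", "name": "compound  x"},
    {"id": "rec-3", "name": "Compound Y"},
]
print(flag_homonyms(records))  # {'rec-1': True, 'rec-2': True, 'rec-3': False}
```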
Brief descriptions of the bioactivities are mainly extracted from the marine natural products reviews published in Natural Product Reports. In recent years, the review authors have adopted many abbreviations to shorten the bioactivity data. Some useful abbreviations and their full names are listed in the comparison table below.
Standardized experimental data are incorporated from ChEMBL. ChEMBL describes its assay types as follows:
Binding (B) - Data measuring binding of compound to a molecular target, e.g. Ki, IC50, Kd.
Functional (F) - Data measuring the biological effect of a compound, e.g. %cell death in a cell line, rat weight.
ADME (A) - ADME data e.g. t1/2, oral bioavailability.
Toxicity (T) - Data measuring toxicity of a compound, e.g., cytotoxicity.
Physicochemical (P) - Assays measuring physicochemical properties of the compounds in the absence of biological material e.g., chemical stability, solubility.
Unclassified (U) - A small proportion of assays cannot be classified into one of the above categories e.g., ratio of binding vs efficacy.
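Each assay type is identified by the single-letter code shown in parentheses. The sketch below maps those codes to the labels above and counts activity records per category. It assumes each record is a simplified dictionary with an `assay_type` key holding the code (modelled loosely on ChEMBL activity data, not its full schema); the function name is hypothetical.

```python
from __future__ import annotations

from collections import Counter

# ChEMBL single-letter assay type codes, as described above.
ASSAY_TYPES = {
    "B": "Binding",
    "F": "Functional",
    "A": "ADME",
    "T": "Toxicity",
    "P": "Physicochemical",
    "U": "Unclassified",
}


def summarize_assay_types(activities: list[dict]) -> Counter[str]:
    """Count activity records per assay type label."""
    labels = []
    for act in activities:
        code = act.get("assay_type")
        # Keep unrecognized codes visible instead of silently dropping them.
        labels.append(ASSAY_TYPES.get(code, f"unknown ({code})"))
    return Counter(labels)


activities = [
    {"assay_type": "B", "standard_type": "IC50"},
    {"assay_type": "F", "standard_type": "Inhibition"},
    {"assay_type": "B", "standard_type": "Ki"},
]
print(summarize_assay_types(activities))  # Counter({'Binding': 2, 'Functional': 1})
```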
Targets are classified according to the ChEMBL target categories. This slide shows how ChEMBL defines and classifies targets:
Comparison table of bioactivity abbreviations

Abbreviation    Full name
AB              antibacterial
AF              antifungal
AI              anti-inflammatory
AM              antimicrobial
AM/AB           antimicrobial/antibacterial
AO              antioxidant
AV              antiviral
HTCL            human tumour cell line
IA              inactive
MDR             multidrug resistant
MIC             minimum inhibitory concentration
MO              microorganism
MOA             mechanism of action
NO              nitric oxide
NT              not tested
Norm.           normal
SAR             structure-activity relationship(s)
TRP             transient receptor potential
activ.          activity
anal.           analysis
antifoul.       antifouling
bact.           bacteria
calc.           calculation
compar.         comparison
connect.        connectivity
cytotox.        cytotoxicity/cytotoxic
degrad.         degradation
deriv.          derivative
determ.         determined
diffrac.        diffraction
estab.          established
expt.           experimental
hum.            human
immunomod.      immunomodulatory
inhib.          inhibitor/inhibition/inhibitory
insep.          inseparable
isol.           isolated
microb.         microbial/microbe
mixt.           mixture
mod.            moderate
prod.           production
prop.           proposed
recept.         receptor
spec. rot.      specific rotation
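When reading the brief bioactivity descriptions, the abbreviation table works as a lookup. A minimal sketch, assuming only a subset of the table is loaded and that abbreviations appear as tokens delimited by whitespace or punctuation: it lists each abbreviation found in a description together with its full name. The helper names and the example description are invented for illustration.

```python
from __future__ import annotations

import re

# Subset of the abbreviation table; extend with the remaining rows as needed.
ABBREVIATIONS = {
    "AB": "antibacterial",
    "AM": "antimicrobial",
    "AM/AB": "antimicrobial/antibacterial",
    "HTCL": "human tumour cell line",
    "MIC": "minimum inhibitory concentration",
    "cytotox.": "cytotoxicity/cytotoxic",
    "inhib.": "inhibitor/inhibition/inhibitory",
    "mod.": "moderate",
}

# Try longer keys first so "AM/AB" is matched as one token rather than as "AM".
_PATTERN = re.compile(
    r"(?<!\w)(?:"
    + "|".join(re.escape(k) for k in sorted(ABBREVIATIONS, key=len, reverse=True))
    + r")(?!\w)"
)


def find_abbreviations(text: str) -> list[tuple[str, str]]:
    """Return (abbreviation, full name) pairs in order of appearance."""
    return [(m.group(0), ABBREVIATIONS[m.group(0)]) for m in _PATTERN.finditer(text)]


for abbr, full in find_abbreviations("mod. cytotox. (HTCL); AM/AB, MIC 4 ug/mL"):
    print(f"{abbr:10} {full}")
```

The sketch annotates abbreviations instead of substituting them in place because several entries, such as "inhib.", stand for more than one full form, so a direct replacement would have to guess the intended word.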