Data Questions

How do you process the chemical structures?

Skeletal formulae are the usual way to depict organic molecules in publications, but some of these drawings are not machine-friendly. The most common problem is stereochemistry that a human reader interprets easily but that software cannot recognize correctly. To present stereochemistry accurately in the database, we have processed the structures differently depending on the situation.

Wavy lines

Wavy lines represent either unknown stereochemistry (R/S or E/Z) or a mixture of the two possible stereoisomers at that position. To better reflect the chemical space of marine natural products and to reduce redundancy in the database, we decide whether a structure with wavy lines is recorded as one entry or as multiple entries according to the following rules:

Wedge-dash diagrams

Chair conformations, Haworth projections and Fischer projections are converted to wedge-dash diagrams, so that stereochemistry is described with wedged and dashed bonds rather than with axial and equatorial bonds.

Tautomers

If the author indicates that the compound has tautomerism, each tautomer is treated as a separate entry.

What is the 3D Conformer?

The 3D conformer is a three-dimensional representation of the compound. It is not experimentally determined; it is a low-energy conformer computed with OpenEye OMEGA. Because a number of complex MNPs contain many undefined stereocenters and/or rotatable bonds, computing 3D structures for every record would not be meaningful. Therefore, following the PubChem3D conformer model criteria, CMNPD provides a 3D conformer for each compound record that satisfies the following conditions:

  • Not too large (with no more than 50 heavy atoms)

  • Not too flexible (with no more than 15 rotatable bonds)

  • Has only a single covalent unit (for salts, mixtures and polymers, only the largest fragment is kept in the calculation)

  • Consists of only supported elements (H, C, N, O, F, Si, P, S, Cl, Br, and I)

  • Contains only atom types recognized by the MMFF94s force field

  • Has fewer than six undefined atom or bond stereocenters
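The criteria above can be expressed as a simple filter. The sketch below is not CMNPD's or PubChem's actual code: it assumes the per-fragment properties (heavy-atom count, rotatable bonds, element set, whether MMFF94s can type every atom, number of undefined stereocenters) have already been computed, for example with a cheminformatics toolkit. The `Fragment` record and the `gets_3d_conformer` function are hypothetical names used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Elements permitted by the criteria above.
SUPPORTED_ELEMENTS = frozenset({"H", "C", "N", "O", "F", "Si", "P", "S", "Cl", "Br", "I"})


@dataclass
class Fragment:
    """Hypothetical precomputed properties of one covalent unit."""
    heavy_atoms: int
    rotatable_bonds: int
    elements: frozenset[str]
    mmff94s_typed: bool      # True if every atom has an MMFF94s atom type
    undefined_stereo: int    # undefined atom + bond stereocenters


def gets_3d_conformer(fragments: list[Fragment]) -> bool:
    """Apply the PubChem3D-style eligibility checks to one compound record."""
    # Salts, mixtures and polymers: only the largest fragment is considered.
    f = max(fragments, key=lambda frag: frag.heavy_atoms)
    return (
        f.heavy_atoms <= 50                   # not too large
        and f.rotatable_bonds <= 15           # not too flexible
        and f.elements <= SUPPORTED_ELEMENTS  # only supported elements
        and f.mmff94s_typed                   # MMFF94s recognizes all atom types
        and f.undefined_stereo < 6            # fewer than six undefined stereocenters
    )


# A sodium salt: the Na+ counter-ion is the smaller fragment and is ignored.
salt = [
    Fragment(30, 6, frozenset({"C", "H", "N", "O"}), True, 2),
    Fragment(1, 0, frozenset({"Na"}), False, 0),
]
print(gets_3d_conformer(salt))  # True
```

Keeping the property calculation outside the filter makes the eligibility rules easy to read and test on their own, independently of whichever toolkit produces the numbers.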

Why do some compounds have the same name?

All compounds with the same name are marked with "Homonym⚠️" in the Name and Classification section.

Use the data for these compounds with caution, because duplicate names can arise for many reasons:

  • Some entries are tautomers

  • Some entries are homologues

  • Some entries are analogues

  • Some entries are enantiomers or diastereomers

  • Different authors use the same name in their articles to represent completely different substances

  • Some authors depict stereochemistry with axial and equatorial bonds of unclear direction instead of wedge bonds, which makes the configuration ambiguous

  • With advances in structure determination techniques, the structures of some compounds have been revised or their absolute configurations have been established

Note that CMNPD objectively presents the chemical structures of marine natural products as depicted in the published literature. We do not guarantee the accuracy of the structures described in the original articles, nor do we indicate which structure is more reliable.

Structures that are considered questionable or that lack an absolute configuration are not treated as erroneous entries in CMNPD unless the reference is retracted by the publisher. However, the symbol "†" is added after the compound name to indicate that the structure has been revised, or that more precise stereochemistry has been reported in another paper. This lets you track how the structures and stereochemistry of related compounds have been refined over time.
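How the homonym flag and the "†" marker relate to individual records can be sketched as follows. This is a hypothetical illustration, not CMNPD's implementation: the `Entry` record, its `revised` flag, the `label_entries` function and the case- and whitespace-insensitive name matching are all assumptions, since the actual matching rules are not described on this page.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical minimal record: an ID, a name and a revision flag."""
    entry_id: str
    name: str
    revised: bool = False  # a later paper revised the structure or its stereochemistry


def label_entries(entries: list[Entry]) -> dict[str, tuple[str, bool]]:
    """Map entry_id -> (display name, is_homonym)."""
    groups: dict[str, list[Entry]] = defaultdict(list)
    for e in entries:
        # Assumed normalization: collapse whitespace and ignore case.
        groups[" ".join(e.name.split()).lower()].append(e)

    labels: dict[str, tuple[str, bool]] = {}
    for group in groups.values():
        is_homonym = len(group) > 1  # would be shown as "Homonym⚠️"
        for e in group:
            display = e.name + ("†" if e.revised else "")
            labels[e.entry_id] = (display, is_homonym)
    return labels


entries = [
    Entry("E1", "compound A"),
    Entry("E2", "Compound A", revised=True),
    Entry("E3", "compound B"),
]
print(label_entries(entries))
# {'E1': ('compound A', True), 'E2': ('Compound A†', True), 'E3': ('compound B', False)}
```

The two markers are independent: an entry can be a homonym, carry "†", both, or neither.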

What do the abbreviations in biological activity data mean?

The brief bioactivity descriptions are mainly extracted from the marine natural products reviews published in Natural Product Reports. In recent years, the review authors have adopted many abbreviations to keep the bioactivity data concise. The following table lists some useful abbreviations and their full names.

Abbreviation    Full name
AB              antibacterial
AF              antifungal
AI              anti-inflammatory
AM              antimicrobial
AM/AB           antimicrobial/antibacterial
AO              antioxidant
AV              antiviral
HTCL            human tumour cell line
IA              inactive
MDR             multidrug resistant
MIC             minimum inhibitory concentration
MO              microorganism
MOA             mechanism of action
NO              nitric oxide
NT              not tested
Norm.           normal
SAR             structure-activity relationship(s)
TRP             Transient Receptor Potential
activ.          activity
anal.           analysis
antifoul.       antifouling
bact.           bacteria
calc.           calculation
compar.         comparison
connect.        connectivity
cytotox.        cytotoxicity/cytotoxic
degrad.         degradation
deriv.          derivative
determ.         determined
diffrac.        diffraction
estab.          established
expt.           experimental
hum.            human
immunomod.      immunomodulatory
inhib.          inhibitor/inhibition/inhibitory
insep.          inseparable
isol.           isolated
microb.         microbial, microbe
mixt.           mixture
mod.            moderate
prod.           production
prop.           proposed
recept.         receptor
spec. rot.      specific rotation
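To expand these abbreviations automatically in bioactivity text, a dictionary-based substitution handles many cases. The sketch below is hypothetical: `ABBREVIATIONS` holds only a subset of the table, matching is case-sensitive, and an abbreviation is replaced only when it is not directly preceded or followed by a word character, so that "AB" inside a longer token is left alone.

```python
import re

# A subset of the abbreviation table above (hypothetical selection).
ABBREVIATIONS = {
    "AB": "antibacterial",
    "AF": "antifungal",
    "AV": "antiviral",
    "HTCL": "human tumour cell line",
    "IA": "inactive",
    "MIC": "minimum inhibitory concentration",
    "cytotox.": "cytotoxicity/cytotoxic",
    "inhib.": "inhibitor/inhibition/inhibitory",
    "mod.": "moderate",
    "spec. rot.": "specific rotation",
}

# Longest keys first so a longer abbreviation wins over a shorter one;
# the lookarounds stop "AB" from matching inside a word such as "ABC".
_PATTERN = re.compile(
    r"(?<!\w)(?:"
    + "|".join(re.escape(k) for k in sorted(ABBREVIATIONS, key=len, reverse=True))
    + r")(?!\w)"
)


def expand(text: str) -> str:
    """Replace each recognised abbreviation with its full name."""
    return _PATTERN.sub(lambda m: ABBREVIATIONS[m.group(0)], text)


print(expand("AB, AF; mod. cytotox. (HTCL)"))
# antibacterial, antifungal; moderate cytotoxicity/cytotoxic (human tumour cell line)
```

Abbreviations that are not in the dictionary are left unchanged, so the output stays readable even when coverage is incomplete.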

What is the Assay Type?

Standardized experimental data are incorporated from ChEMBL. ChEMBL describes its assay types as follows:

  • Binding (B) - Data measuring binding of a compound to a molecular target, e.g., Ki, IC50, Kd.

  • Functional (F) - Data measuring the biological effect of a compound, e.g., % cell death in a cell line, rat weight.

  • ADME (A) - ADME data, e.g., t1/2, oral bioavailability.

  • Toxicity (T) - Data measuring toxicity of a compound, e.g., cytotoxicity.

  • Physicochemical (P) - Assays measuring physicochemical properties of the compounds in the absence of biological material, e.g., chemical stability, solubility.

  • Unclassified (U) - A small proportion of assays cannot be classified into one of the above categories e.g., ratio of binding vs efficacy.
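If you process the activity data programmatically, the one-letter codes above can be decoded into named categories. This is a minimal sketch assuming each exported record carries the one-letter assay-type code as a string; the `AssayType` enum, the `parse_assay_type` helper and the sample `codes` list are hypothetical and not part of CMNPD or ChEMBL.

```python
from collections import Counter
from enum import Enum


class AssayType(Enum):
    """ChEMBL assay-type codes, as listed above."""
    BINDING = "B"
    FUNCTIONAL = "F"
    ADME = "A"
    TOXICITY = "T"
    PHYSICOCHEMICAL = "P"
    UNCLASSIFIED = "U"


def parse_assay_type(code: str) -> AssayType:
    # Look up the enum member by its value; unknown codes raise ValueError.
    return AssayType(code.strip().upper())


# Hypothetical column of assay-type codes from an exported activity table.
codes = ["B", "F", "f", "T", "B"]
counts = Counter(parse_assay_type(c) for c in codes)
print(counts[AssayType.BINDING], counts[AssayType.FUNCTIONAL])  # 2 2
```

Raising on unknown codes, rather than silently mapping them to Unclassified, keeps malformed input visible.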

What is the Target Type?

Targets are classified according to ChEMBL's target categories. This slide shows how ChEMBL defines and classifies targets:
