PDBbind database

The PDBbind database^[1][2] provides a comprehensive collection of experimentally measured binding affinity data (K_d, K_i, and IC₅₀ values) for all major types of biomolecular complexes deposited in the Protein Data Bank (PDB), including protein-ligand, protein-protein, protein-nucleic acid, and nucleic acid-ligand complexes. In other words, the database links energetic information to structural information for those complexes. Such data are much needed in computational and machine learning studies of molecular recognition, drug discovery, and related problems.
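Binding constants are often compared on a negative logarithmic scale or converted to a binding free energy. The following minimal Python sketch illustrates both conversions. The function names, the unit table, and the default temperature of 298.15 K are illustrative assumptions and are not part of PDBbind itself.

```python
import math

# Conversion factors from common concentration units to mol/L (assumed unit labels).
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12, "fM": 1e-15}

GAS_CONSTANT = 1.987204e-3  # kcal/(mol*K)


def to_p_value(value, unit):
    """Return -log10 of a binding constant, e.g. pKd for a Kd of 10 nM."""
    return -math.log10(value * UNIT_TO_MOLAR[unit])


def binding_free_energy(k_molar, temperature=298.15):
    """Return dG = RT ln(K) in kcal/mol for a dissociation constant given in mol/L.

    This relation holds for equilibrium constants (K_d, and K_i under the
    usual assumptions); an IC50 value depends on assay conditions and is
    not a thermodynamic constant.
    """
    return GAS_CONSTANT * temperature * math.log(k_molar)


pkd = to_p_value(10, "nM")        # K_d of 10 nM on the -log10 scale
dg = binding_free_energy(1e-9)    # K_d of 1 nM as a free energy
print(f"pKd = {pkd:.2f}, dG = {dg:.2f} kcal/mol")
```

On this scale a tenfold tighter binder gains one pK unit, which is why affinity prediction models are typically trained on the logarithmic values rather than on raw concentrations.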

History

A prototype of the PDBbind database was first released to the public in May 2004.^[3][4] This prototype, created by Prof. Shaomeng Wang’s group at the University of Michigan, provided experimental binding data for 2276 protein-ligand complexes in PDB. Since 2007, the PDBbind database has been maintained by Prof. Renxiao Wang’s group at Fudan University in China, and it has been updated annually to keep up with the growth of the Protein Data Bank.

Figure 1. Growth of the binding data in PDBbind since 2007

The PDBbind database can now be accessed at the PDBbind-CN web server (http://www.pdbbind-cn.org/).

Current release

The current release of the PDBbind database is version 2020. It is compiled from the contents of PDB officially released in the first week of 2020. This release provides binding affinity data for a total of 23,496 biomolecular complexes in PDB, including protein-ligand (19,443), protein-protein (2,852), protein-nucleic acid (1,052), and nucleic acid-ligand (149) complexes. Compared to the previous release (v.2019), the binding data in this release have increased by ~10%. All binding data were curated by our team from ~40,500 original references.
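The composition of the release can be checked with a few lines of arithmetic. The short Python sketch below uses only the counts quoted in the paragraph above; the variable names are illustrative.

```python
# Complex counts per category, as quoted for PDBbind version 2020.
counts = {
    "protein-ligand": 19443,
    "protein-protein": 2852,
    "protein-nucleic acid": 1052,
    "nucleic acid-ligand": 149,
}

total = sum(counts.values())
# Percentage share of each category in the whole release.
shares = {name: 100.0 * n / total for name, n in counts.items()}

for name, n in counts.items():
    print(f"{name:22s}{n:8,d}{shares[name]:7.1f}%")
print(f"{'total':22s}{total:8,d}")
```

The four categories add up to the stated total of 23,496, and protein-ligand complexes account for roughly 83% of all entries.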

In addition to the collection of experimental binding data, the PDBbind database also provides the following information and services:

(1) Processed structure files

PDBbind also provides processed “clean” structure files for all of the protein-ligand complexes in its contents. Such structure files can be readily used by most of today’s molecular modeling software. In brief, the biological unit of each complex is split into a protein molecule (saved in the PDB format) and a ligand molecule (saved in the Mol2 and SDF formats). Atom and bond types on the ligand molecule are assigned by a dedicated computer program and then examined and corrected manually. All processed structure files of protein-ligand complexes are bundled in a data package that can be downloaded from the PDBbind-CN web server.
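As an illustration of how such a protein/ligand file pair might be used, the following Python sketch reads atom coordinates from PDB-format and Mol2-format text and lists the protein residues close to the ligand. It is a minimal sketch that relies only on the standard column layout of PDB ATOM records and the whitespace-separated @<TRIPOS>ATOM section of Mol2 files. The function names, the 5 Å cutoff, and the file names in the usage comment are hypothetical and are not taken from the PDBbind package.

```python
import math


def read_pdb_atoms(pdb_text):
    """Return (chain, residue name, residue number, (x, y, z)) per ATOM record."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            # PDB is a fixed-column format: x, y, z occupy columns 31-54.
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((line[21], line[17:20].strip(), int(line[22:26]), xyz))
    return atoms


def read_mol2_atoms(mol2_text):
    """Return (atom name, atom type, (x, y, z)) from the @<TRIPOS>ATOM section."""
    atoms, in_atoms = [], False
    for line in mol2_text.splitlines():
        if line.startswith("@<TRIPOS>"):
            in_atoms = line.strip() == "@<TRIPOS>ATOM"
            continue
        fields = line.split()
        # Atom lines read: id, name, x, y, z, type, then optional fields.
        if in_atoms and len(fields) >= 6:
            xyz = tuple(float(v) for v in fields[2:5])
            atoms.append((fields[1], fields[5], xyz))
    return atoms


def pocket_residues(protein_atoms, ligand_atoms, cutoff=5.0):
    """Return sorted residues with any atom within `cutoff` angstroms of the ligand."""
    pocket = set()
    for chain, res_name, res_seq, p_xyz in protein_atoms:
        if any(math.dist(p_xyz, lig[2]) <= cutoff for lig in ligand_atoms):
            pocket.add((chain, res_name, res_seq))
    return sorted(pocket)


# Usage with hypothetical file names:
# protein = read_pdb_atoms(open("complex_protein.pdb").read())
# ligand = read_mol2_atoms(open("complex_ligand.mol2").read())
# print(pocket_residues(protein, ligand))
```

For production work a dedicated cheminformatics toolkit would normally be preferred, since this sketch ignores alternate locations, insertion codes, and bond information.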

(2) Web-based display and analysis tools

Users can browse, analyze, and search the contents of PDBbind on the PDBbind-CN web server. The basic information on each complex is summarized on a single page, where the user can display the complex structure in multiple modes. Text-based and structure-based searches of the contents of PDBbind are also supported. These web-based tools can be accessed from PCs or mobile phones.

Note that users need to register on the PDBbind-CN web server to access the full contents and web-based services of the PDBbind database. Registration is currently free for all academic and industrial users.

Figure 2. Homepage of a protein-ligand complex on the PDBbind-CN server

Major applications

Although the PDBbind database was originally created for developing and validating protein-ligand docking/scoring methods, it has since been applied to many different types of study. According to our literature survey, data sets from PDBbind have been employed in over 500 published studies. The major applications of PDBbind include the following.

(1) Calibration and validation of docking/scoring methods

As mentioned above, PDBbind systematically annotates the protein-ligand complexes throughout PDB with experimental binding data. All binding data have been curated from the original references and then checked carefully. High-quality data sets of this type are much needed for developing the protein-ligand docking/scoring methods used in drug design, and the PDBbind database has been a dominant data resource for such studies. For example, many new scoring functions, including machine learning models, have been calibrated and validated with data sets from PDBbind, such as K_DEEP,^[5] Δ_VINARF₂₀,^[6] OnionNet,^[7] and AGL-Score.^[8] Based on PDBbind, our team has developed the Comparative Assessment of Scoring Functions (CASF) benchmark,^[9-13] which is arguably the most popular benchmark for validating protein-ligand interaction scoring functions.^[1]
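Validation of a scoring function against experimental data commonly comes down to comparing predicted scores with measured binding constants on a logarithmic scale. The Python sketch below computes two widely used statistics for that comparison, the Pearson correlation coefficient and the root-mean-square error. It is a generic illustration rather than the CASF evaluation protocol, and the numbers in the example are made-up placeholders, not PDBbind data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed values."""
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(predicted))


# Placeholder values for illustration only (not real measurements).
experimental_pk = [4.2, 5.9, 7.1, 8.4, 6.3]
predicted_pk = [4.8, 5.5, 6.9, 7.6, 6.6]

print(f"Pearson R = {pearson_r(experimental_pk, predicted_pk):.3f}")
print(f"RMSE      = {rmse(predicted_pk, experimental_pk):.3f}")
```

A high correlation with a large error indicates a model that ranks complexes well but is poorly calibrated in absolute terms, so the two statistics are usually reported together.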

(2) Analysis of drug-target interactions

The protein-ligand complexes included in the PDBbind data set cover a wide range of validated and potential drug targets as well as bioactive small molecules. Combined with three-dimensional structures and experimental binding data, this information supports a broad range of analyses of drug-target interactions, such as binding site detection,^[14] statistical analysis of specific interactions,^[15] de novo ligand design,^[16] drug-target interaction networks,^[17] target elucidation for bioactive compounds,^[18,19] and drug repositioning.^[20]

On-going developments

After providing free data and services to the community for more than a decade, the PDBbind database is now undergoing commercialization. The current plan is to charge users a modest license fee starting with version 2021.

The binding data available in version 2021 are estimated to increase by 20% compared to version 2020. In addition to larger sets of binding data, our team is now implementing state-of-the-art analysis tools on the PDBbind-CN web server for drug target fishing, a task often relevant to studies on bioactive compounds.

Despite the commercialization plan, users can still access the contents of the PDBbind database up to version 2020 for free.

References

[1] Liu, Z.; Su, M.; Han, L.; Liu, J.; Yang, Q.; Li, Y.; Wang, R. Forging the Basis for Developing Protein–Ligand Interaction Scoring Functions. Acc. Chem. Res. 2017, 50, 302–309. doi: 10.1021/acs.accounts.6b00491. PMID: 28182403.

[2] Liu, Z.; Li, Y.; Han, L.; Li, J.; Liu, J.; Zhao, Z.; Nie, W.; Liu, Y.; Wang, R. PDB-wide collection of binding data: current status of the PDBbind database. Bioinformatics 2015, 31, 405–412. doi: 10.1093/bioinformatics/btu626. PMID: 25301850.

[3] Wang, R.; Fang, X.; Lu, Y.; Wang, S. The PDBbind database: collection of binding affinities for protein-ligand complexes with known three-dimensional structures. J. Med. Chem. 2004, 47, 2977–2980. doi: 10.1021/jm030580l. PMID: 15163179.

[4] Wang, R.; Fang, X.; Lu, Y.; Yang, C. Y.; Wang, S. The PDBbind database: methodologies and updates. J. Med. Chem. 2005, 48, 4111–4119. doi: 10.1021/jm048957q. PMID: 15943484.

[5] Jimenez, J.; Skalic, M.; Martinez-Rosell, G.; De Fabritiis, G. K_DEEP: protein-ligand absolute binding affinity prediction via 3D-convolutional neural networks. J. Chem. Inf. Model. 2018, 58, 287–296. doi: 10.1021/acs.jcim.7b00650. PMID: 29309725.

[6] Wang, C.; Zhang, Y. Improving scoring-docking-screening powers of protein-ligand scoring functions using random forest. J. Comput. Chem. 2017, 38, 169–177. doi: 10.1002/jcc.24667. PMID: 27859414.

[7] Zheng, L.; Fan, J.; Mu, Y. OnionNet: a multiple-layer intermolecular-contact-based convolutional neural network for protein-ligand binding affinity prediction. ACS Omega 2019, 4, 15956–15965. doi: 10.1021/acsomega.9b01997. PMID: 31592466.

[8] Nguyen, D. D.; Wei, G. W. AGL-Score: algebraic graph learning score for protein-ligand binding scoring, ranking, docking and screening. J. Chem. Inf. Model. 2019, 59, 3291–3304. doi: 10.1021/acs.jcim.9b00334. PMID: 31257871.

[9] Cheng, T.; Li, X.; Li, Y.; Liu, Z.; Wang, R. Comparative Assessment of Scoring Functions on a Diverse Test Set. J. Chem. Inf. Model. 2009, 49, 1079–1093. doi: 10.1021/ci9000053. PMID: 19358517.

[10] Li, Y.; Liu, Z. H.; Li, J.; Han, L.; Liu, J.; Zhao, Z. X.; Wang, R. X. Comparative Assessment of Scoring Functions on an Updated Benchmark: I. Compilation of the Test Set. J. Chem. Inf. Model. 2014, 54, 1700–1716. doi: 10.1021/ci500080q. PMID: 24716849.

[11] Li, Y.; Han, L.; Liu, Z. H.; Wang, R. X. Comparative Assessment of Scoring Functions on an Updated Benchmark: II. Evaluation Methods and General Results. J. Chem. Inf. Model. 2014, 54, 1717–1736. doi: 10.1021/ci500081m. PMID: 24708446.

[12] Li, Y.; Su, M. Y.; Liu, Z. H.; Li, J.; Liu, J.; Han, L.; Wang, R. X. Assessing Protein-Ligand Interaction Scoring Functions with the CASF-2013 Benchmark. Nat. Protoc. 2018, 13, 666–680. doi: 10.1038/nprot.2017.114. PMID: 29517771.

[13] Su, M. Y.; Yang, Q. F.; Du, Y.; Feng, G. Q.; Liu, Z. H.; Li, Y.; Wang, R. X. Comparative Assessment of Scoring Functions: The CASF-2016 Update. J. Chem. Inf. Model. 2019, 59, 895–913. doi: 10.1021/acs.jcim.8b00545. PMID: 30481020.

[14] Zhao, R.; Cang, Z.; Tong, Y.; Wei, G.-W. Protein pocket detection via convex hull surface evolution and associated Reeb graph. Bioinformatics 2018, 34, i830–i837. doi: 10.1093/bioinformatics/bty598. PMID: 30423105.

[15] Inhester, T.; Nittinger, E.; Sommer, K.; Schmidt, P.; Bietz, S.; Rarey, M. NAOMInova: interactive geometric analysis of noncovalent interactions in macromolecular structures. J. Chem. Inf. Model. 2017, 57, 2132–2142. doi: 10.1021/acs.jcim.7b00291. PMID: 28891648.

[16] Li, Y.; Sun, Y.; Song, Y.; Dai, D.; Zhao, Z.; Zhang, Q.; Zhong, W.; Hu, L. A.; Ma, Y.; Li, X.; Wang, R. Fragment-based computational method for designing GPCR ligands. J. Chem. Inf. Model. 2020, 60, 4339–4349. doi: 10.1021/acs.jcim.9b00699. PMID: 31652060.

[17] Pinzi, L.; Rastelli, G. Identification of target associations for polypharmacology from analysis of crystallographic ligands of the Protein Data Bank. J. Chem. Inf. Model. 2020, 60, 372–390. doi: 10.1021/acs.jcim.9b00821. PMID: 31800237.

[18] Shaikh, F.; Tai, H. K.; Desai, N.; Siu, S. W. I. LigTMap: ligand and structure-based target identification and activity prediction for small molecular compounds. J. Cheminform. 2021, 13, 44. doi: 10.1186/s13321-021-00523-1. PMID: 34112240.

[19] Li, G.-B.; Yu, Z.-J.; Liu, S.; Huang, L.-Y.; Yang, L.-L.; Lohans, C. T.; Yang, S.-Y. IFPTarget: a customized virtual target identification method based on protein-ligand interaction fingerprinting analyses. J. Chem. Inf. Model. 2017, 57, 1640–1651. doi: 10.1021/acs.jcim.7b00225. PMID: 28661143.

[20] Wang, F.; Wu, F.; Li, C.; Jia, C.; Su, S.; Hao, G. ACID: a free tool for drug repurposing using consensus inverse docking strategy. J. Cheminform. 2019, 11, 73. doi: 10.1186/s13321-019-0394-z. PMID: 33430982.

External links

The latest domain name of the PDBbind-CN server is: