
Computer-Aided Drug Design Tutorials: Visualization of Macromolecules

Experimental Sources of Structural Data

Three-dimensional structures of biological macromolecules are determined to near-atomic resolution either by diffraction methods or by nuclear magnetic resonance (NMR) spectroscopy. The diffraction of X-rays from the electrons in single protein crystals is the most commonly used method for the determination of protein structures. Neutron diffraction by nuclei in single protein crystals is less commonly used but provides the positions of hydrogen atoms, which are often critical in elucidating the protonation states of catalytic residues. The quality of single-crystal diffraction structures is characterized by the resolution and the R-factor. Structures used for target-structure-based design would ideally have high resolution (d < 1.3 Å) because the positions of atoms involved in ligand binding are much better defined in such structures. Unfortunately, only a few protein crystals diffract X-rays to such high resolution because of mosaicity (one protein molecule differs slightly from the next one in conformation) and large thermal motions (proteins are flexible and change slightly in shape during the data collection). The low resolution achieved in a typical neutron diffraction experiment (2-3 Å) makes such structures less suitable for computer-based drug design. However, the new Spallation Neutron Source promises to deliver protein structures with 1.5 Å resolution. Solution-phase NMR studies yield good-quality structures for small (< 30 kDa) soluble proteins, but structure determination of larger proteins is very challenging.
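The R-factor mentioned above measures how well the refined model reproduces the measured diffraction data. As a minimal sketch, assuming the conventional definition R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs| and that the two lists of structure-factor amplitudes are already on a common scale, it could be computed as follows. The function name and the example amplitudes are invented for illustration and do not come from any real data set.

```python
def r_factor(f_obs, f_calc):
    """Return the crystallographic R-factor for paired amplitudes.

    Assumes f_obs and f_calc list the observed and calculated
    structure-factor amplitudes for the same reflections, in the same
    order, and already on a common scale (no scale factor is fitted here).
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    total_observed = sum(abs(f) for f in f_obs)
    if total_observed == 0:
        raise ValueError("sum of observed amplitudes is zero")
    total_difference = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return total_difference / total_observed


# Hypothetical amplitudes, chosen only to make the arithmetic easy to follow.
observed = [100.0, 50.0, 25.0, 25.0]
calculated = [90.0, 55.0, 25.0, 30.0]
print(f"R = {r_factor(observed, calculated):.3f}")  # prints "R = 0.100"
```

A lower value indicates closer agreement between the model and the data. Refinement programs also fit an overall scale factor and report related statistics, which this sketch leaves out.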

Protein Data Bank

The main database for structures of biological macromolecules is the Protein Data Bank (PDB), managed by the Research Collaboratory for Structural Bioinformatics (RCSB). The PDB archives experimentally determined three-dimensional coordinates and related structural parameters of biological macromolecules such as proteins, nucleic acids, and their complexes. The majority of the structures were determined by X-ray crystallography, but about 15% of the structures in the PDB were obtained from solution NMR studies. For several hundred proteins, both X-ray and NMR structures are available, allowing one to analyze how crystal packing affects the structure. Each structure in the Protein Data Bank has a unique 4-character alphanumeric identifier known as the PDB ID. You can retrieve all related structures with a keyword search, or access a particular structure by its PDB ID. Some molecular modeling programs, such as PyMOL, Chimera, and MOLDEN (v. 4.9 or newer), can retrieve structure files directly from the PDB.
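Retrieval by PDB ID can also be scripted. The sketch below checks that an identifier has the 4-character alphanumeric form described above and builds a download address for the corresponding PDB-format file. The address pattern `https://files.rcsb.org/download/<ID>.pdb` is an assumption about the RCSB file server at the time of writing and may change; the function names are hypothetical.

```python
import re
import urllib.request

# A PDB ID as described in the text: exactly four alphanumeric characters.
PDB_ID_PATTERN = re.compile(r"[A-Za-z0-9]{4}")


def pdb_download_url(pdb_id):
    """Return a download URL for the PDB-format file of the given entry.

    The URL layout is an assumption about the RCSB file server and is not
    taken from the tutorial itself.
    """
    pdb_id = pdb_id.strip()
    if not PDB_ID_PATTERN.fullmatch(pdb_id):
        raise ValueError(f"not a 4-character alphanumeric PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def fetch_pdb(pdb_id, path):
    """Download the entry to a local file (requires network access)."""
    urllib.request.urlretrieve(pdb_download_url(pdb_id), path)


print(pdb_download_url("2ht8"))
```

Calling `fetch_pdb("2HT8", "2HT8.pdb")` would then save the file locally, provided the server is reachable and the address pattern still holds.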

Data Mining from PDB

The Research Collaboratory for Structural Bioinformatics, which manages the PDB, has created web-based tools that make it quite easy to search and understand the content of PDB files. Some of these tools require that Java be installed and that the Java plugin be enabled in your browser. Follow these steps to learn about some of these tools:

  1. Search the RCSB PDB by the keyword oseltamivir to find all protein structures that contain the influenza drug Tamiflu. Brief information about each structure will be shown. Select 2HT8 by clicking on its title line.
  2. Visually examine the content of the PDB file by clicking on the "Display Files" icon and selecting PDB File. Scroll through this large file and notice that the structure contains only one polypeptide (the chain identifier for all ATOM records is A) and one ligand, which has the residue name G39. Close the window with the text representation of the PDB file.
  3. Save the PDB file in your directory on the workstation as 2HT8.pdb by clicking on the "Download Files" icon next to the PDB ID code and selecting PDB File (Text).
  4. Examine the information about this structure on the PDB website. Notice that the Primary Citation includes a PubMed Abstract and links to PubMed and the publisher websites. You can follow these links to read the original research paper or find related papers in the MEDLINE database.
  5. The Ligand Chemical Component field near the bottom of the page is one of the most interesting from the drug design viewpoint. It provides information about every small molecule that is bound to the protein. The D icon under Links stands for Drug Similarity and takes you to the SuperLigands web server at Charité Berlin. This web server finds well-known molecules that are most similar to the molecule you used to search the database. A related but smaller database is the Super Drug Database hosted at the same institution. Clicking on the Ligand Explorer will open a Java application that analyzes the structure of the binding pocket. When you click on the Ligand Identifier, you can see the full name, the SMILES string, and the chemical structure of the molecule.
  6. Click on the Links tab at the top of the screen to see a rather long list of tools. One that might be useful for us is PDBsum. Follow this link and briefly examine the views under the Protein tab, the Ligand tab, and the Clefts tab. Notice how the LigPlot image provides a 2D view of the active site interactions and how you can identify possible binding sites in the Clefts tab. Close the PDBsum window.
  7. Click on Links and select Computed Atlas of Surface Topography of proteins (CASTp). The CASTp server allows identification and characterization of pockets and cavities in protein structures. In the CASTp window, select Calculation Request, then enter 2ht8 or upload the structure that you saved earlier. Visually examine the largest pocket in the structure by checking the pocket with the largest volume in the left-side menu. Residues that line the pocket are shown as colored spheres. What is in the middle of this pocket? If you have difficulty, you can open the Jmol menus by right-clicking. Select Hetero, Ligand, and then choose the Ball and Stick scheme under Render. Close the CASTp server window when done.
  8. Another interesting database is PDB-redo from the Center for Molecular and Biomolecular Informatics at Radboud University Nijmegen in the Netherlands. This database presents protein structures obtained from the original diffraction data but re-refined and validated using modern and consistent approaches; it eliminates many structural errors that have slipped into the PDB. You may find this database useful when preparing proteins for your computational studies.
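The visual inspection in step 2 (which chains and which ligands a file contains) can also be done with a short script. This is a minimal sketch, assuming the fixed-column PDB text format in which the residue name occupies columns 18-20 and the chain identifier column 22 of each ATOM and HETATM record. The function name is hypothetical, and the sample records are made up and truncated for illustration; they are not copied from 2HT8, apart from the residue name G39 quoted in the text.

```python
def summarize_pdb(lines, skip_water=True):
    """Return (chain IDs seen in ATOM records, residue names seen in HETATM records).

    Relies on fixed columns of the PDB text format: residue name in
    columns 18-20 and chain identifier in column 22 (1-based).
    """
    chains = set()
    hetero_residues = set()
    for line in lines:
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM") or len(line) < 22:
            continue  # ignore headers, remarks, and malformed lines
        residue_name = line[17:20].strip()
        chain_id = line[21]
        if record == "ATOM":
            chains.add(chain_id)
        elif not (skip_water and residue_name == "HOH"):
            hetero_residues.add(residue_name)
    return sorted(chains), sorted(hetero_residues)


# Made-up, truncated records (coordinates omitted) used only to show the parsing.
SAMPLE = """\
HEADER    ILLUSTRATIVE SAMPLE
ATOM      1  N   VAL A  83
ATOM      2  CA  VAL A  83
HETATM 3001  C1  G39 A 800
HETATM 3100  O   HOH A 901
END
"""

chains, ligands = summarize_pdb(SAMPLE.splitlines())
print("chains:", chains)    # prints "chains: ['A']"
print("ligands:", ligands)  # prints "ligands: ['G39']"
```

To apply the same function to the file saved in step 3, one could pass an open file handle, for example `summarize_pdb(open("2HT8.pdb"))`. Real entries may also list ions, sugars, or buffer molecules as HETATM records, so the second list is best read as a set of candidates rather than a list of drug-like ligands.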


Tutorial by Dr. Kalju Kahn, Department of Chemistry and Biochemistry, UC Santa Barbara. ©2010.