version 01/12/00


In this tutorial you will learn to use the Protein Data Bank (PDB), the international repository for processing and distributing 3-D macromolecular structure data determined by X-ray crystallography and NMR. The immediate goal is to obtain the coordinates of the DNA-binding domain of TATA-box binding protein (TBP) bound to a fragment of DNA. Specific questions that students must answer and hand in are highlighted in red font and numbered 1) to 3).

a) Go to the PDB WWW page.

b) Under SEARCH on the right side of the page, go to "Search Fields: advanced search".

c) Set the experimental technique to "X-RAY DIFFRACTION".

d) In the TEXT SEARCH field, type: "Tata and Box and Binding and Protein" (do not type the quotation marks).

e) Click on the SEARCH button.

f) After a pause, the Query Result Browser should display: "Your query found 11 structures..."

g) Scan down the list of entries until you find entry 1CDW.

1CDW  Deposited: 11-Apr-1996  Exp. Method: X-ray Diffraction  Resolution: 1.90 Å
Classification: Complex (Transcription Factor/DNA)
Compound: Mol_Id: 1; Molecule: Tata-Box-Binding Protein; Chain: A; Fragment: Tbp Core Domain, Residues 155 - 333; Engineered: Yes
          Mol_Id: 2; Molecule: 16 Base-Pair Tata-Containing Oligonucleotide; Chain: B, C; Engineered: Yes

h) "Explore" entry 1CDW by clicking the EXPLORE button on the right side of the page.

i) Go to SEQUENCE DETAILS.

1) Make a helical wheel of the second alpha-helix (the one that starts with EE...). On the SEQUENCE DETAILS page, under "sequence and secondary structure", alpha-helices are marked with H.
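A helical wheel places each residue 100 degrees further around a circle than the one before it, because an ideal alpha-helix has about 3.6 residues per turn. The Python sketch below computes those angles so you can check your drawing; the function names are hypothetical, and the example sequence is a placeholder, not the helix from 1CDW.

```python
from __future__ import annotations

# An ideal alpha-helix has about 3.6 residues per turn, so each residue sits
# 360 / 3.6 = 100 degrees further around the helix axis than the one before.
DEGREES_PER_RESIDUE = 100


def helical_wheel(sequence: str) -> list[tuple[str, int]]:
    """Return (residue, angle) pairs, with angles in degrees from residue 1."""
    return [(aa, (i * DEGREES_PER_RESIDUE) % 360) for i, aa in enumerate(sequence)]


def print_wheel(sequence: str) -> None:
    """Print residues in the order they appear going around the wheel."""
    for aa, angle in sorted(helical_wheel(sequence), key=lambda pair: pair[1]):
        print(f"{angle:3d} deg  {aa}")


# Placeholder sequence for illustration only (NOT the 1CDW helix); paste in
# the residues of the second helix, starting with "EE", from SEQUENCE DETAILS.
print_wheel("ACDEFGHIK")
```

Sorting by angle gives the order in which residues appear around the circle, which makes it easier to see whether hydrophobic residues cluster on one face of the helix.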

j) Go to DOWNLOAD/DISPLAY FILE. Under DOWNLOAD THE STRUCTURE FILE, click PDB/no compression. That should download a file called 1CDW.pdb to your hard drive.
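If the web page layout differs from the one described here, the same uncompressed PDB-format file can also be fetched directly. This Python sketch assumes the current RCSB download URL pattern https://files.rcsb.org/download/<ID>.pdb; the helper names are hypothetical, and downloading needs network access.

```python
import urllib.request


def pdb_download_url(pdb_id: str) -> str:
    """Build the RCSB URL of an uncompressed PDB-format file (assumed pattern)."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def download_pdb(pdb_id: str) -> str:
    """Save <ID>.pdb in the current directory and return the file name."""
    filename = f"{pdb_id.upper()}.pdb"
    # urlretrieve needs network access and raises an HTTPError if the
    # server reports that the ID does not exist.
    urllib.request.urlretrieve(pdb_download_url(pdb_id), filename)
    return filename
```

Calling download_pdb("1cdw") would save 1CDW.pdb in the current directory, which should be the same plain-text file as the "PDB/no compression" link.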

k) Open 1CDW.pdb with your word processor. The first two lines should be:

HEADER    COMPLEX (TRANSCRIPTION FACTOR/DNA)      11-APR-96   1CDW
TITLE     HUMAN TBP CORE DOMAIN COMPLEXED WITH DNA
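PDB files are fixed-column text, so a program can check which entry a file contains by reading fixed positions of the HEADER record. In the PDB format the classification occupies columns 11-50, the deposition date columns 51-59, and the four-character ID code columns 63-66. The function below is a hypothetical Python helper sketched on that layout.

```python
def read_header(lines):
    """Return (classification, deposition date, ID code) from the HEADER record.

    Columns follow the PDB format: classification 11-50, date 51-59,
    ID code 63-66 (1-based, inclusive); Python slices are 0-based.
    """
    for line in lines:
        if line.startswith("HEADER"):
            return line[10:50].strip(), line[50:59].strip(), line[62:66].strip()
    raise ValueError("no HEADER record found; is this a PDB-format file?")
```

Passing the open file 1CDW.pdb to read_header should return the classification, the date 11-APR-96, and the ID code 1CDW shown in the HEADER line above.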

Further down you should see lines like these:

ATOM      1  N   SER A 155      79.567  95.981 -35.807  1.00 31.29           N
ATOM      2  CA  SER A 155      78.596  95.084 -36.391  1.00 28.46           C
ATOM      3  C   SER A 155      79.183  93.703 -36.578  1.00 28.17           C
ATOM      4  O   SER A 155      78.463  92.707 -36.552  1.00 37.13           O
ATOM      5  CB  SER A 155      78.144  95.628 -37.734  1.00 24.93           C
ATOM      6  OG  SER A 155      79.256  95.751 -38.586  1.00 31.23           O
ATOM      7  N   GLY A 156      80.498  93.647 -36.761  1.00 29.06           N
ATOM      8  CA  GLY A 156      81.158  92.375 -36.994  1.00 27.86           C
ATOM      9  C   GLY A 156      81.094  92.093 -38.486  1.00 30.80           C

These lines give the atom number, atom name, residue name, chain identifier, residue number, x, y and z coordinates (in Å), occupancy, thermal factor, and element symbol of every atom in the structure. The thermal factor (also called the temperature factor or B-factor) tells you about relative mobility: atoms with higher values are more mobile or disordered. CA is an alpha carbon (Cα). Every amino acid has an alpha carbon, and these atoms are part of the polypeptide backbone. "CG2" means Cγ2 (C-gamma-2); in valine, for example, that atom is part of the R-group (side chain). Hydrogen atoms are missing because, with only one electron each, they scatter X-rays too weakly to be located by X-ray diffraction of large molecules like proteins. Some PDB files contain hydrogen atoms, but those were added by modeling based on geometric considerations, not derived from the diffraction data.
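Because each field sits in fixed columns, an ATOM line can be split by position rather than by spaces (negative coordinates can run into neighbouring fields, so splitting on spaces is not reliable). This Python sketch follows the column layout of the PDB file format; the class and function names are hypothetical. The count_hydrogens helper illustrates the point about missing hydrogens: for an X-ray structure it should usually return 0.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str        # e.g. "CA" (alpha carbon) or "CG2" (C-gamma-2)
    res_name: str    # three-letter residue code, e.g. "SER"
    chain: str
    res_seq: int
    x: float         # coordinates in angstroms
    y: float
    z: float
    occupancy: float
    b_factor: float  # thermal (temperature) factor
    element: str     # may be blank in older files


def parse_atom(line: str) -> Atom:
    """Parse one ATOM/HETATM record by fixed column position.

    Comments give 1-based PDB columns; Python slices are 0-based.
    """
    return Atom(
        serial=int(line[6:11]),         # columns 7-11
        name=line[12:16].strip(),       # columns 13-16
        res_name=line[17:20].strip(),   # columns 18-20
        chain=line[21],                 # column 22
        res_seq=int(line[22:26]),       # columns 23-26
        x=float(line[30:38]),           # columns 31-38
        y=float(line[38:46]),           # columns 39-46
        z=float(line[46:54]),           # columns 47-54
        occupancy=float(line[54:60]),   # columns 55-60
        b_factor=float(line[60:66]),    # columns 61-66
        element=line[76:78].strip(),    # columns 77-78
    )


def count_hydrogens(lines) -> int:
    """Count ATOM/HETATM records whose element symbol is H."""
    return sum(
        1
        for line in lines
        if line.startswith(("ATOM", "HETATM")) and parse_atom(line).element == "H"
    )


sample = "ATOM      2  CA  SER A 155      78.596  95.084 -36.391  1.00 28.46           C"
atom = parse_atom(sample)
print(atom.name, atom.res_name, atom.res_seq, atom.b_factor)  # CA SER 155 28.46
```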

2) Draw the chemical structures of residue 226 and the two adjacent residues. Also draw the structure of the corresponding tripeptide, indicating the correct stereochemistry. Note that this work should be done independently of other students. Do not use duplicate residues.
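To see which amino acids you are drawing, you can list the atoms of residue 226 and its neighbours straight from the coordinate file. This Python sketch assumes the residues are in chain A and that "the two adjacent residues" are 225 and 227; change the range if your assignment differs. The function name is hypothetical.

```python
def residues_in_range(lines, chain, first, last):
    """Map residue number -> (residue name, [atom names]) for ATOM records.

    Uses PDB columns 13-16 (atom name), 18-20 (residue name), 22 (chain)
    and 23-26 (residue number). Alternate conformations, if present, show
    up as repeated atom names.
    """
    found = {}
    for line in lines:
        if not line.startswith("ATOM") or line[21] != chain:
            continue
        res_seq = int(line[22:26])
        if first <= res_seq <= last:
            _, atom_names = found.setdefault(res_seq, (line[17:20].strip(), []))
            atom_names.append(line[12:16].strip())
    return found
```

Passing the open file 1CDW.pdb, chain "A", and the range 225 to 227 returns each residue's three-letter name together with its heavy atoms (backbone N, CA, C, O plus the side-chain atoms), which is the atom list your drawings should contain, apart from the hydrogens you add yourself.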

3) In what kind of secondary structure are these three residues? One way to find this information is to look on the SEQUENCE DETAILS page of the PDB. (NOTE: On the SEQUENCE DETAILS page, the amino acids are numbered beginning with 1. In the PDB coordinate file, the first amino acid residue is number 155. Therefore, you must subtract 154 from the PDB residue number to get the number used on the SEQUENCE DETAILS page; for example, residue 226 in the coordinate file is number 72 in SEQUENCE DETAILS.)
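The renumbering above is a fixed offset, and the coordinate file itself often lists secondary structure in HELIX and SHEET records near the top. This Python sketch does both; the function names are hypothetical, the column positions follow the PDB format specification, and the HELIX/SHEET records may not match the SEQUENCE DETAILS display exactly, so treat the result as a cross-check rather than the answer.

```python
PDB_OFFSET = 154  # 1CDW chain A: coordinate-file residue 155 is sequence position 1


def to_sequence_index(pdb_res_num: int) -> int:
    """Convert a PDB residue number to the 1-based SEQUENCE DETAILS number."""
    return pdb_res_num - PDB_OFFSET


def secondary_structure(lines, chain, res_num):
    """Return "helix", "strand", or None for a residue, using HELIX/SHEET records.

    None means the residue is not inside any HELIX or SHEET record (or the
    file has none). Assumes each helix/strand starts and ends in one chain.
    """
    for line in lines:
        if line.startswith("HELIX "):
            kind, start_chain = "helix", line[19]            # column 20
            start, end = int(line[21:25]), int(line[33:37])  # columns 22-25, 34-37
        elif line.startswith("SHEET "):
            kind, start_chain = "strand", line[21]           # column 22
            start, end = int(line[22:26]), int(line[33:37])  # columns 23-26, 34-37
        else:
            continue
        if start_chain == chain and start <= res_num <= end:
            return kind
    return None


print(to_sequence_index(226))  # 72
```

Passing the open file 1CDW.pdb, chain "A", and each of your three residue numbers to secondary_structure gives a quick check against what you read on the SEQUENCE DETAILS page.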

l) Either close the file without saving it, or save it as text only. Do not save it as a word-processing document, because programs that read PDB files, such as RasMol, expect plain text.