------------------------------------------------------------------------
Protein Data Bank Quarterly Newsletter
Release #71                                                 January 1995
------------------------------------------------------------------------
The latest version of the Electronic Deposition Form should be obtained from the FTP /pub directory before depositing data.
------------------------------------------------------------------------
January 1995 PDB Release

3091 full-release atomic coordinate entries (174 new additions)

    2869 proteins, enzymes, and viruses
     212 nucleic acids
      10 carbohydrates

 368 structure factor entries
  31 NMR experimental entries

The total size of the atomic coordinate entry database is 1003 Mbytes uncompressed.
------------------------------------------------------------------------
What is New at the PDB

During December 1994, the number of entries in the PDB passed the 3,000 mark and has now reached 3,091. The number of entries continues to rise exponentially with a 75 percent increase in the size of the PDB in 1994 alone. In parallel, the number of accesses to PDB over the Internet has been increasing as shown in the number of FTP downloads per month. This is equivalent to one download in each minute of every day, and it doesn't include the number of entries downloaded via the WWW!

To make network access easier, the PDB Browser has been modified so that it now runs on WWW client viewers such as Mosaic and is accessible through the PDB's HTML home page (http://www.pdb.bnl.gov/). This is described in the article `PDB Announces a WWW Version of the PDB Browser'. See also M.C. Peitsch, T.N.C. Wells, D.R. Stampf, and J.L. Sussman, TIBS 20, 82-84 (1995).

In parallel with our efforts at Brookhaven, a number of outside groups have created and developed tools that add enormously to the value of the data in the PDB, by examining them from different points of view.
One such project is SCOP: a Structural Classification of Proteins Database for the Investigation of Sequences and Structures developed at the MRC Laboratory of Molecular Biology and the Cambridge Centre for Protein Engineering. This project is further described in a separate article.

We encourage groups who have been developing programs or tools to extract various kinds of information from the PDB, whether for their own requirements or for general use, to send a description to the PDB (newsletter@pdb.pdb.bnl.gov). We will try to provide brief articles in this Newsletter, post them on our Listserver, and where appropriate, insert hyperlinks in our Mosaic HTML home page.

PDB intends to introduce significant changes to the format of the ATOM/HETATM records. Please pay particular attention to the article `PDB Proposes Changes to ATOM/HETATM Records' describing these changes. Your input is valued.

Users who are interested in receiving future printed copies of the PDB Newsletter should note that our mailing list is being re-initialized following the distribution of this issue. Please see the following article for additional information.

- Joel L. Sussman
------------------------------------------------------------------------
Newsletter Distribution Changes

In view of the ease of retrieving the Newsletter via Internet (FTP, Gopher, and WWW from the subdirectory /newsletter in PostScript and ASCII formats), the distribution of large numbers of printed Newsletters through the postal system seems quite wasteful, especially in this era of tight budgets. Therefore, the PDB is re-initializing the Newsletter mailing list following the distribution of this issue. Current subscribers, as well as new subscribers, who wish to receive printed copies of future Newsletters must contact us immediately. A Newsletter Request Form is available in the file news_mailinglist from the FTP /pub directory.
This form may be completed and returned electronically (send_news@pdb.pdb.bnl.gov) or via the postal system (PDB Newsletter Mailing List, Chemistry Department, Building 555, Brookhaven National Laboratory, Upton, NY 11973 USA).
------------------------------------------------------------------------
PDB Proposes Changes to ATOM/HETATM Records

PDB is planning a change in format of the ATOM and HETATM records. This modification addresses needs brought to our attention by many users. If acceptable, this change will take place in approximately two to three months. Along with other changes to our format, as discussed in the article `Revised PDB Format Description', these changes are part of our current efforts to produce PDB entries using CIF.

PDB will introduce this format change in such a way as to minimize negative impact on existing software. Numerous applications rely on the current coordinate record format, and we wish to give ample notice of this change. We need input from the community regarding this issue, so please examine the proposal carefully and send us your comments (abola1@bnl.gov).

PDB proposes to use columns 73 - 76 to identify specific segments of the molecule, and columns 77 - 80 to provide element information. Currently, columns 71 - 80 contain the entry's ID code and line number. If elimination of these data presents a problem to any programs, we need to be informed.

Columns 73 - 76 will contain the segment id (SEGID), which will identify specific segments of molecules. The segment can consist of a complete chain or a portion of a chain. The importance of this new field can be readily understood if one considers an antibody structure having two molecules in the asymmetric unit. Since each chain must have a unique chain identifier, the two heavy chains and two light chains cannot currently be labeled to indicate their nature. SEGIDs of CH, VH1, VH2, VH3, CL, and VL would clearly identify regions of the chains and the relationship between them.
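
Because the proposal is purely positional, software could pick up the new fields by slicing fixed columns. The following Python fragment is only a sketch of that idea under the column numbers quoted above (73 - 76 for SEGID, 77 - 80 for element information). The function name proposed_fields and the sample record are hypothetical and are not part of any PDB software.

```python
def proposed_fields(record: str) -> dict:
    """Slice the proposed SEGID and element fields out of one record line.

    Hypothetical helper: column numbers follow the proposal text
    (1-based columns 73-76 and 77-80), nothing else is assumed.
    """
    line = record.rstrip("\n").ljust(80)      # pad short lines so slices are safe
    name = line[0:6].strip()                  # record name, columns 1-6
    if name not in ("ATOM", "HETATM"):
        raise ValueError("not an ATOM/HETATM record")
    return {
        "record": name,
        "segid": line[72:76].rstrip(),        # columns 73-76, left justified
        "element_info": line[76:80].strip(),  # columns 77-80
    }


# Invented sample line: only the record name and the new fields are filled in.
example = "ATOM".ljust(72) + "VH1 " + " C  "
fields = proposed_fields(example)
print(fields["segid"], fields["element_info"])
```

Programs that still expect the entry's ID code and line number at the end of each record would see the SEGID and element text there instead, which is the kind of impact the request for comments is meant to surface.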
Users of X-PLOR will be familiar with SEGID as used in the refinement application of X-PLOR.

SEGID is defined as a string of at most four (4) alphanumeric characters, left justified, and can include a white space, e.g., CH86, A 1, NASE.

Columns 77 - 78 will contain the atom's element symbol, right justified, and columns 79 - 80 will indicate any charge on the atom, e.g., MN2+, O1-, H. In the past, hydrogen naming sometimes conflicted with IUPAC conventions. For example, we have had to label a hydrogen as 1HG1 rather than HG11 so that it is not confused with mercury. After adopting the format change, HG11 will be allowed in columns 13 - 16, and hydrogen will be clearly identified in columns 77 - 78. Columns 13 - 16 will continue to be used to uniquely identify each atom.

Please send your comments on these proposed changes to ATOM and HETATM records to Enrique Abola (abola1@bnl.gov). Again, if acceptable to our users, these changes will be implemented in approximately two to three months.
------------------------------------------------------------------------
PDB Release Policy

To clarify PDB's policy regarding on-hold entries, the following is included in the acknowledgment letter sent to depositors upon data acceptance:

PDB follows the IUCr guidelines which state that coordinates may be held (before release) no longer than one (1) year and structure factors may be held no longer than four (4) years from the date of publication. PDB has chosen to apply the same guidelines to NMR restraints data, allowing a maximum hold of four (4) years.

Requests that the PDB delay release of your data (put it on hold) should be submitted at the time of the initial deposition. PDB cannot consider hold requests received more than one week after the date of this acknowledgment letter. A one-time extension of up to six (6) months due to delay in publication can be requested in writing.
In no case will a coordinate data set be held for longer than eighteen (18) months from the date of deposition.

Deposition of data constitutes your acceptance of PDB's Release Policy.

Please note that information on the status of all entries, including those on hold, is available to the public. The PDB Pending and Waiting list, updated daily, is available through anonymous FTP and is searchable using WWW and Gopher. This file gives the status, including hold expiration date if the entry is on hold, of every pending entry. In addition, the full list of on-hold entries, with their hold expiration dates, is available via FTP, WWW, and Gopher. See the file named /pub/on_hold.list.
------------------------------------------------------------------------
Revised PDB Format Description

The revised PDB Format Description for Atomic Coordinate Entries has been recently released. The purpose of this document is to completely describe the contents of PDB coordinate entry files. Several changes are being introduced to PDB files to make them more explicit to the human reader and more easily computer-parseable. Additionally, once the macromolecular CIF has been adopted, these changes will pave the way for conversion to CIF.

An important enhancement to the current PDB format is in ATOM/HETATM records (see article `PDB Proposes Changes to ATOM/HETATM Records'). Additions to PDB files include:

- new record types, such as TITLE, CAVEAT, KEYWRD, CISPRO, MODRES, DBREF, and SEQADV;
- introduction of keyword:value pairs in certain records such as COMPND, SOURCE, and REMARK 3;
- further detailing of the heterogen groups with the new records HETNAM, HETSYN, and HETSIT (list of residues in very close proximity to a given heterogen);
- the deprecation of footnotes; and
- restructuring of some REMARK records to make them more machine-accessible.

This revised Format Description will be helpful to several communities.
It will assist depositors in preparing entries for deposition, guide software and information resource developers, and help users of the PDB understand the contents of coordinate entries. Ultimately, this document and the enhanced entry format will facilitate the conversion of PDB files into CIF.

This document is available in several formats (HTML, plain text, and PostScript) from FTP, Gopher, and WWW in the /pub directory. A more advanced HyperText version is being designed for later release.

A convenient way to peruse the Format Description is to use Mosaic or another WWW interface. Access the PDB home page by opening the URL http://www.pdb.bnl.gov/. Move to "PDB Format Description" where you'll see several choices, including:

- Table of Contents
  Advances you to the pertinent page of the document.
- The Format Description Document
  Brings up the entire document.
- Keyword Search of The Format Description
  Allows you to locate a specific word within the document.

A sample page from the revised PDB Format Description follows:

======================================================================
CRYST1

Overview:

The CRYST1 records present the unit cell parameters, space group, and Z value. This record is present even if the structure was not determined by crystallographic means, in which case it simply defines a unit cube.

Record Format:

COLUMNS    DATA TYPE      FIELD       DEFINITION
 1 -  6    Record name    "CRYST1"
 7 - 15    Real           a           a (Angstroms)
16 - 24    Real           b           b (Angstroms)
25 - 33    Real           c           c (Angstroms)
34 - 40    Real           alpha       alpha (degrees)
41 - 47    Real           beta        beta (degrees)
48 - 54    Real           gamma       gamma (degrees)
56 - 66    LString(11)    sGroup      Space group
67 - 70    Integer        z           Z-value

Details:

- If the coordinate entry describes a structure determined by a technique other than crystallography, CRYST1 will contain a=b=c=1.0, alpha=beta=gamma=90, space group P 1, and z=1.
- The Z-value is the number of polymeric chains in a unit cell. In the case of heteropolymers, Z is the number of occurrences of the most populous chain.
- As an example, given two chains A and B, each with a different sequence, and the space group P 2 that has 2 equipoints in the standard unit cell, the following table gives the correct Z-value.

      Asymmetric Unit Content    Z-value
      A                          2
      AA                         4
      AB                         2
      AAB                        4
      AABB                       4

Verification/Validation/Value Control Authority:

The given space group and Z-values are checked during processing for correctness and internal consistency. The calculated SCALE is compared to that supplied by the depositor. Packing is also computed, and close contacts of symmetry-related molecules are diagnosed.

Relationships to Other Record Types:

The unit cell parameters are used to calculate SCALE. If the EXPDTA record is NMR, FIBER DIFFRACTION, or THEORETICAL MODEL, the CRYST1 record is predefined as a=b=c=1.0, alpha=beta=gamma=90, space group P 1, and z=1. In these cases, an explanatory REMARK will also appear in the entry.

Deposition Form Section and Prompt:

CRYSTAL (CRYST1)
    a (A) :
    b (A) :
    c (A) :
    alpha (deg) :
    beta (deg) :
    gamma (deg) :
    space group :
    space group number :
    Z-value :
    Text to explain unusual unit-cell data :
    Symmetry operations for non-standard setting :

CIF Equivalent:

From the core CIF Dictionary:
-----------------------------
data_cell_[]
;
    Data items in the _cell_ category record details about the
    crystallographic cell parameters.
;
data_cell_angle_
    loop_ _name    '_cell_angle_alpha'
                   '_cell_angle_beta'
                   '_cell_angle_gamma'
data_cell_formula_units_Z
data_cell_length_
    loop_ _name    '_cell_length_a'
                   '_cell_length_b'
                   '_cell_length_c'
data_cell_special_details
data_cell_volume
    _definition

Examples:

         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
CRYST1   52.000   58.600   61.900  90.00  90.00  90.00 P 21 21 21    8
CRYST1    1.000    1.000    1.000  90.00  90.00  90.00 P 1           1
======================================================================
------------------------------------------------------------------------
PDB Announces a WWW Version of the PDB Browser

Previous Newsletter articles described a GUI-based browser utility running under tcl/tk for searching the PDB archives. This browser provided the ability to search text portions of entries using arbitrary regular expressions. Additionally, the browser incorporated graphical tools such as RASMOL and MidasPlus to view selected molecules. The browser was written in a modular fashion, permitting replacement of the search, front-end, or display mechanisms.

Recently, a WWW version of the browser was made available to the user community (http://www.pdb.bnl.gov/cgi-bin/browse). This replaces the front end of the browser with popular WWW viewer programs such as Mosaic or Netscape. The WWW front end provides most of the functionality of the original browser while adding the following benefits:

- All searching and access is over the network. One need not install Perl, tcl, and tk. (But on general principles, you should!)
- The archive being searched is the up-to-date PDB (or a mirrored copy!)
- Hypertext links are provided in the PDB file display to the Enzyme Data Bank as well as to the sequence databases.
- Hypertext links are provided to use the default display program (e.g., RasMol) and pictures from M. Peitsch.
- PDB access is provided for PC, Macintosh, and Unix computers.

The primary drawback is that the search of the remark fields has been removed because of the time needed to complete the searches and the network time-outs that resulted.

Short-term plans include replacing search scripts with a SYBASE database engine, increasing linkages to sequence and reference databases, and taking advantage of video and audio capabilities of WWW. The server source code is also being exported to PDBSA Affiliated Centers, so that the user community can take advantage of the best network links available.
------------------------------------------------------------------------
NMR Depositors

PDB would like to remind depositors of NMR entries including multiple models that all models should be presented in a common aligned orientation corresponding to the alignment shown in any related publications.
------------------------------------------------------------------------
Use Care When Depositing Data

Occasionally a new set of coordinates received at PDB contains errors in the data. For example, we may find an atom distant from the rest of its residue. Sometimes we find that errors have been introduced by a depositor doing last-minute hand-editing of the data. We recommend that depositors do a final check of each coordinate file immediately before sending it to the PDB. In addition, authors should carefully check the proposed entry that we send them, and consider all points mentioned in our accompanying letter. Viewing the molecule on a graphics terminal also may help in detecting errors.
------------------------------------------------------------------------
SCOP: a Structural Classification of Proteins Database for the Investigation of Sequences and Structures

This article was written by Alexey G. Murzin, Steven E. Brenner, Tim Hubbard, and Cyrus Chothia, MRC Laboratory of Molecular Biology and Cambridge Centre for Protein Engineering, Hills Road, Cambridge CB2 2QH, England.
It describes a database that complements the PDB and should be of interest to the user community.

Presently, the PDB contains 3091 entries and the number is increasing by about seventy-five per month. To facilitate the understanding of, and access to, this information, we have constructed the Structural Classification of Proteins (SCOP) database. This database provides a detailed and comprehensive description of the structural and evolutionary relationships of proteins whose three-dimensional structures have been determined. It includes all proteins in the current version of the PDB and many proteins whose structures have been published but whose coordinates are not available from the PDB.

The classification of the proteins, or the individual domains in the case of large proteins, is on hierarchical levels that embody the following evolutionary and structural relationships:

- Family
  Proteins are clustered together into families on the basis of clear evidence for their having a common evolutionary origin.
- Superfamily
  Families whose proteins have low sequence identities but whose structures and functional features suggest that a common evolutionary origin is probable are placed together in superfamilies.
- Common Fold
  Superfamilies and families are defined as having a common fold if their proteins have the same major secondary structures in the same arrangement and with the same topological connections.
- Class
  For the convenience of users, the different folds have been grouped into classes. Most of the folds are assigned to the All-alpha, All-beta, alpha and beta, or alpha plus beta classes.

Each entry (for which coordinates are available) has links to images of the structure, interactive molecular viewers, the atomic coordinates, sequence data, homologues, and MEDLINE abstracts.

Two major searching facilities are currently available.
Homology searching permits users to enter a sequence and obtain a list of structures which have significant levels of sequence similarity. Keyword searching finds matches from both the text of the SCOP database and the headers of PDB structure files.

The SCOP database is available as a set of tightly coupled hypertext pages on WWW. This allows it to be accessed by any machine on the Internet (including Macintoshes, PCs, and workstations) using free WWW reader programs, such as Mosaic. Once such a program has been started, it is necessary only to open URL http://scop.mrc-lmb.cam.ac.uk/scop/ to obtain the home page of the database.

The SCOP database was originally created as a tool for understanding protein evolution through sequence-structure relationships and determining if new sequences and new structures are related to previously known protein structures. In a more general way, the highest levels of classification provide an overview of the diversity of protein structures now known, and would be appropriate both for researchers and students. The specific lower levels should be helpful for comparing individual structures with their evolutionary and structurally-related counterparts. In addition, the search capabilities with their easy access to data and images make SCOP a powerful general-purpose interface to the PDB.

For comments or additional information, please contact Steven Brenner (scop@www.bio.cam.ac.uk).
------------------------------------------------------------------------
The EBI NetNews Filtering Service

This article was written by Jose R. Valverde and Rob Harper, European Bioinformatics Institute, an EMBL Outstation, Hinxton Hall, Hinxton, Cambridge CB10 1RQ, England. It describes one way to access Usenet News that may be of interest to users who are overwhelmed with the information available.

Usenet News is possibly the most popular tool used today by researchers for efficient communication.
It provides fast dissemination of news, fast on-line help, and a free forum for scientific discussion. However, the newsgroups either tend to be too specialized for interdisciplinary research (thus requiring a scientist to follow many groups), or they address too broad an interest (which results in information overload).

The NetNews Filtering Service provides an easy way in which researchers can filter newsgroups automatically so that they will only receive those articles in which they are interested. A user can define one or several independent profiles that describe his or her fields of interest by means of keywords. The server will then periodically check the profiles against all the newsgroups in Bionet, EMBnet, and Sci and select only those articles that best reflect the interests of the user. The filter makes a preliminary review and selects a small number of articles, and, if the user finds any of these especially interesting, the server can be instructed to mail the full contents.

The EBI NetNews Filtering Service can be accessed in either of two ways: giving commands by e-mail or through an easy graphical user interface using a WWW client (e.g., NCSA Mosaic). In the first case a user can start by sending an e-mail message to netnews@ebi.ac.uk with no subject and a body consisting of a single line with the word `help'. Working with Mosaic or another WWW browser, a user can select the URL http://www.ebi.ac.uk which will connect to our server and then select `Documentation Software and Services' from the main menu. In both cases the user can get on-line help, look at examples, and make tests before subscribing.

For comments or additional information, please contact Jose R. Valverde (Jose.Valverde@ebi.ac.uk).
------------------------------------------------------------------------
MakeMolS

This article was written by Liisa Holm, European Molecular Biology Laboratory, Heidelberg, Germany.
It describes a tool that may be of interest to a wide cross section of PDB users.

MakeMolS is a simple tool to facilitate the generation of input scripts to MolScript [P. Kraulis, J. Appl. Cryst. 24, 946-950 (1991)], a popular program for creating molecular graphics in PostScript form. MolScript supports a variety of representations of protein structures, including Jane Richardson-type schematic ribbon drawings to highlight secondary structure elements. The sole functionality of MakeMolS is to read the secondary structure definitions from a DSSP file [W. Kabsch and C. Sander, Dictionary of Protein Secondary Structure: Pattern Recognition of Hydrogen-Bonded and Geometrical Features. Biopolymers 22, 2577-2637 (1983)] and write them out in the form of MolScript instructions for a ribbon drawing.

The DSSP program and database are available from the URL http://www.sander.embl-heidelberg.de/dssp/ and from anonymous FTP (ftp.embl-heidelberg.de) in the following directories: /pub/databases/protein_extras/dssp and /pub/software/unix/dssp.

The Fortran source code of MakeMolS may be obtained from the PDB FTP directory /pub/program_tape/makemols.f. The user may further modify the scripts in a text editor to exploit the full spectrum of options provided by MolScript.

For comments or additional information, please contact Liisa Holm (HOLM@embl-heidelberg.de).
------------------------------------------------------------------------
ZINC - Galvanizing CIF to Work with Unix

The July 1994 Newsletter described the basic structure of a CIF (Crystallographic Information File), the standard that will probably become the interchange format of the future for the PDB due to increased amounts of information that it can contain. It was also pointed out that those who are accustomed to working with the PDB format in a Unix environment (with grep, awk, perl, diff, etc.) will not be able to use those skills in dealing directly with CIF.
This article describes a format called ZINC (Zinc Is Not CIF) which is fully accessible to Unix tools, and a number of utilities that allow a CIF to be converted into a ZINC and back again, as well as versions of some familiar Unix tools (grep, diff) and some surprising new tools (zincSubset and zincNl) that should make access to CIF much easier.

What is the Problem with CIF in a Unix Environment?

CIF defines a format that is generally at odds with Unix tools.

- Many Unix tools are line oriented: they expect related information to be on a single line, whereas CIF allows and encourages the use of multiple lines, both by limiting the line length and in the definition of loops.
- Many Unix tools break the lines into fields based on a separator character. Both PDB and CIF formats work against this, the former by having column-based fields, the latter by encouraging the liberal use of white space (across line boundaries).
- A number of Unix tools treat the information in files as being position dependent (diff, head, tail, etc.). The PDB file format adapted neatly to this, but CIF allows a much wider variation in the placement of data.

- ZINC

ZINC is not an interchange format as CIF is, but rather a piping format, i.e., a format that makes the contents of a CIF accessible to Unix utilities. Each data line of a ZINC file consists of five tab-separated fields:

    block    name    index    value    loop-id

The first field is the name of the CIF data block (the data_ prefix is omitted) and is repeated on each line where appropriate. The second field is the name of the data item. The third field is an index specifier, which is empty for non-looped data, and is a zero-based index for looped data. The fourth field is the data item itself. For multiple-line CIF data, new line characters are replaced by the two characters \n. The backslash character becomes the escape character throughout the ZINC format. The fifth field is a loop identifier.
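
The five-field layout just described can be read with a few lines of code. The Python fragment below is a minimal sketch, not part of the ZINC distribution: the names ZincItem, unescape, and parse_zinc_line are hypothetical, and the escape handling is an assumption (a backslash followed by n stands for a new line, and a backslash followed by any other character stands for that character), since the formal definition is not reproduced in this article.

```python
from typing import NamedTuple, Optional


class ZincItem(NamedTuple):
    block: str             # data block name, without the data_ prefix
    name: str              # data item name
    index: Optional[int]   # None for non-looped data, zero-based otherwise
    value: str             # data item, with escapes undone
    loop_id: str           # loop identifier, empty for non-looped data


def unescape(text: str) -> str:
    """Undo backslash escapes (assumed rules, see the note above)."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            out.append("\n" if nxt == "n" else nxt)
        else:
            out.append(ch)
    return "".join(out)


def parse_zinc_line(line: str) -> ZincItem:
    """Split one ZINC data line into its five tab-separated fields."""
    fields = line.rstrip("\n").split("\t")
    fields += [""] * (5 - len(fields))    # tolerate missing trailing fields
    block, name, index, value, loop_id = fields[:5]
    return ZincItem(block, name, int(index) if index else None,
                    unescape(value), loop_id)


item = parse_zinc_line("object\t_x\t1\t1.0\t_x")
print(item.name, item.index, item.value)
```

Because every line carries its own block name, data name, and index, a reader like this needs no state between lines, which is what makes the format convenient for grep, awk, and similar tools.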
Comments that appear in a CIF are associated with the previous token and are also represented in the ZINC format.

- Existing Tools

A number of tools have been developed to support the user community in using the ZINC format and to access the information contained in a CIF. Most are simple and allow users to modify the code to tackle new problems. Source is provided for all.

- cifZinc, obviously the first required tool, converts an existing CIF into a ZINC. This is a C program that converts even the largest CIF in a few seconds.

- zincCif, the next most important tool, takes a ZINC and creates a pretty-printed CIF. Very often the pipeline:

      cifZinc old.cif | zincCif > new.cif

  will produce a better looking CIF than the original.

- zincGrep greps a ZINC (or a CIF file specified as a command-line argument) for a regular expression and returns the block name, data name, index, and value of the match. This is the single most requested tool for dealing with CIF.

- cifdiff is a four-line C-shell script that takes two CIFs and determines the differences between them. It will handle CIFs that have been rearranged, and even loops with rearranged columns, and provide only the real differences.

- zb, a small (fewer than 200 lines) tcl/tk program, provides a simple GUI front end to a ZINC or CIF and allows users to browse through the contents. Multiple files as well as multiple data blocks can be viewed simultaneously on any X terminal. zb recognizes command-line argument file names in the form *.cif as being a CIF and converts it to a ZINC automatically.

- zincSubset is another C-shell script that is very short but very useful. It allows users to generate a custom subset of any ZINC (CIF), simply by listing the data blocks and data names that they wish to include. For example, if you wanted to extract only the names and definitions from the mmCIF dictionary, you would create a file (e.g., defs) with two lines that appear as:

      _name
      _definition

  (preceded and followed by tabs) and run the command:

      zincSubset defs mmcif94 | zincCif

- zincNl, a perl script, takes a ZINC file and creates a Fortran compatible namelist file allowing easy access to any CIF by Fortran programs without the need for extensive I/O libraries or reprogramming. As with zb above, it will automatically convert a CIF to a ZINC. It may be used in the following pipeline which extracts the coordinates from a CIF and presents them to a Fortran program via the namelist mechanism:

      zincSubset coords datafile | zincNl | myfortran

  (where coords is a file that simply has three lines with x, y, and z surrounded by tabs).

- A Sample CIF and ZINC

The following CIF illustrates most aspects involved in translating to a ZINC:

    #
    # A simple CIF
    #
    data_object
    #
    # polygon
    #
    _name
    ;
    triangle
    ;
    loop_
    _x
    _y
    0.0 0.0
    1.0 0.0
    0.0 1.0
    _num_sides 3

In ZINC, this would appear as (fields are tab-separated; they are aligned here for readability):

              (             0    #
              (             1    # A simple CIF
              (             2    #
    object    (             3    #
    object    (             4    # polygon
    object    (             5    #
    object    _name              ;\ntriangle\n;
    object    _x            0    0.0               _x
    object    _y            0    0.0               _x
    object    _x            1    1.0               _x
    object    _y            1    0.0               _x
    object    _x            2    0.0               _x
    object    _y            2    1.0               _x
    object    _num_sides         3

Note that comments `belong' to a data block and are represented with an open parenthesis for the data name. Initially, the data block name is defined to be the null string.

- Code

The formal definitions of ZINC and the above mentioned programs are available from PDB using FTP, Gopher, or WWW (ftp://pub/other-software/Zinc). Please give it a try. All are invited to submit their favorite scripts that use ZINC. Any comments should be directed to Dave Stampf (drs@bnl.gov).
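
As an illustration of why a line-per-item format lends itself to order-independent comparison, the Python fragment below compares two ZINC streams by keying every data line on its block name, data name, and index. It is only a sketch of the idea and is not the cifdiff script, whose contents are not shown in this article; the function names zinc_index and zinc_diff are hypothetical, and skipping comment lines is a choice made here for brevity.

```python
def zinc_index(lines):
    """Map (block, name, index) -> value for each data line of a ZINC stream."""
    table = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        fields += [""] * (4 - len(fields))   # tolerate short lines
        block, name, index, value = fields[:4]
        if name == "(":                      # comment lines use "(" as the name
            continue
        table[(block, name, index)] = value
    return table


def zinc_diff(old_lines, new_lines):
    """Return (removed, added, changed) keys, ignoring the order of lines."""
    old, new = zinc_index(old_lines), zinc_index(new_lines)
    removed = sorted(key for key in old if key not in new)
    added = sorted(key for key in new if key not in old)
    changed = sorted(key for key in old if key in new and old[key] != new[key])
    return removed, added, changed


old = ["object\t_x\t0\t0.0\t_x", "object\t_num_sides\t\t3\t"]
new = ["object\t_num_sides\t\t4\t", "object\t_x\t0\t0.0\t_x",
       "object\t_y\t0\t0.0\t_x"]
print(zinc_diff(old, new))
```

Since each value is looked up by name and index rather than by position, reordered lines and reordered loop columns compare as equal. Reordered loop rows would still show up as changes in this sketch, because the index of each row is part of the key.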
------------------------------------------------------------------------
Mirroring

Earlier this year, PDB started updating the FTP server with fully-released entries on a frequent basis, typically every few weeks. PDB users often ask how they can keep their local archives up to date without having to continually check the current holdings of the PDB. Fortunately, a public domain package called mirror exists that does exactly this. This program is a perl script that runs on your system and periodically creates an FTP connection to the PDB, determines the difference between your local archive and the PDB (including deletions!), and performs all necessary tasks needed to make them equivalent.

We have used this program with the Weizmann Institute of Science in Israel, Turku University in Finland, the European Molecular Biology Laboratory in Germany, and the European Bioinformatics Institute in England with great success even over very poor network connections. We encourage the members of our user community who maintain complete local archives to do the same.

If you wish to try this, here is what you need to do:

0) Think: Decide what your local archive will look like. Will it be a virtual image of the PDB? Will it only hold compressed files? Will it use the all_files in one directory scheme, or the 2-character directory scheme? Your space limitations and your usage patterns will determine what is best for you.

1) Prepare: Set up your local archive from a 1994 version of the PDB CD-ROM. Be sure the dates on the files match those on the CD. In order to copy a directory on the CD to a local disk without changing the dates, use the following tar pipeline:

       mkdir /usr/distr      # or where you want the files to go
       cd /CDROM/distr       # or where you are copying the files from
       tar cf - . | (cd /usr/distr; tar xvf -)

2) Get mirror: Copy the mirror software from PDB or elsewhere (/pub/other-software/Mirror). There is nothing to compile, but you must have already installed perl and have run the h2ph perl script. Install the mirror program in one of the standard locations.

3) Configure: Adapt a copy of the mirror.defaults configuration file. A sample of this file is also on the PDB FTP server.

4) TEST!!! If you run mirror -n it will tell you what it will do without doing it. If you are about to transfer a gigabyte over a megabit line, you may wish to reconsider.

5) When you are satisfied, make the mirror run as part of your crontab and forget about it. Your local archive will be kept up to date `automagically'!

As the mirror README says, `Objects in the mirror are closer than they appear!'
------------------------------------------------------------------------
Access to PDB

- World Wide Web (WWW)

PDB has a World Wide Web (WWW) server on the computer system www.pdb.bnl.gov (130.199.144.1). This server is accessible using the document URL http://www.pdb.bnl.gov/. Besides including links to the PDB FTP and Gopher servers, the WWW server includes links to many other useful databases and information servers.

- Gopher

PDB has a Gopher server on the system gopher.pdb.bnl.gov (130.199.144.1). This server is accessible using a Gopher client connecting to the following link:

    Name = Protein Data Bank FTP server
    Type = 1
    Host = gopher.pdb.bnl.gov
    Port = 70
    Path = 1/

With a Gopher client, you may navigate through a hierarchy of directories and documents or ask an index server to return a list of all documents that contain one or more specified words. For instance, you can choose `The PDB Anonymous FTP' after reaching PDB's Gopher server in order to search and download the same information and coordinate files as through FTP. Alternatively, you can select `An (almost) full-text search of the PDB Bibliographic Headers' in order to search PDB using any keyword.

- FTP

PDB has an anonymous FTP account on the computer system ftp.pdb.bnl.gov (Internet address 130.199.144.1).
Files may be transferred to and from this system using anonymous as the
FTP user name and your e-mail address as the password. Besides
downloading entries, data files, and documentation, it is possible to
upload any files that you may wish to send to PDB into the special
directory /new_uploads. Those using VMS may need to place quotes around
file names.

- Listserv

PDB has a mailing list devoted to discussions concerning its operation,
contents, and access procedures. To subscribe, send e-mail to
listserv@pdb.pdb.bnl.gov with the one-line message:

      subscribe PDB-L Firstname Lastname

To find out what can be done with this mailing list, send e-mail to the
same address (listserv@pdb.pdb.bnl.gov) with the one-line message:

      help

To send a message to all PDB-L subscribers, e-mail the message to
PDB-L@pdb.pdb.bnl.gov.

------------------------------------------------------------------------
AFFILIATED CENTERS

Twenty-two affiliated centers offer DATAPRTP information for
distribution. These centers are members of the Protein Data Bank
Service Association (PDBSA). Centers designated with an asterisk (*)
may distribute DATAPRTP information both on-line and on magnetic or
optical media; those without an asterisk are on-line distributors only.

BMERC
    BioMolecular Engineering Research Center
    College of Engineering, Boston University
    Boston, Massachusetts
    Kathleen Klose (617-353-7123)
    klose@darwin.bu.edu

*BIOSYM
    BIOSYM Technologies, Inc.
    San Diego, California
    Laurel Frey (619-546-5509)
    rcenter@biosym.com or laurel@biosym.com

CAN/SND
    Canadian Scientific Numeric Data Base Service
    Ottawa, Ontario, Canada
    Roger Gough (613-993-3294)
    cansnd@vm.nrc.ca

CAOS/CAMM
    Dutch National Facility for Computer Assisted Chemistry
    Nijmegen, The Netherlands
    Jan Noordik (+1 31-80-653386)
    noordik@caos.caos.kun.nl

*CCDC
    Cambridge Crystallographic Data Centre
    Cambridge, United Kingdom
    David Watson (+1 44-223-336394)
    dgw1@chemcrys.cam.ac.uk

CSC
    CSC Scientific Computing Ltd.
    Espoo, Finland
    Heikki Lehvaslaiho (+1 358-0-457-2076)
    heikki.lehvaslaiho@csc.fi

CINECA
    NE Italy Interuniversity Computing Center
    Casalecchio di Reno (BO), Italy
    Laura Setti (+1 39-51-6599478)
    asltc0@icineca.cineca.it

ICGEB
    International Centre for Genetic Engineering and Biotechnology
    Trieste, Italy
    Sandor Pongor (+1 39-40-3757300)
    pongor@icgeb.trieste.it

EMBL
    European Molecular Biology Laboratory
    Heidelberg, Germany
    Hans Doebbeling (+1 49-6221-387-247)
    hans.doebbeling@embl-heidelberg.de

INN
    Israeli National Node
    Weizmann Institute of Science
    Rehovot, Israel
    Leon Esterman (+1 972-8-343934)
    lsestern@weizmann.weizmann.ac.il

*JAICI
    Japan Association for International Chemical Information
    Tokyo, Japan
    Hideaki Chihara (+1 81-3-5978-3608)

*MAG
    Molecular Applications Group
    Palo Alto, California
    Hilary Jensen (415-473-3039)
    hilary@suerte.mag.com

*MSI
    Molecular Simulations Inc.
    Burlington, Massachusetts
    Lance J. Ransom Wright (617-229-9800)
    lance@msi.com

NCHC
    National Center for High-Performance Computing
    Hsinchu, Taiwan, ROC
    Jyh-Shyong Ho (+1 886-35-776085; ex: 342)
    c00jsh00@nchc.gov.tw

NCSA
    National Center for Supercomputing Applications
    University of Illinois at Urbana-Champaign
    Champaign, Illinois
    Patricia Carlson (217-244-0768)
    pcarlson@ncsa.uiuc.edu

National Center for Biotechnology Information
    National Library of Medicine
    National Institutes of Health
    Bethesda, Maryland
    Stephen Bryant (301-496-2475)
    bryant@ncbi.nlm.nih.gov

*OML
    Oxford Molecular Ltd.
    Oxford, United Kingdom
    Steve Gardner (+1 44-865-784600)
    steve@gardner.demon.co.uk

*Osaka University
    Institute for Protein Research
    Osaka, Japan
    Yoshiki Matsuura (+1 81-6-879-8605)
    matsuura@protein.osaka-u.ac.jp

Pittsburgh Supercomputing Center
    Pittsburgh, Pennsylvania
    Hugh Nicholas (412-268-4960)
    nicholas@cpwpsca.psc.edu

SDSC
    San Diego Supercomputer Center
    San Diego, California
    Lynn Ten Eyck (619-534-8189)
    teneyckl@sdsc.edu

SEQNET
    Daresbury Laboratory
    Warrington, United Kingdom
    User Interface Group (+1 44-925-603351)
    uig@daresbury.ac.uk

*Tripos
    Tripos, Inc.
    St. Louis, Missouri
    Akbar Nayeem (314-647-1099; ex: 3224)
    akbar@tripos.com

------------------------------------------------------------------------
Protein Data Bank
Chemistry Department, Bldg. 555
Brookhaven National Laboratory
P.O. Box 5000
Upton, NY 11973-5000 USA

------------------------------------------------------------------------
To Contact PDB

Telephone    +1 516-282-3629
Facsimile    +1 516-282-5751

Internet:
    pdb@bnl.gov                   general correspondence
    orders@pdb.pdb.bnl.gov        order information
    sysadmin@pdb.pdb.bnl.gov      network services
    listserv@pdb.pdb.bnl.gov      Listserver subscriptions
    pdb-l@pdb.pdb.bnl.gov         Listserver postings
    errata@pdb.pdb.bnl.gov        entry error reporting

Please include your name, postal mailing address, e-mail address,
facsimile number, and telephone number in all correspondence.

------------------------------------------------------------------------
Statement of Support

PDB is supported by a combination of Federal Government Agency funds
(work supported by the U.S. National Science Foundation; the U.S.
Public Health Service, National Institutes of Health, National Center
for Research Resources, National Institute of General Medical Sciences,
and National Library of Medicine; and the U.S. Department of Energy
under contract DE-AC02-76CH00016) and user fees.

------------------------------------------------------------------------
PDB Staff

Joel L. Sussman, Head
David R. Stampf, Sr. Project Mgr.
Enrique E. Abola, Science Coordinator
Frances C. Bernstein
Judith A. Callaway
Minette Cummings
Betty R. Deroski
Pamela A. Esposito
Arthur Forman
Patricia A. Langdon
Michael D. Libeson
Nancy O. Manning
John E. McCarthy
Regina K. Shea
John G. Skora
Karen E. Smith
Dejun Xue

------------------------------------------------------------------------
------------------------------------------------------------------------