-----------------------------------------------------------------------
Protein Data Bank Quarterly Newsletter
Release #70
October 1994
-----------------------------------------------------------------------

Table of Contents

What is New at the PDB
ZINC - Galvanizing CIF to Work with Unix
  - ZINC
  - Existing Tools
  - A Sample CIF and ZINC
  - Code
Mirroring
Managing the Archives
  - Keeping Up with Changes
  - Informing PDB of Errors and Updating Entries
  - New Record Types
  - Record Descriptions
  - HET Dictionary
  - Whom to Contact
New Electronic Deposition Form
User Group News
WWW Access to PDB
Release Schedules
Discontinuation of Tape Distribution
Revised Newsletter Format
Newsletter Distribution Changes
New Entry Tracking System
Access to PDB
  - FTP
  - Gopher
  - WWW
  - Listserv
Affiliated Centers

-----------------------------------------------------------------------
The latest version of the Electronic Deposition Form should be obtained from the FTP /pub directory before depositing data.
-----------------------------------------------------------------------

October 1994 PDB Release

2921 full-release atomic coordinate entries (249 new additions)
     2703 proteins, enzymes and viruses
      208 (DNA, RNA, tRNA)
       10 carbohydrates
 358 structure factor entries
  31 NMR experimental entries

The total size of the atomic coordinate entry database is 944 Mbytes uncompressed.

-----------------------------------------------------------------------
What is New at the PDB

There has been a true revolution in the field of Structural Biology during the past few years, resulting in an enormous increase in the number of laboratories doing structural studies of biological macromolecules to atomic resolution as well as in the rate that these structures can be determined.
This is due in part to great improvements in molecular biology techniques and X-ray detectors, greater availability of synchrotrons for protein crystallography, improved tools for refinement and map fitting, and improvements in NMR techniques which make possible structure determination from non-crystalline samples. The resulting avalanche of new biomolecular structural data presents a major challenge for the PDB, and the major revolution in computer networking and database hyperlinking over the past couple of years represents the most promising way to deal with it effectively. The PDB has been very active in this area. Over the past few years, an anonymous FTP server and archive, and later a gopher hole, were set up at the PDB. In August 1994, a World Wide Web (WWW) home page was set up (http://www.pdb.bnl.gov/). This new way of communicating via the Internet, recently described in detail [Schatz and Hardin (1994), "NCSA Mosaic and the World Wide Web: Global Hypermedia Protocols for the Internet", Science 265, 895-901], is at present the most convenient way to access the PDB as well as many other biological and chemical databases. Communicating via Mosaic provides a very friendly user front end that allows easy access to, and retrieval of files from, the PDB as well as search capabilities for new entries which are pending or on hold. As described in our July 1994 Newsletter, several hundred spectacular images of PDB structures in RGB and GIF formats are available from the PDB FTP server. Created by Dr. Manual Peitsch of the Glaxo Institute for Molecular Biology in Geneva, Switzerland, each one depicts a key aspect of the entry or experiment that it represents. In the future, we plan to add hyperlinks to other biological and chemical databases via the WWW server, as well as a home page for the PDB-Browser to replace its current tcl/tk-based front end. - Joel L. 
Sussman ----------------------------------------------------------------------- ZINC - Galvanizing CIF to Work with Unix The July 1994 Newsletter described the basic structure of a CIF (Crystallographic Information File) and the standard that will probably become the interchange format of the future for the PDB due to increased amounts of information that it can contain. It was also pointed out that those who are accustomed to working with the PDB format in a Unix environment (with grep, awk, perl, diff, etc.) will not be able to use those skills in dealing directly with CIF. This article describes a format called ZINC (Zinc Is Not CIF) which is fully accessible to Unix tools and a number of utilities that allow a CIF to be converted into a ZINC and back again, as well as versions of some familiar Unix tools (grep, diff) and some surprising new tools (zincSubset and zincNl) that should make access to CIF much easier. What is the Problem with CIF and Unix? CIF defines a format that is generally at odds with Unix tools. - Many Unix tools are line oriented - they expect related information to be on a single line whereas CIF allows and encourages the use of multiple lines, both by limiting the line length and in the definition of loops. - Many Unix tools break the lines into fields based on a separator character. Both PDB and CIF formats work against this, the former by having column based fields, the latter by encouraging the liberal use of white space (across line boundaries). - A number of Unix tools treat the information in files as being position dependent (diff, head, tail, etc.). The PDB file format adapted neatly to this, but CIF allows a much wider variation in the placement of data. - ZINC Zinc is not an interchange format as CIF is, but rather a piping format, i.e., a format that makes the contents of a CIF accessible to Unix utilities. 
Each data line of a ZINC file consists of five tab-separated fields: block name index value loop-id The first field is the name of the CIF data block (the data_ prefix is omitted) and is repeated on each line where appropriate. The second field is the name of the data item. The third field is an index specifier, which is empty for non-looped data, and is a zero-based index for looped data. The fourth field is the data item itself. For multiple line CIF data, new lines are replaced by the two characters \n. The backslash character becomes the escape character throughout the ZINC format. The fifth field is a loop identifier. Comments that appear in a CIF are associated with the previous token and are also represented in the ZINC format. - Existing Tools A number of tools have been developed to support the user community in using the ZINC format and to access the information contained in a CIF. Most are simple and allow users to modify the code to tackle new problems. Source is provided for all. - cifZinc, obviously the first required tool, converts an existing CIF into a ZINC. This is a C program that converts even the largest CIF in a few seconds. - zincCif, the next most important tool, takes a ZINC and creates a pretty-printed CIF. Very often the pipeline: cifZinc old.cif | zincCif > new.cif will produce a better looking CIF than the original. - zincGrep, greps a ZINC (or a CIF file specified as a command line argument) for a regular expression and returns the block name, data name, index, and value of the match. This is the single most requested tool for dealing with CIF. - cifdiff is a four-line C-shell script that takes two CIFs and determines the differences between them. It will handle CIFs that have been rearranged, and even loops with rearranged columns, and provide only the real differences. - zb, a small (< 200 lines) tcl/tk program that provides a simple GUI front end to a ZINC or CIF, allows users to browse through the contents. 
Multiple files as well as multiple data blocks can be viewed simultaneously on any X terminal. zb recognizes command line argument file names in the form *.cif as being a CIF and converts it to a ZINC automatically.

- zincSubset is another C-Shell script that is very short but very useful. It allows users to generate a custom subset of any ZINC (CIF), simply by listing the data blocks and data names that he or she wishes to include. For example, if you wanted to extract only the names and definitions from the mmCIF dictionary, you would create a file (e.g., defs) with two lines that appear as:

    _name
    _definition

(preceded and followed by tabs) and run the command:

    zincSubset defs mmcif94 | zincCif

- zincNl, a perl script, takes a ZINC file and creates a FORTRAN compatible namelist file allowing easy access to any CIF by FORTRAN programs without the need for extensive I/O libraries or reprogramming. As with zb above, it will automatically convert a CIF to a ZINC. It may be used in the following pipeline which extracts the coordinates from a CIF and presents them to a FORTRAN program via the namelist mechanism:

    zincSubset coords datafile | zincNl | myfortran

(where coords is a file that simply has three lines with x, y, and z surrounded by tabs).

- A Sample CIF and ZINC

The following CIF illustrates most aspects involved in translating to a ZINC:

    #
    # A simple CIF
    #
    data_object
    #
    # polygon
    #
    _name
    ;
    triangle
    ;
    loop_
    _x _y
    0.0 0.0
    1.0 0.0
    0.0 1.0
    _num_sides 3

In ZINC, this would appear as:

              (            0    #
              (            1    # A simple CIF
              (            2    #
    object    (            3    #
    object    (            4    # polygon
    object    (            5    #
    object    _name             ;\ntriangle\n;
    object    _x           0    0.0                 _x
    object    _y           0    0.0                 _x
    object    _x           1    1.0                 _x
    object    _y           1    0.0                 _x
    object    _x           2    0.0                 _x
    object    _y           2    1.0                 _x
    object    _num_sides        3

Note that comments "belong" to a data block and are represented with an open parenthesis for the data name. Initially, the data block name is defined to be the null string.
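The five-field layout described above is simple enough to read without the supplied tools. The following Python sketch is not part of the ZINC distribution; the names ZincRecord, unescape, and parse_zinc_line are hypothetical, and the treatment of escapes other than \n (a backslash followed by any other character yields that character) is an assumption rather than something taken from the formal ZINC definition.

```python
from typing import NamedTuple


class ZincRecord(NamedTuple):
    """One data line of a ZINC file (hypothetical container, for illustration)."""
    block: str    # CIF data block name, without the data_ prefix
    name: str     # data item name, or "(" for a comment
    index: str    # empty for non-looped data, zero-based index for looped data
    value: str    # the data item itself, with escapes expanded
    loop_id: str  # loop identifier, empty for non-looped data


def unescape(text: str) -> str:
    """Expand backslash escapes: \\n becomes a newline.

    Assumption: any other escaped character stands for itself.
    """
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            out.append("\n" if nxt == "n" else nxt)
        else:
            out.append(ch)
    return "".join(out)


def parse_zinc_line(line: str) -> ZincRecord:
    """Split one tab-separated ZINC line into its five fields."""
    fields = line.rstrip("\n").split("\t")
    # Trailing empty fields may be absent; pad so that five are always present.
    fields += [""] * (5 - len(fields))
    block, name, index, value, loop_id = fields[:5]
    return ZincRecord(block, name, index, unescape(value), loop_id)


if __name__ == "__main__":
    # Two lines modelled on the sample ZINC above.
    for raw in ("object\t_x\t0\t0.0\t_x\n", "object\t_name\t\t;\\ntriangle\\n;\t\n"):
        print(parse_zinc_line(raw))
```

Because every line carries its block name, data name, and index, the records can be filtered or grouped independently of their order in the file, which is the property the Unix-style tools rely on.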
- Code The formal definition of ZINC and the above mentioned programs are available from PDB using FTP, Gopher, or WWW (ftp://pub/other-software/Zinc). Please give it a try. All are invited to submit their favorite scripts that use ZINC. Any comments should be directed to Dave Stampf (drs@bnl.gov). ----------------------------------------------------------------------- Mirroring Earlier this year, PDB started updating the FTP server with fully released entries on a frequent basis - typically every few weeks. PDB users often ask how they can keep their local archive up to date without having to continually check the current holdings of the PDB. Fortunately, a public domain package called mirror exists that does exactly this. This program is a perl script that runs on your system and periodically creates an FTP connection to the PDB, determines the difference between your local archive and the PDB (including deletions!) and performs all necessary tasks needed to make them equivalent. We have used this program with the Weizmann Institute of Science in Israel, Turku University in Finland and the European Molecular Biology Laboratory in Germany with great success even over very poor network connections. We encourage the members of our user community who maintain a complete local archive to do the same. If you wish to try this, here is what you have to do: 0) Think: Decide what your local archive will look like. Will it be a virtual image of the PDB? Will it only hold compressed files? Will it use the all_files in one directory scheme, or the 2-character directory scheme? Your space limitations and your usage patterns will determine what is best for you. 1) Prepare: Set up your local archive from a 1994 version of the PDB CD-ROM. Be sure the dates on the files match those on the CD. 
In order to copy a directory on the CD to a local disk without changing the dates, use the following tar pipeline:

    mkdir /usr/distr       # or where you want the files to go
    cd /CDROM/distr        # or where you are copying the files from
    tar cf - . | (cd /usr/distr; tar xvf -)

2) Get mirror: Copy the mirror software from PDB or elsewhere (/pub/other-software/Mirror). There is nothing to compile, but you must have already installed perl and have run the h2ph perl script. Install the mirror program in one of the standard locations.

3) Configure: Adapt a copy of the mirror.defaults configuration file. A sample of this file is also on the PDB FTP server.

4) TEST!!! If you run mirror -n it will tell you what it will do without doing it. If you are about to transfer a gigabyte over a megabit line, you may wish to reconsider.

5) When you are satisfied, make the mirror run as part of your crontab and forget about it. Your local archive will be kept up to date "automagically"!

As the mirror README says, "Objects in the mirror are closer than they appear!"

-----------------------------------------------------------------------
Managing the Archives

Careful watchers of the PDB will have noticed significant improvements in the quality of data distributed by PDB. These changes are primarily due to our use of new data processing and validation procedures introduced last year as part of our effort to convert all prerelease entries into fully annotated form. Details of this work were initially discussed during the 1991 ACA meeting in Toledo, Ohio. We then made public commitments to erase the backlog of unprocessed entries by 1993-1994 and announced plans to upgrade older entries by subjecting them to the same processes as those entries recently deposited. Now that we have achieved our primary goal, we have started the task of upgrading older entries. Sections below discuss some recent work aimed at improving the archives and its effect on users and depositors.
Upgrading the contents of the archive will require a number of changes to PDB entries. Some of these changes might affect existing software. New record types, a new HET group dictionary, and an upgrade of the existing format description document are forthcoming. Information missing from existing entries will be added when possible. We will use our current validation suite of programs to check for possible errors not previously reported in the entries. Advisory notices will be added when appropriate. In addition to improving the quality of the data in the archive, these changes will pave the way for the conversion of PDB interchange format to CIF.

- Keeping Up with Changes

A number of steps are being taken to keep users and software developers informed of developments that may affect their work. Announcements, documents, and production schedules related to the upgrade and cleanup work are now stored in the FTP directory /pub/pdb_upgrade. In addition, all related announcements will be posted to the PDB-L mailing list. Those interested can participate by sending an e-mail message to listserv@pdb.pdb.bnl.gov with the one-line message:

    subscribe PDB-L Firstname Lastname

- Informing PDB of Errors and Updating Entries

We have established an e-mail address that can be used to inform us of errors found in PDB entries or to supply us with information needed to update the contents of these files. The updating and upgrading of entries will require the cooperation of our depositors and users. There are two areas in which your help would be invaluable at this time:

1. Many entries list papers as TO BE PUBLISHED. Please send us full publication information for any papers that have now been published. Titles often change between the manuscript and publication stages, making it difficult for us to track them.

2. We suspect that there are entries where the sample was prepared using recombinant techniques but this is not noted in the SOURCE records.
It is important that our depositors check all their entries and notify us if there are any that are missing this information. Error and update reports may be sent to errata@pdb.pdb.bnl.gov. Please include the latest REVDAT record found in the entry along with your e-mail message. - New Record Types A number of suggested new record types related to amino acid or nucleic sequences were introduced in our July 1994 Newsletter. The following is a list of additional record types to be included in forthcoming releases of the PDB: Record - Purpose Type TITLE - To provide a succinct description of the contents of the entry. KEYWRD - To provide additional keywords that can be used to classify the contents of the entry. This record will be used to supplement the classification field provided in the HEADER record. NAMHET - To provide the complete compound name for HET groups. This record type will contain information now presented in columns 31-70 of a HET record. SYNHET - To provide synonymous compound names for HET groups. The TITLE record will contain text now found in the COMPND record. It will contain a brief and succinct description of the experiment represented in the PDB entry. A quick look at recently-released entries will reveal that changes have been made to the COMPND record. These changes reflect our attempts to introduce an internal structure to the COMPND record making it easier to parse information from it (e.g., macromolecule name). There is, however, a need to describe the contents of the file in the same way that titles of published articles help users quickly identify interesting papers to read. The TITLE record will serve this purpose. The KEYWRD record is designed to allow for inclusion of additional keywords that can be used to classify and index PDB entries. This record supplements information now provided in the HEADER record. Thus this record may contain text that gives both functional and structural classification descriptions. 
A list of valid values will be made available and will be updated periodically.

The NAMHET and SYNHET records are designed to help identify the heterogen groups. NAMHET will be used to provide the complete and systematic name for a HET group. This information is currently found in columns 31-70 of the HET record or in REMARK records for longer names. SYNHET will include a list of synonyms for HET groups. This record type is acutely needed as it is now very difficult to search for even simple groups such as acetic acids which are described as acetate ions in a number of entries. SYNHET should also make it easier for PDB to describe molecules using systematic IUPAC names in the NAMHET record while allowing for use of more common names in SYNHET records. NAMHET and SYNHET records will be included in the HET group dictionary that is in preparation.

- Record Descriptions

Record Name: TITLE

Cols.      Contents and Description
01 - 06    TITLE
09 - 10    Continuation field (this field will be blank for the first TITLE record in each entry and will be numbered 2, 3, etc., for continuation records).
11 - 70    Succinct description of the experiment.

Example:

TITLE     BACTERIOPHAGE T4 LYSOZYME AT HIGH IONIC STRENGTH

Record Name: KEYWRD

Cols.      Contents and Description
01 - 06    KEYWRD
09 - 10    Continuation field (this field will be blank for the first KEYWRD record in each entry and will be numbered 2, 3, etc., for continuation records).
11 - 70    List of keywords - each keyword or phrase will be separated by a semicolon.

Example:

KEYWRD    SERINE PROTEASE; ALPHA/BETA DOMAIN

Record Name: NAMHET

Cols.      Contents and Description
01 - 06    NAMHET
09 - 10    Continuation field (this field will be blank for the first NAMHET record for each HET group and will be numbered 2, 3, etc. for continuation records).
11 - 14    Non-standard group (heterogen) identifier.
16 - 70    Compound name - in most cases the IUPAC name will be provided.

Example:

NAMHET     CMP CYCLIC-3'-5'-CYCLIC MONOPHOSPHATE

Record Name: SYNHET

Cols.      Contents and Description
01 - 06    SYNHET
09 - 10    Continuation field (this field will be blank for the first SYNHET record for each HET group and will be numbered 2, 3, etc. for continuation records).
12 - 14    Non-standard group (heterogen) identifier.
16 - 70    List of synonyms for the HET group - each synonym name will be separated by a semicolon.

Example:

SYNHET     CMP CYCLIC AMP; CAMP

- HET Dictionary

A dictionary containing descriptions of all HET groups found in PDB entries is currently being constructed. Included in the dictionary are the HET, FORMUL, NAMHET, and SYNHET records for each heterogen. In addition, atom connectivity is described by CONECT records that give atom names. When ready, a copy of the dictionary will be included in the directory /pub on the FTP server and will be updated as needed. The table is being constructed in collaboration with Professor Betty Deroski of Suffolk County Community College and Dr. Clifford Felder of the Weizmann Institute of Science.

Sample description of a HET group:

# C4*          HN1      O2-HO2          C2'--C3'
#    \          |       |              /        \
#     C2*--C1*--N1--C1--C2--C3--C4--C1'          C4'
#    /          |           |          \        /
# C3*        O==C--OXT-HXT  N-1,2HN     C6'--C5'
RESIDUE   AHS     50
CONECT      C1'    4 C4   C2'  C6'  H1'
CONECT      C2'    4 C1'  C3'  1H2' 2H2'
CONECT      C6'    4 C1'  C5'  1H6' 2H6'
CONECT      C3'    4 C2'  C4'  1H3' 2H3'
CONECT      C5'    4 C6'  C4'  1H5' 2H5'
CONECT      C4'    4 C3'  C5'  1H4' 2H4'
CONECT      C4     4 C3   C1'  1H4  2H4
CONECT      C3     4 N    C4   C2   H3
CONECT      N      3 C3   1HN  2HN
CONECT      C2     4 C3   O2   C1   H2
CONECT      O2     2 C2   HO2
CONECT      C1     4 C2   N1   1H1  2H1
CONECT      N1     3 C1   C    C1*
CONECT      C      3 O    N1   OXT
CONECT      O      1 C
CONECT      OXT    2 C    HXT
CONECT      C1*    4 N1   C2*  1H1* 2H1*
CONECT      C2*    4 C1*  C3*  C4*  H2*
CONECT      C3*    4 C2*  1H3* 2H3* 3H3*
CONECT      C4*    4 C2*  1H4* 2H4* 3H4*
CONECT      H1'    1 C1'
CONECT      1H2'   1 C2'
CONECT      2H2'   1 C2'
CONECT      1H6'   1 C6'
CONECT      2H6'   1 C6'
CONECT      1H3'   1 C3'
CONECT      2H3'   1 C3'
CONECT      1H5'   1 C5'
CONECT      2H5'   1 C5'
CONECT      1H4'   1 C4'
CONECT      2H4'   1 C4'
CONECT      1H4    1 C4
CONECT      2H4    1 C4
CONECT      1HN    1 N
CONECT      2HN    1 N
CONECT      H3     1 C3
CONECT      H2     1 C2
CONECT      HO2    1 O2
CONECT      1H1    1 C1
CONECT      2H1    1 C1
CONECT      HXT    1 OXT
CONECT      1H1*   1 C1*
CONECT      2H1*   1 C1*
CONECT      H2*    1 C2*
CONECT      1H3*   1 C3*
CONECT      2H3*   1 C3*
CONECT      3H3*   1 C3*
CONECT      1H4*   1 C4*
CONECT      2H4*   1 C4*
CONECT      3H4*   1 C4*
END
HET    AHS  I   4      50
NAMHET     AHS N-ISOLEUCYL-N-CARBOXY-(2-HYDROXY-3-AMINO-4-
NAMHET   2 AHS CYCLOHEXYL-BUTYL) AMINE
SYNHET     AHS AZOHOMOSTATINE DIPEPTIDE ISOSTERE; CP-69,799
FORMUL   2 AHS    C15 H30 N2 O3

- Whom to Contact

General inquiries pertaining to the contents and management of the archive may be sent to Enrique Abola (abola1@bnl.gov).

-----------------------------------------------------------------------
New Electronic Deposition Form

A new Electronic Deposition Form which has been tested by a number of depositors is now available for general use. All entries submitted to the PDB must now use this form. This form is expected to simplify preparation of your deposition. It will also ensure that all data necessary for a complete PDB entry is provided in the initial submission.

The new Deposition Form is available from the FTP server in the file /pub/dep_form.txt. You should retrieve this file, edit it on your computer and return it electronically to the PDB along with all associated data files. We request that you use FTP rather than electronic mail to upload these files. For explicit instructions on uploading to the PDB, see the PDB FAQ available from the FTP server in the files /pub/faq.ps (PostScript) and /pub/faq.txt (text). Both the Electronic Deposition Form and the PDB FAQ are also easily obtained via Gopher or WWW.

The Electronic Deposition Form begins with a description of the standard structure of a PDB coordinate file and provides guidelines for preparing data for deposition. These guidelines should be studied and followed carefully for the most appropriate representation of your experiment. Additional information not included in PDB's earlier Deposition Form is now being requested.
These data items will help our staff significantly in preparing an entry for distribution. The Electronic Deposition Form leads you through the needed material from your address information through the description of the experiment, crystallographic data, and secondary structure information. Secondary structure information is now being prepared by PDB using the Kabsch and Sander algorithm [Dictionary of Protein Secondary Structure: Pattern Recognition of Hydrogen-bonded and Geometrical Features. Biopolymers 22, 2577-637 (1983)]. This makes submission of HELIX, SHEET, and TURN records optional; however, we will continue to include your secondary structure specifications if you wish to provide this information. Below is a sample of a completed compound section. COMPOUND (COMPND) Present a brief title for the experiment. This is analogous to the title of a journal article. Repeat the block for each molecule of a macromolecular complex. Separate multiple synonym names or ligands with semicolons. (EC stands for Enzyme Commission.) Title to be used for this entry: P-hydroxybenzoate hydroxylase mutant with FAD and 2,4-dihydroxybenzoic acid. // \\ molecule name : P-hydroxybenzoate hydroxylase synonyms : PHBH EC number : 1.14.13.2 engineered mutation : Cys 116 replaced by ser ligands : FAD; 2,4-dihydroxybenzoic acid other details : \\ // ----------------------------------------------------------------------- User Group News The PDB User Group is pleased to announce that the directory with coordinates for full biological units is now available from the PDB FTP server. This was the second of our user-initiated initial priorities (the first one was the on-line pending_waiting list for newly-deposited entries), both of which have now been implemented. The User Group greatly appreciates the effort that has been put into generating these files by the PDB staff. These files were generated by Enrique Abola and Mingyu Xu from the PDB and by John Rose of the University of Pittsburgh. 
We hope that these full-molecule files will be useful for making teaching examples and for study of important subunit interactions by those not using crystallographic software (or, even, for those who are just lazy or can't take the time to sort it out for themselves!).

The directory is available from the FTP server as /user_group/biological_units. It contains expanded PDB entries with coordinates for the full biological unit (that is, the functional molecule), for cases where there is internal symmetry and only part of the molecule was in the standard PDB entry. For example, standard PDB entries for hemoglobins contain only one alpha-beta dimer, while the entries here contain the full alpha2,beta2 tetramer. To indicate this difference, the files have names starting with bio and ending in .pdb. For example, file pdb1rop.ent contains only the helix-hairpin monomer of ROP protein, while bio1rop.pdb has the full 4-helix bundle. Header information in the file explains its origin and the symmetry elements used to generate it.

At present this directory is experimental. It contains many, but not all, of the useful cases (for instance, viruses will be tackled soon). Please let us know if we have missed a molecule that would be useful to you, or if the header information provided is not sufficient (send e-mail to pdbusrgp@suna.biochem.duke.edu). Also, if these files turn out to be a major help to you, please let us know.

-----------------------------------------------------------------------
WWW Access to PDB

The World Wide Web (WWW) home page for PDB is accessible using the document URL http://www.pdb.bnl.gov/. Starting with links to the PDB FTP and Gopher servers, the WWW home page includes links to many important PDB files and tools, as well as other useful databases and information servers.

WWW is a single, consistent user interface to many of the information retrieval protocols found on Internet today including ftp, telnet, nntp, wais, and gopher.
WWW understands the different data formats used by these protocols, including ascii, gif, postscript, dvi, and texinfo. It also adds a new multimedia protocol (HyperText Transfer Protocol or http) and data format (HyperText Markup Language or html).

A very popular WWW client is NCSA Mosaic, freely available for X Windows, Macintosh, and Microsoft Windows platforms. Mosaic clients communicate with WWW servers as well as with more traditional Internet protocols such as ftp, gopher, and wais. Hyperlinks to PDB appear on a growing number of WWW home page locations.

To connect to the WWW home page at PDB from Mosaic, select the FILE pulldown menu item, then OPEN URL. Type http://www.pdb.bnl.gov/, hit the Return (Enter) key or the OPEN button, and the PDB WWW home page will appear. You can choose from many options including FTP and Gopher, search the pending_waiting list, view the latest copies of important documents, visit other databases, and much more.

To add the PDB WWW home page to your Mosaic Hotlist for quick access, select Navigate from the pulldown menu and then Add Current To Hotlist. Or you can select Navigate, Hotlist, and then Add Current.

Information on setting up Mosaic to run on your computer is available via anonymous FTP from ftp.ncsa.uiuc.edu. Pertinent files are in the following subdirectories:

    Mac/Mosaic             for Macintosh
    PC/Mosaic              for Microsoft Windows
    Web/Mosaic-binaries    for X Windows
    Web/Mosaic-source      for X Windows

There are README files at various levels to provide information on what is available and on how to set up your Mosaic client. Also available from your bookstore or library is a useful book on the subject: "Mosaic Quick Tour" by Gareth Branwyn.

The following is a list of some interesting and useful home pages accessible using WWW. As you use Mosaic, you will find many other intriguing locations.
URL
    Title

http://www.nih.gov/
    National Institutes of Health
http://expasy.hcuge.ch/sprot/sprot-top.html/
    SWISS-PROT Protein Sequence Database
http://www.nih.gov/molbio/
    Molecular Biology Databases specifically related to DNA and protein sequence database holdings
http://expasy.hcuge.ch/
    ExPASy Molecular Biology server
http://golgi.harvard.edu/biopages.html/
    Comprehensive Biosciences Index from Keith Robison
gopher://vm1.hqadmin.doe.gov/1/
    Department of Energy Gopher
http://gdbwww.gdb.org/
    Genome Data Base
http://kaktus.kemi.aau.dk/
    The O Protein Crystallographic Package
http://dapsas1.weizmann.ac.il/
    Biological Computing Division at Weizmann Institute of Science
http://ndbserver.rutgers.edu:80/
    Nucleic Acid Database Project at Rutgers University
http://www.prosci.uci.edu/
    Protein Science Web Server
http://cui_www.unige.ch/meta-index.html/
    Search Engines for WWW
http://www.mit.edu:8001/people/mkgray/compre3.html/
    Comprehensive list of HTTP sites
http://www.sgi.com/
    Silicon Surf from SGI
gopher://pdb.pdb.bnl.gov/11/FTP/pub/pdbbrowse/
    PDB-Browse - a browser for Unix systems
gopher://pdb.pdb.bnl.gov/11/Software/PDBShell/
    PDB-Shell - a browser for Windows (PC) systems
gopher://pdb.pdb.bnl.gov/11/Software/Procheck/
    Procheck information
ftp://pdb.pdb.bnl.gov/pub/other-software/
    Other software
http://bnlstb.bio.bnl.gov:8000/
    Structural Biology at the BNL Biology Department
http://www.tc.cornell.edu/~richard/AChE.html/
    Acetylcholinesterase: Nature's Vacuum Cleaner
http://www.public.iastate.edu/~pedro/research_tools.html/
    Pedro's Research Tools
http://csdvx2.ccdc.cam.ac.uk/
    Cambridge Crystallographic Data Centre
http://www.sander.embl-heidelberg.de/dssp/
    The DSSP program and database

-----------------------------------------------------------------------
Release Schedules

The PDB has been hard at work reducing the lag time between quarterly release dates and shipment of the CD-ROM set.
Some of the major steps which must take place between a release date and shipment of the CD-ROM include:

- building the release on FTP
- building an image of the CD-ROM files on hard disk
- dumping the image to tape and checking for accuracy
- sending image tape to CD-ROM duplication vendor for premastering
- checking worm checkdisk from vendor for accuracy
- checking final CD-ROM set from vendor for accuracy
- packaging and shipping CD-ROM set

We have had a schedule in place for the past several months that will eventually allow us to ship the CD-ROM at the end of the release month. For instance, CD-ROM shipment for the October 1995 release should begin on October 31. Implementation of these changes will take time since the many steps needed do not allow for gross adjustments in schedules.

What follows are previous and projected CD-ROM shipment dates. As you can see, the scheduled dates versus actual dates sometimes differ, most often due to CD-ROM production problems beyond our control. More importantly, you can see that eventually CD-ROM shipment will begin at the end of the release month.

                     scheduled    actual
    (1994) January   04/26/94     04/25/94
           April     07/07/94     07/08/94
           July      09/26/94     09/26/94
           October   12/12/94     na
    (1995) January   03/03/95     na
           April     05/22/95     na
           July      08/11/95     na
           October   10/31/95     na

-----------------------------------------------------------------------
Discontinuation of Tape Distribution

Due to the very small number of tapes ordered, PDB has discontinued distribution of these media. We have contacted those few who had previously been interested in tape format, discussed the situation, and all have decided that the CD-ROM would be an acceptable and perhaps even more convenient format.

-----------------------------------------------------------------------
Revised Newsletter Format

As discussed in both the January and April 1994 Newsletters, PDB has eliminated the tables of newly released and newly deposited entries in this and future Newsletters.
These tables will appear individually in the Full Tables document. The Full Tables document accompanies each order shipped and is available from the FTP server in the /newsletter subdirectory in both PostScript and ASCII formats. A printed copy may be obtained upon request.

-----------------------------------------------------------------------
Newsletter Distribution Changes

Due to the ease of retrieving the Newsletter via Internet (FTP, Gopher, and WWW in the /newsletter subdirectory in PostScript and ASCII formats) and the very high number of printed Newsletters being distributed via the postal system, PDB has decided to re-initialize the Newsletter mailing list. This is expected to take place just after distribution of the January 1995 Newsletter. Anyone who wishes to receive printed copies of the Newsletter will be able to do so by specifically requesting to be on the Newsletter mailing list. We intend to have a form available in January which can be completed by users and returned to us electronically or via the postal system. We plan to have this re-initialized mailing list operational for the April 1995 Newsletter distribution. Please stay tuned for further developments.

-----------------------------------------------------------------------
New Entry Tracking System

Many depositors, users and journals that refer to PDB have asked us to provide a mechanism for checking the status of entries from the depositor's initial contact with PDB to the time when the entry is released. This is now possible due to the establishment of tracking codes, one of which is given to each depositor upon deposition of an entry. This code can be used by depositors, users and journals to track an entry's progress until full release.

Upon receipt of coordinates for an entry, PDB issues a tracking number and sends a letter to the primary and secondary contacts that includes this number. The format of the tracking number is a number preceded by the letter T (e.g., T9999).
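The tracking-number format just described can be checked mechanically before a status inquiry is sent. What follows is only a minimal Python sketch, not a PDB tool: the function name is_tracking_number is hypothetical, and the sketch assumes that any non-empty run of decimal digits may follow the letter T, since the number of digits is not specified here.

```python
import re

# Assumption: an upper-case "T" followed by one or more decimal digits.
# The newsletter gives only the example T9999; the digit count is not stated.
TRACKING_RE = re.compile(r"T[0-9]+")


def is_tracking_number(text: str) -> bool:
    """Return True if text looks like a PDB tracking number such as T9999."""
    # Surrounding whitespace is ignored; the whole remaining string must match.
    return TRACKING_RE.fullmatch(text.strip()) is not None
```

A call such as is_tracking_number("T9999") accepts the example given above, while a bare number or a lone "T" is rejected.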
The tracking number is published, for every entry that is currently being processed by PDB, in the file /pub/pending_waiting.list which is available from the FTP server. If you use Gopher to access PDB, this file is indexed by tracking number, author, and compound name in order to speed searches. Status for each pending and waiting entry is issued from the following list:

INCOMPLETE - if PDB is waiting for additional materials or information from depositor to make a complete deposition
PROCESSING - if PDB is checking and verifying entry
DEPOSITOR  - if entry is with depositor for approval
REVIEW     - if entry is undergoing final review
REL        - if entry is in current release
HLD        - if entry is on hold at present time

The tracking number is not to be construed as an ident code. Its purpose is to speed up our searches, make our responses to inquiries faster and more accurate, and allow depositors to track the progress of their entries. PDB issues the PDB ident code, required by most journals, only after running some preliminary checking programs and receiving all required documentation, including a (p)reprint of the JRNL reference (or a statement indicating that the entry does not correspond to a specific publication).

-----------------------------------------------------------------------
Access to PDB

- FTP

PDB has an anonymous FTP account on the computer system ftp.pdb.bnl.gov (Internet address 130.199.144.1). Files may be transferred to and from this system using anonymous as the FTP user name and your e-mail address as the password. Besides downloading entries, data files and documentation, it is possible to upload any files that you may wish to send to PDB, but only into the directory /new_uploads. Those using VMS may need to place quotes around file names.

- Gopher

PDB has a Gopher server on the system gopher.pdb.bnl.gov (130.199.144.1).
This server is accessible using a Gopher client connecting to the following link:

    Name = Protein Data Bank FTP server
    Type = 1
    Host = gopher.pdb.bnl.gov
    Port = 70
    Path = 1/

As a Gopher client, you may navigate through a hierarchy of directories and documents or ask an index server to return a list of all documents that contain one or more specified words. For instance, you can choose "The PDB Anonymous FTP" after reaching PDB's Gopher server in order to search and download the same information and coordinate files as through FTP. Alternatively, you can select "An (almost) full-text search of the PDB Bibliographic Headers" in order to search PDB using any keyword.

- World Wide Web (WWW)

PDB has a World Wide Web (WWW) server on the computer system www.pdb.bnl.gov (130.199.144.1). This server is accessible using the document URL http://www.pdb.bnl.gov/. Besides including links to the PDB FTP and Gopher servers, the WWW server includes links to many other useful databases and information servers.

- Listserv

PDB has a mailing list devoted to discussions concerning its operation, contents, and access procedures. To subscribe, send e-mail to listserv@pdb.pdb.bnl.gov with the one-line message:

    subscribe PDB-L Firstname Lastname

To find out what can be done with this mailing list, send e-mail to the same address (listserv@pdb.pdb.bnl.gov) with the one-line message:

    help

To send a message to all PDB-L subscribers, e-mail the message to: PDB-L@pdb.pdb.bnl.gov.

-----------------------------------------------------------------------
Affiliated Centers

Twenty-two affiliated centers offer DATAPRTP information for distribution. These centers are members of the Protein Data Bank Service Association (PDBSA). Centers designated with an asterisk (*) may distribute DATAPRTP information both on-line and on magnetic or optical media; those without an asterisk are on-line distributors only.

  BMERC
      BioMolecular Engineering Research Center
      College of Engineering, Boston University
      Boston, Massachusetts
      Kathleen Klose (617-353-7123)
      klose@darwin.bu.edu

* BIOSYM
      BIOSYM Technologies, Inc.
      San Diego, California
      Laurel Frey (619-546-5509)
      rcenter@biosym.com or laurel@biosym.com

  CAN/SND
      Canadian Scientific Numeric Data Base Service
      Ottawa, Ontario, Canada
      Roger Gough (613-993-3294)
      cansnd@vm.nrc.ca

  CAOS/CAMM
      Dutch National Facility for Computer Assisted Chemistry
      Nijmegen, The Netherlands
      Jan Noordik (+1 31-80-653386)
      noordik@caos.caos.kun.nl

* CCDC
      Cambridge Crystallographic Data Centre
      Cambridge, United Kingdom
      David Watson (+1 44-223-336394)
      dgwl@chemcrys.cam.ac.uk

  CINECA
      NE Italy Interuniversity Computing Center
      Casalecchio di Reno (BO), Italy
      Laura Setti (+1 39-51-6599478)
      asltc0@icineca.cineca.it

  ICGEB
      International Centre for Genetic Engineering and Biotechnology
      Trieste, Italy
      Sandor Pongor (+1 39-40-3757300)
      pongor@icgeb.trieste.it

  EMBL
      European Molecular Biology Laboratory
      Heidelberg, Germany
      Hans Doebbeling (+1 49-6221-387-247)
      hans.doebbeling@embl-heidelberg.de

  INN
      Israeli National Node
      Weizmann Institute of Science
      Rehovot, Israel
      Leon Esterman (+1 972-8-343934)
      lsestern@weizmann.weizmann.ac.il

* JAICI
      Japan Association for International Chemical Information
      Tokyo, Japan
      Hideaki Chihara (+1 81-3-5978-3608)

* MAG
      Molecular Applications Group
      Palo Alto, California
      Hilary Jensen (415-473-3039)
      hilary@suerte.mag.com

* MSI
      Molecular Simulations Inc.
      Burlington, Massachusetts
      Lance J. Ransom Wright (617-229-9800)
      lance@msi.com

  NCHC
      National Center for High-Performance Computing
      Hsinchu, Taiwan, ROC
      Jyh-Shyong Ho (+1 886-35-776085; ex: 342)
      c00jsh00@nchc.gov.tw

  NCSA
      National Center for Supercomputing Applications
      University of Illinois at Urbana-Champaign
      Champaign, Illinois
      Patricia Carlson (217-244-0768)
      pcarlson@ncsa.uiuc.edu

  National Center for Biotechnology Information
      National Library of Medicine
      National Institutes of Health
      Bethesda, Maryland
      Stephen Bryant (301-496-2475)
      bryant@ncbi.nlm.nih.gov

* OML
      Oxford Molecular Ltd.
      Oxford, United Kingdom
      Steve Gardner (+1 44-865-784600)
      steve@gardner.demon.co.uk

* Osaka University
      Institute for Protein Research
      Osaka, Japan
      Yoshiki Matsuura (+1 81-6-879-8605)
      matsuura@protein.osaka-u.ac.jp

  Pittsburgh Supercomputing Center
      Pittsburgh, Pennsylvania
      Hugh Nicholas (412-268-4960)
      nicholas@cpwpsca.bitnet

* Protein Science
      Princeton, New Jersey
      Joseph Villafranca (609-252-3573)
      villafranca@bms.com

  SDSC
      San Diego Supercomputer Center
      San Diego, California
      Lynn Ten Eyck (619-534-8189)
      teneyckl@sdsc.bitnet

  SEQNET
      Daresbury Laboratory
      Warrington, United Kingdom
      User Interface Group (+1 44-925-603351)
      uig@daresbury.ac.uk

* Tripos
      Tripos Inc.
      St. Louis, Missouri
      Akbar Nayeem (314-647-1099; ex: 3224)
      akbar@tripos.com

-----------------------------------------------------------------------
Protein Data Bank
Chemistry Department, Bldg. 555
Brookhaven National Laboratory
P.O. Box 5000
Upton, NY 11973-5000 USA

-----------------------------------------------------------------------
To Contact PDB

Telephone:  +1 516-282-3629
Facsimile:  +1 516-282-5751
Internet:   pdb@bnl.gov                 general correspondence
            orders@pdb.pdb.bnl.gov      order information
            sysadmin@pdb.pdb.bnl.gov    network services
            listserv@pdb.pdb.bnl.gov    Listserver subscriptions
            pdb-l@pdb.pdb.bnl.gov       Listserver postings
            errata@pdb.pdb.bnl.gov      entry error reporting

Please include your name, postal mailing address, e-mail address, facsimile number and telephone number in all correspondence.

-----------------------------------------------------------------------
Statement of Support

PDB is supported by a combination of Federal Government Agency funds (work supported by the U.S. National Science Foundation; the U.S. Public Health Service, National Institutes of Health, National Center for Research Resources, National Institute of General Medical Sciences and National Library of Medicine; and the U.S. Department of Energy under contract DE-AC02-76CH00016) and user fees.

-----------------------------------------------------------------------
PDB Staff

Joel L. Sussman, Head
David R. Stampf, Sr. Project Mgr.
Enrique E. Abola, Science Coordinator
Frances C. Bernstein
Judith A. Callaway
Minette Cummings
Betty R. Deroski
Pamela A. Esposito
Arthur Forman
Thomas F. Koetzle
Patricia A. Langdon
Michael D. Libeson
Nancy O. Manning
John E. McCarthy
Regina K. Shea
John G. Skora
Karen E. Smith
Dejun Xue
-----------------------------------------------------------------------