Opened 3 months ago
Last modified 3 weeks ago
#19838 accepted enhancement
Better support for long PDB IDs.
| Reported by: | Tom Goddard | Owned by: | Eric Pettersen |
|---|---|---|---|
| Priority: | moderate | Milestone: | |
| Component: | Input/Output | Version: | |
| Keywords: | | Cc: | |
| Blocked By: | | Blocking: | |
| Notify when closed: | | Platform: | all |
| Project: | ChimeraX | | |
Description
As per the PDB announcement (below), 8-character PDB IDs are getting closer. T.G. suggests allowing not only pdb_00012xyz, but also 12xyz and 00012xyz as synonyms. The code already allows the last one, but assumes the first five characters are digits, which does not jibe with the example ID in the mail (pdb_1000axyz). It would be good to see what IDs are used in practice in order to make the rules as specific as possible, so that we avoid trying to fetch file-name typos and producing a confusing error message.
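To make the proposed synonyms concrete, here is a minimal sketch of one possible normalization rule. It is not the existing ChimeraX code: the function name `normalize_pdb_id` and the pattern are hypothetical, and the rule (optional `pdb_` prefix, then 4 to 8 alphanumerics, left-padded with zeros) is an assumption chosen only so that all the IDs mentioned in this ticket map to the 12-character form.

```python
import re

# Hypothetical rule (an assumption, not the current ChimeraX behavior):
# an optional "pdb_" prefix followed by 4 to 8 alphanumeric characters.
_PDB_ID = re.compile(r"(?:pdb_)?([0-9a-z]{4,8})", re.IGNORECASE)


def normalize_pdb_id(text):
    """Return the 12-character extended ID, or None if text does not match."""
    m = _PDB_ID.fullmatch(text.strip())
    if m is None:
        return None
    # Left-pad the alphanumeric part with zeros to 8 characters.
    return "pdb_" + m.group(1).lower().zfill(8)


for candidate in ("12xyz", "00012xyz", "pdb_00012xyz", "pdb_1000axyz", "2b8h"):
    print(candidate, "->", normalize_pdb_id(candidate))
```

This rule is deliberately permissive: any 4 to 8 letter word would also match, which is exactly the file-name-typo problem described above. Tightening it (for example, requiring a digit in a particular position) would depend on what IDs are actually issued.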
PDB announcement:
From: Irina Persikova via pdb-l <pdb-l@…>
Subject: pdb-l: New PDB Beta Archive Available for Testing
Date: February 11, 2026 at 8:07:30 AM PST
To: pdb-l@…
Reply-To: Irina Persikova <persikov@…>
*By 2028* 4-character PDB IDs (e.g. *1abc*) will be fully allocated. After that, all new entries will be assigned *only extended PDB IDs*.
The new *extended PDB ID format <https://www.wwpdb.org/documentation/pdb-id-extension-faq> will be 12 characters*, which includes a prefix pdb_ followed by 8 alphanumeric characters, e.g. *pdb_1000axyz*. This new ID format <https://www.wwpdb.org/documentation/pdb-id-extension-faq> will enable text mining detection of PDB entries in the published literature and allow for more informative and transparent delivery of revised data files. */When submitting extended PDB IDs to journals and citing extended PDB IDs in manuscripts, all 12 characters including prefix pdb_ should be provided./*
A PDB Beta Archive <https://www.wwpdb.org/ftp/pdb-beta-ftp-sites> is now available to help community adopt extended PDB ID and PDBx/mmCIF format during the transition phase. All files at this archive are re-organized with extended PDB ID (including file naming and directories) at entry level, mirroring the same data organization of the PDB Versioned Archive <http://files-versioned.wwpdb.org/>.
All data files for a particular entry are stored in a single directory, labeled based on a two-character hash generated from the penultimate two characters of the PDB code, i.e., https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/<two-letter-hash>/<pdb_accession_code>/<entry_data_File_names>. The two-letter hash will be based on the second and third characters from the last character. For example, PDB entry pdb_1abc5*67*8 will be under */67/*. This will maintain consistency with the current PDB archive: PDB entry 1abc is under /ab.
File naming is standardized such that the file type is used for the extension.
For example, file naming is changed from *r116dsf.ent.gz* to *pdb_0000116d-sf.cif.gz* for the structure factor file and from *pdb318d.ent.gz* to *pdb_0000318d.pdb.gz* for the legacy PDB formatted coordinate file.
When four character PDB IDs are about to be consumed, this PDB Beta Archive will replace the current PDB Archive (expect to be around mid-2027) and entries with extended PDB IDs issued are not compatible with PDB format. wwPDB encourages scientific journals, PDB community and users to transition to PDBx/mmCIF format and adopt new PDB ID format as earlier as possible.
For any further information please contact us at info@….
Please read the full news at: https://www.wwpdb.org/news/news?year=2026#698a36067e4af405aeeb5b24
On behalf of the wwPDB,
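
The directory layout and file naming described in the announcement above can be sketched as follows. This is only an illustration built from the examples in the mail: the helper names (`beta_hash`, `beta_entry_dir`, `beta_file_name`) are hypothetical, the trailing slash on the directory URL is an assumption, and only the two file types whose names the announcement spells out are covered. Nothing here has been checked against the live beta archive.

```python
# Base URL taken from the announcement's path template.
BETA_ENTRIES = "https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries"

# Only the two suffixes given as examples in the announcement.
_SUFFIXES = {
    "sf": "-sf.cif.gz",   # structure factor file, e.g. pdb_0000116d-sf.cif.gz
    "pdb": ".pdb.gz",     # legacy PDB format coordinates, e.g. pdb_0000318d.pdb.gz
}


def beta_hash(extended_id):
    """Two-character hash: the second and third characters from the last."""
    return extended_id[-3:-1]


def beta_entry_dir(extended_id):
    """Directory URL for an entry given its 12-character extended ID."""
    return f"{BETA_ENTRIES}/{beta_hash(extended_id)}/{extended_id}/"


def beta_file_name(extended_id, kind):
    """File name for one of the two file types named in the announcement."""
    return extended_id + _SUFFIXES[kind]


print(beta_hash("pdb_1abc5678"))
print(beta_entry_dir("pdb_0000318d") + beta_file_name("pdb_0000318d", "pdb"))
```

Taking the hash from the end of the ID rather than the start is what keeps the layout consistent with the current archive, where 1abc lives under /ab.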
Change History (2)
comment:1 by , 3 months ago
| Component: | Unassigned → Input/Output |
|---|---|
| Status: | assigned → accepted |
May also have to update fetching from PDB-REDO (and maybe other databases) when the identifiers switch.
—Eric
Begin forwarded message:
From: CCP4BB automatic digest system
Subject: CCP4BB Digest - 22 Apr 2026 to 23 Apr 2026 (#2026-96)
Date: April 23, 2026 at 4:00:35 PM PDT
To: CCP4BB@…
Reply-To: CCP4 bulletin board
Date: Thu, 23 Apr 2026 15:14:43 +0000
From: Robbie Joosten
Subject: Moving to the new PDB identifiers - DSSP and PDB-REDO
Hi everyone,
The PDB will soon run out of 4-character PDB identifiers and they will switch to a new, more recognizable format. For example, 2b8h will become pdb_00002b8h. When this switch is definitive there will not be any new PDB formatted entry files. mmCIF is the way of the future or actually already the way of the present.
To deal with this improvement we already moved the DSSP databank at https://pdb-redo.eu/dssp to the new identifiers. Now you should refer to a pdb entry with pdb_00001abc or just 00001abc. No worries, the old PDB identifiers like 1abc will continue to work for now. If you download the whole DSSP databank through rsync://rsync.pdb-redo.eu/dssp/ you will find that the files will have the new style identifier in the name.
We are now working on updating pdb-redo to also use the new identifiers. We will make the switch over the next few months and warn you when we flip the switch. In preparation for that we are gradually saying goodbye to the PDB format. The intermediate models in PDB format are already removed and the final models are provided in PDB format only if they fit. For the PDB-REDO server PDB format will be supported as input, but we strongly recommend users to switch to mmCIF.
Thank you for your attention to this matter,
The PDB-REDO team