#19838 assigned enhancement

Better support for long PDB IDs.

Reported by: Tom Goddard
Owned by: Eric Pettersen
Priority: moderate
Milestone:
Component: Unassigned
Version:
Keywords:
Cc:
Blocked By:
Blocking:
Notify when closed:
Platform: all
Project: ChimeraX

Description

As per the PDB announcement (below), 8-character PDB IDs are getting closer. T.G. suggests allowing not only pdb_00012xyz, but also 12xyz and 00012xyz as synonyms. The code already allows the last one, but assumes the first five characters are digits, which does not jibe with the example ID in the mail (pdb_1000axyz). It would be good to see what IDs look like in practice, so the rules can be made as specific as possible and we avoid trying to fetch file-name typos and producing the resulting confusing error message.
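The synonym handling suggested above could be sketched roughly as follows. This is not the existing ChimeraX code: the function name normalize_pdb_id and both patterns are hypothetical, and the rule that a bare ID must start with a digit is an assumption made only to reject obvious file-name typos. The zero padding follows the announcement's own example, where 116d becomes pdb_0000116d.

```python
import re

# Hypothetical patterns: the real allocation rules for extended IDs are not
# confirmed here, so these are deliberately loose.
_EXTENDED = re.compile(r"^pdb_[0-9a-z]{8}$")
# Assumption: a bare ID is 4 to 8 alphanumerics starting with a digit.
_BARE = re.compile(r"^[0-9][0-9a-z]{3,7}$")


def normalize_pdb_id(text):
    """Return the canonical 12-character form ('pdb_' + 8 chars), or None.

    Accepts pdb_00012xyz, 00012xyz, and 12xyz as synonyms.
    """
    s = text.strip().lower()
    if _EXTENDED.match(s):
        return s
    if _BARE.match(s):
        # Left-pad with zeros to 8 characters, e.g. 12xyz -> 00012xyz.
        return "pdb_" + s.zfill(8)
    return None
```

With these assumptions, 12xyz, 00012xyz, and pdb_00012xyz all normalize to pdb_00012xyz, pdb_1000axyz is accepted unchanged, and a string such as myfile is rejected. Tightening the patterns would need the survey of real IDs mentioned above.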

PDB announcement:

From: Irina Persikova via pdb-l <pdb-l@…>
Subject: pdb-l: New PDB Beta Archive Available for Testing
Date: February 11, 2026 at 8:07:30 AM PST
To: pdb-l@…
Reply-To: Irina Persikova <persikov@…>

*By 2028* 4-character PDB IDs (e.g. *1abc*) will be fully allocated. After that, all new entries will be assigned *only extended PDB IDs*.

The new *extended PDB ID format <https://www.wwpdb.org/documentation/pdb-id-extension-faq> will be 12 characters*, which includes a prefix pdb_ followed by 8 alphanumeric characters, e.g. *pdb_1000axyz*. This new ID format <https://www.wwpdb.org/documentation/pdb-id-extension-faq> will enable text mining detection of PDB entries in the published literature and allow for more informative and transparent delivery of revised data files. */When submitting extended PDB IDs to journals and citing extended PDB IDs in manuscripts, all 12 characters including prefix pdb_ should be provided./ *

A PDB Beta Archive <https://www.wwpdb.org/ftp/pdb-beta-ftp-sites> is now available to help community adopt extended PDB ID and PDBx/mmCIF format during the transition phase. All files at this archive are re-organized with extended PDB ID (including file naming and directories) at entry level, mirroring the same data organization of the PDB Versioned Archive <http://files-versioned.wwpdb.org/>.

All data files for a particular entry are stored in a single directory, labeled based on a two-character hash generated from the penultimate two characters of the PDB code, i.e., https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/<two-letter-hash>/<pdb_accession_code>/<entry_data_File_names>. The two-letter hash will be based on the second and third characters from the last character. For example, PDB entry pdb_1abc5*67*8 will be under */67/*. This will maintain consistency with the current PDB archive: PDB entry 1abc is under /ab.
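A minimal sketch of the directory rule described above, for anyone building fetch URLs. The names entry_hash and entry_dir_url are hypothetical. Going by the two examples in the announcement (pdb_1abc5678 under /67/, 1abc under /ab), the hash is taken to be the second and third characters counting from the end of the ID; that reading is an assumption based on those examples.

```python
# Base URL taken from the announcement.
BETA_ENTRIES_URL = "https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries"


def entry_hash(pdb_id):
    """Two-character directory hash: the two characters before the last one."""
    return pdb_id[-3:-1]


def entry_dir_url(extended_id):
    """Directory URL for an entry, given its full extended ID (with pdb_ prefix)."""
    return f"{BETA_ENTRIES_URL}/{entry_hash(extended_id)}/{extended_id}/"
```

The same slice gives 67 for pdb_1abc5678 and ab for 1abc, which matches the consistency claim in the announcement.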

File naming is standardized such that the file type is used for the extension.
For example, file naming is changed from *r116dsf.ent.gz* to *pdb_0000116d-sf.cif.gz* for the structure factor file and from *pdb318d.ent.gz* to *pdb_0000318d.pdb.gz* for the legacy PDB formatted coordinate file.
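The renaming in those two examples could be sketched as a mapping from legacy file names to the new ones. Only the two file types named in the announcement are covered. The function name beta_file_name and the patterns are hypothetical, and other file types are left out because their new names are not given here.

```python
import re

# (legacy pattern, new-name template), covering only the two examples above.
_LEGACY_RULES = [
    # Structure factors: r116dsf.ent.gz -> pdb_0000116d-sf.cif.gz
    (re.compile(r"^r([0-9a-z]{4})sf\.ent\.gz$"), "pdb_{id}-sf.cif.gz"),
    # Legacy PDB coordinates: pdb318d.ent.gz -> pdb_0000318d.pdb.gz
    (re.compile(r"^pdb([0-9a-z]{4})\.ent\.gz$"), "pdb_{id}.pdb.gz"),
]


def beta_file_name(legacy_name):
    """Map a legacy archive file name to its beta-archive name, or None."""
    name = legacy_name.lower()
    for pattern, template in _LEGACY_RULES:
        m = pattern.match(name)
        if m:
            # Zero-pad the 4-character ID to the 8-character extended form.
            return template.format(id=m.group(1).zfill(8))
    return None
```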

When four-character PDB IDs are about to be consumed, this PDB Beta Archive will replace the current PDB Archive (expected to be around mid-2027), and entries issued with extended PDB IDs will not be compatible with the PDB format. wwPDB encourages scientific journals, the PDB community, and users to transition to PDBx/mmCIF format and adopt the new PDB ID format as early as possible.

For any further information please contact us at info@….

Please read the full news at: https://www.wwpdb.org/news/news?year=2026#698a36067e4af405aeeb5b24

On behalf of the wwPDB,

