The matchmaker (or mmaker) command superimposes protein or nucleic acid structures by first creating a pairwise sequence alignment, then fitting the aligned residue pairs. It is the command-line implementation of the Matchmaker tool.
Residue types and/or protein secondary structure information can be used to align the sequences, and the pairwise alignment can be shown in the Sequence Viewer. Fitting uses one point per residue: CA in amino acid residues and C4' in nucleic acid residues. If a nucleic acid residue lacks a C4' atom (some lower-resolution structures are P traces), its P atom will be paired with the P atom of the aligned residue.
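The one-point-per-residue rule above can be sketched as a small selection function. This is only an illustration of the rule as described, not the actual implementation; the function name `fitting_atom_pair` and the representation of a residue as a set of atom names are assumptions made for the example.

```python
def fitting_atom_pair(kind, atoms_a, atoms_b):
    """Pick the atom names used to represent one aligned residue pair.

    kind    -- "protein" or "nucleic" (hypothetical labels for this sketch)
    atoms_a -- set of atom names present in the first residue
    atoms_b -- set of atom names present in the aligned residue
    Returns a (name, name) tuple, or None if no suitable atoms exist.
    """
    if kind == "protein":
        candidates = ("CA",)
    elif kind == "nucleic":
        # Prefer C4'; fall back to P when either residue lacks C4'
        # (for example, in a P-trace structure).
        candidates = ("C4'", "P")
    else:
        raise ValueError(f"unknown polymer kind: {kind!r}")
    for name in candidates:
        if name in atoms_a and name in atoms_b:
            return (name, name)
    return None


# A P-trace residue aligned with a full nucleotide is paired on P atoms.
print(fitting_atom_pair("nucleic", {"P"}, {"P", "C4'", "C1'"}))
```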
The method was originally implemented in Chimera, as described in:
Tools for integrated sequence-structure analysis with UCSF Chimera. Meng EC, Pettersen EF, Couch GS, Huang CC, Ferrin TE. BMC Bioinformatics. 2006 Jul 12;7:339.
Note: if it is already known which residue numbers in one structure should be paired with which residue numbers in the other, another possibility is to use the align command, which executes more quickly because it does not include a sequence alignment step.
See also: rmsd, fitmap, morph, view, dssp, measure rotation, save PDB
The structure to match (matchstruct) and a reference structure (refstruct) must be specified. The matchstruct specification can include multiple models to be matched to refstruct independently, but cannot include parts of the same model as refstruct. The pairing mode determines whether chains or models should be specified. On occasion, it may be useful to restrict the calculation to certain residues. When matchstruct includes only a single model, one or more additional models to move along with it can be specified with the bring option.
Sequence alignment scores, parameter values, and final match RMSDs are reported in the Log. If the fit is iterated, the final RMSD over all residue pairs (columns in the sequence alignment) will be reported along with the RMSD over the pruned set of pairs.
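As an illustration of the two RMSD values reported for an iterated fit, the following sketch computes the RMSD over the pruned set of pairs and over all pairs. It assumes the structures are already superimposed and that each residue pair is given as two (x, y, z) tuples; the names `rmsd`, `report_rmsds`, and `kept` are hypothetical and not part of the command.

```python
import math


def rmsd(pairs):
    """Root-mean-square distance over a list of (point_a, point_b) pairs."""
    if not pairs:
        raise ValueError("need at least one pair")
    return math.sqrt(sum(math.dist(a, b) ** 2 for a, b in pairs) / len(pairs))


def report_rmsds(pairs, kept):
    """Return (RMSD over pruned pairs, RMSD over all pairs).

    kept -- one boolean per pair, True if the pair survived iteration.
    """
    pruned = [pair for pair, keep in zip(pairs, kept) if keep]
    return rmsd(pruned), rmsd(pairs)


pairs = [((0, 0, 0), (0, 0, 1)), ((0, 0, 0), (3, 4, 0))]
pruned_rmsd, full_rmsd = report_rmsds(pairs, kept=[True, False])
print(f"pruned: {pruned_rmsd:.3f}  all pairs: {full_rmsd:.3f}")
```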
match #2 to #1 bring #3 – superimpose model 2 onto model 1 using default settings, and move model 3 along with model 2 as if they were a single rigid body (retaining the current spatial relationship between models 2 and 3). Default settings are to recalculate secondary structure assignments with dssp, generate sequence alignments using the Needleman-Wunsch algorithm with the BLOSUM-62 residue similarity matrix (weight 0.7) and secondary structure scoring (weight 0.3), keep the sequence alignment for the best-scoring pair of chains (one from model 2 and one from model 1), and using that alignment, iteratively fit the structures with a cutoff of 2.0 Å.
mm #1-5 to #6 – independently superimpose models 1-5 onto model 6 using default settings
mm #2 to #1/a pair bs alg sw matrix PAM-150 ss false cut 5.0 (example structures: mouse and human phosphoserine phosphatases 1j97, 1nnl open as models 1 and 2, respectively)
mm #1/a,b to #2/c,d pair ss (example structures: insulin 1b17, 1ben open as models 1 and 2, respectively)
mm #1 to #2 matrix Nucleic (example structures: tRNAs 2tra, 4tra open as models 1 and 2, respectively)
Including specific residues in the matchstruct and/or refstruct specifications restricts the calculation to only those residues. In general, restriction should only be used in specific cases to suppress results that would otherwise be obtained. For example, two chains that would otherwise align on their N-terminal domains can be forced to align on their C-terminal domains by specifying only the residues in the C-terminal domains. Otherwise, restriction is not recommended, because full-length alignments tend to be of higher quality, and iteration already serves to exclude poorly superimposed regions from the final fit. Although the unused parts of matched chains will appear in the resulting sequence alignment (if shown), they have simply been added back in as “filler,” without consideration of how the characters align, after alignment and matching of only the specified residues.
The mode controls which chain sequences are used to construct the reference-match alignment:
- bb (default) - use the pair of chains, one from the match model and one from the reference model, with the highest alignment score; matchstruct and refstruct should each specify a model or part of a model
- bs - use the chain in the match model that gives the best alignment score with a specific chain in the reference model; matchstruct should specify a model or part of a model, and refstruct should specify a chain
- ss - pair specific chain(s) in the match model with specific chain(s) in the reference model; matchstruct and refstruct should specify equal numbers of chains
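The chain selection implied by the bb and bs modes can be sketched as follows. This is a toy illustration under stated assumptions: chains are (id, sequence) tuples, and `toy_score` is a stand-in that counts identical positions rather than a real alignment score.

```python
from itertools import product


def toy_score(match_chain, ref_chain):
    """Stand-in for an alignment score: number of identical positions."""
    return sum(a == b for a, b in zip(match_chain[1], ref_chain[1]))


def best_pair(match_chains, ref_chains, score=toy_score):
    """bb-style choice: the (match, reference) chain pair with the top score."""
    return max(product(match_chains, ref_chains), key=lambda pair: score(*pair))


match_chains = [("A", "MKTAY"), ("B", "GGGGG")]
ref_chains = [("C", "MKTAY"), ("D", "GGGAG")]

# bb: consider every match chain against every reference chain.
print(best_pair(match_chains, ref_chains))
# bs: the reference chain is fixed, so only the match chain is chosen.
print(best_pair(match_chains, [ref_chains[1]]))
```

In ss mode no search is needed, since the chains are paired in the order given.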
The alignment-algorithm can be:
- nw (or needle; default) - Needleman-Wunsch, global
- sw (or smith) - Smith-Waterman, local
The alignment scoring options allow for fine-tuning, if needed.
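The difference between global (nw) and local (sw) alignment can be illustrated with a minimal scoring sketch. It deliberately simplifies: it uses a toy match/mismatch score and a single linear gap penalty instead of a similarity matrix and the separate opening and extension penalties described in this document, and it returns only the score, not the alignment. All names and values are assumptions for the example.

```python
def align_score(a, b, local=False, match=1, mismatch=-1, gap=-1):
    """Best alignment score of sequences a and b by dynamic programming.

    local=False: Needleman-Wunsch (global, end gaps are penalized).
    local=True:  Smith-Waterman (local, scores are floored at zero).
    """
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, cur[j - 1] + gap)
            if local:
                score = max(score, 0)      # a local alignment may restart anywhere
                best = max(best, score)    # and may end anywhere
            cur.append(score)
        prev = cur
    return best if local else prev[-1]


print(align_score("ACGT", "AGT"))                      # global
print(align_score("TTACGTTT", "GGACGGG", local=True))  # local
```

The local variant rewards the shared ACG segment and ignores the unrelated flanking residues, whereas the global variant must account for every position in both sequences.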
showAlignment true | false
Whether to show the resulting pairwise sequence alignment(s) in the Sequence Viewer. When fit iteration is employed, the pairs used in the final fit will be highlighted in the sequence alignment as a region named matched residues. An RMSD header is automatically shown above the sequences, with histogram bar heights representing the spatial variation among residues associated with a column.
*These sequence alignments are a by-product of superposition, and may not be entirely correct. Successful superposition only requires the sequence alignment to be partly correct, as incorrect portions tend to be excluded from the fit during iteration. If the sequences are easy to align (highly similar), the sequence alignment is likely to be correct throughout. However, if the sequences are more distantly related, parts of the alignment may be incorrect even when the superposition is good.
**When the fit has been restricted to specified residues, the remaining residues of matched chains will still appear in the alignment, but merely as a convenient compact representation; how they are aligned is not meaningful.
cutoffDistance cutoff | none
By default, structures are fit iteratively with a cutoff of 2.0 Å for omitting farther-apart pairs from the fit. Specifying the cutoff as none turns iteration off. When iteration is performed, the sequence alignment is not changed, but residue pairs in the alignment can be pruned from the “match list” used to superimpose the structures. In each cycle, pairs of atoms are removed from the match list and the remaining pairs are fitted, until no matched pair is more than cutoff apart (default 2.0 Å). The atom pairs removed are either the 10% farthest apart of all pairs or the 50% farthest apart of all pairs exceeding the cutoff, whichever is the lesser number of pairs. The result of iteration is that the best-matching “core” regions are maximally superimposed, and conformationally dissimilar regions such as flexible loops are not included in the final fit, even though they may be aligned in the sequence alignment.
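The rule for choosing which pairs to remove in one iteration cycle can be sketched as below. This covers only the pruning step, not the refitting between cycles. The function name is hypothetical, and two details are assumptions not stated in the text: counts are rounded down, and at least one pair is always removed so that the loop makes progress.

```python
def pairs_to_drop(distances, cutoff=2.0):
    """Indices of residue pairs to remove in one pruning cycle.

    distances -- per-pair distances after the current fit
    Returns an empty list when no pair exceeds the cutoff (iteration stops).
    """
    over = [i for i, d in enumerate(distances) if d > cutoff]
    if not over:
        return []
    ten_percent_of_all = len(distances) // 10   # assumption: round down
    half_of_over = len(over) // 2               # assumption: round down
    # Whichever is the lesser number; keep at least one (assumption).
    n_drop = max(1, min(ten_percent_of_all, half_of_over))
    farthest_first = sorted(range(len(distances)),
                            key=lambda i: distances[i], reverse=True)
    return farthest_first[:n_drop]


distances = [0.5] * 16 + [2.5, 3.0, 4.0, 6.0]
print(pairs_to_drop(distances))
```

In a full iteration, the remaining pairs would be refitted and the distances recomputed before calling the function again.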
verbose true | false
For each chain-chain pair, send additional information to the Log:
- Sequences: followed by the pairwise sequence alignment, i.e., two lines, each containing a sequence name and (gapped) sequence
- Residues: followed by two lines, each a comma-separated list of the structure residues associated with the nongap positions of the corresponding sequence; missing structure residues are reported as None
- Residue usage in match (1=used, 0=unused): followed by two lines, each a comma-separated list of zeros and ones, indicating which structure residues were used in the final fit
Sequence alignment scores can include a residue similarity term, a secondary structure term (if protein), and gap penalties.
The similarity-matrix can be any of: BLOSUM-30, BLOSUM-35, BLOSUM-40, BLOSUM-45, BLOSUM-50, BLOSUM-55, BLOSUM-60, BLOSUM-62 (default), BLOSUM-65, BLOSUM-70, BLOSUM-75, BLOSUM-80, BLOSUM-85, BLOSUM-90, BLOSUM-100, BLOSUM-N, PAM-40, PAM-120, PAM-150, PAM-250, SDM, HSDM, Nucleic.
If an amino acid matrix (any except Nucleic) is specified, only peptide chains will be aligned; if the Nucleic matrix is specified, only nucleic acid chains will be aligned. An error message will appear if there are no reference-match pairs of the appropriate type.
ssFraction fraction | false
The fraction is the relative weight of the secondary structure term for proteins and can range from 0 to 1 (default 0.3). Unless the option is set to false, a protein secondary structure term will be included with a weight of fraction, and the residue similarity term will be given a weight of (1 – fraction).
computeSs true | false
When secondary structure scoring is used, whether to first identify helices and strands with the dssp algorithm (except for CA-only structures, which are automatically skipped). This option may improve superposition results by generating consistent assignments, whereas pre-existing assignments may reflect the use of different criteria on different structures. However, any pre-existing assignments will not be overwritten unless keepComputedSs is set to true. When secondary structure scoring is not used, computeSs is ignored and secondary structure assignments are not computed.
keepComputedSs true | false
When secondary structure assignments are recomputed (computeSs true), whether to overwrite any previous assignments with the newly computed ones. The default is false, meaning to use the new assignments only temporarily for superposition and to keep any secondary structure assignments that existed before the matchmaker command was used.
When secondary structure scoring is not used, the opening-penalty is subtracted from the score for each gap opened (12 by default). When secondary structure scoring is used, secondary-structure-specific gap opening penalties (see hgap, sgap, and ogap) are used instead.
The extension-penalty is subtracted from the score for each increment in gap length (1 by default).
When secondary structure scoring is used, the intrahelix-penalty is subtracted from the score for each gap opened within a helix (18 by default). When secondary structure scoring is not used, a generic gap penalty (see gapOpen) is used instead.
When secondary structure scoring is used, the intrastrand-penalty is subtracted from the score for each gap opened within a strand (18 by default). When secondary structure scoring is not used, a generic gap penalty (see gapOpen) is used instead.
When secondary structure scoring is used, the other-penalty is subtracted from the score for each gap opened that is not within a helix or strand (6 by default). When secondary structure scoring is not used, a generic gap penalty (see gapOpen) is used instead.
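How the gap penalties above combine can be sketched as follows, using the default values quoted in the text. The function and parameter names are hypothetical, as are the context labels "H" (helix), "S" (strand), and "O" (other). The sketch also assumes that a gap of length 1 incurs only the opening penalty and that each additional position adds one extension penalty; the text does not spell this out.

```python
def gap_open_penalty(context=None, use_ss=True,
                     gap_open=12, hgap=18, sgap=18, ogap=6):
    """Opening penalty for a gap, depending on where it is opened."""
    if not use_ss:
        return gap_open          # generic penalty, no secondary structure
    return {"H": hgap, "S": sgap, "O": ogap}[context]


def gap_cost(length, context=None, use_ss=True, gap_extend=1):
    """Total amount subtracted from the score for one gap of a given length."""
    if length < 1:
        raise ValueError("gap length must be at least 1")
    # Assumption: extension is charged per increment beyond the first position.
    return gap_open_penalty(context, use_ss) + gap_extend * (length - 1)


print(gap_cost(3, context="H"))      # gap opened within a helix
print(gap_cost(3, use_ss=False))     # secondary structure scoring off
```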
When secondary structure scoring is used, helix-helix-score is the value added to the secondary structure term for aligning a residue in a helix with a residue in a helix (default 6).
When secondary structure scoring is used, strand-strand-score is the value added to the secondary structure term for aligning a residue in a strand with a residue in a strand (default 6).
When secondary structure scoring is used, other-other-score is the value added to the secondary structure term for aligning a non-helix, non-strand residue with a non-helix, non-strand residue (default 4).
When secondary structure scoring is used, helix-strand-score is the value added to the secondary structure term for aligning a residue in a helix with a residue in a strand (default -9).
When secondary structure scoring is used, helix-other-score is the value added to the secondary structure term for aligning a residue in a helix with a non-helix, non-strand residue (default -6).
When secondary structure scoring is used, strand-other-score is the value added to the secondary structure term for aligning a residue in a strand with a non-helix, non-strand residue (default -6).
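The way the residue similarity term and the secondary structure term combine for one aligned column can be sketched with the default values given above. The names are hypothetical, the similarity value is passed in rather than looked up in a matrix, and treating a disabled secondary structure term (represented here as `None`) as leaving the similarity term unweighted is an assumption.

```python
# Default secondary structure scores from the text; H = helix, S = strand,
# O = other. Keys are the two states in sorted order.
SS_SCORES = {"HH": 6, "SS": 6, "OO": 4, "HS": -9, "HO": -6, "OS": -6}


def ss_score(state_a, state_b):
    """Secondary structure score for aligning two residues."""
    return SS_SCORES["".join(sorted(state_a + state_b))]


def column_score(similarity, state_a, state_b, fraction=0.3):
    """Weighted score for one aligned residue pair.

    similarity -- residue similarity matrix value for the pair
    fraction   -- weight of the secondary structure term, or None if unused
    """
    if fraction is None:
        return similarity        # assumption: similarity term used as is
    return (1 - fraction) * similarity + fraction * ss_score(state_a, state_b)


# Alanine aligned with alanine (BLOSUM-62 value 4), both in helices.
print(round(column_score(4, "H", "H"), 2))
# The same residues, but a helix aligned with a strand.
print(round(column_score(4, "H", "S"), 2))
```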