mmaker, matchmaker

Usage:
( mmaker | matchmaker ) refstruct matchstruct options

Mmaker (or matchmaker) is the command-line implementation of MatchMaker, which superimposes structures by first creating a pairwise sequence alignment, then fitting the aligned residue pairs. Residue types and/or secondary structure information can be used to align the sequences. Fitting uses one point per residue: CA in amino acid residues and C4' in nucleic acid residues. If a nucleic acid residue lacks a C4' atom (some lower-resolution structures are P traces), its P atom will be paired with the P atom of the aligned residue.

Note: if it is already known which residue numbers in one structure should be paired with which residue numbers in the other, another possibility is to use the match command, which executes more quickly because it does not include a sequence alignment step. See superimposing structures for a discussion of the different methods available in Chimera.

A reference structure (refstruct) and a structure to match (matchstruct) must be specified. Matchstruct can include multiple models to be matched to refstruct independently, but cannot include parts of the same model as refstruct. The pairing mode determines whether chains or models should be specified. If a specification includes any spaces, it must be enclosed in single or double quote marks. On occasion, it may be useful to restrict the calculation to certain residues by including them in the specifications, as discussed after the examples.

Sequence alignment scores, parameter values, and final match RMSDs are reported in the Reply Log. If the fit is iterated, the final RMSD over all residue pairs (columns in the sequence alignment) will be reported along with the RMSD over the pruned set of pairs.

Examples:

mm #0 #1 show true

– superimpose model 1 onto model 0 using default settings; show the sequence alignment. Default settings are to recalculate secondary structure assignments with ksdssp, generate sequence alignments using the Needleman-Wunsch algorithm with the BLOSUM-62 residue similarity matrix (weight 0.7) and secondary structure scoring (weight 0.3), keep the sequence alignment for the best-scoring pair of chains (one from model 0 and one from model 1), and using that alignment, iteratively fit the structures with a cutoff of 2.0 Å.

mm #0 #1-5 computeSS false

– independently superimpose models 1-5 onto model 0 using default settings, except without recalculating secondary structure assignments.

mmaker #0:.a #1 pair sb alg sw matrix PAM-150 ss false iter 5.0

(example structures: mouse and human phosphoserine phosphatases 1j97, 1nnl open as models 0 and 1, respectively)
– align chain A in model 0 with the highest-scoring chain in model 1 using the Smith-Waterman algorithm with the PAM-150 residue similarity matrix (weight 1.0, no secondary structure scoring); iteratively fit the structures using a cutoff of 5.0 Å.

mm #0:.a:.b #1:.c:.d pair ss show true

(example structures: insulin 1b17, 1ben open as models 0 and 1, respectively)
– align the specific chain pairs A/C and B/D (in models 0/1) using default settings; show both sequence alignments.

Including specific residues in the refstruct and/or matchstruct specifications restricts the calculation to only those residues. In general, restriction should only be used in specific cases to suppress results that would otherwise be obtained. For example, two chains that would otherwise align on their N-terminal domains can be forced to align on their C-terminal domains by specifying only the residues in the C-terminal domains. Otherwise, restriction is not recommended, because full-length alignments tend to be of higher quality, and iteration already serves to exclude poorly superimposed regions from the final fit. Although the unused parts of matched chains will appear in the resulting sequence alignment (if shown), they have simply been added back in as “filler,” without consideration of how the characters align, after alignment and matching of only the specified residues.

Options

Option keywords for mmaker (matchmaker) can be truncated to unique strings and their case does not matter. A vertical bar “|” designates mutually exclusive options, and default values are noted in the option descriptions. Synonyms for true: True, 1. Synonyms for false: False, 0. Many of the options affect alignment scoring.

pairing mode
The mode controls which chain sequences are used to construct the reference-match alignment:

bb (default) - use the pair of chains, one from the reference model and one from the match model, with the highest alignment score; refstruct and matchstruct should each specify a model or part of a model
sb - pair a specific chain in the reference model with whichever chain in the match model gives the highest alignment score; refstruct should specify a chain, matchstruct a model or part of a model
ss - pair specific chain(s) in the reference model with specific chain(s) in the match model; refstruct and matchstruct should specify equal numbers of chains
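
The bb mode amounts to scoring every reference-match chain pair and keeping the best one. A minimal Python sketch of that selection follows. It assumes chains are given as plain sequence strings; best_chain_pair, toy_score, and the sequences are made up for illustration and are not part of Chimera.

```python
from itertools import product


def best_chain_pair(ref_chains, match_chains, score_alignment):
    """Return (score, ref_chain_id, match_chain_id) for the best-scoring pair.

    ref_chains and match_chains map chain ID -> sequence string.
    score_alignment is a hypothetical callable that returns an alignment score.
    """
    scored = (
        (score_alignment(ref_chains[r], match_chains[m]), r, m)
        for r, m in product(ref_chains, match_chains)
    )
    # Ties fall to the later chain IDs, which is an arbitrary choice here.
    return max(scored)


def toy_score(a, b):
    """Stand-in score for illustration only: count identical positions."""
    return sum(x == y for x, y in zip(a, b))


ref = {"A": "GIVEQCCTSI", "B": "FVNQHLCGSH"}
match = {"C": "GIVEQCCASV", "D": "FVNQHLCGSH"}
print(best_chain_pair(ref, match, toy_score))
```

In this picture, sb corresponds to passing a single reference chain, and ss skips the search entirely because the pairs are given.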

alg alignment-algorithm
The alignment-algorithm can be:

nw (or needle; default) - Needleman-Wunsch, global
sw (or smith) - Smith-Waterman, local
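
The practical difference between the two algorithms is whether unaligned ends are penalized. The Python sketch below computes only the optimal score, under simplifying assumptions that are not mmaker's actual scoring: a fixed match/mismatch score instead of a similarity matrix, and a single linear gap penalty instead of separate opening and extension penalties.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=2, local=False):
    """Optimal alignment score: global (Needleman-Wunsch) or local (Smith-Waterman)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps; local alignment starts free.
        for i in range(1, rows):
            H[i][0] = -gap * i
        for j in range(1, cols):
            H[0][j] = -gap * j
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(H[i - 1][j - 1] + s, H[i - 1][j] - gap, H[i][j - 1] - gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
            H[i][j] = cell
            best = max(best, cell)
    # Global: score of aligning both full sequences. Local: best cell anywhere.
    return best if local else H[-1][-1]


print(alignment_score("AAAGGG", "GGG"))              # global
print(alignment_score("AAAGGG", "GGG", local=True))  # local
```

With these toy parameters the global score is dragged down by the three unmatched residues, whereas the local score reflects only the matching GGG segment.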

showAlignment true | false (default false)
Whether to show the resulting sequence alignment(s) with Multalign Viewer. When fit iteration is employed, the pairs used in the final fit will be shown in the alignment as a region named matched residues. The “RMSD: ca” header is automatically shown above the sequences, with histogram bar heights representing the single-point spatial variation among residues associated with a column. In the pairwise case, the value per column is simply the distance between the two associated residues.
Note: when the fit has been restricted to specified residues, the remaining residues of matched chains will still appear in the alignment, but merely as a convenient compact representation; how they are aligned is not meaningful.
Note: these sequence alignments can be considered a by-product of superposition. Successful superposition only requires the sequence alignment to be partly correct, as incorrect portions tend to be excluded from the fit during iteration. If the sequences are easy to align (highly similar), the sequence alignment is likely to be correct throughout. However, if the sequences are more distantly related, parts of the alignment may be incorrect even when a successful superposition is produced. When matchmaker is used simply to superimpose structures, this is not important. However, if one also wants a corresponding sequence alignment, generating a structure-based alignment (after superposition) with Match -> Align is recommended, especially if the sequences are dissimilar. The structure-based sequence alignment will provide better statistics for describing structural similarity (RMSD, etc.) because more columns will be aligned correctly.

iterate cutoff | false
Whether to iteratively fit the structures, and the cutoff for excluding residue pairs from the fit. An iterative fit will be performed unless this option is set to false. The sequence alignment is not changed, but residue pairs in the alignment can be removed from the "match list" used to superimpose the structures. In each cycle, pairs of atoms are removed from the match list and the remaining pairs are fitted, until no matched pair is more than cutoff Å apart (2.0 by default). The atom pairs removed are either the 10% farthest apart of all pairs or the 50% farthest apart of all pairs exceeding the cutoff, whichever is the lesser number of pairs. The result is that the best-matching "core" regions are maximally superimposed; conformationally dissimilar regions such as flexible loops are not included in the final fit, even though they may be aligned in the sequence alignment.
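
The selection rule for a single pruning cycle can be sketched in Python as follows. This is an illustration of the rule as described above, not Chimera's code: the function name is hypothetical, and the rounding (integer division with a minimum of one pair) is an assumption, since the description does not say how fractional counts are handled.

```python
def pairs_to_prune(distances, cutoff=2.0):
    """Return indices of the pairs to drop in one cycle (empty when converged).

    distances: current distance in angstroms for each pair in the match list.
    """
    over = [i for i, d in enumerate(distances) if d > cutoff]
    if not over:
        return []  # no pair exceeds the cutoff, so iteration stops
    # Candidate counts; rounding down with a floor of 1 is an assumption.
    ten_percent_of_all = max(1, len(distances) // 10)
    half_of_over = max(1, len(over) // 2)
    n_remove = min(ten_percent_of_all, half_of_over)
    # Either way the farthest-apart pairs go first, so rank by distance.
    ranked = sorted(range(len(distances)), key=lambda i: distances[i], reverse=True)
    return ranked[:n_remove]


# 20 pairs, 4 of them beyond the 2.0 cutoff.
distances = [0.5] * 16 + [2.5, 3.0, 4.0, 6.0]
print(pairs_to_prune(distances))
```

In a full iteration, the remaining pairs would be refitted and the distances recomputed before calling the function again, until it returns an empty list.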

Alignment Scoring Options

matrix similarity-matrix
The similarity-matrix can be any of those listed in the MatchMaker graphical interface (case is important): BLOSUM-30, BLOSUM-35, BLOSUM-40, BLOSUM-45, BLOSUM-50, BLOSUM-55, BLOSUM-60, BLOSUM-62 (default), BLOSUM-65, BLOSUM-70, BLOSUM-75, BLOSUM-80, BLOSUM-85, BLOSUM-90, BLOSUM-100, BLOSUM-N, PAM-40, PAM-120, PAM-150, PAM-250, SDM, HSDM, Nucleic. Matrix files reside in the share/SmithWaterman/matrices/ directory of a Chimera installation.
If an amino acid matrix (any except Nucleic) is chosen, only peptide chains will be aligned; if the Nucleic matrix is chosen, only nucleic acid chains will be aligned. An error message will appear if there are no reference-match pairs of the appropriate type.

gapOpen opening-penalty
When secondary structure scoring is not being used, the opening-penalty is subtracted from the score for each gap opened (12 by default). When secondary structure scoring is used, secondary-structure-specific gap opening penalties (see hgap, sgap, and ogap) are used instead.

gapExtend extension-penalty
The extension-penalty is subtracted from the score for each increment in gap length (1 by default).

ssFraction fraction | false
Sequence alignment scores can include a residue identity/similarity term, a secondary structure term, and gap penalties. Fraction is the relative weight of the secondary structure term and can range from 0 to 1 (default 0.3). Unless the option is set to false, a secondary structure term will be included with a weight of fraction and the residue similarity term will be given a weight of (1 - fraction).
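
As a rough illustration of the weighting, the Python sketch below combines the two terms for one aligned residue pair. Treating the combination as a simple per-column weighted sum is an assumption based on the description above, and column_score is a hypothetical name, not a Chimera function.

```python
def column_score(similarity, ss_score, fraction=0.3):
    """Weighted score for one aligned residue pair (gap penalties not included).

    similarity: value from the residue similarity matrix
    ss_score:   value from the secondary structure scoring matrix
    fraction:   weight of the secondary structure term, or False to disable it
    """
    if fraction is False:
        return similarity  # residue similarity only, weight 1.0
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    return (1 - fraction) * similarity + fraction * ss_score


# Hypothetical inputs: similarity score 4, helix-helix score 6.
print(column_score(4, 6))         # default weights 0.7 and 0.3
print(column_score(4, 6, False))  # secondary structure scoring off
```

With the default weights, a similarity score of 4 and a secondary structure score of 6 combine as 0.7 × 4 + 0.3 × 6 = 4.6.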

computeSS true | false (default true)
When secondary structure scoring is used, whether to first identify helices and strands with the ksdssp algorithm, overwriting any pre-existing secondary structure assignments (except for CA-only structures, which are automatically skipped). This option may improve superposition results by generating consistent assignments, whereas pre-existing assignments may reflect the use of different criteria on different structures. When secondary structure scoring is not being used, this option is ignored and secondary structure assignments are not computed.

hgap intrahelix-penalty
When secondary structure scoring is used, the intrahelix-penalty is subtracted from the score for each gap opened within a helix (18 by default). When secondary structure scoring is not being used, a generic gap penalty (see gapOpen) is used instead.

sgap intrastrand-penalty
When secondary structure scoring is used, the intrastrand-penalty is subtracted from the score for each gap opened within a strand (18 by default). When secondary structure scoring is not being used, a generic gap penalty (see gapOpen) is used instead.

ogap other-penalty
When secondary structure scoring is used, the other-penalty is subtracted from the score for each gap opened that is not within a helix or strand (6 by default). When secondary structure scoring is not being used, a generic gap penalty (see gapOpen) is used instead.
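
The choice among the four gap opening penalties can be summarized with a small Python sketch. The defaults are those given above; the function and the context labels are hypothetical, and how a gap is judged to lie within a helix or strand is not specified here, so the caller supplies that label.

```python
DEFAULT_PENALTIES = {"gapOpen": 12, "hgap": 18, "sgap": 18, "ogap": 6}


def gap_open_penalty(context, use_ss=True, penalties=DEFAULT_PENALTIES):
    """Return the penalty subtracted when a gap is opened.

    context: "helix", "strand", or "other" (where the gap would open);
             ignored when secondary structure scoring is off.
    """
    if not use_ss:
        return penalties["gapOpen"]  # generic penalty
    key = {"helix": "hgap", "strand": "sgap", "other": "ogap"}[context]
    return penalties[key]


print(gap_open_penalty("helix"))                # secondary structure scoring on
print(gap_open_penalty("helix", use_ss=False))  # secondary structure scoring off
```

The higher helix and strand defaults make the alignment prefer placing gaps in loops rather than inside secondary structure elements.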

matHH helix-helix-score
When secondary structure scoring is used, helix-helix-score is the value added to the secondary structure term for aligning a residue in a helix with a residue in a helix (default 6).

matSS strand-strand-score
When secondary structure scoring is used, strand-strand-score is the value added to the secondary structure term for aligning a residue in a strand with a residue in a strand (default 6).

matOO other-other-score
When secondary structure scoring is used, other-other-score is the value added to the secondary structure term for aligning a non-helix, non-strand residue with a non-helix, non-strand residue (default 4).

matHS helix-strand-score
When secondary structure scoring is used, helix-strand-score is the value added to the secondary structure term for aligning a residue in a helix with a residue in a strand (default -9).

matHO helix-other-score
When secondary structure scoring is used, helix-other-score is the value added to the secondary structure term for aligning a residue in a helix with a non-helix, non-strand residue (default -6).

matSO strand-other-score
When secondary structure scoring is used, strand-other-score is the value added to the secondary structure term for aligning a residue in a strand with a non-helix, non-strand residue (default -6).
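
Taken together, the six mat options define a small scoring table over secondary structure types. A Python sketch using the default values listed above is shown below; the one-letter labels and the function name are illustrative choices, and the sketch assumes the table is symmetric (helix-strand scores the same as strand-helix).

```python
# Defaults for matHH, matSS, matOO, matHS, matHO, matSO.
# H = helix, S = strand, O = other.
SS_DEFAULTS = {
    ("H", "H"): 6,
    ("S", "S"): 6,
    ("O", "O"): 4,
    ("H", "S"): -9,
    ("H", "O"): -6,
    ("S", "O"): -6,
}


def ss_term(a, b, table=SS_DEFAULTS):
    """Secondary structure score for aligning types a and b ('H', 'S', or 'O')."""
    # Look up the pair in either order (symmetry is an assumption).
    return table[(a, b)] if (a, b) in table else table[(b, a)]


print(ss_term("H", "H"), ss_term("S", "H"), ss_term("O", "S"))
```

Matching types are rewarded and mismatched types are penalized, with helix-strand pairings penalized most heavily.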

verbose true | false (default false)
For each chain-chain pair, send additional information to the Reply Log:

Sequences: followed by the pairwise sequence alignment, i.e., two lines, each containing a sequence name and (gapped) sequence
Residues: followed by two lines, each a comma-separated list of the structure residues associated with the nongap positions of the corresponding sequence; missing structure residues are reported as None
Residue usage in match (1=used, 0=unused): followed by two lines, each a comma-separated list of zeroes and ones, indicating which structure residues were used in the final fit
