Online Sources of Sequence Alignments

This is by no means an exhaustive list, but it includes several sources of protein multiple sequence alignments for use in Chimera and/or ChimeraX.* Why you might want such alignments:

Chimera(X) can be used to display the sequence alignment, calculate and display sequence conservation, superimpose structures, morph between structures, etc.
Related tutorials (Chimera): Sequences and Structures, Superpositions and Alignments, Mapping Sequence Conservation onto Structures
Related tutorials (ChimeraX): Coloring by Sequence Conservation

*Chimera(X) is not meant to handle very large multiple sequence alignments (hundreds to thousands of sequences). A good tool for working with large alignments is Jalview, which allows redundancy filtering and saving a smaller alignment in various formats (aligned FASTA, etc.) that can be read into Chimera. Jalview also calculates per-column conservation, both directly and via the AACon web service. These annotation values can be exported as CSV, but it can be difficult to reformat them properly into a sequence alignment header file (with the same number of positions as the filtered alignment) or a residue attribute file for use in Chimera. A direct connection between Jalview and Chimera is under development.
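The hard part of turning per-column values into a residue attribute file is mapping alignment columns onto residue numbers, skipping the columns where your structure's sequence has a gap. Here is a minimal Python sketch of that step. It assumes you have already pulled the values out of the CSV into a list with exactly one number per alignment column (the CSV layout itself is not handled here). It also assumes the structure is a single chain whose residues are numbered consecutively from `first_residue`, with no missing residues or insertion codes. The function name and parameters are hypothetical, and the header lines follow the Chimera attribute-assignment file format as I understand it (check against the `defineattr` documentation).

```python
def column_values_to_attr_file(aligned_seq, column_values, attr_name="conservation",
                               first_residue=1, gap_chars="-."):
    """Map per-column alignment values onto residues of one aligned sequence.

    Hypothetical helper. Assumes a single-chain structure numbered consecutively
    from first_residue, matching the ungapped aligned sequence exactly.
    """
    if len(aligned_seq) != len(column_values):
        raise ValueError("need exactly one value per alignment column")
    # Header lines of a Chimera attribute-assignment file
    lines = [f"attribute: {attr_name}", "match mode: 1-to-1", "recipient: residues"]
    resnum = first_residue
    for char, value in zip(aligned_seq, column_values):
        if char in gap_chars:
            continue  # gap: this sequence has no residue in this column
        # Data lines: empty first field, residue spec, value (tab-separated)
        lines.append(f"\t:{resnum}\t{value}")
        resnum += 1
    return "\n".join(lines) + "\n"


text = column_values_to_attr_file("MK-LV", [0.9, 0.5, 0.1, 0.7, 0.3], first_residue=10)
print(text)
```

The value for the gapped third column (0.1) is dropped, and residues 10 to 13 get the remaining four values. If residue numbering in the structure has breaks, you would need the actual residue numbers instead of a simple counter.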

Alignment sources are grouped as follows:

Descriptions here are minimal; see literature references and/or other documentation at the individual websites for more details.

Points to consider:

(A) Alignments containing proteins of known structure

These databases contain sequence alignments of proteins with experimentally determined 3D structures. Typically the names in the alignment are structure identifiers, which makes it easy to autofetch all the structures in a single step in Chimera (from the sequence alignment window, choose Structure... Load Structures). Of course, you can also fetch just a subset of the structures individually with the open command or File... Fetch by ID.
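If you want a subset, or a reusable command script, the structure IDs can be pulled out of the sequence names. The Python sketch below assumes names look like a 4-character PDB ID optionally followed by a separator and chain (e.g. `1abc`, `1abc_A`, `1abc:A`). Real databases use various naming conventions, so adjust the hypothetical `NAME_PATTERN` to match yours. It writes one `open` command per unique ID, which can be edited down and then run as a command script.

```python
import re

# Assumed naming convention: PDB ID (digit + 3 alphanumerics), optional "_", ":" or "." + chain
NAME_PATTERN = re.compile(r"^([0-9][A-Za-z0-9]{3})(?:[_:.][A-Za-z0-9]+)?$")


def fasta_names(text):
    """Return the name (first word after '>') of each record in FASTA text."""
    return [line[1:].split()[0]
            for line in text.splitlines()
            if line.startswith(">") and line[1:].split()]


def open_commands(names):
    """Build one 'open <ID>' command per unique structure ID, in alignment order."""
    ids = []
    for name in names:
        match = NAME_PATTERN.match(name)
        if match:
            pdb_id = match.group(1).lower()
            if pdb_id not in ids:  # several chains of one entry -> fetch once
                ids.append(pdb_id)
    return [f"open {pdb_id}" for pdb_id in ids]


fasta = ">1abc_A some description\nMKV\n>1ABC_B\nMKV\n>2xyz\nMKL\n>myseq\nMKI\n"
print("\n".join(open_commands(fasta_names(fasta))))
```

Names that don't look like structure IDs (`myseq` here) are simply skipped.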

(B) Alignments that do not necessarily contain proteins of known structure

If the corresponding tree in New Hampshire (aka Newick) format is available, it can be loaded after the sequence alignment has been opened.
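Before loading a tree, it can save some head-scratching to confirm that its leaf names match the sequence names in the alignment. I'm assuming mismatched names will keep tree leaves from lining up with sequences. Here is a minimal Python sketch, assuming simple unquoted Newick labels without spaces; quoted labels and bracketed comments are not handled. The function names are hypothetical.

```python
import re


def newick_leaf_names(newick):
    """Leaf labels in a simple Newick string.

    Leaf labels follow "(" or ","; labels right after ")" are internal-node
    labels (e.g. bootstrap values) and are skipped.
    """
    return re.findall(r"[(,]\s*([^(),:;\[\]\s]+)", newick)


def check_tree_against_alignment(newick, seq_names):
    """Return (leaves missing from the alignment, sequences missing from the tree)."""
    leaves = set(newick_leaf_names(newick))
    names = set(seq_names)
    return sorted(leaves - names), sorted(names - leaves)


tree = "((human:0.1,mouse:0.2)90:0.05,chicken:0.3);"
print(check_tree_against_alignment(tree, ["human", "mouse", "frog"]))
```

Here `chicken` is in the tree but not the alignment, and `frog` is in the alignment but not the tree.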

(C) Server-generated multiple alignment from a single input

(D) DIY: Find sequences individually, use alignment server

Issues to consider include how diverse the set of sequences should be, alignment quality, and balance: an alignment could oversample some areas of the intended “sequence space” and undersample others. Imbalance can be reduced by filtering out sequences above some threshold of pairwise sequence identity and, in Chimera, by using sequence-weighting options when calculating conservation.
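Identity filtering is usually done with a tool like Jalview, but the idea is simple enough to sketch. In this minimal Python version, a sequence is kept only if its identity to every sequence already kept is below the threshold. Identity is computed over columns where neither sequence has a gap; that is one of several common definitions, and my assumption here. The greedy approach is order-dependent: whichever of two near-duplicates comes first is the one kept. Function names are hypothetical.

```python
def percent_identity(a, b, gap_chars="-."):
    """Percent identity of two aligned sequences over columns where neither has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x not in gap_chars and y not in gap_chars]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def filter_redundant(records, max_identity=90.0):
    """Greedily keep (name, aligned_seq) records below max_identity to all kept ones."""
    kept = []
    for name, seq in records:
        if all(percent_identity(seq, kept_seq) < max_identity for _, kept_seq in kept):
            kept.append((name, seq))
    return kept


records = [("a", "MKLV-A"), ("b", "MKLV-A"), ("c", "MRLIGA")]
print([name for name, _ in filter_redundant(records, max_identity=90.0)])
```

Sequence `b` is identical to `a` and is dropped; `c` is 60% identical to `a` (3 of 5 ungapped columns) and is kept.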

I used the DIY approach to make the alignments in the Chimera “hormone-receptor complex” demo (under Tools... Demos in the menu) because I wanted to include sequences for the hormone and receptor from the same six species. The sequences were similar enough to align easily, so I didn't have to worry about tweaking parameters to improve the results.

Look up sequences (I usually save or text-edit the sequences into a single FASTA file):
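Text-editing works fine for a handful of sequences; for more, a few lines of Python can do the same cleanup. This sketch assumes each sequence has been copied as plain text from a database page and may include spaces, line breaks, and position numbers, which are stripped by keeping only letters. It writes one FASTA record per (name, sequence) pair with wrapped lines. The names and sequences below are placeholders, not real proteins.

```python
import textwrap


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text with wrapped sequence lines."""
    lines = []
    for name, seq in records:
        # Keep only letters: drops spaces, newlines, and position numbers from pasted text
        clean = "".join(ch for ch in seq if ch.isalpha()).upper()
        lines.append(f">{name}")
        lines.extend(textwrap.wrap(clean, width))
    return "\n".join(lines) + "\n"


records = [("seq1", "MKTAYIAKQR QISFVKSHFS"), ("seq2", "10 mallvhflpl 20")]
with open("sequences.fasta", "w") as handle:  # hypothetical output file name
    handle.write(to_fasta(records, width=10))
```

The resulting file is ready to paste or upload to an alignment server.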

Use a server to align them (order is merely alphabetical):
December 2023 / meng[at] / home page

If you find broken links or outdated information in this page, please let me know – thanks!