When run as an application jalview takes the name of an alignment file on the command line. The format is:
java jalview.AlignFrame <alignfile> File|URL <format> [-mail <mailserver> -srsserver <srsserver> -database <srsdatabase>]
For people using the script file jalview.bat or Jalview the syntax is:
Jalview <alignfile> <format>
<alignfile> is the name of the alignment file, which has to be in one of the supported formats. The allowed formats are MSF, FASTA, PIR, CLUSTAL and BLC and are described in more detail here.
File|URL If you are reading from a local file use the File switch here. If you are reading a URL use the URL switch.
<format> This must be one of MSF, FASTA, PIR, CLUSTAL, BLC, PFAM, MSP.
1) For a URL:
java jalview.AlignFrame http://circinus.ebi.ac.uk:6543/jalview/llym.msf URL MSF
2) For a local file:
Jalview 1lym.blc BLC
java jalview.AlignFrame 1lym.pir File PIR
<mailserver> This defaults to circinus.ebi.ac.uk. For far away places and behind firewalls insert your own mail server.
<srsserver> The default is the EBI srs server: srs.ebi.ac.uk:5000/srs5bin/cgi-bin/. If you use your own srs server then this option takes the location (minus the http://) of the wgetz program. In most cases this will be your.hostname/srs5bin/cgi-bin/
<database> The default database is swall, which is probably EBI specific. Change this to your relevant database.
See here for more details about SRS access.
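The command-line syntax described above can be assembled programmatically. The following is only a minimal sketch: the helper name `build_jalview_command` and its keyword arguments are hypothetical and not part of jalview. It just places the arguments in the documented order and adds the optional switches when a value is supplied.

```python
from typing import List, Optional

# Formats listed for the <format> argument in the text above.
SUPPORTED_FORMATS = {"MSF", "FASTA", "PIR", "CLUSTAL", "BLC", "PFAM", "MSP"}


def build_jalview_command(
    align_file: str,
    source: str,
    fmt: str,
    mail_server: Optional[str] = None,
    srs_server: Optional[str] = None,
    database: Optional[str] = None,
) -> List[str]:
    """Return the argument list for 'java jalview.AlignFrame ...'.

    Hypothetical helper: it only mirrors the documented syntax and does
    not check that the file or URL exists.
    """
    if source not in ("File", "URL"):
        raise ValueError("source must be 'File' or 'URL'")
    fmt = fmt.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")

    argv = ["java", "jalview.AlignFrame", align_file, source, fmt]
    # Optional switches are appended only when a value is given, so the
    # documented defaults stay in effect otherwise.
    for flag, value in (
        ("-mail", mail_server),
        ("-srsserver", srs_server),
        ("-database", database),
    ):
        if value is not None:
            argv += [flag, value]
    return argv
```

For instance, `build_jalview_command("1lym.pir", "File", "PIR")` yields the arguments of the local-file example given earlier; the list can be handed to `subprocess.run` if jalview is on the class path.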
Example for access to the Sanger SRS site and the Pfam database:
See Input from the command line for details of how to do this.
Selecting this option brings up a window where you can type in your local file and select the right format. Pressing the 'Apply' button loads up the alignment in a new window. Input formats allowed are MSF, FASTA, PIR, BLC, CLUSTAL, PFAM and MSP. Further details about formats are here.
This option allows you to save your alignment as text to a local file using much the same procedure as reading in a file. WARNING: files can be overwritten with no prompting.
You can save your coloured alignment as postscript using this option. A window appears where you can select the font and fontsize you wish to use as well as whether the output orientation is portrait or landscape.
As applets can't write or read local files I have provided a way of inputting alignments by cutting and pasting.
Select your alignment from your local text editor or xterm and paste directly into jalview. You then have to select the format your alignment text is in and click apply to tell jalview to interpret the output. If the format is correct a new alignment window will appear.
Unix selection: 'cat myalignfile' will display the alignment file on screen. Select the alignment text with the mouse. Move the mouse over the text input window and press the middle button. You should now have transferred your alignment to the jalview input box.
Windows selection: Open up your alignment in a text editor (notepad, wordpad or whatever). Select all the text with the mouse and type CTRL-C to copy it. Move your mouse over the jalview text input window and type CTRL-V. The alignment text should now be transferred to the input window.
Again, as applets can't write local files jalview provides output via email. Selecting the 'Mail alignment...' option from the File menu brings up a window where you can enter your email address and select the output format you want. Pressing the 'Apply' button initiates the mail transfer and will send you a text version of your alignment.
Similarly to the textbox input option the text version of your alignment can be output via a java text box. See the 'Input via a text box' description for how to cut and paste your alignments.
A coloured version of your alignment can be obtained by email. Selecting the 'Mail postscript...' option in the file menu brings up a window where you can enter your email address. You can then select your choice of fonts and the orientation of the paper. Pressing the Apply button generates the postscript and mails it to your account. For large alignments this can take quite a while and can generate large postscript files (>1Mb).
The formats supported are
- MSF (GCG output with no checksum),
- CLUSTAL (Clustalw output),
- FASTA (common and simple format),
- PIR (less common but almost as simple format),
- BLC (AMPS output),
- PFAM (simple and has the advantage of including start-end points).
For more details look here.
If you are running jalview via the button applet provided in the distribution then there are a number of parameters you can set to define different sequence groups and colour schemes. These are described here:
The html to do this is:
<APPLET codebase="http://srs.ebi.ac.uk/~michele/jalview/"
code="jalview.ButtonAlignApplet"
width = 1000
height = 35>
<param name=input value="http://srs.ebi.ac.uk/~michele/jalview/lipase.msf">
<param name=type value="URL">
<param name=format value="MSF">
</APPLET>
Description of the html parameters
The <APPLET> tag
<APPLET codebase="http://www.ebi.ac.uk/~michele/jalview/"
code="jalview.ButtonAlignApplet"
width = 150
height = 40>
This line defines where the code for the applet lies (www.ebi.ac.uk) and what java class is to be called.
The width and height parameters define how many pixels the applet should take up in the browser. In this case they should be enough to display a button.
This CODEBASE method loads classes as and when it needs them so is probably suited for slow net connections. An
alternative is to load all the classes at once in a jar file (~100kb). This makes the applet slow to start but subsequent
To use a jar file:
<APPLET ARCHIVE="http://www.ebi.ac.uk/~michele/jalview/jalview.jar"
CODE="jalview.ButtonAlignApplet"
WIDTH = 250
HEIGHT = 40>
Specifying an alignment file and format
<param name=input value="http://www.ebi.ac.uk/jalview/test.msf">
<param name=type value="URL">
<param name=format value="MSF">
These parameters describe where to find the alignment and what format it is in.
input - The URL of the alignment to be viewed
type - Whether the alignment is a file or a URL. Use URL here as File is not allowed for security reasons.
format - the format of the alignment. Allowable values are MSF, CLUSTAL, FASTA and PIR.
The applet above reads in a protein alignment in fasta format that also contains some secondary structure predictions and definitions. The alignment is split into two groups which are displayed differently.
The protein alignment has the residue text displayed and is coloured according to each residue's agreement with the consensus sequence. The secondary structure predictions/definitions have no text displayed but are coloured according to the structural element (helix/sheet).
The consensus calculation is done over a subset of the sequences in the alignment (the protein sequences). This is also specified in the html. The html to do this is :
<APPLET archive="http://srs.ebi.ac.uk:5000/~michele/jalview/jalview.jar"
code="jalview.ButtonAlignApplet" width = 100 height = 35>
<PARAM name=input value="http://srs.ebi.ac.uk:5000/~michele/jalview/test2.fa">
<PARAM name=type value="URL">
<PARAM name=format value="FASTA">
<PARAM name="groups" value="2">
<PARAM name="group1" value="1-36:PID:true:true:false">
<PARAM name="group2" value="37-45:SECONDARY:false:true:true">
<PARAM name=fontsize value="10">
<PARAM name="mailServer" value = "srs.ebi.ac.uk">
<PARAM name="Consensus" value="1-36">
<PARAM name="srsServer" value = "srs.ebi.ac.uk:5000/srs5bin/cgi-bin/">
<PARAM name="database" value = "swall">
</APPLET>
Defining groups
<param name="groups" value="2">
<param name="group1" value="1-36:PID:true:true:false">
<param name="group2" value="37-45:SECONDARY:true:false:false">
groups - the number of groups defined
groupn - the group definition for group n. If there are N groups defined with the groups parameter then there must be N groupn parameter lines.
The value string for the groupn parameter is split into 5 fields separated by colons. The fields are:
<sequences in group> : <colour scheme> : <display boxes> : <display text> : <colour text>
Allocating sequences to a group
The first field defines which sequences are to be in this group. A few examples are the most informative here I think.
20-26 Sequences 20,21,22,23,24,25,26 are in the group
1,2,4 Sequences 1,2 and 4 are in the group
10-15,2,6-7 Sequences 2,6,7,10,11,12,13,14,15 are in the group
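The three examples above can be reproduced with a small parser. This is a sketch only: `parse_sequence_ranges` is a hypothetical name, and the way jalview itself handles malformed input is not documented here, so the error handling is an assumption.

```python
from typing import List


def parse_sequence_ranges(spec: str) -> List[int]:
    """Expand a string such as '10-15,2,6-7' into sorted sequence numbers."""
    numbers = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # assumption: empty items are silently ignored
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"range runs backwards: {part}")
            # Ranges are inclusive at both ends, as in the examples above.
            numbers.update(range(start, end + 1))
        else:
            numbers.add(int(part))
    return sorted(numbers)
```

Calling `parse_sequence_ranges("10-15,2,6-7")` gives the sequence numbers 2, 6, 7 and 10 to 15, matching the last example.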
Defining a group colour scheme
This colour scheme will apply to the residue boxes or the residue text depending on whether those parameters are set.
Allowed values are
ZAPPO - colour residues by physico-chemical properties
PID - colour residues by percentage identity agreement with consensus sequence
BLOSUM62 - colour residues by agreement with consensus or +ve BLOSUM62 score with consensus residue
HYDROPHOBIC - colour residues by hydrophobicity (red - hydrophobic, blue - hydrophilic).
SECONDARY - colour residues by secondary structure (H - magenta, E - yellow).
Displaying residue boxes
The third parameter can be true or false according to whether you want residue boxes displayed or not.
Displaying text
The 4th parameter can be true or false according to whether you want the residue text displayed or not.
Colouring text
The 5th parameter can be true or false according to whether you want the residue text coloured. Displaying the residue boxes and colouring the text will give the impression of no text displayed at all but will slow the display down.
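The five colon-separated fields of a groupn value can be unpacked as sketched below. The names `GroupDefinition` and `parse_group_definition` are hypothetical, and the case-insensitive handling of the scheme name and the true/false flags is an assumption rather than documented jalview behaviour.

```python
from typing import NamedTuple

# Colour scheme names listed as allowed values above.
COLOUR_SCHEMES = {"ZAPPO", "PID", "BLOSUM62", "HYDROPHOBIC", "SECONDARY"}


class GroupDefinition(NamedTuple):
    sequences: str       # first field, e.g. "1-36" (left unexpanded here)
    colour_scheme: str   # second field
    display_boxes: bool  # third field
    display_text: bool   # fourth field
    colour_text: bool    # fifth field


def _parse_flag(text: str) -> bool:
    value = text.strip().lower()
    if value not in ("true", "false"):
        raise ValueError(f"expected true or false, got {text!r}")
    return value == "true"


def parse_group_definition(value: str) -> GroupDefinition:
    """Split a groupn value into its five fields."""
    fields = value.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(fields)}")
    sequences, scheme, boxes, text, colour_text = fields
    scheme = scheme.strip().upper()
    if scheme not in COLOUR_SCHEMES:
        raise ValueError(f"unknown colour scheme: {scheme}")
    return GroupDefinition(
        sequences.strip(),
        scheme,
        _parse_flag(boxes),
        _parse_flag(text),
        _parse_flag(colour_text),
    )
```

With the group1 value from the example, `parse_group_definition("1-36:PID:true:true:false")` returns a record with boxes and text displayed and text colouring switched off.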
Defining sequences to be included in the consensus
Defining the sequences for the consensus calculation is done in much the same way as for the colour grouping.
<PARAM name="Consensus" value="1,4,6-10">
You can also select all the sequences by
<PARAM name="Consensus" value = "*">
Defining the SRS server
This is done using the srsServer parameter as follows
<PARAM name="srsServer" value="www.sanger.ac.uk/srs5bin/cgi-bin/">
n.b. the value does NOT have http:// at the front and is otherwise the URL of the SRS program wgetz
(If you are using Netscape you MUST have this server the same as both your mail server and the applet server)
Defining the SRS database
Use the database parameter to change your srs database (default is swall).
<PARAM name="database" value="pfamseq">
Defining sequences in html
If you want to you can submit an alignment in the html as follows
<PARAM name="numseqs" value=3>
<PARAM name="seq1" value="ALSDKFHLKRHELKHASLDKHLKHR">
<PARAM name="seq2" value="ALSDKFHLKRHE-----LDKHLKHR">
<PARAM name="seq3" value="ALSDKFHLKRHELKHALKFL-HLKHR">
Optionally you can supply parameters that define the ids for each of the sequences as follows:
<PARAM name="id1" value="LCAT_HUMAN">
<PARAM name="id2" value="LCAT_YEAST">
<PARAM name="id3" value="LCAT_MOUSE">
If you don't supply ids then the defaults will be Seq_1 Seq_2 etc.
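If the alignment is already held in a program, the numseqs, seqN and idN parameter lines can be generated rather than typed by hand. This is a sketch under assumptions: `sequence_params` is a hypothetical helper, and escaping the values with `html.escape` is a precaution of this sketch, not something the applet documentation asks for.

```python
from html import escape
from typing import List, Optional, Sequence


def sequence_params(
    sequences: Sequence[str],
    ids: Optional[Sequence[str]] = None,
) -> List[str]:
    """Build the PARAM lines that embed an alignment in the html."""
    if ids is not None and len(ids) != len(sequences):
        raise ValueError("ids and sequences must have the same length")

    lines = [f'<PARAM name="numseqs" value={len(sequences)}>']
    # Parameter names are numbered from 1 (seq1, seq2, ...).
    for n, seq in enumerate(sequences, start=1):
        lines.append(f'<PARAM name="seq{n}" value="{escape(seq, quote=True)}">')
    # Ids are optional; without them the applet falls back to Seq_1, Seq_2 etc.
    if ids is not None:
        for n, seq_id in enumerate(ids, start=1):
            lines.append(f'<PARAM name="id{n}" value="{escape(seq_id, quote=True)}">')
    return lines
```

Joining the returned list with newlines gives text that can be pasted between the opening and closing APPLET tags.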
Selected sequences are used in the Colour and Calculate menus. The Consensus option in the Calculate menu only uses the selected sequences in its calculation and will display an error in its status bar if none are selected. Selecting a colour scheme in the Colour menu applies that scheme only to the selected sequences, if there are any. If none are selected the colour scheme is applied to all sequences.
Sequences can also be selected in other displays
such as the tree display window and the PCA results window. If a
sequence is selected in one window it will also be selected in all the others.
All columns may be deselected by choosing the
'Deselect all columns' option from the Edit menu.
To generate a new group or delete a group the buttons in the bottom right hand of the window can be used.
Tip: To generate a new group quickly :
To create a group from selected sequences or to see the available sequence groups refer to the previous entry. By default when jalview is first started all sequences are in the same group.
If Fastdraw is switched off other proportional fonts
can be used (Helvetica and Times) and the residues appear more spaced out
on the screen. The screen update time will also be slower
(typically by a factor of 3).
Smaller font sizes (probably < 6) are of most use if the text is switched off and the coloured residue boxes only are displayed (see view menu).
If the redraw speed is too slow for you on your
system then turning off the boxes option and colouring the text
black will speed it up considerably.
The residues are coloured according to their physico-chemical properties as follows:
|PG||Proline/Glycine (conformationally special)||magenta|
Choosing a colour (the colour selector)
Underneath the list of colours and residues is a panel
where you can select the rgb values of any colour you wish to use. The
user can either move the scrollbars to change the rgb values or type in the
values (0-255) in the text boxes. The new colour will be displayed in
the panel to the right of the scrollbars.
Changing colours (the residue panels)
Clicking on the colours assigned to different residues
with the left mouse button will cause whichever colour is displayed in the
colour selector to appear in that residue panel. If you wish to modify
an existing residue panel colour, right clicking that colour will change
the colour selector's colour to the residue panel colour. The colour
selector colour can then be modified.
For each colour present in that scheme a list of residues it is applied to appears to its right. These residues can be moved, deleted or added to in order to group the residues in a different way. For instance you may just want to display the charged residues in one colour and the rest in another to highlight the charged ones, or you may want to only colour the cysteines differently from the others.
If you wish to change the residues associated with a colour edit the residue string in the text field and press the 'Apply' button to its right. If any residues have been deleted from the text field they will be assigned a white colour and appear in the bottom residue panel. If any residues have been transferred from another colour panel they will be deleted from the old one. The main jalview alignment window will be automatically updated.
Any modifications of the colour scheme will only apply
to sequences that are selected in the main alignment window. This
allows the user to have multiple colour schemes in one alignment. If
no sequences are selected then the colour scheme applies to all sequences.
BEWARE: there is NO UNDO function.
This option depends on a consensus calculation having
been performed. If no consensus exists (e.g. after a copy or a
clustalw alignment) then no residues are coloured.
The PID option colours the residues (boxes and/or text) according to the percentage of the residues in each column that agree with the consensus sequence. Only the residues that agree with the consensus residue for each column are coloured.
|Percentage agreement||Colour|
|> 80 %||Mid blue|
|> 60 %||Light blue|
|> 40 %||Light grey|
|<= 40 %||White|
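The thresholds above translate directly into a lookup, sketched here together with a per-column calculation. Several details are assumptions of this sketch and not taken from jalview: the function names, the use of '-' as the gap character, taking the most common non-gap residue as the consensus, and computing the percentage over all sequences in the column.

```python
from collections import Counter
from typing import List, Optional


def pid_colour(percentage: float) -> str:
    """Map a percentage agreement to the colour name in the table above."""
    if percentage > 80:
        return "Mid blue"
    if percentage > 60:
        return "Light blue"
    if percentage > 40:
        return "Light grey"
    return "White"


def column_colours(sequences: List[str], column: int) -> List[Optional[str]]:
    """Colour one alignment column; None means the residue is left uncoloured."""
    residues = [seq[column] for seq in sequences]
    counts = Counter(r for r in residues if r != "-")  # assumption: '-' is a gap
    if not counts:
        return [None] * len(residues)
    # Assumption: the consensus residue is the most common non-gap residue.
    consensus, count = counts.most_common(1)[0]
    colour = pid_colour(100.0 * count / len(residues))
    # Only residues that agree with the consensus are coloured.
    return [colour if r == consensus else None for r in residues]
```

Note that the boundaries are strict, so a column with exactly 80 % agreement falls into the light blue band.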
When the features have finished transferring the
features will be displayed on the alignment with different colours for
different features. The colours are as follows
|anything else||Light gray|
When the features have been displayed on the alignment selecting a residue will change the display in the sequence feature console. The console will display details of any feature that has been selected and underneath a list of all features listed for that sequence.
There are at the moment a few limitations on the
sequence feature display:
means the swissprot ID HBA_HUMAN starting at position
3 and ending at position 45. If your alignment doesn't have the
correct start end positions the sequence feature overlay is at best
A good example of the usage of the start-end positions
is the Pfam database of
If everything is configured correctly (srs server,
database and alignment ids) then you should get output like
the following :
The main window is coloured using all the features in the Pfam pancreatic trypsin inhibitor alignment and the sequence feature console shows details of all features at the selected residue (which is in between 2 disulphide bonds and at the active site in this case). In the background can be seen the mini web browser showing the contents of a Swissprot entry.
When the editor first starts up the consensus
sequence is automatically calculated using all the sequences in the
alignment and the PID colour scheme is used as default. If the
consensus option is selected again only the currently selected sequences
are used to calculate it and all sequences in the alignment are coloured
according to that consensus.
For each pair of sequences the best global alignment is found using BLOSUM62 as the scoring matrix. The scores reported are the raw scores. The sequences are aligned using a dynamic programming technique and using the following gap penalties :
Gap open :
Gap extend : 2
When you select the pairwise alignment option a new window will come up which will display the alignments in a text format as they are calculated. Also displayed is information about the alignment such as alignment score, length and percentage identity between the sequences.
If you want to save that pairwise alignment (it's
not in any known format I'm afraid) you can cut and paste it from the text
window with the mouse. You can also press the 'View in alignment
editor' button to bring up another editor window.
The version implemented here only looks at the clustering of whole sequences and not individual positions in the alignment to help identify functional residues. This is due to memory and time limitations associated with java (you have to diagonalize a much bigger matrix basically). Plans are afoot to use the CORBA server written by Chris Dodge to do this 'residue space' PCA remotely.
When the Calculate->Principal component analysis option is selected all the sequences (not just the selected ones) are used in the calculation and for large numbers of sequences this could take quite a time. When the calculation is finished a new window is displayed showing the projections of the sequences along the 2nd, 3rd and 4th vectors, giving a 3-dimensional view of how the sequences cluster.
This 3d view can be rotated by holding the left mouse button down in the PCA window and moving it. The user can also zoom in and out by using the up and down arrow keys.
Individual points can be selected using the mouse and selected sequences show up green in the PCA window and the usual grey background/white text in the alignment and tree windows.
Different eigenvectors can be used to do the
projection by changing the selected dimensions in the 3 menus underneath
the 3d window.
When the tree has been calculated a new window is displayed showing the tree with labels on the leaves showing the sequence ids. The user can select the ids with the mouse and the selected sequences will also be selected in the alignment window and the PCA window if that analysis has been calculated.
Selecting the 'show distances' checkbox will put branch lengths on the branches. These branch lengths are the percentage mismatch between two nodes.
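For two leaf sequences, a percentage mismatch can be computed as sketched below. This is an illustration only: the function name is hypothetical, skipping columns where either sequence has a gap is an assumption, and how jalview derives the value for internal nodes of the tree is not covered.

```python
def percentage_mismatch(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage of compared columns in which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    mismatches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # assumption: columns containing a gap are not counted
        compared += 1
        if a != b:
            mismatches += 1

    if compared == 0:
        return 0.0  # nothing to compare, so report no mismatch
    return 100.0 * mismatches / compared
```

Two sequences that differ in one of four ungapped columns give a value of 25.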
Postscript output can be generated for this tree and
mailed to you by clicking the Output button. This will bring up a
window asking you for your email address and you can set font
options and the page orientation. Clicking the Apply button will
generate the postscript and send the email.
Selection and output options are the same as for the
Hierarchical analysis is based on each residue having certain physico-chemical properties listed as follows:
In brief go about it like this :
This link provides an example of the output after grouping for Pfam family rnaseH:
The grouping by tree may not be satisfactory and the user may want to edit the groups (Edit->Groups...) to put any outliers together.
Before selecting the conservation option change the colour scheme to something sensible (Taylor or hydrophobicity for example). When the conservation is done the existing colour scheme is modified so that the most conserved columns in each group have the most intense colours and the least conserved are the palest.
This link shows the results of first colouring the alignment by hydrophobicity (Colour->by hydrophobicity) then performing conservation analysis (Calculate->Conservation). Conserved hydrophobic columns are shown with predominately red residues and conserved hydrophilic columns with blue. The most conserved regions have the brightest colours.
Here is shown the same conservation but with Taylor colours instead of hydrophobicity (Colour->Taylor).
The conservation analysis is done on each sequence group. This highlights differences and similarities in conserved residue properties between groups.
When this option is selected a window will appear giving you a message about whether your process is running and the time elapsed since the job was started. The cancel button will kill your process at any time.
The text box below should show the progress of your job but at the moment doesn't. I haven't been able to devise a simple way of displaying stdout as the alignment is progressing but I'm working on this. The stdout will appear in the xterm you started
When the alignment is finished a new alignment window is created with the aligned sequences in. No consensus calculation is done on these sequences by default so to see the similarity select Calculate->Consensus.
Due to applet security restrictions this option can
only be used from an application.
The Cancel button will cancel your job and the output is sent back to the text box below as the alignment progresses. As this application is written in 1.0 java (pretty much) to enable it to be used in older versions of netscape this display is somewhat flickery. Using the java1.1 textbox things are much improved but this won't be available until the majority of browsers support full 1.1.
SRS server and database
The default SRS server and database are srs.ebi.ac.uk and swall at the EBI. To change to your own SRS server either use the -srsserver and -database options on the command line (see command line parameters) or use the
<param name="srsServer" value="srs.ebi.ac.uk:5000/srs5bin/cgi-bin/">
and
<param name="database" value="swall">
options in the applet version.
Also for entries to be fetched correctly the sequence IDs in your alignment file must be of the right form.
The IDs must be :
HBA_HUMAN
where HBA_HUMAN is the sequence ID (not the accession) and the 6 and the 20 refer to the start and end residues of that sequence entry in the alignment. The start and end positions do not include gaps and are only essential if you wish to display the sequence features in the alignment window. I urge everyone to include these numbers as it stops embarrassing mistakes when inferring function from annotation of a multi-domain protein.
The application version now allows access to SRS through its own mini web browser but at present none of the SRS links work (I'm trying to resist rewriting netscape :)
The alignment window
The Status Bar
The right hand side of the status bar is mostly for development purposes but displays in milliseconds the time taken for the last redraw of the central sequence panel.
The ID panel
* From Michele Clamp (firstname.lastname@example.org), in http://www.es.embnet.org/Doc/jalview/contents.html.