Generation of a Phylogenetic Tree based upon DNA sequence alignment

Adapted from an exercise by Scott Cooper (University of Wisconsin, La Crosse, cooper@mail.uwlax.edu), originally found at http://bioquest.org/bioinformatics/module/cooper/Exercise/exercise.htm. Adapted for the NGBW by Mark Alan Miller, January 2009.

A phylogenetic tree is a type of graph that scientists use to describe the evolutionary relationships between related organisms. Traditionally, these trees were built from physical traits of organisms such as bone structure and beak shape. More recently, molecular data have been used to support and refine phylogenetic trees. This is done by aligning the DNA sequences of a particular gene from several organisms and then using a program to estimate how many mutations have accumulated in these sequences since the species diverged from a common ancestor. The more mutations that separate two sequences, the more distantly related the two species are.
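The idea of "counting accumulated mutations" can be made concrete with a small calculation. The Python sketch below is only an illustration (the exercise does not say which distance measure the NGBW tools use): it computes the uncorrected p-distance, the fraction of aligned sites that differ, and then applies the Jukes-Cantor correction, one standard way of allowing for sites that mutated more than once. The sequences are made-up toy fragments, not real D-loop data.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared sites that differ between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped column: no base-to-base comparison possible
        compared += 1
        if a != b:
            differences += 1  # a substitution since the common ancestor
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differences / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor estimate of substitutions per site from a p-distance.

    Corrects for sites that changed more than once; undefined for p >= 0.75.
    """
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical toy fragments, not real D-loop sequences.
p = p_distance("ACGTAC-T", "ACGAACGT")
print(f"p = {p:.3f}, JC distance = {jukes_cantor(p):.3f}")
```

Here one of the seven ungapped sites differs, so p is 1/7; the corrected distance is slightly larger because it accounts for changes that a simple count cannot see.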

Exercise (tutorial: http://www.ngbw.org/labs/primates/primate_lesson.htm)

There are two steps to creating a phylogenetic tree from the mitochondrial DNA D-loop region:

1.  Aligning the DNA sequences (this determines which positions in the sequences are evolutionarily related)

2.  Using the magnitude of the differences between the aligned DNA sequences to generate a phylogenetic tree
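To show how a table of pairwise differences can be turned into a tree, here is a minimal Python sketch of UPGMA, a simple clustering method that repeatedly joins the two closest groups. This is a swapped-in illustration: ClustalW builds its trees with the neighbor-joining method, so the NGBW output will not be produced this way. The three taxa and their distances are hypothetical.

```python
from itertools import combinations


def upgma(names: list[str], matrix: list[list[float]]) -> str:
    """Build a rooted UPGMA tree and return it as a Newick string.

    names  -- taxon labels, in the same order as the matrix rows
    matrix -- square, symmetric matrix of pairwise distances
    """
    # Active clusters: id -> (Newick text, number of taxa, height of its root)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    dist = {
        frozenset((i, j)): matrix[i][j]
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)

    while len(clusters) > 1:
        # Merge the two closest active clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        height = dist[frozenset((a, b))] / 2  # tips sit halfway up the distance
        text_a, size_a, h_a = clusters.pop(a)
        text_b, size_b, h_b = clusters.pop(b)
        # Distance from the merged cluster to each remaining cluster is the
        # average over all taxon pairs (weighted by cluster size).
        for other in clusters:
            dist[frozenset((next_id, other))] = (
                size_a * dist[frozenset((a, other))]
                + size_b * dist[frozenset((b, other))]
            ) / (size_a + size_b)
        merged = f"({text_a}:{height - h_a:.4f},{text_b}:{height - h_b:.4f})"
        clusters[next_id] = (merged, size_a + size_b, height)
        next_id += 1

    text, _size, _height = next(iter(clusters.values()))
    return text + ";"


# Hypothetical distances for three taxa (not real primate data).
names = ["A", "B", "C"]
matrix = [
    [0.0, 0.2, 0.6],
    [0.2, 0.0, 0.6],
    [0.6, 0.6, 0.0],
]
print(upgma(names, matrix))  # prints (C:0.3000,(A:0.1000,B:0.1000):0.2000);
```

A and B are closest, so they are joined first; C joins last, which is exactly the "smaller difference, closer relative" logic of step 2.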


1. Using the NGBW to align two or more sequences.

If you haven’t done so already, create an account and log on to the NGBW (www.ngbw.org).

First we will align mitochondrial D-loop DNA sequences from six primates: human, Neanderthal, two chimpanzees (the common chimpanzee and the pygmy chimpanzee), gorilla and orangutan. Retrieving them is easy if you already know their sequence IDs (GenBank accession numbers). The accession numbers for today's primate sequences are:

Human: Homo sapiens AB241389

Neanderthal: Homo sapiens neanderthalensis AF254446

Chimpanzee (1): Pan troglodytes AJ851169

Pygmy Chimpanzee (2): Pan paniscus AJ829473

Gorilla: Gorilla gorilla AJ422244

Orangutan: Pongo pygmaeus AJ627443

You can get these in the NGBW as follows:

After logging in, click Folders at the top left of the page. Click the “Create New Folder” button and create a folder named phylo. When you save the folder, this message will appear:

There is currently no data in this folder.
What would you like to do?

Choose “Search for Data”.

A search dialogue box appears. Type in the query string:

AB241389 OR AF254446 OR AJ851169 OR AJ829473 OR AJ422244 OR AJ627443

This string will return any sequence that has one of the 6 identifiers.
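If you later add the 4-5 extra species suggested at the end of this exercise, retyping a long OR query is error-prone. The Python sketch below builds the same query text from a list of accession numbers; it only produces the string you paste into the NGBW search box and does not connect to the NGBW itself.

```python
# Accession numbers from the list above, keyed by common name.
accessions = {
    "Human": "AB241389",
    "Neanderthal": "AF254446",
    "Chimpanzee": "AJ851169",
    "Pygmy chimpanzee": "AJ829473",
    "Gorilla": "AJ422244",
    "Orangutan": "AJ627443",
}


def or_query(ids) -> str:
    """Join identifiers with OR so that a match on any one of them is returned."""
    return " OR ".join(ids)


query = or_query(accessions.values())
print(query)
# prints AB241389 OR AF254446 OR AJ851169 OR AJ829473 OR AJ422244 OR AJ627443
```

To extend the analysis, add entries to the dictionary and rerun.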

Now tell the application what kind of data you are looking for.

Entity Type: Nucleic Acid

Data Type: Sequence

A database list will appear. Select GBPRI: Genbank Primate Sequences.

Now click Submit Search.

After a few minutes, a list of six sequences will appear, corresponding to the six sequences you asked for. Select them individually with the check boxes on the left, or all at once with the “Select All” check box. Then click Save results.

These sequences are now in your data area, and a success message will appear in green at the top of the page.

Now align the data. 

Click on the “Tasks” icon for your phylo folder. This message will appear:

There are currently no tasks in this folder.

Click the “Create New Task” button.

When the Task Management pane appears, click the “Create New Task” button. Enter a description for the task and click the “Set Description” button.

Now click the “Select Input Data” button. Check the boxes to the left of all the sequences you just imported, then click the “Select Data” button.

When the Task Creation pane reappears, click the “Select Tool” button. From the Nucleic Acids Sequence Tools tab, choose “CLUSTALW_N” (the tools are listed alphabetically). Now click the “Save and Run Task” button.

A new page will load that lets you follow the progress of your jobs. Click the “Refresh Tasks” button near the top of the page until the “View Status” button on the right changes to “View Results.” Click “View Results” to open a page showing your results, then click the “outfile.aln” link to display your alignment.

The results will show the six sequences aligned, with asterisks marking strictly conserved positions (the same base in every sequence). Black letters indicate mismatches, and dashes represent gaps.
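The asterisk line under each block of the alignment follows a simple rule. The Python sketch below reproduces it for DNA: a column gets '*' only when every sequence has the same base and none has a gap. This is a simplified illustration; for protein alignments ClustalW also uses ':' and '.' to mark similar residues, which the sketch does not attempt. The toy sequences are hypothetical.

```python
def conservation_line(aligned: list[str]) -> str:
    """Return a Clustal-style line: '*' where every sequence has the same
    base (and no gap), a space otherwise."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have the same length")
    marks = []
    for column in zip(*aligned):
        bases = {c.upper() for c in column}
        if len(bases) == 1 and "-" not in bases:
            marks.append("*")  # strictly conserved position
        else:
            marks.append(" ")  # mismatch or gap in at least one sequence
    return "".join(marks)


# Hypothetical toy alignment.
toy = ["ACGT-A", "ACGTTA", "ACCT-A"]
print(toy[0], toy[1], toy[2], conservation_line(toy), sep="\n")
```

The third column (G, G, C) and the fifth column (gap, T, gap) are left blank; the other four columns are marked.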

Figure: example alignment output (Presentation1.jpg)

 

Click the “Save to Current Folder” button. A window will appear that lets you name the data and specify its type. Enter a label for the data, then select Entity Type: “Nucleic Acid”, Data Type: “Sequence Alignment” and Format: “Clustal”. Then click the “Save” button.
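If you copy outfile.aln out of the Workbench and want to work with it yourself, the Clustal format is easy to read back in: after a header line beginning with "CLUSTAL", each block lists a sequence name followed by a chunk of its aligned bases, and the asterisk line starts with spaces. The Python sketch below joins the chunks for each name. It assumes sequence names contain no spaces and that the file follows this block layout; the sample text is a hypothetical toy file, not real NGBW output.

```python
def read_clustal(text: str) -> dict[str, str]:
    """Collect each sequence's aligned bases from Clustal (.aln) text."""
    seqs: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("CLUSTAL"):
            continue  # skip the header and blank separator lines
        if line[0].isspace():
            continue  # conservation lines ('*') start with whitespace
        fields = line.split()
        if len(fields) < 2:
            continue  # a name with no bases on this line
        name, chunk = fields[0], fields[1]  # an optional 3rd field is a count
        seqs[name] = seqs.get(name, "") + chunk
    return seqs


# Hypothetical toy file with two blocks.
sample = "\n".join([
    "CLUSTAL W multiple sequence alignment",
    "",
    "seqA      ACGT-ACG",
    "seqB      ACGTTACG",
    "          **** ***",
    "",
    "seqA      TTGA",
    "seqB      TTCA",
    "          ** *",
])
print(read_clustal(sample))
```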

2. Using NGBW to create a phylogenetic tree.

Now go back to the Tasks area of your folder and click the “Create New Task” button. Give the task a description and click “Set Description”, just as before. Click the “Select Input Data” button, check the box to the left of your alignment, and click “Select Data.” When the task creation page reloads, click “Select Tool” and choose “CLUSTALW_DIST” from the Phylogeny/Alignment Tools tab. When the task creation page reappears, click the “Save and Run Task” button.

When the Task Management page reloads, use the “Refresh Tasks” button to monitor when the job completes. When the “View Output” button appears, click it to display the results. Click the infile.dst link to display the distance matrix. Record the distance matrix in your journal to examine later.
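CLUSTALW_DIST computes the distance matrix for you. To get a feel for what the numbers in infile.dst represent, the Python sketch below computes an uncorrected distance (fraction of differing sites) for every pair of aligned sequences. This is an assumption-laden approximation: we assume ClustalW's distances are based on the fraction of mismatched sites with gapped positions excluded, and ClustalW can optionally apply a correction for multiple substitutions (such as Kimura's), so its values may not match this sketch. The toy alignment is hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites, skipping columns with a gap in either sequence."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned: dict[str, str]) -> dict[str, dict[str, float]]:
    """Symmetric matrix of pairwise p-distances, keyed by sequence name."""
    names = list(aligned)
    matrix = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        matrix[n][m] = matrix[m][n] = p_distance(aligned[n], aligned[m])
    return matrix


def print_matrix(matrix: dict[str, dict[str, float]]) -> None:
    """Print the matrix as a simple table with three decimals."""
    names = list(matrix)
    print(" " * 8 + "".join(f"{n:>8}" for n in names))
    for n in names:
        print(f"{n:<8}" + "".join(f"{matrix[n][m]:8.3f}" for m in names))


# Hypothetical toy alignment, not the real D-loop sequences.
toy = {
    "taxonA": "ACGTACGTAC",
    "taxonB": "ACGTACGTTC",
    "taxonC": "ACGAACCTTC",
}
matrix = distance_matrix(toy)
print_matrix(matrix)
```

The matrix is symmetric with zeros on the diagonal, which is the same shape you should see in infile.dst.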

CLUSTALW_DIST also outputs a phylogenetic tree. In the “View Output” page for CLUSTALW_DIST, click the infile.ph link. Save this data to your current folder, just as you saved the alignment in part 1, but this time choose Entity Type: “Taxon”, Data Type: “Phylogenetic Tree” and Format: “Newick”.
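The Newick format used for infile.ph writes a tree as nested parentheses: siblings are separated by commas, an optional branch length follows a colon, and the tree ends with a semicolon. The Python sketch below parses such a string and lists each tip with its terminal branch length, which can help when you add distances to your drawing. It strips all whitespace (including line breaks) first, so it assumes labels contain no spaces, and it ignores Newick features such as quoted labels and comments. The example tree and its branch lengths are made up.

```python
def parse_newick(text: str):
    """Parse Newick text into nested (label, branch_length, children) tuples."""
    s = "".join(text.split())  # drop spaces and line breaks
    pos = 0

    def node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1  # consume '('
            children.append(node())
            while s[pos] == ",":
                pos += 1
                children.append(node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        start = pos  # optional label
        while pos < len(s) and s[pos] not in ",():;":
            pos += 1
        label = s[start:pos]
        length = None
        if pos < len(s) and s[pos] == ":":  # optional branch length
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",();":
                pos += 1
            length = float(s[start:pos])
        return label, length, children

    tree = node()
    if pos >= len(s) or s[pos] != ";":
        raise ValueError("Newick string must end with ';'")
    return tree


def leaves(tree):
    """List (label, terminal branch length) for every tip of the tree."""
    label, length, children = tree
    if not children:
        return [(label, length)]
    return [leaf for child in children for leaf in leaves(child)]


# Hypothetical tree with made-up branch lengths.
example = "(\nA:0.1,\nB:0.2,\n(C:0.3,D:0.4):0.5);"
for name, length in leaves(parse_newick(example)):
    print(name, length)
```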

Now find this data item in the Data area of your folder, under the “Phylogenetic Trees” tab. Click the data item to open it, revealing two links: “Show/Hide Data Contents” and “Draw Tree.” Click the Draw Tree link to see an interactive view of the tree.

Congratulations, you are now an evolutionary biologist.

If you would like a printable version of this tree, create a task with infile.ph as the input and DRAWGRAM as the tool, using the same steps you used to run CLUSTALW_DIST.

3. Analysis of trees

Examine your tree.

Draw your tree, and add the distances from the CLUSTALW_DIST output to it.

Which species appear to be the most closely related? (Use your data to support and elaborate on your answer.)
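One way to support this answer with data is to find the smallest off-diagonal value in the distance matrix you recorded. The Python sketch below does that scan. The names and values are placeholders, not results from this exercise; replace them with the numbers from your infile.dst.

```python
from itertools import combinations


def closest_pair(names: list[str], matrix: list[list[float]]):
    """Return (name_i, name_j, distance) for the smallest off-diagonal entry."""
    i, j = min(
        combinations(range(len(names)), 2),
        key=lambda ij: matrix[ij[0]][ij[1]],
    )
    return names[i], names[j], matrix[i][j]


# Placeholder values; substitute your own distance matrix.
names = ["taxonA", "taxonB", "taxonC", "taxonD"]
matrix = [
    [0.00, 0.12, 0.30, 0.35],
    [0.12, 0.00, 0.28, 0.33],
    [0.30, 0.28, 0.00, 0.20],
    [0.35, 0.33, 0.20, 0.00],
]
print(closest_pair(names, matrix))  # prints ('taxonA', 'taxonB', 0.12)
```

Comparing the pair this returns with the pair joined most closely on your tree is a quick consistency check between the matrix and the drawing.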

From your tree, does it appear that Neanderthals were direct ancestors of humans, or did humans and Neanderthals share a common ancestor that branched off from the other apes? (Use your data to support and elaborate on your answer.)
 
Try this analysis with 4-5 other species of your choosing. (You can generate one tree with all 10-11 species.)
 
Include your ClustalW distance matrices (matrices and trees can be copied and pasted from the Workbench).