This exercise will increase your familiarity with some of the tools available from the
NCBI web site. Follow the instructions below and answer the questions in a separate text
document. Upload this document to the Assignment folder Homework 6.
In this exercise, you will use the tools at NCBI to learn more about globins, which are
frequently referenced in the textbook.
On the NCBI home page, change the pull-down menu near the top of the page from
“All Databases” to “protein” and search for “beta globin”.
1) How many records are retrieved with this search?
We would like to start by finding the human beta globin protein. One way to do this is to
restrict the search to only human sequences. Add the words “AND Homo sapiens
[ORGN]” after the words beta globin in the search field and press Go. This searches for
entries with the words “beta globin” and the words “Homo sapiens” in the organism field.
2) How many entries did this search retrieve?
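The same kind of query can also be assembled in a script. The sketch below is illustrative only: it builds the search term described above and, as an assumption that goes beyond this exercise, wraps it in a URL for NCBI's E-utilities ESearch service. The helper names are hypothetical, and the script only constructs the URL without contacting NCBI.

```python
from urllib.parse import urlencode

# Assumption: the E-utilities ESearch endpoint. The exercise itself uses the web page.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_protein_query(text, organism=None):
    """Combine free text with an optional organism restriction ([ORGN] field)."""
    if organism:
        return f"{text} AND {organism} [ORGN]"
    return text


def build_esearch_url(term, database="protein"):
    """URL-encode the search term and attach it to the ESearch endpoint."""
    return ESEARCH_URL + "?" + urlencode({"db": database, "term": term})


term = build_protein_query("beta globin", "Homo sapiens")
url = build_esearch_url(term)
print(term)
print(url)
```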
Click on the first entry and look at the information contained in the file.
3) What is the accession number of this beta globin entry?
Click the FASTA link at the top of the page. This redisplays the sequence in FASTA
format, a common format used in bioinformatics. The essential elements of this format
are a line beginning with the “>” sign followed by a description of the sequence, and the
nucleotide or amino acid sequence starting on the next line.
4) Copy and paste the FASTA-formatted amino acid sequence (starting with the >
symbol) as your answer to question 4.
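To make the format concrete, here is a minimal sketch of how a program might read FASTA text: each line beginning with ">" starts a new record, and the lines that follow are joined into one sequence. The function name and the example record are made up for illustration; the record is not a real database entry.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) pairs from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped over many lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up record for illustration only.
example = """>toy_id hypothetical example protein
MKTAYIAK
QRQISFVK
"""

records = parse_fasta(example)
for description, sequence in records:
    print(description, len(sequence))
```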
On the right side of the page, click on the “Run BLAST” link. We are going to search for
proteins similar to the beta globin protein using BLAST. Copy and paste the
FASTA-formatted beta globin protein sequence into the Query sequence field. Change the database
to “refseq”. Press the “blast” button at the bottom of the page. You will be redirected to a
temporary page that displays the estimated wait time for the search. This could take
several minutes, but is usually quick.
The results page will display the “hits” in graphical format at the top of the page. Scroll
down to see the alignments of your query protein with hits from the database. Scroll
back above the graphical output and click the link “taxonomy reports”. This will
display a list of all of the organisms that contain sequences similar to your query. Keep in
mind that the default settings of the blast program restrict your results to the top 100 hits;
there are many more good hits that are not displayed.
Immediately to the right of the species names are the scores of the blast matches. Since
the sequence you searched with is also in the database, the best hit will be to your query
sequence from humans.
5) What is the score of the hit to human sequences? What other organisms contain
sequences that match with the same score? (Click on the scientific name on the
taxonomy reports page to see the taxonomy entry for that organism. You can find
the common name listed there if one exists.)
All of the organisms with similar sequences found in this search are primates. Let's see
whether more distantly related organisms have proteins similar to beta globin.
Return to the protein blast page and paste the beta globin sequence in the query window
again. This time type “Mammals” in the “Organism” field and check the “exclude” box.
This will restrict the blast search to non-mammal sequences. Press Blast again and wait
for the results.
Again, you will have 100 or so hits. Scroll down to the alignment of the top hit. There is
a lot of information here; use it to answer the following questions.
6) What are the score and expect value of the top hit? How many “identities” or
identical amino acids are shared between these two sequences? What percentage
of the sequences is identical?
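As a rough illustration of how an identity count and percentage can be derived from an alignment, the sketch below compares two aligned sequences position by position. It assumes the percentage is taken over the full alignment length, gaps included. The function name and the two short sequences are made up and are not taken from any BLAST result.

```python
def percent_identity(aligned_query, aligned_subject):
    """Return (identities, alignment length, percent identity) for two aligned strings."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    # A position is an identity when both sequences carry the same residue
    # and that residue is not a gap character.
    identities = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"
    )
    length = len(aligned_query)
    return identities, length, 100.0 * identities / length


# Made-up aligned sequences; "-" marks a gap.
query = "ACDEFGHIKL"
subject = "ACDQFGH-KL"

identities, length, percent = percent_identity(query, subject)
print(f"Identities = {identities}/{length} ({percent:.0f}%)")
```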
7) The result also lists a category called positives. What do you think positives are?
(Look at the alignment for clues.)
8) From which species is the top hit? Can you find the common name for this organism?
Find the hit to a “hemoglobin, subunit beta” protein from the chicken, Gallus gallus (use
a text search). Click on the link “Sequence ID: NP_990820.1”. The link takes you to the
gene page for hemoglobin beta subunit from chicken. This provides much information
about the organism, gene and protein.
On the right side, find “More about the HBBA gene” and click on “HBBA gene”.
Scroll down to “Genomic Context”.
9) On which chicken chromosome is this gene located? What other globin genes are
located in this region? Mouse over the arrows representing genes in the figure in
the “Genomic context” section.
10) Click on OR51M1, which is downstream of the hemoglobin genes. What type of
protein does this gene encode?