CS22A Hands-On Sixteen: Chapter Eight (II)
CS22A Fall 2021
Problem One: Complement of a DNA sequence
Read, understand, and run complement.py.
The program defines a function, complement(), that takes one argument, s, of type string.
The program also uses the dictionary data structure, the for loop, and string concatenation with the “+” operator.
A dictionary consists of (key,value) pairs. Dictionaries are delimited by { and }.
● Example: basecomp = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}.
Elements are retrieved from dictionaries with square brackets [key].
● Example: basecomp['G'], where G is the key and C is the value.
Function complement() uses a docstring: the first line in the function, enclosed between a pair of """, that explains the purpose of the function.
● Example: """Return the complementary sequence of s."""
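The pieces described above (a dictionary lookup, a for loop, string concatenation with "+", and a docstring) can be combined as in the following minimal sketch. It is not the contents of complement.py, only an illustration of what such a function might look like, and it assumes the input contains only the uppercase letters A, C, G, and T (any other character raises a KeyError).

```python
def complement(s):
    """Return the complementary sequence of s."""
    # Each key is a base; its value is the complementary base.
    basecomp = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    result = ''
    for base in s:
        # Look up the complement with [key] and append it with "+".
        result = result + basecomp[base]
    return result


print(complement('ATGC'))  # prints TACG
```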
Problem Two: [Bubalus bubalis]
We are going to retrieve the CDS (Coding Sequence) of the gamma-globin gene of Bubalus bubalis.
● Go to NCBI: https://www.ncbi.nlm.nih.gov/
● Choose “Nucleotide” from the dropdown menu
● Type “AM886151” in the search box and click on the blue “Search” button
● You will be taken to the page that contains the gamma-globin gene of Bubalus bubalis
● We want just the coding sequence (CDS), which is all exons concatenated together: 210..295, 424..646, and 1547..1675. Click on “Send to:” and check the choice “Coding Sequences”
● In the Format section, click “Fasta Nucleotide” and click Create File
● Save the CDS sequence in a file you name: “Bubalus_gamma_CDS”.
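NCBI performs the exon concatenation for you, but it may help to see what the ranges in the steps above mean. GenBank coordinates such as 210..295 are 1-based and include both endpoints, whereas Python slices are 0-based and exclude the end. The sketch below uses a hypothetical helper, join_exons(), which is not part of any course file, to show the conversion and to check that the three exons add up to a whole number of codons.

```python
def join_exons(sequence, exons):
    """Return the exons of sequence concatenated in the order given.

    exons is a list of (start, end) pairs, 1-based and inclusive,
    as in GenBank notation (start..end).
    """
    cds = ''
    for start, end in exons:
        # Shift start by one for 0-based indexing; end stays the same
        # because Python slices exclude the end position.
        cds = cds + sequence[start - 1:end]
    return cds


exons = [(210, 295), (424, 646), (1547, 1675)]
cds_length = sum(end - start + 1 for start, end in exons)
print(cds_length)       # prints 438
print(cds_length // 3)  # prints 146 (codons, counting the stop codon)
```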
1) Modify translation.py from Hands-On 15 so that it reads the CDS from the file Bubalus_gamma_CDS, translates it, and saves the resulting sequence of amino acids in an output file called Bubalus_gamma_protein. Name your program translation_ncbi.py.
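One possible shape for such a program is sketched below. Because translation.py from Hands-On 15 is not reproduced here, the functions read_fasta_sequence(), translate(), and translate_file() are stand-ins chosen for illustration, and your own version may be organized differently. The sketch assumes the standard genetic code, a file holding a single FASTA record, and a sequence made only of A, C, G, and T.

```python
# Standard genetic code, listed in TCAG order for each codon position.
BASES = 'TCAG'
AMINO_ACIDS = ('FFLLSSSSYY**CC*W'
               'LLLLPPPPHHQQRRRR'
               'IIIMTTTTNNKKSSRR'
               'VVVVAAAADDEEGGGG')

CODON_TABLE = {}
index = 0
for b1 in BASES:
    for b2 in BASES:
        for b3 in BASES:
            CODON_TABLE[b1 + b2 + b3] = AMINO_ACIDS[index]
            index += 1


def read_fasta_sequence(filename):
    """Return the sequence in a FASTA file as one uppercase string."""
    sequence = ''
    with open(filename) as infile:
        for line in infile:
            line = line.strip()
            # Lines starting with ">" are headers, not sequence.
            if not line.startswith('>'):
                sequence = sequence + line.upper()
    return sequence


def translate(dna):
    """Return the protein encoded by dna, stopping at the first stop codon."""
    protein = ''
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == '*':
            break
        protein = protein + amino_acid
    return protein


def translate_file(in_name, out_name):
    """Translate the CDS in in_name and save the protein in out_name."""
    protein = translate(read_fasta_sequence(in_name))
    with open(out_name, 'w') as outfile:
        outfile.write(protein + '\n')
    return protein
```

With the CDS file saved in the current directory, calling translate_file('Bubalus_gamma_CDS', 'Bubalus_gamma_protein') would write the protein to the output file.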
Let us compare the answer we got from running translation_ncbi.py to the Bubalus bubalis gamma protein found at NCBI.
Note that the accession number of the gamma-globin protein is “CAP07633.1” (which can be seen on the NCBI gene page a few lines underneath the information for the CDS).
● Go to NCBI: https://www.ncbi.nlm.nih.gov/
● Click on BLAST under Popular Resources on the right-hand side of the page
● Click on Protein BLAST from the BLAST page
● Choose “Align two or more sequences” in the “Enter Query Sequence” box
● Paste the accession CAP07633.1 in the first box and the sequence found in Bubalus_gamma_protein in the second box, and then click “BLAST”.
2) How similar are the two sequences? Explain.
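BLAST reports similarity after aligning the two sequences. As a rough cross-check in Python, a position-by-position comparison can be sketched as below. The function percent_identity() is a hypothetical helper, not part of the hands-on files, and it does no alignment, so it is only meaningful when the two sequences start at the same residue and contain no insertions or deletions.

```python
def percent_identity(seq1, seq2):
    """Return the percentage of positions at which seq1 and seq2 match.

    Positions are compared in order with no alignment; the longer
    sequence's length is used as the denominator.
    """
    longer = max(len(seq1), len(seq2))
    if longer == 0:
        return 0.0
    matches = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            matches += 1
    return 100.0 * matches / longer


print(percent_identity('MVHL', 'MVHF'))  # prints 75.0
```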
Problem Three: Bird Sites
Understand and run bird_sites.py. Answer all the commented questions by using Python code.
2021-12-05