
Viral genome annotation scripting assignment

BINF2010-Assignment specification (15%)

"Glue" programming and scripting are important skills in bioinformatics, allowing the automation of complex and repetitive tasks when analysing large datasets. This assignment is worth 15% of the total mark for the course and requires you to write a script to automate a "pipeline" of bioinformatics tasks.

Annotating and analyzing viral genomic sequences

Viruses are infectious agents that replicate only inside the living cells of an organism. An enormous variety of genomic structures can be seen among viral species. Alongside studying the viral life cycle and target hosts to decipher the mechanism of infection, knowing a virus's genome sequence is key for vaccine design and for studying the antibody/antigen repertoire.

In this assignment, you will write a Python script to query and annotate viral genomes deposited in the NCBI viral database. Its output will allow you and others to analyse the virus.
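As a rough illustration of the query step, NCBI's public E-utilities efetch service can return a nucleotide record as a GenBank flat file. The sketch below uses only the standard library. The choice of the nuccore database, the helper names build_efetch_url and fetch_genbank, and the example accession are illustrative assumptions, not requirements of this specification. Biopython's Bio.Entrez module wraps the same service if you prefer to use it (remember to acknowledge it, as point 5 of the general considerations requires).

```python
"""Hypothetical helpers for fetching a viral genome record from NCBI.

Assumes the NCBI E-utilities efetch endpoint and the 'nuccore' database;
the function names are illustrative, not required by the specification.
"""
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession: str, email: str) -> str:
    """Return an efetch URL that requests a GenBank flat-file record as text."""
    params = {
        "db": "nuccore",    # nucleotide database (assumption for this sketch)
        "id": accession,    # accession supplied by the user, e.g. NC_045512.2
        "rettype": "gb",    # GenBank flat-file format
        "retmode": "text",  # plain text rather than XML
        "email": email,     # NCBI asks E-utilities users to identify themselves
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_genbank(accession: str, email: str, timeout: float = 30.0) -> str:
    """Download the record; network errors propagate to the caller."""
    with urlopen(build_efetch_url(accession, email), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, fetch_genbank("NC_045512.2", "you@example.com") would request the SARS-CoV-2 reference genome. This needs network access, and NCBI limits how frequently requests can be made, so avoid calling it in a tight loop. The returned text could then be parsed, for instance with Bio.SeqIO.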

Specifications

Two options for the assignment are presented. Option 1 involves answering questions in the Jupyter notebook and writing a Python script; Option 2 adds extension tasks and questions. Students who are taking or have taken COMP2041 Software Construction and are familiar with Python should do Option 2. If you have taken or are taking COMP2041 and you choose Option 1, your mark will be capped at 65%.

Specification for Option 1

Specification for Option 2

Learning outcomes

After the completion of this assignment, you should be able to:

Understand the needs and requirements of a pipeline specification

Demonstrate the ability to use bioinformatic libraries and modules, including accessing and reading their documentation

Apply common bioinformatic procedures using Python

Construct scripts in Python to perform common tasks

Evaluate viral genomic data and databases

General considerations:

1. You will fill in a Jupyter notebook and a Python script. You will need to submit both!

2. Your script should not crash or hang when given incorrect user input (e.g., no arguments, or an incorrect argument) and instead should quit gracefully with a suitable error message.

3. Your script should be properly formatted and commented. Some marks will be given for code that can be understood easily.

4. The script will be marked on its ability to perform the required task correctly, and its compliance with the specification.

5. You can make use of open source libraries (e.g., BioPython) as long as this use is specified in the comments, and the script does not require any additional special installation steps. Any extras should be either already installed on the CSE computers or binftools or included in the script itself.

6. Your submission should consist of two files: viral_genome_annotation.ipynb and annotate_viral_genome.py. If you are doing Option 2, your Python script should be called annotate_viral_genome_ext.py. Skeleton files are included in the data folder.

7. Your script should be your own work. The use of published open source code is acceptable if acknowledged in the comments. The use of ChatGPT-type LLMs is discouraged, and there will be mark penalties for their use if not acknowledged.

8. Submission will be through a link on Moodle.

9. Submission deadline is the Friday of Week 9 (November 2025) at 5pm.

10. Please ask your questions on the Assignment forum/channel on Teams.

11. Finally: TEST, TEST and then TEST some more. Use a variety of inputs, testing locations, etc., and make sure your program has no unintended consequences.

12. Please use Python 3 (python3).
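Point 2 above (quitting gracefully on bad input) can be sketched with the standard library's argparse module. This is a minimal illustration, not the required interface: the single accession argument, the simplified accession pattern and the function names parse_args and main are hypothetical, and the real arguments should follow the provided skeleton files.

```python
"""Minimal sketch of graceful argument handling for a pipeline script.

The single positional 'accession' argument and the accepted format are
assumptions for illustration; follow the skeleton file's actual interface.
"""
import argparse
import re
import sys

# Hypothetical, simplified check: one or two capital letters, an optional
# underscore, digits, and an optional version suffix (e.g. NC_045512.2).
ACCESSION_PATTERN = re.compile(r"^[A-Z]{1,2}_?\d+(\.\d+)?$")


def parse_args(argv):
    """Parse arguments; argparse prints usage and exits with status 2 on misuse."""
    parser = argparse.ArgumentParser(
        description="Annotate a viral genome (illustrative interface)."
    )
    parser.add_argument("accession", help="NCBI accession, e.g. NC_045512.2")
    args = parser.parse_args(argv)
    if not ACCESSION_PATTERN.match(args.accession):
        # parser.error writes usage plus this message to stderr, then exits with status 2
        parser.error(f"'{args.accession}' does not look like an NCBI accession")
    return args


def main(argv=None):
    """Run the pipeline; return 0 on success and 1 on a runtime failure."""
    args = parse_args(sys.argv[1:] if argv is None else argv)
    try:
        # Placeholder: the real pipeline steps (fetch, parse, annotate) go here.
        print(f"Annotating {args.accession} ...")
    except OSError as err:  # covers file errors and urllib's URLError
        print(f"Error: {err}", file=sys.stderr)
        return 1
    return 0

# In the real script, finish with:
#     if __name__ == "__main__":
#         sys.exit(main())
```

Run with no arguments or a malformed accession, a script built this way prints a usage message and exits with status 2 instead of showing a traceback, while runtime failures such as network errors return status 1. Checking these exit codes is an easy case to include in the testing that point 11 recommends.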