Representing sequence annotation

*Note: This should be Python code that runs from a UNIX terminal. Do not copy and paste other people's work; I want something original.

The GFF3 format is used for representing sequence annotation. You can find the specification here:

http://www.sequenceontology.org/gff3.shtml

The genome and annotation for Saccharomyces cerevisiae S288C are on the class server here:

/home/jorvis1/Saccharomyces_cerevisiae_S288C.annotation.gff

This file contains both the annotation feature table and the FASTA sequences for the molecules it references (see the '##FASTA' directive in the specification).
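A minimal sketch of reading such a file, assuming a well-formed GFF3 file where every feature row has exactly nine tab-separated columns and every FASTA header's first word is the molecule ID used in column 1. The function name read_gff3 and the tiny in-memory example are hypothetical, chosen only for illustration:

```python
def read_gff3(lines):
    """Split GFF3 text into feature rows and a {seqid: sequence} dict.

    Everything after the '##FASTA' directive is treated as FASTA.
    """
    features = []
    sequences = {}
    in_fasta = False
    current_id = None
    chunks = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not in_fasta:
            if line.startswith("##FASTA"):
                in_fasta = True
            elif line and not line.startswith("#"):
                columns = line.split("\t")
                if len(columns) == 9:  # skip malformed rows
                    features.append(columns)
            continue
        if line.startswith(">"):
            if current_id is not None:
                sequences[current_id] = "".join(chunks)
            # Assumption: the first word of the header is the seqid.
            current_id = line[1:].split()[0]
            chunks = []
        elif line:
            chunks.append(line.strip())
    if current_id is not None:
        sequences[current_id] = "".join(chunks)
    return features, sequences


# Hypothetical two-part file: one feature row, then a FASTA section.
example = [
    "##gff-version 3\n",
    "chrI\tSGD\tgene\t3\t8\t.\t+\t.\tID=GENE1\n",
    "##FASTA\n",
    ">chrI\n",
    "ACGTAC\n",
    "GTACGT\n",
]
features, sequences = read_gff3(example)
print(features[0][2], sequences["chrI"])
```

On the real file, an open file handle can be passed directly, e.g. `with open(path) as handle: features, sequences = read_gff3(handle)`, since iterating a file yields its lines.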

Within the feature table, another column of note is the 9th, where we can store any key=value pairs relevant to that row's feature, such as ID, Ontology_term or Note. Your task is to write a GFF3 feature exporter. A user should be able to run your script like this:
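The 9th column is a semicolon-separated list of key=value pairs. One way to turn it into a dictionary is sketched below; it assumes the GFF3 convention that reserved characters such as ';', '=' and ',' are percent-encoded, and it leaves multi-valued attributes as the raw comma-separated string. The name parse_attributes and the sample Note value are hypothetical:

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Turn 'ID=YAR003W;Note=a%3Bb' into {'ID': 'YAR003W', 'Note': 'a;b'}."""
    attributes = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing ';'
        key, _, value = pair.partition("=")
        # Decode percent-escapes such as %3B (';') after splitting on ';'.
        attributes[unquote(key.strip())] = unquote(value.strip())
    return attributes


attrs = parse_attributes("ID=YAR003W;Note=a%3Bb;")
print(attrs)
```

Splitting on ';' before decoding matters: decoding first would turn an escaped ';' inside a value into a separator.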

$ export_gff3_feature.py --source_gff=/path/to/some.gff3 --type=gene --attribute=ID --value=YAR003W
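The four options map naturally onto Python's standard argparse module, which accepts the --option=value form used above. This is a sketch; build_parser is a hypothetical helper name, and the argument list is passed explicitly here only so the example runs without a real command line:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Export one GFF3 feature as FASTA.")
    parser.add_argument("--source_gff", required=True,
                        help="path to a GFF3 file with a ##FASTA section")
    parser.add_argument("--type", required=True,
                        help="feature type to match in column 3, e.g. gene")
    parser.add_argument("--attribute", required=True,
                        help="column 9 key to match, e.g. ID")
    parser.add_argument("--value", required=True,
                        help="value the attribute must have, e.g. YAR003W")
    return parser


# In the real script, call build_parser().parse_args() with no list so it reads sys.argv.
args = build_parser().parse_args(
    ["--source_gff=/path/to/some.gff3", "--type=gene",
     "--attribute=ID", "--value=YAR003W"])
print(args.source_gff, args.type, args.attribute, args.value)
```

Marking every option required=True makes argparse print a usage message and exit if one is missing, so the rest of the script never sees a partial query.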

There are 4 arguments here that correspond to values in the GFF3 columns. In this case, your script should read the GFF3 file at the given path and find any gene (column 3) that has ID=YAR003W (column 9). When it finds one, it should use that feature's coordinates (columns 4, 5 and 7: start, end and strand) and the FASTA sequence at the end of the document to print the feature's sequence in FASTA format.
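A sketch of the matching and extraction steps follows. Besides columns 4, 5 and 7, it also uses column 1 (the seqid), since that is what identifies which FASTA record the coordinates refer to. It assumes GFF3's 1-based, inclusive coordinates, treats a comma-separated attribute value as a list of alternatives, skips percent-decoding for brevity, and raises KeyError if the seqid has no FASTA record. The names find_matches, feature_sequence and the toy data are hypothetical:

```python
# Complement table for reverse-complementing minus-strand features.
# IUPAC ambiguity codes other than N pass through unchanged in this sketch.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def find_matches(features, feature_type, attribute, value):
    """Return every row whose column 3 equals feature_type and whose
    column 9 contains attribute=value."""
    matches = []
    for columns in features:
        if columns[2] != feature_type:
            continue
        for pair in columns[8].split(";"):
            key, _, raw_values = pair.partition("=")
            values = [v.strip() for v in raw_values.split(",")]
            if key.strip() == attribute and value in values:
                matches.append(columns)
                break
    return matches


def feature_sequence(columns, sequences):
    """Slice the feature out of its molecule using columns 1, 4, 5 and 7."""
    seqid, strand = columns[0], columns[6]
    start, end = int(columns[3]), int(columns[4])
    # 1-based inclusive [start, end] becomes the Python slice [start - 1:end].
    subseq = sequences[seqid][start - 1:end]
    if strand == "-":
        # Minus-strand features are reported as the reverse complement.
        subseq = subseq.translate(COMPLEMENT)[::-1]
    return subseq


rows = [["chrI", "SGD", "gene", "2", "4", ".", "-", ".", "ID=GENE1"]]
seqs = {"chrI": "ACGTTT"}
hits = find_matches(rows, "gene", "ID", "GENE1")
print(feature_sequence(hits[0], seqs))
```

In the toy data, positions 2 to 4 of ACGTTT are CGT; because the feature is on the minus strand, the sketch prints its reverse complement, ACG.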

Your script should handle any parameter values and report when a query does not resolve to exactly one feature.

If nothing matches, it should print “No features were found that matched your query.” If more than one feature matches, it should show only one of them and warn with “More than one feature matches the query but only one shown below.”
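The two cases can be kept separate from printing by having a helper return both the chosen feature and an optional message. In this sketch pick_feature is a hypothetical name, and taking the first match is an arbitrary but simple choice of which feature to show:

```python
NO_MATCH = "No features were found that matched your query."
MANY_MATCHES = "More than one feature matches the query but only one shown below."


def pick_feature(matches):
    """Return (feature_or_None, message_or_None) for a list of matching rows."""
    if not matches:
        return None, NO_MATCH
    if len(matches) > 1:
        return matches[0], MANY_MATCHES  # show the first match only
    return matches[0], None


feature, message = pick_feature([["row1"], ["row2"]])
if message:
    print(message)
```

The sketch prints the warning to STDOUT, as the assignment asks for all output there; sending it to sys.stderr instead would keep STDOUT as pure FASTA if that is ever preferred.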

The output should be printed to STDOUT (there is no need to write to a file). It should have a header that matches the user's query, like this:

>gene:ID:YAR003W

… sequence here …

As an extra challenge, wrap the sequence portion of the FASTA output at 60 characters per line, the conventional FASTA line width.
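Building the header from the query and wrapping the sequence takes only string slicing. A sketch, where format_fasta is a hypothetical helper and width defaults to 60 to match the extra challenge:

```python
def format_fasta(feature_type, attribute, value, sequence, width=60):
    """Build a FASTA record whose header echoes the query, e.g. >gene:ID:YAR003W."""
    header = f">{feature_type}:{attribute}:{value}"
    # Cut the sequence into consecutive chunks of at most `width` characters.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines)


record = format_fasta("gene", "ID", "YAR003W", "A" * 130)
print(record)
```

A 130-base sequence comes out as two full 60-character lines followed by a 10-character line.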

Provide the code and the output. Do test runs with 3 features that are present in the file and 1 wh