Homework 1

Warning: Overall, this is a pretty hard assignment for beginners. Make sure you set aside enough time to do it.

Homework should be submitted as follows: a text file with all the solutions fully annotated and explained, and separate program files for each piece of code. Homework should be submitted electronically to your TA and to me. The subject line of the email should be “BioE142 Home #1”.

1) Basic input and output

Write a small PERL script to guess a number between 1 and 10 that you are thinking of. Let the program guess a number; you indicate whether the answer is greater than, equal to, or less than that guess. Continue until the program guesses the number. Try to make it guess the number in as few steps as possible. Argue roughly why you think your solution is optimal.

2) Text Processing

a) Extract the genome sequence from the GENBANK file: ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Bacillus_subtilis/AL009126.gbk. Print it out into a new file with all the bases in upper case, as one long word with no base numbers or new lines.
b) Using this new file (NOT the GENBANK file), calculate the number of times each base appears in the file. Make sure it matches the count reported in the GENBANK file (see the line starting with BASE COUNT).
c) Using the GENBANK file, write a PERL script to make an associative array of the open reading frame base ranges keyed to the gene name (the RANGE array). Make another hash associating the gene name with whether the ORF appears on the Watson or the Crick strand (the STRAND array). Print out these hashes into a file.
d) Convert the little programs in (c) into subroutines for the following program: extract from the file in (a) all the open reading frames according to the RANGE array, and print out all the ORFs (one per line), each preceded by the gene name, in the same orientation (5’->3’) and with the proper coding (complement) by reference to the STRAND array.
Check your results by hand by translating a few of your genes into amino acids and comparing them to the results in the GENBANK file.
e) Reading the file generated in (d), write an efficient program to calculate the codon usage in B. subtilis (the number of times each codon appears) and print it out.
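
As a rough illustration of the counting in (e), the PERL sketch below tallies codons in a hash in a single pass over each ORF. It is only one possible approach, not a required solution. The input layout (gene name, a space, then the ORF), the gene names geneA and geneB, and the helper name count_codons are all assumptions made up for the example; the toy lines stand in for the file produced in (d).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count codons in one ORF by reading non-overlapping
# triplets starting at the first base. The hash referenced by $counts is
# updated in place; a trailing partial codon (1 or 2 bases) is ignored.
sub count_codons {
    my ($orf, $counts) = @_;
    $orf = uc $orf;
    for (my $i = 0; $i + 3 <= length($orf); $i += 3) {
        $counts->{ substr($orf, $i, 3) }++;
    }
    return;
}

# Toy input standing in for the file from (d). The "name sequence" layout
# and the gene names are assumptions for illustration only.
my @lines = (
    "geneA ATGAAAGGCTAA",
    "geneB ATGGGCGGCAAATGA",
);

my %usage;
for my $line (@lines) {
    my ($name, $orf) = split ' ', $line;
    count_codons($orf, \%usage);
}

# Print each codon seen, in alphabetical order, with its count.
for my $codon (sort keys %usage) {
    print "$codon\t$usage{$codon}\n";
}
```

On the toy input this prints five distinct codons, with GGC counted three times. For the real file, the loop over @lines would be replaced by reading the file line by line, so that only one ORF is held in memory at a time.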