Bioinformatics: Principles and Applications

When molecular biology courses have a mandatory requirement for basic information science (data down a noisy channel) that's when we'll be making real advances.

Ewan Birney, developer of ENSEMBL, in an O'Reilly Network interview.

Bioinformatics is a rapidly growing field at the junction of molecular biology and computer science. The University of Puerto Rico is in need of expertise in this area, and has decided to develop a course to bootstrap new students into the area.

This course was developed for Dr. Fernando Gonzalez from the UPR-RRP, and is designed to introduce the basic concepts of bioinformatics using the Python programming language. The course is targeted at advanced undergraduates or graduate students in computer science or the natural sciences. I have adapted the course for Dr. Alma L. Santiago-Cortes at the Pontificia Universidad Catolica. The course is designed as a series of hour-long sessions, each combining a lecture that introduces the material with a laboratory exercise that gives hands-on experience. The workshop was held Thursday, May 27, 2010 at the Centro de Educacion Virtual in the Encarnacion Valdez Library of the Pontifical Catholic University of Puerto Rico.

We have selected Python for this course. Python is an interpreted, dynamically typed programming language with many developers and a growing reputation in scientific circles. The interpreted and dynamic nature of the language encourages interactive development, where the student can try out many computations and examine the results of each. Many packages for scientific computing are available for Python, including biopython for sequence analysis tasks, mavric for phylogeny, and MMTK for molecular modelling.

In addition to Python, we will draw on many other free resources on the internet for this course, and teach the students to search for these resources and take advantage of them in their own work. These resources include the VSNS BCD bioinformatics course, the free online book Think Python, biopython, blast, clustalw, phylip, mavric, MMTK, VMD, and the R Project for Statistical Computing.

Course Outline

Session 1
Introduction to bioinformatics: based on the VSNS BCD Introduction. Lab session - retrieving and interpreting sequences from GenBank via the web.
Session 2
Internet resources for bioinformatics: based on a workshop given at the UPR. Lab session - given a sequence, find similar sequences, and perform a multiple alignment.
Session 3
Programming with Python: based on the online book Think Python. Lab session - editing and running simple Python programs.
Session 4
Sequence analysis with biopython: based on the BioPython Tutorial and Cookbook. Lab session - reading biological sequences from files with biopython, and scanning for restriction sites.

Session 1 - Introduction to bioinformatics

Lab session - retrieving and interpreting sequences from GenBank via the web.

  1. Search GenBank and obtain the sequence for the rhodopsin gene in Xenopus laevis. The accession number for the cDNA is L07770.
  2. Can you find the sequence for whale myoglobin? The structure we saw is called MBCO.

Session 2 - Internet resources for bioinformatics

Lab session - given a sequence, find similar sequences, and perform a multiple alignment.

  1. Follow the EMBOSS tutorial on sequences. Remember to use "tembl" and "tsw" as the database IDs.
  2. Do the multiple sequence alignment exercises as well.

Session 3 - Programming with Python

Lab session - editing and running simple Python programs

  1. I have put the gccontent.py program on the course page. Using it as a model, write a function to compute the melting point of a sequence, using the formula below, where A, T, G, and C are the counts of each base in the sequence:

    Tm = 64.9 + 41 * (G + C − 16.4)/(A + T + G + C )

  2. Print the %GC and melting points of seq1, seq2, and seq3.
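
One possible shape for a solution is sketched below. It is not the gccontent.py program from the course page, which is not reproduced here. The function names (base_counts, gc_percent, melting_point) and the three sequences are placeholders made up for illustration, so substitute the seq1, seq2, and seq3 given in class. The sketch assumes the sequences contain only A, T, G, and C, and it ignores any other characters when counting.

```python
def base_counts(seq):
    """Count A, T, G and C in a DNA sequence, ignoring case."""
    seq = seq.upper()
    return {base: seq.count(base) for base in "ATGC"}


def gc_percent(seq):
    """Return the percentage of bases that are G or C."""
    counts = base_counts(seq)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, T, G or C")
    return 100.0 * (counts["G"] + counts["C"]) / total


def melting_point(seq):
    """Tm = 64.9 + 41 * (G + C - 16.4) / (A + T + G + C)."""
    counts = base_counts(seq)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, T, G or C")
    gc = counts["G"] + counts["C"]
    return 64.9 + 41 * (gc - 16.4) / total


# Placeholder sequences (assumptions, not the ones used in the course).
sequences = {
    "seq1": "ATGCATGCATGCATGCATGC",
    "seq2": "GGGCCCGGGCCCGGGCCCGG",
    "seq3": "ATATATATATGCATATATAT",
}

for name, seq in sequences.items():
    print(f"{name}: %GC = {gc_percent(seq):.1f}, Tm = {melting_point(seq):.1f}")
```

Keeping the base counting in its own function means the %GC and melting point calculations share the same counts, which makes it easier to check both against a hand calculation.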

Session 4 - Sequence analysis with biopython

Lab session - reading biological sequences from a file with biopython, and scanning for restriction sites.
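
To make the two tasks in this lab concrete, here is a minimal sketch written in plain Python rather than biopython, so that it runs without any extra packages. In the lab itself the same work would be done with biopython, whose Bio.SeqIO and Bio.Restriction modules cover file parsing and restriction enzymes. The helper names read_fasta and find_sites, the demo record, and the choice of the EcoRI recognition site GAATTC are assumptions made for this example only.

```python
import io

# Recognition sequence of the EcoRI restriction enzyme.
ECORI_SITE = "GAATTC"


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA-formatted file."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header line ends the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def find_sites(seq, site):
    """Return the 1-based start positions of every occurrence of site."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + 1)
        start = seq.find(site, start + 1)
    return positions


# A made-up record standing in for a sequence file on disk.
demo_file = io.StringIO(">demo\nAAGAATTCTT\nGGGAATTC\n")

for name, seq in read_fasta(demo_file):
    print(name, len(seq), find_sites(seq, ECORI_SITE))
```

To scan a real file, replace the io.StringIO object with open("yourfile.fasta"). This sketch reports where the recognition site starts and searches one strand only, which is sufficient for EcoRI because its site reads the same on both strands.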



Humberto Ortiz Zuazaga
humberto@hpcf.upr.edu

Most recent change: 2012/10/1 at 21:14