Bootstrapping Bioinformatics

When molecular biology courses have a mandatory requirement for basic information science (data down a noisy channel) that's when we'll be making real advances.

Ewan Birney, developer of ENSEMBL, in an O'Reilly Network interview.

Bioinformatics is a rapidly growing field at the junction of molecular biology and computer science. The University of Puerto Rico is in need of expertise in this area, and has decided to develop a course to bootstrap new students into the area.

This course is being developed for Dr. Fernando Gonzalez from the UPR-RRP, and is designed to introduce the basic concepts of bioinformatics using the Python programming language. The course is targeted at advanced undergraduates or graduate students in computer science or the natural sciences. It is organized as a series of half-day sessions, each combining a lecture that introduces the material with a laboratory exercise that gives hands-on experience with it. We plan to teach the course to 10-20 students over 5 Saturdays at the HPCf Teaching Lab. Students will be encouraged to install the software on their own computers and work on the lab sessions during the week; they can also visit the HPCf.

We have selected Python for this course. Python is an interpreted, dynamically typed programming language with many developers and a growing reputation in scientific circles. The interpreted and dynamic nature of the language encourages interactive development, where the student can try out many computations and examine the results of each. Many packages for scientific computing are available for Python, including biopython for sequence analysis tasks, mavric for phylogeny, and MMTK for molecular modelling. The VMD program for molecular visualization allows complex visualizations to be scripted or automated using Python. This means we can teach most of the course with only one language, concentrating on algorithms and fundamental principles instead of programming mechanics. For the section on microarray analysis, we will use the R language in addition to Python.

In addition to Python, we will leverage many other free resources on the internet for this course, and teach the students to search for and take advantage of these resources in their own work. These resources include the VSNS BCD bioinformatics course, the free online book "Think Python", biopython, blast, clustalw, phylip, mavric, MMTK, VMD, and the R Project for Statistical Computing.

Course Outline

Session 1
Introduction to bioinformatics: based on the VSNS BCD Introduction. Lab session - retrieving and interpreting sequences from GenBank via the web.
Session 2
Internet resources for bioinformatics: based on a workshop given at the UPR. Lab session - given a sequence, find similar sequences, and perform a multiple alignment online.
Session 3
Programming with Python: based on the online book Think Python. Lab session - editing and running simple Python programs.
Session 4
Sequence analysis with biopython: based on the BioPython Tutorial and Cookbook. Lab session - retrieving biological sequences from GenBank with biopython, and scanning for restriction sites or other sequence motifs.
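
As a rough idea of what scanning for a restriction site involves, here is a minimal sketch in plain Python. It does not use biopython; the function name find_motif and the example sequence are made up for illustration. The only biological fact assumed is that the enzyme EcoRI recognizes the site GAATTC.

```python
def find_motif(sequence, motif):
    """Return the 0-based start positions of every match of motif in sequence.

    Overlapping matches are reported, and the comparison ignores case.
    """
    sequence = sequence.upper()
    motif = motif.upper()
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        # Resume one base further on so overlapping matches are not missed.
        start = sequence.find(motif, start + 1)
    return positions


# Hypothetical example sequence containing two EcoRI sites (GAATTC).
dna = "ATGAATTCGGCTAGAATTCTTA"
ecori_sites = find_motif(dna, "GAATTC")
print(ecori_sites)
```

A real lab exercise would more likely rely on the restriction enzyme support that ships with biopython; the sketch only shows the underlying string search.
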
Session 5
Sequence alignment: based on the VSNS BCD Pairwise Alignments chapter. Lab session - BLAST searching with biopython, then aligning the two most similar sequences in biopython.
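
To give a flavor of pairwise alignment, the following is a minimal sketch of global alignment by dynamic programming (the Needleman-Wunsch approach) in plain Python. The scoring values (match=1, mismatch=-1, gap=-1) and the function name global_alignment are assumptions chosen for illustration; they are not taken from the course material, and real work would use a substitution matrix and a tested library.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair_score(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # align two residues
                score[i - 1][j] + gap,                   # gap in b
                score[i][j - 1] + gap,                   # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = global_alignment("ACGT", "AGT")
print(best)
print(top)
print(bottom)
```

The table has one cell per pair of prefixes, so time and memory grow with the product of the two sequence lengths.
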
Session 6
Multiple Sequence Alignments: based on the VSNS BCD Multiple Alignment chapter. Extends pairwise alignment to multiple sequences, shows why optimal alignments are computationally infeasible, and introduces heuristic techniques for multiple alignment.

Lab session - Construct a multiple sequence alignment using biopython's interface to clustalw.
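
One common way to see why optimal multiple alignment is infeasible is to count the work done by the direct extension of pairwise dynamic programming: for N sequences of length L, the table has (L+1)^N cells, and each cell looks at up to 2^N - 1 predecessor cells. The short sketch below just does that arithmetic; the function names are made up for illustration.

```python
def dp_cells(num_sequences, length):
    """Number of cells in the dynamic programming table for N sequences of length L."""
    return (length + 1) ** num_sequences


def predecessors_per_cell(num_sequences):
    """Each cell is reached from every non-empty subset of sequences advancing."""
    return 2 ** num_sequences - 1


for n in (2, 3, 5, 10):
    print(n, dp_cells(n, 100), predecessors_per_cell(n))
```

With sequences of only 100 residues, two sequences need about ten thousand cells, while ten sequences need more than 10^20, which is why programs such as clustalw fall back on heuristics.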

Session 7
Sequence phylogeny: based on the VSNS BCD Molecular Phylogenetics chapter. Building trees from alignments, estimating evolutionary distance, parsimony and likelihood.

Lab session - use phylip and mavric to construct and display cladograms from the previous lab's sequence alignments.
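
As an illustration of estimating evolutionary distance from an alignment, here is a small sketch of the observed proportion of differing sites (the p-distance) and the Jukes-Cantor correction, d = -(3/4) ln(1 - (4/3) p). Jukes-Cantor is only one of several substitution models and is used here as an example, not as the method prescribed by the course. The function names and the example sequences are made up.

```python
import math


def p_distance(seq_a, seq_b):
    """Fraction of aligned positions (ignoring gap columns) where the sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [
        (x, y)
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x != "-" and y != "-"
    ]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion p of differences."""
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical pair of aligned sequences differing at one site out of ten.
p = p_distance("ACGTACGTAC", "ACGTACGTAA")
print(p, jukes_cantor(p))
```

The corrected distance is always at least as large as the p-distance, because it accounts for multiple substitutions at the same site.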

Session 8
The Molecular Modeling Toolkit (MMTK): based on the MMTK User's Guide.

Lab session - use MMTK to build a small protein and perform structural minimization.

Session 9
Viewing molecular structure with VMD: based on the VMD User's Guide. VMD is a standalone program for viewing structures. It has a built-in Python interpreter, through which we can control the visualization programmatically.

Lab session - use VMD to render the structures we modelled in the previous session.

Session 10
Statistical analysis of microarray expression data: based on the microarray analysis workshops developed for Dr. Sandra Peña: Clustering Microarray Data and Gene Regulation Networks From Expression Data. These lectures will use the R language in addition to Python for statistical analysis of microarray data sets.


Humberto Ortiz Zuazaga
humberto@hpcf.upr.edu

Most recent change: 2012/10/1 at 09:33