Bioinformatics: Principles and Applications
When molecular biology courses have a mandatory requirement
for basic information science (data down a noisy channel) that's when
we'll be making real advances.
Ewan Birney, developer of ENSEMBL, in an O'Reilly
Network interview.
Bioinformatics is a rapidly growing field at the junction of
molecular biology and computer science. The University of Puerto Rico
is in need of expertise in this area, and has decided to develop a
course to bootstrap new students into the area.
This course was developed for Dr. Fernando Gonzalez from the
UPR-RRP, and is designed to introduce the basic concepts involved
in bioinformatics, using the programming language python. The course
is targeted at advanced undergraduates or graduate students in
computer science or the natural sciences. I have adapted the course
for Dr. Alma L. Santiago-Cortes at Pontificia Universidad Catolica.
The course is designed as a series of hour-long sessions, each
combining a lecture introducing the material and a laboratory
exercise to give hands-on experience with it. The workshop was held
Thursday, May 27, 2010 at the Centro de Educacion Virtual in the
Encarnacion Valdez Library of the Pontifical Catholic University of
Puerto Rico.
We have selected python for this course. python is an interpreted,
dynamically typed programming language that has many developers, and a
growing reputation in scientific circles. The interpreted and dynamic
nature of the language encourages interactive development, where the
student can test out many computations and examine the results of
each. Many packages for scientific computing are available for python,
including biopython for sequence analysis tasks, mavric for phylogeny,
and MMTK for molecular modelling.
In addition to python, we will leverage many other free resources on
the internet for this course, and teach the students to search for and
take advantage of these resources for their own work. These resources
include the VSNS BCD bioinformatics course, the free online book
Think Python, biopython, blast, clustalw, phylip, mavric, MMTK, VMD,
and the R Project for Statistical Computing.
Course Outline
- Session 1
- Introduction to bioinformatics: based on the VSNS BCD Introduction.
- Definition of bioinformatics
- overview of the course
- Introduction to nucleic acids and proteins
- bioinformatic databases
Lab session - retrieving and interpreting sequences from GenBank via
the web.
- Session 2
- Internet resources for bioinformatics: based on a workshop given at the UPR.
Lab session - given a sequence, find similar sequences, and perform a
multiple alignment.
- Session 3
- Programming with python: based on the online book
Think Python
- programming as a recipe (or lab protocol)
- python as a very high level language
- python control structures
- python functions
Lab session - editing and running simple python programs
- Session 4
- Sequence analysis with biopython: based on the BioPython
Tutorial and Cookbook.
- Reading sequence files
- Iteration
- GC content revised
Lab session - reading biological sequences from files with
biopython, and scanning for restriction sites.
Session 1 - Introduction to bioinformatics
- Definition of bioinformatics
- Genes for Geeks: an introduction to molecular biology for computer
scientists (138 KB PDF file).
- Introduction to nucleic acid and protein sequences
- bioinformatic databases
- More information: MATE 6685
Lab session - retrieving and interpreting sequences from GenBank via
the web.
- Search GenBank and obtain the sequence for the rhodopsin gene in
Xenopus laevis. The accession number for the cDNA is L07770.
- Can you find the sequence for whale myoglobin? The structure we
saw is called MBCO.
Session 2
- Internet resources for bioinformatics.
Lab session - given a sequence, find similar sequences, and perform a
multiple alignment.
- Follow the EMBOSS tutorial on sequences. Remember to use "tembl" and "tsw" as the database IDs.
- Do the multiple sequence alignment exercises as well.
Session 3
- Programming with python
- programming as a recipe (or lab protocol)
- Why bother?
- python as a very high level language
- python control structures
- python functions
- Python lecture
Lab session - editing and running simple python programs
- I have put the gccontent.py program on
the course page. Using this as a model, write a function to compute
the melting point of a sequence, using the formula:
Tm = 64.9 + 41 * (G + C − 16.4)/(A + T + G + C)
where A, T, G, and C are the number of times each base occurs in the
sequence.
- Print the %GC and melting points of seq1 seq2 seq3.
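One possible shape for this exercise is sketched below. The gccontent.py program is not reproduced on this page, so the function names gc_content and melting_point, and the values given to seq1, seq2, and seq3, are assumptions made for illustration only. Substitute the sequences provided in class.

```python
def gc_content(seq):
    """Return the percentage of G and C bases in seq."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    g = seq.count("G")
    c = seq.count("C")
    return 100.0 * (g + c) / len(seq)


def melting_point(seq):
    """Apply Tm = 64.9 + 41 * (G + C - 16.4) / (A + T + G + C)."""
    seq = seq.upper()
    a = seq.count("A")
    t = seq.count("T")
    g = seq.count("G")
    c = seq.count("C")
    total = a + t + g + c
    if total == 0:
        raise ValueError("sequence contains no A, T, G or C")
    return 64.9 + 41.0 * (g + c - 16.4) / total


# Hypothetical placeholder sequences, not the ones used in the course.
seq1 = "ATGCATGCATGCATGCATGC"
seq2 = "GGGCCCGGGCCCGGGCCCAT"
seq3 = "ATATATATATATGCATATAT"

for name, seq in (("seq1", seq1), ("seq2", seq2), ("seq3", seq3)):
    print(name, "%GC =", gc_content(seq), "Tm =", melting_point(seq))
```

Both functions convert the input to upper case first, so sequences typed in lower case give the same answer. Characters other than A, T, G, and C (for example N) are ignored by melting_point but still count toward the length used by gc_content.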
Session 4
- Sequence analysis with biopython.
Lab session - reading biological sequences from a file with
biopython, and scanning for restriction sites.
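To show the idea behind this lab without depending on an installed library, here is a plain python sketch that reads FASTA-formatted text and reports where a recognition site occurs. It is a stand-in, not the biopython approach used in the lab: biopython ships its own sequence readers (Bio.SeqIO) and restriction enzyme tools (Bio.Restriction). The helper names read_fasta and find_sites and the example records are hypothetical. The only biological fact assumed is that EcoRI recognizes GAATTC.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def find_sites(seq, site):
    """Return the 0-based start of every occurrence of site in seq."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        # Step one base forward so overlapping matches are also found.
        start = seq.find(site, start + 1)
    return positions


# Hypothetical example records, written inline instead of read from a file.
EXAMPLE_FASTA = """>example1 hypothetical record
AAGAATTCGG
GAATTCTT
>example2 hypothetical record
ACGTACGT
"""

ECORI_SITE = "GAATTC"  # EcoRI recognition sequence

for header, seq in read_fasta(EXAMPLE_FASTA.splitlines()):
    print(header, find_sites(seq, ECORI_SITE))
```

To scan a real file, pass an open file object to read_fasta in place of EXAMPLE_FASTA.splitlines(). The positions reported are where the recognition sequence starts, not where the enzyme cuts.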
Humberto Ortiz Zuazaga
humberto@hpcf.upr.edu
Most recent change: 2012/10/1 at 21:14