Center for Biomedical Research Excellence in Microarray Analysis

The human genome has been decoded, but the real work begins now: discovering the function of the genes and their interactions in order to find drugs to cure diseases. Microarrays are one of the main tools for this work, and they are the focus of our proposed center. The purpose of the Center will be microarray analysis, and all of its components will work closely with one another. The goal of the Center will be to create in Puerto Rico the research capacity to make a significant contribution to microarray research, which is an essential component of modern computational biology. Our approach will be multidisciplinary: it will serve the microarray area while drawing on tools from mathematics, statistics, electrical engineering, and computer science.

The center will leverage several existing initiatives in the UPR system. First, several of these projects are extensions of proposals first submitted as PR-BRIN Research Awards. Second, we will make extensive use of the HPCf's Bioinformatics Resource Center of the PR-BRIN to perform our computational analyses. Third, the research will have a direct impact on research being performed at the COBRE Microarray Facility in UPR-RRP.

The first project deals with the design of mathematical models for the reverse engineering problem, and their simulation and testing. The second project will use the mathematical models developed in the first project, and build specific networks, including an error-correction capacity. The third project takes a statistical approach to the analysis of microarray data. The fourth project looks at microarray analysis from the perspective of image processing and databases.

In order to create a permanent infrastructure for future research in our area, an important goal of the project will be the design of a PhD program in computational biology and of a research institute that includes a computational biology component. Dr. Ricardo Gonzalez, UPR-RCM, has established a relationship with Dr. David Deerfield, Director of the Biomedical Initiative at the Pittsburgh Supercomputing Center, which we intend to develop into a close collaboration.

Simulation and reverse engineering problem of genetic networks.

The reverse engineering problem is: given a set of gene expression data, construct a particular model (in the given model class) that is consistent with the data. The simulation problem is: given a model, observe its behavior and compare it to gene expression data from real networks.
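A minimal sketch of both problems, assuming the model class is synchronous Boolean networks in which each gene depends on at most two inputs (an assumption made here for illustration; the function names simulate and reverse_engineer are hypothetical). The simulation direction iterates a given model; the reverse engineering direction searches for the simplest rule per gene that is consistent with an observed time course.

```python
from itertools import combinations, product


def simulate(rules, state, steps):
    """Simulation problem: iterate a synchronous Boolean network.

    rules[i] = (inputs, table): gene i's next value is
    table[tuple of the current values of its input genes].
    """
    trajectory = [tuple(state)]
    for _ in range(steps):
        state = tuple(table[tuple(state[j] for j in inputs)]
                      for inputs, table in rules)
        trajectory.append(state)
    return trajectory


def reverse_engineer(trajectory, max_inputs=2):
    """Reverse engineering problem: one consistent rule per gene, or None."""
    n = len(trajectory[0])
    transitions = list(zip(trajectory, trajectory[1:]))
    rules = []
    for gene in range(n):
        rule = None
        # Try the smallest input sets first so the simplest rule wins.
        for k in range(max_inputs + 1):
            for inputs in combinations(range(n), k):
                table = {}
                consistent = True
                for before, after in transitions:
                    key = tuple(before[j] for j in inputs)
                    if table.setdefault(key, after[gene]) != after[gene]:
                        consistent = False
                        break
                if consistent:
                    # Unobserved input patterns default to 0 (arbitrary choice).
                    for key in product((0, 1), repeat=k):
                        table.setdefault(key, 0)
                    rule = (inputs, table)
                    break
            if rule is not None:
                break
        rules.append(rule)
    return rules
```

Many different rules usually fit a short time course, so the sketch returns only one of them; choosing among consistent models is part of the research problem.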

We will work in both directions. We will develop mathematical models (such as sequential dynamical systems) and study their properties. We also propose to study algorithms for describing relationships in genetic networks with a large number of genes, in particular "continuous time recurrent neural networks", which are used in the reverse engineering of genetic networks from time course data. A genetic dynamical system is a time-discrete dynamical system; that is, a finite dynamical system in which continuous states are mapped to a finite set.
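For illustration, one common formulation of a continuous time recurrent neural network gives each gene i a state y_i governed by tau_i * dy_i/dt = -y_i + sum_j w_ij * sigmoid(y_j + b_j) + I_i. The sketch below, under that assumption, advances the state with a forward-Euler step and then maps continuous states to a finite set by thresholding. The names ctrnn_step and discretize and the threshold values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ctrnn_step(y, weights, biases, taus, inputs, dt):
    """One forward-Euler step of the assumed CTRNN equation.

    weights[i][j] is the influence of gene j on gene i.
    """
    n = len(y)
    out = [sigmoid(y[j] + biases[j]) for j in range(n)]
    return [
        y[i] + dt / taus[i] * (
            -y[i]
            + sum(weights[i][j] * out[j] for j in range(n))
            + inputs[i]
        )
        for i in range(n)
    ]


def discretize(y, thresholds):
    """Map each continuous value to the number of thresholds it exceeds."""
    return tuple(sum(v > t for t in thresholds) for v in y)
```

With two thresholds each gene takes one of three levels, so the discretized system lives on a finite state set that can be studied with finite dynamical system tools.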

We are proposing to develop mathematical models of example gene expression regulatory networks in two experimental organisms: the bacteriophage Lambda and the fly Drosophila melanogaster. In addition, we will test models on the learning and memory datasets developed in Dr. Sandra Peña's microarray laboratory.

Bacteriophage Lambda, a lysogenic phage of Escherichia coli, can proceed through either of two alternative life cycles following infection of the host bacteria. Lambda can enter the reproductive lytic cycle (by replicating its genome, expressing its genes, and releasing assembled phage) or the integrative lysogenic cycle (by recombination with the bacterial host genome followed by replication along with bacterial cell division). Control of these alternative possibilities is achieved by a molecular mechanism involving the regulation of expression of several key genes and the stability of expressed gene products. Mathematical models will be developed to simulate this relatively simple genetic system (a simple regulatory network).
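As a toy illustration of such a switch, the sketch below assumes a two-gene Boolean caricature in which the repressors cI and cro each turn the other off. This is a deliberate simplification (the real circuit involves more genes and the stability of gene products), and all names are hypothetical. Its two fixed points correspond to the two alternative life cycles.

```python
from itertools import product


def next_state(state):
    """Assumed mutual repression: each gene is on only if the other is off."""
    ci, cro = state
    return (int(not cro), int(not ci))


def fixed_points():
    """States that the network maps to themselves (candidate stable fates)."""
    return [s for s in product((0, 1), repeat=2) if next_state(s) == s]


# Interpretation of the fixed points under the toy assumption.
FATES = {(1, 0): "lysogenic", (0, 1): "lytic"}
```

Under synchronous updating the remaining two states alternate between each other, an artifact of the Boolean simplification rather than a biological prediction.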

The fly Drosophila melanogaster is a multicellular organism with a complex life cycle and developmental program. Its fully sequenced genome predicts approximately 12,000 genes involved in the life of a fly. Recent microarray experimental results documented gene expression patterns for approximately one-third of Drosophila genes during the complete time course of the fly life cycle (Arbeitman et al., 2002). The many genes active in early development are maternal or zygotic in origin and function hierarchically to establish the body plan resulting in the fully functional adult fly. Some of these early genes are transcription factors that coordinately regulate the expression of numerous other genes. Mathematical models will be developed to simulate various specific example components of some gene regulatory interactions in this more complex genetic system.

The mathematical models developed to simulate regulatory networks in these well-characterized genetic systems will allow an increased understanding of the underlying interactions between components controlling gene activity important for development, and will enable the prediction of gene expression patterns under diverse experimental conditions. The mathematical models developed in this project will be extended to simulate increasingly complex patterns of coordinated gene expression, based upon new experimentation elucidating genome-wide genetic regulation in more sophisticated systems such as the rat or mouse, using the data obtained in Dr. Peña's lab.

Literature Cited

  1. Arbeitman et al. 2002. Gene Expression During the Life Cycle of Drosophila melanogaster. Science 297: 2270-2275.

Relevant Publications

  1. María Aviño, Dorothy Bollman, Oscar Moreno and Humberto Ortiz-Zuazaga. Genetic Sequential Dynamical Systems. Preprint.

  2. Oscar Moreno, Dorothy Bollman and María A. Avino. Finite Dynamical systems, Linear Automata, and Finite Fields. Accepted WSEAS International Conferences on: System Science 2002, Applied Mathematics and Computer Science 2002, Power Engineering Systems 2002. Brazil Oct 21-24, 2002.

Potential External Advisory Committee members

Roland Somogyi, Chris Barrett, Reinhard Laubenbacher

Gene network design and error correction for microarrays

In this project we will design gene networks that fit the models of the previous project, and we will furthermore include an error-correction capacity for our networks. In other words, once we find the network that fits our microarray data, we will be able to correct the errors in the microarray measurements. We will also work on the problem of fitting the microarray data to the best network.
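One way to picture the error-correction idea, assuming expression levels are binarized and the fitted network allows only certain states, is nearest-state decoding: an observed profile is replaced by the closest allowed state in Hamming distance. The sketch below is illustrative only; the function names and the example states are hypothetical.

```python
def hamming(a, b):
    """Number of genes whose binarized levels differ between two profiles."""
    return sum(x != y for x, y in zip(a, b))


def correct(observed, model_states):
    """Replace an observed profile with the closest state the model allows.

    Ties are broken in favor of the state listed first.
    """
    return min(model_states, key=lambda s: hamming(s, observed))


# Hypothetical allowed states; any two differ in at least three genes,
# so a single mismeasured gene can be corrected unambiguously.
ALLOWED = [
    (0, 0, 0, 0, 0),
    (1, 1, 1, 0, 0),
    (0, 0, 1, 1, 1),
    (1, 1, 0, 1, 1),
]
```

For example, correct((1, 1, 1, 0, 1), ALLOWED) returns (1, 1, 1, 0, 0), treating the last gene as a measurement error.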

Relevant Publications

P. Udaya, O. Moreno, M.A. Aviñó and D. Bollman. Connections between Ring Theory and Finite Dynamical Systems. Preprint.

Potential EAC members

Richard Karp, Solomon Golomb

Microarray expression analysis: statistical significance of expression changes

This research group will focus on the following applications of statistical analysis to problems in microarrays:

  1. Unsupervised learning (clustering) to find genes that behave similarly in various conditions and to find subgroups of samples (patients' tissues) that are similar to each other.

  2. Supervised learning to predict tumor types based on the gene expression profile of each sample, in particular using bagging, boosting, nonparametric classifiers, and support vector machines.

  3. Feature selection techniques (filter and wrapper methods) applied to gene expression data.

  4. Parallel computation applied to procedures 1-3, since I am using nonparametric statistical techniques that require a lot of data to produce good estimates.

  5. Detection of statistically valid changes in gene expression measurements from replicated microarray experiments. Microarray data is extremely noisy, and better tools for handling thousands of samples are needed.
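As a hedged sketch of item 5, the code below computes an exact two-sample permutation p-value for one gene (a nonparametric test, chosen here as an assumption) and then applies the Benjamini-Hochberg procedure to control the false discovery rate across genes. Function names are hypothetical, and this is one option among many.

```python
from itertools import combinations
from statistics import mean


def permutation_pvalue(a, b):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    count = total = 0
    # Enumerate every way to relabel the replicates into two groups.
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            count += 1
    return count / total


def benjamini_hochberg(pvalues, fdr=0.05):
    """Indices of the tests rejected at the given false discovery rate."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * fdr / m:
            cutoff = rank  # largest rank whose p-value passes its threshold
    return sorted(order[:cutoff])
```

With three replicates per condition the smallest attainable p-value is 2/20 = 0.1, which shows why the number of replicates limits what can be detected.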

Relevant publications

  1. Y. Robles, H. G. Ortiz-Zuazaga, Y. Carrasquillo, S. Peña de Ortiz. Gene Expression Profiling of the Rat Hippocampus in Spatial Discrimination Learning. Submitted November 20, 2001 to the Journal of Neurochemistry.

  2. H.G. Ortiz-Zuazaga, Y. Robles, R. Chiesa, S. Peña de Ortiz. Analysis of Learning-Related Changes in Gene Expression Using Nylon Membrane cDNA Microarrays. Abstract presented at the Fifth Annual International Conference on Computational Molecular Biology (RECOMB 2001). Montreal, Quebec, Canada. April 2001.

Potential EAC members

James Berger, Nir Friedman, Amir Ben-Dor

Microarray data analysis

All of my interests in bioinformatics fall within the general area of microarray data analysis. I take a systems approach to microarray data analysis: my plan is to develop different modules that work together to generate new knowledge from raw microarray data.

  1. The first module involves data preprocessing, in which raw data in the form of an image is used to obtain the signal intensity per spot using segmentation and background subtraction algorithms. Differentially expressed genes are then determined using supervised discriminatory gene classifiers (parametric and non-parametric).

  2. The first step towards the generation of new knowledge involves the integration of data. This requires the development of new databases and integration strategies. The information stored in these will include gene expression, biochemical reaction pathways, xenobiotics and biomolecules. Functional data will be integrated using a hybrid of bio-ontologies and molecular interactions.

  3. Another area of study is the use of gene expression data to identify genes and biochemical pathways associated with diseases, suggest novel targets and mechanisms of action or modulation, prioritize new drug targets for screening, and assess the potential toxicity of new therapeutic compounds. A rule-based expert system will be developed using the gene expression data and the databases developed in 2) to build the pathways in which the differentially expressed genes are involved.

  4. The next step in development requires dealing with incomplete biochemical pathway information, since not all reactions are fully known for a given organism. Two different approaches will be used to infer new pathway information:

    1. If no new information results from the previous step, the data is compared with orthologous pathways in other, closely related organisms to infer new interactions.

    2. Another approach is to use the expression data together with pathway data to construct probabilistic models, such as Bayesian networks, that infer unknown pathway structures.

  5. Finally, interactive pathway visualization tools are required to view and query the pathway results.
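The preprocessing in the first module can be sketched as follows, assuming a fixed circular spot mask and a median estimate of the local background (both simplifications; the function name spot_signal is hypothetical).

```python
from statistics import mean, median


def spot_signal(pixels, center, radius):
    """Background-subtracted intensity for one spot in a small image patch.

    pixels: 2D list of intensities; center: (row, col); radius: in pixels.
    Fixed-circle segmentation is an assumption chosen for brevity.
    """
    center_row, center_col = center
    inside, outside = [], []
    for r, row in enumerate(pixels):
        for c, value in enumerate(row):
            if (r - center_row) ** 2 + (c - center_col) ** 2 <= radius ** 2:
                inside.append(value)
            else:
                outside.append(value)
    foreground = mean(inside)
    background = median(outside)  # the median resists stray bright pixels
    # Clip at zero so that a dim spot never yields a negative signal.
    return max(foreground - background, 0)
```

A production pipeline would likely use adaptive segmentation and per-channel normalization; the point here is only the order of operations: segment, estimate background, subtract.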

This project will develop software that will augment the data handling capability of the existing COBRE-Neuro Microarray Core Facility.

Relevant Publications

JE Ramírez-Vick, Bioinformatics and the Post-Genomic Era. BIND Journal. Vol. 1, No. 1, 2002.

Potential EAC Members

Adam Arkin, UC Berkeley, Bioengineering and Chemistry and LBNL (http://www.lbl.gov/~aparkin/)

Roger Brent, CEO Molecular Sciences Institute, Berkeley (http://www.molsci.org/)