Bioinformatics COBRE (BiCu)

Relevant info

The RFA reads in part:

The purpose of this Request for Applications (RFA) is to augment and strengthen the institutional biomedical research capacity through flexible support to expand and develop biomedical faculty research capability and enhance research infrastructure through support of a multi-disciplinary center, led by a peer-reviewed, funded investigator with expertise central to the research theme of the proposal. The application must have a thematic scientific focus in a specific research area, such as neuroscience, cancer, structural biology, immunology, or bioengineering, and may use basic, clinical or both research approaches to attain the goals of the proposed center. The scientific leadership provided by one or more established biomedical research faculty is critical to the success of this initiative, especially for the mentoring of promising junior investigators. The center is intended to support investigators from several complementary disciplines. It will enable the institution to develop a critical mass of investigators and enhance their competitiveness in a specific research area that accelerates the rate at which those investigators compete for other complementary NIH research grant support.

This BiCu proposal seeks to establish an interdisciplinary, multicampus Center for Excellence in Biomedical Research in the scientific theme of bioinformatics, particularly emphasizing microarray analysis.

The center will be hosted by the UPR-RRP campus.

Owen McMillan (UPR-RRP Biology) and Oscar Moreno (UPR-RRP Computer Science) will be co-PIs. The center should be run by a governing board, with representation from each discipline involved. This has worked well for the BRIN grant.

Projects should revolve around bioinformatics and should address the needs of the other NCRR-funded projects. They should leverage the HPCf BiRC hardware, Internet2, and local expertise.

It is crucial that the proposal soundly addresses the guidelines described in the RFA.

The RFA allows for an instrumentation core. Currently, the biomedical instrumentation needed is being provided by the COBRE-I, BRIN, and RCMI programs. We should instead devote resources to developing a teaching/career-development core, or the development of a microarray/bioinformatics lab in Cayey.

The center will develop a seminar series in bioinformatics and a graduate course in bioinformatics; our long-term goal is a PhD-granting program in bioinformatics. It will strengthen ties between biology faculty and computer science faculty, train biology students in CS, modeling, and statistics, and train CS/Math faculty in biology.

We should expand the bioinformatics course to two semesters and include both the biological concepts and the algorithms for each of the topics studied. Sandra has stated she could teach a seminar on DNA and microarrays for mathematicians early next semester.

Michael Rubin has already taught an undergraduate bioinformatics course. Sandra Peña has taught a graduate microarrays course.

Maria Aviño and Michael Rubin would like to include a microarray/bioinformatics laboratory in Cayey.

Possible Research Projects

Again, the RFA says:

Each COBRE program should include three to five research projects that stand alone, but share a common thematic scientific focus. Each research project should be supervised by a single junior investigator who is responsible for insuring that the specific aims of that project are met. For the purpose of eligibility a junior investigator is defined either as (1) an individual who does not have or has not previously had an external, peer-reviewed Research Project Grant from either a Federal or non-Federal source that names that investigator as the PI or (2) an established investigator who is making a significant change to his/her career.

With respect to the item (1), grants that name an individual as a co-investigator, collaborator or consultant do not disqualify that investigator. Starter grants (such as NIH's FIRST award mechanism, R29), Academic Research Enhancement Award grants (AREA, R15), or exploratory/pilot project grants (such as NIH R03 or R21 awards) also do not disqualify the investigator. The investigator must hold either a tenure track or non-tenure track faculty appointment of any rank at the time that the award is made. Furthermore, a clear commitment to support this faculty appointment must be demonstrated from the institution by a letter(s) from the appropriate senior institutional official(s). Postdoctoral fellows or other positions that do not carry independent faculty status will disqualify that individual and his/her research project from further consideration.

With respect to the item (2), support may be provided to an individual who is making a significant change to his/her career goals by initiating a new line of research that is distinctly and significantly different from his/her current investigative program. In this case a current or previous history of independent peer-reviewed research support in a different investigative area from that proposed in this application does not disqualify the investigator. Moreover, this individual can be of any faculty rank.

Simulation and reverse engineering problem of genetic networks.

The reverse engineering problem is: given gene expression data, construct a particular model (in a given model class) that is consistent with the data. The simulation problem is: given a model, observe its behavior and compare it to gene expression data from real networks.

We will work in both directions. We will develop mathematical models (such as sequential dynamical systems) and study their properties. We propose to study algorithms for describing relationships in genetic networks with a large number of genes, in particular "continuous time recurrent neural networks", which are used in reverse engineering of genetic networks from time course data. A genetic dynamical system is a time-discrete dynamical system, that is, a finite dynamical system in which continuous states are mapped to a finite set. We propose to describe properties of these dynamical systems using coding theory and cryptography. Another focus of this project is to evaluate the usefulness of some algorithms in processing the data to answer biological questions.

We will simulate models for genetic networks and test them on real data, including the learning and memory datasets developed in Dr. Sandra Peña's microarray laboratory.
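To make the two directions concrete, here is a minimal sketch using a synchronous Boolean network, which is a simpler stand-in for the sequential dynamical systems and recurrent neural networks mentioned above, not the models the project proposes. The three-gene network, its rules, and the function names are all hypothetical. Simulation runs a known model forward; reverse engineering enumerates every truth table for one gene that is consistent with the observed time course.

```python
from itertools import product

def step(state, rules):
    """Synchronously update every gene using its rule."""
    return tuple(rule(state) for rule in rules)

def simulate(initial, rules, n_steps):
    """Simulation: given a model, produce a time course of states."""
    trajectory = [initial]
    for _ in range(n_steps):
        trajectory.append(step(trajectory[-1], rules))
    return trajectory

def consistent_tables(trajectory, gene, inputs):
    """Reverse engineering: list every truth table over `inputs` that
    reproduces all observed transitions of `gene` in the time course."""
    keys = list(product((0, 1), repeat=len(inputs)))
    found = []
    for outputs in product((0, 1), repeat=len(keys)):
        table = dict(zip(keys, outputs))
        ok = all(
            table[tuple(prev[i] for i in inputs)] == nxt[gene]
            for prev, nxt in zip(trajectory, trajectory[1:])
        )
        if ok:
            found.append(table)
    return found

# Hypothetical 3-gene network, for illustration only.
rules = [
    lambda s: s[2],         # gene 0 copies gene 2
    lambda s: s[0] & s[2],  # gene 1 = gene 0 AND gene 2
    lambda s: 1 - s[1],     # gene 2 = NOT gene 1
]
trajectory = simulate((1, 0, 0), rules, 6)
candidates = consistent_tables(trajectory, gene=2, inputs=(1,))
print(trajectory)
print(candidates)
```

With this time course, only the NOT table survives for gene 2. With shorter or noisier data, several candidate tables would typically remain, which is why the choice of model class matters.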

Addresses the needs of COBRE-I (Neurobiology), other microarray projects

Relevant Publications

  1. María Aviño, Dorothy Bollman, Oscar Moreno and Humberto Ortiz-Zuazaga. Genetic Sequential Dynamical Systems. Submitted to ACM Conference on Computational Biology. Miami Florida.
  2. Oscar Moreno, Dorothy Bollman and María A. Avino. Finite Dynamical systems, Linear Automata, and Finite Fields. Accepted WSEAS International Conferences on: System Science 2002, Applied Mathematics and Computer Science 2002, Power Engineering Systems 2002. Brazil Oct 21-24, 2002.

Potential External Advisory Committee members:

Roland Somogyi, Chris Barrett, Reinhard Laubenbacher

Microarray expression analysis: statistical significance of expression changes

Detect statistically valid changes in gene expression measurements from replicated microarray experiments. Microarray data is extremely noisy, and better tools for handling thousands of samples are needed.
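As a rough illustration of the problem (not the method this project would develop), the sketch below tests one gene with a permutation test on replicated log-expression values and then controls the false discovery rate across genes with the Benjamini-Hochberg procedure. The function names, the number of permutations, and the example values are assumptions made up for this example.

```python
import random
from statistics import mean

def permutation_pvalue(control, treated, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in mean expression."""
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(control) + list(treated)
    n_control = len(control)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_control:]) - mean(pooled[:n_control]))
        if diff >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (n_perm + 1)

def benjamini_hochberg(pvalues, fdr=0.05):
    """Return the indices of genes called significant at the given FDR."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= fdr * rank / m:
            cutoff = rank  # keep the largest rank that passes
    return sorted(order[:cutoff])

# Made-up replicate measurements for a single gene.
p_changed = permutation_pvalue([1.0, 1.1, 0.9, 1.0], [3.0, 3.1, 2.9, 3.0])
p_same = permutation_pvalue([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
significant = benjamini_hochberg([0.001, 0.8, 0.01, 0.04, 0.5])
print(p_changed, p_same, significant)
```

With only a handful of replicates the permutation p-value cannot get very small, which is one reason better tools are needed for this kind of data.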

Addresses the needs of COBRE-I (Neurobiology), Owen's microarray project

Relevant publications

  1. Y. Robles, H. G. Ortiz-Zuazaga, Y. Carrasquillo, S. Peña de Ortiz. Gene Expression Profiling of the Rat Hippocampus in Spatial Discrimination Learning. Submitted November 20, 2001 to the Journal of Neurochemistry.
  2. H.G. Ortiz-Zuazaga, Y. Robles, R. Chiesa, S. Peña de Ortiz. Analysis of Learning-Related Changes in Gene Expression Using Nylon Membrane cDNA Microarrays. Abstract presented at the Fifth Annual International Conference on Computational Molecular Biology (RECOMB 2001). Montreal, Quebec, Canada. April 2001.

Possible External Advisory Committee members:

James Berger, Nir Friedman

Sequence database searches: significance of gapped local alignments.

Better tools for determining the statistical significance of gapped local pairwise alignments could make better BLAST searches.
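For reference, a brute-force baseline for this problem can be sketched as follows: compute a Smith-Waterman local alignment score with gaps, then estimate its significance empirically by re-scoring against shuffled copies of one sequence. This is only a baseline under assumed scoring parameters (match, mismatch, and a linear gap penalty chosen for illustration), not the improved tool the project proposes, and the function names are hypothetical.

```python
import random

def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman score with a linear gap penalty (score only)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

def empirical_pvalue(a, b, n_shuffles=200, seed=0):
    """Fraction of shuffled versions of b that score at least as well as b."""
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)
    letters = list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        if local_alignment_score(a, "".join(letters)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_shuffles + 1)

score, pvalue = empirical_pvalue("ACGTACGTACGTACGT", "ACGTACGTACGTACGT")
print(score, pvalue)
```

Shuffling is far too slow for database-scale searches and its resolution is limited by the number of shuffles, which is the motivation for better analytic or semi-analytic estimates.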

Addresses RCMI. Proposal was submitted to BRIN.

Possible EAC members:

PSC?, Altschul?

Statistical learning techniques for microarray analysis.

Dr. Acuña's research work is related to the following areas:

  1. Unsupervised learning (clustering) to find genes that behave similarly in various conditions and to find subgroups of samples (patients' tissues) that are similar to each other.
  2. Supervised learning to predict types of tumors based on the gene expression profile of each sample, in particular bagging, boosting, nonparametric classifiers, and support vector machines.
  3. Feature selection techniques (filter and wrapper methods) applied to gene expression data.
  4. Parallel computation applied to procedures 1-3, since the nonparametric statistical techniques involved require a lot of data to produce good estimates.
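A small sketch of how items 2 and 3 fit together: a filter method ranks genes by a signal-to-noise score, and a classifier is then trained on the selected genes only. The nearest-centroid classifier here is a deliberately simple stand-in for the bagging, boosting, and support vector machine methods listed above, and the expression matrix and function names are made up for illustration.

```python
from statistics import mean, stdev

def signal_to_noise(values, labels):
    """Filter score for one gene: class mean difference over summed spreads.
    Assumes two classes (0 and 1) with at least two samples each."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    diff = abs(mean(pos) - mean(neg))
    spread = stdev(pos) + stdev(neg)
    if spread == 0:
        return float("inf") if diff > 0 else 0.0
    return diff / spread

def select_genes(samples, labels, k):
    """Return the indices of the k genes with the highest filter score."""
    n_genes = len(samples[0])
    scores = [signal_to_noise([s[g] for s in samples], labels)
              for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]

def nearest_centroid(train, labels, genes, sample):
    """Predict the class whose centroid over the selected genes is closest."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(labels)):
        rows = [s for s, y in zip(train, labels) if y == label]
        centroid = [mean(r[g] for r in rows) for g in genes]
        dist = sum((sample[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Made-up data: rows are samples, columns are genes.
samples = [
    [1.0, 5.0, 2.0], [1.2, 5.1, 7.0], [0.9, 4.9, 4.0],  # class 0
    [3.0, 5.0, 3.0], [3.1, 5.2, 6.0], [2.9, 4.8, 5.0],  # class 1
]
labels = [0, 0, 0, 1, 1, 1]
genes = select_genes(samples, labels, k=1)
prediction = nearest_centroid(samples, labels, genes, [2.8, 5.0, 4.0])
print(genes, prediction)
```

Each gene is scored independently, so the filter step is a natural candidate for the parallel computation mentioned in item 4.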

Addresses COBRE-I (Sandra)

Possible EAC members:

Amir Ben-Dor (Agilent) works in this area, Nir Friedman (Stanford, Hebrew University) works on Bayesian techniques for these problems.

Modeling Gene Expression Regulatory Networks in Experimental Genetic Systems

The coordinated regulation of gene expression is an essential feature of the development of complex organisms. Even simple organisms such as bacteriophages and viruses must regulate gene expression to control processes essential for implementing their genetic plan and progressing through their life cycles. The availability of complete sequences for various model organism genomes, in conjunction with genome-wide gene expression patterns obtained from recent microarray experiments, enables the analysis and modeling of gene expression regulatory networks in these systems. Gene expression can be regulated at various levels, including transcriptional and posttranscriptional controls as well as posttranslational modifications of proteins involved in the regulatory processes, resulting in multiple interactions between numerous components. We propose to develop mathematical models of example gene expression regulatory networks in two experimental organisms: the bacteriophage Lambda and the fly Drosophila melanogaster.

Bacteriophage Lambda, a lysogenic phage of Escherichia coli, can proceed through either of two alternative life cycles following infection of the host bacteria. Lambda can enter the reproductive lytic cycle (by replicating its genome, expressing its genes, and releasing assembled phage) or the integrative lysogenic cycle (by recombination with the bacterial host genome followed by replication along with bacterial cell division). Control of these alternative possibilities is achieved by a molecular mechanism involving the regulation of expression of several key genes and the stability of expressed gene products. Mathematical models will be developed to simulate this relatively simple genetic system (a simple regulatory network).
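To give a sense of what such a model could look like, here is a toy sketch of a two-gene mutual-repression switch, integrated with a simple Euler scheme. It is a generic textbook-style caricature and not a model of the actual Lambda biochemistry: labeling the two variables after the Lambda repressors CI and Cro is our assumption, and the parameters (alpha, hill, dt, t_end) are hypothetical values chosen only so that the system has two stable states.

```python
def simulate_switch(ci, cro, alpha=4.0, hill=2, dt=0.01, t_end=50.0):
    """Euler integration of a symmetric mutual-repression switch.

    d(ci)/dt  = alpha / (1 + cro**hill) - ci
    d(cro)/dt = alpha / (1 + ci**hill)  - cro
    """
    steps = int(t_end / dt)
    for _ in range(steps):
        d_ci = alpha / (1.0 + cro ** hill) - ci
        d_cro = alpha / (1.0 + ci ** hill) - cro
        ci, cro = ci + dt * d_ci, cro + dt * d_cro
    return ci, cro

# Two hypothetical starting conditions that end in different stable states.
lysogenic_like = simulate_switch(ci=2.0, cro=0.5)  # first variable wins
lytic_like = simulate_switch(ci=0.5, cro=2.0)      # second variable wins
print(lysogenic_like, lytic_like)
```

The point of the sketch is only that a small regulatory circuit can settle into either of two alternative states depending on initial conditions, which is the qualitative behavior a real model of the lysis/lysogeny decision would need to reproduce.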

The fly Drosophila melanogaster is a multicellular organism with a complex life cycle and developmental program. Its fully sequenced genome predicts approximately 12,000 genes involved in the life of a fly. Recent microarray experimental results documented gene expression patterns for approximately one-third of Drosophila genes during the complete time course of the fly life cycle (Arbeitman et al., 2002). The many genes active in early development are maternal or zygotic in origin and function hierarchically to establish the body plan resulting in the fully functional adult fly. Some of these early genes are transcription factors that coordinately regulate the expression of numerous other genes. Mathematical models will be developed to simulate various specific example components of some gene regulatory interactions in this more complex genetic system.

The mathematical models developed to simulate regulatory networks in these well-characterized genetic systems will allow an increased understanding of the underlying interactions between components controlling gene activity important for development and enable the prediction of gene expression patterns under diverse experimental conditions. The mathematical models developed in this project will be extended to simulate increasingly complex patterns of coordinated gene expression based upon new experimentation elucidating genome-wide genetic regulation in more sophisticated systems under varying specific conditions.

Literature Cited

  1. Arbeitman et al. 2002. Gene Expression During the Life Cycle of Drosophila melanogaster. Science 297: 2270-2275.

Microarray data analysis.

PI: Jaime Ramírez Vick (UPR-RUM Engineering)

All of my interests in bioinformatics fall within the general area of microarray data analysis. My method is a systems approach: I plan to develop different modules that work together to generate new knowledge from raw microarray data.

  1. The first module involves data preprocessing, in which raw data in the form of an image is used to obtain signal intensity per spot using segmentation and background subtraction algorithms. Differentially expressed genes are then determined using supervised discriminatory gene classifiers (parametric and non-parametric).
  2. The first step towards the generation of new knowledge involves the integration of data. This requires the development of new databases and integration strategies. The information stored in these will include gene expression, biochemical reaction pathways, xenobiotics and biomolecules. Functional data will be integrated using a hybrid of bio-ontologies and molecular interactions.
  3. Another area of study is the use of gene expression data to identify genes and biochemical pathways associated with diseases, suggest novel targets and mechanisms of action or modulation, prioritize new drug targets for screening, and assess the potential toxicity of new therapeutic compounds. A rule-based expert system will be developed using the gene expression data and the databases developed in item 2 to build the pathways in which the differentially expressed genes are involved.
  4. The next step in development requires dealing with incomplete biochemical pathway information, since not all reactions are fully known for a given organism. Two different approaches will be used to infer new pathway information:
    1. If no new information results from this, the data is compared with orthologous pathways in other organisms closely related to the organism in question, to infer new interactions by this comparison.
    2. Another approach is to use the expression data together with pathway data in the construction of probabilistic models such as Bayesian networks to infer unknown pathway structures.
  5. Finally, interactive pathway visualization tools are required to view and query the pathway results.
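As a minimal sketch of the output side of module 1, assuming segmentation has already produced foreground and background intensities for each spot: the code subtracts background, computes a log2 ratio between two channels, and flags genes by a fold-change threshold. The threshold is a crude stand-in for the supervised gene classifiers described in item 1, and the spot values, gene names, and function names are all hypothetical.

```python
import math

def spot_signal(foreground, background, floor=1.0):
    """Background-subtracted intensity, floored so the log stays defined."""
    return max(foreground - background, floor)

def log2_ratios(spots):
    """spots maps gene -> (fg_treated, bg_treated, fg_control, bg_control)."""
    return {
        gene: math.log2(spot_signal(ft, bt) / spot_signal(fc, bc))
        for gene, (ft, bt, fc, bc) in spots.items()
    }

def flag_changed(ratios, threshold=1.0):
    """Genes whose absolute log2 ratio meets the threshold (2-fold by default)."""
    return sorted(g for g, r in ratios.items() if abs(r) >= threshold)

# Made-up spot intensities for three genes.
spots = {
    "geneA": (900.0, 100.0, 300.0, 100.0),
    "geneB": (500.0, 100.0, 500.0, 100.0),
    "geneC": (150.0, 100.0, 900.0, 100.0),
}
ratios = log2_ratios(spots)
changed = flag_changed(ratios)
print(ratios, changed)
```

Keeping each stage as a small function with plain inputs and outputs mirrors the modular design described above, so a later module (database integration, pathway inference) can consume the flagged gene list without knowing how it was produced.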

Other possible project leaders include Heeralal Janwa (UPR-RRP Mathematics) and Pedro Romero (to be recruited at UPR-RRP Computer Science).

Not all of these have to start immediately. The COBRE should outline a plan for graduating older members, recruiting new members, and surviving past the end of the grant. We could probably get P20 (Center Grants) and P01 (Program Project Grants) with the results generated during the COBRE.


Humberto Ortiz Zuazaga

Most recent change: 2007/9/3 at 22:00