Data mining and data integration in biology

  • Páll Ísólfur Ólason

    Research output: Book/Report › Ph.D. thesis

    Abstract

    The last decade saw an explosion in DNA sequencing and the release of the draft version of the human genome. Proteomics is now experiencing similar growth. Because proteins are the functional elements of living cells, high-throughput proteomics promises a better understanding of cellular functions and of the interactions between molecules, which is the essence of systems biology. Internet technologies are very important in this respect: bioinformatics labs around the world generate staggering amounts of novel annotations, which increases the importance of on-line processing and distributed systems. One of the most important new data types in proteomics is protein-protein interactions. Interactions between the functional elements of the cell are a natural place to start when integrating protein annotations with the aim of gaining a systems view of the cell. Interaction data, however, are notoriously biased, erroneous and incomplete. They also call for new ways of preparing data, as established methods for sequence sets are often useless when dealing with sets of sequence pairs. Careful analysis at both the sequence level and the integrated network level is therefore needed to benchmark these data prior to use. The networks that emerge when interaction data are integrated form a skeleton to which other annotation types can be attached. Graph-theoretical methods can then be used to identify network structures and to infer annotations across the links of physical interactions, thus defining novel functional modules or, in the case of dysfunction, disease modules and disease genes.
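
    The idea of inferring annotations across interaction links can be illustrated with a small sketch. It is not the method used in the thesis; it assumes a simple majority vote among directly interacting, already annotated neighbours. The protein identifiers, the annotation labels, the function names and the `min_support` threshold are all hypothetical.

```python
from collections import Counter, defaultdict


def build_network(interactions):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    neighbours = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # self-interactions carry no information for neighbour voting
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours


def infer_annotation(protein, neighbours, annotations, min_support=2):
    """Suggest an annotation for `protein` by majority vote of its neighbours.

    Returns None when no annotation is shared by at least `min_support`
    annotated neighbours (an assumed, arbitrary confidence threshold).
    """
    votes = Counter(
        annotations[n] for n in neighbours.get(protein, ()) if n in annotations
    )
    if not votes:
        return None
    label, count = votes.most_common(1)[0]
    return label if count >= min_support else None


# Hypothetical toy data: P1 and P5 are unannotated.
interactions = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P4", "P5")]
annotations = {"P2": "DNA repair", "P3": "DNA repair", "P4": "transport"}

network = build_network(interactions)
p1_guess = infer_annotation("P1", network, annotations)  # "DNA repair"
p5_guess = infer_annotation("P5", network, annotations)  # None: only one supporting neighbour
```

    Because interaction data are described above as biased and incomplete, a threshold such as `min_support` is one simple way to avoid transferring an annotation on the strength of a single, possibly erroneous, link.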
    Original language: English
    Number of pages: 176
    Publication status: Published - Mar 2008
