Biological Data Processing Systems Lab
The big data era in molecular biology has created exciting potential for novel biological discoveries, but also exciting challenges for computer scientists in data management, processing, and visualization. In the coming decades, sophisticated bioinformatics methods and frameworks will be developed to analyze and explore the information in the data. However, these will require the development of novel infrastructure systems targeted at bioinformatics data and methods.
Our research goal is to build and experimentally evaluate infrastructure systems that support the methods under development by our bioinformatics collaborators. We are designing and implementing systems for big data storage, interactive analysis, and large-scale visualization. We are primarily interested in improving the scalability and interactivity of bioinformatics analysis methods and frameworks.
We combine experimental computer science with real problems, applications, and data obtained from our biology collaborators. We focus on distributed and parallel systems, including high-resolution visualizations.
We participate in three large projects, but we also have several minor projects.
Elixir is a large-scale European project to construct and operate a sustainable infrastructure to support life science research. One of the responsibilities of the Norwegian Elixir node, Elixir.no, is marine genomics. In the Norwegian node, the Center for Bioinformatics at the University of Tromsø is responsible for bioinformatics services and research in marine metagenomics. Our research is focused on building infrastructure systems for metagenomics analysis pipelines. We are focusing on building and experimentally evaluating systems that provide more scalable, flexible, and interactive data processing. In addition, we build systems for configuration, storage, and provenance management.
We participate in the "Marine metagenomic infrastructure as driver for research and industrial innovation" use case project in the ESFRI ELIXIR-EXCELERATE infrastructure project, where we develop the META-pipe data analysis service for ELIXIR users and Norwegian users, and the Marine Metagenomics Portal that provides marine reference databases. We were a partner in the ELIXIR Pilot Action "Marine metagenomics pilot – towards domain specific service".
The Norwegian Women and Cancer (NOWAC) postgenome biobank contains time series with questionnaire data from 170 000 women and more than 60 000 blood samples. The biobank is analyzed using several omics technologies, and the data is being analyzed by the Systems Epidemiology group led by Professor Eiliv Lund at the Department of Community Medicine, University of Tromsø. Our responsibility in the project is to build a backend for machine learning based data analysis, and a system for exploration and visualization of the multi-level and multi-omics dataset. Our research focuses on interactive data analysis methods and systems, scalable integrated visualizations, interactive data cleaning, and human-computer interfaces for large-scale display walls.
Troilkatt is a data processing system for massive bioinformatics datasets. It was built in cooperation with the Troyanskaya lab and Kai Li at Princeton University. The research goal was to extend data-intensive computing systems for heterogeneous biological data and to provide an infrastructure system for next-generation bioinformatics data analysis and exploration tools. Troilkatt is used to provide data for several bioinformatics tools built by the Troyanskaya lab, including IMP, Spell, and HEFalMp.
In the lung sounds project we are building a database with lung sound recordings of 3000 persons (6 samples per person). The recordings are done as part of Tromsøundersøkelsen 7, which is an epidemiological study that was started in 1974. The database will be used to provide educational and analysis services for lung sounds. Our contributions are methods for automated classification and similarity search for the sounds. This project is done in collaboration with Hasse Melbye at the Department of Community Medicine, University of Tromsø.
Kodeklubben Tromsø is an after-school club for kids and youth who want to learn about computer programming. Together with volunteers from local tech companies and students from our department, members of our lab organize the activities. Each semester we run a 10-week program with two-hour classes each week. In the spring of 2016 we have over 130 kids from ages 7 to 17 attending the club.
In the air pollution project we are developing educational projects for use in Norwegian high schools. This work is done in collaboration with Nordnorsk Vitensenter Tromsø.
M.O.R.T.A.L. is a new programming language being developed for domain-specific high-performance computing. We will use MORTAL for controlling and executing biological data processing.
COST Action IC1406 - High-Performance Modelling and Simulation for Big Data Applications (cHiPSet). We represent Norway on the management committee.
The Network for Sustainable Ultrascale Computing (NESUS) is a European (COST) research network. We are one of the participating institutions. We are contributing to the Applications working group (WP6).
The lab currently consists of:
|Lars Ailo Bongo||Associate Professor||Principal investigator||Homepage | Github | Bitbucket|
|Einar Holsbø||Ph. D. student||NOWAC||Homepage | Github|
|Bjørn Fjukstad||Ph. D. student||NOWAC||Homepage | Github|
|Morten Grønnesby||Ph. D. student||NOWAC||Homepage | Github|
|Johan Ravn||Master student||Lung sounds|
|Tim Alexander Teige||Master student||Elixir||Github|
|Nina Angelvik||Master student||Air pollution|
|Inge Alexander Raknes||Technical staff||Elixir||Github | Bitbucket|
|Jon Ivar Kristiansen||Technical staff||Center for Bioinformatics||Homepage|
|Giacomo Tartari||Technical staff||Center for Bioinformatics, Elixir||Github|
|Aleksandr Agafonov||Technical staff||Center for Bioinformatics, Elixir||Github|
|Rigmor Katrine Johansen||Intern||Bioethics, NOWAC|
Former lab members are:
|Dr. Edvard Pedersen||Ph.D. student, 2016, CS, UiT||Thesis: A Data Management Model For Large-Scale Bioinformatics Analysis (Thesis and source code).|
|Frode Opdahl||Master student, 2016, CS, UiT||Project: Virtual reality.|
|Jarl Fagerli||Master student, 2015, CS, UiT||Thesis: COMBUSTI/O. Abstractions facilitating parallel execution of programs implementing common I/O patterns in a pipelined fashion as workflows in Spark. Thesis. Source code.|
|Kenneth Knudsen||Master student, 2015, CS, UiT||Thesis: Freia: Exploring Biological Pathways Using Unity3D. (Thesis, Source Code, Demo)|
|Ove Kåven||Master student, 2015, CS, UiT||Thesis: Multiparadigm Optimizing Retargetable Transdisciplinary Abstraction Language (Thesis, Source code)|
|Ida Jaklin Johansen||Technical staff||Elixir.no|
|Martin Ernstsen||Master student, 2013, CS, UiT||Thesis: Mario - A system for iterative and interactive processing of biological data (Thesis, Source code)|
|Terje André Johansen||Master student, 2011, CS, UiT||Thesis: A scalable, interactive widget library for visualizing biological data (Thesis, Paper)|
|Torje Henriksen||Master student (main-advisor Otto J. Anshus, co-advisor Phuong Hoai Ha), 2008, CS, UiT||Thesis: Efficient intra-node Communication for Chip Multiprocessors|
We are a lab in the High Performance Distributed Systems research group at the Department of Computer Science, University of Tromsø.
We collaborate with Professor Eiliv Lund in the NOWAC project.
We collaborate with Professor Hasse Melbye in the Lung Sounds project.
EPINOR is a national research school in population-based epidemiology. We are one of the associated research groups, and our PhD students can apply for admission to the school.
NORBIS is the national research school for bioinformatics, biostatistics, and systems biology.