Biological Data Processing Systems Lab

The big data era in molecular biology has created exciting potential for novel biological discoveries, but also exciting challenges for computer scientists in data management, processing, and visualization. In the coming decades, sophisticated bioinformatics methods and frameworks will be developed to analyze and explore the information in these data. However, these will require the development of novel infrastructure systems targeted at bioinformatics data and methods.

Our research goal is to build and experimentally evaluate infrastructure systems that support the methods under development by our bioinformatics collaborators. We are designing and implementing systems for big data storage, interactive analysis, and large-scale visualization. We are primarily interested in improving the scalability and interactivity of bioinformatics analysis methods and frameworks.

We combine experimental computer science with real problems, applications, and data obtained from our biology collaborators. We focus on distributed and parallel systems, including systems for high-resolution visualization.

Projects

We participate in three large projects and also have several smaller projects.

ELIXIR

Elixir is a large-scale European project to construct and operate a sustainable infrastructure to support life science research. One of the responsibilities of the Norwegian Elixir node, Elixir.no, is marine genomics. In the Norwegian node, the Center for Bioinformatics at the University of Tromsø is responsible for bioinformatics services and research in marine metagenomics. Our research focuses on building and experimentally evaluating infrastructure systems for metagenomics analysis pipelines that provide more scalable, flexible, and interactive data processing. In addition, we build systems for configuration, storage, and provenance management.

We participate in the use case project "Marine metagenomic infrastructure as driver for research and industrial innovation" in the ESFRI ELIXIR-EXCELERATE infrastructure project, where we develop the META-pipe data analysis service for ELIXIR and Norwegian users, and the Marine Metagenomics Portal, which provides marine reference databases. We were a partner in the ELIXIR Pilot Action "Marine metagenomics pilot – towards domain specific service".

NOWAC

The Norwegian Women and Cancer (NOWAC) postgenome biobank contains time series with questionnaire data from 170 000 women and more than 60 000 blood samples. The samples are being processed with several omics technologies, and the data are analyzed by the Systems Epidemiology group led by Professor Eiliv Lund at the Department of Community Medicine, University of Tromsø. Our responsibility in the project is to build a backend for machine-learning-based data analysis, and a system for exploring and visualizing the multi-level, multi-omics dataset. Our research focuses on interactive data analysis methods and systems, scalable integrated visualizations, interactive data cleaning, and human-computer interfaces for large-scale display walls.

Troilkatt

Troilkatt is a data processing system for massive bioinformatics datasets. It was built in cooperation with the Troyanskaya lab and Kai Li at Princeton University. The research goal was to extend data-intensive computing systems to heterogeneous biological data and to provide an infrastructure system for next-generation bioinformatics data analysis and exploration tools. Troilkatt is used to provide data for several bioinformatics tools built by the Troyanskaya lab, including IMP, Spell, and HEFalMp.

Lung Sounds

In the lung sounds project, we are building a database with lung sound recordings of 3000 persons (6 samples per person). The recordings are made as part of Tromsøundersøkelsen 7, the seventh survey of the Tromsø Study, an epidemiological study that started in 1974. The database will be used to provide educational and analysis services for lung sounds. Our contributions are methods for automated classification of, and similarity search over, the sounds. This project is done in collaboration with Hasse Melbye at the Department of Community Medicine, University of Tromsø.

Other

Kodeklubben Tromsø is an after-school club for kids and youth who want to learn computer programming. Together with volunteers from local tech companies and students from our department, members of our lab organize the activities. Each semester we run a 10-week program with two-hour classes every week. In spring 2016, more than 130 kids aged 7 to 17 are attending the club.

In the air pollution project, we are developing educational projects for use in Norwegian high schools. This work is done in collaboration with Nordnorsk Vitensenter Tromsø.

M.O.R.T.A.L. (MORTAL) is a new programming language being developed for domain-specific high-performance computing. We will use MORTAL to control and execute biological data processing.

COST Action IC1406 - High-Performance Modelling and Simulation for Big Data Applications (cHiPSet). We represent Norway on the management committee.

The Network for Sustainable Ultrascale Computing (NESUS) is a European (COST) research network. We are one of the participating institutions. We are contributing to the Applications working group (WP6).

People

The lab currently consists of:

Lars Ailo Bongo, Associate Professor, principal investigator (Homepage | Github | Bitbucket)
Einar Holsbø, Ph.D. student, NOWAC (Homepage | Github)
Bjørn Fjukstad, Ph.D. student, NOWAC (Homepage | Github)
Morten Grønnesby, Ph.D. student, NOWAC (Homepage | Github)
Johan Ravn, Master's student, Lung sounds
Tim Alexander Teige, Master's student, Elixir (Github)
Nina Angelvik, Master's student, Air pollution
Inge Alexander Raknes, technical staff, Elixir (Github | Bitbucket)
Jon Ivar Kristiansen, technical staff, Center for Bioinformatics (Homepage)
Giacomo Tartari, technical staff, Center for Bioinformatics, Elixir (Github)
Aleksandr Agafonov, technical staff, Center for Bioinformatics, Elixir (Github)
Rigmor Katrine Johansen, intern, Bioethics, NOWAC

Former lab members are:

Dr. Edvard Pedersen, Ph.D. student, 2016, CS, UiT. Thesis: A Data Management Model For Large-Scale Bioinformatics Analysis (Thesis, Source code)
Frode Opdahl, Master's student, 2016, CS, UiT. Project: Virtual reality
Jarl Fagerli, Master's student, 2015, CS, UiT. Thesis: COMBUSTI/O. Abstractions facilitating parallel execution of programs implementing common I/O patterns in a pipelined fashion as workflows in Spark (Thesis, Source code)
Kenneth Knudsen, Master's student, 2015, CS, UiT. Thesis: Freia: Exploring Biological Pathways Using Unity3D (Thesis, Source code, Demo)
Ove Kåven, Master's student, 2015, CS, UiT. Thesis: Multiparadigm Optimizing Retargetable Transdisciplinary Abstraction Language (Thesis, Source code)
Ida Jaklin Johansen, technical staff, Elixir.no
Martin Ernstsen, Master's student, 2013, CS, UiT. Thesis: Mario - A system for iterative and interactive processing of biological data (Thesis, Source code)
Terje André Johansen, Master's student, 2011, CS, UiT. Thesis: A scalable, interactive widget library for visualizing biological data (Thesis, Paper)
Torje Henriksen, Master's student (main advisor Otto J. Anshus, co-advisor Phuong Hoai Ha), 2008, CS, UiT. Thesis: Efficient intra-node Communication for Chip Multiprocessors

Collaborators

We are a lab in the High Performance Distributed Systems research group at the Department of Computer Science, University of Tromsø.

We are also part of the Center for Bioinformatics which is co-located with the Norwegian Structural Biology Centre (NORSTRUCT).

We collaborate with Professor Eiliv Lund in the NOWAC project.

We collaborate with Professor Hasse Melbye in the Lung Sounds project.

EPINOR is a national research school in population-based epidemiology. We are one of the associated research groups, and our PhD students can apply for admission to the school.

NORBIS is the national research school for bioinformatics, biostatistics, and systems biology.

We have a long-term collaboration with Professors Kai Li and Olga Troyanskaya at Princeton University. We are also collaborating with Etienne Birmelé at Université Paris Descartes.