Software

This page contains links to the source code for our publications. Note that code without a public repository is generally no longer maintained. We will, however, help with any questions and issues as best we can.

Contact: larsab@cs.uit.no

Open source projects

    We have developed the META-pipe pipeline for marine metagenomics data analysis. Version 2.0 consists of several backend systems, servers, and services:

    1. Marine Metagenomics Portal. Marine reference databases and more.
    2. Galaxy pipeline provided as part of the NeLS infrastructure. It is intended for Norwegian users, so a FEIDE account is needed to log in.
    3. Spark-based backend.
    4. Authorization server integrated with Elixir AAI.
    5. Web application frontend.
    6. Object storage server.
    7. Tool to set up the META-pipe backend on the OpenStack cPouta cloud.
    8. Tool to set up the META-pipe backend on OCCI-enabled endpoints.

    Source code for META-pipe 1.0 is in the following repositories. Note that these are no longer maintained.

    1. META-pipe 1.0. Implemented for execution on HPC clusters.
    2. Patches for META-pipe specific metarep (1.4.0) sequence retrieval modifications.

    Source code for our research projects (random order):

    1. UiT Github course guide and template. The unofficial guide for using GitHub for UiT courses.
    2. validator. An R package for running repeated k-fold cross-validation.
    3. So you want to use R on stallo. A brief guide to launching long-running embarrassingly parallel R jobs on the UiT supercomputer Stallo.
    4. nsroot. Minimalist process isolation tool implemented with Linux namespaces. Described in this short paper.
    5. geneset. R package of data sets and functions that facilitate gene set analysis.
    6. nowaclean. R package implementing the methods of the standard operating procedure for cleaning microarray data in the Norwegian Women and Cancer postgenome study.
    7. Kvik. A framework for developing interactive data exploration applications in genomics and systems biology. The Master thesis of Fjukstad, Fjukstad et al. 2015, and Fjukstad et al. 2017 describe the system.
    8. walrus. A system for running data analysis pipelines using Docker containers.
    9. seq. A collection of Docker containers with different bioinformatics tools, such as GATK, bwa, and Picard, installed.
    10. Mixt. Matched Interaction Across Tissues (MIxT) is a web application for exploring and comparing transcriptional profiles from two or more matched tissues across individuals. Online at mixt-blood-tumor.bci.mcgill.ca
    11. Luft. Web application for visualizing air quality in Tromsø with data from The Norwegian Institute for Air Research (NILU) and Kongsbakken VGS. Online at luft.cs.uit.no.
    12. Air quality sensor and web server. An Arduino-based portable air quality sensor kit and a Ruby on Rails web application deployed on Heroku.
    13. COMBUSTI/O. Abstractions facilitating parallel execution of programs implementing common I/O patterns in a pipelined fashion as workflows in Spark. A detailed description is found in Jarl Fagerli's master thesis.
    14. Freia. Biological Path Visualization using Unity3D to visualize gene expression data integrated with pathway images. The Master thesis of Kenneth Knudsen has a detailed description of the tool.
    15. KEGGviewer. Simple Python Flask web viewer for KEGG images.
    16. krongen. Creates Kronecker graphs that simulate networks with power-law edge distributions.
    17. Mr. Clean is a tool for combining different visualization tools, interaction devices, and display middleware for visual comparisons on high-resolution displays.
    18. M.O.R.T.A.L. is a programming language for domain specific high performance computing.
    19. Mario is a system for interactive data analysis. It is built on top of the HBase storage system and provides data processing using commonly used bioinformatics applications, interactive tuning, automatic parallelization, and data provenance support. The README file in the source code provides installation instructions, and the Master thesis of Ernstsen gives a detailed description of the system.
    20. This benchmark was used to evaluate the performance of HBase using data and access patterns found in typical biological data processing tools. The dataset size is tunable. The README file in the source code provides installation instructions, and the Master thesis of Ernstsen gives a detailed description of the system.
    21. Spell expression data processing pipeline. This is a data cleaning pipeline for microarray data.
    22. Troilkatt system. This is a system for scalable batch processing of biological data. Troilkatt is built on the Hadoop stack.
    23. BSV system. This is a system for scalable visualizations on multi-core and multi-display platforms. It provides programmatic control for visualizations implemented using Python visualization libraries.
    24. GeStore. This is a system for enabling transparent incremental updates for metagenomic pipelines. Several publications describe the system in detail.