Multi-omic Data Orchestration
Over the last few years, one of the key challenges we have faced in our multi-omic translational studies is the orchestration of multi-omic datasets. In this blog post, I summarise some of the strategies we have employed to manage this complexity, after dedicating significant time to considering this problem. Industry Data Warehousing I recommend the Kimball book on data warehousing (The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling) as a starting guide to see what other people have done in tackling ‘big data’. ...
How to Handle Big Data Uploads in Django
Traditional File Uploads in Django Handling non-trivial big data file uploads (think >5 GB per file) in Django can be challenging. A typical file upload storage strategy in Django is to use django-storages and an Amazon S3 backend. In the traditional way, your browser uploads the file directly to Django, which then transmits the data to S3. This creates three problems: Django needs to have enough memory to hold your uploaded file for retransmission. The Django worker receiving the file upload is blocked until the upload is finished. Streaming the upload onwards to S3 doubles the amount of bandwidth used. What if we could instead upload directly into AWS S3 while registering the upload with Django? I stumbled across this excellent article by Radoslav Georgiev of HackSoft: https://www.hacksoft.io/blog/direct-to-s3-file-upload-with-django ...
Launching Nextflow Pipelines From the Cloud
Why bother setting up another server to manage your Nextflow pipeline? Avoid premature termination of the pipeline. When you run a Nextflow pipeline from your local computer, your local computer is managing the tasks and communicating with Azure as jobs are completed. Depending on the complexity of your pipeline, this may take a long time (a couple of days!). If your wi-fi router is accidentally unplugged or the connection is broken, the pipeline will terminate prematurely. While you could recover from this using -resume, you can avoid the risk of disconnection by orchestrating the pipeline from a cloud server, which is designed to run 24/7. ...
Setting Up Azure With Nextflow
We will cover getting your Nextflow pipeline up and running in the cloud using Azure. First thing to note: this process will take some time, so sit back, grab a coffee and take your time working through each section. Useful References Nextflow documentation here. Nextflow blog post here. 1. Context Let’s start with some context. I’m working with a small team of immunology researchers whose scientific questions have led them into genomics. We don’t have any bioinformatics setup and are starting from nothing - which is probably going to be the case for many newcomers as NGS technology continues to become more accessible (for example Nanopore sequencing). ...
How to Create Your Bioinformatics Pipeline with Nextflow
Now that you know how to run bioinformatics software in Docker containers, it’s time to connect them up. If you’ve missed the last post the link is here: Getting started with Docker for bioinformatics. Content Overview What is a pipeline? Nextflow vs Snakemake Using Nextflow and Docker containers to create your pipeline Summary What is a pipeline anyway? The term ‘pipeline’ is thrown around a lot in bioinformatics. In simple terms, it refers to the programs that have to be run in a certain order to complete the analysis. Some of these programs take the outputs of earlier programs and process them in order to achieve a specific objective. ...
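To make the idea of "programs that have to be run in a certain order" concrete, here is a minimal sketch in plain Python rather than Nextflow itself. The step names (trim_reads, align, sort_bam, call_variants) are hypothetical and only illustrate how each step depends on the output of an earlier one; a workflow manager such as Nextflow works out this ordering for you.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each step maps to the set of steps whose
# outputs it consumes. These names are illustrative only.
pipeline = {
    "trim_reads": set(),
    "align": {"trim_reads"},
    "sort_bam": {"align"},
    "call_variants": {"sort_bam"},
}


def run_order(steps):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(steps).static_order())


order = run_order(pipeline)
print(order)
```

Every step appears after the steps it depends on, which is the guarantee a pipeline needs before any program is launched.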
Getting Started With Docker for Bioinformatics
Next generation sequencing is becoming much more accessible to researchers in 2021. As you stare at the freshly minted .fastq files, you’re wondering - how do I go about analysing this? After a stint on Google, you decide that you want to run bwa-mem/bowtie2 and then send the output into samtools. Next thing you know, you’re trying to install half a dozen bioinformatics programs on your new Ubuntu machine. You run into dependency hell, or else conda seems to be stuck solving god knows what, and this quickly eats up half your day. ...
Architectural Approaches to Building Websites
This article is for people who are new to web development, lost in the myriad of web frameworks and are asking themselves: What are the real differences between the frameworks? Having spent the last year getting up to speed with the state of the web, I’m going to summarise the 4 major architectural approaches to launching a new website in 2021. Although I describe them as discrete categories, in reality they exist on a continuum and some frameworks will blur the lines between these categories (e.g. Next.js). For the beginner though, it is helpful to have a rough overview of the major choices available to you. ...
The State of Web Development
I’ve had to dabble in a bit of web development over the past year as part of my research fellowship and, not having done this for a while, I have accrued a couple of thoughts I want to share. My background First, a bit of background. Growing up, I’ve had the privilege of witnessing the birth and evolution of the modern internet. From the days of 56kbps dial-up modems, which occasionally got taken out by lightning(!), and back when Yahoo was the default home page for everybody (remember those days?). This was before MySpace and Friendster. ...