Powering data-driven brain tumour research

Researcher Chun Vong is using REANNZ HPC resources and deep learning approaches to study and characterise brain tumour microenvironments.

Research background

Glioblastoma (GBM) is the most common and aggressive form of high-grade brain tumour (HGBT), with a median survival of just 15 months. Accurate pathological diagnosis is critical for prognosis and treatment planning.

Traditionally, histopathological analysis, such as hematoxylin and eosin (H&E) staining, has been central to diagnosis. Over time, ancillary immunohistochemistry (IHC) and advances in transcriptomic and methylomic technologies have provided unprecedented data on tumour heterogeneity, enhancing diagnostic precision.

However, these advanced omics-based methods are often inaccessible due to high costs and the need for specialised infrastructure. To bridge this gap, researchers are increasingly turning to digital pathology platforms that integrate rich biological data with routinely available histological images.

At the University of Auckland, researcher Chun Vong is developing a data-driven multiscale analysis platform for HGBT that leverages deep learning to characterise tumour microenvironments.

By mapping molecular insights onto histological features, the platform aims to uncover prognostic biomarkers and guide personalised therapeutic strategies—without requiring expensive omics testing for every patient.

This ambitious project relies on computationally intensive deep learning models, demanding significant memory, parallel processing power, and efficient pipeline orchestration. 

An example of the images generated from the analysis platform that University of Auckland researcher Chun Vong is developing.

Project challenges

The development of Chun’s multiscale platform faces several technical hurdles:

  • High computational demands: Training deep learning models requires extensive GPU resources and large memory capacity.
  • Complex software ecosystem: The pipeline integrates multiple tools—including PathML for digital pathology analysis, Dask for parallel computing, and Nextflow for workflow management—each with its own dependencies and configuration needs.
  • HPC integration: Efficiently deploying and scaling this pipeline on the New Zealand eScience Infrastructure (NeSI) high-performance computing (HPC) platform required expertise in Slurm job scheduling, GPU allocation, and performance optimisation.

Without expert support, navigating these challenges could delay progress and limit scalability.

What was done

Chris Scott and Maxime Rio, Research Software Engineers at NeSI at the time, collaborated closely with Chun to optimise and deploy his analysis pipeline on the national HPC platform.

While Chun led the development of the scientific pipeline – drawing on his strong programming background – Chris and Maxime provided critical expertise in HPC best practices, containerisation, and scalable computing:

  • Containerisation with Apptainer
    To ensure portability, reproducibility, and compatibility across systems, Chris and Maxime helped Chun containerise his software stack using Apptainer. This allowed the entire environment—including Python packages, deep learning frameworks (e.g., PyTorch), and PathML—to be encapsulated in a single, portable image that runs consistently on NeSI’s clusters and elsewhere.
  • Scaling with Dask
    The PathML pipeline uses Dask, a flexible parallel computing library in Python, to process large whole-slide images efficiently by breaking them into smaller chunks and processing them concurrently. Chris and Maxime assisted in configuring Dask to work effectively within the distributed memory architecture of the HPC system, enabling the pipeline to scale up and take full advantage of available CPU and GPU resources. A minimal sketch of this chunked approach appears after this list.
  • Optimising Nextflow for Slurm and GPUs
    A core part of the workflow is orchestrated using Nextflow, a powerful tool for building portable and scalable data pipelines. Chris and Maxime helped Chun adapt the Nextflow configuration to:
    • Submit jobs via Slurm, the HPC platform's workload manager
    • Request and utilise NVIDIA HGX A100 GPUs effectively
    • Manage resource allocation (memory, cores, walltime) based on task requirements
    A sketch of how comparable Slurm resource requests can be expressed from Python follows this list.
  • Performance profiling and debugging
    Chris and Maxime profiled different stages of the pipeline to understand memory usage, I/O bottlenecks, and compute requirements. This profiling informed optimal job configurations and helped debug issues related to file handling, GPU memory overflow, and inter-process communication. A small example of capturing such a profile with Dask is included after this list.
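
To illustrate the kind of chunked processing described in the Dask point above, the minimal sketch below splits a stand-in image array into tiles and applies a per-tile function in parallel. The array size, chunk shape and normalise_tile function are hypothetical placeholders, not code from Chun's pipeline.

    # Minimal sketch of chunked, parallel tile processing with Dask
    import dask.array as da

    # Stand-in for a lazily loaded whole-slide image (RGB), split into tiles;
    # a real pipeline would load the slide itself rather than random data
    slide = da.random.randint(0, 256, size=(8_192, 8_192, 3),
                              chunks=(2_048, 2_048, 3)).astype("uint8")

    def normalise_tile(tile):
        # Hypothetical per-tile operation; a real pipeline might perform stain
        # normalisation, tissue detection or feature extraction here
        return tile.astype("float32") / 255.0

    # map_blocks applies the function to every chunk; chunks are processed
    # concurrently across the available workers
    normalised = slide.map_blocks(normalise_tile, dtype="float32")

    # Trigger the computation and reduce to a single summary value
    print(normalised.mean().compute())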
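The project itself drives Slurm submission through Nextflow, but the same kind of resource requests described above (cores, memory, walltime, a GPU per task) can be sketched in Python using dask_jobqueue's SLURMCluster, one common way to run Dask workers under Slurm. The partition, account and GPU directive below are assumptions for illustration, not the project's actual configuration.

    # Illustrative only: running Dask workers as Slurm jobs with dask_jobqueue
    from dask.distributed import Client
    from dask_jobqueue import SLURMCluster

    cluster = SLURMCluster(
        cores=8,                      # CPU cores per worker job
        memory="64GB",                # memory per worker job
        walltime="02:00:00",          # maximum runtime per worker job
        queue="gpu",                  # hypothetical partition name
        account="myproject",          # hypothetical Slurm account
        # request one A100 GPU per worker (older dask_jobqueue releases
        # use the job_extra argument instead)
        job_extra_directives=["--gres=gpu:A100:1"],
    )
    cluster.scale(jobs=4)             # ask Slurm to start four worker jobs
    client = Client(cluster)

    # Work submitted through `client` now runs on the Slurm-managed workers
    print(client.dashboard_link)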
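Finally, for the profiling point, one lightweight way to capture task timings, memory use and worker utilisation for a block of Dask work is the performance_report context manager from dask.distributed; the toy computation below is a placeholder for a real pipeline stage.

    # Illustrative only: recording a profiling report for a block of Dask work
    import dask.array as da
    from dask.distributed import Client, performance_report

    client = Client()  # local cluster here; on the HPC this would be the Slurm workers

    x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))

    with performance_report(filename="dask-report.html"):
        # Everything computed inside this block is captured in the HTML report
        (x + x.T).mean().compute()

    client.close()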

Main outcomes

Thanks to the collaboration with Chris and Maxime, the project achieved the following outcomes:

  • Containerised software using Apptainer for reproducibility, portability and ease of deployment
  • Nextflow workflow successfully deployed on the national HPC platform, capable of leveraging GPU acceleration
  • Dask integration optimised to scale up and take advantage of HPC resources
  • Pipeline performance profiled to guide efficient resource allocation

As a result, Chun can continue to develop his HGBT deep learning platform using powerful GPUs – accelerating model development and validation.

Researcher feedback

“This project required scaling of our model training and optimisation for us to train many models at the same time, efficiently. We sought the consultancy’s help with this, and they were instrumental in automating and setting up the optimisation runs for my models. In the end, I was able to scale our training and optimise over 100 models effectively during my PhD candidature. Without them, none of the optimised modelling would have been possible.”

Chun Kiet Vong, Auckland Bioengineering Institute, University of Auckland

This case study shares some of the technical details and outcomes provided through our Consultancy Service. This service supports projects across a range of domains, with an aim to lift researchers’ productivity, efficiency, and skills in research computing. Get in touch to discuss how our Research Software Engineers and specialist support could help advance your project.
