Red ribbon shaped into an awareness symbol on a light gray background.

Multi-Modal Signature of Treatment Response in Breast Cancer

Breast cancer, like all cancers and no doubt other complex traits, is likely driven by multiple bio-environmental factors that coincidentally drive its development. Consequently, focusing on a single mechanism is likely to impact the efficacy of response and run a higher risk of recurrence, particularly in those who initially responded to treatment. Thus, a greater chance of success undoubtedly lies in a systemic approach targeting multiple biological processes, either by interrupting or enhancing their molecular activities. Achieving this requires an analytical platform that combines complex laboratory processes with automated analytics capable of ingesting multimodal, high-dimensional data.

For this project, Profenso previously partnered with a laboratory at QIMR Berghofer to build a bespoke analytical pipeline from data generation and quality control through segmentation to multimodal predictions of response to immunotherapy. The analytics are based on the latest architectures of deep learning models, calibrated on the most advanced training regimes.

Colored scanning electron microscope image of viruses with long, tail-like structures shown in teal color and smaller, round virus particles in orange color.

Intelligent Spectral Flow Cytometry in Microbiome

Spectral Flow Cytometry is an advanced technology for the analysis of single cells or particles, using dispersive multi-laser filtering at various wavelengths. In biological sciences, this technique was traditionally used for the detection and sorting of human immune cells. However, recent analytical advancements have extended its utility to much smaller cells, such as bacteria.

Profenso worked with collaborators at QUT to develop semi-automated quality control pipelines for microbiome data, as well as bespoke deep learning analytics for bacterial isolates and faecal matter.

Close-up of a spherical sea urchin with long, thin spines in shades of purple and blue, set against a dark background.

Immunotherapy Outcome Prediction

Cancer is a major health burden and the cause of ~50,000 deaths per year in Australia. One in four Australians will receive a cancer diagnosis during their lifetime. Immunotherapy has emerged as a promising treatment for some patients but not all. An area of critical need is the ability to predict who will benefit from immunotherapy prior to treatment. To date, one of the genomic features used to identify therapy response is tumour mutational burden (TMB). However, its predictive accuracy (~60%-80%) is inadequate and TMB often fails to translate across different cancer types.

In this project, we utilised publicly available genomic and transcriptomic datasets (limited to melanoma data) to develop a machine learning model predicting immunotherapy response and to test generalisability of this model to lung cancer we teamed up with our partners, Metro North Hospital Health Service, QIMR Berghofer and BGI. There, we prospectively recruited lung cancer patients from 8 hospitals within Australia. We generated genomic and transcriptomic data from these patients to tested our model on that dataset.

The final machine learning (ML) model, integrated information from the genome (somatic mutations and TMB) and the transcriptome data (differential expression) to classify patients into responders and non-responders to immunotherapy. Within melanoma we achieved accuracy above 92% on a validation dataset and above 70% on a totally independent/unseen test set.

Key Findings:

Analysing different types of genetic data together, such as DNA and RNA, can provide more reliable results than when using DNA or RNA alone

We observed issues with predicting immunotherapy response, using a single value as a total aggregation of cancer mutations (otherwise known as Tumour Mutational Burden). It is argued that cancers with a high number of such mutations respond to immunotherapy better, because cancer is less able to "hide" itself from our immune system. However, we observed that in two separate populations, diagnosed with the same cancer, non-responders had a higher total number of such mutations compared to responders in another, which poses a serious problem to the TMB hypothesis. Specifically, because we show that it could be mutations in specific families of genes that matter rather than just the overall number of mutations.

We found that genes whose expression levels change in immunotherapy non-responders are different from the genes that are characterised by mutations in coding regions

Expression level changes seemed to be related to biological processes that dealt with the cell's distress signals, which directly impacted activation or inactivation of the immune system. Whereas, the changes caused by coding mutations were mostly linked to cell wall structures, cell ability to migrate and the external environment surrounding the cells.

DISCLAIMER: This work is unpublished and was supported by a CRCP grant and led by Dr Maciej Trzaskowski when he acted as Head of Research at Max Kelsen.

Colored scanning electron microscope image of human blood cells, showing pink and green cells against a black background.

Cancer of Unknown Primary

Cancer of Unknown Primary is a heterogenous group of unrelated cancers whose primary site escapes detection. By definition, CUP is a diagnosis of exclusion after performing a minimum standardised diagnostic work-up.

CUP accounts for 3-5% of all malignancies worldwide3 and prognosis remains poor. Patients with CUP have a median overall survival (OS) of 8-11 months. In Australia, CUP constitutes 1.6% of all cancers and its incidence has significantly decreased between 1982 and 2019 (from 16 to 8.1 per 100,000 persons), however, like the global statistics, the prognosis is poor. The 5-year survival of a patient with CUP remains low at 13%.

Based on our previous work with transparency and numerical safety of neural networks [1,2], we have devised and tested AI-based differential expression algorithms that have the potential to deliver fast, reliable and accurate predictors of the primary site. The aim was to help pathologist and clinical oncologist identify where in the body the tumour originated as early and as fast as possible. Knowing this could initiate early and/or better treatment.

DISCLAIMER: This work is unpublished and was led by Dr Maciej Trzaskowski when he acted as Head of Research at Max Kelsen.

Digital illustration of a brain formed by interconnected nodes and lines emerging from a maze-shaped base, representing artificial intelligence or neural networks, on a grid background.

Generalising Uncertainty in Machine Learning

Uncertainty estimation is crucial for understanding the reliability of deep learning (DL) predictions, and critical for deploying DL in the clinic. Differences between training and production datasets can lead to incorrect predictions with underestimated uncertainty. To investigate this pitfall, we benchmarked one pointwise and three approximate Bayesian DL models for predicting cancer of unknown primary, using three RNA-seq datasets with 10,968 samples across 57 cancer types. Our results highlight that simple and scalable Bayesian DL significantly improves the generalisation of uncertainty estimation. Moreover, we designed a prototypical metric—the area between development and production curve (ADP), which evaluates the accuracy loss when deploying models from development to production. Using ADP, we demonstrate that Bayesian DL improves accuracy under data distributional shifts when utilising ‘uncertainty thresholding’. In summary, Bayesian DL is a promising approach for generalising uncertainty, improving performance, transparency, and safety of DL models for deployment in the real world.

Our study highlighted approaches for quantifying and improving robustness to shift-induced overconfidence with simple and accessible DL methods in the context of oncology. We justified our approach with mathematical and empirical evidence, biological interpretation, and a new metric, the ADP designed to encapsulate shift-induced overconfidence—a crucial aspect that needs to be considered when deploying DL in real-world production. Moreover, the ADP is directly interpretable as a proxy to expected accuracy loss when deploying DL models from development to production. Although we have addressed the shift-induced overconfidence by utilising first-line solutions, work remains to bridge DL from theory to practice. We must account for data distributions, evaluation metrics, and modelling assumptions as all are equally important and necessary considerations to see safe translation of DL into clinical practice.

DISCLAIMER: This work was published in Scientific Reports and was led by Dr Maciej Trzaskowski when he acted as Head of Research at Max Kelsen.

Multi-Modal Signature of Treatment Response in Breast Cancer

Intelligent Spectral Flow Cytometry in Microbiome

Immunotherapy Outcome Prediction

Cancer of Unknown Primary

Generalising Uncertainty in Machine Learning

Location

Hours

Contact