From Retina Scans to Tumour Mutations: How Multimodal AI Is Quietly Pushing the Next Generation of Medical Diagnostics

Multimodal medical AI moves beyond chatbots

The most visible face of artificial intelligence has been chat-based systems, but a quieter revolution is unfolding in hospitals and laboratories. Multimodal medical AI now merges imaging, genomic and clinical data into single models that can spot patterns that humans and single‑modal systems often miss. Instead of analysing pathology slides, sequencing data or electronic health records in isolation, these models learn how features in one data stream relate to those in another, which can improve diagnostic accuracy and robustness. This shift is reshaping tasks such as retinal disease prediction, breast cancer mutation profiling and drug discovery, where clinical decisions depend on the interplay of structure, function and molecular biology. Techniques first honed in consumer multimodal large language models, such as shared embeddings, attention across modalities and large‑scale pretraining, are being repurposed for medicine, but with tighter constraints around safety, transparency and regulation. The result is an emerging class of specialised AI tools built not to chat, but to guide real-world care.

OCTCube and the rise of the OCT foundation model for retinal disease

Retinal diseases are a leading cause of vision loss, and optical coherence tomography (OCT) has become central to diagnosis by providing high‑resolution, cross‑sectional views of the retina’s layered structure. Yet traditional algorithms often analyse this rich 3D information one 2D slice at a time, losing crucial context between slices. OCTCube‑M, a three‑dimensional OCT foundation model, tackles this by treating the full OCT volume as a coherent object, capturing subtle spatial relationships between layers that can signal early disease. It also embraces multimodal imaging, integrating OCT with complementary signals such as fundus or infrared views to build a more complete picture of retinal health. As a multimodal medical AI system, OCTCube‑M is designed to generalise better across scanners, clinics and disease types, making retinal disease prediction more robust. For patients, this promises earlier detection of conditions like macular degeneration or diabetic retinopathy and more precise monitoring of treatment response, without additional invasive testing.
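
To make the difference between slice-by-slice and volumetric analysis concrete, here is a minimal sketch of one common way a model can ingest a whole volume: cutting it into small 3D cubes so that every input token spans several neighbouring slices. This is an illustration only, not a description of how OCTCube‑M is implemented; the function name `patchify_3d`, the toy volume and the patch size are all assumptions made for the example.

```python
def patchify_3d(volume, patch):
    """Split a volume (nested lists: depth x height x width) into cubes.

    Hypothetical helper for illustration. Assumes every dimension is
    divisible by `patch`; each returned token is a flat list of voxels.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    if depth % patch or height % patch or width % patch:
        raise ValueError("each dimension must be divisible by the patch size")
    tokens = []
    for d0 in range(0, depth, patch):
        for h0 in range(0, height, patch):
            for w0 in range(0, width, patch):
                # Gather every voxel inside one patch x patch x patch cube.
                cube = [
                    volume[d][h][w]
                    for d in range(d0, d0 + patch)
                    for h in range(h0, h0 + patch)
                    for w in range(w0, w0 + patch)
                ]
                tokens.append(cube)
    return tokens


# Toy "scan": 4 slices of 4x4 pixels, where each voxel stores its slice index.
toy_volume = [[[d for _ in range(4)] for _ in range(4)] for d in range(4)]
tokens = patchify_3d(toy_volume, patch=2)

# Which slices contribute to the first token? A 2D approach would see only one.
slices_in_first_token = sorted(set(tokens[0]))
```

In this toy case the first token mixes voxels from two adjacent slices, which is the kind of cross-slice context a purely 2D pipeline never sees within a single input.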

Predicting breast cancer PIK3CA mutations with multimodal pathology AI

In breast cancer, mutations in the PI3K/AKT/mTOR signalling pathway, and particularly in the PIK3CA gene, are pivotal for selecting patients for PI3K inhibitor therapies. Conventional molecular assays such as PCR or next‑generation sequencing can be accurate but require specialised infrastructure that is not always available in routine practice. A new Multimodal PIK3CA Model (MPM) addresses this gap by combining deep learning analysis of whole‑slide pathology images with structured clinical data, including age, molecular subtype and lymph node status. Its histopathology component uses a transformer‑based pretrained encoder and an attention‑driven multiple instance learning classifier to detect subtle morphological features linked to PIK3CA mutations, while the clinical model adds context that single‑modal systems lack. Evaluated on The Cancer Genome Atlas and several external cohorts, this multimodal medical AI approach shows promise as a more accessible alternative to molecular testing, potentially expanding access to targeted therapies and refining mutation‑driven treatment decisions in breast cancer.
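
The idea behind attention‑driven multiple instance learning can be sketched in a few lines: a slide is cut into many tiles, each tile is turned into an embedding by an encoder, and a small scoring function decides how much each tile should count towards the slide‑level prediction. The sketch below shows one common form of attention pooling followed by a simple fusion with clinical variables. It is not the MPM implementation: the fusion by concatenation and logistic regression, the function names, and all weights and feature values are assumptions chosen for illustration.

```python
import math
import random


def attention_mil_pool(tiles, V, w):
    """Pool tile embeddings into one slide embedding with attention weights.

    tiles: list of N tile embeddings, each a list of d floats
    V:     h rows of d floats (hypothetical learned projection)
    w:     h floats (hypothetical learned scoring vector)
    """
    scores = []
    for tile in tiles:
        # Score each tile as w . tanh(V @ tile).
        hidden = [math.tanh(sum(v * x for v, x in zip(row, tile))) for row in V]
        scores.append(sum(wj * hj for wj, hj in zip(w, hidden)))
    # Softmax over tiles (subtracting the max for numerical stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    # The slide embedding is the attention-weighted sum of tile embeddings.
    dim = len(tiles[0])
    slide = [sum(a * t[k] for a, t in zip(attn, tiles)) for k in range(dim)]
    return slide, attn


def predict_mutation_probability(slide, clinical, weights, bias):
    """Assumed fusion: concatenate both feature sets, then apply a logistic unit."""
    features = slide + clinical
    logit = sum(wt * f for wt, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy inputs: 5 tiles with 4-dimensional embeddings and random "learned" weights.
random.seed(0)
tiles = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
V = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
w = [random.gauss(0, 1) for _ in range(3)]
slide, attn = attention_mil_pool(tiles, V, w)

# Hypothetical clinical vector: scaled age, a subtype flag, lymph node status.
clinical = [0.6, 1.0, 0.0]
prob = predict_mutation_probability(slide, clinical, [0.1] * 7, bias=0.0)
```

The attention weights sum to one across tiles, so they can also be read as a rough map of which regions of the slide drove the prediction, which is one reason this family of methods is popular in pathology.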

Biological foundation models connect drug discovery and patient care

Beyond diagnostics, biological foundation models (BioFMs) are reshaping how therapies are discovered, optimised and brought to patients. These models are pretrained on massive datasets spanning protein structures, molecular libraries, omics profiles, medical imaging and clinical documents. Unimodal BioFMs have already transformed protein structure prediction, while newer multimodal BioFMs can move between sequence, structure, text and images within a single model. Systems such as Latent‑X1 and Latent‑X2 generate novel binders and predict their interactions, and frameworks like Evo 2 aim to map relationships across the central dogma of biology, from DNA to RNA to protein. Cloud platforms provide the scalable compute, data integration and partner tools needed to train and deploy these models across the drug development lifecycle, from target discovery to trial optimisation and personalised treatment planning. This infrastructure makes it feasible for healthcare organisations to operationalise multimodal medical AI without rebuilding everything from scratch in‑house.

Benefits, risks and the need for human‑centred multimodal AI

The payoff from multimodal medical AI could be significant: detection of eye disease before symptoms appear, breast cancer mutation profiling without waiting for sequencing, and biologically informed drug selection tailored to each patient’s molecular and imaging profile. Combining OCT foundation models, models that pair pathology with clinical data, and biological foundation models opens the door to more personalised, less invasive care. Yet these advances bring familiar risks. Models are only as reliable as the data they are trained on, and biased or low‑quality datasets can embed inequities into diagnostics. Regulatory pathways for complex, continuously learning systems remain challenging to navigate, and clinicians must be able to understand and contest AI‑generated recommendations. The most promising path forward treats multimodal AI as an assistive layer: tools that synthesise vast imaging, genomic and clinical streams, while humans retain responsibility for interpretation, consent and care. As techniques migrate from general‑purpose multimodal LLMs into medicine, keeping patients at the centre will determine whether this quiet revolution fulfils its promise.
