
How AI finds submarines (for dummies)

  • Writer: Peter Spayne
  • Jan 2
  • 5 min read


Today at the First Sea Lord’s Seapower Conference, we heard that underwater gliders will soon help protect our seas. This marks a new era in military strategy, using new tools and new kinds of thinking. But how hard is this to do? To understand that, let me explain something very complex called a Large Acoustic Model, or LAM, to you as if you were a toddler.


Let's compare LAMs to LLMs. Start by thinking about ChatGPT. At first, language and underwater sound seem completely different. One is made of the words people speak. The other is made of noises deep in the sea. But both have a similar puzzle: the meaning can change depending on what is happening around them. Language models must understand how words shift with tone and place. Sonar systems must work out how ocean sounds change with temperature, salt and movement. They may live in different worlds, but they follow the same idea, and what we learn from one can help us understand the other.


Language and sonar work with very different kinds of signals. Language models handle symbolic tokens, such as words, morphemes and punctuation. These symbols are usually stable, but their meaning can change with syntax, culture or the way a sentence is put together. Sonar systems, on the other hand, listen to continuous acoustic signals shaped by wave propagation, reflection and interference. These change as the ocean’s temperature, salinity and pressure change. Even though one deals with symbols and the other with physical sound, both must solve the same puzzle: their signals can become unclear when the surrounding context shifts.


Some words can mean many different things, and the right meaning depends on the other words around them. An LLM must work out whether a “bank” is the side of a river, a place to keep money, or something else altogether.
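To make that idea concrete, here is a toy sketch in Python (nothing like how a real LLM works inside; the hint lists are made up purely for illustration) showing how the words around "bank" can point to one meaning or the other:

```python
# Toy word-sense picker: choose a meaning for "bank" from its neighbours.
# A real LLM learns this from billions of examples; here the "knowledge"
# is a hand-written hint list, purely for illustration.

SENSE_HINTS = {
    "riverside": {"river", "shore", "fishing", "muddy"},
    "financial institution": {"money", "loan", "account", "deposit"},
}

def guess_bank_sense(sentence: str) -> str:
    words = set(sentence.lower().split())
    scores = {sense: len(words & hints) for sense, hints in SENSE_HINTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

print(guess_bank_sense("We sat on the muddy bank of the river"))   # riverside
print(guess_bank_sense("She paid the loan back to the bank"))      # financial institution
```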


Many languages change the shape of their words to show gender, number, tense, mood or case. The core meaning stays the same, but the outer form can look very different.

Languages can arrange their words in different orders and use different rules for grammar, meaning and social style. An LLM has to learn all these systems at once and know which one to use at the right time.


Some languages have plenty of examples in training data, while others appear only a little. This makes the model better at some languages than others. All these differences make language tricky and changeable, but they come from human rules and habits, not from the physical world.


The ocean is never the same from one place to another. Changes in temperature, salinity and depth can all change how fast sound travels and how it behaves. Because of this, one target can look very different on sonar depending on the ocean conditions around it.
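To give a feel for the numbers, here is a small Python sketch using one widely used approximation, Medwin's simplified formula for the speed of sound in seawater (valid roughly for 0–35 °C, salinity 0–45 ppt and depths down to about 1,000 m):

```python
def sound_speed_medwin(temp_c: float, salinity_ppt: float, depth_m: float) -> float:
    """Approximate speed of sound in seawater (m/s) via Medwin's formula."""
    t, s, z = temp_c, salinity_ppt, depth_m
    return (1449.2
            + 4.6 * t - 0.055 * t**2 + 0.00029 * t**3
            + (1.34 - 0.010 * t) * (s - 35)
            + 0.016 * z)

# Same target, two different water columns:
print(round(sound_speed_medwin(20.0, 35.0, 10.0), 1))   # warm, shallow
print(round(sound_speed_medwin(4.0, 34.0, 800.0), 1))   # cold, deep
```

Because sound bends towards water where it travels more slowly, even modest changes in these numbers through the water column can steer it along very different paths, which is part of why the same target can look so different from place to place.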

Sound in the sea is often disturbed by many things. The seabed can scatter it, sea life can clutter it, and reflections can bounce it along many paths at once. Waves and surface turbulence add even more noise. These distortions do not stay the same; they shift with the seasons, the weather, the tides and the living creatures in the water.

When a vessel rolls, pitches or yaws, or when a towed system swings from side to side, the shape of the sonar view changes. Because of this movement, the same object can look stretched, shifted or weakened in the sonar image.


Unlike language, where billions of examples are easy to gather, sonar data is limited and costly to collect. There is little ground truth, and every mission takes place under different conditions. All of this makes sonar analysis harder than language analysis: the signals change more, the environment has a stronger effect, and there is far less clean data to learn from.


Despite the surface differences, the two domains share important conceptual parallels.

In language, a word’s meaning depends on the other words around it. In sonar, the meaning of an echo depends on things like temperature layers, salinity, seabed type and how the platform is moving. In both cases, the model must look past these changing surface effects and find the stable pattern underneath.


One idea in language can show up in many different word forms. In the same way, a single object underwater can create many different acoustic signatures. In both cases, what you see on the surface is not always a reliable guide to what is really there.

Language can be unclear when one word has several meanings. Sound propagation can be unclear too, when echoes bounce along many paths or when clutter gets in the way. Both systems must sort out these conflicts and decide which meaning fits best.

Rare languages behave much like rare sonar conditions: they do not appear often, are hard to learn, and can lead to more mistakes. These similarities show that, even though LAMs are shaped by the physical world, many of their modelling challenges closely match those faced by LLMs.


Although the acoustic domain places harsher constraints on a LAM than language does on an LLM, several lessons from LLM development translate well.


LLMs became much stronger when they learned to use long-range connections between distant words and wider context. LAMs gain the same advantage by using architectures that bring in environmental data, past tracks and information from many sensors to help untangle difficult returns.
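As a rough sketch of what "bringing in wider context" can mean (the feature names and numbers below are invented for illustration, not a description of any real LAM), the simplest version is just feeding the model a longer input that carries the echo and its surroundings together:

```python
import numpy as np

rng = np.random.default_rng(0)

# One sonar return: 16 toy acoustic features...
echo = rng.normal(size=16)
# ...and the context a LAM might also be fed: temperature, salinity,
# platform speed, plus a crude summary of the track so far.
environment = np.array([12.5, 34.8, 4.2])      # degC, ppt, knots (made up)
track_history = np.array([0.7])                # e.g. how contact-like the last pings were

# "Wider context" in its simplest form: one longer input vector.
x = np.concatenate([echo, environment, track_history])
print(x.shape)    # (20,) - the model sees the echo *and* its surroundings
```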


Embedding methods help LLMs learn deep ideas that stay the same even when the words look different. In LAMs, acoustic embeddings can do something similar by smoothing out environmental distortions and grouping targets that belong together, even when the sound has travelled different paths.
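Here is a minimal sketch of that grouping step, assuming an embedding already exists (the vectors below are made-up numbers): once two recordings of the same object land close together in embedding space, separating them from clutter is a matter of measuring distance, for example with cosine similarity:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of two embedding vectors; 1.0 means same direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend embeddings: two returns from the same wreck heard along
# different paths, and one patch of seabed clutter.
wreck_path_a = np.array([0.9, 0.1, 0.4])
wreck_path_b = np.array([0.8, 0.2, 0.5])
seabed_clutter = np.array([-0.2, 0.9, 0.1])

print(cosine_similarity(wreck_path_a, wreck_path_b))    # high -> same group
print(cosine_similarity(wreck_path_a, seabed_clutter))  # low  -> different
```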


LLMs grow strong by training on huge amounts of mixed and noisy text. LAMs build the same kind of toughness by using synthetic data, challenging conditions and realistic simulations, so they learn to cope with many types of noise.
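A minimal sketch of that kind of data augmentation (the pulse shape, noise level and extra echo are all invented for illustration): take one clean synthetic ping and corrupt it in slightly different ways to manufacture many messy training examples.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 10_000                                   # sample rate in Hz (made up)
t = np.arange(0, 0.1, 1 / fs)

# One clean synthetic "ping": a short, decaying 1 kHz tone burst.
clean_ping = np.sin(2 * np.pi * 1000 * t) * np.exp(-t * 40)

def augment(ping: np.ndarray) -> np.ndarray:
    """Return a noisier, messier copy of the same ping."""
    noisy = ping + rng.normal(scale=0.3, size=ping.shape)   # ambient noise
    delay = rng.integers(50, 200)                            # a second, delayed path
    echo = np.roll(ping, delay) * 0.4
    return noisy + echo

# A handful of "new" training examples from one clean signal.
training_set = [augment(clean_ping) for _ in range(5)]
print(len(training_set), training_set[0].shape)
```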


LLMs that are first trained on wide, general text become very good at special tasks once they are fine-tuned. In the same way, LAMs could be pretrained on large amounts of unlabelled acoustic data and then adapted for jobs like object classification, seabed mapping or spotting unusual patterns.
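A rough sketch of the pretrain-then-adapt pattern, using simple stand-ins (PCA in place of real acoustic pretraining, logistic regression in place of a fine-tuned head, and random numbers in place of real sonar data):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Lots of unlabelled acoustic feature vectors (made-up numbers)...
unlabelled = rng.normal(size=(5000, 64))
# ...and only a small labelled set for the task we actually care about.
labelled = rng.normal(size=(100, 64))
labels = rng.integers(0, 2, size=100)          # 0 = seabed, 1 = object

# "Pretraining": learn a general-purpose representation from unlabelled data.
encoder = PCA(n_components=10).fit(unlabelled)

# "Fine-tuning": adapt that representation to the labelled task.
classifier = LogisticRegression(max_iter=1000).fit(encoder.transform(labelled), labels)
print(classifier.predict(encoder.transform(labelled[:5])))
```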


LLMs now bring together text, images, audio and code. LAMs follow this path by combining acoustic signals with inertial, optical and environmental data to build a fuller picture of what is happening. In both fields, these ideas from LLMs can help reduce the problems caused by sparse, changeable and noisy data.
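One simple way to picture that combination is "late fusion", sketched below with made-up sensor names, scores and weights: each source gives its own confidence that something is there, and the fused picture weighs them together.

```python
# Toy late fusion: each sensor reports its own confidence (0..1) that a
# contact is present; the fused score weights them by how much we trust each.
sensor_scores = {"acoustic": 0.85, "optical": 0.20, "inertial_track": 0.60}
sensor_weights = {"acoustic": 0.6, "optical": 0.1, "inertial_track": 0.3}

fused = sum(sensor_scores[s] * sensor_weights[s] for s in sensor_scores)
print(round(fused, 2))   # one combined confidence instead of three separate ones
```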


In this new phase, the focus must move from traditional sonar theory and interpretation to the study of LAM output itself. Instead of analysing raw sound, experts must learn how to read and question what a Large Acoustic Model produces. This shift matters because LAMs need to be trained and refined so they can cut down on false alarms, helping crews trust their results. It is the first real turning point in how AI is used at sea. We are not seeing uncrewed systems that no longer need people; instead, we are seeing systems that need people in a new way. Sonar operators must now learn an extra skill: how to interrogate AI-filtered output and understand the patterns and choices the model has made before the data reaches them.



