Silence of the LAMs
- Peter Spayne

- Jan 2

Introduction
Large Acoustic Models (LAMs) are expansive machine-learning systems trained on diverse undersea acoustic data, acting as decision engines that interpret sonar and other acoustic inputs for uncrewed underwater vehicles.
In the harsh, low-visibility, GPS-denied environments where traditional crewed sonar platforms once struggled, LAMs can analyse sonar returns in real time, often matching or surpassing human operators in detecting and classifying contacts such as mines, wrecks and submarines, and in characterising the seabed. Coupled with modern sonar suites and robust UUV platforms, they promise step changes in mine detection, collision avoidance, and autonomous navigation. In this article, I will explore how LAMs sit between sonar sensors and propulsion effectors (and, where relevant, UAV partners), and then examine the key trade-offs for UUVs: sensor suites and in situ sound-speed measurement, operations at different depths, communication constraints, and the resulting implications for vehicle design and onboard compute.
Large Acoustic Models as Underwater Decision Engines
Large Acoustic Models (LAMs) are to underwater acoustics what Charles Babbage’s analytical engine was to early computation: general-purpose decision mechanisms that transform raw inputs into structured judgement. In the subsea domain, a LAM ingests active and passive sonar returns together with environmental data (temperature & salinity profiles, sound-speed gradients, local bathymetry, and seabed composition) to perform high-level interpretation such as contact detection, target classification, and spatial reasoning about the water column.
Oceanographic factors dramatically shape the acoustic scene: thermoclines refract sound rays, haloclines create shadow zones, and deep sound channels can guide low-frequency energy across long distances. Modern research groups (e.g., MIT/WHOI, Scripps, NATO CMRE) have shown that deep-learning architectures which fuse sonar imagery with environmental metadata can significantly outperform traditional sonar pipelines, especially in variable or stratified waters where classical models struggle to adapt. The result is a sonar brain that understands what the sensors are seeing, and why the returns look the way they do given the local water mass structure.
LAMs enable real-time, at-the-edge decision-making, incorporating both sonar data and evolving environmental measurements so a UUV can interpret the acoustic scene as conditions shift. A vehicle equipped with CTD or SVP instruments can update its sound-velocity profile continuously; the LAM can then account for shifting thermoclines, internal waves, or seasonal stratification when it classifies or localises a target. This is vital because identical objects can appear very different when the acoustic path bends or sound is trapped within a particular thermal layer. When a side-scan or SAS return suggests a mine-like contact, the LAM can flag it with high confidence while adjusting its confidence based on local propagation loss or bottom-type uncertainty.
Distributed-acoustic trials (such as DARPA-supported glider experiments) have demonstrated that swarms of UUVs using onboard machine learning and environmental context can dramatically increase coverage and detection probability, as each node dynamically adapts to its local sound-speed structure. In such systems, a single operator supervises the picture, not the vehicles, while each UUV conducts its own sensor-to-decider loop using local oceanographic cues.
Building a robust LAM for these conditions is not a trivial undertaking. Sonar interpretation must be trained on massive, environmentally diverse datasets spanning warm shallow shelves, cold high-latitude basins, deep sound channels, and turbid estuaries, each with unique propagation signatures. Seasonal thermoclines, upwelling regions, river plumes, oxygen minimum zones, and variable seabed geology all alter the acoustic field, so training data must reflect these complexities. Modern academic systems now integrate environmental side channels (temperature, salinity, depth, surface conditions, and benthic type) alongside raw acoustic features, enabling the model to generalise to new regions by reasoning about the physics of sound propagation.
This continuous-learning paradigm is why sonar operators (more specifically, people who can read acoustics and envision the environment behind them) are some of the most intelligent (and odd) people in the world. The AI version mirrors the intuition of a human sonar operator, except the LAM can assimilate new environmental and acoustic data from every mission to refine its detection logic and expand the set of conditions under which it performs reliably. In essence, a LAM is not just a classifier of sonar returns, but an adaptive acoustic–oceanography engine that evolves with each deployment.
The Geeky Bit
For a LAM to make defensible, physics-aware decisions, a UUV must supply it with high-fidelity, time-synchronised acoustic and environmental data.
This requires a tightly integrated multimodal sensor stack. Side-Scan Sonar (SSS) provides wide-area ensonification using fan-shaped beams, generating shadow–highlight geometries that encode object shape, grazing angle, and bottom roughness; LAMs exploit these coherent spatial patterns to reject clutter and flag mine-like primitives.
Synthetic Aperture Sonar (SAS) extends this further by phase-coherently combining pings over the vehicle track to synthesise a long aperture, yielding centimetric-resolution imagery whose spatial bandwidth rivals optical systems. Because SAS encodes fine-scale scattering physics (micro-shadowing, edge diffraction, texture anisotropy), it feeds LAMs with feature-rich data enabling single-pass detect–classify workflows. The main constraints are platform stability and data throughput, which make on-vehicle inference essential.
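To make the shadow–highlight geometry concrete, here is a deliberately simplistic sketch of highlight-followed-by-shadow pairing on a single side-scan intensity row; the thresholds, window size and synthetic data are all invented for illustration, and real detectors operate on full 2-D tiles with learned features, but the underlying cue is the same.

```python
import numpy as np

# Toy sketch of highlight-shadow pairing on one side-scan intensity row.
# A proud object on the seabed tends to produce a bright return followed
# (down-range) by an acoustic shadow; all thresholds here are invented.

def mine_like_candidates(row: np.ndarray, hi=2.0, lo=0.3, gap=20):
    """Flag indices where a strong highlight is followed by a shadow."""
    mean = row.mean()
    hits = []
    for i in range(len(row) - gap):
        if row[i] > hi * mean and row[i + gap] < lo * mean:
            hits.append(i)
    return hits

rng = np.random.default_rng(0)
row = rng.rayleigh(1.0, 500)          # speckle-like seabed background
row[200] = 8.0                        # synthetic highlight
row[220:240] = 0.05                   # synthetic shadow behind it
print(mine_like_candidates(row))      # -> indices near 200
```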
Multibeam Echo Sounders (MBES) contribute high-rate bathymetric point clouds through wide-swath ensonification and beamforming, providing the geo-spatial scaffold against which SAS/SSS detections are registered. Interferometric and dual-frequency MBES allow simultaneous depth, slope, and backscatter estimation, enabling LAMs to filter false positives by correlating object returns with known geomorphology. Forward-Looking Sonar (FLS) supplies short-range, high-update obstacle scans using sector-based acoustic imaging; its near-field geometry allows the LAM to compute time-to-collision, perform obstacle shape inference, and trigger evasive path replanning.
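As a minimal illustration of the time-to-collision logic an FLS feed might drive, consider the sketch below; the function names, thresholds and scan values are invented for the example.

```python
# Minimal sketch of the time-to-collision check an FLS feed might drive.
# Values and thresholds are illustrative, not from any real vehicle.

def time_to_collision(range_m: float, closing_speed_mps: float) -> float:
    """Seconds until impact; inf if the contact is opening."""
    if closing_speed_mps <= 0:
        return float("inf")
    return range_m / closing_speed_mps

def should_replan(range_m, prev_range_m, dt_s, ttc_limit_s=30.0):
    closing = (prev_range_m - range_m) / dt_s   # positive when closing
    return time_to_collision(range_m, closing) < ttc_limit_s

# Two FLS scans 1 s apart: contact at 95 m, then 90 m -> closing at 5 m/s
print(should_replan(90.0, 95.0, 1.0))  # True: TTC = 18 s < 30 s limit
```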
To characterise the propagation environment, UUVs carry SVP/CTD sensors for in situ measurement of temperature, salinity, and pressure (depth), allowing reconstruction of the sound-speed profile (SSP). Since refraction at thermoclines bends acoustic rays and shifts target range/angle solutions, LAMs require this SSP to correct for multipath, shadow zones, and variable spreading loss. Optical backscatter meters and other turbidity proxies further help the LAM predict volume reverberation levels and coherence degradation.
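As a concrete example of SSP reconstruction, the sketch below converts a hypothetical CTD cast into sound speeds using Mackenzie's (1981) nine-term empirical equation; the cast values and profile format are invented, but the equation itself is standard.

```python
# Sketch: build a sound-speed profile (SSP) from CTD samples using
# Mackenzie's (1981) empirical equation. Valid roughly for 2-30 degC,
# 25-40 PSU salinity, 0-8000 m depth.

def mackenzie_sound_speed(T: float, S: float, D: float) -> float:
    """Sound speed in m/s from temperature (degC), salinity (PSU), depth (m)."""
    return (1448.96
            + 4.591 * T
            - 5.304e-2 * T**2
            + 2.374e-4 * T**3
            + 1.340 * (S - 35.0)
            + 1.630e-2 * D
            + 1.675e-7 * D**2
            - 1.025e-2 * T * (S - 35.0)
            - 7.139e-13 * T * D**3)

# Hypothetical CTD cast: (depth_m, temp_degC, salinity_PSU)
ctd_cast = [(0, 18.2, 35.1), (25, 17.9, 35.1), (50, 12.4, 35.3), (100, 9.8, 35.4)]

for d, t, s in ctd_cast:
    # A thermocline shows up as a sharp drop in c with depth
    print(f"{d:>4} m : {mackenzie_sound_speed(t, s, d):7.1f} m/s")
```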
Passive hydrophones provide broadband ambient sound fields, enabling the model to identify anthropogenic tonals, transient machinery signatures, and biological clutter; given sufficient spatial aperture (hull-mounted or compact arrays), LAMs can perform localisation or at least directional classification.
Navigation aids (INS, DVL, magnetometers and MEMS sensors) allow precise georeferencing of acoustic contacts. SAS micro-navigation techniques use phase correlation to correct inertial drift, feeding navigation corrections directly into the model’s world-state estimator. When fused, these streams provide the LAM with a dense, multi-spectral view of the environment: SSS/SAS for texture, MBES for geometry, FLS for collision envelopes, SSP for acoustic physics, and navigation for coherent mapping.
This fusion allows the LAM to infer whether an object is a mine, a rock outcrop, or suspended debris, outperforming rule-based systems by combining raw scattering physics with statistical priors learned from vast sonar–oceanography datasets.
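To give a feel for what "fused" means in practice, here is a hypothetical shape for one fused observation record handed to the LAM; every field name below is invented for illustration, and real stacks will carve this up differently.

```python
from dataclasses import dataclass

# Hypothetical shape of one fused observation handed to the LAM.
# Field names are invented for illustration; real stacks will differ.

@dataclass
class FusedContact:
    sas_tile: bytes            # raw SAS chip around the detection
    shadow_len_m: float        # highlight-shadow geometry from SSS/SAS
    depth_m: float             # MBES bathymetry at the contact
    seabed_slope_deg: float    # MBES-derived geomorphology
    fls_range_m: float         # forward-looking sonar range, if in view
    sound_speed_mps: float     # local SSP value at contact depth
    nav_fix_enu: tuple         # georeference from INS/DVL fusion
    ambient_db: float          # passive broadband level near the contact
```

Note the asymmetry: one heavy raw tile, surrounded by cheap environmental scalars that let the model reason about why the tile looks the way it does.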
But all of this sensing and compute is big, heavy, and takes a lot of power. This is a problem…
LAM Infra – Sheep Require Shepherds
Operating at different depths creates a set of practical compromises for anyone designing or deploying UUVs. In shallow water, vehicles are close to the surface and can regularly get GPS, radio or satellite signals. This makes it possible to offload some of the heavy processing to a ship, shoreside server, or an unmanned surface vessel. Engineers can therefore choose smaller, lighter and cheaper vehicles, because they don’t need to carry large computers onboard. The downside is that communications are not guaranteed; surf, harbour clutter, seabed features or jamming can interrupt links, so operators must accept the risk of intermittent supervision. From a logistics viewpoint, shallow-water vehicles are easy to transport and launch, but they can’t always be trusted to run a full LAM onboard if power and space are limited.
At the opposite extreme, deep-sea operations remove the luxury of constant communication. Radio and GPS do not penetrate, and acoustic messages are slow, unreliable and often too low-bandwidth to send images. That means a deep UUV must make most decisions on its own, including recognising hazards and choosing its navigation path. The engineering trade-off is that more autonomy demands more battery, more compute, and a stronger pressure hull. But building a ‘gold-plated’ deep vehicle with every possible capability leads to practical problems: the UUV becomes too heavy to move, too large to fit on smaller ships, and too expensive to risk losing. Operators therefore have to balance capability against the realities of transport, launch and recovery.
Vehicle size and power become central constraints. Deep-rated pressure hulls need thicker walls and stronger materials, which quickly add mass. Larger vehicles can carry big batteries and powerful processors to run complex LAMs, but they require cranes, trained crews and suitable weather to deploy. Smaller vehicles, the kind that two people can carry, are attractive for mine hunting and coastal survey, but they simply do not have the power budget or space for server-class computers. Owners and operators must decide whether to buy a small fleet of simple vehicles or a few highly capable but logistically demanding ones.
Another consideration is whether to rely on centralised or distributed AI. In friendly, shallow environments, it may make sense for multiple small UUVs to pass data to a surface vessel or UAV acting as the thinking hub. This allows small vehicles to behave smarter than their onboard hardware would normally permit. But for anything beyond sheltered waters, this approach becomes fragile: if the hub goes down or the link is jammed, the whole system collapses. Engineers often end up settling on a hybrid design, where the UUV handles first-pass detection with a modest LAM onboard, then occasionally surfaces to upload summaries for human review.
Endurance and battery life shape every decision. Running a LAM continuously burns power, and propulsion burns even more. Long-endurance gliders solve this by moving slowly and using very little energy, but that limits the kinds of missions they can undertake. Fast AUVs performing active-sonar mine hunting may only run for a day, but that short endurance frees up energy budget, making them viable platforms for heavier onboard compute.
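A back-of-envelope budget shows why. Every number in the sketch below is illustrative, but the arithmetic is exactly the trade the designer faces.

```python
# Back-of-envelope endurance budget; all numbers are illustrative.

battery_wh     = 6000.0   # hypothetical mid-size AUV pack
hotel_w        = 60.0     # nav, comms, housekeeping
propulsion_w   = 250.0    # cruise at survey speed
sonar_w        = 120.0    # active SAS/SSS, duty-cycled average
lam_compute_w  = 80.0     # embedded accelerator running the LAM

total_w = hotel_w + propulsion_w + sonar_w + lam_compute_w
print(f"Endurance with LAM: {battery_wh / total_w:.1f} h")            # ~11.8 h

# Same vehicle without onboard inference: the LAM's 80 W costs ~2.2 h
print(f"Endurance without LAM: {battery_wh / (total_w - lam_compute_w):.1f} h")
```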
Environmental conditions matter too: cold deep water reduces battery performance, while warm shallow water can overheat electronics inside sealed pressure housings. Operators therefore have to plan missions around charging cycles, weather windows, deck space and maintenance routines.
In practice, there is no perfect UUV. Shallow-water units can be small, cheap and communication-friendly, but limited in onboard intelligence. Deep-water units can be powerful, autonomous and sensor-rich, but expensive, heavy and difficult to deploy. Choosing the right balance is less about ideal engineering and more about what the mission, crew, vessel and budget can realistically support.
On top of the engineering trade-offs, the human and logistical infrastructure around a UUV programme is often dramatically underestimated. Many procurement documents assume that control will take place from a laptop on a folding table, or from an ISO container dropped onto a dock somewhere. In peacetime survey work, that might hold true. But mine warfare is not a survey contract, and assuming you can simply plug in a laptop and run multimillion-pound autonomous systems ignores reality.
You need secure comms, environmental control for mission systems, reliable power, and trained technicians who can diagnose faults quickly. A containerised control cell can work, but it still needs to be transported, protected, powered, and staffed, and it takes up deck space that many vessels do not have. Worse still, the assumption that such a setup is plug-and-play collapses the moment the operating area is constrained, the timeline is short, or the vessel of opportunity is not the vessel you expected.
When you shift from benign waters to contested operations, the problem becomes even sharper. Launch and recovery, already the most accident-prone part of UUV operations, becomes a tactical risk.
The bigger and heavier the vehicle, the more equipment, people, and time you need on deck. That means cranes, davits, backup winches, safety boats, and a ship’s-company-sized working party to handle them under pressure. This all has to happen while someone may be trying to detect, disrupt, or kill you.
Operators may need to abandon their control site at short notice, relocate, or hand over to another team mid-mission. A large UUV that requires a convoy of HGVs, fully kitted ISO containers, a gantry crane, and an engineering team is suddenly far less useful if you must launch it from a small, fast, minimally crewed platform operating within the threat envelope.
In the worst case, the launch, recovery, and control of the vehicle becomes a special forces task, not an engineering one, requiring small-team insertion, rapid deployment, minimal signatures, and the ability to fight your way out if something goes wrong. This drift from technical operation into tactical operation is something many organisations are not culturally, procedurally, or contractually prepared for, but it is central to real-world viability.
Challenging LAMs Underwater
Even the most capable LAM will be challenged by the hard physics of the ocean: sound-speed gradients, internal waves, scattering layers and bottom-type shifts all distort acoustic returns in ways no static model can fully anticipate. A LAM tuned on one propagation profile may see its feature space warp completely in another, producing confidently wrong classifications. Engineers can mitigate this with adaptive models and environmental side-channel inputs, but they cannot eliminate context drift. When the classifier’s decision boundary shifts unexpectedly, it is the human supervisor who must catch the anomaly, re-label outputs, or downgrade the model’s authority in real time.
The severity of this depends on what you’re looking for. A LAM that generates a dozen “mine-like” detections forces a queue of verification tasks (often manual): cross-checks using multibeam, FLS snapshots or raw SAS tiles. If the model is under-sensitive, you risk a mine slipping through; if over-sensitive, you overwhelm limited operator bandwidth and burn mission time looking at rocks. In practice, this forces a human arbitration layer: an operator who reviews low-confidence detections, resolves disagreements between sensors, and overrides LAM decisions when the acoustic context looks inconsistent with the physics. Autonomy reduces grunt work but increases the cognitive load of supervising a system that occasionally hallucinates patterns in noise.
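A minimal sketch of such an arbitration layer might look like the following; the confidence thresholds and detection fields are invented, and real values would come from mission planning and sensor-performance modelling.

```python
# Sketch of a human-arbitration layer over LAM detections.
# Thresholds are invented; real values come from mission planning.

AUTO_ACCEPT = 0.95   # act without operator review
REVIEW      = 0.60   # queue for operator with supporting tiles
# below REVIEW: log only, unless sensors disagree

def triage(detection):
    conf = detection["confidence"]
    if conf >= AUTO_ACCEPT and not detection["sensor_disagreement"]:
        return "accept"          # mark contact, task re-acquisition pass
    if conf >= REVIEW or detection["sensor_disagreement"]:
        return "operator_queue"  # human resolves; burns operator bandwidth
    return "log_only"

dets = [
    {"confidence": 0.97, "sensor_disagreement": False},
    {"confidence": 0.72, "sensor_disagreement": False},
    {"confidence": 0.40, "sensor_disagreement": True},   # MBES says rock
]
print([triage(d) for d in dets])  # ['accept', 'operator_queue', 'operator_queue']
```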
Running a LAM onboard is also constrained by compute, heat dissipation and energy budgets. Pruning, quantisation and low-power accelerators help, but they can create edge-case fragilities: rare reverberation patterns, exotic multipath, or transient tonals that fall outside the compressed model’s representational capacity. Since no engineer can live-debug underwater, UUVs still require human-designed fail-safes: watchdog timers, deterministic collision-avoidance behaviours, and dead-man routines that take control when the model’s outputs stall, saturate or contradict the vehicle’s inertial state.
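A minimal watchdog sketch, assuming a simple staleness test on the model's output stream (the timeout value and command names are invented):

```python
import time

# Minimal watchdog sketch: if the LAM's output stalls or goes stale,
# a deterministic behaviour takes over. Timeout value is illustrative.

class ModelWatchdog:
    def __init__(self, timeout_s: float = 2.0):
        self.timeout_s = timeout_s
        self.last_output_t = time.monotonic()

    def feed(self):
        """Call each time the LAM emits a fresh, sane output."""
        self.last_output_t = time.monotonic()

    def model_healthy(self) -> bool:
        return (time.monotonic() - self.last_output_t) < self.timeout_s

def control_step(watchdog, lam_command, fallback_command):
    # Deterministic collision avoidance wins whenever the model is stale.
    return lam_command if watchdog.model_healthy() else fallback_command

wd = ModelWatchdog(timeout_s=2.0)
wd.feed()
print(control_step(wd, "follow_lam_track", "loiter_and_ascend"))  # follow_lam_track
```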
Communications constraints add a final human dependency. A UUV cannot stream full-resolution sonar or upload diagnostic logs; it can only send terse acoustic summaries.
That means model updates, error analysis and retraining all depend on post-mission analysis. If the ocean throws up an unfamiliar target, the LAM will operate with stale priors until a human can pull the data, retrain the model and redeploy it. In contested environments, this lag is the gap where human judgement must backstop machine autonomy, ensuring that when the LAM’s confidence exceeds its competence, someone is still checking the work.
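To illustrate just how terse those acoustic summaries have to be, here is a sketch that packs one contact report into 14 bytes. The field layout is entirely invented, but the scale of the constraint is real: at a couple of kilobits per second, a raw SAS tile is simply not an option.

```python
import struct

# Sketch: pack one contact report for a low-rate acoustic modem.
# At ~2 kbit/s, every byte matters; this invented layout uses 14 bytes.

def pack_contact(contact_id, lat_e5, lon_e5, depth_dm, class_code, conf_pct):
    # >: big-endian; H: u16 id; i,i: lat/lon scaled by 1e5;
    # H: depth in decimetres; B: class enum; B: confidence 0-100
    return struct.pack(">HiiHBB", contact_id, lat_e5, lon_e5,
                       depth_dm, class_code, conf_pct)

msg = pack_contact(42, int(50.12345e5), int(-1.98765e5), 385, 3, 87)
print(len(msg), "bytes")   # 14 bytes, versus megabytes for a raw SAS tile
```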
Multi-Robot Collaboration (Not a swarm – that’s not a thing)
Cross-domain teaming between UUVs and UAVs sounds elegant on a slide, but in practice it behaves like a fragile, salt-water-soaked variant of a distributed fire-control network: one where timing errors, Doppler drift, and clock skew can collapse the entire chain. When a UUV passes data to a UAV, that UAV effectively becomes a micro-node in a digital targeting web, and now the system must maintain cross-medium synchronisation between two clocks operating in physical media that require different technical solutions.
The UUV’s acoustic modem works on narrowband, Doppler-sensitive carriers whose timing is continuously warped by water temperature, depth, multipath, and the vehicle’s own motion, while its airborne uplink modem, used only when surfaced, must operate in a completely different regime, switching to high-frequency RF with its own oscillator drift and antenna alignment quirks. Meanwhile, the UAV’s RF link is disciplined by GPS with microsecond precision. Bridging all three timing domains (underwater acoustic, airborne RF, and the UUV’s onboard clock) requires time-base translation, buffering, Doppler correction, and drift compensation simply to ensure a timestamped sonar contact isn’t treated as being tens of metres off-position because the clocks disagreed for a heartbeat during the handover.
Both systems depend on coherent timing across nodes, but here one node is whispering at 2 kbit/s through variable-velocity seawater and the other is burning battery fighting wind shear. Surfacing windows must be choreographed to sub-second tolerances to ensure the acoustic packet aligns with the UAV’s RF polling cycle; miss that slot and the system either drops the data or introduces latency that renders the targeting information tactically useless.
Add Doppler from waves, body-flex-induced jitter in the UAV’s IMU, and the UUV’s own ascent/descent noise floor, and you end up with a timing stack that must be constantly corrected, resynchronised and sanity-checked.
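A toy example of why this matters: the sketch below maps a UUV-clock timestamp onto a GPS-disciplined timeline using an estimated offset and oscillator skew (both numbers invented), then converts the residual timing error into range error at a nominal sound speed.

```python
# Sketch of translating a UUV-clock timestamp into the UAV's GPS-disciplined
# timeline. The drift model is deliberately simple; real systems estimate
# skew and offset continuously (e.g. via two-way ranging exchanges).

def to_gps_time(uuv_t: float, offset_s: float, skew_ppm: float) -> float:
    """Map a UUV onboard timestamp to GPS time given estimated offset/skew."""
    return uuv_t * (1.0 + skew_ppm * 1e-6) + offset_s

# Hypothetical numbers: 0.8 s offset, 5 ppm oscillator skew
contact_t_uuv = 3600.0                      # 1 h into the mission, UUV clock
contact_t_gps = to_gps_time(contact_t_uuv, 0.8, 5.0)
err_ms = (contact_t_gps - contact_t_uuv - 0.8) * 1e3
print(f"skew alone contributes {err_ms:.0f} ms after 1 h")   # ~18 ms

# At a nominal 1500 m/s sound speed, 18 ms uncorrected is ~27 m of range error
print(f"range error: {1500 * err_ms / 1e3:.0f} m")
```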
Using UAVs for multi-static sensing (dipping sonars, opportunistic sonobuoys, high-frequency rescan passes, and the like) is technically powerful but operationally brittle. Multi-static geometry gives the LAM far richer data, but coordination is a nightmare: two platforms moving in different media, under different physics, trying to maintain coherent timing and mutual localisation. It is an engineering dream and a logistical headache. Even when a drone can, in theory, drop a buoy or relay a data stream, endurance, deck space, sea state, and operator workload often ruin the plan before the first packet moves.
Recovery and redeployment by UAVs or other light assets sounds futuristic but collapses under real constraints: precision sea-surface rendezvous, motion compensation, battery swaps in salt spray, and the danger of losing both vehicles in one mishap. The LAM may decide when a UUV should come up for extraction, but the physical act of retrieval is still a human, weather-sensitive, risk-laden operation. No AI can fix the fact that a 1-metre swell can flip a drone, drown a connector, or slam a UUV into a hull.
A realistic and increasingly achievable combined CONOPS is emerging in which UUVs, UAVs and USVs form a distributed maritime intelligence network, each running a lightweight slice of the overall LAM and exchanging concise, physics-aware summaries rather than bulky datasets.
UUVs provide deep-environment sensing, UAVs handle rapid relay and overhead coverage, and USVs act as the unglamorous but essential technology hubs: self-propelled server racks carrying precision timing systems, disciplined oscillators, high-stability antennas and ruggedised compute clusters. These surface nodes stabilise the timing and data backbone, while the LAM functions as a federated decision layer that fuses inputs across noisy acoustics, intermittent RF and mismatched clock domains. As autonomy improves, the platforms coordinate synchronisation windows, sensor tasking and failover routing, allowing the network to reassign roles dynamically and maintain mission continuity even when individual nodes are degraded. The result is a cross-domain system far more resilient and capable than any unmanned asset operating alone.
But even in this future, human resource remains the most strategic dependency, and this is where many single-domain organisations will have to fundamentally rethink how they operate. Launch and recovery still demand trained deck crews, safe weather windows and vessels capable of handling the physical mass and awkward dynamics of UUVs; no amount of autonomy changes the fact that someone has to wrestle the thing into the water. Transportation dictates what can actually be deployed: a high-end UUV is worthless if the only ship available can’t lift it, and no air force, navy or army can cling to its own domain boundaries when the system requires cross-domain logistics, permissions and risk ownership.
Ultimately, this is a command-and-control problem, and the C2 architecture is more important than the vehicles it supervises. Control cells, whether housed in a shore facility, a ship’s ops room, a mobile shelter, or the back of a van in an underground car park, must be staffed, protected, mobile, and survivable under threat. They are the nerve centres for deployment, extraction, infiltration, exfiltration, security and secrecy.
The people haven’t gone away; they’ve simply migrated into a different domain. This raises uncomfortable but necessary questions: if the operational centre of gravity shifts onto land, is the Navy still the right organisation to run it? Should ownership move to a joint unit, or a special-operations detachment better equipped to manage dispersed, relocatable C2 under fire? Even the most sophisticated distributed AI still depends on humans to move hardware, fix failures, validate ambiguous contacts and make time-critical decisions in hostile settings. The combined CONOPS only works if autonomy is matched with a realistic, cross-domain, properly resourced command-and-control framework that accepts the friction of real operations instead of pretending the hard problems disappear once the vehicles are unmanned.
Conclusion
Large Acoustic Models are the core enablers of unmanned systems operating underwater. A LAM can distinguish mine-like returns from seabed clutter, track distant propeller signatures and drive avoidance manoeuvres in milliseconds, but only when treated as a serious engineering capability. Effective autonomy demands a realistic understanding of what the ocean will and will not tolerate.
LAMs differ fundamentally from LLMs. They interpret acoustic waveforms shaped by environmental variability. They cannot sit in a cloud cluster; they must run inside a pressure hull, close to the sensors, sharing limited power and thermal headroom. The real constraint is not how to build the model, but where to place it. Too little compute and the LAM underperforms; too much and the UUV becomes heavy, power-hungry and difficult to deploy.
These constraints can be overcome through distributed processing across UUVs, UAV relays and USVs acting as floating server racks and physics-informed model compression tied to environmental and timing conditions. But each solution raises further considerations: How much compute can a pressure hull safely support? How to maintain synchronisation across air, surface and subsurface nodes? How to secure command and control when operators may be forward-deployed or geographically dispersed?
The challenge is integration: ensuring these powerful models fit into real world environments, organisations and logistics without overlooking the human, physical and operational factors that ultimately determine success.
Sources: The insights and examples in this article are drawn from recent advances in autonomous undersea systems and AI, including studies on synthetic aperture sonar for autonomous mine hunting, and real-world trials of edge-processing glider swarms for undersea surveillance. The discussion integrates findings from naval research on multi-UUV cooperation and deep learning-based sonar recognition in complex environments, as well as specifications of modern UUV sensor suites (e.g. Bluefin-12 with forward sonar and automatic target recognition). These sources collectively underline both the potential and challenges of deploying large-scale acoustic intelligence in AUVs, guiding our exploration of LAM applications and limitations.

