Dec 2, 2025
Omar Al-Anni
9 min read
I. In 2025, LLMs Became Ubiquitous — And Their Limits Became Clear
A Year of Transformation — and Realization
As the industry closes out the current technological cycle, there has been profound reflection on how deeply Artificial Intelligence has reshaped operational processes. This period marks the year when LLMs became truly ubiquitous, embedding themselves into the tools, workflows, and high-velocity decision loops across nearly every industry sector. LLMs have demonstrated an unprecedented ability to rewrite code, analyze complex operational incidents, explain voluminous logs, summarize vast repositories of institutional knowledge, accelerate strategic planning, and fundamentally serve as the universal linguistic interface to immense system complexity. The expansive power of LLMs in accelerating human work and workflows is undeniable, having significantly expanded the horizons of what organizations can achieve in terms of automation and efficiency.
But after extensive deployment of AI systems inside real-world networks, complex cloud systems, and high-stakes operational environments, one profound truth has become unavoidable: while LLMs are indeed extraordinary feats of engineering, their fundamental architecture renders them insufficient for the level of intelligence and autonomy that modern digital infrastructure now critically requires. The need is shifting from accelerated human reasoning to genuine machine understanding.
II. Before We Go Further — A Necessary Clarification: Defining the Intelligence Frontier
The Scientific Uncertainty Around Defining Cognition & Consciousness
Part of the widespread confusion characterizing the contemporary debate around Artificial Intelligence stems from a foundational epistemological challenge: science itself has not yet achieved a full or unified understanding of what cognition or consciousness truly are. Neuroscience, cognitive science, philosophy, and AI research study disparate aspects of these phenomena, yet no single domain has provided a complete, scientifically resolved picture.
This lack of scientific consensus is not merely an academic disagreement; it reflects the reality that the foundational concepts remain scientifically unresolved. Yet this vacuum of definition has allowed speculative claims to proliferate, suggesting that LLMs, by virtue of scale, "will become conscious" or "will evolve cognition". When the terms themselves are not fully understood, the expectations built upon them rapidly disconnect from technical reality. To move beyond speculation and ensure the development of reliable, high-stakes infrastructure, a practical framing grounded in measurable, engineerable properties is essential.
Cognition vs Consciousness — Clearing the Confusion
It is necessary to establish a clear, functional distinction between Consciousness and Cognition. Consciousness refers to subjective experience—awareness, qualia, the inner point of view, or the "inner life" explored through introspection. Because consciousness remains scientifically unresolved, it offers no operational value for the functional engineering or verification of distributed, high-stakes systems. These are not machines being built to "feel" or experience; they are systems designed to execute predictable, observable, and verifiable actions in dynamic environments.
Cognition, by contrast, is the functional component of intelligence. It is defined by a system's observable ability to perceive its environment, form persistent internal causal models of how components behave, reason logically about causes and effects, anticipate future states, make non-linear decisions under uncertainty, act autonomously, and robustly learn from outcomes. Cognition is measurable, observable, and engineerable. By focusing on a functional definition of cognition (the ability to generate reliable and verifiable actions), the strategic choice for system architecture is decoupled from the philosophical deadlock of consciousness, anchoring development in concrete performance metrics suitable for mission-critical infrastructure. The objective is not to generate more fluid text, but to enable true understanding of the operational environment.
III. The Architectural Limit — Prediction Isn’t Cognition
The Mathematical Framing (LLMs as Probabilistic Engines)
At their mathematical core, Large Language Models are fundamentally rooted in the transformer architecture, operating exclusively as probabilistic next-token predictors. The entire mechanism is engineered to compute the likelihood: P(next token | previous tokens).
Through training on massive datasets, they learn tremendously rich correlations, which allows them to generate complex, coherent, and fluent outputs, often indistinguishable from human prose, and sometimes to act as predictive models capable of generating risk scores.
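To make the mechanism concrete, the sketch below reduces it to its bare minimum in Python: a placeholder function stands in for the trained transformer and returns a softmax distribution over a toy vocabulary, from which the next token is sampled one step at a time. The vocabulary, context, and random logits are illustrative stand-ins, not the output of any real model.

```python
import numpy as np

# Toy illustration of the autoregressive mechanism: at each step the model
# produces a score (logit) for every vocabulary item, converts the scores into
# a probability distribution, and samples the next token. The logits here are
# random placeholders standing in for a trained transformer's output.
rng = np.random.default_rng(0)
vocab = ["link", "down", "up", "flap", "bgp", "peer", "restart", "<eos>"]

def next_token_distribution(context: list[str]) -> np.ndarray:
    """Stand-in for P(next token | previous tokens)."""
    logits = rng.normal(size=len(vocab))   # a real model computes these from the context
    exp = np.exp(logits - logits.max())    # softmax with numerical stability
    return exp / exp.sum()

context = ["bgp", "peer"]
for _ in range(5):
    probs = next_token_distribution(context)
    token = rng.choice(vocab, p=probs)     # sample from the conditional distribution
    context.append(token)
    if token == "<eos>":
        break

print(context)
```

Everything the model "knows" lives in how those conditional distributions are shaped by training data; nothing in the loop itself consults a model of the world.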
This mechanism, however, leads to a profound architectural limit: LLMs learn correlation, but they fundamentally do not learn grounded causality or dynamic physics. They do not natively construct persistent internal world models, they cannot simulate environments based on first principles, they do not test counterfactual scenarios reliably, and they do not update internal beliefs based on the consequences of their actions unless externally orchestrated via a separate agentic framework. This distinction leads to the central question concerning their suitability for autonomy:
If LLMs are fundamentally probabilistic prediction engines, how can they ever truly think or understand — rather than just approximate thought?
The evidence suggests that scaling prediction yields increased fluency and coherence, but does not inherently cross the boundary into genuine causal insight or grounded understanding.
Autoregressive Fragility in High-Stakes Systems
The purely autoregressive nature of LLMs introduces a critical architectural flaw when applied to autonomous control systems. In high-stakes environments, such as managing a live network or a compute fabric, an autonomous system must execute complex, sequential, multi-step interventions and long-term planning. The technical challenge is that the small but non-zero probability of error in each predicted step compounds across steps, so the probability of a fully correct multi-step plan decays geometrically with its length. This degradation of accuracy in long compositional or planning tasks transforms a linguistic imperfection (a factual hallucination) into a potentially catastrophic operational risk (a cascading network failure). This mathematical reality strictly mandates the need for non-probabilistic, constraint-bound reasoning for operational control.
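A back-of-the-envelope calculation makes the compounding visible. The per-step error rate below is an assumed figure chosen for illustration, not a benchmark result:

```python
# Illustration only: if each step of a multi-step plan is independently correct
# with probability (1 - eps), the chance that the whole plan is correct decays
# geometrically with its length.
per_step_error = 0.02          # assumed 2% chance of a wrong step
for steps in (1, 10, 50, 100):
    p_plan_ok = (1 - per_step_error) ** steps
    print(f"{steps:>3} steps -> P(entire plan correct) = {p_plan_ok:.1%}")
# At 98% per-step accuracy, a 100-step plan is fully correct only ~13% of the time.
```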
Furthermore, deep testing suggests that LLMs often operate as incoherent simulators. They simulate plausible human text, acting like sophisticated role-players or improvisers, but this simulation falls apart when robust understanding of physics, mathematics, or grounded reality is required. While LLMs can provide context and aid in causal discovery methods, current evidence indicates that they primarily perform shallow (Level-1) causal reasoning, largely attributed to the causal knowledge already embedded in their immense parameter space. They lack the capacity for genuine human-like (Level-2) causal reasoning, especially in counterfactual or fresh contexts. The core intelligence required for autonomy cannot rely on implicit, latent patterns derived from text; it must be a structured causal model that verifies actions against physical and logical constraints.
The fundamental architectural differences between the LLM layer and the required cognitive layer for autonomous systems can be summarized:
Table 1: Architectural Comparison: LLMs vs. Cognitive Systems
| Feature/Capability | LLMs (Interface/Orchestration Layer) | Cognitive Systems (Autonomy Core) |
| --- | --- | --- |
| Core Mechanism | Probabilistic next-token prediction (correlation) | Structured representation learning (causality, logic) |
| Model of Reality | Implicit, latent patterns in the text corpus (incoherent simulator) | Explicit, mathematical, grounded world models (telemetry, physics) |
| Memory/State | Finite context window; requires RAG for external facts | Persistent, continuous, environment-driven memory |
| Reasoning Type | Shallow (Level-1) linguistic/analogical | Deep (Level-2) causal, counterfactual, and symbolic |
| Reliability Profile | Exponential error accumulation over long sequences | Deterministic decision bounds / constraint satisfaction |
IV. Why We Pursue Cognition
Where LLMs Are Enough — and Where They Are Not
It is crucial to define where LLMs excel. They are not merely useful, but transformative, for tasks requiring linguistic fluency: summarization, knowledge retrieval, code generation, support workflows, and user assistance. Entire industries and companies can be built successfully leveraging LLMs for these categories.
However, the pursuit of Cognitive Digital Infrastructure is fundamentally different. The mission centers on building systems that achieve true comprehension of complex, dynamic systems—how networks, clouds, compute fabrics, AI factories, and energy systems behave. This requires intelligence that can interpret live telemetry, infer causal chains, anticipate complex failures, reason through planning scenarios, and execute autonomous, optimal action.
The Clear Division of Labor
The strategy for engineered intelligence dictates an explicit architectural separation of roles:
LLMs power the interface. They are deeply integrated to help interpret unstructured data, orchestrate agentic workflows, reason linguistically, and accelerate collaboration between machine and human operators.
Cognition powers the autonomy. It provides the causal reasoning, persistent world-models, and environment-driven learning necessary for the system to think and act reliably.
The system's intelligence is explicitly grounded in real-time telemetry—the logs, metrics, alerts, and event data—directly from the operational environment. This grounding ensures that the causal model's reasoning is rooted in the current, actual state of the system, preventing the linguistic speculation characteristic of ungrounded LLMs. This intelligent coupling, where the LLM facilitates communication and orchestration while the Cognitive Core provides the verified decision-making, forms the competitive advantage in developing resilient, self-managing digital infrastructure.
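The sketch below illustrates this division of labor in simplified Python. The class names, thresholds, and telemetry fields are hypothetical placeholders, not a real TelcoBrain API; the point is only that the decision is produced and verified by the grounded core, while the language layer explains it.

```python
from dataclasses import dataclass

# Illustrative sketch only: all names here are hypothetical, meant to show the
# separation of roles between the cognitive core and the LLM interface layer.

@dataclass
class Telemetry:
    link_utilization: float   # 0.0 - 1.0
    packet_loss: float        # 0.0 - 1.0

class CognitiveCore:
    """Decides and verifies: grounded in telemetry, bounded by explicit constraints."""
    MAX_UTILIZATION = 0.85

    def decide(self, t: Telemetry) -> dict:
        if t.link_utilization > self.MAX_UTILIZATION and t.packet_loss > 0.01:
            action = "reroute_traffic"
        else:
            action = "no_op"
        # every action is checked against a deterministic guardrail before release
        assert action in {"reroute_traffic", "no_op"}
        return {"action": action, "evidence": t}

class LLMInterface:
    """Explains and orchestrates: turns a verified decision into language for humans."""
    def explain(self, decision: dict) -> str:
        t = decision["evidence"]
        return (f"Action '{decision['action']}' was taken because utilization is "
                f"{t.link_utilization:.0%} and packet loss is {t.packet_loss:.1%}.")

core, interface = CognitiveCore(), LLMInterface()
decision = core.decide(Telemetry(link_utilization=0.92, packet_loss=0.03))
print(interface.explain(decision))
```

The key design choice is that the language layer never originates the action; it only narrates a decision that has already been verified against explicit constraints.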
V. The Techno-Economics of LLMs — Why Scale Is Breaking Down
The Log-X Chart Reality: Quantifying Diminishing Returns
The year 2025 has also served to expose a stark economic reality: the current techno-economics of massive LLM scaling are fundamentally unsustainable. Training and serving trillion-parameter models requires enormous capital expenditure on specialized infrastructure—thousands of accelerators, massive interconnect bandwidth, and huge power capacity. This investment strain is a central concern in the global technology sector, particularly given the forecast of over $8 trillion in global spending on technology infrastructure by 2030, a challenge we have previously detailed in our analysis of the need for techno-economic strategies in network planning. While the financial and energy costs rise exponentially, the corresponding capability gains in performance and accuracy increase only logarithmically.
The economic viability of LLMs, therefore, is not uniform. They remain economically transformative in domains where their core linguistic capability provides outsized value relative to the cost of inference. This is primarily in high-throughput, low-stakes applications such as customer-facing chatbots and virtual assistants, code generation platforms, or tasks requiring large-scale document summarization and knowledge retrieval.
In these applications, the cost of an occasional linguistic error or hallucination is low, and the ROI derived from accelerated workflow or enhanced customer support justifies the necessary inference costs. The successful economic deployment of LLMs is increasingly reliant on techniques like distillation and quantization, shrinking them into highly efficient models suitable for specific tasks where the cost per inference is carefully managed, such as on edge devices and NPUs (see our previous post on this topic, The AI Techno-Economic Spectrum).
Real-world evidence confirms this divergence in 2025, demonstrating that the industry has reached a point of diminishing returns for this particular technology pathway. Training and operational costs surged significantly (e.g., 87% reported increases in some analyses) while corresponding revenue or quantifiable performance gains lagged far behind. This worsening cost-to-capability ratio shows that the LLM arms race, focused on maximizing parameter count, has hit critical diminishing returns.
The Impossible Trinity and the Pivot to Efficient Intelligence
The economic burden is not limited to training; the most resource-intensive and often underestimated expense is continuous inference in production. High latency and significant compute requirements for every query compromise the potential return on investment for large models in operational environments.
Operational systems are constrained by the necessity of balancing the "impossible trinity": Model Quality (Q), Inference Performance (P), and Economic Cost (C). Massive, generalized LLMs achieve high Q by sacrificing C and P, making them commercially infeasible for real-time, large-scale deployment where the cost per interaction must be minimal. For autonomous infrastructure, the critical metric of success is shifting from cost per token to cost per decision and energy per decision.
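A toy calculation shows why the accounting changes once the unit of value is a decision rather than a token. Every number below is an assumption chosen for illustration, not a measured price or benchmark:

```python
# Illustrative arithmetic with assumed (not measured) figures, to show why the
# relevant metric shifts from cost-per-token to cost-per-decision.
llm_cost_per_1k_tokens = 0.01      # assumed price, USD
tokens_per_decision = 4_000        # assumed: long prompt, reasoning trace, tool calls
retry_factor = 1.5                 # assumed: re-prompting after invalid outputs

llm_cost_per_decision = llm_cost_per_1k_tokens * tokens_per_decision / 1_000 * retry_factor
cognitive_cost_per_decision = 0.002   # assumed: one inference on a small, local model

decisions_per_day = 100_000           # a busy operational control loop
for name, cost in [("LLM pipeline", llm_cost_per_decision),
                   ("Cognitive core", cognitive_cost_per_decision)]:
    print(f"{name:>14}: ${cost:.4f}/decision -> ${cost * decisions_per_day:,.0f}/day")
```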
This economic pressure is already forcing a strategic shift toward smaller, highly efficient intelligence. This manifests in an industry-wide focus on compute-efficient architectures and optimization techniques such as quantization (reducing numerical precision for a lower memory footprint), sparsity (removing unnecessary weights), and model distillation (transferring learned knowledge from large "teacher" models into smaller, efficient "student" models). Furthermore, the proliferation of specialized hardware like Neural Processing Units (NPUs) allows efficient inference of substantial reasoning models (7B and 14B parameters) directly on the edge, decentralizing compute and reducing reliance on massive, centralized clusters. This industry trajectory toward decentralized, cost-effective, high-speed computation validates the foundational need for architectures designed for efficiency and grounded understanding, rather than brute-force prediction.
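As one concrete example of these levers, the sketch below applies post-training dynamic quantization to a small stand-in model using PyTorch, storing the weights of its Linear layers as 8-bit integers and comparing serialized sizes. It is a minimal illustration of the technique, not a production pipeline:

```python
import os
import torch
import torch.nn as nn

# Stand-in model: a few Linear layers, not a real reasoning model.
model = nn.Sequential(
    nn.Linear(512, 1024), nn.ReLU(),
    nn.Linear(1024, 512), nn.ReLU(),
    nn.Linear(512, 8),
)

# Post-training dynamic quantization: weights of Linear layers stored as int8.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def size_mb(m: nn.Module) -> float:
    """Serialize the state dict and report its size on disk."""
    torch.save(m.state_dict(), "tmp.pt")
    mb = os.path.getsize("tmp.pt") / 1e6
    os.remove("tmp.pt")
    return mb

print(f"fp32 model: {size_mb(model):.2f} MB, int8 model: {size_mb(quantized):.2f} MB")
```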
VI. The “AGI Will Pay for Itself” Argument — And Why It Breaks Down
A popular counter-argument often raised to justify the continued exponential investment in massive LLM training suggests that the eventual achievement of Artificial General Intelligence (AGI) through scale will ultimately render all current costs negligible—the AGI will simply "pay for itself".
However, this argument rests on two fundamentally flawed premises. First, it assumes that LLMs can reach AGI simply by increasing scale, a belief contradicted by architectural evidence. LLMs do not inherently possess continuous learning capabilities, Level-2 causal reasoning, persistent internal state, or environment-driven adaptation—all considered prerequisites for genuine general intelligence. Scaling prediction does not magically produce the required underlying understanding.
Second, the economics break down before the theoretical point of AGI is reached. As established by the logarithmic capability curve, the return on investment diminishes rapidly. A system built on exponential training costs, which requires periodic, massive retraining on enormous clusters to update its knowledge, is economically fragile and inherently unable to compete with architectures designed for continuous, low-cost learning from real-time experience. Cognitive systems, designed to learn continuously and ground their models in the environment they operate in, inherently possess a far superior economic model for operational deployment.
Table 2: Economic Scaling Dynamics in AI Architectures
| Dimension | Massive LLM Scaling Trajectory | Cognitive Architecture Trajectory |
| --- | --- | --- |
| Training Cost Curve | Exponential increase (rapidly unsustainable) | Moderate; uses domain-specific data and continuous learning |
| Capability Gain Curve | Logarithmic increase (diminishing returns) | Linear to super-linear, especially in operational efficiency |
| Operational Cost Driver | High cost per inference (C) driven by VRAM and bandwidth | Low energy per decision (efficiency, NPUs) |
| Learning Methodology | Periodic, massive retraining | Continuous, real-time learning from environment feedback |
| Economic Justification | Speculative "AGI will pay for itself" future return | Immediate, quantifiable operational ROI (uptime, optimization) |
The reality remains fixed: LLMs are highly effective interface engines. They are not, and cannot become through mere scaling, the robust engines required for autonomous intelligence. No economic projection reverses this architectural limitation.
VII. The Cognitive Stack We’re Building: The Neuro-Symbolic Foundation for Autonomy
The transition to Cognitive Digital Infrastructure necessitates an entirely different architectural foundation capable of combining the statistical power of deep learning with the deterministic reliability of formal logic. This is the realm of hybrid neuro-symbolic AI. This approach solves the dual challenge of perception and verifiable reasoning, creating systems that are both robust and transparent.
The pivot to genuine cognition necessitates architectural approaches fundamentally different from autoregressive text models. This new generation of cognitive architecture requires shifting away from purely text-based next-token prediction toward frameworks designed to learn the dynamics of the world from perception and interaction. We are actively incorporating and exploring advanced self-supervised methods, such as Joint-Embedding Predictive Architectures (JEPAs) and Energy-Based Models (EBMs). Unlike language models that learn from linguistic patterns, JEPAs are designed to learn robust representations by predicting hidden parts of a system's state or future states, mirroring how agents learn physical reality through observation. This focus on perception and world dynamics is crucial for developing the explicit, causal simulator the Cognitive Core requires.
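The sketch below is our own drastically simplified illustration of the JEPA idea, not a reference implementation: part of a state vector is masked, and a predictor is trained to match the full state's embedding under a separate target encoder, so the loss lives in representation space rather than in raw signal space.

```python
import torch
import torch.nn as nn

# Minimal JEPA-flavoured sketch (a simplification for illustration): encode a
# visible slice of system state, encode the full state with a separate target
# encoder, and train a predictor to match the target *in embedding space*
# rather than reconstructing raw values.
state_dim, hidden_dim = 64, 32

context_encoder = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.ReLU(),
                                nn.Linear(hidden_dim, hidden_dim))
target_encoder = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.ReLU(),
                               nn.Linear(hidden_dim, hidden_dim))
predictor = nn.Linear(hidden_dim, hidden_dim)

optimizer = torch.optim.Adam(
    list(context_encoder.parameters()) + list(predictor.parameters()), lr=1e-3)

for step in range(100):
    state = torch.randn(128, state_dim)               # stand-in for a telemetry snapshot
    visible = state.clone()
    visible[:, state_dim // 2:] = 0.0                 # mask the second half of the state

    with torch.no_grad():
        target = target_encoder(state)                # embedding of the full state

    prediction = predictor(context_encoder(visible))  # predict the embedding, not the raw signal
    loss = nn.functional.mse_loss(prediction, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # in practice the target encoder is updated as a slow-moving average of the
    # context encoder rather than kept frozen, as done here for brevity
```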
Our Grounded Reasoning Mechanisms (Technical Deep Dive)
Our Cognitive Stack is built upon systems that learn structured representations of the world—telemetry, constraints, and causal relationships—rather than relying solely on unstructured token sequences.
Causal World Model Integration: A fundamental component of the cognitive core is the explicit, structured Causal World Model. This model, grounded in real telemetry, acts as a dynamic simulator of the managed environment (e.g., the network topology, the physics of the power grid, or the flow dynamics of a compute fabric). This allows the system to simulate behavior and rigorously test counterfactual scenarios (e.g., "If I apply this patch, what is the consequence?") before any action is committed. This guaranteed pre-testing is essential for safe, high-stakes autonomy.
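The pattern can be illustrated with a deliberately tiny example. The toy world model, capacities, and intervention below are hypothetical; the point is the "simulate, check constraints, then commit" order of operations:

```python
# Hypothetical sketch of the "simulate before you commit" pattern: a tiny world
# model for a two-link network predicts the effect of an intervention, and the
# action is only released if the counterfactual outcome respects the constraints.
CAPACITY = {"link_a": 100, "link_b": 100}   # Gbps, illustrative numbers

def simulate(load: dict, intervention: str) -> dict:
    """Toy causal model: draining link_a shifts its traffic onto link_b."""
    load = dict(load)
    if intervention == "drain_link_a":
        load["link_b"] += load["link_a"]
        load["link_a"] = 0
    return load

def safe(load: dict) -> bool:
    return all(load[link] <= CAPACITY[link] for link in load)

current = {"link_a": 60, "link_b": 55}
counterfactual = simulate(current, "drain_link_a")   # "what if I apply this change?"

if safe(counterfactual):
    print("commit: drain_link_a")
else:
    print(f"blocked: predicted overload {counterfactual}")  # 115 Gbps on link_b exceeds capacity
```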
Logic Tensor Networks (LTN) and Symbolic Constraints: To ensure the system's actions adhere to operational standards and physical laws, the architecture integrates neuro-symbolic frameworks. Tools like Logic Tensor Networks (LTN) utilize a differentiable first-order logic language (Real Logic) to incorporate abstract knowledge and logical formulas directly into the learning process. This guarantees that the system's decisions satisfy explicit, deterministic constraints, grounding abstract concepts in concrete data tensors and ensuring compliance with predetermined safety guardrails.
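The sketch below conveys the flavor of this approach without reproducing the LTN library's actual API: a single operational rule is expressed as a differentiable fuzzy implication, so violations contribute a gradient to the training loss. The predicates, thresholds, and rule itself are illustrative assumptions.

```python
import torch

# In the spirit of differentiable logic (not the LTN library's actual API): the
# rule "IF utilization is high THEN reroute" becomes a fuzzy implication whose
# truth value is differentiable, so violating the rule adds to the training loss.
def fuzzy_implies(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """Lukasiewicz implication: truth(p -> q) = min(1, 1 - p + q)."""
    return torch.clamp(1.0 - p + q, max=1.0)

utilization = torch.tensor([0.95, 0.40, 0.88])                    # observed, normalized
is_high = torch.sigmoid((utilization - 0.85) * 50)                # soft predicate "high utilization"
reroute_prob = torch.tensor([0.2, 0.1, 0.9], requires_grad=True)  # the policy's outputs

rule_truth = fuzzy_implies(is_high, reroute_prob)                 # one truth value per sample
constraint_loss = (1.0 - rule_truth).mean()                       # penalize rule violations
constraint_loss.backward()                                        # gradients flow back to the policy

print(rule_truth.detach(), reroute_prob.grad)
```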
Continuous Learning and Digital Twins: The intelligence of the cognitive core is realized through continuous agentic loops, which integrate neural perception with symbolic logic and act on continuous feedback. This methodology involves creating dynamic digital twins of the managed infrastructure, which map real-time data from various sources into a semantic ontology. This process defines system-wide semantic relationships, allowing for continuous contextualization and updating of the world model based on streaming data. This approach ensures learning occurs in real time, at low cost, grounded in the environmental dynamics.
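A minimal sketch of that loop, with entirely hypothetical entity names and telemetry events, might look like this: streaming events are mapped onto a small semantic model of entities and relationships that stays current as data arrives.

```python
from collections import defaultdict

# Hypothetical sketch of the continuous loop: streaming telemetry events are
# mapped into a semantic twin (entities, attributes, relationships), and the
# twin's view of each entity is updated as data arrives.
class DigitalTwin:
    def __init__(self):
        self.entities = defaultdict(dict)   # e.g. "router-1" -> {"cpu": 0.41, ...}
        self.relations = []                 # e.g. ("router-1", "uplinks_to", "spine-2")

    def link(self, src: str, relation: str, dst: str) -> None:
        self.relations.append((src, relation, dst))

    def ingest(self, event: dict) -> None:
        """Map a raw telemetry event onto the semantic model."""
        self.entities[event["entity"]].update(event["metrics"])

twin = DigitalTwin()
twin.link("router-1", "uplinks_to", "spine-2")
for event in [{"entity": "router-1", "metrics": {"cpu": 0.41, "temp_c": 58}},
              {"entity": "spine-2", "metrics": {"buffer_drops": 120}}]:
    twin.ingest(event)                      # the model stays current as data streams in

print(dict(twin.entities), twin.relations)
```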
Explainability as a Competitive Advantage in Critical Systems
The adoption of the Neuro-Symbolic architecture ensures compliance with safety guardrails by fundamentally engineering transparency into the decision-making process. In critical infrastructure environments, such as energy grids or defense operations, trust requires explainability. The hybrid model allows the neural component to detect an unusual anomaly (statistical correlation), while the symbolic component provides the verifiable explanation—a root-cause analysis paired with the logical rules that were applied or violated. This critical capability allows human operators to audit, validate, or override system recommendations with confidence, a feature impossible to deliver reliably using opaque, black-box LLMs that only provide a plausible sequence of tokens. This verifiable, trustworthy autonomy is a core competitive differentiator for operational systems.
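The hybrid pattern can be sketched in a few lines. The detector, rules, and metrics below are illustrative placeholders: a statistical check flags the anomaly, and an explicit rule base supplies the auditable reason.

```python
import statistics

# Illustrative sketch of the hybrid explanation pattern: a statistical detector
# flags an anomaly, and a small explicit rule base supplies the human-readable,
# auditable reason. All rules and metrics here are hypothetical.
RULES = [
    (lambda m: m["latency_ms"] > 250 and m["packet_loss"] > 0.02,
     "Rule R7: congestion on the uplink path"),
    (lambda m: m["temp_c"] > 75,
     "Rule R12: thermal throttling risk"),
]

def detect(history: list[float], current: float) -> bool:
    """Statistical stand-in: flag values far outside the recent distribution."""
    mean, stdev = statistics.mean(history), statistics.pstdev(history)
    return abs(current - mean) > 3 * stdev

def explain(metrics: dict) -> list[str]:
    """Symbolic side: report which explicit rules the current state violates."""
    return [reason for rule, reason in RULES if rule(metrics)]

metrics = {"latency_ms": 310.0, "packet_loss": 0.05, "temp_c": 61}
if detect(history=[42, 40, 45, 43, 41], current=metrics["latency_ms"]):
    print("Anomaly detected. Violated:", explain(metrics))
```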
VIII. Closing Thought — The Real Frontier Begins Now
In 2025, Large Language Models achieved total ubiquity, fundamentally changing the human interface with technology. They have defined a new generation of productivity and linguistic automation. Yet, their architectural and economic limitations confirmed that they are the beginning of the journey, not the destination for autonomous intelligence.
In 2026, the scientific and engineering focus shifts, and cognition becomes the frontier - and we have been saying this all along since early 2021! :)
The future of digital infrastructure will be defined not by the size of generalized transformer stacks, but by the grounded, efficient, and causal intelligence of systems that truly understand and manage the complex world they operate within.
Conclusion
This is where the rubber meets the road. The defining lesson of 2025 is that we stopped asking, "Can they talk?" and finally focused on the critical question: "Can they truly think and act with verifiable certainty?" For those of us managing the vital pulse of digital infrastructure—from global networks to private AI/Cloud Infra—we learned that linguistic fluency is a liability without grounding. The "3 a.m. decision"—the autonomous action that stabilizes a failure, optimizes a resource, or prevents a catastrophic outage—cannot be entrusted to a probabilistic guess. It must be rooted in causality, logic, and verifiable action.
TelcoBrain's choice is clear. We are not abandoning the LLM; we are embracing it while permanently changing its role. It remains the brilliant interpreter and interface, accelerating human-machine collaboration. But the engine of autonomy and thinking—the system that learns continuously and locally, solving for cost per decision against the backdrop of diminishing economic returns—must be the Cognitive Stack.
The shift is from scale-at-all-costs to understanding at all times. We commit to the path of neuro-symbolic, grounded, and causal architectures because in high-stakes environments, the margin for error is zero. The industry must engage in an open dialogue regarding these fundamental limits.
At TelcoBrain, we would welcome thoughts from others building AI in real operational environments: Are the same architectural and economic limits being observed in your deployments? And where are you placing your bets for what comes next?




