Most companies are caught off guard when product launches slip, incidents occur, team conflicts arise, or budgets are exceeded. Yet what managers call “unexpected” disruptions are often statistical tail events playing out predictably.
Until now, organizations have relied on consultants and managers for ad-hoc post-crisis analyses. But traditional analysis is expensive and inconsistent because we humans struggle with inductive reasoning and statistics, and we usually only have access to a portion of the data we need.
This year, I’ve run two experiments to get a glimpse at the role that LLMs will play in organizational development. In my first experiment, I replaced an agile coach with OpenAI’s GPT-4.5 and had it help a team address slow code peer review. In this experiment, the team interacted directly with the LLM through prompts, and I sat in, listened, and observed…mostly (more on that later).
In my second experiment, I created an LLM agent, fed it with data from Jira, GitHub, Slack, and Splunk, and applied an analytical framework with mathematical models (queuing theory, graph theory, probabilistic processing, Bayesian networks). I tested whether the LLM agent could identify and predict operational disruptions such as incidents, growing backlogs, delays, and conflicts between teams using this combined exhaust data and these mathematical models (it did, exceedingly well).
LLMs enable a significant shift in how organizational problems can be understood and addressed. With access to this form of exhaust data, and when constrained by mathematical models, LLMs can process and detect (modellable) patterns that traditional analysis typically misses until crises emerge. This comes at a time when positive cash flow is more important than ever, something I’ve written about in The Treasury Renaissance: Why Tech Hiring Won’t Return to Normal.
While traditional analysis relies on periodic assessments usually performed by consultants or managers, LLM agents can provide continuous, predictive organizational intelligence. And for deterministic and stochastic patterns (the foundation behind many operational problems), LLM agents can be cheaper, faster, and more reliable than bringing in consultants.
The implications of utilizing LLM agents in organizational excellence go beyond cost savings in analysis. Organizations that adopt this capability will be able to identify bottlenecks, predict disruptions, and intervene before problems compound into expensive technological and social failures.
But LLM-agent-driven organizational excellence also creates serious tensions. It poses existential challenges for management consulting firms and agile coaches whose value propositions center on ideology and narrative-framework-driven interventions. It raises ethical questions about workplace surveillance and algorithmic decision-making. It will force evolved operational ways of working, and it risks cognitive atrophy as managers become dependent on systems they don’t fully understand.
In this post, I describe my experiments and the drivers and tensions behind the adoption of LLM agents for organizational development. I also go through ethical considerations and future implications. But first, I cover the mathematical foundations that make this transformation possible.
Mathematical Patterns at Work
In organizational contexts there are three distinct mathematical patterns that demand different approaches to analysis and intervention. LLM agents can reshape organizational development in two of these patterns: deterministic and stochastic. Understanding these three patterns is crucial for knowing what to use LLM agents for, and when human judgment (or abductive reasoning) remains essential.
Deterministic patterns operate as mechanical relationships with fully predictable outcomes. Work arrives at rate λ and gets processed at rate μ, and backlogs grow whenever λ exceeds μ. Dependencies create network bottlenecks where teams become nodes and handoffs become edges. These follow classical operations research (queuing theory, network analysis, critical path calculations). When a development team receives 10 tickets daily but completes only 8, the backlog grows by 2 tickets each day regardless of whether they’re running Scrum, Kanban, or SAFe.
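The ticket example above can be written out directly. A minimal sketch in Python, using the article’s illustrative rates (λ = 10 arrivals, μ = 8 completions per day), not real data:

```python
def backlog_trajectory(days, arrival_rate, service_rate, initial=0.0):
    """Day-by-day backlog: grows by (λ - μ) whenever arrivals exceed capacity."""
    backlog = initial
    trajectory = []
    for _ in range(days):
        backlog = max(0.0, backlog + arrival_rate - service_rate)
        trajectory.append(backlog)
    return trajectory

# 10 tickets arrive daily, 8 get completed: the backlog grows by 2 per day,
# regardless of which process framework the team runs.
print(backlog_trajectory(5, arrival_rate=10, service_rate=8))
# → [2.0, 4.0, 6.0, 8.0, 10.0]
```

The point of the sketch is how mechanical the relationship is: no narrative intervention changes the arithmetic, only changing λ or μ does.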
Stochastic patterns involve random variations that occur within predictable distributions. Developer illness, delayed code reviews, missed requirements, and minor outages aren’t anomalies; they’re expected statistical noise. A developer might be sick 2% of the time and submit reviews late 15% of the time, and incidents occur at measurable frequencies. While we cannot predict exactly when these events happen, we can model their probability distributions and expected impact. Finance departments routinely hedge against this type of volatility, yet organizations treat these predictable variations as unforeseeable disruptions when it comes to organizational performance and operations.
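These rates can be simulated to show that the “noise” has a stable expected impact. A small Monte Carlo sketch, assuming a hypothetical eight-person team, five workdays, and one review per developer per day (illustrative numbers, not real data):

```python
import random

def simulate_week(n_devs=8, p_sick=0.02, p_late=0.15, seed=None):
    """One five-day week: count sick person-days and late reviews."""
    rng = random.Random(seed)
    slots = n_devs * 5  # 40 person-days (and, assumed here, 40 reviews)
    sick = sum(rng.random() < p_sick for _ in range(slots))
    late = sum(rng.random() < p_late for _ in range(slots))
    return sick, late

# We can't predict *which* days go wrong, but the averages are stable:
# E[sick] = 40 * 0.02 = 0.8 person-days, E[late] = 40 * 0.15 = 6 reviews.
runs = [simulate_week(seed=s) for s in range(10_000)]
avg_sick = sum(s for s, _ in runs) / len(runs)
avg_late = sum(l for _, l in runs) / len(runs)
print(f"avg sick-days/week ≈ {avg_sick:.2f}, avg late reviews/week ≈ {avg_late:.2f}")
```

Any single week is noisy; across many weeks the expected impact is as plannable as a line item in a budget, which is exactly what hedging exploits.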
Emergent phenomena resist mathematical prediction entirely. Trust doesn’t erode linearly; it holds until reaching a tipping point, then collapses suddenly. Team culture can shift from collaborative to toxic through a rapid transition that no equation captures. These phase shifts often appear sudden to managers because the underlying forces remain invisible until a critical threshold is crossed. Black swan events, by definition, fall outside models. Mathematics can describe the boundaries where emergence and phase shifts might occur, and it can measure system fragility, but it cannot predict when or how these transitions will happen.
The analysis challenge lies in distinguishing between these patterns. Applying narrative interventions to deterministic bottlenecks wastes time and capital that could have been used to address mathematical constraints.
Treating predictable statistical variation as mysterious emergence means missing patterns that Bayesian inference could surface. And applying mathematical models to genuine emergence fails to acknowledge human complexity and may create chaos.
Diagnostic Challenge
This diagnostic challenge is central in Dave Snowden’s Cynefin framework, a sense-making framework that helps governments, companies, teams, and leaders understand what type of problem they’re facing before choosing how to respond.
Deterministic and stochastic patterns exist in Cynefin’s ‘ordered’ domains, while emergent phenomena live in the ‘complex’ domain. However, even deterministic systems can become practically unpredictable through sensitive dependence on initial conditions. This means LLM effectiveness isn’t perfectly binary, and there are edge cases where mathematical systems require human judgment despite following equations.
A simplification of the mathematical patterns overlaid on top of Cynefin 3+1. This visualization is not vetted by The Cynefin Company.
I wanted to see how well LLMs perform within these three patterns, and ran two LLM experiments.
Experiment One: Using an LLM as a Coaching Proxy
The first experiment tested whether an LLM could meaningfully replace the exploratory, dialogue-driven interactions of agile coaching for less ordered problems. I worked with a team that wanted to reduce their code peer review lead time. In the past, they’d made attempts to fix this: they knew their WIP, had an overview of pending reviews, and everyone knew and agreed on why peer reviews matter. But despite knowing their process and the status of their work, and agreeing it was important, code reviews were piling up. The increasing lead times seemed like an emergent phenomenon.
We used OpenAI’s GPT-4.5 to see if it could help the team understand their dynamics, define an experiment, and ultimately help solve their problem. My own learning objective was to assess the LLM’s capacity to help with emergent phenomena where abductive reasoning (inferring the best explanation or developing intuitive hunches about underlying causes) is critical.
In two sessions, using the chat interface, the LLM guided the team in framing their problem, exploring possible causes, and brainstorming interventions. The LLM helped structure the exploration, prompted the team to consider multiple angles, and generated a broad set of potential improvements. The team found this structured thinking scaffolding valuable.
However, there were clear limitations. The LLM struggled with subtext, cultural markers, and contextual fit. Also, what it eventually proposed as a “one-week experiment” was in reality a large transformation. The team considered several suggestions unrealistic and ineffective.
I broke protocol and offered my thoughts. This helped the team arrive at a small, targeted experiment: instead of changing their process, tooling, role expectations, etc., they’d introduce “Review Notes” to help the reviewer know what specifically to look at. As this team’s changes varied in size and complexity, and their review tool was ineffective, the team thought review notes would reduce the burden of reviews. They also made low-risk reviews optional. One team member explicitly asked, “Would the LLM have helped us realize what you helped us realize if you hadn’t spoken up?”
The experience highlighted two things: first, that LLMs can provide useful thinking scaffolding for problem-solving; second, that abductive reasoning is still uniquely human.
Experiment Two: Feeding Organizational Exhaust Data to an LLM Agent
While the first experiment focused on emergent phenomena, the second focused on automated analysis of deterministic and stochastic patterns in organizational exhaust data. Specifically, instead of relying on traditional analysis (direct observation, interviews, platform reviews), I wanted to test whether an LLM agent could read an organization’s exhaust data to identify what was happening right now, and whether it could predict what was likely to happen next.
At an AI hackathon facilitated by Henrik Kniberg and Hans Brattberg at Abundly, I created an LLM agent, fed it with mocked/fictitious organizational exhaust data, defined the analytical framework and mathematical models to use, and gave it limited autonomy to see what it would do.
The fictitious setting was a company with six development teams and forty team members. In the exhaust data, I added conflicts within and between teams, sometimes expressed in Slack, sometimes in Jira comments, and sometimes in commit messages. I added work appearing from the side after sprint start, and incidents affecting sometimes one team, sometimes multiple. In one team, a developer didn’t show up and nobody knew where the developer was for days. The mental script I followed when creating the organizational data included both ordered and random events.
The LLM agent analyzed Jira tickets, GitHub commits and reviews, Slack messages, and Splunk performance logs over seven days. Instead of treating them as separate systems, the agent treated them as one system.
The analytical framework included queuing theory to model throughput and backlog growth, and graph theory to determine dependencies and bottlenecks. I also added probabilistic processing to capture noise and random disturbances, and Bayesian networks to update probabilities as new information was generated daily.
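The Bayesian part of such a framework is, at its core, repeated application of Bayes’ rule. A minimal sketch of how a belief like “this team is overloaded” could be updated as daily signals arrive; the signal names and likelihood values here are invented for illustration, not taken from the experiment:

```python
def bayes_update(prior, p_signal_if_overloaded, p_signal_if_healthy):
    """P(overloaded | signal) = P(signal | overloaded) * prior / P(signal)."""
    numerator = prior * p_signal_if_overloaded
    evidence = numerator + (1 - prior) * p_signal_if_healthy
    return numerator / evidence

p = 0.10  # prior belief that the team is overloaded
daily_signals = [
    # (signal, P(signal | overloaded), P(signal | healthy)) — made-up values
    ("queue grew faster than capacity", 0.8, 0.2),
    ("review latency above normal", 0.7, 0.3),
    ("conflict markers in Slack threads", 0.6, 0.4),
]
for name, l_over, l_healthy in daily_signals:
    p = bayes_update(p, l_over, l_healthy)
    print(f"after '{name}': P(overloaded) = {p:.2f}")
```

Each weak signal nudges the probability; no single day proves anything, but the accumulated evidence crosses an alerting threshold well before a human would connect the same dots.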
The LLM operated with clear and strict rules: it had to use the data I provided and the mathematical models I imposed. For a one-hour proof of concept, I was impressed. The agent flagged emerging overload when queues began growing faster than capacity. It identified bottlenecks and dependencies that put delivery deadlines at risk. It also highlighted sudden bursts of productivity as improbable, suggesting that management divert their attention there.
I also created a weekend scenario to see what the agent would do: several incidents, increasing load, and the team huddling to address them. The team fixed all incidents, and managers celebrated over Slack. However, performance logs showed increasing load, payment failures, and notification delays, and the agent noticed that no tickets had been created, no one had started working on those sections of the code base, and no one was discussing this over Slack. The LLM agent flagged this as a process vulnerability, or system fragility, i.e. that operational stability relied on human analysis.
This represents an entirely different kind of organizational support that managers can receive. While the first experiment required human judgment, the LLM agent generated immediately actionable insights. When it detected untracked incidents emerging in logs, it created tickets.
When it identified people working excessive hours day after day by monitoring activity across channels, it scheduled check-ins with managers.
When crisis probability rose, it drafted triage agendas and booked meetings. The contrast with traditional management was stark: what had previously taken weeks of observation and discussion could now be surfaced, validated, and acted upon in real time.
This is obviously not ready for production, and organizational data is not as ordered as the data I used in my experiment, but it shows the future potential. What took data science teams weeks, if not months, can be completed in an afternoon since the analytical frameworks are ready to use off the shelf.
Why LLMs Fit Deterministic and Stochastic Patterns
LLMs excel with deterministic and stochastic patterns because these domains offer predictability. Combined with organizational exhaust data, LLMs have access to signal-rich information within stable mathematical constraints, making accurate prediction possible.
Deterministic patterns (queues, dependencies, throughput) operate through clear cause-effect relationships. When an LLM is constrained by appropriate mathematical models, it can detect overloads, bottlenecks, and coordination failures faster than human observation allows.
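The dependency side of this can be sketched as the graph described earlier: teams as nodes, handoffs as edges. A toy example with hypothetical team names, flagging the team that most others depend on:

```python
from collections import Counter

# Hypothetical handoffs: (upstream_team, downstream_team)
handoffs = [
    ("mobile", "platform"), ("web", "platform"), ("data", "platform"),
    ("platform", "infra"), ("mobile", "payments"), ("web", "payments"),
]

# A team with many inbound handoffs is a candidate coordination bottleneck:
# work from several upstream teams queues up behind its capacity.
inbound = Counter(dst for _, dst in handoffs)
team, load = inbound.most_common(1)[0]
print(f"likely bottleneck: {team} ({load} inbound dependencies)")
# → likely bottleneck: platform (3 inbound dependencies)
```

Real graph analysis would weight edges by handoff volume and use centrality measures, but even this degree count shows how mechanically a structural bottleneck falls out of dependency data.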
Stochastic patterns (illness, delayed reviews, misunderstandings, meeting overruns, minor incidents) produce variations that humans often misinterpret as “unpredictable.” LLM agents can track these variations across time, apply probability models, and identify when random disturbances are accumulating into systemic risk.
However, LLMs struggle with emergent phenomena, the organizational dynamics that resist mathematical prediction and require human judgment about context, culture, and complex interpersonal dynamics. This explains why my second experiment succeeded where the first struggled. The agent wasn’t “thinking” in a human sense; it was continuously scanning structured dynamics and applying inductive reasoning at scale.
When patterns are deterministic or stochastic, they can be inferred from large datasets, and LLM analysis outperforms human intuition. In experiment two, the LLM used mathematical models suitable to the context, so it didn’t guess—it detected, aggregated, and updated predictions in real time, surfacing signals that humans typically miss or misinterpret until problems become crises.
Ethics
While mathematics is neutral, its application is not. An agent constrained by queues, networks, and probabilities can reveal where workloads are unsustainable or where trust is eroding. But what leadership chooses to do with that information determines whether these tools become supportive or oppressive.
Used constructively, these capabilities can prevent burnout, improve decision-making, and create more resilient organizations. Used poorly, they can enable harmful cost-cutting, increase surveillance, hurt customers and employees, and shift blame to teams without addressing the structural issues the data actually reveals.
The difference in interpretation is significant. An employee flagged for working excessive hours could either be perceived to need support to prevent burnout or be labeled a top performer (Elona Rodriguez above). And someone might be considered to have a sustainable workload or be fired (David Kim above). The same mathematical analysis that identifies an overburdened team could lead to additional resources being provided, or to demands for higher productivity. Context, ethics, and leadership intent matter enormously.
This places new demands on managers and HR departments. Leaders must continuously evaluate whether their use of these tools aligns with stated values about employee wellbeing. HR and People & Culture functions may face identity questions about their role in organizations where algorithmic analysis can predict individual stress levels and team dynamics more accurately than traditional assessment methods. The fundamental challenge isn’t technical.
Cognitive Atrophy
The greater the insight these systems provide, the more tempting it becomes to over-rely on them, whether to control outcomes or outsource judgment entirely. This reliance carries significant costs. Just as muscles weaken without use, reasoning skills decline when no longer exercised.
As organizations grow accustomed to machine-driven insights, their capacity to reason about complex systems will erode. Systems thinking and navigating complexity require active practice. Without regular engagement with ambiguous problems, the ability to identify emergence diminishes. When this happens, the value of human judgment fades.
The long-term risk is ironically that consultants and managers may become even less effective than the LLM in my first experiment. They may lose the reasoning skills that once justified their roles while failing to develop the technical literacy needed to work effectively with algorithmic systems.
Inductive and Abductive Reasoning
Erosion of reasoning matters because organizational improvement depends on these two complementary forms of thought. Machines excel at inductive reasoning, and when constraints are stable (queues, dependencies, distributions), LLM agents can provide a powerful advantage. My second experiment demonstrated this clearly: the agent processed thousands of signals, spotted bottlenecks no human could detect, and continuously refined its assessment in real time.
However, we humans excel at abductive reasoning. We connect seemingly unrelated signals, infer causes from incomplete information, and interpret ambiguity. This was my role in the first experiment. While the LLM provided structure, I broke protocol and guided the team toward insights. Abductive reasoning enables us to navigate emergent phenomena and create meaning where data alone is unclear.
The future of organizational development lies in combining these complementary strengths: machines handling inductive analysis at scale for deterministic and stochastic patterns, and humans using abductive reasoning on emergent phenomena, creating conditions for adaptation, and facilitating complex change initiatives.
However, since even modellable, predictable, deterministic systems can exhibit complex behavior through sensitivity to initial conditions, over-reliance on LLM agents for seemingly predictable patterns could miss underlying dynamics that require human judgment. Ultimately, this can produce highly unstable, even chaotic, turns of events.
Key Drivers
While we’re early in this transition, LLM agent adoption in organizational development will be driven by compelling incentives across organizational layers.
For executives, the motivation is financially clear. COOs and CFOs see opportunities to improve operational efficiency, reduce friction, and increase margins by replacing slow, retrospective analysis with continuous insight. Capital allocation and cost reduction decisions become data-backed rather than guesswork.
For middle managers, who are often asked to do more with less and deliver results despite tight budgets and limited staff, the appeal is equally strong. They can use LLM agents to surface bottlenecks, anticipate overload, and resolve systemic constraints without immediately adding staff. But they can also use LLM agents as input for detecting and connecting weak signals with operational performance, and for monitoring emergence.
For product and project managers, real-time situational awareness becomes actionable intelligence. Flow efficiency, delivery timelines, dependency risks, and shifting priorities can be monitored continuously, enabling intervention before delays compound rather than after crises emerge.
These drivers create unified momentum for adoption. Each role, from executives to development teams and HR, can extract practical benefits from LLM agents analyzing organizational exhaust data. This breadth of incentive ensures the shift toward agent-generated organizational intelligence will emerge as systemic pull from multiple directions rather than top-down imposition.
But every strong pull also meets resistance, and adoption of LLM agents is no exception.
Key Tensions
Despite the potential for LLM agents to replace traditional analysis for deterministic and stochastic patterns, significant tensions will shape how LLM agents are adopted.
Power and surveillance concerns top the list. The same capabilities that identify systemic bottlenecks can enable invasive employee monitoring and performance ranking systems. Employees will distrust any technology that provides management with detailed visibility into individual behaviors.
Another major tension will be professional identity. Consultants and coaches who sell ideology-based change frameworks face obsolescence. This is likely to create resistance both within organizations and the broader industry.
Economy vs. autonomy is another trade-off that complicates implementation. Organizations will be required to balance algorithmic accuracy with human agency, and to determine liability for algorithmic decisions.
These tensions will influence not just whether companies adopt LLM agents, but how they implement them and what safeguards they establish.
What This Means for You
The Narrow Window
Organizations have a brief window to position themselves thoughtfully in this transition. The first step is deciding your ethical stance on algorithmic management. This means figuring out how much visibility is appropriate, and what safeguards will protect your people. The second is starting pilot projects with LLM agents analyzing your organizational data. The cost of postponing this experimentation grows daily as competitors gain advantages from continuous insight.
Engineering managers and directors ought to consider what data streams they already have access to. Jira, GitHub, Slack, incident logs, and deployment metrics are all rich signal sources. LLM agents don’t require significant customization to start surfacing patterns in bottlenecks, delivery risks, and team health indicators. The question isn’t whether this capability will exist, but whether you’ll develop it internally or buy it from vendors who may not understand your specific context.
Individual Career Implications
For Agile Coaches: Your role is splitting in two directions. The framework-driven, periodic intervention model is ending. I’ve written about how zero interest rates created an Agile Coach bubble, and LLM agents represent another singularity for this profession.
The coaches who will remain valuable fall into two categories: specialists who can be deployed exactly when data indicates a need (an agent detecting backlog growth books a refinement expert), and generalists who can integrate mathematical insights with complex change facilitation. Both paths require deep sector knowledge—being an expert in Scrum without understanding your industry’s operational realities won’t be sufficient.
For Engineering Managers and Directors: You need both mathematical literacy and emotional intelligence. While it’s tempting to outsource one aspect, imbalanced managers get overlooked for promotions. When LLM agents flag team overload or predict delivery risks, you’ll need to interpret those signals within your organization’s culture and constraints, then lead the human response to technical insights.
The INGKA IKEA Agile Team Coaches offer a preview of this future with their FÖRBÄTTRA program: standardized, on-demand coaching sessions that teams request as needed. The logical next step is LLM agents booking these sessions directly based on detected patterns.
Two Types of Survivors
Looking ahead, I see two types of practitioners who remain relevant:
On-Demand Specialists: When agents detect specific problems, they’ll instantly connect teams with targeted expertise. This collapses the gap between problem detection and resolution from months to days. Instead of embedded coaches, we’ll see high-leverage specialists deployed precisely when data calls for their intervention.
Integration Leaders: Consultants and managers with deep sector knowledge who can translate agent-generated insights into complex organizational change. These leaders navigate emergent phenomena that resist mathematical prediction—culture shifts, trust issues, and the human responses to algorithmic management itself.
The Broader Transformation
This shift extends beyond individual roles. Organizational development as a whole is moving from episodic, narrative-driven interventions to continuous, probability-based operations. Companies already generate the data needed to understand their dynamics in real-time. The step from storing this data to analyzing it continuously is small, but the payoff is transformational.
The same visibility that enables improvement can enable surveillance. Over-reliance on machine insights can erode the human judgment needed for emergent challenges. And there’s temptation to apply mathematical agents to problems they cannot solve—the phase shifts and cultural dynamics that define organizational health.
These aren’t reasons to reject LLM agents, but reasons to use them wisely. The organizations that succeed will combine machine-generated insights about deterministic patterns with human leadership for everything that emerges beyond the mathematics.
The change will happen faster than we expect. Consider what this means for your organization, and for you individually.
Napkin-Operational Analysis
Until LLM agents are integrated into companies’ technical ecosystems, many mathematical models can still be useful on their own. To help consultants, coaches, and managers become more effective with their interventions and gain more precise situational awareness, I’ve developed a free application called Napkin-Ops.
Napkin-Ops allows users to manually specify some of the deterministic and stochastic data they have for their organizations, and in return they get team dependency cost assessments and visualizations of how queues form and work flows through their organizations. They can also use this data for build vs. buy decisions, for reallocating capital between team missions, and much more. There is a premium version with collaboration features, advanced analysis, and Monte Carlo simulations, but the free version provides the basics that can be helpful for organizational development.