With a keen eye for turning massive datasets into compelling visual stories, Chloe Maraina stands at the forefront of business intelligence and data science. Her work focuses on the critical intersection of real-time data integration and the future of AI governance, helping enterprises navigate the complex journey from experimental AI to secure, production-ready systems. In our conversation, we explore the nuances of building trust in autonomous agents through advanced observability, the strategic advantages of a streaming-first approach to AI governance, and the practical steps organizations can take to control both the behavior and the costs of their AI initiatives.
Many AI initiatives struggle to move from experimentation to secure production due to governance concerns. How do new Agentic Data Plane features like the AI Gateway and observability tools specifically address these trust and control barriers for enterprises? Please provide a practical example.
This is the fundamental challenge we’re seeing everywhere. Teams are excited; they build a proof-of-concept agent, but then the security and operations teams pump the brakes, asking, “How can we be sure this thing won’t go rogue?” That’s precisely where these new features come in. The AI Gateway isn’t just a simple pass-through; it’s a centralized control plane. It’s the single chokepoint where you can enforce all your organizational policies. Think of it like a customs checkpoint for all AI interactions. For example, a financial services company could use the gateway to ensure their trading agent never accesses personally identifiable information from a customer database, enforcing a strict policy at the gateway level rather than trying to hardcode it into a dozen different agents. This shifts the dynamic from risky, ad-hoc experimentation to a governed, secure production environment where you have a unified framework for control.
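To make that gateway-level enforcement concrete, here is a minimal sketch in Python of the kind of check such a policy implies. The `GatewayPolicy` class, field names, and agent IDs are hypothetical illustrations, not Redpanda's actual configuration or API.

```python
from dataclasses import dataclass, field

# Hypothetical policy object: not the AI Gateway's real API, just an illustration
# of enforcing a rule once at the gateway instead of inside every agent.
@dataclass
class GatewayPolicy:
    agent_id: str
    blocked_fields: set = field(default_factory=set)

    def check_request(self, agent_id: str, requested_fields: set) -> None:
        """Reject any request that touches a field this agent may not see."""
        if agent_id != self.agent_id:
            return
        violations = requested_fields & self.blocked_fields
        if violations:
            raise PermissionError(
                f"Policy violation for {agent_id}: blocked fields {violations}"
            )

# One policy defined at the gateway covers every instance of the trading agent.
pii_policy = GatewayPolicy(
    agent_id="trading-agent",
    blocked_fields={"ssn", "home_address", "date_of_birth"},
)

try:
    pii_policy.check_request("trading-agent", {"ticker", "order_size", "ssn"})
except PermissionError as exc:
    print(exc)  # the request is blocked; the agent never sees the PII columns
```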
The concept of “glass box” visibility is crucial for trusting autonomous agents. Could you walk us through how the AI observability capabilities, using the OpenTelemetry Protocol, allow a developer to inspect agent behavior and what specific steps they might take to debug an issue using your console?
The term “glass box” is perfect because it captures the shift away from the “black box” mystery that makes so many people nervous about AI. Using a standard like the OpenTelemetry Protocol is key because it provides a common language for observability. When an agent acts, we’re not just seeing the final output; we’re automatically generating a rich stream of metrics, logs, and full transcripts of its “thought process.” Imagine a developer gets an alert that a supply chain optimization agent is making strange recommendations. They can immediately go into the Redpanda console, pull up the complete trace for that agent’s recent actions, and see exactly which data stream it processed, what logic it applied, and what model it queried. They can literally read the transcript of the agent-to-agent communication. This allows them to pinpoint if the error came from bad input data, a flawed model interaction, or a bug in the agent’s logic, and then debug it directly. It’s the difference between guessing what happened and having a detailed, second-by-second replay.
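As a rough illustration of the kind of instrumentation involved, the sketch below uses the standard OpenTelemetry Python SDK to wrap one agent step in a span and export it over OTLP. The agent name, attributes, and collector endpoint are placeholders, and the actual wiring into the Redpanda console is not shown.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Export spans over OTLP to whatever collector endpoint you run; the address
# here is a placeholder.
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("supply-chain-agent")

def recommend_reorder(sku: str, on_hand: int, daily_demand: float) -> int:
    # Each agent step becomes a span, so the full decision path is replayable.
    with tracer.start_as_current_span("recommend_reorder") as span:
        span.set_attribute("agent.input.sku", sku)
        span.set_attribute("agent.input.on_hand", on_hand)
        span.set_attribute("agent.input.daily_demand", daily_demand)
        reorder_qty = max(0, int(daily_demand * 14) - on_hand)  # two-week cover
        span.set_attribute("agent.output.reorder_qty", reorder_qty)
        return reorder_qty

recommend_reorder("SKU-1234", on_hand=40, daily_demand=6.5)
```

With every input and output recorded as span attributes, a developer reviewing a strange recommendation can see exactly which values the agent acted on rather than guessing after the fact.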
Building a centralized AI governance layer on a streaming foundation is a unique approach. How does this differ from traditional data-at-rest platforms, and what kinds of real-time, autonomous agent use cases does this specific architecture unlock for customers?
The difference is night and day, really. Traditional platforms are built around batch processing—they look at data after it has landed in a database or a data lake. That’s fine for historical reporting, but for autonomous agents, it’s like driving by looking in the rearview mirror. A streaming foundation means the governance is applied in-motion, as events happen. This unlocks a whole class of use cases where immediate action is critical. Consider a fraud detection system. An agent built on a streaming platform can analyze transaction data, user behavior, and location information in milliseconds, identify a fraudulent pattern as it’s unfolding, and trigger an immediate kill switch on the transaction. On a data-at-rest platform, you might not discover that fraud until the batch report runs hours later. This real-time capability allows agents to not just analyze but to interact with and shape events as they occur, which is the true promise of agentic AI.
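Because Redpanda speaks the Kafka protocol, an in-motion check of this kind can be sketched with an ordinary Kafka consumer loop. The topic name, broker address, and toy fraud rule below are purely illustrative.

```python
import json
from kafka import KafkaConsumer  # Redpanda is Kafka API-compatible

# Topic name, broker address, and the fraud heuristic are all illustrative.
consumer = KafkaConsumer(
    "card-transactions",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

def looks_fraudulent(txn: dict, recent: list) -> bool:
    # Toy rule: a large charge amid purchases from several different countries.
    window = recent[-5:] + [txn]
    countries = {t["country"] for t in window}
    return txn["amount"] > 5_000 and len(countries) > 2

recent_txns = []
for message in consumer:
    txn = message.value
    if looks_fraudulent(txn, recent_txns):
        # Act while the event is in flight, not hours later in a batch report.
        print(f"Blocking transaction {txn['id']} and alerting the fraud team")
    recent_txns.append(txn)
```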
Beyond technical oversight, cost control is a major concern with AI. How does the AI Gateway provide FinOps controls to manage token budgets and limit spending, and could you share an anecdote about how this prevents costs from spiraling out of control in a multi-agent system?
This is a huge, often underestimated, pain point. The power of multi-agent systems is that they can work together on complex problems, but that can also lead to an explosion in model API calls and cloud usage. The AI Gateway provides a crucial FinOps layer by acting as a central toll booth. You can set hard limits on token usage per agent, per team, or per project. I remember one team was developing a system where a “researcher” agent would pass tasks to several “analyst” agents. During a test run, a bug caused the researcher to send the same complex query in an infinite loop, and the analyst agents just kept executing it. Without a central control, they would have burned through their entire monthly cloud budget in a few hours. But because they had a policy in the AI Gateway limiting the researcher’s token budget per hour, the system automatically throttled the requests after a few minutes, an alert was sent, and the team could fix the bug with minimal financial damage. It turns a potential catastrophe into a manageable operational issue.
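A simplified sketch of the idea follows, assuming a per-agent hourly token budget; the `TokenBudget` class and the numbers are hypothetical stand-ins for the gateway's actual FinOps controls.

```python
import time

class TokenBudget:
    """Illustrative per-agent budget check, not the AI Gateway's real API."""

    def __init__(self, agent_id: str, tokens_per_hour: int):
        self.agent_id = agent_id
        self.tokens_per_hour = tokens_per_hour
        self.window_start = time.monotonic()
        self.spent = 0

    def authorize(self, requested_tokens: int) -> bool:
        # Reset the spend counter at the start of each hour-long window.
        if time.monotonic() - self.window_start >= 3600:
            self.window_start = time.monotonic()
            self.spent = 0
        if self.spent + requested_tokens > self.tokens_per_hour:
            print(f"Throttling {self.agent_id}: hourly token budget exhausted")
            return False
        self.spent += requested_tokens
        return True

budget = TokenBudget("researcher-agent", tokens_per_hour=200_000)

# A buggy researcher agent looping on the same query hits the cap and stops.
for _ in range(100):
    if not budget.authorize(requested_tokens=4_000):
        break  # spend stops here; an alert fires and the team fixes the bug
```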
Your roadmap mentions plans for agent kill switches and enhanced evaluation functionality. Can you elaborate on how these features will work in practice and what specific scenarios would prompt a user to activate a manual or automatic kill switch on an agent?
These features are the next logical step in building enterprise-grade trust. The kill switches, both manual and automatic, are an essential safety net. A manual kill switch is straightforward: an operator in a monitoring dashboard sees an agent behaving erratically—say, a customer service agent is giving nonsensical or offensive answers—and can hit a button to immediately halt all its operations. The automatic kill switch is even more powerful. It would be tied to the observability data. For example, you could set a policy that if an agent’s response latency spikes, or if it starts accessing a data source it’s not supposed to, the system automatically triggers the kill switch. This is critical for preventing cascading failures or security breaches. These features aren’t just about stopping bad behavior; they’re about giving organizations the confidence to deploy agents in more mission-critical roles, knowing they have a reliable off-ramp if things go wrong.
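As a rough sketch of how an automatic trigger might be expressed, assume a policy evaluated against the agent's observability signals; the thresholds, topic names, and `KillSwitchPolicy` class below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class KillSwitchPolicy:
    """Illustrative automatic kill switch driven by observability signals."""
    max_latency_ms: float
    allowed_sources: frozenset

    def should_halt(self, latency_ms: float, data_source: str) -> bool:
        # Halt on a latency spike or on access to an unapproved data source.
        return (
            latency_ms > self.max_latency_ms
            or data_source not in self.allowed_sources
        )

policy = KillSwitchPolicy(
    max_latency_ms=2_000,
    allowed_sources=frozenset({"orders-topic", "inventory-topic"}),
)

# In practice these values would come from the agent's telemetry stream.
if policy.should_halt(latency_ms=350, data_source="customer-pii-topic"):
    print("Kill switch triggered: halting agent and paging the on-call operator")
```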
What is your forecast for agentic AI in the enterprise over the next two years?
Over the next two years, I believe we’ll see agentic AI move from a niche, experimental technology to a core component of enterprise automation, but this will be entirely dependent on the maturity of governance platforms. The conversation will shift from “Can we build an agent?” to “How do we manage a fleet of a thousand agents securely and efficiently?” We will see a consolidation of tools, with platforms that offer integrated governance, observability, and security becoming the standard, much like what happened with container orchestration. The companies that succeed won’t be the ones with the flashiest demos, but the ones who have built a robust, trust-inspiring “nervous system” for their AI, allowing them to deploy agents with confidence, control their costs, and prove their value in real-world, mission-critical operations. The focus will be less on the “magic” of AI and more on the practical engineering of trust.
