
Can AI Agents Safely Automate Infrastructure as Code?

May 28, 2026

James Daisley, Business Solutions Expert

A massive network overhaul in a major bank serves as a testament to both the power and the terror of automated infrastructure management in the modern cloud landscape. When TD Bank embarked on a journey to rebuild the digital foundations of 1,300 branch locations, the goal was not just modernization but an exploration of the boundaries of human-machine collaboration. Utilizing nearly 13,000 lines of code to facilitate 90-minute, non-disruptive rebuilds, the project became a flashpoint for a larger conversation: can artificial intelligence be trusted with the keys to the kingdom? This dilemma lies at the heart of the current shift in Infrastructure as Code (IaC), where the deterministic certainty of traditional scripts meets the probabilistic flexibility of generative models.

This transition represents more than a simple upgrade in tooling; it is a fundamental reconfiguration of how enterprise stability is maintained. As organizations across the financial, healthcare, and technology sectors experiment with AI agents, they are discovering a landscape fraught with hallucinations and silent failures. The promise of near-instantaneous provisioning is countered by the non-negotiable requirement for security and compliance. Consequently, the industry is currently defining the rules of engagement for a world where AI proposes, but humans—aided by rigid guardrails—dispose. The following analysis explores how the world’s most risk-averse institutions are navigating this new reality.

The 90-Minute Rebuild: Why Major Financial Institutions Still Hesitate to Grant AI Full Autonomy

The experience at TD Bank illustrates the “abundant caution” that characterizes the current state of AI adoption in high-stakes environments. During their massive Canadian branch network modernization, the bank integrated Microsoft Copilot with the Red Hat Ansible Automation Platform. While the AI was instrumental in generating the thousands of lines of code required for the project, the bank strictly limited the agent’s scope. AI was tasked with writing tests and generating repetitive code segments rather than defining the core architectural logic. This decision reflects a broader trend among financial giants: AI is viewed as a high-speed assistant, not a sovereign pilot.

Leadership at such institutions frequently emphasizes that it is not “open season” for AI in production environments. At TD Bank, for instance, the AI was responsible for approximately 15% of the total workload, primarily focusing on low-risk tasks that would have otherwise consumed hundreds of human hours. The core runtime playbooks—the instructions that actually execute changes on live systems—remained entirely within the domain of human expertise. This hybrid approach allows for significant productivity gains without exposing the bank to the unpredictable nature of an autonomous agent operating without a safety net.

The hesitation stems from the inherent nature of banking infrastructure, where a single misconfigured firewall or an incorrectly provisioned server can result in millions of dollars in losses or severe regulatory penalties. By keeping the AI “in a box” and training it on high-quality, pre-existing internal standards, the bank ensured that the generated code adhered to strict organizational styles. This approach suggests that while AI can accelerate the journey toward a 90-minute rebuild, the destination must still be mapped out by engineers who understand the gravity of every line in a configuration file.

The Evolution of Infrastructure: Moving from Static Scripts to Generative Models

Infrastructure as Code has traditionally been a world of absolute certainty. Engineers used tools like Terraform, OpenTofu, or Ansible to write deterministic configurations and playbooks where every action was explicitly defined. If a developer wanted to provision a virtual machine, they wrote the exact specifications, and the machine followed those instructions to the letter. However, the rise of AI agents is shifting the paradigm from writing syntax to expressing intent. Instead of coding the “how,” developers are beginning to describe the “what” in natural language, allowing AI models to translate those desires into functional code.

This evolution is often compared to the transition from manual command-line interfaces to the World Wide Web. Early adopters are experimenting with tools that replace traditional Internal Developer Platforms (IDPs)—which are often static and expensive to maintain—with conversational interfaces. In this new model, a developer might tell an agent to “provision a secure staging environment for a new microservice,” and the agent would ideate the necessary configuration in a sandbox. This shift promises to democratize infrastructure management, allowing developers to focus on application logic rather than the complexities of cloud-provider APIs.

Despite this progress, the transition remains grounded in a “deterministic handoff.” Even the most advanced generative models are currently used as translators that bridge the gap between human intent and the rigid code that machines require. Once an AI agent generates a design, that design is converted into stable, repeatable IaC scripts. This ensures that the final deployment remains predictable and auditable. The goal is not to eliminate the deterministic layer, but to use AI as an intelligent coordinator that makes that layer more accessible and dynamic for the modern enterprise.
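The handoff described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's implementation: the field names in ALLOWED_FIELDS and the function render_deterministic are hypothetical. The idea is that an AI-proposed design is accepted only if it uses known fields, and it is then rendered the same way every time so the result can be diffed and audited.

```python
import json

# Hypothetical allow-list of fields a proposed environment may set.
ALLOWED_FIELDS = {"name", "instance_type", "region", "replicas"}


def render_deterministic(proposal: dict) -> str:
    """Turn an AI-proposed design into a stable, reviewable artifact.

    Unknown fields are rejected rather than passed through, and keys are
    sorted so the same proposal always renders to the same text.
    """
    unknown = set(proposal) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    return json.dumps(proposal, sort_keys=True, indent=2)
```

Because the output depends only on the content of the proposal and not on the order in which the agent emitted it, two runs that describe the same environment produce identical text in version control.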

Identifying the Technical Risks: The Danger of Fabricated Code and Silent Failures

The primary technical hurdle preventing full AI autonomy is the phenomenon of model hallucinations. In the context of infrastructure, a hallucination is not just a factual error; it is the fabrication of non-existent variables, provider arguments, or security protocols. For example, a medical device company recently reported that while using AI agents to scaffold OpenTofu modules, the agent consistently suggested infrastructure variables that simply did not exist in the official documentation. Such errors are particularly dangerous because they often look technically plausible to the untrained eye and can even pass some basic validation checks.
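One inexpensive defense against fabricated variables is to compare what the agent wrote with what the module actually declares. The sketch below assumes the declared variable names have already been extracted into a set; the function name and the sample data are hypothetical. It uses Python's standard difflib module to suggest the closest real name when there is one.

```python
import difflib


def find_unknown_arguments(declared: set[str], generated: dict) -> dict:
    """Map each argument the module does not declare to its closest real name.

    The value is None when no declared name is similar enough to suggest.
    """
    report = {}
    for name in sorted(generated):
        if name in declared:
            continue
        close = difflib.get_close_matches(name, sorted(declared), n=1)
        report[name] = close[0] if close else None
    return report


# Hypothetical module interface and agent output, for illustration only.
declared = {"instance_type", "subnet_id", "tags"}
generated = {
    "instance_type": "small",
    "subnet_ids": ["subnet-a"],   # near miss of a real variable
    "enable_auto_healing": True,  # not declared anywhere
}
report = find_unknown_arguments(declared, generated)
```

A check like this only catches names that do not exist. It says nothing about whether a real variable was given a sensible value, so it complements human review rather than replacing it.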

The risk of “silent failures” is another major concern for DevOps teams. An AI agent might generate a script that successfully provisions a resource but ignores critical compliance tags or security guardrails. Because the code is syntactically correct, it might be deployed without triggering alarms, only for the organization to discover a massive security hole or a budget-draining resource leak weeks later. Unlike human engineers, AI agents currently lack the “self-awareness” to flag their own uncertainty or to understand the broader business context that makes a specific configuration “wrong” despite being technically functional.
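A silent failure of the missing-tag kind can be made loud with a small policy check that runs before deployment. The following is a sketch under assumptions: the required tag names and the shape of the resource dictionaries are hypothetical, standing in for whatever an organization's own compliance policy and plan output look like.

```python
# Hypothetical compliance policy: every resource must carry these tags.
REQUIRED_TAGS = {"owner", "cost_center", "data_classification"}


def missing_tags(resources: list[dict]) -> dict[str, list[str]]:
    """Return, per resource name, the required tags that are absent.

    Resources that satisfy the policy do not appear in the result, so an
    empty dict means the whole plan passes.
    """
    report = {}
    for resource in resources:
        absent = REQUIRED_TAGS - set(resource.get("tags", {}))
        if absent:
            report[resource["name"]] = sorted(absent)
    return report
```

Wiring a check like this into the pipeline as a blocking step means that syntactically valid but non-compliant code fails at review time instead of weeks after deployment.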

Furthermore, the lack of contextual memory in many standard LLMs can lead to inconsistencies across a large-scale project. If an agent generates one module on Monday and another on Tuesday, there is no guarantee that the two will be compatible or follow the same naming conventions unless strict internal standards are enforced. This necessitates a rigorous verification process where every piece of AI-generated code is scrutinized for “fabricated logic.” The consensus among technical leads is that AI agents are currently “pathologically confident,” making them excellent at generating drafts but dangerous as final authorities.
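Enforcing a naming standard mechanically is one way to keep Monday's module consistent with Tuesday's. The sketch below assumes a hypothetical convention of team, environment, and purpose joined by hyphens; the pattern itself is an illustration, not a standard taken from any of the organizations mentioned here.

```python
import re

# Hypothetical convention: <team>-<env>-<purpose>, lowercase, hyphen-separated.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9]*-(dev|staging|prod)-[a-z][a-z0-9-]*")


def nonconforming_names(names: list[str]) -> list[str]:
    """Return the resource names that do not follow the convention."""
    return [name for name in names if not NAME_PATTERN.fullmatch(name)]
```

Running the same check over every generated module, regardless of which session produced it, gives the project the consistency that the model's own memory cannot guarantee.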

The Changing Role of the Engineer: Transitioning from Writing Code to Reviewing Logic

As AI agents take over the “grunt work” of writing boilerplate code, the role of the infrastructure engineer is undergoing a profound transformation. The value of a modern engineer is no longer measured by their ability to memorize syntax or write scripts from scratch. Instead, the focus has shifted toward architectural oversight and logical verification. Engineers are becoming “curators of intent,” responsible for ensuring that the AI’s output aligns with the organization’s long-term goals and security posture. This requires a deeper level of system knowledge, as catching a subtle AI error is often more difficult than writing the code correctly in the first place.

This shift has led to the emergence of the “human-in-the-loop” requirement as a standard industry practice. At organizations like EY, even the most advanced coding agents are restricted by a hard stop at the production line. Every pull request generated by an AI must be reviewed by a human who can explain every resource argument and justify every architectural choice. This ensures that the “why” behind the infrastructure remains a human responsibility. The engineer’s job is to act as the final “veto power,” preventing the deployment of code that might be efficient but ultimately violates compliance or operational standards.

Moreover, the requirement for engineers to provide detailed explanations for their code—even when generated by AI—is becoming a critical part of the modern workflow. This practice forces a level of rigor that prevents teams from becoming over-reliant on automated tools. If an engineer cannot explain why a particular variable was used, they cannot approve the deployment. In this sense, AI is not replacing the engineer; it is raising the bar for what an engineer must know. The ability to debug “black box” logic is now just as important as the ability to design a cloud architecture.
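The rule that an engineer must be able to explain every argument can be turned into an approval gate. The sketch below is one possible shape for such a gate, with hypothetical function names and data structures: it takes the arguments used by each resource in a pull request and the reviewer's written explanations, and it refuses approval while any argument is unexplained.

```python
def unexplained_arguments(
    resource_args: dict[str, list[str]],
    explanations: dict[str, dict[str, str]],
) -> list[str]:
    """List every 'resource.argument' that lacks a non-empty explanation."""
    missing = []
    for resource, args in resource_args.items():
        notes = explanations.get(resource, {})
        for arg in args:
            if not notes.get(arg, "").strip():
                missing.append(f"{resource}.{arg}")
    return sorted(missing)


def can_approve(
    resource_args: dict[str, list[str]],
    explanations: dict[str, dict[str, str]],
) -> bool:
    """Approval is allowed only when nothing is left unexplained."""
    return not unexplained_arguments(resource_args, explanations)
```

The gate cannot judge whether an explanation is correct, only whether one exists. Its value lies in forcing the reviewer to engage with each argument before the change can move forward.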

Expert Perspectives: How Industry Leaders Balance Productivity with Verification

Industry leaders are increasingly adopting a “trust but verify” strategy, using multiple layers of defense to manage AI-generated infrastructure. EY’s Global Tax Platform, for example, uses an agent named “IBM Bob” to assist with modernization and Terraform tasks. While the platform allows the agent significant “agentic autonomy” during the development phase, letting it iterate on solutions and remember past decisions, the actual deployment is governed by external cloud-level guardrails. These guardrails, implemented in platforms like AWS or Azure, act as a hard limit on what the AI can do, preventing catastrophic errors regardless of what the code says.

Vendors like Red Hat and IBM are also shaping the narrative by positioning AI agents as “triggers” for existing, verified automation. The introduction of visual automation orchestrators allows organizations to manage three distinct modes: human-approved actions, supervised agentic workflows, and eventually, autonomous operations for low-risk systems. This vision turns tools like Ansible into “playbook engines.” In this model, the AI handles the high-level task determination, but the execution is carried out by pre-validated, deterministic playbooks that have been tested and approved by human experts over many years.
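The separation between an agent that chooses a task and a catalogue of pre-validated playbooks that performs it can be sketched as a small decision function. Everything here is an assumption made for illustration: the playbook names, the risk levels, and the exact behavior of each of the three modes are a simplification of the idea, not a description of any Red Hat or IBM product.

```python
from enum import Enum


class Mode(Enum):
    HUMAN_APPROVED = "human_approved"
    SUPERVISED = "supervised"
    AUTONOMOUS = "autonomous"


# Hypothetical catalogue of pre-validated playbooks and their risk levels.
PLAYBOOK_RISK = {
    "restart_service": "low",
    "rotate_certs": "medium",
    "rebuild_branch_network": "high",
}


def decide(playbook: str, mode: Mode, approved_by_human: bool = False) -> str:
    """Decide what happens when an agent requests a playbook run."""
    risk = PLAYBOOK_RISK.get(playbook)
    if risk is None:
        return "reject"  # the agent may only trigger catalogued playbooks
    if approved_by_human:
        return "run"
    if risk == "low" and mode is Mode.AUTONOMOUS:
        return "run"
    if risk == "low" and mode is Mode.SUPERVISED:
        return "run_and_notify"
    return "await_approval"
```

In this sketch the agent never supplies commands of its own. It can only name a playbook that already exists, and anything above low risk waits for a person in every mode.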

This layered approach provides the audit trails and approval gates necessary for regulated industries. Experts suggest that the future of infrastructure lies in this “orchestrated autonomy,” where AI provides the intelligence and humans provide the boundaries. By separating the “thinking” (AI) from the “doing” (deterministic automation), companies can achieve the speed of AI while maintaining the reliability of traditional IT. The prevailing sentiment is that the most successful organizations from 2026 onward will be those that build the best “fences” around their AI agents, rather than those that simply give them the most power.

A Practical Framework for Securely Integrating AI into Production Workflows

The journey toward safe AI-driven infrastructure required a fundamental rethink of the standard deployment pipeline. Organizations discovered that the most effective strategy involved treating AI-generated code with a higher level of suspicion than human-authored work. This led to the widespread adoption of “deterministic handoffs,” where AI agents operated strictly in sandbox environments to ideate and prototype. Once the proposed configuration reached a state of maturity, it was translated into hardened, version-controlled scripts that underwent rigorous automated testing. The industry prioritized a workflow where AI could experiment, but only human-vetted logic reached the core of the business.

Security frameworks also evolved to include cloud-native guardrails that functioned independently of the infrastructure code itself. By implementing service control policies and budget alerts at the provider level, teams created a “fail-safe” that could catch AI errors before they resulted in outages. Furthermore, the practice of “logic reviews” replaced simple “code reviews,” as engineers began to focus on the intent and the relationships between resources rather than just checking for syntax errors. This roadmap established that the value of AI was found in its ability to accelerate the drafting process, while the value of the human remained in the final validation and strategic alignment.
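A provider-level guardrail of the kind described above lives outside the infrastructure code, so it applies no matter what an agent generates. As a minimal sketch, the Python below builds a policy document in the JSON shape used by AWS service control policies, denying EC2 actions outside a list of approved regions. The function name, the statement ID, and the region list are hypothetical, and a real policy would need testing in a non-production organization before use.

```python
import json


def region_deny_policy(allowed_regions: list[str]) -> dict:
    """Build an SCP-style document denying EC2 calls outside allowed regions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyEc2OutsideApprovedRegions",
                "Effect": "Deny",
                # Scoped to EC2 on purpose: a blanket deny would also block
                # global services that are not tied to a region.
                "Action": "ec2:*",
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {
                        "aws:RequestedRegion": sorted(allowed_regions)
                    }
                },
            }
        ],
    }


# Hypothetical region list, for illustration only.
policy = region_deny_policy(["us-east-1", "ca-central-1"])
print(json.dumps(policy, indent=2))
```

Because the deny is evaluated by the cloud provider, a generated script that targets an unapproved region fails at the API call even if it passed every review along the way.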

Ultimately, the successful integration of AI agents into Infrastructure as Code was defined by a refusal to sacrifice stability for the sake of speed. The industry recognized that while probabilistic models offered incredible flexibility, the digital foundations of society still demanded deterministic certainty. The path forward emerged as a hybrid model: a conversational, AI-driven interface for planning and development, backed by a rigid, human-governed execution layer. Through this disciplined approach, organizations were able to harness the productivity of the machine without surrendering the oversight that keeps the modern world’s digital infrastructure standing.