Trust, But Verify in the Age of AI

For most of my career in engineering, the phrase “trust but verify” showed up in predictable places: production deployments, risky migrations, or code reviews before a release. You trusted the system enough to move forward, but you verified because experience had taught you something simple—systems fail in ways that are rarely obvious.

Today an engineer can ask an AI tool to write a function, debug an error, design an API, or summarize documentation. Within seconds, the tool returns something that looks structured, confident, and complete. Often it’s genuinely useful. But usefulness depends heavily on whether the person using the tool understands what they are looking at.

For experienced engineers, AI mostly acts as an accelerator. You already have a mental model of how the system works, so when AI produces an answer you immediately start interrogating it. Does this actually solve the requirement? What assumptions are hidden here? What happens in failure cases? Where could this break in production? Because you know what “correct” should roughly look like, AI becomes a fast starting point rather than the final answer.

For junior engineers, the dynamic can be very different.

Early in your career, you’re still building that internal model of how software behaves under real conditions. Code that compiles and passes a simple test often looks correct. If an AI tool produces something that appears clean and well explained, it’s easy to trust it. The problem is that software rarely breaks where the code looks messy. It breaks in assumptions, edge cases, and system interactions that only appear under production traffic. Without the experience to question those assumptions, AI can quietly accelerate mistakes rather than prevent them.

A recent news example illustrates the risk. Reports earlier this month described Amazon holding an internal engineering review after several outages affecting its retail systems and cloud services. Some incidents involved AI-assisted changes with unusually large blast radii, prompting leadership to examine reliability practices and safeguards. Amazon clarified that AI itself was not directly writing the failing code in those incidents. In at least one case, an engineer followed inaccurate guidance inferred by an AI agent from outdated internal documentation, which contributed to a failure.

The important lesson from the story isn’t that AI is unreliable. The lesson is that AI outputs still require verification, especially in systems where reliability matters.

AI changes the speed at which engineers can generate code and deploy changes. If the rate of change grows faster than the system’s ability to review, test, and validate those changes, the overall reliability of the system declines. In large-scale production environments, speed without verification is how outages happen.

That’s why critical systems need explicit safeguards around AI-assisted work.

A Scalable Framework for “Trust but Verify”

If teams are going to use AI in development (and in my opinion they should, to boost productivity), verification cannot be left to individual judgment. It needs to be systematic and scalable. A simple framework looks like this:

1. Source Verification (Input Layer)

Before trusting an AI suggestion, verify the inputs.

Questions to ask:

  • What documentation or data is the model relying on?

  • Is the documentation current?

  • Are system constraints explicitly defined?

Mechanism:

  • Maintain versioned internal documentation

  • Require explicit references when AI suggests operational changes

AI is powerful, but it inherits the accuracy of its inputs.
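
To make the input layer concrete, here is a minimal sketch of what such a gate could look like: a check that refuses to act on an AI suggestion unless it cites documentation that is registered and still current. The DocRegistry contents, the Suggestion fields, and the 180-day staleness threshold are all hypothetical, invented for illustration; a real system would plug into your actual documentation store.

    from dataclasses import dataclass
    from datetime import datetime, timedelta

    # Hypothetical registry of versioned internal docs: doc_id -> last-reviewed date.
    DOC_REGISTRY = {
        "runbook/payments-failover": datetime(2026, 1, 10),
        "runbook/cache-invalidation": datetime(2024, 3, 2),
    }

    MAX_DOC_AGE = timedelta(days=180)  # illustrative staleness threshold

    @dataclass
    class Suggestion:
        summary: str
        cited_docs: list[str]  # references the AI tool must supply

    def verify_sources(suggestion: Suggestion, now: datetime) -> list[str]:
        """Return a list of problems; an empty list means the inputs check out."""
        problems = []
        if not suggestion.cited_docs:
            problems.append("no documentation cited; reject by default")
        for doc_id in suggestion.cited_docs:
            reviewed = DOC_REGISTRY.get(doc_id)
            if reviewed is None:
                problems.append(f"unknown doc: {doc_id}")
            elif now - reviewed > MAX_DOC_AGE:
                problems.append(f"stale doc (last reviewed {reviewed:%Y-%m-%d}): {doc_id}")
        return problems

The design point is the default: an uncited suggestion is rejected, not waved through, which forces the “explicit references” requirement above to be machine-checkable.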

2. Automated Validation (Test Layer)

Every AI-generated or AI-assisted change should pass deterministic safeguards.

Examples:

  • Unit tests and integration tests

  • Static analysis and security scanners

  • Contract tests for APIs

  • Simulation or staging environments

Mechanism:

  • CI pipelines that automatically block deployment if validation fails

Automation scales verification faster than humans alone can.
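
In CI terms, the gate can be as simple as a script that runs every deterministic check and refuses to continue on the first failure. The sketch below assumes pytest, ruff, and bandit purely as placeholders; substitute whatever commands your pipeline actually runs.

    import subprocess
    import sys

    # Placeholder commands; swap in your project's real test and scan steps.
    CHECKS = [
        ("unit tests", ["pytest", "tests/unit"]),
        ("integration tests", ["pytest", "tests/integration"]),
        ("static analysis", ["ruff", "check", "."]),
        ("security scan", ["bandit", "-r", "src"]),
    ]

    def run_gate() -> int:
        """Run each check in order; a nonzero exit code blocks deployment."""
        for name, cmd in CHECKS:
            result = subprocess.run(cmd)
            if result.returncode != 0:
                print(f"BLOCKED: {name} failed; deployment will not proceed")
                return 1
        print("all checks passed; deployment may proceed")
        return 0

    if __name__ == "__main__":
        sys.exit(run_gate())

Wired in as a required CI step, this makes the block automatic: no human has to remember to run the checks on an AI-assisted change.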

3. Human Review (Judgment Layer)

Certain decisions require engineering judgment.

Examples:

  • Architecture changes

  • Infrastructure modifications

  • Changes to critical production paths

Mechanism:

  • Code reviews by experienced engineers

  • Design reviews for system-level changes

  • Explicit sign-offs for high-risk systems

In fact, after recent incidents, Amazon reportedly moved toward requiring senior engineer approval for AI-assisted production changes in some systems.

Human review remains the final safety boundary.
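
One lightweight way to encode that boundary is a routing rule: any AI-assisted change that touches a critical path cannot merge without explicit senior sign-off. The path prefixes and reviewer roles below are hypothetical, chosen only to illustrate the shape of such a policy.

    from dataclasses import dataclass

    # Illustrative prefixes for critical production paths.
    CRITICAL_PATHS = ("infra/", "deploy/", "payments/")

    @dataclass
    class Change:
        paths: list[str]
        ai_assisted: bool
        approvals: set[str]  # roles that have signed off, e.g. {"peer", "senior"}

    def required_approvals(change: Change) -> set[str]:
        """Every change needs a peer review; high-risk AI-assisted ones also need a senior."""
        required = {"peer"}
        touches_critical = any(p.startswith(CRITICAL_PATHS) for p in change.paths)
        if change.ai_assisted and touches_critical:
            required.add("senior")
        return required

    def may_merge(change: Change) -> bool:
        return required_approvals(change) <= change.approvals

The same rule could live in a CODEOWNERS file or a merge-queue policy; what matters is that judgment is required by the system, not left to habit.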

4. Blast Radius Control (Deployment Layer)

Even verified systems can fail.

Mechanism:

  • Gradual rollouts

  • Feature flags

  • Canary deployments

  • Automatic rollback triggers

This ensures mistakes remain small rather than catastrophic.
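
Here is a minimal sketch of a staged rollout loop with an automatic rollback trigger. The route_traffic, get_error_rate, and rollback callables are stand-ins for your deployment tooling, and the stages, soak time, and error threshold are illustrative assumptions, not recommendations.

    import time

    STAGES = [0.01, 0.05, 0.25, 1.0]   # fraction of traffic per stage (illustrative)
    ERROR_THRESHOLD = 0.02             # roll back if the error rate exceeds 2%
    SOAK_SECONDS = 600                 # how long each stage bakes before expanding

    def canary_rollout(deploy_id: str,
                       route_traffic,      # callable: (deploy_id, fraction) -> None
                       get_error_rate,     # callable: (deploy_id) -> float
                       rollback) -> bool:  # callable: (deploy_id) -> None
        """Expand traffic stage by stage; roll back automatically on elevated errors."""
        for fraction in STAGES:
            route_traffic(deploy_id, fraction)
            time.sleep(SOAK_SECONDS)  # let real traffic exercise the change
            if get_error_rate(deploy_id) > ERROR_THRESHOLD:
                rollback(deploy_id)
                return False  # failure stayed capped at this stage's traffic share
        return True  # fully rolled out

The key property is that the worst case is bounded: a bad change that slips past every earlier layer still fails at 1% of traffic, not 100%.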

The Real Skill AI Doesn’t Replace

The engineers who benefit most from AI aren’t the ones who trust it the most. They are the ones who know how to question it. AI can generate answers quickly, but engineering has never been about generating answers. It’s about validating them against reality. And that principle hasn’t changed. If anything, AI makes it more important.

Trust the tool enough to move forward.
Verify the result enough to rely on it.

© Sasi Pagadrai | 2026