AI · Apr 8, 2026 · 13 min read

How I Taught My AI Agents to Stop Lying and Start Asking Questions

My agents used to lie about what they tested. Now they can't. And they ask questions instead of assuming.

Byron Jacobs
Senior Developer

This article is the sequel to The AI Architecture I Built to Automate Every Project I Touch, which introduced the project rules generator. If you have not read it, this post stands on its own, but the original gives you the full backstory.

The problem I did not see coming

When I released the project rules generator, it solved the problem I set out to solve. My AI agents had persistent rules, automatic documentation, task tracking, and a verification pipeline that stopped them from calling things done when they were not. One markdown file generated the entire system. It worked.

What I did not anticipate was the ceiling.

The generator only produced output for GitHub Copilot. Instruction files under .github/instructions/, prompt files under .github/prompts/, everything wired for one editor. If you used Cursor, you had nothing. If your team ran Codex, you had nothing. If someone on your team preferred Windsurf or Cline, same story. The rules were good. The reach was narrow.

I also discovered something more fundamental. The agents were following the rules while gaming their spirit. I had a verification gate that said "verify before marking complete." So the agent would read its own code, declare it verified, and move on. No browser. No runtime test. No proof. It satisfied the letter of the rule while completely ignoring the point.

And the planning was worse. I had a prompt that said "create a plan before coding." The agent would write a plan, then immediately start implementing in the same response. It treated planning as a formality, a speed bump on the way to doing what it actually wanted to do: write code.

Two problems. One about reach, one about discipline. The v1.1 update fixes both.

Agents that ask before they assume

The biggest behavioral change in v1.1 is something I call the Question Gate. The agent writes the plan first, laying out my initial idea as a structured document. But embedded in that plan are open questions, every ambiguity the agent identified, every assumption it is not sure about. It surfaces those questions in the chat and includes them in the plan itself. Not silently assume the answer. Not pick the most likely interpretation and move on. Ask.

Each question gets a stable ID. Q1, Q2, Q3. I can respond directly in the chat, or I can update the plan itself. Either way, the agent takes my answers and revises the plan accordingly. If my answers raise new questions, those get new IDs, Q4, Q5, and the cycle repeats. The plan keeps evolving with each round until zero open questions remain.
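The question cycle can be modeled as a small state machine: questions get stable IDs, answers move them to Confirmed Inputs, and the plan is approvable only when zero open questions remain. This is my own illustrative sketch, not the generator's actual data format; the class and method names are hypothetical.

```javascript
// Hypothetical model of the Question Gate cycle. Open questions get stable
// IDs (Q1, Q2, ...); answering one moves it to the Confirmed Inputs list;
// the plan can only be approved when no open questions remain.
class QuestionGate {
  constructor() {
    this.nextId = 1;
    this.open = new Map();   // id -> question text
    this.confirmed = [];     // { id, question, answer } entries
  }

  ask(question) {
    const id = `Q${this.nextId++}`;
    this.open.set(id, question);
    return id;
  }

  answer(id, answer) {
    const question = this.open.get(id);
    if (!question) throw new Error(`No open question with id ${id}`);
    this.open.delete(id);
    this.confirmed.push({ id, question, answer });
  }

  canApprove() {
    return this.open.size === 0;
  }
}
```

Note that an answer can itself trigger a new `ask()`, which mints the next ID (Q4, Q5, ...) and blocks approval again, matching the revise-and-repeat loop described above.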

Why does this matter? Because the number one failure mode I saw with AI agents was not bad code. It was wrong assumptions. The agent would interpret a vague requirement one way, build the entire solution around that interpretation, and then I would discover three hours later that it misunderstood the requirement from the start. Three hours of work, thrown away. Not because the agent was dumb. Because it never asked.

The Question Gate makes silent assumptions impossible. Every ambiguity gets surfaced. Every interpretation gets confirmed. The plan is not a static document that gets written once. It is a living draft that gets refined through conversation until both sides agree on what needs to happen.

Answered questions move to a Confirmed Inputs section in the plan document. When I review the final plan before approving it, I can see exactly what the agent asked, exactly what I said, and exactly how that shaped the implementation steps. Full traceability from question to decision to code.

Agents that cannot lie about testing

The old verification system had a gap you could drive a truck through. The rule said "verify your work." The agent's idea of verification was reading its own output and deciding it looked correct. That is not verification. That is self-grading your own homework.

The v1.1 update replaces that with a three-gate blocking pipeline. Three gates, executed in sequence. Each one must pass before the next one unlocks.

Gate 1: Agent Self-Evaluation. The agent asks itself: "Is this genuinely the best solution? Would a staff-level engineer approve this?" If the answer is no, it iterates before moving forward. If the solution feels hacky, it re-implements using the best-known approach before presenting it. This gate catches the lazy first-draft solutions that agents love to ship.

Gate 2: Visual Verification. The agent must use actual browser tools to verify its work: screenshots, DOM snapshots, runtime checks. Reading code does not count. Reading test output does not count. The agent has to open a browser and look at the result with what amounts to its own eyes. If the task changes UI, the only acceptable evidence is a screenshot proving the UI renders correctly.

But the v1.1 version goes further than "look at it in the browser." It defines a measurement-based verification methodology. The protocol is specific: measure the current state of the affected element before making any change, using getComputedStyle() for spacing, getBoundingClientRect() for dimensions, or whatever method matches the property type. Then make the change. Save. Refresh the preview from disk, not from in-memory state. Re-measure the exact same element. Compare before and after. Every option value gets cycled through, not just the first one.
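The heart of that protocol is the compare step: a change counts as verified only if it produced a measurable difference. Here is a minimal sketch of that logic. In the real browser session the numbers would come from `getComputedStyle(el)` and `el.getBoundingClientRect()`; I pass them in as plain objects so the comparison stands alone, and the function and element names are my own illustration.

```javascript
// Compare step of the measure -> change -> re-measure protocol.
// A change "verifies" only if at least one measured property differs.
function compareMeasurements(before, after) {
  const changed = [];
  const unchanged = [];
  for (const prop of Object.keys(before)) {
    if (before[prop] !== after[prop]) {
      changed.push({ prop, before: before[prop], after: after[prop] });
    } else {
      unchanged.push(prop);
    }
  }
  return { verified: changed.length > 0, changed, unchanged };
}

// Browser-side capture would look roughly like this (illustrative id):
// const el = document.getElementById("settings-spacing-preview");
// const before = {
//   marginTop: getComputedStyle(el).marginTop,
//   width: el.getBoundingClientRect().width,
// };
// ...make the change, save, refresh from disk, re-measure, then compare.
```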

This killed an entire class of false positives. A field appearing in an editor sidebar proves nothing. It only means the schema parsed correctly. The value has to reach the template and produce a measurable visual difference. Before this rule, agents would see a dropdown in the admin panel and declare "the feature works." Now they have to prove the dropdown's value actually changes something on the page.

For projects with state transitions like publish/draft/pending workflows or cache invalidation, there is a Status/Cache Matrix rule on top of this. The agent runs the full transition matrix across all touched surfaces: publish to draft, publish to pending, publish to unpublish. Each transition gets verified end-to-end, including cache artifact removal and front-end access outcomes.
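Enumerating that matrix is mechanical: every ordered pair of distinct states becomes one end-to-end check. A hedged sketch of what I understand the matrix expansion to be (the function name is mine, not the generator's):

```javascript
// Build the full status transition matrix: every ordered (from, to) pair of
// distinct states becomes one end-to-end verification: perform the transition,
// confirm cache artifacts are removed, confirm the front-end access outcome.
function buildTransitionMatrix(states) {
  const matrix = [];
  for (const from of states) {
    for (const to of states) {
      if (from !== to) matrix.push({ from, to });
    }
  }
  return matrix;
}
```

For a publish/draft/pending workflow that yields six transitions, each verified separately rather than assuming symmetry.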

Gate 3: User Verification. The agent stops. It summarizes what changed and what it verified. It asks me to check. It waits. It does not proceed to documentation or commit until I either confirm or explicitly waive verification. Every summary includes a mandatory "Not Tested" section. If anything was skipped, the agent has to disclose it right there. If any scenario was untested or blocked, the agent must report it before moving forward.

These gates are blocking. The agent cannot document the change until all three pass. It cannot commit until all three pass. There is also a Documentation Timing Gate that sits on top: documentation updates are blocked until verification is complete. No premature doc updates that describe behavior nobody has proven works yet.

And then there is the Scenario Evidence Rule sitting underneath all of it. The agent cannot claim work is complete unless every relevant behavior path was executed end-to-end. Not the happy path only. Every path. If a scenario needs test data that does not exist, the agent has to create a temporary test fixture to exercise it. "Nothing currently triggers this path" is never a valid excuse. If the agent still cannot verify something after trying, it reports the blocker, leaves the changes uncommitted, and stops.

That last part, the Blocker Protocol, was the missing piece. Before v1.1, an agent that hit a verification wall would log it as an "anomaly" and commit anyway. Now it cannot. Unverified work does not get committed. Period. The agent stops, reports the problem, and waits. Changes stay staged but uncommitted until the blocker is resolved.

Every element gets an ID

One rule that makes all of this verification possible is deceptively simple: every rendered HTML element must have a descriptive id attribute. The naming convention is {page}-{section}-{element} in kebab-case. Dynamic list items use the item's unique key. Nested elements maintain the parent hierarchy in their prefix.

This is not about testing convenience. It makes every element on the page addressable, queryable, and verifiable by the agent. Combined with the measurement methodology, the agent can programmatically confirm that a specific element exists, is sized correctly, and responds to configuration changes. No more relying on fragile CSS selectors or XPath queries. No more "I think this is the right div." The element has an ID. The agent queries it directly. The measurement is exact.
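To make the convention concrete, here is a small sketch of how the `{page}-{section}-{element}` IDs could be built. The `kebabCase` and `elementId` helpers are my own illustration of the rule, not code from the generator.

```javascript
// Convert a name to kebab-case: "userProfile" -> "user-profile".
function kebabCase(s) {
  return s
    .replace(/([a-z0-9])([A-Z])/g, "$1-$2")
    .replace(/[\s_]+/g, "-")
    .toLowerCase();
}

// Build a {page}-{section}-{element} id; dynamic list items append the
// item's unique key so every rendered row stays individually addressable.
function elementId(page, section, element, itemKey) {
  const parts = [page, section, element].map(kebabCase);
  if (itemKey !== undefined) parts.push(kebabCase(String(itemKey)));
  return parts.join("-");
}

// The agent can then query the element directly, no fragile selectors:
// const el = document.getElementById(elementId("settings", "userProfile", "saveButton"));
```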

Planning that actually means planning

I mentioned the old planning problem. Agents treating the plan as a formality and jumping straight to code. The v1.1 update introduces what I call the Anti-Implementation Gate. It is a hard rule baked into the planning prompt: zero code may be written during planning. Not one line. Not a "quick prototype." Not "I will just set up the file structure while I plan." Zero.

The gate redefines how the agent interprets user language. When I say "please update the payment handler to support refunds," the old agent would read that as an instruction to update the payment handler right now. The new agent reads it as a requirement to capture in the plan. Imperative language from the user is treated as requirements input, not as commands to execute. The only files the agent can touch during planning are the plan document itself and todo.md.

This sounds restrictive. It is. That is the point.

Plans now live in lifecycle folders: draft/, pending/, completed/. Every new plan starts in draft/ with a date prefix (YYYY-MM-DD-slug.md). When I approve it, I have three options: "Go" starts implementation immediately, "Pending" moves it to pending/ for later, and "Keep in draft" means it needs more work. When a plan is fully implemented, tested, and committed, it moves to completed/. I can look at the completed/ folder six months from now and see every plan that shipped, with the full context of what was decided and why.

Commits are not optional, and neither is discipline

The v1.0 system had a vague "commit after completion" rule. The v1.1 version turns that into a structured commit workflow. Every commit requires a branch safety check that blocks commits on master or main. Explicit file staging, never git add . or git add -A. Three numbered commit message options that the agent presents for me to pick from. And a separate approval step before push.
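The branch safety check is the simplest piece to illustrate. In the real workflow the current branch would come from `git branch --show-current`; this sketch isolates the rule itself, with names of my own choosing:

```javascript
// Branch safety check: refuse to commit directly on protected branches.
// The protected list and function name are illustrative assumptions.
const PROTECTED_BRANCHES = new Set(["master", "main"]);

function assertCommitAllowed(currentBranch) {
  if (PROTECTED_BRANCHES.has(currentBranch)) {
    throw new Error(
      `Refusing to commit on protected branch "${currentBranch}". ` +
      `Switch to a feature branch first.`
    );
  }
  return true;
}
```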

There is a batch-commit prompt for catching up on accumulated changes and a merge-dev-to-master prompt with safety checks. If no .git/ directory exists, the generator initializes one with a proper .gitignore so the commit workflow is immediately usable.

One safety rule worth noting: the agent may never push or deploy to production. All live deployments are my responsibility.

The system learns from its mistakes

The v1.0 system had a lessons.md file. The v1.1 version turns it into a real self-improvement loop. After any correction, the agent updates lessons.md with a rule preventing the same mistake. The critical part is what happens next: at the start of every new session, the agent reviews those lessons. The corrections persist across sessions. The agent gets better at my project over time because the rules it learned last month are still active today.

This creates a compounding effect. Month one, the agent makes a mistake with my naming conventions. I correct it. Month two, it never makes that mistake again. By month six, the agent has internalized dozens of project-specific patterns that no generic model would ever know.

It works everywhere now

The v1.1 update transforms the generator from a Copilot-only tool into a multi-agent platform. When you run the prompt, Phase 0 asks you to pick your targets before anything else happens. You choose Copilot, Cursor, Codex, or any combination. Phase 0.5 immediately creates a tracked todo list of every phase the generator will execute, so it cannot skip or forget a phase during a long generation run. Phase 1.6 scans for existing instruction files and produces a create-vs-update status table before writing anything, making re-runs surgical rather than destructive.

Each target gets purpose-built output. Copilot gets up to 8 instruction files with applyTo scoping and 8 prompt files. Cursor gets 7 to 9 .mdc rule files with globs and alwaysApply fields. Codex gets AGENTS.md at the project root plus reusable skills under .agents/skills/, and that format is also read natively by Cursor, Windsurf, Cline, and other agents.

Nothing duplicates across targets. Each is self-contained. Shared resources live in the same directories and get referenced by all. The pipeline grew from 13 phases to 24-plus, with conditional execution per target, all from a single prompt file at about 3,300 lines.

Update mode: the feature that makes this sustainable

Most instruction systems are write-once. You generate them, customize them over months of real use, and then you can never re-run the generator because it would overwrite everything you built on top of it.

The v1.1 generator reads before it writes. Every phase checks if the target file already exists. If it does, the generator reads it, diffs mentally against what it would generate fresh, and merges changes in. New rules get added. Existing custom content gets preserved. Task logs, observation history, lessons learned, custom categories. All untouched.

I can re-run the generator on a project that has been active for six months without losing a single custom rule or lesson. Without update mode, the generator is a one-time setup tool. With it, the generator becomes a living system that evolves alongside your project.

Session-aware batching

One overhead problem I noticed in v1.0 was the documentation cycle. Document every task, log every change, commit after completion. The agent obeyed literally. Fix a typo, write the log, update the doc, make the commit. Rename a variable, same thing. The admin work was drowning the actual work.

The Batch Boundary System solves this. The agent implements changes immediately but defers the documentation-and-commit cycle to natural breakpoints: when I change topic, signal completion, roughly five changes pile up, or the session ends. The documentation is still mandatory. But it happens at the right rhythm instead of after every single message.
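The boundary logic itself is small enough to sketch. This is my own model of the behavior described above, with hypothetical event names and a threshold of five:

```javascript
// Hypothetical model of the Batch Boundary System: changes are applied
// immediately, but the documentation-and-commit cycle only fires at natural
// breakpoints (topic change, completion signal, ~5 pending changes, session end).
class BatchBoundary {
  constructor(maxPending = 5) {
    this.maxPending = maxPending;
    this.pending = [];
  }

  recordChange(change) {
    this.pending.push(change);
  }

  shouldFlush(event) {
    return (
      event === "topic-change" ||
      event === "completion-signal" ||
      event === "session-end" ||
      this.pending.length >= this.maxPending
    );
  }

  flush() {
    const batch = this.pending;
    this.pending = [];
    return batch; // documented and committed as a single batch
  }
}
```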

What is next

The v1.1 update turned a Copilot-specific tool into a platform. More agents will emerge. More rule formats will appear. The generator's architecture makes adding new targets straightforward.

The example directory in the repo includes both the v1.0 output (Copilot-only) and the v1.1 output (all three targets). Compare them side by side.

Fork it, pick your targets, run it on your project. Your agents will stop lying about what they tested. They will start asking questions instead of assuming. And every session will pick up exactly where the last one ended. Not because the model got smarter, but because the rules got better.

Get the generator on GitHub →

