Running Your AFK Agent
An autonomous agent can work through your issue backlog, but you need to understand how it behaves first. The key is running it human-in-the-loop (HITL) to observe its decision-making and identify weaknesses in your prompts or issue descriptions.
In this lesson, you're going to run an agent HITL on your issues using the /tdd skill. You'll watch it pick tasks, write tests, implement code, and commit changes. By observing where it struggles, you'll learn how to write better prompts and issues.
Steps To Complete
Understanding the Agent Setup
- Review the
ralph/prompt.mdfile
This is the agent's instruction set. It tells the agent:
- Which issues to work on (AFK vs HITL)
- How to prioritize tasks (bugfixes, infrastructure, features, polish, refactors)
- What feedback loops to run (tests, type checking)
- How to commit with meaningful messages
Notice the prioritization order - development infrastructure like tests and types come before features. This matters because good tests and types make feature work faster and safer.
- Review the
ralph/once.shscript
The script:
- Reads your open issues from the
issues/directory - Reads the last 5 commits to understand what's been done
- Loads the prompt
- Passes everything to Claude with permission to make edits
Note that --permission-mode acceptEdits is included, which means Claude can write and commit code directly. But you'll still be watching and can intervene at any point - that's what makes this HITL.
Setting Up Your First Agent Run
- Make sure you have issues in your
issues/directory
You should have at least 2-5 issues written as markdown files. Each issue should have a clear description of what needs to be done.
- Review the agent prompt structure
The prompt provided in ralph/prompt.md is the agent's instruction set. It defines:
- Task selection priorities (bugfixes, infrastructure, tracer bullets, polish, refactors)
- The workflow to follow (explore, implement with
/tdd, run feedback loops, commit) - How to handle completed vs incomplete issues
- The rule to only work on a single task per run
This structure keeps the agent focused and prevents it from trying to do too much at once.
- Review the
/tddskill in.claude/skills/tdd/
The agent uses the /tdd skill to implement features. Take a moment to look at the skill files to understand the workflow:
SKILL.mddefines the test-driven development process- Supporting files like
tests.mdandmocking.mdprovide guidance on writing good tests
The skill emphasizes vertical slices (one test, one implementation) over horizontal slices (all tests, then all implementation).
Running the Agent Human-in-the-Loop
- Run the
ralph/once.shscript
bash ralph/once.sh
- Watch the agent work through one complete task
The agent will:
- Read your issues
- Pick the highest-priority AFK task
- Explore the repo
- Use
/tddto write tests and implement code - Run feedback loops (
npm run test,npm run typecheck) - Make a git commit
Pay attention to where it struggles. Does it pick the right task? Does it understand the requirements? Does it write good tests?
- Approve permission requests as they come up
The agent will ask for permission to run bash scripts. You'll need to approve these requests as usual.
- Let the agent finish before intervening
Let it complete the full cycle, even if you see issues. You want to observe its natural workflow before making adjustments.
Observing the Results
- Check your git log
git log --oneline -10
You should see new commits with messages that describe what was done, what files changed, and any notes for the next iteration.
- Check your
issues/directory
Issues the agent completed should have moved to issues/done/. Incomplete issues should have notes added describing what was done.
- Review the code the agent wrote
Look at the actual implementation. Does it make sense? Is it testable? Did it follow the TDD approach (test first, minimal implementation)? Did it produce good or bad tests?
Refining the Agent
- Identify where the agent struggled
Did it:
-
Pick the wrong task?
-
Misunderstand the requirements?
-
Write poor tests?
-
Skip feedback loops?
-
Make unclear commits?
-
If you have time, run the agent again to watch it pick up the next task
bash ralph/once.sh
Watch how it selects and works through the next issue.