The build phase's tests_failed loop-breaker (3 strikes -> blocked + human_issue)
was burning autonomous retries on environmental errors: a missing git clone,
worktree contention, transient DNS/TLS hiccups, and 429s were all counted as
real test failures. Verified 2026-06-26 across the 27 mindmaps stories that
ended up blocked: every one traced to a single 'project repo not found at'
error.

This change classifies 6 known transient patterns at the build return site and
sets feedback.transient=True on a match. The cycle function's Txn 3
loop-breaker skips those failures: within 24h of first_attempted_at, the row
stays in the same phase (no extra attempt increment, no human_issue), emits
phase.transient_retry so the relay/dashboard sees retry activity without spam,
and relies on the stale-claim window for natural backoff. If transient
failures persist past 24h, the row escalates to blocked + human_issue.
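
A minimal sketch of the decision described above, not the actual cycle.py
code. The function name, the return values, the handling of a NULL
first_attempted_at, and the exact boundary at 24h are assumptions made for
illustration; only the 3-strike rule, the 24h window, and the "no attempt
increment" behaviour come from this message.

```python
from datetime import datetime, timedelta, timezone

TRANSIENT_WINDOW = timedelta(hours=24)  # from this message
MAX_STRIKES = 3                         # from this message


def next_action(transient, attempts, first_attempted_at, now):
    """Return (action, attempts) for a failed build (hypothetical helper).

    action is one of "transient_retry", "retry", or "blocked".
    """
    if transient:
        # Assumption: a NULL timestamp is treated as "window starts now".
        started = first_attempted_at or now
        if now - started < TRANSIENT_WINDOW:
            # Same phase, no attempt increment, no human_issue.
            return "transient_retry", attempts
        # Transient failures persisted past the window: escalate.
        return "blocked", attempts

    # Structural failure: the existing 3-strike rule is unchanged.
    attempts += 1
    if attempts >= MAX_STRIKES:
        return "blocked", attempts
    return "retry", attempts
```

Keeping the attempt counter untouched on the transient path means that a
later structural failure still gets its full three strikes.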

Files:
- phases.py: is_transient helper + 7 build-phase _transient_verdict annotations
- cycle.py: split Txn 3 loop-breaker into transient vs structural paths;
  emits phase.transient_retry
- state.py: claim_for_spec/build/review set first_attempted_at on first claim
- schema.sql + db/migrations/0007_first_attempted_at.sql: new nullable column,
  backfilled from updated_at for existing rows (forward-compatible)
- api_schemas.py: VerdictFeedback model with transient: Optional[bool] = None
- 3 new test files: test_is_transient.py (13 cases),
  test_cycle_transient_skip.py (transient skips loop-breaker; structural
  preserves 3-strike), test_first_attempted_at.py (24h escalation, fresh
  transient no-escalation, first claim sets timestamp)
- docs/VERIFICATION.md: new section documenting the rule
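
For orientation, a pattern-based classifier like the is_transient helper
listed above could look roughly like this. It is a sketch under assumptions:
the signature, the dict-shaped feedback, and the annotate helper are
hypothetical, and only the 'project repo not found at' string is quoted from
this message. The other regexes are illustrative stand-ins for the error
classes named above, not the six patterns actually shipped.

```python
import re

# Only the first pattern is quoted in this message; the rest are guesses.
_TRANSIENT_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"project repo not found at",
        r"worktree .*(locked|already)",
        r"could not resolve host|temporary failure in name resolution",
        r"tls handshake (timeout|failed)",
        r"\b429\b|too many requests",
    )
]


def is_transient(error_text):
    """Return True if error_text matches a known environmental pattern."""
    if not error_text:
        return False
    return any(p.search(error_text) for p in _TRANSIENT_PATTERNS)


def annotate(feedback, error_text):
    """Hypothetical helper: mark a feedback dict as transient on a match."""
    if is_transient(error_text):
        feedback["transient"] = True
    return feedback
```

Leaving the transient key unset on non-matches mirrors the
Optional[bool] = None default on VerdictFeedback.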

Co-Authored-By: Claude <noreply@anthropic.com>