feat(orchestrator): distinguish transient vs structural tests_failed (ADR-005) #31
Reference in New Issue
Block a user
Delete Branch "feat/S2-transient-tests-failed"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Implements ADR-005: classifies 6 known transient error patterns at the build return site and bypasses the 3-strike loop-breaker for them. Within 24h of
first_attempted_at, transient retries stay in the same phase (no extra attempt increment, no human_issue), emitphase.transient_retry, and use the stale-claim window for backoff. After 24h of persistent transient retries, escalates toblocked+ human_issue.Motivation
Verified 2026-06-26 across the 27 mindmaps stories that ended up in
blocked: every one was a single{"error": "project repo not found at /workspace/projects/mindmaps; clone the Gitea repo..."}— a one-time setup miss, not 3 failed attempts. The current loop-breaker treats all non-pass outcomes the same; transient environmental errors burn the same budget as real test failures. Operator drain cycles are dominated by noise from one-time setup gaps and short-lived network blips.Changes
phases.py: newis_transient(err)helper matching 6 documented substrings; all 7 build-phasetests_failedreturn sites route through_transient_verdict()to annotate matching errors.cycle.py: Txn 3 loop-breaker split into transient vs structural paths. Whenverdict == "tests_failed"ANDfeedback.transient is True:first_attempted_at: row stays in same phase, emitsphase.transient_retry, no human_issue.blocked+ opens human_issue.state.py:claim_for_spec/build/reviewsetfirst_attempted_at = COALESCE(first_attempted_at, NOW())on the first claim.schema.sql+ new migrationsrc/damascus/db/migrations/0007_first_attempted_at.sql: addfirst_attempted_at TIMESTAMPTZ DEFAULT NULL(forward-compatible, backfilled fromupdated_at).api_schemas.py: newVerdictFeedbackPydantic model withtransient: Optional[bool] = None.docs/VERIFICATION.md: new section documenting the rule and 24h escalation.ADR
Design rationale:
wiki/decisions/ADR-005-distinguish-transient-tests-failed.mdLive evidence:
wiki/decisions/research/adr-005-transient-tests-failed-research.mdTest Results
Pre-existing failures in
tests/api/test_api_endpoints.pyandtests/contract/test_contracts_match_source.pywere verified to also fail onmain(unrelated to this change).Out of scope (per ADR-005)
spec_ambiguous/spec_wrong(LLM nondeterminism, not infra).error_classenum (deferred until ≥3 distinct classes observed).🤖 Generated with Claude Code
Co-Authored-By: Claude noreply@anthropic.com