feat(orchestrator): distinguish transient vs structural tests_failed (ADR-005) #31

Merged
kaykayyali merged 1 commits from feat/S2-transient-tests-failed into main 2026-06-27 16:38:33 +00:00
Owner

Summary

Implements ADR-005: classifies 6 known transient error patterns at the build return site and bypasses the 3-strike loop-breaker for them. Within 24h of first_attempted_at, transient retries stay in the same phase (no extra attempt increment, no human_issue), emit phase.transient_retry, and use the stale-claim window for backoff. After 24h of persistent transient retries, escalates to blocked + human_issue.

Motivation

Verified 2026-06-26 across the 27 mindmaps stories that ended up in blocked: every one was a single {"error": "project repo not found at /workspace/projects/mindmaps; clone the Gitea repo..."} — a one-time setup miss, not 3 failed attempts. The current loop-breaker treats all non-pass outcomes the same; transient environmental errors burn the same budget as real test failures. Operator drain cycles are dominated by noise from one-time setup gaps and short-lived network blips.

Changes

  • phases.py: new is_transient(err) helper matching 6 documented substrings; all 7 build-phase tests_failed return sites route through _transient_verdict() to annotate matching errors.
  • cycle.py: Txn 3 loop-breaker split into transient vs structural paths. When verdict == "tests_failed" AND feedback.transient is True:
    • Within 24h of first_attempted_at: row stays in same phase, emits phase.transient_retry, no human_issue.
    • After 24h: escalates to blocked + opens human_issue.
  • state.py: claim_for_spec/build/review set first_attempted_at = COALESCE(first_attempted_at, NOW()) on the first claim.
  • schema.sql + new migration src/damascus/db/migrations/0007_first_attempted_at.sql: add first_attempted_at TIMESTAMPTZ DEFAULT NULL (forward-compatible, backfilled from updated_at).
  • api_schemas.py: new VerdictFeedback Pydantic model with transient: Optional[bool] = None.
  • docs/VERIFICATION.md: new section documenting the rule and 24h escalation.
  • 3 new test files (18 test cases total).

ADR

Design rationale: wiki/decisions/ADR-005-distinguish-transient-tests-failed.md
Live evidence: wiki/decisions/research/adr-005-transient-tests-failed-research.md

Test Results

$ pytest tests/test_is_transient.py tests/test_cycle_transient_skip.py tests/test_first_attempted_at.py tests/test_conftest_safety.py tests/unit/
======================== 52 passed, 4 warnings in 6.98s ========================

Pre-existing failures in tests/api/test_api_endpoints.py and tests/contract/test_contracts_match_source.py were verified to also fail on main (unrelated to this change).

Out of scope (per ADR-005)

  • Not classifying spec_ambiguous / spec_wrong (LLM nondeterminism, not infra).
  • Not changing 3-strike budget for structural errors.
  • No new alerting beyond the 24h escalation.
  • No error_class enum (deferred until ≥3 distinct classes observed).

🤖 Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com

## Summary Implements ADR-005: classifies 6 known transient error patterns at the build return site and bypasses the 3-strike loop-breaker for them. Within 24h of `first_attempted_at`, transient retries stay in the same phase (no extra attempt increment, no human_issue), emit `phase.transient_retry`, and use the stale-claim window for backoff. After 24h of persistent transient retries, escalates to `blocked` + human_issue. ## Motivation Verified 2026-06-26 across the 27 mindmaps stories that ended up in `blocked`: every one was a single `{"error": "project repo not found at /workspace/projects/mindmaps; clone the Gitea repo..."}` — a one-time setup miss, not 3 failed attempts. The current loop-breaker treats all non-pass outcomes the same; transient environmental errors burn the same budget as real test failures. Operator drain cycles are dominated by noise from one-time setup gaps and short-lived network blips. ## Changes - `phases.py`: new `is_transient(err)` helper matching 6 documented substrings; all 7 build-phase `tests_failed` return sites route through `_transient_verdict()` to annotate matching errors. - `cycle.py`: Txn 3 loop-breaker split into transient vs structural paths. When `verdict == "tests_failed"` AND `feedback.transient is True`: - Within 24h of `first_attempted_at`: row stays in same phase, emits `phase.transient_retry`, no human_issue. - After 24h: escalates to `blocked` + opens human_issue. - `state.py`: `claim_for_spec/build/review` set `first_attempted_at = COALESCE(first_attempted_at, NOW())` on the first claim. - `schema.sql` + new migration `src/damascus/db/migrations/0007_first_attempted_at.sql`: add `first_attempted_at TIMESTAMPTZ DEFAULT NULL` (forward-compatible, backfilled from `updated_at`). - `api_schemas.py`: new `VerdictFeedback` Pydantic model with `transient: Optional[bool] = None`. - `docs/VERIFICATION.md`: new section documenting the rule and 24h escalation. - 3 new test files (18 test cases total). ## ADR Design rationale: `wiki/decisions/ADR-005-distinguish-transient-tests-failed.md` Live evidence: `wiki/decisions/research/adr-005-transient-tests-failed-research.md` ## Test Results ``` $ pytest tests/test_is_transient.py tests/test_cycle_transient_skip.py tests/test_first_attempted_at.py tests/test_conftest_safety.py tests/unit/ ======================== 52 passed, 4 warnings in 6.98s ======================== ``` Pre-existing failures in `tests/api/test_api_endpoints.py` and `tests/contract/test_contracts_match_source.py` were verified to also fail on `main` (unrelated to this change). ## Out of scope (per ADR-005) - Not classifying `spec_ambiguous` / `spec_wrong` (LLM nondeterminism, not infra). - Not changing 3-strike budget for structural errors. - No new alerting beyond the 24h escalation. - No `error_class` enum (deferred until ≥3 distinct classes observed). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
kaykayyali added 1 commit 2026-06-27 02:38:06 +00:00
feat(orchestrator): distinguish transient vs structural tests_failed (ADR-005)
Some checks failed
test / contract-and-unit (pull_request) Failing after 15s
c122bc262b
The build phase's tests_failed loop-breaker (3 strikes -> blocked + human_issue)
was burning autonomous retries on environmental errors: a missing git clone,
worktree contention, transient DNS/TLS hiccups, and 429s all looked the same as
real test failures. Verified 2026-06-26 across 27 mindmaps stories that ended up
in blocked - every one was a single 'project repo not found at' error.

This change classifies 6 known transient patterns at the build return site and
sets feedback.transient=True. The cycle function's Txn 3 loop-breaker skips
those: within 24h of first_attempted_at, the row stays in the same phase (no
extra attempt increment, no human_issue), emits phase.transient_retry so the
relay/dashboard sees retry activity without spam, and the stale-claim window
provides natural backoff. After 24h of persistent transient retries, the row
escalates to blocked + human_issue.

Files:
- phases.py: is_transient helper + 7 build-phase _transient_verdict annotations
- cycle.py: split Txn 3 loop-breaker into transient vs structural paths;
  emits phase.transient_retry
- state.py: claim_for_spec/build/review set first_attempted_at on first claim
- schema.sql + db/migrations/0007_first_attempted_at.sql: new nullable column,
  backfilled from updated_at for existing rows (forward-compatible)
- api_schemas.py: VerdictFeedback model with transient: Optional[bool] = None
- 3 new test files: test_is_transient.py (13 cases),
  test_cycle_transient_skip.py (transient skips loop-breaker; structural
  preserves 3-strike), test_first_attempted_at.py (24h escalation, fresh
  transient no-escalation, first claim sets timestamp)
- docs/VERIFICATION.md: new section documenting the rule

Co-Authored-By: Claude <noreply@anthropic.com>
kaykayyali merged commit 8bf73e255f into main 2026-06-27 16:38:33 +00:00
Sign in to join this conversation.
No Reviewers
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: kaykayyali/damascus-orchestrator#31