Created: 2026-06-01 | Updated: 2026-06-01 | Type: Reference
Purpose: Canonical catalog of build/CI failures, used for automated detection and triage by the hermes-03-retro-signals cron job.

Structure

Each entry has:

  • ID: Unique short code (e.g., DEPLOY-REVISION-MISMATCH)
  • Name: Human-readable description
  • Category: Grouping (infra, test, deployment, model, external)
  • Severity: critical / high / medium / low
  • DetectionPattern: Regex or string pattern for log matching
  • ContextMatcher: Optional secondary filter to reduce false positives
  • AutoFileTask: Whether a fix task should be auto-created when the threshold is hit (yes/no)
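
A minimal sketch of how one entry could be held in code, assuming both patterns are applied as Python regexes. The class name FailureEntry and its matches helper are hypothetical and are not taken from the cron's actual schema.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FailureEntry:
    """One catalog entry; fields mirror the list above (hypothetical type)."""
    id: str
    name: str
    category: str                          # infra, test, deployment, model, external
    severity: str                          # critical / high / medium / low
    detection_pattern: str                 # regex applied to a log line
    context_matcher: Optional[str] = None  # optional secondary regex
    auto_file_task: bool = False

    def matches(self, line: str, context: str = "") -> bool:
        """True if the line hits DetectionPattern and, when a
        ContextMatcher is set, the surrounding context hits it too."""
        if not re.search(self.detection_pattern, line):
            return False
        if self.context_matcher is None:
            return True
        return re.search(self.context_matcher, context) is not None


# Example entry, copied from the INFRA-CRASHLOOPBACKOFF record.
crashloop = FailureEntry(
    id="INFRA-CRASHLOOPBACKOFF",
    name="Kubernetes CrashLoopBackOff",
    category="infra",
    severity="critical",
    detection_pattern=r"CrashLoopBackOff",
    auto_file_task=True,
)
```

Calling crashloop.matches(line) on each log line is enough for entries without a ContextMatcher; entries that set one also need the surrounding log text passed as context.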

Infrastructure Failures

INFRA-CRASHLOOPBACKOFF

Field             Value
ID                INFRA-CRASHLOOPBACKOFF
Name              Kubernetes CrashLoopBackOff
Category          infra
Severity          critical
DetectionPattern  CrashLoopBackOff
ContextMatcher    (none)
AutoFileTask      yes

Container restarts repeatedly and the pod never stabilizes. Often caused by an OOM kill or a startup error.


INFRA-OOMKILLED

Field             Value
ID                INFRA-OOMKILLED
Name              Container OOM Killed
Category          infra
Severity          critical
DetectionPattern  OOMKilled
ContextMatcher    (none)
AutoFileTask      yes

Container killed by the kernel OOM killer. Usually the memory limit is too low or the container leaks memory.


INFRA-CRASH-ON-STARTUP

Field             Value
ID                INFRA-CRASH-ON-STARTUP
Name              Container crashed on startup
Category          infra
Severity          high
DetectionPattern  crash loop detected
ContextMatcher    (none)
AutoFileTask      yes

Container fails immediately on startup, likely because of a bad binary or a missing dependency.


INFRA-IMAGE-PULL-ERROR

Field             Value
ID                INFRA-IMAGE-PULL-ERROR
Name              Container image pull failure
Category          infra
Severity          high
DetectionPattern  Failed to pull image
ContextMatcher    (none)
AutoFileTask      yes

Image registry auth failure, rate limit, or corrupt image.


Test Failures

TEST-CI-TIMEOUT

Field             Value
ID                TEST-CI-TIMEOUT
Name              CI test timed out
Category          test
Severity          medium
DetectionPattern  timed out
ContextMatcher    (none)
AutoFileTask      yes

Test job exceeded time limit. Usually indicates slow test or deadlock.


TEST-ASSERTION-FAILED

Field             Value
ID                TEST-ASSERTION-FAILED
Name              Test assertion failed
Category          test
Severity          high
DetectionPattern  AssertionError:
ContextMatcher    (none)
AutoFileTask      yes

Logic regression — expected output doesn’t match actual.


TEST-DEPENDENCY-MISSING

Field             Value
ID                TEST-DEPENDENCY-MISSING
Name              Test dependency not found
Category          test
Severity          medium
DetectionPattern  ModuleNotFoundError:
ContextMatcher    (none)
AutoFileTask      yes

Missing Python package or system dependency in CI environment.


TEST-SERVICE-UNREACHABLE

Field             Value
ID                TEST-SERVICE-UNREACHABLE
Name              Test service unreachable
Category          test
Severity          medium
DetectionPattern  Connection refused.*8080
ContextMatcher    (none)
AutoFileTask      no

Inference server, database, or test fixture not responding. Often transient.


Deployment Failures

DEPLOY-MERGE-CONFLICT

Field             Value
ID                DEPLOY-MERGE-CONFLICT
Name              Merge conflict preventing deploy
Category          deployment
Severity          high
DetectionPattern  CONFLICT.*merging
ContextMatcher    (none)
AutoFileTask      yes

CI can’t auto-merge due to conflicting changes. Manual resolution needed.


DEPLOY-HOOK-FAILURE

Field             Value
ID                DEPLOY-HOOK-FAILURE
Name              Deployment hook failed
Category          deployment
Severity          high
DetectionPattern  post-deploy.*hook.*failed
ContextMatcher    (none)
AutoFileTask      yes

Post-deploy notification, integration hook, or callback failed.


DEPLOY-REVISION-MISMATCH

Field             Value
ID                DEPLOY-REVISION-MISMATCH
Name              Deployment revision doesn’t match commit
Category          deployment
Severity          medium
DetectionPattern  revision.*does not match
ContextMatcher    (none)
AutoFileTask      yes

Rollback or cache issue — wrong version deployed.


Model / Inference Failures

MODEL-SERVER-DOWN

Field             Value
ID                MODEL-SERVER-DOWN
Name              Inference server unreachable
Category          model
Severity          critical
DetectionPattern  connection refused.*8080
ContextMatcher    192.168.100.(106
AutoFileTask      no

Local or remote model serving endpoint not responding. Often transient GPU/CPU load issue.


MODEL-OUTPUT-MALFORMED

Field             Value
ID                MODEL-OUTPUT-MALFORMED
Name              Model output not parseable as JSON/YAML
Category          model
Severity          medium
DetectionPattern  invalid JSON
ContextMatcher    (none)
AutoFileTask      yes

Structured output pipeline broke — usually indicates prompt template regression or context overflow.
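
A sketch of where such a log line could originate, assuming the pipeline parses model output with Python's json module. The function name parse_model_output and the logger name are hypothetical; only the "invalid JSON" text is taken from the DetectionPattern above.

```python
import json
import logging

log = logging.getLogger("model-output")  # hypothetical logger name


def parse_model_output(raw: str):
    """Return the parsed object, or None after logging an 'invalid JSON' line."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # The message text is what the DetectionPattern would match on.
        log.error("invalid JSON at char %d: %s", exc.pos, exc.msg)
        return None
```

Output cut off mid-object, as can happen on context overflow, fails to parse and returns None here, while a complete object is returned as a dict.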


MODEL-VISION-MISMATCH

Field             Value
ID                MODEL-VISION-MISMATCH
Name              Vision model output doesn’t match ground truth
Category          model
Severity          medium
DetectionPattern  vision.*mismatch
ContextMatcher    (none)
AutoFileTask      no

Small model for vision/OCR returned wrong result. Usually indicates prompt issue, not model bug.


External Service Failures

EXTERNAL-API-DOWN

Field             Value
ID                EXTERNAL-API-DOWN
Name              External API service unavailable
Category          external
Severity          medium
DetectionPattern  external.*service.*unavailable
ContextMatcher    (none)
AutoFileTask      no

Third-party API down — don’t auto-file tasks for things we can’t fix. Monitor only.


EXTERNAL-RATE-LIMITED

Field             Value
ID                EXTERNAL-RATE-LIMITED
Name              Rate limited by external service
Category          external
Severity          medium
DetectionPattern  429 Too Many Requests
ContextMatcher    (none)
AutoFileTask      no

Hit API rate limits. Usually self-resolving with backoff.
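
A minimal sketch of the backoff mentioned above, assuming the client raises an exception on HTTP 429. RateLimited and call_with_backoff are hypothetical names, and the delay values are illustrative defaults, not settings from this system.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error raised by a client on HTTP 429."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call fn, retrying on RateLimited with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller see the failure
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads retries so clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

The sleep parameter is injectable so the retry schedule can be exercised without real waiting.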


Error Patterns

ERROR-500-INTEG-FALLBACK

Field             Value
ID                ERROR-500-INTEG-FALLBACK
Name              500 errors from integration fallback (hermes-gateway)
Category          external
Severity          high
DetectionPattern  integration fallback.*500
ContextMatcher    (none)
AutoFileTask      yes

The integration fallback endpoint at the hermes gateway is returning 500 errors. This often happens when the primary service falls back to a secondary integration path that itself fails. Auto-file for review and fix.


ERROR-DEADLINK-HACK

Field             Value
ID                ERROR-DEADLINK-HACK
Name              Deadlink hack attack detected
Category          external
Severity          critical
DetectionPattern  Deadlink Hack:.*attacker is attempting to inject malicious links
ContextMatcher    (none)
AutoFileTask      no

Security alert — external service flagged a Deadlink Hack injection attempt. This is an attack, not a bug. Log it but don’t create fix tasks for attacker payloads.


Taxonomy Meta-Rules

  1. Pattern specificity: Detection patterns must match the failure mode with high precision to avoid false positives
  2. Threshold: Auto-file tasks when count >= 3 within a rolling 24-hour window
  3. Escalation: Critical severity → alert + task; High severity → task only; Medium/Low → log only (no auto-task)
  4. ContextMatcher: When specified, it must also match in the surrounding log context for a DetectionPattern hit to count
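
Rules 2 and 3 can be sketched as follows. This assumes matches arrive in timestamp order and that the window is inclusive at exactly 24 hours; RollingCounter and escalation are hypothetical names. How the per-entry AutoFileTask flag combines with rule 3 is not specified here, so the sketch leaves that decision to the caller.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)  # rule 2: rolling window
THRESHOLD = 3                 # rule 2: count >= 3


class RollingCounter:
    """Tracks match timestamps per failure ID inside a rolling window."""

    def __init__(self, window=WINDOW, threshold=THRESHOLD):
        self.window = window
        self.threshold = threshold
        self._hits = defaultdict(deque)

    def record(self, failure_id: str, when: datetime) -> bool:
        """Record one match; return True if the count inside the
        window has reached the threshold."""
        hits = self._hits[failure_id]
        hits.append(when)
        # Drop matches that have aged out of the window.
        while hits and when - hits[0] > self.window:
            hits.popleft()
        return len(hits) >= self.threshold


def escalation(severity: str) -> tuple:
    """Return (alert, task) for a severity, following rule 3."""
    if severity == "critical":
        return (True, True)
    if severity == "high":
        return (False, True)
    return (False, False)  # medium / low: log only
```

A caller would invoke record for every confirmed match and consult escalation only once record returns True.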