2026-05-29

date: 2026-05-29
project: daily-bill-scan

Session goal: Run bill scanner, process downloaded attachments.

Progress log:

09:19 — Ran bill-scanner.py --scan: found 3 flagged emails (IDs 111, 110, 826). Downloaded 2 PDF attachments (ID 111 = Unitywater bill + info sheet).
09:19 — Attempted GPU OCR via --process-attachments. Blocked: No module named 'fitz' (PyMuPDF not installed, no pip in container).
09:20 — Checked for alternative PDF tools — pdftoppm, pdfinfo, gs all absent. No poppler-utils installed.
09:21 — Tried raw binary extraction from the Unitywater bill (455 KB). The PDF uses compressed font streams, so inline text is fragmented and encoded. Only metadata was recoverable (g2p_params, dates: “27 May 2026”, fragmentary dollar amounts). The bill content cannot be reconstructed without a proper PDF parser.
09:22 — Tested browser tools for rendering — CDP WebSocket connection refused on all calls (browser_vision, browser_snapshot, browser_navigate). Camofox-browser-service unreachable from this pod.
09:25 — Concluded: bill scanner detects correctly but OCR pipeline is broken end-to-end. Three infrastructure gaps: no PyMuPDF (fitz), no poppler CLI tools, and no reachable CDP browser service for image-based OCR fallback.
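
Note on the 09:21 raw extraction attempt: the log does not record how the bytes were pulled out, so the following is only a minimal stdlib sketch of one way to do it, assuming the PDF's streams are Flate (zlib) compressed. The function name `inflate_streams` is hypothetical. Even when inflation succeeds, the text operators often carry font-specific glyph codes rather than readable characters, which fits the fragmented output seen above.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords of a PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def inflate_streams(pdf_bytes: bytes) -> list[bytes]:
    """Return each stream body that zlib can decompress; skip the rest.

    Assumption: streams of interest use the FlateDecode filter. Streams
    with other filters (or no filter) fail to inflate and are ignored.
    """
    inflated = []
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            inflated.append(zlib.decompress(match.group(1)))
        except zlib.error:
            continue  # not zlib data, e.g. an image with a different filter
    return inflated
```

Usage would be along the lines of `inflate_streams(open(path, "rb").read())`, then searching the results for `Tj`/`TJ` text operators. This is a diagnostic aid only, not a replacement for a real PDF parser.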

Outputs:

Issues / Questions:

BLOCKED: Need one of (a) PyMuPDF installed (pip install pymupdf; note the container currently has no pip), (b) poppler-utils for pdftoppm conversion, or (c) a working CDP browser endpoint. Until one of these is available, PDF attachment processing is stalled.
GPU OCR pipeline timeout issue (pre-existing, logged in index.md) still active.
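
A preflight check for the three options in the BLOCKED item could be sketched as below, so a future run fails fast instead of rediscovering each gap. This is an illustration, not part of bill-scanner.py: the function name `check_ocr_prereqs` is hypothetical, and the CDP host and port are assumed values (9222 is only the commonly used Chrome remote-debugging port; the real endpoint for this pod is not recorded in this log).

```python
import importlib.util
import shutil
import socket


def check_ocr_prereqs(cdp_host="127.0.0.1", cdp_port=9222, timeout=2.0):
    """Report which PDF-processing paths are currently available.

    cdp_host and cdp_port are assumptions; substitute the real endpoint.
    """
    status = {
        # (a) PyMuPDF is imported under the module name "fitz"
        "pymupdf": importlib.util.find_spec("fitz") is not None,
        # (b) pdftoppm ships with poppler-utils
        "pdftoppm": shutil.which("pdftoppm") is not None,
    }
    # (c) a plain TCP connect shows whether anything is listening at all
    try:
        with socket.create_connection((cdp_host, cdp_port), timeout=timeout):
            status["cdp"] = True
    except OSError:
        status["cdp"] = False
    # Any single working path is enough to unblock attachment processing.
    status["unblocked"] = status["pymupdf"] or status["pdftoppm"] or status["cdp"]
    return status
```

Running `print(check_ocr_prereqs())` at the start of a session would show at a glance whether attachment processing can proceed. The TCP check only confirms that a port is open, not that a usable CDP session can be established.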

Status: blocked