Session goal: Run bill scanner, fix PyMuPDF venv issue, process all attachments.

Progress log:

  • 14:00 — Ran bill-scanner.py --scan: found 2 potential bills (Unitywater May, Policy 77512971)
  • 14:01 — Ran --process-attachments: all 11 PDFs failed with "No module named 'fitz'" — system Python missing PyMuPDF
  • 14:03 — Installed PyMuPDF v1.27.2.3 via uv pip install into /opt/data/.venv/
  • 14:04 — Fixed scanner shebang from #!/usr/bin/env python3 to #!/opt/data/.venv/bin/python3
  • 14:04 — Removed stale sys.path.insert(0, '/opt/hermes/...') hack in pdf_to_images() function
  • 14:05 — Re-ran --process-attachments: SUCCESS — 11 PDFs → 15 pages converted → all OCR’d via GPU node (carnice-v2-27b, 192.168.100.106:8080)
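
Possible guard against a repeat of the 14:01 failure: a minimal sketch of a startup check that reports which interpreter is running and which required modules it cannot find. The function names missing_modules and preflight are hypothetical and are not part of bill-scanner.py; the only fact taken from the log is that PyMuPDF is imported as fitz.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names the current interpreter cannot locate."""
    # find_spec returns None when a top-level module is not importable.
    return [name for name in names if importlib.util.find_spec(name) is None]


def preflight(required=("fitz",)):
    """Exit early with a readable message instead of failing once per PDF."""
    missing = missing_modules(required)
    if missing:
        # sys.executable shows whether the system Python or the venv is in use.
        raise SystemExit(f"{sys.executable} is missing: {', '.join(missing)}")
```

Calling preflight() once at the top of the scanner's entry point would have surfaced the wrong interpreter before any of the 11 PDFs were attempted.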

Outputs:

  • Fixed /opt/data/bin/bill-scanner.py shebang + removed broken sys.path hack
  • PyMuPDF v1.27.2.3 installed in /opt/data/.venv/
  • 15 OCR results saved to /opt/data/bills/processed/:
    • 111_Unitywater Bill 27 May 2026_page1.txt + _page2.txt (Unitywater Qtr bill)
    • 514_PowerCo_Bill_MAR2026*.txt (3 PowerCo test files — NZ address, skip)
    • 55_PowerCo_Bill_MAR2026_page1.txt (PowerCo test file — NZ address, skip)
    • 825_Unitywater Bill 27 May 2026_page*.txt (duplicate of email 111, same files)
    • 98_JB-25702383-5008679994-146_page1.txt + 99_* (JB Hi-Fi AirPods, $360.99 paid)
    • test_421_Invoice_page*.txt (Superloop $119 broadband — past due, needs verification)
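
The 825_* / 111_* duplication noted above could be confirmed mechanically. The following is a rough sketch, assuming duplicates are byte-identical text files; find_duplicate_texts is a hypothetical helper, not something the scanner currently provides, and OCR output that differs by even one character would not be grouped.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_texts(directory):
    """Group *.txt files in a directory by SHA-256 of their contents.

    Returns a list of filename lists, one per set of identical files.
    """
    groups = defaultdict(list)
    for path in sorted(Path(directory).glob("*.txt")):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        groups[digest].append(path.name)
    # Keep only digests shared by more than one file.
    return [names for names in groups.values() if len(names) > 1]
```

Running find_duplicate_texts("/opt/data/bills/processed") would list each set of identical files, which helps decide which copies can be skipped.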

Issues / Questions:

  • PowerCo files are clearly test data (NZ Auckland address). Leave as-is or clean up?
  • Superloop “test_421” account ($119/month, past due since 5 May): likely dummy. Needs pvs confirmation to delete.

Status: done