Session goal: Run bill scanner, process new email attachments with OCR/text extraction.
Progress log:
- 09:00 — Ran `bill-scanner.py --scan`. Found 5 emails, downloaded 3 attachment(s).
- 09:01 — Attempted `--process-attachments`. PyMuPDF (fitz) missing from system Python; venv had it but shebang was wrong. Confirmed `/opt/data/.venv/bin/python3` loads PyMuPDF v1.27.2.3.
- 09:05 — Extracted text via PyMuPDF directly (heredoc script) for the 3 new attachments:
- ID 112 — Invoice #33519 from The Lawnfeed Company ($125, due 12 Jun 2026). New vendor.
- ID 111 — Unitywater Bill (455 KB PDF) — duplicate of existing; no new data.
- ID 111 — “What does your water bill pay for?” — marketing flyer, not a bill.
- 09:08 — Extracted key details from both bills (the Lawnfeed invoice and the Unitywater bill) from the extracted text output.
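
The 09:08 step (pulling key fields out of extracted text) could look roughly like the sketch below. It is not the scanner's actual logic: the regex patterns, the `parse_bill_text` name, and the sample text layout are all assumptions for illustration, and the input is assumed to be plain text such as PyMuPDF's `page.get_text()` returns.

```python
import re
from datetime import datetime

# Hypothetical patterns; real invoice layouts vary and will need tuning.
INVOICE_RE = re.compile(r"invoice\s*(?:no\.?|number|#)?\s*:?\s*#?\s*(\d+)", re.IGNORECASE)
AMOUNT_RE = re.compile(r"\$\s*([0-9][0-9,]*(?:\.[0-9]{2})?)")
DUE_RE = re.compile(
    r"due\s*(?:date)?\s*:?\s*(\d{1,2}\s+[A-Za-z]{3}\s+\d{4})", re.IGNORECASE
)


def parse_bill_text(text: str) -> dict:
    """Return invoice number, amount, and due date found in the text.

    Each field is None when its pattern does not match. Only the first
    dollar amount is taken, which is a simplification.
    """
    invoice = INVOICE_RE.search(text)
    amount = AMOUNT_RE.search(text)
    due = DUE_RE.search(text)
    return {
        "invoice": invoice.group(1) if invoice else None,
        "amount": float(amount.group(1).replace(",", "")) if amount else None,
        "due": datetime.strptime(due.group(1), "%d %b %Y").date() if due else None,
    }


# Assumed text layout; the values come from the log, the layout does not.
sample = "Invoice #33519\nTotal (inc GST): $125.00\nDue date: 12 Jun 2026\n"
result = parse_bill_text(sample)
```

Returning `None` for unmatched fields keeps a marketing flyer or an unusual layout from raising, so the caller can decide whether to flag the attachment for manual review.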
Outputs:
| Vendor | Bill # | Amount | Due Date | Notes |
|---|---|---|---|---|
| The Lawnfeed Co. | INV-33519 | $125.00 (inc GST) | 12 Jun 2026 | New vendor — lawn fertiliser/treatment. Bank: Westpac ANTHONY PECK, BSB 034243, AC# 228865 |
| Unitywater | #7128760918 | $493.71 | 26 Jun 2026 | Duplicate of previously processed bill. Account #100114688 |
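
The log does not say how the Unitywater bill was recognised as a duplicate. One possible approach, sketched here purely as an assumption, is to compare a normalised (vendor, bill number) key against the bills already processed. The names `bill_key` and `is_duplicate` and the in-memory `seen` set are hypothetical.

```python
def bill_key(vendor: str, bill_number: str) -> tuple:
    """Build a comparison key: lowercased vendor plus the digits of the bill number."""
    digits = "".join(ch for ch in bill_number if ch.isdigit())
    return (vendor.strip().lower(), digits)


def is_duplicate(vendor: str, bill_number: str, seen: set) -> bool:
    """Return True if this vendor and bill number were already processed."""
    return bill_key(vendor, bill_number) in seen


# Assumed state: the Unitywater bill was processed in an earlier session.
seen = {bill_key("Unitywater", "#7128760918")}

unitywater_is_dup = is_duplicate("Unitywater", "7128760918", seen)
lawnfeed_is_dup = is_duplicate("The Lawnfeed Co.", "INV-33519", seen)
```

Keying on digits only means `#7128760918` and `7128760918` compare equal, at the cost of treating prefixes such as `INV-` as insignificant.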
Issues / Questions:
- The Lawnfeed Company is a new vendor not seen in prior bills. Invoice reference = payment ref for bank transfer. Needs pvs verification that this service is expected/authorised.
- PyMuPDF works from venv only (`/opt/data/.venv/bin/python3`). The scanner script shebang should point there permanently to avoid future "No module named 'fitz'" errors.
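
Besides changing the first line of the script to `#!/opt/data/.venv/bin/python3`, a small startup check could report a wrong interpreter before any attachment is processed. The sketch below is an assumption about how that might be done, not part of the existing scanner; `module_available` and `missing_modules` are hypothetical helper names.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if the running interpreter can locate the top-level module."""
    return importlib.util.find_spec(name) is not None


def missing_modules(required) -> list:
    """Return the subset of required module names that cannot be found."""
    return [name for name in required if not module_available(name)]


def interpreter_report(required=("fitz",)) -> str:
    """Describe whether the current interpreter has every required module."""
    missing = missing_modules(required)
    if missing:
        return f"{sys.executable} is missing: {', '.join(missing)}"
    return f"{sys.executable} has all required modules"
```

Calling `interpreter_report()` at the top of the script and aborting when anything is missing would make a wrong shebang show up as one clear message instead of a traceback partway through a run.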
Status: done