Refactor smart-groceries importers to use camofox

Goal

Refactor smart-groceries scraping to route through camofox so we evade bot detection on Coles/Woolworths and never reveal our gateway IP.

Why this is broken

The smart-groceries-catalogue-scrape cronjob has been failing for days. The last manual run got 0 categories from both Coles and Woolworths because requests.Session() is detectable as a bot. Switching to camofox addresses both problems named in the Goal: its real Firefox fingerprint avoids bot detection, and its dedicated NordVPN sidecar keeps our gateway IP hidden.

Files to change

  • /opt/data/smart-groceries/app/importers/coles.py
  • /opt/data/smart-groceries/app/importers/woolworths.py
  • New: /opt/data/smart-groceries/app/importers/camofox_client.py
  • /opt/data/smart-groceries/requirements.txt (add httpx)

camofox API (probed in production)

  • Service: http://camofox-browser-service.ai-agents.svc.cluster.local:9377
  • GET / returns engine status
  • POST /start boots the Firefox engine if not running
  • POST /tabs/open body {"url":"...","wait":"networkidle"} returns {tabId, ...}
  • POST /tabs/{tabId}/evaluate body {"script":"document.documentElement.outerHTML"} returns {result: "..."}
  • DELETE /tabs/{tabId} to close
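
A minimal sketch of what camofox_client.py could look like, built only from the endpoints listed above. The names Requester, httpx_requester, open_tab and fetch_html are hypothetical, as is the 60-second timeout. It also assumes POST /start takes no body and is safe to call when the engine is already running, and that DELETE may return a non-JSON body; none of that was probed.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator, Optional

CAMOFOX_URL = "http://camofox-browser-service.ai-agents.svc.cluster.local:9377"

# (method, path, json_body) -> decoded JSON reply, or None if there is none
Requester = Callable[[str, str, Optional[dict]], Any]


def httpx_requester(base_url: str = CAMOFOX_URL, timeout: float = 60.0) -> Requester:
    """Build a Requester backed by httpx (timeout value is an assumption)."""
    import httpx  # imported lazily so importers can be tested with a stub

    client = httpx.Client(base_url=base_url, timeout=timeout)

    def request(method: str, path: str, body: Optional[dict] = None) -> Any:
        response = client.request(method, path, json=body)
        response.raise_for_status()
        try:
            return response.json()
        except ValueError:
            # Assumption: some endpoints (e.g. DELETE) may not reply with JSON.
            return None

    return request


@contextmanager
def open_tab(request: Requester, url: str) -> Iterator[str]:
    """Open a tab on url, yield its tabId, and always close the tab."""
    request("POST", "/start", None)  # boots the Firefox engine if not running
    tab = request("POST", "/tabs/open", {"url": url, "wait": "networkidle"})
    tab_id = tab["tabId"]
    try:
        yield tab_id
    finally:
        request("DELETE", f"/tabs/{tab_id}", None)


def fetch_html(request: Requester, url: str) -> str:
    """Return the rendered HTML of url as seen by the camofox browser."""
    with open_tab(request, url) as tab_id:
        reply = request(
            "POST",
            f"/tabs/{tab_id}/evaluate",
            {"script": "document.documentElement.outerHTML"},
        )
        return reply["result"]
```

In coles.py and woolworths.py the requests.Session() fetch would then become something like fetch_html(httpx_requester(), url). Passing the requester in as a parameter keeps the importers testable without a live camofox service.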

After changes

  1. cd /opt/data/smart-groceries && git add app/importers/ requirements.txt
  2. git -c user.email=hermes@paralla.org -c user.name=hermes commit -m "refactor: route Coles/Woolworths importers through camofox"
  3. git push origin main (the token is in the $GITLAB_TOKEN env var, from hermes-credentials.gitlab-token)
  4. Watch GitLab CI: the image is rebuilt on the main branch.

After CI rebuilds

Tell Claude. Claude will then update the cronjob to use the new image, drop the nordvpn-sidecar, and remove the wait-for-VPN logic.

Validation

Once the cronjob is updated, a manual job run should yield roughly 15-20 categories per store and hundreds of products. If anything is unclear, ask Claude.
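
The expectation above could be turned into a quick sanity check after a manual run. This is only a sketch: the function name and the thresholds (15 categories, 100 products) are assumptions read off the rough figures above, not values used anywhere in smart-groceries.

```python
def scrape_looks_healthy(
    category_count: int,
    product_count: int,
    min_categories: int = 15,   # assumed lower bound of "15-20 categories"
    min_products: int = 100,    # assumed reading of "hundreds of products"
) -> bool:
    """Return True if one store's scrape meets the expected minimums."""
    return category_count >= min_categories and product_count >= min_products
```

A run like the failing one (0 categories) would return False for both stores.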

Result (failed, completed by hermes at 2026-04-30T20:45:03Z)

poller failed: timed out