Engineering case study

An AI agent that files quarterly fuel taxes.

Every multi-state trucking fleet has to file an IFTA return each quarter — miles and fuel, apportioned across every state, at rates that change quarterly. I built the system that does it: deterministic Python for the math, a Claude agent for the judgment. It is live, and a real carrier files through it.

01 The problem

A quarterly tax nobody wants to do by hand.

The International Fuel Tax Agreement makes every carrier that crosses state lines report miles driven and fuel purchased, apportioned across each U.S. state and Canadian province, at tax rates that change every quarter. The work is tedious and deadline-driven, and a wrong number invites an expensive state audit.

Today it is done by hand in spreadsheets or handed off to a paid bookkeeper. That is the gap this product closes: a real, recurring, paid-for need in a vertical I know.

02 The core decision

Deterministic math. AI judgment. Never mixed.

The single most important design choice — and the first thing I'd talk through in an interview.

Deterministic

Python pipeline does the math

Ingest CSV/Excel/PDF exports, compute fleet MPG, taxable gallons, and per-jurisdiction tax + surcharges with CDTFA-exact rounding, then write the state-portal CSV and a per-truck reconciliation Excel. No model touches a number that lands on a tax form.
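The calculation step above can be sketched with Python's `decimal` module. This is an illustration of the standard IFTA arithmetic, not the production pipeline: the function names are hypothetical, surcharges are left out, and the rounding steps (two-decimal MPG, whole gallons, half-up to the cent) are assumptions, since the exact rounding rules are not spelled out here.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumed rounding steps; the real filing rules may differ.
MPG_STEP = Decimal("0.01")   # fleet MPG kept to two decimals
GALLON_STEP = Decimal("1")   # taxable gallons kept to whole gallons
CENT = Decimal("0.01")       # money kept to the cent


def _round(value: Decimal, step: Decimal) -> Decimal:
    """Round half-up to the given step, never through binary floats."""
    return value.quantize(step, rounding=ROUND_HALF_UP)


def fleet_mpg(total_miles: Decimal, total_gallons: Decimal) -> Decimal:
    """Fleet-wide MPG: all miles divided by all gallons for the quarter."""
    if total_gallons <= 0:
        raise ValueError("total_gallons must be positive")
    return _round(total_miles / total_gallons, MPG_STEP)


def jurisdiction_tax(
    taxable_miles: Decimal,
    tax_paid_gallons: Decimal,
    mpg: Decimal,
    rate: Decimal,
) -> dict[str, Decimal]:
    """Tax due (or credit, if negative) for one jurisdiction."""
    taxable_gallons = _round(taxable_miles / mpg, GALLON_STEP)
    net_gallons = taxable_gallons - tax_paid_gallons
    return {
        "taxable_gallons": taxable_gallons,
        "net_taxable_gallons": net_gallons,
        "tax_due": _round(net_gallons * rate, CENT),
    }


# Made-up figures, for illustration only.
mpg = fleet_mpg(Decimal("10000"), Decimal("1600"))
line = jurisdiction_tax(Decimal("5000"), Decimal("700"), mpg, Decimal("0.25"))
```

Working in `Decimal` end to end avoids float drift, which matters when a state recomputes the same figures.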

Judgment

A Claude agent reviews the return

A 16-tool review agent reads the computed return and checks it against the rule base, the live rate matrix, and 21 quarters of real filing history — then writes a plain-English note flagging missing surcharges, MPG anomalies, and audit-bait patterns before a human files.
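One way to picture a grounded tool is a small deterministic function that the agent calls and then quotes from. The sketch below is a hypothetical MPG-anomaly check against prior quarters; the name `check_mpg_anomaly`, the z-score approach, and the threshold of 3 are assumptions for illustration, not a description of any of the 16 tools.

```python
from statistics import mean, pstdev


def check_mpg_anomaly(
    current_mpg: float, history: list[float], z_limit: float = 3.0
) -> dict:
    """Compare this quarter's fleet MPG with prior quarters.

    Returns plain figures so the reviewing model can cite them
    instead of estimating them.
    """
    if not history:
        raise ValueError("history must contain at least one quarter")
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Flat history: any departure from the constant value is unusual.
        z = 0.0
        flag = current_mpg != mu
    else:
        z = (current_mpg - mu) / sigma
        flag = abs(z) > z_limit
    return {
        "current_mpg": current_mpg,
        "historical_mean": round(mu, 2),
        "z_score": round(z, 2),
        "flag": flag,
    }


# Made-up history, for illustration only.
past_quarters = [6.0, 6.2, 6.4, 6.2, 6.2]
normal = check_mpg_anomaly(6.3, past_quarters)
suspicious = check_mpg_anomaly(4.0, past_quarters)
```

The tool computes; the model only decides whether the flag is worth a sentence in the review note.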

03 Architecture

Vercel at the edge, a Mac mini doing the work.

The front door is on Vercel; the backend runs at home behind a Cloudflare Tunnel — public, always-on, and effectively free to host.

01 Customer: uploads mileage + fuel files
02 artjeck.com/ifta: Next.js 16 · Vercel
03 Server proxy: X-Backend-Key · hides the backend
04 ifta-api: FastAPI · Cloudflare Tunnel → Mac mini
then fans out to

Deterministic pipeline

ingest · calc · validate · report

Review agent

Claude + 16 grounded tools

Operator gate

Telegram approve → packet emailed
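The proxy hop in the flow above relies on a shared secret sent in the `X-Backend-Key` header. A minimal, framework-free sketch of the backend-side check follows; the function name `is_authorized` and the idea of passing headers as a plain dict are assumptions, and a real FastAPI service would wire this into its own request handling.

```python
import hmac


def is_authorized(headers: dict[str, str], expected_key: str) -> bool:
    """Return True only if the request carries the shared backend key.

    Header names are case-insensitive, so keys are lowercased first.
    hmac.compare_digest compares in constant time, which avoids leaking
    how many leading characters of a guess were correct.
    """
    if not expected_key:
        # An unset key must never authorize anything.
        return False
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get("x-backend-key", "")
    return hmac.compare_digest(supplied.encode(), expected_key.encode())
```

Because only the server-side proxy knows the key, a browser that discovers the tunnel hostname still cannot call the backend directly.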

04 Engineering decisions

The choices worth interviewing on.

Each one is a deliberate trade-off, not a default.

The LLM reviews, it never computes

Every taxable number is produced by tested Python; the agent only checks work already done and is grounded by 16 tools so it cites real figures instead of inventing them. That separation is the line between a tool a carrier can file from and a demo.
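The "reviews, never computes" rule can also be enforced after the fact. The sketch below is one possible guard, not something the system is confirmed to do: it pulls every dollar amount out of a review note and reports any that do not appear among the figures the pipeline produced. The name `ungrounded_amounts` and the regular expression are assumptions.

```python
import re
from decimal import Decimal

# Matches amounts such as $78.90 or $1,234.56 (a leading "$" is required).
MONEY = re.compile(r"\$(\d[\d,]*(?:\.\d{2})?)")


def ungrounded_amounts(note: str, computed: set[Decimal]) -> list[Decimal]:
    """List dollar amounts cited in the note that the pipeline never produced."""
    cited = [Decimal(raw.replace(",", "")) for raw in MONEY.findall(note)]
    return [amount for amount in cited if amount not in computed]


# Made-up note and figures, for illustration only.
note = "KY tax due is $1,234.56; a surcharge of $78.90 looks missing."
pipeline_figures = {Decimal("1234.56")}
suspect = ungrounded_amounts(note, pipeline_figures)
```

A non-empty result would mean the note mentions a number with no source in the computed return, which is a reason to hold it for a human.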

Regression-tested to the penny

429 tests, including a real-data backtest that matches a Kentucky carrier's Q4 2025 CDTFA filing exactly. A refactor can't silently change a number the state will recompute.
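A to-the-penny backtest boils down to an exact comparison between computed and filed figures, jurisdiction by jurisdiction. The helper below is a hedged sketch of that comparison; `diff_against_golden` is a hypothetical name and the figures are invented, not taken from any real filing.

```python
from decimal import Decimal
from typing import Optional

Mismatch = tuple[Optional[Decimal], Optional[Decimal]]


def diff_against_golden(
    computed: dict[str, Decimal], golden: dict[str, Decimal]
) -> dict[str, Mismatch]:
    """Return {jurisdiction: (computed, filed)} for every line that differs.

    Comparison is exact: a one-cent difference, or a jurisdiction present
    on only one side, counts as a mismatch.
    """
    mismatches: dict[str, Mismatch] = {}
    for code in sorted(set(computed) | set(golden)):
        got, want = computed.get(code), golden.get(code)
        if got != want:
            mismatches[code] = (got, want)
    return mismatches


# Invented figures, for illustration only.
filed = {"KY": Decimal("120.50"), "TN": Decimal("-14.25")}
exact = diff_against_golden(dict(filed), filed)
off_by_a_penny = diff_against_golden(
    {"KY": Decimal("120.50"), "TN": Decimal("-14.24")}, filed
)
```

In a test suite, the assertion would simply be that the returned dict is empty, so a failing run names the exact jurisdiction and both values.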

Cost controlled by risk tier

Routine reviews run on cheaper models; only the highest-risk filings escalate to Opus, with an effort knob for thinking depth. Pennies per filing by default, thorough when the stakes are high.
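Risk-tier routing can be as small as one pure function. In this sketch the risk score range, the thresholds, and the tier labels `small` and `mid` are placeholders I chose for illustration; only the idea of escalating the riskiest filings to Opus with a higher effort setting comes from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewPlan:
    model_tier: str  # placeholder label, not a real model identifier
    effort: str      # how much thinking depth to request


def plan_review(risk_score: float) -> ReviewPlan:
    """Map a risk score in [0, 1] to a model tier and effort level."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0 and 1")
    if risk_score >= 0.8:   # assumed threshold for the riskiest filings
        return ReviewPlan(model_tier="opus", effort="high")
    if risk_score >= 0.4:   # assumed threshold for the middle tier
        return ReviewPlan(model_tier="mid", effort="medium")
    return ReviewPlan(model_tier="small", effort="low")
```

Keeping the routing deterministic means the cost of a review is predictable before any model is called.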

Real-world safety, not toy auth

Per-client tenant isolation, a Telegram operator-approval gate before any file is processed, magic-link tokens as auth, Turnstile CAPTCHA, per-IP rate limiting, and atomic 'all files land or none do' writes.
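The "all files land or none do" guarantee is commonly built by staging an upload in a temporary directory and renaming it into place in one step. The sketch below shows that pattern with the standard library only; `write_batch_atomically` is a hypothetical name, and it assumes the staging and destination directories sit on the same filesystem so the rename is atomic.

```python
import os
import shutil
import tempfile
from pathlib import Path


def write_batch_atomically(files: dict[str, bytes], dest_dir: Path) -> None:
    """Write every file in the batch, or leave no trace of the batch at all."""
    dest_dir = Path(dest_dir)
    if dest_dir.exists():
        raise FileExistsError(dest_dir)
    dest_dir.parent.mkdir(parents=True, exist_ok=True)
    # Stage next to the destination so the final rename stays on one filesystem.
    staging = Path(tempfile.mkdtemp(prefix=".staging-", dir=dest_dir.parent))
    try:
        for name, data in files.items():
            # Keep only the base name so an upload cannot escape the folder.
            target = staging / Path(name).name
            with open(target, "wb") as fh:
                fh.write(data)
                fh.flush()
                os.fsync(fh.fileno())
        # Single step: readers see either the full batch or nothing.
        os.rename(staging, dest_dir)
    except BaseException:
        shutil.rmtree(staging, ignore_errors=True)
        raise
```

If any write fails partway through, the staging directory is removed and the destination never appears, so downstream steps cannot pick up half an upload.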

~$0 infrastructure

The FastAPI backend runs on a Mac mini behind a Cloudflare Tunnel — no public IP, no server bill — with the Next.js front end on Vercel. The only variable cost is model spend per review.

Multi-tenant from day one

Built around a client registry, not hard-coded to one carrier — so onboarding the second and third fleet is configuration, not a rewrite.
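A client registry in this spirit might look like the sketch below. Every field name, the `ClientRegistry` class, and the example carrier are hypothetical; the point is only that adding a fleet is a new record, and that every file path is resolved through the tenant's own root.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ClientConfig:
    client_id: str
    display_name: str
    base_jurisdiction: str  # where the carrier files its return
    data_root: Path         # per-tenant storage root


class ClientRegistry:
    """In-memory registry; a real one would likely load from a config file."""

    def __init__(self) -> None:
        self._clients: dict[str, ClientConfig] = {}

    def register(self, config: ClientConfig) -> None:
        if config.client_id in self._clients:
            raise ValueError(f"duplicate client_id: {config.client_id}")
        self._clients[config.client_id] = config

    def get(self, client_id: str) -> ClientConfig:
        try:
            return self._clients[client_id]
        except KeyError:
            raise KeyError(f"unknown client: {client_id}") from None

    def tenant_path(self, client_id: str, filename: str) -> Path:
        """Resolve a file inside the tenant's root, dropping any directory part."""
        return self.get(client_id).data_root / Path(filename).name


# Hypothetical carrier, for illustration only.
registry = ClientRegistry()
registry.register(
    ClientConfig("acme-freight", "Acme Freight LLC", "KY", Path("/data/acme-freight"))
)
upload_target = registry.tenant_path("acme-freight", "../other-tenant/fuel.csv")
```

Resolving paths through the registry keeps one tenant's upload from landing in another tenant's folder, even when a filename tries to climb out.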

05 Results

Shipped, in use, and provably correct.

  • In production with a recurring real-world client (DM Express Inc., Kentucky).
  • Backtested against a real KY carrier's Q4 2025 CDTFA filing — matches to the penny.
  • 429 automated tests run on every change, including real-data backtests.
  • Per-truck reconciliation sums to the fleet total to within $5 of rounding drift.
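The reconciliation check in the last bullet can be written as a short invariant. This is a sketch under assumptions: `reconciles` is a hypothetical name, the $5 tolerance is taken from the bullet, and the figures are invented.

```python
from decimal import Decimal


def reconciles(
    per_truck: dict[str, Decimal],
    fleet_total: Decimal,
    tolerance: Decimal = Decimal("5.00"),
) -> tuple[bool, Decimal]:
    """Check that per-truck tax sums to the fleet total within a tolerance.

    Returns (ok, drift), where drift is the per-truck sum minus the fleet total.
    """
    drift = sum(per_truck.values(), Decimal("0")) - fleet_total
    return abs(drift) <= tolerance, drift


# Invented figures, for illustration only.
trucks = {"truck-101": Decimal("100.10"), "truck-102": Decimal("200.20")}
ok, drift = reconciles(trucks, Decimal("300.00"))
bad, bad_drift = reconciles(trucks, Decimal("290.00"))
```

Drift arises because each truck's share is rounded on its own, so the check bounds the accumulated rounding rather than demanding an exact match.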

Hours → minutes

A quarter that used to mean an afternoon of spreadsheet reconciliation and hand-keying per-state lines comes back in minutes — already reviewed, with a portal-ready CSV, a per-truck Excel for each owner-operator, and a written check of what to fix before filing.

What's next

A hosted vector store for the rule base, a public eval dashboard (golden filings → pass rate), and onboarding the next two carriers.

See it working.

The product is live and accepting carriers. The pipeline, agent, and tests live in a real repo.