AccounTech Buddy
Multi-agent month-end close system
What it does
- A root Accounting Coordinator agent reads each user request, decides which workflow to run, and tracks session state across the run.
- Transaction Categorization workflow: loads the bank CSV and Chart of Accounts, splits the transactions into small batches so multiple agents can classify in parallel, and returns each line with a confidence score (0.0–1.0) and a short reasoning note.
- A filtering pass flags low-confidence categorizations, generates account-usage stats, and writes a review summary so a human knows exactly where to look.
- Journal Entry Generation workflow: picks up the categorized data and writes balanced double-entry journal entries. Each transaction debits the expense or revenue account it was categorized into and credits the cash control account, so total debits always equal total credits.
- Session Management tools let the agent reload prior categorization results so a run can be resumed or revised without redoing the AI work.
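The balancing rule in the journal-entry workflow is simple enough to sketch. This is a hypothetical illustration, not the project's actual code: the `CASH_ACCOUNT` name and the record fields are assumptions, but the rule is the one described above — each transaction debits the account it was categorized into and credits the cash control account, so totals always balance.

```python
CASH_ACCOUNT = "1000 - Cash"  # assumed name for the cash control account

def build_journal(categorized):
    """Turn categorized transactions (dicts with 'date', 'description',
    'account', 'amount') into balanced double-entry journal entries."""
    entries = []
    for txn in categorized:
        entries.append({
            "date": txn["date"],
            "description": txn["description"],
            # Debit the expense/revenue account the transaction was categorized into...
            "debit": {"account": txn["account"], "amount": txn["amount"]},
            # ...and credit the cash control account for the same amount.
            "credit": {"account": CASH_ACCOUNT, "amount": txn["amount"]},
        })
    return entries

def is_balanced(entries):
    """Total debits must equal total credits; true by construction here."""
    total_debits = sum(e["debit"]["amount"] for e in entries)
    total_credits = sum(e["credit"]["amount"] for e in entries)
    return round(total_debits, 2) == round(total_credits, 2)
```

Because every entry mirrors its own amount on both sides, the balance check can never fail — which is the point: balancing is enforced by code, not trusted to the model.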
Why I built it
I had spent a few weeks using AI for the basics: explaining standards, polishing communications, cleaning Excel formulas. I wanted to push further. I needed a real project to find out where the models actually held up and where the architecture broke down. Classifying bank transactions and generating journal entries was familiar enough that I would catch every wrong answer, and complex enough to push a multi-agent system past the demo path. AccounTech Buddy was the bet.
How it works
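At a high level: the coordinator reads the request, decides which workflow to run, and persists state to a file between runs so work can be resumed. A minimal sketch of that routing — the function names, keyword matching, and session-file path are illustrative stand-ins, not the real system, which uses Google ADK agents rather than plain functions:

```python
import json
from pathlib import Path

SESSION_FILE = Path("session_state.json")  # assumed location for file-based session state

def load_session():
    """Reload prior categorization results so a run can be resumed
    or revised without redoing the AI work."""
    if SESSION_FILE.exists():
        return json.loads(SESSION_FILE.read_text())
    return {}

def save_session(state):
    save_path = SESSION_FILE
    save_path.write_text(json.dumps(state, indent=2))

def route(request, state):
    """Coordinator decision: map a user request to one of the workflows."""
    text = request.lower()
    if "categor" in text:
        return "transaction_categorization"
    if "journal" in text:
        return "journal_entry_generation"
    return "clarify_with_user"
```

In the real system the routing decision is made by the root agent's LLM rather than keyword matching; the sketch only shows the shape of the dispatch plus the simple file-based state that replaced the earlier, more elaborate session machinery.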
What was hard
- The vibe-coding trap. Every Cursor suggestion felt reasonable in isolation, and I rode the "look at me, I am basically a software architect now" energy until a month later I had built a monster that could not reliably do what I designed it for.
- Agent overload. The first version crammed dozens of tools and a half-dozen sub-agents into one root agent with a massive prompt. It worked once in a while, then failed in a new way the next run.
- AI vs. code confusion. I could not figure out when to let an agent "think through" a problem versus when to write a deterministic function. The wrong call in either direction blew up reliability.
- Death by large files. I was feeding entire documents to agents, which caused memory issues and scaling problems. Fine on small samples, broken on real input.
- The parallelism mismatch. Google ADK runs sub-agents in parallel, but what I actually needed was map-reduce: the same agent running on different chunks of one job, not different agents running on the same job. I had to manually implement file chunking and custom parallel processing.
- The 2 AM rebuild. After debugging for the 50th time, I realized I was fighting the wrong battle. I stripped out 90% of the clever architecture and rebuilt using audit thinking: map the process flow, sample-test instead of processing everything at once, give each agent a small specific task, exception reporting (auto-approve high confidence, flag unusual), simple file-based session state. The current diagram is the rebuild, not the monster.
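The map-reduce workaround and the exception-reporting rule above combine into one small pattern. A sketch under stated assumptions: `classify_chunk` is a placeholder for the real agent call, and the chunk size and 0.9 auto-approve threshold are illustrative, not the project's actual values.

```python
from concurrent.futures import ThreadPoolExecutor

def classify_chunk(chunk):
    """Placeholder for the agent call; the real version prompts an LLM
    with the Chart of Accounts and returns one result per transaction."""
    return [{"txn": t, "account": "uncategorized", "confidence": 0.5} for t in chunk]

def chunked(items, size):
    """Split one job into fixed-size chunks for the same classifier."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def map_reduce_classify(transactions, chunk_size=25, workers=4):
    # Map: the same classifier over different chunks of one job, in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunk_results = pool.map(classify_chunk, chunked(transactions, chunk_size))
    # Reduce: merge chunk results back into one list, in original order.
    results = [r for chunk in chunk_results for r in chunk]
    # Exception reporting: auto-approve high confidence, flag the rest for review.
    approved = [r for r in results if r["confidence"] >= 0.9]
    flagged = [r for r in results if r["confidence"] < 0.9]
    return approved, flagged
```

`ThreadPoolExecutor.map` preserves input order, so the merged list lines up with the source file even though chunks finish at different times — which matters when a human has to trace a flagged line back to the bank statement.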
Outcome
- A categorized list of every bank transaction, with a confidence score and a short reasoning line so a human knows where to double-check.
- A balanced double-entry journal ready to drop into accounting software.
- A period-end summary report with account-level totals and the low-confidence items flagged for review.
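The summary report is, at bottom, a deterministic aggregation over the categorized results — exactly the kind of step that belongs in code rather than in an agent. A minimal sketch; the field names and the 0.8 review threshold are assumptions for illustration:

```python
from collections import defaultdict

def period_summary(results, review_threshold=0.8):
    """Account-level totals plus the low-confidence items flagged for review."""
    totals = defaultdict(float)
    review = []
    for r in results:
        totals[r["account"]] += r["amount"]
        if r["confidence"] < review_threshold:
            review.append(r)  # a human knows exactly where to look
    return dict(totals), review
```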
Tech
Google Agent Development Kit · Python · LLMs · ThreadPoolExecutor