AccounTech Buddy
Multi-agent month-end close system
What it does
- A root Accounting Coordinator agent reads each user request, decides which workflow to run, and tracks session state across the run.
- Transaction Categorization workflow: loads the bank CSV and Chart of Accounts, splits the transactions into small batches so multiple agents can classify in parallel, and returns each line with a confidence score (0.0–1.0) and a short reasoning note.
- A filtering pass flags low-confidence categorizations, generates account-usage stats, and writes a review summary so a human knows exactly where to look.
- Journal Entry Generation workflow: picks up the categorized data and writes balanced double-entry journal entries. Each transaction debits the expense or revenue account it was categorized into and credits the cash control account, so total debits always equal total credits.
- Session Management tools let the agent reload prior categorization results so a run can be resumed or revised without redoing the AI work.
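The balancing rule in the journal-entry workflow is simple enough to sketch. This is a hypothetical illustration, not the project's actual code: the `CASH_ACCOUNT` name and the record fields are assumptions, but the rule is the one described above — each transaction debits the account it was categorized into and credits the cash control account, so totals always balance.

```python
CASH_ACCOUNT = "1000 - Cash"  # assumed name for the cash control account

def build_journal(categorized):
    """Turn categorized transactions (dicts with 'date', 'description',
    'account', 'amount') into balanced double-entry journal entries."""
    entries = []
    for txn in categorized:
        entries.append({
            "date": txn["date"],
            "description": txn["description"],
            # Debit the expense/revenue account the transaction was categorized into...
            "debit": {"account": txn["account"], "amount": txn["amount"]},
            # ...and credit the cash control account for the same amount.
            "credit": {"account": CASH_ACCOUNT, "amount": txn["amount"]},
        })
    return entries

def is_balanced(entries):
    """Total debits must equal total credits; true by construction here."""
    total_debits = sum(e["debit"]["amount"] for e in entries)
    total_credits = sum(e["credit"]["amount"] for e in entries)
    return round(total_debits, 2) == round(total_credits, 2)
```

Because every entry mirrors its own amount on both sides, the balance check can never fail — which is the point: balancing is enforced by code, not trusted to the model.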
Why I built it
I had spent a few weeks using AI for the basics: explaining standards, polishing communications, cleaning Excel formulas. I wanted to push further. I needed a real project to find out where the models actually held up and where the architecture broke down. Classifying bank transactions and generating journal entries was familiar enough that I would catch every wrong answer, and complex enough to push a multi-agent system past the demo path. AccounTech Buddy was the bet.
How it works
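At a high level: the coordinator reads the request, decides which workflow to run, and persists state to a file between runs so work can be resumed. A minimal sketch of that routing — the function names, keyword matching, and session-file path are illustrative stand-ins, not the real system, which uses Google ADK agents rather than plain functions:

```python
import json
from pathlib import Path

SESSION_FILE = Path("session_state.json")  # assumed location for file-based session state

def load_session():
    """Reload prior categorization results so a run can be resumed
    or revised without redoing the AI work."""
    if SESSION_FILE.exists():
        return json.loads(SESSION_FILE.read_text())
    return {}

def save_session(state):
    save_path = SESSION_FILE
    save_path.write_text(json.dumps(state, indent=2))

def route(request, state):
    """Coordinator decision: map a user request to one of the workflows."""
    text = request.lower()
    if "categor" in text:
        return "transaction_categorization"
    if "journal" in text:
        return "journal_entry_generation"
    return "clarify_with_user"
```

In the real system the routing decision is made by the root agent's LLM rather than keyword matching; the sketch only shows the shape of the dispatch plus the simple file-based state that replaced the earlier, more elaborate session machinery.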
What was hard
- The vibe-coding trap. Every Cursor suggestion felt reasonable in isolation, and I rode the "look at me, I am basically a software architect now" energy until a month later I had built a monster that could not reliably do what I designed it for.
- Agent overload. The first version crammed dozens of tools and a half-dozen sub-agents into one root agent with a massive prompt. It worked once in a while, then failed in a new way the next run.
- AI vs. code confusion. I could not figure out when to let an agent "think through" a problem versus when to write a deterministic function. The wrong call in either direction blew up reliability.
- Death by large files. I was feeding entire documents to agents, which caused memory issues and scaling problems. Fine on small samples, broken on real input.
- The parallelism mismatch. Google ADK runs sub-agents in parallel, but what I actually needed was map-reduce: the same agent running on different chunks of one job, not different agents running on the same job. I had to manually implement file chunking and custom parallel processing.
- The 2 AM rebuild. After debugging for the 50th time, I realized I was fighting the wrong battle. I stripped out 90% of the clever architecture and rebuilt using audit thinking: map the process flow, sample-test instead of processing everything at once, give each agent a small specific task, exception reporting (auto-approve high confidence, flag unusual), simple file-based session state. The current diagram is the rebuild, not the monster.
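The map-reduce workaround and the exception-reporting rule above combine into one small pattern. A sketch under stated assumptions: `classify_chunk` is a placeholder for the real agent call, and the chunk size and 0.9 auto-approve threshold are illustrative, not the project's actual values.

```python
from concurrent.futures import ThreadPoolExecutor

def classify_chunk(chunk):
    """Placeholder for the agent call; the real version prompts an LLM
    with the Chart of Accounts and returns one result per transaction."""
    return [{"txn": t, "account": "uncategorized", "confidence": 0.5} for t in chunk]

def chunked(items, size):
    """Split one job into fixed-size chunks for the same classifier."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def map_reduce_classify(transactions, chunk_size=25, workers=4):
    # Map: the same classifier over different chunks of one job, in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunk_results = pool.map(classify_chunk, chunked(transactions, chunk_size))
    # Reduce: merge chunk results back into one list, in original order.
    results = [r for chunk in chunk_results for r in chunk]
    # Exception reporting: auto-approve high confidence, flag the rest for review.
    approved = [r for r in results if r["confidence"] >= 0.9]
    flagged = [r for r in results if r["confidence"] < 0.9]
    return approved, flagged
```

`ThreadPoolExecutor.map` preserves input order, so the merged list lines up with the source file even though chunks finish at different times — which matters when a human has to trace a flagged line back to the bank statement.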
Outcome
- A categorized list of every bank transaction, with a confidence score and a short reasoning line so a human knows where to double-check.
- A balanced double-entry journal ready to drop into accounting software.
- A period-end summary report with account-level totals and the low-confidence items flagged for review.
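The summary report is, at bottom, a deterministic aggregation over the categorized results — exactly the kind of step that belongs in code rather than in an agent. A minimal sketch; the field names and the 0.8 review threshold are assumptions for illustration:

```python
from collections import defaultdict

def period_summary(results, review_threshold=0.8):
    """Account-level totals plus the low-confidence items flagged for review."""
    totals = defaultdict(float)
    review = []
    for r in results:
        totals[r["account"]] += r["amount"]
        if r["confidence"] < review_threshold:
            review.append(r)  # a human knows exactly where to look
    return dict(totals), review
```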
Tech
Google Agent Development Kit · Python · LLMs · ThreadPoolExecutor