i. What AI can ship today
This is roughly the boundary as of late 2025, in our hands, on real client projects:
- First-pass code for well-defined features in mainstream languages and frameworks. Often 80% correct, always needing senior review.
- Test scaffolding and migration scripts. Boring, repetitive, deterministic work where the LLM's bias toward verbosity is a feature.
- Translation between formats — JSON to SQL, Markdown to HTML, log scrapers, one-off ETL. The fastest 10× win we've measured.
- Document search over a closed corpus, with citations. Done well, this replaces a junior researcher.
- Draft copy in English. Acceptable but tonally flat; needs a human pass.
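To make the format-translation item concrete, here is a minimal sketch of a one-off JSON to SQL loader of the kind described above. It is an illustration, not code from a client project: the `load_json_rows` helper, the `customers` table, and the sample data are all hypothetical, and it assumes a JSON array of flat objects that share the same keys.

```python
import json
import sqlite3


def load_json_rows(conn, table, json_text):
    """Insert a JSON array of flat objects into an existing table; return the row count."""
    rows = json.loads(json_text)
    if not rows:
        return 0
    # Assumption: every object has the same keys as the first one.
    columns = list(rows[0].keys())
    # Identifiers cannot be bound as parameters, so validate them before interpolating.
    for name in [table, *columns]:
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    # Values go through bound parameters, never string formatting.
    conn.executemany(sql, [tuple(row.get(c) for c in columns) for row in rows])
    conn.commit()
    return len(rows)


# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
sample = '[{"id": 1, "name": "Ada", "city": "Hong Kong"}, {"id": 2, "name": "Lin", "city": "Taipei"}]'
inserted = load_json_rows(conn, "customers", sample)
result = conn.execute("SELECT name, city FROM customers ORDER BY id").fetchall()
```

The identifier check and the bound parameters are the sort of detail a first-pass draft can omit, which is one reason the review step stays with a person.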
ii. What only humans ship
- The taste judgment — what to build, what to cut, in what order. Every LLM we have used will happily over-build to please you.
- Architecture decisions with multi-year consequences. The model has no view of your hiring plan, your investor narrative, or what the team can maintain in three years.
- The 5% of edge cases that determine whether the product works for a real customer at 11pm. Models are statistical; production is not.
- Conversations with stakeholders. A senior engineer who can disagree with the founder, in person, and stay in the room, is irreplaceable.
- Traditional Chinese (繁體中文) with the right register. Models translate; they do not yet write with the cadence a Hong Kong customer expects.
iii. The split, in practice
On our engagements, AI now contributes roughly 40% of the keystrokes and 5-10% of the judgment. Headcount has not changed in two years. Output has roughly doubled. We are spending the difference on more careful design, better tests, and longer relationships with each client.
The right question is not "how much can AI do" but "what is now possible because the boring parts are cheaper". The answer, on the engagements we are proudest of, is "more of the thing that actually matters".