Anchovy Labs
Development Infrastructure

Hard problems we're working on

Using AI to write code is the easy part. The hard part is making agentic workflows reliable enough to trust. These are the questions we spend most of our infrastructure time on.

Quality at generation speed

Agents produce code faster than humans can review it. We gate every change against spec-driven acceptance criteria before it merges — automated verification, not trust. It catches more than manual review did, but calibrating the gates is ongoing work.

Context across long-running work

A session that loses context mid-task makes worse decisions than one that never had it. We maintain structured memory across sessions so that decisions, constraints, and domain knowledge persist. Getting compaction right — what to keep, what to drop — is harder than it sounds.

Model routing

We route through a model hierarchy that optimizes for correctness first, cost second, speed third. The routing adapts automatically when providers go down. This isn't novel — it's table stakes — but getting it reliable across many providers took longer than expected.

Detecting when things go off track

Autonomous processes fail silently in ways that are expensive to debug after the fact. We use stagnation detection and circuit breakers to catch drift early. It works well enough that we trust unattended runs, but the false-positive rate is still higher than we'd like.

Observability

You can't trust a system you can't see into. Every session, routing decision, and verification outcome is logged and traceable. The goal is that any failure can be understood from the logs alone — we're close but not there yet.

Giving agents the right knowledge

General-purpose models make generic decisions. We maintain curated knowledge bases — domain documents, verified research, internal doctrine — that agents retrieve at decision time. The retrieval layer is tuned so agents get relevant context without drowning in noise. Building the knowledge is the easy part; keeping it current and knowing when to distrust it is the ongoing work.

None of this is solved. It's infrastructure we build and revise continuously because the alternative is slowing down or shipping lower quality work. Neither is acceptable.