Why @Via reads with a fast model and acts with a careful one

Asking one model to both read intent quickly and act carefully ends up expensive and slow. Here's the two-tier routing behind Via (a fast model classifies what you want, a capable one does it) and why it makes multi-step requests more reliable.

Evelyn Ong · June 19, 2026 · 5 min read

When you mention @Via in Slack, two very different jobs hide behind that one message. First, something has to figure out what you actually want — file an issue, set a reminder, answer a question, or all three at once. Then something has to do it well — write the issue body, pick the severity, call the right tool with the right arguments. Most assistants hand both jobs to one big model. That's where the cost and the latency come from, and oddly, where a lot of the mistakes do too.

One model doing two jobs is a bad trade

The instinct is to reach for the most capable model and let it handle everything. But intent classification and careful execution pull in opposite directions:

Reading intent wants to be fast and cheap. It runs on every message, most of which are simple. Paying frontier-model latency to decide "this is a reminder" is wasteful.
Taking action wants to be careful. It runs less often but the stakes are higher — a malformed tool call or a wrong assignee is a real mistake a user has to undo.

Fuse them and you get the worst of both: every trivial message pays the expensive model's tax, and the expensive model — primed to be decisive — sometimes over-acts on a message that only needed a one-line answer.

Two tiers, one request

Via splits the work. A fast, inexpensive model reads the message and emits a small structured plan. A capable model then executes only the steps that plan calls for.

{
  "reply": "On it — filing that and reminding the channel Friday.",
  "actions": [
    { "tool": "create_issue", "title": "Checkout 500s on Safari", "severity": "high" },
    { "tool": "set_reminder", "when": "friday 9am", "text": "Verify checkout fix" }
  ]
}

The fast tier never touches your data or calls a tool — it only decides what should happen. The capable tier never has to guess at intent — it's handed an explicit list and spends its budget getting each step right.
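To make the hand-off concrete, here is a minimal sketch of the routing in Python. It is an illustration under assumptions, not Via's actual code: `parse_plan`, `handle_message`, `KNOWN_TOOLS`, and the two model callables are hypothetical names, and the tool list is simply the two tools that appear in the example plan above.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Any, Callable

# Assumption: only the two tools from the example plan; the real set is not documented here.
KNOWN_TOOLS = {"create_issue", "set_reminder"}


@dataclass
class Plan:
    reply: str
    actions: list[dict[str, Any]] = field(default_factory=list)


def parse_plan(raw: str) -> Plan:
    """Validate the fast tier's JSON output before anything acts on it."""
    data = json.loads(raw)
    actions = data.get("actions", [])
    for action in actions:
        if action.get("tool") not in KNOWN_TOOLS:
            raise ValueError(f"unknown tool in plan: {action.get('tool')!r}")
    return Plan(reply=data["reply"], actions=actions)


def handle_message(
    message: str,
    fast_model: Callable[[str], str],              # hypothetical: returns the plan as JSON text
    capable_model: Callable[[dict[str, Any]], str],  # hypothetical: executes one planned action
) -> tuple[str, list[str]]:
    """Read with the fast tier; call the capable tier once per planned action."""
    plan = parse_plan(fast_model(message))
    # An empty action list means the capable tier is never invoked.
    results = [capable_model(action) for action in plan.actions]
    return plan.reply, results
```

In this sketch a plain question produces a plan with no actions, so `handle_message` returns the reply without ever calling `capable_model`. Rejecting unknown tools at parse time is one way to keep a misread intent from reaching the acting tier.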

Why the split makes it more reliable

Routing isn't only about cost. Separating "decide" from "do" gives each stage a smaller, sharper job:

Stage	Model	Job	Optimised for
Read	Fast	Classify intent → plan	Latency, recall
Act	Capable	Execute each tool call	Precision

A multi-action request — "file this as a bug, remind the channel Friday, and tag @sam" — used to be where single-model assistants slipped, dropping a step or merging two into one. With an explicit plan in hand, the capable model executes a checklist instead of re-deriving intent mid-flight.

Smaller jobs are easier jobs. The fast model only has to be right about what; the careful model only has to be right about how.
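The "checklist" idea can be sketched as a loop that produces exactly one result per planned step. This is an assumed shape rather than Via's implementation: `StepResult`, `run_checklist`, and the `execute` callable are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class StepResult:
    tool: str
    ok: bool
    detail: str


def run_checklist(
    actions: list[dict[str, Any]],
    execute: Callable[[dict[str, Any]], str],  # hypothetical: performs one tool call
) -> list[StepResult]:
    """Run every planned step in order, recording one result per step."""
    results: list[StepResult] = []
    for action in actions:
        tool = action.get("tool", "<missing>")
        try:
            results.append(StepResult(tool, True, execute(action)))
        except Exception as exc:
            # A failed step is reported, not silently dropped, and later steps still run.
            results.append(StepResult(tool, False, str(exc)))
    # One result per planned action: nothing was skipped or merged.
    assert len(results) == len(actions)
    return results
```

Because the plan is fixed before execution starts, a three-part request yields three results, and a failure in one step is visible next to the steps that succeeded.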

What it feels like in the channel

The visible payoff is mundane in the best way. Simple questions come back quickly because they never wake the expensive tier. Complex, multi-step asks land all their steps because the plan was written down before any action ran. And the bill tracks usage honestly — you pay frontier prices only for the messages that actually move something.
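The billing claim comes down to simple arithmetic, sketched below. The prices and the share of messages that trigger actions are made-up illustrative numbers, not Via's figures, and `cost_per_message` and `break_even_action_rate` are hypothetical helper names.

```python
def cost_per_message(fast_cost: float, capable_cost: float, action_rate: float) -> float:
    """Expected two-tier cost: every message pays the fast tier,
    and only the fraction with actions also pays the capable tier."""
    if not 0.0 <= action_rate <= 1.0:
        raise ValueError("action_rate must be between 0 and 1")
    return fast_cost + action_rate * capable_cost


def break_even_action_rate(fast_cost: float, capable_cost: float) -> float:
    """Action rate below which two tiers cost less than sending
    every message to the capable model alone."""
    return 1.0 - fast_cost / capable_cost


# Illustrative units only: fast tier costs 1, capable tier costs 20,
# and a quarter of messages need an action.
two_tier = cost_per_message(fast_cost=1.0, capable_cost=20.0, action_rate=0.25)
single_model = 20.0
print(two_tier, single_model)  # 6.0 20.0
```

Under these assumed numbers the split stays cheaper until roughly 95% of messages need the capable tier, which is the sense in which cost follows the messages that actually do something.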

None of this is exposed as knobs. You mention Via, it reads, it acts. The two tiers are just how it stays both fast and careful at the same time — without asking you to choose.

Curious how Via fits the rest of the workspace? Read the assistant docs or add Viably to Slack.
