How model and effort evaluation became inference-router, a reusable skill that can score any capability before it is built.
The previous article explained why “which model should I use?” was the wrong altitude for a pipeline. If you have not read it, start here: I Asked Which Model to Use. The Codebase Answered: For Which Step?
This article picks up with the next design move.
Once model selection became a per-step decision, I had a new problem. If every generated capability needed model and effort assignments, where should that decision live?
The easy answer would have been to bury the logic inside capability-architect.
But that would make the builder too smart and everything else too dependent on it.
Instead, the routing process became its own skill: inference-router.
While working on capability-architect, I reviewed the AIMM model-selection guide for anything that could transfer into skill design.
The useful discovery was that model choice was not enough.
Every component needed two independent assignments:
model tier
effort level
Model controls the capability ceiling. Effort controls how much deliberation the model applies.
The routing table needed to answer questions like:
That last question became the cost-substitution heuristic:
model and effort are cost substitutes
In plain English: sometimes a lower-tier model at higher effort is the right choice. Sometimes a stronger model at lower effort is better. Climbing the model ladder is not always the answer.
For AIMM members, this matters because “use the best model” is a blunt habit. The better question is “what is the cheapest reliable path to the quality bar for this component?”
Concept #1: Assign model and effort independently.
A model name is not a routing decision. Effort changes how the model works and should be chosen on purpose.
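The cost-substitution idea can be sketched as a small search over (model, effort) pairs. This is only an illustration under stated assumptions: the tier names, the relative cost numbers, and the `passes_quality_bar` callback are all hypothetical, and in practice that callback would be backed by whatever evaluation you trust for the component.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Optional

# Hypothetical relative costs for illustration only; not real pricing.
MODEL_COST = {"small": 1.0, "mid": 4.0, "flagship": 15.0}
EFFORT_MULTIPLIER = {"low": 1.0, "medium": 2.0, "high": 4.0}


@dataclass(frozen=True)
class Assignment:
    """Model tier and effort level, chosen independently."""
    model: str
    effort: str

    @property
    def cost(self) -> float:
        return MODEL_COST[self.model] * EFFORT_MULTIPLIER[self.effort]


def cheapest_reliable(
    passes_quality_bar: Callable[[Assignment], bool],
) -> Optional[Assignment]:
    """Return the lowest-cost (model, effort) pair that clears the bar, if any."""
    candidates = [Assignment(m, e) for m, e in product(MODEL_COST, EFFORT_MULTIPLIER)]
    reliable = [a for a in candidates if passes_quality_bar(a)]
    return min(reliable, key=lambda a: a.cost, default=None)


# Hypothetical component: needs either mid+high or any flagship setting.
def needs_depth(a: Assignment) -> bool:
    return (a.model == "mid" and a.effort == "high") or a.model == "flagship"


print(cheapest_reliable(needs_depth))
```

With these made-up numbers, the stronger model at low effort undercuts the mid tier at high effort. A component that only needs more deliberation lands on the small model at high effort instead. Neither direction is the default; the quality bar decides.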
The user clarified the desired shape:
I want to apply, as a separate process, the skill that determines what the correct level of inference has to be applied to each skill; and then, based on that result, to create a skill that fits into the .ailib
That sentence split the architecture cleanly.
There were now two jobs:
Those are related, but they are not the same job.
If capability-architect owned both, the routing knowledge would be trapped inside the builder. Any other skill or agent that needed routing advice would have to duplicate the logic or call the whole builder just to get a model recommendation.
The better decomposition was:
references/model-effort-routing.md
skills/inference-router/SKILL.md
skills/capability-architect/
The shared reference holds the routing knowledge.
The inference-router skill applies that knowledge.
The capability-architect skill consumes the result when generating a bundle.
Concept #2: Make routing a reusable process, not an inline habit.
If many capabilities need the same decision, the decision process deserves its own skill.
The routing knowledge moved to sibling scope:
.ailibrary/references/model-effort-routing.md
That file contains the reusable ideas:
This matters because a routing table is not owned by capability-architect. A writing orchestrator can use it. A research agent can use it. A course-design system can use it. A future skill evaluator can use it.
That is exactly the local-vs-shared rule from the first article in the series. If only one skill uses a resource, keep it local. If multiple skills can use it, promote it to sibling scope.
The routing table was clearly reusable, so it belonged at the root reference layer.
The inference-router skill needed a clean output.
It should not return a long essay. It should return structured decisions that another process can use.
The contract looked like this:
[
  {
    "component": "draft-fork",
    "step_class": "long-form_generation",
    "model": "opus",
    "effort": "high",
    "rationale": "Voice and coherence are the deliverable; output is unbounded."
  }
]
That output is small, but it carries the essential decision.
The component says what is being routed.
The step_class says what kind of work it is.
The model and effort say how much inference should be applied.
The rationale makes the decision auditable.
This is important. A routing decision without rationale becomes superstition. A routing decision with rationale can be reviewed, corrected, and reused.
Concept #3: Return decisions with reasons.
A model assignment should be inspectable. Otherwise the system cannot learn from bad routing.
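A consuming process can enforce that contract before it trusts the output. The following is a minimal sketch, assuming the router returns a JSON array shaped like the example in this article. The function name `parse_assignments` is hypothetical and is not part of the actual skill.

```python
import json

# Field names taken from the contract example in this article.
REQUIRED_FIELDS = ("component", "step_class", "model", "effort", "rationale")


def parse_assignments(raw: str) -> list:
    """Parse router output and reject any assignment with a missing or empty field."""
    assignments = json.loads(raw)
    if not isinstance(assignments, list):
        raise ValueError("router output must be a JSON array")
    for index, item in enumerate(assignments):
        if not isinstance(item, dict):
            raise ValueError(f"assignment {index} must be a JSON object")
        missing = [f for f in REQUIRED_FIELDS if not str(item.get(f, "")).strip()]
        if missing:
            raise ValueError(f"assignment {index} is missing: {', '.join(missing)}")
    return assignments


router_output = """
[
  {
    "component": "draft-fork",
    "step_class": "long-form_generation",
    "model": "opus",
    "effort": "high",
    "rationale": "Voice and coherence are the deliverable; output is unbounded."
  }
]
"""
for assignment in parse_assignments(router_output):
    print(assignment["component"], assignment["model"], assignment["effort"])
```

Treating an empty rationale as a hard error is a design choice in this sketch: it keeps unexplained assignments from slipping into a generated bundle.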
One explicit warning belonged in the router:
don't default everything to flagship+high
That is the failure mode of most model-selection conversations.
If the work feels important, people reach for the strongest model at high effort. But importance to the business is not the same as difficulty for the model.
A formatting step may be business-critical and still mechanically simple. A fact-checking step may be high-stakes but depend more on retrieval design than raw intelligence. A copy-editing step may matter, but low effort may be exactly right.
The router’s job is to resist prestige routing.
It should ask:
That makes routing a diagnostic process, not a status symbol.
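One way to make that warning mechanical is a small lint pass over the router's output. This is a hedged sketch and not the router's actual behavior: the function name, the default tier labels (`flagship`, `high`), and the two specific checks are assumptions chosen for illustration.

```python
def prestige_routing_warnings(assignments, top_model="flagship", top_effort="high"):
    """Flag signs that the strongest setting was used by default, not by diagnosis."""
    warnings = []
    top = [
        a for a in assignments
        if a["model"] == top_model and a["effort"] == top_effort
    ]
    # If every component in a multi-step capability got the top setting,
    # the routing probably followed importance rather than difficulty.
    if len(assignments) > 1 and len(top) == len(assignments):
        warnings.append(
            f"all {len(assignments)} components use {top_model}+{top_effort}"
        )
    # The most expensive setting should always carry a reason.
    for a in top:
        if not a.get("rationale", "").strip():
            warnings.append(f"{a['component']}: {top_model}+{top_effort} without a rationale")
    return warnings


# Hypothetical router output for a three-step capability.
suspicious = [
    {"component": "outline", "model": "flagship", "effort": "high", "rationale": ""},
    {"component": "draft", "model": "flagship", "effort": "high", "rationale": "Voice matters."},
    {"component": "format", "model": "flagship", "effort": "high", "rationale": "Important."},
]
for warning in prestige_routing_warnings(suspicious):
    print(warning)
```

A check like this cannot tell whether a rationale is any good. It only surfaces the pattern so a reviewer looks at it.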
After the decomposition, capability-architect no longer decided model and effort inline.
Its procedure changed to:
capability spec
-> inference-router assigns model + effort
-> capability-architect builds the skill bundle using those assignments
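The three-step procedure above can be sketched as two functions that only share a data contract. Everything concrete here is an assumption for illustration: the in-memory `ROUTING_TABLE` stands in for references/model-effort-routing.md, the `formatting` row and the `small` tier name are made up, and the bundle is reduced to a plain dictionary.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the shared routing reference.
# Maps step class -> (model, effort, rationale).
ROUTING_TABLE = {
    "long-form_generation": (
        "opus", "high",
        "Voice and coherence are the deliverable; output is unbounded.",
    ),
    "formatting": (
        "small", "low",
        "Mechanical transformation with a fixed output shape.",
    ),
}


@dataclass(frozen=True)
class Component:
    name: str
    step_class: str


def inference_router(components, table=ROUTING_TABLE):
    """Assign model and effort per component; knows nothing about bundles."""
    assignments = []
    for component in components:
        if component.step_class not in table:
            raise ValueError(f"no routing rule for step class {component.step_class!r}")
        model, effort, rationale = table[component.step_class]
        assignments.append({
            "component": component.name,
            "step_class": component.step_class,
            "model": model,
            "effort": effort,
            "rationale": rationale,
        })
    return assignments


def build_bundle(capability_name, assignments):
    """Build a bundle from finished assignments; knows nothing about routing rules."""
    return {"name": capability_name, "steps": list(assignments)}


spec = [Component("draft-fork", "long-form_generation"), Component("export", "formatting")]
bundle = build_bundle("newsletter", inference_router(spec))
print(bundle["steps"][0]["model"], bundle["steps"][1]["model"])
```

Because `build_bundle` takes the assignments as an argument instead of calling the router itself, either side can be replaced without touching the other.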
That is cleaner.
The builder can focus on structure:
The router can focus on inference level:
The two processes can evolve independently.
If the model lineup changes, update the routing reference and router. The builder does not need to be rewritten.
If the skill-bundle architecture changes, update capability-architect. The routing principles still stand.
The immediate unlock is cleaner capability generation.
The larger unlock is reusable judgment.
A knowledge entrepreneur can use the same pattern for any AI workflow:
The router becomes a reusable thinking partner for architecture.
It does not answer “what model is best?” in the abstract. It answers “what model and effort does this component need, given the job it performs?”
That is a much more useful question.
inference-router applies routing knowledge and returns structured assignments.
capability-architect consumes those assignments when building new skill bundles.
The highest teaching value was the separation between routing knowledge, routing process, and capability construction. That is the design move readers can reuse.
I left model names somewhat generic because the durable pattern is the router architecture, not a temporary model lineup.
The single most valuable thing to add before publishing would be a sample model-effort-routing.md excerpt that members could adapt for their own stack.