The Model Picker Became More Useful When It Stopped Being Part of the Builder

How model and effort evaluation became inference-router, a reusable skill that can score any capability before it is built.

The previous article explained why “which model should I use?” was the wrong altitude for a pipeline. If you have not read it, start with “I Asked Which Model to Use. The Codebase Answered: For Which Step?”

This article picks up with the next design move.

Once model selection became a per-step decision, I had a new problem. If every generated capability needed model and effort assignments, where should that decision live?

The easy answer would have been to bury the logic inside capability-architect.

But that would make the builder too smart and everything else too dependent on it.

Instead, the routing process became its own skill: inference-router.

The Missing Axis

While working on capability-architect, I reviewed the AIMM model-selection guide for anything that could transfer into skill design.

The useful discovery was that model choice was not enough.

Every component needed two independent assignments:

model tier
effort level

Model controls the capability ceiling. Effort controls how much deliberation the model applies.
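A minimal Python sketch of the two axes follows. The tier and effort labels are hypothetical, because the article deliberately avoids fixing a vocabulary. What the sketch shows is structural: an assignment has two separate fields, so the space of choices is their cross product rather than one ladder of “better” models.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class ModelTier(Enum):
    # Hypothetical tier labels; swap in your own lineup.
    SMALL = "small"
    MID = "mid"
    FLAGSHIP = "flagship"


class Effort(Enum):
    # Hypothetical effort levels.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class Assignment:
    """One routing decision made on two independent axes."""
    model: ModelTier  # sets the capability ceiling
    effort: Effort    # sets how much deliberation is applied


# Independent axes mean every tier can pair with every effort level.
ROUTING_SPACE = [Assignment(m, e) for m, e in product(ModelTier, Effort)]
```

Three tiers and three effort levels give nine distinct assignments, including combinations such as a small model at high effort that a single model ladder would never surface.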

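The routing table needed to answer more than “which model?” It also had to say how much effort each component deserved, and when a move on one axis could stand in for a move on the other.

That trade-off became the cost-substitution heuristic: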

model and effort are cost substitutes

In plain English: sometimes a lower-tier model at higher effort is the right choice. Sometimes a stronger model at lower effort is better. The point is not to climb the model ladder by default.

For AIMM members, this matters because “use the best model” is a blunt habit. The better question is “what is the cheapest reliable path to the quality bar for this component?”
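“Cheapest reliable path to the quality bar” can be sketched as a search over model and effort combinations. This is an illustration, not the router’s actual logic: `Option`, `cheapest_reliable`, and every cost and quality number below are hypothetical, and real quality estimates would have to come from your own evaluations.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    model: str      # hypothetical tier label
    effort: str     # hypothetical effort label
    cost: float     # relative cost per call (illustrative only)
    quality: float  # estimated quality on this component, 0.0 to 1.0


def cheapest_reliable(options: list[Option], quality_bar: float) -> Option | None:
    """Return the lowest-cost option that clears the bar, or None.

    Model and effort are treated as cost substitutes: any combination
    is eligible, and whichever reaches the bar most cheaply wins.
    """
    eligible = [o for o in options if o.quality >= quality_bar]
    return min(eligible, key=lambda o: o.cost, default=None)


OPTIONS = [
    Option("small", "high", cost=2.0, quality=0.86),
    Option("mid", "low", cost=3.0, quality=0.88),
    Option("flagship", "high", cost=10.0, quality=0.95),
]
```

With a bar of 0.85 the small model at high effort wins. Raise the bar to 0.87 and the choice flips to the mid tier at low effort, a stronger model at lower effort. Only a bar above 0.88 justifies the flagship.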

Concept #1: Assign model and effort independently.
A model name is not a routing decision. Effort changes how the model works and should be chosen on purpose.

Why It Needed to Be Separate

The user clarified the desired shape:

I want to apply, as a separate process, the skill that determines what the correct level of inference has to be applied to each skill; and then, based on that result, to create a skill that fits into the .ailib

That sentence split the architecture cleanly.

There were now two jobs:

  1. Evaluate what level of inference each component needs.
  2. Build the skill bundle using that evaluation.

Those are related, but they are not the same job.

If capability-architect owned both, the routing knowledge would be trapped inside the builder. Any other skill or agent that needed routing advice would have to duplicate the logic or call the whole builder just to get a model recommendation.

The better decomposition was:

references/model-effort-routing.md
skills/inference-router/SKILL.md
skills/capability-architect/

The shared reference holds the routing knowledge.

The inference-router skill applies that knowledge.

The capability-architect skill consumes the result when generating a bundle.

Figure: Inference router flow

Concept #2: Make routing a reusable process, not an inline habit.
If many capabilities need the same decision, the decision process deserves its own skill.
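Here is a rough sketch of what “reusable process” means in practice. `ROUTING_TABLE` is a hypothetical stand-in for the shared reference, and its entries are placeholders rather than recommendations; `route` plays the role of inference-router, and the two plan functions are invented callers.

```python
# Hypothetical stand-in for references/model-effort-routing.md:
# step class -> (model tier, effort, rationale). Placeholder values only.
ROUTING_TABLE = {
    "mechanical": ("small", "low", "Deterministic transformation; little to deliberate."),
    "extraction": ("mid", "low", "Bounded output; accuracy depends on careful reading."),
    "long-form_generation": ("flagship", "high", "Voice and coherence are the deliverable."),
}


def route(component: str, step_class: str) -> dict:
    """Reusable routing process: apply the shared table to one component."""
    model, effort, rationale = ROUTING_TABLE[step_class]
    return {"component": component, "step_class": step_class,
            "model": model, "effort": effort, "rationale": rationale}


# Two unrelated callers reuse the same process instead of copying the
# logic or invoking the whole builder just to get a recommendation.
def writing_orchestrator_plan() -> list:
    return [route("pull-quotes", "extraction"), route("draft", "long-form_generation")]


def research_agent_plan() -> list:
    return [route("normalize-citations", "mechanical")]
```

Neither caller knows anything about skill bundles, which is the point: the routing decision is available without going through capability-architect.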

The Shared Reference

The routing knowledge moved to sibling scope:

.ailibrary/references/model-effort-routing.md

That file contains the reusable ideas, including the routing table and the cost-substitution heuristic.

This matters because a routing table is not owned by capability-architect. A writing orchestrator can use it. A research agent can use it. A course-design system can use it. A future skill evaluator can use it.

That is exactly the local-vs-shared rule from the first article in the series. If only one skill uses a resource, keep it local. If multiple skills can use it, promote it to sibling scope.

The routing table was clearly reusable, so it belonged at the root reference layer.
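The local-vs-shared rule is simple enough to write down as a function. In this sketch, `placement` and the `skills/<skill>/references/` location for local resources are assumptions; only the shared `.ailibrary/references/` path comes from the article.

```python
from __future__ import annotations

from pathlib import PurePosixPath

LIBRARY_ROOT = PurePosixPath(".ailibrary")


def placement(resource: str, users: set[str]) -> PurePosixPath:
    """Apply the local-vs-shared rule to decide where a reference file lives."""
    if not users:
        raise ValueError("a resource with no users does not need a home")
    if len(users) == 1:
        # One consumer: keep it local to that skill (assumed folder layout).
        (only_user,) = users
        return LIBRARY_ROOT / "skills" / only_user / "references" / resource
    # Several consumers: promote it to the shared references/ layer.
    return LIBRARY_ROOT / "references" / resource
```

The rule is re-evaluated whenever the set of users grows: a file that starts local should move to sibling scope as soon as a second skill needs it.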

The Router Contract

The inference-router skill needed a clean output.

It should not return a long essay. It should return structured decisions that another process can use.

The contract looked like this:

[
  {
    "component": "draft-fork",
    "step_class": "long-form_generation",
    "model": "opus",
    "effort": "high",
    "rationale": "Voice and coherence are the deliverable; output is unbounded."
  }
]

That output is small, but it carries the essential decision.

The component says what is being routed.

The step_class says what kind of work it is.

The model and effort say how much inference should be applied.

The rationale makes the decision auditable.

This is important. A routing decision without rationale becomes superstition. A routing decision with rationale can be reviewed, corrected, and reused.

Concept #3: Return decisions with reasons.
A model assignment should be inspectable. Otherwise the system cannot learn from bad routing.
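A consumer can enforce the contract before trusting it. This is a minimal sketch using Python’s standard `json` module; `parse_decisions` is a hypothetical helper, and the allowed effort values are an assumption to align with your own routing reference. A blank rationale is treated as missing, so an unexplained decision never reaches the builder.

```python
import json

REQUIRED = ("component", "step_class", "model", "effort", "rationale")
# Hypothetical effort vocabulary; align it with your own routing reference.
EFFORTS = {"low", "medium", "high"}


def parse_decisions(raw: str) -> list:
    """Parse router output, rejecting any decision that cannot be audited."""
    decisions = json.loads(raw)
    if not isinstance(decisions, list):
        raise ValueError("router output must be a JSON array")
    for i, d in enumerate(decisions):
        if not isinstance(d, dict):
            raise ValueError(f"decision {i} must be an object")
        # A field counts as missing if absent or blank, so an empty rationale fails.
        missing = [k for k in REQUIRED if not str(d.get(k, "")).strip()]
        if missing:
            raise ValueError(f"decision {i} is missing {missing}")
        if d["effort"] not in EFFORTS:
            raise ValueError(f"decision {i} has unknown effort {d['effort']!r}")
    return decisions
```

The `draft-fork` example above passes as written; the same object with an empty `rationale` is rejected.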

The Gotcha: Do Not Default Everything to Flagship

One explicit warning belonged in the router:

don't default everything to flagship+high

That is the failure mode of most model-selection conversations.

If the work feels important, people reach for the strongest model at high effort. But importance to the business is not the same as difficulty for the model.

A formatting step may be business-critical and still mechanically simple. A fact-checking step may be high-stakes but depend more on retrieval design than raw intelligence. A copy-editing step may matter, but low effort may be exactly right.

The router’s job is to resist prestige routing.

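It should ask diagnostic questions instead. Is this step hard for the model, or only important to the business? Does quality depend on raw reasoning, or on something else, such as retrieval design? Would a cheaper model at higher effort clear the same bar?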

That makes routing a diagnostic process, not a status symbol.
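A router could also lint its own output for prestige routing. The sketch below is hypothetical: `prestige_warnings` is an invented helper, the `USUALLY_CHEAP` step classes are examples, and a warning is a prompt to re-diagnose, not proof of a bad assignment.

```python
from __future__ import annotations

from collections import Counter

FLAGSHIP_HIGH = ("flagship", "high")
# Example step classes that rarely need the top combination.
USUALLY_CHEAP = {"mechanical", "formatting", "extraction"}


def prestige_warnings(decisions: list[dict]) -> list[str]:
    """Flag routing that looks like prestige rather than diagnosis."""
    warnings = []
    combos = Counter((d["model"], d["effort"]) for d in decisions)
    # Whole-plan check: did everything default to flagship+high?
    if decisions and combos[FLAGSHIP_HIGH] == len(decisions):
        warnings.append("every component is flagship+high; was each one diagnosed?")
    # Per-step check: simple work routed to the most expensive combination.
    for d in decisions:
        if d["step_class"] in USUALLY_CHEAP and (d["model"], d["effort"]) == FLAGSHIP_HIGH:
            warnings.append(f"{d['component']}: {d['step_class']} step routed to flagship+high")
    return warnings
```

A business-critical formatting step routed to flagship+high triggers the per-step warning, which captures the article’s distinction between importance and difficulty.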

How It Feeds capability-architect

After the decomposition, capability-architect no longer decided model and effort inline.

Its procedure changed to:

capability spec
  -> inference-router assigns model + effort
  -> capability-architect builds the skill bundle using those assignments

That is cleaner.

The builder can focus on structure: the shape and contents of the skill bundle.

The router can focus on inference level: the model and effort each component needs.

The two processes can evolve independently.

If the model lineup changes, update the routing reference and router. The builder does not need to be rewritten.

If the skill-bundle architecture changes, update capability-architect. The routing principles still stand.
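The hand-off can be sketched as two functions joined by the router contract. Everything here is hypothetical: `build_bundle` stands in for capability-architect, `placeholder_router` for inference-router, and the bundle shape is invented. The builder only reads `model` and `effort` from the decisions it receives; it never computes them.

```python
from __future__ import annotations

from typing import Callable

Decision = dict  # one entry in the router's output contract


def build_bundle(spec: dict, assignments: list[Decision]) -> dict:
    """Hypothetical builder: owns the bundle's structure, never picks model or effort."""
    by_component = {a["component"]: a for a in assignments}
    missing = [c for c in spec["components"] if c not in by_component]
    if missing:
        raise ValueError(f"no routing decision for {missing}")
    return {
        "name": spec["name"],
        "components": [
            {"name": c,
             "model": by_component[c]["model"],
             "effort": by_component[c]["effort"]}
            for c in spec["components"]
        ],
    }


def generate_capability(spec: dict, router: Callable[[dict], list[Decision]]) -> dict:
    """capability spec -> router assigns model + effort -> builder uses them."""
    return build_bundle(spec, router(spec))


def placeholder_router(spec: dict) -> list[Decision]:
    """Stand-in for inference-router; a real one would consult the shared reference."""
    return [{"component": c, "step_class": "unclassified", "model": "mid",
             "effort": "medium", "rationale": "Placeholder assignment."}
            for c in spec["components"]]
```

Passing a different router to `generate_capability`, for example after a model lineup change, alters every assignment in the bundle while `build_bundle` stays untouched.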

What This Unlocks

The immediate unlock is cleaner capability generation.

The larger unlock is reusable judgment.

A knowledge entrepreneur can apply the same pattern to any AI workflow: break it into components, then route each one.

The router becomes a reusable thinking partner for architecture.

It does not answer “what model is best?” in the abstract. It answers “what model and effort does this component need, given the job it performs?”

That is a much more useful question.

Key Takeaways

How to Start

  1. Pick one existing workflow or skill.
  2. Break it into components.
  3. Label each component’s work type: mechanical, extraction, judgment, generation, review, or synthesis.
  4. Assign model and effort separately.
  5. Write one sentence explaining each assignment.
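The five steps can be captured as a small worksheet check. This sketch assumes a hypothetical `Row` record and `problems` helper; the six work types come from step 3, and the one-sentence test is a rough heuristic that will misfire on abbreviations such as “e.g.”.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# The six work types listed in step 3.
WORK_TYPES = {"mechanical", "extraction", "judgment", "generation", "review", "synthesis"}


@dataclass
class Row:
    component: str  # step 2
    work_type: str  # step 3
    model: str      # step 4: chosen on its own...
    effort: str     # ...and separately from effort
    rationale: str  # step 5: one sentence


def problems(row: Row) -> list[str]:
    """Return worksheet problems for one row; an empty list means it passes."""
    issues = []
    if row.work_type not in WORK_TYPES:
        issues.append(f"{row.component}: unknown work type {row.work_type!r}")
    if not row.model or not row.effort:
        issues.append(f"{row.component}: model and effort must both be set")
    # Rough check: non-empty, with no sentence break followed by more text.
    text = row.rationale.strip()
    if not text or re.search(r"[.!?]\s+\S", text):
        issues.append(f"{row.component}: rationale should be one sentence")
    return issues
```

The `draft-fork` decision from the router contract, labeled as generation work, passes with no problems.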

Behind the Article

The highest teaching value was the separation between routing knowledge, routing process, and capability construction. That is the design move readers can reuse.

I left model names somewhat generic because the durable pattern is the router architecture, not a temporary model lineup.

The single most valuable thing to add before publishing would be a sample model-effort-routing.md excerpt that members could adapt for their own stack.