The not-yet rate: the calibration metric that measures B2B AI honesty

June 2, 2026

Most B2B AI systems answer every question, including the ones they cannot support. The not-yet rate measures how often a system declines, and a nonzero rate, read against accuracy, is a sign of health, not weakness.

Ask a B2B AI system about a product detail it has thin evidence for, and watch what it does. Almost always, it answers anyway. The polished sentence arrives with the same fluency as one the system could fully support, and nobody in the room can tell the two apart.

That uniformity is the problem. The Commercial Truth manifesto argues that marketing has never had the measurement infrastructure other functions take for granted, and this is one of the quietest places that gap shows up. A system that answers everything is not confident — it is undisciplined, and we have no number that says so.

This piece is for the data-literate GTM operator who already reads a confidence score and acts on it. The metric I want to name plainly is the not-yet rate: the share of questions a system declines to answer, returning “not yet” instead of a guess dressed up as a fact.

What the not-yet rate measures

The not-yet rate is a single fraction. Of all the questions a system was asked, how many did it decline to answer because the evidence behind the answer did not clear a threshold? Source: Assay calibration tutorial corpus, Essay 1.

A system that always answers has a not-yet rate of zero. That sounds like coverage; it is usually overclaiming. Somewhere in that universal willingness to respond are the questions the system could not actually support, answered with the same confident fluency as the ones it could. Source: Assay calibration tutorial corpus, Essay 1.
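The fraction described above can be written down in a few lines. This is a minimal sketch, not Assay code: the function name `not_yet_rate` and the counts passed to it are assumptions made for illustration.

```python
def not_yet_rate(declined: int, asked: int) -> float:
    """Share of all questions asked that the system declined to answer."""
    if asked <= 0:
        raise ValueError("asked must be positive")
    if not 0 <= declined <= asked:
        raise ValueError("declined must be between 0 and asked")
    return declined / asked


# Hypothetical counts, for illustration only.
always_answers = not_yet_rate(declined=0, asked=200)    # 0.0
sometimes_declines = not_yet_rate(declined=30, asked=200)  # 0.15
```

The function only counts declines. It says nothing about whether the system declined on the right questions, which is why the number needs a companion.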

The brand line behind this is short. Confident is a marketing word; calibrated is a measurement word. Source: Assay brand canon, PIL-B2. The not-yet rate is one way to put a number on the difference.

Why this matters now

The substrate shifted underneath the question. A confident wrong answer used to cost a slightly overstated slide; now it propagates. When an AI agent reads a fact and emits it into an email, a chatbot reply, and a proposal, the overclaim is reproduced at machine speed across every surface at once.

So the willingness to abstain stops being a soft virtue and becomes load-bearing. A system that knows when to say “not yet” contains the error before it copies itself. A system with a not-yet rate of zero has no brake, and the confident wrong answer ships everywhere a rep, a doc, or an agent reaches.

This is why “not yet” is a first-class output in Assay’s posture, not an error state. Source: Assay brand canon, PIL-B2. The metric exists to make that posture auditable rather than aspirational.

The primitives the metric introduces

Treating the not-yet rate as real forces three primitives into reporting.

First, abstention becomes an explicit decision with a threshold. A claim is answered only when the evidence behind it clears a bar; below the bar, the output is “not yet.” That bar is a published number a reviewer can inspect and argue with, not a mood.
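As a rough sketch of that first primitive, abstention can be modeled as a gate with one visible constant. Everything named here is hypothetical: `respond`, `PUBLISHED_THRESHOLD`, the value 0.70, and the choice to treat a score equal to the bar as clearing it are assumptions, not Assay's implementation.

```python
NOT_YET = "not yet"

# Hypothetical bar, chosen only for illustration. The point is that it is
# a named, inspectable number rather than something implicit in the model.
PUBLISHED_THRESHOLD = 0.70


def respond(answer: str, evidence_score: float,
            threshold: float = PUBLISHED_THRESHOLD) -> str:
    """Return the answer only when its evidence score clears the bar."""
    if evidence_score >= threshold:
        return answer
    return NOT_YET


supported = respond("Plan B includes SSO.", evidence_score=0.82)
thin = respond("Plan B includes SSO.", evidence_score=0.55)  # "not yet"
```

Because the threshold is an argument with a published default, a reviewer can rerun the same questions at a different bar and argue about the result.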

Second, the rate is meaningless on its own and only reads against accuracy. The honest pairing is abstention rate next to the accuracy of what survived. A system that declines on the right questions raises its accuracy on the ones it answers; a high not-yet rate with no accuracy gain is just timidity. Source: Assay calibration tutorial corpus, Essay 1.
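One way to keep the pairing honest is to report both numbers from the same record and compare them against a baseline. The sketch below assumes a hypothetical `CalibrationReport` type and an `is_just_timid` check; the counts are invented, and the comparison rule is one reasonable reading of "no accuracy gain", not a published Assay definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CalibrationReport:
    asked: int      # all questions put to the system
    declined: int   # questions answered with "not yet"
    correct: int    # correct answers among those it did answer

    def __post_init__(self) -> None:
        if self.asked <= 0:
            raise ValueError("asked must be positive")
        if not 0 <= self.declined <= self.asked:
            raise ValueError("declined must be between 0 and asked")
        if not 0 <= self.correct <= self.asked - self.declined:
            raise ValueError("correct cannot exceed answered")

    @property
    def answered(self) -> int:
        return self.asked - self.declined

    @property
    def not_yet_rate(self) -> float:
        return self.declined / self.asked

    @property
    def accuracy_on_answered(self) -> float:
        if self.answered == 0:
            raise ValueError("accuracy is undefined when nothing was answered")
        return self.correct / self.answered


def is_just_timid(candidate: CalibrationReport,
                  baseline: CalibrationReport) -> bool:
    """True when the candidate abstains more but is no more accurate."""
    return (candidate.not_yet_rate > baseline.not_yet_rate
            and candidate.accuracy_on_answered <= baseline.accuracy_on_answered)


# Hypothetical counts, for illustration only.
baseline = CalibrationReport(asked=200, declined=0, correct=160)
timid = CalibrationReport(asked=200, declined=40, correct=128)
disciplined = CalibrationReport(asked=200, declined=40, correct=152)
```

Here `timid` and `disciplined` decline the same number of questions, but only `disciplined` converts the abstentions into higher accuracy on what it answers.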

Third, the threshold is tied to the strength of the evidence, not the smoothness of the prose. In Assay’s substrate a confidence score is a trust score over the evidence chain behind a claim, capped by a per-source-type ceiling; it is not a probability that the claim reads well. Source: Assay confidence-scoring canon. The not-yet rate counts the claims whose capped score fell below the threshold and were withheld.
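To make the capping idea concrete, here is a deliberately simplified sketch. The source types, the ceiling values, and the function names are all invented for illustration, and the sketch assumes a single source per claim rather than a full evidence chain. It is not Assay's scoring code.

```python
# Hypothetical per-source-type ceilings. Names and values are assumptions.
SOURCE_CEILINGS = {
    "signed_contract": 0.95,
    "product_docs": 0.85,
    "sales_call_note": 0.60,
}


def capped_confidence(raw_trust: float, source_type: str) -> float:
    """Trust in the evidence, limited by what its source type can support."""
    return min(raw_trust, SOURCE_CEILINGS[source_type])


def is_withheld(raw_trust: float, source_type: str, threshold: float) -> bool:
    """True when the capped score falls below the answer threshold."""
    return capped_confidence(raw_trust, source_type) < threshold


# A claim resting on a call note cannot clear a 0.70 bar, however high
# its raw trust score is, because its ceiling is 0.60.
call_note_withheld = is_withheld(0.99, "sales_call_note", threshold=0.70)
docs_withheld = is_withheld(0.80, "product_docs", threshold=0.70)
```

Under these assumed values, the call-note claim is withheld and the documentation claim is answered, regardless of how fluent either sentence would have been.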

A worked example

Consider a system fielding 1,000 product and pricing questions. The numbers that follow are illustrative, not measured Assay results.

The always-answer baseline responds to all 1,000 and lands 78% of them correct. Its not-yet rate is zero, and the 22% it gets wrong arrive with the same confident phrasing as the 78% it gets right — the reader cannot separate them.

A calibrated system declines the 150 questions whose evidence falls below its threshold, answering 850 with 94% correct. Its not-yet rate is 15%. The operator now has two honest sentences instead of one false one: the system answered 85% of questions, and on those it was right 94% of the time.

The 150 “not yet” replies are not failures; they are the questions routed to a human or to fresh sourcing before anyone speaks for the company. The total volume of correct answers barely moved (780 for the baseline, 799 for the calibrated system), while the volume of confident wrong ones collapsed from 220 to 51.
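The arithmetic behind the worked example is short enough to check directly. This sketch uses only the illustrative figures from the example (1,000 questions, 78% and 94% accuracy, 150 declines); the helper name `summarize` is an assumption.

```python
ASKED = 1000  # illustrative question count from the worked example


def summarize(declined: int, accuracy_on_answered: float) -> dict:
    """Turn a decline count and an accuracy into answer volumes."""
    answered = ASKED - declined
    correct = round(answered * accuracy_on_answered)
    return {
        "not_yet_rate": declined / ASKED,
        "answered": answered,
        "correct": correct,
        "confident_wrong": answered - correct,
    }


baseline = summarize(declined=0, accuracy_on_answered=0.78)
calibrated = summarize(declined=150, accuracy_on_answered=0.94)

print(baseline)    # 780 correct, 220 confident wrong, not-yet rate 0.0
print(calibrated)  # 799 correct, 51 confident wrong, not-yet rate 0.15
```

Correct answers move from 780 to 799, while confident wrong answers drop from 220 to 51.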

That is the whole point. The not-yet rate does not make a system know more. It makes a system stop pretending to know what it does not, and it turns that restraint into a number you can track.

Closes / opens

Closes the LSO §F.7 calibration-tutorial cluster’s honesty question: a B2B AI system’s willingness to answer everything is not a strength but the absence of a brake, and the not-yet rate — read against accuracy — is how that discipline becomes measurable.

Opens the next question it raises: if a system should decline below a threshold, who sets the threshold, and how do you choose it without baking in the answer you wanted? That is a question about priors, and its own calibration problem.

The next time a vendor tells you their AI “answers anything you ask,” treat it as a warning, not a feature, and ask for the not-yet rate beside the accuracy. Pairing the discipline to abstain with the accuracy that discipline buys is the methodology Assay is developing for the Commercial Truth Index, to measure whether the claims a platform emits are honestly grounded rather than merely fluent.

This essay is grounded in the Assay calibration tutorial corpus (Essay 1) and the brand canon PIL-B2 (Calibrated). Methodology for the Commercial Truth Index is in development.

Frequently Asked Questions

What is the not-yet rate for a B2B AI system?

It is the share of questions a system declines to answer, returning “not yet” instead of a confident-looking guess. A system that always answers has a not-yet rate of zero, which usually means it is overclaiming on the questions it cannot actually support. Source: Assay calibration tutorial corpus, Essay 1.

Is a higher not-yet rate good or bad?

Neither on its own. The not-yet rate is only meaningful against accuracy: a system that abstains on the right questions raises its accuracy on the ones it does answer. A high rate with no accuracy gain is just timidity; a zero rate is overclaiming.

How is the not-yet rate different from a confidence score?

A confidence score grades a claim the system chose to make. The not-yet rate counts the claims it chose not to make. One measures graded answers; the other measures the discipline to abstain. A calibrated system reports both. Source: Assay confidence-scoring canon.

Can you cite a measured not-yet rate for Assay?

Not yet. A published not-yet rate requires the Commercial Truth Index dry-run, whose methodology is in development. Reporting one before it is measured would be exactly the overclaiming the metric is meant to catch.