The posterior distribution: what a sales score should actually report
Your lead score is a single number. The honest version is a distribution: a posterior over the quantity the score estimates, with a width that tells you how much to trust it. Here is why that matters for B2B.
Your scoring model gave a lead a 70 and routed it to a rep. Someone asked the only question that matters — “is a 70 worth a same-day call?” — and the dashboard had nothing more to say. The number was the whole answer.
That is the quiet failure in most B2B scoring. A 70 backed by 800 historical outcomes and a 70 backed by 6 are not the same fact, yet they print identically and route identically. The Commercial Truth manifesto argues that marketing and revenue have never had the measurement infrastructure other functions take for granted, and the single-number score is one of the clearest places that gap shows.
This piece is for the data-literate GTM operator who reads scores and acts on them. The concept underneath the fix has a precise name, and the thesis follows from it: a sales score should report a posterior distribution, not a point estimate.
What a posterior distribution is
A posterior distribution is the full range of plausible values for a quantity, weighted by how plausible each one is, after a model has combined a prior with the observed data. Its center is the score you would quote; its width is how tightly the data has actually pinned that score down. Source: Assay calibration tutorial corpus.
A point estimate keeps only the center and throws the width away. The posterior keeps both, because the width is information. A narrow posterior says the evidence is strong; a wide one says the sample is thin and the number is provisional.
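One way to see center and width side by side is a closed-form Beta-Binomial posterior. This is a minimal sketch that assumes the score is a conversion probability on a 0 to 100 scale, estimated from binary won/lost outcomes with a flat Beta(1, 1) prior; those modelling choices and the name `posterior_summary` are illustrative assumptions, not Assay’s method. Here “width” is the posterior standard deviation.

```python
import math


def posterior_summary(conversions: int, trials: int,
                      prior_a: float = 1.0, prior_b: float = 1.0) -> tuple[float, float]:
    """Return (center, width) of a Beta-Binomial posterior on a 0-100 scale.

    Assumption: a Beta(prior_a, prior_b) prior over a conversion rate and
    binary outcomes. 'center' is the posterior mean; 'width' is the
    posterior standard deviation.
    """
    a = prior_a + conversions              # posterior alpha
    b = prior_b + (trials - conversions)   # posterior beta
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return 100 * mean, 100 * math.sqrt(var)


# Same 70% observed rate, very different amounts of evidence.
print(tuple(round(v, 1) for v in posterior_summary(7, 10)))     # → (66.7, 13.1)
print(tuple(round(v, 1) for v in posterior_summary(560, 800)))  # → (70.0, 1.6)
```

Both records show a 70% observed rate, but the 800-outcome posterior is about eight times narrower. The flat prior also pulls the 10-outcome center toward 50, which is the prior mattering most exactly where the data is thin.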
The practical handle on a posterior is a credible interval: a band that, given the data and a stated prior, contains the true value with a stated probability. The posterior is the distribution; the interval is how you read it on a dashboard. Two sibling questions, which interval to report and why the word “confidence” misleads, each deserve their own essay.
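Reading a credible interval off a posterior can be sketched with only the standard library: draw from a Beta posterior with a seeded generator and take the equal-tailed quantiles. This assumes a Beta-Binomial posterior, an equal-tailed 90% interval (one of several interval choices), and the hypothetical name `credible_interval`.

```python
import random


def credible_interval(a: float, b: float, mass: float = 0.90,
                      draws: int = 20_000, seed: int = 7) -> tuple[float, float]:
    """Equal-tailed credible interval of a Beta(a, b) posterior, on a 0-100 scale.

    Approximated by Monte Carlo sampling; the fixed seed makes it repeatable.
    """
    rng = random.Random(seed)                      # isolated, seeded generator
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1 - mass) / 2                          # probability left in each tail
    lo = samples[round(tail * (draws - 1))]        # ~5th percentile for mass=0.90
    hi = samples[round((1 - tail) * (draws - 1))]  # ~95th percentile for mass=0.90
    return 100 * lo, 100 * hi


# Posterior after 560 conversions in 800 outcomes, flat Beta(1, 1) prior.
print(credible_interval(1 + 560, 1 + 240))
```

Under these assumptions the band comes out at roughly 67 to 73. Sampling keeps the sketch dependency-free; a library quantile function for the Beta distribution would give the exact interval instead.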
Why this matters now
GTM teams are scoring more leads, accounts, and opportunities than ever, and more of those scores are being read by AI agents that act on them. When an agent reads a score off a system and treats a thinly evidenced 70 exactly like a well-evidenced 70, the false certainty propagates at machine speed.
The cost of a point estimate used to be one rep’s misallocated afternoon. Now it is a routing rule, a prioritization queue, and an AI agent’s outbound, all built on a number that never carried its own uncertainty. In Assay, the Calibration Engine is the discipline that closes this gap; its defining posture is credible intervals over point estimates, and “not yet” when the data is thin. Source: Assay calibration product canon.
The primitives this introduces
Reporting a posterior instead of a number forces three primitives into how a score is read and trusted.
First, the width becomes a first-class output. The score is no longer one figure but a center and a band, and the band is the part that governs how hard you lean on the score. A wide posterior is not a defect; it is the model telling the truth about a small sample.
Second, “not yet” becomes a legitimate verdict. When the posterior is so wide that its credible interval straddles the action threshold, the data cannot say which side of the threshold the true value sits on, and the calibrated answer is “not yet,” not a precise-looking point estimate. In keeping with that verdict, this piece reports no measured Assay scoring figures; those are not yet published.
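The “not yet” verdict can be written as a three-way rule on the interval. This is a sketch, not Assay’s routing logic: the threshold of 60 and the function name `verdict` are hypothetical, and the two intervals are the illustrative bands from the worked example below.

```python
def verdict(lower: float, upper: float, threshold: float) -> str:
    """Map a credible interval to a routing verdict (hypothetical rule).

    'act'     - the whole band sits at or above the threshold
    'skip'    - the whole band sits below the threshold
    'not yet' - the band straddles the threshold, so the data cannot decide
    """
    if lower >= threshold:
        return "act"
    if upper < threshold:
        return "skip"
    return "not yet"


print(verdict(66, 74, threshold=60))  # → act
print(verdict(41, 88, threshold=60))  # → not yet
```

The rule never returns a number for the straddling case; the width alone decides whether the center is allowed to drive an action.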
Third, the calculation must be reproducible. A posterior produced once and never regenerable is a claim, not evidence. Sampling-based posteriors require a locked seed, model version, and frozen inputs so the same data returns the same distribution on replay. Source: Assay calibration tutorial corpus, Essay 9.
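A sketch of what replay can look like, assuming a Beta-Binomial posterior with a flat prior: the run records its seed, a model version label, and a hash of its frozen inputs, and re-running with the same three returns identical draws. The names `MODEL_VERSION`, `fingerprint`, `sample_posterior`, and `run` are hypothetical. One caveat from Python’s own documentation: the standard library does not guarantee that distribution methods such as `betavariate` produce the same stream across Python releases, so in practice the runtime belongs in whatever “model version” pins.

```python
import hashlib
import json
import random

MODEL_VERSION = "beta-binomial-v1"  # hypothetical version label


def fingerprint(inputs: dict) -> str:
    """Stable SHA-256 of the frozen inputs (sorted keys, so order cannot change it)."""
    blob = json.dumps(inputs, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def sample_posterior(inputs: dict, seed: int, draws: int = 1_000) -> list[float]:
    """Draw from a Beta(1 + k, 1 + n - k) posterior with an isolated, seeded RNG."""
    rng = random.Random(seed)
    a = 1 + inputs["conversions"]
    b = 1 + inputs["trials"] - inputs["conversions"]
    return [rng.betavariate(a, b) for _ in range(draws)]


def run(inputs: dict, seed: int) -> dict:
    """Return the draws plus everything needed to regenerate them."""
    return {
        "model_version": MODEL_VERSION,
        "seed": seed,
        "inputs_sha256": fingerprint(inputs),
        "draws": sample_posterior(inputs, seed),
    }


frozen = {"conversions": 4, "trials": 6}
first, replay = run(frozen, seed=42), run(frozen, seed=42)
assert first == replay  # same seed + version + inputs -> identical posterior
```

Storing the record alongside the score is what lets an auditor regenerate the distribution rather than take it on trust.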
That third primitive is where statistics meets governance. A score that can be regenerated from primary records survives an audit; a score that cannot is an assertion dressed as a measurement. Source: Assay calibration tutorial corpus, Essay 9.
A worked example
Consider two accounts that both score 70 on intent-to-convert. The numbers that follow are illustrative, not measured Assay results.
Account A’s score rests on 800 comparable historical outcomes. Its posterior is narrow — a 90% credible interval of roughly 66 to 74. The 70 is real; the data has pinned it down, and a same-day call is a defensible move.
Account B’s score rests on 6 outcomes from a new vertical. Its posterior is wide — a 90% credible interval of roughly 41 to 88. The center is still 70, but the band spans from “ignore this” to “drop everything.” Acting on B as if it were A is acting on a coin flip dressed as a 70.
A point-estimate dashboard shows two 70s and routes them identically. A posterior dashboard shows a tight band and a sprawling one, and the right action diverges immediately. The honest move for B is not a forced number; it is “not yet — gather more outcomes before we route on this.” Source: Assay calibration product canon.
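The two accounts can be replayed under stated assumptions: each score is a conversion probability on a 0 to 100 scale, outcomes are binary, the prior is a flat Beta(1, 1), Account A is 560 conversions in 800 outcomes, Account B is 4 in 6, and the routing threshold is a hypothetical 60. Because B’s flat-prior center lands near 62 rather than 70, the output will not match the illustrative bands digit for digit; what carries over is the contrast in width and the diverging verdicts. All function names are hypothetical.

```python
import random


def credible_interval(conversions, trials, mass=0.90, draws=20_000, seed=7):
    """Equal-tailed interval of the Beta(1 + k, 1 + n - k) posterior, 0-100 scale."""
    rng = random.Random(seed)
    a, b = 1 + conversions, 1 + trials - conversions
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1 - mass) / 2
    return (100 * samples[round(tail * (draws - 1))],
            100 * samples[round((1 - tail) * (draws - 1))])


def verdict(lower, upper, threshold):
    """'act' if the band clears the threshold, 'skip' if it misses, else 'not yet'."""
    if lower >= threshold:
        return "act"
    if upper < threshold:
        return "skip"
    return "not yet"


THRESHOLD = 60  # hypothetical same-day-call threshold

accounts = {"A": (560, 800), "B": (4, 6)}  # (conversions, outcomes), illustrative
for name, (k, n) in accounts.items():
    lo, hi = credible_interval(k, n)
    print(f"{name}: 90% band {lo:.0f}-{hi:.0f}, width {hi - lo:.0f}, "
          f"verdict: {verdict(lo, hi, THRESHOLD)}")
```

A clears the threshold across its whole band and routes; B’s band straddles it, so the sketch returns “not yet” rather than forcing a number.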
The difference is not that one account is good and one is bad. It is that one score has earned the action it triggers and the other has not, and only the width tells you which.
Where this lives in the substrate
In Assay, a calibrated score is not a loose number on a chart; it is bound to the same governed substrate as every other commercial claim. The Calibration Engine fits hierarchical models, reports credible intervals rather than p-values pretending to be certainty, and says “not yet” when samples are too thin to declare a winner. Source: Assay calibration product canon.
The brand glyph for this is the Credible Interval Bar — a filled band when the posterior is informative, a ghosted band when the honest answer is “not yet.” Source: Assay calibration product canon. It is the visual signature of a score that reports its own uncertainty instead of hiding it. A posterior, in other words, is what a sales score looks like once it has to be sourced, scored, and defensible.
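As a rough text-mode stand-in for the Credible Interval Bar, the sketch below draws a 0 to 100 track with the band filled when the posterior is informative and ghosted when the verdict is “not yet.” The characters, the 50-column width, and the name `interval_bar` are hypothetical; this is not Assay’s rendering.

```python
def interval_bar(lower: float, upper: float, informative: bool,
                 width: int = 50) -> str:
    """Draw a 0-100 credible interval as a fixed-width text track (hypothetical glyph).

    The band is filled ('█') when the posterior is informative and
    ghosted ('░') when the honest answer is 'not yet'.
    """
    scale = width / 100
    start = min(round(lower * scale), width - 1)  # first column of the band
    end = max(round(upper * scale), start + 1)    # band is at least one column wide
    mark = "█" if informative else "░"
    return "·" * start + mark * (end - start) + "·" * (width - end)


print(interval_bar(66, 74, informative=True))   # Account A: narrow, filled
print(interval_bar(41, 88, informative=False))  # Account B: wide, ghosted
```

The band’s length carries the width and its fill carries the verdict, so a reader sees both without reading a number.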
Closes / opens
Closes the LSO §F.7 calibration-tutorial cluster’s scoring question: a sales score should report a posterior distribution, not a point estimate, because the width of that distribution is the part that tells a GTM team and its AI agents how much to trust the number.
Opens the question that immediately follows — a posterior depends on the prior you start from, so how do you choose that prior without quietly baking in the answer you wanted? That is its own calibration problem, and its own essay.
A score reported as a distribution, regenerable from frozen inputs and capped by a source-type ceiling, is exactly the kind of measured claim that the methodology Assay is developing for the Commercial Truth Index is built to evaluate: whether the numbers a platform emits are honestly grounded and calibrated rather than confidently wrong.
This essay is grounded in the Assay calibration tutorial corpus (Essay 9) and the Calibration Engine product canon. Methodology for the Commercial Truth Index is in development.