Credible and confidence intervals: the statistic your marketing dashboard gets wrong
Your messaging test reports a 95% interval. It almost certainly does not mean what your team thinks it means. Here is the difference, and why it matters for small B2B samples.
Your last messaging test came back with a number and a band around it. Variant B lifted reply rate, and the dashboard printed a 95% interval. Someone in the room said “so we’re 95% sure the real lift is in that range” — and everyone nodded.
That sentence was almost certainly false, and not because the math was wrong: it was false because the interval on the screen was a frequentist confidence interval, and a confidence interval does not license that sentence. The Commercial Truth manifesto argues that marketing has never had the measurement infrastructure other functions take for granted. This is one of the places that absence shows up most quietly.
This piece is for the data-literate GTM operator who reads intervals and acts on them. The distinction between a credible interval and a confidence interval is not pedantry. It is the gap between the claim your tool makes and the claim you thought you were reading.
Why this matters now
GTM teams are running more experiments than ever, and more of them are being read by AI agents that emit copy downstream. When an agent reads a number off a dashboard and turns it into a confident sentence in an email, the misinterpretation propagates at machine speed.
The cost of confusing the two intervals used to be a slightly overstated slide. Now it is a chain of overstated claims, each grounded in a band that never meant what the chain assumes. Calibration is the discipline that closes this gap, and it begins with reading the interval correctly.
What each interval actually claims
A frequentist confidence interval is a statement about a procedure, not about your parameter. A 95% confidence interval means: if you repeated this experiment many times and built an interval each time by the same method, about 95% of those intervals would contain the true value. Source: Assay calibration tutorial corpus, Essay 2.
Notice what that does not say: it says nothing about the one interval you actually have. The true lift is either inside your specific band or it is not — there is no probability attached to this interval, only to the long-run frequency of the method that produced it. It is a procedural guarantee, and rarely the thing the marketer wanted.
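The "long-run frequency of the method" can be made concrete with a small simulation. The sketch below is illustrative only: the true rate, sample size, and number of repeats are assumptions chosen for the demo, and the Wilson score interval stands in for whatever method a given dashboard actually uses.

```python
import math
import random


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def coverage(true_rate, n, repeats, seed=0):
    """Fraction of repeated experiments whose interval contains true_rate."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        # One simulated experiment: n sends, each converting at true_rate.
        successes = sum(rng.random() < true_rate for _ in range(n))
        lo, hi = wilson_interval(successes, n)
        hits += lo <= true_rate <= hi
    return hits / repeats


if __name__ == "__main__":
    # Hypothetical setup: 10% true reply rate, 200 sends per experiment.
    print(coverage(true_rate=0.10, n=200, repeats=5000))
```

The printed fraction should land near 0.95. That figure describes the procedure across thousands of experiments; each individual interval either caught the true rate or missed it.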
A Bayesian credible interval is a statement about the parameter, given your data. A 95% credible interval means: given the data you observed and your stated prior, there is a 95% probability the true lift lies inside this band. Source: Assay calibration tutorial corpus, Essay 2.
That is the sentence the room said out loud. It is what people always meant. The credible interval delivers a direct probability statement about where the value is; the confidence interval delivers a defense of the method that found a range.
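A minimal sketch of how a credible interval for a single rate can be computed, assuming a Beta prior on the rate and binomial data. The function name, the flat Beta(1, 1) default, and the sampling approach are choices made for this illustration, not a description of any particular tool.

```python
import random


def credible_interval(successes, n, prior_a=1.0, prior_b=1.0,
                      mass=0.95, draws=20000, seed=0):
    """Equal-tailed credible interval for a rate under a Beta prior.

    A Beta(prior_a, prior_b) prior combined with binomial data gives a
    Beta(prior_a + successes, prior_b + failures) posterior.
    """
    rng = random.Random(seed)
    a = prior_a + successes
    b = prior_b + (n - successes)
    # Approximate the posterior quantiles by sorting random draws.
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1 - mass) / 2
    lo = samples[int(tail * draws)]
    hi = samples[int((1 - tail) * draws) - 1]
    return lo, hi


if __name__ == "__main__":
    # Hypothetical data: 12 conversions out of 200 sends.
    print(credible_interval(12, 200))
```

The returned band supports the direct reading: given the data and the stated prior, the rate lies inside it with the requested posterior probability.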
The primitives this introduces
Reading intervals correctly forces three primitives into your reporting.
First, the prior becomes explicit. A credible interval requires you to state what you believed before the data arrived — and that is a feature, not a leak. Honest reporting publishes the prior alongside the posterior, so a reviewer can substitute their own and re-run.
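One way to picture "publish the prior alongside the posterior" is a report object that carries both, so a reviewer can swap in a different prior over the same data. This is a sketch under a Beta-binomial assumption; `PosteriorReport` and `with_prior` are hypothetical names, not part of any published Assay interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PosteriorReport:
    """Carries the prior and the raw counts, never the posterior alone."""
    prior_a: float
    prior_b: float
    successes: int
    trials: int

    @property
    def posterior(self):
        # Beta prior + binomial data -> Beta posterior parameters.
        return (self.prior_a + self.successes,
                self.prior_b + self.trials - self.successes)

    @property
    def posterior_mean(self):
        a, b = self.posterior
        return a / (a + b)

    def with_prior(self, prior_a, prior_b):
        """Re-run the same data under a reviewer's own prior."""
        return PosteriorReport(prior_a, prior_b, self.successes, self.trials)


if __name__ == "__main__":
    published = PosteriorReport(1.0, 1.0, successes=12, trials=200)
    # A skeptical reviewer's prior, centered near a 2% rate (assumed).
    skeptic = published.with_prior(2.0, 98.0)
    print(published.posterior_mean, skeptic.posterior_mean)
```

Because the counts travel with the prior, the disagreement between two readers becomes visible as a difference in stated assumptions rather than a hidden one.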
Second, “not yet” becomes a first-class output. When the sample is too thin to produce a band you would act on, the calibrated answer is “not yet,” not a precise-looking point estimate. This piece, accordingly, reports no measured Assay lift figures — those are not yet published.
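A sketch of what "not yet" as a first-class output could look like in code. The `Readout` type and the `max_width` threshold are assumptions for illustration; the width a team would actually act on is a business decision, not something this essay specifies.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Readout:
    status: str                       # "estimate" or "not yet"
    interval: Tuple[float, float]     # the band is always reported
    point_estimate: Optional[float]   # withheld when the band is too wide


def readout(interval, max_width):
    """Return a point estimate only when the band is narrow enough to act on."""
    lo, hi = interval
    if hi - lo > max_width:
        # Too thin a sample: report the band, but no precise-looking number.
        return Readout("not yet", (lo, hi), None)
    return Readout("estimate", (lo, hi), (lo + hi) / 2)


if __name__ == "__main__":
    # Hypothetical bands, with a hypothetical 10-point action threshold.
    print(readout((-0.02, 0.18), max_width=0.10))
    print(readout((0.04, 0.08), max_width=0.10))
```

The design choice is that the wide band is still shown while the single number is withheld, so a downstream reader or agent has nothing falsely precise to quote.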
Third, confidence stops meaning “probably true.” In Assay’s substrate, a confidence score is a trust score over the evidence chain behind a claim, capped by a per-source-type ceiling — not a probability that the claim is semantically correct. Source: Assay confidence-scoring canon. A credible interval and a confidence score answer different questions, and conflating them is its own quiet error.
A worked example
Consider a healthcare-vertical messaging test with 12 conversions on the new variant. The numbers that follow are illustrative, not measured Assay results.
The frequentist tool reports a 95% confidence interval on lift of roughly −2% to +18%. Technically correct — the procedure has the right long-run coverage — but operationally it is a shrug. It spans zero, it is very wide, and it offers no probability you can act on.
A Bayesian model with a weakly-informative prior reports a 95% credible interval of roughly −3% to +14%, with about an 88% posterior probability the lift is positive. Now you can say a true sentence: “given what we saw and what we assumed, there is an 88% chance B beats A.” At 12 conversions, the credible interval stays interpretable where the confidence interval collapsed into a procedural defense. Source: Assay calibration tutorial corpus, Essay 2.
The difference is not that one is optimistic and one is cautious. It is that one answers your question and one answers a different question well.
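A posterior probability that B beats A can be estimated with a few lines of Monte Carlo. The sketch below assumes independent Beta-binomial models for each variant and measures the absolute difference in conversion rate. Only the 12 conversions on the new variant come from the example; the send counts and the control conversions are invented for the demo, so the output is not expected to reproduce the illustrative figures above.

```python
import random


def prob_b_beats_a(conv_a, sends_a, conv_b, sends_b,
                   prior_a=1.0, prior_b=1.0, draws=20000, seed=0):
    """Return (P(rate_B > rate_A), 95% credible interval on rate_B - rate_A)."""
    rng = random.Random(seed)
    wins = 0
    diffs = []
    for _ in range(draws):
        # Draw one plausible rate per variant from its Beta posterior.
        rate_a = rng.betavariate(prior_a + conv_a, prior_b + sends_a - conv_a)
        rate_b = rng.betavariate(prior_a + conv_b, prior_b + sends_b - conv_b)
        diffs.append(rate_b - rate_a)
        wins += rate_b > rate_a
    diffs.sort()
    lo = diffs[int(0.025 * draws)]
    hi = diffs[int(0.975 * draws) - 1]
    return wins / draws, (lo, hi)


if __name__ == "__main__":
    # Hypothetical counts: control 7 of 150, new variant 12 of 150.
    p_win, band = prob_b_beats_a(7, 150, 12, 150)
    print(f"P(B beats A) = {p_win:.2f}, 95% credible band = {band}")
```

The first returned value is the number the room wanted: a probability about the parameter, conditional on the data and the stated prior.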
Closes / opens
Closes the LSO §F.7 calibration-tutorial cluster’s interval question: a confidence interval is a procedural guarantee about a method; a credible interval is a probability statement about your parameter, and the second is what GTM teams have been reading into the first.
Opens the next question this raises — once you accept credible intervals, how do you choose the prior without baking in the answer you wanted? That is its own calibration problem, and its own essay.
The next time a vendor says “we’re 89% confident,” ask whether they mean it in the frequentist sense or the Bayesian sense. The first is a defense of their method. The second is a claim about your reality, and a claim about reality can be calibrated, scored, and checked over time. That is the methodology Assay is developing for the Commercial Truth Index: measuring whether the numbers a platform emits are honestly grounded.
This essay is grounded in the Assay calibration tutorial corpus (Essay 2) and the confidence-scoring concept canon. Methodology for the Commercial Truth Index is in development.