Compare2
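Compare2 maps samples x and y and thresholds $t_1, \ldots, t_n$ to projections $p_1, \ldots, p_n$, where $t_i$ is a threshold with metric $m_i$ (Shift, Ratio, or Disparity) and normalized value $v_i$, and $p_i$ is the projection with estimate $e_i$, bounds $[L_i, U_i]$, and verdict Less if $U_i < v_i$; Greater if $L_i > v_i$; Inconclusive otherwise.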
Two-sample confirmatory analysis: compares estimates against practical thresholds.
Input
- x: first sample of measurements
- y: second sample of measurements
- thresholds: list of thresholds, each with a metric (Shift, Ratio, or Disparity), a threshold value, and a per-threshold error rate (misrate)
- seed: optional string for reproducible randomization (passed to the Disparity bounds estimator)
Output
- Value: list of projections in input order, each of the form Projection(estimate, bounds, verdict)
- Unit: per-projection, the same unit as the underlying estimator (Shift, Ratio, or Disparity)
Notes
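- Independence: each threshold is evaluated independently; there is no family-wise guarantee
- Unit compatibility: a Shift threshold must be compatible with the sample units; a Ratio threshold must be dimensionless or in Ratio units, and requires a positive value; a Disparity threshold must be dimensionless or in Disparity units
- See also: Compare1 for one-sample metrics (Center, Spread)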
Properties
- Order preservation: the i-th projection corresponds to the i-th input threshold
- Metric deduplication: the estimate for each distinct metric is computed once, regardless of threshold count
Example
Compare2([1..30], [21..50], [(Shift, 0, 1e-3)]) → [Projection(-20, [...], Less)]
Compare2([21..50], [1..30], [(Shift, 0, 1e-3)]) → [Projection(20, [...], Greater)]
Compare2 automates the pattern of computing a two-sample estimate, constructing bounds, and comparing the bounds against a practical threshold. Instead of asking whether Shift is significantly different from zero, it answers whether Shift is reliably above or below a practical threshold. Each threshold produces a ternary verdict that respects both statistical uncertainty and practical relevance. When multiple thresholds are needed (different metrics or different misrates), pass them all in one call to avoid redundant computation.
Algorithm
Compare2 orchestrates estimation and comparison in stages: pre-pass validation, a validate-and-normalize pass, and the statistical phase.
Pre-pass validation
Before any statistical computation:
- Reject weighted samples for x and y (unsupported).
- Check that x and y have compatible units.
- Reject a null or empty threshold list.
- Reject threshold lists containing null items.
- Reject thresholds with metrics not in {Shift, Ratio, Disparity} (wrong arity).
- Reject thresholds with non-finite values.
These checks happen before bounds computation, so no statistical work is done on invalid inputs.
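The threshold-list checks above can be sketched as a fail-fast routine. This is a minimal illustration, not the library's code: `SketchMetric`, `SketchThreshold`, and `PrePassSketch` are hypothetical stand-ins, and the weighted-sample and unit-compatibility checks are omitted because they depend on the library's sample type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for illustration; not the library's actual types.
public enum SketchMetric { Center, Spread, Shift, Ratio, Disparity }

public sealed record SketchThreshold(SketchMetric Metric, double Value, double Misrate);

public static class PrePassSketch
{
    // Throws on the first violated rule; returns normally when every check passes.
    public static void Validate(IReadOnlyList<SketchThreshold> thresholds)
    {
        // Rule: the list itself must be present and non-empty.
        if (thresholds is null || thresholds.Count == 0)
            throw new ArgumentException("thresholds must be non-empty", nameof(thresholds));

        // Rule: no null items.
        foreach (var t in thresholds)
            if (t is null)
                throw new ArgumentNullException(nameof(thresholds), "threshold items must not be null");

        // Rule: only two-sample metrics are accepted.
        foreach (var t in thresholds)
            if (t.Metric is SketchMetric.Center or SketchMetric.Spread)
                throw new ArgumentException($"Metric {t.Metric} is a one-sample metric", nameof(thresholds));

        // Rule: threshold values must be finite.
        foreach (var t in thresholds)
            if (!double.IsFinite(t.Value))
                throw new ArgumentOutOfRangeException(nameof(thresholds), "threshold value must be finite");
    }
}
```

Running each rule as its own pass keeps the failure order the same as the order of the list: a wrong metric anywhere in the input is reported before a non-finite value.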
Validate-and-normalize pass
For each threshold, in input order:
- Shift: check unit compatibility with x; convert the threshold value to the finer of x's and y's units.
- Ratio: accept the Ratio unit or a dimensionless number (coerced to Ratio); the threshold value must be finite and positive.
- Disparity: accept the Disparity unit or a dimensionless number (coerced to Disparity); the threshold value must be finite.
Bindings that support plain numeric shorthand (Python and R) interpret it directly on the working comparison scale; explicit measurement thresholds are normalized as above.
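To make the normalization step concrete, here is a small sketch of the Shift and Ratio rules under a simplified unit model. `SketchUnit`, its `PerBase` field, and `NormalizeSketch` are assumptions made for this illustration and do not reflect the library's `MeasurementUnit` API; the unit-compatibility check is left out.

```csharp
using System;

// Hypothetical unit model: PerBase is how many of this unit fit in one base unit.
public sealed record SketchUnit(string Name, double PerBase);

public static class NormalizeSketch
{
    public static readonly SketchUnit Second = new("s", 1);
    public static readonly SketchUnit Millisecond = new("ms", 1000);

    // The finer unit is the one with more units per base unit.
    public static SketchUnit Finer(SketchUnit a, SketchUnit b) =>
        a.PerBase >= b.PerBase ? a : b;

    // Shift: express the threshold value in the finer of the two sample units.
    public static (double Value, SketchUnit Unit) NormalizeShift(
        double value, SketchUnit thresholdUnit, SketchUnit xUnit, SketchUnit yUnit)
    {
        if (!double.IsFinite(value))
            throw new ArgumentOutOfRangeException(nameof(value), "threshold value must be finite");
        var target = Finer(xUnit, yUnit);
        return (value * target.PerBase / thresholdUnit.PerBase, target);
    }

    // Ratio: the value must be finite and strictly positive.
    public static double NormalizeRatio(double value)
    {
        if (!double.IsFinite(value) || value <= 0)
            throw new ArgumentOutOfRangeException(nameof(value), "ratio threshold must be finite and positive");
        return value;
    }
}
```

With x in seconds and y in milliseconds, `NormalizeShift(2.0, Second, Second, Millisecond)` yields 2000 in milliseconds, so the threshold and the bounds are compared on the same scale.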
Statistical phase (canonical metric order: Shift → Ratio → Disparity)
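For each present metric (in canonical order), compute the estimate once, then compute bounds for each threshold entry of that metric.
Verdict computation
$$
\text{verdict}_i =
\begin{cases}
\text{Greater} & \text{if } L_i > v_i \\
\text{Less} & \text{if } U_i < v_i \\
\text{Inconclusive} & \text{otherwise}
\end{cases}
$$
where $[L_i, U_i]$ are the bounds for threshold $i$ and $v_i$ is the normalized threshold value.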
Result ordering
Results are stored in input order regardless of canonical processing order. Input [Disparity, Shift] produces output [disparity_projection, shift_projection].
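The interplay between canonical processing order and input-order results can be shown without any statistics. In this sketch metrics are plain strings and the "estimate" is replaced by a counter; `OrderingSketch` and its label format are illustrative assumptions only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderingSketch
{
    // Canonical processing order, as described in the text.
    private static readonly string[] CanonicalOrder = { "Shift", "Ratio", "Disparity" };

    // Returns one label per input metric, in input order. The numeric suffix
    // records when that metric was processed; estimateCalls counts how many
    // times the stand-in estimator ran.
    public static string[] Process(IReadOnlyList<string> inputMetrics, out int estimateCalls)
    {
        var results = new string[inputMetrics.Count];
        int calls = 0;

        // Group input positions by metric.
        var byMetric = inputMetrics
            .Select((metric, index) => (metric, index))
            .GroupBy(item => item.metric)
            .ToDictionary(g => g.Key, g => g.Select(item => item.index).ToList());

        foreach (var metric in CanonicalOrder)
        {
            if (!byMetric.TryGetValue(metric, out var indices)) continue;
            calls++; // the estimate for this metric is computed once
            foreach (var i in indices)
                results[i] = $"{metric}#{calls}"; // stored at the input position
        }

        estimateCalls = calls;
        return results;
    }
}
```

For the input [Disparity, Shift, Shift], Shift is processed first and Disparity second, yet the result is [Disparity#2, Shift#1, Shift#1] with two estimator calls for three thresholds.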
using Pragmastat.Estimators;
using Pragmastat.Exceptions;
using Pragmastat.Metrology;

namespace Pragmastat.Internal;

internal static class CompareEngine
{
    private readonly struct MetricSpec
    {
        public Metric Metric { get; }
        public Func<Threshold, Sample, Sample?, Measurement> ValidateAndNormalize { get; }
        public Func<Sample, Sample?, Measurement> Estimate { get; }
        public Func<Sample, Sample?, Probability, Bounds> Bounds { get; }
        public Func<Sample, Sample?, Probability, string, Bounds>? SeededBounds { get; }

        public MetricSpec(
            Metric metric,
            Func<Threshold, Sample, Sample?, Measurement> validateAndNormalize,
            Func<Sample, Sample?, Measurement> estimate,
            Func<Sample, Sample?, Probability, Bounds> bounds,
            Func<Sample, Sample?, Probability, string, Bounds>? seededBounds = null)
        {
            Metric = metric;
            ValidateAndNormalize = validateAndNormalize;
            Estimate = estimate;
            Bounds = bounds;
            SeededBounds = seededBounds;
        }
    }

    private static readonly MetricSpec[] Compare1Specs =
    [
        new MetricSpec(
            Metric.Center,
            ValidateCenter,
            (x, _) => CenterEstimator.Instance.Estimate(x),
            (x, _, alpha) => CenterBoundsEstimator.Instance.Estimate(x, alpha)),
        new MetricSpec(
            Metric.Spread,
            ValidateSpread,
            (x, _) => SpreadEstimator.Instance.Estimate(x),
            (x, _, alpha) => SpreadBoundsEstimator.Instance.Estimate(x, alpha),
            (x, _, alpha, seed) => SpreadBoundsEstimator.Instance.Estimate(x, alpha, seed)),
    ];

    private static readonly MetricSpec[] Compare2Specs =
    [
        new MetricSpec(
            Metric.Shift,
            ValidateShift,
            (x, y) => ShiftEstimator.Instance.Estimate(x, y!),
            (x, y, alpha) => ShiftBoundsEstimator.Instance.Estimate(x, y!, alpha)),
        new MetricSpec(
            Metric.Ratio,
            ValidateRatio,
            (x, y) => RatioEstimator.Instance.Estimate(x, y!),
            (x, y, alpha) => RatioBoundsEstimator.Instance.Estimate(x, y!, alpha)),
        new MetricSpec(
            Metric.Disparity,
            ValidateDisparity,
            (x, y) => DisparityEstimator.Instance.Estimate(x, y!),
            (x, y, alpha) => DisparityBoundsEstimator.Instance.Estimate(x, y!, alpha),
            (x, y, alpha, seed) => DisparityBoundsEstimator.Instance.Estimate(x, y!, alpha, seed)),
    ];

    private static Measurement ValidateCenter(Threshold threshold, Sample x, Sample? _)
    {
        if (!threshold.Value.Unit.IsCompatible(x.Unit))
            throw new UnitMismatchException(threshold.Value.Unit, x.Unit);
        if (!threshold.Value.NominalValue.IsFinite())
            throw new ArgumentOutOfRangeException(nameof(threshold), "threshold.Value must be finite");
        double factor = MeasurementUnit.ConversionFactor(threshold.Value.Unit, x.Unit);
        return new Measurement(threshold.Value.NominalValue * factor, x.Unit);
    }

    private static Measurement ValidateSpread(Threshold threshold, Sample x, Sample? _) =>
        ValidateCenter(threshold, x, null);

    private static Measurement ValidateShift(Threshold threshold, Sample x, Sample? y)
    {
        if (!threshold.Value.Unit.IsCompatible(x.Unit))
            throw new UnitMismatchException(threshold.Value.Unit, x.Unit);
        if (!threshold.Value.NominalValue.IsFinite())
            throw new ArgumentOutOfRangeException(nameof(threshold), "threshold.Value must be finite");
        var finerUnit = MeasurementUnit.Finer(x.Unit, y!.Unit);
        double factor = MeasurementUnit.ConversionFactor(threshold.Value.Unit, finerUnit);
        return new Measurement(threshold.Value.NominalValue * factor, finerUnit);
    }

    private static Measurement ValidateRatio(Threshold threshold, Sample _, Sample? __)
    {
        var unit = threshold.Value.Unit;
        if (unit != MeasurementUnit.Ratio && unit != MeasurementUnit.Number)
            throw new UnitMismatchException(unit, MeasurementUnit.Ratio);
        double value = threshold.Value.NominalValue;
        if (value <= 0 || !value.IsFinite())
            throw new ArgumentOutOfRangeException(nameof(threshold), "Ratio threshold.Value must be finite and positive");
        return new Measurement(value, MeasurementUnit.Ratio);
    }

    private static Measurement ValidateDisparity(Threshold threshold, Sample _, Sample? __)
    {
        var unit = threshold.Value.Unit;
        if (unit != MeasurementUnit.Disparity && unit != MeasurementUnit.Number)
            throw new UnitMismatchException(unit, MeasurementUnit.Disparity);
        double value = threshold.Value.NominalValue;
        if (!value.IsFinite())
            throw new ArgumentOutOfRangeException(nameof(threshold), "Disparity threshold.Value must be finite");
        return new Measurement(value, MeasurementUnit.Disparity);
    }

    public static IReadOnlyList<Projection> Compare1(Sample x, IReadOnlyList<Threshold> thresholds, string? seed)
    {
        Assertion.NonWeighted("x", x);
        Assertion.NotNullOrEmpty("thresholds", thresholds);
        Assertion.ItemNotNull("thresholds", thresholds);

        foreach (var threshold in thresholds)
        {
            if (threshold.Metric is Metric.Shift or Metric.Ratio or Metric.Disparity)
                throw new ArgumentException(
                    $"Metric {threshold.Metric} is not supported by Compare1. Use Compare2 instead.",
                    nameof(thresholds));
        }

        foreach (var threshold in thresholds)
        {
            if (!threshold.Value.NominalValue.IsFinite())
                throw new ArgumentOutOfRangeException(nameof(thresholds), "threshold.Value must be finite");
        }

        var normalizedValues = new Measurement[thresholds.Count];
        for (int i = 0; i < thresholds.Count; i++)
        {
            var spec = GetSpec(Compare1Specs, thresholds[i].Metric);
            normalizedValues[i] = spec.ValidateAndNormalize(thresholds[i], x, null);
        }

        return Execute(Compare1Specs, x, null, thresholds, normalizedValues, seed);
    }

    public static IReadOnlyList<Projection> Compare2(
        Sample x, Sample y, IReadOnlyList<Threshold> thresholds, string? seed)
    {
        Assertion.NonWeighted("x", x);
        Assertion.NonWeighted("y", y);
        Assertion.CompatibleUnits(x, y);
        Assertion.NotNullOrEmpty("thresholds", thresholds);
        Assertion.ItemNotNull("thresholds", thresholds);

        foreach (var threshold in thresholds)
        {
            if (threshold.Metric is Metric.Center or Metric.Spread)
                throw new ArgumentException(
                    $"Metric {threshold.Metric} is not supported by Compare2. Use Compare1 instead.",
                    nameof(thresholds));
        }

        foreach (var threshold in thresholds)
        {
            if (!threshold.Value.NominalValue.IsFinite())
                throw new ArgumentOutOfRangeException(nameof(thresholds), "threshold.Value must be finite");
        }

        var normalizedValues = new Measurement[thresholds.Count];
        for (int i = 0; i < thresholds.Count; i++)
        {
            var spec = GetSpec(Compare2Specs, thresholds[i].Metric);
            normalizedValues[i] = spec.ValidateAndNormalize(thresholds[i], x, y);
        }

        return Execute(Compare2Specs, x, y, thresholds, normalizedValues, seed);
    }

    private static MetricSpec GetSpec(MetricSpec[] specs, Metric metric)
    {
        foreach (var spec in specs)
            if (spec.Metric == metric) return spec;
        throw new ArgumentException($"No spec found for metric {metric}");
    }

    private static IReadOnlyList<Projection> Execute(
        MetricSpec[] canonicalSpecs,
        Sample x,
        Sample? y,
        IReadOnlyList<Threshold> thresholds,
        Measurement[] normalizedValues,
        string? seed)
    {
        var results = new Projection[thresholds.Count];

        var byMetric = thresholds
            .Select((t, i) => (t, i, normalizedValues[i]))
            .GroupBy(item => item.t.Metric)
            .ToDictionary(g => g.Key, g => g.ToList());

        foreach (var spec in canonicalSpecs)
        {
            if (!byMetric.TryGetValue(spec.Metric, out var entries)) continue;

            var estimate = spec.Estimate(x, y);

            foreach (var (threshold, inputIndex, normalizedValue) in entries)
            {
                var bounds = (seed != null && spec.SeededBounds != null)
                    ? spec.SeededBounds(x, y, threshold.Misrate, seed)
                    : spec.Bounds(x, y, threshold.Misrate);
                var verdict = ComputeVerdict(bounds, normalizedValue);
                results[inputIndex] = new Projection(threshold, estimate, bounds, verdict);
            }
        }

        return results;
    }

    private static ComparisonVerdict ComputeVerdict(Bounds bounds, Measurement normalizedThreshold)
    {
        double t = normalizedThreshold.NominalValue;
        if (bounds.Lower > t) return ComparisonVerdict.Greater;
        if (bounds.Upper < t) return ComparisonVerdict.Less;
        return ComparisonVerdict.Inconclusive;
    }
}
Notes
Verdict Boundary Condition
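When $L_i = v_i$ (bounds lower equals threshold), the verdict is Inconclusive, not Greater. When $U_i = v_i$ (bounds upper equals threshold), the verdict is Inconclusive, not Less. The verdict is Greater only when $L_i > v_i$ and Less only when $U_i < v_i$ (strictly).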
This conservative choice reflects the discrete nature of confidence bounds: the true value could plausibly equal the boundary.
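A few hand-picked cases make the boundary rule easy to check. The sketch below restates the strict comparisons on plain doubles; `SketchVerdict` and `VerdictSketch` are hypothetical names, and the numbers are illustrative rather than taken from the test suite.

```csharp
using System;

public enum SketchVerdict { Less, Greater, Inconclusive }

public static class VerdictSketch
{
    // Strict comparisons: a bound that merely touches the threshold decides nothing.
    public static SketchVerdict Decide(double lower, double upper, double threshold)
    {
        if (lower > threshold) return SketchVerdict.Greater;
        if (upper < threshold) return SketchVerdict.Less;
        return SketchVerdict.Inconclusive;
    }
}
```

With a threshold of 5, bounds [5, 9] and [1, 5] are both Inconclusive because one endpoint equals the threshold, while [5.1, 9] is Greater and [1, 4.9] is Less.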
From Hypothesis Testing to Practical Thresholds
Compare2 extends the Inversion Principle to two-sample comparisons. Instead of testing "Is Shift significantly different from zero?", Compare2 answers "Is Shift reliably greater than my practical threshold?"
A Shift measured in ms may be statistically significant (bounds exclude zero) but practically inconclusive (bounds include your threshold). Traditional hypothesis testing declares this significant and stops; Compare2 declares it Inconclusive relative to the practical threshold.
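The contrast between the two questions can be sketched on one set of bounds. The helper below is a hypothetical illustration (`PracticalSketch.Assess` is not a library function), and the values used in the note that follows it are made up for the example.

```csharp
using System;

public static class PracticalSketch
{
    // Answers both questions for the same bounds:
    //   ExcludesZero: would a zero-difference test call the result significant?
    //   Verdict:      ternary verdict against the practical threshold.
    public static (bool ExcludesZero, string Verdict) Assess(
        double lower, double upper, double practicalThreshold)
    {
        bool excludesZero = lower > 0 || upper < 0;
        string verdict = lower > practicalThreshold ? "Greater"
                       : upper < practicalThreshold ? "Less"
                       : "Inconclusive";
        return (excludesZero, verdict);
    }
}
```

For assumed bounds of [2, 8] ms and an assumed practical threshold of 5 ms, `Assess(2, 8, 5)` returns `(true, "Inconclusive")`: the difference from zero is detectable, but the data do not settle which side of the threshold the Shift is on.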
Tests
The test suite contains 26 test cases (5 demo + 4 multi-threshold + 2 order + 5 misrate + 4 natural + 2 property + 4 error). All tests use seed "compare2-tests" for reproducibility. Each test case output is a JSON object with a projections array; each projection has estimate, lower, upper, and verdict fields.
Demo examples: single threshold, clear verdicts:
- demo-shift-less: shift threshold with a clearly negative shift → Less
- demo-shift-greater: x and y swapped → Greater
- demo-shift-inconclusive: shift threshold → Inconclusive
- demo-ratio-less: ratio threshold → Less
- demo-disparity-less: disparity threshold → Less
Multi-threshold:
- multi-shift-ratio: combined shift and ratio thresholds
- multi-shift-disparity: combined shift and disparity thresholds
- multi-all-three: shift, ratio, and disparity together
- multi-two-shifts: two different shift thresholds
Input order preservation: verifies that output order matches input order, not canonical order:
- order-disparity-shift: disparity listed before shift → output[0] = disparity, output[1] = shift
- order-ratio-shift: ratio listed before shift → output[0] = ratio, output[1] = shift
Misrate variation:
5 tests spanning progressively stricter fixture misrates, from narrower to wider bounds.
Natural sequences (sizes from {10, 15}, achievable fixture misrates):
- natural-10-10
- natural-10-15
- natural-15-10
- natural-15-15
Property validation:
- property-shift-identity: threshold at 0 → bounds include 0
- property-ratio-identity: threshold at 1 → bounds include 1
Error cases: inputs that violate assumptions:
- error-empty-x: empty x → error
- error-empty-y: empty y → error
- error-constant-x-disparity: constant x, disparity threshold → error
- error-constant-y-disparity: constant y, disparity threshold → error