Compare2

$$\operatorname{Compare2}(\mathbf{x}, \mathbf{y}, [T_1, \ldots, T_k]) = [P_1, \ldots, P_k]$$

where $T_i = (M_i, t_i, \mathrm{misrate}_i)$ is a threshold with metric $M_i$ ($\operatorname{Shift}$, $\operatorname{Ratio}$, or $\operatorname{Disparity}$), $P_i = (e_i, [L_i, U_i], v_i)$ is the projection with estimate $e_i$, bounds $[L_i, U_i]$, and verdict $v_i = \mathrm{Greater}$ if $L_i > t_i$; $\mathrm{Less}$ if $U_i < t_i$; $\mathrm{Inconclusive}$ otherwise.

Two-sample confirmatory analysis: compares estimates against practical thresholds.

Input

  • $\mathbf{x} = (x_1, \ldots, x_n)$: first sample of measurements
  • $\mathbf{y} = (y_1, \ldots, y_m)$: second sample of measurements
  • $T_i = (M_i, t_i, \mathrm{misrate}_i)$: list of $k$ thresholds, where $M_i$ is $\operatorname{Shift}$, $\operatorname{Ratio}$, or $\operatorname{Disparity}$; $t_i$ is the threshold value; $\mathrm{misrate}_i$ is the per-threshold error rate
  • $\text{seed}$: optional string for reproducible randomization (passed to $\operatorname{DisparityBounds}$)

Output

  • Value: list of $k$ projections in input order, each $P_i = (e_i, [L_i, U_i], v_i)$
  • Unit: per-projection, the same unit as the underlying estimator ($\operatorname{Shift}$, $\operatorname{Ratio}$, or $\operatorname{Disparity}$)

Notes

  • Independence: each threshold is evaluated independently; there is no family-wise guarantee
  • Unit compatibility: $\operatorname{Shift}$ thresholds must be compatible with the sample units; $\operatorname{Ratio}$ thresholds must be dimensionless or in $\operatorname{Ratio}$ units, with $t_i > 0$; $\operatorname{Disparity}$ thresholds must be dimensionless or in $\operatorname{Disparity}$ units
  • See also: $\operatorname{Compare1}$ for one-sample metrics ($\operatorname{Center}$, $\operatorname{Spread}$)

Properties

  • Order preservation: $P_i$ corresponds to input $T_i$
  • Metric deduplication: each distinct metric is computed once regardless of threshold count

Example

  • Compare2([1..30], [21..50], [(Shift, 0, 1e-3)]) → [Projection(-20, [...], Less)]
  • Compare2([21..50], [1..30], [(Shift, 0, 1e-3)]) → [Projection(20, [...], Greater)]

$\operatorname{Compare2}$ automates the pattern of computing a two-sample estimate, constructing bounds, and comparing the bounds against a practical threshold. Instead of asking whether $\operatorname{Shift}$ is significantly different from zero, it answers whether $\operatorname{Shift}$ is reliably above or below a practical threshold. Each threshold produces a ternary verdict that respects both statistical uncertainty and practical relevance. When multiple thresholds are needed (different metrics or different misrates), pass them all in one call to avoid redundant computation.

Algorithm

$\operatorname{Compare2}$ orchestrates estimation and comparison in stages: pre-pass validation, a validate-and-normalize pass, the statistical phase, and verdict computation.

Pre-pass validation

Before any statistical computation:

  • Reject weighted samples for $\mathbf{x}$ and $\mathbf{y}$ (unsupported).
  • Check that $\mathbf{x}$ and $\mathbf{y}$ have compatible units.
  • Reject null or empty threshold list.
  • Reject threshold items containing null.
  • Reject thresholds with metrics not in $\{\operatorname{Shift}, \operatorname{Ratio}, \operatorname{Disparity}\}$ (wrong arity).
  • Reject thresholds with non-finite values.

These checks happen before bounds computation, so no statistical work is done on invalid inputs.
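
The checks above can be sketched as a single guard function. This is a minimal Python illustration, not the library's implementation: the `Threshold` tuple, the `prepass_validate` name, and the use of `ValueError` are assumptions made for the example, and the weighted-sample and unit-compatibility checks are left out because they depend on the library's sample type.

```python
import math
from typing import NamedTuple, Optional, Sequence


class Threshold(NamedTuple):
    """Hypothetical stand-in for the library's threshold type."""
    metric: str     # "shift", "ratio", or "disparity"
    value: float    # threshold value t_i
    misrate: float  # per-threshold error rate


TWO_SAMPLE_METRICS = {"shift", "ratio", "disparity"}


def prepass_validate(thresholds: Optional[Sequence[Optional[Threshold]]]) -> None:
    """Raise ValueError on the first structural problem; compute nothing else."""
    if not thresholds:
        raise ValueError("thresholds must be a non-empty list")
    if any(t is None for t in thresholds):
        raise ValueError("thresholds must not contain null items")
    # Metric check runs over the whole list before any value check.
    for t in thresholds:
        if t.metric not in TWO_SAMPLE_METRICS:
            raise ValueError(f"metric {t.metric!r} is not a two-sample metric")
    for t in thresholds:
        if not math.isfinite(t.value):
            raise ValueError("threshold value must be finite")
```

Because the function only inspects the threshold list, a failure here costs no estimator or bounds computation.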

Validate-and-normalize pass

For each threshold, in input order:

  • Shift: check unit compatibility with $\mathbf{x}$; convert the threshold value to the finer of $\mathbf{x}$'s and $\mathbf{y}$'s units.
  • Ratio: accept unit $\operatorname{Ratio}$ or dimensionless (coerce to $\operatorname{Ratio}$); threshold value must be $> 0$.
  • Disparity: accept unit $\operatorname{Disparity}$ or dimensionless (coerce to $\operatorname{Disparity}$); threshold value must be finite.

Bindings that support plain numeric shorthand (Python and R) interpret it directly on the working comparison scale; explicit measurement thresholds are normalized as above.
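
A minimal Python sketch of this per-metric normalization follows. The unit table, the `"number"` label for dimensionless values, and the `normalize` function are hypothetical stand-ins chosen for illustration; real bindings may structure this differently.

```python
import math

# Hypothetical unit table: size of each time unit in microseconds.
UNIT_SIZE = {"s": 1_000_000, "ms": 1_000, "us": 1}


def finer(unit_a: str, unit_b: str) -> str:
    """Return whichever of the two units is smaller."""
    return unit_a if UNIT_SIZE[unit_a] <= UNIT_SIZE[unit_b] else unit_b


def normalize(metric: str, value: float, unit: str, x_unit: str, y_unit: str):
    """Return (normalized value, unit) for one threshold, or raise ValueError."""
    if not math.isfinite(value):
        raise ValueError("threshold value must be finite")
    if metric == "shift":
        if unit not in UNIT_SIZE:
            raise ValueError(f"unit {unit!r} is incompatible with the samples")
        target = finer(x_unit, y_unit)
        # Re-express the threshold in the finer of the two sample units.
        return value * UNIT_SIZE[unit] / UNIT_SIZE[target], target
    if metric == "ratio":
        if unit not in ("ratio", "number"):
            raise ValueError("ratio threshold must be a ratio or dimensionless")
        if value <= 0:
            raise ValueError("ratio threshold must be positive")
        return value, "ratio"  # dimensionless input is coerced to ratio
    if metric == "disparity":
        if unit not in ("disparity", "number"):
            raise ValueError("disparity threshold must be a disparity or dimensionless")
        return value, "disparity"  # dimensionless input is coerced to disparity
    raise ValueError(f"unknown metric {metric!r}")
```

Under these assumptions, `normalize("shift", 2.0, "s", "ms", "s")` expresses a 2 s threshold as 2000 ms, the finer of the two sample units.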

Statistical phase (canonical metric order: Shift → Ratio → Disparity)

For each present metric (in canonical order), compute the estimate once and bounds for each threshold entry of that metric:

$$
\text{estimate} = \begin{cases}
\operatorname{Shift}(\mathbf{x}, \mathbf{y}) & \text{if metric} = \operatorname{Shift}, \\
\operatorname{Ratio}(\mathbf{x}, \mathbf{y}) & \text{if metric} = \operatorname{Ratio}, \\
\operatorname{Disparity}(\mathbf{x}, \mathbf{y}) & \text{if metric} = \operatorname{Disparity}
\end{cases}
$$

$$
\text{bounds} = \begin{cases}
\operatorname{ShiftBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}_i) & \text{if metric} = \operatorname{Shift}, \\
\operatorname{RatioBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}_i) & \text{if metric} = \operatorname{Ratio}, \\
\operatorname{DisparityBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}_i, \text{seed}) & \text{if metric} = \operatorname{Disparity} \text{ and } \text{seed} \neq \text{null}, \\
\operatorname{DisparityBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}_i) & \text{if metric} = \operatorname{Disparity} \text{ and } \text{seed} = \text{null}
\end{cases}
$$

Verdict computation

$$
\text{verdict}_i = \begin{cases}
\text{Greater} & \text{if } L_i > t_i, \\
\text{Less} & \text{if } U_i < t_i, \\
\text{Inconclusive} & \text{otherwise}
\end{cases}
$$

where $[L_i, U_i]$ are the bounds for threshold $i$ and $t_i$ is the normalized threshold value.
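
The rule can be written as a three-branch function. The sketch below is a plain Python illustration with a hypothetical `verdict` helper and made-up bounds; it is not the library's API.

```python
def verdict(lower: float, upper: float, threshold: float) -> str:
    """Ternary verdict for bounds [lower, upper] against a threshold."""
    if lower > threshold:   # whole interval lies strictly above the threshold
        return "Greater"
    if upper < threshold:   # whole interval lies strictly below the threshold
        return "Less"
    return "Inconclusive"   # threshold lies inside the interval or on its edge


# Made-up bounds, all compared against a threshold of 0.
examples = {
    "above": verdict(1.0, 3.0, 0.0),
    "below": verdict(-3.0, -1.0, 0.0),
    "straddles": verdict(-1.0, 1.0, 0.0),
    "touches": verdict(0.0, 2.0, 0.0),
}
```

Both comparisons are strict, so bounds that merely touch the threshold fall through to the last branch.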

Result ordering

Results are stored in input order regardless of canonical processing order. Input [Disparity, Shift] produces output [disparity_projection, shift_projection].
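
The grouping and reordering described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the library's code: `execute`, the tuple layouts, and the stub estimators with made-up values are all hypothetical.

```python
from collections import defaultdict

CANONICAL_ORDER = ("shift", "ratio", "disparity")


def execute(thresholds, estimators, bounds_fns):
    """thresholds: list of (metric, value, misrate) in input order.
    estimators: metric -> zero-argument callable returning the estimate.
    bounds_fns: metric -> callable(misrate) returning (lower, upper)."""
    results = [None] * len(thresholds)
    by_metric = defaultdict(list)
    for index, (metric, value, misrate) in enumerate(thresholds):
        by_metric[metric].append((index, value, misrate))

    for metric in CANONICAL_ORDER:            # processing order is canonical
        if metric not in by_metric:
            continue
        estimate = estimators[metric]()       # once per distinct metric
        for index, value, misrate in by_metric[metric]:
            lower, upper = bounds_fns[metric](misrate)
            if lower > value:
                v = "Greater"
            elif upper < value:
                v = "Less"
            else:
                v = "Inconclusive"
            results[index] = (metric, estimate, (lower, upper), v)  # input slot
    return results


calls = []  # records which estimators ran, and in what order


def stub_estimator(name, value):
    def run():
        calls.append(name)
        return value
    return run


# Made-up estimates and bounds, used only to exercise the control flow.
estimators = {"shift": stub_estimator("shift", -20.0),
              "disparity": stub_estimator("disparity", -2.0)}
bounds_fns = {"shift": lambda misrate: (-25.0, -15.0),
              "disparity": lambda misrate: (-3.0, -1.0)}

out = execute(
    [("disparity", 0.0, 1e-3), ("shift", 0.0, 1e-3), ("shift", -18.0, 1e-3)],
    estimators,
    bounds_fns,
)
```

With these stubs, `calls` ends up as `["shift", "disparity"]` (canonical order, one estimate per metric even though two shift thresholds were passed), while `out` keeps the disparity projection in slot 0 as given in the input.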

using Pragmastat.Estimators;
using Pragmastat.Exceptions;
using Pragmastat.Metrology;

namespace Pragmastat.Internal;

internal static class CompareEngine
{
  private readonly struct MetricSpec
  {
    public Metric Metric { get; }
    public Func<Threshold, Sample, Sample?, Measurement> ValidateAndNormalize { get; }
    public Func<Sample, Sample?, Measurement> Estimate { get; }
    public Func<Sample, Sample?, Probability, Bounds> Bounds { get; }
    public Func<Sample, Sample?, Probability, string, Bounds>? SeededBounds { get; }

    public MetricSpec(
      Metric metric,
      Func<Threshold, Sample, Sample?, Measurement> validateAndNormalize,
      Func<Sample, Sample?, Measurement> estimate,
      Func<Sample, Sample?, Probability, Bounds> bounds,
      Func<Sample, Sample?, Probability, string, Bounds>? seededBounds = null)
    {
      Metric = metric;
      ValidateAndNormalize = validateAndNormalize;
      Estimate = estimate;
      Bounds = bounds;
      SeededBounds = seededBounds;
    }
  }

  private static readonly MetricSpec[] Compare1Specs =
  [
    new MetricSpec(
      Metric.Center,
      ValidateCenter,
      (x, _) => CenterEstimator.Instance.Estimate(x),
      (x, _, alpha) => CenterBoundsEstimator.Instance.Estimate(x, alpha)),
    new MetricSpec(
      Metric.Spread,
      ValidateSpread,
      (x, _) => SpreadEstimator.Instance.Estimate(x),
      (x, _, alpha) => SpreadBoundsEstimator.Instance.Estimate(x, alpha),
      (x, _, alpha, seed) => SpreadBoundsEstimator.Instance.Estimate(x, alpha, seed)),
  ];

  private static readonly MetricSpec[] Compare2Specs =
  [
    new MetricSpec(
      Metric.Shift,
      ValidateShift,
      (x, y) => ShiftEstimator.Instance.Estimate(x, y!),
      (x, y, alpha) => ShiftBoundsEstimator.Instance.Estimate(x, y!, alpha)),
    new MetricSpec(
      Metric.Ratio,
      ValidateRatio,
      (x, y) => RatioEstimator.Instance.Estimate(x, y!),
      (x, y, alpha) => RatioBoundsEstimator.Instance.Estimate(x, y!, alpha)),
    new MetricSpec(
      Metric.Disparity,
      ValidateDisparity,
      (x, y) => DisparityEstimator.Instance.Estimate(x, y!),
      (x, y, alpha) => DisparityBoundsEstimator.Instance.Estimate(x, y!, alpha),
      (x, y, alpha, seed) => DisparityBoundsEstimator.Instance.Estimate(x, y!, alpha, seed)),
  ];

  private static Measurement ValidateCenter(Threshold threshold, Sample x, Sample? _)
  {
    if (!threshold.Value.Unit.IsCompatible(x.Unit))
      throw new UnitMismatchException(threshold.Value.Unit, x.Unit);
    if (!threshold.Value.NominalValue.IsFinite())
      throw new ArgumentOutOfRangeException(nameof(threshold), "threshold.Value must be finite");
    double factor = MeasurementUnit.ConversionFactor(threshold.Value.Unit, x.Unit);
    return new Measurement(threshold.Value.NominalValue * factor, x.Unit);
  }

  private static Measurement ValidateSpread(Threshold threshold, Sample x, Sample? _) =>
    ValidateCenter(threshold, x, null);

  private static Measurement ValidateShift(Threshold threshold, Sample x, Sample? y)
  {
    if (!threshold.Value.Unit.IsCompatible(x.Unit))
      throw new UnitMismatchException(threshold.Value.Unit, x.Unit);
    if (!threshold.Value.NominalValue.IsFinite())
      throw new ArgumentOutOfRangeException(nameof(threshold), "threshold.Value must be finite");
    var finerUnit = MeasurementUnit.Finer(x.Unit, y!.Unit);
    double factor = MeasurementUnit.ConversionFactor(threshold.Value.Unit, finerUnit);
    return new Measurement(threshold.Value.NominalValue * factor, finerUnit);
  }

  private static Measurement ValidateRatio(Threshold threshold, Sample _, Sample? __)
  {
    var unit = threshold.Value.Unit;
    if (unit != MeasurementUnit.Ratio && unit != MeasurementUnit.Number)
      throw new UnitMismatchException(unit, MeasurementUnit.Ratio);
    double value = threshold.Value.NominalValue;
    if (value <= 0 || !value.IsFinite())
      throw new ArgumentOutOfRangeException(nameof(threshold), "Ratio threshold.Value must be finite and positive");
    return new Measurement(value, MeasurementUnit.Ratio);
  }

  private static Measurement ValidateDisparity(Threshold threshold, Sample _, Sample? __)
  {
    var unit = threshold.Value.Unit;
    if (unit != MeasurementUnit.Disparity && unit != MeasurementUnit.Number)
      throw new UnitMismatchException(unit, MeasurementUnit.Disparity);
    double value = threshold.Value.NominalValue;
    if (!value.IsFinite())
      throw new ArgumentOutOfRangeException(nameof(threshold), "Disparity threshold.Value must be finite");
    return new Measurement(value, MeasurementUnit.Disparity);
  }

  public static IReadOnlyList<Projection> Compare1(Sample x, IReadOnlyList<Threshold> thresholds, string? seed)
  {
    Assertion.NonWeighted("x", x);
    Assertion.NotNullOrEmpty("thresholds", thresholds);
    Assertion.ItemNotNull("thresholds", thresholds);

    foreach (var threshold in thresholds)
    {
      if (threshold.Metric is Metric.Shift or Metric.Ratio or Metric.Disparity)
        throw new ArgumentException(
          $"Metric {threshold.Metric} is not supported by Compare1. Use Compare2 instead.",
          nameof(thresholds));
    }

    foreach (var threshold in thresholds)
    {
      if (!threshold.Value.NominalValue.IsFinite())
        throw new ArgumentOutOfRangeException(nameof(thresholds), "threshold.Value must be finite");
    }

    var normalizedValues = new Measurement[thresholds.Count];
    for (int i = 0; i < thresholds.Count; i++)
    {
      var spec = GetSpec(Compare1Specs, thresholds[i].Metric);
      normalizedValues[i] = spec.ValidateAndNormalize(thresholds[i], x, null);
    }

    return Execute(Compare1Specs, x, null, thresholds, normalizedValues, seed);
  }

  public static IReadOnlyList<Projection> Compare2(
    Sample x, Sample y, IReadOnlyList<Threshold> thresholds, string? seed)
  {
    Assertion.NonWeighted("x", x);
    Assertion.NonWeighted("y", y);
    Assertion.CompatibleUnits(x, y);
    Assertion.NotNullOrEmpty("thresholds", thresholds);
    Assertion.ItemNotNull("thresholds", thresholds);

    foreach (var threshold in thresholds)
    {
      if (threshold.Metric is Metric.Center or Metric.Spread)
        throw new ArgumentException(
          $"Metric {threshold.Metric} is not supported by Compare2. Use Compare1 instead.",
          nameof(thresholds));
    }

    foreach (var threshold in thresholds)
    {
      if (!threshold.Value.NominalValue.IsFinite())
        throw new ArgumentOutOfRangeException(nameof(thresholds), "threshold.Value must be finite");
    }

    var normalizedValues = new Measurement[thresholds.Count];
    for (int i = 0; i < thresholds.Count; i++)
    {
      var spec = GetSpec(Compare2Specs, thresholds[i].Metric);
      normalizedValues[i] = spec.ValidateAndNormalize(thresholds[i], x, y);
    }

    return Execute(Compare2Specs, x, y, thresholds, normalizedValues, seed);
  }

  private static MetricSpec GetSpec(MetricSpec[] specs, Metric metric)
  {
    foreach (var spec in specs)
      if (spec.Metric == metric) return spec;
    throw new ArgumentException($"No spec found for metric {metric}");
  }

  private static IReadOnlyList<Projection> Execute(
    MetricSpec[] canonicalSpecs,
    Sample x,
    Sample? y,
    IReadOnlyList<Threshold> thresholds,
    Measurement[] normalizedValues,
    string? seed)
  {
    var results = new Projection[thresholds.Count];

    var byMetric = thresholds
      .Select((t, i) => (t, i, normalizedValues[i]))
      .GroupBy(item => item.t.Metric)
      .ToDictionary(g => g.Key, g => g.ToList());

    foreach (var spec in canonicalSpecs)
    {
      if (!byMetric.TryGetValue(spec.Metric, out var entries)) continue;
      var estimate = spec.Estimate(x, y);
      foreach (var (threshold, inputIndex, normalizedValue) in entries)
      {
        var bounds = (seed != null && spec.SeededBounds != null)
          ? spec.SeededBounds(x, y, threshold.Misrate, seed)
          : spec.Bounds(x, y, threshold.Misrate);
        var verdict = ComputeVerdict(bounds, normalizedValue);
        results[inputIndex] = new Projection(threshold, estimate, bounds, verdict);
      }
    }

    return results;
  }

  private static ComparisonVerdict ComputeVerdict(Bounds bounds, Measurement normalizedThreshold)
  {
    double t = normalizedThreshold.NominalValue;
    if (bounds.Lower > t) return ComparisonVerdict.Greater;
    if (bounds.Upper < t) return ComparisonVerdict.Less;
    return ComparisonVerdict.Inconclusive;
  }
}

Notes

Verdict Boundary Condition

When $L = t$ (the lower bound equals the threshold), the verdict is $\mathrm{Inconclusive}$, not $\mathrm{Greater}$. When $U = t$ (the upper bound equals the threshold), the verdict is $\mathrm{Inconclusive}$, not $\mathrm{Less}$. The verdict is $\mathrm{Greater}$ only when $L > t$ and $\mathrm{Less}$ only when $U < t$ (both strictly).

This conservative choice reflects the discrete nature of confidence bounds: the true value could plausibly equal the boundary.

From Hypothesis Testing to Practical Thresholds

Compare2 extends the Inversion Principle to two-sample comparisons. Instead of testing "Is Shift significantly different from zero?", Compare2 answers "Is Shift reliably greater than my practical threshold?"

A Shift of $5$ ms may be statistically significant (bounds exclude zero) but practically inconclusive (bounds include your threshold of $10$ ms). Traditional hypothesis testing declares this significant and stops; Compare2 declares it $\mathrm{Inconclusive}$ relative to the practical threshold.
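
To make the contrast concrete, the sketch below checks one set of bounds against both zero and a practical threshold. The bounds of $[2, 12]$ ms are invented for illustration (the text gives only the estimate and the threshold), and `verdict` is a hypothetical helper that mirrors the verdict rule.

```python
def verdict(lower: float, upper: float, threshold: float) -> str:
    """Strict comparison of bounds [lower, upper] against a threshold."""
    if lower > threshold:
        return "Greater"
    if upper < threshold:
        return "Less"
    return "Inconclusive"


# Hypothetical bounds (in ms) around an estimated shift of 5 ms.
bounds_ms = (2.0, 12.0)

# Threshold 0 plays the role of a classical significance test;
# threshold 10 is the practical threshold from the example.
verdicts = {t: verdict(*bounds_ms, t) for t in (0.0, 10.0)}
```

Under these assumed bounds the shift is `Greater` than zero yet `Inconclusive` relative to the 10 ms threshold.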

Tests

The $\operatorname{Compare2}$ test suite contains 26 test cases (5 demo + 4 multi-threshold + 2 order + 5 misrate + 4 natural + 2 property + 4 error). All tests use seed "compare2-tests" for reproducibility. Each test case output is a JSON object with a projections array; each projection has estimate, lower, upper, and verdict fields.

Demo examples ($n = m = 30$, $\mathbf{x} = (1, \ldots, 30)$, $\mathbf{y} = (21, \ldots, 50)$), single threshold, clear verdicts:

  • demo-shift-less: shift threshold at $0$ with clearly negative shift → $\operatorname{Shift} \approx -20$, $\mathrm{Less}$
  • demo-shift-greater: $\mathbf{x}$ and $\mathbf{y}$ swapped → $\operatorname{Shift} \approx 20$, $\mathrm{Greater}$
  • demo-shift-inconclusive: $\mathbf{x} = \mathbf{y}$, threshold at $0$ → $\mathrm{Inconclusive}$
  • demo-ratio-less: $\mathbf{x} = (1, \ldots, 20)$, $\mathbf{y} = (21, \ldots, 40)$, ratio threshold at $1$ → $\mathrm{Less}$
  • demo-disparity-less: $\mathbf{x} = (1, \ldots, 30)$, $\mathbf{y} = (21, \ldots, 50)$, disparity threshold at $0$ → $\mathrm{Less}$

Multi-threshold ($\mathbf{x} = (1, \ldots, 30)$, $\mathbf{y} = (21, \ldots, 50)$):

  • multi-shift-ratio: combined shift and ratio thresholds
  • multi-shift-disparity: combined shift and disparity thresholds
  • multi-all-three: shift, ratio, and disparity together
  • multi-two-shifts: two different shift thresholds

Input order preservation (verifies that output order matches input order, not canonical order):

  • order-disparity-shift: disparity listed before shift → output[0] = disparity, output[1] = shift
  • order-ratio-shift: ratio listed before shift → output[0] = ratio, output[1] = shift

Misrate variation ($\mathbf{x} = (1, \ldots, 20)$, $\mathbf{y} = (11, \ldots, 30)$, $\operatorname{Shift}$ threshold at $0$):

5 tests spanning progressively stricter fixture misrates, from narrower to wider bounds.

Natural sequences (sizes from $\{10, 15\}$, achievable fixture misrates):

  • natural-10-10, natural-10-15, natural-15-10, natural-15-15

Property validation ($\mathbf{x} = \mathbf{y} = (1, \ldots, 20)$):

  • property-shift-identity: $\operatorname{Shift}$ threshold at $0$ → bounds include $0$
  • property-ratio-identity: $\operatorname{Ratio}$ threshold at $1$ → bounds include $1$

Error cases (inputs that violate assumptions):

  • error-empty-x: $\mathbf{x} = ()$ → $\text{validity}(x)$
  • error-empty-y: $\mathbf{y} = ()$ → $\text{validity}(y)$
  • error-constant-x-disparity: $\mathbf{x} = (5, 5, \ldots, 5)$, $\operatorname{Disparity}$ threshold → $\text{sparity}(x)$
  • error-constant-y-disparity: $\mathbf{y} = (5, 5, \ldots, 5)$, $\operatorname{Disparity}$ threshold → $\text{sparity}(y)$