Compare2

$$
\operatorname{Compare2}(\mathbf{x}, \mathbf{y}, [T_1, \ldots, T_k]) = [P_1, \ldots, P_k]
$$

where $T_i = (M_i, t_i, \mathrm{misrate}_i)$ is a threshold with metric $M_i$ ( $\operatorname{Shift}$ , $\operatorname{Ratio}$ , or $\operatorname{Disparity}$ ), $P_i = (e_i, [L_i, U_i], v_i)$ is the projection with estimate $e_i$ , bounds $[L_i, U_i]$ , and verdict $v_i = \mathrm{Greater}$ if $L_i > t_i$ ; $\mathrm{Less}$ if $U_i < t_i$ ; $\mathrm{Inconclusive}$ otherwise.

Two-sample confirmatory analysis: compares estimates against practical thresholds.

Input

$\mathbf{x} = (x_1, \ldots, x_n)$ — first sample of measurements
$\mathbf{y} = (y_1, \ldots, y_m)$ — second sample of measurements
$T_i = (M_i, t_i, \mathrm{misrate}_i)$ — list of $k$ thresholds: $M_i$ is $\operatorname{Shift}$ , $\operatorname{Ratio}$ , or $\operatorname{Disparity}$ ; $t_i$ is the threshold value; $\mathrm{misrate}_i$ is the per-threshold error rate
$\text{seed}$ — optional string for reproducible randomization (passed to $\operatorname{DisparityBounds}$ )

Output

Value — list of $k$ projections in input order, each $P_i = (e_i, [L_i, U_i], v_i)$
Unit — per-projection: same unit as the underlying estimator ( $\operatorname{Shift}$ , $\operatorname{Ratio}$ , or $\operatorname{Disparity}$ )

Notes

Independence — each threshold evaluated independently; no family-wise guarantee
Unit compatibility — $\operatorname{Shift}$ : compatible with sample units; $\operatorname{Ratio}$ : dimensionless or $\operatorname{Ratio}$ , requires $t_i > 0$ ; $\operatorname{Disparity}$ : dimensionless or $\operatorname{Disparity}$
See also — $\operatorname{Compare1}$ for one-sample metrics ( $\operatorname{Center}$ , $\operatorname{Spread}$ )

Properties

Order preservation: $P_i$ corresponds to input $T_i$
Metric deduplication: each distinct metric is computed once regardless of threshold count

Example

Compare2([1..30], [21..50], [(Shift, 0, 1e-3)]) → [Projection(-20, [...], Less)]
Compare2([21..50], [1..30], [(Shift, 0, 1e-3)]) → [Projection(20, [...], Greater)]
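
The two estimates in this example can be checked by hand. The sketch below assumes that $\operatorname{Shift}$ is the median of all pairwise differences $x_i - y_j$ (the Hodges-Lehmann form). This page does not define $\operatorname{Shift}$, so that definition and the helper name `shift_estimate` are assumptions. The sketch reproduces only the estimate, not the bounds or the verdict.

```python
from statistics import median

def shift_estimate(x, y):
    # Assumed definition (not stated on this page):
    # median of all pairwise differences x_i - y_j.
    return median(xi - yj for xi in x for yj in y)

x = list(range(1, 31))   # 1..30
y = list(range(21, 51))  # 21..50

print(shift_estimate(x, y))  # -20.0
print(shift_estimate(y, x))  # 20.0
```

Swapping the samples flips the sign of every pairwise difference, which is why the second call mirrors the first.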

$\operatorname{Compare2}$ automates the pattern of computing a two-sample estimate, constructing bounds, and comparing the bounds against a practical threshold. Instead of asking whether $\operatorname{Shift}$ is significantly different from zero, it answers whether $\operatorname{Shift}$ is reliably above or below a practical threshold. Each threshold produces a ternary verdict that respects both statistical uncertainty and practical relevance. When multiple thresholds are needed (different metrics or different misrates), pass them all in one call to avoid redundant computation.

Algorithm

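$\operatorname{Compare2}$ orchestrates estimation and comparison in three steps: pre-pass validation, a validate-and-normalize pass over the thresholds, and the statistical phase, which produces estimates, bounds, and verdicts.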

Pre-pass validation

Before any statistical computation:

Reject weighted samples for $\mathbf{x}$ and $\mathbf{y}$ (unsupported).
Check that $\mathbf{x}$ and $\mathbf{y}$ have compatible units.
Reject null or empty threshold list.
Reject threshold items containing null.
Reject thresholds with metrics not in $\{\operatorname{Shift}, \operatorname{Ratio}, \operatorname{Disparity}\}$ (wrong arity).
Reject thresholds with non-finite values.

These checks happen before bounds computation, so no statistical work is done on invalid inputs.
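
The threshold-related checks above can be sketched as a single function that either returns silently or raises. This is a minimal illustration, not the library's code: the `Threshold` dataclass, the `pre_pass` name, and the use of `ValueError` are assumptions, and the sample-level checks (weighted samples, unit compatibility of $\mathbf{x}$ and $\mathbf{y}$) are left out because they depend on a sample type not modelled here.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

TWO_SAMPLE_METRICS = {"Shift", "Ratio", "Disparity"}

@dataclass(frozen=True)
class Threshold:  # hypothetical stand-in for T_i = (M_i, t_i, misrate_i)
    metric: str
    value: float
    misrate: float

def pre_pass(thresholds: Optional[Sequence[Optional[Threshold]]]) -> None:
    """Raise ValueError on the first structural problem; no statistics are computed."""
    if not thresholds:
        raise ValueError("thresholds must be a non-empty list")
    if any(t is None for t in thresholds):
        raise ValueError("thresholds must not contain null items")
    for t in thresholds:
        if t.metric not in TWO_SAMPLE_METRICS:
            raise ValueError(f"metric {t.metric} is not supported by Compare2")
    for t in thresholds:
        if not math.isfinite(t.value):
            raise ValueError("threshold value must be finite")

pre_pass([Threshold("Shift", 0.0, 1e-3)])  # valid input: returns without raising
```

Running the metric check over the whole list before the finiteness check mirrors the order of the list above: a one-sample metric is reported even if another threshold also carries a non-finite value.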

Validate-and-normalize pass

For each threshold, in input order:

Shift: check unit compatibility with $\mathbf{x}$; convert the threshold value to the finer of $\mathbf{x}$'s and $\mathbf{y}$'s units.
Ratio: accept unit $\operatorname{Ratio}$ or dimensionless (coerce to $\operatorname{Ratio}$ ); threshold value must be $> 0$ .
Disparity: accept unit $\operatorname{Disparity}$ or dimensionless (coerce to $\operatorname{Disparity}$ ); threshold value must be finite.

Bindings that support plain numeric shorthand (Python and R) interpret it directly on the working comparison scale; explicit measurement thresholds are normalized as above.
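
The per-metric normalization rules can be sketched as three small functions. Everything named here is illustrative: the `UNIT_SIZE_NS` table is a hypothetical set of time units standing in for the library's unit system, `"Number"` denotes the dimensionless unit, and errors are reported as `ValueError` for brevity.

```python
import math

# Hypothetical unit table: size of each unit in nanoseconds (smaller = finer).
UNIT_SIZE_NS = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}

def normalize_shift(value, unit, x_unit, y_unit):
    """Convert a Shift threshold to the finer of the two sample units."""
    if unit not in UNIT_SIZE_NS:
        raise ValueError(f"unit {unit!r} is not compatible with {x_unit!r}")
    if not math.isfinite(value):
        raise ValueError("Shift threshold must be finite")
    finer = min(x_unit, y_unit, key=UNIT_SIZE_NS.__getitem__)
    return value * UNIT_SIZE_NS[unit] / UNIT_SIZE_NS[finer], finer

def normalize_ratio(value, unit):
    """Accept Ratio or dimensionless; require a finite, positive value."""
    if unit not in ("Ratio", "Number"):
        raise ValueError(f"unit {unit!r} is neither Ratio nor dimensionless")
    if not math.isfinite(value) or value <= 0:
        raise ValueError("Ratio threshold must be finite and positive")
    return value, "Ratio"

def normalize_disparity(value, unit):
    """Accept Disparity or dimensionless; require a finite value."""
    if unit not in ("Disparity", "Number"):
        raise ValueError(f"unit {unit!r} is neither Disparity nor dimensionless")
    if not math.isfinite(value):
        raise ValueError("Disparity threshold must be finite")
    return value, "Disparity"

# A 0.5 s threshold with x in ms and y in us is expressed in us, the finer unit.
print(normalize_shift(0.5, "s", "ms", "us"))  # (500000.0, 'us')
print(normalize_ratio(1.5, "Number"))         # (1.5, 'Ratio')
```

Normalizing to the finer unit means the threshold and the bounds are compared on the same scale, so the verdict step is a plain numeric comparison.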

Statistical phase (canonical metric order: Shift → Ratio → Disparity)

For each present metric (in canonical order), compute the estimate once and bounds for each threshold entry of that metric:

$$
\text{estimate} = \begin{cases}
\operatorname{Shift}(\mathbf{x}, \mathbf{y}) & \text{if metric} = \operatorname{Shift}, \\
\operatorname{Ratio}(\mathbf{x}, \mathbf{y}) & \text{if metric} = \operatorname{Ratio}, \\
\operatorname{Disparity}(\mathbf{x}, \mathbf{y}) & \text{if metric} = \operatorname{Disparity}
\end{cases}
$$

$$
\text{bounds} = \begin{cases}
\operatorname{ShiftBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}_i) & \text{if metric} = \operatorname{Shift}, \\
\operatorname{RatioBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}_i) & \text{if metric} = \operatorname{Ratio}, \\
\operatorname{DisparityBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}_i, \text{seed}) & \text{if metric} = \operatorname{Disparity} \text{ and } \text{seed} \neq \text{null}, \\
\operatorname{DisparityBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}_i) & \text{if metric} = \operatorname{Disparity} \text{ and } \text{seed} = \text{null}
\end{cases}
$$

Verdict computation

$$
\text{verdict}_i = \begin{cases}
\text{Greater} & \text{if } L_i > t_i, \\
\text{Less} & \text{if } U_i < t_i, \\
\text{Inconclusive} & \text{otherwise}
\end{cases}
$$

where $[L_i, U_i]$ are the bounds for threshold $i$ and $t_i$ is the normalized threshold value.
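
The verdict rule translates directly into code. The sketch below is a minimal illustration; the function name `verdict` and the bounds passed to it are hypothetical values chosen to exercise each branch, not outputs of the bounds estimators.

```python
def verdict(lower, upper, t):
    """Ternary verdict for bounds [lower, upper] against a normalized threshold t."""
    if lower > t:          # the whole interval lies strictly above t
        return "Greater"
    if upper < t:          # the whole interval lies strictly below t
        return "Less"
    return "Inconclusive"  # t is inside the interval or on its edge

print(verdict(15.0, 25.0, 0.0))    # Greater
print(verdict(-25.0, -15.0, 0.0))  # Less
print(verdict(-3.0, 3.0, 0.0))     # Inconclusive
print(verdict(0.0, 4.0, 0.0))      # Inconclusive: lower equals t, not strictly above it
```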

Result ordering

Results are stored in input order regardless of canonical processing order. Input [Disparity, Shift] produces output [disparity_projection, shift_projection].
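
The interplay between canonical processing order, once-per-metric estimation, and input-order results can be sketched as follows. The estimators and bounds functions are stubs returning hypothetical placeholder numbers; only the control flow is the point. The names `execute`, `stub_estimator`, and `calls` are illustrative.

```python
CANONICAL_ORDER = ("Shift", "Ratio", "Disparity")

def execute(thresholds, estimators, bounds_fns):
    """thresholds: list of (metric, t, misrate). Returns results in input order."""
    results = [None] * len(thresholds)
    by_metric = {}
    for index, (metric, t, misrate) in enumerate(thresholds):
        by_metric.setdefault(metric, []).append((index, t, misrate))
    for metric in CANONICAL_ORDER:               # canonical processing order
        if metric not in by_metric:
            continue
        estimate = estimators[metric]()          # computed once per metric
        for index, t, misrate in by_metric[metric]:
            lower, upper = bounds_fns[metric](misrate)  # computed per threshold
            v = "Greater" if lower > t else "Less" if upper < t else "Inconclusive"
            results[index] = (metric, estimate, (lower, upper), v)  # input slot
    return results

calls = []  # records which estimators ran, and in what order

def stub_estimator(name, value):
    def run():
        calls.append(name)
        return value
    return run

# Placeholder numbers only; real values come from the estimators.
estimators = {"Shift": stub_estimator("Shift", -20.0),
              "Disparity": stub_estimator("Disparity", -2.0)}
bounds_fns = {"Shift": lambda misrate: (-25.0, -15.0),
              "Disparity": lambda misrate: (-3.0, -1.0)}

out = execute([("Disparity", 0.0, 1e-3), ("Shift", 0.0, 1e-3), ("Shift", -30.0, 1e-3)],
              estimators, bounds_fns)
print([r[0] for r in out])  # ['Disparity', 'Shift', 'Shift']
print(calls)                # ['Shift', 'Disparity']
```

The two Shift thresholds share one estimate call, and the Disparity result still lands in slot 0 even though Shift was processed first.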

using Pragmastat.Estimators;
using Pragmastat.Exceptions;
using Pragmastat.Metrology;

namespace Pragmastat.Internal;

internal static class CompareEngine
{
  private readonly struct MetricSpec
  {
    public Metric Metric { get; }
    public Func<Threshold, Sample, Sample?, Measurement> ValidateAndNormalize { get; }
    public Func<Sample, Sample?, Measurement> Estimate { get; }
    public Func<Sample, Sample?, Probability, Bounds> Bounds { get; }
    public Func<Sample, Sample?, Probability, string, Bounds>? SeededBounds { get; }

    public MetricSpec(
      Metric metric,
      Func<Threshold, Sample, Sample?, Measurement> validateAndNormalize,
      Func<Sample, Sample?, Measurement> estimate,
      Func<Sample, Sample?, Probability, Bounds> bounds,
      Func<Sample, Sample?, Probability, string, Bounds>? seededBounds = null)
    {
      Metric = metric;
      ValidateAndNormalize = validateAndNormalize;
      Estimate = estimate;
      Bounds = bounds;
      SeededBounds = seededBounds;
    }
  }

  private static readonly MetricSpec[] Compare1Specs =
  [
    new MetricSpec(
      Metric.Center,
      ValidateCenter,
      (x, _) => CenterEstimator.Instance.Estimate(x),
      (x, _, alpha) => CenterBoundsEstimator.Instance.Estimate(x, alpha)),
    new MetricSpec(
      Metric.Spread,
      ValidateSpread,
      (x, _) => SpreadEstimator.Instance.Estimate(x),
      (x, _, alpha) => SpreadBoundsEstimator.Instance.Estimate(x, alpha),
      (x, _, alpha, seed) => SpreadBoundsEstimator.Instance.Estimate(x, alpha, seed)),
  ];

  private static readonly MetricSpec[] Compare2Specs =
  [
    new MetricSpec(
      Metric.Shift,
      ValidateShift,
      (x, y) => ShiftEstimator.Instance.Estimate(x, y!),
      (x, y, alpha) => ShiftBoundsEstimator.Instance.Estimate(x, y!, alpha)),
    new MetricSpec(
      Metric.Ratio,
      ValidateRatio,
      (x, y) => RatioEstimator.Instance.Estimate(x, y!),
      (x, y, alpha) => RatioBoundsEstimator.Instance.Estimate(x, y!, alpha)),
    new MetricSpec(
      Metric.Disparity,
      ValidateDisparity,
      (x, y) => DisparityEstimator.Instance.Estimate(x, y!),
      (x, y, alpha) => DisparityBoundsEstimator.Instance.Estimate(x, y!, alpha),
      (x, y, alpha, seed) => DisparityBoundsEstimator.Instance.Estimate(x, y!, alpha, seed)),
  ];

  private static Measurement ValidateCenter(Threshold threshold, Sample x, Sample? _)
  {
    if (!threshold.Value.Unit.IsCompatible(x.Unit))
      throw new UnitMismatchException(threshold.Value.Unit, x.Unit);
    if (!threshold.Value.NominalValue.IsFinite())
      throw new ArgumentOutOfRangeException(nameof(threshold), "threshold.Value must be finite");
    double factor = MeasurementUnit.ConversionFactor(threshold.Value.Unit, x.Unit);
    return new Measurement(threshold.Value.NominalValue * factor, x.Unit);
  }

  private static Measurement ValidateSpread(Threshold threshold, Sample x, Sample? _) =>
    ValidateCenter(threshold, x, null);

  private static Measurement ValidateShift(Threshold threshold, Sample x, Sample? y)
  {
    if (!threshold.Value.Unit.IsCompatible(x.Unit))
      throw new UnitMismatchException(threshold.Value.Unit, x.Unit);
    if (!threshold.Value.NominalValue.IsFinite())
      throw new ArgumentOutOfRangeException(nameof(threshold), "threshold.Value must be finite");
    var finerUnit = MeasurementUnit.Finer(x.Unit, y!.Unit);
    double factor = MeasurementUnit.ConversionFactor(threshold.Value.Unit, finerUnit);
    return new Measurement(threshold.Value.NominalValue * factor, finerUnit);
  }

  private static Measurement ValidateRatio(Threshold threshold, Sample _, Sample? __)
  {
    var unit = threshold.Value.Unit;
    if (unit != MeasurementUnit.Ratio && unit != MeasurementUnit.Number)
      throw new UnitMismatchException(unit, MeasurementUnit.Ratio);
    double value = threshold.Value.NominalValue;
    if (value <= 0 || !value.IsFinite())
      throw new ArgumentOutOfRangeException(nameof(threshold), "Ratio threshold.Value must be finite and positive");
    return new Measurement(value, MeasurementUnit.Ratio);
  }

  private static Measurement ValidateDisparity(Threshold threshold, Sample _, Sample? __)
  {
    var unit = threshold.Value.Unit;
    if (unit != MeasurementUnit.Disparity && unit != MeasurementUnit.Number)
      throw new UnitMismatchException(unit, MeasurementUnit.Disparity);
    double value = threshold.Value.NominalValue;
    if (!value.IsFinite())
      throw new ArgumentOutOfRangeException(nameof(threshold), "Disparity threshold.Value must be finite");
    return new Measurement(value, MeasurementUnit.Disparity);
  }

  public static IReadOnlyList<Projection> Compare1(Sample x, IReadOnlyList<Threshold> thresholds, string? seed)
  {
    Assertion.NonWeighted("x", x);
    Assertion.NotNullOrEmpty("thresholds", thresholds);
    Assertion.ItemNotNull("thresholds", thresholds);

    foreach (var threshold in thresholds)
    {
      if (threshold.Metric is Metric.Shift or Metric.Ratio or Metric.Disparity)
        throw new ArgumentException(
          $"Metric {threshold.Metric} is not supported by Compare1. Use Compare2 instead.",
          nameof(thresholds));
    }

    foreach (var threshold in thresholds)
    {
      if (!threshold.Value.NominalValue.IsFinite())
        throw new ArgumentOutOfRangeException(nameof(thresholds), "threshold.Value must be finite");
    }

    var normalizedValues = new Measurement[thresholds.Count];
    for (int i = 0; i < thresholds.Count; i++)
    {
      var spec = GetSpec(Compare1Specs, thresholds[i].Metric);
      normalizedValues[i] = spec.ValidateAndNormalize(thresholds[i], x, null);
    }

    return Execute(Compare1Specs, x, null, thresholds, normalizedValues, seed);
  }

  public static IReadOnlyList<Projection> Compare2(
    Sample x, Sample y, IReadOnlyList<Threshold> thresholds, string? seed)
  {
    Assertion.NonWeighted("x", x);
    Assertion.NonWeighted("y", y);
    Assertion.CompatibleUnits(x, y);
    Assertion.NotNullOrEmpty("thresholds", thresholds);
    Assertion.ItemNotNull("thresholds", thresholds);

    foreach (var threshold in thresholds)
    {
      if (threshold.Metric is Metric.Center or Metric.Spread)
        throw new ArgumentException(
          $"Metric {threshold.Metric} is not supported by Compare2. Use Compare1 instead.",
          nameof(thresholds));
    }

    foreach (var threshold in thresholds)
    {
      if (!threshold.Value.NominalValue.IsFinite())
        throw new ArgumentOutOfRangeException(nameof(thresholds), "threshold.Value must be finite");
    }

    var normalizedValues = new Measurement[thresholds.Count];
    for (int i = 0; i < thresholds.Count; i++)
    {
      var spec = GetSpec(Compare2Specs, thresholds[i].Metric);
      normalizedValues[i] = spec.ValidateAndNormalize(thresholds[i], x, y);
    }

    return Execute(Compare2Specs, x, y, thresholds, normalizedValues, seed);
  }

  private static MetricSpec GetSpec(MetricSpec[] specs, Metric metric)
  {
    foreach (var spec in specs)
      if (spec.Metric == metric) return spec;
    throw new ArgumentException($"No spec found for metric {metric}");
  }

  private static IReadOnlyList<Projection> Execute(
    MetricSpec[] canonicalSpecs,
    Sample x,
    Sample? y,
    IReadOnlyList<Threshold> thresholds,
    Measurement[] normalizedValues,
    string? seed)
  {
    var results = new Projection[thresholds.Count];

    var byMetric = thresholds
      .Select((t, i) => (t, i, normalizedValues[i]))
      .GroupBy(item => item.t.Metric)
      .ToDictionary(g => g.Key, g => g.ToList());

    foreach (var spec in canonicalSpecs)
    {
      if (!byMetric.TryGetValue(spec.Metric, out var entries)) continue;
      var estimate = spec.Estimate(x, y);
      foreach (var (threshold, inputIndex, normalizedValue) in entries)
      {
        var bounds = (seed != null && spec.SeededBounds != null)
          ? spec.SeededBounds(x, y, threshold.Misrate, seed)
          : spec.Bounds(x, y, threshold.Misrate);
        var verdict = ComputeVerdict(bounds, normalizedValue);
        results[inputIndex] = new Projection(threshold, estimate, bounds, verdict);
      }
    }

    return results;
  }

  private static ComparisonVerdict ComputeVerdict(Bounds bounds, Measurement normalizedThreshold)
  {
    double t = normalizedThreshold.NominalValue;
    if (bounds.Lower > t) return ComparisonVerdict.Greater;
    if (bounds.Upper < t) return ComparisonVerdict.Less;
    return ComparisonVerdict.Inconclusive;
  }
}

Notes

Verdict Boundary Condition

When $L = t$ (bounds lower equals threshold), the verdict is $\mathrm{Inconclusive}$ , not $\mathrm{Greater}$ . When $U = t$ (bounds upper equals threshold), the verdict is $\mathrm{Inconclusive}$ , not $\mathrm{Less}$ . The verdict is $\mathrm{Greater}$ only when $L > t$ (strictly).

This conservative choice reflects the discrete nature of confidence bounds: the true value could plausibly equal the boundary.

From Hypothesis Testing to Practical Thresholds

Compare2 extends the Inversion Principle to two-sample comparisons. Instead of testing "Is Shift significantly different from zero?", Compare2 answers "Is Shift reliably greater than my practical threshold?"

A Shift of $5$ ms may be statistically significant (bounds exclude zero) but practically inconclusive (bounds include your threshold of $10$ ms). Traditional hypothesis testing declares this significant and stops; Compare2 declares it $\mathrm{Inconclusive}$ relative to the practical threshold.
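
As a worked instance with hypothetical numbers (the bounds below are illustrative, not computed from data), suppose the estimate is $5$ ms with bounds $[L, U] = [2, 12]$ ms. Applying the verdict rule to the two thresholds gives:

$$
\begin{aligned}
t = 0\ \text{ms}: &\quad L = 2 > 0 &&\Rightarrow \mathrm{Greater} \\
t = 10\ \text{ms}: &\quad L = 2 \not> 10,\ U = 12 \not< 10 &&\Rightarrow \mathrm{Inconclusive}
\end{aligned}
$$

The first line corresponds to the classical "significant" result; the second is the answer to the practical question.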

Tests

The $\operatorname{Compare2}$ test suite contains 26 test cases (5 demo + 4 multi-threshold + 2 order + 5 misrate + 4 natural + 2 property + 4 error). All tests use seed "compare2-tests" for reproducibility. Each test case output is a JSON object with a projections array; each projection has estimate, lower, upper, and verdict fields.

Demo examples ($n = m = 30$, $\mathbf{x} = (1, \ldots, 30)$, $\mathbf{y} = (21, \ldots, 50)$), single threshold with clear verdicts:

demo-shift-less: shift threshold at $0$ with clearly negative shift → $\operatorname{Shift} \approx -20$ , $\mathrm{Less}$
demo-shift-greater: $\mathbf{x}$ and $\mathbf{y}$ swapped → $\operatorname{Shift} \approx 20$ , $\mathrm{Greater}$
demo-shift-inconclusive: $\mathbf{x} = \mathbf{y}$ , threshold at $0$ → $\mathrm{Inconclusive}$
demo-ratio-less: $\mathbf{x} = (1, \ldots, 20)$ , $\mathbf{y} = (21, \ldots, 40)$ , ratio threshold at $1$ → $\mathrm{Less}$
demo-disparity-less: $\mathbf{x} = (1, \ldots, 30)$ , $\mathbf{y} = (21, \ldots, 50)$ , disparity threshold at $0$ → $\mathrm{Less}$

Multi-threshold ( $\mathbf{x} = (1, \ldots, 30)$ , $\mathbf{y} = (21, \ldots, 50)$ ):

multi-shift-ratio: combined shift and ratio thresholds
multi-shift-disparity: combined shift and disparity thresholds
multi-all-three: shift, ratio, and disparity together
multi-two-shifts: two different shift thresholds

Input order preservation (verifies that output order matches input order, not canonical order):

order-disparity-shift: disparity listed before shift → output[0] = disparity, output[1] = shift
order-ratio-shift: ratio listed before shift → output[0] = ratio, output[1] = shift

Misrate variation ( $\mathbf{x} = (1, \ldots, 20)$ , $\mathbf{y} = (11, \ldots, 30)$ , $\operatorname{Shift}$ threshold at $0$ ):

5 tests spanning progressively stricter fixture misrates, from narrower to wider bounds.

Natural sequences (sizes from $\{10, 15\}$, achievable fixture misrates):

natural-10-10, natural-10-15, natural-15-10, natural-15-15

Property validation ( $\mathbf{x} = \mathbf{y} = (1, \ldots, 20)$ ):

property-shift-identity: $\operatorname{Shift}$ threshold at $0$ → bounds include $0$
property-ratio-identity: $\operatorname{Ratio}$ threshold at $1$ → bounds include $1$

Error cases (inputs that violate assumptions):

error-empty-x: $\mathbf{x} = ()$ → $\text{validity}(x)$
error-empty-y: $\mathbf{y} = ()$ → $\text{validity}(y)$
error-constant-x-disparity: $\mathbf{x} = (5, 5, \ldots, 5)$ , $\operatorname{Disparity}$ threshold → $\text{sparity}(x)$
error-constant-y-disparity: $\mathbf{y} = (5, 5, \ldots, 5)$ , $\operatorname{Disparity}$ threshold → $\text{sparity}(y)$