DisparityBounds

$$\operatorname{DisparityBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}) = \frac{[L_S, U_S]}{[L_A, U_A]}$$

where $[L_S, U_S] = \operatorname{ShiftBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}_S)$, $[L_A, U_A] = \operatorname{AvgSpreadBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}_A)$, and $\mathrm{misrate}_S + \mathrm{misrate}_A = \mathrm{misrate}$ (Bonferroni split).

Robust bounds on $\operatorname{Disparity}(\mathbf{x}, \mathbf{y})$ with specified coverage.

Input

  • $\mathbf{x} = (x_1, x_2, \ldots, x_n)$: first sample of measurements, where $n \geq 2$; requires sparity(x)
  • $\mathbf{y} = (y_1, y_2, \ldots, y_m)$: second sample of measurements, where $m \geq 2$; requires sparity(y)
  • $\mathrm{misrate}$: probability that the true disparity falls outside the bounds in the long run (the minimum depends on $n$ and $m$; see Algorithm)

Output

  • Value: interval $[L, U]$ bounding $\operatorname{Disparity}(\mathbf{x}, \mathbf{y})$
  • Unit: dimensionless (spread units)

Notes

  • Note: Bonferroni split between the shift and avg-spread bounds; no independence assumption is needed; the bounds may be unbounded when the pooled spread cannot be certified positive

Properties

  • Location invariance: $\operatorname{DisparityBounds}(\mathbf{x} + k, \mathbf{y} + k, \mathrm{misrate}) = \operatorname{DisparityBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate})$
  • Scale invariance: $\operatorname{DisparityBounds}(k \cdot \mathbf{x}, k \cdot \mathbf{y}, \mathrm{misrate}) = \operatorname{sign}(k) \cdot \operatorname{DisparityBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate})$ for $k \neq 0$ (for $k < 0$ the endpoints swap)
  • Antisymmetry: $\operatorname{DisparityBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}) = -\operatorname{DisparityBounds}(\mathbf{y}, \mathbf{x}, \mathrm{misrate})$ (endpoints swap: $[L, U]$ becomes $[-U, -L]$)
  • Monotonicity in misrate: a smaller $\mathrm{misrate}$ produces wider bounds
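The $\operatorname{sign}(k)$ factor in the scale property can be traced through the construction. The derivation below is a sketch. It assumes the component bounds are scale-equivariant at a fixed misrate (and, for the randomized spread bounds, a fixed seed): $\operatorname{ShiftBounds}(k\mathbf{x}, k\mathbf{y}) = k \cdot \operatorname{ShiftBounds}(\mathbf{x}, \mathbf{y})$ and $\operatorname{AvgSpreadBounds}(k\mathbf{x}, k\mathbf{y}) = |k| \cdot \operatorname{AvgSpreadBounds}(\mathbf{x}, \mathbf{y})$. The misrate split depends only on $n$, $m$, and $\mathrm{misrate}$, so scaling leaves it unchanged. Writing $[L, U] = \operatorname{DisparityBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate})$:

$$
s \in [L_S, U_S],\ a \in [L_A, U_A]
\;\Longrightarrow\;
\frac{k\,s}{|k|\,a} = \operatorname{sign}(k)\,\frac{s}{a}
\;\Longrightarrow\;
\operatorname{DisparityBounds}(k\mathbf{x}, k\mathbf{y}, \mathrm{misrate}) =
\begin{cases}
[L, U], & k > 0,\\
[-U, -L], & k < 0.
\end{cases}
$$

For $k < 0$, multiplying every attainable ratio by $-1$ swaps the smallest and the largest, which is why the endpoints trade places.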

Example

  • $\operatorname{DisparityBounds}((1, \ldots, 30), (21, \ldots, 50), 10^{-3})$ returns bounds that contain $\operatorname{Disparity}(\mathbf{x}, \mathbf{y})$

See also: $\operatorname{Compare2}$, which compares $\operatorname{Disparity}$ against practical thresholds and generates a verdict automatically.

Algorithm

The $\operatorname{DisparityBounds}$ estimator constructs bounds on $\operatorname{Disparity}(\mathbf{x}, \mathbf{y})$ by combining $\operatorname{ShiftBounds}$ and $\operatorname{AvgSpreadBounds}$ through a Bonferroni split.

Misrate allocation

The total $\mathrm{misrate}$ budget is split between the shift and avg-spread components. Let $min_S = 2 / \binom{n + m}{n}$ (the minimum for $\operatorname{ShiftBounds}$) and $min_A = 2 \cdot \max(2^{1-\lfloor n/2 \rfloor}, 2^{1-\lfloor m/2 \rfloor})$ (the minimum for $\operatorname{AvgSpreadBounds}$). A $\mathrm{misrate}$ below $min_S + min_A$ is rejected. The budget beyond the two minimums is split equally:

$$\mathrm{misrate}_S = min_S + \frac{\mathrm{misrate} - min_S - min_A}{2}, \quad \mathrm{misrate}_A = min_A + \frac{\mathrm{misrate} - min_S - min_A}{2}$$
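The following is a minimal C# sketch of this allocation. It assumes the two minimum formulas above. `Binomial` and `SplitMisrate` are hypothetical helper names used only for illustration; the estimator's source uses its own `TwoSample` and `OneSample` helpers. Where the library throws its `AssumptionException`, the sketch throws `ArgumentOutOfRangeException`. The code uses C# 9 top-level statements.

```csharp
using System;

// Binomial coefficient C(total, k), built multiplicatively in double precision.
// For very large samples it overflows to +infinity, which drives min_S to 0.
static double Binomial(int total, int k)
{
    double c = 1.0;
    for (int i = 1; i <= k; i++)
        c = c * (total - k + i) / i;
    return c;
}

// Split the total misrate into shift and avg-spread budgets:
// each component gets its minimum plus half of the remaining budget.
static (double MisrateS, double MisrateA) SplitMisrate(int n, int m, double misrate)
{
    double minS = 2.0 / Binomial(n + m, n);
    // Integer division n / 2 equals floor(n / 2) for non-negative n.
    double minA = 2.0 * Math.Max(Math.Pow(2, 1 - n / 2), Math.Pow(2, 1 - m / 2));
    if (misrate < minS + minA)
        throw new ArgumentOutOfRangeException(nameof(misrate), "misrate is below the minimum achievable value");
    double extra = misrate - (minS + minA);
    return (minS + extra / 2.0, minA + extra / 2.0);
}

// The Example's setting: n = m = 30, misrate = 10^-3.
var (misrateS, misrateA) = SplitMisrate(30, 30, 1e-3);
Console.WriteLine($"misrate_S = {misrateS:E3}, misrate_A = {misrateA:E3}, sum = {misrateS + misrateA:E3}");
```

Giving each component its minimum first means both sub-calls are feasible whenever the total is. Splitting the remainder equally is the rule stated above; other splits would also satisfy Bonferroni's inequality.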

Component bounds

Compute $[L_S, U_S] = \operatorname{ShiftBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}_S)$ and $[L_A, U_A] = \operatorname{AvgSpreadBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}_A)$. By Bonferroni's inequality, the probability that both intervals simultaneously contain their respective true values is at least $1 - \mathrm{misrate}_S - \mathrm{misrate}_A = 1 - \mathrm{misrate}$. Whenever both do, the quotient of the two intervals contains the true disparity.
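For the inputs of the Example above ($n = m = 30$, $\mathrm{misrate} = 10^{-3}$), the allocation works out as follows (values rounded). The shift minimum is negligible, so each component receives essentially half of the remaining budget on top of its own minimum:

$$
\begin{aligned}
min_S &= 2 \big/ \tbinom{60}{30} \approx 1.7 \times 10^{-17}, \qquad min_A = 2 \cdot 2^{1-15} = 2^{-13} \approx 1.221 \times 10^{-4},\\
\mathrm{misrate}_S &= min_S + \frac{10^{-3} - min_S - min_A}{2} \approx 4.390 \times 10^{-4},\\
\mathrm{misrate}_A &= min_A + \frac{10^{-3} - min_S - min_A}{2} \approx 5.610 \times 10^{-4},\\
\mathrm{misrate}_S + \mathrm{misrate}_A &= 10^{-3}.
\end{aligned}
$$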

Interval division

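When $L_A > 0$, the disparity bounds are obtained by dividing the shift interval by the avg-spread interval. Because the denominator is strictly positive, each ratio takes the sign of its numerator. Which denominator endpoint gives the extreme ratio depends on that sign: a positive numerator is largest when divided by $L_A$, and a negative one is largest when divided by $U_A$. The algorithm therefore computes all four endpoint ratios and takes the extremes: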

$$[L_D, U_D] = \left[\min\left(\frac{L_S}{L_A}, \frac{L_S}{U_A}, \frac{U_S}{L_A}, \frac{U_S}{U_A}\right), \max\left(\frac{L_S}{L_A}, \frac{L_S}{U_A}, \frac{U_S}{L_A}, \frac{U_S}{U_A}\right)\right]$$
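The four-ratio rule can be sketched in a few lines of C#. `DivideByPositiveInterval` is a hypothetical name used only here. It covers just the $L_A > 0$ case and rejects anything else.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not a Pragmastat API): divides the shift interval
// [ls, us] by a strictly positive avg-spread interval [la, ua].
static (double Lower, double Upper) DivideByPositiveInterval(double ls, double us, double la, double ua)
{
    if (!(la > 0.0) || ua < la)
        throw new ArgumentException("denominator interval must satisfy 0 < la <= ua");
    // The four endpoint ratios; their extremes bound every s / a
    // with s in [ls, us] and a in [la, ua].
    double[] ratios = { ls / la, ls / ua, us / la, us / ua };
    return (ratios.Min(), ratios.Max());
}

// Shift interval straddling zero: [-2, 4] / [1, 2]
Console.WriteLine(DivideByPositiveInterval(-2, 4, 1, 2)); // prints (-2, 4)
// Strictly positive shift interval: [2, 4] / [1, 2]
Console.WriteLine(DivideByPositiveInterval(2, 4, 1, 2));  // prints (1, 4)
```

Endpoint ratios are enough because $s/a$ is monotone in $s$ for fixed $a > 0$ and monotone in $a$ for fixed $s$. The extremes over the rectangle $[L_S, U_S] \times [L_A, U_A]$ therefore occur at its corners. In the first call, both extremes use $L_A$: the smallest spread amplifies the most negative and the most positive shift alike.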

Edge cases

When $L_A \le 0 < U_A$ (the avg-spread interval includes zero), the bounds become partially or fully unbounded depending on the sign of $[L_S, U_S]$:

  • $L_S > 0$: $[L_S / U_A, +\infty)$
  • $U_S < 0$: $(-\infty, U_S / U_A]$
  • $L_S = U_S = 0$: $[0, 0]$
  • $L_S = 0 < U_S$: $[0, +\infty)$
  • $L_S < 0 = U_S$: $(-\infty, 0]$
  • otherwise ($L_S < 0 < U_S$): $(-\infty, +\infty)$
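The half-bounded cases follow from the same corner reasoning once the denominator may approach zero. For $L_S > 0$, treating the avg-spread as any value in $(0, U_A]$:

$$
s \ge L_S > 0,\quad 0 < a \le U_A
\;\Longrightarrow\;
\frac{s}{a} \ge \frac{L_S}{U_A},
\qquad
\lim_{a \to 0^{+}} \frac{L_S}{a} = +\infty .
$$

The $U_S < 0$ case is the mirror image and yields $(-\infty, U_S / U_A]$. When $L_S < 0 < U_S$, both ends escape to infinity.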

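When $U_A \le 0$ (the avg-spread interval collapses to zero), only the sign of the shift determines the result. The bounds are $[0, 0]$ if $L_S = U_S = 0$, $[0, +\infty)$ if $L_S \ge 0$, $(-\infty, 0]$ if $U_S \le 0$, and $(-\infty, +\infty)$ otherwise.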

using Pragmastat.Algorithms;
using Pragmastat.Exceptions;
using Pragmastat.Internal;
using Pragmastat.Metrology;

using static Pragmastat.Functions.MinAchievableMisrate;

namespace Pragmastat.Estimators;

/// <summary>
/// Distribution-free bounds for disparity using Bonferroni combination.
/// </summary>
public class DisparityBoundsEstimator : ITwoSampleBoundsEstimator
{
  public static readonly DisparityBoundsEstimator Instance = new();

  public Bounds Estimate(Sample x, Sample y, Probability misrate)
  {
    return Estimate(x, y, misrate, null);
  }

  public Bounds Estimate(Sample x, Sample y, Probability misrate, string? seed)
  {
    Assertion.NonWeighted("x", x);
    Assertion.NonWeighted("y", y);
    Assertion.CompatibleUnits(x, y);
    (x, y) = Assertion.ConvertToFiner(x, y);

    if (double.IsNaN(misrate) || misrate < 0 || misrate > 1)
      throw AssumptionException.Domain(Subject.Misrate);

    int n = x.Size;
    int m = y.Size;
    if (n < 2)
      throw AssumptionException.Domain(Subject.X);
    if (m < 2)
      throw AssumptionException.Domain(Subject.Y);

    // Minimum achievable misrates of the two components (see Misrate allocation)
    double minShift = TwoSample(n, m);
    double minX = OneSample(n / 2);
    double minY = OneSample(m / 2);
    double minAvg = 2.0 * Math.Max(minX, minY);

    if (misrate < minShift + minAvg)
      throw AssumptionException.Domain(Subject.Misrate);

    // Split the budget beyond the minimums equally; alphaShift + alphaAvg == misrate
    double extra = misrate - (minShift + minAvg);
    double alphaShift = minShift + extra / 2.0;
    double alphaAvg = minAvg + extra / 2.0;

    if (FastSpread.Estimate(x.SortedValues, isSorted: true) <= 0)
      throw AssumptionException.Sparity(Subject.X);
    if (FastSpread.Estimate(y.SortedValues, isSorted: true) <= 0)
      throw AssumptionException.Sparity(Subject.Y);

    var shiftBounds = ShiftBoundsEstimator.Instance.Estimate(x, y, alphaShift);
    var avgBounds = seed == null
      ? AvgSpreadBoundsEstimator.Instance.Estimate(x, y, alphaAvg)
      : AvgSpreadBoundsEstimator.Instance.Estimate(x, y, alphaAvg, seed);

    double la = avgBounds.Lower;
    double ua = avgBounds.Upper;
    double ls = shiftBounds.Lower;
    double us = shiftBounds.Upper;

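    // L_A > 0: divide by a strictly positive interval and take the extremes of the four endpoint ratios
    if (la > 0.0)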
    {
      double r1 = ls / la;
      double r2 = ls / ua;
      double r3 = us / la;
      double r4 = us / ua;
      double lower = Math.Min(Math.Min(r1, r2), Math.Min(r3, r4));
      double upper = Math.Max(Math.Max(r1, r2), Math.Max(r3, r4));
      return new Bounds(lower, upper, MeasurementUnit.Disparity);
    }

    // U_A <= 0: the avg-spread interval is degenerate at zero, so only the sign of the shift matters
    if (ua <= 0.0)
    {
      if (ls == 0.0 && us == 0.0)
        return new Bounds(0.0, 0.0, MeasurementUnit.Disparity);
      if (ls >= 0.0)
        return new Bounds(0.0, double.PositiveInfinity, MeasurementUnit.Disparity);
      if (us <= 0.0)
        return new Bounds(double.NegativeInfinity, 0.0, MeasurementUnit.Disparity);
      return new Bounds(double.NegativeInfinity, double.PositiveInfinity, MeasurementUnit.Disparity);
    }

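    // Remaining case L_A <= 0 < U_A: the avg-spread interval touches zero, so the result is
    // unbounded on at least one side unless the shift interval is exactly [0, 0]
    if (ls > 0.0)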
      return new Bounds(ls / ua, double.PositiveInfinity, MeasurementUnit.Disparity);
    if (us < 0.0)
      return new Bounds(double.NegativeInfinity, us / ua, MeasurementUnit.Disparity);
    if (ls == 0.0 && us == 0.0)
      return new Bounds(0.0, 0.0, MeasurementUnit.Disparity);
    if (ls == 0.0 && us > 0.0)
      return new Bounds(0.0, double.PositiveInfinity, MeasurementUnit.Disparity);
    if (ls < 0.0 && us == 0.0)
      return new Bounds(double.NegativeInfinity, 0.0, MeasurementUnit.Disparity);

    return new Bounds(double.NegativeInfinity, double.PositiveInfinity, MeasurementUnit.Disparity);
  }
}

Notes

Width Convergence

The table below shows how $\text{Width} = U - L$ narrows as $N$ grows, for $\mathbf{x} = \mathbf{y} = (1, 1 + 1/(N-1), \ldots, 2)$ ($N$ evenly spaced points on $[1, 2]$) and $\mathrm{misrate} = 10^{-3}$. Dashes indicate that $N$ is too small to achieve the target misrate.

N       Width
2       –
3       –
4       –
5       –
10      –
20      –
30      18.0000
40      5.0000
50      3.1429
100     2.3077
200     1.2000
300     0.9123
400     0.5918
500     0.5893
1000    0.3817
10000   0.1050
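The dashes follow from the minimum achievable misrate in the Algorithm section. The sketch below assumes those formulas for $min_S$ and $min_A$ and uses the hypothetical helper name `MinMisrate`. It checks which table sizes can reach $10^{-3}$ when $n = m = N$. Under these formulas the first feasible size is $N = 24$, which falls between the table's 20 and 30 rows.

```csharp
using System;

// Smallest misrate the estimator accepts, min_S + min_A, using the formulas
// from the Misrate allocation section (assumed here, not called from Pragmastat).
static double MinMisrate(int n, int m)
{
    double binom = 1.0; // C(n + m, n), built as the product over i = 1..n of (m + i) / i
    for (int i = 1; i <= n; i++)
        binom = binom * (m + i) / i;
    double minS = 2.0 / binom;
    // n / 2 is integer division, i.e. floor(n / 2) for non-negative n
    double minA = 2.0 * Math.Max(Math.Pow(2, 1 - n / 2), Math.Pow(2, 1 - m / 2));
    return minS + minA;
}

const double target = 1e-3;
foreach (int N in new[] { 2, 3, 4, 5, 10, 20, 30 })
{
    // The estimator rejects misrate < min_S + min_A, so "feasible" means min <= target.
    string status = MinMisrate(N, N) <= target ? "feasible" : "too small";
    Console.WriteLine($"N = {N,2}: min misrate = {MinMisrate(N, N):E3} ({status})");
}
```

The avg-spread term dominates: it halves only with every two extra observations per sample. The shift term shrinks combinatorially and is negligible at these sizes.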

Tests

$$\operatorname{DisparityBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}) = \operatorname{ShiftBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}_S) / \operatorname{AvgSpreadBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}_A)$$

The $\operatorname{DisparityBounds}$ test suite contains 39 test cases (3 demo + 5 natural + 6 property + 5 edge + 5 misrate + 2 distro + 6 unsorted + 7 error). Since $\operatorname{DisparityBounds}$ returns bounds rather than a point estimate, tests validate that the bounds contain $\operatorname{Disparity}(\mathbf{x}, \mathbf{y})$ and satisfy the equivariance properties. Each test case output is a JSON object with lower and upper fields representing the interval bounds. Because the denominator ($\operatorname{AvgSpreadBounds}$) uses a randomized $\operatorname{SpreadBounds}$, tests fix a seed to keep outputs deterministic.

Demo examples ($n = m = 30$, $n = m = 20$) from the manual introduction:

  • demo-1: $\mathbf{x} = (1, \ldots, 30)$, $\mathbf{y} = (21, \ldots, 50)$, baseline fixture misrate
  • demo-2: $\mathbf{x} = (1, \ldots, 30)$, $\mathbf{y} = (21, \ldots, 50)$, stricter fixture misrate, wider bounds
  • demo-3: $\mathbf{x} = (1, \ldots, 20)$, $\mathbf{y} = (5, \ldots, 24)$, looser fixture misrate

Together, demo-1 and demo-2 illustrate how a stricter (smaller) misrate produces wider bounds.

Natural sequences with reference fixture misrates (5 tests):

  • natural-10-10: $\mathbf{x} = (1, \ldots, 10)$, $\mathbf{y} = (1, \ldots, 10)$, bounds containing $0$
  • natural-10-15: $\mathbf{x} = (1, \ldots, 10)$, $\mathbf{y} = (1, \ldots, 15)$
  • natural-15-10: $\mathbf{x} = (1, \ldots, 15)$, $\mathbf{y} = (1, \ldots, 10)$
  • natural-15-15: $\mathbf{x} = (1, \ldots, 15)$, $\mathbf{y} = (1, \ldots, 15)$, bounds containing $0$
  • natural-20-20: $\mathbf{x} = (1, \ldots, 20)$, $\mathbf{y} = (1, \ldots, 20)$, bounds containing $0$

Property validation ($n = m = 10$, 6 tests):

  • property-identity: $\mathbf{x} = (0, 2, \ldots, 18)$, $\mathbf{y} = (0, 2, \ldots, 18)$, expected output: $[-1.5, 1.5]$
  • property-location-shift: $\mathbf{x} = (10, 12, \ldots, 28)$, $\mathbf{y} = (12, 14, \ldots, 30)$, expected output: $[-2, 1]$
  • property-scale-2x: $\mathbf{x} = (0, 4, \ldots, 36)$, $\mathbf{y} = (4, 8, \ldots, 40)$ (2× the location-shift samples, up to a common shift), expected output: $[-2, 1]$ (scale invariance of disparity)
  • property-scale-neg: $\mathbf{x} = (0, -2, \ldots, -18)$, $\mathbf{y} = (-2, -4, \ldots, -20)$ (negated), expected output: $[-1, 2]$ (sign flip: negating both samples maps $[L, U]$ to $[-U, -L]$)
  • property-symmetry: $\mathbf{x} = (1, \ldots, 10)$, $\mathbf{y} = (6, \ldots, 15)$, observed bounds
  • property-symmetry-swapped: $\mathbf{x}$ and $\mathbf{y}$ swapped, bounds negated (anti-symmetry)
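A single map ties the expected outputs of the property cases together. Negating both samples, or swapping them by anti-symmetry, sends $[L, U]$ to $[-U, -L]$. The property-scale-neg samples are the location-shift samples negated up to a common shift, so their expected output should be the image of property-location-shift's. The property-identity case ($\mathbf{x} = \mathbf{y}$) should map to itself. The helper `Negate` is hypothetical and only restates that map; the tuples are the expected outputs quoted in the list above.

```csharp
using System;

// Hypothetical helper: the bounds map for negating both samples
// (scale property with k = -1) or swapping them (anti-symmetry).
static (double Lower, double Upper) Negate((double Lower, double Upper) b) => (-b.Upper, -b.Lower);

// Expected outputs quoted from the property cases above.
var locationShift = (Lower: -2.0, Upper: 1.0); // property-location-shift
var scaleNeg = (Lower: -1.0, Upper: 2.0);      // property-scale-neg
var identity = (Lower: -1.5, Upper: 1.5);      // property-identity (x = y)

// Negated samples (up to a common shift) should give the negated, swapped bounds.
Console.WriteLine(Negate(locationShift) == scaleNeg); // True
// With x = y, swapping changes nothing, so the bounds must be symmetric about 0.
Console.WriteLine(Negate(identity) == identity);      // True
```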

Edge cases: boundary conditions (5 tests):

  • edge-small: $n = m = 6$ (small samples)
  • edge-negative: negative values for both samples
  • edge-mixed-signs: mixed positive/negative values
  • edge-wide-range: extreme value range
  • edge-asymmetric-10-20: $n = 10$, $m = 20$ (unbalanced sizes)

Misrate variation ($\mathbf{x} = (1, \ldots, 20)$, $\mathbf{y} = (5, \ldots, 24)$): 5 tests spanning progressively stricter fixture misrates.

These tests validate monotonicity: smaller misrates produce wider bounds.
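Here is one way to see why these tests should pass. Under the allocation formula in the Algorithm section, both component misrates grow with the total at the same rate. The argument also assumes that each component's bounds are nested in its misrate: a smaller misrate yields an interval containing the one from a larger misrate. Let $S_1, A_1$ be the shift and avg-spread intervals at $\mathrm{misrate}_1$ and $S_2, A_2$ those at $\mathrm{misrate}_2 < \mathrm{misrate}_1$. The quotient then inherits the nesting:

$$
\frac{\partial\,\mathrm{misrate}_S}{\partial\,\mathrm{misrate}} = \frac{\partial\,\mathrm{misrate}_A}{\partial\,\mathrm{misrate}} = \frac{1}{2},
\qquad
S_1 \subseteq S_2,\ A_1 \subseteq A_2
\;\Longrightarrow\;
\left\{ \tfrac{s}{a} : s \in S_1,\ a \in A_1 \right\} \subseteq \left\{ \tfrac{s}{a} : s \in S_2,\ a \in A_2 \right\}
$$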

Distribution tests ($\mathrm{misrate}$ varies, 2 tests):

  • additive-20-20: $n = m = 20$, $\operatorname{Additive}(10, 1)$
  • uniform-20-20: $n = m = 20$, $\operatorname{Uniform}(0, 1)$

Unsorted tests verify independent sorting of $\mathbf{x}$ and $\mathbf{y}$ (6 tests):

  • unsorted-reverse-x: X reversed, Y sorted
  • unsorted-reverse-y: X sorted, Y reversed
  • unsorted-reverse-both: both reversed
  • unsorted-shuffle-x: X shuffled, Y sorted
  • unsorted-shuffle-y: X sorted, Y shuffled
  • unsorted-wide-range: wide value range, both unsorted

These tests validate that $\operatorname{DisparityBounds}$ produces identical results regardless of input order.

Error cases: inputs that violate assumptions (7 tests):

  • error-empty-x: $\mathbf{x} = ()$ (empty X array)
  • error-empty-y: $\mathbf{y} = ()$ (empty Y array)
  • error-single-element-x: $|\mathbf{x}| = 1$ (too few elements for pairing)
  • error-single-element-y: $|\mathbf{y}| = 1$ (too few elements for pairing)
  • error-constant-x: constant $\mathbf{x}$ violates sparity ($\operatorname{Spread} = 0$)
  • error-constant-y: constant $\mathbf{y}$ violates sparity ($\operatorname{Spread} = 0$)
  • error-misrate-below-min: misrate below the minimum achievable value