DisparityBounds

$$\operatorname{DisparityBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}) = [L_D, U_D]$$

Let $min_S = \frac{2}{\binom{n + m}{n}}$ and $min_A = 2 \cdot \max(2^{1-\lfloor \frac{n}{2} \rfloor}, 2^{1-\lfloor \frac{m}{2} \rfloor})$. Require $\mathrm{misrate} \geq min_S + min_A$. Let $\text{extra} = \mathrm{misrate} - (min_S + min_A)$, $\alpha_S = min_S + \frac{\text{extra}}{2}$, $\alpha_A = min_A + \frac{\text{extra}}{2}$. Compute $[L_S, U_S] = \operatorname{ShiftBounds}(\mathbf{x}, \mathbf{y}, \alpha_S)$ and $[L_A, U_A] = \operatorname{AvgSpreadBounds}(\mathbf{x}, \mathbf{y}, \alpha_A)$.

If $L_A > 0$, return $[L_D, U_D] = [\min(\frac{L_S}{L_A}, \frac{L_S}{U_A}, \frac{U_S}{L_A}, \frac{U_S}{U_A}), \max(\frac{L_S}{L_A}, \frac{L_S}{U_A}, \frac{U_S}{L_A}, \frac{U_S}{U_A})]$.

If $L_A = 0$ and $U_A > 0$, return the tightest single interval that is always valid:

  • $L_S > 0$: $[\frac{L_S}{U_A}, +\infty)$
  • $U_S < 0$: $(-\infty, \frac{U_S}{U_A}]$
  • $L_S = 0$ and $U_S = 0$: $[0, 0]$
  • $L_S = 0$ and $U_S > 0$: $[0, +\infty)$
  • $L_S < 0$ and $U_S = 0$: $(-\infty, 0]$
  • otherwise: $(-\infty, +\infty)$

If $U_A = 0$, use the sign-only rule: $[0, +\infty)$ if $L_S \geq 0$, $(-\infty, 0]$ if $U_S \leq 0$, $(-\infty, +\infty)$ otherwise (with $[0, 0]$ when $L_S = U_S = 0$).

Robust bounds on $\operatorname{Disparity}(\mathbf{x}, \mathbf{y})$ with specified coverage.

  • Interpretation: $\mathrm{misrate}$ is the probability that the true disparity falls outside the bounds
  • Domain: any real numbers, $n \geq 2$, $m \geq 2$, $\mathrm{misrate} \geq min_S + min_A$
  • Assumptions: sparity(x), sparity(y)
  • Unit: dimensionless (spread units)
  • Note: Bonferroni split between shift and avg-spread bounds; no independence assumption needed; bounds may be unbounded when pooled spread cannot be certified positive

Properties

  • Location invariance: $\operatorname{DisparityBounds}(\mathbf{x} + k, \mathbf{y} + k, \mathrm{misrate}) = \operatorname{DisparityBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate})$
  • Scale invariance: $\operatorname{DisparityBounds}(k \cdot \mathbf{x}, k \cdot \mathbf{y}, \mathrm{misrate}) = \operatorname{sign}(k) \cdot \operatorname{DisparityBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate})$
  • Antisymmetry: $\operatorname{DisparityBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}) = -\operatorname{DisparityBounds}(\mathbf{y}, \mathbf{x}, \mathrm{misrate})$ (bounds reversed)
  • Monotonicity in misrate: smaller $\mathrm{misrate}$ produces wider bounds
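
The phrase "bounds reversed" refers to how an interval behaves under negation: the endpoints are negated and swap roles. A short worked statement, where $[L_D, U_D]$ stands for the bounds obtained for $(\mathbf{x}, \mathbf{y})$ and the numeric interval is made up purely for illustration:

```latex
-[L_D, U_D] = [-U_D, -L_D],
\qquad \text{e.g. } -[-1, 3] = [-3, 1]
```

The same rule applies to scale invariance with $k < 0$, where $\operatorname{sign}(k) = -1$.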

Example

  • DisparityBounds([1..30], [21..50], 0.02) returns bounds containing Disparity

Algorithm

The $\operatorname{DisparityBounds}$ estimator constructs bounds on $\operatorname{Disparity}(\mathbf{x}, \mathbf{y})$ by combining $\operatorname{ShiftBounds}$ and $\operatorname{AvgSpreadBounds}$ through a Bonferroni split.

Misrate allocation

The total $\mathrm{misrate}$ budget is split between the shift and avg-spread components. Let $min_S = \frac{2}{\binom{n + m}{n}}$ (minimum for $\operatorname{ShiftBounds}$) and $min_A = 2 \cdot \max(2^{1-\lfloor n / 2 \rfloor}, 2^{1-\lfloor m / 2 \rfloor})$ (minimum for $\operatorname{AvgSpreadBounds}$). The extra budget beyond the minimums is split equally:

$$\alpha_S = min_S + \frac{\mathrm{misrate} - min_S - min_A}{2}, \quad \alpha_A = min_A + \frac{\mathrm{misrate} - min_S - min_A}{2}$$
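
The allocation can be sketched as a small standalone C# helper. This is a minimal illustration that evaluates the two formulas above directly; the names `MisrateAllocationSketch`, `MinShift`, `MinAvgSpread`, and `Split` are hypothetical and are not part of Pragmastat, whose implementation obtains the minimums from `TwoSample` and `OneSample` instead.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the Pragmastat API.
public static class MisrateAllocationSketch
{
    // min_S = 2 / C(n + m, n), following the formula in the text.
    public static double MinShift(int n, int m)
    {
        // C(n + m, n) = prod_{i=1..n} (m + i) / i, accumulated in floating point
        // so that large sample sizes do not overflow an integer type.
        double binom = 1.0;
        for (int i = 1; i <= n; i++)
            binom *= (double)(m + i) / i;
        return 2.0 / binom;
    }

    // min_A = 2 * max(2^(1 - floor(n/2)), 2^(1 - floor(m/2))).
    public static double MinAvgSpread(int n, int m)
    {
        // Integer division floors for non-negative n and m.
        double minX = Math.Pow(2.0, 1 - n / 2);
        double minY = Math.Pow(2.0, 1 - m / 2);
        return 2.0 * Math.Max(minX, minY);
    }

    // Splits the budget: each component gets its minimum plus half of the extra.
    public static (double AlphaShift, double AlphaAvg) Split(int n, int m, double misrate)
    {
        double minS = MinShift(n, m);
        double minA = MinAvgSpread(n, m);
        if (misrate < minS + minA)
            throw new ArgumentOutOfRangeException(nameof(misrate),
                "misrate is below the minimum achievable value");

        double extra = misrate - (minS + minA);
        return (minS + extra / 2.0, minA + extra / 2.0);
    }
}
```

For $n = m = 10$ and $\mathrm{misrate} = 0.2$, the formulas give $min_A = 0.125$ and $min_S = 2 / 184756$, so `Split(10, 10, 0.2)` yields roughly $\alpha_S \approx 0.0375$ and $\alpha_A \approx 0.1625$, which sum to the requested $0.2$.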

Component bounds

Compute $[L_S, U_S] = \operatorname{ShiftBounds}(\mathbf{x}, \mathbf{y}, \alpha_S)$ and $[L_A, U_A] = \operatorname{AvgSpreadBounds}(\mathbf{x}, \mathbf{y}, \alpha_A)$. By Bonferroni's inequality, the probability that both intervals simultaneously contain their respective true values is at least $1 - \alpha_S - \alpha_A = 1 - \mathrm{misrate}$.
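
The coverage claim follows from the union bound. As a short derivation, write $\theta_S$ for the true shift and $\theta_A$ for the true average spread (symbols introduced here for illustration only):

```latex
\begin{aligned}
\Pr\bigl(\theta_S \notin [L_S, U_S] \ \text{or}\ \theta_A \notin [L_A, U_A]\bigr)
  &\leq \Pr\bigl(\theta_S \notin [L_S, U_S]\bigr)
      + \Pr\bigl(\theta_A \notin [L_A, U_A]\bigr) \\
  &\leq \alpha_S + \alpha_A = \mathrm{misrate}
\end{aligned}
```

The union bound holds for arbitrary events, which is why no independence between the two component intervals is needed.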

Interval division

When $L_A > 0$, the disparity bounds are obtained by dividing the shift interval by the avg-spread interval. Since dividing by a positive interval can flip the ordering depending on the sign of the numerator endpoints, the algorithm computes all four combinations and takes the extremes:

$$[L_D, U_D] = [\min(\frac{L_S}{L_A}, \frac{L_S}{U_A}, \frac{U_S}{L_A}, \frac{U_S}{U_A}), \max(\frac{L_S}{L_A}, \frac{L_S}{U_A}, \frac{U_S}{L_A}, \frac{U_S}{U_A})]$$
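
A small numeric illustration, using made-up component intervals rather than outputs of the actual estimators: take $[L_S, U_S] = [-2, 6]$ and $[L_A, U_A] = [2, 4]$.

```latex
\begin{aligned}
\frac{L_S}{L_A} &= \frac{-2}{2} = -1, \qquad
\frac{L_S}{U_A} = \frac{-2}{4} = -0.5, \qquad
\frac{U_S}{L_A} = \frac{6}{2} = 3, \qquad
\frac{U_S}{U_A} = \frac{6}{4} = 1.5 \\
[L_D, U_D] &= [\min(-1, -0.5, 3, 1.5),\ \max(-1, -0.5, 3, 1.5)] = [-1, 3]
\end{aligned}
```

Here both extremes come from dividing by $L_A$. With a strictly positive shift interval such as $[2, 6]$, the quotients are $1$, $0.5$, $3$, $1.5$, so the lower bound comes from $\frac{L_S}{U_A} = 0.5$ instead, which is why all four combinations are checked.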

Edge cases

When $L_A = 0$ (the avg-spread interval includes zero), the bounds become partially or fully unbounded depending on the sign of $[L_S, U_S]$:

  • $L_S > 0$: $[\frac{L_S}{U_A}, +\infty)$
  • $U_S < 0$: $(-\infty, \frac{U_S}{U_A}]$
  • $L_S = U_S = 0$: $[0, 0]$
  • $L_S = 0$ and $U_S > 0$: $[0, +\infty)$
  • $L_S < 0$ and $U_S = 0$: $(-\infty, 0]$
  • otherwise: $(-\infty, +\infty)$

When $U_A = 0$ (the avg-spread interval collapses to zero), only the sign of the shift determines the result.
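
The one-sided cases can be justified in one line. As a sketch, write $\theta_S$ for the true shift and $\theta_A$ for the true average spread (illustrative symbols), assume $\theta_A > 0$, and suppose both component intervals cover their targets with $L_A = 0 < U_A$ and $L_S > 0$:

```latex
\theta_S \geq L_S > 0, \quad 0 < \theta_A \leq U_A
\;\Longrightarrow\;
\frac{\theta_S}{\theta_A} \geq \frac{L_S}{U_A},
\qquad \text{while } \frac{\theta_S}{\theta_A} \to +\infty \text{ as } \theta_A \to 0^{+}
```

So no finite upper bound can be guaranteed, and $[\frac{L_S}{U_A}, +\infty)$ is the tightest interval that remains valid. The case $U_S < 0$ is the mirror image.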

using Pragmastat.Exceptions;
using Pragmastat.Internal;
using Pragmastat.Metrology;

using static Pragmastat.Functions.MinAchievableMisrate;

namespace Pragmastat.Estimators;

/// <summary>
/// Distribution-free bounds for disparity using Bonferroni combination.
/// </summary>
public class DisparityBoundsEstimator : ITwoSampleBoundsEstimator
{
  public static readonly DisparityBoundsEstimator Instance = new();

  public Bounds Estimate(Sample x, Sample y, Probability misrate)
  {
    return Estimate(x, y, misrate, null);
  }

  public Bounds Estimate(Sample x, Sample y, Probability misrate, string? seed)
  {
    Assertion.MatchedUnit(x, y);
    // Check validity (priority 0)
    Assertion.Validity(x, Subject.X);
    Assertion.Validity(y, Subject.Y);

    if (double.IsNaN(misrate) || misrate < 0 || misrate > 1)
      throw AssumptionException.Domain(Subject.Misrate);

    int n = x.Size;
    int m = y.Size;
    if (n < 2)
      throw AssumptionException.Domain(Subject.X);
    if (m < 2)
      throw AssumptionException.Domain(Subject.Y);

    // Minimum achievable misrates for the shift and avg-spread components
    double minShift = TwoSample(n, m);
    double minX = OneSample(n / 2);
    double minY = OneSample(m / 2);
    double minAvg = 2.0 * Math.Max(minX, minY);

    if (misrate < minShift + minAvg)
      throw AssumptionException.Domain(Subject.Misrate);

    // Bonferroni split: the budget beyond the minimums is shared equally
    double extra = misrate - (minShift + minAvg);
    double alphaShift = minShift + extra / 2.0;
    double alphaAvg = minAvg + extra / 2.0;

    // Check sparity (priority 2)
    Assertion.Sparity(x, Subject.X);
    Assertion.Sparity(y, Subject.Y);

    var shiftBounds = ShiftBoundsEstimator.Instance.Estimate(x, y, alphaShift);
    var avgBounds = seed == null
      ? AvgSpreadBoundsEstimator.Instance.Estimate(x, y, alphaAvg)
      : AvgSpreadBoundsEstimator.Instance.Estimate(x, y, alphaAvg, seed);

    double la = avgBounds.Lower;
    double ua = avgBounds.Upper;
    double ls = shiftBounds.Lower;
    double us = shiftBounds.Upper;

    // Avg-spread certified positive: take the extremes of the four quotients
    if (la > 0.0)
    {
      double r1 = ls / la;
      double r2 = ls / ua;
      double r3 = us / la;
      double r4 = us / ua;
      double lower = Math.Min(Math.Min(r1, r2), Math.Min(r3, r4));
      double upper = Math.Max(Math.Max(r1, r2), Math.Max(r3, r4));
      return new Bounds(lower, upper, DisparityUnit.Instance);
    }

    // Avg-spread interval collapsed to zero: only the sign of the shift is usable
    if (ua <= 0.0)
    {
      if (ls == 0.0 && us == 0.0)
        return new Bounds(0.0, 0.0, DisparityUnit.Instance);
      if (ls >= 0.0)
        return new Bounds(0.0, double.PositiveInfinity, DisparityUnit.Instance);
      if (us <= 0.0)
        return new Bounds(double.NegativeInfinity, 0.0, DisparityUnit.Instance);
      return new Bounds(double.NegativeInfinity, double.PositiveInfinity, DisparityUnit.Instance);
    }

    // Remaining case la == 0 < ua: at most one side of the quotient is bounded
    if (ls > 0.0)
      return new Bounds(ls / ua, double.PositiveInfinity, DisparityUnit.Instance);
    if (us < 0.0)
      return new Bounds(double.NegativeInfinity, us / ua, DisparityUnit.Instance);
    if (ls == 0.0 && us == 0.0)
      return new Bounds(0.0, 0.0, DisparityUnit.Instance);
    if (ls == 0.0 && us > 0.0)
      return new Bounds(0.0, double.PositiveInfinity, DisparityUnit.Instance);
    if (ls < 0.0 && us == 0.0)
      return new Bounds(double.NegativeInfinity, 0.0, DisparityUnit.Instance);

    return new Bounds(double.NegativeInfinity, double.PositiveInfinity, DisparityUnit.Instance);
  }
}

Tests

$$\operatorname{DisparityBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}) = \operatorname{ShiftBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}) / \operatorname{AvgSpreadBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate})$$

The $\operatorname{DisparityBounds}$ test suite contains 39 test cases (3 demo + 5 natural + 6 property + 5 edge + 5 misrate + 2 distro + 6 unsorted + 7 error). Since $\operatorname{DisparityBounds}$ returns bounds rather than a point estimate, tests validate that the bounds contain $\operatorname{Disparity}(\mathbf{x}, \mathbf{y})$ and satisfy equivariance properties. Each test case output is a JSON object with lower and upper fields representing the interval bounds. Because the denominator ($\operatorname{AvgSpreadBounds}$) uses randomized $\operatorname{SpreadBounds}$, tests fix a seed to keep outputs deterministic.

Demo examples ($n = m = 30$, $n = m = 20$) from the manual introduction:

  • demo-1: $\mathbf{x} = (1, \ldots, 30)$, $\mathbf{y} = (21, \ldots, 50)$, $\mathrm{misrate} = 0.02$
  • demo-2: $\mathbf{x} = (1, \ldots, 30)$, $\mathbf{y} = (21, \ldots, 50)$, $\mathrm{misrate} = 0.005$, wider bounds (tighter misrate)
  • demo-3: $\mathbf{x} = (1, \ldots, 20)$, $\mathbf{y} = (5, \ldots, 24)$, $\mathrm{misrate} = 0.05$

These cases illustrate how tighter misrates produce wider bounds.

Natural sequences ($\mathrm{misrate} = 0.2$, 5 tests):

  • natural-10-10: $\mathbf{x} = (1, \ldots, 10)$, $\mathbf{y} = (1, \ldots, 10)$, bounds containing $0$
  • natural-10-15: $\mathbf{x} = (1, \ldots, 10)$, $\mathbf{y} = (1, \ldots, 15)$
  • natural-15-10: $\mathbf{x} = (1, \ldots, 15)$, $\mathbf{y} = (1, \ldots, 10)$
  • natural-15-15: $\mathbf{x} = (1, \ldots, 15)$, $\mathbf{y} = (1, \ldots, 15)$, bounds containing $0$
  • natural-20-20: $\mathbf{x} = (1, \ldots, 20)$, $\mathbf{y} = (1, \ldots, 20)$, bounds containing $0$

Property validation ($n = m = 10$, $\mathrm{misrate} = 0.2$, 6 tests):

  • property-identity: $\mathbf{x} = (0, 2, \ldots, 18)$, $\mathbf{y} = (0, 2, \ldots, 18)$, bounds must contain $0$
  • property-location-shift: $\mathbf{x}$ and $\mathbf{y}$ shifted by constant, same bounds as identity (location invariance)
  • property-scale-2x: $\mathbf{x}$ and $\mathbf{y}$ scaled by 2, same bounds as identity (scale invariance)
  • property-scale-neg: $\mathbf{x}$ and $\mathbf{y}$ negated, bounds preserved ($\text{abs}$ scaling)
  • property-symmetry: $\mathbf{x} = (1, \ldots, 10)$, $\mathbf{y} = (6, \ldots, 15)$, observed bounds
  • property-symmetry-swapped: $\mathbf{x}$ and $\mathbf{y}$ swapped, bounds negated (anti-symmetry)

Edge cases (boundary conditions, 5 tests):

  • edge-small: $n = m = 6$, $\mathrm{misrate} = 0.6$ (small samples)
  • edge-negative: negative values for both samples
  • edge-mixed-signs: mixed positive/negative values
  • edge-wide-range: extreme value range
  • edge-asymmetric-10-20: $n = 10$, $m = 20$ (unbalanced sizes)

Misrate variation ($\mathbf{x} = (1, \ldots, 20)$, $\mathbf{y} = (5, \ldots, 24)$; 5 tests):

  • misrate-2e-1: $\mathrm{misrate} = 0.2$
  • misrate-1e-1: $\mathrm{misrate} = 0.1$
  • misrate-5e-2: $\mathrm{misrate} = 0.05$
  • misrate-2e-2: $\mathrm{misrate} = 0.02$
  • misrate-1e-2: $\mathrm{misrate} = 0.01$

These tests validate monotonicity: smaller misrates produce wider bounds.

Distribution tests ($\mathrm{misrate}$ varies, 2 tests):

  • additive-20-20: $n = m = 20$, $\underline{\operatorname{Additive}}(10, 1)$
  • uniform-20-20: $n = m = 20$, $\underline{\operatorname{Uniform}}(0, 1)$

Unsorted tests verify independent sorting of $\mathbf{x}$ and $\mathbf{y}$ (6 tests):

  • unsorted-reverse-x: X reversed, Y sorted
  • unsorted-reverse-y: X sorted, Y reversed
  • unsorted-reverse-both: both reversed
  • unsorted-shuffle-x: X shuffled, Y sorted
  • unsorted-shuffle-y: X sorted, Y shuffled
  • unsorted-wide-range: wide value range, both unsorted

These tests validate that $\operatorname{DisparityBounds}$ produces identical results regardless of input order.

Error cases (inputs that violate assumptions, 7 tests):

  • error-empty-x: $\mathbf{x} = ()$ (empty X array)
  • error-empty-y: $\mathbf{y} = ()$ (empty Y array)
  • error-single-element-x: $|\mathbf{x}| = 1$ (too few elements for pairing)
  • error-single-element-y: $|\mathbf{y}| = 1$ (too few elements for pairing)
  • error-constant-x: constant $\mathbf{x}$ violates sparity ($\operatorname{Spread} = 0$)
  • error-constant-y: constant $\mathbf{y}$ violates sparity ($\operatorname{Spread} = 0$)
  • error-misrate-below-min: misrate below minimum achievable