DisparityBounds

\operatorname{DisparityBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}) = [L_D, U_D]

Let $min_S = \frac{2}{\binom{n + m}{n}}$ and $min_A = 2 \cdot \max(2^{1-\lfloor \frac{n}{2} \rfloor}, 2^{1-\lfloor \frac{m}{2} \rfloor})$ . Require $\mathrm{misrate} \geq min_S + min_A$ . Let $\text{extra} = \mathrm{misrate} - (min_S + min_A)$ , $\alpha_S = min_S + \frac{\text{extra}}{2}$ , $\alpha_A = min_A + \frac{\text{extra}}{2}$ . Compute $[L_S, U_S] = \operatorname{ShiftBounds}(\mathbf{x}, \mathbf{y}, \alpha_S)$ and $[L_A, U_A] = \operatorname{AvgSpreadBounds}(\mathbf{x}, \mathbf{y}, \alpha_A)$ .

If $L_A > 0$ , return $[L_D, U_D] = [\min(\frac{L_S}{L_A}, \frac{L_S}{U_A}, \frac{U_S}{L_A}, \frac{U_S}{U_A}), \max(\frac{L_S}{L_A}, \frac{L_S}{U_A}, \frac{U_S}{L_A}, \frac{U_S}{U_A})]$ .

If $L_A = 0$ , return the tightest single interval that is always valid:

$L_S > 0$ : $[\frac{L_S}{U_A}, +\infty)$
$U_S < 0$ : $(-\infty, \frac{U_S}{U_A}]$
$L_S = 0$ and $U_S = 0$ : $[0, 0]$
$L_S = 0$ and $U_S > 0$ : $[0, +\infty)$
$L_S < 0$ and $U_S = 0$ : $(-\infty, 0]$
otherwise: $(-\infty, +\infty)$

If $U_A = 0$ , use the sign-only rule: $[0, +\infty)$ if $L_S \geq 0$ , $(-\infty, 0]$ if $U_S \leq 0$ , $(-\infty, +\infty)$ otherwise (with $[0, 0]$ when $L_S = U_S = 0$ ).

Robust bounds on $\operatorname{Disparity}(\mathbf{x}, \mathbf{y})$ with specified coverage.

Interpretation $\mathrm{misrate}$ is probability that true disparity falls outside bounds
Domain any real numbers, $n \geq 2$ , $m \geq 2$ , $\mathrm{misrate} \geq min_S + min_A$
Assumptions sparity(x), sparity(y)
Unit dimensionless (spread units)
Note Bonferroni split between shift and avg-spread bounds; no independence assumption needed; bounds may be unbounded when pooled spread cannot be certified positive

Properties

Location invariance $\operatorname{DisparityBounds}(\mathbf{x} + k, \mathbf{y} + k, \mathrm{misrate}) = \operatorname{DisparityBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate})$
Scale invariance $\operatorname{DisparityBounds}(k \cdot \mathbf{x}, k \cdot \mathbf{y}, \mathrm{misrate}) = \operatorname{sign}(k) \cdot \operatorname{DisparityBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate})$
Antisymmetry $\operatorname{DisparityBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}) = -\operatorname{DisparityBounds}(\mathbf{y}, \mathbf{x}, \mathrm{misrate})$ (bounds reversed)
Monotonicity in misrate smaller $\mathrm{misrate}$ produces wider bounds
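The invariance and antisymmetry properties can be illustrated at the level of the interval-division step (the case $L_A > 0$ ). The Python sketch below uses hypothetical component intervals, not output of the real estimators. It assumes that swapping the samples negates and reverses the shift interval while leaving the avg-spread interval unchanged, and that scaling both samples by $k > 0$ scales every endpoint by $k$; neither assumption is verified here, and the helper name `divide` is illustrative.

```python
def divide(ls, us, la, ua):
    """Divide [ls, us] by a strictly positive interval [la, ua]."""
    quotients = (ls / la, ls / ua, us / la, us / ua)
    return min(quotients), max(quotients)


# Hypothetical component intervals, chosen only for illustration.
ls, us, la, ua = 1.0, 3.0, 2.0, 4.0

forward = divide(ls, us, la, ua)
# Assumed effect of swapping x and y: shift interval negated and reversed.
swapped = divide(-us, -ls, la, ua)
# Assumed effect of scaling both samples by k = 2: all endpoints doubled.
scaled = divide(2 * ls, 2 * us, 2 * la, 2 * ua)

print(forward, swapped, scaled)
```

Under these assumptions the swapped result is the forward result negated with its endpoints exchanged, and the scaled result equals the forward result.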

Example

DisparityBounds([1..30], [21..50], 0.02) returns bounds containing Disparity
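For this example ( $n = m = 30$ , $\mathrm{misrate} = 0.02$ ), the misrate allocation can be worked out by hand from the definitions above. This is a worked sketch with rounded values, not output of the implementation:

```latex
\begin{aligned}
min_S &= \frac{2}{\binom{60}{30}} \approx 1.7 \times 10^{-17}, \\
min_A &= 2 \cdot 2^{1 - \lfloor \frac{30}{2} \rfloor} = 2^{-13} \approx 1.22 \times 10^{-4}, \\
\text{extra} &= 0.02 - (min_S + min_A) \approx 0.019878, \\
\alpha_S &= min_S + \frac{\text{extra}}{2} \approx 0.009939, \qquad
\alpha_A = min_A + \frac{\text{extra}}{2} \approx 0.010061.
\end{aligned}
```

The two levels sum to $0.02$ , and almost all of the minimum budget is consumed by the avg-spread component.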

Algorithm

The $\operatorname{DisparityBounds}$ estimator constructs bounds on $\operatorname{Disparity}(\mathbf{x}, \mathbf{y})$ by combining $\operatorname{ShiftBounds}$ and $\operatorname{AvgSpreadBounds}$ through a Bonferroni split.

Misrate allocation

The total $\mathrm{misrate}$ budget is split between the shift and avg-spread components. Let $min_S = \frac{2}{\binom{n + m}{n}}$ (minimum for $\operatorname{ShiftBounds}$ ) and $min_A = 2 \cdot \max(2^{1-\lfloor \frac{n}{2} \rfloor}, 2^{1-\lfloor \frac{m}{2} \rfloor})$ (minimum for $\operatorname{AvgSpreadBounds}$ ). The extra budget beyond the minimums is split equally:

\alpha_S = min_S + \frac{\mathrm{misrate} - min_S - min_A}{2}, \quad \alpha_A = min_A + \frac{\mathrm{misrate} - min_S - min_A}{2}
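The allocation can be sketched in a few lines of Python. This is a minimal illustration of the formulas above, independent of the reference implementation; the function name `allocate_misrate` and the use of `ValueError` are assumptions made for the example.

```python
from math import comb


def allocate_misrate(n: int, m: int, misrate: float) -> tuple[float, float]:
    """Split misrate into (alpha_shift, alpha_avg) following the formulas above."""
    if n < 2 or m < 2:
        raise ValueError("both samples need at least 2 elements")
    # Minimum achievable misrate of each component.
    min_shift = 2.0 / comb(n + m, n)
    min_avg = 2.0 * max(2.0 ** (1 - n // 2), 2.0 ** (1 - m // 2))
    if misrate < min_shift + min_avg:
        raise ValueError("misrate is below the minimum achievable value")
    # The budget left after covering both minimums is shared equally.
    extra = misrate - (min_shift + min_avg)
    return min_shift + extra / 2.0, min_avg + extra / 2.0


alpha_shift, alpha_avg = allocate_misrate(20, 20, 0.05)
print(alpha_shift, alpha_avg)
```

For $n = m = 20$ the avg-spread minimum is $2^{-8}$ , so $\alpha_A$ ends up slightly larger than $\alpha_S$ while the two still sum to the requested misrate.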

Component bounds

Compute $[L_S, U_S] = \operatorname{ShiftBounds}(\mathbf{x}, \mathbf{y}, \alpha_S)$ and $[L_A, U_A] = \operatorname{AvgSpreadBounds}(\mathbf{x}, \mathbf{y}, \alpha_A)$ . By Bonferroni's inequality, the probability that both intervals simultaneously contain their respective true values is at least $1 - \alpha_S - \alpha_A = 1 - \mathrm{misrate}$ .
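The coverage claim can be written out as a short derivation. The event names $E_S$ (the shift interval contains the true shift) and $E_A$ (the avg-spread interval contains the true average spread) are notation introduced only for this sketch:

```latex
\begin{aligned}
P(E_S \cap E_A) &= 1 - P(E_S^c \cup E_A^c) \\
                &\geq 1 - P(E_S^c) - P(E_A^c) \\
                &\geq 1 - \alpha_S - \alpha_A \\
                &= 1 - (min_S + min_A + \text{extra}) = 1 - \mathrm{misrate}.
\end{aligned}
```

The second line is the union bound, which holds for any two events, so no independence between the two component intervals is required.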

Interval division

When $L_A > 0$ , the disparity bounds are obtained by dividing the shift interval by the avg-spread interval. Since dividing by a positive interval can flip the ordering depending on the sign of the numerator endpoints, the algorithm computes all four combinations and takes the extremes:

[L_D, U_D] = [\min(\frac{L_S}{L_A}, \frac{L_S}{U_A}, \frac{U_S}{L_A}, \frac{U_S}{U_A}), \max(\frac{L_S}{L_A}, \frac{L_S}{U_A}, \frac{U_S}{L_A}, \frac{U_S}{U_A})]
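A minimal Python sketch of this step, with hypothetical inputs, shows which quotients end up as the extremes for different signs of the shift interval. The function name `divide_by_positive_interval` is illustrative and not part of any library.

```python
def divide_by_positive_interval(ls, us, la, ua):
    """Divide the interval [ls, us] by the strictly positive interval [la, ua]."""
    if not (0.0 < la <= ua):
        raise ValueError("denominator interval must be strictly positive")
    if ls > us:
        raise ValueError("numerator interval must satisfy ls <= us")
    # All four endpoint quotients; the extremes bound every possible ratio.
    quotients = (ls / la, ls / ua, us / la, us / ua)
    return min(quotients), max(quotients)


print(divide_by_positive_interval(2.0, 4.0, 1.0, 2.0))    # (1.0, 4.0)
print(divide_by_positive_interval(-2.0, 4.0, 1.0, 2.0))   # (-2.0, 4.0)
print(divide_by_positive_interval(-4.0, -2.0, 1.0, 2.0))  # (-4.0, -1.0)
```

With a positive numerator the lower bound comes from dividing by $U_A$ ; with a negative numerator it comes from dividing by $L_A$ . Taking the minimum and maximum of all four quotients avoids having to branch on the sign.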

Edge cases

When $L_A = 0$ (the avg-spread interval includes zero), the bounds become partially or fully unbounded depending on the sign of $[L_S, U_S]$ :

$L_S > 0$ : $[\frac{L_S}{U_A}, +\infty)$
$U_S < 0$ : $(-\infty, \frac{U_S}{U_A}]$
$L_S = 0$ and $U_S = 0$ : $[0, 0]$
$L_S = 0$ and $U_S > 0$ : $[0, +\infty)$
$L_S < 0$ and $U_S = 0$ : $(-\infty, 0]$
otherwise: $(-\infty, +\infty)$

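When $U_A = 0$ (the avg-spread interval collapses to zero), only the sign of the shift interval determines the result: $[0, 0]$ if $L_S = U_S = 0$ , $[0, +\infty)$ if $L_S \geq 0$ , $(-\infty, 0]$ if $U_S \leq 0$ , and $(-\infty, +\infty)$ otherwise.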
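The edge-case rules, including the zero-endpoint cases listed in the definition at the top of this section, can be sketched in Python as follows. This is an illustrative transcription of the case analysis with hypothetical inputs; the function name `disparity_edge_cases` is an assumption made for the example.

```python
import math

INF = math.inf


def disparity_edge_cases(ls, us, la, ua):
    """Bounds when the avg-spread interval is not strictly positive (la <= 0)."""
    if la > 0.0:
        raise ValueError("use interval division when la > 0")
    if ua <= 0.0:
        # Avg-spread interval collapsed to zero: only the sign of the shift matters.
        if ls == 0.0 and us == 0.0:
            return (0.0, 0.0)
        if ls >= 0.0:
            return (0.0, INF)
        if us <= 0.0:
            return (-INF, 0.0)
        return (-INF, INF)
    # Here la == 0 < ua: one side can still be bounded through ua.
    if ls > 0.0:
        return (ls / ua, INF)
    if us < 0.0:
        return (-INF, us / ua)
    if ls == 0.0 and us == 0.0:
        return (0.0, 0.0)
    if ls == 0.0 and us > 0.0:
        return (0.0, INF)
    if ls < 0.0 and us == 0.0:
        return (-INF, 0.0)
    return (-INF, INF)


print(disparity_edge_cases(1.0, 3.0, 0.0, 2.0))   # (0.5, inf)
print(disparity_edge_cases(-1.0, 1.0, 0.0, 2.0))  # (-inf, inf)
```

A strictly positive shift interval keeps a finite lower bound because dividing by the largest plausible spread gives the smallest plausible ratio; a shift interval that straddles zero gives no information once the spread may be zero.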

using Pragmastat.Exceptions;
using Pragmastat.Internal;
using Pragmastat.Metrology;

using static Pragmastat.Functions.MinAchievableMisrate;

namespace Pragmastat.Estimators;

/// <summary>
/// Distribution-free bounds for disparity using Bonferroni combination.
/// </summary>
public class DisparityBoundsEstimator : ITwoSampleBoundsEstimator
{
  public static readonly DisparityBoundsEstimator Instance = new();

  public Bounds Estimate(Sample x, Sample y, Probability misrate)
  {
    return Estimate(x, y, misrate, null);
  }

  public Bounds Estimate(Sample x, Sample y, Probability misrate, string? seed)
  {
    Assertion.MatchedUnit(x, y);
    // Check validity (priority 0)
    Assertion.Validity(x, Subject.X);
    Assertion.Validity(y, Subject.Y);

    if (double.IsNaN(misrate) || misrate < 0 || misrate > 1)
      throw AssumptionException.Domain(Subject.Misrate);

    int n = x.Size;
    int m = y.Size;
    if (n < 2)
      throw AssumptionException.Domain(Subject.X);
    if (m < 2)
      throw AssumptionException.Domain(Subject.Y);

    // Minimum achievable misrates: shift component, then avg-spread component
    double minShift = TwoSample(n, m);
    double minX = OneSample(n / 2);
    double minY = OneSample(m / 2);
    double minAvg = 2.0 * Math.Max(minX, minY);

    if (misrate < minShift + minAvg)
      throw AssumptionException.Domain(Subject.Misrate);

    double extra = misrate - (minShift + minAvg);
    double alphaShift = minShift + extra / 2.0;
    double alphaAvg = minAvg + extra / 2.0;

    // Check sparity (priority 2)
    Assertion.Sparity(x, Subject.X);
    Assertion.Sparity(y, Subject.Y);

    var shiftBounds = ShiftBoundsEstimator.Instance.Estimate(x, y, alphaShift);
    var avgBounds = seed == null
      ? AvgSpreadBoundsEstimator.Instance.Estimate(x, y, alphaAvg)
      : AvgSpreadBoundsEstimator.Instance.Estimate(x, y, alphaAvg, seed);

    double la = avgBounds.Lower;
    double ua = avgBounds.Upper;
    double ls = shiftBounds.Lower;
    double us = shiftBounds.Upper;

    // Avg-spread interval is strictly positive: ordinary interval division
    if (la > 0.0)
    {
      double r1 = ls / la;
      double r2 = ls / ua;
      double r3 = us / la;
      double r4 = us / ua;
      double lower = Math.Min(Math.Min(r1, r2), Math.Min(r3, r4));
      double upper = Math.Max(Math.Max(r1, r2), Math.Max(r3, r4));
      return new Bounds(lower, upper, DisparityUnit.Instance);
    }

    // Avg-spread interval collapsed to zero: only the sign of the shift matters
    if (ua <= 0.0)
    {
      if (ls == 0.0 && us == 0.0)
        return new Bounds(0.0, 0.0, DisparityUnit.Instance);
      if (ls >= 0.0)
        return new Bounds(0.0, double.PositiveInfinity, DisparityUnit.Instance);
      if (us <= 0.0)
        return new Bounds(double.NegativeInfinity, 0.0, DisparityUnit.Instance);
      return new Bounds(double.NegativeInfinity, double.PositiveInfinity, DisparityUnit.Instance);
    }

    // Here la == 0 < ua: one side may still be bounded through ua
    if (ls > 0.0)
      return new Bounds(ls / ua, double.PositiveInfinity, DisparityUnit.Instance);
    if (us < 0.0)
      return new Bounds(double.NegativeInfinity, us / ua, DisparityUnit.Instance);
    if (ls == 0.0 && us == 0.0)
      return new Bounds(0.0, 0.0, DisparityUnit.Instance);
    if (ls == 0.0 && us > 0.0)
      return new Bounds(0.0, double.PositiveInfinity, DisparityUnit.Instance);
    if (ls < 0.0 && us == 0.0)
      return new Bounds(double.NegativeInfinity, 0.0, DisparityUnit.Instance);

    return new Bounds(double.NegativeInfinity, double.PositiveInfinity, DisparityUnit.Instance);
  }
}

Tests

\operatorname{DisparityBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}) = \operatorname{ShiftBounds}(\mathbf{x}, \mathbf{y}, \alpha_S) / \operatorname{AvgSpreadBounds}(\mathbf{x}, \mathbf{y}, \alpha_A)

The $\operatorname{DisparityBounds}$ test suite contains 39 test cases (3 demo + 5 natural + 6 property + 5 edge + 5 misrate + 2 distro + 6 unsorted + 7 error). Since $\operatorname{DisparityBounds}$ returns bounds rather than a point estimate, tests validate that the bounds contain $\operatorname{Disparity}(\mathbf{x}, \mathbf{y})$ and satisfy equivariance properties. Each test case output is a JSON object with lower and upper fields representing the interval bounds. Because the denominator ( $\operatorname{AvgSpreadBounds}$ ) uses randomized $\operatorname{SpreadBounds}$ , tests fix a seed to keep outputs deterministic.
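A containment check of the kind described above might look like the following Python sketch. Only the `lower` and `upper` field names come from the description; the function name `check_case` and the sample values are hypothetical and not taken from the real test suite.

```python
import json


def check_case(output_json: str, disparity: float) -> bool:
    """Return True if the reported bounds are ordered and contain the point estimate."""
    bounds = json.loads(output_json)
    lower, upper = bounds["lower"], bounds["upper"]
    return lower <= upper and lower <= disparity <= upper


# Hypothetical output and point estimate, for illustration only.
print(check_case('{"lower": -1.5, "upper": -0.25}', -0.8))  # True
print(check_case('{"lower": -1.5, "upper": -0.25}', 0.0))   # False
```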

Demo examples ( $n = m = 30$ , $n = m = 20$ ) from manual introduction:

demo-1: $\mathbf{x} = (1, \ldots, 30)$ , $\mathbf{y} = (21, \ldots, 50)$ , $\mathrm{misrate} = 0.02$
demo-2: $\mathbf{x} = (1, \ldots, 30)$ , $\mathbf{y} = (21, \ldots, 50)$ , $\mathrm{misrate} = 0.005$ , wider bounds (tighter misrate)
demo-3: $\mathbf{x} = (1, \ldots, 20)$ , $\mathbf{y} = (5, \ldots, 24)$ , $\mathrm{misrate} = 0.05$

These cases illustrate how tighter misrates produce wider bounds.

Natural sequences ( $\mathrm{misrate} = 0.2$ ), 5 tests:

natural-10-10: $\mathbf{x} = (1, \ldots, 10)$ , $\mathbf{y} = (1, \ldots, 10)$ , bounds containing $0$
natural-10-15: $\mathbf{x} = (1, \ldots, 10)$ , $\mathbf{y} = (1, \ldots, 15)$
natural-15-10: $\mathbf{x} = (1, \ldots, 15)$ , $\mathbf{y} = (1, \ldots, 10)$
natural-15-15: $\mathbf{x} = (1, \ldots, 15)$ , $\mathbf{y} = (1, \ldots, 15)$ , bounds containing $0$
natural-20-20: $\mathbf{x} = (1, \ldots, 20)$ , $\mathbf{y} = (1, \ldots, 20)$ , bounds containing $0$

Property validation ( $n = m = 10$ , $\mathrm{misrate} = 0.2$ ), 6 tests:

property-identity: $\mathbf{x} = (0, 2, \ldots, 18)$ , $\mathbf{y} = (0, 2, \ldots, 18)$ , bounds must contain $0$
property-location-shift: $\mathbf{x}$ and $\mathbf{y}$ shifted by constant, same bounds as identity (location invariance)
property-scale-2x: $\mathbf{x}$ and $\mathbf{y}$ scaled by 2, same bounds as identity (scale invariance)
property-scale-neg: $\mathbf{x}$ and $\mathbf{y}$ negated, bounds preserved ( $\text{abs}$ scaling)
property-symmetry: $\mathbf{x} = (1, \ldots, 10)$ , $\mathbf{y} = (6, \ldots, 15)$ , observed bounds
property-symmetry-swapped: $\mathbf{x}$ and $\mathbf{y}$ swapped, bounds negated (anti-symmetry)

Edge cases covering boundary conditions (5 tests):

edge-small: $n = m = 6$ , $\mathrm{misrate} = 0.6$ (small samples)
edge-negative: negative values for both samples
edge-mixed-signs: mixed positive/negative values
edge-wide-range: extreme value range
edge-asymmetric-10-20: $n = 10$ , $m = 20$ (unbalanced sizes)

Misrate variation ( $\mathbf{x} = (1, \ldots, 20)$ , $\mathbf{y} = (5, \ldots, 24)$ ), 5 tests:

misrate-2e-1: $\mathrm{misrate} = 0.2$
misrate-1e-1: $\mathrm{misrate} = 0.1$
misrate-5e-2: $\mathrm{misrate} = 0.05$
misrate-2e-2: $\mathrm{misrate} = 0.02$
misrate-1e-2: $\mathrm{misrate} = 0.01$

These tests validate monotonicity: smaller misrates produce wider bounds.
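One way to express this check is to compare interval widths across the misrate series. The Python sketch below is an assumption about how such a check could be written, not the suite's actual code; the helper name and the bounds are hypothetical.

```python
def widths_grow_as_misrate_shrinks(results):
    """results: iterable of (misrate, lower, upper) tuples."""
    # Order from the largest misrate to the smallest.
    ordered = sorted(results, key=lambda r: r[0], reverse=True)
    widths = [upper - lower for _, lower, upper in ordered]
    # Each step to a smaller misrate must not shrink the interval.
    return all(a <= b for a, b in zip(widths, widths[1:]))


# Hypothetical bounds, for illustration only.
hypothetical = [
    (0.2, -1.0, -0.4),
    (0.05, -1.3, -0.2),
    (0.1, -1.1, -0.3),
    (0.01, -1.8, 0.1),
]
print(widths_grow_as_misrate_shrinks(hypothetical))  # True
```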

Distribution tests ( $\mathrm{misrate}$ varies), 2 tests:

additive-20-20: $n = m = 20$ , $\underline{\operatorname{Additive}}(10, 1)$
uniform-20-20: $n = m = 20$ , $\underline{\operatorname{Uniform}}(0, 1)$

Unsorted tests verify independent sorting of $\mathbf{x}$ and $\mathbf{y}$ (6 tests):

unsorted-reverse-x: X reversed, Y sorted
unsorted-reverse-y: X sorted, Y reversed
unsorted-reverse-both: both reversed
unsorted-shuffle-x: X shuffled, Y sorted
unsorted-shuffle-y: X sorted, Y shuffled
unsorted-wide-range: wide value range, both unsorted

These tests validate that $\operatorname{DisparityBounds}$ produces identical results regardless of input order.

Error cases with inputs that violate assumptions (7 tests):

error-empty-x: $\mathbf{x} = ()$ (empty X array)
error-empty-y: $\mathbf{y} = ()$ (empty Y array)
error-single-element-x: $|\mathbf{x}| = 1$ (too few elements for pairing)
error-single-element-y: $|\mathbf{y}| = 1$ (too few elements for pairing)
error-constant-x: constant $\mathbf{x}$ violates sparity ( $\operatorname{Spread} = 0$ )
error-constant-y: constant $\mathbf{y}$ violates sparity ( $\operatorname{Spread} = 0$ )
error-misrate-below-min: misrate below minimum achievable
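The error-misrate-below-min case depends on the minimum achievable misrate $min_S + min_A$ , which grows quickly as the samples get smaller. The Python sketch below tabulates it for a few equal sample sizes, using only the formulas from the definition; the function name `min_misrate` is illustrative.

```python
from math import comb


def min_misrate(n: int, m: int) -> float:
    """Minimum achievable misrate min_S + min_A for sample sizes n and m."""
    min_shift = 2.0 / comb(n + m, n)
    min_avg = 2.0 * max(2.0 ** (1 - n // 2), 2.0 ** (1 - m // 2))
    return min_shift + min_avg


for size in (2, 4, 5, 6, 10, 20, 30):
    print(size, min_misrate(size, size))
```

By this formula the minimum exceeds $1$ whenever either sample has fewer than 6 elements, so no misrate in $[0, 1]$ is feasible there. For $n = m = 6$ the minimum is slightly above $0.5$ , which is consistent with the edge-small case using $\mathrm{misrate} = 0.6$ .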