AvgSpreadBounds

AvgSpreadBounds(x,y,misrate)=[LA,UA]\operatorname{AvgSpreadBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}) = [L_A, U_A]

where [Lx,Ux]=SpreadBounds(x,misrate/2)[L_x, U_x] = \operatorname{SpreadBounds}(\mathbf{x}, \mathrm{misrate} / 2), [Ly,Uy]=SpreadBounds(y,misrate/2)[L_y, U_y] = \operatorname{SpreadBounds}(\mathbf{y}, \mathrm{misrate} / 2), wx=n/(n+m)w_x = n / (n + m), wy=m/(n+m)w_y = m / (n + m), and [LA,UA]=[wxLx+wyLy,wxUx+wyUy][L_A, U_A] = [w_x L_x + w_y L_y, w_x U_x + w_y U_y].

Robust bounds on AvgSpread(x,y)\operatorname{AvgSpread}(\mathbf{x}, \mathbf{y}) with specified coverage.

Input

  • x=(x1,x2,,xn)\mathbf{x} = (x_1, x_2, \ldots, x_n) — first sample of measurements, where n2n \geq 2, requires sparity(x)
  • y=(y1,y2,,ym)\mathbf{y} = (y_1, y_2, \ldots, y_m) — second sample of measurements, where m2m \geq 2, requires sparity(y)
  • misrate2max(21n/2,21m/2)\mathrm{misrate} \geq 2 \cdot \max(2^{1-\lfloor n/2 \rfloor}, 2^{1-\lfloor m/2 \rfloor}) — probability that true avg spread falls outside bounds in the long run

Output

  • Value interval [L,U][L, U] bounding AvgSpread(x,y)\operatorname{AvgSpread}(\mathbf{x}, \mathbf{y})
  • Unit same unit as x\mathbf{x}, y\mathbf{y}

Notes

Properties

  • Symmetry AvgSpreadBounds(x,y,misrate)=AvgSpreadBounds(y,x,misrate)\operatorname{AvgSpreadBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}) = \operatorname{AvgSpreadBounds}(\mathbf{y}, \mathbf{x}, \mathrm{misrate}) (equal split)
  • Shift invariance adding constants to x\mathbf{x} and/or y\mathbf{y} does not change bounds
  • Scale equivariance AvgSpreadBounds(kx,ky,misrate)=kAvgSpreadBounds(x,y,misrate)\operatorname{AvgSpreadBounds}(k \cdot \mathbf{x}, k \cdot \mathbf{y}, \mathrm{misrate}) = \lvert k \rvert \cdot \operatorname{AvgSpreadBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate})
  • Non-negativity bounds are non-negative
  • Monotonicity in misrate smaller misrate\mathrm{misrate} produces wider bounds

Example

  • AvgSpreadBounds([1..30], [21..50], 10^(-3)) returns bounds containing AvgSpread

AvgSpreadBounds\operatorname{AvgSpreadBounds} provides distribution-free uncertainty bounds for the pooled spread: the weighted average of the two sample spreads. The algorithm computes separate SpreadBounds\operatorname{SpreadBounds} for each sample using an equal Bonferroni split and then combines them linearly with weights n/(n+m)n/(n+m) and m/(n+m)m/(n+m). This guarantees that the probability of missing the true AvgSpread\operatorname{AvgSpread} is at most misrate\mathrm{misrate} without requiring independence between samples.

Minimum misrate because misrate/2\mathrm{misrate} / 2 must satisfy the per-sample minimum, the overall misrate must be large enough for both samples:

misrate2max(21n2,21m2)\mathrm{misrate} \geq 2 \cdot \max(2^{1-\lfloor \frac{n}{2} \rfloor}, 2^{1-\lfloor \frac{m}{2} \rfloor})

Algorithm

The AvgSpreadBounds\operatorname{AvgSpreadBounds} estimator constructs bounds on the pooled spread by combining two independent SpreadBounds\operatorname{SpreadBounds} calls through a Bonferroni split.

The algorithm proceeds as follows:

  • Equal Bonferroni split Use misrate/2\mathrm{misrate} / 2 for each per-sample bounds call. Each call uses half the total error budget.

  • Per-sample bounds Compute [Lx,Ux]=SpreadBounds(x,misrate/2)[L_x, U_x] = \operatorname{SpreadBounds}(\mathbf{x}, \mathrm{misrate} / 2) and [Ly,Uy]=SpreadBounds(y,misrate/2)[L_y, U_y] = \operatorname{SpreadBounds}(\mathbf{y}, \mathrm{misrate} / 2) (see SpreadBounds).

  • Weighted linear combination With weights wx=n/(n+m)w_x = n / (n + m) and wy=m/(n+m)w_y = m / (n + m), return:

[LA,UA]=[wxLx+wyLy,wxUx+wyUy][L_A, U_A] = [w_x L_x + w_y L_y, w_x U_x + w_y U_y]

By Bonferronis inequality, the probability that both per-sample bounds simultaneously cover their respective true spreads is at least 1misrate1 - \mathrm{misrate}. Since AvgSpread\operatorname{AvgSpread} is a weighted average of the individual spreads, the linear combination of the bounds covers the true AvgSpread\operatorname{AvgSpread} whenever both individual bounds hold.

using Pragmastat.Algorithms;
using Pragmastat.Exceptions;
using Pragmastat.Functions;
using Pragmastat.Internal;

namespace Pragmastat.Estimators;

/// <summary>
/// Distribution-free bounds for AvgSpread via Bonferroni combination of SpreadBounds.
/// Uses equal split alpha = misrate / 2.
/// </summary>
internal class AvgSpreadBoundsEstimator : ITwoSampleBoundsEstimator
{
  internal static readonly AvgSpreadBoundsEstimator Instance = new();

  public Bounds Estimate(Sample x, Sample y, Probability misrate)
  {
    return Estimate(x, y, misrate, null);
  }

  public Bounds Estimate(Sample x, Sample y, Probability misrate, string? seed)
  {
    Assertion.NonWeighted("x", x);
    Assertion.NonWeighted("y", y);
    Assertion.CompatibleUnits(x, y);
    (x, y) = Assertion.ConvertToFiner(x, y);

    if (double.IsNaN(misrate) || misrate < 0 || misrate > 1)
      throw AssumptionException.Domain(Subject.Misrate);

    int n = x.Size;
    int m = y.Size;
    if (n < 2)
      throw AssumptionException.Domain(Subject.X);
    if (m < 2)
      throw AssumptionException.Domain(Subject.Y);

    double alpha = misrate / 2.0;
    double minX = MinAchievableMisrate.OneSample(n / 2);
    double minY = MinAchievableMisrate.OneSample(m / 2);
    if (alpha < minX || alpha < minY)
      throw AssumptionException.Domain(Subject.Misrate);

    if (FastSpread.Estimate(x.SortedValues, isSorted: true) <= 0)
      throw AssumptionException.Sparity(Subject.X);
    if (FastSpread.Estimate(y.SortedValues, isSorted: true) <= 0)
      throw AssumptionException.Sparity(Subject.Y);

    Bounds boundsX = seed == null
      ? SpreadBoundsEstimator.Instance.Estimate(x, alpha)
      : SpreadBoundsEstimator.Instance.Estimate(x, alpha, seed);
    Bounds boundsY = seed == null
      ? SpreadBoundsEstimator.Instance.Estimate(y, alpha)
      : SpreadBoundsEstimator.Instance.Estimate(y, alpha, seed);

    double weightX = (double)n / (n + m);
    double weightY = (double)m / (n + m);

    double lower = weightX * boundsX.Lower + weightY * boundsY.Lower;
    double upper = weightX * boundsX.Upper + weightY * boundsY.Upper;
    return new Bounds(lower, upper, x.Unit);
  }
}

Tests

AvgSpreadBounds(x,y,misrate)=[LA,UA]\operatorname{AvgSpreadBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}) = [L_A, U_A]

Use an equal Bonferroni split. Compute [Lx,Ux]=SpreadBounds(x,misrate/2)[L_x, U_x] = \operatorname{SpreadBounds}(\mathbf{x}, \mathrm{misrate} / 2) and [Ly,Uy]=SpreadBounds(y,misrate/2)[L_y, U_y] = \operatorname{SpreadBounds}(\mathbf{y}, \mathrm{misrate} / 2) using disjoint-pair sign-test inversion (see SpreadBounds\operatorname{SpreadBounds}). Let wx=n/(n+m)w_x = n / (n + m) and wy=m/(n+m)w_y = m / (n + m). Return [LA,UA]=[wxLx+wyLy,wxUx+wyUy][L_A, U_A] = [w_x L_x + w_y L_y, w_x U_x + w_y U_y].

The AvgSpreadBounds\operatorname{AvgSpreadBounds} test suite contains 40 test cases (3 demo + 5 natural + 6 property + 6 edge + 2 distro + 5 misrate + 6 unsorted + 7 error). Since AvgSpreadBounds\operatorname{AvgSpreadBounds} returns bounds rather than a point estimate, tests validate that bounds are well-formed and satisfy equivariance properties. Each test case output is a JSON object with lower and upper fields representing the interval bounds. Because SpreadBounds\operatorname{SpreadBounds} is randomized, tests fix a seed to make outputs deterministic. Both SpreadBounds\operatorname{SpreadBounds} calls use the same seed (two identical RNG streams).

Minimum misrate constraint the equal split requires

misrate221n2\frac{\mathrm{misrate}}{2} \geq 2^{1-\lfloor \frac{n}{2} \rfloor}

and

misrate221m2\frac{\mathrm{misrate}}{2} \geq 2^{1-\lfloor \frac{m}{2} \rfloor}

,

so

misrate2max(21n2,21m2)\mathrm{misrate} \geq 2 \cdot \max(2^{1-\lfloor \frac{n}{2} \rfloor}, 2^{1-\lfloor \frac{m}{2} \rfloor})

.

Demo examples (n=m=30n = m = 30, n=m=20n = m = 20) from manual introduction:

  • demo-1: x=(1,,30)\mathbf{x} = (1, \ldots, 30), y=(21,,50)\mathbf{y} = (21, \ldots, 50), baseline fixture misrate
  • demo-2: x=(1,,30)\mathbf{x} = (1, \ldots, 30), y=(21,,50)\mathbf{y} = (21, \ldots, 50), stricter fixture misrate, wider bounds
  • demo-3: x=(1,,20)\mathbf{x} = (1, \ldots, 20), y=(5,,24)\mathbf{y} = (5, \ldots, 24), looser fixture misrate

These cases illustrate how tighter misrates produce wider bounds.

Natural sequences (misrate\mathrm{misrate} varies across achievable fixture levels) 5 tests:

  • natural-10-10: x=(1,,10)\mathbf{x} = (1, \ldots, 10), y=(1,,10)\mathbf{y} = (1, \ldots, 10)
  • natural-10-15: x=(1,,10)\mathbf{x} = (1, \ldots, 10), y=(1,,15)\mathbf{y} = (1, \ldots, 15)
  • natural-15-10: x=(1,,15)\mathbf{x} = (1, \ldots, 15), y=(1,,10)\mathbf{y} = (1, \ldots, 10)
  • natural-15-15: x=(1,,15)\mathbf{x} = (1, \ldots, 15), y=(1,,15)\mathbf{y} = (1, \ldots, 15)
  • natural-20-20: x=(1,,20)\mathbf{x} = (1, \ldots, 20), y=(1,,20)\mathbf{y} = (1, \ldots, 20)

Property validation (n=m=10n = m = 10) 6 tests:

  • property-identity: x=(0,2,,18)\mathbf{x} = (0, 2, \ldots, 18), y=(0,2,,18)\mathbf{y} = (0, 2, \ldots, 18), expected output: [2,16][2, 16]
  • property-location-shift: x=(10,12,,28)\mathbf{x} = (10, 12, \ldots, 28), y=(12,14,,30)\mathbf{y} = (12, 14, \ldots, 30), expected output: [2,16][2, 16] (shift invariance)
  • property-scale-2x: x=(0,4,,36)\mathbf{x} = (0, 4, \ldots, 36), y=(4,8,,40)\mathbf{y} = (4, 8, \ldots, 40), expected output: [4,32][4, 32] (= 2× identity bounds, scale equivariance)
  • property-scale-neg: x=(0,2,,18)\mathbf{x} = (0, -2, \ldots, -18), y=(2,4,,20)\mathbf{y} = (-2, -4, \ldots, -20), expected output: [2,16][2, 16] (= identity bounds, k\lvert k \rvert scaling)
  • property-symmetry: x=(1,,10)\mathbf{x} = (1, \ldots, 10), y=(6,,15)\mathbf{y} = (6, \ldots, 15)
  • property-symmetry-swapped: x=(6,,15)\mathbf{x} = (6, \ldots, 15), y=(1,,10)\mathbf{y} = (1, \ldots, 10), same output as property-symmetry (swap symmetry with equal Bonferroni split)

Edge cases boundary conditions and extreme scenarios (6 tests):

  • edge-small: n=m=4n = m = 4, minimum non-trivial fixture misrate
  • edge-negative: x=(10,,1)\mathbf{x} = (-10, \ldots, -1), y=(20,,11)\mathbf{y} = (-20, \ldots, -11) (negative values)
  • edge-mixed-signs: mixed positive/negative values
  • edge-wide-range: powers of 10 from 11 to 10910^9 (extreme value range)
  • edge-asymmetric-8-30: n=8n = 8, m=30m = 30 (unbalanced sizes)
  • edge-duplicates-mixed: x=(1,1,1,2,3,4)\mathbf{x} = (1, 1, 1, 2, 3, 4), y=(2,2,2,3,4,5)\mathbf{y} = (2, 2, 2, 3, 4, 5) (partial ties)

Distribution tests (reference fixture misrates) 2 tests:

  • additive-20-20: n=m=20n = m = 20, Additive(10,1)\underline{\operatorname{Additive}}(10, 1)
  • uniform-20-20: n=m=20n = m = 20, Uniform(0,1)\underline{\operatorname{Uniform}}(0, 1)

Misrate variation (x=(1,,20)\mathbf{x} = (1, \ldots, 20), y=(5,,24)\mathbf{y} = (5, \ldots, 24)) 5 tests spanning progressively stricter fixture misrates:

These tests validate monotonicity: smaller misrates produce wider bounds.

Unsorted tests verify independent sorting of x\mathbf{x} and y\mathbf{y} (6 tests):

  • unsorted-reverse-x: X reversed, Y sorted
  • unsorted-reverse-y: X sorted, Y reversed
  • unsorted-reverse-both: both reversed
  • unsorted-shuffle-x: X shuffled, Y sorted
  • unsorted-shuffle-y: X sorted, Y shuffled
  • unsorted-wide-range: wide value range, both unsorted

These tests validate that AvgSpreadBounds\operatorname{AvgSpreadBounds} produces identical results regardless of input order.

Error cases inputs that violate assumptions (7 tests):

  • error-empty-x: x=()\mathbf{x} = () (empty X array) — validity error
  • error-empty-y: y=()\mathbf{y} = () (empty Y array) — validity error
  • error-single-element-x: x=1|\mathbf{x}| = 1 (too few elements for pairing) — domain error
  • error-single-element-y: y=1|\mathbf{y}| = 1 (too few elements for pairing) — domain error
  • error-constant-x: constant x\mathbf{x} violates sparity (Spread=0\operatorname{Spread} = 0)
  • error-constant-y: constant y\mathbf{y} violates sparity (Spread=0\operatorname{Spread} = 0)
  • error-misrate-below-min: misrate below minimum achievable — domain error