AvgSpreadBounds

$$\operatorname{AvgSpreadBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}) = [L_A, U_A]$$

where $\alpha = \frac{\mathrm{misrate}}{2}$, $[L_x, U_x] = \operatorname{SpreadBounds}(\mathbf{x}, \alpha)$, $[L_y, U_y] = \operatorname{SpreadBounds}(\mathbf{y}, \alpha)$, $w_x = \frac{n}{n + m}$, $w_y = \frac{m}{n + m}$, and $[L_A, U_A] = [w_x L_x + w_y L_y, w_x U_x + w_y U_y]$.

Robust bounds on $\operatorname{AvgSpread}(\mathbf{x}, \mathbf{y})$ with specified coverage.

  • Interpretation: $\mathrm{misrate}$ bounds the probability that the true average spread falls outside the bounds
  • Domain: any real numbers, $n \geq 2$, $m \geq 2$, $\alpha \geq 2^{1-\lfloor \frac{n}{2} \rfloor}$ and $\alpha \geq 2^{1-\lfloor \frac{m}{2} \rfloor}$
  • Assumptions: sparity(x), sparity(y)
  • Unit: same as measurements
  • Note: Bonferroni combination of two $\operatorname{SpreadBounds}$ calls with equal split $\alpha = \frac{\mathrm{misrate}}{2}$; no independence assumption needed; randomized pairing and cutoff, conservative with ties

Properties

  • Symmetry: $\operatorname{AvgSpreadBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}) = \operatorname{AvgSpreadBounds}(\mathbf{y}, \mathbf{x}, \mathrm{misrate})$ (equal split)
  • Shift invariance: adding constants to $\mathbf{x}$ and/or $\mathbf{y}$ does not change the bounds
  • Scale equivariance: $\operatorname{AvgSpreadBounds}(k \cdot \mathbf{x}, k \cdot \mathbf{y}, \mathrm{misrate}) = \lvert k \rvert \cdot \operatorname{AvgSpreadBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate})$
  • Non-negativity: bounds are non-negative
  • Monotonicity in misrate: smaller $\mathrm{misrate}$ produces wider bounds

Example

  • AvgSpreadBounds([1..30], [21..50], 0.02) returns bounds containing AvgSpread

$\operatorname{AvgSpreadBounds}$ provides distribution-free uncertainty bounds for the pooled spread: the weighted average of the two sample spreads. The algorithm computes separate $\operatorname{SpreadBounds}$ for each sample using an equal Bonferroni split and then combines them linearly with weights $\frac{n}{n+m}$ and $\frac{m}{n+m}$. This guarantees that the probability of missing the true $\operatorname{AvgSpread}$ is at most $\mathrm{misrate}$ without requiring independence between samples.

Minimum misrate: because $\alpha = \frac{\mathrm{misrate}}{2}$ must satisfy the per-sample minimum, the overall misrate must be large enough for both samples:

$$\mathrm{misrate} \geq 2 \cdot \max(2^{1-\lfloor \frac{n}{2} \rfloor}, 2^{1-\lfloor \frac{m}{2} \rfloor})$$
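As a worked instance of this constraint, consider the sample sizes from the example above, $n = m = 30$. The arithmetic below is only a sketch of the formula; the symbol $\mathrm{misrate}_{\min}$ is introduced here for illustration and does not appear elsewhere in the documentation.

```latex
\begin{aligned}
\lfloor 30 / 2 \rfloor &= 15, \\
2^{1 - 15} &= 2^{-14} \approx 6.1 \times 10^{-5}, \\
\mathrm{misrate}_{\min} &= 2 \cdot 2^{-14} = 2^{-13} \approx 1.22 \times 10^{-4}.
\end{aligned}
```

So $\mathrm{misrate} = 0.02$ is admissible for two samples of size 30. For smaller samples the constraint bites quickly: with $n = m = 10$ the same formula gives $2 \cdot 2^{1-5} = 0.125$, and with $n = m = 4$ it gives $2 \cdot 2^{1-2} = 1$.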

Algorithm

The $\operatorname{AvgSpreadBounds}$ estimator constructs bounds on the pooled spread by combining two separate $\operatorname{SpreadBounds}$ calls through a Bonferroni split.

The algorithm proceeds as follows:

  • Equal Bonferroni split: Set $\alpha = \frac{\mathrm{misrate}}{2}$. Each per-sample bounds call uses half the total error budget.

  • Per-sample bounds: Compute $[L_x, U_x] = \operatorname{SpreadBounds}(\mathbf{x}, \alpha)$ and $[L_y, U_y] = \operatorname{SpreadBounds}(\mathbf{y}, \alpha)$ (see SpreadBounds).

  • Weighted linear combination: With weights $w_x = \frac{n}{n + m}$ and $w_y = \frac{m}{n + m}$, return:

$$[L_A, U_A] = [w_x L_x + w_y L_y, w_x U_x + w_y U_y]$$
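To make the weighted combination concrete, here is a small numeric sketch. The sample sizes and per-sample bounds are hypothetical values chosen for easy arithmetic, not outputs of $\operatorname{SpreadBounds}$: assume $n = 10$, $m = 30$, $[L_x, U_x] = [2, 6]$, and $[L_y, U_y] = [4, 8]$.

```latex
\begin{aligned}
w_x &= \tfrac{10}{10 + 30} = 0.25, \qquad w_y = \tfrac{30}{10 + 30} = 0.75, \\
L_A &= 0.25 \cdot 2 + 0.75 \cdot 4 = 3.5, \\
U_A &= 0.25 \cdot 6 + 0.75 \cdot 8 = 7.5.
\end{aligned}
```

The larger sample dominates: the result $[3.5, 7.5]$ sits closer to the bounds of $\mathbf{y}$ than to those of $\mathbf{x}$.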

By Bonferroni's inequality, the probability that both per-sample bounds simultaneously cover their respective true spreads is at least $1 - 2\alpha = 1 - \mathrm{misrate}$. Since $\operatorname{AvgSpread}$ is a weighted average of the individual spreads, the linear combination of the bounds covers the true $\operatorname{AvgSpread}$ whenever both individual bounds hold.
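The coverage argument can be written out step by step. The notation is an assumption made for this sketch: $S_x$ and $S_y$ denote the true spreads of the two populations, $E_x$ and $E_y$ denote the events that the per-sample bounds cover them, and the true average spread is taken to be $w_x S_x + w_y S_y$, matching the weights used above.

```latex
\begin{aligned}
E_x &= \{ L_x \leq S_x \leq U_x \}, \qquad E_y = \{ L_y \leq S_y \leq U_y \}, \\
P(E_x \cap E_y) &= 1 - P(E_x^c \cup E_y^c)
  \geq 1 - P(E_x^c) - P(E_y^c)
  \geq 1 - 2\alpha = 1 - \mathrm{misrate}, \\
\text{on } E_x \cap E_y:\quad
w_x L_x + w_y L_y &\leq w_x S_x + w_y S_y \leq w_x U_x + w_y U_y
  \quad (\text{since } w_x, w_y \geq 0).
\end{aligned}
```

The union bound in the second line holds for any joint distribution of the two events, which is why no independence between the samples (or between the two randomized calls) is needed.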

using Pragmastat.Exceptions;
using Pragmastat.Functions;
using Pragmastat.Internal;

namespace Pragmastat.Estimators;

/// <summary>
/// Distribution-free bounds for AvgSpread via Bonferroni combination of SpreadBounds.
/// Uses equal split alpha = misrate / 2.
/// </summary>
internal class AvgSpreadBoundsEstimator : ITwoSampleBoundsEstimator
{
  internal static readonly AvgSpreadBoundsEstimator Instance = new();

  public Bounds Estimate(Sample x, Sample y, Probability misrate)
  {
    return Estimate(x, y, misrate, null);
  }

  public Bounds Estimate(Sample x, Sample y, Probability misrate, string? seed)
  {
    Assertion.MatchedUnit(x, y);
    // Check validity for x (priority 0, subject x)
    Assertion.Validity(x, Subject.X);
    // Check validity for y (priority 0, subject y)
    Assertion.Validity(y, Subject.Y);

    if (double.IsNaN(misrate) || misrate < 0 || misrate > 1)
      throw AssumptionException.Domain(Subject.Misrate);

    int n = x.Size;
    int m = y.Size;
    if (n < 2)
      throw AssumptionException.Domain(Subject.X);
    if (m < 2)
      throw AssumptionException.Domain(Subject.Y);

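    // Equal Bonferroni split: each SpreadBounds call gets half of the misrate budget
    double alpha = misrate / 2.0;
    // The per-sample minimum achievable misrate depends on floor(size / 2)
    double minX = MinAchievableMisrate.OneSample(n / 2);
    double minY = MinAchievableMisrate.OneSample(m / 2);
    // alpha must be achievable for both samples
    if (alpha < minX || alpha < minY)
      throw AssumptionException.Domain(Subject.Misrate);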

    // Check sparity (priority 2)
    Assertion.Sparity(x, Subject.X);
    Assertion.Sparity(y, Subject.Y);

    Bounds boundsX = seed == null
      ? SpreadBoundsEstimator.Instance.Estimate(x, alpha)
      : SpreadBoundsEstimator.Instance.Estimate(x, alpha, seed);
    Bounds boundsY = seed == null
      ? SpreadBoundsEstimator.Instance.Estimate(y, alpha)
      : SpreadBoundsEstimator.Instance.Estimate(y, alpha, seed);

    double weightX = (double)n / (n + m);
    double weightY = (double)m / (n + m);

    double lower = weightX * boundsX.Lower + weightY * boundsY.Lower;
    double upper = weightX * boundsX.Upper + weightY * boundsY.Upper;
    return new Bounds(lower, upper, x.Unit);
  }
}

Tests

$$\operatorname{AvgSpreadBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}) = [L_A, U_A]$$

Let $\alpha = \frac{\mathrm{misrate}}{2}$ (equal Bonferroni split). Compute $[L_x, U_x] = \operatorname{SpreadBounds}(\mathbf{x}, \alpha)$ and $[L_y, U_y] = \operatorname{SpreadBounds}(\mathbf{y}, \alpha)$ using disjoint-pair sign-test inversion (see $\operatorname{SpreadBounds}$). Let $w_x = \frac{n}{n + m}$ and $w_y = \frac{m}{n + m}$. Return $[L_A, U_A] = [w_x L_x + w_y L_y, w_x U_x + w_y U_y]$.

The $\operatorname{AvgSpreadBounds}$ test suite validates:

  • bounds are well-formed ($L_A \leq U_A$ and non-negative)
  • shift invariance and scale equivariance
  • monotonicity in $\mathrm{misrate}$
  • symmetry under swapping $\mathbf{x}$ and $\mathbf{y}$ (with equal split)
  • error cases for invalid inputs and misrate domain violations

Because $\operatorname{SpreadBounds}$ is randomized, tests fix a seed to make outputs deterministic. Both $\operatorname{SpreadBounds}$ calls use the same seed (two identical RNG streams).

Minimum misrate constraint: the equal split requires $\alpha \geq 2^{1-\lfloor \frac{n}{2} \rfloor}$ and $\alpha \geq 2^{1-\lfloor \frac{m}{2} \rfloor}$, so

$$\mathrm{misrate} \geq 2 \cdot \max(2^{1-\lfloor \frac{n}{2} \rfloor}, 2^{1-\lfloor \frac{m}{2} \rfloor}).$$