AvgSpread

$$\operatorname{AvgSpread}(\mathbf{x}, \mathbf{y}) = \frac{n \cdot \operatorname{Spread}(\mathbf{x}) + m \cdot \operatorname{Spread}(\mathbf{y})}{n + m}$$

Weighted average of dispersions (pooled scale).

  • Also known as: robust pooled standard deviation
  • Domain: any real numbers
  • Assumptions: sparity(x), sparity(y)
  • Unit: same as measurements
  • Caveat: $\operatorname{AvgSpread}(\mathbf{x}, \mathbf{y}) \neq \operatorname{Spread}(\mathbf{x} \cup \mathbf{y})$ (pooled scale, not concatenated spread)
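
The caveat is easiest to see with two tight samples that sit far apart. The following is a minimal standalone sketch, not the library code: it assumes that Spread is the median of all pairwise absolute differences (a definition this section does not spell out), and the helper names `Spread` and `AvgSpread` are illustrative stand-ins written as top-level C# statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// ASSUMPTION: Spread(v) = median of |v[i] - v[j]| over all pairs i < j.
static double Spread(double[] v)
{
    if (v.Length < 2)
        throw new ArgumentException("Spread needs at least two values.");
    var diffs = new List<double>();
    for (int i = 0; i < v.Length; i++)
        for (int j = i + 1; j < v.Length; j++)
            diffs.Add(Math.Abs(v[i] - v[j]));
    diffs.Sort();
    int k = diffs.Count;
    return k % 2 == 1 ? diffs[k / 2] : (diffs[k / 2 - 1] + diffs[k / 2]) / 2.0;
}

// Weighted average of the per-sample spreads.
static double AvgSpread(double[] x, double[] y)
{
    double n = x.Length, m = y.Length;
    return (n * Spread(x) + m * Spread(y)) / (n + m);
}

double[] a = { 0, 1, 2 };
double[] b = { 100, 101, 102 };
double[] concatenated = a.Concat(b).ToArray();

Console.WriteLine(AvgSpread(a, b));       // prints 1: typical within-group variability
Console.WriteLine(Spread(concatenated));  // prints 99: dominated by the gap between groups
```

The pooled value ignores the distance between the groups, which is what makes it usable as a reference scale for a shift.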

Properties

  • Self-average: $\operatorname{AvgSpread}(\mathbf{x}, \mathbf{x}) = \operatorname{Spread}(\mathbf{x})$
  • Symmetry: $\operatorname{AvgSpread}(\mathbf{x}, \mathbf{y}) = \operatorname{AvgSpread}(\mathbf{y}, \mathbf{x})$
  • Scale equivariance: $\operatorname{AvgSpread}(k \cdot \mathbf{x}, k \cdot \mathbf{y}) = \lvert k \rvert \cdot \operatorname{AvgSpread}(\mathbf{x}, \mathbf{y})$
  • Mixed scaling: $\operatorname{AvgSpread}(k_1 \cdot \mathbf{x}, k_2 \cdot \mathbf{x}) = \frac{\lvert k_1 \rvert + \lvert k_2 \rvert}{2} \cdot \operatorname{Spread}(\mathbf{x})$
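
The four properties can be checked numerically on small samples. This is a hedged sketch under the same kind of assumptions as any standalone illustration here: `Spread` is taken to be the median of pairwise absolute differences (not stated in this section), and `Spread`, `AvgSpread`, `Scale`, and `Close` are hypothetical helpers, not part of the library API.

```csharp
using System;
using System.Collections.Generic;

// ASSUMPTION: Spread(v) = median of |v[i] - v[j]| over all pairs i < j.
static double Spread(double[] v)
{
    if (v.Length < 2)
        throw new ArgumentException("Spread needs at least two values.");
    var diffs = new List<double>();
    for (int i = 0; i < v.Length; i++)
        for (int j = i + 1; j < v.Length; j++)
            diffs.Add(Math.Abs(v[i] - v[j]));
    diffs.Sort();
    int k = diffs.Count;
    return k % 2 == 1 ? diffs[k / 2] : (diffs[k / 2 - 1] + diffs[k / 2]) / 2.0;
}

static double AvgSpread(double[] x, double[] y)
{
    double n = x.Length, m = y.Length;
    return (n * Spread(x) + m * Spread(y)) / (n + m);
}

// Multiplies every element by the factor k.
static double[] Scale(double k, double[] v) => Array.ConvertAll(v, t => k * t);
static bool Close(double p, double q) => Math.Abs(p - q) < 1e-9;

double[] px = { 0, 3, 6, 9, 12 };
double[] py = { 0, 2, 4, 6, 8 };

bool selfAverage = Close(AvgSpread(px, px), Spread(px));
bool symmetry = Close(AvgSpread(px, py), AvgSpread(py, px));
// k = -3: the result scales by |k| = 3.
bool scaleEquivariance = Close(AvgSpread(Scale(-3, px), Scale(-3, py)), 3 * AvgSpread(px, py));
// k1 = 2, k2 = -4: the factor is (|2| + |-4|) / 2 = 3.
bool mixedScaling = Close(AvgSpread(Scale(2, px), Scale(-4, px)), (2 + 4) / 2.0 * Spread(px));

Console.WriteLine($"{selfAverage} {symmetry} {scaleEquivariance} {mixedScaling}");
// prints: True True True True
```

Negative factors are used on purpose, since the absolute values in the last two properties only matter when a factor changes sign.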

Example

  • AvgSpread(x, y) = 5 where Spread(x) = 6, Spread(y) = 4, n = m
  • AvgSpread(x, y) = AvgSpread(y, x)
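
The first example follows directly from the definition: with $n = m$ both weights equal $\tfrac{1}{2}$, so the pooled value is the plain mean of the two spreads.

$$\operatorname{AvgSpread}(\mathbf{x}, \mathbf{y}) = \frac{n \cdot 6 + n \cdot 4}{n + n} = \frac{10n}{2n} = 5$$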

$\operatorname{AvgSpread}$ provides a single number representing the typical variability across two groups. It combines the spread of both samples, giving more weight to larger samples since they provide more reliable estimates. This pooled spread serves as a common reference scale, essential for expressing a difference in relative terms. $\operatorname{Disparity}$ uses $\operatorname{AvgSpread}$ internally to normalize the shift into a scale-free effect size.

Algorithm

The $\operatorname{AvgSpread}$ function computes the weighted average of per-sample spreads:

$$\operatorname{AvgSpread}(\mathbf{x}, \mathbf{y}) = \frac{n \cdot \operatorname{Spread}(\mathbf{x}) + m \cdot \operatorname{Spread}(\mathbf{y})}{n + m}$$

The algorithm delegates to the Spread algorithm independently for each sample, then forms the weighted linear combination with weights $\frac{n}{n + m}$ and $\frac{m}{n + m}$.

```csharp
using Pragmastat.Algorithms;
using Pragmastat.Exceptions;
using Pragmastat.Internal;
using Pragmastat.Metrology;

namespace Pragmastat.Estimators;

internal class AvgSpreadEstimator : ITwoSampleEstimator
{
  public static readonly AvgSpreadEstimator Instance = new();

  public Measurement Estimate(Sample x, Sample y)
  {
    Assertion.MatchedUnit(x, y);
    // Check validity for x (priority 0, subject x)
    Assertion.Validity(x, Subject.X);
    // Check validity for y (priority 0, subject y)
    Assertion.Validity(y, Subject.Y);
    // Check sparity for x (priority 2, subject x)
    Assertion.Sparity(x, Subject.X);
    // Check sparity for y (priority 2, subject y)
    Assertion.Sparity(y, Subject.Y);

    // Calculate spreads (using internal implementation since we already validated)
    var spreadX = FastSpread.Estimate(x.SortedValues, isSorted: true);
    var spreadY = FastSpread.Estimate(y.SortedValues, isSorted: true);
    return ((x.Size * spreadX + y.Size * spreadY) / (x.Size + y.Size)).WithUnitOf(x);
  }
}
```
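
For readers without the library at hand, the same weighted combination can be sketched in standalone C#. This is a minimal sketch, not the library code: the `Spread` helper assumes that Spread is the median of all pairwise absolute differences (a definition this section does not spell out), it uses a naive quadratic loop in place of `FastSpread`, and it omits the unit, validity, and sparity assertions. All names below are illustrative.

```csharp
using System;
using System.Collections.Generic;

// ASSUMPTION: Spread(v) = median of |v[i] - v[j]| over all pairs i < j.
// Naive stand-in for the library's FastSpread; input order does not matter.
static double Spread(double[] v)
{
    if (v.Length < 2)
        throw new ArgumentException("Spread needs at least two values.");
    var diffs = new List<double>();
    for (int i = 0; i < v.Length; i++)
        for (int j = i + 1; j < v.Length; j++)
            diffs.Add(Math.Abs(v[i] - v[j]));
    diffs.Sort();
    int k = diffs.Count;
    return k % 2 == 1 ? diffs[k / 2] : (diffs[k / 2 - 1] + diffs[k / 2]) / 2.0;
}

// Weighted average of the two spreads with weights n/(n+m) and m/(n+m).
static double AvgSpread(double[] x, double[] y)
{
    double n = x.Length, m = y.Length;
    return (n * Spread(x) + m * Spread(y)) / (n + m);
}

// Inputs of the demo-1 test case.
double[] demoX = { 0, 3, 6, 9, 12 };
double[] demoY = { 0, 2, 4, 6, 8 };
Console.WriteLine(AvgSpread(demoX, demoY)); // prints 5
```

The sample sizes are converted to `double` before the division so that the weights are not truncated by integer arithmetic.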

Tests

$$\operatorname{AvgSpread}(\mathbf{x}, \mathbf{y}) = \frac{n \cdot \operatorname{Spread}(\mathbf{x}) + m \cdot \operatorname{Spread}(\mathbf{y})}{n + m}$$

The $\operatorname{AvgSpread}$ test suite contains 36 test cases (5 demo + 4 natural + 1 negative + 9 additive + 4 uniform + 1 composite + 12 unsorted). Since $\operatorname{AvgSpread}$ is a weighted average of two $\operatorname{Spread}$ estimates, tests validate both the individual spread calculations and the weighting formula.

Demo examples ($n = m = 5$) from the manual introduction, validating properties:

  • demo-1: $\mathbf{x} = (0, 3, 6, 9, 12)$, $\mathbf{y} = (0, 2, 4, 6, 8)$, expected output: $5$ (base case)
  • demo-2: $\mathbf{x} = (0, 3, 6, 9, 12)$, $\mathbf{y} = (0, 3, 6, 9, 12)$, expected output: $6$ (equal samples)
  • demo-3: $\mathbf{x} = (0, 6, 12, 18, 24)$, $\mathbf{y} = (0, 9, 18, 27, 36)$, expected output: $15$ (scale equivariance, $3 \times$ demo-1 with the samples swapped)
  • demo-4: $\mathbf{x} = (0, 2, 4, 6, 8)$, $\mathbf{y} = (0, 3, 6, 9, 12)$, expected output: $5$ (swap symmetry with demo-1)
  • demo-5: $\mathbf{x} = (0, 6, 12, 18, 24)$, $\mathbf{y} = (0, 4, 8, 12, 16)$, expected output: $10$ (scale, $2 \times$ demo-1)

Natural sequences ($[n, m] \in \{2, 3\} \times \{2, 3\}$), 4 combinations:

  • natural-2-2, natural-2-3, natural-3-2, natural-3-3
  • Minimum size $n, m \geq 2$ required for meaningful dispersion

Negative values ($[n, m] = [2, 2]$), sign handling validation:

  • negative-2-2: $\mathbf{x} = (-2, -1)$, $\mathbf{y} = (-2, -1)$, expected output: $1$

Additive distribution ($[n, m] \in \{5, 10, 30\} \times \{5, 10, 30\}$), 9 combinations with $\underline{\operatorname{Additive}}(10, 1)$:

  • additive-5-5, additive-5-10, additive-5-30
  • additive-10-5, additive-10-10, additive-10-30
  • additive-30-5, additive-30-10, additive-30-30
  • Random generation: $\mathbf{x}$ uses seed 0, $\mathbf{y}$ uses seed 1

Uniform distribution ($[n, m] \in \{5, 100\} \times \{5, 100\}$), 4 combinations with $\underline{\operatorname{Uniform}}(0, 1)$:

  • uniform-5-5, uniform-5-100, uniform-100-5, uniform-100-100
  • Random generation: $\mathbf{x}$ uses seed 0, $\mathbf{y}$ uses seed 1

Composite estimator stress test (1 test):

  • composite-asymmetric-weights: $\mathbf{x} = (1, 2)$, $\mathbf{y} = (3, 4, 5, 6, 7, 8, 9, 10)$ ($n = 2$, $m = 8$, highly asymmetric weights $w_x = 0.2$, $w_y = 0.8$)
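
To see why this case stresses the weighting, the arithmetic can be worked by hand. The step below assumes that Spread is the median of pairwise absolute differences, which this section does not state; under that assumption $\operatorname{Spread}(\mathbf{x}) = 1$ and $\operatorname{Spread}(\mathbf{y}) = 3$. The resulting value is derived from that assumption, not quoted from the test data.

$$\operatorname{AvgSpread}(\mathbf{x}, \mathbf{y}) = \frac{2 \cdot 1 + 8 \cdot 3}{2 + 8} = 0.2 \cdot 1 + 0.8 \cdot 3 = 2.6$$

An unweighted mean of the two spreads would give $2$ instead, so an error in the weights changes the result visibly.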

Unsorted tests verify sorting independence (12 tests):

  • unsorted-x-natural-{n}-{m} for $(n, m) \in \{(3,3), (4,4)\}$: X unsorted (reversed), Y sorted (2 tests)
  • unsorted-y-natural-{n}-{m} for $(n, m) \in \{(3,3), (4,4)\}$: X sorted, Y unsorted (reversed) (2 tests)
  • unsorted-both-natural-{n}-{m} for $(n, m) \in \{(3,3), (4,4)\}$: both unsorted (reversed) (2 tests)
  • unsorted-demo-unsorted-x: demo-1 with X unsorted
  • unsorted-demo-unsorted-y: demo-1 with Y unsorted
  • unsorted-demo-both-unsorted: demo-1 with both unsorted
  • unsorted-identity-unsorted: equal samples, both unsorted
  • unsorted-negative-unsorted: negative values, both unsorted
  • unsorted-asymmetric-weights-unsorted: asymmetric weights, both unsorted
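
The idea behind the unsorted cases can be sketched as a comparison between sorted and reversed inputs. This is an illustration only: `Spread` is assumed to be the median of pairwise absolute differences (not stated in this section), and the helpers are hypothetical stand-ins. The sketch is order-independent by construction, whereas the library implementation reads `x.SortedValues`, so the real tests exercise that sorting path.

```csharp
using System;
using System.Collections.Generic;

// ASSUMPTION: Spread(v) = median of |v[i] - v[j]| over all pairs i < j.
static double Spread(double[] v)
{
    if (v.Length < 2)
        throw new ArgumentException("Spread needs at least two values.");
    var diffs = new List<double>();
    for (int i = 0; i < v.Length; i++)
        for (int j = i + 1; j < v.Length; j++)
            diffs.Add(Math.Abs(v[i] - v[j]));
    diffs.Sort();
    int k = diffs.Count;
    return k % 2 == 1 ? diffs[k / 2] : (diffs[k / 2 - 1] + diffs[k / 2]) / 2.0;
}

static double AvgSpread(double[] x, double[] y)
{
    double n = x.Length, m = y.Length;
    return (n * Spread(x) + m * Spread(y)) / (n + m);
}

// demo-1 inputs in sorted order.
double[] sortedX = { 0, 3, 6, 9, 12 };
double[] sortedY = { 0, 2, 4, 6, 8 };

// Reversed copies; the originals stay untouched.
double[] reversedX = (double[])sortedX.Clone();
double[] reversedY = (double[])sortedY.Clone();
Array.Reverse(reversedX);
Array.Reverse(reversedY);

Console.WriteLine(AvgSpread(sortedX, sortedY));     // prints 5
Console.WriteLine(AvgSpread(reversedX, reversedY)); // prints 5
```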

Because $\operatorname{AvgSpread}$ is a composite estimator, its tests exercise both the individual $\operatorname{Spread}$ computations and the weighted combination. Unsorted variants verify end-to-end correctness including the weighting formula.