Disparity

$$\operatorname{Disparity}(\mathbf{x}, \mathbf{y}) = \operatorname{Shift}(\mathbf{x}, \mathbf{y}) / \operatorname{AvgSpread}(\mathbf{x}, \mathbf{y})$$

where $\operatorname{AvgSpread}(\mathbf{x}, \mathbf{y}) = \frac{n \cdot \operatorname{Spread}(\mathbf{x}) + m \cdot \operatorname{Spread}(\mathbf{y})}{n + m}$ is the weighted average of dispersions (pooled scale), with $n$ and $m$ the sizes of $\mathbf{x}$ and $\mathbf{y}$.

Robust effect size (shift normalized by pooled dispersion).

  • Also known as — robust Cohen's d (Cohen 1988; estimates differ due to robust construction)
  • Domain — $\operatorname{AvgSpread}(\mathbf{x}, \mathbf{y}) > 0$
  • Assumptions — sparity(x), sparity(y)
  • Unit — spread units

Properties

  • Location invariance $\operatorname{Disparity}(\mathbf{x} + k, \mathbf{y} + k) = \operatorname{Disparity}(\mathbf{x}, \mathbf{y})$
  • Scale invariance $\operatorname{Disparity}(k \cdot \mathbf{x}, k \cdot \mathbf{y}) = \operatorname{sign}(k) \cdot \operatorname{Disparity}(\mathbf{x}, \mathbf{y})$
  • Antisymmetry $\operatorname{Disparity}(\mathbf{x}, \mathbf{y}) = -\operatorname{Disparity}(\mathbf{y}, \mathbf{x})$
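
These properties can be traced back to the two components. The sketch below assumes the usual behavior of Shift and Spread, which this section does not restate: Shift is unchanged by a common translation, is multiplied by $k$ under scaling, and flips sign when the samples are swapped; Spread is translation-invariant and is multiplied by $|k|$ under scaling. It also assumes $k \neq 0$, since otherwise AvgSpread is zero and the estimator is undefined.

```latex
\begin{aligned}
\operatorname{Disparity}(\mathbf{x} + k, \mathbf{y} + k)
  &= \frac{\operatorname{Shift}(\mathbf{x}, \mathbf{y})}{\operatorname{AvgSpread}(\mathbf{x}, \mathbf{y})}
   = \operatorname{Disparity}(\mathbf{x}, \mathbf{y}) \\
\operatorname{Disparity}(k \cdot \mathbf{x}, k \cdot \mathbf{y})
  &= \frac{k \cdot \operatorname{Shift}(\mathbf{x}, \mathbf{y})}{|k| \cdot \operatorname{AvgSpread}(\mathbf{x}, \mathbf{y})}
   = \operatorname{sign}(k) \cdot \operatorname{Disparity}(\mathbf{x}, \mathbf{y}) \\
\operatorname{Disparity}(\mathbf{y}, \mathbf{x})
  &= \frac{-\operatorname{Shift}(\mathbf{x}, \mathbf{y})}{\operatorname{AvgSpread}(\mathbf{x}, \mathbf{y})}
   = -\operatorname{Disparity}(\mathbf{x}, \mathbf{y})
\end{aligned}
```

The last line uses the fact that AvgSpread is symmetric in its arguments: swapping the samples swaps the two weighted terms but leaves their sum and the total size $n + m$ unchanged.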

Example

  • Disparity(x, y) = 0.4 where Shift = 2, AvgSpread = 5
  • Disparity(x + c, y + c) = Disparity(x, y)
  • Disparity(kx, ky) = sign(k) · Disparity(x, y)
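
The values Shift = 2 and AvgSpread = 5 can be reproduced by hand for the demo-1 sample from the test suite. The derivation below is a sketch that assumes the definitions used elsewhere in the manual and not restated in this section: Shift as the median of all pairwise differences $x_i - y_j$, and Spread as the median of the pairwise absolute differences within one sample.

```latex
\begin{aligned}
\mathbf{x} &= (0, 3, 6, 9, 12), \quad \mathbf{y} = (0, 2, 4, 6, 8), \quad n = m = 5 \\
\operatorname{Shift}(\mathbf{x}, \mathbf{y})
  &= \operatorname{median}\{x_i - y_j\} = 2
  \quad \text{(13th of the 25 sorted differences)} \\
\operatorname{Spread}(\mathbf{x})
  &= \operatorname{median}\{3, 3, 3, 3, 6, 6, 6, 9, 9, 12\} = 6 \\
\operatorname{Spread}(\mathbf{y})
  &= \operatorname{median}\{2, 2, 2, 2, 4, 4, 4, 6, 6, 8\} = 4 \\
\operatorname{AvgSpread}(\mathbf{x}, \mathbf{y})
  &= \frac{5 \cdot 6 + 5 \cdot 4}{5 + 5} = 5 \\
\operatorname{Disparity}(\mathbf{x}, \mathbf{y})
  &= \frac{2}{5} = 0.4
\end{aligned}
```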

$\operatorname{Disparity}$ expresses a difference between groups in a way that does not depend on the original measurement units. A disparity of 0.5 means the groups differ by half a spread unit; 1.0 means one full spread unit. Being dimensionless allows comparison of effect sizes across different studies, metrics, or measurement scales. What counts as a large or small disparity depends entirely on the domain and what matters practically in a given application. Do not rely on universal thresholds; interpret the number in context.

Algorithm

The $\operatorname{Disparity}$ estimator is a composition of Shift and Spread:

$$\operatorname{Disparity}(\mathbf{x}, \mathbf{y}) = \operatorname{Shift}(\mathbf{x}, \mathbf{y}) / \operatorname{AvgSpread}(\mathbf{x}, \mathbf{y})$$

where $\operatorname{AvgSpread}(\mathbf{x}, \mathbf{y}) = \frac{n \cdot \operatorname{Spread}(\mathbf{x}) + m \cdot \operatorname{Spread}(\mathbf{y})}{n + m}$ is the pooled scale.

The algorithm proceeds as follows:

  • Compute Spread for each sample. Delegate to the Spread algorithm for $\mathbf{x}$ and $\mathbf{y}$ independently.

  • Compute AvgSpread. Form the weighted average $\operatorname{AvgSpread} = \frac{n \cdot \operatorname{Spread}(\mathbf{x}) + m \cdot \operatorname{Spread}(\mathbf{y})}{n + m}$.

  • Domain check. Verify that $\operatorname{AvgSpread} > 0$. If the pooled spread is zero, the division is undefined.

  • Compute Shift. Delegate to the Shift algorithm for the pair $(\mathbf{x}, \mathbf{y})$.

  • Divide. Return $\operatorname{Shift}(\mathbf{x}, \mathbf{y}) / \operatorname{AvgSpread}(\mathbf{x}, \mathbf{y})$.
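
The steps above can be sketched as a naive, self-contained reference. This is not the library's implementation. `NaiveDisparity` and its members are hypothetical names introduced for illustration, and the definitions of Shift (median of pairwise differences) and Spread (median of pairwise absolute differences) are assumptions taken from outside this section. The sketch enumerates all pairs, so it is only suitable for small samples.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Naive reference sketch (hypothetical helper, not part of Pragmastat).
public static class NaiveDisparity
{
  // Median of a non-empty list; sorts the list in place.
  private static double Median(List<double> values)
  {
    values.Sort();
    int count = values.Count;
    return count % 2 == 1
      ? values[count / 2]
      : (values[count / 2 - 1] + values[count / 2]) / 2.0;
  }

  // Assumed definition: median of all pairwise differences x_i - y_j.
  public static double Shift(double[] x, double[] y) =>
    Median(x.SelectMany(xi => y.Select(yj => xi - yj)).ToList());

  // Assumed definition: median of |x_i - x_j| over all pairs i < j.
  public static double Spread(double[] x)
  {
    var diffs = new List<double>();
    for (int i = 0; i < x.Length; i++)
      for (int j = i + 1; j < x.Length; j++)
        diffs.Add(Math.Abs(x[i] - x[j]));
    return Median(diffs);
  }

  public static double Disparity(double[] x, double[] y)
  {
    if (x.Length < 2 || y.Length < 2)
      throw new ArgumentException("Each sample needs at least two values.");

    // Compute Spread for each sample, then the size-weighted average.
    double avgSpread =
      (x.Length * Spread(x) + y.Length * Spread(y)) / (x.Length + y.Length);

    // Domain check: the ratio is undefined when the pooled spread is zero.
    if (avgSpread <= 0)
      throw new ArgumentException("AvgSpread must be positive.");

    // Compute Shift and divide.
    return Shift(x, y) / avgSpread;
  }
}
```

For the demo-1 sample, `NaiveDisparity.Disparity(new double[] { 0, 3, 6, 9, 12 }, new double[] { 0, 2, 4, 6, 8 })` gives Shift = 2 and AvgSpread = 5, hence 0.4. Because the helper sorts the differences itself, the input order does not matter.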

using Pragmastat.Algorithms;
using Pragmastat.Exceptions;
using Pragmastat.Internal;
using Pragmastat.Metrology;

namespace Pragmastat.Estimators;

public class DisparityEstimator : ITwoSampleEstimator
{
  public static readonly DisparityEstimator Instance = new();

  public Measurement Estimate(Sample x, Sample y)
  {
    Assertion.MatchedUnit(x, y);
    // Check validity for x (priority 0, subject x)
    Assertion.Validity(x, Subject.X);
    // Check validity for y (priority 0, subject y)
    Assertion.Validity(y, Subject.Y);
    // Check sparity for x (priority 2, subject x)
    Assertion.Sparity(x, Subject.X);
    // Check sparity for y (priority 2, subject y)
    Assertion.Sparity(y, Subject.Y);

    // Calculate shift (we know inputs are valid)
    var shiftVal = FastShift.Estimate(x.SortedValues, y.SortedValues, [0.5], true)[0];
    // Calculate avg_spread (using internal implementation since we already validated)
    var spreadX = FastSpread.Estimate(x.SortedValues, isSorted: true);
    var spreadY = FastSpread.Estimate(y.SortedValues, isSorted: true);
    var avgSpreadVal = (x.Size * spreadX + y.Size * spreadY) / (x.Size + y.Size);

    return (shiftVal / avgSpreadVal).WithUnit(DisparityUnit.Instance);
  }
}

Tests

$$\operatorname{Disparity}(\mathbf{x}, \mathbf{y}) = \operatorname{Shift}(\mathbf{x}, \mathbf{y}) / \operatorname{AvgSpread}(\mathbf{x}, \mathbf{y})$$

The $\operatorname{Disparity}$ test suite contains 28 test cases (16 original + 12 unsorted). Since $\operatorname{Disparity}$ combines $\operatorname{Shift}$ and $\operatorname{AvgSpread}$, unsorted tests verify both components handle sorting correctly.

Demo examples ($n = m = 5$) — from manual introduction, validating properties:

  • demo-1: $\mathbf{x} = (0, 3, 6, 9, 12)$, $\mathbf{y} = (0, 2, 4, 6, 8)$, expected output: $0.4$ (base case: $\frac{2}{5}$)
  • demo-2: $\mathbf{x} = (5, 8, 11, 14, 17)$, $\mathbf{y} = (5, 7, 9, 11, 13)$ (= demo-1 + 5), expected output: $0.4$ (location invariance)
  • demo-3: $\mathbf{x} = (0, 6, 12, 18, 24)$, $\mathbf{y} = (0, 4, 8, 12, 16)$ (= 2 × demo-1), expected output: $0.4$ (scale invariance)
  • demo-4: $\mathbf{x} = (0, 2, 4, 6, 8)$, $\mathbf{y} = (0, 3, 6, 9, 12)$ (= demo-1 with $\mathbf{x}$ and $\mathbf{y}$ swapped), expected output: $-0.4$ (antisymmetry)

Natural sequences ($[n, m] \in \{2, 3\} \times \{2, 3\}$) — 4 combinations:

  • natural-2-2, natural-2-3, natural-3-2, natural-3-3
  • Minimum size $n, m \geq 2$ required for meaningful dispersion calculations

Negative values ($[n, m] = [2, 2]$) — end-to-end validation with negative values:

  • negative-2-2: $\mathbf{x} = (-2, -1)$, $\mathbf{y} = (-2, -1)$, expected output: $0$
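
The expected output of this case is easy to confirm by hand. The check below is a sketch under the same assumed definitions as before (Shift as the median of pairwise differences, Spread as the median of pairwise absolute differences), neither of which is restated in this section.

```latex
\begin{aligned}
\operatorname{Shift}(\mathbf{x}, \mathbf{y})
  &= \operatorname{median}\{-1, 0, 0, 1\} = 0 \\
\operatorname{Spread}(\mathbf{x}) = \operatorname{Spread}(\mathbf{y})
  &= |(-2) - (-1)| = 1 \\
\operatorname{AvgSpread}(\mathbf{x}, \mathbf{y})
  &= \frac{2 \cdot 1 + 2 \cdot 1}{2 + 2} = 1 \\
\operatorname{Disparity}(\mathbf{x}, \mathbf{y})
  &= \frac{0}{1} = 0
\end{aligned}
```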

Uniform distribution ($[n, m] \in \{5, 100\} \times \{5, 100\}$) — 4 combinations with $\underline{\operatorname{Uniform}}(0, 1)$:

  • uniform-5-5, uniform-5-100, uniform-100-5, uniform-100-100
  • Random generation: $\mathbf{x}$ uses seed 0, $\mathbf{y}$ uses seed 1

The test set for $\operatorname{Disparity}$ is deliberately small. Because $\operatorname{Disparity}$ combines $\operatorname{Shift}$ and $\operatorname{AvgSpread}$, the correctness of those components carries over to it, so the test cases here mainly validate the division step and confirm the scale-free properties.

Composite estimator stress tests — edge cases for effect size calculation:

  • composite-small-avgspread: $\mathbf{x} = (10.001, 10.002, 10.003)$, $\mathbf{y} = (10.004, 10.005, 10.006)$ (tiny spread, large shift)
  • composite-large-avgspread: $\mathbf{x} = (1, 100, 200)$, $\mathbf{y} = (50, 150, 250)$ (large spread, small shift)
  • composite-extreme-disparity: $\mathbf{x} = (1, 1.001)$, $\mathbf{y} = (100, 100.001)$ (extreme ratio, tests precision)

Unsorted tests — verify both Shift and AvgSpread handle sorting (12 tests):

  • unsorted-x-natural-{n}-{m} for $(n, m) \in \{(3, 3), (4, 4)\}$: X unsorted (reversed), Y sorted (2 tests)
  • unsorted-y-natural-{n}-{m} for $(n, m) \in \{(3, 3), (4, 4)\}$: X sorted, Y unsorted (reversed) (2 tests)
  • unsorted-both-natural-{n}-{m} for $(n, m) \in \{(3, 3), (4, 4)\}$: both unsorted (reversed) (2 tests)
  • unsorted-demo-unsorted-x: $\mathbf{x} = (12, 0, 6, 3, 9)$, $\mathbf{y} = (0, 2, 4, 6, 8)$ (demo-1 with X unsorted)
  • unsorted-demo-unsorted-y: $\mathbf{x} = (0, 3, 6, 9, 12)$, $\mathbf{y} = (8, 0, 4, 2, 6)$ (demo-1 with Y unsorted)
  • unsorted-demo-both-unsorted: $\mathbf{x} = (9, 0, 12, 3, 6)$, $\mathbf{y} = (6, 0, 8, 2, 4)$ (demo-1 both unsorted)
  • unsorted-location-invariance-unsorted: $\mathbf{x} = (17, 5, 11, 8, 14)$, $\mathbf{y} = (13, 5, 9, 7, 11)$ (demo-2 unsorted)
  • unsorted-scale-invariance-unsorted: $\mathbf{x} = (24, 0, 12, 6, 18)$, $\mathbf{y} = (16, 0, 8, 4, 12)$ (demo-3 unsorted)
  • unsorted-anti-symmetry-unsorted: $\mathbf{x} = (8, 0, 4, 2, 6)$, $\mathbf{y} = (12, 0, 6, 3, 9)$ (demo-4 reversed and unsorted)

As a composite estimator, $\operatorname{Disparity}$ tests both the numerator ($\operatorname{Shift}$) and denominator ($\operatorname{AvgSpread}$). Unsorted variants verify end-to-end correctness including invariance properties.

References

Statistical Power Analysis for the Behavioral Sciences
Cohen, Jacob (1988)