Tests

Center Tests

$$\operatorname{Center}(\mathbf{x}) = \operatorname{median}_{1 \leq i \leq j \leq n} \frac{x_i + x_j}{2}$$

The $\operatorname{Center}$ test suite contains 38 correctness test cases stored in the repository (24 original + 14 unsorted), plus 1 performance test that should be implemented manually (see Test Framework).

Demo examples ($n = 5$), from the manual introduction, validating properties:

  • demo-1: $\mathbf{x} = (0, 2, 4, 6, 8)$, expected output: $4$ (base case)
  • demo-2: $\mathbf{x} = (10, 12, 14, 16, 18)$ (= demo-1 + 10), expected output: $14$ (location equivariance)
  • demo-3: $\mathbf{x} = (0, 6, 12, 18, 24)$ (= 3 × demo-1), expected output: $12$ (scale equivariance)
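The demo values above can be reproduced with a brute-force reference that follows the definition literally. This is a minimal sketch for cross-checking small cases only; the name `center_naive` is hypothetical, and the quadratic enumeration is not the fast algorithm that the test suite is meant to exercise.

```python
from itertools import combinations_with_replacement
from statistics import median


def center_naive(x):
    """Median of all pairwise averages (x_i + x_j) / 2 with i <= j."""
    pairs = combinations_with_replacement(x, 2)
    return median((a + b) / 2 for a, b in pairs)


demo_1 = [0, 2, 4, 6, 8]
print(center_naive(demo_1))                    # base case
print(center_naive([v + 10 for v in demo_1]))  # shifted by 10
print(center_naive([3 * v for v in demo_1]))   # scaled by 3
```

Because the reference enumerates every pair, it is only practical for the small stored cases, not for the performance test.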

Natural sequences ($n = 1, 2, 3, 4$), canonical happy path examples:

  • natural-1: $\mathbf{x} = (1)$, expected output: $1$
  • natural-2: $\mathbf{x} = (1, 2)$, expected output: $1.5$
  • natural-3: $\mathbf{x} = (1, 2, 3)$, expected output: $2$
  • natural-4: $\mathbf{x} = (1, 2, 3, 4)$, expected output: $2.5$ (smallest even size with rich structure)

Negative values ($n = 3$), sign handling validation:

  • negative-3: $\mathbf{x} = (-3, -2, -1)$, expected output: $-2$

Zero values ($n = 1, 2$), edge case testing with zeros:

  • zeros-1: $\mathbf{x} = (0)$, expected output: $0$
  • zeros-2: $\mathbf{x} = (0, 0)$, expected output: $0$

Additive distribution ($n = 5, 10, 30$), fuzz testing with $\underline{\operatorname{Additive}}(10, 1)$:

  • additive-5, additive-10, additive-30: random samples generated with seed 0

Uniform distribution ($n = 5, 100$), fuzz testing with $\underline{\operatorname{Uniform}}(0, 1)$:

  • uniform-5, uniform-100: random samples generated with seed 1

The random samples validate that $\operatorname{Center}$ performs correctly on realistic distributions at various sample sizes. The progression from small ($n = 5$) to large ($n = 100$) samples helps identify issues that only manifest at specific scales.

Algorithm stress tests, edge cases for the fast algorithm implementation:

  • duplicates-5: $\mathbf{x} = (3, 3, 3, 3, 3)$ (all identical, stress stall handling)
  • duplicates-10: $\mathbf{x} = (1, 1, 1, 2, 2, 2, 3, 3, 3, 3)$ (many duplicates, stress tie-breaking)
  • parity-odd-7: $\mathbf{x} = (1, 2, 3, 4, 5, 6, 7)$ (odd sample size, 28 pairs, an even pair count)
  • parity-even-6: $\mathbf{x} = (1, 2, 3, 4, 5, 6)$ (even sample size, 21 pairs, an odd pair count)
  • parity-odd-49: 49-element sequence $(1, 2, \ldots, 49)$ (large odd, 1225 pairs)
  • parity-even-50: 50-element sequence $(1, 2, \ldots, 50)$ (large even, 1275 pairs)
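The pair counts in the parity cases follow from the index range $1 \leq i \leq j \leq n$, which gives $n(n+1)/2$ pairwise averages. The short check below (the helper name `pair_count` is an assumption for illustration) prints the count and its parity for each parity test size, since the parity of the count decides whether the median is a single element or the mean of two.

```python
def pair_count(n):
    """Number of index pairs (i, j) with 1 <= i <= j <= n."""
    return n * (n + 1) // 2


for n in (6, 7, 49, 50):
    total = pair_count(n)
    parity = "odd" if total % 2 else "even"
    print(f"n={n}: {total} pairs ({parity})")
```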

Extreme values, numerical stability and range tests:

  • extreme-large-5: $\mathbf{x} = (10^8, 2 \cdot 10^8, 3 \cdot 10^8, 4 \cdot 10^8, 5 \cdot 10^8)$ (very large values)
  • extreme-small-5: $\mathbf{x} = (10^{-8}, 2 \cdot 10^{-8}, 3 \cdot 10^{-8}, 4 \cdot 10^{-8}, 5 \cdot 10^{-8})$ (very small positive values)
  • extreme-wide-5: $\mathbf{x} = (0.001, 1, 100, 1000, 1000000)$ (wide range, tests precision)

Unsorted tests, verifying sorting correctness (14 tests):

  • unsorted-reverse-{n} for $n \in \{2, 3, 4, 5, 7\}$: reverse sorted natural sequences (5 tests)
  • unsorted-shuffle-3: $\mathbf{x} = (2, 1, 3)$ (middle element first)
  • unsorted-shuffle-4: $\mathbf{x} = (3, 1, 4, 2)$ (interleaved)
  • unsorted-shuffle-5: $\mathbf{x} = (5, 2, 4, 1, 3)$ (complex shuffle)
  • unsorted-last-first-5: $\mathbf{x} = (5, 1, 2, 3, 4)$ (last moved to first)
  • unsorted-first-last-5: $\mathbf{x} = (2, 3, 4, 5, 1)$ (first moved to last)
  • unsorted-duplicates-mixed-5: $\mathbf{x} = (3, 3, 3, 3, 3)$ (all identical, any order)
  • unsorted-duplicates-unsorted-10: $\mathbf{x} = (3, 1, 2, 3, 1, 3, 2, 1, 3, 2)$ (duplicates mixed)
  • unsorted-extreme-large-unsorted-5: $\mathbf{x} = (5 \cdot 10^8, 10^8, 4 \cdot 10^8, 2 \cdot 10^8, 3 \cdot 10^8)$ (large values unsorted)
  • unsorted-parity-odd-reverse-7: $\mathbf{x} = (7, 6, 5, 4, 3, 2, 1)$ (odd size reverse)

These tests ensure implementations correctly sort input data before computing pairwise averages. The variety of shuffle patterns (reverse, rotation, interleaving, single element displacement) catches common sorting bugs.

Performance test, validating the fast $O(n \log n)$ algorithm:

  • Input: $\mathbf{x} = (1, 2, 3, \ldots, 100000)$
  • Expected output: $50000.5$
  • Time constraint: Must complete in under 5 seconds
  • Purpose: Ensures that the implementation uses the efficient algorithm rather than materializing all $\binom{n+1}{2} \approx 5$ billion pairwise averages

This test case is not stored in the repository because it generates a large JSON file (approximately 1.5 MB). Each language implementation should manually implement this test with the hardcoded expected result.

Spread Tests

$$\operatorname{Spread}(\mathbf{x}) = \operatorname{median}_{1 \leq i < j \leq n} \lvert x_i - x_j \rvert$$

The $\operatorname{Spread}$ test suite contains 30 correctness test cases stored in the repository (20 original + 10 unsorted), plus 1 performance test that should be implemented manually (see Test Framework).

Demo examples ($n = 5$), from the manual introduction, validating properties:

  • demo-1: $\mathbf{x} = (0, 2, 4, 6, 8)$, expected output: $4$ (base case)
  • demo-2: $\mathbf{x} = (10, 12, 14, 16, 18)$ (= demo-1 + 10), expected output: $4$ (location invariance)
  • demo-3: $\mathbf{x} = (0, 4, 8, 12, 16)$ (= 2 × demo-1), expected output: $8$ (scale equivariance)
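As with the center, the demo values above can be cross-checked against a literal transcription of the definition. The sketch below is an assumption-level reference (the name `spread_naive` is hypothetical) that enumerates all pairs with $i < j$; it is quadratic and intended only for small inputs.

```python
from itertools import combinations
from statistics import median


def spread_naive(x):
    """Median of all pairwise absolute differences |x_i - x_j| with i < j."""
    return median(abs(a - b) for a, b in combinations(x, 2))


demo_1 = [0, 2, 4, 6, 8]
print(spread_naive(demo_1))                    # base case
print(spread_naive([v + 10 for v in demo_1]))  # shifted by 10
print(spread_naive([2 * v for v in demo_1]))   # scaled by 2
```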

Natural sequences ($n = 2, 3, 4$):

  • natural-2: $\mathbf{x} = (1, 2)$, expected output: $1$
  • natural-3: $\mathbf{x} = (1, 2, 3)$, expected output: $1$
  • natural-4: $\mathbf{x} = (1, 2, 3, 4)$, expected output: $1.5$ (smallest even size with rich structure)

Negative values ($n = 3$), sign handling validation:

  • negative-3: $\mathbf{x} = (-3, -2, -1)$, expected output: $1$

Additive distribution ($n = 5, 10, 30$), $\underline{\operatorname{Additive}}(10, 1)$:

  • additive-5, additive-10, additive-30: random samples generated with seed 0

Uniform distribution ($n = 5, 100$), $\underline{\operatorname{Uniform}}(0, 1)$:

  • uniform-5, uniform-100: random samples generated with seed 1

The natural sequence cases validate the basic pairwise difference calculation. Constant samples and $n = 1$ are excluded because $\operatorname{Spread}$ requires $\operatorname{Spread}(\mathbf{x}) > 0$.

Algorithm stress tests, edge cases for the fast algorithm implementation:

  • duplicates-10: $\mathbf{x} = (1, 1, 1, 2, 2, 2, 3, 3, 3, 3)$ (many duplicates, stress tie-breaking)
  • parity-odd-7: $\mathbf{x} = (1, 2, 3, 4, 5, 6, 7)$ (odd sample size, 21 differences)
  • parity-even-6: $\mathbf{x} = (1, 2, 3, 4, 5, 6)$ (even sample size, 15 differences)
  • parity-odd-49: 49-element sequence $(1, 2, \ldots, 49)$ (large odd, 1176 differences)
  • parity-even-50: 50-element sequence $(1, 2, \ldots, 50)$ (large even, 1225 differences)

Extreme values, numerical stability and range tests:

  • extreme-large-5: $\mathbf{x} = (10^8, 2 \cdot 10^8, 3 \cdot 10^8, 4 \cdot 10^8, 5 \cdot 10^8)$ (very large values)
  • extreme-small-5: $\mathbf{x} = (10^{-8}, 2 \cdot 10^{-8}, 3 \cdot 10^{-8}, 4 \cdot 10^{-8}, 5 \cdot 10^{-8})$ (very small positive values)
  • extreme-wide-5: $\mathbf{x} = (0.001, 1, 100, 1000, 1000000)$ (wide range, tests precision)

Unsorted tests, verifying sorting correctness (10 tests):

  • unsorted-reverse-{n} for $n \in \{2, 3, 4, 5, 7\}$: reverse sorted natural sequences (5 tests)
  • unsorted-shuffle-3: $\mathbf{x} = (3, 1, 2)$ (rotated)
  • unsorted-shuffle-4: $\mathbf{x} = (4, 2, 1, 3)$ (mixed order)
  • unsorted-shuffle-5: $\mathbf{x} = (5, 1, 3, 2, 4)$ (partial shuffle)
  • unsorted-duplicates-unsorted-10: $\mathbf{x} = (2, 3, 1, 3, 2, 1, 2, 3, 1, 3)$ (duplicates mixed)
  • unsorted-extreme-wide-unsorted-5: $\mathbf{x} = (1000, 0.001, 1000000, 100, 1)$ (wide range unsorted)

These tests verify that implementations correctly sort input before computing pairwise differences. Since $\operatorname{Spread}$ uses absolute differences, order-dependent bugs would manifest differently than in $\operatorname{Center}$.

Performance test, validating the fast $O(n \log n)$ algorithm:

  • Input: $\mathbf{x} = (1, 2, 3, \ldots, 100000)$
  • Expected output: $29290$
  • Time constraint: Must complete in under 5 seconds
  • Purpose: Ensures that the implementation uses the efficient algorithm rather than materializing all $\binom{n}{2} \approx 5$ billion pairwise differences

This test case is not stored in the repository because it generates a large JSON file (approximately 1.5 MB). Each language implementation should manually implement this test with the hardcoded expected result.
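The hardcoded value $29290$ for the performance test can be cross-checked without enumerating any pairs. For the input $(1, 2, \ldots, n)$, exactly $n - d$ pairs have absolute difference $d$, so the median can be found by accumulating these counts until the middle rank is reached. The sketch below relies on that counting argument; the function name `spread_of_1_to_n` is hypothetical, and the code applies only to this specific input, not to general samples.

```python
def spread_of_1_to_n(n):
    """Spread of (1, 2, ..., n), found by counting pairs per difference.

    Exactly n - d pairs (i < j) have |x_i - x_j| = d, for d = 1, ..., n - 1.
    """
    total = n * (n - 1) // 2
    # 1-based ranks of the middle element(s) among the sorted differences.
    if total % 2:
        ranks = [(total + 1) // 2]
    else:
        ranks = [total // 2, total // 2 + 1]

    values = []
    cumulative = 0  # number of pairs with difference <= d
    d = 0
    for rank in ranks:
        while cumulative < rank:
            d += 1
            cumulative += n - d
        values.append(d)
    return sum(values) / len(values)


print(spread_of_1_to_n(100000))
```

The loop runs for roughly 29 thousand iterations, so the check is immediate even though the input implies about 5 billion pairs.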

RelSpread Tests

$$\operatorname{RelSpread}(\mathbf{x}) = \frac{\operatorname{Spread}(\mathbf{x})}{\lvert \operatorname{Center}(\mathbf{x}) \rvert}$$

The $\operatorname{RelSpread}$ test suite contains 18 test cases (13 original + 5 unsorted) focusing on relative dispersion.

Demo examples ($n = 5$), from the manual introduction, validating properties:

  • demo-1: $\mathbf{x} = (1, 3, 5, 7, 9)$, expected output: $0.8$ (base case)
  • demo-2: $\mathbf{x} = (5, 15, 25, 35, 45)$ (= 5 × demo-1), expected output: $0.8$ (scale invariance)

Natural sequences ($n = 2, 3, 4$):

  • natural-2: $\mathbf{x} = (1, 2)$, expected output: $\approx 0.667$
  • natural-3: $\mathbf{x} = (1, 2, 3)$, expected output: $0.5$
  • natural-4: $\mathbf{x} = (1, 2, 3, 4)$, expected output: $0.6$ (validates composite with even size)
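The demo and natural-sequence values above are the ratio of two brute-force medians, which the following sketch reproduces. All function names are hypothetical, and the quadratic enumeration serves only as a small-input reference.

```python
import math
from itertools import combinations, combinations_with_replacement
from statistics import median


def center_naive(x):
    """Median of pairwise averages over i <= j."""
    return median((a + b) / 2 for a, b in combinations_with_replacement(x, 2))


def spread_naive(x):
    """Median of pairwise absolute differences over i < j."""
    return median(abs(a - b) for a, b in combinations(x, 2))


def rel_spread_naive(x):
    """Spread divided by the absolute value of the center."""
    return spread_naive(x) / abs(center_naive(x))


for sample in ([1, 3, 5, 7, 9], [5, 15, 25, 35, 45], [1, 2], [1, 2, 3], [1, 2, 3, 4]):
    print(sample, round(rel_spread_naive(sample), 3))

# natural-2 is the one non-terminating decimal in the list: 1 / 1.5 = 2/3.
assert math.isclose(rel_spread_naive([1, 2]), 2 / 3)
```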

Uniform distribution ($n = 5, 10, 20, 30, 100$), $\underline{\operatorname{Uniform}}(0, 1)$:

  • uniform-5, uniform-10, uniform-20, uniform-30, uniform-100: random samples generated with seed 0

The uniform distribution tests span multiple sample sizes to verify that $\operatorname{RelSpread}$ correctly normalizes dispersion. Zero and negative values are excluded because $\operatorname{RelSpread}$ requires strictly positive samples.

Composite estimator stress tests, edge cases specific to the division operation:

  • composite-small-center: $\mathbf{x} = (0.001, 0.002, 0.003, 0.004, 0.005)$ (small center, tests division stability)
  • composite-large-spread: $\mathbf{x} = (1, 100, 200, 300, 1000)$ (large spread relative to center)
  • composite-extreme-ratio: $\mathbf{x} = (1, 1.0001, 1.0002, 1.0003, 1.0004)$ (tiny spread, tests precision)

Unsorted tests, verifying sorting for the composite estimator (5 tests):

  • unsorted-reverse-{n} for $n \in \{3, 4, 5\}$: reverse sorted natural sequences (3 tests)
  • unsorted-composite-small-unsorted: $\mathbf{x} = (0.005, 0.001, 0.003, 0.002, 0.004)$ (small center unsorted)
  • unsorted-composite-large-unsorted: $\mathbf{x} = (1000, 1, 300, 100, 200)$ (large spread unsorted)

Since $\operatorname{RelSpread}$ combines both $\operatorname{Center}$ and $\operatorname{Spread}$, these tests verify that sorting works correctly for composite estimators.

Shift Tests

$$\operatorname{Shift}(\mathbf{x}, \mathbf{y}) = \operatorname{median}_{1 \leq i \leq n,\ 1 \leq j \leq m} (x_i - y_j)$$

The $\operatorname{Shift}$ test suite contains 60 correctness test cases stored in the repository (42 original + 18 unsorted), plus 1 performance test that should be implemented manually (see Test Framework).

Demo examples ($n = m = 5$), from the manual introduction, validating properties:

  • demo-1: $\mathbf{x} = (0, 2, 4, 6, 8)$, $\mathbf{y} = (10, 12, 14, 16, 18)$, expected output: $-10$ (base case)
  • demo-2: $\mathbf{x} = (0, 2, 4, 6, 8)$, $\mathbf{y} = (0, 2, 4, 6, 8)$, expected output: $0$ (identity property)
  • demo-3: $\mathbf{x} = (7, 9, 11, 13, 15)$, $\mathbf{y} = (13, 15, 17, 19, 21)$ (= demo-1 + [7,3]), expected output: $-6$ (location equivariance)
  • demo-4: $\mathbf{x} = (0, 4, 8, 12, 16)$, $\mathbf{y} = (20, 24, 28, 32, 36)$ (= 2 × demo-1), expected output: $-20$ (scale equivariance)
  • demo-5: $\mathbf{x} = (10, 12, 14, 16, 18)$, $\mathbf{y} = (0, 2, 4, 6, 8)$ (= reversed demo-1), expected output: $10$ (anti-symmetry)

Natural sequences ($[n, m] \in \{1, 2, 3\} \times \{1, 2, 3\}$), 9 combinations:

  • natural-1-1: $\mathbf{x} = (1)$, $\mathbf{y} = (1)$, expected output: $0$
  • natural-1-2: $\mathbf{x} = (1)$, $\mathbf{y} = (1, 2)$, expected output: $-0.5$
  • natural-1-3: $\mathbf{x} = (1)$, $\mathbf{y} = (1, 2, 3)$, expected output: $-1$
  • natural-2-1: $\mathbf{x} = (1, 2)$, $\mathbf{y} = (1)$, expected output: $0.5$
  • natural-2-2: $\mathbf{x} = (1, 2)$, $\mathbf{y} = (1, 2)$, expected output: $0$
  • natural-2-3: $\mathbf{x} = (1, 2)$, $\mathbf{y} = (1, 2, 3)$, expected output: $-0.5$
  • natural-3-1: $\mathbf{x} = (1, 2, 3)$, $\mathbf{y} = (1)$, expected output: $1$
  • natural-3-2: $\mathbf{x} = (1, 2, 3)$, $\mathbf{y} = (1, 2)$, expected output: $0.5$
  • natural-3-3: $\mathbf{x} = (1, 2, 3)$, $\mathbf{y} = (1, 2, 3)$, expected output: $0$

Negative values ($[n, m] = [2, 2]$), sign handling validation:

  • negative-2-2: $\mathbf{x} = (-2, -1)$, $\mathbf{y} = (-2, -1)$, expected output: $0$

Mixed-sign values ($[n, m] = [2, 2]$), validating anti-symmetry across zero:

  • mixed-2-2: $\mathbf{x} = (-1, 1)$, $\mathbf{y} = (-1, 1)$, expected output: $0$

Zero values ($[n, m] \in \{1, 2\} \times \{1, 2\}$), 4 combinations:

  • zeros-1-1, zeros-1-2, zeros-2-1, zeros-2-2: all produce output $0$

Additive distribution ($[n, m] \in \{5, 10, 30\} \times \{5, 10, 30\}$), 9 combinations with $\underline{\operatorname{Additive}}(10, 1)$:

  • additive-5-5, additive-5-10, additive-5-30
  • additive-10-5, additive-10-10, additive-10-30
  • additive-30-5, additive-30-10, additive-30-30
  • Random generation: $\mathbf{x}$ uses seed 0, $\mathbf{y}$ uses seed 1

Uniform distribution ($[n, m] \in \{5, 100\} \times \{5, 100\}$), 4 combinations with $\underline{\operatorname{Uniform}}(0, 1)$:

  • uniform-5-5, uniform-5-100, uniform-100-5, uniform-100-100
  • Random generation: $\mathbf{x}$ uses seed 2, $\mathbf{y}$ uses seed 3

The natural sequences validate anti-symmetry ($\operatorname{Shift}(\mathbf{x}, \mathbf{y}) = -\operatorname{Shift}(\mathbf{y}, \mathbf{x})$) and the identity property ($\operatorname{Shift}(\mathbf{x}, \mathbf{x}) = 0$). The asymmetric size combinations test the two-sample algorithm with unbalanced inputs.
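Both properties can be checked over the whole natural-sequence grid with a brute-force reference. The sketch below assumes a hypothetical `shift_naive` that enumerates all $n \cdot m$ differences; it illustrates the properties only and is not the fast algorithm under test.

```python
from itertools import product
from statistics import median


def shift_naive(x, y):
    """Median of all pairwise differences x_i - y_j."""
    return median(a - b for a, b in product(x, y))


samples = [[1], [1, 2], [1, 2, 3]]
for x in samples:
    # Identity property: Shift(x, x) = 0.
    assert shift_naive(x, x) == 0
    for y in samples:
        # Anti-symmetry: Shift(x, y) = -Shift(y, x).
        assert shift_naive(x, y) == -shift_naive(y, x)
        print(f"natural-{len(x)}-{len(y)}: {shift_naive(x, y)}")
```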

Algorithm stress tests, edge cases for the fast binary search algorithm:

  • duplicates-5-5: $\mathbf{x} = (3, 3, 3, 3, 3)$, $\mathbf{y} = (3, 3, 3, 3, 3)$ (all identical, expected output: $0$)
  • duplicates-10-10: $\mathbf{x} = (1, 1, 2, 2, 3, 3, 4, 4, 5, 5)$, $\mathbf{y} = (1, 1, 2, 2, 3, 3, 4, 4, 5, 5)$ (many duplicates)
  • parity-odd-7-7: $\mathbf{x} = (1, 2, 3, 4, 5, 6, 7)$, $\mathbf{y} = (1, 2, 3, 4, 5, 6, 7)$ (odd sizes, 49 differences, expected output: $0$)
  • parity-even-6-6: $\mathbf{x} = (1, 2, 3, 4, 5, 6)$, $\mathbf{y} = (1, 2, 3, 4, 5, 6)$ (even sizes, 36 differences, expected output: $0$)
  • parity-asymmetric-7-6: $\mathbf{x} = (1, 2, 3, 4, 5, 6, 7)$, $\mathbf{y} = (1, 2, 3, 4, 5, 6)$ (mixed parity, 42 differences)
  • parity-large-49-50: $\mathbf{x} = (1, 2, \ldots, 49)$, $\mathbf{y} = (1, 2, \ldots, 50)$ (large asymmetric, 2450 differences)

Extreme asymmetry, tests with very unbalanced sample sizes:

  • asymmetry-1-100: $\mathbf{x} = (50)$, $\mathbf{y} = (1, 2, \ldots, 100)$ (single vs many, 100 differences)
  • asymmetry-2-50: $\mathbf{x} = (10, 20)$, $\mathbf{y} = (1, 2, \ldots, 50)$ (tiny vs medium, 100 differences)
  • asymmetry-constant-varied: $\mathbf{x} = (5, 5, 5, 5, 5)$, $\mathbf{y} = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)$ (constant vs varied)

Unsorted tests, verifying independent sorting of each sample (18 tests):

  • unsorted-x-natural-{n}-{m} for $(n, m) \in \{(3,3), (4,4), (5,5)\}$: X unsorted (reversed), Y sorted (3 tests)
  • unsorted-y-natural-{n}-{m} for $(n, m) \in \{(3,3), (4,4), (5,5)\}$: X sorted, Y unsorted (reversed) (3 tests)
  • unsorted-both-natural-{n}-{m} for $(n, m) \in \{(3,3), (4,4), (5,5)\}$: both unsorted (reversed) (3 tests)
  • unsorted-reverse-3-3: $\mathbf{x} = (3, 2, 1)$, $\mathbf{y} = (3, 2, 1)$ (both reversed)
  • unsorted-x-shuffle-3-3: $\mathbf{x} = (2, 1, 3)$, $\mathbf{y} = (1, 2, 3)$ (X shuffled, Y sorted)
  • unsorted-y-shuffle-3-3: $\mathbf{x} = (1, 2, 3)$, $\mathbf{y} = (3, 1, 2)$ (X sorted, Y shuffled)
  • unsorted-both-shuffle-4-4: $\mathbf{x} = (3, 1, 4, 2)$, $\mathbf{y} = (4, 2, 1, 3)$ (both shuffled)
  • unsorted-duplicates-mixed-5-5: $\mathbf{x} = (3, 3, 3, 3, 3)$, $\mathbf{y} = (3, 3, 3, 3, 3)$ (all identical)
  • unsorted-x-unsorted-duplicates: $\mathbf{x} = (2, 1, 3, 2, 1)$, $\mathbf{y} = (1, 1, 2, 2, 3)$ (X has unsorted duplicates)
  • unsorted-y-unsorted-duplicates: $\mathbf{x} = (1, 1, 2, 2, 3)$, $\mathbf{y} = (3, 2, 1, 3, 2)$ (Y has unsorted duplicates)
  • unsorted-asymmetric-unsorted-2-5: $\mathbf{x} = (2, 1)$, $\mathbf{y} = (5, 2, 4, 1, 3)$ (asymmetric sizes, both unsorted)
  • unsorted-negative-unsorted-3-3: $\mathbf{x} = (-1, -3, -2)$, $\mathbf{y} = (-2, -3, -1)$ (negative unsorted)

These tests are critical for two-sample estimators because they verify that $\mathbf{x}$ and $\mathbf{y}$ are sorted independently. The variety includes cases where only one sample is unsorted, ensuring implementations don't incorrectly assume pre-sorted input or sort the samples together.

Performance test, validating the fast $O((m+n) \log L)$ binary search algorithm:

  • Input: $\mathbf{x} = (1, 2, 3, \ldots, 100000)$, $\mathbf{y} = (1, 2, 3, \ldots, 100000)$
  • Expected output: $0$
  • Time constraint: Must complete in under 5 seconds
  • Purpose: Ensures that the implementation uses the efficient algorithm rather than materializing all $m n = 10$ billion pairwise differences

This test case is not stored in the repository because it generates a large JSON file (approximately 1.5 MB). Each language implementation should manually implement this test with the hardcoded expected result.

Ratio Tests

$$\operatorname{Ratio}(\mathbf{x}, \mathbf{y}) = \exp(\operatorname{Shift}(\log \mathbf{x}, \log \mathbf{y}))$$

The $\operatorname{Ratio}$ test suite contains 37 test cases (25 original + 12 unsorted), excluding zero values due to division constraints. The new definition uses geometric interpolation (via log-space), which affects expected values for even $m \times n$ cases.

Demo examples ($n = m = 5$), from the manual introduction, validating properties:

  • demo-1: $\mathbf{x} = (1, 2, 4, 8, 16)$, $\mathbf{y} = (2, 4, 8, 16, 32)$, expected output: $0.5$ (base case, odd $m \times n$)
  • demo-2: $\mathbf{x} = (1, 2, 4, 8, 16)$, $\mathbf{y} = (1, 2, 4, 8, 16)$, expected output: $1$ (identity property)
  • demo-3: $\mathbf{x} = (2, 4, 8, 16, 32)$, $\mathbf{y} = (10, 20, 40, 80, 160)$ (= [2×demo-1.x, 5×demo-1.y]), expected output: $0.2$ (scale property)

Natural sequences ($[n, m] \in \{1, 2, 3\} \times \{1, 2, 3\}$), 9 combinations:

  • natural-1-1: $\mathbf{x} = (1)$, $\mathbf{y} = (1)$, expected output: $1$
  • natural-1-2: $\mathbf{x} = (1)$, $\mathbf{y} = (1, 2)$, expected output: $\approx 0.707$ ($= \sqrt{0.5}$, geometric interpolation)
  • natural-1-3: $\mathbf{x} = (1)$, $\mathbf{y} = (1, 2, 3)$, expected output: $0.5$
  • natural-2-1: $\mathbf{x} = (1, 2)$, $\mathbf{y} = (1)$, expected output: $\approx 1.414$ ($= \sqrt{2}$, geometric interpolation)
  • natural-2-2: $\mathbf{x} = (1, 2)$, $\mathbf{y} = (1, 2)$, expected output: $1$
  • natural-2-3: $\mathbf{x} = (1, 2)$, $\mathbf{y} = (1, 2, 3)$, expected output: $\approx 0.816$ (geometric interpolation)
  • natural-3-1: $\mathbf{x} = (1, 2, 3)$, $\mathbf{y} = (1)$, expected output: $2$
  • natural-3-2: $\mathbf{x} = (1, 2, 3)$, $\mathbf{y} = (1, 2)$, expected output: $\approx 1.225$ (geometric interpolation)
  • natural-3-3: $\mathbf{x} = (1, 2, 3)$, $\mathbf{y} = (1, 2, 3)$, expected output: $1$
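The geometric interpolation in the even-count cases above can be seen by evaluating the definition directly: the median is taken over log-ratios, so averaging the two middle values in log-space yields their geometric mean after exponentiation. The sketch below is a brute-force reference under that reading; `ratio_naive` is a hypothetical name.

```python
import math
from itertools import product
from statistics import median


def ratio_naive(x, y):
    """exp of the median of all pairwise log-differences log(x_i) - log(y_j)."""
    log_diffs = [math.log(a) - math.log(b) for a, b in product(x, y)]
    return math.exp(median(log_diffs))


# natural-1-2: the ratios are 1 and 0.5, so the result is sqrt(1 * 0.5).
assert math.isclose(ratio_naive([1], [1, 2]), math.sqrt(0.5))

samples = [[1], [1, 2], [1, 2, 3]]
for x in samples:
    for y in samples:
        print(f"natural-{len(x)}-{len(y)}: {ratio_naive(x, y):.3f}")
```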

Additive distribution ($[n, m] \in \{5, 10, 30\} \times \{5, 10, 30\}$), 9 combinations with $\underline{\operatorname{Additive}}(10, 1)$:

  • additive-5-5, additive-5-10, additive-5-30
  • additive-10-5, additive-10-10, additive-10-30
  • additive-30-5, additive-30-10, additive-30-30
  • Random generation: $\mathbf{x}$ uses seed 0, $\mathbf{y}$ uses seed 1

Uniform distribution ($[n, m] \in \{5, 100\} \times \{5, 100\}$), 4 combinations with $\underline{\operatorname{Uniform}}(0, 1)$:

  • uniform-5-5, uniform-5-100, uniform-100-5, uniform-100-100
  • Random generation: $\mathbf{x}$ uses seed 2, $\mathbf{y}$ uses seed 3
  • Note: all generated values are strictly positive (no zeros); values near zero test numerical stability of the log-transformation

The natural sequences verify the identity property ($\operatorname{Ratio}(\mathbf{x}, \mathbf{x}) = 1$) and validate ratio calculations with simple integer inputs. Note that implementations should handle the practical constraint of avoiding division by values near zero.

Unsorted tests, verifying independent sorting for the ratio calculation (12 tests):

  • unsorted-x-natural-{n}-{m} for $(n, m) \in \{(3,3), (4,4)\}$: X unsorted (reversed), Y sorted (2 tests)
  • unsorted-y-natural-{n}-{m} for $(n, m) \in \{(3,3), (4,4)\}$: X sorted, Y unsorted (reversed) (2 tests)
  • unsorted-both-natural-{n}-{m} for $(n, m) \in \{(3,3), (4,4)\}$: both unsorted (reversed) (2 tests)
  • unsorted-demo-unsorted-x: $\mathbf{x} = (16, 1, 8, 2, 4)$, $\mathbf{y} = (2, 4, 8, 16, 32)$ (demo-1 with X unsorted)
  • unsorted-demo-unsorted-y: $\mathbf{x} = (1, 2, 4, 8, 16)$, $\mathbf{y} = (32, 2, 16, 4, 8)$ (demo-1 with Y unsorted)
  • unsorted-demo-both-unsorted: $\mathbf{x} = (8, 1, 16, 4, 2)$, $\mathbf{y} = (16, 32, 2, 8, 4)$ (demo-1 both unsorted)
  • unsorted-identity-unsorted: $\mathbf{x} = (4, 1, 8, 2, 16)$, $\mathbf{y} = (16, 1, 8, 4, 2)$ (identity property, both unsorted)
  • unsorted-asymmetric-unsorted-2-3: $\mathbf{x} = (2, 1)$, $\mathbf{y} = (3, 1, 2)$ (asymmetric, both unsorted)
  • unsorted-power-unsorted-5: $\mathbf{x} = (16, 2, 8, 1, 4)$, $\mathbf{y} = (32, 4, 16, 2, 8)$ (powers of 2 unsorted)

AvgSpread Tests

$$\operatorname{AvgSpread}(\mathbf{x}, \mathbf{y}) = \frac{n \cdot \operatorname{Spread}(\mathbf{x}) + m \cdot \operatorname{Spread}(\mathbf{y})}{n + m}$$

The $\operatorname{AvgSpread}$ test suite contains 36 test cases (24 original + 12 unsorted). Since $\operatorname{AvgSpread}$ computes $\operatorname{Spread}(\mathbf{x})$ and $\operatorname{Spread}(\mathbf{y})$ independently, unsorted tests are critical to verify that both samples are sorted independently before computing their spreads.

Demo examples ($n = m = 5$), from the manual introduction, validating properties:

  • demo-1: $\mathbf{x} = (0, 3, 6, 9, 12)$, $\mathbf{y} = (0, 2, 4, 6, 8)$, expected output: $5$ (base case: $\frac{5 \cdot 6 + 5 \cdot 4}{10}$)
  • demo-2: $\mathbf{x} = (0, 3, 6, 9, 12)$, $\mathbf{y} = (0, 3, 6, 9, 12)$, expected output: $6$ (identity case)
  • demo-3: $\mathbf{x} = (0, 6, 12, 18, 24)$, $\mathbf{y} = (0, 9, 18, 27, 36)$ (= [2×demo-1.x, 3×demo-1.x]), expected output: $15$ (scale equivariance)
  • demo-4: $\mathbf{x} = (0, 2, 4, 6, 8)$, $\mathbf{y} = (0, 3, 6, 9, 12)$ (= reversed demo-1), expected output: $5$ (symmetry)
  • demo-5: $\mathbf{x} = (0, 6, 12, 18, 24)$, $\mathbf{y} = (0, 4, 8, 12, 16)$ (= 2 × demo-1), expected output: $10$ (uniform scaling)
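The five demo outputs above follow from the weighted-average formula applied to two brute-force spreads. The sketch below is a small-input reference only; `spread_naive` and `avg_spread_naive` are hypothetical names.

```python
from itertools import combinations
from statistics import median


def spread_naive(x):
    """Median of pairwise absolute differences over i < j."""
    return median(abs(a - b) for a, b in combinations(x, 2))


def avg_spread_naive(x, y):
    """Size-weighted average of the two per-sample spreads."""
    n, m = len(x), len(y)
    return (n * spread_naive(x) + m * spread_naive(y)) / (n + m)


demos = {
    "demo-1": ([0, 3, 6, 9, 12], [0, 2, 4, 6, 8]),
    "demo-2": ([0, 3, 6, 9, 12], [0, 3, 6, 9, 12]),
    "demo-3": ([0, 6, 12, 18, 24], [0, 9, 18, 27, 36]),
    "demo-4": ([0, 2, 4, 6, 8], [0, 3, 6, 9, 12]),
    "demo-5": ([0, 6, 12, 18, 24], [0, 4, 8, 12, 16]),
}
for name, (x, y) in demos.items():
    print(name, avg_spread_naive(x, y))
```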

Natural sequences ($[n, m] \in \{2, 3\} \times \{2, 3\}$), 4 combinations:

  • All combinations for two- and three-element samples, validating the weighted average calculation

Negative values ($[n, m] = [2, 2]$), validating the spread calculation with negative values:

  • negative-2-2: $\mathbf{x} = (-2, -1)$, $\mathbf{y} = (-2, -1)$, expected output: $1$

Additive distribution ($[n, m] \in \{5, 10, 30\} \times \{5, 10, 30\}$), 9 combinations with $\underline{\operatorname{Additive}}(10, 1)$:

  • Tests pooled dispersion across different sample size combinations
  • Random generation: $\mathbf{x}$ uses seed 0, $\mathbf{y}$ uses seed 1

Uniform distribution ($[n, m] \in \{5, 100\} \times \{5, 100\}$), 4 combinations with $\underline{\operatorname{Uniform}}(0, 1)$:

  • Validates correct weighting when sample sizes differ substantially
  • Random generation: $\mathbf{x}$ uses seed 2, $\mathbf{y}$ uses seed 3

The asymmetric size combinations are particularly important for $\operatorname{AvgSpread}$ because the estimator must correctly weight each sample's contribution by its size.

Composite estimator stress tests, edge cases for weighted averaging:

  • composite-asymmetric-weights: $\mathbf{x} = (1, 2)$, $\mathbf{y} = (3, 4, 5, 6, 7, 8, 9, 10)$ (2 vs 8, tests weighting formula)

Unsorted tests, critical for verifying independent sorting (12 tests):

  • unsorted-x-natural-{n}-{m} for $(n, m) \in \{(3,3), (4,4)\}$: X unsorted (reversed), Y sorted (2 tests)
  • unsorted-y-natural-{n}-{m} for $(n, m) \in \{(3,3), (4,4)\}$: X sorted, Y unsorted (reversed) (2 tests)
  • unsorted-both-natural-{n}-{m} for $(n, m) \in \{(3,3), (4,4)\}$: both unsorted (reversed) (2 tests)
  • unsorted-demo-unsorted-x: $\mathbf{x} = (12, 0, 6, 3, 9)$, $\mathbf{y} = (0, 2, 4, 6, 8)$ (demo-1 with X unsorted)
  • unsorted-demo-unsorted-y: $\mathbf{x} = (0, 3, 6, 9, 12)$, $\mathbf{y} = (8, 0, 4, 2, 6)$ (demo-1 with Y unsorted)
  • unsorted-demo-both-unsorted: $\mathbf{x} = (9, 0, 12, 3, 6)$, $\mathbf{y} = (6, 0, 8, 2, 4)$ (demo-1 both unsorted)
  • unsorted-identity-unsorted: $\mathbf{x} = (6, 0, 12, 3, 9)$, $\mathbf{y} = (9, 0, 12, 6, 3)$ (demo-2 unsorted)
  • unsorted-negative-unsorted: $\mathbf{x} = (-1, -2)$, $\mathbf{y} = (-1, -2)$ (negative unsorted)
  • unsorted-asymmetric-weights-unsorted: $\mathbf{x} = (2, 1)$, $\mathbf{y} = (8, 3, 6, 4, 10, 5, 9, 7)$ (asymmetric unsorted)

These tests verify that implementations compute $\operatorname{Spread}(\mathbf{x})$ and $\operatorname{Spread}(\mathbf{y})$ with properly sorted samples.

Disparity Tests

$$\operatorname{Disparity}(\mathbf{x}, \mathbf{y}) = \operatorname{Shift}(\mathbf{x}, \mathbf{y}) / \operatorname{AvgSpread}(\mathbf{x}, \mathbf{y})$$

The $\operatorname{Disparity}$ test suite contains 28 test cases (16 original + 12 unsorted). Since $\operatorname{Disparity}$ combines $\operatorname{Shift}$ and $\operatorname{AvgSpread}$, unsorted tests verify both components handle sorting correctly.

Demo examples ($n = m = 5$), from the manual introduction, validating properties:

  • demo-1: $\mathbf{x} = (0, 3, 6, 9, 12)$, $\mathbf{y} = (0, 2, 4, 6, 8)$, expected output: $0.4$ (base case: $\frac{2}{5}$)
  • demo-2: $\mathbf{x} = (5, 8, 11, 14, 17)$, $\mathbf{y} = (5, 7, 9, 11, 13)$ (= demo-1 + 5), expected output: $0.4$ (location invariance)
  • demo-3: $\mathbf{x} = (0, 6, 12, 18, 24)$, $\mathbf{y} = (0, 4, 8, 12, 16)$ (= 2 × demo-1), expected output: $0.4$ (scale invariance)
  • demo-4: $\mathbf{x} = (0, 2, 4, 6, 8)$, $\mathbf{y} = (0, 3, 6, 9, 12)$ (= reversed demo-1), expected output: $-0.4$ (anti-symmetry)
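The base case $\frac{2}{5}$ and the three invariance demos above can be reproduced by composing brute-force versions of the two components. This is a sketch for small inputs under the definitions given in this document; all function names are hypothetical.

```python
import math
from itertools import combinations, product
from statistics import median


def spread_naive(x):
    """Median of pairwise absolute differences over i < j."""
    return median(abs(a - b) for a, b in combinations(x, 2))


def shift_naive(x, y):
    """Median of all pairwise differences x_i - y_j."""
    return median(a - b for a, b in product(x, y))


def avg_spread_naive(x, y):
    """Size-weighted average of the two per-sample spreads."""
    n, m = len(x), len(y)
    return (n * spread_naive(x) + m * spread_naive(y)) / (n + m)


def disparity_naive(x, y):
    """Shift normalized by the average spread."""
    return shift_naive(x, y) / avg_spread_naive(x, y)


x, y = [0, 3, 6, 9, 12], [0, 2, 4, 6, 8]
print(shift_naive(x, y), avg_spread_naive(x, y), disparity_naive(x, y))

# Anti-symmetry: swapping the samples flips the sign.
assert math.isclose(disparity_naive(y, x), -disparity_naive(x, y))
```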

Natural sequences ($[n, m] \in \{2, 3\} \times \{2, 3\}$), 4 combinations:

  • natural-2-2, natural-2-3, natural-3-2, natural-3-3
  • Minimum size $n, m \geq 2$ required for meaningful dispersion calculations

Negative values ($[n, m] = [2, 2]$), end-to-end validation with negative values:

  • negative-2-2: $\mathbf{x} = (-2, -1)$, $\mathbf{y} = (-2, -1)$, expected output: $0$

Uniform distribution ($[n, m] \in \{5, 100\} \times \{5, 100\}$), 4 combinations with $\underline{\operatorname{Uniform}}(0, 1)$:

  • uniform-5-5, uniform-5-100, uniform-100-5, uniform-100-100
  • Random generation: $\mathbf{x}$ uses seed 0, $\mathbf{y}$ uses seed 1

The smaller test set for $\operatorname{Disparity}$ reflects implementation confidence. Since $\operatorname{Disparity}$ combines $\operatorname{Shift}$ and $\operatorname{AvgSpread}$, correct implementation of those components ensures $\operatorname{Disparity}$ correctness. The test cases validate the division operation and confirm scale-free properties.

Composite estimator stress tests, edge cases for the effect size calculation:

  • composite-small-avgspread: $\mathbf{x} = (10.001, 10.002, 10.003)$, $\mathbf{y} = (10.004, 10.005, 10.006)$ (tiny spread, large shift)
  • composite-large-avgspread: $\mathbf{x} = (1, 100, 200)$, $\mathbf{y} = (50, 150, 250)$ (large spread, small shift)
  • composite-extreme-disparity: $\mathbf{x} = (1, 1.001)$, $\mathbf{y} = (100, 100.001)$ (extreme ratio, tests precision)

Unsorted tests — verify both Shift and AvgSpread handle sorting (12 tests):

  • unsorted-x-natural-{n}-{m} for $(n, m) \in \{(3, 3), (4, 4)\}$: X unsorted (reversed), Y sorted (2 tests)
  • unsorted-y-natural-{n}-{m} for $(n, m) \in \{(3, 3), (4, 4)\}$: X sorted, Y unsorted (reversed) (2 tests)
  • unsorted-both-natural-{n}-{m} for $(n, m) \in \{(3, 3), (4, 4)\}$: both unsorted (reversed) (2 tests)
  • unsorted-demo-unsorted-x: $\mathbf{x} = (12, 0, 6, 3, 9)$, $\mathbf{y} = (0, 2, 4, 6, 8)$ (demo-1 with X unsorted)
  • unsorted-demo-unsorted-y: $\mathbf{x} = (0, 3, 6, 9, 12)$, $\mathbf{y} = (8, 0, 4, 2, 6)$ (demo-1 with Y unsorted)
  • unsorted-demo-both-unsorted: $\mathbf{x} = (9, 0, 12, 3, 6)$, $\mathbf{y} = (6, 0, 8, 2, 4)$ (demo-1 both unsorted)
  • unsorted-location-invariance-unsorted: $\mathbf{x} = (17, 5, 11, 8, 14)$, $\mathbf{y} = (13, 5, 9, 7, 11)$ (demo-2 unsorted)
  • unsorted-scale-invariance-unsorted: $\mathbf{x} = (24, 0, 12, 6, 18)$, $\mathbf{y} = (16, 0, 8, 4, 12)$ (demo-3 unsorted)
  • unsorted-anti-symmetry-unsorted: $\mathbf{x} = (8, 0, 4, 2, 6)$, $\mathbf{y} = (12, 0, 6, 3, 9)$ (demo-4 reversed and unsorted)

As a composite estimator, $\operatorname{Disparity}$ tests both the numerator ($\operatorname{Shift}$) and denominator ($\operatorname{AvgSpread}$). Unsorted variants verify end-to-end correctness including invariance properties.
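The numerator/denominator structure can be sketched in a few lines of Python. The definitions used here are assumptions made for illustration and are not taken from this section: `shift` is the median of pairwise differences, `spread` is the median absolute pairwise difference, and `avg_spread` is a size-weighted mean of the two spreads. All function names are hypothetical.

```python
from statistics import median

def shift(x, y):
    """Median of all pairwise differences x_i - y_j (assumed definition)."""
    return median(xi - yj for xi in x for yj in y)

def spread(x):
    """Median of |x_i - x_j| over pairs i < j (assumed definition)."""
    n = len(x)
    return median(abs(x[i] - x[j]) for i in range(n) for j in range(i + 1, n))

def avg_spread(x, y):
    """Size-weighted average of the two spreads (assumed definition)."""
    n, m = len(x), len(y)
    return (n * spread(x) + m * spread(y)) / (n + m)

def disparity(x, y):
    """Numerator (shift) divided by denominator (average spread)."""
    return shift(x, y) / avg_spread(x, y)

# negative-2-2 from the list above
print(disparity([-2, -1], [-2, -1]))  # 0.0

# Scale-free check on the composite-large-avgspread inputs:
# multiplying both samples by 3 should leave the result unchanged.
x, y = [1, 100, 200], [50, 150, 250]
print(disparity(x, y) == disparity([3 * v for v in x], [3 * v for v in y]))  # True
```

Because the sketch sorts inside `median`, it is also insensitive to input order, which is the property the unsorted variants probe.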

PairwiseMargin Tests

$\operatorname{PairwiseMargin}(n, m, \mathrm{misrate})$

The $\operatorname{PairwiseMargin}$ test suite contains 178 test cases (4 demo + 4 natural + 10 edge + 12 small grid + 148 large grid). The domain constraint $\mathrm{misrate} \geq \frac{2}{\binom{n+m}{n}}$ is enforced; inputs violating this return a domain error. Combinations where the requested misrate falls below the minimum achievable misrate are excluded from the grid.

Demo examples ($n = m = 30$) — from manual introduction:

  • demo-1: $n=30$, $m=30$, $\mathrm{misrate}=10^{-6}$, expected output: $276$
  • demo-2: $n=30$, $m=30$, $\mathrm{misrate}=10^{-5}$, expected output: $328$
  • demo-3: $n=30$, $m=30$, $\mathrm{misrate}=10^{-4}$, expected output: $390$
  • demo-4: $n=30$, $m=30$, $\mathrm{misrate}=10^{-3}$, expected output: $464$

These demo cases match the reference values used throughout the manual to illustrate $\operatorname{ShiftBounds}$ construction.

Natural sequences ($[n, m] \in \{1, 2, 3, 4\} \times \{1, 2, 3, 4\}$ × 2 misrates, filtered by min misrate) — 4 tests:

  • Misrate values: $\mathrm{misrate} \in \{10^{-1}, 10^{-2}\}$
  • After filtering by $\mathrm{misrate} \geq \frac{2}{\binom{n+m}{n}}$, only 4 combinations survive:
    • natural-3-3-mr1: $n=3$, $m=3$, $\mathrm{misrate}=0.1$, expected output: $0$
    • natural-3-4-mr1: $n=3$, $m=4$, $\mathrm{misrate}=0.1$
    • natural-4-3-mr1: $n=4$, $m=3$, $\mathrm{misrate}=0.1$
    • natural-4-4-mr1: $n=4$, $m=4$, $\mathrm{misrate}=0.1$

Edge cases — boundary condition validation (10 tests):

  • boundary-min: $n=1$, $m=1$, $\mathrm{misrate}=1.0$ (minimum samples with maximum misrate, expected output: $0$)
  • boundary-zero-margin-small: $n=20$, $m=20$, $\mathrm{misrate}=10^{-6}$ (strict misrate with sufficient samples)
  • boundary-loose: $n=5$, $m=5$, $\mathrm{misrate}=0.9$ (very permissive misrate)
  • symmetry-2-5: $n=2$, $m=5$, $\mathrm{misrate}=0.1$ (tests symmetry property)
  • symmetry-5-2: $n=5$, $m=2$, $\mathrm{misrate}=0.1$ (symmetric counterpart, same output as above)
  • symmetry-3-7: $n=3$, $m=7$, $\mathrm{misrate}=0.05$ (asymmetric sizes)
  • symmetry-7-3: $n=7$, $m=3$, $\mathrm{misrate}=0.05$ (symmetric counterpart)
  • asymmetry-extreme-1-100: $n=1$, $m=100$, $\mathrm{misrate}=0.1$ (extreme size difference)
  • asymmetry-extreme-100-1: $n=100$, $m=1$, $\mathrm{misrate}=0.1$ (reversed extreme)
  • asymmetry-extreme-2-50: $n=2$, $m=50$, $\mathrm{misrate}=0.05$ (highly unbalanced)

These edge cases validate correct handling of boundary conditions, the symmetry property $\operatorname{PairwiseMargin}(n, m, \mathrm{misrate}) = \operatorname{PairwiseMargin}(m, n, \mathrm{misrate})$, and extreme asymmetry in sample sizes.

Comprehensive grid — systematic coverage for thorough validation:

Small sample combinations ($[n, m] \in \{1, 2, 3, 4, 5\} \times \{1, 2, 3, 4, 5\}$ × 6 misrates, filtered) — 12 tests:

  • Misrate values: $\mathrm{misrate} \in \{10^{-1}, 10^{-2}, 10^{-3}, 10^{-4}, 10^{-5}, 10^{-6}\}$
  • Combinations where $\mathrm{misrate} < \frac{2}{\binom{n+m}{n}}$ are excluded
  • Test naming: n{n}_m{m}_mr{k} where $k$ is the negative log10 of misrate
  • Examples:
    • n5_m5_mr1: $n=5$, $m=5$, $\mathrm{misrate}=0.1$, expected output: $10$
    • n5_m5_mr2: $n=5$, $m=5$, $\mathrm{misrate}=0.01$

Large sample combinations ($[n, m] \in \{10, 20, 30, 50, 100\} \times \{10, 20, 30, 50, 100\}$ × 6 misrates, filtered) — 148 tests:

  • Misrate values: same as small samples
  • Combinations where $\mathrm{misrate} < \frac{2}{\binom{n+m}{n}}$ are excluded (affects $n = m = 10$ at misrates $10^{-5}$ and $10^{-6}$)
  • Test naming: n{n}_m{m}_r{k} where $k$ is the negative log10 of misrate
  • Examples:
    • n10_m10_r1: $n=10$, $m=10$, $\mathrm{misrate}=0.1$, expected output: $56$
    • n50_m50_r3: $n=50$, $m=50$, $\mathrm{misrate}=0.001$, expected output: $1556$
    • n100_m100_r6: $n=100$, $m=100$, $\mathrm{misrate}=10^{-6}$, expected output: $6060$

The comprehensive grid validates both symmetric ($n = m$) and asymmetric sample size combinations across six orders of magnitude in misrate, ensuring robust coverage of the parameter space.
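The filtering rule that produces the 4, 12, and 148 grid sizes can be reproduced with exact arithmetic. The following is a minimal sketch, assuming misrates of the form $10^{-k}$; the helper names `min_misrate` and `grid` are hypothetical and not part of any reference implementation.

```python
from fractions import Fraction
from itertools import product
from math import comb

def min_misrate(n: int, m: int) -> Fraction:
    """Smallest admissible misrate for sample sizes n and m: 2 / C(n+m, n)."""
    return Fraction(2, comb(n + m, n))

def grid(sizes, exponents):
    """All (n, m, k) with misrate = 10^-k that satisfy the domain constraint."""
    return [
        (n, m, k)
        for n, m in product(sizes, repeat=2)
        for k in exponents
        if Fraction(1, 10**k) >= min_misrate(n, m)
    ]

natural = grid([1, 2, 3, 4], [1, 2])
small = grid([1, 2, 3, 4, 5], range(1, 7))
large = grid([10, 20, 30, 50, 100], range(1, 7))

print(natural)                     # [(3, 3, 1), (3, 4, 1), (4, 3, 1), (4, 4, 1)]
print(len(small), len(large))      # 12 148
```

Using `Fraction` avoids floating-point ties at the boundary, for example $n = m = 3$ where $\frac{2}{\binom{6}{3}}$ equals $0.1$ exactly.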

SignedRankMargin Tests

$\operatorname{SignedRankMargin}(n, \mathrm{misrate})$

The $\operatorname{SignedRankMargin}$ test suite contains 39 correctness test cases (4 demo + 6 boundary + 7 exact + 20 medium + 2 error).

Demo examples ($n = 30$) — from manual introduction:

  • demo-1: $n=30$, $\mathrm{misrate}=10^{-6}$, expected output: $46$
  • demo-2: $n=30$, $\mathrm{misrate}=10^{-5}$, expected output: $74$
  • demo-3: $n=30$, $\mathrm{misrate}=10^{-4}$, expected output: $112$
  • demo-4: $n=30$, $\mathrm{misrate}=10^{-3}$, expected output: $158$

These demo cases match the reference values used throughout the manual to illustrate $\operatorname{CenterBounds}$ construction.

Boundary cases — minimum achievable misrate validation:

  • boundary-n2-min: $n=2$, $\mathrm{misrate}=0.5$ (minimum misrate for $n=2$, expected output: $0$)
  • boundary-n3-min: $n=3$, $\mathrm{misrate}=0.25$ (minimum misrate for $n=3$)
  • boundary-n4-min: $n=4$, $\mathrm{misrate}=0.125$ (minimum misrate for $n=4$)
  • boundary-loose: $n=5$, $\mathrm{misrate}=0.5$ (permissive misrate)
  • boundary-tight: $n=10$, $\mathrm{misrate}=0.01$ (strict misrate)
  • boundary-very-tight: $n=20$, $\mathrm{misrate}=0.001$ (very strict misrate)

These boundary cases validate correct handling of minimum achievable misrate (formula: $2^{1-n}$) and edge conditions.

Exact computation ($n \leq 10$) — validates dynamic programming path:

  • exact-n5-mr1e1: $n=5$, $\mathrm{misrate}=0.1$
  • exact-n6-mr1e1: $n=6$, $\mathrm{misrate}=0.1$
  • exact-n6-mr5e2: $n=6$, $\mathrm{misrate}=0.05$
  • exact-n10-mr1e1: $n=10$, $\mathrm{misrate}=0.1$, expected output: $22$
  • exact-n10-mr1e2: $n=10$, $\mathrm{misrate}=0.01$
  • exact-n10-mr5e2: $n=10$, $\mathrm{misrate}=0.05$
  • exact-n10-mr5e3: $n=10$, $\mathrm{misrate}=0.005$

These cases exercise the exact Wilcoxon signed-rank CDF computation for small samples where dynamic programming is used.
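As an illustration of the dynamic programming idea, the null distribution of the signed-rank statistic can be obtained by counting subsets of $\{1, \ldots, n\}$ by their sum. The sketch below is a generic textbook construction, not the reference implementation, and the function names are hypothetical. How $\operatorname{SignedRankMargin}$ converts this CDF into a margin is not specified in this section, so the sketch stops at the CDF.

```python
from fractions import Fraction

def signed_rank_counts(n: int) -> list[int]:
    """counts[w] = number of subsets of {1, ..., n} whose elements sum to w."""
    total = n * (n + 1) // 2
    counts = [1] + [0] * total
    for rank in range(1, n + 1):
        # Iterate downward so that each rank is used at most once.
        for w in range(total, rank - 1, -1):
            counts[w] += counts[w - rank]
    return counts

def signed_rank_cdf(n: int, w: int) -> Fraction:
    """P(W <= w) when all 2^n sign patterns are equally likely."""
    counts = signed_rank_counts(n)
    return Fraction(sum(counts[: w + 1]), 2**n)

print(signed_rank_counts(3))     # [1, 1, 1, 2, 1, 1, 1]
print(signed_rank_cdf(10, 10))   # 43/1024
```

The most extreme outcome has probability $2^{-n}$, so a two-sided tail cannot be smaller than $2 \cdot 2^{-n} = 2^{1-n}$, which is consistent with the minimum achievable misrate quoted for the boundary cases.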

Medium samples ($n \in \{15, 20, 30, 50, 100\}$ × 4 misrates) — 20 tests:

  • Misrate values: $\mathrm{misrate} \in \{10^{-1}, 10^{-2}, 10^{-3}, 10^{-4}\}$
  • Test naming: medium-n{n}-mr{k} where $k$ encodes the misrate
  • Examples:
    • medium-n15-mr1e1: $n=15$, $\mathrm{misrate}=0.1$
    • medium-n30-mr1e2: $n=30$, $\mathrm{misrate}=0.01$, expected output: $220$
    • medium-n50-mr1e3: $n=50$, $\mathrm{misrate}=0.001$
    • medium-n100-mr1e4: $n=100$, $\mathrm{misrate}=0.0001$

The medium sample tests validate the transition region between exact computation ($n \leq 63$) and approximate computation, ensuring consistent results across sample sizes and misrate values.

Error cases — domain violations:

  • error-n1: $n=1$, $\mathrm{misrate}=0.5$ (invalid: misrate below minimum achievable $2^{1-1} = 1.0$)
  • error-n0: $n=0$, $\mathrm{misrate}=0.05$ (invalid: n must be positive)

These error cases verify that implementations reject invalid input: $n=1$ with $\mathrm{misrate}=0.5$ fails because the minimum achievable misrate for $n=1$ is $2^0 = 1.0$, and $n=0$ fails because the sample size must be positive.
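A domain check covering both rejection rules might look like the sketch below. The function name and the choice of `ValueError` are assumptions made for illustration; actual implementations may signal a domain error differently.

```python
def check_signed_rank_domain(n: int, misrate: float) -> None:
    """Raise ValueError for inputs outside the documented domain (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be positive")
    minimum = 2.0 ** (1 - n)  # minimum achievable misrate for sample size n
    if misrate < minimum:
        raise ValueError(f"misrate {misrate} is below the minimum {minimum} for n={n}")

def is_valid(n: int, misrate: float) -> bool:
    """Convenience wrapper that reports validity instead of raising."""
    try:
        check_signed_rank_domain(n, misrate)
    except ValueError:
        return False
    return True

# Boundary cases sit exactly on the minimum and are accepted.
print([is_valid(n, 2.0 ** (1 - n)) for n in (2, 3, 4)])  # [True, True, True]
# The two error cases are rejected.
print(is_valid(1, 0.5), is_valid(0, 0.05))               # False False
```

Powers of two are exactly representable in binary floating point, so the boundary comparisons at $0.5$, $0.25$, and $0.125$ do not suffer from rounding.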

ShiftBounds Tests

$$\operatorname{ShiftBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}) = [z_{(k_{\text{left}})}, z_{(k_{\text{right}})}]$$

where

$$\mathbf{z} = \{ x_i - y_j \}_{1 \leq i \leq n,\ 1 \leq j \leq m} \quad (\text{sorted})$$

$$k_{\text{left}} = \left\lfloor \frac{\operatorname{PairwiseMargin}(n, m, \mathrm{misrate})}{2} \right\rfloor + 1$$

$$k_{\text{right}} = n m - \left\lfloor \frac{\operatorname{PairwiseMargin}(n, m, \mathrm{misrate})}{2} \right\rfloor$$
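The index arithmetic in the definition can be illustrated with a brute-force sketch. It takes the margin as a plain input instead of computing $\operatorname{PairwiseMargin}$, and it materializes all $nm$ differences, so it is only suitable for small samples. The function name is hypothetical.

```python
def shift_bounds_from_margin(x, y, margin):
    """Return (z_(k_left), z_(k_right)) for a given, already-computed margin."""
    z = sorted(xi - yj for xi in x for yj in y)  # all n*m pairwise differences
    half = margin // 2
    k_left = half + 1                # 1-based order statistic index
    k_right = len(x) * len(y) - half
    return z[k_left - 1], z[k_right - 1]

x = [1, 2, 3, 4, 5]
y = [3, 4, 5, 6, 7]

# With a margin of 0 the bounds are the extreme pairwise differences.
print(shift_bounds_from_margin(x, y, 0))              # (-6, 2)

# Constant samples give a degenerate interval at x - y for any valid margin.
print(shift_bounds_from_margin([3] * 5, [5] * 5, 4))  # (-2, -2)
```

Larger margins move both indices toward the middle of the sorted differences, which is why a larger margin yields a narrower interval.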

The $\operatorname{ShiftBounds}$ test suite contains 61 correctness test cases (3 demo + 9 natural + 6 property + 10 edge + 9 additive + 4 uniform + 5 misrate + 15 unsorted). Since $\operatorname{ShiftBounds}$ returns bounds rather than a point estimate, tests validate that the bounds contain $\operatorname{Shift}(\mathbf{x}, \mathbf{y})$ and satisfy equivariance properties. Each test case output is a JSON object with lower and upper fields representing the interval bounds. The domain constraint $\mathrm{misrate} \geq \frac{2}{\binom{n+m}{n}}$ is enforced; inputs violating this return a domain error.

Demo examples ($n = m = 5$) — from manual introduction, validating basic bounds:

  • demo-1: $\mathbf{x} = (1, 2, 3, 4, 5)$, $\mathbf{y} = (3, 4, 5, 6, 7)$, $\mathrm{misrate} = 0.05$, expected output: $[-4, 0]$
  • demo-2: $\mathbf{x} = (1, 2, 3, 4, 5)$, $\mathbf{y} = (3, 4, 5, 6, 7)$, $\mathrm{misrate} = 0.01$, expected output: $[-5, 1]$
  • demo-3: $\mathbf{x} = (3, 4, 5, 6, 7)$, $\mathbf{y} = (3, 4, 5, 6, 7)$, $\mathrm{misrate} = 0.05$, expected output: bounds containing $0$ (identity case)

These cases illustrate how tighter misrates produce wider bounds and validate the identity property where identical samples yield bounds containing zero.

Natural sequences ($[n, m] \in \{5, 8, 10\} \times \{5, 8, 10\}$, $\mathrm{misrate} = 10^{-2}$) — 9 combinations:

  • natural-5-5: $\mathbf{x} = (1, \ldots, 5)$, $\mathbf{y} = (1, \ldots, 5)$, expected bounds containing $0$
  • natural-5-8: $\mathbf{x} = (1, \ldots, 5)$, $\mathbf{y} = (1, \ldots, 8)$
  • natural-5-10: $\mathbf{x} = (1, \ldots, 5)$, $\mathbf{y} = (1, \ldots, 10)$
  • natural-8-5: $\mathbf{x} = (1, \ldots, 8)$, $\mathbf{y} = (1, \ldots, 5)$
  • natural-8-8: $\mathbf{x} = (1, \ldots, 8)$, $\mathbf{y} = (1, \ldots, 8)$, expected bounds containing $0$
  • natural-8-10: $\mathbf{x} = (1, \ldots, 8)$, $\mathbf{y} = (1, \ldots, 10)$
  • natural-10-5: $\mathbf{x} = (1, \ldots, 10)$, $\mathbf{y} = (1, \ldots, 5)$
  • natural-10-8: $\mathbf{x} = (1, \ldots, 10)$, $\mathbf{y} = (1, \ldots, 8)$
  • natural-10-10: $\mathbf{x} = (1, \ldots, 10)$, $\mathbf{y} = (1, \ldots, 10)$, expected bounds containing $0$

These sizes are chosen to satisfy $\mathrm{misrate} \geq \frac{2}{\binom{n+m}{n}}$ for all combinations.

Property validation ($n = m = 10$, $\mathrm{misrate} = 10^{-3}$) — 6 tests:

  • property-identity: $\mathbf{x} = (0, 2, 4, \ldots, 18)$, $\mathbf{y} = (0, 2, 4, \ldots, 18)$, bounds must contain $0$
  • property-location-shift: $\mathbf{x} = (7, 9, 11, \ldots, 25)$, $\mathbf{y} = (13, 15, 17, \ldots, 31)$
    • Must produce same bounds as base case (location invariance)
  • property-scale-2x: $\mathbf{x} = (2, 4, 6, \ldots, 20)$, $\mathbf{y} = (6, 8, 10, \ldots, 24)$
    • Bounds must be 2× the base case bounds (scale equivariance)
  • property-antisymmetry: $\mathbf{x} = (3, 4, \ldots, 12)$, $\mathbf{y} = (1, 2, \ldots, 10)$
    • Bounds must be negated: if original is $[a, b]$, this yields $[-b, -a]$
  • property-negative: $\mathbf{x} = (-10, -9, \ldots, -1)$, $\mathbf{y} = (-12, -11, \ldots, -3)$
    • Validates sign handling with all negative values
  • property-mixed-signs: $\mathbf{x} = (-4, -3, \ldots, 5)$, $\mathbf{y} = (-3, -2, \ldots, 6)$
    • Validates bounds crossing zero with mixed-sign samples

Edge cases — boundary conditions and extreme scenarios (10 tests):

  • edge-min-samples: $\mathbf{x} = (1, 2, 3, 4, 5)$, $\mathbf{y} = (6, 7, 8, 9, 10)$, $\mathrm{misrate} = 0.05$
  • edge-permissive-misrate: $\mathbf{x} = (1, 2, 3, 4, 5)$, $\mathbf{y} = (3, 4, 5, 6, 7)$, $\mathrm{misrate} = 0.5$ (very narrow bounds)
  • edge-strict-misrate: $n = m = 20$, $\mathrm{misrate} = 10^{-6}$ (very wide bounds)
  • edge-zero-shift: $n = m = 10$, all values $= 5$, $\mathrm{misrate} = 10^{-3}$ (bounds around 0)
  • edge-asymmetric-3-100: $n = 3$, $m = 100$, $\mathrm{misrate} = 10^{-2}$ (extreme size difference)
  • edge-asymmetric-5-50: $n = 5$, $m = 50$, $\mathrm{misrate} = 10^{-3}$ (highly unbalanced)
  • edge-duplicates: $\mathbf{x} = (3, 3, 3, 3, 3)$, $\mathbf{y} = (5, 5, 5, 5, 5)$, $\mathrm{misrate} = 10^{-2}$ (all duplicates, bounds around $-2$)
  • edge-wide-range: $n = m = 10$, values spanning $10^{-3}$ to $10^8$, $\mathrm{misrate} = 10^{-3}$ (extreme value range)
  • edge-tiny-values: $n = m = 10$, values $\approx 10^{-8}$, $\mathrm{misrate} = 10^{-3}$ (numerical precision)
  • edge-large-values: $n = m = 10$, values $\approx 10^8$, $\mathrm{misrate} = 10^{-3}$ (large magnitude)

These edge cases stress-test boundary conditions, numerical stability, and the margin calculation with extreme parameters.

Additive distribution ($[n, m] \in \{10, 30, 50\} \times \{10, 30, 50\}$, $\mathrm{misrate} = 10^{-3}$) — 9 combinations with $\underline{\operatorname{Additive}}(10, 1)$:

  • additive-10-10, additive-10-30, additive-10-50
  • additive-30-10, additive-30-30, additive-30-50
  • additive-50-10, additive-50-30, additive-50-50
  • Random generation: $\mathbf{x}$ uses seed 0, $\mathbf{y}$ uses seed 1

These fuzzy tests validate that bounds properly encompass the shift estimate for realistic normally-distributed data at various sample sizes.

Uniform distribution ($[n, m] \in \{10, 100\} \times \{10, 100\}$, $\mathrm{misrate} = 10^{-4}$) — 4 combinations with $\underline{\operatorname{Uniform}}(0, 1)$:

  • uniform-10-10, uniform-10-100, uniform-100-10, uniform-100-100
  • Random generation: $\mathbf{x}$ uses seed 2, $\mathbf{y}$ uses seed 3

The asymmetric size combinations are particularly important for testing margin calculation with unbalanced samples.

Misrate variation ($n = m = 20$, $\mathbf{x} = (0, 2, 4, \ldots, 38)$, $\mathbf{y} = (10, 12, 14, \ldots, 48)$) — 5 tests with varying misrates:

  • misrate-1e-2: $\mathrm{misrate} = 10^{-2}$
  • misrate-1e-3: $\mathrm{misrate} = 10^{-3}$
  • misrate-1e-4: $\mathrm{misrate} = 10^{-4}$
  • misrate-1e-5: $\mathrm{misrate} = 10^{-5}$
  • misrate-1e-6: $\mathrm{misrate} = 10^{-6}$

These tests use identical samples with varying misrates to validate the monotonicity property: smaller misrates (higher confidence) produce wider bounds. The sequence demonstrates how bound width increases as misrate decreases, helping implementations verify correct margin calculation.
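The monotonicity can be observed directly by feeding increasing margins into the bound indices. The sketch below uses the four $\operatorname{PairwiseMargin}(30, 30, \cdot)$ demo values listed earlier (276, 328, 390, 464); the samples themselves are hypothetical and were chosen only to have $n = m = 30$, and the function name is an assumption.

```python
def shift_bounds_from_margin(x, y, margin):
    """Bounds from sorted pairwise differences, given an already-computed margin."""
    z = sorted(xi - yj for xi in x for yj in y)
    half = margin // 2
    # z[half] is z_(k_left); z[len(z) - half - 1] is z_(k_right).
    return z[half], z[len(z) - half - 1]

# Hypothetical samples with n = m = 30 (not taken from the test suite).
x = [2 * i for i in range(30)]
y = [3 * j + 1 for j in range(30)]

# Margins for n = m = 30 as listed in the PairwiseMargin demo examples.
demo_margins = {1e-6: 276, 1e-5: 328, 1e-4: 390, 1e-3: 464}

widths = []
for misrate, margin in sorted(demo_margins.items()):
    lower, upper = shift_bounds_from_margin(x, y, margin)
    widths.append(upper - lower)
    print(f"misrate={misrate:g} margin={margin} bounds=[{lower}, {upper}]")

# Widths are listed from the smallest misrate to the largest.
print(widths == sorted(widths, reverse=True))  # True
```

A smaller misrate gives a smaller margin, which pushes $k_{\text{left}}$ and $k_{\text{right}}$ toward the extremes of the sorted differences and therefore cannot shrink the interval.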

Unsorted tests — verify independent sorting of $\mathbf{x}$ and $\mathbf{y}$ (15 tests):

  • unsorted-x-natural-5-5: $\mathbf{x} = (5, 3, 1, 4, 2)$, $\mathbf{y} = (1, 2, 3, 4, 5)$, $\mathrm{misrate} = 10^{-2}$ (X unsorted, Y sorted)
  • unsorted-y-natural-5-5: $\mathbf{x} = (1, 2, 3, 4, 5)$, $\mathbf{y} = (5, 3, 1, 4, 2)$, $\mathrm{misrate} = 10^{-2}$ (X sorted, Y unsorted)
  • unsorted-both-natural-5-5: $\mathbf{x} = (5, 3, 1, 4, 2)$, $\mathbf{y} = (5, 3, 1, 4, 2)$, $\mathrm{misrate} = 10^{-2}$ (both unsorted)
  • unsorted-x-shuffle-5-5: $\mathbf{x} = (3, 1, 5, 4, 2)$, $\mathbf{y} = (1, 2, 3, 4, 5)$, $\mathrm{misrate} = 10^{-2}$ (X shuffled)
  • unsorted-y-shuffle-5-5: $\mathbf{x} = (1, 2, 3, 4, 5)$, $\mathbf{y} = (4, 2, 5, 1, 3)$, $\mathrm{misrate} = 10^{-2}$ (Y shuffled)
  • unsorted-both-shuffle-5-5: $\mathbf{x} = (3, 1, 5, 4, 2)$, $\mathbf{y} = (2, 4, 1, 5, 3)$, $\mathrm{misrate} = 10^{-2}$ (both shuffled)
  • unsorted-demo-unsorted-x: $\mathbf{x} = (5, 1, 4, 2, 3)$, $\mathbf{y} = (3, 4, 5, 6, 7)$, $\mathrm{misrate} = 0.05$ (demo-1 X unsorted)
  • unsorted-demo-unsorted-y: $\mathbf{x} = (1, 2, 3, 4, 5)$, $\mathbf{y} = (7, 3, 6, 4, 5)$, $\mathrm{misrate} = 0.05$ (demo-1 Y unsorted)
  • unsorted-demo-both-unsorted: $\mathbf{x} = (4, 1, 5, 2, 3)$, $\mathbf{y} = (6, 3, 7, 4, 5)$, $\mathrm{misrate} = 0.05$ (demo-1 both unsorted)
  • unsorted-identity-unsorted: $\mathbf{x} = (4, 1, 5, 2, 3)$, $\mathbf{y} = (5, 1, 4, 3, 2)$, $\mathrm{misrate} = 10^{-2}$ (identity property, both unsorted)
  • unsorted-negative-unsorted: $\mathbf{x} = (-1, -5, -3, -2, -4)$, $\mathbf{y} = (-2, -4, -3, -5, -1)$, $\mathrm{misrate} = 10^{-2}$ (negative values unsorted)
  • unsorted-asymmetric-5-10: $\mathbf{x} = (2, 5, 1, 3, 4)$, $\mathbf{y} = (10, 5, 2, 8, 4, 1, 9, 3, 7, 6)$, $\mathrm{misrate} = 10^{-2}$ (asymmetric sizes, both unsorted)
  • unsorted-duplicates: $\mathbf{x} = (3, 3, 3, 3, 3)$, $\mathbf{y} = (5, 5, 5, 5, 5)$, $\mathrm{misrate} = 10^{-2}$ (all duplicates, any order)
  • unsorted-mixed-duplicates-x: $\mathbf{x} = (2, 1, 3, 2, 1)$, $\mathbf{y} = (1, 1, 2, 2, 3)$, $\mathrm{misrate} = 10^{-2}$ (X has unsorted duplicates)
  • unsorted-mixed-duplicates-y: $\mathbf{x} = (1, 1, 2, 2, 3)$, $\mathbf{y} = (3, 2, 1, 3, 2)$, $\mathrm{misrate} = 10^{-2}$ (Y has unsorted duplicates)

These unsorted tests are critical because $\operatorname{ShiftBounds}$ computes bounds from pairwise differences, requiring both samples to be sorted independently. The variety ensures implementations don't incorrectly assume pre-sorted input or sort samples together. Each test must produce identical output to its sorted counterpart, validating that the implementation correctly handles the sorting step.

No performance test: $\operatorname{ShiftBounds}$ uses the $\text{FastShift}$ algorithm internally, which is already validated by the $\operatorname{Shift}$ performance test. Since bounds computation involves only two quantile calculations from the pairwise differences (at positions determined by $\operatorname{PairwiseMargin}$), the performance characteristics are equivalent to computing two $\operatorname{Shift}$ estimates, which completes efficiently for large samples.

RatioBounds Tests

$$\operatorname{RatioBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}) = \exp(\operatorname{ShiftBounds}(\log \mathbf{x}, \log \mathbf{y}, \mathrm{misrate}))$$

The $\operatorname{RatioBounds}$ test suite contains 61 correctness test cases (3 demo + 9 natural + 6 property + 10 edge + 9 multiplic + 4 uniform + 5 misrate + 15 unsorted). Since $\operatorname{RatioBounds}$ returns bounds rather than a point estimate, tests validate that the bounds contain $\operatorname{Ratio}(\mathbf{x}, \mathbf{y})$ and satisfy equivariance properties. Each test case output is a JSON object with lower and upper fields representing the interval bounds. All samples must contain strictly positive values. The domain constraint $\mathrm{misrate} \geq \frac{2}{\binom{n+m}{n}}$ is enforced; inputs violating this return a domain error.
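The log-space construction can be sketched as follows. As with the other sketches, the margin is passed in as a plain input instead of being computed, the helper names are hypothetical, and `ValueError` stands in for whatever domain error an implementation reports for non-positive values.

```python
import math

def shift_bounds_from_margin(x, y, margin):
    """Bounds from sorted pairwise differences, given an already-computed margin."""
    z = sorted(xi - yj for xi in x for yj in y)
    half = margin // 2
    return z[half], z[len(z) - half - 1]

def ratio_bounds_from_margin(x, y, margin):
    """exp of the shift bounds computed on log-transformed samples."""
    if min(x) <= 0 or min(y) <= 0:
        raise ValueError("all values must be strictly positive")
    log_x = [math.log(v) for v in x]
    log_y = [math.log(v) for v in y]
    lower, upper = shift_bounds_from_margin(log_x, log_y, margin)
    return math.exp(lower), math.exp(upper)

# Constant samples: every pairwise ratio is 3/5, so both bounds are about 0.6.
print(ratio_bounds_from_margin([3] * 5, [5] * 5, 4))

# With margin 0 the bounds are the extreme pairwise ratios (about 0.2 and 20 here).
print(ratio_bounds_from_margin(range(2, 21, 2), range(1, 11), 0))
```

Results are compared with a tolerance in practice, since the round trip through `log` and `exp` introduces small floating-point error.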

Demo examples ($n = m = 5$, positive samples) — 3 tests:

  • demo-1: $\mathbf{x} = (1, 2, 3, 4, 5)$, $\mathbf{y} = (2, 3, 4, 5, 6)$, $\mathrm{misrate} = 0.05$
  • demo-2: $\mathbf{x} = (1, 2, 3, 4, 5)$, $\mathbf{y} = (2, 3, 4, 5, 6)$, $\mathrm{misrate} = 0.01$, expected: wider bounds than demo-1
  • demo-3: $\mathbf{x} = (2, 3, 4, 5, 6)$, $\mathbf{y} = (2, 3, 4, 5, 6)$, $\mathrm{misrate} = 0.05$, expected: bounds containing $1$ (identity case)

These cases illustrate how tighter misrates produce wider bounds and validate the identity property where identical samples yield bounds containing one.

Natural sequences ($[n, m] \in \{5, 8, 10\} \times \{5, 8, 10\}$, $\mathrm{misrate} = 10^{-2}$) — 9 combinations:

  • natural-5-5: $\mathbf{x} = (1, \ldots, 5)$, $\mathbf{y} = (1, \ldots, 5)$, expected bounds containing $1$
  • natural-5-8: $\mathbf{x} = (1, \ldots, 5)$, $\mathbf{y} = (1, \ldots, 8)$
  • natural-5-10: $\mathbf{x} = (1, \ldots, 5)$, $\mathbf{y} = (1, \ldots, 10)$
  • natural-8-5: $\mathbf{x} = (1, \ldots, 8)$, $\mathbf{y} = (1, \ldots, 5)$
  • natural-8-8: $\mathbf{x} = (1, \ldots, 8)$, $\mathbf{y} = (1, \ldots, 8)$, expected bounds containing $1$
  • natural-8-10: $\mathbf{x} = (1, \ldots, 8)$, $\mathbf{y} = (1, \ldots, 10)$
  • natural-10-5: $\mathbf{x} = (1, \ldots, 10)$, $\mathbf{y} = (1, \ldots, 5)$
  • natural-10-8: $\mathbf{x} = (1, \ldots, 10)$, $\mathbf{y} = (1, \ldots, 8)$
  • natural-10-10: $\mathbf{x} = (1, \ldots, 10)$, $\mathbf{y} = (1, \ldots, 10)$, expected bounds containing $1$

These sizes are chosen to satisfy $\mathrm{misrate} \geq \frac{2}{\binom{n+m}{n}}$ for all combinations.

Property validation ($n = m = 10$, $\mathrm{misrate} = 10^{-3}$) — 6 tests:

  • property-identity: $\mathbf{x} = (1, 2, \ldots, 10)$, $\mathbf{y} = (1, 2, \ldots, 10)$, bounds must contain $1$
  • property-scale-2x: $\mathbf{x} = (2, 4, \ldots, 20)$, $\mathbf{y} = (1, 2, \ldots, 10)$, bounds must contain $2$
  • property-reciprocal: $\mathbf{x} = (1, 2, \ldots, 10)$, $\mathbf{y} = (2, 4, \ldots, 20)$, bounds must contain $0.5$ (reciprocal of scale-2x)
  • property-common-scale: $\mathbf{x} = (10, 20, \ldots, 100)$, $\mathbf{y} = (20, 40, \ldots, 200)$
    • Same ratio as property-reciprocal (common scale invariance)
  • property-small-values: $\mathbf{x} = (0.1, 0.2, \ldots, 1.0)$, $\mathbf{y} = (0.2, 0.4, \ldots, 2.0)$
    • Same ratio as property-reciprocal (small value handling)
  • property-mixed-scales: $\mathbf{x} = (0.01, 0.1, 1, 10, 100, 1000, 0.5, 5, 50, 500)$, $\mathbf{y} = (0.1, 1, 10, 100, 1000, 10000, 5, 50, 500, 5000)$
    • Wide range validation

Edge cases — boundary conditions and extreme scenarios (10 tests):

  • edge-min-samples: $\mathbf{x} = (2, 3, 4, 5, 6)$, $\mathbf{y} = (3, 4, 5, 6, 7)$, $\mathrm{misrate} = 0.05$
  • edge-permissive-misrate: $\mathbf{x} = (1, 2, 3, 4, 5)$, $\mathbf{y} = (2, 3, 4, 5, 6)$, $\mathrm{misrate} = 0.5$ (very narrow bounds)
  • edge-strict-misrate: $n = m = 20$, $\mathrm{misrate} = 10^{-6}$ (very wide bounds)
  • edge-unity-ratio: $n = m = 10$, all values $= 5$, $\mathrm{misrate} = 10^{-3}$ (bounds around 1)
  • edge-asymmetric-3-100: $n = 3$, $m = 100$, $\mathrm{misrate} = 10^{-2}$ (extreme size difference)
  • edge-asymmetric-5-50: $n = 5$, $m = 50$, $\mathrm{misrate} = 10^{-3}$ (highly unbalanced)
  • edge-duplicates: $\mathbf{x} = (3, 3, 3, 3, 3)$, $\mathbf{y} = (5, 5, 5, 5, 5)$, $\mathrm{misrate} = 10^{-2}$ (all duplicates, bounds around 0.6)
  • edge-wide-range: $n = m = 10$, values spanning $10^{-3}$ to $10^8$, $\mathrm{misrate} = 10^{-3}$ (extreme value range)
  • edge-tiny-values: $n = m = 10$, values $\approx 10^{-6}$, $\mathrm{misrate} = 10^{-3}$ (numerical precision)
  • edge-large-values: $n = m = 10$, values $\approx 10^8$, $\mathrm{misrate} = 10^{-3}$ (large magnitude)

These edge cases stress-test boundary conditions, numerical stability, and the margin calculation with extreme parameters.

Multiplic distribution ($[n, m] \in \{10, 30, 50\} \times \{10, 30, 50\}$, $\mathrm{misrate} = 10^{-3}$) — 9 combinations with $\underline{\operatorname{Multiplic}}(1, 0.5)$:

  • multiplic-10-10, multiplic-10-30, multiplic-10-50
  • multiplic-30-10, multiplic-30-30, multiplic-30-50
  • multiplic-50-10, multiplic-50-30, multiplic-50-50
  • Random generation: $\mathbf{x}$ uses seed 0, $\mathbf{y}$ uses seed 1

These fuzzy tests validate that bounds properly encompass the ratio estimate for realistic log-normally-distributed data at various sample sizes.

Uniform distribution ($[n, m] \in \{10, 100\} \times \{10, 100\}$, $\mathrm{misrate} = 10^{-4}$) — 4 combinations with $\underline{\operatorname{Uniform}}(1, 10)$:

  • uniform-10-10, uniform-10-100, uniform-100-10, uniform-100-100
  • Random generation: $\mathbf{x}$ uses seed 2, $\mathbf{y}$ uses seed 3
  • Note: positive range $[1, 10)$ used for ratio compatibility

The asymmetric size combinations are particularly important for testing margin calculation with unbalanced samples.

Misrate variation ($n = m = 20$, $\mathbf{x} = (1, 2, \ldots, 20)$, $\mathbf{y} = (2, 4, \ldots, 40)$) — 5 tests with varying misrates:

  • misrate-1e-2: $\mathrm{misrate} = 10^{-2}$
  • misrate-1e-3: $\mathrm{misrate} = 10^{-3}$
  • misrate-1e-4: $\mathrm{misrate} = 10^{-4}$
  • misrate-1e-5: $\mathrm{misrate} = 10^{-5}$
  • misrate-1e-6: $\mathrm{misrate} = 10^{-6}$

These tests use identical samples with varying misrates to validate the monotonicity property: smaller misrates (higher confidence) produce wider bounds. The sequence demonstrates how bound width increases as misrate decreases, helping implementations verify correct margin calculation.

Unsorted tests — verify independent sorting of $\mathbf{x}$ and $\mathbf{y}$ (15 tests):

  • unsorted-x-natural-5-5: $\mathbf{x} = (5, 3, 1, 4, 2)$, $\mathbf{y} = (1, 2, 3, 4, 5)$, $\mathrm{misrate} = 10^{-2}$ (X unsorted, Y sorted)
  • unsorted-y-natural-5-5: $\mathbf{x} = (1, 2, 3, 4, 5)$, $\mathbf{y} = (5, 3, 1, 4, 2)$, $\mathrm{misrate} = 10^{-2}$ (X sorted, Y unsorted)
  • unsorted-both-natural-5-5: $\mathbf{x} = (5, 3, 1, 4, 2)$, $\mathbf{y} = (5, 3, 1, 4, 2)$, $\mathrm{misrate} = 10^{-2}$ (both unsorted)
  • unsorted-x-shuffle-5-5: $\mathbf{x} = (3, 1, 5, 4, 2)$, $\mathbf{y} = (1, 2, 3, 4, 5)$, $\mathrm{misrate} = 10^{-2}$ (X shuffled)
  • unsorted-y-shuffle-5-5: $\mathbf{x} = (1, 2, 3, 4, 5)$, $\mathbf{y} = (4, 2, 5, 1, 3)$, $\mathrm{misrate} = 10^{-2}$ (Y shuffled)
  • unsorted-both-shuffle-5-5: $\mathbf{x} = (3, 1, 5, 4, 2)$, $\mathbf{y} = (2, 4, 1, 5, 3)$, $\mathrm{misrate} = 10^{-2}$ (both shuffled)
  • unsorted-demo-unsorted-x: $\mathbf{x} = (5, 1, 4, 2, 3)$, $\mathbf{y} = (2, 3, 4, 5, 6)$, $\mathrm{misrate} = 0.05$ (demo-1 X unsorted)
  • unsorted-demo-unsorted-y: $\mathbf{x} = (1, 2, 3, 4, 5)$, $\mathbf{y} = (6, 2, 5, 3, 4)$, $\mathrm{misrate} = 0.05$ (demo-1 Y unsorted)
  • unsorted-demo-both-unsorted: $\mathbf{x} = (4, 1, 5, 2, 3)$, $\mathbf{y} = (5, 2, 6, 3, 4)$, $\mathrm{misrate} = 0.05$ (demo-1 both unsorted)
  • unsorted-identity-unsorted: $\mathbf{x} = (4, 1, 5, 2, 3)$, $\mathbf{y} = (5, 1, 4, 3, 2)$, $\mathrm{misrate} = 10^{-2}$ (identity property, both unsorted)
  • unsorted-scale-unsorted: $\mathbf{x} = (10, 30, 20)$, $\mathbf{y} = (15, 5, 10)$, $\mathrm{misrate} = 0.5$ (scale relationship, both unsorted)
  • unsorted-asymmetric-5-10: $\mathbf{x} = (2, 5, 1, 3, 4)$, $\mathbf{y} = (10, 5, 2, 8, 4, 1, 9, 3, 7, 6)$, $\mathrm{misrate} = 10^{-2}$ (asymmetric sizes, both unsorted)
  • unsorted-duplicates: $\mathbf{x} = (3, 3, 3, 3, 3)$, $\mathbf{y} = (5, 5, 5, 5, 5)$, $\mathrm{misrate} = 10^{-2}$ (all duplicates, any order)
  • unsorted-mixed-duplicates-x: $\mathbf{x} = (2, 1, 3, 2, 1)$, $\mathbf{y} = (1, 1, 2, 2, 3)$, $\mathrm{misrate} = 10^{-2}$ (X has unsorted duplicates)
  • unsorted-mixed-duplicates-y: $\mathbf{x} = (1, 1, 2, 2, 3)$, $\mathbf{y} = (3, 2, 1, 3, 2)$, $\mathrm{misrate} = 10^{-2}$ (Y has unsorted duplicates)

These unsorted tests are critical because $\operatorname{RatioBounds}$ computes bounds from pairwise ratios, requiring both samples to be sorted independently. The variety ensures implementations don't incorrectly assume pre-sorted input or sort samples together. Each test must produce identical output to its sorted counterpart, validating that the implementation correctly handles the sorting step.

No performance test: $\operatorname{RatioBounds}$ uses the $\text{FastRatio}$ algorithm internally, which delegates to $\text{FastShift}$ in log-space. Since bounds computation involves only two quantile calculations from the pairwise differences of the log-transformed samples (at positions determined by $\operatorname{PairwiseMargin}$), the performance characteristics are equivalent to computing two $\operatorname{Ratio}$ estimates, which completes efficiently for large samples.

CenterBounds Tests

$$\operatorname{CenterBounds}(\mathbf{x}, \mathrm{misrate}) = [w_{(k_{\text{left}})}, w_{(k_{\text{right}})}]$$

where $\mathbf{w} = \{ \frac{x_i + x_j}{2} \}$ (pairwise averages, sorted) for $i \leq j$, $k_{\text{left}} = \lfloor \operatorname{SignedRankMargin} / 2 \rfloor + 1$, $k_{\text{right}} = N - \lfloor \operatorname{SignedRankMargin} / 2 \rfloor$, and $N = \frac{n(n+1)}{2}$.
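A brute-force sketch of this definition is shown below. It takes the margin as a plain input instead of computing $\operatorname{SignedRankMargin}$, and it enumerates all $N$ pairwise averages, so it is meant for small samples only. The function name and the margin values passed to it are illustrative assumptions.

```python
def center_bounds_from_margin(x, margin):
    """Return (w_(k_left), w_(k_right)) for a given, already-computed margin."""
    n = len(x)
    # Pairwise averages over i <= j, including each element paired with itself.
    w = sorted((x[i] + x[j]) / 2 for i in range(n) for j in range(i, n))
    half = margin // 2
    k_left = half + 1            # 1-based order statistic index
    k_right = len(w) - half      # len(w) equals N = n(n+1)/2
    return w[k_left - 1], w[k_right - 1]

x = [1, 2, 3, 4, 5]
print(center_bounds_from_margin(x, 0))  # (1.0, 5.0)
print(center_bounds_from_margin(x, 2))  # (1.5, 4.5)

# Input order does not matter because the averages are sorted.
print(center_bounds_from_margin([4, 1, 5, 2, 3], 2))  # (1.5, 4.5)
```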

The $\operatorname{CenterBounds}$ test suite contains 43 test cases (3 demo + 4 natural + 5 property + 7 edge + 4 symmetric + 4 asymmetric + 2 additive + 2 uniform + 4 misrate + 6 unsorted + 2 error cases). Since $\operatorname{CenterBounds}$ returns bounds rather than a point estimate, tests validate that bounds contain $\operatorname{Center}(\mathbf{x})$ and satisfy equivariance properties. Each test case output is a JSON object with lower and upper fields representing the interval bounds.

Demo examples — from manual introduction, validating basic bounds:

  • demo-1: $\mathbf{x} = (1, 2, 3, 4, 5)$, $\mathrm{misrate} = 0.1$, expected output: $[1.5, 4.5]$
  • demo-2: $\mathbf{x} = (1, \ldots, 10)$, $\mathrm{misrate} = 0.01$, expected output: $[2.5, 8.5]$
  • demo-3: $\mathbf{x} = (0, 2, 4, 6, 8)$, $\mathrm{misrate} = 0.1$

These cases illustrate how tighter misrates produce wider bounds.

Natural sequences ($n \in \{5, 7, 10, 20\}$, $\mathrm{misrate} = 0.01$) — 4 tests:

  • natural-5: $\mathbf{x} = (1, 2, 3, 4, 5)$, bounds containing $\operatorname{Center} = 3$
  • natural-7: $\mathbf{x} = (1, \ldots, 7)$, bounds containing $\operatorname{Center} = 4$
  • natural-10: $\mathbf{x} = (1, \ldots, 10)$, expected output: $[2.5, 8.5]$
  • natural-20: $\mathbf{x} = (1, \ldots, 20)$, bounds containing $\operatorname{Center} = 10.5$

Property validation ($n = 5$, $\mathrm{misrate} = 0.05$) — 5 tests:

  • property-identity: $\mathbf{x} = (1, 2, 3, 4, 5)$, bounds must contain $\operatorname{Center} = 3$
  • property-centered: $\mathbf{x} = (-2, -1, 0, 1, 2)$, bounds must contain $\operatorname{Center} = 0$
  • property-location-shift: $\mathbf{x} = (11, 12, 13, 14, 15)$ (= demo-1 + 10), bounds must be demo-1 bounds + 10
  • property-scale-2x: $\mathbf{x} = (2, 4, 6, 8, 10)$ (= 2 × demo-1), bounds must be 2× demo-1 bounds
  • property-mixed-signs: $\mathbf{x} = (-2, -1, 0, 1, 2)$, validates bounds crossing zero

Edge cases — boundary conditions and extreme scenarios (7 tests):

  • edge-two-elements: $\mathbf{x} = (1, 2)$, $\mathrm{misrate} = 0.5$ (minimum meaningful sample)
  • edge-three-elements: $\mathbf{x} = (1, 2, 3)$, $\mathrm{misrate} = 0.25$ (small sample)
  • edge-loose-misrate: $\mathbf{x} = (1, 2, 3, 4, 5)$, $\mathrm{misrate} = 0.5$ (permissive bounds)
  • edge-strict-misrate: $\mathbf{x} = (1, \ldots, 10)$, $\mathrm{misrate} = 0.002$ (near-minimum misrate for $n = 10$)
  • edge-duplicates-10: $\mathbf{x} = (5, 5, 5, 5, 5, 5, 5, 5, 5, 5)$, $\mathrm{misrate} = 0.01$ (all identical, bounds $= [5, 5]$)
  • edge-negative: $\mathbf{x} = (-5, -4, -3, -2, -1)$, $\mathrm{misrate} = 0.05$ (negative values)
  • edge-wide-range: $\mathbf{x} = (1, 10, 100, 1000, 10000)$, $\mathrm{misrate} = 0.1$ (extreme value range)

Symmetric distributions ($\mathrm{misrate} = 0.05$) — 4 tests with symmetric data:

  • symmetric-5: $\mathbf{x} = (-2, -1, 0, 1, 2)$, bounds centered around $0$
  • symmetric-7: $\mathbf{x} = (-3, -2, -1, 0, 1, 2, 3)$, bounds centered around $0$
  • symmetric-10: $n = 10$ symmetric around $0$
  • symmetric-15: $n = 15$ symmetric around $0$

These tests validate that symmetric data produces symmetric bounds around the center.

Asymmetric distributions ($n = 5$, $\mathrm{misrate} = 0.1$) — 4 tests validating bounds with asymmetric data:

  • asymmetric-left-skew: $\mathbf{x} = (1, 7, 8, 9, 10)$, expected output: $[4, 9.5]$
  • asymmetric-right-skew: $\mathbf{x} = (1, 2, 3, 4, 10)$, expected output: $[1.5, 7]$
  • asymmetric-bimodal: $\mathbf{x} = (1, 1, 5, 9, 9)$, expected output: $[1, 9]$
  • asymmetric-outlier: $\mathbf{x} = (1, 2, 3, 4, 100)$, expected output: $[1.5, 52]$

These tests validate that $\operatorname{CenterBounds}$ handles asymmetric data correctly, complementing the symmetric test cases.

Additive distribution ($\mathrm{misrate} = 0.01$) — 2 tests with $\underline{\operatorname{Additive}}(10, 1)$:

  • additive-10: $n = 10$, seed 0
  • additive-20: $n = 20$, seed 0

Uniform distribution ($\mathrm{misrate} = 0.01$) — 2 tests with $\underline{\operatorname{Uniform}}(0, 1)$:

  • uniform-10: $n = 10$, seed 1
  • uniform-20: $n = 20$, seed 1

Misrate variation ($\mathbf{x} = (1, \ldots, 10)$) — 4 tests with varying misrates:

  • misrate-1e-1: $\mathrm{misrate} = 0.1$
  • misrate-5e-2: $\mathrm{misrate} = 0.05$
  • misrate-1e-2: $\mathrm{misrate} = 0.01$
  • misrate-5e-3: $\mathrm{misrate} = 0.005$

These tests validate monotonicity: smaller misrates produce wider bounds.

Unsorted tests — verify sorting independence (6 tests):

  • unsorted-reverse-5: $\mathbf{x} = (5, 4, 3, 2, 1)$, must equal natural-5 output
  • unsorted-reverse-7: $\mathbf{x} = (7, 6, 5, 4, 3, 2, 1)$, must equal natural-7 output
  • unsorted-shuffle-5: $\mathbf{x}$ shuffled, must equal sorted counterpart
  • unsorted-shuffle-7: $\mathbf{x}$ shuffled, must equal sorted counterpart
  • unsorted-negative-5: negative values unsorted
  • unsorted-mixed-signs-5: mixed signs unsorted

These tests validate that $\operatorname{CenterBounds}$ produces identical results regardless of input order.

Error cases — 2 tests validating input validation:

  • error-single-element: $\mathbf{x} = (1)$, $\mathrm{misrate} = 0.5$ (minimum sample size violation)
  • error-invalid-misrate: $\mathbf{x} = (1, 2, 3, 4, 5)$, $\mathrm{misrate} = 0.001$ (misrate below minimum achievable)

Test Framework

The reference test framework consists of three components:

Test generation — The C# implementation defines test inputs programmatically using builder patterns. For deterministic cases, inputs are explicitly specified. For random cases, the framework uses controlled seeds with System.Random to ensure reproducibility across all platforms.

The random generation mechanism works as follows:

  • Each test suite builder maintains a seed counter initialized to zero.
  • For one-sample estimators, each distribution type receives the next available seed. The same random generator produces all samples for all sizes within that distribution.
  • For two-sample estimators, each pair of distributions receives two consecutive seeds: one for the $\mathbf{x}$ sample generator and one for the $\mathbf{y}$ sample generator.
  • The seed counter increments with each random generator creation, ensuring deterministic test data generation.
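The seed-counter mechanism described above can be sketched as follows. This is an illustration only: the class and method names are hypothetical, and Python's `random.Random` is not the same generator as the `System.Random` used by the reference implementation, so the values it produces will not match the stored test data.

```python
import random


class SuiteBuilder:
    """Hypothetical builder that hands out consecutive seeds."""

    def __init__(self):
        self.next_seed = 0  # seed counter starts at zero

    def new_generator(self):
        # Each generator creation consumes one seed and advances the counter.
        rng = random.Random(self.next_seed)
        self.next_seed += 1
        return rng

    def one_sample(self, sizes):
        # One generator per distribution; it produces all sizes in order.
        rng = self.new_generator()
        return {n: [rng.random() for _ in range(n)] for n in sizes}

    def two_sample(self, n, m):
        # Two consecutive seeds: the first for x, the second for y.
        rng_x = self.new_generator()
        rng_y = self.new_generator()
        x = [rng_x.random() for _ in range(n)]
        y = [rng_y.random() for _ in range(m)]
        return x, y
```

Two builders constructed independently yield identical samples for the same sequence of calls, which is the reproducibility property the counter is meant to provide.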

For $\underline{\operatorname{Additive}}$ distributions, random values are generated using the Box-Muller transform, which converts pairs of uniform random values into normally distributed values. The transform applies the formula:

$$X = \mu + \sigma \sqrt{-2 \ln(U_1)} \sin(2 \pi U_2)$$

where $U_1, U_2$ are uniform random values from $\underline{\operatorname{Uniform}}(0, 1)$, $\mu$ is the mean, and $\sigma$ is the standard deviation.

For $\underline{\operatorname{Uniform}}$ distributions, random values are generated directly using the quantile function:

$$X = \min + U \cdot (\max - \min)$$

where $U$ is a uniform random value from $\underline{\operatorname{Uniform}}(0, 1)$.
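The two formulas above translate directly into code. The sketch below takes the uniform inputs as explicit arguments so that the mapping can be checked by hand; the function names are hypothetical, and how the reference implementation draws and pairs its uniform values is not shown here.

```python
import math


def additive_value(mean, std, u1, u2):
    # Box-Muller (sine branch): maps two uniform values to one normal value.
    # u1 must lie in (0, 1] so that log(u1) is defined.
    radius = math.sqrt(-2.0 * math.log(u1))
    return mean + std * radius * math.sin(2.0 * math.pi * u2)


def uniform_value(lo, hi, u):
    # Quantile function of the uniform distribution on [lo, hi].
    return lo + u * (hi - lo)
```

For example, with `u1 = exp(-0.5)` the radius term equals 1, and with `u2 = 0.25` the sine term equals 1, so `additive_value(10, 1, u1, u2)` evaluates to the mean plus one standard deviation, up to floating-point rounding.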

The framework executes the reference implementation on all generated inputs and serializes input-output pairs to JSON format.

Test validation — Each language implementation loads the JSON test cases and executes them against its local estimator implementation. Assertions verify that outputs match expected values within a given numerical tolerance (typically $10^{-10}$ for relative error).
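A tolerance check of this kind might look like the sketch below. The helper name is hypothetical, and the absolute-tolerance fallback is an assumption added so that expected values of exactly zero can still be matched; the text specifies only the relative tolerance.

```python
import math

REL_TOL = 1e-10  # tolerance quoted in the text; an implementation may pick another


def outputs_match(actual, expected, rel_tol=REL_TOL):
    # abs_tol (an assumption of this sketch) covers expected values of exactly
    # zero, where a purely relative check rejects any nonzero rounding error.
    return math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=rel_tol)
```

For bounds estimators the same check would be applied to the lower and upper fields separately.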

Test data format — Each test case is a JSON file containing input and output fields. For one-sample estimators, the input contains array x and optional parameters. For two-sample estimators, input contains arrays x and y. For bounds estimators ($\operatorname{CenterBounds}$, $\operatorname{ShiftBounds}$, $\operatorname{RatioBounds}$), input additionally contains misrate. Output is a single numeric value for point estimators, or an object with lower and upper fields for bounds estimators.
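As an illustration of this format, the sketch below parses one test case. The field names (input, output, x, y, misrate, lower, upper) come from the description above and the values from the CenterBounds demo-1 case; the exact file layout and the `parse_case` helper are assumptions.

```python
import json

# Example document built from the fields named above; the exact layout of
# the stored files is an assumption of this sketch.
RAW = (
    '{"input": {"x": [1, 2, 3, 4, 5], "misrate": 0.1},'
    ' "output": {"lower": 1.5, "upper": 4.5}}'
)


def parse_case(text):
    case = json.loads(text)
    inp = case["input"]
    out = case["output"]
    # Bounds estimators serialize an object; point estimators a bare number.
    if isinstance(out, dict):
        expected = (out["lower"], out["upper"])
    else:
        expected = float(out)
    # y and misrate are absent for one-sample point estimators.
    return inp["x"], inp.get("y"), inp.get("misrate"), expected


x, y, misrate, expected = parse_case(RAW)
print(x, y, misrate, expected)
```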

Performance testing — The toolkit provides $O(n \log n)$ fast algorithms for $\operatorname{Center}$, $\operatorname{Spread}$, and $\operatorname{Shift}$ estimators, dramatically more efficient than naive implementations that materialize all pairwise combinations. Performance tests use sample size $n = 100{,}000$ (for one-sample) or $n = m = 100{,}000$ (for two-sample). This specific size creates a clear performance distinction: fast implementations ($O(n \log n)$ or $O((m+n) \log L)$) complete in under 5 seconds on modern hardware across all supported languages, while naive implementations ($O(n^2 \log n)$ or $O(m n \log(m n))$) would be prohibitively slow (taking hours or failing due to memory exhaustion). With $n = 100{,}000$, naive approaches would need to materialize approximately 5 billion pairwise values for $\operatorname{Center}$/$\operatorname{Spread}$ or 10 billion for $\operatorname{Shift}$, whereas fast algorithms require only $O(n)$ additional memory. Performance tests serve dual purposes: correctness validation at scale and performance regression detection, ensuring implementations use the efficient algorithms and remain practical for real-world datasets with hundreds of thousands of observations. Performance test specifications are provided in the respective estimator sections above.
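The pairwise counts quoted above follow from simple arithmetic, reproduced in the sketch below. The 8-bytes-per-value figure assumes 64-bit floating-point storage, which the text does not state.

```python
n = m = 100_000

center_pairs = n * (n + 1) // 2  # pairs with i <= j, as in the Center definition
shift_pairs = n * m              # every (x_i, y_j) combination

bytes_per_value = 8              # assumption: 64-bit floats
center_gb = center_pairs * bytes_per_value / 1e9
shift_gb = shift_pairs * bytes_per_value / 1e9

print(center_pairs, center_gb)
print(shift_pairs, shift_gb)
```

Under that storage assumption, a naive implementation would need roughly 40 GB for the one-sample case and 80 GB for the two-sample case, which is consistent with the memory exhaustion mentioned above.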

This framework ensures that all seven language implementations maintain strict numerical agreement across the full test suite.