SignedRankMargin

SignedRankMargin(n,misrate)\operatorname{SignedRankMargin}(n, \mathrm{misrate})

Exclusion count for one-sample signed-rank based bounds.

  • Purpose — determines extreme pairwise averages to exclude when constructing bounds
  • Based on — Wilcoxon signed-rank distribution under weak symmetry
  • Returns — total margin split evenly between lower and upper tails
  • Used byCenterBounds\operatorname{CenterBounds} to select appropriate order statistics
  • Complexity — exact for n63n \leq 63, approximated for larger
  • Domainn2n \geq 2, misrate21n\mathrm{misrate} \geq 2^{1-n}
  • Unit — count
  • Note — assumes weak symmetry and weak continuity

Properties

  • Bounds 0SignedRankMargin(n,misrate)n(n+1)20 \leq \operatorname{SignedRankMargin}(n, \mathrm{misrate}) \leq \frac{n(n+1)}{2}
  • Monotonicity lower misrate \rightarrow smaller margin \rightarrow wider bounds

Example

  • SignedRankMargin(10, 0.05) = 18
  • SignedRankMargin(30, 1e-4) = 112
  • SignedRankMargin(100, 1e-6) = 706

This is a supporting function that CenterBounds\operatorname{CenterBounds} uses internally, so most users do not need to call it directly. It calculates how many extreme pairwise averages should be excluded when constructing bounds, based on sample size and the desired error rate. A lower misrate (higher confidence) results in a smaller margin, which produces wider bounds. The function automatically chooses between exact computation for small samples and a fast approximation for large samples.

Algorithm

The SignedRankMargin\operatorname{SignedRankMargin} function determines how many extreme pairwise averages to exclude when constructing bounds around Center(x)\operatorname{Center}(\mathbf{x}). Given a sample x=(x1,,xn)\mathbf{x} = (x_1, \ldots, x_n), the CenterBounds\operatorname{CenterBounds} estimator computes all N=n(n+1)2N = \frac{n(n+1)}{2} pairwise averages wij=xi+xj2w_{i j} = \frac{x_i + x_j}{2} for iji \leq j and sorts them. The bounds select specific order statistics from this sorted sequence: [w(kleft),w(kright)][w_{(k_{\text{left}})}, w_{(k_{\text{right}})}]. The challenge lies in determining which order statistics produce bounds that contain the true center with probability 1misrate1 - \mathrm{misrate}.

The margin function is the one-sample analog of PairwiseMargin\operatorname{PairwiseMargin}. While PairwiseMargin\operatorname{PairwiseMargin} uses the Mann-Whitney distribution for two-sample comparisons, SignedRankMargin\operatorname{SignedRankMargin} uses the Wilcoxon signed-rank distribution for one-sample inference. Under the weak symmetry assumption, the signed-rank statistic has a known distribution that enables exact computation of bounds coverage.

For symmetric distributions, consider the signs of deviations from the center. The Wilcoxon signed-rank statistic WW sums the ranks of positive deviations:

W=i=1nRi1(xi>θ)W = \sum_{i=1}^n R_i \cdot \mathbb{1}(x_i > \theta)

where RiR_i is the rank of xiθ\lvert x_i - \theta \rvert among all xjθ\lvert x_j - \theta \rvert, and θ\theta is the true center. Under symmetry, each deviation is equally likely to be positive or negative, giving WW a discrete distribution over [0,n(n+1)2][0, \frac{n(n+1)}{2}].

The connection to pairwise averages is fundamental: the kk-th order statistic of sorted pairwise averages corresponds to a specific threshold of the signed-rank statistic. By computing the distribution of WW, we determine which order statistics provide bounds with the desired coverage.

Two computational approaches provide the distribution of WW: exact calculation for small samples and approximation for large samples.

Exact method

Small sample sizes allow exact computation without approximation. The Wilcoxon signed-rank distribution has 2n2^n equally likely outcomes under symmetry, corresponding to all possible sign patterns for deviations from the center.

Dynamic programming builds the probability mass function efficiently. Define p(w)p(w) as the number of sign patterns producing signed-rank statistic equal to ww. The recurrence considers whether to include rank ii in the positive sum:

pi(w)=pi1(wi)+pi1(w)p_i(w) = p_{i-1}(w - i) + p_{i-1}(w)

with base case p0(0)=1p_0(0) = 1. This builds the distribution incrementally, rank by rank.

The algorithm computes cumulative probabilities Pr(Ww)\Pr(W \leq w) sequentially until the threshold misrate2\frac{\mathrm{misrate}}{2} is exceeded. For symmetric two-tailed bounds, the margin becomes SignedRankMargin=2w\operatorname{SignedRankMargin} = 2w. Memory is O(n2)O(n^2) for storing the probability array, and time is O(n3)O(n^3) for the full computation.

The sequential computation performs well for small misrates. For misrate=106\mathrm{misrate} = 10^{-6}, the threshold ww typically remains small, requiring only iterations through the lower tail regardless of sample size.

Approximate method

Large samples make exact computation impractical. For n>63n > 63, the Wilcoxon distribution is approximated using an Edgeworth expansion.

Under symmetry, the signed-rank statistic WW has:

E[W]=n(n+1)4,Var(W)=n(n+1)2n+124\mathbb{E}[W] = \frac{n(n+1)}{4}, \quad \operatorname{Var}(W) = n(n+1)\frac{2n+1}{24}

The basic normal approximation uses these moments directly, but underestimates tail probabilities for moderate sample sizes.

The Edgeworth expansion refines this through moment-based corrections. The fourth central moment of WW is:

μ4=9n5+45n4+65n3+15n214n480\mu_4 = \frac{9n^5 + 45n^4 + 65n^3 + 15n^2 - 14n}{480}

This enables kurtosis correction:

e3=μ43σ424σ4e_3 = \frac{\mu_4 - 3 \sigma^4}{24 \sigma^4}

The approximated CDF becomes:

Pr(Ww)Φ(z)+e3ϕ(z)(z33z)\Pr(W \leq w) \approx \Phi(z) + e_3 \phi(z)(z^3 - 3z)

where z=wμ+0.5σz = \frac{w - \mu + 0.5}{\sigma} includes a continuity correction.

Binary search locates the threshold efficiently. Each CDF evaluation costs O(1)O(1), and O(logN)O(\log N) evaluations suffice. The approximate method completes in constant time regardless of sample size.

The toolkit uses exact computation for n63n \leq 63 and approximation for n>63n > 63. At n=63n = 63, the exact method requires arrays of size 2,0162,016 (=63×642= 63 \times \frac{64}{2}), which remains practical on modern hardware. The transition at n=63n = 63 is determined by the requirement that 2n2^n fits in a 64-bit integer. The approximation achieves sub-1% accuracy for n>100n > 100, making the transition smooth.

Minimum achievable misrate

The misrate\mathrm{misrate} parameter controls how many extreme pairwise averages the bounds exclude. However, sample size limits how small misrate can meaningfully become.

The most extreme configuration occurs when all signs are positive (or all negative): W=n(n+1)2W = \frac{n(n+1)}{2} or W=0W = 0. Under symmetry, this extreme occurs with probability:

misratemin=22n=21n\mathrm{misrate}_{min} = \frac{2}{2^n} = 2^{1-n}

Setting misrate<misratemin\mathrm{misrate} < \mathrm{misrate}_{min} makes bounds construction problematic. Pragmastat rejects such requests with a domain error.

The table below shows misratemin\mathrm{misrate}_{min} for small sample sizes:

nn235710
misratemin\mathrm{misrate}_{min}0.50.250.06250.01560.00195
max conf50%75%93.75%98.4%99.8%

For meaningful bounds construction, choose misrate>misratemin\mathrm{misrate} > \mathrm{misrate}_{min}. With n10n \geq 10, standard choices like misrate=103\mathrm{misrate} = 10^{-3} become feasible. With n20n \geq 20, even misrate=106\mathrm{misrate} = 10^{-6} is achievable.

using JetBrains.Annotations;
using Pragmastat.Exceptions;

namespace Pragmastat.Functions;

/// <summary>
/// SignedRankMargin function for one-sample bounds.
/// One-sample analog of PairwiseMargin using Wilcoxon signed-rank distribution.
/// </summary>
/// <param name="threshold">Maximum n for exact computation; larger n uses approximation</param>
internal class SignedRankMargin(int threshold = SignedRankMargin.MaxExactSize)
{
  public static readonly SignedRankMargin Instance = new();

  private const int MaxExactSize = 63;

  [PublicAPI]
  public int Calc(int n, double misrate)
  {
    if (n <= 0)
      throw AssumptionException.Domain(Subject.X);
    if (double.IsNaN(misrate) || misrate < 0 || misrate > 1)
      throw AssumptionException.Domain(Subject.Misrate);

    double minMisrate = MinAchievableMisrate.OneSample(n);
    if (misrate < minMisrate)
      throw AssumptionException.Domain(Subject.Misrate);

    return n <= threshold
      ? CalcExact(n, misrate)
      : CalcApprox(n, misrate);
  }

  internal int CalcExact(int n, double misrate)
  {
    int raw = CalcExactRaw(n, misrate / 2);
    return checked(raw * 2);
  }

  internal int CalcApprox(int n, double misrate)
  {
    long raw = CalcApproxRaw(n, misrate / 2);
    long margin = raw * 2;
    if (margin > int.MaxValue)
      throw new OverflowException($"Signed-rank margin exceeds supported range for n={n}");
    return (int)margin;
  }

  /// <summary>
  /// Compute one-sided margin using exact Wilcoxon signed-rank distribution.
  /// Uses dynamic programming to compute the CDF.
  /// </summary>
  private static int CalcExactRaw(int n, double p)
  {
    ulong total = 1UL << n;
    long maxW = (long)n * (n + 1) / 2;

    var count = new ulong[maxW + 1];
    count[0] = 1;

    for (int i = 1; i <= n; i++)
    {
      for (long w = Min(maxW, (long)i * (i + 1) / 2); w >= i; w--)
        count[w] += count[w - i];
    }

    ulong cumulative = 0;
    for (int w = 0; w <= maxW; w++)
    {
      cumulative += count[w];
      double cdf = (double)cumulative / total;
      if (cdf >= p)
        return w;
    }

    return (int)maxW;
  }

  /// <summary>
  /// Compute one-sided margin using Edgeworth approximation for large n.
  /// </summary>
  private static long CalcApproxRaw(int n, double misrate)
  {
    long maxW = (long)n * (n + 1) / 2;
    long a = 0;
    long b = maxW;

    while (a < b - 1)
    {
      long c = (a + b) / 2;
      double cdf = EdgeworthCdf(n, c);
      if (cdf < misrate)
        a = c;
      else
        b = c;
    }

    return EdgeworthCdf(n, b) < misrate ? b : a;
  }

  /// <summary>
  /// Edgeworth expansion for Wilcoxon signed-rank distribution CDF.
  /// </summary>
  private static double EdgeworthCdf(int n, long w)
  {
    double mu = (double)n * (n + 1) / 4.0;
    double sigma2 = n * (n + 1.0) * (2 * n + 1) / 24.0;
    double sigma = Sqrt(sigma2);

    // +0.5 continuity correction: computing P(W ≤ w) for a left-tail discrete CDF
    double z = (w - mu + 0.5) / sigma;
    double phi = Exp(-z * z / 2) / Sqrt(2 * PI);
    double Phi = AcmAlgorithm209.Gauss(z);

    double nd = n;
    double kappa4 = -nd * (nd + 1) * (2 * nd + 1) * (3 * nd * nd + 3 * nd - 1) / 240.0;

    double e3 = kappa4 / (24 * sigma2 * sigma2);

    double z2 = z * z;
    double z3 = z2 * z;
    double f3 = -phi * (z3 - 3 * z);

    double edgeworth = Phi + e3 * f3;
    return Min(Max(edgeworth, 0), 1);
  }

}

Tests

SignedRankMargin(n,misrate)\operatorname{SignedRankMargin}(n, \mathrm{misrate})

The SignedRankMargin\operatorname{SignedRankMargin} test suite contains 39 correctness test cases (4 demo + 6 boundary + 7 exact + 20 medium + 2 error).

Demo examples (n=30n = 30) — from manual introduction:

  • demo-1: n=30n=30, misrate=106\mathrm{misrate}=10^{-6}, expected output: 4646
  • demo-2: n=30n=30, misrate=105\mathrm{misrate}=10^{-5}, expected output: 7474
  • demo-3: n=30n=30, misrate=104\mathrm{misrate}=10^{-4}, expected output: 112112
  • demo-4: n=30n=30, misrate=103\mathrm{misrate}=10^{-3}, expected output: 158158

These demo cases match the reference values used throughout the manual to illustrate CenterBounds\operatorname{CenterBounds} construction.

Boundary cases — minimum achievable misrate validation:

  • boundary-n2-min: n=2n=2, misrate=0.5\mathrm{misrate}=0.5 (minimum misrate for n=2n=2, expected output: 00)
  • boundary-n3-min: n=3n=3, misrate=0.25\mathrm{misrate}=0.25 (minimum misrate for n=3n=3)
  • boundary-n4-min: n=4n=4, misrate=0.125\mathrm{misrate}=0.125 (minimum misrate for n=4n=4)
  • boundary-loose: n=5n=5, misrate=0.5\mathrm{misrate}=0.5 (permissive misrate)
  • boundary-tight: n=10n=10, misrate=0.01\mathrm{misrate}=0.01 (strict misrate)
  • boundary-very-tight: n=20n=20, misrate=0.001\mathrm{misrate}=0.001 (very strict misrate)

These boundary cases validate correct handling of minimum achievable misrate (formula: 21n2^{1-n}) and edge conditions.

Exact computation (n10n \leq 10) — validates dynamic programming path:

  • exact-n5-mr1e1: n=5n=5, misrate=0.1\mathrm{misrate}=0.1
  • exact-n6-mr1e1: n=6n=6, misrate=0.1\mathrm{misrate}=0.1
  • exact-n6-mr5e2: n=6n=6, misrate=0.05\mathrm{misrate}=0.05
  • exact-n10-mr1e1: n=10n=10, misrate=0.1\mathrm{misrate}=0.1, expected output: 2222
  • exact-n10-mr1e2: n=10n=10, misrate=0.01\mathrm{misrate}=0.01
  • exact-n10-mr5e2: n=10n=10, misrate=0.05\mathrm{misrate}=0.05
  • exact-n10-mr5e3: n=10n=10, misrate=0.005\mathrm{misrate}=0.005

These cases exercise the exact Wilcoxon signed-rank CDF computation for small samples where dynamic programming is used.

Medium samples (nin15,20,30,50,100n in {15, 20, 30, 50, 100} × 4 misrates) — 20 tests:

  • Misrate values: misratein101,102,103,104\mathrm{misrate} in {10^{-1}, 10^{-2}, 10^{-3}, 10^{-4}}
  • Test naming: medium-n{n}-mr{k} where kk encodes the misrate
  • Examples:
  • medium-n15-mr1e1: n=15n=15, misrate=0.1\mathrm{misrate}=0.1
  • medium-n30-mr1e2: n=30n=30, misrate=0.01\mathrm{misrate}=0.01, expected output: 220220
  • medium-n50-mr1e3: n=50n=50, misrate=0.001\mathrm{misrate}=0.001
  • medium-n100-mr1e4: n=100n=100, misrate=0.0001\mathrm{misrate}=0.0001

The medium sample tests validate the transition region between exact computation (n63n \leq 63) and approximate computation, ensuring consistent results across sample sizes and misrate values.

Error case — domain violation:

  • error-n1: n=1n=1, misrate=0.5\mathrm{misrate}=0.5 (invalid: misrate below minimum achievable 211=1.02^{1-1} = 1.0)
  • error-n0: n=0n=0, misrate=0.05\mathrm{misrate}=0.05 (invalid: n must be positive)

This error case verifies that implementations correctly reject n=1n=1 with misrate=0.5\mathrm{misrate}=0.5 as invalid input, since the minimum achievable misrate for n=1n=1 is 20=1.02^0 = 1.0.