Algorithms

This chapter describes the core algorithms that power the toolkit.

Pseudorandom Number Generation

The toolkit provides deterministic randomization utilities that produce identical sequences across all supported programming languages. This cross-language reproducibility enables deterministic experiments and cross-validation of statistical analyses.

The core random number generator uses the xoshiro256++ algorithm (see Blackman & Vigna 2021), a member of the xoshiro/xoroshiro family developed by David Blackman and Sebastiano Vigna. This algorithm was selected for several reasons:

  • Quality: passes all tests in the BigCrush test suite from TestU01
  • Speed: extremely fast due to simple bitwise operations (shifts, rotations, XORs)
  • Period: 2^{256} - 1, sufficient for parallel simulations
  • Adoption: used by .NET 6+, Julia, and Rust's rand crate

The algorithm maintains a 256-bit state (s_0, s_1, s_2, s_3) and produces 64-bit outputs. Each step updates the state through a combination of XOR, shift, and rotate operations.
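As a concrete illustration, one step of the generator can be sketched in Python with explicit 64-bit masking. This is an illustrative sketch of xoshiro256++ (rotation constants 23 and 45, shift 17), not the toolkit's production implementation; the function names are ours.

```python
MASK64 = (1 << 64) - 1

def rotl(x: int, k: int) -> int:
    """64-bit left rotation."""
    return ((x << k) | (x >> (64 - k))) & MASK64

def xoshiro256pp_next(s: list) -> int:
    """Advance a xoshiro256++ state (four 64-bit words) in place; return one 64-bit output."""
    result = (rotl((s[0] + s[3]) & MASK64, 23) + s[0]) & MASK64
    t = (s[1] << 17) & MASK64
    s[2] ^= s[0]
    s[3] ^= s[1]
    s[1] ^= s[2]
    s[0] ^= s[3]
    s[2] ^= t
    s[3] = rotl(s[3], 45)
    return result
```

Because the state transition uses only integer shifts, rotations, and XORs, identical states yield identical output streams in any language with 64-bit integer semantics.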

Seed Initialization

Converting a single seed value into the full 256-bit state requires a seeding algorithm. The toolkit uses SplitMix64 (see Steele et al. 2014) for this purpose:

\begin{aligned}
x &\leftarrow x + \text{0x9e3779b97f4a7c15} \\
z &\leftarrow (x \operatorname{xor} (x \gg 30)) \times \text{0xbf58476d1ce4e5b9} \\
z &\leftarrow (z \operatorname{xor} (z \gg 27)) \times \text{0x94d049bb133111eb} \\
\text{output} &\leftarrow z \operatorname{xor} (z \gg 31)
\end{aligned}

Four consecutive outputs from SplitMix64 initialize the xoshiro256++ state (s_0, s_1, s_2, s_3). This approach provides high-quality initial states from simple integer seeds.
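The seeding path follows directly from the recurrence above; a minimal Python sketch (the function names `splitmix64_next` and `seed_xoshiro_state` are ours, not the toolkit's API):

```python
MASK64 = (1 << 64) - 1

def splitmix64_next(state: int):
    """One SplitMix64 step; returns (new_state, 64-bit output)."""
    state = (state + 0x9E3779B97F4A7C15) & MASK64
    z = state
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return state, z ^ (z >> 31)

def seed_xoshiro_state(seed: int):
    """Expand a single 64-bit seed into the four-word xoshiro256++ state."""
    state, words = seed & MASK64, []
    for _ in range(4):
        state, out = splitmix64_next(state)
        words.append(out)
    return words
```

Distinct seeds produce well-mixed, distinct states, which avoids the pathologies of seeding the generator with low-entropy values directly.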

String Seeds

For named experiments (e.g., \operatorname{Rng}(\text{experiment-1})), string seeds are converted to integers using the FNV-1a hash (see Fowler et al. 1991):

\begin{aligned}
\text{hash} &\leftarrow \text{0xcbf29ce484222325} \quad \text{(offset basis)} \\
\text{for each byte } b: \quad \text{hash} &\leftarrow (\text{hash} \operatorname{xor} b) \times \text{0x00000100000001b3} \quad \text{(FNV prime)}
\end{aligned}

This enables meaningful experiment identifiers while maintaining determinism.
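The hash loop is short enough to sketch in full. This Python version (function name ours) operates on the UTF-8 bytes of the string; an empty input returns the offset basis unchanged.

```python
MASK64 = (1 << 64) - 1
FNV_OFFSET_BASIS = 0xCBF29CE484222325
FNV_PRIME = 0x00000100000001B3

def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a hash of a byte string."""
    h = FNV_OFFSET_BASIS
    for b in data:
        h = ((h ^ b) * FNV_PRIME) & MASK64
    return h
```

A call such as `fnv1a_64("experiment-1".encode("utf-8"))` yields a 64-bit integer suitable as a SplitMix64 seed.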

Uniform Floating-Point Generation

To generate uniform values in [0, 1), the upper 53 bits of a 64-bit output are used:

\text{uniform()} = (\text{next()} \gg 11) \times 2^{-53}

The 53 bits match the significand precision of IEEE 754 double precision, so every multiple of 2^{-53} in [0, 1) is produced exactly, without rounding.
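The conversion is a one-liner; this sketch (function name ours) shows the exact endpoints of the mapping:

```python
def to_uniform(bits64: int) -> float:
    """Map a 64-bit integer to [0, 1) using its upper 53 bits."""
    return (bits64 >> 11) * 2.0 ** -53
```

The all-zeros output maps to exactly 0.0 and the all-ones output to 1 - 2^{-53}, the largest value strictly below 1.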

Shuffle Algorithm

The Shuffle\operatorname{Shuffle} function uses the Fisher-Yates algorithm (see Fisher & Yates 1938, Knuth 1997), also known as the Knuth shuffle:

for i from n-1 down to 1:
   j = uniform_int(0, i+1)
   swap(array[i], array[j])

This produces a uniformly random permutation in O(n) time with O(1) additional space. The algorithm is unbiased: each of the n! permutations has equal probability.
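The pseudocode above translates directly to Python. In this sketch, `uniform_int(lo, hi)` is assumed to return an integer in [lo, hi), matching the pseudocode's convention; here it is supplied by the caller so any deterministic generator can drive the shuffle.

```python
import random

def fisher_yates_shuffle(array, uniform_int):
    """In-place Fisher-Yates shuffle; uniform_int(lo, hi) returns an int in [lo, hi)."""
    n = len(array)
    for i in range(n - 1, 0, -1):
        j = uniform_int(0, i + 1)  # j is uniform over [0, i]
        array[i], array[j] = array[j], array[i]
    return array
```

Seeding the generator fixes the permutation, so the same seed reproduces the same shuffle.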

Selection Sampling

The \operatorname{Sample} function uses selection sampling (see Fan et al. 1962) to select k elements from n without replacement:

seen = 0, selected = 0
for each element x at position i:
   if uniform() < (k - selected) / (n - seen):
       output x
       selected += 1
   seen += 1

This algorithm preserves the original order of elements (order of first appearance) and requires only a single pass through the data. Each element is selected independently with the correct marginal probability, producing a simple random sample.
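The single-pass loop can be sketched as follows (illustrative Python; `uniform` is any callable returning values in [0, 1), and the function name is ours). When the remaining quota equals the remaining elements, the acceptance probability reaches 1, so exactly k elements are always emitted.

```python
import random

def selection_sample(items, k, uniform):
    """Single-pass selection sampling: choose k of n without replacement, preserving order."""
    n = len(items)
    result = []
    seen = selected = 0
    for x in items:
        # Accept with probability (remaining quota) / (remaining elements)
        if uniform() < (k - selected) / (n - seen):
            result.append(x)
            selected += 1
        seen += 1
    return result
```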

Distribution Sampling

Box-Muller Transform

The \underline{\operatorname{Additive}} (normal) distribution uses the Box-Muller transform (see Box & Muller 1958) to convert uniform random values to normally distributed values. Given two independent uniform values U_1, U_2 \in [0, 1):

\begin{aligned}
Z_0 &= \sqrt{-2 \ln(U_1)} \cos(2 \pi U_2) \\
Z_1 &= \sqrt{-2 \ln(U_1)} \sin(2 \pi U_2)
\end{aligned}

Both Z_0 and Z_1 are independent standard normal values. The implementation uses only Z_0 to maintain cross-language determinism.
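The Z_0 branch is a direct transcription of the formula (sketch only; the function name is ours). Note that U_1 must be strictly positive; handling of U = 0 is covered under Numerical Constants.

```python
import math

def box_muller_z0(u1: float, u2: float) -> float:
    """First Box-Muller output: sqrt(-2 ln u1) * cos(2 pi u2); u1 must be > 0."""
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
```

For example, u2 = 0.25 places the angle at pi/2, so the cosine factor collapses the output to (numerically) zero regardless of u1.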

Numerical Constants

Distribution sampling requires handling edge cases where floating-point operations would be undefined. Two constants are used across all language implementations:

Machine Epsilon (\epsilon_{\text{mach}}): The smallest \epsilon such that 1 + \epsilon \neq 1 in float64 arithmetic.

\epsilon_{\text{mach}} = 2^{-52} \approx 2.22 \times 10^{-16}

Used when U = 1 to avoid \ln(0) or division by zero in inverse transform sampling.

Smallest Positive Subnormal (\epsilon_{\text{sub}}): The smallest positive value representable in IEEE 754 binary64.

\epsilon_{\text{sub}} = 2^{-1074} \approx 4.94 \times 10^{-324}

Used when U = 0 to avoid \ln(0) in the Box-Muller transform.

All language implementations use the same literal values for these constants (not language-specific builtins like Number.EPSILON or f64::EPSILON) to ensure bit-identical outputs across languages.
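Both constants and their defining properties can be verified directly in float64 arithmetic (here using Python floats, which are IEEE 754 binary64):

```python
import math

EPS_MACH = 2.0 ** -52   # machine epsilon for float64
EPS_SUB = 2.0 ** -1074  # smallest positive subnormal in binary64

# 2^-52 is the gap between 1.0 and the next representable double
assert 1.0 + EPS_MACH != 1.0
assert 1.0 + EPS_MACH / 2 == 1.0   # half the gap rounds back to 1.0

# 2^-1074 is positive, so log() of a clamped U = 0 stays finite
assert EPS_SUB > 0.0
assert EPS_SUB / 2 == 0.0          # nothing smaller is representable
assert math.isfinite(math.log(EPS_SUB))
```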

Inverse Transform Sampling

Other distributions use inverse transform sampling. Given a uniform value U \in [0, 1) and the inverse CDF F^{-1}:

  • Exponential: X = -\frac{\ln(1 - U)}{\lambda}
  • Pareto (Power): X = \frac{x_{\min}}{(1 - U)^{1/\alpha}}
  • Uniform: X = a + U(b - a)

The log-normal (\underline{\operatorname{Multiplic}}) distribution samples from \underline{\operatorname{Additive}} and applies the exponential transform: X = \exp(\underline{\operatorname{Additive}}(\mu, \sigma)).
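The three listed transforms are one-liners; this sketch (function names ours) makes the parameterization explicit:

```python
import math

def exponential_inv(u: float, lam: float) -> float:
    """Inverse-CDF sample for Exponential(lambda)."""
    return -math.log(1.0 - u) / lam

def pareto_inv(u: float, x_min: float, alpha: float) -> float:
    """Inverse-CDF sample for Pareto(x_min, alpha)."""
    return x_min / (1.0 - u) ** (1.0 / alpha)

def uniform_inv(u: float, a: float, b: float) -> float:
    """Inverse-CDF sample for Uniform(a, b)."""
    return a + u * (b - a)
```

For instance, feeding U = 1 - e^{-1} into the exponential transform with lambda = 2 recovers X = 1/2, since -ln(e^{-1})/2 = 1/2.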

Fast Center

The \operatorname{Center} estimator computes the median of all pairwise averages from a sample. Given a dataset \mathbf{x} = (x_1, x_2, \ldots, x_n), this estimator is defined as:

\operatorname{Center}(\mathbf{x}) = \operatorname{median}_{1 \leq i \leq j \leq n} \frac{x_i + x_j}{2}

A direct implementation would generate all \frac{n(n+1)}{2} pairwise averages and sort them. With n = 10000, this approach creates approximately 50 million values, requiring quadratic memory and O(n^2 \log n) time.
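The direct approach is nonetheless useful as a correctness oracle for small inputs; a naive sketch (function name ours):

```python
import statistics

def center_naive(x):
    """Median of all pairwise averages (x_i + x_j) / 2 over i <= j; O(n^2 log n)."""
    n = len(x)
    averages = [(x[i] + x[j]) / 2.0 for i in range(n) for j in range(i, n)]
    return statistics.median(averages)
```

For [1, 2, 3] the six averages are 1, 1.5, 2, 2, 2.5, 3, so the median is 2.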

The breakthrough came in 1984 when John Monahan developed an algorithm that reduces expected complexity to O(n \log n) while using only linear memory (see Monahan 1984). The algorithm exploits the inherent structure in pairwise sums rather than computing them explicitly. After sorting the values x_1 \leq x_2 \leq \ldots \leq x_n, the algorithm considers the implicit upper triangular matrix M where M_{i,j} = x_i + x_j for i \leq j. This matrix has a crucial property: each row and column is sorted in non-decreasing order, enabling efficient median selection without storing the matrix.

Rather than sorting all pairwise sums, the algorithm uses a selection approach similar to quickselect. It maintains search bounds for each matrix row and iteratively narrows the search space. For each row i, the algorithm tracks active column indices from i+1 to n, defining which pairwise sums remain candidates for the median. It selects a candidate sum as a pivot using randomized selection from active matrix elements, then counts how many pairwise sums fall below the pivot. Because both rows and columns are sorted, this counting takes only O(n) time using a two-pointer sweep from the matrix's upper-right corner.

The median corresponds to rank k = \lfloor \frac{N+1}{2} \rfloor where N = \frac{n(n+1)}{2}. If fewer than k sums lie below the pivot, the median must be larger; if more than k sums lie below the pivot, the median must be smaller. Based on this comparison, the algorithm eliminates portions of each row that cannot contain the median, shrinking the active search space while preserving the true median.

Real data often contain repeated values, which can cause the selection process to stall. When the algorithm detects no progress between iterations, it switches to a midrange strategy: find the smallest and largest pairwise sums still in the search space, then use their average as the next pivot. If the minimum equals the maximum, all remaining candidates are identical and the algorithm terminates. This tie-breaking mechanism ensures reliable convergence with discrete or duplicated data.

The algorithm achieves O(n \log n) time complexity through linear partitioning (each pivot evaluation requires only O(n) operations) and logarithmic iterations (randomized pivot selection leads to expected O(\log n) iterations, similar to quickselect). The algorithm maintains only row bounds and counters, using O(n) additional space. This matches the complexity of sorting a single array while avoiding the quadratic memory and time explosion of computing all pairwise combinations.

using System;
using System.Collections.Generic;
using System.Linq;
using static System.Math;

namespace Pragmastat.Algorithms;

internal static class FastCenter
{
  /// <summary>
  /// ACM Algorithm 616: fast computation of the Hodges-Lehmann location estimator
  /// </summary>
  /// <remarks>
  /// Computes the median of all pairwise averages (xi + xj)/2 efficiently.
  /// See: John F. Monahan, "Algorithm 616: fast computation of the Hodges-Lehmann location estimator"
  /// (1984) DOI: 10.1145/1271.319414
  /// </remarks>
  /// <param name="values">A sorted sample of values</param>
  /// <param name="random">Random number generator</param>
  /// <param name="isSorted">If values are sorted</param>
  /// <returns>Exact center value (Hodges-Lehmann estimator)</returns>
  public static double Estimate(IReadOnlyList<double> values, Random? random = null, bool isSorted = false)
  {
    int n = values.Count;
    if (n == 1) return values[0];
    if (n == 2) return (values[0] + values[1]) / 2;
    random ??= new Random();
    if (!isSorted)
      values = values.OrderBy(x => x).ToList();

    // Calculate target median rank(s) among all pairwise sums
    long totalPairs = (long)n * (n + 1) / 2;
    long medianRankLow = (totalPairs + 1) / 2;  // for odd totalPairs, this is the single median rank
    long medianRankHigh = (totalPairs + 2) / 2; // for even totalPairs, the median averages ranks medianRankLow and medianRankHigh

    // Initialize search bounds for each row in the implicit matrix
    long[] leftBounds = new long[n];
    long[] rightBounds = new long[n];
    long[] partitionCounts = new long[n];

    for (int i = 0; i < n; i++)
    {
      leftBounds[i] = i + 1; // Row i can pair with columns [i+1..n] (1-based indexing)
      rightBounds[i] = n; // Initially, all columns are available
    }

    // Start with a good pivot: sum of middle elements (handles both odd and even n)
    double pivot = values[(n - 1) / 2] + values[n / 2];
    long activeSetSize = totalPairs;
    long previousCount = 0;

    while (true)
    {
      // === PARTITION STEP ===
      // Count pairwise sums less than current pivot
      long countBelowPivot = 0;
      long currentColumn = n;

      for (int row = 1; row <= n; row++)
      {
        partitionCounts[row - 1] = 0;

        // Move left from current column until we find sums < pivot
        // This exploits the sorted nature of the matrix
        while (currentColumn >= row && values[row - 1] + values[(int)currentColumn - 1] >= pivot)
          currentColumn--;

        // Count elements in this row that are < pivot
        if (currentColumn >= row)
        {
          long elementsBelow = currentColumn - row + 1;
          partitionCounts[row - 1] = elementsBelow;
          countBelowPivot += elementsBelow;
        }
      }

      // === CONVERGENCE CHECK ===
      // If no progress, we have ties - break them using midrange strategy
      if (countBelowPivot == previousCount)
      {
        double minActiveSum = double.MaxValue;
        double maxActiveSum = double.MinValue;

        // Find the range of sums still in the active search space
        for (int i = 0; i < n; i++)
        {
          if (leftBounds[i] > rightBounds[i]) continue; // Skip empty rows

          double rowValue = values[i];
          double smallestInRow = values[(int)leftBounds[i] - 1] + rowValue;
          double largestInRow = values[(int)rightBounds[i] - 1] + rowValue;

          minActiveSum = Min(minActiveSum, smallestInRow);
          maxActiveSum = Max(maxActiveSum, largestInRow);
        }

        pivot = (minActiveSum + maxActiveSum) / 2;
        if (pivot <= minActiveSum || pivot > maxActiveSum) pivot = maxActiveSum;

        // If all remaining values are identical, we're done
        if (minActiveSum == maxActiveSum || activeSetSize <= 2)
          return pivot / 2;

        continue;
      }

      // === TARGET CHECK ===
      // Check if we've found the median rank(s)
      bool atTargetRank = countBelowPivot == medianRankLow || countBelowPivot == medianRankHigh - 1;
      if (atTargetRank)
      {
        // Find the boundary values: largest < pivot and smallest >= pivot
        double largestBelowPivot = double.MinValue;
        double smallestAtOrAbovePivot = double.MaxValue;

        for (int i = 1; i <= n; i++)
        {
          long countInRow = partitionCounts[i - 1];
          double rowValue = values[i - 1];
          long totalInRow = n - i + 1;

          // Find largest sum in this row that's < pivot
          if (countInRow > 0)
          {
            long lastBelowIndex = i + countInRow - 1;
            double lastBelowValue = rowValue + values[(int)lastBelowIndex - 1];
            largestBelowPivot = Max(largestBelowPivot, lastBelowValue);
          }

          // Find smallest sum in this row that's >= pivot
          if (countInRow < totalInRow)
          {
            long firstAtOrAboveIndex = i + countInRow;
            double firstAtOrAboveValue = rowValue + values[(int)firstAtOrAboveIndex - 1];
            smallestAtOrAbovePivot = Min(smallestAtOrAbovePivot, firstAtOrAboveValue);
          }
        }

        // Calculate final result based on whether we have odd or even number of pairs
        if (medianRankLow < medianRankHigh)
        {
          // Even total: average the two middle values
          return (smallestAtOrAbovePivot + largestBelowPivot) / 4;
        }
        else
        {
          // Odd total: return the single middle value
          bool needLargest = countBelowPivot == medianRankLow;
          return (needLargest ? largestBelowPivot : smallestAtOrAbovePivot) / 2;
        }
      }

      // === UPDATE BOUNDS ===
      // Narrow the search space based on partition result
      if (countBelowPivot < medianRankLow)
      {
        // Too few values below pivot - eliminate smaller values, search higher
        for (int i = 0; i < n; i++)
          leftBounds[i] = i + partitionCounts[i] + 1;
      }
      else
      {
        // Too many values below pivot - eliminate larger values, search lower
        for (int i = 0; i < n; i++)
          rightBounds[i] = i + partitionCounts[i];
      }

      // === PREPARE NEXT ITERATION ===
      previousCount = countBelowPivot;

      // Recalculate how many elements remain in the active search space
      activeSetSize = 0;
      for (int i = 0; i < n; i++)
      {
        long rowSize = rightBounds[i] - leftBounds[i] + 1;
        activeSetSize += Max(0, rowSize);
      }

      // Choose next pivot based on remaining active set size
      if (activeSetSize > 2)
      {
        // Use randomized row median strategy for efficiency
        // Handle large activeSetSize by using double precision for random selection
        double randomFraction = random.NextDouble();
        long targetIndex = (long)(randomFraction * activeSetSize);
        int selectedRow = 0;

        // Find which row contains the target index
        long cumulativeSize = 0;
        for (int i = 0; i < n; i++)
        {
          long rowSize = Max(0, rightBounds[i] - leftBounds[i] + 1);
          if (targetIndex < cumulativeSize + rowSize)
          {
            selectedRow = i;
            break;
          }

          cumulativeSize += rowSize;
        }

        // Use median element of the selected row as pivot
        long medianColumnInRow = (leftBounds[selectedRow] + rightBounds[selectedRow]) / 2;
        pivot = values[selectedRow] + values[(int)medianColumnInRow - 1];
      }
      else
      {
        // Few elements remain - use midrange strategy
        double minRemainingSum = double.MaxValue;
        double maxRemainingSum = double.MinValue;

        for (int i = 0; i < n; i++)
        {
          if (leftBounds[i] > rightBounds[i]) continue; // Skip empty rows

          double rowValue = values[i];
          double minInRow = values[(int)leftBounds[i] - 1] + rowValue;
          double maxInRow = values[(int)rightBounds[i] - 1] + rowValue;

          minRemainingSum = Min(minRemainingSum, minInRow);
          maxRemainingSum = Max(maxRemainingSum, maxInRow);
        }

        pivot = (minRemainingSum + maxRemainingSum) / 2;
        if (pivot <= minRemainingSum || pivot > maxRemainingSum)
          pivot = maxRemainingSum;

        if (minRemainingSum == maxRemainingSum)
          return pivot / 2;
      }
    }
  }
}

Fast Spread

The \operatorname{Spread} estimator computes the median of all pairwise absolute differences. Given a sample \mathbf{x} = (x_1, x_2, \ldots, x_n), this estimator is defined as:

\operatorname{Spread}(\mathbf{x}) = \operatorname{median}_{1 \leq i < j \leq n} \lvert x_i - x_j \rvert

Like \operatorname{Center}, computing \operatorname{Spread} naively requires generating all \frac{n(n-1)}{2} pairwise differences, sorting them, and finding the median — a quadratic approach that becomes computationally prohibitive for large datasets.
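As with \operatorname{Center}, the naive version makes a convenient correctness oracle for small inputs (sketch; function name ours):

```python
import statistics

def spread_naive(x):
    """Median of all pairwise absolute differences |x_i - x_j| over i < j; O(n^2 log n)."""
    n = len(x)
    diffs = [abs(x[i] - x[j]) for i in range(n) for j in range(i + 1, n)]
    return statistics.median(diffs)
```

For [1, 2, 4] the three differences are 1, 3, 2, so the median is 2.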

The same structural principles that accelerate \operatorname{Center} computation also apply to pairwise differences, yielding an exact O(n \log n) algorithm. After sorting the input to obtain y_1 \leq y_2 \leq \ldots \leq y_n, all pairwise absolute differences \lvert x_i - x_j \rvert with i < j become positive differences y_j - y_i. This allows considering the implicit upper triangular matrix D where D_{i,j} = y_j - y_i for i < j. This matrix has a crucial structural property: for a fixed row i, differences increase monotonically, while for a fixed column j, differences decrease as i increases. This sorted structure enables linear-time counting of elements below any threshold.

The algorithm applies Monahan's selection strategy (Monahan 1984), adapted for differences rather than sums. For each row i, it tracks active column indices representing differences still under consideration, initially spanning columns i+1 through n. It chooses candidate differences from the active set using weighted random row selection, maintaining expected logarithmic convergence while avoiding expensive pivot computations. For any pivot value p, the number of differences falling below p is counted using a single sweep; the monotonic structure ensures this counting requires only O(n) operations. While counting, the largest difference below p and the smallest difference at or above p are maintained — these boundary values become the exact answer when the target rank is reached.

The algorithm naturally handles both odd and even cases. For an odd number of differences, it returns the single middle element when the count exactly hits the median rank. For an even number of differences, it returns the average of the two middle elements; boundary tracking during counting provides both values simultaneously. Unlike approximation methods, this algorithm returns the precise median of all pairwise differences; randomness affects only performance, not correctness.

The algorithm includes the same stall-handling mechanisms as the center algorithm. It tracks whether the count below the pivot changes between iterations; when progress stalls due to tied values, it computes the range of remaining active differences and pivots to their midrange. This midrange strategy ensures convergence even with highly discrete data or datasets with many identical values.

Several optimizations make the implementation practical for production use. A global column pointer that never moves backward during counting exploits the matrix structure to avoid redundant comparisons. Exact boundary values are captured during each counting pass, eliminating the need for additional searches when the target rank is reached. Using only O(n) additional space for row bounds and counters, the algorithm achieves O(n \log n) time complexity with minimal memory overhead, making robust scale estimation practical for large datasets.

using System;
using System.Collections.Generic;
using static System.Math;

namespace Pragmastat.Algorithms;

internal static class FastSpread
{
  /// <summary>
  /// Shamos "Spread".  Expected O(n log n) time, O(n) extra space. Exact.
  /// </summary>
  public static double Estimate(IReadOnlyList<double> values, Random? random = null, bool isSorted = false)
  {
    int n = values.Count;
    if (n <= 1) return 0;
    if (n == 2) return Abs(values[1] - values[0]);
    random ??= new Random();

    // Prepare a sorted working copy.
    double[] a = isSorted ? CopySorted(values) : EnsureSorted(values);

    // Total number of pairwise differences with i < j
    long N = (long)n * (n - 1) / 2;
    long kLow = (N + 1) / 2; // 1-based rank of lower middle
    long kHigh = (N + 2) / 2; // 1-based rank of upper middle

    // Per-row active bounds over columns j (0-based indices).
    // Row i allows j in [i+1, n-1] initially.
    int[] L = new int[n];
    int[] R = new int[n];
    long[] rowCounts = new long[n]; // # of elements in row i that are < pivot (for current partition)

    for (int i = 0; i < n; i++)
    {
      L[i] = Min(i + 1, n); // n means empty
      R[i] = n - 1; // inclusive
      if (L[i] > R[i])
      {
        L[i] = 1;
        R[i] = 0;
      } // mark empty
    }

    // A reasonable initial pivot: a central gap
    double pivot = a[n / 2] - a[(n - 1) / 2];

    long prevCountBelow = -1;

    while (true)
    {
      // === PARTITION: count how many differences are < pivot; also track boundary neighbors ===
      long countBelow = 0;
      double largestBelow = double.NegativeInfinity; // max difference < pivot
      double smallestAtOrAbove = double.PositiveInfinity; // min difference >= pivot

      int j = 1; // global two-pointer (non-decreasing across rows)
      for (int i = 0; i < n - 1; i++)
      {
        if (j < i + 1) j = i + 1;
        while (j < n && a[j] - a[i] < pivot) j++;

        long cntRow = j - (i + 1);
        if (cntRow < 0) cntRow = 0;
        rowCounts[i] = cntRow;
        countBelow += cntRow;

        // boundary elements for this row
        if (cntRow > 0)
        {
          // last < pivot in this row is (j-1)
          double candBelow = a[j - 1] - a[i];
          if (candBelow > largestBelow) largestBelow = candBelow;
        }

        if (j < n)
        {
          double candAtOrAbove = a[j] - a[i];
          if (candAtOrAbove < smallestAtOrAbove) smallestAtOrAbove = candAtOrAbove;
        }
      }

      // === TARGET CHECK ===
      // If we've split exactly at the middle, we can return using the boundaries we just found.
      bool atTarget =
        (countBelow == kLow) || // lower middle is the largest < pivot
        (countBelow == (kHigh - 1)); // upper middle is the smallest >= pivot

      if (atTarget)
      {
        if (kLow < kHigh)
        {
          // Even N: average the two central order stats.
          return 0.5 * (largestBelow + smallestAtOrAbove);
        }
        else
        {
          // Odd N: pick the single middle depending on which side we hit.
          bool needLargest = (countBelow == kLow);
          return needLargest ? largestBelow : smallestAtOrAbove;
        }
      }

      // === STALL HANDLING (ties / no progress) ===
      if (countBelow == prevCountBelow)
      {
        // Compute min/max remaining difference in the ACTIVE set and pivot to their midrange.
        double minActive = double.PositiveInfinity;
        double maxActive = double.NegativeInfinity;
        long active = 0;

        for (int i = 0; i < n - 1; i++)
        {
          int Li = L[i], Ri = R[i];
          if (Li > Ri) continue;

          double rowMin = a[Li] - a[i];
          double rowMax = a[Ri] - a[i];
          if (rowMin < minActive) minActive = rowMin;
          if (rowMax > maxActive) maxActive = rowMax;
          active += (Ri - Li + 1);
        }

        if (active <= 0)
        {
          // No active candidates left: the only consistent answer is the boundary implied by counts.
          // Fall back to neighbors from this partition.
          if (kLow < kHigh) return 0.5 * (largestBelow + smallestAtOrAbove);
          return (countBelow >= kLow) ? largestBelow : smallestAtOrAbove;
        }

        if (maxActive <= minActive) return minActive; // all remaining equal

        double mid = 0.5 * (minActive + maxActive);
        pivot = (mid > minActive && mid <= maxActive) ? mid : maxActive;
        prevCountBelow = countBelow;
        continue;
      }

      // === SHRINK ACTIVE WINDOW ===
      // --- SHRINK ACTIVE WINDOW (fixed) ---
      if (countBelow < kLow)
      {
        // Need larger differences: discard all strictly below pivot.
        for (int i = 0; i < n - 1; i++)
        {
          // First j with a[j] - a[i] >= pivot is j = i + 1 + cntRow (may be n => empty row)
          int newL = i + 1 + (int)rowCounts[i];
          if (newL > L[i]) L[i] = newL; // do NOT clamp; allow L[i] == n to mean empty
          if (L[i] > R[i])
          {
            L[i] = 1;
            R[i] = 0;
          } // mark empty
        }
      }
      else
      {
        // Too many below: keep only those strictly below pivot.
        for (int i = 0; i < n - 1; i++)
        {
          // Last j with a[j] - a[i] < pivot is j = i + cntRow  (not cntRow-1!)
          int newR = i + (int)rowCounts[i];
          if (newR < R[i]) R[i] = newR; // shrink downward to the true last-below
          if (R[i] < i + 1)
          {
            L[i] = 1;
            R[i] = 0;
          } // empty row if none remain
        }
      }

      prevCountBelow = countBelow;

      // === CHOOSE NEXT PIVOT FROM ACTIVE SET (weighted random row, then row median) ===
      long activeSize = 0;
      for (int i = 0; i < n - 1; i++)
      {
        if (L[i] <= R[i]) activeSize += (R[i] - L[i] + 1);
      }

      if (activeSize <= 2)
      {
        // Few candidates left: return midrange of remaining exactly.
        double minRem = double.PositiveInfinity, maxRem = double.NegativeInfinity;
        for (int i = 0; i < n - 1; i++)
        {
          if (L[i] > R[i]) continue;
          double lo = a[L[i]] - a[i];
          double hi = a[R[i]] - a[i];
          if (lo < minRem) minRem = lo;
          if (hi > maxRem) maxRem = hi;
        }

        if (activeSize <= 0) // safety net; fall back to boundary from last partition
        {
          if (kLow < kHigh) return 0.5 * (largestBelow + smallestAtOrAbove);
          return (countBelow >= kLow) ? largestBelow : smallestAtOrAbove;
        }

        if (kLow < kHigh) return 0.5 * (minRem + maxRem);
        return (Abs((kLow - 1) - countBelow) <= Abs(countBelow - kLow)) ? minRem : maxRem;
      }
      else
      {
        long t = NextIndex(random, activeSize); // 0..activeSize-1
        long acc = 0;
        int row = 0;
        for (; row < n - 1; row++)
        {
          if (L[row] > R[row]) continue;
          long size = R[row] - L[row] + 1;
          if (t < acc + size) break;
          acc += size;
        }

        // Median column of the selected row
        int col = (L[row] + R[row]) >> 1;
        pivot = a[col] - a[row];
      }
    }
  }
  // --- Helpers ---

  private static double[] CopySorted(IReadOnlyList<double> values)
  {
    var a = new double[values.Count];
    for (int i = 0; i < a.Length; i++)
    {
      double v = values[i];
      if (double.IsNaN(v)) throw new ArgumentException("NaN not allowed.", nameof(values));
      a[i] = v;
    }

    Array.Sort(a);
    return a;
  }

  private static double[] EnsureSorted(IReadOnlyList<double> values)
  {
    // Trust caller; still copy to array for fast indexed access.
    var a = new double[values.Count];
    for (int i = 0; i < a.Length; i++)
    {
      double v = values[i];
      if (double.IsNaN(v)) throw new ArgumentException("NaN not allowed.", nameof(values));
      a[i] = v;
    }

    return a;
  }

  private static long NextIndex(Random rng, long limitExclusive)
  {
    // Uniform 0..limitExclusive-1 even for large ranges.
    // Use rejection sampling for correctness.
    ulong uLimit = (ulong)limitExclusive;
    if (uLimit <= int.MaxValue)
    {
      return rng.Next((int)uLimit);
    }

    while (true)
    {
      ulong u = ((ulong)(uint)rng.Next() << 32) | (uint)rng.Next();
      ulong r = u % uLimit;
      if (u - r <= ulong.MaxValue - (ulong.MaxValue % uLimit)) return (long)r;
    }
  }
}

Fast Shift

The \operatorname{Shift} estimator measures the median of all pairwise differences between elements of two samples. Given samples \mathbf{x} = (x_1, x_2, \ldots, x_n) and \mathbf{y} = (y_1, y_2, \ldots, y_m), this estimator is defined as:

\operatorname{Shift}(\mathbf{x}, \mathbf{y}) = \operatorname{median}_{1 \leq i \leq n,\, 1 \leq j \leq m} (x_i - y_j)

This definition represents a special case of a more general problem: computing arbitrary quantiles of all pairwise differences. For samples of size n and m, the total number of pairwise differences is n \times m. A naive approach would materialize all differences, sort them, and extract the desired quantile. With n = m = 10000, this approach creates 100 million values, requiring quadratic memory and O(nm \log(nm)) time.

The presented algorithm avoids materializing pairwise differences by exploiting the sorted structure of the samples. After sorting both samples to obtain x_1 \leq x_2 \leq \ldots \leq x_n and y_1 \leq y_2 \leq \ldots \leq y_m, the key insight is that it is possible to count how many pairwise differences fall below any threshold without computing them explicitly. This counting operation enables a binary search over the continuous space of possible difference values, iteratively narrowing the search range until it converges to the exact quantile.

The algorithm operates through a value-space search rather than index-space selection. It maintains a search interval [\text{searchMin}, \text{searchMax}], initialized to the range of all possible differences: [x_1 - y_m, x_n - y_1]. At each iteration, it selects a candidate value within this interval and counts how many pairwise differences are less than or equal to this threshold. For the median (quantile p = 0.5), if fewer than half the differences lie below the threshold, the median must be larger; if more than half lie below, the median must be smaller. Based on this comparison, the search space is reduced by eliminating portions that cannot contain the target quantile.

The counting operation achieves linear complexity via a two-pointer sweep. For a given threshold t, the number of pairs (i, j) satisfying x_i - y_j \leq t is counted. This is equivalent to counting pairs where y_j \geq x_i - t. For each row i in the implicit matrix of differences, a column pointer advances through the sorted y array while x_i - y_j > t, stopping at the first position where x_i - y_j \leq t. All subsequent positions in that row satisfy the condition, contributing (m - j) pairs to the count for row i. Because both samples are sorted, the column pointer advances monotonically and never backtracks across rows, making each counting pass O(n + m) regardless of the total number of differences.
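The counting sweep can be sketched as follows (illustrative Python, function name ours; both inputs must be sorted ascending):

```python
def count_diffs_at_most(xs, ys, t):
    """Count pairs (i, j) with xs[i] - ys[j] <= t in O(n + m).

    xs and ys must be sorted ascending. The column pointer j advances past
    every y with y < x - t; it never moves backward across rows because
    x - t is non-decreasing over the sweep.
    """
    m = len(ys)
    j = 0
    total = 0
    for x in xs:
        while j < m and ys[j] < x - t:
            j += 1
        total += m - j  # all remaining y satisfy x - y <= t
    return total
```

Binary-searching over t with this counter locates the value whose rank among all n * m differences matches the target quantile.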

During each counting pass, the algorithm tracks boundary values: the largest difference at or below the threshold and the smallest difference above it. When the count exactly matches the target rank (or the two middle ranks for even-length samples), these boundary values provide the exact answer without additional searches. For Type-7 quantile computation (Hyndman & Fan 1996), which interpolates between order statistics, the algorithm collects the necessary boundary values in a single pass and performs linear interpolation: (1w)lower+wupper(1 - w) \cdot \text{lower} + w \cdot \text{upper}.
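The Type-7 rank pair and interpolation weight follow directly from h=1+(N1)ph = 1 + (N - 1) p over NN sorted values. A small sketch with a hypothetical helper name:

```python
import math

# Hypothetical helper: Type-7 rank pair and interpolation weight over N sorted
# values, following h = 1 + (N - 1) * p; the final value interpolates as
# (1 - w) * v[lower] + w * v[upper] with 1-based ranks.
def type7_params(N, p):
    h = 1.0 + (N - 1) * p
    lower, upper = math.floor(h), math.ceil(h)
    return lower, upper, h - lower

assert type7_params(6, 0.5) == (3, 4, 0.5)  # median of 6 values: between ranks 3 and 4
assert type7_params(5, 0.5) == (3, 3, 0.0)  # odd count: the exact middle element
```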

Real datasets often contain discrete or repeated values that can cause search stagnation. The algorithm detects when the search interval stops shrinking between iterations, indicating that multiple pairwise differences share the same value. When the closest difference below the threshold equals the closest above, all remaining candidates are identical and the algorithm terminates immediately. Otherwise, it uses the boundary values from the counting pass to snap the search interval to actual difference values, ensuring reliable convergence even with highly discrete data.

The binary search employs numerically stable midpoint calculations and terminates when the search interval collapses to a single value or when boundary tracking confirms convergence. Iteration limits are included as a safety mechanism, though convergence typically occurs much earlier due to the exponential narrowing of the search space.

The algorithm naturally generalizes to multiple quantiles by computing each one independently. For kk quantiles with samples of size nn and mm, the total complexity is O(k(n+m)logL)O(k(n + m) \log L), where LL represents the convergence precision. This is dramatically more efficient than the naive O(nmlog(nm))O(n m \log(n m)) approach, especially for large nn and mm with small kk. The algorithm requires only O(1)O(1) additional space beyond the input arrays, making it practical for large-scale statistical analysis where memory constraints prohibit materializing quadratic data structures.
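For intuition, a naive Python oracle that materializes all differences shows what the fast algorithm computes (suitable only for tiny inputs; not the toolkit's code):

```python
# Naive reference for what the fast algorithm computes: a Type-7 quantile of
# all pairwise differences, materialized explicitly (tiny inputs only).
def shift_quantile_naive(x, y, p):
    diffs = sorted(xi - yj for xi in x for yj in y)
    h = (len(diffs) - 1) * p          # 0-based Type-7 position
    lo = int(h)
    hi = min(lo + 1, len(diffs) - 1)
    w = h - lo
    return (1 - w) * diffs[lo] + w * diffs[hi]

x, y = [1.0, 4.0, 9.0], [2.0, 3.0]
# sorted differences: [-2, -1, 1, 2, 6, 7]; the median interpolates 1 and 2
assert shift_quantile_naive(x, y, 0.5) == 1.5
```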

namespace Pragmastat.Algorithms;

using System;
using System.Collections.Generic;
using System.Linq;

public static class FastShift
{
  /// <summary>
  /// Computes quantiles of all pairwise differences { x_i - y_j }.
  /// Time: O((m + n) * log(precision)) per quantile. Space: O(1).
  /// </summary>
  /// <param name="p">Probabilities in [0, 1].</param>
  /// <param name="assumeSorted">If false, collections will be sorted.</param>
  public static double[] Estimate(IReadOnlyList<double> x, IReadOnlyList<double> y, double[] p, bool assumeSorted = false)
  {
    if (x == null || y == null || p == null)
      throw new ArgumentNullException();
    if (x.Count == 0 || y.Count == 0)
      throw new ArgumentException("x and y must be non-empty.");

    foreach (double pk in p)
      if (double.IsNaN(pk) || pk < 0.0 || pk > 1.0)
        throw new ArgumentOutOfRangeException(nameof(p), "Probabilities must be within [0, 1].");

    double[] xs, ys;
    if (assumeSorted)
    {
      xs = x as double[] ?? x.ToArray();
      ys = y as double[] ?? y.ToArray();
    }
    else
    {
      xs = x.OrderBy(v => v).ToArray();
      ys = y.OrderBy(v => v).ToArray();
    }

    int m = xs.Length;
    int n = ys.Length;
    long total = (long)m * n;

    // Type-7 quantile: h = 1 + (n-1)*p, then interpolate between floor(h) and ceil(h)
    var requiredRanks = new SortedSet<long>();
    var interpolationParams = new (long lowerRank, long upperRank, double weight)[p.Length];

    for (int i = 0; i < p.Length; i++)
    {
      double h = 1.0 + (total - 1) * p[i];
      long lowerRank = (long)Math.Floor(h);
      long upperRank = (long)Math.Ceiling(h);
      double weight = h - lowerRank;
      if (lowerRank < 1) lowerRank = 1;
      if (upperRank > total) upperRank = total;
      interpolationParams[i] = (lowerRank, upperRank, weight);
      requiredRanks.Add(lowerRank);
      requiredRanks.Add(upperRank);
    }

    var rankValues = new Dictionary<long, double>();
    foreach (long rank in requiredRanks)
      rankValues[rank] = SelectKthPairwiseDiff(xs, ys, rank);

    var result = new double[p.Length];
    for (int i = 0; i < p.Length; i++)
    {
      var (lowerRank, upperRank, weight) = interpolationParams[i];
      double lower = rankValues[lowerRank];
      double upper = rankValues[upperRank];
      result[i] = weight == 0.0 ? lower : (1.0 - weight) * lower + weight * upper;
    }

    return result;
  }

  // Binary search in [min_diff, max_diff] that snaps to actual discrete values.
  // Avoids materializing all m*n differences.
  private static double SelectKthPairwiseDiff(double[] x, double[] y, long k)
  {
    int m = x.Length;
    int n = y.Length;
    long total = (long)m * n;

    if (k < 1 || k > total)
      throw new ArgumentOutOfRangeException(nameof(k));

    double searchMin = x[0] - y[n - 1];
    double searchMax = x[m - 1] - y[0];

    if (double.IsNaN(searchMin) || double.IsNaN(searchMax))
      throw new InvalidOperationException("NaN in input values.");

    const int maxIterations = 128; // Sufficient for double precision convergence
    double prevMin = double.NegativeInfinity;
    double prevMax = double.PositiveInfinity;

    for (int iter = 0; iter < maxIterations && searchMin != searchMax; iter++)
    {
      double mid = Midpoint(searchMin, searchMax);
      CountAndNeighbors(x, y, mid, out long countLessOrEqual, out double closestBelow, out double closestAbove);

      if (closestBelow == closestAbove)
        return closestBelow;

      // No progress means we're stuck between two discrete values
      if (searchMin == prevMin && searchMax == prevMax)
        return countLessOrEqual >= k ? closestBelow : closestAbove;

      prevMin = searchMin;
      prevMax = searchMax;

      if (countLessOrEqual >= k)
        searchMax = closestBelow;
      else
        searchMin = closestAbove;
    }

    if (searchMin != searchMax)
      throw new InvalidOperationException("Convergence failure (pathological input).");

    return searchMin;
  }

  // Two-pointer algorithm: counts pairs where x[i] - y[j] <= threshold, and tracks
  // the closest actual differences on either side of threshold.
  private static void CountAndNeighbors(
    double[] x, double[] y, double threshold,
    out long countLessOrEqual, out double closestBelow, out double closestAbove)
  {
    int m = x.Length, n = y.Length;
    long count = 0;
    double maxBelow = double.NegativeInfinity;
    double minAbove = double.PositiveInfinity;

    int j = 0;
    for (int i = 0; i < m; i++)
    {
      while (j < n && x[i] - y[j] > threshold)
        j++;

      count += (n - j);

      if (j < n)
      {
        double diff = x[i] - y[j];
        if (diff > maxBelow) maxBelow = diff;
      }

      if (j > 0)
      {
        double diff = x[i] - y[j - 1];
        if (diff < minAbove) minAbove = diff;
      }
    }

    // Fallback to actual min/max if no boundaries found (shouldn't happen in normal operation)
    if (double.IsNegativeInfinity(maxBelow))
      maxBelow = x[0] - y[n - 1];
    if (double.IsPositiveInfinity(minAbove))
      minAbove = x[m - 1] - y[0];

    countLessOrEqual = count;
    closestBelow = maxBelow;
    closestAbove = minAbove;
  }

  private static double Midpoint(double a, double b) => a + (b - a) * 0.5;
}

Fast Ratio

The Ratio\operatorname{Ratio} estimator measures the typical multiplicative relationship between elements of two samples. Given samples x=(x1,x2,,xn)\mathbf{x} = (x_1, x_2, \ldots, x_n) and y=(y1,y2,,ym)\mathbf{y} = (y_1, y_2, \ldots, y_m) with all positive values, this estimator is defined as:

Ratio(x,y)=exp(mediani,j(logxilogyj))=exp(Shift(logx,logy))\operatorname{Ratio}(\mathbf{x}, \mathbf{y}) = \exp(\operatorname{median}_{i,j}(\log x_i - \log y_j)) = \exp(\operatorname{Shift}(\log \mathbf{x}, \log \mathbf{y}))

A naive approach would compute all n×mn \times m ratios, sort them, and extract the median. With n=m=10000n = m = 10000, this creates 100 million values, requiring quadratic memory and O(nmlog(nm))O(n m \log(n m)) time.

The presented algorithm avoids this cost by exploiting the multiplicative-additive duality. Taking the logarithm of a ratio yields a difference:

log(xiyj)=log(xi)log(yj)\log(\frac{x_i}{y_j}) = \log(x_i) - \log(y_j)

This transforms the problem of finding the median of pairwise ratios into finding the median of pairwise differences in log-space.

The algorithm operates in three steps:

  • Log-transform — Apply log\log to each element of both samples. If x\mathbf{x} was sorted, logx\log \mathbf{x} remains sorted (log is monotonically increasing for positive values).

  • Delegate to FastShift — Use the Fast Shift algorithm to compute the desired quantile of pairwise differences in log-space. This leverages the O((m+n)logL)O((m + n) \log L) complexity of FastShift.

  • Exp-transform — Apply exp\exp to convert the result back to ratio-space. If the log-space median is dd, then exp(d)=exp(log(xi)log(yj))=xiyj\exp(d) = \exp(\log(x_i) - \log(y_j)) = \frac{x_i}{y_j} is the original ratio.

The total complexity is O((m+n)logL)O((m + n) \log L) per quantile, where LL represents the convergence precision in the log-space binary search. This is dramatically more efficient than the naive O(nmlog(nm))O(n m \log(n m)) approach.

Memory usage is O(m+n)O(m + n) for storing the log-transformed samples, compared to O(nm)O(n m) for materializing all pairwise ratios.

This algorithm naturally extends to computing bounds on ratios: RatioBounds\operatorname{RatioBounds} uses the same log-exp transformation, delegating to ShiftBounds\operatorname{ShiftBounds} in log-space.
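The log-exp duality can be checked naively on tiny samples. Here Python's `statistics.median` over materialized pairs stands in for FastShift; an odd pair count (3 × 3 = 9) keeps the median a single order statistic, so it commutes with the monotone log transform:

```python
import math
from statistics import median

# statistics.median over materialized pairs stands in for FastShift here.
x = [2.0, 8.0, 4.0]
y = [1.0, 2.0, 4.0]

ratios = [xi / yj for xi in x for yj in y]
log_diffs = [math.log(xi) - math.log(yj) for xi in x for yj in y]

# exp(median of pairwise log-differences) equals the median pairwise ratio
assert math.isclose(math.exp(median(log_diffs)), median(ratios))
assert median(ratios) == 2.0
```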

namespace Pragmastat.Algorithms;

using System;
using System.Collections.Generic;
using System.Linq;
using Pragmastat.Exceptions;
using Pragmastat.Internal;

/// <summary>
/// Computes quantiles of pairwise ratios via log-transformation and FastShift delegation.
/// Ratio(x, y) = exp(Shift(log(x), log(y)))
/// </summary>
public static class FastRatio
{
  /// <summary>
  /// Computes quantiles of all pairwise ratios { x_i / y_j }.
  /// Time: O((m + n) * log(precision)) per quantile. Space: O(m + n).
  /// </summary>
  /// <remarks>
  /// Log-transformation preserves sort order for positive values.
  /// </remarks>
  /// <param name="x">First sample (must be strictly positive).</param>
  /// <param name="y">Second sample (must be strictly positive).</param>
  /// <param name="p">Probabilities in [0, 1].</param>
  /// <param name="assumeSorted">If true, assumes x and y are already sorted in ascending order.</param>
  /// <returns>Quantile values for each probability in p.</returns>
  public static double[] Estimate(IReadOnlyList<double> x, IReadOnlyList<double> y, double[] p, bool assumeSorted = false)
  {
    // Log-transform both samples (includes positivity check)
    var logX = MathExtensions.Log(x, Subject.X);
    var logY = MathExtensions.Log(y, Subject.Y);

    // Delegate to FastShift in log-space
    var logResult = FastShift.Estimate(logX, logY, p, assumeSorted);

    // Exp-transform back to ratio-space
    return logResult.Select(v => Math.Exp(v)).ToArray();
  }
}

Fast PairwiseMargin

The PairwiseMargin\operatorname{PairwiseMargin} function determines how many extreme pairwise differences to exclude when constructing bounds around Shift(x,y)\operatorname{Shift}(\mathbf{x}, \mathbf{y}). Given samples x=(x1,,xn)\mathbf{x} = (x_1, \ldots, x_n) and y=(y1,,ym)\mathbf{y} = (y_1, \ldots, y_m), the ShiftBounds\operatorname{ShiftBounds} estimator computes all nmn m pairwise differences zij=xiyjz_{i j} = x_i - y_j and sorts them. The bounds select specific order statistics from this sorted sequence: [z(kleft),z(kright)][z_{(k_{\text{left}})}, z_{(k_{\text{right}})}]. The challenge lies in determining which order statistics produce bounds that contain the true shift Shift[X,Y]\operatorname{Shift}[X, Y] with probability 1misrate1 - \mathrm{misrate}.

Random sampling creates natural variation in pairwise differences. Even when populations have identical distributions, sampling variation produces both positive and negative differences. The margin function quantifies this sampling variability: it specifies how many extreme pairwise differences could occur by chance with probability misrate\mathrm{misrate}. For symmetric bounds, this margin splits evenly between the tails, giving kleft=PairwiseMargin(n,m,misrate)2+1k_{\text{left}} = \lfloor \frac{\operatorname{PairwiseMargin}(n, m, \mathrm{misrate})}{2} \rfloor + 1 and kright=nmPairwiseMargin(n,m,misrate)2k_{\text{right}} = n m - \lfloor \frac{\operatorname{PairwiseMargin}(n, m, \mathrm{misrate})}{2} \rfloor.

Computing the margin requires understanding the distribution of pairwise comparisons. Each pairwise difference corresponds to a comparison: xiyj>0x_i - y_j > 0 exactly when xi>yjx_i > y_j. This connection motivates the dominance function:

Dominance(x,y)=i=1nj=1m1(xi>yj)\text{Dominance}(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^n \sum_{j=1}^m \mathbb{1}(x_i > y_j)

The dominance function counts how many pairwise comparisons favor x\mathbf{x} over y\mathbf{y}. Both Shift\operatorname{Shift} and Dominance\text{Dominance} operate on the same collection of nmn m pairwise differences. The Shift\operatorname{Shift} estimator examines difference values, returning the median as a location estimate. The Dominance\text{Dominance} function examines difference signs, counting how many comparisons produce positive differences. While Shift\operatorname{Shift} provides the estimate itself, Dominance\text{Dominance} determines which order statistics form reliable bounds around that estimate.
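As a sketch, the dominance count is simply the number of positive pairwise differences:

```python
# Dominance counts comparisons favoring x over y -- equivalently, the number
# of positive pairwise differences (illustrative sketch).
def dominance(x, y):
    return sum(1 for xi in x for yj in y if xi > yj)

x, y = [1.0, 4.0, 9.0], [2.0, 3.0]
# positive differences: 4-2, 4-3, 9-2, 9-3
assert dominance(x, y) == 4
```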

When populations have equivalent distributions, Dominance\text{Dominance} concentrates near nm2n \frac{m}{2} by symmetry. The distribution of Dominance\text{Dominance} across all possible sample orderings determines reliable bounds. If Dominance\text{Dominance} deviates from nm2n \frac{m}{2} by at least k2\frac{k}{2} with probability misrate\mathrm{misrate}, then the interval excluding the kk most extreme pairwise differences contains zero with probability 1misrate1 - \mathrm{misrate}. Translation invariance extends this relationship to arbitrary shifts: the margin computed from the comparison distribution applies regardless of the true shift value.

Two computational approaches provide the distribution of Dominance\text{Dominance}: exact calculation for small samples and approximation for large samples.

Exact method

Small sample sizes allow exact computation without approximation. The exact approach exploits a fundamental symmetry: under equivalent populations, all (n+mn)\binom{n+m}{n} orderings of the combined measurements occur with equal probability. This symmetry enables direct calculation of how many orderings produce each comparison count.

Direct computation faces a combinatorial challenge. Enumerating all orderings to count comparison outcomes requires substantial resources. For samples beyond a few dozen measurements, naive implementation becomes impractical.

Löffler's recurrence relation (Löffler 1982) resolves this through algebraic structure. The recurrence exploits cycle properties in the comparison distribution, reducing memory requirements while maintaining exact calculation. The algorithm builds cumulative probabilities sequentially until reaching the threshold corresponding to the desired error rate. This approach extends practical exact computation to combined sample sizes of several hundred.

Define pn,m(c)p_{n,m}(c) as the number of orderings producing exactly cc comparisons favoring x\mathbf{x}. The probability mass function becomes:

Pr(Dominance=c)=pn,m(c)(n+mn)\Pr(\text{Dominance} = c) = \frac{p_{n,m}(c)}{\binom{n+m}{n}}

A direct recurrence follows from considering the largest measurement. The rightmost element comes from either x\mathbf{x} (contributing mm comparisons) or y\mathbf{y} (contributing zero):

pn,m(c)=pn1,m(cm)+pn,m1(c)p_{n,m}(c) = p_{n-1,m}(c - m) + p_{n,m-1}(c)

with base cases pn,0(0)=1p_{n,0}(0) = 1 and p0,m(0)=1p_{0,m}(0) = 1.
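This recurrence is easy to verify by brute force (a hypothetical Python sketch, not the toolkit's code): the counts over all cc must sum to the number of orderings C(n+m,n)C(n+m, n), and the distribution is symmetric:

```python
import math
from functools import lru_cache

# Brute-force check of p_{n,m}(c) = p_{n-1,m}(c - m) + p_{n,m-1}(c)
# with base cases p_{n,0}(0) = p_{0,m}(0) = 1 (illustrative sketch).
@lru_cache(maxsize=None)
def p(n, m, c):
    if c < 0:
        return 0
    if n == 0 or m == 0:
        return 1 if c == 0 else 0
    return p(n - 1, m, c - m) + p(n, m - 1, c)

n, m = 4, 3
counts = [p(n, m, c) for c in range(n * m + 1)]
assert sum(counts) == math.comb(n + m, n)  # all 35 orderings accounted for
assert counts == counts[::-1]              # symmetric around n*m/2
```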

Direct implementation requires O(nmnm)O(n \cdot m \cdot n m) time and O(nm)O(n m) memory. An alternative recurrence (Löffler 1982) exploits cycle structure:

pn,m(c)=1ci=0c1pn,m(i)σn,m(ci)p_{n,m}(c) = \frac{1}{c} \sum_{i=0}^{c-1} p_{n,m}(i) \cdot \sigma_{n,m}(c - i)

where σn,m(d)\sigma_{n,m}(d) captures structural properties through divisors:

\sigma_{n,m}(d) = \sum_{k \mid d} \epsilon_k \cdot k, \qquad \epsilon_k = \begin{cases} 1 & \text{if } 1 \leq k \leq n \\ -1 & \text{if } m+1 \leq k \leq m+n \\ 0 & \text{otherwise} \end{cases}

This recurrence operates entirely at the target sample sizes, needing only the previously computed values pn,m(0),,pn,m(c1)p_{n,m}(0), \ldots, p_{n,m}(c-1) in a single array of size O(nm)O(n m), and enables efficient sequential computation through c=nmc = n m.
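The recurrence can be sanity-checked in a few lines of Python (a sketch, not the toolkit's code); for n=4n = 4, m=3m = 3 the counts again sum to C(7,4)=35C(7, 4) = 35:

```python
# Sketch of the Löffler recurrence (not the toolkit's code): sigma(d) sums the
# divisors of d lying in [1, n] and subtracts those in [m+1, m+n]; then
# c * p(c) = sum over i of p(i) * sigma(c - i).
def sigma(d, n, m):
    s = sum(k for k in range(1, n + 1) if d % k == 0)
    return s - sum(k for k in range(m + 1, m + n + 1) if d % k == 0)

def loeffler_counts(n, m, cmax):
    pmf = [1.0]                                   # p(0) = 1
    for c in range(1, cmax + 1):
        pmf.append(sum(pmf[i] * sigma(c - i, n, m) for i in range(c)) / c)
    return pmf

counts = loeffler_counts(4, 3, 12)
assert sum(counts) == 35.0       # matches C(7, 4) total orderings
assert counts[2] == 2.0          # two orderings give dominance 2
```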

The algorithm computes cumulative probabilities Pr(Dominanceu)\Pr(\text{Dominance} \leq u) sequentially until the cumulative probability reaches misrate/2\mathrm{misrate}/2 at some threshold uu. By symmetry, the same margin applies to the upper tail, giving the total margin PairwiseMargin=2u\operatorname{PairwiseMargin} = 2u.

The sequential computation proceeds incrementally. Starting from u=0u = 0 with base probability pn,m(0)=1p_{n,m}(0) = 1, the algorithm computes pn,m(1)p_{n,m}(1), then pn,m(2)p_{n,m}(2), and so on, accumulating the cumulative distribution function with each step. The loop terminates as soon as Pr(Dominanceu)\Pr(\text{Dominance} \leq u) reaches misrate/2\mathrm{misrate}/2, returning the threshold value uu without computing further probabilities.

This sequential approach performs particularly well for small misrates. For misrate=106\mathrm{misrate} = 10^{-6}, the threshold uu typically remains small even with large sample sizes, requiring only a few iterations regardless of whether nn and mm equal 50 or 200. The algorithm computes only the extreme tail probabilities needed to reach the threshold, never touching the vast majority of probability mass concentrated near nm2n \frac{m}{2}. This efficiency advantage grows as misrates decrease: stricter bounds require fewer computed values, making exact calculation particularly attractive for high-confidence applications.

Approximate method

Large samples make exact computation impractical. The dominance count Dominance\text{Dominance} concentrates near nm2n \frac{m}{2} with variance nm(n+m+1)12n \frac{m(n+m+1)}{12}. A basic Additive\underline{\operatorname{Additive}} (Normal) approximation suffices asymptotically:

DominanceAdditive(nm2,nm(n+m+1)12)\text{Dominance} \approx \underline{\operatorname{Additive}}(n \frac{m}{2}, \sqrt{n \frac{m(n+m+1)}{12}})

This approximation underestimates tail probabilities for moderate sample sizes. The Additive\underline{\operatorname{Additive}} (Normal) approximation provides a baseline but fails to capture the true distribution shape in the tails, producing mis-calibrated probabilities that become problematic for small error rates.

The Edgeworth expansion refines this approximation through moment-based corrections (Fix & Hodges 1955). The expansion starts with the Additive\underline{\operatorname{Additive}} (Normal) cumulative distribution as a baseline, then adds correction terms that account for the distribution's asymmetry (skewness) and tail weight (kurtosis). These corrections use Hermite polynomials to adjust the baseline curve where the Additive\underline{\operatorname{Additive}} (Normal) approximation deviates most from the true distribution. The first few correction terms typically achieve the practical balance between accuracy and computational cost, substantially improving tail probability estimates compared to the basic approximation.

The standardized comparison count (with continuity correction):

z=cnm20.5nm(n+m+1)12z = \frac{c - n \frac{m}{2} - 0.5}{\sqrt{n \frac{m(n+m+1)}{12}}}

produces the approximated cumulative distribution:

Pr(Dominancec)Φ(z)+e3ϕ(3)(z)+e5ϕ(5)(z)+e7ϕ(7)(z)\Pr(\text{Dominance} \leq c) \approx \Phi(z) + e_3 \phi^{(3)}(z) + e_5 \phi^{(5)}(z) + e_7 \phi^{(7)}(z)

where Φ\Phi denotes the standard Additive\underline{\operatorname{Additive}} (Normal) CDF.

The correction coefficients depend on standardized moments:

e3=124(μ4μ223),e5=1720(μ6μ2315μ4μ22+30),e7=3540320(μ4μ223)2e_3 = \frac{1}{24} (\frac{\mu_4}{\mu_2^2} - 3), \quad e_5 = \frac{1}{720} (\frac{\mu_6}{\mu_2^3} - 15 \frac{\mu_4}{\mu_2^2} + 30), \quad e_7 = \frac{35}{40320} (\frac{\mu_4}{\mu_2^2} - 3)^2

The moments μ2\mu_2, μ4\mu_4, μ6\mu_6 are computed from sample sizes:

\begin{aligned} \mu_2 &= \frac{n m (n+m+1)}{12} \\ \mu_4 &= \frac{n m (n+m+1)}{240} \bigl(5 n m (n+m) - 2(n^2 + m^2) + 3 n m - 2(n+m)\bigr) \\ \mu_6 &= \frac{n m (n+m+1)}{4032} \bigl(35 n^2 m^2 (n^2 + m^2) + 70 n^3 m^3 - 42 n m (n^3 + m^3) \\ &\qquad - 14 n^2 m^2 (n+m) + 16(n^4 + m^4) - 52 n m (n^2 + m^2) \\ &\qquad - 43 n^2 m^2 + 32(n^3 + m^3) + 14 n m (n+m) \\ &\qquad + 8(n^2 + m^2) + 16 n m - 8(n+m)\bigr) \end{aligned}

The correction terms use Hermite polynomials:

\phi^{(k)}(z) = -\phi(z) H_k(z)

H_3(z) = z^3 - 3z, \quad H_5(z) = z^5 - 10z^3 + 15z, \quad H_7(z) = z^7 - 21z^5 + 105z^3 - 105z
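The identity between the density derivatives and the Hermite polynomials can be verified numerically. This Python sketch compares a finite-difference third derivative of the standard normal density against the closed form for k=3k = 3:

```python
import math

# Numerical check of phi^(3)(z) = -phi(z) * H_3(z) via a central
# finite-difference third derivative (sketch, not toolkit code).
def phi(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def third_derivative(f, z, h=1e-2):
    return (f(z + 2*h) - 2*f(z + h) + 2*f(z - h) - f(z - 2*h)) / (2 * h**3)

z = 0.7
expected = -phi(z) * (z**3 - 3 * z)   # -phi(z) * H_3(z)
assert abs(third_derivative(phi, z) - expected) < 1e-3
```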

Binary search locates the threshold value efficiently. The algorithm maintains a search interval [a,b][a, b] initialized to [0,nm][0, n m]. Each iteration computes the midpoint c=a+b2c = \frac{a + b}{2} and evaluates the Edgeworth CDF at cc. If Pr(Dominancec)<misrate/2\Pr(\text{Dominance} \leq c) < \mathrm{misrate}/2, the threshold lies above cc and the search continues in [c,b][c, b]. If Pr(Dominancec)misrate/2\Pr(\text{Dominance} \leq c) \geq \mathrm{misrate}/2, the threshold lies below cc and the search continues in [a,c][a, c]. The loop terminates when aa and bb become adjacent, requiring O(log(nm))O(\log(n m)) CDF evaluations.

This binary search exhibits uniform performance across misrate values. Whether computing bounds for misrate=106\mathrm{misrate} = 10^{-6} or misrate=0.05\mathrm{misrate} = 0.05, the algorithm performs the same number of iterations determined solely by the sample sizes. Each CDF evaluation costs constant time regardless of the threshold location, making the approximate method particularly efficient for large samples where exact computation becomes impractical. The logarithmic scaling ensures that doubling the sample size adds only one additional iteration, enabling practical computation for samples in the thousands or tens of thousands.

The toolkit selects between exact and approximate computation based on combined sample size: exact method for n+m400n + m \leq 400, approximate method for n+m>400n + m > 400. The exact method guarantees correctness but scales as O(nm)O(n m) memory and O((nm)2)O((n m)^2) time. For n=m=200n = m = 200, this requires 40,000 memory locations. The approximate method achieves 1% accuracy with O(log(nm))O(\log(n m)) constant-time evaluations. For n=m=10000n = m = 10000, the approximate method completes in milliseconds versus minutes for exact computation.

Both methods handle discrete data. Repeated measurements produce tied pairwise differences, creating plateaus in the sorted sequence. The exact method counts orderings without assuming continuity. The approximate method's moment-based corrections capture the actual distribution shape regardless of discreteness.

Minimum achievable misrate

The misrate\mathrm{misrate} parameter controls how many extreme pairwise differences the bounds exclude. Lower misrate demands higher confidence: fewer extreme pairwise differences can be excluded, so the bounds become wider. However, sample size limits how small misrate can meaningfully become.

Consider the most extreme configuration: all measurements from x\mathbf{x} exceed all measurements from y\mathbf{y}, giving x1,,xn>y1,,ymx_1, \ldots, x_n > y_1, \ldots, y_m. Under equivalent populations, this arrangement occurs purely by chance. The probability equals the chance of having all nn elements from x\mathbf{x} occupy the top nn positions among n+mn+m total measurements:

\mathrm{misrate}_{\min} = \frac{2}{\binom{n+m}{n}} = \frac{2 \cdot n! \cdot m!}{(n+m)!}

This represents the minimum probability of the most extreme ordering under random sampling. Setting misrate<misratemin\mathrm{misrate} < \mathrm{misrate}_{\min} makes bounds construction problematic. The exact distribution of Dominance\text{Dominance} cannot support misrates smaller than the probability of its most extreme realization. Attempting to construct bounds with misrate<misratemin\mathrm{misrate} < \mathrm{misrate}_{\min} forces the algorithm to exclude zero pairwise differences from the tails, making PairwiseMargin=0\operatorname{PairwiseMargin} = 0. The resulting bounds span all nmn m pairwise differences, returning [z(1),z(nm)][z_{(1)}, z_{(n m)}] regardless of the desired confidence level. These bounds convey no useful information beyond the range of observed pairwise differences.

For small samples, misratemin\mathrm{misrate}_{\min} can exceed commonly used values. With n=m=6n = m = 6, the minimum misrate equals 2(126)0.00216\frac{2}{\binom{12}{6}} \approx 0.00216, making the typical choice of misrate=103\mathrm{misrate} = 10^{-3} impossible. With n=m=4n = m = 4, the minimum becomes 2(84)0.0286\frac{2}{\binom{8}{4}} \approx 0.0286, exceeding even misrate=0.01\mathrm{misrate} = 0.01.
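These thresholds follow directly from the formula; a small Python check (using `math.comb`) reproduces the quoted values:

```python
import math

# Minimum achievable misrate 2 / C(n+m, n), matching the quoted values.
def min_misrate(n, m):
    return 2 / math.comb(n + m, n)

assert math.isclose(min_misrate(6, 6), 0.002165, rel_tol=1e-3)
assert math.isclose(min_misrate(4, 4), 0.028571, rel_tol=1e-3)
assert min_misrate(15, 15) < 1e-7
```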

The table below shows misratemin\mathrm{misrate}_{\min} for sample sizes nn (rows) and mm (columns):

n \ m        1         2         3         4         5         6         7         8         9        10
  1   1.000000  0.666667  0.500000  0.400000  0.333333  0.285714  0.250000  0.222222  0.200000  0.181818
  2   0.666667  0.333333  0.200000  0.133333  0.095238  0.071429  0.055556  0.044444  0.036364  0.030303
  3   0.500000  0.200000  0.100000  0.057143  0.035714  0.023810  0.016667  0.012121  0.009091  0.006993
  4   0.400000  0.133333  0.057143  0.028571  0.015873  0.009524  0.006061  0.004040  0.002797  0.001998
  5   0.333333  0.095238  0.035714  0.015873  0.007937  0.004329  0.002525  0.001554  0.000999  0.000666
  6   0.285714  0.071429  0.023810  0.009524  0.004329  0.002165  0.001166  0.000666  0.000400  0.000250
  7   0.250000  0.055556  0.016667  0.006061  0.002525  0.001166  0.000583  0.000311  0.000175  0.000103
  8   0.222222  0.044444  0.012121  0.004040  0.001554  0.000666  0.000311  0.000155  0.000082  0.000046
  9   0.200000  0.036364  0.009091  0.002797  0.000999  0.000400  0.000175  0.000082  0.000041  0.000022
 10   0.181818  0.030303  0.006993  0.001998  0.000666  0.000250  0.000103  0.000046  0.000022  0.000011

For meaningful bounds construction, choose misrate>misratemin\mathrm{misrate} > \mathrm{misrate}_{\min}. This ensures the margin function excludes at least some extreme pairwise differences, producing bounds narrower than the full range. When working with small samples, verify that the desired misrate exceeds misratemin\mathrm{misrate}_{\min} for the given sample sizes. With moderate sample sizes (n,m15n, m \geq 15), misratemin\mathrm{misrate}_{\min} drops below 10710^{-7}, making standard choices like misrate=106\mathrm{misrate} = 10^{-6} feasible.

using System;
using JetBrains.Annotations;
using Pragmastat.Exceptions;
using Pragmastat.Internal;
using static System.Math;

namespace Pragmastat.Functions;

/// <summary>
/// PairwiseMargin function
/// </summary>
/// <param name="threshold">The maximum value of n+m for which the exact implementation is used; larger samples switch to the approximation</param>
internal class PairwiseMargin(int threshold = PairwiseMargin.MaxExactSize)
{
  public static readonly PairwiseMargin Instance = new();

  private const int MaxExactSize = 400;

  [PublicAPI]
  public int Calc(int n, int m, double misrate)
  {
    if (n <= 0)
      throw AssumptionException.Domain(Subject.X);
    if (m <= 0)
      throw AssumptionException.Domain(Subject.Y);
    if (double.IsNaN(misrate) || misrate < 0 || misrate > 1)
      throw AssumptionException.Domain(Subject.Misrate);

    double minMisrate = MinAchievableMisrate.TwoSample(n, m);
    if (misrate < minMisrate)
      throw AssumptionException.Domain(Subject.Misrate);

    return n + m <= threshold
      ? CalcExact(n, m, misrate)
      : CalcApprox(n, m, misrate);
  }

  internal int CalcExact(int n, int m, double misrate)
  {
    int raw = CalcExactRaw(n, m, misrate / 2);
    return checked(raw * 2);
  }

  internal int CalcApprox(int n, int m, double misrate)
  {
    long raw = CalcApproxRaw(n, m, misrate / 2);
    long margin = raw * 2;
    if (margin > int.MaxValue)
      throw new OverflowException($"Pairwise margin exceeds supported range for n={n}, m={m}");
    return (int)margin;
  }

  // Inverse implementation of Andreas Löffler's (1982) "Über eine Partition der nat. Zahlen und ihre Anwendung beim U-Test"
  private static int CalcExactRaw(int n, int m, double p)
  {
    double total = n + m < BinomialCoefficientFunction.MaxAcceptableN
      ? BinomialCoefficientFunction.BinomialCoefficient(n + m, m)
      : BinomialCoefficientFunction.BinomialCoefficient(n + m, m * 1.0);

    var pmf = new List<double> { 1 }; // pmf[0] = 1
    var sigma = new List<double> { 0 }; // sigma[0] is unused

    int u = 0;
    double cdf = 1.0 / total;

    if (cdf >= p)
      return 0;

    while (true)
    {
      u++;
      // Ensure sigma has entry for u
      if (sigma.Count <= u)
      {
        int value = 0;
        for (int d = 1; d <= n; d++)
          if (u % d == 0 && u >= d)
            value += d;
        for (int d = m + 1; d <= m + n; d++)
          if (u % d == 0 && u >= d)
            value -= d;
        sigma.Add(value);
      }

      // Compute pmf[u] using Loeffler recurrence
      double sum = 0;
      for (int i = 0; i < u; i++)
        sum += pmf[i] * sigma[u - i];
      sum /= u;
      pmf.Add(sum);

      cdf += sum / total;
      if (cdf >= p)
        return u;
      if (sum == 0)
        break;
    }

    return pmf.Count - 1;
  }

  // Inverse Edgeworth Approximation
  private static long CalcApproxRaw(int n, int m, double misrate)
  {
    long a = 0;
    long b = (long)n * m;
    while (a < b - 1)
    {
      long c = (a + b) / 2;
      double p = EdgeworthCdf(n, m, c);
      if (p < misrate)
        a = c;
      else
        b = c;
    }

    return EdgeworthCdf(n, m, b) < misrate ? b : a;
  }

  private static double EdgeworthCdf(int n, int m, long u)
  {
    double nm = (double)n * m;
    double mu = nm / 2.0;
    double su = Sqrt(nm * (n + m + 1) / 12.0);
    // -0.5 continuity correction: computing P(U ≥ u) for a right-tail discrete CDF
    double z = (u - mu - 0.5) / su;
    double phi = Exp(-z.Sqr() / 2) / Sqrt(2 * PI);
    double Phi = AcmAlgorithm209.Gauss(z);

    // Pre-compute powers of n and m for efficiency
    double n2 = (double)n * n;
    double n3 = n2 * n;
    double n4 = n2 * n2;
    double m2 = (double)m * m;
    double m3 = m2 * m;
    double m4 = m2 * m2;

    double mu2 = (double)n * m * (n + m + 1) / 12.0;
    double mu4 =
      (double)n * m * (n + m + 1) *
      (0
       + 5 * m * n * (m + n)
       - 2 * (m2 + n2)
       + 3 * m * n
       - 2 * (n + m)
      ) / 240.0;

    double mu6 =
      (double)n * m * (n + m + 1) *
      (0
       + 35 * m2 * n2 * (m2 + n2)
       + 70 * m3 * n3
       - 42 * m * n * (m3 + n3)
       - 14 * m2 * n2 * (n + m)
       + 16 * (n4 + m4)
       - 52 * n * m * (n2 + m2)
       - 43 * n2 * m2
       + 32 * (m3 + n3)
       + 14 * m * n * (n + m)
       + 8 * (n2 + m2)
       + 16 * n * m
       - 8 * (n + m)
      ) / 4032.0;

    // Pre-compute powers of mu2 and related terms
    double mu2_2 = mu2 * mu2;
    double mu2_3 = mu2_2 * mu2;
    double mu4_mu2_2 = mu4 / mu2_2;

    // Factorial constants: 4! = 24, 6! = 720, 8! = 40320
    double e3 = (mu4_mu2_2 - 3) / 24.0;
    double e5 = (mu6 / mu2_3 - 15 * mu4_mu2_2 + 30) / 720.0;
    double e7 = 35 * (mu4_mu2_2 - 3) * (mu4_mu2_2 - 3) / 40320.0;

    // Pre-compute powers of z for Hermite polynomials
    double z2 = z * z;
    double z3 = z2 * z;
    double z5 = z3 * z2;
    double z7 = z5 * z2;

    double f3 = -phi * (z3 - 3 * z);
    double f5 = -phi * (z5 - 10 * z3 + 15 * z);
    double f7 = -phi * (z7 - 21 * z5 + 105 * z3 - 105 * z);

    double edgeworth = Phi + e3 * f3 + e5 * f5 + e7 * f7;
    return Min(Max(edgeworth, 0), 1);
  }
}

Fast SignedRankMargin

The SignedRankMargin\operatorname{SignedRankMargin} function determines how many extreme pairwise averages to exclude when constructing bounds around Center(x)\operatorname{Center}(\mathbf{x}). Given a sample x=(x1,,xn)\mathbf{x} = (x_1, \ldots, x_n), the CenterBounds\operatorname{CenterBounds} estimator computes all N=n(n+1)2N = \frac{n(n+1)}{2} pairwise averages wij=xi+xj2w_{i j} = \frac{x_i + x_j}{2} for iji \leq j and sorts them. The bounds select specific order statistics from this sorted sequence: [w(kleft),w(kright)][w_{(k_{\text{left}})}, w_{(k_{\text{right}})}]. The challenge lies in determining which order statistics produce bounds that contain the true center with probability 1misrate1 - \mathrm{misrate}.

The margin function is the one-sample analog of PairwiseMargin\operatorname{PairwiseMargin}. While PairwiseMargin\operatorname{PairwiseMargin} uses the Mann-Whitney distribution for two-sample comparisons, SignedRankMargin\operatorname{SignedRankMargin} uses the Wilcoxon signed-rank distribution for one-sample inference. Under the weak symmetry assumption, the signed-rank statistic has a known distribution that enables exact computation of bounds coverage.

For symmetric distributions, consider the signs of deviations from the center. The Wilcoxon signed-rank statistic WW sums the ranks of positive deviations:

W=i=1nRi1(xi>θ)W = \sum_{i=1}^n R_i \cdot \mathbb{1}(x_i > \theta)

where RiR_i is the rank of xiθ\lvert x_i - \theta \rvert among all xjθ\lvert x_j - \theta \rvert, and θ\theta is the true center. Under symmetry, each deviation is equally likely to be positive or negative, giving WW a discrete distribution over [0,n(n+1)2][0, \frac{n(n+1)}{2}].
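As a concrete illustration, the statistic can be computed by ranking absolute deviations. The following is a minimal Python sketch (not toolkit code) that assumes no ties and no zero deviations; the sample values are made up:

```python
def signed_rank_statistic(x, theta):
    """Wilcoxon signed-rank statistic: sum of ranks of positive deviations.

    Assumes no ties among |x_i - theta| and no zero deviations;
    the general case needs tie handling, which this sketch omits."""
    devs = [xi - theta for xi in x]
    # 1-based ranks of |x_i - theta| among all absolute deviations
    order = sorted(range(len(x)), key=lambda i: abs(devs[i]))
    rank = {i: r + 1 for r, i in enumerate(order)}
    return sum(rank[i] for i in range(len(x)) if devs[i] > 0)

# Deviations from theta=1.0: +0.2, +2.4, +1.1, -0.5 -> ranks 1, 4, 3, 2
w = signed_rank_statistic([1.2, 3.4, 2.1, 0.5], theta=1.0)  # W = 1 + 4 + 3 = 8
```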

The connection to pairwise averages is fundamental: the kk-th order statistic of sorted pairwise averages corresponds to a specific threshold of the signed-rank statistic. By computing the distribution of WW, we determine which order statistics provide bounds with the desired coverage.

Two computational approaches provide the distribution of WW: exact calculation for small samples and approximation for large samples.

Exact method

Small sample sizes allow exact computation without approximation. The Wilcoxon signed-rank distribution has 2n2^n equally likely outcomes under symmetry, corresponding to all possible sign patterns for deviations from the center.

Dynamic programming builds the probability mass function efficiently. Define p(w)p(w) as the number of sign patterns producing signed-rank statistic equal to ww. The recurrence considers whether to include rank ii in the positive sum:

pi(w)=pi1(wi)+pi1(w)p_i(w) = p_{i-1}(w - i) + p_{i-1}(w)

with base case p0(0)=1p_0(0) = 1. This builds the distribution incrementally, rank by rank.

The algorithm computes cumulative probabilities Pr(Ww)\Pr(W \leq w) sequentially until the threshold misrate/2\mathrm{misrate}/2 is exceeded. For symmetric two-tailed bounds, the margin becomes SignedRankMargin=2w\operatorname{SignedRankMargin} = 2w. Memory is O(n2)O(n^2) for storing the probability array, and time is O(n3)O(n^3) for the full computation.

The sequential computation performs well for small misrates. For $\mathrm{misrate} = 10^{-6}$, the threshold $w$ remains small, so the scan terminates after visiting only the lower tail of the distribution, regardless of sample size.
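The exact method can be sketched in Python as follows (a simplified translation without the toolkit's domain checks; the function name is invented for this sketch):

```python
def signed_rank_margin_exact(n, misrate):
    """Two-tailed margin from the exact Wilcoxon signed-rank distribution.

    count[w] = number of sign patterns with statistic w, built rank by rank
    via the recurrence count_i(w) = count_{i-1}(w - i) + count_{i-1}(w).
    """
    max_w = n * (n + 1) // 2
    count = [0] * (max_w + 1)
    count[0] = 1
    for i in range(1, n + 1):
        # Iterate downward so each rank i is used at most once.
        for w in range(min(max_w, i * (i + 1) // 2), i - 1, -1):
            count[w] += count[w - i]
    total = 2 ** n
    # Scan the lower tail until the cumulative probability reaches misrate/2.
    cumulative = 0
    for w in range(max_w + 1):
        cumulative += count[w]
        if cumulative / total >= misrate / 2:
            return 2 * w  # symmetric two-tailed margin
    return 2 * max_w
```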

Approximate method

Large samples make exact computation impractical. For n>63n > 63, the Wilcoxon distribution is approximated using an Edgeworth expansion.

Under symmetry, the signed-rank statistic WW has:

E[W]=n(n+1)4,Var(W)=n(n+1)2n+124\mathbb{E}[W] = \frac{n(n+1)}{4}, \quad \operatorname{Var}(W) = n(n+1)\frac{2n+1}{24}

The basic normal approximation uses these moments directly, but underestimates tail probabilities for moderate sample sizes.
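These closed forms can be checked by brute-force enumeration of all $2^n$ equally likely sign patterns (a verification sketch, feasible only for small $n$; not toolkit code):

```python
from itertools import product

def signed_rank_moments(n):
    """Exact E[W] and Var(W) by enumerating all 2^n sign patterns."""
    values = [sum(r for r, s in zip(range(1, n + 1), signs) if s)
              for signs in product([0, 1], repeat=n)]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

n = 8
mean, var = signed_rank_moments(n)
# Closed forms: E[W] = n(n+1)/4, Var(W) = n(n+1)(2n+1)/24
```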

The Edgeworth expansion refines this through moment-based corrections. The fourth cumulant of $W$ is:

\kappa_4 = -\frac{n(n+1)(2n+1)(3n^2 + 3n - 1)}{240}

This enables the kurtosis correction (here $\kappa_4 = \mu_4 - 3\sigma^4$):

e_3 = \frac{\kappa_4}{24 \sigma^4}

The approximated CDF becomes:

\Pr(W \leq w) \approx \Phi(z) - e_3 \, \phi(z)(z^3 - 3z)

where z=wμ+0.5σz = \frac{w - \mu + 0.5}{\sigma} includes a continuity correction.
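A Python sketch of the approximation (using the standard library's `math.erf` for $\Phi$ in place of the toolkit's `AcmAlgorithm209.Gauss`; the correction term is subtracted, matching the C# implementation's `f3 = -phi * (z3 - 3 * z)`):

```python
import math

def edgeworth_cdf(n, w):
    """Edgeworth approximation to P(W <= w) for the signed-rank statistic."""
    mu = n * (n + 1) / 4
    sigma2 = n * (n + 1) * (2 * n + 1) / 24
    sigma = math.sqrt(sigma2)
    z = (w - mu + 0.5) / sigma  # +0.5 continuity correction for P(W <= w)
    phi = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    Phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    kappa4 = -n * (n + 1) * (2 * n + 1) * (3 * n * n + 3 * n - 1) / 240
    e3 = kappa4 / (24 * sigma2 * sigma2)
    # Kurtosis correction: Phi(z) - e3 * phi(z) * (z^3 - 3z), clamped to [0, 1]
    return min(max(Phi - e3 * phi * (z ** 3 - 3 * z), 0.0), 1.0)
```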

Binary search locates the threshold efficiently. Each CDF evaluation costs $O(1)$, and $O(\log N)$ evaluations suffice, so the approximate method runs in logarithmic time in the sample size, effectively constant in practice.

The toolkit uses exact computation for $n \leq 63$ and approximation for $n > 63$. At $n = 63$, the exact method requires arrays of size $2{,}016$ ($= \frac{63 \times 64}{2}$), which remains practical on modern hardware. The transition at $n = 63$ is determined by the requirement that $2^n$ fits in a 64-bit integer. The approximation achieves sub-1% accuracy for $n > 100$, making the transition smooth.

Minimum achievable misrate

The misrate\mathrm{misrate} parameter controls how many extreme pairwise averages the bounds exclude. However, sample size limits how small misrate can meaningfully become.

The most extreme configuration occurs when all signs are positive (or all negative): W=n(n+1)2W = \frac{n(n+1)}{2} or W=0W = 0. Under symmetry, this extreme occurs with probability:

\mathrm{misrate}_{\min} = \frac{2}{2^n} = 2^{1-n}

Setting $\mathrm{misrate} < \mathrm{misrate}_{\min}$ makes bounds construction impossible: even the most extreme sign configuration occurs too often to support the requested coverage. Pragmastat rejects such requests with a domain error.

The table below shows $\mathrm{misrate}_{\min}$ for small sample sizes:

| $n$ | 2 | 3 | 5 | 7 | 10 |
|---|---|---|---|---|---|
| $\mathrm{misrate}_{\min}$ | 0.5 | 0.25 | 0.0625 | 0.0156 | 0.00195 |
| max confidence | 50% | 75% | 93.75% | 98.4% | 99.8% |

For meaningful bounds construction, choose $\mathrm{misrate} > \mathrm{misrate}_{\min}$. With $n \geq 11$, standard choices like $\mathrm{misrate} = 10^{-3}$ become feasible. With $n \geq 21$, even $\mathrm{misrate} = 10^{-6}$ is achievable.
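The thresholds follow directly from $2^{1-n}$; a quick sketch (function names invented for illustration):

```python
def min_achievable_misrate(n):
    """Smallest supportable misrate for one-sample bounds: 2^(1-n)."""
    return 2 ** (1 - n)

def min_n_for(misrate):
    """Smallest sample size whose minimum achievable misrate drops below `misrate`."""
    n = 2
    while min_achievable_misrate(n) >= misrate:
        n += 1
    return n
```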

using JetBrains.Annotations;
using Pragmastat.Exceptions;
using static System.Math;

namespace Pragmastat.Functions;

/// <summary>
/// SignedRankMargin function for one-sample bounds.
/// One-sample analog of PairwiseMargin using Wilcoxon signed-rank distribution.
/// </summary>
/// <param name="threshold">Maximum n for exact computation; larger n uses approximation</param>
internal class SignedRankMargin(int threshold = SignedRankMargin.MaxExactSize)
{
  public static readonly SignedRankMargin Instance = new();

  private const int MaxExactSize = 63;

  [PublicAPI]
  public int Calc(int n, double misrate)
  {
    if (n <= 0)
      throw AssumptionException.Domain(Subject.X);
    if (double.IsNaN(misrate) || misrate < 0 || misrate > 1)
      throw AssumptionException.Domain(Subject.Misrate);

    double minMisrate = MinAchievableMisrate.OneSample(n);
    if (misrate < minMisrate)
      throw AssumptionException.Domain(Subject.Misrate);

    return n <= threshold
      ? CalcExact(n, misrate)
      : CalcApprox(n, misrate);
  }

  internal int CalcExact(int n, double misrate)
  {
    int raw = CalcExactRaw(n, misrate / 2);
    return checked(raw * 2);
  }

  internal int CalcApprox(int n, double misrate)
  {
    long raw = CalcApproxRaw(n, misrate / 2);
    long margin = raw * 2;
    if (margin > int.MaxValue)
      throw new OverflowException($"Signed-rank margin exceeds supported range for n={n}");
    return (int)margin;
  }

  /// <summary>
  /// Compute one-sided margin using exact Wilcoxon signed-rank distribution.
  /// Uses dynamic programming to compute the CDF.
  /// </summary>
  private static int CalcExactRaw(int n, double p)
  {
    ulong total = 1UL << n;
    long maxW = (long)n * (n + 1) / 2;

    var count = new ulong[maxW + 1];
    count[0] = 1;

    for (int i = 1; i <= n; i++)
    {
      for (long w = Min(maxW, (long)i * (i + 1) / 2); w >= i; w--)
        count[w] += count[w - i];
    }

    ulong cumulative = 0;
    for (int w = 0; w <= maxW; w++)
    {
      cumulative += count[w];
      double cdf = (double)cumulative / total;
      if (cdf >= p)
        return w;
    }

    return (int)maxW;
  }

  /// <summary>
  /// Compute one-sided margin using Edgeworth approximation for large n.
  /// </summary>
  private static long CalcApproxRaw(int n, double p)
  {
    long maxW = (long)n * (n + 1) / 2;
    long a = 0;
    long b = maxW;

    while (a < b - 1)
    {
      long c = (a + b) / 2;
      double cdf = EdgeworthCdf(n, c);
      if (cdf < p)
        a = c;
      else
        b = c;
    }

    return EdgeworthCdf(n, b) < p ? b : a;
  }

  /// <summary>
  /// Edgeworth expansion for Wilcoxon signed-rank distribution CDF.
  /// </summary>
  private static double EdgeworthCdf(int n, long w)
  {
    double mu = (double)n * (n + 1) / 4.0;
    double sigma2 = n * (n + 1.0) * (2 * n + 1) / 24.0;
    double sigma = Sqrt(sigma2);

    // +0.5 continuity correction: computing P(W ≤ w) for a left-tail discrete CDF
    double z = (w - mu + 0.5) / sigma;
    double phi = Exp(-z * z / 2) / Sqrt(2 * PI);
    double Phi = AcmAlgorithm209.Gauss(z);

    double nd = n;
    double kappa4 = -nd * (nd + 1) * (2 * nd + 1) * (3 * nd * nd + 3 * nd - 1) / 240.0;

    double e3 = kappa4 / (24 * sigma2 * sigma2);

    double z2 = z * z;
    double z3 = z2 * z;
    double f3 = -phi * (z3 - 3 * z);

    double edgeworth = Phi + e3 * f3;
    return Min(Max(edgeworth, 0), 1);
  }

}

Fast CenterQuantiles

The Center\operatorname{Center} estimator computes the median of all pairwise averages xi+xj2\frac{x_i + x_j}{2} for iji \leq j. For CenterBounds\operatorname{CenterBounds}, we need not just the median but specific order statistics: the kk-th smallest pairwise average for bounds computation. Given a sample of size nn, there are N=n(n+1)2N = \frac{n(n+1)}{2} such pairwise averages.

A naive approach would materialize all NN pairwise averages, sort them, and extract the desired quantile. With n=10000n = 10000, this creates approximately 50 million values, requiring quadratic memory and O(NlogN)O(N \log N) time. The fast algorithm avoids materializing the pairs entirely.

The algorithm exploits the sorted structure of the implicit pairwise average matrix. After sorting the input to obtain x1x2xnx_1 \leq x_2 \leq \ldots \leq x_n, the pairwise averages form a symmetric matrix where both rows and columns are sorted. Instead of searching through indices, the algorithm searches through values. It maintains a search interval [lo,hi][\text{lo}, \text{hi}], initialized to [x1,xn][x_1, x_n] (the range of all possible pairwise averages). At each iteration, it asks: How many pairwise averages are at most this threshold? Based on the count, it narrows the search interval. For finding the kk-th smallest pairwise average: if countk\text{count} \geq k, the target is at or below the threshold and the search continues in the lower half; if count<k\text{count} < k, the target is above the threshold and the search continues in the upper half.

The key operation is counting pairwise averages at or below a threshold tt. For each ii, we need to count how many jij \geq i satisfy xi+xj2t\frac{x_i + x_j}{2} \leq t, equivalently xj2txix_j \leq 2t - x_i. Because the array is sorted, a single pass through the array suffices. For each ii, a pointer jj tracks the largest index where xj2txix_j \leq 2t - x_i. As ii increases, xix_i increases, so the threshold 2txi2t - x_i decreases, meaning jj can only decrease (or stay the same). This two-pointer technique ensures each element is visited at most twice, making the counting operation O(n)O(n) regardless of the threshold value.

Binary search converges to an approximate value, but bounds require exact pairwise averages. After the search converges, the algorithm identifies candidate exact values near the approximate threshold, then selects the one at the correct rank. The candidate generation examines positions near the boundary: for each row ii, it finds the jj values where the pairwise average crosses the threshold, collecting the actual pairwise average values. Sorting these candidates and verifying ranks ensures the exact quantile is returned.
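The whole procedure can be sketched in Python (a simplified sketch, not the toolkit implementation: it uses the same two-pointer count and value binary search, but resolves the exact value by collecting candidates from a slightly padded window around the converged interval):

```python
import bisect

def count_at_most(sorted_x, t):
    """Count pairs i <= j with (x[i] + x[j]) / 2 <= t using two pointers. O(n)."""
    n = len(sorted_x)
    count, j = 0, n - 1
    for i in range(n):
        threshold = 2 * t - sorted_x[i]
        # j never resets: as x[i] grows, the threshold only shrinks.
        while j >= 0 and sorted_x[j] > threshold:
            j -= 1
        if j >= i:
            count += j - i + 1
    return count

def kth_pairwise_average(sorted_x, k):
    """k-th smallest pairwise average (1-based) without materializing all pairs."""
    eps = 1e-14
    lo, hi = sorted_x[0], sorted_x[-1]
    while hi - lo > eps * max(1.0, abs(lo), abs(hi)):
        mid = (lo + hi) / 2
        if count_at_most(sorted_x, mid) >= k:
            hi = mid
        else:
            lo = mid
    # The answer lies in [lo, hi]; collect exact averages in a padded window
    # and return the smallest one whose rank reaches k.
    pad = eps * max(1.0, abs(lo), abs(hi))
    candidates = set()
    for i, xi in enumerate(sorted_x):
        left = bisect.bisect_right(sorted_x, 2 * (lo - pad) - xi, lo=i)
        right = bisect.bisect_right(sorted_x, 2 * (hi + pad) - xi, lo=i)
        for j in range(left, right):
            candidates.add((xi + sorted_x[j]) / 2)
    for c in sorted(candidates):
        if count_at_most(sorted_x, c) >= k:
            return c
    return (lo + hi) / 2  # fallback; unreachable for well-behaved input
```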

CenterBounds\operatorname{CenterBounds} needs two quantiles: w(kleft)w_{(k_{\text{left}})} and w(kright)w_{(k_{\text{right}})}. The algorithm computes each independently using the same technique. For symmetric bounds around the center, the lower and upper ranks are kleft=SignedRankMargin/2+1k_{\text{left}} = \lfloor \operatorname{SignedRankMargin} / 2 \rfloor + 1 and kright=NSignedRankMargin/2k_{\text{right}} = N - \lfloor \operatorname{SignedRankMargin} / 2 \rfloor.
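As a toy illustration of the rank arithmetic (brute-force, with made-up numbers; the fast algorithm extracts the same order statistics without materializing the list):

```python
n = 6
N = n * (n + 1) // 2            # 21 pairwise averages
margin = 8                      # example SignedRankMargin value (made up)
k_left = margin // 2 + 1        # rank of the lower bound: 5
k_right = N - margin // 2       # rank of the upper bound: 17

xs = [0.5, 1.25, 2.0, 3.5, 7.0, 9.75]     # toy sorted sample
w = sorted((xs[i] + xs[j]) / 2 for i in range(n) for j in range(i, n))
bounds = (w[k_left - 1], w[k_right - 1])  # 1-based ranks into sorted averages
```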

The algorithm achieves $O(n \log n)$ time complexity for sorting, plus $O(n \log R)$ for binary search, where $\log R$ is the number of bisection steps needed to shrink the value range to the convergence tolerance. Memory usage is $O(n)$ for the sorted array plus $O(n)$ for candidate generation. This is dramatically more efficient than the naive $O(n^2 \log n)$ approach. For $n = 10000$, the fast algorithm completes in milliseconds versus minutes for the naive approach.

The algorithm uses relative tolerance for convergence: hilo1014max(1,lo,hi)\text{hi} - \text{lo} \leq 10^{-14} \cdot \max(1, \lvert \text{lo} \rvert, \lvert \text{hi} \rvert). The floor of 1 prevents degenerate tolerance when bounds are near zero. This ensures stable behavior across different scales of input data. For candidate generation near the threshold, small tolerances prevent missing exact values due to floating-point imprecision.

using System.Diagnostics;
using static System.Math;

namespace Pragmastat.Algorithms;

/// <summary>
/// Efficiently computes quantiles from all pairwise averages (x[i] + x[j]) / 2 for i ≤ j.
/// Uses binary search with counting function to avoid materializing all N(N+1)/2 pairs.
/// </summary>
internal static class FastCenterQuantiles
{
  private static bool IsSorted(IReadOnlyList<double> list)
  {
    for (int i = 1; i < list.Count; i++)
      if (list[i] < list[i - 1])
        return false;
    return true;
  }

  /// <summary>
  /// Relative epsilon for floating-point comparisons in binary search convergence.
  /// </summary>
  private const double RelativeEpsilon = 1e-14;

  /// <summary>
  /// Compute specified quantile from pairwise averages.
  /// </summary>
  /// <param name="sorted">Sorted input array.</param>
  /// <param name="k">1-based rank of the desired quantile.</param>
  /// <returns>The k-th smallest pairwise average.</returns>
  public static double Quantile(IReadOnlyList<double> sorted, long k)
  {
    Debug.Assert(
      IsSorted(sorted),
      "FastCenterQuantiles.Quantile: input must be sorted");
    int n = sorted.Count;
    if (n == 0)
      throw new ArgumentException("Input cannot be empty", nameof(sorted));

    long totalPairs = (long)n * (n + 1) / 2;
    if (k < 1 || k > totalPairs)
      throw new ArgumentOutOfRangeException(nameof(k), $"k must be in range [1, {totalPairs}]");

    if (n == 1)
      return sorted[0];

    return FindExactQuantile(sorted, k);
  }

  /// <summary>
  /// Compute both lower and upper bounds from pairwise averages.
  /// </summary>
  /// <param name="sorted">Sorted input array.</param>
  /// <param name="marginLo">Rank of lower bound (1-based).</param>
  /// <param name="marginHi">Rank of upper bound (1-based).</param>
  /// <returns>Lower and upper quantiles.</returns>
  public static (double Lo, double Hi) Bounds(IReadOnlyList<double> sorted, long marginLo, long marginHi)
  {
    Debug.Assert(
      IsSorted(sorted),
      "FastCenterQuantiles.Bounds: input must be sorted");
    int n = sorted.Count;
    if (n == 0)
      throw new ArgumentException("Input cannot be empty", nameof(sorted));

    long totalPairs = (long)n * (n + 1) / 2;

    marginLo = Max(1, Min(marginLo, totalPairs));
    marginHi = Max(1, Min(marginHi, totalPairs));

    double lo = FindExactQuantile(sorted, marginLo);
    double hi = FindExactQuantile(sorted, marginHi);

    return (Min(lo, hi), Max(lo, hi));
  }

  /// <summary>
  /// Count pairwise averages ≤ target value.
  /// </summary>
  private static long CountPairsLessOrEqual(IReadOnlyList<double> sorted, double target)
  {
    int n = sorted.Count;
    long count = 0;
    // j is not reset: as i increases, threshold decreases monotonically
    int j = n - 1;

    for (int i = 0; i < n; i++)
    {
      double threshold = 2 * target - sorted[i];

      while (j >= 0 && sorted[j] > threshold)
        j--;

      if (j >= i)
        count += j - i + 1;
    }

    return count;
  }

  /// <summary>
  /// Find exact k-th pairwise average using selection algorithm.
  /// </summary>
  private static double FindExactQuantile(IReadOnlyList<double> sorted, long k)
  {
    int n = sorted.Count;
    long totalPairs = (long)n * (n + 1) / 2;

    if (n == 1)
      return sorted[0];

    if (k == 1)
      return sorted[0];

    if (k == totalPairs)
      return sorted[n - 1];

    double lo = sorted[0];
    double hi = sorted[n - 1];
    const double eps = RelativeEpsilon;

    var candidates = new List<double>(n);

    while (hi - lo > eps * Max(1.0, Max(Abs(lo), Abs(hi))))
    {
      double mid = (lo + hi) / 2;
      long countLessOrEqual = CountPairsLessOrEqual(sorted, mid);

      if (countLessOrEqual >= k)
        hi = mid;
      else
        lo = mid;
    }

    double target = (lo + hi) / 2;

    for (int i = 0; i < n; i++)
    {
      double threshold = 2 * target - sorted[i];

      int left = i;
      int right = n;

      while (left < right)
      {
        int m = (left + right) / 2;
        if (sorted[m] < threshold - eps)
          left = m + 1;
        else
          right = m;
      }

      if (left < n && left >= i && Abs(sorted[left] - threshold) < eps * Max(1.0, Abs(threshold)))
        candidates.Add((sorted[i] + sorted[left]) / 2);

      if (left > i)
      {
        double avgBefore = (sorted[i] + sorted[left - 1]) / 2;
        if (avgBefore <= target + eps)
          candidates.Add(avgBefore);
      }
    }

    if (candidates.Count == 0)
      return target;

    candidates.Sort();

    foreach (double candidate in candidates)
    {
      long countAtCandidate = CountPairsLessOrEqual(sorted, candidate);
      if (countAtCandidate >= k)
        return candidate;
    }

    return target;
  }
}