CenterBounds

CenterBounds(x,misrate)=[w(kleft),w(kright)]\operatorname{CenterBounds}(\mathbf{x}, \mathrm{misrate}) = [w_{(k_{\text{left}})}, w_{(k_{\text{right}})}]

where w={xi+xj2}\mathbf{w} = \{ \frac{x_i + x_j}{2} \} (pairwise averages, sorted) for iji \leq j, kleft=SignedRankMargin2+1k_{\text{left}} = \lfloor \frac{\operatorname{SignedRankMargin}}{2} \rfloor + 1, kright=NSignedRankMargin2k_{\text{right}} = N - \lfloor \frac{\operatorname{SignedRankMargin}}{2} \rfloor, and N=n(n+1)2N = \frac{n(n+1)}{2}

Robust bounds on Center(x)\operatorname{Center}(\mathbf{x}) with specified coverage.

  • Also known as — Wilcoxon signed-rank confidence interval for Hodges-Lehmann pseudomedian
  • Interpretation — misrate\mathrm{misrate} is the probability that the true center falls outside the bounds
  • Domain — any real numbers, n2n \geq 2, misrate21n\mathrm{misrate} \geq 2^{1-n}
  • Unit — same as measurements
  • Note — assumes weak symmetry and weak continuity; exact for n63n \leq 63, Edgeworth approximation for n>63n > 63

Properties

  • Shift equivariance CenterBounds(x+k,misrate)=CenterBounds(x,misrate)+k\operatorname{CenterBounds}(\mathbf{x} + k, \mathrm{misrate}) = \operatorname{CenterBounds}(\mathbf{x}, \mathrm{misrate}) + k
  • Scale equivariance CenterBounds(kx,misrate)=kCenterBounds(x,misrate)\operatorname{CenterBounds}(k \cdot \mathbf{x}, \mathrm{misrate}) = k \cdot \operatorname{CenterBounds}(\mathbf{x}, \mathrm{misrate})

Example

  • CenterBounds([1..10], 0.01) = [2.5, 8.5] where Center = 5.5
  • Bounds fail to cover true center with probability misrate\approx \mathrm{misrate}
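The demo values can be reproduced with a brute-force sketch. The margin convention below is an assumption inferred from the demo outputs, not the toolkit's implementation: take the largest integer aa with 2Pr(Wa)misrate2 \cdot \Pr(W \leq a) \leq \mathrm{misrate}, where WW is the exact Wilcoxon signed-rank statistic, and trim aa + 1 Walsh averages from each side.

```python
from itertools import combinations_with_replacement

def signed_rank_cdf(n):
    # Exact null CDF of the Wilcoxon signed-rank statistic W via
    # subset-sum dynamic programming over the ranks 1..n.
    N = n * (n + 1) // 2
    counts = [0] * (N + 1)
    counts[0] = 1
    for r in range(1, n + 1):
        for s in range(N, r - 1, -1):
            counts[s] += counts[s - r]
    acc, cdf = 0, []
    for c in counts:
        acc += c
        cdf.append(acc / 2 ** n)
    return cdf

def center_bounds(x, misrate):
    # Brute-force CenterBounds sketch: materialize all Walsh averages
    # and pick order statistics by the (assumed) signed-rank margin.
    n = len(x)
    N = n * (n + 1) // 2
    w = sorted((a + b) / 2 for a, b in combinations_with_replacement(sorted(x), 2))
    cdf = signed_rank_cdf(n)
    a = -1
    while a + 1 <= N and 2 * cdf[a + 1] <= misrate:
        a += 1
    if a < 0:
        raise ValueError("misrate below minimum achievable 2**(1 - n)")
    return w[a + 1], w[N - a - 2]  # 1-based ranks a + 2 and N - a - 1
```

Under this reading, `center_bounds(list(range(1, 11)), 0.01)` reproduces the demo interval [2.5, 8.5], and `center_bounds([1, 2, 3, 4, 5], 0.1)` gives [1.5, 4.5].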

CenterBounds\operatorname{CenterBounds} provides not just the estimated center but also the uncertainty of that estimate. The function returns an interval of plausible center values given the data. Set misrate\mathrm{misrate} to control how often the bounds might fail to contain the true center: use 10310^{-3} for everyday analysis or 10610^{-6} for critical decisions where errors are costly. These bounds require weak symmetry but no specific distributional form. If the bounds exclude some reference value, that suggests the true center differs reliably from that value.

Algorithm

CenterBounds\operatorname{CenterBounds} uses two components: SignedRankMargin to determine which order statistics to select, and a fast quantile algorithm to compute them.

The Center\operatorname{Center} estimator computes the median of all pairwise averages xi+xj2\frac{x_i + x_j}{2} for iji \leq j. For CenterBounds\operatorname{CenterBounds}, we need not just the median but specific order statistics: the kk-th smallest pairwise average for bounds computation. Given a sample of size nn, there are N=n(n+1)2N = \frac{n(n+1)}{2} such pairwise averages.
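As a concrete reference point, the Walsh averages and the resulting Center can be materialized directly for small nn (a naive sketch; the function names are illustrative, not the toolkit's API):

```python
import statistics

def walsh_averages(x):
    # All N = n(n+1)/2 pairwise averages (x_i + x_j) / 2 for i <= j, sorted.
    n = len(x)
    return sorted((x[i] + x[j]) / 2 for i in range(n) for j in range(i, n))

def center(x):
    # Hodges-Lehmann estimator: median of the Walsh averages.
    return statistics.median(walsh_averages(x))
```

For x = (1, 2, 3, 4, 5) this yields 15 Walsh averages with median 3.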

A naive approach would materialize all NN pairwise averages, sort them, and extract the desired quantile. With n=10000n = 10000, this creates approximately 50 million values, requiring quadratic memory and O(NlogN)O(N \log N) time. The fast algorithm avoids materializing the pairs entirely.

The algorithm exploits the sorted structure of the implicit pairwise average matrix. After sorting the input to obtain x1x2xnx_1 \leq x_2 \leq \ldots \leq x_n, the pairwise averages form a symmetric matrix where both rows and columns are sorted. Instead of searching through indices, the algorithm searches through values. It maintains a search interval [lo,hi][\text{lo}, \text{hi}], initialized to [x1,xn][x_1, x_n] (the range of all possible pairwise averages). At each iteration, it asks: How many pairwise averages are at most this threshold? Based on the count, it narrows the search interval. For finding the kk-th smallest pairwise average: if countk\text{count} \geq k, the target is at or below the threshold and the search continues in the lower half; if count<k\text{count} < k, the target is above the threshold and the search continues in the upper half.
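The value-space search can be sketched as follows. The counting step here is brute force for clarity (the fast algorithm replaces it with the O(n)O(n) two-pointer pass), and the final snap-to-exact-value step uses a full scan rather than localized candidate generation:

```python
def kth_pairwise_average(x, k):
    # Find the k-th smallest pairwise average (1-based) by binary
    # search on *values*, then snap to an exact average near the
    # converged threshold.
    xs = sorted(x)
    n = len(xs)

    def count_le(t):
        # Brute-force O(n^2) count of pairwise averages <= t.
        return sum(1 for i in range(n) for j in range(i, n)
                   if (xs[i] + xs[j]) / 2 <= t)

    lo, hi = xs[0], xs[-1]
    for _ in range(100):  # plenty of halvings for double precision
        mid = (lo + hi) / 2
        if count_le(mid) >= k:
            hi = mid
        else:
            lo = mid
    target = (lo + hi) / 2
    # Candidate step: return the exact pairwise average closest to target.
    return min(((xs[i] + xs[j]) / 2 for i in range(n) for j in range(i, n)),
               key=lambda w: abs(w - target))
```

For x = (1, 2, 3, 4, 5), rank 5 selects 2.5 among the 15 sorted Walsh averages, matching the sorted enumeration.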

The key operation is counting pairwise averages at or below a threshold tt. For each ii, we need to count how many jij \geq i satisfy xi+xj2t\frac{x_i + x_j}{2} \leq t, equivalently xj2txix_j \leq 2t - x_i. Because the array is sorted, a single pass through the array suffices. For each ii, a pointer jj tracks the largest index where xj2txix_j \leq 2t - x_i. As ii increases, xix_i increases, so the threshold 2txi2t - x_i decreases, meaning jj can only decrease (or stay the same). This two-pointer technique ensures each element is visited at most twice, making the counting operation O(n)O(n) regardless of the threshold value.
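The two-pointer counting pass, sketched in illustrative Python (mirroring the logic described above):

```python
def count_pairs_le(sorted_x, t):
    # Count pairwise averages (x[i] + x[j]) / 2 <= t for i <= j.
    # j never resets: as i grows, the threshold 2t - x[i] shrinks,
    # so each element is visited at most twice -> O(n) total.
    n = len(sorted_x)
    count = 0
    j = n - 1
    for i in range(n):
        threshold = 2 * t - sorted_x[i]
        while j >= 0 and sorted_x[j] > threshold:
            j -= 1
        if j >= i:
            count += j - i + 1
    return count
```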

Binary search converges to an approximate value, but bounds require exact pairwise averages. After the search converges, the algorithm identifies candidate exact values near the approximate threshold, then selects the one at the correct rank. The candidate generation examines positions near the boundary: for each row ii, it finds the jj values where the pairwise average crosses the threshold, collecting the actual pairwise average values. Sorting these candidates and verifying ranks ensures the exact quantile is returned.

CenterBounds\operatorname{CenterBounds} needs two quantiles: w(kleft)w_{(k_{\text{left}})} and w(kright)w_{(k_{\text{right}})}. The algorithm computes each independently using the same technique. For symmetric bounds around the center, the lower and upper ranks are kleft=SignedRankMargin2+1k_{\text{left}} = \lfloor \frac{\operatorname{SignedRankMargin}}{2} \rfloor + 1 and kright=NSignedRankMargin2k_{\text{right}} = N - \lfloor \frac{\operatorname{SignedRankMargin}}{2} \rfloor.

The algorithm achieves O(nlogn)O(n \log n) time complexity for sorting, plus O(nlogR)O(n \log R) for binary search where RR is the value range precision. Memory usage is O(n)O(n) for the sorted array plus O(n)O(n) for candidate generation. This is dramatically more efficient than the naive O(n2logn)O(n^2 \log n) approach of sorting all NN pairwise averages. For n=10000n = 10000, the fast algorithm completes in milliseconds versus minutes for the naive approach.

The algorithm uses relative tolerance for convergence: hilo1014max(1,lo,hi)\text{hi} - \text{lo} \leq 10^{-14} \cdot \max(1, \lvert \text{lo} \rvert, \lvert \text{hi} \rvert). The floor of 1 prevents degenerate tolerance when bounds are near zero. This ensures stable behavior across different scales of input data. For candidate generation near the threshold, small tolerances prevent missing exact values due to floating-point imprecision.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using static System.Math;

namespace Pragmastat.Algorithms;

/// <summary>
/// Efficiently computes quantiles from all pairwise averages (x[i] + x[j]) / 2 for i ≤ j.
/// Uses binary search with counting function to avoid materializing all N(N+1)/2 pairs.
/// </summary>
internal static class FastCenterQuantiles
{
  private static bool IsSorted(IReadOnlyList<double> list)
  {
    for (int i = 1; i < list.Count; i++)
      if (list[i] < list[i - 1])
        return false;
    return true;
  }
  /// <summary>
  /// Relative epsilon for floating-point comparisons in binary search convergence.
  /// </summary>
  private const double RelativeEpsilon = 1e-14;
  /// <summary>
  /// Compute specified quantile from pairwise averages.
  /// </summary>
  /// <param name="sorted">Sorted input array.</param>
  /// <param name="k">1-based rank of the desired quantile.</param>
  /// <returns>The k-th smallest pairwise average.</returns>
  public static double Quantile(IReadOnlyList<double> sorted, long k)
  {
    Debug.Assert(
      IsSorted(sorted),
      "FastCenterQuantiles.Quantile: input must be sorted");
    int n = sorted.Count;
    if (n == 0)
      throw new ArgumentException("Input cannot be empty", nameof(sorted));

    long totalPairs = (long)n * (n + 1) / 2;
    if (k < 1 || k > totalPairs)
      throw new ArgumentOutOfRangeException(nameof(k), $"k must be in range [1, {totalPairs}]");

    if (n == 1)
      return sorted[0];

    return FindExactQuantile(sorted, k);
  }

  /// <summary>
  /// Compute both lower and upper bounds from pairwise averages.
  /// </summary>
  /// <param name="sorted">Sorted input array.</param>
  /// <param name="marginLo">Rank of lower bound (1-based).</param>
  /// <param name="marginHi">Rank of upper bound (1-based).</param>
  /// <returns>Lower and upper quantiles.</returns>
  public static (double Lo, double Hi) Bounds(IReadOnlyList<double> sorted, long marginLo, long marginHi)
  {
    Debug.Assert(
      IsSorted(sorted),
      "FastCenterQuantiles.Bounds: input must be sorted");
    int n = sorted.Count;
    if (n == 0)
      throw new ArgumentException("Input cannot be empty", nameof(sorted));

    long totalPairs = (long)n * (n + 1) / 2;

    marginLo = Max(1, Min(marginLo, totalPairs));
    marginHi = Max(1, Min(marginHi, totalPairs));

    double lo = FindExactQuantile(sorted, marginLo);
    double hi = FindExactQuantile(sorted, marginHi);

    return (Min(lo, hi), Max(lo, hi));
  }

  /// <summary>
  /// Count pairwise averages ≤ target value.
  /// </summary>
  private static long CountPairsLessOrEqual(IReadOnlyList<double> sorted, double target)
  {
    int n = sorted.Count;
    long count = 0;
    // j is not reset: as i increases, threshold decreases monotonically
    int j = n - 1;

    for (int i = 0; i < n; i++)
    {
      double threshold = 2 * target - sorted[i];

      while (j >= 0 && sorted[j] > threshold)
        j--;

      if (j >= i)
        count += j - i + 1;
    }

    return count;
  }

  /// <summary>
  /// Find exact k-th pairwise average using selection algorithm.
  /// </summary>
  private static double FindExactQuantile(IReadOnlyList<double> sorted, long k)
  {
    int n = sorted.Count;
    long totalPairs = (long)n * (n + 1) / 2;

    if (n == 1)
      return sorted[0];

    if (k == 1)
      return sorted[0];

    if (k == totalPairs)
      return sorted[n - 1];

    double lo = sorted[0];
    double hi = sorted[n - 1];
    const double eps = RelativeEpsilon;

    var candidates = new List<double>(n);

    while (hi - lo > eps * Max(1.0, Max(Abs(lo), Abs(hi))))
    {
      double mid = (lo + hi) / 2;
      long countLessOrEqual = CountPairsLessOrEqual(sorted, mid);

      if (countLessOrEqual >= k)
        hi = mid;
      else
        lo = mid;
    }

    double target = (lo + hi) / 2;

    for (int i = 0; i < n; i++)
    {
      double threshold = 2 * target - sorted[i];

      int left = i;
      int right = n;

      while (left < right)
      {
        int m = (left + right) / 2;
        if (sorted[m] < threshold - eps)
          left = m + 1;
        else
          right = m;
      }

      if (left < n && left >= i && Abs(sorted[left] - threshold) < eps * Max(1.0, Abs(threshold)))
        candidates.Add((sorted[i] + sorted[left]) / 2);

      if (left > i)
      {
        double avgBefore = (sorted[i] + sorted[left - 1]) / 2;
        if (avgBefore <= target + eps)
          candidates.Add(avgBefore);
      }
    }

    if (candidates.Count == 0)
      return target;

    candidates.Sort();

    foreach (double candidate in candidates)
    {
      long countAtCandidate = CountPairsLessOrEqual(sorted, candidate);
      if (countAtCandidate >= k)
        return candidate;
    }

    return target;
  }
}

Notes

On Bootstrap for Center Bounds

A natural question arises: can bootstrap resampling improve CenterBounds\operatorname{CenterBounds} coverage for asymmetric distributions where the weak symmetry assumption fails?

The idea is appealing. The signed-rank approach computes bounds from order statistics of Walsh averages using a margin derived from the Wilcoxon distribution, which assumes symmetric deviations from the center. Bootstrap makes no symmetry assumption: resample the data with replacement, compute Center\operatorname{Center} on each resample, and take quantiles of the bootstrap distribution as bounds. This should yield valid bounds regardless of distributional shape.
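The idea sketched above, in percentile-bootstrap form (illustrative helper names, not part of the toolkit):

```python
import random
import statistics

def center(x):
    # Hodges-Lehmann estimator: median of pairwise averages, i <= j.
    n = len(x)
    return statistics.median((x[i] + x[j]) / 2
                             for i in range(n) for j in range(i, n))

def bootstrap_center_bounds(x, misrate, b=10000, seed=0):
    # Percentile bootstrap: resample with replacement, recompute
    # Center on each resample, take symmetric tail quantiles of the
    # b estimates. No symmetry assumption, but only nominal coverage.
    rng = random.Random(seed)
    centers = sorted(center([rng.choice(x) for _ in x]) for _ in range(b))
    lo = centers[int(b * misrate / 2)]
    hi = centers[int(b * (1 - misrate / 2)) - 1]
    return lo, hi
```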

This manual deliberately does not provide a bootstrap-based alternative to CenterBounds\operatorname{CenterBounds}. The reasons are both computational and statistical.

Computational cost

CenterBounds\operatorname{CenterBounds} computes bounds in O(nlogn)O(n \log n) time: a single pass through the Walsh averages guided by the signed-rank margin. No resampling, no iteration.

A bootstrap version requires BB resamples (typically B=10000B = 10000 for stable tail quantiles), each computing Center\operatorname{Center} on the resample. Center\operatorname{Center} itself costs O(nlogn)O(n \log n) via the fast selection algorithm on the implicit pairwise matrix. The total cost becomes O(Bnlogn)O(B \cdot n \log n) per call — roughly 10000×10000 \times slower than the signed-rank approach.

For n=5n = 5, each call to Center\operatorname{Center} operates on 1515 Walsh averages. The bootstrap recomputes this 1000010000 times. The computation is not deep — it is merely wasteful. For n=100n = 100, there are 50505050 Walsh averages per resample, and 1000010000 resamples produce 5×1075 \times 10^7 selection operations per bounds call. In a simulation study that evaluates bounds across many samples, this cost becomes prohibitive.

Statistical quality

Bootstrap bounds are nominal, not exact. The percentile method has well-documented undercoverage for small samples: requesting 95% confidence (misrate=0.05\mathrm{misrate} = 0.05) typically yields 85–92% actual coverage for n<30n < 30. This is inherent to the bootstrap percentile method — the quantile estimates from BB resamples are biased toward the sample and underrepresent tail behavior. Refined methods (BCa, bootstrap-tt) partially address this but add complexity and still provide only asymptotic guarantees.

Meanwhile, CenterBounds\operatorname{CenterBounds} provides exact distribution-free coverage under symmetry. For n=5n = 5 requesting misrate=0.1\mathrm{misrate} = 0.1, the signed-rank method delivers exactly 10%10\% misrate. A bootstrap method, requesting the same 10%10\%, typically delivers 121215%15\% misrate. The exact method is simultaneously faster and more accurate.

Behavior under asymmetry

Under asymmetric distributions, the signed-rank margin is no longer calibrated: the Wilcoxon distribution assumes symmetric deviations, and asymmetry shifts the actual distribution of comparison counts.

However, the coverage degradation is gradual, not catastrophic. Mild asymmetry produces mild coverage drift. The bounds remain meaningful — they still bracket the pseudomedian using order statistics of the Walsh averages — but the actual misrate differs from the requested value.

This is the same situation as bootstrap, which also provides only approximate coverage. The practical difference is that the signed-rank approach achieves this approximate coverage in O(nlogn)O(n \log n) time, while bootstrap achieves comparable approximate coverage in O(Bnlogn)O(B \cdot n \log n) time.

Why not both?

One might argue for providing both methods: the signed-rank approach as default, and a bootstrap variant for cases where symmetry is severely violated.

This creates a misleading choice. If the bootstrap method offered substantially better coverage under asymmetry, the complexity would be justified. But for the distributions practitioners encounter (Multiplic\underline{\operatorname{Multiplic}}, Exp\underline{\operatorname{Exp}}, and other moderate asymmetries), the coverage difference between the two approaches is small relative to the 10000×10000 \times cost difference. For extreme asymmetries where the signed-rank coverage genuinely breaks down, the sign test provides an alternative foundation for median bounds (see On Misrate Efficiency of MedianBounds), but its O(n12)O(n^{-\frac{1}{2}}) efficiency convergence makes it impractical for moderate sample sizes.

The toolkit therefore provides CenterBounds\operatorname{CenterBounds} as the single bounds estimator. The weak symmetry assumption means the method performs well under approximate symmetry and degrades gracefully under moderate asymmetry. There is no useful middle ground that justifies a 10000×10000 \times computational penalty for marginally different approximate coverage.

On Misrate Efficiency of MedianBounds

This note analyzes MedianBounds\operatorname{MedianBounds}, a bounds estimator for the population median based on the sign test, and explains why pragmastat omits it in favor of CenterBounds\operatorname{CenterBounds}.

Definition

MedianBounds(x,misrate)=[x(k),x(nk+1)]\operatorname{MedianBounds}(\mathbf{x}, \mathrm{misrate}) = [x_{(k)}, x_{(n-k+1)}]

where kk is the largest integer satisfying 2Pr(Bk1)misrate2 \cdot \Pr(B \leq k-1) \leq \mathrm{misrate} and BBinomial(n,0.5)B \sim \text{Binomial}(n, 0.5). The interval brackets the population median using order statistics, with misrate\mathrm{misrate} controlling the probability that the true median falls outside the bounds.
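A direct sketch of this definition (illustrative, since pragmastat deliberately omits the estimator):

```python
from math import comb

def median_bounds(x, misrate):
    # Sign-test bounds: k is the largest integer with
    # 2 * P(B <= k - 1) <= misrate, where B ~ Binomial(n, 0.5).
    xs = sorted(x)
    n = len(xs)
    cdf = 0.0
    k = 0
    for i in range(n):  # i plays the role of k - 1
        cdf += comb(n, i) / 2 ** n
        if 2 * cdf <= misrate:
            k = i + 1
        else:
            break
    if k == 0:
        raise ValueError("misrate below the minimum achievable 2**(1 - n)")
    return xs[k - 1], xs[n - k]
```

For x = (1, ..., 10) and misrate 0.05, the binomial CDF gives k = 2, so the bounds are [x_(2), x_(9)] = [2, 9].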

MedianBounds\operatorname{MedianBounds} requires no symmetry assumption — only weak continuity — making it applicable to arbitrarily skewed distributions. This is its principal advantage over CenterBounds\operatorname{CenterBounds}, which assumes weak symmetry.

Sign test foundation

The method is equivalent to inverting the sign test. Under weak continuity, each observation independently falls above or below the true median with probability 12\frac{1}{2}. The number of observations below the median follows Binomial(n,0.5)\text{Binomial}(n, 0.5), and the order statistics x(k)x_{(k)} and x(nk+1)x_{(n-k+1)} form a confidence interval whose coverage is determined exactly by the binomial CDF.

Because the binomial CDF is a step function, the achievable misrate values form a discrete set. The algorithm rounds down to the nearest achievable level, inevitably wasting part of the requested misrate budget. This note derives the resulting efficiency loss and its convergence rate.

Achievable misrate levels

The achievable misrate values for sample size nn are:

mk=2Pr(Bk),k=0,1,2,m_k = 2 \cdot \Pr(B \leq k), \quad k = 0, 1, 2, \ldots

The algorithm selects the largest kk satisfying mkmisratem_k \leq \mathrm{misrate}. The efficiency η=mkmisrate\eta = \frac{m_k}{\mathrm{misrate}} measures how much of the requested budget is used. Efficiency η=1\eta = 1 means the bounds are as tight as the misrate allows; η=0.5\eta = 0.5 means half the budget is wasted, producing unnecessarily wide bounds.
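The achievable levels and the resulting efficiency can be enumerated directly (a sketch; `sign_test_efficiency` is an illustrative name and computes the η\eta defined above):

```python
from math import comb

def sign_test_efficiency(n, misrate):
    # eta = m_k / misrate for the largest achievable level
    # m_k = 2 * P(B <= k) that does not exceed misrate.
    cdf = 0.0
    best = 0.0
    for k in range(n + 1):
        cdf += comb(n, k) / 2 ** n
        m_k = 2 * cdf
        if m_k <= misrate:
            best = m_k
        else:
            break
    return best / misrate
```

For n = 10 and misrate 0.05, the best achievable level is 22/1024 ≈ 0.0215, so more than half of the misrate budget is wasted.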

Spacing between consecutive levels

The gap between consecutive achievable misrates is:

Δmk=mk+1mk=2Pr(B=k+1)=2(nk+1)2n\Delta m_k = m_{k+1} - m_k = 2 \cdot \Pr(B = k+1) = 2 \cdot \frac{\binom{n}{k+1}}{2^n}

For a target misrate α\alpha, the relevant index kk satisfies mkαm_k \approx \alpha. By the normal approximation to the binomial, BN(n2,n4)B \approx \mathcal{N}(\frac{n}{2}, \frac{n}{4}), the binomial CDF near this index changes by approximately:

Δm4ϕ(zα)n\Delta m \approx \frac{4 \phi(z_\alpha)}{\sqrt{n}}

where zα=Φ1(α2)z_\alpha = \Phi^{-1}(\frac{\alpha}{2}) is the corresponding standard normal quantile and ϕ\phi is the standard normal density. This spacing governs how coarsely the achievable misrates are distributed near the target.
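A quick numerical check of this spacing approximation, comparing the exact gap Δmk=2Pr(B=k+1)\Delta m_k = 2 \cdot \Pr(B = k+1) against the normal density with zz standardized at the same index (illustrative helpers):

```python
from math import comb, exp, pi, sqrt

def exact_gap(n, k):
    # Delta m_k = 2 * P(B = k + 1), with B ~ Binomial(n, 0.5).
    return 2 * comb(n, k + 1) / 2 ** n

def approx_gap(n, k):
    # Normal approximation: 2 * phi(z) / sigma with sigma = sqrt(n)/2,
    # i.e. 4 * phi(z) / sqrt(n), z standardizing the index k + 1.
    z = (k + 1 - n / 2) / (sqrt(n) / 2)
    return 4 * exp(-z * z / 2) / sqrt(2 * pi) / sqrt(n)
```

For n = 100 near the 10% two-sided quantile (k = 41), the exact and approximate gaps agree to within a few percent.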

Expected efficiency

The requested misrate α\alpha falls at a uniformly random position within a gap of width Δm\Delta m. On average, the algorithm wastes Δm2\frac{\Delta m}{2}, giving expected efficiency:

E[η]1Δm2α=12ϕ(zα)αn\mathbb{E}[\eta] \approx 1 - \frac{\Delta m}{2 \alpha} = 1 - \frac{2 \phi(z_\alpha)}{\alpha \sqrt{n}}

Define the misrate-dependent constant:

C(α)=2ϕ(Φ1(α2))αC(\alpha) = \frac{2 \phi(\Phi^{-1}(\frac{\alpha}{2}))}{\alpha}

Then the expected efficiency has the form:

E[η]1C(α)n\mathbb{E}[\eta] \approx 1 - \frac{C(\alpha)}{\sqrt{n}}

The convergence rate is O(n12)O(n^{-\frac{1}{2}}): efficiency improves as the square root of sample size.

Values of C(α)C(\alpha)

The constant C(α)C(\alpha) increases for smaller misrates, meaning tighter error tolerances require proportionally larger samples for efficient bounds:

  α       0.1     0.05    0.01    0.005   0.001
  z_α     -1.64   -1.96   -2.58   -2.81   -3.29
  φ(z_α)  0.103   0.058   0.015   0.008   0.002
  C(α)    2.06    2.33    2.94    3.17    3.45

For α=0.1\alpha = 0.1 and n=50n = 50: E[η]12.06500.71\mathbb{E}[\eta] \approx 1 - \frac{2.06}{\sqrt{50}} \approx 0.71. Achieving 90%90\% efficiency on average requires n>(C(α)0.1)2n > \left(\frac{C(\alpha)}{0.1}\right)^2. For α=0.1\alpha = 0.1 this gives n>424n > 424; for α=0.001\alpha = 0.001 this gives n>1190n > 1190.
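The constant can be recomputed from its definition (a sketch; the bisection inverse CDF is an illustrative stand-in for a library quantile routine, and small differences from the rounded table entries are expected):

```python
from math import erf, exp, pi, sqrt

def phi(z):
    # Standard normal density.
    return exp(-z * z / 2) / sqrt(2 * pi)

def inv_norm_cdf(p):
    # Bisection inverse of the standard normal CDF (illustrative;
    # production code would call a library quantile function).
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if (1 + erf(mid / sqrt(2))) / 2 < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def C(alpha):
    # C(alpha) = 2 * phi(Phi^{-1}(alpha / 2)) / alpha
    return 2 * phi(inv_norm_cdf(alpha / 2)) / alpha
```

C(0.1) evaluates to about 2.06, matching the first table column, and C grows monotonically as the misrate shrinks.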

Comparison with CenterBounds

CenterBounds\operatorname{CenterBounds} uses the signed-rank statistic WW with range [0,n(n+1)2][0, \frac{n(n+1)}{2}]. Under the null hypothesis, WW has variance σ2=n(n+1)(2n+1)24n312\sigma^2 = \frac{n(n+1)(2n+1)}{24} \approx \frac{n^3}{12}. The CDF spacing at the relevant quantile is:

ΔmW212ϕ(zα)n32\Delta m_W \approx \frac{2 \sqrt{12} \cdot \phi(z_\alpha)}{n^{\frac{3}{2}}}

The expected efficiency for CenterBounds\operatorname{CenterBounds} is therefore:

E[ηW]112ϕ(zα)αn32\mathbb{E}[\eta_W] \approx 1 - \frac{\sqrt{12} \cdot \phi(z_\alpha)}{\alpha \cdot n^{\frac{3}{2}}}

This converges at rate O(n32)O(n^{-\frac{3}{2}}) — a full polynomial order faster than the O(n12)O(n^{-\frac{1}{2}}) rate of MedianBounds\operatorname{MedianBounds}. The difference arises because the signed-rank distribution has n(n+1)2\frac{n(n+1)}{2} discrete levels compared to the binomial's nn levels, providing fundamentally finer resolution.
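Plugging in numbers makes the gap concrete, using the α=0.1\alpha = 0.1 values ϕ(zα)0.103\phi(z_\alpha) \approx 0.103 and C(α)2.06C(\alpha) \approx 2.06 (illustrative helpers evaluating the two expected-efficiency formulas above):

```python
def sign_eff(n, c_alpha=2.06):
    # Expected efficiency of the sign-test bounds: 1 - C(alpha) / sqrt(n).
    return 1 - c_alpha / n ** 0.5

def signed_rank_eff(n, phi_z=0.103, alpha=0.1):
    # Expected efficiency of the signed-rank bounds:
    # 1 - sqrt(12) * phi(z_alpha) / (alpha * n^{3/2}).
    return 1 - 12 ** 0.5 * phi_z / (alpha * n ** 1.5)
```

At n = 50, the sign test wastes roughly 29% of the misrate budget while the signed-rank statistic wastes about 1%.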

Why pragmastat omits MedianBounds

The efficiency loss of MedianBounds\operatorname{MedianBounds} is not an implementation artifact. It reflects a structural limitation of the sign test: using only the signs of (Xiθ)(X_i - \theta) discards magnitude information, leaving only nn binary observations to determine coverage. The signed-rank test used by CenterBounds\operatorname{CenterBounds} exploits both signs and ranks, producing n(n+1)2\frac{n(n+1)}{2} comparison outcomes and correspondingly finer misrate resolution.

For applications requiring tight misrate control on the median, large samples (n>500n > 500) are needed to ensure efficient use of the misrate budget. For smaller samples, the bounds remain valid but conservative: the actual misrate is guaranteed not to exceed the requested value, even though it may be substantially below it.

CenterBounds\operatorname{CenterBounds} with its O(n32)O(n^{-\frac{3}{2}}) convergence achieves near-continuous misrate control even for moderate nn, at the cost of requiring weak symmetry. For the distributions practitioners typically encounter, this tradeoff favors CenterBounds\operatorname{CenterBounds} as the single bounds estimator in the toolkit. When symmetry is severely violated, the coverage drift of CenterBounds\operatorname{CenterBounds} is gradual — mild asymmetry produces mild drift — making it a robust default without the efficiency penalty inherent to the sign test approach.

Tests

CenterBounds(x,misrate)=[w(kleft),w(kright)]\operatorname{CenterBounds}(\mathbf{x}, \mathrm{misrate}) = [w_{(k_{\text{left}})}, w_{(k_{\text{right}})}]

where w={xi+xj2}\mathbf{w} = \{ \frac{x_i + x_j}{2} \} (pairwise averages, sorted) for iji \leq j, kleft=SignedRankMargin2+1k_{\text{left}} = \lfloor \frac{\operatorname{SignedRankMargin}}{2} \rfloor + 1, kright=NSignedRankMargin2k_{\text{right}} = N - \lfloor \frac{\operatorname{SignedRankMargin}}{2} \rfloor, and N=n(n+1)2N = \frac{n(n+1)}{2}.

The CenterBounds\operatorname{CenterBounds} test suite contains 43 test cases (3 demo + 4 natural + 5 property + 7 edge + 4 symmetric + 4 asymmetric + 2 additive + 2 uniform + 4 misrate + 6 unsorted + 2 error cases). Since CenterBounds\operatorname{CenterBounds} returns bounds rather than a point estimate, tests validate that bounds contain Center(x)\operatorname{Center}(\mathbf{x}) and satisfy equivariance properties. Each test case output is a JSON object with lower and upper fields representing the interval bounds.

Demo examples — from manual introduction, validating basic bounds:

  • demo-1: x=(1,2,3,4,5)\mathbf{x} = (1, 2, 3, 4, 5), misrate=0.1\mathrm{misrate} = 0.1, expected output: [1.5,4.5][1.5, 4.5]
  • demo-2: x=(1,,10)\mathbf{x} = (1, \ldots, 10), misrate=0.01\mathrm{misrate} = 0.01, expected output: [2.5,8.5][2.5, 8.5]
  • demo-3: x=(0,2,4,6,8)\mathbf{x} = (0, 2, 4, 6, 8), misrate=0.1\mathrm{misrate} = 0.1

These cases illustrate how tighter misrates produce wider bounds.

Natural sequences (n{5,7,10,20}n \in \{5, 7, 10, 20\}, misrate=0.01\mathrm{misrate} = 0.01) — 4 tests:

  • natural-5: x=(1,2,3,4,5)\mathbf{x} = (1, 2, 3, 4, 5), bounds containing Center=3\operatorname{Center} = 3
  • natural-7: x=(1,,7)\mathbf{x} = (1, \ldots, 7), bounds containing Center=4\operatorname{Center} = 4
  • natural-10: x=(1,,10)\mathbf{x} = (1, \ldots, 10), expected output: [2.5,8.5][2.5, 8.5]
  • natural-20: x=(1,,20)\mathbf{x} = (1, \ldots, 20), bounds containing Center=10.5\operatorname{Center} = 10.5

Property validation (n=5n = 5, misrate=0.05\mathrm{misrate} = 0.05) — 5 tests:

  • property-identity: x=(1,2,3,4,5)\mathbf{x} = (1, 2, 3, 4, 5), bounds must contain Center=3\operatorname{Center} = 3
  • property-centered: x=(2,1,0,1,2)\mathbf{x} = (-2, -1, 0, 1, 2), bounds must contain Center=0\operatorname{Center} = 0
  • property-location-shift: x=(11,12,13,14,15)\mathbf{x} = (11, 12, 13, 14, 15) (= demo-1 + 10), bounds must be demo-1 bounds + 10
  • property-scale-2x: x=(2,4,6,8,10)\mathbf{x} = (2, 4, 6, 8, 10) (= 2 × demo-1), bounds must be 2× demo-1 bounds
  • property-mixed-signs: x=(2,1,0,1,2)\mathbf{x} = (-2, -1, 0, 1, 2), validates bounds crossing zero

Edge cases — boundary conditions and extreme scenarios (7 tests):

  • edge-two-elements: x=(1,2)\mathbf{x} = (1, 2), misrate=0.5\mathrm{misrate} = 0.5 (minimum meaningful sample)
  • edge-three-elements: x=(1,2,3)\mathbf{x} = (1, 2, 3), misrate=0.25\mathrm{misrate} = 0.25 (small sample)
  • edge-loose-misrate: x=(1,2,3,4,5)\mathbf{x} = (1, 2, 3, 4, 5), misrate=0.5\mathrm{misrate} = 0.5 (permissive bounds)
  • edge-strict-misrate: x=(1,,10)\mathbf{x} = (1, \ldots, 10), misrate=0.002\mathrm{misrate} = 0.002 (near-minimum misrate for n=10n=10)
  • edge-duplicates-10: x=(5,5,5,5,5,5,5,5,5,5)\mathbf{x} = (5, 5, 5, 5, 5, 5, 5, 5, 5, 5), misrate=0.01\mathrm{misrate} = 0.01 (all identical, bounds =[5,5]= [5, 5])
  • edge-negative: x=(5,4,3,2,1)\mathbf{x} = (-5, -4, -3, -2, -1), misrate=0.05\mathrm{misrate} = 0.05 (negative values)
  • edge-wide-range: x=(1,10,100,1000,10000)\mathbf{x} = (1, 10, 100, 1000, 10000), misrate=0.1\mathrm{misrate} = 0.1 (extreme value range)

Symmetric distributions (misrate=0.05\mathrm{misrate} = 0.05) — 4 tests with symmetric data:

  • symmetric-5: x=(2,1,0,1,2)\mathbf{x} = (-2, -1, 0, 1, 2), bounds centered around 00
  • symmetric-7: x=(3,2,1,0,1,2,3)\mathbf{x} = (-3, -2, -1, 0, 1, 2, 3), bounds centered around 00
  • symmetric-10: n=10n = 10 symmetric around 00
  • symmetric-15: n=15n = 15 symmetric around 00

These tests validate that symmetric data produces symmetric bounds around the center.

Asymmetric distributions (n=5n = 5, misrate=0.1\mathrm{misrate} = 0.1) — 4 tests validating bounds with asymmetric data:

  • asymmetric-left-skew: x=(1,7,8,9,10)\mathbf{x} = (1, 7, 8, 9, 10), expected output: [4,9.5][4, 9.5]
  • asymmetric-right-skew: x=(1,2,3,4,10)\mathbf{x} = (1, 2, 3, 4, 10), expected output: [1.5,7][1.5, 7]
  • asymmetric-bimodal: x=(1,1,5,9,9)\mathbf{x} = (1, 1, 5, 9, 9), expected output: [1,9][1, 9]
  • asymmetric-outlier: x=(1,2,3,4,100)\mathbf{x} = (1, 2, 3, 4, 100), expected output: [1.5,52][1.5, 52]

These tests validate that CenterBounds\operatorname{CenterBounds} handles asymmetric data correctly, complementing the symmetric test cases.

Additive distribution (misrate=0.01\mathrm{misrate} = 0.01) — 2 tests with Additive(10,1)\underline{\operatorname{Additive}}(10, 1):

  • additive-10: n=10n = 10, seed 0
  • additive-20: n=20n = 20, seed 0

Uniform distribution (misrate=0.01\mathrm{misrate} = 0.01) — 2 tests with Uniform(0,1)\underline{\operatorname{Uniform}}(0, 1):

  • uniform-10: n=10n = 10, seed 1
  • uniform-20: n=20n = 20, seed 1

Misrate variation (x=(1,,10)\mathbf{x} = (1, \ldots, 10)) — 4 tests with varying misrates:

  • misrate-1e-1: misrate=0.1\mathrm{misrate} = 0.1
  • misrate-5e-2: misrate=0.05\mathrm{misrate} = 0.05
  • misrate-1e-2: misrate=0.01\mathrm{misrate} = 0.01
  • misrate-5e-3: misrate=0.005\mathrm{misrate} = 0.005

These tests validate monotonicity: smaller misrates produce wider bounds.

Unsorted tests — verify sorting independence (6 tests):

  • unsorted-reverse-5: x=(5,4,3,2,1)\mathbf{x} = (5, 4, 3, 2, 1), must equal natural-5 output
  • unsorted-reverse-7: x=(7,6,5,4,3,2,1)\mathbf{x} = (7, 6, 5, 4, 3, 2, 1), must equal natural-7 output
  • unsorted-shuffle-5: x\mathbf{x} shuffled, must equal sorted counterpart
  • unsorted-shuffle-7: x\mathbf{x} shuffled, must equal sorted counterpart
  • unsorted-negative-5: negative values unsorted
  • unsorted-mixed-signs-5: mixed signs unsorted

These tests validate that CenterBounds\operatorname{CenterBounds} produces identical results regardless of input order.

Error cases — 2 tests validating input validation:

  • error-single-element: x=(1)\mathbf{x} = (1), misrate=0.5\mathrm{misrate} = 0.5 (minimum sample size violation)
  • error-invalid-misrate: x=(1,2,3,4,5)\mathbf{x} = (1, 2, 3, 4, 5), misrate=0.001\mathrm{misrate} = 0.001 (misrate below minimum achievable)