Ratio

Ratio(x,y)=exp(Shift(logx,logy))\operatorname{Ratio}(\mathbf{x}, \mathbf{y}) = \exp(\operatorname{Shift}(\log \mathbf{x}, \log \mathbf{y}))

Robust measure of scale ratio between two samples — the multiplicative dual of Shift\operatorname{Shift}.

  • Asymptotic — geometric median of pairwise ratios xiyj\frac{x_i}{y_j} (via log-space aggregation)
  • Domainxi>0x_i > 0, yj>0y_j > 0
  • Assumptionspositivity(x), positivity(y)
  • Unit — dimensionless
  • ComplexityO((m+n)logL)O((m + n) \log L)

Properties

  • Self-ratio Ratio(x,x)=1\operatorname{Ratio}(\mathbf{x}, \mathbf{x}) = 1
  • Scale equivariance Ratio(kxx,kyy)=(kxky)Ratio(x,y)\operatorname{Ratio}(k_x \cdot \mathbf{x}, k_y \cdot \mathbf{y}) = (\frac{k_x}{k_y}) \cdot \operatorname{Ratio}(\mathbf{x}, \mathbf{y})
  • Multiplicative antisymmetry Ratio(x,y)=1/Ratio(y,x)\operatorname{Ratio}(\mathbf{x}, \mathbf{y}) = 1 / \operatorname{Ratio}(\mathbf{y}, \mathbf{x})

Example

  • Ratio([1, 2, 4, 8, 16], [2, 4, 8, 16, 32]) = 0.5
  • Ratio(x, x) = 1 Ratio(2x, 5y) = 0.4 · Ratio(x, y)

Relationship to Shift

Ratio\operatorname{Ratio} is the multiplicative analog of Shift\operatorname{Shift}. While Shift\operatorname{Shift} computes the median of pairwise differences xiyjx_i - y_j, Ratio\operatorname{Ratio} computes the median of pairwise ratios xiyj\frac{x_i}{y_j} via log-transformation. This relationship is expressed formally as:

Ratio(x,y)=exp(Shift(logx,logy))\operatorname{Ratio}(\mathbf{x}, \mathbf{y}) = \exp(\operatorname{Shift}(\log \mathbf{x}, \log \mathbf{y}))

The log-transformation converts multiplicative relationships to additive ones, allowing the fast Shift\operatorname{Shift} algorithm to compute the result efficiently. The exp-transformation converts back to the ratio scale.

Ratio\operatorname{Ratio} is appropriate for multiplicative relationships rather than additive differences. If one system is twice as fast or prices are 30% lower, the underlying thinking is in ratios. A result of 0.5 means the first group is typically half the size of the second; 2.0 means twice as large. This estimator is appropriate for quantities like prices, response times, and concentrations where relative comparisons make more sense than absolute ones. Both samples must contain strictly positive values.

Algorithm

The Ratio\operatorname{Ratio} estimator measures the typical multiplicative relationship between elements of two samples. Given samples x=(x1,x2,,xn)\mathbf{x} = (x_1, x_2, \ldots, x_n) and y=(y1,y2,,ym)\mathbf{y} = (y_1, y_2, \ldots, y_m) with all positive values, this estimator is defined as:

Ratio(x,y)=exp(mediani,j(logxilogyj))=exp(Shift(logx,logy))\operatorname{Ratio}(\mathbf{x}, \mathbf{y}) = \exp(\operatorname{median}_{i,j}(\log x_i - \log y_j)) = \exp(\operatorname{Shift}(\log \mathbf{x}, \log \mathbf{y}))

A naive approach would compute all n×mn \times m ratios, sort them, and extract the median. With n=m=10000n = m = 10000, this creates 100 million values, requiring quadratic memory and O(nmlog(nm))O(n m \log(n m)) time.

The presented algorithm avoids this cost by exploiting the multiplicative-additive duality. Taking the logarithm of a ratio yields a difference:

log(xiyj)=log(xi)log(yj)\log(\frac{x_i}{y_j}) = \log(x_i) - \log(y_j)

This transforms the problem of finding the median of pairwise ratios into finding the median of pairwise differences in log-space.

The algorithm operates in three steps:

  • Log-transform — Apply log\log to each element of both samples. If x\mathbf{x} was sorted, logx\log \mathbf{x} remains sorted (log is monotonically increasing for positive values).

  • Delegate to Shift — Use the Shift algorithm to compute the desired quantile of pairwise differences in log-space. This leverages the O((m+n)logL)O((m + n) \log L) complexity of the Shift algorithm.

  • Exp-transform — Apply exp\exp to convert the result back to ratio-space. If the log-space median is dd, then exp(d)=exp(log(xi)log(yj))=xiyj\exp(d) = \exp(\log(x_i) - \log(y_j)) = \frac{x_i}{y_j} is the original ratio.

The total complexity is O((m+n)logL)O((m + n) \log L) per quantile, where LL represents the convergence precision in the log-space binary search. This is dramatically more efficient than the naive O(nmlog(nm))O(n m \log(n m)) approach.

Memory usage is O(m+n)O(m + n) for storing the log-transformed samples, compared to O(nm)O(n m) for materializing all pairwise ratios.

namespace Pragmastat.Algorithms;

using System;
using System.Collections.Generic;
using System.Linq;
using Pragmastat.Exceptions;
using Pragmastat.Internal;

/// <summary>
/// Computes quantiles of pairwise ratios via log-transformation and FastShift delegation.
/// Ratio(x, y) = exp(Shift(log(x), log(y)))
/// </summary>
public static class FastRatio
{
  /// <summary>
  /// Computes quantiles of all pairwise ratios { x_i / y_j }.
  /// Time: O((m + n) * log(precision)) per quantile. Space: O(m + n).
  /// </summary>
  /// <remarks>
  /// Log-transformation preserves sort order for positive values.
  /// </remarks>
  /// <param name="x">First sample (must be strictly positive).</param>
  /// <param name="y">Second sample (must be strictly positive).</param>
  /// <param name="p">Probabilities in [0, 1].</param>
  /// <param name="assumeSorted">If true, assumes x and y are already sorted in ascending order.</param>
  /// <returns>Quantile values for each probability in p.</returns>
  public static double[] Estimate(IReadOnlyList<double> x, IReadOnlyList<double> y, double[] p, bool assumeSorted = false)
  {
    // Log-transform both samples (includes positivity check)
    var logX = MathExtensions.Log(x, Subject.X);
    var logY = MathExtensions.Log(y, Subject.Y);

    // Delegate to FastShift in log-space
    var logResult = FastShift.Estimate(logX, logY, p, assumeSorted);

    // Exp-transform back to ratio-space
    return logResult.Select(v => Math.Exp(v)).ToArray();
  }
}

Tests

Ratio(x,y)=exp(Shift(logx,logy))\operatorname{Ratio}(\mathbf{x}, \mathbf{y}) = \exp(\operatorname{Shift}(\log \mathbf{x}, \log \mathbf{y}))

The Ratio\operatorname{Ratio} test suite contains 37 test cases (25 original + 12 unsorted), excluding zero values due to division constraints. The new definition uses geometric interpolation (via log-space), which affects expected values for even m×nm \times n cases.

Demo examples (n=m=5n = m = 5) — from manual introduction, validating properties:

  • demo-1: x=(1,2,4,8,16)\mathbf{x} = (1, 2, 4, 8, 16), y=(2,4,8,16,32)\mathbf{y} = (2, 4, 8, 16, 32), expected output: 0.50.5 (base case, odd m×nm \times n)
  • demo-2: x=(1,2,4,8,16)\mathbf{x} = (1, 2, 4, 8, 16), y=(1,2,4,8,16)\mathbf{y} = (1, 2, 4, 8, 16), expected output: 11 (identity property)
  • demo-3: x=(2,4,8,16,32)\mathbf{x} = (2, 4, 8, 16, 32), y=(10,20,40,80,160)\mathbf{y} = (10, 20, 40, 80, 160) (= [2×demo-1.x, 5×demo-1.y]), expected output: 0.20.2 (scale property)

Natural sequences ([n,m]in1,2,3×1,2,3[n, m] in {1, 2, 3} \times {1, 2, 3}) — 9 combinations:

  • natural-1-1: x=(1)\mathbf{x} = (1), y=(1)\mathbf{y} = (1), expected output: 11
  • natural-1-2: x=(1)\mathbf{x} = (1), y=(1,2)\mathbf{y} = (1, 2), expected output: 0.707\approx 0.707 (=0.5= \sqrt{0.5}, geometric interpolation)
  • natural-1-3: x=(1)\mathbf{x} = (1), y=(1,2,3)\mathbf{y} = (1, 2, 3), expected output: 0.50.5
  • natural-2-1: x=(1,2)\mathbf{x} = (1, 2), y=(1)\mathbf{y} = (1), expected output: 1.414\approx 1.414 (=2= \sqrt{2}, geometric interpolation)
  • natural-2-2: x=(1,2)\mathbf{x} = (1, 2), y=(1,2)\mathbf{y} = (1, 2), expected output: 11
  • natural-2-3: x=(1,2)\mathbf{x} = (1, 2), y=(1,2,3)\mathbf{y} = (1, 2, 3), expected output: 0.816\approx 0.816 (geometric interpolation)
  • natural-3-1: x=(1,2,3)\mathbf{x} = (1, 2, 3), y=(1)\mathbf{y} = (1), expected output: 22
  • natural-3-2: x=(1,2,3)\mathbf{x} = (1, 2, 3), y=(1,2)\mathbf{y} = (1, 2), expected output: 1.225\approx 1.225 (geometric interpolation)
  • natural-3-3: x=(1,2,3)\mathbf{x} = (1, 2, 3), y=(1,2,3)\mathbf{y} = (1, 2, 3), expected output: 11

Additive distribution ([n,m]in5,10,30×5,10,30[n, m] in {5, 10, 30} \times {5, 10, 30}) — 9 combinations with Additive(10,1)\underline{\operatorname{Additive}}(10, 1):

  • additive-5-5, additive-5-10, additive-5-30
  • additive-10-5, additive-10-10, additive-10-30
  • additive-30-5, additive-30-10, additive-30-30
  • Random generation: x\mathbf{x} uses seed 0, y\mathbf{y} uses seed 1

Uniform distribution ([n,m]in5,100×5,100[n, m] in {5, 100} \times {5, 100}) — 4 combinations with Uniform(0,1)\underline{\operatorname{Uniform}}(0, 1):

  • uniform-5-5, uniform-5-100, uniform-100-5, uniform-100-100
  • Random generation: x\mathbf{x} uses seed 2, y\mathbf{y} uses seed 3
  • Note: all generated values are strictly positive (no zeros); values near zero test numerical stability of log-transformation

The natural sequences verify the identity property (Ratio(x,x)=1\operatorname{Ratio}(\mathbf{x}, \mathbf{x}) = 1) and validate ratio calculations with simple integer inputs. Note that implementations should handle the practical constraint of avoiding division by values near zero.

Unsorted tests — verify independent sorting for ratio calculation (12 tests):

  • unsorted-x-natural-{n}-{m} for (n,m)in(3,3),(4,4)(n,m) in {(3,3), (4,4)}: X unsorted (reversed), Y sorted (2 tests)
  • unsorted-y-natural-{n}-{m} for (n,m)in(3,3),(4,4)(n,m) in {(3,3), (4,4)}: X sorted, Y unsorted (reversed) (2 tests)
  • unsorted-both-natural-{n}-{m} for (n,m)in(3,3),(4,4)(n,m) in {(3,3), (4,4)}: both unsorted (reversed) (2 tests)
  • unsorted-demo-unsorted-x: x=(16,1,8,2,4)\mathbf{x} = (16, 1, 8, 2, 4), y=(2,4,8,16,32)\mathbf{y} = (2, 4, 8, 16, 32) (demo-1 with X unsorted)
  • unsorted-demo-unsorted-y: x=(1,2,4,8,16)\mathbf{x} = (1, 2, 4, 8, 16), y=(32,2,16,4,8)\mathbf{y} = (32, 2, 16, 4, 8) (demo-1 with Y unsorted)
  • unsorted-demo-both-unsorted: x=(8,1,16,4,2)\mathbf{x} = (8, 1, 16, 4, 2), y=(16,32,2,8,4)\mathbf{y} = (16, 32, 2, 8, 4) (demo-1 both unsorted)
  • unsorted-identity-unsorted: x=(4,1,8,2,16)\mathbf{x} = (4, 1, 8, 2, 16), y=(16,1,8,4,2)\mathbf{y} = (16, 1, 8, 4, 2) (identity property, both unsorted)
  • unsorted-asymmetric-unsorted-2-3: x=(2,1)\mathbf{x} = (2, 1), y=(3,1,2)\mathbf{y} = (3, 1, 2) (asymmetric, both unsorted)
  • unsorted-power-unsorted-5: x=(16,2,8,1,4)\mathbf{x} = (16, 2, 8, 1, 4), y=(32,4,16,2,8)\mathbf{y} = (32, 4, 16, 2, 8) (powers of 2 unsorted)