Distributions

Distributions are parametrized random generators with well-defined statistical properties. Each distribution describes a family of random variables characterized by specific parameters.

Notation

  • $X \sim \underline{\operatorname{Additive}}(0, 1)$: $X$ is distributed as standard normal
  • $\operatorname{Estimator}(\mathbf{x})$: estimate computed from the sample $\mathbf{x}$
  • $\operatorname{Estimator}[X]$: true value (asymptotic limit)
  • $n \to \infty$: asymptotic case (large-sample approximation)
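The difference between a sample estimate and its asymptotic value can be illustrated with a small sketch. It assumes the median as the estimator and uses Python's standard `random.gauss` as a stand-in generator for the standard normal case; the helper name `sample_additive` is hypothetical and not part of any library described here.

```python
import random
import statistics

def sample_additive(n, mean=0.0, std_dev=1.0, seed=0):
    """Draw n values from Additive(mean, stdDev) with a seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(mean, std_dev) for _ in range(n)]

# Estimator(x): the median computed from a finite sample.
small = statistics.median(sample_additive(10))
large = statistics.median(sample_additive(100_000))

# Estimator[X]: the true median of Additive(0, 1) is exactly 0.
true_median = 0.0

# The estimation error for the large sample is expected to be tiny.
print(abs(small - true_median), abs(large - true_median))
```

As $n$ grows, the sample value is expected to approach the true value; the small-sample estimate carries no such guarantee.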

Additive (Normal)

$\underline{\operatorname{Additive}}(\mathrm{mean}, \mathrm{stdDev})$
  • $\mathrm{mean}$: location parameter (center of the distribution), consistent with $\operatorname{Center}$
  • $\mathrm{stdDev}$: scale parameter (standard deviation), can be rescaled to $\operatorname{Spread}$

  • Formation: the sum of many variables $X_1 + X_2 + \ldots + X_n$ under mild CLT (Central Limit Theorem) conditions (e.g., Lindeberg-Feller).
  • Origin: historically called Normal or Gaussian distribution after Carl Friedrich Gauss and others.
  • Rename Motivation: renamed to $\underline{\operatorname{Additive}}$ to reflect its formation mechanism through addition.
  • Properties: symmetric, bell-shaped, characterized by central limit theorem convergence.
  • Applications: measurement errors, heights and weights in populations, test scores, temperature variations.
  • Characteristics: symmetric around the mean, light tails, finite variance.
  • Caution: no perfectly additive distributions exist in real data; all real-world measurements contain deviations. Traditional estimators like $\operatorname{Mean}$ and $\operatorname{StdDev}$ lack robustness to outliers; use them only when strong evidence supports approximate additivity with no extreme measurements.
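The formation mechanism can be sketched numerically. The example below assumes independent Uniform(0, 1) terms (an illustrative choice, not something the text prescribes), sums 30 of them, rescales the sum to mean 0 and variance 1, and checks how much mass lies within one standard deviation. The helper name `standardized_sum` is hypothetical.

```python
import math
import random

def standardized_sum(rng, terms):
    """Sum `terms` independent Uniform(0, 1) draws and rescale to mean 0, variance 1."""
    total = sum(rng.random() for _ in range(terms))
    # Each Uniform(0, 1) term has mean 1/2 and variance 1/12.
    return (total - terms / 2) / math.sqrt(terms / 12)

rng = random.Random(42)
sums = [standardized_sum(rng, terms=30) for _ in range(20_000)]

# For Additive(0, 1), about 68.3% of the mass lies within one standard deviation.
within_one = sum(abs(s) <= 1 for s in sums) / len(sums)
print(round(within_one, 3))
```

Even though each term is far from bell-shaped, the standardized sum should already sit close to the standard normal proportion.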

Multiplic (LogNormal)

$\underline{\operatorname{Multiplic}}(\mathrm{logMean}, \mathrm{logStdDev})$
  • $\mathrm{logMean}$: mean of log values (location parameter; $e^{\mathrm{logMean}}$ equals the geometric mean)
  • $\mathrm{logStdDev}$: standard deviation of log values (scale parameter; controls multiplicative spread)

  • Formation: the product of many positive variables $X_1 \cdot X_2 \cdot \ldots \cdot X_n$ with mild conditions (e.g., finite variance of $\log X$).
  • Origin: historically called Log-Normal or Galton distribution after Francis Galton.
  • Rename Motivation: renamed to $\underline{\operatorname{Multiplic}}$ to reflect its formation mechanism through multiplication.
  • Properties: logarithm of a $\underline{\operatorname{Multiplic}}$ (LogNormal) variable follows an $\underline{\operatorname{Additive}}$ (Normal) distribution.
  • Applications: stock prices, file sizes, reaction times, income distributions, biological growth rates.
  • Caution: no perfectly multiplic distributions exist in real data; all real-world measurements contain deviations. Traditional estimators struggle with the inherent skewness and heavy right tail.
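The link between the two distributions can be checked with a short sketch: draw a normal value, exponentiate it, and confirm that the logs of the result recover the parameters. The helper name `sample_multiplic` and the parameter values are assumptions chosen for illustration.

```python
import math
import random
import statistics

def sample_multiplic(rng, log_mean, log_std_dev):
    """Draw one Multiplic(logMean, logStdDev) value as exp of an Additive draw."""
    return math.exp(rng.gauss(log_mean, log_std_dev))

rng = random.Random(7)
xs = [sample_multiplic(rng, log_mean=1.0, log_std_dev=0.5) for _ in range(50_000)]

# Taking logs should give back an Additive(1.0, 0.5) sample.
logs = [math.log(x) for x in xs]
geometric_mean = math.exp(statistics.fmean(logs))
log_std = statistics.stdev(logs)

print(geometric_mean, math.e)  # geometric mean should be near e^logMean
print(log_std)                 # should be near logStdDev = 0.5
```

Python's standard library also offers `random.lognormvariate(mu, sigma)`, which should produce equivalent draws without the explicit exponentiation.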

Exponential

$\underline{\operatorname{Exp}}(\mathrm{rate})$
  • $\mathrm{rate}$: rate parameter ($\lambda > 0$, controls decay speed; mean = $\frac{1}{\mathrm{rate}}$)

  • Formation: the waiting time between events in a Poisson process.
  • Origin: naturally arises from memoryless processes where the event rate (the probability of an event per unit of time) is constant over time.
  • Properties: memoryless (past events do not affect future probabilities).
  • Applications: time between failures, waiting times in queues, radioactive decay, customer service times.
  • Characteristics: always positive, right-skewed with a light (exponential) tail.
  • Caution: extreme skewness makes traditional location estimators like $\operatorname{Mean}$ unreliable; robust estimators provide more stable results.
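A minimal sketch of the mean and the memoryless property, assuming inverse-transform sampling as the generation method (the text does not say how values are generated). The helper name `sample_exp` and the values of `rate`, `s`, and `t` are illustrative.

```python
import math
import random
import statistics

def sample_exp(rng, rate):
    """Inverse-transform sampling: -ln(1 - U) / rate for U in [0, 1)."""
    return -math.log(1.0 - rng.random()) / rate

rng = random.Random(3)
rate = 2.0
xs = [sample_exp(rng, rate) for _ in range(100_000)]

sample_mean = statistics.fmean(xs)  # expected near 1 / rate = 0.5

# Memorylessness: P(X > s + t | X > s) should match P(X > t) = exp(-rate * t).
s, t = 0.4, 0.3
survivors = [x for x in xs if x > s]
conditional = sum(x > s + t for x in survivors) / len(survivors)
unconditional = sum(x > t for x in xs) / len(xs)

print(sample_mean, conditional, unconditional, math.exp(-rate * t))
```

Having already waited `s` units should not change the chance of waiting `t` more, so the conditional and unconditional proportions are expected to agree up to sampling noise. The standard library's `random.expovariate(lambd)` covers the same distribution.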

Power (Pareto)

$\underline{\operatorname{Power}}(\mathrm{min}, \mathrm{shape})$
  • $\mathrm{min}$: minimum value (lower bound, $\mathrm{min} > 0$)
  • $\mathrm{shape}$: shape parameter ($\alpha > 0$, controls tail heaviness; smaller values = heavier tails)

  • Formation: follows a power-law relationship where large values are rare but possible.
  • Origin: historically called Pareto distribution after Vilfredo Pareto's work on wealth distribution.
  • Rename Motivation: renamed to $\underline{\operatorname{Power}}$ to reflect its connection with power laws.
  • Properties: exhibits scale invariance and extremely heavy tails.
  • Applications: wealth distribution, city population sizes, word frequencies, earthquake magnitudes, website traffic.
  • Characteristics: infinite variance when $\mathrm{shape} \le 2$ and infinite mean when $\mathrm{shape} \le 1$; extreme outliers are common.
  • Caution: traditional variance-based estimators completely fail; robust estimators are essential for reliable analysis.
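The contrast between robust and variance-based summaries can be sketched as follows. The example assumes the standard Pareto form with survival function $(\mathrm{min}/x)^{\mathrm{shape}}$ for $x \ge \mathrm{min}$ and uses inverse-transform sampling; the helper name `sample_power` and the parameter values are illustrative.

```python
import random
import statistics

def sample_power(rng, min_value, shape):
    """Inverse-transform sampling: min / U^(1/shape) for U in (0, 1]."""
    u = 1.0 - rng.random()  # shift [0, 1) to (0, 1] to avoid division by zero
    return min_value / u ** (1.0 / shape)

rng = random.Random(11)
min_value, shape = 1.0, 1.5
xs = [sample_power(rng, min_value, shape) for _ in range(100_000)]

# The median has a closed form and is stable even with heavy tails.
sample_median = statistics.median(xs)
true_median = min_value * 2 ** (1.0 / shape)

# With shape = 1.5 the variance is infinite, so the sample standard
# deviation is dominated by a few extreme values and does not settle.
print(sample_median, true_median, max(xs), statistics.stdev(xs))
```

The sample median should land close to its theoretical value, while the standard deviation depends heavily on whichever extreme draws happen to appear. The standard library's `random.paretovariate(alpha)` provides the same distribution with the minimum fixed at 1.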

Uniform

$\underline{\operatorname{Uniform}}(\mathrm{min}, \mathrm{max})$
  • $\mathrm{min}$: lower bound of the support interval
  • $\mathrm{max}$: upper bound of the support interval ($\mathrm{max} > \mathrm{min}$)

  • Formation: all values within a bounded interval have equal probability density.
  • Origin: represents complete uncertainty within known bounds.
  • Properties: rectangular probability density, finite support with hard boundaries.
  • Applications: random number generation, round-off errors, arrival times within known intervals.
  • Characteristics: symmetric, bounded, no tail behavior.
  • Note: traditional estimators work reasonably well due to symmetry and bounded nature.
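A brief sketch of why traditional estimators behave well here: with bounded support, the sample mean and standard deviation should settle near their closed-form values, $(\mathrm{min} + \mathrm{max})/2$ and $(\mathrm{max} - \mathrm{min})/\sqrt{12}$. The helper name `sample_uniform` and the bounds are illustrative assumptions.

```python
import math
import random
import statistics

def sample_uniform(rng, lo, hi):
    """Rescale a Uniform(0, 1) draw to the interval between lo and hi."""
    return lo + (hi - lo) * rng.random()

rng = random.Random(5)
lo, hi = 2.0, 6.0
xs = [sample_uniform(rng, lo, hi) for _ in range(50_000)]

sample_mean = statistics.fmean(xs)  # expected near (lo + hi) / 2 = 4.0
sample_std = statistics.stdev(xs)   # expected near (hi - lo) / sqrt(12)
true_std = (hi - lo) / math.sqrt(12)

print(sample_mean, sample_std, true_std)
```

The standard library's `random.uniform(a, b)` performs the same rescaling.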