API

Documentation

RunStatistics.RunStatistics - Module

This package implements the evaluation of the cumulative distribution function of the Squares test statistic originally defined in

Frederik Beaujean and Allen Caldwell. A Test Statistic for Weighted Runs.

https://arxiv.org/abs/1005.3233

The authors further derived an approximation that makes it possible to compute the cumulative distribution function for large numbers of observations as well, in

Frederik Beaujean and Allen Caldwell. Is the bump significant? An axion-search example

https://arxiv.org/abs/1710.06642

There, they renamed the weighted-runs statistic to the *Squares statistic*. This code is based on the original implementation by Frederik Beaujean in C++ and Mathematica:

https://github.com/fredRos/runs

RunStatistics.IntegrandData - Type
IntegrandData

Represent the parameters needed for the 1D numerical integration performed in Delta().

T_obs is the value of the Squares statistic observed in the data, Nl the left-hand length and Nr the right-hand length of a boundary-spanning run, as defined in section II.A. in

Frederik Beaujean and Allen Caldwell. Is the bump significant? An axion-search example

https://arxiv.org/abs/1710.06642

RunStatistics.IntegrandData - Method
(integrand::IntegrandData)(x::Real)

Compute the integrand in the Δ(T_obs | N_l, N_r) term defined in equation (13) in

Frederik Beaujean and Allen Caldwell. Is the bump significant? An axion-search example

https://arxiv.org/abs/1710.06642

RunStatistics.Partition - Type
Partition(n::Int, k::Int, h::Int, c::Vector{Int}, y::Vector{Int})

Express the integer partition of n into k parts in the multiplicity representation with n = \sum_{i=2}^{h+1} c_i y_i.

(see https://en.wikipedia.org/wiki/Partition_(number_theory))

h is the number of distinct parts, y an array containing the distinct parts and c an array containing their multiplicities.

NOTE: because of the way subsequent partitions are computed by the algorithm used in next_partition!(), the arrays y and c only hold relevant values for the indices [2, h + 1].

When reading a partition: ignore the first element of c and y and do not read beyond c[h + 1], y[h + 1]!
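The reading rule above can be sketched in a few lines. This is a minimal illustration that works on plain arrays rather than on a Partition object; the helper name expand_parts, the placeholder zeros, and the order in which the distinct parts are stored are assumptions made for the example, not taken from the package.

```julia
# Hypothetical stand-alone helper (not part of RunStatistics): expand the
# multiplicity representation (h, c, y) into the explicit list of parts.
function expand_parts(h::Int, c::Vector{Int}, y::Vector{Int})
    parts = Int[]
    for i in 2:(h + 1)                  # index 1 is a buffer entry and is skipped
        append!(parts, fill(y[i], c[i]))
    end
    return parts
end

# 7 = 2 + 2 + 3: two distinct parts (h = 2); the part 2 occurs twice, the part 3 once.
h = 2
y = [0, 2, 3, 0]    # entries at index 1 and beyond h + 1 are placeholders here
c = [0, 2, 1, 0]

parts = expand_parts(h, c, y)                 # explicit parts of the partition
n = sum(c[i] * y[i] for i in 2:(h + 1))       # n = sum of c_i * y_i over i = 2..h+1
k = length(parts)                             # number of parts
```

Only the index range 2:(h + 1) is ever touched, which is the point of the note: whatever sits in the first slot or past h + 1 does not belong to the partition.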

RunStatistics.Delta - Function
Delta(T_obs::Real, Nl::Integer, Nr::Integer, [epsrel::Nothing, epsabs::Union{Real, Nothing}])

Compute the Δ(T_obs | N_l, N_r) term defined in equation (13) in

Frederik Beaujean and Allen Caldwell. Is the bump significant? An axion-search example

https://arxiv.org/abs/1710.06642

The calculation involves a 1D numerical integration using the quadgk() function with the relative and absolute target precision epsrel and epsabs. If not specified, the default values of quadgk() are used. See https://juliamath.github.io/QuadGK.jl/stable/ for documentation.
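As an illustration of how target precisions are passed to quadgk(), here is a minimal sketch on an unrelated toy integrand. It assumes that epsrel and epsabs correspond to the rtol and atol keyword arguments of quadgk(); that mapping, and the integrand itself, are assumptions for the example and are not taken from the implementation of Delta().

```julia
using QuadGK

# Toy integrand (not the one used in Delta): a Gaussian bell curve.
f(x) = exp(-x^2)

# quadgk returns the integral estimate and an estimate of the absolute error.
# rtol and atol are the relative and absolute target precisions.
val, err = quadgk(f, 0, Inf; rtol = 1e-10, atol = 0)

# The exact value of this integral is sqrt(pi) / 2.
exact = sqrt(pi) / 2
```

Leaving out rtol and atol makes quadgk() fall back to its own defaults, which matches the behaviour described above for unspecified epsrel and epsabs.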

RunStatistics.H - Method
H(a::Real, b::Real, N::Integer)

Compute the cumulative of h() as defined in section II.A. in

Frederik Beaujean and Allen Caldwell. Is the bump significant? An axion-search example

https://arxiv.org/abs/1710.06642

RunStatistics.h - Method
h(chisq::Real, N::Integer)

Compute the probability density h(χ^2 | N_r) for the right-hand side of a boundary-spanning run to be above expectation, as explained in section II.A. in the paper below.

Calculate it as the sum of probability densities for runs of different length times the χ^2 probability for that number of degrees of freedom.

Implements the term defined in equation (8) in

Frederik Beaujean and Allen Caldwell. Is the bump significant? An axion-search example

https://arxiv.org/abs/1710.06642

RunStatistics.init_partition - Method
init_partition(n::Int, k::Int)

Initialize the first partition of an integer n into k parts; arguments must satisfy 0 < k <= n. Returns an object of type Partition.

The elements in y[1] and c[1], of the arrays y and c containing the distinct parts and their multiplicities, are buffer values needed for the computation of the next partition in next_partition!().

When reading a partition: ignore the first element of c and y and do not read beyond c[h + 1], y[h + 1]: n = \sum_{i=2}^{h+1} c_i y_i.

RunStatistics.next_partition! - Method
next_partition!(p::Partition)

Compute the next partition of p, using a modified version of Algorithm Z from A. Zoghbi: Algorithms for generating integer partitions, Ottawa (1993), https://www.ruor.uottawa.ca/handle/10393/6506.

The partition p is updated in place, saving memory. Returns a boolean indicating whether the final partition has been reached.
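To make concrete which objects are being stepped through, the sketch below lists all partitions of n into exactly k parts with a simple recursion. It is deliberately not Algorithm Z and does not update anything in place; the function name partitions_into_k and the order of enumeration are assumptions for illustration, and the package may visit the partitions in a different order.

```julia
# Hypothetical helper (not part of RunStatistics): return every partition of n
# into exactly k positive parts, each written with its parts in non-increasing order.
function partitions_into_k(n::Int, k::Int, maxpart::Int = n)
    # No parts left: valid only if nothing remains to be distributed.
    k == 0 && return n == 0 ? [Int[]] : Vector{Int}[]
    result = Vector{Int}[]
    # The next part p may not exceed the previous one (maxpart) and must
    # leave at least 1 for each of the remaining k - 1 parts.
    for p in min(maxpart, n - (k - 1)):-1:1
        for rest in partitions_into_k(n - p, k - 1, p)
            push!(result, vcat(p, rest))
        end
    end
    return result
end

ps = partitions_into_k(6, 3)
```

The recursion allocates a new array for every partition, which is exactly the cost that an in-place update of a single Partition object avoids.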

RunStatistics.squares_cdf - Method
squares_cdf(T_obs::Real, N::Integer)

Compute P(T < T_obs | N), the value of the cumulative distribution function of the Squares statistic T at the value T_obs observed in N independent trials with Gaussian uncertainty.

T_obs is the value of the test statistic for the observed data set; i.e., the largest χ^2 of any run of consecutive observed values above the expectation in a sequence of N independent trials with Gaussian uncertainty.

N is the total number of data points.

The calculation implements equations (16) and (17) from

Frederik Beaujean and Allen Caldwell. A Test Statistic for Weighted Runs. Journal of Statistical Planning and Inference 141, no. 11 (November 2011): 3437–46. doi:10.1016/j.jspi.2011.04.022

https://arxiv.org/abs/1005.3233

RunStatistics.squares_cdf_approx - Function

squares_cdf_approx(T_obs::Real, L::Integer, [epsp::Real])

Compute an approximation of P(T < T_obs | L = n * N), the value of the cumulative distribution function for the Squares test statistic at T_obs, the value of the Squares statistic observed in the data. The total number of data points is L = n * N. Unless specified otherwise, the function uses the default values N = 80 and n = L / N.

To specify a certain choice for N and n, do:

squares_cdf_approx(T_obs::Real, Ns::AbstractArray, epsp::Real = 0)

With Ns being an array holding N::Integer and n::Real as its first and second element: Ns = [N, n].

The lower bound on the accuracy is n * 10^(-14); a desired accuracy up to this bound can be specified with the optional epsp argument. See the documentation on Accuracy under Guide/Details of computation.

This function implements equation (17) from:

Frederik Beaujean and Allen Caldwell. Is the bump significant? An axion-search example

https://arxiv.org/abs/1710.06642

RunStatistics.squares_pvalue - Method
squares_pvalue(T_obs::Real, N::Integer)

Compute P(T >= T_obs | N), the p value for the Squares test statistic T being larger than or equal to T_obs, the value of the Squares statistic observed in N data points.

The Squares statistic T denotes the largest χ^2 of any run of consecutive successes (above expectation) in a sequence of N independent trials with Gaussian uncertainty.

Via squares_cdf() this function implements equations (16) and (17) from

Frederik Beaujean and Allen Caldwell. A Test Statistic for Weighted Runs. Journal of Statistical Planning and Inference 141, no. 11 (November 2011): 3437–46. doi:10.1016/j.jspi.2011.04.022

https://arxiv.org/abs/1005.3233
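The p value and the cumulative distribution function describe complementary events, so the two quantities are related as written below. That the code literally evaluates one minus the result of squares_cdf() is an assumption here; only the probabilistic identity itself is certain.

```latex
P(T \geq T_{\mathrm{obs}} \mid N) = 1 - P(T < T_{\mathrm{obs}} \mid N)
```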

RunStatistics.squares_pvalue_approx - Function
squares_pvalue_approx(T_obs::Real, L::Integer, [epsp::Real])

Compute an approximation of P(T >= T_obs | L), the p value for the Squares test statistic T being larger than or equal to T_obs, the value of the Squares statistic observed in the data. The total number of data points is L = n * N. Unless specified otherwise, the function uses the default values N = 80 and n = L / N.

To specify a certain choice for N and n, do:

squares_pvalue_approx(T_obs::Real, Ns::AbstractArray, [epsp::Real])

With Ns being an array holding N::Integer and n::Real as its first and second element: Ns = [N, n].

The lower bound on the accuracy is n * 10^(-14); a desired accuracy up to this bound can be specified with the optional epsp argument. See the documentation on Accuracy under Guide/Details of computation.

Via squares_cdf_approx() this function implements equation (17) from:

Frederik Beaujean and Allen Caldwell. Is the bump significant? An axion-search example

https://arxiv.org/abs/1710.06642

RunStatistics.t_obs - Method
t_obs(X::AbstractArray, μ::Real, σ2::Real)

Compute the value of the Squares test statistic T_obs, i.e., the largest χ^2 of any run of consecutive successes (above expectation) in a sequence of N independent trials with Gaussian uncertainty. μ and σ2 are the expectation and variance of the observations.

Also find the location(s) of the run(s) that produce T_obs.

Returns a tuple containing T_obs and one or more arrays containing the indices of the runs that produce T_obs.

For the Squares statistic to be calculable, the observed data must satisfy the following conditions:

    All observations {X_i} are independent. 
    Each observation is normally distributed, X_i ∼ N( µ_i, σ^2_i ).
    Mean µ_i and variance σ^2_i are known.

In case the observations {X_i} have individual expectations and variances, use:

t_obs(X::AbstractArray, μ::AbstractArray, σ2::AbstractArray)

With μ[i] and σ2[i] being the mean and variance of the i-th element of X.

See:

Frederik Beaujean and Allen Caldwell. A Test Statistic for Weighted Runs. Journal of Statistical Planning and Inference 141, no. 11 (November 2011): 3437–46.

https://www.sciencedirect.com/science/article/abs/pii/S0378375811001935?via%3Dihub

https://arxiv.org/abs/1005.3233
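The definition of the statistic can be restated as a short loop: sum (X_i - μ)^2 / σ2 over each run of consecutive observations above μ and keep the largest sum. The sketch below covers only the case of a common μ and σ2 and a 1-based vector. The name squares_statistic_sketch is hypothetical, and unlike t_obs() it returns a single index range, reporting only the first run when several tie.

```julia
# Hypothetical helper (not part of RunStatistics): largest χ^2 of any run of
# consecutive observations above the expectation μ, with common variance σ2.
function squares_statistic_sketch(X::AbstractVector{<:Real}, μ::Real, σ2::Real)
    best, best_run = 0.0, 0:-1      # largest χ^2 so far and the run that produced it
    current, start = 0.0, 0         # χ^2 and first index of the run in progress
    for (i, x) in pairs(X)
        if x > μ
            start == 0 && (start = i)           # a new run begins here
            current += (x - μ)^2 / σ2
            if current > best
                best, best_run = current, start:i
            end
        else
            current, start = 0.0, 0             # the run is interrupted
        end
    end
    return best, best_run
end

# Two runs above μ = 1: indices 2:3 and index 5.
X = [0.2, 1.5, 2.5, 0.9, 2.0, 0.1]
T, run = squares_statistic_sketch(X, 1.0, 0.25)
```

Because every term added inside a run is positive, the running sum is largest at the end of the run, so the reported range always covers a complete run.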
