API
Modules
Types and constants
Functions and macros
RunStatistics.Delta
RunStatistics.H
RunStatistics.h
RunStatistics.init_partition
RunStatistics.next_partition!
RunStatistics.squares_cdf
RunStatistics.squares_cdf_approx
RunStatistics.squares_pvalue
RunStatistics.squares_pvalue_approx
RunStatistics.t_obs
Documentation
RunStatistics.RunStatistics
— Module
This package implements the evaluation of the cumulative distribution function of the Squares test statistic, originally defined in
Frederik Beaujean and Allen Caldwell. A Test Statistic for Weighted Runs.
https://arxiv.org/abs/1005.3233
The authors later derived an approximation that allows the cumulative distribution to be computed for large numbers of observations as well, in
Frederik Beaujean and Allen Caldwell. Is the bump significant? An axion-search example
https://arxiv.org/abs/1710.06642
where they renamed the weighted-runs statistic the *Squares statistic*. This code is based on the original C++ and Mathematica implementation by Frederik Beaujean:
https://github.com/fredRos/runs
RunStatistics.IntegrandData
— Type
IntegrandData
Represent the parameters needed for the 1D numerical integration performed in Delta(). T_obs is the value of the Squares statistic observed in the data, Nl the left-hand length and Nr the right-hand length of a boundary-spanning run, as defined in Section II.A of
Frederik Beaujean and Allen Caldwell. Is the bump significant? An axion-search example
https://arxiv.org/abs/1710.06642
RunStatistics.IntegrandData
— Method
(integrand::IntegrandData)(x::Real)
Compute the integrand in the Δ(T_obs | N_l, N_r) term defined in equation (13) of
Frederik Beaujean and Allen Caldwell. Is the bump significant? An axion-search example
https://arxiv.org/abs/1710.06642
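The (integrand::IntegrandData)(x::Real) method makes IntegrandData a callable struct (a Julia "functor"): the parameters T_obs, Nl and Nr travel with the object, and the object itself can be handed to an integrator such as quadgk(). The sketch below only illustrates that pattern. IntegrandSketch and trapezoid are hypothetical names, the integrand body is a made-up placeholder (not equation (13)), and a plain trapezoid rule stands in for quadgk() so the example has no dependencies.

```julia
# Hypothetical stand-in for IntegrandData: a struct carrying the parameters.
struct IntegrandSketch
    T_obs::Float64
    Nl::Int
    Nr::Int
end

# Make instances callable, mirroring (integrand::IntegrandData)(x::Real).
# PLACEHOLDER body for illustration only; this is NOT equation (13).
function (integrand::IntegrandSketch)(x::Real)
    return exp(-x) * (integrand.Nl + integrand.Nr) / integrand.T_obs
end

# Composite trapezoid rule, a dependency-free stand-in for quadgk().
# Any callable f works, including a functor instance.
function trapezoid(f, a::Real, b::Real; n::Int = 10_000)
    dx = (b - a) / n
    s = (f(a) + f(b)) / 2
    for i in 1:n-1
        s += f(a + i * dx)
    end
    return s * dx
end

integrand = IntegrandSketch(2.0, 3, 1)
val = trapezoid(integrand, 0.0, 1.0)   # ≈ 2 * (1 - exp(-1)) for this placeholder
```

Because the parameters live in the object, the integrator only ever sees a one-argument function of x; the same object could be passed directly as the first argument of quadgk().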
RunStatistics.Partition
— Type
Partition(n::Int, k::Int, h::Int, c::Vector{Int}, y::Vector{Int})
Express the integer partition of n into k parts in the multiplicity representation, with n = \sum_{i=2}^{h+1} c_i y_i (see https://en.wikipedia.org/wiki/Partition_(number_theory)).
h is the number of distinct parts, y an array containing the distinct parts and c an array containing their multiplicities.
NOTE: because of the way next_partition!() computes subsequent partitions, the arrays y and c only hold relevant values for the indices [2, h + 1]. When reading a partition, ignore the first element of c and y and do not read beyond c[h + 1], y[h + 1]!
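The reading rule in the note above can be sketched as follows. PartitionView, parts, expand and isconsistent are hypothetical names that only mirror the documented fields; they are not the package's Partition type. The example partition 5 = 3 + 1 + 1 (n = 5, k = 3, h = 2) carries arbitrary buffer values at index 1 and a stale trailing entry beyond h + 1, both of which are ignored.

```julia
# Hypothetical mirror of the documented Partition fields (not the package type).
struct PartitionView
    n::Int
    k::Int
    h::Int
    c::Vector{Int}   # multiplicities; c[1] is a buffer value
    y::Vector{Int}   # distinct parts;  y[1] is a buffer value
end

# (part, multiplicity) pairs, reading only the indices 2:h+1.
parts(p::PartitionView) = [(p.y[i], p.c[i]) for i in 2:p.h+1]

# Expand into a flat list of parts, e.g. [3, 1, 1].
expand(p::PartitionView) = reduce(vcat, [fill(yi, ci) for (yi, ci) in parts(p)]; init = Int[])

# Check n = sum_{i=2}^{h+1} c_i * y_i and that the multiplicities add up to k.
function isconsistent(p::PartitionView)
    pr = parts(p)
    total = sum((ci * yi for (yi, ci) in pr); init = 0)
    count = sum((ci for (_, ci) in pr); init = 0)
    return total == p.n && count == p.k
end

# 5 = 3 + 1 + 1: buffers at index 1, stale values (7, 9) beyond h + 1.
p = PartitionView(5, 3, 2, [0, 1, 2, 9], [0, 3, 1, 7])
```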
RunStatistics.Delta
— Function
Delta(T_obs::Real, Nl::Integer, Nr::Integer, [epsrel::Union{Real, Nothing}, epsabs::Union{Real, Nothing}])
Compute the Δ(T_obs | N_l, N_r) term defined in equation (13) of
Frederik Beaujean and Allen Caldwell. Is the bump significant? An axion-search example
https://arxiv.org/abs/1710.06642
The calculation involves a 1D numerical integration using the quadgk() function with the relative and absolute target precisions epsrel and epsabs. If these are not specified, the default values of quadgk() are used. See https://juliamath.github.io/QuadGK.jl/stable/ for documentation.
RunStatistics.H
— Method
H(a::Real, b::Real, N::Integer)
Compute the cumulative of h(), as defined in Section II.A of
Frederik Beaujean and Allen Caldwell. Is the bump significant? An axion-search example
https://arxiv.org/abs/1710.06642
RunStatistics.h
— Method
h(chisq::Real, N::Integer)
Compute the probability density h(χ^2 | N_r) for the right-hand side of a boundary-spanning run to be above expectation, as explained in Section II.A of the paper below.
It is calculated as a sum over runs of different lengths: the probability density for each run length times the χ^2 probability for that number of degrees of freedom.
This implements the term defined in equation (8) of
Frederik Beaujean and Allen Caldwell. Is the bump significant? An axion-search example
https://arxiv.org/abs/1710.06642
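The structure described above, a weighted sum of χ^2 densities with one term per run length (degrees of freedom), can be sketched as below. This is an illustration under assumptions, not the package's h(): the run-length weights w are supplied by the caller (the actual weights of equation (8) are not reproduced here), and loggamma_half, chisq_pdf and mixture_density are hypothetical helpers. Γ(k/2) is computed by hand from Γ(1) = 1, Γ(1/2) = √π and Γ(x + 1) = xΓ(x) to avoid a SpecialFunctions.jl dependency.

```julia
# log Γ(k/2) for integer k >= 1.
function loggamma_half(k::Integer)
    k >= 1 || throw(ArgumentError("k must be >= 1"))
    if iseven(k)
        # Γ(k/2) = (k/2 - 1)!
        return sum(log, 1:(k ÷ 2 - 1); init = 0.0)
    else
        # Γ(k/2) = √π * prod_{j=1}^{(k-1)/2} (j - 1/2)
        return 0.5 * log(pi) + sum(j -> log(j - 0.5), 1:((k - 1) ÷ 2); init = 0.0)
    end
end

# χ² probability density with k degrees of freedom, evaluated for x > 0
# (returns 0.0 for x <= 0; the boundary value at x = 0 is ignored).
function chisq_pdf(x::Real, k::Integer)
    x > 0 || return 0.0
    return exp((k / 2 - 1) * log(x) - x / 2 - (k / 2) * log(2) - loggamma_half(k))
end

# Hypothetical mixture: w[k] is an assumed weight for a run of length k.
mixture_density(x::Real, w::AbstractVector{<:Real}) =
    sum(w[k] * chisq_pdf(x, k) for k in eachindex(w))
```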
RunStatistics.init_partition
— Method
init_partition(n::Int, k::Int)
Initialize the first partition of an integer n into k parts; the arguments must satisfy 0 < k <= n. Returns an object of type Partition.
The elements y[1] and c[1] of the arrays y and c, which contain the distinct parts and their multiplicities, are buffer values needed for computing the next partition in next_partition!().
When reading a partition, ignore the first element of c and y and do not read beyond c[h + 1], y[h + 1]: n = \sum_{i=2}^{h+1} c_i y_i.
RunStatistics.next_partition!
— Method
next_partition!(p::Partition)
Compute the next partition of p, using a modified version of Algorithm Z from A. Zoghbi: Algorithms for generating integer partitions, Ottawa (1993), https://www.ruor.uottawa.ca/handle/10393/6506.
The partition p is updated in place, saving memory. Returns a boolean indicating whether the final partition has been reached.
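For cross-checking the sequence produced by init_partition() and next_partition!(), a plain recursive enumeration of all partitions of n into exactly k parts is sketched below. This is a swapped-in technique, not Zoghbi's Algorithm Z: it allocates every partition instead of updating one in place, and it may visit partitions in a different order. partitions_into_k and multiplicity_form are hypothetical names.

```julia
# All partitions of n into exactly k parts, each as a nonincreasing vector.
function partitions_into_k(n::Int, k::Int)
    0 < k <= n || throw(ArgumentError("need 0 < k <= n"))
    result = Vector{Vector{Int}}()
    current = Int[]
    # Fill `slots` more parts, each <= maxpart, summing to `remaining`.
    function place!(remaining::Int, slots::Int, maxpart::Int)
        if slots == 0
            remaining == 0 && push!(result, copy(current))
            return
        end
        # Every later slot needs at least 1, which caps the current part.
        hi = min(maxpart, remaining - (slots - 1))
        for part in hi:-1:1
            # Later parts are <= part, so part * slots must reach `remaining`.
            part * slots < remaining && break
            push!(current, part)
            place!(remaining - part, slots - 1, part)
            pop!(current)
        end
    end
    place!(n, k, n)
    return result
end

# Convert a nonincreasing list of parts to (distinct parts y, multiplicities c),
# without the index-1 buffer used by the package's Partition type.
function multiplicity_form(parts::Vector{Int})
    y = Int[]
    c = Int[]
    for p in parts
        if !isempty(y) && y[end] == p
            c[end] += 1
        else
            push!(y, p)
            push!(c, 1)
        end
    end
    return y, c
end
```

For example, partitions_into_k(5, 3) yields [3, 1, 1] and [2, 2, 1], and multiplicity_form([3, 1, 1]) gives y = [3, 1], c = [1, 2].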
RunStatistics.squares_cdf
— Method
squares_cdf(T_obs::Real, N::Integer)
Compute P(T < T_obs | N), the value of the cumulative distribution function of the Squares statistic T at the value T_obs observed in N independent trials with Gaussian uncertainty.
T_obs is the value of the test statistic for the observed data set, i.e. the largest χ^2 of any run of consecutive observed values above the expectation in a sequence of N independent trials with Gaussian uncertainty.
N is the total number of data points.
The calculation implements equations (16) and (17) from
Frederik Beaujean and Allen Caldwell. A Test Statistic for Weighted Runs. Journal of Statistical Planning and Inference 141, no. 11 (November 2011): 3437–46. doi:10.1016/j.jspi.2011.04.022
https://arxiv.org/abs/1005.3233.
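Written as formulas, the prose definitions in this docstring read as follows. This is a transcription of the text above, with runs taken as maximal blocks of consecutive observations above expectation, not a quotation of the paper's equations (16) and (17):

```math
\chi^2_i = \frac{(X_i - \mu_i)^2}{\sigma_i^2}, \qquad
T = \max_{j} \sum_{i \in A_j} \chi^2_i, \qquad
P(T < T_{obs} \mid N)
```

where the A_j range over the maximal runs of consecutive indices with X_i > μ_i among the N observations. Restricting to maximal runs loses nothing, because extending a run by another point above expectation can only increase its χ^2.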
RunStatistics.squares_cdf_approx
— Function
squares_cdf_approx(T_obs::Real, L::Integer, [epsp::Real])
Compute an approximation of P(T < T_obs | L = n * N), the value of the cumulative distribution function of the Squares test statistic at T_obs, the value of the Squares statistic observed in the data. The total number of data points is L = n * N; unless specified otherwise, the function uses the default values N = 80 and n = L / N.
To specify a particular choice of N and n, call:
squares_cdf_approx(T_obs::Real, Ns::AbstractArray, epsp::Real = 0)
with Ns an array holding N::Integer and n::Real as its first and second elements: Ns = [N, n].
The accuracy is bounded from below by n * 10^(-14); a desired accuracy down to this bound can be specified with the optional epsp argument. See the documentation on Accuracy under Guide/Details of computation.
This function implements equation (17) from:
Frederik Beaujean and Allen Caldwell. Is the bump significant? An axion-search example
https://arxiv.org/abs/1710.06642
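The bookkeeping around L = n * N can be sketched with a small helper. split_and_bound is a hypothetical name, not part of RunStatistics; it builds an Ns array in the documented layout (N as an Integer, n as a Real) and evaluates the documented accuracy lower bound n * 10^(-14). Using a Real[...] array keeps N an Integer, whereas [N, n] would promote both entries to Float64.

```julia
# Hypothetical helper: Ns = [N, n] for L data points, plus the accuracy bound.
function split_and_bound(L::Integer; N::Integer = 80)
    n = L / N              # n is a Real and need not be an integer
    Ns = Real[N, n]        # keeps N as an Integer, n as a Float64
    bound = n * 1e-14      # documented lower bound on the achievable accuracy
    return Ns, bound
end

Ns, bound = split_and_bound(1000)   # Ns holds 80 and 12.5
# The result could then be passed as squares_cdf_approx(T_obs, Ns, epsp).
```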
RunStatistics.squares_pvalue
— Method
squares_pvalue(T_obs::Real, N::Integer)
Compute P(T >= T_obs | N), the p-value for the Squares test statistic T being greater than or equal to T_obs, the value of the Squares statistic observed in N data points.
The Squares statistic T denotes the largest χ^2 of any run of consecutive successes (observations above expectation) in a sequence of N independent trials with Gaussian uncertainty.
Via squares_cdf(), this function implements equations (16) and (17) from
Frederik Beaujean and Allen Caldwell. A Test Statistic for Weighted Runs. Journal of Statistical Planning and Inference 141, no. 11 (November 2011): 3437–46. doi:10.1016/j.jspi.2011.04.022
https://arxiv.org/abs/1005.3233.
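A simple way to sanity-check p-values for small N is Monte Carlo simulation, a swapped-in technique rather than the exact computation of equations (16) and (17). The sketch assumes standardized observations (μ = 0, σ² = 1), so each χ^2 contribution is z_i^2; squares_stat and mc_pvalue are hypothetical names. The estimate has a statistical error of roughly sqrt(p(1 - p) / nsim).

```julia
using Random

# Squares statistic for standardized data: largest sum of z_i^2
# over runs of consecutive z_i > 0 (0.0 if there is no such run).
function squares_stat(z::AbstractVector{<:Real})
    best = 0.0
    current = 0.0
    for zi in z
        if zi > 0
            current += zi^2
            best = max(best, current)
        else
            current = 0.0
        end
    end
    return best
end

# Monte Carlo estimate of P(T >= T_obs | N) from `nsim` simulated sequences.
function mc_pvalue(T_obs::Real, N::Integer; nsim::Integer = 10_000,
                   rng::AbstractRNG = Random.default_rng())
    hits = 0
    z = Vector{Float64}(undef, N)
    for _ in 1:nsim
        randn!(rng, z)                      # N independent standard normals
        hits += squares_stat(z) >= T_obs    # Bool adds as 0 or 1
    end
    return hits / nsim
end
```

For example, mc_pvalue(5.0, 20; nsim = 100_000, rng = MersenneTwister(1)) gives an estimate that can be compared against squares_pvalue(5.0, 20).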
RunStatistics.squares_pvalue_approx
— Function
squares_pvalue_approx(T_obs::Real, L::Integer, [epsp::Real])
Compute an approximation of P(T >= T_obs | L), the p-value for the Squares test statistic T being greater than or equal to T_obs, the value of the Squares statistic observed in the data. The total number of data points is L = n * N; unless specified otherwise, the function uses the default values N = 80 and n = L / N.
To specify a particular choice of N and n, call:
squares_pvalue_approx(T_obs::Real, Ns::AbstractArray, [epsp::Real])
with Ns an array holding N::Integer and n::Real as its first and second elements: Ns = [N, n].
The accuracy is bounded from below by n * 10^(-14); a desired accuracy down to this bound can be specified with the optional epsp argument. See the documentation on Accuracy under Guide/Details of computation.
Via squares_cdf_approx(), this function implements equation (17) from:
Frederik Beaujean and Allen Caldwell. Is the bump significant? An axion-search example
https://arxiv.org/abs/1710.06642
RunStatistics.t_obs
— Method
t_obs(X::AbstractArray, μ::Real, σ2::Real)
Compute the value of the Squares test statistic T_obs, i.e. the largest χ^2 of any run of consecutive successes (observations above expectation) in a sequence of N independent trials with Gaussian uncertainty. μ and σ2 are the expectation and variance of the observations.
The function also finds the location(s) of the run(s) that produce T_obs. It returns a tuple containing T_obs and one or more arrays containing the indices of the runs that produce T_obs.
For the Squares statistic to be calculable, the observed data must satisfy the following conditions:
All observations {X_i} are independent.
Each observation is normally distributed, X_i ∼ N(μ_i, σ^2_i).
Mean μ_i and variance σ^2_i are known.
In case the observations {X_i} have individual expectations and variances, use:
t_obs(X::AbstractArray, μ::AbstractArray, σ2::AbstractArray)
with μ[i] and σ2[i] being the mean and variance of the i-th element of X.
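The computation described above can be sketched in a single pass over the data. t_obs_sketch is a hypothetical function, not the package's t_obs: it returns the largest χ^2 over maximal runs of X_i > μ_i together with the index ranges of all runs attaining it (as UnitRanges rather than index arrays), and it accepts μ and σ2 either as scalars or as arrays of the same length as X via broadcasting. It assumes an ordinary 1-based vector.

```julia
# Sketch: T_obs and the index ranges of the run(s) that produce it.
function t_obs_sketch(X::AbstractVector{<:Real}, μ, σ2)
    χ2 = @. (X - μ)^2 / σ2     # per-point χ² contributions
    above = X .> μ             # success: observation above expectation
    best = 0.0
    runs = UnitRange{Int}[]
    n = length(X)
    i = 1
    while i <= n
        if above[i]
            j = i
            while j < n && above[j + 1]   # extend to the end of the maximal run
                j += 1
            end
            s = sum(@view χ2[i:j])
            if s > best                   # new maximum: replace stored runs
                best = s
                empty!(runs)
                push!(runs, i:j)
            elseif s == best              # tie: keep every run attaining T_obs
                push!(runs, i:j)
            end
            i = j + 1
        else
            i += 1
        end
    end
    return best, runs
end
```

For example, t_obs_sketch([1.0, 2.0, -1.0, 3.0], 0.0, 1.0) finds the runs 1:2 (χ^2 = 5) and 4:4 (χ^2 = 9) and returns (9.0, [4:4]).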
See:
Frederik Beaujean and Allen Caldwell. A Test Statistic for Weighted Runs. Journal of Statistical Planning and Inference 141, no. 11 (November 2011): 3437–46.
https://www.sciencedirect.com/science/article/abs/pii/S0378375811001935?via%3Dihub
https://arxiv.org/abs/1005.3233