Normal-approximation-based test for k-sample linear hypothesis via random integration proposed by Li et al. (2025)
Source: R/LHNB2025.GLHTBF.NABT.R
Li et al. (2025)'s test for the general linear hypothesis testing (GLHT) problem with high-dimensional, heteroscedastic data.
Arguments
- Y
A list of \(k\) data matrices. The \(i\)th element represents the data matrix (\(n_i \times p\)) from the \(i\)th population with each row representing a \(p\)-dimensional observation.
- B
A vector of \(k\) coefficients \((B_1,\ldots,B_k)\) specifying the linear combination of group mean vectors.
- O
A length-\(p\) vector used to form \(\Omega = \mathrm{diag}(O_1^2,\ldots,O_p^2)\).
- A
A length-\(p\) vector used to form the weight matrix \(W = \Omega + A A^\top\).
- n
A vector of \(k\) sample sizes. The \(i\)th element represents the sample size of group \(i\), \(n_i\).
- p
The data dimension \(p\), i.e., the number of columns of each matrix in Y.
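The shapes of these arguments must agree with one another. The sketch below is a minimal consistency check, written as a hypothetical helper `check_glht_inputs()` that is not part of HDNRA and is not necessarily what the package itself verifies.

```r
# Hypothetical helper (not part of HDNRA): checks that the arguments passed to
# LHNB2025.GLHTBF.NABT() have mutually consistent shapes.
check_glht_inputs <- function(Y, B, O, A, n, p) {
  k <- length(Y)
  stopifnot(
    is.list(Y),
    all(vapply(Y, is.matrix, logical(1))),   # every group is a matrix
    length(B) == k, length(n) == k,          # one coefficient and size per group
    all(vapply(Y, nrow, integer(1)) == n),   # n_i rows in group i
    all(vapply(Y, ncol, integer(1)) == p),   # p columns in every group
    length(O) == p, length(A) == p           # tuning vectors have length p
  )
  invisible(TRUE)
}

# Usage on toy data with k = 2 groups and p = 5
set.seed(1)
p <- 5
Y <- list(matrix(rnorm(10 * p), 10, p), matrix(rnorm(12 * p), 12, p))
check_glht_inputs(Y, B = c(1, -1), O = rep(1, p), A = rep(0.5, p),
                  n = c(10, 12), p = p)
```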
Details
Suppose we have \(k\) independent high-dimensional samples $$\boldsymbol{Y}_{i1},\ldots,\boldsymbol{Y}_{in_i}\ \text{are i.i.d. with}\ \mathrm{E}(\boldsymbol{Y}_{i1})=\boldsymbol{\mu}_i,\ \mathrm{Cov}(\boldsymbol{Y}_{i1})=\boldsymbol{\Sigma}_i,\ i=1,\ldots,k,$$ where the covariance matrices \(\boldsymbol{\Sigma}_i\) may differ across groups.
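To make the setting concrete, the sketch below simulates \(k\) groups whose covariance matrices differ, and stores them in the list-of-matrices layout expected by Y. The helper `simulate_groups()` and the choice \(\boldsymbol{\Sigma}_i = s_i^2 \boldsymbol{I}_p\) are illustrative assumptions, not part of HDNRA or the paper.

```r
# Hypothetical simulation helper (not part of HDNRA): group i has mean mu[[i]]
# and covariance Sigma_i = scales[i]^2 * I_p, so covariances differ across groups.
simulate_groups <- function(n, p, mu, scales) {
  lapply(seq_along(n), function(i) {
    Z <- matrix(rnorm(n[i] * p), nrow = n[i], ncol = p)
    sweep(scales[i] * Z, 2, mu[[i]], "+")  # add mu_i to every row
  })
}

set.seed(2025)
p <- 50
n <- c(20, 30, 40)
mu <- list(rep(0, p), rep(0, p), rep(0, p))
Y <- simulate_groups(n, p, mu, scales = c(1, 2, 3))
sapply(Y, dim)  # column i holds (n_i, p)
```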
It is of interest to test the k-sample linear hypothesis $$H_0:\ \sum_{i=1}^k B_i\boldsymbol{\mu}_i=\boldsymbol{0}\quad \text{vs.}\quad H_1:\ \sum_{i=1}^k B_i\boldsymbol{\mu}_i\neq\boldsymbol{0}.$$
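The quantity in \(H_0\) has a natural plug-in estimate, \(\sum_{i=1}^k B_i \bar{\boldsymbol{Y}}_i\), obtained by replacing each \(\boldsymbol{\mu}_i\) with the sample mean vector of group \(i\). The sketch below computes it with a hypothetical helper `linear_combination_of_means()`; it only illustrates the hypothesis and is not the statistic \(T_n\) used by this function.

```r
# Hypothetical helper: plug-in estimate of sum_i B_i mu_i, replacing each mu_i
# by the column means (sample mean vector) of group i.
linear_combination_of_means <- function(Y, B) {
  stopifnot(length(Y) == length(B))
  Reduce(`+`, Map(function(Yi, Bi) Bi * colMeans(Yi), Y, B))
}

# Toy check with p = 2: group means are (1, 2) and (3, 5), and B = (1, -1)
Y <- list(matrix(c(1, 1, 2, 2), nrow = 2), matrix(c(3, 3, 5, 5), nrow = 2))
linear_combination_of_means(Y, B = c(1, -1))  # c(-2, -3)
```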
Li et al. (2025) proposed a random-integration-based U-statistic \(T_n\) (Eq. (5) in the paper), constructed using the weight matrix \(\boldsymbol{W}=\boldsymbol{\Omega}+\boldsymbol{A}\boldsymbol{A}^\top\) with \(\boldsymbol{\Omega}=\mathrm{diag}(O_1^2,\ldots,O_p^2)\). They showed that, under \(H_0\), the standardized statistic \(Z=T_n/\sqrt{\hat{\sigma}^2}\), where \(\hat{\sigma}^2\) is an estimate of the variance of \(T_n\), is approximately distributed as \(N(0,1)\).
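The weight matrix itself follows directly from O and A. The sketch below forms \(\boldsymbol{W}\) with a hypothetical helper `make_weight_matrix()`; how \(T_n\) then uses \(\boldsymbol{W}\) is given by Eq. (5) of the paper and is not reproduced here.

```r
# Weight matrix W = Omega + A A^T with Omega = diag(O_1^2, ..., O_p^2),
# following the definition above. Hypothetical helper, not part of HDNRA.
make_weight_matrix <- function(O, A) {
  stopifnot(length(O) == length(A))
  diag(O^2, nrow = length(O)) + tcrossprod(A)  # tcrossprod(A) = A %*% t(A)
}

W <- make_weight_matrix(O = c(1, 2, 3), A = c(1, 1, 1))
W  # diagonal 2, 5, 10; every off-diagonal entry equals 1
```

Because \(\boldsymbol{\Omega}\) is positive definite whenever every \(O_j \neq 0\) and \(\boldsymbol{A}\boldsymbol{A}^\top\) is positive semidefinite, \(\boldsymbol{W}\) is then positive definite.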
A recommended default choice of tuning parameters is \(A_1=\cdots=A_p=\sqrt{5}\,p^{-3/8}\) and \(O_j=\sqrt{\epsilon\left(1+\frac{2j}{3p}\right)}\), \(j=1,\ldots,p\), for a user-chosen constant \(\epsilon>0\).
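The recommended defaults can be computed as below, using a hypothetical helper `default_tuning()` that follows the formula above. Note that the Examples section builds O as `sqrt(2) * (1 + 2*(1:p)/(3*p))`, which is not the same vector as this formula with \(\epsilon = 2\); this sketch follows the Details formula.

```r
# Hypothetical helper computing the recommended tuning parameters:
# A_j = sqrt(5) * p^(-3/8) and O_j = sqrt(eps * (1 + 2j / (3p))), j = 1, ..., p.
default_tuning <- function(p, eps) {
  stopifnot(p >= 1, eps > 0)
  j <- seq_len(p)
  list(A = rep(sqrt(5) * p^(-3/8), p),
       O = sqrt(eps * (1 + 2 * j / (3 * p))))
}

tp <- default_tuning(p = 2000, eps = 2)
range(tp$O^2)  # diagonal of Omega: from just above 2 up to 2 * 5/3
```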
References
Li J, Hong S, Niu Z, Bai Z (2025). “Test for high-dimensional linear hypothesis of mean vectors via random integration.” Statistical Papers, 66(1), 8.
Examples
# \donttest{
library("HDNRA")
data("corneal")
# corneal: 150 x p, split into 4 groups (n_i x p)
group1 <- as.matrix(corneal[1:43, ]) # normal
group2 <- as.matrix(corneal[44:57, ]) # unilateral suspect
group3 <- as.matrix(corneal[58:78, ]) # suspect map
group4 <- as.matrix(corneal[79:150,]) # clinical keratoconus
Y <- list(group2, group3, group4) # k = 3 groups; group1 (normal) is not used here
n <- c(nrow(group2), nrow(group3), nrow(group4))
p <- ncol(group2)
# One linear combination (example): B = (4, -1.5, -2.5)
B <- c(4, -1.5, -2.5)
# Tuning parameters with eps = 2: A_j = sqrt(5) * p^(-3/8), O_j = sqrt(2) * (1 + 2j/(3p))
A <- rep(sqrt(5) * p^(-3/8), p)
O <- sqrt(2) * (1 + 2*(1:p)/(3*p))
LHNB2025.GLHTBF.NABT(Y, B, O, A, n, p)
#>
#> Results of Hypothesis Test
#> --------------------------
#>
#> Test name: Random integration test
#>
#> Null Hypothesis: Linear combination of mean vectors is 0
#>
#> Alternative Hypothesis: Linear combination of mean vectors is not 0
#>
#> Data: Y
#>
#> Sample Sizes: n1 = 14
#> n2 = 21
#> n3 = 72
#>
#> Sample Dimension: 2000
#>
#> Test Statistic: Z[RI] = -0.0521
#>
#> Approximation method to the Normal approximation
#> null distribution of Z[RI]:
#>
#> Approximation parameter(s): Tn = -5.4898
#> sigma^2 = 11082.9278
#>
#> P-value: 0.5207941
#>
# }
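The printed statistic and p-value can be cross-checked from the two printed approximation parameters. The sketch below assumes an upper-tailed p-value, \(P(N(0,1) > Z)\); this is inferred from the printed numbers, not taken from the package source.

```r
# Recompute Z[RI] = Tn / sqrt(sigma^2) and the p-value from the printed values.
# Assumption: the p-value is upper-tailed, i.e. P(N(0, 1) > Z).
Tn     <- -5.4898
sigma2 <- 11082.9278
Z <- Tn / sqrt(sigma2)
p_value <- pnorm(Z, lower.tail = FALSE)
round(c(Z = Z, p_value = p_value), 4)  # Z = -0.0521, p_value = 0.5208
```

A two-sided p-value would be about 0.958 here, so the printed 0.5207941 matches the upper tail only.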