Normal-approximation-based test for k-sample linear hypothesis via random integration proposed by Li et al. (2025)

Li et al. (2025)'s test for the general linear hypothesis testing (GLHT) problem with high-dimensional data under heteroscedasticity.

Usage

LHNB2025.GLHTBF.NABT(Y, B, O, A, n, p)

Arguments

Y: A list of $k$ data matrices. The $i$th element represents the data matrix ($n_i \times p$) from the $i$th population with each row representing a $p$-dimensional observation.
B: A vector of $k$ coefficients $(B_1,\ldots,B_k)$ specifying the linear combination of group mean vectors.
O: A length-$p$ vector used to form $\Omega = \mathrm{diag}(O_1^2,\ldots,O_p^2)$.
A: A length-$p$ vector used in $W = \Omega + A A^\top$.
n: A vector of $k$ sample sizes. The $i$th element represents the sample size of group $i$, $n_i$.
p: The dimension of the data, i.e. the number of columns of each data matrix in Y.

Value

A list of class "NRtest" containing the results of the hypothesis test.

Details

Suppose we have $k$ independent high-dimensional samples $$\boldsymbol{Y}_{i1},\ldots,\boldsymbol{Y}_{in_i}\ \text{are i.i.d. with}\ \mathrm{E}(\boldsymbol{Y}_{i1})=\boldsymbol{\mu}_i,\ \mathrm{Cov}(\boldsymbol{Y}_{i1})=\boldsymbol{\Sigma}_i,\ i=1,\ldots,k,$$ where the covariance matrices $\boldsymbol{\Sigma}_i$ may differ across groups.

It is of interest to test the k-sample linear hypothesis $$H_0:\ \sum_{i=1}^k B_i\boldsymbol{\mu}_i=\boldsymbol{0}\quad \text{vs.}\quad H_1:\ \sum_{i=1}^k B_i\boldsymbol{\mu}_i\neq\boldsymbol{0}.$$
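As a small illustration of the quantity under test, the sketch below simulates three groups and computes the plug-in estimate $\sum_{i=1}^k B_i\bar{\boldsymbol{Y}}_i$ of the linear combination of mean vectors. The group sizes, dimension, coefficients, and data are illustrative assumptions, and the sketch does not compute the statistic $T_n$ of the paper.

```r
set.seed(1)
p <- 5                       # illustrative dimension (assumption)
n <- c(10, 12, 15)           # illustrative group sizes (assumption)
B <- c(1, -0.5, -0.5)        # illustrative coefficients (assumption)

# Simulate k = 3 groups with a common zero mean (so H0 holds)
# but different scales, mimicking heteroscedasticity.
Y <- lapply(seq_along(n), function(i) {
  matrix(rnorm(n[i] * p, sd = i), nrow = n[i], ncol = p)
})

# p x k matrix whose i-th column is the sample mean vector of group i
group_means <- sapply(Y, colMeans)

# Plug-in estimate of sum_i B_i * mu_i, a length-p vector
lin_comb <- drop(group_means %*% B)
lin_comb
```

Under $H_0$ every coordinate of this vector has mean zero; the test aggregates evidence across all $p$ coordinates rather than inspecting them one at a time.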

Li et al. (2025) proposed a random-integration-based U-statistic $T_n$ (Eq. (5) in the paper), constructed using the weight matrix $\boldsymbol{W}=\boldsymbol{\Omega}+\boldsymbol{A}\boldsymbol{A}^\top$ with $\boldsymbol{\Omega}=\mathrm{diag}(O_1^2,\ldots,O_p^2)$. They showed that, under $H_0$, the standardized statistic $Z=T_n/\sqrt{\hat{\sigma}^2}$ is approximately $N(0,1)$ distributed.
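To make the weight matrix concrete, the following minimal sketch builds $\boldsymbol{W}=\boldsymbol{\Omega}+\boldsymbol{A}\boldsymbol{A}^\top$ for a small $p$. The numeric values of O and A are arbitrary illustrative assumptions, and the sketch is not taken from the package internals.

```r
p <- 4
O <- c(1.0, 1.1, 1.2, 1.3)   # illustrative values (assumption)
A <- rep(0.5, p)             # illustrative values (assumption)

Omega <- diag(O^2)           # diagonal part: diag(O_1^2, ..., O_p^2)
W <- Omega + outer(A, A)     # rank-one update: Omega + A A^T

W
```

Because $\boldsymbol{\Omega}$ has strictly positive diagonal entries whenever all $O$ values are nonzero and $\boldsymbol{A}\boldsymbol{A}^\top$ is positive semidefinite, the resulting $\boldsymbol{W}$ is symmetric and positive definite.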

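A recommended default choice of tuning parameters is $A_1=\cdots=A_p=\sqrt{5}\,p^{-3/8}$ and $O_j=\sqrt{\epsilon\left(1+\frac{2j}{3p}\right)}$, $j=1,\ldots,p$, for a constant $\epsilon$ (the Examples use $\epsilon=2$). The index $j$ runs over coordinates and is unrelated to the number of groups $k$.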
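The sketch below evaluates the displayed formula literally, with the square root applied to the whole product. The values of p and eps are illustrative assumptions, and `idx` is a hypothetical name for the coordinate index.

```r
p <- 2000                    # illustrative dimension (assumption)
eps <- 2                     # illustrative constant (assumption)
idx <- seq_len(p)            # coordinate index 1, ..., p

A <- rep(sqrt(5) * p^(-3/8), p)            # constant vector of length p
O <- sqrt(eps * (1 + 2 * idx / (3 * p)))   # increasing in the coordinate index

# Diagonal entries of Omega grow linearly from about eps to 5 * eps / 3
range(O^2)
```

With this choice the diagonal weights $O^2$ vary by a factor of at most $5/3$ across coordinates, while the entries of $A$ shrink as $p$ grows.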

References

Li J, Hong S, Niu Z, Bai Z (2025). “Test for high-dimensional linear hypothesis of mean vectors via random integration.” Statistical Papers, 66(1), 8.

Examples

# \donttest{
library("HDNRA")
data("corneal")

# corneal: 150 x p, split into 4 groups (n_i x p)
group1 <- as.matrix(corneal[1:43,  ])      # normal (not used in the test below)
group2 <- as.matrix(corneal[44:57, ])      # unilateral suspect
group3 <- as.matrix(corneal[58:78, ])      # suspect map
group4 <- as.matrix(corneal[79:150,])      # clinical keratoconus

Y <- list(group2, group3, group4)
n <- c(nrow(group2), nrow(group3), nrow(group4))
p <- ncol(group2)

# One linear combination (example): B = (4, -1.5, -2.5)
B <- c(4, -1.5, -2.5)

# Example tuning parameters (eps = 2): A follows the default in Details;
# O_j = sqrt(2) * (1 + 2j/(3p)) for j = 1, ..., p
A <- rep(sqrt(5) * p^(-3/8), p)
O <- sqrt(2) * (1 + 2*(1:p)/(3*p))

LHNB2025.GLHTBF.NABT(Y, B, O, A, n, p)
#> 
#> Results of Hypothesis Test
#> --------------------------
#> 
#> Test name:                       Random integration test
#> 
#> Null Hypothesis:                 Linear combination of mean vectors is 0
#> 
#> Alternative Hypothesis:          Linear combination of mean vectors is not 0
#> 
#> Data:                            Y
#> 
#> Sample Sizes:                    n1 = 14
#>                                  n2 = 21
#>                                  n3 = 72
#> 
#> Sample Dimension:                2000
#> 
#> Test Statistic:                  Z[RI] = -0.0521
#> 
#> Approximation method to the      Normal approximation
#> null distribution of Z[RI]: 
#> 
#> Approximation parameter(s):      Tn      =    -5.4898
#>                                  sigma^2 = 11082.9278
#> 
#> P-value:                         0.5207941
#> 
# }
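The quantities printed above are linked by $Z=T_n/\sqrt{\hat{\sigma}^2}$. The sketch below recomputes the statistic and p-value from the printed (rounded) values of Tn and sigma^2. The use of the upper tail of $N(0,1)$ is an assumption inferred from the printed output, not a documented property of the function.

```r
# Values copied from the printed output above (rounded to 4 decimals)
Tn <- -5.4898
sigma2 <- 11082.9278

# Standardized statistic
Z <- Tn / sqrt(sigma2)

# Upper-tail normal p-value (assumed convention, see lead-in)
p_value <- pnorm(Z, lower.tail = FALSE)

round(Z, 4)
round(p_value, 4)
```

A negative Z therefore yields a p-value above 0.5, which is consistent with the result shown for this example.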