Test proposed by Zhu et al. (2022) — glht

Zhu et al. (2022)'s test for the general linear hypothesis testing (GLHT) problem for high-dimensional data, assuming that the underlying covariance matrices are the same.

Usage

glht_zzz2022(Y,X,C)

Arguments

Y: An $n\times p$ response matrix obtained by independently observing a $p$-dimensional response variable for $n$ subjects.
X: A known $n\times k$ full-rank design matrix with $\operatorname{rank}(\boldsymbol{X})=k<n-2$.
C: A known matrix of size $q\times k$ with $\operatorname{rank}(\boldsymbol{C})=q<k$.
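The dimension and rank conditions above can be checked before calling the test. The following is a minimal base-R sketch; `check_glht_inputs` is a hypothetical helper introduced here for illustration and is not part of the package, and the small one-way layout is made up for the demonstration.

```r
# Hypothetical helper (not part of the package): verify the documented
# conditions on Y, X and C and stop with an error if any of them fails.
check_glht_inputs <- function(Y, X, C) {
  n <- nrow(X)
  k <- ncol(X)
  q <- nrow(C)
  stopifnot(
    nrow(Y) == n,       # Y and X describe the same n subjects
    ncol(C) == k,       # C is q x k
    qr(X)$rank == k,    # X has full column rank
    k < n - 2,
    qr(C)$rank == q,    # C has full row rank
    q < k
  )
  invisible(TRUE)
}

# Illustration: three groups of sizes 4, 5 and 6, and p = 10 responses.
set.seed(1)
n_g <- c(4, 5, 6)
X <- diag(3)[rep(1:3, n_g), ]                # 15 x 3 group-indicator design
Y <- matrix(rnorm(sum(n_g) * 10), ncol = 10) # 15 x 10 response matrix
C <- cbind(diag(2), -rep(1, 2))              # 2 x 3 contrast matrix
check_glht_inputs(Y, X, C)
```

The rank is computed with `qr()`, which uses a numerical tolerance, so nearly collinear columns may be reported as rank-deficient.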

Value

A (list) object of S3 class htest containing the following elements:

p.value: the $p$-value of the test proposed by Zhu et al. (2022).
statistic: the test statistic proposed by Zhu et al. (2022).
df: the estimated approximate degrees of freedom of Zhu et al. (2022)'s test.

Details

A high-dimensional linear regression model can be expressed as $$\boldsymbol{Y}=\boldsymbol{X\Theta}+\boldsymbol{\epsilon},$$ where $\boldsymbol{\Theta}$ is a $k\times p$ unknown parameter matrix and $\boldsymbol{\epsilon}$ is an $n\times p$ error matrix.

It is of interest to test the following GLHT problem $$H_0: \boldsymbol{C\Theta}=\boldsymbol{0}, \quad \text { vs. } H_1: \boldsymbol{C\Theta} \neq \boldsymbol{0}.$$
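As a concrete reading of this hypothesis, take the contrast matrix used in the Examples section, $\boldsymbol{C}=(\boldsymbol{I}_q, -\boldsymbol{1}_q)$ with $k=3$ and $q=2$. Writing $\boldsymbol{\theta}_i^\top$ for the $i$-th row of $\boldsymbol{\Theta}$ (notation introduced here for illustration only), $$\boldsymbol{C\Theta}=\begin{pmatrix}1 & 0 & -1\\ 0 & 1 & -1\end{pmatrix}\begin{pmatrix}\boldsymbol{\theta}_1^\top\\ \boldsymbol{\theta}_2^\top\\ \boldsymbol{\theta}_3^\top\end{pmatrix}=\begin{pmatrix}(\boldsymbol{\theta}_1-\boldsymbol{\theta}_3)^\top\\ (\boldsymbol{\theta}_2-\boldsymbol{\theta}_3)^\top\end{pmatrix},$$ so $H_0$ holds exactly when $\boldsymbol{\theta}_1=\boldsymbol{\theta}_2=\boldsymbol{\theta}_3$. With the group-indicator design matrix of that example, this is the hypothesis that the three group mean vectors are equal.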

Zhu et al. (2022) proposed the following test statistic: $$T_{ZZZ}=\frac{(n-k-2)}{(n-k)pq}\operatorname{tr}(\boldsymbol{S}_h\boldsymbol{D}^{-1}),$$ where $\boldsymbol{S}_h$ and $\boldsymbol{S}_e$ are the variation matrices due to the hypothesis and error, respectively, and $\boldsymbol{D}$ is the diagonal matrix with the diagonal elements of $\boldsymbol{S}_e/(n-k)$. They showed that under the null hypothesis, $T_{ZZZ}$ and a chi-squared-type mixture have the same limiting distribution.
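The statistic can be computed directly from its definition. The sketch below uses base R only and is not the package's implementation. It assumes the standard GLHT forms $\boldsymbol{S}_h=(\boldsymbol{C}\hat{\boldsymbol{\Theta}})^\top[\boldsymbol{C}(\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{C}^\top]^{-1}\boldsymbol{C}\hat{\boldsymbol{\Theta}}$ and $\boldsymbol{S}_e=(\boldsymbol{Y}-\boldsymbol{X}\hat{\boldsymbol{\Theta}})^\top(\boldsymbol{Y}-\boldsymbol{X}\hat{\boldsymbol{\Theta}})$, with $\hat{\boldsymbol{\Theta}}$ the least-squares estimate; these formulas are not spelled out in this page. The data and all variable names are made up for illustration.

```r
# Sketch: compute T_ZZZ from its definition (assumed standard S_h and S_e).
set.seed(1)
n <- 30; k <- 3; p <- 5; q <- 2
X <- cbind(1, matrix(rnorm(n * (k - 1)), nrow = n)) # illustrative full-rank design
Y <- matrix(rnorm(n * p), nrow = n)                 # responses generated with Theta = 0
C <- cbind(diag(q), -rep(1, q))                     # q x k contrast matrix

XtX_inv   <- solve(crossprod(X))                    # (X'X)^{-1}
Theta_hat <- XtX_inv %*% crossprod(X, Y)            # least-squares estimate, k x p
CTheta    <- C %*% Theta_hat                        # q x p

# Variation matrices due to the hypothesis and to error (both p x p).
S_h <- t(CTheta) %*% solve(C %*% XtX_inv %*% t(C)) %*% CTheta
S_e <- crossprod(Y - X %*% Theta_hat)

# D is diagonal, so tr(S_h D^{-1}) reduces to sum(diag(S_h) / diag(D)).
d <- diag(S_e) / (n - k)
T_zzz <- (n - k - 2) / ((n - k) * p * q) * sum(diag(S_h) / d)
T_zzz
```

Because $\boldsymbol{D}$ is diagonal, only the diagonals of $\boldsymbol{S}_h$ and $\boldsymbol{S}_e$ enter the statistic, so for large $p$ the full $p\times p$ matrices need not be formed.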

References

Zhu T, Zhang L, Zhang J (2023). “Hypothesis Testing in High-Dimensional Linear Regression: A Normal Reference Scale-Invariant Test.” Statistica Sinica. doi:10.5705/ss.202020.0362.

Examples

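set.seed(1234)
k <- 3                 # number of groups (columns of X)
q <- k - 1             # number of contrasts (rows of C)
p <- 50                # dimension of the response
n <- c(25, 30, 40)     # group sample sizes
rho <- 0.01            # common correlation between response components
Theta <- matrix(rep(0, k * p), nrow = k) # all-zero coefficients, so H0 holds
# Group-indicator design matrix: column j is 1 for the subjects in group j
X <- matrix(c(rep(1, n[1]), rep(0, sum(n)), rep(1, n[2]), rep(0, sum(n)), rep(1, n[3])),
  ncol = k, nrow = sum(n)
)
# Gamma is the symmetric square root of (1 - rho) * I + rho * J,
# where J is the p x p matrix of ones
y <- (-2 * sqrt(1 - rho) + sqrt(4 * (1 - rho) + 4 * p * rho)) / (2 * p)
x <- y + sqrt((1 - rho))
Gamma <- matrix(rep(y, p * p), nrow = p)
diag(Gamma) <- rep(x, p)
# Standard normal noise, one column per subject
U <- matrix(ncol = sum(n), nrow = p)
for (i in seq_len(sum(n))) {
  U[, i] <- rnorm(p, 0, 1)
}
Y <- X %*% Theta + t(U) %*% Gamma
C <- cbind(diag(q), -rep(1, q)) # contrasts of groups 1 and 2 against group 3
glht_zzz2022(Y, X, C)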
#> 
#> 
#> 
#> data:  
#> statistic = 1.0496, df = 105.45, p-value = 0.3447
#>