The test of Zhu et al. (2022) for the general linear hypothesis testing (GLHT) problem for high-dimensional data, assuming that the underlying covariance matrices are the same.

Usage

glht_zzz2022(Y,X,C)

Arguments

Y

An \(n\times p\) response matrix obtained by independently observing a \(p\)-dimensional response variable for \(n\) subjects.

X

A known \(n\times k\) full-rank design matrix with \(\operatorname{rank}(\boldsymbol{X})=k<n-2\).

C

A known matrix of size \(q\times k\) with \(\operatorname{rank}(\boldsymbol{C})=q<k\).

Value

A (list) object of S3 class htest containing the following elements:

p.value

the \(p\)-value of the test proposed by Zhu et al. (2022).

statistic

the test statistic proposed by Zhu et al. (2022).

df

estimated approximate degrees of freedom of Zhu et al. (2022)'s test.

Details

A high-dimensional linear regression model can be expressed as $$\boldsymbol{Y}=\boldsymbol{X\Theta}+\boldsymbol{\epsilon},$$ where \(\boldsymbol{\Theta}\) is a \(k\times p\) unknown parameter matrix and \(\boldsymbol{\epsilon}\) is an \(n\times p\) error matrix.

It is of interest to test the following GLHT problem: $$H_0: \boldsymbol{C\Theta}=\boldsymbol{0} \quad \text{vs.} \quad H_1: \boldsymbol{C\Theta} \neq \boldsymbol{0}.$$

Zhu et al. (2022) proposed the following test statistic: $$T_{ZZZ}=\frac{(n-k-2)}{(n-k)pq}\operatorname{tr}(\boldsymbol{S}_h\boldsymbol{D}^{-1}),$$ where \(\boldsymbol{S}_h\) and \(\boldsymbol{S}_e\) are the variation matrices due to the hypothesis and error, respectively, and \(\boldsymbol{D}\) is the diagonal matrix with the diagonal elements of \(\boldsymbol{S}_e/(n-k)\). They showed that under the null hypothesis, \(T_{ZZZ}\) and a chi-squared-type mixture have the same limiting distribution.
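
The formula for \(T_{ZZZ}\) can be sketched directly in R. This is only an illustration, not the package's implementation: the helper name t_zzz_sketch and the demo objects are hypothetical, and it assumes the usual least-squares forms \(\boldsymbol{S}_h=(\boldsymbol{C}\hat{\boldsymbol{\Theta}})^\top[\boldsymbol{C}(\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{C}^\top]^{-1}\boldsymbol{C}\hat{\boldsymbol{\Theta}}\) and \(\boldsymbol{S}_e=(\boldsymbol{Y}-\boldsymbol{X}\hat{\boldsymbol{\Theta}})^\top(\boldsymbol{Y}-\boldsymbol{X}\hat{\boldsymbol{\Theta}})\), which this page does not state explicitly. It computes the statistic only, not the degrees of freedom or the \(p\)-value.

```r
# Hypothetical helper (not part of the package): evaluates T_ZZZ from the
# formula above, ASSUMING the standard least-squares forms of S_h and S_e.
t_zzz_sketch <- function(Y, X, C) {
  n <- nrow(Y)
  p <- ncol(Y)
  k <- ncol(X)
  q <- nrow(C)
  XtX_inv <- solve(crossprod(X))             # (X'X)^{-1}
  Theta_hat <- XtX_inv %*% crossprod(X, Y)   # least-squares estimate, k x p
  CT <- C %*% Theta_hat                      # q x p
  # Assumed hypothesis variation matrix:
  # S_h = (C Theta_hat)' [C (X'X)^{-1} C']^{-1} (C Theta_hat)
  S_h <- t(CT) %*% solve(C %*% XtX_inv %*% t(C), CT)
  # Assumed error variation matrix S_e = R'R; only its diagonal is needed
  R <- Y - X %*% Theta_hat
  d <- colSums(R^2) / (n - k)                # diagonal elements of D
  # tr(S_h D^{-1}) equals the sum of diag(S_h) / diag(D)
  (n - k - 2) / ((n - k) * p * q) * sum(diag(S_h) / d)
}

# Small made-up data set: three groups, five response variables
set.seed(1)
n_g <- c(8, 9, 10)
X_demo <- diag(3)[rep(1:3, n_g), ]           # group-indicator design matrix
Y_demo <- matrix(rnorm(sum(n_g) * 5), ncol = 5)
C_demo <- cbind(diag(2), -rep(1, 2))         # compare groups 1 and 2 with group 3
t_zzz_sketch(Y_demo, X_demo, C_demo)
```

Because each diagonal element of \(\boldsymbol{S}_h\) is divided by the matching diagonal element of \(\boldsymbol{D}\), rescaling any column of \(\boldsymbol{Y}\) by a positive constant leaves the value unchanged, which is the scale invariance referred to in the title of the reference.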

References

Zhu T, Zhang L, Zhang J (2023). “Hypothesis Testing in High-Dimensional Linear Regression: A Normal Reference Scale-Invariant Test.” Statistica Sinica. doi:10.5705/ss.202020.0362.

Examples

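set.seed(1234)
k <- 3 # number of groups (columns of X)
q <- k - 1 # number of contrasts (rows of C)
p <- 50 # dimension of the response
n <- c(25, 30, 40) # group sample sizes
rho <- 0.01 # common correlation between response components
Theta <- matrix(rep(0, k * p), nrow = k) # all coefficients zero, so H0 holds
# Design matrix of group indicators, one column per group
X <- matrix(c(rep(1, n[1]), rep(0, sum(n)), rep(1, n[2]), rep(0, sum(n)), rep(1, n[3])),
  ncol = k, nrow = sum(n)
)
# Gamma is the symmetric square root of (1 - rho) * I + rho * J,
# with x on the diagonal and y elsewhere
y <- (-2 * sqrt(1 - rho) + sqrt(4 * (1 - rho) + 4 * p * rho)) / (2 * p)
x <- y + sqrt((1 - rho))
Gamma <- matrix(rep(y, p * p), nrow = p)
diag(Gamma) <- rep(x, p)
# Standard normal noise, one column per subject
U <- matrix(ncol = sum(n), nrow = p)
for (i in 1:sum(n)) {
  U[, i] <- rnorm(p, 0, 1)
}
# Responses with unit variances and common correlation rho
Y <- X %*% Theta + t(U) %*% Gamma
# Contrasts comparing each of the first q groups with the last group
C <- cbind(diag(q), -rep(1, q))
glht_zzz2022(Y, X, C)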
#> 
#> 
#> 
#> data:  
#> statistic = 1.0496, df = 105.45, p-value = 0.3447
#>