The test proposed by Zhu et al. (2022) for the general linear hypothesis testing (GLHT) problem for high-dimensional data, assuming that the underlying covariance matrices are the same.
Arguments
- Y
An \(n\times p\) response matrix obtained by independently observing a \(p\)-dimensional response variable for \(n\) subjects.
- X
A known \(n\times k\) full-rank design matrix with \(\operatorname{rank}(\boldsymbol{X})=k<n-2\).
- C
A known matrix of size \(q\times k\) with \(\operatorname{rank}(\boldsymbol{C})=q<k\).
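The dimension and rank conditions above can be checked before calling the function. The following is a minimal sketch in base R; the helper name `check_glht_inputs` is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of the package): checks the dimension and
# rank conditions stated for Y, X and C in the Arguments section.
check_glht_inputs <- function(Y, X, C) {
  n <- nrow(X)
  k <- ncol(X)
  q <- nrow(C)
  stopifnot(
    nrow(Y) == n,     # Y and X describe the same n subjects
    ncol(C) == k,     # C is q x k
    qr(X)$rank == k,  # X has full column rank
    k < n - 2,        # rank(X) = k < n - 2
    qr(C)$rank == q,  # C has full row rank
    q < k             # rank(C) = q < k
  )
  invisible(TRUE)
}

# Small illustration: three groups of sizes 4, 5 and 6, eight variables.
set.seed(1)
n_demo <- c(4, 5, 6)
X_demo <- cbind(
  rep(c(1, 0, 0), times = n_demo),
  rep(c(0, 1, 0), times = n_demo),
  rep(c(0, 0, 1), times = n_demo)
)
Y_demo <- matrix(rnorm(sum(n_demo) * 8), nrow = sum(n_demo))
C_demo <- cbind(diag(2), -rep(1, 2))
check_ok <- check_glht_inputs(Y_demo, X_demo, C_demo)
```

The helper stops with an error as soon as one condition fails, for example when a square `C` with \(q=k\) is supplied.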
Value
A (list) object of S3 class htest containing the following elements:
- p.value
the \(p\)-value of the test proposed by Zhu et al. (2022).
- statistic
the test statistic proposed by Zhu et al. (2022).
- df
estimated approximate degrees of freedom of Zhu et al. (2022)'s test.
Details
A high-dimensional linear regression model can be expressed as $$\boldsymbol{Y}=\boldsymbol{X\Theta}+\boldsymbol{\epsilon},$$ where \(\Theta\) is a \(k\times p\) unknown parameter matrix and \(\boldsymbol{\epsilon}\) is an \(n\times p\) error matrix.
It is of interest to test the following GLHT problem $$H_0: \boldsymbol{C\Theta}=\boldsymbol{0}, \quad \text { vs. } H_1: \boldsymbol{C\Theta} \neq \boldsymbol{0}.$$
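As a worked illustration, take \(k=3\), \(q=2\) and \(\boldsymbol{C}=(\boldsymbol{I}_2,-\boldsymbol{1}_2)\), the choice used in the Examples section. Writing the rows of \(\boldsymbol{\Theta}\) as \(\boldsymbol{\theta}_1^\top,\boldsymbol{\theta}_2^\top,\boldsymbol{\theta}_3^\top\) (a notation introduced here for illustration only), $$\boldsymbol{C\Theta}=\begin{pmatrix}1 & 0 & -1\\ 0 & 1 & -1\end{pmatrix}\begin{pmatrix}\boldsymbol{\theta}_1^\top\\ \boldsymbol{\theta}_2^\top\\ \boldsymbol{\theta}_3^\top\end{pmatrix}=\begin{pmatrix}(\boldsymbol{\theta}_1-\boldsymbol{\theta}_3)^\top\\ (\boldsymbol{\theta}_2-\boldsymbol{\theta}_3)^\top\end{pmatrix},$$ so \(H_0\) holds exactly when \(\boldsymbol{\theta}_1=\boldsymbol{\theta}_2=\boldsymbol{\theta}_3\). When the columns of \(\boldsymbol{X}\) are group indicators, this is the hypothesis that the three group mean vectors are equal.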
Zhu et al. (2022) proposed the following test statistic: $$T_{ZZZ}=\frac{(n-k-2)}{(n-k)pq}\operatorname{tr}(\boldsymbol{S}_h\boldsymbol{D}^{-1}),$$ where \(\boldsymbol{S}_h\) and \(\boldsymbol{S}_e\) are the variation matrices due to the hypothesis and error, respectively, and \(\boldsymbol{D}\) is the diagonal matrix with the diagonal elements of \(\boldsymbol{S}_e/(n-k)\). They showed that under the null hypothesis, \(T_{ZZZ}\) and a chi-squared-type mixture have the same limiting distribution.
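The statistic can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: it assumes the usual multivariate linear model definitions \(\hat{\boldsymbol{\Theta}}=(\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top\boldsymbol{Y}\), \(\boldsymbol{S}_h=(\boldsymbol{C}\hat{\boldsymbol{\Theta}})^\top\{\boldsymbol{C}(\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{C}^\top\}^{-1}\boldsymbol{C}\hat{\boldsymbol{\Theta}}\) and \(\boldsymbol{S}_e=(\boldsymbol{Y}-\boldsymbol{X}\hat{\boldsymbol{\Theta}})^\top(\boldsymbol{Y}-\boldsymbol{X}\hat{\boldsymbol{\Theta}})\), which are not spelled out above. The function name `zzz_statistic_sketch` is hypothetical, and the sketch does not compute the degrees of freedom or the \(p\)-value.

```r
# Sketch only: S_h and S_e follow the usual multivariate linear model
# definitions, which is an assumption about what the package computes.
zzz_statistic_sketch <- function(Y, X, C) {
  n <- nrow(X)
  k <- ncol(X)
  p <- ncol(Y)
  q <- nrow(C)
  XtX_inv <- solve(crossprod(X))            # (X'X)^{-1}, k x k
  Theta_hat <- XtX_inv %*% crossprod(X, Y)  # least-squares estimate, k x p
  CTheta <- C %*% Theta_hat                 # q x p
  # Variation matrix due to the hypothesis (p x p)
  S_h <- t(CTheta) %*% solve(C %*% XtX_inv %*% t(C)) %*% CTheta
  # Variation matrix due to error (p x p), built from the residuals
  S_e <- crossprod(Y - X %*% Theta_hat)
  D_diag <- diag(S_e) / (n - k)             # diagonal elements of D
  # D is diagonal, so tr(S_h D^{-1}) = sum_j S_h[j, j] / D[j, j]
  (n - k - 2) / ((n - k) * p * q) * sum(diag(S_h) / D_diag)
}

# Illustration on simulated data: three groups, ten variables.
set.seed(2)
n_grp <- c(6, 7, 8)
X_sim <- cbind(
  rep(c(1, 0, 0), times = n_grp),
  rep(c(0, 1, 0), times = n_grp),
  rep(c(0, 0, 1), times = n_grp)
)
Y_sim <- matrix(rnorm(sum(n_grp) * 10), nrow = sum(n_grp))
C_sim <- cbind(diag(2), -rep(1, 2))
T_sim <- zzz_statistic_sketch(Y_sim, X_sim, C_sim)

# Rescaling each response variable leaves the statistic unchanged.
col_scales <- runif(10, 0.5, 2)
T_rescaled <- zzz_statistic_sketch(Y_sim %*% diag(col_scales), X_sim, C_sim)
```

Because each diagonal element of \(\boldsymbol{S}_h\) is divided by the matching diagonal element of \(\boldsymbol{D}\), multiplying any column of \(\boldsymbol{Y}\) by a nonzero constant cancels out, so `T_sim` and `T_rescaled` agree up to floating-point error.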
References
Zhu T, Zhang L, Zhang J (2023). “Hypothesis Testing in High-Dimensional Linear Regression: A Normal Reference Scale-Invariant Test.” Statistica Sinica. doi:10.5705/ss.202020.0362.
Examples
set.seed(1234)
k <- 3
q <- k - 1
p <- 50
n <- c(25, 30, 40)
rho <- 0.01
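# Null hypothesis holds: all coefficients are zero
Theta <- matrix(rep(0, k * p), nrow = k)
# Design matrix of group indicators: column j is 1 for subjects in group j
X <- matrix(c(rep(1, n[1]), rep(0, sum(n)), rep(1, n[2]), rep(0, sum(n)), rep(1, n[3])),
  ncol = k, nrow = sum(n)
)
# Gamma is the symmetric square root of (1 - rho) * I + rho * J,
# where J is the p x p matrix of ones
y <- (-2 * sqrt(1 - rho) + sqrt(4 * (1 - rho) + 4 * p * rho)) / (2 * p)
x <- y + sqrt((1 - rho))
Gamma <- matrix(rep(y, p * p), nrow = p)
diag(Gamma) <- rep(x, p)
# Each column of U is a vector of p independent standard normal draws
U <- matrix(ncol = sum(n), nrow = p)
for (i in seq_len(sum(n))) {
  U[, i] <- rnorm(p, 0, 1)
}
# Rows of the error matrix t(U) %*% Gamma have covariance Gamma %*% Gamma
Y <- X %*% Theta + t(U) %*% Gamma
# Contrast matrix comparing the first q groups with the last group
C <- cbind(diag(q), -rep(1, q))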
glht_zzz2022(Y, X, C)
#>
#>
#>
#> data:
#> statistic = 1.0496, df = 105.45, p-value = 0.3447
#>