Zhu and Zhang (2022)'s test for the general linear hypothesis testing (GLHT) problem for high-dimensional data, assuming that the underlying covariance matrices are the same.
Arguments
- Y
A list of \(k\) data matrices. The \(i\)th element represents the data matrix (\(p\times n_i\)) from the \(i\)th population with each column representing a \(p\)-dimensional observation.
- G
A known full-rank coefficient matrix (\(q\times k\)) with \(\operatorname{rank}(\boldsymbol{G})<k\).
- n
A vector of \(k\) sample sizes. The \(i\)th element represents the sample size of group \(i\), \(n_i\).
- p
The dimension of data.
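As a minimal sketch of how these arguments fit together, the base R snippet below builds a contrast matrix for comparing \(k=3\) group means and checks the stated conditions before a call. The helper name check_glht_inputs is hypothetical and not part of the package, and reading "full-rank" as full row rank (\(\operatorname{rank}(\boldsymbol{G})=q\)) is an assumption.

```r
# Hypothetical helper (not part of the package): checks that Y, G, n and p
# agree with the argument descriptions above.
check_glht_inputs <- function(Y, G, n, p) {
  k <- length(Y)
  stopifnot(
    ncol(G) == k,                            # G is q x k
    qr(G)$rank == nrow(G),                   # assumed meaning of "full-rank": rank(G) = q
    qr(G)$rank < k,                          # rank(G) < k
    length(n) == k,                          # one sample size per group
    all(vapply(Y, nrow, integer(1)) == p),   # each Y[[i]] has p rows
    all(vapply(Y, ncol, integer(1)) == n)    # and n_i columns
  )
  invisible(TRUE)
}

k <- 3
p <- 5
n <- c(4, 6, 5)
Y <- lapply(n, function(ni) matrix(rnorm(p * ni), p, ni))
# Rows of G encode mu_1 - mu_3 and mu_2 - mu_3, so G M = 0 means all means are equal.
G <- cbind(diag(k - 1), rep(-1, k - 1))
check_glht_inputs(Y, G, n, p)
```

A square identity matrix such as diag(k) would fail the check, because its rank equals \(k\).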
Value
A (list) object of S3 class htest containing the following elements:
- p.value
the \(p\)-value of the test proposed by Zhu and Zhang (2022).
- statistic
the test statistic proposed by Zhu and Zhang (2022).
- beta0
the parameter used in Zhu and Zhang (2022)'s test.
- beta1
the parameter used in Zhu and Zhang (2022)'s test.
- df
estimated approximate degrees of freedom of Zhu and Zhang (2022)'s test.
Details
Suppose we have the following \(k\) independent high-dimensional samples: $$ \boldsymbol{y}_{i1},\ldots,\boldsymbol{y}_{in_i}, \;\operatorname{are \; i.i.d. \; with}\; \operatorname{E}(\boldsymbol{y}_{i1})=\boldsymbol{\mu}_i,\; \operatorname{Cov}(\boldsymbol{y}_{i1})=\boldsymbol{\Sigma},\; i=1,\ldots,k. $$ It is of interest to test the following GLHT problem: $$H_0: \boldsymbol{G M}=\boldsymbol{0}, \quad \text { vs. } \quad H_1: \boldsymbol{G M} \neq \boldsymbol{0},$$ where \(\boldsymbol{M}=(\boldsymbol{\mu}_1,\ldots,\boldsymbol{\mu}_k)^\top\) is a \(k\times p\) matrix collecting \(k\) mean vectors and \(\boldsymbol{G}:q\times k\) is a known full-rank coefficient matrix with \(\operatorname{rank}(\boldsymbol{G})<k\).
Zhu and Zhang (2022) proposed the following test statistic: $$ T_{ZZ}=\|\boldsymbol{C} \hat{\boldsymbol{\mu}}\|^2-q \operatorname{tr}(\hat{\boldsymbol{\Sigma}}), $$ where \(\boldsymbol{C}=[(\boldsymbol{G D G}^\top)^{-1/2}\boldsymbol{G}]\otimes\boldsymbol{I}_p\) with \(\boldsymbol{D}=\operatorname{diag}(1/n_1,\ldots,1/n_k)\), \(\hat{\boldsymbol{\mu}}=(\bar{\boldsymbol{y}}_1^\top,\ldots,\bar{\boldsymbol{y}}_k^\top)^\top\) stacks the sample mean vectors \(\bar{\boldsymbol{y}}_{i},i=1,\ldots,k\), and \(\hat{\boldsymbol{\Sigma}}\) is the usual pooled sample covariance matrix of the \(k\) samples. The term \(q \operatorname{tr}(\hat{\boldsymbol{\Sigma}})\) centers the statistic, since \(\operatorname{E}\|\boldsymbol{C} \hat{\boldsymbol{\mu}}\|^2=q \operatorname{tr}(\boldsymbol{\Sigma})\) under \(H_0\).
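To make the formula concrete, here is a minimal base R sketch that evaluates \(T_{ZZ}\) directly. It is not the package's implementation: tzz_sketch is a hypothetical name, and the sketch takes \(\boldsymbol{D}=\operatorname{diag}(1/n_1,\ldots,1/n_k)\) and a pooled covariance with divisor \(n-k\), where \(n=n_1+\cdots+n_k\); both are assumptions of this sketch. It relies on the identity \(\|\boldsymbol{C}\hat{\boldsymbol{\mu}}\|^2=\operatorname{tr}\{(\boldsymbol{G}\hat{\boldsymbol{M}})^\top(\boldsymbol{G D G}^\top)^{-1}\boldsymbol{G}\hat{\boldsymbol{M}}\}\), with \(\hat{\boldsymbol{M}}\) the \(k\times p\) matrix of sample means, so the Kronecker product is never formed.

```r
# Hypothetical helper (not the package's code): evaluates T_ZZ from the formula,
# assuming D = diag(1/n_i) and a pooled covariance with divisor n - k.
tzz_sketch <- function(Y, G) {
  n <- vapply(Y, ncol, integer(1))                 # group sizes n_i
  k <- length(Y)
  q <- nrow(G)
  p <- nrow(Y[[1]])
  Mhat <- t(vapply(Y, rowMeans, numeric(p)))       # k x p matrix of sample means
  D <- diag(1 / n, nrow = k)
  GM <- G %*% Mhat                                 # q x p
  # ||C mu_hat||^2 = tr{ (G Mhat)' (G D G')^{-1} (G Mhat) }
  quad <- sum(GM * solve(G %*% D %*% t(G), GM))
  # tr(pooled covariance) = within-group sum of squares / (n - k)
  wss <- sum(vapply(Y, function(Yi) sum((Yi - rowMeans(Yi))^2), numeric(1)))
  quad - q * wss / (sum(n) - k)
}

# Small illustrative data set: k = 3 groups, p = 4
set.seed(1)
Y <- list(matrix(rnorm(4 * 6), 4, 6),
          matrix(rnorm(4 * 8), 4, 8),
          matrix(rnorm(4 * 7), 4, 7))
G <- cbind(diag(2), rep(-1, 2))
tzz_sketch(Y, G)
```

Working with the \(q\times p\) matrix \(\boldsymbol{G}\hat{\boldsymbol{M}}\) keeps the cost linear in \(p\), which matters when \(p\) is large relative to the sample sizes.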
They showed that under the null hypothesis, \(T_{ZZ}\) and a chi-squared-type mixture have the same normal or non-normal limiting distribution.
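The returned quantities suggest how the \(p\)-value is obtained. As a hedged sketch, assume the null distribution of \(T_{ZZ}\) is approximated by \(\beta_0+\beta_1\chi^2_d\), with beta0, beta1 and df as returned by the function. This reading is an assumption rather than something stated on this page. Under it, the \(p\)-value is an upper chi-squared tail probability:

```r
# Values copied from the printed output in the Examples section
statistic <- 5.345
beta0 <- -51.3143
beta1 <- 2.7212
df <- 18.8571

# Assumed approximation: T_ZZ ~ beta0 + beta1 * chisq(df) under H0, so
# P(T_ZZ > t) = P(chisq(df) > (t - beta0) / beta1)
p_value <- pchisq((statistic - beta0) / beta1, df = df, lower.tail = FALSE)
p_value
```

The result can be compared with the \(p\)-value printed in the Examples; small differences are expected because the inputs are rounded.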
References
Zhu T, Zhang J (2022). “Linear hypothesis testing in high-dimensional one-way MANOVA: a new normal reference approach.” Computational Statistics, 37(1), 1-27. doi:10.1007/s00180-021-01110-6.
Examples
set.seed(1234)
k <- 3
p <- 50
n <- c(25, 30, 40)
rho <- 0.1
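# All group means are zero, so the null hypothesis holds
M <- matrix(rep(0, k * p), nrow = k, ncol = p)
# Gamma is the symmetric square root of the compound-symmetric covariance
# (1 - rho) * I + rho * J: off-diagonal entries y, diagonal entries x
y <- (-2 * sqrt(1 - rho) + sqrt(4 * (1 - rho) + 4 * p * rho)) / (2 * p)
x <- y + sqrt((1 - rho))
Gamma <- matrix(rep(y, p * p), nrow = p)
diag(Gamma) <- rep(x, p)
# Generate one p x n[g] data matrix per group
Y <- list()
for (g in 1:k) {
  Z <- matrix(rnorm(n[g] * p, mean = 0, sd = 1), p, n[g])
  Y[[g]] <- Gamma %*% Z + t(t(M[g, ])) %*% (rep(1, n[g]))
}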
G <- cbind(diag(k - 1), rep(-1, k - 1))
glht_zz2022(Y, G, n, p)
#>
#>
#>
#> data:
#> statistic = 5.345, df = 18.8571, beta0 = -51.3143, beta1 = 2.7212,
#> p-value = 0.3382
#>