Zhang et al. (2022)'s test for the general linear hypothesis testing (GLHT) problem for high-dimensional data under heteroscedasticity.

Usage

glhtbf_zzg2022(Y,G,n,p)

Arguments

Y

A list of \(k\) data matrices. The \(i\)th element represents the data matrix (\(p\times n_i\)) from the \(i\)th population with each column representing a \(p\)-dimensional observation.

G

A known full-rank coefficient matrix (\(q\times k\)) with \(\operatorname{rank}(\boldsymbol{G})< k\).

n

A vector of \(k\) sample sizes. The \(i\)th element represents the sample size of group \(i\), \(n_i\).

p

The dimension of data.

Value

A (list) object of S3 class htest containing the following elements:

p.value

the \(p\)-value of the test proposed by Zhang et al. (2022).

statistic

the test statistic proposed by Zhang et al. (2022).

beta

the parameters used in Zhang et al. (2022)'s test.

df

estimated approximate degrees of freedom of Zhang et al. (2022)'s test.

Details

Suppose we have the following \(k\) independent high-dimensional samples: $$ \boldsymbol{y}_{i1},\ldots,\boldsymbol{y}_{in_i} \;\text{are i.i.d. with}\; \operatorname{E}(\boldsymbol{y}_{i1})=\boldsymbol{\mu}_i,\; \operatorname{Cov}(\boldsymbol{y}_{i1})=\boldsymbol{\Sigma}_i,\quad i=1,\ldots,k. $$ It is of interest to test the following GLHT problem: $$H_0: \boldsymbol{G M}=\boldsymbol{0} \quad \text{vs.} \quad H_1: \boldsymbol{G M} \neq \boldsymbol{0},$$ where \(\boldsymbol{M}=(\boldsymbol{\mu}_1,\ldots,\boldsymbol{\mu}_k)^\top\) is a \(k\times p\) matrix collecting the \(k\) mean vectors and \(\boldsymbol{G}\) is a known \(q\times k\) full-rank coefficient matrix with \(\operatorname{rank}(\boldsymbol{G})<k\).

Zhang et al. (2022) proposed the following test statistic: $$ T_{ZZG}=\|\boldsymbol{C} \hat{\boldsymbol{\mu}}\|^2, $$ where \(\boldsymbol{C}=[(\boldsymbol{G D G}^\top)^{-1/2}\boldsymbol{G}]\otimes\boldsymbol{I}_p\) with \(\boldsymbol{D}=\operatorname{diag}(1/n_1,\ldots,1/n_k)\), and \(\hat{\boldsymbol{\mu}}=(\bar{\boldsymbol{y}}_1^\top,\ldots,\bar{\boldsymbol{y}}_k^\top)^\top\) with \(\bar{\boldsymbol{y}}_{i},i=1,\ldots,k\) being the sample mean vectors.
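
As a rough illustration of the formula above, the statistic can be evaluated directly from the group means, since \(\|\boldsymbol{C}\hat{\boldsymbol{\mu}}\|^2=\operatorname{tr}(\boldsymbol{H}\hat{\boldsymbol{M}}\hat{\boldsymbol{M}}^\top)\) with \(\boldsymbol{H}=\boldsymbol{G}^\top(\boldsymbol{G D G}^\top)^{-1}\boldsymbol{G}\) and \(\hat{\boldsymbol{M}}\) the \(k\times p\) matrix of sample mean vectors. The helper name zzg_statistic below is hypothetical and is not exported by the package; it is a minimal base-R sketch that assumes Y, G and n are laid out as described under Arguments, and it is not meant to reproduce the package's internal implementation.

```r
# Hypothetical helper (not part of the package): evaluates
# T_ZZG = || C mu_hat ||^2 from the definition given in Details.
zzg_statistic <- function(Y, G, n) {
  k <- length(Y)
  stopifnot(length(n) == k, ncol(G) == k)
  # k x p matrix whose i-th row is the sample mean vector of group i
  Mhat <- do.call(rbind, lapply(Y, rowMeans))
  # D = diag(1/n_1, ..., 1/n_k)
  D <- diag(1 / n, nrow = k)
  # H = G' (G D G')^{-1} G, so that C'C = H (x) I_p
  H <- t(G) %*% solve(G %*% D %*% t(G)) %*% G
  sum(diag(H %*% Mhat %*% t(Mhat)))
}

# Toy two-group check with G = (1, -1): the statistic reduces to
# n1 * n2 / (n1 + n2) * || ybar_1 - ybar_2 ||^2
Y_toy <- list(matrix(c(1, 2, 3, 4), nrow = 2), matrix(0, nrow = 2, ncol = 2))
G_toy <- matrix(c(1, -1), nrow = 1)
n_toy <- c(2, 2)
T_toy <- zzg_statistic(Y_toy, G_toy, n_toy)
```

Writing the statistic through the small \(k\times k\) matrix \(\boldsymbol{H}\) avoids forming the \(qp\times kp\) Kronecker product \(\boldsymbol{C}\) explicitly, which matters when \(p\) is large.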

They showed that under the null hypothesis, \(T_{ZZG}\) and a chi-squared-type mixture have the same normal or non-normal limiting distribution.
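
The beta and df elements returned by the function suggest a scaled chi-squared reference of the form \(T_{ZZG}\approx\beta\chi^2_d\). This reading is an assumption made here rather than something stated on this page, although the values printed in the Examples output appear consistent with it. Under that assumption, the \(p\)-value could be recovered from the other returned elements roughly as follows (variable names are illustrative):

```r
# Values copied from the printed Examples output on this page
statistic <- 192.13
beta <- 2.7436
df <- 68.7634

# Assumed reference distribution (not confirmed by this page):
# statistic / beta is compared with a chi-squared distribution on df degrees of freedom
p_approx <- pchisq(statistic / beta, df = df, lower.tail = FALSE)
```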

References

Zhang J, Zhou B, Guo J (2022). “Linear hypothesis testing in high-dimensional heteroscedastic one-way MANOVA: A normal reference \(L^2\)-norm based test.” Journal of Multivariate Analysis, 187, 104816. doi:10.1016/j.jmva.2021.104816.

Examples

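set.seed(1234)
# Simulate k = 3 groups of p-dimensional observations with unequal covariances
k <- 3
p <- 50
n <- c(25, 30, 40)
rho <- 0.1
# All group means are zero, so the null hypothesis GM = 0 holds
M <- matrix(rep(0, k * p), nrow = k, ncol = p)
avec <- seq(1, k)
Y <- list()
for (g in 1:k) {
  a <- avec[g]
  # Gamma is a symmetric square root of a * ((1 - rho) * I_p + rho * J_p)
  y <- (-2 * sqrt(a * (1 - rho)) + sqrt(4 * a * (1 - rho) + 4 * p * a * rho)) / (2 * p)
  x <- y + sqrt(a * (1 - rho))
  Gamma <- matrix(rep(y, p * p), nrow = p)
  diag(Gamma) <- rep(x, p)
  Z <- matrix(rnorm(n[g] * p, mean = 0, sd = 1), p, n[g])
  # p x n[g] data matrix: each column is one observation
  Y[[g]] <- Gamma %*% Z + t(t(M[g, ])) %*% (rep(1, n[g]))
}
# Contrast matrix for testing mu_1 = mu_2 = mu_3
G <- cbind(diag(k - 1), rep(-1, k - 1))
glhtbf_zzg2022(Y, G, n, p)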
#> 
#> 
#> 
#> data:  
#> statistic = 192.13, df = 68.7634, beta = 2.7436, p-value = 0.4348
#>