Zhang et al. (2023)'s test for the equality of two high-dimensional mean vectors, without assuming that the two covariance matrices are equal.
Arguments
- y1
The data matrix (p by n1) from the first population. Each column represents a \(p\)-dimensional observation.
- y2
The data matrix (p by n2) from the second population. Each column represents a \(p\)-dimensional observation.
- cutoff
An empirical criterion for applying the adjustment coefficient
Value
A (list) object of S3 class htest containing the following elements:
- p.value
the p-value of the test proposed by Zhang et al. (2023).
- statistic
the test statistic proposed by Zhang et al. (2023).
- df
estimated approximate degrees of freedom of Zhang et al. (2023)'s test.
- cpn
the adjustment coefficient used in Zhang et al. (2023)'s test.
Details
Suppose we have two independent high-dimensional samples: $$ \boldsymbol{y}_{i1},\ldots,\boldsymbol{y}_{in_i} \;\text{are i.i.d. with}\; \operatorname{E}(\boldsymbol{y}_{i1})=\boldsymbol{\mu}_i,\; \operatorname{Cov}(\boldsymbol{y}_{i1})=\boldsymbol{\Sigma}_i,\; i=1,2. $$ The primary objective is to test $$H_{0}: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2\; \text{versus}\; H_{1}: \boldsymbol{\mu}_1 \neq \boldsymbol{\mu}_2.$$ Zhang et al. (2023) proposed the following test statistic: $$T_{ZZZ}=\frac{n_1 n_2}{np}(\bar{\boldsymbol{y}}_1-\bar{\boldsymbol{y}}_2)^{\top} \hat{\boldsymbol{D}}_n^{-1}(\bar{\boldsymbol{y}}_1-\bar{\boldsymbol{y}}_2),$$ where \(\bar{\boldsymbol{y}}_{i},i=1,2\) are the sample mean vectors, \(\hat{\boldsymbol{\Sigma}}_{i},i=1,2\) are the sample covariance matrices, and \(\hat{\boldsymbol{D}}_n=\operatorname{diag}(n_2\hat{\boldsymbol{\Sigma}}_1/n+n_1\hat{\boldsymbol{\Sigma}}_2/n)\) with \(n=n_1+n_2\), so that \(\hat{\boldsymbol{D}}_n\) estimates the diagonal of \(\operatorname{Cov}\{\sqrt{n_1 n_2/n}\,(\bar{\boldsymbol{y}}_1-\bar{\boldsymbol{y}}_2)\}\). They showed that under the null hypothesis, \(T_{ZZZ}\) and a chi-squared-type mixture have the same limiting distribution.
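As a rough illustration of how the statistic is assembled, the sketch below computes \(T_{ZZZ}\) in base R, taking \(\hat{\boldsymbol{D}}_n\) to be the diagonal of \(n_2\hat{\boldsymbol{\Sigma}}_1/n+n_1\hat{\boldsymbol{\Sigma}}_2/n\), the plug-in estimate of the covariance of \(\sqrt{n_1 n_2/n}\,(\bar{\boldsymbol{y}}_1-\bar{\boldsymbol{y}}_2)\). The helper name zzz_statistic and the simulated inputs are hypothetical and not part of the package; the sketch does not reproduce the p-value, df, or cpn, and it is not claimed to match the package's internal computation.

```r
# Hypothetical helper (not part of the package): a sketch of the
# diagonally standardized two-sample statistic described in Details.
zzz_statistic <- function(y1, y2) {
  n1 <- ncol(y1)                      # columns are observations
  n2 <- ncol(y2)
  n <- n1 + n2
  p <- nrow(y1)                       # rows are variables
  d <- rowMeans(y1) - rowMeans(y2)    # difference of sample mean vectors
  s1 <- apply(y1, 1, var)             # diagonal of the sample covariance, sample 1
  s2 <- apply(y2, 1, var)             # diagonal of the sample covariance, sample 2
  d_n <- (n2 * s1 + n1 * s2) / n      # assumed form of diag(D_n-hat)
  (n1 * n2 / (n * p)) * sum(d^2 / d_n)
}

set.seed(1)
y1 <- matrix(rnorm(50 * 20), 50, 20)                 # p = 50, n1 = 20
y2 <- matrix(rnorm(50 * 30, sd = sqrt(2)), 50, 30)   # p = 50, n2 = 30
stat <- zzz_statistic(y1, y2)

# Rescaling each variable (row) by its own positive constant should leave
# the statistic unchanged, which is what "scale-invariant" refers to.
scales <- runif(50, 0.5, 5)
stat_rescaled <- zzz_statistic(scales * y1, scales * y2)
print(all.equal(stat, stat_rescaled))
```

Because each squared mean difference is divided by a variance estimate for the same variable, changing the measurement units of any variable cancels out of the sum.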
References
Zhang L, Zhu T, Zhang J (2023). “Two-sample Behrens–Fisher problems for high-dimensional data: a normal reference scale-invariant test.” Journal of Applied Statistics, 50(3), 456–476. doi:10.1080/02664763.2020.1834516.
Examples
set.seed(1234)
n1 <- 20
n2 <- 30
p <- 50
mu1 <- t(t(rep(0, p)))
mu2 <- mu1
rho1 <- 0.1
rho2 <- 0.2
a1 <- 1
a2 <- 2
w1 <- (-2 * sqrt(a1 * (1 - rho1)) + sqrt(4 * a1 * (1 - rho1) + 4 * p * a1 * rho1)) / (2 * p)
x1 <- w1 + sqrt(a1 * (1 - rho1))
Gamma1 <- matrix(rep(w1, p * p), nrow = p)
diag(Gamma1) <- rep(x1, p)
w2 <- (-2 * sqrt(a2 * (1 - rho2)) + sqrt(4 * a2 * (1 - rho2) + 4 * p * a2 * rho2)) / (2 * p)
x2 <- w2 + sqrt(a2 * (1 - rho2))
Gamma2 <- matrix(rep(w2, p * p), nrow = p)
diag(Gamma2) <- rep(x2, p)
Z1 <- matrix(rnorm(n1 * p, mean = 0, sd = 1), p, n1)
Z2 <- matrix(rnorm(n2 * p, mean = 0, sd = 1), p, n2)
y1 <- Gamma1 %*% Z1 + mu1 %*% (rep(1, n1))
y2 <- Gamma2 %*% Z2 + mu2 %*% (rep(1, n2))
tsbf_zzz2023(y1, y2, cutoff = 1.2)
#>
#>
#>
#> data:
#> statistic = 0.96296, df = 24.6376, cpn = 1.4402, p-value = 0.5144
#>
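The Gamma1 and Gamma2 matrices in the example above appear to be chosen so that Gamma %*% t(Gamma) is a compound-symmetric covariance matrix with common variance a and common correlation rho: writing Gamma as c*I + w*J, with c = sqrt(a * (1 - rho)) and J the all-ones matrix, w solves p*w^2 + 2*c*w = a*rho. The following base R check is a small sketch of that reading for the first sample only; the names c1, Sigma1, and cs_ok are introduced here for illustration.

```r
# Rebuild Gamma1 exactly as in the example, then compare Gamma1 %*% t(Gamma1)
# with the compound-symmetric target a1 * ((1 - rho1) * I + rho1 * J).
p <- 50
a1 <- 1
rho1 <- 0.1
c1 <- sqrt(a1 * (1 - rho1))   # illustrative name for the identity-part coefficient
w1 <- (-2 * c1 + sqrt(4 * a1 * (1 - rho1) + 4 * p * a1 * rho1)) / (2 * p)
Gamma1 <- matrix(w1, p, p)    # off-diagonal entries equal w1
diag(Gamma1) <- w1 + c1       # diagonal entries equal x1 in the example

Sigma1 <- a1 * ((1 - rho1) * diag(p) + rho1 * matrix(1, p, p))
cs_ok <- isTRUE(all.equal(Gamma1 %*% t(Gamma1), Sigma1))
print(cs_ok)
```

Under this reading, the two samples in the example have variances 1 and 2 and pairwise correlations 0.1 and 0.2, so the covariance matrices differ while the mean vectors are equal.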