
Zhang and Zhu (2022)'s test for the equality of two high-dimensional mean vectors, without assuming that the two covariance matrices are equal.

Usage

tsbf_zz2022(y1, y2)

Arguments

y1

The data matrix (p by n1) from the first population. Each column represents a \(p\)-dimensional observation.

y2

The data matrix (p by n2) from the second population. Each column represents a \(p\)-dimensional observation.

Value

A (list) object of S3 class htest containing the following elements:

p.value

the p-value of the test proposed by Zhang and Zhu (2022).

statistic

the test statistic proposed by Zhang and Zhu (2022).

beta0

parameter used in Zhang and Zhu (2022)'s test.

beta1

parameter used in Zhang and Zhu (2022)'s test.

df

estimated approximate degrees of freedom of Zhang and Zhu (2022)'s test.
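
The numbers printed in the Examples section are consistent with approximating the null distribution of the statistic by \(\beta_0 + \beta_1 \chi^2_d\), with \(\beta_0 = -\beta_1 d\) so that the approximation has mean zero. This reading is an assumption inferred from the example output, not something stated on this page. Under that assumption, a minimal sketch of how the p-value relates to the other returned elements is:

```r
# Values copied from the printed output in the Examples section.
stat  <- 0.2389
beta0 <- -2.31342
beta1 <- 0.62375
df    <- 3.70891

# Assumed relation: p-value = P(chi^2_df >= (statistic - beta0) / beta1).
p_approx <- pchisq((stat - beta0) / beta1, df = df, lower.tail = FALSE)
p_approx

# Assumed centring: beta0 + beta1 * df should be close to zero.
beta0 + beta1 * df
```

The result can be compared with the p-value reported by `tsbf_zz2022()` in the example; small differences are expected because the printed values are rounded.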

Details

Suppose we have two independent high-dimensional samples: $$ \boldsymbol{y}_{i1},\ldots,\boldsymbol{y}_{in_i}, \;\operatorname{are \; i.i.d. \; with}\; \operatorname{E}(\boldsymbol{y}_{i1})=\boldsymbol{\mu}_i,\; \operatorname{Cov}(\boldsymbol{y}_{i1})=\boldsymbol{\Sigma}_i,i=1,2. $$

The primary objective is to test $$H_{0}: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2\; \operatorname{versus}\; H_{1}: \boldsymbol{\mu}_1 \neq \boldsymbol{\mu}_2.$$ Zhang and Zhu (2022) proposed the following test statistic: $$T_{ZZ} = \|\bar{\boldsymbol{y}}_1 - \bar{\boldsymbol{y}}_2\|^2-\operatorname{tr}(\hat{\boldsymbol{\Omega}}_n),$$ where \(\bar{\boldsymbol{y}}_{i},i=1,2\) are the sample mean vectors and \(\hat{\boldsymbol{\Omega}}_n\) is an estimator of \(\operatorname{Cov}(\bar{\boldsymbol{y}}_1-\bar{\boldsymbol{y}}_2)\). They showed that, under the null hypothesis, \(T_{ZZ}\) and a chi-squared-type mixture have the same limiting distribution, which may be normal or non-normal.
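
To make the formula concrete, here is a minimal sketch of \(T_{ZZ}\) in R. It assumes that \(\hat{\boldsymbol{\Omega}}_n = \hat{\boldsymbol{\Sigma}}_1/n_1 + \hat{\boldsymbol{\Sigma}}_2/n_2\), where \(\hat{\boldsymbol{\Sigma}}_i\) are the usual unbiased sample covariance matrices; this page does not spell out the estimator, so that choice is an assumption. The helper name `tzz_sketch` is hypothetical and is not part of the package, and its value need not match the statistic reported by `tsbf_zz2022()` if the package uses a different estimator or scaling.

```r
# Hypothetical helper (not part of the package): T_ZZ under the assumption
# Omega_hat = Sigma1_hat / n1 + Sigma2_hat / n2.
tzz_sketch <- function(y1, y2) {
  n1 <- ncol(y1)  # columns are observations
  n2 <- ncol(y2)
  mean_diff <- rowMeans(y1) - rowMeans(y2)
  # The trace of a sample covariance matrix is the sum of the
  # per-coordinate sample variances, so the p x p matrix is never formed.
  tr1 <- sum(apply(y1, 1, var))
  tr2 <- sum(apply(y2, 1, var))
  sum(mean_diff^2) - (tr1 / n1 + tr2 / n2)
}

# Small simulated example with equal means (p = 50, n1 = 20, n2 = 30).
set.seed(1)
y1_demo <- matrix(rnorm(50 * 20), 50, 20)
y2_demo <- matrix(rnorm(50 * 30), 50, 30)
t_demo <- tzz_sketch(y1_demo, y2_demo)
t_demo
```

Working with per-coordinate variances keeps the cost linear in \(p\), which matters in the high-dimensional setting this test targets.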

References

Zhang J, Zhu T (2022). “A further study on Chen-Qin’s test for two-sample Behrens–Fisher problems for high-dimensional data.” Journal of Statistical Theory and Practice, 16(1), 1. doi:10.1007/s42519-021-00232-w.

Examples

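set.seed(1234)
# Sample sizes and dimension
n1 <- 20
n2 <- 30
p <- 50
# Equal mean vectors (p x 1 column matrices), so the null hypothesis holds
mu1 <- t(t(rep(0, p)))
mu2 <- mu1
# Compound-symmetric covariances: Sigma_i = a_i * ((1 - rho_i) * I + rho_i * J)
rho1 <- 0.1
rho2 <- 0.2
a1 <- 1
a2 <- 2
# Gamma_i is the symmetric square root of Sigma_i (Gamma_i %*% Gamma_i = Sigma_i)
w1 <- (-2 * sqrt(a1 * (1 - rho1)) + sqrt(4 * a1 * (1 - rho1) + 4 * p * a1 * rho1)) / (2 * p)
x1 <- w1 + sqrt(a1 * (1 - rho1))
Gamma1 <- matrix(rep(w1, p * p), nrow = p)
diag(Gamma1) <- rep(x1, p)
w2 <- (-2 * sqrt(a2 * (1 - rho2)) + sqrt(4 * a2 * (1 - rho2) + 4 * p * a2 * rho2)) / (2 * p)
x2 <- w2 + sqrt(a2 * (1 - rho2))
Gamma2 <- matrix(rep(w2, p * p), nrow = p)
diag(Gamma2) <- rep(x2, p)
# Standard normal noise, one column per observation
Z1 <- matrix(rnorm(n1 * p, mean = 0, sd = 1), p, n1)
Z2 <- matrix(rnorm(n2 * p, mean = 0, sd = 1), p, n2)
# Observations: y = Gamma %*% z + mu, stored as p x n matrices
y1 <- Gamma1 %*% Z1 + mu1 %*% (rep(1, n1))
y2 <- Gamma2 %*% Z2 + mu2 %*% (rep(1, n2))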
tsbf_zz2022(y1, y2)
#> 
#> 
#> 
#> data:  
#> statistic = 0.2389, df = 3.70891, beta0 = -2.31342, beta1 = 0.62375,
#> p-value = 0.3515
#>