This article builds on the ideas and extends the methods outlined in the previous publication Statistically Robust Data Analysis: The Wilcoxon Test for two samples. This is a simple but widely used model, as even in more complex situations, targets are often compared at two levels.
The analysis of the model about the shift of the parameters of the position of two general populations begins with a description of the distribution-free Mann-Whitney-Wilcoxon (MWW) rank procedure, here point and interval estimates for the magnitude of the shift are constructed. The following briefly describes an analysis method based on the use of score functions and, with its help, also tests the null hypothesis about the magnitude of the shift parameter. In conclusion, the model for the position parameter is formulated as a regression problem, the solution of which also allows one to construct point and interval estimates for the shift parameter.
All the methods described in the article are illustrated with an end-to-end example implemented in the form of algorithms in the R language.
1. Let and be two continuous random variables: and denote the function (cdf) and density (pdf) of the distribution of the random variable , and and denote the function (cdf) and density (pdf) of the random variable, respectively . We say that and follow the model of the position parameter (location model), if for some parameter , have
A parameter is a shift in the position parameter of random variables and , for example, it can be the difference between medians or averages (if there are averages). Note that the proposed model assumes equality of the parameters of the scale of random variables and .
2. , . โ ( cdf pdf), โ ( cdf pdf). โ .
.
( ) , : .
3. , . โ . , , . . .
> z <- c(12, 18, 11, 5, 11, 5, 11, 11)
> rank(z)
[1] 7.0 8.0 4.5 1.5 4.5 1.5 4.5 4.5
, .
-- (Mann-Whitney-Wilcoxon, MWW). โ ,
,
, . , (, , ). , () p-value ( ).
4. -- , ( - (Hodges-Lehmann))
. โ , , .
.
5. -- t- c .
> x <- round(rt(11, 5) * 10 + 42, 1)
> y <- round(rt(9, 5) * 10 + 50, 1)
> x
[1] 76.6 41.0 59.3 34.9 29.1 45.0 42.6 31.1 32.4 52.5 47.9
> y
[1] 58.3 47.2 40.1 45.8 62.0 58.7 64.8 48.1 49.5
> wilcox.test(y, x, exact = TRUE, conf.int = TRUE, conf.level = 0.95)
Wilcoxon rank sum exact test
data: y and x
W = 72, p-value = 0.09518
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
-1.0 18.4
sample estimates:
difference in location
10.4
: p-value , , . p-value ยซยป. exact = FALSE
correct = FALSE
( ) , . p-value .
> wilcox.test(y, x, exact = FALSE, correct = FALSE)
Wilcoxon rank sum test
data: y and x
W = 72, p-value = 0.08738
alternative hypothesis: true location shift is not equal to 0
6. , (score )
,, โ , cdf . (Normal score function) , , . , normal score rankit, standard score z-score. normal score, score , score .
:
โ score ,โ . , :
, , , () .
, . , :
, , โ . .
7. R p-value score ( Rfit
).
> x <- c(76.6, 41.0, 59.3, 34.9, 29.1, 45.0, 42.6, 31.1, 32.4, 52.5, 47.9)
> y <- c(58.3, 47.2, 40.1, 45.8, 62.0, 58.7, 64.8, 48.1, 49.5)
> # x y
> z = c(x, y)
> n1 = length(x)
> n2 = length(y)
> n = n1 + n2
> # score
> scores = Rfit::wscores
> # score z
> rs = rank(z)/(n + 1)
> asg = Rfit::getScores(scores, rs)
> # Sphi
> Sphi = sum(asg[(n1 + 1):n])
> # Sphi
> asc = Rfit::getScores(scores, 1:n/(n + 1))
> varphi = ((n1 * n2)/(n * (n - 1))) * sum(asc^2)
> # zphi p-value
> zphi = Sphi/sqrt(varphi)
> alternative = "two.sided"
> pvalue <-
+ switch(
+ alternative,
+ two.sided = 2 * (1 - pnorm(abs(zphi))),
+ less = pnorm(zphi),
+ greater = 1 - pnorm(zphi)
+ )
> #
> res <- list(Sphi = Sphi, statistic = zphi, p.value = pvalue)
> with(res, cat("statistic = ", statistic, ", p-value = ", p.value, "\n"))
statistic = 1.709409 , p-value = 0.08737528
, p-value p-value : .
8. C . ,โ - .
โ , . , . - . , score , - โ .
R .
> z = c(x, y)
> ci <- c(rep(0, n1), rep(1, n2))
> fit <- Rfit::rfit(z ~ ci, scores = Rfit::wscores)
> coef(summary(fit))
Estimate Std. Error t.value p.value
(Intercept) 41.8 4.400895 9.498068 1.960951e-08
ci 10.4 5.720594 1.817993 8.574801e-02
, . , , , , t- :
> conf.level <- 0.95
> estse <- coef(summary(fit))[2, 1:2]
> alpha <- 1 - conf.level
> alternative = "two.sided"
> tcvs <- switch(
+ alternative,
+ two.sided = qt(1 - alpha / 2, n - 2) * c(-1, 1),
+ less = c(-Inf, qt(1 - alpha, n - 2)),
+ greater = c(qt(alpha, n - 2), Inf)
+ )
> conf.int <- estse[1] + tcvs * estse[2]
> cat(100 * conf.level, " percent confidence interval:\n", conf.int)
95 percent confidence interval:
-1.618522 22.41852
, .
( -- score ) . .