Statistical Formulas

    James Sulzen

   11/21/00

                          


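1. Percentile rank (p. 36)

$$\text{Percentile rank} = L\%_{i-1} + i\% \cdot \frac{\text{Score} - LRL_i}{h_i}$$

$$= 100\left[\frac{F_{i-1}}{N} + \frac{f_i}{N}\cdot\frac{\text{Score} - \left(U_{i-1} + \frac{L_i - U_{i-1}}{2}\right)}{U_i - U_{i-1}}\right]$$

$i$ = interval in which Score falls; $i\%$ = interval $i$'s percentage of the total; $L\%_{i-1}$ = cumulative percentage below interval $i$; $LRL_i$ = lower real limit of $i$, halfway between the lowest score of $i$ and the top of the next lower interval; $h_i$ = width of $i$ ($U_i - U_{i-1}$).

$F_i$ = cumulative count from the bottom through interval $i$ (the count through the top interval is $N$); $f_i$ = count of items in $i$; $L_i$ / $U_i$ = lower / upper score of $i$.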
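A minimal Python sketch of the grouped-data percentile-rank formula, assuming integer scores grouped into contiguous intervals given as (L_i, U_i) pairs in ascending order. The names `percentile_rank`, `INTERVALS` and `COUNTS` and the frequencies are hypothetical. The lowest interval has no U_{i-1}, so the sketch assumes U_0 = L_1 - 1.

```python
# Hypothetical grouped frequency table: (L_i, U_i) score intervals and counts f_i.
INTERVALS = [(60, 64), (65, 69), (70, 74), (75, 79), (80, 84)]
COUNTS = [2, 5, 8, 4, 1]  # N = 20


def percentile_rank(score, intervals, counts):
    """Percentile rank of `score` from grouped data (result in percent)."""
    n = sum(counts)
    cum_below = 0  # F_{i-1}: count of items below interval i
    for i, ((lo, hi), f) in enumerate(zip(intervals, counts)):
        # U_{i-1}; assumption: for the lowest interval use L_1 - 1 (integer scores)
        u_prev = intervals[i - 1][1] if i > 0 else lo - 1
        if lo <= score <= hi:
            lrl = u_prev + (lo - u_prev) / 2  # LRL_i, lower real limit
            width = hi - u_prev               # h_i = U_i - U_{i-1}
            return 100 * (cum_below / n + (f / n) * (score - lrl) / width)
        cum_below += f
    raise ValueError("score does not fall inside any interval")


print(percentile_rank(72, INTERVALS, COUNTS))
```

For a score of 72 (70-74 interval), the formula gives 100[7/20 + (8/20)(72 - 69.5)/5] = 55.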

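2. Raw score from percentile rank (p. 39)

$$\text{Score}_p = LRL_i + h_i\,\frac{pN - SFB_i}{f_i} = U_{i-1} + \frac{L_i - U_{i-1}}{2} + (U_i - U_{i-1})\,\frac{pN - F_{i-1}}{f_i}$$

$\text{Score}_p$ = raw score corresponding to percentile $p$; $p$ = specified percentile, as a proportion (e.g. .25); $N$ = total number of samples; $SFB_i = F_{i-1}$ = sum of frequencies below interval $i$. Other symbols as above.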
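A minimal sketch of the inverse calculation, under the same assumptions as a grouped table of integer scores (hypothetical `INTERVALS`/`COUNTS` data, U_0 = L_1 - 1 for the lowest interval). `score_at_percentile` is an illustrative name. It walks up the cumulative counts to find the interval holding the pN-th score, then interpolates inside it.

```python
# Hypothetical grouped frequency table: (L_i, U_i) score intervals and counts f_i.
INTERVALS = [(60, 64), (65, 69), (70, 74), (75, 79), (80, 84)]
COUNTS = [2, 5, 8, 4, 1]  # N = 20


def score_at_percentile(p, intervals, counts):
    """Raw score at percentile p, where p is a proportion such as 0.55."""
    n = sum(counts)
    target = p * n  # pN: number of items that should lie below the score
    cum_below = 0   # SFB_i = F_{i-1}
    for i, ((lo, hi), f) in enumerate(zip(intervals, counts)):
        if f > 0 and cum_below + f >= target:
            # Assumption: below the lowest interval, U_0 = L_1 - 1 (integer scores)
            u_prev = intervals[i - 1][1] if i > 0 else lo - 1
            lrl = u_prev + (lo - u_prev) / 2  # LRL_i
            width = hi - u_prev               # h_i
            return lrl + width * (target - cum_below) / f
        cum_below += f
    raise ValueError("p must lie between 0 and 1")


print(score_at_percentile(0.55, INTERVALS, COUNTS))
```

With these counts, p = .55 gives pN = 11, which falls in the 70-74 interval (F_{i-1} = 7, f_i = 8), so Score_p = 69.5 + 5(11 - 7)/8 = 72.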

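3. Mean and median (p. 49)

Mean:
$$\bar{X} = \frac{\sum X}{N}$$

Median, when the median score falls in interval $i$ (see also above):
$$\text{Score}_{.50} = \text{median} = LRL_i + h_i\,\frac{N/2 - SFB_i}{f_i} = U_{i-1} + \frac{L_i - U_{i-1}}{2} + (U_i - U_{i-1})\,\frac{N/2 - F_{i-1}}{f_i}$$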
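A minimal sketch of both formulas: the mean of raw scores, and the grouped median as the special case p = .50 of the raw-score-from-percentile formula. The frequency table and function names are hypothetical, and the lowest interval again assumes U_0 = L_1 - 1.

```python
# Hypothetical grouped frequency table: (L_i, U_i) score intervals and counts f_i.
INTERVALS = [(60, 64), (65, 69), (70, 74), (75, 79), (80, 84)]
COUNTS = [2, 5, 8, 4, 1]  # N = 20


def mean(xs):
    """Xbar = sum(X) / N for raw (ungrouped) scores."""
    return sum(xs) / len(xs)


def grouped_median(intervals, counts):
    """Score_.50 = LRL_i + h_i * (N/2 - SFB) / f_i for the interval holding N/2."""
    n = sum(counts)
    cum_below = 0  # SFB = F_{i-1}
    for i, ((lo, hi), f) in enumerate(zip(intervals, counts)):
        if f > 0 and cum_below + f >= n / 2:
            # Assumption: below the lowest interval, U_0 = L_1 - 1 (integer scores)
            u_prev = intervals[i - 1][1] if i > 0 else lo - 1
            lrl = u_prev + (lo - u_prev) / 2
            return lrl + (hi - u_prev) * (n / 2 - cum_below) / f
        cum_below += f
    raise ValueError("empty frequency table")


print(mean([2, 4, 9]))                    # 5.0
print(grouped_median(INTERVALS, COUNTS))  # 71.375
```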

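4. Standard deviation (p. 56)

Population:
$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{SS}{N}}$$

Sample:
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}} = \sqrt{\frac{SS}{N - 1}}$$

Computing formula:
$$s = \sqrt{\frac{\sum X^2 - (\sum X)^2/N}{N - 1}}$$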
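A minimal sketch showing the three forms side by side and checking that the definitional and computing formulas for s agree. The data are hypothetical; Python's standard `statistics.stdev` (N - 1) and `statistics.pstdev` (N) are used only as cross-checks.

```python
import math
import statistics


def population_sd(xs):
    """sigma = sqrt(SS / N), deviations taken from the mean mu."""
    mu = sum(xs) / len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return math.sqrt(ss / len(xs))


def sample_sd(xs):
    """s = sqrt(SS / (N - 1)), the definitional form."""
    xbar = sum(xs) / len(xs)
    ss = sum((x - xbar) ** 2 for x in xs)
    return math.sqrt(ss / (len(xs) - 1))


def sample_sd_computing(xs):
    """s = sqrt((sum(X^2) - (sum X)^2 / N) / (N - 1)), the computing formula."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, SS = 32
print(population_sd(data))       # 2.0
print(sample_sd(data), sample_sd_computing(data), statistics.stdev(data))
```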

 

 

5. z scores (p. 68)

$$z = \frac{X - \bar{X}}{\sigma}$$

6. T scores and SAT scores (pp. 71, 73)

$$T = 10z + 50 \qquad \text{SAT} = 100z + 500$$
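A minimal sketch converting a raw score to z, then to T and SAT-style scores. It assumes σ is computed with N in the denominator (`statistics.pstdev`), matching the σ in the z formula; the data and function names are hypothetical.

```python
import statistics


def z_score(x, xs):
    """z = (X - Xbar) / sigma, with sigma computed using N (statistics.pstdev)."""
    return (x - statistics.mean(xs)) / statistics.pstdev(xs)


def to_t_score(z):
    return 10 * z + 50    # T = 10z + 50


def to_sat_score(z):
    return 100 * z + 500  # SAT = 100z + 500


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, sigma 2
z = z_score(9, scores)
print(z, to_t_score(z), to_sat_score(z))  # 2.0 70.0 700.0
```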

7. z scores for two variables (pp. 109, 167)

$$z_X = \frac{X - \bar{X}}{\sigma_X} \qquad z_Y = \frac{Y - \bar{Y}}{\sigma_Y}$$

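8. Standard error of the mean (pp. 119, 127, 129)

Standard deviation of the sample means:
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$$

z score of a sample mean, when $\sigma$ is known:
$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$

t score of a sample mean, when $\sigma$ is not known ($df = N - 1$):
$$t = \frac{\bar{X} - \mu}{s_{\bar{X}}}, \qquad s_{\bar{X}} = \frac{s}{\sqrt{N}}$$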
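A minimal sketch of the two test statistics for a sample mean: z when the population σ is known, t (with df = N - 1) when s must be estimated from the sample. The sample, μ and σ values are hypothetical.

```python
import math
import statistics


def z_for_mean(sample, mu, sigma):
    """z = (Xbar - mu) / sigma_Xbar, used when the population sigma is known."""
    se = sigma / math.sqrt(len(sample))  # sigma_Xbar = sigma / sqrt(N)
    return (statistics.mean(sample) - mu) / se


def t_for_mean(sample, mu):
    """t = (Xbar - mu) / s_Xbar with df = N - 1, used when sigma is unknown."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))  # s_Xbar = s / sqrt(N)
    return (statistics.mean(sample) - mu) / se, len(sample) - 1


sample = [2, 4, 4, 4, 5, 5, 7, 9]         # hypothetical scores, Xbar = 5
print(z_for_mean(sample, mu=4, sigma=2))  # about 1.414
print(t_for_mean(sample, mu=4))           # (about 1.323, 7)
```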

9. Confidence interval of a sample mean (p. 134)

$$\bar{X} - t\,s_{\bar{X}} \le \mu \le \bar{X} + t\,s_{\bar{X}}, \qquad df = N - 1$$
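A minimal sketch of the interval. The sketch does not compute the critical t; it takes it as an argument, here 2.365, the two-tailed .05 value for df = 7 from a standard t table. The data and function name are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(sample, t_crit):
    """Xbar - t*s_Xbar <= mu <= Xbar + t*s_Xbar.

    t_crit must be looked up (e.g. in a t table) for df = N - 1 and the
    chosen confidence level; this sketch does not compute it.
    """
    xbar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))  # s_Xbar
    return xbar - t_crit * se, xbar + t_crit * se


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, N = 8, df = 7
low, high = mean_confidence_interval(sample, t_crit=2.365)  # two-tailed .05, df = 7
print(round(low, 2), round(high, 2))  # 3.21 6.79
```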

10. Standard error of a proportion (p. 136)

$$z = \frac{p - \pi}{\sigma_p}, \qquad \sigma_p = \sqrt{\frac{\pi(1 - \pi)}{N}}$$

$p$ = observed sample proportion; $\pi$ = hypothesized population proportion; $\sigma_p$ = standard error of a proportion.
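A minimal sketch of the proportion z test. Note that the standard error uses the hypothesized π, not the observed p. The counts and function name are hypothetical.

```python
import math


def proportion_z(successes, n, pi0):
    """z = (p - pi) / sigma_p, with sigma_p = sqrt(pi(1 - pi) / N)."""
    p = successes / n                         # observed sample proportion
    sigma_p = math.sqrt(pi0 * (1 - pi0) / n)  # uses the hypothesized pi
    return (p - pi0) / sigma_p


# Hypothetical: 60 of 100 respondents agree; H0 says pi = .5
print(proportion_z(60, 100, 0.5))  # about 2.0
```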

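11. Standard error of the difference between two population means (p. 150)

Assume two populations with means $\mu_c$ and $\mu_x$ and standard deviations $\sigma_c$ and $\sigma_x$ (control and experimental).

$$s^2_{\text{pooled}} = \frac{(N_c - 1)s_c^2 + (N_x - 1)s_x^2}{N_c + N_x - 2}$$

$$s_{\bar{X}_c - \bar{X}_x} = \sqrt{s^2_{\text{pooled}}\left(\frac{1}{N_c} + \frac{1}{N_x}\right)} = \sqrt{s^2_{\text{pooled}}\,\frac{N_c + N_x}{N_c N_x}}$$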

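12. Significance of the difference between two population means (p. 151)

$$t = \frac{\bar{X}_c - \bar{X}_x}{s_{\bar{X}_c - \bar{X}_x}}, \qquad df = N - 2, \text{ where } N = N_c + N_x$$

Test: $H_0: \mu_x = \mu_c$; $H_1: \mu_x \ne \mu_c$.

Note: the two samples' N's and s's should be approximately equal (see bottom of p. 156).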
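A minimal sketch combining the pooled variance, the standard error of the difference, and the resulting t. The two groups are hypothetical data; `statistics.variance` gives the N - 1 sample variance that the pooled formula expects.

```python
import math
import statistics


def pooled_t_test(control, experimental):
    """t = (Xbar_c - Xbar_x) / s_diff with df = N_c + N_x - 2."""
    nc, nx = len(control), len(experimental)
    sc2 = statistics.variance(control)       # s_c^2, N - 1 denominator
    sx2 = statistics.variance(experimental)  # s_x^2
    s2_pooled = ((nc - 1) * sc2 + (nx - 1) * sx2) / (nc + nx - 2)
    se_diff = math.sqrt(s2_pooled * (1 / nc + 1 / nx))  # s_{Xbar_c - Xbar_x}
    t = (statistics.mean(control) - statistics.mean(experimental)) / se_diff
    return t, nc + nx - 2


control = [3, 4, 5, 6, 7]        # hypothetical data, mean 5
experimental = [6, 7, 8, 9, 10]  # hypothetical data, mean 8
print(pooled_t_test(control, experimental))  # t about -3.0, df 8
```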

 

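Confidence interval of the difference (p. 154)

$$(\bar{X}_c - \bar{X}_x) - t\,s_{\bar{X}_c - \bar{X}_x} \le \mu_c - \mu_x \le (\bar{X}_c - \bar{X}_x) + t\,s_{\bar{X}_c - \bar{X}_x}$$

Under $H_0$ the two population means are equal, so $\mu_c - \mu_x = 0$; $H_0$ is rejected when the interval does not contain 0.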

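13. Matched pairs (pp. 157, 159)

$$t = \frac{\bar{D} - \mu_D}{\sqrt{s_D^2/N}} = \frac{\bar{D}}{\sqrt{s_D^2/N}} \quad \text{since } \mu_D = 0, \qquad df = N - 1$$

$$\bar{D} = \frac{\sum D}{N}, \qquad D = X_1 - X_2, \qquad N = \text{number of pairs}$$

Computing formula:
$$t = \frac{\sum D}{\sqrt{\dfrac{N\sum D^2 - (\sum D)^2}{N - 1}}}$$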
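A minimal sketch computing the matched-pairs t both ways, to show that the definitional and computing formulas agree. The paired scores and function name are hypothetical.

```python
import math


def paired_t(x1, x2):
    """Matched-pairs t: definitional and computing forms, df = N - 1 (N = pairs)."""
    d = [a - b for a, b in zip(x1, x2)]  # D = X1 - X2
    n = len(d)
    d_bar = sum(d) / n                   # Dbar = sum(D) / N
    s_d2 = sum((v - d_bar) ** 2 for v in d) / (n - 1)
    t_def = d_bar / math.sqrt(s_d2 / n)  # mu_D = 0 under H0
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    t_comp = sum_d / math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))
    return t_def, t_comp, n - 1


before = [10, 12, 9, 11, 13]  # hypothetical paired scores
after = [8, 11, 9, 9, 10]
print(paired_t(before, after))  # both forms give t of about 3.14, df 4
```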

 

14. Pearson r, z-score forms (p. 169)

$$r_{XY} = 1 - \frac{1}{2}\cdot\frac{\sum (z_X - z_Y)^2}{N} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N\sigma_X\sigma_Y} = \frac{\sum z_X z_Y}{N}$$

15. Pearson r, computing formula (p. 172)

$$r_{XY} = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}}$$
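A minimal sketch computing r from the z-difference form, the z-product form, and the raw-score computing formula. The z-score forms hold exactly only when z is computed with N in the denominator, so the sketch uses σ rather than s. The data are hypothetical.

```python
import math


def pearson_three_ways(xs, ys):
    """Return r from the z-difference, z-product and raw-score computing formulas."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # z scores use sigma with N in the denominator, as the z-score forms require
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    r_diff = 1 - 0.5 * sum((a - b) ** 2 for a, b in zip(zx, zy)) / n
    r_prod = sum(a * b for a, b in zip(zx, zy)) / n
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    r_comp = (n * sum_xy - sum_x * sum_y) / math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return r_diff, r_prod, r_comp


xs = [1, 2, 3, 4, 5]  # hypothetical paired data
ys = [2, 4, 5, 4, 5]
print(pearson_three_ways(xs, ys))  # all three are about 0.775
```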

16. Significance of a correlation (pp. 175-176)

$$t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}, \qquad df = N - 2$$

Use Table D (for Pearson r).
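A minimal sketch converting r to t. The r and N values are hypothetical; the resulting t is compared with a critical value for df = N - 2.

```python
import math


def r_to_t(r, n):
    """t = r * sqrt(N - 2) / sqrt(1 - r^2), df = N - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r), n - 2


print(r_to_t(0.6, 27))  # t about 3.75 with df = 25
```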

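17. Regression equation (pp. 182-184)

$$Y' = b_{YX}X + a_{YX}$$

$$b_{YX} = r_{XY}\frac{\sigma_Y}{\sigma_X} = \frac{N\sum XY - \sum X\sum Y}{N\sum X^2 - (\sum X)^2}$$

$$a_{YX} = \bar{Y} - b_{YX}\bar{X}$$

In z-score form:
$$z'_Y = r_{XY} z_X$$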
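A minimal sketch of the raw-score slope and intercept formulas, followed by a prediction. The data and function name are hypothetical. For these data r = 6/√60 and σ_Y/σ_X = √0.6, so r(σ_Y/σ_X) gives the same slope, 0.6.

```python
def regression_line(xs, ys):
    """Least-squares Y' = b_YX * X + a_YX from the raw-score formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # b_YX
    a = sum_y / n - b * sum_x / n  # a_YX = Ybar - b_YX * Xbar
    return b, a


xs = [1, 2, 3, 4, 5]  # hypothetical paired data
ys = [2, 4, 5, 4, 5]
b, a = regression_line(xs, ys)
print(b, a, b * 6 + a)  # about 0.6, 2.2 and 5.8 (prediction for X = 6)
```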

 

 

 

 

 

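18. Standard error of estimate (pp. 186-187)

$$\sigma_{Y'} = \sigma_Y\sqrt{1 - r_{XY}^2} = \sqrt{\frac{\sum (Y - Y')^2}{N}}$$

$$r_{XY}^2 = \frac{\sigma_Y^2 - \sigma_{Y'}^2}{\sigma_Y^2}$$

When $\sigma$ and $\mu$ are not known (p. 189):
$$s_{Y'} = \sqrt{\frac{\sum (Y - Y')^2}{N - 2}}$$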
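A minimal sketch that fits the least-squares line and returns the residual standard deviation with N (σ_Y') and with N - 2 (s_Y') in the denominator. The data are hypothetical. Here r² = 0.6 and σ_Y² = 1.2, so σ_Y√(1 - r²) = √0.48, which matches the residual form.

```python
import math


def standard_errors_of_estimate(xs, ys):
    """Return (sigma_Y', s_Y'): residual SD with N and with N - 2 in the denominator."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx    # least-squares slope b_YX
    a = my - b * mx  # intercept a_YX
    resid_ss = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))  # sum (Y - Y')^2
    return math.sqrt(resid_ss / n), math.sqrt(resid_ss / (n - 2))


xs = [1, 2, 3, 4, 5]  # hypothetical paired data
ys = [2, 4, 5, 4, 5]
sigma_est, s_est = standard_errors_of_estimate(xs, ys)
print(sigma_est, math.sqrt(1.2 * 0.4), s_est)
```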

19. Spearman rank correlation (p. 195)

$$r_s = 1 - \frac{6\sum (X_i - Y_i)^2}{N(N^2 - 1)}$$

When the data items are ranked 1 → N ($X_i$, $Y_i$ = ranks of item $i$ on the two variables). Use Table E to test significance.
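A minimal sketch of the rank-difference formula. It assumes untied ranks 1..N, which is the case the formula covers exactly. The rankings are hypothetical.

```python
def spearman_rs(x_ranks, y_ranks):
    """r_s = 1 - 6 * sum(d^2) / (N(N^2 - 1)); assumes untied ranks 1..N."""
    n = len(x_ranks)
    d2 = sum((a - b) ** 2 for a, b in zip(x_ranks, y_ranks))
    return 1 - 6 * d2 / (n * (n * n - 1))


print(spearman_rs([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # 0.8
print(spearman_rs([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -1.0
```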

20. Point-biserial correlation (p. 197)

$$r_{pb} = \frac{\bar{Y}_1 - \bar{Y}_0}{\sigma_Y}\sqrt{pq}$$

$p$ = proportion of M; $q$ = proportion of F; $\bar{Y}_1$ = mean of M scores; $\bar{Y}_0$ = mean of F scores; $\sigma_Y$ = standard deviation of the combined M and F scores.

Correlates a dichotomous variable (M/F) with a continuous variable (test scores). Use Table D for significance testing.
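A minimal sketch of r_pb, assuming σ_Y is computed with N in the denominator over the combined scores. With that choice, r_pb equals Pearson r between a 0/1 group code and the scores. The group data and function name are hypothetical.

```python
import math


def point_biserial(scores_1, scores_0):
    """r_pb = (Ybar_1 - Ybar_0) / sigma_Y * sqrt(p * q).

    scores_1: scores for the group coded 1 (e.g. M);
    scores_0: scores for the group coded 0 (e.g. F).
    sigma_Y uses N in the denominator, over the combined scores.
    """
    combined = scores_1 + scores_0
    n = len(combined)
    p = len(scores_1) / n
    q = 1 - p
    mean_all = sum(combined) / n
    sigma_y = math.sqrt(sum((y - mean_all) ** 2 for y in combined) / n)
    y1_bar = sum(scores_1) / len(scores_1)
    y0_bar = sum(scores_0) / len(scores_0)
    return (y1_bar - y0_bar) / sigma_y * math.sqrt(p * q)


group_1 = [6, 7, 8, 9, 10]  # hypothetical test scores
group_0 = [3, 4, 5, 6, 7]
print(point_biserial(group_1, group_0))  # about 0.728
```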

21. Strength of significance (effect size from t) (p. 199)

$$r_{pb} = \sqrt{\frac{t^2}{t^2 + df}}, \qquad df = N_1 + N_2 - 2$$
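A minimal sketch converting a two-sample t into r_pb. The t and group sizes are hypothetical. For two groups of 5 with t = 3.0, the result is √(9/17), the same value the r_pb formula gives when applied directly to scores that produce that t.

```python
import math


def rpb_from_t(t, n1, n2):
    """Effect size r_pb = sqrt(t^2 / (t^2 + df)), df = N1 + N2 - 2."""
    df = n1 + n2 - 2
    return math.sqrt(t * t / (t * t + df))


# Hypothetical two-group result: t = 3.0 with N1 = N2 = 5 (df = 8)
print(rpb_from_t(3.0, 5, 5))  # sqrt(9 / 17), about 0.728
```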

 

 

 

 

 

22. One-way ANOVA (F test) (pp. 228-233)

$H_0: \mu_1 = \mu_2 = \dots = \mu_k$
$H_1$: at least one of the $\mu$'s is not equal

$k$ = number of groups; $N$ = total number of samples; $\bar{X}$ = grand mean across all samples; $N_G$ = size of group $G$; $\bar{X}_G$ = mean of group $G$; $X_G$ = samples in group $G$.

$$SS_T = \sum (X - \bar{X})^2 = SS_B + SS_W = \sum X^2 - \frac{(\sum X)^2}{N} \quad \text{(computing formula)}$$

$$SS_B = \sum_{G=1}^{k} N_G(\bar{X}_G - \bar{X})^2 = \sum_{G=1}^{k}\frac{(\sum X_G)^2}{N_G} - \frac{(\sum X)^2}{N} \quad \text{(computing formula)}$$

$$SS_W = \sum_{G=1}^{k}\sum (X_G - \bar{X}_G)^2 = SS_T - SS_B \quad \text{(computing formula)}$$

$$MS_B = \frac{SS_B}{df_B} = \frac{SS_B}{k - 1}, \qquad df_B = k - 1$$

$$MS_W = \frac{SS_W}{df_W} = \frac{SS_W}{N - k}, \qquad df_W = N - k$$

$$F = \frac{MS_B}{MS_W} = \frac{SS_B(N - k)}{SS_W(k - 1)}$$

Procedure: compute F and compare it with the .05 or .01 critical value from Table F in the book (for $df_B$ and $df_W$). If the computed F exceeds the table value, reject $H_0$.

Assumptions (p. 239): independent samples; equal variances (or $N_G$'s approximately equal); normal populations.
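A minimal sketch of the one-way ANOVA table from the definitional sums of squares. It also includes a cross-check of SS_T and SS_B against the computing formulas. The three groups are hypothetical data, and the resulting F would still be compared with the Table F critical value.

```python
def one_way_anova(groups):
    """One-way ANOVA from the definitional sums of squares; returns a dict."""
    all_x = [x for g in groups for x in g]
    n, k = len(all_x), len(groups)
    grand_mean = sum(all_x) / n
    group_means = [sum(g) / len(g) for g in groups]
    ss_t = sum((x - grand_mean) ** 2 for x in all_x)
    ss_b = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_b, df_w = k - 1, n - k
    ms_b, ms_w = ss_b / df_b, ss_w / df_w
    return {"SST": ss_t, "SSB": ss_b, "SSW": ss_w, "dfB": df_b, "dfW": df_w,
            "MSB": ms_b, "MSW": ms_w, "F": ms_b / ms_w}


def ss_computing(groups):
    """SST and SSB from the computing formulas, as a cross-check."""
    all_x = [x for g in groups for x in g]
    correction = sum(all_x) ** 2 / len(all_x)  # (sum X)^2 / N
    ss_t = sum(x * x for x in all_x) - correction
    ss_b = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    return ss_t, ss_b


groups = [[3, 4, 5, 6, 7], [6, 7, 8, 9, 10], [9, 10, 11, 12, 13]]  # hypothetical
result = one_way_anova(groups)
print(result["F"], result["dfB"], result["dfW"])  # 18.0 2 12
print(ss_computing(groups))                       # (120.0, 90.0)
```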

23. Protected t test (pp. 234-236)

$$t = \frac{\bar{X}_i - \bar{X}_j}{\sqrt{MS_W\left(\frac{1}{N_i} + \frac{1}{N_j}\right)}}, \qquad df = df_W = N - k$$

Performs pairwise t tests of the means among all groups, or among all groups of interest.

Can only be used if the F test rejected $H_0$!

Not recommended for more than about six groups (p. 237).

 

Confidence interval of the protected t test (p. 238)

$$(\bar{X}_i - \bar{X}_j) - t\,s_{\bar{X}_i - \bar{X}_j} \le \mu_i - \mu_j \le (\bar{X}_i - \bar{X}_j) + t\,s_{\bar{X}_i - \bar{X}_j}$$

where $s_{\bar{X}_i - \bar{X}_j} = \sqrt{MS_W(1/N_i + 1/N_j)}$ and $df = df_W$.
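A minimal sketch of the protected t for one pair of groups, together with its confidence interval. Both use the same standard error built from MS_W. The ANOVA summary values are hypothetical, and the critical t (2.179, two-tailed .05 for df = 12) is taken from a t table rather than computed.

```python
import math


def protected_t(mean_i, mean_j, n_i, n_j, ms_w, df_w, t_crit):
    """Protected t for groups i and j, plus the confidence interval for mu_i - mu_j.

    Only meaningful after the overall F test has rejected H0. t_crit must be
    looked up for df = df_W and the chosen alpha; it is not computed here.
    """
    se = math.sqrt(ms_w * (1 / n_i + 1 / n_j))  # s_{Xbar_i - Xbar_j}
    diff = mean_i - mean_j
    return diff / se, df_w, (diff - t_crit * se, diff + t_crit * se)


# Hypothetical ANOVA summary: group means 5 and 8, N_i = N_j = 5, MS_W = 2.5, df_W = 12
t, df, (low, high) = protected_t(5, 8, 5, 5, ms_w=2.5, df_w=12, t_crit=2.179)
print(t, df, round(low, 3), round(high, 3))  # t about -3.0, df 12, CI about (-5.179, -0.821)
```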

24. Least significant difference (LSD) of pairwise t tests (p. 237)

$$LSD = t\sqrt{MS_W\,\frac{2}{N_i}}, \qquad df = N - k$$

where $N_i$ = group size. The LSD can be used only if all group sizes are equal.

Procedure: look up the t value for $\alpha$ and df, then multiply it by the square-root expression. The result is the smallest difference between two group means that is significant: any pair of means whose difference is at least the LSD differs significantly.
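A minimal sketch of the LSD procedure. It computes the LSD once, then flags every pair of group means whose absolute difference is at least that large. The group means, MS_W and group size are hypothetical, and the critical t (2.179, two-tailed .05 for df = 12) comes from a t table.

```python
import math
from itertools import combinations


def least_significant_difference(t_crit, ms_w, n_per_group):
    """LSD = t * sqrt(MS_W * 2 / N_i); requires equal group sizes N_i."""
    return t_crit * math.sqrt(ms_w * 2 / n_per_group)


def significant_pairs(means, lsd):
    """Index pairs whose mean difference is at least the LSD."""
    return [(i, j) for i, j in combinations(range(len(means)), 2)
            if abs(means[i] - means[j]) >= lsd]


# Hypothetical: 3 groups of 5, MS_W = 2.5, t = 2.179 (two-tailed .05, df = 12)
lsd = least_significant_difference(2.179, 2.5, 5)
print(lsd)                                 # about 2.179
print(significant_pairs([5, 8, 11], lsd))  # [(0, 1), (0, 2), (1, 2)]
```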

25. Strength of the F test (p. 239)

$$\varepsilon = \sqrt{\frac{df_B(F - 1)}{df_B F + df_W}}$$

Produces a correlation-like coefficient, $\varepsilon$ (epsilon), with the same meaning as $r_{pb}$: it measures the strength of the relationship.
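A minimal sketch of ε computed from an F ratio and its degrees of freedom. The F and df values are hypothetical. Algebraically, ε² equals (SS_B - df_B·MS_W)/SS_T, and the test uses that identity as a cross-check with SS_B = 90, MS_W = 2.5 and SS_T = 120.

```python
import math


def epsilon_from_f(f, df_b, df_w):
    """epsilon = sqrt(df_B (F - 1) / (df_B F + df_W))."""
    return math.sqrt(df_b * (f - 1) / (df_b * f + df_w))


# Hypothetical one-way ANOVA: F = 18, df_B = 2, df_W = 12
print(epsilon_from_f(18.0, 2, 12))  # sqrt(34 / 48), about 0.842
```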


Which Correlation Technique to Use (p. 203)

Var B \ Var A               Continuous       Dichotomous (normal)   Dichotomous (not normal)
Continuous                  Pearson          Biserial               Point biserial
Dichotomous (normal)        Biserial         Tetrachoric            -
Dichotomous (not normal)    Point biserial   -                      phi