Expectation & Variance

For discrete distributions, the expected value is defined like this:

For a variate X which takes values xᵢ, each with probability pᵢ, the expected value is defined as E(X) = ∑ pᵢ xᵢ, summed over all values of i.

Three results follow rapidly:

(i) the expectation of a constant is the constant: E(a) = a

(ii) multiplying everything by a constant multiplies the expectation: E(aX) = a E(X)

(iii) the expectation of a sum is the sum of the expectations: E(X+Y) = E(X) + E(Y)
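The three results can be checked on a small example. The sketch below is only an illustration: the distributions X and Y and the constant a are made-up values, not taken from the text, and exact fractions are used so that the comparisons are not blurred by rounding.

```python
from fractions import Fraction
from itertools import product

def expectation(dist, g=lambda v: v):
    """E(g(X)) = sum of p_i * g(x_i) for a dict {value: probability}."""
    return sum(p * g(x) for x, p in dist.items())

# Hypothetical distributions, chosen only for illustration.
X = {0: Fraction(1, 2), 1: Fraction(1, 3), 4: Fraction(1, 6)}
Y = {1: Fraction(1, 4), 2: Fraction(3, 4)}
a = 5

# (i) a constant is a variate taking one value with probability 1
e_const = expectation({a: Fraction(1)})

# (ii) multiply every value of X by a
e_scaled = expectation(X, lambda x: a * x)

# (iii) sum over a joint table; X and Y are assumed independent here only
# to build that table (the result itself does not require independence)
e_sum = sum(px * py * (x + y)
            for (x, px), (y, py) in product(X.items(), Y.items()))

print(e_const, a)
print(e_scaled, a * expectation(X))
print(e_sum, expectation(X) + expectation(Y))
```

Each printed pair should show the same number twice. A check on one example is not a proof, but it is a quick way to catch a misremembered rule.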

The definition for continuous distributions is similar: E(X) = ∫ x p(x) dx, where p(x) is the probability density function (the p.d.f.) and, necessarily, ∫ p(x) dx = 1 over the whole range. What is less obvious is that P(X≤x) is the integral of p(x) from the initial value to x, where the initial value is perhaps zero, some value a, or negative infinity.

The mean (the arithmetic mean) is, as ever, the sum of the N values divided by N. The probability pᵢ of the value xᵢ equals the frequency of its occurrence, fᵢ = f(xᵢ), divided by the total number of observations, N. So pᵢ = fᵢ / N is the definition of the probability, and expectation and mean become the same thing. For many statisticians the terms are interchangeable, but one always distinguishes between the population mean, µ, and the sample mean, x̄.
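To see the mean and the expectation coincide, the following sketch computes both for one list of values. The data are hypothetical; the frequencies are counted with the standard library's Counter and turned into probabilities by dividing by N.

```python
from collections import Counter
from fractions import Fraction

data = [2, 3, 3, 5, 5, 5, 7, 7]   # hypothetical observations
N = len(data)

# Arithmetic mean: add up the N values and divide by N
arithmetic_mean = Fraction(sum(data), N)

# Expectation: p_i = f_i / N for each distinct value x_i
freq = Counter(data)
probs = {x: Fraction(f, N) for x, f in freq.items()}
expected_value = sum(p * x for x, p in probs.items())

print(arithmetic_mean, expected_value)
```

Both numbers come out as 37/8, because grouping equal values and weighting by fᵢ/N is only a rearrangement of the same sum.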

The deviation is the difference between a value and the mean. The mean of the deviations is always zero, because the positive and negative deviations cancel, and so the useful measure of spread is the mean of the squares of the deviations, called the variance:

Variance, s² = ∑ fᵢ(xᵢ − µ)² / ∑ fᵢ, where usually ∑ fᵢ = N

∑ pᵢ(xᵢ − µ)² = E((X − µ)²) = E(X²) − E²(X)
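A numerical check of the two forms of the variance may help before attempting the proof. This is a sketch on an invented three-valued distribution (the values and probabilities are assumptions for illustration only); it also confirms that the mean of the deviations is zero.

```python
from fractions import Fraction

# Hypothetical distribution {x_i: p_i}
dist = {1: Fraction(1, 10), 2: Fraction(3, 10), 5: Fraction(6, 10)}

mu = sum(p * x for x, p in dist.items())

# Mean of the deviations: should vanish
mean_of_deviations = sum(p * (x - mu) for x, p in dist.items())

# Variance from the definition, E((X - mu)^2)
var_definition = sum(p * (x - mu) ** 2 for x, p in dist.items())

# Variance from the shortcut, E(X^2) - E(X)^2
var_shortcut = sum(p * x * x for x, p in dist.items()) - mu ** 2

print(mu, mean_of_deviations, var_definition, var_shortcut)
```

The last two values agree (261/100 for this example), as the identity says they must for any distribution.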


1     Prove the last equality for yourself, where E²(X) means E(X)·E(X) and E(X²) = ∑ pᵢ xᵢ².

2     Write the equivalent definition of the variance for continuous functions

3     Find the mean and variance of the distribution that matches a single die,  p=1/6 for 1≤x≤6, x ∈ N

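4    Repeat 3 for a uniform distribution on a ≤ x ≤ b, with a, b ∈ N⁺ and p(x) = 1/(b−a) [and p = 0 otherwise]. Note that this p makes x a continuous variable; if x were restricted to the integers from a to b there would be b−a+1 equally likely values.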

5    P(X = r) = e⁻ᵃ aʳ / r!, for non-negative integer r and constant a > 0, defines a distribution. Show that

     (i) the probability sums to unity (ii) the mean is a (iii) the variance is a

6  The time X, in hours after baking, at which a baker will have sold all his bread has the p.d.f. (probability density function) p(x) = k(36 − x) for 0 ≤ x ≤ 6, p(x) = 0 otherwise. Find k and sketch the curve. Calculate the mean value and find the probability that there will be bread left after five hours.

7      Do Q6 again with p(x) = k (36-x²)

8      The length of an offcut (the remainder after cutting to a precise length) of wooden planking is a random variable which can take any value up to half a metre. The probability of the length not exceeding x is kx. Find (i) the value of k (ii) the pdf for X (iii) E(X) (iv) Var(X)
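The three claims in question 5 can be tested numerically before they are proved. The sketch below is a check, not a proof: the value a = 2.5 and the cut-off at r = 60 are arbitrary assumptions, and the infinite sums are truncated, so the results are only approximately equal to 1, a and a.

```python
import math

def poisson_probs(a, r_max):
    """Return [P(X=0), ..., P(X=r_max)] for P(X=r) = e^-a a^r / r!."""
    probs = [math.exp(-a)]
    for r in range(1, r_max + 1):
        # successive terms differ by the factor a / r
        probs.append(probs[-1] * a / r)
    return probs

a = 2.5                      # hypothetical parameter
probs = poisson_probs(a, 60)  # terms beyond r = 60 are negligible for this a

total = sum(probs)
mean = sum(r * p for r, p in enumerate(probs))
variance = sum(r * r * p for r, p in enumerate(probs)) - mean ** 2

print(total, mean, variance)
```

Trying a few other values of a shows the mean and the variance tracking a together, which is the pattern the proof has to explain.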

Many exam boards find that the connection (or its lack) between the pdf, p(x), and the probability P(X≤x) is a source of grade discrimination, so it is worth spending time on getting the distinction clear.
The pdf is p(x).
The cumulative distribution function, probably best labelled CD(x) but usually written F(x), is the integral of p(x) from the lower limit to x. Think of the capitalisation as meaning it's an integral of something else. This apparent recursiveness, integrating a function of x as far as x, drives some students towards giving up, when I would rather see it as something to marvel at.
The next source of difficulty is that F(x) = P(X≤x) = ∫ p(y) dy  [lower limit < y < x]. By this point the notation has lost some students, so it needs a clear explanation. Whether that occurs earlier or later is for you to decide. Often all the variables are called x, which adds to the confusion for some, while for others it is changing the symbol that confuses. Some texts use f(x) for the probability function and F(x) for its integral. The test for sense is always that, when integrated across the full range of the variable, the probabilities total unity: ∫ p(y) dy = 1. So diagrams showing P(X≤x) (in the sidebar) are a significant help in indicating that an integration is required. I think this is the best way to prevent confusion.
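Writing the integral as a small program can make the roles of x and y clearer: y is the variable that runs along the axis, and x is only the place where the running stops. The sketch below assumes a made-up density, p(y) = 2y on 0 ≤ y ≤ 1, and approximates the integral with a simple midpoint rule; neither choice comes from the text.

```python
def pdf(y):
    """Hypothetical density: p(y) = 2y on 0 <= y <= 1, zero elsewhere."""
    return 2.0 * y if 0.0 <= y <= 1.0 else 0.0

def cdf(x, low=0.0, steps=10_000):
    """F(x) = P(X <= x): integrate p(y) dy from the lower limit up to x."""
    if x <= low:
        return 0.0
    width = (x - low) / steps
    # midpoint rule: sample p at the middle of each strip, y = low + (k + 0.5) * width
    return sum(pdf(low + (k + 0.5) * width) for k in range(steps)) * width

print(cdf(0.5))   # approximately 0.25
print(cdf(1.0))   # approximately 1, the test for sense
```

For this density F(x) = x², so the two printed values can be checked by hand. The function cdf takes x as its argument, while y appears only inside the sum, which is exactly the distinction the notation is trying to make.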

1    I cannot persuade iWeb to accept the formatting of equations from Word
2     E(X²) − E²(X) = E((X − µ)²) = ∫ (x − µ)² p(x) dx = ∫ x² p(x) dx − µ²
3     E(die) = 21/6 = 3.5; Var(die) = 91/6 - 441/36 = 35/12
4     E(uniform) = (a+b)/2; Var(uniform) = (b − a)²/12 (for continuous x)
5     The answers are given in the question.
6.   k = 1/198, since ∫ p dx = 1 over 0 ≤ x ≤ 6. The mean is ∫ x p dx = 32/11, so just under three hours. The integral of p dx from 0 to 5 is P(X≤5) = 335/396, the probability that the bread has all gone within five hours, so the probability that there is bread left is
1 − 335/396 = 61/396 ≈ 0.154, which seems about right to me.
7.   k = 1/144, mean = 9/4, probability of bread left after five hours = 17/432; the same method as in 6 gives all three.
8.   Incorrect read of the problem: taking the pdf to be p(x) = kx, rather than P(X≤x) = kx.    ∫ p dx = 1 => ∫ kx dx = 1, 0 < x < 0.5, => k = 8, so the pdf would be p(x) = 8x. [Or is the pdf simply constant? Is it x that is distributed uniformly, or its probability?]

     E(X) = ∫ x p dx = ∫ 8x² dx = [8x³/3], 0 < x < 0.5, => 1/3 (a believable result)

     Var(X) = ∫ x² p dx − E²(X) = [8x⁴/4] − 1/9 = 1/8 − 1/9 = 1/72 => s.d. of about 0.12, also believable.

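     Correct read: P(X≤x) = ∫ p dx = kx [0 < X < x], which must equal 1 at x = 0.5 => k = 2; the pdf is the derivative of kx, so p(x) = 2 [0 ≤ x ≤ 0.5]: x itself is uniformly distributed.

     Now E(X) = ∫ x p dx = ∫ 2x dx = [2x²/2], 0 < x < 0.5, => 1/4 (another believable result)

     Var(X) = ∫ x² p dx − E²(X) = [2x³/3] − 1/16 = 1/12 − 1/16 = 1/48 => s.d. of about 0.144, also believable.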
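The answers to questions 6, 7 and 8 (correct read) can be cross-checked with exact arithmetic, since every pdf involved is a polynomial. The helper functions below (integrate, times_x, summary) are hypothetical names written for this sketch; a polynomial is stored as a list of coefficients, lowest power first.

```python
from fractions import Fraction

def integrate(coeffs, lo, hi):
    """Exact integral from lo to hi of sum(c_n x^n); coeffs[n] multiplies x^n."""
    def antiderivative(x):
        return sum(Fraction(c) * Fraction(x) ** (n + 1) / (n + 1)
                   for n, c in enumerate(coeffs))
    return antiderivative(hi) - antiderivative(lo)

def times_x(coeffs):
    """Multiply a polynomial by x (every coefficient moves up one power)."""
    return [0] + list(coeffs)

def summary(shape, lo, hi):
    """For p(x) = k * shape(x) on [lo, hi], return k, E(X), Var(X) and p."""
    k = 1 / integrate(shape, lo, hi)          # total probability must be 1
    pdf = [k * c for c in shape]
    mean = integrate(times_x(pdf), lo, hi)
    var = integrate(times_x(times_x(pdf)), lo, hi) - mean ** 2
    return k, mean, var, pdf

# Question 6: p(x) = k(36 - x) on 0 <= x <= 6
k6, mean6, _, pdf6 = summary([36, -1], 0, 6)
left6 = 1 - integrate(pdf6, 0, 5)             # P(X > 5): bread left after 5 hours

# Question 7: p(x) = k(36 - x^2) on 0 <= x <= 6
k7, mean7, _, pdf7 = summary([36, 0, -1], 0, 6)
left7 = 1 - integrate(pdf7, 0, 5)

# Question 8, correct read: constant pdf on 0 <= x <= 1/2
k8, mean8, var8, _ = summary([1], 0, Fraction(1, 2))

print(k6, mean6, left6)
print(k7, mean7, left7)
print(k8, mean8, var8)
```

The printed fractions match the values quoted in the answers: 1/198, 32/11 and 61/396 for question 6; 1/144, 9/4 and 17/432 for question 7; 2, 1/4 and 1/48 for question 8.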

© David Scoins 2017