FM Revision 2 - Statistics | Scoins.net | DJS

## FM Revision 2 - Statistics

1a) X has a p.d.f. given by $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$. Show that $E(X) = 1/\lambda$ and find $\mathrm{Var}(X)$.

b) The lifetime of a certain make of television tube is a random variable T whose probability density function is $f(t) = Ae^{-kt}$ for $0 \le t < \infty$ with $k > 0$, and $f(t) = 0$ elsewhere. Find A in terms of k.

c) After some research, the manufacturer discovers that of 1000 such tubes, 371 failed within the first two years of use. Estimate the value of k.

d) Using this value, calculate the mean and variance of T.

e) If two such tubes are bought, what is the probability that one of them fails within 6 months while the other lasts more than six years?

2  Motorists in the west of Ireland often have a choice between a direct route and a longer but scenic detour. Observation shows that about one car in forty will take the detour. The local road engineer wants to do some repairs on the detour and he chooses a time of year when he thinks 100 cars per day will use the road (make a choice to detour or not) in daylight. Find the probability that:
a)  In one day (discount the workers themselves) there are (i) no cars at all (ii) up to four cars on the detour.
b)  For one third of a day (the day-lit part) he will need to close the road completely. What is the probability that the work will affect no drivers?

3   On a large construction project, there are on average two telephone calls every five minutes. Stating your assumptions, write the probability that in a period of t minutes there are (a) no phone calls (b) at least one phone call.
c) Use this last result to write the cumulative distribution function for the length of time between phone calls and hence establish that the probability density function is $f(t) = 0.4e^{-0.4t}$ for $0 \le t < \infty$.   Calculate
d)  the mean time between calls
e)  the median time between calls
f)  Given that there was no call for 3 minutes, what is the probability of having no calls for five minutes?

4   A machine produces camshafts for an engine; these are normally distributed with a mean of 8cm and a variance of 0.0009 cm².

a. Find the percentage of camshafts between 7.93 and 7.97cm

b. What is the probability that two successive camshafts fall in this interval?

c. Camshafts with diameters outside 8±0.05cm are rejected. If the mean stays constant, what is the target rmsd (std deviation) to have only 4% of the production rejected?

5     People suffering from mental illness tend to have recurrences of their illness. Many of these illnesses are not merely anti-social, they are positively dangerous to society. Therefore it is important to time check-ups to minimise the danger to society. The symptoms of the illness, the things which count as unacceptable behaviour, recur as an exponential distribution with parameter λ>0.

a)     Find, in terms of λ and t, the probability that neither of two sufferers will show renewed symptoms within t days of their last treatment.

b)     Given that two patients have no renewed symptoms for t days after a treatment find the probability that both will remain free of symptoms for a further t days.

c)     Another subject is found to have renewed symptoms during his routine check t days after the last treatment. Show that the probability that the renewed symptoms first showed less than kt days (0≤k≤1) before the day of the routine check is $(1-e^{-\lambda(1-k)t}) / (1-e^{-\lambda t})$

d)     Find the value of λ that represents any single patient going for four weeks with a 5% chance of regression (renewed symptoms). Use this and your previous answer to find the probability that the last patient was showing symptoms for less than a week before the routine four week check-up. Hence show the reverse, that symptoms have been showing for longer than that.

e)     Comment on the timing between checkups. Write several sentences but less than half a page.

6    The number of container ships arriving at a small port between successive high tides is a Poisson distribution with a mean of two. The depth of water is such that vessels can only enter the port at high tide. The port has dock space for only three ships of this size, each of which is emptied (discharged) and can leave the dock at the next tide. Only the first three ships can enter the dock area; any others must wait.

Starting from an evening (Sunday) high tide after which no ships are left waiting, find, to three decimal places, the probabilities that, after the next (Monday) morning high tide,

(i)    all three dock spaces remain empty

(ii)    all three berths are filled.

(iii)   that no ships are left waiting after the following evening tide (Monday)

(iv)   that at no time in a seven day period is a ship left waiting outside the dock area for the next tide.

(v)    If a fourth dock is contemplated, how does the last answer change? Long, considered answer, please. Imagine you’re being paid a consultation fee by a decision-maker.

7   A producer of 500g tins of wax polish has a specification that says 97.5% of tins will exceed 500g and 99.9% will exceed 495g. The masses are expected to follow a normal distribution.

a)  If the mean mass is 501g, calculate the greatest standard deviation (root of variance) such that both conditions are satisfied.

b)  If the variance is 0.25, find the least mean such that both conditions are satisfied.

There is a loss of weight over time due to evaporation. After a two-year shelf life, the loss is 5% with very little variation. So that the goods sold in the shops shall meet the conditions as to quantities, the manufacturer decides to set the mean mass at production at 540g on the grounds that (i) the extra material costs less than the cost of soothing unsatisfied customers and (ii) the ‘sell-by date’ can be two years from manufacture.

c) Calculate the greatest value of the standard deviation (root of variance) so that both specification conditions still apply after two years.

d) A collection of tins that is two years old is to be sampled. It would be acceptable if 99% of samples fit a distribution of N(512, 5.8). How big do you think the samples should be?

8  A random sample of 600 people from Gaoxin were surveyed and the results show that 30% drink date milk (hong zao niu nai). In a second sample of 300 people, 96 drink it. Find an unbiased estimate of the proportion of people who drink it.

9   The lifetime of a shuttlecock is the number of hours of continuous play before it becomes unusable. A random sample of 40 shuttlecocks has a mean lifetime of 4 hours with a standard deviation of 1.1 hours.

(a)  Find the value of c so that c<μ<∞ is a 95% one-sided confidence interval for μ, the mean lifetime of a shuttlecock. Explain what this answer means.

A new supplier claims his shuttlecocks will last longer for the same price. At the next opportunity, the box of samples is used and the sample of 20 has a mean of 4.2 hours and the same variance.
(b) What do you think? Why? Explain your choice of statistical method.

(c) Another look at the figures shows that the new sample had a variance of 0.6², not 1.1², and that the mean was 4.52 not 4.2. Now what do you think?

10  There is lots of argument in Britain about single-sex schooling. The evidence suggests that in general girls do better in a single-sex environment but boys do better in a mixed sex school. To test the hypothesis, 140 girls of similar ability are split into two groups, 68 attending all-girls classes and 72 going to mixed classes. All the classes follow the same syllabus and after two terms the girls are given a test, summarised as follows:

Mixed: Σx = 7920 Σx² = 879912

Single: Σx = 7820 Σx² = 904808

Treating both samples as large and from normal distributions with a common variance, obtain a pooled two-sample estimate of the common population variance, σ². Test whether, at the 1% level, the girls reach a higher standard in single-sex classes.

Updated answers in Xi’an school office, 2008, not here. Minimum two hours work on this page, possibly eight for UK students.

1[Cranshaw & Chambers p309 Eg 5.22]

A=k = 0.232

E(T) = 1/k = 4.3 years

Var(T) = 1/k² = 19 (2s.f.)

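P(T<1) = 0.207, P(T>6) = 0.249

so P(two tubes bought and both results happen) = 2 × 0.207 × 0.249 = 0.103 (3 s.f.). This takes one year for the early failure; with the six months stated in (e), P(T<0.5) = 0.109 and the answer becomes 2 × 0.109 × 0.249 = 0.054.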
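The arithmetic for Q1 can be checked with a short sketch. It assumes T is exponential with rate k and that "371 of 1000 failed within two years" means 1 - e^(-2k) = 0.371. The helper names are illustrative only. Because the worked answer uses one year where the question says six months, both versions are computed.

```python
import math

# Assumption: T ~ Exponential(k) and P(T < 2) = 371/1000.
k = -math.log(1 - 0.371) / 2       # roughly 0.232

mean_T = 1 / k                     # E(T) = 1/k
var_T = 1 / k ** 2                 # Var(T) = 1/k^2


def p_fail_before(t, rate=k):
    """P(T < t) for an exponential lifetime with the given rate."""
    return 1 - math.exp(-rate * t)


def p_survive_beyond(t, rate=k):
    """P(T > t) for an exponential lifetime with the given rate."""
    return math.exp(-rate * t)


# Either tube may be the early failure, hence the factor 2.
p_one_year = 2 * p_fail_before(1) * p_survive_beyond(6)
p_six_months = 2 * p_fail_before(0.5) * p_survive_beyond(6)

print(round(k, 3), round(mean_T, 1), round(var_T, 1))
print(round(p_one_year, 3), round(p_six_months, 3))
```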

2 [MEI Stats 2 P49 Q2]

a) (i) P(X=0) = 0.082

(ii) P(X≤4) = 0.891;   [P(X=4) = 0.1336]

b) 0.4346
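A minimal sketch of the Q2 figures, assuming the number of detouring cars is modelled as Poisson with mean 100/40 = 2.5 per day (the usual approximation to B(100, 1/40)) and mean 2.5/3 for the closed third of the day. The function names are illustrative.

```python
import math


def poisson_pmf(r, mean):
    """P(X = r) for X ~ Po(mean)."""
    return math.exp(-mean) * mean ** r / math.factorial(r)


def poisson_cdf(r, mean):
    """P(X <= r) for X ~ Po(mean)."""
    return sum(poisson_pmf(i, mean) for i in range(r + 1))


mean_day = 100 / 40                               # assumed Poisson mean per day

p_none = poisson_pmf(0, mean_day)                 # a(i) no cars on the detour
p_up_to_four = poisson_cdf(4, mean_day)           # a(ii) up to four cars
p_none_closed = poisson_pmf(0, mean_day / 3)      # b) nobody affected by the closure

print(round(p_none, 3), round(p_up_to_four, 3), round(p_none_closed, 4))
```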

3 (C&C P316 Ex 5g Q5)

a) $e^{-0.4t}$

b) $1 - e^{-0.4t} = P(T \le t)$, so $F'(t) = f(t) = 0.4e^{-0.4t}$

d)2.5 = 1/k

e) 1.73 = 2.5 ln2

f)  0.135 = P(T≥5) = P(T≥3) × P(T≥2) = $e^{-2}$

OR P(T≥5)/P(T≥3) = P(T≥2) = $e^{-0.8}$ = 0.45

depending on whether you think the two times are independent or dependent. Book thinks $e^{-2}$. Class unanimous at $e^{-0.8}$. I thought the latter too high, but it is a little like tossing coins; every extra two minutes you still have a half chance of a call. The question is conditional ("given that"), so the memoryless property of the exponential distribution supports $e^{-0.8}$.

Note the parallel with half-life from physics, which presumably is answer (e), close to root 3.

Like Q5, need to use third answer to find first two. Not obvious which distribution applies, but that the distribution is partly discrete (counting calls) and partly continuous (the time between them) gives the clue that it is not quite Poisson and so exponential. The distinction is that the time between calls has density $f(t) = \lambda e^{-\lambda t}$, the cumulative distribution function for the length of time between calls is $F(t) = 1 - e^{-\lambda t}$, and the number of calls behaves as if Poisson.
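The two candidate answers to 3(f) can be compared numerically. This sketch assumes the time between calls is exponential with rate 2/5 per minute; `survival` is an illustrative helper for P(T > t).

```python
import math

rate = 2 / 5                       # calls per minute (two every five minutes)


def survival(t):
    """P(T > t): no call in a period of t minutes."""
    return math.exp(-rate * t)


mean_gap = 1 / rate                # d) mean time between calls
median_gap = math.log(2) / rate    # e) solves 1 - exp(-rate * m) = 0.5

# f) conditional reading: P(T > 5 | T > 3)
p_conditional = survival(5) / survival(3)
# f) product reading: P(T > 3) * P(T > 2), which is just P(T > 5)
p_product = survival(3) * survival(2)

print(mean_gap, round(median_gap, 2))
print(round(p_conditional, 3), round(p_product, 3))
```

The conditional value equals `survival(2)`, which is the memoryless property: having already waited three minutes does not change the chance of waiting two more.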

4 (MEI Stats2Ex3a P 68 Q 19 (basis))

a) 14.87%, b) 0.0222, c) 0.05/2.054 = 0.0243
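A sketch of the Q4 working using `statistics.NormalDist` from the Python standard library (3.8+). It assumes rejection is split equally between the two tails, 2% each. Exact normal values differ from four-figure tables in the last digit.

```python
from statistics import NormalDist

# Diameters ~ N(8, 0.0009), so the standard deviation is 0.03 cm.
shaft = NormalDist(mu=8, sigma=0.0009 ** 0.5)

# a) proportion between 7.93 and 7.97 cm
p_interval = shaft.cdf(7.97) - shaft.cdf(7.93)

# b) two successive camshafts, assumed independent
p_two = p_interval ** 2

# c) 4% rejected in total means 2% in each tail, so 0.05 = z * sigma
z_98 = NormalDist().inv_cdf(0.98)
target_sigma = 0.05 / z_98

print(round(100 * p_interval, 2), round(p_two, 4), round(target_sigma, 4))
```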

A very difficult question:

5  a) $e^{-2\lambda t}$, b) $e^{-2\lambda t}$; c) look at P(T<t<kt|T<t) and the integrals that relate to that (< means ≤). Formula works when 0<k<1, less than the regular treatment period.

(d) $e^{-28\lambda} = 0.95$, λ = 0.001832; k = 1/4, t = 28. But k = 7 does not fit the formula; k is only applicable while 0≤k<1. Try integral to 21 days / to 28 days. Last lambda suggests about 75% probability, 0.0377394 × 20 = 0.7548.
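The numbers in 5(d) can be reproduced by evaluating the expression quoted in part (c) at k = 1/4 and t = 28. This is a sketch only: `p_formula` is an illustrative name and it takes the question's formula at face value.

```python
import math

# d) a 5% chance of regression within 28 days: exp(-28 * lam) = 0.95
lam = -math.log(0.95) / 28


def p_formula(k, t, rate):
    """The expression quoted in part (c), evaluated as given in the question."""
    return (1 - math.exp(-rate * (1 - k) * t)) / (1 - math.exp(-rate * t))


numerator = 1 - math.exp(-lam * 21)    # the integral to 21 days
denominator = 1 - math.exp(-lam * 28)  # the integral to 28 days, 0.05 by construction
p_week = p_formula(0.25, 28, lam)

print(round(lam, 6), round(numerator, 7), round(p_week, 4))
```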

Comment starts with the point that the interval between treatments needs to be reduced, but could go on to discuss the problem of an open society in which the incidence of ‘bad’ events is kept to zero; that suggests that the treatments are ‘too’ often, that the incidence is ‘too’ safe. Students could go on to discuss how lambda is measured and that investigations in more closed societies (e.g. prison) might be good sources for better information. In the US, the problem of relatively easy access to guns makes the events where symptoms have recurred expensive to the society.

This question is very difficult. Exam technique says, as in Q3, to use the given answer (c) to work back to the earlier ones. The expression P(T<t<kt|T<t) gives the route to all of the answers.

The discussion/comment should say that the timing of check-ups is crucial and that the evidence suggests the check-ups should be sooner (smaller gaps between them). When the % has fallen, the outsiders’ cry will always be that the interval is too short; a recipe for apparent commercialism and profiteering. On the other hand, the cost of getting it wrong, in a litigious society.......

6       1Bryars P166 Ex 11.3 Q14

(i) 0.135, (ii) 0.323 (book says 0.677),

(iii) 0.812 = $e^{-4} \times 133/3$ (book says 0.84). X~Po(6) will not work; need to pair (6, 0), (5, ≤1), (4, ≤2), (≤3, ≤3) for answer to make sense. Go on, disagree !!

(iv) Seven days, 14 tides; $0.85712346^{14} = 0.1155$

(v) Four berths: $0.9473469827^{14} = 0.46895$. Discuss effect on boats which are left out in rough seas, and how concerned harbour masters might encourage ships to go elsewhere in bad weather – knowing how full the berths are.
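A sketch of the Q6 calculations, assuming arrivals between successive tides are independent Po(2). For (iii) it follows the pairing described above: any overflow from the morning tide is added to the evening arrivals and the total must fit three berths. For (iv) and (v) it reproduces the "every one of 14 tides has no more arrivals than berths" calculation.

```python
import math

MEAN = 2  # ships arriving between successive high tides


def pmf(r):
    """P(X = r) for X ~ Po(2)."""
    return math.exp(-MEAN) * MEAN ** r / math.factorial(r)


def cdf(r):
    """P(X <= r); an empty sum (r < 0) gives 0."""
    return sum(pmf(i) for i in range(r + 1))


p_empty = pmf(0)          # (i) no arrivals
p_full = 1 - cdf(2)       # (ii) three or more arrivals

# (iii) m morning arrivals leave max(m - 3, 0) ships waiting; the evening
# arrivals must then be at most 3 minus that overflow.
p_clear_monday = sum(pmf(m) * cdf(3 - max(m - 3, 0)) for m in range(7))

p_week_three = cdf(3) ** 14   # (iv) three berths, 14 tides
p_week_four = cdf(4) ** 14    # (v) four berths, 14 tides

print(round(p_empty, 3), round(p_full, 3), round(p_clear_monday, 3))
print(round(p_week_three, 4), round(p_week_four, 4))
```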

7  Bryars Rev B Q 14

0.510, 500.98, both using the 97.5% rule.  5.825=18 / 3.090

$5.8 \times (2.326/1)^2 = 31.38$, so samples of 32 tins since expression uses n. It is not an (n-1) case because we have the parent population variance given to us.
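The Q7 limits can be checked as below. The sketch takes each specification as a one-sided normal tail, so the binding condition is whichever gives the smaller standard deviation or the larger mean. The last line simply repeats the arithmetic of the answer for (d) rather than deriving it independently.

```python
import math
from statistics import NormalDist

std = NormalDist()
z975 = std.inv_cdf(0.975)   # about 1.960
z999 = std.inv_cdf(0.999)   # about 3.090
z99 = std.inv_cdf(0.99)     # about 2.326

# a) mean 501 g: both (501 - 500)/sigma >= z975 and (501 - 495)/sigma >= z999
sigma_a = min((501 - 500) / z975, (501 - 495) / z999)

# b) variance 0.25, so sigma = 0.5: the mean must clear both limits
mu_b = max(500 + z975 * 0.5, 495 + z999 * 0.5)

# c) mean 540 g less 5% evaporation is 513 g after two years
mean_after = 540 * 0.95
sigma_c = min((mean_after - 500) / z975, (mean_after - 495) / z999)

# d) the answer's arithmetic: n = 5.8 * (z99 / 1)^2, rounded up
n_tins = math.ceil(5.8 * (z99 / 1) ** 2)

print(round(sigma_a, 3), round(mu_b, 2), round(sigma_c, 3), n_tins)
```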

8 (600*.3+300*96/300) / (600+300) = 0.3067 with thanks to Ma Li, so 31%.

9    C&C Ex 8d P441 Q 16    as basis only

4-1.645*1.1/√40 = 3.7139

40 is large enough for it not to matter whether we use the sample variance. We are 95% certain that the mean is 3.71 or bigger. We accept that 5% of the time we will be wrong.

z = 0.2 / (1.1*√(1/40+1/20)) = 0.664 ~ N(0,1) so … this is unlikely to be significant.

z = ((4.52-4)-0) / √(1.1²/40 + 0.6²/20) = 2.367 ~ N(0,1) so significant at the 1% level. Need z > 2.326 one sided.
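A sketch of the Q9 calculations, treating both samples as large enough for a normal test with unpooled variances. `two_sample_z` is an illustrative helper. Evaluating the part (b) expression as written gives about 0.66, still well short of significance.

```python
import math
from statistics import NormalDist

std = NormalDist()

# a) one-sided 95% lower bound for the mean lifetime
c = 4 - std.inv_cdf(0.95) * 1.1 / math.sqrt(40)


def two_sample_z(mean1, var1, n1, mean2, var2, n2):
    """z statistic for a difference of means, variances not pooled."""
    return (mean2 - mean1) / math.sqrt(var1 / n1 + var2 / n2)


# b) new sample: mean 4.2, same variance 1.1^2
z_b = two_sample_z(4, 1.1 ** 2, 40, 4.2, 1.1 ** 2, 20)

# c) corrected figures: mean 4.52, variance 0.6^2
z_c = two_sample_z(4, 1.1 ** 2, 40, 4.52, 0.6 ** 2, 20)

critical_1pc = std.inv_cdf(0.99)   # one-sided 1% critical value

print(round(c, 4), round(z_b, 3), round(z_c, 3), round(critical_1pc, 3))
```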

Some students think this should be pooled; we do this when both mean and variance of the parent population are unknown. That doesn’t make them wrong; the pooled variance is about 1.188.

The particular difficulty with this question is that it is not clear whether $(n_1S_1^2 + n_2S_2^2)/(n_1+n_2-2) = \hat\sigma^2$ (‘sigma hat’ squared) is used. The issue is one of statement; if the two samples of shuttlecocks are from the same population with unknown mean and variance, then “sigma hat” is appropriate.  If we know that the populations are different but that the means may be the same then √(1.1²/40 + 0.6²/20) is the correct divisor when testing the difference between means. Sigma hat is used when we have no idea of the variance and need a best estimator.

10    Cranshaw & Chambers P483 Eg 9.15

Let y be single sex and x be mixed. Then for x, mean=110 and variance 121: for y, mean = 115 and variance 81. Sx=11, Sy=9; Pooled variance, (72*121+68*81)/(72+68-2) = 103.0435

Test that the difference in means can use 2/n  gives a test statistic of 5.828, so well significant.   !!copying error!!

If H0: the means are the same and H1: mixed is worse,  then a one-tailed test gives

(110-115-0) / √(103.04(1/72+1/68)) = -2.913, so z < -2.326 and we reject H0, concluding that girls fare better in a single sex environment.
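The Q10 figures follow directly from the summary totals. This sketch assumes the sample variances use divisor n, which is what gives 121 and 81. `summarise` is an illustrative name.

```python
import math


def summarise(n, sum_x, sum_x2):
    """Mean and divisor-n variance from the totals."""
    mean = sum_x / n
    variance = sum_x2 / n - mean ** 2
    return mean, variance


n_x, n_y = 72, 68
mean_x, var_x = summarise(n_x, 7920, 879912)   # mixed classes
mean_y, var_y = summarise(n_y, 7820, 904808)   # single-sex classes

# Pooled estimate of the common population variance
pooled = (n_x * var_x + n_y * var_y) / (n_x + n_y - 2)

# One-tailed test of H0: equal means against H1: mixed is worse
z = (mean_x - mean_y) / math.sqrt(pooled * (1 / n_x + 1 / n_y))

print(mean_x, var_x, mean_y, var_y)
print(round(pooled, 4), round(z, 3))
```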

Email: David@Scoins.net      © David Scoins 2021