Paper 1

Question 1

1a) Hint 1: recognise that we have a Poisson distribution for each family

1b) Hint 2: set A ∼ Po(3) and B ∼ Po(2), where each random variable stands for the number of times the respective baby wakes up

1b) Hint 3: assume that the two babies wake up independently of each other

1b) Hint 4: know that P(A = 0 ∩ B = 0) = P(A = 0) × P(B = 0)
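
The calculation in Hint 4 can be checked with a minimal Python sketch using only the standard library. The helper name poisson_pmf is introduced here for illustration; the parameters 3 and 2 come from Hint 2, and independence is assumed as in Hint 3.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Return P(X = k) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


p_a0 = poisson_pmf(0, 3)  # P(A = 0), A ~ Po(3)
p_b0 = poisson_pmf(0, 2)  # P(B = 0), B ~ Po(2)

# Independence lets us multiply: P(A = 0 ∩ B = 0) = P(A = 0) × P(B = 0)
p_neither_wakes = p_a0 * p_b0
print(round(p_neither_wakes, 4))
```

As a cross-check, the product equals e^(-5), which is also P(A + B = 0) because A + B ∼ Po(5) for independent A and B.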

Question 2

2a)i) Hint 1: draw a tree diagram

2a)i) Hint 2: make the first set of branches of the tree diagram Junior, Senior and Staff

2a)i) Hint 3: from each of the first branches, draw a second set of branches: Late and Not Late

2a)i) Hint 4: add the probabilities to the tree diagram, as well as the probabilities of their complementary events

2a)i) Hint 5: P(Junior ∩ Not Late) = P(Junior) × P(Not Late | Junior)

2a)ii) Hint 6: P(Late) = P(Junior ∩ Late) + P(Senior ∩ Late) + P(Staff ∩ Late)

2b) Hint 7: recognise that we are asked for a conditional probability

2b) Hint 8: P(Senior | Late) = P(Senior ∩ Late) / P(Late)
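
The tree-diagram arithmetic in Hints 5 to 8 can be sketched in Python as below. The branch probabilities are hypothetical placeholders, not the values from the question; substitute the ones you are given.

```python
# Hypothetical first-branch probabilities (must sum to 1)
p_group = {"Junior": 0.5, "Senior": 0.3, "Staff": 0.2}
# Hypothetical second-branch probabilities P(Late | group)
p_late_given = {"Junior": 0.1, "Senior": 0.2, "Staff": 0.05}

# Multiply along each path of the tree
joint_late = {g: p_group[g] * p_late_given[g] for g in p_group}
joint_not_late = {g: p_group[g] * (1 - p_late_given[g]) for g in p_group}

# Hint 6: add the three 'Late' paths
p_late = sum(joint_late.values())

# Hint 8: conditional probability
p_senior_given_late = joint_late["Senior"] / p_late

print(round(joint_not_late["Junior"], 4), round(p_late, 4), round(p_senior_given_late, 4))
```

A useful check is that the six path probabilities add up to 1.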

2c) Hint 9: recognise that there are 3 separate groups of people, all of whom need to be sampled

2c) Hint 10: recognise that we want 10% of each group, which gives 10% of the overall population

2c) Hint 11: recognise that this is Stratified Random Sampling

2c) Hint 12: describe in detail how random numbers might be used and how each stratum would be sampled
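
One way to carry out the random-number step is sketched below in Python. The group sizes and the labelling scheme are hypothetical. Each member of a stratum is numbered, and random.Random.sample picks the required number of distinct members, which corresponds to generating random numbers and ignoring any repeats.

```python
import random


def stratified_sample(strata, fraction, seed=None):
    """Take a simple random sample of the given fraction from every stratum."""
    rng = random.Random(seed)
    chosen = {}
    for name, members in strata.items():
        k = round(fraction * len(members))  # e.g. 10% of this stratum
        chosen[name] = rng.sample(members, k)  # sampled without replacement
    return chosen


# Hypothetical population: each person is labelled by group and number
strata = {
    "Junior": [f"Junior-{i:03d}" for i in range(1, 301)],
    "Senior": [f"Senior-{i:03d}" for i in range(1, 201)],
    "Staff": [f"Staff-{i:02d}" for i in range(1, 51)],
}
sample = stratified_sample(strata, 0.10, seed=1)
print({name: len(people) for name, people in sample.items()})
```

Because each stratum contributes the same fraction, the combined sample is 10% of the whole population, as Hint 10 requires.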

Question 3

3a) Hint 1: perform a standard hypothesis test of β

3a) Hint 2: remember that the t distribution used here has n-2 degrees of freedom
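
The test statistic for the slope can be sketched in Python as follows, assuming the usual null hypothesis H0: β = 0. The data are hypothetical, and the critical value still has to be read from t tables with n - 2 degrees of freedom.

```python
import math


def slope_t_statistic(x, y):
    """Return (t, degrees of freedom) for testing H0: beta = 0 in y = alpha + beta x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_squared = ss_res / (n - 2)  # residual variance: two parameters were estimated
    se_b = math.sqrt(s_squared / sxx)
    return b / se_b, n - 2


# Hypothetical data
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
t, df = slope_t_statistic(x, y)
print(round(t, 2), df)
```

Compare |t| with the critical value of the t distribution on df degrees of freedom at the chosen significance level.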

3b) Hint 3: know that the product moment correlation coefficient measures the strength of a linear relationship

3b) Hint 4: state what value you would expect the pmcc to take if the model were useful

Question 4

4a) Hint 1: standard normal probability calculation; note that we are given the value of the standard deviation, not the variance

4b) Hint 2: recognise that we are looking for the value above which 90% of the distribution lies

4b) Hint 3: use inverse normal tables or calculator functions to obtain the value that has 10% below it

4b) Hint 4: present your answer with a sensible degree of accuracy
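
Python's statistics.NormalDist covers both calculations. Its second argument is the standard deviation, which matches the warning in Hint 1. The mean and standard deviation below are hypothetical.

```python
from statistics import NormalDist

mu, sigma = 50.0, 4.0  # hypothetical; sigma is a standard deviation, not a variance
X = NormalDist(mu, sigma)

# Part (a)-style probability, e.g. P(X < 55)
p = X.cdf(55)

# Part (b): the value with 10% below it, so that 90% of the distribution lies above it
x_10 = X.inv_cdf(0.10)
print(round(p, 4), round(x_10, 1))
```

The rounding in the print call is one way of presenting the answer to a sensible degree of accuracy, as Hint 4 asks.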

4c) Hint 5: recognise that we do not have 3 random variables here, but rather 201 random variables

4c) Hint 6: know that the formula for the variance of a sum of random variables requires them all to be independent

4c) Hint 7: calculate the expectation and variance of the sum of all 201 random variables

4c) Hint 8: know that the sum of independent normally distributed random variables is itself normally distributed

4c) Hint 9: calculate the desired probability using a single normal distribution with parameter values just calculated
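
The key idea in Hints 6 and 7 is that the means add and the variances add, one term per random variable. A minimal Python sketch follows; the split into 200 identical variables plus one different variable, and all of the parameter values, are hypothetical.

```python
import math
from statistics import NormalDist


def sum_of_independent_normals(params):
    """params: list of (mean, standard deviation) pairs for independent normal variables."""
    total_mean = sum(m for m, _ in params)
    total_variance = sum(s ** 2 for _, s in params)  # variances add; standard deviations do not
    return NormalDist(total_mean, math.sqrt(total_variance))


# Hypothetical: 200 items each N(2.0, 0.1^2) plus one item N(50.0, 1.5^2)
params = [(2.0, 0.1)] * 200 + [(50.0, 1.5)]
T = sum_of_independent_normals(params)

print(round(T.mean, 2), round(T.variance, 4))
print(round(1 - T.cdf(455), 4))  # e.g. P(T > 455)
```

Treating the 200 items as 200 × (one item) would wrongly multiply the variance by 200² instead of 200.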

Question 5

5a) Hint 1: recognise that the sampling involves taking a pebble at regularly spaced intervals

5b) Hint 2: consider whether pebbles in a single location of the stream are representative of all pebbles in the stream

5b) Hint 3: using your knowledge of how pebbles come to rest in a flowing stream, give a reason why larger or smaller pebbles may each cluster together

5c) Hint 4: note that we have the mean and standard deviation, but not the type of distribution of the pebble sizes

5c) Hint 5: note that we have a large sample size of 100 pebbles

5c) Hint 6: note that we can use the Central Limit Theorem to claim that the sample mean is approximately normally distributed

5c) Hint 7: proceed to work out a z-interval

5c) Hint 8: comment on what it means for the interval to contain, or not contain, the previous sample mean
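
A z-interval of the form x̄ ± z × s/√n can be sketched in Python as below. The sample mean, standard deviation, 95% confidence level and previous mean are hypothetical; only n = 100 comes from the hints.

```python
import math
from statistics import NormalDist


def z_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Approximate confidence interval for a population mean, justified by the CLT for large n."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * sample_sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


low, high = z_interval(sample_mean=42.0, sample_sd=8.0, n=100)  # hypothetical values in mm
previous_mean = 40.0  # hypothetical earlier sample mean
print(round(low, 2), round(high, 2), low <= previous_mean <= high)
```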

Question 6

Hint 1: note that with m = 4 and n = 7, we have a total of 11 data points to rank in order

Hint 2: write out the numbers 1 to 11 inclusive

Hint 3: picking a sample of size 4 each time, determine which four numbers from 1 to 11 can give a rank sum of 10

Hint 4: repeat to find all of the ways for four numbers to add together to give a rank sum of 11

Hint 5: repeat for 12, 13 and 14

Hint 6: the required probability is the total number of ways listed divided by the total number of ways of choosing 4 items from 11, which is 11C4 = 330
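
The listing in Hints 3 to 5 can be checked by brute force in Python: itertools.combinations generates every choice of 4 ranks from 1 to 11, and math.comb gives the total. This assumes, as the hints do, that the rank sums of interest are 10 to 14 inclusive and that there are no tied ranks.

```python
from itertools import combinations
from math import comb

ranks = range(1, 12)  # ranks 1 to 11 for m + n = 4 + 7 data points

# Every set of 4 ranks whose sum is at most 14 (the smallest possible sum is 1 + 2 + 3 + 4 = 10)
listed = [c for c in combinations(ranks, 4) if sum(c) <= 14]
total = comb(11, 4)

print(len(listed), total, round(len(listed) / total, 4))
```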

Question 7

7a) Hint 1: when testing whether a set of data fits a given distribution, use a chi-squared goodness of fit test

7a) Hint 2: after working out the expected frequencies, check to see if there are any that are < 1 and whether there are too many < 5

7a) Hint 3: combine categories to ensure that there are a sufficient number of large expected frequencies
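
The combining step and the test statistic can be sketched in Python. Everything numerical here is hypothetical: a B(5, 0.8) model for the number of bulbs that grow in a packet of 5, with 100 packets observed. Adjacent categories are merged until each expected frequency is at least 5, which is one simple way of applying Hints 2 and 3.

```python
from math import comb


def combine_small_cells(observed, expected, threshold=5.0):
    """Merge adjacent categories (left to right) until every expected frequency >= threshold."""
    obs_out, exp_out = [], []
    o_acc = e_acc = 0.0
    for o, e in zip(observed, expected):
        o_acc += o
        e_acc += e
        if e_acc >= threshold:
            obs_out.append(o_acc)
            exp_out.append(e_acc)
            o_acc = e_acc = 0.0
    if e_acc > 0 or o_acc > 0:  # a small leftover tail is merged into the last kept category
        if exp_out:
            obs_out[-1] += o_acc
            exp_out[-1] += e_acc
        else:
            obs_out.append(o_acc)
            exp_out.append(e_acc)
    return obs_out, exp_out


n_bulbs, p_grow, packets = 5, 0.8, 100  # hypothetical model and sample size
observed = [0, 1, 6, 22, 38, 33]  # hypothetical counts for 0, 1, ..., 5 bulbs growing
expected = [packets * comb(n_bulbs, k) * p_grow**k * (1 - p_grow) ** (n_bulbs - k)
            for k in range(n_bulbs + 1)]

obs, exp = combine_small_cells(observed, expected)
chi_squared = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
df = len(obs) - 1  # subtract 1 more for each parameter estimated from the data
print([round(e, 3) for e in exp], round(chi_squared, 3), df)
```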

7b) Hint 4: know that binomial models require independent trials with a fixed probability of success

7b) Hint 5: consider whether the bulbs in a packet are independent and/or if they have the same probability of growth

Question 8

8a) Hint 1: look at the key words in the question to determine that a single sample z-test for a proportion is required

8b) Hint 2: realise that the workings from part (a) can in principle be recycled, but with the proportion being d/40 rather than 18/100

8b) Hint 3: work back from the critical z value of -1.645, equating it to the standardised value of the sample proportion d/40, using the null distribution N(0.119, 0.119 × 0.881 / 40)

8b) Hint 4: with the calculated value for d, decide whether it needs to be rounded up or down to the nearest integer in order to answer the question
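
Hint 3 can be rearranged and solved directly. Below is a minimal Python sketch using the figures quoted in Hint 3 (a one-tailed 5% test, hence -1.645, with n = 40 and null proportion 0.119); the rounding decision in Hint 4 is left to you.

```python
import math

p0 = 0.119  # proportion under the null hypothesis
n = 40  # new sample size
z_crit = -1.645  # lower-tail 5% critical value

se = math.sqrt(p0 * (1 - p0) / n)  # standard deviation of the sample proportion under H0

# Solve (d / n - p0) / se = z_crit for d
d = n * (p0 + z_crit * se)
print(round(d, 3))
```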

Question 9

9a) Hint 1: realise that the sample size is 5, not 10.

9a) Hint 2: use all 10 sample means to obtain an estimate for the population mean

9a) Hint 3: calculate the upper 1-sigma limit as the estimated population mean plus 1 standard deviation of the sample mean

9b) Hint 4: notice that 3 of the last 4 sample means lie above the upper 1-sigma limit

9b) Hint 5: know that if 4 out of 5 consecutive sample means lie above the same 1-sigma limit, then a WECO rule is broken

9b) Hint 6: know that if the 21st batch has a sample mean above the upper 1-sigma limit, then this rule is broken

9b) Hint 7: knowing what the mean of the 5 sample values needs to be, and that 4 of the sample values have already been given, calculate the 5th sample value
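
A minimal Python sketch of Hints 3 and 7. The ten sample means, the process standard deviation σ and the four values already given for the 21st batch are all hypothetical; only the sample size of 5 comes from the hints.

```python
import math


def upper_one_sigma_limit(sample_means, sigma, sample_size):
    """Estimated mean plus one standard deviation of the sample mean, sigma / sqrt(n)."""
    mu_hat = sum(sample_means) / len(sample_means)
    return mu_hat + sigma / math.sqrt(sample_size)


def fifth_value_for_mean(four_values, target_mean):
    """The fifth observation that makes the mean of all five equal target_mean."""
    return 5 * target_mean - sum(four_values)


means = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 20.0, 19.7, 20.1, 19.9]  # hypothetical
limit = upper_one_sigma_limit(means, sigma=0.5, sample_size=5)  # sigma is hypothetical

given = [20.4, 20.1, 20.3, 20.0]  # hypothetical first four values of the 21st batch
threshold = fifth_value_for_mean(given, limit)
print(round(limit, 4), round(threshold, 4))
```

Any fifth value above the threshold puts the batch mean above the upper 1-sigma limit.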

Question 10

10a)i) Hint 1: know that the probabilities must sum to 1

10a)i) Hint 2: know that the estimated probabilities can be obtained by 'scaling down' the observed frequencies

10a)ii) Hint 3: use V(X) = E(X²) - [E(X)]²

10a)ii) Hint 4: work out E(X²) from the table
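
A minimal Python sketch of Hints 2 to 4, with a hypothetical frequency table in place of the one in the question: the observed frequencies are scaled down to estimated probabilities, and the variance is E(X²) minus the square of E(X).

```python
values = [7, 8, 9, 10, 11, 12, 13]  # hypothetical values of X
frequencies = [5, 12, 20, 26, 19, 12, 6]  # hypothetical observed frequencies

total = sum(frequencies)
probs = [f / total for f in frequencies]  # 'scaling down' so the probabilities sum to 1

e_x = sum(x * p for x, p in zip(values, probs))
e_x2 = sum(x**2 * p for x, p in zip(values, probs))
var_x = e_x2 - e_x**2  # V(X) = E(X²) - [E(X)]²
print(round(e_x, 4), round(e_x2, 4), round(var_x, 4))
```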

10b)i) Hint 5: note that we have the mean and variance of the distribution

10b)i) Hint 6: notice that the distribution has a shape very similar to that of a normal distribution

10b)ii) Hint 7: note that the value of 10 is from the discrete distribution of X

10b)ii) Hint 8: note that from part (b)(i) we have a continuous normal distribution that is an approximation for X

10b)ii) Hint 9: know that a continuity correction will be needed
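
With a continuity correction, P(X = 10) for the discrete X is approximated by P(9.5 < Y < 10.5) for the normal approximation Y. A minimal Python sketch, assuming X takes integer values and using a hypothetical mean and variance:

```python
import math
from statistics import NormalDist

mean, variance = 10.02, 2.34  # hypothetical; use the values you have calculated
Y = NormalDist(mean, math.sqrt(variance))  # NormalDist takes a standard deviation

# P(X = 10) ≈ P(9.5 < Y < 10.5)
p_10 = Y.cdf(10.5) - Y.cdf(9.5)
print(round(p_10, 4))
```

Without the correction, P(Y = 10) would be 0, since Y is continuous.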

10b)iii) Hint 10: realise that part (b)(ii) was just checking a single value of X against the expected probability

10b)iii) Hint 11: realise that to check the full model, all of the probabilities for the other values of X could be calculated

10b)iii) Hint 12: know that these estimated probabilities could be scaled up to expected frequencies

10b)iii) Hint 13: realise that we then have observed and expected frequencies, and thus a chi-squared goodness of fit test could be used

Question 11

11a)i) Hint 1: know that an outlier is likely to be the point furthest from the linear model, i.e. the one with the largest residual in absolute value

11a)ii) Hint 2: a phrase that every statistician should know off by heart is: 'correlation does not imply causation'

11a)iii) Hint 3: know that one assumption behind a linear model is that the residuals have constant variance

11a)iii) Hint 4: review the graph and determine whether the spread of the points either side of the line remains constant as welfare generosity increases

11b) Hint 5: use the standard method for calculating the equation of a least squares regression line from the provided summary statistics

11b) Hint 6: know that a residual is equal to the observed value minus the fitted value
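
A minimal Python sketch of Hints 5 and 6, computing Sxx and Sxy from summary statistics. The summary statistics are hypothetical (they come from the four points (1, 2), (2, 4), (3, 5), (4, 8)) and the function names are for illustration only.

```python
def least_squares_from_summary(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (a, b) for the least squares line y = a + b x."""
    sxx = sum_x2 - sum_x**2 / n
    sxy = sum_xy - sum_x * sum_y / n
    b = sxy / sxx
    a = sum_y / n - b * sum_x / n  # the line passes through (x̄, ȳ)
    return a, b


def residual(x, y_observed, a, b):
    """Residual = observed value minus fitted value."""
    return y_observed - (a + b * x)


a, b = least_squares_from_summary(n=4, sum_x=10, sum_y=19, sum_x2=30, sum_xy=57)
print(round(a, 4), round(b, 4), round(residual(4, 8, a, b), 4))
```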

Question 12

12a)i) Hint 1: know that a two sample t-test requires the parent populations to be normally distributed

12a)i) Hint 2: know that a two sample t-test requires the variances of the two parent populations to be equal

12a)i) Hint 3: conduct a standard two sample t-test, taking care not to mix up standard deviations with variances
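
The pooled two-sample t statistic in Hint 3 can be sketched in Python with the standard library; statistics.variance uses the n - 1 divisor. The data are hypothetical, and normality and equal population variances are assumed, as in Hints 1 and 2.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(sample1, sample2):
    """Return (t, degrees of freedom) for a two-sample t-test with pooled variance."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq, s2_sq = variance(sample1), variance(sample2)  # sample variances, divisor n - 1
    pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample1) - mean(sample2)) / se, n1 + n2 - 2


t, df = pooled_t_statistic([1, 2, 3], [4, 5, 6])  # hypothetical samples
print(round(t, 3), df)
```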

12a)ii) Hint 4: the two sample standard deviations are 1.53 and 10.8241 ... which are not numerically very close to each other

12b) Hint 5: know that a two sample z-test does not require the two parent population variances to be equal

12b) Hint 6: know that a two sample z-test requires us to know the variances of the populations, which we have just been given

12b) Hint 7: conduct a standard two sample z-test
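
A minimal Python sketch of the two-sample z statistic, using the known population variances (not standard deviations) as Hint 6 describes. All numbers are hypothetical, and the two-tailed p-value is only one option; use the tail that matches your alternative hypothesis.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, mean2, var1, var2, n1, n2):
    """z statistic for H0: mu1 = mu2 with known population variances var1 and var2."""
    return (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)


z = two_sample_z(mean1=52.0, mean2=50.0, var1=9.0, var2=16.0, n1=25, n2=25)  # hypothetical
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 3), round(p_two_tailed, 4))
```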

12c) Hint 8: know that z-tests and t-tests both require normality assumptions, but that non-parametric tests don't make such demanding assumptions

12c) Hint 9: recognise that we are dealing with two samples of non-paired data, so it is not a Wilcoxon Signed Rank test

12c) Hint 10: know the assumptions behind the Mann-Whitney test
