Paper 1

Question 1

1a) Hint 1: look carefully at the intended population

1a) Hint 2: look carefully at the population from which the sample is going to be taken

1a) Hint 3: note down the differences between these two.

1b)i) Hint 4: know that stratified random sampling involves taking a sample from each stratum, so that each sample's size is proportional to the number of items in that stratum

1b)i) Hint 5: notice that 5 films were sampled from each year, regardless of which year it was and how many films had been released during that year

1b)ii) Hint 6: describe what would happen for each individual year, one at a time, and then the full sample is the combining of these annual samples

1b)ii) Hint 7: stratified sampling requires knowing the number of films released in any single year, so call that number N

1b)ii) Hint 8: we need to work out 1% of N, rounded to the nearest whole number - this will be the sample size we take from that year. Call this sample size M.

1b)ii) Hint 9: we need to take a simple random sample of M films from the N available films, for each year

1b)ii) Hint 10: number the N films from 1 to N and then use a random number generator to provide M distinct values between 1 and N inclusive

1b)ii) Hint 11: Use these M distinct values to identify the M films that form the sample from that one year
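
The procedure in Hints 6 to 11 can be sketched in Python. This is only an illustration: the dictionary `films_by_year` and its film names are hypothetical, and the 1% sampling fraction is the one quoted in Hint 8.

```python
import random

def stratified_sample(films_by_year, fraction=0.01, seed=None):
    """Take a simple random sample from each year (stratum) in turn."""
    rng = random.Random(seed)
    sample = {}
    for year, films in films_by_year.items():
        n = len(films)                  # N: films released that year
        m = int(n * fraction + 0.5)     # M: 1% of N, to the nearest whole number
        # Number the films 1..N and draw M distinct numbers from 1..N inclusive.
        chosen = rng.sample(range(1, n + 1), m)
        sample[year] = [films[i - 1] for i in chosen]
    return sample

# Hypothetical strata: 300 films in one year, 520 in another.
films_by_year = {
    2001: [f"film_2001_{i}" for i in range(1, 301)],
    2002: [f"film_2002_{i}" for i in range(1, 521)],
}
sample = stratified_sample(films_by_year, seed=1)
```

The full sample is the combination of the per-year lists, so the larger year contributes more films.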

1c)i) Hint 12: know that the null and alternative hypothesis for a chi-squared test of association will typically talk about 'no association' and 'an association'. (it could also be 'independent' and 'not independent')

1c)i) Hint 13: remember to include the context of what the association is between

1c)ii) Hint 14: know that small expected frequencies can cause problems

1c)ii) Hint 15: use page 6 of the Statistical Formulae and Tables booklet to remind you of the criteria for expected frequencies

1c)iii) Hint 16: know that combining either some rows, or some columns, can increase the observed frequencies, and thus increase the expected frequencies

1c)iii) Hint 17: decide whether to describe combining two rows or two columns (either will do in this case, for example purposes) and show that the number of rows (or columns) reduces, giving a new number of degrees of freedom

1d)i) Hint 18: recognise this is a z-test for a difference in population proportions being performed on the age 18 films.

1d)i) Hint 19: identify the two proportions and their sample sizes

1d)i) Hint 20: calculate the pooled proportion

1d)i) Hint 21: substitute these values into the formula given on page 6 of the Statistical Formulae and Tables booklet

1d)i) Hint 22: if you obtained a positive value of the test statistic, write down an explanation for how the calculation might have been done differently so that a negative test statistic would be generated
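
As a rough illustration of Hints 19 to 22, the sketch below computes a pooled two-proportion z statistic. The counts used are hypothetical, not the figures from the question, and the function name is made up for this example.

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for a difference in two population proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)      # pooled proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical counts: 30 out of 100 in one sample, 18 out of 90 in the other.
z_ab = two_proportion_z(30, 100, 18, 90)
z_ba = two_proportion_z(18, 90, 30, 100)   # samples taken in the other order
```

Swapping the order of the two samples changes only the sign of the statistic, which is the point of Hint 22.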

1d)ii) Hint 23: know that a 2 sample proportion test is built around an original model involving two binomial distributions, B(n1, p1) and B(n2, p2)

1d)ii) Hint 24: know that these binomial distributions are each then approximated with a normal distribution

1d)ii) Hint 25: know that for these approximations to be good, n1 × p1 > 5 and n1 × q1 > 5 as well as n2 × p2 > 5 and n2 × q2 > 5

1d)ii) Hint 26: write down the consequences for the inferential test, if any of these four inequalities are not satisfied

1e) Hint 27: think about the claim that 'numbers of films in every age category have changed'

1e) Hint 28: think whether the researcher actually gathered data on the numbers of films

1e) Hint 29: think about the claim that 'there is an increase in the number of family friendly films'

1e) Hint 30: look back at lines 11 and 12 to read the definition of 'family friendly' films used in this report

1e) Hint 31: notice that U and PG films in the samples have not increased in their quantity

Question 2

2a) Hint 1: know that a two sample z-test would require the data to have come from a normally distributed population, for which the parameter values are known

2a) Hint 2: look at Figure 1 to decide whether there is sufficient evidence to support the assumptions of normality

2a) Hint 3: look at the opening description to the question to decide what population parameter values are known

2b)i) Hint 4: know that a Mann-Whitney test is a test of population medians

2b)i) Hint 5: remember to include in your hypotheses the context of the situation. To be safe, write things out in full, and don't rely on shorthand abbreviations.

2b)ii) Hint 6: know the process that a Mann-Whitney test requires to be followed

2b)ii) Hint 7: you have the data in Table 1, so clearly describe in words what you would do with the 29 values

2b)ii) Hint 8: complete your description with a statement of how the value of 205 would be generated

2b)iii) Hint 9: identify the two sample sizes from Table 1

2b)iii) Hint 10: use Output 1 to determine whether we are doing a one-tailed test or a two-tailed test

2b)iii) Hint 11: using m = 14 and n = 15, look at page 16 of the Statistical Formulae and Tables booklet, to obtain the critical value for a 5% two-tailed test

2b)iii) Hint 12: using the critical value of 164, and the test statistic of 205, decide whether to either reject H0 in favour of H1, or to not reject H0

2b)iii) Hint 13: write the usual conclusion to a hypothesis test, in terms of H1, including context and making sure to include phrasing such as 'evidence to suggest'

2c) Hint 14: know that the missing value is a rank sum

2c) Hint 15: notice that the 15 values of differences are listed in Output 2 and their corresponding ranks are also listed, in the same order

2c) Hint 16: notice that we are after the rank value for a difference of 20, and that there is another 20 in the list, so it must have the same rank

2c) Hint 17: notice that these are the only duplicated values with tied ranks in the whole list, and thus the missing value is 6.5

2d) Hint 18: to calculate W- add up the four ranks that correspond to the negative differences

2d) Hint 19: to calculate W+ add up the eleven ranks that correspond to the positive differences

2d) Hint 20: know that W will be the minimum of W- and W+
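
The ranking and summing described above can be sketched as follows. The list of differences is hypothetical (it is not the data in Output 2); tied absolute differences share the average of the ranks they occupy.

```python
def signed_rank_sums(differences):
    """Return (W-, W+, W) for a list of paired differences."""
    nonzero = [d for d in differences if d != 0]
    ordered = sorted(abs(d) for d in nonzero)

    def average_rank(value):
        first = ordered.index(value) + 1    # first rank this value occupies
        count = ordered.count(value)        # how many values are tied
        return first + (count - 1) / 2

    w_minus = sum(average_rank(abs(d)) for d in nonzero if d < 0)
    w_plus = sum(average_rank(abs(d)) for d in nonzero if d > 0)
    return w_minus, w_plus, min(w_minus, w_plus)

# Hypothetical differences; the two values of size 20 are tied.
differences = [12, -5, 20, 8, -20, 31, 3]
w_minus, w_plus, w = signed_rank_sums(differences)
```

Here the two tied values occupy ranks 5 and 6, so each is given rank 5.5, in the same way that the tied values in the question are each given 6.5.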

2e)i) Hint 21: using Table 2 and Output 2, write down the value of n and whether it is a one-tailed or two-tailed test

2e)i) Hint 22: look at page 15 of the Statistical Formulae and Tables booklet, to look at the critical values for a two-tailed test where n = 15

2e)i) Hint 23: recognise that 18 is between 19 and 15

2e)i) Hint 24: read off that 19 corresponds to 2%, and 15 corresponds to 1%

2e)i) Hint 25: hence we can conclude that the p-value is between 0.01 and 0.02

2e)ii) Hint 26: write the usual conclusion to a hypothesis test, including context and making sure to include phrasing such as 'evidence to suggest'

Paper 2

Question 1

Hint 1: identify the median, lower quartile, upper quartile and calculate the lower and upper fences from these

Hint 2: if any data points lie beyond either fence, then clearly state which point(s) and which fence(s)
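
A minimal sketch of the fence calculation, assuming the usual convention that the fences lie 1.5 interquartile ranges beyond the quartiles. The data and quartiles below are hypothetical.

```python
def fences(lower_quartile, upper_quartile):
    """Lower and upper fences, 1.5 x IQR beyond each quartile."""
    iqr = upper_quartile - lower_quartile
    return lower_quartile - 1.5 * iqr, upper_quartile + 1.5 * iqr

# Hypothetical ordered data with quartiles found by hand.
data = [3, 12, 14, 15, 17, 18, 21, 22, 40]
q1, q3 = 13, 21.5
lower_fence, upper_fence = fences(q1, q3)

# Any points beyond a fence are possible outliers.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
```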

Question 2

2a) Hint 1: consider drawing a tree-diagram

2a) Hint 2: on the tree diagram, the first set of branches is labelled 'route A' and 'route B', and the second sets of branches are labelled 'Late' and 'Not late'

2a) Hint 3: put as many of the probability values from the question on the tree diagram

2a) Hint 4: fill in as many of the remaining gaps on the tree diagram as you can, knowing that pairs of branches' probabilities add to 1

2a) Hint 5: know that P(late) = P(route A ∩ late) + P(route B ∩ late)

2a) Hint 6: this gives P(late) = P(route A) × P(late | route A) + P(route B) × P(late | route B)

2a) Hint 7: now substitute in the corresponding probability values from the tree diagram

2b) Hint 8: recognise that they are wanting the conditional probability of P(route B | late)

2b) Hint 9: use Bayes' theorem to give P(route B | late) = P(route B ∩ late) / P(late)

2b) Hint 10: substitute into this formula the probability values from your workings in part (a)
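
The two calculations in parts (a) and (b) can be sketched like this. All three input probabilities are hypothetical stand-ins for the values given in the question.

```python
# Hypothetical values read from the tree diagram.
p_a = 0.6               # P(route A)
p_late_given_a = 0.10   # P(late | route A)
p_late_given_b = 0.25   # P(late | route B)

p_b = 1 - p_a           # the pair of first branches adds to 1

# Part (a): P(late) = P(A) x P(late | A) + P(B) x P(late | B)
p_late = p_a * p_late_given_a + p_b * p_late_given_b

# Part (b): P(route B | late) = P(route B and late) / P(late)
p_b_given_late = (p_b * p_late_given_b) / p_late
```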

Question 3

3a) Hint 1: recognise that there are 12 repeated trials, each with a probability of success of 0.88

3a) Hint 2: define your random variable, X

3a) Hint 3: with X = number of bins emptied on first lift, state the distribution of X

3a) Hint 4: with X ∼ B(12, 0.88) decide what value of X we want the probability for

3a) Hint 5: calculate P(X = 9) either by the formula nCr × p^r × q^(n−r), or using a graphic calculator

3a) Hint 6: if using a graphic calculator, do NOT write the calculator syntax command as part of your MAIN workings. Instead, write it to the SIDE, so that the examiner knows where your final answer came from.
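
For anyone wanting to check their answer without a graphic calculator, the binomial formula in Hint 5 can be evaluated directly. This sketch uses only the values already stated in the hints.

```python
from math import comb

n, p, r = 12, 0.88, 9
q = 1 - p

# P(X = r) = nCr x p^r x q^(n - r) for X ~ B(12, 0.88)
prob = comb(n, r) * p**r * q**(n - r)
print(round(prob, 4))
```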

3b) Hint 7: recognise that we have a new binomial distribution, with a different number of trials

3b) Hint 8: know that we shall be approximating a binomial distribution with a normal distribution

3b) Hint 9: show that B(48, 0.88) is approximated by N(42.24, 5.0688)

3b) Hint 10: also, as a good habit, remember to show that you've checked that np > 5 and nq > 5, to confirm that it's a good approximation

3b) Hint 11: know that 75% of 48 is 36

3b) Hint 12: decide if you want > 36 bins or ≥ 36 bins, by reading the question carefully

3b) Hint 13: remember continuity correction, as we are going from a discrete distribution to a continuous distribution
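
The sketch below checks the parameters in Hint 9 and shows how the continuity correction differs for the two readings in Hint 12. It does not decide which reading the question intends.

```python
from math import sqrt
from statistics import NormalDist

n, p = 48, 0.88
q = 1 - p
mean = n * p            # np
variance = n * p * q    # npq

# The approximation is reasonable when np > 5 and nq > 5.
conditions_met = n * p > 5 and n * q > 5

approx = NormalDist(mean, sqrt(variance))

# P(X >= 36) becomes P(Y > 35.5); P(X > 36) becomes P(Y > 36.5).
p_at_least_36 = 1 - approx.cdf(35.5)
p_more_than_36 = 1 - approx.cdf(36.5)
```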

Question 4

Hint 1: recognise that this is a straightforward chi-squared test for goodness of fit

Hint 2: generate the corresponding expected frequencies for each grade, using the total of the observed frequencies and the model's U(5) distribution

Hint 3: write down your null and alternative hypotheses, mentioning the context and the model

Hint 4: decide if it is a one-tailed test, or a two-tailed test

Hint 5: calculate the test statistic, X²

Hint 6: determine the number of degrees of freedom, using the number of categories and the number of constraints

Hint 7: generate a p-value, or obtain a critical value from page 14 of the Statistical Formulae and Tables booklet

Hint 8: decide whether to either reject H0 in favour of H1, or to not reject H0

Hint 9: write the usual conclusion to a hypothesis test, in terms of H1, including context and making sure to include phrasing such as 'evidence to suggest'
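
A rough sketch of the goodness-of-fit calculation under a U(5) model. The observed frequencies are hypothetical, and the critical value quoted is the standard upper 5% point of chi-squared with 4 degrees of freedom.

```python
# Hypothetical observed frequencies for the five grades.
observed = [18, 25, 22, 15, 20]
k = len(observed)

# Under U(5), each grade is equally likely.
expected = [sum(observed) / k] * k

x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Categories minus constraints (the totals must agree).
df = k - 1

critical_5pct = 9.488   # upper 5% point of chi-squared, 4 degrees of freedom
reject_h0 = x2 > critical_5pct
```

With these made-up frequencies the statistic is well below the critical value, so H0 would not be rejected.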

Question 5

5a) Hint 1: note that the data points do not appear to lie on a straight line

5a) Hint 2: this means a non-linear relationship exists between 'errors' and 'attempt'

5a) Hint 3: note that as the number of attempts increases, the number of errors decreases

5a) Hint 4: this means there is a negative association between 'errors' and 'attempt'. (it may be described as 'inversely proportional')

5b)i) Hint 5: to calculate b, we need Sxy and Sxx

5b)i) Hint 6: we need to calculate Sxy from the provided data

5b)i) Hint 7: for this stage of working, realise that the usual notation of 'y' has been replaced with '1/y', due to the transformation that was done to the 'errors' data

5b)i) Hint 8: calculate the value of 'a' in the similar way, using the information on page 5 of the Statistical Formulae and Tables booklet

5b)i) Hint 9: be sure to write your final equation in terms of '1/y' and not 'y'

5b)ii) Hint 10: know that a confidence interval is for mean values, and that a prediction interval is for single values

5b)ii) Hint 11: using x = 7 and the formula from (b)(i), calculate the fitted value for '1/y'

5b)ii) Hint 12: proceed with substituting in all other values into the prediction interval formulae, to give lower and upper bounds for '1/y'

5b)ii) Hint 13: know to convert back from '1/y' to 'y' by taking the reciprocal of each of the bounds that have just been calculated

5b)ii) Hint 14: note that the researcher's question was the 'expected number of errors by a single rat' so it is appropriate to interpret the prediction interval's bounds to give a whole number of errors for the single rat

Question 6

6a) Hint 1: know that you are essentially starting with a binomial distribution that is approximated by a normal distribution, in order to give the confidence interval

6a) Hint 2: define your random variable, X

6a) Hint 3: with X = number of crackers that work, state the distribution of X

6a) Hint 4: using a binomial distribution, we have to assume independence of cracker successes (independent trials)

6a) Hint 5: we also have to assume a fixed probability of crackers working correctly (fixed p)

6a) Hint 6: with X ∼ B(20, p) we look to the sample to give a sample estimate of the proportion, p̂

6a) Hint 7: note that p̂ = 14/20

6a) Hint 8: the approximation from binomial to normal would require n × p > 5 and n × q > 5. This is NOT an assumption, but a requirement of the process being undertaken.

6a) Hint 9: using the approximation that p is equal to p̂, and q is equal to q̂, then these requirements do appear to be met

6a) Hint 10: we are after a 99% confidence interval, so we will need z0.995

6a) Hint 11: substitute the appropriate values into the standard formula for a confidence interval for a proportion. This is not provided in the Statistical Formulae and Tables booklet - you need to know it.

6b) Hint 12: use your confidence interval from part (a)

6b) Hint 13: decide whether 0.75 is within, or outwith, your confidence interval

6b) Hint 14: write down a statement that includes such phrasing as 'evidence to suggest' or 'supports the belief'
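
The confidence interval in part (a) and the check in part (b) can be sketched as follows, using the values 14/20, 99% and 0.75 from the hints and the standard interval p̂ ± z × √(p̂q̂/n).

```python
from math import sqrt
from statistics import NormalDist

n, successes = 20, 14
p_hat = successes / n
q_hat = 1 - p_hat

# Requirements for the normal approximation, using p-hat in place of p.
conditions_met = n * p_hat > 5 and n * q_hat > 5

z = NormalDist().inv_cdf(0.995)             # z value for a 99% interval
half_width = z * sqrt(p_hat * q_hat / n)
lower, upper = p_hat - half_width, p_hat + half_width

# Part (b): is the claimed proportion inside the interval?
contains_claim = lower <= 0.75 <= upper
```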

Question 7

Hint 1: for X, using the information on page 4 of the Statistical Formulae and Tables booklet, calculate E(X) and V(X)

Hint 2: use the laws of expectation and variance to calculate E(3X - 2) and V(3X - 2)

Hint 3: note that the question asks for the standard deviation of Y, not the variance of Y.
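
For reference, the laws used in Hint 2 are written out below, assuming (as the hints imply) that Y = 3X − 2. The values of E(X) and V(X) still have to come from the distribution given in the question.

```latex
\begin{align*}
E(aX + b) &= a\,E(X) + b, & V(aX + b) &= a^{2}\,V(X),\\
E(3X - 2) &= 3\,E(X) - 2, & V(3X - 2) &= 9\,V(X),\\
\sigma_Y &= \sqrt{V(3X - 2)} = 3\sqrt{V(X)}.
\end{align*}
```

Note that the constant −2 shifts the mean but has no effect on the variance.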

Question 8

Hint 1: recognise that we have two samples, that are not paired

Hint 2: recognise that we are given sample means, not medians, and so we are looking at a parametric test

Hint 3: recognise that the population variances are not known

Hint 4: recognise that we have small sample sizes

Hint 5: note that we do not know the population distribution(s) of plant heights

Hint 6: we shall therefore assume that plant heights are normally distributed, else we can't really continue

Hint 7: conclude that we shall be performing a t-test for a difference in population means

Hint 8: this t-test also requires the assumption that the plant height population variances for each area are equal

Hint 9: (this assumption seems plausible, as 3.42 is very close to 3.51)

Hint 10: write down the null and alternative hypotheses, noting whether the test is one-tailed or two-tailed

Hint 11: declare the level of significance that you intend to use (such as 1%, 5% or 10%, or similar)

Hint 12: use the information on page 6 of the Statistical Formulae and Tables booklet to evaluate the test statistic, t

Hint 13: decide how many degrees of freedom we have

Hint 14: either calculate a p-value, or read off a critical value from page 13 of the Statistical Formulae and Tables booklet

Hint 15: decide whether to either reject H0 in favour of H1, or to not reject H0

Hint 16: write the usual conclusion to a hypothesis test, in terms of H1, including context and making sure to include phrasing such as 'evidence to suggest'

Question 9

9a) Hint 1: calculating the mean rate of component failure for a race of 6000 miles will depend on the mean number of miles that each component lasts for

9a) Hint 2: assumptions behind a Poisson distribution are typically independent events and a constant mean rate of events happening

9a) Hint 3: write down at least one of these two assumptions, phrased using the context of the question

9b) Hint 4: know that if a component does not fail, then the number of replacement components needed will be zero

9b) Hint 5: If X ∼ Po(2.5), then we want P(X = 0)

9b) Hint 6: calculate P(X = 0) using the information on either page 4 or page 10 of the Statistical Formulae and Tables booklet

9c) Hint 7: recognise that we are after the minimum value of x, where P(X ≤ x) ≥ 0.90

9c) Hint 8: this is best attempted using 'trial and improvement' for different values of x

9c) Hint 9: use page 10 of the Statistical Formulae and Tables to find the first value of x which gives a probability of larger than 0.90
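
As a cross-check on the table look-ups, parts (b) and (c) can be sketched in code with X ∼ Po(2.5) as stated in Hint 5. The loop mirrors the 'trial and improvement' search.

```python
from math import exp, factorial

lam = 2.5

def poisson_pmf(k):
    """P(X = k) for X ~ Po(lam)."""
    return exp(-lam) * lam**k / factorial(k)

# Part (b): probability that no replacement components are needed.
p_zero = poisson_pmf(0)

# Part (c): smallest x with P(X <= x) >= 0.90.
x = 0
cumulative = poisson_pmf(0)
while cumulative < 0.90:
    x += 1
    cumulative += poisson_pmf(x)
```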

Question 10

Hint 1: know that an overall width of 1.4, means 0.7 above and 0.7 below the sample mean

Hint 2: hence the part of the confidence interval formula after the ± must be less than 0.7

Hint 3: as it is a 90% confidence interval, we shall need the value of z0.95

Hint 4: set up the inequality z0.95 × 2.9 / √n < 0.7, where n is the sample size

Hint 5: algebraically rearrange this inequality to make n the subject, so it has the form 'n > ..'

Hint 6: state the smallest whole number value of n that satisfies this inequality
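
The rearrangement gives n > (z0.95 × 2.9 / 0.7)², which can be checked numerically. The sketch treats the 2.9 in Hint 4 as the standard deviation in the confidence interval formula.

```python
from math import floor
from statistics import NormalDist

sigma = 2.9         # value from Hint 4
half_width = 0.7    # half of the overall width of 1.4

z = NormalDist().inv_cdf(0.95)      # z value for a 90% interval

# Rearranged inequality: n > (z x sigma / half_width) squared
bound = (z * sigma / half_width) ** 2

# Smallest whole number strictly greater than the bound.
n_min = floor(bound) + 1
```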

Question 11

11a)i) Hint 1: know that discrete data typically takes on a limited number of values, and - in this context - are likely to be whole numbers

11a)ii) Hint 2: know that a Mann-Whitney test requires the two population distributions to have the same shape and spread

11b) Hint 3: know that a chi-squared test of association requires observed frequencies of categorical data

11b) Hint 4: know that categorical data are typically non-numerical data

11c) Hint 5: recognise that pulse rate and BMI are both numerical data, and - in this context - paired data

11c) Hint 6: this means that they could be displayed on a scatterplot

11c) Hint 7: know that a linear association could then be determined using Pearson's Product Moment Correlation Coefficient

Question 12

12a) Hint 1: recognise that we need to determine the total mass of 48 jars that are filled with honey

12a) Hint 2: know that writing total, T = 48 × J + 48 × H is not correct, as that is taking 48 copies of one Jar and 48 copies of one 'Honey'

12a) Hint 3: know that it ought to be written as total, T = J1 + .. + J48 + H1 + .. + H48

12a) Hint 4: recognise that we have a total of 96 random variables being added together

12a) Hint 5: using laws of expectation and variance, calculate E(T) and V(T)

12a) Hint 6: know that calculating V(T) requires the assumption that all of the 96 random variables are independent of one another

12a) Hint 7: as each of Ji and Hi are normally distributed, then T is also normally distributed

12a) Hint 8: with T ∼ N(25056, 1056), calculate P(T > 25000)
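
The final probability in part (a) can be checked with a short sketch, taking the distribution T ∼ N(25056, 1056) from Hint 8 as given. Note that 1056 is a variance, so the standard deviation is its square root.

```python
from math import sqrt
from statistics import NormalDist

mean_t = 25056
var_t = 1056

total = NormalDist(mean_t, sqrt(var_t))

# P(T > 25000) = 1 - P(T <= 25000)
p_exceeds = 1 - total.cdf(25000)
print(round(p_exceeds, 4))
```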

12b) Hint 9: recognise that we have a sample mean of 527.5 and a sample size of 10

12b) Hint 10: note that as we don't have a sample standard deviation, we shall assume that the population standard deviation is 5

12b) Hint 11: if X = mass of jar and honey, and X ∼ N(μ, 5²) then proceed with a z-test for a single sample

12b) Hint 12: write down the null and alternative hypotheses in terms of μ, noting whether it is one-tailed or two-tailed

12b) Hint 13: state the distribution of the sample mean, X̄

12b) Hint 14: calculate the test statistic, z, using the sample mean

12b) Hint 15: either calculate a p-value, or read off a critical value from page 12 of the Statistical Formulae and Tables booklet

12b) Hint 16: decide whether to either reject H0 in favour of H1, or to not reject H0

12b) Hint 17: write the usual conclusion to a hypothesis test, in terms of H1, including context and making sure to include phrasing such as 'evidence to suggest'

Question 13

13a) Hint 1: recognise the sample size of n = 5

13a) Hint 2: write down a definition for the random variable X

13a) Hint 3: with X = sugar content per litre bottle, write down the distribution of X

13a) Hint 4: with X ∼ N(102, 0.13²) write down the distribution of the sample mean, X̄

13a) Hint 5: calculate the 2-sigma limits in the standard way using 102 ± 2 × 0.13 / √5

13b) Hint 6: you are strongly recommended to draw a diagram showing all six limit lines, using the values 101.94 and 101.83, together with those from part (a), to label all six limit lines

13b) Hint 7: plot on the diagram the points corresponding to the first two sample means of 101.86 and 101.89

13b) Hint 8: notice that 101.86 is between the lower 2 sigma limit and the lower 3 sigma limit

13b) Hint 9: notice that 101.89 is between the lower 1 sigma limit and the lower 2 sigma limit

13b) Hint 10: for the process to be in control the next sample mean cannot be below the lower 2 sigma limit line

13b) Hint 11: for the process to be in control the next sample mean cannot be above the upper 3 sigma limit line

13b) Hint 12: hence the next sample mean has to be above the lower 2 sigma limit line, and also below the upper 3 sigma limit line
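
To accompany the diagram, the six limit lines can be computed from X ∼ N(102, 0.13²) and n = 5, and the two sample means from Hint 7 placed between them. This is a sketch of the arithmetic only.

```python
from math import sqrt

mu, sigma, n = 102, 0.13, 5
se = sigma / sqrt(n)    # standard deviation of the sample mean

# (lower, upper) limit lines at 1, 2 and 3 sigma.
limits = {k: (mu - k * se, mu + k * se) for k in (1, 2, 3)}

first_mean, second_mean = 101.86, 101.89

# Hint 8: 101.86 lies between the lower 3 sigma and lower 2 sigma limits.
first_in_band = limits[3][0] < first_mean < limits[2][0]
# Hint 9: 101.89 lies between the lower 2 sigma and lower 1 sigma limits.
second_in_band = limits[2][0] < second_mean < limits[1][0]

# Hint 12: the next sample mean must lie strictly between these two lines.
next_mean_range = (limits[2][0], limits[3][1])
```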

Question 14

14a)i) Hint 1: know to emphasise that the sample mean has an approximate normal distribution

14a)ii) Hint 2: remember to use phrases such as 'population mean' and 'population variance', and not just 'mean' and 'variance'

14b)i) Hint 3: the Central Limit Theorem is usually used where the population distribution is not known

14b)i) Hint 4: in this case, the birth weights are widely agreed to be normally distributed, so any sample mean of them will also be normally distributed

14b)ii) Hint 5: any sample used ought to be representative of the population from which it has been taken

14b)ii) Hint 6: representative samples are often best obtained from random sampling methods

14b)ii) Hint 7: read the description of what the researchers did in terms of selecting the maternity hospital, and of the babies they weighed

14b)ii) Hint 8: clearly communicate whether this sample was obtained using random methodology
