Paper 1

Question 1

1a) Hint 1: note the two aspects that you have to comment on: location and spread

1a) Hint 2: be sure to use the boxplot on the left that relates to grams of fat, and not the boxplot on the right that relates to calories

1b) Hint 3: make reference to both upper and lower fences

1c) Hint 4: know that the outliers that we are interested in here are below the lower fence, which is Q1 - 1.5 × (Q3 - Q1)

1c) Hint 5: be sure to clearly communicate that the two outliers are numerically less than the value of the lower fence
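
The fence calculation in Hints 4 and 5 can be sketched in a few lines of Python. The quartiles and data values below are hypothetical placeholders, not the values read from the boxplot; only the 1.5 × (Q3 - Q1) rule comes from the hints.

```python
def fences(q1, q3):
    """Return (lower_fence, upper_fence) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical quartiles and data values, for illustration only.
q1, q3 = 20.0, 28.0
lower, upper = fences(q1, q3)

values = [3.0, 5.0, 15.0, 22.0, 30.0, 38.0]
# A value is a low outlier only if it is numerically less than the lower fence.
low_outliers = [v for v in values if v < lower]
print(lower, upper, low_outliers)
```

With these made-up numbers, two values fall below the lower fence, which is the comparison that Hint 5 asks you to state explicitly.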

1d) Hint 6: think of the process that would have had to happen in order to gather and record the data, and what might have gone wrong somewhere in this process

1e) Hint 7: know that a t-test is generally used when a population standard deviation is not known

1e) Hint 8: know that in order to use a t-test, we therefore need to use the sample standard deviation as an estimate of the population standard deviation

1e) Hint 9: look at Output 1 to see whether the required information is there, or not.

1f) Hint 10: take the time to write out explicitly in words the null and alternative hypotheses

1f) Hint 11: be clear that the hypotheses must refer to the appropriate populations' means

1g) Hint 12: know that the p-value for a two-tailed test is 2 × the probability of obtaining the observed test statistic, or a more extreme value

1g) Hint 13: here, the required p-value = 2 × P(t₇₅ > 1.0496)

1g) Hint 14: know that a t₇₅ distribution can be approximated by the standard normal distribution, Z

1g) Hint 15: know that to use Table 3 in the Statistical Formulae and Tables booklet, we have to round 1.0496 to 1.05

1g) Hint 16: know that P(Z > 1.05) = 1 - P(Z < 1.05)
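
Hints 12 to 16 can be checked numerically. The sketch below assumes, as Hint 14 does, that the t distribution with 75 degrees of freedom is approximated by Z, and it uses the rounded value 1.05 from Hint 15. It relies only on Python's standard library.

```python
from statistics import NormalDist

t_stat = 1.05                               # 1.0496 rounded to two decimal places
upper_tail = 1 - NormalDist().cdf(t_stat)   # P(Z > 1.05) = 1 - P(Z < 1.05)
p_value = 2 * upper_tail                    # double the tail for a two-tailed test
print(round(p_value, 4))
```

The result is roughly 0.29, which is the figure to compare with the significance level in part (h).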

1h) Hint 17: when no direction is provided, it is the convention to use a significance level of 5%

1h) Hint 18: make sure to comment in terms of the impact on the mean calorie intake

1i) Hint 19: note that there are several coffee shop chains in the UK, and only one was selected for this sample

1i) Hint 20: note that from the single coffee shop chain, a (simple) random sample of bakery and non-bakery items was taken

1i) Hint 21: so we have two stages of random selection, and thus this is two-stage cluster sampling

1i) Hint 22: make sure to describe how a single coffee shop chain might have been selected

1i) Hint 23: make sure to describe how a simple random sample of bakery/non-bakery items might have been conducted

Question 2

2a)i) Hint 1: notice that as crop density increases, so does crop yield

2a)i) Hint 2: this suggests a positive relationship

2a)i) Hint 3: looking at the positioning of the points, a positively correlated linear relationship between the two variables is very plausible

2a)ii) Hint 4: know that a residual = observed value - fitted value

2a)ii) Hint 5: know that a residual is a measure of the error in the observed data that is not explained by the linear model (that gives the fitted value)

2a)iii) Hint 6: the residuals do have a mean of zero, so that is not the reason

2a)iii) Hint 7: the residuals do have a constant variance, so that is also not the reason (see the next hint for the reason behind this)

2a)iii) Hint 8: reason for constant variance: looking from left to right, taking two or three points at a time, the 'spread' of those groups of points is roughly the same

2a)iii) Hint 9: the only remaining reason for the residual plot not being acceptable is the non-random pattern, that has a 'U' shape

2a)iii) Hint 10: this U shape tells us that the linear model starts off underestimating the crop yield, then overestimating it, then underestimating it again

2a)iii) Hint 11: hence transforming the data before fitting a new linear model would hopefully give more consistent estimates of the crop yield for different crop densities

2b)i) Hint 12: accept that you will use the usual formulae for this process, replacing the notation of y with √y

2b)i) Hint 13: calculate 'b' using Sx√y and Sxx

2b)i) Hint 14: obtain the mean of √y, using Σ√y and n = 13

2b)i) Hint 15: obtain x̄ using Σx and n = 13

2b)i) Hint 16: calculate 'a' using the previously obtained values

2b)i) Hint 17: be sure to write your linear regression equation in terms of √y = ...

2b)ii) Hint 18: substitute the value of x = 3.5 into your regression equation to obtain the fitted value for √(crop yield)

2b)ii) Hint 19: use your knowledge from part (a)(ii) to calculate the residual for this data point

2b)ii) Hint 20: use the fitted value for the √(crop yield) and the residual value to locate the required point on the residual plot
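
The steps in Hints 12 to 20 can be sketched as follows. The data set is hypothetical (chosen so that √y = 2 + x exactly), as is the observed yield at x = 3.5; the function name `fit_sqrt_model` is also just an illustrative choice. Only the method, replacing y with √y in the usual least-squares formulae, comes from the hints.

```python
from math import sqrt

def fit_sqrt_model(xs, ys):
    """Least-squares fit of sqrt(y) = a + b*x; returns (a, b)."""
    n = len(xs)
    roots = [sqrt(y) for y in ys]
    x_bar = sum(xs) / n
    root_bar = sum(roots) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xroot = sum((x - x_bar) * (r - root_bar) for x, r in zip(xs, roots))
    b = s_xroot / s_xx            # slope from S_x√y and S_xx
    a = root_bar - b * x_bar      # intercept from the two means
    return a, b

# Hypothetical data chosen so that sqrt(y) = 2 + x exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [9.0, 16.0, 25.0, 36.0]
a, b = fit_sqrt_model(xs, ys)

x_new, y_observed = 3.5, 30.0           # y_observed is made up
fitted = a + b * x_new                  # fitted value of sqrt(crop yield)
residual = sqrt(y_observed) - fitted    # observed - fitted, on the sqrt scale
print(a, b, fitted, residual)
```

Note that the residual is computed on the √y scale, because that is the scale on which the model was fitted and on which the residual plot is drawn.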

2c)i) Hint 21: look at just the crop density values in Figure 1 and Figure 2

2c)i) Hint 22: know that a linear model is only potentially valid for values that were encompassed by the original data set

2c)ii) Hint 23: read lines 7 to 9 of the report

2c)ii) Hint 24: recognise that a linear equation will never reveal a maximum value, by the nature of it being linear

2c)ii) Hint 25: know that the transformation of data from (crop yield) to √(crop yield) will only 'slightly curve the line' and still not give a maximum

2c)iii) Hint 26: recognise that the report did not fit crop yield, but rather √(crop yield)

2c)iii) Hint 27: recognise that the report did not cover all crop densities, only those between 2 and 8 plants/m²

Paper 2

Question 1

1a) Hint 1: define a random variable, X, for the height of one boy, and state its distribution and its parameters

1a) Hint 2: calculate P(X > 111) using either tables or a graphic calculator

1b) Hint 3: define a new random variable, X̄, for the mean height of 25 boys, and state its distribution and its parameters

1b) Hint 4: calculate P(X̄ > 111) using either tables or a graphic calculator

1c) Hint 5: look at the parameters of distributions for X and X̄ and see how they are different
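
The contrast between X and X̄ can be illustrated with a short sketch. The population mean and standard deviation below are placeholders, not the values given in the question; only the threshold 111 and the sample size 25 come from the hints.

```python
from math import sqrt
from statistics import NormalDist

# mu and sigma are placeholder values; n = 25 is from the question.
mu, sigma, n = 110.0, 5.0, 25

single = NormalDist(mu, sigma)                   # X ~ N(mu, sigma^2)
sample_mean = NormalDist(mu, sigma / sqrt(n))    # X-bar ~ N(mu, sigma^2 / n)

p_single = 1 - single.cdf(111)     # P(X > 111)
p_mean = 1 - sample_mean.cdf(111)  # P(X-bar > 111)
print(round(p_single, 4), round(p_mean, 4))
```

Both distributions share the same mean, but the standard deviation of X̄ is smaller by a factor of √n, so in this placeholder setting the sample mean is less likely than a single height to exceed 111.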

Question 2

Hint 1: know that the assumption required by the Wilcoxon Signed Rank test is that the distributions used are each symmetrical

Hint 2: make sure to mention the distribution of steps is symmetrical (you must always include the context)

Hint 3: calculate the differences between the recorded values of steps, and the number 300

Hint 4: looking at the absolute values of these differences, rank the |differences|

Hint 5: notice that there is one pair of equal values of |differences|, and this will affect the values of their ranks

Hint 6: calculate the sum of the ranks for the positive differences and/or the negative differences, whichever is going to be the smaller

Hint 7: re-read the final sentence of the question to decide whether this is a one-tailed or two-tailed hypothesis test

Hint 8: use Table 7 of the Statistical Formulae and Tables booklet, to obtain the appropriate one-tailed critical value

Hint 9: decide whether to reject H0 or not to reject H0

Hint 10: communicate what this evidence suggests, making sure to mention 'median number of steps' (you must always include the context)

Hint 11: communicate clearly what this means in terms of whether the mobile phone over-counts the number of steps, or not.
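
The ranking in Hints 3 to 6, including the average rank shared by tied |differences|, can be sketched as below. The step counts are hypothetical, not the data from the question, and the function name `signed_rank_sums` is illustrative; the reference value 300 is from Hint 3.

```python
def signed_rank_sums(values, hypothesised_median):
    """Return (W_plus, W_minus), giving tied |differences| their average rank."""
    diffs = [v - hypothesised_median for v in values if v != hypothesised_median]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to cover every difference tied with ordered[i] in absolute value.
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + 1 + j + 1) / 2   # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus

# Hypothetical step counts, not the data from the question.
steps = [304, 297, 310, 303, 312, 295, 308, 301]
w_plus, w_minus = signed_rank_sums(steps, 300)
print(w_plus, w_minus, min(w_plus, w_minus))
```

In this made-up sample the differences -3 and +3 tie, so each receives rank 2.5. The smaller of the two rank sums is the statistic to compare with the critical value from Table 7.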

Question 3

3a) Hint 1: define a random variable, X, for the number of blood donors with blood type B-

3a) Hint 2: determine what the distribution of X will be, along with its parameters

3a) Hint 3: calculate P(X ≥ 2) using either tables or a graphic calculator

3b) Hint 4: define a random variable, Y, for the number of blood donors with blood type O+ or O-

3b) Hint 5: calculate the combined probability of having either O+ or O- blood

3b) Hint 6: determine what the distribution of Y will be, along with its parameters

3b) Hint 7: with Y ∼ B(50, 0.504), we now approximate it with a normal distribution

3b) Hint 8: calculate the mean and the variance of the normal distribution (checking that np > 5 and nq > 5, as it's a good habit to do)

3b) Hint 9: if W is the normal approximation to Y, then W ∼ N(25.2, 12.4992)

3b) Hint 10: know that P(Y ≤ 30) = P(W ≤ 30.5) due to continuity correction

3b) Hint 11: calculate P(W ≤ 30.5) using either tables or a graphic calculator
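
Hints 7 to 11 can be checked with the sketch below, which uses the values stated in the hints (n = 50, p = 0.504, and the continuity-corrected bound 30.5). The exact binomial sum is an addition included only for comparison; it is not part of the method the hints describe.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 50, 0.504
mean = n * p                 # 25.2
variance = n * p * (1 - p)   # 12.4992

w = NormalDist(mean, sqrt(variance))
approx = w.cdf(30.5)         # P(Y <= 30) is approximated by P(W <= 30.5)

# Exact binomial probability P(Y <= 30), for comparison only.
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(31))
print(round(approx, 4), round(exact, 4))
```

The two probabilities should agree closely, which is the reason for applying the continuity correction rather than using P(W ≤ 30).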

Question 4

Hint 1: a standard chi-squared goodness of fit question...

Hint 2: make sure the null hypothesis references the specified ratio, or similar

Hint 3: calculate the expected frequencies by using the ratios 1:1:2 and the total sample size of 320

Hint 4: know that the degrees of freedom = categories - constraints

Hint 5: we have 3 categories here, and only 1 constraint (i.e. the sum of the frequencies must equal 320)

Hint 6: obtain the value of the test statistic, X²

Hint 7: obtain either critical values from the Data Booklet, or p-value from a graphic calculator

Hint 8: decide whether or not to reject H0

Hint 9: make sure that final statement includes the context of the problem
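
The calculation in Hints 3 to 7 can be sketched as below. The observed counts are hypothetical (they merely sum to 320); the ratio 1:1:2, the total of 320 and the degrees of freedom come from the hints. The p-value line uses the fact that, for exactly 2 degrees of freedom, the upper tail of the chi-squared distribution is exp(-x/2), so no statistical tables or extra libraries are needed.

```python
from math import exp

observed = [70, 90, 160]    # hypothetical counts that sum to 320
ratio = [1, 1, 2]
total = sum(observed)
expected = [total * r / sum(ratio) for r in ratio]   # 80, 80, 160

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1      # 3 categories - 1 constraint = 2

# Valid only because df = 2: P(chi-squared > x) = exp(-x / 2).
p_value = exp(-chi_sq / 2)
print(expected, chi_sq, df, round(p_value, 4))
```

With the real observed frequencies from the question in place of the made-up ones, the same comparison of the p-value (or of X² against the critical value) leads to the decision in Hint 8.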

Question 5

5a)i) Hint 1: know that for the distribution of a random variable to be valid, all of its probabilities must sum to 1

5a)i) Hint 2: so P(X = 4) = 1 - P(X ≤ 3)

5a)i) Hint 3: after calculating an expression for P(X = 4) in terms of p, proceed with calculating E(X) in the normal manner

5a)ii) Hint 4: using E(X) = 3, calculate the value of p using E(X) = 4 - 16p, from part (a)(i)

5a)ii) Hint 5: proceed to calculate V(X) in the normal manner

5b) Hint 6: as Y ∼ Po(1), write down E(Y) and V(Y)

5b) Hint 7: use laws of expectation and variance to calculate E(K) and V(K)

5b) Hint 8: remember that V(aY) = a²V(Y)

5b) Hint 9: remember to state the value of SD(K)

Question 6

Hint 1: recognise that we have a single sample of 75 baby lengths

Hint 2: recognise that we don't have the baby lengths population standard deviation

Hint 3: know that we shall have to estimate the population standard deviation from the sample standard deviation

Hint 4: this all suggests that a single sample t-test is required

Hint 5: however, we are told that the sample standard deviation is a good estimate of the population standard deviation, and the t₇₄ distribution that we would otherwise use can be approximated by a Z distribution, so a single sample z-test is the appropriate choice going forward

Hint 6: calculate the sample mean, x̄, from Σx and n = 75

Hint 7: calculate the sample standard deviation, s, using the formula on page 4 of the Statistical Formulae and Tables booklet

Hint 8: state your hypotheses in terms of the population mean

Hint 9: define X and X̄, using all of the data so far gathered

Hint 10: using the Z distribution, calculate the test statistic, or the p-value, for the sample mean

Hint 11: decide whether to reject H0 or not to reject H0

Hint 12: communicate what this evidence suggests, making sure to mention 'mean baby length' (you must always include the context)

Hint 13: write a clear comment on the midwife's theory

Question 7

7a) Hint 1: draw a tree diagram!

7a) Hint 2: your tree diagram should have a first set of branches with 'jam', 'cheese', 'tuna' with the second set of branches being 'water', 'lemonade' and the third set of branches being 'apple', 'banana'

7a) Hint 3: recognise that P(tuna ∩ water) = 0.035

7a) Hint 4: know that P(water | tuna) = P(tuna ∩ water) ÷ P(tuna)

7b) Hint 5: recognise that we need to know either P(banana) or P(apple)

7b) Hint 6: use P(cheese ∩ banana) to help obtain P(banana)

7b) Hint 7: know that P(cheese ∩ banana) = P(cheese) × P(banana) as they are independent events

7b) Hint 8: notice that 'fruit being an apple' is the complementary event to 'fruit being a banana'

7b) Hint 9: know that P(jam ∩ apple) = P(jam) × P(apple) as they are independent events

Question 8

Hint 1: recognise that this is a hypothesis test on ρ, the population correlation coefficient

Hint 2: use the formulae from the Data Booklet to calculate the test statistic, t, using n and r

Hint 3: the number of degrees of freedom is two less than the sample size (due to it being bivariate data)

Hint 4: note that you are conducting a two-tailed test

Hint 5: decide whether to reject H0 or not to reject H0, remembering that 0.1% can be written as 0.001

Hint 6: clearly communicate your conclusion, citing the context of the problem.
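
As a sketch of Hints 2 and 3, the code below assumes that the Data Booklet formula is the usual t = r√(n - 2) / √(1 - r²) for testing ρ = 0. The values of r and n are hypothetical, and the function name `correlation_t` is illustrative.

```python
from math import sqrt

def correlation_t(r, n):
    """Return (t, df) for testing H0: rho = 0 from sample correlation r and size n."""
    df = n - 2                         # two less than the sample size
    t = r * sqrt(df / (1 - r ** 2))
    return t, df

t_stat, df = correlation_t(0.6, 27)    # hypothetical r and n
print(t_stat, df)
```

The statistic is then compared with the two-tailed critical value of the t distribution on n - 2 degrees of freedom at the 0.1% level.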

Question 9

9a) Hint 1: recognise that you have paired data

9a) Hint 2: recognise that we do not know the population standard deviation

9a) Hint 3: hence we are going to perform a t-test for the mean difference in populations

9a) Hint 4: state your hypotheses in terms of the mean of the differences

9a) Hint 5: decide whether it is a one-tailed or two-tailed test being performed

9a) Hint 6: calculate the test statistic, using x̄, sₙ₋₁ and n

9a) Hint 7: determine the number of degrees of freedom for the t distribution

9a) Hint 8: obtain a critical value, or calculate a p-value

9a) Hint 9: decide whether to reject H0 or not to reject H0

9a) Hint 10: clearly communicate your conclusion, citing the context of the problem.

9b)i) Hint 11: comment on whether the histogram's shape is one that looks like a normal distribution

9b)ii) Hint 12: know that a Wilcoxon Signed Rank test is also designed for paired data

9b)ii) Hint 13: think about the assumption required for this test and whether the histogram provides any evidence that supports that assumption being valid

Question 10

Hint 1: recognise that you are given data on proportions, so a proportion test is the chosen test to perform

Hint 2: decide whether we have the difference in two population proportions, or a single sample proportion

Hint 3: state your hypotheses in terms of the population proportion, p.

Hint 4: decide whether it is a one-tailed or two-tailed test being performed

Hint 5: calculate the sample proportion test statistic, p̂, using the numbers 23312 and 37878

Hint 6: for the model, define X to be the number of homeless veterans in sheltered accommodation in 2018

Hint 7: determine the distribution of X, and its parameters

Hint 8: this distribution will first be approximated to a normal distribution

Hint 9: approximate X ∼ B(n, p) by a N(np, npq) distribution

Hint 10: create a new random variable to represent the proportion of homeless veterans in sheltered accommodation in 2018

Hint 11: determine the parameters of the normal distribution of this new random variable which will model the proportions

Hint 12: using p̂, obtain a critical value, or calculate a p-value

Hint 13: decide whether to reject H0 or not to reject H0

Hint 14: clearly communicate your conclusion, citing the context of the problem.

Question 11

Hint 1: recognise that we are looking to set up some equations to allow us to calculate μ and σ

Hint 2: this suggests two simultaneous equations being formed, as we have two unknown variables

Hint 3: know that one equation will come from using P(X > 24) = 0.05

Hint 4: know that the second equation will come from using P(X < 17) = 0.10

Hint 5: use the inverse normal tables/function to obtain z-values corresponding to cumulative probabilities of 0.10 and 0.95

Hint 6: assemble the information of 17, μ, σ and -1.28155 into an equation

Hint 7: assemble the information of 24, μ, σ and 1.64485 into an equation

Hint 8: solve these simultaneous equations
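
The two equations in Hints 6 and 7 can be solved as sketched below. The z-values are obtained from the inverse normal function in Python's standard library rather than typed in, and they should match -1.28155 and 1.64485 to the accuracy quoted in the hints.

```python
from statistics import NormalDist

z_low = NormalDist().inv_cdf(0.10)    # about -1.28155
z_high = NormalDist().inv_cdf(0.95)   # about  1.64485

# The two equations are: 17 = mu + z_low * sigma  and  24 = mu + z_high * sigma.
# Subtracting the first from the second eliminates mu.
sigma = (24 - 17) / (z_high - z_low)
mu = 17 - z_low * sigma
print(round(mu, 2), round(sigma, 2))
```

Substituting the results back into N(μ, σ²) and checking that P(X < 17) is 0.10 and P(X > 24) is 0.05 is a quick way to confirm the solution.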

Question 12

12a) Hint 1: know that the formula for a 99% CI is p̂ ± z0.995 √ (p̂q̂/n)

12a) Hint 2: substitute all of the correct values into this formula to obtain the confidence interval

12a) Hint 3: know that the origins of this formula come from a binomial distribution being approximated by a normal distribution

12a) Hint 4: hence the process of approximating one distribution with another inherently introduces added uncertainty

12b) Hint 5: recognise that we want the lower bound of the confidence interval to be greater than 0.50

12b) Hint 6: this means that we want p̂ - z0.995 √ (p̂q̂/n) > 0.50

12b) Hint 7: rearrange this inequality to make n the subject

12b) Hint 8: evaluate the inequality to obtain a minimum value for n

12b) Hint 9: know to interpret the value for n, bearing in mind the context of the problem
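
The rearrangement in Hints 6 to 8 can be sketched as below. The sample proportion 0.55 is a hypothetical placeholder, and the sketch assumes that p̂ stays fixed as n changes and that p̂ is greater than 0.50 (otherwise the lower bound can never exceed 0.50).

```python
from math import floor, sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf(0.995)   # about 2.5758 for a 99% interval
p_hat = 0.55                      # hypothetical sample proportion
q_hat = 1 - p_hat

def lower_bound(n):
    """Lower limit of the 99% confidence interval for sample size n."""
    return p_hat - z * sqrt(p_hat * q_hat / n)

# Rearranged inequality: n > z^2 * p_hat * q_hat / (p_hat - 0.5)^2
threshold = z ** 2 * p_hat * q_hat / (p_hat - 0.5) ** 2
n_min = floor(threshold) + 1      # smallest whole number strictly above the threshold
print(n_min, lower_bound(n_min))
```

Because n counts people, the answer is the next whole number above the threshold, which is the interpretation step in Hint 9.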
