Paper 1

Question 1

1a) Hint 1: the best diagram is one where you can see the individual data values

1a) Hint 2: if you construct a stem-and-leaf diagram, remember to order the leaves and include a key

1b) Hint 3: use your (stem-and-leaf) diagram to work out the quartiles, to obtain the inter-quartile range (IQR)

1b) Hint 4: calculate the lower and upper fences as Q1 − 1.5 × IQR and Q3 + 1.5 × IQR

1b) Hint 5: know that points lying beyond the fences are classed as outliers
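
Hints 3 to 5 can be sketched in Python. This is a minimal sketch, assuming the "median of each half" convention for quartiles (the median itself is left out of both halves when n is odd). Your course or data booklet may use a different convention, which can shift Q1 and Q3 slightly. The data values are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention (assumed)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                                  # values below the median
    upper = data[half + 1:] if n % 2 else data[half:]    # values above the median
    return median(lower), median(data), median(upper)

def outliers(values):
    """Flag values lying beyond the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    flagged = [x for x in values if x < lower_fence or x > upper_fence]
    return iqr, (lower_fence, upper_fence), flagged

# Hypothetical data, not taken from the question
sample = [12, 15, 17, 18, 18, 20, 21, 23, 24, 41]
iqr, fences, flagged = outliers(sample)
print(iqr, fences, flagged)
```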

Question 2

2a) Hint 1: think about the 100 patients sampled and which of the 10 surgeons are most likely to have operated on them

2a) Hint 2: think about surgeons who have done hundreds of operations compared to surgeons who have done few: why might there be these differences?

2b) Hint 3: recognise that there are 10 different surgeons, each with their own patients

2b) Hint 4: decide if these are 10 strata or 10 clusters

2b) Hint 5: when selecting from each of the 10 groups, be sure to mention how random numbers are used
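
Selecting from each surgeon's patients with random numbers could look like the minimal sketch below. The surgeon labels, the number of patients per surgeon and the choice of 10 patients per surgeon are all hypothetical; a proportional allocation would instead size each sample by that surgeon's share of all patients.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Draw a simple random sample (without replacement) from each stratum.

    strata: dict mapping a stratum label to a list of its members.
    per_stratum: dict mapping a stratum label to the number to select.
    """
    rng = random.Random(seed)   # seeded generator so the selection is reproducible
    return {label: rng.sample(members, per_stratum[label])
            for label, members in strata.items()}

# Hypothetical: patients numbered 1..count for each of 10 surgeons
patient_counts = {f"Surgeon {i}": c for i, c in enumerate(
    [320, 45, 210, 12, 150, 88, 400, 60, 25, 190], start=1)}
strata = {s: list(range(1, c + 1)) for s, c in patient_counts.items()}
chosen = stratified_sample(strata, {s: 10 for s in strata}, seed=1)
print(chosen["Surgeon 4"])
```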

2c) Hint 6: notice that the survey was on 'quality of life' not 'quality of surgeon'

2c) Hint 7: think about the other factors that may influence why they had the surgery in the first place and how they might recover from it

Question 3

3a) Hint 1: notice that we do not have the population standard deviation

3a) Hint 2: notice that we have an assumed normal distribution of the population

3a) Hint 3: realise that the t-distribution will be needed here as we use the sample standard deviation to estimate the population standard deviation

3a) Hint 4: perform a one-sample t-test at the 1% significance level

3a) Hint 5: be sure to communicate your conclusion in terms of the context of the problem

3a) Hint 6: know that the z-test would have required knowing the value of the population standard deviation

3a) Hint 7: alternatively, if the sample had been much larger, the sample standard deviation would have been a very good estimate of the population standard deviation and a z-test could have been used
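
A one-sample t statistic can be computed with the standard library. This is a sketch only: the data, the hypothesised mean and the direction of the test are hypothetical, and because Python's standard library has no t-distribution, the critical value (the upper 1% point of t₇, 2.998) is entered as if read from tables.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """Return (t, df) for H0: mu = mu0, using the sample standard deviation."""
    n = len(sample)
    s = stdev(sample)                  # sample SD (divisor n - 1) estimates sigma
    t = (mean(sample) - mu0) / (s / sqrt(n))
    return t, n - 1

# Hypothetical data and hypothesised mean, not from the question
data = [5.2, 5.9, 6.1, 5.8, 6.4, 6.0, 5.7, 6.3]
t, df = one_sample_t(data, mu0=5.5)
t_crit = 2.998                         # upper 1% point of t_7, read from tables
print(f"t = {t:.3f} on {df} df; reject H0: {t > t_crit}")
```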

Question 4

4a)i) Hint 1: it is asking about 'shape' so write about whether it is symmetrical or skewed, unimodal or bimodal, etc.

4a)ii) Hint 2: use the given summary statistics to calculate the required statistics

4b) Hint 3: know that the Central Limit Theorem (CLT) is used here as the population has an unknown distribution and …

4b) Hint 4: … we are wanting to work with the sample mean and …

4b) Hint 5: … the sample size is large enough to use the approximate normal distribution of X̄

4b) Hint 6: note that we have to estimate the population standard deviation from the sample of size 20

4b) Hint 7: proceed with calculation, using X̄ ≈ N(7.1, 2.427²/20)
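
With X̄ ≈ N(7.1, 2.427²/20), probabilities follow from the normal CDF. A minimal sketch using Python's `statistics.NormalDist`; the cut-off value 8 is hypothetical, since the value the question asks about is not given in these hints.

```python
from math import sqrt
from statistics import NormalDist

# Approximate sampling distribution of the sample mean (CLT), with the
# population SD estimated by the sample SD 2.427 from n = 20
n = 20
xbar_dist = NormalDist(mu=7.1, sigma=2.427 / sqrt(n))   # sigma is an SD, not a variance

cutoff = 8.0                            # hypothetical value of interest
p_above = 1 - xbar_dist.cdf(cutoff)     # P(X̄ > 8)
print(round(p_above, 4))
```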

Question 5

5a)i) Hint 1: we have to assume that the population is distributed normally, in order to work with a z-interval or a t-interval

5a)i) Hint 2: recognise that we are not given the population standard deviation, so by estimating it from the sample, we will have to use the t-distribution

5a)i) Hint 3: proceed to calculate a 90% confidence interval using the t₈ distribution

5a)ii) Hint 4: establish whether 4.75 lies in the confidence interval, or not, and what this means
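
A 90% t-interval with 8 degrees of freedom (so a sample of 9) can be sketched as below. The nine data values are hypothetical; the critical value 1.860 is the upper 5% point of t₈, entered as if read from tables because the standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(sample, t_crit):
    """Return (lower, upper) = mean -/+ t_crit * s / sqrt(n)."""
    n = len(sample)
    half_width = t_crit * stdev(sample) / sqrt(n)
    centre = mean(sample)
    return centre - half_width, centre + half_width

# Hypothetical values for 9 plants, not from the question
yields = [4.1, 4.6, 3.9, 4.4, 4.8, 4.2, 4.0, 4.5, 4.3]
low, high = t_interval(yields, t_crit=1.860)   # t_8 upper 5% point
print(f"90% CI: ({low:.3f}, {high:.3f}); contains 4.75: {low <= 4.75 <= high}")
```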

5b) Hint 5: think about what else a tomato plant needs in order to grow and produce tomatoes

Question 6

6a) Hint 1: consider how many data points are below the centre line

6a) Hint 2: refer to the data booklet for the WECO rules

6b) Hint 3: assume that if the process is in control, then P(above centre line) = P(below centre line)

6b) Hint 4: realise that there are 14 trials, where the success criterion is 'being on one side of the centre line'

6b) Hint 5: use the Binomial distribution, B(14, ½) to work out P(12 points being on one side of the centre line)

6b) Hint 6: remember that the 12 points could all be above, or all below, the centre line
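
Under Hint 3, the number of points above the centre line follows B(14, ½). A minimal sketch of the calculation: it gives both "exactly 12 on the same side" and, in case the question intends it, "at least 12 on the same side", doubling each by symmetry to cover all-above and all-below (valid because k is more than half of n, so the two cases cannot overlap).

```python
from math import comb

def p_exactly_k_on_one_side(n, k):
    """P(exactly k of n points on the same side), p = 1/2, k > n/2."""
    # P(X = k) counts 'above'; by symmetry P(X = n - k) counts 'below'
    return 2 * comb(n, k) / 2**n

def p_at_least_k_on_one_side(n, k):
    """P(at least k of n points on the same side), p = 1/2, k > n/2."""
    return 2 * sum(comb(n, j) for j in range(k, n + 1)) / 2**n

print(round(p_exactly_k_on_one_side(14, 12), 5))   # 182/16384
print(round(p_at_least_k_on_one_side(14, 12), 5))  # 212/16384
```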

6c) Hint 7: look at the graph to see when the points visibly 'shifted'

6d) Hint 8: realise that you are working 'backwards' from a control limit value to the sample size
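
If the chart in part (d) is a p-chart with 3-sigma limits (an assumption; check which chart the question uses), then UCL = p̄ + 3√(p̄(1 − p̄)/n), which rearranges to n = 9p̄(1 − p̄)/(UCL − p̄)². A minimal sketch; p̄ = 0.025 matches Hint 11, but the UCL of 0.04 is a hypothetical value. With these illustrative inputs the formula returns 975.

```python
def sample_size_from_ucl(p_bar, ucl, k=3):
    """Solve UCL = p_bar + k*sqrt(p_bar*(1 - p_bar)/n) for n (p-chart, assumed)."""
    # Rearranged: (ucl - p_bar)**2 = k**2 * p_bar * (1 - p_bar) / n
    return k**2 * p_bar * (1 - p_bar) / (ucl - p_bar) ** 2

n = sample_size_from_ucl(p_bar=0.025, ucl=0.04)   # hypothetical UCL
print(round(n, 6))
```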

6e) Hint 9: know the conditions required for approximating a binomial distribution with a normal distribution

6e) Hint 10: that is, for B(n, p) we need np > 5 and nq > 5 (where q = 1 − p) for the approximation to be a good one

6e) Hint 11: start from a B(975, 0.025) distribution for the number of difficult recoveries per month

6e) Hint 12: approximate this by a N(975 × 0.025, 975 × 0.025 × 0.975) distribution

6e) Hint 13: convert it to a proportion of difficult recoveries per month by dividing by 975

6e) Hint 14: this gives the distribution of proportions to be N(0.025, (0.025 × 0.975)/975)

6e) Hint 15: calculate the probability of the proportion being larger than 0.03, using this distribution
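
Hints 10 to 15 can be checked numerically with `statistics.NormalDist`. A minimal sketch: it confirms the np > 5 and nq > 5 condition, then uses N(0.025, (0.025 × 0.975)/975), whose standard deviation works out to 0.005, so 0.03 is exactly one standard deviation above the mean. No continuity correction is applied, matching the hints.

```python
from math import sqrt
from statistics import NormalDist

n, p = 975, 0.025
# Hint 10: check that the normal approximation is reasonable
assert n * p > 5 and n * (1 - p) > 5   # np = 24.375, nq = 950.625

# Hints 12 to 14: distribution of the monthly proportion of difficult recoveries
prop_dist = NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))   # sigma works out to 0.005

# Hint 15: P(proportion > 0.03)
print(round(1 - prop_dist.cdf(0.03), 4))   # prints 0.1587
```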

Question 7

7a) Hint 1: construct a 2×2 contingency table to summarise the information provided in the opening paragraph

7a) Hint 2: count the number of people who have a virus, and the total number of people

7b) Hint 3: note that, when selecting without replacement, the probability changes after each person is selected
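
To see how the probability changes after each selection, here is a minimal sketch computing P(all k selected people have the virus) when sampling without replacement. The counts (12 with the virus out of 50, choosing 3) are hypothetical, not the figures from the question.

```python
from fractions import Fraction

def p_all_have_virus(with_virus, total, k):
    """P(all k people chosen without replacement have the virus)."""
    prob = Fraction(1)
    for i in range(k):
        # after each selection there is one fewer infected person and one fewer person
        prob *= Fraction(with_virus - i, total - i)
    return prob

print(p_all_have_virus(12, 50, 3))   # 12/50 * 11/49 * 10/48
```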

7c)i) Hint 4: consider drawing a tree diagram with virus/no virus as the first branches and reacts positively/negatively as the second branches

7c)i) Hint 5: P(reacts positively) = P(reacts positively ∩ has virus) + P(reacts positively ∩ does not have virus)

7c)ii) Hint 6: realise that this is a conditional probability

7c)ii) Hint 7: we want P(has the virus | reacts positively)
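
The tree-diagram calculation in Hints 4 to 7 can be written as a short function. The prevalence, the probability of reacting positively with the virus and the probability of reacting positively without it are hypothetical placeholders; substitute the values from your contingency table.

```python
def positive_test_probabilities(p_virus, p_pos_given_virus, p_pos_given_no_virus):
    """Return (P(positive), P(virus | positive)) via the law of total probability."""
    p_pos_and_virus = p_virus * p_pos_given_virus
    p_pos_and_no_virus = (1 - p_virus) * p_pos_given_no_virus
    p_pos = p_pos_and_virus + p_pos_and_no_virus        # Hint 5
    return p_pos, p_pos_and_virus / p_pos               # Hint 7: conditional probability

# Hypothetical inputs, not from the question
p_pos, p_virus_given_pos = positive_test_probabilities(0.1, 0.9, 0.05)
print(round(p_pos, 4), round(p_virus_given_pos, 4))
```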

Question 8

8a) Hint 1: recognise that we have been provided with two samples (of the same size) but that they are not paired

8a) Hint 2: know that we can work out two sample proportions

8a) Hint 3: the z-test referred to is one on the two sample proportions

8a) Hint 4: under H₀: p₁ = p₂, we need to calculate a pooled proportion, the formula for which is in the data booklet

8a) Hint 5: perform the test, communicating your conclusion in terms of the context
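
A minimal sketch of the pooled two-proportion z-test. The counts and sample sizes are hypothetical, and the pooled proportion (x₁ + x₂)/(n₁ + n₂) is the standard form, so check it against the data booklet. It reports a two-tailed p-value, in line with Hint 6.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Return (z, two-tailed p-value) for H0: p1 = p2 using a pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                 # estimate of the common p under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical: 78 of 120 improved on the new medicine, 62 of 120 on the existing one
z, p_value = two_proportion_z(78, 120, 62, 120)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```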

8b) Hint 6: realise that a new medicine may do better, or may do worse, than the existing medicine

Question 9

9a) Hint 1: know that a Poisson model requires events to happen independently of each other

9a) Hint 2: know that a Poisson model needs events happening at a constant rate

9b) Hint 3: know that a Po(4) has a variance of 4, and thus a standard deviation of 2

9b) Hint 4: know that P(X > 4 + 2 × 2) = P(X > 8) = P(X ≥ 9) = 1 − P(X ≤ 8)
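
Hints 3 and 4 translate directly into a sum of Poisson probabilities. A minimal sketch using only the standard library, assuming X ~ Po(4) as stated in Hint 3.

```python
from math import exp, factorial, sqrt

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam**j / factorial(j) for j in range(k + 1))

lam = 4
threshold = lam + 2 * sqrt(lam)            # mean + 2 SD = 4 + 2 * 2 = 8
p = 1 - poisson_cdf(int(threshold), lam)   # P(X > 8) = P(X >= 9) = 1 - P(X <= 8)
print(round(p, 4))
```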

9c) Hint 5: know that we do not want 38X, but rather X₁ + X₂ + … + X₃₈

9c) Hint 6: as the Poisson parameter for the total number of injuries is >10, we can approximate with a normal distribution

9c) Hint 7: as we are going from a discrete distribution to a continuous distribution, we need to use a continuity correction
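
Assuming each Xᵢ ~ Po(4) independently (as in part (b)), the total X₁ + X₂ + … + X₃₈ follows Po(152), which can be approximated by N(152, 152). A minimal sketch with a continuity correction; the cut-off of 170 injuries is hypothetical. The exact Poisson tail is computed alongside for comparison.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

lam_total = 38 * 4                     # sum of 38 independent Po(4) is Po(152)
approx = NormalDist(mu=lam_total, sigma=sqrt(lam_total))

cutoff = 170                           # hypothetical: P(T >= 170)
# Continuity correction: P(T >= 170) is approximated by P(Y > 169.5)
p_normal = 1 - approx.cdf(cutoff - 0.5)

# Exact Poisson tail for comparison, terms computed on the log scale
p_exact = 1 - sum(exp(k * log(lam_total) - lam_total - lgamma(k + 1))
                  for k in range(cutoff))
print(round(p_normal, 4), round(p_exact, 4))
```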

9d) Hint 8: consider your answer to part (a) and think of what might cause the assumptions to not be plausible

Question 10

10a) Hint 1: set up a probability distribution table with the two probabilities represented by 'p' and 'q'

10a) Hint 2: calculate E(X) in terms of 'p' and 'q'

10a) Hint 3: know that p + q = 1

10a) Hint 4: solve simultaneous equations to obtain the probabilities
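
Hints 1 to 4 lead to two linear equations in p and q. A minimal sketch, assuming the two unknown probabilities attach to hypothetical outcome values a and b with a known E(X), so that a·p + b·q = E(X) and p + q = 1; the values used are illustrative only. Fractions keep the answers exact.

```python
from fractions import Fraction

def solve_p_q(a, b, expected):
    """Solve a*p + b*q = expected and p + q = 1 for (p, q), assuming a != b."""
    # Substituting q = 1 - p gives a*p + b*(1 - p) = expected
    p = Fraction(expected - b) / (a - b)
    return p, 1 - p

# Hypothetical: X takes the values 2 and 5, with E(X) = 3.2
p, q = solve_p_q(2, 5, Fraction(16, 5))
print(p, q)
```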

10b) Hint 5: work out E(Y) and E(Y²) to then work out V(Y)

10c) Hint 6: apply the laws of expectation and variance: E(aX ± bY) = aE(X) ± bE(Y) and, for independent X and Y, V(aX ± bY) = a²V(X) + b²V(Y)
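
Hints 5 and 6 can be checked with a short sketch. The distribution of Y, the values of E(X) and V(X), and the constants a and b are hypothetical, and the variance rule assumes X and Y are independent.

```python
from fractions import Fraction as F

def mean_and_variance(dist):
    """dist maps each value y to P(Y = y); returns (E(Y), V(Y))."""
    e_y = sum(y * p for y, p in dist.items())
    e_y2 = sum(y**2 * p for y, p in dist.items())
    return e_y, e_y2 - e_y**2            # V(Y) = E(Y^2) - [E(Y)]^2

def combine(a, b, e_x, v_x, e_y, v_y, sign=1):
    """E and V of aX + bY (sign=1) or aX - bY (sign=-1), X and Y independent."""
    e = a * e_x + sign * b * e_y
    v = a**2 * v_x + b**2 * v_y          # variances add even when subtracting
    return e, v

# Hypothetical distribution of Y and moments of X
y_dist = {0: F(1, 4), 1: F(1, 2), 2: F(1, 4)}
e_y, v_y = mean_and_variance(y_dist)
print(e_y, v_y)
print(combine(3, 2, F(5), F(2), e_y, v_y, sign=-1))   # E and V of 3X - 2Y
```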

Question 11

11a) Hint 1: recognise that the two samples are paired, so we need a paired test on the differences

11a) Hint 2: recognise that we do not know the population variance of the differences

11a) Hint 3: realise that we need to use a t-test as we will estimate the population standard deviation with the sample standard deviation

11a) Hint 4: be sure to communicate your conclusion in terms of the context of the problem
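
For the paired test, the t statistic is computed on the differences. A minimal sketch; the paired measurements are hypothetical (not Darwin's data), and the critical value would be read from t tables with n − 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second):
    """Return (t, df) for H0: mean difference = 0, with d = first - second."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(first, second)]
    n = len(diffs)
    # the SD of the differences (divisor n - 1) estimates the population SD
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical paired measurements, not from the question
group_a = [10, 12, 9, 14, 11]
group_b = [8, 11, 9, 10, 10]
t, df = paired_t(group_a, group_b)
print(f"t = {t:.3f} on {df} df")
```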

11b) Hint 5: without the assumption of population normality, we need to use a non-parametric test

11b) Hint 6: realise that it is a Wilcoxon Signed Rank test that is needed

11b) Hint 7: know that this test requires the assumption that the population differences are from a symmetrical distribution
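
The Wilcoxon signed-rank statistics can be computed as follows. A minimal sketch: zero differences are dropped, tied absolute differences receive the average of their ranks, and the smaller of W+ and W− is printed for comparison with a tabulated critical value (check which statistic your data booklet tabulates). The differences are hypothetical.

```python
def signed_rank_statistics(diffs):
    """Return (W_plus, W_minus) for the Wilcoxon signed-rank test."""
    nonzero = [d for d in diffs if d != 0]        # zero differences are discarded
    ordered = sorted(nonzero, key=abs)
    ranks = {}
    i = 0
    while i < len(ordered):
        # group equal absolute values and give each the average rank of the group
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        ranks[abs(ordered[i])] = (i + 1 + j) / 2      # mean of ranks i+1 .. j
        i = j
    w_plus = sum(ranks[abs(d)] for d in nonzero if d > 0)
    w_minus = sum(ranks[abs(d)] for d in nonzero if d < 0)
    return w_plus, w_minus

# Hypothetical paired differences
w_plus, w_minus = signed_rank_statistics([2, 1, 0, 4, 1, -3, 5, -1])
print(w_plus, w_minus, min(w_plus, w_minus))
```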

11c) Hint 8: think about what a non-paired study would not have allowed Charles Darwin to do

Question 12

12a) Hint 1: make a comment about whether there is positive, negative or no correlation

12a) Hint 2: make a comment about whether any correlation that you see is linear, or not

12b) Hint 3: use the standard formula to work out r

12b) Hint 4: know that r is an estimate of ρ, the population correlation coefficient

12b) Hint 5: know that the square of r is the coefficient of determination, R²

12b) Hint 6: know that R² gives the percentage of the variability in y that is explained by a least-squares linear regression model

12c) Hint 7: perform a standard t-test on the correlation coefficient, to establish if it is different from zero, or not
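
Hints 3 to 7 can be sketched as a short calculation of r, R² and the usual test statistic for H₀: ρ = 0, namely t = r√(n − 2)/√(1 − r²) on n − 2 degrees of freedom. A minimal sketch with hypothetical data; the critical value would come from t tables.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def correlation_t(r, n):
    """Test statistic for H0: rho = 0, and its degrees of freedom n - 2."""
    return r * sqrt(n - 2) / sqrt(1 - r**2), n - 2

# Hypothetical data, not from the question
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 5, 4, 6, 7]
r = pearson_r(xs, ys)
t, df = correlation_t(r, len(xs))
print(f"r = {r:.4f}, R² = {r**2:.4f}, t = {t:.3f} on {df} df")
```

Keeping r fixed and reducing n lowers both t and the degrees of freedom (compare `correlation_t(r, 6)` with `correlation_t(r, 4)`), which is the effect Hints 8 and 9 describe.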

12d) Hint 8: know that the sample size affects both the value of the test statistic and the number of degrees of freedom

12d) Hint 9: know that a smaller sample size with the same value of r would potentially give a different conclusion to that from part (c)

12d) Hint 10: be sure to communicate your final conclusion in terms of the context of the problem

12e) Hint 11: know that a regression line is often used for predicting purposes

12e) Hint 12: realise these predictions will not be reliable if the line does not fit the sample data well, as indicated by a value of r close to zero
