Paper 1

Question 1

1a) Hint 1: know that stem and leaf diagrams need a key

1a) Hint 2: they also need the leaves listed in increasing order, moving away from the stem

1a) Hint 3: and back to back stem and leaf diagrams also need a title for each side

1b) Hint 4: decide which side of the stem and leaf diagram relates to doodles of cats, by looking at either the maximum or minimum values and referring to the summary data below the diagram

1b) Hint 5: know that the outliers being asked about will be beyond the upper fence, rather than below the lower fence

1b) Hint 6: know that the upper fence is Q3 + 1.5 × IQR
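
As a rough illustration of Hints 5 and 6, the fence check can be sketched in Python. The quartiles and times below are made up for the example and are not the exam's data; `upper_fence` is a hypothetical helper name.

```python
def upper_fence(q1: float, q3: float) -> float:
    """Return Q3 + 1.5 * IQR, where IQR = Q3 - Q1."""
    iqr = q3 - q1
    return q3 + 1.5 * iqr


# Hypothetical quartiles and observations (NOT the exam's data)
q1, q3 = 10.0, 18.0
times = [9, 12, 17, 25, 31, 40]

fence = upper_fence(q1, q3)
# An observation counts as an upper outlier if it lies beyond the fence
outliers = [t for t in times if t > fence]
print(fence, outliers)
```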

1c)i) Hint 7: recognise that the success rate is a number of successes out of a number of 'trials'

1c)i) Hint 8: this indicates a Binomial setup, from which proportions of success can be calculated

1c)i) Hint 9: hence we have two proportions, and we are interested in whether they are different

1c)i) Hint 10: therefore a two sample proportion test is appropriate

1c)ii) Hint 11: make it clear which proportions are which, by using p_cats and p_dogs

1d) Hint 12: know that Mann-Whitney tests require the shape and spread of the population distributions to be the same

1d) Hint 13: also sampling theory requires the samples to be independent and randomly selected from the populations

1d) Hint 14: remember to phrase these assumptions in the context of the study, so mention that the sample data is of 'times to draw a doodle'

1e) Hint 15: recognise that the large sample sizes require a normal approximation to be used

1e) Hint 16: correctly identify that m = 121 and n = 145

1e) Hint 17: calculate E(W) and V(W) using the formulae below the Mann-Whitney tables in the Data Booklet

1e) Hint 18: as we are approximating a discrete distribution with a continuous distribution, remember to use continuity correction

1e) Hint 19: so P(W ≤ 12048) becomes P(W < 12048.5)

1e) Hint 20: now standardise the value of 12048.5 to give a z-value of -6.57
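
The steps in Hints 16 to 20 can be checked with a short Python sketch. It assumes the Data Booklet formulae are the standard rank-sum results E(W) = m(m + n + 1)/2 and V(W) = mn(m + n + 1)/12; the variable names are illustrative only.

```python
from math import sqrt

m, n = 121, 145   # sample sizes from Hint 16
w = 12048         # value of W appearing in Hint 19

# Standard rank-sum mean and variance (assumed to match the Data Booklet)
expected_w = m * (m + n + 1) / 2
variance_w = m * n * (m + n + 1) / 12

# Continuity correction: P(W <= 12048) is approximated by P(W < 12048.5)
z = (w + 0.5 - expected_w) / sqrt(variance_w)
print(expected_w, variance_w, round(z, 2))
```

The standardised value agrees with the -6.57 quoted in Hint 20.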

1f) Hint 21: make sure the hypotheses mention 'median' and that they clearly refer to the time taken to draw each animal

1f) Hint 22: recognise that a two tailed test is being conducted

1f) Hint 23: know that P(Z < -2.58) = 0.005

1f) Hint 24: hence the significance level = 2 × 0.005 = 0.01 = 1%
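
Hints 23 and 24 can be confirmed numerically. This sketch uses Python's standard library normal distribution rather than printed tables, so the tail probability is only equal to 0.005 after rounding.

```python
from statistics import NormalDist

# Lower-tail probability for the critical value in Hint 23
tail = NormalDist().cdf(-2.58)

# Two-tailed test: both tails contribute to the significance level
significance = 2 * tail
print(round(tail, 3), round(significance, 2))
```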

1f) Hint 25: in phrasing your conclusion, avoid strong words like 'prove'. We only have 'evidence to suggest'

1f) Hint 26: reflect the fact that it was a two-tailed test in your wording, and not a one-tailed test

1g) Hint 27: know that a two sample pooled t-test would require an assumption to do with the two population variances

1g) Hint 28: know that for these large samples, the population standard deviations could be well approximated by the sample standard deviations

Question 2

2a) Hint 1: notice that as length increases, the cost also increases

2a) Hint 2: this suggests a positive correlation but . . .

2a) Hint 3: . . . not necessarily a linear relationship

2b) Hint 4: both models have small p-values arising from high values of r

2b) Hint 5: know that high values of r mean evidence of a linear relationship

2b) Hint 6: hence both transformation models seem to give linear data sets

2c) Hint 7: know that the coefficient of determination is written as R² which is r²

2c) Hint 8: know that you need to write a sentence about the percentage of variation in the data that is explained by the model, ensuring that you phrase it in the context of what the numbers refer to

2d) Hint 9: know that we are looking for residual plots where there is a random scatter of points, centred on the zero axis

2d) Hint 10: be confident in concluding that both residual plots seem suitable in this regard, and do not feel that you have to favour one over the other!

2e) Hint 11: know that a confidence interval is constructed around a sample mean, which is the estimated cost in this context

2e) Hint 12: know that the centre of the confidence interval is the estimate of the √cost

2e) Hint 13: realise that you now need to square this value to obtain the estimate of the cost, in pounds (£)

2e) Hint 14: repeat the squaring of the end values of the confidence interval for the transformed costs
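
A minimal sketch of the back-transformation in Hints 12 to 14, assuming the transformed model predicts √cost. The numbers are hypothetical and are not the exam's output.

```python
# Hypothetical values on the square-root scale (NOT the exam's output)
sqrt_estimate = 12.0
sqrt_interval = (11.0, 13.0)

# Squaring undoes the square-root transformation (end values are positive)
cost_estimate = sqrt_estimate ** 2
cost_interval = tuple(end ** 2 for end in sqrt_interval)
print(cost_estimate, cost_interval)
```

Note that the back-transformed interval is no longer symmetric about the estimate.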

2f)i) Hint 15: know that a linear regression model fits either y on x, or x on y, and that these are not the same process

2f)i) Hint 16: both Models have fitted transformed cost on length, which is not the same as fitting length on transformed cost

2f)ii) Hint 17: realise that for Peter to do what he wants to do with his numbers, he needs to construct models of length based on cost, and that these will also likely involve transformations

Paper 2

Question 1

1a) Hint 1: know that the sum of all probabilities of a random variable must equal 1

1a) Hint 2: draw out a table of the five probabilities, each in terms of k

1a) Hint 3: add the five fractions together to equal 1 and solve for k

1b) Hint 4: use standard techniques for working out E(S) and E(S²) from the probability table

1b) Hint 5: know that V(S) = E(S²) - [E(S)]²
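
The method in Hints 1 to 5 can be sketched with exact fractions. The probability table here, P(S = s) = k/s for s = 1 to 5, is invented purely for illustration and is not the table from the exam.

```python
from fractions import Fraction

values = [1, 2, 3, 4, 5]
# Hypothetical table P(S = s) = k / s (NOT the exam's table)
weights = [Fraction(1, s) for s in values]

# The probabilities must sum to 1, which fixes k
k = 1 / sum(weights)
probs = [k * w for w in weights]

e_s = sum(s * p for s, p in zip(values, probs))        # E(S)
e_s2 = sum(s ** 2 * p for s, p in zip(values, probs))  # E(S^2)
var_s = e_s2 - e_s ** 2                                # V(S) = E(S^2) - [E(S)]^2
print(k, e_s, e_s2, var_s)
```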

Question 2

2a) Hint 1: from the table, identify the number of Teachers who are also Female

2a) Hint 2: from the table, identify the total number of staff

2a) Hint 3: for the second calculation, use the general law of probability: P(X ∪ Y) = P(X) + P(Y) - P(X ∩ Y)

2a) Hint 4: you will need to determine the number of Female Teachers, the number of non-Admin Staff and the number of Female staff who are not Admin
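
As an illustration of the law in Hint 3, the sketch below takes X as 'Female' and Y as 'not Admin', which is an assumption about the events involved. The staff counts are hypothetical and are not the exam's table.

```python
from fractions import Fraction

# Hypothetical staff counts (NOT the exam's table): {role: (female, male)}
staff = {"teacher": (30, 20), "admin": (8, 2), "other": (12, 8)}
total = sum(f + m for f, m in staff.values())

female = sum(f for f, _ in staff.values())
non_admin = sum(f + m for role, (f, m) in staff.items() if role != "admin")
female_non_admin = sum(f for role, (f, _) in staff.items() if role != "admin")

# P(X or Y) = P(X) + P(Y) - P(X and Y), written with counts over the total
p_union = Fraction(female + non_admin - female_non_admin, total)
print(p_union)
```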

2b)i) Hint 5: consider drawing a tree diagram to summarise the new information, with the first set of three branches being 'teacher', 'admin' and 'other'

2b)i) Hint 6: P(drives) = P(drives | teacher)P(teacher) + P(drives | admin)P(admin) + P(drives | other)P(other)

2b)ii) Hint 7: recognise this is wanting the conditional probability P(admin | not drive)

2b)ii) Hint 8: use P(X | Y) = P(X ∩ Y)/P(Y)
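
Hints 6 to 8 can be sketched as follows. All of the probabilities below are hypothetical stand-ins for the values on the tree diagram, not the exam's figures.

```python
from fractions import Fraction

# Hypothetical probabilities (NOT the exam's figures)
p_role = {
    "teacher": Fraction(3, 5),
    "admin": Fraction(3, 20),
    "other": Fraction(1, 4),
}
p_drives_given = {
    "teacher": Fraction(4, 5),
    "admin": Fraction(3, 5),
    "other": Fraction(2, 5),
}

# Hint 6: total probability over the three branches
p_drives = sum(p_drives_given[r] * p_role[r] for r in p_role)

# Hints 7 and 8: P(admin | not drive) = P(admin and not drive) / P(not drive)
p_not_drive = 1 - p_drives
p_admin_and_not_drive = (1 - p_drives_given["admin"]) * p_role["admin"]
p_admin_given_not_drive = p_admin_and_not_drive / p_not_drive
print(p_drives, p_admin_given_not_drive)
```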

Question 3

Hint 1: know that the single sample Wilcoxon test requires the population distribution to be symmetrical

Hint 2: be sure to phrase this assumption in the context of the problem, by using the phrase 'daily mite counts'

Hint 3: conduct a standard single sample Wilcoxon test, looking out for values of 0 that are omitted, as well as any tied ranks

Hint 4: be sure to phrase your conclusion in terms of whether there is evidence that median mite counts are equal to 7 or greater than 7

Question 4

4a)i) Hint 1: know that P(W > 10) = 1 - P(W ≤ 10)

4a)ii) Hint 2: given that all of the random variables count a rate of the same thing over the same time period, the parameters for the F and M distributions are allowed to be added together

4a)ii) Hint 3: so F + M ∼ Po(3.5)

4b) Hint 4: notice that 6 weeks is a total of 42 periods of 24 hours

4b) Hint 5: construct a new random variable, T, that is the sum of 42 independent observations of each of the three random variables, W, F and M

4b) Hint 6: so T = W1 + . . . + W42 + F1 + . . . + F42 + M1 + . . . + M42

4b) Hint 7: so T has a Poisson distribution that is the sum of 126 random variables!

4b) Hint 8: use a normal approximation to this Poisson distribution

4b) Hint 9: remember to use continuity correction when calculating the probability of < 340 captures
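
A sketch of Hints 4 to 9 in Python. The combined rate of 3.5 for F + M comes from Hint 3, but the rate used for W is hypothetical, so the final probability is illustrative only.

```python
from math import sqrt
from statistics import NormalDist

rate_f_plus_m = 3.5   # from Hint 3: F + M ~ Po(3.5)
rate_w = 5.0          # hypothetical rate for W (NOT from the exam)
periods = 42          # 6 weeks of 24-hour periods

# A sum of independent Poisson variables is Poisson with the rates added
mean_t = periods * (rate_w + rate_f_plus_m)

# Normal approximation: mean and variance both equal the Poisson parameter
approx = NormalDist(mu=mean_t, sigma=sqrt(mean_t))

# Continuity correction: P(T < 340) = P(T <= 339), approximated by P(Y < 339.5)
p_less_340 = approx.cdf(339.5)
print(mean_t, round(p_less_340, 3))
```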

Question 5

Hint 1: recognise that this is non-paired data, so the samples will be pooled to give an estimate for the population standard deviation

Hint 2: use the formula in the data booklet for calculating s², and then subsequently calculate the value of s

Hint 3: know that the number of degrees of freedom for t = the total pooled sample size - 2
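
Hints 2 and 3 can be sketched as below, assuming the Data Booklet formula is equivalent to the usual pooled variance. The sample sizes and variances are hypothetical.

```python
from math import sqrt

# Hypothetical sample sizes and sample variances (NOT the exam's data)
n1, var1 = 10, 4.0
n2, var2 = 12, 6.0

# Pooled estimate of the common population variance
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
pooled_sd = sqrt(pooled_var)

# Degrees of freedom: total pooled sample size minus 2
df = n1 + n2 - 2
print(pooled_var, round(pooled_sd, 3), df)
```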

Hint 4: be sure to state the final conclusion using the context of the data

Question 6

6a) Hint 1: recognise that the phrase 'groups were proportionally represented' is describing different strata

6b) Hint 2: recognise that there is no randomness involved, so this is a non-random sampling strategy. Therefore it is likely to be either quota or convenience sampling

6b) Hint 3: know that not involving randomness can introduce bias and thus the sample may not be representative of the population

6c) Hint 4: identify the sample size, the sample mean and the claimed value for the population standard deviation

6c) Hint 5: define a new random variable for the sample mean, and note that the standard error of the mean is not the same as the population standard deviation

6c) Hint 6: know that a 95% confidence interval comes from using z0.975
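
A minimal sketch of Hints 4 to 6, using hypothetical values for the sample size, sample mean and claimed population standard deviation.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical inputs (NOT the exam's data)
n, sample_mean, sigma = 36, 50.0, 6.0

# The standard error of the mean is sigma / sqrt(n), not sigma itself
standard_error = sigma / sqrt(n)

# 95% confidence uses the 0.975 quantile of the standard normal
z = NormalDist().inv_cdf(0.975)
lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error
print(round(lower, 2), round(upper, 2))
```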

6d) Hint 7: look at how the 90% interval is different from the 95% interval

6d) Hint 8: consider what the student may be wanting to show, and why they may be wanting to show it [re-read the information between parts (b) and (c)]

Question 7

7a) Hint 1: consider sketching the graph of the distribution of X to see how 80.45 and 80.83 fit into the region from 78 to 83

7b) Hint 2: notice that the sample size of 75 is large, and that X does not have a normal distribution

7b) Hint 3: recognise that the Central Limit Theorem can be used here, to give the distribution of the sample mean

7b) Hint 4: be sure to clearly state that the CLT tells us that it is approximately normally distributed

7c) Hint 5: use the normal distribution from part (b) for this

Question 8

Hint 1: assemble all the facts that you have been given and define a random variable in both words and notation

Hint 2: state the hypotheses in terms of the parameter we are testing

Hint 3: realise that this is a single sample z-test of the population mean

Hint 4: the outcome of the test is the same regardless of whether 5% or 1% significance is being used

Hint 5: think about reasons for your conclusion, from the perspective of the candidates as well as from the perspective of the designers of the problem being solved by the candidates

Question 9

9a) Hint 1: know the laws of expectation and variance for the difference between two random variables

9b) Hint 2: think about what A and B each represent, and therefore what their difference stands for

9c) Hint 3: recognise that total profit, T is not equal to 33A + 26B

9c) Hint 4: know that total profit, T = A1 + . . . + A33 + B1 + . . . + B26

9c) Hint 5: after calculating V(T), work out SD(T) and present your answer as an amount of money.
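
The distinction in Hints 3 and 4 shows up clearly in the variance. This sketch assumes the individual profits are independent and uses hypothetical variances for A and B.

```python
from math import sqrt

# Hypothetical variances (NOT the exam's values)
var_a, var_b = 4.0, 9.0

# T = A1 + ... + A33 + B1 + ... + B26: variances of independent terms add
var_t = 33 * var_a + 26 * var_b
sd_t = sqrt(var_t)

# By contrast, 33A + 26B scales single observations, so the constants are squared
var_scaled = 33 ** 2 * var_a + 26 ** 2 * var_b
print(var_t, round(sd_t, 2), var_scaled)
```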

Question 10

10a) Hint 1: start with defining a binomial random variable, approximate it to a normal, and then transform it to a proportion

10a) Hint 2: calculate a z-interval from this proportional random variable, using z0.975 and sample estimates for p and q

10b) Hint 3: if the interval has width 0.04, then this arises from the mean ±0.02

10b) Hint 4: 'reverse engineer' the confidence interval width to give a value for n

10b) Hint 5: interpret and round this value for n to the nearest whole number
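
The 'working backwards' in Hints 3 and 4 can be sketched as follows. The sample proportion is hypothetical, and the final interpretation of n is left to Hint 5.

```python
from statistics import NormalDist

p_hat = 0.3              # hypothetical sample proportion (NOT from the exam)
half_width = 0.04 / 2    # an interval of width 0.04 is the estimate +/- 0.02

z = NormalDist().inv_cdf(0.975)

# Rearranging half_width = z * sqrt(p*q/n) gives n = z^2 * p * q / half_width^2
n_required = z ** 2 * p_hat * (1 - p_hat) / half_width ** 2
print(round(n_required, 2))
```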

Question 11

11a) Hint 1: look at the shape of the graph and consider which distribution this looks very similar to

11b) Hint 2: recognise that we have lots of sample data, and that the population is normally distributed (but we don't know any population parameters)

11b) Hint 3: notice that the sample sizes are all very large, so the values of the sample standard deviations can be assumed to be numerically close to the values for the population standard deviations

11b) Hint 4: we therefore have either t-tests or z-tests as possible tests to use

11b) Hint 5: if we used a t-test, think how many degrees of freedom it would have, and how that t distribution would compare to the standard normal distribution

11b) Hint 6: if we used a t-test on these two samples, we would pool the samples and have to assume that the standard deviations of the populations were equal. If we used a z-test, then this assumption would not be required.

11b) Hint 7: therefore proceeding with a two sample z-test seems to require the least stringent assumptions

11c) Hint 8: realise that this will require using a similar framework to part (b), but with the adjustments that we will not use the value of 55.4, and we will be 'working backwards' from the critical value corresponding to z0.90

Question 12

12a) Hint 1: think carefully whether the sample size is either 4 or 5 or 20

12a) Hint 2: you should use n = 5 here, as it is the sample size for each 4 hour time period that we would be plotting on a control chart

12b)i) Hint 3: recognise that we are looking for at least 2 'successes' out of 3 'trials'

12b)i) Hint 4: the probability of being above a 1σ limit is calculated from using one of your answers from part (a)

12b)i) Hint 5: realise that the situation is symmetrical for both above the upper 1σ limit and below the lower 1σ limit, and that this will mean a probability will need to be doubled
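
A sketch of Hints 3 to 5, assuming the process is in control and the plotted sample means are normally distributed, so that the chance of a point lying above the upper 1σ limit is P(Z > 1).

```python
from statistics import NormalDist

# Probability that a single point falls above the upper 1-sigma limit
p_above = 1 - NormalDist().cdf(1)

# At least 2 'successes' out of 3 'trials' (binomial with n = 3)
p_one_side = 3 * p_above ** 2 * (1 - p_above) + p_above ** 3

# By symmetry the same applies below the lower 1-sigma limit
p_either_side = 2 * p_one_side
print(round(p_above, 4), round(p_either_side, 3))
```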

12b)ii) Hint 6: know that the usual WECO rules each give very small probabilities of returning an 'out of control' warning

12b)ii) Hint 7: recognise that your answer from (b)(i) is a much larger probability and thus what the impact of using this new 'rule' would be

Question 13

13a) Hint 1: consider the context from the perspective of the lecturing quality or from the ability level of the groups of students

13b) Hint 2: realise that there are too many small expected frequencies, and that either columns or rows need to be combined

13b) Hint 3: combine the columns for grade D and grade E

13b) Hint 4: recalculate the degrees of freedom for this smaller contingency table

13b) Hint 5: be sure to state your conclusion in terms of the context of the study

13c) Hint 6: the greatest contribution will typically come from where there is greatest disparity between the observed and expected frequencies

13c) Hint 7: check this calculation on your graphic calculator, accessing the matrix of results that sum to give the X² statistic
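
The contributions mentioned in Hints 6 and 7 can also be computed by hand. The observed table below is hypothetical and is not the exam's data.

```python
# Hypothetical observed frequencies (NOT the exam's table)
observed = [
    [20, 30, 10],
    [10, 20, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency = row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Each cell contributes (O - E)^2 / E to the test statistic
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
statistic = sum(sum(row) for row in contributions)
largest = max(max(row) for row in contributions)
print(round(statistic, 3), largest)
```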
