The Scottish Qualifications Authority owns the copyright to its exam papers and marking instructions.
Hints offered by N Hopley
Paper 1
Question 1
1a) Hint 1: know that stem and leaf diagrams need a key
1a) Hint 2: they also need the leaves listed in increasing order, moving away from the stem
1a) Hint 3: and back to back stem and leaf diagrams also need a title for each side
1b) Hint 4: decide which side of the stem and leaf diagram relates to doodles of cats, by looking at either the maximum or minimum values and referring to the summary data below the diagram
1b) Hint 5: know that the outliers being asked about will be beyond the upper fence, rather than below the lower fence
1b) Hint 6: know that the upper fence is Q3 + 1.5 × IQR
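The upper fence in Hint 6 can be checked with a short sketch. The quartiles and the list of times below are made-up illustrative values, not the figures from the paper, and `upper_fence` is a hypothetical helper name.

```python
def upper_fence(q1: float, q3: float) -> float:
    """Return Q3 + 1.5 * IQR, where IQR = Q3 - Q1."""
    iqr = q3 - q1
    return q3 + 1.5 * iqr


# Hypothetical quartiles and data (assumed for illustration only)
q1, q3 = 20.0, 30.0
fence = upper_fence(q1, q3)  # 30 + 1.5 * 10 = 45.0

times = [18, 25, 31, 44, 47, 52]
# Outliers on the upper side are the values strictly beyond the fence
outliers = [t for t in times if t > fence]
print(fence, outliers)  # 45.0 [47, 52]
```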
1c)i) Hint 7: recognise that the success rate is a number of successes out of a number of 'trials'
1c)i) Hint 8: this indicates a Binomial setup, from which proportions of success can be calculated
1c)i) Hint 9: hence we have two proportions, and we are interested in whether they are different
1c)i) Hint 10: therefore a two sample proportion test is appropriate
1c)ii) Hint 11: make it clear which proportions are which, by using p_cats and p_dogs
1d) Hint 12: know that Mann-Whitney tests require the shape and spread of the population distributions to be the same
1d) Hint 13: also sampling theory requires the samples to be independent and randomly selected from the populations
1d) Hint 14: remember to phrase these assumptions in the context of the study, so mention that the sample data is of 'times to draw a doodle'
1e) Hint 15: recognise that the large sample sizes require a normal approximation to be used
1e) Hint 16: correctly identify that m = 121 and n = 145
1e) Hint 17: calculate E(W) and V(W) using the formulae below the Mann-Whitney tables in the Data Booklet
1e) Hint 18: as we are approximating a discrete distribution with a continuous distribution, remember to use continuity correction
1e) Hint 19: so P(W ≤ 12048) becomes P(W < 12048.5)
1e) Hint 20: now standardise the value of 12048.5 to give a z-value of -6.57
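The standardisation in Hints 17 to 20 can be reproduced as a sketch. It assumes the Data Booklet formulae are the usual rank-sum ones, E(W) = m(m + n + 1)/2 and V(W) = mn(m + n + 1)/12; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

m, n = 121, 145  # sample sizes from Hint 16

# Assumed rank-sum formulae for the mean and variance of W
expected_w = m * (m + n + 1) / 2
variance_w = m * n * (m + n + 1) / 12

# Continuity-corrected value from Hint 19
w_corrected = 12048.5

# Standardise to a z-value, then look up the lower-tail probability
z = (w_corrected - expected_w) / sqrt(variance_w)
p_lower = NormalDist().cdf(z)

print(round(z, 2))  # -6.57
```

The z-value is far below -2.58, so the lower-tail probability is extremely small.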
1f) Hint 21: make sure the hypotheses mention 'median' and that they clearly refer to the time taken to draw each animal
1f) Hint 22: recognise that a two tailed test is being conducted
1f) Hint 23: know that P(Z < -2.58)=0.005
1f) Hint 24: hence the significance level = 2 × 0.005 = 0.01 = 1%
1f) Hint 25: in phrasing your conclusion, avoid strong words like 'prove'. We only have 'evidence to suggest'
1f) Hint 26: reflect the fact that it was a two-tailed test in your wording, and not a one-tailed test
1g) Hint 27: know that a two sample pooled t-test would require an assumption to do with the two population variances
1g) Hint 28: know that for these large samples, the population standard deviations could be well approximated by the sample standard deviations
Question 2
2a) Hint 1: notice that as length increases, the cost also increases
2a) Hint 2: this suggests a positive correlation but . . .
2a) Hint 3: . . . not necessarily a linear relationship
2b) Hint 4: both models have small p-values arising from high values of r
2b) Hint 5: know that high values of r mean evidence of a linear relationship
2b) Hint 6: hence both transformation models seem to give linear data sets
2c) Hint 7: know that the coefficient of determination is written as R² which is r²
2c) Hint 8: know that you need to write a sentence about the percentage of variation in the data that is explained by the model, ensuring that you phrase it in the context of what the numbers refer to
2d) Hint 9: know that we are looking for residual plots where there is a random scatter of points, centred on the zero axis
2d) Hint 10: be confident to conclude that both residual plots seem suitable in this regard, and do not feel as if you had to favour one over the other!
2e) Hint 11: know that a confidence interval is constructed around a sample mean, which is the estimated cost in this context
2e) Hint 12: know that the centre of the confidence interval is the estimate of the √cost
2e) Hint 13: realise that you now need to square this value to obtain the estimate of the cost, in £
2e) Hint 14: repeat the squaring of the end values of the confidence interval for the transformed costs
2f)i) Hint 15: know that a linear regression model fits either y on x, or x on y, and that these are not the same process
2f)i) Hint 16: both Models have fitted transformed cost on length, which is not the same as fitting length on transformed cost
2f)ii) Hint 17: realise that for Peter to do what he wants to do with his numbers, he needs to construct models of length based on cost, and that these will also likely involve transformations
Paper 2
Question 1
1a) Hint 1: know that the sum of all probabilities of a random variable must equal 1
1a) Hint 2: draw out a table of the five probabilities, each in terms of k
1a) Hint 3: add the five fractions together to equal 1 and solve for k
1b) Hint 4: use standard techniques for working out E(S) and E(S²) from the probability table
1b) Hint 5: know that V(S) = E(S²) - [E(S)]²
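The calculation in Hints 4 and 5 can be sketched with exact fractions. The probability table below is hypothetical (it is not the table from the paper); only the method carries over.

```python
from fractions import Fraction

# Hypothetical probability table (assumed): value of S -> probability
table = {
    1: Fraction(1, 10),
    2: Fraction(2, 10),
    3: Fraction(4, 10),
    4: Fraction(2, 10),
    5: Fraction(1, 10),
}

# The probabilities of a random variable must sum to 1
total_probability = sum(table.values())

# E(S) and E(S^2) from the table, then V(S) = E(S^2) - [E(S)]^2
e_s = sum(s * p for s, p in table.items())
e_s2 = sum(s**2 * p for s, p in table.items())
v_s = e_s2 - e_s**2

print(e_s, e_s2, v_s)
```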
Question 2
2a) Hint 1: from the table, identify the number of Teachers who are also Female
2a) Hint 2: from the table, identify the total number of staff
2a) Hint 3: for the second calculation, use the general law of probability: P(X ∪ Y) = P(X) + P(Y) - P(X ∩ Y)
2a) Hint 4: you will need to determine the number of Female Teachers, the number of non-Admin Staff and the number of Female staff who are not Admin
2b)i) Hint 5: consider drawing a tree diagram to summarise the new information, with the first set of three branches being 'teacher', 'admin' and 'other'
2b)i) Hint 6: P(drives) = P(drives | teacher)P(teacher) + P(drives | admin)P(admin) + P(drives | other)P(other)
2b)ii) Hint 7: recognise this is wanting the conditional probability P(admin | not drive)
2b)ii) Hint 8: use P(X | Y) = P(X ∩ Y)/P(Y)
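Hints 6 to 8 combine the law of total probability with the conditional probability formula. The sketch below uses made-up probabilities for each staff group, purely to show the arithmetic.

```python
# Hypothetical probabilities (assumed, not the values from the paper)
p_group = {"teacher": 0.6, "admin": 0.1, "other": 0.3}
p_drives_given = {"teacher": 0.7, "admin": 0.5, "other": 0.4}

# Law of total probability: sum of P(drives | group) * P(group)
p_drives = sum(p_drives_given[g] * p_group[g] for g in p_group)

# P(admin | not drive) = P(admin and not drive) / P(not drive)
p_not_drive = 1 - p_drives
p_admin_and_not_drive = (1 - p_drives_given["admin"]) * p_group["admin"]
p_admin_given_not_drive = p_admin_and_not_drive / p_not_drive

print(round(p_drives, 2), round(p_admin_given_not_drive, 3))
```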
Question 3
Hint 1: know that the single sample Wilcoxon test requires the population distribution to be symmetrical
Hint 2: be sure to phrase this assumption in the context of the problem, by using the phrase 'daily mite counts'
Hint 3: conduct a standard single sample Wilcoxon test, looking out for values of 0 that are omitted, as well as any tied ranks
Hint 4: be sure to phrase your conclusion in terms of whether there is evidence that median mite counts are equal to 7 or greater than 7
Question 4
4a)i) Hint 1: know that P(W > 10) = 1 - P(W ≤ 10)
4a)ii) Hint 2: given that all of the random variables count a rate of the same thing over the same time period, the parameters for the F and M distributions are allowed to be added together
4a)ii) Hint 3: so F + M ∼ Po(3.5)
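A sketch of both ideas in part (a), using only the standard library. The rate for W is a hypothetical value chosen for illustration; the combined rate of 3.5 for F + M is taken from Hint 3, and adding the rates assumes F and M are independent.

```python
from math import exp, factorial


def poisson_cdf(k: int, lam: float) -> float:
    """Return P(X <= k) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))


# Hypothetical rate for W (assumed, not from the paper)
lam_w = 6.0
# Complement rule from Hint 1: P(W > 10) = 1 - P(W <= 10)
p_w_more_than_10 = 1 - poisson_cdf(10, lam_w)

# F + M ~ Po(3.5) from Hint 3, so probabilities for the sum
# come from a single Poisson distribution
lam_f_plus_m = 3.5
p_sum_at_most_2 = poisson_cdf(2, lam_f_plus_m)

print(round(p_w_more_than_10, 4), round(p_sum_at_most_2, 4))
```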
4b) Hint 4: notice that 6 weeks is a total of 42 periods of 24 hours
4b) Hint 5: construct a new random variable, T, that is the sum of 42 independent observations of each of the three random variables, W, F and M
4b) Hint 6: so T = W1 + . . . + W42 + F1 + . . . + F42 + M1 + . . . + M42
4b) Hint 7: so T has a Poisson distribution that is the sum of 126 random variables!
4b) Hint 8: use a normal approximation to this Poisson distribution
4b) Hint 9: remember to use continuity correction when calculating the probability of < 340 captures
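Hints 8 and 9 can be sketched as follows. The total rate below is a hypothetical value, since it depends on the rates given in the paper. For a Poisson total T with mean λ, the approximating normal has mean λ and variance λ, and P(T < 340) = P(T ≤ 339) becomes P(Y < 339.5).

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical total Poisson rate over the 42 days (assumed value)
lam_total = 360.0

# Normal approximation: same mean, variance equal to the Poisson mean
approx = NormalDist(mu=lam_total, sigma=sqrt(lam_total))

# Continuity correction: "fewer than 340" means at most 339,
# so the boundary moves to 339.5
p_fewer_than_340 = approx.cdf(339.5)

print(round(p_fewer_than_340, 3))
```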
Question 5
Hint 1: recognise that this is non-paired data, so the samples will be pooled to give an estimate for the population standard deviation
Hint 2: use the formula in the data booklet for calculating s², and then subsequently calculate the value of s
Hint 3: know that the number of degrees of freedom for t = the total pooled sample size - 2
Hint 4: be sure to state the final conclusion using the context of the data
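The pooled calculation in Hints 2 and 3 can be sketched as below. It assumes the data booklet formula is the usual pooled variance, s² = ((n1 - 1)s1² + (n2 - 1)s2²)/(n1 + n2 - 2); the summary statistics are made up and `pooled_t` is a hypothetical helper name.

```python
from math import sqrt


def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Return (t statistic, degrees of freedom, pooled s) for two samples."""
    df = n1 + n2 - 2
    # Pooled estimate of the common population variance
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    s = sqrt(pooled_var)
    # Test statistic for the difference in sample means
    t = (mean1 - mean2) / (s * sqrt(1 / n1 + 1 / n2))
    return t, df, s


# Hypothetical summary statistics (assumed for illustration)
t_stat, df, s = pooled_t(12.0, 2.0, 10, 10.0, 2.0, 10)
print(round(t_stat, 3), df, s)
```

The value of t is then compared with the critical value of the t distribution on `df` degrees of freedom.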
Question 6
6a) Hint 1: recognise that the phrase 'groups were proportionally represented' is describing different strata
6b) Hint 2: recognise that there is no randomness involved, so this is a non-random sampling strategy. Therefore it is likely to be either quota or convenience sampling
6b) Hint 3: know that not involving randomness can introduce bias and thus the sample may not be representative of the population
6c) Hint 4: identify the sample size, the sample mean and the claimed value for the population standard deviation
6c) Hint 5: define a new random variable for the sample mean, and note that its standard error, σ/√n, is not the same as the population standard deviation
6c) Hint 6: know that a 95% confidence interval comes from using z_0.975
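A sketch of the interval in part (c), with a known population standard deviation. The sample mean, σ and n below are hypothetical values, and `z_interval` is an illustrative helper name.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(sample_mean, sigma, n, confidence=0.95):
    """Return a z-interval for the population mean when sigma is known."""
    # For 95% confidence this is the 0.975 quantile of the standard normal
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error of the mean, not the population standard deviation
    standard_error = sigma / sqrt(n)
    return sample_mean - z * standard_error, sample_mean + z * standard_error


# Hypothetical values (assumed for illustration)
low, high = z_interval(sample_mean=50.0, sigma=10.0, n=25)
print(round(low, 2), round(high, 2))
```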
6d) Hint 7: look at how the 90% interval is different from the 95% interval
6d) Hint 8: consider what the student may be wanting to show, and why they may be wanting to show it [re-read the information between parts (b) and (c)]
Question 7
7a) Hint 1: consider sketching the graph of the distribution of X to see how 80.45 and 80.83 fit into the region from 78 to 83
7b) Hint 2: notice that the sample size of 75 is large, and that X is not a normal distribution
7b) Hint 3: recognise that the Central Limit Theorem can be used here, to give the distribution of the sample mean
7b) Hint 4: be sure to clearly state that the CLT tells us that it is approximately normally distributed
7c) Hint 5: use the normal distribution from part (b) for this
Question 8
Hint 1: assemble all the facts that you have been given and define a random variable in both words and notation
Hint 2: state the hypotheses in terms of the parameter we are testing
Hint 3: realise that this is a single sample z-test of the population mean
Hint 4: the outcome of the test is the same regardless of whether 5% or 1% significance is being used
Hint 5: think about reasons for your conclusion, from the perspective of the candidates as well as from the perspective of the designers of the problem being solved by the candidates
Question 9
9a) Hint 1: know the laws of expectation and variance for the difference between two random variables
9b) Hint 2: think about what A and B each represent, and therefore what their difference stands for
9c) Hint 3: recognise that total profit, T, is not equal to 33A + 26B
9c) Hint 4: know that total profit, T = A1 + . . . + A33 + B1 + . . . + B26
9c) Hint 5: after calculating V(T), work out SD(T) and present your answer as an amount of money.
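The distinction in Hints 3 and 4 shows up in the variance. The sketch below uses hypothetical variances for A and B and assumes all observations are independent.

```python
from math import sqrt

# Hypothetical variances of A and B (assumed, not from the paper)
var_a, var_b = 4.0, 9.0
n_a, n_b = 33, 26

# T = A1 + ... + A33 + B1 + ... + B26 with independent observations:
# variances add once per observation
var_t = n_a * var_a + n_b * var_b
sd_t = sqrt(var_t)

# By contrast, 33A + 26B scales single observations,
# so the coefficients are squared
var_scaled = n_a**2 * var_a + n_b**2 * var_b

print(var_t, round(sd_t, 2), var_scaled)
```

The two variances differ greatly, which is why the choice of model for T matters.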
Question 10
10a) Hint 1: start with defining a binomial random variable, approximate it to a normal, and then transform it to a proportion
10a) Hint 2: calculate a z-interval from this proportional random variable, using z_0.975 and sample estimates for p and q
10b) Hint 3: if the interval has width 0.04, then this arises from the mean ±0.02
10b) Hint 4: 'reverse engineer' the confidence interval width to give a value for n
10b) Hint 5: interpret and round this value for n to the nearest whole number
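The 'reverse engineering' in part (b) can be sketched by rearranging z × √(pq/n) = 0.02 for n. The sample proportion below is a hypothetical value and `required_n` is an illustrative helper name.

```python
from statistics import NormalDist


def required_n(p_hat, half_width, confidence=0.95):
    """Return the unrounded n that gives the requested interval half-width."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    q_hat = 1 - p_hat
    # Rearranged from half_width = z * sqrt(p_hat * q_hat / n)
    return z**2 * p_hat * q_hat / half_width**2


# Hypothetical sample proportion (assumed); width 0.04 means half-width 0.02
n_raw = required_n(p_hat=0.5, half_width=0.02)
print(n_raw)
```

The unrounded value then needs interpreting and rounding, as in Hint 5.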
Question 11
11a) Hint 1: look at the shape of the graph and consider which distribution this looks very similar to
11b) Hint 2: recognise that we have lots of sample data, and that the population is normally distributed (but we don't know any population parameters)
11b) Hint 3: notice that the sample sizes are all very large, so the values of the sample standard deviations can be assumed to be numerically close to the values for the population standard deviations
11b) Hint 4: we therefore have either t-tests or z-tests as possible tests to use
11b) Hint 5: if we used a t-test, think how many degrees of freedom it would have, and how that t distribution would compare to the standard normal distribution
11b) Hint 6: if we used a t-test on these two samples, we would pool the samples and have to assume that the standard deviations of the populations were equal. If we used a z-test, then this assumption would not be required.
11b) Hint 7: therefore proceeding with a two sample z-test seems to require the least stringent assumptions
11c) Hint 8: realise that this will require using a similar framework to part (b), but with the adjustments that we will not use the value of 55.4, and we will be 'working backwards' from the critical value corresponding to z_0.90
Question 12
12a) Hint 1: think carefully whether the sample size is either 4 or 5 or 20
12a) Hint 2: you should use n = 5 here, as it is the sample size for each 4 hour time period that we would be plotting on a control chart
12b)i) Hint 3: recognise that we are looking for at least 2 'successes' out of 3 'trials'
12b)i) Hint 4: the probability of being above a 1σ limit is calculated from using one of your answers from part (a)
12b)i) Hint 5: realise that the situation is symmetrical for both above the upper 1σ limit and below the lower 1σ limit, and that this will mean a probability will need to be doubled
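A sketch of the calculation in part (b)(i), assuming the plotted sample means are normally distributed and successive points are independent, so that the probability of a point lying above the upper 1σ limit is P(Z > 1). The variable names are illustrative.

```python
from statistics import NormalDist

# Probability that one plotted point lies above the upper 1-sigma limit
p_above = 1 - NormalDist().cdf(1)

# At least 2 'successes' out of 3 'trials': exactly 2, or all 3
p_one_side = 3 * p_above**2 * (1 - p_above) + p_above**3

# By symmetry the same holds below the lower limit, and the two events
# cannot both happen within 3 points, so the probability is doubled
p_either_side = 2 * p_one_side

print(round(p_above, 4), round(p_either_side, 4))
```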
12b)ii) Hint 6: know that the usual WECO rules each give very small probabilities of returning an 'out of control' warning
12b)ii) Hint 7: recognise that your answer from (b)(i) is a much larger probability and thus what the impact of using this new 'rule' would be
Question 13
13a) Hint 1: consider the context from the perspective of the lecturing quality or from the ability level of the groups of students
13b) Hint 2: realise that there are too many small expected frequencies, and that either columns or rows need to be combined
13b) Hint 3: combine the columns for grade D and grade E
13b) Hint 4: recalculate the degrees of freedom for this smaller contingency table
13b) Hint 5: be sure to state your conclusion in terms of the context of the study
13c) Hint 6: the greatest contribution will typically come from where there is greatest disparity between the observed and expected frequencies
13c) Hint 7: check this calculation on your graphic calculator, accessing the matrix of results that sum to give the X² statistic
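The matrix of contributions mentioned in Hint 7 can also be built by hand. The observed frequencies below are hypothetical, not the table from the paper; each contribution is (O - E)²/E.

```python
# Hypothetical observed frequencies (assumed): rows are groups, columns are grades
observed = [
    [10, 20, 30],
    [20, 20, 20],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency = row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Contribution of each cell to the test statistic
contributions = [
    [(o - e) ** 2 / e for o, e in zip(o_row, e_row)]
    for o_row, e_row in zip(observed, expected)
]

chi_squared = sum(sum(row) for row in contributions)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
largest = max(max(row) for row in contributions)

print(round(chi_squared, 3), degrees_of_freedom, round(largest, 3))
```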