The Scottish Qualifications Authority owns the copyright to its exam papers and marking instructions.
Hints offered by N Hopley
Paper 1
Question 1
1a) Hint 1: know that an outlier is more than 1.5 × (Q3 - Q1) above Q3 (beyond the upper fence), or more than 1.5 × (Q3 - Q1) below Q1 (beyond the lower fence)
1a) Hint 2: use the summary statistics in the 1980s row of Output 1 to obtain the information you need
1a) Hint 3: see if either the Min value or Max value is beyond either fence
1b) Hint 4: recognise that each decade has 10 years, and so there will be 10 strata
1b) Hint 5: know that we want 2% of all of the songs, from each of these 10 years
1b) Hint 6: know that the 2% of songs from each year is found by doing a simple random sample from each year
1b) Hint 7: be sure to communicate that the whole stratified sample comes from merging together the 10 samples, one from each year
1c) Hint 8: note that there are 6 marks and three things to focus upon (location, spread and sample size)
1c) Hint 9: for each thing, look at the numbers in Output 1 to decide what's happening to each from 1980s to 1990s to 2000s
1c) Hint 10: make sure that you phrase each of your answers in the context of the report, i.e. duration of songs in the Top 40 Charts
1d)i) Hint 11: know that a confidence interval needs a sample mean value to be constructed around - it's in Output 1
1d)i) Hint 12: notice that Output 2 contains the number of degrees of freedom required for the t-distribution you will use
1d)i) Hint 13: find the value for the sample standard deviation and sample size from Output 1
1d)ii) Hint 14: know that a 95% confidence interval captures the (unknown) population mean only about 95 times out of 100, if the sampling were to be repeated
1e) Hint 15: recognise that the p-value is greater than 0.05
1e) Hint 16: this means that we do not have evidence to reject H0
1e) Hint 17: write the conclusion in terms of mean number of weeks in the Top 40 charts
1f) Hint 18: notice that Output 4 contains the two required standard deviations that will be compared
1f) Hint 19: decide whether 12.713 is 'equal' to 8.991, or not!
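The fence check described in the 1a) hints can be sketched in a few lines of Python. The summary statistics below are hypothetical stand-ins, not the values from Output 1, and the function name is only illustrative.

```python
def outlier_fences(q1, q3):
    """Return (lower_fence, upper_fence) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical summary statistics (assumed for illustration only).
minimum, q1, q3, maximum = 150, 210, 250, 330

lower, upper = outlier_fences(q1, q3)

# A value is an outlier only if it lies strictly beyond a fence.
has_low_outlier = minimum < lower
has_high_outlier = maximum > upper

print(lower, upper, has_low_outlier, has_high_outlier)
```

With these made-up numbers the minimum sits exactly on the lower fence, so only the maximum counts as an outlier.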
Question 2
2a) Hint 1: recognise that it is a non-random sampling method
2a) Hint 2: know that non-random sampling methods cannot be relied upon to give samples that are representative of the population
2b) Hint 3: put yourself in the shoes of the S6 pupils. What would make you want to fill in a survey form?
2c) Hint 4: compare the 'rows' in Figures 1 and 2 and see which one seems to have changed the most
2c) Hint 5: now think about why the Teachers row has changed much more than the Pupils' rows
2d)i) Hint 6: the expected frequency will come from the mean of the observed frequencies
2d)ii) Hint 7: after reading the introduction again, what type of people does it mention, and what type of people does it not mention?
2e) Hint 8: recognise that you will be writing a 'full solution' to a two sample proportion test
2e) Hint 9: notice that the S1 proportion values are given in line 36 of the report
2e) Hint 10: look back to Table 1 to see where the S1 values came from, and extract the equivalent values for the S5 pupils
2e) Hint 11: use the appropriate formulae on Page 6 of the Statistical Formulae and Tables Booklet
2e) Hint 12: don't forget to show the p-value calculation being 2 × P(Z>2.03)
2f) Hint 13: notice from the Introduction that the intended population was 'all young people'
2f) Hint 14: think how the particular secondary school was chosen
2f) Hint 15: think about whether one secondary school's roll is representative of all young people
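For the two sample proportion test in 2e), the calculation might look like the sketch below. It assumes the pooled-proportion form of the test statistic and a two-tailed p-value, and the counts are hypothetical rather than taken from Table 1, so the z value here is not the 2.03 quoted in the hint.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Two-tailed z-test for equal proportions, using the pooled estimate."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-tailed p-value: 2 x P(Z > |z|)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts (assumed): 45 of 100 in one group, 30 of 100 in the other.
z, p_value = two_proportion_z(45, 100, 30, 100)
print(round(z, 2), round(p_value, 4))
```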
Paper 2
Question 1
Hint 1: a standard chi-squared test of association question...
Hint 2: make sure the null hypothesis contains the context and either the phrase 'no association' or 'independent'
Hint 3: calculate the expected frequencies, and for good measure show the actual calculation for one of them that uses the row/column totals
Hint 4: check that there are no expected frequencies that are too small
Hint 5: know that the degrees of freedom = (rows - 1) × (columns - 1)
Hint 6: obtain the value of the test statistic, X²
Hint 7: obtain either critical values from the Data Booklet, or p-value from a graphic calculator
Hint 8: decide whether or not to reject H0
Hint 9: make sure that final statement includes the context of the problem
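The steps in these hints (expected frequencies from row and column totals, the small-frequency check, degrees of freedom and the test statistic) can be sketched as follows. The observed table is hypothetical, and the 5% critical value of 5.991 for 2 degrees of freedom is the standard tabulated figure.

```python
# Hypothetical 2 x 3 table of observed frequencies (assumed for illustration).
observed = [[20, 30, 50],
            [30, 30, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency = row total x column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Check that no expected frequency is too small (commonly: all at least 5).
all_large_enough = all(e >= 5 for row in expected for e in row)

df = (len(observed) - 1) * (len(observed[0]) - 1)

chi_sq = sum((o - e) ** 2 / e
             for o_row, e_row in zip(observed, expected)
             for o, e in zip(o_row, e_row))

critical_5_percent = 5.991  # chi-squared table, 2 degrees of freedom
reject_h0 = chi_sq > critical_5_percent
print(expected, df, round(chi_sq, 3), reject_h0)
```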
Question 2
2a) Hint 1: notice the phrase 'mean rate per minute' in the question, which indicates a Poisson distribution
2b) Hint 2: know that you are working out P(X = 0), which can be done using the formula, or a graphic calculator
2c) Hint 3: know that you are working out P(X = 2 and Y = 2)
2c) Hint 4: recognise that P(X = 2 and Y = 2) = P(X = 2) × P(Y = 2) as X and Y are independent (this was stated in the question)
2c) Hint 5: calculate each of P(X = 2) and P(Y = 2) either by formula, or a graphic calculator
2d) Hint 6: notice that both X and Y are rates per minute, and it makes sense to add them
2d) Hint 7: know that X + Y will also have a Poisson distribution, as it is the sum of two independent Poisson random variables that are 'allowed' to be added
2d) Hint 8: know that the parameter for X + Y is the sum of the parameter of X and the parameter of Y
2d) Hint 9: know that P(X + Y > 5) = 1 - P(X + Y ≤ 5)
2d) Hint 10: calculate P(X + Y ≤ 5) using tables or a graphic calculator
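As a rough illustration of parts (b) to (d), the sketch below evaluates the three Poisson probabilities from the formula. The two rates are hypothetical, since the question's actual parameters are not repeated in these hints.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return exp(-lam) * lam ** k / factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

# Hypothetical rates per minute (assumed for illustration only).
lam_x, lam_y = 1.5, 2.5

p_x0 = poisson_pmf(0, lam_x)                                # P(X = 0)
p_both_two = poisson_pmf(2, lam_x) * poisson_pmf(2, lam_y)  # independence
lam_total = lam_x + lam_y                                   # X + Y ~ Po(lam_x + lam_y)
p_more_than_5 = 1 - poisson_cdf(5, lam_total)               # P(X + Y > 5)

print(round(p_x0, 4), round(p_both_two, 4), round(p_more_than_5, 4))
```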
Question 3
Hint 1: consider constructing an outcome table with 0, 0, 2, 4, 4 along the row and column headings (but a tree diagram is also possible...)
Hint 2: notice that it is 'without replacement' so the main diagonal of the outcome table is void, as you can't take the same card twice
Hint 3: populate the outcome table with the sum of the row and column values
Hint 4: use the frequencies of 0, 2, 4, 6 and 8 to generate the probabilities of each value of T.
Hint 5: calculate E(T) in the usual manner
Hint 6: calculate E(T²) in the usual manner
Hint 7: use V(T) = E(T²) - E²(T)
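One way to check a hand-drawn outcome table is to enumerate it by machine. The sketch below uses the card values 0, 0, 2, 4, 4 from Hint 1 and treats every ordered pair of different cards as equally likely, which is an assumption about how the cards are drawn.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

cards = [0, 0, 2, 4, 4]

# permutations() pairs each card with every *other* card, so the
# main diagonal of the outcome table is excluded automatically.
totals = [a + b for a, b in permutations(cards, 2)]

counts = Counter(totals)
dist = {t: Fraction(c, len(totals)) for t, c in sorted(counts.items())}

e_t = sum(t * p for t, p in dist.items())        # E(T)
e_t2 = sum(t ** 2 * p for t, p in dist.items())  # E(T^2)
var_t = e_t2 - e_t ** 2                          # V(T) = E(T^2) - E^2(T)

print(dist, e_t, var_t)
```

Exact fractions are used so that the probabilities can be compared directly with a written solution.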
Question 4
4a) Hint 1: recognise that we have a random variable with a binomial distribution
4a) Hint 2: we have X ~ B(104, 0.44) and we want P(X = 52)
4a) Hint 3: either use the formula or a graphic calculator to calculate P(X = 52)
4b) Hint 4: recognise that this is asking for a normal approximation to a binomial distribution
4b) Hint 5: calculate the values of the mean and variance of this normal approximation - call this new random variable, Y.
4b) Hint 6: when calculating P(40 ≤ X ≤ 50) remember to use continuity correction
4b) Hint 7: proceed to calculate P(39.5 ≤ Y ≤ 50.5), preferably by standardising Y to Z first.
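The two calculations for X ~ B(104, 0.44) can be sketched as below, using only the values quoted in the hints. The exact sum for P(40 ≤ X ≤ 50) is included purely as a sanity check on the approximation and is not part of the expected written solution.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 104, 0.44

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Part (a): exact probability
p_exact_52 = binomial_pmf(52, n, p)

# Part (b): normal approximation Y ~ N(np, npq) with continuity correction
mean = n * p
variance = n * p * (1 - p)
y = NormalDist(mean, sqrt(variance))
p_approx = y.cdf(50.5) - y.cdf(39.5)

# Exact binomial value of P(40 <= X <= 50), for comparison only
p_exact = sum(binomial_pmf(k, n, p) for k in range(40, 51))

print(round(p_exact_52, 4), round(p_approx, 4), round(p_exact, 4))
```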
Question 5
5a) Hint 1: think about whether the data presented is paired, or not.
5b) Hint 2: you should have noticed that it is paired data, so one set can be subtracted from the other to obtain the 'difference'
5b) Hint 3: state your null hypothesis in terms of the mean of the difference being equal to zero
5b) Hint 4: recognise that we do not have the population standard deviation, so it will need to be calculated from the data set
5b) Hint 5: as a consequence, we are now going to use a t distribution with 8 degrees of freedom (t₈), and not a z distribution
5b) Hint 6: at the conclusion, note that we are performing a two-tailed test, so use the correct critical value, or p-value
5b) Hint 7: remember to give final conclusion in terms of the context of the problem
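A minimal sketch of the paired-sample t-test in 5b) follows. The nine pairs of values are hypothetical (nine pairs being consistent with the 8 degrees of freedom in Hint 5), and 2.306 is the standard two-tailed 5% critical value for 8 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired measurements (assumed for illustration only).
before = [72, 68, 75, 80, 66, 71, 77, 69, 74]
after = [70, 69, 71, 76, 66, 68, 75, 70, 71]

# Paired data: work with the differences, H0: mean difference = 0
diffs = [b - a for b, a in zip(before, after)]

n = len(diffs)
df = n - 1
d_bar = mean(diffs)
s_d = stdev(diffs)  # sample standard deviation (divisor n - 1)

t = d_bar / (s_d / sqrt(n))

critical = 2.306  # t table, 8 degrees of freedom, two-tailed 5%
reject_h0 = abs(t) > critical
print(df, round(t, 3), reject_h0)
```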
Question 6
6a) Hint 1: use page 5 of the Statistical Formulae and Tables booklet to remind you of what E(εi) and V(εi) should be.
6a) Hint 2: know that we need the residual points to be randomly scattered, with constant variance, around zero on the residual plot
6a) Hint 3: comment on whether the shape of the dots on the residual plot meets our expectations for a good model
6b) Hint 4: proceed with the standard process to find out the parameters of a least squares regression line, using the formulae on page 5 of the Statistical Formulae and Tables booklet
6b) Hint 5: use the letter 'w' instead of the letter 'y' throughout your calculations
6b) Hint 6: after obtaining w = 1.80624 - 0.013982x, know that we shall have to substitute in a value for x.
6b) Hint 7: notice that x is not 1927, but rather 1927 - 1840 which equals 87
6b) Hint 8: once you have the value for w, this needs to be converted to a value of y, using the information about logarithms
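The substitution in Hints 6 and 7 can be checked with the short sketch below. The hints do not say which logarithm base relates w to y, so the back-transform takes the base as an explicit argument; the helper name is illustrative and the base must come from the question itself.

```python
# Fitted line quoted in Hint 6: w = 1.80624 - 0.013982x
intercept, slope = 1.80624, -0.013982

# Hint 7: x is measured in years since 1840
x = 1927 - 1840

w = intercept + slope * x

def back_transform(w, base):
    """Undo w = log_base(y). The base is an assumption to be read from the question."""
    return base ** w

print(x, round(w, 6))
```

If the question defines w as a base-10 logarithm, the final step would be `back_transform(w, 10)`; for a natural logarithm it would be `back_transform(w, math.e)`.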
Question 7
7a) Hint 1: consider re-writing X/n as (¹/n)X to emphasise that X is being multiplied by a constant
7a) Hint 2: know that E(aX + b) = aE(X) + b, and that here a = ¹/n and b = 0
7a) Hint 3: know that V(aX + b) = a²V(X), and that here a = ¹/n and b = 0
7b) Hint 4: know that p̂ = 14 ÷ 50 = 0.28
7b) Hint 5: to work from first principles, define X to have a binomial distribution
7b) Hint 6: then approximate X with a normal distribution
7b) Hint 7: check that np > 5 and nq > 5 for this approximation to be valid for the method about to be used
7b) Hint 8: for completeness, now divide the normal approximation by 50 to obtain a random variable for the proportion of successes
7b) Hint 9: construct the confidence interval using p̂ ± z0.995 √(p̂q̂/50)
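The interval in Hint 9 can be evaluated as in the sketch below, using the 14 successes out of 50 from Hint 4. The validity check uses the sample estimates in place of the unknown p and q, which is an assumption about how the check would be carried out in practice.

```python
from math import sqrt
from statistics import NormalDist

successes, n = 14, 50
p_hat = successes / n
q_hat = 1 - p_hat

# Validity check for the normal approximation (estimates used for p and q)
approximation_ok = n * p_hat > 5 and n * q_hat > 5

# z_0.995 leaves 0.5% in each tail, giving a 99% confidence interval
z = NormalDist().inv_cdf(0.995)
margin = z * sqrt(p_hat * q_hat / n)

lower, upper = p_hat - margin, p_hat + margin
print(round(lower, 4), round(upper, 4))
```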
Question 8
8a) Hint 1: know that P(spin 4 and then goldfish) = P(spin 4) × P(goldfish | card number 4)
8b)i) Hint 2: this can be read off from the first row of the table
8b)ii) Hint 3: know that losing the game comes from revealing a shark
8b)ii) Hint 4: notice that revealing a shark can only come from spinning either a 1 or a 4
8b)ii) Hint 5: calculate these two ways of revealing a shark, in a similar manner to that done for part (a)
8b)iii) Hint 6: use the formula P(A | B) = P(A ∩ B) / P(B), where A = 'spin a 1' and B = 'lose the game'
8b)iii) Hint 7: use the values calculated for b)ii) to help evaluate this formula
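The chain of reasoning in part (b) can be sketched with exact fractions. Every probability below is hypothetical, since the question's table is not reproduced in these hints; only the structure (a shark is possible only after a 1 or a 4) follows Hint 4.

```python
from fractions import Fraction

# Hypothetical spinner and card probabilities (assumed for illustration only).
p_spin = {k: Fraction(1, 4) for k in (1, 2, 3, 4)}
p_shark_given = {1: Fraction(1, 2), 2: Fraction(0), 3: Fraction(0), 4: Fraction(1, 3)}

# P(spin k and shark) = P(spin k) x P(shark | card number k)
p_spin_and_shark = {k: p_spin[k] * p_shark_given[k] for k in p_spin}

# P(lose) = sum over the ways of revealing a shark
p_lose = sum(p_spin_and_shark.values())

# P(spin a 1 | lose) = P(spin a 1 and lose) / P(lose)
p_one_given_lose = p_spin_and_shark[1] / p_lose

print(p_lose, p_one_given_lose)
```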
Question 9
9a) Hint 1: think about why you would ever want to use the Central Limit Theorem - what does it deliver, which you don't already know?
9a) Hint 2: the CLT is used when the population distribution is not known...
9a) Hint 3: ... as the distribution of the sample mean is then stated to be approximately normal ...
9a) Hint 4: ... with the mean parameters being equal, and the variances different by a factor of 1/n
9b) Hint 5: know that using a z-test requires knowing the variance of the normal distribution
9b) Hint 6: at the outset, we do not know the variance, but we have a large sample from which it could be estimated.
9b) Hint 7: so the 'further assumption' will be that the (large) sample gives a good estimate for the population variance
9b) Hint 8: proceed with quoting the CLT to give the distribution of the sample mean
9b) Hint 9: proceed with calculating a test statistic, and either calculate a p-value, or compare to a critical value
9b) Hint 10: clearly communicate your conclusion, citing the context of the problem.
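A sketch of the z-test in 9b) is given below. The sample summary is hypothetical and a two-tailed alternative is assumed; the question's own wording decides the number of tails.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical large-sample summary (assumed for illustration only).
n = 64
sample_mean = 52.1
sample_sd = 8.0  # used as the estimate of the population standard deviation
mu_0 = 50.0      # mean under H0

# By the CLT the sample mean is approximately N(mu_0, sd^2 / n) under H0
standard_error = sample_sd / sqrt(n)
z = (sample_mean - mu_0) / standard_error

# Two-tailed p-value (assumed alternative: mean differs from mu_0)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 2), round(p_value, 4))
```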
Question 10
10a) Hint 1: recognise that this is a hypothesis test on ρ, the population correlation coefficient
10a) Hint 2: calculate the value of r from Sxy, Sxx and Syy
10a) Hint 3: use the formulae from the Data Booklet to calculate the test statistic, t, using n and r
10a) Hint 4: the number of degrees of freedom is two less than the sample size (due to bivariate data)
10a) Hint 5: note that you are conducting a two-tailed test
10a) Hint 6: clearly communicate your conclusion, citing the context of the problem.
10a) Hint 7: there are several underlying assumptions that could be mentioned - think of the sample, and of the population distributions ...
10a) Hint 8: ... as with all samples, we'd expect them to be independent values
10a) Hint 9: ... and for a t-distribution to be used, there must be the assumption of normality early on in the process
10b) Hint 10: consider what else may have caused someone to die ...
10b) Hint 11: ... and just because a correlation exists between two events, it does not mean that one event caused the other
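The calculation in 10a) can be sketched as follows, assuming the usual test statistic t = r√(n - 2) / √(1 - r²). The summary values are hypothetical, and 2.228 is the standard two-tailed 5% critical value for 10 degrees of freedom.

```python
from math import sqrt

# Hypothetical summary statistics (assumed for illustration only).
s_xy, s_xx, s_yy = 36.0, 60.0, 40.0
n = 12

# Sample correlation coefficient
r = s_xy / sqrt(s_xx * s_yy)

# Test statistic for H0: rho = 0
df = n - 2
t = r * sqrt(n - 2) / sqrt(1 - r ** 2)

critical = 2.228  # t table, 10 degrees of freedom, two-tailed 5%
reject_h0 = abs(t) > critical
print(round(r, 4), df, round(t, 3), reject_h0)
```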
Question 11
11a)i) Hint 1: decide on the stems, and on the leaves, given the size of all of the values
11a)i) Hint 2: remember to order both sets of the leaves so that the smallest values are nearest the stem
11a)i) Hint 3: remember to include titles for each side and a key to explain that, say, 2 | 8 = 2.8
11a)ii) Hint 4: state your null hypothesis in terms of the population medians being equal
11a)ii) Hint 5: decide whether the alternative hypothesis is one, or two tailed, based on the phrasing of the question
11a)ii) Hint 6: note that we are given the rank sum (of 89) so we don't need to manually rank all the values
11a)ii) Hint 7: read off the critical value for a two tailed test carefully from the Data Booklet
11a)ii) Hint 8: clearly communicate your conclusion, citing the context of the problem.
11b) Hint 9: if A = adult reaction time, and J = juvenile reaction time, then decide whether we want A to be numerically more, or less, than J ...
11b) Hint 10: ... we want P(A > J) ...
11b) Hint 11: ... but this involves two random variables, so we need to change 'A > J' into 'A - J > 0'
11b) Hint 12: and define a new random variable, D = A - J, and determine D's distribution and its parameters
11b) Hint 13: know that D's variance will be V(D) = V(A - J) = V(A) + V(J) ... The variances are added, not subtracted.
11b) Hint 14: calculate P(D > 0)
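The approach in 11b) can be illustrated as below. The means and standard deviations are hypothetical, and A and J are assumed to be independent and normally distributed.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical parameters in seconds (assumed for illustration only).
mean_a, sd_a = 0.30, 0.04  # adult reaction time A
mean_j, sd_j = 0.27, 0.03  # juvenile reaction time J

# D = A - J: the means subtract, but the variances ADD
mean_d = mean_a - mean_j
var_d = sd_a ** 2 + sd_j ** 2
d = NormalDist(mean_d, sqrt(var_d))

# P(A > J) = P(D > 0)
p_d_positive = 1 - d.cdf(0)
print(round(d.stdev, 4), round(p_d_positive, 4))
```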