Paper 1

Question 1

1a) Hint 1: know that an outlier is a value more than 1.5 × (Q3 - Q1) above Q3 (beyond the upper fence), or more than 1.5 × (Q3 - Q1) below Q1 (beyond the lower fence)

1a) Hint 2: use the summary statistics in the 1980s row of Output 1 to obtain the information you need

1a) Hint 3: see if either the Min value or Max value is beyond either fence
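
The fence check in the 1a) hints can be sketched in Python as follows. This is a minimal sketch: the summary statistics below are made up for illustration and are not the values from Output 1, and the function name is hypothetical.

```python
def outlier_fences(q1, q3):
    """Return (lower_fence, upper_fence) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical summary statistics, NOT the values from Output 1.
minimum, q1, q3, maximum = 120, 200, 260, 400

lower, upper = outlier_fences(q1, q3)   # fences at 110 and 350
has_low_outlier = minimum < lower       # is the Min below the lower fence?
has_high_outlier = maximum > upper      # is the Max above the upper fence?
```

Checking only the Min and Max is enough to decide whether at least one outlier exists on each side.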

1b) Hint 4: recognise that each decade has 10 years, and so there will be 10 strata

1b) Hint 5: know that we want 2% of all of the songs, from each of these 10 years

1b) Hint 6: know that the 2% of songs from each year is found by doing a simple random sample from each year

1b) Hint 7: be sure to communicate that the whole stratified sample comes from merging together the 10 samples, one from each year
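
The stratified sampling procedure described in the 1b) hints might look like this in Python. It is only a sketch: the song labels, the 500 songs per year and the function name are invented for illustration.

```python
import random

def stratified_sample(songs_by_year, fraction=0.02, seed=None):
    """Take a simple random sample of `fraction` of each stratum, then merge."""
    rng = random.Random(seed)
    merged = []
    for year, songs in sorted(songs_by_year.items()):
        k = round(fraction * len(songs))      # 2% of this year's songs
        merged.extend(rng.sample(songs, k))   # simple random sample, no repeats
    return merged

# Hypothetical data: 10 years (strata), each with 500 made-up song labels.
songs_by_year = {
    year: [f"{year}-song{i}" for i in range(500)] for year in range(1980, 1990)
}
sample = stratified_sample(songs_by_year, seed=1)
```

Each year contributes its own simple random sample, and the final stratified sample is the merged list.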

1c) Hint 8: note that there are 6 marks and three things to focus upon (location, spread and sample size)

1c) Hint 9: for each thing, look at the numbers in Output 1 to decide what's happening to each from 1980s to 1990s to 2000s

1c) Hint 10: make sure that you phrase each of your answers using the context of the report, i.e. duration of songs in the Top 40 Charts

1d)i) Hint 11: know that a confidence interval needs a sample mean value to be constructed around - it's in Output 1

1d)i) Hint 12: notice that Output 2 contains the number of degrees of freedom required for the t-distribution you will use

1d)i) Hint 13: find the value for the sample standard deviation and sample size from Output 1
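
As a rough sketch of how the pieces in the 1d)i) hints fit together, the interval is sample mean ± t × s/√n. The numbers below are hypothetical (not from Output 1 or Output 2), and the critical value is assumed to have been read from t tables for n - 1 degrees of freedom.

```python
import math

def t_interval(sample_mean, sample_sd, n, t_critical):
    """Confidence interval for a mean: sample_mean +/- t * s / sqrt(n)."""
    half_width = t_critical * sample_sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Hypothetical values; 2.064 is the tabulated 95% two-sided t value for 24 df.
low, high = t_interval(sample_mean=240.0, sample_sd=30.0, n=25, t_critical=2.064)
```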

1d)ii) Hint 14: know that a 95% confidence interval only captures the (unknown) population mean about 95 times out of 100, if the sampling were to be repeated

1e) Hint 15: recognise that the p-value is greater than 0.05

1e) Hint 16: this means that we do not have evidence to reject H0

1e) Hint 17: write the conclusion in terms of mean number of weeks in the Top 40 charts

1f) Hint 18: notice that Output 4 contains the two required standard deviations that will be compared

1f) Hint 19: decide whether 12.713 is 'equal' to 8.991, or not!

Question 2

2a) Hint 1: recognise that it is a non-random sampling method

2a) Hint 2: know that non-random sampling methods cannot be relied upon to give samples that are representative of the population

2b) Hint 3: put yourself in the shoes of the S6 pupils. What would make you want to fill in a survey form?

2c) Hint 4: compare the 'rows' in Figures 1 and 2 and see which one seems to have changed the most

2c) Hint 5: now think about why the Teachers row has changed much more than the Pupils' rows

2d)i) Hint 6: the expected frequency will come from the mean of the observed frequencies

2d)ii) Hint 7: after reading the introduction again, what type of people does it mention, and what type of people does it not mention?

2e) Hint 8: recognise that you will be writing a 'full solution' to a two sample proportion test

2e) Hint 9: notice that the S1 proportion values are given in line 36 of the report

2e) Hint 10: look back to Table 1 to see where the S1 values came from, and extract the equivalent values for the S5 pupils

2e) Hint 11: use the appropriate formulae on Page 6 of the Statistical Formulae and Tables Booklet

2e) Hint 12: don't forget to show the p-value calculation being 2 × P(Z>2.03)
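
A sketch of the two sample proportion test in Python is given below. It assumes the usual pooled form of the test statistic; the formula on Page 6 of the booklet takes precedence if it differs. The function names are illustrative, and only the z value of 2.03 comes from the hints.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, using the pooled proportion (assumed form)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def two_tailed_p(z):
    """p-value = 2 x P(Z > |z|) for a standard normal Z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

p_value = two_tailed_p(2.03)   # the z value quoted in Hint 12
```

With z = 2.03 the p-value comes out at roughly 0.042.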

2f) Hint 13: notice from the Introduction that the intended population was 'all young people'

2f) Hint 14: think how the particular secondary school was chosen

2f) Hint 15: think about whether one secondary school's roll is representative of all young people

Paper 2

Question 1

Hint 1: a standard chi-squared test of association question...

Hint 2: make sure the null hypothesis contains the context and either the phrase 'no association' or 'independent'

Hint 3: calculate the expected frequencies, and for good measure show the actual calculation for one of them that uses the row/column totals

Hint 4: check that there are no expected frequencies that are too small

Hint 5: know that the degrees of freedom = (rows - 1) × (columns - 1)

Hint 6: obtain the value of the test statistic, X²

Hint 7: obtain either critical values from the Data Booklet, or p-value from a graphic calculator

Hint 8: decide whether or not to reject H0

Hint 9: make sure that final statement includes the context of the problem
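
The calculation steps in these hints can be sketched as below. The 2 × 2 table of observed frequencies is made up for illustration and is not the data from the question; the critical value or p-value would still come from the Data Booklet or a calculator.

```python
def chi_squared_statistic(observed):
    """Return (expected frequencies, X^2 statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected frequency = row total x column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, statistic, df

observed = [[10, 20], [30, 40]]   # hypothetical observed frequencies
expected, x2, df = chi_squared_statistic(observed)
smallest_expected = min(min(row) for row in expected)   # check this is not too small
```

A common rule of thumb is that every expected frequency should be at least 5, but use whatever condition your course notes state.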

Question 2

2a) Hint 1: notice the phrase 'mean rate per minute' in the question, which indicates a Poisson distribution

2b) Hint 2: know that you are working out P(X = 0), which can be done using the formula, or a graphic calculator

2c) Hint 3: know that you are working out P(X = 2 and Y = 2)

2c) Hint 4: recognise that P(X = 2 and Y = 2) = P(X = 2) × P(Y = 2) as X and Y are independent (this was stated in the question)

2c) Hint 5: calculate each of P(X = 2) and P(Y = 2) either by formula, or a graphic calculator

2d) Hint 6: notice that both X and Y are rates per minute, and it makes sense to add them

2d) Hint 7: know that X + Y will also be a Poisson distribution, as it is the sum of two independent Poisson distributions that are 'allowed' to be added

2d) Hint 8: know that the parameter for X + Y is the sum of the parameter of X and the parameter of Y

2d) Hint 9: know that P(X + Y > 5) = 1 - P(X + Y ≤ 5)

2d) Hint 10: calculate P(X + Y ≤ 5) using tables or a graphic calculator
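
The Poisson calculations in the hints for 2b) to 2d) could be sketched as follows. The rates 1.5 and 2.5 are hypothetical stand-ins, since the actual rates are given in the question, and the helper names are illustrative.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

lam_x, lam_y = 1.5, 2.5   # hypothetical mean rates per minute

p_x_zero = poisson_pmf(0, lam_x)                             # 2b) P(X = 0)
p_both_two = poisson_pmf(2, lam_x) * poisson_pmf(2, lam_y)   # 2c) independence
p_total_over_5 = 1 - poisson_cdf(5, lam_x + lam_y)           # 2d) X + Y ~ Po(lam_x + lam_y)
```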

Question 3

Hint 1: consider constructing an outcome table with 0, 0, 2, 4, 4 along the row and column headings (but a tree diagram is also possible...)

Hint 2: notice that it is 'without replacement' so the main diagonal of the outcome table is void, as you can't take the same card twice

Hint 3: populate the outcome table with the sum of the row and column values

Hint 4: use the frequencies of 0, 2, 4, 6 and 8 to generate the probabilities of each value of T.

Hint 5: calculate E(T) in the usual manner

Hint 6: calculate E(T²) in the usual manner

Hint 7: use V(T) = E(T²) - [E(T)]²
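
The outcome table approach can be checked with a short Python sketch. It assumes, as the hints suggest, that two cards are taken without replacement from 0, 0, 2, 4, 4 and that T is their sum; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

cards = [0, 0, 2, 4, 4]

# permutations() pairs each card with every OTHER card, so the main
# diagonal of the outcome table (same card twice) is excluded.
totals = Counter(a + b for a, b in permutations(cards, 2))
n_outcomes = sum(totals.values())   # 5 x 4 = 20 equally likely cells

dist = {t: Fraction(count, n_outcomes) for t, count in sorted(totals.items())}

e_t = sum(t * p for t, p in dist.items())        # E(T)
e_t2 = sum(t**2 * p for t, p in dist.items())    # E(T^2)
var_t = e_t2 - e_t**2                            # V(T) = E(T^2) - [E(T)]^2
```

Using `Fraction` keeps the probabilities exact, which makes it easy to compare with a hand-built table.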

Question 4

4a) Hint 1: recognise that we have a random variable with a binomial distribution

4a) Hint 2: we have X ~ B(104, 0.44) and we want P(X = 52)

4a) Hint 3: either use the formula or a graphic calculator to calculate P(X = 52)

4b) Hint 4: recognise that this is asking for a normal approximation to a binomial distribution

4b) Hint 5: calculate the values of the mean and variance of this normal approximation - call this new random variable, Y.

4b) Hint 6: when calculating P(40 ≤ X ≤ 50) remember to use continuity correction

4b) Hint 7: proceed to calculate P(39.5 ≤ Y ≤ 50.5), preferably by standardising Y to Z first.
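
For anyone wanting to check their working for Question 4, here is a minimal Python sketch using the distribution stated in Hint 2. The variable names are illustrative.

```python
import math
from statistics import NormalDist

n, p = 104, 0.44
q = 1 - p

# 4a) exact binomial probability P(X = 52)
p_exact_52 = math.comb(n, 52) * p**52 * q**(n - 52)

# 4b) normal approximation Y ~ N(np, npq) with continuity correction
mean = n * p            # 45.76
variance = n * p * q    # 25.6256
y = NormalDist(mu=mean, sigma=math.sqrt(variance))
p_approx = y.cdf(50.5) - y.cdf(39.5)   # P(39.5 <= Y <= 50.5)
```

Standardising to Z by hand and using tables should agree with `p_approx` to within rounding.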

Question 5

5a) Hint 1: think about whether the data presented is paired, or not.

5b) Hint 2: you should have noticed that it is paired data, so one set can be subtracted from the other to obtain the 'difference'

5b) Hint 3: state your null hypothesis in terms of the mean of the difference being equal to zero

5b) Hint 4: recognise that we do not have the population standard deviation, so it will need to be calculated from the data set

5b) Hint 5: as a consequence, we are now going to use a t₈ distribution (a t-distribution with 8 degrees of freedom), and not a z distribution

5b) Hint 6: at the conclusion, note that we are performing a two-tailed test, so use the correct critical value, or p-value

5b) Hint 7: remember to give final conclusion in terms of the context of the problem
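
The paired t-test calculation can be sketched as below. The nine pairs of values are invented purely for illustration (nine pairs give the 8 degrees of freedom mentioned in Hint 5), and the function name is hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(first, second):
    """t statistic for H0: mean difference = 0, with n - 1 degrees of freedom."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    # stdev() is the SAMPLE standard deviation (divisor n - 1)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical paired data, NOT the data from the question.
first = [10, 12, 9, 11, 13, 10, 12, 11, 10]
second = [12, 13, 9, 14, 13, 11, 15, 12, 11]
t_stat, df = paired_t_statistic(first, second)
```

The value of `t_stat` is then compared with the two-tailed critical value for `df` degrees of freedom.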

Question 6

6a) Hint 1: use page 5 of the Statistical Formulae and Tables booklet to remind you of what E(εi) and V(εi) should be.

6a) Hint 2: know that we need the residual points to be randomly scattered, with constant variance, around zero on the residual plot

6a) Hint 3: comment on whether the shape of the dots on the residual plot meets our expectations for a good model

6b) Hint 4: proceed with the standard process to find out the parameters of a least squares regression line, using the formulae on page 5 of the Statistical Formulae and Tables booklet

6b) Hint 5: use the letter 'w' instead of the letter 'y' throughout your calculations

6b) Hint 6: after obtaining w = 1.80624 - 0.013982x, know that we shall have to substitute in a value for x.

6b) Hint 7: notice that x is not 1927, but rather 1927 - 1840 which equals 87

6b) Hint 8: once you have the value for w, this needs to be converted to a value of y, using the information about logarithms
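
The substitution step in 6b) can be sketched as follows, using the fitted line from Hint 6. The conversion back to y depends on which logarithm the question used; base 10 is shown only as an assumption and should be replaced with whatever the question states.

```python
def predict_w(year, intercept=1.80624, slope=-0.013982, base_year=1840):
    """Evaluate w = 1.80624 - 0.013982x, where x = year - 1840."""
    x = year - base_year
    return intercept + slope * x

def back_transform(w, log_base):
    """Undo w = log_base(y). The base must come from the question."""
    return log_base ** w

w = predict_w(1927)                   # x = 87
y_if_base_10 = back_transform(w, 10)  # ASSUMPTION: logarithms to base 10
```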

Question 7

7a) Hint 1: consider re-writing X/n as (¹/n)X to emphasise that X is being multiplied by a constant

7a) Hint 2: know that E(aX + b) = aE(X) + b, and that here a = ¹/n and b = 0

7a) Hint 3: know that V(aX + b) = a²V(X), and that here a = ¹/n and b = 0

7b) Hint 4: know that p̂ = 14 ÷ 50 = 0.28

7b) Hint 5: to work from first principles, define X to have a binomial distribution

7b) Hint 6: then approximate X with a normal distribution

7b) Hint 7: check that np > 5 and nq > 5 for this approximation to be valid for the method about to be used

7b) Hint 8: for completeness, now divide the normal approximation by 50 to obtain a random variable for the proportion of successes

7b) Hint 9: construct the confidence interval using p̂ ± z0.995 √(p̂q̂/50)
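
The interval in Hint 9 can be checked numerically with a short sketch. It uses the values from Hint 4 and takes z0.995 from the standard normal distribution rather than from tables.

```python
import math
from statistics import NormalDist

successes, n = 14, 50
p_hat = successes / n   # 0.28
q_hat = 1 - p_hat       # 0.72

# Validity check from Hint 7, using the sample estimates
approximation_ok = n * p_hat > 5 and n * q_hat > 5

z = NormalDist().inv_cdf(0.995)   # about 2.576
half_width = z * math.sqrt(p_hat * q_hat / n)
low, high = p_hat - half_width, p_hat + half_width
```

This gives an interval of roughly 0.116 to 0.444.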

Question 8

8a) Hint 1: know that P(spin 4 and then goldfish) = P(spin 4) × P(goldfish | card number 4)

8b)i) Hint 2: this can be read off from the first row of the table

8b)ii) Hint 3: know that losing the game comes from revealing a shark

8b)ii) Hint 4: notice that revealing a shark can only come from spinning either a 1 or a 4

8b)ii) Hint 5: calculate these two ways of revealing a shark, in a similar manner to that done for part (a)

8b)iii) Hint 6: use the formula P(A | B) = P( A ∩ B) / P(B), where A = 'spin a 1' and B = 'lose the game'

8b)iii) Hint 7: use the values calculated for b)ii) to help evaluate this formula
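
The structure of the 8b) calculation is sketched below. All of the probabilities are hypothetical placeholders, because the real values come from the table in the question; only the method (multiply along each route, add the routes, then divide) follows the hints.

```python
from fractions import Fraction

# HYPOTHETICAL probabilities, NOT the values from the question's table.
p_spin = {1: Fraction(1, 4), 4: Fraction(1, 4)}
p_shark_given_spin = {1: Fraction(1, 2), 4: Fraction(1, 3)}

# 8b)ii) P(lose) = sum over the routes that can reveal a shark
p_spin_and_shark = {s: p_spin[s] * p_shark_given_spin[s] for s in p_spin}
p_lose = sum(p_spin_and_shark.values())

# 8b)iii) P(A | B) = P(A and B) / P(B), with A = 'spin a 1', B = 'lose the game'
p_spin1_given_lose = p_spin_and_shark[1] / p_lose
```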

Question 9

9a) Hint 1: think about why you would ever want to use the Central Limit Theorem - what does it deliver, which you don't already know?

9a) Hint 2: the CLT is used when the population distribution is not known...

9a) Hint 3: ... as the distribution of the sample mean is then stated to be approximately normal ...

9a) Hint 4: ... with the mean parameters being equal, and the variances different by a factor of 1/n

9b) Hint 5: know that using a z-test requires knowing the variance of the normal distribution

9b) Hint 6: at the outset, we do not know the variance, but we have a large sample from which it could be estimated.

9b) Hint 7: so the 'further assumption' will be that the (large) sample gives a good estimate for the population variance

9b) Hint 8: proceed with quoting the CLT to give the distribution of the sample mean

9b) Hint 9: proceed with calculating a test statistic, and either calculate a p-value, or compare to a critical value

9b) Hint 10: clearly communicate your conclusion, citing the context of the problem.
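
A sketch of the 9b) calculation is shown below. The sample values and hypothesised mean are hypothetical, and a two-tailed p-value is assumed only for illustration; the question decides whether the test is one- or two-tailed.

```python
import math
from statistics import NormalDist

def z_statistic(sample_mean, mu_0, sample_sd, n):
    """z = (sample mean - mu_0) / (s / sqrt(n)), with s estimating sigma (large n)."""
    return (sample_mean - mu_0) / (sample_sd / math.sqrt(n))

# Hypothetical values, NOT taken from the question.
z = z_statistic(sample_mean=52.0, mu_0=50.0, sample_sd=8.0, n=64)
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))   # ASSUMPTION: two-tailed test
```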

Question 10

10a) Hint 1: recognise that this is a hypothesis test on ρ, the population correlation coefficient

10a) Hint 2: calculate the value of r from Sxy, Sxx and Syy

10a) Hint 3: use the formulae from the Data Booklet to calculate the test statistic, t, using n and r

10a) Hint 4: the number of degrees of freedom is two less than the sample size (due to bivariate data)

10a) Hint 5: note that you are conducting a two-tailed test

10a) Hint 6: clearly communicate your conclusion, citing the context of the problem.

10a) Hint 7: there are several underlying assumptions that could be mentioned - think of the sample, and of the population distributions ...

10a) Hint 8: ... as with all samples, we'd expect them to be independent values

10a) Hint 9: ... and for a t-distribution to be used, there must be the assumption of normality early on in the process
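
The calculations in Hints 2 to 4 can be sketched as follows. The values of Sxy, Sxx, Syy and n are hypothetical, and the test statistic is written in the standard form t = r√((n - 2)/(1 - r²)); check it against the Data Booklet.

```python
import math

def correlation_t_test(sxy, sxx, syy, n):
    """Return (r, t, df) for testing H0: rho = 0."""
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1 - r**2))
    return r, t, n - 2

# Hypothetical summary values, NOT the ones from the question.
r, t_stat, df = correlation_t_test(sxy=30, sxx=50, syy=50, n=18)
```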

10b) Hint 10: consider what else may have caused someone to die ...

10b) Hint 11: ... and just because a correlation exists between two events, it does not mean that one event caused the other

Question 11

11a)i) Hint 1: decide on the stems, and on the leaves, given the size of all of the values

11a)i) Hint 2: remember to order both sets of the leaves so that the smallest values are nearest the stem

11a)i) Hint 3: remember to include titles for each side and a key to explain that, say, 2 | 8 = 2.8

11a)ii) Hint 4: state your null hypothesis in terms of the population medians being equal

11a)ii) Hint 5: decide whether the alternative hypothesis is one, or two tailed, based on the phrasing of the question

11a)ii) Hint 6: note that we are given the rank sum (of 89) so we don't need to manually rank all the values

11a)ii) Hint 7: read off the critical value for a two tailed test carefully from the Data Booklet

11a)ii) Hint 8: clearly communicate your conclusion, citing the context of the problem.

11b) Hint 9: if A = adult reaction time, and J = juvenile reaction time, then decide whether we want A to be numerically more, or less, than J ...

11b) Hint 10: ... we want P(A > J) ...

11b) Hint 11: ... but this involves two random variables, so we need to change 'A > J' into 'A - J > 0'

11b) Hint 12: and define a new random variable, D = A - J, and determine D's distribution and its parameters

11b) Hint 13: know that D's variance will be V(D) = V(A - J) = V(A) + V(J) ... The variances are added, not subtracted.

11b) Hint 14: calculate P(D > 0)
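
The 11b) steps can be sketched in Python as below. The means and standard deviations are hypothetical, and A and J are assumed to be independent and normally distributed, as the hints imply.

```python
import math
from statistics import NormalDist

# HYPOTHETICAL parameters (seconds), NOT the values from the question.
mean_a, sd_a = 0.25, 0.03   # adult reaction time A
mean_j, sd_j = 0.21, 0.04   # juvenile reaction time J

# D = A - J: means subtract, variances ADD
mean_d = mean_a - mean_j
sd_d = math.sqrt(sd_a**2 + sd_j**2)

d = NormalDist(mu=mean_d, sigma=sd_d)
p_d_positive = 1 - d.cdf(0)   # P(D > 0) = P(A > J)
```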
