BS2506 INFERENTIAL STATISTICS, STATISTICAL MODELLING & SURVEY METHODS 2017/2018
SPRING
1. a) Explain the advantages and disadvantages of non-parametric statistical tests.
(10 marks)
b) A random sample of 500 adults was questioned regarding their political affiliation and opinion on a tax reform bill. The responses are shown in the following contingency table:
| | Favour | Indifferent | Opposed |
|---|---|---|---|
| Labour | 138 | 83 | 64 |
| Conservative | 64 | 67 | 84 |
(i) Using α = 0.01, test whether there is any evidence that political affiliation and opinion on the tax reform bill are associated.
(ii) Calculate Cramér's contingency coefficient and interpret it.
(iii) Construct a 99% confidence interval estimate for the percentage of people who are in favour of the tax reform bill and interpret its meaning.
(23 marks)
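As a rough numerical check on part (b), the three calculations can be sketched in Python. This is only one possible working, not a model answer: it assumes the coefficient in (ii) is Cramér's V, uses the normal approximation for the interval in (iii), and hard-codes the critical value 9.210 from standard chi-square tables (α = 0.01, 2 degrees of freedom). All variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Observed counts from the contingency table
# (rows: Labour, Conservative; columns: Favour, Indifferent, Opposed).
observed = [[138, 83, 64],
            [64, 67, 84]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# (i) Chi-square statistic: sum of (O - E)^2 / E,
# with E = row total * column total / n under independence.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
CHI2_CRIT = 9.210  # assumed table value for alpha = 0.01, df = 2
reject_independence = chi2 > CHI2_CRIT

# (ii) Cramér's V (assumed to be the coefficient the question intends).
smaller_dim = min(len(observed) - 1, len(observed[0]) - 1)
cramers_v = sqrt(chi2 / (n * smaller_dim))

# (iii) 99% normal-approximation interval for the proportion in favour.
p_hat = col_totals[0] / n
z = NormalDist().inv_cdf(0.995)
margin = z * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - margin, p_hat + margin)

print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {reject_independence}")
print(f"Cramér's V = {cramers_v:.3f}")
print(f"99% CI for proportion in favour: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The interpretation of each figure (strength of association, meaning of the interval) still has to be written out in words, as the question asks.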
2 a) i) What is the nature of multicollinearity and what are its practical consequences?
ii) How can you detect and deal with multicollinearity?
(11 marks)
b) Your company has 20 retail outlets across Britain selling a similar range of products. Using last year’s data, a regression equation was developed relating Sales (in £10,000) to three independent variables. These variables are:
X1: Floor space of the outlet (in sq. metres)
X2: Size of the population in the catchment area (in thousands)
X3: 1 if the store is situated on a prime site, 0 otherwise
Part of the regression results obtained is shown below.
Variables in equation

| Variable | β̂ | SE(β̂) |
|---|---|---|
| Constant | 16.39 | 2.635 |
| X1 | 0.1751 | 0.0467 |
| X2 | 0.2069 | 0.0398 |
| X3 | 1.552 | 0.1829 |
For this model:
SSR = Regression Sum of Squares = 7500
SST = Total Sum of Squares = 8580
i) Use the above results to write the regression model and interpret the meaning of the slope coefficients.
ii) Explain what would happen if you added another location variable, X4 (X4 = 0 if the store is located on a prime site, 1 otherwise), to the model.
iii) Predict the sales of a new store with a floor space of 100 sq. metres, a catchment-area population of 75,000 and a prime site location.
(9 marks)
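The prediction in part iii) is a direct substitution into the fitted equation. A minimal sketch, assuming the units stated in the question (Sales in £10,000, population in thousands); the function name is illustrative:

```python
# Estimated coefficients from the regression output (Sales in units of £10,000).
b0, b1, b2, b3 = 16.39, 0.1751, 0.2069, 1.552

def predict_sales(floor_space_sqm, population_thousands, prime_site):
    """Fitted value of Sales (in £10,000) for one outlet."""
    return (b0 + b1 * floor_space_sqm
            + b2 * population_thousands
            + b3 * prime_site)

# 100 sq. metres, 75,000 people (= 75 thousand), prime site (X3 = 1).
sales = predict_sales(100, 75, 1)
print(f"Predicted sales: {sales:.4f} (x £10,000) = £{sales * 10_000:,.0f}")
```

Note that the population must be entered as 75, not 75,000, because X2 is measured in thousands.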
c) At the α = 0.01 level of significance,
i) Conduct a test to determine whether there is a significant relationship between sales and the three explanatory variables.
ii) Determine which of the explanatory variables have significant regression coefficients. Which variable(s) would you consider eliminating?
(13 marks)
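The test statistics for part c) follow from the sums of squares and standard errors given earlier. The sketch below is one way to compute them, assuming n = 20 outlets and k = 3 regressors; the critical values F(0.01; 3, 16) = 5.29 and t(0.005; 16) = 2.921 are hard-coded from standard tables rather than computed.

```python
n, k = 20, 3                 # outlets, explanatory variables
ssr, sst = 7500.0, 8580.0    # regression and total sums of squares
sse = sst - ssr              # residual sum of squares

# i) Overall F test: F = (SSR / k) / (SSE / (n - k - 1)).
f_stat = (ssr / k) / (sse / (n - k - 1))
F_CRIT = 5.29                # assumed table value, F(0.01; 3, 16)
model_significant = f_stat > F_CRIT

# ii) Individual t tests: t = coefficient / standard error.
coefficients = {"X1": (0.1751, 0.0467),
                "X2": (0.2069, 0.0398),
                "X3": (1.552, 0.1829)}
T_CRIT = 2.921               # assumed table value, two-sided, t(0.005; 16)
t_stats = {name: b / se for name, (b, se) in coefficients.items()}
significant = {name: abs(t) > T_CRIT for name, t in t_stats.items()}

print(f"F = {f_stat:.2f}, significant: {model_significant}")
for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}, significant: {significant[name]}")
```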
Q3 (a) (i) State the assumptions behind the classical linear regression model and explain briefly what each means and how to check them.
(ii) For the following model, outline the method you would use to estimate the parameters.
Y = 0 XX
(14 marks)
(b) A company has opened several outdoor ice-skating rinks and would like to know which factors affect attendance at the rinks. The manager believes that the following variables affect attendance.
X1: Temperature
X2: Wind speed
X3: 1 if weekend, 0 otherwise
X4: X1 × X2
The following least squares regression equation was obtained from 30 days of data:
Ŷ = 250 + 4.8X1 - 30X2 + 1.3X3 + 35X4,  R² = 0.72  (Model 1)
i) What is the predicted attendance on a weekend if the temperature is 28 degrees Fahrenheit and wind speed is 12 miles per hour?
ii) At the 5% level of significance, test to determine whether Model 1 is significant.
iii) The coefficient of determination for the model which involves only the independent variables X1 and X2 is 0.52. Do the variables X3 and X4 in Model 1 contribute significantly to predicting the variation in attendance? Use a 5% significance level.
iv) Compute the adjusted coefficient of determination for Model 1. Explain the difference between R² and the adjusted R².
(19 marks)
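The four calculations for Model 1 can be sketched as below. This assumes that X4 is the product of X1 and X2, that n = 30 and k = 4, and that the critical values F(0.05; 4, 25) = 2.76 and F(0.05; 2, 25) = 3.39 are read from standard tables. Coefficients are used exactly as printed in the question.

```python
n, k = 30, 4
r2_full = 0.72      # Model 1 (X1, X2, X3, X4)
r2_reduced = 0.52   # model with X1 and X2 only

# i) Prediction for a weekend at 28 degrees Fahrenheit and 12 mph wind.
x1, x2, x3 = 28, 12, 1
x4 = x1 * x2        # assumed interaction term
attendance = 250 + 4.8 * x1 - 30 * x2 + 1.3 * x3 + 35 * x4

# ii) Overall F test from R^2: F = (R^2 / k) / ((1 - R^2) / (n - k - 1)).
f_overall = (r2_full / k) / ((1 - r2_full) / (n - k - 1))
F_CRIT_4_25 = 2.76  # assumed table value
overall_significant = f_overall > F_CRIT_4_25

# iii) Partial F test for the q = 2 added variables X3 and X4.
q = 2
f_partial = ((r2_full - r2_reduced) / q) / ((1 - r2_full) / (n - k - 1))
F_CRIT_2_25 = 3.39  # assumed table value
added_significant = f_partial > F_CRIT_2_25

# iv) Adjusted R^2 penalises the number of regressors.
adj_r2 = 1 - (1 - r2_full) * (n - 1) / (n - k - 1)

print(f"Predicted attendance: {attendance:.1f}")
print(f"Overall F = {f_overall:.2f}, significant: {overall_significant}")
print(f"Partial F = {f_partial:.2f}, significant: {added_significant}")
print(f"Adjusted R^2 = {adj_r2:.4f}")
```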
Q4 a) Propose a time series regression model for quarterly data (with 20 observations) that will account for both the linear trend and seasonal variations in the data.
From your proposed model, write down the forecast for each quarter of Year 6.
Explain, with the help of a diagram, what the quarterly dummy variables do.
(12 marks)
b) The following data represent the annual revenues (in billions of pounds) of a company over the past 20 years.
| Year | Revenues |
|---|---|
| 1 | 5.2 |
| 2 | 4.3 |
| 3 | 5.0 |
| 4 | 6.0 |
| 5 | 7.1 |
| 6 | 8.2 |
| 7 | 12.7 |
| 8 | 15.1 |
| 9 | 17.8 |
| 10 | 20.1 |
| 11 | 21.0 |
| 12 | 22.4 |
| 13 | 25.0 |
| 14 | 34.3 |
| 15 | 34.6 |
| 16 | 35.6 |
| 17 | 37.1 |
| 18 | 40.1 |
| 19 | 45.1 |
| 20 | 47.2 |
Forecast the revenues for the next two years using:
i) A moving average with K = 3
ii) Exponential smoothing with α = 1
iii) The Holt model with α = 0.9 and γ = 0.5 (you may use L19 = 44.80, T19 = 3.55)
iv) The estimated linear trend equation
Yt = -2.97 + 2.4t, t = 1, 2, ..., 20
v) The estimated quadratic trend model
Yt = 1.32 + 1.22t + 0.0558t², t = 1, 2, ..., 20
(14 marks)
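The five forecasts for years 21 and 22 can be sketched as follows. Several conventions here are assumptions rather than anything the question fixes: the moving-average and exponential-smoothing forecasts are held flat over both years, the smoothed series is initialised at the first observation, and the second Holt constant, 0.5, is treated as the trend smoothing constant in the usual level and trend update equations.

```python
revenues = [5.2, 4.3, 5.0, 6.0, 7.1, 8.2, 12.7, 15.1, 17.8, 20.1,
            21.0, 22.4, 25.0, 34.3, 34.6, 35.6, 37.1, 40.1, 45.1, 47.2]

# i) Moving average, K = 3: mean of the last three observations,
#    used for both future years (assumed flat-forecast convention).
ma = sum(revenues[-3:]) / 3
ma_forecasts = [ma, ma]

# ii) Exponential smoothing: S_t = a * Y_t + (1 - a) * S_(t-1), S_1 = Y_1.
alpha_es = 1.0
smoothed = revenues[0]
for y in revenues[1:]:
    smoothed = alpha_es * y + (1 - alpha_es) * smoothed
es_forecasts = [smoothed, smoothed]

# iii) Holt: update level and trend for year 20, then project h steps ahead.
alpha, gamma = 0.9, 0.5
level19, trend19 = 44.80, 3.55
level20 = alpha * revenues[-1] + (1 - alpha) * (level19 + trend19)
trend20 = gamma * (level20 - level19) + (1 - gamma) * trend19
holt_forecasts = [level20 + h * trend20 for h in (1, 2)]

# iv) and v) Trend equations evaluated at t = 21 and t = 22.
linear_forecasts = [-2.97 + 2.4 * t for t in (21, 22)]
quadratic_forecasts = [1.32 + 1.22 * t + 0.0558 * t ** 2 for t in (21, 22)]

for name, f in [("Moving average", ma_forecasts),
                ("Exponential smoothing", es_forecasts),
                ("Holt", holt_forecasts),
                ("Linear trend", linear_forecasts),
                ("Quadratic trend", quadratic_forecasts)]:
    print(f"{name}: year 21 = {f[0]:.2f}, year 22 = {f[1]:.2f}")
```

With α = 1 the exponential smoothing forecast reduces to the most recent observation, so it carries no smoothing at all.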
c) For the five forecasting methods in (b), the respective mean absolute errors (MAE) are:
MAE (Moving average) = 4.76
MAE (Exponential smoothing) = 2.48
MAE (Holt) = 1.8
MAE (Linear Trend Model) = 1.76
MAE (Quadratic trend model) = 1.36
Which of the five methods would you select for the purpose of forecasting? Discuss.
(7 marks)
Q5 a) Give a brief account of the sources of error which can affect the survey process from survey design through to presentation of results.
(16 marks)
b) Explain the cluster sampling method. Discuss two advantages of this technique.
(17 marks)
2022-08-25