MSCI212 Statistical Methods for Business 2022 EXAMINATIONS PART II
2022 EXAMINATIONS
PART II (Second, Third and Final Year)
MANAGEMENT SCIENCE
MSCI212 Statistical Methods for Business
Question 1
A business think-tank in the North-West region are concerned about the number of failures of small to medium enterprises (SMEs) during the Covid pandemic. In 2019, before the pandemic, the mean number of SMEs failing per week was 5.
(a) Which probability distribution would you use to describe the number of SME failures in a given week in 2019? Clearly state any assumptions you must make. (4 marks)
(b) What is the probability that during a given week in 2019, there were fewer than 4 SME failures? (4 marks)
(c) What is the probability that during a given month in 2019 (i.e., a four-week period) there were more than 30 SME failures? (4 marks)
(d) Give an example where one of the assumptions for your choice of probability distribution in part (a) may not strictly hold. (3 marks)
(e) In the last 27 weeks of 2019, the number of SME failures in each week was recorded. The sample mean number of failures per week was 4.9 with a sample standard deviation of 2.2. Data have also been collected over the last 27 weeks of 2020, which was during the first year of the Covid pandemic. The sample mean number of failures per week was 6.5 with a sample standard deviation of 2.6.
Carry out an appropriate test to see whether there has been a significant increase in SME failures from 2019 to 2020. Justify your choice of test, stating clearly any assumptions that you have made. Use a 1% significance level and state your conclusion clearly. (10 marks)
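The calculations in parts (b), (c) and (e) can be sketched in Python. This is only an illustrative outline, not a model answer: it assumes a Poisson model for weekly failures (so that, with independent weeks, the four-week total is Poisson with mean 20) and, for part (e), a pooled-variance two-sample t-test with equal population variances. The helper name `poisson_cdf` and all variable names are hypothetical.

```python
import math

def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean), summed term by term."""
    return sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1))

# Part (b): fewer than 4 failures in a week means X <= 3 (assumed Poisson, mean 5).
weekly_mean = 5
p_fewer_than_4 = poisson_cdf(3, weekly_mean)

# Part (c): assuming independent weeks, the four-week total is Poisson(20).
monthly_mean = 4 * weekly_mean
p_more_than_30 = 1 - poisson_cdf(30, monthly_mean)

# Part (e): pooled two-sample t statistic (assumes equal variances, independent samples).
n_2019, mean_2019, sd_2019 = 27, 4.9, 2.2
n_2020, mean_2020, sd_2020 = 27, 6.5, 2.6
df = n_2019 + n_2020 - 2
pooled_var = ((n_2019 - 1) * sd_2019**2 + (n_2020 - 1) * sd_2020**2) / df
std_error = math.sqrt(pooled_var * (1 / n_2019 + 1 / n_2020))
t_stat = (mean_2020 - mean_2019) / std_error

print(f"P(X < 4)  = {p_fewer_than_4:.4f}")
print(f"P(X > 30) = {p_more_than_30:.4f}")
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The t statistic would then be compared with the one-sided 1% critical value of the t distribution on 52 degrees of freedom, taken from statistical tables.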
Question 2
At the turn of the year, a UK consumer website published an article highlighting the impact of the global supply chain crisis on the UK car market. A shortage of computer chips for new cars has reduced production and, as a consequence, increased demand for second-hand cars. In the article, it is claimed that the mean price of a second-hand car in November 2020 has increased by more than 25% of the population mean price of £13,504 recorded in November 2019. Second-hand car prices are assumed to follow a normal distribution with a known standard deviation of £800. Data were collected from a sample of 100 second-hand cars sold in November 2020, and the sample mean price was found to be £17,000.
(a) Calculate a 95% confidence interval for the mean second-hand car price in November 2020, justifying your choice of calculation method. Conclude whether there has been a statistically significant increase over the mean price in November 2019. (5 marks)
(b) Propose and carry out a test at the 5% significance level to see whether the mean second-hand car price in November 2020 has increased by more than 25% over the mean price in November 2019. Justify your choice of test and state your conclusion clearly. (8 marks)
(c) Show that the power of the statistical test in part (b) is more than 50% if the true increase over the mean price in November 2019 is 26%. (8 marks)
(d) What should the sample size be for the statistical test in part (c) to achieve a test power of at least 90%? (4 marks)
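The arithmetic behind parts (a) to (d) can be sketched as follows. This is an illustrative outline under stated assumptions rather than a model answer: it assumes a z-based interval and a one-sided z-test (because the population standard deviation is given as known), with the null value taken as 125% of the 2019 mean and the alternative in part (c) taken as 126% of it. All variable names are hypothetical.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()  # standard normal, mean 0 and sd 1

sigma, n, sample_mean = 800, 100, 17000
mean_2019 = 13504
std_error = sigma / math.sqrt(n)

# Part (a): two-sided 95% z-interval for the November 2020 mean.
z_975 = std_normal.inv_cdf(0.975)
ci_low = sample_mean - z_975 * std_error
ci_high = sample_mean + z_975 * std_error

# Part (b): one-sided z-test of H0: mu = 1.25 * 13504 against H1: mu > 1.25 * 13504.
mu_null = 1.25 * mean_2019
z_stat = (sample_mean - mu_null) / std_error
z_crit = std_normal.inv_cdf(0.95)
reject_h0 = z_stat > z_crit

# Part (c): power when the true mean is 1.26 * 13504.
mu_true = 1.26 * mean_2019
rejection_threshold = mu_null + z_crit * std_error  # reject H0 if sample mean exceeds this
power = 1 - std_normal.cdf((rejection_threshold - mu_true) / std_error)

# Part (d): smallest n giving at least 90% power at the same alternative.
z_power = std_normal.inv_cdf(0.90)
n_required = math.ceil(((z_crit + z_power) * sigma / (mu_true - mu_null)) ** 2)

print(f"95% CI: ({ci_low:.1f}, {ci_high:.1f})")
print(f"z = {z_stat:.2f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
print(f"power = {power:.3f}, required n = {n_required}")
```

The sample-size step uses the usual one-sided formula n = ((z_alpha + z_beta) * sigma / delta)^2, rounded up, where delta is the gap between the null and alternative means.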
Question 3
A games developer claims that the average amount of time a player of a new MMORPG (Massively Multiplayer Online Roleplaying Game) spends in the online game world during a single session is 40 minutes. It has been suggested by the developer that the time spent by players in a single session of the game follows an exponential distribution.
(a) On the basis of the information provided by the games developer, write down the cumulative probability function for the time spent by players in a single session of the game. (2 marks)
(b) What is the probability that a player will spend less than 25 minutes in the game world during a single session of the game? (4 marks)
(c) What is the probability that a player will spend between 50 and 75 minutes in the game world during a single session of the game? (4 marks)
(d) A computer games website have decided to investigate the developer’s suggestion that single session playing durations follow an exponential distribution. Data have been collected on the duration of a single gaming session from a random sample of 100 players of the game and are summarised below:
Duration of a single session | Number of observed players
Less than 25 minutes | 31
Between 25 and 50 minutes | 32
Between 50 and 75 minutes | 19
Between 75 and 100 minutes | 11
Between 100 and 125 minutes | 4
More than 125 minutes | 3
Using the information provided, test the developer’s suggestion that single session playing durations follow an exponential distribution with a mean of 40 minutes at the 5% level of significance. (15 marks)
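The exponential probabilities and the goodness-of-fit statistic can be sketched as below. This is an illustrative outline, not a model answer. It assumes the cumulative function F(t) = 1 - exp(-t/40), and it merges the last two duration classes into "more than 100 minutes" because their expected counts under this model fall below 5, which is a common rule of thumb rather than something the question specifies. All names are hypothetical.

```python
import math

mean_duration = 40  # minutes, as claimed by the developer

def exp_cdf(t):
    """P(T <= t) for an exponential distribution with mean 40 minutes."""
    return 1 - math.exp(-t / mean_duration)

# Parts (b) and (c).
p_under_25 = exp_cdf(25)
p_50_to_75 = exp_cdf(75) - exp_cdf(50)

# Part (d): chi-square goodness of fit with the last two classes merged (assumption).
edges = [0, 25, 50, 75, 100, math.inf]
observed = [31, 32, 19, 11, 4 + 3]
n_players = sum(observed)
expected = [n_players * (exp_cdf(hi) - exp_cdf(lo)) for lo, hi in zip(edges, edges[1:])]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # no parameters estimated: the mean of 40 is specified in advance

print(f"P(T < 25)      = {p_under_25:.4f}")
print(f"P(50 < T < 75) = {p_50_to_75:.4f}")
print("expected counts:", [round(e, 2) for e in expected])
print(f"chi-square = {chi_sq:.2f} on {df} degrees of freedom")
```

The statistic would be compared with the 5% critical value of the chi-square distribution on 4 degrees of freedom (9.488 in standard tables).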
Question 4
An article is being produced for a business magazine looking into excessive alcohol consumption amongst people in managerial roles in the UK. The UK Chief Medical Officers’ advice is to not drink regularly more than 14 units of alcohol per week, to keep health risks from drinking alcohol to a low level. A recent study suggests that, in the UK, the percentage of the adult population who regularly drink more than 14 units of alcohol per week is 20%.
(a) If a random sample of 20 adults has been taken, what probability distribution describes the number of adults in the sample that regularly drink more than 14 units of alcohol per week? Justify your answer. (3 marks)
(b) If a random sample of 20 adults has been taken, what is the probability that fewer than 5 adults in the sample regularly drink more than 14 units of alcohol per week? (3 marks)
(c) If a random sample of 500 adults has been taken, what is the probability that between 75 and 100 adults (inclusive of the end-points) in the sample regularly drink more than 14 units of alcohol per week? (Hint: Use normal approximation.) (4 marks)
(d) For the article, the magazine have taken a sample of 500 adults in the UK. They would like you to test whether alcohol consumption is independent of job role. Perform an appropriate test at a 5% level of significance. The aggregated responses from each adult have been recorded in the table below, with job role coded as “management” or “other” and alcohol consumption coded as “≤ 14 units” or “> 14 units”.
Job role | ≤ 14 units | > 14 units
Management | 40 | 18
Other | 358 | 84
(e) Using the data and the result of the test from part (d), explain the outcome of the test in a way that would be appropriate for the general readership of the magazine (i.e., in layman’s terms). (5 marks)
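The numerical parts of this question can be sketched in Python. This is an illustrative outline under stated assumptions, not a model answer: it assumes a Binomial(20, 0.2) model for part (b), a normal approximation with continuity correction for part (c), and a Pearson chi-square test of independence without Yates' correction for part (d). All variable names are hypothetical.

```python
import math
from statistics import NormalDist

p = 0.2  # proportion regularly drinking more than 14 units per week

# Part (b): fewer than 5 out of 20 means X <= 4 under Binomial(20, 0.2).
p_fewer_than_5 = sum(math.comb(20, k) * p**k * (1 - p) ** (20 - k) for k in range(5))

# Part (c): normal approximation to Binomial(500, 0.2) with continuity correction.
n = 500
approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
p_75_to_100 = approx.cdf(100.5) - approx.cdf(74.5)

# Part (d): Pearson chi-square test of independence on the 2 x 2 table.
observed = [[40, 18],    # Management: <= 14 units, > 14 units
            [358, 84]]   # Other:      <= 14 units, > 14 units
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / grand_total  # expected count under independence
        chi_sq += (obs - exp) ** 2 / exp
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(f"P(X < 5)          = {p_fewer_than_5:.4f}")
print(f"P(75 <= X <= 100) = {p_75_to_100:.4f}")
print(f"chi-square = {chi_sq:.3f} on {df} degree of freedom")
```

The statistic would be compared with the 5% critical value of the chi-square distribution on 1 degree of freedom (3.841 in standard tables). Applying Yates' continuity correction would give a smaller statistic, so the choice should be stated when reporting the result.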
Question 5
The HR department in an organisation is looking to model the relationship between employees’ current salaries and data about the employees from when they first joined the organisation. The following data have been collected on a random sample of 100 employees in the organisation.
Salary: Current salary at the organisation (£)
Starting: Starting salary at the organisation (£)
Experience: Previous experience when joining (in months)
Education: Total years of education
a) Explain what type of relationship, if any, MIGHT exist between Salary and each of the other 3 variables. You must provide a reason to support your response for each of the three potential relationships. Your responses should not be based on (or be influenced by) the SPSS output at the end of the question. (3 marks)
b) Explain what type of relationship, if any, MIGHT exist between any pair of the three explanatory variables. You must provide a reason to support your response for each of the three potential relationships. Your responses should not be based on (or be influenced by) the SPSS output at the end of the question. (3 marks)
c) To what extent do the scatterplots, labelled Scatterplots at the end of the question, support your ideas in parts (a) and (b)? (3 marks)
d) To what extent do the correlations, labelled Correlations at the end of the question, support your ideas in parts (a) and (b)? (3 marks)
e) The results of running a STEPWISE regression analysis starting with “no variables” and starting with “all variables” result in the same recommended model. The SPSS STEPWISE output for the “All variables” model is shown at the end of the question (labelled STEPWISE from “All-in” model). State clearly your preferred regression model (Model 1 or Model 2) including the error term. Justify your decision for choosing your preferred model over the non-preferred model. For each of the explanatory variables included in your preferred model, explain its presence in the light of your answers to parts (a) to (d). (8 marks)
f) Using the SPSS output at the end of the question (Model 1 Residual Analysis and Model 2 Residual Analysis), carry out a residuals analysis to check whether or not the usual regression assumptions seem to hold for your preferred model. Carefully justify your conclusions. (5 marks)
2023-05-18