Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

Exam 2 review - STAT 1000/1100

Regression

Problem 1:

An elderly individual has a fractured hip and is facing surgery to repair it.  The orthopedic surgeon explains the risks to the patient.  Infection occurs in 2% of such operations, the repair fails in 10%, and both infection and failure together in1.5%.  (Use a Venn Diagram to model this problem.)

What is the probability that the operation succeeds and is free from infection?

a.0.785

b. 0.80

c.0.895

d. 0.81

e. 0.865

problem 2

One of the common tests for tuberculosis (TB) is a skin test where a substance is injected into a subject’s  armand we see if a rash develops in a few days. The test is relatively accurate, but in a few cases a rash    might be detected even when the subject does not have TB (afalse positive) or the rash may not be seen even when the subject has TB (afalse negative).  The probability of a false positive is approximately 0.05  and the probability of a false negative is approximately 0.01.  In the U.S. only about 4 in 10,000 people have TB (an incidence rate of 0.0004).  Model this with a tree diagram.


Calculate the overall probability of a Positive Test result.

a.0.050376

b.0.9904

c.0.015374

d.0.52

e.0.003322

Problem 3:

What is the theoretical probability of rolling a sum of 10, 11 or 12 on these two 8-sided dice?  Hint:

Consider all the possible equally likely outcomes in the sample space.

a.

0.28125

b.

0.19

c.

0.375

d.

0.2375

e.

0.171875

Problem 3:


We would like to explore the relationship between unemployment and the education rates, using the data for 11 states. Let X variable be the percentage of people who attained a Bachelor Degree (Bachelor Degree %), and let Y variable be the ‘Unemployment Rate”

STATE

Bachelor Degree (%)

Unemployment Rate (%)

Vermont

36

2.9

Colorado

38.1

3.3

Utah

31.1

3.8

Minnesota

33.7

3.3

Kansas

31

3.5

Montana

29.5

4.1

Michigan

26.9

4.7

New Jersey

36.8

4.7

Ohio

26.1

4.7

Nevada

23

4.9

West Virginia

19.2

5.4

Enter the “Bachelor Degree (%)” data into L1 and “Unemployment Rate (%)” data into L2

i.          Fit a linear model to this data and store the equation in Y1.  Store the predictions in

L3 and the Residuals in L4.   What is the resulting equation?

Calculator Steps :

STAT  CALC  LinReg(a+bx)   XList: L1 ; YList: L2  ; Store ReqEq*: Y1   Calculate

*to store in Y1, do this: VARS  Y-VARS  FUNCTION  Y1          to store

predicted values: “L3=”  VARS  Y-VARS  FUNCTION  Y1 ”Y1(L1)”

a.) yˆ = 12.117 - 0.3735x

b.) yˆ = 7.2947 - 0.1054x

c.)   yˆ = 1.5498 – 1.6175x

d.)   yˆ = 20.785 - 0.1076x

ii.          How much of the variation in state unemployment rates can be explained when we use Bachelor Degree % as the predictor in our linear model?

a.)  77.3%

b.)   10.5%

c.) 59.9%

d.)  68%

e.)  97.5%

iii.        The state of West Virginia has the lowest percentage of people who completed a

Bachelor Degree.  What is our predicted unemployment rate, and what is the value of the residual for this state?

a.)   yˆ = 4.0%,  Residual = 0.21%

b.)  yˆ = 30.6%,  Residual = 2.96%

c.)   yˆ = 4.54%,  Residual = 0.16%

d.) yˆ = 5.27%,  Residual = 0.13%

iv.        Using a calculator, generate a scatter plot of Bachelor Degree % (your X value) versus RESIDUALS (we stored them in L4).  Do you see one unusually large residual?

Calculator Steps:

2nd   STAT PLOT  ON  TYPE (upper left)  Xlist: L1, Ylist: L4   ZOOM  9

Which of the two statements below correctly describes this unusual value?  (use “trace”)

a. Predicted Unemployment rate in New Jersey is much higher than the actual unemployment rate b. Predicted Unemployment rate in New Jersey is much lower than the actual unemployment rate

c.   Predicted Unemployment rate in Ohio is much higher than the actual unemployment rate d.   Predicted Unemployment rate in Ohio is much lower than the actual unemployment rate

Problem 4:

The concentration (in milligrams per liter) of a medication in a patient’s blood as time passes is given by the data in the following table:

Time

(hours)

Concentration

(mg/l)

0

0

0.5

78.1

1

99.8

1.5

84.4

2

50.1

2.5

15.6

i.            Fit a quadratic model to this data with your calculator (X: Hours and Y:

Concentration).  What are the coefficients, and R-Square of your quadratic model?

a.)   yˆ = - .0053x2  254.21x + 2059, R-square = .949

b.)   yˆ = .0134x2 – 2567x + 11921, R-square = .841

c.) yˆ = - 56.214x2 + 139.31x + 9.35, R-square = .9218

d.) yˆ = 0.000478x2 – 1.82202x + 1737.34, R-square = .9775

ii.            Using your calculator to generate predictions (store them in L3) and the residuals (L4). Create a scatter plot: Hours(L1) vs Residuals(L4).  Which of the following describes the largest residual?

a.) Time: 0, y: 0 mg/l, yˆ : 9.35 mg/l, Residual: -9.35 mg/l

b.) Time: .5, y: 78.1 mg/l, yˆ : 64.953 mg/l, Residual: 13.147 mg/l

c.)  Time: .5, y: 64.953 mg/l, yˆ : 78.1 mg/l, Residual: 13.147mg/l

d.)  Time: 1, y: 99.8  mg/l, yˆ : 92.449 mg/l, Residual: 7.3514 mg/l

iii.          What is the predicted concentration of medicine in the blood after 2.25 hours have passed?

Hint: plugin 2.25 for X into the equation you stored in Y1

Problem 5 (Teen Birth Rate vs Poverty Level):

This dataset of size n = 51 are for the 50 states and the District of Columbia in the United States. The variables are x = year 2002 birthrate per 1000 females 15 to 17 years old andy = poverty rate, which is the percent of the state’s population living in households with incomes below the federally defined poverty level.

Regression Analysis: Brth15to17 versus   PovPct

The regression equation is

PovPct = 4.46 + 0.3883 Brth15to17

.      How much of the variation in poverty rates can be explained by state teen birthrate?

a.) 20.2%

b.) 53.33%

c.)  52.38%

d.)  0.5%

ii.         Connecticut in this data had poverty rate of 9.7, and teen birthrate of 14.1.

What is the predicted poverty rate and the residual for Connecticut?

a.) yˆ = 9.93%,  Residual = - 0.23%

b.)   yˆ = 8.13%,  Residual = 1.57%

c.)   yˆ = 10.93%,  Residual = - 1.23%

d.)  yˆ = 7.66%,     Residual = 2.04%

iii.  What do the residual plot and the R-sq value tell you about the linear model that was created for this data?

a.) The linear trend in the Residual plot indicates that the linear model was a correct     model for the data; however, teen birth rates are not good predictors of poverty rates.

b.) The lack of pattern in the Residual plot indicates that the linear model was an incorrect

model for the data and the R-Sq value tells us that teen birth rates are not good predictors of poverty rates.

c.) The lack of pattern in the Residual plot indicates that the linear model was a correct model for the data; however, teen birthrates may not be the best

predictors of poverty rates.

d.) The linear trend in the Residual plot indicates that the linear model was a correct model for the data and the R-Sq value supports that claim.

e.) The lack of pattern in the Residual plot indicates that the linear model was a correct

model for the data and the R-Sq value indicates a strong predictive relationship between x andy.

Problem 6:

Data for total New England Old Navy daily sales and New England March temperatures (F0) was used in Minitab to run linear regression.   The following regression equation was generated:   yˆ = 23.2 + 1.4x, where x = “degrees in Fahrenheit” andy = “dollar amount in thousands per day”

Which statement best interprets this regression equation:

a.)  Average daily sales decrease by $1,400 per one degree(F0) increase in temperature.  b.)  Average daily sales increase by $14,000 per one degree(F0) increase in temperature.

c.)  Average daily sales increase by $1,400 per one degree(F0) increase in temperature. d.)  Average daily sales increase by $(1,400+23,200) per one degree(F0) increase in temperature.

Problem 7:

Below is a scatterplot of the weight (lbs) of various cars vs. their Mileage(mpg)

 

A plausible value for the correlation between Weight(lbs) and Mileage(mpg) could be:

a.)   +0.75

b.)  - 0.75

c.)  +1.4

d.)  - 0.1

Problem 8:

A sociologist conducted a study to determine the relationship between years of education,x, and

annual income, y, (in thousands of dollars).  From the data, the following statistics were calculated:

 = 13.5                          Sx = 14.8                    = 57.286               Sy = 29.426                 T = 0.93

i. Use the following equations to create the regression equation for x: years of education and y: annual income

b =T (sy/Sx)    and   a =−b

y-hat = 32.31  + 1.85 x