Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

ASSIGNMENT 2

ECON494

Economic Data Analysis: SAS

Due: noon Monday Feb 6th 2023

Notes:

Please print your name and UofA ID on the submitted assignment.

Late submission of assignment is not accepted. No exceptions.

Please send the assignment directly to our TA

Total marks 100

In this assignment, we will prepare and analyze Transportation Security Administration (TSA)    Airport Claims data from 2002 through 2017. TSA is an agency of the United States Department of Homeland Security that has authority over the security of the traveling public. A claim is filed if you are injured or your property is lost or damaged during the screening process at an airport.

In this assignment, you need to:

    Prepare the data.

    Create one PDF report that analyzes the overall data as well as the data for a dynamically

specified state

The data you will use is TSAClaims2002_2017.csv, which was created from the following sources:

    TSA Airport Claims data from https://www.dhs.gov/tsa-claims-data

    FAA Airport Facilities data from

https://www.faa.gov/airports/airport_safety/airportdata_5010/.

Here is the variable information for the TSAClaims2002_2017.csv data file. Read it through carefully.

Column

Description

Claim_Number

Number for each claim. Some claims can have duplicate claim numbers but different               information for each claim. Those claims are    considered valid for this assignment.                Any duplicate rows should be removed from    the data.

Date_Received

Date the claim was received. Date_Received must occur after Incident_Date.

Range:

From 2002 through 2017

Incident_Date

Date the incident occurred. Incident_Date must occur before Date_Received.           Range:

From 2002 through 2017

Airport_Code

Airport code three-letter abbreviation.

Airport_Name

Full name of the airport.

Claim_Type

Category of the claim. If the claim is separated into two types by a slash, Claim_Type is the   first type.

For example: Personal Property Loss/Injury is considered Personal Property Loss.                Possible values (14):

• Bus Terminal

• Complaint

• Compliment

• Employee Loss (MPCECA)

• Missed Flight

• Motor Vehicle

• Not Provided

• Passenger Property Loss

Passenger Theft

• Personal Injury

• Property Damage

• Property Loss

• Unknown

• Wrongful Death

Claim_Site

Item_Category

Airport location of the claim.

Possible values (8):

• Bus Station

• Checked Baggage

• Checkpoint

• Motor Vehicle

• Not Provided

• Other

• Pre-Check

• Unknown

Type of items that have been filed in the claim. Depending on the year of the data, the             Item_Category values are input differently.       Because of varying consistency, you do not     need to clean this variable for the assignment.

Close_Amount                                                    The dollar amount a claim was closed for.

Disposition

The final settlement of the claim.

Possible values (10):

• *Insufficient

• Approve in Full

• Closed: Canceled

• Closed: Contractor Claim

• Deny

• In Review

• Pending Payment

• Received

• Settle

• Unknown

*Insufficient is the value from the raw data.

StateName

Associated airport state name (for example, NEW YORK).

Requirements:

Values should be in all proper case. (Original data is in all uppercase.)

Associated airport state code. This is the     standard two-letter abbreviation used by the post office for US states and territories (for  example, IL, PR, CQ).

Requirements:

Values should be in all uppercase.

County

Airport associated county (or parish) name (for example, Cook).

City

Associated airport city name (for example, CHICAGO).

Data work:

To prepare and analyze the data, youfollow thefollowing requirements (65 marks)

    Import the raw data file TSAClaims2002_2017.csv. (5 marks)

    The final data should be in the permanent library tsa, and the data set should be named

claims_cleaned. (5 marks)

    Entirely duplicated records need to be removed from the data set (5 marks)                      All missing and -“ values in the variables Claim_Type, Claim_Site, and Disposition

must be changed to Unknown . (5 marks)

    Values in the variables Claim_Type, Claim_Site, and Disposition must follow the

requirements in the data layout. (5 marks)

    All StateName values should be in proper case. (5 marks)

    All State values should be in uppercase. (5 marks)

    You create a new variable named Date_Issues with a value of Needs Review to indicate

that a row has a date problem. Date problems consist of the following: (5 marks)

o a missing value for Incident_Date or Date_Received

o an Incident_Date or Date_Received value out of the predefined year range of

2002 through 2017

o an Incident_Date value that occurs after the Date_Received value

    Remove the County and City variables. (5 marks)

    Currency should be permanently formatted with a dollar sign and include two decimal

points. (5 marks)

    All dates should be permanently formatted in the style 01JAN2000. (5 marks)               Permanent labels should be assigned to variables by replacing the underscores with a

space. (5 marks)

    Final data should be sorted in ascending order by Incident_Date. (5 marks) Analysis:

Analyze the overall data to answer the following questions. Be sure to add appropriate titles to your report:

Export a final PDF file (contain the following data analysis results) named ClaimReports that has a style of your choice (15 marks):

Thefinal single pdf report needs to befree of all observations with date issues and answer the following questions:

    How many date issues are in the overall data? (5 marks)

    How many claims per year of Incident_Date are in the overall data? Be sure to include a

plot. (5 marks)

    Present the following statistics for (California, Texas, Florida, and New York) and

answer the following for each state: (Note: pay attention to missing values for each state,

we are not presenting statistics for states that have missing values in the following variables. Note unknown” is not a missing value!) (5 marks)

o a. What are the frequency values for Claim_Type for the selected state?

o b. What are the frequency values for Claim_Site for the selected state?

o c. What are the frequency values for Disposition for the selected state?

o d. What is the mean, minimum, maximum, and sum of Close_Amount for the selected state? Round to the nearest integer.

    The following analysis is a separate task and should not be included in the same pdf

above. You can present the following analysis in a new report (for example, another separate pdf file). (20 marks)

    Merge your cleaned SAS data file with another excel file

“US_State_population_2020.xlsx” (Note: this excel file also needs to be loaded in SAS  and converted into a SAS data file). Name this new merged file “TSA_POP”: (10 marks)

o Rank (rank of state by population size in 2020)

o State (state name)

o Pop (population size in 2020)

o Growth (pop growth rate since last year)

o Pop2018 (size of population in person)

o Pop2010 (size of population by 2010 census)

o growthSince2010 (pop growth rate since 2010)

o Percent (pop size as a percent of the U.S.)

o Density (people / square miles)

    Combine with the above new population information by states, answer the following

questions and save your results: (10 marks)

o Can you argue with the available data, that the incidence of Passenger Property Loss” (this is one category in the variable Claim_Type”) is positively related to “population size” of the state where this claim was made? (5 marks)

o Alternatively, can you argue with the available data, that the incidence of             “Passenger Property Loss” (this is one category in the variable Claim_Type”) is positively related to population density” of the state where this claim was made? (5 marks)