ECON494 ASSIGNMENT 2 Economic Data Analysis: SAS
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
ASSIGNMENT 2
ECON494
Economic Data Analysis: SAS
Due: noon Monday Feb 6th 2023
Notes:
Please print your name and UofA ID on the submitted assignment.
Late submission of assignment is not accepted. No exceptions.
Please send the assignment directly to our TA
Total marks 100
In this assignment, we will prepare and analyze Transportation Security Administration (TSA) Airport Claims data from 2002 through 2017. TSA is an agency of the United States Department of Homeland Security that has authority over the security of the traveling public. A claim is filed if you are injured or your property is lost or damaged during the screening process at an airport.
In this assignment, you need to:
Prepare the data.
Create one PDF report that analyzes the overall data as well as the data for a dynamically
specified state
The data you will use is TSAClaims2002_2017.csv, which was created from the following sources:
TSA Airport Claims data from https://www.dhs.gov/tsa-claims-data
FAA Airport Facilities data from
https://www.faa.gov/airports/airport_safety/airportdata_5010/.
Here is the variable information for the TSAClaims2002_2017.csv data file. Read it through carefully.
Column |
Description |
Claim_Number |
Number for each claim. Some claims can have duplicate claim numbers but different information for each claim. Those claims are considered valid for this assignment. Any duplicate rows should be removed from the data. |
Date_Received |
Date the claim was received. Date_Received must occur after Incident_Date. Range: From 2002 through 2017 |
Incident_Date |
Date the incident occurred. Incident_Date must occur before Date_Received. Range: From 2002 through 2017 |
Airport_Code |
Airport code three-letter abbreviation. |
Airport_Name |
Full name of the airport. |
Claim_Type |
Category of the claim. If the claim is separated into two types by a slash, Claim_Type is the first type. For example: Personal Property Loss/Injury is considered Personal Property Loss. Possible values (14): • Bus Terminal • Complaint • Compliment • Employee Loss (MPCECA) • Missed Flight • Motor Vehicle • Not Provided • Passenger Property Loss • Passenger Theft • Personal Injury • Property Damage • Property Loss • Unknown • Wrongful Death |
Claim_Site
Item_Category
Airport location of the claim.
Possible values (8):
• Bus Station
• Checked Baggage
• Checkpoint
• Motor Vehicle
• Not Provided
• Other
• Pre-Check
• Unknown
Type of items that have been filed in the claim. Depending on the year of the data, the Item_Category values are input differently. Because of varying consistency, you do not need to clean this variable for the assignment.
Close_Amount The dollar amount a claim was closed for.
Disposition
The final settlement of the claim.
Possible values (10):
• *Insufficient
• Approve in Full
• Closed: Canceled
• Closed: Contractor Claim
• Deny
• In Review
• Pending Payment
• Received
• Settle
• Unknown
*Insufficient is the value from the raw data.
StateName
Associated airport state name (for example, NEW YORK).
Requirements:
Values should be in all proper case. (Original data is in all uppercase.)
Associated airport state code. This is the standard two-letter abbreviation used by the post office for US states and territories (for example, IL, PR, CQ).
Requirements:
Values should be in all uppercase.
County
Airport associated county (or parish) name (for example, Cook).
City
Associated airport city name (for example, CHICAGO).
Data work:
To prepare and analyze the data, youfollow thefollowing requirements (65 marks)
Import the raw data file TSAClaims2002_2017.csv. (5 marks)
The final data should be in the permanent library tsa, and the data set should be named
claims_cleaned. (5 marks)
Entirely duplicated records need to be removed from the data set (5 marks) All missing and “-“ values in the variables Claim_Type, Claim_Site, and Disposition
must be changed to “Unknown” . (5 marks)
Values in the variables Claim_Type, Claim_Site, and Disposition must follow the
requirements in the data layout. (5 marks)
All StateName values should be in proper case. (5 marks)
All State values should be in uppercase. (5 marks)
You create a new variable named Date_Issues with a value of Needs Review to indicate
that a row has a date problem. Date problems consist of the following: (5 marks)
o a missing value for Incident_Date or Date_Received
o an Incident_Date or Date_Received value out of the predefined year range of
2002 through 2017
o an Incident_Date value that occurs after the Date_Received value
Remove the County and City variables. (5 marks)
Currency should be permanently formatted with a dollar sign and include two decimal
points. (5 marks)
All dates should be permanently formatted in the style 01JAN2000. (5 marks) Permanent labels should be assigned to variables by replacing the underscores with a
space. (5 marks)
Final data should be sorted in ascending order by Incident_Date. (5 marks) Analysis:
Analyze the overall data to answer the following questions. Be sure to add appropriate titles to your report:
Export a final PDF file (contain the following data analysis results) named ClaimReports that has a style of your choice (15 marks):
Thefinal single pdf report needs to befree of all observations with date issues and answer the following questions:
How many date issues are in the overall data? (5 marks)
How many claims per year of Incident_Date are in the overall data? Be sure to include a
plot. (5 marks)
Present the following statistics for (California, Texas, Florida, and New York) and
answer the following for each state: (Note: pay attention to missing values for each state,
we are not presenting statistics for states that have missing values in the following variables. Note “unknown” is not a missing value!) (5 marks)
o a. What are the frequency values for Claim_Type for the selected state?
o b. What are the frequency values for Claim_Site for the selected state?
o c. What are the frequency values for Disposition for the selected state?
o d. What is the mean, minimum, maximum, and sum of Close_Amount for the selected state? Round to the nearest integer.
The following analysis is a separate task and should not be included in the same pdf
above. You can present the following analysis in a new report (for example, another separate pdf file). (20 marks)
Merge your cleaned SAS data file with another excel file
“US_State_population_2020.xlsx” (Note: this excel file also needs to be loaded in SAS and converted into a SAS data file). Name this new merged file “TSA_POP”: (10 marks)
o Rank (rank of state by population size in 2020)
o State (state name)
o Pop (population size in 2020)
o Growth (pop growth rate since last year)
o Pop2018 (size of population in person)
o Pop2010 (size of population by 2010 census)
o growthSince2010 (pop growth rate since 2010)
o Percent (pop size as a percent of the U.S.)
o Density (people / square miles)
Combine with the above new population information by states, answer the following
questions and save your results: (10 marks)
o Can you argue with the available data, that the incidence of “Passenger Property Loss” (this is one category in the variable “Claim_Type”) is positively related to “population size” of the state where this claim was made? (5 marks)
o Alternatively, can you argue with the available data, that the incidence of “Passenger Property Loss” (this is one category in the variable “Claim_Type”) is positively related to “population density” of the state where this claim was made? (5 marks)
2023-02-04