Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

MET CS 689 Designing and Implementing a Data Warehouse

Assignment 1B: Install Database and Python tools

Overview of the Assignment:

Part 1 of this assignment has you working with Analytical functions in SQL and gives you a bit of a taste in what you can do to transform data in SQL. Part 2 goes into an introduction to Python. Make sure to download all of the Python-related files into the same folder.

Part 1 – Analytical/Windowed Functions

You will be working with the us_national_statistics database that you restored in assignment 1-A. The primary table is person_economic_info. Each row describes a person sampled from that state. It has the following columns:

Column_name	Column_description
age	Person’s age in years
marital_status	Indicates whether the person is married
address_state	Numeric code refers to ‘states’ table
income	Annual income in USD
income_category	Categorized income
car_price	Price of car in USD
car_price_category	Categorized car price
education	Numeric code refers to education_codes table
years_employed	Years of continuous employment for person
retired	Indicates person is retired
employment_category	Categorizes type of most recent employment for person per employment_categories table
gender	‘m’ or ‘f’ for male or female
length_at_current_residence	Years person has lived at current residence
wireless	Indicates person owns a wireless phone
multiple_lines	Indicates person has multiple voice phone lines
voice_mail	Indicates person has voice mail
pager	Indicates person has a pager
internet	Indicates person has a dedicated residential internet connection rather than cellular or dial-up
caller_id	Indicates person has caller id service on voice line
call_waiting	Indicates person has call waiting service on voice line
own_tv	Indicates person owns a television set
own_dvd_player	Indicates person owns a DVD player
own_smartphone	Indicates person owns a smartphone
own_computer	Indicates person owns a personal computer
own_fax	Indicates person has a fax send/receive device on a phone line
read_newspapers	Indicates person reads physical newspapers

Description tables augmenting this table are:

Table name	Usage
states	Translates numeric state codes to actual states
employment_categories	Translates numeric employment categories to descriptions
education_codes	Translates numeric education codes to descriptions

With each question, write ONE query that answers the question. Put the text of the query in your submission document, along with a screen shot of the results from running the query.

1. Create a view named additional_person_info

· This view should give the names of states, employment categories, and education levels along with all the other fields in the person_economic_info table for use in later reports.

· Hint: you don’t need to specify each field individually for the person_economic_info table, think about how to select all the columns from the table. Once you have your query constructed, turn it into a view.

Text of query:

Screenshot of result:

2. Using a single aggregate query to select from the additional_person_info view which shows states where there is no one who owns a personal computer.

· Hints: think about aggregates such as count and sum, think about what to group by, consider the differences between where clause and having clause.

Text of query:

Screenshot of result:

3. With one query, show each state’s aggregates by education level and one of the attributes below (your choice). Use a CUBE to show subtotals by education level for each state, each state will be grouped together at the end.

Select one of the following attributes to aggregate on

· People responding

· Number of people who own computer

· Average income

Hints:

· Result should have five columns and multiple rows for each state, for example Alabama will have 6 rows, one for each education level and a subtotal for that state.

· Write the aggregate first, then add the CUBE

Text of query:

Screenshot of result:

4. Modify the query above to show each state’s rank based on one of the aggregated columns below. Result will list each state once, the aggregated column, and its rank – 3 columns total. Order the result by the ranking of your choice. There is no need to CUBE the results.

· People responding

· Number of people who own computer

· Average income

Text of query:

Screenshot of result:

5. We want to look at statistics of states where there is at least one person using a computer. Using a single query select from the additional_person_info view: For each state, give the following summary information (result should have seven columns and one row for each state).

a. State name

b. Number of people reported

c. Number of people who use the internet – Hint: review distinct values here, there will be a bit of transformation that you will need to do, look at case statements

d. Number of people who own a personal computer

e. Highest income (format as currency)

f. Average income (format as currency)

g. This is the challenging part: Of the people who own a personal computer, calculate the percentage of people who have a dedicated internet connection rather than dial-up (internet column).

• Hint 1: you will need to cast your aggregates to evaluate this as decimals instead of integers.

• Hint 2: look at ISNULL and NULLIF to solve the divide by zero issue

• Hint 3: Consider solving this in steps, get columns a-f solved as an inline view, and then solve column g after.

• Hint 4: If you are not able to figure out the divide by zero issue, you can filter out states where no one has a computer (the results of your last query can be an inline view or a CTE), this will be accepted as an alternate solution.

• If you are totally stuck, run the query without g solved for partial credit.

Text of query:

Screenshot of result:

6. For each state AND education level, give the same information (result should have eight columns and multiple rows for each state). Hint: this should be a minor adjustment of the query above by adding a single attribute – education level.

Text of query:

Screenshot of result:

7. Extra credit (3 points): Implement a query using Lag/Lead or Pivot. In a single sentence explain what you are trying to accomplish.

Text of query:

Screenshot of result:

Part 2 – Familiarization with Python

1. Make sure that all of the assignment .py, .csv and .ipynb files have been downloaded into the same folder.

2. From the start menu, open Anaconda Navigator (it may take a minute or two)

3. Launch Jupyter Notebook

4. Navigate to the folder containing the “Python Intro.ipynb” file

5. Run each of the first eight cells individually. In the submission file, summarize what these commands did.

Your summary here:

6. Run the ninth cell. In the submission file, summarize what these commands did. Add another column for the month. Paste a screen shot of the result.

Your summary here:

Screenshot of result:

7. Run the next five cells, exploring the titanic.csv file. In the submission file, summarize what these commands did.

Your summary here:

8. Extra credit: Using a notepad editor, edit the LoadFromDatabase.py file. Change the connection for your type of database server. Run the last two cells. In the submission file, summarize what these commands did. Note – we will explore how to connect to the database to both read and write to a database in a future assignment, so don’t worry if you can’t figure this out just yet.

Your summary here:

Criterion	A	B	C	D	F	Letter Grade
Correctness and Completeness of Results (70%)	All steps' results are entirely complete and correct	About ¾ of the steps' results are correct and complete	About half of the steps' results are correct and complete	About ¼ of the steps' results are correct and complete	Virtually none of the step's results are correct and complete
Constitution of SQL/Python and Explanations (30%)	Excellent use and integration of appropriate SQL/Python constructs and supporting explanations	Good use and integration of appropriate SQL/Python constructs and supporting explanations	Mediocre use and integration of appropriate SQL/Python constructs and supporting explanations	Substandard use and integration of appropriate SQL/Python constructs and supporting explanations	Virtually all SQL/Python constructs and supporting explanations are unsuitable or improperly integrated
					Assignment Grade: