Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

MET CS 689 Designing and Implementing a Data Warehouse

Assignment 1B: Install Database and Python tools

Overview of the Assignment:

Part 1 of this assignment has you working with Analytical functions in SQL and gives you a bit of a taste in what you can do to transform data in SQL.  Part 2 goes into an introduction to Python.  Make sure to download all of the Python-related files into the same folder.

Part 1 – Analytical/Windowed Functions

You will be working with the us_national_statistics database that you restored in assignment 1-A. The primary table is person_economic_info. Each row describes a person sampled from that state. It has the following columns:

Column_name

Column_description

age

Person’s age in years

marital_status

Indicates whether the person is married

address_state

Numeric code refers to ‘states’ table

income

Annual income in USD

income_category

Categorized income

car_price

Price of car in USD

car_price_category

Categorized car price

education

Numeric code refers to education_codes table

years_employed

Years of continuous employment for person

retired

Indicates person is retired

employment_category

Categorizes type of most recent employment for person per employment_categories table

gender

‘m’ or ‘f’ for male or female

length_at_current_residence

Years person has lived at current residence

wireless

Indicates person owns a wireless phone

multiple_lines

Indicates person has multiple voice phone lines

voice_mail

Indicates person has voice mail

pager

Indicates person has a pager

internet

Indicates person has a dedicated residential internet connection rather than cellular or dial-up

caller_id

Indicates person has caller id service on voice line

call_waiting

Indicates person has call waiting service on voice line

own_tv

Indicates person owns a television set

own_dvd_player

Indicates person owns a DVD player

own_smartphone

Indicates person owns a smartphone

own_computer

Indicates person owns a personal computer

own_fax

Indicates person has a fax send/receive device on a phone line

read_newspapers

Indicates person reads physical newspapers

Description tables augmenting this table are:

Table name

Usage

states

Translates numeric state codes to actual states

employment_categories

Translates numeric employment categories to descriptions

education_codes

Translates numeric education codes to descriptions

With each question, write ONE query that answers the question. Put the text of the query in your submission document, along with a screen shot of the results from running the query.

1. Create a view named additional_person_info

· This view should give the names of states, employment categories, and education levels along with all the other fields in the person_economic_info table for use in later reports.  

· Hint: you don’t need to specify each field individually for the person_economic_info table, think about how to select all the columns from the table.  Once you have your query constructed, turn it into a view. 

Text of query:

Screenshot of result:

2. Using a single aggregate query to select from the additional_person_info view which shows states where there is no one who owns a personal computer.

· Hints: think about aggregates such as count and sum, think about what to group by, consider the differences between where clause and having clause. 

Text of query:

Screenshot of result:

3. With one query, show each state’s aggregates by education level and one of the attributes below (your choice).  Use a CUBE to show subtotals by education level for each state, each state will be grouped together at the end.

Select one of the following attributes to aggregate on

· People responding

· Number of people who own computer

· Average income 

Hints:

· Result should have five columns and multiple rows for each state, for example Alabama will have 6 rows, one for each education level and a subtotal for that state.

· Write the aggregate first, then add the CUBE

Text of query:

Screenshot of result:

4. Modify the query above to show each state’s rank based on one of the aggregated columns below. Result will list each state once, the aggregated column, and its rank – 3 columns total.  Order the result by the ranking of your choice.  There is no need to CUBE the results.

· People responding

· Number of people who own computer

· Average income

Text of query:

Screenshot of result:

5. We want to look at statistics of states where there is at least one person using a computer.  Using a single query select from the additional_person_info view: For each state, give the following summary information (result should have seven columns and one row for each state).  

a. State name

b. Number of people reported

c. Number of people who use the internet – Hint:  review distinct values here, there will be a bit of transformation that you will need to do, look at case statements

d. Number of people who own a personal computer

e. Highest income (format as currency)

f. Average income (format as currency)

g. This is the challenging part: Of the people who own a personal computer, calculate the percentage of people who have a dedicated internet connection rather than dial-up (internet column).   

• Hint 1: you will need to cast your aggregates to evaluate this as decimals instead of integers.  

• Hint 2: look at ISNULL and NULLIF to solve the divide by zero issue

• Hint 3: Consider solving this in steps, get columns a-f solved as an inline view, and then solve column g after.

• Hint 4: If you are not able to figure out the divide by zero issue, you can filter out states where no one has a computer (the results of your last query can be an inline view or a CTE), this will be accepted as an alternate solution.

• If you are totally stuck, run the query without g solved for partial credit.

Text of query:

Screenshot of result:

6. For each state AND education level, give the same information (result should have eight columns and multiple rows for each state).   Hint: this should be a minor adjustment of the query above by adding a single attribute – education level.

Text of query:

Screenshot of result:

7. Extra credit (3 points): Implement a query using Lag/Lead or Pivot.  In a single sentence explain what you are trying to accomplish.

Text of query:

Screenshot of result: 

Part 2 – Familiarization with Python

1. Make sure that all of the assignment .py, .csv and .ipynb files have been downloaded into the same folder.

2. From the start menu, open Anaconda Navigator (it may take a minute or two)

3. Launch Jupyter Notebook

4. Navigate to the folder containing the “Python Intro.ipynb” file

5. Run each of the first eight cells individually. In the submission file, summarize what these commands did.

Your summary here:

6. Run the ninth cell. In the submission file, summarize what these commands did. Add another column for the month. Paste a screen shot of the result.

Your summary here:

Screenshot of result:

7. Run the next five cells, exploring the titanic.csv file. In the submission file, summarize what these commands did.

Your summary here:

8. Extra credit: Using a notepad editor, edit the LoadFromDatabase.py file. Change the connection for your type of database server. Run the last two cells. In the submission file, summarize what these commands did.  Note – we will explore how to connect to the database to both read and write to a database in a future assignment, so don’t worry if you can’t figure this out just yet.

Your summary here:

Criterion 

D

F

Letter Grade

Correctness and Completeness of Results (70%) 

All steps' results are entirely complete and correct

About ¾ of the steps' results are correct and complete

About half of the steps' results are correct and complete

About ¼ of the steps' results are correct and complete

Virtually none of the step's results are correct and complete

 

Constitution of SQL/Python and Explanations (30%)

Excellent use and integration of appropriate SQL/Python constructs and supporting explanations

Good use and integration of appropriate SQL/Python constructs and supporting explanations

Mediocre use and integration of appropriate SQL/Python constructs and supporting explanations

Substandard use and integration of appropriate SQL/Python constructs and supporting explanations

Virtually all SQL/Python constructs and supporting explanations are unsuitable or improperly integrated

 

 

 

 

 

 

Assignment Grade: