Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

School Exercise

The database in this exercise is about colleges and universities, hereinafter called schools.  The following information about schools is provided:  school conferences, and starting and mid-career salaries for schools.  The following are the relations you need to create:

school(school_name, conference)

school_salary(school, region, starting_median, mid_career_median, mid_career_90)

All source files are in school_exercise.zip.

The main source file for schools is school_src.csv.  The main source file for school_salary is school_salary_src.csv.  

Each school is in one of the following conferences: Patriot, Pac-12, SEC, Big 12, ACC, Big Ten, and Independent.  Each school is in one of the following regions: Northeastern, Southern, Western, Midwestern, and California.  Each school has salary information.  The relation school_salary is the authoritative source for school names.  

The salaries are the starting median starting salary for students, the mid-career median salary for graduates, and the 90th percentile (highest 10%) salaries for graduates at mid-career.

Some of the attributes in the relations are unknown and, should therefore, be set to NULL.  If the attribute is NULL in a source file, the attribute will be an empty string ''.  You will need to do arithmetic on the salaries, so be sure that ultimately, you load them as numbers.

Whenever you start a project, you must clean and integrate the data and check it for consistency.  This project is no different.  You may find that you need to convert empty strings '' to null values, change school names in school to match school_salary, or change the format of the salaries so they can be treated as numbers.  A question you will need to answer is: “How will you find the rows that you need to update?”

The data in these relations come from various sources.  School is derived from a dataset fivethirtyeight.com used for an article called “Our Guide To The Exuberant Nonsense Of College Fight Songs.” The article is available at https://projects.fivethirtyeight.com/college-fight-song-lyrics/. Data was downloaded from https://fivethirtyeight.datasettes.com/fivethirtyeight/fight-songs%2Ffight-songs on 11/2/2020.  The schools for the Patriot League were compiled by Professor Ordille using Internet Search.  Professor Ordille has normalized most, though perhaps not all, the school names in the school relation to match the names in the school_salary relation.

Salary information about schools is derived from a dataset from the Wall Street Journal used for an article called “Where it Pays to Attend College.”  Data was downloaded from https://www.kaggle.com/wsj/college-salaries on 11/2/2020.  In order to guarantee a match between school names in school and school names in school salary, Professor Ordille estimated the salaries at some schools using Internet search.  Be aware that these values do not come from the validated sources used by the Wall Street Journal and may be inaccurate:

school

region

starting_median

mid_career_median

mid_career_90

Loyola University Maryland

Southern

59900.00

111200.00

 

Texas Tech University

Southern

47000.00

 

 

The United States Military Academy (Army)

Northeastern

74000.00

120000.00

 

United States Naval Academy (Navy)

Southern

77100.00

131000.00

 

University of Louisville

Southern

38000.00

 

 

University of Miami

Southern

58760.00

 

 

University of Pittsburgh (Pitt)

Northeastern

45700.00

74000.00

150000.00

Wake Forest University

Southern

63800.00

 

 

 

 

 

 

 

To complete this assignment, you MUST create a MySQL script that runs from beginning to end to:

· create the target tables school and school_salaries,  

· clean and insert the data in that table from the uploaded input files, and

· generate answers to all the questions.  

You MUST also submit the answers to the the questions in this file via individual Gradescope questions.  Submit your sql file (2 points) and your answers to the individual questions through Gradescope.  You can submit your answers to individual questions to Gradescope as you create them, and update your answers/resubmit through the end of the submission period.

It is not enough to simply have the right SQL somewhere in the script file.  We will test your answer by running your script on our upload of the input files.  Note that you CANNOT change the input csv files.  If you do so, we will not be able to execute your script successfully from the input files, and you will lose points.  You may change the source tables (school_src and school_salary_src) after you upload the source input files into those tables.

You will find the following commands helpful when running your script repeatedly:

DROP TABLE IF EXISTS school;

DROP TABLE IF EXISTS school_salary;

Answer the following questions by performing SQL operations on the database. Your SQL answer must work for any data or changes to data in this database.  Your data answer must be in a single table you generated using SQL, not several tables that need to be combined manually to get the answer to the question.  You cannot use data values in a query unless they are specified in the question.  If you need another value, generate it with a query and use that query or its resulting table in the query that ultimately produces the answer.  Unless specified otherwise, remove duplicates from your answers as appropriate.  Show the answer (that is a table of data that answers the question for 2-9), and the SQL used to generate the answer.  

If you have a Windows machine, you can copy the answer table from MySQL Workbench and paste it into Gradescope by following these steps:

· Click in the cell to the left of the first tuple in the result.

· Click on the cell above that which is to the left of the column headings.

· Press the copy key (Control-C on Windows).

· Paste the table into the Gradescope answer section.  The formatting won’t be perfect, but this is acceptable.  Alternatively, follow the Mac instructions for using screenshots below.

If you have a Mac, you can include a screen shot of the answers you get, or you can export the result table to Excel and then cut/paste the table into Gradescope.  You must include text for the SQL, not a picture of the SQL.

What is the SQL for creating the target relations and loading the files into the target relations?  Be sure to include any key, foreign key, or check constraints that you see when creating the target relations.   Include any transformations you do on the data to enable the data to be transferred into the target tables, used and made consistent.  (6 points)

What is all the salary information about Rutgers University?  (2 points)

How many schools are in the Big Ten?  (2 points)

What is all the information in the school_salary relation about tech schools in descending order by starting median salary?  (2 points)

What are all the conferences?  List each conference once. (2 points)

What is the school in the Big Ten with the highest 90th percentile mid-career salary and what is that 90th percentile mid-career salary? (2 points)  You cannot use LIMIT to answer this question.  LIMIT 1 will not work if there is a tie for the highest salary.

What is all the salary information for the schools in the Big Ten in decreasing order of mid-career 90th percentile salary? (2 points)

List the school and salary information for these NJ schools:  Fairleigh Dickinson University, Princeton University, Rider University, Rutgers University, Seton Hall University, Stevens Institute of Technology in ascending order by school.  Use FORMAT and CONCAT to create a string for the salary that has a starting $ and a comma after the thousands place, for example:  $49,200.00 .  The salary columns in the result should be named starting_median, mid_career_median, and mid_career_90.  (2 points)

What conference has the most member schools and how many member schools does it have?  (2 points)  You cannot use LIMIT to answer this question.  LIMIT 1 will not work if there is a tie for the highest number of members. 

You must create a script that creates the target tables, does any necessary data integration, populates the target tables, and then generates the answers to each of the queries.  The script must run from beginning to end successfully when nothing is highlighted and the lightning bolt is pressed. (2 points)