Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

CS544 Final Project

Picking the Data Set

Look into the following sites as an example and select a data set that interests you. 1.   https://www.kaggle.com/datasets

2.   https://github.com/fivethirtyeight/data

3.   http://www.kdnuggets.com/datasets/index.html

4.   Any other source of your choice

Preparing the data

.   Import the data set into R.

.   Document the steps for the import process and any preprocessing had  to be done prior to or after the import. Any R code used in the process should be included.

Analyzing the data

.    Do the analysis as in Module3 for at least one categorical variable and at least one numerical variable. Show appropriate plots for your data.

.    Do the analysis as in Module3 for at least one set of two or more variables. Show appropriate plots for your data.

.    Pick one variable with numerical data and examine the distribution of the data.

.    Draw various random samples of the data and show the applicability of the Central Limit Theorem for this variable.

.    Show how various sampling methods can be used on your data. What are your conclusions if these samples are used instead of the whole dataset.

.    Use Data wrangling techniques from Module6 for the appropriate analysis of your data.

.    Use plotly for your plots for interactivity.

Code Files

.    Use RMarkdown for the code. You can start with the provided template

.    Render (Knit) the RMarkdown to HTML.

Submitting the Project

Upload a single zip file (CS544Final_lastName.zip) containing all the code as

RMarkdown (Rmd file) and all the results in a RMarkdown HTML. If the dataset is

directly imported from internet, no need to include the dataset. Otherwise, include your dataset file also in the zip file before uploading.