
MET CS 555 Final Project

20 points

Select a small data set from the available public data sets (the list is at the end of this document).

Describe a research scenario and specify a research question that can be answered with the data analytic methods we learned in class. For example, one- and two-sample mean tests, correlation tests, simple and multiple linear regression, ANOVA and ANCOVA, one- and two-sample tests for proportions, and logistic regression are all fair game. Perform your analysis, and then report your results and conclusions.
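As a rough orientation, the sketch below shows how several of the methods listed above map onto base R functions. It uses R's built-in mtcars data purely as a stand-in; the variables and model formulas are illustrative assumptions, not a suggested project.

```r
# Built-in mtcars data, used only as a stand-in for your own data set
data(mtcars)
mtcars$am <- factor(mtcars$am, labels = c("automatic", "manual"))

# Two-sample mean test (Welch t-test): does mean mpg differ by transmission?
t_res <- t.test(mpg ~ am, data = mtcars)

# Correlation test between car weight and mpg
cor_res <- cor.test(mtcars$wt, mtcars$mpg)

# Multiple linear regression: mpg explained by weight and horsepower
lm_fit <- lm(mpg ~ wt + hp, data = mtcars)

# One-way ANOVA: mean mpg across cylinder groups
aov_fit <- aov(mpg ~ factor(cyl), data = mtcars)

# Logistic regression: transmission type (binary outcome) explained by weight
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Inspect any of the fitted objects, for example:
print(t_res)
print(summary(lm_fit))
```

Calling summary() on the other fitted objects gives the corresponding test statistics and p-values.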

Clean up your data. If your data set is large, randomly sample 1000 observations from it.
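One possible way to do the cleaning and sampling step in R is sketched below. The data frame, column names, seed, and output file name are all hypothetical; a simulated data set stands in for a real one so that the sketch runs on its own.

```r
set.seed(555)  # fix the seed so the random sample is reproducible

# Simulated stand-in for a large raw data set
# (in practice you would load your own file, e.g. with read.csv())
raw <- data.frame(
  id    = 1:5000,
  score = rnorm(5000, mean = 70, sd = 10),
  group = sample(c("A", "B", NA), 5000, replace = TRUE)
)

# Basic cleaning: drop rows with missing values in the variables of interest
clean <- raw[complete.cases(raw[, c("score", "group")]), ]

# Sample 1000 rows without replacement, but only if more than 1000 remain
n_target <- 1000
if (nrow(clean) > n_target) {
  clean <- clean[sample(nrow(clean), n_target), ]
}

# Save the cleaned sample as a small CSV file.
# Written to a temporary directory here; use your project folder in practice.
out_path <- file.path(tempdir(), "cleaned_sample.csv")
write.csv(clean, out_path, row.names = FALSE)
```

Setting the seed before sampling means that re-running the script reproduces the same 1000 rows, which keeps the submitted CSV file and the R script consistent with each other.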

1. Describe your research scenario and question(s).

Briefly describe your research scenario. As in our class examples, first describe the overall scenario and then state one or more specific research questions based on it.

2. Describe the data set.

Briefly describe the data set. Describe each variable that you plan to use in your analysis, and describe any data cleaning you have performed. You may present tables or figures where they help. If possible, provide a link to the original source of the data set.

3. Describe the statistical methods you plan to use.

Briefly describe the statistical methods you will use to investigate your research question(s).

4. Report your results.

Write up the results of your analysis. You should present tables and figures when relevant, and you should have a short write-up describing your results.

5. State your conclusions and discuss any limitations.

State your conclusions so that a non-statistician can understand them. Discuss any potential limitations of your analysis. For example, do you suspect that the assumptions of your test may not hold? Do you feel the analysis may have limitations for any other reasons?
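When discussing whether the assumptions of a test hold, a few numerical checks can support the argument. The sketch below is one possible approach for a linear regression and a two-sample comparison, again using the built-in mtcars data as a stand-in; which checks are appropriate depends on the method you chose.

```r
# Built-in mtcars data, used only as a stand-in for your own data set
data(mtcars)

# Fit a linear model whose assumptions we want to examine
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Shapiro-Wilk test on the residuals: a small p-value suggests
# that the normality assumption for the errors may not hold
sw_res <- shapiro.test(residuals(fit))
print(sw_res)

# F test comparing the variances of mpg in the two transmission groups,
# relevant if you plan a pooled (equal-variance) two-sample t-test
var_res <- var.test(mpg ~ am, data = mtcars)
print(var_res)

# In an interactive session, diagnostic plots are also informative:
# plot(fit, which = 1)  # residuals vs fitted values
# plot(fit, which = 2)  # normal Q-Q plot of residuals
```

Formal tests like these are sensitive to sample size, so they are usually best read alongside the diagnostic plots rather than on their own.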

Solution Submission

1. Upload a write-up document.

2. Upload your data set. This is the data set after cleaning (a small CSV file).

3. Upload your R script.

Grading will be based on:

1. Development and description of a research scenario and question(s).

2. Data preparation and data cleaning (when relevant).

3. Correctness of statistical analysis methods chosen to answer the research question.

4. Correctness of R code.

5. Quality of figures, tables, and write-up.

6. Correct and thoughtful discussion of conclusions and limitations.

Public Dataset List:

· https://github.com/awesomedata/awesome-public-datasets

· https://github.com/apiad/datasets-list

· https://github.com/datasets/openml-datasets/tree/master/data (the same data as https://www.openml.org/search?type=data)

· Machine Learning Data Sets: https://archive.ics.uci.edu/ml/datasets.html

· https://www.data.gov/

· https://www.kaggle.com/datasets

· https://sites.google.com/a/drwren.com/wmd/details

· https://data.cityofnewyork.us/data

· http://snap.stanford.edu/data/index.html

· Amazon AWS Public Dataset Program: https://aws.amazon.com/opendata/public-datasets/