This project is designed to introduce you to a real-life data analysis. You need to answer real-life questions by analyzing the data set using the techniques we have learned in class. You do not need to use all methods, but you need to fit at least two different models and compare both their goodness of fit and how well each model predicts the response. To do this, split the data into a training set and a testing set, fit the models on the training set, and compare the model predictions on the testing set.

Project Rules

o The data set to be analyzed will be provided by the instructor.

o You need to present a project report of at most 15 pages, including figures, tables and appendices (if any) and excluding the cover page.

o You are NOT allowed to discuss your project with other students. If you have questions, please email me or post your question on the discussion board.

o You are allowed to use online resources. It is a good idea to have an "Acknowledgement" section at the end of your report where you acknowledge the author (or authors) of the online resources.

o You are NOT allowed to copy any sentences verbatim from others' work (a paper, a blog, or a post on the Forum) into your report. You have to either paraphrase or cite the source. Check some online resources on "how to avoid plagiarism".

Objectives

1.   Understand the importance of a good descriptive analysis as an initial modeling step, and recognize the nature of the variables involved. Which are the most important questions you want to answer with your analysis?

2.   Fit an appropriate regression model to a data set by considering the type of predictors (categorical, if any, and/or continuous) and the distributional assumptions of the response. Consider applying a transformation to the response, the predictors, or both if needed.

3.   Perform a proper analysis including model selection and model diagnostics: check the model assumptions and model structure, and investigate possible unusual observations (if any).

4.   Write a project report which includes a clear and concise explanation of the nature of your data, the different methods used for your analysis, and the conclusions you arrived at after this real data analysis problem.
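
As a rough illustration of objectives 2 and 3, the R sketch below runs a model selection step, draws the standard diagnostic plots, flags possibly unusual observations, and refits with a transformed response. It is only a sketch under stated assumptions: the built-in `mtcars` data set stands in for the course data, the variable names and the cutoffs `2 * p / n` and `4 / n` are common rules of thumb chosen here for illustration, and the methods covered in class may differ.

```r
# Placeholder data: mtcars is an assumption, NOT the course data set.
data(mtcars)

# Model selection: backward elimination by AIC, starting from a larger model.
full     <- lm(mpg ~ wt + hp + disp + qsec, data = mtcars)
selected <- step(full, direction = "backward", trace = 0)

# Model diagnostics: residuals vs fitted, normal Q-Q, scale-location,
# and residuals vs leverage, drawn in a 2 x 2 grid.
old_par <- par(mfrow = c(2, 2))
plot(selected)
par(old_par)

# Unusual observations: leverage, influence, and studentized residuals.
n     <- nrow(mtcars)
p     <- length(coef(selected))
lev   <- hatvalues(selected)
cook  <- cooks.distance(selected)
rstud <- rstudent(selected)

high_leverage <- names(lev)[lev > 2 * p / n]      # rule-of-thumb cutoff
influential   <- names(cook)[cook > 4 / n]        # rule-of-thumb cutoff
outlying      <- names(rstud)[abs(rstud) > 3]     # rule-of-thumb cutoff

# Transformation of the response: refit the selected model on the log scale
# and re-check the same diagnostics before preferring it.
selected_log <- update(selected, log(mpg) ~ .)
```

Flagged observations are candidates for a closer look, not automatic removals; the report should explain any observation that is dropped.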

Project Parts

Below are summaries of the parts that make up the full project.

Cover Page:

This page should have the title of the project and your name.

Section 1: Introduction

Provide a brief introduction to the goal of this final project. What is it all about? Where did you get the data from? What is the data framework? What are the main questions you want to answer with this data analysis?

Section 2: Exploratory Data Analysis

Include some graphical displays and numerical summaries of the data. Also comment on any patterns/characteristics of the data which you find interesting or anything relevant to your later analysis.

Provide a brief explanation/summary of the variables you plan to include in your analysis. Here are some questions you might ask:

•    Which variables are categorical (when applicable) and which are numerical?

•    Should we remove any unusual observations?

•    Should we add or remove some variables in our analysis?

•     For categorical variables (when applicable), should we include any interactions?

•     For numerical variables, is there any evidence supporting nonlinear trends?
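
The questions above can be explored with a few lines of base R. The sketch below is illustrative only: it assumes the built-in `mtcars` data set as a placeholder for the course data, treats `cyl` as a categorical variable and `mpg` as the response, and these choices are assumptions made for the example.

```r
# Placeholder data: mtcars is an assumption, NOT the course data set.
data(mtcars)
dat <- mtcars
dat$cyl <- factor(dat$cyl)        # treat number of cylinders as categorical

# Which variables are categorical and which are numerical?
str(dat)
is_num <- sapply(dat, is.numeric)
categorical_vars <- names(dat)[!is_num]
numerical_vars   <- names(dat)[is_num]

# Numerical summaries
summary(dat)
round(cor(dat[, is_num]), 2)      # pairwise linear association

# Graphical displays
hist(dat$mpg, main = "Distribution of the response", xlab = "mpg")
boxplot(mpg ~ cyl, data = dat, xlab = "cyl", ylab = "mpg")

# Unusual observations: points beyond the boxplot whiskers of the response
unusual_mpg <- boxplot.stats(dat$mpg)$out

# Nonlinear trends: scatterplot with a lowess smooth overlaid
plot(dat$hp, dat$mpg, xlab = "hp", ylab = "mpg")
lines(lowess(dat$hp, dat$mpg), col = "red")

# Interactions: compare a model with and without a categorical-by-numeric term
fit_add <- lm(mpg ~ wt + cyl, data = dat)
fit_int <- lm(mpg ~ wt * cyl, data = dat)
interaction_test <- anova(fit_add, fit_int)
```

A curved lowess line or a small p-value in `interaction_test` is a hint worth following up in the modeling section, not a conclusion in itself.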

Section 3: Methodology

You are required to build at least two prediction models using the methods covered in this class. For each model or method you use, include a brief description of the methodology and a description of the R implementation (R coding steps).

You should consider the following sub-sections:

•    Section 3.1: Start with a simple model, one that does not require much training, for example a linear regression model.

•    Section 3.2: Use the linear regression model built in Section 3.1 to make predictions on a testing set.

•    Section 3.3: Fit a different kind of model, such as a non-parametric regression or a principal component regression, and make predictions on the same testing set.

•    Section 3.4 (Optional): You can also try other methods learned in other classes if applicable (for example, Random Forests).
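
One possible shape for Sections 3.1 to 3.3 is sketched below in base R. Everything specific in it is an assumption made for illustration: `mtcars` stands in for the course data, the 70/30 split, the predictors, the number of components `k`, and the helper `rmse` are hypothetical choices, and principal component regression is built by hand from `prcomp` and `lm` rather than from a dedicated package.

```r
# Placeholder data: mtcars is an assumption, NOT the course data set.
data(mtcars)

# Training/testing split (70/30 is an arbitrary illustrative choice).
set.seed(1)
n         <- nrow(mtcars)
train_idx <- sample(n, size = floor(0.7 * n))
train     <- mtcars[train_idx, ]
test      <- mtcars[-train_idx, ]

# Hypothetical helper: root mean squared prediction error.
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Section 3.1: a simple linear regression model fitted on the training set.
fit_lm <- lm(mpg ~ wt + hp, data = train)

# Section 3.2: predictions of that model on the testing set.
pred_lm <- predict(fit_lm, newdata = test)

# Section 3.3: principal component regression on the same split.
predictors <- c("wt", "hp", "disp", "drat", "qsec")
k   <- 2                                          # number of components kept
pca <- prcomp(train[, predictors], center = TRUE, scale. = TRUE)

train_pc <- data.frame(mpg = train$mpg, pca$x[, 1:k, drop = FALSE])
fit_pcr  <- lm(mpg ~ ., data = train_pc)

# Project the test predictors with the centering, scaling, and rotation
# learned from the training set, then predict.
test_pc  <- data.frame(predict(pca, newdata = test[, predictors])[, 1:k, drop = FALSE])
pred_pcr <- predict(fit_pcr, newdata = test_pc)

# Compare goodness of fit (training set) and prediction error (testing set).
comparison <- data.frame(
  model     = c("linear regression", "principal component regression"),
  adj_r2    = c(summary(fit_lm)$adj.r.squared, summary(fit_pcr)$adj.r.squared),
  test_rmse = c(rmse(test$mpg, pred_lm), rmse(test$mpg, pred_pcr))
)
print(comparison)
```

The principal components are estimated from the training set only and then applied to the testing set, so no information from the testing set leaks into the fitted model.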

Section 4: Discussion and Conclusions

In this part you should summarize your results and discuss the impact of your analysis. You should also state the main conclusions in three to four bullet points.

Appendices (If Any)

In this part you can include, for example, any intermediate model results or more detailed model diagnostics that you do not want to appear in the main report.