Final Project: STAT 462, Fall 2021
About the data:
The Professional Golf Association (PGA) is interested in learning how the various statistics it uses to judge a golfer’s performance relate to certain outcomes. Data on 12 tournaments played in 2020 were collected, resulting in a data set of 1614 observations of 7 variables. However, you will use a random selection of 150 rows (see the Analysis Paper section below for more details). Each line of the data set provides information on these 7 variables. The 7 variables are:
Top25 | Player finished in the Top 25 (value = 1) in the tournament played. This variable should only be used as the response variable in your logistic regression!
Score | This is the total score earned by that player in the tournament. This variable should only be used as the response variable in your linear regression. NOTE: In golf, lower scores are better. For example, a score of -2 is better than 2. This means the leading “-” of any data value in the Score column is meaningful and extremely important!
Five variables: Putt through T2G | These variables refer to strokes gained and represent the “various statistics” the PGA is interested in using. Larger, positive numbers are better. The values represent the average strokes gained in that aspect over the tournament. The acronyms stand for Putt (putting); ARG (around the green); APP (approach); OTT (off the tee); and T2G (tee to green). If curious, you can read more about strokes gained here: https://www.pgatour.com/news/2016/05/31/strokes-gained-defined.html NOTE: The T2G value is the sum of ARG, APP, and OTT.
Basics:
The project will involve two (2) different statistical applications:
1. Linear Regression to predict Score
2. Logistic Regression to predict Top25
Variables to Use:
1. Linear regression to predict Score will use these five (5) predictors: Putt, ARG, APP, OTT, and T2G. Remember, as stated above in the table, T2G equals the sum of ARG, APP, and OTT.
2. Logistic regression to predict “Top25” will use the predictor(s) you determined to be the best predictor(s) in your linear regression analysis.
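Because T2G is the exact sum of ARG, APP, and OTT, a linear model containing all five predictors cannot estimate five separate slopes. The sketch below illustrates this on simulated stand-in data; the data values, the seed, and the reduced model `Score ~ Putt + T2G` are assumptions made only for illustration, not the expected answer. Choosing the best predictors remains part of your analysis.

```r
# Simulated stand-in for a 150-row sample; every value here is made up.
set.seed(462)  # arbitrary seed for this illustration, NOT a PSU-based seed
n <- 150
golf <- data.frame(
  Putt = rnorm(n, sd = 1),
  ARG  = rnorm(n, sd = 0.5),
  APP  = rnorm(n, sd = 1),
  OTT  = rnorm(n, sd = 0.7)
)
golf$T2G   <- golf$ARG + golf$APP + golf$OTT   # exact sum, as in the data description
golf$Score <- round(-4 * (golf$Putt + golf$T2G) + rnorm(n, sd = 2))
# Hypothetical mechanism for Top25, used only to get a 0/1 response to fit
golf$Top25 <- rbinom(n, 1, plogis(-1.5 + golf$Putt + golf$T2G))

# Full linear model: T2G is a linear combination of ARG, APP, and OTT,
# so lm() reports NA for the aliased coefficient.
full_fit <- lm(Score ~ Putt + ARG + APP + OTT + T2G, data = golf)
print(coef(full_fit))

# One possible reduced linear model (an assumption for this sketch)
reduced_fit <- lm(Score ~ Putt + T2G, data = golf)

# Logistic regression for Top25 on the same predictors
logit_fit <- glm(Top25 ~ Putt + T2G, family = binomial, data = golf)
print(coef(summary(logit_fit)))
```

Calling `alias(full_fit)` shows which linear dependency caused the NA coefficient.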
Estimation:
Apply your models to estimate (and interpret!) the Score and the probability of finishing in the Top 25 for a player with the following values (remember, the model you choose as best might NOT include all these predictors):
Putt = 0
ARG = 0
APP = 0
OTT = 4
T2G = 2
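A minimal sketch of how such estimates can be obtained with `predict()`, again on simulated stand-in data with an assumed pair of models (`Score ~ Putt + T2G` and `Top25 ~ Putt + T2G`). Note that the listed values give ARG + APP + OTT = 4 while T2G = 2, so the estimate may depend on which of these predictors your chosen model retains.

```r
# Simulated stand-in data; values and seed are assumptions for illustration.
set.seed(462)
n <- 150
golf <- data.frame(
  Putt = rnorm(n, sd = 1),
  ARG  = rnorm(n, sd = 0.5),
  APP  = rnorm(n, sd = 1),
  OTT  = rnorm(n, sd = 0.7)
)
golf$T2G   <- golf$ARG + golf$APP + golf$OTT
golf$Score <- round(-4 * (golf$Putt + golf$T2G) + rnorm(n, sd = 2))
golf$Top25 <- rbinom(n, 1, plogis(-1.5 + golf$Putt + golf$T2G))

# Assumed models; your own "best" models may use different predictors.
score_fit <- lm(Score ~ Putt + T2G, data = golf)
top25_fit <- glm(Top25 ~ Putt + T2G, family = binomial, data = golf)

# The player described in the Estimation section
new_player <- data.frame(Putt = 0, ARG = 0, APP = 0, OTT = 4, T2G = 2)

# Point estimate of Score with a 95% prediction interval (columns fit, lwr, upr)
est_score <- predict(score_fit, newdata = new_player, interval = "prediction")
print(est_score)

# Estimated probability of a Top 25 finish; type = "response" returns a
# probability rather than log-odds
est_prob <- predict(top25_fit, newdata = new_player, type = "response")
print(est_prob)
```

Columns in `new_player` that a model does not use are simply ignored by `predict()`, so the same data frame works for any subset of the five predictors.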
Project Guidelines:
The project consists of two main parts and thus two submissions: one is the write-up of the analysis, and the second is a Zoom video presentation of this analysis. What follows are guidelines to follow for both.
Analysis Paper
1. You are provided with a large data set (ProjectData.csv) that consists of 1614 observations. You will run the R program CreateRandomData to generate a data set consisting of 150 randomly selected rows from that data set. This random selection depends on your correctly entering the LAST five (5) digits of your 9-digit PSU number as the seed value in the provided R program. Violations of this carry severe penalties (see Step 11).
2. The paper will be written in Times New Roman, size 12 font, and single-spaced. Margins should be the default margins in Word.
3. The paper must be submitted in Word document form (doc or docx extension) or as pdf. Any other formats will NOT be accepted.
4. The main body of the paper will be 4 pages minimum to 6 pages maximum, NOT including the appendix or the Executive Summary (see https://custompapers.com/summaries-mistakes/ for help). The executive summary should be a standalone first page of your report.
5. The main body of the paper should NOT contain any code or copied output from R. Instead, put all output, excluding graphs, in a table. For example, linear regression output would be a table that includes rows for the y-intercept and each predictor, and columns of estimates, test statistics, and p-values.
6. The paper should NOT read like a recipe, i.e. “First, I did….”, “Next, I did…”, “Then, I did…” or like “Step 1…”, “Step 2….”, “Step 3….”.
7. The paper will consist of four parts: Part 1, the executive summary (remember, this does NOT count toward page limits); Part 2, the linear regression application; Part 3, the logistic regression analysis; and Part 4, the appendix (remember, this does NOT count toward page limits).
8. Parts 2 and 3 will each include an introduction of what will follow. Part 2 will include sections on assumption analysis and model analysis, followed by an overall conclusion of your results and comments regarding the analysis (e.g., what should your “client”, the PGA, be aware of regarding the results). Part 3 will NOT include an assumptions section but will include a section that discusses the results of your logistic regression model.
9. The Appendix will include ALL graphs which must be referenced from the main body. The graphs are to be titled using Figure 1, Figure 2, etc. and referenced as such in the main body.
10. All analysis must be done using R, and a copy of all code used – INCLUDING THE CODE YOU USED TO SELECT YOUR 150 RANDOM ROWS – must be included in the Appendix. This code should be “clean” and free of errors. That is, I should be able to copy your code into my RStudio, run it without errors, and produce the output/graphs you included and referenced in your paper.
11. VERY IMPORTANT!! The seed number as explained in Step 1 must be the LAST five digits of your 9-digit PSU number. This will allow me to duplicate your results. If you do NOT use the correct number, you will automatically lose HALF points for the entire project. If your number matches that of another student, you lose ALL points for the entire project.
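The seeded selection (Steps 1, 10, and 11) and the coefficient table (Step 5) can be sketched as follows. This is a generic illustration only: the contents of the provided CreateRandomData program are not shown here, so the sampling call, the placeholder seed `12345`, the simulated stand-in for ProjectData.csv, and the model formula are all assumptions.

```r
# Stand-in for ProjectData.csv so the sketch runs on its own; values are simulated.
# In the real project the data would be read from the provided file instead.
set.seed(1)  # used only to simulate the stand-in data
full_data <- data.frame(Putt = rnorm(1614), T2G = rnorm(1614))
full_data$Score <- round(-4 * (full_data$Putt + full_data$T2G) + rnorm(1614, sd = 2))

# Placeholder seed; replace with the LAST five digits of your own PSU number.
student_seed <- 12345
set.seed(student_seed)
my_rows <- sample(nrow(full_data), size = 150)  # 150 distinct rows, no replacement
my_data <- full_data[my_rows, ]

# Re-running with the same seed reproduces the same rows, which is what
# allows results to be duplicated.
set.seed(student_seed)
check_rows <- sample(nrow(full_data), size = 150)

# Coefficient table (estimates, standard errors, test statistics, p-values)
# as a matrix that can be retyped into a Word table instead of pasting raw output.
fit <- lm(Score ~ Putt + T2G, data = my_data)
coef_table <- round(coef(summary(fit)), 4)
print(coef_table)
```

Setting the seed immediately before the sampling call keeps the selection independent of any other random-number use earlier in the script.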
Zoom Video Presentation
Please submit just a link to your Zoom presentation. Guidelines for the presentation are these:
• The presentation should be a minimum of 3 minutes and a maximum of 4 minutes.
• At the beginning of the presentation, you MUST appear in the video holding your PSU photo ID so that your photo and 9-digit number are visible to me, and clearly state your name and major or intended major. The purpose of this step is to verify the identity of the person giving the presentation. If there is a noticeable change in voice (i.e., suggesting someone else is now giving the presentation), you will receive a zero for BOTH the project and the presentation.
• The presentation itself should just provide a screenshot of each of the project graphs and output and your interpretation of them.
• The last part of video (e.g., the last slide if using PowerPoint) will include your thoughts on the project – what you found most challenging and at least two “things” you learned by doing this project. This can be anything related to R coding, statistics, or about yourself.
2021-12-10