ALY2010 Summer 2023 Final Project

Overview and Rationale

Being able to ask appropriate questions of data is an important part of data analytics. It is also critical to be able to interpret the results of the analysis. This assignment is intended to familiarize you with the data set and to get you thinking about the key business questions you can ask and answer with this data.

This project will help you measure your understanding of basic concepts in analytics.

It will help you measure your ability to obtain basic descriptive statistics from a data set.

It will help you measure your ability to run hypothesis tests using different methods in R.

It will help you measure your skills in R, RStudio, and R Markdown.

It will help you measure your ability to apply critical thinking to make meaningful observations about your data analysis results.

Grade

Project report:                   300 points

Files to present:

For this assignment you must submit two (2) files:

1.   Your final R Markdown (.Rmd) file.

2.   Your original HTML report, including all your R code.

Please remember: your report is very important. Make it look professional, and keep it as short as possible while still containing all the relevant information. Tell me what you learned and, using deep critical thinking, provide examples of practical applications.

Part 1. Title and Introduction

Start by creating a nice title for your report then write a very informative introduction.

1.   Using your own words, describe in general terms your understanding of Analytics.

2.   Explain how you visualize the application of analytics in an industry of YOUR interest.

3.   Which analytical techniques from this course did you like the most, and why? Use practical application examples.

4.   Using your own words, explain your understanding of ANOVA.

5.   Provide one practical example of ANOVA application in an industry of your interest.

6.   Compare your understanding of R at the beginning of the course with your current understanding of R. Explain in some detail what your current level of expertise is.

Remember to use at least 2 different academic references for each topic. I want to see at least 10 different references being used in the introduction section.

Part 2. Analysis section

As you have already learned, for each task, enter all your code in one single R chunk ({r}), assign every calculation and result to a named object, and use those object names to present your answers through inline R code or tables.

Several good examples are presented in Lecture 13.

Remember, for each task, follow these task rules:

a.   Write a title for the task.

b.   Describe the  task.

c.    Do the task.

d.   Write observations about the task results.

Be organized.

Task 1. Descriptive statistics

Using the file wine(ALY2010).xlsx, present basic descriptive statistics of the data set:

1.   a. Write a title for the task.

2.   b. Describe the  task:

a.   Start by writing a description of the task, what you are about to perform and what is the purpose of the task.

3.   c. Do the task:

a.   After the description, enter the R chunk for this task. All the code for this task should be organized in one single R chunk.

b.   Present the number of categories in wine type and the number of observations per category. Choose a table or a graph to present this data.

c.    For each numerical variable, present at least two measures of central tendency and two measures of dispersion. Use a very well-organized table to present this information.

d.   Use kable() from library(knitr), and then apply post-formatting using library(kableExtra). Refer to section 10.2, "The kableExtra package", of the R Markdown Cookbook (bookdown.org) for various design tips available under kableExtra.

e.   Extra points if you choose to use additional graphs to describe your numerical variables.

4.   d. Write observations about the task results.

a.   End the task by writing what you learned about the data set in general terms.

b.   Then write well-informed observations about your results; in other words, what the graphs and values you obtained tell you about your variables.

This is basically an open task, and this scenario will come up at some point in any job. You will be handed a data set and will need to describe it to your team before you start any deeper data analysis. You always need to first know as much as possible about your data.
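The descriptive steps in Task 1 could be sketched roughly as follows. This is a minimal illustration, not a model answer: the small data frame is a made-up stand-in for wine(ALY2010).xlsx, the type names "TypeB" and "TypeC" are placeholders, and the column names are assumptions that may differ from the real file. The choice of mean, median, standard deviation, and IQR is just one valid combination.

```r
# Hypothetical stand-in for wine(ALY2010).xlsx. The real file would be read
# with something like readxl::read_excel(), and its column names may differ.
set.seed(1)
wine <- data.frame(
  Type      = rep(c("California", "TypeB", "TypeC"), times = c(5, 4, 3)),
  Ash       = round(rnorm(12, mean = 2.4, sd = 0.3), 2),
  Magnesium = round(rnorm(12, mean = 100, sd = 14))
)

# Number of categories and observations per category
type_counts <- as.data.frame(table(wine$Type), stringsAsFactors = FALSE)
names(type_counts) <- c("Type", "Frequency")

# Two measures of central tendency and two of dispersion per numeric variable
numeric_vars <- wine[vapply(wine, is.numeric, logical(1))]
desc_stats <- data.frame(
  Variable  = names(numeric_vars),
  Mean      = vapply(numeric_vars, mean, numeric(1)),
  Median    = vapply(numeric_vars, median, numeric(1)),
  SD        = vapply(numeric_vars, sd, numeric(1)),
  IQR       = vapply(numeric_vars, IQR, numeric(1)),
  row.names = NULL
)

# Format with kable() and kableExtra, only if both packages are installed
if (requireNamespace("knitr", quietly = TRUE) &&
    requireNamespace("kableExtra", quietly = TRUE)) {
  desc_table <- kableExtra::kable_styling(
    knitr::kable(desc_stats, format = "html", digits = 2,
                 caption = "Descriptive statistics"),
    bootstrap_options = c("striped", "hover"),
    full_width = FALSE
  )
}
```

Because every result is stored in a named object (type_counts, desc_stats, desc_table), the same names can be reused in inline R code or printed as tables in the report.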

Task 2. Probability table and graphs

Using the first table you produced in Task 1 (categories of wine type and number of observations, i.e., frequencies), complete the table by adding the cumulative frequencies, probabilities, and cumulative probabilities.

For this task, you will use two R chunks.

In the first R chunk:

1.   Prepare the codes to create the table.

2.   Use a very well-organized table to present the information.

3.   Use kable() to present your table, and then apply post-formatting using library(kableExtra). You can see some examples on my website.

4.   Remember that your table contains 5 columns: the first column is the names of the wine types, the second is the frequencies, then the cumulative frequencies, the probabilities, and finally the cumulative probabilities.

In a second R chunk:

1.   Start the R chunk by inserting the code par(mfrow=c(2,2)). This will present your four graphs in a 2-by-2 matrix.

2.   Prepare and present four bar plots to display the four numeric columns from your previous table. For example, the first bar plot is wine type and frequencies, the second is wine type and cumulative frequencies, etc.

Remember to follow the task rules. You must write meaningful observations about your results.
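As a rough sketch of the table arithmetic and the 2-by-2 plot layout described in Task 2, the code below uses made-up frequencies and placeholder type names ("TypeB", "TypeC"). In the actual report the frequencies would come from the Task 1 table, and the two parts would sit in separate R chunks.

```r
# --- First chunk: build the five-column table ---
# Hypothetical frequencies; in the report these come from the Task 1 table.
freq_table <- data.frame(
  Type      = c("California", "TypeB", "TypeC"),
  Frequency = c(5, 4, 3)
)

freq_table$Cum_Frequency   <- cumsum(freq_table$Frequency)
freq_table$Probability     <- freq_table$Frequency / sum(freq_table$Frequency)
freq_table$Cum_Probability <- cumsum(freq_table$Probability)

# --- Second chunk: four bar plots in a 2-by-2 grid ---
old_par <- par(mfrow = c(2, 2))      # save the previous layout settings
for (column_name in names(freq_table)[-1]) {
  barplot(freq_table[[column_name]],
          names.arg = freq_table$Type,
          main = column_name,
          ylab = column_name,
          col = "steelblue")
}
par(old_par)                         # restore the previous layout
```

The last value of the cumulative probability column should equal 1, which is a quick way to check the table before formatting it with kable().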

Task 3. Confidence intervals

a.   Calculate the 90% confidence interval of the mean for variable Ash.

b.   Calculate the 94% confidence interval of the mean for variable Color Intensity.

c.   Calculate the 98% confidence interval of the mean for Phenols ONLY for wine type California. In this case, you need to first filter() the values for type California.

All the code should be in one single R chunk. Use tables or inline R code to present your results.

Remember to follow the task rules.
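One possible way to organize the three intervals from Task 3 is sketched below. It assumes a t-based interval (population standard deviation unknown), which may or may not be the method expected in class. The data are simulated, the helper name mean_ci and the column names are hypothetical, and base R subsetting is used in place of dplyr's filter() so the example runs without extra packages.

```r
# Simulated stand-in for the wine data; column names are assumptions.
set.seed(2)
wine <- data.frame(
  Type            = rep(c("California", "TypeB", "TypeC"), each = 10),
  Ash             = rnorm(30, mean = 2.4, sd = 0.3),
  Color_Intensity = rnorm(30, mean = 5.0, sd = 2.0),
  Phenols         = rnorm(30, mean = 2.3, sd = 0.6)
)

# Hypothetical helper: t-based confidence interval of the mean
mean_ci <- function(x, level) {
  as.numeric(t.test(x, conf.level = level)$conf.int)
}

ci_ash   <- mean_ci(wine$Ash, 0.90)
ci_color <- mean_ci(wine$Color_Intensity, 0.94)

# Base R equivalent of dplyr::filter(wine, Type == "California")
california    <- wine[wine$Type == "California", ]
ci_phenols_ca <- mean_ci(california$Phenols, 0.98)

# Collect the results in one table for kable() or inline R code
ci_table <- data.frame(
  Variable = c("Ash", "Color Intensity", "Phenols (California)"),
  Level    = c(0.90, 0.94, 0.98),
  Lower    = c(ci_ash[1], ci_color[1], ci_phenols_ca[1]),
  Upper    = c(ci_ash[2], ci_color[2], ci_phenols_ca[2])
)
```

Each interval is stored as a two-element vector (lower, upper), so a single bound can be pulled into the text with inline R code such as round(ci_ash[1], 2).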

Task 4. Regression analysis

Analyze the relationships between x=Flavonoids and y=Phenols.

Remember that for this type of analysis you need to present the coefficients of correlation and determination, the linear regression formula, and a scatter plot.

Remember to follow the task rules.
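The pieces requested in Task 4 (correlation, determination, regression formula, scatter plot) might be assembled along these lines. The data below are simulated purely for illustration, with column names assumed to match the variable names in the task, so the numbers carry no meaning for the real wine data.

```r
# Simulated stand-in data; the real values come from the wine file.
set.seed(3)
flavonoids <- runif(30, min = 0.5, max = 4)
wine <- data.frame(
  Flavonoids = flavonoids,
  Phenols    = 1.1 + 0.55 * flavonoids + rnorm(30, sd = 0.3)
)

# Coefficient of correlation (Pearson's r)
r_value <- cor(wine$Flavonoids, wine$Phenols)

# Simple linear regression: y = Phenols, x = Flavonoids
fit       <- lm(Phenols ~ Flavonoids, data = wine)
r_squared <- summary(fit)$r.squared          # coefficient of determination
intercept <- coef(fit)[["(Intercept)"]]
slope     <- coef(fit)[["Flavonoids"]]

# Regression formula as text, ready for inline R code
regression_formula <- sprintf("Phenols = %.3f + %.3f * Flavonoids",
                              intercept, slope)

# Scatter plot with the fitted line
plot(wine$Flavonoids, wine$Phenols,
     xlab = "Flavonoids", ylab = "Phenols",
     main = "Phenols versus Flavonoids", pch = 19)
abline(fit, col = "red", lwd = 2)
```

For a regression with a single predictor, the coefficient of determination equals the square of the correlation coefficient, which gives a simple consistency check between r_value and r_squared.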

Task 5. ANOVA analysis 1

1.   Start this task by presenting and analyzing a box plot for wine Type versus Magnesium.

2.   Find some references on why magnesium is important for winemakers and why it is used as a measure of quality.

3.   Then run a one-way ANOVA to test the hypothesis that there are differences in magnesium content between the three wine types under study.

4.   Note: To perform this analysis, you do not need to filter the data. You can pass the variables directly to aov(), as I explained in class (also read my lecture and Bluman's book).

Remember to follow the task rules.
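A minimal sketch of the box plot and one-way ANOVA steps in Task 5 is shown below. The data are simulated, the type names other than California are placeholders, and the 0.05 significance level is an assumption rather than something stated in the assignment.

```r
# Simulated stand-in data; "TypeB" and "TypeC" are placeholder names.
set.seed(4)
wine <- data.frame(
  Type      = factor(rep(c("California", "TypeB", "TypeC"), each = 10)),
  Magnesium = c(rnorm(10, mean = 106, sd = 10),
                rnorm(10, mean = 95,  sd = 10),
                rnorm(10, mean = 99,  sd = 10))
)

# Box plot of magnesium by wine type
boxplot(Magnesium ~ Type, data = wine,
        xlab = "Wine type", ylab = "Magnesium",
        main = "Magnesium by wine type")

# One-way ANOVA using the formula interface; no filtering is needed
# H0: all type means are equal; H1: at least one mean differs
anova_fit   <- aov(Magnesium ~ Type, data = wine)
anova_table <- summary(anova_fit)[[1]]

f_value <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]

alpha    <- 0.05   # assumed significance level
decision <- if (p_value < alpha) "Reject H0" else "Fail to reject H0"
```

The objects f_value, p_value, and decision can then be referenced in the written observations through inline R code.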

Part 3. Conclusions and Bibliography.

a. Write a very informative conclusions section following the guidance and recommendations given by your instructor.

Be sure to make an overall observation about the whole project, explain what the results you obtained mean for the direction of the project, and describe any new skills you gained.

b. Present a bibliography section with the references you used in your report.

Technically speaking, if you do not cite any references in the main text of your report, it is as if you did not use any, even if you add a list at the end. Cite references in the main text of your report using one of two styles. Either use the first author's last name and year, e.g., (Bluman, 2017), and list the references in the bibliography section in alphabetical order, or use a number in order of appearance and list the references in the bibliography section in that numerical order.

Part 4. Appendix and Acknowledgments

a. Present an appendix title to mention the Rmd document you are attaching to your report.

b. For a final project, it is a courtesy to add an acknowledgments section to thank anybody who you feel helped you in any way during the project and the whole course (classmates, TA, instructor, advisor, etc.).

Appendix 1. Proper use of object names in R Markdown