COMP60711 Data Engineering RESIT

Section A

a) Give two arguments to persuade an IT Company’s CEO to invest financial resources in a new Data-Lake-based IT Architecture for his company. Make sure you add a few lines explaining each argument and justifying its relevance and potential impact on the IT Company.

b) Explain why you agree or disagree with the following statement, illustrating your answer with an example:

“Some data preparation strategies can have a negative impact on data analysis.”

c) Describe how parallelism can be used to scale out the calculation of the following data profiling operations, emphasising any challenges:

i. Second Quartile.

ii. Mean.

d) Give an example of an unclean dataset that is submitted to a Data Transformation step (in a Data Cleaning Process) and, as a result, ends up with more data discrepancies. Make sure you describe the dataset by providing schema information, instances (i.e., values), and the reason why the data is unclean, as well as details of the data transformation applied to the data and the discrepancy that the transformation generates.

e) Provide two example situations where multi-column data profiling is useful, providing an explanation for each.

Section B

a) In the context of business intelligence:

(i) Characterise the differences between OLTP and OLAP. Why might different database systems be used for each type?

(ii) Compare and contrast the use of row store and column store.

(iii) Discuss the current state of data warehousing and how it has adapted to the emerging ‘big and complex data’ revolution, including many, often dynamic, data sources and new requirements for analytics.

b) In the context of classification:

i) Outline a decision tree classification algorithm; discuss how the attribute used at each node is chosen and what effect different training sets may have.

ii) Tests on a classifier give the following confusion matrix: