
ISIT312 Big Data Management

Assignment 1

Spring 2023

Published on 24 July 2023

Scope

This assignment includes tasks related to the implementation of an HDFS application and the implementation of MapReduce applications.

This assignment is due on Saturday, 19 August 2023, 7:00pm (sharp).

This assignment is worth 10% of the total evaluation in the subject.

The assignment consists of 4 tasks, and the specification of each task starts on a new page.

Only electronic submission through Moodle at:

https://moodle.uowplatform.edu.au/login/index.php

will be accepted. The submission procedure is explained at the end of the Assignment 1 specification.

A policy regarding late submissions is included in the subject outline.

Only one submission of Assignment 1 is allowed, and only one submission per student is accepted.

A submission marked by Moodle as "late" is always treated as a late submission no matter how many seconds it is late.

A submission with an incorrect file attached is treated as a correct submission, with all the consequences that follow from the evaluation of the attached file.

All files left on Moodle in the state "Draft (not submitted)" will not be evaluated.

Submission of compressed files (zipped, gzipped, rared, tarred, 7-zipped, lhzed, etc.) is not allowed. Compressed files will not be evaluated.

An implementation that does not compile or run because of one or more syntax and/or runtime errors scores no marks.

The first assignment is an individual assignment, and all its tasks are expected to be solved individually, without any cooperation with other students. However, you may declare in the submission comments that a particular component or task of this assignment has been implemented in cooperation with another student. In such a case, the evaluation of that task or component may be shared with the other student. In all other cases, plagiarism will result in a FAIL grade being recorded for the entire assignment. If you have any doubts or questions, please consult your lecturer or tutor during laboratory/tutorial classes or by e-mail.

Task 1 (1 mark)

Merging files in HDFS

Read and analyse the HDFS applications provided in the files FileSystemCat.java and FileSystemPut.java, available in the folder Resources attached to the specification of the laboratory class for Week 2 on Moodle.

Use the applications FileSystemCat.java and FileSystemPut.java to implement, in Java, an HDFS application that merges two files located in HDFS into one file, also located in HDFS.

The application must have the following parameters.

(1) The path to, and the name of, the first input file in HDFS.

(2) The path to, and the name of, the second input file in HDFS.

(3) The path to, and a new name of, the output file to be created in HDFS. The file is supposed to contain the contents of the first input file followed by the contents of the second input file.
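Parameter (3) amounts to copying all bytes of the first input stream, then all bytes of the second, into one output stream. A minimal plain-Java sketch of that step follows, assuming the three streams are opened elsewhere; it is not a complete HDFS application. In Hadoop, FileSystem.open(Path) returns an FSDataInputStream and FileSystem.create(Path) returns an FSDataOutputStream, both ordinary java.io streams that could be passed to such a helper. The class name StreamMerge is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: writes the contents of `first` followed by the contents of `second` to `out`.
final class StreamMerge {
    private StreamMerge() {}

    // Copies every remaining byte of `in` to `out` using a fixed-size buffer.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }

    // Appends `first`, then `second`; closing the streams is left to the caller.
    static void merge(InputStream first, InputStream second, OutputStream out) throws IOException {
        copy(first, out);
        copy(second, out);
        out.flush();
    }
}
```

Hadoop's org.apache.hadoop.io.IOUtils.copyBytes(in, out, bufferSize, close) could replace the hand-written copy loop; passing close = false keeps the output stream open between the two copies.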

Implement the application and save its source code in a file solution1.java.

Upload two files to HDFS. The contents, names, and locations of the files in HDFS are up to you.

When ready, compile the application, create a jar file, and run the application. Display the results created by the application.

Use Hadoop to provide evidence that the two files uploaded into HDFS have been successfully merged into one file in HDFS.

Deliverables

A file solution1.txt that contains a listing of the source code of your application, a report from compilation, creation of the jar file, uploading two small test files to HDFS, a listing of both files in HDFS, running the application, and evidence that the two files uploaded into HDFS have been successfully merged into one file in HDFS. The file solution1.txt must be created by copying and pasting the contents of the Terminal window into solution1.txt. No screen dumps are allowed and no screen dumps will be evaluated.

Task 2 (2 marks)

Implementation of a simple MapReduce application

Read and analyse the MapReduce application provided in the file Filter.java, available in the folder Resources attached to the specification of the laboratory class for Week 3 on Moodle.

The functionality of the application is equivalent to that of the following SQL statement:

SELECT key, value
FROM sequence-of-key-value-pairs
WHERE value > given-value;

The objective of this task is to use the Java code provided in the file Filter.java to implement a MapReduce application Solution2 whose functionality is equivalent to that of the following SQL statement:

SELECT item-name, price-per-unit * total-units
FROM sales.txt
WHERE price-per-unit * total-units > given-value;

Each line in the input data set sales.txt must have the following format:

item-name price-per-unit total-units

For example:

bolt 2 25
washer 3 8
screw 7 20
nail 5 10
screw 7 2
bolt 2 20
bolt 2 30
drill 10 5
washer 3 8

The contents of the file sales.txt are up to you, as long as they are consistent with the format explained above.

The value of given-value must be passed as a parameter of your program.
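Per input line, the statement above comes down to: split the line on blanks, multiply price-per-unit by total-units, and keep the pair only when the product exceeds given-value. A minimal plain-Java sketch of that per-record logic follows, assuming well-formed lines with integer prices and quantities; the class and method names are hypothetical. In a Hadoop job this is the kind of check a mapper would make; one common way to get given-value to the mappers is to read it from the command line and store it in the job Configuration (setLong in the driver, getLong via context.getConfiguration() in the mapper).

```java
import java.util.Optional;

// Hypothetical per-record filter for lines of the form: item-name price-per-unit total-units
final class SalesFilter {
    private SalesFilter() {}

    // Returns "item-name<TAB>total" when price * units > givenValue, otherwise empty.
    static Optional<String> apply(String line, long givenValue) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 3) {
            return Optional.empty(); // skip blank or malformed lines instead of failing
        }
        long total = Long.parseLong(fields[1]) * Long.parseLong(fields[2]);
        return total > givenValue ? Optional.of(fields[0] + "\t" + total) : Optional.empty();
    }
}
```

With given-value 50 and the sample lines above, only screw 7 20 (140) and bolt 2 30 (60) pass; bolt 2 25, nail 5 10 and drill 10 5 produce exactly 50 and are dropped because the comparison is strict.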

Save your solution in a file Solution2.java.

When ready, list Solution2.java in the Terminal window, compile it, create a jar file, and run the application. List the input data set sales.txt and the results created by the application in the Terminal window. When completed, copy and paste all messages from the Terminal screen into a file solution2.txt.

Deliverables

A file solution2.txt with a listing of the source code of your application, a report from compilation, creation of the jar file, running the application, a listing of the file sales.txt, and a listing of the results produced by the MapReduce application Solution2.java. The file solution2.txt must be created by copying and pasting the contents of the Terminal window into solution2.txt. No screen dumps are allowed and no screen dumps will be evaluated.

Task 3 (3 marks)

Implementation of a simple MapReduce application

Read and analyse the MapReduce application provided in the file MinMax.java, available in the folder Resources attached to the specification of the laboratory class for Week 3 on Moodle.

The functionality of the application is equivalent to that of the following SQL statement:

SELECT key, MIN(value), MAX(value)
FROM sequence-of-key-value-pairs
GROUP BY key;
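In MapReduce terms, GROUP BY key corresponds to the shuffle, which delivers all values for one key to a single reduce call; MIN and MAX are then computed over that group. A minimal plain-Java sketch of the reduce-side computation follows; it does not reproduce MinMax.java itself, and the class name is hypothetical.

```java
// Hypothetical reduce-side logic: minimum and maximum of all values grouped under one key.
final class MinMaxReduce {
    private MinMaxReduce() {}

    // Returns {min, max}; assumes it is never called with an empty group,
    // which matches how a reducer is only invoked for keys that have values.
    static long[] minMax(Iterable<Long> values) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long v : values) { // auto-unboxing of each Long
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new long[] {min, max};
    }
}
```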

The objective of this task is to use the Java code provided in the file MinMax.java to implement a MapReduce application Solution3 whose functionality is equivalent to that of the following SQL statement:

SELECT item-name, SUM(price-per-unit * total-units)
FROM sales.txt
GROUP BY item-name;

Each line in the input data set sales.txt must have the following format:

item-name price-per-unit total-units

For example:

bolt 2 25
washer 3 8
screw 7 20
nail 5 10
screw 7 2
bolt 2 20
bolt 2 30
drill 10 5
washer 3 8

The contents of the file sales.txt are up to you, as long as they are consistent with the format explained above.
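Compared with the MinMax example, only the emitted pairs and the aggregate change: the key is item-name, the value is price-per-unit * total-units, and the reduce step adds the values up. A minimal plain-Java sketch that simulates this map, group, and sum pipeline in memory follows; it is an illustration assuming well-formed lines, not a Hadoop job, and the class name is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of:
// SELECT item-name, SUM(price-per-unit * total-units) FROM sales.txt GROUP BY item-name;
final class SalesSum {
    private SalesSum() {}

    static Map<String, Long> sumByItem(List<String> lines) {
        Map<String, Long> totals = new TreeMap<>(); // keys kept sorted, as in a reducer's input
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length != 3) {
                continue; // skip blank or malformed lines
            }
            // "map": key = item-name, value = price-per-unit * total-units
            long value = Long.parseLong(f[1]) * Long.parseLong(f[2]);
            // "reduce": add the value to the running sum for its key
            totals.merge(f[0], value, Long::sum);
        }
        return totals;
    }
}
```

For the sample data above this yields bolt 150, drill 50, nail 50, screw 154 and washer 48.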

Save your solution in a file Solution3.java.

When ready, list Solution3.java in the Terminal window, compile it, create a jar file, and run the application. List the input data set sales.txt and the results created by the application in the Terminal window. When completed, copy and paste all messages from the Terminal screen into a file solution3.txt.

Deliverables

A file solution3.txt with a listing of the source code of your application, a report from compilation, creation of the jar file, running the application, a listing of the file sales.txt, and a listing of the results produced by the MapReduce application Solution3.java. The file solution3.txt must be created by copying and pasting the contents of the Terminal window into solution3.txt. No screen dumps are allowed and no screen dumps will be evaluated.

Task 4 (4 marks)

Implementation of MapReduce application

Assume that a bank records in a text file the withdrawals and deposits of certain amounts of money from bank accounts. A single row in the file of withdrawal/deposit records consists of an account number, the date when the withdrawal/deposit occurred, and the amount of money involved. Assume that withdrawals are represented by negative numbers, deposits by positive numbers, and that every amount is a multiple of 50 (amount modulo 50 = 0). All values in a single record are always separated by a single blank.
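Each record therefore splits on single blanks into three fields, and the year is the last four characters of a date written as DD-MON-YYYY, as in the sample records in this task. A minimal plain-Java sketch of parsing and validating one record follows; the class and method names are hypothetical, and it assumes every date uses exactly that 11-character layout.

```java
// Hypothetical parsed form of one line: account-number date amount
final class BankRecord {
    final String account;
    final int year;
    final long amount;

    private BankRecord(String account, int year, long amount) {
        this.account = account;
        this.year = year;
        this.amount = amount;
    }

    // Parses e.g. "1234567 12-DEC-2019 200"; throws IllegalArgumentException on malformed input.
    static BankRecord parse(String line) {
        String[] f = line.split(" "); // values are separated by exactly one blank
        if (f.length != 3) {
            throw new IllegalArgumentException("expected 3 fields: " + line);
        }
        String date = f[1]; // DD-MON-YYYY, e.g. 12-DEC-2019
        if (date.length() != 11) {
            throw new IllegalArgumentException("unexpected date: " + date);
        }
        int year = Integer.parseInt(date.substring(7)); // characters 7..10 hold the year
        long amount = Long.parseLong(f[2]);
        if (amount % 50 != 0) {
            throw new IllegalArgumentException("amount is not a multiple of 50: " + amount);
        }
        return new BankRecord(f[0], year, amount);
    }

    // Deposits are positive amounts; withdrawals are negative.
    boolean isDeposit() {
        return amount > 0;
    }
}
```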

The objective of this task is to implement a MapReduce application Solution4 that finds the total amount of money deposited by each customer per year. For example, if a sample file with the withdrawals and deposits contains the following lines:

1234567 12-DEC-2019 200
1234567 15-DEC-2019 50
9876543 25-JUL-2018 150
9876543 12-FEB-2018 -50
9876543 01-JAN-2019 150
1234567 21-OCT-2020 -250
9876543 22-OCT-2019 300

then your application is supposed to produce the following output:

1234567 2019 250
9876543 2018 150
9876543 2019 450

The order of the lines listed above is up to you.
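The sample output implies that withdrawals (negative amounts) are ignored and that an account/year pair with no deposits, such as 1234567 in 2020, produces no line. Under that reading, one way to obtain the result is to use the account number and year together as the key, emit only positive amounts as values, and sum per key. A minimal plain-Java sketch follows; the names are hypothetical and the in-memory map stands in for the shuffle and reduce steps of a real job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of deposit totals per account per year.
final class DepositTotals {
    private DepositTotals() {}

    static List<String> compute(List<String> lines) {
        Map<String, Long> totals = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.split(" "); // account-number date amount
            if (f.length != 3) {
                continue; // skip blank or malformed lines
            }
            long amount = Long.parseLong(f[2]);
            if (amount <= 0) {
                continue; // withdrawals are not deposits
            }
            String year = f[1].substring(f[1].length() - 4); // last 4 characters of DD-MON-YYYY
            // composite key "account year", playing the role of the mapper's output key
            totals.merge(f[0] + " " + year, amount, Long::sum);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            out.add(e.getKey() + " " + e.getValue());
        }
        return out;
    }
}
```

On the seven sample records it returns the three expected lines, here sorted by key.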

Upload a small test file to the local file system for future testing. The file must contain withdrawals and deposits, and its internal structure must be the same as explained and shown above. The name and location of the file in the local file system are up to you.

Save your solution in a file Solution4.java.

When ready, list Solution4.java in the Terminal window, compile it, create a jar file, and run the application. List the input data set with information about deposits and withdrawals and the results created by the application in the Terminal window. When completed, copy and paste all messages from the Terminal screen into a file solution4.txt.

Deliverables

A file solution4.txt with a listing of the source code of your application, a report from compilation, creation of the jar file, running the application, a listing of the file with information about deposits and withdrawals, and a listing of the results produced by the MapReduce application Solution4.java. The file solution4.txt must be created by copying and pasting the contents of the Terminal window into solution4.txt. No screen dumps are allowed and no screen dumps will be evaluated.

Submission of Assignment 1

Note that you have only one submission, so make absolutely sure that you submit the correct files with the correct contents. No other submission is possible!

Submit the files solution1.txt, solution2.txt, solution3.txt, and solution4.txt through Moodle in the following way:

(1)  Access Moodle at  http://moodle.uowplatform.edu.au/

(2) To log in, use the Login link located in the upper right corner of the Web page or in the middle of the bottom of the Web page.

(3) When logged in, select the site ISIT312/912 (S223) Big Data Management.

(4)  Scroll down to a section Assessment items (Assignments)

(5) Click the link labelled In this place you can submit the outcomes of your work on the tasks included in Assignment 1.

(6) Click the button Add Submission.

(7) Drag the file solution1.txt into the area You can drag and drop files here to add them. You can also use the link Add…

(8) Repeat step (7) for the remaining files solution2.txt, solution3.txt, and solution4.txt.

(9) Click the button Save changes.

(10) Tick the checkbox labelled By checking this box, I confirm that this submission is my own work, to confirm the authorship of your submission.

(11) Click the button Continue.

(12) Check that the Submission status is Submitted for grading.