
INST0069

Exercises on graph databases

Learning outcomes to be assessed:

-    Understand the underlying ideas of graph databases

-    Be able to create, manage and query simple graph databases

Basic Assignment

Description

The goal is to create appropriate Cypher statements that retrieve information from a graph database and implement simple recommendation algorithms.

Submission

Requirements

Each student must submit:

-    A report (in PDF) containing the code of the Cypher statements that you created and some explanatory comments on the recommendation algorithms that you implemented.

The name of the submitted file must consist of your student number (SRN, which can be found on your ID card), the module code and the Assignment Code, in that order, with no spaces, e.g. "123456-INST0069-CW1.pdf".

Do not include your name anywhere in your submission.

The first page of your report must give the following information: (1) your student number, (2) the module code and title (INST0069 Graph Databases & Semantic Technologies), (3) the lecturer’s name, (4) the assignment you are submitting, 'CW1: Exercises on graph databases'.

Please submit this assessment electronically at https://moodle.ucl.ac.uk/mod/assign/view.php?id=5303930 before 3pm on Monday, 11 March 2024.

Conditions

This assessment forms part of your degree assessment. It must be done entirely on your own from start to finish:

-    You must not collaborate or work with other students at any stage.

-    You must not send or show other students your answers.

-    You must not ask other students for help, or ask to see their answers. This is unfair to the other student concerned, since it may lead to them being accused of plagiarism.

-    You must not seek help from friends, relatives, or anyone other than the lecturer and/or TAs for INST0069.

-    You must not use any generative AI tools (such as ChatGPT or Co-Pilot) at any stage in the preparation of your submission.

If you are having difficulty in attempting this assessment, you should discuss this in the first instance with the lecturer.

This assignment is worth 50% of the overall assessment for this course.

This assignment must be completed as an individual piece of work.

Date work set (provisional): 27/01/2024

Date and time due in (provisional): 3pm on Monday, 11 March 2024

The standard lateness penalty will apply.

Target date for return of marked work and full feedback (provisional): 8 April 2024 (tentatively within 4 working weeks, according to DIS policy)

Description

Using the Cypher statements given in the next section, you will create a graph database in Neo4j Desktop describing a fictitious social network where users share information about the books they have read. In this network, users can follow other users and rate the books they have read. The database contains information about the users (username and age), the books (id, title, genre, author and publisher), the “follows” relationships among the users, the “read” relationships between the users and the books, and the ratings that users have given to the books they have read.

For the first exercise, you will create appropriate Cypher statements to retrieve information from the database (as described in the exercise). For the second exercise, you will design some simple recommendation algorithms that suggest, to a user of the network, other users to follow or books to read, and implement these algorithms using Cypher statements.

Your report must contain:

-    The Cypher statements that you created for exercise 1.

-    The Cypher statements that you created for exercise 2, along with a brief description of the recommendation algorithms they implement.

Setup

Before you attempt the exercises, follow the steps below to create the graph database in Neo4j Desktop. Examine the CSV files and the database that you created to verify that the database has been correctly implemented and to familiarise yourself with its structure.

1.   Using Neo4j Desktop, create a new graph database and name it Book Graph.

2.   Download the files books.csv, users.csv, followers.csv and ratings.csv from Moodle and copy them to the import folder of your database.

3.   Run the following Cypher statements (one at a time) to populate the database with data about books, the users of a social network, the relationships among the users and the relationships between the users and the books.

a.   LOAD CSV WITH HEADERS FROM 'file:///books.csv' AS book
     CREATE (:Book {bookID: book.BookId, title: book.Title, genre: book.Genre, author: book.Author, publisher: book.Publisher})

This should create 101 nodes with label Book, each with a bookID, a title, a genre, an author and a publisher property.


b.   LOAD CSV WITH HEADERS FROM 'file:///users.csv' AS user
     CREATE (:User {username: user.Username, age: toFloat(user.Age)})

This should create 26 nodes with label User, each with a username and an age property.

c.   LOAD CSV WITH HEADERS FROM 'file:///followers.csv' AS fol
     MATCH (u1:User {username: fol.User1}), (u2:User {username: fol.User2})
     CREATE (u1)-[:FOLLOWS]->(u2)

This should create 100 :FOLLOWS relationships among users.

d.   LOAD CSV WITH HEADERS FROM 'file:///ratings.csv' AS rat
     MATCH (u:User {username: rat.User}), (b:Book {bookID: rat.Book})
     CREATE (u)-[:READ {rating: toInteger(rat.Rating)}]->(b)

This should create 199 :READ relationships between users and books each with a rating property.
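One way to check the counts stated above is to count nodes by label and relationships by type. This sketch assumes the database contains only the data loaded in steps a to d. Any other nodes or relationships would also be counted. Run the two statements one at a time, like the setup statements.

```cypher
// Count nodes for each label combination.
// According to the setup notes, expect 101 Book nodes and 26 User nodes.
MATCH (n)
RETURN labels(n) AS nodeLabels, count(*) AS nodeCount;

// Count relationships for each relationship type.
// According to the setup notes, expect 100 FOLLOWS and 199 READ relationships.
MATCH ()-[r]->()
RETURN type(r) AS relType, count(*) AS relCount;
```

If a count is lower than expected, a likely cause is a MATCH in step c or d that found no node for some CSV row. In that case, no relationship is created for that row.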

Exercise 1

Create Cypher queries to:

1.   List the titles of the books that have been read both by Charles and by a user whose age is more than 20, and that have received a rating greater than 2 from both of them.

2.   List the titles and authors of the books that have been published by MIT Press, Penguin, Springer or Wiley and whose genre is fiction, history, mathematics or economics. Show the results in alphabetical order of title.

3.   For each pair of users such that one follows the other, list the titles and the publishers of the books that they have both read.

4.   List the names of users who follow Fiona and have read more than 10 books. For each such user, show also the number of books they have read.

5.   List all publishers such that the average rating of the books they have published is higher than the average rating of the books published by Pearson.

6.   List the nodes in the shortest path from Adam to Lilly.

7.   Show the maximum distance from a user to a science book, where the distance from node A to node B is the length of the shortest path from A to B.

8.   List the titles of the books for which the publisher is not known and, for each of them, the list of names of the users who have read them. For each such book, add the label UknownPublisher.

9.   List the names of the users that are followed by Fiona and, if they have read any nonfiction books, the list of titles of those books.

10.  A book is considered popular if it has been read by more than 4 users and it has received at least two ratings that are greater than 3. List the titles of the popular books.

Exercise 2

1.  Write down two algorithms that provide recommendations of users to follow, using the available information about users and books, and implement each of them as a Cypher statement. Each statement should create new RECOMMENDED_USER relationships, each connecting a user with a recommended user to follow. The recommended users should not include those that the user already follows.

2.  Write down two algorithms that provide recommendations of books, using the available information about users and books, and implement each of them as a Cypher statement. Each statement should create new RECOMMENDED_BOOK relationships, each connecting a user with a recommended book. The recommended books should not include those that the user has already read.

Marking Criteria and Procedure

This set of exercises counts as 50% of the total course assessment. Exercise 1 is worth 30% (marks are divided equally among its subquestions) and Exercise 2 is worth 20% (marks are divided equally among its subquestions). Marks will be awarded according to:

-    whether the answers are technically correct (i.e. the syntax of the Cypher statements is correct and the statements produce the correct results)

-    whether the answers given are as straightforward as possible and not more complicated than necessary

-    whether the answers are set out clearly and in good style

-    (for Exercise 2) whether the recommendation algorithms are clearly described and correctly implemented and the recommendations they produce are reasonable

Each submission will first be marked according to the criteria given above, and a sample of submissions will also be second-marked, using open and check marking, in accordance with the guidelines in https://www.ucl.ac.uk/academic-manual/chapters/chapter-4-assessment-framework-taught-programmes/section-4-marking-moderation#4.6