Study, Implement and Present a Machine Learning Model

Introduction

The objective of this assignment is to enhance both your practical skills in machine learning and your understanding of the technical details involved in the learning process.

You will be tasked with i) performing a machine learning task, and ii) presenting the results of your project while being prepared to address related questions.

Specification

There are generally two categories of acceptable tasks for this assignment, based on the primary effort needed to complete them. This classification serves as a guideline to focus your study efforts within the project. You may define your effort as a mixed type, and you will not be penalised for 'crossing boundaries'.

Option 1, study a fundamental machine learning model: The focus is the theoretical fundamentals, including notions and techniques of hypothesis space, learning algorithms, loss function design etc. To perform a project of this type, you need to demonstrate your understanding of the technical details of a specific machine learning model. Your study should be conducted within the context of the learning theory framework introduced in the subject.

Option 2, build a machine learning system as a solution to a practical challenge: The focus is on aligning your understanding of machine learning systems with practical applications. “Alignment” and “Practicality” are the two essential factors in this type of project. To fulfil this project type, you must design a learning-based system to tackle a problem involving real-world data. This task can be anything from predicting stock prices to identifying patterns in images or text. You should formulate the task into a computable form and ensure the learning objective (loss) is both computationally effective and practically relevant.

An acceptable demonstration should include a computer program of a learning model, which can be executed from a cloud-based computational platform. In your presentation, you need to explain the key components of the machine learning model and their implementation as functions / objects in the computer program. You also need to be able to discuss the main technical challenges encountered in your study, as well as your understanding of the benefits and drawbacks of the machine learning model.

Deliverables (submissions)

This project will have three primary deliverables: (1) a computer program, (2) a project journal, and (3) materials for presentation. The computer program serves as a formal demonstration of your project outcome. The project journal must be uploaded to Canvas by 11:59 pm on October 15, 2023. You will submit a PDF file, including a link to a cloud-based computational service where your implementation, with a version history prior to the date above, can be assessed and evaluated. Colab is recommended (https://colab.research.google.com/).

The submitted journal will be used as a reference in the demonstration. You are encouraged to consult AI assistant services such as ChatGPT to resolve technical issues during the study and project implementation. AI-generated code and notes may be used in the project without modification (but only after a critical review and correctness check!).

However, careful documentation in the project journal is crucial to ensure you can demonstrate your understanding of all technical details of the project, especially if you employ AI-generated content. In the presentation, which includes both the demonstration and addressing questions, the efforts recorded in the project journal can be considered as evidence of fulfilling the criteria (see details below).

As to “(3) presentation materials”, you have the flexibility to select the appropriate format, such as slides or well-structured Jupyter notebooks.

Project Presentation and Grading

Both assessments 2 and 3 will be graded based on demonstrations during interactive sessions in Weeks 11 and 12. Evaluation criteria for A2 are concerned with the project quality and your understanding of the details. They include the following aspects:

A. Clear definition of the learning task

You are required to clearly define, and answer questions about, the information exchange interface between the system and its environment. This requires specifying the program's inputs and anticipated outputs. Note that most machine learning programs consist of two phases: training and deployment. The interfaces for each of these phases should be distinctly outlined. Remark for Option 2 projects: Defining the interface involves posing a coherent and relevant question that can be answered through data-driven computational models. A clear and compelling articulation will satisfy the 'alignment' criterion.
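As an illustration only, the two interfaces can be written down as two function signatures. The sketch below uses a deliberately trivial majority-class model; the names `train`, `predict`, `Features` and `MajorityClassModel` are hypothetical and are not prescribed by this assignment.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical data representation: one example is a sequence of numbers.
Features = Sequence[float]
Label = int


@dataclass
class MajorityClassModel:
    """Toy model that always predicts the most common training label."""
    predicted_label: Label


def train(examples: List[Tuple[Features, Label]]) -> MajorityClassModel:
    """Training interface: labelled examples in, fitted model out."""
    labels = [label for _, label in examples]
    most_common = max(set(labels), key=labels.count)
    return MajorityClassModel(predicted_label=most_common)


def predict(model: MajorityClassModel, x: Features) -> Label:
    """Deployment interface: fitted model and one unlabelled input in, label out."""
    return model.predicted_label


model = train([([1.0], 0), ([2.0], 1), ([3.0], 1)])
prediction = predict(model, [9.0])
```

The point of the sketch is that training consumes labels while deployment does not; a real project would state the same distinction for its own inputs and outputs.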

B. Knowledge of the data model and learning algorithm

You need to show sufficient familiarity with the essential components of the computational procedure of the machine learning model(s) that you used or studied in the project. The focus is on articulating the design and function of key algorithmic components.

A rigorous and insightful exposition of the core computation steps, that is, a clear explanation of how a program function is connected to the theory of a model, will fulfil the “explain the key components” requirement.

C. Evaluation and improvement

You need to specify the intended behaviour of the model, explain the chosen loss function, and identify the difference (if any) between the loss function and the practical objectives of the task. Note that the “difference” means that the model's target performance characteristics may or may not be the same as the “loss function” that describes the quantifiable metric used for optimization. When discussing disparities, address whether the loss function fully encapsulates the task goal or if additional measures are required.
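To make the possible gap between the loss and the practical objective concrete, here is a minimal sketch under stated assumptions: the loss is binary cross-entropy, the practical objective is classification accuracy at a 0.5 threshold, and the labels and predicted probabilities are made up for illustration.

```python
import math


def log_loss(labels, probabilities):
    """Mean binary cross-entropy, a loss a classifier might be trained to minimise."""
    total = sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(labels, probabilities)
    )
    return -total / len(labels)


def accuracy(labels, probabilities, threshold=0.5):
    """Fraction of correct decisions, a metric the task owner might actually care about."""
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    return sum(int(y == y_hat) for y, y_hat in zip(labels, predictions)) / len(labels)


# Made-up example: four labelled points and two hypothetical models.
labels = [1, 1, 0, 0]
cautious = [0.51, 0.51, 0.49, 0.49]   # barely on the right side every time
confident = [0.99, 0.99, 0.01, 0.60]  # very sure, but wrong on the last point

for name, probs in [("cautious", cautious), ("confident", confident)]:
    print(name, "loss:", round(log_loss(labels, probs), 3),
          "accuracy:", accuracy(labels, probs))
```

In this made-up case the `confident` model has the lower loss while the `cautious` model has the higher accuracy, so minimising the loss alone would not select the model that best serves an accuracy-based goal.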

The three criteria carry equal weight in the assessment. Your grade will be determined by peer students and a moderator based on your demonstration and your answers to questions about the project. Study notes and implementation logs documented in the submitted project journal will serve as credible evidence of competence. Each section's grade will be calculated as a percentage of fulfillment, ranging from 0% to 100%. The final grade for A2 will be: (grade_A + grade_B + grade_C) / 3 * 50.
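The formula can be checked with a short calculation. The sketch below assumes that each percentage is entered as a fraction between 0 and 1 (so 80% is 0.8), which makes 50 the maximum A2 grade; the function name `weighted_grade` is hypothetical.

```python
def weighted_grade(fulfilment, weight):
    """Average the per-criterion fulfilment fractions and scale by the weight.

    Assumption: each entry is a fraction in [0, 1], e.g. 0.8 for 80%.
    """
    if not fulfilment:
        raise ValueError("at least one criterion is required")
    for fraction in fulfilment:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fulfilment must be between 0 and 1")
    return sum(fulfilment) / len(fulfilment) * weight


# Hypothetical student: 80%, 90% and 70% on criteria A, B and C.
a2_grade = weighted_grade([0.8, 0.9, 0.7], weight=50)
```

With these hypothetical inputs the average fulfilment is 80%, so the A2 grade comes to about 40 out of 50.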

Evaluation criteria for A3 assess your communication skills and engagement in the peer review session to ask critical questions and provide constructive comments. A3 has the following criteria:

A. Clarity of presentation: This assesses your ability to articulate thoughts clearly and logically. The logical flow of the presentation should be well organised, e.g., including introduction, main points, and conclusion.

You should be able to communicate technical notions used in the project with effective means such as graph illustration.

B. Response to questions: You understand the questions asked and respond to them in a direct manner.

C. Peer interaction: You engage actively in post-presentation discussions or collaborative activities.

All criteria carry equal weight in the assessment. Each section's grade will be calculated as a percentage of fulfillment, ranging from 0% to 100%. The final grade for A3 will be: (grade_A + grade_B + grade_C) / 3 * 20.

Examples and feedback

There are no prior examples of this assignment. However, suggestions on study topics and journal keeping will be provided on the Canvas site. Advice will be provided through various channels depending on the instructor’s resources and student engagement.

Please consider the suggestions as helpful guidelines and AVOID using them as strict project templates. This applies to the journal document in particular. The journal should be a personal reference for presenting and defending your project in the viva session. It is reasonable to organise the document in whatever way most effectively stores your personal learning experience and knowledge.

Further clarification

Computer program: The project must be self-sufficient, encompassing data retrieval, processing, and environment setup within its implementation. For large datasets exceeding 10 GB, consider pre-loading the data onto your Google Drive for expedient access via Colab, and start the Colab notebook (Virtual Machine) prior to your demonstration. If you opt for a demonstration from a personal device, you bear the responsibility for the environment's reliability. Hardware/system failures, such as a malfunctioning laptop, will be considered equivalent to an implementation failure.
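One way to keep data retrieval inside the program is a small download-and-cache helper, sketched below with the Python standard library only. The function name `fetch_dataset`, the `data` directory and the example URL are hypothetical; substitute the real location of your dataset.

```python
from pathlib import Path
from urllib.request import urlretrieve


def fetch_dataset(url: str, cache_dir: str = "data") -> Path:
    """Download the file at `url` into `cache_dir` unless it is already there."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    # Use the last path component of the URL as the local file name.
    target = cache / url.rsplit("/", 1)[-1]
    if not target.exists():
        urlretrieve(url, target)
    return target


# Hypothetical usage (placeholder URL, not a real dataset):
# csv_path = fetch_dataset("https://example.com/my_dataset.csv")
```

Because the helper skips the download when the file is already cached, re-running the notebook before a demonstration does not fetch the data again.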

Furthermore, ensure that your local demonstration aligns with an archived online version (e.g., a GitHub repository) timestamped before the A2 deadline.

For the purpose of demonstration, you may find it convenient to keep the computer program well organised and annotated (this is in addition to the project journal; see the in-class notebook demonstrations for examples).

Implementation with technical details (A2 option 1): The implementation should include detailed computational steps of an algorithm. We do NOT consider straightforward usage of off-the-shelf toolboxes as implementing the algorithm details. For example, if you choose to build a decision tree, the implementation of the tree-building algorithm should address the construction of the tree structure and the computation for splitting the data (a subset of the training dataset) at a tree node to create the child nodes. In ID3, this means computing the entropy and the information gain and deciding the split accordingly. However, you are allowed to use basic auxiliary tools, such as libraries that perform matrix or linear algebra operations or that facilitate loading and parsing the data files.
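As a rough indication of the level of detail meant here, the split computation at a single node might look like the sketch below. It assumes categorical attributes stored in dictionaries and uses only the standard library; the function names and the four-row dataset are made up for illustration, and a full ID3 implementation would also need the recursive tree construction and stopping rules.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the node's data on `attribute`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the child nodes after the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Pick the attribute with the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Made-up data: the label depends only on "outlook".
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
chosen = best_attribute(rows, labels, ["outlook", "windy"])
```

On this made-up data, splitting on `outlook` removes all uncertainty about the label while splitting on `windy` removes none, so `outlook` is chosen for the split.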

Practical task (A2 option 2): The data analytics task should present a genuine opportunity and challenge that can be addressed using machine learning techniques. You need to justify that the dataset contains sufficient information to represent practical relationships between the attributes and the target to be predicted.

Be aware that utilizing small-scale, pre-packaged datasets—like Iris flower classification, handwritten digit recognition, or Titanic survival predictions—available through third-party libraries as built-in examples, will not suffice for meeting the 'practical challenge' requirement. Using such 'toy' datasets could result in a lower score under criterion A when following the guidelines for the second type of project.

Learning framework and test scheme (evaluation of your model): A proper training and validation scheme must be set up for testing the implementation. More sophisticated evaluation schemes are also welcome. For formal and detailed information on the learning framework, refer to the related sections in the course materials.
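One common scheme that meets this requirement is k-fold cross-validation. The sketch below only generates the index splits, assuming the samples can be addressed by position; the function name `k_fold_indices` and the default seed are hypothetical choices, and the course materials remain the reference for the formal framework.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the splits are reproducible.
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


splits = list(k_fold_indices(n_samples=10, k=5))
```

Each sample appears in exactly one validation fold, so every data point is used for validation once and for training in the remaining rounds.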

Interpretation of criteria: This document should not serve as a basis for disputing grades. Any ambiguities in its interpretation must be clarified by October 16, 2023 (grading sessions begin).