Project: Evaluation of Scalable Data Processing Methods

Course Overview

This course focuses on Big Data Processing and Analysis, covering core topics such as:

· Indexing and search in large-scale data systems

· Algorithmic design for scalability

· Learning-based methods for data processing and analysis

· Scale-out frameworks

Students will explore both theoretical foundations and practical techniques used in modern data-intensive methodologies.

Project Objective

The primary goal of this course project is to provide hands-on experience with state-of-the-art methods in big data processing. Students will select and adopt an existing method recently published in a top-tier research venue (e.g., SIGMOD, VLDB, ICDE, KDD, NeurIPS, ICML, or any other CORE A+ venue; see https://people.iiti.ac.in/~artiwari/cseconflist.html) and evaluate its performance on a dataset that was not used in the original study. The method should be at least somewhat related to scalability issues in data processing, analysis, and/or learning. Credit will be awarded for any new designs, ideas, or implementations that improve upon the existing method.

Project Tasks

1. Method Selection

o Choose a data processing method from a top-tier research publication.

o Clearly summarize the original goal, assumptions, and evaluation methodology.

2. Dataset Selection

o Identify or construct a new dataset that was not part of the original evaluation.

o Justify why this dataset is suitable for testing the generalizability of the method.

3. Implementation and Adaptation

o Re-implement the chosen method or adapt publicly available code.

o Make necessary modifications to ensure compatibility with the new dataset.

4. Performance Evaluation

o Compare the results with those reported in the original paper.

o Analyze performance discrepancies and provide insights into the method’s robustness.
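As an illustration of the evaluation step, the following is a minimal sketch of a timing harness, not a required design. It measures the median runtime of a method on inputs of increasing size and computes the signed relative difference between a measured metric and a figure reported in a paper. All names (time_method, relative_difference, scaling_profile) are hypothetical, and Python's built-in sorted stands in for the method under test; a real project would substitute its own method, data loader, and metrics.

```python
import statistics
import time
from typing import Callable, Dict, Iterable


def time_method(method: Callable, data, repeats: int = 5) -> float:
    """Return the median wall-clock time in seconds of method(data)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        method(data)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def relative_difference(measured: float, reported: float) -> float:
    """Signed relative difference of a measured metric vs. the reported one."""
    if reported == 0:
        raise ValueError("reported value must be non-zero")
    return (measured - reported) / reported


def scaling_profile(method: Callable,
                    make_data: Callable[[int], object],
                    sizes: Iterable[int],
                    repeats: int = 5) -> Dict[int, float]:
    """Time the method on inputs of increasing size; returns {size: seconds}."""
    return {n: time_method(method, make_data(n), repeats) for n in sizes}


if __name__ == "__main__":
    # Placeholder method (sorted) and synthetic data (a reversed range);
    # both are assumptions used only to make the sketch runnable.
    profile = scaling_profile(
        sorted,
        lambda n: list(range(n, 0, -1)),
        [1_000, 10_000, 100_000],
    )
    for n, secs in profile.items():
        print(f"n={n:>7}  median={secs:.6f}s")
```

Reporting the median over several repeats reduces the effect of one-off noise such as cache warm-up. Absolute runtimes are only comparable with the original paper when the hardware is similar, so relative trends across input sizes are often the more robust comparison.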

Deliverables

· Presentation slides (within 20 pages): A brief outline of the method selection, dataset selection, implementation, and your findings.

· Code and datasets: A reproducible codebase.

· Appendix report [Optional]: Written in the ACM SIG conference format (https://www.overleaf.com/latex/templates/association-for-computing-machinery-acm-sig-proceedings-template/bmvfhcdnxfty)

· All deliverables should be submitted to a Gitea repository. The teaching team will create a repository for each team.

Evaluation Criteria

· Relevance and Quantity of the chosen method(s) and dataset(s) (40%)

· Depth of analysis, findings, and experimental rigor (60%)

Notes

· Students may work individually (not recommended) or in teams of at most three people.

· Reproducibility and clarity of documentation will be emphasized.

· Projects with potential for further research or publication are highly encouraged.