EE 660

Project Assignment

Introduction

For this project you will pick your own topic and design your project. You are encouraged to pick a topic (or dataset) of interest to you, and that is appropriate for a machine learning class project.

You will submit a project proposal, a final written report that describes your approach and results, and your computer code. A timeline of due dates and grading criteria are given at the end of this assignment.

Types of Projects

There are two overall types of projects; you may choose either one for your project.

(1) Type 1 project. Solve a machine learning problem by implementing a machine learning system of your own design that uses real-world data. For this, you will choose one (or more) set(s) of real-world data, and define the goals of your project. For example, the goal of your project might be to use regression or classification techniques to predict the output attribute y as well as possible. You could additionally include other goals, such as understanding what causes the limitations of your final system; investigating which attributes are most predictive, and assessing why; etc. You will typically have other issues to address as well, such as a less-than-ideal number of data points N, missing or noisy data, an imbalanced data set, categorical feature values, preprocessing steps, etc. See the “Project Tips” document for suggestions of where to find datasets, and criteria for sifting through them to find one appropriate for a class project.

(2) Type 2 project. Perform one or more experiments in machine learning. The experiments would typically use synthetic data, so that the data can be controlled and varied in various ways; synthetic data also allows you to generate “unknowns” to numerically estimate the out-of-sample error directly. The experiments might also be applied to real-world data to assess the effects of realistic data.

This would typically also involve some theory – either to predict what would happen, or to help interpret the results of what did happen. Experimental work would typically have a statement of what will be learned from the experimental results, or a prediction of what is expected; and explanations and interpretation (after the experiment) based on some theory, intuition, or conjecture. Or, a project might start with a theoretical component that develops some predictions, and then run some numerical experiments to test them.

A good example of an experiment is Sec. 4.1.2 of AML, especially Exercise 4.2, including the results shown in Fig. 4.3 and some of its interpretation.
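As a rough illustration of why synthetic data is convenient for a Type 2 experiment, the sketch below fits a model once and then estimates its out-of-sample error by repeatedly drawing fresh “unknown” data from the same source. Everything in it is an assumption made for illustration only: the linear target function, the noise level, the sample sizes, and the function names are hypothetical and are not prescribed by this assignment.

```python
import random
import statistics

def target(x):
    """Hypothetical target function, chosen only for this illustration."""
    return 2.0 * x + 1.0

def draw_dataset(n, noise_std, rng):
    """Draw n noisy samples (x, y) with x uniform on [-1, 1]."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [target(x) + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b for scalar x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def squared_error(model, xs, ys):
    """Mean squared error of the fitted line on the given points."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)  # fixed seed so the experiment is repeatable

# Train once on a small synthetic training set.
model = fit_line(*draw_dataset(20, 0.5, rng))

# Draw many fresh datasets; each one gives an independent error estimate.
estimates = [squared_error(model, *draw_dataset(200, 0.5, rng))
             for _ in range(100)]
e_out_mean = statistics.mean(estimates)
e_out_std = statistics.stdev(estimates)
print(f"Estimated out-of-sample error: {e_out_mean:.3f} +/- {e_out_std:.3f}")
```

Because the data source is under your control, the same loop can be rerun while varying one quantity (for example the noise level or the training set size) to see how the out-of-sample error depends on it.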

Suggestion: If you’re not sure what you want to do, you can try the following.

(1) For a Type 1 project, start by finding a dataset that you’re interested in, and develop a project and goals based on that data. Or, you can also browse through Kaggle competitions to get an idea of what kinds of topics could constitute a project.

(2) For a Type 2 project, you can choose some aspect of class material you find interesting, and pose some questions of how some variables would depend on others; especially where it isn’t obvious, where we haven’t given examples that show the dependence, or where you can think of a lot more to try than in the examples we covered in class.

Guidelines and Ground Rules

Groups: You may do your own individual project, or you may work in a team of 2 students. Your project will be graded accordingly; that is, 2 students should accomplish substantially more work than one student (or solve a problem that is substantially more difficult). Teams of 3 students will be considered in exceptional circumstances; in this case, a private Piazza post (describing the effort and justifying a 3-person team) is recommended before submission of the project proposal. Note that if you work in a team, you will submit one project final report together. All students should participate in writing the final report. Moreover, the report should clearly state the contributions each student made to the project. Usually all students of a team will receive the same grade for the project, although different grades may be assigned in exceptional cases.

Your course project must be work that you do specifically for this course. If you want to do a project that is on a topic you have worked on previously, or are currently working on (e.g., as part of your research, or a project for another class), that is OK. But, you must clearly distinguish between what is done for EE 660 this semester, and what is done for other purposes (e.g., research or other class work). In your proposal and your final report, you must include a brief summary of the other work and describe how the EE 660 project work is distinguished from it. Also, consider how much background information will need to be described in your project report for the project work to be understandable to people that may not have the domain knowledge you have; too much would imply it’s not a good topic for a class project.

Code - writing your own vs. using available code from the internet. It is OK to use code from the internet - be sure to state so in your report. It’s also OK to write your own code, for which we recommend Python. (You may write portions of your code in C or C++ if that would be advantageous.)* Keep in mind that your project topic should be focused on machine learning issues. Spending almost all your time coding up a well-known but complicated algorithm (or coding most of your project in C or C++) will not leave you much time to do anything else. On the other hand, if your project consists of running lots of different algorithms from the internet without understanding what the algorithms are doing, then you are missing the point of the project.

Suggestion: Best to use only standard libraries, and code up what else you need yourself; and for functions/methods you use from libraries, make the effort to understand what they actually do.

Data: For real-world data, it is recommended to use dataset(s) that are publicly available on the internet. You may also acquire your own data. However, be advised that data gathering (and subsequent processing of it to make it usable) can be very time-consuming, so think this through carefully during your planning/proposal stage if you want to acquire your own data. A team effort can make acquiring your own data more feasible.

Suggestion: Try to make the size of your project big enough to be interesting to you or your team, and to not be a trivial project; but small enough to be consistent with the amount of time and resources available. Keep in mind we will also have some homework assignments during the project period: 1 homework assignment while your project proposal is being graded; and probably 1 homework assignment during the 3.5-week project period after you receive your graded proposal. Also consider the computational resources you have, and the likely amount of computation needed for your proposed project (for example, datasets with 1 million data points will likely eat up a lot of computational resources if you use the entire dataset).

Required Elements

Your project is required to include the following elements.

Significant machine learning content. This should be the main part of your project, and will include the use of ML concepts, techniques, and algorithms. It will also include some understanding of, or insightful attempts at understanding, results that you are observing (intermediate results as well as final results).

Use of real-world data for Type 1 projects, or use of synthetic data and/or real-world data for Type 2 projects, as described in project types above.

A portion of your project must include at least one of: (i) transfer learning (TL), (ii) semi-supervised learning (SSL), (iii) some other extension of your project work. For help and tips on any of these, please see the Project Tips document.

Complexity analysis. Some consideration of complexity of your approach wherever reasonably possible. This could include complexity of the model(s) used and hypothesis set(s), the number of data points, and anything known or relevant about the underlying target function. If it isn’t tractable to analyze the complexity mathematically, then a rough estimate using principles like degrees of freedom, perhaps accompanied by some numerical experiments, should be done. Whatever method you use, it should help you make good choices in developing your model(s), managing the number of data points, size of test set, etc.

Reporting and interpretation of intermediate (or multiple) results. For Type 1 projects, this would typically be done using validation set(s), with or without cross-validation. Accumulating a numerical estimate of mean and standard deviation of the (cross-)validation error can give intermediate results to be interpreted or explained. For Type 2 projects, this will depend on the experiments being performed, and could involve results of smaller experiments that together comprise a larger experiment, or merely a set of different results from one overall experiment.
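One possible way to accumulate the mean and standard deviation of the cross-validation error is sketched below, using only the Python standard library. The data, the straight-line model, the choice of 5 folds, and all function names are assumptions made for illustration; your own project would substitute its own data and model.

```python
import random
import statistics

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b for scalar x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def mse(model, xs, ys):
    """Mean squared error of the fitted line on the given points."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cross_validation_errors(xs, ys, k=5, seed=0):
    """Return one validation error per fold (train on the other k-1 folds)."""
    errors = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors.append(mse(model, [xs[i] for i in fold], [ys[i] for i in fold]))
    return errors

# Made-up data for illustration: a noisy straight line.
rng = random.Random(1)
xs = [rng.uniform(0.0, 1.0) for _ in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]

cv_errors = cross_validation_errors(xs, ys, k=5)
cv_mean = statistics.mean(cv_errors)
cv_std = statistics.stdev(cv_errors)  # sample standard deviation across folds
print(f"CV error: {cv_mean:.4f} +/- {cv_std:.4f}")
```

Reporting the spread across folds alongside the mean makes it easier to judge whether a difference between two candidate models is larger than the fold-to-fold variation.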

Interpretation and understanding of your methods, results, and procedures. Your report should demonstrate that you have an understanding of what you are doing and discovering. Where the reason behind some results or findings is unclear, state so and try to make a conjecture that could explain it, and/or suggest an experiment that could shed more light on the issue.

Baseline systems. For Type 1 projects, 2 baseline systems are required: (i) trivial and (ii) non-trivial. Trivial systems typically only use the output data y: e.g., for a binary classification problem a trivial system might always decide the majority class. A nontrivial baseline system will typically use the training data with a straightforward ML system. Clearly describe your baseline systems in your report, and use their results to compare with your ML system(s).
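To make the two kinds of baseline concrete, here is a minimal sketch for a classification problem. The trivial baseline looks only at the training labels and always predicts the majority class. The non-trivial baseline shown is a nearest-class-mean classifier, which is just one possible choice of a straightforward ML system and is an assumption of this example; the toy data and function names are likewise made up for illustration.

```python
from collections import Counter

def majority_class_baseline(y_train):
    """Trivial baseline: always predict the most frequent training label."""
    majority_label = Counter(y_train).most_common(1)[0][0]
    return lambda x: majority_label

def nearest_mean_baseline(X_train, y_train):
    """Non-trivial baseline: predict the class whose feature mean is closest."""
    counts = Counter(y_train)
    sums = {}
    for x_i, y_i in zip(X_train, y_train):
        acc = sums.setdefault(y_i, [0.0] * len(x_i))
        for j, value in enumerate(x_i):
            acc[j] += value
    means = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(x):
        # Squared Euclidean distance from x to each class mean.
        return min(means, key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(x, means[c])))
    return predict

def error_rate(predict, X, y):
    """Fraction of points in (X, y) that predict() misclassifies."""
    mistakes = sum(1 for x_i, y_i in zip(X, y) if predict(x_i) != y_i)
    return mistakes / len(y)

# Toy data (made up): one feature, 6 points of class 0 and 4 of class 1.
X_train = [[0.1], [0.2], [0.3], [0.8], [0.25],
           [0.9], [0.15], [0.7], [0.35], [0.85]]
y_train = [0, 0, 0, 1, 0, 1, 0, 1, 0, 1]
X_val = [[0.2], [0.9], [0.4], [0.7]]
y_val = [0, 1, 0, 1]

trivial = majority_class_baseline(y_train)
nontrivial = nearest_mean_baseline(X_train, y_train)
trivial_error = error_rate(trivial, X_val, y_val)
nontrivial_error = error_rate(nontrivial, X_val, y_val)
print(f"Trivial baseline error:     {trivial_error:.2f}")
print(f"Non-trivial baseline error: {nontrivial_error:.2f}")
```

Your own ML system would then be evaluated on the same validation or test data so that all three error rates are directly comparable.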

For Type 2 projects, your work must be compared with something. For example, you might compare results with and without some change, or compare results from an experiment with a theoretical prediction. Clearly state what is being compared with what, and what you can conclude from the comparison.

Description of how the data was used – pre-training set (if any), training set, validation sets, any cross-validation loops, test set, etc. You should use your datasets in a valid way. Consider using a diagram or flow chart to make your description clear. This may be included in the next item below rather than a stand-alone description.
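One simple way of using a dataset validly is to split it once, at the start, into disjoint training, validation, and test sets, and to keep the test set untouched until the final evaluation. The sketch below shows such a split on indices. The 60/20/20 proportions, the fixed seed, and the function name are assumptions for illustration only; other valid schemes (for example cross-validation on the non-test portion) are equally acceptable.

```python
import random

def split_indices(n, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle indices 0..n-1 and cut them into train / validation / test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed makes the split repeatable
    n_test = int(round(test_frac * n))
    n_val = int(round(val_frac * n))
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = split_indices(100)

# The three sets must not share any data point.
assert not set(train_idx) & set(val_idx)
assert not set(train_idx) & set(test_idx)
assert not set(val_idx) & set(test_idx)
print(len(train_idx), len(val_idx), len(test_idx))
```

Any preprocessing that learns parameters from the data (such as normalization constants) would then be fit on the training indices only and applied unchanged to the validation and test sets.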

Description of the overall procedure (methodology) followed. For example, this could be a list of steps, sequence of paragraphs, or flow chart showing, for example: drawing data samples, choices of hypotheses, preprocessing, separation of data into various sets, training algorithms, model selection, feature selection, choosing parameters and validation, final choices, and final testing.

Final results using appropriate performance measure(s). Compare with baseline systems, possibly with intermediate results, and any published results you found.

Estimation of out-of-sample error. Some valid method(s) for estimating the out-of-sample error or predicted error on unknown (new) data. Ideally, this would include application of some theory as well as some numerical results. A simple example for Type 1 projects is to use a true test set, and to use a theoretical error bound to estimate the maximum generalization error. A simple example for Type 2 projects using synthetic data is to numerically estimate the out-of-sample error by drawing a new set of data points, multiple times; a sample mean and sample standard deviation can be used to estimate the out-of-sample error and its error bar.
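As one example of a theoretical error bound for the Type 1 case, the Hoeffding inequality gives, for a single fixed hypothesis and an error measure bounded in [0, 1] (such as classification error), that with probability at least 1 - delta the out-of-sample error is at most the test error plus sqrt(ln(2/delta) / (2N)), where N is the number of test points. The sketch below computes this tolerance. It assumes the test set was never used for training or model selection; the numbers and the function name are made up for illustration, and other bounds may be more appropriate for your project.

```python
import math

def hoeffding_test_bound(test_error, n_test, delta=0.05):
    """Upper bound on out-of-sample error from a held-out test set.

    Assumes a single final hypothesis, an error measure in [0, 1], and a
    test set that played no role in training or model selection.
    Returns (upper_bound, epsilon); the bound holds with probability
    at least 1 - delta.
    """
    epsilon = math.sqrt(math.log(2.0 / delta) / (2.0 * n_test))
    return min(1.0, test_error + epsilon), epsilon

# Illustrative numbers: 10% test error measured on 1000 test points.
upper_bound, epsilon = hoeffding_test_bound(0.10, 1000, delta=0.05)
print(f"E_out <= {upper_bound:.3f} (tolerance {epsilon:.3f}) "
      f"with probability >= 0.95")
```

Because the tolerance shrinks like 1/sqrt(N), quadrupling the test set size only halves it, which is one consideration when deciding how many points to reserve for testing.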

*Allowed languages are Python, C/C++. You may use any libraries or functions you deem appropriate. Please note that you are expected to understand what the functions you use are doing, and this will enable you to interpret the results more clearly in your report. If you want to use other languages, check with the TAs or instructor first.

Methods and techniques you can use. A minimum of 50% of your project work should use methods and techniques covered in EE 660. This includes topics already covered in class, as well as topics we haven’t yet covered (refer to the course outline for upcoming topics). You can also include methods and techniques from EE 559, and from outside of both classes; but these (combined) should constitute less than 50% of your project. Deep neural networks (DNNs) are not a topic of EE 660; they can account for no more than 10-15% of your project work. If you’re not familiar with developing DNNs, please be advised that training DNNs can require a lot of data, computer time, and development time, as well as knowledge of languages/packages tailored to DNNs.

Citation of others where appropriate. This applies to both your project final report and your code. In the final report, any statements taken from other sources must be cited and referenced as such. Similarly, any results of others that are stated in your report must also be cited and referenced. Instructions for doing this will be included with the Project Final Report Instructions (to be posted later). Any code that is taken from elsewhere and used in your project must be commented as such in your code. Failure to cite other sources where appropriate amounts to plagiarism, and will result in a deduction from your project score. In egregious cases, your final course grade will be lowered directly, as a penalty.

Comment: Details and instructions for the final report will be posted later.

Grading Criteria

Criteria used to grade the projects will include: workload (difficulty of problem, amount of work), inclusion of required elements, technical approach and execution, data handling (correctness and appropriateness), performance (correctly estimated or evaluated; comparison with baseline system(s) and work of other people if available), analysis (understanding and interpretation), project proposal score, and write up (clarity, completeness, conciseness).

2022-11-06
