COMP3425 Data Mining S1 2021


Assignment 1


Maximum marks: 100

Weight: 15% of the total marks for the course

Length: Maximum of 8 pages excluding cover sheet, bibliography and appendices.

Layout: A4. At least 11 point type size. Use of typeface, margins and headings consistent with a professional style.

Submission deadline: 9 am, Monday 15 March

Submission mode: Electronic, PDF via Wattle, file-name includes u-number

Estimated time: 15 hours

Penalty for lateness: 100% after the deadline has passed

First posted: 22nd Feb, 9am

Last modified: 1st Mar, 3pm

Questions to: Wattle Discussion Forum


This assignment specification may be updated to reflect clarifications and modifications after it is first issued.

In this assignment, you are required to submit a single report comprising your answers to set questions in the form of a single PDF file with a file-name that includes your University u-number ID. The first page must have a clearly identified title and author, identified by both name and university u-number. You may also attach supporting information (appendices) in the same PDF file. Appendices will not be marked but may be treated as supporting information to your answers.

This is a single-person assignment and should be completed on your own. Make certain you carefully reference all the material that you use. Any material that you wish to quote must have the source clearly referenced. It is unacceptable to present any portion of another author's work as your own. Anyone found doing so will be penalised in marks. In addition, CECS procedures for plagiarism will apply.

It is strongly suggested that you start working on the assignment right away. You can submit as many times as you wish. Only the most recent submission at the due date will be assessed.


Task

The Australian Computer Society Code of Professional Conduct 2014 is expected to be applied by all Computing Professionals in Australia. It sets out six values but stresses the primacy of the public interest as the overriding value. In 2017, the US Branch of the Association for Computing Machinery (ACM), recognizing the ubiquity and far-reaching impact of algorithms in daily lives, issued a Statement on Algorithmic Transparency and Accountability incorporating seven Principles designed to address potential harmful social discrimination due to bias. In 2018, the Australian Government Office of the Australian Information Commissioner released the Guide to Data Analytics and the Australian Privacy Principles (APP). These three documents are provided with this assignment specification.

You must also read the paper, Clarke R. (2018), “Guidelines for the Responsible Application of Data Analytics”, Computer Law & Security Review 34, 3 (Jul-Aug 2018), which is provided with this assignment specification and hereafter referred to as the Guidelines. You must also read the paper, Du, Liu and Hu (2020), “Techniques for Interpretable Machine Learning”, Communications of the ACM 63(1), which is also provided with the assignment.

You are to consider the application of the ACS code of conduct, the 7 US ACM Principles, Clarke’s Guidelines and Du et al.’s Techniques to the following fictitious ad targeting scenario. You may also use the APP guide where it is helpful.

Ad Targeting Scenario (from Clarke R. (2016) “Big Data, Big Risks”, Information Systems Journal 26, 1 (January 2016) 77-90, PrePrint at http://www.rogerclarke.com/EC/BDBR.html)

A social media service-provider accumulates a vast amount of social transaction data, and some economic transaction data, through activity on its own sites and those of strategic partners. It applies complex data analytics techniques to this data to infer attributes of individual digital personae. It projects third-party ads and its own promotional materials based on the inferred attributes of online identities and the characteristics of the material being projected.

The 'brute force' nature of the data consolidation and analysis means that no account is taken of the incidence of partial identities, conflated identities, obfuscated identities, and imaginary, fanciful, falsified and fraudulent profiles. This results in mis-placement of a significant proportion of ads, to the detriment mostly of advertisers, but to some extent also of individual consumers. It is challenging to conduct audits of ad-targeting effectiveness, and hence advertisers remain unaware of the low quality of the data and of the inferences. This approach to business is undermined by inappropriate content appearing on children's screens, and gambling and alcohol ads seen by partners in the browser-windows of nominally reformed gamblers and drinkers.

You must answer the following questions, clearly indicating which question you are answering within your submission. The page lengths suggested for each question here are for guidance only; the given page length limit for the overall assignment is mandatory.

Question 1. (1 page) Consider the ACS code of conduct. For each of the six values, taking account of any relevant sub-parts, discuss whether the value was demonstrated in the scenario and to what extent. If you assess any value as largely irrelevant to the scenario, then a very brief reason for this assessment is sufficient.

Question 2. (1/2 page) Consider the 7 US ACM Principles. Looking closely at Principle 1, Awareness, discuss how this principle is applied (or not) in the scenario and identify any “potential harm” that might have ensued.

Question 3. (2 pages) Consider the numbered guidelines in Table 2 of Clarke’s Guidelines for the responsible application of data analytics. From every segment (1 General, 2 Data Acquisition, 3 Data analysis, and 4 Use of the Inferences) choose one guideline that you consider would have been applied in the scenario. Its application may not be explicit in the scenario description, but it should be relevant and important to the scenario and you can argue that it was applied properly and therefore did not contribute to the negative consequences of the scenario. Explain its role in the scenario including how it would have contributed to positive outcomes. Justify why it is more relevant than every one of the other guidelines that you consider would have been applied in the same segment. Argue how it is more or less relevant than any guidelines in the same segment that you consider may have been disregarded in the scenario. Be careful to consider the intention of the guidelines rather than an overly literal interpretation; you may rephrase the chosen guideline for the scenario context where beneficial. For further explanation of this point, see Section 3 in Clarke’s Guidelines.

Question 4. (1 page) (a) Choose one numbered guideline (e.g. guideline 3.3) in Table 2 of the Guidelines that you consider to have been disregarded in the scenario. You may choose any guideline that you did not choose for Question 3. Discuss how the failure to consider the guideline could have contributed to the negative outcome of the scenario. (b) In addition, identify any other potential consequences that could have occurred due to the failure to consider that same guideline. For this purpose, the consequences you identify are not necessarily explicit within the scenario description. You might find it helpful to think of this activity as contributing to a risk assessment process prior to your hypothetical involvement in the analysis work of the scenario.

Question 5. (1 page) Consider the paper by Du et al., “Techniques for Interpretable Machine Learning”. Discuss whether and how intrinsic and post-hoc interpretability techniques could be applied to the scenario and what benefits could ensue.


General Comments

An abstract or executive summary is not required. A cover sheet is optional and does not contribute to the page count. Beyond A4 pages and type no smaller than 11 point, no particular layout is specified: page margins, heading sizes, paragraph breaks and so forth are up to you, but a professional style must be maintained and you must stay within the maximum specified page count. Text beyond the page limit will be treated as non-existent. Appendices may be used and do not contribute to the page count, but they might be only quickly scanned or used for reference and will not be specifically marked.

You must properly attribute the source documents provided for your assignment (but not this assignment specification itself) and any other reference materials you choose to use. You are not required to use additional materials. No particular referencing style is required. However, you are expected to reference conventionally, conveniently, and consistently. Your references should be sufficient to unambiguously identify the source, to describe the nature of the source, and also to retrieve the source in online and (if possible) traditional publisher formats.

An assessment rubric is provided. The rubric will be used to mark your assignment. You are advised to use it to supplement your understanding of what is expected for the assignment and to direct your effort towards the most rewarding parts of the work.

Your assignment submission will be treated confidentially, but it will be available to ANU staff involved in the course for the purposes of marking.


Assessment Rubric

Your assignment will be marked out of 100, and marks will be scaled back to contribute to the defined weighting for assessment of the course.

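Each review criterion is listed with its maximum mark, followed by the mark range and descriptor for each of five levels: Exemplary, Excellent, Good, Acceptable and Unsatisfactory.
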
Communication, Structure and Presentation (maximum mark: 10)

Exemplary (9-10): Exemplary use of language enhancing the quality of the submission. Very well ordered with logical and clear structure supported by appropriate headings and sub-headings. All use of others' ideas and materials acknowledged. References are all included and are formatted consistently and appropriately. Diagrams and/or images are ideally suited to the points where they are used. Professional presentation style.

Excellent (7-8): Very good use of language. Well-ordered and logical. Headings and sub-headings assist the reader. All use of others' ideas and material is acknowledged. All references are included, though with some minor inconsistency of in-text citation or formatting. Diagrams and/or images are used effectively. Professional presentation style.

Good (6): Reasonable but needs some revision. Mostly well-ordered and logical, most supported by headings and sub-headings. All use of others' ideas and material is acknowledged. Some references are missing and occasional inconsistencies of in-text citation and formatting. Diagrams and/or images improve readability. Professional presentation style.

Acceptable (5): Poor, needs significant revision. Order is not always logical and is sometimes confusing. Headings are largely those suggested by the assignment specification and the questions posed. All use of others' ideas and material is acknowledged, though sometimes inconsistently. Missing references and inconsistent in-text citation and formatting. Diagrams and/or images are not well selected. Professional style attempted.

Unsatisfactory (0-4): Very difficult to understand. Order is confusing and not always logical. Headings and sub-headings do little to help clarify the text. Not all use of others' ideas and material is acknowledged. Missing in-text citations, i.e. plagiarism. References in the bibliography not used in the text. Poorly and inconsistently formatted. Diagrams and/or images detract from the key messages.

Question 1: Code of Conduct (maximum mark: 20)

Exemplary (17-20): The discussion raises subtle and challenging ethical issues related to the code of conduct in important aspects of the scenario. The code itself may be questioned with persuasive argument. All values are addressed in full, with clearly identified extent of demonstration in the scenario. The extent to which the value is pertinent is justified by argument.

Excellent (14-16): All values are addressed in full. For each value, the extent to which it is demonstrated in the scenario is clear. The extent to which the value is pertinent is justified by argument. More attention may be given to more important or relevant values.

Good (12-13): For most values, the extent to which the value is demonstrated in and pertinent to the scenario is given.

Acceptable (10-11): Perfunctory but arguably correct analysis is given for most of the six values.

Unsatisfactory (0-9): Work does not demonstrate an adequate understanding of the code of conduct.

Question 2: ACM Principle (maximum mark: 10)

Exemplary (9-10): The principle has been well understood as demonstrated by the identification of application (or not) in the scenario supported by considered, reasoned argument with evidence drawn from the scenario together with real-world knowledge. Analysis of the ethical issues demonstrates awareness of alternative viewpoints and possibly cost vs benefits.

Excellent (7-8): Multiple aspects of the scenario have been used to discuss the application of the principle to the scenario. Potential harm analysis considers a diverse range of harms.

Good (6): It is clear how the principle applies (or not) to the scenario. Potential harm may be too narrowly interpreted.

Acceptable (5): There is a cursory attempt to relate the scenario to the ACM Statement but the analysis is shallow.

Unsatisfactory (0-4): Unclear whether the relevance and purpose of the ACM Statement has been fully understood.

Question 3: Guidelines (maximum mark: 20)

Exemplary (17-20): All 4 segments considered. All selected guidelines show good understanding of the guideline, are rephrased where appropriate, and the benefit of application to the scenario context is clear. All justifications of relevance convincingly argue the relative importance of the guideline to all others in the segment.

Excellent (14-16): All 4 segments considered. Most selected guidelines show good understanding of the guideline, are rephrased where appropriate, and are beneficially applied to the scenario context. Most justifications of relevance convincingly argue the relative importance of the guideline to others in the segment.

Good (12-13): All 4 segments considered. Most chosen guidelines are well explained in the scenario context. For most chosen guidelines, the argument for its relevance is made with reference to the alternative guidelines in the segment.

Acceptable (10-11): All 4 segments have been considered with a chosen guideline from each segment both explained and justified.

Unsatisfactory (0-9): Partial attempt, incomplete or hard to follow.

Question 4: Disregarded guideline (maximum mark: 20)

Exemplary (17-20): The significance of the selected guideline is argued persuasively. The impact of the failure of the guideline is supported by critical analysis demonstrating an understanding of both the costs and benefits of applying the guideline in the scenario from multiple viewpoints. The consequence is thought-provoking and its connection to the failed guideline is explained and logical. Arguments are supported by real-world evidence or literature.

Excellent (14-16): Guideline that was selected is made clear and is relevant to the scenario. Discussion on impact of the failure of the guideline is supported by critical analysis of both the guideline itself and the events of the scenario with multiple viewpoints on the situation presented. The connection of the alternative potential consequence to the failed guideline is explained and logical.

Good (12-13): Guideline that was selected is made clear and is relevant to the scenario. Discussion on impact of the failure to follow the guideline is reasoned and related to scenario. The connection of the alternative potential consequence to the failed guideline is explained.

Acceptable (10-11): Guideline that was selected is made clear and is relevant to the scenario. There is a cursory attempt to explain the impact of the failure to follow the guideline on the scenario outcome. Alternative potential consequence identified.

Unsatisfactory (0-9): Unclear that the selected guideline was understood. Unconvincing or implausible impact on the scenario outcome. Other potential consequence missing or implausible.

Question 5 (maximum mark: 20)

Exemplary (17-20): Understanding of both techniques is demonstrated by detailed description of application to scenario. Potential benefits of interpretability are well situated in the scenario.

Excellent (14-16): The techniques and their application in the scenario are well explained and potential benefits are articulated.

Good (12-13): It is shown how the techniques can be used in the scenario and some potential benefits are enumerated.

Acceptable (10-11): The techniques and their purpose are broadly understood.

Unsatisfactory (0-9): It is not clear that the paper was read.