CS 230 Final Project Fall 2023
Due on December 15, 5:00 PM
Interactive Data-Explorer: Tell a Story with Real-World Data
In this last project you will develop an interactive, data-driven, web-based Python application that tells a story with real-world data. You will show your mastery of many coding concepts as you interact with real-world data. You will use Pandas for managing and interacting with data, Matplotlib or other charting packages for creating charts and graphs, PyDeck (or other mapping packages) for maps, and the Streamlit.io package for creating interactive web applications using Python.
The datasets we will be using this semester all come from Analyze Boston (http://data.boston.gov), the city of Boston’s open data hub.
Name | Description
| Where do Blue Bikes riders ride? When do they ride? How far do they go? Which stations are most popular? On what days of the week are most rides taken? Two files are included: Trip History from Q1 2015 and a list of stations.
| When is trash collected in Boston's neighborhoods?
| Where shouldn’t you hang out in Boston at night? Crime incident reports are provided by Boston Police Department (BPD) to document the initial details surrounding an incident to which BPD officers respond. This data set includes the type of incident as well as when and where it occurred.
| Where to go for cannabis? This Open Data Registry includes currently licensed applicants as well as pending cannabis license applicants.
| Looking for a place to park in Boston? This data set shows where to park in downtown Boston on each block, and hours of meter operation.
Big Belly Trash Alerts and | Big Belly trash receptacles are solar powered, internet connected, compacting trash receptacles that can collect up to five times as much waste as traditional bins and help the city more efficiently manage the waste collection process. This is a legacy dataset containing all signals received from the trash receptacles for the calendar year 2014.
| Which buildings in Boston are unsafe to enter? This data set contains violations on Boston buildings or properties issued by inspectors from the Building and Structures Division of the Inspectional Services Department.
The links for each data set provide background information and sources for the data. Please read them, as they often contain data dictionaries describing the fields or columns in each data table. You should get the CSV data files here, rather than from the Analyze Boston website, as the data files provided for this project are in some cases sampled or cleaned versions of the original data files. (If there’s a file that ends with 7000_sample, use that one. It means the original file from Analyze Boston is much larger, and a random sample of 7000 records is provided.)
You will be assigned to a specific dataset. Find your assigned dataset here. Failure to use the assigned dataset will result in a zero for your final project.
Project Description
Part 1. Design
The purpose of the design phase is to start thinking about what you might do before you jump into coding. Identify at least three different queries or questions you can ask about your data set. Try to phrase your questions so that they can have a parameter which can come from user input.
For example: (and these queries don’t match the data sets you are given but are here to inspire you!)
• What’s the cost of the most expensive <house_type> in <city>?
• Find all of the apartments in <city> that rent for under <amount>.
Then think about the interactive widgets from Streamlit that you can bring to your application to obtain user input. For example: you can use a numeric slider to have the user enter a monthly rental amount.
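As a rough illustration, a parameterized query like the apartment example could be sketched with pandas as shown here. The column names (`city`, `rent`), the sample rows, and the function name `apartments_under` are all made up for this sketch and do not come from any of the project datasets. In a Streamlit app, the arguments would normally come from widgets rather than being hard-coded.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
listings = pd.DataFrame({
    "city": ["Boston", "Boston", "Waltham", "Boston"],
    "rent": [2400, 1800, 1600, 3100],
})


def apartments_under(df, city, max_rent):
    """Return the rows for `city` whose rent is below `max_rent`, cheapest first."""
    matches = df[(df["city"] == city) & (df["rent"] < max_rent)]
    return matches.sort_values("rent")


# In a Streamlit app the arguments would come from widgets, for example:
#   max_rent = st.slider("Maximum monthly rent", 500, 5000, 2500)
result = apartments_under(listings, "Boston", 2500)
print(result)
```

Keeping the query in its own function makes it easy to call again whenever the user changes a widget value.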
Next, describe how you will visually present the data or query results using charts, graphs, tables, or maps.
Be sure your web pages and visualizations are "user friendly" and as "polished" as possible. Be sure to label controls requiring user interaction, make sure your charts have titles, legends or explanations that would be helpful to the user. Think about how the user will navigate from one part of your site to another.
Feel free to add to your project as you explore Pandas and Streamlit capabilities and find cool ways to implement new or additional features. Part of your grade will be a "complexity/originality" score. If you use a module or do something cool that we may not have discussed in class or implement more than the minimum requirements, you will receive a higher complexity score.
A complexity score of 1 means you implemented the minimum requirements for this project. A complexity score of zero means you didn’t meet the requirements.
Part 2. Coding
Create your Python application with a Streamlit UI and several visualizations.
Create charts and graphs of different types (with custom legends, axis labels, tick marks, colors, and other features), and at least one map showing locations and data based on latitude and longitude. Your charts should tell a story, so be sure elements are labelled appropriately, and add any narrative that will help the reader understand your visualizations, cue the reader about which values to specify, and explain the purpose of each chart or graph. You may wish to add a few sentences explaining each chart as a paragraph of text on the screen.
You might also use pandas to create a summary report based on the data itself (max/min values, relationships between columns, etc.).
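One possible shape for such a summary report is sketched here. The data frame and its column names (`duration_min`, `distance_km`) are invented for illustration; substitute the columns of your assigned dataset.

```python
import pandas as pd

# Hypothetical sample data; replace with your assigned dataset.
trips = pd.DataFrame({
    "duration_min": [5, 12, 30, 8],
    "distance_km": [1.0, 2.5, 6.0, 1.5],
})

# Minimum, maximum, and mean of every column in one small table.
summary = trips.agg(["min", "max", "mean"])

# A correlation coefficient hints at a relationship between two columns.
correlation = trips["duration_min"].corr(trips["distance_km"])

print(summary)
print(f"Correlation between duration and distance: {correlation:.2f}")
```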
See the documentation for how to use different Streamlit features. You might make use of sidebars to place your widgets, multi-page applications, or caching to improve performance.
Read the documentation for PyDeck maps. Our examples of maps in class used PyDeck’s ScatterplotLayer or IconLayer, but PyDeck supports several other layer types, such as text and heatmaps. Have a look.
To explore another chart library, consider Seaborn charts, which have additional chart types and customization options. You might also look at Folium maps (here’s a simple tutorial) if you’d like to play with a different mapping library.
If your project contains more than one file (i.e., one or more Python code files and images), create a zip folder containing all of your project files and submit it. You do not have to submit the data file that you used.
Part 3. In-Class Presentation (December 15, 3:00 - 5:00 PM)
You will have five minutes to present your project. Give an overview of your project’s capabilities. Demonstrate what you feel is the most interesting part of your project, and then show how you implemented the cool stuff in your program. Describe how you used various coding features and pandas queries. Then talk through the pandas and Streamlit code well enough to convince me that you understand how your code works and what you did.
The presentation is mandatory. Failure to show up and present your project will result in a zero for your final project grade.
Requirements
As you write your program, be sure to include code that demonstrates each of these items. Each contributes to your project grade (see the rubric below).
Python Features:
• A function with two or more parameters, one of which has a default value
• A function that returns more than one value
• A list comprehension
• A loop that iterates through items in a list, dictionary, or data frame
• At least two different methods of lists, dictionaries, or tuples.
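The short sketch below shows one way the Python features listed above might appear together. The function name `top_stations` and the ride counts are hypothetical and serve only as an illustration; your own code should work with your assigned dataset.

```python
def top_stations(counts, n=3, ascending=False):
    """Return the names of the n busiest (or quietest) stations and their total rides.

    Both `n` and `ascending` have default values, and two values are returned.
    """
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=not ascending)
    names = [name for name, _ in ranked[:n]]        # list comprehension
    total = sum(rides for _, rides in ranked[:n])
    return names, total                              # returns more than one value


# Hypothetical ride counts, invented for illustration.
ride_counts = {"South Station": 120, "MIT": 95, "Fenway": 60, "Seaport": 40}

names, total = top_stations(ride_counts, n=2)

# Loop that iterates through the items of a dictionary.
for station, rides in ride_counts.items():
    print(f"{station}: {rides} rides")

# Two different list methods: append() and sort().
names.append("Seaport")
names.sort()
print(names, total)
```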
Streamlit Features:
• At least three Streamlit widgets (sliders, drop downs, multi-selects, text box, etc.)
• Page design features (sidebar, fonts, colors, images, navigation)
Visualizations:
• At least three different charts with titles, colors, labels, legends, as appropriate
• At least one detailed map (st.map will only get you partial credit) – for full credit, include dots, icons, text that appears when hovering over a marker, or other map features
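For the chart requirement, a customized Matplotlib chart might look something like the sketch below. The ride numbers are invented, and the stacked bar chart is only one of many possible chart types. The map requirement is not covered here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical ride counts, invented for illustration.
days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
member_rides = [120, 135, 128, 140, 150]
casual_rides = [30, 28, 35, 40, 65]

fig, ax = plt.subplots(figsize=(6, 4))

# Stacked bars: casual rides sit on top of member rides.
ax.bar(days, member_rides, color="steelblue", label="Members")
ax.bar(days, casual_rides, bottom=member_rides, color="orange", label="Casual riders")

# Title, axis labels, legend, and grid lines help the chart tell its story.
ax.set_title("Rides by Day of Week")
ax.set_xlabel("Day")
ax.set_ylabel("Number of rides")
ax.legend()
ax.grid(axis="y", linestyle="--", alpha=0.5)

# In a Streamlit app, st.pyplot(fig) would display this figure on the page.
```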
Data Analytics Capabilities:
• Sorting data in ascending or descending order, by one or more columns
• Filtering data by one condition
• Filtering data by two or more conditions with AND or OR
• Analyzing data with pivot tables
• Add/drop/select/create new/group columns, frequency count, other features
• Text analysis based on word frequencies, etc.
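The capabilities listed above could be sketched with pandas roughly as follows. The incident records and column names (`district`, `offense`, `hour`, `description`) are invented for illustration and are not the actual fields of any project dataset.

```python
import pandas as pd

# Hypothetical incident records, invented for illustration.
df = pd.DataFrame({
    "district": ["A1", "B2", "A1", "C6", "B2", "A1"],
    "offense": ["Larceny", "Vandalism", "Larceny", "Assault", "Larceny", "Vandalism"],
    "hour": [23, 14, 2, 21, 9, 22],
    "description": ["stolen bike", "broken window", "stolen wallet",
                    "bar fight", "stolen bike", "graffiti on wall"],
})

# Sort by district (descending), then by hour (ascending).
sorted_df = df.sort_values(["district", "hour"], ascending=[False, True])

# Filter by one condition.
larceny = df[df["offense"] == "Larceny"]

# Filter by two conditions combined with AND (&); use | for OR.
late_a1 = df[(df["district"] == "A1") & (df["hour"] >= 22)]

# Pivot table: number of incidents per district and offense.
pivot = df.pivot_table(index="district", columns="offense",
                       values="hour", aggfunc="count", fill_value=0)

# Create a new column, then count how often each offense occurs.
df["is_night"] = (df["hour"] >= 21) | (df["hour"] < 5)
offense_counts = df["offense"].value_counts()

# Word frequencies across a text column.
word_counts = df["description"].str.split().explode().value_counts()

print(pivot)
print(word_counts.head())
```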
Usual rules about writing "good" code apply:
• Make your code as modular and easy to follow as possible.
• Include a docstring, comments, and meaningful variable names.
• If you did something "cool" in your code that you are incredibly proud of, please write a comment calling attention to what you did.
• If you referred to any online articles or other information beyond class examples, please be sure to list them as references / comments in your code.
• Make sure the program runs and the output is correct.
Documentation String
Use this documentation string at the top of your Python code file:
"""
Description: This program ... (a few sentences about your program and the queries and charts)
"""
Grading
The Final Project will be worth 16% of your course grade. It is based on 60 points, as follows:
Requirement | Points
Project: Proposal, Design and Queries submitted on time | 2
Python Coding Features (at least 4 @ 2.5 points) | 10
Code quality | 3
Streamlit Features (3 controls and other page design features) | 8
Visualizations (3 different charts and one map, 4 points each: 2 points for displaying the data correctly, 2 points for customization such as colors, grid lines, legend, etc.) | 16
Data Analytics Capabilities (at least 4 – sort, filter, etc.) | 12
Presentation | 6
Complexity (0 = Your project implements less than the minimum requirements; 1 = Your project meets the minimum requirements; 2 = Your project includes some complex queries, charts, or UI features, or adds a small number of extra features beyond those which are required; 3 = Wow! You went above and beyond the requirements, either by doing more than what is required, or by including features, modules, or packages learned independently or not described in class) | 3
Total | 60
Submission
Submit your Python file (or zip file containing multiple files) on the BS by 5:00 PM on December 15. Also submit your data files.
Getting Help and Academic Integrity:
• Please do not discuss your program with anyone other than your instructor.
• You can ask CIS Sandbox tutors for assistance on related or general topics, but you cannot ask them to help you write your code for this project. For example, you can ask tutors to help review examples of how to create bar charts in Python (in general), but you cannot ask them to help you debug a bar chart you might create using the data set for this project. You can ask for help with fixing syntax or runtime errors.
• You are prohibited from seeking help from anyone other than your instructor or CIS Sandbox tutors.
• You are prohibited from using ChatGPT or any other AI tools to do any part of your project.
• Any violation of these policies will result in a zero for the project at minimum or even a grade of ‘F’ for the course.
2023-12-12