Math 5750/6880: Mathematics of Data Science Project #1
September 4, 2025: project progress report due
September 11, 2025: project final report due
1. (LaTeX) From the Canvas Project1 page, download the LaTeX project template,
project1_submission_template.tex.
You will modify this template to produce your project progress/final report in pdf format.
If you haven’t used LaTeX before, take some time to learn the basics. There are many good online tutorials. I suggest using the online editor Overleaf and reading this tutorial.
In the project progress/final report, tell me about your previous experience with LaTeX. If you had difficulties with this exercise or learned something new, tell me about it.
2. (GitHub) First, set up a GitHub account if you don’t already have one.
Visit the GitHub repo
https://github.com/math-data-science-course/Project1
Follow the instructions in the README file to make a copy of the repo.
In the project progress/final report, tell me about your previous experience with Git and GitHub. If you had difficulties with this exercise or learned something new, tell me about it.
3. (Python and Google Colab) Familiarize yourself with Python programming and the Google Colab environment.
Pick one of the mathematical/computer programming problems at
and write code in your Project1.ipynb Jupyter notebook to solve it. Some of these look very challenging! You should be able to find one that is reasonable for you and fun to solve.
In the project progress/final report, write the problem you solved and carefully describe your solution. Include figures and tables as necessary. You don’t need to include code verbatim since I have access to your Project1 repository. But, feel free to include pseudocode if it helps you describe what you did.
In the project progress/final report, tell me about your previous experience with programming. How familiar are you with Python and Google Colab? If you had difficulties with this exercise or learned something new, tell me about it.
4. (Regression Analysis) In this exercise, you will import and perform a regression analysis on the California Housing dataset. Use the provided code to import the dataset as a pandas dataframe and use the provided train/test split. In your Project1.ipynb Jupyter notebook, write code to regress the median house value for California districts on the eight predictor variables. You can use scikit-learn or another Python package for the analysis. You may exclude predictor variables if necessary, but explain why in the project report. Explore other regression methods on this dataset.
In the project progress/final report, describe the dataset analyzed. Describe your analysis, both for linear regression and other methods you try. Report the train/test $R^2$, MAE, and RMSE values. Include a scatterplot of the predicted vs. true median house values. Include a histogram of your model error. What are the most important predictor variables? Interpret the results.
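As a rough illustration of the metrics requested above, the following sketch fits a linear regression with scikit-learn and collects the train/test $R^2$, MAE, and RMSE. It is not the provided course code: the helper name `regression_report` is hypothetical, and the data is a synthetic eight-predictor stand-in from `make_regression` so the example runs on its own. In the notebook, the provided California Housing train/test split would replace it.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def regression_report(model, X_train, X_test, y_train, y_test):
    """Fit `model` on the training split and return R^2, MAE, RMSE per split.

    Hypothetical helper for illustration; not part of the course template.
    """
    model.fit(X_train, y_train)
    report = {}
    for name, X, y in [("train", X_train, y_train), ("test", X_test, y_test)]:
        pred = model.predict(X)
        report[name] = {
            "r2": float(r2_score(y, pred)),
            "mae": float(mean_absolute_error(y, pred)),
            # RMSE is the square root of the mean squared error.
            "rmse": float(np.sqrt(mean_squared_error(y, pred))),
        }
    return report


# ASSUMPTION: synthetic stand-in data with eight predictors. Replace these
# four arrays with the train/test split from the provided import code.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()
report = regression_report(model, X_train, X_test, y_train, y_test)
for split, metrics in report.items():
    print(split, {k: round(v, 3) for k, v in metrics.items()})

# Test-set errors (true minus predicted): the input for an error histogram.
residuals = y_test - model.predict(X_test)
```

Because the helper accepts any estimator with `fit` and `predict`, the same call can be reused when exploring other regression methods, which keeps the reported metrics comparable across models.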
5. (Classification Analysis) In this exercise, you will import and perform a classification analysis on the Breast Cancer Wisconsin dataset. Use the provided code to import the dataset as a pandas dataframe and use the provided train/test split. In your Project1.ipynb Jupyter notebook, write code to use support vector machines (SVM) to classify whether the breast cancer is malignant or benign based on the 30 predictor variables. You can use scikit-learn or another Python package for the analysis. Explore other classification methods on this dataset.
In the project progress/final report, describe the dataset analyzed. Describe your analysis. Report the train/test accuracy, roc_auc, and average precision. Include a confusion matrix and plots of the ROC and precision-recall curves. What are the most important predictor variables? Interpret the results.
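One possible starting point for the SVM analysis is sketched below. It assumes scikit-learn's bundled copy of the dataset (`load_breast_cancer`) and a locally made stratified split, both of which stand in for the provided import code and split. The helper name `classification_metrics`, the RBF kernel, and the standardization step are illustrative choices, not requirements of the assignment.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import (
    accuracy_score,
    average_precision_score,
    confusion_matrix,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ASSUMPTION: scikit-learn's bundled dataset and a local split; in the
# notebook, use the provided import code and train/test split instead.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# SVMs are sensitive to feature scale, so standardize before fitting.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_train, y_train)


def classification_metrics(clf, X, y):
    """Return accuracy, ROC AUC, average precision, and the confusion matrix.

    Hypothetical helper for illustration; not part of the course template.
    """
    # Signed distance to the decision boundary; larger values favor label 1,
    # which is "benign" in scikit-learn's encoding of this dataset.
    scores = clf.decision_function(X)
    pred = clf.predict(X)
    return {
        "accuracy": float(accuracy_score(y, pred)),
        "roc_auc": float(roc_auc_score(y, scores)),
        "average_precision": float(average_precision_score(y, scores)),
        # Rows are true labels, columns are predicted labels (order 0, 1).
        "confusion_matrix": confusion_matrix(y, pred),
    }


train_metrics = classification_metrics(clf, X_train, y_train)
test_metrics = classification_metrics(clf, X_test, y_test)
for name, m in [("train", train_metrics), ("test", test_metrics)]:
    print(name, round(m["accuracy"], 3), round(m["roc_auc"], 3),
          round(m["average_precision"], 3))
print(test_metrics["confusion_matrix"])
```

The ranking metrics here use `decision_function` scores rather than hard predictions, since ROC AUC and average precision are defined over a continuous score. Note that label 1 (benign) is treated as the positive class in this sketch; if malignant should be the positive class, the labels and scores need to be flipped consistently.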
General comments for Project Progress/Final Reports. I expect you to submit the progress/final reports to Gradescope in .pdf format. The reports should be based on the LaTeX project template. Any code should be stored in your GitHub repository, and a link to the repository should be provided in the .pdf document.
In the progress report, you should describe your progress on each part of the project, including the following points:
(1) Describe what you’ve already completed.
(2) Describe any difficulties you’ve encountered.
(3) Describe your next steps to finish.
If you’ve already finished a problem, you can include your solution and a comment that it is finished.