DAT 560M: Big Data and Cloud Computing

Fall 2023, Mini B

Lab #4

INSTRUCTIONS

1. This is a group assignment, to be worked on during the lab.

2. Use ONLY the code we have practiced.

3. Please submit the answers on Canvas.

4. Only one submission per group is sufficient.

ASSIGNMENT

In this assignment, we are going to work on a dataset called auction.csv, which is located in the dataset folder on the server. The dataset contains eBay auction information on Cartier wristwatches, Palm Pilot M515 PDAs, Xbox game consoles, and Swarovski beads. It has the following columns:

auctionid: unique identifier of an auction

bid: the proxy bid placed by a bidder

bidtime: the time in days that the bid was placed, from the start of the auction

bidder: eBay username of the bidder

bidderrate: eBay feedback rating of the bidder

openbid: the opening bid set by the seller

price: the closing price that the item sold for

item: auction item

auction_type: the type of auction (3-day auction, 7-day auction, …)

Part 1- Initialization (10pts)

1- Start the PySpark engine and load the file into it. (5 pts)

2- Get to know the dataset and do a preliminary examination (for example, column types, summary statistics, …) (5 pts)

Part 2- Feature Engineering (20pts)

3- Create a new feature that counts how many bids were placed for each item.

Part 3- Linear Regression (70pts)

4- Make the data ready for a linear regression. We are interested in predicting the final price of an item based on “openbid”, “auction_type”, and the new feature added in Part 2. (40 pts)

5- Run the linear regression on the data, splitting the data into 70% training and 30% testing. (20 pts)

6- Report the MSE and R² of the model. (10 pts)

2023-12-10
