COMP-206 Introduction to Software Systems, Fall 2022

Mini Assignment 3: Advanced Unix Utils

Ex. 1 — Parsing program logs for analysis (12 Points)

It is often necessary to parse the output produced by various specialized software systems to extract/generate data that is of specific interest to us. In this assignment, we will use the advanced Unix utilities that we covered in class to analyze the log files of a distributed software system.

The log files that we will be using for this assignment are available under the directory hierarchy of /home/2013/jdsilv2/206/mini3/logset1. Please note that this directory may not be accessible through FileZilla, etc. It is primarily meant to be accessed from the Unix command line in mimi.

The description of the software system is provided towards the end of the assignment, under the section "logs of the distributed system", to help you understand how the log files are generated, the meaning of various descriptions, etc. It is important to read it thoroughly before continuing with the rest of the questions, so that you understand some of the techniques you will have to use to analyze the log files and generate the required output.

You will be writing a shell script, logparser.bash, that processes these log files.

1. (0.5 Points) The shell script is expected to be given the name of a directory as its argument, under which we will look for log files whose names are of the form host.port.log, where host is made up of alphanumeric characters and - (like our SOCS servers) and port is an integer. Do not hard code the directory name in your script. Each of these is the log file of a process with the identifier host:port.

If the script is not invoked with the correct number of arguments, it should print a usage message and terminate with a code of 1.

$ ./logparser.bash

Usage ./logparser.bash <logdir>
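The argument check above can be sketched as follows. This is only a minimal illustration, not a required structure: check_args is a hypothetical helper name, and the check is wrapped in a function so that the exit code is easy to see. The usage message goes to standard output here, since only the invalid-directory error in the next item is required to go to standard error.

```bash
#!/bin/bash
# check_args is a hypothetical helper (not named in the assignment).
# It prints the usage message and returns 1 unless it receives
# exactly one argument.
check_args() {
    if [ "$#" -ne 1 ]; then
        echo "Usage ./logparser.bash <logdir>"
        return 1
    fi
}

# In a script, this would typically be called near the top as:
#   check_args "$@" || exit 1
```

Calling check_args with no arguments prints the usage line and returns 1; calling it with a single argument prints nothing and returns 0.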

2. (0.5 Points)

If the passed argument is not a valid directory name, the script should print an error message and terminate with code 2. For this particular situation (and only here), the error message must be sent to standard error and not to standard output.

$ ./logparser.bash /tas/r

Error: /tas/r is not a valid directory name

You do not have to explicitly check if you have the permissions to access the directory or the log ﬁles.
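One possible way to express this check is shown below, as a sketch under the assumption that a plain -d test is what "valid directory name" means. check_dir is a hypothetical helper name. The >&2 redirection sends the message to standard error.

```bash
#!/bin/bash
# check_dir is a hypothetical helper (not named in the assignment).
# It returns 2 and writes the error to standard error when its
# argument is not an existing directory.
check_dir() {
    if [ ! -d "$1" ]; then
        echo "Error: $1 is not a valid directory name" >&2
        return 2
    fi
}

# In a script, this would typically be used as:
#   check_dir "$1" || exit 2
```

Running the check with standard error redirected to /dev/null should show no output at all, which is a quick way to confirm that the message is not going to standard output.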

3. (6 Points)

The script should parse all the log files and produce a comma-separated values (CSV) output file called logdata.csv, in the following format (truncated for brevity), in the current directory (i.e., the directory you were in when the script was invoked).

........
teach-node-05:40190,13,teach-node-05:40190,10:56:52.297790000,10:56:52.362437000,10:56:52.492420000
teach-node-05:40190,13,teach-node-05:40290,10:56:52.297790000,10:56:52.361714000,
teach-node-05:40190,13,teach-node-05:40390,10:56:52.297790000,10:56:52.362785000,10:56:52.363999000
teach-node-05:40190,13,teach-node-05:40490,10:56:52.297790000,10:56:52.363259000,10:56:52.364581000
teach-node-05:40190,13,teach-node-05:40590,10:56:52.297790000,,
teach-node-05:40190,13,teach-node-05:40690,10:56:52.297790000,10:56:52.366296000,10:56:52.366887000
........

In the above example, the first field is the process that initiated the broadcast (teach-node-05:40190). The second field is the message id (13), which is unique only to the broadcaster process. The third field is a process that received that broadcast message (so if you have 6 processes in the group, you will have a maximum of 6 entries in the CSV file per broadcast message, one per receiver). The fourth, fifth and sixth fields are the broadcast time (available only in the sender's log), the receive time and the delivery time (the last two being available in the receiving process's log), respectively. As you can see, the broadcast time is the same for all 6 entries, but the other fields can vary.

Please note that since the group communication (GC) messaging system is not very reliable, a broadcast message may not reach the other process (receive), or in some cases may be received but not actually delivered. This means that the entries for receive and/or delivery might be missing from the log files, in which case you leave those fields empty. You do not have to worry about duplicate delivery of the same message from a broadcaster. The output CSV must be ordered by broadcaster process identifier, message identifier, and receiver process identifier.
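As an illustration of the required ordering, the following minimal sketch sorts CSV rows read from standard input on their first three fields. It assumes that message ids should be compared numerically (so that 9 comes before 10) and uses LC_ALL=C for a byte-wise ordering of process identifiers. Both choices are assumptions rather than part of the assignment text, so compare against the sample output given to you. sort_rows is a hypothetical helper name, and the rows in the demonstration are made-up placeholders, not real log data.

```bash
#!/bin/bash
# sort_rows is a hypothetical helper: it orders CSV rows by
# field 1 (broadcaster), field 2 (message id, numeric) and
# field 3 (receiver).
sort_rows() {
    LC_ALL=C sort -t, -k1,1 -k2,2n -k3,3
}

# Demonstration with placeholder rows (empty receive/delivery fields).
printf '%s\n' \
    'b:2,10,a:1,t,,' \
    'a:1,9,b:2,t,,' \
    'a:1,10,a:1,t,,' \
    'a:1,9,a:1,t,,' | sort_rows
```

Because the rows are piped straight into sort, no intermediate file is needed.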

4. (3 Points) Using logdata.csv, the script must generate another CSV file (in the same directory), stats.csv, which contains a summary of the efficiency of the group communication system. (Truncated for brevity.) The example below only illustrates the format; the data may not be accurate.

broadcaster,nummsgs,teach-node-05:40190,teach-node-05:40290,teach-node-05:40390,teach-node-05:40490,...
teach-node-05:40190,70,81,90,98,76,93,78
teach-node-05:40390,30,83,70,80,80,80,84
teach-node-05:40490,26,84,88,73,95,93.3333,92
teach-node-05:40590,72,61.6667,100,91.6667,83.3333,80,98

There is one row per process that broadcasts. Each row has a column showing how many messages that process broadcast, followed by one column per receiver giving the percentage of the broadcaster's messages that were delivered at that receiver (that is, the receiver's log file contains a delivery time for those messages). Do not include rows for processes that do not perform any broadcasts. The file should also have a header naming the columns, as in the above example. Rows should follow the order of the broadcaster process identifiers, and the columns within a row (left to right) should follow the order of the process identifiers.
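The percentage arithmetic can be sketched with awk as below. This is a deliberately partial illustration: it prints one line per (broadcaster, receiver) pair rather than the final one-row-per-broadcaster layout, and it assumes that logdata.csv contains a row for every receiver of every broadcast, with an empty sixth field when the message was not delivered. delivery_pct is a hypothetical helper name.

```bash
#!/bin/bash
# delivery_pct is a hypothetical helper. It reads logdata.csv-style
# rows on standard input and prints "broadcaster,receiver,percentage",
# where percentage = 100 * delivered / total for that pair.
delivery_pct() {
    awk -F, '
        { total[$1 "," $3]++ }                 # every row counts as one broadcast to this receiver
        $6 != "" { delivered[$1 "," $3]++ }    # a non-empty sixth field means it was delivered
        END {
            for (pair in total)
                print pair "," (100 * delivered[pair] / total[pair])
        }' | LC_ALL=C sort                     # awk array order is unspecified, so sort the result
}
```

With awk's default number formatting, 2 deliveries out of 3 broadcasts print as 66.6667, which is the same style of rounding seen in the sample stats.csv.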

5. (2 Points) Next, we will use the stats.csv file to generate an HTML file named stats.html (also in the same directory). This can be easily accomplished by replacing each comma (,) with the appropriate HTML tags. (Open the sample file given to you in vi to understand the format.) The example below only illustrates the format; the data may not be accurate.

<HTML>
<BODY>
<H2>GC Efficiency</H2>
<TABLE>
<TR><TH>broadcaster</TH><TH>nummsgs</TH><TH>teach-node-05:40190</TH> . . .</TR>
<TR><TD>teach-node-05:40190</TD><TD>70</TD><TD>81</TD> . . .</TR>
<TR><TD>teach-node-05:40390</TD><TD>30</TD><TD>83</TD> . . .</TR>
<TR><TD>teach-node-05:40490</TD><TD>26</TD><TD>84</TD> . . .</TR>
<TR><TD>teach-node-05:40590</TD><TD>72</TD><TD>61.6667</TD> . . .</TR>
</TABLE>
</BODY>
</HTML>
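The row conversion can be sketched as follows. This is one possible approach, not the required one: it uses awk to wrap each field in tags instead of literally substituting commas, it treats the first input line as the header (TH cells), and it leaves out the surrounding HTML, BODY, H2 and TABLE lines. csv_to_rows is a hypothetical helper name.

```bash
#!/bin/bash
# csv_to_rows is a hypothetical helper. It reads CSV on standard
# input and prints one <TR> line per row, using <TH> cells for the
# first line and <TD> cells for all others.
csv_to_rows() {
    awk -F, '{
        tag = (NR == 1) ? "TH" : "TD"
        row = "<TR>"
        for (i = 1; i <= NF; i++)
            row = row "<" tag ">" $i "</" tag ">"
        print row "</TR>"
    }'
}

# Demonstration with a two-line placeholder CSV.
printf '%s\n' 'broadcaster,nummsgs,p:1' 'p:1,70,81' | csv_to_rows
```

Check the exact spacing and tag layout against the sample stats.html before relying on a sketch like this, since the output format is graded.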

You can also use the lynx command line browser available in mimi to look at the HTML file, which will be displayed as shown below (colors may be different or absent depending on your terminal software and are not relevant). The example below only illustrates the format; the data may not be accurate.

ADDITIONAL RESTRICTIONS

● You must write a reasonable amount of comments (to understand the logic) in your script. You can lose up to 2 points for not writing comments.

● Follow the sample output format that is given to you (in the files) for the valid invocation. It does not take much effort to implement it. Not following it can result in a deduction of 2 points or more.

● The script should only create the files asked for and MUST NOT create any other temporary/intermediate files to do its work. Use the techniques already covered in previous assignments and labs to pass the output of one command/utility to another, etc. Violations will result in a deduction of 3 points.

● Any error messages from your program should be the result of an explicit echo command in your script. Any error messages from commands/utilities used by your script should be handled by the script itself and not reported to the user. Violating this will result in a deduction of 2 points per occurrence.

● DO NOT assume that the process identifiers will have only so many characters or so many integers.

● Your submission should be a single script (file); specifically, do not put awk commands, etc., in a separate file (3 points deduction). It might even be a 0 for the assignment if it does not run as the TA expects because of this.

● For the log files in the test directory given to you, your script should run in under 1 minute (clock time) and must not "hang". Scripts that take longer than this may not get graded, or may be graded only on the outputs produced in that time, which could result in 0 or very low points depending on how far your script progressed. To give some perspective, a simple, unoptimized implementation of this solution runs well under 11 seconds.

● DO NOT edit or save files on your local laptop, not even to edit comments in the script. This can interfere with the file format, and the script might then not run on mimi when the TAs try to execute it. This will result in a 0. No exemptions! TAs will not modify your script to make it work.
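The restriction on temporary files can be met by connecting commands with pipes. The sketch below is only an illustration of that technique: it derives the host:port process identifiers from the host.port.log file names in a directory and pipes the loop's output directly into sort, without writing anything to disk. list_processes is a hypothetical helper name, and LC_ALL=C is an assumption made here to get a predictable ordering. Because it splits on the last dot using parameter expansion, it makes no assumption about how long the host or port is.

```bash
#!/bin/bash
# list_processes is a hypothetical helper. Given a log directory,
# it prints one host:port identifier per host.port.log file, sorted.
list_processes() {
    local f name
    for f in "$1"/*.log; do
        name=$(basename "$f" .log)        # strip the directory and the .log suffix
        echo "${name%.*}:${name##*.}"     # text before the last dot, a colon, text after it
    done | LC_ALL=C sort                  # the whole loop feeds sort through a pipe
}
```

The result can be captured in a variable with procs=$(list_processes "$1") and reused in later steps. If a utility in such a pipeline might print its own error messages, appending 2>/dev/null to that command keeps them from reaching the user.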

WHAT TO HAND IN

Upload your script logparser.bash to MyCourses under the mini 3 folder. Do not zip the file. Re-submissions are allowed.

You are responsible for ensuring that you have uploaded the correct file to the correct assignment/course. There are no exemptions. If you think it is not worth spending 5 minutes of your time to ensure that your submission that is worth 10% of your grade is correct, we won't either. NO emailing of submissions. If it is not in MyCourses in the correct submission folder, it does not get graded. Not knowing how to resubmit an assignment, or running out of time to figure it out, is not an excuse for emailing the assignment. Imagine what would happen if all of you emailed your assignment instead of submitting it in MyCourses!

LATE SUBMISSIONS

The late penalty is -20% per day. Even if you are late by only a few minutes, it will be rounded up to a day. A maximum of 2 late days is allowed. Do not email me asking for extensions because you have other midterms, assignments, etc. You had the assignment out for a long time and the schedules were given to you at the beginning of the semester!

Nevertheless, any request made less than 24 hours before the deadline is automatically denied. It would have been too late to start your work anyway. Extensions are given only for extenuating circumstances.

ASSUMPTIONS

● You may assume that any files and directories that your script needs to access will have the necessary permissions for it to execute the tasks outlined in the assignment.

● The directory will contain only valid log files in the correct format (name and contents). It will not contain anything else. It will not be empty either.

● The entries in the log files follow the order of time (as seen by that process), i.e., they are not deliberately scrambled to randomize time.

● You need not worry about the number of decimal digits in your stats computations, as long as the fractional part is in the same range. I.e., it is ok for 83.3333 to be 83.33 or 83.3333333, etc. (minor approximations), but it should not be 84 or 83.345, etc., which generally means you did the math incorrectly somewhere.

HINTS

This is a high-level outline to get you started with the first part of the output format in case you feel stuck. You are not obliged to follow it.

● Remember that each log file represents the events that happened in that process.

● Look for the broadcast messages to see which processes were broadcasting and what their message ids were.

● Remember that message ids are unique only with respect to a broadcaster process. I.e., a process p1 and another process p2 can both broadcast the same id, say i, and both messages will be sent to all the processes (and possibly received and delivered).

● Use the combination of process and message id to figure out which message you are tracking across the various log files.

● For each broadcast message, check in all the log files to see when it was received / delivered at each process.

A pseudo code for the main part is given below (you are not obliged to follow it; if you do not understand it, build one based on your own logic). This is definitely not efficient, but it will get you through the task.

for each process, get the list of message identifiers that it broadcasted
    for each of those messages
        find the broadcast timestamp of that message.
        go over each process log and extract the receive time and delivery time of that message in that process.
        produce an output that contains the broadcaster, message id, receiver, and the timestamps.

For the last part (again, you are not obliged to follow this; there could be other ways to implement it):

● This link provides a simple introduction to the concept of HTML tables.

● You will find it easier if you look at this problem as how to replace the comma (,) that separates the fields with the appropriate HTML tags. Verify the correctness of the data in the stats.csv file, etc., AND THEN think about the logic to transform it into an HTML table format.

2022-10-18
