This assignment involves writing a program (called texta) to transform text files according to a set of commands in a fi

Post by answerhappygod »

This assignment involves writing a program (called texta) to transform text files according to a set of commands in a file.

Unix and Unix-like systems (Linux etc.) have a large number of programs for manipulating text. In fact, the original Unix was developed to help people write technical documentation. In those days everything was stored as plain text and specialised document processing systems did the typesetting to produce the final document. We still use a lot of plain text files today. There are still widely used document processing systems such as LaTeX that take a plain text file and interpret commands in the file to display typeset text. LaTeX is almost universal in the technical academic world (CS, Eng, Science...). Text formats such as CSV (comma separated values) are used to store data files. Log files that store information about activity in a system are all stored in text.

There are many things we might like to do with a text file. For example, pick out all lines that contain a certain string, or replace a certain string with another string, or pick out particular substrings of the line. A good example of this occurs when you analyse log files. A web server log might accumulate many thousands of entries (lines) each day. If I want to find out how many people accessed a certain URL I want to extract all relevant lines and count them. I might want the total number of accesses from a particular client machine, or how many total bytes were transferred.
All of these examples require analysis of the text log file. Unix has many commands for doing this analysis. Each of these commands does a small task (for example, selecting lines that match a pattern) and we can string them together in a shell pipeline. For this assignment we're going to write a program that implements a few of these operations and control what to do with a file of instructions. It will be a sort of Swiss Army Knife that does everything.

The texta Command

Your command (called "texta") will read a file containing instructions (commands) to do something with the input text. It will read lines from the input files and apply each command, in sequence, to each line, sending the result to output. The commands to implement are:

    filter regexp                        # selects lines that match regular expression regexp
    fields "delimiter-string" a b c d    # divides a source line into fields using the delimiter-string and keeps only the fields numbered a b c d, in the given order
    replace "string1" "string2"          # replaces string1 by string2
    count                                # at the end of a run prints a count of the number of output lines on stderr

A comment may be placed at the end of any command using a hash character (#). Each command must consist of a single line.

filter

The regexp argument specifies a regular expression which is matched against each line. If the expression matches, the line is passed to the next stage. If the expression does not match, that line is skipped. You can implement regular expression matching easily in Python using the "re" module.

V2: Note that the double quote character cannot appear in a regexp in this exercise. This is to make the exercise a bit easier - regular expressions can normally include quotes.

fields

The delimiter-string is any string of characters enclosed in double quotes ("). It is used to break the line into a set of fields, numbered from 0. The numbers a b c etc. select the fields to be written to the output (separated by delimiter-string) and the order in which they are written. This allows the line to be re-ordered, fields deleted etc. Note that the double quote character cannot appear in a delimiter-string, and an empty string ("") means any white space.

V2: If the delimiter string is empty, output fields are separated with a single space.

replace

This allows one string to be replaced by another. The strings are enclosed in double quotes. This is done in one pass, left to right. If string1 is empty ("") it means any amount of whitespace is to be replaced. If string2 is empty ("") it means remove string1.

V2: The double quote character cannot appear in the strings.

V2: A newline character (\n) is illegal in any of the strings in filter, fields or replace.

count

This command sets a flag to tell texta to print the count of output lines to the standard error file when the texta program finishes.

The general form of the texta command is:

    texta cmdfile [file1 [file2...]]

    cmdfile     is the file containing commands (eg filter etc)
    file1       is a file to apply the commands to
    file2...    are more files to apply the commands to

If there are no files specified, texta reads from the standard input. The processed lines are written to the standard output.

Error handling and assumptions

You should make sure the arguments to your texta command are correct: there must be a cmdfile, and it must be readable and contain commands. If there are filenames, the files must exist and be readable. You should check that the commands are legal, ie one of filter, fields, replace, count, and that they have the right number of arguments. Also check that the arguments are legal, eg no negative field numbers or non-integers.

Error messages must have the form:

    Error: file name not readable
    Error: command line N: bad field number
    Error: command line N: incorrect number of strings in replace
    Error: command line N: message
    Error: message
where name is one of the file names on the command line, N is the line number in the command file, and message is an informative error message.

V2: If an input file is unreadable give an error message and move on to the next file.

V2: The philosophy with errors is to give an informative message and try to keep going. For example, if a line doesn't have the correct number of fields you should give an error message on stderr but then go to the next command for the same line.

V2: It is impossible to cover all possible input states in a specification like this. If you find something you think is ambiguous in the specification make your own judgement and justify it with comments in the code. The marker will be reading your code.

Usage Examples

Given an input file (called testdata) in the comma separated value spreadsheet format (CSV) containing the following data:

    Jim, Smith, [email protected], INFO1110
    Jane, Smith, [email protected], INFO1113
    Bill, Smith, jsmi [email protected], INFO1110

And the following set of commands in the file testcmds:

    filter "INFO1110"    #select students in INFO1110
    fields ", " 2        # replace matching lines with the email address
    count                #print the number of lines

Then the command:

    texta testcmds testdata

will output a set of lines containing email addresses only for students that are in INFO1110. It will also print the count of output lines to the standard error file.

Standard output:

    [email protected]
    jsmi [email protected]

Standard error:

    2

Another example with the same input but testcmds containing:

    fields ", " 2
    replace "@uni.sydney.edu.au" ""    # extract unikey

This will take the email address field and remove the "@uni.sydney.edu.au" part.

Standard output:

    jsmi4321
    jsmi1234
    jsmi 9876

Standard error will not have any output.

An example showing error messages

With testcmds containing:

    fields "," x

Standard output:

    (no output)

Standard error:

    Error: command line 1: bad field number

Implementation

The assignment is to be implemented in Python as a script which handles command line arguments. A set of scaffold files will be provided. You are expected to write legible code with good style. The only Python modules which you are allowed to import are os, sys, and re. If you want to use an additional module which will not trivialize the assignment, ask your tutor, and the allowed library list may be extended.
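The command rules in the specification can be sketched in Python roughly as follows. This is only a minimal illustration, not the scaffold or a reference solution: the helper names strip_comment, parse_command and apply_commands are hypothetical, and several choices are assumptions the specification leaves open (the regexp may be quoted or bare, "matches" is taken to mean a match anywhere in the line via re.search, and the wording of the messages other than the two fixed forms is invented). Only re and sys are used.

```python
import re
import sys


def strip_comment(text):
    """Remove a trailing '# ...' comment, ignoring any '#' inside double quotes."""
    in_quotes = False
    for i, ch in enumerate(text):
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "#" and not in_quotes:
            return text[:i].strip()
    return text.strip()


def parse_command(text, lineno):
    """Parse one command-file line into (name, args); raise ValueError if illegal."""
    parts = strip_comment(text).split(None, 1)
    name = parts[0] if parts else ""
    rest = parts[1] if len(parts) > 1 else ""
    strings = re.findall(r'"([^"]*)"', rest)        # quoted arguments
    bare = re.sub(r'"[^"]*"', " ", rest).split()    # unquoted arguments
    prefix = "Error: command line %d: " % lineno
    if name == "filter":
        # Assumption: the regexp may be quoted (as in the usage example) or bare.
        if len(strings) == 1 and not bare:
            return name, [strings[0]]
        if not strings and rest:
            return name, [rest]
        raise ValueError(prefix + "filter needs exactly one regular expression")
    if name == "fields":
        if len(strings) != 1:
            raise ValueError(prefix + "fields needs exactly one delimiter string")
        if not bare or not all(re.fullmatch(r"[0-9]+", b) for b in bare):
            raise ValueError(prefix + "bad field number")
        return name, [strings[0]] + [int(b) for b in bare]
    if name == "replace":
        if len(strings) != 2 or bare:
            raise ValueError(prefix + "incorrect number of strings in replace")
        return name, strings
    if name == "count":
        if rest:
            raise ValueError(prefix + "count takes no arguments")
        return name, []
    raise ValueError(prefix + "unknown command")


def apply_commands(line, commands):
    """Run every parsed command on one line (no trailing newline).

    Returns the transformed line, or None if a filter rejected it.
    The count command is a flag for the caller and is ignored here.
    """
    for name, args in commands:
        if name == "filter":
            # Assumption: a match anywhere in the line counts (re.search).
            if re.search(args[0], line) is None:
                return None
        elif name == "fields":
            delim, wanted = args[0], args[1:]
            # An empty delimiter means "split on any white space".
            parts = line.split(delim) if delim else line.split()
            if any(i >= len(parts) for i in wanted):
                # Report, then carry on with the next command for the same line.
                print("Error: line has too few fields", file=sys.stderr)
                continue
            line = (delim or " ").join(parts[i] for i in wanted)
        elif name == "replace":
            old, new = args
            if old:
                line = line.replace(old, new)
            else:
                # An empty first string stands for any run of whitespace.
                line = re.sub(r"\s+", lambda m: new, line)
    return line
```

A caller would parse each non-blank line of cmdfile once (printing the ValueError message to stderr on failure), then call apply_commands on every input line with its newline stripped, write each non-None result to standard output, and keep a running total to print on stderr if a count command was present.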