This assignment asks you to implement a short shell script. Given a list of keywords and a list of files, the script should produce an organized index that shows the line number of each occurrence of each keyword within the specified files. Basically, this script makes it easy to find a reference within a file by keyword.
The script accepts two arguments, each of which is a quoted list. The first argument is the list of keywords. The second argument is the list of files to search. Each file is searched for occurrences of each keyword. Only whole words should match, not substrings of longer words: e.g., if one of the keywords is "ark", occurrences of "mark" should not be flagged as a match.
Should the user invoke the script with other than two (2) arguments, it should display its usage help and terminate, as below:
Usage: index "keywords" "file list"
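One way to sketch the argument check is shown here. The function name check_args is hypothetical, and printing the usage line to standard output (rather than standard error) is an assumption; the assignment only says to display it.

```shell
# Hypothetical helper: succeed only when exactly two arguments are given.
# Otherwise print the usage line and return a non-zero status.
check_args() {
    if [ "$#" -ne 2 ]; then
        echo 'Usage: index "keywords" "file list"'
        return 1
    fi
    return 0
}
```

Inside the script, this would typically be called near the top as: check_args "$@" || exit 1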
The output should be neat and readable. The example below shows the expected behavior. Please try to match this format as closely as possible, using tabs to force the leading whitespace on a line.
> ./index "Parker Waterman" "pens/*"
Parker
	pens/fountain-pens.txt: 12 15 39 58 62 67
	pens/inks.txt: 8 21 42
	pens/other-pens.txt: 7 22
	pens/pens.txt: 20 26
Waterman
	pens/fountain-pens.txt: 4 10 23 27 28 29 37 43 44 47 54 59 69 71
	pens/inks.txt: 11 19 22 24 28 30 32 47
	pens/other-pens.txt: 10 27
	pens/pens.txt: 5 18 35 39 40 42
grep can be used to find lines that contain matching patterns, such as words. RTFM to find the grep switches that print line numbers and that restrict matches to whole words.
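As a quick experiment, the sketch below assumes the -n (line numbers) and -w (whole-word match) switches, which both GNU and BSD grep provide; check your own man page to confirm. The sample file and its contents are made up for illustration.

```shell
# Build a throwaway sample file (hypothetical contents) to search.
sample=$(mktemp)
printf 'the ark\nmark my words\nark and mark\n' > "$sample"

# -n prefixes each matching line with its line number.
# -w matches "ark" only as a whole word, so line 2 ("mark my words") is skipped.
hits=$(grep -n -w 'ark' "$sample")
echo "$hits"

rm -f "$sample"
```

With a single file, each output line has the form line:text. When grep is given several files, it also prefixes the file name, giving file:line:text.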
cut can be used to select the necessary information from the grep output. For example, cut -d: -f3 selects the third colon-delimited field from each line of its input.
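To see how the fields line up, here is a small sketch that splits one line in the file:line:text shape. The record itself is invented sample data, not real output from the pens files.

```shell
# A made-up record in the "file:line:text" shape.
record='pens/inks.txt:21:a Parker fountain pen'

# -d: sets the delimiter to a colon; -fN picks the Nth field.
file=$(printf '%s\n' "$record" | cut -d: -f1)
line=$(printf '%s\n' "$record" | cut -d: -f2)
text=$(printf '%s\n' "$record" | cut -d: -f3)

printf 'file=%s line=%s text=%s\n' "$file" "$line" "$text"
```

Note that if the matched text itself contains a colon, -f3 returns only the part before it; -f3- selects the third field and everything after it.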
tr, short for translate, can be used to substitute one character for another or, with the -d option, to delete a character outright.
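For instance, tr can turn a column of line numbers into a single row, which may be handy for this assignment's output format. The input values below are made up.

```shell
# Substitute each newline with a space: a column of numbers becomes one row.
# The result keeps a trailing space, because the final newline is translated too.
numbers=$(printf '8\n21\n42\n' | tr '\n' ' ')
printf '[%s]\n' "$numbers"

# With -d, delete every colon outright.
stripped=$(printf 'a:b:c\n' | tr -d ':')
printf '[%s]\n' "$stripped"
```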
Try these commands with some sample searches at the command prompt before using them in your shell script!
My solution first checked the argument count. If it wasn't right, it printed the usage and exited. Then, it entered a nested loop. The outer loop was "for each keyword in the list." The nested loop was "for each file in the list".
Basically, I printed out the keyword. Then, indented, I printed out the hits for each file. I got the hits and their line numbers using grep, then massaged the output into shape. Lastly, I used printf to generate the output.
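The approach described above might be sketched roughly as follows. This is not the reference solution, just one possible shape under some assumptions: the function name build_index is hypothetical, grep is assumed to support -n and -w, files with no hits are skipped (the assignment does not say whether they should be listed), and file names are assumed to contain no spaces.

```shell
# Hypothetical sketch of the nested-loop approach.
# $1 is the keyword list, $2 is the file list (possibly a glob such as pens/*).
build_index() {
    keywords=$1
    files=$2
    # Unquoted on purpose: word splitting separates the keywords,
    # and splitting plus globbing expands patterns in the file list.
    for keyword in $keywords; do
        printf '%s\n' "$keyword"
        for file in $files; do
            # With one file, grep -n prints "line:text". Keep the line
            # numbers only, then join them onto a single line.
            lines=$(grep -n -w -- "$keyword" "$file" | cut -d: -f1 | tr '\n' ' ')
            # Assumption: files with no hits are left out of the index.
            if [ -n "$lines" ]; then
                # ${lines% } drops the trailing space left behind by tr.
                printf '\t%s: %s\n' "$file" "${lines% }"
            fi
        done
    done
}
```

A full script would check the argument count first and then call: build_index "$1" "$2"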
The handin directory will be set up next week.