I have read in a string, split it on a delimiter, and stored the pieces in an array. I want to iterate through a text file and delete the lines that do not contain any of the strings stored in the array. Say my resulting array is ['foo', 'bar', 'baz', 'qux', 'quux', 'corge']
and my text file is:
foo
grault
bar
xyzzy
baz
qux
quux
I want to delete the lines grault and xyzzy (because they are not in the array) and add corge at the end, so my resulting file would be:
foo
bar
baz
qux
quux
corge
I am planning to use a for loop to iterate through my array and use grep to add the lines that are missing from the file. But how should I delete the lines that exist in the file but not in the array?
Let's define the list of approved words:
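A minimal sketch of that definition, assuming the words are kept space-separated in a shell variable named `words` (the variable the explanation refers to). If the words are in a bash array instead, `"${arr[*]}"` (with a hypothetical array name `arr`) joins the elements using the first character of `IFS`, which is a space by default.

```shell
# Approved words, separated by single spaces, because the awk code splits on / /.
words='foo bar baz qux quux corge'
```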
Now, let's remove from `file` any word that is not in `words`.
If we want to remove any word not in `words` and also add to the end any word in `words` that was not in `file`, the awk program needs a few more pieces, explained below.
How it works
- `-v s="$words"`: this defines an awk variable `s` which has the contents of the shell variable `words`.
- `BEGIN{split(s,a,/ /); for (i in a) b[a[i]]}`: before we read `file`, this splits up the words in `s` into an array `a` whose values are those words. Then, we create an associative array `b` with one key for each of the words.
- `($0 in b){b[$0]++;print}`: as we read through `file`, if the line matches a word in `b`, then increment the count of the number of times that word has appeared and also print the word.
- `END{for (w in b) if (b[w]==0) print w}`: after we have finished reading the file, if any word in array `b` was not printed, that is, its count `b[w]` is still zero, then print it.
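The pieces above can be assembled into two commands: one that only filters, and one that filters and then appends the missing words. This is a sketch consistent with the explanation, not necessarily the exact original. It assumes the words are space-separated in `words`. The function names `keep_known` and `keep_and_append` are hypothetical wrappers added for illustration. The demo writes the question's sample lines (with `qux` spelled as in the array) to a throwaway file from `mktemp` rather than touching a real `file`.

```shell
#!/bin/sh
# Approved words, space-separated (the list from the question).
words='foo bar baz qux quux corge'

# keep_known FILE (hypothetical name): print only lines that exactly match a word.
keep_known() {
  awk -v s="$words" '
    BEGIN { split(s, a, / /); for (i in a) b[a[i]] }  # b: one key per word
    ($0 in b)                                          # print matching lines
  ' "$1"
}

# keep_and_append FILE (hypothetical name): same filter, then print unseen words.
keep_and_append() {
  awk -v s="$words" '
    BEGIN { split(s, a, / /); for (i in a) b[a[i]] }
    ($0 in b) { b[$0]++; print }                 # keep and count matches
    END { for (w in b) if (b[w] == 0) print w }  # words never seen in FILE
  ' "$1"
}

# Demo on a temporary copy of the sample input; removed when the script exits.
file=$(mktemp)
trap 'rm -f "$file"' EXIT
printf '%s\n' foo grault bar xyzzy baz qux quux > "$file"

keep_known "$file"       # foo bar baz qux quux, one per line
keep_and_append "$file"  # the same five lines followed by corge
```

Both commands write to standard output and leave the input unchanged. To update the file itself, redirect to a new file and move it over the original, e.g. `keep_and_append file > file.new && mv file.new file`. Also note that `for (w in b)` visits keys in an unspecified order, so when several words are missing they may be appended in any order.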