Bash compare an input array and a text file and update the file

I have read in a string, split it on a delimiter, and stored the pieces in an array. I want to iterate through a text file and delete the lines that do not match any of the strings stored in the array. Say my resulting array is ['foo', 'bar', 'baz', 'qux', 'quux', 'corge']

and my text file is:

foo
grault
bar
xyzzy
baz
quz
quux

I want to delete the lines grault and xyzzy (because they are not in the array) and add corge at the end, so my resulting file would be:

foo
bar
baz
quz
quux
corge

I am planning to use a for loop to iterate through my array and use grep to add the lines that are missing from the file, but how should I delete the lines that do not exist in the array but exist in the file?

John1024 (BEST ANSWER)

Let's define the list of approved words:

$ words='foo bar baz qux quux corge'

Now, let's remove from file any word that is not in words:

$ awk -v s="$words" 'BEGIN{split(s,a,/ /); for (i in a) b[a[i]]} ($0 in b){b[$0]++;print}' file
foo
bar
baz
quux

If we want to remove any word not in words and also add at the end any word in words that was not in file, then:

$ awk -v s="$words" 'BEGIN{split(s,a,/ /); for (i in a) b[a[i]]} ($0 in b){b[$0]++;print} END{for (w in b) if (b[w]==0) print w}' file
foo
bar
baz
quux
corge
qux

How it works

  • -v s="$words"

    This defines an awk variable s which has the contents of the shell variable words.

  • BEGIN{split(s,a,/ /); for (i in a) b[a[i]]}

    Before we read file, this splits up the words in s into an array a whose values are those words. Then, we create an associative array b with one key for each of the words.

  • ($0 in b){b[$0]++;print}

    As we read through file, if the whole line is a key in b, then increment the count of times that word has appeared and print the line.

  • END{for (w in b) if (b[w]==0) print w}

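    After we have finished reading the file, if any word in array b was never printed (that is, its count b[w] is still zero), then print it. Note that awk does not guarantee the order in which for (w in b) visits the keys, so the appended words may come out in any order.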
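
The commands above print the result to standard output, while the question asks to update the file itself. One common way to do that is to write to a temporary file and then move it over the original. The following is a minimal sketch rather than a definitive implementation: the sample data comes from the question, the file is created with mktemp purely for illustration, and the END block loops by index (a deliberate change from the for (w in b) loop above) so that missing words are appended in the order they appear in words.

```shell
# Sample data from the question; the temporary file stands in for "file".
words='foo bar baz qux quux corge'
file=$(mktemp)
printf '%s\n' foo grault bar xyzzy baz quz quux > "$file"

# Keep approved lines, then append approved words that never appeared.
# Output goes to a second temporary file, which replaces the original
# only if awk succeeds.
tmp=$(mktemp)
awk -v s="$words" '
    BEGIN { n = split(s, a, / /); for (i = 1; i <= n; i++) b[a[i]] = 0 }
    ($0 in b) { b[$0]++; print }
    END { for (i = 1; i <= n; i++) if (b[a[i]] == 0) print a[i] }
' "$file" > "$tmp" && mv "$tmp" "$file"

cat "$file"
```

With the sample data, the file ends up containing foo, bar, baz, quux, qux and corge, one per line.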

Eric Renouf

If your word list is already in a file, one word per line like the text file, you can just do

(grep -f <good list> <bad list>; echo 'corge')

to get the right list. Otherwise, you could try

(grep -f <(printf '%s\n' "${array[@]}") <bad file>; echo 'corge')

which uses process substitution to present your array as a file of patterns that grep can use to search the file.

This will give you only the lines from the original file that are in your word list, plus corge, which you had identified as missing. If you just want the file to match the word list, though, you could probably skip all the line matching and simply write your array to the file.
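
A possible refinement, offered as a sketch under stated assumptions: by default grep -f treats each pattern as a regular expression and matches it anywhere in the line, so a short word in the list would also keep longer lines that merely contain it. Adding -F (fixed strings) and -x (whole-line match) restricts the match to exact lines. A second grep with -v and the two files swapped finds the listed words that are missing from the file, which avoids hard-coding corge. The names good and bad are illustrative temporary files holding the question's sample data.

```shell
# Illustrative files: "good" is the word list, "bad" is the file to clean.
good=$(mktemp)
bad=$(mktemp)
printf '%s\n' foo bar baz qux quux corge > "$good"
printf '%s\n' foo grault bar xyzzy baz quz quux > "$bad"

result=$(
    # Lines of bad that exactly equal some line of good.
    grep -Fxf "$good" "$bad"
    # Lines of good that equal no line of bad (the missing words).
    grep -Fxvf "$bad" "$good"
)

printf '%s\n' "$result"
```

With the sample data this prints foo, bar, baz, quux, qux and corge, one per line. Redirecting the output to a temporary file and moving it over the original would update the file in place.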