how to compare values between two files?

1.2k Views Asked by At

I have two files with two columns separated by space

cat file1.txt
281475225437349 33,32,21,17,20,22,18,30,19,16,23,31
281475550885480 35,32,33,21,39,40,57,36,41,17,20,38,34,37,16,99

cat file2.txt
281475550885480 16,17,20,21,32,33,34,35,36,37,38,39,40,41
281475225437349 16,17,18,19,20,21,22,23,24,25,30,31,32,33

i want to compare values from file1 column2 with file2 column2 for same value in column1. and print only those values which does exist in file1 column2 but not in file2 column2 along and not vice versa along with respective value in column1

Desired o/p

it should not print anything for 281475225437349 as all values from file1 column2 present in file2 column 2 for 281475225437349

it should only print values for 281475550885480 which present in file1 column2 but not present in file2 column2 . Like values 57 and 99 for 281475550885480

So o/p file like:

cat output.txt
281475550885480 57,99

I have tried sorting the file and the comparing using sdiff but it give difference and its taking time

sdiff file1.txt file2.txt
6

There are 6 best solutions below

9
On BEST ANSWER

Perl solution: create a hash of hashes from the second file. The key is the large number, the inner keys are the smaller numbers from the comma separated list. Then iterate over the first file and check what numbers aren't mentioned in the remembered structure.

#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

open my $f1, '<', 'file1' or die $!;
open my $f2, '<', 'file2' or die $!;

my %seen;
while (<$f2>) {
    my ($key, $value_string) = split ' ';
    my @values = split /,/, $value_string;  #/
    undef @{ $seen{$key} }{@values};
}
while (<$f1>) {
    my ($key, $value_string) = split ' ';
    my @values = split /,/, $value_string;
    my %surplus;
    undef @surplus{@values};
    delete @surplus{ keys %{ $seen{$key} } };
    say $key, ' ', join ',', keys %surplus
        if keys %surplus;
}

BTW, when you switch the files, the output will be

281475225437349 24,25

because 24 and 25 aren't present in file1.

0
On

This is straightforward with awk:

awk '(NR==FNR) { a[$1]=","$2","; next }
     { delete b }                   # clear array for new run
     { split($2,f,",") }            # split string of file1 in array f
     { for(i in f) if (! match(a[$1],"," f[i] ",")) b[f[i]]  }                                                                                                                                                                             
     { c=$1" "; for(i in b) {printf c i; c="," }; if (c==",") printf "\n" }' file2 file1

which returns:

281475550885480 57,99

The idea is to store the string of column two with an extra two <comma>'s to ensure that every number is between commas. Then search for the comma-sandwiched substring.

If you have to do a real number check and you might have to compare numbers "5" with "005" and they should be equal, then you should do the following:

awk '(NR==FNR) { a[$1]=$2; next }
     { delete b }                       # clear array
     { # split strings in number elements
       split($2,f,",");    for(i in f) f1[f[i]+0];
       split(a[$1],f,","); for(i in f) f2[f[i]+0]
     }
     { for(i in f1) if (! (i in f2)) b[i] }
     { c=$1" "; for(i in b) {printf c i; c="," }; if (c==",") printf "\n" }' file2 file1
1
On

You must use two loop in your code. Also split ',' character and set two array, after

foreach(var itemOne as arrayOne){
boolean isExist=false;
foreach(var itemTwo as arrayTwo)
if(itemOne==itemTwo) isExist=true;

if(isExist) console.log(itemOne+" is exist");
}
0
On
$ cat tst.awk
BEGIN { FS="[ ,]" }
NR==FNR {
    for (i=2; i<=NF; i++) {
        file2[$1,$i]
    }
    next
}
{
    diff = ""
    for (i=2; i<=NF; i++) {
        if ( !(($1,$i) in file2) ) {
            diff = (diff == "" ? "" : diff ",") $i
        }
    }
    if (diff != "") {
        print $1, diff
    }
}

$ awk -f tst.awk file2 file1
281475550885480 57,99
0
On

another awk

$ awk -v c=, 'NR==FNR {a[$1]=$2; next}
                      {for(i=2;i<=NF;i++) 
                         {if(c a[$1] c !~ c $i c) p=(p==""?$1" ":p c) $i} 
                          if(p) print p; p=""}' file2 FS=' |,' file1

281475550885480 57,99
0
On

This might work for you (GNU sed):

sed -r 's#^(\S+)\s(\S+)$#/^\1 /s/$/,\\n\2,/#' file2 |
sed -rnf - -e ':a;s/(\b[0-9]+,)(.*\n.*)\1/\2/;ta;s/(.*),\n.*/\1/p' file1

The solution can be split into two parts.

First file2 is turned into a sed script which appends the values for each key to the same key in file1. Additional ,'s are added to make matching easier and will act as indicator of whether a line from file1 has values to be printed.

The script generated from file2 is piped into a second sed invocation and an additional subscript uses substitution and a loop to remove matching values from file1.

On a match failing, and because the second sed invocation uses the -n switch to make printing explicit, a final match removes the introduced , and non-matched values following the newline and prints the required result.