Merging 2 files into a third one, using a column as the index and merging lines too

I've been studying awk and I've come upon a problem I'm not able to solve. Please help if you can.

I have 2 files I generated using awk, sort and uniq -c.

File 1 is in the format:

1 aaa.c 10/10/2010
1 bbb.h 1/1/2011
3 ccc.c 2/2/2012
1 ccc.c 20/6/2011
1 ddd.c 1/1/2010
1 ddd.c 2/4/1999
1 ddd.c 7/1/2012
1 ddd.c 10/1/1977

Meaning: number_of_equal_files name date (so, 3 files ccc.c from the same date and 1 file ccc.c from another)

File 2 is in the format:

4 ddd.c
2 ccc.c
3 xxx.c

Meaning: number_of_different_dates name (so ccc.c has been found with 2 different dates). Files that would have number=1 I removed using a reverse grep (grep -v), so there won't be any.

What I'd like to do is generate a third file in the format:

number_of_different_dates name date1 date2 date3 date4 (...)

something like:

2 ccc.c 2/2/2012 20/6/2011
4 ddd.c 1/1/2010 2/4/1999 7/1/2012 10/1/1977

Thanks in advance!

There are 2 best solutions below

Best answer:

You should be able to get that result using only the first file as input. The following uses two associative arrays. The first collects the number of times a file is seen and the second collects the dates. The END block just prints the entries that appeared more than once.

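{
   counts[$2] += 1;                  # one input line per distinct (name, date) pair
   dates[$2] = dates[$2] " " $3;     # append this date; the list starts with a space
}

END {
   for ( f in dates ) {              # iteration order is unspecified in awk
      if ( counts[f] > 1 )
         # "%s%s" because dates[f] already begins with a separating space
         printf( "%d %s%s\n", counts[f], f, dates[f] );
   }
}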
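As a rough end-to-end sketch of the single-file approach above, the following shell script writes the question's file1 sample to a temporary file and runs a variant of the awk program on it. The `order` array, the `NF == 3` guard, the temporary file and the `result` variable are additions for illustration, not part of the answer: `order` records names in first-seen order so the output does not depend on awk's unspecified `for (f in ...)` ordering, and the guard skips blank or malformed lines.

```sh
#!/bin/sh
# Sample data copied from the question; the temp file is only for this demo.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
1 aaa.c 10/10/2010
1 bbb.h 1/1/2011
3 ccc.c 2/2/2012
1 ccc.c 20/6/2011
1 ddd.c 1/1/2010
1 ddd.c 2/4/1999
1 ddd.c 7/1/2012
1 ddd.c 10/1/1977
EOF

result=$(awk '
NF == 3 {                                  # skip blank or malformed lines
    if (!($2 in counts)) order[++n] = $2   # remember names in first-seen order
    counts[$2]++                           # one input line per distinct date
    dates[$2] = dates[$2] " " $3           # append the date to the list for this name
}
END {
    for (i = 1; i <= n; i++) {
        f = order[i]
        if (counts[f] > 1)                 # keep only names with several dates
            printf "%d %s%s\n", counts[f], f, dates[f]
    }
}' "$tmp")
rm -f "$tmp"

printf '%s\n' "$result"
```

With the sample data this should print:

2 ccc.c 2/2/2012 20/6/2011
4 ddd.c 1/1/2010 2/4/1999 7/1/2012 10/1/1977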

You can try something like this -

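#!/usr/bin/awk -f

# First file (file1): map each date to its name and remember the names seen.
# Note: a[] is keyed by date, so a date shared by two names keeps only the last one.
NR==FNR {
            a[$3]=$2; b[$2]++; next
        }

# Second file (file2): for each name also seen in file1, print count, name and dates.
($2 in b) {
            printf ("%s %s ", $1, $2);
            for (i in a)                  # order of the dates is unspecified
                if (a[i]==$2)
                    printf ("%s ", i);
            print ""                      # end the line after all dates are printed
          }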

Test:

[jaypal:~/Temp] cat file1
1 aaa.c 10/10/2010
1 bbb.h 1/1/2011
3 ccc.c 2/2/2012
1 ccc.c 20/6/2011
1 ddd.c 1/1/2010
1 ddd.c 2/4/1999
1 ddd.c 7/1/2012
1 ddd.c 10/1/1977

[jaypal:~/Temp] cat file2
4 ddd.c
2 ccc.c
3 xxx.c

[jaypal:~/Temp] ./s.awk file1 file2
4 ddd.c 10/1/1977 1/1/2010 2/4/1999 7/1/2012
2 ccc.c 20/6/2011 2/2/2012
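
A possible variation on the two-file approach, sketched here under the assumption that the same date may occur for more than one name: instead of keying an array by date, it collects the dates per name while reading file1, then prints one line per matching name from file2. The `dates` array, the temporary directory and the `merged` variable are illustrative names, not part of the answer above. Dates come out in file1 order and lines in file2 order.

```sh
#!/bin/sh
# Sample data copied from the question; the temp directory is only for this demo.
tmpdir=$(mktemp -d)
cat > "$tmpdir/file1" <<'EOF'
1 aaa.c 10/10/2010
1 bbb.h 1/1/2011
3 ccc.c 2/2/2012
1 ccc.c 20/6/2011
1 ddd.c 1/1/2010
1 ddd.c 2/4/1999
1 ddd.c 7/1/2012
1 ddd.c 10/1/1977
EOF
cat > "$tmpdir/file2" <<'EOF'
4 ddd.c
2 ccc.c
3 xxx.c
EOF

merged=$(awk '
NR == FNR {                        # first file: collect the dates for each name
    if (NF == 3) dates[$2] = dates[$2] " " $3
    next
}
($2 in dates) {                    # second file: count, name, then collected dates
    print $1, $2 dates[$2]         # dates[$2] already starts with a space
}' "$tmpdir/file1" "$tmpdir/file2")
rm -rf "$tmpdir"

printf '%s\n' "$merged"
```

With the sample files this should print the ddd.c line first (file2 order) and skip xxx.c, which never appears in file1:

4 ddd.c 1/1/2010 2/4/1999 7/1/2012 10/1/1977
2 ccc.c 2/2/2012 20/6/2011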