Merging two files into a third one, using a column as the index and merging lines too


I've been studying awk and I've come upon a problem I haven't been able to solve. Please help if you can.

I have 2 files I generated using awk, sort and uniq -c.

File 1 is in the format:

1 aaa.c 10/10/2010

1 bbb.h 1/1/2011

3 ccc.c 2/2/2012

1 ccc.c 20/6/2011

1 ddd.c 1/1/2010

1 ddd.c 2/4/1999

1 ddd.c 7/1/2012

1 ddd.c 10/1/1977

Meaning: number_of_equal_files name date (so, 3 ccc.c files with the same date and 1 ccc.c file with another date).

File 2 is in the format:

4 ddd.c

2 ccc.c

3 xxx.c

Meaning: number_of_different_dates name (so ccc.c has been found with 2 different dates). I removed the files that would have number=1 using a reverse grep, so there won't be any.

What I'd like to do is generate a third file in the format:

number_of_different_dates name date1 date2 date3 date4 (...)

Something like:

2 ccc.c 2/2/2012 20/6/2011 

4 ddd.c 1/1/2010 2/4/1999 7/1/2012 10/1/1977

Thanks in advance!


There are 2 best solutions below

Mark Wilkins (best answer)

You should be able to get that result using only the first file as input. The following uses two associative arrays. The first collects the number of times a file is seen and the second collects the dates. The END block just prints the entries that appeared more than once.

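# Input: file 1 only. Blank or incomplete lines are skipped.
NF >= 3 {
   counts[$2] += 1;               # one line per distinct (name, date) pair
   dates[$2] = dates[$2] " " $3;  # append this date to the list kept for the name
}

END {
   # Print only the names that appeared with more than one date.
   for ( f in dates ) {
      if ( counts[f] > 1 )
         printf( "%d %s%s\n", counts[f], f, dates[f] );
   }
}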
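
One thing to keep in mind is that `for (f in dates)` visits the names in an unspecified order. If the output order matters, a minimal sketch along the following lines might help. It wraps the same two-array idea in a shell function and adds a third array that records the order in which names are first seen. The function name `merge_dates` and the `order` array are illustrative assumptions, not part of the script above.

```shell
# merge_dates is a hypothetical helper: it reads file 1 from the files
# given as arguments, or from standard input when there are none.
merge_dates() {
    awk '
    NF >= 3 {
        if (!($2 in counts)) order[++n] = $2   # remember first-seen order of names
        counts[$2]++                           # one line per distinct (name, date) pair
        dates[$2] = dates[$2] " " $3           # append this date to the list for the name
    }
    END {
        # Walk the names in first-seen order and keep those with several dates.
        for (i = 1; i <= n; i++) {
            f = order[i]
            if (counts[f] > 1) print counts[f], f dates[f]
        }
    }' "$@"
}

# Sample data from the question (file 1: count, name, date).
merge_dates <<'EOF'
1 aaa.c 10/10/2010
1 bbb.h 1/1/2011
3 ccc.c 2/2/2012
1 ccc.c 20/6/2011
1 ddd.c 1/1/2010
1 ddd.c 2/4/1999
1 ddd.c 7/1/2012
1 ddd.c 10/1/1977
EOF
```

With the sample data this should list ccc.c before ddd.c, since that is the order in which they first appear in file 1.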
jaypal singh

You can try something like this -

#!/usr/bin/awk -f

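# First file (file 1): remember which name each date belongs to,
# and which names exist at all.
NR==FNR {
    a[$3] = $2      # date -> name (a date shared by two names keeps only the last name)
    b[$2]++         # names seen in file 1
    next
}

# Second file (file 2): print the count and name, then every date stored
# for that name. The for-in order is unspecified.
($2 in b) {
    printf("%s %s ", $1, $2)
    for (i in a)
        if (a[i] == $2)
            printf("%s ", i)
    print ""
}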

Test:

[jaypal:~/Temp] cat file1
1 aaa.c 10/10/2010

1 bbb.h 1/1/2011

3 ccc.c 2/2/2012

1 ccc.c 20/6/2011

1 ddd.c 1/1/2010

1 ddd.c 2/4/1999

1 ddd.c 7/1/2012

1 ddd.c 10/1/1977

[jaypal:~/Temp] cat file2
4 ddd.c

2 ccc.c

3 xxx.c

[jaypal:~/Temp] ./s.awk ff1 ff2
4 ddd.c 10/1/1977 1/1/2010 2/4/1999 7/1/2012 

2 ccc.c 20/6/2011 2/2/2012
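
Because the script above indexes its first array by date, two different names that share a date would collide, and the dates come out in whatever order `for (i in a)` happens to use. A possible variation, sketched here under the assumption that both files look like the samples in the question, indexes by name instead and keeps the dates in input order. The function name `join_dates` is made up for illustration.

```shell
# join_dates FILE1 FILE2 (hypothetical helper name)
# FILE1 lines: count name date    FILE2 lines: number_of_dates name
join_dates() {
    awk '
    NR == FNR {                       # first file: build name -> list of dates
        if (NF >= 3) dates[$2] = dates[$2] " " $3
        next
    }
    ($2 in dates) {                   # second file: names unknown to file 1 are skipped
        print $1, $2 dates[$2]
    }' "$1" "$2"
}
```

It would be called as `join_dates file1 file2`. The output then follows the line order of file2, and names such as xxx.c that never appear in file1 are left out.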