How to use unix comm command in Tcl?


I am trying to use Unix's comm command to compare two files in Tcl.

I tried the below to no avail:

exec bash -c {comm -2 -3 <(sort file1) <(sort file2) > only_in_file1}
exec {comm -2 -3 <(sort file1) <(sort file2) > only_in_file1}

This is one of the quickest ways I know to do it, but if there is a native Tcl method, I would like to hear about it. In general, I need to compare two text files of roughly 10K to 100K lines each and find the lines that appear in only one of them.


Donal Fellows

Since your files are small by comparison with modern computer memories (and you're just looking for the lines in the first that aren't in the second), the simplest method of doing the filtering in pure Tcl is to hold the files in memory.

# Standard read-all-the-lines-of-a-file stanza
proc readLines {filename} {
    set f [open $filename]
    # -nonewline drops the file's final newline, so split yields no trailing empty line
    set data [read -nonewline $f]
    close $f
    return [split $data "\n"]
}

# Read and sort (but the sort is unnecessary for this algorithm)
set lines1 [lsort [readLines file1]]

# Read and load into a dict's keys (i.e., an associative map); sort not needed
set d {}
foreach line [readLines file2] {
    dict set d $line "dummy"
}

# Write out the lines from file1 that aren't in the dictionary
set f [open only_in_file1 "w"]
foreach line $lines1 {
    if {![dict exists $d $line]} {
        puts $f $line
    }
}
close $f

This isn't exactly the method comm uses: comm walks both inputs in step, which is more difficult to get right and requires both inputs to be sorted.
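For comparison, here is a minimal sketch of that comm-style walk over two sorted lists in pure Tcl. The proc name onlyInFirst is made up for illustration, and the sketch assumes Tcl 8.5 or later (for the {*} expansion) and that plain lsort orders strings the same way string compare does. It is not comm's actual implementation, just the same general idea.

```tcl
# Hypothetical helper: return the lines found only in the first list,
# walking both lists in sorted order the way a merge does.
proc onlyInFirst {lines1 lines2} {
    # Both sides must be sorted with the same ordering
    set a [lsort $lines1]
    set b [lsort $lines2]
    set i 0
    set j 0
    set na [llength $a]
    set nb [llength $b]
    set result {}
    while {$i < $na && $j < $nb} {
        set cmp [string compare [lindex $a $i] [lindex $b $j]]
        if {$cmp < 0} {
            # Current line of the first list has no match in the second
            lappend result [lindex $a $i]
            incr i
        } elseif {$cmp > 0} {
            # Current line of the second list is not in the first; skip it
            incr j
        } else {
            # Same line on both sides; consume one copy from each
            incr i
            incr j
        }
    }
    # Anything left in the first list has no counterpart
    lappend result {*}[lrange $a $i end]
    return $result
}

puts [onlyInFirst {1 2 3 4 5} {6 5 4 3}]
```

One behavioral difference from the dict version: duplicates are matched one-for-one here, so a line that occurs twice in the first list and once in the second is reported once, whereas the dict lookup drops every copy.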

glenn jackman

awk is commonly used for this scenario. Invoking it from Tcl looks like:

exec awk {NR == FNR {f2[$0]; next} !($0 in f2)} file2 file1 > only_in_file1

That's a one-liner-external-tool version of Donal's suggestion.

But your bash solution should work:

$ cat file1
1
2
3
4
5

$ cat file2
6
5
4
3

$ tclsh
% exec bash -c {comm -2 -3 <(sort file1) <(sort file2)}
1
2
% exec awk {NR == FNR {f2[$0]; next} !($0 in f2)} file2 file1 
1
2
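As for why the second attempt in the question fails: Tcl's exec does no shell parsing, so a single braced word is taken as the name of a program to run, and the <(...) process substitution is bash syntax that only bash understands. A minimal sketch of wrapping the bash variant in a proc follows. The proc name commOnlyInFirst is hypothetical, bash, sort, and comm are assumed to be on the PATH, and the output redirection is left to Tcl's exec rather than the shell.

```tcl
# Hypothetical wrapper around the bash/comm pipeline.
# The file names are passed as positional parameters ($1, $2) so that
# names containing spaces survive; the extra "bash" word fills $0.
proc commOnlyInFirst {file1 file2 outFile} {
    exec bash -c {comm -2 -3 <(sort "$1") <(sort "$2")} bash $file1 $file2 > $outFile
}

# exec raises a Tcl error on a non-zero exit status or on stderr output,
# so wrap the call in catch to report the problem instead of aborting.
if {[catch {commOnlyInFirst file1 file2 only_in_file1} msg]} {
    puts stderr "comparison failed: $msg"
}
```

Passing the names as arguments rather than splicing them into the script string avoids having to quote them for the shell.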