I regularly use the univocity parser in my Java programs to compare CSV files. It works very well and is fast.
The problem is that this time I am trying to parse two different large CSV files with complex values and write the differences to a new CSV file.
Following one of the author's examples, I tried using processFile after reading file1 into a list and converting it to a map, but I still get an error while parsing.
Below are my sample input files and the expected output file.
INPUT - file1
"h1","h2","h3","h4","h5"
"00000","US","9503.00.0089","USA","9503.0089"
"","EU","9503.00.7000","EUROPEAN UNION","9503.00.7000"
"#1200","US","5601.22.0010","USA","5601.22.0010"
"0180691","US","9503.00.0073","USA","9503.00.0073"
"DRTY01","CA","9603.01.0088","CAN","9603.01.0088"
INPUT - file2
"h1","h2","h3","h6","h7","h8","h9","h10","h11"
"018890","US","","2015","101","1","1","All",""
"00000","US","9503.00.0090","1986","101","1","1","All","9503.00.0090"
"0180691","US","9503.00.0073","2019","101","1","1","All","9503.00.0073"
"DRTY01","CA","9603.01.0087","2002","102","1","2","CA","9603.01.0087"
I want to select the rows whose h1 and h2 values appear in both file1 and file2, then compare h3 of file1 with h3 of file2. If the two h3 values are not equal, I want to write "h1","h4","h10","h5","h11","h6","h7","h8","h9" to file3.
OUTPUT - file3
"h1","h4","h10","h5","h11","h6","h7","h8","h9"
"00000","USA","All","9503.00.0089","9503.00.0090","1986","101","1","1"
"DRTY01","CAN","CA","9603.01.0088","9603.01.0087","2002","102","1","2"
I have a solution for your problem, but please do regression testing. I'm assuming that h1 and h2 combined form a unique value. I create a HashMap keyed by a bean object (the class holding h1, h2, and h3), with the entire row of the CSV file as the value. We override the hashCode and equals methods of that bean class.
The logic in equals is: if h1 and h2 are the same in map1 and map2 while h3 is different, give me the row from map1 and map2. This uses additional space for the maps, but the overall computation is reduced to O(N). The code below gives you the rows you want from the maps. I have not handled I/O and exceptions properly; please take care of them accordingly.
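As a rough illustration of the map-based lookup described above, here is a minimal sketch that works on rows already parsed into `String[]` arrays (for example by univocity). The class name `CsvDiffSketch`, the method `diff`, and the fixed column positions are assumptions made for this example, based on the sample headers in the question. It also uses a plain `List` of (h1, h2) as the map key instead of a custom bean, so it is not the exact code from this answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: column positions assume the sample headers from the question.
// file1: h1,h2,h3,h4,h5
// file2: h1,h2,h3,h6,h7,h8,h9,h10,h11
class CsvDiffSketch {

    /** Returns one output row per (h1, h2) pair found in both inputs whose h3 values differ. */
    static List<String[]> diff(List<String[]> file1Rows, List<String[]> file2Rows) {
        // Index file1 by (h1, h2). List equality is value-based, so it serves as a composite key.
        Map<List<String>, String[]> file1ByKey = new HashMap<>();
        for (String[] r1 : file1Rows) {
            file1ByKey.put(Arrays.asList(r1[0], r1[1]), r1);
        }

        List<String[]> out = new ArrayList<>();
        // A single pass over file2 with constant-time lookups keeps the work linear in the row count.
        for (String[] r2 : file2Rows) {
            String[] r1 = file1ByKey.get(Arrays.asList(r2[0], r2[1]));
            if (r1 != null && !Objects.equals(r1[2], r2[2])) {
                // Output column order: h1,h4,h10,h5,h11,h6,h7,h8,h9
                out.add(new String[] {
                    r1[0], r1[3], r2[7], r1[4], r2[8], r2[3], r2[4], r2[5], r2[6]
                });
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> file1 = Arrays.asList(
            new String[] {"00000", "US", "9503.00.0089", "USA", "9503.00.0089"},
            new String[] {"0180691", "US", "9503.00.0073", "USA", "9503.00.0073"});
        List<String[]> file2 = Arrays.asList(
            new String[] {"00000", "US", "9503.00.0090", "1986", "101", "1", "1", "All", "9503.00.0090"},
            new String[] {"0180691", "US", "9503.00.0073", "2019", "101", "1", "1", "All", "9503.00.0073"});
        for (String[] row : diff(file1, file2)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

If (h1, h2) is not actually unique in file1, later rows overwrite earlier ones in the map, so that assumption is worth checking against the real data.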
The Test class:
The bean class, which holds the three columns h1, h2, and h3:
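For reference, a bean along these lines could look like the sketch below. The class name `RowKey` and the helper `sameKeyDifferentH3` are hypothetical names chosen for this example. One deliberate difference from the description above: the h3 check is kept in a separate method rather than inside `equals`, because an `equals` that requires h3 to differ would not be reflexive and would not behave reliably as a HashMap key.

```java
import java.util.Objects;

// Hypothetical key bean: identity is (h1, h2); h3 is carried along for the later comparison.
final class RowKey {
    private final String h1;
    private final String h2;
    private final String h3;

    RowKey(String h1, String h2, String h3) {
        this.h1 = h1;
        this.h2 = h2;
        this.h3 = h3;
    }

    String getH3() {
        return h3;
    }

    // Two keys are equal when h1 and h2 match; h3 is intentionally ignored here.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof RowKey)) return false;
        RowKey other = (RowKey) o;
        return Objects.equals(h1, other.h1) && Objects.equals(h2, other.h2);
    }

    // Must use the same fields as equals so equal keys land in the same bucket.
    @Override
    public int hashCode() {
        return Objects.hash(h1, h2);
    }

    /** True when both beans share (h1, h2) but carry different h3 values. */
    boolean sameKeyDifferentH3(RowKey other) {
        return equals(other) && !Objects.equals(h3, other.h3);
    }
}
```

With this shape, a lookup in the file1 map finds the matching row by (h1, h2), and the h3 values can then be compared from the stored rows or through `sameKeyDifferentH3`.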
Output for the given input csv files: