My Rails 3 app parses user-uploaded CSV files.
As can be expected, users upload tab-separated AND comma-separated files.
I want to support both.
My code:
input = CSV.read(uploaded_io.tempfile, encoding: "UTF-8", col_sep: "\t")
QUESTION: How can I change it to support commas too?
FasterCSV's doc describes col_sep as "The String placed between each field", so :col_sep => ",\t" won't work.
Note: All data inside are integers or identifiers, so the probability of someone using \t or , within the content (not a delimiter) is zero. So usage of the two different delimiters in the same file is not something I expressly want to prevent.
Solution 1:
One simple way to do it is to let the user select, with a drop-down, which separator their CSV file uses, and then pass that value to the CSV.read() call. But I guess you want it to be automatic. :-)
Solution 2:
You can read in the first line of the CSV file with plain file I/O (for example File.open(path, &:gets); note that File.read() would load the whole file) and match that line against /,/ and then against /\t/. Depending on which Regexp matches, you pick the corresponding single separator and then read the file with CSV.read(..., :col_sep => single_separator).
But Beware:
At first it looks nice and elegant to want to use ",\t" as the separator in the method call to allow both, but please note that this would introduce a possible nasty bug!
If a CSV file contained both tabs and commas, by accident or by chance, what would you do then? Separate on both? How can you be sure? I think that would be a mistake, because separators don't appear "mixed" like this in regular CSV files: it's always either ',' or "\t".
So I think you should not use ",\t". It could cause huge problems, and that's probably the reason why the col_sep option was not implemented to accept a Regexp.
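The detection step from Solution 2 could be sketched roughly as follows. This is only a minimal sketch: detect_col_sep and read_delimited are hypothetical helper names, not part of the CSV library, and falling back to a comma when neither candidate appears in the first line is an assumption you may want to change.

```ruby
require "csv"

# Hypothetical helper (not part of CSV): guess the delimiter by counting
# how often each candidate occurs in the first line of the file.
def detect_col_sep(path, candidates = [",", "\t"], default = ",")
  # File.open with &:gets reads only the first line; nil (empty file) becomes "".
  first_line = File.open(path, &:gets).to_s
  best = candidates.max_by { |sep| first_line.scan(sep).size }
  # Assumption: fall back to the default when no candidate occurs at all.
  first_line.include?(best) ? best : default
end

# Hypothetical wrapper: parse the file with the single separator that was detected.
def read_delimited(path)
  CSV.read(path, col_sep: detect_col_sep(path), encoding: "UTF-8")
end
```

In the controller from the question, this would presumably be called as read_delimited(uploaded_io.tempfile.path). Counting occurrences instead of taking the first Regexp that matches means a stray comma in a tab-separated header does not flip the result, but the guess is still based on the first line only.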