Remove commas from numbers in a CSV


I have folder info for all user folders. It is dumped out to a CSV file as follows:

Servername, F:\Users\user, 9,355.7602 MB, 264, 3054, 03/15/2000 13:28:48, 12/10/2018 11:58:29

We are unable to work with the data as is due to the thousands separator in the 3rd column. I could run the report scripts again, but we have a lot of file servers and a large number of users on one in particular, so running it again is very time consuming. The reason the commas are there is that the data was written as a string not a number.

I can import and convert, the only problem is that any number over 1000 will be wrong and then all other data is 1 column off. I would like to replace any comma between 2 numbers. It doesn't seem it would be that hard to do with PowerShell, but I am not having any luck finding anything.


There are 2 best solutions below

BEST ANSWER

If you assume that columns of data are comma plus space separated and your numbers have no spaces, you can use the -replace operator for this.

$line = 'Servername, F:\Users\user, 9,355.7602 MB, 264, 3054, 03/15/2000 13:28:48, 12/10/2018 11:58:29'
$line -replace '(?<=\d),(?=\d)'

If you are reading the data from a file, you can read the data with Get-Content, replace your data, and update the file with Set-Content.

(Get-Content file.csv) -replace '(?<=\d),(?=\d)' | Set-Content file.csv

If the file is large, you can utilize the faster switch statement.

$data = switch -regex -file file.csv {
          '(?<=\d),(?=\d)' { $_ -replace '(?<=\d),(?=\d)' }
          default {$_}
        }
$data | Set-Content file.csv

Explanation:

  • (?<=\d) is a positive lookbehind assertion (?<=) that requires a single digit \d immediately before the comma, without making that digit part of the match.
  • (?=\d) is a positive lookahead assertion (?=) that requires a single digit immediately after the comma. You could replace this with (?=\d{3}) to require 3 consecutive digits after the comma.
  • Since you want to replace the target comma with an empty string, you can omit the replacement string; -replace then simply removes the match.
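As a quick check of the pattern described above, the following sketch applies it to the sample line and counts the matches. The second half uses a made-up string, 'a,1,5,b', which is not from the question, to illustrate why the stricter (?=\d{3}) lookahead can matter if fields are ever separated by a bare comma with no space.

```powershell
# Sample line from the question
$line = 'Servername, F:\Users\user, 9,355.7602 MB, 264, 3054, 03/15/2000 13:28:48, 12/10/2018 11:58:29'

$loose  = '(?<=\d),(?=\d)'      # comma between any two digits
$strict = '(?<=\d),(?=\d{3})'   # comma followed by at least three digits

# Only the thousands separator sits directly between two digits;
# the column delimiters are followed by a space, so they are left alone.
$cleaned    = $line -replace $loose
$matchCount = [regex]::Matches($line, $loose).Count

# Hypothetical input (an assumption, not from the question): bare-comma delimiters
$sample       = 'a,1,5,b'
$looseResult  = $sample -replace $loose    # merges the two fields 1 and 5
$strictResult = $sample -replace $strict   # leaves the string unchanged

$cleaned
$matchCount
$looseResult
$strictResult
```

With the sample line, only the comma in 9,355.7602 should match. The loose pattern turns 'a,1,5,b' into 'a,15,b', which is why the comma-plus-space assumption matters.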

Typically, it would be best to stick with commands that work with CSV data or files. However, if your data contains commas and you aren't qualifying your text, it may be difficult to distinguish between data and delimiters. If you have a clear way of making that distinction, you are better off using ConvertFrom-Csv for already read data or Import-Csv for files. You will need to define headers either in the files or in the command.
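Once the thousands separator is gone, the line can be handed to ConvertFrom-Csv with headers supplied in the command. This is a minimal sketch: the header names (Server, Path, Size, and so on) are guesses at what the columns mean, not names from the original report, and the values are trimmed explicitly so the example does not depend on how the parser treats the space after each comma.

```powershell
$line = 'Servername, F:\Users\user, 9,355.7602 MB, 264, 3054, 03/15/2000 13:28:48, 12/10/2018 11:58:29'

# Hypothetical header names; substitute whatever the report columns really are
$header = 'Server', 'Path', 'Size', 'Files', 'Folders', 'Created', 'Modified'

# Remove the thousands separator first, then parse the line as CSV
$row = ($line -replace '(?<=\d),(?=\d)') | ConvertFrom-Csv -Header $header

# Strip the unit and surrounding whitespace, then cast to a number
$sizeMB = [double](($row.Size -replace 'MB').Trim())

$row.Path.Trim()
$sizeMB
```

The cast to [double] is what makes the size usable for sorting or summing; until then it is still a string.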


EDIT

It was my oversight that the , in the dataset is not quoted, so this answer does not work as expected: the comma is treated as a column separator when the CSV is parsed. I'm going to leave it up because it explains how to manipulate the data in general, provided the column data is escaped properly. However, @AdminOfThings' answer should work for your specific case here, and it fixes the incorrectly split column without parsing the content as CSV first.
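To illustrate the problem described in this edit, here is a small sketch that parses the unmodified sample line with seven temporary headers. The names Col1 through Col7 are placeholders. Because the unquoted comma in the size splits that value in two, everything after it shifts one column to the right, and the eighth field has no header left to land in.

```powershell
$line = 'Servername, F:\Users\user, 9,355.7602 MB, 264, 3054, 03/15/2000 13:28:48, 12/10/2018 11:58:29'

# Placeholder header names for the seven intended columns
$header = 'Col1', 'Col2', 'Col3', 'Col4', 'Col5', 'Col6', 'Col7'

# Parse the line as-is, without removing the thousands separator
$row = $line | ConvertFrom-Csv -Header $header

# Trim so the check does not depend on whitespace handling
$col3 = $row.Col3.Trim()   # only the part before the comma
$col4 = $row.Col4.Trim()   # the rest of the size, not the expected 264
$col7 = $row.Col7.Trim()   # the creation date, not the modification date

$col3
$col4
$col7
```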


Import the data using Import-Csv, then remove any , in the third column. This assumes that you have no values where , is the decimal separator.

If you have headers in the CSV, you won't need to define header names or get fancy with writing the CSV back out:

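# Parentheses make Import-Csv finish reading before Export-Csv overwrites the same file
(Import-Csv -Path \path\to\file.csv) | ForEach-Object {
  $_.ColumnName = $_.ColumnName -replace ','
  $_  # emit the modified row so Export-Csv receives it
} | Export-Csv -NoTypeInformation -Path \path\to\file.csv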

The way this works is that we import the CSV as an operable PSCustomObject, then for each line we take whatever the column name with the size is and remove the , from it. Finally, we export the modified PSCustomObject back out to the original CSV.
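For comparison, here is a sketch of the case where this approach does apply: the size value is wrapped in double quotes, so the CSV parser keeps it as one field. The two-column data below is invented for the example and is not the layout from the question.

```powershell
# Hypothetical, properly quoted CSV text (not the question's actual file)
$csv = @(
    'Server,Size'
    'Servername,"9,355.7602 MB"'
)

$rows = @(
    $csv | ConvertFrom-Csv | ForEach-Object {
        $_.Size = $_.Size -replace ','   # drop the thousands separator
        $_                               # emit the modified row
    }
)

$rows[0].Server
$rows[0].Size
```

Because the quotes mark the comma as data, no column shift occurs and the replacement only has to touch the one property.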

If you don't have headers, it gets a little trickier, since we have to define temporary headers and Export-Csv doesn't have an option to skip writing them out:

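# The parameter is -Header (singular); parentheses close the file before it is rewritten
(Import-Csv -Path \path\to\file.csv -Header Col1, Col2, Col3, Col4, Col5, Col6, Col7) |
  ForEach-Object {
    $_.Col3 = $_.Col3 -replace ','
    $_  # emit the modified row
  } | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1 |
  Set-Content -Path \path\to\file.csv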

This does the same thing as the first block of code, but since we don't want to export the temporary headers, we have to get creative. First, note we reference the target column with the temporary header name. Instead of piping the modified CSV object right to Export-Csv, first we want to convert the object to CSV using ConvertTo-Csv. We then use Select-Object to skip the first line of the converted CSV text, which is the header, so we just have the row data and column values. Finally, we use Set-Content to write the CSV text without the header back to the original file.