PowerShell replace

88 Views Asked by At

I'm using PowerShell to run and export a SQL query. By default my output file has " " around the values, but one of the requirements is to only return raw data, so I piped in | % {$_ -replace '"',""} | to address that, however one of the fields is a text field that contains measurements that have " for length I need to maintain.

My questions is how to apply | % {$_ -replace '"',""} | to everything BUT one field.

2

There are 2 best solutions below

7
piotreks-uw-edu On

If I had a string content as similar to a CSV file content with 3 columns: Col1, Col2, Col3 I would write something like this:

$csvSimilarContent = @"
Col1,Col2,Col3
"AAA","MOD70103- 24`" X 16`" PP2 54W`" X 40D`" CLR Black","CCC"
"@

# takes a string that contains CSV data as input and creates a 
# custom object for each row in the CSV data. 
$content = $csvSimilarContent | ConvertFrom-Csv

# Create an empty array for CSV content
$modifiedContent = @()

# Iterate through each row in the CSV file
foreach ($row in $content) {
    # Remove surrounding double quotes in fields
    $row.Col1 = $row.Col1 -replace '^"|"$', ''
    $row.Col2 = $row.Col2 -replace '^"|"$', ''
    $row.Col3 = $row.Col3 -replace '^"|"$', ''

    # Add the modified row to the modified content array
    $modifiedContent += $row
}

# Display the modified content
$modifiedContent

The script removes only double quotes, that surround the fields.

Note: Extending arrays in a loop with += is inefficient

0
mklement0 On

Note:

  • In PowerShell (Core) 7+ the Export-Csv and ConvertTo-Csv cmdlets now have -UseQuotes and -QuoteFields parameters, which would allow you to create your CSV data without double-quoted fields to begin with; e.g.:

    ... | Export-Csv out.csv -UseQuotes Never
    
  • In Windows PowerShell these cmdlets invariably output double-quoted fields, necessitating manual post-processing, as shown below.


Try the following:

... | % { $_ -replace '(?<=^|",)"|"(?=,"|$)' -replace '""', '"' }

Note:

  • The assumption is that fields with embedded " values do not also contain ,, because the removal of the enclosing double quotes would then break the partitioning of the line into fields.

  • The extra -replace operation, -replace '""', '"' unescapes the field-embedded " chars.

    • Note that a well-formed CSV requires " chars. inside double-quoted fields to be escaped as "".
      Export-Csv / ConvertTo-Csv do perform this escaping.

    • However, Import-Csv / ConvertFrom-Csv also tolerate input without this kind of escaping.

    • Either way, the above command works: if the embedded " chars. aren't escaped as "", -replace '""', '"' is simply a no-op.

A simple, self-contained example:

@'
"foo","bar"
"baz","qu""ux"
'@ -split '\r?\n' | 
  % { $_ -replace '(?<=^|",)"|"(?=,"|$)' -replace '""', '"' }

Output (only the field internal " was retained):

foo,bar
baz,qu"ux

For an explanation of the regex and the ability to experiment with it, see this regex101.com page - note that, because the linked page uses C# string-literal syntax, the " chars. in the regex are doubled.


Note that you can speed up processing considerably if you load your entire CSV file as a whole into memory and modify your regex with the Multiline regex option ((?m)), which then causes ^ and $ to match the start and end of each line:

(Get-Content -Raw your.csv) -replace '(?m)(?<=^|",)"|"(?=,"|$)' -replace '""', '"'