Why does PowerShell's xml.save create LFs in multi-line comments on Windows?

I found that xml.save ends the lines inside multi-line comments with LF on Windows instead of CR LF. Why is that? Here is my code:

[xml]$myXml = @"
<myTag>
        <!-- multi line comment
        each line ends with CR LF
        in my code, but with LF
        after .save gets called
        end of my multi line comment -->
</myTag>
"@

$myXml.save("C:\temp\my.xml")

Here is a screenshot of Notepad++ from my.ps1 to show that my code does not contain any LFs

Here is a screenshot of Notepad++ from my.xml to show the LFs

One way to fix this, which I found, is using XmlWriterSettings and XmlTextWriter:

$settings = New-Object System.Xml.XmlWriterSettings
$settings.NewLineChars = "`r`n"
$settings.Indent = $true
$writer = [System.Xml.XmlTextWriter]::Create($dependencyXmlPath, $settings)
$myXml.Save($writer)
$writer.Close()

Is this the simplest solution?

Best answer:

Instruct the [xml] (System.Xml.XmlDocument) instance to preserve insignificant whitespace by setting its PreserveWhitespace property to $true before loading. This preserves the input's newline format as well as the specific intra-line whitespace[1] (note how the indentation changed in your output file).

# Create an [xml] instance explicitly, so that its
# .PreserveWhitespace property can be set *before* loading content.
($myXml = [xml]::new()).PreserveWhitespace = $true
# Now load the XML text (parse it into a DOM).
$myXml.LoadXml(
@"
<myTag>
        <!-- multi line comment
        each line ends with CR LF
        in my code, but with LF
        after .save gets called
        end of my multi line comment -->
</myTag>
"@
)

$myXml.Save("C:\temp\my.xml")

However, apart from preserving the indentation, the above only consistently results in Windows-format CRLF newlines if your .ps1 file uses them (which it does):

  • The string values resulting from here-strings preserve the script file's newlines as-is.
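
To make "insignificant whitespace" concrete, here is a minimal sketch that parses the same text twice, once with and once without .PreserveWhitespace. The sample document and the variable names are made up for illustration; the newlines are written explicitly so that the result does not depend on the script file's own newline format.

```powershell
# Hypothetical sample input with explicit LF newlines and indentation.
$xmlText = "<a>`n  <b/>`n</a>"

# Default: whitespace-only text between elements is dropped on load.
$plain = [xml]::new()
$plain.LoadXml($xmlText)

# With PreserveWhitespace set *before* loading, that whitespace
# is kept as separate nodes in the DOM.
$preserved = [xml]::new()
$preserved.PreserveWhitespace = $true
$preserved.LoadXml($xmlText)

$plain.DocumentElement.ChildNodes.Count      # 1 (just the <b/> element)
$preserved.DocumentElement.ChildNodes.Count  # 3 (whitespace, <b/>, whitespace)
```

The extra nodes in the second document are what lets .Save() write the original indentation back out instead of re-indenting.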

See:

  • the next section for an explanation of the problem
  • the bottom section for a solution that works without .PreserveWhitespace and that, even with it, ensures consistent use of a (possibly different) newline format.

Generally - unless .PreserveWhitespace = $true is in effect - the .Save() method:

  • invariably uses the platform-native newline format, irrespective of the original input text's newline format...

  • ... with one exception, which is the one you ran into:

    • Behind the scenes, the XmlWriterSettings instance that the .Save() method uses (when not passed an XmlWriter instance explicitly) has its .NewLineHandling property set to None.

    • This results in newlines that are part of multiline comments, multiline text nodes and other potentially multiline constructs such as CDATA sections getting serialized with LF-only newlines - always, irrespective of the original newline format in the input (presumably because in the in-memory DOM all newlines are stored in LF-only format).

    • This behavior is certainly surprising, and arguably a bug, given that it's therefore easy to end up with a mix of CRLF and LF newlines on Windows, as in your case.
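
If you want to check the presumption about the in-memory DOM on your own setup, a small sketch like the following inspects the text of the comment node after loading. The sample input is hypothetical, and no particular output is claimed here, since the result may depend on how the document was loaded.

```powershell
# Hypothetical input that uses CRLF newlines inside the comment.
[xml] $doc = "<myTag>`r`n<!-- line one`r`nline two -->`r`n</myTag>"

# Grab the first comment node via XPath and look at its raw text.
$comment = $doc.SelectSingleNode('//comment()')

# $true if the DOM still holds CR characters,
# $false if only LFs are left after parsing.
$comment.Value.Contains("`r")
```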


A workaround without .PreserveWhitespace = $true and / or ensuring a consistent output newline format:

Note the two use cases:

  • You may not need or want to preserve insignificant whitespace from the input on reading and are only concerned with ensuring consistent use of the newline format of interest on writing.

  • You need .PreserveWhitespace = $true but want to use a different newline format on writing.

You can control the output newline format by explicitly creating an XmlWriter instance with an XmlWriterSettings instance that has the following properties:

  • For pretty-printing - if desired - set .Indent = $true, which uses 2 spaces per indentation level by default, overridable via .IndentChars - this is what .Save() does when given an output file path or stream.

  • Set .NewLineHandling = 'Replace' to ensure consistent use of newlines (this is actually the default value, so it is curious that .Save() in effect uses 'None').

    • By default, this gives you platform-native newlines.
    • To use a fixed format, use .NewLineChars = "`r`n" / .NewLineChars = "`n" for Windows-format CRLF / Unix-format LF-only newlines.

# Using an [xml] cast means that insignificant whitespace
# is *not* preserved.
[xml] $myXml= @"
<myTag>
        <!-- multi line comment
        each line ends with CR LF
        in my code, but with LF
        after .save gets called
        end of my multi line comment -->
</myTag>
"@

# Create an XML writer explicitly, with settings
# that control pretty-printing and newline handling.
$writer = [System.Xml.XmlWriter]::Create(
  "C:\temp\my.xml", 
  [System.Xml.XmlWriterSettings] @{ 
    # Pretty-print, using the value of .IndentChars
    # per indentation level; default is *two spaces*.
    Indent = $true
    # Replace all newlines in the DOM with the character(s) 
    # specified in the .NewLineChars property,
    # which defaults to the platform-native format.
    NewLineHandling = 'Replace'
   }
  )

# Save to the target file via the writer.
$myXml.Save($writer); $writer.Dispose()
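
As a variation on the above, here is a sketch that pins the newline format with .NewLineChars and writes to an in-memory StringBuilder instead of a file, which makes the result easy to inspect. The sample input and the variable names are illustrative assumptions.

```powershell
# Hypothetical input with LF-only newlines, written explicitly.
[xml] $doc = "<myTag>`n<!-- line one`nline two -->`n</myTag>"

$sb = [System.Text.StringBuilder]::new()
$writer = [System.Xml.XmlWriter]::Create(
  $sb,
  [System.Xml.XmlWriterSettings] @{
    Indent          = $true
    NewLineHandling = 'Replace'
    # Fixed Windows-format newlines, regardless of platform.
    NewLineChars    = "`r`n"
  }
)
# Dispose the writer even if saving fails, so that output is flushed.
try { $doc.Save($writer) } finally { $writer.Dispose() }

$result = $sb.ToString()

# Should be $false if the settings take effect as described above,
# i.e. no LF-only newlines remain, not even inside the comment.
$result -match '(?<!\r)\n'
```

Writing to a StringBuilder makes the XML declaration say utf-16; for real output, pass a file path to Create() as in the code above.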

Testing a given file for the presence of LF-only newlines:

# Returns $true if at least one LF-only newline is present.
(Get-Content -Raw $myXmlPath) -match '(?<!\r)\n' 
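
If you also want to know how many newlines of each kind a text contains, the same regex can be wrapped in a small helper. Get-NewlineStats is a hypothetical name chosen for this sketch, not a built-in cmdlet.

```powershell
function Get-NewlineStats {
    param([Parameter(Mandatory)] [string] $Text)
    [pscustomobject] @{
        # Windows-format newlines.
        CRLF   = [regex]::Matches($Text, '\r\n').Count
        # LFs that are not preceded by a CR.
        LFOnly = [regex]::Matches($Text, '(?<!\r)\n').Count
    }
}

# Sample string with two CRLF newlines and one LF-only newline.
$stats = Get-NewlineStats -Text "a`r`nb`nc`r`n"
$stats.CRLF    # 2
$stats.LFOnly  # 1
```

For a file, pass its raw content, e.g. Get-NewlineStats -Text (Get-Content -Raw $myXmlPath).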

[1] There is one exception: intra-tag whitespace is not preserved, i.e. the specific whitespace - including any newlines - that separates the element name from the first attribute as well as the whitespace between attributes isn't preserved - see this answer.

Another answer:

I would just put the comment markers on every line, so that each comment is a single-line one. (Editors such as the ISE, Notepad++, or Emacs usually have column-based commands for inserting text.) A long comment line won't get rewrapped on saving. This is what the file looks like after a save. (<?ignore ... ?> has the same problem as in the question.)

<myTag>
  <!-- multi line comment           -->
  <!-- each line ends with CR LF    -->
  <!-- in my code, but with LF      -->
  <!-- after .save gets called      -->
  <!-- end of my multi line comment -->
</myTag>