How to strip the XHTML tags from the SQL query

106 Views Asked by At

I have this below query for Rapid 7 Nexpose SQL database and have been trying to remove xhtl,

SELECT
  dp.policy_id, dp.title as policy_title, dpr.rule_id, dpr.title as policy_rule_title,
  dp.benchmark_name, da.ip_address, da.host_name, dpr.description, dp.category,
  fapr.date_tested, htmlToText(fapr.proof) as proof, fapr.compliance,
  dpr.severity, htmlToText(dpr.rationale) as rationale, htmlToText(dpr.remediation) as remediation
FROM fact_asset_policy_rule fapr
  JOIN dim_policy dp on dp.policy_id = fapr.policy_id
  JOIN dim_policy_rule dpr on dpr.policy_id = fapr.policy_id and fapr.rule_id = dpr.rule_id
  JOIN dim_asset da on da.asset_id = fapr.asset_id
WHERE fapr.compliance = false order by dp.title, dpr.title`

However it exports the below content for rationale and remediation with below xhtml tags which i would like to get rid off:

Rationale column example:

<xhtml:p xmlns="http://checklists.nist.gov/xccdf/1.2" 
xmlns:ae="http://benchmarks.cisecurity.org/ae/0.5" 
xmlns:cc6="http://cisecurity.org/20-cc/v6.1" 
xmlns:cc7="http://cisecurity.org/20-cc/v7.0" 
xmlns:ciscf="https://benchmarks.cisecurity.org/ciscf/1.0" 
xmlns:notes="http://benchmarks.cisecurity.org/notes" 
xmlns:xccdf="http://checklists.nist.gov/xccdf/1.2" 
xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">Attack surface 
reduction helps prevent actions and apps that are typically used by 
exploit-seeking malware to infect machines.</xhtml:p>

I would only like to get the output with TEXT "Attack surface reduction helps prevent actions and apps that are typically used by exploit-seeking malware to infect machines" and remove rest of the data.

Remediation Example:

<xhtml:div xmlns="http://checklists.nist.gov/xccdf/1.2"
xmlns:ae="http://benchmarks.cisecurity.org/ae/0.5" 
xmlns:cc6="http://cisecurity.org/20-cc/v6.1" 
xmlns:cc7="http://cisecurity.org/20-cc/v7.0" 
xmlns:ciscf="https://benchmarks.cisecurity.org/ciscf/1.0" 
xmlns:notes="http://benchmarks.cisecurity.org/notes" 
xmlns:xccdf="http://checklists.nist.gov/xccdf/1.2" 
xmlns:xhtml="http://www.w3.org/1999/xhtml" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <xhtml:p> 
<xhtml:p> To establish the recommended configuration via GP, set the 
following UI path to <xhtml:span 
class="inline_block">Enabled</xhtml:span> : </xhtml:p> <xhtml:code 
class="code_block">Computer Configuration\Policies\Administrative 
Templates\Windows Components\Windows Defender Antivirus\Windows Defender 
Exploit Guard\Attack Surface Reduction\Configure Attack Surface 
Reduction rules </xhtml:code> <xhtml:p> <xhtml:strong>Note: 
</xhtml:strong> This Group Policy path may not exist by default. It is 
provided by the Group Policy template <xhtml:span 
class="inline_block">WindowsDefender.admx/adml</xhtml:span> that is 
included with the Microsoft Windows 10 Release 1709 Administrative 
Templates (or newer). </xhtml:p> <xhtml:p class="bold">Impact:</xhtml:p> 
<xhtml:p> <xhtml:p>When a rule is triggered, a notification will be 
displayed from the Action Center.</xhtml:p> </xhtml:p> </xhtml:p> 
</xhtml:div>

Again need to output of remediation to be clean without XHTML and xmlns, so the clean text should look like something below:

To establish the recommended configuration via GP, set the following UI path to Enabled : Computer Configuration\Policies\Administrative Templates\Windows Components\Windows Defender Antivirus\Windows Defender Exploit Guard\Attack Surface Reduction\Configure Attack Surface Reduction rules, This Group Policy path may not exist by default. It is provided by the Group Policy template WindowsDefender.admx/adml that is included with the Microsoft Windows 10 Release 1709 Administrative Templates (or newer).
Impact: When a rule is triggered, a notification will be displayed from the Action Center.

Is there a way I can get clean HTML to text ? without the content that I don't need?

1

There are 1 best solutions below

3
Tomalak On

Is there a way I can get clean HTML to text?

If it is fully valid XHTML, you can cast it as XML and then select the value of the node that interests you using XPath.

declare @xhtml varchar(max)
set @xhtml = '<xhtml:div xmlns="http://checklists.nist.gov/xccdf/1.2" xmlns:ae="http://benchmarks.cisecurity.org/ae/0.5" xmlns:cc6="http://cisecurity.org/20-cc/v6.1" xmlns:cc7="http://cisecurity.org/20-cc/v7.0" xmlns:ciscf="https://benchmarks.cisecurity.org/ciscf/1.0" xmlns:notes="http://benchmarks.cisecurity.org/notes" xmlns:xccdf="http://checklists.nist.gov/xccdf/1.2" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">     <xhtml:p>         <xhtml:p>To establish the recommended configuration via GP, set the following UI path to <xhtml:span class="inline_block">Enabled</xhtml:span> : </xhtml:p>         <xhtml:code class="code_block">Computer Configuration\Policies\Administrative Templates\Windows Components\Windows Defender Antivirus\Windows Defender Exploit Guard\Attack Surface Reduction\Configure Attack Surface Reduction rules </xhtml:code> <xhtml:p> <xhtml:strong>Note:</xhtml:strong> This Group Policy path may not exist by default. It is provided by the Group Policy template <xhtml:span class="inline_block">WindowsDefender.admx/adml</xhtml:span> that is included with the Microsoft Windows 10 Release 1709 Administrative Templates (or newer). </xhtml:p>         <xhtml:p class="bold">Impact:</xhtml:p> <xhtml:p> <xhtml:p>When a rule is triggered, a notification will be displayed from the Action Center.</xhtml:p>     </xhtml:p>     </xhtml:p>     </xhtml:div>'

;with xmlnamespaces('http://www.w3.org/1999/xhtml' as xhtml) 
select (cast(@xhtml as xml)).value('(//xhtml:p)[1]', 'varchar(max)')

See http://sqlfiddle.com/#!18/9eecb/136834.

Use more specific XPath if you need. For more complex requirements than "value of a single node" you can switch to XQuery, too.