I have this below query for Rapid 7 Nexpose SQL database and have been trying to remove xhtl,
SELECT
dp.policy_id, dp.title as policy_title, dpr.rule_id, dpr.title as policy_rule_title,
dp.benchmark_name, da.ip_address, da.host_name, dpr.description, dp.category,
fapr.date_tested, htmlToText(fapr.proof) as proof, fapr.compliance,
dpr.severity, htmlToText(dpr.rationale) as rationale, htmlToText(dpr.remediation) as remediation
FROM fact_asset_policy_rule fapr
JOIN dim_policy dp on dp.policy_id = fapr.policy_id
JOIN dim_policy_rule dpr on dpr.policy_id = fapr.policy_id and fapr.rule_id = dpr.rule_id
JOIN dim_asset da on da.asset_id = fapr.asset_id
WHERE fapr.compliance = false order by dp.title, dpr.title`
However it exports the below content for rationale and remediation with below xhtml tags which i would like to get rid off:
Rationale column example:
<xhtml:p xmlns="http://checklists.nist.gov/xccdf/1.2"
xmlns:ae="http://benchmarks.cisecurity.org/ae/0.5"
xmlns:cc6="http://cisecurity.org/20-cc/v6.1"
xmlns:cc7="http://cisecurity.org/20-cc/v7.0"
xmlns:ciscf="https://benchmarks.cisecurity.org/ciscf/1.0"
xmlns:notes="http://benchmarks.cisecurity.org/notes"
xmlns:xccdf="http://checklists.nist.gov/xccdf/1.2"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">Attack surface
reduction helps prevent actions and apps that are typically used by
exploit-seeking malware to infect machines.</xhtml:p>
I would only like to get the output with TEXT "Attack surface reduction helps prevent actions and apps that are typically used by exploit-seeking malware to infect machines" and remove rest of the data.
Remediation Example:
<xhtml:div xmlns="http://checklists.nist.gov/xccdf/1.2"
xmlns:ae="http://benchmarks.cisecurity.org/ae/0.5"
xmlns:cc6="http://cisecurity.org/20-cc/v6.1"
xmlns:cc7="http://cisecurity.org/20-cc/v7.0"
xmlns:ciscf="https://benchmarks.cisecurity.org/ciscf/1.0"
xmlns:notes="http://benchmarks.cisecurity.org/notes"
xmlns:xccdf="http://checklists.nist.gov/xccdf/1.2"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <xhtml:p>
<xhtml:p> To establish the recommended configuration via GP, set the
following UI path to <xhtml:span
class="inline_block">Enabled</xhtml:span> : </xhtml:p> <xhtml:code
class="code_block">Computer Configuration\Policies\Administrative
Templates\Windows Components\Windows Defender Antivirus\Windows Defender
Exploit Guard\Attack Surface Reduction\Configure Attack Surface
Reduction rules </xhtml:code> <xhtml:p> <xhtml:strong>Note:
</xhtml:strong> This Group Policy path may not exist by default. It is
provided by the Group Policy template <xhtml:span
class="inline_block">WindowsDefender.admx/adml</xhtml:span> that is
included with the Microsoft Windows 10 Release 1709 Administrative
Templates (or newer). </xhtml:p> <xhtml:p class="bold">Impact:</xhtml:p>
<xhtml:p> <xhtml:p>When a rule is triggered, a notification will be
displayed from the Action Center.</xhtml:p> </xhtml:p> </xhtml:p>
</xhtml:div>
Again need to output of remediation to be clean without XHTML and xmlns, so the clean text should look like something below:
To establish the recommended configuration via GP, set the following UI path to Enabled : Computer Configuration\Policies\Administrative Templates\Windows Components\Windows Defender Antivirus\Windows Defender Exploit Guard\Attack Surface Reduction\Configure Attack Surface Reduction rules, This Group Policy path may not exist by default. It is provided by the Group Policy template WindowsDefender.admx/adml that is included with the Microsoft Windows 10 Release 1709 Administrative Templates (or newer).
Impact: When a rule is triggered, a notification will be displayed from the Action Center.
Is there a way I can get clean HTML to text ? without the content that I don't need?
If it is fully valid XHTML, you can cast it as XML and then select the value of the node that interests you using XPath.
See http://sqlfiddle.com/#!18/9eecb/136834.
Use more specific XPath if you need. For more complex requirements than "value of a single node" you can switch to XQuery, too.