I expected a function to validate HTML to be standard Oracle Database functionality, but alas, it is not.
Some internet searching did return a few solutions, but none of those were as good as I needed them to be.
This was the most useful link I could find:
https://forums.oracle.com/ords/apexds/post/how-validate-html-using-pl-sql-8336
That post mentions utl_i18n.escape_reference, but that does not work for HTML such as <p>Hello&</p>
So instead, I used apex_escape.html_allowlist_clob, which does handle that example.
However, by default it does not work for HTML such as <p>Hello</p><br>
To get that to work, I needed to exclude self-closing tags like <br>.
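The post does not say how the escaped output is finally judged valid. One plausible reading, and it is only an assumption here, is that the escaped result is parsed as XML. Under that assumption the <br> problem is easy to see: an allowlisted <br> is copied through unescaped and is never closed, while an escaped &lt;br&gt; is just text. Below is a minimal Python sketch of that idea using the standard-library XML parser (not Oracle's). The function name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML once wrapped in a single
    root element (a fragment like "<p>a</p><p>b</p>" has two roots)."""
    try:
        ET.fromstring(f"<root>{fragment}</root>")
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # <br> kept as a tag (allowlisted): it is never closed, so parsing fails.
    print(is_well_formed("<p>Hello</p><br>"))        # False
    # <br> escaped (excluded from the allowlist): it is now plain text.
    print(is_well_formed("<p>Hello</p>&lt;br&gt;"))  # True
```

The wrapper element is there only so that a fragment with several top-level tags can still be parsed.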
Documentation of the HTML_ALLOWLIST function:
https://docs.oracle.com/en/database/oracle/apex/24.2/aeapi/HTML_ALLOWLIST-Function.html#GUID-AB0E2A42-D232-4AC6-9881-FE9437B9373E
"The HTML_ALLOWLIST function performs HTML escape on all characters in the input text except the specified allowlist tags.
This function can be useful if the input text contains simple html markup but a developer wants to ensure that an attacker cannot use malicious tags for cross-site scripting."
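The quoted description can be modelled roughly as follows. This is a Python sketch, not the APEX implementation: every character is HTML-escaped unless the input at that position starts with one of the allowlisted tags, in which case the tag is copied through unchanged. The function name allowlist_escape and the literal, case-insensitive tag matching are assumptions made for illustration.

```python
import html


def allowlist_escape(text: str, allowlist: list[str]) -> str:
    """Hypothetical model of allowlist escaping: allowlisted tags pass
    through unchanged, every other character is HTML-escaped."""
    out = []
    i = 0
    lowered = text.lower()
    while i < len(text):
        # Assumption: tags are matched literally (e.g. "<p>"), ignoring case.
        tag = next((t for t in allowlist if lowered.startswith(t.lower(), i)), None)
        if tag is not None:
            out.append(text[i:i + len(tag)])
            i += len(tag)
        else:
            # html.escape turns & < > " ' into character references.
            out.append(html.escape(text[i]))
            i += 1
    return "".join(out)


if __name__ == "__main__":
    tags = ["<p>", "</p>"]
    print(allowlist_escape("<p>Hello&</p>", tags))       # <p>Hello&amp;</p>
    print(allowlist_escape("<script>x</script>", tags))  # &lt;script&gt;x&lt;/script&gt;
```

This shows why <p>Hello&</p> survives: the & is escaped, but the allowlisted <p> and </p> are left alone.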
I'm sure this is not 100% waterproof, but it worked well enough for me. If you have ideas for improvements, please let me know.
| HTML | Expected valid | Is valid HTML | Escaped |
|---|---|---|---|
| <p>Hello</p> | Y | Y | <p>Hello</p> |
| <p>Hello&</p> | Y | Y | <p>Hello&amp;</p> |
| <p>Hello</p><br> | Y | Y | <p>Hello</p><br> |
| <p>Hello</p><br/> | Y | Y | <p>Hello</p><br/> |
| <p>Hello&</p | N | N | <p>Hello&amp;</p |
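Here is a Python sketch of the overall approach. It reproduces the "Is valid HTML" column of the table above for those five inputs. It is not the PL/SQL from this post and rests on several assumptions. The allowlist holds only <p> and </p>, with self-closing tags such as <br> deliberately left out. Everything else is escaped. "Valid" is taken to mean that the escaped text parses as XML inside a wrapper element. The names ALLOWLIST, allowlist_escape and is_valid_html are hypothetical, and the escaped strings this produces will not necessarily match the Escaped column character for character.

```python
import html
import xml.etree.ElementTree as ET

# Assumption: only <p> and </p> are allowlisted; self-closing tags such as
# <br> and <br/> are left out so they get escaped instead of passed through.
ALLOWLIST = ["<p>", "</p>"]


def allowlist_escape(text: str, allowlist: list[str]) -> str:
    """Copy allowlisted tags verbatim (case-insensitive), HTML-escape the rest."""
    out, i, lowered = [], 0, text.lower()
    while i < len(text):
        tag = next((t for t in allowlist if lowered.startswith(t.lower(), i)), None)
        if tag is not None:
            out.append(text[i:i + len(tag)])
            i += len(tag)
        else:
            out.append(html.escape(text[i]))
            i += 1
    return "".join(out)


def is_valid_html(text: str) -> bool:
    """Escape, then require the result to be well-formed XML under one root."""
    escaped = allowlist_escape(text, ALLOWLIST)
    try:
        ET.fromstring(f"<root>{escaped}</root>")
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    cases = [
        ("<p>Hello</p>", True),
        ("<p>Hello&</p>", True),
        ("<p>Hello</p><br>", True),
        ("<p>Hello</p><br/>", True),
        ("<p>Hello&</p", False),
    ]
    for source, expected in cases:
        actual = is_valid_html(source)
        print(f"{source:20} expected={expected} actual={actual}")
```

In the last case the truncated "</p" does not match the allowlisted "</p>", so it is escaped and the opening <p> is never closed. Like the original approach, this only checks that the allowed tags are balanced. HTML-specific rules, such as a <p> not being allowed inside another <p>, are not enforced.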