PortalTransforms/safe_html: supports python3

- HTMLParseError no longer exists on python3; there, parse_declaration raises AssertionError instead.
  py2: https://github.com/python/cpython/blob/2.7/Lib/markupbase.py#L135-L140
       https://github.com/python/cpython/blob/2.7/Lib/HTMLParser.py#L124
  py3: https://github.com/python/cpython/blob/3.12/Lib/_markupbase.py#L130-L134
- scrubHTML must pass `html` as unicode on python2 and str on python3; adjust the check to cover both py2 / py3.
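
The two points above can be sketched in isolation as follows. This is a minimal illustration, not the code of safe_html.py: the helper names `to_text` and `feed_safely` are hypothetical, the version check uses `sys.version_info` rather than `six` to stay dependency-free, and the `errors="replace"` fallback is an assumption (the real fallback is not visible in this diff).

```python
import sys

if sys.version_info[0] == 2:
    # Python 2 still ships HTMLParseError in the HTMLParser module.
    from HTMLParser import HTMLParser, HTMLParseError
else:
    from html.parser import HTMLParser
    # HTMLParseError is gone on Python 3; alias it to the exception type
    # that recent _markupbase versions raise, so existing except clauses
    # keep working unchanged.
    HTMLParseError = AssertionError


def to_text(html, default_encoding="utf-8"):
    """Return `html` as text (unicode on py2, str on py3).

    Hypothetical helper: checking for `bytes` works on both versions,
    because `bytes` is an alias of `str` on Python 2.
    """
    if isinstance(html, bytes):
        try:
            html = html.decode(default_encoding)
        except UnicodeDecodeError:
            # Assumed fallback for this sketch only.
            html = html.decode(default_encoding, "replace")
    return html


def feed_safely(parser, html):
    """Feed `html` to `parser`; return False if parsing was rejected."""
    try:
        parser.feed(to_text(html))
        parser.close()
    except HTMLParseError:
        return False
    return True


if __name__ == "__main__":
    print(feed_safely(HTMLParser(), b"<p>caf\xc3\xa9</p>"))
```

Testing against `bytes` instead of `str` is what makes the same check valid on both interpreters: on Python 3, `str` is already text and must not be decoded again.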

Commit a738f950 authored Jan 13, 2024 by Jérome Perrin Committed by Arnaud Fontaine Jul 09, 2024

Showing 1 changed file with 7 additions and 2 deletions

product/PortalTransforms/transforms/safe_html.py +7 -2

--- a/product/PortalTransforms/transforms/safe_html.py
+++ b/product/PortalTransforms/transforms/safe_html.py
 # -*- coding: utf-8 -*-
 from six import unichr
 from zLOG import ERROR
-from six.moves.html_parser import HTMLParser, HTMLParseError
+from six.moves.html_parser import HTMLParser
 import re
 from Products.PythonScripts.standard import html_quote
 import codecs
@@ -17,6 +17,11 @@ from lxml.etree import HTMLParser as LHTMLParser
 from lxml.html import tostring
 import six
+if six.PY2:
+  from six.moves.html_parser import HTMLParseError
+else:
+  HTMLParseError = AssertionError
 try:
  from lxml.html.soupparser import fromstring as soupfromstring
 except ImportError:
@@ -365,7 +370,7 @@ def scrubHTML(html, valid=VALID_TAGS, nasty=NASTY_TAGS,
    # As suggested by python developpers:
    # "Python 3.0 implicitly rejects non-unicode strings"
    # We try to decode strings against provided codec first
-    if isinstance(html, str):
+    if isinstance(html, bytes):
      try:
        html = html.decode(default_encoding)
      except UnicodeDecodeError: