Python: Parsing HTML with Beautiful Soup

Beautiful Soup is probably the most popular Python library for parsing HTML files.

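Here is an example for the case where an HTML attribute value contains < and/or >. It parses the same snippet with three different parsers and prints the result with three different output formatters, so the differences can be compared.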

examples/python/beautiful_soup_example.py

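from bs4 import BeautifulSoup
# BeautifulSoup4-4.10.0 soupsieve-2.2.1
# html5lib-1.1
# (lxml and html5lib are separate packages; html.parser ships with Python)

for html in [
    '<a if="{something.length > 0}">remove</a>'
    ]:
    # Parse the same snippet with each parser, then print it with each formatter.
    for parser in ["lxml", "html5lib", "html.parser"]:
        soup = BeautifulSoup(html, parser)
        for formatter in [None, "minimal", "html"]:
            pretty_html = soup.prettify(formatter=formatter)
            # Label each result so the outputs can be told apart.
            print(f"--- parser={parser} formatter={formatter}")
            print(pretty_html)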
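
To see what the formatter changes, it can help to look at a single parser and compare the parsed attribute value with the serialized output. The following is a minimal sketch, not part of the original example: it assumes only the built-in html.parser, and the variable names (attr_value, raw_output, escaped_output) are made up for illustration.

```python
from bs4 import BeautifulSoup

html = '<a if="{something.length > 0}">remove</a>'

# html.parser ships with Python, so no extra parser package is needed here.
soup = BeautifulSoup(html, "html.parser")

# The parsed attribute value keeps the literal ">" character.
attr_value = soup.a["if"]
print(attr_value)

# formatter=None writes strings and attribute values back without escaping.
raw_output = soup.a.decode(formatter=None)
print(raw_output)

# formatter="minimal" (the default) escapes &, < and > as entities,
# so the ">" in the attribute is written as "&gt;".
escaped_output = soup.a.decode(formatter="minimal")
print(escaped_output)
```

If the output has to keep the attribute exactly as it was written (for example because another template engine reads it), formatter=None is the one that leaves the > untouched. The trade-off is that the result is no longer guaranteed to be valid HTML.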
