document.html Module
HTML document readers.
-
wpull.document.html.COMMENT = <object object>
Comment element.
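COMMENT is a bare object() sentinel, which is why its value renders as <object object>. No string can be identical to it, so code can test for it with `is` without any risk of colliding with a real tag name. The following is a minimal sketch, assuming comment nodes are reported with COMMENT in place of a tag name. The local COMMENT and the describe() helper are hypothetical stand-ins, not wpull code.

```python
# Hypothetical stand-in for wpull.document.html.COMMENT: a unique sentinel object.
COMMENT = object()


def describe(tag):
    """Return a label for an element tag, treating the comment sentinel specially."""
    # Identity comparison: no tag-name string can ever be the COMMENT object.
    if tag is COMMENT:
        return "comment"
    return "element <{}>".format(tag)


print(repr(COMMENT).startswith("<object object"))  # True
print(describe(COMMENT))  # comment
print(describe("a"))  # element <a>
```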
-
class wpull.document.html.HTMLLightParserTarget(callback, text_elements=frozenset({'link', 'icon', 'style', 'script', 'url'}))
Bases: object
An HTML parser target for partial elements.
Parameters:
- callback – A callback function. It should accept the following arguments:
  1. tag (str) – The tag name of the element.
  2. attrib (dict) – The attributes of the element.
  3. text (str, None) – The text of the element.
- text_elements – A frozenset of element tag names whose text should be tracked.
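The light target's contract can be sketched against lxml's parser-target protocol, in which a parser calls start(tag, attrib), data(data), end(tag), and close() on the target object. This is a minimal sketch, assuming that elements listed in text_elements are reported once their text is complete and that all other elements are reported at their start tag with text set to None. LightTargetSketch and TargetDriver are hypothetical names. The driver feeds events from the standard library's html.parser so the example runs without lxml.

```python
import io
from html.parser import HTMLParser


class LightTargetSketch:
    """Hypothetical light target: reports (tag, attrib, text), text only for text_elements."""

    def __init__(self, callback,
                 text_elements=frozenset({'link', 'icon', 'style', 'script', 'url'})):
        self.callback = callback
        self.text_elements = text_elements
        # (tag, attrib, buffer) for a text element whose text is still being collected.
        self._pending = None

    def _flush(self):
        """Report the pending text element, if any, with the text gathered so far."""
        if self._pending is not None:
            tag, attrib, buffer = self._pending
            self._pending = None
            self.callback(tag, attrib, buffer.getvalue())

    def start(self, tag, attrib):
        self._flush()
        if tag in self.text_elements:
            # Defer the callback until the element's text is complete.
            self._pending = (tag, attrib, io.StringIO())
        else:
            # Other elements are reported immediately, without text.
            self.callback(tag, attrib, None)

    def data(self, data):
        if self._pending is not None:
            self._pending[2].write(data)

    def end(self, tag):
        self._flush()

    def close(self):
        self._flush()
        return True


class TargetDriver(HTMLParser):
    """Hypothetical adapter that feeds stdlib html.parser events into a target."""

    def __init__(self, target):
        super().__init__(convert_charrefs=True)
        self.target = target

    def handle_starttag(self, tag, attrs):
        self.target.start(tag, dict(attrs))

    def handle_endtag(self, tag):
        self.target.end(tag)

    def handle_data(self, data):
        self.target.data(data)


seen = []
target = LightTargetSketch(lambda tag, attrib, text: seen.append((tag, attrib, text)))
driver = TargetDriver(target)
driver.feed('<a href="/x">Home</a><script>var a = 1;</script>')
driver.close()
target.close()
print(seen)
```

Only the script body is captured here. The anchor's text is skipped because a is not in the default text_elements.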
-
class wpull.document.html.HTMLParserTarget(callback)
Bases: object
An HTML parser target.
Parameters: callback – A callback function. It should accept the following arguments:
  1. tag (str) – The tag name of the element.
  2. attrib (dict) – The attributes of the element.
  3. text (str, None) – The text of the element.
  4. tail (str, None) – The text after the element.
  5. end (bool) – Whether the tag is an end tag.
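The five callback arguments fit ElementTree's text/tail model. Text is the character data right after a start tag. Tail is the character data right after an end tag. This is a minimal sketch, assuming the following:
- The target follows lxml's start/data/end/close parser-target protocol.
- Start tags are reported with their text, and end tags are reported with their tail.

FullTargetSketch is hypothetical, not wpull's implementation. The event sequence is fed by hand instead of by a real parser.

```python
class FullTargetSketch:
    """Hypothetical target reporting (tag, attrib, text, tail, end) for each tag."""

    def __init__(self, callback):
        self.callback = callback
        # (tag, attrib, is_end, chunks) for the most recent tag; chunks collects
        # the character data seen since that tag.
        self._pending = None

    def _flush(self):
        if self._pending is None:
            return
        tag, attrib, is_end, chunks = self._pending
        self._pending = None
        collected = ''.join(chunks) or None
        if is_end:
            # Data after an end tag is the element's tail.
            self.callback(tag, attrib, None, collected, True)
        else:
            # Data after a start tag, up to the next tag, is the element's text.
            self.callback(tag, attrib, collected, None, False)

    def start(self, tag, attrib):
        self._flush()
        self._pending = (tag, attrib, False, [])

    def data(self, data):
        if self._pending is not None:
            self._pending[3].append(data)

    def end(self, tag):
        self._flush()
        # End tags carry no attributes in this sketch.
        self._pending = (tag, {}, True, [])

    def close(self):
        self._flush()
        return True


events = []
target = FullTargetSketch(lambda *args: events.append(args))
# Events for <p>Hello <b>world</b>!</p>, fed by hand.
target.start('p', {})
target.data('Hello ')
target.start('b', {})
target.data('world')
target.end('b')
target.data('!')
target.end('p')
target.close()
for event in events:
    print(event)
```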
-
class wpull.document.html.HTMLReadElement(tag, attrib, text, tail, end)
Bases: object
Results from HTMLReader.read_links().
Attributes:
- tag (str) – The element tag name.
- attrib (dict) – The element attributes.
- text (str, None) – The element text.
- tail (str, None) – The text after the element.
- end (bool) – Whether the tag is an end tag.
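The text and tail fields follow the ElementTree convention. Text is the character data right after the start tag, before any child element. Tail is the character data after the end tag, which belongs to the enclosing element. This is a minimal sketch, assuming that start-tag records carry text and end-tag records carry tail. It shows why the distinction matters when rebuilding visible text. ReadElementSketch and visible_text() are hypothetical. A NamedTuple is used only for brevity, whereas the documented class derives from object.

```python
from typing import List, NamedTuple, Optional


class ReadElementSketch(NamedTuple):
    """Hypothetical stand-in with the same five fields as HTMLReadElement."""
    tag: str
    attrib: dict
    text: Optional[str]
    tail: Optional[str]
    end: bool


def visible_text(elements: List[ReadElementSketch],
                 skip=frozenset({'script', 'style'})) -> str:
    """Join text and tail values in document order, omitting the text of `skip` tags.

    Assumes start-tag records carry `text` and end-tag records carry `tail`.
    """
    parts = []
    for element in elements:
        if element.end:
            # A tail lies outside the element, so it is kept even for skipped tags.
            if element.tail:
                parts.append(element.tail)
        elif element.text and element.tag not in skip:
            parts.append(element.text)
    return ''.join(parts)


records = [
    ReadElementSketch('p', {}, 'Hello ', None, False),
    ReadElementSketch('b', {}, 'world', None, False),
    ReadElementSketch('b', {}, None, '!', True),
    ReadElementSketch('script', {}, 'var x;', None, False),
    ReadElementSketch('script', {}, None, ' Bye', True),
    ReadElementSketch('p', {}, None, None, True),
]
print(visible_text(records))  # Hello world! Bye
```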
-
class wpull.document.html.HTMLReader(html_parser)
Bases: wpull.document.base.BaseDocumentDetector, wpull.document.base.BaseHTMLReader
HTML document reader.
Parameters: html_parser (document.htmlparse.BaseParser) – An HTML parser.
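HTMLReader takes its parser as a constructor argument rather than creating one, so the parsing backend can presumably be swapped without changing the reader. This is a minimal sketch of that composition under these assumptions:
- The backend exposes a hypothetical parse(file) method that yields element records.
- read_links() keeps only start tags that carry an href or src attribute.

ReaderSketch, StdlibParserSketch, and Element are all hypothetical. The name read_links merely borrows the documented method name; its signature and filtering rule here are assumptions, not wpull's API.

```python
import io
from html.parser import HTMLParser
from typing import NamedTuple, Optional


class Element(NamedTuple):
    """Hypothetical record mirroring HTMLReadElement's fields."""
    tag: str
    attrib: dict
    text: Optional[str]
    tail: Optional[str]
    end: bool


class _Collector(HTMLParser):
    """Records start and end tags as Element records (text and tail are not tracked)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.elements = []

    def handle_starttag(self, tag, attrs):
        self.elements.append(Element(tag, dict(attrs), None, None, False))

    def handle_endtag(self, tag):
        self.elements.append(Element(tag, {}, None, None, True))


class StdlibParserSketch:
    """Hypothetical parser backend built on the standard library's html.parser."""

    def parse(self, file):
        collector = _Collector()
        collector.feed(file.read())
        collector.close()
        return iter(collector.elements)


class ReaderSketch:
    """Hypothetical reader; the parser is injected so backends can be swapped."""

    LINK_ATTRIBUTES = ('href', 'src')

    def __init__(self, html_parser):
        self._html_parser = html_parser

    def read_links(self, file):
        """Yield start-tag records that carry at least one link-like attribute."""
        for element in self._html_parser.parse(file):
            if not element.end and any(
                    name in element.attrib for name in self.LINK_ATTRIBUTES):
                yield element


reader = ReaderSketch(StdlibParserSketch())
page = io.StringIO('<p><a href="/a">A</a> <img src="b.png"></p>')
for element in reader.read_links(page):
    print(element.tag, element.attrib)
```

Any other object with the same parse(file) method could replace StdlibParserSketch without changes to ReaderSketch.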