`document.html` Module

HTML document readers.

wpull.document.html.COMMENT = <object object>: Comment element

class wpull.document.html.HTMLLightParserTarget(callback, text_elements=frozenset({'link', 'icon', 'style', 'script', 'url'}))

Bases: object

An HTML parser target for partial elements.

Parameters:

callback – A callback function. The function should accept the following arguments:

1. tag (str) – The tag name of the element.
2. attrib (dict) – The attributes of the element.
3. text (str, None) – The text of the element.

text_elements – A frozenset of element tag names for which text should be tracked.
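
To make the three-argument callback concrete, the following is a minimal sketch, not wpull's implementation. `SketchLightTarget` is a hypothetical stand-in that follows the same `start`/`data`/`end`/`close` target protocol, and it is driven by the standard library's `xml.etree.ElementTree.XMLParser` rather than by wpull's parsers. The deferral rule (report text only for tags listed in `text_elements`) and the shortened default set are assumptions made for illustration.

```python
from xml.etree.ElementTree import XMLParser


class SketchLightTarget:
    """Hypothetical stand-in; NOT wpull's HTMLLightParserTarget."""

    def __init__(self, callback, text_elements=frozenset({'style', 'script'})):
        self.callback = callback
        self.text_elements = text_elements
        self.pending = None  # (tag, attrib) of the element whose text is buffered
        self.buffer = []

    def start(self, tag, attrib):
        if tag in self.text_elements:
            # Assumption: defer the callback until the end tag so text can be collected.
            self.pending = (tag, dict(attrib))
            self.buffer = []
        else:
            self.callback(tag, dict(attrib), None)

    def data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self.pending is not None:
            self.buffer.append(data)

    def end(self, tag):
        if self.pending is not None and self.pending[0] == tag:
            pending_tag, attrib = self.pending
            self.callback(pending_tag, attrib, ''.join(self.buffer) or None)
            self.pending = None

    def close(self):
        return None


seen = []


def callback(tag, attrib, text):
    # Matches the documented signature: tag (str), attrib (dict), text (str or None).
    seen.append((tag, attrib, text))


parser = XMLParser(target=SketchLightTarget(callback))
parser.feed('<html><a href="/x">go</a><script src="a.js">run()</script></html>')
parser.close()
```

In this sketch only the `script` element reaches the callback with text; the other elements are reported with `text` set to `None`.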

close()

data(data)

end(tag)

start(tag, attrib)

class wpull.document.html.HTMLParserTarget(callback)

Bases: object

An HTML parser target.

Parameters:

callback – A callback function. The function should accept the following arguments:

1. tag (str) – The tag name of the element.
2. attrib (dict) – The attributes of the element.
3. text (str, None) – The text of the element.
4. tail (str, None) – The text after the element.
5. end (bool) – Whether the tag is an end tag.
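
The five-argument callback can be illustrated with a simplified sketch. `SketchTarget` and the local `COMMENT` sentinel below are hypothetical stand-ins, not wpull's code, and the target is driven by the standard library's `xml.etree.ElementTree.XMLParser`. The sketch assumes that start tags are reported with `end=False`, end tags with `end=True`, and comments with the sentinel in place of a tag name. It does not track `text` or `tail` for elements.

```python
from xml.etree.ElementTree import XMLParser

COMMENT = object()  # local sentinel for this sketch only


class SketchTarget:
    """Hypothetical stand-in; NOT wpull's HTMLParserTarget."""

    def __init__(self, callback):
        self.callback = callback

    def start(self, tag, attrib):
        # Assumption: a start tag is reported with end=False.
        self.callback(tag, dict(attrib), None, None, False)

    def data(self, data):
        pass  # element text and tail are not tracked in this simplified sketch

    def end(self, tag):
        # Assumption: an end tag is reported with end=True.
        self.callback(tag, {}, None, None, True)

    def comment(self, text):
        # Assumption: comments are passed with the sentinel as the tag.
        self.callback(COMMENT, {}, text, None, False)

    def close(self):
        return None


events = []


def callback(tag, attrib, text, tail, end):
    # Record only the tag and the end flag to show the order of events.
    events.append((tag, end))


parser = XMLParser(target=SketchTarget(callback))
parser.feed('<div><!-- note --><br/></div>')
parser.close()
```

A consumer that only cares about opening tags can return early whenever `end` is true.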

close()

comment(text)

data(data)

end(tag)

start(tag, attrib)

class wpull.document.html.HTMLReadElement(tag, attrib, text, tail, end)

Bases: object

Results from HTMLReader.read_links().

tag (str) – The element tag name.

attrib (dict) – The element attributes.

text (str, None) – The element text.

tail (str, None) – The text after the element.

end (bool) – Whether the tag is an end tag.
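
The difference between `text` and `tail` is easiest to see on a small document. The sketch below uses the standard library's `xml.etree.ElementTree`, which follows the same text/tail convention, and a hypothetical `ReadElement` namedtuple that merely mirrors the five fields listed above. It is not wpull's `HTMLReadElement`, and setting `end` to `False` for every record is an assumption of the sketch.

```python
import xml.etree.ElementTree as ET
from collections import namedtuple

# Hypothetical stand-in with the same five fields; NOT wpull's class.
ReadElement = namedtuple('ReadElement', ['tag', 'attrib', 'text', 'tail', 'end'])

root = ET.fromstring('<p>Hi <a href="/docs">docs</a> and more</p>')

# One record per element in document order; every record is treated as a start tag.
elements = [
    ReadElement(node.tag, dict(node.attrib), node.text, node.tail, False)
    for node in root.iter()
]

# Collect link targets from start-tag records that carry an href attribute.
links = [e.attrib['href'] for e in elements if not e.end and 'href' in e.attrib]
```

Here the `a` record has `text` equal to `'docs'` and `tail` equal to `' and more'`: the tail is the text that follows the element's closing tag, up to the next tag.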

class wpull.document.html.HTMLReader(html_parser)

Bases: wpull.document.base.BaseDocumentDetector, wpull.document.base.BaseHTMLReader

HTML document reader.

Parameters:	html_parser (`document.htmlparse.BaseParser`) – An HTML parser.

classmethod is_file(file): Return whether the file is likely to be HTML.

classmethod is_request(request): Return whether the Request is likely to be HTML.

classmethod is_response(response): Return whether the Response is likely to be HTML.

classmethod is_url(url_info): Return whether the URLInfo is likely to be HTML.

iter_elements(file, encoding=None)
