processor.web Module
Web processing.
exception wpull.processor.web.HookPreResponseBreak
Bases: wpull.errors.ProtocolError
Hook pre-response break.
class wpull.processor.web.WebProcessor(web_client: wpull.protocol.http.web.WebClient, fetch_params: wpull.processor.web.WebProcessorFetchParamsType)
Bases: wpull.processor.base.BaseProcessor, wpull.application.hook.HookableMixin
HTTP processor.
Parameters:
- web_client – The web client.
- fetch_params – The fetch parameters.
DOCUMENT_STATUS_CODES = (200, 204, 206, 304)
Default status codes treated as a successful document fetch.
NO_DOCUMENT_STATUS_CODES = (401, 403, 404, 405, 410)
Default status codes considered a permanent error.
fetch_params
The fetch parameters.
web_client
The web client.
wpull.processor.web.WebProcessorFetchParams
Parameters:
- post_data (str) – If provided, all requests will be POSTed with the given post_data. post_data must be in percent-encoded query format (“application/x-www-form-urlencoded”).
- strong_redirects (bool) – If True, redirects are allowed to span hosts.
Alias of WebProcessorFetchParamsType.
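The percent-encoded query format required for post_data can be produced with the Python standard library. The following is a minimal sketch, not part of wpull; the field names and values are hypothetical and serve only to show the encoding.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form fields, used only for illustration.
form_fields = {"q": "web archive", "tag": "a&b"}

# urlencode() produces application/x-www-form-urlencoded text:
# spaces become "+", reserved characters such as "&" are percent-encoded,
# and the key=value pairs are joined with "&".
post_data = urlencode(form_fields)
print(post_data)  # q=web+archive&tag=a%26b

# Decoding the string recovers the original values (each as a list).
decoded = parse_qs(post_data)
```

A string built this way is the kind of value the post_data parameter describes; how it is passed to the fetch parameters depends on the calling code.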
class wpull.processor.web.WebProcessorSession(processor: wpull.processor.web.WebProcessor, item_session: wpull.pipeline.session.ItemSession)
Bases: wpull.processor.base.BaseProcessorSession
Fetches an HTTP document.
This Processor Session handles document redirects within the same Session. HTTP errors such as 404 are considered permanent errors. HTTP errors such as 500 are considered transient errors and are handled in subsequent sessions by marking the item as “error”.
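The behaviour described above can be illustrated with a small standalone sketch. It is not wpull's implementation: the function run_session, the fetch callback, the set of redirect status codes, and the outcome labels "done" and "skipped" are all assumptions made for illustration. Only the two status code tuples and the “error” label come from this page.

```python
from urllib.parse import urljoin

DOCUMENT_STATUS_CODES = (200, 204, 206, 304)
NO_DOCUMENT_STATUS_CODES = (401, 403, 404, 405, 410)
# Assumption: which codes count as redirects is not stated on this page.
REDIRECT_STATUS_CODES = (301, 302, 303, 307, 308)


def run_session(url, fetch, max_redirects=5):
    """Follow redirects within one session and classify the final response.

    fetch is a hypothetical callback: fetch(url) -> (status_code, location).
    Returns a tuple (outcome, final_url).
    """
    for _ in range(max_redirects + 1):
        status, location = fetch(url)
        if status in REDIRECT_STATUS_CODES and location:
            # Stay in the same session and resolve the redirect target.
            url = urljoin(url, location)
            continue
        if status in DOCUMENT_STATUS_CODES:
            return ("done", url)      # document fetched successfully
        if status in NO_DOCUMENT_STATUS_CODES:
            return ("skipped", url)   # permanent error, no retry expected
        return ("error", url)         # transient error, retry in a later session
    return ("error", url)             # too many redirects


# Canned responses standing in for a real HTTP client.
responses = {
    "http://example.com/a": (302, "/b"),
    "http://example.com/b": (200, None),
    "http://example.com/missing": (404, None),
    "http://example.com/broken": (500, None),
}


def fake_fetch(url):
    return responses[url]
```

With these canned responses, a request for /a is redirected to /b and completes in one session, /missing is treated as a permanent error, and /broken is left for a later session.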
If a successful document has been downloaded, it will be scraped for URLs to be added to the URL table. This Processor Session is very simple; it cannot handle JavaScript or Flash plugins.
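As a rough illustration of what scraping a document for URLs can involve, the sketch below collects href and src attributes from static HTML using only the standard library. It is an assumption-laden example, not wpull's scraper: LinkCollector and scrape_urls are hypothetical names, and a plain list stands in for the URL table.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from href and src attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the document's URL.
                self.urls.append(urljoin(self.base_url, value))


def scrape_urls(html_text, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    return collector.urls


html_text = '<a href="/about">About</a><img src="logo.png">'
found = scrape_urls(html_text, "http://example.com/docs/")
```

Because only markup is parsed, links that a page creates at run time through JavaScript are never seen, which matches the limitation noted above.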