database.base Module
Base table class.
-
wpull.database.base.AddURLInfo
Alias of _AddURLInfo.
-
class wpull.database.base.BaseURLTable
Bases: object
URL table.
-
add_many(new_urls: typing.Iterator) → typing.Iterator
Add the URLs to the table.
Parameters: new_urls – URLs to be added.
Returns: The URLs added. Useful for tracking duplicates.
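The return value of add_many can be illustrated with a small in-memory stand-in. This is only a sketch: ToyURLTable is a hypothetical class, not wpull's implementation, and each new item is reduced to a plain URL string for brevity. It shows how returning only the URLs actually added lets a caller detect duplicates.

```python
from typing import Dict, Iterable, Iterator


class ToyURLTable:
    """Hypothetical in-memory stand-in; not part of wpull."""

    def __init__(self) -> None:
        # Maps each stored URL to a (here empty) record.
        self._rows: Dict[str, dict] = {}

    def add_many(self, new_urls: Iterable[str]) -> Iterator[str]:
        # Yield only the URLs that were not already in the table.
        for url in new_urls:
            if url not in self._rows:
                self._rows[url] = {}
                yield url


table = ToyURLTable()
# list() forces the lazy generator so the inserts actually happen.
first = list(table.add_many(['http://example.com/', 'http://example.com/a']))
second = list(table.add_many(['http://example.com/a', 'http://example.com/b']))

# Anything submitted but not returned was a duplicate.
duplicates = {'http://example.com/a', 'http://example.com/b'} - set(second)
```

In this sketch the second call returns only the URL that was new, so the difference between what was submitted and what was returned is the set of duplicates.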
-
add_one(url: str, url_properties: typing.Union=None, url_data: typing.Union=None)
Add a single URL to the table.
Parameters:
- url – The URL to be added.
- url_properties – Additional values to be saved.
- url_data – Additional data to be saved.
-
add_visits(visits)
Add visited URLs from a CDX file.
Parameters: visits (iterable) – An iterable of items. Each item is a tuple containing a URL, the WARC ID, and the payload digest.
-
check_in(url: str, new_status: wpull.pipeline.item.Status, increment_try_count: bool=True, url_result: typing.Union=None)
Update the record for a processed URL.
Parameters:
- url – The URL.
- new_status – Update the item status to new_status.
- increment_try_count – Whether to increment the try counter for the URL.
- url_result – Additional values.
-
check_out(filter_status: wpull.pipeline.item.Status, filter_level: typing.Union=None) → wpull.pipeline.item.URLRecord
Find a URL, mark it in progress, and return it.
Parameters:
- filter_status – Gets first item with given status.
- filter_level – Gets item with filter_level or lower.
Raises:
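The check_out / check_in cycle described above can be sketched with a toy table. Everything here is an assumption made for illustration: the Status members, the dict-based record, the level argument on add_one, and the ToyURLTable class are hypothetical stand-ins rather than wpull's own types. Raising NotFound when no item matches is also an assumption of this sketch, since the Raises entry above is blank.

```python
import enum
from typing import Dict, Optional


class Status(enum.Enum):
    # Hypothetical stand-in for wpull.pipeline.item.Status.
    todo = 'todo'
    in_progress = 'in_progress'
    done = 'done'


class NotFound(Exception):
    """Stand-in exception; raising it from check_out is an assumption."""


class ToyURLTable:
    """Hypothetical in-memory stand-in; not part of wpull."""

    def __init__(self) -> None:
        # url -> mutable record; dicts preserve insertion order.
        self._rows: Dict[str, dict] = {}

    def add_one(self, url: str, level: int = 0) -> None:
        self._rows.setdefault(
            url,
            {'url': url, 'status': Status.todo, 'level': level, 'try_count': 0},
        )

    def check_out(self, filter_status: Status,
                  filter_level: Optional[int] = None) -> dict:
        # Return the first record with the given status (and, if requested,
        # a level at or below filter_level), marking it in progress.
        for row in self._rows.values():
            if row['status'] is not filter_status:
                continue
            if filter_level is not None and row['level'] > filter_level:
                continue
            row['status'] = Status.in_progress
            return row
        raise NotFound()

    def check_in(self, url: str, new_status: Status,
                 increment_try_count: bool = True) -> None:
        # Record the outcome of processing and optionally count the attempt.
        row = self._rows[url]
        row['status'] = new_status
        if increment_try_count:
            row['try_count'] += 1


table = ToyURLTable()
table.add_one('http://example.com/', level=0)
table.add_one('http://example.com/deep', level=3)

record = table.check_out(Status.todo, filter_level=1)
table.check_in(record['url'], Status.done)
```

Marking the record in progress inside check_out keeps a second caller from picking up the same URL before check_in reports the result.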
-
get_one(url: str) → wpull.pipeline.item.URLRecord
Return a URLRecord for the URL.
Raises: NotFound
-
exception wpull.database.base.NotFound
Bases: wpull.database.base.DatabaseError
Item not found in the table.
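Because get_one raises NotFound instead of returning an empty value, callers typically wrap the lookup. The following is a minimal sketch under stated assumptions: the exception classes are local stand-ins that mirror the documented hierarchy (with DatabaseError assumed to derive from Exception), while ToyURLTable and the lookup helper are hypothetical names introduced only for this example.

```python
from typing import Dict, Optional


class DatabaseError(Exception):
    """Stand-in for wpull.database.base.DatabaseError (base class assumed)."""


class NotFound(DatabaseError):
    """Stand-in for wpull.database.base.NotFound."""


class ToyURLTable:
    """Hypothetical in-memory stand-in; not part of wpull."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def add_one(self, url: str) -> None:
        self._rows.setdefault(url, {'url': url})

    def get_one(self, url: str) -> dict:
        # Translate the missing key into the table's own exception type.
        try:
            return self._rows[url]
        except KeyError:
            raise NotFound(url) from None


def lookup(table: ToyURLTable, url: str) -> Optional[dict]:
    # Hypothetical helper: convert NotFound into None for optional lookups.
    try:
        return table.get_one(url)
    except NotFound:
        return None


table = ToyURLTable()
table.add_one('http://example.com/')
known = lookup(table, 'http://example.com/')
missing = lookup(table, 'http://example.com/absent')
```

Since NotFound derives from DatabaseError, a broader `except DatabaseError` clause would also catch it. Catching NotFound specifically keeps other database failures visible.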