Options
Wget-compatible web downloader and crawler.
usage: wpull [-h] [-V] [--plugin-script FILE] [--plugin-args PLUGIN_ARGS]
[--database FILE | --database-uri URI] [--concurrent N]
[--debug-console-port PORT] [--debug-manhole]
[--ignore-fatal-errors] [--monitor-disk MONITOR_DISK]
[--monitor-memory MONITOR_MEMORY] [-o FILE | -a FILE]
[-d | -v | -nv | -q | -qq] [--ascii-print]
[--report-speed TYPE={bits}] [-i FILE] [-F] [-B URL]
[--http-proxy HTTP_PROXY] [--https-proxy HTTPS_PROXY]
[--proxy-user USER] [--proxy-password PASS] [--no-proxy]
[--proxy-domains LIST] [--proxy-exclude-domains LIST]
[--proxy-hostnames LIST] [--proxy-exclude-hostnames LIST]
[-t NUMBER] [--retry-connrefused] [--retry-dns-error] [-O FILE]
[-nc] [-c] [--progress TYPE={bar,dot,none}] [-N]
[--no-use-server-timestamps] [-S] [-T SECONDS]
[--dns-timeout SECS] [--connect-timeout SECS]
[--read-timeout SECS] [--session-timeout SECS] [-w SECONDS]
[--waitretry SECONDS] [--random-wait] [-Q NUMBER]
[--bind-address ADDRESS] [--limit-rate RATE] [--no-dns-cache]
[--rotate-dns] [--no-skip-getaddrinfo]
[--restrict-file-names MODES=<ascii,lower,nocontrol,unix,upper,windows>]
[-4 | -6 | --prefer-family FAMILY={IPv4,IPv6,none}] [--user USER]
[--password PASSWORD] [--no-iri] [--local-encoding ENC]
[--remote-encoding ENC] [--max-filename-length NUMBER] [-nd | -x]
[-nH] [--protocol-directories] [-P PREFIX] [--cut-dirs NUMBER]
[--http-user HTTP_USER] [--http-password HTTP_PASSWORD]
[--no-cache] [--default-page NAME] [-E] [--ignore-length]
[--header STRING] [--max-redirect NUMBER] [--referer URL]
[--save-headers] [-U AGENT] [--no-robots] [--no-http-keep-alive]
[--no-cookies] [--load-cookies FILE] [--save-cookies FILE]
[--keep-session-cookies] [--post-data STRING | --post-file FILE]
[--content-disposition] [--content-on-error] [--http-compression]
[--html-parser {html5lib,libxml2-lxml}]
[--link-extractors <css,html,javascript>] [--escaped-fragment]
[--strip-session-id]
[--secure-protocol PR={SSLv3,TLSv1,TLSv1.1,TLSv1.2,auto}]
[--https-only] [--no-check-certificate] [--no-strong-crypto]
[--certificate FILE] [--certificate-type TYPE={PEM}]
[--private-key FILE] [--private-key-type TYPE={PEM}]
[--ca-certificate FILE] [--ca-directory DIR]
[--no-use-internal-ca-certs] [--random-file FILE]
[--edg-file FILE] [--ftp-user USER] [--ftp-password PASS]
[--no-remove-listing] [--no-glob] [--preserve-permissions]
[--retr-symlinks [{0,1,no,off,on,yes}]] [--warc-file FILENAME]
[--warc-append] [--warc-header STRING] [--warc-max-size NUMBER]
[--warc-move DIRECTORY] [--warc-cdx] [--warc-dedup FILE]
[--no-warc-compression] [--no-warc-digests] [--no-warc-keep-log]
[--warc-tempdir DIRECTORY] [-r] [-l NUMBER] [--delete-after] [-k]
[-K] [-p] [--page-requisites-level NUMBER] [--sitemaps] [-A LIST]
[-R LIST] [--accept-regex REGEX] [--reject-regex REGEX]
[--regex-type TYPE={pcre}] [-D LIST] [--exclude-domains LIST]
[--hostnames LIST] [--exclude-hostnames LIST] [--follow-ftp]
[--follow-tags LIST] [--ignore-tags LIST]
[-H | --span-hosts-allow LIST=<linked-pages,page-requisites>]
[-L] [-I LIST] [--trust-server-names] [-X LIST] [-np]
[--no-strong-redirects] [--proxy-server]
[--proxy-server-address ADDRESS] [--proxy-server-port PORT]
[--phantomjs] [--phantomjs-exe PATH]
[--phantomjs-max-time PHANTOMJS_MAX_TIME]
[--phantomjs-scroll NUM] [--phantomjs-wait SEC]
[--no-phantomjs-snapshot] [--no-phantomjs-smart-scroll]
[--youtube-dl] [--youtube-dl-exe PATH]
[URL [URL ...]]
Positional arguments:

urls
    the URL to be downloaded

Options:

-V, --version
    show program’s version number and exit
--plugin-script
    load plugin script from FILE
--plugin-args
    arguments for the plugin
--database
    save database tables into FILE instead of memory
--database-uri
    save database tables at SQLAlchemy URI instead of memory
--concurrent
    run at most N downloads at the same time
--debug-console-port
    run a web debug console at given port number
--debug-manhole
    install Manhole debugging socket
--ignore-fatal-errors
    ignore all internal fatal exception errors
--monitor-disk
    pause if minimum free disk space is exceeded
--monitor-memory
    pause if minimum free memory is exceeded
-o, --output-file
    write program messages to FILE
-a, --append-output
    append program messages to FILE
-d, --debug
    print debugging messages
-v, --verbose
    print informative program messages and detailed progress
-nv, --no-verbose
    print informative program messages and errors
-q, --quiet
    print program error messages
-qq, --very-quiet
    do not print program messages unless critical
--ascii-print
    print program messages in ASCII only
--report-speed
    print speed in bits only instead of human formatted units
    Possible choices: bits
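For example, a crawl that keeps its state in an on-disk database, runs five downloads in parallel, and writes its messages to a log file might be started as follows (the URL and file names are placeholders):

    wpull --database wpull.db --concurrent 5 -o wpull.log http://example.com/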
-i, --input-file
    download URLs listed in FILE
-F, --force-html
    read URL input files as HTML files
-B, --base
    resolve input relative URLs against URL
--http-proxy
    HTTP proxy for HTTP requests
--https-proxy
    HTTP proxy for HTTPS requests
--proxy-user
    username for proxy “basic” authentication
--proxy-password
    password for proxy “basic” authentication
--no-proxy
    disable proxy support
--proxy-domains
    use proxy only from LIST of hostname suffixes
--proxy-exclude-domains
    don’t use proxy from LIST of hostname suffixes
--proxy-hostnames
    use proxy only from LIST of hostnames
--proxy-exclude-hostnames
    don’t use proxy from LIST of hostnames
-t, --tries
    try NUMBER of times on transient errors
--retry-connrefused
    retry even if the server does not accept connections
--retry-dns-error
    retry even if DNS fails to resolve hostname
-O, --output-document
    stream every document into FILE
-nc, --no-clobber
    don’t use anti-clobbering filenames
-c, --continue
    resume downloading a partially-downloaded file
--progress
    choose the type of progress indicator
    Possible choices: dot, bar, none
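As an illustration, URLs listed in a hypothetical urls.txt can be fetched with retries on transient errors and a progress bar:

    wpull -i urls.txt --tries 3 --retry-connrefused --progress bar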
-N, --timestamping
    only download files that are newer than local files
--no-use-server-timestamps
    don’t set the last-modified time on files
-S, --server-response
    print the protocol responses from the server
-T, --timeout
    set DNS, connect, read timeout options to SECONDS
--dns-timeout
    timeout after SECS seconds for DNS requests
--connect-timeout
    timeout after SECS seconds for connection requests
--read-timeout
    timeout after SECS seconds for reading requests
--session-timeout
    timeout after SECS seconds for downloading files
-w, --wait
    wait SECONDS seconds between requests
--waitretry
    wait up to SECONDS seconds on retries
--random-wait
    randomly perturb the time between requests
-Q, --quota
    stop after downloading NUMBER bytes
--bind-address
    bind to ADDRESS on the local host
--limit-rate
    limit download bandwidth to RATE
--no-dns-cache
    disable caching of DNS lookups
--rotate-dns
    use different resolved IP addresses on requests
--no-skip-getaddrinfo
    always use the OS’s name resolver interface
--restrict-file-names
    list of safe filename modes to use
    Possible choices: windows, lower, unix, ascii, nocontrol, upper
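A polite crawl that spaces out requests, throttles bandwidth, and keeps filenames portable might look like the sketch below; the Wget-style rate suffix (k for kilobytes) is assumed to be accepted here as well:

    wpull --wait 1 --random-wait --limit-rate 200k --restrict-file-names windows http://example.com/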
-4, --inet4-only
    connect to IPv4 addresses only
-6, --inet6-only
    connect to IPv6 addresses only
--prefer-family
    prefer to connect to FAMILY IP addresses
    Possible choices: none, IPv6, IPv4
--user
    username for both FTP and HTTP authentication
--password
    password for both FTP and HTTP authentication
--no-iri
    use ASCII encoding only
--local-encoding
    use ENC as the encoding of input files and options
--remote-encoding
    force decoding documents using codec ENC
--max-filename-length
    limit filename length to NUMBER characters
-nd, --no-directories
    don’t create directories
-x, --force-directories
    always create directories
-nH, --no-host-directories
    don’t create directories for hostnames
--protocol-directories
    create directories for URL schemes
-P, --directory-prefix
    save everything under the directory PREFIX
--cut-dirs
    don’t make NUMBER of leading directories
--http-user
    username for HTTP authentication
--http-password
    password for HTTP authentication
--no-cache
    request server to not use cached version of files
--default-page
    use NAME as index page if not known
-E, --adjust-extension
    append HTML or CSS file extension if needed
--ignore-length
    ignore any Content-Length provided by the server
--header
    add STRING to the HTTP header
--max-redirect
    follow only up to NUMBER document redirects
--referer
    always use URL as the referrer
--save-headers
    include server header responses in files
-U, --user-agent
    use AGENT instead of Wpull’s user agent
--no-robots
    ignore robots.txt directives
--no-http-keep-alive
    disable persistent HTTP connections
--no-cookies
    disable HTTP cookie support
--load-cookies
    load Mozilla cookies.txt from FILE
--save-cookies
    save Mozilla cookies.txt to FILE
--keep-session-cookies
    include session cookies when saving cookies to file
--post-data
    use POST for all requests with query STRING
--post-file
    use POST for all requests with query in FILE
--content-disposition
    use filename given in Content-Disposition header
--content-on-error
    keep error pages
--http-compression
    request servers to use HTTP compression
--html-parser
    select HTML parsing library and strategy
    Possible choices: libxml2-lxml, html5lib
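For instance, to send a custom header, identify with a different user agent, reuse cookies from a previous session, and fix up file extensions (cookies.txt and the header and agent values are placeholders):

    wpull --header 'Accept-Language: en' -U 'Mozilla/5.0' --load-cookies cookies.txt -E http://example.com/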
--link-extractors
    specify which link extractors to use
    Possible choices: html, css, javascript
--escaped-fragment
    rewrite links with hash fragments to escaped fragments
--strip-session-id
    remove session ID tokens from links
--secure-protocol
    specify the version of the SSL protocol to use
    Possible choices: SSLv3, TLSv1, TLSv1.1, TLSv1.2, auto
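For example, to restrict a crawl to HTTPS URLs and pin the TLS version:

    wpull --https-only --secure-protocol TLSv1.2 https://example.com/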
--https-only
    download only HTTPS URLs
--no-check-certificate
    don’t validate SSL server certificates
--no-strong-crypto
    don’t use secure protocols/ciphers
--certificate
    use FILE containing the local client certificate
--certificate-type
    Undocumented
    Possible choices: PEM
--private-key
    use FILE containing the local client private key
--private-key-type
    Undocumented
    Possible choices: PEM
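A client certificate and its private key can be supplied like so; the file names are placeholders, and PEM is the only listed type:

    wpull --certificate client.pem --private-key client.key https://example.com/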
--ca-certificate
    load and use CA certificate bundle from FILE
--ca-directory
    load and use CA certificates from DIR
--no-use-internal-ca-certs
    don’t use CA certificates included with Wpull
--random-file
    use data from FILE to seed the SSL PRNG
--edg-file
    connect to entropy gathering daemon using socket FILE
--ftp-user
    username for FTP login
--ftp-password
    password for FTP login
--no-remove-listing
    keep directory file listings
--no-glob
    don’t use filename glob patterns on FTP URLs
--preserve-permissions
    apply server’s Unix file permissions on downloaded files
--retr-symlinks
    if disabled, preserve symlinks and run with security risks
    Possible choices: yes, on, 1, off, no, 0
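An anonymous FTP fetch that keeps the directory listings might look like the following sketch (the credentials follow the usual anonymous-FTP convention):

    wpull --ftp-user anonymous --ftp-password user@example.com --no-remove-listing ftp://example.com/pub/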
--warc-file
    save WARC file to filename prefixed with FILENAME
--warc-append
    append instead of overwrite the output WARC file
--warc-header
    include STRING in WARC file metadata
--warc-max-size
    write sequential WARC files sized about NUMBER bytes
--warc-move
    move WARC files to DIRECTORY as they complete
--warc-cdx
    write CDX file along with the WARC file
--warc-dedup
    write revisit records using digests in FILE
--no-warc-compression
    do not compress the WARC file
--no-warc-digests
    do not compute and save SHA1 hash digests
--no-warc-keep-log
    do not save a log into the WARC file
--warc-tempdir
    use temporary DIRECTORY for preparing WARC files
-r, --recursive
    follow links and download them
-l, --level
    limit recursion depth to NUMBER
--delete-after
    download files temporarily and delete them after
-k, --convert-links
    rewrite links in files that point to local files
-K, --backup-converted
    save original files before converting their links
-p, --page-requisites
    download objects embedded in pages
--page-requisites-level
    limit page-requisites recursion depth to NUMBER
--sitemaps
    download Sitemaps to discover more links
-A, --accept
    download only files with suffix in LIST
-R, --reject
    don’t download files with suffix in LIST
--accept-regex
    download only URLs matching REGEX
--reject-regex
    don’t download URLs matching REGEX
--regex-type
    use regex TYPE
    Possible choices: pcre
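For example, to archive a site recursively, with page requisites, into size-capped WARC files plus a CDX index (1073741824 bytes is 1 GiB):

    wpull -r -p --warc-file example --warc-cdx --warc-max-size 1073741824 http://example.com/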
-D, --domains
    download only from LIST of hostname suffixes
--exclude-domains
    don’t download from LIST of hostname suffixes
--hostnames
    download only from LIST of hostnames
--exclude-hostnames
    don’t download from LIST of hostnames
--follow-ftp
    follow links to FTP sites
--follow-tags
    follow only links contained in LIST of HTML tags
--ignore-tags
    don’t follow links contained in LIST of HTML tags
-H, --span-hosts
    follow links and page requisites to other hostnames
--span-hosts-allow
    selectively span hosts for resource types in LIST
    Possible choices: linked-pages, page-requisites
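Host spanning is typically combined with a domain whitelist; a comma-separated LIST format is assumed here. Note that the usage synopsis shows -H and --span-hosts-allow as mutually exclusive, so use the latter alone to span hosts only for selected resource types:

    wpull -r -H -D example.com,example.net http://example.com/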
-L, --relative
    follow only relative links
-I, --include-directories
    download only paths in LIST
--trust-server-names
    use the last given URL for filename during redirects
-X, --exclude-directories
    don’t download paths in LIST
-np, --no-parent
    don’t follow to parent directories on URL path
--no-strong-redirects
    don’t implicitly allow span hosts for redirects
--proxy-server
    run HTTP proxy server for capturing requests
--proxy-server-address
    bind the proxy server to ADDRESS
--proxy-server-port
    bind the proxy server port to PORT
--phantomjs
    use PhantomJS for loading dynamic pages
--phantomjs-exe
    path of PhantomJS executable
--phantomjs-max-time
    maximum duration of PhantomJS session
--phantomjs-scroll
    scroll the page up to NUM times
--phantomjs-wait
    wait SEC seconds between page interactions
--no-phantomjs-snapshot
    don’t take dynamic page snapshots
--no-phantomjs-smart-scroll
    always scroll the page to the maximum scroll count
--youtube-dl
    use youtube-dl for downloading videos
--youtube-dl-exe
    path of youtube-dl executable
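When pages build their content with JavaScript or embed videos, PhantomJS rendering and youtube-dl can be enabled; both executables must be installed separately:

    wpull --phantomjs --phantomjs-scroll 10 --phantomjs-wait 1.0 --youtube-dl http://example.com/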
Defaults may differ depending on the operating system. Use --help
to see them.
This listing is generated programmatically from the program itself. In most cases the options behave as their Wget counterparts do; Wpull follows Wget’s behavior, so please check Wget’s online documentation and other resources before asking questions.