Options
Wget-compatible web downloader and crawler.
usage: wpull [-h] [-V] [--plugin-script FILE] [--plugin-args PLUGIN_ARGS]
[--database FILE | --database-uri URI] [--concurrent N]
[--debug-console-port PORT] [--debug-manhole]
[--ignore-fatal-errors] [--monitor-disk MONITOR_DISK]
[--monitor-memory MONITOR_MEMORY] [-o FILE | -a FILE]
[-d | -v | -nv | -q | -qq] [--ascii-print]
[--report-speed TYPE={bits}] [-i FILE] [-F] [-B URL]
[--http-proxy HTTP_PROXY] [--https-proxy HTTPS_PROXY]
[--proxy-user USER] [--proxy-password PASS] [--no-proxy]
[--proxy-domains LIST] [--proxy-exclude-domains LIST]
[--proxy-hostnames LIST] [--proxy-exclude-hostnames LIST]
[-t NUMBER] [--retry-connrefused] [--retry-dns-error] [-O FILE]
[-nc] [-c] [--progress TYPE={bar,dot,none}] [-N]
[--no-use-server-timestamps] [-S] [-T SECONDS]
[--dns-timeout SECS] [--connect-timeout SECS]
[--read-timeout SECS] [--session-timeout SECS] [-w SECONDS]
[--waitretry SECONDS] [--random-wait] [-Q NUMBER]
[--bind-address ADDRESS] [--limit-rate RATE] [--no-dns-cache]
[--rotate-dns] [--no-skip-getaddrinfo]
[--restrict-file-names MODES=<ascii,lower,nocontrol,unix,upper,windows>]
[-4 | -6 | --prefer-family FAMILY={IPv4,IPv6,none}] [--user USER]
[--password PASSWORD] [--no-iri] [--local-encoding ENC]
[--remote-encoding ENC] [--max-filename-length NUMBER] [-nd | -x]
[-nH] [--protocol-directories] [-P PREFIX] [--cut-dirs NUMBER]
[--http-user HTTP_USER] [--http-password HTTP_PASSWORD]
[--no-cache] [--default-page NAME] [-E] [--ignore-length]
[--header STRING] [--max-redirect NUMBER] [--referer URL]
[--save-headers] [-U AGENT] [--no-robots] [--no-http-keep-alive]
[--no-cookies] [--load-cookies FILE] [--save-cookies FILE]
[--keep-session-cookies] [--post-data STRING | --post-file FILE]
[--content-disposition] [--content-on-error] [--http-compression]
[--html-parser {html5lib,libxml2-lxml}]
[--link-extractors <css,html,javascript>] [--escaped-fragment]
[--strip-session-id]
[--secure-protocol PR={SSLv3,TLSv1,TLSv1.1,TLSv1.2,auto}]
[--https-only] [--no-check-certificate] [--no-strong-crypto]
[--certificate FILE] [--certificate-type TYPE={PEM}]
[--private-key FILE] [--private-key-type TYPE={PEM}]
[--ca-certificate FILE] [--ca-directory DIR]
[--no-use-internal-ca-certs] [--random-file FILE]
[--edg-file FILE] [--ftp-user USER] [--ftp-password PASS]
[--no-remove-listing] [--no-glob] [--preserve-permissions]
[--retr-symlinks [{0,1,no,off,on,yes}]] [--warc-file FILENAME]
[--warc-append] [--warc-header STRING] [--warc-max-size NUMBER]
[--warc-move DIRECTORY] [--warc-cdx] [--warc-dedup FILE]
[--no-warc-compression] [--no-warc-digests] [--no-warc-keep-log]
[--warc-tempdir DIRECTORY] [-r] [-l NUMBER] [--delete-after] [-k]
[-K] [-p] [--page-requisites-level NUMBER] [--sitemaps] [-A LIST]
[-R LIST] [--accept-regex REGEX] [--reject-regex REGEX]
[--regex-type TYPE={pcre}] [-D LIST] [--exclude-domains LIST]
[--hostnames LIST] [--exclude-hostnames LIST] [--follow-ftp]
[--follow-tags LIST] [--ignore-tags LIST]
[-H | --span-hosts-allow LIST=<linked-pages,page-requisites>]
[-L] [-I LIST] [--trust-server-names] [-X LIST] [-np]
[--no-strong-redirects] [--proxy-server]
[--proxy-server-address ADDRESS] [--proxy-server-port PORT]
[--phantomjs] [--phantomjs-exe PATH]
[--phantomjs-max-time PHANTOMJS_MAX_TIME]
[--phantomjs-scroll NUM] [--phantomjs-wait SEC]
[--no-phantomjs-snapshot] [--no-phantomjs-smart-scroll]
[--youtube-dl] [--youtube-dl-exe PATH]
[URL [URL ...]]
Positional arguments:

urls
    the URL to be downloaded

Options:

-V, --version
    show program’s version number and exit
--plugin-script
    load plugin script from FILE
--plugin-args
    arguments for the plugin
--database
    save database tables into FILE instead of memory
--database-uri
    save database tables at SQLAlchemy URI instead of memory
--concurrent
    run at most N downloads at the same time
--debug-console-port
    run a web debug console at given port number
--debug-manhole
    install Manhole debugging socket
--ignore-fatal-errors
    ignore all internal fatal exception errors
--monitor-disk
    pause if minimum free disk space is exceeded
--monitor-memory
    pause if minimum free memory is exceeded
-o, --output-file
    write program messages to FILE
-a, --append-output
    append program messages to FILE
-d, --debug
    print debugging messages
-v, --verbose
    print informative program messages and detailed progress
-nv, --no-verbose
    print informative program messages and errors
-q, --quiet
    print program error messages
-qq, --very-quiet
    do not print program messages unless critical
--ascii-print
    print program messages in ASCII only
--report-speed
    print speed in bits only instead of human formatted units
    Possible choices: bits
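For example, a crawl that keeps its state in an on-disk database, runs five downloads in parallel, and writes its messages to a log file might be started as follows (the URL and file names are placeholders):

    wpull --database wpull.db --concurrent 5 -o wpull.log http://example.com/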
-i, --input-file
    download URLs listed in FILE
-F, --force-html
    read URL input files as HTML files
-B, --base
    resolve input relative URLs against URL
--http-proxy
    HTTP proxy for HTTP requests
--https-proxy
    HTTP proxy for HTTPS requests
--proxy-user
    username for proxy “basic” authentication
--proxy-password
    password for proxy “basic” authentication
--no-proxy
    disable proxy support
--proxy-domains
    use proxy only from LIST of hostname suffixes
--proxy-exclude-domains
    don’t use proxy from LIST of hostname suffixes
--proxy-hostnames
    use proxy only from LIST of hostnames
--proxy-exclude-hostnames
    don’t use proxy from LIST of hostnames
-t, --tries
    try NUMBER of times on transient errors
--retry-connrefused
    retry even if the server does not accept connections
--retry-dns-error
    retry even if DNS fails to resolve hostname
-O, --output-document
    stream every document into FILE
-nc, --no-clobber
    don’t use anti-clobbering filenames
-c, --continue
    resume downloading a partially-downloaded file
--progress
    choose the type of progress indicator
    Possible choices: dot, bar, none
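As an illustration, URLs listed in a hypothetical urls.txt can be fetched with retries on transient errors and a progress bar:

    wpull -i urls.txt --tries 3 --retry-connrefused --progress bar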
-N, --timestamping
    only download files that are newer than local files
--no-use-server-timestamps
    don’t set the last-modified time on files
-S, --server-response
    print the protocol responses from the server
-T, --timeout
    set DNS, connect, read timeout options to SECONDS
--dns-timeout
    timeout after SECS seconds for DNS requests
--connect-timeout
    timeout after SECS seconds for connection requests
--read-timeout
    timeout after SECS seconds for reading requests
--session-timeout
    timeout after SECS seconds for downloading files
-w, --wait
    wait SECONDS seconds between requests
--waitretry
    wait up to SECONDS seconds on retries
--random-wait
    randomly perturb the time between requests
-Q, --quota
    stop after downloading NUMBER bytes
--bind-address
    bind to ADDRESS on the local host
--limit-rate
    limit download bandwidth to RATE
--no-dns-cache
    disable caching of DNS lookups
--rotate-dns
    use different resolved IP addresses on requests
--no-skip-getaddrinfo
    always use the OS’s name resolver interface
--restrict-file-names
    list of safe filename modes to use
    Possible choices: windows, lower, unix, ascii, nocontrol, upper
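A polite crawl that spaces out requests, throttles bandwidth, and keeps filenames portable might look like the sketch below; the Wget-style rate suffix (k for kilobytes) is assumed to be accepted here as well:

    wpull --wait 1 --random-wait --limit-rate 200k --restrict-file-names windows http://example.com/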
-4, --inet4-only
    connect to IPv4 addresses only
-6, --inet6-only
    connect to IPv6 addresses only
--prefer-family
    prefer to connect to FAMILY IP addresses
    Possible choices: none, IPv6, IPv4
--user
    username for both FTP and HTTP authentication
--password
    password for both FTP and HTTP authentication
--no-iri
    use ASCII encoding only
--local-encoding
    use ENC as the encoding of input files and options
--remote-encoding
    force decoding documents using codec ENC
--max-filename-length
    limit filename length to NUMBER characters
-nd, --no-directories
    don’t create directories
-x, --force-directories
    always create directories
-nH, --no-host-directories
    don’t create directories for hostnames
--protocol-directories
    create directories for URL schemes
-P, --directory-prefix
    save everything under the directory PREFIX
--cut-dirs
    don’t make NUMBER of leading directories
--http-user
    username for HTTP authentication
--http-password
    password for HTTP authentication
--no-cache
    request server to not use cached version of files
--default-page
    use NAME as index page if not known
-E, --adjust-extension
    append HTML or CSS file extension if needed
--ignore-length
    ignore any Content-Length provided by the server
--header
    add STRING to the HTTP header
--max-redirect
    follow only up to NUMBER document redirects
--referer
    always use URL as the referrer
--save-headers
    include server header responses in files
-U, --user-agent
    use AGENT instead of Wpull’s user agent
--no-robots
    ignore robots.txt directives
--no-http-keep-alive
    disable persistent HTTP connections
--no-cookies
    disable HTTP cookie support
--load-cookies
    load Mozilla cookies.txt from FILE
--save-cookies
    save Mozilla cookies.txt to FILE
--keep-session-cookies
    include session cookies when saving cookies to file
--post-data
    use POST for all requests with query STRING
--post-file
    use POST for all requests with query in FILE
--content-disposition
    use filename given in Content-Disposition header
--content-on-error
    keep error pages
--http-compression
    request servers to use HTTP compression
--html-parser
    select HTML parsing library and strategy
    Possible choices: libxml2-lxml, html5lib
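For instance, to send a custom header, identify with a different user agent, reuse cookies from a previous session, and fix up file extensions (cookies.txt and the header and agent values are placeholders):

    wpull --header 'Accept-Language: en' -U 'Mozilla/5.0' --load-cookies cookies.txt -E http://example.com/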
--link-extractors
    specify which link extractors to use
    Possible choices: html, css, javascript
--escaped-fragment
    rewrite links with hash fragments to escaped fragments
--strip-session-id
    remove session ID tokens from links
--secure-protocol
    specify the version of the SSL protocol to use
    Possible choices: SSLv3, TLSv1, TLSv1.1, TLSv1.2, auto
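For example, to restrict a crawl to HTTPS URLs and pin the TLS version:

    wpull --https-only --secure-protocol TLSv1.2 https://example.com/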
--https-only
    download only HTTPS URLs
--no-check-certificate
    don’t validate SSL server certificates
--no-strong-crypto
    don’t use secure protocols/ciphers
--certificate
    use FILE containing the local client certificate
--certificate-type
    Undocumented
    Possible choices: PEM
--private-key
    use FILE containing the local client private key
--private-key-type
    Undocumented
    Possible choices: PEM
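A client certificate and its private key can be supplied like so; the file names are placeholders, and PEM is the only listed type:

    wpull --certificate client.pem --private-key client.key https://example.com/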
--ca-certificate
    load and use CA certificate bundle from FILE
--ca-directory
    load and use CA certificates from DIR
--no-use-internal-ca-certs
    don’t use CA certificates included with Wpull
--random-file
    use data from FILE to seed the SSL PRNG
--edg-file
    connect to entropy gathering daemon using socket FILE
--ftp-user
    username for FTP login
--ftp-password
    password for FTP login
--no-remove-listing
    keep directory file listings
--no-glob
    don’t use filename glob patterns on FTP URLs
--preserve-permissions
    apply server’s Unix file permissions on downloaded files
--retr-symlinks
    if disabled, preserve symlinks and run with security risks
    Possible choices: yes, on, 1, off, no, 0
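An anonymous FTP fetch that keeps the directory listings might look like the following sketch (the credentials follow the usual anonymous-FTP convention):

    wpull --ftp-user anonymous --ftp-password user@example.com --no-remove-listing ftp://example.com/pub/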
--warc-file
    save WARC file to filename prefixed with FILENAME
--warc-append
    append instead of overwrite the output WARC file
--warc-header
    include STRING in WARC file metadata
--warc-max-size
    write sequential WARC files sized about NUMBER bytes
--warc-move
    move WARC files to DIRECTORY as they complete
--warc-cdx
    write CDX file along with the WARC file
--warc-dedup
    write revisit records using digests in FILE
--no-warc-compression
    do not compress the WARC file
--no-warc-digests
    do not compute and save SHA1 hash digests
--no-warc-keep-log
    do not save a log into the WARC file
--warc-tempdir
    use temporary DIRECTORY for preparing WARC files
-r, --recursive
    follow links and download them
-l, --level
    limit recursion depth to NUMBER
--delete-after
    download files temporarily and delete them after
-k, --convert-links
    rewrite links in files that point to local files
-K, --backup-converted
    save original files before converting their links
-p, --page-requisites
    download objects embedded in pages
--page-requisites-level
    limit page-requisites recursion depth to NUMBER
--sitemaps
    download Sitemaps to discover more links
-A, --accept
    download only files with suffix in LIST
-R, --reject
    don’t download files with suffix in LIST
--accept-regex
    download only URLs matching REGEX
--reject-regex
    don’t download URLs matching REGEX
--regex-type
    use regex TYPE
    Possible choices: pcre
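For example, to archive a site recursively, with page requisites, into size-capped WARC files plus a CDX index (1073741824 bytes is 1 GiB):

    wpull -r -p --warc-file example --warc-cdx --warc-max-size 1073741824 http://example.com/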
-D, --domains
    download only from LIST of hostname suffixes
--exclude-domains
    don’t download from LIST of hostname suffixes
--hostnames
    download only from LIST of hostnames
--exclude-hostnames
    don’t download from LIST of hostnames
--follow-ftp
    follow links to FTP sites
--follow-tags
    follow only links contained in LIST of HTML tags
--ignore-tags
    don’t follow links contained in LIST of HTML tags
-H, --span-hosts
    follow links and page requisites to other hostnames
--span-hosts-allow
    selectively span hosts for resource types in LIST
    Possible choices: linked-pages, page-requisites
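Host spanning is typically combined with a domain whitelist; a comma-separated LIST format is assumed here. Note that the usage synopsis shows -H and --span-hosts-allow as mutually exclusive, so use the latter alone to span hosts only for selected resource types:

    wpull -r -H -D example.com,example.net http://example.com/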
-L, --relative
    follow only relative links
-I, --include-directories
    download only paths in LIST
--trust-server-names
    use the last given URL for filename during redirects
-X, --exclude-directories
    don’t download paths in LIST
-np, --no-parent
    don’t follow to parent directories on URL path
--no-strong-redirects
    don’t implicitly allow span hosts for redirects
--proxy-server
    run HTTP proxy server for capturing requests
--proxy-server-address
    bind the proxy server to ADDRESS
--proxy-server-port
    bind the proxy server port to PORT
--phantomjs
    use PhantomJS for loading dynamic pages
--phantomjs-exe
    path of PhantomJS executable
--phantomjs-max-time
    maximum duration of PhantomJS session
--phantomjs-scroll
    scroll the page up to NUM times
--phantomjs-wait
    wait SEC seconds between page interactions
--no-phantomjs-snapshot
    don’t take dynamic page snapshots
--no-phantomjs-smart-scroll
    always scroll the page to the maximum scroll count
--youtube-dl
    use youtube-dl for downloading videos
--youtube-dl-exe
    path of youtube-dl executable
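When pages build their content with JavaScript or embed videos, PhantomJS rendering and youtube-dl can be enabled; both executables must be installed separately:

    wpull --phantomjs --phantomjs-scroll 10 --phantomjs-wait 1.0 --youtube-dl http://example.com/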
Defaults may differ depending on the operating system. Use --help
to see them.
This listing is generated programmatically from the program itself. In most cases the options behave as their Wget counterparts do; Wpull follows Wget’s behavior, so please check Wget’s online documentation and other resources before asking questions.