TIL wget can download full sites

2025-12-28

TIL (via paul) that you can use wget to fully download a site. Back when I was backpacking, this was an awesome technique for pulling in the full documentation of PyData projects like pandas so I could keep learning from it even when the wifi was down.

$ wget -r -E -k -p -np -nc --random-wait http://example.com

Here's what the different CLI options do:

  • -r or --recursive: makes wget recurse into the requested site and download its sub-pages.
  • -E or --adjust-extension: lets wget change local file extensions to match the type of data it received (based on the HTTP Content-Type header).
  • -k or --convert-links: makes wget rewrite the links in downloaded pages to point at the local copies, for offline use.
  • -p or --page-requisites: downloads everything a page needs to display properly (images, stylesheets, and so on), even if it lives outside the current domain.
  • -np or --no-parent: restricts wget to the provided sub-folder; it won't ascend to parent directories.
  • -nc or --no-clobber: skips files that have already been downloaded, so re-running the command won't re-fetch everything.
  • --random-wait: makes wget wait a random interval between downloads so it doesn't hammer the host server.
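
Putting it together for the pandas use case (the docs URL here is just the one I'd reach for; point it at whatever documentation you want to take offline):

$ wget -r -E -k -p -np -nc --random-wait https://pandas.pydata.org/docs/

If the server is particularly touchy, adding --wait (e.g. --wait=2) sets the base delay that --random-wait then randomizes.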

Neat!