TIL wget can download full sites
2025-12-28
TIL (via paul) that you can use wget to download a full site. Back when I was backpacking, this was a great technique for pulling in the full documentation of PyData projects like pandas, so I could keep learning even when the wifi was down.
$ wget -r -E -k -p -np -nc --random-wait http://example.com
Here's what the different CLI options do:
- `-r` or `--recursive`: makes sure wget uses recursion to download sub-pages of the requested site.
- `-E` or `--adjust-extension`: lets wget change the local file extensions to match the type of data it received (based on the HTTP Content-Type header).
- `-k` or `--convert-links`: has wget rewrite the references in the pages to local references, for offline use.
- `-p` or `--page-requisites`: don't just follow links within the same domain, but also download everything that is used by a page (even if it's outside of the current domain).
- `-np` or `--no-parent`: restricts wget to the provided sub-folder; don't let it escape to parent or sibling pages.
- `-nc` or `--no-clobber`: don't retrieve files multiple times.
- `--random-wait`: has wget add random intervals between downloading different files so it doesn't hammer the host server.
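To make the flags easier to read back later, here's roughly the same command spelled out with the long options, pointed at the pandas docs as an example target (the URL reflects where the pandas documentation currently lives; swap in whichever project or version you need):

$ wget --recursive --adjust-extension --convert-links --page-requisites \
       --no-parent --no-clobber --random-wait https://pandas.pydata.org/docs/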
Neat!