TIL wget can download full sites

2025-12-28

TIL (via paul) that you can use wget to fully download a site. Back when I was backpacking, this was an awesome technique for pulling in the full documentation of PyData projects like pandas so I could keep learning from it even when the wifi was down.

$ wget -r -E -k -p -np -nc --random-wait http://example.com

Here's what the different CLI options do:

  • -r or --recursive: makes wget recurse into the requested site and download its sub-pages.
  • -E or --adjust-extension: lets wget change local file extensions to match the type of data it received (based on the HTTP Content-Type header).
  • -k or --convert-links: makes wget rewrite the links in downloaded pages to point at the local copies, for offline use.
  • -p or --page-requisites: downloads everything a page needs to display properly (images, stylesheets, and so on), even if it lives outside the current domain.
  • -np or --no-parent: restricts wget to the provided sub-folder; it won't ascend to parent directories.
  • -nc or --no-clobber: skips files that have already been downloaded, so re-running the command won't re-fetch everything.
  • --random-wait: makes wget wait a random interval between downloads so it doesn't hammer the host server.
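
Putting it together for the pandas use case (the docs URL here is just the one I'd reach for; point it at whatever documentation you want to take offline):

$ wget -r -E -k -p -np -nc --random-wait https://pandas.pydata.org/docs/

If the server is particularly touchy, adding --wait (e.g. --wait=2) sets the base delay that --random-wait then randomizes.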

Neat!