ArchiveBox
Open-source self-hosted web archiving.
▶️ Quickstart | Demo | GitHub | Documentation | Info & Motivation | Community
ArchiveBox is a powerful, self-hosted internet archiving solution to collect, save, and view websites offline.
Without active preservation effort, everything on the internet eventually disappears or degrades. Archive.org does a great job as a centralized service, but saved URLs have to be public, and it can't save every type of content.
ArchiveBox is an open source tool that lets organizations & individuals archive both public & private web content while retaining control over their data. It can be used to save copies of bookmarks, preserve evidence for legal cases, back up photos from FB/Insta/Flickr or media from YT/Soundcloud/etc., save research papers, and more...
➡️ Get ArchiveBox with pip install archivebox on Linux, macOS, and Windows (WSL2), or via Docker ⭐️.
Once installed, it can be used as a CLI tool, self-hosted Web App, Python library, or one-off command.
📥 You can feed ArchiveBox URLs one at a time, or schedule regular imports from your bookmarks or history, social media feeds or RSS, link-saving services like Pocket/Pinboard, our Browser Extension, and more.
See Input Formats for a full list of supported input formats...
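As a rough sketch of feeding URLs in bulk, the snippet below builds a de-duplicated, one-per-line URL list and pipes it to archivebox add on stdin. The temp file, the variable names, and the check for index.sqlite3 (used here as a sign of an initialized data folder) are assumptions for illustration; the import step is skipped when archivebox is not installed.

```shell
# Collect URLs into a plain-text list, one per line (temp file is illustrative).
urls_file="$(mktemp)"
printf '%s\n' \
  'https://example.com' \
  'https://example.com/some/page' \
  'https://example.com' > "$urls_file"

# Drop duplicate lines before importing.
unique_urls="$(sort -u "$urls_file")"
url_count="$(printf '%s\n' "$unique_urls" | wc -l | tr -d ' ')"
echo "Prepared $url_count unique URLs"

# Only import when ArchiveBox is installed and the current folder looks initialized
# (the index.sqlite3 check is an assumption about the data folder layout).
if command -v archivebox >/dev/null 2>&1 && [ -f index.sqlite3 ]; then
  printf '%s\n' "$unique_urls" | archivebox add
fi

rm -f "$urls_file"
```

Run it from inside the data folder created during setup; outside of one, it only prints the count.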
It saves snapshots of the URLs you feed it in several redundant formats.
It also detects any content featured inside pages & extracts it out into a folder:
- 🌐 HTML/Any websites ➡️ original HTML+CSS+JS, singlefile HTML, screenshot PNG, PDF, WARC, title, article text, favicon, headers, ...
- 🎥 Social Media/News ➡️ post content TXT, comments, title, author, images, ...
- 🎬 YouTube/SoundCloud/etc. ➡️ MP3/MP4s, subtitles, metadata, thumbnail, ...
- 💾 Github/Gitlab/etc. links ➡️ clone of GIT source code, README, images, ...
- ✨ and more, see Output Formats below...
You can run ArchiveBox as a Docker web app to manage these snapshots, or continue accessing the same collection using the pip-installed CLI, Python API, and SQLite3 APIs.
All the ways of using it are equivalent, and provide matching features like adding tags, scheduling regular crawls, viewing logs, and more...
🛠️ ArchiveBox uses standard tools like Chrome, wget, & yt-dlp, and stores data in ordinary files & folders.
(no complex proprietary formats, all data is readable without needing to run ArchiveBox)
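Because the data lives in ordinary files and folders, generic tools are enough to look through it. The sketch below lists files under a folder using only find, sort, and head; the function name list_archive_files and the ~/archivebox/data path (borrowed from the install commands) are illustrative assumptions, not part of ArchiveBox.

```shell
# list_archive_files DIR: print up to 20 regular files under DIR (hypothetical helper).
list_archive_files() {
  dir="$1"
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
  find "$dir" -type f | sort | head -n 20
}

# Example: inspect the data folder if it exists at the assumed location.
if [ -d "$HOME/archivebox/data" ]; then
  list_archive_files "$HOME/archivebox/data"
fi
```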
The goal is to sleep soundly knowing the part of the internet you care about will be automatically preserved in durable, easily accessible formats for decades after it goes down.
📦 Install ArchiveBox using your preferred method: docker / pip / apt / etc. (see full Quickstart below).
Quick copy-pastable install commands:
# Option A: Get ArchiveBox with Docker Compose (recommended):
mkdir -p ~/archivebox/data && cd ~/archivebox
curl -fsSL 'https://docker-compose.archivebox.io' > docker-compose.yml # edit options in this file as-needed
docker compose run archivebox init --setup
# docker compose run archivebox add 'https://example.com'
# docker compose run archivebox help
# docker compose up
# Option B: Or use it as a plain Docker container:
mkdir -p ~/archivebox/data && cd ~/archivebox/data
docker run -it -v "$PWD":/data archivebox/archivebox init --setup
# docker run -it -v "$PWD":/data archivebox/archivebox add 'https://example.com'
# docker run -it -v "$PWD":/data archivebox/archivebox help
# docker run -it -v "$PWD":/data -p 8000:8000 archivebox/archivebox
# Option C: Or install it with your preferred pkg manager (see Quickstart below for apt, brew, and more)
pip install archivebox
mkdir -p ~/archivebox/data && cd ~/archivebox/data
archivebox init --setup
# archivebox add 'https://example.com'
# archivebox help
# archivebox server 0.0.0.0:8000
# Option D: Or use the optional auto setup script to install it
curl -fsSL 'https://get.archivebox.io' | sh
Open http://localhost:8000 to see your server's Web UI ➡️
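If the page does not load right away, the server may still be starting. A minimal sketch of a readiness check, assuming curl is available: wait_for_ui is a hypothetical helper (not an ArchiveBox command) that polls a URL until it answers successfully or the attempts run out.

```shell
# wait_for_ui URL [ATTEMPTS]: poll URL about once per second until it answers
# with a successful HTTP status. Returns 0 on success, 1 if all attempts fail.
wait_for_ui() {
  url="$1"
  attempts="${2:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -fs -o /dev/null --max-time 2 "$url" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage: wait_for_ui 'http://localhost:8000' 30 && echo 'Web UI is up'
```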
Key Features
- Free & open source, own your own data & maintain your privacy by self-hosting
- Powerful CLI with modular dependencies and support for Google Drive/NFS/SMB/S3/B2/etc.
- Comprehensive documentation, active development, and rich community
- Extracts a wide variety of content out-of-the-box: media (yt-dlp), articles (readability), code (git), etc.
- Supports scheduled/realtime importing from many types of sources
- Uses standard, durable, long-term formats like HTML, JSON, PDF, PNG, MP4, TXT, and WARC
- Usable as a oneshot CLI, self-hosted web UI, Python API (BETA), REST API (ALPHA), or desktop app
- Saves all pages to archive.org as well by default for redundancy (can be disabled for local-only mode)
- Advanced users: support for archiving content requiring login/paywall/cookies (see wiki security caveats!)
- Planned: support for running JS during archiving to adblock, autoscroll, modal-hide, thread-expand
🤝 Professional Integration
ArchiveBox is free for everyone to self-host, but we also provide support, security review, and custom integrations to help NGOs, governments, and other organizations run ArchiveBox professionally:
- Journalists: crawling during research, preserving cited pages, fact-checking & review
- Lawyers: collecting & preserving evidence, detecting changes, tagging & review
- Researchers: analyzing social media trends, getting LLM training data, crawling pipelines
- Individuals: saving bookmarks, preserving portfolio content, legacy / memoirs archival
- Governments: snapshotting public service sites, recordkeeping compliance
Contact us if your org wants help using ArchiveBox professionally. (we are also seeking grant funding)
We offer: setup & support, CAPTCHA/ratelimit unblocking, SSO, audit logging/chain-of-custody, and more
ArchiveBox is a 🏛️ 501(c)(3) nonprofit FSP and all our work supports open-source development.
Quickstart
🖥 Supported OSs: Linux/BSD, macOS, Windows (Docker) 👾 CPUs: amd64 (x86_64), arm64, arm7 (raspi>=3)
✳️ Easy Setup
docker-compose (macOS/Linux/Windows) 👈 recommended
👍 Docker Compose is recommended for the easiest install/update UX + best security + all extras out-of-the-box.
- Install Docker on your system (if not already installed).
- Download the docker-compose.yml file into a new empty directory (can be anywhere).
  mkdir -p ~/archivebox/data && cd ~/archivebox
  # Read and edit docker-compose.yml options as-needed after downloading
  curl -fsSL 'https://docker-compose.archivebox.io' > docker-compose.yml
- Run the initial setup to create an admin user (or set ADMIN_USER/PASS in docker-compose.yml)
docker compose run archivebox