Difficulty: beginner
Estimated Time: 15-20 minutes

Unpacking a WARC with Warcat

In this tutorial, you will learn how to extract web resources from WARC files as individual files.

We will use Warcat, a Python package that provides tools for managing WARC files.

Prerequisites

Basic familiarity with the UNIX command line is recommended.

IMPORTANT NOTE

This tutorial system provides a safe space for experimentation, but please note that it is temporary: no data will be kept once the session times out or you leave the tutorial.

v0.4 Created by The UK Web Archive.

Well done! You've completed this introduction to Web Archive Discovery!

If you want to know more about the tools and commands you've used, you can visit:

If you have any questions, don't hesitate to get in touch with us via:

Thanks!

The UK Web Archive Team

Step 1 - Install Warcat

First, we need to install Warcat.

Installation

You can install Warcat with the following command. As with all the commands in this tutorial, you can click it to run it in the Terminal view on the right:

pip3 install Warcat

Check it's working

Once it's installed, you can check it's working by looking at the command-line options:

python3 -m warcat -h
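
If you would rather confirm the installation from Python itself, the following is a minimal sketch using only the standard library. The helper name warcat_status is hypothetical (it is not part of Warcat), and the sketch assumes Python 3.8 or later and that the package is importable as warcat and registered under the distribution name Warcat:

```python
import importlib.metadata
import importlib.util


def warcat_status(module_name="warcat", dist_name="Warcat"):
    """Return (importable, version) for a module and its distribution.

    Hypothetical helper for illustration only; not part of Warcat.
    """
    # find_spec returns None when a top-level module cannot be located.
    importable = importlib.util.find_spec(module_name) is not None
    try:
        # Ask the installed package metadata for the version string.
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        version = None
    return importable, version


if __name__ == "__main__":
    importable, version = warcat_status()
    if importable:
        print(f"warcat is importable (installed version: {version})")
    else:
        print("warcat was not found; try running the pip3 install command again")
```

If the script reports that warcat was not found, check that pip3 and python3 refer to the same Python installation.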