Exploring The Idea Of An Internet Archive Specifically For WordPress Content

It seems like each time a WordPress podcast disappears, at least one more appears to take its place. A few weeks ago, the team behind the WP Bacon podcast announced the end of their show to concentrate on other projects. However, a recent search in iTunes for WordPress podcasts shows there is an almost endless amount of content to listen to.

Variety of WordPress Podcasts To Listen To On iTunes

Although websites can be preserved by the Internet Archive web crawler, podcasts don’t have that luxury since they are audio files. It’s disappointing knowing that some WordPress podcasts will be lost to the ether, never to be heard from again. It’s an even harder pill to swallow if the podcast has 50-100 episodes. It would be great if there were a resource on WordPress.org that acted as a digital archive of WordPress history for text, video, and audio: an enhanced version of the Internet Archive, but specifically for WordPress.

Results For WordPress.org In The Wayback Machine

Make Sure Your Site Is Not Blocking The Internet Archive Web Crawler

The Internet Archive uses web crawlers, or spiders, to automatically scan and download websites. You can manually trigger the spiders to crawl your site by searching for it using the Wayback Machine. If the site is already indexed, you’ll see a list of results. If not, the Internet Archive will attempt to crawl the site, though it can take six months or more for the results to appear. According to the Internet Archive:

It generally takes 6 months or more (up to 24 months) for pages to appear in the Wayback Machine after they are collected, because of delays in transferring material to long-term storage and indexing, or the requirements of our collection partners.
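If you’d rather check from a script whether a site already has a snapshot, the Wayback Machine offers an availability endpoint that returns JSON. The following is a minimal sketch, not an official client: the function names are my own, and it assumes the response keeps its documented `archived_snapshots` / `closest` layout.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(site):
    """Build the query URL for a site (helper name is illustrative)."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': site})}"


def closest_snapshot(payload):
    """Return the closest snapshot URL from a parsed response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def check_site(site, timeout=10):
    """Ask the Wayback Machine whether it holds a snapshot of the site."""
    with urlopen(availability_url(site), timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

Calling `check_site("wordpress.org")` makes a network request and returns either a snapshot URL or `None` if nothing has been indexed yet.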

A restrictive robots.txt file at the top level of a domain is enough to block the Internet Archive from crawling the site, so please avoid using one. The Archive Team explains the history of robots.txt and why it’s a danger to preserving the web.
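To see whether your own robots.txt would turn the crawler away, you can test its rules with Python’s standard library. This is only a rough sketch: the helper name is made up, and it assumes the Internet Archive crawler identifies itself as `ia_archiver`, so verify the user agent before relying on the result.

```python
from urllib.robotparser import RobotFileParser


def blocks_crawler(robots_txt, user_agent="ia_archiver", path="/"):
    """Return True if the given robots.txt text disallows the path.

    The default user agent "ia_archiver" is an assumption about how
    the Internet Archive crawler identifies itself.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, path)


# A rule set that shuts the crawler out of the entire site.
blocking_rules = "User-agent: ia_archiver\nDisallow: /\n"

# A rule set that only hides the admin area from all crawlers.
admin_only_rules = "User-agent: *\nDisallow: /wp-admin/\n"
```

With these samples, `blocks_crawler(blocking_rules)` reports that the site is blocked, while `blocks_crawler(admin_only_rules)` does not, because only `/wp-admin/` is off limits.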

Robots
photo credit: gruntzookicc

How To Upload Audio To The Internet Archive

To upload audio to the Internet Archive, you’ll need to register for an account to obtain a virtual library card. Once you’ve registered and activated your account, browse to https://archive.org/upload/. This is the submission form you’ll use to upload audio to the Internet Archive. Select the audio file or drag it onto the page to begin the process.

With the audio file selected, you’ll need to fill in additional details such as the description, subject tags, date the work was created, etc. Please be as detailed and descriptive as possible. This is where publishing decent show notes helps as you can just copy and paste the relevant material into the submission form.

One thing you’ll want to pay particular attention to is the license. If the work is not already in the public domain, CC0 is the least restrictive license. While you can choose to be more restrictive, I recommend choosing the least restrictive license possible to remove doubt about how the content can be reused. As an example, I uploaded episode 154 of WordPress Weekly.

The Internet Archive Audio Upload Form
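If you have dozens of episodes to submit, filling in the form by hand gets tedious. The details described above (description, subject tags, date, license) could be gathered in a script instead. The sketch below is a hypothetical helper, not an official workflow: the function names are my own, the upload step assumes the third-party `internetarchive` Python package is installed and configured with your account credentials, and the field names follow the Internet Archive’s common metadata keys as I understand them.

```python
# CC0 public domain dedication, the least restrictive option.
CC0_URL = "https://creativecommons.org/publicdomain/zero/1.0/"


def build_metadata(title, description, subjects, date, creator,
                   license_url=CC0_URL):
    """Collect the details the upload form asks for into one dictionary."""
    return {
        "mediatype": "audio",
        "title": title,
        "description": description,   # show notes work well here
        "subject": list(subjects),    # subject tags
        "date": date,                 # date the work was created
        "creator": creator,
        "licenseurl": license_url,
    }


def upload_episode(identifier, audio_path, metadata):
    """Upload one audio file (requires the internetarchive package)."""
    # Imported here so the metadata helper works without the package.
    from internetarchive import upload
    return upload(identifier, files=[audio_path], metadata=metadata)
```

The `identifier` is the unique name the item will have on archive.org, so pick something descriptive for each episode.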

Once the upload process is complete, the Internet Archive creates a page dedicated to the piece of audio content. From this page, visitors can read information and listen to the uploaded audio file. I also searched the audio section of the Internet Archive for WordPress Weekly and was able to locate Episode 154 of the show.

Internet Archive Search Results For WordPress Weekly Audio

If you’ve produced 25 or more episodes of a WordPress podcast and have decided to call it quits, could you please consider uploading the shows to the Internet Archive? I realize it’s manual labor and takes time, but at least your hard work of preparing for each show and the information discussed will not go to waste!

Uploading Video To The Internet Archive

Although the Internet Archive has a section devoted to video content, you’re required to have the source files for upload. These are not only larger, but also require more time and labor to obtain. I doubt YouTube.com is going anywhere anytime soon, but if you want your WordPress-centric videos to be archived, this is where you’d upload them.

Why Archiving WordPress Information Is Important To Me

I think of WP Tavern as a site with a continuous mission of documenting what’s happening within the WordPress ecosystem. Our job is never completed and I value the archived content as if it were gold. When I read posts from the archive, I’m reminded of how many projects have come and gone over the past few years. It doesn’t matter if it’s text, audio, or video: each piece of content about WordPress, whether it’s published on WP Tavern or not, is important, especially when looking at the big picture.

My hope is that websites that write about WordPress on a routine basis do their best to archive content, even if they decide to shut down. For example, if WPCandy disappears from the web, a large gaping hole of WordPress history will go with it. During the height of WPCandy’s success, I spent time away from WP Tavern. The Tavern doesn’t have any relevant content from that time period. When piecing together stories to make sense of decisions and trends, historical content is important. Once those holes are created, it’s nearly impossible to fill them.

A lot has happened since the birth of WordPress over 10 years ago. Much of WordPress’ earlier history is documented fairly well, but the events and milestones between the beginning and the present are spread throughout many sites in text, video, and audio. As someone who writes about WordPress for a living, it’s important to me that as much WordPress history as possible is archived. It sucks to follow a link to an article with potentially relevant information on a recent topic of discussion, only to discover a 404 error.

How important is it to you that there is a proper archive of historical content related to WordPress and that it is available to the public? Is the Internet Archive good enough or would you like to see something catered specifically to WordPress?

Who is Jeff Chandler


Jeff Chandler is a WordPress guy in the Buckeye State. He is a contributing writer for WPTavern, has been writing about WordPress since 2007, and hosts the WordPress Weekly Podcast.
