It seems like each time a WordPress podcast disappears, one or more take its place. A few weeks ago, the WP Bacon podcast announced the end of its show to concentrate on other projects. However, a recent search in iTunes for WordPress podcasts shows there is an almost endless amount of content to listen to.
Although websites can be preserved by the Internet Archive’s web crawler, podcasts don’t have that luxury since they are audio files. It’s disappointing to know that some WordPress podcasts will be lost to the ether, never to be heard again. It’s an even harder pill to swallow if the podcast has 50-100 episodes. It would be great if there were a resource on WordPress.org that acted as a digital archive of WordPress history in text, video, and audio: an enhanced version of the Internet Archive, but specifically for WordPress.
Make Sure Your Site Is Not Blocking The Internet Archive Web Crawler
The Internet Archive uses web crawlers, or spiders, to automatically scan and download websites. You can manually trigger the spiders to crawl your site by searching for it with the Wayback Machine. If the site is already indexed, you’ll see a list of results. If not, the Internet Archive will attempt to crawl the site, although the results may not appear for six months or more:
It generally takes 6 months or more (up to 24 months) for pages to appear in the Wayback Machine after they are collected, because of delays in transferring material to long-term storage and indexing, or the requirements of our collection partners.
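To check from a script instead of the Wayback Machine search box, the Internet Archive offers an availability endpoint (https://archive.org/wayback/available?url=...) that returns JSON describing the closest archived snapshot of a URL. The Python sketch below uses only the standard library; the helper names are hypothetical, and it assumes the response shape {"archived_snapshots": {"closest": {...}}}, with an empty archived_snapshots object when nothing has been captured yet.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Wayback Machine availability endpoint.
AVAILABILITY_API = "https://archive.org/wayback/available"


def closest_snapshot(payload: dict) -> Optional[dict]:
    """Return the 'closest' snapshot record from an availability response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    # Assumed: an uncaptured URL yields an empty "archived_snapshots" object.
    if closest and closest.get("available"):
        return closest
    return None


def check_wayback(site: str, timeout: float = 10.0) -> Optional[dict]:
    """Ask the availability endpoint about `site` (requires network access)."""
    query = urllib.parse.urlencode({"url": site})
    with urllib.request.urlopen(f"{AVAILABILITY_API}?{query}", timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

For example, check_wayback("wptavern.com") returns the snapshot record (with fields such as "url" and "timestamp") or None if no capture is indexed. Given the indexing delay described above, a recent crawl may not show up for months.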
A robots.txt file at the top level of a domain that disallows crawlers is enough to block the Internet Archive from crawling the site, so please don’t use one to block it. The Archive Team explains the history of robots.txt and why it’s harmful to preserving the web.
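Before assuming your site is being archived, it’s worth checking what your robots.txt actually says. This sketch uses Python’s standard urllib.robotparser module to report which crawler names a given robots.txt would turn away. The agent names are assumptions: ia_archiver is generally understood to be the user agent the Internet Archive has honored in robots.txt, and archive.org_bot is included as a second guess; adjust the list as needed.

```python
from urllib.robotparser import RobotFileParser

# Assumed crawler names to test; not an authoritative list.
ARCHIVE_AGENTS = ("ia_archiver", "archive.org_bot")


def blocked_agents(robots_txt: str, url: str = "/", agents=ARCHIVE_AGENTS) -> list:
    """Return the agents that `robots_txt` forbids from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # can_fetch() denies everything until a load time is recorded, so set it.
    parser.modified()
    return [agent for agent in agents if not parser.can_fetch(agent, url)]
```

Paste in the contents of your site’s /robots.txt; an empty list means none of the listed agents are blocked for that path. A blanket "User-agent: *" followed by "Disallow: /" blocks every agent, archive crawlers included.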
How To Upload Audio To The Internet Archive
In order to upload audio to the Internet Archive, you’ll need to register for an account to obtain a virtual library card. Once you’ve registered and activated your account, browse to https://archive.org/upload/. This is the submission form you’ll use to upload audio to the Internet Archive. Select the audio file, or drag it onto the page, to begin the process.
With the audio file selected, you’ll need to fill in additional details such as the description, subject tags, and the date the work was created. Please be as detailed and descriptive as possible. This is where publishing decent show notes helps, since you can copy and paste the relevant material into the submission form.
One thing to pay particular attention to is the license. If the work is not already in the public domain, CC0 (a public domain dedication) is the least restrictive option. While you can choose something more restrictive, I recommend choosing the least restrictive license possible to remove doubt about how the content can be reused. As an example, I uploaded episode 154 of WordPress Weekly.
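The form fields above map onto Internet Archive item metadata, so the upload can also be scripted. This is a minimal sketch assuming the third-party internetarchive Python package (installed with pip install internetarchive and configured with ia configure), whose upload() function takes an item identifier, a list of files, and a metadata dictionary. The helper names, the opensource_audio (Community Audio) collection, and the exact field choices are illustrative assumptions; check them against the archive’s own documentation before relying on them.

```python
from typing import Iterable

# CC0 1.0 public domain dedication, the least restrictive option.
CC0_URL = "https://creativecommons.org/publicdomain/zero/1.0/"


def episode_metadata(title: str, description: str, subjects: Iterable[str],
                     date: str, creator: str, license_url: str = CC0_URL) -> dict:
    """Build item metadata mirroring the upload form (hypothetical helper)."""
    return {
        "mediatype": "audio",
        "collection": "opensource_audio",  # assumed community audio collection
        "title": title,
        "description": description,  # paste the show notes here
        "subject": list(subjects),  # subject tags
        "date": date,  # creation date as YYYY-MM-DD
        "creator": creator,
        "licenseurl": license_url,
    }


def upload_episode(identifier: str, audio_path: str, metadata: dict):
    """Upload one file as a new item (needs network access and saved credentials)."""
    from internetarchive import upload  # third-party package, imported lazily
    return upload(identifier, files=[audio_path], metadata=metadata)
```

Keeping the metadata builder separate from the upload call makes it easy to review every field, especially the license, before anything is sent.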
Once the upload process is complete, the Internet Archive creates a page dedicated to the audio. From this page, visitors can read information about it and listen to the uploaded file. I also searched the audio section of the Internet Archive for WordPress Weekly and was able to locate episode 154 of the show.
If you’ve produced 25 or more episodes of a WordPress podcast and have decided to call it quits, please consider uploading the shows to the Internet Archive. I realize it’s manual labor and takes time, but at least your hard work preparing for each show and the information discussed will not go to waste!
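Since the tedious part is repeating the process for dozens of episodes, a small script can at least pair each audio file with a predictable item identifier before uploading. This sketch assumes episode files whose names contain the episode number (for example episode-154.mp3) and assumes identifiers may use only lowercase letters, digits, and hyphens; the naming scheme and the plan_uploads() and slugify() helpers are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable, List, Tuple


def slugify(text: str) -> str:
    """Reduce a show name to lowercase letters, digits and hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def plan_uploads(show_name: str, paths: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair each audio file with an identifier like 'show-name-154', sorted by episode."""
    slug = slugify(show_name)
    plan = []
    for path in paths:
        # Treat the last run of digits in the file name as the episode number.
        numbers = re.findall(r"\d+", Path(path).stem)
        if not numbers:
            continue  # skip files that can't be matched to an episode
        episode = int(numbers[-1])
        plan.append((episode, f"{slug}-{episode}", path))
    plan.sort()
    return [(identifier, path) for _, identifier, path in plan]
```

Identifiers are shared across the whole archive, so a generic prefix may already be taken; a more specific one (for instance including the site name) is safer. Each (identifier, path) pair can then go through whichever upload step you use.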
Uploading Video To The Internet Archive
Although the Internet Archive has a section devoted to video content, you’re required to have the source files to upload. These are not only larger but also require more time and labor to obtain. I doubt YouTube is going anywhere anytime soon, but if you want your WordPress-centric videos archived, this is where you’d upload them.
Why Archiving WordPress Information Is Important To Me
I think of WP Tavern as a site with a continuous mission of documenting what’s happening within the WordPress ecosystem. Our job is never complete, and I value the archived content as if it were gold. When I read posts from the archive, I’m reminded of how many projects have come and gone over the past few years. Whether it’s text, audio, or video, and whether it’s published on WP Tavern or not, each piece of content about WordPress is important, especially when looking at the big picture.
My hope is that websites that write about WordPress on a routine basis do their best to archive their content, even if they decide to shut down. For example, if WPCandy disappears from the web, a gaping hole in WordPress history will go with it. During the height of WPCandy’s success, I spent time away from WP Tavern, so the Tavern doesn’t have any relevant content from that period. When piecing together stories to make sense of decisions and trends, historical content is important. Once those holes are created, it’s nearly impossible to fill them.
A lot has happened since the birth of WordPress over 10 years ago. Much of WordPress’ early history is documented fairly well, but the events and milestones between the beginning and the present are spread across many sites in text, video, and audio. As someone who writes about WordPress for a living, I think it’s important that as much WordPress history as possible is archived. It sucks to follow a link to an article about WordPress that might hold information relevant to a recent discussion, only to discover a 404 error.
How important is it to you that there is a proper archive of historical WordPress content available to the public? Is the Internet Archive good enough, or would you like to see something catered specifically to WordPress?
This is an excellent point and an issue that is pretty pervasive, both for hosted and self-hosted blogs.
I don’t know what the answer is. I think it’s unlikely that people will back up their audio / video. But archive.org is pretty incredible, and I also like that a project BDFL like Matt has acquired some WordPress properties for the purpose of keeping them online as well.
I know Siobhan McKeown has struggled with this issue quite a bit while she’s been working on her book. She’d be a really good person to ask about it.