20 Comments


  1. This is an excellent point and an issue that is pretty pervasive, both for hosted and self-hosted blogs.

    I don’t know what the answer is. I think it’s unlikely that people will back up their audio / video themselves. But archive.org is pretty incredible, and I also like that a project BDFL like Matt has acquired some WordPress properties for the purpose of keeping them online as well.

    I know Siobhan McKeown has struggled with this issue quite a bit while she’s been working on her book. She’d be a really good person to ask about it.


    1. Which reminds me, gotta get her on the show. At least the WP Daily archives were saved through TorqueMag but you’re right, manual archiving for audio and video just doesn’t cut it. And yeah, Weblogtoolscollection.com is a property Matt acquired with at least one of the primary purposes being to archive the site.


  2. I think it is an extremely important thing to do. There’s a lot of useful information and history in various WordPress sites, and letting them disappear permanently would be very sad.

    Speaking of old content, it would be nice if this page reappeared, even if just as a read-only archive … http://wptavern.com/forum


    1. The irony is not lost on me, knowing that page is missing from the web as I wrote this post. I can always count on you to bring this stuff up :). I have the content archived, just not anywhere that is available to the public. I want to bring it back somehow, either as part of the site again or at the very least, a public archive in read-only mode.


      1. I’ve done a similar thing myself. I have an old forum which I accidentally let drop offline (I let the domain expire), but I still have the old database stashed away for the rainy day when I can be bothered turfing it back online for historical purposes :)


  3. Do you know what happened with WPCandy? There have been no new posts for a year. A few months ago I asked one of their editors on Twitter and she told me they would come back soon.


    1. The site disappeared. The owner never responded to questions about it, then some of the posts reappeared (but not all), then a couple of new posts were made, then it just died.


      1. Quite interesting, but not a unique case in the online world.


      2. …really hope it does come back though, I for one used to really enjoy reading WPCandy…. Ryan? Listening?


  4. Another option is for podcasters to post transcripts for their shows. Those transcripts would be indexed and archived. Same thing is true for video. Add the transcript and the content will be archived.


    1. I’m going to look into this more. When I think of transcripts for podcasts, I always equate that with having to spend money. I’ll head over to your site shortly as I know you’ve written about this topic a few times before.


      1. Check out fanscribed.com – they have a good thing going already. The transcription work is crowdsourced.


  5. Yeah, I’d love to have an archive of all the podcasts I’ve been on… from the WordPress Podcast with Charles Stricklin, the TechCanuck Podcast with James Cogan, PerfCast with Jeff, and of course WP Weekly… I think the ones with Charles would be the hardest to find/get at this point.


    1. Well, all 29 episodes of Perfcast are available for you to download and archive yourself here, http://www.talkshoe.com/talkshoe/web/talkCast.jsp?masterId=24073&cmd=tc

      I don’t know with which episode you started hosting the show with me, but I bet I wrote about it. Those archived episodes can be downloaded here: http://www.talkshoe.com/talkshoe/web/talkCast.jsp?masterId=34224&cmd=tc

      I also have most of them on an external hard drive. All newer episodes are hosted on the same site as the Tavern.

      I brought up the issue of the WordPress community podcast being archived to Joost De Valk a long time ago and he said he would keep them available online. All of the newest episodes are on WebmasterRadio.FM http://www2.webmasterradio.fm/wordpress-community-podcast/ but I don’t know exactly where all of the very old episodes of the show are. They may still be on the wp-community domain but that just shows a login form now.

      For the archive’s sake, Joost De Valk’s original Press This podcast episodes can be found here: http://www2.webmasterradio.fm/press-this/


  6. Archive.org is fantastic for text – I’ve used it extensively in my research. Unfortunately podcasts don’t archive so well :( I’ve been trying to get this podcast: https://web.archive.org/web/20091210030233/http://bitwiremedia.com/wordcast/wordcast-special-edition-live-may-12th-at-6pm-eastern/ but having no luck. I’ve been in touch with one of the publishers of the podcast and he doesn’t have it any more.

    Here’s a good one that I did find via archive.org though: https://web.archive.org/web/20080427183149/http://www.revolutionizeyourblog.com/askthanks.php


    1. Well that sucks, I guess Dave Moyer is hard to get in touch with these days or maybe he didn’t archive them. Perhaps Lorelle VanFossen has a copy?


  7. I started podcasting just two months ago, using the Internet Archive to store and host my 100% Royalty-Free “Eclectic Music” Podcast, Amateur Zen. I was informed by some web how-tos (I’ll try to find and credit those sources) that one way to podcast for zero cost (as in beer) is to use the ‘Internet Archive – Feedburner – WordPress’ triumvirate. I’ve been wholly satisfied with this method, and my handful of listeners have too. Although the learning curve is steep-ish, submitting audio content to the archive is extremely easy and reliable. I’ve come across plenty of podcasts hosted there in their comprehensive 100+ “episode” glory (Note: Creating playlists of episodic ‘casts in sequence MAY entice enforcement of payment of fees to ‘Pro Audio’).

    The issue tackled in this article truly doesn’t compute with me. As long as podcasters are independent creators, they can’t expect a free automated platform to pick up where their own laziness or lack of spare time leave off. WordPress is certainly not that platform. The various podcast plug-ins don’t bother to help producers get their content submitted to archive.org either. What do they do? I’m getting snarky so..’nuff for now.

    If a podcaster intends for his/her works to be archived, that is their own responsibility – and hopefully a responsibility shared with one’s eager-to-contribute audience. If podcasters who are unable to do this make up a significant portion of the WP user-base, then I’d work on a solution prioritizing the creation of a simple automation script by WP: a script automating the upload of podcasts containing a given tag (‘archive’, perhaps?) to archive.org under an auto-generated account there. Perhaps we put a limit of X gigabytes on this process, after which a podcaster must go and actually interact with archive.org manually to prove his/her interest/sentience. I’m no developer, so I’ll stop there.

    Suffice it to say, this is a solution in search of a problem.


  8. Great post and I love anything that raises awareness about the free resources provided by archive.org. Some of the information is inaccurate though: Wayback Machine crawls do include MP3s or any other file types, provided they are served up by normal HTTP or HTTPS download links and not hidden behind flash players or MediaFire-type download sites. As with any Wayback Machine content it can be hit and miss as to whether any particular link gets archived, although they have been improving over time. I highly recommend just using the archive.org file area to host the MP3s in the first place – free bandwidth!

    Submitting your site to be archived is another very useful feature; however, it will only archive the specific URL you give it and does not actually trigger a crawl of the entire site. The six-month blurb you quote was in reference to the previous situation, where URLs archived by Wayback would not go live until months later when the indexing and such had been completed. This is no longer the case and archived URLs are now available within seconds.

    If you have a full site that needs crawling you can bring it to the attention of the Archive Team (http://archiveteam.org/) and we have tools that can make that happen and import the results into Wayback.

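
The last comment above points out that submitting a URL to the Wayback Machine archives only that one URL, not the whole site, and that direct MP3 links can be captured. A minimal sketch of what that implies for a podcaster follows: each episode page and each MP3 link has to be submitted individually. It assumes the public "Save Page Now" URL pattern (`https://web.archive.org/save/<url>`) and the Wayback availability endpoint (`https://archive.org/wayback/available`) behave as commonly documented; the `example.com` addresses and all function names are hypothetical. The code only builds request URLs and parses a response, it does not perform any network calls.

```python
import json
from urllib.parse import urlencode

# Assumed public endpoints; check current archive.org documentation before relying on them.
SAVE_ENDPOINT = "https://web.archive.org/save/"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def save_page_url(target_url):
    """Build the Save Page Now address for one specific URL."""
    return SAVE_ENDPOINT + target_url


def availability_query_url(target_url):
    """Build a query asking whether a snapshot of target_url exists."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": target_url})


def closest_snapshot(response_text):
    """Return the closest snapshot URL from an availability response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def submissions_for_episode(page_url, media_urls):
    """One save request per URL: the post itself plus every direct media link."""
    return [save_page_url(u) for u in [page_url, *media_urls]]


if __name__ == "__main__":
    # Hypothetical episode page and MP3 link, for illustration only.
    for request_url in submissions_for_episode(
        "http://example.com/episode-1/",
        ["http://example.com/audio/episode-1.mp3"],
    ):
        print(request_url)
```

Fetching each printed address (in a browser or with an HTTP client) is what would request the capture; the availability query can then be used to confirm that a snapshot exists. For a full site, the comment's suggestion of contacting the Archive Team remains the better route.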
