Internet Archive: A Treasure Trove of (Virtual) Content

Movies, music, books, games… whatever you’re looking for, there’s a good chance you’ll find it on the Internet Archive.  It is quite possibly the web’s ultimate resource, and if you’re not already familiar with this excellent site (or even if you are), it could well be worth a(nother) look.  Join me as we take a guided tour around its numerous collections.

Back in the day, in an age when everyone was still using dial-up to get online – and when Bruce Willis still had hair – the Internet Archive was born.  Unlike Bruce’s arguably most famous movie, Die Hard, the intention of this HUGE digital library is to preserve the internet’s past so that it doesn’t die at all, hard or otherwise.  That’s why, way back in 1996, this enormous undertaking began; possibly that’s also the reason they use the term for one of the fundamental elements of the project.  Known as the Wayback Machine, it is a tool the Internet Archive created for cataloguing the internet.

Think about that for a moment.  We’re not just talking about making a single copy of the World Wide Web, which would be quite an achievement in itself.  No, this site is collecting multiple copies (over days, months, and years) and making this content readily accessible to all, in an indexed and orderly fashion.  If that’s not one heck of an accomplishment then I don’t know what is.  This non-profit digital library certainly more than lives up to its mission statement of providing “universal access to all knowledge”.

Yet, as if this wasn’t enough, they’re not even content to stop there.  Besides capturing a copy (snapshot) of the very websites that make up the internet, the project aims to host other cultural artefacts in digital form as well.  They do this by means of tabbed categories on their website.  When you visit the Internet Archive, after the Web tab (and its Wayback Machine), you’ll find Texts (books and magazines), Video (film and television), Audio (music, audiobooks, podcasts and radio), and Images (photos and art)… there’s even Software (arcade, computer and consoles).  No wonder the entire library occupies over thirty petabytes of storage!

Here we’ll take a closer look at each of these categories and find out precisely how they work.  Consider these impressive facts first; to date, the Internet Archive’s website is home to the following.

330 billion web pages
20 million books and other texts
4.5 million audio recordings
4 million videos
3 million images
200,000 software programs

Amassing any one of these alone would deserve applause; combined, it’s simply incredible.  As you’ll soon discover, this makes this enormous website a fantastic resource.  So set a few hours aside, and let’s begin our journey into the Internet Archive’s virtual vaults.  Click the link below to open the homepage and let’s get stuck in.

https://archive.org/

You can search the entire Internet Archive from the homescreen using either of the search boxes highlighted in the screenshot above.  However, if you’re not in a rush, it can be much more beneficial to explore the categories individually, as we’ll do now.

Web

Wayback Machine

Has your favourite website gone the way of the dodo?  On the Internet Archive’s Web tab, find out if the Wayback Machine can resurrect it for you.  To get started, type the address, or name of the website you’re looking for, into the search box and press “Enter” on your keyboard.  As an example, I’m going with AOL – remember them?  Yes, I know, they’re actually still around – though you may not know it – but there was a time they were everywhere.

If you typed a search term rather than an actual web address, the Wayback Machine will load a list of possible websites.  Hopefully the one you’re after is in the list (the Internet Archive is pretty comprehensive, though there may be some sites that escaped its reach).  Click the site you’re looking for and a calendar will open.  First, pick a year, and then choose a date – it must have a circle on it, as these are the dates the Internet Archive took a snapshot of the website.

Tone’s Tip. Blue circles are the best ones to select, as this means the crawler (the tool scanning the internet) got a good capture of the web server; in other words, the old website you’re trying to view should display correctly.

Ring any bells?  This is how AOL looked in 1996 (on 20th December, to be precise).  How times have changed – thankfully.  Depending on how well the website has been indexed, you may be able to click some of the links to read other pages.  Note. Not every page will have been catalogued, so your mileage may vary.  At any point, you can change the date in the bar at the top of the screen.  You can also search for another website from there.
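If you’d rather jump straight to a snapshot than click through the calendar, the address of an archived page follows a predictable pattern: a fourteen-digit date and time stamp, followed by the original address.  Here’s a minimal Python sketch of that pattern; the function name is my own invention, and the AOL address is just the example from above.  If there’s no capture at that exact moment, the Wayback Machine should redirect you to the nearest one it has.

```python
from datetime import datetime

WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(page_url: str, when: datetime) -> str:
    """Return a Wayback Machine address for page_url at (or near) a given moment."""
    # The snapshot stamp is 14 digits long: YYYYMMDDhhmmss.
    stamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{page_url}"


# AOL on 20th December 1996 (the time of day defaults to midnight).
print(wayback_url("http://www.aol.com/", datetime(1996, 12, 20)))
```

Paste the printed address into your browser and you should land on (or very close to) the snapshot you asked for.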

So take a trip down memory lane.  If you have your own website, you might even find that.  Odds are, if the site has been (or was) around for any length of time, it will have been “backed up” by the Wayback Machine.

If there’s a web page you’d like to add (that isn’t on there already), you can even add it yourself by clicking on the following link.  Type the address in the Save Page Now box and click the “SAVE PAGE” button.  It will be captured and date stamped, and you’ll see how the archived page looks – best of all, you don’t even need an account to do this!

https://archive.org/web/

While there I decided to take a snapshot of the tonestechtips.com homepage, for the sake of posterity.
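For anyone who likes to automate things, Save Page Now can also be reached by putting the page’s address after a “save” prefix.  The following is only a rough sketch under that assumption: the function names are mine, anonymous requests may be rate limited, and the service’s behaviour could change.  Nothing is sent until you call request_capture yourself.

```python
import urllib.request

SAVE_PREFIX = "https://web.archive.org/save/"


def save_page_url(page_url: str) -> str:
    """Build the Save Page Now address for the page we want captured."""
    return SAVE_PREFIX + page_url


def request_capture(page_url: str, timeout: int = 60) -> int:
    """Ask the Wayback Machine to capture page_url; return the HTTP status code."""
    # This performs a real network request, so only call it deliberately.
    with urllib.request.urlopen(save_page_url(page_url), timeout=timeout) as response:
        return response.status


# Building the address is harmless; it is only printed here, not requested.
print(save_page_url("https://tonestechtips.com/"))
```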

Fire up your web browser and step into your very own internet time machine.  Whether you’ve a burning desire to remember what Grooveshark.com used to look like, or just want to amuse yourself at how basic YouTube was back in 2005 (when the video platform first launched), the Wayback Machine’s got you covered.

Texts

The Texts tab is the gateway to online literature.  The two main sections are Books to Borrow and Open Library.

Books to Borrow

This is exactly what you’d expect: a virtual library.  To use this, you’ll need to create a (free) account.  Then search for a book, either manually or using any of the multitude of filtering options.  After finding one that you’d like to borrow, click on it and select the “Borrow This Book” button (as in the screenshot).

You can borrow up to five books at a time, and each book can be borrowed for fourteen days.  If a book is already on loan you can click “Join Waitlist” and be notified when it’s available to borrow.  Books can be read in your web browser (which has a button to toggle fullscreen on and off) or downloaded as either an Adobe PDF or ePub (note, the file will be encrypted and you’ll need to install Adobe Digital Editions to view it – though this is freely available for Windows, Mac, Android and iOS).  As you might expect, the reader saves your place so you can pick up where you left off.  To view borrowed books, click your account name in the top bar and select “My loans”.

The web reader also has the option to read text out loud if you click the “Read this book aloud” speaker icon.  This could be useful for anyone with a visual impairment, though the text to speech engine could be better.  For example, when I used it, it pronounced “Ray’s” as “R-a-y-s”, with each letter spoken individually rather than reading the word as a whole – indeed, it does seem to struggle with words containing apostrophes.

Directly beneath the web reader are three square buttons.  The first is to Favorite a book.  The second allows you to Share a link to it through various social media platforms (the usual suspects are there: Twitter, Facebook, etc.) or email, or even grab the HTML code to embed into your own website (as demonstrated by the Memoirs of Sherlock Holmes example below).

Lastly, you can Flag the book if there’s a reason you think it should be reported (for graphic violence, for example).  Books can be returned at any time, freeing up your allowance to borrow other content.
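To give a flavour of what the Share button’s embed option hands you, here’s a small Python sketch that assembles similar HTML.  It assumes the item is served from an “embed” address built from its identifier (the last part of the item’s web address); the identifier shown is a made-up placeholder, the width and height are arbitrary, and the exact attributes the site generates may differ.

```python
from html import escape


def embed_snippet(identifier: str, width: int = 560, height: int = 384) -> str:
    """Return iframe HTML that embeds an Internet Archive item in a web page."""
    # Assumed pattern: https://archive.org/embed/<identifier>
    src = f"https://archive.org/embed/{identifier}"
    return (
        f'<iframe src="{escape(src, quote=True)}" '
        f'width="{width}" height="{height}" '
        'frameborder="0" allowfullscreen></iframe>'
    )


# "example-item-identifier" is a hypothetical placeholder, not a real item.
print(embed_snippet("example-item-identifier"))
```

In practice it’s easier to copy the code the website gives you; the sketch simply shows there’s no magic involved.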

Open Library

The Open Library is the (e)library proper: a virtual collection of books.  It can be browsed by subject – with genres such as Science, Biographies, Romance, Fantasy and more – or you can search for something specific like an author or title.  Having found something you’d like to read, simply click on it.

Depending on the book in question (e.g. if it’s an old classic), you may not have to login – books that are out of copyright can be freely read or downloaded; they don’t have to be borrowed.  It’s easy to spot which publication is which by the fact it will either have a Read or a Borrow button.  In the former category you can even download the book in a range of different formats: PDF, ePub, Text, and even Kindle’s mobi.  There’s also DAISY, which is designed specifically to help people with a range of disabilities, including blindness, impaired vision, and dyslexia.

Tone’s Tip. When you’ve found a book, scroll to the bottom of the page.  You’ll see it helpfully lists SIMILAR ITEMS to the one you’re viewing – this happens in other Internet Archive categories as well, not just books.

Another useful feature of the Open Library is being able to create List(s).  This comes in handy for making a note of books that you’d like to read, as well as those you’re currently reading, and even those you’ve read.

The part that makes the Open Library (and the Internet Archive as a whole) extra special is that, once you have an account, you can make your own contributions.  By clicking on an Edit link, you’re able to submit additional information about an existing book.  Add your description or excerpt, or whatever else you feel would benefit the text and then click the “Save” button.  You can even click “Manage Covers” to submit a different book cover.

More Texts collections

The Texts tab also includes Featured and Top.  These are books (and other texts) grouped into helpful collections.  If you don’t see what you’re looking for, try All Texts. From here you can search for anything in the archive.  Once you’ve found a title you like, why not download it to read at your own leisure offline.

Tone’s Tip. Transfer a downloaded book to your eReader using the excellent open source Calibre software, available here.

https://calibre-ebook.com/download

Project Gutenberg

One of the Internet Archive’s collections is Project Gutenberg.  If you haven’t heard of it, this is a free eBook service itself and can be located at the following address – it’s well worth checking out.

http://www.gutenberg.org/

Children’s Library

Another collection in Texts is the Children’s Library.  This is a great one for the kids and features such classics as The Wonderful Wizard of Oz and Cinderella.

Books by Language

This collection could be useful if you’ve started learning a new language.  Perhaps you can find a book in your native tongue and that of the language you’re studying and compare the two.  There’s a large selection of languages to choose from; everything from Latvian to Swahili.  Even Latin’s in there, too – who said it was dead!

Video

The Internet Archive is also home to millions of videos, all accessible from the Video tab.  These range from TV shows and cartoons to news and movies.

TV News

The TV News Archive contains a wealth of material that aired across a whole gamut of news outlets.  Can you recall a particular news story that you’d like to look up?  Either browse through the numerous collections, or use the search bar to hunt for something specific.  Don’t forget the Advanced Search if you really want to get granular.

Further Video collections

As with most of the main categories on the Internet Archive, content is also listed under Featured and Top, where it is organised into more collections.  In the Video section, the following are certainly worth a look.

Animation & Cartoons

Who doesn’t love a good cartoon?  I reckon there’s a big kid hiding inside each of us (perhaps some more than others).

Computers & Technology

The fact that you’re reading these very words on my technology website should be proof enough that this collection needs to be explored.

Movies and Television

Okay, so technically this is two different collections, neither of which really needs explaining – other than to say, free movies and TV, what’s not to like!  You could even cast them straight to your TV (there’s a handy little cast button right in the video window – highlighted in the screenshot – which supports both Chromecast and AirPlay).

Videogames Videos

There are some absolute classics in this one. Surely a must for any gamers out there.

All Video

If all else fails (and you haven’t seen anything that floats your boat up to this point), clicking All Video brings up the Moving Image Archive.  You now have the power at your fingertips to search the entire video archive (there’s even a YouTube category, if you need to watch yet another cat video).

Videos can be sorted by all manner of options, including year, subject and language.  However, ticking one of these filters doesn’t always narrow the search exactly how you might expect.  For example, selecting “Movies” brought up all kinds of content that wasn’t all films.
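If the tick-box filters aren’t narrowing things down the way you’d hoped, the site also has an advanced search page that accepts a written query and can return the results as JSON.  The sketch below only builds such a search address; the function name is my own, and the field names and values (mediatype, year, language) are assumptions based on the filters on screen, so adjust them to suit.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://archive.org/advancedsearch.php"


def search_url(filters: dict, fields=("identifier", "title"), rows: int = 25) -> str:
    """Build an advanced search address that asks for JSON results."""
    # Join the filters into one query, e.g. "mediatype:movies AND year:1950".
    query = " AND ".join(f"{key}:{value}" for key, value in filters.items())
    params = [("q", query)]
    # Each field we want returned is passed as a repeated "fl[]" parameter.
    params += [("fl[]", field) for field in fields]
    params += [("rows", rows), ("output", "json")]
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


print(search_url({"mediatype": "movies", "year": "1950", "language": "English"}))
```

Opening the printed address in a browser should show the raw results, which is handy for checking whether a filter is behaving as expected.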

Another minor irritation that sometimes happens, and I’m not entirely sure why, is when playing a video.  One click isn’t always enough; it’s as though the stream is buffering or preloading first.  Still, hitting the play button twice is a small price to pay for being able to watch such a large library of free content – and you could always download it anyway.

Audio

Selecting the Audio tab reveals a couple of interesting collections.

Live Music Archive

A collaboration between the Internet Archive and etree.org, the aim is “to preserve and archive as many live concerts as possible for current and future generations to enjoy” (according to the Internet Archive’s website).

There are thousands of concerts available.  Once you’ve found one, click on the title and a new page will load.  This contains a list of tracks from the event, and you can click on any to start it playing.  Alternatively, click the play button at the top (or track 1) and the music will start playing from the beginning.

If you’d prefer to download the music, scroll down the page to the Download Options.  Various audio formats will usually be available.  To download the entire concert, click on the actual number of files (as highlighted in blue in the screenshot).  This will download a zip file containing all of the tracks, which you can then extract to listen to them.

To download individual tracks, click on the file type (also highlighted in the screenshot, this time in green).  This will list the individual tracks.  Click one and it will start playing in a new page.  Then click the three vertical dots to reveal the Download button.
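Behind those Download Options, each file sits at an address made up of the item’s identifier and the file’s name, and the site offers a metadata listing of every file in an item.  As a hedged illustration, this Python sketch picks the files of one format out of such a listing and builds their download addresses.  The sample listing is hand-written by me in the general shape of a metadata response (real ones carry many more details), and the identifier and file names are invented.

```python
from urllib.parse import quote


def download_url(identifier: str, filename: str) -> str:
    """Build the direct download address for one file in an item."""
    # quote() makes spaces and other awkward characters safe for a web address.
    return f"https://archive.org/download/{quote(identifier)}/{quote(filename)}"


def files_with_format(metadata: dict, wanted_format: str) -> list:
    """Return the names of the files in a metadata listing that match a format."""
    return [
        entry["name"]
        for entry in metadata.get("files", [])
        if entry.get("format") == wanted_format
    ]


# Invented sample data; a real listing would come from
# https://archive.org/metadata/<identifier>
sample = {
    "files": [
        {"name": "track01.flac", "format": "Flac"},
        {"name": "track01.mp3", "format": "VBR MP3"},
        {"name": "track02.mp3", "format": "VBR MP3"},
    ]
}

for name in files_with_format(sample, "VBR MP3"):
    print(download_url("example-concert", name))
```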

LibriVox Free Audiobook

As with Project Gutenberg’s eBooks (in Texts), the LibriVox Free Audiobook Collection is the Internet Archive making use of LibriVox for audiobooks.  You can check out the original project here.

https://librivox.org/

The layout is virtually identical to that of the Live Music Archive, and, in the same way, you can click on any part to play it in the web browser; only (as these are audiobooks) rather than selecting tracks, you’re choosing chapters.

If you’d sooner download the audiobook (to play it offline), you can scroll down to the Download Options and choose your preferred file format.  Often the left-hand column provides links to download it as a whole (or in larger parts), rather than individual chapters.

There’s a wide range of literature to listen to, including classics like Alice’s Adventures in Wonderland by Lewis Carroll, Pride and Prejudice by Jane Austen, and Dracula by Bram Stoker.

Note. You don’t need to create an account to be able to listen to, or download, music or audio books.

More audio collections deserving consideration

Old Time Radio and 78 RPMs and Cylinder Recordings

These may be two different collections, but they certainly share a theme.  Crank up that gramophone for some real old school listening.

Community Audio

Uploaded entirely by archive users and community members, a diverse collection of audio awaits.

Podcasts

Looking to load up your headphones with Podcasts?  There are more than ten thousand entries in here to keep you going.

All Audio

As All Video did with moving pictures, All Audio puts the Internet Archive’s entire library of sound at your disposal.  Search for any artist or track that takes your fancy (some titles even include the CD cover).  Even the most discriminating of audiophiles should find something to sate their creative appetite.

Software

This is, perhaps, the Internet Archive’s most unexpected category; with Software covering everything from retro gaming to Android packages, classic PC programs to images of Operating Systems, and more…

Warning. Installing any software where the source cannot be verified can be dangerous.  Even though it’s hosted on the Internet Archive, it’s been uploaded by various users and you have no idea where it came from.  This could potentially result in malware or have other unwanted side effects.  Assuming the software is genuine, if it’s old (or discontinued), a program may be open to vulnerabilities because it’s no longer receiving security updates.  Installing APKs (Android Apps) from anywhere other than Google’s Play Store is also ill advised.  It requires allowing apps to be installed from Unknown sources in the Settings, which is best avoided – unless you REALLY trust the source.  If you’re determined to download and install any software, at the very least make sure you have antivirus software installed and that it’s up to date.  I take no responsibility for any damage that may result from running software procured from the Internet Archive.
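One modest extra precaution: the Internet Archive lists a checksum (such as SHA-1) alongside each file, so you can at least confirm that what arrived on your computer is what the site was holding.  Here’s a minimal Python sketch for doing that; the function names are my own.  Bear in mind that a matching checksum only shows the download wasn’t corrupted or swapped along the way – it says nothing about whether the software itself is safe.

```python
import hashlib


def sha1_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-1 checksum of a file as a lowercase hex string."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so large downloads don't need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path: str, listed_sha1: str) -> bool:
    """Compare a downloaded file against the checksum listed for it."""
    # Tidy the listed value so stray spaces or capitals don't cause a mismatch.
    return sha1_of_file(path) == listed_sha1.strip().lower()
```

You’d call matches_listed_checksum with the path of your download and the SHA-1 value shown for that file, and treat a result of False as a sign to delete the file and try again.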

Okay, enough with the warnings, let’s dig in.

But, if I’m not going to download anything, isn’t the Software category a waste of time?

Not at all.  Allow me to introduce you to Emulation.

A word on emulation

An emulator is basically software that allows a modern computer to behave like an older computer (or games console) in order to run the programs from that earlier period.

Didn’t you just say we’re not going to install any software?

That’s correct, and fortunately we don’t have to.  The Internet Archive sets everything up for us; the emulator is delivered with the web page and runs inside it, so there’s nothing to install.  All you need is a web browser.

To briefly explain how this works, you need to prepare yourself for an acronym overload.

The Internet Archive website uses an emulator called JSMESS (which stands for JavaScript Multi Emulator Super System).  This is a port (software adapted to work in an environment that it wasn’t intended for – in this case, a web browser) of the MESS (Multi Emulator Super System) emulator into JavaScript (a programming language that is often used to add interactivity to websites).  MESS itself was based on the MAME (Multiple Arcade Machine Emulator) core, and has now become part of the MAME project, being integrated into the upstream repository (the central location where the project’s code is stored).

That’s a lot of techno-jargon, but, put simply, it just means that a traditional emulator has been re-developed to work inside a web browser.  The end result is being able to run old software easily (and safely) right from within your browser (be it Chrome, Firefox, or even Internet Explorer or Microsoft Edge).

The result is tens of thousands of playable software titles, ranging across multiple computer platforms – which brings us nicely to the first collection.

Internet Arcade

The Internet Arcade category emulates the old coin-operated arcade machines.  Scroll through the collection, or manually search for something.

Once you’ve found a title you like the look of, click on it.  If the game has been emulated, you should see a large Click to Begin power button (as in the screenshot).

Clicking the button will launch the (JSMESS) emulator.  Wait while it downloads all the data it requires to run the game.  This will vary, depending on the age of the game – newer titles usually have larger file sizes.  You can click the four outward-facing arrows to go full-screen (or tap the Esc key on your keyboard to return to windowed mode).  The sound can also be toggled on or off (from the speaker icon).

Some games simulate adding money (insert coin) and require you to press this button to start.  Likewise, you may need to select one or two player mode (Player 1 or Player 2).  Once the game begins, if the obvious keys (cursor – arrow – keys, space and enter) don’t seem to do anything, you may have to experiment with your keyboard to find out what works.

Scroll down the screen to read further information.  Useful links are also provided (some to external sites) that are related to the videogame.  Unfortunately, not all titles run.  If you’re having trouble getting one to play, check the reviews (if there are any) to see if others are experiencing difficulties.  You could always try a different web browser, too.

Console Living Room

The Console Living Room looks back to a time when videogame entertainment was at the very heart of the living room.  Here we have all manner of console classics, by big brand names such as Atari, Sega, Amstrad, and Sony.  Need for Speed, Sim City, Grand Theft Auto… the list goes on.  Note. Some titles (like those for the PlayStation) can be quite large – several hundred megabytes – so may take a while to load in the emulator.

Tone’s Tip. To really get nostalgic, why not hook up your computer to the TV and recreate that living room experience.

Other Software collections you may find rewarding

MS-DOS Games

MS-DOS (short for Microsoft Disk Operating System) was the main operating system used by IBM compatible PCs from the 1980s up to the mid-90s.  Here’s your chance to relive some of the games from that period.

Many of these are also emulated, though unlike the console games (which use the JSMESS emulator), these employ DOSBox.  Should you wish to download any of the old games and run them on your own computer, DOSBox is available to download here.

https://www.dosbox.com/

Be aware, although lots of people do this and have no problems whatsoever, the same risks apply as with any other software you download.  Emulated programs are not isolated from your computer.  You could always run DOSBox inside a virtual machine, but that is beyond the scope of this feature.

TOSEC

TOSEC: The Old School Emulation Center is the work of the TOSEC Project, which can be found here.

https://www.tosecdev.org/

As the website states, this “is a retrocomputing initiative dedicated to the cataloguing and preservation of software, firmware and resources for arcade machines, microcomputers, minicomputers and video game consoles”.

In some ways its goals are similar to those of the Internet Archive itself, but this site focuses on the area of retrocomputing only, whereas the Internet Archive aims to do this (and more) for the entire internet!  Still, it’s an impressive project and well worth a look if you’re into the computing of yesteryear.

Vintage Software

The aim of the Vintage Software collection is to preserve historical software.  Lots of these programs ran on hardware that no longer exists.  The information gleaned here can be useful for both educational and historical purposes.

ZX Spectrum

Sinclair’s ZX Spectrum: that little rubber-keyed marvel that lit a fire under home computing, and (along with the Commodore 64) introduced the youth of the 1980s to the wonders of 8-bit computing.  Though primarily a games-focused machine, it produced a generation of bedroom coders that would go on to work in the field of PC programming.

It took nearly thirty years for another device to come along that would stir up such inspiration in youngsters and get them coding again (that’s the Raspberry Pi, if you hadn’t guessed – click here to get your fill of Pi).

In the ZX Spectrum collection you can relive your youth, playing such classics as Manic Miner and Daley Thompson’s Decathlon.  If you haven’t revisited these in thirty plus years, prepare yourself for just how bad the graphics were – it’s amazing what those rose-tinted Speccytacles (sorry, couldn’t resist) can do!  Even so, the games haven’t lost any of their addictiveness – so get ready to lose a few hours in the pixelated past.

Tucows Software Library

According to the Internet Archive, “The Tucows Software Library is the largest freeware/shareware library on the internet” (with over 40,000 titles), so it’s perhaps no surprise that they’ve included a collection for it.

I’m not keen on how, when you click on a (hyper)link that looks as though it will take you to the original website, it actually directs you to the Internet Archive’s search for the software.  However, as a lot of this software has been discontinued, I can understand why they probably did this – I just wish they’d worded the links differently.

All Software

As with most of the Internet Archive’s categories, you have the option to view All Software and – with over a million programs available – can search to your heart’s content.  The website tells us that “The Internet Archive Software Collection is the largest vintage and historical software library in the world”.

Thanks to emulation, many of the games can be enjoyed in your web browser (you’ll usually see Stream Only rather than Download in the description).  However, the various collections offer an overwhelming number of downloads.  Always consider the risks before clicking on these, though it may be worth checking the Reviews (if others have used the software).

Now that you know the wealth of choice that awaits, head over to the Internet Archive’s Software section and get ready for some retro-gaming action.

Images

The last of the Internet Archive’s categories is Images.  Here, two museums take pride of place.

Metropolitan Museum

The Metropolitan Museum of Art (New York) collection is home to thousands of images.  Each can be viewed in a window or full size.  You also have the option to download them.  Details can be found below each image regarding the work of art.  Among the collection, the gallery contains lots of pieces depicting local landmarks.

Brooklyn Museum

The Brooklyn Museum collection, though not as substantial, has a wide range of artwork.  It, too, contains artwork depicting the area.

Further image collections of note

Cover Art Archive

The Cover Art Archive is a joint project between the Internet Archive and MusicBrainz (the latter of which you can find at the link below).

https://musicbrainz.org/

The goal is to make cover art (CD) images available to all.  You can download a slideshow of over 9000 covers directly from the About tab (note, it’s a VERY large download)!

Flickr Commons Archive

We are told that, “The key goals of The Commons on Flickr are to firstly show you hidden treasures in the world’s public photography archives, and secondly to show how your input and knowledge can help make these collections even richer”.  As you’d rightly suspect, the Flickr Commons Archive photos are taken from the Flickr website, and grouped into various collections.

USGS Maps

USGS (United States Geological Survey) Maps may be of interest for any budding geologists from the US (or any that are planning to visit).  Lots of states are covered and good quality TIFF files are available for download.

NASA Images

Despite the title, NASA Images not only contains thousands of images, but videos and audio too.  The collection is grouped into lots of sub-collections and can be manually searched as well.  Content can be watched or downloaded, so start stargazing through your screen today.

All Image

An archive of over three million images.  Select All Image to browse the entire collection.

Hints and Tips

We may have come to the end of the Internet Archive’s categories, but let’s draw attention to a few useful features.

Reviews

When viewing an item in most of the categories, there are often Reviews.  These are written by other members (yes, you do need to have an account to add a review) and, depending on the item in question, can be helpful to read.  If there isn’t one already, you could always write your own to benefit other people using the website.  You can also leave a rating (out of five stars).

Forum

Several of the Internet Archive’s categories have an archive Forum.  If you want to post any comments or ask any questions, this is the place to do it.

About

If you want to read more about any of the collections, click the About tab.

Become a contributor

Anyone with an account can upload media; be it books, audio, images, video files… any type of file that fits within a category.  The Internet Archive will provide free storage and share it on their website.  When you upload items you automatically become the admin for those items.  As admin, you can edit and delete them (as required).

Important. You need to have the legal right to share the file(s).  E.g. It’s something you created.

While anyone can use the Internet Archive without an account (though some services may be restricted), without doubt the best way to make full use of it is to create a free account.  Get started here.

https://archive.org/account/signup

Blog

Keep an eye on the Internet Archive’s Blog.  Besides bringing new collections to your attention, and other things of interest, it also goes behind the scenes so you can get to know the team behind the project.

Internet Archive Store

The project is open to donations, but another way to support them is to purchase merchandise through their online store.

https://store.archive.org/

Help

When all else fails, there’s a Help page with articles covering a range of key topics.

A final word

They say there’s a pot of gold at the end of the rainbow.  However, with the Internet Archive, you don’t have to go that far; pure internet gold is only a mouse click, or screen tap, away.  A one stop (free) shop amassing over twenty years of web history.  The project works with more than four hundred and fifty libraries, as well as other partners (some of whom we’ve looked at in this post).

It’s a shame that the Internet Archive team don’t produce their own mobile app (though apps from third parties do exist) and rely simply on the humble web browser.  Still, the website remains a collector’s dream of what might otherwise be forgotten classics.

So, pay them a visit.  Unlike the movie, The Sixth Sense, hopefully you won’t see actual dead people, but you’ll certainly dig up more than a few long since departed websites! (I couldn’t finish without squeezing one last reference to Mr Willis in).