The Internet Archive isn’t just old websites

Massive storage and magazine scanning are part of it, too.

In the 1990s, Kathleen Maher ran Cadence magazine and, as a result, had boxes holding a decade’s worth of issues. Cadence chronicled the CAD market’s shift to the PC: all the new products, company introductions, mergers and acquisitions, and failures. It is a significant slice of the history of the CAD industry’s development. But seven big boxes of magazines take up a lot of space, so we went looking for a home for them.

The Internet Archive (IA) is probably best known for the Wayback Machine. Brewster Kahle and Bruce Gilliat developed the Wayback Machine to provide “universal access to all knowledge” by preserving archived copies of defunct Web pages. The ambitious and seemingly impossible project was launched on May 10, 1996, and became available to the public in 2001.

The Internet Archive provides free public access to collections of digitized materials, including websites, software applications/games, music, movies/videos, moving images, and millions of books. In addition to its archiving function, the Internet Archive is an activist organization, advocating a free and open Internet. As of September 10, 2022, the IA held more than 35 million books and texts; 8.5 million movies, videos, and TV shows; 894,000 software programs; 14 million audio files; 4.4 million images; 2.4 million TV clips; 241,000 concerts; and over 734 billion Web pages in the Wayback Machine.

The IA has been operating in a large, stately white building—the former Christian Science Church in San Francisco’s Richmond District (where Geary Boulevard meets Park Presidio)—since 2009.

An older photo of the pews—notice the statues to the right.

I visited it to drop off the magazines and was surprised to see the original Wayback Machine as well as one of the four current Wayback Machines. The original machine with four screens now sits to the left on the stage.

In 1997, the entire Internet could be stored in 2TB.

If you stand with your back to the original Wayback Machine, you will see small statues, about 5 feet tall, along the far left and right walls. They depict all the people who have worked at or are currently working at the IA.

Created by ceramic artist Nuala Creed, the portraits depict staff of the Internet Archive and were commissioned by its founder, the engineer and Web activist Brewster Kahle.

Up at the top of the pews, where the now blocked-off entrance would be, sit the servers of the current Wayback Machine, which hold more than 70PB. The IA has mirrored backup systems in Berkeley, California, and other locations.

Every time a light blinks, someone is either uploading something or downloading something from the Internet Archive.

The Wayback Machine gets between 2 million and 3 million visitors a day and holds 745 billion archived Web pages.

Each pair of pages is painstakingly photographed and digitized by hand.

The IA has also been embroiled in controversy. Some organizations (like Scientology) and people (like porn actors) who want to keep their history and behavior secret have sued and petitioned the IA to remove information about them. You can read some of the stories here.

The IA is a not-for-profit 501(c)(3) organization and operates on donations and grants. JPR is, and has been, a contributor. You can contribute here.

Sometimes libraries pay the Internet Archive to digitize their entire collections. Institutions looking to build or enhance their own ongoing Web archiving workflows can subscribe to Archive-It’s end-to-end service. An annual subscription includes access to the IA’s Web-based application for building, managing, preserving, and publishing digital collections, along with an annual data storage budget and access to a robust help center.

Other archive systems can be found here.