Plan Before You Scan: Considerations for launching a digital archive project
The American Center of Oriental Research (ACOR)’s new photo archive project aims to bring a collection of 30,000 photos documenting the archaeological and cultural heritage of the Middle East to the public eye. But where do you even begin to digitize 30,000 slides, negatives, and prints?
ACOR is one of the many diverse libraries that use Softlink’s Liberty library management system. In 2016 ACOR was awarded funding from the U.S. Department of Education through the competitive American Overseas Research Centers grant program, which will support the ACOR Library Photographic Archive. As a guest blogger for Softlink, Corrie Commisso, Senior Archival Consultant for ACOR, shares her experiences with launching the digital archive project.
by Corrie Commisso
When I agreed to join the American Center of Oriental Research (ACOR) in Amman, Jordan, as the first archivist to tackle a new photo archive project, I anticipated two things: hot weather and cool photos.
I wasn’t wrong on either count. Amman is, indeed, very hot in the summer. And the Center’s historical photo collection — more than 100,000 images documenting the archaeological and cultural heritage of Jordan and other countries throughout the Middle East — is incredible, and incredibly valuable. The digital archive project is making these photos publicly and freely available to researchers for the first time.
While it was tempting to spend all of my time investigating the fascinating people, places, and events captured in the boxes of 35mm slides and black and white prints, with just twelve weeks to get the digital archive project off the ground, I knew that the bulk of my work would be legwork: making sure the project was set up to be successful.
So what does it take to start a digital archive from scratch? A lot, as we’ve discovered. Before you scan that first image or document, think through these core considerations for launching a digital archive project.
1. Maintain, maintain, maintain.
There’s no such thing as a set-it-and-forget-it solution for digital archiving; it’s not a self-cleaning oven. Platforms and file formats become obsolete, and better technology and solutions become available. Information technologies have a life span of about 18 months, so you need to be prepared to maintain and update your technology regularly. That means going into the project with the understanding that the time and money you invest up front is just the beginning. If your project is dependent on grant funding, for example, what happens a few years down the road when those funds are gone? How will you handle ongoing maintenance costs so that the work you have put into the project won’t be wasted?
2. Assess and prioritize.
Digitizing a physical collection takes time, unless you have boatloads of money at your disposal and can buy lots of equipment and hire lots of project staff. Just to give you an idea, while ACOR’s photo collection includes more than 100,000 photos, our goal is to have 30,000 digitized in four years. That’s less than a third of the collection. And that’s with a full-time archivist and a full-time scanning tech working every day on the collection.
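As a rough back-of-the-envelope check on a goal like this, here is a minimal Python sketch. The 250 working days per year is an assumption for illustration, not an ACOR figure, and the function name is hypothetical.

```python
def scans_per_working_day(target_images: int, years: float,
                          working_days_per_year: int = 250) -> float:
    """Average number of images to digitize per working day to hit the target.

    working_days_per_year is an assumed value; adjust it for local
    holidays, leave, and time spent on non-scanning tasks.
    """
    return target_images / (years * working_days_per_year)


# Goal from the project: 30,000 images in four years.
rate = scans_per_working_day(30_000, 4)
print(f"About {rate:.0f} images per working day")
```

Under these assumptions the target works out to about 30 images per working day, and that has to cover cleaning, scanning, quality checks, and metadata entry for every image.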
Assessing your collection, prioritizing what materials to digitize, and documenting your priorities is critical. What collections are frequently accessed? Do you have rare collections where access to originals is limited due to physical preservation concerns? Do you have time-based media — VHS tapes, audio cassettes — that are becoming obsolete? These are all high priority collections for digitization.
The other type of assessment to consider is an assessment of the purpose of your digitization project. Is the purpose to make the digital copies accessible to the public? Or will they be kept private and made available on request only? This will make a difference in what types of files you create down the road and how you manage them.
3. Know your rights.
Your Intellectual Property (IP) rights, that is. Before you make any digital files public, make sure you have the rights to the material you’re digitizing. Most of the time, IP rights are detailed in the Deed of Gift or accession record that comes with a collection. But, as we’ve discovered at ACOR, collections (particularly collections that have been sitting around for a long time) may not always have a Deed of Gift. There are several old collections for which we are having new paperwork drafted and signed. Whether IP information is documented in a Deed of Gift or another document, the important thing is that it’s documented, as it may affect how you use and share the images. For example, the Deed of Gift for one of our collections stipulates that we can only share low-resolution images on the web and they must be watermarked.
4. Choose file storage wisely.
Space is a constant issue for libraries and archives. Just as physical collections need to be stored, so do digital files. And they add up quickly, especially if you’re scanning high resolution images or converting lots of video. The good news is that digital storage options are typically a lot more flexible than “I can’t find shelf space for this box of 35mm Kodachrome slides!”
You should consider options for both primary and backup storage. If you’ve ever had a computer stolen or damaged before you could back up your information, or forgotten to save a file you’re working on, then you know why backups are worth their weight in gold. The more you have, the better protected you are against data loss.
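To get a feel for how quickly files add up, the sketch below estimates the uncompressed size of a scan from its physical dimensions and scanning resolution. Every number in it is an assumption for illustration (a 36 x 24 mm slide frame, 4000 dpi, 24-bit RGB, one master file per image); none of them are ACOR’s actual settings, and real files may be smaller with compression or larger at higher bit depths.

```python
MM_PER_INCH = 25.4


def uncompressed_scan_bytes(width_mm: float, height_mm: float, dpi: int,
                            channels: int = 3, bits_per_channel: int = 8) -> int:
    """Approximate size in bytes of an uncompressed scan (pixel data only)."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px * channels * bits_per_channel // 8


# Assumed example: a 35mm slide frame (36 x 24 mm) scanned at 4000 dpi in RGB.
per_scan = uncompressed_scan_bytes(36, 24, 4000)
total = per_scan * 30_000  # one master file for each of 30,000 images

print(f"Per scan: {per_scan / 1e6:.0f} MB")
print(f"30,000 scans: {total / 1e12:.2f} TB")
```

With these assumed settings a single master file comes to roughly 64 MB, and 30,000 of them to just under 2 TB, before counting access copies or backups. Each backup copy multiplies that total again.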
There are many options for storage, from private servers to external hard drives to cloud storage. I highly recommend considering cloud storage for backup, at least, as it is relatively inexpensive and the technical platforms are updated regularly. Amazon, for example, provides multiple types of file storage at an affordable rate depending on whether you need to access the files regularly or only on occasion.
Whatever you choose, assume that your storage format is going to change at some point in the near future, and be prepared to migrate it (see #1). In the past twelve weeks, I have seen critical data stored on an ancient external hard drive roughly the size of a small child that required it to be plugged into both a computer and a power outlet to operate, as well as an archive storing all of its backup files across hundreds of DVDs. I am not making this up. Both of these scenarios are a migration nightmare because hardware maintenance was neglected.
5. What’s in a name?
Not only is developing a file naming convention essential for organizing your digital files, we also found it extremely helpful in organizing physical collections that came to us with little or no organization at all, as it forced us to begin categorizing the content of the collection.
While there are some best practices related to file naming, there is no set of hard rules, as each archive has a different set of needs related to their digital collections. But there are two general approaches to file naming: a descriptive system, where the file name includes specific details about the content of the file; and an opaque system, where the filename is simply a numeric code assigned to a file.
For an archive that does not have a collection management system in place (i.e., is just storing files on a local server in a system of nested folders), the file names become the primary means of determining the content a file contains, and thus more description is necessary. As a result, these file names can be quite lengthy — sometimes more than 50 characters! On the other hand, when an archive is using a collection management system that can search content using other metadata assets, there’s less of a need for description in the file name.
ACOR arrived at a compromise, given that our content management system has very robust metadata capabilities: including some descriptive detail but keeping the file names relatively short. Here’s what our file naming convention looks like:
We have a document that details what each segment represents: The Rami Khouri Collection (RK), the geographic region of North Jordan (01), the subject category of Buildings (01), the media type 35mm Slide (S), and the digital file number 1 (0001*).
(* Here’s a tip: file numbers should always have leading zeroes. So make sure you have enough zeroes to accommodate the number of files — in this case, we could have up to 9,999 files. If the collection were smaller, say, 500 files, we would use 001 as our file number.)
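A convention like this is easy to enforce with a small helper, so that nobody has to type the zero padding by hand. The Python sketch below is only an illustration: the underscore separator, the `.tif` extension, the two-digit region and subject codes, and the function name are all assumptions, since the exact format is documented separately.

```python
def build_file_name(collection: str, region: int, subject: int, media: str,
                    number: int, *, number_width: int = 4,
                    sep: str = "_", ext: str = ".tif") -> str:
    """Assemble a file name from its segments, zero-padding the numeric parts.

    sep and ext are assumed values for this sketch, not a documented standard.
    """
    if not 1 <= number < 10 ** number_width:
        raise ValueError(
            f"file number {number} does not fit in {number_width} digits"
        )
    segments = [
        collection,                      # e.g. RK for the Rami Khouri Collection
        f"{region:02d}",                 # geographic region code
        f"{subject:02d}",                # subject category code
        media,                           # media type, e.g. S for 35mm slide
        f"{number:0{number_width}d}",    # file number with leading zeroes
    ]
    return sep.join(segments) + ext


print(build_file_name("RK", 1, 1, "S", 1))

# Leading zeroes keep files in numeric order when sorted as text.
print(sorted(["2", "10"]))      # unpadded: "10" sorts before "2"
print(sorted(["0002", "0010"]))  # padded: order matches the numbers
```

The width check is the code equivalent of the tip above: with four digits the helper accepts file numbers up to 9,999 and refuses anything larger instead of silently producing a longer name.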
There are good reasons for keeping digital file names as short as possible. Not all systems and software, for example, handle long filenames well. We can’t predict where our files will end up — especially those we are making accessible to the public — and our content is useless if users can’t access it due to a poorly formatted file name. In addition, longer file names leave more room for user error when entering the file name. A single mistyped number or letter can create hours of frustration for both archivists and researchers. Just ask the guy who keeps mistyping a password with the number 0 as an “O.”
Don’t forget to make sure your physical files are labeled to match your digital files to make it easier to find your originals!
6. Find a DAM solution
The platform you choose for Digital Asset Management (DAM), or collection management, is one of the most critical decisions you will need to make in establishing your digital archive, and one that will influence many other aspects of your project: your storage choices, how you collect your metadata, how you name your files.
There are options to fit every budget, and they all have their pros and cons. The important thing in choosing the right system is to know what your requirements are. How much can you spend? How much technical expertise do you have available to manage a system? What do you want your DAM solution to do — provide public access to your collections, or just manage them internally? It’s helpful to draw up a list of your key project requirements. Here’s the list ACOR used as criteria for choosing a DAM platform:
- Ability to handle both standard (e.g., Library of Congress, Dublin Core) and custom (e.g., Royal Geographic Place Name) metadata fields
- A web-based user interface that is customizable and visually pleasing, with robust search capabilities
- Ability to manage multiple media types (e.g., photo, video, PDF, animation)
- Accommodation of multiple languages (e.g., Arabic)
- Available technical support
- Ability to download/migrate any data entered into the system in the event we need to change systems
Look at all of that beautiful metadata! This is a peek at what our DAM system looks like behind the scenes. We’re hoping to get our publicly searchable site launched within the next month.
7. Be tech-savvy.
Lastly, don’t digitize anything until you’ve established technical standards for your project — what size and resolution you should be scanning at, what kind of equipment you need, etc. (Already started scanning? That’s ok: see #1.)
There are plenty of good resources online where you can find suggestions and standards for digitization, including the Library of Congress, the National Archives and Records Administration, and the Federal Agencies Digital Guidelines Initiative.
Documenting the standards you will use for your project will ensure technical consistency across all of your files. One thing you should definitely include in your technical workflow is spot checking the quality of your scans. We discovered through our spot checking that our scanner was randomly distorting areas with deep shadows and dark colors, and we had to spend some time troubleshooting our equipment.
Looks like there’s a technical glitch with our scanner. Fortunately, we caught it by spot-checking our digital files before it became a real problem in the archive.
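One simple way to build spot checking into a workflow is to pull a random sample of each batch for a person to review. The sketch below is a minimal Python illustration, not ACOR’s actual procedure: the 5 percent sampling rate, the `*.tif` pattern, the folder name in the comment, and the function name are all assumptions.

```python
import random
from pathlib import Path


def pick_spot_check_sample(scan_dir, fraction=0.05, seed=None, pattern="*.tif"):
    """Return a random sample of scan files for manual quality review.

    fraction is the assumed share of files to review; at least one file
    is returned whenever the folder contains any matching scans.
    """
    files = sorted(Path(scan_dir).glob(pattern))
    if not files:
        return []
    sample_size = max(1, round(len(files) * fraction))
    # A seeded generator makes the sample reproducible for record keeping.
    rng = random.Random(seed)
    return sorted(rng.sample(files, sample_size))


# Example use (hypothetical folder name):
# for path in pick_spot_check_sample("scans/batch_01", fraction=0.05, seed=1):
#     print(path)
```

A random sample will not catch every bad scan, but an intermittent fault that affects many files is likely to turn up in a sample sooner or later. Recording the seed alongside the batch lets someone else reproduce exactly which files were checked.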
In terms of digitizing your collections, you can either purchase equipment and do the work in house, or work with a vendor. It seems that digitization of video files tends to be hired out, as the conversion equipment is expensive and the process is extremely time consuming; however, if you have a large collection, it may be worth setting up a transfer station in house. If you do work with a vendor, be sure to clarify your technical standards and expectations before beginning work.
Digitization of documents and photographs can be managed easily with the purchase of some basic equipment. You don’t need to purchase an extravagant wet mount film scanner to get a good scan, but don’t cut corners on your equipment, either. Epson brand scanners, combined with SilverFast scanning software, seem to be used widely in archival settings, and ACOR just purchased two Epson V550 scanners, a model several years old but perfectly adequate for meeting our technical requirements (and a lot less expensive than the latest model, allowing us to purchase two instead of one!).
Getting a digital archive project started well can mean the difference between a useful, functional, well-organized archive and a digital disaster. If you jump straight to creating digital files without considering some of these key factors, you’ll waste a lot of time having to go back and undo/redo things.
So invest the time up front to get it right.
And once you’ve got your file naming convention down, your storage and DAM systems set up, your technical requirements outlined, and your team on board with routine maintenance tasks — then you can start digitizing. (Or re-digitizing, as the case may be.)
Corrie is the Senior Archival Consultant for the American Center of Oriental Research (ACOR) in Amman, Jordan. She is completing her Master of Library and Information Science at the University of Illinois, Urbana-Champaign, specializing in archives and special collections. Her research interests include cultural heritage preservation, digital archives, and conservation of book and paper materials, and she has studied book history and book arts at the University of London Rare Books School. Corrie has nearly 20 years of professional experience as a writer, graphic designer, and creative director, and she currently resides full-time in Dakar, Senegal, where she freelances as a communications expert for the non-profit/NGO field.