Digital Storage Needs

To start a discussion about our group collectively approaching campus digital storage providers (such as TACC and ITS), this page is intended as a place where each of us can state our needs for the storage of archival digital objects, whether those objects are born-digital or surrogates of analog originals.

Please edit the page and include your institution's needs.  Once everyone has provided their input, we can begin to determine what group request we should take to digital storage providers.

Some issues to take into consideration:

  • How much storage do you need now?
  • How much digital material do you estimate you will acquire in 5 years?
  • How readily do you need access to the storage?
  • What security concerns do you have?
  • Taking a risk management perspective, what risk (if any) of data loss are you willing to assume?
  • What functionalities are you looking to come with your storage?

If anyone has any considerations to add, please feel free to add them to this list.  Ideally, contributors to this page could then update their storage needs profiles in response to the new considerations.


Briscoe Center for American History
  • ~10.5 TB of digital content total
    • However, ~3.5 TB is uncompressed digital video, which is currently stored on tape.  Access to the uncompressed video does not need to be fast, making tape a viable option.
  • We estimate that we will generate 1 TB per year over the next 5 years, with at least 500 GB of each year's total being uncompressed video.
  • Networked access to these files would be a welcome bonus, but is not necessary.
  • On the other hand, access to BCAH content should be restricted to BCAH staff.  If the storage is online, it needs to be well-secured against external access.
  • 3 levels of risk:
    • Digital surrogates - willing to accept some risk of data loss
    • Audio-visual (born-digital and surrogates) - willing to accept minimal risk of data loss.  The difficulty of digitizing a/v materials, as well as their extreme fragility and difficulty of access in their analog states, compel me to accept much less risk than for most digital surrogates such as photos and text.
    • Born-digital - since we've established a space in UTDR to handle born-digital materials, BCAH is willing to accept a moderate level of risk.  Any storage of born-digital objects outside of UTDR would be redundant, and therefore recoverable if lost.
  • Some functionalities that are necessary:
    • Fixity checks
    • Back-up mechanisms
    • Password protection against access
  • A functionality that would be nice: automated creation of derivatives
  • A functionality that BCAH does not need: a database or CMS attached to the storage that allows description and management of the content.

School of Architecture Visual Resources Collection (SOA VRC)

contributed by Elizabeth Schaub (eschaub@austin.utexas.edu)

  • How much storage do you need now?
    • Metadata: ~2 GB stored on ITS SQL server; Digital images: ~904 GB stored on a UT Libraries server (1500 GB of data written to tape)
  • How much digital material do you estimate you will acquire in 5 years?
    • We estimate that the digital image collection grows at a rate of 3 GB per year.
  • How readily do you need access to the storage?
    • Networked access to the server where the files are stored is necessary to support our local image production workflow. On-demand access to the images, or a version thereof, via our EID-accessible online image collection is also necessary.
  • What security concerns do you have?
    • Direct access to storage space must be limited to designated SOA VRC staff members and server administrators.
  • Taking a risk management perspective, what risk (if any) of data loss are you willing to assume?
    • Zero tolerance for archival file data loss (TIFF files); minimal tolerance for derivative file data loss as long as there is an automated system in place to recreate derivative files from archival files.
  • What functionalities are you looking to come with your storage?
    • Fixity checks
    • Back-up mechanisms
    • Password protection against access
    • Yet-to-be-determined interoperability functionality that will be identified during the course of selecting a new DAMS.

Tarlton Law Library

contributed by Melanie Cofield (mcofield@law.utexas.edu)

  • How much storage do you need now?
    • ~200 GB of digital archives content comprised of digital surrogates and born-digital materials from Special Collections.
    • Materials are currently stored and backed up variously on CDs and DVDs, an external hard drive, UT network storage and a SharePoint site.
      • Digital collections created for purposes of preservation reformatting
      • Digital images and audiovisual materials created for use in online resources & exhibits
      • Digital archives of UT Law news “clippings” (electronic vertical file)
      • Oral history interviews: original recordings and transcripts, and edited print publication files
      • Born-digital Library event recordings (annual lectures and special events)
      • Select UT law event recordings (digitized and born-digital, extent to be determined)
  • How much digital material do you estimate you will acquire in 5 years?
    • Based on accumulation rates over the last 3 years, we estimate that our digital archives are currently growing at a rate of ~50 GB/year.
  • How readily do you need access to the storage?
    • Networked access to file storage is required to accommodate our local production and e-archiving workflows, as well as patron request fulfillment.  It would be nice to enable designated Tarlton staff to grant limited access to other credentialed individuals when needed.
  • What security concerns do you have?
    • Direct access to storage space must be limited to designated Tarlton staff members and server administrators.
    • If the storage is online, it needs to be well-secured against external access.
  • Taking a risk management perspective, what risk (if any) of data loss are you willing to assume?
    • Archival masters (TIFF) and Audiovisual (born-digital and surrogates): minimal tolerance for data loss, assuming there is a backup mechanism to provide recovery.  Much of our actual and anticipated audiovisual content is born-digital in a compressed format, so our derivative creation options are limited.
    • Access derivatives: moderate tolerance for data loss.
  • What functionalities are you looking to come with your storage?
    • Fixity checks
    • Back-up mechanisms
    • Password protection against access
    • Interoperability with yet-to-be-determined DAMS

______________________________________________________________________________________________________

Harry Ransom Center (born-digital)

contributed by Gabriela Redwine (gredwine@mail.utexas.edu)

  • How much storage do you need now?
    • 10 GB each on 1 preservation store and 2 backup stores, all of which are external hard drives. This includes files from the small portion of born-digital holdings that we've processed. The content of the majority of our digital media has not yet been captured.
  • How much digital material do you estimate you will acquire in 5 years?
    • Most of what we have now is 5.25- and 3.5-inch disks and other "legacy" media and computers--in other words, media with fairly limited storage capacity. Once we begin to receive modern hard drives and flash drives, our storage needs will increase significantly. So even if the volume of digital media decreases (i.e., we receive fewer individual disks, CDs, etc.), the storage capacity of each item will increase.
  • How readily do you need access to the storage?
    • I need weekly access to each store to run my weekly fixity checks. I need on-demand access to one backup store in the event that I need to fill a request from a staff member or patron. And I need on-demand access to all three stores so that I can add materials as I process additional disks. I also need on-demand access to the "mobile storage" (e.g., flash drive) that I use to transfer files from one workstation to another at different stages of the capture and migration processes.
    • At this point, I do not want my storage to be accessible via a network. In the future, only the "grey" store (i.e., the storage from which staff and the DAMS pull files) will be connected to a network.
    • Only I, the head of Tech. Services, and the head of my department (M&A) will have permission to access the preservation store. Other staff will have limited access to the grey store, as will the DAMS. Student workers and interns may also have some level of access in the future, as needed.
  • What security concerns do you have?
    • My security concerns include old computer viruses that might not be recognized by more modern anti-virus software. I also want to keep all preservation copies of born-digital materials (including copies that result from migration) separate from digital surrogates.
  • Taking a risk management perspective, what risk (if any) of data loss are you willing to assume?
    • I am not willing to risk any data loss for our preservation masters of original copies or migrated copies. Minimal tolerance for loss in access copies. The 2nd backup store is currently our oldest external hard drive. The first time my weekly fixity checks reveal a problem with the files on that store, I will transfer everything to a new external hard drive and replace the degraded file(s) with copies from one of the other two stores.
  • What functionalities are you looking to come with your storage?
    • Automating and documenting fixity checks. It takes me 6 hours a week to generate checksums of the three stores for our fixity checks, so having a server with USB ports for all three of the external hard drives and a script (or something) to (1) automate the entire checksum generation/fixity check process AND (2) generate appropriate documentation would save me a lot of time.
    • Also, once I create a package that I want to commit to storage, it would be nice to have something in place that would send that package to all three stores, verify its checksum upon arrival, and document the (successful or unsuccessful) transfer process. Currently I do all of that manually.
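
As a starting point for discussion, the two automation requests above (scheduled fixity checks with documentation, and sending a package to all three stores with checksum verification) could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a description of any tool currently in use: it assumes each store is mounted as an ordinary directory, that a "package" is a single file (for example a zip or tar archive), and that SHA-256 is an acceptable checksum. The function names (sha256_of, build_manifest, check_fixity, send_package) and the tab-separated log format are hypothetical.

```python
import hashlib
import shutil
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(store: Path) -> dict[str, str]:
    """Map each file's path (relative to the store) to its checksum."""
    return {
        p.relative_to(store).as_posix(): sha256_of(p)
        for p in sorted(store.rglob("*"))
        if p.is_file()
    }


def check_fixity(store: Path, manifest: dict[str, str]) -> list[str]:
    """Compare a store against a saved manifest; return a sorted problem list."""
    current = build_manifest(store)
    problems = []
    for name, expected in manifest.items():
        actual = current.get(name)
        if actual is None:
            problems.append(f"MISSING {name}")
        elif actual != expected:
            problems.append(f"CHANGED {name}")
    for name in current.keys() - manifest.keys():
        problems.append(f"UNEXPECTED {name}")
    return sorted(problems)


def send_package(package: Path, stores: list[Path], log: Path) -> bool:
    """Copy one package file to every store, verify each copy, log each result."""
    expected = sha256_of(package)
    all_ok = True
    with log.open("a", encoding="utf-8") as report:
        for store in stores:
            target = store / package.name
            try:
                shutil.copy2(package, target)
                ok = sha256_of(target) == expected
            except OSError:
                # An unreachable or full store is recorded as a failed transfer.
                ok = False
            all_ok = all_ok and ok
            stamp = datetime.now(timezone.utc).isoformat()
            status = "OK" if ok else "FAILED"
            report.write(f"{stamp}\t{target}\t{expected}\t{status}\n")
    return all_ok
```

In this sketch, the manifest returned by build_manifest would be saved somewhere outside the store when material is first committed, and check_fixity would be run against it on a schedule (weekly, in the workflow described above). An empty list means every file still matches its recorded checksum. The log and manifest are kept outside the stores so that they are not themselves flagged as unexpected files.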