5. Preservation record-keeping
Once files have been bagged, they must be logged and renamed before they can be written to the tape archive.
Assign a SIP number and log the bag in the SIPs spreadsheet
The SIPs spreadsheet records all files that pass through Digital Stewardship on their way to the tape archive. It is stored on the dps disk1 volume, and read-write permissions are restricted by LITS Root Squad to prevent accidental editing.
Each bag must be assigned a unique SIP (submission information package) number, which consists of the year the materials were received or processed as well as a four-digit number, separated by an underscore. It is typically a good idea to assign a SIP number and log the file in the SIPs spreadsheet prior to bagging, to "claim" the number and avoid conflict with other materials being handled by Digital Stewardship staff.
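The numbering format described above can be sketched in Python. This is only an illustration: the function names are hypothetical, and the SIPs spreadsheet remains the authoritative record of which numbers are already in use.

```python
import re

# A SIP number is a four-digit year, an underscore, and a four-digit sequence.
SIP_PATTERN = re.compile(r"^(\d{4})_(\d{4})$")


def format_sip_number(year: int, sequence: int) -> str:
    """Build a SIP number such as '2022_0904' from a year and a sequence."""
    if not 1 <= sequence <= 9999:
        raise ValueError("sequence must fit in four digits")
    return f"{year:04d}_{sequence:04d}"


def next_sip_number(year: int, existing: list[str]) -> str:
    """Return the next unused SIP number for `year`.

    `existing` stands in for the SIP numbers already logged in the
    spreadsheet (hypothetical input); entries that do not match the
    SIP pattern are ignored.
    """
    used = []
    for value in existing:
        match = SIP_PATTERN.match(value.strip())
        if match and int(match.group(1)) == year:
            used.append(int(match.group(2)))
    return format_sip_number(year, max(used, default=0) + 1)
```

For instance, `next_sip_number(2022, ["2022_0902", "2022_0903"])` would yield `"2022_0904"`, which can then be logged in the spreadsheet to claim it before bagging.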
Bags can be named according to their SIP number, but this is not required. If a bag is not named according to its SIP number, it must be placed within a folder named according to the SIP number.
For example, a set of files from the "David Bliss Papers" collection might be assigned SIP number "2022_0904", as the 904th item assigned a SIP number that year. The files can be bagged in a folder called "2022_0904", with a payload containing "2022_0904_files" and "2022_0904_files_metadata" directories. The files can also be bagged as "bliss_david_papers", with a payload containing "bliss_david_papers_files" and "bliss_david_papers_files_metadata" directories, in which case the entire bag must be placed within a folder named "2022_0904". This is to ensure that searching the tape manifests will return the desired files. See /wiki/spaces/utldigitalstewardship/pages/43057645 for more information about searching the tape archive.
In the SIPs spreadsheet, enter a brief description of the bag, including the collection and series name (if applicable). Be sure to record this information exactly how collections managers are likely to ask after the files, if they need to request a restore: avoid typos and abbreviations, and clearly state if the bag is a part of a larger set of files. Record the original format/medium, the Bag Group Identifier (if applicable), the owning location, the External Identifier UUID, any associated identifiers such as OCLC numbers, and the size of the bag as recorded in the bag-info.txt file.
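Several of these values can be read straight from the bag's bag-info.txt file rather than retyped. The sketch below assumes the bag uses the standard BagIt labels Bag-Group-Identifier, External-Identifier, and Bag-Size; the function names are hypothetical, and the output should still be checked against the bag before it goes into the spreadsheet.

```python
from pathlib import Path


def read_bag_info(bag_dir):
    """Parse bag-info.txt into a dict mapping each label to its value.

    Indented lines are treated as continuations of the previous value.
    If a label is repeated, the last value wins (a simplification).
    """
    info = {}
    label = None
    text = (Path(bag_dir) / "bag-info.txt").read_text(encoding="utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and label is not None:
            # Continuation line: append to the value being built.
            info[label] += " " + line.strip()
        elif ":" in line:
            label, value = line.split(":", 1)
            label = label.strip()
            info[label] = value.strip()
    return info


def spreadsheet_fields(bag_dir):
    """Pick out the values that the SIPs spreadsheet asks for."""
    info = read_bag_info(bag_dir)
    wanted = ("Bag-Group-Identifier", "External-Identifier", "Bag-Size")
    return {name: info.get(name, "") for name in wanted}
```

A missing label comes back as an empty string, which leaves the corresponding spreadsheet cell blank when the field does not apply.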
Close the SIPs spreadsheet!
The SIPs spreadsheet can only be edited by one user at a time. Please be sure to save and close it as soon as you are finished entering information about your bags, to allow other staff to edit it as needed.
Next, open the DS_tracking spreadsheet in /dps2/0_processing and enter the same SIP number, descriptive information, and Bag Group Identifier there. The DS_tracking spreadsheet is primarily used for digitization projects, so some of the fields may not apply to bags of born-digital materials, but it's important to update both spreadsheets so that no SIP number is used twice.
Copy the SIP to the write_to_tape folder
Next, copy the complete SIP into the /dps/write_to_tape folder. This folder is where all files bound for the tape archive are staged. See /wiki/spaces/utldigitalstewardship/pages/43057554 for more information on how materials are copied to the tape archive. Larger bags will take a long time to copy, so it's a good idea to use a powerful workstation or virtual machine for this step. It's also a good idea to use a copying program like TeraCopy that will allow you to verify that the files were copied exactly as intended. Finally, it's important to validate the bag after copying it. See the "Bag validation" section of 4. Bagging for more information about validating bags.
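Where a verifying copy tool is not available, the same idea can be sketched with the Python standard library: copy the SIP folder, then compare a checksum of every source file with its copy. The function names are hypothetical and SHA-256 is an arbitrary choice here. This check only confirms that the copy matches the source; it does not replace validating the bag itself.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in chunks so that large assets are not loaded whole."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, staging_dir):
    """Copy `source` into `staging_dir` and return any files that differ.

    An empty list means every source file has an identical copy.
    shutil.copytree raises an error if the destination already exists,
    which guards against overwriting a SIP that is already staged.
    """
    source = Path(source)
    destination = Path(staging_dir) / source.name
    shutil.copytree(source, destination)
    mismatches = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src_file.relative_to(source)
        dst_file = destination / relative
        if not dst_file.is_file() or file_digest(src_file) != file_digest(dst_file):
            mismatches.append(relative.as_posix())
    return mismatches
```

A call such as `copy_and_verify("2022_0904", "/dps/write_to_tape")` would stage the example SIP and report any files that failed to copy intact.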
Copy the bag text files to the sips folder
Finally, copy the bag to /dps/sips, within the appropriate annual subdirectory, and delete the data directory from this final copy. The /dps/sips directory contains a record of all SIPs sent to the tape archive since SIP numbering began in 2016, with a folder for each SIP containing the bag-info, bagit, manifest, and tagmanifest text files. These files are kept in the /dps/sips directory after the bags themselves have been written to tape and deleted from the volume, so that Digital Stewardship staff can review the contents of materials on tape without needing to restore them in every case. If the _files_metadata folder contains files that would be useful for understanding the collection down the line, it can also be retained in /dps/sips. However, the assets themselves (images, PDFs, audio or video files, etc.) must be deleted to keep the size of /dps/sips from ballooning over time.
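One way to produce this payload-free copy is to skip the data directory during the copy instead of deleting it afterwards. The sketch below makes two assumptions that should be checked against the actual volume: that the annual subdirectories are named with the four-digit year, and that the SIP folder name starts with that year. The function names are hypothetical, and the sketch does not retain a _files_metadata folder, which would need to be copied separately.

```python
import shutil
from pathlib import Path


def _skip_payload(directory, names):
    """Tell copytree to skip 'data' only where it sits beside bagit.txt.

    This covers both a bag named by its SIP number and a bag nested
    inside a folder named by the SIP number.
    """
    if "bagit.txt" in names and "data" in names:
        return {"data"}
    return set()


def copy_bag_record(sip_folder, sips_root):
    """Copy a SIP folder into <sips_root>/<year>/ without its payload."""
    sip_folder = Path(sip_folder)
    year = sip_folder.name[:4]  # assumption: folder is named like 2022_0904
    target = Path(sips_root) / year / sip_folder.name
    shutil.copytree(sip_folder, target, ignore=_skip_payload)
    return target
```

Because the payload is never copied, nothing large is written to /dps/sips even temporarily, which matters for bags that are hundreds of gigabytes in size.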
Once the bags have been created, validated, logged in the SIPs spreadsheet and DS_tracking spreadsheet, and copied to /dps/write_to_tape and /dps/sips, they are ready to be written to tape. See /wiki/spaces/utldigitalstewardship/pages/43057554 for instructions on that process.