Use case: As a DAMS user, I need to ingest a set of digital objects.

What this does

Tiered Ingest allows you to group all of the files corresponding to a simple asset's datastreams (including archival files, publication files, and other derivatives created outside of Islandora) into one Fedora asset with multiple datastreams. This streamlines your workflow and keeps related files stored together in one place.

Solution: We need a specialized batch ingest module for this because the standard Islandora Batch module only allows two files per asset: one .xml file for the MODS datastream and one other file for the OBJ datastream. Tiered Ingest allows you to group all of the files corresponding to an asset's datastreams (with the exception of RELS-EXT) into a sub-directory.

The tiered ingest batch module identifies the file for each datastream by its filename, as listed in the asset's datastreams.txt manifest.

Warning

Derivative files ingested with this method may be overwritten by the DAMS software.


Note

The DAMS software will determine the asset's Content model based on the file type (MIME type) of the primary media file, which is ingested into the OBJ datastream. Particularly with AV content, this can lead to unwanted results (e.g. an audio file being ingested with the Video content model). Some media file formats can be used for different kinds of content. Consult with the DAMS Management team when planning your ingest project.
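
To get an early hint of how a primary media file might be classified, you can check the MIME type that its file extension maps to. The sketch below uses Python's standard `mimetypes` module and is only an approximation: it looks at the extension alone, and the DAMS may detect MIME types differently, so still confirm your plans with the DAMS Management team. The function names and file names are hypothetical.

```python
import mimetypes


def preview_mime_type(filename: str) -> str:
    """Guess a MIME type from the file extension only; file contents are not read."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type or "unknown"


def media_family(filename: str) -> str:
    """Return the top-level MIME family, e.g. 'audio', 'video', or 'image'."""
    return preview_mime_type(filename).split("/")[0]


if __name__ == "__main__":
    # An audio-only recording saved in an .mp4 container still reports a video/*
    # type, which is one way an audio asset can end up with the Video content model.
    for name in ["interview.wav", "interview.mp4", "scan.tif"]:
        print(f"{name}: {preview_mime_type(name)}")
```

For example, an audio-only `interview.mp4` is reported as `video/mp4`, which illustrates the audio/video ambiguity described in the note.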


Note

This tiered batch ingest method is NOT suitable for paged content (complex/compound assets with children). See Batch ingest complex assets (paged content) for instructions on how to ingest assets comprised of multiple pages.

The tiered ingest allows you to store additional files with a digital asset, and you can use this method to ingest externally created derivative datastreams (e.g. for streaming audio). See Content models for a breakdown of the expected datastreams per content model and for information on which datastreams can be published (e.g. to the Collections Portal).

For general batch ingest instructions, see the "Batch ingest general instructions" section of Batch ingest simple assets.

Staging folder structure

  • All of the files you are ingesting as part of one asset are staged in one directory per asset, as a sub-directory of a batch job folder.
  • Each sub-directory corresponds to one asset and must contain at least a manifest file for the key datastreams (datastreams.txt). The manifest tells the ingest script which datastreams to create, so it must be included with the queued batch.
  • The batch job folder can contain just one asset folder, but the extra level of nesting is still required.


Sample folder structure:
eid1234_example-batch-submission/ (batch job folder)
├── asset1/
│   ├── datastreams.txt
│   ├── modsfile.xml
│   ├── primaryfile.tif
│   ├── anyarbitraryderivativefile.ext
│   ├── anyarbitrarycomponentfile.ext
│   └── anymediaphotographfile.ext
├── asset2_audio_example/
│   ├── datastreams.txt
│   ├── modsfile.xml
│   ├── audiofile.wav
│   ├── derivative_audiofile_for_streaming.mp4 (e.g. for creating the PROXY_MP4 datastream, which is required for streaming audio)
│   └── audio_transcript.txt
└── asset3_video_example/
    ├── datastreams.txt
    ├── modsfile.xml
    ├── videofile.mp4
    ├── video_captions.vtt
    └── video_transcript.txt
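
Before uploading, you can check the staging layout with a short script. This is a hypothetical helper, not part of the DAMS tooling; it assumes the batch job folder should contain only asset sub-directories and that every asset sub-directory must contain a datastreams.txt manifest.

```python
from pathlib import Path

MANIFEST_NAME = "datastreams.txt"


def check_batch_folder(batch_dir: str) -> list[str]:
    """Return human-readable problems found in a batch job folder's layout."""
    problems: list[str] = []
    entries = sorted(Path(batch_dir).iterdir())
    asset_dirs = [p for p in entries if p.is_dir()]

    # Assumption: files sitting directly in the batch job folder belong to no asset.
    for p in entries:
        if p.is_file():
            problems.append(f"loose file at batch level: {p.name}")

    # Even a single asset needs its own sub-directory (the extra nesting).
    if not asset_dirs:
        problems.append("no asset sub-directories found")

    # Every asset sub-directory needs its own manifest.
    for asset in asset_dirs:
        if not (asset / MANIFEST_NAME).is_file():
            problems.append(f"{asset.name}: missing {MANIFEST_NAME}")
    return problems
```

Call `check_batch_folder("eid1234_example-batch-submission")` and print each returned problem. Hidden files such as `.DS_Store` at the top level are also reported as loose files.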

Step 2: Create datastreams.txt manifest

Subdirectories in the batch job folder MUST each contain a manifest file named datastreams.txt. The manifest file specifies the intended structure of the DAMS asset, for instance pointing to the MODS XML containing the metadata for the asset, or specifying which additional datastreams should be created from staged files.

Each line of the manifest file contains an argument-value pair in the following format:

<ARGUMENT>==<VALUE> 

Use 2 (two) equal signs to separate arguments and values.
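
A minimal sketch of reading this format in Python, for example to check a manifest before upload. It is not the ingest module's own parser; it assumes that blank lines and lines starting with `#` (as in the sample manifests on this page) are ignored, and that the value is everything after the first `==`.

```python
def parse_manifest(text: str) -> dict[str, str]:
    """Parse datastreams.txt content into {datastream ID: filename}."""
    entries: dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Assumption: blank lines and '#' comment lines are skipped.
        if not line or line.startswith("#"):
            continue
        if "==" not in line:
            raise ValueError(f"line {lineno}: expected <ARGUMENT>==<VALUE>, got {raw!r}")
        # Split on the first '==' only: argument on the left, filename on the right.
        argument, value = (part.strip() for part in line.split("==", 1))
        if not argument or not value:
            raise ValueError(f"line {lineno}: empty argument or value")
        if argument in entries:
            raise ValueError(f"line {lineno}: duplicate argument {argument}")
        entries[argument] = value
    return entries
```

A line with a single equal sign, such as `OBJ=primaryfile.tif`, raises `ValueError`.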

Manifest Arguments

Refer to Anatomy of DAMS digital assets and Content models for a list of allowed/expected datastreams per content model. Consult with the DAMS Management Team for use cases not covered by the datastreams listed in this documentation.

Warning

DO NOT use any of the Restricted Datastream IDs.
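
The rules above can be checked per asset before upload. This minimal sketch takes a manifest already parsed into a dict of datastream ID to filename and reports problems. Assumptions: OBJ is treated as required (every sample manifest on this page includes it), every referenced file must exist in the asset sub-directory, and `RESTRICTED_IDS` is a placeholder containing only RELS-EXT, the one ID this page names as excluded. Complete it from the Restricted Datastream IDs list before relying on it.

```python
from pathlib import Path

# Placeholder, NOT the authoritative list: only RELS-EXT is named on this page.
RESTRICTED_IDS = {"RELS-EXT"}


def validate_manifest(asset_dir: str, entries: dict[str, str]) -> list[str]:
    """Return problems found for one asset's parsed manifest entries."""
    problems: list[str] = []
    # Assumption: OBJ (the primary media file) is required.
    if "OBJ" not in entries:
        problems.append("no OBJ entry (primary media file)")
    for ds_id, filename in entries.items():
        if ds_id.upper() in RESTRICTED_IDS:
            problems.append(f"{ds_id}: restricted datastream ID")
        # Each referenced file must be staged in the asset sub-directory.
        if not (Path(asset_dir) / filename).is_file():
            problems.append(f"{ds_id}: file not found: {filename}")
    return problems
```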

Manifest generator script

See the "datastreams generator script" section of DAMS datastreams.txt generator.
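
For real batches, use the DAMS datastreams.txt generator script referenced above. Purely to illustrate the numbering conventions in the sample manifests below (COMPONENT IDs are numbered starting at 1; a single media photograph uses MEDIAPHOTOGRAPH, while several use MEDIAPHOTOGRAPH1, MEDIAPHOTOGRAPH2, and so on), here is a hypothetical sketch. `build_manifest` and its parameters are assumptions, not the generator's interface.

```python
from typing import Optional


def build_manifest(obj: str,
                   mods: Optional[str] = None,
                   components: Optional[list[str]] = None,
                   media_photographs: Optional[list[str]] = None) -> str:
    """Hypothetical helper: build datastreams.txt text from staged filenames."""
    lines = [f"OBJ=={obj}"]
    if mods:
        lines.append(f"MODS=={mods}")
    # Component files are always numbered, starting at 1.
    for i, name in enumerate(components or [], start=1):
        lines.append(f"COMPONENT{i}=={name}")
    photos = media_photographs or []
    if len(photos) == 1:
        # A single media photograph uses the unnumbered ID.
        lines.append(f"MEDIAPHOTOGRAPH=={photos[0]}")
    else:
        # Several media photographs are numbered MEDIAPHOTOGRAPH1, MEDIAPHOTOGRAPH2, ...
        for i, name in enumerate(photos, start=1):
            lines.append(f"MEDIAPHOTOGRAPH{i}=={name}")
    return "\n".join(lines) + "\n"
```

To stage the result, write it into the asset sub-directory, e.g. `Path(asset_dir, "datastreams.txt").write_text(text)` with `pathlib.Path`.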

Sample manifests

Sample generic datastreams.txt manifest file:
OBJ==primaryfile.ext
MODS==metadata.xml
# optional, if no MODS file is included, minimal metadata is automatically generated during ingest
PDF==custom.pdf
# optional
ARCHIVAL_FILE==originalversionof_primaryfile.ext
# optional, use for archival file (e.g. uncropped scan)
COMPONENT1==componentfile1.ext
COMPONENT2==componentfile2.ext
# optional, can for instance be used in cases where a primary image is stitched from multiple component images; increment for additional files in same directory
# DO NOT use for complex objects that can be modeled as paged content or Islandora component assets!
MEDIAPHOTOGRAPH==anymediaphotographfile.ext
# optional, can be used for images documenting physical media, cases, covers, etc.; use MEDIAPHOTOGRAPH if there is only one image
MEDIAPHOTOGRAPH1==anymediaphotographfile1.ext
MEDIAPHOTOGRAPH2==anymediaphotographfile2.ext
# optional, use instead of MEDIAPHOTOGRAPH for multiple images documenting the physical carrier(s), cases, covers, etc.; increment for additional images in same directory


Sample datastreams.txt manifest file for audio content:
OBJ==audiofile.wav
MODS==metadata.xml
# optional, if no MODS file is included, minimal metadata is automatically generated during ingest
TRANSCRIPT==audiotranscript.txt
# Textual representation of linguistic content in audio and video assets. REQUIRED for audio assets to be publishable. Transcripts MUST be in plain text.
PROXY_MP4==audioderivative.mp4
# optional; audio content can be provided as streaming media, which adds a limited technical hurdle against simply downloading a complete MP3 audio file. To deliver audio content as streaming media, externally create an MP4 derivative and ingest it into a datastream labeled PROXY_MP4.


Sample datastreams.txt manifest file for video content:
OBJ==videofile.mpg
MODS==metadata.xml
# optional, if no MODS file is included, minimal metadata is automatically generated during ingest
CAPTIONS==videocaptions.vtt
# Timed textual representation of linguistic content in audio and video assets. REQUIRED for video assets to be publishable. Captions MUST be provided in WebVTT format.
TRANSCRIPT==audiotranscript.txt
# optional; textual representation of linguistic content in audio and video assets. Transcripts MUST be in plain text.
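
The publishability requirements in the audio and video samples can also be checked before upload. A minimal sketch, assuming a manifest parsed into a dict of datastream ID to filename. Audio vs. video is inferred here from the OBJ file extension with Python's `mimetypes` module, which only approximates how the DAMS assigns content models (see the note on content models above); checking the `.vtt` and `.txt` extensions is likewise a stand-in for real format validation.

```python
import mimetypes
from pathlib import PurePath


def publishability_problems(entries: dict[str, str]) -> list[str]:
    """Flag missing or mis-typed TRANSCRIPT (audio) or CAPTIONS (video) entries."""
    # Assumption: classify the asset by the OBJ extension's MIME family.
    mime_type, _ = mimetypes.guess_type(entries.get("OBJ", ""))
    family = (mime_type or "").split("/")[0]
    problems: list[str] = []
    if family == "audio":
        transcript = entries.get("TRANSCRIPT")
        if not transcript:
            problems.append("audio: TRANSCRIPT is required for publishing")
        elif PurePath(transcript).suffix.lower() != ".txt":
            # Assumption: plain-text transcripts use the .txt extension.
            problems.append("audio: TRANSCRIPT should be plain text (.txt)")
    elif family == "video":
        captions = entries.get("CAPTIONS")
        if not captions:
            problems.append("video: CAPTIONS is required for publishing")
        elif PurePath(captions).suffix.lower() != ".vtt":
            # Assumption: WebVTT files use the .vtt extension.
            problems.append("video: CAPTIONS must be WebVTT (.vtt)")
    return problems
```

Note that an audio-only OBJ in an .mp4 container is classified as video by this check, mirroring the ambiguity described in the note on content models.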

Step 3: Upload batch job to Jscape

See the "Batch ingest upload" section of Batch ingest simple assets.

Step 4: Set up collection and submit form in DAMS interface

See the "batch ingest queue" section of Batch ingest simple assets.