
Included excerpt: "Batch ingest general instructions" (from page "Batch ingest simple assets")


Included excerpt: "datastreams generator script" (from page "DAMS datastreams.txt generator")


The tiered ingest batch module uses filenames to identify the files that correspond to specific datastreams. Place all of the files you are ingesting as one asset in a single directory, which must be a sub-directory of the path you identify in the queue form. Each sub-directory corresponds to one asset and must contain, at a minimum, the "key datastreams" file (datastreams.txt). This file lists each datastream ID with its corresponding filename, for instance the MODS datastream (MODS.xml), the OBJ datastream (e.g. filename.tif for a large image), or other datastreams, such as derivatives.

...

Sample directory structure:

utlarch
    batch1
        set1
            datastreams.txt
            primaryfile.tif
            anyarbitraryderivativefile.ext
            anyarbitrarycomponentfile.ext
            anymediaphotographfile.ext
            metadata.xml
        set2
            datastreams.txt
            primaryfile.tif
            anyarbitraryderivativefile.ext
            anyarbitrarycomponentfile.ext
            anymediaphotographfile.ext
            metadata.xml

...


Code Block
eid1234_example-batch-submission/ (batch job folder)
├── asset1/
│   ├── datastreams.txt
│   ├── modsfile.xml
│   ├── primaryfile.tif
│   ├── anyarbitraryderivativefile.ext
│   ├── anyarbitrarycomponentfile.ext
│   └── anymediaphotographfile.ext
├── asset2_audio_example/
│   ├── datastreams.txt
│   ├── modsfile.xml
│   ├── audiofile.wav
│   ├── derivative_audiofile_for_streaming.mp4 (e.g. for creating PROXY_MP4 datastream, which is required for streaming audio)
│   └── audio_transcript.txt
└── asset3_video_example/
    ├── datastreams.txt
    ├── modsfile.xml
    ├── videofile.mp4
    ├── video_captions.vtt
    ├── video_transcript.txt
    └── page02_custom_ocr.txt

Notes:

  • set1 and set2, as shown above, sit under the batch directory; each set represents an individual asset with its datastreams.
  • A batch can consist of just one set, but that set must still be nested in its own sub-directory.
  • There is no upper limit on the number of sets/objects or on file size.
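
Before queueing a batch, it can help to confirm that every asset sub-directory contains the required datastreams.txt file. The following is a minimal Python sketch of such a check, not part of the ingest module itself. The function name find_incomplete_assets and the example path are hypothetical, and the script only checks that the file is present, since the internal format of datastreams.txt is not described here.

```python
from pathlib import Path
import tempfile

# The one file every asset sub-directory must contain, per the text above.
REQUIRED_FILE = "datastreams.txt"


def find_incomplete_assets(batch_dir):
    """Return the names of asset sub-directories that lack datastreams.txt.

    Hypothetical helper: only presence of the file is checked, not its contents.
    Loose files sitting directly in the batch directory are ignored.
    """
    missing = []
    for entry in sorted(Path(batch_dir).iterdir()):
        if entry.is_dir() and not (entry / REQUIRED_FILE).is_file():
            missing.append(entry.name)
    return missing


if __name__ == "__main__":
    # Build a throwaway batch folder to demonstrate the check.
    with tempfile.TemporaryDirectory() as tmp:
        batch = Path(tmp)
        (batch / "set1").mkdir()
        (batch / "set1" / REQUIRED_FILE).write_text("")
        (batch / "set2").mkdir()  # deliberately left without datastreams.txt
        for name in find_incomplete_assets(batch):
            print(f"{name}: missing {REQUIRED_FILE}")
```

To check a real batch, call find_incomplete_assets with the path to your batch job folder; an empty list means every asset sub-directory has its datastreams.txt.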