Warning

These instructions do not work as written: they still lack the steps for putting the PDF/A definition file and the ICC profile file in place. See https://ghostscript.readthedocs.io/en/latest/VectorDevices.html#creating-a-pdf-a-document

The path to the PS file in step 4 (the Ghostscript command) must be valid on the operating system and computer from which you run the command. The PS file itself must also be edited so that it contains a valid path to an ICC profile.
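
One way to handle the ICC path is to rewrite the profile entry in a local copy of the PS file before running Ghostscript. The sketch below assumes the file contains a line beginning with `/ICCProfile (`, as the sample `PDFA_def.ps` shipped with Ghostscript does, and that the profile path contains no parentheses or backslashes. The function name `set_icc_profile` and the paths in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: point the /ICCProfile entry of a PDF/A definition file at a given ICC profile.
# Assumption: the file has a line starting with "/ICCProfile (" followed by the
# profile path and a closing parenthesis, as in Ghostscript's sample PDFA_def.ps.

set_icc_profile() {
    local def_file="$1" icc_path="$2"
    local escaped tmp

    # Escape the characters that are special in a sed replacement (&, \ and the | delimiter).
    escaped=$(printf '%s' "$icc_path" | sed -e 's/[\\&|]/\\&/g')

    # Rewrite only the text between the parentheses; everything else on the line is kept.
    tmp=$(mktemp) || return 1
    sed -e "s|^/ICCProfile ([^)]*)|/ICCProfile (${escaped})|" "$def_file" > "$tmp" || return 1

    # Copy the result back in place so the original file's permissions are preserved.
    cat "$tmp" > "$def_file" && rm -f "$tmp"
}

# Example usage (hypothetical paths):
#   set_icc_profile ./PDFA_def_UTL.ps /usr/share/color/icc/sRGB.icc
```

After running it, check the edited line by eye (for example with `grep ICCProfile`) and confirm that the profile file actually exists at that path.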

  1. Navigate to the staging directory containing the Production Master (_pm) image files.

  2. Create the temporary subdirectories 'pdf' and 'pdf/pages' inside the staging directory:

    Code Block
    languagebash
    mkdir -p pdf/pages

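    Run Tesseract OCR on all images in the staging directory, writing one PDF per page to 'pdf/pages':

    Code Block
    languagebash
    # ${i%_pm.tif} strips the _pm.tif suffix, so x_pm.tif becomes pdf/pages/x.pdf
    for i in *.tif; do tesseract -c tessedit_page_number=0 -l eng "$i" "pdf/pages/${i%_pm.tif}" pdf; done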
  3. Merge page-level PDF documents into one PDF per volume/issue:

    Code Block
    languagebash
    mutool merge -o combined-pdf.pdf pdf/pages/*.pdf
  4. Produce the final, downsampled PDF document with Ghostscript:

    Code Block
    languagebash
    gs -sDEVICE=pdfwrite \
        -dPDFA=2 \
        -dPDFACompatibilityPolicy=1 \
        -dNOSAFER \
        -dFastWebView \
        -sColorConversionStrategy=RGB \
        -dDownsampleColorImages=true \
        -dColorImageDownsampleThreshold=1.0 \
        -dAutoRotatePages=/None \
        -dColorImageResolution=150 \
        -o downsampled-pdf.pdf \
        /mnt/dps/staff_workspaces/mirko/pdfa/PDFA_def_UTL.ps \
        combined-pdf.pdf
  5. Delete the temporary 'pdf' subdirectory and combined-pdf.pdf.
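
Steps 2 to 5 can be strung together in one script. The following is a minimal sketch, not a tested workflow. The function names `page_base` and `build_pdfa` and the `PDFA_DEF` variable are hypothetical. The PS file is assumed to have been edited already so that it points to a valid ICC profile. The merged file is written to the staging directory, which is where the Ghostscript command looks for it.

```shell
#!/usr/bin/env bash
# Sketch of steps 2-5 as functions; call build_pdfa from the staging directory.

# Hypothetical default location of the edited PDF/A definition file;
# override it by setting PDFA_DEF in the environment.
PDFA_DEF="${PDFA_DEF:-./PDFA_def_UTL.ps}"

# Map an image name to the Tesseract output base, e.g.
# vol1_0001_pm.tif -> pdf/pages/vol1_0001 (Tesseract appends .pdf itself).
page_base() {
    printf 'pdf/pages/%s\n' "${1%_pm.tif}"
}

# The function body runs in a subshell, so "set -eu" stops at the first
# failing command without affecting the calling shell.
build_pdfa() (
    set -eu

    # Step 2: temporary directories, then OCR each image into a one-page PDF.
    mkdir -p pdf/pages
    for i in *.tif; do
        [ -e "$i" ] || continue   # skip the literal pattern when nothing matches
        tesseract -c tessedit_page_number=0 -l eng "$i" "$(page_base "$i")" pdf
    done

    # Step 3: merge the page PDFs (the glob expands in sorted filename order).
    mutool merge -o combined-pdf.pdf pdf/pages/*.pdf

    # Step 4: downsample and convert, using the same options as the command above.
    gs -sDEVICE=pdfwrite \
        -dPDFA=2 \
        -dPDFACompatibilityPolicy=1 \
        -dNOSAFER \
        -dFastWebView \
        -sColorConversionStrategy=RGB \
        -dDownsampleColorImages=true \
        -dColorImageDownsampleThreshold=1.0 \
        -dAutoRotatePages=/None \
        -dColorImageResolution=150 \
        -o downsampled-pdf.pdf \
        "$PDFA_DEF" \
        combined-pdf.pdf

    # Step 5: remove the temporary files.
    rm -r -- pdf combined-pdf.pdf
)
```

To use it, source the file, change to the staging directory, and run `build_pdfa`, optionally with `PDFA_DEF` set to the path of the edited PS file. Because of `set -eu`, the cleanup in step 5 is only reached when every earlier command succeeded.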
