Warning

This procedure does not work as described: it is still missing instructions for where to place the PDF/A definition file and the ICC profile. See https://ghostscript.readthedocs.io/en/latest/VectorDevices.html#creating-a-pdf-a-document

The path to the PS file in the Ghostscript command (the final PDF step) must be valid for the OS/computer from which you run the command. The PS file it points to must be edited to contain a valid path to an ICC profile.

Navigate to the staging directory containing the Production Master (_pm) image files
Create temporary subdirectories 'pdf' and 'pdf/pages', and navigate back to the staging directory:
Code Block
language bash
mkdir pdf
cd pdf
mkdir pages
cd ..
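As a possible shortcut, the same directory layout can be created without changing directories. This is a minimal sketch that assumes it is run from the staging directory:

```shell
# Create pdf/ and pdf/pages/ in one step. The -p flag creates any
# missing parent directories and does not fail if they already exist.
mkdir -p pdf/pages
```

Because `-p` tolerates existing directories, the command can be re-run safely if an earlier attempt was interrupted.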
Run tesseract OCR on all images in the staging directory:
Code Block
language bash
for i in *.tif; do tesseract -c tessedit_page_number=0 -l eng "$i" "pdf/pages/${i%_pm.tif}" pdf; done
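The loop relies on the shell expansion `${i%_pm.tif}` to derive each page's output name. The sketch below is a dry run that only prints the mapping and does not call tesseract. The helper name `output_base` and the file names are hypothetical and chosen for illustration.

```shell
# Hypothetical helper: map an image file name to the output base
# that the loop passes to tesseract.
output_base() {
    # ${1%_pm.tif} removes a trailing "_pm.tif" if present;
    # names without that suffix are left unchanged.
    printf 'pdf/pages/%s\n' "${1%_pm.tif}"
}

# Dry run over example names (assumed, not taken from a real staging directory).
for i in vol01_0001_pm.tif vol01_0002_pm.tif; do
    printf '%s -> %s.pdf\n' "$i" "$(output_base "$i")"
done
```

Tesseract appends `.pdf` to the output base itself, so a file that matches `*.tif` but lacks the `_pm` suffix would end up as `name.tif.pdf` in `pdf/pages`.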
Merge page-level PDF documents into one PDF per volume/issue:
Code Block
language bash
mutool merge -o pdf/combined-pdf.pdf pdf/pages/*.pdf
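Before merging, it may be worth confirming that every image produced a page PDF, since a failed OCR run would otherwise silently drop a page. The following is a sketch only: `check_pages` is a hypothetical helper, and it assumes the `_pm.tif` naming and `pdf/pages` layout used in the steps above.

```shell
# Hypothetical check: succeed only if every *_pm.tif in the given staging
# directory has a matching page PDF in its pdf/pages/ subdirectory.
check_pages() {
    dir=$1
    missing=0
    for tif in "$dir"/*_pm.tif; do
        [ -e "$tif" ] || continue            # the glob matched nothing
        base=$(basename "$tif" _pm.tif)      # strip directory and suffix
        if [ ! -f "$dir/pdf/pages/$base.pdf" ]; then
            printf 'missing page PDF for %s\n' "$tif" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

From the staging directory this could be used as `check_pages . && mutool merge -o pdf/combined-pdf.pdf pdf/pages/*.pdf`. The shell expands `pdf/pages/*.pdf` in sorted name order, so pages merge in the intended sequence only if the page numbers in the file names are zero-padded.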

Produce the final PDF document.

Code Block
language bash
gs -sDEVICE=pdfwrite \
    -dPDFA=2 \
    -dPDFACompatibilityPolicy=1 \
    -dNOSAFER \
    -dFastWebView \
    -sColorConversionStrategy=RGB \
    -dDownsampleColorImages=true \
    -dColorImageDownsampleThreshold=1.0 \
    -dAutoRotatePages=/None \
    -dColorImageResolution=150 \
    -o downsampled-pdf.pdf \
    /mnt/dps/staff_workspaces/mirko/pdfa/PDFA_def_UTL.ps \
    pdf/combined-pdf.pdf
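Since the Ghostscript step depends on a PS definition file that in turn points to an ICC profile, a pre-flight check along these lines may help catch a bad path before running `gs`. This is a sketch under assumptions: the helper names `icc_path` and `check_pdfa_def` are hypothetical, and the definition file is assumed to contain a line of the form `/ICCProfile (path/to/profile.icc)`, as in the `PDFA_def.ps` template distributed with Ghostscript. A customised file may be laid out differently.

```shell
# Hypothetical helper: print the ICC profile path configured in a
# PDF/A definition file. Assumes a line such as
#   /ICCProfile (path/to/profile.icc) % Customise
icc_path() {
    sed -n 's|^/ICCProfile *(\([^)]*\)).*|\1|p' "$1"
}

# Hypothetical helper: succeed only if the definition file is readable
# and the ICC profile it names is readable as well.
check_pdfa_def() {
    [ -r "$1" ] || { printf 'cannot read %s\n' "$1" >&2; return 1; }
    icc=$(icc_path "$1")
    [ -n "$icc" ] && [ -r "$icc" ] || {
        printf 'ICC profile not found: %s\n' "${icc:-<none set>}" >&2
        return 1
    }
}
```

For example, `check_pdfa_def /mnt/dps/staff_workspaces/mirko/pdfa/PDFA_def_UTL.ps` could be run before the `gs` command. A relative profile path is tested against the current directory here, which may not match how Ghostscript resolves it.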

Delete the temporary pdf subdirectory, which also removes pdf/combined-pdf.pdf.
