...
Warning |
---|
This procedure does not work as described: it is still missing instructions on where to place the PDF/A definition file and the ICC profile (see https://ghostscript.readthedocs.io/en/latest/VectorDevices.html#creating-a-pdf-a-document). The path under step 4 (using Ghostscript) must be valid for the OS/computer from which you run the command, and the PS file that path points to must be edited to contain a valid path to an ICC profile. |
Navigate to the staging directory containing the Production Master (_pm) image files
Create the temporary subdirectories 'pdf' and 'pdf/pages' inside the staging directory:
```bash
# -p creates pdf and pdf/pages in one call, without leaving the staging directory
mkdir -p pdf/pages
```
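Run Tesseract OCR on all images in the staging directory, writing one searchable PDF per image to pdf/pages:
```bash
# Quote the filenames so names containing spaces are passed as a single argument.
# tessedit_page_number=0 restricts OCR to the first page of each TIFF.
for i in *.tif; do
  tesseract -c tessedit_page_number=0 -l eng "$i" "pdf/pages/${i%_pm.tif}" pdf
done
```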
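The output name in the loop relies on the shell's `${i%_pm.tif}` suffix removal. The following is a minimal sketch of what that expansion produces; `page_base` and the filenames are hypothetical and only serve as illustration.

```bash
#!/usr/bin/env bash
# Sketch: show the output base that the OCR loop derives from a TIFF filename.
# page_base is a hypothetical helper, not part of the workflow itself.
page_base() {
  local tif=$1
  # ${tif%_pm.tif} strips a trailing "_pm.tif"; names without that suffix are left unchanged.
  printf 'pdf/pages/%s\n' "${tif%_pm.tif}"
}

page_base "vol01_0001_pm.tif"   # prints pdf/pages/vol01_0001
page_base "cover.tif"           # prints pdf/pages/cover.tif
```

Tesseract appends `.pdf` to the output base, so a file that lacks the `_pm` suffix would end up as `cover.tif.pdf`. Restricting the loop to `*_pm.tif` may be worth considering if other TIFFs can appear in the staging directory.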
Merge page-level PDF documents into one PDF per volume/issue:
```bash
# Write the merged file to the staging directory, where the Ghostscript step expects it.
mutool merge -o combined-pdf.pdf pdf/pages/*.pdf
```
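Before merging, it may help to confirm that every image produced a page-level PDF. The sketch below assumes the directory layout used in this procedure; `check_page_count` is a hypothetical helper and not part of mutool or Tesseract.

```bash
#!/usr/bin/env bash
# Sketch: compare the number of TIFFs with the number of page-level PDFs.
# The function body runs in a subshell, so cd and shopt do not leak to the caller.
check_page_count() (
  shopt -s nullglob                 # unmatched globs expand to nothing
  cd "${1:-.}" || exit 1            # staging directory, defaults to the current one
  tifs=( *.tif )
  pdfs=( pdf/pages/*.pdf )
  if [ "${#tifs[@]}" -ne "${#pdfs[@]}" ]; then
    echo "mismatch: ${#tifs[@]} TIFFs, ${#pdfs[@]} page PDFs" >&2
    exit 1
  fi
  echo "ok: ${#pdfs[@]} pages"
)

# Usage (from the staging directory): check_page_count .
```

The glob `pdf/pages/*.pdf` expands in lexical order, so the merged page order follows the filenames.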
Produce the final PDF document.
```bash
gs -sDEVICE=pdfwrite \
   -dPDFA=2 \
   -dPDFACompatibilityPolicy=1 \
   -dNOSAFER \
   -dFastWebView \
   -sColorConversionStrategy=RGB \
   -dDownsampleColorImages=true \
   -dColorImageDownsampleThreshold=1.0 \
   -dAutoRotatePages=/None \
   -dColorImageResolution=150 \
   -o downsampled-pdf.pdf \
   /mnt/dps/staff_workspaces/mirko/pdfa/PDFA_def_UTL.ps \
   combined-pdf.pdf
```
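Because the Ghostscript call depends on both the PS definition file and the ICC profile it names, a quick pre-flight check can catch a broken path early. This is a sketch only: it assumes the definition file follows the layout of Ghostscript's stock PDFA_def.ps, where the profile is set on a line of the form `/ICCProfile (path) % Customise`. The helpers `icc_profile_path` and `check_pdfa_def` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: verify that a PDF/A definition file points to a readable ICC profile.
# Assumption: the file contains a line like  /ICCProfile (srgb.icc) % Customise

icc_profile_path() {
  # Print the text between the parentheses on the first /ICCProfile line.
  sed -n 's|^[[:space:]]*/ICCProfile[[:space:]]*(\([^)]*\)).*$|\1|p' "$1" | head -n 1
}

check_pdfa_def() {
  local def=$1 icc
  [ -r "$def" ] || { echo "definition file not readable: $def" >&2; return 1; }
  icc=$(icc_profile_path "$def")
  [ -n "$icc" ] || { echo "no /ICCProfile line found in $def" >&2; return 1; }
  # A relative path is checked against the current directory.
  [ -r "$icc" ] || { echo "ICC profile not readable: $icc" >&2; return 1; }
  echo "$icc"
}

# Usage: check_pdfa_def /path/to/your/PDFA_def.ps
```

If the definition file was customised in a different way, the pattern in `icc_profile_path` would need adjusting.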
Delete the temporary 'pdf' subdirectory and combined-pdf.pdf.
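The cleanup can be sketched as a small guarded function, assuming it is run from the staging directory and that the final file is named downsampled-pdf.pdf as in the Ghostscript command. `cleanup_intermediates` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: remove the intermediate files, but only once the final PDF exists.
cleanup_intermediates() {
  local final=${1:-downsampled-pdf.pdf}
  # Refuse to delete anything if the final PDF is missing or empty.
  [ -s "$final" ] || { echo "final PDF missing or empty: $final" >&2; return 1; }
  rm -rf -- pdf combined-pdf.pdf
}

# Usage (from the staging directory): cleanup_intermediates
```

The guard only confirms that the final file is non-empty; it does not validate PDF/A conformance.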
...