Redacting PII

PII Found in Image Files (.tif, .png, .jpg, .bmp) and Text Files (.doc, .wpd, .txt, .mbox, .eml)

PII is harder to redact from image files than from text-based files. The following is a case study in scrubbing PII from image files. The workflow is carried out on Windows.

To redact PII from text-based files, follow the instructions below but skip steps 5-9.

  1. After running bulk_extractor, check the pii.txt file. Any PII that was found is listed there. The file shows the filepaths and the PII found (redacted here with x's), which lets you determine the filetypes containing PII. In this case, it is a TIF image.
  2. Open the disk image in FTK Imager. Expand the evidence tree to find the file and click on it. This opens the hex file and associated text string in FTK’s main window.
  3. Because this is an image, you need to confirm that the PII string is really in the text or metadata and is not a false positive from bulk_extractor. Press CTRL+F to search for the PII in the text string. If it is present, it will be highlighted in blue (here redacted). You now know that the PII string is present in the text.
  4. Because FTK is write-blocked, you need to open the file in a different tool. Right-click on the file in FTK’s evidence tree and export it to an external folder.
  5. Next, open XnViewMP.
  6. When the application opens, you will see three windows: the folder tree, the filepath and folder view, and Preview. Click File>Open to open the folder containing the extracted image(s). The image(s) will appear in the folder view.
  7. Click on the image to open it in Preview. Confirm that the PII is not visually present in the image. If PII is visually present, contact Digital Stewardship.
  8. Browse through the tabs on the left (Preview, etc.) to determine whether PII is present. If it is, contact Digital Stewardship.
  9. If the PII is not visible in the image or in the associated text, reformat the image to scrub the PII from its existing metadata. Right-click on the image and select Convert into>JPEG. Converting to JPEG re-encodes the image and leaves less room for embedded metadata, so it should remove the PII automatically. After the conversion, the .jpeg file appears in the folder.
  10. Next, open Super Hex Editor. Click File>Open and open the .jpeg image file. You will see the hex file and text displayed.
  11. Click Edit>Find. Paste the hexadecimal string or PII text into the Find box (here redacted).
  12. Click Find First. If nothing appears, you have successfully scrubbed the PII from your .jpeg file. If there is still PII in the image file, contact Digital Stewardship.
  13. If you are working with a text file, highlight the PII string found in step 12 and replace it with x's.
  14. To keep the PII out of the data written to tape, extract the disk image to a folder titled "AIPNUMBER_files" (e.g., 2017009_02_253_files). Delete the disk image from the AIP folder.
  15. In the exported folder, delete the file(s) containing PII. If you wish, use FTK Imager to make the deleted files impossible to recover.
  16. Replace the deleted file with the redacted one. Make sure it has the same filename and is in the same location.
  17. Examine the folder using the Folder Processor in BitCurator. This process is explained in Problematic Disk Images. Once the folder is processed, check the pii.txt file. It should be blank. If it isn't, locate the source of the remaining PII and repeat the above process.
  18. When all PII is erased, add this redacted folder to the AIP folder and bag it.
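
The searches and checks in steps 11-13 and 17 can also be cross-checked with a short script. The following is a minimal Python sketch, not part of the documented workflow. The function names (find_all, redact_bytes, remaining_features) are hypothetical, and the sketch assumes that bulk_extractor feature files such as pii.txt mark their header lines with a leading #, so a file containing only those lines counts as blank. Run it only on exported copies, never on the original disk image.

```python
from pathlib import Path


def find_all(data: bytes, needle: bytes) -> list:
    """Return every byte offset at which needle occurs in data (steps 11-12)."""
    if not needle:
        raise ValueError("search string must not be empty")
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets


def redact_bytes(data: bytes, needle: bytes) -> bytes:
    """Overwrite each occurrence of needle with x's (step 13).

    The replacement has the same length as the original string,
    so the size of the file does not change.
    """
    if not needle:
        raise ValueError("search string must not be empty")
    return data.replace(needle, b"x" * len(needle))


def remaining_features(feature_file: Path) -> list:
    """Return the lines of a feature file that still report PII (step 17).

    Assumption: lines starting with '#' are header comments and blank
    lines carry no information, so both are ignored.
    """
    text = feature_file.read_text(encoding="utf-8-sig", errors="replace")
    return [
        line
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
```

To use it, read the exported file with Path.read_bytes(), pass the bytes and the PII string (as bytes) to find_all or redact_bytes, and write the result back with Path.write_bytes(). An empty list from find_all corresponds to "nothing appears" in step 12, and an empty list from remaining_features corresponds to a blank pii.txt in step 17. The search only matches PII stored in the same encoding as the search string, so a clean result here does not replace the hex editor check or the Folder Processor run.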