OCR Preferences

The I.R.I.S. OCR plugin integrates fully with Nitro Pro 7, allowing it to recognize text from scanned pages or from images in open PDF documents. The OCR utility is fully customizable, with options ranging from skewed image correction to image compression settings. You can also enable text recognition for scanners directly in the Create PDF From Scanner dialog, so that scanned pages are automatically converted into PDF documents with text that can be searched or modified.

To configure OCR preferences:

  1. Click the File menu button in the top-left corner of the application
  2. Click on the Preferences button at the bottom of the main menu
  3. In the Preferences dialog, click on OCR in the categories column

OCR General Preferences

The general preferences to configure the OCR functionality are split into 3 categories, with the following options:

  • Correct image skew: Straighten any text which is skewed on the scanned document
  • Use fixed threshold: Thresholding is the process of analyzing the histogram of an image to distinguish the text from the background. A fixed threshold applies the same cut-off point to the entire image, as opposed to finding the text dynamically. The percentage values indicate the point at which the contrast between blacks and whites is ideal to recognize text, with 0% being completely dark and 100% being completely white. This setting is recommended if an image contains different background colors, or a background which varies in shading or gradient. For more common OCR operations, it is recommended to disable the fixed threshold setting
  • Detect text orientation: Rotate pages automatically when they have been scanned at 90, 180, or 270 degree angles
  • Smooth color image: Flatten out the colors of the image to remove the JPEG compression artifacts and help recognition
  • Language: Select the language in which the scanned text you want to detect is written
  • Quality:
    • Low (fast): Ideally used when the text on the scanned document is very crisp and easy to recognize. This setting allows scans to be performed quickly
    • Medium (medium): Recognition is more precise than with the Low setting, but the scan takes slightly longer
    • High (slow): Recommended for scanning text which is harder to recognize. Scan time with this setting is noticeably longer since the recognition methods are more complex
  • Type:
    • Searchable image: Adds a hidden layer of text to enable searching and text markup (e.g. highlighting). The text, however, cannot be modified in any way
    • Editable text: Rebuilds the entire document, resulting in a PDF file that contains both searchable and editable text. Because this method does not retain the original scanned image, results may vary, and it is only recommended when you need to make changes to the PDF file
  • Downsample images: After the scan is complete, you are able to reduce the resolution of your scanned images to lower the size of the resultant PDF document
  • Image compression factor: The more an image is compressed, the smaller the file size of the output PDF. However, if your PDF document is intended for print, low compression is recommended to preserve quality
  • Embed fonts: After the scan process, if the required fonts are found on the system, the fonts used for the recognized text are embedded in the output PDF
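
To illustrate what the "Use fixed threshold" percentage means, the following minimal Python sketch applies a single cut-off to every pixel of a tiny grayscale image. It is not the I.R.I.S. implementation: the function names, the 0 to 255 gray scale, and the midpoint rule used as a stand-in for a cut-off chosen from the image itself are all assumptions made for this example.

```python
def fixed_threshold(gray_rows, percent):
    """Binarize a grayscale image with one cut-off for every pixel.

    gray_rows: list of rows, each a list of gray levels, 0 (black) to 255 (white).
    percent:   cut-off on the 0% (completely dark) to 100% (completely white) scale.
    Returns rows of 1 (treated as text) and 0 (treated as background).
    """
    cutoff = percent / 100 * 255
    return [[1 if pixel < cutoff else 0 for pixel in row] for row in gray_rows]


def midpoint_threshold(gray_rows):
    """Illustrative alternative: derive the cut-off from the image itself,
    halfway between its darkest and lightest pixel (an assumption for this sketch)."""
    pixels = [p for row in gray_rows for p in row]
    percent = (min(pixels) + max(pixels)) / 2 / 255 * 100
    return fixed_threshold(gray_rows, percent)


# Dark text (30) on a light-grey page (200), plus one mid-grey smudge (120).
page = [
    [200, 30, 200],
    [200, 30, 120],
]

at_50 = fixed_threshold(page, 50)  # cut-off 127.5: the smudge is treated as text
at_40 = fixed_threshold(page, 40)  # cut-off 102.0: the smudge is treated as background
auto = midpoint_threshold(page)    # cut-off taken from the image's own gray range
```

In this toy image, moving the fixed percentage from 50 to 40 is enough to change whether the smudge is classified as text, which is why the percentage value matters when the setting is enabled.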
