How to Use OCR

You can find instructions for the following Soda PDF versions:

Soda PDF 14

Soda PDF 12

Soda PDF 11

 

Soda PDF 14

Optical Character Recognition (OCR) is a technology that recognizes text within images. It allows Soda PDF to distinguish the text from the rest of the image so you can edit it.

If the OCR module is not available to you, you can purchase it here.

 

 

In Edit Mode, a selected image is surrounded by a red border. This is how you can tell that the content is an image rather than editable text.

 


 

When an entire page is one large image, the document is most likely made up of scanned pages. Without OCR, such pages cannot be edited easily.

Once the image is selected, use OCR to make its text editable.

 



Auto and Manual OCR

These options are active only when an individual image is selected. Rather than running OCR on an entire document, you can work image by image. They do not create a new file; the image is recognized within the existing PDF. Click here to learn more.

 

Recognize Document

If your document is made up of several scanned pages that need to be recognized and edited, open the OCR module and choose the Recognize Document option.

 

In the dialog box that appears, you can specify which pages to recognize.

 


 

After the recognition process is finished, a new file with the recognized text will be created in a separate tab. Your original file will not change.

 

External Image

To recognize the text in an external image and convert it to a PDF, click External Image.

 


 

A Browse window will open. Select the image file and click Open.

Once the image has been recognized, it will open as a new PDF document within the Soda PDF application.

 

Soda PDF 12

 

 

Soda PDF 11

In Soda PDF 11 the following options are available:

 


 

Page Range:
Click this option to select a page range to scan. Choose the pages you want and click OK. A new file will be created from the selected range.


Entire Document: 
This option applies the OCR engine to the entire document at once. A status bar appears to show that Soda PDF is recognizing text. You can click Cancel to stop the process. When it is finished, a new file will open with all your images scanned. Your original file will not change.




Batch: 

With the Batch tool, you can use the OCR engine on multiple files at once. When you click Add Files… or Add Folder…, you will be prompted to browse your computer and choose your files. Use the arrow buttons to change the order in which the files will be processed. To remove a file from the list, select it and click Delete.




Click Browse… to change where the files will be saved. When you click Batch, each recognized file will open individually in the order you selected.

 

From External Image:  

This will open a Browse window. Choose your file and it will open ready to be edited.  

Scan and Recognize: 
This feature works directly with your scanner. As you create a new PDF file from your scanner, the pages are processed with OCR at the same time, making them ready to edit. Click here to learn more about this feature.

Create from Scanner is only available in Soda PDF Desktop. Soda PDF Online is a web app that does not access the operating system of your computer. We are working towards finding a way to bring this feature to Soda PDF Online as quickly as possible.
