A scanned page from an older e-book or a paper document, or one created from a photo or drawing, is only an image of the page, so you can’t work with its content by extracting images or editing the text. However, Adobe Acrobat can convert the image of the document into actual text by using optical character recognition (OCR).
I have used this feature in Adobe Acrobat 8 and 9, and it works very well in both.
To capture the content of an image or a page from a scanned PDF document, follow these steps:
- Save the target page of the PDF file as an image.
- Crop the image to less than 45×45 inches by using Photoshop, PowerPoint, or any other software that can crop images. Save the cropped image.
- Choose File > Create PDF > From File, and select the cropped image that you want to convert.
- Choose Document > OCR Text Recognition > Recognize Text Using OCR. The Recognize Text dialog box opens. You don’t need to change any of the settings; just click OK to start the recognition process. The conversion is very fast and takes only several seconds.
- When the process is complete, the dialog box closes and the results of the conversion are shown in the document, right in front of you.
- Now you can copy and paste the text into your Microsoft Word file and start correcting any recognition errors.
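Whether an image falls under the 45×45 inch limit in the cropping step depends on its pixel dimensions and its resolution, since inches = pixels / DPI. The following is a minimal sketch of that arithmetic in Python, not anything built into Acrobat; the function names are hypothetical, and it assumes you already know the resolution (DPI) the image was saved at.

```python
import math

# Page-size limit taken from the cropping step above (inches per side).
MAX_INCHES = 45.0


def size_in_inches(width_px, height_px, dpi):
    """Convert pixel dimensions to inches at the given resolution."""
    return width_px / dpi, height_px / dpi


def fits_limit(width_px, height_px, dpi, limit=MAX_INCHES):
    """Return True if both sides are strictly under the limit."""
    width_in, height_in = size_in_inches(width_px, height_px, dpi)
    return width_in < limit and height_in < limit


def max_side_pixels(dpi, limit=MAX_INCHES):
    """Largest whole number of pixels per side that stays under the limit."""
    return math.ceil(limit * dpi) - 1


if __name__ == "__main__":
    # Hypothetical scan: 3000 x 4000 pixels saved at 300 DPI.
    print(size_in_inches(3000, 4000, 300))
    print(fits_limit(3000, 4000, 300))
    print(max_side_pixels(300))
```

For instance, a 3000×4000 pixel scan at 300 DPI is 10 inches wide and about 13.3 inches tall, so it is well inside the limit, while the same pixel count at a very low DPI setting could exceed it.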
Adobe Acrobat 9 is said to be able to convert pages larger than 45×45 inches, but I have not been able to make that work and don’t know why.