by David Sparks

 

11:15PM

PDFpen OCR Folder Action Script

As discussed on Mac Power Users episode 3, "Going Paperless," the nice people at Smile On My Mac put together an AppleScript that, when combined with a folder action, automatically OCRs documents using PDFpen or PDFpenPro. Here is the promised walkthrough:

What you'll need:

1. Some scanned PDF images;
2. PDFpen or PDFpenPro (See my review here);
3. A bit of patience.

Step 1 - Load up the Script Editor


[Screenshot: Script Editor.png]

This little application allows you to create and save AppleScripts.

Step 2 - Copy in the script below


on adding folder items to this_folder after receiving added_items
    try
        -- Process every file that was just dropped into the folder
        repeat with i from 1 to number of items in added_items
            set this_item to item i of added_items
            tell application "PDFpenPro"
                open this_item
                set theDoc to document 1
                -- OCR the document one page at a time
                repeat with aPage in pages of theDoc
                    ocr aPage
                    -- Looks like we need to modify PDFpen so that we can detect when OCR is done; for now use 15 seconds
                    delay 15
                end repeat
                -- Write the OCR text layer back into the same file
                save theDoc
                close theDoc
            end tell
        end repeat
    on error errText
        -- Any failure stops the loop and shows the error message
        display dialog "Error: " & errText
    end try
end adding folder items to

-------------

Note - the script as shown targets PDFpenPro. If you use PDFpen instead, edit the line that reads tell application "PDFpenPro" so that it reads tell application "PDFpen".

Note 2 - WordPress seems to have converted the double dash before the comment into an em-dash and the quotes into smart quotes. Although I fixed it in the WordPress code, it still reverts to "fixing" things when I publish, so you'll have to correct those in your editor. Sorry. If anyone knows a better way to post AppleScript via WordPress, please drop me a note.
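
If you would rather not fix the punctuation by hand, the cleanup can be scripted. What follows is only a sketch, not something from Smile On My Mac: it assumes the script text you copied from this page is on the clipboard, and swapText is a hypothetical helper name I made up for the example. It swaps curly quotes and em-dashes for their plain equivalents and puts the result back on the clipboard.

```applescript
-- Hypothetical helper (not part of the original post): replace every
-- occurrence of findText in sourceText with newText.
on swapText(sourceText, findText, newText)
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to findText
	set textParts to text items of sourceText
	set AppleScript's text item delimiters to newText
	set fixedText to textParts as text
	set AppleScript's text item delimiters to savedDelimiters
	return fixedText
end swapText

-- Assumption: the script text copied from the web page is on the clipboard.
set pastedScript to the clipboard as text

-- Curly double quotes (U+201C and U+201D) become straight quotes.
set pastedScript to swapText(pastedScript, character id 8220, quote)
set pastedScript to swapText(pastedScript, character id 8221, quote)
-- An em-dash (U+2014) becomes the two hyphens that start a comment.
set pastedScript to swapText(pastedScript, character id 8212, "--")

set the clipboard to pastedScript
```

Run it once in Script Editor after copying the script from the page, then paste the cleaned text into a new Script Editor window and compile.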

Step 3 - Save the script


You need to save it to a specific directory:

HD/Library/Scripts/Folder Action Scripts/

I named mine "PDFpen Scriptacular."

Step 4 - Create a folder


Save the folder wherever is convenient. Perhaps in your Documents folder or (for you anarchists) on the desktop. By the way, did you know that Command-Shift-N gets you a new folder? I named mine "OCR Drop."

Step 5 - Enable folder actions


Secondary-click (right-click) the folder and enable folder actions under the "More" item.
[Screenshot: Enable Folder Actions.jpg]
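
If the menu item is hard to find, the same global switch can be flipped from Script Editor. This is a small sketch that assumes only the System Events "folder actions enabled" property; it does not attach the script to your folder, so you still need Steps 6 and 7.

```applescript
tell application "System Events"
	-- Turn Folder Actions on globally if they are not already enabled.
	if not (folder actions enabled) then
		set folder actions enabled to true
	end if
end tell
```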

Step 6 - Configure Folder Action


Right-clicking the folder a second time gives you a new option, Configure Folder Action. Click it.
[Screenshot: Configure Folder Actions-1.jpg]

Step 7 - Pick Your Folder


In the window that appears, click the plus (+) button under the "Folders with Actions" box.
[Screenshot: FA pick folder.jpg]

Select your folder, wherever you put it. You will then be asked to pick a script. Pick PDFpen Scriptacular.scpt.
[Screenshot: pick script.jpg]

It should now look like this.
[Screenshot: Script menu.jpg]

Close the window and you are done.

Now just drag a few PDFs in and let the script go to work. When it finishes, move the OCR'd PDFs where they belong. There are a few additional points:

1. PDFpen has no AppleScript command that reports when OCR is finished, so the script instead waits a fixed 15 seconds after each page. The PDFpen wizards report they are going to try to fix this in a future release.

2. While this script generally works, it sometimes gave me an error when I overloaded it. Be patient.
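
Both points can be softened a little inside the script itself. The variation below is my own sketch, not the Smile On My Mac script, and I have not tested it against every PDFpen version. It uses only the PDFpen commands from the original (open, ocr, save, close) and makes three assumptions: the wait time lives in a hypothetical ocrWaitSeconds property so it is easy to tune, anything whose name does not end in .pdf is skipped, and each file gets its own try block so one bad file does not stop the rest.

```applescript
-- Assumption: seconds to wait after each page; raise it for slow or large scans.
property ocrWaitSeconds : 15

on adding folder items to this_folder after receiving added_items
	repeat with i from 1 to number of items in added_items
		set this_item to item i of added_items
		set itemPath to POSIX path of this_item
		-- Skip anything that is not a PDF (text comparison ignores case by default).
		if itemPath ends with ".pdf" then
			try
				tell application "PDFpenPro"
					open this_item
					set theDoc to document 1
					repeat with aPage in pages of theDoc
						ocr aPage
						-- Still a fixed wait; there is no completion check here.
						delay ocrWaitSeconds
					end repeat
					save theDoc
					close theDoc
				end tell
			on error errText
				-- Report the failure for this file, then carry on with the next one.
				display dialog "Error processing " & itemPath & ": " & errText
			end try
		end if
	end repeat
end adding folder items to
```

As with the original, change "PDFpenPro" to "PDFpen" if that is the version you own. The per-file try block is a trade-off: you may see several dialogs in a row instead of one, but the remaining files still get processed.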

I want to give my personal thanks to the gang at Smile On My Mac, particularly Greg, who put this script together for Mac Power Users just because we asked.

Reader Comments (16)

Very helpful. There was an article sometime last year in, I think, Macworld entitled something like "Going Paperless" that used a script to control the scan, OCR the result, and file it on your hard drive. Acrobat was used to provide the OCR.

May 24, 2009 | Unregistered Commenter Mike Harahan

WordPress seems to have converted the double dash before the comment into an em-dash and the quotes into smart quotes. You might want to edit the text to correct this so that it functions properly for those who do not know enough to fix that sort of little bug.

May 27, 2009 | Unregistered Commenter Greg Mote

@Greg -

Thanks for catching that. I tried to fix it but WordPress keeps reverting it, so I placed a note. If anyone knows a better way to post AppleScript to WordPress, please drop me a note.

May 28, 2009 | Unregistered Commenter MacSparky

Hey Mr. Sparky

Nice one, now could you do the same with Adobe Acrobat :)

June 12, 2009 | Unregistered Commenter Markus

Me again

The reason I ask for Acrobat is that Acrobat reduces the file size after the OCR process, which PDFpen doesn't do in the same step.

June 12, 2009 | Unregistered Commenter Markus

Markus - this script works well from DocumentSnap:

http://www.documentsnap.com/acrobat-applescript-for-scansnap-ocr/

August 17, 2009 | Unregistered Commenter Rob

This script is really buggy. I keep getting these errors:

1st error: Error: PDFpen got an error: Connection is invalid.

2nd error: Error: PDFpen got an error: Can't get document 1. Invalid index.

(#2 pops up once I hit OK on error #1)

I thought the Scripts published for Acrobat by Macworld were buggy but this one is just as bad. Ideas? I'll also post this over at MPUs. Thanks.

September 23, 2009 | Unregistered Commenter Rob

Rob,

There is a new version of PDFpen since this post was published, so the script needs to change. I'll be doing a new post with a new script soon.

September 23, 2009 | Unregistered Commenter MacSparky

David,
Did you ever update this? I just attempted it and it crashed PDFpenPro. And I assume it just saves the file back as an OCR'd PDF, correct? (I'm attempting with a JPG, so that might be the issue. PDFpenPro can open the JPG, but might not know what to do when it's time to save it.)

What I really want to do is have a script like this open a JPG screencap and save the OCR results as an .rtf.

Here's the ideal situation -- I capture a screen on the iPad and save it in a Dropbox folder. My Mac at home processes that .jpg via OCR and then saves it as an .rtf, which Hazel then moves into Notational Velocity. In short, I can capture a screen from Kindle or my Bible app (neither of which allows cut and paste) and minutes later have it available in SimpleNote!

June 10, 2010 | Unregistered Commenter john chandler

David,
I'm getting the same "Connection is Invalid" error as Rob (above). I wonder if you could take a look at the script again. I inquired with Smile about the script but got no traction...


On May-27-2010, at 6:50 AM, PDFpen Support wrote:

Hi

As it's not our script, we don't maintain or update it. Your best bet would be to contact the MacSparky folks if there's a specific update to the script you'd like to see.

Thanks for using PDFpen from SmileOnMyMac!

Regards,

Justin
PDFPen Support
support@pdfpen.com
http://www.smileonmymac.com/pdfpen


Thanks, Marc

August 19, 2010 | Unregistered Commenter Marc

Free OCR is an online OCR service. You can give it a try.

August 22, 2010 | Unregistered Commenter folha

Gang,

I'm underwater on a big project for a bit longer but once I can breathe again, I'm going to be rewriting and posting this script. Stay tuned.

August 22, 2010 | Registered Commenter David Sparks

Hope I'm not stepping on any toes, but I wanted an OCR script for PDFpen so I took the one David posted and combined it with another one. It seems to be working for me. I posted it here: http://www.documentsnap.com/pdfpen-ocr-applescript-to-automatically-make-pdfs-searchable/

September 28, 2010 | Unregistered Commenter Brooks Duncan

There are many web-based OCR solutions that do not require registration. Here is a good one I'd like to recommend: http://www.goodocr.com

October 20, 2010 | Unregistered Commenter Harry

Have you done any more work on this script? PDFpenPro keeps shutting down on me when I run it.

December 15, 2010 | Unregistered Commenter Todd A. Peperkorn

Just checking in to see if any further development has occurred on this script.

August 25, 2011 | Unregistered Commenter Jack Forbush
