WEB SERVICES - ONLINE OFFICE - 25.06.2018

Distilling text from PDFs

Suppose you want to pluck the content from a PDF document without using a special application. How can the cloud drives Microsoft OneDrive and Google Drive help you?

Free storage space

Google Drive ( http://drive.google.com ) and Microsoft OneDrive ( http://onedrive.live.com ) give you 15GB and 5GB of free online storage space respectively, which you can use to store photos, videos and documents. However, few people know that both services let you distil the content from saved PDF documents. This is convenient when you need to edit (part of) a text or copy it to an e-mail.

Google Drive

If you’ve saved a PDF document in Google Drive, you will see an “Open with Google Docs” option. This option lets you convert the document to text via OCR (Optical Character Recognition) technology. However, if the file contains tables, charts or images, they will generally disappear from the converted document.

In our tests, the text itself was distilled almost perfectly from the PDF and converted neatly into a Google Document. Needless to say, the result largely depends on the quality of the source file. You can now edit the content you have obtained directly in the text editor, or copy it to another application.

Microsoft OneDrive

We did the same test in Microsoft OneDrive. Here you should right-click on a PDF document and choose the “Open in Word Online” option to convert it to Word. In the new document, you can now select and copy the text. However, to edit the content, you should first press the Convert and Edit button. The text will then open in the Word Online editor and you can edit the content as you please. In our tests, the conversion of the texts was virtually flawless, but again a lot will depend on the quality of the original PDF document.

Tip. Our tests clearly showed that OneDrive preserves the original layout of the PDF much better than Google Drive. Hence, if you have to edit documents containing lots of graphical elements, OneDrive would seem to be the better choice.

Alternatives

If you do a web search, you will find lots of online tools which promise to convert PDFs into Word documents. However, the result of these conversions generally turns out to be a mere image of the PDF, pasted into a Word document. This of course doesn’t help you much.

However, Online OCR ( http://www.onlineocr.net ) does succeed relatively well in using OCR to distil text from a PDF and paste it into a Word file. The original layout may be lost to a certain extent, but the quality of the converted text is more than acceptable.

All PDFs?

There are two types of PDFs: documents which contain only an image, and those combining images with an extra text layer. If the PDF is, for instance, a photo or a scan of a newspaper article, it is only an image, without a text layer. Still, in our tests Google Drive, OneDrive and Online OCR all succeeded in converting these PDFs into text files.
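If you are comfortable with a little scripting, you can get a rough idea of which type of PDF you are dealing with before uploading it. The Python sketch below is only a heuristic and not part of any of the services mentioned: it assumes that a PDF with a text layer refers to at least one font (the /Font entry), whereas a pure scan usually does not. The function name has_text_layer is our own invention, and the check can be fooled in both directions, for example by scanners that embed unused fonts or by PDFs that store their data in a form this script does not unpack.

```python
import re
import zlib


def has_text_layer(pdf_bytes: bytes) -> bool:
    """Rough heuristic: True if the PDF data appears to reference a font."""
    # Font references stored in plain (uncompressed) objects.
    if b"/Font" in pdf_bytes:
        return True

    # Newer PDFs may pack their objects into Flate-compressed streams,
    # so try to inflate every stream and look for a font reference inside.
    for match in re.finditer(rb"stream\r?\n(.*?)endstream", pdf_bytes, re.DOTALL):
        try:
            # decompressobj tolerates the line break that precedes "endstream".
            data = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # Not Flate-compressed (e.g. a JPEG image): skip it.
        if b"/Font" in data:
            return True

    return False
```

To try it on a file, read the file as bytes first, for example has_text_layer(open("scan.pdf", "rb").read()), where scan.pdf is a placeholder name. A result of False suggests an image-only PDF, for which the OCR step described above is indispensable.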

Use Microsoft OneDrive or Google Drive in order to have the content of PDFs distilled and converted into usable text files which you can further edit in another program. OneDrive respects the layout of the original documents best.

© Indicator - FL Memo Ltd

Tel.: (01233) 653500 • Fax: (01233) 647100

subscriptions@indicator-flm.co.uk • www.indicator-flm.co.uk

Calgarth House, 39-41 Bank Street, Ashford, Kent TN23 1DQ

VAT GB 726 598 394 • Registered in England • Company Registration No. 3599719