Software review: Infix (www.iceni.com)
Translators' mailing lists and forums often receive messages from colleagues asking how to translate PDF files.
The answers are nearly always the same: If the PDF file contains only graphic images of what appears to be text, as when it's the result of scanning a paper original, you need to use an OCR application, such as Abbyy Finereader, Nuance Omnipage and others.
If the PDF file is editable, and contains real text, you can extract this text using any of a number of software tools (again Finereader, Omnipage, Wondershare and others are mentioned) or by selecting the text and saving it somewhere. There are some Web sites that offer to convert your PDF to text for free, or for a small charge, with variable results and the confidentiality issues that come with sending a client's document to a third party.
If we must, or want to, return a translated document that is graphically similar to the original, we have to reformat it manually, and that takes time.
Another important aspect is that this extracted text is often not very suitable for CAT processing (even if some CAT tools offer a rudimentary PDF translation function) because of text flow issues and the presence of "rogue" formatting, i.e. formatting that is picked up by the OCR/extraction tool but has no apparent function other than to clutter the text and get in your way when you feed it into a CAT tool.
There is, however, another solution that allows you to have your cake and eat it too, i.e. work with a CAT tool and provide your client with a document that is graphically very nearly identical to the original.
I have been using Infix (http://www.iceni.com) for a while. It is a very powerful tool that, among other things, allows you to extract all the text of an editable PDF in a format compatible with any XML-enabled CAT tool or even Google Translate and, after processing it, reload it into the same PDF to obtain a result that closely, if not totally, mirrors the original. There are obviously issues relating to fonts, translation expansion, text flow, etc., but Infix offers a number of tools that allow you to edit the PDF directly and correct these problems relatively easily.
In my experience, and I translate a couple of PDFs a month, I have only needed a few of these functions. Generally, I don't do much more than enlarge a few text boxes, or sometimes fix the horizontal spacing and font size of text that expanded too much in Italian. Infix includes a function that automatically fits the imported text to its corresponding box, but I disabled it almost immediately, because the results can be too drastic. In summary, Infix is certainly not perfect, but it is very useful and can pay for itself with a single translation job. I use Version 4. The new version is apparently more user-friendly, but I have not upgraded yet. An Infix Pro license costs 159 USD.
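As a rough illustration of the expansion problem mentioned above, here is a minimal Python sketch that flags translated segments that have grown much longer than their source, so you know which text boxes are likely to need attention. It is not part of Infix and does not read Infix's export format; the function names, the sample segments and the 1.25 threshold are all assumptions made for the example.

```python
def expansion_ratio(source: str, target: str) -> float:
    """Return len(target) / len(source), ignoring surrounding whitespace."""
    src = source.strip()
    tgt = target.strip()
    if not src:
        raise ValueError("source segment is empty")
    return len(tgt) / len(src)


def flag_expanded(pairs, threshold=1.25):
    """Return (source, target, ratio) for pairs whose ratio exceeds threshold.

    The threshold is an arbitrary, hypothetical cut-off; tune it to your
    language pair and layout.
    """
    flagged = []
    for source, target in pairs:
        ratio = expansion_ratio(source, target)
        if ratio > threshold:
            flagged.append((source, target, ratio))
    return flagged


if __name__ == "__main__":
    # Hypothetical English -> Italian segments, typed in by hand.
    segments = [
        ("Save", "Salva"),
        ("Print settings", "Impostazioni di stampa"),
    ]
    for source, target, ratio in flag_expanded(segments):
        print(f"{source!r} -> {target!r}: x{ratio:.2f}")
```

A character count is only a crude proxy, since the real width depends on the font and the size of the box, but it can be enough to tell you where to look first.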
Another interesting advantage of Infix Pro is that it allows you to return a PDF to your customer. As we all know, the PDF format was developed to retain the layout and graphic appearance of a document across platforms. By returning a PDF to your customer, you save both your time and theirs: the layout is retained, and the file is ready for publishing. If there are additions or changes later on, it does not take much time to repeat the process using the original TM.
For more information, you can watch a movie of the import/export function at http://iceni.cachefly.net/infix/DemoMovies/Translation/Translation.swf. Another, more general review is available at http://pdftranslation.org/.
I hope this is useful.
Lorenzo
PS Feel free to distribute and reuse this review, as long as you attribute it to me (Lorenzo Martinelli) and include a clear link to my web site: http://www.martinelli.co.uk




