Subler subtitle OCR for languages != English

The open source tool Subler offers a handy feature for converting VobSub captions to the TX3G format, which is more compatible with iTunes and other clients (like Plex).

But the stock version of Subler only supports English text recognition. To recognize German umlauts and other special Latin characters, you need to download extra data files for the OCR library ‘tesseract’.

Subler’s documentation mentions this, but its link is outdated. So here’s the correct link to the data files for the (old) version 3.02 of the tesseract lib that’s included in Subler:

https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#data-files-for-version-302

Unpack the tar files and copy the language data they contain into this folder: ~/Library/Application Support/Subler/tessdata
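
If you prefer to script the unpack-and-copy step, here is a minimal Python sketch. It assumes the language files sit inside a folder named tessdata within each archive; check your download, since the layout may differ. The function name install_traineddata and the archive name in the usage comment are made up for illustration.

```python
import shutil
import tarfile
from pathlib import Path, PurePosixPath

# Target folder named in the post; Subler reads its language data from here.
SUBLER_TESSDATA = (
    Path.home() / "Library" / "Application Support" / "Subler" / "tessdata"
)


def install_traineddata(archive, dest=SUBLER_TESSDATA):
    """Copy every file found under a 'tessdata' folder in the archive into dest.

    Returns the sorted list of installed file names.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    installed = []
    with tarfile.open(archive) as tar:  # compression is detected automatically
        for member in tar.getmembers():
            parts = PurePosixPath(member.name).parts
            # Assumption: language files live inside a 'tessdata' directory.
            if not member.isfile() or "tessdata" not in parts[:-1]:
                continue
            source = tar.extractfile(member)
            target = dest / parts[-1]  # flatten: keep only the file name
            with source, open(target, "wb") as out:
                shutil.copyfileobj(source, out)
            installed.append(target.name)
    return sorted(installed)


# Usage (hypothetical archive name, adjust to the file you downloaded):
# install_traineddata(Path.home() / "Downloads" / "german-language-data.tar.gz")
```

The function only copies regular files and keeps just their file names, so nothing is written outside the target folder. You may need to restart Subler before the new language shows up.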