Subler subtitle OCR for languages != English

The open-source tool Subler has a handy feature for converting VobSub subtitles to the TX3G format, which is more widely supported by iTunes and other clients (such as Plex).

But the stock version of Subler only supports English text recognition. To recognize German umlauts and other special Latin characters, you need to download extra data files for the OCR library ‘tesseract’.

Subler’s documentation mentions this, but its link is outdated. So here’s the correct link to the data files for the (old) tesseract version that’s bundled with Subler:

https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#data-files-for-version-302

The tar files have to be unpacked, and the data files they contain have to be copied to ~/Library/Application Support/Subler/tessdata
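
The unpack-and-copy step can be scripted. The following is only a minimal sketch: the function name install_tessdata is made up, and it assumes the downloads are gzip-compressed tar archives (*.tar.gz) that contain *.traineddata files somewhere inside. Any other files in the archives are ignored here.

```shell
# Sketch: unpack tesseract language archives and copy the data files
# into Subler's tessdata folder. "install_tessdata" is a hypothetical name.
install_tessdata() {
    # $1: folder holding the downloaded *.tar.gz archives (assumption)
    # $2: target folder; defaults to the path Subler reads from
    src_dir=$1
    dest_dir=${2:-"$HOME/Library/Application Support/Subler/tessdata"}

    mkdir -p "$dest_dir"
    tmp_dir=$(mktemp -d)

    for archive in "$src_dir"/*.tar.gz; do
        [ -e "$archive" ] || continue      # no archives found: nothing to do
        tar -xzf "$archive" -C "$tmp_dir"  # unpack into a scratch folder
    done

    # Copy only the language data files, wherever they sit in the archive
    find "$tmp_dir" -type f -name '*.traineddata' -exec cp {} "$dest_dir"/ \;
    rm -rf "$tmp_dir"
}
```

With the archives saved to ~/Downloads, a call like install_tessdata ~/Downloads should do the trick. Restart Subler afterwards so it picks up the new languages.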

Java :: HTML parsing

I used TagSoup some years ago, but last week I came across ‘JSoup’. It also parses ‘real-world HTML’ and comes with a really neat API for downloading a document and selecting subsets of it.

See for yourself:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

String url = "https://javaspecialists.teachable.com/p/refactoring2j8";

Document doc = Jsoup.connect(url).get();
Elements items = doc.select("a.item");

Extracting subtitles from MPEG files

ccextractor is an open source tool to extract subtitle tracks.

To install it via brew:
brew install ccextractor

To extract the default (English) subtitle to default (SRT) format, just use
ccextractor video.mp4

To extract all subtitles from all m4v files in the current directory (.):
find . -name "*.m4v" -exec ccextractor {} \;

Pushing to GitHub with Eclipse

Since Google Code shut down, I had only migrated my old projects from there to GitHub automatically. Today I pushed my first new project directly to GitHub: https://github.com/krizleebear/Melodies2Go
I won’t go into detail about this project; it’s still a work in progress.

But as a reminder to myself: when I need to push a new GitHub project, here’s the deal:

  1. http://stackoverflow.com/questions/17552457/how-do-i-upload-eclipse-projects-to-github
  2. http://stackoverflow.com/questions/19474186/egit-rejected-non-fast-forward
  3. plus there was a bug using Eclipse Neon.1 – https://www.eclipse.org/forums/index.php/t/1081631/

Audi CES 2014 :: Featuring new HMI and MMI Search

After 3 years of development, and with the premiere at CES 2014, we’re finally allowed to show it to the public:
Our brand new 2014 Audi MMI with a completely new look and feel and the integrated onboard search engine “Audi MMI Search”!

Audi MMI Radio Search, photo: AUDI AG

Quote from Audi’s press release:

“A special highlight of the new Audi MMI is MMI search. This is a practical function that assists the driver in searching for a term, simplifying the search. MMI search is available in every main menu. The results list is shown right away while the user is inputting – generally just a few characters suffice to come up with the term. In the Radio and Media menus, a character string leads directly to the desired radio station, track, album or performer.

MMI search is especially helpful in navigation. When inputting a navigation destination, MMI search permits free text input without having to use a rigid formula. In most cases, just a few characters are sufficient to find any destination in Europe. It is no longer necessary to input the country. In the results display, the MMI takes the current location of the car into consideration, so that hits for the immediate vicinity are displayed first. When searching for a street near the car’s position, it is generally only necessary to input the first few characters of the street name. When looking for a restaurant in any European city, all the user needs to do is input the first characters of the restaurant name and the first characters of the city name separated by a blank character; then the MMI lists relevant hits with addresses.”

Furthermore, we’ve introduced our fully integrated instrument cluster, ready for the launch of the new 2014 TT:

Audi virtual cockpit, integrated MMI cluster instrument in Sport Quattro Concept, photo: AUDI AG

CES 2014, Audi booth

2014 Audi TT interior

Tag MPEG-4 meta data

Finally, I found a free, fast and easy tool to tag MPEG-4 files: MetaZ.
https://github.com/griff/metaz

It’s pretty easy to use. One little detail to be mentioned though:
You can add chapter marks by pasting them in the following format:

00:00:00.000 Intro
00:00:30.000 Antihelden
00:05:00.000 Die Jungs Aus Dem Reihenhaus
00:15:00.000 Fenster Zum Berg
00:20:00.000 Affentanz
00:22:00.000 Manfred Mustermann
00:36:00.000 Solala
00:40:00.000 Safari
00:50:00.000 Liebe & Hass
00:55:45.000 6 Meter 90
01:00:00.000 Blattgold Auf Anthrazit
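
If you only have the chapter start times in seconds, a tiny shell helper can produce lines in the format shown above. This is just a sketch: chapter_line is a made-up name, and it always writes zero-padded fields with .000 milliseconds, since I have not checked which other variants MetaZ accepts.

```shell
# Sketch: print one chapter mark as "HH:MM:SS.000 Title".
# "chapter_line" is a hypothetical helper, not part of MetaZ.
chapter_line() {
    secs=$1   # chapter start in whole seconds
    shift     # the remaining arguments form the title
    printf '%02d:%02d:%02d.000 %s\n' \
        $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60)) "$*"
}

chapter_line 0 Intro
chapter_line 300 "Die Jungs Aus Dem Reihenhaus"
chapter_line 3345 "6 Meter 90"
```

The three calls print the lines for 00:00:00, 00:05:00 and 00:55:45, ready to be pasted into MetaZ.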

downloading Wikipedia

Wikipedia is a high-quality source of information, and it’s free. You can even download its complete content as XML dumps. For example, for the German Wikipedia:

wget -c http://dumps.wikimedia.org/dewiki/latest/dewiki-latest-pages-articles.xml.bz2
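
The dump is a single large bzip2 file, so it can be handy to look into it without unpacking it to disk first. A minimal sketch, assuming bzip2 is installed and that every article in the dump sits in its own <page> element. The function names peek_dump and count_pages are made up for illustration.

```shell
# Sketch: inspect a compressed dump by streaming it through bzip2.
# "peek_dump" and "count_pages" are hypothetical helper names.

# Show the first lines of the XML ($2 = number of lines, default 40)
peek_dump() {
    bzip2 -dc "$1" | head -n "${2:-40}"
}

# Count the lines that contain an opening <page> tag
count_pages() {
    bzip2 -dc "$1" | grep -c '<page>'
}
```

For example, peek_dump dewiki-latest-pages-articles.xml.bz2 20 shows the start of the file. Note that count_pages has to decompress the whole dump, so it will take a while.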

Christmas 2012

Baam & Bua (Bavarian for “tree and boy”)

interesting ports (MacPorts)

  • HandBrake – MPEG tool
  • hexfiend – hex editor
  • macfuse, sshfs – user space file systems
  • textmate2 – text editor
  • wget

unzip all zip files in the current dir, each into its own folder

for file in *.zip; do unzip "$file" -d "${file%.zip}"; done