srt2CSV

March 6th, 2010 | Categories: Tech

Programming is not what I currently do for a living. Personally, I am not that into programming, although it is what I used to do and what part of my degree is in. I feel much more alive in my current job, which deals with the linguistic side of the software field. I should really consider myself one of the lucky people.

Nevertheless, I graduated from the Information Engineering department, and I do not consider myself totally blind when it comes to programming. When there is something I'd like to do and it seems simple enough to implement, I start writing the code myself. In most cases, I finish the code and use it as a standalone program or as a service. That is one of the reasons I keep using a remote web server, where I can schedule cron jobs.

It was the same story this time. I got involved in a translation project called the TED Open Translation Project. First, you translate the transcripts of talks available on the Internet. After the translation, a reviewer is assigned. Finally, after discussion between the translator and the reviewer, the completed translation is published as subtitles. After a few talks, I found it really frustrating and inefficient to communicate with my translator / reviewer by writing things like:

I think the following line,
3 0:00:05 0:00:07 this was a world unseen. それが目にされることのない世界だと気づき
should be modified to
3 0:00:05 0:00:07 this was a world unseen. 世界である事に気がつきました

All of the communication was done via email (or Skype chat). After a while, I couldn't help thinking that this could not be the optimal way to communicate, because:

– The chance of missing a segment that should actually be translated is relatively high.
– You might misspell something when typing it into the subtitle system again (by the way, we use a service called dotSUB).
– You often get confused about which is the most recent version of the translation.
– After exchanging several emails, you may lose track of the earlier discussion about why the translation reads the way it does.
– You might end up translating the same segment more than once.

The first thing I thought of was Google Docs, which offers a spreadsheet that can be shared with others. The problem was that Google Docs does not support the subtitle format (SubRip, or .srt). I don't even know what it would look like in Google Docs if it were supported. Google Docs can, however, import and export CSV (Comma-Separated Values), probably the most primitive format for spreadsheet-like applications. After that flash of inspiration, the rest was rather simple: write code to convert the SubRip file into CSV (and the other way around).
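A conversion like that can be sketched roughly as follows. This is Python rather than the Perl the original scripts relied on, and it is not the actual srt2CSV code. The function names `srt_to_csv` / `csv_to_srt` are hypothetical. So is the column layout (cue number, start, end, original text, translation), which is an assumption loosely modelled on the example rows above.

```python
import csv
import io
import re

# Hypothetical column layout: cue number, start, end, source text, translation.
HEADER = ["index", "start", "end", "original", "translation"]

# Timing line of a SubRip cue, e.g. "00:00:05,000 --> 00:00:07,000".
TIMING = re.compile(r"^\s*(\S+)\s*-->\s*(\S+)")


def srt_to_csv(srt_text):
    """Convert SubRip text into CSV text with an empty translation column."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(HEADER)
    # Drop a leading byte-order mark, normalise Windows line endings,
    # then split the file into cues on blank lines.
    normalised = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalised):
        lines = block.split("\n")
        if len(lines) < 2:
            continue  # not a complete cue
        match = TIMING.match(lines[1])
        if match is None:
            continue  # skip blocks without a timing line
        # Multi-line cues stay in one cell; the csv module quotes the newline.
        text = "\n".join(lines[2:])
        writer.writerow([lines[0].strip(), match.group(1), match.group(2), text, ""])
    return out.getvalue()


def csv_to_srt(csv_text):
    """Convert the CSV back to SubRip, preferring the translation column."""
    cues = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Fall back to the original text for rows nobody has translated yet.
        text = (row.get("translation") or "").strip() or row["original"]
        cues.append(f"{row['index']}\n{row['start']} --> {row['end']}\n{text}\n")
    return "\n".join(cues)
```

With this layout, each cue occupies one spreadsheet row after CSV import. The time codes contain commas, so the csv module quotes them automatically. Once the translation column has been filled in and the sheet is exported as CSV again, `csv_to_srt` would produce a file ready to upload.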

At first, my translation partner was not convinced it was really a good approach. After a couple of translations, he changed his mind. The method was then introduced to the other Japanese translators. Gosh, I must say it felt good :-)

Soon enough (as expected), someone said it didn't work. It turned out that this person had not set up the environment properly. I could see this coming sooner or later: not everyone in the group is familiar with names like "Perl" or "CPAN". So I decided to turn those scripts into a web service, which removes the need to set up an environment on your local machine (although you still need a text editor). Judging from the feedback, most users seem satisfied with it. I must admit, though, that I can already see potential improvements, which I'm too lazy to get down to implementing.

A friend of mine, who actually introduced me to this project, went to California and gave a talk at TEDActive. He said the reaction from other translators was enormous, and the people from dotSUB were also interested in this service. Gosh, I felt great!

A while later, another person asked me exactly how to use the service. To promote it, I had been thinking about making a tutorial ever since I set up the service. So over two weekends, I created some easy tutorial videos and posted them on YouTube. They are not particularly fancy, but they get the bottom line across.

How to import content into Google Docs

How to work on Google Docs

How to import your translation into dotSUB

After creating those videos, I couldn't resist writing an email to the TED project. I asked them to introduce this service, with the above YouTube links, to other TED translators / reviewers. The answer was to post exactly that content on the TED Open Translation discussion board on Facebook.

As I mentioned above, I'm not a professional programmer, and I do not consider myself a brilliant engineer (I prefer working with natural language). But that flash of inspiration, together with a bit of scripting ability, clearly made translation easier for others, and I think I can be proud of that.

Comments


  1. Robert Green
    April 25th, 2010 at 12:44

    Thank you for your csv/srt service. At Miyagi Gakuin University I am teaching a subtitle translation class. We had trouble exporting and importing srt files from/to Dotsub to/from Windows Notepad. The trouble went away once we stopped editing the files in Notepad and started editing them on your srt2csv.html page, without using the submit/reset buttons. In other words, Dotsub > Notepad (EDIT) > Dotsub resulted in "format error", but Dotsub > Notepad > srt2csv.html (EDIT) > Notepad > Dotsub resulted in "your file has been imported". We don't know why it works, but we are glad you made the service.
    Thank you very much.

  2. kira
    April 25th, 2010 at 14:36

    > Robert Green

    Thank you very much for your warm comment. I am really happy that these services are used by various people. Your case even goes beyond my initial scope, since I only expected the services to be used by TED translators.

    About the trouble you experienced: did you use the srt2csv service at all? srt2csv makes some changes to the content so that Google Docs handles it better. If you bypass Google Docs, for example by converting the generated CSV straight back to srt format, it may cause some trouble.

    Another thing I can think of is the encoding. By default, dotSUB seems to expect srt files to be encoded in UTF-8 (at least, that is my assumption). I am not sure whether Notepad in your environment can handle UTF-8, since I'm by no means a Windows user in private. My suggestion is to use a program called "Notepad++", an open-source text editor with many more features than Notepad (which, to be quite frank, barely covers the minimal needs). With Notepad++, you can easily see which encoding a file is saved in, and choose the encoding when you save it.
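    If you would rather run a small script than switch editors, the following minimal Python sketch (not part of the srt2csv service) re-saves a subtitle file as UTF-8 without a byte-order mark. It assumes that a file which is not valid UTF-8 was saved in cp932 (Shift_JIS), the usual legacy encoding on Japanese Windows. `to_utf8` and `convert_file` are hypothetical helper names.

```python
from pathlib import Path


def to_utf8(raw: bytes, fallback: str = "cp932") -> bytes:
    """Re-encode subtitle bytes as UTF-8 without a byte-order mark (BOM)."""
    try:
        # "utf-8-sig" decodes plain UTF-8 and also drops a leading BOM if present.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 was saved by a
        # Japanese Windows editor in its legacy code page (cp932 / Shift_JIS).
        text = raw.decode(fallback)
    return text.encode("utf-8")


def convert_file(path: str, fallback: str = "cp932") -> None:
    """Rewrite a .srt file in place as UTF-8 (keep a backup copy first)."""
    p = Path(path)
    p.write_bytes(to_utf8(p.read_bytes(), fallback))
```

    Guessing the source encoding is the fragile part. If the files come from a non-Japanese setup, the `fallback` argument would have to be changed accordingly.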
