Tuesday, April 29, 2014

Finding Korean public domain works

On Wednesday, April 23, 2014, I attended the second class of the year for the KLTI (Korean Literature Translation Institute) Translation Atelier. This will be my fifth year as a KLTI-affiliated translator, but I'm especially excited about this year's class because it's being led by Sora Kim-Russell, who has translated Shin Kyung-sook's I'll Be Right There (어디선가 나를 찾는 전화벨이 울리고) and Gong Ji-young's Our Happy Time (우리들의 행복한 시간), among other works.

During class, the topic of crowd-sourced and team translation came up, and we discussed the idea of finding a public-domain work to translate together as a class. Thanks to the Internet, there are a variety of sources for Korean works that are out of copyright. Most of these works were written by Korean authors who passed away long ago.

The first source is Wikibooks Korea, which hosts over 14,000 Korean public-domain documents, including works written in Classical Chinese during the Joseon Dynasty.

The second source is the Korea Copyright Commission, which maintains a list of out-of-copyright works by Korean artists and writers from the late 19th century onward. I noticed that most of these works are written in pure Hangul rather than in Classical Chinese.

The class hasn't yet started to discuss the logistics of team translation, but I think using some kind of CAT software would be a good idea. I use OmegaT in my day-to-day translation work, but I have yet to try its team translation feature, which supports git and SVN repos for storing translation memories and glossaries. Google Translator Toolkit (GTT) might also be a possibility, although lately I've heard it's been really slow and unresponsive, and its concordance searching and TM matching are far weaker than those of stand-alone, locally installed CAT applications. Regardless of what tool we end up using, any sort of team translation needs a mechanism for ensuring that translators don't step on each other's toes -- for example, by using multiple spellings for the same object or character, or by introducing other inconsistencies in language use.

Tuesday, April 22, 2014

Using CINT (ROOT) as a REPL for C

The first language I formally learned was Python, and the REPL was invaluable for running quick snippets of code to get a handle on syntax and the behavior of built-in functions. I'm now taking the 2014 offering of CS50x from edX/HarvardX, and one of the languages it introduces is C.

Aside from the syntactic differences from Python, I've found it difficult to get used to the lack of a C REPL. Even running a simple 'hello world' snippet requires me to compile the source file (in the course we use the clang compiler) and then run the output binary.

However, I recently learned that CERN has a numerical analysis package called ROOT, which they use in physics research. The neat thing about this package is that it contains a built-in C/C++ interpreter (CINT) that works like a REPL. From within ROOT, I can evaluate expressions like 2+2 or call printf() without compiling:

[archjun@arch ~]$ root
  *******************************************
  *                                         *
  *        W E L C O M E  to  R O O T       *
  *                                         *
  *   Version   5.34/15  11 February 2014   *
  *                                         *
  *  You are welcome to visit our Web site  *
  *          http://root.cern.ch            *
  *                                         *
  *******************************************

ROOT 5.34/15 (v5-34-15@v5-34-15, Feb 11 2014, 18:58:45 on linuxx8664gcc)

CINT/ROOT C/C++ Interpreter version 5.18.00, July 2, 2010
Type ? for help. Commands must be C++ statements.
Enclose multiple statements between { }.
root [0] 2 + 2
(const int)4
root [1] printf("Hello\n");
Hello
root [2] 

The ROOT REPL kind of reminds me of IPython's interface. One downside of ROOT is the large size of the package -- well over 100 MB -- since it includes lots of analysis libraries that a beginner in C has no use for.

Tuesday, April 15, 2014

WinXP VM provided by Microsoft (ModernIE) is very useful for the Korean ActiveX web environment

I'm a happy user of VirtualBox on Linux. I mostly use it to run WinXP guests on my Linux host so I can use Korean Internet banking services and shop online in Korea. You might ask, "Why do you need an ancient Windows OS just to do Internet banking?"

Some background on this situation: in the late 1990s, prescient Korean bureaucrats decided to implement PKI (Public Key Infrastructure) for all online transactions. Since existing open standards for PKI were still in their early stages back then, the Korean government rolled its own solution, which was implemented through ActiveX plugins for Microsoft's Internet Explorer web browser.

Fast forward 15 years -- the web has evolved, but the Korean PKI standard is still unchanged and all Koreans are forced to keep using old ActiveX plugins that the rest of the world has left in the dustbin of history. Shopping and banking online in Korea is a hugely frustrating experience, as you must download and install new ActiveX plugins almost every time you start a new session. Although nominal Internet speeds are quite fast (several times the speed of American Internet), in practice most Korean users cannot benefit because of all the cruft they are forced to install to use native Korean Internet services (which are still optimized for the IE8 web browser).

Although IE has more than 80% market share in Korea, online banking and shopping sites only support IE's older incarnations. Good luck getting modern versions of IE (11+) working on, say, the websites of Woori Bank or Auction Korea.

On modern Windows OSes, however, more recent versions of IE are installed by default. If you're one of the millions of Koreans still using WinXP, that's not a problem, because IE8 is the most recent version of IE compatible with that venerable OS. For those not using XP, however, using the Korean Internet is often problematic. For example, my girlfriend's computer runs Windows 7 with a modern version of IE, and she sometimes can't use certain Korean sites that require ActiveX.

The solution? Use virtualization -- VirtualBox, VMware, Parallels, etc. As I mentioned earlier, I use Oracle's VirtualBox. In the past I installed WinXP onto a fresh VM using an old .iso from MSDN. Even the most slimmed-down install, with extraneous features omitted and no custom applications installed, came in at just under 10 GB.

Whether or not the virtual disk is dynamically allocated in VirtualBox, it soon balloons to 15 GB or more after installing a few key applications for Korean banking, shopping and word processing (the latter in the form of Hancom's infamous 한글 200x series).

Enter the WinXP VMs provided directly by Microsoft through their ModernIE site -- for VirtualBox on Linux, we import a compressed OVA file that expands into a regular VMDK image file. The bare-bones WinXP VM only takes up 1.9 GB on initial boot! Compare that to ~10 GB for a clean install from .iso or CD.

For users in East Asia, however, the VM cannot be used as-is because the default VM image is a US version of Windows. Prepping the VM requires the following steps (which require a WinXP install CD or .iso from MSDN):

1) Extract the necessary install files from the WinXP install CD or .iso -- copy the entire /i386 directory from the .iso to some directory on the host, then use the VirtualBox "shared folder" feature to make this directory readable by the WinXP guest.

2) Enable East Asian Font Support -- go into Control Panel, open "Regional and Language Options" and make sure the "Install files for East Asian languages" box on the Languages tab is checked. The installer will then ask for the WinXP CD or a location where the installation files reside. If we point the installer to the shared folder /i386/... the language file installation will go forward. The path the installer asks for changes a few times during the install, so you may need to enter the /lang subfolder and then later point to the parent folder again.

3) Install East Asian Language IME (Korean, in my case) -- go into Control Panel and open "Regional and Language Options" once more, but this time go to the input method settings instead of the language file settings. Now choose Korean and make it the default. After a reboot, pressing the Hangul key (Right Alt on non-Korean keyboards) will toggle Korean language input.

There are also some VirtualBox-specific VM settings that you might want to change. The default memory setting is 512 MB, but I changed this to 1024 MB. I also had problems booting the VM until I enabled the I/O APIC option under the VM's "System" settings. If you store your Korean banking PKI certs on a USB thumb drive, you will need to enable USB 2.0 in the VM settings and also separately download the Oracle VM VirtualBox Extension Pack and load it from the VirtualBox Manager (under File -> Preferences -> Extensions).

Rearming the WinXP VM after 30 days

Note that the WinXP VMs from ModernIE will only run for 30 days. After that, a dialog box will appear at boot asking whether you want to activate your copy of Windows. If you click 'No', the system will automatically reboot. Microsoft initially said that taking a snapshot upon first importing the VM and later restoring it would reset the activation clock, but this is actually not the case!

When the 30 days have expired, what I do is restore a clean snapshot (which will still ask for activation) and then boot into Safe Mode (press F8 to bring up the Windows boot menu and select 'Safe Mode'). After booting, open a Command Prompt and type the following:

rundll32.exe syssetup,SetupOobeBnk

The command prints no output, even when the reset succeeds.

Now reboot into a regular session and the activation clock will be reset to give you 30 more days!

Microsoft says that the WinXP VM activation counter can be reset up to 3 times (for a total of 90 days). However, by restoring a snapshot of a clean install, booting into WinXP Safe Mode and resetting the activation from the command prompt, I've been able to use the same VM for more than 90 days!