Monday, January 26, 2015

Bash script to update a Beeminder Pomodoro goal

I sometimes use the Pomodoro technique to help me focus in 25-minute bursts. There is a forced 5-minute break between work blocks and a longer break after every 3-4 pomodoros. I thought it would be cool to write a program that keeps track of pomodoros and integrates with the Beeminder API. Then I realized it would be easier to glue together existing applications from the Linux ecosystem with a bash script.

Ingredients

The Script





Usage

Both pystopwatch and xfce4-timer-plugin allow the user to launch an application when a timer completes its countdown. See screenshots below:



For pystopwatch, first switch the timer to Countdown Mode. Then right-click the face of the timer to open the alarm command dialog. To launch a CLI script, I invoke my terminal program, terminator, with the -e option, which makes it run the command given in double quotes. (Btw, pystopwatch asks that you append & to any alarm command so that it runs in the background without blocking the main program.) You could also simply specify sh ~/path/to/pomo2beeminder.sh to make pystopwatch run the script above.

For xfce4-timer-plugin, you must right-click on the timer icon, select Properties, and click edit on your timer to specify a command to run. Unlike pystopwatch, it is not necessary to append & at the end of your command invocation.

Careful readers will note that the script I am invoking through both timer apps is NOT pomo2beeminder.sh (although this will work fine). I created a separate timer-launched script that first calls mplayer to play a wav file and then executes pomo2beeminder.sh:


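#!/bin/bash
# Script to be executed by pystopwatch or xfce4-timer-plugin
# when the timer for 1 pomodoro completely counts down

MUSIC=~/Music/zen_temple_bell-soundbible-com.wav

# Play the bell first (mplayer blocks until playback ends),
# then run the pomodoro script that asks for confirmation and emails Beeminder
mplayer "$MUSIC"
sh ~/Documents/MyProjects/pomodoro/pomo2beeminder.sh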

This way I hear a sound effect when the timer completes, and then the pomodoro script runs.

Format for emails to the Beeminder bot

When you send an email to bot@beeminder.com, how does the bot know which goal to update and by how much to increment the graph? It parses the goal from the subject line of the email, which must be in the format

username/goalName

The body of the email to the Beeminder bot must contain a line starting with a caret followed by a space and then a positive integer. The graph will be incremented by the integer value. Comments can also be added by placing them in between double quotes. Example:

^ 1 "sent by pomo2beeminder.sh"
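The subject and body format above is easy to generate from a script. The sketch below does this in bash. The helper names beeminder_subject and beeminder_body are hypothetical and are not taken from pomo2beeminder.sh. The positive-integer check simply mirrors the rule described above.

```bash
# Hypothetical helpers (not from the original script) that build the
# subject and body lines in the format the Beeminder bot parses.

# beeminder_subject USER GOAL -> prints "USER/GOAL"
beeminder_subject() {
    printf '%s/%s\n' "$1" "$2"
}

# beeminder_body VALUE [COMMENT] -> prints '^ VALUE "COMMENT"'
beeminder_body() {
    local value=$1 comment=${2:-}
    local re='^[1-9][0-9]*$'
    # The bot expects a positive integer after "^ "
    if ! [[ $value =~ $re ]]; then
        echo "value must be a positive integer: $value" >&2
        return 1
    fi
    if [[ -n $comment ]]; then
        printf '^ %s "%s"\n' "$value" "$comment"
    else
        printf '^ %s\n' "$value"
    fi
}
```

For example, beeminder_body 1 "sent by pomo2beeminder.sh" prints exactly the example line above. beeminder_subject archjun pomodoro prints archjun/pomodoro, where "pomodoro" is a hypothetical goal name.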


smtp-cli parameters

The pomo2beeminder script uses smtp-cli to send email through your webmail provider's SMTP server. I use Gmail's SMTP server, but other webmail services will probably work, too.

Obviously you will have to change the values of the variables HOST (if you don't use Gmail), USER, FROM, TO and SUBJ. You should also create a separate SMTP password file named pass.txt in the same directory as pomo2beeminder.sh; the script reads the value of the variable PW from this file. It is good practice not to hardcode passwords in scripts that might end up in version control. Git users, for example, can keep the password file from being tracked by adding it to .gitignore.

*Note: If you plan to use smtp-cli with Gmail, you cannot use your regular webmail password to send mail through Google's SMTP server. Please refer to a previous blog post of mine, in which I explain how to generate an application-specific password for Gmail that you can use with third-party apps.
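The password-file approach described above can be sketched as follows. This assumes pass.txt holds the password on its first line. read_smtp_password and send_pomodoro are hypothetical names, not the author's code. The smtp-cli option names are an assumption as well, so check smtp-cli --help for your version.

```bash
# Hypothetical helper: read the SMTP password from pass.txt, which lives
# in the same directory as the script (or in the directory given as $1).
read_smtp_password() {
    local dir=${1:-}
    if [[ -z $dir ]]; then
        # Directory of the file this function was loaded from
        dir=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
    fi
    local file="$dir/pass.txt" pw=
    if [[ ! -r $file ]]; then
        echo "cannot read $file" >&2
        return 1
    fi
    # First line only; "|| true" keeps a file without a trailing newline working
    IFS= read -r pw < "$file" || true
    if [[ -z $pw ]]; then
        echo "$file is empty" >&2
        return 1
    fi
    printf '%s\n' "$pw"
}

# Assumed smtp-cli option names (verify with `smtp-cli --help`).
# HOST, USER, FROM, TO and SUBJ are the variables set at the top of the script.
send_pomodoro() {
    local PW
    PW=$(read_smtp_password) || return 1
    smtp-cli --host="$HOST" --user="$USER" --pass="$PW" \
        --from="$FROM" --to="$TO" --subject="$SUBJ" \
        --body-plain='^ 1 "sent by pomo2beeminder.sh"'
}
```

Because the script only stores the path to the password, never the password itself, pass.txt can be listed in .gitignore while the script remains under version control.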

Here's sample output from pomo2beeminder.sh

[archjun@arch pomodoro]$ ./pomo2beeminder.sh
Did you successfully complete your pomodoro?(y/n)
y
Connection from 192.168.0.9:42780 to 74.125.203.108:25
[220] 'mx.google.com ESMTP i3sm10596671pdf.39 - gsmtp'
> EHLO localhost
[250] 'mx.google.com at your service, [220.76.214.86]'
[250] 'SIZE 35882577'
[250] '8BITMIME'
[250] 'STARTTLS'
[250] 'ENHANCEDSTATUSCODES'
[250] 'PIPELINING'
[250] 'CHUNKING'
[250] 'SMTPUTF8'
Starting TLS...
> STARTTLS
[220] '2.0.0 Ready to start TLS'
Using cipher: ECDHE-RSA-AES128-SHA
Subject Name: /C=US/ST=California/L=Mountain View/O=Google Inc/CN=smtp.gmail.com
Issuer  Name: /C=US/O=Google Inc/CN=Google Internet Authority G2
> EHLO localhost
[250] 'mx.google.com at your service, [220.76.214.86]'
[250] 'SIZE 35882577'
[250] '8BITMIME'
[250] 'AUTH LOGIN PLAIN XOAUTH XOAUTH2 PLAIN-CLIENTTOKEN'
[250] 'ENHANCEDSTATUSCODES'
[250] 'PIPELINING'
[250] 'CHUNKING'
[250] 'SMTPUTF8'
AUTH method (LOGIN PLAIN XOAUTH XOAUTH2 PLAIN-CLIENTTOKEN): using LOGIN
> AUTH LOGIN
[334] '....' (redacted)
> .......... (redacted)
[334] '....' (redacted)
> .....
[235] '2.7.0 Accepted'
Authentication of gojun077@gmail.com@smtp.gmail.com succeeded
> MAIL FROM:
[250] '2.1.0 OK i3sm10596671pdf.39 - gsmtp'
> RCPT TO:
[250] '2.1.5 OK i3sm10596671pdf.39 - gsmtp'
> DATA
[354] ' Go ahead i3sm10596671pdf.39 - gsmtp'
[250] '2.0.0 OK 1422308096 i3sm10596671pdf.39 - gsmtp'
> QUIT
[221] '2.0.0 closing connection i3sm10596671pdf.39 - gsmtp'

[archjun@arch pomodoro]$ ./pomo2beeminder.sh
Did you successfully complete your pomodoro?(y/n)
a
Please answer y or n
Did you successfully complete your pomodoro?(y/n)
a
Please answer y or n
Did you successfully complete your pomodoro?(y/n)
;Please answer y or n
Did you successfully complete your pomodoro?(y/n)
q
Please answer y or n
Did you successfully complete your pomodoro?(y/n)
e
Please answer y or n
Did you successfully complete your pomodoro?(y/n)
r
Please answer y or n
Did you successfully complete your pomodoro?(y/n)
ty
Please answer y or n
Did you successfully complete your pomodoro?(y/n)
1111
Please answer y or n
Did you successfully complete your pomodoro?(y/n)
n
Concentrate harder next time!
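The second run above shows the script re-asking until it gets exactly y or n. That prompt loop can be reproduced with a while/case construct like the sketch below. ask_completed is a hypothetical name, and the EOF handling is an addition not visible in the transcript.

```bash
# Hypothetical sketch of the y/n prompt loop shown in the sample output.
# Returns 0 for "y", 1 for "n", and 2 if stdin is closed (EOF).
ask_completed() {
    local answer
    while true; do
        echo "Did you successfully complete your pomodoro?(y/n)"
        # Stop on EOF so a closed stdin cannot loop forever
        read -r answer || return 2
        case $answer in
            y) return 0 ;;
            n) return 1 ;;
            *) echo "Please answer y or n" ;;
        esac
    done
}
```

In a full script, a return value of 0 would lead to sending the email. A return value of 1 would print "Concentrate harder next time!".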



If you have any comments or suggestions about the script, please let me know!

Monday, January 19, 2015

Open Source OCR on Linux using GUI Frontends for Tesseract

Although I use Linux both at home and at work, for some tasks, like OCR for Korean and Chinese, I have had to rely on proprietary software on Windows (ABBYY Finereader provides excellent recognition results, by the way). This is starting to change, thanks to the tesseract OCR engine currently sponsored by Google.

Tesseract has been around for several years, but it wasn't easily accessible before the advent of GUI frontends that make it easy to select the area of an image to be recognized. The two most popular frontends for tesseract are YAGF (which also works with the Cuneiform OCR engine) and gimagereader. Both now use the Qt framework; gimagereader used to be GTK-based, but recent versions can also use Qt.

Screenshot of YAGF



Screenshot of gimagereader


Tesseract's English-language recognition is almost on par with ABBYY Finereader for 300 dpi images, but it is much worse than Finereader on images below 300 dpi. For non-English text, however, and especially for Asian scripts such as CJK (Chinese, Japanese, Korean), tesseract still has a long way to go before it matches Finereader.

YAGF doesn't give the option to use Asian languages, despite the existence of tesseract data files for many Asian languages. For example, here is a listing of the available tesseract-data packages for various languages in Archlinux:

[archjun@lenovoS310 cam1]$ sudo pacman -Ss tesseract-data
[sudo] password for archjun: 
community/tesseract-data-afr 3.02.02-5 (tesseract-data)
    Tesseract OCR data (afr)
...
community/tesseract-data-chi_sim 3.02.02-5 (tesseract-data)
    Tesseract OCR data (chi_sim)
community/tesseract-data-chi_tra 3.02.02-5 (tesseract-data)
    Tesseract OCR data (chi_tra)
...
community/tesseract-data-jpn 3.02.02-5 (tesseract-data)
    Tesseract OCR data (jpn)
...
community/tesseract-data-kor 3.02.02-5 (tesseract-data) [installed]
    Tesseract OCR data (kor)
...
community/tesseract-data-vie 3.02.02-5 (tesseract-data)
    Tesseract OCR data (vie)

Piping this output through wc -l gives a line count of 130. Since each package takes up two lines, that works out to 65 languages supported by tesseract. As the sample output shows, Chinese (simplified and traditional), Japanese, Korean and Vietnamese are all supported. According to the YAGF developer, Asian-language OCR will be added to the GUI menu after the European languages.
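Instead of dividing the line count by two, you can keep only the package lines and extract the language codes. The sketch below assumes the pacman -Ss output format shown above. list_tesseract_langs is a hypothetical helper name. Note that any non-language package whose name starts with tesseract-data- would also be counted.

```bash
# Hypothetical helper: read `pacman -Ss tesseract-data` output on stdin and
# print one language code per package line, e.g. "kor" from
# "community/tesseract-data-kor 3.02.02-5 (tesseract-data) [installed]".
# Indented description lines ("    Tesseract OCR data (kor)") are skipped.
list_tesseract_langs() {
    sed -n 's|^[^/ ]*/tesseract-data-\([^ ]*\) .*|\1|p'
}
```

Usage: pacman -Ss tesseract-data | list_tesseract_langs | wc -l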

Fortunately, gimagereader does support OCR for Asian languages as long as the necessary tesseract language data has been installed; the screenshot of gimagereader shows Korean text being recognized. Unfortunately, tesseract does a poor job of recognizing Korean. I haven't done a meticulous count, but off the top of my head I would estimate the recognition accuracy in the second screenshot above at maybe 70%, which is much worse than ABBYY Finereader. The tesseract-ocr project page offers some tips for improving OCR accuracy, such as increasing the scan resolution and deskewing pages. Image quality doesn't seem to be the main problem here, though: the same scanned image I used to test tesseract on Korean gave 90%+ OCR accuracy in ABBYY Finereader on Windows.
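Both GUIs drive the same tesseract engine, so you can also test a scan from the command line with the tesseract 3.x syntax tesseract IMAGE OUTPUTBASE -l LANG. That command writes OUTPUTBASE.txt. The wrapper below is a hypothetical sketch, not part of either frontend. It assumes the image file has an extension and that the matching tesseract-data package (e.g. tesseract-data-kor) is installed.

```bash
# Hypothetical helper: strip the extension to get tesseract's output base
# (tesseract appends ".txt" itself). Assumes IMAGE has an extension.
ocr_output_base() {
    printf '%s\n' "${1%.*}"
}

# ocr_image IMAGE [LANG]: run tesseract on IMAGE; LANG defaults to "kor"
ocr_image() {
    local img=$1 lang=${2:-kor}
    if [[ ! -f $img ]]; then
        echo "no such image: $img" >&2
        return 1
    fi
    if ! command -v tesseract >/dev/null 2>&1; then
        echo "tesseract is not installed" >&2
        return 127
    fi
    tesseract "$img" "$(ocr_output_base "$img")" -l "$lang"
}
```

For example, ocr_image scans/page1.png kor would leave the recognized text in scans/page1.txt.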

My conclusion: circa Jan. 2015, tesseract is good for English, not so good for Hangul/Korean.

Monday, January 12, 2015

Dell Latitude D630 from 2007 has Gigabit Ethernet!

Today I looked up the specs for my 2007 Dell Latitude D630 and noticed that it supposedly supports Gigabit Ethernet.

I ran lspci and ethtool to get more info on the Ethernet controller:

[archjun@arch ~]$ ethtool -i enp9s0
driver: tg3
version: 3.137
firmware-version: 5755m-v3.29
bus-info: 0000:09:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
[archjun@arch ~]$ ethtool enp9s0
Settings for enp9s0:
Supported ports: [ TP ]
Supported link modes:   10baseT/Half 10baseT/Full 
                        100baseT/Half 100baseT/Full 
                        1000baseT/Half 1000baseT/Full 
Supported pause frame use: No
Supports auto-negotiation: Yes
Advertised link modes:  10baseT/Half 10baseT/Full 
                        100baseT/Half 100baseT/Full 
                        1000baseT/Half 1000baseT/Full 
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Link partner advertised link modes:  10baseT/Half 10baseT/Full 
                                     100baseT/Half 100baseT/Full 
                                     1000baseT/Full 
Link partner advertised pause frame use: Symmetric
Link partner advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
MDI-X: off
Cannot get wake-on-lan settings: Operation not permitted
Current message level: 0x000000ff (255)
       drv probe link timer ifdown ifup rx_err tx_err
Link detected: yes


[archjun@arch ~]$ lspci
...
09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5755M Gigabit Ethernet PCI Express (rev 02)

lspci identifies the controller as a Broadcom NetXtreme BCM5755M Gigabit Ethernet card. ethtool confirms that the link speed is 1000Mb/s and shows that the card uses the tg3 driver.
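To check just the negotiated speed and the driver without reading through the full output, you can filter ethtool's output. The sketch below parses the "Speed:" and "driver:" lines shown above. link_speed and nic_driver are hypothetical helper names. On most Linux systems, /sys/class/net/IFACE/speed also reports the speed in Mb/s while the link is up.

```bash
# Hypothetical helper: print the "Speed:" value from `ethtool IFACE` output
# on stdin (real ethtool output indents this line with a tab)
link_speed() {
    awk -F': *' '/^[[:space:]]*Speed:/ {print $2}'
}

# Hypothetical helper: print the "driver:" value from `ethtool -i IFACE`
nic_driver() {
    awk -F': *' '/^driver:/ {print $2}'
}
```

Usage: ethtool enp9s0 | link_speed and ethtool -i enp9s0 | nic_driver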

I didn't know my old Latitude D630 had Gigabit Ethernet because, until recently, I never had a chance to connect it to Gigabit-capable network hardware. Ironically, this 8-year-old notebook has more capable Ethernet than my Lenovo S310 work laptop, which only supports up to 100Mb/s and uses the Realtek r8169 driver:

[archjun@lenovoS310 bin]$ sudo ethtool enp1s0
[sudo] password for archjun: 
Settings for enp1s0:
Supported ports: [ TP MII ]
Supported link modes:   10baseT/Half 10baseT/Full 
                       100baseT/Half 100baseT/Full 
Supported pause frame use: No
Supports auto-negotiation: Yes
Advertised link modes:  10baseT/Half 10baseT/Full 
                       100baseT/Half 100baseT/Full 
Advertised pause frame use: Symmetric Receive-only
Advertised auto-negotiation: Yes
Speed: 10Mb/s
Duplex: Half
Port: MII
PHYAD: 0
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: pumbg
Wake-on: g
Current message level: 0x00000033 (51)
      drv probe ifdown ifup
Link detected: no
[archjun@lenovoS310 bin]$ sudo ethtool -i enp1s0
driver: r8169
version: 2.3LK-NAPI
firmware-version: 
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no
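The two outputs above differ mainly in the "Supported link modes" block: it tops out at 1000baseT on the D630 and at 100baseT on the S310. The sketch below extracts the fastest supported mode from that block only, so the "Advertised" and "Link partner" lines are ignored. max_supported_mode is a hypothetical helper name, and it assumes the block ends at the "Supported pause frame use" line, as in both outputs above.

```bash
# Hypothetical helper: read `ethtool IFACE` output on stdin and print the
# fastest mode listed under "Supported link modes" (e.g. "1000baseT").
max_supported_mode() {
    awk '/Supported link modes:/ {in_modes=1}
         /Supported pause frame use:/ {in_modes=0}
         in_modes' |
        grep -o '[0-9][0-9]*baseT' | sort -n -u | tail -n 1
}
```

Usage: sudo ethtool enp1s0 | max_supported_mode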

Monday, January 5, 2015

Using a Virtualbox VM stored on a shared partition in several different OSes

On the notebook I use at work, I have one shared 250 GB btrfs partition. It stores all my RHEL, CentOS, Debian/Ubuntu, etc. installation .iso images as well as my Virtualbox VM .vdi files. The remaining space on the 500 GB disk is divided into separate /boot partitions for various Linux distros. The system partitions for each distro reside within LUKS containers holding LVM volumes: 1 PV, 1 VG, and several logical volumes within the VG for /, /home, /var and swap.

Since all the Virtualbox .vdi virtual disk images reside on the shared 250 GB partition, we can import the .vdi files into Virtualbox from each OS (e.g. Win7, Ubuntu, Arch...) using the VirtualBox Manager. This lets us use the same VMs from multiple OSes.
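The same import can be scripted with the VBoxManage command-line tool, which is handy when you repeat it on each host OS. The sketch below creates and registers a VM, adds a SATA controller, and attaches the existing .vdi from the shared partition. The function name, the controller name "SATA" and the DRY_RUN switch are hypothetical choices, and the VM gets VirtualBox's default settings apart from the attached disk.

```bash
# Hypothetical helper: register a new VM on this host that boots from an
# existing .vdi on the shared partition.
# Usage: register_shared_vdi NAME VDI_PATH [OSTYPE]
# Set DRY_RUN=1 to print the VBoxManage commands instead of running them.
register_shared_vdi() {
    local name=$1 vdi=$2 ostype=${3:-Other}
    local run=()
    if [[ ${DRY_RUN:-0} == 1 ]]; then
        run=(echo)
    fi
    "${run[@]}" VBoxManage createvm --name "$name" --ostype "$ostype" --register &&
    "${run[@]}" VBoxManage storagectl "$name" --name SATA --add sata &&
    "${run[@]}" VBoxManage storageattach "$name" --storagectl SATA \
        --port 0 --device 0 --type hdd --medium "$vdi"
}
```

For example: DRY_RUN=1 register_shared_vdi centos65 /mnt/shared/centos65.vdi RedHat_64 (the VM name and path are hypothetical).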

There are some caveats, however. If you want to use the VMs from multiple OSes, you must not keep any saved states; when you finish a session, be sure to shut the VM down properly. This is particularly important for WinXP and Win7 VMs. In the case of XP, I had a saved state created on CentOS 7. When I then tried to load the VM from Arch, the image was completely corrupted and would no longer boot. In the case of a Win7 VM, even if there are no saved states, Win7 will restart itself on the first boot from a new host OS to apply new configurations for the "new" host.
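Before switching host OSes, you can check that a VM has no saved state. VBoxManage showvminfo NAME --machinereadable prints a VMState="..." line, which reads "poweroff" after a proper shutdown and "saved" when a saved state exists. The helpers below are hypothetical names that parse that line. A running VM can be shut down properly with VBoxManage controlvm NAME acpipowerbutton. VBoxManage discardstate NAME drops an existing saved state, losing whatever was unsaved in the guest.

```bash
# Hypothetical helper: print the VMState value from
# `VBoxManage showvminfo NAME --machinereadable` output on stdin
vm_state() {
    sed -n 's/^VMState="\(.*\)"$/\1/p'
}

# Succeeds only if the VM was properly powered off (no saved state)
is_powered_off() {
    [[ $(vm_state) == poweroff ]]
}
```

Usage: VBoxManage showvminfo winxp --machinereadable | is_powered_off && echo "safe to use from another host" (the VM name is hypothetical)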

Another issue is that the MAC addresses of existing network interfaces are randomized when an existing VM is imported into Virtualbox on a different host. This was a problem for me because my CentOS 6.5 VM serves as a PXE installation server. Within the VM there is a single network interface, eth0, with the following MAC address. (Note that the host OS may use a different name for its wired network interface, e.g. enp1s0.)


The original MAC for our network interface is 08:00:27:3C:8A:03, but when we imported the CentOS 6.5 VM into Virtualbox on an Archlinux host, Virtualbox changed it to a random value.

When I started the CentOS 6.5 VM, I noticed that eth0 failed to come up with the error "interface eth0 does not exist", and the OS created a new network interface, eth1, instead. This was a problem because my DHCP server config and PXE install scripts all assume that the network interface is eth0.

Solution

I shut down the VM and, using the original MAC noted above, replaced the randomized MAC address with 08:00:27:3C:8A:03 in the VirtualBox Manager's network settings. When I started the VM again, eth0 came up at boot without any problems.
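The same fix can be applied from the command line with VBoxManage modifyvm NAME --macaddress1 MAC. That command expects the MAC as 12 hex digits without colons. The sketch below converts the colon-separated form and applies it. vbox_mac and restore_mac are hypothetical names, and the sketch assumes eth0 corresponds to the VM's first network adapter (hence --macaddress1).

```bash
# Hypothetical helper: convert "08:00:27:3C:8A:03" (or dash-separated)
# into the 12-hex-digit form VBoxManage expects, e.g. "0800273C8A03"
vbox_mac() {
    local mac=${1//[-:]/}
    mac=${mac^^}
    local re='^[0-9A-F]{12}$'
    if ! [[ $mac =~ $re ]]; then
        echo "not a MAC address: $1" >&2
        return 1
    fi
    printf '%s\n' "$mac"
}

# restore_mac VM MAC: set adapter 1's MAC (the VM must be powered off)
restore_mac() {
    local vm=$1 mac
    mac=$(vbox_mac "$2") || return 1
    VBoxManage modifyvm "$vm" --macaddress1 "$mac"
}
```

For example: restore_mac centos65 08:00:27:3C:8A:03 (the VM name is hypothetical).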