Saturday, August 27, 2016

Finding PID for programs - Why 'pidof foo' is less trustworthy than 'ps -eF | grep foo'

I often use pidof to find the PID of a running program. It works well with dhclient, dnsmasq and other executable binaries. But pidof runs into problems when trying to find the PID of a script that is run by an interpreter. Take, for example, deluge-gtk, the BitTorrent client written in Python 2.

[archjun@pinkS310 bin]$ pidof deluge-gtk
[archjun@pinkS310 bin]$ ps -eF | grep "deluge*" | grep -v grep
archjun  25862     1  3 289160 89272  1 16:47 ?        00:01:30 /usr/bin/python2 /usr/bin/deluge-gtk /MULTIMEDIA/Torrents/CentOS-5.5-x86_64-bin-DVD.torrent

In the first case, pidof fails to return any PID for the deluge-gtk executable file. In the second case, grepping for deluge-gtk in the output of ps -eF (all processes, extra full format) correctly returns the PID of the BitTorrent client which is executed by Python 2.

Let's take a look at the contents of the deluge-gtk executable file:

[archjun@pinkS310 bin]$ cat /usr/bin/deluge-gtk
#!/usr/bin/python2
# EASY-INSTALL-ENTRY-SCRIPT: 'deluge==1.3.13.dev0','gui_scripts','deluge-gtk'
__requires__ = 'deluge==1.3.13.dev0'
import sys
from pkg_resources import load_entry_point

if __name__ == '__main__':
    sys.exit(
        load_entry_point('deluge==1.3.13.dev0', 'gui_scripts', 'deluge-gtk')()
    )

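ps -eF is more useful here because it prints the full command line of every process. The kernel reads the #!/usr/bin/python2 line and runs the interpreter, so the process is really python2 with /usr/bin/deluge-gtk as its first argument. pidof matches on the program name, which is python2 rather than deluge-gtk, whereas grep finds the script path anywhere in the command line. pgrep -f deluge-gtk, which also matches against the full command line, is another option.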
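The idea of matching against the whole command line rather than the program name can be sketched in a few lines of Python. This is only an illustration: find_pids is a name I made up, the input is assumed to be in the format printed by ps -eo pid,args, and the /sbin/init row in the sample is invented filler.

```python
def find_pids(ps_output, needle):
    """Return PIDs whose full command line contains needle.

    ps_output is assumed to look like the output of `ps -eo pid,args`:
    one header line, then one "PID COMMAND..." line per process.
    """
    pids = []
    for line in ps_output.splitlines()[1:]:   # skip the header line
        fields = line.split(None, 1)           # PID, then the whole command line
        if len(fields) == 2 and needle in fields[1]:
            pids.append(int(fields[0]))
    return pids


# Sample modelled on the deluge-gtk process shown above (init row is filler).
sample = """\
  PID COMMAND
    1 /sbin/init
25862 /usr/bin/python2 /usr/bin/deluge-gtk /MULTIMEDIA/Torrents/CentOS-5.5-x86_64-bin-DVD.torrent
"""
print(find_pids(sample, "deluge-gtk"))
```

On a live system the text could come from subprocess.check_output(["ps", "-eo", "pid,args"], universal_newlines=True).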

Saturday, August 20, 2016

Web scraping using lynx and shell utilities

In 2016, many people would probably reach for Python modules such as BeautifulSoup, urllib, or requests to scrape and parse web pages. These are good choices, but in some cases it can be quicker to scrape web pages with the text browser lynx and parse the results with grep, awk, and sed.

My use case is as follows: I want to programmatically generate a list of rpm packages from Fedora's EPEL X (5, 6, 7), CentOS vault, CentOS mirror, and HP DL server firmware sites. I want this list to be comparable to the output of rpm -qa on RHEL machines. Here are some sample URLs for sites showing rpm package lists:

http://vault.centos.org/5.7/updates/x86_64/RPMS/
http://mirror.centos.org/centos-5/5.11/os/x86_64/CentOS/
https://dl.fedoraproject.org/pub/epel/6/x86_64/
http://mirror.centos.org/centos-7/7.2.1511/updates/x86_64/Packages/
http://downloads.linux.hpe.com/repo/spp/rhel/6/x86_64/2016.04.0_supspp_rhel6.8_x86_64/

If you visit any of these links you will find that the basic format is the same -- from the left, the first field is an icon, the second field is the rpm filename, the third field is the date in YYYY-MM-DD, the fourth field is time in HH:MM, and the fifth field is file size.

Here is my bash script which parses file list html pages into a simple text file:


You can see that lynx renders the page from HTML into regular text and dumps this output to a file if you pass the -dump option. But this is not enough, because lynx by default inserts a newline character in lines greater than 79 characters. To avoid this problem, you must manually set the line width to something larger. The maximum width in lynx is 990 characters, so I specified this value through the option -width=990. Finally the -nolist option removes the list of links that lynx inserts at the bottom of the page.

Using grep I then extract just the lines containing the string ".rpm". Next I replace all tabs with 4 spaces using sed and then use awk to print just the filename field. Finally I use sed to remove the ".rpm" extension from the filenames so that the output matches the format of rpm -qa:

sed "s:\(\.rpm\)::g" "${F3}" > "$2"

If you don't escape the '.' in ".rpm" with a backslash, it will be interpreted as the regex "match any character", so the pattern would also match strings like "-rpm" (as in "redhat-rpm-config"). This is undesirable.
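The effect of the unescaped dot is easy to reproduce with Python's re module, which treats '.' the same way sed does. The filename below is an illustrative example and the two helper names are my own.

```python
import re


def strip_unescaped(name):
    # '.' matches any character, so "-rpm" inside the name is removed too
    return re.sub(r".rpm", "", name)


def strip_escaped(name):
    # '\.' matches only a literal dot, so just the extension is removed
    return re.sub(r"\.rpm", "", name)


filename = "redhat-rpm-config-9.0.3-42.el6.noarch.rpm"
print(strip_unescaped(filename))  # redhat-config-9.0.3-42.el6.noarch
print(strip_escaped(filename))    # redhat-rpm-config-9.0.3-42.el6.noarch
```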

BTW this script is for informational and educational purposes only. It would actually be easier to just invoke lynx with lynx -dump -listonly ... and skip the data munging steps of replacing tabs with spaces using sed. If you do it this way you will get just the links to rpm files from EPEL, CentOS mirror, etc. Then you can return just the filename from each link's path with awk:

awk -F'/' '{ print $NF }'
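For comparison, here is a rough Python 3 equivalent of that last step: take the final path component of each link and drop the ".rpm" extension. The function name package_name and the example-1.0-1.el5 package are hypothetical; only the vault.centos.org directory comes from the list of sample URLs.

```python
import posixpath
from urllib.parse import urlparse


def package_name(url):
    """Return the last path component of url without a trailing '.rpm'."""
    filename = posixpath.basename(urlparse(url).path)
    if filename.endswith(".rpm"):
        filename = filename[:-len(".rpm")]
    return filename


# The rpm filename here is a made-up placeholder.
link = "http://vault.centos.org/5.7/updates/x86_64/RPMS/example-1.0-1.el5.x86_64.rpm"
print(package_name(link))
```

Checking the suffix with endswith avoids regex escaping altogether.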




Saturday, August 13, 2016

Differences in binary file sizes between RHEL and CentOS

CentOS maintains binary compatibility with Red Hat Enterprise Linux, so applications which run on certain versions of RHEL should be able to run without changes on analogous versions of CentOS. Recently, however, a client asked me why executable binaries from the initscripts package (which contains /bin/ipcalc, /bin/usleep, etc.) on RHEL 6.X have slightly different file sizes from those in the CentOS 6.X initscripts package.

First, I needed to verify that the source code in the initscripts srpm's for RHEL and CentOS was identical.

I downloaded initscripts-9.03.46-1.el6.src.rpm for RHEL 6.6 from the Red Hat partner site and I downloaded initscripts-9.03.46-1.el6.centos.src.rpm from CentOS vault at the following URL:

http://vault.centos.org/6.6/os/Source/SPackages/initscripts-9.03.46-1.el6.centos.src.rpm

I then unpacked the RHEL 6.6 initscripts source rpm as follows:

[root@localhost srpm]# rpm2cpio initscripts-9.03.46-1.el6.src.rpm | cpio -idmv
initscripts-9.03.46.tar.bz2
initscripts.spec
3146 blocks
[root@localhost srpm]# ls
initscripts-9.03.46-1.el6.src.rpm  initscripts-9.03.46.tar.bz2  initscripts.spec
[root@localhost srpm]# tar -xvf initscripts-9.03.46.tar.bz2
initscripts-9.03.46/
initscripts-9.03.46/.gitignore
initscripts-9.03.46/.tx/
initscripts-9.03.46/.tx/config
initscripts-9.03.46/COPYING
initscripts-9.03.46/Makefile
...

I also did the same for the CentOS 6.6 initscripts package. I then renamed the directories for the extracted srpm's and then used the meld GUI diff tool to compare the .../src as well as the entire extracted initscripts srpm directories for RHEL 6.6 and CentOS 6.6.

As you can see below, the contents of the srpm's are identical:



Compiler options are contained within .../src/Makefile and the options are identical, as you can see from the diff results above. So the binary size differences are not due to differences in the source code, compiler options, or rpm spec file between RHEL and CentOS.

Next, I did a simple C program compilation test of my own using gcc on a stock installation of RHEL 6.6 and CentOS 6.6.

Here is a simple hello world program I have named hello.c:

#include <stdio.h>

int main(void)
{
  printf("hello world!\n");
}

If I compile it with gcc with the following options

gcc hello.c -O0 -std=c99 -Wall -Werror -o hello

I still get slightly different file sizes on RHEL and CentOS:

RHEL 6.6
[root@localhost pset1]# ls -al hello
-rwxr-xr-x. 1 root root 6473 Aug  9 08:31 hello

CentOS 6.6
[root@localhost pset1]# ls -al hello
-rwxr-xr-x. 1 root root 6425 Aug 11 06:15 hello

This is a difference of 48 bytes.

I then used objdump from binutils to examine the contents of the compiled hello binaries, which I renamed hello_rhel66 and hello_cent66, respectively. I am using the -s option with objdump so I can see the full contents of every section as hex alongside the ASCII representation.

[fedjun@u36jcFedora Downloads]$ objdump -s hello_rhel66 > hello_rhel66.dump
[fedjun@u36jcFedora Downloads]$ objdump -s hello_cent66 > hello_cent66.dump
[fedjun@u36jcFedora Downloads]$ diff -u hello_rhel66.dump hello_cent66.dump
--- hello_rhel66.dump 2016-08-13 10:07:48.893239117 +0900
+++ hello_cent66.dump 2016-08-13 10:08:02.078435160 +0900
@@ -1,5 +1,5 @@

-hello_rhel66:     file format elf64-x86-64
+hello_cent66:     file format elf64-x86-64

 Contents of section .interp:
  400200 2f6c6962 36342f6c 642d6c69 6e75782d  /lib64/ld-linux-
@@ -9,8 +9,8 @@
  40022c 00000000 02000000 06000000 12000000  ................
 Contents of section .note.gnu.build-id:
  40023c 04000000 14000000 03000000 474e5500  ............GNU.
- 40024c 4cc6b3fd d6ec9bb6 e4540da0 aba4807f  L........T......
- 40025c 0f84997f                             ....            
+ 40024c 69320cbb e7408021 2c646e86 8344b173  i2...@.!,dn..D.s
+ 40025c 5e478671                             ^G.q            
 Contents of section .gnu.hash:
  400260 01000000 01000000 01000000 00000000  ................
  400270 00000000 00000000 00000000           ............    
@@ -137,7 +137,4 @@
 Contents of section .comment:
  0000 4743433a 2028474e 55292034 2e342e37  GCC: (GNU) 4.4.7
  0010 20323031 32303331 33202852 65642048   20120313 (Red H
- 0020 61742034 2e342e37 2d313029 00474343  at 4.4.7-10).GCC
- 0030 3a202847 4e552920 342e342e 37203230  : (GNU) 4.4.7 20
- 0040 31323033 31332028 52656420 48617420  120313 (Red Hat 
- 0050 342e342e 372d3131 2900               4.4.7-11).      
+ 0020 61742034 2e342e37 2d313129 00        at 4.4.7-11).

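The diff shows that the .note.gnu.build-id and .comment sections differ between the two binaries (.interp appears only as unchanged context). The build ID is a 20-byte hash in both files, so it does not affect the size. The .comment section, however, holds two GCC version strings on RHEL (4.4.7-10 and 4.4.7-11) and only one on CentOS (4.4.7-11), which makes it 90 bytes long on RHEL versus 45 bytes on CentOS. That accounts for 45 of the 48 bytes; the remaining 3 bytes are presumably alignment padding. I believe the same holds true for each of the individual binaries from the initscripts package on RHEL 6.6 and CentOS 6.6: the binaries can carry different comments and build IDs, and comment strings of different lengths lead to different binary file sizes.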
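To check how much of the 48-byte difference comes from the .comment section, the hex columns from the dump above can be decoded directly. The sketch below copies those hex values (the first two rows are shared context lines in the diff) and counts the bytes; decode_section is just a helper name I chose.

```python
# Hex columns of the .comment section, copied from the objdump -s output above.
RHEL_COMMENT_HEX = """
4743433a 2028474e 55292034 2e342e37
20323031 32303331 33202852 65642048
61742034 2e342e37 2d313029 00474343
3a202847 4e552920 342e342e 37203230
31323033 31332028 52656420 48617420
342e342e 372d3131 2900
"""

CENTOS_COMMENT_HEX = """
4743433a 2028474e 55292034 2e342e37
20323031 32303331 33202852 65642048
61742034 2e342e37 2d313129 00
"""


def decode_section(hex_dump):
    """Return (size in bytes, list of NUL-terminated strings) for a hex dump."""
    data = bytes.fromhex("".join(hex_dump.split()))
    strings = [s.decode("ascii") for s in data.split(b"\0") if s]
    return len(data), strings


rhel_size, rhel_strings = decode_section(RHEL_COMMENT_HEX)
centos_size, centos_strings = decode_section(CENTOS_COMMENT_HEX)
print(rhel_size, rhel_strings)
print(centos_size, centos_strings)
print("difference:", rhel_size - centos_size)  # difference: 45
```

The RHEL binary carries one extra 45-byte string (44 characters plus the terminating NUL), which covers all but 3 bytes of the size difference seen with ls.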


Saturday, August 6, 2016

RHEL 5.X, 6.X - Converting RHEL to CentOS using shell scripts and Python 2

A client recently asked my company to help them convert Red Hat Enterprise Linux 5.11 and 6.8 to the comparable versions of CentOS. Before I present the scripts I wrote to automate this process, I want to say that I believe there is a lot of value in a RHEL subscription. Better yet, Red Hat employs full-time kernel developers who submit patches and updates to the Linux open source project. In effect, supporting Red Hat indirectly supports the development of Linux as an operating system.

That being said, CentOS has full binary compatibility with RHEL and the only difference is that RHEL-specific logos and artwork are not present in CentOS. On the Internet there are lots of blog posts on converting from RHEL to CentOS but many of these guides are incomplete or provide incorrect information. In order to save others the trial-and-error required to fine-tune this process, I am sharing the scripts I wrote to automate the conversion process. For the purposes of this post, I will be converting RHEL 5.11 and 6.8 to CentOS 5.11 and 6.8 but the process applies to all 5.X and 6.X versions.

Step 1
Generate a baseline of all packages from the RHEL installation DVD. The script I wrote takes one parameter, the path to the installation ISO mount point, and outputs a text file that can be compared with the local output of rpm -qa (which lists all packages currently installed on a RHEL system).
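The general idea of Step 1 can be sketched in Python as follows. This is not the script referred to above, only a minimal illustration under my own assumptions: baseline_packages is a made-up name, the package files are assumed to sit somewhere under the mount point with a ".rpm" suffix, and the demo uses a throwaway directory with fake filenames instead of a real ISO. It is written for a current Python 3.

```python
import os
import shutil
import tempfile


def baseline_packages(mount_point):
    """Return sorted package names (without '.rpm') found under mount_point."""
    names = set()
    for _dirpath, _dirnames, filenames in os.walk(mount_point):
        for filename in filenames:
            if filename.endswith(".rpm"):
                names.add(filename[:-len(".rpm")])
    return sorted(names)


# Demo on a temporary directory standing in for the ISO mount point.
demo_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_dir, "Packages"))
for fake in ("example-1.0-1.el6.x86_64.rpm", "sample-2.1-3.el6.noarch.rpm", "TRANS.TBL"):
    open(os.path.join(demo_dir, "Packages", fake), "w").close()
demo_packages = baseline_packages(demo_dir)
shutil.rmtree(demo_dir)
print(demo_packages)
```

Writing one name per line to a text file then gives something that can be compared line by line with rpm -qa output.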


Step 2
Generate a list of packages which differ from the baseline packages on the RHEL installation ISO. This step is necessary because the RHEL to CentOS conversion replaces packages with the stock CentOS package versions. Any errata packages installed on your RHEL system will end up at those stock versions as well. Therefore you need to generate a list of local packages which differ from the rpm's on the RHEL installation DVD/ISO.
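At its core this comparison is a set difference between two package lists. A minimal sketch, with a hypothetical function name and made-up package versions (it is not the actual diff script):

```python
def packages_not_in_baseline(installed, baseline):
    """Return installed package names that are absent from the baseline list."""
    baseline_set = set(baseline)
    return sorted(pkg for pkg in installed if pkg not in baseline_set)


# Made-up package lists: 'installed' as from rpm -qa, 'baseline' as from Step 1.
baseline = ["bash-4.1.2-40.el6.x86_64", "openssl-1.0.1e-48.el6.x86_64"]
installed = ["bash-4.1.2-40.el6.x86_64", "openssl-1.0.1e-48.el6_8.1.x86_64"]
print(packages_not_in_baseline(installed, baseline))
```

Here the openssl entry is reported because its release string differs from the baseline, which is the situation an installed errata package would produce.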

RHEL 5.X ships with Python 2.4.3, so this is the Python version I wrote the script for. 2.4 was a new experience for me as I am accustomed to using the argparse module from Python 2.7.X instead of optparse. Also, 2.4 doesn't support the file-opening idiom with open(filename, 'r') as foo: ... Instead you have to do something like

f = open(filename, 'r')
...
f.close()

and remember to manually close file objects after opening them. Fortunately, Python 2.6.X which is used on RHEL 6.X, and Python 2.7.X which is used on RHEL 7.X are backwards-compatible with 2.4.X, so my script works fine on RHEL 6.X, too.
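One way to make sure the file is closed even if an exception is raised while reading is to wrap the work in try/finally, which does not need the with statement. The helper below is a sketch with a name of my own choosing; it avoids newer syntax and should also run on old interpreters, although I have only exercised it on a current Python.

```python
import os
import tempfile


def read_lines(filename):
    """Read a text file into a list of lines without using 'with'."""
    f = open(filename, 'r')
    try:
        return [line.rstrip('\n') for line in f]
    finally:
        f.close()   # runs whether or not reading raised an exception


# Small demo using a temporary file with made-up package names.
fd, path = tempfile.mkstemp()
os.write(fd, "pkg-a-1.0-1.el5.i386\npkg-b-2.0-1.el5.x86_64\n".encode("ascii"))
os.close(fd)
demo_lines = read_lines(path)
os.remove(path)
print(demo_lines)
```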

One issue I encountered when writing the Python script above is that rpm -qa on RHEL 5.X by default does not return the package architecture (e.g. i386, x86_64, i686). To also see package architecture you have to edit /usr/lib/rpm/macros and add the architecture field in the following format:

%_query_all_fmt %%{name}-%%{version}-%%{release}.%%{arch}

Fortunately, this is the default on RHEL 6.X, so the setting only needs to be changed on RHEL 5.X. With it in place you can compare the baseline rpm package list generated in Step 1 with the output of rpm -qa in Step 2.
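With the architecture included, each line has the shape name-version-release.arch and can be split from the right, because the version and release fields themselves never contain a hyphen. A small sketch (split_nvra is a hypothetical helper, and the x86_64 suffix on the initscripts example is my assumption):

```python
def split_nvra(package):
    """Split 'name-version-release.arch' into its four fields."""
    nvr, arch = package.rsplit(".", 1)           # arch follows the last dot
    name, version, release = nvr.rsplit("-", 2)  # name may itself contain hyphens
    return name, version, release, arch


print(split_nvra("initscripts-9.03.46-1.el6.x86_64"))
print(split_nvra("redhat-rpm-config-9.0.3-42.el6.noarch"))
```

Splitting from the right keeps hyphenated package names such as redhat-rpm-config intact.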

Step 3

Run the conversion scripts for RHEL 5.11 and RHEL 6.8. Note that before running the scripts you must have mounted the appropriate CentOS installation iso on a mount point which you will specify to the script as a parameter.

Here is the script for RHEL 5.X:

and here is the script for RHEL 6.X:

The big difference between RHEL 5.X and 6.X is that for 6.X you need to rebuild the initial ramdisk image to remove the RHEL progress bar at boot.

Note that the script edits entries in /boot/grub/grub.conf so that there will be no references to RHEL in the grub boot menu.


Step 4

Reboot the system and manually upgrade the packages highlighted by rhel-baseline-diff.py in the output file pkg-diff.txt (upgrading errata rpm's to equivalent CentOS versions will require you to separately download and install the relevant packages).

Questions and comments (especially about how to improve the scripts) are welcome!