October 03, 2013

Finding the reason AIX cannot boot (debugging the AIX kernel)

If the boot hangs at the following point, or slightly after it, the steps below show how to enter debug mode and find the real cause of the problem.

-------------------------------------------------------------------------------
                                Welcome to AIX.
                       boot image timestamp: 10:31 08/24
                 The current time and date: 15:58:17 03/14/2011
       processor count: 2;  memory size: 19712MB;  kernel size: 27451313
boot device: /pci@800000020000204/fibre-channel@0/disk@203500a0b8479d46,0000000000000000
                        kernel debugger setting: invoked
-------------------------------------------------------------------------------
Entering debug mode

In an abnormal situation, such as a machine that fails to boot when you have no boot media to enter maintenance mode and enable KDB, how do you proceed? On chrp-type machines, Open Firmware provides a function that enables KDB with the normal boot image.

A. Make sure the first boot device is set to the device you want to debug-boot from (in the SMS menu).

B. Go to the Open Firmware ok prompt and enter:

ok> boot -s trap

In practice, the console shows the following:

1 = SMS Menu 5 = Default Boot List
8 = Open Firmware Prompt 6 = Stored Boot List

memory keyboard network scsi speaker ok

Press 8 to enter the Open Firmware prompt, then run:

0 > boot -s trap
KDB(0)> mw enter_dbg

At the first prompt, type the number **42**; at the next prompt, type a dot (**.**). Then continue the boot:

KDB(0)> g

The complete exchange looks like this:

KDB(0)> mw enter_dbg
enter_dbg+000000:  00000000  = **42**
uexcept_anchor+000000:  00000000  = **.**
KDB(0)> g

This shows the point at which the system stops booting:

-------------------------------------------------------------------------------
                                Welcome to AIX.
                       boot image timestamp: 10:31 08/24
                 The current time and date: 15:58:17 03/14/2011
       processor count: 2;  memory size: 19712MB;  kernel size: 27451313
boot device: /pci@800000020000204/fibre-channel@0/disk@203500a0b8479d46,0000000000000000
                        kernel debugger setting: invoked
-------------------------------------------------------------------------------

.......... kdb_tty_init done
.......... kdb_init_flihs done
Real memory size = 19712 M Bytes
Model = 0800004C
Data cache size  =   64 K Bytes
Inst cache size  =   64 K Bytes
.......... kdb_mem_size done
.......... kdb_code_init done
Preserving 1836563 bytes of symbol table
First symbol __mulh
           START              END
0000000000001000 00000000040B0000 start+000FD8
F00000002FF47600 F00000002FFDF948 __ublock+000000
000000002FF22FF4 000000002FF22FF8 environ+000000
000000002FF22FF8 000000002FF22FFC errno+000000
F1000F0A00000000 F1000F0A10000000 pvproc+000000
F1000F0A10000000 F1000F0A18000000 pvthread+000000

************* Welcome to KDB *************
Call gimmeabreak...
Static breakpoint:
.gimmeabreak+000000       tweq    r8,r8               r8=0
.gimmeabreak+000004        blr                        <.kdb_init+000234> r3=0
KDB(0)> mw enter_dbg
enter_dbg+000000:  00000000  = 42
uexcept_anchor+000000:  00000000  = .
KDB(0)> g
LED{814}

AIX Version 6.1
Starting NODE#000 physical CPU#001 as logical CPU#001... done.
Starting NODE#000 physical CPU#002 as logical CPU#002... done.
Starting NODE#000 physical CPU#003 as logical CPU#003... done.
exec(/etc/init){1,0}

INIT: EXECUTING /sbin/rc.boot 1
exec(/usr/bin/sh,-c,/sbin/rc.boot 1){1245222,1}
exec(/sbin/rc.boot,/sbin/rc.boot,1){1245222,1}
+ PHASE=1
+ + bootinfo -p
exec(/usr/sbin/bootinfo,-p){1310760,1245222}
PLATFORM=chrp
+ [ ! -x /usr/lib/boot/bin/bootinfo_chrp ]
...
...

...
...
...
...
...
...

The current volume is: /dev/hd2
Primary superblock is valid.

+ echo rc.boot: executing "mount /usr"
+ 1>> /../tmp/boot_log
+ /../usr/bin/tee -a /../tmp/boot_log
+ /../usr/sbin/mount /usr
+ exec(/../usr/bin/tee,-a,/../tmp/boot_log){589872,1245224}
2>& 1
exec(/../usr/sbin/mount,/usr){1835074,1310806}
exec(/sbin/helpers/jfs2/mount,-V,jfs2,-o,rw,log=/dev/hd8,/dev/hd2,/usr){1966144,1835074}
exec(/usr/bin/sh,-c,/usr/sbin/wlmcntrl -u -d "" > /dev/null 2>&1){1966146,1835074}
Could not load program sh:
Symbol resolution failed for /usr/lib/libc.a(shr.o) because:
        Symbol am_i_a_vios (number 28) is not exported from dependent
          module /unix.
        Symbol cluster_prod_unlock (number 49) is not exported from dependent
          module /unix.
        Symbol proc_getattr (number 178) is not exported from dependent
          module /unix.
...
...
...
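
A debug boot like the one above can produce a very long console log, so it can help to capture it (for example from the terminal session) and summarize it offline. The sketch below is only an illustration, not part of AIX or KDB: the function name `summarize_boot_log` and the sample text are made up for this example, and the regular expressions assume the log lines look exactly like the ones shown in this post. It reports the last exec() traced before the first "Could not load program" message, and the symbols the loader could not resolve.

```python
import re

# exec trace lines, e.g. exec(/usr/sbin/mount,/usr){1835074,1310806}
EXEC_RE = re.compile(r"exec\(([^)]*)\)\{(\d+),(\d+)\}")
# loader errors, e.g. "Symbol am_i_a_vios (number 28) is not exported from dependent"
SYMBOL_RE = re.compile(r"Symbol (\S+) \(number (\d+)\) is not exported from dependent")
# continuation line, e.g. "module /unix."
MODULE_RE = re.compile(r"module (\S+?)\.?$")

# Sample copied from the console output in this post (assumed format).
SAMPLE = '''exec(/../usr/sbin/mount,/usr){1835074,1310806}
exec(/sbin/helpers/jfs2/mount,-V,jfs2,-o,rw,log=/dev/hd8,/dev/hd2,/usr){1966144,1835074}
exec(/usr/bin/sh,-c,/usr/sbin/wlmcntrl -u -d "" > /dev/null 2>&1){1966146,1835074}
Could not load program sh:
Symbol resolution failed for /usr/lib/libc.a(shr.o) because:
        Symbol am_i_a_vios (number 28) is not exported from dependent
          module /unix.
        Symbol cluster_prod_unlock (number 49) is not exported from dependent
          module /unix.
        Symbol proc_getattr (number 178) is not exported from dependent
          module /unix.
'''


def summarize_boot_log(text):
    """Return (failed_exec, missing) for a captured console log.

    failed_exec: argument list of the last exec() seen before the first
                 "Could not load program" line, or None if there is none.
    missing:     list of (symbol, module) pairs reported by the loader.
    """
    last_exec = None      # most recent exec() trace seen so far
    failed_exec = None    # exec() that preceded the first loader failure
    missing = []
    pending_symbol = None  # symbol waiting for its "module ..." line

    for raw in text.splitlines():
        line = raw.strip()
        m = EXEC_RE.search(line)
        if m:
            last_exec = m.group(1).split(",")
            continue
        if line.startswith("Could not load program") and failed_exec is None:
            failed_exec = last_exec
            continue
        m = SYMBOL_RE.search(line)
        if m:
            pending_symbol = m.group(1)
            continue
        m = MODULE_RE.match(line)
        if m and pending_symbol is not None:
            missing.append((pending_symbol, m.group(1)))
            pending_symbol = None

    return failed_exec, missing


failed_exec, missing = summarize_boot_log(SAMPLE)
print("last exec before failure:", failed_exec)
for symbol, module in missing:
    print("unresolved:", symbol, "from", module)
```

For the excerpt in this post, the summary singles out the `/usr/bin/sh` exec that follows the mount of /usr, and lists the symbols that /usr/lib/libc.a(shr.o) expected /unix to export.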