November 11, 2013

Analyzing Samba corefile with dbx (nmbd process hangs)

The environment

AIX 6.1 TL06 SP05 & Samba version 3.5.6 ( pware61.samba.rte )

# grep pware /etc/inittab
nmbd:2:once:/opt/pware/sbin/nmbd -D > /dev/console 2>&1 # Start SAMBA
smbd:2:once:/opt/pware/sbin/smbd -D > /dev/console 2>&1 # Start SAMBA

The problem

The nmbd process (the NetBIOS name server, which provides NetBIOS over IP naming services to clients) fails to start, and the error message in its log file is of little use, even with log level = 10 set in smb.conf.

# cat /opt/pware/var/log.nmbd
...
[2013/01/25 11:06:55.555535,  3] nmbd/nmbd.c:739(open_sockets)
  open_sockets: Broadcast sockets opened.
[2013/01/25 11:06:55.555716,  0] lib/fault.c:46(fault_report)
  ===============================================================
[2013/01/25 11:06:55.556151,  0] lib/fault.c:47(fault_report)
  INTERNAL ERROR: Signal 11 in pid 26935352 (3.5.6)
  Please read the Trouble-Shooting section of the Samba3-HOWTO
[2013/01/25 11:06:55.556461,  0] lib/fault.c:49(fault_report)

  From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf
[2013/01/25 11:06:55.556765,  0] lib/fault.c:50(fault_report)
  ===============================================================
[2013/01/25 11:06:55.556995,  0] lib/util.c:1465(smb_panic)
  PANIC (pid 26935352): internal error
[2013/01/25 11:06:55.557223,  0] lib/util.c:1619(log_stack_trace)
  unable to produce a stack trace on this platform
[2013/01/25 11:06:55.557472,  0] lib/fault.c:326(dump_core)
  dumping core in /opt/pware/var/cores/nmbd

The analysis

You can use the dbx debugger to analyze the core file.


# dbx /opt/pware/sbin/nmbd /opt/pware/var/cores/nmbd/core
Type 'help' for help.
warning: The core file is not a fullcore. Some info may
not be available.
[using memory image in /opt/pware/var/cores/nmbd/core]
reading symbolic information ...

IOT/Abort trap in pthread_kill at 0xd05098c0
0xd05098c0 (pthread_kill+0xa0) 80410014 lwz r2,0x14(r1)
(dbx) where
pthread_kill(??, ??) at 0xd05098c0
_p_raise(??) at 0xd0508d28
raise.raise(??) at 0xd01373e0
abort() at 0xd01c5704
dump_core() at 0x100a78b8
smb_panic(??) at 0x10008648
sig_fault(??) at 0x100a7e68
freeaddrinfo(??) at 0xd0227ae0
getifaddrs.rep_getifaddrs(??) at 0x1007bc50
get_interfaces(??, ??) at 0x1007b6e4
load_interfaces() at 0x1007ac74
main(??, ??) at 0x10001bd8
(dbx) quit

The where command in dbx lists the procedures and functions that were active at the moment the program crashed.

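As you can see, the frames listed below the signal handler sig_fault (freeaddrinfo, rep_getifaddrs, get_interfaces and load_interfaces) are related to network interfaces.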

Now we at least know which area the problem is related to, whereas the log.nmbd output gave us nothing useful.
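When a trace is long, it can help to cut it down to the frames that were running when the fault handler fired. The following is only a minimal sketch: frames_below_handler is a hypothetical helper name, and it assumes the where listing has been saved as plain text in the same layout as the output above.

```shell
# Print the function names of every frame that follows the fault handler
# in a dbx "where" listing read from standard input.
# The handler name defaults to sig_fault, as seen in the trace above.
frames_below_handler() {
    handler="${1:-sig_fault}"
    awk -v h="$handler" '
        # After the handler frame: keep only lines that look like frames
        # and strip everything from the opening parenthesis onwards.
        found && /^[A-Za-z_][A-Za-z0-9_.]*\(/ { sub(/\(.*/, ""); print }
        # The handler frame itself marks the start of the interesting part.
        index($0, h "(") == 1 { found = 1 }
    '
}
```

For example, with the listing stored in a hypothetical file where.txt, running frames_below_handler < where.txt would print one function name per line, starting with the frame in which the fault occurred.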

In the end, it turned out that two problems prevented nmbd from starting.

1: The Samba configuration file must specify the netmask in the interfaces parameter (in our case a /23 netmask).

2: The log level option must be less than 2. (This is not visible from the debug output above. If anyone has more information on how this could be determined with dbx or anything else, I’ll be happy to update this doc.)
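The two conditions above can be checked with a short script before restarting nmbd. This is a heuristic sketch rather than a Samba rule: check_nmbd_conf is a hypothetical helper, it assumes the parameters are spelled literally as "interfaces" and "log level" on a single line each, and it flags every interfaces entry without a "/" even though Samba also accepts other forms.

```shell
# Report smb.conf settings that match the two problems described above.
# Usage: check_nmbd_conf /path/to/smb.conf   (path supplied by the caller)
check_nmbd_conf() {
    awk '
        /^[ \t]*interfaces[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, "")          # keep only the value
            for (i = 1; i <= NF; i++)         # one entry per field
                if (index($i, "/") == 0)
                    print "interfaces entry without netmask: " $i
        }
        /^[ \t]*log level[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, "")          # keep only the value
            if ($0 + 0 >= 2)                  # leading number of the value
                print "log level is " ($0 + 0) ", expected less than 2"
        }
    ' "$1"
}
```

The function prints nothing when both settings already match the workaround, so empty output means there is nothing to change.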

Happy debugging!