HOS 2.X PXE Booting with LACP

Today I was working on an installation that used bonded network interfaces on all the servers. Dual-port 10Gb network interfaces were bonded using LACP on the HP 59XX switches as follows:

[GJL-HP5900-1-Bridge-Aggregation2]display link-aggregation verbose Bridge-Aggregation2
Loadsharing Type: Shar -- Loadsharing, NonS -- Non-Loadsharing
Port Status: S -- Selected, U -- Unselected, I -- Individual
Flags:  A -- LACP_Activity, B -- LACP_Timeout, C -- Aggregation,
        D -- Synchronization, E -- Collecting, F -- Distributing,
        G -- Defaulted, H -- Expired

Aggregate Interface: Bridge-Aggregation2
Aggregation Mode: Dynamic
Loadsharing Type: Shar
System ID: 0x8000, aaaa-bbbb-cccc
Local:
  Port             Status  Priority Oper-Key  Flag
--------------------------------------------------------------------------------
  XGE1/0/3         S       32768    2         {ACDEFG}
  XGE1/0/4         U       32768    2         {ACG}
Remote:
  Actor            Partner Priority Oper-Key  SystemID               Flag
--------------------------------------------------------------------------------
  XGE1/0/3         0       32768    0         0x8000, 0000-0000-0000 {DEF}
  XGE1/0/4         0       32768    0         0x8000, 0000-0000-0000 {EF}
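
The flag letters in this output are easier to read once expanded with the legend the switch prints above the table. The following is a minimal sketch and was not part of the original troubleshooting; the function name decode_lacp_flags is hypothetical. Under standard LACP semantics, flag G (Defaulted) together with an all-zero remote SystemID suggests the switch has not learned any partner information on these ports.

```shell
#!/bin/sh
# Expand an LACP flag string such as "{ACDEFG}" into the names from the
# legend printed by "display link-aggregation verbose".
# decode_lacp_flags is a hypothetical helper name.
decode_lacp_flags() {
    flags=$(printf '%s' "$1" | tr -d '{}')
    out=""
    for letter in A B C D E F G H; do
        case "$flags" in
            *"$letter"*)
                case "$letter" in
                    A) name="LACP_Activity" ;;
                    B) name="LACP_Timeout" ;;
                    C) name="Aggregation" ;;
                    D) name="Synchronization" ;;
                    E) name="Collecting" ;;
                    F) name="Distributing" ;;
                    G) name="Defaulted" ;;
                    H) name="Expired" ;;
                esac
                # Append the name, space-separated.
                out="${out:+$out }$name"
                ;;
        esac
    done
    printf '%s\n' "$out"
}

# Local flags of the two member ports shown above.
decode_lacp_flags "{ACDEFG}"
decode_lacp_flags "{ACG}"
```

For the second port this prints "LACP_Activity Aggregation Defaulted": the Synchronization, Collecting and Distributing flags are absent, which matches its Unselected status.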

 

However, all the servers were unable to PXE boot.

Examining the console of the servers revealed that they were attempting to PXE boot but did not appear to be getting a response from the deployer.

A quick check of the configuration on the Helion Lifecycle Manager (HLM) showed that everything was set up correctly.

  • Verify the correct server MAC addresses have been configured in /etc/dhcp/dhcpd.conf
graham@helion-cp1-c1-m1-mgmt:/etc/dhcp$ cat dhcpd.conf | more
# ******************************************************************
# Cobbler managed dhcpd.conf file
# generated from cobbler dhcp.conf template (Mon Jan 18 14:32:10 2016)
# Do NOT make changes to /etc/dhcpd.conf. Instead, make your changes
# in /etc/cobbler/dhcp.template, as /etc/dhcpd.conf will be
# overwritten.
# ******************************************************************

ddns-update-style interim;

allow booting;
allow bootp;

ignore client-updates;
set vendorclass = option vendor-class-identifier;

option pxe-system-type code 93 = unsigned integer 16;

subnet 172.16.60.0 netmask 255.255.255.0 {
     option routers             172.16.60.10;
     option domain-name-servers 172.16.60.10;
     option subnet-mask         255.255.255.0;
     deny unknown-clients;
     default-lease-time         21600;
     max-lease-time             43200;
     next-server                172.16.60.10;
     class "pxeclients" {
          match if substring (option vendor-class-identifier, 0, 9) = "PXEClient";
          if option pxe-system-type = 00:02 {
                  filename "ia64/elilo.efi";
          } else if option pxe-system-type = 00:06 {
                  filename "grub/grub-x86.efi";
          } else if option pxe-system-type = 00:07 {
                  filename "grub/grub-x86_64.efi";
          } else {
                  filename "pxelinux.0";
          }
     }

}

# group for Cobbler DHCP tag: default
group {
    host generic7 {
        hardware ethernet 8c:dc:d4:b5:ce:80;
        fixed-address 172.16.60.14;
        option host-name "compute2";
        option routers 172.16.60.10;
        next-server 172.16.60.10;
    }
    host generic5 {
        hardware ethernet 8c:dc:d4:b5:c9:74;
        fixed-address 172.16.60.12;
        option host-name "controller3";
        option routers 172.16.60.10;
        next-server 172.16.60.10;
    }
    host generic1 {
        hardware ethernet 8c:dc:d4:b5:c9:00;
        fixed-address 172.16.60.13;
        option host-name "compute1";
        option routers 172.16.60.10;
        next-server 172.16.60.10;
    }
    host generic6 {
        hardware ethernet 8c:dc:d4:b5:c6:40;
        fixed-address 172.16.60.11;
        option host-name "controller2";
        option routers 172.16.60.10;
        next-server 172.16.60.10;
    }
    host generic4 {
        hardware ethernet 5c:b9:01:8d:6b:68;
        fixed-address 172.16.60.15;
        option host-name "osd1";
        option routers 172.16.60.10;
        next-server 172.16.60.10;
    }
    host generic3 {
        hardware ethernet 5c:b9:01:8d:73:dc;
        fixed-address 172.16.60.17;
        option host-name "osd3";
        option routers 172.16.60.10;
        next-server 172.16.60.10;
    }
    host generic2 {
        hardware ethernet 5c:b9:01:8d:70:0c;
        fixed-address 172.16.60.16;
        option host-name "osd2";
        option routers 172.16.60.10;
        next-server 172.16.60.10;
    }
}
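
With more than a handful of servers, comparing MAC addresses by eye gets error-prone. As a rough sketch (the function name list_dhcp_hosts is hypothetical, and it assumes the host block layout Cobbler generated above), the host entries can be flattened to one line each:

```shell
#!/bin/sh
# Print "hostname MAC fixed-address" for every host block in a dhcpd.conf.
# Reads the files given as arguments, or stdin if none are given.
# list_dhcp_hosts is a hypothetical helper name.
list_dhcp_hosts() {
    awk '
        /hardware ethernet/ { mac = $3;  sub(/;$/, "", mac) }
        /fixed-address/     { ip = $2;   sub(/;$/, "", ip) }
        /option host-name/  { name = $3; gsub(/[";]/, "", name) }
        # A closing brace ends the host block once a MAC has been seen.
        $1 == "}" && mac != "" {
            print name, mac, ip
            mac = ""; ip = ""; name = ""
        }
    ' "$@"
}

# Example using one host entry from the file above.
list_dhcp_hosts <<'EOF'
group {
    host generic7 {
        hardware ethernet 8c:dc:d4:b5:ce:80;
        fixed-address 172.16.60.14;
        option host-name "compute2";
        option routers 172.16.60.10;
        next-server 172.16.60.10;
    }
}
EOF
```

On the deployer this would be run as list_dhcp_hosts /etc/dhcp/dhcpd.conf, and the result compared against the MAC addresses of the NICs the servers actually PXE boot from.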

 

  • Verify that the DHCP service on the HLM deployer node is listening on port 67
$ netstat -an | fgrep -w 67
udp        0      0 0.0.0.0:67              0.0.0.0:*

  • Verify the TFTP service is listening on port 69
netstat -an | fgrep -w 69
udp        0      0 0.0.0.0:69              0.0.0.0:*
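
The two listener checks can be folded into one repeatable test. This is a sketch only: the function name udp_port_bound is hypothetical, and it assumes the netstat -an column layout shown above (local address in the fourth column). DHCP servers listen on UDP 67 and TFTP on UDP 69; port 68 is the DHCP client side.

```shell
#!/bin/sh
# Succeed if the "netstat -an" output on stdin shows a UDP socket whose
# local address ends in ":<port>". udp_port_bound is a hypothetical name.
udp_port_bound() {
    awk -v port="$1" '
        $1 ~ /^udp/ && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Sample lines copied from the netstat output above.
sample='udp        0      0 0.0.0.0:67              0.0.0.0:*
udp        0      0 0.0.0.0:69              0.0.0.0:*'

for port in 67 69; do
    if printf '%s\n' "$sample" | udp_port_bound "$port"; then
        echo "UDP $port: bound"
    else
        echo "UDP $port: NOT bound"
    fi
done
```

Against a live system the input would come from netstat itself, for example netstat -an | udp_port_bound 69.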

  • Examine the syslog file for DHCP Discovery Requests
sudo grep -i dhcp /var/log/syslog
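
To see at a glance which clients have been heard from, the grep can be narrowed to DHCPDISCOVER lines and reduced to a list of MAC addresses. This is a sketch under assumptions: discover_macs is a hypothetical name, the sample log lines (timestamps, the host name hlm, the interface eth0) are invented for illustration, and it relies on the "DHCPDISCOVER from <mac> via <interface>" wording that ISC dhcpd writes to syslog.

```shell
#!/bin/sh
# Print each client MAC that appears in a DHCPDISCOVER log line, once.
# Reads the files given as arguments, or stdin if none are given.
# discover_macs is a hypothetical helper name.
discover_macs() {
    grep 'DHCPDISCOVER from' "$@" |
        sed 's/.*DHCPDISCOVER from \([0-9A-Fa-f:]*\).*/\1/' |
        sort -u
}

# Illustrative log lines, not taken from the system described here.
discover_macs <<'EOF'
Jan 18 14:40:01 hlm dhcpd: DHCPDISCOVER from 8c:dc:d4:b5:ce:80 via eth0
Jan 18 14:40:01 hlm dhcpd: DHCPOFFER on 172.16.60.14 to 8c:dc:d4:b5:ce:80 via eth0
Jan 18 14:40:05 hlm dhcpd: DHCPDISCOVER from 8c:dc:d4:b5:ce:80 via eth0
EOF
```

An empty result from sudo cat /var/log/syslog | discover_macs while servers are attempting to PXE boot means no discovery requests are reaching the DHCP service.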

 

  • Use tcpdump to examine what DHCP and TFTP traffic is reaching the interfaces on the HLM deployer
tcpdump -n -i any port 67 or port 68 or port 69

 

These checks showed that the deployer node's services were operational but that no DHCPDISCOVER requests were reaching its interfaces.

This points to a fundamental issue with the management network broadcast domain. The PXE broadcast requests are leaving the servers but not reaching the HLM node on the same network.

The basic switch configuration was reviewed to confirm that the management VLAN was the native VLAN and that all ports were correctly configured. Everything looked correct.

Next we removed the LACP configuration and the PXE boot process sprang into life. tcpdump now showed the incoming DHCPDISCOVER requests being received.

A quick Google search for HP5900 PXE issues with LACP turned up the root cause, documented by Peter Debruyne. Thank you, Peter.

SOLUTION: When using LACP with nodes that initially boot with individual, unbonded network interfaces, such as when PXE booting, it is necessary to configure the aggregate interface as an LACP edge port as follows:

[GJL-HP5900-1-Bridge-Aggregation2]lacp edge-port
[GJL-HP5900-1-Bridge-Aggregation2]display link-aggregation verbose Bridge-Aggregation2
Loadsharing Type: Shar -- Loadsharing, NonS -- Non-Loadsharing
Port Status: S -- Selected, U -- Unselected, I -- Individual
Flags:  A -- LACP_Activity, B -- LACP_Timeout, C -- Aggregation,
        D -- Synchronization, E -- Collecting, F -- Distributing,
        G -- Defaulted, H -- Expired

Aggregate Interface: Bridge-Aggregation2
Aggregation Mode: Dynamic
Loadsharing Type: Shar
System ID: 0x8000, aaaa-bbbb-cccc
Local:
  Port             Status  Priority Oper-Key  Flag
--------------------------------------------------------------------------------
  XGE1/0/3         I       32768    2         {AG}
  XGE1/0/4         I       32768    2         {AG}
Remote:
  Actor            Partner Priority Oper-Key  SystemID               Flag
--------------------------------------------------------------------------------
  XGE1/0/3         0       32768    0         0x8000, 0000-0000-0000 {DEF}
  XGE1/0/4         0       32768    0         0x8000, 0000-0000-0000 {EF}
[GJL-HP5900-1-Bridge-Aggregation2]

 

With lacp edge-port configured, both member ports show status I (Individual) while no LACP partner is present, so they work as individual network interfaces when PXE booting.
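
When many aggregation groups need the same check, the Local table can be reduced to a port and status pair per line. This is a sketch, assuming the "display link-aggregation verbose" layout shown in this post; the function name local_port_status is hypothetical.

```shell
#!/bin/sh
# Print "port status" for each member port listed in the Local: table of
# "display link-aggregation verbose" output (S, U or I per the legend).
# local_port_status is a hypothetical helper name.
local_port_status() {
    awk '
        /^Local:/  { in_local = 1; next }
        /^Remote:/ { in_local = 0 }
        in_local && $2 ~ /^[SUI]$/ { print $1, $2 }
    ' "$@"
}

# Excerpt of the output captured after enabling lacp edge-port.
local_port_status <<'EOF'
Local:
  Port             Status  Priority Oper-Key  Flag
--------------------------------------------------------------------------------
  XGE1/0/3         I       32768    2         {AG}
  XGE1/0/4         I       32768    2         {AG}
Remote:
  Actor            Partner Priority Oper-Key  SystemID               Flag
--------------------------------------------------------------------------------
  XGE1/0/3         0       32768    0         0x8000, 0000-0000-0000 {DEF}
  XGE1/0/4         0       32768    0         0x8000, 0000-0000-0000 {EF}
EOF
```

Both ports are reported with status I here, whereas the capture taken before the change would report one S and one U.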

The following images give a view of what a correctly configured system should look like when PXE booting:

[Image: PXE boot and tcpdump output]

[Image: syslog messages]

 

 
