Today I was working on an installation that used bonded network interface cards on all the servers. Dual-port 10Gb network interfaces were bonded using LACP on the HP59XX switches as follows:
[GJL-HP5900-1-Bridge-Aggregation2]display link-aggregation verbose Bridge-Aggregation2
Loadsharing Type: Shar -- Loadsharing, NonS -- Non-Loadsharing
Port Status: S -- Selected, U -- Unselected, I -- Individual
Flags:  A -- LACP_Activity, B -- LACP_Timeout, C -- Aggregation,
        D -- Synchronization, E -- Collecting, F -- Distributing,
        G -- Defaulted, H -- Expired
Aggregate Interface: Bridge-Aggregation2
Aggregation Mode: Dynamic
Loadsharing Type: Shar
System ID: 0x8000, aaaa-bbbb-cccc
Local:
  Port             Status  Priority Oper-Key  Flag
--------------------------------------------------------------------------------
  XGE1/0/3         S       32768    2         {ACDEFG}
  XGE1/0/4         U       32768    2         {ACG}
Remote:
  Actor            Partner Priority Oper-Key  SystemID               Flag
--------------------------------------------------------------------------------
  XGE1/0/3         0       32768    0         0x8000, 0000-0000-0000 {DEF}
  XGE1/0/4         0       32768    0         0x8000, 0000-0000-0000 {EF}
However, all the servers were unable to PXE boot.
Examining the console of the servers revealed that they were attempting to PXE boot but did not appear to be getting a response from the deployer.
A quick check of the configuration on the Helion Lifecycle Manager (HLM) showed that everything was set up correctly.
- Verify the correct server MAC addresses have been configured in /etc/dhcp/dhcpd.conf
graham@helion-cp1-c1-m1-mgmt:/etc/dhcp$ cat dhcpd.conf | more
# ******************************************************************
# Cobbler managed dhcpd.conf file
# generated from cobbler dhcp.conf template (Mon Jan 18 14:32:10 2016)
# Do NOT make changes to /etc/dhcpd.conf. Instead, make your changes
# in /etc/cobbler/dhcp.template, as /etc/dhcpd.conf will be
# overwritten.
# ******************************************************************
ddns-update-style interim;
allow booting;
allow bootp;
ignore client-updates;
set vendorclass = option vendor-class-identifier;
option pxe-system-type code 93 = unsigned integer 16;
subnet 172.16.60.0 netmask 255.255.255.0 {
    option routers 172.16.60.10;
    option domain-name-servers 172.16.60.10;
    option subnet-mask 255.255.255.0;
    deny unknown-clients;
    default-lease-time 21600;
    max-lease-time 43200;
    next-server 172.16.60.10;
    class "pxeclients" {
        match if substring (option vendor-class-identifier, 0, 9) = "PXEClient";
        if option pxe-system-type = 00:02 {
            filename "ia64/elilo.efi";
        } else if option pxe-system-type = 00:06 {
            filename "grub/grub-x86.efi";
        } else if option pxe-system-type = 00:07 {
            filename "grub/grub-x86_64.efi";
        } else {
            filename "pxelinux.0";
        }
    }
}
# group for Cobbler DHCP tag: default
group {
    host generic7 {
        hardware ethernet 8c:dc:d4:b5:ce:80;
        fixed-address 172.16.60.14;
        option host-name "compute2";
        option routers 172.16.60.10;
        next-server 172.16.60.10;
    }
    host generic5 {
        hardware ethernet 8c:dc:d4:b5:c9:74;
        fixed-address 172.16.60.12;
        option host-name "controller3";
        option routers 172.16.60.10;
        next-server 172.16.60.10;
    }
    host generic1 {
        hardware ethernet 8c:dc:d4:b5:c9:00;
        fixed-address 172.16.60.13;
        option host-name "compute1";
        option routers 172.16.60.10;
        next-server 172.16.60.10;
    }
    host generic6 {
        hardware ethernet 8c:dc:d4:b5:c6:40;
        fixed-address 172.16.60.11;
        option host-name "controller2";
        option routers 172.16.60.10;
        next-server 172.16.60.10;
    }
    host generic4 {
        hardware ethernet 5c:b9:01:8d:6b:68;
        fixed-address 172.16.60.15;
        option host-name "osd1";
        option routers 172.16.60.10;
        next-server 172.16.60.10;
    }
    host generic3 {
        hardware ethernet 5c:b9:01:8d:73:dc;
        fixed-address 172.16.60.17;
        option host-name "osd3";
        option routers 172.16.60.10;
        next-server 172.16.60.10;
    }
    host generic2 {
        hardware ethernet 5c:b9:01:8d:70:0c;
        fixed-address 172.16.60.16;
        option host-name "osd2";
        option routers 172.16.60.10;
        next-server 172.16.60.10;
    }
}
- Verify that the DHCP service on the HLM deployer node is listening on port 67
$ netstat -an | fgrep -w 67
udp 0 0 0.0.0.0:67 0.0.0.0:*
- Verify the TFTP service is listening on port 69
$ netstat -an | fgrep -w 69
udp 0 0 0.0.0.0:69 0.0.0.0:*
- Examine the syslog file for DHCPDISCOVER requests
sudo grep -i dhcp /var/log/syslog
- Use tcpdump to examine what DHCP and TFTP traffic is reaching the interfaces on the HLM deployer
tcpdump -n -i any port 67 or port 68 or port 69
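The MAC address check from the first step can be scripted rather than done by eye. The following is a minimal sketch, assuming the Cobbler-generated layout shown above (flat `host` blocks with no nested braces); the function name `parse_hosts` is hypothetical and not part of Cobbler or ISC dhcpd.

```python
import re

# Matches "host <name> { ... }" blocks. Assumes host blocks contain no
# nested braces, which holds for the Cobbler-generated file shown above.
HOST_BLOCK = re.compile(r"\bhost\s+(\S+)\s*\{(.*?)\}", re.DOTALL)


def parse_hosts(conf_text):
    """Return {mac: {"name": ..., "ip": ..., "hostname": ...}} from dhcpd.conf text."""
    hosts = {}
    for name, body in HOST_BLOCK.findall(conf_text):
        mac = re.search(r"hardware\s+ethernet\s+([0-9A-Fa-f:]{17})\s*;", body)
        ip = re.search(r"fixed-address\s+([\d.]+)\s*;", body)
        hostname = re.search(r'option\s+host-name\s+"([^"]*)"\s*;', body)
        if mac is None:
            continue  # not a host entry we can key by MAC
        hosts[mac.group(1).lower()] = {
            "name": name,
            "ip": ip.group(1) if ip else None,
            "hostname": hostname.group(1) if hostname else None,
        }
    return hosts
```

Reading the file with `parse_hosts(open("/etc/dhcp/dhcpd.conf").read())` and looking up a server's MAC (in lower case) shows at a glance whether the deployer knows about it. This matters here because the subnet is configured with `deny unknown-clients`.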
These checks showed that the deployer node services are operational but no DHCPDISCOVER requests are reaching its interfaces.
This points to a fundamental issue with the management network broadcast domain. The PXE broadcast requests are leaving the servers but not reaching the HLM node on the same network.
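One way to test the broadcast domain without rebooting a server is to send a hand-built DHCPDISCOVER from another host on the management network while tcpdump runs on the deployer. The sketch below is an illustration under assumptions, not part of the original troubleshooting: the function names are hypothetical, the packet layout follows the standard BOOTP/DHCP format (RFC 2131), and sending requires root because it binds to UDP port 68. On a multi-homed host the broadcast leaves via whichever interface the kernel selects.

```python
import os
import socket
import struct

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"


def build_dhcp_discover(mac, xid=None):
    """Build a minimal DHCPDISCOVER payload for the given client MAC."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    if len(mac_bytes) != 6:
        raise ValueError("expected a 6-byte Ethernet MAC address")
    if xid is None:
        xid = struct.unpack("!I", os.urandom(4))[0]  # random transaction ID
    zero_ip = b"\x00" * 4
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,          # op: BOOTREQUEST
        1,          # htype: Ethernet
        6,          # hlen: MAC length
        0,          # hops
        xid,
        0,          # secs
        0x8000,     # flags: ask the server to broadcast its reply
        zero_ip, zero_ip, zero_ip, zero_ip,  # ciaddr, yiaddr, siaddr, giaddr
        mac_bytes,  # chaddr (struct pads to 16 bytes)
        b"",        # sname (64 zero bytes)
        b"",        # file (128 zero bytes)
    )
    options = bytes([53, 1, 1, 255])  # option 53 = DHCPDISCOVER, then END
    packet = header + DHCP_MAGIC_COOKIE + options
    return packet.ljust(300, b"\x00")  # pad to the 300-byte BOOTP minimum


def send_discover(mac):
    """Broadcast one DHCPDISCOVER; needs root to bind to port 68."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.bind(("0.0.0.0", 68))
        sock.sendto(build_dhcp_discover(mac), ("255.255.255.255", 67))
    finally:
        sock.close()
```

If the packet shows up in tcpdump on the deployer, the broadcast domain is intact for that switch port. Because of `deny unknown-clients`, the DHCP server will only answer if the MAC passed in matches a `host` entry in dhcpd.conf, but the request itself should be visible either way.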
The basic switch configuration was reviewed to ensure that the management vlan was the native vlan and all ports were correctly configured. All looked correct.
Next we removed the LACP configuration and the PXE boot process sprang into life. tcpdump now showed the incoming DHCPDISCOVER requests being received.
A quick Google for HP5900 PXE issues with LACP revealed the root cause of the issue which can be found here – thank you Peter Debruyne.
SOLUTION: When using LACP with nodes whose network interfaces initially come up as individual, unbonded ports (such as during PXE boot), it’s necessary to configure the aggregate interface as an LACP “edge-port” as follows:
[GJL-HP5900-1-Bridge-Aggregation2]lacp edge-port
[GJL-HP5900-1-Bridge-Aggregation2]display link-aggregation verbose Bridge-Aggregation2
Loadsharing Type: Shar -- Loadsharing, NonS -- Non-Loadsharing
Port Status: S -- Selected, U -- Unselected, I -- Individual
Flags:  A -- LACP_Activity, B -- LACP_Timeout, C -- Aggregation,
        D -- Synchronization, E -- Collecting, F -- Distributing,
        G -- Defaulted, H -- Expired
Aggregate Interface: Bridge-Aggregation2
Aggregation Mode: Dynamic
Loadsharing Type: Shar
System ID: 0x8000, aaaa-bbbb-cccc
Local:
  Port             Status  Priority Oper-Key  Flag
--------------------------------------------------------------------------------
  XGE1/0/3         I       32768    2         {AG}
  XGE1/0/4         I       32768    2         {AG}
Remote:
  Actor            Partner Priority Oper-Key  SystemID               Flag
--------------------------------------------------------------------------------
  XGE1/0/3         0       32768    0         0x8000, 0000-0000-0000 {DEF}
  XGE1/0/4         0       32768    0         0x8000, 0000-0000-0000 {EF}
[NEO-HP5900-1-Bridge-Aggregation2]
The newly configured LACP ports now work as individual network interfaces when PXE booting.
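With several aggregation groups to check, the port status column can be pulled out of saved switch output with a small script. This is a rough sketch that assumes the `display link-aggregation verbose` layout shown in this post; the function names are hypothetical, and the rule "every local port should be Individual (I) before the server's bond comes up" is taken from the output above rather than from HP documentation.

```python
import re

# A local port row looks like: "XGE1/0/3  I  32768  2  {AG}".
# Remote rows have a numeric second column, so they do not match.
LOCAL_PORT_ROW = re.compile(r"(\S+)\s+([SUI])\s+(\d+)\s+(\d+)\s+\{([A-H]*)\}")


def local_port_status(display_output):
    """Return {port: (status, flags)} for the Local section of the output."""
    return {
        port: (status, flags)
        for port, status, _prio, _key, flags in LOCAL_PORT_ROW.findall(display_output)
    }


def ports_not_individual(display_output):
    """List local ports whose status is not I (Individual)."""
    return sorted(
        port
        for port, (status, _flags) in local_port_status(display_output).items()
        if status != "I"
    )
```

Run against the output captured before the change, `ports_not_individual` lists both XGE1/0/3 and XGE1/0/4; against the output after `lacp edge-port`, it returns an empty list.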
The following images show what a correctly configured system should look like when PXE booting.
Thank you! This saved my day yesterday, Graham!
This is how to do it with Juniper switching: http://broken.net/openindiana/how-to-pxe-boot-systems-on-lacp-using-juniper-switches/