OpenStack baremetal introspection bulk start fails - Red Hat OpenStack Director
I am trying to install the director and deploy an overcloud.
I followed the steps from: https://keithtenzer.com/2015/10/14/howto-openstack-deployment-using-tripleo-and-the-red-hat-openstack-director/comment-page-1/#comment-1335
In this environment we have a KVM hypervisor host, the undercloud (a single VM) and the overcloud (1 compute VM, 1 controller VM). The KVM hypervisor host is on the 192.168.122.0/24 network and has an IP of 192.168.122.136. The undercloud runs on a single VM attached to the 192.168.122.0/24 (management) network and the 192.168.126.0/24 (provisioning) network. The undercloud has an IP address of 192.168.122.90 (eth0). The overcloud is on the 192.168.126.0/24 (provisioning) and 192.168.125.0/24 (external) networks.
I am facing an issue during introspection (command: openstack baremetal introspection bulk start). The overcloud VMs are getting started, but they are unable to configure a network interface, and eventually introspection fails. Can anyone tell me what might have gone wrong?
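For reference, the sequence I ran (roughly following the guide above) was:
$ openstack baremetal import --json instackenv.json
$ openstack baremetal configure boot
$ openstack baremetal introspection bulk start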
Responses
Thanks, Dinesh! So it looks like the nodes aren't getting a DHCP response / PXE booting. You mentioned that the undercloud has an IP on the management network, but does it have an IP on the provisioning network?
Also, can you post the IP config settings from the undercloud.conf? Specifically the following (I've added a rough example of typical values after the list):
- local_ip
- network_gateway
- network_cidr
- masquerade_network
- dhcp_start
- dhcp_end
- inspection_iprange
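For comparison, a sketch of what those settings might look like for a 192.168.126.0/24 provisioning network (the addresses are only illustrative; adjust them to your layout):

local_ip = 192.168.126.1/24
network_gateway = 192.168.126.1
network_cidr = 192.168.126.0/24
masquerade_network = 192.168.126.0/24
dhcp_start = 192.168.126.100
dhcp_end = 192.168.126.150
inspection_iprange = 192.168.126.160,192.168.126.199

The important part is that everything sits inside the provisioning subnet and that the dhcp_start/dhcp_end range does not overlap with inspection_iprange.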
Sounds like it's having trouble loading agent.ramdisk. What version of the introspection images are you using?
$ yum list rhosp-director-images-ipa
And did you load the images into glance without error as per: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/director_installation_and_usage/chap-installing_the_undercloud#sect-Obtaining_Images_for_Overcloud_Nodes ?
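If it helps, a couple of quick sanity checks (assuming the usual layout; adjust paths if yours differ):
$ openstack image list  # should include bm-deploy-kernel, bm-deploy-ramdisk and the overcloud-full images
$ ls -l /httpboot/agent.kernel /httpboot/agent.ramdisk  # the introspection kernel/ramdisk copied into place by "openstack overcloud image upload"
If agent.kernel/agent.ramdisk are missing or empty, the nodes will PXE boot but never load the ramdisk.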
Glad to hear the introspection worked!
As for the overcloud deployment, it should still use the provisioning network for DHCP, but the mechanism changes slightly between introspection and provisioning. Here's why:
The introspection process uses ironic-inspector, which sets a dynamic DHCP range using dnsmasq.
The provisioning process uses neutron, which also sets a DHCP range using dnsmasq. However, neutron/dnsmasq maps each node's MAC address to an IP address. This way each node has a permanent IP assignment.
This is also the reason why you have to specify two different DHCP ranges:
- dhcp_start, dhcp_end - sets the provisioning network DHCP range in the director's neutron
- inspection_iprange - sets the DHCP range in ironic-inspector's dnsmasq.conf file
So the first thing to do is check that you have a valid DHCP range for dhcp_start and dhcp_end in your undercloud.conf file. Also check that these settings carried over to neutron during the "openstack undercloud install" phase. The following command should show the allocation pools for all your existing subnets:
$ neutron subnet-list
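For the inspection side, the inspection_iprange value should show up as a dhcp-range entry in ironic-inspector's dnsmasq config (path assuming the default install):
$ sudo grep dhcp-range /etc/ironic-inspector/dnsmasq.conf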
Also, when running a deployment, check if dnsmasq is running:
$ ps aux | grep dnsmasq
There might be two dnsmasq commands: one that uses /etc/ironic-inspector/dnsmasq.conf for config (ignore this one, it's the ironic-inspector one), and a larger command that uses configs from /var/lib/neutron/dhcp/. Make sure that second one is running during your provisioning process. If not, you might have to check whether there is a problem with neutron on the undercloud.
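If you want to see the static MAC-to-IP mappings I mentioned above, they should be in the host file under that directory (path assuming the defaults):
$ sudo cat /var/lib/neutron/dhcp/*/host
Each line should pair a node's MAC address with its provisioning IP.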
Also check that /httpboot/boot.ipxe has a PXE config in it. I remember you had problems with inspector.ipxe being empty before, so it might be a good idea to check that one as well.
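Something like this (assuming the default /httpboot layout) should confirm that both files have content:
$ ls -l /httpboot/
$ cat /httpboot/boot.ipxe /httpboot/inspector.ipxe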
What are the current specs for the nodes? RAM, CPU, and disk?
I ask because the only other thing I can think of is that the nodes need specs that match or exceed the specs defined for the flavor that corresponds with each tag. Each flavor defaults to 4096MB RAM, a 40GB disk and 1 CPU. If the specs for the VM are less than that, the director ignores the node even if it's tagged (you can confirm the flavor definitions with the commands after the list below).
So if the specs are lower than the flavors, you might have to:
- Bump up the specs for each node
- Reduce the specs for the compute and control flavors (not recommended as it can lead to performance issues)
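A quick way to compare the two (the flavor names assume the default tagging scheme from the docs):
$ openstack flavor list
$ openstack flavor show control
$ openstack flavor show compute
Check the ram, vcpus and disk values against what you've given each VM.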
Yep, memory and CPU usage are the main factors here. The docs list 8 cores and 16GB RAM minimum for production environments, and for good reason.
I used to test on a very low-spec machine (8GB RAM, a low-spec CPU) and would get race conditions for this particular issue. I'm now testing POCs with 16GB and a decent 4-core CPU and haven't experienced any race conditions. Plus I think they refined the code since OSPd 8 to avoid these types of race conditions.
How did the reboot go? Are both nodes now PXE booting and provisioning?
