A fix for the udev eth1 issue, take back your eth0.

I don't know if this has been stated before (I searched => nothing)

I have been working on fixing the dreaded udev eth1 issue, and after some thought and discussion with my team we figured out how to work around udev so that clones of VMs come up with eth0. I searched high and low for a way to fix this issue (Google => no, startup/shutdown scripts => no), so I finally stopped thinking like an administrator and looked at the problem differently. The problem was not with the cloned machines but with the template itself.

My solution:

To fix the problem all you have to do is "change the template's network interface"

In my case, I changed the interface on the template to eth5 (because we won't have more than 4 interfaces on the system)

How to change:

rename your /etc/sysconfig/network-scripts/ifcfg-eth0 file to /etc/sysconfig/network-scripts/ifcfg-eth5 (or whatever number you want, just not eth0)
make sure you change the contents of the ifcfg-eth5 file to reflect eth5
then modify your /etc/udev/rules.d/70-persistent-net.rules file to reflect that you now want to rename the interface with that HW address to eth5
reboot your template and when it comes back up you should see that the interface is now eth5.
You are now ready to clone VMs whether they are up or down; either way, the interface on the cloned server will come back as eth0.
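The steps above can be sketched as shell commands. This sketch runs against a scratch copy of the template's /etc (so you can try it safely first); on the real template, ETC would simply be /etc, and the MAC address shown is made up.

```shell
# Scratch copy standing in for the template's /etc; on the real template
# these paths live under /etc directly. The MAC address is made up.
ETC=$(mktemp -d)
mkdir -p "$ETC/sysconfig/network-scripts" "$ETC/udev/rules.d"
printf 'DEVICE=eth0\nONBOOT=yes\n' > "$ETC/sysconfig/network-scripts/ifcfg-eth0"
echo 'SUBSYSTEM=="net", ATTR{address}=="00:0c:29:aa:bb:cc", NAME="eth0"' \
  > "$ETC/udev/rules.d/70-persistent-net.rules"

# 1. Rename the ifcfg file from eth0 to eth5
mv "$ETC/sysconfig/network-scripts/ifcfg-eth0" \
   "$ETC/sysconfig/network-scripts/ifcfg-eth5"
# 2. Update the DEVICE= line inside it to match
sed -i 's/eth0/eth5/' "$ETC/sysconfig/network-scripts/ifcfg-eth5"
# 3. Point the udev rule for this MAC at eth5 as well
sed -i 's/NAME="eth0"/NAME="eth5"/' "$ETC/udev/rules.d/70-persistent-net.rules"
# 4. Reboot the template; the interface should come up as eth5
```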


I think I know what you are referring to, but could you provide some additional detail (or history) to the "issue" section? It will help others find this thread. I like the format of this post though (issue - solution (summary) - solution (details))

This thread made me also wonder.. what happens if you simply remove 70-persistent-net.rules (from the template) and let it rebuild itself on the first boot after it is cloned? (My assumption is based on another assumption that the file is rebuilt by udev triggers - but I could be wrong).

Thanks for the post.

James, I've removed the 70-persistent-net.rules (on a previous case with Red Hat) and it reappears on the next boot (on RHEL 6)... you probably knew that.

I did that with a system that was moved/imported to another hypervisor under virtualization. I've had to play games with these files on KVM systems on occasion that will not pick the proper nic when I reintroduce them to the host system after a reload.

James, you are correct that it will rebuild on reboot if you remove the 70-persistent-net.rules file. It seems that this file is generated by the /etc/rc.sysinit file (or before it), because if you watch a RH system boot you will see that udev runs as one of the first processes during boot.

The problem is that when I clone a VM from a template with eth0 set, every system afterwards gets eth1 as its first NIC. This isn't a problem for most people, who just use eth1, but I wanted the consistency of eth0, eth1, etc. when I build machines by cloning (we use Puppet, and predictable names matter). Using this solution I was able to fix the eth1 problem and build consistency into my manifests.

I feel your pain with having the NIC show up as eth1 when introduced to the host for the first time.

I know this is likely a completely human take on the problem, but it bothers me when the host does not use eth0.. probably to a degree that is not normal ;-) So, even if I didn't have something like Puppet managing that aspect of my environment, I would still tirelessly find a way to resolve the issue! ;-)

So - regarding the udev rules: If you were to create your template with 70-persistent-net.rules missing (and comment out HWADDR and UUID in ifcfg-eth0), then deploy a new host - does it still create the device as eth1?

I wonder if the steps can be made solid enough to include in this doc

The problem with removing the file and then rebooting is that sometimes when we clone systems the template is up (for patching and modifications to the template), and when the template is up the eth0 interface is in use. When that happens, the clone ends up with eth1 :-( We have our QA team deploy servers now (going Agile). This way our systems can be up or down and still be cloned consistently.

And to answer your question: it will work if you remove UUID, HWADDR, and the 70-persistent-net.rules file, but for the reasons listed above it causes problems and manual work :-)

When we spin VM templates, we have a template prep-script that we run against them before converting the source VM to a template. One of the things that script does is nuke out any persistent device bindings (e.g., nulls out the 70-persistent-net.rules). That way, on first boot of VMs made from the template, any device nodes get mapped appropriately. If you have a standard process, or script those standard processes, you won't run into the eth1 problem.
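A minimal sketch of such a prep step, assuming the RHEL 6 file layout discussed above. ROOT points at a scratch copy here so the commands can be exercised safely; on a real template you would run them against the live filesystem before converting it.

```shell
# ROOT is a scratch copy of the template filesystem for safe testing;
# on the real template these paths would be under / directly.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/etc/udev/rules.d" "$ROOT/etc/sysconfig/network-scripts"
echo 'SUBSYSTEM=="net", NAME="eth0"' > "$ROOT/etc/udev/rules.d/70-persistent-net.rules"
printf 'DEVICE=eth0\nHWADDR=00:0c:29:aa:bb:cc\nUUID=1234\nONBOOT=yes\n' \
  > "$ROOT/etc/sysconfig/network-scripts/ifcfg-eth0"

# Null out the persistent device bindings so they regenerate on first boot
: > "$ROOT/etc/udev/rules.d/70-persistent-net.rules"
# Strip the per-clone identifiers from the interface configs
sed -i '/^HWADDR=/d; /^UUID=/d' "$ROOT"/etc/sysconfig/network-scripts/ifcfg-eth*
```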

We took this approach to Linux as it was a simple cognate to what had to be done for Windows (actually, a lot less of a pain in the ass than cleaning up Windows templates' dev-trees).

How do you handle patching systems? Also, is your group the only group that deploys servers? Our process is automated through Jenkins, and the servers could be up when our QA team presses the button to build our environment (using VMware)... how would you handle this scenario with the script?

Also, we wanted to have our template as plain as possible (kickstart, dvd, usb..whatever) and run puppet to build our systems and we are back to a full running system.

In general, we use a dated "platform" release cycle for our Linux systems. Due to our tenants' security requirements, our tenants have to test deployments against a given patch-cluster and then deploy exactly that certified cluster. Thus, we tend to retain 2-3 standard builds from which tenants systems can be patched-up to a date-targeted patch-state. All of this is automated by our (soon to be retired) automated CM/build/patching framework. For Kickstarted systems, an OS sequence is run and a date-targeted patch-bundle is applied via a post-Anaconda job. For VMs, once their VM's "run-once" script executes and the VM is joined to the CM/build/patching framework, it requests a date-targeted patch-bundle from the CM/build/patching framework and the framework complies. Specifically, the VM's script lets the framework know "ready to be patched" and the framework applies the patch bundle associated to that tenant's systems.

Basically, we've created orchestrated build workflows that de-emphasize the need for maintaining patched-up builds or templates. With the nature of our tenants' security directives, we really didn't have any choice but to do so (having dozens of templates distributed across dozens of vCenter installations scattered around the globe and across multiple, not interconnected networks, would have been too onerous).

This is just how RHEL6 makes network device names persistent. You don't need HWADDR at all in RHEL6, as the device is already properly named by udev. In my own templates I just remove the udev rules, and leave eth0 getting its address from DHCP with no HWADDR.
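For reference, the resulting interface config in such a template can be as small as the fragment below (a sketch of what is described above, not an exact copy of anyone's file):

```
# /etc/sysconfig/network-scripts/ifcfg-eth0 -- no HWADDR, no UUID;
# udev names the device and DHCP supplies the address
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=dhcp
```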

We supply the command sys-unconfig for sealing VM images for templating.

It's just a script, though not a very complex one :)

$ rpm -qf $(which sys-unconfig)

$ cat $(which sys-unconfig)

. /etc/init.d/functions

if [ $# -ne 0 ]; then
    echo $"Usage: sys-unconfig" >&2
    exit 1
fi

touch /.unconfigured
rm -f /etc/udev/rules.d/*-persistent-*.rules

Heh... Solaris has a similar "tool". I'd assumed our build just didn't have an equivalent tool installed (a lot of RPMs are removed as part of the hardening procedures), thus why they weren't using such a tool. If the guys that originally prescribed our template process were still there, I'd ask them why they reinvented the wheel. =)

Actually, just read the one link. Looks like the folks who originally defined our template build-process essentially prescribed all the same things in that article, but opted to manually nuke the udev rules as part of their larger prep-script.

One down side of just leaving the VM as "sealed" according to the linked article is you still end up with "leftover" log entries. Yes, the article says to optionally clear the logs, but the shutdown process will still tend to create more entries on the way down. So, the anal-retentive side of me tends to opt to nuke those log files outside of a normal boot-epoch.

I remember also using sys-unconfig on some remaining SGI systems a long while ago.

VMware cloning adds the HWADDR and UUID into the ifcfg-eth* file, and when a server is rebooted udev creates the 70-persistent file. It's also a pain to troubleshoot VMware issues when cloning RH6 systems. What we wanted to do with this solution is eliminate the possibility of having eth1 as the first interface on any of our servers, whether the system is up (because an admin wanted to verify some configs in the template) or not. Also, we wanted to keep as much data off the servers as possible, so if we decide to go RH 7 we have minimal work to install. We just use a minimal install, modify the network interface, shut down... and clone away. This flexibility allows us to be very agile and tell Puppet the end state for the system.

The other benefit is with backups: we no longer have to back up our templates or servers (the ones using config management) and can just focus on backing up the data from the CM server (all text data).

Yeah. Part of our template creation process is to install a "run-once" file that automagically takes care of those annoyances.

We further opted to make the run-once file VERY basic. All it does is pull down the actual run-once tasks/scripts (the last of which notifies the CM server of the VM's "ready to patch" state) that are hosted on a web server. That way, rather than having to respin the template to push "run-once" changes into the template, the deployed VM simply grabs the most up to date collection of "run-once" scripts. Also makes it a lot easier to have one template-method that works against multiple EL releases (since our environment includes RHEL 5, RHEL 6 and CentOS 6 physicals and VMs).
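A hedged sketch of what such a "very basic" run-once bootstrapper could look like. The URL and task names here are hypothetical, not the poster's actual setup; only the function is defined below, and the real script would invoke it at first boot (e.g. from rc.local).

```shell
# Hypothetical run-once bootstrapper: pulls the current task scripts from
# a web server so the template never needs respinning to update them.
fetch_and_run_once() {
    baseurl=$1   # e.g. http://cm.example.com/run-once (hypothetical)
    for task in 00-clean-udev.sh 10-register-cm.sh 99-ready-to-patch.sh; do
        # -f makes curl fail silently on HTTP errors so missing tasks are skipped
        curl -s -f "$baseurl/$task" -o "/tmp/$task" || continue
        sh "/tmp/$task"
    done
}
# At first boot, something like:
# fetch_and_run_once "http://cm.example.com/run-once"
```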

Basically, it makes deploying Linux VMs a lot more like deploying Windows VMs (since the Windows deployment process allows you to specify a run-once operation as part of the deployment sysprep process).

Ahhh, I see, I didn't know that about VMware cloning.

Do the virtual NICs always end up in the same PCI bus location? If so, you may be able to name your network devices by PCI bus address instead.
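If the bus locations were stable, a rule along these lines would pin names by slot rather than by MAC (the PCI addresses below are examples only, and KERNELS== matching on the parent device path is how udev exposes the bus location):

```
# /etc/udev/rules.d/70-persistent-net.rules -- match on PCI bus path
# instead of MAC address (addresses below are examples only)
SUBSYSTEM=="net", ACTION=="add", KERNELS=="0000:02:01.0", NAME="eth0"
SUBSYSTEM=="net", ACTION=="add", KERNELS=="0000:02:02.0", NAME="eth1"
```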

Not always, no - particularly if you have a multi-NIC VM.

We simply got into the habits we did because of the need to support multiple-platforms. Using a consistent (and sufficiently abstracted) methodology makes it so you don't really have to think about it, any more - and protects you against within-platform changes that can creep in with platform version updates.

Chances are, were we a pure-RHEL/Linux shop, our procedures (and scripts) would be less pedantic.

Jaime - I had no idea it did that either. In fairness, they do offer tools to unidentify the host to make the cloning process as "pain-free" as possible. So, it seems like they are trying ;-)

Thanks Everyone,

This is a useful thread. It gives me ideas for the clones we create. Tom's idea of time-stamping clones with a new date for the basesystem RPM will be useful with our cloned systems...

Sounds like it is the week for VM clones.. I am going through the process of stripping down another base image for VMware template based deployment, so the contributed info will be handy!

I think it would be of value to start a new thread discussing stripping an OS for template creation (or extending sys-unconfig.. without the .unconfigured creation).

Other things I am covering on top of the NIC (sorry, slightly off topic) are removing the SSH host keys, log files and /etc/sysconfig/rhn/systemid.. I am sure there are plenty more bits and pieces people have tucked away (resizing swap on first boot?).
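The cleanup items above can be sketched as below. ROOT is a scratch copy of the template filesystem so the commands can be tried safely; on the real template they would run against / directly.

```shell
# Scratch copy of the template filesystem for safe testing; on the real
# template these paths would be under / directly.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/etc/ssh" "$ROOT/var/log" "$ROOT/etc/sysconfig/rhn"
touch "$ROOT/etc/ssh/ssh_host_rsa_key" "$ROOT/etc/ssh/ssh_host_rsa_key.pub"
echo "old boot noise" > "$ROOT/var/log/messages"
touch "$ROOT/etc/sysconfig/rhn/systemid"

rm -f "$ROOT"/etc/ssh/ssh_host_*          # host keys regenerate on first boot
rm -f "$ROOT/etc/sysconfig/rhn/systemid"  # RHN identity must be unique per clone
: > "$ROOT/var/log/messages"              # truncate, rather than delete, log files
```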

Great to see I am not the only one fighting VMware and its NIC configuration/cloning process!

Heh... Things that I used to take for granted back in the days when I was doing Solaris administration.

Pixel, you are certainly not alone in experiencing this

Yeah, I had that today under vmware, 70-persistent-net.rules vs ifcfg-eth1 created by the clone wars... (and I've seen the same thing on workstations running KVM systems moved/cloned/ingested)

I like your idea of a new thread to discuss that...

I just built/deployed a file today that resolves an authentication and time sync issue we run into once in a while, but the script differs based on the subnet. I made a script that looks at what the gateway is and then downloads the proper script based on the gateway... and tested it on a subset of about 30 systems and it worked, mercifully.
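A sketch of that gateway-to-script mapping, assuming hypothetical subnets and URLs (not the poster's actual ones). The function reads `ip route` output on stdin so the selection logic can be tested offline:

```shell
# pick_script: choose which helper script to download based on the
# default gateway. Subnets and URLs below are hypothetical examples.
pick_script() {
    # first "default via <gw> ..." line gives the gateway in field 3
    gw=$(awk '/^default/ {print $3; exit}')
    case "$gw" in
        10.1.0.1)  echo "http://cm.example.com/fix-subnet-a.sh" ;;
        10.2.0.1)  echo "http://cm.example.com/fix-subnet-b.sh" ;;
        *)         echo "http://cm.example.com/fix-generic.sh" ;;
    esac
}
# On a real host: url=$(ip route | pick_script); curl -s "$url" | sh
```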

Once you have some ideas together (possibly even patches) you're more than welcome to log a bug against initscripts to improve sys-unconfig. Please let me know if you do, I'd be interested to follow it.

You also might want to check out what Rich Jones has done with the libguestfs-tools like virt-sysprep. This does similar things to what sys-unconfig does, but does it against KVM virt images using Python, Perl, and C.

Thanks Jamie, will look at that

As it stands, the existing unconfig script makes sense from the standpoint of not making assumptions. That said, if you didn't want to assume that SSH host-keys needed to be nuked-out (etc.), you could always put logic-blocks in that check for specific vendor-RPMs. If the RPM's missing, skip the procedure; if it's there, do the cleanup steps appropriate to deconfiguring that RPM's associated service(s).
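Such a guard could be as simple as the sketch below (the wrapper name and the example cleanup are made up for illustration):

```shell
# Run a cleanup command only if the owning vendor RPM is installed;
# if the package (or rpm itself) is absent, skip the step silently.
cleanup_if_installed() {
    pkg=$1; shift
    if rpm -q "$pkg" >/dev/null 2>&1; then
        "$@"    # run the cleanup command for this package's service
    fi
}
# e.g. cleanup_if_installed openssh-server rm -f /etc/ssh/ssh_host_*
```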

I generally reference the same doc that Jaime linked to

I wonder if that process should be part of the "base" RHEL documents as well? In the Deployment Guide perhaps?

Given the proliferation of VM-hosted RHEL installations, it'd probably be a good idea to increase the number of mentions of how to optimize the use/experience of RHEL within those contexts. =)

Long time ago, but still troublesome. I stumbled over this today when preparing some templates running RHEL 6.8 and creating machines from them (with a single NIC only). I ended up doing two things within the template: a) removed the file /etc/udev/rules.d/70-persistent-net.rules, b) removed the HWADDR... line from /etc/sysconfig/network-scripts/ifcfg-eth0.

When I then created a machine from that template, the NIC was configured properly and got eth0 as its identifier.

Yes, this thread is helpful. Also, other parameters such as the hostname need to be nullified when cloning.

This post was very helpful and got me pointed in the right direction.

In my case, my VM template is an automatically-generated OVA. Part of the OVA generation process involves booting the VM with an ISO that does a kickstart install, and of course the deployed OVA was getting stuck with eth1.

To solve this problem, I added a post-install script to my kickstart that automatically renames the just-installed NIC to eth5. That way, when the OVA is deployed, the OVA gets eth0.

# Make a copy of the ifcfg-eth0 with the appropriate name for eth5
cp /etc/sysconfig/network-scripts/ifcfg-eth0 /etc/sysconfig/network-scripts/ifcfg-eth5

# Then update the device name inside the config
sed -i 's/eth0/eth5/' /etc/sysconfig/network-scripts/ifcfg-eth5

# Remove the HWADDR binding for the eth0 config, which allows
# the network configuration we did at the top of the kickstart to
# apply to the deployed VM automatically
sed -i '/^HWADDR=/d' /etc/sysconfig/network-scripts/ifcfg-eth0

# Rename the device to eth5
sed -i 's/eth0/eth5/' /etc/udev/rules.d/70-persistent-net.rules

# get rid of default 'localhost.localdomain' hostname
sed -i '/^HOSTNAME=/d' /etc/sysconfig/network

# ...