kvm-clock

The Virtualization Host Configuration and Guest Installation Guide indicates that the NTP daemon should be running on the host and the guest virtual machines. 

Why is it necessary to run the NTP daemon on the guest if kvm-clock is paravirtualized?

Responses

I had not put much thought into -why- we run NTP in a VM until I read your question.  I recalled hearing about the TSC issue.

I found the following doc and explanation (although even the doc admits its info might be dated).

http://support.ntp.org/bin/view/Support/KnownOsIssues

I'm surprised that the clock could be paravirtualized, or more importantly, why it would be.  It makes me wonder what happens if you were to run hwclock on a VM.  Does the PV clock attempt to write directly to the bare-metal clock?

Anyhow - good question - I look forward to some of the responses. ;-)

Some information about the paravirtualized clock:

KVM timekeeping reference document:  http://lxr.linux.no/#linux+v3.8/Documentation/virtual/kvm/timekeeping.txt

Original kvmclock documentation: https://lkml.org/lkml/2010/4/15/355

"guests can register a memory page to contain kvmclock data. This page has to be present in guest's address space throughout its whole life. The hypervisor continues to write to it until it is explicitly disabled or the guest is turned off"

So the paravirtualized clock doesn't look like it's directly connected to the host clock; it's just a memory page shared between host and guest. hwclock shouldn't have much meaning in a VM, as the host shouldn't read from this page.
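
Incidentally, you can check which clocksource a guest kernel is actually using via sysfs; on a KVM guest you'd typically expect kvm-clock (the available list shown below is just an illustration and varies by machine):

    # Which clocksource is the guest kernel using?
    $ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
    kvm-clock
    # Which others are available?
    $ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
    kvm-clock tsc hpet acpi_pm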

I'm interested in finding out whether running NTP *servers* as VMs works well, and asked in https://access.redhat.com/discussion/how-good-paravirtualized-clock -- unfortunately it hasn't gotten any replies yet.

Probably depends on whether you're using anything Kerberos-based for integrated authentication management. While it may not be critical that your hypervisors' clocks are coordinated, it will certainly matter that your VMs' clocks are. Heck, it may be entirely possible that your hypervisor and VMs are in two different authentication/time domains (particularly if your hypervisors are hosting guests that are scattered across authentication/time domains).
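
For context, Kerberos only tolerates a small clock skew between client and KDC before authentication starts failing; the limit is configurable in krb5.conf (300 seconds is the usual default):

    # /etc/krb5.conf
    [libdefaults]
        clockskew = 300    # maximum tolerated clock difference, in seconds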

Some AD integration solutions (e.g. Likewise/BeyondTrust) embed an NTP-synchronization service to force the AD-integrated OS to be in time-sync with the Active Directory servers. We discovered this the hard way, when our AD servers somehow ended up out of sync with our NTP servers. System logs got FILLED with time-skew correction alerts as the xntpd service fought with the AD-integration service to set the system time.

I was dealing with this just over a year ago, and learned that running NTPD on the VMs created a conflict. NTPD had to run on the hypervisor, and it would set the time on the VMs.

Since the hypervisor is running on a Linux kernel, whether VMware or RHEV, we were able to run NTPD there.

It got messy when we were setting the time on the VM and the host was resetting it. We learned that the host needed to be running NTPD to keep its time accurate, and then all was good. We no longer had to set the clock on the VM.

Don't forget to use three reliable sources. At my last job the "too smart" engineers decided to sync against the AD server and a server that itself pointed at AD. The time on the AD server somehow got off by two hours and caused some problems with the Oracle DB.

If I have a choice, I point at three internal servers, which in turn point at Red Hat, the Naval Observatory, and one other good source.

From my last work on NTP, the stance I found most reliable was to run an NTP client on every host and guest, then disable any time sync between guest and host (see the example below).  This seemed to be the 'official' advice I could pull from VMware, KVM, etc.
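
On a VMware guest, for instance, the periodic host-to-guest sync can be switched off through VMware Tools:

    # Disable VMware Tools' host-to-guest time synchronization, then confirm
    vmware-toolbox-cmd timesync disable
    vmware-toolbox-cmd timesync status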

My reading on this was that your hosts and guests are better off if they all point at the same stratum of NTP servers.  Our requirement isn't just for authentication, but for highly accurate log timestamps across the whole estate.

Oh - and using 4 or more NTP sources per client was the benchmark for us.  Using 2 is a no-no due to flip-flopping between the sources.  Using 3 is much better, but reverts to 2 when you take a time source down for maintenance.  Using 4 keeps you at a good level even when 1 source is down.  Our top level comes from a mix of GPS, radio and secure external internet sources, assisted by a rubidium clock.  This top level is used by our AD servers, core network switches and some physical Linux boxes.  That second layer has many more devices and is used by all other devices in the network requiring NTP.
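
A minimal ntp.conf along those lines might look like the sketch below; the internal hostnames are hypothetical:

    # /etc/ntp.conf -- four sources, so one can go down for maintenance
    # without dropping to the flip-flop-prone two-source case
    server ntp1.example.com iburst
    server ntp2.example.com iburst
    server ntp3.example.com iburst
    server ntp4.example.com iburst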

D

Thanks for all of the responses so far to this discussion.  I have come up with the following explanation.

kvm-clock is a Linux clocksource like tsc and hpet.  (tsc is commonly used on physical machines.)  The discussion at the URL below indicates that all clocksources act like time counters that are periodically read by interrupts.

http://thread.gmane.org/gmane.linux.kernel/1062438

kvm-clock apparently has special functionality to handle the fact that the timing of interrupts is not always precisely regular in VMs. 

So, kvm-clock is a time counter and not a virtual hardware clock.  It was not designed to keep the guest's system time in sync with the host's system time.  NTP is required on both the host and the guest for adjusting their system times.
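
A quick way to confirm that NTP is actually disciplining the system clock, on host and guest alike:

    # List NTP peers; the line marked '*' is the currently selected source
    ntpq -p
    # (on chrony-based systems the equivalent is: chronyc sources)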

The RTC (a.k.a. the hardware clock) is not a Linux clocksource.  It keeps the date and time of day but is not very accurate.  It is not used to update the system time except when the OS starts up.
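
You can compare the two clocks side by side; writing the system time back to the RTC is normally done at shutdown, but it can also be done by hand:

    # Read the RTC, then the system clock
    hwclock --show
    date
    # Write the current system time back to the RTC
    hwclock --systohc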

Ah hah! Thanks for sharing your findings, Aram.

Any recommendation for RHEL guests 'ported' (imported) to run as AWS EC2 instances?

We have seen cases in which, if the system date/time is modified to the future or past (to test application functionality at other dates, for instance), the RHEL OS becomes unresponsive over time and, eventually, there is no way out of the situation but to restart the EC2 instance from the AWS Console.

Surprisingly, the AWS Instance Status Checks (software/network) do not fail immediately: they report that the EC2 instance running RHEL is still reachable (via ssh, DB connectivity, app connectivity, etc.) when in fact the instance is practically dead, yet it is not acknowledged as dead by the AWS Status Checks, CloudWatch, etc.

Is there a recommended list of kernel/system/network tunables for RHEL OSes running as guests on ESX that are to be ported (imported) to AWS EC2 instances?

Forgot to mention that system resources are completely ruled out. No resource bottleneck has been evident, whether in CloudWatch, sar, or our monitoring tool.

I haven't found much of significance on the subject.

Any insight is greatly valued. Thanks.

For anyone landing at this discussion: it was resurrected 5 years later with a different topic.

Miguel,

Please examine this link https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html and let us know if that helps at all.

Also see this https://aws.amazon.com/blogs/aws/keeping-time-with-amazon-time-sync-service/. I personally like/prefer ntp over chrony, but examine Amazon's methods.
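
For reference, the Amazon post above boils down to pointing chrony (or ntpd) at the Amazon Time Sync Service on its link-local address:

    # /etc/chrony.conf -- prefer the Amazon Time Sync Service
    server 169.254.169.123 prefer iburst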

Regards,

RJ

p.s. Miguel, you have the freedom to open a new discussion too. While your scenario has to do with virtualization, the original poster (OP) kinda centered on KVM virtualization. Yours is a good topic, but it can deserve its own discussion too.

Wish you well with this.

I will update all on the resolution to the issue I ran into.

The problem was caused by DHCP on the RHEL instances imported into AWS. By default, the Linux instance checks in with the DHCP server to renew a lease of 3600s (1h).

When manual date/time changes were made to the RHEL OS, whether setting a date/time in the future or past, the DHCP lease renewal got out of whack and the Linux instance would lose responsiveness (it would fall off the network first, and later the AWS Instance Status Check would fail) because the IP address could not be renewed.

FIX: Hard-code the network parameters in the Linux instance so it does NOT use DHCP, namely through the OS system files for it: /etc/sysconfig/network-scripts, /etc/sysconfig/network, and potentially /etc/hosts. A sketch of such a static configuration follows.
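
The sketch below reuses the addresses the DHCP server was handing out in the log at the end of this post; substitute your own values:

    # /etc/sysconfig/network-scripts/ifcfg-eth0 -- static addressing sketch
    DEVICE=eth0
    ONBOOT=yes
    BOOTPROTO=none      # do not use DHCP
    IPADDR=10.139.6.150
    PREFIX=25
    GATEWAY=10.139.6.129
    DNS1=10.139.6.2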

*** DO NOT USE DHCP IF THE RHEL INSTANCE IS HAVING ITS SYSTEM CLOCK CHANGED FOR DATE/TIME APPLICATION FUNCTIONALITY TESTING ***

Fortunately, the problem has been resolved after many hypotheses were formulated and tested. I hope AWS Support can learn from this case. It appears that no AWS customer had run into this issue before or, at a minimum, the issue was never reported/documented.

Below are the messages:

messages-20190102:Jan 2 10:25:06 costello NetworkManager[932]: [1546449906.4073] dhcp4 (eth0): dhclient started with pid 4131
messages-20190102:Jan 2 10:25:06 costello dhclient[4131]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 4 (xid=0x5934eeaa)
messages-20190102:Jan 2 10:25:06 costello dhclient[4131]: DHCPREQUEST on eth0 to 255.255.255.255 port 67 (xid=0x5934eeaa)
messages-20190102:Jan 2 10:25:06 costello dhclient[4131]: DHCPOFFER from 10.139.6.129
messages-20190102:Jan 2 10:25:06 costello dhclient[4131]: DHCPACK from 10.139.6.129 (xid=0x5934eeaa)
messages-20190102:Jan 2 10:25:06 costello NetworkManager[932]: [1546449906.4458] dhcp4 (eth0): address 10.139.6.150
messages-20190102:Jan 2 10:25:06 costello NetworkManager[932]: [1546449906.4458] dhcp4 (eth0): plen 25 (255.255.255.128)
messages-20190102:Jan 2 10:25:06 costello NetworkManager[932]: [1546449906.4459] dhcp4 (eth0): gateway 10.139.6.129
messages-20190102:Jan 2 10:25:06 costello NetworkManager[932]: [1546449906.4459] dhcp4 (eth0): lease time 3600
messages-20190102:Jan 2 10:25:06 costello NetworkManager[932]: [1546449906.4459] dhcp4 (eth0): hostname 'ip-10.139.6.150'
messages-20190102:Jan 2 10:25:06 costello NetworkManager[932]: [1546449906.4459] dhcp4 (eth0): nameserver '10.139.6.2'
messages-20190102:Jan 2 10:25:06 costello NetworkManager[932]: [1546449906.4459] dhcp4 (eth0): domain name 'candystore.com'
messages-20190102:Jan 2 10:25:06 costello NetworkManager[932]: [1546449906.4459] dhcp4 (eth0): state changed unknown -> bound
messages-20190102:Jan 2 10:25:06 costello nm-dispatcher: req:1 'dhcp4-change' [eth0]: new request (3 scripts)
messages-20190102:Jan 2 10:25:06 costello nm-dispatcher: req:1 'dhcp4-change' [eth0]: start running ordered scripts...
messages-20190102:Jan 2 10:48:22 costello dhclient[4131]: DHCPREQUEST on eth0 to 10.139.6.129 port 67 (xid=0x5934eeaa)
messages-20190102:Jan 2 10:48:22 costello dhclient[4131]: DHCPACK from 10.139.6.129 (xid=0x5934eeaa)
messages-20190102:Jan 2 10:48:22 costello NetworkManager[932]: [1546451302.8918] dhcp4 (eth0): address 10.139.6.150
messages-20190102:Jan 2 10:48:22 costello NetworkManager[932]: [1546451302.8922] dhcp4 (eth0): plen 25 (255.255.255.128)
messages-20190102:Jan 2 10:48:22 costello NetworkManager[932]: [1546451302.8922] dhcp4 (eth0): gateway 10.139.6.129
messages-20190102:Jan 2 10:48:22 costello NetworkManager[932]: [1546451302.8922] dhcp4 (eth0): lease time 3600
messages-20190102:Jan 2 10:48:22 costello NetworkManager[932]: [1546451302.8922] dhcp4 (eth0): hostname 'ip-10.139.6.150'
messages-20190102:Jan 2 10:48:22 costello NetworkManager[932]: [1546451302.8922] dhcp4 (eth0): nameserver '10.139.6.2'
messages-20190102:Jan 2 10:48:22 costello NetworkManager[932]: [1546451302.8922] dhcp4 (eth0): domain name 'candystore.com'
messages-20190102:Jan 2 10:48:22 costello NetworkManager[932]: [1546451302.8922] dhcp4 (eth0): state changed bound -> bound
messages-20190102:Jan 2 10:48:22 costello nm-dispatcher: req:1 'dhcp4-change' [eth0]: new request (3 scripts)
messages-20190102:Jan 2 10:48:22 costello nm-dispatcher: req:1 'dhcp4-change' [eth0]: start running ordered scripts...