Linux server time not in sync with ntp server

Hello all, I have a Linux virtual machine, version 6 (kernel 2.6.32-573.18.1.el6.x86_64).
We recently rebooted our servers during our maintenance window, and after the reboot one server was not able to sync with the NTP server.
The server was running 3 minutes ahead of the NTP server. We restarted the ntpd service and ran ntpdate between stopping and starting ntpd.
That solved the problem, but I am trying to find out why this server did not sync with the NTP server.
We also found that the ESXi host where the server was running had the wrong time (+3 minutes), and only 1 of the 20 servers running on that ESXi host was unable to sync with the NTP server.

I tried to replicate the issue on a test server, but every attempt to reproduce the problem failed: the time was adjusted to match the NTP server within a minute.

Here are some entries I grepped out of the messages log file. If any additional information is needed, please let me know.

Any suggestions or theories as to what could have been the problem will be appreciated.

(The server was rebooted twice, so you will see ntpd restarted twice.)

grep ntpd messages*

messages-20160918:Sep 17 12:11:42 rsagovprd1 ntpd[2762]: ntpd exiting on signal 15
messages-20160918:Sep 17 12:15:51 rsagovprd1 ntpd[3139]: ntpd 4.2.6p5@1.2349-o Tue May 3 15:12:50 UTC 2016 (1)
messages-20160918:Sep 17 12:15:51 rsagovprd1 ntpd[3140]: proto: precision = 0.059 usec
messages-20160918:Sep 17 12:15:51 rsagovprd1 ntpd[3140]: 0.0.0.0 c01d 0d kern kernel time sync enabled
messages-20160918:Sep 17 12:15:53 rsagovprd1 ntpd[3140]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
messages-20160918:Sep 17 12:15:53 rsagovprd1 ntpd[3140]: Listen and drop on 1 v6wildcard :: UDP 123
messages-20160918:Sep 17 12:15:53 rsagovprd1 ntpd[3140]: Listen normally on 2 lo 127.0.0.1 UDP 123
messages-20160918:Sep 17 12:15:53 rsagovprd1 ntpd[3140]: Listen normally on 3 eth0 172.16.95.193 UDP 123
messages-20160918:Sep 17 12:15:53 rsagovprd1 ntpd[3140]: Listen normally on 4 eth1 172.16.51.25 UDP 123
messages-20160918:Sep 17 12:15:53 rsagovprd1 ntpd[3140]: Listen normally on 5 lo ::1 UDP 123
messages-20160918:Sep 17 12:15:53 rsagovprd1 ntpd[3140]: Listen normally on 6 eth1 fe80::250:56ff:fe87:1285 UDP 123
messages-20160918:Sep 17 12:15:53 rsagovprd1 ntpd[3140]: Listen normally on 7 eth0 fe80::250:56ff:fe87:575 UDP 123
messages-20160918:Sep 17 12:15:53 rsagovprd1 ntpd[3140]: Listening on routing socket on fd #24 for interface updates
messages-20160918:Sep 17 12:15:55 rsagovprd1 ntpd[3140]: 0.0.0.0 c016 06 restart
messages-20160918:Sep 17 12:15:55 rsagovprd1 ntpd[3140]: 0.0.0.0 c012 02 freq_set ntpd -2.713 PPM
messages-20160918:Sep 17 12:15:55 rsagovprd1 ntpd[3140]: 0.0.0.0 c515 05 clock_sync
messages-20160918:Sep 17 12:23:48 rsagovprd1 ntpd[3140]: frequency error -968 PPM exceeds tolerance 500 PPM
messages-20160918:Sep 17 12:24:55 rsagovprd1 ntpd[3140]: frequency error -636 PPM exceeds tolerance 500 PPM
messages-20160918:Sep 17 12:26:04 rsagovprd1 ntpd[3140]: frequency error -640 PPM exceeds tolerance 500 PPM
messages-20160918:Sep 17 12:35:01 rsagovprd1 ntpd[3140]: frequency error -1585 PPM exceeds tolerance 500 PPM
messages-20160918:Sep 17 12:36:10 rsagovprd1 ntpd[3140]: frequency error -535 PPM exceeds tolerance 500 PPM
messages-20160918:Sep 17 14:12:19 rsagovprd1 ntpd[3140]: frequency error -624 PPM exceeds tolerance 500 PPM
messages-20160918:Sep 17 14:14:33 rsagovprd1 ntpd[3140]: frequency error -746 PPM exceeds tolerance 500 PPM
messages-20160918:Sep 17 14:16:41 rsagovprd1 ntpd[3140]: frequency error -734 PPM exceeds tolerance 500 PPM
messages-20160918:Sep 17 14:25:30 rsagovprd1 ntpd[3140]: frequency error -1459 PPM exceeds tolerance 500 PPM
messages-20160918:Sep 17 15:05:15 rsagovprd1 ntpd[3284]: ntpd 4.2.6p5@1.2349-o Tue May 3 15:12:50 UTC 2016 (1)
messages-20160918:Sep 17 15:05:15 rsagovprd1 ntpd[3285]: proto: precision = 0.123 usec
messages-20160918:Sep 17 15:05:15 rsagovprd1 ntpd[3285]: 0.0.0.0 c01d 0d kern kernel time sync enabled
messages-20160918:Sep 17 15:05:15 rsagovprd1 ntpd[3285]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
messages-20160918:Sep 17 15:05:15 rsagovprd1 ntpd[3285]: Listen and drop on 1 v6wildcard :: UDP 123
messages-20160918:Sep 17 15:05:15 rsagovprd1 ntpd[3285]: Listen normally on 2 lo 127.0.0.1 UDP 123
messages-20160918:Sep 17 15:05:15 rsagovprd1 ntpd[3285]: Listen normally on 3 eth0 172.16.95.193 UDP 123
messages-20160918:Sep 17 15:05:15 rsagovprd1 ntpd[3285]: Listen normally on 4 eth1 172.16.51.25 UDP 123
messages-20160918:Sep 17 15:05:15 rsagovprd1 ntpd[3285]: Listen normally on 5 lo ::1 UDP 123
messages-20160918:Sep 17 15:05:15 rsagovprd1 ntpd[3285]: Listen normally on 6 eth1 fe80::250:56ff:fe87:1285 UDP 123
messages-20160918:Sep 17 15:05:15 rsagovprd1 ntpd[3285]: Listen normally on 7 eth0 fe80::250:56ff:fe87:575 UDP 123
messages-20160918:Sep 17 15:05:15 rsagovprd1 ntpd[3285]: Listening on routing socket on fd #24 for interface updates
messages-20160918:Sep 17 15:05:15 rsagovprd1 ntpd[3285]: 0.0.0.0 c016 06 restart
messages-20160918:Sep 17 15:05:15 rsagovprd1 ntpd[3285]: 0.0.0.0 c012 02 freq_set ntpd -500.000 PPM
messages-20160918:Sep 17 15:05:16 rsagovprd1 ntpd[3285]: 0.0.0.0 c515 05 clock_sync
messages-20160918:Sep 17 15:05:18 rsagovprd1 ntpd[3285]: frequency error -554 PPM exceeds tolerance 500 PPM
messages-20160918:Sep 17 15:05:27 rsagovprd1 ntpd[3285]: frequency error -745 PPM exceeds tolerance 500 PPM
messages-20160918:Sep 17 15:10:56 rsagovprd1 ntpd[3285]: frequency error -9437 PPM exceeds tolerance 500 PPM
messages-20160918:Sep 17 15:19:54 rsagovprd1 ntpd[3285]: frequency error -15113 PPM exceeds tolerance 500 PPM
messages-20160918:Sep 17 15:21:02 rsagovprd1 ntpd[3285]: frequency error -2346 PPM exceeds tolerance 500 PPM
messages-20160918:Sep 17 15:35:30 rsagovprd1 ntpd[3285]: frequency error -6388 PPM exceeds tolerance 500 PPM
messages-20160918:Sep 17 15:37:41 rsagovprd1 ntpd[3285]: frequency error -1388 PPM exceeds tolerance 500 PPM
messages-20160918:Sep 17 15:53:01 rsagovprd1 ntpd[3285]: frequency error -6746 PPM exceeds tolerance 500 PPM
messages-20160918:Sep 17 16:10:26 rsagovprd1 ntpd[3285]: frequency error -7588 PPM exceeds tolerance 500 PPM
messages-20160918:Sep 17 16:17:02 rsagovprd1 ntpd[3285]: frequency error -11239 PPM exceeds tolerance 500 PPM
messages-20160918:Sep 17 16:21:35 rsagovprd1 ntpd[3285]: frequency error -7603 PPM exceeds tolerance 500 PPM
messages-20160918:Sep 17 16:25:59 rsagovprd1 ntpd[3285]: frequency error -7670 PPM exceeds tolerance 500 PPM
messages-20160918:Sep 17 16:31:27 rsagovprd1 ntpd[3285]: frequency error -9406 PPM exceeds tolerance 500 PPM
messages-20160918:Sep 17 16:37:59 rsagovprd1 ntpd[3285]: frequency error -11139 PPM exceeds tolerance 500 PPM
....
....
....
messages-20160925:Sep 21 07:06:48 rsagovprd1 ntpd[3285]: frequency error -9832 PPM exceeds tolerance 500 PPM
messages-20160925:Sep 21 07:10:03 rsagovprd1 ntpd[3285]: frequency error -3991 PPM exceeds tolerance 500 PPM
messages-20160925:Sep 21 07:12:55 rsagovprd1 ntpd[3285]: 0.0.0.0 0618 08 no_sys_peer
messages-20160925:Sep 21 07:18:47 rsagovprd1 ntpd[3285]: frequency error -6798 PPM exceeds tolerance 500 PPM
messages-20160925:Sep 21 07:25:19 rsagovprd1 ntpd[3285]: frequency error -7511 PPM exceeds tolerance 500 PPM
messages-20160925:Sep 21 07:29:46 rsagovprd1 ntpd[3285]: frequency error -5272 PPM exceeds tolerance 500 PPM
....
....
....
messages-20160925:Sep 21 17:58:04 rsagovprd1 ntpd[3285]: frequency error -1572 PPM exceeds tolerance 500 PPM
messages-20160925:Sep 21 18:00:23 rsagovprd1 ntpd[3285]: 0.0.0.0 0628 08 no_sys_peer
messages-20160925:Sep 21 18:03:31 rsagovprd1 ntpd[3285]: frequency error -3647 PPM exceeds tolerance 500 PPM
messages-20160925:Sep 21 18:04:35 rsagovprd1 ntpd[3285]: frequency error -1571 PPM exceeds tolerance 500 PPM
messages-20160925:Sep 21 18:13:20 rsagovprd1 ntpd[3285]: frequency error -9285 PPM exceeds tolerance 500 PPM
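
For anyone wanting a quick summary of an excerpt like the one above, here is a minimal sketch that counts the "frequency error" lines and reports the largest absolute value. The function name summarize_freq_errors is made up for illustration, and it assumes the message format shown in the log (the signed PPM value directly after the word "error").

```shell
#!/bin/sh
# Summarize ntpd "frequency error" lines read from stdin.
# Prints: <number of matching lines> <largest absolute error in PPM>
summarize_freq_errors() {
    awk '
        /frequency error/ {
            for (i = 1; i < NF; i++) {
                if ($i == "error") {
                    v = $(i + 1) + 0        # signed PPM value follows "error"
                    if (v < 0) v = -v       # compare by magnitude
                    if (v > max) max = v
                    n++
                }
            }
        }
        END { printf "%d %d\n", n, max }
    '
}

# Usage, run from the directory that holds the messages files:
#   grep ntpd messages* | summarize_freq_errors
```

Values far beyond the 500 PPM tolerance, like the five-digit ones in this log, suggest the clock was being pushed around by something other than ordinary oscillator drift.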

Responses

Hi,

ntpd doesn't like it when the local time differs too much from the real time. I don't know if this is the case here, but try to sync the time manually using ntpdate: stop ntpd, run ntpdate, then start ntpd.
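
A rough sketch of that stop / ntpdate / start sequence as a shell function, assuming the RHEL 6 style `service` command. The function names, the DRY_RUN switch and the server name ntp.example.com are placeholders for illustration, not anything from the original setup.

```shell
#!/bin/sh
# Helper: run a command, or only print it when DRY_RUN=1.
_run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop ntpd, set the clock once with ntpdate, then start ntpd again.
# $1 is the NTP server to query (placeholder in the usage line below).
resync_clock() {
    server=$1
    _run service ntpd stop &&
    _run ntpdate "$server" &&
    _run service ntpd start
}

# Usage, as root (preview first with DRY_RUN=1):
#   DRY_RUN=1 resync_clock ntp.example.com
#   resync_clock ntp.example.com
```

ntpd is stopped first because ntpdate normally cannot bind UDP port 123 while the daemon is holding it.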

Zdenek, thank you for your feedback. The issue is resolved; that is what we did to update the time. What I am trying to find out is why the client never synced with the NTP server.

Nice.

As I described, when the ntpd daemon detects too big a difference between local and remote time (or very high jitter), it refuses to synchronize. This is by design.

An NTP client may refuse to sync with the server when the clocksource is slow, the jitter is high, or the server's stratum is too high.

Zdenek,

I tried to replicate the issue on a different server (same NTP configuration and same RPM version) by setting the same time difference I experienced on my prod server. Unfortunately, the ntpd daemon was able to sync the time despite the +3 minute difference. Any other suggestions?

Hmm looking at the error messages, I believe it could be fixed by following this article.

Let me know if it helps...

While these types of errors are typical of the time offset being too large, you state that your VM-to-time-service delta is in the +3 minute range. By default, ntpd tolerates an offset of up to +/- 1000 seconds (roughly 16.7 minutes) before giving up. So, you should be good from that standpoint.

Another, less frequent cause is that your VM's internal time-keeping is too unstable. This can have a couple of causes:

  • your VM's virtual hwclock is flaky (for some reason)
  • you've got time-source competition.

For older versions of RHEL (i.e., RHEL 5 and older), flakiness in the virtual hwclock wasn't uncommon. Sometimes, you had to change your boot options to select a different hardware source or change your default clock-frequency.
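
To see which clock source the kernel is currently using, something like the following sketch may help. It reads the standard sysfs clocksource files; the function name show_clocksource and the CLOCKSOURCE_DIR override are illustrative assumptions.

```shell
#!/bin/sh
# Print the kernel's current and available clocksources.
# CLOCKSOURCE_DIR can be overridden (for example, when testing).
show_clocksource() {
    dir=${CLOCKSOURCE_DIR:-/sys/devices/system/clocksource/clocksource0}
    if [ ! -r "$dir/current_clocksource" ]; then
        echo "clocksource information not found in $dir" >&2
        return 1
    fi
    echo "current:   $(cat "$dir/current_clocksource")"
    echo "available: $(cat "$dir/available_clocksource")"
}

# Usage:
#   show_clocksource
```

If a different source looks like a better fit, the kernel's clocksource= boot parameter is the usual way to select it, but which source is appropriate for a given guest is worth confirming against the hypervisor vendor's guidance first.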

Since you state that your VM is ESX-hosted, that introduces the possibility that your (failing) VM has the VMware Tools installed and that those tools have been configured to perform time-setting functions. If the ESX host disagrees with the NTP source, the NTP software can declare your clock unreliable and refuse to operate.

Overall, you might want to consider fixing your ESX host's time (it sorta reads like you have other ESX hosts that do have the correct time). If that's not possible, make sure that VMware Tools isn't screwing you.
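
One way to check the VMware Tools side from inside the guest is sketched below. It assumes VMware Tools is installed and provides the vmware-toolbox-cmd utility; the function name vmtools_timesync_status and the TOOLBOX_CMD override are hypothetical conveniences.

```shell
#!/bin/sh
# Report whether VMware Tools periodic time sync is enabled in this guest.
# TOOLBOX_CMD can be overridden; vmware-toolbox-cmd is the default.
vmtools_timesync_status() {
    cmd=${TOOLBOX_CMD:-vmware-toolbox-cmd}
    if ! command -v "$cmd" >/dev/null 2>&1; then
        echo "unknown ($cmd not found)"
        return 2
    fi
    # Prints the tool's own answer (Enabled or Disabled).
    "$cmd" timesync status
}

# Usage:
#   vmtools_timesync_status
# If ntpd is meant to be the only time source, the periodic sync can be
# turned off with:
#   vmware-toolbox-cmd timesync disable
```

Running this on the VM that misbehaved and on one of the VMs that synced fine would show whether their VMware Tools settings differ.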

ntpd has a default maximum delta between local time and server time. If your local time is within 1000s of the NTP source, the local ntp daemon will attempt to correct the clock. If it's greater than 1000s, you have to do a manual sync.

You can also configure the local NTP daemon's start options to allow the initial sync to function even if the local-to-remote delta is greater than 1000s. Set the -g flag to allow this one-time correction.
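
As a small worked example of that 1000s rule, the sketch below checks whether a given offset falls inside the default limit. The function name within_panic_threshold and the PANIC_THRESHOLD variable are illustrative; the offset itself would have to come from whatever measurement you trust.

```shell
#!/bin/sh
# Default ntpd limit discussed above, in seconds.
PANIC_THRESHOLD=1000

# Succeeds (exit 0) if the offset in seconds, which may be negative or
# fractional, is within the threshold; fails (exit 1) otherwise.
within_panic_threshold() {
    awk -v off="$1" -v max="$PANIC_THRESHOLD" 'BEGIN {
        if (off < 0) off = -off
        exit (off <= max) ? 0 : 1
    }'
}

# Usage: the +3 minute offset from this thread is 180 seconds.
#   if within_panic_threshold 180; then
#       echo "ntpd should correct this on its own"
#   else
#       echo "manual sync or -g needed"
#   fi
```

By this measure, 180 seconds is well inside the limit, which is why the offset alone does not explain the failure. On RHEL 6 the daemon's start options normally live in the OPTIONS line of /etc/sysconfig/ntpd, so that is the place to check whether -g is already present.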

See the second paragraph of the "How NTP Operates" section of the NTP documentation.

Juan - you mentioned these systems were rebooted.

One thing that may help is the "ntpdate" service (separate from the ntpdate command).

    NOTE: this bit is not intended as a total & complete mitigation, but as one facet of many in an overall approach in a proper time synchronization setup.  

The ntpdate service arrived in RHEL 6 as described in this RH solution at https://access.redhat.com/solutions/227033, and the Red Hat documentation is here.

Check whether the ntpdate service is enabled:

chkconfig --list ntpdate

and activate it if it is not activated.

To make it work, it will need to have /etc/ntp/step-tickers populated with (preferably) the IP address(es) of a reachable/reliable stratum NTP server, and again, see the documentation.
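
A minimal sketch of populating that file and enabling the service follows. The helper name write_step_tickers is made up, and the addresses are placeholders from the documentation range 192.0.2.0/24; substitute your own NTP servers.

```shell
#!/bin/sh
# Write one NTP server per line to a step-tickers file.
# $1 is the file to write; the remaining arguments are server addresses.
# Note: this overwrites any existing content in the file.
write_step_tickers() {
    file=$1
    shift
    printf '%s\n' "$@" > "$file"
}

# Usage, as root (placeholder addresses):
#   write_step_tickers /etc/ntp/step-tickers 192.0.2.10 192.0.2.11
#   chkconfig ntpdate on
```

With this in place, the ntpdate service sets the clock once at boot from the listed servers, and ntpd takes over from there.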

Also, see these NTP best practices from Red Hat at https://access.redhat.com/solutions/778603

I recommend a sane use of a collection of "peers" and more than one NTP server. Also take a look at ntp.org.

R. Hinton, thank you for your response. We don't have the ntpdate service running. I agree that perhaps I should have this service enabled to ensure the time is set correctly despite the hardware clock being 3 minutes ahead. Will this cause any issues with the ntpd daemon?

Typically, an ESX-hosted Linux VM will only take time cues from the hypervisor if you've: a) got VMware Tools installed; and, b) selected the option to have VMware Tools update the VM's time. Since you seem to be saying that the VM and the ESX host both deviate from the NTP server by the same amount, it's likely that "a" and "b" are true. That you've got differences between VMs hosted on the same ESX host likely indicates that "a" and "b" are not (both) consistently true across the VMs hosted on that ESX host.

When it comes to time services, you only ever want one thing setting time. Otherwise, you can end up with time jumping each time one of the configured services attempts to "fix" the time. Depending on what you're running, these kinds of multi-source time skips can cause your host to crash (some clusterware will shoot nodes whose time is skipping).

The worst case I ever saw was an AD-integrated RHEL 6 VM running under VMware: the VM-owner had configured VMware tools to update time, ntpd was active and the AD-integration software was also setting time. Because there were time deltas between the ESX host, the AD domain and the NTP service, their VMs' system logs were filled with time adjustment log entries as each time source attempted to fix the corrections made by the other two time sources. They also had sporadic authentication issues because AD is even less forgiving of time-slew than the ntp service is.

Tom, that worst case you mentioned - yuk, what a mess...

The joys of working on large teams where people have a mix of backgrounds and skill-sets.

That said, it wouldn't have been so much of a mess if all of the time-sources agreed with each other.

indeed Tom,

Juan, by the way, here's a list of stratum 1 servers at http://support.ntp.org/bin/view/Servers/StratumOneTimeServers
