Sat 6.2.4 - Last Checkin date/time not updating


Systems were registered using activation keys and the katello-agent is installed and running on the systems. However, the last checkin day/time has not been updated since the systems have been registered. Should they not be checking in every 30 minutes or so?

Responses

Gary,

Apologies for the late reply. Is this problem still occurring?

If so, when you say "last checkin day/time", are you referring to the "Last Report" column for all hosts when you open the Satellite web UI and navigate to Hosts > All hosts?

The frequency with which a host reports to either a Satellite Server or Capsule Server is determined by the host's Puppet agent's configuration. The default frequency is between 30 minutes and 60 minutes.

Hi Russell,

Thanks for the reply, yes this is still happening. I am seeing the "Last Checkin" in the UI:

Hosts > Content Hosts

The last column on the right side named Last Checkin.

The Puppet agent explanation makes sense then: Puppet was installed at first, while I was registering with the bootstrap script. Since then I have switched registration over to use only the activation key, without installing Puppet.

However, I was under the impression that the check in was similar to the rhn_check found in sat5.

Gary,

I'll check with an engineer, but it's my understanding that the "Last Checkin" reflects the most recent date and time on which the host's Puppet agent "checked in" with the Puppet Master. Since you're not installing the Puppet agent on hosts, that would explain the last check-in information not being updated.

Russell--

Is that new behavior with 6.2? In the past (at least in my experience up to Satellite 6.1.x), "Last Checkin" referred to the Katello agent (goferd) checking in, i.e. the "Pulp" side of Satellite 6's split personality, not the Puppet/config mgmt. side. So checking goferd status and connectivity on the client would be my first guess.

Not sure, but I actually think it is the last time rhsm "checked in" (which happens every 4 hours by default, I think).

Checking the rhsm.log on a couple of servers here, the time matches well.

That was my thinking as well, that the last checkin was referring to when katello or goferd checks in to the sat server, saying hey look at me I'm online and good to go. I have verified that the gofer service is running in the past and just checked again on a host that has a last checkin time of when I registered it:

[root@gltest62 ~]# service goferd status
goferd (23828) is running.

System registered stamp: 2016-12-22 11:03:15 -0600

Last Checkin stamp: 2016-12-22 11:12:13 -0600

However, as Russell has stated, it could be the Puppet agent's last check-in. This makes sense, as some of the registered systems have the Puppet agent installed and their last check-in stamp is recent.

But here is the issue. I do not want to use puppet as we have another system we use for config management.

Isn't it "Last Report" that is the last puppet run? The value you see in the table you get when you choose Hosts > All hosts.

Gary,

Thanks for clarifying which field you were looking at. I realise, after reading your comment above, that I was looking at the wrong place in the Satellite web UI. I was looking at Hosts > All hosts, but you were looking at Hosts > Content Hosts, which is a different view. Apologies for confusing the discussion.

I will check and confirm just where the "Last Checkin" date comes from.

Gary,

Can you please check something on 1 or 2 of those hosts for which the "Last Checkin" date and time is not being updated? Check whether the Red Hat Subscription Manager daemon, rhsmcertd, is running. I was just reading about a customer case where hosts were migrated from Satellite 5 to Satellite 6. The customer had disabled the rhsmcertd daemon when the hosts were under Satellite 5 management, and they were successfully migrated to Satellite 6 management. However, without rhsmcertd running, they were not reporting their status to the Satellite 6 server and so their "Last Checkin" date and time was not being updated.

The rhsmcertd daemon's log file is located at /var/log/rhsm/rhsmcertd.log. Before enabling the daemon, it would be interesting to check the log file and see if the time stamps recorded there match with the time stamps noted in the Satellite web UI.
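A minimal sketch of the check suggested above, assuming a RHEL 6 host with SysV init (so the `service` and `chkconfig` commands are available). The function name `check_rhsmcertd` is a hypothetical helper for illustration and is not shipped with subscription-manager:

```shell
# check_rhsmcertd: hypothetical helper, not part of subscription-manager.
# Assumes a SysV-init host (RHEL 6) where "service" and "chkconfig" exist.
check_rhsmcertd() {
    # Log path from the thread; can be overridden with the first argument.
    log="${1:-/var/log/rhsm/rhsmcertd.log}"

    # "service NAME status" returns 0 when the daemon is running.
    if service rhsmcertd status >/dev/null 2>&1; then
        echo "rhsmcertd: running"
    else
        echo "rhsmcertd: not running"
    fi

    # Show the runlevels in which the daemon is configured to start.
    chkconfig --list rhsmcertd

    # The log file may be missing if the daemon has never run on this host.
    if [ -f "$log" ]; then
        tail -n 5 "$log"
    else
        echo "$log does not exist"
    fi
}
```

Running `check_rhsmcertd` as root on an affected host should show at a glance whether the daemon is stopped and whether it has ever written a log.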

Ok so a couple of interesting things. The rhsmcertd service was not running on the servers that did not have an updated checkin time. However, on the system that did have an updated checkin time, the service was not running, but had a checkin stamp from earlier this morning, possibly puppet agent? The rhsmcertd.log did not exist previously so I could not see what the log file was stating.

Now, after restarting the rhsmcertd service, the checkin time updated and showed the appropriate errata that the system needed.

However, checking the log file I see an error:

[root@gltest69 ~]# cat /var/log/rhsm/rhsmcertd.log
Fri Jan 13 10:02:29 2017 [INFO] Starting rhsmcertd...
Fri Jan 13 10:02:29 2017 [INFO] Auto-attach interval: 1440.0 minute(s) [86400 second(s)]
Fri Jan 13 10:02:29 2017 [INFO] Cert check interval: 240.0 minute(s) [14400 second(s)]
Fri Jan 13 10:02:29 2017 [INFO] Waiting 120 second(s) [2.0 minute(s)] before running updates.
Fri Jan 13 10:04:32 2017 [INFO] (Auto-attach) Certificates updated.
Fri Jan 13 10:04:34 2017 [INFO] (Cert Check) Certificates updated.
Fri Jan 13 10:05:43 2017 [ERROR] unable to get lock, exiting

Taking a look at the -help option for the rhsmcertd service I see there's the -n option for now.

[root@gltest69 ~]# rhsmcertd -help
Usage: rhsmcertd [OPTION...]

Help Options:
  -h, --help                              Show help options

Application Options:
  --cert-interval=MINUTES                 deprecated, see --cert-check-interval
  -c, --cert-check-interval=MINUTES       interval to run cert check (in minutes)
  --heal-interval=MINUTES                 deprecated, see --auto-attach-interval
  -i, --auto-attach-interval=MINUTES      interval to run auto-attach (in minutes)
  -n, --now                               run the initial checks immediately, with no delay
  -d, --debug                             show debug messages

After I ran that, I saw the lock error again. I then chose the -d option for debug. Log results:

Fri Jan 13 10:24:18 2017 [DEBUG] Loading configuration from: /etc/rhsm/rhsm.conf
Fri Jan 13 10:24:18 2017 [ERROR] unable to get lock, exiting

The check-in time in the Satellite UI shows the time at which I started the rhsmcertd service, and it has not updated since.

Contents of /etc/rhsm/rhsm.conf file:

# Red Hat Subscription Manager Configuration File:

# Unified Entitlement Platform Configuration
[server]
# Server hostname:
hostname = satserver.example.com

# Server prefix:
prefix = /rhsm

# Server port:
port = 443

# Set to 1 to disable certificate validation:
insecure = 0

# Set the depth of certs which should be checked
# when validating a certificate
ssl_verify_depth = 3

# an http proxy server to use
proxy_hostname =

I'm very curious whether, for those who did not install the Puppet agent, the check-in times are out of date or continue to update.

Unless there is some strange thing going on here, Puppet should not be related to rhsm (the Red Hat Subscription Manager system). I'm quite sure Last Checkin has to do with the Katello/Candlepin part of Satellite that you find under Hosts > Content Hosts and not with the Foreman/Puppet part that is found under Hosts > All Hosts.

So:

Puppet -> Last report under Hosts > All Hosts.

rhsm -> Last Checkin under Hosts > Content Hosts

It might be useful to see what it looks like on a working system, so checking on our test system I see that a host has a last checkin of 14:10:06 UTC.

On the host I see this in the logs (note that this is the +0100 timezone)

/var/log/rhsm/rhsm.log

2017-01-13 15:10:06,353 [INFO] rhsmcertd-worker:3171:MainThread @connection.py:830 - Connection built: host=satellite01.example.com port=443 handler=/rhsm auth=identity_cert ca_dir=/etc/rhsm/ca/ verify=False
2017-01-13 15:10:06,604 [INFO] rhsmcertd-worker:3171:MainThread @entcertlib.py:131 - certs updated:
Total updates: 0

...etc. Continues with the certificate numbers and repo list etc.

/var/log/rhsm/rhsmcertd.log

Fri Jan 13 07:10:07 2017 [INFO] (Cert Check) Certificates updated.
Fri Jan 13 11:10:04 2017 [INFO] (Auto-attach) Certificates updated.
Fri Jan 13 11:10:07 2017 [INFO] (Cert Check) Certificates updated.
Fri Jan 13 15:10:07 2017 [INFO] (Cert Check) Certificates updated

...so every 4 hours.
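To compare a host's check-in cadence with the timestamp shown in the web UI, the certificate-check lines can be pulled out of rhsmcertd.log. This is only a sketch based on the log format quoted in this thread; `cert_check_times` is a hypothetical helper name:

```shell
# cert_check_times: hypothetical helper that lists when rhsmcertd ran its
# certificate check, based on the log format shown in this thread.
cert_check_times() {
    # Log path from the thread; can be overridden with the first argument.
    log="${1:-/var/log/rhsm/rhsmcertd.log}"
    # Keep only the "(Cert Check)" lines and strip everything from " [INFO]"
    # onwards, leaving just the timestamp.
    grep -F '(Cert Check) Certificates updated' "$log" | sed 's/ \[INFO\].*$//'
}
```

With the log excerpt above, this prints the 07:10:07, 11:10:07 and 15:10:07 entries, which matches the 240-minute cert check interval that rhsmcertd reports at startup.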

On the satellite server I see this searching for the subscription id (UUID from Content Host page):

root@satellite01:~> grep 4cc65712-53c8-442c-9ee9-5eb23c60820d/certificates /var/log/foreman/production.log

2017-01-13 07:10:06 [app] [I] Started GET "/rhsm/consumers/4cc65712-53c8-442c-9ee9-5eb23c60820d/certificates/serials" for 10.236.7.132 at 2017-01-13 07:10:06 +0100
2017-01-13 11:10:04 [app] [I] Started GET "/rhsm/consumers/4cc65712-53c8-442c-9ee9-5eb23c60820d/certificates/serials" for 10.236.7.132 at 2017-01-13 11:10:04 +0100
2017-01-13 11:10:06 [app] [I] Started GET "/rhsm/consumers/4cc65712-53c8-442c-9ee9-5eb23c60820d/certificates/serials" for 10.236.7.132 at 2017-01-13 11:10:06 +0100
2017-01-13 15:10:06 [app] [I] Started GET "/rhsm/consumers/4cc65712-53c8-442c-9ee9-5eb23c60820d/certificates/serials" for 10.236.7.132 at 2017-01-13 15:10:06 +0100

Gary and Terje,

Thanks for that additional information. It seems we don't yet have a definitive answer. I have asked the Satellite 6 engineers to confirm the source of the last checkin date and time.

Gary,

In looking over the content of the example rhsmcertd.log file, I'm concerned about the presence of the message "...unable to get lock". This may indicate that one or more of the Satellite background tasks is paused, and has a lock on resources required by RHSM.

Fri Jan 13 10:02:29 2017 [INFO] Starting rhsmcertd...
Fri Jan 13 10:02:29 2017 [INFO] Auto-attach interval: 1440.0 minute(s) [86400 second(s)]
Fri Jan 13 10:02:29 2017 [INFO] Cert check interval: 240.0 minute(s) [14400 second(s)]
Fri Jan 13 10:02:29 2017 [INFO] Waiting 120 second(s) [2.0 minute(s)] before running updates.
Fri Jan 13 10:04:32 2017 [INFO] (Auto-attach) Certificates updated.
Fri Jan 13 10:04:34 2017 [INFO] (Cert Check) Certificates updated.
Fri Jan 13 10:05:43 2017 [ERROR] unable to get lock, exiting

To check this, open the Satellite 6 web UI and navigate to Monitor > Tasks. All tasks are listed in descending order by date and time, so the very latest will be at the top. If you can't see any that are in a paused state, enter the following search criterion in the Search field: state = paused

I'm still waiting on an answer to my question about the "Last checkin" field, but checking for paused tasks in the meantime would be a useful step.

Hey Russell,

I did find one task paused:

Id: 665671f0-be07-44e9-94b4-3c928b1df257
Label: Actions::Katello::Host::GenerateApplicability
Name: Generate applicability
Owner: foreman_admin
Execution type: Delayed
Start at: 2016-11-28 10:13:17 -0600
Start before: -
Started at: 2016-11-28 10:13:17 -0600
Ended at:
State: paused
Result: error
Params: {"services_checked"=>["pulp", "pulp_auth"], "host_ids"=>[28], "current_user_id"=>1}


PG::Error: ERROR:  current transaction is aborted, commands ignored until end of transaction block
: DELETE FROM "katello_content_facet_errata" WHERE "katello_content_facet_errata"."content_facet_id" = 17

The Cancel button could not be pressed, but I pressed the "Stop auto reloading" button.

It seems like the hosts are checking in. I see time stamps on the UI on the systems between 6:44AM - 7:24AM this morning. This is before when I stopped the auto reload for the task above. I'm going to keep an eye on the check in times, hopefully they will continue to check in.

Gary,

OK - it's good to hear that the hosts are now checking in, and the Satellite web UI reflects current timestamps. I'm a little concerned about that paused task, and would suggest you raise a support case with Red Hat to get that resolved.

Regarding the "Last Checkin" time listed for hosts in the Satellite web UI, I confirmed with an engineer that it is the last date and time the rhsmcertd daemon reported its status to the Satellite Server. It seems that starting that service, and stopping the paused task autoreloading, has fixed the hosts' checkins.

Russell,

Thank you for your help with this. All systems are checking in with Satellite.

I do have a follow-up question. Is there any reason why the rhsmcertd was not turned on by default once the system registered to Satellite 6?

I have the systems registered by the activation key, not via the provided bootstrap.py.

subscription-manager register --org="SatTest" --activationkey="Sat_Test_Key"

Gary,

Thanks for your reply. It's great to know that all systems are checking in at the expected interval.

As to your question about the state of rhsmcertd, it definitely should have been running, so why it was not is a mystery, and one I'd love to resolve. The method of registration should not, I believe, have any effect on the rhsmcertd service. Can you please confirm - are these systems new? What version of Red Hat Enterprise Linux are they running? Is there any other management-oriented software installed on them?

The servers were new but provisioned from Salt. However, they were first registered to Satellite 5, I then removed them from 5 and subscribed them to 6.

The RHEL version is 6.

I'm going to try provisioning the servers so that they do not register to Sat 5 first. Will keep you posted!

So I found out what may have caused this.

rhsmcertd appears to come as part of subscription-manager, and my systems did not have this package in the template we use. I had to register the systems to Sat5 first, install the subscription-manager package, then unsubscribe from Sat5.

After making a few new test servers yesterday, I found that after you install subscription-manager, rhsmcertd does not auto start, but the runlevels are configured to start. So lesson learned, make sure rhsmcertd is running :)
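The lesson above can be sketched as a small post-registration step. This assumes RHEL 6 with SysV init, as used in this thread; `ensure_rhsmcertd` is a hypothetical helper name, not something provided by subscription-manager or bootstrap.py:

```shell
# ensure_rhsmcertd: hypothetical helper, assuming RHEL 6 with SysV init.
# Must be run as root.
ensure_rhsmcertd() {
    # Start the daemon only if it is not already running.
    if ! service rhsmcertd status >/dev/null 2>&1; then
        service rhsmcertd start
    fi
    # Make sure the daemon is also enabled at boot.
    chkconfig rhsmcertd on
}
```

Calling `ensure_rhsmcertd` straight after the `subscription-manager register` command would cover the case where the package was installed but the daemon was never started.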

As an FYI, I opened bz1414993 to add the capability to start rhsmcertd (and enable it on startup) to bootstrap.py.

Gary,

Thanks for that reply. It's great to know that you now have the hosts successfully reporting, and have found the root cause of the problem.

Since you're migrating hosts from Satellite 5 to Satellite 6, I would recommend you use the bootstrap script, a CLI tool which you run on individual hosts. It is available in Satellite 6.2+ and was created solely for the purpose of migrating hosts from either RHSM or Satellite 5 to Satellite 6.

For full details of the bootstrap tool, including example uses, see the KBase article Red Hat Satellite 6.2 Feature Overview: Importing Existing Hosts via the Bootstrap Script.

As you noted above, you have had to manually install the subscription-manager package on the hosts to be migrated. The bootstrap script completes this step for you. As you don't wish to use Puppet, note that the example usages include "Registering a system to Satellite 6, omitting puppet setup".

Near the bottom of the article is a video which describes the script and provides a demonstration of its use.

Gary,

I just realised that it was you who recently raised a Customer Portal discussion about an issue you had with the bootstrap script. :)

Hopefully my reply may still be of use to others who encounter similar problems.

Yup, it was I who had that question in a previous thread. I was encountering some issues with bootstrap.py, so that's what prompted me to just use the subscription-manager activation method, using the key only. I'll go ahead and look at the bootstrap script and use it for migrations.

Huge appreciations to Rich and yourself for taking the time to help!

Gary,

You're welcome. I'm a technical writer, assigned to Red Hat Satellite. Although I'd prefer that customers didn't encounter these sorts of problems, I learn a lot in working through them. It helps us better understand what aspects of the product customers have difficulty with, and so which need refinement.

Hi Guys,

Sorry for joining the party late, but I have the same issue: the "Last Report" for every client does not appear in the web UI.

I tried restarting rhsmcertd as per the previous post, and no error appears when I run the command "rhsmcertd -n".

But the "Last Report" still did not show up, and I found the error below in rhsmcertd.log:

"[ERROR] unable to get lock, exiting"

I did find a solution here: https://access.redhat.com/solutions/3132321

But I would really appreciate a solution that does not require rebooting the server first.

Please help me.

Thanks.

Hey guys,

Please ignore the previous question as I managed to get it done.

I just realized that I need to stop the rhsmcertd daemon first and then run rhsmcertd -n. It seems that when the daemon is already running on its schedule, we get this kind of error if we try to run the check immediately.
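The two steps described above can be sketched as follows. This assumes RHEL 6 with SysV init, and that stopping the daemon is what frees the lock, which is this poster's observation rather than a confirmed explanation; `run_checkin_now` is a hypothetical helper name:

```shell
# run_checkin_now: hypothetical helper following the steps described above.
# Assumes RHEL 6 (SysV init); must be run as root.
run_checkin_now() {
    # Stop the scheduled daemon first (presumably releasing its lock).
    service rhsmcertd stop
    # Start rhsmcertd again with -n so the initial checks run with no delay.
    rhsmcertd -n
}
```

Afterwards, /var/log/rhsm/rhsmcertd.log can be checked to confirm that no new "unable to get lock" line was written.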

Anyway, I have another question. I ran rhsmcertd -n hoping that the "Last Report" in "Hosts > All Hosts" would appear, but it doesn't.

May I know what else I can check?
