ETL service sampling has encountered an error reported in the RHV-M GUI/webadmin events tab.

Solution In Progress - Updated -

Environment

  • Red Hat Virtualization (RHV-M) version: 4.4.6 (ovirt-engine-4.4.6.8-0.1.el8ev.noarch)
  • Red Hat Virtualization (RHV-M) Data Warehouse version 4.4.6.2 (ovirt-engine-dwh-4.4.6.2-1.el8ev.noarch)

Issue

  • ETL service sampling has encountered an error reported in the RHV-M GUI/webadmin events tab.

  • The /var/log/ovirt-engine-dwh/ovirt-engine-dwh.log file contains the following error:

2021-06-05 15:17:59|E2VBuS|8AswF0|BTCsBe|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|6|Java Exception|tRunJob_1|java.lang.RuntimeException:Child job running failed|1
Exception in component tJDBCInput_18
org.postgresql.util.PSQLException: Bad value for type int : 2705788958
        at org.postgresql.jdbc.PgResultSet.toInt(PgResultSet.java:2838)
        at org.postgresql.jdbc.PgResultSet.getInt(PgResultSet.java:2073)
        at ovirt_engine_dwh.statisticssync_4_4.StatisticsSync.tJDBCInput_18Process(StatisticsSync.java:13283)
        at ovirt_engine_dwh.statisticssync_4_4.StatisticsSync$7.run(StatisticsSync.java:16287)
2021-06-05 15:18:01|h9VuCs|8AswF0|z4SX9P|OVIRT_ENGINE_DWH|StatisticsSync|Default|6|Java Exception|tJDBCInput_18|org.postgresql.util.PSQLException:Bad value for type int : 2705788958|1
Exception in component tRunJob_5
java.lang.RuntimeException: Child job running failed
        at ovirt_engine_dwh.samplerunjobs_4_4.SampleRunJobs.tRunJob_5Process(SampleRunJobs.java:1654)
        at ovirt_engine_dwh.samplerunjobs_4_4.SampleRunJobs.tRunJob_6Process(SampleRunJobs.java:1456)
        at ovirt_engine_dwh.samplerunjobs_4_4.SampleRunJobs.tRunJob_1Process(SampleRunJobs.java:1228)
        at ovirt_engine_dwh.samplerunjobs_4_4.SampleRunJobs.tRunJob_4Process(SampleRunJobs.java:1000)
        at ovirt_engine_dwh.samplerunjobs_4_4.SampleRunJobs.tJDBCConnection_2Process(SampleRunJobs.java:767)
        at ovirt_engine_dwh.samplerunjobs_4_4.SampleRunJobs.tJDBCConnection_1Process(SampleRunJobs.java:642)
        at ovirt_engine_dwh.samplerunjobs_4_4.SampleRunJobs$2.run(SampleRunJobs.java:2683)
2021-06-05 15:18:01|z4SX9P|8AswF0|85YaR7|OVIRT_ENGINE_DWH|SampleRunJobs|Default|6|Java Exception|tRunJob_5|java.lang.RuntimeException:Child job running failed|1
Exception in component tRunJob_1
java.lang.RuntimeException: Child job running failed
        at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob.tRunJob_1Process(SampleTimeKeepingJob.java:6196)
        at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob.tJDBCInput_2Process(SampleTimeKeepingJob.java:5938)
        at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob.tJDBCConnection_1Process(SampleTimeKeepingJob.java:4573)
        at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob.tJDBCConnection_2Process(SampleTimeKeepingJob.java:4448)
        at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob.tRowGenerator_2Process(SampleTimeKeepingJob.java:4317)
        at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob.tJDBCInput_3Process(SampleTimeKeepingJob.java:3722)
        at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob.tJDBCInput_5Process(SampleTimeKeepingJob.java:3106)
        at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob.tJDBCInput_4Process(SampleTimeKeepingJob.java:2424)
        at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob.tJDBCConnection_3Process(SampleTimeKeepingJob.java:1778)
        at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob$2.run(SampleTimeKeepingJob.java:11524)

Resolution

  • BZ 1929211 was added to correct the defect and fixed in ovirt-engine-dwh-4.4.6.3.

  • Please upgrade the to the following errata fix version to avoid the problem.

RHBA-2021:2528 - Bug Fix Advisory -> https://access.redhat.com/errata/RHBA-2021:2528

Root Cause

  • The IOPS stats from VDSM are gathered via libvirt and libvirt returns a long long (64bit long)
  • In the engine database the table disk_image_dynamic column read_ops and write_ops are type bigint to accommodate the value.

  • The problem actually is on the ovirt-engine-dwh database side.
    The following tables have the read_ops_per_second and write_ops_per_second type as int4 (-2147483648 to +2147483647)

vm_disk_hourly_history
vm_disk_daily_history
vm_disk_samples_history
  • The ETL Bad Value is generated when trying to update the IOPS stats on the ovirt-engine-dwh database in the tables mentioned above.

Diagnostic Steps

  • When checking the engine database, in the table disk_image_dynamic the following images read_ops and write_ops with higher values.
               image_id               |  read_ops  | write_ops 
--------------------------------------+------------+-----------
 354b7807-a273-4f97-8072-a3bc664f7721 | 2705807314 |  35957110
  • In the ovirt_engine_history database the read_ops_per_second and write_ops_per_second are type integer (-2147483648 to +2147483647) for the following tables.
                                           Table "public.vm_disk_samples_history"
            Column             |           Type           | Collation | Nullable |                  Default                  
-------------------------------+--------------------------+-----------+----------+-------------------------------------------
 history_id                    | bigint                   |           | not null | nextval('vm_disk_history_seq1'::regclass)
 history_datetime              | timestamp with time zone |           | not null | 
 image_id                      | uuid                     |           |          | 
 vm_disk_status                | smallint                 |           |          | 
 vm_disk_actual_size_mb        | integer                  |           | not null | 
 read_rate_bytes_per_second    | integer                  |           |          | 
 read_latency_seconds          | numeric(18,9)            |           |          | 
 write_rate_bytes_per_second   | integer                  |           |          | 
 write_latency_seconds         | numeric(18,9)            |           |          | 
 flush_latency_seconds         | numeric(18,9)            |           |          | 
 vm_disk_configuration_version | integer                  |           |          | 
 vm_disk_id                    | uuid                     |           | not null | 
 seconds_in_status             | integer                  |           | not null | 20
 read_ops_per_second           | integer                  |           |          | 
 write_ops_per_second          | integer                  |           |          | 
Indexes:
    "vm_disk_samples_history_pkey" PRIMARY KEY, btree (history_id)
    "idx_vm_disk_configuration_version_samples" btree (vm_disk_configuration_version)
    "idx_vm_disk_history_datetime_samples" btree (history_datetime)
    "vm_disk_samples_history_vm_disk_id_idx" btree (vm_disk_id)


                                        Table "public.vm_disk_daily_history"
             Column              |     Type      | Collation | Nullable |                  Default                  
---------------------------------+---------------+-----------+----------+-------------------------------------------
 history_id                      | bigint        |           | not null | nextval('vm_disk_history_seq3'::regclass)
 history_datetime                | date          |           | not null | 
 image_id                        | uuid          |           |          | 
 vm_disk_status                  | smallint      |           |          | 
 minutes_in_status               | numeric(7,2)  |           | not null | 1
 vm_disk_actual_size_mb          | integer       |           | not null | 
 read_rate_bytes_per_second      | integer       |           |          | 
 max_read_rate_bytes_per_second  | integer       |           |          | 
 read_latency_seconds            | numeric(18,9) |           |          | 
 max_read_latency_seconds        | numeric(18,9) |           |          | 
 write_rate_bytes_per_second     | integer       |           |          | 
 max_write_rate_bytes_per_second | integer       |           |          | 
 write_latency_seconds           | numeric(18,9) |           |          | 
 max_write_latency_seconds       | numeric(18,9) |           |          | 
 flush_latency_seconds           | numeric(18,9) |           |          | 
 max_flush_latency_seconds       | numeric(18,9) |           |          | 
 vm_disk_configuration_version   | integer       |           |          | 
 vm_disk_id                      | uuid          |           | not null | 
 read_ops_per_second             | integer       |           |          | 
 write_ops_per_second            | integer       |           |          | 
 max_read_ops_per_second         | integer       |           |          | 
 max_write_ops_per_second        | integer       |           |          | 
Indexes:
    "vm_disk_daily_history_pkey" PRIMARY KEY, btree (history_id)
    "idx_vm_disk_configuration_version_daily" btree (vm_disk_configuration_version)
    "idx_vm_disk_history_datetime_daily" btree (history_datetime)
    "vm_disk_daily_history_vm_disk_id_idx" btree (vm_disk_id)

                                             Table "public.vm_disk_hourly_history"
             Column              |           Type           | Collation | Nullable |                  Default                  
---------------------------------+--------------------------+-----------+----------+-------------------------------------------
 history_id                      | bigint                   |           | not null | nextval('vm_disk_history_seq2'::regclass)
 history_datetime                | timestamp with time zone |           | not null | 
 image_id                        | uuid                     |           |          | 
 vm_disk_status                  | smallint                 |           |          | 
 minutes_in_status               | numeric(7,2)             |           | not null | 1
 vm_disk_actual_size_mb          | integer                  |           | not null | 
 read_rate_bytes_per_second      | integer                  |           |          | 
 max_read_rate_bytes_per_second  | integer                  |           |          | 
 read_latency_seconds            | numeric(18,9)            |           |          | 
 max_read_latency_seconds        | numeric(18,9)            |           |          | 
 write_rate_bytes_per_second     | integer                  |           |          | 
 max_write_rate_bytes_per_second | integer                  |           |          | 
 write_latency_seconds           | numeric(18,9)            |           |          | 
 max_write_latency_seconds       | numeric(18,9)            |           |          | 
 flush_latency_seconds           | numeric(18,9)            |           |          | 
 max_flush_latency_seconds       | numeric(18,9)            |           |          | 
 vm_disk_configuration_version   | integer                  |           |          | 
 vm_disk_id                      | uuid                     |           | not null | 
 read_ops_per_second             | integer                  |           |          | 
 write_ops_per_second            | integer                  |           |          | 
 max_read_ops_per_second         | integer                  |           |          | 
 max_write_ops_per_second        | integer                  |           |          | 
Indexes:
    "vm_disk_hourly_history_pkey" PRIMARY KEY, btree (history_id)
    "idx_vm_disk_configuration_version_hourly" btree (vm_disk_configuration_version)
    "idx_vm_disk_history_datetime_hourly" btree (history_datetime)
    "vm_disk_hourly_history_vm_disk_id_idx" btree (vm_disk_id)

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.

Comments