Vector collector pods keep crashing due to panic in RHOCP 4
Issue
- Vector collector pods keep restarting:

    $ oc get pod -n <namespace>
    NAME              READY   STATUS             RESTARTS      AGE
    collector-xxxxx   0/1     CrashLoopBackOff   5 (93s ago)   6m9s
    collector-xxxxx   1/1     Running            5 (84s ago)   6m10s
    collector-xxxxx   0/1     CrashLoopBackOff   5 (85s ago)   6m10s
    collector-xxxxx   0/1     CrashLoopBackOff   5 (83s ago)   6m9s

- The following messages keep streaming in the Vector collector pod logs indefinitely:

    $ oc logs collector-xxxxx -n <namespace>
    Creating the directory used for persisting Vector state /var/lib/vector/openshift-logging/collector
    Checking for buffer lock files
    /var/lib/vector/openshift-logging/collector /usr/bin
    found lock files: './buffer/v2/output_default_lokistack_application/buffer.lock ./buffer/v2/output_default_lokistack_audit/buffer.lock ./buffer/v2/output_default_lokistack_infrastructure/buffer.lock'
    removing file: './buffer/v2/output_default_lokistack_application/buffer.lock'
    removing file: './buffer/v2/output_default_lokistack_audit/buffer.lock'
    removing file: './buffer/v2/output_default_lokistack_infrastructure/buffer.lock'
    /usr/bin
    Starting Vector process...
    2024-10-21T07:41:55.464002Z WARN vector::internal_events::file::source: Currently ignoring file too small to fingerprint. file=/var/log/ovn/acl-audit-log.log
    thread 'vector-worker' panicked at /remote-source/vector/app/lib/vector-buffers/src/variants/disk_v2/reader.rs:601:30:
    skipping more than 2^64 events at a time is obviously a bug
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    2024-10-21T07:41:55.895411Z ERROR sink{component_kind="sink" component_id=output_default_lokistack_audit component_type=loki}: vector::topology: An error occurred that Vector couldn't handle: the task panicked and was aborted.
    thread 'vector-worker' panicked at /remote-source/vector/app/lib/vector-buffers/src/variants/disk_v2/reader.rs:601:30:
    skipping more than 2^64 events at a time is obviously a bug
    2024-10-21T07:41:55.903386Z ERROR sink{component_kind="sink" component_id=output_default_lokistack_infrastructure component_type=loki}: vector::topology: An error occurred that Vector couldn't handle: the task panicked and was aborted.
    2024-10-21T07:42:55.896501Z ERROR vector_common::shutdown: Source 'input_audit_kube' failed to shutdown before deadline. Forcing shutdown.
    2024-10-21T07:42:55.896567Z ERROR vector::topology::running: Failed to gracefully shut down in time. Killing components. components="pipeline_pipeline_lokistack_2_viaq_0, pipeline_pipeline_lokistack_2_viaqdedot_1, input_audit_kube_meta, input_audit_kube, output_default_lokistack_audit_remap, output_default_lokistack_audit_remap_label, output_default_lokistack_infrastructure_remap_label"
    2024-10-21T07:42:55.905733Z ERROR source{component_kind="source" component_id=input_audit_kube component_type=file}: vector_common::internal_event::component_events_dropped: Events dropped intentional=false count=389 reason="Source send cancelled." internal_log_rate_limit=true
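To see which outputs are affected when a collector pod hits this panic, a minimal Python sketch can scan the collector log and list the sinks that Vector reports as aborted. It assumes the log format shown above. The helper name `find_panicked_sinks` and the `BUFFER_ROOT` path are hypothetical and are not part of Vector or OpenShift Logging. The path is inferred from the `./buffer/v2/<sink>/buffer.lock` lock files, which appear to be resolved relative to `/var/lib/vector/openshift-logging/collector`.

```python
import re
from typing import Dict

# Matches the sink-level error Vector logs after a component task panics, e.g.
# ERROR sink{component_kind="sink" component_id=<id> component_type=loki}: ... the task panicked and was aborted.
SINK_PANIC = re.compile(
    r"ERROR sink\{[^}]*component_id=(?P<sink>[^\s}]+)[^}]*\}: .*the task panicked and was aborted"
)

# The disk_v2 buffer reader panic message seen in the collector logs.
BUFFER_PANIC = "skipping more than 2^64 events at a time is obviously a bug"

# Assumption (hypothetical): one disk-buffer directory per sink under the
# collector's data directory, inferred from the lock-file paths in the startup log.
BUFFER_ROOT = "/var/lib/vector/openshift-logging/collector/buffer/v2"


def find_panicked_sinks(log_text: str) -> Dict[str, str]:
    """Map each aborted sink ID to its assumed disk-buffer directory.

    Returns an empty dict if the disk_v2 reader panic message is absent,
    so unrelated sink failures are not reported.
    """
    if BUFFER_PANIC not in log_text:
        return {}
    sinks: Dict[str, str] = {}
    for line in log_text.splitlines():
        match = SINK_PANIC.search(line)
        if match:
            sink = match.group("sink")
            sinks[sink] = f"{BUFFER_ROOT}/{sink}"
    return sinks
```

For example, save the output of `oc logs collector-xxxxx -n <namespace>` to a file and pass its contents to `find_panicked_sinks`. For the log above, the function would report `output_default_lokistack_audit` and `output_default_lokistack_infrastructure`. The check for the panic message keeps the helper limited to this specific failure, not every sink error.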
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4
- Red Hat OpenShift Logging (RHOL) 6.0+