Vector collector pods keep crashing due to panic in RHOCP 4

Issue

  • The Vector collector pods keep restarting:

    $ oc get pod -n <namespace>
    NAME                                   READY         STATUS              RESTARTS       AGE     
    collector-xxxxx                        0/1           CrashLoopBackOff    5 (93s ago)    6m9s    
    collector-xxxxx                        1/1           Running             5 (84s ago)    6m10s   
    collector-xxxxx                        0/1           CrashLoopBackOff    5 (85s ago)    6m10s   
    collector-xxxxx                        0/1           CrashLoopBackOff    5 (83s ago)    6m9s    
    
  • The following errors stream indefinitely in the Vector collector pod logs; a diagnostic sketch for inspecting the on-disk buffer follows the excerpt:

    $ oc logs collector-xxxxx -n <namespace>
    Creating the directory used for persisting Vector state /var/lib/vector/openshift-logging/collector
    Checking for buffer lock files
    /var/lib/vector/openshift-logging/collector /usr/bin
    found lock files: './buffer/v2/output_default_lokistack_application/buffer.lock
    .   /buffer/v2/output_default_lokistack_audit/buffer.lock
    .   /buffer/v2/output_default_lokistack_infrastructure/buffer.lock'
    removing file: './buffer/v2/output_default_lokistack_application/buffer.lock'
    removing file: './buffer/v2/output_default_lokistack_audit/buffer.lock'
    removing file: './buffer/v2/output_default_lokistack_infrastructure/buffer.lock'
    /usr/bin
    Starting Vector process...
    2024-10-21T07:41:55.464002Z  WARN vector::internal_events::file::source: Currently ignoring file too small to fingerprint. file=/var/log/ovn/acl-audit-log.log
    thread 'vector-worker' panicked at /remote-source/vector/app/lib/vector-buffers/src/variants/disk_v2/reader.rs:601:30:
    skipping more than 2^64 events at a time is obviously a bug
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    2024-10-21T07:41:55.895411Z ERROR sink{component_kind="sink" component_id=output_default_lokistack_audit component_type=loki}: vector::topology: An error occurred that Vector couldn't handle: the task panicked and was aborted.
    thread 'vector-worker' panicked at /remote-source/vector/app/lib/vector-buffers/src/variants/disk_v2/reader.rs:601:30:
    skipping more than 2^64 events at a time is obviously a bug
    2024-10-21T07:41:55.903386Z ERROR sink{component_kind="sink" component_id=output_default_lokistack_infrastructure component_type=loki}: vector::topology: An error occurred that Vector couldn't handle: the task panicked and was aborted.
    2024-10-21T07:42:55.896501Z ERROR vector_common::shutdown: Source 'input_audit_kube' failed to shutdown before deadline. Forcing shutdown.
    2024-10-21T07:42:55.896567Z ERROR vector::topology::running: Failed to gracefully shut down in time. Killing components. components="pipeline_pipeline_lokistack_2_viaq_0, pipeline_pipeline_lokistack_2_viaqdedot_1, input_audit_kube_meta, input_audit_kube, output_default_lokistack_audit_remap, output_default_lokistack_audit_remap_label, output_default_lokistack_infrastructure_remap_label" 
    2024-10-21T07:42:55.905733Z ERROR source{component_kind="source" component_id=input_audit_kube component_type=file}: vector_common::internal_event::component_events_dropped: Events dropped intentional=false count=389 reason="Source send cancelled." internal_log_rate_limit=true
    
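  • The panic originates in Vector's disk_v2 buffer reader (lib/vector-buffers/src/variants/disk_v2/reader.rs), which points at a corrupted on-disk buffer rather than a transient error. The commands below are a minimal diagnostic sketch, assuming the collector mounts /var/lib/vector from the node as a hostPath and that <node-name> is a node running a crashing collector pod; they only list the buffer files named in the log so their state can be confirmed on the node:

    $ oc debug node/<node-name>
    sh-5.1# chroot /host
    sh-5.1# ls -lR /var/lib/vector/openshift-logging/collector/buffer/v2/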

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4
  • Red Hat OpenShift Logging (RHOL)
    • 6.0+
