How to recover/resume kafka consumer when kafka storage is corrupted?

Solution Verified - Updated -

Issue

  • How to recover/resume kafka consumer when kafka storage is corrupted?
  • Kafka pods are in CreatingContainer State and Ready with 1/2 status, and the following logs are shown:

    ERROR [ReplicaManager broker=0] Error processing append operation on partition __consumer_offsets-13 (kafka.server.ReplicaManager) [data-plane-kafka-request-handler-2]
    org.apache.kafka.common.errors.NotEnoughReplicasException: The size of the current ISR Set(0) is insufficient to satisfy the min.isr requirement of 2 for partition __consumer_offsets-13
    INFO [GroupCoordinator 0]: Preparing to rebalance group group-xxx in state PreparingRebalance with old generation 118964 (__consumer_offsets-13) (reason: Error COORDINATOR_NOT_AVAILABLE when storing group assignment during SyncGroup (member: consumer-group-xxx-x-aaa-bbb-ccc)) (kafka.coordinator.group.GroupCoordinator) [data-plane-kafka-request-handler-2]
    [...]
    ERROR [ReplicaFetcher replicaId=2, leaderId=0, fetcherId=0] Error for partition __consumer_offsets-13 at offset xxxxx (kafka.server.ReplicaFetcherThread) [ReplicaFetcherThread-0-0]
    org.apache.kafka.common.errors.CorruptRecordException: This message has failed its CRC checksum, exceeds the valid size, has a null key for a compacted topic, or is otherwise corrupt.
    ERROR [ReplicaFetcher replicaId=2, leaderId=0, fetcherId=0] Error for partition __consumer_offsets-13 at offset xxxxx (kafka.server.ReplicaFetcherThread) [ReplicaFetcherThread-0-0]
    org.apache.kafka.common.errors.CorruptRecordException: This message has failed its CRC checksum, exceeds the valid size, has a null key for a compacted topic, or is otherwise corrupt.
    [...]
    org.apache.kafka.common.errors.KafkaStorageException: Disk error when trying to access log file on 
    the disk.
    ERROR [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition SQL_xxx-10 at offset 12345 (kafka.server.ReplicaFetcherThread) [ReplicaFetcherThread-0-0]
    org.apache.kafka.common.errors.KafkaStorageException: Disk error when trying to access log file on 
    the disk.
    
  • Observed below read only error on Kafka storage mount volumes:

    message: 'relabel failed /var/lib/kubelet/pods/xxxxxyyyyyzzzz/volumes/kubernetes.io~csi/pvc-abcdef/mount:
          lsetxattr /var/lib/kubelet/pods/xxxxxyyyyyzzzz/volumes/kubernetes.io~csi/pvc-abcdef/mount/kafka-log0:
          read-only file system'
    

Environment

  • Red Hat AMQ Streams
    • 2.4.0

Subscriber exclusive content

A Red Hat subscription provides unlimited access to our knowledgebase, tools, and much more.

Current Customers and Partners

Log in for full access

Log In

New to Red Hat?

Learn more about Red Hat subscriptions

Using a Red Hat product through a public cloud?

How to access this content