How to recover/resume a Kafka consumer when Kafka storage is corrupted?

Issue

How to recover/resume a Kafka consumer when Kafka storage is corrupted?

Kafka pods are stuck in the ContainerCreating state or report a READY status of 1/2, and the following messages appear in the broker logs:

ERROR [ReplicaManager broker=0] Error processing append operation on partition __consumer_offsets-13 (kafka.server.ReplicaManager) [data-plane-kafka-request-handler-2]
org.apache.kafka.common.errors.NotEnoughReplicasException: The size of the current ISR Set(0) is insufficient to satisfy the min.isr requirement of 2 for partition __consumer_offsets-13
INFO [GroupCoordinator 0]: Preparing to rebalance group group-xxx in state PreparingRebalance with old generation 118964 (__consumer_offsets-13) (reason: Error COORDINATOR_NOT_AVAILABLE when storing group assignment during SyncGroup (member: consumer-group-xxx-x-aaa-bbb-ccc)) (kafka.coordinator.group.GroupCoordinator) [data-plane-kafka-request-handler-2]
[...]
ERROR [ReplicaFetcher replicaId=2, leaderId=0, fetcherId=0] Error for partition __consumer_offsets-13 at offset xxxxx (kafka.server.ReplicaFetcherThread) [ReplicaFetcherThread-0-0]
org.apache.kafka.common.errors.CorruptRecordException: This message has failed its CRC checksum, exceeds the valid size, has a null key for a compacted topic, or is otherwise corrupt.
ERROR [ReplicaFetcher replicaId=2, leaderId=0, fetcherId=0] Error for partition __consumer_offsets-13 at offset xxxxx (kafka.server.ReplicaFetcherThread) [ReplicaFetcherThread-0-0]
org.apache.kafka.common.errors.CorruptRecordException: This message has failed its CRC checksum, exceeds the valid size, has a null key for a compacted topic, or is otherwise corrupt.
[...]
org.apache.kafka.common.errors.KafkaStorageException: Disk error when trying to access log file on the disk.
ERROR [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition SQL_xxx-10 at offset 12345 (kafka.server.ReplicaFetcherThread) [ReplicaFetcherThread-0-0]
org.apache.kafka.common.errors.KafkaStorageException: Disk error when trying to access log file on the disk.
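
To see which partitions are affected by which error, the broker log can be summarized with a short script. The following is a minimal sketch, not part of AMQ Streams: the function name `affected_partitions` is hypothetical, and it assumes the log has already been saved as text and that each exception line either names its partition or directly follows an ERROR line that does, as in the excerpt above.

```python
import re
from collections import defaultdict

# Exception class names taken from the log excerpt above.
EXCEPTIONS = (
    "NotEnoughReplicasException",
    "CorruptRecordException",
    "KafkaStorageException",
)

# Matches "partition <topic>-<number>" as it appears in the broker log lines.
PARTITION_RE = re.compile(r"partition (\S+-\d+)")
# Matches the short class name of an org.apache.kafka.common.errors exception.
EXCEPTION_RE = re.compile(r"org\.apache\.kafka\.common\.errors\.(\w+)")


def affected_partitions(log_text):
    """Map each known exception name to the set of partitions it was logged for."""
    result = defaultdict(set)
    previous = None  # partition named on the immediately preceding line, if any
    for line in log_text.splitlines():
        exc = EXCEPTION_RE.search(line)
        part = PARTITION_RE.search(line)
        if exc:
            # Prefer a partition named on the exception line itself,
            # otherwise fall back to the one on the line just before it.
            partition = part.group(1) if part else previous
            if exc.group(1) in EXCEPTIONS and partition:
                result[exc.group(1)].add(partition)
            previous = None
        else:
            previous = part.group(1) if part else None
    return dict(result)
```

An exception line with no partition on it or on the line before it (for example, one that follows an elided `[...]` section) is skipped rather than guessed at.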

The following read-only file system error is reported for the Kafka storage volume mount:

message: 'relabel failed /var/lib/kubelet/pods/xxxxxyyyyyzzzz/volumes/kubernetes.io~csi/pvc-abcdef/mount:
      lsetxattr /var/lib/kubelet/pods/xxxxxyyyyyzzzz/volumes/kubernetes.io~csi/pvc-abcdef/mount/kafka-log0:
      read-only file system'
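
One way to confirm a read-only mount on the affected node is to inspect the mount table. The sketch below is illustrative only: the function name `read_only_mounts` is hypothetical, and it assumes the standard `/proc/mounts` layout (device, mount point, file system type, options) and that the volume of interest is mounted under a path containing `kubernetes.io~csi`, as in the message above.

```python
def read_only_mounts(mounts_text, path_fragment="kubernetes.io~csi"):
    """Return mount points that contain path_fragment and carry the 'ro' option."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a well-formed mount table entry
        mount_point = fields[1]
        options = fields[3].split(",")
        if path_fragment in mount_point and "ro" in options:
            found.append(mount_point)
    return found
```

On a node, this could be called as `read_only_mounts(open("/proc/mounts").read())`. An empty result does not rule out storage problems; it only means no matching mount is currently flagged read-only.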

Environment

Red Hat AMQ Streams
- 2.4.0
