AMQ Streams crashes when deleting logs

Solution Verified

Issue

  • AMQ Streams crashed while deleting logs and is unable to start cleanly. The Kafka pods end up in CrashLoopBackOff status and raise the following exception in the log (a sketch of how to confirm this state from the command line follows the excerpt):

    09:02:07.274    kafka   2019-12-11 09:02:07,274 INFO [ReplicaAlterLogDirsManager on broker 0] Removed fetcher for partitions __consumer_offsets-22,__consumer_offsets-30,bridge-alfa-0,__consumer_offsets-8,__consumer_offsets-21,bulki-feedback-store-0,__consumer_offsets-4,event-notification-0,bridge-ca-0,__consumer_offsets-27,bulki-event-log-0,__consumer_offsets-7,bulki-user-access-0,__consumer_offsets-9,__consumer_offsets-46,event-notification-2-0,__consumer_offsets-25,__consumer_offsets-35,feed-data-0,grants-redemption-replay-0,bulki-access-bulki-user-access-store-changelog-0,__consumer_offsets-41,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-47,__consumer_offsets-16,__consumer_offsets-28,bulki-access-updates-0,__consumer_offsets-31,__consumer_offsets-36,__consumer_offsets-42,__consumer_offsets-3,__consumer_offsets-18,__consumer_offsets-37,feeds-audit-jobs-0,__consumer_offsets-15,__consumer_offsets-24,message-stats-0,grants-settlement-replay-0,command-log-0,__consumer_offsets-38,__consumer_offsets-17,__consumer_offsets-48,bulki-account-bulki-user-account-store-changelog-0,bulki-account-updates-0,bulki-chunk-updates-3,__consumer_offsets-19,__consumer_offsets-11,asset-notification-0,rsam-live-replay-0,feeds-event-log-0,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,__consumer_offsets-14,bulki-chunk-updates-1,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-44,__consumer_offsets-39,bulki-file-data-0,__consumer_offsets-12,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-34,__consumer_offsets-10,__consumer_offsets-32,__consumer_offsets-40,bulki-chunk-updates-2 (kafka.server.ReplicaAlterLogDirsManager) [LogDirFailureHandler]
    09:02:07.269    kafka       at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
    09:02:07.269    kafka       at java.nio.file.Files.move(Files.java:1395)
    09:02:07.269    kafka       at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786)
    09:02:07.269    kafka       at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:211)
    09:02:07.269    kafka       at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:488)
    09:02:07.269    kafka       at kafka.log.Log.$anonfun$replaceSegments$6(Log.scala:1814)
    09:02:07.269    kafka       at kafka.log.Log.$anonfun$replaceSegments$6$adapted(Log.scala:1814)
    09:02:07.269    kafka       at scala.collection.immutable.List.foreach(List.scala:389)
    09:02:07.269    kafka       at kafka.log.Log.replaceSegments(Log.scala:1814)
    09:02:07.269    kafka       at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:533)
    09:02:07.269    kafka       at kafka.log.Cleaner.$anonfun$doClean$6(LogCleaner.scala:465)
    09:02:07.269    kafka       at kafka.log.Cleaner.$anonfun$doClean$6$adapted(LogCleaner.scala:464)
    09:02:07.269    kafka       at scala.collection.immutable.List.foreach(List.scala:389)
    09:02:07.269    kafka       at kafka.log.Cleaner.doClean(LogCleaner.scala:464)
    09:02:07.269    kafka       at kafka.log.Cleaner.clean(LogCleaner.scala:442)
    09:02:07.269    kafka       at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:303)
    09:02:07.269    kafka       at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:289)
    09:02:07.269    kafka       at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
    09:02:07.269    kafka       Suppressed: java.nio.file.FileAlreadyExistsException: /var/lib/kafka/kafka-log0/__consumer_offsets-16/00000000000000000000.log.swap -> /var/lib/kafka/kafka-log0/__consumer_offsets-16/00000000000000000000.log
    09:02:07.269    kafka           at sun.nio.fs.UnixException.translateToIOException(UnixException.java:88)
    09:02:07.269    kafka           at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    09:02:07.269    kafka           at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
    09:02:07.269    kafka           at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
    09:02:07.269    kafka           at java.nio.file.Files.move(Files.java:1395)
    09:02:07.269    kafka           at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:783)
    09:02:07.269    kafka           ... 15 more
    09:02:07.269    kafka   2019-12-11 09:02:07,269 INFO [ReplicaManager broker=0] Stopping serving replicas in dir /var/lib/kafka/kafka-log0 (kafka.server.ReplicaManager) [LogDirFailureHandler]
    09:02:07.268    kafka   java.nio.file.FileAlreadyExistsException: /var/lib/kafka/kafka-log0/__consumer_offsets-16/00000000000000000000.log.swap -> /var/lib/kafka/kafka-log0/__consumer_offsets-16/00000000000000000000.log
    09:02:07.268    kafka       at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    09:02:07.268    kafka   2019-12-11 09:02:07,268 ERROR Failed to clean up log for __consumer_offsets-16 in dir /var/lib/kafka/kafka-log0 due to IOException (kafka.server.LogDirFailureChannel) [kafka-log-cleaner-thread-0]
    09:02:07.268    kafka       at sun.nio.fs.UnixException.translateToIOException(UnixException.java:88)
    09:02:07.268    kafka       at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:457)
    09:02:07.095    kafka   2019-12-11 09:02:07,095 INFO Cleaner 0: Swapping in cleaned segment LogSegment(baseOffset=0, size=1612) for segment(s) List(LogSegment(baseOffset=0, size=1619), LogSegment(baseOffset=715905, size=16049337)) in log Log(/var/lib/kafka/kafka-log0/__consumer_offsets-16) (kafka.log.LogCleaner) [kafka-log-cleaner-thread-0]
    09:02:06.986    kafka   2019-12-11 09:02:06,985 INFO Cleaner 0: Cleaning segment 0 in log __consumer_offsets-16 (largest timestamp Wed Dec 04 09:01:45 UTC 2019) into 0, retaining deletes. (kafka.log.LogCleaner) [kafka-log-cleaner-thread-0]
    09:02:06.986    kafka   2019-12-11 09:02:06,986 INFO Cleaner 0: Cleaning segment 715905 in log __consumer_offsets-16 (largest timestamp Wed Dec 11 09:01:48 UTC 2019) into 0, retaining deletes. (kafka.log.LogCleaner) [kafka-log-cleaner-thread-0]
    09:02:06.958    kafka   2019-12-11 09:02:06,958 INFO Cleaner 0: Cleaning log __consumer_offsets-16 (cleaning prior to Wed Dec 11 09:01:48 UTC 2019, discarding tombstones prior to Tue Dec 03 09:01:45 UTC 2019)... (kafka.log.LogCleaner) [kafka-log-cleaner-thread-0]
    09:02:06.957    kafka   2019-12-11 09:02:06,957 INFO Cleaner 0: Offset map for log __consumer_offsets-16 complete. (kafka.log.LogCleaner) [kafka-log-cleaner-thread-0]
    09:02:06.909    kafka   2019-12-11T09:02:06.865+0000: 57766.127: [GC pause (G1 Evacuation Pause) (young), 0.0431825 secs]
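    
The commands below are a minimal sketch of how to confirm this state, assuming the Kafka cluster is named my-cluster and runs in the kafka namespace (both names are assumptions; adjust them to the environment). The pod and container names follow that assumption; the log directory and partition path come from the stack trace above.

    # Confirm the broker pods are in CrashLoopBackOff
    oc get pods -n kafka

    # Capture the log of the previously crashed kafka container of broker 0
    oc logs my-cluster-kafka-0 -c kafka -n kafka --previous > kafka-0.log

    # While the container is up, list the partition directory from the stack trace;
    # a leftover 00000000000000000000.log.swap next to 00000000000000000000.log
    # points at an interrupted log-cleaner segment swap
    oc exec my-cluster-kafka-0 -n kafka -c kafka -- ls -l /var/lib/kafka/kafka-log0/__consumer_offsets-16/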
    

Environment

  • AMQ Streams 1.0.0
  • OpenShift Container Platform 3.11
  • Storage: GlusterFS (Gluster file storage); see the sketch below to confirm which volume backs the brokers
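
Because the broker log directory /var/lib/kafka/kafka-log0 sits on Gluster-backed persistent storage, it can help to confirm exactly which PersistentVolumeClaim and volume type back the brokers. A minimal sketch, assuming a cluster named my-cluster in the kafka namespace and the default data-my-cluster-kafka-0 PVC naming (all of these names are assumptions):

    # List the PVCs used by the Kafka brokers and their storage class
    oc get pvc -n kafka

    # Show the bound PV and its volume type (e.g. glusterfs) for broker 0
    oc describe pvc data-my-cluster-kafka-0 -n kafka
    oc describe pv $(oc get pvc data-my-cluster-kafka-0 -n kafka -o jsonpath='{.spec.volumeName}')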
