AMQ Streams crashes when deleting logs

Issue

  • AMQ Streams crashes when deleting logs and is then unable to start cleanly. The Kafka pods end up in CrashLoopBackOff status and raise the following exception in the log (a minimal sketch of the failing rename follows the log excerpt):

    09:02:07.274    kafka   2019-12-11 09:02:07,274 INFO [ReplicaAlterLogDirsManager on broker 0] Removed fetcher for partitions __consumer_offsets-22,__consumer_offsets-30,bridge-alfa-0,__consumer_offsets-8,__consumer_offsets-21,bulki-feedback-store-0,__consumer_offsets-4,event-notification-0,bridge-ca-0,__consumer_offsets-27,bulki-event-log-0,__consumer_offsets-7,bulki-user-access-0,__consumer_offsets-9,__consumer_offsets-46,event-notification-2-0,__consumer_offsets-25,__consumer_offsets-35,feed-data-0,grants-redemption-replay-0,bulki-access-bulki-user-access-store-changelog-0,__consumer_offsets-41,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-47,__consumer_offsets-16,__consumer_offsets-28,bulki-access-updates-0,__consumer_offsets-31,__consumer_offsets-36,__consumer_offsets-42,__consumer_offsets-3,__consumer_offsets-18,__consumer_offsets-37,feeds-audit-jobs-0,__consumer_offsets-15,__consumer_offsets-24,message-stats-0,grants-settlement-replay-0,command-log-0,__consumer_offsets-38,__consumer_offsets-17,__consumer_offsets-48,bulki-account-bulki-user-account-store-changelog-0,bulki-account-updates-0,bulki-chunk-updates-3,__consumer_offsets-19,__consumer_offsets-11,asset-notification-0,rsam-live-replay-0,feeds-event-log-0,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,__consumer_offsets-14,bulki-chunk-updates-1,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-44,__consumer_offsets-39,bulki-file-data-0,__consumer_offsets-12,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-34,__consumer_offsets-10,__consumer_offsets-32,__consumer_offsets-40,bulki-chunk-updates-2 (kafka.server.ReplicaAlterLogDirsManager) [LogDirFailureHandler]
    09:02:07.269    kafka       at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
    09:02:07.269    kafka       at java.nio.file.Files.move(Files.java:1395)
    09:02:07.269    kafka       at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786)
    09:02:07.269    kafka       at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:211)
    09:02:07.269    kafka       at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:488)
    09:02:07.269    kafka       at kafka.log.Log.$anonfun$replaceSegments$6(Log.scala:1814)
    09:02:07.269    kafka       at kafka.log.Log.$anonfun$replaceSegments$6$adapted(Log.scala:1814)
    09:02:07.269    kafka       at scala.collection.immutable.List.foreach(List.scala:389)
    09:02:07.269    kafka       at kafka.log.Log.replaceSegments(Log.scala:1814)
    09:02:07.269    kafka       at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:533)
    09:02:07.269    kafka       at kafka.log.Cleaner.$anonfun$doClean$6(LogCleaner.scala:465)
    09:02:07.269    kafka       at kafka.log.Cleaner.$anonfun$doClean$6$adapted(LogCleaner.scala:464)
    09:02:07.269    kafka       at scala.collection.immutable.List.foreach(List.scala:389)
    09:02:07.269    kafka       at kafka.log.Cleaner.doClean(LogCleaner.scala:464)
    09:02:07.269    kafka       at kafka.log.Cleaner.clean(LogCleaner.scala:442)
    09:02:07.269    kafka       at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:303)
    09:02:07.269    kafka       at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:289)
    09:02:07.269    kafka       at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
    09:02:07.269    kafka       Suppressed: java.nio.file.FileAlreadyExistsException: /var/lib/kafka/kafka-log0/__consumer_offsets-16/00000000000000000000.log.swap -> /var/lib/kafka/kafka-log0/__consumer_offsets-16/00000000000000000000.log
    09:02:07.269    kafka           at sun.nio.fs.UnixException.translateToIOException(UnixException.java:88)
    09:02:07.269    kafka           at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    09:02:07.269    kafka           at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
    09:02:07.269    kafka           at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
    09:02:07.269    kafka           at java.nio.file.Files.move(Files.java:1395)
    09:02:07.269    kafka           at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:783)
    09:02:07.269    kafka           ... 15 more
    09:02:07.269    kafka   2019-12-11 09:02:07,269 INFO [ReplicaManager broker=0] Stopping serving replicas in dir /var/lib/kafka/kafka-log0 (kafka.server.ReplicaManager) [LogDirFailureHandler]
    09:02:07.268    kafka   java.nio.file.FileAlreadyExistsException: /var/lib/kafka/kafka-log0/__consumer_offsets-16/00000000000000000000.log.swap -> /var/lib/kafka/kafka-log0/__consumer_offsets-16/00000000000000000000.log
    09:02:07.268    kafka       at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    09:02:07.268    kafka   2019-12-11 09:02:07,268 ERROR Failed to clean up log for __consumer_offsets-16 in dir /var/lib/kafka/kafka-log0 due to IOException (kafka.server.LogDirFailureChannel) [kafka-log-cleaner-thread-0]
    09:02:07.268    kafka       at sun.nio.fs.UnixException.translateToIOException(UnixException.java:88)
    09:02:07.268    kafka       at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:457)
    09:02:07.095    kafka   2019-12-11 09:02:07,095 INFO Cleaner 0: Swapping in cleaned segment LogSegment(baseOffset=0, size=1612) for segment(s) List(LogSegment(baseOffset=0, size=1619), LogSegment(baseOffset=715905, size=16049337)) in log Log(/var/lib/kafka/kafka-log0/__consumer_offsets-16) (kafka.log.LogCleaner) [kafka-log-cleaner-thread-0]
    09:02:06.986    kafka   2019-12-11 09:02:06,985 INFO Cleaner 0: Cleaning segment 0 in log __consumer_offsets-16 (largest timestamp Wed Dec 04 09:01:45 UTC 2019) into 0, retaining deletes. (kafka.log.LogCleaner) [kafka-log-cleaner-thread-0]
    09:02:06.986    kafka   2019-12-11 09:02:06,986 INFO Cleaner 0: Cleaning segment 715905 in log __consumer_offsets-16 (largest timestamp Wed Dec 11 09:01:48 UTC 2019) into 0, retaining deletes. (kafka.log.LogCleaner) [kafka-log-cleaner-thread-0]
    09:02:06.958    kafka   2019-12-11 09:02:06,958 INFO Cleaner 0: Cleaning log __consumer_offsets-16 (cleaning prior to Wed Dec 11 09:01:48 UTC 2019, discarding tombstones prior to Tue Dec 03 09:01:45 UTC 2019)... (kafka.log.LogCleaner) [kafka-log-cleaner-thread-0]
    09:02:06.957    kafka   2019-12-11 09:02:06,957 INFO Cleaner 0: Offset map for log __consumer_offsets-16 complete. (kafka.log.LogCleaner) [kafka-log-cleaner-thread-0]
    09:02:06.909    kafka   2019-12-11T09:02:06.865+0000: 57766.127: [GC pause (G1 Evacuation Pause) (young), 0.0431825 secs]
    
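The stack trace shows the log cleaner renaming the cleaned segment 00000000000000000000.log.swap over the existing 00000000000000000000.log of __consumer_offsets-16 via Utils.atomicMoveWithFallback. The sketch below is a minimal, hypothetical illustration of that rename pattern using only the standard java.nio.file API; the fallback behaviour, class name, and hard-coded paths are assumptions for illustration, not Kafka's actual implementation.

    import java.io.IOException;
    import java.nio.file.*;

    // Hypothetical sketch of an "atomic move with fallback", mirroring the call
    // visible in the stack trace. If the ATOMIC_MOVE fails, the plain fallback
    // move throws FileAlreadyExistsException whenever the target .log file is
    // still present, which matches the exception reported above.
    public class SwapRenameSketch {

        static void atomicMoveWithFallback(Path source, Path target) throws IOException {
            try {
                Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
            } catch (IOException outer) {
                try {
                    // Assumed non-atomic fallback: without REPLACE_EXISTING this
                    // fails when the destination file already exists.
                    Files.move(source, target);
                } catch (IOException inner) {
                    inner.addSuppressed(outer);
                    throw inner;
                }
            }
        }

        public static void main(String[] args) throws IOException {
            // Paths taken from the log excerpt; adjust them for a local test.
            Path dir = Paths.get("/var/lib/kafka/kafka-log0/__consumer_offsets-16");
            atomicMoveWithFallback(dir.resolve("00000000000000000000.log.swap"),
                                   dir.resolve("00000000000000000000.log"));
        }
    }

On most local filesystems an ATOMIC_MOVE simply replaces the target in one step, so a FileAlreadyExistsException from both the atomic attempt and the fallback (see the suppressed exception above) suggests the rename is not behaving as the log cleaner expects on this volume.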

Environment

  • AMQ Streams 1.0.0
  • OpenShift Container Platform 3.11
  • Storage: Gluster File Storage
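Because the broker's log directory sits on a Gluster-backed volume, one hypothetical check (not part of the verified solution) is whether that mount performs an atomic rename onto an existing file at all. The probe below uses only standard java.nio.file calls; the class name and the default /var/lib/kafka/kafka-log0 path are assumptions taken from the log excerpt above.

    import java.io.IOException;
    import java.nio.file.*;

    // Hypothetical probe: create two scratch files in the broker's log directory
    // and attempt an ATOMIC_MOVE of one over the other to see how the mount
    // handles the same rename the log cleaner performs.
    public class AtomicRenameProbe {
        public static void main(String[] args) throws IOException {
            Path dir = Paths.get(args.length > 0 ? args[0] : "/var/lib/kafka/kafka-log0");
            Path source = Files.createTempFile(dir, "probe-", ".swap");
            Path target = Files.createTempFile(dir, "probe-", ".log");
            try {
                Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
                System.out.println("ATOMIC_MOVE over an existing file succeeded");
            } catch (IOException e) {
                System.out.println("ATOMIC_MOVE failed: " + e);
            } finally {
                // Clean up whichever scratch files remain.
                Files.deleteIfExists(source);
                Files.deleteIfExists(target);
            }
        }
    }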
