nvkm_runl_preempt_wait() times out and a PCIe Bus Error occurs with nouveau
Issue
- Sometimes nothing is displayed on the screen, and the following kernel messages are logged:
    ------------[ cut here ]------------
    nouveau 0000:01:00.0: timeout
    WARNING: CPU: 7 PID: 32201 at drivers/gpu/drm/nouveau/nvkm/engine/fifo/runl.c:251 nvkm_runl_preempt_wait+0xc5/0xe0 [nouveau]
    Modules linked in: tls uinput rfcomm snd_seq_dummy snd_hrtimer nf_log_syslog nft_log nft_limit qrtr nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables bnep nfnetlink sunrpc vfat fat snd_ctl_led snd_soc_skl_hda_dsp snd_soc_intel_hda_dsp_common snd_soc_hdac_hdmi snd_sof_probes hid_multitouch snd_hda_codec_realtek snd_hda_codec_generic snd_soc_dmic snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp iwlmvm snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_generic_allocation mac80211 snd_soc_core intel_uncore_frequency snd_hda_codec_hdmi intel_uncore_frequency_common snd_compress x86_pkg_temp_thermal intel_powerclamp soundwire_bus coretemp snd_hda_intel kvm_intel snd_intel_dspcfg snd_intel_sdw_acpi libarc4 snd_hda_codec kvm uvcvideo snd_hda_core btusb uvc btrtl videobuf2_vmalloc snd_hwdep irqbypass snd_hda_scodec_cs35l41_spi videobuf2_memops btbcm i2c_designware_platform snd_seq iwlwifi videobuf2_v4l2 btintel rapl i2c_designware_core regmap_spi btmtk snd_seq_device videobuf2_common intel_cstate iTCO_wdt snd_hda_scodec_cs35l41_i2c mei_wdt iTCO_vendor_support intel_rapl_msr bluetooth videodev intel_uncore snd_hda_scodec_cs35l41 snd_pcm think_lmi cfg80211 thinkpad_acpi firmware_attributes_class processor_thermal_device_pci snd_hda_cs_dsp_ctls processor_thermal_device pcspkr wmi_bmof cs_dsp ledtrig_audio mc processor_thermal_rfim snd_timer snd_soc_cs35l41_lib platform_profile mei_me processor_thermal_mbox intel_lpss_pci snd i2c_i801 processor_thermal_rapl iosm(X) intel_lpss mei i2c_smbus idma64 intel_rapl_common soundcore rfkill regmap_i2c intel_pmc_core int3403_thermal serial_multi_instantiate intel_vsec int340x_thermal_zone pmt_telemetry int3400_thermal pmt_class acpi_thermal_rel acpi_pad joydev acpi_tad xfs libcrc32c i915 nouveau drm_ttm_helper drm_exec gpu_sched mxm_wmi drm_buddy i2c_algo_bit ttm intel_gtt drm_display_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops rtsx_pci_sdmmc drm nvme mmc_core nvme_core crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel rtsx_pci cec nvme_common t10_pi i2c_hid_acpi i2c_hid video wmi pinctrl_tigerlake serio_raw dm_mirror dm_region_hash dm_log dm_mod fuse
    CPU: 7 PID: 32201 Comm: kworker/u40:58 Kdump: loaded Tainted: G X ------- --- 5.14.0-427.28.1.el9_4.x86_64 #1
    Hardware name: LENOVO 21FVCTO1WW/21FVCTO1WW, BIOS N3ZET41W (1.28 ) 04/24/2024
    Workqueue: events_unbound async_run_entry_fn
    RIP: 0010:nvkm_runl_preempt_wait+0xc5/0xe0 [nouveau]
    Code: 8b 40 10 48 8b 78 10 4c 8b 67 50 4d 85 e4 75 03 4c 8b 27 e8 1d 93 b9 f7 4c 89 e2 48 c7 c7 ac 7a b7 c0 48 89 c6 e8 bb 36 4a f7 <0f> 0b b8 92 ff ff ff eb ac e8 4d ef 01 f8 66 66 2e 0f 1f 84 00 00
    RSP: 0000:ffffa0480249fc08 EFLAGS: 00010282
    RAX: 0000000000000000 RBX: ffff8d57c1bee200 RCX: 0000000000000027
    RDX: 0000000000000027 RSI: ffffffffb9a67a00 RDI: ffff8d5b0f3e0848
    RBP: ffff8d5826a44a00 R08: 80000000ffff86a4 R09: 00000000ba8c6d97
    R10: ffffffffffffffff R11: 000000000000001f R12: ffff8d57c248e730
    R13: ffff8d57f39dd440 R14: ffff8d57c1bee200 R15: 0000000000000040
    FS: 0000000000000000(0000) GS:ffff8d5b0f3c0000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 0000000311010000 CR4: 0000000000750ee0
    PKRU: 55555554
    Call Trace:
     <TASK>
     ? show_trace_log_lvl+0x1c4/0x2df
     ? show_trace_log_lvl+0x1c4/0x2df
     ? nvkm_chan_cctx_bind+0x91/0x100 [nouveau]
     ? nvkm_runl_preempt_wait+0xc5/0xe0 [nouveau]
     ? __warn+0x81/0x110
     ? nvkm_runl_preempt_wait+0xc5/0xe0 [nouveau]
     ? report_bug+0x10a/0x140
     ? handle_bug+0x3c/0x70
     ? exc_invalid_op+0x14/0x70
     ? asm_exc_invalid_op+0x16/0x20
     ? nvkm_runl_preempt_wait+0xc5/0xe0 [nouveau]
     ? nvkm_runl_preempt_wait+0xc5/0xe0 [nouveau]
     nvkm_chan_cctx_bind+0x91/0x100 [nouveau]
     nvkm_uchan_object_init_0+0xe4/0x140 [nouveau]
     nvkm_oproxy_init+0x20/0x60 [nouveau]
     nvkm_object_init+0x3b/0x120 [nouveau]
     nvkm_object_init+0x73/0x120 [nouveau]
     nvkm_object_init+0x73/0x120 [nouveau]
     nvkm_object_init+0x73/0x120 [nouveau]
     nvkm_object_init+0x73/0x120 [nouveau]
     nouveau_do_resume+0x2b/0xb0 [nouveau]
     nouveau_pmops_resume+0x65/0x90 [nouveau]
     ? __pfx_pci_pm_restore+0x10/0x10
     dpm_run_callback+0x49/0x140
     device_resume+0x8b/0x190
     async_resume+0x19/0x30
     async_run_entry_fn+0x2d/0x130
     process_one_work+0x1e2/0x3b0
     worker_thread+0x50/0x3a0
     ? __pfx_worker_thread+0x10/0x10
     kthread+0xdd/0x100
     ? __pfx_kthread+0x10/0x10
     ret_from_fork+0x29/0x50
     </TASK>
    ---[ end trace 3a3d07299e5fd330 ]---

- The following messages are also logged:
    pcieport 0000:00:01.0: AER: Uncorrected (Non-Fatal) error received: 0000:01:00.0
    nouveau 0000:01:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
    nouveau 0000:01:00.0: device [10de:25bc] error status/mask=00100000/00000000
    nouveau 0000:01:00.0: [20] UnsupReq (First)
    nouveau 0000:01:00.0: AER: TLP Header: 40000001 0000000f bd00da60 f7f7f7f7
    nouveau 0000:01:00.0: AER: can't recover (no error_detected callback)
    snd_hda_intel 0000:01:00.1: AER: can't recover (no error_detected callback)
    pcieport 0000:00:01.0: AER: device recovery failed
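The raw values in the AER messages can be decoded field by field. The following Python sketch decodes the logged error status/mask and the first three TLP header DWORDs, assuming the bit layout defined by the PCIe Base Specification. The helper names `decode_uncor_status` and `decode_tlp_header` are hypothetical, the bit-name table covers only a subset of the uncorrectable error bits (using the short names the Linux AER driver prints, such as "UnsupReq"), and only 3DW memory read/write headers are handled.

```python
"""Hypothetical helpers to decode AER fields from a kernel log (sketch only)."""

# Subset of PCIe Uncorrectable Error Status bits, keyed by bit position,
# with the short names printed by the Linux AER driver (e.g. "[20] UnsupReq").
UNCOR_BITS = {
    4: "DLP",
    5: "SDES",
    12: "TLP",
    13: "FCP",
    14: "CmpltTO",
    15: "CmpltAbrt",
    16: "UnxCmplt",
    17: "RxOF",
    18: "MalfTLP",
    19: "ECRC",
    20: "UnsupReq",
    21: "ACSViol",
}


def decode_uncor_status(status: int, mask: int) -> list:
    """Return (bit, name) for every status bit that is set and not masked."""
    active = status & ~mask
    return [
        (bit, UNCOR_BITS.get(bit, "Unknown"))
        for bit in range(32)
        if active & (1 << bit)
    ]


def decode_tlp_header(dw0: int, dw1: int, dw2: int) -> dict:
    """Decode a 3DW memory-request TLP header (32-bit address only)."""
    fmt = (dw0 >> 29) & 0x7        # 000 = 3DW no data, 010 = 3DW with data
    tlp_type = (dw0 >> 24) & 0x1F  # 00000 = memory request
    length = dw0 & 0x3FF           # payload length in DW; 0 encodes 1024
    if fmt not in (0b000, 0b010) or tlp_type != 0:
        raise ValueError("only 3DW memory read/write headers are handled")
    req_id = (dw1 >> 16) & 0xFFFF  # bus[15:8], device[7:3], function[2:0]
    return {
        "kind": "MWr32" if fmt == 0b010 else "MRd32",
        "length_dw": length or 1024,
        "requester": f"{req_id >> 8:02x}:{(req_id >> 3) & 0x1F:02x}.{req_id & 0x7}",
        "tag": (dw1 >> 8) & 0xFF,
        "last_be": (dw1 >> 4) & 0xF,
        "first_be": dw1 & 0xF,
        "address": f"0x{dw2 & 0xFFFFFFFC:08x}",  # bits 1:0 are reserved
    }


if __name__ == "__main__":
    # Values copied from the AER messages in this article.
    print(decode_uncor_status(0x00100000, 0x00000000))
    print(decode_tlp_header(0x40000001, 0x0000000F, 0xBD00DA60))
```

With the values from this log, the sketch reports bit 20 (UnsupReq) and a one-DWORD 32-bit memory write to address 0xbd00da60 with Requester ID 00:00.0. The fourth logged DWORD (f7f7f7f7) is not decoded because a 3DW header only uses the first three.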
Environment
- Red Hat Enterprise Linux 9