The commit mentioned below was more tolerant with the number of RST seen
during a test because in some uncontrollable situations, multiple RST
can be generated.
But it was not taking into account the case where no RST are expected:
this validation was then no longer reporting issues for the 0 RST case
because it is not possible to have less than 0 RST in the counter. This
patch fixes the issue by adding a specific condition.
Fixes: 6bf41020b72b ("selftests: mptcp: update and extend fastclose test-cases") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-1-17ecb002e41d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When closing the first subflow, the MPTCP protocol unconditionally
calls tcp_disconnect(), which in turn generates a reset if the subflow
is established.
That is unexpected and different from what MPTCP does with MPJ
subflows, where resets are generated only on FASTCLOSE and other edge
scenarios.
We can't reuse for the first subflow the same code in place for MPJ
subflows, as MPTCP clean them up completely via a tcp_close() call,
while must keep the first subflow socket alive for later re-usage, due
to implementation constraints.
This patch adds a new helper __mptcp_subflow_disconnect() that
encapsulates, a logic similar to tcp_close, issuing a reset only when
the MPTCP_CF_FASTCLOSE flag is set, and performing a clean shutdown
otherwise.
The code pattern of memcpy(dst, src, strlen(src)) is almost always
wrong. In this case it is wrong because it leaves memory uninitialized
if it is less than sizeof(ni->name), and overflows ni->name when longer.
Normally strtomem_pad() could be used here, but since ni->name is a
trailing array in struct hci_mon_new_index, compilers that don't support
-fstrict-flex-arrays=3 can't tell how large this array is via
__builtin_object_size(). Instead, open-code the helper and use sizeof()
since it will work correctly.
Additionally mark ni->name as __nonstring since it appears to not be a
%NUL terminated C string.
Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Cc: Edward AD <twuufnxlz@gmail.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: linux-bluetooth@vger.kernel.org Cc: netdev@vger.kernel.org Fixes: 18f547f3fc07 ("Bluetooth: hci_sock: fix slab oob read in create_monitor_event") Link: https://lore.kernel.org/lkml/202310110908.F2639D3276@keescook/ Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The altname nodes are currently not moved to the new netns
when netdevice itself moves:
[ ~]# ip netns add test
[ ~]# ip -netns test link add name eth0 type dummy
[ ~]# ip -netns test link property add dev eth0 altname some-name
[ ~]# ip -netns test link show dev some-name
2: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 1e:67:ed:19:3d:24 brd ff:ff:ff:ff:ff:ff
altname some-name
[ ~]# ip -netns test link set dev eth0 netns 1
[ ~]# ip link
...
3: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 02:40:88:62:ec:b8 brd ff:ff:ff:ff:ff:ff
altname some-name
[ ~]# ip li show dev some-name
Device "some-name" does not exist.
Remove them from the hash table when device is unlisted
and add back when listed again.
Fixes: 36fbf1e52bd3 ("net: rtnetlink: add linkprop commands to add and delete alternative ifnames") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Looks like the driver sleep pins configuration is unusable. Adding the
sleep pins causes the usb phy to not respond. We need to use the default
pins in probe, and only set sleep pins at phy_mdm6600_device_power_off().
As the modem can also be booted to a serial port mode for firmware
flashing, let's make the pin changes limited to probe and remove. For
probe, we get the default pins automatically. We only need to set the
sleep pins in phy_mdm6600_device_power_off() to prevent the modem from
waking up because the gpio line glitches.
If it turns out that we need a separate state for phy_mdm6600_power_on()
and phy_mdm6600_power_off(), we can use the pinctrl idle state.
Cc: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com> Cc: Merlijn Wajer <merlijn@wizzup.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Sebastian Reichel <sre@kernel.org> Fixes: 2ad2af081622 ("phy: mapphone-mdm6600: Improve phy related runtime PM calls") Signed-off-by: Tony Lindgren <tony@atomide.com> Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com> Link: https://lore.kernel.org/r/20230913060433.48373-3-tony@atomide.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit d644e0d79829 ("phy: mapphone-mdm6600: Fix PM error handling in
phy_mdm6600_probe") caused a regression where we now unconditionally
disable runtime PM at the end of the probe while it is only needed on
errors.
Cc: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com> Cc: Merlijn Wajer <merlijn@wizzup.org> Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Pavel Machek <pavel@ucw.cz> Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com> Fixes: d644e0d79829 ("phy: mapphone-mdm6600: Fix PM error handling in phy_mdm6600_probe") Signed-off-by: Tony Lindgren <tony@atomide.com> Link: https://lore.kernel.org/r/20230913060433.48373-1-tony@atomide.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Add flag IRQCHIP_MASK_ON_SUSPEND to make sure gpio irq is masked on
suspend, if lack this flag, current irq arctitecture will not mask
the irq, and these unmasked gpio irq will wrongly wakeup the system
even they are not config as wakeup source.
Also add flag IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND to make sure the gpio
irq which is configed as wakeup source can work as expect.
Since recently, the kernel is nagging about mutable irq_chips:
"not an immutable chip, please consider fixing it!"
Drop the unneeded copy, flag it as IRQCHIP_IMMUTABLE, add the new
helper functions and call the appropriate gpiolib functions.
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Stable-dep-of: 430232619791 ("gpio: vf610: mask the gpio irq in system suspend and support wakeup") Signed-off-by: Sasha Levin <sashal@kernel.org>
When a kprobe is attached to a function that's name is not unique (is
static and shares the name with other functions in the kernel), the
kprobe is attached to the first function it finds. This is a bug as the
function that it is attaching to is not necessarily the one that the
user wants to attach to.
Instead of blindly picking a function to attach to what is ambiguous,
error with EADDRNOTAVAIL to let the user know that this function is not
unique, and that the user must use another unique function with an
address offset to get to the function they want to attach to.
Function kallsyms_on_each_symbol() traverses all symbols and submits each
symbol to the hook 'fn' for judgment and processing. For some cases, the
hook actually only handles the matched symbol, such as livepatch.
Because all symbols are currently sorted by name, all the symbols with the
same name are clustered together. Function kallsyms_lookup_names() gets
the start and end positions of the set corresponding to the specified
name. So we can easily and quickly traverse all the matches.
The test results are as follows (twice): (x86)
kallsyms_on_each_match_symbol: 7454, 7984
kallsyms_on_each_symbol : 11733809, 11785803
kallsyms_on_each_match_symbol() consumes only 0.066% of
kallsyms_on_each_symbol()'s time. In other words, 1523x better
performance.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Stable-dep-of: b022f0c7e404 ("tracing/kprobes: Return EADDRNOTAVAIL when func matches several symbols") Signed-off-by: Sasha Levin <sashal@kernel.org>
kallsyms_seqs_of_names[] records the symbol index sorted by address, the
maximum value in kallsyms_seqs_of_names[] is the number of symbols. And
2^24 = 16777216, which means that three bytes are enough to store the
index. This can help us save (1 * kallsyms_num_syms) bytes of memory.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Stable-dep-of: b022f0c7e404 ("tracing/kprobes: Return EADDRNOTAVAIL when func matches several symbols") Signed-off-by: Sasha Levin <sashal@kernel.org>
The kernel produces a warning splat and the DSI device fails to register
in this driver if the i2c driver probes, populates child auxiliary
devices, and then somewhere in ti_sn_bridge_probe() a function call
returns -EPROBE_DEFER. When the auxiliary driver probe defers, the dsi
device created by devm_mipi_dsi_device_register_full() is left
registered because the devm managed device used to manage the lifetime
of the DSI device is the parent i2c device, not the auxiliary device
that is being probed.
Associate the DSI device created and managed by this driver to the
lifetime of the auxiliary device, not the i2c device, so that the DSI
device is removed when the auxiliary driver unbinds. Similarly change
the device pointer used for dev_err_probe() so the deferred probe errors
are associated with the auxiliary device instead of the parent i2c
device so we can narrow down future problems faster.
Cc: Douglas Anderson <dianders@chromium.org> Cc: Maxime Ripard <maxime@cerno.tech> Fixes: c3b75d4734cb ("drm/bridge: sn65dsi86: Register and attach our DSI device at probe") Signed-off-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20231002235407.769399-1-swboyd@chromium.org Signed-off-by: Sasha Levin <sashal@kernel.org>
Free the "priv" pointer before returning the error code.
Fixes: 90eb6b59d311 ("ASoC: pxa-ssp: add support for an external clock in devicetree") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/84ac2313-1420-471a-b2cb-3269a2e12a7c@moroto.mountain Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
We found a glitch when configuring the pad as output high. To avoid this
glitch, move the data value setting before direction config in the
function vf610_gpio_direction_output().
Map 0x2a to KEY_SELECTIVE_SCREENSHOT mirroring how similar hotkeys
are mapped on other laptops.
PrintScreem and CapsLock are also reported as normal PS/2 keyboard events,
map these event codes to KE_IGNORE to avoid "Unknown key code 0x%x\n" log
messages.
Reported-by: James John <me@donjajo.com> Closes: https://lore.kernel.org/platform-driver-x86/a2c441fe-457e-44cf-a146-0ecd86b037cf@donjajo.com/ Closes: https://bbs.archlinux.org/viewtopic.php?pid=2123716 Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20231017090725.38163-4-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This mapping is causing issues on new laptop models which actually
send 0x2b events for printscreen presses and 0x2c events for
capslock presses, which get translated into spurious brightness-down
presses.
This mapping is really only necessary when asus-wmi has registered
a backlight-device for backlight control. In this case the mapping
was used to decide to filter out the keypresss since in this case
the firmware has already modified the brightness itself and instead
of reporting a keypress asus-wmi will just report the new brightness
value to userspace.
OTOH when the firmware does not adjust the brightness itself then
it seems to always report 0x2e for brightness-down presses and
0x2f for brightness up presses independent of the actual brightness
level. So in this case the mapping of the code is not necessary
and this translation actually leads to spurious brightness-down
presses being send to userspace when pressing printscreen or capslock.
Modify asus_wmi_handle_event_code() to only do the mapping
when using asus-wmi backlight control to fix the spurious
brightness-down presses.
Reported-by: James John <me@donjajo.com> Closes: https://lore.kernel.org/platform-driver-x86/a2c441fe-457e-44cf-a146-0ecd86b037cf@donjajo.com/ Closes: https://bbs.archlinux.org/viewtopic.php?pid=2123716 Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20231017090725.38163-3-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Before this commit all the NOTIFY_BRNDOWN_MIN - NOTIFY_BRNDOWN_MAX
aka 0x20 - 0x2e events were mapped to 0x20.
This mapping is causing issues on new laptop models which actually
send 0x2b events for printscreen presses and 0x2c events for
capslock presses, which get translated into spurious brightness-down
presses.
The plan is disable the 0x11-0x2e special mapping on laptops
where asus-wmi does not register a backlight-device to avoid
the spurious brightness-down keypresses. New laptops always send
0x2e for brightness-down presses, change the special internal
ASUS_WMI_BRN_DOWN value from 0x20 to 0x2e to match this in
preparation for fixing the spurious brightness-down presses.
This change does not have any functional impact since all
of 0x20 - 0x2e is mapped to ASUS_WMI_BRN_DOWN first and only
then checked against the keymap code and the new 0x2e
value is still in the 0x20 - 0x2e range.
Reported-by: James John <me@donjajo.com> Closes: https://lore.kernel.org/platform-driver-x86/a2c441fe-457e-44cf-a146-0ecd86b037cf@donjajo.com/ Closes: https://bbs.archlinux.org/viewtopic.php?pid=2123716 Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20231017090725.38163-2-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the current uncore frequency can't be read, don't create attribute
"current_freq_khz" as any read will fail later. Some user space
applications like turbostat fail to continue with the failure. So, check
error during attribute creation.
Fixes: 414eef27283a ("platform/x86/intel/uncore-freq: Display uncore current frequency") Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20231004181915.1887913-1-srinivas.pandruvada@linux.intel.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If platform_profile_register() fails, the driver does not propagate
the error, but instead probes successfully. This means when the driver
unbinds, the a warning might be issued by platform_profile_remove().
Fix this by propagating the error back to the caller of
surface_platform_profile_probe().
Compile-tested only.
Fixes: b78b4982d763 ("platform/surface: Add platform profile driver") Signed-off-by: Armin Wolf <W_Armin@gmx.de> Reviewed-by: Maximilian Luz <luzmaximilian@gmail.com> Tested-by: Maximilian Luz <luzmaximilian@gmail.com> Link: https://lore.kernel.org/r/20231014235449.288702-1-W_Armin@gmx.de Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If name_show() is non unique, this test will try to install a kprobe on this
function which should fail returning EADDRNOTAVAIL.
On kernel where name_show() is not unique, this test is skipped.
Since the fixed commits both zdev->iommu_bitmap and zdev->lazy_bitmap
are allocated as vzalloc(zdev->iommu_pages / 8). The problem is that
zdev->iommu_bitmap is a pointer to unsigned long but the above only
yields an allocation that is a multiple of sizeof(unsigned long) which
is 8 on s390x if the number of IOMMU pages is a multiple of 64.
This in turn is the case only if the effective IOMMU aperture is
a multiple of 64 * 4K = 256K. This is usually the case and so didn't
cause visible issues since both the virt_to_phys(high_memory) reduced
limit and hardware limits use nice numbers.
Under KVM, and in particular with QEMU limiting the IOMMU aperture to
the vfio DMA limit (default 65535), it is possible for the reported
aperture not to be a multiple of 256K however. In this case we end up
with an iommu_bitmap whose allocation is not a multiple of
8 causing bitmap operations to access it out of bounds.
Sadly we can't just fix this in the obvious way and use bitmap_zalloc()
because for large RAM systems (tested on 8 TiB) the zdev->iommu_bitmap
grows too large for kmalloc(). So add our own bitmap_vzalloc() wrapper.
This might be a candidate for common code, but this area of code will
be replaced by the upcoming conversion to use the common code DMA API on
s390 so just add a local routine.
Fixes: 224593215525 ("s390/pci: use virtual memory for iommu bitmap") Fixes: 13954fd6913a ("s390/pci_dma: improve lazy flush for unmap") Cc: stable@vger.kernel.org Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Because group consistency is non-atomic between parent (filedesc) and children
(inherited) events, it is possible for PERF_FORMAT_GROUP read() to try and sum
non-matching counter groups -- with non-sensical results.
Add group_generation to distinguish the case where a parent group removes and
adds an event and thus has the same number, but a different configuration of
events as inherited groups.
This became a problem when commit fa8c269353d5 ("perf/core: Invert
perf_read_group() loops") flipped the order of child_list and sibling_list.
Previously it would iterate the group (sibling_list) first, and for each
sibling traverse the child_list. In this order, only the group composition of
the parent is relevant. By flipping the order the group composition of the
child (inherited) events becomes an issue and the mis-match in group
composition becomes evident.
That said; even prior to this commit, while reading of a group that is not
equally inherited was not broken, it still made no sense.
(Ab)use ECHILD as error return to indicate issues with child process group
composition.
HID descriptors with Battery System (0x85) Charging (0x44) usage are
ignored and POWER_SUPPLY_STATUS_DISCHARGING is always reported to user
space, even when the device is charging.
Map this usage and when it is reported set the right charging status.
In addition, add KUnit tests to make sure that the charging status is
correctly set and reported. They can be run with the usual command:
$ ./tools/testing/kunit/kunit.py run --kunitconfig=drivers/hid
Stop zapping invalidate TDP MMU roots via work queue now that KVM
preserves TDP MMU roots until they are explicitly invalidated. Zapping
roots asynchronously was effectively a workaround to avoid stalling a vCPU
for an extended during if a vCPU unloaded a root, which at the time
happened whenever the guest toggled CR0.WP (a frequent operation for some
guest kernels).
While a clever hack, zapping roots via an unbound worker had subtle,
unintended consequences on host scheduling, especially when zapping
multiple roots, e.g. as part of a memslot. Because the work of zapping a
root is no longer bound to the task that initiated the zap, things like
the CPU affinity and priority of the original task get lost. Losing the
affinity and priority can be especially problematic if unbound workqueues
aren't affined to a small number of CPUs, as zapping multiple roots can
cause KVM to heavily utilize the majority of CPUs in the system, *beyond*
the CPUs KVM is already using to run vCPUs.
When deleting a memslot via KVM_SET_USER_MEMORY_REGION, the async root
zap can result in KVM occupying all logical CPUs for ~8ms, and result in
high priority tasks not being scheduled in in a timely manner. In v5.15,
which doesn't preserve unloaded roots, the issues were even more noticeable
as KVM would zap roots more frequently and could occupy all CPUs for 50ms+.
Consuming all CPUs for an extended duration can lead to significant jitter
throughout the system, e.g. on ChromeOS with virtio-gpu, deleting memslots
is a semi-frequent operation as memslots are deleted and recreated with
different host virtual addresses to react to host GPU drivers allocating
and freeing GPU blobs. On ChromeOS, the jitter manifests as audio blips
during games due to the audio server's tasks not getting scheduled in
promptly, despite the tasks having a high realtime priority.
Deleting memslots isn't exactly a fast path and should be avoided when
possible, and ChromeOS is working towards utilizing MAP_FIXED to avoid the
memslot shenanigans, but KVM is squarely in the wrong. Not to mention
that removing the async zapping eliminates a non-trivial amount of
complexity.
Note, one of the subtle behaviors hidden behind the async zapping is that
KVM would zap invalidated roots only once (ignoring partial zaps from
things like mmu_notifier events). Preserve this behavior by adding a flag
to identify roots that are scheduled to be zapped versus roots that have
already been zapped but not yet freed.
Add a comment calling out why kvm_tdp_mmu_invalidate_all_roots() can
encounter invalid roots, as it's not at all obvious why zapping
invalidated roots shouldn't simply zap all invalid roots.
Reported-by: Pattara Teerapong <pteerapong@google.com> Cc: David Stevens <stevensd@google.com> Cc: Yiwei Zhang<zzyiwei@google.com> Cc: Paul Hsia <paulhsia@google.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230916003916.2545000-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Tested-by: David Matlack <dmatlack@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It may happen that the work to destroy a queue
(for example nvmet_tcp_release_queue_work()) is started while
an auth-send or auth-receive command is still completing.
nvmet_sq_destroy() will block, waiting for all the references
to the sq to be dropped, the last reference is then
dropped when nvmet_req_complete() is called.
When this happens, both nvmet_sq_destroy() and
nvmet_execute_auth_send()/_receive() will free the dhchap pointers by
calling nvmet_auth_sq_free().
Since there isn't any lock, the two threads may race against each other,
causing double frees and memory corruptions, as reported by KASAN.
Reproduced by stress blktests nvme/041 nvme/042 nvme/043
nvme nvme2: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe4096
==================================================================
BUG: KASAN: double-free in kfree+0xec/0x4b0
User can request more metadata bytes than the device will write. Ensure
kernel buffer is initialized so we're not leaking unsanitized memory on
the copy-out.
Fixes: 0b7f1f26f95a51a ("nvme: use the block layer for userspace passthrough metadata") Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
acpi_register_gsi() should return a negative value in case of failure.
Currently, it returns the return value from irq_create_fwspec_mapping().
However, irq_create_fwspec_mapping() returns 0 for failure. Fix the
issue by returning -EINVAL if irq_create_fwspec_mapping() returns zero.
Fixes: d44fa3d46079 ("ACPI: Add support for ResourceSource/IRQ domain mapping") Cc: 4.11+ <stable@vger.kernel.org> # 4.11+ Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
[ rjw: Rename a new local variable ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patches fixes commit 51d674a5e488 "NFSv4.1: use
EXCHGID4_FLAG_USE_PNFS_DS for DS server", purpose of that
commit was to mark EXCHANGE_ID to the DS with the appropriate
flag.
However, connection to MDS can return both EXCHGID4_FLAG_USE_PNFS_DS
and EXCHGID4_FLAG_USE_PNFS_MDS set but previous patch would only
remember the USE_PNFS_DS and for the 2nd EXCHANGE_ID send that
to the MDS.
Instead, just mark the pnfs path exclusively.
Fixes: 51d674a5e488 ("NFSv4.1: use EXCHGID4_FLAG_USE_PNFS_DS for DS server") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We are not allowed to call pnfs_mark_matching_lsegs_return() without
also holding a reference to the layout header, since doing so could lead
to the reference count going to zero when we call
pnfs_layout_remove_lseg(). This again can lead to a hang when we get to
nfs4_evict_inode() and are unable to clear the layout pointer.
pnfs_layout_return_unused_byserver() is guilty of this behaviour, and
has been seen to trigger the refcount warning prior to a hang.
Fixes: b6d49ecd1081 ("NFSv4: Fix a pNFS layout related use-after-free race when freeing the inode") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The OEMID is an 8-bit binary number rather than 16-bit as the current code
parses for. The OEMID occupies bits [111:104] in the CID register, see the
eMMC spec JESD84-B51 paragraph 7.2.3. It seems that the 16-bit comes from
the legacy MMC specs (v3.31 and before).
Let's fix the parsing by simply move to use 8-bit instead of 16-bit. This
means we ignore the impact on some of those old MMC cards that may be out
there, but on the other hand this shouldn't be a problem as the OEMID seems
not be an important feature for these cards.
tuning only support in 4-bit mode or 8 bit mode, so in 1-bit mode,
need to hold retuning.
Find this issue when use manual tuning method on imx93. When system
resume back, SDIO WIFI try to switch back to 4 bit mode, first will
trigger retuning, and all tuning command failed.
Signed-off-by: Haibo Chen <haibo.chen@nxp.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Fixes: dfa13ebbe334 ("mmc: host: Add facility to support re-tuning") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230830093922.3095850-1-haibo.chen@nxp.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To improve the r/w performance of GL9763E, the current driver inhibits LPM
negotiation while the device is active.
This prevents a large number of SoCs from suspending, notably x86 systems
which commonly use S0ix as the suspend mechanism - for example, Intel
Alder Lake and Raptor Lake processors.
Failure description:
1. Userspace initiates s2idle suspend (e.g. via writing to
/sys/power/state)
2. This switches the runtime_pm device state to active, which disables
LPM negotiation, then calls the "regular" suspend callback
3. With LPM negotiation disabled, the bus cannot enter low-power state
4. On a large number of SoCs, if the bus not in a low-power state, S0ix
cannot be entered, which in turn prevents the SoC from entering
suspend.
Fix by re-enabling LPM negotiation in the device's suspend callback.
Suggested-by: Stanislaw Kardach <skardach@google.com> Fixes: f9e5b33934ce ("mmc: host: Improve I/O read/write performance for GL9763E") Cc: stable@vger.kernel.org Signed-off-by: Sven van Ashbrook <svenva@chromium.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20230831160055.v3.1.I7ed1ca09797be2dd76ca914c57d88b32d24dac88@changeid Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the TXT binding before conversion, the "xo" clock was listed as
optional. Conversion kept it optional in "clock-names", but not in
"clocks". This fixes dbts_check warnings like:
qcom-sdx65-mtp.dtb: mmc@8804000: clocks: [[13, 59], [13, 58]] is too short
When the exact mapping type driver was not available, the old
physmap_of_core driver fell back to mapping the region as ROM.
Unfortunately this feature was lost when the DT and pdata cases were
merged. Revive this useful feature.
The NAND core complies with the ONFI specification, which itself
mentions that after any program or erase operation, a status check
should be performed to see whether the operation was finished *and*
successful.
The NAND core offers helpers to finish a page write (sending the
"PAGE PROG" command, waiting for the NAND chip to be ready again, and
checking the operation status). But in some cases, advanced controller
drivers might want to optimize this and craft their own page write
helper to leverage additional hardware capabilities, thus not always
using the core facilities.
Some drivers, like this one, do not use the core helper to finish a page
write because the final cycles are automatically managed by the
hardware. In this case, the additional care must be taken to manually
perform the final status check.
Let's read the NAND chip status at the end of the page write helper and
return -EIO upon error.
Cc: Michal Simek <michal.simek@amd.com> Cc: stable@vger.kernel.org Fixes: 88ffef1b65cf ("mtd: rawnand: arasan: Support the hardware BCH ECC engine") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Michal Simek <michal.simek@amd.com> Link: https://lore.kernel.org/linux-mtd/20230717194221.229778-2-miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The NAND core complies with the ONFI specification, which itself
mentions that after any program or erase operation, a status check
should be performed to see whether the operation was finished *and*
successful.
The NAND core offers helpers to finish a page write (sending the
"PAGE PROG" command, waiting for the NAND chip to be ready again, and
checking the operation status). But in some cases, advanced controller
drivers might want to optimize this and craft their own page write
helper to leverage additional hardware capabilities, thus not always
using the core facilities.
Some drivers, like this one, do not use the core helper to finish a page
write because the final cycles are automatically managed by the
hardware. In this case, the additional care must be taken to manually
perform the final status check.
Let's read the NAND chip status at the end of the page write helper and
return -EIO upon error.
The NAND core complies with the ONFI specification, which itself
mentions that after any program or erase operation, a status check
should be performed to see whether the operation was finished *and*
successful.
The NAND core offers helpers to finish a page write (sending the
"PAGE PROG" command, waiting for the NAND chip to be ready again, and
checking the operation status). But in some cases, advanced controller
drivers might want to optimize this and craft their own page write
helper to leverage additional hardware capabilities, thus not always
using the core facilities.
Some drivers, like this one, do not use the core helper to finish a page
write because the final cycles are automatically managed by the
hardware. In this case, the additional care must be taken to manually
perform the final status check.
Let's read the NAND chip status at the end of the page write helper and
return -EIO upon error.
Cc: Michal Simek <michal.simek@amd.com> Cc: stable@vger.kernel.org Fixes: 08d8c62164a3 ("mtd: rawnand: pl353: Add support for the ARM PL353 SMC NAND controller") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Tested-by: Michal Simek <michal.simek@amd.com> Link: https://lore.kernel.org/linux-mtd/20230717194221.229778-3-miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the netdevice is within a container and communicates externally
through network technologies such as VxLAN, we won't be able to find
routing information in the init_net namespace. To address this issue,
we need to add a struct net parameter to the smc_ib_find_route function.
This allow us to locate the routing information within the corresponding
net namespace, ensuring the correct completion of the SMC CLC interaction.
Fixes: e5c4744cfb59 ("net/smc: add SMC-Rv2 connection establishment") Signed-off-by: Albert Huang <huangjie.albert@bytedance.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Link: https://lore.kernel.org/r/20231011074851.95280-1-huangjie.albert@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In the blamed commit below, I completely forgot to release the acquired
resources before erroring out in the TCP BPF code, as reported by Dan.
Address the issues by replacing the bogus return with a jump to the
relevant cleanup code.
Fixes: 419ce133ab92 ("tcp: allow again tcp_disconnect() when threads are waiting") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/8f99194c698bcef12666f0a9a999c58f8b1cb52c.1697557782.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Paolo Abeni reports that on some systems the pyroute2 version isn't
new enough to run the test suite. Ensure that we support a minimum
version of 0.6 for all cases (which does include the existing ones).
The 0.6.1 version was released in May of 2021, so should be
propagated to most installations at this point.
The alternative that Paolo proposed was to only skip when the
add-flow is being run. This would be okay for most cases, except
if a future test case is added that needs to do flow dump without
an associated add (just guessing). In that case, it could also be
broken and we would need additional skip logic anyway. Just draw
a line in the sand now.
Fixes: 25f16c873fb1 ("selftests: add openvswitch selftest suite") Reported-by: Paolo Abeni <pabeni@redhat.com> Closes: https://lore.kernel.org/lkml/8470c431e0930d2ea204a9363a60937289b7fdbe.camel@redhat.com/ Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
memcmp is not consider safe to use with cryptographic secrets:
'Do not use memcmp() to compare security critical data, such as
cryptographic secrets, because the required CPU time depends on the
number of equal bytes.'
While usage of memcmp for ZERO_KEY may not be considered a security
critical data, it can lead to more usage of memcmp with pairing keys
which could introduce more security problems.
Fixes: 455c2ff0a558 ("Bluetooth: Fix BR/EDR out-of-band pairing with only initiator data") Fixes: 33155c4aae52 ("Bluetooth: hci_event: Ignore NULL link key") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this doesn't control interrupt generation by the Root Port; that
is controlled by the AER Root Error Command register, which is managed by
the AER service driver.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: intel-wired-lan@lists.osuosl.org Cc: netdev@vger.kernel.org Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Stable-dep-of: 0288c3e709e5 ("ice: reset first in crash dump kernels") Signed-off-by: Sasha Levin <sashal@kernel.org>
As reported by Tom, .NET and applications build on top of it rely
on connect(AF_UNSPEC) to async cancel pending I/O operations on TCP
socket.
The blamed commit below caused a regression, as such cancellation
can now fail.
As suggested by Eric, this change addresses the problem explicitly
causing blocking I/O operation to terminate immediately (with an error)
when a concurrent disconnect() is executed.
Instead of tracking the number of threads blocked on a given socket,
track the number of disconnect() issued on such socket. If such counter
changes after a blocking operation releasing and re-acquiring the socket
lock, error out the current operation.
Fixes: 4faeee0cf8a5 ("tcp: deny tcp_disconnect() when threads are waiting") Reported-by: Tom Deseyn <tdeseyn@redhat.com> Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1886305 Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/f3b95e47e3dbed840960548aebaa8d954372db41.1697008693.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Split tls_rx_reader_{lock,unlock} into an 'acquire/release' and
the actual locking part.
With that we can use the tls_rx_reader_lock in situations where
the socket is already locked.
Suggested-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230726191556.41714-6-hare@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 419ce133ab92 ("tcp: allow again tcp_disconnect() when threads are waiting") Signed-off-by: Sasha Levin <sashal@kernel.org>
In the cited commit, if the routing device is ovs internal port, the
out device is set to uplink, and packets go out after encapsulation.
If filter device is uplink, it can trigger the following syndrome:
mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 3966): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0xcdb051), err(-22)
Fix this issue by not offloading internal port if filter device is out
device. In this case, packets are not forwarded to the root table to
be processed, the termination table is used instead to forward them
from uplink to uplink.
Fixes: 100ad4e2d758 ("net/mlx5e: Offload internal port as encap route device") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Ariel Levkovich <lariel@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently, whenever fw issues a change ownership event, the PF that owns
the fw tracer drops its ownership directly and the other PFs try to pick
up the ownership via what MTRC register suggests.
In some cases, driver releases the ownership of the tracer and reacquires
it later on. Whenever the driver releases ownership of the tracer, fw
issues a change ownership event. This event can be delayed and come after
driver has reacquired ownership of the tracer. Thus the late event will
trigger the tracer owner PF to release the ownership again and lead to a
scenario where no PF is owning the tracer.
To prevent the scenario described above, when handling a change
ownership event, do not drop ownership of the tracer directly, instead
read the fw MTRC register to retrieve the up-to-date owner of the tracer
and set it accordingly in driver level.
Currently, mlx5 is registering event handler for vport context change
event some time after arming the event. this can lead to missing an
event, which will result in wrong rules in the FDB.
Hence, register the event handler before arming the event.
This solution is valid since FW is sending vport context change event
only on vports which SW armed, and SW arming the vport when enabling
it, which is done after the FDB has been created.
There are race conditions that may lead to inet6_dev refcount underflow
in xfrm6_dst_destroy() and rt6_uncached_list_flush_dev().
One of the refcount underflow bugs is shown below:
(cpu 1) | (cpu 2)
xfrm6_dst_destroy() |
... |
in6_dev_put() |
| rt6_uncached_list_flush_dev()
... | ...
| in6_dev_put()
rt6_uncached_list_del() | ...
... |
xfrm6_dst_destroy() calls rt6_uncached_list_del() after in6_dev_put(),
so rt6_uncached_list_flush_dev() has a chance to call in6_dev_put()
again for the same inet6_dev.
Fix it by moving in6_dev_put() after rt6_uncached_list_del() in
xfrm6_dst_destroy().
cpu 0 cpu1
xfrm_dump_policy
xfrm_policy_walk
list_move_tail
xfrm_add_policy
... ...
xfrm_policy_inexact_list_reinsert
list_for_each_entry_reverse
if (!policy->bydst_reinsert)
//read non-existent policy
xfrm_dump_policy_done
xfrm_policy_walk_done
list_del(&walk->walk.all);
If dump_one_policy() returns err (triggered by netlink socket),
xfrm_policy_walk() will move walk initialized by socket to list
net->xfrm.policy_all. so this socket becomes visible in the global
policy list. The head *walk can be traversed when users add policies
with different prefixlen and trigger xfrm_policy node merge.
The issue can also be triggered by policy list traversal while rehashing
and flushing policies.
It can be fixed by skip such "policies" with walk.dead set to 1.
Fixes: 9cf545ebd591 ("xfrm: policy: store inexact policies in a tree ordered by destination address") Fixes: 12a169e7d8f4 ("ipsec: Put dumpers on the dump list") Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Add nr_maxactive to specify rethook_node pool size. This means
the maximum number of actively running target functions concurrently
for probing by exit_handler. Note that if the running function is
preempted or sleep, it is still counted as 'active'.
Link: https://lkml.kernel.org/r/167526697917.433354.17779774988245113106.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <revest@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Stable-dep-of: 700b2b439766 ("fprobe: Fix to ensure the number of active retprobes is not zero") Signed-off-by: Sasha Levin <sashal@kernel.org>
Pass the private entry_data to the entry and exit handlers so that
they can share the context data, something like saved function
arguments etc.
User must specify the private entry_data size by @entry_data_size
field before registering the fprobe.
Link: https://lkml.kernel.org/r/167526696173.433354.17408372048319432574.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <revest@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Stable-dep-of: 700b2b439766 ("fprobe: Fix to ensure the number of active retprobes is not zero") Signed-off-by: Sasha Levin <sashal@kernel.org>
When cpufreq's policy is 'single', there is a scenario that will
cause sg_policy's next_freq to be unable to update.
When the CPU's util is always max, the cpufreq will be max,
and then if we change the policy's scaling_max_freq to be a
lower freq, indeed, the sg_policy's next_freq need change to
be the lower freq, however, because the cpu_is_busy, the next_freq
would keep the max_freq.
When suspending the computer, a Switch Pro Controller connected via USB will
lose its internal status. However, because the USB connection was technically
never lost, when resuming the computer, the driver will attempt to communicate
with the controller as if nothing happened (and fail).
Because of this, the user was forced to manually disconnect the controller
(or to press the sync button on the controller to power it off), so that it
can be re-initialized.
With this patch, the controller will be automatically re-initialized after
resuming from suspend.
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=216233 Signed-off-by: Martino Fontana <tinozzo123@gmail.com> Reviewed-by: Daniel J. Ogorchock <djogorchock@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
At btrfs_realloc_node() we have these checks to verify we are not using a
stale transaction (a past transaction with an unblocked state or higher),
and the only thing we do is to trigger two WARN_ON(). This however is a
critical problem, highly unexpected and if it happens it's most likely due
to a bug, so we should error out and turn the fs into error state so that
such issue is much more easily noticed if it's triggered.
The problem is critical because in btrfs_realloc_node() we COW tree blocks,
and using such stale transaction will lead to not persisting the extent
buffers used for the COW operations, as allocating tree block adds the
range of the respective extent buffers to the ->dirty_pages iotree of the
transaction, and a stale transaction, in the unlocked state or higher,
will not flush dirty extent buffers anymore, therefore resulting in not
persisting the tree block and resource leaks (not cleaning the dirty_pages
iotree for example).
So do the following changes:
1) Return -EUCLEAN if we find a stale transaction;
2) Turn the fs into error state, with error -EUCLEAN, so that no
transaction can be committed, and generate a stack trace;
3) Combine both conditions into a single if statement, as both are related
and have the same error message;
4) Mark the check as unlikely, since this is not expected to ever happen.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
At btrfs_cow_block() we check if the block being COWed belongs to a root
that is being deleted and if so we log an error message. However this is
an unexpected case and it indicates a bug somewhere, so we should return
an error and abort the transaction. So change this in the following ways:
1) Abort the transaction with -EUCLEAN, so that if the issue ever happens
it can easily be noticed;
2) Change the logged message level from error to critical, and change the
message itself to print the block's logical address and the ID of the
root;
3) Return -EUCLEAN to the caller;
4) As this is an unexpected scenario, that should never happen, mark the
check as unlikely, allowing the compiler to potentially generate better
code.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
At btrfs_cow_block() we have these checks to verify we are not using a
stale transaction (a past transaction with an unblocked state or higher),
and the only thing we do is to trigger a WARN with a message and a stack
trace. This however is a critical problem, highly unexpected and if it
happens it's most likely due to a bug, so we should error out and turn the
fs into error state so that such issue is much more easily noticed if it's
triggered.
The problem is critical because using such stale transaction will lead to
not persisting the extent buffer used for the COW operation, as allocating
a tree block adds the range of the respective extent buffer to the
->dirty_pages iotree of the transaction, and a stale transaction, in the
unlocked state or higher, will not flush dirty extent buffers anymore,
therefore resulting in not persisting the tree block and resource leaks
(not cleaning the dirty_pages iotree for example).
So do the following changes:
1) Return -EUCLEAN if we find a stale transaction;
2) Turn the fs into error state, with error -EUCLEAN, so that no
transaction can be committed, and generate a stack trace;
3) Combine both conditions into a single if statement, as both are related
and have the same error message;
4) Mark the check as unlikely, since this is not expected to ever happen.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Jens reported the following warnings from -Wmaybe-uninitialized recent
Linus' branch.
In file included from ./include/asm-generic/rwonce.h:26,
from ./arch/arm64/include/asm/rwonce.h:71,
from ./include/linux/compiler.h:246,
from ./include/linux/export.h:5,
from ./include/linux/linkage.h:7,
from ./include/linux/kernel.h:17,
from fs/btrfs/ioctl.c:6:
In function ‘instrument_copy_from_user_before’,
inlined from ‘_copy_from_user’ at ./include/linux/uaccess.h:148:3,
inlined from ‘copy_from_user’ at ./include/linux/uaccess.h:183:7,
inlined from ‘btrfs_ioctl_space_info’ at fs/btrfs/ioctl.c:2999:6,
inlined from ‘btrfs_ioctl’ at fs/btrfs/ioctl.c:4616:10:
./include/linux/kasan-checks.h:38:27: warning: ‘space_args’ may be used
uninitialized [-Wmaybe-uninitialized]
38 | #define kasan_check_write __kasan_check_write
./include/linux/instrumented.h:129:9: note: in expansion of macro
‘kasan_check_write’
129 | kasan_check_write(to, n);
| ^~~~~~~~~~~~~~~~~
./include/linux/kasan-checks.h: In function ‘btrfs_ioctl’:
./include/linux/kasan-checks.h:20:6: note: by argument 1 of type ‘const
volatile void *’ to ‘__kasan_check_write’ declared here
20 | bool __kasan_check_write(const volatile void *p, unsigned int
size);
| ^~~~~~~~~~~~~~~~~~~
fs/btrfs/ioctl.c:2981:39: note: ‘space_args’ declared here
2981 | struct btrfs_ioctl_space_args space_args;
| ^~~~~~~~~~
In function ‘instrument_copy_from_user_before’,
inlined from ‘_copy_from_user’ at ./include/linux/uaccess.h:148:3,
inlined from ‘copy_from_user’ at ./include/linux/uaccess.h:183:7,
inlined from ‘_btrfs_ioctl_send’ at fs/btrfs/ioctl.c:4343:9,
inlined from ‘btrfs_ioctl’ at fs/btrfs/ioctl.c:4658:10:
./include/linux/kasan-checks.h:38:27: warning: ‘args32’ may be used
uninitialized [-Wmaybe-uninitialized]
38 | #define kasan_check_write __kasan_check_write
./include/linux/instrumented.h:129:9: note: in expansion of macro
‘kasan_check_write’
129 | kasan_check_write(to, n);
| ^~~~~~~~~~~~~~~~~
./include/linux/kasan-checks.h: In function ‘btrfs_ioctl’:
./include/linux/kasan-checks.h:20:6: note: by argument 1 of type ‘const
volatile void *’ to ‘__kasan_check_write’ declared here
20 | bool __kasan_check_write(const volatile void *p, unsigned int
size);
| ^~~~~~~~~~~~~~~~~~~
fs/btrfs/ioctl.c:4341:49: note: ‘args32’ declared here
4341 | struct btrfs_ioctl_send_args_32 args32;
| ^~~~~~
This was due to his config options and having KASAN turned on,
which adds some extra checks around copy_from_user(), which then
triggered the -Wmaybe-uninitialized checker for these cases.
Fix the warnings by initializing the different structs we're copying
into.
Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The One Mix 2S is a mini laptop with a 1200x1920 portrait screen
mounted in a landscape oriented clamshell case. Because of the too
generic DMI strings this entry is also doing bios-date matching.
After deleting an interface address in fib_del_ifaddr(), the function
scans the fib_info list for stray entries and calls fib_flush() and
fib_table_flush(). Then the stray entries will be deleted silently and no
RTM_DELROUTE notification will be sent.
This lack of notification can make routing daemons, or monitor like
`ip monitor route` miss the routing changes. e.g.
+ ip link add dummy1 type dummy
+ ip link add dummy2 type dummy
+ ip link set dummy1 up
+ ip link set dummy2 up
+ ip addr add 192.168.5.5/24 dev dummy1
+ ip route add 7.7.7.0/24 dev dummy2 src 192.168.5.5
+ ip -4 route
7.7.7.0/24 dev dummy2 scope link src 192.168.5.5
192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5
+ ip monitor route
+ ip addr del 192.168.5.5/24 dev dummy1
Deleted 192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5
Deleted broadcast 192.168.5.255 dev dummy1 table local proto kernel scope link src 192.168.5.5
Deleted local 192.168.5.5 dev dummy1 table local proto kernel scope host src 192.168.5.5
As Ido reminded, fib_table_flush() isn't only called when an address is
deleted, but also when an interface is deleted or put down. The lack of
notification in these cases is deliberate. And commit 7c6bb7d2faaf
("net/ipv6: Add knob to skip DELROUTE message on device down") introduced
a sysctl to make IPv6 behave like IPv4 in this regard. So we can't send
the route delete notify blindly in fib_table_flush().
To fix this issue, let's add a new flag in "struct fib_info" to track the
deleted prefer source address routes, and only send notify for them.
After update:
+ ip monitor route
+ ip addr del 192.168.5.5/24 dev dummy1
Deleted 192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5
Deleted broadcast 192.168.5.255 dev dummy1 table local proto kernel scope link src 192.168.5.5
Deleted local 192.168.5.5 dev dummy1 table local proto kernel scope host src 192.168.5.5
Deleted 7.7.7.0/24 dev dummy2 scope link src 192.168.5.5
Suggested-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230922075508.848925-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In the pathological case of building sky2 with 16k PAGE_SIZE, the
frag_addr[] array would never be used, so the original code was correct
that size should be 0. But the compiler now gets upset with 0 size arrays
in places where it hasn't eliminated the code that might access such an
array (it can't figure out that in this case an rx skb with fragments
would never be created). To keep the compiler happy, make sure there is
at least 1 frag_addr in struct rx_ring_info:
In file included from include/linux/skbuff.h:28,
from include/net/net_namespace.h:43,
from include/linux/netdevice.h:38,
from drivers/net/ethernet/marvell/sky2.c:18:
drivers/net/ethernet/marvell/sky2.c: In function 'sky2_rx_unmap_skb':
include/linux/dma-mapping.h:416:36: warning: array subscript i is outside array bounds of 'dma_addr_t[0]' {aka 'long long unsigned int[]'} [-Warray-bounds=]
416 | #define dma_unmap_page(d, a, s, r) dma_unmap_page_attrs(d, a, s, r, 0)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/marvell/sky2.c:1257:17: note: in expansion of macro 'dma_unmap_page'
1257 | dma_unmap_page(&pdev->dev, re->frag_addr[i],
| ^~~~~~~~~~~~~~
In file included from drivers/net/ethernet/marvell/sky2.c:41:
drivers/net/ethernet/marvell/sky2.h:2198:25: note: while referencing 'frag_addr'
2198 | dma_addr_t frag_addr[ETH_JUMBO_MTU >> PAGE_SHIFT];
| ^~~~~~~~~
With CONFIG_PAGE_SIZE_16KB=y, PAGE_SHIFT == 14, so:
#define ETH_JUMBO_MTU 9000
causes "ETH_JUMBO_MTU >> PAGE_SHIFT" to be 0. Use "?: 1" to solve this build warning.
Cc: Mirko Lindner <mlindner@marvell.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: netdev@vger.kernel.org Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202309191958.UBw1cjXk-lkp@intel.com/ Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
If the structure is not initialized then boolean types might be copied
into the tracing data without being initialised. This causes data from
the stack to leak into the trace and also triggers a UBSAN failure which
can easily be avoided here.
Lower layer device driver stop/wake TX by calling ieee80211_stop_queue()/
ieee80211_wake_queue() while hw scan. Sometimes hw scan and PTK rekey are
running in parallel, when M4 sent from wpa_supplicant arrive while the TX
queue is stopped, then the M4 will pending send, and then new key install
from wpa_supplicant. After TX queue wake up by lower layer device driver,
the M4 will be dropped by below call stack.
When key install started, the current key flag is set KEY_FLAG_TAINTED in
ieee80211_pairwise_rekey(), and then mac80211 wait key install complete by
lower layer device driver. Meanwhile ieee80211_tx_h_select_key() will return
TX_DROP for the M4 in step 12 below, and then ieee80211_free_txskb() called
by ieee80211_tx_dequeue(), so the M4 will not send and free, then the rekey
process failed becaue AP not receive M4. Please see details in steps below.
There are a interval between KEY_FLAG_TAINTED set for current key flag and
install key complete by lower layer device driver, the KEY_FLAG_TAINTED is
set in this interval, all packet including M4 will be dropped in this
interval, the interval is step 8~13 as below.
issue steps:
TX thread install key thread
1. stop_queue -idle-
2. sending M4 -idle-
3. M4 pending -idle-
4. -idle- starting install key from wpa_supplicant
5. -idle- =>ieee80211_key_replace()
6. -idle- =>ieee80211_pairwise_rekey() and set
currently key->flags |= KEY_FLAG_TAINTED
7. -idle- =>ieee80211_key_enable_hw_accel()
8. -idle- =>drv_set_key() and waiting key install
complete from lower layer device driver
9. wake_queue -waiting state-
10. re-sending M4 -waiting state-
11. =>ieee80211_tx_h_select_key() -waiting state-
12. drop M4 by KEY_FLAG_TAINTED -waiting state-
13. -idle- install key complete with success/fail
success: clear flag KEY_FLAG_TAINTED
fail: start disconnect
Hence add check in step 11 above to allow the EAPOL send out in the
interval. If lower layer device driver use the old key/cipher to encrypt
the M4, then AP received/decrypt M4 correctly, after M4 send out, lower
layer device driver install the new key/cipher to hardware and return
success.
If lower layer device driver use new key/cipher to send the M4, then AP
will/should drop the M4, then it is same result with this issue, AP will/
should kick out station as well as this issue.
Cisco AP module 9115 with FW 17.3 has a bug and sends a too
large maximum MPDU length in the association response
(indicating 12k) that it cannot actually process.
Work around that by taking the minimum between what's in the
association response and the BSS elements (from beacon or
probe response).
When the scan request includes a non broadcast BSSID, when adding the
scan parameters for 6GHz collocated scanning, do not include entries
that do not match the given BSSID.
net/bluetooth/hci_core.c: In function ‘hci_register_dev’:
net/bluetooth/hci_core.c:2620:54: warning: ‘%d’ directive output may
be truncated writing between 1 and 10 bytes into a region of size 5
[-Wformat-truncation=]
2620 | snprintf(hdev->name, sizeof(hdev->name), "hci%d", id);
| ^~
net/bluetooth/hci_core.c:2620:50: note: directive argument in the range
[0, 2147483647]
2620 | snprintf(hdev->name, sizeof(hdev->name), "hci%d", id);
| ^~~~~~~
net/bluetooth/hci_core.c:2620:9: note: ‘snprintf’ output between 5 and
14 bytes into a destination of size 8
2620 | snprintf(hdev->name, sizeof(hdev->name), "hci%d", id);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
While executing the Android 13 CTS Verifier Secure Server test on a
ChromeOS device, it was observed that the Bluetooth host initiates
authentication for an RFCOMM connection after SSP completes.
When this happens, some Intel Bluetooth controllers, like AC9560, would
disconnect with "Connection Rejected due to Security Reasons (0x0e)".
Historically, BlueZ did not mandate this authentication while an
authenticated combination key was already in use for the connection.
This behavior was changed since commit 7b5a9241b780
("Bluetooth: Introduce requirements for security level 4").
So, this patch addresses the aforementioned disconnection issue by
restoring the previous behavior.
Signed-off-by: Ying Hsu <yinghsu@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
There is a slab-out-of-bounds Write bug in hid-holtek-kbd driver.
The problem is the driver assumes the device must have an input
but some malicious devices violate this assumption.
Fix this by checking hid_device's input is non-empty before its usage.
Signed-off-by: Ma Ke <make_ruc2021@163.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
Debugging indicates that nothing else is clearing the info->flags,
so some frames were flagged as ACKed when they should not be.
Explicitly clear the ack flag to ensure this does not happen.
When kernel is compiled without preemption, the eval_map_work_func()
(which calls trace_event_eval_update()) will not be preempted up to its
complete execution. This can actually cause a problem since if another
CPU call stop_machine(), the call will have to wait for the
eval_map_work_func() function to finish executing in the workqueue
before being able to be scheduled. This problem was observe on a SMP
system at boot time, when the CPU calling the initcalls executed
clocksource_done_booting() which in the end calls stop_machine(). We
observed a 1 second delay because one CPU was executing
eval_map_work_func() and was not preempted by the stop_machine() task.
Adding a call to cond_resched() in trace_event_eval_update() allows
other tasks to be executed and thus continue working asynchronously
like before without blocking any pending task at boot time.
The 6 bytes length of the tries_buf string in ata_eh_link_report() is
too short and results in a gcc compilation warning with W-!:
drivers/ata/libata-eh.c: In function ‘ata_eh_link_report’:
drivers/ata/libata-eh.c:2371:59: warning: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size 4 [-Wformat-truncation=]
2371 | snprintf(tries_buf, sizeof(tries_buf), " t%d",
| ^~
drivers/ata/libata-eh.c:2371:56: note: directive argument in the range [-2147483648, 4]
2371 | snprintf(tries_buf, sizeof(tries_buf), " t%d",
| ^~~~~~
drivers/ata/libata-eh.c:2371:17: note: ‘snprintf’ output between 4 and 14 bytes into a destination of size 6
2371 | snprintf(tries_buf, sizeof(tries_buf), " t%d",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2372 | ap->eh_tries);
| ~~~~~~~~~~~~~
Avoid this warning by increasing the string size to 16B.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The 24 bytes length allocated to the ncq_desc string in
ata_dev_config_lba() for ata_dev_config_ncq() to use is too short,
causing the following gcc compilation warnings when compiling with W=1:
drivers/ata/libata-core.c: In function ‘ata_dev_configure’:
drivers/ata/libata-core.c:2378:56: warning: ‘%d’ directive output may be truncated writing between 1 and 2 bytes into a region of size between 1 and 11 [-Wformat-truncation=]
2378 | snprintf(desc, desc_sz, "NCQ (depth %d/%d)%s", hdepth,
| ^~
In function ‘ata_dev_config_ncq’,
inlined from ‘ata_dev_config_lba’ at drivers/ata/libata-core.c:2649:8,
inlined from ‘ata_dev_configure’ at drivers/ata/libata-core.c:2952:9:
drivers/ata/libata-core.c:2378:41: note: directive argument in the range [1, 32]
2378 | snprintf(desc, desc_sz, "NCQ (depth %d/%d)%s", hdepth,
| ^~~~~~~~~~~~~~~~~~~~~
drivers/ata/libata-core.c:2378:17: note: ‘snprintf’ output between 16 and 31 bytes into a destination of size 24
2378 | snprintf(desc, desc_sz, "NCQ (depth %d/%d)%s", hdepth,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2379 | ddepth, aa_desc);
| ~~~~~~~~~~~~~~~~
Avoid these warnings and the potential truncation by changing the size
of the ncq_desc string to 32 characters.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
As timbgpio_irq_enable()/timbgpio_irq_disable() callback could be
executed under irq context, it could introduce double locks on
&tgpio->lock if it preempts other execution units requiring
the same locks.
This flaw was found by an experimental static analysis tool I am
developing for irq-related deadlock.
To prevent the potential deadlock, the patch uses spin_lock_irqsave()
on &tgpio->lock inside timbgpio_gpio_set() to prevent the possible
deadlock scenario.
Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Reviewed-by: Andy Shevchenko <andy@kernel.org> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Nathan reported that he was seeing the new warning in
setattr_copy_mgtime pop when starting podman containers. Overlayfs is
trying to set the atime and mtime via notify_change without also
setting the ctime.
POSIX states that when the atime and mtime are updated via utimes() that
we must also update the ctime to the current time. The situation with
overlayfs copy-up is analogies, so add ATTR_CTIME to the bitmask.
notify_change will fill in the value.
Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Amir Goldstein <amir73il@gmail.com>
Message-Id: <20230913-ctime-v1-1-c6bc509cbc27@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
I2C_CLASS_DEPRECATED is a flag and not an actual class.
There's nothing speaking against both, parent and child, having
I2C_CLASS_DEPRECATED set. Therefore exclude it from the check.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Peter Rosin <peda@axentia.se> Signed-off-by: Wolfram Sang <wsa@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>