When setting the fan speed, i8k_set_fan() calls i8k_get_fan_status(),
causing an unnecessary SMM call since from the two users of this
function, only i8k_ioctl_unlocked() needs to know the new fan status
while dell_smm_write() ignores the new fan status.
Since SMM calls can be very slow while also making error reporting
difficult for dell_smm_write(), remove the function call from
i8k_set_fan() and call it separately in i8k_ioctl_unlocked().
Programmable lab power supplies made by GW Instek, such as the
GPP-2323, have a USB port exposing a serial port to control the device.
Stringing the supplied Windows driver, references to the ch341 chip are
found. Binding the existing ch341 driver to the VID/PID of the GPP-2323
("GW Instek USB2.0-Serial" as per the USB product name) works out of the
box, communication and control is now possible.
This patch should work with any GPP series power supply due to
similarities in the product line.
With CPU re-ordering on write instructions, there might
be a chance that the HWO is set before the TRB is updated
with the new mapped buffer address.
And in the case where core is processing a list of TRBs
it is possible that it fetched the TRBs when the HWO is set
but before the buffer address is updated.
Prevent this by adding a memory barrier before the HWO
is updated to ensure that the core always process the
updated TRBs.
The poll man page says POLLRDNORM is equivalent to POLLIN when used as
an event.
$ man poll
<snip>
POLLRDNORM
Equivalent to POLLIN.
However, in n_tty driver, POLLRDNORM does not return until timeout even
if there is terminal input, whereas POLLIN returns.
The following test program works until kernel-3.17, but the test stops
in poll() after commit 57087d515441 ("tty: Fix spurious poll() wakeups").
[Steps to run test program]
$ cc -o test-pollrdnorm test-pollrdnorm.c
$ ./test-pollrdnorm
foo <-- Type in something from the terminal followed by [RET].
The string should be echoed back.
Add a kconfig knob which allows for unprivileged bpf to be disabled by default.
If set, the knob sets /proc/sys/kernel/unprivileged_bpf_disabled to value of 2.
This still allows a transition of 2 -> {0,1} through an admin. Similarly,
this also still keeps 1 -> {1} behavior intact, so that once set to permanently
disabled, it cannot be undone aside from a reboot.
We've also added extra2 with max of 2 for the procfs handler, so that an admin
still has a chance to toggle between 0 <-> 2.
Either way, as an additional alternative, applications can make use of CAP_BPF
that we added a while ago.
in vt_setactivate an almost identical code path has been patched
with array_index_nospec. In the VT_ACTIVATE path the user input
is from a system call argument instead of a usercopy.
For consistency both code paths should have the same mitigations
applied.
Kasper Acknowledgements: Jakob Koschel, Brian Johannesmeyer, Kaveh
Razavi, Herbert Bos, Cristiano Giuffrida from the VUSec group at VU
Amsterdam.
Co-developed-by: Brian Johannesmeyer <bjohannesmeyer@gmail.com> Signed-off-by: Brian Johannesmeyer <bjohannesmeyer@gmail.com> Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com> Link: https://lore.kernel.org/r/20220127144406.3589293-2-jakobkoschel@gmail.com Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
array_index_nospec ensures that an out-of-bounds value is set to zero
on the transient path. Decreasing the value by one afterwards causes
a transient integer underflow. vsa.console should be decreased first
and then sanitized with array_index_nospec.
Kasper Acknowledgements: Jakob Koschel, Brian Johannesmeyer, Kaveh
Razavi, Herbert Bos, Cristiano Giuffrida from the VUSec group at VU
Amsterdam.
Co-developed-by: Brian Johannesmeyer <bjohannesmeyer@gmail.com> Signed-off-by: Brian Johannesmeyer <bjohannesmeyer@gmail.com> Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com> Link: https://lore.kernel.org/r/20220127144406.3589293-1-jakobkoschel@gmail.com Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It would be easy to craft a message containing an illegal binding table
update operation. This is handled correctly by the code, but the
corresponding warning printout is not rate limited as is should be.
We fix this now.
Fixes: b97bf3fd8f6a ("[TIPC] Initial merge") Signed-off-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
When uncloning an skb dst and its associated metadata, a new
dst+metadata is allocated and later replaces the old one in the skb.
This is helpful to have a non-shared dst+metadata attached to a specific
skb.
The issue is the uncloned dst+metadata is initialized with a refcount of
1, which is increased to 2 before attaching it to the skb. When
tun_dst_unclone returns, the dst+metadata is only referenced from a
single place (the skb) while its refcount is 2. Its refcount will never
drop to 0 (when the skb is consumed), leading to a memory leak.
Fix this by removing the call to dst_hold in tun_dst_unclone, as the
dst+metadata refcount is already 1.
Fixes: fc4099f17240 ("openvswitch: Fix egress tunnel info.") Cc: Pravin B Shelar <pshelar@ovn.org> Reported-by: Vlad Buslov <vladbu@nvidia.com> Tested-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
When uncloning an skb dst and its associated metadata a new dst+metadata
is allocated and the tunnel information from the old metadata is copied
over there.
The issue is the tunnel metadata has references to cached dst, which are
copied along the way. When a dst+metadata refcount drops to 0 the
metadata is freed including the cached dst entries. As they are also
referenced in the initial dst+metadata, this ends up in UaFs.
In practice the above did not happen because of another issue, the
dst+metadata was never freed because its refcount never dropped to 0
(this will be fixed in a subsequent patch).
Fix this by initializing the dst cache after copying the tunnel
information from the old metadata to also unshare the dst cache.
Fixes: d71785ffc7e7 ("net: add dst_cache to ovs vxlan lwtunnel") Cc: Paolo Abeni <pabeni@redhat.com> Reported-by: Vlad Buslov <vladbu@nvidia.com> Tested-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
When 803.2ad mode enables a participating port, it should update
the slave-array. I have observed that the member links are participating
and are part of the active aggregator while the traffic is egressing via
only one member link (in a case where two links are participating). Via
kprobes I discovered that slave-arr has only one link added while
the other participating link wasn't part of the slave-arr.
I couldn't see what caused that situation but the simple code-walk
through provided me hints that the enable_port wasn't always associated
with the slave-array update.
Fixes: ee6377147409 ("bonding: Simplify the xmit function for modes that use xmit_hash") Signed-off-by: Mahesh Bandewar <maheshb@google.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20220207222901.1795287-1-maheshb@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently, SD card fails to mount due to the following pinctrl error:
[ 11.170000] imx23-pinctrl 80018000.pinctrl: pin SSP1_DETECT already requested by 80018000.pinctrl; cannot claim for 80010000.spi
[ 11.180000] imx23-pinctrl 80018000.pinctrl: pin-65 (80010000.spi) status -22
[ 11.190000] imx23-pinctrl 80018000.pinctrl: could not request pin 65 (SSP1_DETECT) from group mmc0-pins-fixup.0 on device 80018000.pinctrl
[ 11.200000] mxs-mmc 80010000.spi: Error applying setting, reverse things back
Fix it by removing the MX23_PAD_SSP1_DETECT pin from the hog group as it
is already been used by the mmc0-pins-fixup pinctrl group.
With this change the rootfs can be mounted and the imx23-evk board can
boot successfully.
Calling dwc2_hsotg_ep_disable on ep0 (in/out) will lead to the following
logs before returning -EINVAL:
dwc2 49000000.usb-otg: dwc2_hsotg_ep_disable: called for ep0
dwc2 49000000.usb-otg: dwc2_hsotg_ep_disable: called for ep0
To avoid these two logs while suspending, start disabling the endpoint
from the index 1, as done in dwc2_hsotg_udc_stop:
/* all endpoints should be shutdown */
for (ep = 1; ep < hsotg->num_of_eps; ep++) {
if (hsotg->eps_in[ep])
dwc2_hsotg_ep_disable_lock(&hsotg->eps_in[ep]->ep);
if (hsotg->eps_out[ep])
dwc2_hsotg_ep_disable_lock(&hsotg->eps_out[ep]->ep);
}
iscsit_tpg_check_network_portal() has nested for_each loops and is supposed
to return true when a match is found. However, the tpg loop will still
continue after existing the tpg_np loop. If this tpg_np is not the last the
match value will be changed.
Break the outer loop after finding a match and make sure the np under each
tpg is unique.
Link: https://lore.kernel.org/r/20220111054742.19582-1-mingzhe.zou@easystack.cn Signed-off-by: ZouMingzhe <mingzhe.zou@easystack.cn> Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When the bitmask of the attributes doesn't include the security label,
don't bother printing it. Since the label might not be null terminated,
adjust the printing format accordingly.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The upstream commit b400c2f4f4c5 ("net: axienet: Wait for PhyRstCmplt
after core reset") add new instructions in the `__axienet_device_reset`
function to wait until PhyRstCmplt bit is set, and return a non zero-exit
value if this exceeds a timeout.
However, its backported version in 4.9 (commit 9f1a3c13342b ("net:
axienet: Wait for PhyRstCmplt after core reset")) insert these
instructions in `axienet_dma_bd_init` instead.
Unlike upstream, the version of `__axienet_device_reset` currently present
in branch stable/linux-4.9.y does not return an integer for error codes.
Therefore fixing the backport cannot be as simple as moving the inserted
instructions in the right place, and we simply revert it instead.
Fixes: 9f1a3c13342b ("net: axienet: Wait for PhyRstCmplt after core reset") Signed-off-by: Guillaume Bertholon <guillaume.bertholon@ens.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ALSA: line6: Fix misplaced backport of "Fix wrong altsetting for LINE6_PODHD500_1"
The upstream commit 70256b42caaf ("ALSA: line6: Fix wrong altsetting for
LINE6_PODHD500_1") changed the .altsetting field of the LINE6_PODHD500_1
entry in podhd_properties_table from 1 to 0.
However, its backported version in stable (commit ec565611f930 ("ALSA:
line6: Fix wrong altsetting for LINE6_PODHD500_1")) change the
.altsetting field of the LINE6_PODHD500_0 entry instead.
This patch resets the altsetting of LINE6_PODHD500_0 to 1, and sets the
altsetting of LINE6_PODHD500_1 to 0, as wanted by the original fix.
serial: sh-sci: Fix misplaced backport of "Fix late enablement of AUTORTS"
The upstream commit 5f76895e4c71 ("serial: sh-sci: Fix late enablement of
AUTORTS") inserted a new call to .set_mctrl().
However the backported version in stable (commit ad3faea03fdf ("serial:
sh-sci: Fix late enablement of AUTORTS")) does not insert it at the same
position.
This patch moves the added instructions back to where they should be
according to the upsteam patch.
Fixes: ad3faea03fdf ("serial: sh-sci: Fix late enablement of AUTORTS") Signed-off-by: Guillaume Bertholon <guillaume.bertholon@ens.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Input: i8042 - Fix misplaced backport of "add ASUS Zenbook Flip to noselftest list"
The upstream commit b5d6e7ab7fe7 ("Input: i8042 - add ASUS Zenbook Flip to
noselftest list") inserted a new entry in the `i8042_dmi_noselftest_table`
table, further patched by commit daa58c8eec0a ("Input: i8042 - fix Pegatron
C15B ID entry") to insert a missing separator.
However, their backported version in stable (commit e9e8b3769099
("Input: i8042 - add ASUS Zenbook Flip to noselftest list") and
commit c551d20d487a ("Input: i8042 - fix Pegatron C15B ID entry"))
inserted this entry in `i8042_dmi_forcemux_table` instead.
This patch moves the entry back into `i8042_dmi_noselftest_table`.
Fixes: e9e8b3769099 ("Input: i8042 - add ASUS Zenbook Flip to noselftest list") Signed-off-by: Guillaume Bertholon <guillaume.bertholon@ens.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ensure that a client cannot specify a WRITE range that falls in a
byte range outside what the kernel's internal types (such as loff_t,
which is signed) can represent. The kiocb iterators, invoked in
nfsd_vfs_write(), should properly limit write operations to within
the underlying file system's s_maxbytes.
Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For some long forgotten reason, the nfs_client cl_flags field is
initialised in nfs_get_client() instead of being initialised at
allocation time. This quirk was harmless until we moved the call to
nfs_create_rpc_client().
Fixes: dd99e9f98fbf ("NFSv4: Initialise connection to the server in nfs4_alloc_client()") Cc: stable@vger.kernel.org # 4.8.x Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
audit_log_start() returns audit_buffer pointer on success or NULL on
error, so it is better to check the return value of it.
Fixes: 3323eec921ef ("integrity: IMA as an integrity service provider") Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com> Cc: <stable@vger.kernel.org> Reviewed-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The function tipc_mon_rcv() allows a node to receive and process
domain_record structs from peer nodes to track their views of the
network topology.
This patch verifies that the number of members in a received domain
record does not exceed the limit defined by MAX_MON_DOMAIN, something
that may otherwise lead to a stack overflow.
tipc_mon_rcv() is called from the function tipc_link_proto_rcv(), where
we are reading a 32 bit message data length field into a uint16. To
avert any risk of bit overflow, we add an extra sanity check for this in
that function. We cannot see that happen with the current code, but
future designers being unaware of this risk, may introduce it by
allowing delivery of very large (> 64k) sk buffers from the bearer
layer. This potential problem was identified by Eric Dumazet.
This fixes CVE-2022-0435
Reported-by: Samuel Page <samuel.page@appgate.com> Reported-by: Eric Dumazet <edumazet@google.com> Fixes: 35c55c9877f8 ("tipc: add neighbor monitoring framework") Signed-off-by: Jon Maloy <jmaloy@redhat.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Samuel Page <samuel.page@appgate.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It was reported that the mmc host structure could be accessed after it
was freed in moxart_remove(), so fix this by saving the base register of
the device and using it instead of the pointer dereference.
Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: Xiyu Yang <xiyuyang19@fudan.edu.cn> Cc: Xin Xiong <xiongx18@fudan.edu.cn> Cc: Xin Tan <tanxin.ctf@gmail.com> Cc: Tony Lindgren <tony@atomide.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: linux-mmc@vger.kernel.org Cc: stable <stable@vger.kernel.org> Reported-by: whitehat002 <hackyzh002@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20220127071638.4057899-1-gregkh@linuxfoundation.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The cgroup release_agent is called with call_usermodehelper. The function
call_usermodehelper starts the release_agent with a full set fo capabilities.
Therefore require capabilities when setting the release_agaent.
Reported-by: Tabitha Sable <tabitha.c.sable@gmail.com> Tested-by: Tabitha Sable <tabitha.c.sable@gmail.com> Fixes: 81a6a5cdd2c5 ("Task Control Groups: automatic userspace notification of idle cgroups") Cc: stable@vger.kernel.org # v2.6.24+ Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Tejun Heo <tj@kernel.org>
[mkoutny: Adjust for pre-fs_context, duplicate mount/remount check, drop log messages.] Acked-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While running "./check -I 200 generic/475" it sometimes gives below
kernel BUG(). Ideally we should not call ext4_write_inline_data() if
ext4_create_inline_data() has failed.
<log snip>
[73131.453234] kernel BUG at fs/ext4/inline.c:223!
This patch handles the error and prints out a emergency msg saying potential
data loss for the given inode (since we couldn't restore the original
inline_data due to some previous error).
[ 9571.070313] EXT4-fs (dm-0): error restoring inline_data for inode -- potential data loss! (inode 1703982, error -30)
The driver overrides error codes returned by platform_get_irq_optional()
to -EINVAL for some strange reason, so if it returns -EPROBE_DEFER, the
driver will fail the probe permanently instead of the deferred probing.
Switch to propagating the proper error codes to platform driver code
upwards.
The driver overrides the error codes returned by platform_get_irq() to
-ENODEV for some strange reason, so if it returns -EPROBE_DEFER, the
driver will fail the probe permanently instead of the deferred probing.
Switch to propagating the proper error codes to platform driver code
upwards.
There's limiting the year to 2069. When setting the rtc year to 2070,
reading it returns 1970. Evaluate century starting from 19 to count the
correct year.
$ sudo date -s 20700106
Mon 06 Jan 2070 12:00:00 AM CST
$ sudo hwclock -w
$ sudo hwclock -r
1970-01-06 12:00:49.604968+08:00
Fixes: 2a4daadd4d3e5071 ("rtc: cmos: ignore bogus century byte") Signed-off-by: Riwen Lu <luriwen@kylinos.cn> Acked-by: Eric Wong <e@80x24.org> Reviewed-by: Mateusz Jończyk <mat.jonczyk@o2.pl> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20220106084609.1223688-1-luriwen@kylinos.cn Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl> # preparation for stable Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
o The server has not recorded an unconfirmed { v, x, c, *, * } and
has recorded a confirmed { v, x, c, *, s }. If the principals of
the record and of SETCLIENTID_CONFIRM do not match, the server
returns NFS4ERR_CLID_INUSE without removing any relevant leased
client state, and without changing recorded callback and
callback_ident values for client { x }.
The current code intends to do what the spec describes above but
it forgot to set 'old' to NULL resulting to the confirmed client
to be expired.
Fixes: 2b63482185e6 ("nfsd: fix clid_inuse on mount with security change") Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Running tests with a debug kernel shows that bnx2fc_recv_frame() is
modifying the per_cpu lport stats counters in a non-mpsafe way. Just boot
a debug kernel and run the bnx2fc driver with the hardware enabled.
When setting Tx sci explicit, the Rx side is expected to use this
sci and not recalculate it from the packet.However, in case of Tx sci
is explicit and send_sci is off, the receiver is wrongly recalculate
the sci from the source MAC address which most likely be different
than the explicit sci.
Fix by preventing such configuration when macsec newlink is established
and return EINVAL error code on such cases.
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Raed Salem <raeds@nvidia.com> Link: https://lore.kernel.org/r/1643542672-29403-1-git-send-email-raeds@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In some case, like after a transfer timeout, master->cur_msg pointer
is NULL which led to a kernel crash when trying to use master->cur_msg->spi.
mtk_spi_can_dma(), pointed by master->can_dma, doesn't use this parameter
avoid the problem by setting NULL as second parameter.
Fixes: a568231f46322 ("spi: mediatek: Add spi bus for Mediatek MT8173") Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> Link: https://lore.kernel.org/r/20220131141708.888710-1-benjamin.gaignard@collabora.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Apply only valid chip select value. This change fixes case where chip
select is set to initial value of '-1' during probe and PM supend and
subsequent resume can try to use the value with undefined behaviour.
Also in case where gpio based chip select, the check in
bcm_qspi_chip_select() shall prevent undefined behaviour on resume.
Fixes: fa236a7ef240 ("spi: bcm-qspi: Add Broadcom MSPI driver") Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220127185359.27322-1-kdasu.kdev@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The polling loop for the register change in iommu_ga_log_enable() needs
to have a udelay() in it. Otherwise the CPU might be faster than the
IOMMU hardware and wrongly trigger the WARN_ON() further down the code
stream. Use a 10us for udelay(), has there is some hardware where
activation of the GA log can take more than a 100ms.
A future optimization should move the activation check of the GA log
to the point where it gets used for the first time. But that is a
bigger change and not suitable for a fix.
Bounds checking when parsing init scripts embedded in the BIOS reject
access to the last byte. This causes driver initialization to fail on
Apple eMac's with GeForce 2 MX GPUs, leaving the system with no working
console.
This is probably only seen on OpenFirmware machines like PowerPC Macs
because the BIOS image provided by OF is only the used parts of the ROM,
not a power-of-two blocks read from PCI directly so PCs always have
empty bytes at the end that are never accessed.
Signed-off-by: Nick Lopez <github@glowingmonkey.org> Fixes: 4d4e9907ff572 ("drm/nouveau/bios: guard against out-of-bounds accesses to image") Cc: <stable@vger.kernel.org> # v4.10+ Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Karol Herbst <kherbst@redhat.com> Signed-off-by: Karol Herbst <kherbst@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220122081906.2633061-1-github@glowingmonkey.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We don't currently validate that the values being set are within the range
we advertised to userspace as being valid, do so and reject any values
that are out of range.
We don't currently validate that the values being set are within the range
we advertised to userspace as being valid, do so and reject any values
that are out of range.
We don't currently validate that the values being set are within the range
we advertised to userspace as being valid, do so and reject any values
that are out of range.
When packet_setsockopt( PACKET_FANOUT_DATA ) reads po->fanout,
no lock is held, meaning that another thread can change po->fanout.
Given that po->fanout can only be set once during the socket lifetime
(it is only cleared from fanout_release()), we can use
READ_ONCE()/WRITE_ONCE() to document the race.
BUG: KCSAN: data-race in packet_setsockopt / packet_setsockopt
write to 0xffff88813ae8e300 of 8 bytes by task 14653 on cpu 0:
fanout_add net/packet/af_packet.c:1791 [inline]
packet_setsockopt+0x22fe/0x24a0 net/packet/af_packet.c:3931
__sys_setsockopt+0x209/0x2a0 net/socket.c:2180
__do_sys_setsockopt net/socket.c:2191 [inline]
__se_sys_setsockopt net/socket.c:2188 [inline]
__x64_sys_setsockopt+0x62/0x70 net/socket.c:2188
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff88813ae8e300 of 8 bytes by task 14654 on cpu 1:
packet_setsockopt+0x691/0x24a0 net/packet/af_packet.c:3935
__sys_setsockopt+0x209/0x2a0 net/socket.c:2180
__do_sys_setsockopt net/socket.c:2191 [inline]
__se_sys_setsockopt net/socket.c:2188 [inline]
__x64_sys_setsockopt+0x62/0x70 net/socket.c:2188
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x0000000000000000 -> 0xffff888106f8c000
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 14654 Comm: syz-executor.3 Not tainted 5.16.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 47dceb8ecdc1 ("packet: add classic BPF fanout mode") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20220201022358.330621-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While looking at one unrelated syzbot bug, I found the replay logic
in __rtnl_newlink() to potentially trigger use-after-free.
It is better to clear master_dev and m_ops inside the loop,
in case we have to replay it.
Fixes: ba7d49b1f0f8 ("rtnetlink: provide api for getting and setting slave info") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20220201012106.216495-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There will be BUG_ON() triggered in include/linux/skbuff.h leading to
intermittent kernel panic, when the skb length underflow is detected.
Fix this by dropping the packet if such length underflows are seen
because of inconsistencies in the hardware descriptors.
Fixes: 622c36f143fc ("amd-xgbe: Fix jumbo MTU processing on newer hardware") Suggested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20220127092003.2812745-1-Shyam-sundar.S-k@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When rx_buf is allocated we need to account for IPHETH_IP_ALIGN,
which reduces the usable size by 2 bytes. Otherwise we have 1512
bytes usable instead of 1514, and if we receive more than 1512
bytes, ipheth_rcvbulk_callback is called with status -EOVERFLOW,
after which the driver malfunctiones and all communication stops.
In case almost or all available ports are taken, clash resolution can
take a very long time, resulting in soft lockup.
This can happen when many to-be-natted hosts connect to same
destination:port (e.g. a proxy) and all connections pass the same SNAT.
Pick a random offset in the acceptable range, then try ever smaller
number of adjacent port numbers, until either the limit is reached or a
useable port was found. This results in at most 248 attempts
(128 + 64 + 32 + 16 + 8, i.e. 4 restarts with new search offset)
instead of 64000+,
This is a leftover from days where single-cpu systems were common:
Store last port used to resolve a clash to use it as a starting point when
the next conflict needs to be resolved.
When we have parallel attempt to connect to same address:port pair,
its likely that both cores end up computing the same "available" port,
as both use same starting port, and newly used ports won't become
visible to other cores until the conntrack gets confirmed later.
One of the cores then has to drop the packet at insertion time because
the chosen new tuple turns out to be in use after all.
Lets simplify this: remove port rover and use a pseudo-random starting
point.
Note that this doesn't make netfilter default to 'fully random' mode;
the 'rover' was only used if NAT could not reuse source port as-is.
In commit 431280eebed9 ("ipv4: tcp: send zero IPID for RST and
ACK sent in SYN-RECV and TIME-WAIT state") we took care of some
ctl packets sent by TCP.
It turns out we need to use a similar strategy for SYNACK packets.
By default, they carry IP_DF and IPID==0, but there are ways
to ask them to use the hashed IP ident generator and thus
be used to build off-path attacks.
(Ref: Off-Path TCP Exploits of the Mixed IPID Assignment)
One of this way is to force (before listener is started)
echo 1 >/proc/sys/net/ipv4/ip_no_pmtu_disc
Another way is using forged ICMP ICMP_FRAG_NEEDED
with a very small MTU (like 68) to force a false return from
ip_dont_fragment()
In this patch, ip_build_and_send_pkt() uses the following
heuristics.
1) Most SYNACK packets are smaller than IPV4_MIN_MTU and therefore
can use IP_DF regardless of the listener or route pmtu setting.
2) In case the SYNACK packet is bigger than IPV4_MIN_MTU,
we use prandom_u32() generator instead of the IPv4 hashed ident one.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Ray Che <xijiache@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Cc: Geoff Alexander <alexandg@cs.unm.edu> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
For some reason, raw_bind() forgot to lock the socket.
BUG: KCSAN: data-race in __ip4_datagram_connect / raw_bind
write to 0xffff8881170d4308 of 4 bytes by task 5466 on cpu 0:
raw_bind+0x1b0/0x250 net/ipv4/raw.c:739
inet_bind+0x56/0xa0 net/ipv4/af_inet.c:443
__sys_bind+0x14b/0x1b0 net/socket.c:1697
__do_sys_bind net/socket.c:1708 [inline]
__se_sys_bind net/socket.c:1706 [inline]
__x64_sys_bind+0x3d/0x50 net/socket.c:1706
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff8881170d4308 of 4 bytes by task 5468 on cpu 1:
__ip4_datagram_connect+0xb7/0x7b0 net/ipv4/datagram.c:39
ip4_datagram_connect+0x2a/0x40 net/ipv4/datagram.c:89
inet_dgram_connect+0x107/0x190 net/ipv4/af_inet.c:576
__sys_connect_file net/socket.c:1900 [inline]
__sys_connect+0x197/0x1b0 net/socket.c:1917
__do_sys_connect net/socket.c:1927 [inline]
__se_sys_connect net/socket.c:1924 [inline]
__x64_sys_connect+0x3d/0x50 net/socket.c:1924
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x00000000 -> 0x0003007f
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 5468 Comm: syz-executor.5 Not tainted 5.17.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
According to its datasheet, G781 supports a maximum conversion rate value
of 8 (62.5 ms). However, chips labeled G781 and G780 were found to only
support a maximum conversion rate value of 7 (125 ms). On the other side,
chips labeled G781-1 and G784 were found to support a conversion rate value
of 8. There is no known means to distinguish G780 from G781 or G784; all
chips report the same manufacturer ID and chip revision.
Setting the conversion rate register value to 8 on chips not supporting
it causes unexpected behavior since the real conversion rate is set to 0
(16 seconds) if a value of 8 is written into the conversion rate register.
Limit the conversion rate register value to 7 for all G78x chips to avoid
the problem.
Fixes: ae544f64cc7b ("hwmon: (lm90) Add support for GMT G781") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
After commit:7866a621043f ("dev: add per net_device packet type chains"),
we can not get packet types that are bound to a specified net device by
/proc/net/ptype, this patch fix the regression.
Run "tcpdump -i ens192 udp -nns0" Before and after apply this patch:
Before:
[root@localhost ~]# cat /proc/net/ptype
Type Device Function
0800 ip_rcv
0806 arp_rcv
86dd ipv6_rcv
After:
[root@localhost ~]# cat /proc/net/ptype
Type Device Function
ALL ens192 tpacket_rcv
0800 ip_rcv
0806 arp_rcv
86dd ipv6_rcv
v1 -> v2:
- fix the regression rather than adding new /proc API as
suggested by Stephen Hemminger.
Fixes: 7866a621043f ("dev: add per net_device packet type chains") Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the application sets the O_DIRECTORY flag, and tries to open a
regular file, nfs_atomic_open() will punt to doing a regular lookup.
If the server then returns a regular file, we will happily return a
file descriptor with uninitialised open state.
The fix is to return the expected ENOTDIR error in these cases.
ip_select_ident_segs() has been very conservative about using
the connected socket private generator only for packets with IP_DF
set, claiming it was needed for some VJ compression implementations.
As mentioned in this referenced document, this can be abused.
(Ref: Off-Path TCP Exploits of the Mixed IPID Assignment)
Before switching to pure random IPID generation and possibly hurt
some workloads, lets use the private inet socket generator.
Not only this will remove one vulnerability, this will also
improve performance of TCP flows using pmtudisc==IP_PMTUDISC_DONT
Fixes: 73f156a6e8c1 ("inetpeer: get rid of ip_id_count") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reported-by: Ray Che <xijiache@gmail.com> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In one net namespace, after creating a packet socket without binding
it to a device, users in other net namespaces can observe the new
`packet_type` added by this packet socket by reading `/proc/net/ptype`
file. This is minor information leakage as packet socket is
namespace aware.
Add a net pointer in `packet_type` to keep the net namespace of
of corresponding packet socket. In `ptype_seq_show`, this net pointer
must be checked when it is not NULL.
Fixes: 2feb27dbe00c ("[NETNS]: Minor information leak via /proc/net/ptype file.") Signed-off-by: Congyu Liu <liu3101@purdue.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The bnx2fc_destroy() functions are removing the interface before calling
destroy_work. This results multiple WARNings from sysfs_remove_group() as
the controller rport device attributes are removed too early.
Replace the fcoe_port's destroy_work queue. It's not needed.
The problem is easily reproducible with the following steps.
The syzbot fuzzer has identified a bug in which processes hang waiting
for usb_kill_urb() to return. It turns out the issue is not unlinking
the URB; that works just fine. Rather, the problem arises when the
wakeup notification that the URB has completed is not received.
The reason is memory-access ordering on SMP systems. In outline form,
usb_kill_urb() and __usb_hcd_giveback_urb() operating concurrently on
different CPUs perform the following actions:
CPU 0 CPU 1
---------------------------- ---------------------------------
usb_kill_urb(): __usb_hcd_giveback_urb():
... ...
atomic_inc(&urb->reject); atomic_dec(&urb->use_count);
... ...
wait_event(usb_kill_urb_queue,
atomic_read(&urb->use_count) == 0);
if (atomic_read(&urb->reject))
wake_up(&usb_kill_urb_queue);
Confining your attention to urb->reject and urb->use_count, you can
see that the overall pattern of accesses on CPU 0 is:
write urb->reject, then read urb->use_count;
whereas the overall pattern of accesses on CPU 1 is:
write urb->use_count, then read urb->reject.
This pattern is referred to in memory-model circles as SB (for "Store
Buffering"), and it is well known that without suitable enforcement of
the desired order of accesses -- in the form of memory barriers -- it
is entirely possible for one or both CPUs to execute their reads ahead
of their writes. The end result will be that sometimes CPU 0 sees the
old un-decremented value of urb->use_count while CPU 1 sees the old
un-incremented value of urb->reject. Consequently CPU 0 ends up on
the wait queue and never gets woken up, leading to the observed hang
in usb_kill_urb().
The same pattern of accesses occurs in usb_poison_urb() and the
failure pathway of usb_hcd_submit_urb().
The problem is fixed by adding suitable memory barriers. To provide
proper memory-access ordering in the SB pattern, a full barrier is
required on both CPUs. The atomic_inc() and atomic_dec() accesses
themselves don't provide any memory ordering, but since they are
present, we can use the optimized smp_mb__after_atomic() memory
barrier in the various routines to obtain the desired effect.
Currently when gadget enumerates in super speed plus, the isoc
endpoint request buffer size is not calculated correctly. Fix
this by checking the gadget speed against USB_SPEED_SUPER_PLUS
and update the request buffer size.
Two people have reported (and mentioned numerous other reports on the
web) that VIA's VL817 USB-SATA bridge does not work with the uas
driver. Typical log messages are:
[ 3606.232149] sd 14:0:0:0: [sdg] tag#2 uas_zap_pending 0 uas-tag 1 inflight: CMD
[ 3606.232154] sd 14:0:0:0: [sdg] tag#2 CDB: Write(16) 8a 00 00 00 00 00 18 0c c9 80 00 00 00 80 00 00
[ 3606.306257] usb 4-4.4: reset SuperSpeed Plus Gen 2x1 USB device number 11 using xhci_hcd
[ 3606.328584] scsi host14: uas_eh_device_reset_handler success
Surprisingly, the devices do seem to work okay for some other people.
The cause of the differing behaviors is not known.
In the hope of getting the devices to work for the most users, even at
the possible cost of degraded performance for some, this patch adds an
unusual_devs entry for the VL817 to block it from binding to the uas
driver by default. Users will be able to override this entry by means
of a module parameter, if they want.
CC: <stable@vger.kernel.org> Reported-by: DocMAX <mail@vacharakis.de> Reported-and-tested-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/Ye8IsK2sjlEv1rqU@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This commit adds support for the some of the Brainboxes PCI range of
cards, including the UC-101, UC-235/246, UC-257, UC-268, UC-275/279,
UC-302, UC-310, UC-313, UC-320/324, UC-346, UC-357, UC-368
and UC-420/431.
n_gsm is based on the 3GPP 07.010 and its newer version is the 3GPP 27.010.
See https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=1516
The changes from 07.010 to 27.010 are non-functional. Therefore, I refer to
the newer 27.010 here. Chapter 5.2.7.3 states that DC1 (XON) and DC3 (XOFF)
are the control characters defined in ISO/IEC 646. These shall be quoted if
seen in the data stream to avoid interpretation as flow control characters.
ISO/IEC 646 refers to the set of ISO standards described as the ISO
7-bit coded character set for information interchange. Its final version
is also known as ITU T.50.
See https://www.itu.int/rec/T-REC-T.50-199209-I/en
To abide the standard it is needed to quote DC1 and DC3 correctly if these
are seen as data bytes and not as control characters. The current
implementation already tries to enforce this but fails to catch all
defined cases. 3GPP 27.010 chapter 5.2.7.3 clearly states that the most
significant bit shall be ignored for DC1 and DC3 handling. The current
implementation handles only the case with the most significant bit set 0.
Cases in which DC1 and DC3 have the most significant bit set 1 are left
unhandled.
This patch fixes this by masking the data bytes with ISO_IEC_646_MASK (only
the 7 least significant bits set 1) before comparing them with XON
(a.k.a. DC1) and XOFF (a.k.a. DC3) when testing which byte values need
quotation via byte stuffing.
The buffer handling in pm_show_wakelocks() is tricky, and hopefully
correct. Ensure it really is correct by using sysfs_emit_at() which
handles all of the tricky string handling logic in a PAGE_SIZE buffer
for us automatically as this is a sysfs file being read from.
Reviewed-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
udf_expand_file_adinicb() calls directly ->writepage to write data
expanded into a page. This however misses to setup inode for writeback
properly and so we can crash on inode->i_wb dereference when submitting
page for IO like:
Fix the problem by marking the page dirty and going through the standard
writeback path to write the page. Strictly speaking we would not even
have to write the page but we want to catch e.g. ENOSPC errors early.
Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> CC: stable@vger.kernel.org Fixes: 52ebea749aae ("writeback: make backing_dev_info host cgroup-specific bdi_writebacks") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we fail to expand inode from inline format to a normal format, we
restore inode to contain the original inline formatting but we forgot to
set i_lenAlloc back. The mismatch between i_lenAlloc and i_size was then
causing further problems such as warnings and lost data down the line.
Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> CC: stable@vger.kernel.org Fixes: 7e49b6f2480c ("udf: Convert UDF to new truncate calling sequence") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suppose we have an environment with a number of non-NPIV FCP devices
(virtual HBAs / FCP devices / zfcp "adapter"s) sharing the same physical
FCP channel (HBA port) and its I_T nexus. Plus a number of storage target
ports zoned to such shared channel. Now one target port logs out of the
fabric causing an RSCN. Zfcp reacts with an ADISC ELS and subsequent port
recovery depending on the ADISC result. This happens on all such FCP
devices (in different Linux images) concurrently as they all receive a copy
of this RSCN. In the following we look at one of those FCP devices.
Requests other than FSF_QTCB_FCP_CMND can be slow until they get a
response.
Depending on which requests are affected by slow responses, there are
different recovery outcomes. Here we want to fix failed recoveries on port
or adapter level by avoiding recovery requests that can be slow.
We need the cached N_Port_ID for the remote port "link" test with ADISC.
Just before sending the ADISC, we now intentionally forget the old cached
N_Port_ID. The idea is that on receiving an RSCN for a port, we have to
assume that any cached information about this port is stale. This forces a
fresh new GID_PN [FC-GS] nameserver lookup on any subsequent recovery for
the same port. Since we typically can still communicate with the nameserver
efficiently, we now reach steady state quicker: Either the nameserver still
does not know about the port so we stop recovery, or the nameserver already
knows the port potentially with a new N_Port_ID and we can successfully and
quickly perform open port recovery. For the one case, where ADISC returns
successfully, we re-initialize port->d_id because that case does not
involve any port recovery.
This also solves a problem if the storage WWPN quickly logs into the fabric
again but with a different N_Port_ID. Such as on virtual WWPN takeover
during target NPIV failover.
[https://www.redbooks.ibm.com/abstracts/redp5477.html] In that case the
RSCN from the storage FDISC was ignored by zfcp and we could not
successfully recover the failover. On some later failback on the storage,
we could have been lucky if the virtual WWPN got the same old N_Port_ID
from the SAN switch as we still had cached. Then the related RSCN
triggered a successful port reopen recovery. However, there is no
guarantee to get the same N_Port_ID on NPIV FDISC.
Even though NPIV-enabled FCP devices are not affected by this problem, this
code change optimizes recovery time for gone remote ports as a side effect.
The timely drop of cached N_Port_IDs prevents unnecessary slow open port
attempts.
While the problem might have been in code before v2.6.32 commit 799b76d09aee ("[SCSI] zfcp: Decouple gid_pn requests from erp") this fix
depends on the gid_pn_work introduced with that commit, so we mark it as
culprit to satisfy fix dependencies.
Note: Point-to-point remote port is already handled separately and gets its
N_Port_ID from the cached peer_d_id. So resetting port->d_id in general
does not affect PtP.
Link: https://lore.kernel.org/r/20220118165803.3667947-1-maier@linux.ibm.com Fixes: 799b76d09aee ("[SCSI] zfcp: Decouple gid_pn requests from erp") Cc: <stable@vger.kernel.org> #2.6.32+ Suggested-by: Benjamin Block <bblock@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently if z/VM guest is allowed to retrieve hypervisor performance
data globally for all guests (privilege class B) the query is formed in a
way to include all guests but the group name is left empty. This leads to
that z/VM guests which have access control group set not being included
in the results (even local vm).
Change the query group identifier from empty to "any" to retrieve
information about all guests from any groups (or without a group set).
Check for out-of-bound read was being performed at the end of while
num_reports loop, and would fill journal with false positives. Added
check to beginning of loop processing so that it doesn't get checked
after ptr has been advanced.
Ziyang Xuan [Fri, 28 Jan 2022 06:40:54 +0000 (14:40 +0800)]
can: bcm: fix UAF of bcm op
Stopping tasklet and hrtimer rely on the active state of tasklet and
hrtimer sequentially in bcm_remove_op(), the op object will be freed
if they are all unactive. Assume the hrtimer timeout is short, the
hrtimer cb has been excuted after tasklet conditional judgment which
must be false after last round tasklet_kill() and before condition
hrtimer_active(), it is false when execute to hrtimer_active(). Bug
is triggerd, because the stopping action is end and the op object
will be freed, but the tasklet is scheduled. The resources of the op
object will occur UAF bug.
Move hrtimer_cancel() behind tasklet_kill() and switch 'while () {...}'
to 'do {...} while ()' to fix the op UAF problem.
Fixes: a06393ed0316 ("can: bcm: fix hrtimer/tasklet termination in bcm op removal") Reported-by: syzbot+5ca851459ed04c778d1d@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Daniel Rosenberg [Tue, 25 Jan 2022 14:18:07 +0000 (14:18 +0000)]
ion: Protect kref from userspace manipulation
This separates the kref for ion handles into two components.
Userspace requests through the ioctl will hold at most one
reference to the internally used kref. All additional requests
will increment a separate counter, and the original reference is
only put once that counter hits 0. This protects the kernel from
a poorly behaving userspace.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
[d-cagle@codeaurora.org: Resolve style issues] Signed-off-by: Dennis Cagle <d-cagle@codeaurora.org> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Daniel Rosenberg [Tue, 25 Jan 2022 14:18:06 +0000 (14:18 +0000)]
ion: Fix use after free during ION_IOC_ALLOC
If a user happens to call ION_IOC_FREE during an ION_IOC_ALLOC
on the just allocated id, and the copy_to_user fails, the cleanup
code will attempt to free an already freed handle.
This adds a wrapper for ion_alloc that adds an ion_handle_get to
avoid this.
Signed-off-by: Daniel Rosenberg <drosen@google.com> Signed-off-by: Dennis Cagle <d-cagle@codeaurora.org> Signed-off-by: Patrick Daly <pdaly@codeaurora.org> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While in theory multiple unwinders could be compiled in, it does
not make sense in practise. Use a choice to make the unwinder
selection mutually exclusive and mandatory.
Already before this commit it has not been possible to deselect
FRAME_POINTER. Remove the obsolete comment.
Furthermore, to produce a meaningful backtrace with FRAME_POINTER
enabled the kernel needs a specific function prologue:
mov ip, sp
stmfd sp!, {fp, ip, lr, pc}
sub fp, ip, #4
To get to the required prologue gcc uses apcs and no-sched-prolog.
This compiler options are not available on clang, and clang is not
able to generate the required prologue. Make the FRAME_POINTER
config symbol depending on !clang.
Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Stefan Agner <stefan@agner.ch> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When computing the access permissions of a shadow page, use the effective
permissions of the walk up to that point, i.e. the logic AND of its parents'
permissions. Two guest PxE entries that point at the same table gfn need to
be shadowed with different shadow pages if their parents' permissions are
different. KVM currently uses the effective permissions of the last
non-leaf entry for all non-leaf entries. Because all non-leaf SPTEs have
full ("uwx") permissions, and the effective permissions are recorded only
in role.access and merged into the leaves, this can lead to incorrect
reuse of a shadow page and eventually to a missing guest protection page
fault.
pud1 and pud2 point to the same pmd table, so:
- ptr1 and ptr3 points to the same page.
- ptr2 and ptr4 points to the same page.
(pud1 and pud2 here are pud entries, while pmd1 and pmd2 here are pmd entries)
- First, the guest reads from ptr1 first and KVM prepares a shadow
page table with role.access=u--, from ptr1's pud1 and ptr1's pmd1.
"u--" comes from the effective permissions of pgd, pud1 and
pmd1, which are stored in pt->access. "u--" is used also to get
the pagetable for pud1, instead of "uw-".
- Then the guest writes to ptr2 and KVM reuses pud1 which is present.
The hypervisor set up a shadow page for ptr2 with pt->access is "uw-"
even though the pud1 pmd (because of the incorrect argument to
kvm_mmu_get_page in the previous step) has role.access="u--".
- Then the guest reads from ptr3. The hypervisor reuses pud1's
shadow pmd for pud2, because both use "u--" for their permissions.
Thus, the shadow pmd already includes entries for both pmd1 and pmd2.
- At last, the guest writes to ptr4. This causes no vmexit or pagefault,
because pud1's shadow page structures included an "uw-" page even though
its role.access was "u--".
Any kind of shared pagetable might have the similar problem when in
virtual machine without TDP enabled if the permissions are different
from different ancestors.
In order to fix the problem, we change pt->access to be an array, and
any access in it will not include permissions ANDed from child ptes.
The test code is: https://lore.kernel.org/kvm/20210603050537.19605-1-jiangshanlai@gmail.com/
Remember to test it with TDP disabled.
The problem had existed long before the commit 41074d07c78b ("KVM: MMU:
Fix inherited permissions for emulated guest pte updates"), and it
is hard to find which is the culprit. So there is no fixes tag here.
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20210603052455.21023-1-jiangshanlai@gmail.com> Cc: stable@vger.kernel.org Fixes: cea0f0e7ea54 ("[PATCH] KVM: MMU: Shadow page table caching") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 4.9:
- Keep passing vcpu argument to gpte_access functions
- Adjust filenames, context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes the new ept_access_test_read_only and ept_access_test_read_write
testcases from vmx.flat.
The problem is that gpte_access moves bits around to switch from EPT
bit order (XWR) to ACC_*_MASK bit order (RWX). This results in an
incorrect exit qualification. To fix this, make pt_access and
pte_access operate on raw PTE values (only with NX flipped to mean
"can execute") and call gpte_access at the end of the walk. This
lets us use pte_access to compute the exit qualification with XWR
bit order.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@tencent.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
[bwh: Backported to 4.9:
- There's no support for EPT accessed/dirty bits, so do not use
have_ad flag
- Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Set up the connection to the NFSv4 server in nfs4_alloc_client(), before
we've added the struct nfs_client to the net-namespace's nfs_client_list
so that a downed server won't cause other mounts to hang in the trunking
detection code.
Reported-by: Michael Wakabayashi <mwakabayashi@vmware.com> Fixes: 5c6e5b60aae4 ("NFS: Fix an Oops in the pNFS files and flexfiles connection setup to the DS") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
[bwh: Backported to 4.9: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The bounds checking in avc_ca_pmt() is not strict enough. It should
be checking "read_pos + 4" because it's reading 5 bytes. If the
"es_info_length" is non-zero then it reads a 6th byte so there needs to
be an additional check for that.
I also added checks for the "write_pos". I don't think these are
required because "read_pos" and "write_pos" are tied together so
checking one ought to be enough. But they make the code easier to
understand for me. The check on write_pos is:
if (write_pos + 4 >= sizeof(c->operand) - 4) {
The first "+ 4" is because we're writing 5 bytes and the last " - 4"
is to leave space for the CRC.
The other problem is that "length" can be invalid. It comes from
"data_length" in fdtv_ca_pmt().
Cc: stable@vger.kernel.org Reported-by: Luo Likang <luolikang@nsfocus.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
[bwh: Backported to 4.9: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We need to flush TLBs before releasing backing store otherwise userspace
is able to encounter stale entries if a) it is not declaring access to
certain buffers and b) it races with the backing store release from a
such undeclared execution already executing on the GPU in parallel.
The approach taken is to mark any buffer objects which were ever bound
to the GPU and to trigger a serialized TLB flush when their backing
store is released.
Alternatively the flushing could be done on VMA unbind, at which point
we would be able to ascertain whether there is potential a parallel GPU
execution (which could race), but essentially it boils down to paying
the cost of TLB flushes potentially needlessly at VMA unbind time (when
the backing store is not known to be going away so not needed for
safety), versus potentially needlessly at backing store relase time
(since we at that point cannot tell whether there is anything executing
on the GPU which uses that object).
Thereforce simplicity of implementation has been chosen for now with
scope to benchmark and refine later as required.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reported-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Dave Airlie <airlied@redhat.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>