Kernel iterates over ATTR_RECORDs in mft record in ntfs_attr_find().
Because the ATTR_RECORDs are next to each other, kernel can get the next
ATTR_RECORD from end address of current ATTR_RECORD, through current
ATTR_RECORD length field.
The problem is that during iteration, when kernel calculates the end
address of current ATTR_RECORD, kernel may trigger an integer overflow bug
in executing `a = (ATTR_RECORD*)((u8*)a + le32_to_cpu(a->length))`. This
may wrap, leading to a forever iteration on 32bit systems.
This patch solves it by adding some checks on calculating end address
of current ATTR_RECORD during iteration.
Kernel iterates over ATTR_RECORDs in mft record in ntfs_attr_find(). To
ensure access on these ATTR_RECORDs are within bounds, kernel will do some
checking during iteration.
The problem is that during checking whether ATTR_RECORD's name is within
bounds, kernel will dereferences the ATTR_RECORD name_offset field, before
checking this ATTR_RECORD strcture is within bounds. This problem may
result out-of-bounds read in ntfs_attr_find(), reported by Syzkaller:
==================================================================
BUG: KASAN: use-after-free in ntfs_attr_find+0xc02/0xce0 fs/ntfs/attrib.c:597
Read of size 2 at addr ffff88807e352009 by task syz-executor153/3607
This patch solves it by moving the ATTR_RECORD strcture's bounds checking
earlier, then checking whether ATTR_RECORD's name is within bounds.
What's more, this patch also add some comments to improve its
maintainability.
Patch series "ntfs: fix bugs about Attribute", v2.
This patchset fixes three bugs relative to Attribute in record:
Patch 1 adds a sanity check to ensure that, attrs_offset field in first
mft record loading from disk is within bounds.
Patch 2 moves the ATTR_RECORD's bounds checking earlier, to avoid
dereferencing ATTR_RECORD before checking this ATTR_RECORD is within
bounds.
Patch 3 adds an overflow checking to avoid possible forever loop in
ntfs_attr_find().
Without patch 1 and patch 2, the kernel triggersa KASAN use-after-free
detection as reported by Syzkaller.
Although one of patch 1 or patch 2 can fix this, we still need both of
them. Because patch 1 fixes the root cause, and patch 2 not only fixes
the direct cause, but also fixes the potential out-of-bounds bug.
This patch (of 3):
Syzkaller reported use-after-free read as follows:
==================================================================
BUG: KASAN: use-after-free in ntfs_attr_find+0xc02/0xce0 fs/ntfs/attrib.c:597
Read of size 2 at addr ffff88807e352009 by task syz-executor153/3607
Kernel will loads $MFT/$DATA's first mft record in
ntfs_read_inode_mount().
Yet the problem is that after loading, kernel doesn't check whether
attrs_offset field is a valid value.
To be more specific, if attrs_offset field is larger than bytes_allocated
field, then it may trigger the out-of-bounds read bug(reported as
use-after-free bug) in ntfs_attr_find(), when kernel tries to access the
corresponding mft record's attribute.
This patch solves it by adding the sanity check between attrs_offset field
and bytes_allocated field, after loading the first mft record.
We got report from sysbot [1] about warnings that were caused by
bpf program attached to contention_begin raw tracepoint triggering
the same tracepoint by using bpf_trace_printk helper that takes
trace_printk_lock lock.
The can be reproduced by attaching bpf program as raw tracepoint on
contention_begin tracepoint. The bpf prog calls bpf_trace_printk
helper. Then by running perf bench the spin lock code is forced to
take slow path and call contention_begin tracepoint.
Fixing this by skipping execution of the bpf program if it's
already running, Using bpf prog 'active' field, which is being
currently used by trampoline programs for the same reason.
Moving bpf_prog_inc_misses_counter to syscall.c because
trampoline.c is compiled in just for CONFIG_BPF_JIT option.
Shamelessly copying the explanation from Tetsuo Handa's suggested
patch[1] (slightly reworded):
syzbot is reporting inconsistent lock state in p9_req_put()[2],
for p9_tag_remove() from p9_req_put() from IRQ context is using
spin_lock_irqsave() on "struct p9_client"->lock but trans_fd
(not from IRQ context) is using spin_lock().
Since the locks actually protect different things in client.c and in
trans_fd.c, just replace trans_fd.c's lock by a new one specific to the
transport (client.c's protect the idr for fid/tag allocations,
while trans_fd.c's protects its own req list and request status field
that acts as the transport's state machine)
Functions implementing the a_ops->write_end() interface accept the `void
*fsdata` parameter that is supposed to be initialized by the corresponding
a_ops->write_begin() (which accepts `void **fsdata`).
However not all a_ops->write_begin() implementations initialize `fsdata`
unconditionally, so it may get passed uninitialized to a_ops->write_end(),
resulting in undefined behavior.
Fix this by initializing fsdata with NULL before the call to
write_begin(), rather than doing so in all possible a_ops implementations.
This patch covers only the following cases found by running x86 KMSAN
under syzkaller:
- generic_perform_write()
- cont_expand_zero() and generic_cont_expand_simple()
- page_symlink()
Other cases of passing uninitialized fsdata may persist in the codebase.
Link: https://lkml.kernel.org/r/20220915150417.722975-43-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
These commits use WARN_ON_ONCE() and kill the offending processes when
deprecated and unknown flags are encountered:
commit c17a6ff93213 ("rseq: Kill process when unknown flags are encountered in ABI structures")
commit 0190e4198e47 ("rseq: Deprecate RSEQ_CS_FLAG_NO_RESTART_ON_* flags")
The WARN_ON_ONCE() triggered by userspace input prevents use of
Syzkaller to fuzz the rseq system call.
Replace this WARN_ON_ONCE() by pr_warn_once() messages which contain
actually useful information.
Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lkml.kernel.org/r/20221102130635.7379-1-mathieu.desnoyers@efficios.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Wireless events will be sent on the appropriate channels in
wireless_send_event(). Different wireless events may have different
payload structure and size, so kernel uses **len** and **cmd** field
in struct __compat_iw_event as wireless event common LCP part, uses
**pointer** as a label to mark the position of remaining different part.
Yet the problem is that, **pointer** is a compat_caddr_t type, which may
be smaller than the relative structure at the same position. So during
wireless_send_event() tries to parse the wireless events payload, it may
trigger the memcpy() run-time destination buffer bounds checking when the
relative structure's data is copied to the position marked by **pointer**.
This patch solves it by introducing flexible-array field **ptr_bytes**,
to mark the position of the wireless events remaining part next to
LCP part. What's more, this patch also adds **ptr_len** variable in
wireless_send_event() to improve its maintainability.
In preparation for FORTIFY_SOURCE doing bounds-check on memcpy(),
switch from __nlmsg_put to nlmsg_put(), and explain the bounds check
for dealing with the memcpy() across a composite flexible array struct.
Avoids this future run-time warning:
memcpy: detected field-spanning write (size 32) of single field "&errmsg->msg" at net/netlink/af_netlink.c:2447 (size 16)
Cc: Jakub Kicinski <kuba@kernel.org> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@netfilter.org> Cc: Florian Westphal <fw@strlen.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: syzbot <syzkaller@googlegroups.com> Cc: netfilter-devel@vger.kernel.org Cc: coreteam@netfilter.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220901071336.1418572-1-keescook@chromium.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
syzbot is reporting hung task at p9_fd_close() [1], for p9_mux_poll_stop()
from p9_conn_destroy() from p9_fd_close() is failing to interrupt already
started kernel_read() from p9_fd_read() from p9_read_work() and/or
kernel_write() from p9_fd_write() from p9_write_work() requests.
Since p9_socket_open() sets O_NONBLOCK flag, p9_mux_poll_stop() does not
need to interrupt kernel_read()/kernel_write(). However, since p9_fd_open()
does not set O_NONBLOCK flag, but pipe blocks unless signal is pending,
p9_mux_poll_stop() needs to interrupt kernel_read()/kernel_write() when
the file descriptor refers to a pipe. In other words, pipe file descriptor
needs to be handled as if socket file descriptor.
We somehow need to interrupt kernel_read()/kernel_write() on pipes.
A minimal change, which this patch is doing, is to set O_NONBLOCK flag
from p9_fd_open(), for O_NONBLOCK flag does not affect reading/writing
of regular files. But this approach changes O_NONBLOCK flag on userspace-
supplied file descriptors (which might break userspace programs), and
O_NONBLOCK flag could be changed by userspace. It would be possible to set
O_NONBLOCK flag every time p9_fd_read()/p9_fd_write() is invoked, but still
remains small race window for clearing O_NONBLOCK flag.
If we don't want to manipulate O_NONBLOCK flag, we might be able to
surround kernel_read()/kernel_write() with set_thread_flag(TIF_SIGPENDING)
and recalc_sigpending(). Since p9_read_work()/p9_write_work() works are
processed by kernel threads which process global system_wq workqueue,
signals could not be delivered from remote threads when p9_mux_poll_stop()
from p9_conn_destroy() from p9_fd_close() is called. Therefore, calling
set_thread_flag(TIF_SIGPENDING)/recalc_sigpending() every time would be
needed if we count on signals for making kernel_read()/kernel_write()
non-blocking.
Switch from strlcpy to strscpy and make sure that @count is the size of
the smaller of the source and destination buffers. This prevents
reading beyond the end of the source buffer when the source string isn't
null terminated.
Found by a modified version of syzkaller.
Suggested-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fuzzers like to scribble over sb_bsize_shift but in reality it's very
unlikely that this field would be corrupted on its own. Nevertheless it
should be checked to avoid the possibility of messy mount errors due to
bad calculations. It's always a fixed value based on the block size so
we can just check that it's the expected value.
Tested with:
mkfs.gfs2 -O -p lock_nolock /dev/vdb
for i in 0 -1 64 65 32 33; do
gfs2_edit -p sb field sb_bsize_shift $i /dev/vdb
mount /dev/vdb /mnt/test && umount /mnt/test
done
and with UBSAN configured we also get complaints like
[ 76.373395] UBSAN: shift-out-of-bounds in fs/gfs2/ops_fstype.c:295:19
[ 76.373815] shift exponent 4294967287 is too large for 64-bit type 'long unsigned int'
After the patch, these complaints don't appear, mount fails immediately
and we get an explanation in dmesg.
Reported-by: syzbot+dcf33a7aae997956fe06@syzkaller.appspotmail.com Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
syzbot found that kcm_tx_work() could crash [1] in:
/* Primarily for SOCK_SEQPACKET sockets */
if (likely(sk->sk_socket) &&
test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) {
<<*>> clear_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
sk->sk_write_space(sk);
}
I think the reason is that another thread might concurrently
run in kcm_release() and call sock_orphan(sk) while sk is not
locked. kcm_tx_work() find sk->sk_socket being NULL.
[1]
BUG: KASAN: null-ptr-deref in instrument_atomic_write include/linux/instrumented.h:86 [inline]
BUG: KASAN: null-ptr-deref in clear_bit include/asm-generic/bitops/instrumented-atomic.h:41 [inline]
BUG: KASAN: null-ptr-deref in kcm_tx_work+0xff/0x160 net/kcm/kcmsock.c:742
Write of size 8 at addr 0000000000000008 by task kworker/u4:3/53
Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]
Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.
[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567
macvlan should enforce a minimal mtu of 68, even at link creation.
This patch avoids the current behavior (which could lead to crashes
in ipv6 stack if the link is brought up)
$ ip link add macvlan1 link eno1 mtu 8 type macvlan # This should fail !
$ ip link sh dev macvlan1
5: macvlan1@eno1: <BROADCAST,MULTICAST> mtu 8 qdisc noop
state DOWN mode DEFAULT group default qlen 1000
link/ether 02:47:6c:24:74:82 brd ff:ff:ff:ff:ff:ff
$ ip link set macvlan1 mtu 67
Error: mtu less than device minimum.
$ ip link set macvlan1 mtu 68
$ ip link set macvlan1 mtu 8
Error: mtu less than device minimum.
Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Avoid resetting the module-wide i8042_platform_device pointer in
i8042_probe() or i8042_remove(), so that the device can be properly
destroyed by i8042_exit() on module unload.
Since pmd_user_accessible_page() doesn't check if a pmd is leaf, it
decrease file_map_count for a non-leaf pmd comes from collapse_huge_page().
and so trigger BUG_ON() unexpectedly.
Fix this problem by using pmd_leaf() insteal of pmd_present() in
pmd_user_accessible_page(). Moreover, use pud_leaf() for
pud_user_accessible_page() too.
Fixes: 42b2547137f5 ("arm64/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK") Reported-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20221117075602.2904324-2-liushixin2@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Entries in list 'tr->err_log' will be reused after entry number
exceed TRACING_LOG_ERRS_MAX.
The cmd string of the to be reused entry will be freed first then
allocated a new one. If the allocation failed, then the entry will
still be in list 'tr->err_log' but its 'cmd' field is set to be NULL,
later access of 'cmd' is risky.
Currently above problem can cause the loss of 'cmd' information of first
entry in 'tr->err_log'. When execute `cat /sys/kernel/tracing/error_log`,
reproduce logs like:
[ 37.495100] trace_kprobe: error: Maxactive is not for kprobe(null) ^
[ 38.412517] trace_kprobe: error: Maxactive is not for kprobe
Command: p4:myprobe2 do_sys_openat2
^
In __unregister_kprobe_top(), if the currently unregistered probe has
post_handler but other child probes of the aggrprobe do not have
post_handler, the post_handler of the aggrprobe is cleared. If this is
a ftrace-based probe, there is a problem. In later calls to
disarm_kprobe(), we will use kprobe_ftrace_ops because post_handler is
NULL. But we're armed with kprobe_ipmodify_ops. This triggers a WARN in
__disarm_kprobe_ftrace() and may even cause use-after-free:
For the kprobe-on-ftrace case, we keep the post_handler setting to
identify this aggrprobe armed with kprobe_ipmodify_ops. This way we
can disarm it correctly.
If device_register() fails in sdebug_add_host_helper(), it will goto clean
and sdbg_host will be freed, but sdbg_host->host_list will not be removed
from sdebug_host_list, then list traversal may cause UAF. Fix it.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Yuan Can <yuancan@huawei.com> Link: https://lore.kernel.org/r/20221117084421.58918-1-yuancan@huawei.com Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If device_register() fails in tcm_loop_setup_hba_bus(), the name allocated
by dev_set_name() need be freed. As comment of device_register() says, it
should use put_device() to give up the reference in the error path. So fix
this by calling put_device(), then the name can be freed in kobject_cleanup().
The 'tl_hba' will be freed in tcm_loop_release_adapter(), so it don't need
goto error label in this case.
Fixes: 3703b2c5d041 ("[SCSI] tcm_loop: Add multi-fabric Linux/SCSI LLD fabric module") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20221115015042.3652261-1-yangyingliang@huawei.com Reviewed-by: Mike Christie <michael.chritie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
kernel test robot reported warnings when build bonding module with
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/net/bonding/:
from ../drivers/net/bonding/bond_main.c:35:
In function ‘fortify_memcpy_chk’,
inlined from ‘iph_to_flow_copy_v4addrs’ at ../include/net/ip.h:566:2,
inlined from ‘bond_flow_ip’ at ../drivers/net/bonding/bond_main.c:3984:3:
../include/linux/fortify-string.h:413:25: warning: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of f
ield (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
413 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘fortify_memcpy_chk’,
inlined from ‘iph_to_flow_copy_v6addrs’ at ../include/net/ipv6.h:900:2,
inlined from ‘bond_flow_ip’ at ../drivers/net/bonding/bond_main.c:3994:3:
../include/linux/fortify-string.h:413:25: warning: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of f
ield (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
413 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is because we try to copy the whole ip/ip6 address to the flow_key,
while we only point the to ip/ip6 saddr. Note that since these are UAPI
headers, __struct_group() is used to avoid the compiler warnings.
Reported-by: kernel test robot <lkp@intel.com> Fixes: c3f8324188fa ("net: Add full IPv6 addresses to flow_keys") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20221115142400.1204786-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
An external PHY needs settling time after power up or reset.
In the bind() function an mdio bus is registered. If at this point
the external PHY is still initialising, no valid PHY ID will be
read and on phy_find_first() the bind() function will fail.
If an external PHY is present, wait the maximum time specified
in 802.3 45.2.7.1.1.
Fixes: 05b35e7eb9a1 ("smsc95xx: add phylib support") Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20221115114434.9991-2-alexandru.tachici@analog.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Move the declaration of 'struct trace_array' out of #ifdef
CONFIG_TRACING block, to fix the following warning when CONFIG_TRACING
is not set:
>> include/linux/trace.h:63:45: warning: 'struct trace_array' declared
inside parameter list will not be visible outside of this definition or
declaration
Link: https://lkml.kernel.org/r/20221107160556.2139463-1-shraash@google.com Fixes: 1a77dd1c2bb5 ("scsi: tracing: Fix compile error in trace_array calls when TRACING is disabled") Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Arun Easi <aeasi@marvell.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Aashish Sharma <shraash@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The function ring_buffer_nr_dirty_pages() was created to find out how many
pages are filled in the ring buffer. There's two running counters. One is
incremented whenever a new page is touched (pages_touched) and the other
is whenever a page is read (pages_read). The dirty count is the number
touched minus the number read. This is used to determine if a blocked task
should be woken up if the percentage of the ring buffer it is waiting for
is hit.
The problem is that it does not take into account dropped pages (when the
new writes overwrite pages that were not read). And then the dirty pages
will always be greater than the percentage.
This makes the "buffer_percent" file inaccurate, as the number of dirty
pages end up always being larger than the percentage, event when it's not
and this causes user space to be woken up more than it wants to be.
Add a new counter to keep track of lost pages, and include that in the
accounting of dirty pages so that it is actually accurate.
Link: https://lkml.kernel.org/r/20221021123013.55fb6055@gandalf.local.home Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to wake up reader") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
A perf NMI of another event can come between these two steps. Perf NMI
handler internally disables and enables _all_ events, including the one
which nmi-intercepted amd_pmu_enable_all() was in process of enabling.
If that unintentionally enabled event has very low sampling period and
causes immediate successive NMI, causing the event to be throttled,
cpuc->events[idx] and cpuc->active_mask gets cleared by x86_pmu_stop().
This will result in amd_pmu_enable_event() getting called with event=NULL
when amd_pmu_enable_all() resumes after handling the NMIs. This causes a
kernel crash:
amd_pmu_disable_all()/amd_pmu_enable_all() calls inside perf NMI handler
were recently added as part of BRS enablement but I'm not sure whether
we really need them. We can just disable BRS in the beginning and enable
it back while returning from NMI. This will solve the issue by not
enabling those events whose active_masks are set but are not yet enabled
in hw pmu.
Fixes: ada543459cab ("perf/x86/amd: Add AMD Fam19h Branch Sampling support") Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221114044029.373-1-ravi.bangoria@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
To catch missing SIGTRAP we employ a WARN in __perf_event_overflow(),
which fires if pending_sigtrap was already set: returning to user space
without consuming pending_sigtrap, and then having the event fire again
would re-enter the kernel and trigger the WARN.
This, however, seemed to miss the case where some events not associated
with progress in the user space task can fire and the interrupt handler
runs before the IRQ work meant to consume pending_sigtrap (and generate
the SIGTRAP).
In this case, syzbot produced a program with event type
PERF_TYPE_SOFTWARE and config PERF_COUNT_SW_CPU_CLOCK. The hrtimer
manages to fire again before the IRQ work got a chance to run, all while
never having returned to user space.
Improve the WARN to check for real progress in user space: approximate
this by storing a 32-bit hash of the current IP into pending_sigtrap,
and if an event fires while pending_sigtrap still matches the previous
IP, we assume no progress (false negatives are possible given we could
return to user space and trigger again on the same IP).
Move the return value check before attempting to assign the core ID to the
swidget since we are going to fail the sof_widget_ready() and free up
swidget anyways.
Fixes: 909dadf21aae ("ASoC: SOF: topology: Make DAI widget parsing IPC agnostic") Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Link: https://lore.kernel.org/r/20221107090433.5146-1-peter.ujfalusi@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The subsystem reset writes to a register, so we have to ensure the
device state is capable of handling that otherwise the driver may access
unmapped registers. Use the state machine to ensure the subsystem reset
doesn't try to write registers on a device already undergoing this type
of reset.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214771 Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The passthrough commands already have this restriction, but the other
operations do not. Require the same capabilities for all users as all of
these operations, which include resets and rescans, can be disruptive.
Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Deal with errata TGL052, ADL037 and RPL017 "Trace May Contain Incorrect
Data When Configured With Single Range Output Larger Than 4KB" by
disabling single range output whenever larger than 4KB.
Fixes: 670638477aed ("perf/x86/intel/pt: Opportunistically use single range output mode") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20221112151508.13768-1-adrian.hunter@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a CPU comes online, the per-CPU NB and LLC uncore contexts are
freed but not the events array within the context structure. This
causes a memory leak as identified by the kmemleak detector.
The splat comes from fpu_inherit_perms() being called under fpregs_lock(),
and us reaching the spin_lock_irq() therein due to fpu_state_size_dynamic()
returning true despite static key __fpu_state_size_dynamic having never
been enabled.
Mike's assessment looks correct. fpregs_lock on a PREEMPT_RT kernel disables
preemption so calling spin_lock_irq() in fpu_inherit_perms() is unsafe. This
problem exists since commit
9e798e9aa14c ("x86/fpu: Prepare fpu_clone() for dynamically enabled features").
Even though the original bug report should not have enabled the paths at
all, the bug still exists.
fpregs_lock is necessary when editing the FPU registers or a task's FP
state but it is not necessary for fpu_inherit_perms(). The only write
of any FP state in fpu_inherit_perms() is for the new child which is
not running yet and cannot context switch or be borrowed by a kernel
thread yet. Hence, fpregs_lock is not protecting anything in the new
child until clone() completes and can be dropped earlier. The siglock
still needs to be acquired by fpu_inherit_perms() as the read of the
parent's permissions has to be serialised.
sgx_validate_offset_length() function verifies "offset" and "length"
arguments provided by userspace, but was missing an overflow check on
their addition. Add it.
blkcg_css_online is supposed to pin the blkcg of the parent, but 397c9f46ee4d refactored things and along the way, changed it to pin the
css instead. This results in extra pins, and we end up leaking blkcgs
and cgroups.
Fixes: 397c9f46ee4d ("blk-cgroup: move blkcg_{pin,unpin}_online out of line") Signed-off-by: Chris Mason <clm@fb.com> Spotted-by: Rik van Riel <riel@surriel.com> Cc: <stable@vger.kernel.org> # v5.19+ Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lore.kernel.org/r/20221114181930.2093706-1-clm@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Local variable ev created at:
qp_notify_peer+0x54/0x290 drivers/misc/vmw_vmci/vmci_queue_pair.c:1456
qp_broker_attach drivers/misc/vmw_vmci/vmci_queue_pair.c:1662
qp_broker_alloc+0x2977/0x2f30 drivers/misc/vmw_vmci/vmci_queue_pair.c:1750
Bytes 28-31 of 48 are uninitialized
Memory access of size 48 starts at ffff888035155e00
Data copied to user address 0000000020000100
Use memset() to prevent the infoleaks.
Also speculatively fix qp_notify_peer_local(), which may suffer from the
same problem.
After the rework from commit 1ebe2e5f9d68 ("block: remove
GENHD_FL_EXT_DEVT"), when calling device_add_disk(), dcssblk will end up
in disk_scan_partitions(), and not break out early w/o GENHD_FL_NO_PART.
This will trigger implicit open/release via blkdev_get/put_whole()
later. dcssblk_release() will then deadlock on dcssblk_devices_sem
semaphore, which is already held from dcssblk_add_store() when calling
device_add_disk().
dcssblk does not support partitions (DCSSBLK_MINORS_PER_DISK == 1), and
never scanned partitions before. Therefore restore the previous
behavior, and explicitly disallow partition scanning by setting the
GENHD_FL_NO_PART flag. This will also prevent this deadlock scenario.
Since merge of tty-6.0-rc1, "make htmldocs" with Sphinx >=3.1 emits
a bunch of warnings indicating duplicate kernel-doc comments from
drivers/tty/serial/serial_core.c.
This is due to the kernel-doc directive for serial_core.c in
serial/drivers.rst added in the merge. It conflicts with an existing
kernel-doc directive in miscellaneous.rst.
Remove the latter directive and resolve the duplicates.
pci_get_device() will increase the reference count for the returned
pci_dev. We need to use pci_dev_put() to decrease the reference count
before amd_probe() returns. There is no problem for the 'smbus_dev ==
NULL' branch because pci_dev_put() can also handle the NULL input
parameter case.
Fixes: 659c9bc114a8 ("mmc: sdhci-pci: Build o2micro support in the same module") Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221114083100.149200-1-wangxiongfeng2@huawei.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The SD card is recognized failed sometimes when resume from suspend.
Because CD# debounce time too long then card present report wrong.
Finally, card is recognized failed.
In mmc_select_voltage(), if there is no full power cycle, the voltage
range selected at the end of the function will be on a single range
(e.g. 3.3V/3.4V). To keep a range around the selected voltage (3.2V/3.4V),
the mask shift should be reduced by 1.
This issue was triggered by using a specific SD-card (Verbatim Premium
16GB UHS-1) on an STM32MP157C-DK2 board. This board cannot do UHS modes
and there is no power cycle. And the card was failing to switch to
high-speed mode. When adding the range 3.2V/3.3V for this card with the
proposed shift change, the card can switch to high-speed mode.
The coreboot_table driver registers a coreboot bus while probing a
"coreboot_table" device representing the coreboot table memory region.
Probing this device (i.e., registering the bus) is a dependency for the
module_init() functions of any driver for this bus (e.g.,
memconsole-coreboot.c / memconsole_driver_init()).
With synchronous probe, this dependency works OK, as the link order in
the Makefile ensures coreboot_table_driver_init() (and thus,
coreboot_table_probe()) completes before a coreboot device driver tries
to add itself to the bus.
With asynchronous probe, however, coreboot_table_probe() may race with
memconsole_driver_init(), and so we're liable to hit one of these two:
1. coreboot_driver_register() eventually hits "[...] the bus was not
initialized.", and the memconsole driver fails to register; or
2. coreboot_driver_register() gets past #1, but still races with
bus_register() and hits some other undefined/crashing behavior (e.g.,
in driver_find() [1])
We can resolve this by registering the bus in our initcall, and only
deferring "device" work (scanning the coreboot memory region and
creating sub-devices) to probe().
[1] Example failure, using 'driver_async_probe=*' kernel command line:
SRS cap is the hardware cap telling if the hardware IOMMU can support
requests seeking supervisor privilege or not. SRE bit in scalable-mode
PASID table entry is treated as Reserved(0) for implementation not
supporting SRS cap.
Checking SRS cap before setting SRE bit can avoid the non-recoverable
fault of "Non-zero reserved field set in PASID Table Entry" caused by
setting SRE bit while there is no SRS cap support. The fault messages
look like below:
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Read NO_PASID] Request device [00:0d.0] fault addr 0x1154e1000
[fault reason 0x5a]
SM: Non-zero reserved field set in PASID Table Entry
The A/D bits are preseted for IOVA over first level(FL) usage for both
kernel DMA (i.e, domain typs is IOMMU_DOMAIN_DMA) and user space DMA
usage (i.e., domain type is IOMMU_DOMAIN_UNMANAGED).
Presetting A bit in FL requires to preset the bit in every related paging
entries, including the non-leaf ones. Otherwise, hardware may treat this
as an error. For example, in a case of ECAP_REG.SMPWC==0, DMA faults might
occur with below DMAR fault messages (wrapped for line length) dumped.
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Read NO_PASID] Request device [aa:00.0] fault addr 0x10c3a6000
[fault reason 0x90]
SM: A/D bit update needed in first-level entry when set up in no snoop
We used to use the wrong type of integer in 'zfcp_fsf_req_send()' to cache
the FSF request ID when sending a new FSF request. This is used in case the
sending fails and we need to remove the request from our internal hash
table again (so we don't keep an invalid reference and use it when we free
the request again).
In 'zfcp_fsf_req_send()' we used to cache the ID as 'int' (signed and 32
bit wide), but the rest of the zfcp code (and the firmware specification)
handles the ID as 'unsigned long'/'u64' (unsigned and 64 bit wide [s390x
ELF ABI]). For one this has the obvious problem that when the ID grows
past 32 bit (this can happen reasonably fast) it is truncated to 32 bit
when storing it in the cache variable and so doesn't match the original ID
anymore. The second less obvious problem is that even when the original ID
has not yet grown past 32 bit, as soon as the 32nd bit is set in the
original ID (0x80000000 = 2'147'483'648) we will have a mismatch when we
cast it back to 'unsigned long'. As the cached variable is of a signed
type, the compiler will choose a sign-extending instruction to load the 32
bit variable into a 64 bit register (e.g.: 'lgf %r11,188(%r15)'). So once
we pass the cached variable into 'zfcp_reqlist_find_rm()' to remove the
request again all the leading zeros will be flipped to ones to extend the
sign and won't match the original ID anymore (this has been observed in
practice).
If we can't successfully remove the request from the hash table again after
'zfcp_qdio_send()' fails (this happens regularly when zfcp cannot notify
the adapter about new work because the adapter is already gone during
e.g. a ChpID toggle) we will end up with a double free. We unconditionally
free the request in the calling function when 'zfcp_fsf_req_send()' fails,
but because the request is still in the hash table we end up with a stale
memory reference, and once the zfcp adapter is either reset during recovery
or shutdown we end up freeing the same memory twice.
The resulting stack traces vary depending on the kernel and have no direct
correlation to the place where the bug occurs. Here are three examples that
have been seen in practice:
To fix this, simply change the type of the cache variable to 'unsigned
long', like the rest of zfcp and also the argument for
'zfcp_reqlist_find_rm()'. This prevents truncation and wrong sign extension
and so can successfully remove the request from the hash table.
Fixes: e60a6d69f1f8 ("[SCSI] zfcp: Remove function zfcp_reqlist_find_safe") Cc: <stable@vger.kernel.org> #v2.6.34+ Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Link: https://lore.kernel.org/r/979f6e6019d15f91ba56182f1aaf68d61bf37fc6.1668595505.git.bblock@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If a page fault occurs while copying the first byte, this function resets one
byte before dst.
As a consequence, an address could be modified and leaded to kernel crashes if
case the modified address was accessed later.
Fixes: b58294ead14c ("maccess: allow architectures to provide kernel probing directly") Signed-off-by: Alban Crequy <albancrequy@linux.microsoft.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Francis Laniel <flaniel@linux.microsoft.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: <stable@vger.kernel.org> [5.8] Link: https://lore.kernel.org/bpf/20221110085614.111213-2-albancrequy@linux.microsoft.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
syzbot is reporting uninitialized value at iforce_init_device() [1], for
commit 6ac0aec6b0a6 ("Input: iforce - allow callers supply data buffer
when fetching device IDs") is checking that valid length is shorter than
bytes to read. Since iforce_get_id_packet() stores valid length when
returning 0, the caller needs to check that valid length is longer than or
equals to bytes to read.
When decoding the snaps fails it maybe leaving the 'first_realm'
and 'realm' pointing to the same snaprealm memory. And then it'll
put it twice and could cause random use-after-free, BUG_ON, etc
issues.
When we post a CQE we wake all ring pollers as it normally should be.
However, if a CQE was generated by a multishot poll request targeting
its own ring, it'll wake that request up, which will make it to post
a new CQE, which will wake the request and so on until it exhausts all
CQ entries.
Don't allow multishot polling io_uring files but downgrade them to
oneshots, which was always stated as a correct behaviour that the
userspace should check for.
Having REQ_F_POLLED set doesn't guarantee that the request is
executed as a multishot from the polling path. Fortunately for us, if
the code thinks it's multishot issue when it's not, it can only ask to
skip completion so leaking the request. Use issue_flags to mark
multipoll issues.
Having REQ_F_POLLED set doesn't guarantee that the request is
executed as a multishot from the polling path. Fortunately for us, if
the code thinks it's multishot issue when it's not, it can only ask to
skip completion so leaking the request. Use issue_flags to mark
multipoll issues.
We may never try to process a poll wake and its mask if there was
multiple wake ups racing for queueing up a tw. Force
io_poll_check_events() to update the mask by vfs_poll().
Configure DMA to use 16B burst size with Elkhart Lake. This makes the
bus use more efficient and works around an issue which occurs with the
previously used 1B.
The fix was initially developed by Srikanth Thokala and Aman Kumar.
This together with the previous config change is the cleaned up version
of the original fix.
Fixes: 0a9410b981e9 ("serial: 8250_lpss: Enable DMA on Intel Elkhart Lake") Cc: <stable@vger.kernel.org> # serial: 8250_lpss: Configure DMA also w/o DMA filter Reported-by: Wentong Wu <wentong.wu@intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20221108121952.5497-4-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the platform doesn't use DMA device filter (as is the case with
Elkhart Lake), whole lpss8250_dma_setup() setup is skipped. This
results in skipping also *_maxburst setup which is undesirable.
Refactor lpss8250_dma_setup() to configure DMA even if filter is not
setup.
Returning true from handle_rx_dma() without flushing DMA first creates
a data ordering hazard. If DMA Rx has handled any character at the
point when RLSI occurs, the non-DMA path handles any pending characters
jumping them ahead of those characters that are pending under DMA.
DW UART sometimes triggers IIR_RDI during DMA Rx when IIR_RX_TIMEOUT
should have been triggered instead. Since IIR_RDI has higher priority
than IIR_RX_TIMEOUT, this causes the Rx to hang into interrupt loop.
The problem seems to occur at least with some combinations of
small-sized transfers (I've reproduced the problem on Elkhart Lake PSE
UARTs).
If there's already an on-going Rx DMA and IIR_RDI triggers, fall
graciously back to non-DMA Rx. That is, behave as if IIR_RX_TIMEOUT had
occurred.
8250_omap already considers IIR_RDI similar to this change so its
nothing unheard of.
Fixes: 75df022b5f89 ("serial: 8250_dma: Fix RX handling") Cc: <stable@vger.kernel.org> Co-developed-by: Srikanth Thokala <srikanth.thokala@intel.com> Signed-off-by: Srikanth Thokala <srikanth.thokala@intel.com> Co-developed-by: Aman Kumar <aman.kumar@intel.com> Signed-off-by: Aman Kumar <aman.kumar@intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20221108121952.5497-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
__list_versions will first estimate the required space using the
"dm_target_iterate(list_version_get_needed, &needed)" call and then will
fill the space using the "dm_target_iterate(list_version_get_info,
&iter_info)" call. Each of these calls locks the targets using the
"down_read(&_lock)" and "up_read(&_lock)" calls, however between the first
and second "dm_target_iterate" there is no lock held and the target
modules can be loaded at this point, so the second "dm_target_iterate"
call may need more space than what was the first "dm_target_iterate"
returned.
The code tries to handle this overflow (see the beginning of
list_version_get_info), however this handling is incorrect.
The code sets "param->data_size = param->data_start + needed" and
"iter_info.end = (char *)vers+len" - "needed" is the size returned by the
first dm_target_iterate call; "len" is the size of the buffer allocated by
userspace.
"len" may be greater than "needed"; in this case, the code will write up
to "len" bytes into the buffer, however param->data_size is set to
"needed", so it may write data past the param->data_size value. The ioctl
interface copies only up to param->data_size into userspace, thus part of
the result will be truncated.
Fix this bug by setting "iter_info.end = (char *)vers + needed;" - this
guarantees that the second "dm_target_iterate" call will write only up to
the "needed" buffer and it will exit with "DM_BUFFER_FULL_FLAG" if it
overflows the "needed" space - in this case, userspace will allocate a
larger buffer and retry.
Note that there is also a bug in list_version_get_needed - we need to add
"strlen(tt->name) + 1" to the needed size, not "strlen(tt->name)".
The 'no_sleep_enabled' should be decreased in error handling path
in dm_bufio_client_create() when the DM_BUFIO_CLIENT_NO_SLEEP flag
is set, otherwise static_branch_unlikely() will always return true
even if no dm_bufio_client instances have DM_BUFIO_CLIENT_NO_SLEEP
flag set.
When using multiple instances of this driver the compensation PROM was
overwritten by the last initialized sensor. Now each sensor has own PROM
storage.
Signed-off-by: Mitja Spes <mitja@lxnav.com> Fixes: 9690d81a02dc ("iio: pressure: ms5611: add support for MS5607 temperature and pressure sensor") Link: https://lore.kernel.org/r/20221021135827.1444793-2-mitja@lxnav.com Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
dev_set_name() allocates memory for name, it need be freed
when device_add() fails, call put_device() to give up the
reference that hold in device_initialize(), so that it can
be freed in kobject_cleanup() when the refcount hit to 0.
If iio_trigger_register() returns error, it should call iio_trigger_free()
to give up the reference that hold in iio_trigger_alloc(), so that it can
call iio_trig_release() to free memory when the refcount hit to 0.
The regulator enables were after the check on the chip variant, which was
very unlikely to return a correct value when not powered.
Presumably all the device anyone is testing on have a regulator that
is already powered up when this code runs for reasons beyond the scope
of this driver. Move the read call down a few lines.
If reading TPS_REG_INT_EVENT1/2 fails in the interrupt handler event1
and event2 may be uninitialized when they are used to determine
IRQ_HANDLED vs. IRQ_NONE in the error path.
Fixes: c7260e29dd20 ("usb: typec: tipd: Add short-circuit for no irqs") Fixes: 45188f27b3d0 ("usb: typec: tipd: Add support for Apple CD321X") Cc: stable <stable@kernel.org> Signed-off-by: Sven Peter <sven@svenpeter.dev> Reviewed-by: Eric Curtin <ecurtin@redhat.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Guido Günther <agx@sigxcpu.org> Link: https://lore.kernel.org/r/20221102161542.30669-1-sven@svenpeter.dev Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is no point to enter safe mode during DP/TBT configuration
if the DP/TBT was already configured in mux. This is because safe
mode is only applicable when there is a need to reconfigure the
pins in order to avoid damage within/to port partner.
In some chrome systems, IOM/mux is already configured before OS
comes up. Thus, when driver is probed, it blindly enters safe
mode due to PD negotiations but only after gfx driver lowers
dp_phy_ownership, will the IOM complete safe mode and send an
ack to PMC.
Since, that never happens, we see IPC timeout.
Hence, allow safe mode only when pin reconfiguration is not
required, which makes sense.
Fixes: 43d596e32276 ("usb: typec: intel_pmc_mux: Check the port status before connect") Cc: stable <stable@kernel.org> Signed-off-by: Rajat Khandelwal <rajat.khandelwal@linux.intel.com> Signed-off-by: Lee Shawn C <shawn.c.lee@intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20221024171611.181468-1-rajat.khandelwal@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When usb 3.0 hub connect with one USB 2.0 device and NO USB 3.0 device,
some usb hub reports endless port reset message.
[ 190.324169] usb 2-1: new SuperSpeed USB device number 88 using xhci-hcd
[ 190.352834] hub 2-1:1.0: USB hub found
[ 190.356995] hub 2-1:1.0: 4 ports detected
[ 190.700056] usb 2-1: USB disconnect, device number 88
[ 192.472139] usb 2-1: new SuperSpeed USB device number 89 using xhci-hcd
[ 192.500820] hub 2-1:1.0: USB hub found
[ 192.504977] hub 2-1:1.0: 4 ports detected
[ 192.852066] usb 2-1: USB disconnect, device number 89
The reason is the runtime pm state of USB2.0 port is active and
USB 3.0 port is suspend, so parent device is active state.
So xhci_cdns3_suspend_quirk() have not called. U3 configure is not applied.
move U3 configure into host start. Reinit again in resume function in case
controller power lost during suspend.
Cc: stable@vger.kernel.org 5.10 Signed-off-by: Li Jun <jun.li@nxp.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Peter Chen <peter.chen@kernel.org> Acked-by: Alexander Stein <alexander.stein@ew.tq-group.com> Link: https://lore.kernel.org/r/20221026190749.2280367-1-Frank.Li@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We hold ci->lock in position (1) and use hrtimer_cancel() to
wait ci_otg_hrtimer_func() to stop, but ci_otg_hrtimer_func()
also need ci->lock in position (2). As a result, the
hrtimer_cancel() in ci_otg_del_timer() will be blocked forever.
This patch extracts hrtimer_cancel() from the protection of
spin_lock_irqsave() in order that the ci_otg_hrtimer_func()
could obtain the ci->lock.
What`s more, there will be no race happen. Because the
"next_timer" is always under the protection of
spin_lock_irqsave() and we only check whether "next_timer"
equals to NUM_OTG_FSM_TIMERS in the following code.
Before adding this quirk, this (mechanical keyboard) device would not be
recognized, logging:
new full-speed USB device number 56 using xhci_hcd
unable to read config index 0 descriptor/start: -32
chopping to 0 config(s)
It would take dozens of plugging/unpuggling cycles for the keyboard to
be recognized. Keyboard seems to simply work after applying this quirk.
This issue had been reported by users in two places already ([1], [2])
but nobody tried upstreaming a patch yet. After testing I believe their
suggested fix (DELAY_INIT + NO_LPM + DEVICE_QUALIFIER) was probably a
little overkill. I assume this particular combination was tested because
it had been previously suggested in [3], but only NO_LPM seems
sufficient for this device.
Add LARA-L6 PIDs for three different USB compositions.
LARA-L6 module can be configured (by AT interface) in three different
USB modes:
* Default mode (Vendor ID: 0x1546 Product ID: 0x1341) with 4 serial
interfaces
* RmNet mode (Vendor ID: 0x1546 Product ID: 0x1342) with 4 serial
interfaces and 1 RmNet virtual network interface
* CDC-ECM mode (Vendor ID: 0x1546 Product ID: 0x1343) with 4 serial
interface and 1 CDC-ECM virtual network interface
In default mode LARA-L6 exposes the following interfaces:
If 0: Diagnostic
If 1: AT parser
If 2: AT parser
If 3: AT parser/alternative functions
In RmNet mode LARA-L6 exposes the following interfaces:
If 0: Diagnostic
If 1: AT parser
If 2: AT parser
If 3: AT parset/alternative functions
If 4: RMNET interface
In CDC-ECM mode LARA-L6 exposes the following interfaces:
If 0: Diagnostic
If 1: AT parser
If 2: AT parser
If 3: AT parset/alternative functions
If 4: CDC-ECM interface
Signed-off-by: Davide Tronchin <davide.tronchin.94@gmail.com>
[ johan: drop PID defines in favour of comments ] Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The official LARA-R6 (00B) modem uses 0x908b PID. LARA-R6 00B does not
implement a QMI interface on port 4, the reservation (RSVD(4)) has been
added to meet other companies that implement QMI on that interface.
LARA-R6 00B USB composition exposes the following interfaces:
If 0: Diagnostic
If 1: AT parser
If 2: AT parser
If 3: AT parser/alternative functions
Remove the UBLOX_PRODUCT_R6XX 0x90fa association since LARA-R6 00B final
product uses a new USB composition with different PID. 0x90fa PID used
only by LARA-R6 internal prototypes.
Move 0x90fa PID directly in the option_ids array since used by other
Qualcomm based modem vendors as pointed out in:
Add support for the AT and diag ports, similar to other qualcomm SDX55
modems. In QDL mode, the modem uses a different device ID and support
is provided by qcserial in commit 11c52d250b34 ("USB: serial: qcserial:
add EM9191 QDL support").
What the code does is to not check the return value from
devm_gpiod_get() and then avoid using an erroneous GPIO descriptor
with IS_ERR_OR_NULL().
This will miss real errors from the GPIO core that should not be
ignored, such as probe deferral.
Instead request the GPIO as explicitly optional, which means that
if it doesn't exist, the descriptor returned will be NULL.
Then we can add error handling and also avoid just doing this on
the device tree path, and simplify the site where the optional
GPIO descriptor is used.
There were some problems with cleaning up this GPIO descriptor
use in the past, but this is the proper way to deal with it.
If CONFIG_SLIM_QCOM_NGD_CTRL=y, CONFIG_QCOM_RPROC_COMMON=m, COMPILE_TEST=y,
bulding fails:
drivers/slimbus/qcom-ngd-ctrl.o: In function `qcom_slim_ngd_ctrl_probe':
qcom-ngd-ctrl.c:(.text+0x330): undefined reference to `qcom_register_ssr_notifier'
qcom-ngd-ctrl.c:(.text+0x5fc): undefined reference to `qcom_unregister_ssr_notifier'
drivers/slimbus/qcom-ngd-ctrl.o: In function `qcom_slim_ngd_remove':
qcom-ngd-ctrl.c:(.text+0x90c): undefined reference to `qcom_unregister_ssr_notifier'
Make SLIM_QCOM_NGD_CTRL depends on QCOM_RPROC_COMMON || (COMPILE_TEST && !QCOM_RPROC_COMMON) to fix this.
Fixes: e291691c6977 ("slimbus: qcom-ngd-ctrl: allow compile testing without QCOM_RPROC_COMMON") Cc: stable <stable@kernel.org> Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20221027095904.3388959-1-zhengbin13@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When io_poll_check_events() collides with someone attempting to queue a
task work, it'll spin for one more time. However, it'll continue to use
the mask from the first iteration instead of updating it. For example,
if the first wake up was a EPOLLIN and the second EPOLLOUT, the
userspace will not get EPOLLOUT in time.
Clear the mask for all subsequent iterations to force vfs_poll().
The change breaks device tree based platforms with PHY device and use
usb-role-switch instead of an extcon switch. extcon_find_edev_by_node()
will return EPROBE_DEFER if it can not find a device so probing without
an extcon device will be deferred indefinitely. Fix this by
explicitly checking for usb-role-switch.
At least the out-of-tree USB3 support on Apple silicon based platforms
using dwc3 with tipd USB Type-C and PD controller is affected by this
issue.
Fixes: d182c2e1bc92 ("usb: dwc3: Don't switch OTG -> peripheral if extcon is present") Cc: stable@kernel.org Signed-off-by: Janne Grunau <j@jannau.net> Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Reviewed-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/20221106214804.2814-1-j@jannau.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The offending commit disabled the USB core PHY management as the dwc3
already manages the PHYs in question.
Unfortunately some platforms have started relying on having USB core
also controlling the PHY and this is specifically currently needed on
some Exynos platforms for PHY calibration or connected device may fail
to enumerate.
The PHY calibration was previously handled in the dwc3 driver, but to
work around some issues related to how the dwc3 driver interacts with
xhci (e.g. using multiple drivers) this was moved to USB core by commits 34c7ed72f4f0 ("usb: core: phy: add support for PHY calibration") and a0a465569b45 ("usb: dwc3: remove generic PHY calibrate() calls").
The same PHY obviously should not be controlled from two different
places, which for example do no agree on the PHY mode or power state
during suspend, but as the offending patch was backported to stable,
let's revert it for now.
Samsung Galaxy Book Pro 360 (13" 2021 NP930QBD-ke1US) with codec SSID
144d:c1a6 requires the same workaround for enabling the speaker amp
like other Samsung models with ALC298 codec.
The Samsung Galaxy Book Pro seems to have the same issue as a few
other Samsung laptops, detailed in kernel bug report 207423. Sound from
headphone jack works, but not the built-in speakers.
snd_usbmidi_output_open() has a check of the NULL port with
snd_BUG_ON(). snd_BUG_ON() was used as this shouldn't have happened,
but in reality, the NULL port may be seen when the device gives an
invalid endpoint setup at the descriptor, hence the driver skips the
allocation. That is, the check itself is valid and snd_BUG_ON()
should be dropped from there. Otherwise it's confusing as if it were
a real bug, as recently syzbot stumbled on it.
[Description]
Prefetch calculation loop was not exiting until utilizing all of vstartup if it
failed once. Locals need to be reset on each iteration of the loop.
Reviewed-by: Jun Lei <Jun.Lei@amd.com> Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Dillon Varone <Dillon.Varone@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.0.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
DM maps DRM CRTC degamma to DPP (pre-blending) degamma block, but DCE doesn't
support programmable degamma curve anywhere. Currently, a custom degamma is
accepted by DM but just ignored by DCE driver and degamma correction isn't
actually applied. There is no way to map custom degamma in DCE, therefore, DRM
CRTC degamma property shouldn't be enabled for DCE drivers.
[Why]
dcn314 uses optc2_configure_crc() that wraps
optc1_configure_crc() + set additional registers
not applicable to dcn314.
It's not critical but when used leads to warning like:
WARNING: drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c
Call Trace:
<TASK>
generic_reg_set_ex+0x6d/0xe0 [amdgpu]
optc2_configure_crc+0x60/0x80 [amdgpu]
dc_stream_configure_crc+0x129/0x150 [amdgpu]
amdgpu_dm_crtc_configure_crc_source+0x5d/0xe0 [amdgpu]
[How]
Use optc1_configure_crc() directly
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.0.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Why]
For DCN3.2 and DCN3.21, VBIOS has switch to using v3.0 of the VRAM
info struct. We should read and override the VRAM info in driver with
values provided by VBIOS to support memory downbin cases.
Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com> Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: George Shen <george.shen@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.0.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Why]
Since introduction of patch "Query DPIA HPD status.", link detection at
boot could be accessing DPIA AUX, which will not succeed until
DMUB outbox messaging is enabled and results in below dmesg logs:
[How]
Enable DMUB outbox messaging before link detection at boot time.
Reviewed-by: Wayne Lin <Wayne.Lin@amd.com> Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Stylon Wang <stylon.wang@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.0.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>