Calling btrfs_qgroup_reserve_meta_prealloc from
btrfs_delayed_inode_reserve_metadata can result in flushing delalloc
while holding a transaction and delayed node locks. This is deadlock
prone. In the past multiple commits:
* ae5e070eaca9 ("btrfs: qgroup: don't try to wait flushing if we're
already holding a transaction")
* 6f23277a49e6 ("btrfs: qgroup: don't commit transaction when we already
hold the handle")
Tried to solve various aspects of this but this was always a
whack-a-mole game. Unfortunately those 2 fixes don't solve a deadlock
scenario involving btrfs_delayed_node::mutex. Namely, one thread
can call btrfs_dirty_inode as a result of reading a file and modifying
its atime:
To fully address those cases the complete fix is to never issue any
flushing while holding the transaction or the delayed node lock. This
patch achieves it by calling qgroup_reserve_meta directly which will
either succeed without flushing or will fail and return -EDQUOT. In the
latter case that return value is going to be propagated to
btrfs_dirty_inode which will fallback to start a new transaction. That's
fine as the majority of time we expect the inode will have
BTRFS_DELAYED_NODE_INODE_DIRTY flag set which will result in directly
copying the in-memory state.
Fixes: c53e9653605d ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT") CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[CAUSE]
In try_flush_qgroup(), we assume we don't hold a transaction handle at
all. This is true for data reservation and mostly true for metadata.
Since data space reservation always happens before we start a
transaction, and for most metadata operation we reserve space in
start_transaction().
But there is an exception, btrfs_delayed_inode_reserve_metadata().
It holds a transaction handle, while still trying to reserve extra
metadata space.
When we hit EDQUOT inside btrfs_delayed_inode_reserve_metadata(), we
will join current transaction and commit, while we still have
transaction handle from qgroup code.
[FIX]
Let's check current->journal before we join the transaction.
If current->journal is unset or BTRFS_SEND_TRANS_STUB, it means
we are not holding a transaction, thus are able to join and then commit
transaction.
If current->journal is a valid transaction handle, we avoid committing
transaction and just end it
This is less effective than committing current transaction, as it won't
free metadata reserved space, but we may still free some data space
before new data writes.
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1178634 Fixes: c53e9653605d ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Printing kernel pointers is discouraged because they might leak kernel
memory layout. This fixes smatch warning:
drivers/net/ethernet/xilinx/xilinx_emaclite.c:1191 xemaclite_of_probe() warn:
argument 4 to %08lX specifier is cast from pointer
Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pavel Machek (CIP) <pavel@denx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is because during the quota enable ioctl we lock first the mutex
qgroup_ioctl_lock and then start a transaction, and starting a transaction
acquires a fs freeze semaphore (at the VFS level). However, every other
code path, except for the quota disable ioctl path, we do the opposite:
we start a transaction and then lock the mutex.
So fix this by making the quota enable and disable paths to start the
transaction without having the mutex locked, and then, after starting the
transaction, lock the mutex and check if some other task already enabled
or disabled the quotas, bailing with success if that was the case.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com>
Conflicts:
fs/btrfs/qgroup.c Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a514d63882c3 ("btrfs: qgroup: Commit transaction in advance to
reduce early EDQUOT") tries to reduce the early EDQUOT problems by
checking the qgroup free against threshold and tries to wake up commit
kthread to free some space.
The problem of that mechanism is, it can only free qgroup per-trans
metadata space, can't do anything to data, nor prealloc qgroup space.
Now since we have the ability to flush qgroup space, and implemented
retry-after-EDQUOT behavior, such mechanism can be completely replaced.
So this patch will cleanup such mechanism in favor of
retry-after-EDQUOT.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The state was introduced in commit 4a9d8bdee368 ("Btrfs: make the state
of the transaction more readable"), then in commit 302167c50b32
("btrfs: don't end the transaction for delayed refs in throttle") the
state is completely removed.
So we can just clean up the state since it's only compared but never
set.
Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[PROBLEM]
There are known problem related to how btrfs handles qgroup reserved
space. One of the most obvious case is the the test case btrfs/153,
which do fallocate, then write into the preallocated range.
# btrfs/153 1s ... - output mismatch (see xfstests-dev/results//btrfs/153.out.bad)
# --- tests/btrfs/153.out 2019-10-22 15:18:14.068965341 +0800
# +++ xfstests-dev/results//btrfs/153.out.bad 2020-07-01 20:24:40.730000089 +0800
# @@ -1,2 +1,5 @@
# QA output created by 153
# +pwrite: Disk quota exceeded
# +/mnt/scratch/testfile2: Disk quota exceeded
# +/mnt/scratch/testfile2: Disk quota exceeded
# Silence is golden
# ...
# (Run 'diff -u xfstests-dev/tests/btrfs/153.out xfstests-dev/results//btrfs/153.out.bad' to see the entire diff)
[CAUSE]
Since commit c6887cd11149 ("Btrfs: don't do nocow check unless we have to"),
we always reserve space no matter if it's COW or not.
Such behavior change is mostly for performance, and reverting it is not
a good idea anyway.
For preallcoated extent, we reserve qgroup data space for it already,
and since we also reserve data space for qgroup at buffered write time,
it needs twice the space for us to write into preallocated space.
This leads to the -EDQUOT in buffered write routine.
And we can't follow the same solution, unlike data/meta space check,
qgroup reserved space is shared between data/metadata.
The EDQUOT can happen at the metadata reservation, so doing NODATACOW
check after qgroup reservation failure is not a solution.
[FIX]
To solve the problem, we don't return -EDQUOT directly, but every time
we got a -EDQUOT, we try to flush qgroup space:
- Flush all inodes of the root
NODATACOW writes will free the qgroup reserved at run_dealloc_range().
However we don't have the infrastructure to only flush NODATACOW
inodes, here we flush all inodes anyway.
- Wait for ordered extents
This would convert the preallocated metadata space into per-trans
metadata, which can be freed in later transaction commit.
- Commit transaction
This will free all per-trans metadata space.
Also we don't want to trigger flush multiple times, so here we introduce
a per-root wait list and a new root status, to ensure only one thread
starts the flushing.
Fixes: c6887cd11149 ("Btrfs: don't do nocow check unless we have to") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[PROBLEM]
Before this patch, when btrfs_qgroup_reserve_data() fails, we free all
reserved space of the changeset.
For example:
ret = btrfs_qgroup_reserve_data(inode, changeset, 0, SZ_1M);
ret = btrfs_qgroup_reserve_data(inode, changeset, SZ_1M, SZ_1M);
ret = btrfs_qgroup_reserve_data(inode, changeset, SZ_2M, SZ_1M);
If the last btrfs_qgroup_reserve_data() failed, it will release the
entire [0, 3M) range.
This behavior is kind of OK for now, as when we hit -EDQUOT, we normally
go error handling and need to release all reserved ranges anyway.
But this also means the following call is not possible:
ret = btrfs_qgroup_reserve_data();
if (ret == -EDQUOT) {
/* Do something to free some qgroup space */
ret = btrfs_qgroup_reserve_data();
}
As if the first btrfs_qgroup_reserve_data() fails, it will free all
reserved qgroup space.
[CAUSE]
This is because we release all reserved ranges when
btrfs_qgroup_reserve_data() fails.
[FIX]
This patch will implement a new function, qgroup_unreserve_range(), to
iterate through the ulist nodes, to find any nodes in the failure range,
and remove the EXTENT_QGROUP_RESERVED bits from the io_tree, and
decrease the extent_changeset::bytes_changed, so that we can revert to
previous state.
This allows later patches to retry btrfs_qgroup_reserve_data() if EDQUOT
happens.
Suggested-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When registering new ppp interface via PPPIOCNEWUNIT ioctl then kernel has
to choose interface name as this ioctl API does not support specifying it.
Kernel in this case register new interface with name "ppp<id>" where <id>
is the ppp unit id, which can be obtained via PPPIOCGUNIT ioctl. This
applies also in the case when registering new ppp interface via rtnl
without supplying IFLA_IFNAME.
PPPIOCNEWUNIT ioctl allows to specify own ppp unit id which will kernel
assign to ppp interface, in case this ppp id is not already used by other
ppp interface.
In case user does not specify ppp unit id then kernel choose the first free
ppp unit id. This applies also for case when creating ppp interface via
rtnl method as it does not provide a way for specifying own ppp unit id.
If some network interface (does not have to be ppp) has name "ppp<id>"
with this first free ppp id then PPPIOCNEWUNIT ioctl or rtnl call fails.
And registering new ppp interface is not possible anymore, until interface
which holds conflicting name is renamed. Or when using rtnl method with
custom interface name in IFLA_IFNAME.
As list of allocated / used ppp unit ids is not possible to retrieve from
kernel to userspace, userspace has no idea what happens nor which interface
is doing this conflict.
So change the algorithm how ppp unit id is generated. And choose the first
number which is not neither used as ppp unit id nor in some network
interface with pattern "ppp<id>".
This issue can be simply reproduced by following pppd call when there is no
ppp interface registered and also no interface with name pattern "ppp<id>":
pppd ifname ppp1 +ipv6 noip noauth nolock local nodetach pty "pppd +ipv6 noip noauth nolock local nodetach notty"
Or by creating the one ppp interface (which gets assigned ppp unit id 0),
renaming it to "ppp1" and then trying to create a new ppp interface (which
will always fails as next free ppp unit id is 1, but network interface with
name "ppp1" exists).
This patch fixes above described issue by generating new and new ppp unit
id until some non-conflicting id with network interfaces is generated.
Signed-off-by: Pali Rohár <pali@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kunpeng920's EHCI controller does not have SBRN register.
Reading the SBRN register when the controller driver is
initialized will get 0.
When rebooting the EHCI driver, ehci_shutdown() will be called.
if the sbrn flag is 0, ehci_shutdown() will return directly.
The sbrn flag being 0 will cause the EHCI interrupt signal to
not be turned off after reboot. this interrupt that is not closed
will cause an exception to the device sharing the interrupt.
Therefore, the EHCI controller of Kunpeng920 needs to skip
the read operation of the SBRN register.
When computing the access permissions of a shadow page, use the effective
permissions of the walk up to that point, i.e. the logic AND of its parents'
permissions. Two guest PxE entries that point at the same table gfn need to
be shadowed with different shadow pages if their parents' permissions are
different. KVM currently uses the effective permissions of the last
non-leaf entry for all non-leaf entries. Because all non-leaf SPTEs have
full ("uwx") permissions, and the effective permissions are recorded only
in role.access and merged into the leaves, this can lead to incorrect
reuse of a shadow page and eventually to a missing guest protection page
fault.
pud1 and pud2 point to the same pmd table, so:
- ptr1 and ptr3 points to the same page.
- ptr2 and ptr4 points to the same page.
(pud1 and pud2 here are pud entries, while pmd1 and pmd2 here are pmd entries)
- First, the guest reads from ptr1 first and KVM prepares a shadow
page table with role.access=u--, from ptr1's pud1 and ptr1's pmd1.
"u--" comes from the effective permissions of pgd, pud1 and
pmd1, which are stored in pt->access. "u--" is used also to get
the pagetable for pud1, instead of "uw-".
- Then the guest writes to ptr2 and KVM reuses pud1 which is present.
The hypervisor set up a shadow page for ptr2 with pt->access is "uw-"
even though the pud1 pmd (because of the incorrect argument to
kvm_mmu_get_page in the previous step) has role.access="u--".
- Then the guest reads from ptr3. The hypervisor reuses pud1's
shadow pmd for pud2, because both use "u--" for their permissions.
Thus, the shadow pmd already includes entries for both pmd1 and pmd2.
- At last, the guest writes to ptr4. This causes no vmexit or pagefault,
because pud1's shadow page structures included an "uw-" page even though
its role.access was "u--".
Any kind of shared pagetable might have the similar problem when in
virtual machine without TDP enabled if the permissions are different
from different ancestors.
In order to fix the problem, we change pt->access to be an array, and
any access in it will not include permissions ANDed from child ptes.
The test code is: https://lore.kernel.org/kvm/20210603050537.19605-1-jiangshanlai@gmail.com/
Remember to test it with TDP disabled.
The problem had existed long before the commit 41074d07c78b ("KVM: MMU:
Fix inherited permissions for emulated guest pte updates"), and it
is hard to find which is the culprit. So there is no fixes tag here.
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20210603052455.21023-1-jiangshanlai@gmail.com> Cc: stable@vger.kernel.org Fixes: cea0f0e7ea54 ("[PATCH] KVM: MMU: Shadow page table caching") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[OP: - apply arch/x86/kvm/mmu/* changes to arch/x86/kvm
- apply documentation changes to Documentation/virt/kvm/mmu.txt
- adjusted context in arch/x86/kvm/paging_tmpl.h] Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the device is already in the runtime suspended state, any call to
the pullup routine will issue a runtime resume on the DWC3 core
device. If the USB gadget is disabling the pullup, then avoid having
to issue a runtime resume, as DWC3 gadget has already been
halted/stopped.
This fixes an issue where the following condition occurs:
This leads to a situation where the DWC3 gadget is not properly
stopped, as the runtime resume would have re-enabled EP0 and event
interrupts, and since we avoided the DWC3 gadget suspend, these
resources were never disabled.
Current sequence utilizes dwc3_gadget_disable_irq() alongside
synchronize_irq() to ensure that no further DWC3 events are generated.
However, the dwc3_gadget_disable_irq() API only disables device
specific events. Endpoint events can still be generated. Briefly
disable the interrupt line, so that the cleanup code can run to
prevent device and endpoint events. (i.e. __dwc3_gadget_stop() and
dwc3_stop_active_transfers() respectively)
Without doing so, it can lead to both the interrupt handler and the
pullup disable routine both writing to the GEVNTCOUNT register, which
will cause an incorrect count being read from future interrupts.
Ensure that dep->flags are cleared until after stop active transfers
is completed. Otherwise, the ENDXFER command will not be executed
during ep disable.
Fixes: f09ddcfcb8c5 ("usb: dwc3: gadget: Prevent EP queuing while stopping transfers") Cc: stable <stable@vger.kernel.org> Reported-and-tested-by: Andy Shevchenko <andy.shevchenko@gmail.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org> Link: https://lore.kernel.org/r/1616610664-16495-1-git-send-email-wcheng@codeaurora.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the situations where the DWC3 gadget stops active transfers, once
calling the dwc3_gadget_giveback(), there is a chance where a function
driver can queue a new USB request in between the time where the dwc3
lock has been released and re-aquired. This occurs after we've already
issued an ENDXFER command. When the stop active transfers continues
to remove USB requests from all dep lists, the newly added request will
also be removed, while controller still has an active TRB for it.
This can lead to the controller accessing an unmapped memory address.
Fix this by ensuring parameters to prevent EP queuing are set before
calling the stop active transfers API.
Fixes: ae7e86108b12 ("usb: dwc3: Stop active transfers before halting the controller") Signed-off-by: Wesley Cheng <wcheng@codeaurora.org> Link: https://lore.kernel.org/r/1615507142-23097-1-git-send-email-wcheng@codeaurora.org Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
usb_gadget_deactivate/usb_gadget_activate does not execute the UDC start
operation, which may leave EP0 disabled and event IRQs disabled when
re-activating the function. Move the enabling/disabling of USB EP0 and
device event IRQs to be performed in the pullup routine.
Fixes: ae7e86108b12 ("usb: dwc3: Stop active transfers before halting the controller") Tested-by: Michael Tretter <m.tretter@pengutronix.de> Cc: stable <stable@vger.kernel.org> Reported-by: Michael Tretter <m.tretter@pengutronix.de> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org> Link: https://lore.kernel.org/r/1609282837-21666-1-git-send-email-wcheng@codeaurora.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The DWC3 runtime suspend routine checks for the USB connected parameter to
determine if the controller can enter into a low power state. The
connected state is only set to false after receiving a disconnect event.
However, in the case of a device initiated disconnect (i.e. UDC unbind),
the controller is halted and a disconnect event is never generated. Set
the connected flag to false if issuing a device initiated disconnect to
allow the controller to be suspended.
In the DWC3 databook, for a device initiated disconnect or bus reset, the
driver is required to send dependxfer commands for any pending transfers.
In addition, before the controller can move to the halted state, the SW
needs to acknowledge any pending events. If the controller is not halted
properly, there is a chance the controller will continue accessing stale or
freed TRBs and buffers.
Since the string type can not be the target of the addition / subtraction
operation, it must be rejected. Without this fix, the string type silently
converted to digits.
Link: https://lkml.kernel.org/r/162742654278.290973.1523000673366456634.stgit@devnote2 Cc: stable@vger.kernel.org Fixes: 100719dcef447 ("tracing: Add simple expression support to hist triggers") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If poll() is called on a m2m device with the EPOLLOUT event after the
last buffer of the CAPTURE queue is dequeued, any buffer available on
OUTPUT queue will never be signaled because v4l2_m2m_poll_for_data()
starts by checking whether dst_q->last_buffer_dequeued is set and
returns EPOLLIN in this case, without looking at the state of the OUTPUT
queue.
Fix this by not early returning so we keep checking the state of the
OUTPUT queue afterwards.
Currently TEE_SHM_DMA_BUF flag has been inappropriately used to not
register shared memory allocated for private usage by underlying TEE
driver: OP-TEE in this case. So rather add a new flag as TEE_SHM_PRIV
that can be utilized by underlying TEE drivers for private allocation
and usage of shared memory.
With this corrected, allow tee_shm_alloc_kernel_buf() to allocate a
shared memory region without the backing of dma-buf.
Use the raw ASID, not ASID-1, when nullifying the last used VMCB when
freeing an SEV ASID. The consumer, pre_sev_run(), indexes the array by
the raw ASID, thus KVM could get a false negative when checking for a
different VMCB if KVM manages to reallocate the same ASID+VMCB combo for
a new VM.
Note, this cannot cause a functional issue _in the current code_, as
pre_sev_run() also checks which pCPU last did VMRUN for the vCPU, and
last_vmentry_cpu is initialized to -1 during vCPU creation, i.e. is
guaranteed to mismatch on the first VMRUN. However, prior to commit 8a14fe4f0c54 ("kvm: x86: Move last_cpu into kvm_vcpu_arch as
last_vmentry_cpu"), SVM tracked pCPU on its own and zero-initialized the
last_cpu variable. Thus it's theoretically possible that older versions
of KVM could miss a TLB flush if the first VMRUN is on pCPU0 and the ASID
and VMCB exactly match those of a prior VM.
Fixes: 70cd94e60c73 ("KVM: SVM: VMRUN should use associated ASID when SEV is enabled") Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Due to inconsistencies in the way we manipulate compat GPRs, we have a
few issues today:
* For audit and tracing, where error codes are handled as a (native)
long, negative error codes are expected to be sign-extended to the
native 64-bits, or they may fail to be matched correctly. Thus a
syscall which fails with an error may erroneously be identified as
failing.
* For ptrace, *all* compat return values should be sign-extended for
consistency with 32-bit arm, but we currently only do this for
negative return codes.
* As we may transiently set the upper 32 bits of some compat GPRs while
in the kernel, these can be sampled by perf, which is somewhat
confusing. This means that where a syscall returns a pointer above 2G,
this will be sign-extended, but will not be mistaken for an error as
error codes are constrained to the inclusive range [-4096, -1] where
no user pointer can exist.
To fix all of these, we must consistently use helpers to get/set the
compat GPRs, ensuring that we never write the upper 32 bits of the
return code, and always sign-extend when reading the return code. This
patch does so, with the following changes:
* We re-organise syscall_get_return_value() to always sign-extend for
compat tasks, and reimplement syscall_get_error() atop. We update
syscall_trace_exit() to use syscall_get_return_value().
* We consistently use syscall_set_return_value() to set the return
value, ensureing the upper 32 bits are never set unexpectedly.
* As the core audit code currently uses regs_return_value() rather than
syscall_get_return_value(), we special-case this for
compat_user_mode(regs) such that this will do the right thing. Going
forward, we should try to move the core audit code over to
syscall_get_return_value().
Cc: <stable@vger.kernel.org> Reported-by: He Zhe <zhe.he@windriver.com> Reported-by: weiyuchen <weiyuchen3@huawei.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20210802104200.21390-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
[Mark: trivial conflict resolution for v5.4.y] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When calling the 'ql_wait_for_drvr_lock' and 'ql_adapter_reset', the driver
has already acquired the spin lock, so the driver should not call 'ssleep'
in atomic context.
This bug can be fixed by using 'mdelay' instead of 'ssleep'.
Reported-by: Letu Ren <fantasquex@gmail.com> Signed-off-by: Letu Ren <fantasquex@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
This issue was noticed while debugging a shutdown issue where some
secondary CPUs are not being shutdown correctly. A fix for that [1] requires
that secondary cpus be offlined using the cpu_online_mask so that the
stop operation is a no-op if CPU HOTPLUG is disabled. I, like the author in
[1] looked at the architectures and found that alpha is one of two
architectures that executes smp_send_stop() on all possible CPUs.
On alpha, smp_send_stop() sends an IPI to all possible CPUs but only needs
to send them to online CPUs.
Send the stop IPI to only the online CPUs.
[1] https://lkml.org/lkml/2020/1/10/250
Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Add the fake bss just before the connect so that cfg80211_get_bss()
finds the virtual network.
As some code was duplicated, move it in a common function.
While verifying the leaf item that we read from the disk, reiserfs
doesn't check the directory items, this could cause a crash when we
read a directory item from the disk that has an invalid deh_location.
This patch adds a check to the directory items read from the disk that
does a bounds check on deh_location for the directory entries. Any
directory entry header with a directory entry offset greater than the
item length is considered invalid.
This is because 'root_inode' is initialized with wrong mode, and
it's i_op is set to 'reiserfs_special_inode_operations'. Thus add
check for 'root_inode' to fix the problem.
Data transfers are not required to be block aligned in memory, so they
span two pages. Fix this by splitting the call to >sff_data_xfer into
two for that case.
This has been broken since the initial libata import before the damn
of git, but was uncovered by the legacy ide driver removal.
Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210709130237.3730959-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
Given we don't need to simulate the speculative domain for registers with
immediates anymore since the verifier uses direct imm-based rewrites instead
of having to mask, we can also lift a few cases that were previously rejected.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
[OP: backport to 5.4, small context adjustment in stack_ptr.c] Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sparse reports a compile time warning when dereferencing an
__iomem pointer:
drivers/soc/ixp4xx/ixp4xx-qmgr.c:149:37: warning: dereference of noderef expression
drivers/soc/ixp4xx/ixp4xx-qmgr.c:153:40: warning: dereference of noderef expression
drivers/soc/ixp4xx/ixp4xx-qmgr.c:154:40: warning: dereference of noderef expression
drivers/soc/ixp4xx/ixp4xx-qmgr.c:174:38: warning: dereference of noderef expression
drivers/soc/ixp4xx/ixp4xx-qmgr.c:174:44: warning: dereference of noderef expression
Use __raw_readl() here for consistency with the rest of the file.
This should really get converted to some proper accessor, as the
__raw functions are not meant to be used in drivers, but the driver
has used these since the start, so for the moment, let's only fix
the warning.
Reported-by: kernel test robot <lkp@intel.com> Fixes: d4c9e9fc9751 ("IXP42x: Add QMgr support for IXP425 rev. A0 processors.") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In meson_spicc_probe, the error handling code needs to clean up master
by calling spi_master_put, but the remove function does not have this
function call. This will lead to memory leak of spicc->master.
Reported-by: Dongliang Mu <mudongliangabcd@gmail.com> Fixes: 454fa271bc4e("spi: Add Meson SPICC driver") Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Link: https://lore.kernel.org/r/20210720100116.1438974-1-mudongliangabcd@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When compile-testing with 64-bit resource_size_t, gcc reports an invalid
printk format string:
In file included from include/linux/dma-mapping.h:7,
from drivers/soc/ixp4xx/ixp4xx-npe.c:15:
drivers/soc/ixp4xx/ixp4xx-npe.c: In function 'ixp4xx_npe_probe':
drivers/soc/ixp4xx/ixp4xx-npe.c:694:18: error: format '%x' expects argument of type 'unsigned int', but argument 4 has type 'resource_size_t' {aka 'long long unsigned int'} [-Werror=format=]
dev_info(dev, "NPE%d at 0x%08x-0x%08x not available\n",
Use the special %pR format string to print the resources.
We can avoid the expensive ISB instruction after reading the counter in
the vDSO gettime functions by creating a fake address hazard against a
dummy stack read, just like we do inside the kernel.
Take a signed 'long' instead of an 'unsigned long' for the number of
pages to add/subtract to the total number of pages used by the MMU. This
fixes a zero-extension bug on 32-bit kernels that effectively corrupts
the per-cpu counter used by the shrinker.
Per-cpu counters take a signed 64-bit value on both 32-bit and 64-bit
kernels, whereas kvm_mod_used_mmu_pages() takes an unsigned long and thus
an unsigned 32-bit value on 32-bit kernels. As a result, the value used
to adjust the per-cpu counter is zero-extended (unsigned -> signed), not
sign-extended (signed -> signed), and so KVM's intended -1 gets morphed to 4294967295 and effectively corrupts the counter.
This was found by a staggering amount of sheer dumb luck when running
kvm-unit-tests on a 32-bit KVM build. The shrinker just happened to kick
in while running tests and do_shrink_slab() logged an error about trying
to free a negative number of objects. The truly lucky part is that the
kernel just happened to be a slightly stale build, as the shrinker no
longer yells about negative objects as of commit 18bb473e5031 ("mm:
vmscan: shrink deferred objects proportional to priority").
vmscan: shrink_slab: mmu_shrink_scan+0x0/0x210 [kvm] negative objects to delete nr=-858993460
Fixes: bc8a3d8925a8 ("kvm: mmu: Fix overflow on kvm mmu page limit calculation") Cc: stable@vger.kernel.org Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210804214609.1096003-1-seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
KVM creates a debugfs directory for each VM in order to store statistics
about the virtual machine. The directory name is built from the process
pid and a VM fd. While generally unique, it is possible to keep a
file descriptor alive in a way that causes duplicate directories, which
manifests as these messages:
[ 471.846235] debugfs: Directory '20245-4' with parent 'kvm' already present!
Even though this should not happen in practice, it is more or less
expected in the case of KVM for testcases that call KVM_CREATE_VM and
close the resulting file descriptor repeatedly and in parallel.
When this happens, debugfs_create_dir() returns an error but
kvm_create_vm_debugfs() goes on to allocate stat data structs which are
later leaked. The slow memory leak was spotted by syzkaller, where it
caused OOM reports.
Since the issue only affects debugfs, do a lookup before calling
debugfs_create_dir, so that the message is downgraded and rate-limited.
While at it, ensure kvm->debugfs_dentry is NULL rather than an error
if it is not created. This fixes kvm_destroy_vm_debugfs, which was not
checking IS_ERR_OR_NULL correctly.
Cc: stable@vger.kernel.org Fixes: 536a6f88c49d ("KVM: Create debugfs dir and stat files for each VM") Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Once an exception has been injected, any side effects related to
the exception (such as setting CR2 or DR6) have been taked place.
Therefore, once KVM sets the VM-entry interruption information
field or the AMD EVENTINJ field, the next VM-entry must deliver that
exception.
Pending interrupts are processed after injected exceptions, so
in theory it would not be a problem to use KVM_INTERRUPT when
an injected exception is present. However, DOSEMU is using
run->ready_for_interrupt_injection to detect interrupt windows
and then using KVM_SET_SREGS/KVM_SET_REGS to inject the
interrupt manually. For this to work, the interrupt window
must be delayed after the completion of the previous event
injection.
Similar to [1], this patch fixes the same bug in raid10. Also cleanup the
comments.
[1] commit 2417b9869b81 ("md/raid1: properly indicate failure when ending
a failed write request") Cc: stable@vger.kernel.org Fixes: 7cee6d4e6035 ("md/raid10: end bio when the device faulty") Signed-off-by: Wei Shuyu <wsy@dogben.com> Acked-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
During the driver loading process, the 'dev' field was not assigned, but
the 'dev' field was referenced in the subsequent 'i82092aa_set_mem_map'
function.
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> CC: <stable@vger.kernel.org>
[linux@dominikbrodowski.net: shorten commit message, add Cc to stable] Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
syzbot reported KCSAN data races vs. timer_base::timer_running being set to
NULL without holding base::lock in expire_timers().
This looks innocent and most reads are clearly not problematic, but
Frederic identified an issue which is:
int data = 0;
void timer_func(struct timer_list *t)
{
data = 1;
}
CPU 0 CPU 1
------------------------------ --------------------------
base = lock_timer_base(timer, &flags); raw_spin_unlock(&base->lock);
if (base->running_timer != timer) call_timer_fn(timer, fn, baseclk);
ret = detach_if_pending(timer, base, true); base->running_timer = NULL;
raw_spin_unlock_irqrestore(&base->lock, flags); raw_spin_lock(&base->lock);
x = data;
If the timer has previously executed on CPU 1 and then CPU 0 can observe
base->running_timer == NULL and returns, assuming the timer has completed,
but it's not guaranteed on all architectures. The comment for
del_timer_sync() makes that guarantee. Moving the assignment under
base->lock prevents this.
For non-RT kernel it's performance wise completely irrelevant whether the
store happens before or after taking the lock. For an RT kernel moving the
store under the lock requires an extra unlock/lock pair in the case that
there is a waiter for the timer, but that's not the end of the world.
Reported-by: syzbot+aa7c2385d46c5eba0b89@syzkaller.appspotmail.com Reported-by: syzbot+abea4558531bae1ba9fe@syzkaller.appspotmail.com Fixes: 030dcdd197d7 ("timers: Prepare support for PREEMPT_RT") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/87lfea7gw8.fsf@nanos.tec.linutronix.de Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This attempts to fix a bug found with a serial port card which uses
an MCS9922 chip, one of the 4 models for which MSI-X interrupts are
currently supported. I don't possess such a card, and i'm not
experienced with the serial subsystem, so this patch is based on what
i think i found as a likely reason for failure, based on walking the
user who actually owns the card through some diagnostic.
The user who reported the problem finds the following in his dmesg
output for the relevant ttyS4 and ttyS5:
[ 0.580425] serial 0000:02:00.0: enabling device (0000 -> 0003)
[ 0.601448] 0000:02:00.0: ttyS4 at I/O 0x3010 (irq = 125, base_baud = 115200) is a ST16650V2
[ 0.603089] serial 0000:02:00.1: enabling device (0000 -> 0003)
[ 0.624119] 0000:02:00.1: ttyS5 at I/O 0x3000 (irq = 126, base_baud = 115200) is a ST16650V2
...
[ 6.323784] genirq: Flags mismatch irq 128. 00000080 (ttyS5) vs. 00000000 (xhci_hcd)
[ 6.324128] genirq: Flags mismatch irq 128. 00000080 (ttyS5) vs. 00000000 (xhci_hcd)
...
This suggests to me that the serial driver wants to register and share a
MSI/MSI-X irq 128 with the xhci_hcd driver, whereas the xhci driver does
not want to share the irq, as flags 0x00000080 (== IRQF_SHARED) from the
serial port driver means to share the irq, and this mismatch ends in some
failed irq init?
With this setup, data reception works very unreliable, with dropped data,
already at a transmission rate of only a 16 Bytes chunk every 1/120th of
a second, ie. 1920 Bytes/sec, presumably due to rx fifo overflow due to
mishandled or not used at all rx irq's?
See full discussion thread with attempted diagnosis at:
Disabling the use of MSI interrupts for the serial port pci card did
fix the reliability problems. The user executed the following sequence
of commands to achieve this:
echo 0000:02:00.0 | sudo tee /sys/bus/pci/drivers/serial/unbind
echo 0000:02:00.1 | sudo tee /sys/bus/pci/drivers/serial/unbind
echo 0 | sudo tee /sys/bus/pci/devices/0000:02:00.0/msi_bus
echo 0 | sudo tee /sys/bus/pci/devices/0000:02:00.1/msi_bus
echo 0000:02:00.0 | sudo tee /sys/bus/pci/drivers/serial/bind
echo 0000:02:00.1 | sudo tee /sys/bus/pci/drivers/serial/bind
This resulted in the following log output:
[ 82.179021] pci 0000:02:00.0: MSI/MSI-X disallowed for future drivers
[ 87.003031] pci 0000:02:00.1: MSI/MSI-X disallowed for future drivers
[ 98.537010] 0000:02:00.0: ttyS4 at I/O 0x3010 (irq = 17, base_baud = 115200) is a ST16650V2
[ 103.648124] 0000:02:00.1: ttyS5 at I/O 0x3000 (irq = 18, base_baud = 115200) is a ST16650V2
This patch attempts to fix the problem by disabling irq sharing when
using MSI irq's. Note that all i know for sure is that disabling MSI
irq's fixed the problem for the user, so this patch could be wrong and
is untested. Please review with caution, keeping this in mind.
Elkhart Lake UARTs are PCI enumerated Synopsys DesignWare v4.0+ UART
integrated with Intel iDMA 32-bit DMA controller. There is a specific
driver to handle them, i.e. 8250_lpss. Hence, disable 8250_pci
enumeration for these UARTs.
Fixes: 1b91d97c66ef ("serial: 8250_lpss: Add ->setup() for Elkhart Lake ports") Fixes: 4f912b898dc2 ("serial: 8250_lpss: Enable HS UART on Elkhart Lake") Cc: stable <stable@vger.kernel.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20210713101739.36962-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Correct big-endian accesses to the CBUS UART, a Malta on-board discrete
TI16C550C part wired directly to the system controller's device bus, and
do not use byte swapping with the 32-bit accesses to the device.
The CBUS is used for devices such as the boot flash memory needed early
on in system bootstrap even before PCI has been initialised. Therefore
it uses the system controller's device bus, which follows the endianness
set with the CPU, which means no byte-swapping is ever required for data
accesses to CBUS, unlike with PCI.
The CBUS UART uses the UPIO_MEM32 access method, that is the `readl' and
`writel' MMIO accessors, which on the MIPS platform imply byte-swapping
with PCI systems. Consequently the wrong byte lane is accessed with the
big-endian configuration and the UART is not correctly accessed.
As it happens the UPIO_MEM32BE access method makes use of the `ioread32'
and `iowrite32' MMIO accessors, which still use `readl' and `writel'
respectively, however they byte-swap data passed, effectively cancelling
swapping done with the accessors themselves and making it suitable for
the CBUS UART.
Make the CBUS UART switch between UPIO_MEM32 and UPIO_MEM32BE then,
based on the endianness selected. With this change in place the device
is correctly recognised with big-endian Malta at boot, along with the
Super I/O devices behind PCI:
Serial: 8250/16550 driver, 5 ports, IRQ sharing enabled
printk: console [ttyS0] disabled
serial8250.0: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
printk: console [ttyS0] enabled
printk: bootconsole [uart8250] disabled
serial8250.0: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A
serial8250.0: ttyS2 at MMIO 0x1f000900 (irq = 20, base_baud = 230400) is a 16550A
Fixes: e7c4782f92fc ("[MIPS] Put an end to <asm/serial.h>'s long and annyoing existence") Cc: stable@vger.kernel.org # v2.6.23+ Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Link: https://lore.kernel.org/r/alpine.DEB.2.21.2106260524430.37803@angie.orcam.me.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Make sure only actual 8 bits of the IIR register are used in determining
the port type in `autoconfig'.
The `serial_in' port accessor returns the `unsigned int' type, meaning
that with UPIO_AU, UPIO_MEM16, UPIO_MEM32, and UPIO_MEM32BE access types
more than 8 bits of data are returned, of which the high order bits will
often come from bus lines that are left floating in the data phase. For
example with the MIPS Malta board's CBUS UART, where the registers are
aligned on 8-byte boundaries and which uses 32-bit accesses, data as
follows is returned:
Evidently high-order 24 bits return values previously driven in the
address phase (the 3 highest order address bits used with the command
above are masked out in the simple virtual address mapping used here and
come out at zeros on the external bus), a common scenario with bus lines
left floating, due to bus capacitance.
Consequently when the value of IIR, mapped at 0x1f000910, is retrieved
in `autoconfig', it comes out at 0x1f0009c1 and when it is right-shifted
by 6 and then assigned to 8-bit `scratch' variable, the value calculated
is 0x27, not one of 0, 1, 2, 3 expected in port type determination.
Fix the issue then, by assigning the value returned from `serial_in' to
`scratch' first, which masks out 24 high-order bits retrieved, and only
then right-shift the resulting 8-bit data quantity, producing the value
of 3 in this case, as expected. Fix the same issue in `serial_dl_read'.
The problem first appeared with Linux 2.6.9-rc3 which predates our repo
history, but the origin could be identified with the old MIPS/Linux repo
also at: <git://git.kernel.org/pub/scm/linux/kernel/git/ralf/linux.git>
as commit e0d2356c0777 ("Merge with Linux 2.6.9-rc3."), where code in
`serial_in' was updated with this case:
+ case UPIO_MEM32:
+ return readl(up->port.membase + offset);
+
which made it produce results outside the unsigned 8-bit range for the
first time, though obviously it is system dependent what actual values
appear in the high order bits retrieved and it may well have been zeros
in the relevant positions with the system the change originally was
intended for. It is at that point that code in `autoconf' should have
been updated accordingly, but clearly it was overlooked.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org # v2.6.12+ Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Link: https://lore.kernel.org/r/alpine.DEB.2.21.2106260516220.37803@angie.orcam.me.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix uart corruption issue when rx power off.
Add spin lock in mtk8250_dma_rx_complete function in APDMA mode.
when uart is used as a communication port with external device(GPS).
when external device(GPS) power off, the power of rx pin is also from
1.8v to 0v. Even if there is not any data in rx. But uart rx pin can
capture the data "0".
If uart don't receive any data in specified cycle, uart will generates
BI(Break interrupt) interrupt.
If external device(GPS) power off, we found that BI interrupt appeared
continuously and very frequently.
When uart interrupt type is BI, uart IRQ handler(8250 framwork
API:serial8250_handle_irq) will push data to tty buffer.
mtk8250_dma_rx_complete is a task of mtk_uart_apdma_rx_handler.
mtk8250_dma_rx_complete priority is lower than uart irq
handler(serial8250_handle_irq).
if we are in process of mtk8250_dma_rx_complete, uart appear BI
interrupt:1)serial8250_handle_irq will priority execution.2)it may cause
write tty buffer conflict in mtk8250_dma_rx_complete.
So the spin lock protect the rx receive data process is not break.
The Tegra serial driver always prints an error message when enabling the
FIFO for devices that have support for checking the FIFO enable status.
Fix this by displaying the error message, only when an error occurs.
Finally, update the error message to make it clear that enabling the
FIFO failed and display the error code.
Commit b5776e7524af ("ext4: fix potential htree index checksum
corruption) removed a required restart when multiple levels of index
nodes need to be split. Fix this to avoid directory htree corruptions
when using the large_dir feature.
The direction of the pipe argument must match the request-type direction
bit or control requests may fail depending on the host-controller-driver
implementation.
Control transfers without a data stage are treated as OUT requests by
the USB stack and should be using usb_sndctrlpipe(). Failing to do so
will now trigger a warning.
The driver uses a zero-length i2c-read request for type detection so
update the control-request code to use usb_sndctrlpipe() in this case.
Note that actually trying to read the i2c register in question does not
work as the register might not exist (e.g. depending on the demodulator)
as reported by Eero Lehtinen <debiangamer2@gmail.com>.
Reported-by: syzbot+faf11bbadc5a372564da@syzkaller.appspotmail.com Reported-by: Eero Lehtinen <debiangamer2@gmail.com> Tested-by: Eero Lehtinen <debiangamer2@gmail.com> Fixes: d0f232e823af ("[media] rtl28xxu: add heuristic to detect chip type") Cc: stable@vger.kernel.org # 4.0 Cc: Antti Palosaari <crope@iki.fi> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Sean Young <sean@mess.org> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch is preparation for following patch for error handling
refactoring.
flush_scheduled_work() takes (wq_completion)events lock and
it can lead to deadlock when r871xu_dev_remove() is called from workqueue.
To avoid deadlock sutiation we can change flush_scheduled_work() call to
flush_work() call for all possibly scheduled works in this driver,
since next patch adds device_release_driver() in case of fw load failure.
dma-buf backed shared memory cannot be reliably freed and unregistered
during a kexec operation even when tee_shm_free() is called on the shm
from a .shutdown hook. The problem occurs because dma_buf_put() calls
fput() which then uses task_work_add(), with the TWA_RESUME parameter,
to queue tee_shm_release() to be called before the current task returns
to user mode. However, the current task never returns to user mode
before the kexec completes so the memory is never freed nor
unregistered.
Use tee_shm_alloc_kernel_buf() to avoid dma-buf backed shared memory
allocation so that tee_shm_free() can directly call tee_shm_release().
This will ensure that the shm can be freed and unregistered during a
kexec operation.
Adds a new function tee_shm_alloc_kernel_buf() to allocate shared memory
from a kernel driver. This function can later be made more lightweight
by unnecessary dma-buf export.
The shm cache could contain invalid addresses if
optee_disable_shm_cache() was not called from the .shutdown hook of the
previous kernel before a kexec. These addresses could be unmapped or
they could point to mapped but unintended locations in memory.
Clear the shared memory cache, while being careful to not translate the
addresses returned from OPTEE_SMC_DISABLE_SHM_CACHE, during driver
initialization. Once all pre-cache shm objects are removed, proceed with
enabling the cache so that we know that we can handle cached shm objects
with confidence later in the .shutdown hook.
When working on my user space applications, I found a bug in the synthetic
event code where the automated synthetic event field was not matching the
event field calculation it was attached to. Looking deeper into it, it was
because the calculation hist_field was not given a size.
The synthetic event fields are matched to their hist_fields either by
having the field have an identical string type, or if that does not match,
then the size and signed values are used to match the fields.
The problem arose when I tried to match a calculation where the fields
were "unsigned int". My tool created a synthetic event of type "u32". But
it failed to match. The string was:
Adding debugging into the kernel, I found that the size of "diff" was 0.
And since it was given "unsigned int" as a type, the histogram fallback
code used size and signed. The signed matched, but the size of u32 (4) did
not match zero, and the event failed to be created.
This can be worse if the field you want to match is not one of the
acceptable fields for a synthetic event. As event fields can have any type
that is supported in Linux, this can cause an issue. For example, if a
type is an enum. Then there's no way to use that with any calculations.
Have the calculation field simply take on the size of what it is
calculating.
Link: https://lkml.kernel.org/r/20210730171951.59c7743f@oasis.local.home Cc: Tom Zanussi <zanussi@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Fixes: 100719dcef447 ("tracing: Add simple expression support to hist triggers") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit 77271ce4b2c0 ("tracing: Add irq, preempt-count and need resched info
to default trace output"), the default trace output format has been changed to:
<idle>-0 [009] d.h. 22420.068695: _raw_spin_lock_irqsave <-hrtimer_interrupt
<idle>-0 [000] ..s. 22420.068695: _nohz_idle_balance <-run_rebalance_domains
<idle>-0 [011] d.h. 22420.068695: account_process_tick <-update_process_times
The draw_functrace.py(introduced in v2.6.28) can't parse the new version format trace_func,
So we need modify draw_functrace.py to adapt the new version trace output format.
Link: https://lkml.kernel.org/r/20210611022107.608787-1-suhui@zeku.com Cc: stable@vger.kernel.org Fixes: 77271ce4b2c0 tracing: Add irq, preempt-count and need resched info to default trace output Signed-off-by: Hui Su <suhui@zeku.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
clk_bulk_get_all() allocates an array of struct clk_bulk data for us
(unlike clk_bulk_get()), so we need to free it. Let's use the
clk_bulk_put_all() helper.
kmemleak complains, on an RK3399 Gru/Kevin system:
The HNP work can be re-scheduled while it's still in-fly. This results in
re-initialization of the busy work, resetting the hrtimer's list node of
the work and crashing kernel with null dereference within kernel/timer
once work's timer is expired. It's very easy to trigger this problem by
re-plugging USB cable quickly. Initialize HNP work only once to fix this
trouble.
Unable to handle kernel NULL pointer dereference at virtual address 00000126)
...
PC is at __run_timers.part.0+0x150/0x228
LR is at __next_timer_interrupt+0x51/0x9c
...
(__run_timers.part.0) from [<c0187a2b>] (run_timer_softirq+0x2f/0x50)
(run_timer_softirq) from [<c01013ad>] (__do_softirq+0xd5/0x2f0)
(__do_softirq) from [<c012589b>] (irq_exit+0xab/0xb8)
(irq_exit) from [<c0170341>] (handle_domain_irq+0x45/0x60)
(handle_domain_irq) from [<c04c4a43>] (gic_handle_irq+0x6b/0x7c)
(gic_handle_irq) from [<c0100b65>] (__irq_svc+0x65/0xac)
The USB HID standard declares mandatory support for GET_IDLE and SET_IDLE
requests for Boot Keyboard. Most hosts can handle their absence, but others
like some old/strange UEFIs and BIOSes consider this a critical error
and refuse to work with f_hid.
This primitive implementation of saving and returning idle is sufficient
to meet the requirements of the standard and these devices.
For delayed status phase, the usb_gadget->state was set
to USB_STATE_ADDRESS and it has never been updated to
USB_STATE_CONFIGURED.
Patch updates the gadget state to correct USB_STATE_CONFIGURED.
As a result of this bug the controller was not able to enter to
Test Mode while using MSC function.
This use-after-free happens when a fw_priv object has been freed but
hasn't been removed from the pending list (pending_fw_head). The next
time fw_load_sysfs_fallback tries to insert into the list, it ends up
accessing the pending_list member of the previously freed fw_priv.
The root cause here is that all code paths that abort the fw load
don't delete it from the pending list. For example:
To fix this, delete the fw_priv from the list in __fw_set_state() if
the new state is DONE or ABORTED. This way, all aborts will remove
the fw_priv from the list. Accordingly, remove calls to list_del_init
that were being made before calling fw_state_(aborted|done).
Also, in fw_load_sysfs_fallback, don't add the fw_priv to the pending
list if it is already aborted. Instead, just jump out and return early.
Fixes: bcfbd3523f3c ("firmware: fix a double abort case with fw_load_sysfs_fallback") Cc: stable <stable@vger.kernel.org> Reported-by: syzbot+de271708674e2093097b@syzkaller.appspotmail.com Tested-by: syzbot+de271708674e2093097b@syzkaller.appspotmail.com Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Anirudh Rayabharam <mail@anirudhrb.com> Link: https://lore.kernel.org/r/20210728085107.4141-3-mail@anirudhrb.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The only motivation for using -EAGAIN in commit 0542ad88fbdd81bb
("firmware loader: Fix _request_firmware_load() return val for fw load
abort") was to distinguish the error from -ENOMEM, and so there is no
real reason in keeping it. -EAGAIN is typically used to tell the
userspace to try something again and in this case re-using the sysfs
loading interface cannot be retried when a timeout happens, so the
return value is also bogus.
-ETIMEDOUT is received when the wait times out and returning that
is much more telling of what the reason for the failure was. So, just
propagate that instead of returning -EAGAIN.
The chip supports high transfer rates, but with the small default buffers
(64 bytes read), some entire blocks are regularly lost. This typically
happens at 1.5 Mbps (which is the default speed on Rockchip devices) when
used as a console to access U-Boot where the output of the "help" command
misses many lines and where "printenv" mangles the environment.
The FTDI driver doesn't suffer at all from this. One difference is that
it uses 512 bytes rx buffers and 256 bytes tx buffers. Adopting these
values completely resolved the issue, even the output of "dmesg" is
reliable. I preferred to leave the Tx value unchanged as it is not
involved in this issue, while a change could increase the risk of
triggering the same issue with other devices having too small buffers.
I verified that it backports well (and works) at least to 5.4. It's of
low importance enough to be dropped where it doesn't trivially apply
anymore.
rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 1-...!: (2 ticks this GP) idle=d92/1/0x4000000000000000
softirq=25390/25392 fqs=3
(t=12164 jiffies g=31645 q=43226)
rcu: rcu_preempt kthread starved for 12162 jiffies! g31645 f0x0
RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu: Unless rcu_preempt kthread gets sufficient CPU time,
OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:R running task
...........
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: usb_submit_urb failed: -19
The function usbtmc_interrupt() resubmits urbs when the error status
of an urb is -EPROTO. In systems using the dummy_hcd usb controller
this can result in endless interrupt loops when the usbtmc device is
disconnected from the host system.
Since host controller drivers already try to recover from transmission
errors, there is no need to resubmit the urb or try other solutions
to repair the error situation.
In case of errors the INT pipe just stops to wait for further packets.
Fixes: dbf3e7f654c0 ("Implement an ioctl to support the USMTMC-USB488 READ_STATUS_BYTE operation") Cc: stable@vger.kernel.org Reported-by: syzbot+e2eae5639e7203360018@syzkaller.appspotmail.com Signed-off-by: Qiang.zhang <qiang.zhang@windriver.com> Acked-by: Guido Kiener <guido.kiener@rohde-schwarz.com> Link: https://lore.kernel.org/r/20210723004334.458930-1-qiang.zhang@windriver.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
syzbot is hitting might_sleep() warning at hci_sock_dev_event() due to
calling lock_sock() with rw spinlock held [1].
It seems that history of this locking problem is a trial and error.
Commit b40df5743ee8 ("[PATCH] bluetooth: fix socket locking in
hci_sock_dev_event()") in 2.6.21-rc4 changed bh_lock_sock() to
lock_sock() as an attempt to fix lockdep warning.
Then, commit 4ce61d1c7a8e ("[BLUETOOTH]: Fix locking in
hci_sock_dev_event().") in 2.6.22-rc2 changed lock_sock() to
local_bh_disable() + bh_lock_sock_nested() as an attempt to fix the
sleep in atomic context warning.
Then, commit 4b5dd696f81b ("Bluetooth: Remove local_bh_disable() from
hci_sock.c") in 3.3-rc1 removed local_bh_disable().
Then, commit e305509e678b ("Bluetooth: use correct lock to prevent UAF
of hdev object") in 5.13-rc5 again changed bh_lock_sock_nested() to
lock_sock() as an attempt to fix CVE-2021-3573.
This difficulty comes from current implementation that
hci_sock_dev_event(HCI_DEV_UNREG) is responsible for dropping all
references from sockets because hci_unregister_dev() immediately
reclaims resources as soon as returning from
hci_sock_dev_event(HCI_DEV_UNREG).
But the history suggests that hci_sock_dev_event(HCI_DEV_UNREG) was not
doing what it should do.
Therefore, instead of trying to detach sockets from device, let's accept
not detaching sockets from device at hci_sock_dev_event(HCI_DEV_UNREG),
by moving actual cleanup of resources from hci_unregister_dev() to
hci_cleanup_dev() which is called by bt_host_release() when all
references to this unregistered device (which is a kobject) are gone.
Since hci_sock_dev_event(HCI_DEV_UNREG) no longer resets
hci_pi(sk)->hdev, we need to check whether this device was unregistered
and return an error based on HCI_UNREGISTER flag. There might be subtle
behavioral difference in "monitor the hdev" functionality; please report
if you found something went wrong due to this patch.
Link: https://syzkaller.appspot.com/bug?extid=a5df189917e79d5e59c9 Reported-by: syzbot <syzbot+a5df189917e79d5e59c9@syzkaller.appspotmail.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: e305509e678b ("Bluetooth: use correct lock to prevent UAF of hdev object") Acked-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
If queue is dying while iolatency_set_limit() is in progress,
blk_get_queue() won't increment the refcount of the queue. However,
blk_put_queue() will still decrement the refcount later, which will
cause the refcout to be unbalanced.
Thus error out in such case to fix the problem.
Fixes: 8c772a9bfc7c ("blk-iolatency: fix IO hang due to negative inflight counter") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20210805124645.543797-1-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
Smatch says:
drivers/net/ethernet/neterion/vxge/vxge-main.c:3518 vxge_device_unregister() error: Using vdev after free_{netdev,candev}(dev);
drivers/net/ethernet/neterion/vxge/vxge-main.c:3518 vxge_device_unregister() error: Using vdev after free_{netdev,candev}(dev);
drivers/net/ethernet/neterion/vxge/vxge-main.c:3520 vxge_device_unregister() error: Using vdev after free_{netdev,candev}(dev);
drivers/net/ethernet/neterion/vxge/vxge-main.c:3520 vxge_device_unregister() error: Using vdev after free_{netdev,candev}(dev);
Since vdev pointer is netdev private data accessing it after free_netdev()
call can cause use-after-free bug. Fix it by moving free_netdev() call at
the end of the function
Fixes: 6cca200362b4 ("vxge: cleanup probe error paths") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Smatch says:
drivers/net/ethernet/freescale/fec_main.c:3994 fec_drv_remove() error: Using fep after free_{netdev,candev}(ndev);
drivers/net/ethernet/freescale/fec_main.c:3995 fec_drv_remove() error: Using fep after free_{netdev,candev}(ndev);
Since fep pointer is netdev private data, accessing it after free_netdev()
call can cause use-after-free bug. Fix it by moving free_netdev() call at
the end of the function
Syzbot reported uninit value pegasus_probe(). The problem was in missing
error handling.
get_interrupt_interval() internally calls read_eprom_word() which can
fail in some cases. For example: failed to receive usb control message.
These cases should be handled to prevent uninit value bug, since
read_eprom_word() will not initialize passed stack variable in case of
internal failure.
Set the error code if bnx2x_alloc_fw_stats_mem() fails. The current
code returns success.
Fixes: ad5afc89365e ("bnx2x: Separate VF and PF logic") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
When cross compiling a MIPS kernel on a BSD based HOSTCC leads
to errors like
SYNC include/config/auto.conf.cmd - due to: .config
egrep: empty (sub)expression
UPD include/config/kernel.release
HOSTCC scripts/dtc/dtc.o - due to target missing
It turns out that egrep uses this egrep pattern:
(|MINOR_|PATCHLEVEL_)
This is not valid syntax or gives undefined results according
to POSIX 9.5.3 ERE Grammar
It seems to be silently accepted by the Linux egrep implementation
while a BSD host complains.
Such patterns can be replaced by a transformation like
"(|p1|p2)" -> "(p1|p2)?"
Fixes: 48c35b2d245f ("[MIPS] There is no __GNUC_MAJOR__") Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The patch fixing the returned value of ip6_skb_dst_mtu (int -> unsigned
int) was rebased between its initial review and the version applied. In
the meantime fade56410c22 was applied, which added a new variable (int)
used as the returned value. This lead to a mismatch between the function
prototype and the variable used as the return value.
Fixes: 40fc3054b458 ("net: ipv6: fix return value of ip6_skb_dst_mtu") Cc: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Pauseframe control is set to symmetric mode by default on the NFP.
Pause frames can not be configured through ethtool now, but ethtool can
report the supported mode.
Fixes: 265aeb511bd5 ("nfp: add support for .get_link_ksettings()") Signed-off-by: Fei Qin <fei.qin@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
In commit 58acd1009226 ("sctp: update active_key for asoc when old key is
being replaced"), sctp_auth_asoc_init_active_key() is called to update
the active_key right after the old key is deleted and before the new key
is added, and it caused that the active_key could be found with the key_id.
In Ying Xu's testing, the BUG_ON in sctp_auth_asoc_init_active_key() was
triggered:
So fix it by moving the active_key update after sh_keys is added.
Fixes: 58acd1009226 ("sctp: update active_key for asoc when old key is being replaced") Reported-by: Ying Xu <yinxu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Replace pci_enable_device() with pcim_enable_device(),
pci_disable_device() and pci_release_regions() will be
called in release automatically.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
The logic for discerning between KSZ8051 and KSZ87XX PHYs is incorrect
such that the that KSZ87XX switch is not identified correctly.
ksz8051_ksz8795_match_phy_device() uses the parameter ksz_phy_id
to discriminate whether it was called from ksz8051_match_phy_device()
or from ksz8795_match_phy_device() but since PHY_ID_KSZ87XX is the
same value as PHY_ID_KSZ8051, this doesn't work.
Instead use a bool to discriminate the caller.
Without this patch, the KSZ8795 switch port identifies as:
Fixes: 8b95599c55ed24b36cf4 ("net: phy: micrel: Discern KSZ8051 and KSZ8795 PHYs") Signed-off-by: Steve Bennett <steveb@workware.net.au> Reviewed-by: Marek Vasut <marex@denx.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
The procedure to add a static FDB entry in sja1105 is concurrent with
dynamic learning performed on all bridge ports and the CPU port.
The switch looks up the FDB from left to right, and also learns
dynamically from left to right, so it is possible that between the
moment when we pick up a free slot to install an FDB entry, another slot
to the left of that one becomes free due to an address ageing out, and
that other slot is then immediately used by the switch to learn
dynamically the same address as we're trying to add statically.
The result is that we succeeded to add our static FDB entry, but it is
being shadowed by a dynamic FDB entry to its left, and the switch will
behave as if our static FDB entry did not exist.
We cannot really prevent this from happening unless we make the entire
process to add a static FDB entry a huge critical section where address
learning is temporarily disabled on _all_ ports, and then re-enabled
according to the configuration done by sja1105_port_set_learning.
However, that is kind of disruptive for the operation of the network.
What we can do alternatively is to simply read back the FDB for dynamic
entries located before our newly added static one, and delete them.
This will guarantee that our static FDB entry is now operational. It
will still not guarantee that there aren't dynamic FDB entries to the
_right_ of that static FDB entry, but at least those entries will age
out by themselves since they aren't hit, and won't bother anyone.
Fixes: 291d1e72b756 ("net: dsa: sja1105: Add support for FDB and MDB management") Fixes: 1da73821343c ("net: dsa: sja1105: Add FDB operations for P/Q/R/S series") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
The SJA1105 switch family leaves it up to software to decide where
within the FDB to install a static entry, and to concatenate destination
ports for already existing entries (the FDB is also used for multicast
entries), it is not as simple as just saying "please add this entry".
This means we first need to search for an existing FDB entry before
adding a new one. The driver currently manages to fool itself into
thinking that if an FDB entry already exists, there is nothing to be
done. But that FDB entry might be dynamically learned, case in which it
should be replaced with a static entry, but instead it is left alone.
This patch checks the LOCKEDS ("locked/static") bit from found FDB
entries, and lets the code "goto skip_finding_an_index;" if the FDB
entry was not static. So we also need to move the place where we set
LOCKEDS = true, to cover the new case where a dynamic FDB entry existed
but was dynamic.
Fixes: 291d1e72b756 ("net: dsa: sja1105: Add support for FDB and MDB management") Fixes: 1da73821343c ("net: dsa: sja1105: Add FDB operations for P/Q/R/S series") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
GSO expects inner transport header offset to be valid when
skb->encapsulation flag is set. GSO uses this value to calculate the length
of an individual segment of a GSO packet in skb_gso_transport_seglen().
However, tcp/udp gro_complete callbacks don't update the
skb->inner_transport_header when processing an encapsulated TCP/UDP
segment. As a result a GRO skb has ->inner_transport_header set to a value
carried over from earlier skb processing.
This can have mild to tragic consequences. From miscalculating the GSO
segment length to triggering a page fault [1], when trying to read TCP/UDP
header at an address past the skb->data page.
The latter scenario leads to an oops report like so:
The bug can be observed in a simple setup where we send IP/GRE/IP/TCP
packets into a netns over a veth pair. Inside the netns, packets are
forwarded to dummy device:
trafgen -> [veth A]--[veth B] -forward-> [dummy]
For veth B to GRO aggregate packets on receive, it needs to have an XDP
program attached (for example, a trivial XDP_PASS). Additionally, for UDP,
we need to enable GSO_UDP_L4 feature on the device:
ip netns exec A ethtool -K AB rx-udp-gro-forwarding on
The last component is an artificial delay to increase the chances of GRO
batching happening:
ip netns exec A tc qdisc add dev AB root \
netem delay 200us slot 5ms 10ms packets 2 bytes 64k
With such a setup in place, the bug can be observed by tracing the skb
outer and inner offsets when GSO skb is transmitted from the dummy device:
Fix it by updating the inner transport header offset in tcp/udp
gro_complete callbacks, similar to how {inet,ipv6}_gro_complete callbacks
update the inner network header offset, when skb->encapsulation flag is
set.
Commit dea7a9fbb009 ("dmaengine: imx-dma: remove dma_slave_config
direction usage") changes the method from a "configuration when called"
to an "configuration when used". Due to this, only the cyclic DMA type
gets configured correctly, while the generic DMA type is left
non-configured.
Without this additional call, the struct imxdma_channel::word_size member
is stuck at DMA_SLAVE_BUSWIDTH_UNDEFINED and imxdma_prep_slave_sg() always
returns NULL.
If a vb2_queue sets q->min_buffers_needed then when the number of
queued buffers reaches q->min_buffers_needed, vb2_core_qbuf() will call
the start_streaming() callback. If start_streaming() returns an error,
then that error was just returned by vb2_core_qbuf(), but the buffer
was still queued. However, userspace expects that if VIDIOC_QBUF fails,
the buffer is returned dequeued.
So if start_streaming() fails, then remove the buffer from the queue,
thus avoiding this unwanted side-effect.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Tested-by: Kieran Bingham <kieran.bingham@ideasonboard.com> Fixes: b3379c6201bb ("[media] vb2: only call start_streaming if sufficient buffers are queued") Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Media event code 3 is defined in the MMC-6 spec as follows:
"MediaRemoval: The media has been removed from the specified slot, and
the Drive is unable to access the media without user intervention. This
applies to media changers only."
This indicated that treating the condition as an EJECT_REQUEST was
appropriate. However, doing so had the unfortunate side-effect of causing
the drive tray to be physically ejected on resume. Instead treat the event
as a MEDIA_CHANGE request.
The spi_imx->spi_bus_clk may be uninitialized and thus also zero in
mx51_ecspi_prepare_message(), which would lead to division by zero
in kernel. Since bitbang .setup_transfer callback which initializes
the spi_imx->spi_bus_clk is called after bitbang prepare_message
callback, iterate over all the transfers in spi_message, find the
one with lowest bus frequency, and use that bus frequency for the
delay calculation.
Note that it is not possible to move this CONFIGREG delay back into
the .setup_transfer callback, because that is invoked too late, after
the GPIO chipselects were already configured.
Fixes: 135cbd378eab ("spi: imx: mx51-ecspi: Reinstate low-speed CONFIGREG delay") Signed-off-by: Marek Vasut <marex@denx.de> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20210726100102.5188-1-marex@denx.de Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Since 00b80ac935539 ("spi: imx: mx51-ecspi: Move some initialisation to
prepare_message hook."), the MX51_ECSPI_CONFIG write no longer happens
in prepare_transfer hook, but rather in prepare_message hook, however
the MX51_ECSPI_CONFIG delay is still left in prepare_transfer hook and
thus has no effect. This leads to low bus frequency operation problems
described in 6fd8b8503a0dc ("spi: spi-imx: Fix out-of-order CS/SCLK
operation at low speeds") again.
Move the MX51_ECSPI_CONFIG write delay into the prepare_message hook
as well, thus reinstating the low bus frequency fix.
Fixes: 00b80ac935539 ("spi: imx: mx51-ecspi: Move some initialisation to prepare_message hook.") Signed-off-by: Marek Vasut <marex@denx.de> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20210703022300.296114-1-marex@denx.de Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>