Digging into the documentation we find that the DT_ID bitfield is used to
map the six bit DT to a two bit ID code. This value is concatenated to the
VC bitfield to create a CID value. DT_ID is the two least significant bits
of CID and VC the most significant bits.
Originally we set dt_id = vc * 4 in and then subsequently set dt_id = vc.
commit 3c4ed72a16bc ("media: camss: sm8250: Virtual channels for CSID")
silently fixed the multiplication by four which would give a better
value for the generated CID without mentioning what was being done or why.
Next up I haplessly changed the value back to "dt_id = vc * 4" since there
didn't appear to be any logic behind it.
Hans asked what the change was for and I honestly couldn't remember the
provenance of it, so I dug in.
Poison inject and clear are supported via debugfs where a privileged
user can inject and clear poison to a device physical address.
Commit 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper")
added a lockdep assert that highlighted a gap in poison inject and
clear functions where holding the dpa_rwsem does not assure that a
a DPA is not added to a region.
The impact for inject and clear is that if the DPA address being
injected or cleared has been attached to a region, but not yet
committed, the dev_dbg() message intended to alert the debug user
that they are acting on a mapped address is not emitted. Also, the
cxl_poison trace event that serves as a log of the inject and clear
activity will not include region info.
Close this gap by snapshotting an unchangeable region state during
poison inject and clear operations. That means holding both the
region_rwsem and the dpa_rwsem during the inject and clear ops.
Fixes: d2fbc4865802 ("cxl/memdev: Add support for the Inject Poison mailbox command") Fixes: 9690b07748d1 ("cxl/memdev: Add support for the Clear Poison mailbox command") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/08721dc1df0a51e4e38fecd02425c3475912dfd5.1701041440.git.alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The new helper "cxl_num_decoders_committed()" added a lockdep assertion
to validate that port->commit_end is protected against modification.
That assertion fires in init_hdm_decoder() where it is initializing
port->commit_end. Given that it is both accessing and writing that
property it obstensibly needs the lock.
In practice, CXL decoder commit rules (must commit in order) and the
in-order discovery of device decoders makes the manipulation of
->commit_end in init_hdm_decoder() safe. However, rather than rely on
the subtle rules of CXL hardware, just make the implementation obviously
correct from a software perspective.
The Fixes: tag is only for cleaning up a lockdep splat, there is no
functional issue addressed by this fix.
Fixes: 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper") Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/170025232811.2147250.16376901801315194121.stgit@djiang5-mobl3 Acked-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper") missed the
conversion for cxl_test. Add usage of cxl_num_decoders_committed() to
replace the open coding.
Suggested-by: Alison Schofield <alison.schofield@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Link: https://lore.kernel.org/r/169929160525.824083.11813222229025394254.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
/*
* After this, the interrupt handler may be invoked at any time
*
* tmio_mmc_irq()
* {
* __tmio_mmc_card_detect_irq()
* mmc_detect_change()
* _mmc_detect_change()
* mmc_schedule_delayed_work(&host->detect, delay);
* }
*/
When expire_timers() runs later, it warns because the MMC host structure
containing the delayed work was freed, and now contains an invalid work
function pointer.
Fix this by cancelling any pending delayed work before releasing the
MMC host structure.
When RPMB was converted to a character device, it added support for
multiple RPMB partitions (Commit 97548575bef3 ("mmc: block: Convert RPMB to
a character device").
One of the changes in this commit was transforming the variable target_part
defined in __mmc_blk_ioctl_cmd into a bitmask. This inadvertently regressed
the validation check done in mmc_blk_part_switch_pre() and
mmc_blk_part_switch_post(), so let's fix it.
Fixes: 97548575bef3 ("mmc: block: Convert RPMB to a character device") Signed-off-by: Jorge Ramirez-Ortiz <jorge@foundries.io> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20231201153143.1449753-1-jorge@foundries.io Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 4bc31edebde5 ("mmc: core: Set HS clock speed before sending
HS CMD13") set HS clock (52MHz) before switching to HS mode. For this
freq, FCLK_DIV5 will be selected and div value is 10 (reg value is 9).
Then we set rx_clk_phase to 11 or 15 which is out of range and make
hardware frozen. After we send command request, no irq will be
interrupted and the mmc driver will keep to wait for request finished,
even durning rebooting.
So let's set it to Phase 90 which should work in most cases. Then let
meson_mx_sdhc_execute_tuning() to find the accurate value for data
transfer.
If this doesn't work, maybe need to define a factor in dts.
The bug happens when highest bit of holebegin is 1, suppose holebegin is
0x8000000111111000, after shift, hba would be 0xfff8000000111111, then
vma_interval_tree_foreach would look it up fail or leads to the wrong
result.
here pgoff is correctly shifted to 0x8000000111111,
but pass 0x8000000111111000 as holebegin to unmap
would then cause terrible result, as shown below:
The issue happens in Heterogeneous computing, where the device(e.g.
gpu) and host share the same virtual address space.
A simple workflow pattern which hit the issue is:
/* host */
1. userspace first mmap a file backed VA range with specified offset.
e.g. (offset=0x800..., mmap return: va_a)
2. write some data to the corresponding sys page
e.g. (va_a = 0xAABB)
/* device */
3. gpu workload touches VA, triggers gpu fault and notify the host.
/* host */
4. reviced gpu fault notification, then it will:
4.1 unmap host pages and also takes care of cpu tlb
(use unmap_mapping_range with offset=0x800...)
4.2 migrate sys page to device
4.3 setup device page table and resolve device fault.
/* device */
5. gpu workload continued, it accessed va_a and got 0xAABB.
6. gpu workload continued, it wrote 0xBBCC to va_a.
/* host */
7. userspace access va_a, as expected, it will:
7.1 trigger cpu vm fault.
7.2 driver handling fault to migrate gpu local page to host.
8. userspace then could correctly get 0xBBCC from va_a
9. done
But in step 4.1, if we hit the bug this patch mentioned, then userspace
would never trigger cpu fault, and still get the old value: 0xAABB.
Since commit aa49c90894d0 ("i2c: core: Run atomic i2c xfer when
!preemptible"), the whole reboot/power off sequence on non-preempt kernels
is using atomic i2c xfer, as !preemptible() always results to 1.
During device_shutdown(), the i2c might be used a lot and not all busses
have implemented an atomic xfer handler. This results in a lot of
avoidable noise, like:
[ 12.687169] No atomic I2C transfer handler for 'i2c-0'
[ 12.692313] WARNING: CPU: 6 PID: 275 at drivers/i2c/i2c-core.h:40 i2c_smbus_xfer+0x100/0x118
...
Fix this by allowing non-atomic xfer when the interrupts are enabled, as
it was before.
kprobe_emulate_call_indirect currently uses int3_emulate_call to emulate
indirect calls. However, int3_emulate_call always assumes the size of
the call to be 5 bytes when calculating the return address. This is
incorrect for register-based indirect calls in x86, which can be either
2 or 3 bytes depending on whether REX prefix is used. At kprobe runtime,
the incorrect return address causes control flow to land onto the wrong
place after return -- possibly not a valid instruction boundary. This
can lead to a panic like the following:
The emulation incorrectly sets the return address to be ffffffff8102ed9d
+ 0x5 = ffffffff8102eda2, which is the 8b byte in the middle of the next
mov. This in turn causes incorrect subsequent instruction decoding and
eventually triggers the page fault above.
Instead of invoking int3_emulate_call, perform push and jmp emulation
directly in kprobe_emulate_call_indirect. At this point we can obtain
the instruction size from p->ainsn.size so that we can calculate the
correct return address.
Link: https://lore.kernel.org/all/20240102233345.385475-1-jinghao7@illinois.edu/ Fixes: 6256e668b7af ("x86/kprobes: Use int3 instead of debug trap for single-step") Cc: stable@vger.kernel.org Signed-off-by: Jinghao Jia <jinghao7@illinois.edu> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
VIA VT6306/6307/6308 provides PCI interface compliant to 1394 OHCI. When
the hardware is combined with Asmedia ASM1083/1085 PCIe-to-PCI bus bridge,
it appears that accesses to its 'Isochronous Cycle Timer' register (offset
0xf0 on PCI memory space) often causes unexpected system reboot in any
type of AMD Ryzen machine (both 0x17 and 0x19 families). It does not
appears in the other type of machine (AMD pre-Ryzen machine, Intel
machine, at least), or in the other OHCI 1394 hardware (e.g. Texas
Instruments).
The issue explicitly appears at a commit dcadfd7f7c74 ("firewire: core:
use union for callback of transaction completion") added to v6.5 kernel.
It changed 1394 OHCI driver to access to the register every time to
dispatch local asynchronous transaction. However, the issue exists in
older version of kernel as long as it runs in AMD Ryzen machine, since
the access to the register is required to maintain bus time. It is not
hard to imagine that users experience the unexpected system reboot when
generating bus reset by plugging any devices in, or reading the register
by time-aware application programs; e.g. audio sample processing.
This commit suppresses the unexpected system reboot in the combination of
hardware. It avoids the access itself. As a result, the software stack can
not provide the hardware time anymore to unit drivers, userspace
applications, and nodes in the same IEEE 1394 bus. It brings apparent
disadvantage since time-aware application programs require it, while
time-unaware applications are available again; e.g. sbp2.
Special VMAs like VM_PFNMAP can contain anon pages from COW. There isn't
much profit in doing lookaround on them. Besides, they can trigger the
pte_special() warning in get_pte_pfn().
ifconfig ethx up, will set page->refcount larger than 1,
and then ifconfig ethx down, calling __page_frag_cache_drain()
to free pages, it is not compatible with page pool.
So deleting codes which changing page->refcount.
Fixes: 3c47e8ae113a ("net: libwx: Support to receive packets in NAPI") Signed-off-by: duanqiangwen <duanqiangwen@net-swift.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
devm_cxl_pmu_add() correctly registers a device remove function but it
only calls device_del() which is only part of device unregistration.
Properly call device_unregister() to free up the memory associated with
the device.
Fixes: 1ad3f701c399 ("cxl/pci: Find and register CXL PMU devices") Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20231016-pmu-unregister-fix-v1-1-1e2eb2fa3c69@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The hypervisor returns migration failure if all VAS windows are not
closed. During pre-migration stage, vas_migration_handler() sets
migration_in_progress flag and closes all windows from the list.
The allocate VAS window routine checks the migration flag, setup
the window and then add it to the list. So there is possibility of
the migration handler missing the window that is still in the
process of setup.
t1: Allocate and open VAS t2: Migration event
window
lock vas_pseries_mutex
If migration_in_progress set
unlock vas_pseries_mutex
return
open window HCALL
unlock vas_pseries_mutex
Modify window HCALL lock vas_pseries_mutex
setup window migration_in_progress=true
Closes all windows from the list
// May miss windows that are
// not in the list
unlock vas_pseries_mutex
lock vas_pseries_mutex return
if nr_closed_windows == 0
// No DLPAR CPU or migration
add window to the list
// Window will be added to the
// list after the setup is completed
unlock vas_pseries_mutex
return
unlock vas_pseries_mutex
Close VAS window
// due to DLPAR CPU or migration
return -EBUSY
This patch resolves the issue with the following steps:
- Set the migration_in_progress flag without holding mutex.
- Introduce nr_open_wins_progress counter in VAS capabilities
struct
- This counter tracks the number of open windows are still in
progress
- The allocate setup window thread closes windows if the migration
is set and decrements nr_open_window_progress counter
- The migration handler waits for no in-progress open windows.
The code flow with the fix is as follows:
t1: Allocate and open VAS t2: Migration event
window
lock vas_pseries_mutex
If migration_in_progress set
unlock vas_pseries_mutex
return
open window HCALL
nr_open_wins_progress++
// Window opened, but not
// added to the list yet
unlock vas_pseries_mutex
Modify window HCALL migration_in_progress=true
setup window lock vas_pseries_mutex
Closes all windows from the list
While nr_open_wins_progress {
unlock vas_pseries_mutex
lock vas_pseries_mutex sleep
if nr_closed_windows == 0 // Wait if any open window in
or migration is not started // progress. The open window
// No DLPAR CPU or migration // thread closes the window without
add window to the list // adding to the list and return if
nr_open_wins_progress-- // the migration is in progress.
unlock vas_pseries_mutex
return
Close VAS window
nr_open_wins_progress--
unlock vas_pseries_mutex
return -EBUSY lock vas_pseries_mutex
}
unlock vas_pseries_mutex
return
Fixes: 37e6764895ef ("powerpc/pseries/vas: Add VAS migration handler") Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231125235104.3405008-1-haren@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
The emulated IMSIC update the external interrupt pending depending on
the value of eidelivery and topei. It might lose an interrupt when it
is interrupted before setting the new value to the pending status.
For example, when VCPU0 sends an IPI to VCPU1 via IMSIC:
VCPU0 VCPU1
CSRSWAP topei = 0
The VCPU1 has claimed all the external
interrupt in its interrupt handler.
topei of VCPU1's IMSIC = 0
set pending in VCPU1's IMSIC
topei of VCPU1' IMSIC = 1
set the external interrupt
pending of VCPU1
clear the external interrupt pending
of VCPU1
When the VCPU1 switches back to VS mode, it exits the interrupt handler
because the result of CSRSWAP topei is 0. If there are no other external
interrupts injected into the VCPU1's IMSIC, VCPU1 will never know this
pending interrupt unless it initiative read the topei.
If the interruption occurs between updating interrupt pending in IMSIC
and updating external interrupt pending of VCPU, it will not cause a
problem. Suppose that the VCPU1 clears the IPI pending in IMSIC right
after VCPU0 sets the pending, the external interrupt pending of VCPU1
will not be set because the topei is 0. But when the VCPU1 goes back to
VS mode, the pending IPI will be reported by the CSRSWAP topei, it will
not lose this interrupt.
So we only need to make the external interrupt updating procedure as a
critical section to avoid the problem.
Fixes: db8b7e97d613 ("RISC-V: KVM: Add in-kernel virtualization of AIA IMSIC") Tested-by: Roy Lin <roy.lin@sifive.com> Tested-by: Wayling Chen <wayling.chen@sifive.com> Co-developed-by: Vincent Chen <vincent.chen@sifive.com> Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Signed-off-by: Anup Patel <anup@brainfault.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Does the same thing as:
commit 6740ec97bcdb ("drm/amd/display: Increase frame warning limit with KASAN or KCSAN in dml2")
Reviewed-by: Harry Wentland <harry.wentland@amd.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202311302107.hUDXVyWT-lkp@intel.com/ Fixes: 67e38874b85b ("drm/amd/display: Increase num voltage states to 40") Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Alvin Lee <alvin.lee2@amd.com> Cc: Hamza Mahfooz <hamza.mahfooz@amd.com> Cc: Samson Tam <samson.tam@amd.com> Cc: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently get_free_mem_region() searches for available capacity
in increments equal to the region size being requested. This can
cause the search to take giant steps through the resource leaving
needless gaps and missing available space.
Specifically 'cxl create-region' fails with ERANGE even though capacity
of the given size and CXL's expected 256M x InterleaveWays alignment can
be satisfied.
Replace the total-request-size increment with a next alignment increment
so that the next possible address is always examined for availability.
Fixes: 14b80582c43e ("resource: Introduce alloc_free_mem_region()") Reported-by: Dmytro Adamenko <dmytro.adamenko@intel.com> Reported-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20231113221324.1118092-1-alison.schofield@intel.com Cc: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
A read of a device poison list is triggered via a sysfs attribute
and the results are logged as kernel trace events of type cxl_poison.
The work is managed by either: a) the region driver when one of more
regions map the device, or by b) the memdev driver when no regions
map the device.
In the case of a) the region driver holds the region_rwsem while
reading the poison by committed endpoint decoder mappings and for
any unmapped resources. This makes sure that the cxl_poison trace
event trace reports valid region info. (Region name, HPA, and UUID).
In the case of b) the memdev driver holds the dpa_rwsem preventing
new DPA resources from being attached to a region. However, it leaves
a gap between region attach and decoder commit actions. If a DPA in
the gap is in the poison list, the cxl_poison trace event will omit
the region info.
Close the gap by holding the region_rwsem and the dpa_rwsem when
reading poison per memdev. Since both methods now hold both locks,
down_read both from the caller. Doing so also addresses the lockdep
assert that found this issue:
Commit 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper")
Fixes: f0832a586396 ("cxl/region: Provide region info to the cxl_poison trace event") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/08e8e7ec9a3413b91d51de39e385653494b1eed0.1701041440.git.alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[Description]
If during driver init stage there are greater than 20
intermediary voltage states while constructing the SOC
BB we could hit issues because we will index outside of the
clock_limits array and start overwriting data. Increase the
total number of states to 40 to avoid this issue.
Cc: stable@vger.kernel.org # 6.1+ Reviewed-by: Samson Tam <samson.tam@amd.com> Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
We used to call intel_pre_plane_updates() for any pipe going through
a modeset whether the pipe was previously enabled or not. This in
fact needed to apply all the necessary clock gating workarounds/etc.
Restore the correct behaviour.
SCLK_SDMMC is the parent for SCLK_SDMMC_DRV and SCLK_SDMMC_SAMPLE, but
used with the (more) correct name sclk_sdmmc. SD card tuning does currently
fail as the parent can't be found under that name.
There is no need to suffix the name with '0' since RK312x SoCs do have a
single sdmmc controller - so rename it to the name which is already used
by it's children.
According to the TRM there are no specific gpll_peri, cpll_peri,
gpll_div2_peri or gpll_div3_peri gates, but a single clk_peri_src gate.
Instead mux_clk_peri_src directly connects to the plls respectively the pll
divider clocks.
Fix this by creating a single gated composite.
Also rename all occurrences of aclk_peri_src to clk_peri_src, since it
is the parent for peri aclks, pclks and hclks. That name also matches
the one used in the TRM.
The lowest supported clock frequency of the PHY is 125MHz (see also
mtk_mipi_tx_pll_enable()), but the clamping in .round_rate() has the
wrong minimal value, which will make the .enable() op return -EINVAL on
low frequencies. Fix the minimal clamping value.
Fixes: efda51a58b4a ("drm/mediatek: add mipi_tx driver for mt8183") Signed-off-by: Michael Walle <mwalle@kernel.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20231123110202.2025585-1-mwalle@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The enforce_cache_coherency callback ensures DMA cache coherency for
devices attached to the domain.
Intel IOMMU supports enforced DMA cache coherency when the Snoop
Control bit in the IOMMU's extended capability register is set.
Supporting it differs between legacy and scalable modes.
In legacy mode, it's supported page-level by setting the SNP field
in second-stage page-table entries. In scalable mode, it's supported
in PASID-table granularity by setting the PGSNP field in PASID-table
entries.
In legacy mode, mappings before attaching to a device have SNP
fields cleared, while mappings after the callback have them set.
This means partial DMAs are cache coherent while others are not.
One possible fix is replaying mappings and flipping SNP bits when
attaching a domain to a device. But this seems to be over-engineered,
given that all real use cases just attach an empty domain to a device.
To meet practical needs while reducing mode differences, only support
enforce_cache_coherency on a domain without mappings if SNP field is
used.
Fixes: fc0051cb9590 ("iommu/vt-d: Check domain force_snooping against attached devices") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20231114011036.70142-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
Some channels may be masked. When the system is suspended,
if these masked channels are not filtered out, this will
lead to null pointer operations and system crash:
AM62Ax has 3 SPI channels where each channel has 4x TX and 4x RX
threads. Also fix the thread numbers to match what the firmware expects
according to the PSI-L device description.
When the node for this phy selector is a child node of a syscon node then the
property 'reg' is used as an offset into the parent regmap. When the node
is standalone and gets its own regmap this offset is pre-applied. So we need
to track which method was used to get the regmap and not apply the offset
in the standalone case.
Fixes: 1fdfa7cccd35 ("phy: ti: gmii-sel: Allow parent to not be syscon node") Signed-off-by: Andrew Davis <afd@ti.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Link: https://lore.kernel.org/r/20231025143302.1265633-1-afd@ti.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When the host invalidates a guest page, it will also check if the page
was used to map the prefix of any guest CPUs, in which case they are
stopped and marked as needing a prefix refresh. Upon starting the
affected CPUs again, their prefix pages are explicitly faulted in and
revalidated if they had been invalidated. A bit in the PGSTEs indicates
whether or not a page might contain a prefix. The bit is allowed to
overindicate. Pages above 2G are skipped, because they cannot be
prefixes, since KVM runs all guests with MSO = 0.
The same applies for nested guests (VSIE). When the host invalidates a
guest page that maps the prefix of the nested guest, it has to stop the
affected nested guest CPUs and mark them as needing a prefix refresh.
The same PGSTE bit used for the guest prefix is also used for the
nested guest. Pages above 2G are skipped like for normal guests, which
is the source of the bug.
The nested guest runs is the guest primary address space. The guest
could be running the nested guest using MSO != 0. If the MSO + prefix
for the nested guest is above 2G, the check for nested prefix will skip
it. This will cause the invalidation notifier to not stop the CPUs of
the nested guest and not mark them as needing refresh. When the nested
guest is run again, its prefix will not be refreshed, since it has not
been marked for refresh. This will cause a fatal validity intercept
with VIR code 37.
Fix this by removing the check for 2G for nested guests. Now all
invalidations of pages with the notify bit set will always scan the
existing VSIE shadow state descriptors.
This allows to catch invalidations of nested guest prefix mappings even
when the prefix is above 2G in the guest virtual address space.
If misaligned_access_speed percpu var isn't so called "HWPROBE
MISALIGNED UNKNOWN", it means the probe has happened(this is possible
for example, hotplug off then hotplug on one cpu), and the percpu var
has been set, don't probe again in this case.
has changed the semantics of what is to be considered an idle task in
such a way that the idle task of an offline CPU may not carry the
PF_IDLE flag anymore.
However RCU-tasks-trace tests the opposite assertion, still assuming
that idle tasks carry the PF_IDLE flag during their whole lifecycle.
Remove this assumption to avoid spurious warnings but keep the initial
test verifying that the idle task is the current task on any offline
CPU.
has changed the semantics of what is to be considered an idle task in
such a way that CPU boot code preceding the actual idle loop is excluded
from it.
This has however introduced new potential RCU-tasks stalls when either:
1) Grace period is started before init/0 had a chance to set PF_IDLE,
keeping it stuck in the holdout list until idle ever schedules.
2) Grace period is started when some possible CPUs have never been
online, keeping their idle tasks stuck in the holdout list until the
CPU ever boots up.
3) Similar to 1) but with secondary CPUs: Grace period is started
concurrently with secondary CPU booting, putting its idle task in
the holdout list because PF_IDLE isn't yet observed on it. It stays
then stuck in the holdout list until that CPU ever schedules. The
effect is mitigated here by the hotplug AP thread that must run to
bring the CPU up.
Fix this with handling the new semantics of PF_IDLE, keeping in mind
that it may or may not be set on an idle task. Take advantage of that to
strengthen the coverage of an RCU-tasks quiescent state within an idle
task, excluding the CPU boot code from it. Only the code running within
the idle loop is now a quiescent state, along with offline CPUs.
Export the RCU point of view as to when a CPU is considered offline
(ie: when does RCU consider that a CPU is sufficiently down in the
hotplug process to not feature any possible read side).
This will be used by RCU-tasks whose vision of an offline CPU should
reasonably match the one of RCU core.
Commit 851a723e45d1 ("sched: Always clear user_cpus_ptr in
do_set_cpus_allowed()") added a kfree() call to free any user
provided affinity mask, if present. It was changed later to use
kfree_rcu() in commit 9a5418bc48ba ("sched/core: Use kfree_rcu()
in do_set_cpus_allowed()") to avoid a circular locking dependency
problem.
It turns out that even kfree_rcu() isn't safe for avoiding
circular locking problem. As reported by kernel test robot,
the following circular locking dependency now exists:
&rdp->nocb_lock --> rcu_node_0 --> &rq->__lock
Solve this by breaking the rcu_node_0 --> &rq->__lock chain by moving
the resched_cpu() out from under rcu_node lock.
[peterz: heavily borrowed from Waiman's Changelog]
[paulmck: applied Z qiang feedback]
Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()") Reported-by: kernel test robot <oliver.sang@intel.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/oe-lkp/202310302207.a25f1a30-oliver.sang@intel.com Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The acpi_thermal_unregister_thermal_zone() is paired with
acpi_thermal_register_thermal_zone() so it should mirror it. It should
clean up all the resources that the register function allocated and
leave the stuff that was allocated elsewhere.
Unfortunately, it doesn't call thermal_zone_device_disable(). Also it
calls kfree(tz->trip_table) when it shouldn't. That was allocated in
acpi_thermal_add(). Putting the kfree() here leads to a double free
in the acpi_thermal_add() clean up function.
Likewise, the acpi_thermal_remove() should mirror acpi_thermal_add() so
it should have an explicit kfree(tz->trip_table) as well.
Fixes: ec23c1c462de ("ACPI: thermal: Use trip point table to register thermal zones") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The cited patch tries to ensure no pending works on the mkey cache
workqueue by disabling adding new works and call flush_workqueue().
But this workqueue also has delayed works which might still be pending
the delay time to be queued.
Add cancel_delayed_work() for the delayed works which waits to be queued
and then the flush_workqueue() will flush all works which are already
queued and running.
Increase the size of temporary print buffer on stack to fix the
following warnings reported by LKP.
Since all the input parameters of snprintf() are under control
of this driver, it is not possible to trigger and overflow here,
but since the print buffer is on stack and discarded once driver
probe() finishes, it is not an issue to increase it by 10 bytes
and fix the warning in the process. Make it so.
"
drivers/clk/clk-si521xx.c: In function 'si521xx_probe':
>> drivers/clk/clk-si521xx.c:318:26: warning: '%d' directive output may be truncated writing between 1 and 10 bytes into a region of size 2 [-Wformat-truncation=]
snprintf(name, 6, "DIFF%d", i);
^~
drivers/clk/clk-si521xx.c:318:21: note: directive argument in the range [0, 2147483647]
snprintf(name, 6, "DIFF%d", i);
^~~~~~~~
drivers/clk/clk-si521xx.c:318:3: note: 'snprintf' output between 6 and 15 bytes into a destination of size 6
snprintf(name, 6, "DIFF%d", i);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"
Fixes: edc12763a3a2 ("clk: si521xx: Clock driver for Skyworks Si521xx I2C PCIe clock generators") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310260412.AGASjFN4-lkp@intel.com/ Signed-off-by: Marek Vasut <marex@denx.de> Link: https://lore.kernel.org/r/20231027085840.30098-1-marex@denx.de Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The mtty driver does not currently conform to the vfio SET_IRQS uAPI.
For example, it claims to support mask and unmask of INTx, but actually
does nothing. It claims to support AUTOMASK for INTx, but doesn't. It
fails to teardown eventfds under the full semantics specified by the
SET_IRQS ioctl. It also fails to teardown eventfds when the device is
closed, leading to memory leaks. It claims to support the request IRQ,
but doesn't.
Fix all these.
A side effect of this is that QEMU will now report a warning:
vfio <uuid>: Failed to set up UNMASK eventfd signaling for interrupt \
INTX-0: VFIO_DEVICE_SET_IRQS failure: Inappropriate ioctl for device
The fact is that the unmask eventfd was never supported but quietly
failed. mtty never honored the AUTOMASK behavior, therefore there
was nothing to unmask. QEMU is verbose about the failure, but
properly falls back to userspace unmasking.
Fixes: 9d1a546c53b4 ("docs: Sample driver to demonstrate how to use Mediated device framework.") Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20231016224736.2575718-2-alex.williamson@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
During hisilicon accelerator live migration operation. In order to
prevent the problem of EQ/AEQ interrupt loss. Migration driver will
trigger an EQ/AEQ doorbell at the end of the migration.
This operation may cause double interruption of EQ/AEQ events.
To ensure that the EQ/AEQ interrupt processing function is normal.
The interrupt handling functionality of EQ/AEQ needs to be updated.
Used to handle repeated interrupts event.
Fixes: b0eed085903e ("hisi_acc_vfio_pci: Add support for VFIO live migration") Signed-off-by: Longfang Liu <liulongfang@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
There is no need to free the reset_data structure if the recovery is
unsuccessful and the reset is synchronous. The function
adf_dev_aer_schedule_reset() handles the cleanup properly. Only
asynchronous resets require such structure to be freed inside the reset
worker.
Since commit adad556efcdd ("crypto: api - Fix built-in testing
dependency failures"), the following warning appears when booting an
x86_64 kernel that is configured with
CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y and CONFIG_CRYPTO_AES_NI_INTEL=y,
even when CONFIG_CRYPTO_XTS=y and CONFIG_CRYPTO_AES=y:
alg: skcipher: skipping comparison tests for xts-aes-aesni because xts(ecb(aes-generic)) is unavailable
This is caused by an issue in the xts template where it allocates an
"aes" single-block cipher without declaring a dependency on it via the
crypto_spawn mechanism. This issue was exposed by the above commit
because it reversed the order that the algorithms are tested in.
Specifically, when "xts(ecb(aes-generic))" is instantiated and tested
during the comparison tests for "xts-aes-aesni", the "xts" template
allocates an "aes" crypto_cipher for encrypting tweaks. This resolves
to "aes-aesni". (Getting "aes-aesni" instead of "aes-generic" here is a
bit weird, but it's apparently intended.) Due to the above-mentioned
commit, the testing of "aes-aesni", and the finalization of its
registration, now happens at this point instead of before. At the end
of that, crypto_remove_spawns() unregisters all algorithm instances that
depend on a lower-priority "aes" implementation such as "aes-generic"
but that do not depend on "aes-aesni". However, because "xts" does not
use the crypto_spawn mechanism for its "aes", its dependency on
"aes-aesni" is not recognized by crypto_remove_spawns(). Thus,
crypto_remove_spawns() unexpectedly unregisters "xts(ecb(aes-generic))".
Fix this issue by making the "xts" template use the crypto_spawn
mechanism for its "aes" dependency, like what other templates do.
Note, this fix could be applied as far back as commit f1c131b45410
("crypto: xts - Convert to skcipher"). However, the issue only got
exposed by the much more recent changes to how the crypto API runs the
self-tests, so there should be no need to backport this to very old
kernels. Also, an alternative fix would be to flip the list iteration
order in crypto_start_tests() to restore the original testing order.
I'm thinking we should do that too, since the original order seems more
natural, but it shouldn't be relied on for correctness.
Fixes: adad556efcdd ("crypto: api - Fix built-in testing dependency failures") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
libbpf accesses the ELF data requiring at least 8 byte alignment,
however, the data is generated into a C string that doesn't guarantee
alignment. Fix this by assigning to an aligned char array. Use sizeof
on the array, less one for the \0 terminator, rather than generating a
constant.
Fixes: a6cc6b34b93e ("bpftool: Provide a helper method for accessing skeleton's embedded ELF data") Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20231007044439.25171-1-irogers@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
This cast was made by purpose for older libbpf where the
bpf_object_skeleton field is void * instead of const void *
to eliminate a warning (as i understand
-Wincompatible-pointer-types-discards-qualifiers) but this
cast introduces another warning (-Wcast-qual) for libbpf
where data field is const void *
It makes sense for bpftool to be in sync with libbpf from
kernel sources
While BPF allows to set icsk->->icsk_delack_max
and/or icsk->icsk_rto_min, we have an ip route
attribute (RTAX_RTO_MIN) to be able to tune rto_min,
but nothing to consequently adjust max delayed ack,
which vary from 40ms to 200 ms (TCP_DELACK_{MIN|MAX}).
This makes RTAX_RTO_MIN of almost no practical use,
unless customers are in big trouble.
Modern days datacenter communications want to set
rto_min to ~5 ms, and the max delayed ack one jiffie
smaller to avoid spurious retransmits.
After this patch, an "rto_min 5" route attribute will
effectively lower max delayed ack timers to 4 ms.
Note in the following ss output, "rto:6 ... ato:4"
While we could argue this patch fixes a bug with RTAX_RTO_MIN,
I do not add a Fixes: tag, so that we can soak it a bit before
asking backports to stable branches.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Right now we never release the power-domains properly on the error path.
Add a routine to be reused for this purpose and appropriate jumps in
probe() to run that routine where necessary.
Previously the jump label err_cleanup was used higher in the probe()
function to release the async notifier however the async notifier
registration was moved later in the code rendering the previous four jumps
redundant.
Rename the label from err_cleanup to err_v4l2_device_unregister to capture
what the jump does.
Fixes: 51397a4ec75d ("media: qcom: Initialise V4L2 async notifier later") Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
[hverkuil: fix old name in commit log: err_v4l2_device_register -> err_v4l2_device_unregister] Signed-off-by: Sasha Levin <sashal@kernel.org>
Userspace applications indicate their multi-buffer capability to xsk
using XSK_USE_SG socket bind flag. For sockets using shared umem the
bind flag may contain XSK_USE_SG only for the first socket. For any
subsequent socket the only option supported is XDP_SHARED_UMEM.
Add option XDP_UMEM_SG_FLAG in umem config flags to store the
multi-buffer handling capability when indicated by XSK_USE_SG option in
bing flag by the first socket. Use this to derive multi-buffer capability
for subsequent sockets in xsk core.
Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com> Fixes: 81470b5c3c66 ("xsk: introduce XSK_USE_SG bind flag for xsk socket") Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20230907035032.2627879-1-tirthendu.sarkar@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
I've been looking at the memory-failure code and I believe I have found
three bugs that need fixing -- one going all the way back to 2010! I'll
have more patches later to use folios more extensively but didn't want
these bugfixes to get caught up in that.
This patch (of 3):
Both collect_procs_anon() and collect_procs_file() iterate over the VMA
interval trees looking for a single pgoff, so it is wrong to look for the
pgoff of the head page as is currently done. However, it is also wrong to
look at page->mapping of the precise page as this is invalid for tail
pages. Clear up the confusion by passing both the folio and the precise
page to collect_procs().
The one caller of DAX lock/unlock page already calls compound_head(), so
use page_folio() instead, then use a folio throughout the DAX code to
remove uses of page->mapping and page->index.
[jane.chu@oracle.com: add comment to mf_generic_kill_procss(), simplify mf_generic_kill_procs:folio initialization] Link: https://lkml.kernel.org/r/20230908222336.186313-1-jane.chu@oracle.com Link: https://lkml.kernel.org/r/20230822231314.349200-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jane Chu <jane.chu@oracle.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jane Chu <jane.chu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: 376907f3a0b3 ("mm/memory-failure: pass the folio and the page to collect_procs()") Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 9718475e6908 ("socket: Add SO_TIMESTAMPING_NEW") added the new
socket option SO_TIMESTAMPING_NEW. However, it was never implemented in
__sock_cmsg_send thus breaking SO_TIMESTAMPING cmsg for platforms using
SO_TIMESTAMPING_NEW.
The 2 lines to check for the BNXT_HWRM_PF_UNLOAD_SP_EVENT bit was
mis-applied to bnxt_cfg_ntp_filters() and should have been applied to
bnxt_sp_task().
Fixes: 19241368443f ("bnxt_en: Send PF driver unload notification to all VFs.") Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
CSR.OPS bits specify the current operating mode and (according to
documentation) they are updated by HW when the operating mode change
request is processed. To comply with this check CSR.OPS before proceeding.
Commit introduces ravb_set_opmode() that does all the necessities for
setting the operating mode (set CCC.OPC (and CCC.GAC, CCC.CSEL, if any) and
wait for CSR.OPS) and call it where needed. This should comply with all the
HW manuals requirements as different manual variants specify that different
modes need to be checked in CSR.OPS when setting CCC.OPC.
If gPTP active in config mode is supported and it needs to be enabled, the
CCC.GAC and CCC.CSEL needs to be configured along with CCC.OPC in the same
write access. For this, ravb_set_opmode() allows passing GAC and CSEL as
part of opmode and the function updates accordingly CCC register.
Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Add check for usbnet_get_endpoints() and return the error if it fails
in order to transfer the error.
Fixes: 16626b0cc3d5 ("asix: Add a new driver for the AX88172A") Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
During QoS scheduling testing with multiple strict priority flows, the
netdev tx watchdog timeout routine is invoked when a low priority QoS
queue doesn't get a chance to transmit the packets because other high
priority flows are completely subscribing the transmit link. The netdev
tx watchdog timeout routine will stop MAC RX and TX functionality in
otx2_stop() routine before cleanup of HW TX queues which results in SMQ
flush errors because the packets belonging to low priority queues will
never gets flushed since MAC TX is disabled. This patch fixes the issue
by re-enabling MAC TX to ensure the packets in HW pipeline gets flushed
properly.
Fixes: a7faa68b4e7f ("octeontx2-af: Start/Stop traffic in CGX along with NPC") Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently the NIX TX link credits are initialized based on the max frame
size that can be transmitted on a link but when the MTU is changed, the
NIX TX link credits are reprogrammed by the SW based on the new MTU value.
Since SMQ max packet length is programmed to max frame size by default,
there is a chance that NIX TX may stall while sending a max frame sized
packet on the link with insufficient credits to send the packet all at
once. This patch avoids stall issue by not changing the link credits
dynamically when the MTU is changed.
Fixes: 1c74b89171c3 ("octeontx2-af: Wait for TX link idle for credits change") Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: Nithin Kumar Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
A crash was found when dumping SMC-R connections. It can be reproduced
by following steps:
- environment: two RNICs on both sides.
- run SMC-R between two sides, now a SMC_LGR_SYMMETRIC type link group
will be created.
- set the first RNIC down on either side and link group will turn to
SMC_LGR_ASYMMETRIC_LOCAL then.
- run 'smcss -R' and the crash will be triggered.
When the first RNIC is set down, the lgr->lnk[0] will be cleared and an
asymmetric link will be allocated in lgr->link[SMC_LINKS_PER_LGR_MAX - 1]
by smc_llc_alloc_alt_link(). Then when we try to dump SMC-R connections
in __smc_diag_dump(), the invalid lgr->lnk[0] will be accessed, resulting
in this issue. So fix it by accessing the right link.
Fixes: f16a7dd5cf27 ("smc: netlink interface for SMC sockets") Reported-by: henaumars <henaumars@sina.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7616 Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Link: https://lore.kernel.org/r/1703662835-53416-1-git-send-email-guwen@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When dma_alloc_coherent() fails, we should free qdev->lrg_buf
to prevent potential memleak.
Fixes: 1357bfcf7106 ("qla3xxx: Dynamically size the rx buffer queue based on the MTU.") Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Link: https://lore.kernel.org/r/20231227070227.10527-1-dinghao.liu@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Use DEV_STATS_INC() and DEV_STATS_READ() which provide
atomicity on paths that can be used concurrently.
Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 2311e06b9bf3 ("virtio_net: fix missing dma unmap for resize") Signed-off-by: Sasha Levin <sashal@kernel.org>
Prevent move_mount from applying the attach_disconnected flag
to move_mount(). This prevents detached mounts from appearing
as / when applying mount mediation, which is not only incorrect
but could result in bad policy being generated.
Basic mount rules like
allow mount,
allow mount options=(move) -> /target/,
will allow detached mounts, allowing older policy to continue
to function. New policy gains the ability to specify `detached` as
a source option
allow mount detached -> /target/,
In addition make sure support of move_mount is advertised as
a feature to userspace so that applications that generate policy
can respond to the addition.
Note: this fixes mediation of move_mount when a detached mount is used,
it does not fix the broader regression of apparmor mediation of
mounts under the new mount api.
According to the Intel Software Manual for I225, Section 7.5.2.7,
hicredit should be multiplied by the constant link-rate value, 0x7736.
Currently, the old constant link-rate value, 0x7735, from the boards
supported on igb are being used, most likely due to a copy'n'paste, as
the rest of the logic is the same for both drivers.
Update hicredit accordingly.
Fixes: 1ab011b0bf07 ("igc: Add support for CBS offloading") Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Rodrigo Cataldo <rodrigo.cadore@l-acoustics.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
During a PCI FLR the MSI-X Enable flag in the VF PCI MSI-X capability
register will be cleared. This can lead to issues when a VF is
assigned to a VM because in these cases the VF driver receives no
indication of the PF PCI error/reset and additionally it is incapable
of restoring the cleared flag in the hypervisor configuration space
without fully reinitializing the driver interrupt functionality.
Since the VF driver is unable to easily resolve this condition on its own,
restore the VF MSI-X flag during the PF PCI reset handling.
When a control changes value the return value from _put() should be 1 so
we get events generated to userspace notifying applications of the change.
While the I2S mux gets this right the S/PDIF mux does not, fix the return
value.
When a control changes value the return value from _put() should be 1 so
we get events generated to userspace notifying applications of the change.
We are checking if there has been a change and exiting early if not but we
are not providing the correct return value in the latter case, fix this.
When writing to an enum we need to verify that the value written is valid
for the enumeration, the helper function snd_soc_item_enum_to_val() doesn't
do it since it needs to return an unsigned (and in any case we'd need to
check the return value).
When writing to an enum we need to verify that the value written is valid
for the enumeration, the helper function snd_soc_item_enum_to_val() doesn't
do it since it needs to return an unsigned (and in any case we'd need to
check the return value).
Commit 3116f59c12bd ("i40e: fix use-after-free in
i40e_sync_filters_subtask()") avoided use-after-free issues,
by increasing refcount during update the VSI filter list to
the HW. However, it missed the unicast situation.
When deleting an unicast FDB entry, the i40e driver will release
the mac_filter, and i40e_service_task will concurrently request
firmware to add the mac_filter, which will lead to the following
use-after-free issue.
Fix again for both netdev->uc and netdev->mc.
BUG: KASAN: use-after-free in i40e_aqc_add_filters+0x55c/0x5b0 [i40e]
Read of size 2 at addr ffff888eb3452d60 by task kworker/8:7/6379
Commit 86a7e0b69bd5 ("net: prevent rewrite of msg_name in
sock_sendmsg()") made sock_sendmsg save the incoming msg_name pointer
and restore it before returning, to insulate the caller against
msg_name being changed by the called code. If the address length
was also changed however, we may return with an inconsistent structure
where the length doesn't match the address, and attempts to reuse it may
lead to lost packets.
For example, a kernel that doesn't have commit 1c5950fc6fe9 ("udp6: fix
potential access to stale information") will replace a v4 mapped address
with its ipv4 equivalent, and shorten namelen accordingly from 28 to 16.
If the caller attempts to reuse the resulting msg structure, it will have
the original ipv6 (v4 mapped) address but an incorrect v4 length.
Fixes: 86a7e0b69bd5 ("net: prevent rewrite of msg_name in sock_sendmsg()") Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
In the init path, nft_data_init() bumps the chain reference counter,
decrement it on error by following the error path which calls
nft_data_release() to restore it.
This fixes openvswitch's handling of nat packets in the related state.
In nf_ct_nat_execute(), which is called from nf_ct_nat(), ICMP/ICMPv6
packets in the IP_CT_RELATED or IP_CT_RELATED_REPLY state, which have
not been dropped, will follow the goto, however the placement of the
goto label means that updating the action bit field will be bypassed.
This causes ovs_nat_update_key() to not be called from ovs_ct_nat()
which means the openvswitch match key for the ICMP/ICMPv6 packet is not
updated and the pre-nat value will be retained for the key, which will
result in the wrong openflow rule being matched for that packet.
Move the goto label above where the action bit field is being set so
that it is updated in all cases where the packet is accepted.
Fixes: ebddb1404900 ("net: move the nat function to nf_nat_ovs for ovs and tc") Signed-off-by: Brad Cowie <brad@faucet.nz> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Xin Long <lucien.xin@gmail.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The flag DMA_TX_APPEND_CRC was only written to the first DMA descriptor
in the TX path, where each descriptor corresponds to a single skbuff
fragment (or the skbuff head). This led to packets with no FCS appearing
on the wire if the kernel allocated the packet in fragments, which would
always happen when using PACKET_MMAP/TPACKET (cf. tpacket_fill_skb() in
net/af_packet.c).
Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file") Signed-off-by: Adrian Cinal <adriancinal1@gmail.com> Acked-by: Doug Berger <opendmb@gmail.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20231228135638.1339245-1-adriancinal1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In efx_probe_filters, the channel->rps_flow_id is freed in a
efx_for_each_channel marco when success equals to 0.
However, after the following call chain:
Running a multi-arch kernel (multi_v7_defconfig) on a Raspberry Pi 3B+
with enabled CONFIG_UBSAN triggers the following warning:
UBSAN: array-index-out-of-bounds in arch/arm/mach-sunxi/mc_smp.c:810:29
index 2 is out of range for type 'sunxi_mc_smp_data [2]'
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.7.0-rc6-00248-g5254c0cbc92d
Hardware name: BCM2835
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x40/0x4c
dump_stack_lvl from ubsan_epilogue+0x8/0x34
ubsan_epilogue from __ubsan_handle_out_of_bounds+0x78/0x80
__ubsan_handle_out_of_bounds from sunxi_mc_smp_init+0xe4/0x4cc
sunxi_mc_smp_init from do_one_initcall+0xa0/0x2fc
do_one_initcall from kernel_init_freeable+0xf4/0x2f4
kernel_init_freeable from kernel_init+0x18/0x158
kernel_init from ret_from_fork+0x14/0x28
Since the enabled method couldn't match with any entry from
sunxi_mc_smp_data, the value of the index shouldn't be used right after
the loop. So move it after the check of ret in order to have a valid
index.
Similar to commit be809424659c ("selftests: bonding: do not set port down
before adding to bond"). The bond-arp-interval-causes-panic test failed
after commit a4abfa627c38 ("net: rtnetlink: Enslave device before bringing
it up") as the kernel will set the port down _after_ adding to bond if setting
port down specifically.
Fix it by removing the link down operation when adding to bond.
Fixes: 2ffd57327ff1 ("selftests: bonding: cause oops in bond_rr_gen_slave_id") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Tested-by: Benjamin Poirier <benjamin.poirier@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 9718475e6908 ("socket: Add SO_TIMESTAMPING_NEW") added the new
socket option SO_TIMESTAMPING_NEW. Setting the option is handled in
sk_setsockopt(), querying it was not handled in sk_getsockopt(), though.
Following remarks on an earlier submission of this patch, keep the old
behavior of getsockopt(SO_TIMESTAMPING_OLD) which returns the active
flags even if they actually have been set through SO_TIMESTAMPING_NEW.
The new getsockopt(SO_TIMESTAMPING_NEW) is stricter, returning flags
only if they have been set through the same option.
Not sure if it's related, but those NICs have a BMC device at function
0:
02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. Realtek RealManage BMC [10ec:816e] (rev 1a)
Trial and error shows that increase the loop wait on
rtl_ep_ocp_read_cond to 30 can eliminate the issue, so let
rtl8168ep_driver_start() to wait a bit longer.
Fixes: e6d6ca6e1204 ("r8169: Add support for another RTL8168FP") Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
m->data needs to be freed when em_text_destroy is called.
Fixes: d675c989ed2d ("[PKT_SCHED]: Packet classification based on textsearch (ematch)") Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Under heavy traffic, the BlueField Gigabit interface can
become unresponsive. This is due to a possible race condition
in the mlxbf_gige_rx_packet function, where the function exits
with producer and consumer indices equal but there are remaining
packet(s) to be processed. In order to prevent this situation,
read receive consumer index *before* the HW replenish so that
the mlxbf_gige_rx_packet function returns an accurate return
value even if a packet is received into just-replenished buffer
prior to exiting this routine. If the just-replenished buffer
is received and occupies the last RX ring entry, the interface
would not recover and instead would encounter RX packet drops
related to internal buffer shortages since the driver RX logic
is not being triggered to drain the RX ring. This patch will
address and prevent this "ring full" condition.
Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver") Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com> Signed-off-by: David Thompson <davthompson@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
AUD_PAD_TOP widget's correct register is AFE_AUD_PAD_TOP , and not zero.
Having a zero as register, it would mean that the `snd_soc_dapm_new_widgets`
would try to read the register at offset zero when trying to get the power
status of this widget, which is incorrect.
Fixes: b65c466220b3 ("ASoC: mediatek: mt8186: support adda in platform driver") Signed-off-by: Eugen Hristev <eugen.hristev@collabora.com> Link: https://lore.kernel.org/r/20231229114342.195867-1-eugen.hristev@collabora.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fixes: b73d9e6225e8 ("ASoC: fsl_rpmsg: Add CPU DAI driver for audio base on rpmsg") Signed-off-by: Chancel Liu <chancel.liu@nxp.com> Acked-by: Shengjiu Wang <shengjiu.wang@gmail.com> Link: https://lore.kernel.org/r/20231225080608.967953-1-chancel.liu@nxp.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently the driver accepts VLAN EtherType steering rules regardless of
the configured mask. And things might fail silently or with confusing error
messages to the user. The VLAN EtherType can only be matched by full
mask. Therefore, add a check for that.
For instance the following rule is invalid, but the driver accepts it and
ignores the user specified mask:
|root@host:~# ethtool -N enp3s0 flow-type ether vlan-etype 0x8100 \
| m 0x00ff action 0
|Added rule with ID 63
|root@host:~# ethtool --show-ntuple enp3s0
|4 RX rings available
|Total 1 rules
|
|Filter: 63
| Flow Type: Raw Ethernet
| Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
| Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
| Ethertype: 0x0 mask: 0xFFFF
| VLAN EtherType: 0x8100 mask: 0x0
| VLAN: 0x0 mask: 0xffff
| User-defined: 0x0 mask: 0xffffffffffffffff
| Action: Direct to queue 0
After:
|root@host:~# ethtool -N enp3s0 flow-type ether vlan-etype 0x8100 \
| m 0x00ff action 0
|rmgr: Cannot insert RX class rule: Operation not supported
Fixes: 2b477d057e33 ("igc: Integrate flex filter into ethtool ops") Suggested-by: Suman Ghosh <sumang@marvell.com> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently the driver accepts VLAN TCI steering rules regardless of the
configured mask. And things might fail silently or with confusing error
messages to the user.
There are two ways to handle the VLAN TCI mask:
1. Match on the PCP field using a VLAN prio filter
2. Match on complete TCI field using a flex filter
Therefore, add checks and code for that.
For instance the following rule is invalid and will be converted into a
VLAN prio rule which is not correct:
|root@host:~# ethtool -N enp3s0 flow-type ether vlan 0x0001 m 0xf000 \
| action 1
|Added rule with ID 61
|root@host:~# ethtool --show-ntuple enp3s0
|4 RX rings available
|Total 1 rules
|
|Filter: 61
| Flow Type: Raw Ethernet
| Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
| Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
| Ethertype: 0x0 mask: 0xFFFF
| VLAN EtherType: 0x0 mask: 0xffff
| VLAN: 0x1 mask: 0x1fff
| User-defined: 0x0 mask: 0xffffffffffffffff
| Action: Direct to queue 1
After:
|root@host:~# ethtool -N enp3s0 flow-type ether vlan 0x0001 m 0xf000 \
| action 1
|rmgr: Cannot insert RX class rule: Operation not supported
Fixes: 7991487ecb2d ("igc: Allow for Flex Filters to be installed") Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Prevent VF from configuring filters with unsupported actions or use
REDIRECT action with invalid tc number. Current checks could cause
out of bounds access on PF side.
Fixes: e284fc280473 ("i40e: Add and delete cloud filter") Reviewed-by: Andrii Staikov <andrii.staikov@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Disabling netdev with ethtool private flag "link-down-on-close" enabled
can cause NULL pointer dereference bug. Shut down VSI regardless of
"link-down-on-close" state.
Fixes: 8ac7132704f3 ("ice: Fix interface being down after reset with link-down-on-close flag on") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The driver should not report an error message when for a medialess port
the link_down_on_close flag is enabled and the physical link cannot be
set down.
Fixes: 8ac7132704f3 ("ice: Fix interface being down after reset with link-down-on-close flag on") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Katarzyna Wieczerzycka <katarzyna.wieczerzycka@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>