With commit 13046370c4d1 ("ALSA: hda/hdmi: let new platforms assign the
pcm slot dynamically"), old behaviour to consider the HDA pin number,
when choosing PCM to assign, was dropped.
Build on this change and limit the number of PCMs created to number of
converters (= maximum number of concurrent display/receivers) when
"mst_no_extra_pcms" and "dyn_pcm_no_legacy" quirks are both set.
Fix the check in hdmi_find_pcm_slot() to ensure only spec->pcm_used
entries are considered in the search. Elsewhere in the driver
spec->pcm_used is already checked properly.
Doing this avoids following warning at SOF driver probe for multiple
machine drivers:
[ 112.425297] sof_sdw sof_sdw: hda_dsp_hdmi_build_controls: no
PCM in topology for HDMI converter 4
[ 112.425298] sof_sdw sof_sdw: hda_dsp_hdmi_build_controls: no
PCM in topology for HDMI converter 5
[ 112.425299] sof_sdw sof_sdw: hda_dsp_hdmi_build_controls: no
PCM in topology for HDMI converter 6
I encountered some occasional crashes of poke_int3_handler() when
kprobes are set, while accessing desc->vec.
The text poke mechanism claims to have an RCU-like behavior, but it
does not appear that there is any quiescent state to ensure that
nobody holds reference to desc. As a result, the following race
appears to be possible, which can lead to memory corruption.
if (!desc) [false, success]
WRITE_ONCE(bp_desc, NULL);
atomic_dec_and_test(&desc.refs)
[ success, desc space on the stack
is being reused and might have
non-zero value. ]
arch_atomic_inc_not_zero(&desc->refs)
[ might succeed since desc points to
stack memory that was freed and might
be reused. ]
Fix this issue with small backportable patch. Instead of trying to
make RCU-like behavior for bp_desc, just eliminate the unnecessary
level of indirection of bp_desc, and hold the whole descriptor as a
global. Anyhow, there is only a single descriptor at any given
moment.
Fixes: 1f676247f36a4 ("x86/alternatives: Implement a better poke_int3_handler() completion scheme") Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@kernel.org Link: https://lkml.kernel.org/r/20220920224743.3089-1-namit@vmware.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The only thing reported by CPUID.9 is the value of
IA32_PLATFORM_DCA_CAP[31:0] in EAX. This MSR doesn't even exist in the
guest, since CPUID.1:ECX.DCA[bit 18] is clear in the guest.
Clear CPUID.9 in KVM_GET_SUPPORTED_CPUID.
Fixes: 24c82e576b78 ("KVM: Sanitize cpuid") Signed-off-by: Jim Mattson <jmattson@google.com>
Message-Id: <20220922231854.249383-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
After commit 31fd9b79dc58 ("ARM: dts: BCM5301X: update CRU block
description") a warning from clk-iproc-pll.c was generated due to a
duplicate PLL name as well as the console stopped working. Upon closer
inspection it became clear that iproc_pll_clk_setup() used the Device
Tree node unit name as an unique identifier as well as a parent name to
parent all clocks under the PLL.
BCM5301X was the first platform on which that got noticed because of the
DT node unit name renaming but the same assumptions hold true for any
user of the iproc_pll_clk_setup() function.
The first 'clock-output-names' property is always guaranteed to be
unique as well as providing the actual desired PLL clock name, so we
utilize that to register the PLL and as a parent name of all children
clock.
There is no dedicate parent clock for QSPI so SET_RATE_PARENT flag
should not be used. For instance, the default parent clock for QSPI is
pll2_bus, which is also the parent clock for quite a few modules, such
as MMDC, once GPMI NAND set clock rate for EDO5 mode can cause system
hang due to pll2_bus rate changed.
Fixes: f1541e15e38e ("clk: imx6sx: Switch to clk_hw based API") Signed-off-by: Han Xu <han.xu@nxp.com> Link: https://lore.kernel.org/r/20220915150959.3646702-1-han.xu@nxp.com Tested-by: Fabio Estevam <festevam@denx.de> Reviewed-by: Abel Vesa <abel.vesa@linaro.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The issue is related with serdes which impacts clock. There is
serdes in ADLink I-Pi SMARC board ethernet controller. Please refer to
commit b9663b7ca6ff78 ("net: stmmac: Enable SERDES power up/down sequence")
for detial. When issue is reproduced, DMA engine clock is not ready
because serdes is not powered up.
To reproduce DMA engine reset timeout issue with hardware which has
serdes in GBE controller, install Ubuntu. In Ubuntu GUI, click
"Power Off/Log Out" -> "Suspend" menu, it disables network interface,
then goes to sleep mode. When it wakes up, it enables network
interface again. Stmmac driver is called in this way:
1. stmmac_release: Stop network interface. In this function, it
disables DMA engine and network interface;
2. stmmac_suspend: It is called in kernel suspend flow. But because
network interface has been disabled(netif_running(ndev) is
false), it does nothing and returns directly;
3. System goes into S3 or S0ix state. Some time later, system is
waken up by keyboard or mouse;
4. stmmac_resume: It does nothing because network interface has
been disabled;
5. stmmac_open: It is called to enable network interace again. DMA
engine is initialized in this API, but serdes is not power on so
there will be DMA engine reset timeout issue.
Similarly, serdes powerdown should be added in stmmac_release.
Network interface might be disabled by cmd "ifconfig eth0 down",
DMA engine, phy and mac have been disabled in ndo_stop callback,
serdes should be powered down as well. It doesn't make sense that
serdes is on while other components have been turned off.
If ethernet interface is in enabled state(netif_running(ndev) is true)
before suspend/resume, the issue couldn't be reproduced because serdes
could be powered up in stmmac_resume.
Because serdes_powerup is added in stmmac_open, it doesn't need to be
called in probe function.
Fixes: b9663b7ca6ff78 ("net: stmmac: Enable SERDES power up/down sequence") Signed-off-by: Junxiao Chang <junxiao.chang@intel.com> Reviewed-by: Voon Weifeng <weifeng.voon@intel.com> Tested-by: Jimmy JS Chen <jimmyjs.chen@adlinktech.com> Tested-by: Looi, Hong Aun <hong.aun.looi@intel.com> Link: https://lore.kernel.org/r/20220923050448.1220250-1-junxiao.chang@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The IOC_PR_CLEAR and IOC_PR_RELEASE ioctls are
non-functional on NVMe devices because the nvme_pr_clear()
and nvme_pr_release() functions set the IEKEY field incorrectly.
The IEKEY field should be set only when the key is zero (i.e,
not specified). The current code does it backwards.
Furthermore, the NVMe spec describes the persistent
reservation "clear" function as an option on the reservation
release command. The current implementation of nvme_pr_clear()
erroneously uses the reservation register command.
Fix these errors. Note that NVMe version 1.3 and later specify
that setting the IEKEY field will return an error of Invalid
Field in Command. The fix will set IEKEY when the key is zero,
which is appropriate as these ioctls consider a zero key to
be "unspecified", and the intention of the spec change is
to require a valid key.
Tested on a version 1.4 PCI NVMe device in an Azure VM.
Fixes: 1673f1f08c88 ("nvme: move block_device_operations and ns/ctrl freeing to common code") Fixes: 1d277a637a71 ("NVMe: Add persistent reservation ops") Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
Add a new line in functions nvme_pr_preempt(), nvme_pr_clear(), and
nvme_pr_release() after variable declaration which follows the rest of
the code in the nvme/host/core.c.
No functional change(s) in this patch.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Stable-dep-of: c292a337d0e4 ("nvme: Fix IOC_PR_CLEAR and IOC_PR_RELEASE ioctls for nvme devices") Signed-off-by: Sasha Levin <sashal@kernel.org>
The label passed to the QDESC_GET for the ETHOFLD TXQ, RXQ, and FLQ, is the
'out' one, which skips the 'out_unlock' label, and thus doesn't unlock the
'uld_mutex' before returning. Additionally, since commit 5148e5950c67
("cxgb4: add EOTID tracking and software context dump"), the access to
these ETHOFLD hardware queues should be protected by the 'mqprio_mutex'
instead.
Currently usbnet_disconnect() unanchors and frees all deferred URBs
using usb_scuttle_anchored_urbs(), which does not free urb->context,
causing a memory leak as reported by syzbot.
Use a usb_get_from_anchor() while loop instead, similar to what we did
in commit 19cfe912c37b ("Bluetooth: btusb: Fix memory leak in
play_deferred"). Also free urb->sg.
Reported-and-tested-by: syzbot+dcd3e13cf4472f2e0ba1@syzkaller.appspotmail.com Fixes: 69ee472f2706 ("usbnet & cdc-ether: Autosuspend for online devices") Fixes: 638c5115a794 ("USBNET: support DMA SG") Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Link: https://lore.kernel.org/r/20220923042551.2745-1-yepeilin.cs@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
For quite some time, core DRM helpers already ensure that any relevant
connectors/CRTCs/etc. are disabled, as well as their associated
components (e.g., bridges) when suspending the system. Thus,
analogix_dp_bridge_{enable,disable}() already get called, which in turn
call drm_panel_{prepare,unprepare}(). This makes these drm_panel_*()
calls redundant.
Besides redundancy, there are a few problems with this handling:
(1) drm_panel_{prepare,unprepare}() are *not* reference-counted APIs and
are not in general designed to be handled by multiple callers --
although some panel drivers have a coarse 'prepared' flag that mitigates
some damage, at least. So at a minimum this is redundant and confusing,
but in some cases, this could be actively harmful.
(2) The error-handling is a bit non-standard. We ignored errors in
suspend(), but handled errors in resume(). And recently, people noticed
that the clk handling is unbalanced in error paths, and getting *that*
right is not actually trivial, given the current way errors are mostly
ignored.
(3) In the particular way analogix_dp_{suspend,resume}() get used (e.g.,
in rockchip_dp_*(), as a late/early callback), we don't necessarily have
a proper PM relationship between the DP/bridge device and the panel
device. So while the DP bridge gets resumed, the panel's parent device
(e.g., platform_device) may still be suspended, and so any prepare()
calls may fail.
So remove the superfluous, possibly-harmful suspend()/resume() handling
of panel state.
On probe of the ASoC component, the device is reset but the regcache is
retained. This means the regcache gets out of sync if the codec is
rebound to a sound card for a second time. Fix it by reinitializing the
regcache to defaults after the device is reset.
Fixes: b0bcbe615756 ("ASoC: tas2770: Fix calling reset in probe") Signed-off-by: Martin Povišer <povik+lin@cutebit.org> Link: https://lore.kernel.org/r/20220919173453.84292-1-povik+lin@cutebit.org Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Errors from debugfs are intended to be non-fatal, and should not prevent
the driver from probing.
Since debugfs file creation is treated as infallible, move it below the
parts of the probe function that can fail. This prevents an error
elsewhere in the probe function from causing the file to leak. Do the
same for the call to of_platform_populate().
Finally, checkpatch suggests an octal literal for the file permissions.
Fixes: 4af34b572a85 ("drivers: soc: sunxi: Introduce SoC driver to map SRAMs") Fixes: 5828729bebbb ("soc: sunxi: export a regmap for EMAC clock reg on A64") Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com> Signed-off-by: Samuel Holland <samuel@sholland.org> Tested-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com> Link: https://lore.kernel.org/r/20220815041248.53268-6-samuel@sholland.org Signed-off-by: Sasha Levin <sashal@kernel.org>
This driver exports a regmap tied to the platform device (as opposed to
a syscon, which exports a regmap tied to the OF node). Because of this,
the driver can never be unbound, as that would destroy the regmap. Use
builtin_platform_driver_probe() to enforce this limitation.
Fixes: 5828729bebbb ("soc: sunxi: export a regmap for EMAC clock reg on A64") Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com> Signed-off-by: Samuel Holland <samuel@sholland.org> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com> Link: https://lore.kernel.org/r/20220815041248.53268-5-samuel@sholland.org Signed-off-by: Sasha Levin <sashal@kernel.org>
sunxi_sram_claim() checks the sram_desc->claimed flag before updating
the register, with the intent that only one device can claim a region.
However, this was ineffective because the flag was never set.
On i.MX7/iMX8MM/iMX8MQ, the initialized default value of PERST bit(BIT3)
of SRC_PCIEPHY_RCR is 1b'1.
But i.MX8MP has one inversed default value 1b'0 of PERST bit.
And the PERST bit should be kept 1b'1 after power and clocks are stable.
So fix the i.MX8MP PCIe PHY PERST support here.
Fixes: e08672c03981 ("reset: imx7: Add support for i.MX8MP SoC") Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Tested-by: Marek Vasut <marex@denx.de> Tested-by: Richard Leitner <richard.leitner@skidata.com> Tested-by: Alexander Stein <alexander.stein@ew.tq-group.com> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Link: https://lore.kernel.org/r/1661845564-11373-5-git-send-email-hongxing.zhu@nxp.com Signed-off-by: Sasha Levin <sashal@kernel.org>
According to technical manual(table 11-24), the DMA of MMCHS0 should be
direct mapped.
Fixes: b5e509066074 ("ARM: DTS: am33xx: Use the new DT bindings for the eDMA3") Signed-off-by: YuTong Chang <mtwget@gmail.com>
Message-Id: <20220620124146.5330-1-mtwget@gmail.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
1) commit 4e89dce72521 ("iommu/iova: Retry from last rb tree node if
iova search fails") tries to fix that iova allocation can fail while
there are still free space available. This is not backported to 5.10
stable.
2) commit fce54ed02757 ("scsi: hisi_sas: Limit max hw sectors for v3
HW") fix the performance regression introduced by 1), however, this
is just a temporary solution and will cause io performance regression
because it limit max io size to PAGE_SIZE * 32(128k for 4k page_size).
3) John Garry posted a patchset to fix the problem.
4) The temporary solution is reverted.
It's weird that the patch in 2) is backported to 5.10 stable alone,
while the right thing to do is to backport them all together.
swiotlb_find_slots() skips slots according to io tlb aligned mask
calculated from min aligned mask and original physical address
offset. This affects max mapping size. The mapping size can't
achieve the IO_TLB_SEGSIZE * IO_TLB_SIZE when original offset is
non-zero. This will cause system boot up failure in Hyper-V
Isolation VM where swiotlb force is enabled. Scsi layer use return
value of dma_max_mapping_size() to set max segment size and it
finally calls swiotlb_max_mapping_size(). Hyper-V storage driver
sets min align mask to 4k - 1. Scsi layer may pass 256k length of
request buffer with 0~4k offset and Hyper-V storage driver can't
get swiotlb bounce buffer via DMA API. Swiotlb_find_slots() can't
find 256k length bounce buffer with offset. Make swiotlb_max_mapping
_size() take min align mask into account.
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Rishabh Bhatnagar <risbhat@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Quite often, the HW get stuck in error condition if a stream error
was detected. As documented, the HW should stop immediately and self
reset. There is likely a problem or a miss-understanding of the self
reset mechanism, as unless we make a long pause, the next command
will then report an error even if there is no error in it.
Disabling error detection fixes the issue, and let the decoder continue
after an error. This patch is safe for backport into older kernels.
Fixes: cd33c830448b ("media: rkvdec: Add the rkvdec driver") Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Reviewed-by: Brian Norris <briannorris@chromium.org> Tested-by: Brian Norris <briannorris@chromium.org> Reviewed-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When clearing a PTE the TLB should be flushed whilst still holding the PTL
to avoid a potential race with madvise/munmap/etc. For example consider
the following sequence:
CPU0 CPU1
---- ----
migrate_vma_collect_pmd()
pte_unmap_unlock()
madvise(MADV_DONTNEED)
-> zap_pte_range()
pte_offset_map_lock()
[ PTE not present, TLB not flushed ]
pte_unmap_unlock()
[ page is still accessible via stale TLB ]
flush_tlb_range()
In this case the page may still be accessed via the stale TLB entry after
madvise returns. Fix this by flushing the TLB while holding the PTL.
Fixes: 8c3328f1f36a ("mm/migrate: migrate_vma() unmap page from vma while collecting pages") Link: https://lkml.kernel.org/r/9f801e9d8d830408f2ca27821f606e09aa856899.1662078528.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Reported-by: Nadav Amit <nadav.amit@gmail.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Cc: Alex Sierra <alex.sierra@amd.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: huang ying <huang.ying.caritas@gmail.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Karol Herbst <kherbst@redhat.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A number of drivers call page_frag_alloc() with a fragment's size >
PAGE_SIZE.
In low memory conditions, __page_frag_cache_refill() may fail the order
3 cache allocation and fall back to order 0; In this case, the cache
will be smaller than the fragment, causing memory corruptions.
Prevent this from happening by checking if the newly allocated cache is
large enough for the fragment; if not, the allocation will fail and
page_frag_alloc() will return NULL.
Link: https://lkml.kernel.org/r/20220715125013.247085-1-mlombard@redhat.com Fixes: b63ae8ca096d ("mm/net: Rename and move page fragment handling from net/ to mm/") Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Cc: Chen Lin <chen45464546@163.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For a GFP_KERNEL allocation, alloc_pages_slowpath() will save the
offset of ZONE_NORMAL in ac->preferred_zoneref. If a concurrent
memory_offline operation removes the last page from ZONE_MOVABLE,
build_all_zonelists() & build_zonerefs_node() will update
node_zonelists as shown below. Only populated zones are added.
The race is simple -- page allocation could be in progress when a memory
hot-remove operation triggers a zonelist rebuild that removes zones. The
allocation request will still have a valid ac->preferred_zoneref that is
now pointing to NULL and triggers an OOM kill.
This problem probably always existed but may be slightly easier to trigger
due to 6aa303defb74 ("mm, vmscan: only allocate and reclaim from zones
with pages managed by the buddy allocator") which distinguishes between
zones that are completely unpopulated versus zones that have valid pages
not managed by the buddy allocator (e.g. reserved, memblock, ballooning
etc). Memory hotplug had multiple stages with timing considerations
around managed/present page updates, the zonelist rebuild and the zone
span updates. As David Hildenbrand puts it
memory offlining adjusts managed+present pages of the zone
essentially in one go. If after the adjustments, the zone is no
longer populated (present==0), we rebuild the zone lists.
Once that's done, we try shrinking the zone (start+spanned
pages) -- which results in zone_start_pfn == 0 if there are no
more pages. That happens *after* rebuilding the zonelists via
remove_pfn_range_from_zone().
The only requirement to fix the race is that a page allocation request
identifies when a zonelist rebuild has happened since the allocation
request started and no page has yet been allocated. Use a seqlock_t to
track zonelist updates with a lockless read-side of the zonelist and
protecting the rebuild and update of the counter with a spinlock.
[akpm@linux-foundation.org: make zonelist_update_seq static] Link: https://lkml.kernel.org/r/20220824110900.vh674ltxmzb3proq@techsingularity.net Fixes: 6aa303defb74 ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator") Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: Patrick Daly <quic_pdaly@quicinc.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: <stable@vger.kernel.org> [4.9+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The block device uses multiple queues to access emmc. There will be up to 3
requests in the hsq of the host. The current code will check whether there
is a request doing recovery before entering the queue, but it will not check
whether there is a request when the lock is issued. The request is in recovery
mode. If there is a request in recovery, then a read and write request is
initiated at this time, and the conflict between the request and the recovery
request will cause the data to be trampled.
According to the datasheet [1] at page 377, 4-bit bus width is turned on by
bit 2 of the Bus Width Register. Thus the current bitmask is wrong: define
BUS_WIDTH_4 BIT(1)
BIT(1) does not work but BIT(2) works. This has been verified on real MOXA
hardware with FTSDC010 controller revision 1_6_0.
The corrected value of BUS_WIDTH_4 mask collides with: define BUS_WIDTH_8
BIT(2). Additionally, 8-bit bus width mode isn't supported according to the
datasheet, so let's remove the corresponding code.
Commit 1527f69204fe ("ata: ahci: Add Green Sardine vendor ID as
board_ahci_mobile") added an explicit entry for AMD Green Sardine
AHCI controller using the board_ahci_mobile configuration (this
configuration has later been renamed to board_ahci_low_power).
The board_ahci_low_power configuration enables support for low power
modes.
This explicit entry takes precedence over the generic AHCI controller
entry, which does not enable support for low power modes.
Therefore, when commit 1527f69204fe ("ata: ahci: Add Green Sardine
vendor ID as board_ahci_mobile") was backported to stable kernels,
it make some Pioneer optical drives, which was working perfectly fine
before the commit was backported, stop working.
The real problem is that the Pioneer optical drives do not handle low
power modes correctly. If these optical drives would have been tested
on another AHCI controller using the board_ahci_low_power configuration,
this issue would have been detected earlier.
Unfortunately, the board_ahci_low_power configuration is only used in
less than 15% of the total AHCI controller entries, so many devices
have never been tested with an AHCI controller with low power modes.
Fixes: 1527f69204fe ("ata: ahci: Add Green Sardine vendor ID as board_ahci_mobile") Cc: stable@vger.kernel.org Reported-by: Jaap Berkhout <j.j.berkhout@staalenberk.nl> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Move the PLL init of the switch out of the pad configuration of the port
6 (usally cpu port).
Fix a unidirectional 100 mbit limitation on 1 gbit or 2.5 gbit links for
outbound traffic on port 5 or port 6.
Fixes: c288575f7810 ("net: dsa: mt7530: Add the support of MT7531 switch") Cc: stable@vger.kernel.org Signed-off-by: Alexander Couzens <lynxis@fe80.eu> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix this by adding sanity check on extended system files' directory inode
to ensure that it is directory, just like ntfs_extend_init() when mounting
ntfs3.
Link: https://lkml.kernel.org/r/20220809064730.2316892-1-chenxiaosong2@huawei.com Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> Cc: Anton Altaparmakov <anton@tuxera.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Access to registers is guarded by ingenic_tcu_{enable,disable}_regs()
so the stop bit can be cleared before accessing a timer channel, but
those functions did not clear the stop bit on SoCs with a global TCU
clock gate.
Testing on the X1000 has revealed that the stop bits must be cleared
_and_ the global TCU clock must be ungated to access timer registers.
This appears to be the norm on Ingenic SoCs, and is specified in the
documentation for the X1000 and numerous JZ47xx SoCs.
If the stop bit isn't cleared, register writes don't take effect and
the system can be left in a broken state, eg. the watchdog timer may
not run.
The bug probably went unnoticed because stop bits are zeroed when
the SoC is reset, and the kernel does not set them unless a timer
gets disabled at runtime. However, it is possible that a bootloader
or a previous kernel (if using kexec) leaves the stop bits set and
we should not rely on them being cleared.
Fixing this is easy: have ingenic_tcu_{enable,disable}_regs() always
clear the stop bit, regardless of the presence of a global TCU gate.
Reviewed-by: Paul Cercueil <paul@crapouillou.net> Tested-by: Paul Cercueil <paul@crapouillou.net> Fixes: 4f89e4b8f121 ("clk: ingenic: Add driver for the TCU clocks") Cc: stable@vger.kernel.org Signed-off-by: Aidan MacDonald <aidanmacdonald.0x0@gmail.com> Link: https://lore.kernel.org/r/20220617122254.738900-1-aidanmacdonald.0x0@gmail.com Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Both i.MX6 and i.MX8 reference manuals list 0xBF8 as SNVS_HPVIDR1
(chapters 57.9 and 6.4.5 respectively).
Without this, trying to read the revision number results in 0 on
all revisions, causing the i.MX6 quirk to apply on all platforms,
which in turn causes the driver to synthesise power button release
events instead of passing the real one as they happen even on
platforms like i.MX8 where that's not wanted.
Fixes: 1a26c920717a ("Input: snvs_pwrkey - send key events for i.MX6 S, DL and Q") Tested-by: Martin Kepplinger <martin.kepplinger@puri.sm> Signed-off-by: Sebastian Krzyszkowiak <sebastian.krzyszkowiak@puri.sm> Reviewed-by: Mattijs Korpershoek <mkorpershoek@baylibre.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/4599101.ElGaqSPkdT@pliszka Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If any software has interacted with the USB4 registers before the Linux
USB4 CM runs, it may have modified the plug events delay. It has been
observed that if this value too large, it's possible that hotplugged
devices will negotiate a fallback mode instead in Linux.
To prevent this, explicitly align the plug events delay with the USB4
spec value of 10ms.
Cc: stable@vger.kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sink only devices do not have any source capabilities, so
the driver should not warn about that. Also DRP (Dual Role
Power) capable devices, such as USB Type-C docking stations,
do not return any source capabilities unless they are
plugged to a power supply themselves.
Fixes: 1f4642b72be7 ("usb: typec: ucsi: Retrieve all the PDOs instead of just the first 4") Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Cc: <stable@vger.kernel.org> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20220922145924.80667-1-heikki.krogerus@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The UAS mode of Thinkplus(0x17ef, 0x3899) is reported to influence
performance and trigger kernel panic on several platforms with the
following error message:
The UAS mode of Hiksemi USB_HDD is reported to fail to work on several
platforms with the following error message, then after re-connecting the
device will be offlined and not working at all.
The UAS mode of Hiksemi is reported to fail to work on several platforms
with the following error message, then after re-connecting the device will
be offlined and not working at all.
1) The cleaner kthread tries to start a transaction to delete an unused
block group, but the metadata reservation can not be satisfied right
away, so a reservation ticket is created and it starts the async
metadata reclaim task (fs_info->async_reclaim_work);
2) Writeback for all the filler inodes with an i_size of 2K starts
(generic/562 creates a lot of 2K files with the goal of filling
metadata space). We try to create an inline extent for them, but we
fail when trying to insert the inline extent with -ENOSPC (at
cow_file_range_inline()) - since this is not critical, we fallback
to non-inline mode (back to cow_file_range()), reserve extents, create
extent maps and create the ordered extents;
3) An unmount starts, enters close_ctree();
4) The async reclaim task is flushing stuff, entering the flush states one
by one, until it reaches RUN_DELAYED_IPUTS. There it runs all current
delayed iputs.
After running the delayed iputs and before calling
btrfs_wait_on_delayed_iputs(), one or more ordered extents complete,
and btrfs_add_delayed_iput() is called for each one through
btrfs_finish_ordered_io() -> btrfs_put_ordered_extent(). This results
in bumping fs_info->nr_delayed_iputs from 0 to some positive value.
So the async reclaim task blocks at btrfs_wait_on_delayed_iputs() waiting
for fs_info->nr_delayed_iputs to become 0;
5) The current transaction is committed by the transaction kthread, we then
start unpinning extents and end up calling btrfs_try_granting_tickets()
through unpin_extent_range(), since we released some space.
This results in satisfying the ticket created by the cleaner kthread at
step 1, waking up the cleaner kthread;
6) At close_ctree() we ask the cleaner kthread to park;
7) The cleaner kthread starts the transaction, deletes the unused block
group, and then calls kthread_should_park(), which returns true, so it
parks. And at this point we have the delayed iputs added by the
completion of the ordered extents still pending;
8) Then later at close_ctree(), when we call:
cancel_work_sync(&fs_info->async_reclaim_work);
We hang forever, since the cleaner was parked and no one else can run
delayed iputs after that, while the reclaim task is waiting for the
remaining delayed iputs to be completed.
Fix this by waiting for all ordered extents to complete and running the
delayed iputs before attempting to stop the async reclaim tasks. Note that
we can not wait for ordered extents with btrfs_wait_ordered_roots() (or
other similar functions) because that waits for the BTRFS_ORDERED_COMPLETE
flag to be set on an ordered extent, but the delayed iput is added after
that, when doing the final btrfs_put_ordered_extent(). So instead wait for
the work queues used for executing ordered extent completion to be empty,
which works because we do the final put on an ordered extent at
btrfs_finish_ordered_io() (while we are in the unmount context).
Fixes: d6fd0ae25c6495 ("Btrfs: fix missing delayed iputs on unmount") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Nvidia HDA HW expects infoframe data bytes order same for both
HDMI and DP i.e infoframe data starts from 5th bytes offset. As
dp infoframe structure has 4th byte as valid infoframe data, use
hdmi infoframe structure for nvidia dp infoframe to match HW behvaior.
If the platform set the dyn_pcm_assign to true, it will call
hdmi_find_pcm_slot() to find a pcm slot when hdmi/dp monitor is
connected and need to create a pcm.
So far only intel_hsw_common_init() and patch_nvhdmi() set the
dyn_pcm_assign to true, here we let tgl platforms assign the pcm slot
dynamically first, if the driver runs for a period of time and there
is no regression reported, we could set no_fixed_assgin to true in
the intel_hsw_common_init(), and then set it to true in the
patch_nvhdmi().
This change comes from the discussion between Takashi and
Kai Vehmanen. Please refer to:
https://github.com/alsa-project/alsa-lib/pull/118
Use clk_bulk helpers to make code cleaner. Note that this patch changed
the order in which clocks are enabled to make code look nicer, but this
doesn't matter in terms of hardware.
Add support for Maple Ridge discrete USB4 host controller from Intel
which has a single USB4 port (versus the already supported dual port
Maple Ridge USB4 host controller).
Cc: stable@vger.kernel.org Signed-off-by: Gil Fine <gil.fine@intel.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Maple Ridge is first discrete USB4 host controller from Intel. It comes
with firmware based connection manager and the flows are similar as used
in Intel Titan Ridge.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Stable-dep-of: 14c7d9052837 ("thunderbolt: Add support for Intel Maple Ridge single port controller") Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently the Orlov inode allocator searches for free inodes for a
directory only in flex block groups with at most inodes_per_group/16
more directory inodes than average per flex block group. However with
growing size of flex block group this becomes unnecessarily strict.
Scale allowed difference from average directory count per flex block
group with flex block group size as we do with other metrics.
This patch avoids threads live-locking for hours when a large number
threads are competing over the last few free extents as they blocks
getting added and removed from preallocation pools. From our bug
reporter:
A reliable way for triggering this has multiple writers
continuously write() to files when the filesystem is full, while
small amounts of space are freed (e.g. by truncating a large file
-1MiB at a time). In the local filesystem, this can be done by
simply not checking the return code of write (0) and/or the error
(ENOSPACE) that is set. Over NFS with an async mount, even clients
with proper error checking will behave this way since the linux NFS
client implementation will not propagate the server errors [the
write syscalls immediately return success] until the file handle is
closed. This leads to a situation where NFS clients send a
continuous stream of WRITE rpcs which result in ERRNOSPACE -- but
since the client isn't seeing this, the stream of writes continues
at maximum network speed.
When some space does appear, multiple writers will all attempt to
claim it for their current write. For NFS, we may see dozens to
hundreds of threads that do this.
The real-world scenario of this is database backup tooling (in
particular, github.com/mdkent/percona-xtrabackup) which may write
large files (>1TiB) to NFS for safe keeping. Some temporary files
are written, rewound, and read back -- all before closing the file
handle (the temp file is actually unlinked, to trigger automatic
deletion on close/crash.) An application like this operating on an
async NFS mount will not see an error code until TiB have been
written/read.
The lockup was observed when running this database backup on large
filesystems (64 TiB in this case) with a high number of block
groups and no free space. Fragmentation is generally not a factor
in this filesystem (~thousands of large files, mostly contiguous
except for the parts written while the filesystem is at capacity.)
When walking through an inode extents, the ext4_ext_binsearch_idx() function
assumes that the extent header has been previously validated. However, there
are no checks that verify that the number of entries (eh->eh_entries) is
non-zero when depth is > 0. And this will lead to problems because the
EXT_FIRST_INDEX() and EXT_LAST_INDEX() will return garbage and result in this:
The "hmem" platform-devices that are created to represent the
platform-advertised "Soft Reserved" memory ranges end up inserting a
resource that causes the iomem_resource tree to look like this:
This is because insert_resource() reparents ranges when they completely
intersect an existing range.
This matters because code that uses region_intersects() to scan for a
given IORES_DESC will only check that top-level 'hmem.0' resource and
not the 'Soft Reserved' descendant.
So, to support EINJ (via einj_error_inject()) to inject errors into
memory hosted by a dax-device, be sure to describe the memory as
IORES_DESC_SOFT_RESERVED. This is a follow-on to:
commit b13a3e5fd40b ("ACPI: APEI: Fix _EINJ vs EFI_MEMORY_SP")
...that fixed EINJ support for "Soft Reserved" ranges in the first
instance.
Fixes: 262b45ae3ab4 ("x86/efi: EFI soft reservation to E820 enumeration") Reported-by: Ricardo Sandoval Torres <ricardo.sandoval.torres@intel.com> Tested-by: Ricardo Sandoval Torres <ricardo.sandoval.torres@intel.com> Cc: <stable@vger.kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Omar Avelar <omar.avelar@intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mark Gross <markgross@kernel.org> Link: https://lore.kernel.org/r/166397075670.389916.7435722208896316387.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The i2c-mlxbf.c driver is currently broken because there is a bug
in the calculation of the frequency. core_f, core_r and core_od
are components read from hardware registers and are used to
compute the frequency used to compute different timing parameters.
The shifting mechanism used to get core_f, core_r and core_od is
wrong. Use FIELD_GET to mask and shift the bitfields properly.
Correct the base address used during io write.
This bug had no impact over the overall functionality of the read and write
transactions. MLXBF_I2C_CAUSE_OR_CLEAR=0x18 so writing to (smbus->io + 0x18)
instead of (mst_cause->ioi + 0x18) actually writes to the sc_low_timeout
register which just sets the timeout value before a read/write aborts.
syzbot should have been able to catch cancel_work_sync() in work context
by checking lockdep_map in __flush_work() for both flush and cancel.
in [1], being unable to report an obvious deadlock scenario shown below is
broken. From locking dependency perspective, sync version of cancel request
should behave as if flush request, for it waits for completion of work if
that work has already started execution.
The check this patch restores was added by commit 0976dfc1d0cd80a4
("workqueue: Catch more locking problems with flush_work()").
Then, lockdep's crossrelease feature was added by commit b09be676e0ff25bd
("locking/lockdep: Implement the 'crossrelease' feature"). As a result,
this check was once removed by commit fd1a5b04dfb899f8 ("workqueue: Remove
now redundant lock acquisitions wrt. workqueue flushes").
But lockdep's crossrelease feature was removed by commit e966eaeeb623f099
("locking/lockdep: Remove the cross-release locking checks"). At this
point, this check should have been restored.
Then, commit d6e89786bed977f3 ("workqueue: skip lockdep wq dependency in
cancel_work_sync()") introduced a boolean flag in order to distinguish
flush_work() and cancel_work_sync(), for checking "struct workqueue_struct"
dependency when called from cancel_work_sync() was causing false positives.
Then, commit 87915adc3f0acdf0 ("workqueue: re-add lockdep dependencies for
flushing") tried to restore "struct work_struct" dependency check, but by
error checked this boolean flag. Like an example shown above indicates,
"struct work_struct" dependency needs to be checked for both flush_work()
and cancel_work_sync().
The mode_valid field in drm_connector_helper_funcs is expected to be of
type:
enum drm_mode_status (* mode_valid) (struct drm_connector *connector,
struct drm_display_mode *mode);
The mismatched return type breaks forward edge kCFI since the underlying
function definition does not match the function hook definition.
The return type of cdn_dp_connector_mode_valid should be changed from
int to enum drm_mode_status.
Commit a0f7e7f759cf ("drm/amd/display: fix i386 frame size warning")
aimed to address this for i386 but it did not help x86_64.
To reduce the amount of stack space that
dml30_ModeSupportAndSystemConfigurationFull() uses, mark
UseMinimumDCFCLK() as noinline, using the _for_stack variant for
documentation. While this will increase the total amount of stack usage
between the two functions (1632 and 1304 bytes respectively), it will
make sure both stay below the limit of 2048 bytes for these files. The
aforementioned change does help reduce UseMinimumDCFCLK()'s stack usage
so it should not be reverted in favor of this change.
[Why]
For HDR mode, we get total 512 tf_point and after switching to SDR mode
we actually get 400 tf_point and the rest of points(401~512) still use
dirty value from HDR mode. We should limit the rest of the points to max
value.
[How]
Limit the value when coordinates_x.x > 1, just like what we do in
translate_from_linear_space for other re-gamma build paths.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Reviewed-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com> Reviewed-by: Aric Cyr <Aric.Cyr@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Signed-off-by: Yao Wang1 <Yao.Wang1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
the interesting part is the 'f8000000' region as it is actually the
VM's framebuffer:
$ lspci -v
...
0000:00:08.0 VGA compatible controller: Microsoft Corporation Hyper-V virtual VGA (prog-if 00 [VGA controller])
Flags: bus master, fast devsel, latency 0, IRQ 11
Memory at f8000000 (32-bit, non-prefetchable) [size=64M]
...
hv_vmbus: registering driver hyperv_drm
hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] Synthvid Version major 3, minor 5
hyperv_drm 0000:00:08.0: vgaarb: deactivate vga console
hyperv_drm 0000:00:08.0: BAR 0: can't reserve [mem 0xf8000000-0xfbffffff]
hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] Cannot request framebuffer, boot fb still active?
Note: "Cannot request framebuffer" is not a fatal error in
hyperv_setup_gen1() as the code assumes there's some other framebuffer
device there but we actually have some other PCI device (mlx4 in this
case) config space there!
The problem appears to be that vmbus_allocate_mmio() can use dedicated
framebuffer region to serve any MMIO request from any device. The
semantics one might assume of a parameter named "fb_overlap_ok"
aren't implemented because !fb_overlap_ok essentially has no effect.
The existing semantics are really "prefer_fb_overlap". This patch
implements the expected and needed semantics, which is to not allocate
from the frame buffer space when !fb_overlap_ok.
Note, Gen2 VMs are usually unaffected by the issue because
framebuffer region is already taken by EFI fb (in case kernel supports
it) but Gen1 VMs may have this region unclaimed by the time Hyper-V PCI
pass-through driver tries allocating MMIO space if Hyper-V DRM/FB drivers
load after it. Devices can be brought up in any sequence so let's
resolve the issue by always ignoring 'fb_mmio' region for non-FB
requests, even if the region is unclaimed.
So far we were just lucky because the uninitialized members
of struct msghdr are not used by default on a SOCK_STREAM tcp
socket.
But as new things like msg_ubuf and sg_from_iter where added
recently, we should play on the safe side and avoid potention
problems in future.
Signed-off-by: Stefan Metzmacher <metze@samba.org> Cc: stable@vger.kernel.org Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The iterator, ITER_DISCARD, that can only be used in READ mode and
just discards any data copied to it, was added to allow a network
filesystem to discard any unwanted data sent by a server.
Convert cifs_discard_from_socket() to use this.
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Stable-dep-of: bedc8f76b353 ("cifs: always initialize struct msghdr smb_msg completely") Signed-off-by: Sasha Levin <sashal@kernel.org>
Use positive logic to check for RAS
support. Rename the function to actually indicate
what it is testing for. Essentially, make the
function a predicate with the correct name.
Cc: Stanley Yang <Stanley.Yang@amd.com> Cc: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: 6c2049066355 ("drm/amdgpu: Don't enable LTR if not supported") Signed-off-by: Sasha Levin <sashal@kernel.org>
vaddr_get_pfns() now returns the positive number of pfns successfully
gotten instead of zero. vfio_pin_page_external() might return 1 to
vfio_iommu_type1_pin_pages(), which will treat it as an error, if
vaddr_get_pfns() is successful but vfio_pin_page_external() doesn't
reach vfio_lock_acct().
Fix it up in vfio_pin_page_external(). Found by inspection.
Fixes: be16c1fd99f4 ("vfio/type1: Change success value of vaddr_get_pfn()") Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Message-Id: <20210308172452.38864-1-daniel.m.jordan@oracle.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bus bandwidth array access is based on esit, increase one
will cause out-of-bounds issue; for example, when esit is
XHCI_MTK_MAX_ESIT, will overstep boundary.
Fixes: 7c986fbc16ae ("usb: xhci-mtk: get the microframe boundary for ESIT") Cc: <stable@vger.kernel.org> Reported-by: Stan Lu <stan.lu@mediatek.com> Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com> Link: https://lore.kernel.org/r/1629189389-18779-5-git-send-email-chunfeng.yun@mediatek.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix Oops in dasd_alias_get_start_dev() function caused by the pavgroup
pointer being NULL.
The pavgroup pointer is checked on the entrance of the function but
without the lcu->lock being held. Therefore there is a race window
between dasd_alias_get_start_dev() and _lcu_update() which sets
pavgroup to NULL with the lcu->lock held.
Fix by checking the pavgroup pointer with lcu->lock held.
Cc: <stable@vger.kernel.org> # 2.6.25+ Fixes: 8e09f21574ea ("[S390] dasd: add hyper PAV support to DASD device driver, part 1") Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Link: https://lore.kernel.org/r/20220919154931.4123002-2-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Why]
This fixes 892deb48269c ("drm/amdgpu: Separate vf2pf work item init from virt data exchange").
we should read pf2vf data based at mman.fw_vram_usage_va after gmc
sw_init. commit 892deb48269c breaks this logic.
[How]
calling amdgpu_virt_exchange_data in amdgpu_virt_init_data_exchange to
set the right base in the right sequence.
v2:
call amdgpu_virt_init_data_exchange after gmc sw_init to make data
exchange workqueue run
v3:
clean up the code logic
v4:
add some comment and make the code more readable
Fixes: 892deb48269c ("drm/amdgpu: Separate vf2pf work item init from virt data exchange") Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com> Reviewed-by: Horace Chen <horace.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RHEL/Fedora RPM build checks are stricter, and complain when executable
files don't have a shebang line, e.g.
*** WARNING: ./kselftests/net/forwarding/sch_red.sh is executable but has no shebang, removing executable bit
Fix it by adding shebang line.
Fixes: 6cf0291f9517 ("selftests: forwarding: Add a RED test for SW datapath") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/20220922024453.437757-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
tfilter_put need to be called to put the refount got by tp->ops->get to
avoid possible refcount leak when chain->tmplt_ops != NULL and
chain->tmplt_ops != tp->ops.
Fixes: 7d5509fa0d3d ("net: sched: extend proto ops with 'put' callback") Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Link: https://lore.kernel.org/r/20220921092734.31700-1-hbh25y@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
There is a separate receive path for small packets (under 256 bytes).
Instead of allocating a new dma-capable skb to be used for the next packet,
this path allocates a skb and copies the data into it (reusing the existing
sbk for the next packet). There are two bytes of junk data at the beginning
of every packet. I believe these are inserted in order to allow aligned DMA
and IP headers. We skip over them using skb_reserve. Before copying over
the data, we must use a barrier to ensure we see the whole packet. The
current code only synchronizes len bytes, starting from the beginning of
the packet, including the junk bytes. However, this leaves off the final
two bytes in the packet. Synchronize the whole packet.
To reproduce this problem, ping a HME with a payload size between 17 and
214
$ ping -s 17 <hme_address>
which will complain rather loudly about the data mismatch. Small packets
(below 60 bytes on the wire) do not have this issue. I suspect this is
related to the padding added to increase the minimum packet size.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Sean Anderson <seanga2@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220920235018.1675956-1-seanga2@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
There might be a potential race between SMC-R buffer map and
link group termination.
smc_smcr_terminate_all() | smc_connect_rdma()
--------------------------------------------------------------
| smc_conn_create()
for links in smcibdev |
schedule links down |
| smc_buf_create()
| \- smcr_buf_map_usable_links()
| \- no usable links found,
| (rmb->mr = NULL)
|
| smc_clc_send_confirm()
| \- access conn->rmb_desc->mr[]->rkey
| (panic)
During reboot and IB device module remove, all links will be set
down and no usable links remain in link groups. In such situation
smcr_buf_map_usable_links() should return an error and stop the
CLC flow accessing to uninitialized mr.
As the comment right before the mtk_dsi_stop() call advises,
mtk_dsi_stop() should only be called after
mtk_drm_crtc_atomic_disable(). That's because that function calls
drm_crtc_wait_one_vblank(), which requires the vblank irq to be enabled.
Previously mtk_dsi_stop(), being in mtk_dsi_poweroff() and guarded by a
refcount, would only be called at the end of
mtk_drm_crtc_atomic_disable(), through the call to mtk_crtc_ddp_hw_fini().
Commit cde7e2e35c28 ("drm/mediatek: Separate poweron/poweroff from
enable/disable and define new funcs") moved the mtk_dsi_stop() call to
mtk_output_dsi_disable(), causing it to be called before
mtk_drm_crtc_atomic_disable(), and consequently generating vblank
timeout warnings during suspend.
Move the mtk_dsi_stop() call back to mtk_dsi_poweroff() so that we have
a working vblank irq during mtk_drm_crtc_atomic_disable() and stop
getting vblank timeout warnings.
Fixes: cde7e2e35c28 ("drm/mediatek: Separate poweron/poweroff from enable/disable and define new funcs") Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Tested-by: Hsin-Yi Wang <hsinyi@chromium.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Tested-by: Allen-KH Cheng <allen-kh.cheng@mediatek.com> Link: http://lists.infradead.org/pipermail/linux-mediatek/2022-August/046713.html Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
/proc/kallsyms and /proc/modules are compared before and after the copy
in order to ensure no changes during the copy.
However /proc/modules also might change due to reference counts changing
even though that does not make any difference.
Any modules loaded or unloaded should be visible in changes to kallsyms,
so it is not necessary to check /proc/modules also anyway.
Remove the comparison checking that /proc/modules is unchanged.
Fixes: fc1b691d7651d949 ("perf buildid-cache: Add ability to add kcore to the cache") Reported-by: Daniel Dao <dqminh@cloudflare.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Daniel Dao <dqminh@cloudflare.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220914122429.8770-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The missing header makes it hard for programs like elfutils to open
these files.
Fixes: 2d86612aacb7805f ("perf symbol: Correct address for bss symbols") Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Lieven Hey <lieven.hey@kdab.com> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Link: https://lore.kernel.org/r/20220915092910.711036-1-lieven.hey@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The dev->can.state is set to CAN_STATE_ERROR_ACTIVE, after the device
has been started. On busy networks the CAN controller might receive
CAN frame between and go into an error state before the dev->can.state
is assigned.
Assign dev->can.state before starting the controller to close the race
window.
The bug fix was incomplete, it "replaced" crash with a memory leak.
The old code had an assignment to "ret" embedded into the conditional,
restore this.
Fixes: 7997eff82828 ("netfilter: ebtables: reject blobs that don't provide all entry points") Reported-and-tested-by: syzbot+a24c5252f3e3ab733464@syzkaller.appspotmail.com Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
It seems to me that percpu memory for chain stats started leaking since
commit 3bc158f8d0330f0a ("netfilter: nf_tables: map basechain priority to
hardware priority") when nft_chain_offload_priority() returned an error.
syzbot is reporting underflow of nft_counters_enabled counter at
nf_tables_addchain() [1], for commit 43eb8949cfdffa76 ("netfilter:
nf_tables: do not leave chain stats enabled on error") missed that
nf_tables_chain_destroy() after nft_basechain_init() in the error path of
nf_tables_addchain() decrements the counter because nft_basechain_init()
makes nft_is_base_chain() return true by setting NFT_CHAIN_BASE flag.
Increment the counter immediately after returning from
nft_basechain_init().
This is because tc_modify_qdisc() behaves differently when mqprio is
root, vs when taprio is root.
In the mqprio case, it finds the parent qdisc through
p = qdisc_lookup(dev, TC_H_MAJ(clid)), and then the child qdisc through
q = qdisc_leaf(p, clid). This leaf qdisc q has handle 0, so it is
ignored according to the comment right below ("It may be default qdisc,
ignore it"). As a result, tc_modify_qdisc() goes through the
qdisc_create() code path, and this gives taprio_init() a chance to check
for sch_parent != TC_H_ROOT and error out.
Whereas in the taprio case, the returned q = qdisc_leaf(p, clid) is
different. It is not the default qdisc created for each netdev queue
(both taprio and mqprio call qdisc_create_dflt() and keep them in
a private q->qdiscs[], or priv->qdiscs[], respectively). Instead, taprio
makes qdisc_leaf() return the _root_ qdisc, aka itself.
When taprio does that, tc_modify_qdisc() goes through the qdisc_change()
code path, because the qdisc layer never finds out about the child qdisc
of the root. And through the ->change() ops, taprio has no reason to
check whether its parent is root or not, just through ->init(), which is
not called.
The problem is the taprio_leaf() implementation. Even though code wise,
it does the exact same thing as mqprio_leaf() which it is copied from,
it works with different input data. This is because mqprio does not
attach itself (the root) to each device TX queue, but one of the default
qdiscs from its private array.
In fact, since commit 13511704f8d7 ("net: taprio offload: enforce qdisc
to netdev queue mapping"), taprio does this too, but just for the full
offload case. So if we tried to attach a taprio child to a fully
offloaded taprio root qdisc, it would properly fail too; just not to a
software root taprio.
To fix the problem, stop looking at the Qdisc that's attached to the TX
queue, and instead, always return the default qdiscs that we've
allocated (and to which we privately enqueue and dequeue, in software
scheduling mode).
Since Qdisc_class_ops :: leaf is only called from tc_modify_qdisc(),
the risk of unforeseen side effects introduced by this change is
minimal.
Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In an incredibly strange API design decision, qdisc->destroy() gets
called even if qdisc->init() never succeeded, not exclusively since
commit 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation"),
but apparently also earlier (in the case of qdisc_create_dflt()).
The taprio qdisc does not fully acknowledge this when it attempts full
offload, because it starts off with q->flags = TAPRIO_FLAGS_INVALID in
taprio_init(), then it replaces q->flags with TCA_TAPRIO_ATTR_FLAGS
parsed from netlink (in taprio_change(), tail called from taprio_init()).
But in taprio_destroy(), we call taprio_disable_offload(), and this
determines what to do based on FULL_OFFLOAD_IS_ENABLED(q->flags).
But looking at the implementation of FULL_OFFLOAD_IS_ENABLED()
(a bitwise check of bit 1 in q->flags), it is invalid to call this macro
on q->flags when it contains TAPRIO_FLAGS_INVALID, because that is set
to U32_MAX, and therefore FULL_OFFLOAD_IS_ENABLED() will return true on
an invalid set of flags.
As a result, it is possible to crash the kernel if user space forces an
error between setting q->flags = TAPRIO_FLAGS_INVALID, and the calling
of taprio_enable_offload(). This is because drivers do not expect the
offload to be disabled when it was never enabled.
The error that we force here is to attach taprio as a non-root qdisc,
but instead as child of an mqprio root qdisc:
Fixes: 9c66d1564676 ("taprio: Add support for hardware offloading") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Since dynamic registration of the gifconf() helper is only used for
IPv4, and this can not be in a loadable module, this can be simplified
noticeably by turning it into a direct function call as a preparation
for cleaning up the compat handling.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 5641c751fe2f ("net: enetc: deny offload of tc-based TSN features on VF interfaces") Signed-off-by: Sasha Levin <sashal@kernel.org>
The VF netdev driver shouldn't respond to changes in the NETIF_F_HW_TC
flag; only PFs should. Moreover, TSN-specific code should go to
enetc_qos.c, which should not be included in the VF driver.
Fixes: 79e499829f3f ("net: enetc: add hw tc hw offload features for PSPF capability") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220916133209.3351399-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Doing a variable-sized memcpy is slower, and the compiler isn't smart
enough to turn this into a constant-size assignment.
Further, Kees' latest fortified memcpy will actually bark, because the
destination pointer is type sockaddr, not explicitly sockaddr_in or
sockaddr_in6, so it thinks there's an overflow:
memcpy: detected field-spanning write (size 28) of single field
"&endpoint.addr" at drivers/net/wireguard/netlink.c:446 (size 16)
Fix this by just assigning by using explicit casts for each checked
case.
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reported-by: syzbot+a448cda4dba2dac50de5@syzkaller.appspotmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
A previous commit tried to make the ratelimiter timings test more
reliable but in the process made it less reliable on other
configurations. This is an impossible problem to solve without
increasingly ridiculous heuristics. And it's not even a problem that
actually needs to be solved in any comprehensive way, since this is only
ever used during development. So just cordon this off with a DEBUG_
ifdef, just like we do for the trie's randomized tests, so it can be
enabled while hacking on the code, and otherwise disabled in CI. In the
process we also revert 151c8e499f47.
Fixes: 151c8e499f47 ("wireguard: ratelimiter: use hrtimer in selftest") Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
IPA can route packets between IPA-connected entities. The AP and
modem are currently the only such entities supported, and no routing
is required to transfer packets between them.
The number of entries in each routing table is fixed, and defined at
initialization time. Some of these entries are designated for use
by the modem, and the rest are available for the AP to use. The AP
sends a QMI message to the modem which describes (among other
things) information about routing table memory available for the
modem to use.
Currently the QMI initialization packet gives wrong information in
its description of routing tables. What *should* be supplied is the
maximum index that the modem can use for the routing table memory
located at a given location. The current code instead supplies the
total *number* of routing table entries. Furthermore, the modem is
granted the entire table, not just the subset it's supposed to use.
This patch fixes this. First, the ipa_mem_bounds structure is
generalized so its "end" field can be interpreted either as a final
byte offset, or a final array index. Second, the IPv4 and IPv6
(non-hashed and hashed) table information fields in the QMI
ipa_init_modem_driver_req structure are changed to be ipa_mem_bounds
rather than ipa_mem_array structures. Third, we set the "end" value
for each routing table to be the last index, rather than setting the
"count" to be the number of indices. Finally, instead of allowing
the modem to use all of a routing table's memory, it is limited to
just the portion meant to be used by the modem. In all versions of
IPA currently supported, that is IPA_ROUTE_MODEM_COUNT (8) entries.
Entries in an IPA route or filter table are 64-bit little-endian
addresses, each of which refers to a routing or filtering rule.
The format of these table slots are fixed, but IPA_TABLE_ENTRY_SIZE
is used to define their size. This symbol doesn't really add value,
and I think it unnecessarily obscures what a table entry *is*.
So get rid of IPA_TABLE_ENTRY_SIZE, and just use sizeof(__le64) in
its place throughout the code.
Update the comments in "ipa_table.c" to provide a little better
explanation of these table slots.
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: cf412ec33325 ("net: ipa: properly limit modem routing table use") Signed-off-by: Sasha Levin <sashal@kernel.org>
A recent patch avoided doing 64-bit modulo operations by checking
the alignment of some DMA allocations using only the lower 32 bits
of the address.
David Laight pointed out (after the fix was committed) that DMA
allocations might already satisfy the alignment requirements. And
he was right.
Remove the alignment checks that occur after DMA allocation requests,
and update comments to explain why the constraint is satisfied. The
only place IPA_TABLE_ALIGN was used was to check the alignment; it is
therefore no longer needed, so get rid of it.
Add comments where GSI_RING_ELEMENT_SIZE and the tre_count and
event_count channel data fields are defined to make explicit they
are required to be powers of 2.
Revise a comment in gsi_trans_pool_init_dma(), taking into account
that dma_alloc_coherent() guarantees its result is aligned to a page
size (or order thereof).
Don't bother printing an error if a DMA allocation fails.
Suggested-by: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: cf412ec33325 ("net: ipa: properly limit modem routing table use") Signed-off-by: Sasha Levin <sashal@kernel.org>
It is possible for a 32 bit x86 build to use a 64 bit DMA address.
There are two remaining spots where the IPA driver does a modulo
operation to check alignment of a DMA address, and under certain
conditions this can lead to a build error on i386 (at least).
The alignment checks we're doing are for power-of-2 values, and this
means the lower 32 bits of the DMA address can be used. This ensures
both operands to the modulo operator are 32 bits wide.
Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Alex Elder <elder@linaro.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: cf412ec33325 ("net: ipa: properly limit modem routing table use") Signed-off-by: Sasha Levin <sashal@kernel.org>
We currently have a build-time check to ensure that the minimum DMA
allocation alignment satisfies the constraint that IPA filter and
route tables must point to rules that are 128-byte aligned.
But what's really important is that the actual allocated DMA memory
has that alignment, even if the minimum is smaller than that.
Remove the BUILD_BUG_ON() call checking against minimim DMA alignment
and instead verify at rutime that the allocated memory is properly
aligned.
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: cf412ec33325 ("net: ipa: properly limit modem routing table use") Signed-off-by: Sasha Levin <sashal@kernel.org>