It seems that the beep playback doesn't work well on IDT codec devices
when the codec auto-pm is enabled. Keep the power on while the beep
switch is enabled.
syzbot is hitting percpu_rwsem_assert_held(&cpu_hotplug_lock) warning at
cpuset_attach() [1], for commit 4f7e7236435ca0ab ("cgroup: Fix
threadgroup_rwsem <-> cpus_read_lock() deadlock") missed that
cpuset_attach() is also called from cgroup_attach_task_all().
Add cpus_read_lock() like what cgroup_procs_write_start() does.
If we set XFRM security policy by calling setsockopt with option
IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock'
struct. However tcp_v6_send_response doesn't look up dst_entry with the
actual socket but looks up with tcp control socket. This may cause a
problem that a RST packet is sent without ESP encryption & peer's TCP
socket can't receive it.
This patch will make the function look up dest_entry with actual socket,
if the socket has XFRM policy(sock_policy), so that the TCP response
packet via this function can be encrypted, & aligned on the encrypted
TCP socket.
Tested: We encountered this problem when a TCP socket which is encrypted
in ESP transport mode encryption, receives challenge ACK at SYN_SENT
state. After receiving challenge ACK, TCP needs to send RST to
establish the socket at next SYN try. But the RST was not encrypted &
peer TCP socket still remains on ESTABLISHED state.
So we verified this with test step as below.
[Test step]
1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED).
2. Client tries a new connection on the same TCP ports(src & dst).
3. Server will return challenge ACK instead of SYN,ACK.
4. Client will send RST to server to clear the SOCKET.
5. Client will retransmit SYN to server on the same TCP ports.
[Expected result]
The TCP connection should be established.
Cc: Maciej Żenczykowski <maze@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Sehee Lee <seheele@google.com> Signed-off-by: Sewook Seo <sewookseo@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This mirrors what we do for other asics and this way we are
sure the sdma doorbell range is properly initialized.
There is a comment about the way doorbells on gfx9 work that
requires that they are initialized for other IPs before GFX
is initialized. However, the statement says that it applies to
multimedia as well, but the VCN code currently initializes
doorbells after GFX and there are no known issues there. In my
testing at least I don't see any problems on SDMA.
This is a prerequisite for fixing the Unsupported Request error
reported through AER during driver load.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373
The error was unnoticed before and got visible because of the commit
referenced below. This doesn't fix anything in the commit below, rather
fixes the issue in amdgpu exposed by the commit. The reference is only
to associate this commit with below one so that both go together.
Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()") Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This mirrors what we do for other asics and this way we are
sure the ih doorbell range is properly initialized.
There is a comment about the way doorbells on gfx9 work that
requires that they are initialized for other IPs before GFX
is initialized. In this case IH is initialized before GFX,
so there should be no issue.
This is a prerequisite for fixing the Unsupported Request error
reported through AER during driver load.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373
The error was unnoticed before and got visible because of the commit
referenced below. This doesn't fix anything in the commit below, rather
fixes the issue in amdgpu exposed by the commit. The reference is only
to associate this commit with below one so that both go together.
Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()") Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As per PCIE Base Spec r4.0 Section 6.18
'Software must not enable LTR in an Endpoint unless the Root Complex
and all intermediate Switches indicate support for LTR.'
This fixes the Unsupported Request error reported through AER during
ASPM enablement.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216455
The error was unnoticed before and got visible because of the commit
referenced below. This doesn't fix anything in the commit below, rather
fixes the issue in amdgpu exposed by the commit. The reference is only
to associate this commit with below one so that both go together.
Move common IP init before GMC init so that HDP gets
remapped before GMC init which uses it.
This fixes the Unsupported Request error reported through
AER during driver load. The error happens as a write happens
to the remap offset before real remapping is done.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373
The error was unnoticed before and got visible because of the commit
referenced below. This doesn't fix anything in the commit below, rather
fixes the issue in amdgpu exposed by the commit. The reference is only
to associate this commit with below one so that both go together.
Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()") Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix regression introduced by commit:
"drm/i915: Individualize fences before adding to dma_resv obj"
which sets obj->read_domains to 0 for both read and write paths.
Also set obj->write_domain to 0 on read path which was removed by
the commit.
References: https://gitlab.freedesktop.org/drm/intel/-/issues/6639 Fixes: 420a07b841d0 ("drm/i915: Individualize fences before adding to dma_resv obj") Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Cc: <stable@vger.kernel.org> # v5.16+ Cc: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220907172641.12555-1-nirmoy.das@intel.com
(cherry picked from commit 04f7eb3d4582a0a4da67c86e55fda7de2df86d91) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
tools/include/uapi/asm/errno.h currently attempts to include
non-existent arch-specific errno.h header for xtensa.
Remove this case so that <asm-generic/errno.h> is used instead,
and add the missing arch-specific header for parisc.
The previous patch triggered a build failure for the debian kernel,
which has CONFIG_64BIT enabled, uses the CROSS_COMPILER environment
variable and uses ARCH=parisc to configure the kernel for 64-bit
support.
This patch weakens the previous patch while keeping the recommended way
to configure the kernel with:
ARCH=parisc -> build 32-bit kernel
ARCH=parisc64 -> build 64-bit kernel
while adding the possibility for debian to configure a 64-bit kernel
even if ARCH=parisc is set (PA8X00 CPU has to be selected and
CONFIG_64BIT needs to be enabled).
The downside of this patch is, that we now have a small window open
again where people may get it wrong: if they enable CONFIG_64BIT and try
to compile with a 32-bit compiler.
Fixes: 3dcfb729b5f4 ("parisc: Make CONFIG_64BIT available for ARCH=parisc64 only") Signed-off-by: Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org> # 5.15+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There's a bug in blkdev_issue_secure_erase. The statement
"unsigned int len = min_t(sector_t, nr_sects, max_sectors);"
sets the variable "len" to the length in sectors, but the statement
"bio->bi_iter.bi_size = len" treats it as if it were in bytes.
The statements "sector += len << SECTOR_SHIFT" and "nr_sects -= len <<
SECTOR_SHIFT" are thinko.
So far we were just lucky because the uninitialized members
of struct msghdr are not used by default on a SOCK_STREAM tcp
socket.
But as new things like msg_ubuf and sg_from_iter where added
recently, we should play on the safe side and avoid potention
problems in future.
Signed-off-by: Stefan Metzmacher <metze@samba.org> Cc: stable@vger.kernel.org Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Stefan Metzmacher <metze@samba.org> Cc: stable@vger.kernel.org Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When doing direct writes we need to also invalidate the mapping in case
we have a cached copy of the affected page(s) in memory or else
subsequent reads of the data might return the old/stale content
before we wrote an update to the server.
Cc: stable@vger.kernel.org Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If we're invoked with a fixed file, follow the normal rules of not
calling io_fput_file(). Fixed files are permanently registered to the
ring, and do not need putting separately.
Cc: stable@vger.kernel.org Fixes: aa184e8671f0 ("io_uring: don't attempt to IOPOLL for MSG_RING requests") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since the stub version of of_dma_configure_id() was added in commit a081bd4af4ce ("of/device: Add input id to of_dma_configure()"), it has
not matched the signature of the full function, leading to build failure
reports when code using this function is built on !OF configurations.
Fixes: a081bd4af4ce ("of/device: Add input id to of_dma_configure()") Cc: stable@vger.kernel.org Signed-off-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Frank Rowand <frank.rowand@sony.com> Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Link: https://lore.kernel.org/r/20220824153256.1437483-1-thierry.reding@gmail.com Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The worker is canceled in gt_park path, but earlier it was assumed that
gt_park path cannot sleep and the cancel is asynchronous. This caused a
race with suspend flow where the worker runs after suspend and causes an
unclaimed register access warning. Cancel the worker synchronously since
the gt_park is indeed allowed to sleep.
v2: Fix author name and sign-off mismatch
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4419 Fixes: 77cdd054dd2c ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu") Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220827002135.139349-1-umesh.nerlige.ramappa@intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit 31335aa8e08be3fe10c50aecd2f11aba77544a78) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Using two different types of workoads, it was observed that
guc_update_engine_gt_clks was being called too frequently and/or
causing a CPU-to-lmem bandwidth hit over PCIE. Details on
the workloads and numbers are in the notes below.
Background: At the moment, guc_update_engine_gt_clks can be invoked
via one of 3 ways. #1 and #2 are infrequent under normal operating
conditions:
1.When a predefined "ping_delay" timer expires so that GuC-
busyness can sample the GTPM clock counter to ensure it
doesn't miss a wrap-around of the 32-bits of the HW counter.
(The ping_delay is calculated based on 1/8th the time taken
for the counter go from 0x0 to 0xffffffff based on the
GT frequency. This comes to about once every 28 seconds at a
GT frequency of 19.2Mhz).
2.In preparation for a gt reset.
3.In response to __gt_park events (as the gt power management
puts the gt into a lower power state when there is no work
being done).
Root-cause: For both the workloads described farther below, it was
observed that when user space calls IOCTLs that unparks the
gt momentarily and repeats such calls many times in quick succession,
it triggers calling guc_update_engine_gt_clks as many times. However,
the primary purpose of guc_update_engine_gt_clks is to ensure we don't
miss the wraparound while the counter is ticking. Thus, the solution
is to ensure we skip that check if gt_park is calling this function
earlier than necessary.
Solution: Snapshot jiffies when we do actually update the busyness
stats. Then get the new jiffies every time intel_guc_busyness_park
is called and bail if we are being called too soon. Use half of the
ping_delay as a safe threshold.
NOTE1: Workload1: IGTs' gem_create was modified to create a file handle,
allocate memory with sizes that range from a min of 4K to the max supported
(in power of two step-sizes). Its maps, modifies and reads back the
memory. Allocations and modification is repeated until total memory
allocation reaches the max. Then the file handle is closed. With this
workload, guc_update_engine_gt_clks was called over 188 thousand times
in the span of 15 seconds while this test ran three times. With this patch,
the number of calls reduced to 14.
NOTE2: Workload2: 30 transcode sessions are created in quick succession.
While these sessions are created, pcm-iio tool was used to measure I/O
read operation bandwidth consumption sampled at 100 milisecond intervals
over the course of 20 seconds. The total bandwidth consumed over 20 seconds
without this patch was measured at average at 311KBps per sample. With this
patch, the number went down to about 175Kbps which is about a 43% savings.
Currently, pic_height of vdsc_cfg structure is being used to calculate
slice_height, before it is set for DP.
So taking out the lines to set pic_height from the helper
intel_dp_dsc_compute_params() to individual encoders, and setting
pic_height, before it is used to calculate slice_height for DP.
Fixes: 5a6d866f8e1b ("drm/i915: Get slice height before computing rc params") Cc: Manasi Navare <manasi.d.navare@intel.com> Cc: Vandita Kulkarni <vandita.kulkarni@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Reviewed-by: Vandita Kulkarni <vandita.kulkarni@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220902103219.1168781-1-ankit.k.nautiyal@intel.com
(cherry picked from commit e72df53dcb01ec58e0410da353551adf94c8d0f1) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The hsync/vsync polarities were not honoured for the eDP and HDMI ports.
Add the register settings to configure the polarities as requested by the
DRM_MODE_FLAG_PHSYNC/DRM_MODE_FLAG_PVSYNC flags.
VPP_WRAP_OSD1_MATRIX_COEF22.Coeff22 is documented as being bits 0-12,
not 16-28.
Without this the output tends to have a pink hue, changing it results
in better color accuracy.
The vendor kernel doesn't use this register. However the code which
sets VIU2_OSD1_MATRIX_COEF22 also uses bits 0-12. There is a slightly
different style of registers for configuring some of the other matrices,
which do use bits 16-28 for this coefficient, but those have names
ending in MATRIX_COEF22_30, and this is not one of those.
Commit 52824ca4502d ("drm/panel-edp: Better describe eDP panel delays")
clarified the various delays used for eDP panels, tying them to the eDP
panel timing diagram.
For Innolux N116BCA-EA1, .prepare_to_enable would be:
t4_min + t5_min + t6_min + max(t7_max, t8_min)
Since t4_min and t5_min are both 0, the panel can use either .enable or
.prepare_to_enable.
As .enable is better defined, switch to using .enable for this panel.
Also add .disable = 50, based on the datasheet's t9_min value. This
effectively makes the delays the same as delay_200_500_e80_d50.
Cc: Douglas Anderson <dianders@chromium.org> Fixes: 51d35631c970 ("drm/panel-simple: Add N116BCA-EA1") Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20220908085454.1024167-1-wenst@chromium.org Signed-off-by: Sasha Levin <sashal@kernel.org>
The assumption that it is impossible to return an ERR pointer from
rpc_run_task() no longer holds due to commit 25cf32ad5dba ("SUNRPC:
Handle allocation failure in rpc_new_task()").
The fallocate call invalidates suid and sgid bits as part of normal
operation. We need to mark the mode bits as invalid when using fallocate
with an suid so these will be updated the next time the user looks at them.
This fixes xfstests generic/683 and generic/684.
Reported-by: Yue Cui <cuiyue-fnst@fujitsu.com> Fixes: 913eca1aea87 ("NFS: Fallocate should use the nfs4_fattr_bitmap") Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit e39d5ef67804 ("powerpc/5xxx: extend mpc8xxx_gpio driver to support
mpc512x gpios") implemented support for IRQ_TYPE_LEVEL_LOW flow type in
mpc512x via falling edge type. Do same for mpc85xx which support was added
in commit 345e5c8a1cc3 ("powerpc: Add interrupt support to mpc8xxx_gpio").
Fixes probing of lm90 hwmon driver on mpc85xx based board which use level
interrupt. Without it kernel prints error and refuse lm90 to work:
[ 15.258370] genirq: Setting trigger mode 8 for irq 49 failed (mpc8xxx_irq_set_type+0x0/0xf8)
[ 15.267168] lm90 0-004c: cannot request IRQ 49
[ 15.272708] lm90: probe of 0-004c failed with error -22
Fixes: 345e5c8a1cc3 ("powerpc: Add interrupt support to mpc8xxx_gpio") Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl> Signed-off-by: Sasha Levin <sashal@kernel.org>
We need to make sure that the req->rq_private_buf is completely up to
date before we make req->rq_reply_bytes_recvd visible to the
call_decode() routine in order to avoid triggering the WARN_ON().
Reported-by: Benjamin Coddington <bcodding@redhat.com> Fixes: 72691a269f0b ("SUNRPC: Don't reuse bvec on retransmission of the request") Tested-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The name of A100 R_PIO driver should be sun50i-a100-r-pinctrl,
not sun50iw10p1-r-pinctrl.
Fixes: 473436e7647d6 ("pinctrl: sunxi: add support for the Allwinner A100 pin controller") Signed-off-by: Michael Wu <michael@allwinnertech.com> Acked-by: Samuel Holland <samuel@sholland.org> Link: https://lore.kernel.org/r/20220819024541.74191-1-michael@allwinnertech.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Switching between falling/rising edges for IRQ_TYPE_EDGE_BOTH on pins that
require debounce can cause the device to lose events due to a desync
between pin state and irq type.
This problem is resolved by switching between IRQ_TYPE_LEVEL_LOW and
IRQ_TYPE_LEVEL_HIGH instead.
Fixes: 936ee2675eee ("gpio/rockchip: add driver for rockchip gpio") Signed-off-by: João H. Spies <jhlspies@gmail.com> Link: https://lore.kernel.org/r/20220808025121.110223-1-jhlspies@gmail.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently in the wakeirq_map, gpio36 and gpio37 have the same wakeirq
number, resulting in gpio37 being unable to trigger interrupts.
It looks like that this is a typo in the wakeirq map. So fix it.
Commit 78c44d910d3e ("drivers/of: Fix depth when unflattening devicetree")
forgot to fix up the depth check in the loop body in unflatten_dt_nodes()
which makes it possible to overflow the nps[] buffer...
Found by Linux Verification Center (linuxtesting.org) with the SVACE static
analysis tool.
Now acpi_os_ioremap() is marked with __init because it calls memblock_
is_memory() which is also marked with __init in the !ARCH_KEEP_MEMBLOCK
case. However, acpi_os_ioremap() is called by ordinary functions such
as acpi_os_{read, write}_memory() and causes section mismatch warnings:
Fix these warnings by selecting ARCH_KEEP_MEMBLOCK unconditionally and
removing the __init modifier of acpi_os_ioremap(). This can also give a
chance to track "memory" and "reserved" memblocks after early boot.
Fix for TUF laptops returning with an -ENOSPC on calling
asus_wmi_evaluate_method_buf() when fetching default curves. The TUF method
requires at least 32 bytes space.
This also moves and changes the pr_debug() in fan_curve_check_present() to
pr_warn() in fan_curve_get_factory_default() so that there is at least some
indication in logs of why it fails.
Signed-off-by: Luke D. Jones <luke@ljones.dev> Link: https://lore.kernel.org/r/20220828074638.5473-1-luke@ljones.dev Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
USB external storage device(0x0b05:1932), use gnome-disk-utility tools
to test usb write < 30MB/s.
if does not to load module of uas for this device, can increase the
write speed from 20MB/s to >40MB/s.
2 keymap fixes for the Acer Aspire One AOD270 and the same hardware
rebranded as Packard Bell Dot SC:
1. The F2 key is marked with a big '?' symbol on the Packard Bell Dot SC,
this sends WMID_HOTKEY_EVENTs with a scancode of 0x27 add a mapping
for this.
2. Scancode 0x61 is KEY_SWITCHVIDEOMODE. Usually this is a duplicate
input event with the "Video Bus" input device events. But on these devices
the "Video Bus" does not send events for this key. Map 0x61 to KEY_UNKNOWN
instead of using KE_IGNORE so that udev/hwdb can override it on these devs.
In case the source port cannot be decoded, print the warning only once. This
still brings attention to the user and does not spam the logs at the same time.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220830163448.8921-1-kurt@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
TCP_FIN_WAIT2 and TCP_LAST_ACK were not handled, the connection is closing
so we can ignore them and avoid printing the "unhandled state"
warning message.
[ 1298.852386] nvmet_tcp: queue 2 unhandled state 5
[ 1298.879112] nvmet_tcp: queue 7 unhandled state 5
[ 1298.884253] nvmet_tcp: queue 8 unhandled state 5
[ 1298.889475] nvmet_tcp: queue 9 unhandled state 5
v2: Do not call nvmet_tcp_schedule_release_queue(), just ignore
the fin_wait2 and last_ack states.
Lexar NM610 reports bogus eui64 values that appear to be the same across
all drives. Quirk them out so they are not marked as "non globally unique"
duplicates.
Signed-off-by: Shyamin Ayesh <me@shyamin.com>
[patch formatting] Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
Below driver load error will be printed, not friendly to end user.
amdgpu: ATOM BIOS: 113-D603GLXE-077
[drm] FRU: Failed to get size field
[drm:amdgpu_fru_get_product_info [amdgpu]] *ERROR* Failed to read FRU Manufacturer, ret:-5
Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
GPIO mockup debugfs is created in gpio_mockup_probe() but
forgot to remove when remove device. This patch add a devm
managed callback for removing them.
The Lenovo OneLink+ Dock contains an RTL8153 controller that behaves as
a broken CDC device by default. Add the custom Lenovo PID to the r8152
driver to support it properly.
Also, systems compatible with this dock provide a BIOS option to enable
MAC address passthrough (as per Lenovo document "ThinkPad Docking
Solutions 2017"). Add the custom PID to the MAC passthrough list too.
Tested on a ThinkPad 13 1st gen with the expected results:
passthrough disabled: Invalid header when reading pass-thru MAC addr
passthrough enabled: Using pass-thru MAC addr XX:XX:XX:XX:XX:XX
Signed-off-by: Jean-Francois Le Fillatre <jflf_kernel@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
The MSI is probably raised by incoming packets, so power down the device
and disable bus mastering to stop the traffic, as user confirmed this
approach works.
In addition to that, be extra safe and cancel reset task if it's running.
There is a timing issue captured during ishtp client sending stress tests.
It was observed during stress tests that ISH firmware is getting out of
ordered messages. This is a rare scenario as the current set of ISH client
drivers don't send much data to firmware. But this may not be the case
going forward.
When message size is bigger than IPC MTU, ishtp splits the message into
fragments and uses serialized async method to send message fragments.
The call stack:
ishtp_cl_send_msg_ipc->ipc_tx_callback(first fregment)->
ishtp_send_msg(with callback)->write_ipc_to_queue->
write_ipc_from_queue->callback->ipc_tx_callback(next fregment)......
When an ipc write complete interrupt is received, driver also calls
write_ipc_from_queue->ipc_tx_callback in ISR to start sending of next fragment.
Through ipc_tx_callback uses spin_lock to protect message splitting, as the
serialized sending method will call back to ipc_tx_callback again, so it doesn't
put sending under spin_lock, it causes driver cannot guarantee all fragments
be sent in order.
Considering this scenario:
ipc_tx_callback just finished a fragment splitting, and not call ishtp_send_msg
yet, there is a write complete interrupt happens, then ISR->write_ipc_from_queue
->ipc_tx_callback->ishtp_send_msg->write_ipc_to_queue......
Because ISR has higher exec priority than normal thread, this causes the new
fragment be sent out before previous fragment. This disordered message causes
invalid message to firmware.
The solution is, to send fragments synchronously:
Use ishtp_write_message writing fragments into tx queue directly one by one,
instead of ishtp_send_msg only writing one fragment with completion callback.
As no completion callback be used, so change ipc_tx_callback to ipc_tx_send.
When A/D bits are not available, KVM uses a software access tracking
mechanism, which involves making the SPTEs inaccessible. However,
the clear_young() MMU notifier does not flush TLBs. So it is possible
that there may still be stale, potentially writable, TLB entries.
This is usually fine, but can be problematic when enabling dirty
logging, because it currently only does a TLB flush if any SPTEs were
modified. But if all SPTEs are in access-tracked state, then there
won't be a TLB flush, which means that the guest could still possibly
write to memory and not have it reflected in the dirty bitmap.
So just unconditionally flush the TLBs when enabling dirty logging.
As an alternative, KVM could explicitly check the MMU-Writable bit when
write-protecting SPTEs to decide if a flush is needed (instead of
checking the Writable bit), but given that a flush almost always happens
anyway, so just making it unconditional seems simpler.
When auxiliary_device_add() returns an error, auxiliary_device_uninit()
is called, which causes refcount for device to be decremented and
.release callback will be triggered.
Because adev_release() re-calls auxiliary_device_uninit(), it will cause
use-after-free:
[ 1269.455172] WARNING: CPU: 0 PID: 14267 at lib/refcount.c:28 refcount_warn_saturate+0x110/0x15
[ 1269.464007] refcount_t: underflow; use-after-free.
If the previous thing cat'ing $debugfs/rd left the FIFO full, then
subsequent open could deadlock in rd_write() (because open is blocked,
not giving a chance for read() to consume any data in the FIFO). Also
it is generally a good idea to clear out old data from the FIFO.
IRQ override isn't needed on modern AMD Zen systems.
There's an active low keyboard IRQ on AMD Ryzen 6000 and it will stay
this way on newer platforms. This IRQ override breaks keyboards for
almost all Ryzen 6000 laptops currently on the market.
Skip this IRQ override for all AMD Zen platforms because this IRQ
override is supposed to be a workaround for buggy ACPI DSDT and we can't
have a long list of all future AMD CPUs/Laptops in the kernel code.
If a device with buggy ACPI DSDT shows up, a separated list containing
just them should be created.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216118 Suggested-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Chuanhong Guo <gch981213@gmail.com> Acked-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: XiaoYan Li <lxy.lixiaoyan@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The cited commit removed from the cleanup flow of umr the checks
if the resources were created. This could lead to null-ptr-deref
in case that we had failure in mlx5_ib_stage_ib_reg_init stage.
Fix it by adding new state to the umr that can say if the resources
were created or not and check it in the umr cleanup flow before
destroying the resources.
When the RDMA auxiliary driver probes, it sets its profile based on
devlink driverinit value. The latter might not be in sync with FW yet
(In case devlink reload is not performed), thus causing a mismatch
between RDMA driver and FW. This results in the following FW syndrome
when the RDMA driver tries to adjust RoCE state, which fails the probe:
"0xC1F678 | modify_nic_vport_context: roce_en set on a vport that
doesn't support roce"
To prevent this, select the PF profile based on FW RoCE capability
instead of relying on devlink driverinit value.
To provide backward compatibility of the RoCE disable feature, on older
FW's where roce_rw is not set (FW RoCE capability is read-only), keep
the current behavior e.g., rely on devlink driverinit value.
Fixes: fbfa97b4d79f ("net/mlx5: Disable roce at HCA level") Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Link: https://lore.kernel.org/r/cb34ce9a1df4a24c135cb804db87f7d2418bd6cc.1661763459.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Use software VHCA id when it's supported by the firmware.
A unique id is allocated upon mlx5_mdev_init() and freed upon
mlx5_mdev_uninit(), as such it stays the same during the full life cycle
of the device including upon health recovery if occurred.
The conjunction of sw_vhca_id with sw_owner_id will be a global unique
id per function which uses mlx5_core.
The sw_vhca_id is set upon init_hca command and is used to specify the
VHCA that the NIC vport is affiliated with.
This functionality is needed upon migration of VM which is MPV based.
(i.e. multi port device).
Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Stable-dep-of: 9ca05b0f27de ("RDMA/mlx5: Rely on RoCE fw cap instead of devlink when setting profile") Signed-off-by: Sasha Levin <sashal@kernel.org>
Introduce ifc related stuff to enable using software vhca id
functionality.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Stable-dep-of: 9ca05b0f27de ("RDMA/mlx5: Rely on RoCE fw cap instead of devlink when setting profile") Signed-off-by: Sasha Levin <sashal@kernel.org>
The translation table copying code for kdump kernels is currently based
on the extended root/context entry formats of ECS mode defined in older
VT-d v2.5, and doesn't handle the scalable mode formats. This causes
the kexec capture kernel boot failure with DMAR faults if the IOMMU was
enabled in scalable mode by the previous kernel.
The ECS mode has already been deprecated by the VT-d spec since v3.0 and
Intel IOMMU driver doesn't support this mode as there's no real hardware
implementation. Hence this converts ECS checking in copying table code
into scalable mode.
The existing copying code consumes a bit in the context entry as a mark
of copied entry. It needs to work for the old format as well as for the
extended context entries. As it's hard to find such a common bit for both
legacy and scalable mode context entries. This replaces it with a per-
IOMMU bitmap.
Fixes: 7373a8cc38197 ("iommu/vt-d: Setup context and enable RID2PASID support") Cc: stable@vger.kernel.org Reported-by: Jerry Snitselaar <jsnitsel@redhat.com> Tested-by: Wen Jin <wen.jin@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20220817011035.3250131-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
[WHY]
Logging for SMU response value after the wait allows us to know
immediately what the response value was. Makes it easier to debug should
the value be anything other than OK.
[HOW]
Using the the already available DC SMU logging functions.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Reviewed-by: Charlene Liu <Charlene.Liu@amd.com> Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Saaem Rizvi <SyedSaaem.Rizvi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The AMD IOMMU driver cannot activate PASID mode on a RID without the RID's
translation being set to IDENTITY. Further it requires changing the RID's
page table layout from the normal v1 IOMMU_DOMAIN_IDENTITY layout to a
different v2 layout.
It does this by creating a new iommu_domain, configuring that domain for
v2 identity operation and then attaching it to the group, from within the
driver. This logic assumes the group is already set to the IDENTITY domain
and is being used by the DMA API.
However, since the ownership logic is based on the group's domain pointer
equaling the default domain to detect DMA API ownership, this causes it to
look like the group is not attached to the DMA API any more. This blocks
attaching drivers to any other devices in the group.
In a real system this manifests itself as the HD-audio devices on some AMD
platforms losing their device drivers.
Work around this unique behavior of the AMD driver by checking for
equality of IDENTITY domains based on their type, not their pointer
value. This allows the AMD driver to have two IDENTITY domains for
internal purposes without breaking the check.
Have the AMD driver properly declare that the special domain it created is
actually an IDENTITY domain.
Commit e8ae0e140c05 ("vfio: Require that devices support DMA cache
coherence") requires IOMMU drivers to advertise
IOMMU_CAP_CACHE_COHERENCY, in order to be used by VFIO. Since VFIO does
not provide to userspace the ability to maintain coherency through cache
invalidations, it requires hardware coherency. Advertise the capability
in order to restore VFIO support.
The meaning of IOMMU_CAP_CACHE_COHERENCY also changed from "IOMMU can
enforce cache coherent DMA transactions" to "IOMMU_CACHE is supported".
While virtio-iommu cannot enforce coherency (of PCIe no-snoop
transactions), it does support IOMMU_CACHE.
We can distinguish different cases of non-coherent DMA:
(1) When accesses from a hardware endpoint are not coherent. The host
would describe such a device using firmware methods ('dma-coherent'
in device-tree, '_CCA' in ACPI), since they are also needed without
a vIOMMU. In this case mappings are created without IOMMU_CACHE.
virtio-iommu doesn't need any additional support. It sends the same
requests as for coherent devices.
(2) When the physical IOMMU supports non-cacheable mappings. Supporting
those would require a new feature in virtio-iommu, new PROBE request
property and MAP flags. Device drivers would use a new API to
discover this since it depends on the architecture and the physical
IOMMU.
(3) When the hardware supports PCIe no-snoop. It is possible for
assigned PCIe devices to issue no-snoop transactions, and the
virtio-iommu specification is lacking any mention of this.
Arm platforms don't necessarily support no-snoop, and those that do
cannot enforce coherency of no-snoop transactions. Device drivers
must be careful about assuming that no-snoop transactions won't end
up cached; see commit e02f5c1bb228 ("drm: disable uncached DMA
optimization for ARM and arm64"). On x86 platforms, the host may or
may not enforce coherency of no-snoop transactions with the physical
IOMMU. But according to the above commit, on x86 a driver which
assumes that no-snoop DMA is compatible with uncached CPU mappings
will also work if the host enforces coherency.
Although these issues are not specific to virtio-iommu, it could be
used to facilitate discovery and configuration of no-snoop. This
would require a new feature bit, PROBE property and ATTACH/MAP
flags.
Cc: stable@vger.kernel.org Fixes: e8ae0e140c05 ("vfio: Require that devices support DMA cache coherence") Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20220825154622.86759-1-jean-philippe@linaro.org Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Intel IOMMU driver possibly selects between the first-level and the
second-level translation tables for DMA address translation. However,
the levels of page-table walks for the 4KB base page size are calculated
from the SAGAW field of the capability register, which is only valid for
the second-level page table. This causes the IOMMU driver to stop working
if the hardware (or the emulated IOMMU) advertises only first-level
translation capability and reports the SAGAW field as 0.
This solves the above problem by considering both the first level and the
second level when calculating the supported page table levels.
Fixes: b802d070a52a1 ("iommu/vt-d: Use iova over first level") Cc: stable@vger.kernel.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20220817023558.3253263-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
GCC does not insert a `bti c` instruction at the beginning of a function
when it believes that all callers reach the function through a direct
branch[1]. Unfortunately the logic it uses to determine this is not
sufficiently robust, for example not taking account of functions being
placed in different sections which may be loaded separately, so we may
still see thunks being generated to these functions. If that happens,
the first instruction in the callee function will result in a Branch
Target Exception due to the missing landing pad.
While this has currently only been observed in the case of modules
having their main code loaded sufficiently far from their init section
to require thunks it could potentially happen for other cases so the
safest thing is to disable BTI for the kernel when building with an
affected toolchain.
Reported-by: D Scott Phillips <scott@os.amperecomputing.com>
[Bits of the commit message are lifted from his report & workaround] Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20220905142255.591990-1-broonie@kernel.org Cc: <stable@vger.kernel.org> # v5.10+ Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Replace autoloading data based on the ACPI EC device with the DMI
records for motherboards models. The ACPI method created a bug that when
this driver returns error from the probe function because of the
unsupported motherboard model, the ACPI subsystem concludes
that the EC device does not work properly.
On Mon, Sep 12, 2022 at 10:52:45AM +0100, Catalin Marinas wrote:
>I missed this (holidays) and it looks like it's in stable already. On
>its own it will likely break kasan_hw if used together with user-space
>MTE as this change relies on two previous commits:
>
>70c248aca9e7 ("mm: kasan: Skip unpoisoning of user pages")
>6d05141a3930 ("mm: kasan: Skip page unpoisoning only if __GFP_SKIP_KASAN_UNPOISON")
>
>The reason I did not cc stable is that there are other dependencies in
>this area. The potential issues without the above commits were rather
>theoretical, so take these patches rather as clean-ups/refactoring than
>fixes.
In perf/Documentation/perf-stat.txt, for "--td-level" the default "0" means
the max level that the current hardware support.
So we need initialize the stat_config.topdown_level to TOPDOWN_MAX_LEVEL
when “--td-level=0” or no “--td-level” option. Otherwise, for the
hardware with a max level is 2, the 2nd level metrics disappear for raw
events in this case.
The issue cannot be observed for the perf stat default or "--topdown"
options. This commit fixes the raw events issue and removes the
duplicated code for the perf stat default.
Before:
# ./perf stat -e "cpu-clock,context-switches,cpu-migrations,page-faults,instructions,cycles,ref-cycles,branches,branch-misses,{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound,topdown-heavy-ops,topdown-br-mispredict,topdown-fetch-lat,topdown-mem-bound}" sleep 1
Current perf stat uses the evlist__add_default_attrs() to add the
generic default attrs, and uses arch_evlist__add_default_attrs() to add
the Arch specific default attrs, e.g., Topdown for x86.
It works well for the non-hybrid platforms. However, for a hybrid
platform, the hard code generic default attrs don't work.
Uses arch_evlist__add_default_attrs() to replace the
evlist__add_default_attrs(). The arch_evlist__add_default_attrs() is
modified to invoke the same __evlist__add_default_attrs() for the
generic default attrs. No functional change.
Add default_null_attrs[] to indicate the arch specific attrs.
No functional change for the arch specific default attrs either.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220721065706.2886112-4-zhengjun.xing@linux.intel.com Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Stable-dep-of: f0c86a2bae4f ("perf stat: Fix L2 Topdown metrics disappear for raw events") Signed-off-by: Sasha Levin <sashal@kernel.org>
The global rwsem dmar_global_lock was introduced by commit 3a5670e8ac932
("iommu/vt-d: Introduce a rwsem to protect global data structures"). It
is used to protect DMAR related global data from DMAR hotplug operations.
The dmar_global_lock used in the intel_iommu_init() might cause recursive
locking issue, for example, intel_iommu_get_resv_regions() is taking the
dmar_global_lock from within a section where intel_iommu_init() already
holds it via probe_acpi_namespace_devices().
Using dmar_global_lock in intel_iommu_init() could be relaxed since it is
unlikely that any IO board must be hot added before the IOMMU subsystem is
initialized. This eliminates the possible recursive locking issue by moving
down DMAR hotplug support after the IOMMU is initialized and removing the
uses of dmar_global_lock in intel_iommu_init().
Configure ip-polling register to enable polling for all voltage monitor
channels.
This enables reading the voltage values for all inputs other than just
input 0.
Fix voltage allocation and reading to support all channels in all VMs.
Prior to this change allocation and reading were done only for the first
channel in each VM.
This change counts the total number of channels for allocation, and takes
into account the channel offset when reading the sample data register.
According to Moortec Embedded Voltage Monitor (MEVM) series 3 data
sheet, the minimum input signal is -100mv and maximum input signal
is +1000mv.
The equation used to convert the digital word to voltage uses mixed
types (*val signed and n unsigned), and on 64 bit machines also has
different size, since sizeof(u32) = 4 and sizeof(long) = 8.
So when measuring a negative input, n will be small enough, such that
PVT_N_CONST * n < PVT_R_CONST, and the result of
(PVT_N_CONST * n - PVT_R_CONST) will overflow to a very big positive
32 bit number. Then when storing the result in *val it will be the same
value just in 64 bit (instead of it representing a negative number which
will what happen when sizeof(long) = 4).
When -1023 <= (PVT_N_CONST * n - PVT_R_CONST) <= -1
dividing the number by 1024 should result of in 0, but because ">> 10"
is used, and the sign bit is used to fill the vacated bit positions, it
results in -1 (0xf...fffff) which is wrong.
This change fixes the sign problem and supports negative values by
casting n to long and replacing the shift right with div operation.
Bug - in case "intel,vm-map" is missing in device-tree ,'num' is set
to 0, and no voltage channel infos are allocated.
The reason num is set to 0 when "intel,vm-map" is missing is to set the
entire pvt->vm_idx[] with incremental channel numbers, but it didn't
take into consideration that same num is used later in devm_kcalloc().
If "intel,vm-map" does exist there is no need to set the unspecified
channels with incremental numbers, because the unspecified channels
can't be accessed in pvt_read_in() which is the only other place besides
the probe functions that uses pvt->vm_idx[].
This change fixes the bug by moving the incremental channel numbers
setting to be done only if "intel,vm-map" property is defined (starting
loop from 0), and removing 'num = 0'.
because the gate entry for TC 7 (S 0x80 10000 ns) now has a static guard
band added earlier than its 'gate close' event, such that packet
overruns won't occur in the worst case of the largest packet possible.
Since guard bands are statically determined based on the per-tc
QSYS_QMAXSDU_CFG_* with a fallback on the port-based QSYS_PORT_MAX_SDU,
we need to discuss what happens with TC 7 depending on kernel version,
since the driver, prior to commit 55a515b1f5a9 ("net: dsa: felix: drop
oversized frames with tc-taprio instead of hanging the port"), did not
touch QSYS_QMAXSDU_CFG_*, and therefore relied on QSYS_PORT_MAX_SDU.
1 (before vsc9959_tas_guard_bands_update): QSYS_PORT_MAX_SDU defaults to
1518, and at gigabit this introduces a static guard band (independent
of packet sizes) of 12144 ns, plus QSYS::HSCH_MISC_CFG.FRM_ADJ (bit
time of 20 octets => 160 ns). But this is larger than the time window
itself, of 10000 ns. So, the queue system never considers a frame with
TC 7 as eligible for transmission, since the gate practically never
opens, and these frames are forever stuck in the TX queues and hang
the port.
2 (after vsc9959_tas_guard_bands_update): Under the sole goal of
enabling oversized frame dropping, we make an effort to set
QSYS_QMAXSDU_CFG_7 to 1230 bytes. But QSYS_QMAXSDU_CFG_7 plays
one more role, which we did not take into account: per-tc static guard
band, expressed in L2 byte time (auto-adjusted for FCS and L1 overhead).
There is a discrepancy between what the driver thinks (that there is
no guard band, and 100% of min_gate_len[tc] is available for egress
scheduling) and what the hardware actually does (crops the equivalent
of QSYS_QMAXSDU_CFG_7 ns out of min_gate_len[tc]). In practice, this
means that the hardware thinks it has exactly 0 ns for scheduling tc 7.
In both cases, even minimum sized Ethernet frames are stuck on egress
rather than being considered for scheduling on TC 7, even if they would
fit given a proper configuration. Considering the current situation,
with vsc9959_tas_guard_bands_update(), frames between 60 octets and 1230
octets in size are not eligible for oversized dropping (because they are
smaller than QSYS_QMAXSDU_CFG_7), but won't be considered as eligible
for scheduling either, because the min_gate_len[7] (10000 ns) minus the
guard band determined by QSYS_QMAXSDU_CFG_7 (1230 octets * 8 ns per
octet == 9840 ns) minus the guard band auto-added for L1 overhead by
QSYS::HSCH_MISC_CFG.FRM_ADJ (20 octets * 8 ns per octet == 160 octets)
leaves 0 ns for scheduling in the queue system proper.
Investigating the hardware behavior, it becomes apparent that the queue
system needs precisely 33 ns of 'gate open' time in order to consider a
frame as eligible for scheduling to a tc. So the solution to this
problem is to amend vsc9959_tas_guard_bands_update(), by giving the
per-tc guard bands less space by exactly 33 ns, just enough for one
frame to be scheduled in that interval. This allows the queue system to
make forward progress for that port-tc, and prevents it from hanging.
Fixes: 297c4de6f780 ("net: dsa: felix: re-enable TAS guard band mode") Reported-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>