]> git.itanic.dy.fi Git - linux-stable/log
linux-stable
2 years agoLinux 5.15.28 v5.15.28
Greg Kroah-Hartman [Fri, 11 Mar 2022 11:22:37 +0000 (12:22 +0100)]
Linux 5.15.28

Link: https://lore.kernel.org/r/20220309155859.239810747@linuxfoundation.org
Tested-by: Fox Chen <foxhlchen@gmail.com>
Tested-by: Hulk Robot <hulkrobot@huawei.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Tested-by: Pavel Machek (CIP) <pavel@denx.de>
Link: https://lore.kernel.org/r/20220310140812.869208747@linuxfoundation.org
Tested-by: Pavel Machek (CIP) <pavel@denx.de> =
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoRevert "ACPI: PM: s2idle: Cancel wakeup before dispatching EC GPE"
Greg Kroah-Hartman [Thu, 10 Mar 2022 13:47:12 +0000 (14:47 +0100)]
Revert "ACPI: PM: s2idle: Cancel wakeup before dispatching EC GPE"

This reverts commit e799974e7cbb2e77ec12431512e155574c6ed333 which is
commit dc0075ba7f387fe4c48a8c674b11ab6f374a6acc upstream.

It's been reported to cause problems with a number of Fedora and Arch
Linux users, so drop it for now until that is resolved.

Link: https://lore.kernel.org/r/CAJZ5v0gE52NT=4kN4MkhV3Gx=M5CeMGVHOF0jgTXDb5WwAMs_Q@mail.gmail.com
Link: https://lore.kernel.org/r/31b9d1cd-6a67-218b-4ada-12f72e6f00dc@redhat.com
Reported-by: Hans de Goede <hdegoede@redhat.com>
Reported-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Justin Forbes <jmforbes@linuxtx.org>
Cc: Mark Pearson <markpearson@lenovo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoxen/netfront: react properly to failing gnttab_end_foreign_access_ref()
Juergen Gross [Fri, 25 Feb 2022 15:05:43 +0000 (16:05 +0100)]
xen/netfront: react properly to failing gnttab_end_foreign_access_ref()

Commit 66e3531b33ee51dad17c463b4d9c9f52e341503d upstream.

When calling gnttab_end_foreign_access_ref() the returned value must
be tested and the reaction to that value should be appropriate.

In case of failure in xennet_get_responses() the reaction should not be
to crash the system, but to disable the network device.

The calls in setup_netfront() can be replaced by calls of
gnttab_end_foreign_access(). While at it avoid double free of ring
pages and grant references via xennet_disconnect_backend() in this case.

This is CVE-2022-23042 / part of XSA-396.

Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoxen/gnttab: fix gnttab_end_foreign_access() without page specified
Juergen Gross [Fri, 25 Feb 2022 15:05:43 +0000 (16:05 +0100)]
xen/gnttab: fix gnttab_end_foreign_access() without page specified

Commit 42baefac638f06314298087394b982ead9ec444b upstream.

gnttab_end_foreign_access() is used to free a grant reference and
optionally to free the associated page. In case the grant is still in
use by the other side processing is being deferred. This leads to a
problem in case no page to be freed is specified by the caller: the
caller doesn't know that the page is still mapped by the other side
and thus should not be used for other purposes.

The correct way to handle this situation is to take an additional
reference to the granted page in case handling is being deferred and
to drop that reference when the grant reference could be freed
finally.

This requires that there are no users of gnttab_end_foreign_access()
left directly repurposing the granted page after the call, as this
might result in clobbered data or information leaks via the not yet
freed grant reference.

This is part of CVE-2022-23041 / XSA-396.

Reported-by: Simon Gaiser <simon@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoxen/pvcalls: use alloc/free_pages_exact()
Juergen Gross [Fri, 25 Feb 2022 15:05:43 +0000 (16:05 +0100)]
xen/pvcalls: use alloc/free_pages_exact()

Commit b0576cc9c6b843d99c6982888d59a56209341888 upstream.

Instead of __get_free_pages() and free_pages() use alloc_pages_exact()
and free_pages_exact(). This is in preparation of a change of
gnttab_end_foreign_access() which will prohibit use of high-order
pages.

This is part of CVE-2022-23041 / XSA-396.

Reported-by: Simon Gaiser <simon@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoxen/9p: use alloc/free_pages_exact()
Juergen Gross [Fri, 25 Feb 2022 15:05:42 +0000 (16:05 +0100)]
xen/9p: use alloc/free_pages_exact()

Commit 5cadd4bb1d7fc9ab201ac14620d1a478357e4ebd upstream.

Instead of __get_free_pages() and free_pages() use alloc_pages_exact()
and free_pages_exact(). This is in preparation of a change of
gnttab_end_foreign_access() which will prohibit use of high-order
pages.

By using the local variable "order" instead of ring->intf->ring_order
in the error path of xen_9pfs_front_alloc_dataring() another bug is
fixed, as the error path can be entered before ring->intf->ring_order
is being set.

By using alloc_pages_exact() the size in bytes is specified for the
allocation, which fixes another bug for the case of
order < (PAGE_SHIFT - XEN_PAGE_SHIFT).

This is part of CVE-2022-23041 / XSA-396.

Reported-by: Simon Gaiser <simon@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoxen: remove gnttab_query_foreign_access()
Juergen Gross [Fri, 25 Feb 2022 15:05:42 +0000 (16:05 +0100)]
xen: remove gnttab_query_foreign_access()

Commit 1dbd11ca75fe664d3e54607547771d021f531f59 upstream.

Remove gnttab_query_foreign_access(), as it is unused and unsafe to
use.

All previous use cases assumed a grant would not be in use after
gnttab_query_foreign_access() returned 0. This information is useless
in best case, as it only refers to a situation in the past, which could
have changed already.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoxen/gntalloc: don't use gnttab_query_foreign_access()
Juergen Gross [Fri, 25 Feb 2022 15:05:42 +0000 (16:05 +0100)]
xen/gntalloc: don't use gnttab_query_foreign_access()

Commit d3b6372c5881cb54925212abb62c521df8ba4809 upstream.

Using gnttab_query_foreign_access() is unsafe, as it is racy by design.

The use case in the gntalloc driver is not needed at all. While at it
replace the call of gnttab_end_foreign_access_ref() with a call of
gnttab_end_foreign_access(), which is what is really wanted there. In
case the grant wasn't used due to an allocation failure, just free the
grant via gnttab_free_grant_reference().

This is CVE-2022-23039 / part of XSA-396.

Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoxen/scsifront: don't use gnttab_query_foreign_access() for mapped status
Juergen Gross [Fri, 25 Feb 2022 15:05:42 +0000 (16:05 +0100)]
xen/scsifront: don't use gnttab_query_foreign_access() for mapped status

Commit 33172ab50a53578a95691310f49567c9266968b0 upstream.

It isn't enough to check whether a grant is still being in use by
calling gnttab_query_foreign_access(), as a mapping could be realized
by the other side just after having called that function.

In case the call was done in preparation of revoking a grant it is
better to do so via gnttab_try_end_foreign_access() and check the
success of that operation instead.

This is CVE-2022-23038 / part of XSA-396.

Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoxen/netfront: don't use gnttab_query_foreign_access() for mapped status
Juergen Gross [Fri, 25 Feb 2022 15:05:41 +0000 (16:05 +0100)]
xen/netfront: don't use gnttab_query_foreign_access() for mapped status

Commit 31185df7e2b1d2fa1de4900247a12d7b9c7087eb upstream.

It isn't enough to check whether a grant is still being in use by
calling gnttab_query_foreign_access(), as a mapping could be realized
by the other side just after having called that function.

In case the call was done in preparation of revoking a grant it is
better to do so via gnttab_end_foreign_access_ref() and check the
success of that operation instead.

This is CVE-2022-23037 / part of XSA-396.

Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoxen/blkfront: don't use gnttab_query_foreign_access() for mapped status
Juergen Gross [Fri, 25 Feb 2022 15:05:41 +0000 (16:05 +0100)]
xen/blkfront: don't use gnttab_query_foreign_access() for mapped status

Commit abf1fd5919d6238ee3bc5eb4a9b6c3947caa6638 upstream.

It isn't enough to check whether a grant is still being in use by
calling gnttab_query_foreign_access(), as a mapping could be realized
by the other side just after having called that function.

In case the call was done in preparation of revoking a grant it is
better to do so via gnttab_end_foreign_access_ref() and check the
success of that operation instead.

For the ring allocation use alloc_pages_exact() in order to avoid
high order pages in case of a multi-page ring.

If a grant wasn't unmapped by the backend without persistent grants
being used, set the device state to "error".

This is CVE-2022-23036 / part of XSA-396.

Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoxen/grant-table: add gnttab_try_end_foreign_access()
Juergen Gross [Fri, 25 Feb 2022 15:05:41 +0000 (16:05 +0100)]
xen/grant-table: add gnttab_try_end_foreign_access()

Commit 6b1775f26a2da2b05a6dc8ec2b5d14e9a4701a1a upstream.

Add a new grant table function gnttab_try_end_foreign_access(), which
will remove and free a grant if it is not in use.

Its main use case is to either free a grant if it is no longer in use,
or to take some other action if it is still in use. This other action
can be an error exit, or (e.g. in the case of blkfront persistent grant
feature) some special handling.

This is CVE-2022-23036, CVE-2022-23038 / part of XSA-396.

Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoxen/xenbus: don't let xenbus_grant_ring() remove grants in error case
Juergen Gross [Fri, 25 Feb 2022 15:05:40 +0000 (16:05 +0100)]
xen/xenbus: don't let xenbus_grant_ring() remove grants in error case

Commit 3777ea7bac3113005b7180e6b9dadf16d19a5827 upstream.

Letting xenbus_grant_ring() tear down grants in the error case is
problematic, as the other side could already have used these grants.
Calling gnttab_end_foreign_access_ref() without checking success is
resulting in an unclear situation for any caller of xenbus_grant_ring()
as in the error case the memory pages of the ring page might be
partially mapped. Freeing them would risk unwanted foreign access to
them, while not freeing them would leak memory.

In order to remove the need to undo any gnttab_grant_foreign_access()
calls, use gnttab_alloc_grant_references() to make sure no further
error can occur in the loop granting access to the ring pages.

It should be noted that this way of handling removes leaking of
grant entries in the error case, too.

This is CVE-2022-23040 / part of XSA-396.

Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoARM: fix build warning in proc-v7-bugs.c
Russell King (Oracle) [Thu, 10 Mar 2022 10:22:14 +0000 (10:22 +0000)]
ARM: fix build warning in proc-v7-bugs.c

commit b1a384d2cbccb1eb3f84765020d25e2c1929706e upstream.

The kernel test robot discovered that building without
HARDEN_BRANCH_PREDICTOR issues a warning due to a missing
argument to pr_info().

Add the missing argument.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 9dd78194a372 ("ARM: report Spectre v2 status through sysfs")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoarm64: Do not include __READ_ONCE() block in assembly files
Nathan Chancellor [Wed, 9 Mar 2022 19:16:34 +0000 (12:16 -0700)]
arm64: Do not include __READ_ONCE() block in assembly files

commit 52c9f93a9c482251cb0d224faa602ba26d462be8 upstream.

When building arm64 defconfig + CONFIG_LTO_CLANG_{FULL,THIN}=y after
commit 558c303c9734 ("arm64: Mitigate spectre style branch history side
channels"), the following error occurs:

  <instantiation>:4:2: error: invalid fixup for movz/movk instruction
   mov w0, #ARM_SMCCC_ARCH_WORKAROUND_3
   ^

Marc figured out that moving "#include <linux/init.h>" in
include/linux/arm-smccc.h into a !__ASSEMBLY__ block resolves it. The
full include chain with CONFIG_LTO=y from include/linux/arm-smccc.h:

include/linux/init.h
include/linux/compiler.h
arch/arm64/include/asm/rwonce.h
arch/arm64/include/asm/alternative-macros.h
arch/arm64/include/asm/assembler.h

The asm/alternative-macros.h include in asm/rwonce.h only happens when
CONFIG_LTO is set, which ultimately casues asm/assembler.h to be
included before the definition of ARM_SMCCC_ARCH_WORKAROUND_3. As a
result, the preprocessor does not expand ARM_SMCCC_ARCH_WORKAROUND_3 in
__mitigate_spectre_bhb_fw, which results in the error above.

Avoid this problem by just avoiding the CONFIG_LTO=y __READ_ONCE() block
in asm/rwonce.h with assembly files, as nothing in that block is useful
to assembly files, which allows ARM_SMCCC_ARCH_WORKAROUND_3 to be
properly expanded with CONFIG_LTO=y builds.

Fixes: e35123d83ee3 ("arm64: lto: Strengthen READ_ONCE() to acquire when CONFIG_LTO=y")
Cc: <stable@vger.kernel.org> # 5.11.x
Link: https://lore.kernel.org/r/20220309155716.3988480-1-maz@kernel.org/
Reported-by: Marc Zyngier <maz@kernel.org>
Acked-by: James Morse <james.morse@arm.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20220309191633.2307110-1-nathan@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoARM: Do not use NOCROSSREFS directive with ld.lld
Nathan Chancellor [Wed, 9 Mar 2022 22:07:27 +0000 (15:07 -0700)]
ARM: Do not use NOCROSSREFS directive with ld.lld

commit 36168e387fa7d0f1fe0cd5cf76c8cea7aee714fa upstream.

ld.lld does not support the NOCROSSREFS directive at the moment, which
breaks the build after commit b9baf5c8c5c3 ("ARM: Spectre-BHB
workaround"):

  ld.lld: error: ./arch/arm/kernel/vmlinux.lds:34: AT expected, but got NOCROSSREFS

Support for this directive will eventually be implemented, at which
point a version check can be added. To avoid breaking the build in the
meantime, just define NOCROSSREFS to nothing when using ld.lld, with a
link to the issue for tracking.

Cc: stable@vger.kernel.org
Fixes: b9baf5c8c5c3 ("ARM: Spectre-BHB workaround")
Link: https://github.com/ClangBuiltLinux/linux/issues/1609
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoARM: fix co-processor register typo
Russell King (Oracle) [Wed, 9 Mar 2022 19:08:42 +0000 (19:08 +0000)]
ARM: fix co-processor register typo

commit 33970b031dc4653cc9dc80f2886976706c4c8ef1 upstream.

In the recent Spectre BHB patches, there was a typo that is only
exposed in certain configurations: mcr p15,0,XX,c7,r5,4 should have
been mcr p15,0,XX,c7,c5,4

Reported-by: kernel test robot <lkp@intel.com>
Fixes: b9baf5c8c5c3 ("ARM: Spectre-BHB workaround")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoARM: fix build error when BPF_SYSCALL is disabled
Emmanuel Gil Peyrot [Tue, 8 Mar 2022 19:18:20 +0000 (20:18 +0100)]
ARM: fix build error when BPF_SYSCALL is disabled

commit 330f4c53d3c2d8b11d86ec03a964b86dc81452f5 upstream.

It was missing a semicolon.

Signed-off-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Fixes: 25875aa71dfe ("ARM: include unprivileged BPF status in Spectre V2 reporting").
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoarm64: proton-pack: Include unprivileged eBPF status in Spectre v2 mitigation reporting
James Morse [Thu, 3 Mar 2022 16:53:56 +0000 (16:53 +0000)]
arm64: proton-pack: Include unprivileged eBPF status in Spectre v2 mitigation reporting

commit 58c9a5060cb7cd529d49c93954cdafe81c1d642a upstream.

The mitigations for Spectre-BHB are only applied when an exception is
taken from user-space. The mitigation status is reported via the spectre_v2
sysfs vulnerabilities file.

When unprivileged eBPF is enabled the mitigation in the exception vectors
can be avoided by an eBPF program.

When unprivileged eBPF is enabled, print a warning and report vulnerable
via the sysfs vulnerabilities file.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoarm64: Use the clearbhb instruction in mitigations
James Morse [Fri, 10 Dec 2021 14:32:56 +0000 (14:32 +0000)]
arm64: Use the clearbhb instruction in mitigations

commit 228a26b912287934789023b4132ba76065d9491c upstream.

Future CPUs may implement a clearbhb instruction that is sufficient
to mitigate SpectreBHB. CPUs that implement this instruction, but
not CSV2.3 must be affected by Spectre-BHB.

Add support to use this instruction as the BHB mitigation on CPUs
that support it. The instruction is in the hint space, so it will
be treated by a NOP as older CPUs.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoKVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated
James Morse [Fri, 10 Dec 2021 11:16:18 +0000 (11:16 +0000)]
KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated

commit a5905d6af492ee6a4a2205f0d550b3f931b03d03 upstream.

KVM allows the guest to discover whether the ARCH_WORKAROUND SMCCC are
implemented, and to preserve that state during migration through its
firmware register interface.

Add the necessary boiler plate for SMCCC_ARCH_WORKAROUND_3.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoarm64: Mitigate spectre style branch history side channels
James Morse [Wed, 10 Nov 2021 14:48:00 +0000 (14:48 +0000)]
arm64: Mitigate spectre style branch history side channels

commit 558c303c9734af5a813739cd284879227f7297d2 upstream.

Speculation attacks against some high-performance processors can
make use of branch history to influence future speculation.
When taking an exception from user-space, a sequence of branches
or a firmware call overwrites or invalidates the branch history.

The sequence of branches is added to the vectors, and should appear
before the first indirect branch. For systems using KPTI the sequence
is added to the kpti trampoline where it has a free register as the exit
from the trampoline is via a 'ret'. For systems not using KPTI, the same
register tricks are used to free up a register in the vectors.

For the firmware call, arch-workaround-3 clobbers 4 registers, so
there is no choice but to save them to the EL1 stack. This only happens
for entry from EL0, so if we take an exception due to the stack access,
it will not become re-entrant.

For KVM, the existing branch-predictor-hardening vectors are used.
When a spectre version of these vectors is in use, the firmware call
is sufficient to mitigate against Spectre-BHB. For the non-spectre
versions, the sequence of branches is added to the indirect vector.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoarm64: proton-pack: Report Spectre-BHB vulnerabilities as part of Spectre-v2
James Morse [Tue, 8 Feb 2022 16:08:13 +0000 (16:08 +0000)]
arm64: proton-pack: Report Spectre-BHB vulnerabilities as part of Spectre-v2

commit dee435be76f4117410bbd90573a881fd33488f37 upstream.

Speculation attacks against some high-performance processors can
make use of branch history to influence future speculation as part of
a spectre-v2 attack. This is not mitigated by CSV2, meaning CPUs that
previously reported 'Not affected' are now moderately mitigated by CSV2.

Update the value in /sys/devices/system/cpu/vulnerabilities/spectre_v2
to also show the state of the BHB mitigation.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoarm64: Add percpu vectors for EL1
James Morse [Tue, 23 Nov 2021 18:29:25 +0000 (18:29 +0000)]
arm64: Add percpu vectors for EL1

commit bd09128d16fac3c34b80bd6a29088ac632e8ce09 upstream.

The Spectre-BHB workaround adds a firmware call to the vectors. This
is needed on some CPUs, but not others. To avoid the unaffected CPU in
a big/little pair from making the firmware call, create per cpu vectors.

The per-cpu vectors only apply when returning from EL0.

Systems using KPTI can use the canonical 'full-fat' vectors directly at
EL1, the trampoline exit code will switch to this_cpu_vector on exit to
EL0. Systems not using KPTI should always use this_cpu_vector.

this_cpu_vector will point at a vector in tramp_vecs or
__bp_harden_el1_vectors, depending on whether KPTI is in use.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoarm64: entry: Add macro for reading symbol addresses from the trampoline
James Morse [Thu, 25 Nov 2021 14:25:34 +0000 (14:25 +0000)]
arm64: entry: Add macro for reading symbol addresses from the trampoline

commit b28a8eebe81c186fdb1a0078263b30576c8e1f42 upstream.

The trampoline code needs to use the address of symbols in the wider
kernel, e.g. vectors. PC-relative addressing wouldn't work as the
trampoline code doesn't run at the address the linker expected.

tramp_ventry uses a literal pool, unless CONFIG_RANDOMIZE_BASE is
set, in which case it uses the data page as a literal pool because
the data page can be unmapped when running in user-space, which is
required for CPUs vulnerable to meltdown.

Pull this logic out as a macro, instead of adding a third copy
of it.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoarm64: entry: Add vectors that have the bhb mitigation sequences
James Morse [Thu, 18 Nov 2021 13:59:46 +0000 (13:59 +0000)]
arm64: entry: Add vectors that have the bhb mitigation sequences

commit ba2689234be92024e5635d30fe744f4853ad97db upstream.

Some CPUs affected by Spectre-BHB need a sequence of branches, or a
firmware call to be run before any indirect branch. This needs to go
in the vectors. No CPU needs both.

While this can be patched in, it would run on all CPUs as there is a
single set of vectors. If only one part of a big/little combination is
affected, the unaffected CPUs have to run the mitigation too.

Create extra vectors that include the sequence. Subsequent patches will
allow affected CPUs to select this set of vectors. Later patches will
modify the loop count to match what the CPU requires.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoarm64: entry: Add non-kpti __bp_harden_el1_vectors for mitigations
James Morse [Wed, 24 Nov 2021 15:03:15 +0000 (15:03 +0000)]
arm64: entry: Add non-kpti __bp_harden_el1_vectors for mitigations

commit aff65393fa1401e034656e349abd655cfe272de0 upstream.

kpti is an optional feature, for systems not using kpti a set of
vectors for the spectre-bhb mitigations is needed.

Add another set of vectors, __bp_harden_el1_vectors, that will be
used if a mitigation is needed and kpti is not in use.

The EL1 ventries are repeated verbatim as there is no additional
work needed for entry from EL1.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoarm64: entry: Allow the trampoline text to occupy multiple pages
James Morse [Thu, 18 Nov 2021 15:04:32 +0000 (15:04 +0000)]
arm64: entry: Allow the trampoline text to occupy multiple pages

commit a9c406e6462ff14956d690de7bbe5131a5677dc9 upstream.

Adding a second set of vectors to .entry.tramp.text will make it
larger than a single 4K page.

Allow the trampoline text to occupy up to three pages by adding two
more fixmap slots. Previous changes to tramp_valias allowed it to reach
beyond a single page.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoarm64: entry: Make the kpti trampoline's kpti sequence optional
James Morse [Thu, 18 Nov 2021 13:16:23 +0000 (13:16 +0000)]
arm64: entry: Make the kpti trampoline's kpti sequence optional

commit c47e4d04ba0f1ea17353d85d45f611277507e07a upstream.

Spectre-BHB needs to add sequences to the vectors. Having one global
set of vectors is a problem for big/little systems where the sequence
is costly on cpus that are not vulnerable.

Making the vectors per-cpu in the style of KVM's bh_harden_hyp_vecs
requires the vectors to be generated by macros.

Make the kpti re-mapping of the kernel optional, so the macros can be
used without kpti.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoarm64: entry: Move trampoline macros out of ifdef'd section
James Morse [Thu, 18 Nov 2021 14:02:30 +0000 (14:02 +0000)]
arm64: entry: Move trampoline macros out of ifdef'd section

commit 13d7a08352a83ef2252aeb464a5e08dfc06b5dfd upstream.

The macros for building the kpti trampoline are all behind
CONFIG_UNMAP_KERNEL_AT_EL0, and in a region that outputs to the
.entry.tramp.text section.

Move the macros out so they can be used to generate other kinds of
trampoline. Only the symbols need to be guarded by
CONFIG_UNMAP_KERNEL_AT_EL0 and appear in the .entry.tramp.text section.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoarm64: entry: Don't assume tramp_vectors is the start of the vectors
James Morse [Wed, 24 Nov 2021 13:40:09 +0000 (13:40 +0000)]
arm64: entry: Don't assume tramp_vectors is the start of the vectors

commit ed50da7764535f1e24432ded289974f2bf2b0c5a upstream.

The tramp_ventry macro uses tramp_vectors as the address of the vectors
when calculating which ventry in the 'full fat' vectors to branch to.

While there is one set of tramp_vectors, this will be true.
Adding multiple sets of vectors will break this assumption.

Move the generation of the vectors to a macro, and pass the start
of the vectors as an argument to tramp_ventry.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoarm64: entry: Allow tramp_alias to access symbols after the 4K boundary
James Morse [Wed, 24 Nov 2021 11:40:18 +0000 (11:40 +0000)]
arm64: entry: Allow tramp_alias to access symbols after the 4K boundary

commit 6c5bf79b69f911560fbf82214c0971af6e58e682 upstream.

Systems using kpti enter and exit the kernel through a trampoline mapping
that is always mapped, even when the kernel is not. tramp_valias is a macro
to find the address of a symbol in the trampoline mapping.

Adding extra sets of vectors will expand the size of the entry.tramp.text
section to beyond 4K. tramp_valias will be unable to generate addresses
for symbols beyond 4K as it uses the 12 bit immediate of the add
instruction.

As there are now two registers available when tramp_alias is called,
use the extra register to avoid the 4K limit of the 12 bit immediate.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoarm64: entry: Move the trampoline data page before the text page
James Morse [Tue, 23 Nov 2021 15:43:31 +0000 (15:43 +0000)]
arm64: entry: Move the trampoline data page before the text page

commit c091fb6ae059cda563b2a4d93fdbc548ef34e1d6 upstream.

The trampoline code has a data page that holds the address of the vectors,
which is unmapped when running in user-space. This ensures that with
CONFIG_RANDOMIZE_BASE, the randomised address of the kernel can't be
discovered until after the kernel has been mapped.

If the trampoline text page is extended to include multiple sets of
vectors, it will be larger than a single page, making it tricky to
find the data page without knowing the size of the trampoline text
pages, which will vary with PAGE_SIZE.

Move the data page to appear before the text page. This allows the
data page to be found without knowing the size of the trampoline text
pages. 'tramp_vectors' is used to refer to the beginning of the
.entry.tramp.text section, do that explicitly.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoarm64: entry: Free up another register on kpti's tramp_exit path
James Morse [Tue, 23 Nov 2021 18:41:43 +0000 (18:41 +0000)]
arm64: entry: Free up another register on kpti's tramp_exit path

commit 03aff3a77a58b5b52a77e00537a42090ad57b80b upstream.

Kpti stashes x30 in far_el1 while it uses x30 for all its work.

Making the vectors a per-cpu data structure will require a second
register.

Allow tramp_exit two registers before it unmaps the kernel, by
leaving x30 on the stack, and stashing x29 in far_el1.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoarm64: entry: Make the trampoline cleanup optional
James Morse [Wed, 24 Nov 2021 15:36:12 +0000 (15:36 +0000)]
arm64: entry: Make the trampoline cleanup optional

commit d739da1694a0eaef0358a42b76904b611539b77b upstream.

Subsequent patches will add additional sets of vectors that use
the same tricks as the kpti vectors to reach the full-fat vectors.
The full-fat vectors contain some cleanup for kpti that is patched
in by alternatives when kpti is in use. Once there are additional
vectors, the cleanup will be needed in more cases.

But on big/little systems, the cleanup would be harmful if no
trampoline vector were in use. Instead of forcing CPUs that don't
need a trampoline vector to use one, make the trampoline cleanup
optional.

Entry at the top of the vectors will skip the cleanup. The trampoline
vectors can then skip the first instruction, triggering the cleanup
to run.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoKVM: arm64: Allow indirect vectors to be used without SPECTRE_V3A
James Morse [Tue, 16 Nov 2021 15:06:19 +0000 (15:06 +0000)]
KVM: arm64: Allow indirect vectors to be used without SPECTRE_V3A

commit 5bdf3437603d4af87f9c7f424b0c8aeed2420745 upstream.

CPUs vulnerable to Spectre-BHB either need to make an SMC-CC firmware
call from the vectors, or run a sequence of branches. This gets added
to the hyp vectors. If there is no support for arch-workaround-1 in
firmware, the indirect vector will be used.

kvm_init_vector_slots() only initialises the two indirect slots if
the platform is vulnerable to Spectre-v3a. pKVM's hyp_map_vectors()
only initialises __hyp_bp_vect_base if the platform is vulnerable to
Spectre-v3a.

As there are about to more users of the indirect vectors, ensure
their entries in hyp_spectre_vector_selector[] are always initialised,
and __hyp_bp_vect_base defaults to the regular VA mapping.

The Spectre-v3a check is moved to a helper
kvm_system_needs_idmapped_vectors(), and merged with the code
that creates the hyp mappings.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoarm64: spectre: Rename spectre_v4_patch_fw_mitigation_conduit
James Morse [Tue, 16 Nov 2021 15:00:51 +0000 (15:00 +0000)]
arm64: spectre: Rename spectre_v4_patch_fw_mitigation_conduit

commit 1b33d4860deaecf1d8eec3061b7e7ed7ab0bae8d upstream.

The spectre-v4 sequence includes an SMC from the assembly entry code.
spectre_v4_patch_fw_mitigation_conduit is the patching callback that
generates an HVC or SMC depending on the SMCCC conduit type.

As this isn't specific to spectre-v4, rename it
smccc_patch_fw_mitigation_conduit so it can be re-used.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoarm64: entry.S: Add ventry overflow sanity checks
James Morse [Wed, 17 Nov 2021 15:15:26 +0000 (15:15 +0000)]
arm64: entry.S: Add ventry overflow sanity checks

commit 4330e2c5c04c27bebf89d34e0bc14e6943413067 upstream.

Subsequent patches add even more code to the ventry slots.
Ensure kernels that overflow a ventry slot don't get built.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoarm64: cpufeature: add HWCAP for FEAT_RPRES
Joey Gouly [Fri, 10 Dec 2021 16:54:32 +0000 (16:54 +0000)]
arm64: cpufeature: add HWCAP for FEAT_RPRES

commit 1175011a7d0030d49dc9c10bde36f08f26d0a8ee upstream.

Add a new HWCAP to detect the Increased precision of Reciprocal Estimate
and Reciprocal Square Root Estimate feature (FEAT_RPRES), introduced in Armv8.7.

Also expose this to userspace in the ID_AA64ISAR2_EL1 feature register.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211210165432.8106-4-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoarm64: cpufeature: add HWCAP for FEAT_AFP
Joey Gouly [Fri, 10 Dec 2021 16:54:30 +0000 (16:54 +0000)]
arm64: cpufeature: add HWCAP for FEAT_AFP

commit 5c13f042e73200b50573ace63e1a6b94e2917616 upstream.

Add a new HWCAP to detect the Alternate Floating-point Behaviour
feature (FEAT_AFP), introduced in Armv8.7.

Also expose this to userspace in the ID_AA64MMFR1_EL1 feature register.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211210165432.8106-2-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoarm64: add ID_AA64ISAR2_EL1 sys register
Joey Gouly [Fri, 10 Dec 2021 16:54:31 +0000 (16:54 +0000)]
arm64: add ID_AA64ISAR2_EL1 sys register

commit 9e45365f1469ef2b934f9d035975dbc9ad352116 upstream.

This is a new ID register, introduced in 8.7.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Reiji Watanabe <reijiw@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211210165432.8106-3-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoarm64: Add Cortex-X2 CPU part definition
Anshuman Khandual [Mon, 24 Jan 2022 03:15:37 +0000 (08:45 +0530)]
arm64: Add Cortex-X2 CPU part definition

commit 72bb9dcb6c33cfac80282713c2b4f2b254cd24d1 upstream.

Add the CPU Partnumbers for the new Arm designs.

Cc: Will Deacon <will@kernel.org>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/1642994138-25887-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoarm64: Add HWCAP for self-synchronising virtual counter
Marc Zyngier [Sun, 17 Oct 2021 12:42:25 +0000 (13:42 +0100)]
arm64: Add HWCAP for self-synchronising virtual counter

commit fee29f008aa3f2aff01117f28b57b1145d92cb9b upstream.

Since userspace can make use of the CNTVSS_EL0 instruction, expose
it via a HWCAP.

Suggested-by: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211017124225.3018098-18-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoarm64: Add Neoverse-N2, Cortex-A710 CPU part definition
Suzuki K Poulose [Tue, 19 Oct 2021 16:31:39 +0000 (17:31 +0100)]
arm64: Add Neoverse-N2, Cortex-A710 CPU part definition

commit 2d0d656700d67239a57afaf617439143d8dac9be upstream.

Add the CPU Partnumbers for the new Arm designs.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20211019163153.3692640-2-suzuki.poulose@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoARM: include unprivileged BPF status in Spectre V2 reporting
Russell King (Oracle) [Mon, 7 Mar 2022 19:28:32 +0000 (19:28 +0000)]
ARM: include unprivileged BPF status in Spectre V2 reporting

commit 25875aa71dfefd1959f07e626c4d285b88b27ac2 upstream.

The mitigations for Spectre-BHB are only applied when an exception
is taken, but when unprivileged BPF is enabled, userspace can
load BPF programs that can be used to exploit the problem.

When unprivileged BPF is enabled, report the vulnerable status via
the spectre_v2 sysfs file.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoARM: Spectre-BHB workaround
Russell King (Oracle) [Thu, 10 Feb 2022 16:05:45 +0000 (16:05 +0000)]
ARM: Spectre-BHB workaround

commit b9baf5c8c5c356757f4f9d8180b5e9d234065bc3 upstream.

Workaround the Spectre BHB issues for Cortex-A15, Cortex-A57,
Cortex-A72, Cortex-A73 and Cortex-A75. We also include Brahma B15 as
well to be safe, which is affected by Spectre V2 in the same ways as
Cortex-A15.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
[changes due to lack of SYSTEM_FREEING_INITMEM - gregkh]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoARM: use LOADADDR() to get load address of sections
Russell King (Oracle) [Fri, 11 Feb 2022 19:49:50 +0000 (19:49 +0000)]
ARM: use LOADADDR() to get load address of sections

commit 8d9d651ff2270a632e9dc497b142db31e8911315 upstream.

Use the linker's LOADADDR() macro to get the load address of the
sections, and provide a macro to set the start and end symbols.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoARM: early traps initialisation
Russell King (Oracle) [Fri, 11 Feb 2022 19:46:15 +0000 (19:46 +0000)]
ARM: early traps initialisation

commit 04e91b7324760a377a725e218b5ee783826d30f5 upstream.

Provide a couple of helpers to copy the vectors and stubs, and also
to flush the copied vectors and stubs.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoARM: report Spectre v2 status through sysfs
Russell King (Oracle) [Fri, 11 Feb 2022 16:45:54 +0000 (16:45 +0000)]
ARM: report Spectre v2 status through sysfs

commit 9dd78194a3722fa6712192cdd4f7032d45112a9a upstream.

As per other architectures, add support for reporting the Spectre
vulnerability status via sysfs CPU.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agox86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT
Josh Poimboeuf [Fri, 25 Feb 2022 22:32:28 +0000 (14:32 -0800)]
x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT

commit 0de05d056afdb00eca8c7bbb0c79a3438daf700c upstream.

The commit

   44a3918c8245 ("x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting")

added a warning for the "eIBRS + unprivileged eBPF" combination, which
has been shown to be vulnerable against Spectre v2 BHB-based attacks.

However, there's no warning about the "eIBRS + LFENCE retpoline +
unprivileged eBPF" combo. The LFENCE adds more protection by shortening
the speculation window after a mispredicted branch. That makes an attack
significantly more difficult, even with unprivileged eBPF. So at least
for now the logic doesn't warn about that combination.

But if you then add SMT into the mix, the SMT attack angle weakens the
effectiveness of the LFENCE considerably.

So extend the "eIBRS + unprivileged eBPF" warning to also include the
"eIBRS + LFENCE + unprivileged eBPF + SMT" case.

  [ bp: Massage commit message. ]

Suggested-by: Alyssa Milburn <alyssa.milburn@linux.intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agox86/speculation: Warn about Spectre v2 LFENCE mitigation
Josh Poimboeuf [Fri, 25 Feb 2022 22:31:49 +0000 (14:31 -0800)]
x86/speculation: Warn about Spectre v2 LFENCE mitigation

commit eafd987d4a82c7bb5aa12f0e3b4f8f3dea93e678 upstream.

With:

  f8a66d608a3e ("x86,bugs: Unconditionally allow spectre_v2=retpoline,amd")

it became possible to enable the LFENCE "retpoline" on Intel. However,
Intel doesn't recommend it, as it has some weaknesses compared to
retpoline.

Now AMD doesn't recommend it either.

It can still be left available as a cmdline option. It's faster than
retpoline but is weaker in certain scenarios -- particularly SMT, but
even non-SMT may be vulnerable in some cases.

So just unconditionally warn if the user requests it on the cmdline.

  [ bp: Massage commit message. ]

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agox86/speculation: Update link to AMD speculation whitepaper
Kim Phillips [Mon, 28 Feb 2022 17:23:16 +0000 (11:23 -0600)]
x86/speculation: Update link to AMD speculation whitepaper

commit e9b6013a7ce31535b04b02ba99babefe8a8599fa upstream.

Update the link to the "Software Techniques for Managing Speculation
on AMD Processors" whitepaper.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agox86/speculation: Use generic retpoline by default on AMD
Kim Phillips [Mon, 28 Feb 2022 17:23:15 +0000 (11:23 -0600)]
x86/speculation: Use generic retpoline by default on AMD

commit 244d00b5dd4755f8df892c86cab35fb2cfd4f14b upstream.

AMD retpoline may be susceptible to speculation. The speculation
execution window for an incorrect indirect branch prediction using
LFENCE/JMP sequence may potentially be large enough to allow
exploitation using Spectre V2.

By default, don't use retpoline,lfence on AMD.  Instead, use the
generic retpoline.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agox86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting
Josh Poimboeuf [Fri, 18 Feb 2022 19:49:08 +0000 (11:49 -0800)]
x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting

commit 44a3918c8245ab10c6c9719dd12e7a8d291980d8 upstream.

With unprivileged eBPF enabled, eIBRS (without retpoline) is vulnerable
to Spectre v2 BHB-based attacks.

When both are enabled, print a warning message and report it in the
'spectre_v2' sysfs vulnerabilities file.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
[fllinden@amazon.com: backported to 5.15]
Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoDocumentation/hw-vuln: Update spectre doc
Peter Zijlstra [Wed, 16 Feb 2022 19:57:02 +0000 (20:57 +0100)]
Documentation/hw-vuln: Update spectre doc

commit 5ad3eb1132453b9795ce5fd4572b1c18b292cca9 upstream.

Update the doc with the new fun.

  [ bp: Massage commit message. ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
[fllinden@amazon.com: backported to 5.15]
Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agox86/speculation: Add eIBRS + Retpoline options
Peter Zijlstra [Wed, 16 Feb 2022 19:57:01 +0000 (20:57 +0100)]
x86/speculation: Add eIBRS + Retpoline options

commit 1e19da8522c81bf46b335f84137165741e0d82b7 upstream.

Thanks to the chaps at VUsec it is now clear that eIBRS is not
sufficient, therefore allow enabling of retpolines along with eIBRS.

Add spectre_v2=eibrs, spectre_v2=eibrs,lfence and
spectre_v2=eibrs,retpoline options to explicitly pick your preferred
means of mitigation.

Since there's new mitigations there's also user visible changes in
/sys/devices/system/cpu/vulnerabilities/spectre_v2 to reflect these
new mitigations.

  [ bp: Massage commit message, trim error messages,
    do more precise eIBRS mode checking. ]

Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Patrick Colp <patrick.colp@oracle.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agox86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
Peter Zijlstra (Intel) [Wed, 16 Feb 2022 19:57:00 +0000 (20:57 +0100)]
x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE

commit d45476d9832409371537013ebdd8dc1a7781f97a upstream.

The RETPOLINE_AMD name is unfortunate since it isn't necessarily
AMD only, in fact Hygon also uses it. Furthermore it will likely be
sufficient for some Intel processors. Therefore rename the thing to
RETPOLINE_LFENCE to better describe what it is.

Add the spectre_v2=retpoline,lfence option as an alias to
spectre_v2=retpoline,amd to preserve existing setups. However, the output
of /sys/devices/system/cpu/vulnerabilities/spectre_v2 will be changed.

  [ bp: Fix typos, massage. ]

Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
[fllinden@amazon.com: backported to 5.15]
Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agox86,bugs: Unconditionally allow spectre_v2=retpoline,amd
Peter Zijlstra [Tue, 26 Oct 2021 12:01:46 +0000 (14:01 +0200)]
x86,bugs: Unconditionally allow spectre_v2=retpoline,amd

commit f8a66d608a3e471e1202778c2a36cbdc96bae73b upstream.

Currently Linux prevents usage of retpoline,amd on !AMD hardware, this
is unfriendly and gets in the way of testing. Remove this restriction.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.487348118@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoslip: fix macro redefine warning
Huang Pei [Tue, 23 Nov 2021 11:07:49 +0000 (19:07 +0800)]
slip: fix macro redefine warning

commit e5b40668e930979bd1e82c7ed7c9029db635f0e4 upstream.

MIPS/IA64 define END as assembly function ending, which conflict
with END definition in slip.h, just undef it at first

Reported-by: lkp@intel.com
Signed-off-by: Huang Pei <huangpei@loongson.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoLinux 5.15.27 v5.15.27
Greg Kroah-Hartman [Tue, 8 Mar 2022 18:12:55 +0000 (19:12 +0100)]
Linux 5.15.27

Link: https://lore.kernel.org/r/20220307091702.378509770@linuxfoundation.org
Link: https://lore.kernel.org/r/20220307162207.188028559@linuxfoundation.org
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Fox Chen <foxhlchen@gmail.com>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Ron Economos <re@w6rz.net>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agohamradio: fix macro redefine warning
Huang Pei [Tue, 23 Nov 2021 11:07:48 +0000 (19:07 +0800)]
hamradio: fix macro redefine warning

commit 16517829f2e02f096fb5ea9083d160381127faf3 upstream.

MIPS/IA64 define END as assembly function ending, which conflict
with END definition in mkiss.c, just undef it at first

Reported-by: lkp@intel.com
Signed-off-by: Huang Pei <huangpei@loongson.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoKVM: x86/mmu: Passing up the error state of mmu_alloc_shadow_roots()
Like Xu [Tue, 1 Mar 2022 12:49:41 +0000 (20:49 +0800)]
KVM: x86/mmu: Passing up the error state of mmu_alloc_shadow_roots()

commit c6c937d673aaa1d603f62f134e1ca9c173eeeed3 upstream.

Just like on the optional mmu_alloc_direct_roots() path, once shadow
path reaches "r = -EIO" somewhere, the caller needs to know the actual
state in order to enter error handling and avoid something worse.

Fixes: 4a38162ee9f1 ("KVM: MMU: load PDPTRs outside mmu_lock")
Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220301124941.48412-1-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoproc: fix documentation and description of pagemap
Yun Zhou [Sat, 5 Mar 2022 04:29:07 +0000 (20:29 -0800)]
proc: fix documentation and description of pagemap

commit dd21bfa425c098b95ca86845f8e7d1ec1ddf6e4a upstream.

Since bit 57 was exported for uffd-wp write-protected (commit
fb8e37f35a2f: "mm/pagemap: export uffd-wp protection information"),
fixing it can reduce some unnecessary confusion.

Link: https://lkml.kernel.org/r/20220301044538.3042713-1-yun.zhou@windriver.com
Fixes: fb8e37f35a2fe1 ("mm/pagemap: export uffd-wp protection information")
Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Tiberiu A Georgescu <tiberiu.georgescu@nutanix.com>
Cc: Florian Schmidt <florian.schmidt@nutanix.com>
Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Colin Cross <ccross@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoRevert "xfrm: xfrm_state_mtu should return at least 1280 for ipv6"
Jiri Bohac [Wed, 26 Jan 2022 15:00:18 +0000 (16:00 +0100)]
Revert "xfrm: xfrm_state_mtu should return at least 1280 for ipv6"

commit a6d95c5a628a09be129f25d5663a7e9db8261f51 upstream.

This reverts commit b515d2637276a3810d6595e10ab02c13bfd0b63a.

Commit b515d2637276a3810d6595e10ab02c13bfd0b63a ("xfrm: xfrm_state_mtu
should return at least 1280 for ipv6") in v5.14 breaks the TCP MSS
calculation in ipsec transport mode, resulting complete stalls of TCP
connections. This happens when the (P)MTU is 1280 or slighly larger.

The desired formula for the MSS is:
MSS = (MTU - ESP_overhead) - IP header - TCP header

However, the above commit clamps the (MTU - ESP_overhead) to a
minimum of 1280, turning the formula into
MSS = max(MTU - ESP overhead, 1280) -  IP header - TCP header

With the (P)MTU near 1280, the calculated MSS is too large and the
resulting TCP packets never make it to the destination because they
are over the actual PMTU.

The above commit also causes suboptimal double fragmentation in
xfrm tunnel mode, as described in
https://lore.kernel.org/netdev/20210429202529.codhwpc7w6kbudug@dwarf.suse.cz/

The original problem the above commit was trying to fix is now fixed
by commit 6596a0229541270fb8d38d989f91b78838e5e9da ("xfrm: fix MTU
regression").

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agobtrfs: do not start relocation until in progress drops are done
Josef Bacik [Fri, 18 Feb 2022 19:56:10 +0000 (14:56 -0500)]
btrfs: do not start relocation until in progress drops are done

commit b4be6aefa73c9a6899ef3ba9c5faaa8a66e333ef upstream.

We hit a bug with a recovering relocation on mount for one of our file
systems in production.  I reproduced this locally by injecting errors
into snapshot delete with balance running at the same time.  This
presented as an error while looking up an extent item

  WARNING: CPU: 5 PID: 1501 at fs/btrfs/extent-tree.c:866 lookup_inline_extent_backref+0x647/0x680
  CPU: 5 PID: 1501 Comm: btrfs-balance Not tainted 5.16.0-rc8+ #8
  RIP: 0010:lookup_inline_extent_backref+0x647/0x680
  RSP: 0018:ffffae0a023ab960 EFLAGS: 00010202
  RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 000000000000000c RDI: 0000000000000000
  RBP: ffff943fd2a39b60 R08: 0000000000000000 R09: 0000000000000001
  R10: 0001434088152de0 R11: 0000000000000000 R12: 0000000001d05000
  R13: ffff943fd2a39b60 R14: ffff943fdb96f2a0 R15: ffff9442fc923000
  FS:  0000000000000000(0000) GS:ffff944e9eb40000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f1157b1fca8 CR3: 000000010f092000 CR4: 0000000000350ee0
  Call Trace:
   <TASK>
   insert_inline_extent_backref+0x46/0xd0
   __btrfs_inc_extent_ref.isra.0+0x5f/0x200
   ? btrfs_merge_delayed_refs+0x164/0x190
   __btrfs_run_delayed_refs+0x561/0xfa0
   ? btrfs_search_slot+0x7b4/0xb30
   ? btrfs_update_root+0x1a9/0x2c0
   btrfs_run_delayed_refs+0x73/0x1f0
   ? btrfs_update_root+0x1a9/0x2c0
   btrfs_commit_transaction+0x50/0xa50
   ? btrfs_update_reloc_root+0x122/0x220
   prepare_to_merge+0x29f/0x320
   relocate_block_group+0x2b8/0x550
   btrfs_relocate_block_group+0x1a6/0x350
   btrfs_relocate_chunk+0x27/0xe0
   btrfs_balance+0x777/0xe60
   balance_kthread+0x35/0x50
   ? btrfs_balance+0xe60/0xe60
   kthread+0x16b/0x190
   ? set_kthread_struct+0x40/0x40
   ret_from_fork+0x22/0x30
   </TASK>

Normally snapshot deletion and relocation are excluded from running at
the same time by the fs_info->cleaner_mutex.  However if we had a
pending balance waiting to get the ->cleaner_mutex, and a snapshot
deletion was running, and then the box crashed, we would come up in a
state where we have a half deleted snapshot.

Again, in the normal case the snapshot deletion needs to complete before
relocation can start, but in this case relocation could very well start
before the snapshot deletion completes, as we simply add the root to the
dead roots list and wait for the next time the cleaner runs to clean up
the snapshot.

Fix this by setting a bit on the fs_info if we have any DEAD_ROOT's that
had a pending drop_progress key.  If they do then we know we were in the
middle of the drop operation and set a flag on the fs_info.  Then
balance can wait until this flag is cleared to start up again.

If there are DEAD_ROOT's that don't have a drop_progress set then we're
safe to start balance right away as we'll be properly protected by the
cleaner_mutex.

CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agobtrfs: add missing run of delayed items after unlink during log replay
Filipe Manana [Mon, 28 Feb 2022 16:29:28 +0000 (16:29 +0000)]
btrfs: add missing run of delayed items after unlink during log replay

commit 4751dc99627e4d1465c5bfa8cb7ab31ed418eff5 upstream.

During log replay, whenever we need to check if a name (dentry) exists in
a directory we do searches on the subvolume tree for inode references or
or directory entries (BTRFS_DIR_INDEX_KEY keys, and BTRFS_DIR_ITEM_KEY
keys as well, before kernel 5.17). However when during log replay we
unlink a name, through btrfs_unlink_inode(), we may not delete inode
references and dir index keys from a subvolume tree and instead just add
the deletions to the delayed inode's delayed items, which will only be
run when we commit the transaction used for log replay. This means that
after an unlink operation during log replay, if we attempt to search for
the same name during log replay, we will not see that the name was already
deleted, since the deletion is recorded only on the delayed items.

We run delayed items after every unlink operation during log replay,
except at unlink_old_inode_refs() and at add_inode_ref(). This was due
to an overlook, as delayed items should be run after evert unlink, for
the reasons stated above.

So fix those two cases.

Fixes: 0d836392cadd5 ("Btrfs: fix mount failure after fsync due to hard link recreation")
Fixes: 1f250e929a9c9 ("Btrfs: fix log replay failure after unlink and link combination")
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agobtrfs: qgroup: fix deadlock between rescan worker and remove qgroup
Sidong Yang [Mon, 28 Feb 2022 01:43:40 +0000 (01:43 +0000)]
btrfs: qgroup: fix deadlock between rescan worker and remove qgroup

commit d4aef1e122d8bbdc15ce3bd0bc813d6b44a7d63a upstream.

The commit e804861bd4e6 ("btrfs: fix deadlock between quota disable and
qgroup rescan worker") by Kawasaki resolves deadlock between quota
disable and qgroup rescan worker. But also there is a deadlock case like
it. It's about enabling or disabling quota and creating or removing
qgroup. It can be reproduced in simple script below.

for i in {1..100}
do
    btrfs quota enable /mnt &
    btrfs qgroup create 1/0 /mnt &
    btrfs qgroup destroy 1/0 /mnt &
    btrfs quota disable /mnt &
done

Here's why the deadlock happens:

1) The quota rescan task is running.

2) Task A calls btrfs_quota_disable(), locks the qgroup_ioctl_lock
   mutex, and then calls btrfs_qgroup_wait_for_completion(), to wait for
   the quota rescan task to complete.

3) Task B calls btrfs_remove_qgroup() and it blocks when trying to lock
   the qgroup_ioctl_lock mutex, because it's being held by task A. At that
   point task B is holding a transaction handle for the current transaction.

4) The quota rescan task calls btrfs_commit_transaction(). This results
   in it waiting for all other tasks to release their handles on the
   transaction, but task B is blocked on the qgroup_ioctl_lock mutex
   while holding a handle on the transaction, and that mutex is being held
   by task A, which is waiting for the quota rescan task to complete,
   resulting in a deadlock between these 3 tasks.

To resolve this issue, the thread disabling quota should unlock
qgroup_ioctl_lock before waiting rescan completion. Move
btrfs_qgroup_wait_for_completion() after unlock of qgroup_ioctl_lock.

Fixes: e804861bd4e6 ("btrfs: fix deadlock between quota disable and qgroup rescan worker")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agobtrfs: do not WARN_ON() if we have PageError set
Josef Bacik [Fri, 18 Feb 2022 15:17:39 +0000 (10:17 -0500)]
btrfs: do not WARN_ON() if we have PageError set

commit a50e1fcbc9b85fd4e95b89a75c0884cb032a3e06 upstream.

Whenever we do any extent buffer operations we call
assert_eb_page_uptodate() to complain loudly if we're operating on an
non-uptodate page.  Our overnight tests caught this warning earlier this
week

  WARNING: CPU: 1 PID: 553508 at fs/btrfs/extent_io.c:6849 assert_eb_page_uptodate+0x3f/0x50
  CPU: 1 PID: 553508 Comm: kworker/u4:13 Tainted: G        W         5.17.0-rc3+ #564
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
  Workqueue: btrfs-cache btrfs_work_helper
  RIP: 0010:assert_eb_page_uptodate+0x3f/0x50
  RSP: 0018:ffffa961440a7c68 EFLAGS: 00010246
  RAX: 0017ffffc0002112 RBX: ffffe6e74453f9c0 RCX: 0000000000001000
  RDX: ffffe6e74467c887 RSI: ffffe6e74453f9c0 RDI: ffff8d4c5efc2fc0
  RBP: 0000000000000d56 R08: ffff8d4d4a224000 R09: 0000000000000000
  R10: 00015817fa9d1ef0 R11: 000000000000000c R12: 00000000000007b1
  R13: ffff8d4c5efc2fc0 R14: 0000000001500000 R15: 0000000001cb1000
  FS:  0000000000000000(0000) GS:ffff8d4dbbd00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007ff31d3448d8 CR3: 0000000118be8004 CR4: 0000000000370ee0
  Call Trace:

   extent_buffer_test_bit+0x3f/0x70
   free_space_test_bit+0xa6/0xc0
   load_free_space_tree+0x1f6/0x470
   caching_thread+0x454/0x630
   ? rcu_read_lock_sched_held+0x12/0x60
   ? rcu_read_lock_sched_held+0x12/0x60
   ? rcu_read_lock_sched_held+0x12/0x60
   ? lock_release+0x1f0/0x2d0
   btrfs_work_helper+0xf2/0x3e0
   ? lock_release+0x1f0/0x2d0
   ? finish_task_switch.isra.0+0xf9/0x3a0
   process_one_work+0x26d/0x580
   ? process_one_work+0x580/0x580
   worker_thread+0x55/0x3b0
   ? process_one_work+0x580/0x580
   kthread+0xf0/0x120
   ? kthread_complete_and_exit+0x20/0x20
   ret_from_fork+0x1f/0x30

This was partially fixed by c2e39305299f01 ("btrfs: clear extent buffer
uptodate when we fail to write it"), however all that fix did was keep
us from finding extent buffers after a failed writeout.  It didn't keep
us from continuing to use a buffer that we already had found.

In this case we're searching the commit root to cache the block group,
so we can start committing the transaction and switch the commit root
and then start writing.  After the switch we can look up an extent
buffer that hasn't been written yet and start processing that block
group.  Then we fail to write that block out and clear Uptodate on the
page, and then we start spewing these errors.

Normally we're protected by the tree lock to a certain degree here.  If
we read a block we have that block read locked, and we block the writer
from locking the block before we submit it for the write.  However this
isn't necessarily fool proof because the read could happen before we do
the submit_bio and after we locked and unlocked the extent buffer.

Also in this particular case we have path->skip_locking set, so that
won't save us here.  We'll simply get a block that was valid when we
read it, but became invalid while we were using it.

What we really want is to catch the case where we've "read" a block but
it's not marked Uptodate.  On read we ClearPageError(), so if we're
!Uptodate and !Error we know we didn't do the right thing for reading
the page.

Fix this by checking !Uptodate && !Error, this way we will not complain
if our buffer gets invalidated while we're using it, and we'll maintain
the spirit of the check which is to make sure we have a fully in-cache
block while we're messing with it.

CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agobtrfs: fix relocation crash due to premature return from btrfs_commit_transaction()
Omar Sandoval [Thu, 17 Feb 2022 23:14:43 +0000 (15:14 -0800)]
btrfs: fix relocation crash due to premature return from btrfs_commit_transaction()

commit 5fd76bf31ccfecc06e2e6b29f8c809e934085b99 upstream.

We are seeing crashes similar to the following trace:

[38.969182] WARNING: CPU: 20 PID: 2105 at fs/btrfs/relocation.c:4070 btrfs_relocate_block_group+0x2dc/0x340 [btrfs]
[38.973556] CPU: 20 PID: 2105 Comm: btrfs Not tainted 5.17.0-rc4 #54
[38.974580] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[38.976539] RIP: 0010:btrfs_relocate_block_group+0x2dc/0x340 [btrfs]
[38.980336] RSP: 0000:ffffb0dd42e03c20 EFLAGS: 00010206
[38.981218] RAX: ffff96cfc4ede800 RBX: ffff96cfc3ce0000 RCX: 000000000002ca14
[38.982560] RDX: 0000000000000000 RSI: 4cfd109a0bcb5d7f RDI: ffff96cfc3ce0360
[38.983619] RBP: ffff96cfc309c000 R08: 0000000000000000 R09: 0000000000000000
[38.984678] R10: ffff96cec0000001 R11: ffffe84c80000000 R12: ffff96cfc4ede800
[38.985735] R13: 0000000000000000 R14: 0000000000000000 R15: ffff96cfc3ce0360
[38.987146] FS:  00007f11c15218c0(0000) GS:ffff96d6dfb00000(0000) knlGS:0000000000000000
[38.988662] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[38.989398] CR2: 00007ffc922c8e60 CR3: 00000001147a6001 CR4: 0000000000370ee0
[38.990279] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[38.991219] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[38.992528] Call Trace:
[38.992854]  <TASK>
[38.993148]  btrfs_relocate_chunk+0x27/0xe0 [btrfs]
[38.993941]  btrfs_balance+0x78e/0xea0 [btrfs]
[38.994801]  ? vsnprintf+0x33c/0x520
[38.995368]  ? __kmalloc_track_caller+0x351/0x440
[38.996198]  btrfs_ioctl_balance+0x2b9/0x3a0 [btrfs]
[38.997084]  btrfs_ioctl+0x11b0/0x2da0 [btrfs]
[38.997867]  ? mod_objcg_state+0xee/0x340
[38.998552]  ? seq_release+0x24/0x30
[38.999184]  ? proc_nr_files+0x30/0x30
[38.999654]  ? call_rcu+0xc8/0x2f0
[39.000228]  ? __x64_sys_ioctl+0x84/0xc0
[39.000872]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
[39.001973]  __x64_sys_ioctl+0x84/0xc0
[39.002566]  do_syscall_64+0x3a/0x80
[39.003011]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[39.003735] RIP: 0033:0x7f11c166959b
[39.007324] RSP: 002b:00007fff2543e998 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[39.008521] RAX: ffffffffffffffda RBX: 00007f11c1521698 RCX: 00007f11c166959b
[39.009833] RDX: 00007fff2543ea40 RSI: 00000000c4009420 RDI: 0000000000000003
[39.011270] RBP: 0000000000000003 R08: 0000000000000013 R09: 00007f11c16f94e0
[39.012581] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff25440df3
[39.014046] R13: 0000000000000000 R14: 00007fff2543ea40 R15: 0000000000000001
[39.015040]  </TASK>
[39.015418] ---[ end trace 0000000000000000 ]---
[43.131559] ------------[ cut here ]------------
[43.132234] kernel BUG at fs/btrfs/extent-tree.c:2717!
[43.133031] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[43.133702] CPU: 1 PID: 1839 Comm: btrfs Tainted: G        W         5.17.0-rc4 #54
[43.134863] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[43.136426] RIP: 0010:unpin_extent_range+0x37a/0x4f0 [btrfs]
[43.139913] RSP: 0000:ffffb0dd4216bc70 EFLAGS: 00010246
[43.140629] RAX: 0000000000000000 RBX: ffff96cfc34490f8 RCX: 0000000000000001
[43.141604] RDX: 0000000080000001 RSI: 0000000051d00000 RDI: 00000000ffffffff
[43.142645] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff96cfd07dca50
[43.143669] R10: ffff96cfc46e8a00 R11: fffffffffffec000 R12: 0000000041d00000
[43.144657] R13: ffff96cfc3ce0000 R14: ffffb0dd4216bd08 R15: 0000000000000000
[43.145686] FS:  00007f7657dd68c0(0000) GS:ffff96d6df640000(0000) knlGS:0000000000000000
[43.146808] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[43.147584] CR2: 00007f7fe81bf5b0 CR3: 00000001093ee004 CR4: 0000000000370ee0
[43.148589] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[43.149581] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[43.150559] Call Trace:
[43.150904]  <TASK>
[43.151253]  btrfs_finish_extent_commit+0x88/0x290 [btrfs]
[43.152127]  btrfs_commit_transaction+0x74f/0xaa0 [btrfs]
[43.152932]  ? btrfs_attach_transaction_barrier+0x1e/0x50 [btrfs]
[43.153786]  btrfs_ioctl+0x1edc/0x2da0 [btrfs]
[43.154475]  ? __check_object_size+0x150/0x170
[43.155170]  ? preempt_count_add+0x49/0xa0
[43.155753]  ? __x64_sys_ioctl+0x84/0xc0
[43.156437]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
[43.157456]  __x64_sys_ioctl+0x84/0xc0
[43.157980]  do_syscall_64+0x3a/0x80
[43.158543]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[43.159231] RIP: 0033:0x7f7657f1e59b
[43.161819] RSP: 002b:00007ffda5cd1658 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[43.162702] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f7657f1e59b
[43.163526] RDX: 0000000000000000 RSI: 0000000000009408 RDI: 0000000000000003
[43.164358] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
[43.165208] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[43.166029] R13: 00005621b91c3232 R14: 00005621b91ba580 R15: 00007ffda5cd1800
[43.166863]  </TASK>
[43.167125] Modules linked in: btrfs blake2b_generic xor pata_acpi ata_piix libata raid6_pq scsi_mod libcrc32c virtio_net virtio_rng net_failover rng_core failover scsi_common
[43.169552] ---[ end trace 0000000000000000 ]---
[43.171226] RIP: 0010:unpin_extent_range+0x37a/0x4f0 [btrfs]
[43.174767] RSP: 0000:ffffb0dd4216bc70 EFLAGS: 00010246
[43.175600] RAX: 0000000000000000 RBX: ffff96cfc34490f8 RCX: 0000000000000001
[43.176468] RDX: 0000000080000001 RSI: 0000000051d00000 RDI: 00000000ffffffff
[43.177357] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff96cfd07dca50
[43.178271] R10: ffff96cfc46e8a00 R11: fffffffffffec000 R12: 0000000041d00000
[43.179178] R13: ffff96cfc3ce0000 R14: ffffb0dd4216bd08 R15: 0000000000000000
[43.180071] FS:  00007f7657dd68c0(0000) GS:ffff96d6df800000(0000) knlGS:0000000000000000
[43.181073] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[43.181808] CR2: 00007fe09905f010 CR3: 00000001093ee004 CR4: 0000000000370ee0
[43.182706] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[43.183591] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

We first hit the WARN_ON(rc->block_group->pinned > 0) in
btrfs_relocate_block_group() and then the BUG_ON(!cache) in
unpin_extent_range(). This tells us that we are exiting relocation and
removing the block group with bytes still pinned for that block group.
This is supposed to be impossible: the last thing relocate_block_group()
does is commit the transaction to get rid of pinned extents.

Commit d0c2f4fa555e ("btrfs: make concurrent fsyncs wait less when
waiting for a transaction commit") introduced an optimization so that
commits from fsync don't have to wait for the previous commit to unpin
extents. This was only intended to affect fsync, but it inadvertently
made it possible for any commit to skip waiting for the previous commit
to unpin. This is because if a call to btrfs_commit_transaction() finds
that another thread is already committing the transaction, it waits for
the other thread to complete the commit and then returns. If that other
thread was in fsync, then it completes the commit without completing the
previous commit. This makes the following sequence of events possible:

Thread 1____________________|Thread 2 (fsync)_____________________|Thread 3 (balance)___________________
btrfs_commit_transaction(N) |                                     |
  btrfs_run_delayed_refs    |                                     |
    pin extents             |                                     |
  ...                       |                                     |
  state = UNBLOCKED         |btrfs_sync_file                      |
                            |  btrfs_start_transaction(N + 1)     |relocate_block_group
                            |                                     |  btrfs_join_transaction(N + 1)
                            |  btrfs_commit_transaction(N + 1)    |
  ...                       |  trans->state = COMMIT_START        |
                            |                                     |  btrfs_commit_transaction(N + 1)
                            |                                     |    wait_for_commit(N + 1, COMPLETED)
                            |  wait_for_commit(N, SUPER_COMMITTED)|
  state = SUPER_COMMITTED   |  ...                                |
  btrfs_finish_extent_commit|                                     |
    unpin_extent_range()    |  trans->state = COMPLETED           |
                            |                                     |    return
                            |                                     |
    ...                     |                                     |Thread 1 isn't done, so pinned > 0
                            |                                     |and we WARN
                            |                                     |
                            |                                     |btrfs_remove_block_group
    unpin_extent_range()    |                                     |
      Thread 3 removed the  |                                     |
      block group, so we BUG|                                     |

There are other sequences involving SUPER_COMMITTED transactions that
can cause a similar outcome.

We could fix this by making relocation explicitly wait for unpinning,
but there may be other cases that need it. Josef mentioned ENOSPC
flushing and the free space cache inode as other potential victims.
Rather than playing whack-a-mole, this fix is conservative and makes all
commits not in fsync wait for all previous transactions, which is what
the optimization intended.

Fixes: d0c2f4fa555e ("btrfs: make concurrent fsyncs wait less when waiting for a transaction commit")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agobtrfs: fix lost prealloc extents beyond eof after full fsync
Filipe Manana [Thu, 17 Feb 2022 12:12:02 +0000 (12:12 +0000)]
btrfs: fix lost prealloc extents beyond eof after full fsync

commit d99478874355d3a7b9d86dfb5d7590d5b1754b1f upstream.

When doing a full fsync, if we have prealloc extents beyond (or at) eof,
and the leaves that contain them were not modified in the current
transaction, we end up not logging them. This results in losing those
extents when we replay the log after a power failure, since the inode is
truncated to the current value of the logged i_size.

Just like for the fast fsync path, we need to always log all prealloc
extents starting at or beyond i_size. The fast fsync case was fixed in
commit 471d557afed155 ("Btrfs: fix loss of prealloc extents past i_size
after fsync log replay") but it missed the full fsync path. The problem
exists since the very early days, when the log tree was added by
commit e02119d5a7b439 ("Btrfs: Add a write ahead tree log to optimize
synchronous operations").

Example reproducer:

  $ mkfs.btrfs -f /dev/sdc
  $ mount /dev/sdc /mnt

  # Create our test file with many file extent items, so that they span
  # several leaves of metadata, even if the node/page size is 64K. Use
  # direct IO and not fsync/O_SYNC because it's both faster and it avoids
  # clearing the full sync flag from the inode - we want the fsync below
  # to trigger the slow full sync code path.
  $ xfs_io -f -d -c "pwrite -b 4K 0 16M" /mnt/foo

  # Now add two preallocated extents to our file without extending the
  # file's size. One right at i_size, and another further beyond, leaving
  # a gap between the two prealloc extents.
  $ xfs_io -c "falloc -k 16M 1M" /mnt/foo
  $ xfs_io -c "falloc -k 20M 1M" /mnt/foo

  # Make sure everything is durably persisted and the transaction is
  # committed. This makes all created extents to have a generation lower
  # than the generation of the transaction used by the next write and
  # fsync.
  sync

  # Now overwrite only the first extent, which will result in modifying
  # only the first leaf of metadata for our inode. Then fsync it. This
  # fsync will use the slow code path (inode full sync bit is set) because
  # it's the first fsync since the inode was created/loaded.
  $ xfs_io -c "pwrite 0 4K" -c "fsync" /mnt/foo

  # Extent list before power failure.
  $ xfs_io -c "fiemap -v" /mnt/foo
  /mnt/foo:
   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
     0: [0..7]:          2178048..2178055     8   0x0
     1: [8..16383]:      26632..43007     16376   0x0
     2: [16384..32767]:  2156544..2172927 16384   0x0
     3: [32768..34815]:  2172928..2174975  2048 0x800
     4: [34816..40959]:  hole              6144
     5: [40960..43007]:  2174976..2177023  2048 0x801

  <power fail>

  # Mount fs again, trigger log replay.
  $ mount /dev/sdc /mnt

  # Extent list after power failure and log replay.
  $ xfs_io -c "fiemap -v" /mnt/foo
  /mnt/foo:
   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
     0: [0..7]:          2178048..2178055     8   0x0
     1: [8..16383]:      26632..43007     16376   0x0
     2: [16384..32767]:  2156544..2172927 16384   0x1

  # The prealloc extents at file offsets 16M and 20M are missing.

So fix this by calling btrfs_log_prealloc_extents() when we are doing a
full fsync, so that we always log all prealloc extents beyond eof.

A test case for fstests will follow soon.

CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agotracing: Fix return value of __setup handlers
Randy Dunlap [Thu, 3 Mar 2022 03:17:44 +0000 (19:17 -0800)]
tracing: Fix return value of __setup handlers

commit 1d02b444b8d1345ea4708db3bab4db89a7784b55 upstream.

__setup() handlers should generally return 1 to indicate that the
boot options have been handled.

Using invalid option values causes the entire kernel boot option
string to be reported as Unknown and added to init's environment
strings, polluting it.

  Unknown kernel command line parameters "BOOT_IMAGE=/boot/bzImage-517rc6
    kprobe_event=p,syscall_any,$arg1 trace_options=quiet
    trace_clock=jiffies", will be passed to user space.

 Run /sbin/init as init process
   with arguments:
     /sbin/init
   with environment:
     HOME=/
     TERM=linux
     BOOT_IMAGE=/boot/bzImage-517rc6
     kprobe_event=p,syscall_any,$arg1
     trace_options=quiet
     trace_clock=jiffies

Return 1 from the __setup() handlers so that init's environment is not
polluted with kernel boot options.

Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Link: https://lkml.kernel.org/r/20220303031744.32356-1-rdunlap@infradead.org
Cc: stable@vger.kernel.org
Fixes: 7bcfaf54f591 ("tracing: Add trace_options kernel command line parameter")
Fixes: e1e232ca6b8f ("tracing: Add trace_clock=<clock> kernel parameter")
Fixes: 970988e19eb0 ("tracing/kprobe: Add kprobe_event= boot parameter")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agotracing/histogram: Fix sorting on old "cpu" value
Steven Rostedt (Google) [Wed, 2 Mar 2022 03:29:04 +0000 (22:29 -0500)]
tracing/histogram: Fix sorting on old "cpu" value

commit 1d1898f65616c4601208963c3376c1d828cbf2c7 upstream.

When trying to add a histogram against an event with the "cpu" field, it
was impossible due to "cpu" being a keyword to key off of the running CPU.
So to fix this, it was changed to "common_cpu" to match the other generic
fields (like "common_pid"). But since some scripts used "cpu" for keying
off of the CPU (for events that did not have "cpu" as a field, which is
most of them), a backward compatibility trick was added such that if "cpu"
was used as a key, and the event did not have "cpu" as a field name, then
it would fallback and switch over to "common_cpu".

This fix has a couple of subtle bugs. One was that when switching over to
"common_cpu", it did not change the field name, it just set a flag. But
the code still found a "cpu" field. The "cpu" field is used for filtering
and is returned when the event does not have a "cpu" field.

This was found by:

  # cd /sys/kernel/tracing
  # echo hist:key=cpu,pid:sort=cpu > events/sched/sched_wakeup/trigger
  # cat events/sched/sched_wakeup/hist

Which showed the histogram unsorted:

{ cpu:         19, pid:       1175 } hitcount:          1
{ cpu:          6, pid:        239 } hitcount:          2
{ cpu:         23, pid:       1186 } hitcount:         14
{ cpu:         12, pid:        249 } hitcount:          2
{ cpu:          3, pid:        994 } hitcount:          5

Instead of hard coding the "cpu" checks, take advantage of the fact that
trace_event_field_field() returns a special field for "cpu" and "CPU" if
the event does not have "cpu" as a field. This special field has the
"filter_type" of "FILTER_CPU". Check that to test if the returned field is
of the CPU type instead of doing the string compare.

Also, fix the sorting bug by testing for the hist_field flag of
HIST_FIELD_FL_CPU when setting up the sort routine. Otherwise it will use
the special CPU field to know what compare routine to use, and since that
special field does not have a size, it returns tracing_map_cmp_none.

Cc: stable@vger.kernel.org
Fixes: 1e3bac71c505 ("tracing/histogram: Rename "cpu" to "common_cpu"")
Reported-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoHID: add mapping for KEY_ALL_APPLICATIONS
William Mahon [Fri, 4 Mar 2022 02:26:22 +0000 (18:26 -0800)]
HID: add mapping for KEY_ALL_APPLICATIONS

commit 327b89f0acc4c20a06ed59e4d9af7f6d804dc2e2 upstream.

This patch adds a new key definition for KEY_ALL_APPLICATIONS
and aliases KEY_DASHBOARD to it.

It also maps the 0x0c/0x2a2 usage code to KEY_ALL_APPLICATIONS.

Signed-off-by: William Mahon <wmahon@chromium.org>
Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20220303035618.1.I3a7746ad05d270161a18334ae06e3b6db1a1d339@changeid
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoHID: add mapping for KEY_DICTATE
William Mahon [Fri, 4 Mar 2022 02:23:42 +0000 (18:23 -0800)]
HID: add mapping for KEY_DICTATE

commit bfa26ba343c727e055223be04e08f2ebdd43c293 upstream.

Numerous keyboards are adding dictate keys which allows for text
messages to be dictated by a microphone.

This patch adds a new key definition KEY_DICTATE and maps 0x0c/0x0d8
usage code to this new keycode. Additionally hid-debug is adjusted to
recognize this new usage code as well.

Signed-off-by: William Mahon <wmahon@chromium.org>
Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20220303021501.1.I5dbf50eb1a7a6734ee727bda4a8573358c6d3ec0@changeid
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoInput: samsung-keypad - properly state IOMEM dependency
David Gow [Mon, 28 Feb 2022 05:00:10 +0000 (21:00 -0800)]
Input: samsung-keypad - properly state IOMEM dependency

commit ba115adf61b36b8c167126425a62b0efc23f72c0 upstream.

Make the samsung-keypad driver explicitly depend on CONFIG_HAS_IOMEM, as it
calls devm_ioremap(). This prevents compile errors in some configs (e.g,
allyesconfig/randconfig under UML):

/usr/bin/ld: drivers/input/keyboard/samsung-keypad.o: in function `samsung_keypad_probe':
samsung-keypad.c:(.text+0xc60): undefined reference to `devm_ioremap'

Signed-off-by: David Gow <davidgow@google.com>
Acked-by: anton ivanov <anton.ivanov@cambridgegreys.com>
Link: https://lore.kernel.org/r/20220225041727.1902850-1-davidgow@google.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoInput: elan_i2c - fix regulator enable count imbalance after suspend/resume
Hans de Goede [Tue, 1 Mar 2022 07:39:50 +0000 (23:39 -0800)]
Input: elan_i2c - fix regulator enable count imbalance after suspend/resume

commit 04b7762e37c95d9b965d16bb0e18dbd1fa2e2861 upstream.

Before these changes elan_suspend() would only disable the regulator
when device_may_wakeup() returns false; whereas elan_resume() would
unconditionally enable it, leading to an enable count imbalance when
device_may_wakeup() returns true.

This triggers the "WARN_ON(regulator->enable_count)" in regulator_put()
when the elan_i2c driver gets unbound, this happens e.g. with the
hot-plugable dock with Elan I2C touchpad for the Asus TF103C 2-in-1.

Fix this by making the regulator_enable() call also be conditional
on device_may_wakeup() returning false.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20220131135436.29638-2-hdegoede@redhat.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoInput: elan_i2c - move regulator_[en|dis]able() out of elan_[en|dis]able_power()
Hans de Goede [Tue, 1 Mar 2022 07:39:38 +0000 (23:39 -0800)]
Input: elan_i2c - move regulator_[en|dis]able() out of elan_[en|dis]able_power()

commit 81a36d8ce554b82b0a08e2b95d0bd44fcbff339b upstream.

elan_disable_power() is called conditionally on suspend, where as
elan_enable_power() is always called on resume. This leads to
an imbalance in the regulator's enable count.

Move the regulator_[en|dis]able() calls out of elan_[en|dis]able_power()
in preparation of fixing this.

No functional changes intended.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20220131135436.29638-1-hdegoede@redhat.com
[dtor: consolidate elan_[en|dis]able() into elan_set_power()]
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoMAINTAINERS: adjust file entry for of_net.c after movement
Lukas Bulwahn [Sat, 16 Oct 2021 05:58:15 +0000 (07:58 +0200)]
MAINTAINERS: adjust file entry for of_net.c after movement

commit f616447034a120b18f6e612814641e7d8f5d7f0a upstream.

Commit e330fb14590c ("of: net: move of_net under net/") moves of_net.c
to ./net/core/, but misses to adjust the reference to this file in
MAINTAINERS.

Hence, ./scripts/get_maintainer.pl --self-test=patterns complains:

   warning: no file matches    F:    drivers/of/of_net.c

Adjust the file entry after this file movement.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20211016055815.14397-1-lukas.bulwahn@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoiavf: missing unlocks in iavf_watchdog_task()
Dan Carpenter [Wed, 10 Nov 2021 08:13:50 +0000 (11:13 +0300)]
iavf: missing unlocks in iavf_watchdog_task()

commit bc2f39a6252ee40d9bfc2743d4437d420aec5f6e upstream.

This code was re-organized and there some unlocks missing now.

Fixes: 898ef1cb1cb2 ("iavf: Combine init and watchdog state machines")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoiavf: do not override the adapter state in the watchdog task (again)
Stefan Assmann [Wed, 1 Dec 2021 08:14:34 +0000 (09:14 +0100)]
iavf: do not override the adapter state in the watchdog task (again)

commit fe523d7c9a8332855376ad5eb1aa301091129ba4 upstream.

The watchdog task incorrectly changes the state to __IAVF_RESETTING,
instead of letting the reset task take care of that. This was already
resolved by commit 22c8fd71d3a5 ("iavf: do not override the adapter
state in the watchdog task") but the problem was reintroduced by the
recent code refactoring in commit 45eebd62999d ("iavf: Refactor iavf
state machine tracking").

Fixes: 45eebd62999d ("iavf: Refactor iavf state machine tracking")
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agonet: stmmac: perserve TX and RX coalesce value during XDP setup
Ong Boon Leong [Wed, 24 Nov 2021 11:40:19 +0000 (19:40 +0800)]
net: stmmac: perserve TX and RX coalesce value during XDP setup

commit 61da6ac715700bcfeef50d187e15c6cc7c9d079b upstream.

When XDP program is loaded, it is desirable that the previous TX and RX
coalesce values are not re-inited to its default value. This prevents
unnecessary re-configurig the coalesce values that were working fine
before.

Fixes: ac746c8520d9 ("net: stmmac: enhance XDP ZC driver level switching performance")
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Tested-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://lore.kernel.org/r/20211124114019.3949125-1-boon.leong.ong@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years agoselftests: mlxsw: resource_scale: Fix return value
Amit Cohen [Wed, 2 Mar 2022 16:14:47 +0000 (18:14 +0200)]
selftests: mlxsw: resource_scale: Fix return value

[ Upstream commit 196f9bc050cbc5085b4cbb61cce2efe380bc66d0 ]

The test runs several test cases and is supposed to return an error in
case at least one of them failed.

Currently, the check of the return value of each test case is in the
wrong place, which can result in the wrong return value. For example:

 # TESTS='tc_police' ./resource_scale.sh
 TEST: 'tc_police' [default] 968                                     [FAIL]
         tc police offload count failed
 Error: mlxsw_spectrum: Failed to allocate policer index.
 We have an error talking to the kernel
 Command failed /tmp/tmp.i7Oc5HwmXY:969
 TEST: 'tc_police' [default] overflow 969                            [ OK ]
 ...
 TEST: 'tc_police' [ipv4_max] overflow 969                           [ OK ]

 $ echo $?
 0

Fix this by moving the check to be done after each test case.

Fixes: 059b18e21c63 ("selftests: mlxsw: Return correct error code in resource scale test")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agonet: dcb: disable softirqs in dcbnl_flush_dev()
Vladimir Oltean [Wed, 2 Mar 2022 19:39:39 +0000 (21:39 +0200)]
net: dcb: disable softirqs in dcbnl_flush_dev()

[ Upstream commit 10b6bb62ae1a49ee818fc479cf57b8900176773e ]

Ido Schimmel points out that since commit 52cff74eef5d ("dcbnl : Disable
software interrupts before taking dcb_lock"), the DCB API can be called
by drivers from softirq context.

One such in-tree example is the chelsio cxgb4 driver:
dcb_rpl
-> cxgb4_dcb_handle_fw_update
   -> dcb_ieee_setapp

If the firmware for this driver happened to send an event which resulted
in a call to dcb_ieee_setapp() at the exact same time as another
DCB-enabled interface was unregistering on the same CPU, the softirq
would deadlock, because the interrupted process was already holding the
dcb_lock in dcbnl_flush_dev().

Fix this unlikely event by using spin_lock_bh() in dcbnl_flush_dev() as
in the rest of the dcbnl code.

Fixes: 91b0383fef06 ("net: dcb: flush lingering app table entries for unregistered devices")
Reported-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220302193939.1368823-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agodrm/amdgpu: fix suspend/resume hang regression
Qiang Yu [Tue, 1 Mar 2022 06:11:59 +0000 (14:11 +0800)]
drm/amdgpu: fix suspend/resume hang regression

[ Upstream commit f1ef17011c765495c876fa75435e59eecfdc1ee4 ]

Regression has been reported that suspend/resume may hang with
the previous vm ready check commit.

So bring back the evicted list check as a temp fix.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1922
Fixes: c1a66c3bc425 ("drm/amdgpu: check vm ready by amdgpu_vm->evicting flag")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Qiang Yu <qiang.yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agonl80211: Handle nla_memdup failures in handle_nan_filter
Jiasheng Jiang [Tue, 1 Mar 2022 10:00:20 +0000 (18:00 +0800)]
nl80211: Handle nla_memdup failures in handle_nan_filter

[ Upstream commit 6ad27f522cb3b210476daf63ce6ddb6568c0508b ]

As there's potential for failure of the nla_memdup(),
check the return value.

Fixes: a442b761b24b ("cfg80211: add add_nan_func / del_nan_func")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Link: https://lore.kernel.org/r/20220301100020.3801187-1-jiasheng@iscas.ac.cn
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoMIPS: ralink: mt7621: use bitwise NOT instead of logical
Ilya Lipnitskiy [Tue, 1 Mar 2022 01:15:07 +0000 (17:15 -0800)]
MIPS: ralink: mt7621: use bitwise NOT instead of logical

[ Upstream commit 5d8965704fe5662e2e4a7e4424a2cbe53e182670 ]

It was the intention to reverse the bits, not make them all zero by
using logical NOT operator.

Fixes: cc19db8b312a ("MIPS: ralink: mt7621: do memory detection on KSEG1")
Suggested-by: Chuanhong Guo <gch981213@gmail.com>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Reviewed-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoe1000e: Fix possible HW unit hang after an s0ix exit
Sasha Neftin [Tue, 25 Jan 2022 17:31:23 +0000 (19:31 +0200)]
e1000e: Fix possible HW unit hang after an s0ix exit

[ Upstream commit 1866aa0d0d6492bc2f8d22d0df49abaccf50cddd ]

Disable the OEM bit/Gig Disable/restart AN impact and disable the PHY
LAN connected device (LCD) reset during power management flows. This
fixes possible HW unit hangs on the s0ix exit on some corporate ADL
platforms.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix")
Suggested-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Suggested-by: Nir Efrati <nir.efrati@intel.com>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agodrm/bridge: ti-sn65dsi86: Properly undo autosuspend
Douglas Anderson [Tue, 22 Feb 2022 22:18:43 +0000 (14:18 -0800)]
drm/bridge: ti-sn65dsi86: Properly undo autosuspend

[ Upstream commit 26d3474348293dc752c55fe6d41282199f73714c ]

The PM Runtime docs say:
  Drivers in ->remove() callback should undo the runtime PM changes done
  in ->probe(). Usually this means calling pm_runtime_disable(),
  pm_runtime_dont_use_autosuspend() etc.

We weren't doing that for autosuspend. Let's do it.

Fixes: 9bede63127c6 ("drm/bridge: ti-sn65dsi86: Use pm_runtime autosuspend")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220222141838.1.If784ba19e875e8ded4ec4931601ce6d255845245@changeid
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agodrm/i915/guc/slpc: Correct the param count for unset param
Vinay Belgaumkar [Wed, 16 Feb 2022 18:15:04 +0000 (10:15 -0800)]
drm/i915/guc/slpc: Correct the param count for unset param

[ Upstream commit 1b279f6ad467535c3b8a66b4edefaca2cdd5bdc3 ]

SLPC unset param H2G only needs one parameter - the id of the
param.

Fixes: 025cb07bebfa ("drm/i915/guc/slpc: Cache platform frequency limits")
Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220216181504.7155-1-vinay.belgaumkar@intel.com
(cherry picked from commit 9648f1c3739505557d94ff749a4f32192ea81fe3)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoiavf: Fix __IAVF_RESETTING state usage
Slawomir Laba [Wed, 23 Feb 2022 12:38:55 +0000 (13:38 +0100)]
iavf: Fix __IAVF_RESETTING state usage

[ Upstream commit 14756b2ae265d526b8356e86729090b01778fdf6 ]

The setup of __IAVF_RESETTING state in watchdog task had no
effect and could lead to slow resets in the driver as
the task for __IAVF_RESETTING state only requeues watchdog.
Till now the __IAVF_RESETTING was interpreted by reset task
as running state which could lead to errors with allocating
and resources disposal.

Make watchdog_task queue the reset task when it's necessary.
Do not update the state to __IAVF_RESETTING so the reset task
knows exactly what is the current state of the adapter.

Fixes: 898ef1cb1cb2 ("iavf: Combine init and watchdog state machines")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoiavf: Fix race in init state
Slawomir Laba [Wed, 23 Feb 2022 12:38:01 +0000 (13:38 +0100)]
iavf: Fix race in init state

[ Upstream commit a472eb5cbaebb5774672c565e024336c039e9128 ]

When iavf_init_version_check sends VIRTCHNL_OP_GET_VF_RESOURCES
message, the driver will wait for the response after requeueing
the watchdog task in iavf_init_get_resources call stack. The
logic is implemented this way that iavf_init_get_resources has
to be called in order to allocate adapter->vf_res. It is polling
for the AQ response in iavf_get_vf_config function. Expect a
call trace from kernel when adminq_task worker handles this
message first. adapter->vf_res will be NULL in
iavf_virtchnl_completion.

Make the watchdog task not queue the adminq_task if the init
process is not finished yet.

Fixes: 898ef1cb1cb2 ("iavf: Combine init and watchdog state machines")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoiavf: Fix locking for VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS
Slawomir Laba [Wed, 23 Feb 2022 12:37:50 +0000 (13:37 +0100)]
iavf: Fix locking for VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS

[ Upstream commit 0579fafd37fb7efe091f0e6c8ccf968864f40f3e ]

iavf_virtchnl_completion is called under crit_lock but when
the code for VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS is called,
this lock is released in order to obtain rtnl_lock to avoid
ABBA deadlock with unregister_netdev.

Along with the new way iavf_remove behaves, there exist
many risks related to the lock release and attmepts to regrab
it. The driver faces crashes related to races between
unregister_netdev and netdev_update_features. Yet another
risk is that the driver could already obtain the crit_lock
in order to destroy it and iavf_virtchnl_completion could
crash or block forever.

Make iavf_virtchnl_completion never relock crit_lock in it's
call paths.

Extract rtnl_lock locking logic to the driver for
unregister_netdev in order to set the netdev_registered flag
inside the lock.

Introduce a new flag that will inform adminq_task to perform
the code from VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS right after
it finishes processing messages. Guard this code with remove
flags so it's never called when the driver is in remove state.

Fixes: 5951a2b9812d ("iavf: Fix VLAN feature flags after VFR")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoiavf: Fix init state closure on remove
Slawomir Laba [Wed, 23 Feb 2022 12:37:10 +0000 (13:37 +0100)]
iavf: Fix init state closure on remove

[ Upstream commit 3ccd54ef44ebfa0792c5441b6d9c86618f3378d1 ]

When init states of the adapter work, the errors like lack
of communication with the PF might hop in. If such events
occur the driver restores previous states in order to retry
initialization in a proper way. When remove task kicks in,
this situation could lead to races with unregistering the
netdevice as well as resources cleanup. With the commit
introducing the waiting in remove for init to complete,
this problem turns into an endless waiting if init never
recovers from errors.

Introduce __IAVF_IN_REMOVE_TASK bit to indicate that the
remove thread has started.

Make __IAVF_COMM_FAILED adapter state respect the
__IAVF_IN_REMOVE_TASK bit and set the __IAVF_INIT_FAILED
state and return without any action instead of trying to
recover.

Make __IAVF_INIT_FAILED adapter state respect the
__IAVF_IN_REMOVE_TASK bit and return without any further
actions.

Make the loop in the remove handler break when adapter has
__IAVF_INIT_FAILED state set.

Fixes: 898ef1cb1cb2 ("iavf: Combine init and watchdog state machines")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoiavf: Add waiting so the port is initialized in remove
Slawomir Laba [Wed, 23 Feb 2022 12:36:56 +0000 (13:36 +0100)]
iavf: Add waiting so the port is initialized in remove

[ Upstream commit 974578017fc1fdd06cea8afb9dfa32602e8529ed ]

There exist races when port is being configured and remove is
triggered.

unregister_netdev is not and can't be called under crit_lock
mutex since it is calling ndo_stop -> iavf_close which requires
this lock. Depending on init state the netdev could be still
unregistered so unregister_netdev never cleans up, when shortly
after that the device could become registered.

Make iavf_remove wait until port finishes initialization.
All critical state changes are atomic (under crit_lock).
Crashes that come from iavf_reset_interrupt_capability and
iavf_free_traffic_irqs should now be solved in a graceful
manner.

Fixes: 605ca7c5c6707 ("iavf: Fix kernel BUG in free_msi_irqs")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoiavf: Fix kernel BUG in free_msi_irqs
Przemyslaw Patynowski [Fri, 22 Oct 2021 08:30:14 +0000 (10:30 +0200)]
iavf: Fix kernel BUG in free_msi_irqs

[ Upstream commit 605ca7c5c670762e36ccb475cfa089d7ad0698e0 ]

Fix driver not freeing VF's traffic irqs, prior to calling
pci_disable_msix in iavf_remove.
There were possible 2 erroneous states in which, iavf_close would
not be called.
One erroneous state is fixed by allowing netdev to register, when state
is already running. It was possible for VF adapter to enter state loop
from running to resetting, where iavf_open would subsequently fail.
If user would then unload driver/remove VF pci, iavf_close would not be
called, as the netdev was not registered, leaving traffic pcis still
allocated.
Fixed this by breaking loop, allowing netdev to open device when adapter
state is __IAVF_RUNNING and it is not explicitily downed.
Other possiblity is entering to iavf_remove from __IAVF_RESETTING state,
where iavf_close would not free irqs, but just return 0.
Fixed this by checking for last adapter state and then removing irqs.

Kernel panic:
[ 2773.628585] kernel BUG at drivers/pci/msi.c:375!
...
[ 2773.631567] RIP: 0010:free_msi_irqs+0x180/0x1b0
...
[ 2773.640939] Call Trace:
[ 2773.641572]  pci_disable_msix+0xf7/0x120
[ 2773.642224]  iavf_reset_interrupt_capability.part.41+0x15/0x30 [iavf]
[ 2773.642897]  iavf_remove+0x12e/0x500 [iavf]
[ 2773.643578]  pci_device_remove+0x3b/0xc0
[ 2773.644266]  device_release_driver_internal+0x103/0x1f0
[ 2773.644948]  pci_stop_bus_device+0x69/0x90
[ 2773.645576]  pci_stop_and_remove_bus_device+0xe/0x20
[ 2773.646215]  pci_iov_remove_virtfn+0xba/0x120
[ 2773.646862]  sriov_disable+0x2f/0xe0
[ 2773.647531]  ice_free_vfs+0x2f8/0x350 [ice]
[ 2773.648207]  ice_sriov_configure+0x94/0x960 [ice]
[ 2773.648883]  ? _kstrtoull+0x3b/0x90
[ 2773.649560]  sriov_numvfs_store+0x10a/0x190
[ 2773.650249]  kernfs_fop_write+0x116/0x190
[ 2773.650948]  vfs_write+0xa5/0x1a0
[ 2773.651651]  ksys_write+0x4f/0xb0
[ 2773.652358]  do_syscall_64+0x5b/0x1a0
[ 2773.653075]  entry_SYSCALL_64_after_hwframe+0x65/0xca

Fixes: 22ead37f8af8 ("i40evf: Add longer wait after remove module")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoiavf: Add helper function to go from pci_dev to adapter
Karen Sornek [Wed, 15 Sep 2021 06:41:23 +0000 (08:41 +0200)]
iavf: Add helper function to go from pci_dev to adapter

[ Upstream commit 247aa001b72b6c8a89df9d108a2ec6f274a6b64d ]

Add helper function to go from pci_dev to adapter to make work simple -
to go from a pci_dev to the adapter structure and make netdev assignment
instead of having to go to the net_device then the adapter.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Karen Sornek <karen.sornek@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoiavf: Rework mutexes for better synchronisation
Slawomir Laba [Wed, 23 Feb 2022 12:35:49 +0000 (13:35 +0100)]
iavf: Rework mutexes for better synchronisation

[ Upstream commit fc2e6b3b132a907378f6af08356b105a4139c4fb ]

The driver used to crash in multiple spots when put to stress testing
of the init, reset and remove paths.

The user would experience call traces or hangs when creating,
resetting, removing VFs. Depending on the machines, the call traces
are happening in random spots, like reset restoring resources racing
with driver remove.

Make adapter->crit_lock mutex a mandatory lock for guarding the
operations performed on all workqueues and functions dealing with
resource allocation and disposal.

Make __IAVF_REMOVE a final state of the driver respected by
workqueues that shall not requeue, when they fail to obtain the
crit_lock.

Make the IRQ handler not to queue the new work for adminq_task
when the __IAVF_REMOVE state is set.

Fixes: 5ac49f3c2702 ("iavf: use mutexes for locking of critical sections")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoiavf: Add trace while removing device
Jedrzej Jagielski [Tue, 22 Jun 2021 13:43:48 +0000 (15:43 +0200)]
iavf: Add trace while removing device

[ Upstream commit bdb9e5c7aec73a7b8b5acab37587b6de1203e68d ]

Add kernel trace that device was removed.
Currently there is no such information.
I.e. Host admin removes a PCI device from a VM,
than on VM shall be info about the event.

This patch adds info log to iavf_remove function.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoiavf: Combine init and watchdog state machines
Mateusz Palczewski [Thu, 19 Aug 2021 08:47:58 +0000 (08:47 +0000)]
iavf: Combine init and watchdog state machines

[ Upstream commit 898ef1cb1cb24040c3e89263e02c605af70c776a ]

Use single state machine for driver initialization and for service
initialized driver. The init state machine implemented in init_task()
is merged into the watchdog_task(). The init_task() function is
removed.

Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoiavf: Add __IAVF_INIT_FAILED state
Mateusz Palczewski [Thu, 19 Aug 2021 08:47:49 +0000 (08:47 +0000)]
iavf: Add __IAVF_INIT_FAILED state

[ Upstream commit 59756ad6948be91d66867ce458083b820c59b8ba ]

This commit adds a new state, __IAVF_INIT_FAILED to the state machine.
From now on initialization functions report errors not by returning an
error value, but by changing the state to indicate that something went
wrong.

Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>