]> git.itanic.dy.fi Git - linux-stable/log
linux-stable
5 years agoLinux 4.17.9 v4.17.9
Greg Kroah-Hartman [Sun, 22 Jul 2018 13:16:09 +0000 (15:16 +0200)]
Linux 4.17.9

5 years agobpf: undo prog rejection on read-only lock failure
Daniel Borkmann [Thu, 28 Jun 2018 21:34:59 +0000 (23:34 +0200)]
bpf: undo prog rejection on read-only lock failure

commit 85782e037f8aba8922dadb24a1523ca0b82ab8bc upstream.

Partially undo commit 9facc336876f ("bpf: reject any prog that failed
read-only lock") since it caused a regression, that is, syzkaller was
able to manage to cause a panic via fault injection deep in set_memory_ro()
path by letting an allocation fail: In x86's __change_page_attr_set_clr()
it was able to change the attributes of the primary mapping but not in
the alias mapping via cpa_process_alias(), so the second, inner call
to the __change_page_attr() via __change_page_attr_set_clr() had to split
a larger page and failed in the alloc_pages() with the artifically triggered
allocation error which is then propagated down to the call site.

Thus, for set_memory_ro() this means that it returned with an error, but
from debugging a probe_kernel_write() revealed EFAULT on that memory since
the primary mapping succeeded to get changed. Therefore the subsequent
hdr->locked = 0 reset triggered the panic as it was performed on read-only
memory, so call-site assumptions were infact wrong to assume that it would
either succeed /or/ not succeed at all since there's no such rollback in
set_memory_*() calls from partial change of mappings, in other words, we're
left in a state that is "half done". A later undo via set_memory_rw() is
succeeding though due to matching permissions on that part (aka due to the
try_preserve_large_page() succeeding). While reproducing locally with
explicitly triggering this error, the initial splitting only happens on
rare occasions and in real world it would additionally need oom conditions,
but that said, it could partially fail. Therefore, it is definitely wrong
to bail out on set_memory_ro() error and reject the program with the
set_memory_*() semantics we have today. Shouldn't have gone the extra mile
since no other user in tree today infact checks for any set_memory_*()
errors, e.g. neither module_enable_ro() / module_disable_ro() for module
RO/NX handling which is mostly default these days nor kprobes core with
alloc_insn_page() / free_insn_page() as examples that could be invoked long
after bootup and original 314beb9bcabf ("x86: bpf_jit_comp: secure bpf jit
against spraying attacks") did neither when it got first introduced to BPF
so "improving" with bailing out was clearly not right when set_memory_*()
cannot handle it today.

Kees suggested that if set_memory_*() can fail, we should annotate it with
__must_check, and all callers need to deal with it gracefully given those
set_memory_*() markings aren't "advisory", but they're expected to actually
do what they say. This might be an option worth to move forward in future
but would at the same time require that set_memory_*() calls from supporting
archs are guaranteed to be "atomic" in that they provide rollback if part
of the range fails, once that happened, the transition from RW -> RO could
be made more robust that way, while subsequent RO -> RW transition /must/
continue guaranteeing to always succeed the undo part.

Reported-by: syzbot+a4eb8c7766952a1ca872@syzkaller.appspotmail.com
Reported-by: syzbot+d866d1925855328eac3b@syzkaller.appspotmail.com
Fixes: 9facc336876f ("bpf: reject any prog that failed read-only lock")
Cc: Laura Abbott <labbott@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobpf, arm32: fix to use bpf_jit_binary_lock_ro api
Daniel Borkmann [Thu, 28 Jun 2018 21:34:57 +0000 (23:34 +0200)]
bpf, arm32: fix to use bpf_jit_binary_lock_ro api

commit 18d405af30bf6506bf2fc49056de7691c949812e upstream.

Any eBPF JIT that where its underlying arch supports ARCH_HAS_SET_MEMORY
would need to use bpf_jit_binary_{un,}lock_ro() pair instead of the
set_memory_{ro,rw}() pair directly as otherwise changes to the former
might break. arm32's eBPF conversion missed to change it, so fix this
up here.

Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobpf: enforce correct alignment for instructions
Eric Dumazet [Thu, 21 Jun 2018 00:24:09 +0000 (17:24 -0700)]
bpf: enforce correct alignment for instructions

commit 9262478220eac908ae6e168c3df2c453c87e2da3 upstream.

After commit 9facc336876f ("bpf: reject any prog that failed read-only lock")
offsetof(struct bpf_binary_header, image) became 3 instead of 4,
breaking powerpc BPF badly, since instructions need to be word aligned.

Fixes: 9facc336876f ("bpf: reject any prog that failed read-only lock")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoarm64: KVM: Add ARCH_WORKAROUND_2 discovery through ARCH_FEATURES_FUNC_ID
Marc Zyngier [Fri, 20 Jul 2018 09:47:22 +0000 (10:47 +0100)]
arm64: KVM: Add ARCH_WORKAROUND_2 discovery through ARCH_FEATURES_FUNC_ID

commit 5d81f7dc9bca4f4963092433e27b508cbe524a32 upstream.

Now that all our infrastructure is in place, let's expose the
availability of ARCH_WORKAROUND_2 to guests. We take this opportunity
to tidy up a couple of SMCCC constants.

Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoarm64: KVM: Handle guest's ARCH_WORKAROUND_2 requests
Marc Zyngier [Fri, 20 Jul 2018 09:47:21 +0000 (10:47 +0100)]
arm64: KVM: Handle guest's ARCH_WORKAROUND_2 requests

commit b4f18c063a13dfb33e3a63fe1844823e19c2265e upstream.

In order to forward the guest's ARCH_WORKAROUND_2 calls to EL3,
add a small(-ish) sequence to handle it at EL2. Special care must
be taken to track the state of the guest itself by updating the
workaround flags. We also rely on patching to enable calls into
the firmware.

Note that since we need to execute branches, this always executes
after the Spectre-v2 mitigation has been applied.

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoarm64: KVM: Add ARCH_WORKAROUND_2 support for guests
Marc Zyngier [Fri, 20 Jul 2018 09:47:20 +0000 (10:47 +0100)]
arm64: KVM: Add ARCH_WORKAROUND_2 support for guests

commit 55e3748e8902ff641e334226bdcb432f9a5d78d3 upstream.

In order to offer ARCH_WORKAROUND_2 support to guests, we need
a bit of infrastructure.

Let's add a flag indicating whether or not the guest uses
SSBD mitigation. Depending on the state of this flag, allow
KVM to disable ARCH_WORKAROUND_2 before entering the guest,
and enable it when exiting it.

Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoarm64: KVM: Add HYP per-cpu accessors
Marc Zyngier [Fri, 20 Jul 2018 09:47:19 +0000 (10:47 +0100)]
arm64: KVM: Add HYP per-cpu accessors

commit 85478bab409171de501b719971fd25a3d5d639f9 upstream.

As we're going to require to access per-cpu variables at EL2,
let's craft the minimum set of accessors required to implement
reading a per-cpu variable, relying on tpidr_el2 to contain the
per-cpu offset.

Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoarm64: ssbd: Add prctl interface for per-thread mitigation
Marc Zyngier [Fri, 20 Jul 2018 09:47:18 +0000 (10:47 +0100)]
arm64: ssbd: Add prctl interface for per-thread mitigation

commit 9cdc0108baa8ef87c76ed834619886a46bd70cbe upstream.

If running on a system that performs dynamic SSBD mitigation, allow
userspace to request the mitigation for itself. This is implemented
as a prctl call, allowing the mitigation to be enabled or disabled at
will for this particular thread.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoarm64: ssbd: Introduce thread flag to control userspace mitigation
Marc Zyngier [Fri, 20 Jul 2018 09:47:17 +0000 (10:47 +0100)]
arm64: ssbd: Introduce thread flag to control userspace mitigation

commit 9dd9614f5476687abbff8d4b12cd08ae70d7c2ad upstream.

In order to allow userspace to be mitigated on demand, let's
introduce a new thread flag that prevents the mitigation from
being turned off when exiting to userspace, and doesn't turn
it on on entry into the kernel (with the assumption that the
mitigation is always enabled in the kernel itself).

This will be used by a prctl interface introduced in a later
patch.

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoarm64: ssbd: Restore mitigation status on CPU resume
Marc Zyngier [Fri, 20 Jul 2018 09:47:16 +0000 (10:47 +0100)]
arm64: ssbd: Restore mitigation status on CPU resume

commit 647d0519b53f440a55df163de21c52a8205431cc upstream.

On a system where firmware can dynamically change the state of the
mitigation, the CPU will always come up with the mitigation enabled,
including when coming back from suspend.

If the user has requested "no mitigation" via a command line option,
let's enforce it by calling into the firmware again to disable it.

Similarily, for a resume from hibernate, the mitigation could have
been disabled by the boot kernel. Let's ensure that it is set
back on in that case.

Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoarm64: ssbd: Skip apply_ssbd if not using dynamic mitigation
Marc Zyngier [Fri, 20 Jul 2018 09:47:15 +0000 (10:47 +0100)]
arm64: ssbd: Skip apply_ssbd if not using dynamic mitigation

commit 986372c4367f46b34a3c0f6918d7fb95cbdf39d6 upstream.

In order to avoid checking arm64_ssbd_callback_required on each
kernel entry/exit even if no mitigation is required, let's
add yet another alternative that by default jumps over the mitigation,
and that gets nop'ed out if we're doing dynamic mitigation.

Think of it as a poor man's static key...

Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoarm64: ssbd: Add global mitigation state accessor
Marc Zyngier [Fri, 20 Jul 2018 09:47:14 +0000 (10:47 +0100)]
arm64: ssbd: Add global mitigation state accessor

commit c32e1736ca03904c03de0e4459a673be194f56fd upstream.

We're about to need the mitigation state in various parts of the
kernel in order to do the right thing for userspace and guests.

Let's expose an accessor that will let other subsystems know
about the state.

Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoarm64: Add 'ssbd' command-line option
Marc Zyngier [Fri, 20 Jul 2018 09:47:13 +0000 (10:47 +0100)]
arm64: Add 'ssbd' command-line option

commit a43ae4dfe56a01f5b98ba0cb2f784b6a43bafcc6 upstream.

On a system where the firmware implements ARCH_WORKAROUND_2,
it may be useful to either permanently enable or disable the
workaround for cases where the user decides that they'd rather
not get a trap overhead, and keep the mitigation permanently
on or off instead of switching it on exception entry/exit.

In any case, default to the mitigation being enabled.

Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoarm64: Add ARCH_WORKAROUND_2 probing
Marc Zyngier [Fri, 20 Jul 2018 09:47:12 +0000 (10:47 +0100)]
arm64: Add ARCH_WORKAROUND_2 probing

commit a725e3dda1813ed306734823ac4c65ca04e38500 upstream.

As for Spectre variant-2, we rely on SMCCC 1.1 to provide the
discovery mechanism for detecting the SSBD mitigation.

A new capability is also allocated for that purpose, and a
config option.

Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoarm64: Add per-cpu infrastructure to call ARCH_WORKAROUND_2
Marc Zyngier [Fri, 20 Jul 2018 09:47:11 +0000 (10:47 +0100)]
arm64: Add per-cpu infrastructure to call ARCH_WORKAROUND_2

commit 5cf9ce6e5ea50f805c6188c04ed0daaec7b6887d upstream.

In a heterogeneous system, we can end up with both affected and
unaffected CPUs. Let's check their status before calling into the
firmware.

Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoarm64: Call ARCH_WORKAROUND_2 on transitions between EL0 and EL1
Marc Zyngier [Fri, 20 Jul 2018 09:47:10 +0000 (10:47 +0100)]
arm64: Call ARCH_WORKAROUND_2 on transitions between EL0 and EL1

commit 8e2906245f1e3b0d027169d9f2e55ce0548cb96e upstream.

In order for the kernel to protect itself, let's call the SSBD mitigation
implemented by the higher exception level (either hypervisor or firmware)
on each transition between userspace and kernel.

We must take the PSCI conduit into account in order to target the
right exception level, hence the introduction of a runtime patching
callback.

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoarm/arm64: smccc: Add SMCCC-specific return codes
Marc Zyngier [Fri, 20 Jul 2018 09:47:09 +0000 (10:47 +0100)]
arm/arm64: smccc: Add SMCCC-specific return codes

commit eff0e9e1078ea7dc1d794dc50e31baef984c46d7 upstream.

We've so far used the PSCI return codes for SMCCC because they
were extremely similar. But with the new ARM DEN 0070A specification,
"NOT_REQUIRED" (-2) is clashing with PSCI's "PSCI_RET_INVALID_PARAMS".

Let's bite the bullet and add SMCCC specific return codes. Users
can be repainted as and when required.

Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoipvs: initialize tbl->entries in ip_vs_lblc_init_svc()
Cong Wang [Mon, 23 Apr 2018 21:04:45 +0000 (14:04 -0700)]
ipvs: initialize tbl->entries in ip_vs_lblc_init_svc()

commit 8b2ebb6cf064247d60cccbf1750610ac9bb2e672 upstream.

Similarly, tbl->entries is not initialized after kmalloc(),
therefore causes an uninit-value warning in ip_vs_lblc_check_expire(),
as reported by syzbot.

Reported-by: <syzbot+3e9695f147fb529aa9bc@syzkaller.appspotmail.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoipvs: initialize tbl->entries after allocation
Cong Wang [Mon, 23 Apr 2018 20:53:41 +0000 (13:53 -0700)]
ipvs: initialize tbl->entries after allocation

commit 3aa1409a7b160f9444945c0df1cb079df82be84e upstream.

tbl->entries is not initialized after kmalloc(), therefore
causes an uninit-value warning in ip_vs_lblc_check_expire()
as reported by syzbot.

Reported-by: <syzbot+3dfdea57819073a04f21@syzkaller.appspotmail.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet/nfc: Avoid stalls when nfc_alloc_send_skb() returned NULL.
Tetsuo Handa [Wed, 18 Jul 2018 09:57:27 +0000 (18:57 +0900)]
net/nfc: Avoid stalls when nfc_alloc_send_skb() returned NULL.

commit 3bc53be9db21040b5d2de4d455f023c8c494aa68 upstream.

syzbot is reporting stalls at nfc_llcp_send_ui_frame() [1]. This is
because nfc_llcp_send_ui_frame() is retrying the loop without any delay
when nonblocking nfc_alloc_send_skb() returned NULL.

Since there is no need to use MSG_DONTWAIT if we retry until
sock_alloc_send_pskb() succeeds, let's use blocking call.
Also, in case an unexpected error occurred, let's break the loop
if blocking nfc_alloc_send_skb() failed.

[1] https://syzkaller.appspot.com/bug?id=4a131cc571c3733e0eff6bc673f4e36ae48f19c6

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+d29d18215e477cfbfbdd@syzkaller.appspotmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobpf: don't leave partial mangled prog in jit_subprogs error path
Daniel Borkmann [Thu, 12 Jul 2018 19:44:28 +0000 (21:44 +0200)]
bpf: don't leave partial mangled prog in jit_subprogs error path

commit c7a897843224a92209f306c984975b704969b89d upstream.

syzkaller managed to trigger the following bug through fault injection:

  [...]
  [  141.043668] verifier bug. No program starts at insn 3
  [  141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
                 get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
  [  141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
                 fixup_call_args kernel/bpf/verifier.c:5587 [inline]
  [  141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
                 bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
  [  141.047355] CPU: 3 PID: 4072 Comm: a.out Not tainted 4.18.0-rc4+ #51
  [  141.048446] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS 1.10.2-1 04/01/2014
  [  141.049877] Call Trace:
  [  141.050324]  __dump_stack lib/dump_stack.c:77 [inline]
  [  141.050324]  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
  [  141.050950]  ? dump_stack_print_info.cold.2+0x52/0x52 lib/dump_stack.c:60
  [  141.051837]  panic+0x238/0x4e7 kernel/panic.c:184
  [  141.052386]  ? add_taint.cold.5+0x16/0x16 kernel/panic.c:385
  [  141.053101]  ? __warn.cold.8+0x148/0x1ba kernel/panic.c:537
  [  141.053814]  ? __warn.cold.8+0x117/0x1ba kernel/panic.c:530
  [  141.054506]  ? get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
  [  141.054506]  ? fixup_call_args kernel/bpf/verifier.c:5587 [inline]
  [  141.054506]  ? bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
  [  141.055163]  __warn.cold.8+0x163/0x1ba kernel/panic.c:538
  [  141.055820]  ? get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
  [  141.055820]  ? fixup_call_args kernel/bpf/verifier.c:5587 [inline]
  [  141.055820]  ? bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
  [...]

What happens in jit_subprogs() is that kcalloc() for the subprog func
buffer is failing with NULL where we then bail out. Latter is a plain
return -ENOMEM, and this is definitely not okay since earlier in the
loop we are walking all subprogs and temporarily rewrite insn->off to
remember the subprog id as well as insn->imm to temporarily point the
call to __bpf_call_base + 1 for the initial JIT pass. Thus, bailing
out in such state and handing this over to the interpreter is troublesome
since later/subsequent e.g. find_subprog() lookups are based on wrong
insn->imm.

Therefore, once we hit this point, we need to jump to out_free path
where we undo all changes from earlier loop, so that interpreter can
work on unmodified insn->{off,imm}.

Another point is that should find_subprog() fail in jit_subprogs() due
to a verifier bug, then we also should not simply defer the program to
the interpreter since also here we did partial modifications. Instead
we should just bail out entirely and return an error to the user who is
trying to load the program.

Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs")
Reported-by: syzbot+7d427828b2ea6e592804@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobpf: sockmap, consume_skb in close path
John Fastabend [Thu, 5 Jul 2018 15:50:10 +0000 (08:50 -0700)]
bpf: sockmap, consume_skb in close path

commit 7ebc14d507b4b55105da8d1a1eda323381529cc7 upstream.

Currently, when a sock is closed and the bpf_tcp_close() callback is
used we remove memory but do not free the skb. Call consume_skb() if
the skb is attached to the buffer.

Reported-by: syzbot+d464d2c20c717ef5a6a8@syzkaller.appspotmail.com
Fixes: 1aa12bdf1bfb ("bpf: sockmap, add sock close() hook to remove socks")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobpf: sockmap, fix crash when ipv6 sock is added
John Fastabend [Sat, 30 Jun 2018 13:17:36 +0000 (06:17 -0700)]
bpf: sockmap, fix crash when ipv6 sock is added

commit 9901c5d77e969d8215a8e8d087ef02e6feddc84c upstream.

This fixes a crash where we assign tcp_prot to IPv6 sockets instead
of tcpv6_prot.

Previously we overwrote the sk->prot field with tcp_prot even in the
AF_INET6 case. This patch ensures the correct tcp_prot and tcpv6_prot
are used.

Tested with 'netserver -6' and 'netperf -H [IPv6]' as well as
'netperf -H [IPv4]'. The ESTABLISHED check resolves the previously
crashing case here.

Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
Reported-by: syzbot+5c063698bdbfac19f363@syzkaller.appspotmail.com
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoblock: don't use blocking queue entered for recursive bio submits
Jens Axboe [Sat, 2 Jun 2018 20:04:07 +0000 (14:04 -0600)]
block: don't use blocking queue entered for recursive bio submits

commit cd4a4ae4683dc2e09380118e205e057896dcda2b upstream.

If we end up splitting a bio and the queue goes away between
the initial submission and the later split submission, then we
can block forever in blk_queue_enter() waiting for the reference
to drop to zero. This will never happen, since we already hold
a reference.

Mark a split bio as already having entered the queue, so we can
just use the live non-blocking queue enter variant.

Thanks to Tetsuo Handa for the analysis.

Reported-by: syzbot+c4f9cebf9d651f6e54de@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agords: avoid unenecessary cong_update in loop transport
Santosh Shilimkar [Thu, 14 Jun 2018 18:52:34 +0000 (11:52 -0700)]
rds: avoid unenecessary cong_update in loop transport

commit f1693c63ab133d16994cc50f773982b5905af264 upstream.

Loop transport which is self loopback, remote port congestion
update isn't relevant. Infact the xmit path already ignores it.
Receive path needs to do the same.

Reported-by: syzbot+4c20b3866171ce8441d2@syzkaller.appspotmail.com
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobpf: reject any prog that failed read-only lock
Daniel Borkmann [Fri, 15 Jun 2018 00:30:48 +0000 (02:30 +0200)]
bpf: reject any prog that failed read-only lock

commit 9facc336876f7ecf9edba4c67b90426fde4ec898 upstream.

We currently lock any JITed image as read-only via bpf_jit_binary_lock_ro()
as well as the BPF image as read-only through bpf_prog_lock_ro(). In
the case any of these would fail we throw a WARN_ON_ONCE() in order to
yell loudly to the log. Perhaps, to some extend, this may be comparable
to an allocation where __GFP_NOWARN is explicitly not set.

Added via 65869a47f348 ("bpf: improve read-only handling"), this behavior
is slightly different compared to any of the other in-kernel set_memory_ro()
users who do not check the return code of set_memory_ro() and friends /at
all/ (e.g. in the case of module_enable_ro() / module_disable_ro()). Given
in BPF this is mandatory hardening step, we want to know whether there
are any issues that would leave both BPF data writable. So it happens
that syzkaller enabled fault injection and it triggered memory allocation
failure deep inside x86's change_page_attr_set_clr() which was triggered
from set_memory_ro().

Now, there are two options: i) leaving everything as is, and ii) reworking
the image locking code in order to have a final checkpoint out of the
central bpf_prog_select_runtime() which probes whether any of the calls
during prog setup weren't successful, and then bailing out with an error.
Option ii) is a better approach since this additional paranoia avoids
altogether leaving any potential W+X pages from BPF side in the system.
Therefore, lets be strict about it, and reject programs in such unlikely
occasion. While testing I noticed also that one bpf_prog_lock_ro()
call was missing on the outer dummy prog in case of calls, e.g. in the
destructor we call bpf_prog_free_deferred() on the main prog where we
try to bpf_prog_unlock_free() the program, and since we go via
bpf_prog_select_runtime() do that as well.

Reported-by: syzbot+3b889862e65a98317058@syzkaller.appspotmail.com
Reported-by: syzbot+9e762b52dd17e616a7a5@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobdi: Fix another oops in wb_workfn()
Jan Kara [Mon, 18 Jun 2018 13:46:58 +0000 (15:46 +0200)]
bdi: Fix another oops in wb_workfn()

commit 3ee7e8697d5860b173132606d80a9cd35e7113ee upstream.

syzbot is reporting NULL pointer dereference at wb_workfn() [1] due to
wb->bdi->dev being NULL. And Dmitry confirmed that wb->state was
WB_shutting_down after wb->bdi->dev became NULL. This indicates that
unregister_bdi() failed to call wb_shutdown() on one of wb objects.

The problem is in cgwb_bdi_unregister() which does cgwb_kill() and thus
drops bdi's reference to wb structures before going through the list of
wbs again and calling wb_shutdown() on each of them. This way the loop
iterating through all wbs can easily miss a wb if that wb has already
passed through cgwb_remove_from_bdi_list() called from wb_shutdown()
from cgwb_release_workfn() and as a result fully shutdown bdi although
wb_workfn() for this wb structure is still running. In fact there are
also other ways cgwb_bdi_unregister() can race with
cgwb_release_workfn() leading e.g. to use-after-free issues:

CPU1                            CPU2
                                cgwb_bdi_unregister()
                                  cgwb_kill(*slot);

cgwb_release()
  queue_work(cgwb_release_wq, &wb->release_work);
cgwb_release_workfn()
                                  wb = list_first_entry(&bdi->wb_list, ...)
                                  spin_unlock_irq(&cgwb_lock);
  wb_shutdown(wb);
  ...
  kfree_rcu(wb, rcu);
                                  wb_shutdown(wb); -> oops use-after-free

We solve these issues by synchronizing writeback structure shutdown from
cgwb_bdi_unregister() with cgwb_release_workfn() using a new mutex. That
way we also no longer need synchronization using WB_shutting_down as the
mutex provides it for CONFIG_CGROUP_WRITEBACK case and without
CONFIG_CGROUP_WRITEBACK wb_shutdown() can be called only once from
bdi_unregister().

Reported-by: syzbot <syzbot+4a7438e774b21ddd8eca@syzkaller.appspotmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonetfilter: ipv6: nf_defrag: drop skb dst before queueing
Florian Westphal [Mon, 9 Jul 2018 11:43:38 +0000 (13:43 +0200)]
netfilter: ipv6: nf_defrag: drop skb dst before queueing

commit 84379c9afe011020e797e3f50a662b08a6355dcf upstream.

Eric Dumazet reports:
 Here is a reproducer of an annoying bug detected by syzkaller on our production kernel
 [..]
 ./b78305423 enable_conntrack
 Then :
 sleep 60
 dmesg | tail -10
 [  171.599093] unregister_netdevice: waiting for lo to become free. Usage count = 2
 [  181.631024] unregister_netdevice: waiting for lo to become free. Usage count = 2
 [  191.687076] unregister_netdevice: waiting for lo to become free. Usage count = 2
 [  201.703037] unregister_netdevice: waiting for lo to become free. Usage count = 2
 [  211.711072] unregister_netdevice: waiting for lo to become free. Usage count = 2
 [  221.959070] unregister_netdevice: waiting for lo to become free. Usage count = 2

Reproducer sends ipv6 fragment that hits nfct defrag via LOCAL_OUT hook.
skb gets queued until frag timer expiry -- 1 minute.

Normally nf_conntrack_reasm gets called during prerouting, so skb has
no dst yet which might explain why this wasn't spotted earlier.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonsh: set mac len based on inner packet
Willem de Bruijn [Wed, 11 Jul 2018 16:00:44 +0000 (12:00 -0400)]
nsh: set mac len based on inner packet

commit bab2c80e5a6c855657482eac9e97f5f3eedb509a upstream.

When pulling the NSH header in nsh_gso_segment, set the mac length
based on the encapsulated packet type.

skb_reset_mac_len computes an offset to the network header, which
here still points to the outer packet:

  >     skb_reset_network_header(skb);
  >     [...]
  >     __skb_pull(skb, nsh_len);
  >     skb_reset_mac_header(skb);    // now mac hdr starts nsh_len == 8B after net hdr
  >     skb_reset_mac_len(skb);       // mac len = net hdr - mac hdr == (u16) -8 == 65528
  >     [..]
  >     skb_mac_gso_segment(skb, ..)

Link: http://lkml.kernel.org/r/CAF=yD-KeAcTSOn4AxirAxL8m7QAS8GBBe1w09eziYwvPbbUeYA@mail.gmail.com
Reported-by: syzbot+7b9ed9872dab8c32305d@syzkaller.appspotmail.com
Fixes: c411ed854584 ("nsh: add GSO support")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoautofs: fix slab out of bounds read in getname_kernel()
Tomas Bortoli [Fri, 13 Jul 2018 23:58:59 +0000 (16:58 -0700)]
autofs: fix slab out of bounds read in getname_kernel()

commit 02f51d45937f7bc7f4dee21e9f85b2d5eac37104 upstream.

The autofs subsystem does not check that the "path" parameter is present
for all cases where it is required when it is passed in via the "param"
struct.

In particular it isn't checked for the AUTOFS_DEV_IOCTL_OPENMOUNT_CMD
ioctl command.

To solve it, modify validate_dev_ioctl(function to check that a path has
been provided for ioctl commands that require it.

Link: http://lkml.kernel.org/r/153060031527.26631.18306637892746301555.stgit@pluto.themaw.net
Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Reported-by: syzbot+60c837b428dc84e83a93@syzkaller.appspotmail.com
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agotls: Stricter error checking in zerocopy sendmsg path
Dave Watson [Thu, 12 Jul 2018 15:03:43 +0000 (08:03 -0700)]
tls: Stricter error checking in zerocopy sendmsg path

commit 32da12216e467dea70a09cd7094c30779ce0f9db upstream.

In the zerocopy sendmsg() path, there are error checks to revert
the zerocopy if we get any error code.  syzkaller has discovered
that tls_push_record can return -ECONNRESET, which is fatal, and
happens after the point at which it is safe to revert the iter,
as we've already passed the memory to do_tcp_sendpages.

Previously this code could return -ENOMEM and we would want to
revert the iter, but AFAIK this no longer returns ENOMEM after
a447da7d004 ("tls: fix waitall behavior in tls_sw_recvmsg"),
so we fail for all error codes.

Reported-by: syzbot+c226690f7b3126c5ee04@syzkaller.appspotmail.com
Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com
Signed-off-by: Dave Watson <davejwatson@fb.com>
Fixes: 3c4d7559159b ("tls: kernel TLS support")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoKEYS: DNS: fix parsing multiple options
Eric Biggers [Wed, 11 Jul 2018 17:46:29 +0000 (10:46 -0700)]
KEYS: DNS: fix parsing multiple options

commit c604cb767049b78b3075497b80ebb8fd530ea2cc upstream.

My recent fix for dns_resolver_preparse() printing very long strings was
incomplete, as shown by syzbot which still managed to hit the
WARN_ONCE() in set_precision() by adding a crafted "dns_resolver" key:

    precision 50001 too large
    WARNING: CPU: 7 PID: 864 at lib/vsprintf.c:2164 vsnprintf+0x48a/0x5a0

The bug this time isn't just a printing bug, but also a logical error
when multiple options ("#"-separated strings) are given in the key
payload.  Specifically, when separating an option string into name and
value, if there is no value then the name is incorrectly considered to
end at the end of the key payload, rather than the end of the current
option.  This bypasses validation of the option length, and also means
that specifying multiple options is broken -- which presumably has gone
unnoticed as there is currently only one valid option anyway.

A similar problem also applied to option values, as the kstrtoul() when
parsing the "dnserror" option will read past the end of the current
option and into the next option.

Fix these bugs by correctly computing the length of the option name and
by copying the option value, null-terminated, into a temporary buffer.

Reproducer for the WARN_ONCE() that syzbot hit:

    perl -e 'print "#A#", "\0" x 50000' | keyctl padd dns_resolver desc @s

Reproducer for "dnserror" option being parsed incorrectly (expected
behavior is to fail when seeing the unknown option "foo", actual
behavior was to read the dnserror value as "1#foo" and fail there):

    perl -e 'print "#dnserror=1#foo\0"' | keyctl padd dns_resolver desc @s

Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: 4a2d789267e0 ("DNS: If the DNS server returns an error, allow that to be cached [ver #2]")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoreiserfs: fix buffer overflow with long warning messages
Eric Biggers [Fri, 13 Jul 2018 23:59:27 +0000 (16:59 -0700)]
reiserfs: fix buffer overflow with long warning messages

commit fe10e398e860955bac4d28ec031b701d358465e4 upstream.

ReiserFS prepares log messages into a 1024-byte buffer with no bounds
checks.  Long messages, such as the "unknown mount option" warning when
userspace passes a crafted mount options string, overflow this buffer.
This causes KASAN to report a global-out-of-bounds write.

Fix it by truncating messages to the buffer size.

Link: http://lkml.kernel.org/r/20180707203621.30922-1-ebiggers3@gmail.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+b890b3335a4d8c608963@syzkaller.appspotmail.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonetfilter: ebtables: reject non-bridge targets
Florian Westphal [Wed, 6 Jun 2018 10:14:56 +0000 (12:14 +0200)]
netfilter: ebtables: reject non-bridge targets

commit 11ff7288beb2b7da889a014aff0a7b80bf8efcf3 upstream.

the ebtables evaluation loop expects targets to return
positive values (jumps), or negative values (absolute verdicts).

This is completely different from what xtables does.
In xtables, targets are expected to return the standard netfilter
verdicts, i.e. NF_DROP, NF_ACCEPT, etc.

ebtables will consider these as jumps.

Therefore reject any target found due to unspec fallback.
v2: also reject watchers.  ebtables ignores their return value, so
a target that assumes skb ownership (and returns NF_STOLEN) causes
use-after-free.

The only watchers in the 'ebtables' front-end are log and nflog;
both have AF_BRIDGE specific wrappers on kernel side.

Reported-by: syzbot+2b43f681169a2a0d306a@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoPCI: hv: Disable/enable IRQs rather than BH in hv_compose_msi_msg()
Dexuan Cui [Mon, 9 Jul 2018 18:16:07 +0000 (13:16 -0500)]
PCI: hv: Disable/enable IRQs rather than BH in hv_compose_msi_msg()

commit 35a88a18d7ea58600e11590405bc93b08e16e7f5 upstream.

Commit de0aa7b2f97d ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()")
uses local_bh_disable()/enable(), because hv_pci_onchannelcallback() can
also run in tasklet context as the channel event callback, so bottom halves
should be disabled to prevent a race condition.

With CONFIG_PROVE_LOCKING=y in the recent mainline, or old kernels that
don't have commit f71b74bca637 ("irq/softirqs: Use lockdep to assert IRQs
are disabled/enabled"), when the upper layer IRQ code calls
hv_compose_msi_msg() with local IRQs disabled, we'll see a warning at the
beginning of __local_bh_enable_ip():

  IRQs not enabled as expected
    WARNING: CPU: 0 PID: 408 at kernel/softirq.c:162 __local_bh_enable_ip

The warning exposes an issue in de0aa7b2f97d: local_bh_enable() can
potentially call do_softirq(), which is not supposed to run when local IRQs
are disabled. Let's fix this by using local_irq_save()/restore() instead.

Note: hv_pci_onchannelcallback() is not a hot path because it's only called
when the PCI device is hot added and removed, which is infrequent.

Fixes: de0aa7b2f97d ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Cc: stable@vger.kernel.org
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agocrypto: af_alg - Initialize sg_num_bytes in error code path
Stephan Mueller [Sat, 7 Jul 2018 18:41:47 +0000 (20:41 +0200)]
crypto: af_alg - Initialize sg_num_bytes in error code path

commit 2546da99212f22034aecf279da9c47cbfac6c981 upstream.

The RX SGL in processing is already registered with the RX SGL tracking
list to support proper cleanup. The cleanup code path uses the
sg_num_bytes variable which must therefore be always initialized, even
in the error code path.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Reported-by: syzbot+9c251bdd09f83b92ba95@syzkaller.appspotmail.com
#syz test: https://github.com/google/kmsan.git master
CC: <stable@vger.kernel.org> #4.14
Fixes: e870456d8e7c ("crypto: algif_skcipher - overhaul memory management")
Fixes: d887c52d6ae4 ("crypto: algif_aead - overhaul memory management")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet: lan78xx: Fix race in tx pending skb size calculation
Stefan Wahren [Sun, 15 Jul 2018 19:53:20 +0000 (21:53 +0200)]
net: lan78xx: Fix race in tx pending skb size calculation

commit dea39aca1d7aef1e2b95b07edeacf04cc8863a2e upstream.

The skb size calculation in lan78xx_tx_bh is in race with the start_xmit,
which could lead to rare kernel oopses. So protect the whole skb walk with
a spin lock. As a benefit we can unlink the skb directly.

This patch was tested on Raspberry Pi 3B+

Link: https://github.com/raspberrypi/linux/issues/2608
Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Floris Bos <bos@je-eigen-domein.nl>
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agortlwifi: rtl8821ae: fix firmware is not ready to run
Ping-Ke Shih [Thu, 28 Jun 2018 02:02:27 +0000 (10:02 +0800)]
rtlwifi: rtl8821ae: fix firmware is not ready to run

commit 9a98302de19991d51e067b88750585203b2a3ab6 upstream.

Without this patch, firmware will not run properly on rtl8821ae, and it
causes bad user experience. For example, bad connection performance with
low rate, higher power consumption, and so on.

rtl8821ae uses two kinds of firmwares for normal and WoWlan cases, and
each firmware has firmware data buffer and size individually. Original
code always overwrite size of normal firmware rtlpriv->rtlhal.fwsize, and
this mismatch causes firmware checksum error, then firmware can't start.

In this situation, driver gives message "Firmware is not ready to run!".

Fixes: fe89707f0afa ("rtlwifi: rtl8821ae: Simplify loading of WOWLAN firmware")
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Cc: Stable <stable@vger.kernel.org> # 4.0+
Reviewed-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agortlwifi: Fix kernel Oops "Fw download fail!!"
Ping-Ke Shih [Fri, 22 Jun 2018 05:31:57 +0000 (13:31 +0800)]
rtlwifi: Fix kernel Oops "Fw download fail!!"

commit 12dfa2f68ab659636e092db13b5d17cf9aac82af upstream.

When connecting to AP, mac80211 asks driver to enter and leave PS quickly,
but driver deinit doesn't wait for delayed work complete when entering PS,
then driver reinit procedure and delay work are running simultaneously.
This will cause unpredictable kernel oops or crash like

rtl8723be: error H2C cmd because of Fw download fail!!!
WARNING: CPU: 3 PID: 159 at drivers/net/wireless/realtek/rtlwifi/
 rtl8723be/fw.c:227 rtl8723be_fill_h2c_cmd+0x182/0x510 [rtl8723be]
CPU: 3 PID: 159 Comm: kworker/3:2 Tainted: G       O     4.16.13-2-ARCH #1
Hardware name: ASUSTeK COMPUTER INC. X556UF/X556UF, BIOS X556UF.406
       10/21/2016
Workqueue: rtl8723be_pci rtl_c2hcmd_wq_callback [rtlwifi]
RIP: 0010:rtl8723be_fill_h2c_cmd+0x182/0x510 [rtl8723be]
RSP: 0018:ffffa6ab01e1bd70 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffffa26069071520 RCX: 0000000000000001
RDX: 0000000080000001 RSI: ffffffff8be70e9c RDI: 00000000ffffffff
RBP: 0000000000000000 R08: 0000000000000048 R09: 0000000000000348
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: ffffa26069071520 R14: 0000000000000000 R15: ffffa2607d205f70
FS:  0000000000000000(0000) GS:ffffa26081d80000(0000) knlGS:000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000443b39d3000 CR3: 000000037700a005 CR4: 00000000003606e0
Call Trace:
 ? halbtc_send_bt_mp_operation.constprop.17+0xd5/0xe0 [btcoexist]
 ? ex_btc8723b1ant_bt_info_notify+0x3b8/0x820 [btcoexist]
 ? rtl_c2hcmd_launcher+0xab/0x110 [rtlwifi]
 ? process_one_work+0x1d1/0x3b0
 ? worker_thread+0x2b/0x3d0
 ? process_one_work+0x3b0/0x3b0
 ? kthread+0x112/0x130
 ? kthread_create_on_node+0x60/0x60
 ? ret_from_fork+0x35/0x40
Code: 00 76 b4 e9 e2 fe ff ff 4c 89 ee 4c 89 e7 e8 56 22 86 ca e9 5e ...

This patch ensures all delayed works done before entering PS to satisfy
our expectation, so use cancel_delayed_work_sync() instead. An exception
is delayed work ips_nic_off_wq because running task may be itself, so add
a parameter ips_wq to deinit function to handle this case.

This issue is reported and fixed in below threads:
https://github.com/lwfinger/rtlwifi_new/issues/367
https://github.com/lwfinger/rtlwifi_new/issues/366

Tested-by: Evgeny Kapun <abacabadabacaba@gmail.com> # 8723DE
Tested-by: Shivam Kakkar <shivam543@gmail.com> # 8723BE on 4.18-rc1
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Fixes: cceb0a597320 ("rtlwifi: Add work queue for c2h cmd.")
Cc: Stable <stable@vger.kernel.org> # 4.11+
Reviewed-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet: cxgb3_main: fix potential Spectre v1
Gustavo A. R. Silva [Tue, 17 Jul 2018 01:59:58 +0000 (20:59 -0500)]
net: cxgb3_main: fix potential Spectre v1

commit 676bcfece19f83621e905aa55b5ed2d45cc4f2d3 upstream.

t.qset_idx can be indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.

This issue was detected with the help of Smatch:

drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c:2286 cxgb_extension_ioctl()
warn: potential spectre issue 'adapter->msix_info'

Fix this by sanitizing t.qset_idx before using it to index
adapter->msix_info

Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].

[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2

Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agox86/kvm/Kconfig: Ensure CRYPTO_DEV_CCP_DD state at minimum matches KVM_AMD
Janakarajan Natarajan [Wed, 27 Jun 2018 16:30:53 +0000 (11:30 -0500)]
x86/kvm/Kconfig: Ensure CRYPTO_DEV_CCP_DD state at minimum matches KVM_AMD

commit d30f370d3a4998c13ed3e5c8ef607d05be0a987a upstream.

Prevent a config where KVM_AMD=y and CRYPTO_DEV_CCP_DD=m thereby ensuring
that AMD Secure Processor device driver will be built-in when KVM_AMD is
also built-in.

v1->v2:
* Removed usage of 'imply' Kconfig option.
* Change patch commit message.

Fixes: 505c9e94d832 ("KVM: x86: prefer "depends on" to "select" for SEV")
Cc: <stable@vger.kernel.org> # 4.16.x
Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agovirtio_net: split XDP_TX kick and XDP_REDIRECT map flushing
Jesper Dangaard Brouer [Tue, 26 Jun 2018 15:39:58 +0000 (17:39 +0200)]
virtio_net: split XDP_TX kick and XDP_REDIRECT map flushing

[ Upstream commit 2471c75efed32529698c26da499954f0253cb401 ]

The driver was combining XDP_TX virtqueue_kick and XDP_REDIRECT
map flushing (xdp_do_flush_map).  This is suboptimal, these two
flush operations should be kept separate.

The suboptimal behavior was introduced in commit 9267c430c6b6
("virtio-net: add missing virtqueue kick when flushing packets").

Fixes: 9267c430c6b6 ("virtio-net: add missing virtqueue kick when flushing packets")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agosfc: correctly initialise filter rwsem for farch
Bert Kenward [Fri, 29 Jun 2018 15:29:28 +0000 (16:29 +0100)]
sfc: correctly initialise filter rwsem for farch

[ Upstream commit cafb39600e7a73263122a0e2db052d691686378f ]

Fixes: fc7a6c287ff3 ("sfc: use a semaphore to lock farch filters too")
Suggested-by: Joseph Korty <joe.korty@concurrent-rt.com>
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agos390/qeth: fix race when setting MAC address
Julian Wiedmann [Fri, 29 Jun 2018 17:45:51 +0000 (19:45 +0200)]
s390/qeth: fix race when setting MAC address

[ Upstream commit 4789a21880488048105590049fc41a99f53d565d ]

When qeth_l2_set_mac_address() finds the card in a non-reachable state,
it merely copies the new MAC address into dev->dev_addr so that
__qeth_l2_set_online() can later register it with the HW.

But __qeth_l2_set_online() may very well be running concurrently, so we
can't trust the card state without appropriate locking:
If the online sequence is past the point where it registers
dev->dev_addr (but not yet in SOFTSETUP state), any address change needs
to be properly programmed into the HW. Otherwise the netdevice ends up
with a different MAC address than what's set in the HW, and inbound
traffic is not forwarded as expected.

This is most likely to occur for OSD in LPAR, where
commit 21b1702af12e ("s390/qeth: improve fallback to random MAC address")
now triggers eg. systemd to immediately change the MAC when the netdevice
is registered with a NET_ADDR_RANDOM address.

Fixes: bcacfcbc82b4 ("s390/qeth: fix MAC address update sequence")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agos390/qeth: avoid using is_multicast_ether_addr_64bits on (u8 *)[6]
Vasily Gorbik [Fri, 29 Jun 2018 17:45:52 +0000 (19:45 +0200)]
s390/qeth: avoid using is_multicast_ether_addr_64bits on (u8 *)[6]

[ Upstream commit 9d0a58fb9747afd27d490c02a97889a1b59f6be4 ]

*ether_addr*_64bits functions have been introduced to optimize
performance critical paths, which access 6-byte ethernet address as u64
value to get "nice" assembly. A harmless hack works nicely on ethernet
addresses shoved into a structure or a larger buffer, until busted by
Kasan on smth like plain (u8 *)[6].

qeth_l2_set_mac_address calls qeth_l2_remove_mac passing
u8 old_addr[ETH_ALEN] as an argument.

Adding/removing macs for an ethernet adapter is not that performance
critical. Moreover is_multicast_ether_addr_64bits itself on s390 is not
faster than is_multicast_ether_addr:

is_multicast_ether_addr(%r2) -> %r2
llc %r2,0(%r2)
risbg %r2,%r2,63,191,0

is_multicast_ether_addr_64bits(%r2) -> %r2
llgc %r2,0(%r2)
risbg %r2,%r2,63,191,0

So, let's just use is_multicast_ether_addr instead of
is_multicast_ether_addr_64bits.

Fixes: bcacfcbc82b4 ("s390/qeth: fix MAC address update sequence")
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoRevert "s390/qeth: use Read device to query hypervisor for MAC"
Julian Wiedmann [Fri, 29 Jun 2018 17:45:50 +0000 (19:45 +0200)]
Revert "s390/qeth: use Read device to query hypervisor for MAC"

[ Upstream commit 4664610537d398d55be19432f9cd9c29c831e159 ]

This reverts commit b7493e91c11a757cf0f8ab26989642ee4bb2c642.

On its own, querying RDEV for a MAC address works fine. But when upgrading
from a qeth that previously queried DDEV on a z/VM NIC (ie. any kernel with
commit ec61bd2fd2a2), the RDEV query now returns a _different_ MAC address
than the DDEV query.

If the NIC is configured with MACPROTECT, z/VM apparently requires us to
use the MAC that was initially returned (on DDEV) and registered. So after
upgrading to a kernel that uses RDEV, the SETVMAC registration cmd for the
new MAC address fails and we end up with a non-operabel interface.

To avoid regressions on upgrade, switch back to using DDEV for the MAC
address query. The downgrade path (first RDEV, later DDEV) is fine, in this
case both queries return the same MAC address.

Fixes: b7493e91c11a ("s390/qeth: use Read device to query hypervisor for MAC")
Reported-by: Michal Kubecek <mkubecek@suse.com>
Tested-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoIB/mlx5: Avoid dealing with vport representors if not being e-switch manager
Or Gerlitz [Thu, 31 May 2018 08:40:17 +0000 (11:40 +0300)]
IB/mlx5: Avoid dealing with vport representors if not being e-switch manager

[ Upstream commit aff2252a2ad3844ca47bf2f18af071101baace40 ]

In smartnic env, the host (PF) driver might not be an e-switch
manager, hence the switchdev mode representors are running on
the embedded cpu (EC) and not at the host.

As such, we should avoid dealing with vport representors if
not being esw manager.

Fixes: b5ca15ad7e61 ('IB/mlx5: Add proper representors support')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoi40e: split XDP_TX tail and XDP_REDIRECT map flushing
Jesper Dangaard Brouer [Tue, 26 Jun 2018 15:39:53 +0000 (17:39 +0200)]
i40e: split XDP_TX tail and XDP_REDIRECT map flushing

[ Upstream commit 2e6893123830d04edc366e0ce59d46e622e140c1 ]

The driver was combining the XDP_TX tail flush and XDP_REDIRECT
map flushing (xdp_do_flush_map).  This is suboptimal, these two
flush operations should be kept separate.

It looks like the mistake was copy-pasted from ixgbe.

Fixes: d9314c474d4f ("i40e: add support for XDP_REDIRECT")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoenic: do not overwrite error code
Govindarajulu Varadarajan [Mon, 18 Jun 2018 17:01:05 +0000 (10:01 -0700)]
enic: do not overwrite error code

[ Upstream commit 56f772279a762984f6e9ebbf24a7c829faba5712 ]

In failure path, we overwrite err to what vnic_rq_disable() returns. In
case it returns 0, enic_open() returns success in case of error.

Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Fixes: e8588e268509 ("enic: enable rq before updating rq descriptors")
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoxen-netfront: Update features after registering netdev
Ross Lagerwall [Thu, 21 Jun 2018 13:00:21 +0000 (14:00 +0100)]
xen-netfront: Update features after registering netdev

[ Upstream commit 45c8184c1bed1ca8a7f02918552063a00b909bf5 ]

Update the features after calling register_netdev() otherwise the
device features are not set up correctly and it not possible to change
the MTU of the device. After this change, the features reported by
ethtool match the device's features before the commit which introduced
the issue and it is possible to change the device's MTU.

Fixes: f599c64fdf7d ("xen-netfront: Fix race between device setup and open")
Reported-by: Liam Shepherd <liam@dancer.es>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoxen-netfront: Fix mismatched rtnl_unlock
Ross Lagerwall [Thu, 21 Jun 2018 13:00:20 +0000 (14:00 +0100)]
xen-netfront: Fix mismatched rtnl_unlock

[ Upstream commit cb257783c2927b73614b20f915a91ff78aa6f3e8 ]

Fixes: f599c64fdf7d ("xen-netfront: Fix race between device setup and open")
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonfp: reject binding to shared blocks
John Hurley [Tue, 26 Jun 2018 03:36:28 +0000 (20:36 -0700)]
nfp: reject binding to shared blocks

[ Upstream commit 951a8ee6def39e25d0e60b9394e5a249ba8b2390 ]

TC shared blocks allow multiple qdiscs to be grouped together and filters
shared between them. Currently the chains of filters attached to a block
are only flushed when the block is removed. If a qdisc is removed from a
block but the block still exists, flow del messages are not passed to the
callback registered for that qdisc. For the NFP, this presents the
possibility of rules still existing in hw when they should be removed.

Prevent binding to shared blocks until the kernel can send per qdisc del
messages when block unbinds occur.

tcf_block_shared() was not used outside of the core until now, so also
add an empty implementation for builds with CONFIG_NET_CLS=n.

Fixes: 4861738775d7 ("net: sched: introduce shared filter blocks infrastructure")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet: use dev_change_tx_queue_len() for SIOCSIFTXQLEN
Cong Wang [Fri, 29 Jun 2018 20:42:48 +0000 (13:42 -0700)]
net: use dev_change_tx_queue_len() for SIOCSIFTXQLEN

[ Upstream commit 3f76df198288ceec92fc9eddecad1e73c52769b0 ]

As noticed by Eric, we need to switch to the helper
dev_change_tx_queue_len() for SIOCSIFTXQLEN call path too,
otheriwse still miss dev_qdisc_change_tx_queue_len().

Fixes: 6a643ddb5624 ("net: introduce helper dev_change_tx_queue_len()")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet: macb: initialize bp->queues[0].bp for at91rm9200
Alexandre Belloni [Tue, 26 Jun 2018 08:44:01 +0000 (10:44 +0200)]
net: macb: initialize bp->queues[0].bp for at91rm9200

[ Upstream commit fec9d3b1dc4c481f20f5d2f5aef3ad1cb7504186 ]

The macb driver currently crashes on at91rm9200 with the following trace:

Unable to handle kernel NULL pointer dereference at virtual address 00000014
[...]
[<c031da44>] (macb_rx_desc) from [<c031f2bc>] (at91ether_open+0x2e8/0x3f8)
[<c031f2bc>] (at91ether_open) from [<c041e8d8>] (__dev_open+0x120/0x13c)
[<c041e8d8>] (__dev_open) from [<c041ec08>] (__dev_change_flags+0x17c/0x1a8)
[<c041ec08>] (__dev_change_flags) from [<c041ec4c>] (dev_change_flags+0x18/0x4c)
[<c041ec4c>] (dev_change_flags) from [<c07a5f4c>] (ip_auto_config+0x220/0x10b0)
[<c07a5f4c>] (ip_auto_config) from [<c000a4fc>] (do_one_initcall+0x78/0x18c)
[<c000a4fc>] (do_one_initcall) from [<c0783e50>] (kernel_init_freeable+0x184/0x1c4)
[<c0783e50>] (kernel_init_freeable) from [<c0574d70>] (kernel_init+0x8/0xe8)
[<c0574d70>] (kernel_init) from [<c00090e0>] (ret_from_fork+0x14/0x34)

Solve that by initializing bp->queues[0].bp in at91ether_init (as is done
in macb_init).

Fixes: ae1f2a56d273 ("net: macb: Added support for many RX queues")
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonfp: flower: fix mpls ether type detection
Pieter Jansen van Vuuren [Tue, 26 Jun 2018 03:36:27 +0000 (20:36 -0700)]
nfp: flower: fix mpls ether type detection

[ Upstream commit a64119415ff248efa61301783bc26551df5dabf6 ]

Previously it was not possible to distinguish between mpls ether types and
other ether types. This leads to incorrect classification of offloaded
filters that match on mpls ether type. For example the following two
filters overlap:

 # tc filter add dev eth0 parent ffff: \
    protocol 0x8847 flower \
    action mirred egress redirect dev eth1

 # tc filter add dev eth0 parent ffff: \
    protocol 0x0800 flower \
    action mirred egress redirect dev eth2

The driver now correctly includes the mac_mpls layer where HW stores mpls
fields, when it detects an mpls ether type. It also sets the MPLS_Q bit to
indicate that the filter should match mpls packets.

Fixes: bb055c198d9b ("nfp: add mpls match offloading support")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agohinic: reset irq affinity before freeing irq
Wei Yongjun [Wed, 27 Jun 2018 09:47:37 +0000 (17:47 +0800)]
hinic: reset irq affinity before freeing irq

[ Upstream commit 82be2ab159a3a0ae4024b946a31f12b221f6c8ff ]

Following warning is seen when rmmod hinic. This is because affinity
value is not reset before calling free_irq(). This patch fixes it.

[   55.181232] WARNING: CPU: 38 PID: 19589 at kernel/irq/manage.c:1608
__free_irq+0x2aa/0x2c0

Fixes: 352f58b0d9f2 ("net-next/hinic: Set Rxq irq to specific cpu for NUMA")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoVSOCK: fix loopback on big-endian systems
Claudio Imbrenda [Wed, 20 Jun 2018 13:51:51 +0000 (15:51 +0200)]
VSOCK: fix loopback on big-endian systems

[ Upstream commit e5ab564c9ebee77794842ca7d7476147b83d6a27 ]

The dst_cid and src_cid are 64 bits, therefore 64 bit accessors should be
used, and in fact in virtio_transport_common.c only 64 bit accessors are
used. Using 32 bit accessors for 64 bit values breaks big endian systems.

This patch fixes a wrong use of le32_to_cpu in virtio_transport_send_pkt.

Fixes: b9116823189e85ccf384 ("VSOCK: add loopback to virtio_transport")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agovhost_net: validate sock before trying to put its fd
Jason Wang [Thu, 21 Jun 2018 05:11:31 +0000 (13:11 +0800)]
vhost_net: validate sock before trying to put its fd

[ Upstream commit b8f1f65882f07913157c44673af7ec0b308d03eb ]

Sock will be NULL if we pass -1 to vhost_net_set_backend(), but when
we meet errors during ubuf allocation, the code does not check for
NULL before calling sockfd_put(), this will lead NULL
dereferencing. Fixing by checking sock pointer before.

Fixes: bab632d69ee4 ("vhost: vhost TX zero-copy support")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agotcp: prevent bogus FRTO undos with non-SACK flows
Ilpo Järvinen [Fri, 29 Jun 2018 10:07:53 +0000 (13:07 +0300)]
tcp: prevent bogus FRTO undos with non-SACK flows

[ Upstream commit 1236f22fbae15df3736ab4a984c64c0c6ee6254c ]

If SACK is not enabled and the first cumulative ACK after the RTO
retransmission covers more than the retransmitted skb, a spurious
FRTO undo will trigger (assuming FRTO is enabled for that RTO).
The reason is that any non-retransmitted segment acknowledged will
set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is
no indication that it would have been delivered for real (the
scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK
case so the check for that bit won't help like it does with SACK).
Having FLAG_ORIG_SACK_ACKED set results in the spurious FRTO undo
in tcp_process_loss.

We need to use more strict condition for non-SACK case and check
that none of the cumulatively ACKed segments were retransmitted
to prove that progress is due to original transmissions. Only then
keep FLAG_ORIG_SACK_ACKED set, allowing FRTO undo to proceed in
non-SACK case.

(FLAG_ORIG_SACK_ACKED is planned to be renamed to FLAG_ORIG_PROGRESS
to better indicate its purpose but to keep this change minimal, it
will be done in another patch).

Besides burstiness and congestion control violations, this problem
can result in RTO loop: When the loss recovery is prematurely
undoed, only new data will be transmitted (if available) and
the next retransmission can occur only after a new RTO which in case
of multiple losses (that are not for consecutive packets) requires
one RTO per loss to recover.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Tested-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agotcp: fix Fast Open key endianness
Yuchung Cheng [Wed, 27 Jun 2018 23:04:48 +0000 (16:04 -0700)]
tcp: fix Fast Open key endianness

[ Upstream commit c860e997e9170a6d68f9d1e6e2cf61f572191aaf ]

Fast Open key could be stored in different endian based on the CPU.
Previously hosts in different endianness in a server farm using
the same key config (sysctl value) would produce different cookies.
This patch fixes it by always storing it as little endian to keep
same API for LE hosts.

Reported-by: Daniele Iamartino <danielei@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agostrparser: Remove early eaten to fix full tcp receive buffer stall
Doron Roberts-Kedes [Wed, 27 Jun 2018 01:33:33 +0000 (18:33 -0700)]
strparser: Remove early eaten to fix full tcp receive buffer stall

[ Upstream commit 977c7114ebda2e746a114840d3a875e0cdb826fb ]

On receving an incomplete message, the existing code stores the
remaining length of the cloned skb in the early_eaten field instead of
incrementing the value returned by __strp_recv. This defers invocation
of sock_rfree for the current skb until the next invocation of
__strp_recv, which returns early_eaten if early_eaten is non-zero.

This behavior causes a stall when the current message occupies the very
tail end of a massive skb, and strp_peek/need_bytes indicates that the
remainder of the current message has yet to arrive on the socket. The
TCP receive buffer is totally full, causing the TCP window to go to
zero, so the remainder of the message will never arrive.

Incrementing the value returned by __strp_recv by the amount otherwise
stored in early_eaten prevents stalls of this nature.

Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agostmmac: fix DMA channel hang in half-duplex mode
Bhadram Varka [Sun, 17 Jun 2018 14:32:05 +0000 (20:02 +0530)]
stmmac: fix DMA channel hang in half-duplex mode

[ Upstream commit b6cfffa7ad923c73f317ea50fd4ebcb3b4b6669c ]

HW does not support Half-duplex mode in multi-queue
scenario. Fix it by not advertising the Half-Duplex
mode if multi-queue enabled.

Signed-off-by: Bhadram Varka <vbhadram@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agos390/qeth: don't clobber buffer on async TX completion
Julian Wiedmann [Fri, 29 Jun 2018 17:45:53 +0000 (19:45 +0200)]
s390/qeth: don't clobber buffer on async TX completion

[ Upstream commit ce28867fd20c23cd769e78b4d619c4755bf71a1c ]

If qeth_qdio_output_handler() detects that a transmit requires async
completion, it replaces the pending buffer's metadata object
(qeth_qdio_out_buffer) so that this queue buffer can be re-used while
the data is pending completion.

Later when the CQ indicates async completion of such a metadata object,
qeth_qdio_cq_handler() tries to free any data associated with this
object (since HW has now completed the transfer). By calling
qeth_clear_output_buffer(), it erronously operates on the queue buffer
that _previously_ belonged to this transfer ... but which has been
potentially re-used several times by now.
This results in double-free's of the buffer's data, and failing
transmits as the buffer descriptor is scrubbed in mid-air.

The correct way of handling this situation is to
1. scrub the queue buffer when it is prepared for re-use, and
2. later obtain the data addresses from the async-completion notifier
   (ie. the AOB), instead of the queue buffer.

All this only affects qeth devices used for af_iucv HiperTransport.

Fixes: 0da9581ddb0f ("qeth: exploit asynchronous delivery of storage blocks")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agor8152: napi hangup fix after disconnect
Jiri Slaby [Mon, 25 Jun 2018 07:26:27 +0000 (09:26 +0200)]
r8152: napi hangup fix after disconnect

[ Upstream commit 0ee1f4734967af8321ecebaf9c74221ace34f2d5 ]

When unplugging an r8152 adapter while the interface is UP, the NIC
becomes unusable.  usb->disconnect (aka rtl8152_disconnect) deletes
napi. Then, rtl8152_disconnect calls unregister_netdev and that invokes
netdev->ndo_stop (aka rtl8152_close). rtl8152_close tries to
napi_disable, but the napi is already deleted by disconnect above. So
the first while loop in napi_disable never finishes. This results in
complete deadlock of the network layer as there is rtnl_mutex held by
unregister_netdev.

So avoid the call to napi_disable in rtl8152_close when the device is
already gone.

The other calls to usb_kill_urb, cancel_delayed_work_sync,
netif_stop_queue etc. seem to be fine. The urb and netdev is not
destroyed yet.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: linux-usb@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoqmi_wwan: add support for the Dell Wireless 5821e module
Aleksander Morgado [Sat, 23 Jun 2018 21:22:52 +0000 (23:22 +0200)]
qmi_wwan: add support for the Dell Wireless 5821e module

[ Upstream commit e7e197edd09c25774b4f12cab19f9d5462f240f4 ]

This module exposes two USB configurations: a QMI+AT capable setup on
USB config #1 and a MBIM capable setup on USB config #2.

By default the kernel will choose the MBIM capable configuration as
long as the cdc_mbim driver is available. This patch adds support for
the QMI port in the secondary configuration.

Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoqed: Limit msix vectors in kdump kernel to the minimum required count.
Sudarsana Reddy Kalluru [Mon, 2 Jul 2018 03:03:05 +0000 (20:03 -0700)]
qed: Limit msix vectors in kdump kernel to the minimum required count.

[ Upstream commit bb7858ba1102f82470a917e041fd23e6385c31be ]

Memory size is limited in the kdump kernel environment. Allocation of more
msix-vectors (or queues) consumes few tens of MBs of memory, which might
lead to the kdump kernel failure.
This patch adds changes to limit the number of MSI-X vectors in kdump
kernel to minimum required value (i.e., 2 per engine).

Fixes: fe56b9e6a ("qed: Add module with basic common support")
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoqed: Fix use of incorrect size in memcpy call.
Sudarsana Reddy Kalluru [Mon, 2 Jul 2018 03:03:07 +0000 (20:03 -0700)]
qed: Fix use of incorrect size in memcpy call.

[ Upstream commit cc9b27cdf7bd3c86df73439758ac1564bc8f5bbe ]

Use the correct size value while copying chassis/port id values.

Fixes: 6ad8c632e ("qed: Add support for query/config dcbx.")
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoqed: Fix setting of incorrect eswitch mode.
Sudarsana Reddy Kalluru [Mon, 2 Jul 2018 03:03:06 +0000 (20:03 -0700)]
qed: Fix setting of incorrect eswitch mode.

[ Upstream commit 538f8d00ba8bb417c4d9e76c61dee59d812d8287 ]

By default, driver sets the eswitch mode incorrectly as VEB (virtual
Ethernet bridging).
Need to set VEB eswitch mode only when sriov is enabled, and it should be
to set NONE by default. The patch incorporates this change.

Fixes: 0fefbfbaa ("qed*: Management firmware - notifications and defaults")
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoqede: Adverstise software timestamp caps when PHC is not available.
Sudarsana Reddy Kalluru [Mon, 2 Jul 2018 03:03:08 +0000 (20:03 -0700)]
qede: Adverstise software timestamp caps when PHC is not available.

[ Upstream commit 82a4e71b1565dea8387f54503e806cf374e779ec ]

When ptp clock is not available for a PF (e.g., higher PFs in NPAR mode),
get-tsinfo() callback should return the software timestamp capabilities
instead of returning the error.

Fixes: 4c55215c ("qede: Add driver support for PTP")
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet/tcp: Fix socket lookups with SO_BINDTODEVICE
David Ahern [Mon, 18 Jun 2018 19:30:37 +0000 (12:30 -0700)]
net/tcp: Fix socket lookups with SO_BINDTODEVICE

[ Upstream commit 8c43bd1706885ba1acfa88da02bc60a2ec16f68c ]

Similar to 69678bcd4d2d ("udp: fix SO_BINDTODEVICE"), TCP socket lookups
need to fail if dev_match is not true. Currently, a packet to a given port
can match a socket bound to device when it should not. In the VRF case,
this causes the lookup to hit a VRF socket and not a global socket
resulting in a response trying to go through the VRF when it should not.

Fixes: 3fa6f616a7a4d ("net: ipv4: add second dif to inet socket lookups")
Fixes: 4297a0ef08572 ("net: ipv6: add second dif to inet6 socket lookups")
Reported-by: Lou Berger <lberger@labn.net>
Diagnosed-by: Renato Westphal <renato@opensourcerouting.org>
Tested-by: Renato Westphal <renato@opensourcerouting.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet: sungem: fix rx checksum support
Eric Dumazet [Wed, 20 Jun 2018 02:18:50 +0000 (19:18 -0700)]
net: sungem: fix rx checksum support

[ Upstream commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f ]

After commit 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE
are friends"), sungem owners reported the infamous "eth0: hw csum failure"
message.

CHECKSUM_COMPLETE has in fact never worked for this driver, but this
was masked by the fact that upper stacks had to strip the FCS, and
therefore skb->ip_summed was set back to CHECKSUM_NONE before
my recent change.

Driver configures a number of bytes to skip when the chip computes
the checksum, and for some reason only half of the Ethernet header
was skipped.

Then a second problem is that we should strip the FCS by default,
unless the driver is updated to eventually support NETIF_F_RXFCS in
the future.

Finally, a driver should check if NETIF_F_RXCSUM feature is enabled
or not, so that the admin can turn off rx checksum if wanted.

Many thanks to Andreas Schwab and Mathieu Malaterre for their
help in debugging this issue.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Meelis Roos <mroos@linux.ee>
Reported-by: Mathieu Malaterre <malat@debian.org>
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Tested-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet_sched: blackhole: tell upper qdisc about dropped packets
Konstantin Khlebnikov [Fri, 15 Jun 2018 10:27:31 +0000 (13:27 +0300)]
net_sched: blackhole: tell upper qdisc about dropped packets

[ Upstream commit 7e85dc8cb35abf16455f1511f0670b57c1a84608 ]

When blackhole is used on top of classful qdisc like hfsc it breaks
qlen and backlog counters because packets are disappear without notice.

In HFSC non-zero qlen while all classes are inactive triggers warning:
WARNING: ... at net/sched/sch_hfsc.c:1393 hfsc_dequeue+0xba4/0xe90 [sch_hfsc]
and schedules watchdog work endlessly.

This patch return __NET_XMIT_BYPASS in addition to NET_XMIT_SUCCESS,
this flag tells upper layer: this packet is gone and isn't queued.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet/sched: act_ife: preserve the action control in case of error
Davide Caratti [Tue, 19 Jun 2018 13:45:50 +0000 (15:45 +0200)]
net/sched: act_ife: preserve the action control in case of error

[ Upstream commit cbf56c29624fa056a0c1c3d177e67aa51a7fd8d6 ]

in the following script

 # tc actions add action ife encode allow prio pass index 42
 # tc actions replace action ife encode allow tcindex drop index 42

the action control should remain equal to 'pass', if the kernel failed
to replace the TC action. Pospone the assignment of the action control,
to ensure it is not overwritten in the error path of tcf_ife_init().

Fixes: ef6980b6becb ("introduce IFE action")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet/sched: act_ife: fix recursive lock and idr leak
Davide Caratti [Tue, 19 Jun 2018 13:39:46 +0000 (15:39 +0200)]
net/sched: act_ife: fix recursive lock and idr leak

[ Upstream commit 0a889b9404c084c6fd145020c939a8f688b3e058 ]

a recursive lock warning [1] can be observed with the following script,

 # $TC actions add action ife encode allow prio pass index 42
 IFE type 0xED3E
 # $TC actions replace action ife encode allow tcindex pass index 42

in case the kernel was unable to run the last command (e.g. because of
the impossibility to load 'act_meta_skbtcindex'). For a similar reason,
the kernel can leak idr in the error path of tcf_ife_init(), because
tcf_idr_release() is not called after successful idr reservation:

 # $TC actions add action ife encode allow tcindex index 47
 IFE type 0xED3E
 RTNETLINK answers: No such file or directory
 We have an error talking to the kernel
 # $TC actions add action ife encode allow tcindex index 47
 IFE type 0xED3E
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 # $TC actions add action ife encode use mark 7 type 0xfefe pass index 47
 IFE type 0xFEFE
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel

Since tcfa_lock is already taken when the action is being edited, a call
to tcf_idr_release() wrongly makes tcf_idr_cleanup() take the same lock
again. On the other hand, tcf_idr_release() needs to be called in the
error path of tcf_ife_init(), to undo the last tcf_idr_create() invocation.
Fix both problems in tcf_ife_init().
Since the cleanup() routine can now be called when ife->params is NULL,
also add a NULL pointer check to avoid calling kfree_rcu(NULL, rcu).

 [1]
 ============================================
 WARNING: possible recursive locking detected
 4.17.0-rc4.kasan+ #417 Tainted: G            E
 --------------------------------------------
 tc/3932 is trying to acquire lock:
 000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_cleanup+0x19/0x80 [act_ife]

 but task is already holding lock:
 000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_init+0xf6d/0x13c0 [act_ife]

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&(&p->tcfa_lock)->rlock);
   lock(&(&p->tcfa_lock)->rlock);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 2 locks held by tc/3932:
  #0: 000000007ca8e990 (rtnl_mutex){+.+.}, at: tcf_ife_init+0xf61/0x13c0 [act_ife]
  #1: 000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_init+0xf6d/0x13c0 [act_ife]

 stack backtrace:
 CPU: 3 PID: 3932 Comm: tc Tainted: G            E     4.17.0-rc4.kasan+ #417
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 Call Trace:
  dump_stack+0x9a/0xeb
  __lock_acquire+0xf43/0x34a0
  ? debug_check_no_locks_freed+0x2b0/0x2b0
  ? debug_check_no_locks_freed+0x2b0/0x2b0
  ? debug_check_no_locks_freed+0x2b0/0x2b0
  ? __mutex_lock+0x62f/0x1240
  ? kvm_sched_clock_read+0x1a/0x30
  ? sched_clock+0x5/0x10
  ? sched_clock_cpu+0x18/0x170
  ? find_held_lock+0x39/0x1d0
  ? lock_acquire+0x10b/0x330
  lock_acquire+0x10b/0x330
  ? tcf_ife_cleanup+0x19/0x80 [act_ife]
  _raw_spin_lock_bh+0x38/0x70
  ? tcf_ife_cleanup+0x19/0x80 [act_ife]
  tcf_ife_cleanup+0x19/0x80 [act_ife]
  __tcf_idr_release+0xff/0x350
  tcf_ife_init+0xdde/0x13c0 [act_ife]
  ? ife_exit_net+0x290/0x290 [act_ife]
  ? __lock_is_held+0xb4/0x140
  tcf_action_init_1+0x67b/0xad0
  ? tcf_action_dump_old+0xa0/0xa0
  ? sched_clock+0x5/0x10
  ? sched_clock_cpu+0x18/0x170
  ? kvm_sched_clock_read+0x1a/0x30
  ? sched_clock+0x5/0x10
  ? sched_clock_cpu+0x18/0x170
  ? memset+0x1f/0x40
  tcf_action_init+0x30f/0x590
  ? tcf_action_init_1+0xad0/0xad0
  ? memset+0x1f/0x40
  tc_ctl_action+0x48e/0x5e0
  ? mutex_lock_io_nested+0x1160/0x1160
  ? tca_action_gd+0x990/0x990
  ? sched_clock+0x5/0x10
  ? find_held_lock+0x39/0x1d0
  rtnetlink_rcv_msg+0x4da/0x990
  ? validate_linkmsg+0x680/0x680
  ? sched_clock_cpu+0x18/0x170
  ? find_held_lock+0x39/0x1d0
  netlink_rcv_skb+0x127/0x350
  ? validate_linkmsg+0x680/0x680
  ? netlink_ack+0x970/0x970
  ? __kmalloc_node_track_caller+0x304/0x3a0
  netlink_unicast+0x40f/0x5d0
  ? netlink_attachskb+0x580/0x580
  ? _copy_from_iter_full+0x187/0x760
  ? import_iovec+0x90/0x390
  netlink_sendmsg+0x67f/0xb50
  ? netlink_unicast+0x5d0/0x5d0
  ? copy_msghdr_from_user+0x206/0x340
  ? netlink_unicast+0x5d0/0x5d0
  sock_sendmsg+0xb3/0xf0
  ___sys_sendmsg+0x60a/0x8b0
  ? copy_msghdr_from_user+0x340/0x340
  ? lock_downgrade+0x5e0/0x5e0
  ? tty_write_lock+0x18/0x50
  ? kvm_sched_clock_read+0x1a/0x30
  ? sched_clock+0x5/0x10
  ? sched_clock_cpu+0x18/0x170
  ? find_held_lock+0x39/0x1d0
  ? lock_downgrade+0x5e0/0x5e0
  ? lock_acquire+0x10b/0x330
  ? __audit_syscall_entry+0x316/0x690
  ? current_kernel_time64+0x6b/0xd0
  ? __fget_light+0x55/0x1f0
  ? __sys_sendmsg+0xd2/0x170
  __sys_sendmsg+0xd2/0x170
  ? __ia32_sys_shutdown+0x70/0x70
  ? syscall_trace_enter+0x57a/0xd60
  ? rcu_read_lock_sched_held+0xdc/0x110
  ? __bpf_trace_sys_enter+0x10/0x10
  ? do_syscall_64+0x22/0x480
  do_syscall_64+0xa5/0x480
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
 RIP: 0033:0x7fd646988ba0
 RSP: 002b:00007fffc9fab3c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 00007fffc9fab4f0 RCX: 00007fd646988ba0
 RDX: 0000000000000000 RSI: 00007fffc9fab440 RDI: 0000000000000003
 RBP: 000000005b28c8b3 R08: 0000000000000002 R09: 0000000000000000
 R10: 00007fffc9faae20 R11: 0000000000000246 R12: 0000000000000000
 R13: 00007fffc9fab504 R14: 0000000000000001 R15: 000000000066c100

Fixes: 4e8c86155010 ("net sched: net sched: ife action fix late binding")
Fixes: ef6980b6becb ("introduce IFE action")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet/packet: fix use-after-free
Eric Dumazet [Thu, 21 Jun 2018 21:16:02 +0000 (14:16 -0700)]
net/packet: fix use-after-free

[ Upstream commit 945d015ee0c3095d2290e845565a23dedfd8027c ]

We should put copy_skb in receive_queue only after
a successful call to virtio_net_hdr_from_skb().

syzbot report :

BUG: KASAN: use-after-free in __skb_unlink include/linux/skbuff.h:1843 [inline]
BUG: KASAN: use-after-free in __skb_dequeue include/linux/skbuff.h:1863 [inline]
BUG: KASAN: use-after-free in skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815
Read of size 8 at addr ffff8801b044ecc0 by task syz-executor217/4553

CPU: 0 PID: 4553 Comm: syz-executor217 Not tainted 4.18.0-rc1+ #111
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
 print_address_description+0x6c/0x20b mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
 __skb_unlink include/linux/skbuff.h:1843 [inline]
 __skb_dequeue include/linux/skbuff.h:1863 [inline]
 skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815
 skb_queue_purge+0x26/0x40 net/core/skbuff.c:2852
 packet_set_ring+0x675/0x1da0 net/packet/af_packet.c:4331
 packet_release+0x630/0xd90 net/packet/af_packet.c:2991
 __sock_release+0xd7/0x260 net/socket.c:603
 sock_close+0x19/0x20 net/socket.c:1186
 __fput+0x35b/0x8b0 fs/file_table.c:209
 ____fput+0x15/0x20 fs/file_table.c:243
 task_work_run+0x1ec/0x2a0 kernel/task_work.c:113
 exit_task_work include/linux/task_work.h:22 [inline]
 do_exit+0x1b08/0x2750 kernel/exit.c:865
 do_group_exit+0x177/0x440 kernel/exit.c:968
 __do_sys_exit_group kernel/exit.c:979 [inline]
 __se_sys_exit_group kernel/exit.c:977 [inline]
 __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:977
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4448e9
Code: Bad RIP value.
RSP: 002b:00007ffd5f777ca8 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004448e9
RDX: 00000000004448e9 RSI: 000000000000fcfb RDI: 0000000000000001
RBP: 00000000006cf018 R08: 00007ffd0000a45b R09: 0000000000000000
R10: 00007ffd5f777e48 R11: 0000000000000202 R12: 00000000004021f0
R13: 0000000000402280 R14: 0000000000000000 R15: 0000000000000000

Allocated by task 4553:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
 kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
 skb_clone+0x1f5/0x500 net/core/skbuff.c:1282
 tpacket_rcv+0x28f7/0x3200 net/packet/af_packet.c:2221
 deliver_skb net/core/dev.c:1925 [inline]
 deliver_ptype_list_skb net/core/dev.c:1940 [inline]
 __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611
 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693
 netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767
 netif_receive_skb+0xbf/0x420 net/core/dev.c:4791
 tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571
 tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981
 tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009
 call_write_iter include/linux/fs.h:1795 [inline]
 new_sync_write fs/read_write.c:474 [inline]
 __vfs_write+0x6c6/0x9f0 fs/read_write.c:487
 vfs_write+0x1f8/0x560 fs/read_write.c:549
 ksys_write+0x101/0x260 fs/read_write.c:598
 __do_sys_write fs/read_write.c:610 [inline]
 __se_sys_write fs/read_write.c:607 [inline]
 __x64_sys_write+0x73/0xb0 fs/read_write.c:607
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 4553:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
 __cache_free mm/slab.c:3498 [inline]
 kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
 kfree_skbmem+0x154/0x230 net/core/skbuff.c:582
 __kfree_skb net/core/skbuff.c:642 [inline]
 kfree_skb+0x1a5/0x580 net/core/skbuff.c:659
 tpacket_rcv+0x189e/0x3200 net/packet/af_packet.c:2385
 deliver_skb net/core/dev.c:1925 [inline]
 deliver_ptype_list_skb net/core/dev.c:1940 [inline]
 __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611
 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693
 netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767
 netif_receive_skb+0xbf/0x420 net/core/dev.c:4791
 tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571
 tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981
 tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009
 call_write_iter include/linux/fs.h:1795 [inline]
 new_sync_write fs/read_write.c:474 [inline]
 __vfs_write+0x6c6/0x9f0 fs/read_write.c:487
 vfs_write+0x1f8/0x560 fs/read_write.c:549
 ksys_write+0x101/0x260 fs/read_write.c:598
 __do_sys_write fs/read_write.c:610 [inline]
 __se_sys_write fs/read_write.c:607 [inline]
 __x64_sys_write+0x73/0xb0 fs/read_write.c:607
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8801b044ecc0
 which belongs to the cache skbuff_head_cache of size 232
The buggy address is located 0 bytes inside of
 232-byte region [ffff8801b044ecc0ffff8801b044eda8)
The buggy address belongs to the page:
page:ffffea0006c11380 count:1 mapcount:0 mapping:ffff8801d9be96c0 index:0x0
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffffea0006c17988 ffff8801d9bec248 ffff8801d9be96c0
raw: 0000000000000000 ffff8801b044e040 000000010000000c 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8801b044eb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff8801b044ec00: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
>ffff8801b044ec80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
                                           ^
 ffff8801b044ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8801b044ed80: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc

Fixes: 58d19b19cd99 ("packet: vnet_hdr support for tpacket_rcv")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet: mvneta: fix the Rx desc DMA address in the Rx path
Antoine Tenart [Fri, 22 Jun 2018 08:15:39 +0000 (10:15 +0200)]
net: mvneta: fix the Rx desc DMA address in the Rx path

[ Upstream commit 271f7ff5aa5a73488b7a9d8b84b5205fb5b2f7cc ]

When using s/w buffer management, buffers are allocated and DMA mapped.
When doing so on an arm64 platform, an offset correction is applied on
the DMA address, before storing it in an Rx descriptor. The issue is
this DMA address is then used later in the Rx path without removing the
offset correction. Thus the DMA address is wrong, which can led to
various issues.

This patch fixes this by removing the offset correction from the DMA
address retrieved from the Rx descriptor before using it in the Rx path.

Fixes: 8d5047cf9ca2 ("net: mvneta: Convert to be 64 bits compatible")
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet/mlx5: Fix wrong size allocation for QoS ETC TC regitster
Shay Agroskin [Tue, 22 May 2018 11:14:02 +0000 (14:14 +0300)]
net/mlx5: Fix wrong size allocation for QoS ETC TC regitster

[ Upstream commit d14fcb8d877caf1b8d6bd65d444bf62b21f2070c ]

The driver allocates wrong size (due to wrong struct name) when issuing
a query/set request to NIC's register.

Fixes: d8880795dabf ("net/mlx5e: Implement DCBNL IEEE max rate")
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet/mlx5: Fix required capability for manipulating MPFS
Eli Cohen [Wed, 13 Jun 2018 07:27:34 +0000 (10:27 +0300)]
net/mlx5: Fix required capability for manipulating MPFS

[ Upstream commit f811980444ec59ad62f9e041adbb576a821132c7 ]

Manipulating of the MPFS requires eswitch manager capabilities.

Fixes: eeb66cdb6826 ('net/mlx5: Separate between E-Switch and MPFS')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet/mlx5: Fix incorrect raw command length parsing
Alex Vesker [Fri, 25 May 2018 17:25:59 +0000 (20:25 +0300)]
net/mlx5: Fix incorrect raw command length parsing

[ Upstream commit 603b7bcff824740500ddfa001d7a7168b0b38542 ]

The NULL character was not set correctly for the string containing
the command length, this caused failures reading the output of the
command due to a random length. The fix is to initialize the output
length string.

Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet/mlx5: Fix command interface race in polling mode
Alex Vesker [Tue, 12 Jun 2018 13:14:31 +0000 (16:14 +0300)]
net/mlx5: Fix command interface race in polling mode

[ Upstream commit d412c31dae053bf30a1bc15582a9990df297a660 ]

The command interface can work in two modes: Events and Polling.
In the general case, each time we invoke a command, a work is
queued to handle it.

When working in events, the interrupt handler completes the
command execution. On the other hand, when working in polling
mode, the work itself completes it.

Due to a bug in the work handler, a command could have been
completed by the interrupt handler, while the work handler
hasn't finished yet, causing the it to complete once again
if the command interface mode was changed from Events to
polling after the interrupt handler was called.

mlx5_unload_one()
        mlx5_stop_eqs()
                // Destroy the EQ before cmd EQ
                ...cmd_work_handler()
                        write_doorbell()
                        --> EVENT_TYPE_CMD
                                mlx5_cmd_comp_handler() // First free
                                        free_ent(cmd, ent->idx)
                                        complete(&ent->done)

        <-- mlx5_stop_eqs //cmd was complete
                // move to polling before destroying the last cmd EQ
                mlx5_cmd_use_polling()
                        cmd->mode = POLL;

                --> cmd_work_handler (continues)
                        if (cmd->mode == POLL)
                                mlx5_cmd_comp_handler() // Double free

The solution is to store the cmd->mode before writing the doorbell.

Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet/mlx5: E-Switch, Avoid setup attempt if not being e-switch manager
Or Gerlitz [Thu, 31 May 2018 08:16:18 +0000 (11:16 +0300)]
net/mlx5: E-Switch, Avoid setup attempt if not being e-switch manager

[ Upstream commit 0efc8562491b7d36f6bbc4fbc8f3348cb6641e9c ]

In smartnic env, the host (PF) driver might not be an e-switch
manager, hence the FW will err on driver attempts to deal with
setting/unsetting the eswitch and as a result the overall setup
of sriov will fail.

Fix that by avoiding the operation if e-switch management is not
allowed for this driver instance. While here, move to use the
correct name for the esw manager capability name.

Fixes: 81848731ff40 ('net/mlx5: E-Switch, Add SR-IOV (FDB) support')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Guy Kushnir <guyk@mellanox.com>
Reviewed-by: Eli Cohen <eli@melloanox.com>
Tested-by: Eli Cohen <eli@melloanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet/mlx5e: Don't attempt to dereference the ppriv struct if not being eswitch manager
Or Gerlitz [Mon, 4 Jun 2018 16:46:53 +0000 (19:46 +0300)]
net/mlx5e: Don't attempt to dereference the ppriv struct if not being eswitch manager

[ Upstream commit 8ffd569aaa818f2624ca821d9a246342fa8b8c50 ]

The check for cpu hit statistics was not returning immediate false for
any non vport rep netdev and hence we crashed (say on mlx5 probed VFs) if
user-space tool was calling into any possible netdev in the system.

Fix that by doing a proper check before dereferencing.

Fixes: 1d447a39142e ('net/mlx5e: Extendable vport representor netdev private data')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Eli Cohen <eli@melloanox.com>
Reviewed-by: Eli Cohen <eli@melloanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet/mlx5e: Avoid dealing with vport representors if not being e-switch manager
Or Gerlitz [Thu, 31 May 2018 08:32:56 +0000 (11:32 +0300)]
net/mlx5e: Avoid dealing with vport representors if not being e-switch manager

[ Upstream commit 733d3e5497070d05971352ca5087bac83c197c3d ]

In smartnic env, the host (PF) driver might not be an e-switch
manager, hence the switchdev mode representors are running on
the embedded cpu (EC) and not at the host.

As such, we should avoid dealing with vport representors if
not being esw manager.

While here, make sure to disallow eswitch switchdev related
setups through devlink if we are not esw managers.

Fixes: cb67b832921c ('net/mlx5e: Introduce SRIOV VF representors')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet: macb: Fix ptp time adjustment for large negative delta
Harini Katakam [Wed, 20 Jun 2018 11:34:20 +0000 (17:04 +0530)]
net: macb: Fix ptp time adjustment for large negative delta

[ Upstream commit 64d7839af8c8f67daaf9bf387135052c55d85f90 ]

When delta passed to gem_ptp_adjtime is negative, the sign is
maintained in the ns_to_timespec64 conversion. Hence timespec_add
should be used directly. timespec_sub will just subtract the negative
value thus increasing the time difference.

Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet: fix use-after-free in GRO with ESP
Sabrina Dubroca [Sat, 30 Jun 2018 15:38:55 +0000 (17:38 +0200)]
net: fix use-after-free in GRO with ESP

[ Upstream commit 603d4cf8fe095b1ee78f423d514427be507fb513 ]

Since the addition of GRO for ESP, gro_receive can consume the skb and
return -EINPROGRESS. In that case, the lower layer GRO handler cannot
touch the skb anymore.

Commit 5f114163f2f5 ("net: Add a skb_gro_flush_final helper.") converted
some of the gro_receive handlers that can lead to ESP's gro_receive so
that they wouldn't access the skb when -EINPROGRESS is returned, but
missed other spots, mainly in tunneling protocols.

This patch finishes the conversion to using skb_gro_flush_final(), and
adds a new helper, skb_gro_flush_final_remcsum(), used in VXLAN and
GUE.

Fixes: 5f114163f2f5 ("net: Add a skb_gro_flush_final helper.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet: dccp: switch rx_tstamp_last_feedback to monotonic clock
Eric Dumazet [Fri, 22 Jun 2018 13:44:15 +0000 (06:44 -0700)]
net: dccp: switch rx_tstamp_last_feedback to monotonic clock

[ Upstream commit 0ce4e70ff00662ad7490e545ba0cd8c1fa179fca ]

To compute delays, better not use time of the day which can
be changed by admins or malicious programs.

Also change ccid3_first_li() to use s64 type for delta variable
to avoid potential overflows.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Cc: dccp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet: dccp: avoid crash in ccid3_hc_rx_send_feedback()
Eric Dumazet [Fri, 22 Jun 2018 13:44:14 +0000 (06:44 -0700)]
net: dccp: avoid crash in ccid3_hc_rx_send_feedback()

[ Upstream commit 74174fe5634ffbf645a7ca5a261571f700b2f332 ]

On fast hosts or malicious bots, we trigger a DCCP_BUG() which
seems excessive.

syzbot reported :

BUG: delta (-6195) <= 0 at net/dccp/ccids/ccid3.c:628/ccid3_hc_rx_send_feedback()
CPU: 1 PID: 18 Comm: ksoftirqd/1 Not tainted 4.18.0-rc1+ #112
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
 ccid3_hc_rx_send_feedback net/dccp/ccids/ccid3.c:628 [inline]
 ccid3_hc_rx_packet_recv.cold.16+0x38/0x71 net/dccp/ccids/ccid3.c:793
 ccid_hc_rx_packet_recv net/dccp/ccid.h:185 [inline]
 dccp_deliver_input_to_ccids+0xf0/0x280 net/dccp/input.c:180
 dccp_rcv_established+0x87/0xb0 net/dccp/input.c:378
 dccp_v4_do_rcv+0x153/0x180 net/dccp/ipv4.c:654
 sk_backlog_rcv include/net/sock.h:914 [inline]
 __sk_receive_skb+0x3ba/0xd80 net/core/sock.c:517
 dccp_v4_rcv+0x10f9/0x1f58 net/dccp/ipv4.c:875
 ip_local_deliver_finish+0x2eb/0xda0 net/ipv4/ip_input.c:215
 NF_HOOK include/linux/netfilter.h:287 [inline]
 ip_local_deliver+0x1e9/0x750 net/ipv4/ip_input.c:256
 dst_input include/net/dst.h:450 [inline]
 ip_rcv_finish+0x823/0x2220 net/ipv4/ip_input.c:396
 NF_HOOK include/linux/netfilter.h:287 [inline]
 ip_rcv+0xa18/0x1284 net/ipv4/ip_input.c:492
 __netif_receive_skb_core+0x2488/0x3680 net/core/dev.c:4628
 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693
 process_backlog+0x219/0x760 net/core/dev.c:5373
 napi_poll net/core/dev.c:5771 [inline]
 net_rx_action+0x7da/0x1980 net/core/dev.c:5837
 __do_softirq+0x2e8/0xb17 kernel/softirq.c:284
 run_ksoftirqd+0x86/0x100 kernel/softirq.c:645
 smpboot_thread_fn+0x417/0x870 kernel/smpboot.c:164
 kthread+0x345/0x410 kernel/kthread.c:240
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Cc: dccp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoixgbe: split XDP_TX tail and XDP_REDIRECT map flushing
Jesper Dangaard Brouer [Tue, 26 Jun 2018 15:39:48 +0000 (17:39 +0200)]
ixgbe: split XDP_TX tail and XDP_REDIRECT map flushing

[ Upstream commit ad088ec480768850db019a5cc543685e868a513d ]

The driver was combining the XDP_TX tail flush and XDP_REDIRECT
map flushing (xdp_do_flush_map).  This is suboptimal, these two
flush operations should be kept separate.

Fixes: 11393cc9b9be ("xdp: Add batching support to redirect map")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoipvlan: fix IFLA_MTU ignored on NEWLINK
Xin Long [Thu, 21 Jun 2018 04:56:04 +0000 (12:56 +0800)]
ipvlan: fix IFLA_MTU ignored on NEWLINK

[ Upstream commit 30877961b1cdd6fdca783c2e8c4f0f47e95dc58c ]

Commit 296d48568042 ("ipvlan: inherit MTU from master device") adjusted
the mtu from the master device when creating a ipvlan device, but it
would also override the mtu value set in rtnl_create_link. It causes
IFLA_MTU param not to take effect.

So this patch is to not adjust the mtu if IFLA_MTU param is set when
creating a ipvlan device.

Fixes: 296d48568042 ("ipvlan: inherit MTU from master device")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoipv6: sr: fix passing wrong flags to crypto_alloc_shash()
Eric Biggers [Sat, 30 Jun 2018 22:26:56 +0000 (15:26 -0700)]
ipv6: sr: fix passing wrong flags to crypto_alloc_shash()

[ Upstream commit fc9c2029e37c3ae9efc28bf47045e0b87e09660c ]

The 'mask' argument to crypto_alloc_shash() uses the CRYPTO_ALG_* flags,
not 'gfp_t'.  So don't pass GFP_KERNEL to it.

Fixes: bf355b8d2c30 ("ipv6: sr: add core files for SR HMAC support")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agohv_netvsc: split sub-channel setup into async and sync
Stephen Hemminger [Fri, 29 Jun 2018 21:07:16 +0000 (14:07 -0700)]
hv_netvsc: split sub-channel setup into async and sync

[ Upstream commit 3ffe64f1a641b80a82d9ef4efa7a05ce69049871 ]

When doing device hotplug the sub channel must be async to avoid
deadlock issues because device is discovered in softirq context.

When doing changes to MTU and number of channels, the setup
must be synchronous to avoid races such as when MTU and device
settings are done in a single ip command.

Reported-by: Thomas Walker <Thomas.Walker@twosigma.com>
Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug")
Fixes: 732e49850c5e ("netvsc: fix race on sub channel creation")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoatm: zatm: Fix potential Spectre v1
Gustavo A. R. Silva [Fri, 29 Jun 2018 18:28:07 +0000 (13:28 -0500)]
atm: zatm: Fix potential Spectre v1

[ Upstream commit ced9e191501e52b95e1b57b8e0db00943869eed0 ]

pool can be indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.

This issue was detected with the help of Smatch:

drivers/atm/zatm.c:1491 zatm_ioctl() warn: potential spectre issue
'zatm_dev->pool_info' (local cap)

Fix this by sanitizing pool before using it to index
zatm_dev->pool_info

Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].

[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoatm: Preserve value of skb->truesize when accounting to vcc
David Woodhouse [Sat, 16 Jun 2018 10:55:44 +0000 (11:55 +0100)]
atm: Preserve value of skb->truesize when accounting to vcc

[ Upstream commit 9bbe60a67be5a1c6f79b3c9be5003481a50529ff ]

ATM accounts for in-flight TX packets in sk_wmem_alloc of the VCC on
which they are to be sent. But it doesn't take ownership of those
packets from the sock (if any) which originally owned them. They should
remain owned by their actual sender until they've left the box.

There's a hack in pskb_expand_head() to avoid adjusting skb->truesize
for certain skbs, precisely to avoid messing up sk_wmem_alloc
accounting. Ideally that hack would cover the ATM use case too, but it
doesn't — skbs which aren't owned by any sock, for example PPP control
frames, still get their truesize adjusted when the low-level ATM driver
adds headroom.

This has always been an issue, it seems. The truesize of a packet
increases, and sk_wmem_alloc on the VCC goes negative. But this wasn't
for normal traffic, only for control frames. So I think we just got away
with it, and we probably needed to send 2GiB of LCP echo frames before
the misaccounting would ever have caused a problem and caused
atm_may_send() to start refusing packets.

Commit 14afee4b609 ("net: convert sock.sk_wmem_alloc from atomic_t to
refcount_t") did exactly what it was intended to do, and turned this
mostly-theoretical problem into a real one, causing PPPoATM to fail
immediately as sk_wmem_alloc underflows and atm_may_send() *immediately*
starts refusing to allow new packets.

The least intrusive solution to this problem is to stash the value of
skb->truesize that was accounted to the VCC, in a new member of the
ATM_SKB(skb) structure. Then in atm_pop_raw() subtract precisely that
value instead of the then-current value of skb->truesize.

Fixes: 158f323b9868 ("net: adjust skb->truesize in pskb_expand_head()")
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Tested-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoalx: take rtnl before calling __alx_open from resume
Sabrina Dubroca [Fri, 29 Jun 2018 15:51:26 +0000 (17:51 +0200)]
alx: take rtnl before calling __alx_open from resume

[ Upstream commit bc800e8b39bad60ccdb83be828da63af71ab87b3 ]

The __alx_open function can be called from ndo_open, which is called
under RTNL, or from alx_resume, which isn't. Since commit d768319cd427,
we're calling the netif_set_real_num_{tx,rx}_queues functions, which
need to be called under RTNL.

This is similar to commit 0c2cc02e571a ("igb: Move the calls to set the
Tx and Rx queues into igb_open").

Fixes: d768319cd427 ("alx: enable multiple tx queues")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agopinctrl: mt7622: fix a kernel panic when gpio-hog is being applied
Sean Wang [Fri, 22 Jun 2018 03:49:08 +0000 (11:49 +0800)]
pinctrl: mt7622: fix a kernel panic when gpio-hog is being applied

commit 5b1c4bf2519efc2328d252fd7697bdfb306f10f3 upstream.

When we are explicitly using GPIO hogging mechanism in the pinctrl node,
such as:

&pio {
line_input {
gpio-hog;
gpios = <95 0>, <96 0>, <97 0>;
input;
};
};

A kernel panic happens at dereferencing a NULL pointer: In this case, the
drvdata is still not setup properly yet when it is being accessed.

A better solution for fixing up this issue should be we should obtain the
private data from struct gpio_chip using a specific gpiochip_get_data
instead of a generic dev_get_drvdata.

[    0.249424] Unable to handle kernel NULL pointer dereference at virtual
       address 000000c8
[    0.257818] Mem abort info:
[    0.260704]   ESR = 0x96000005
[    0.263869]   Exception class = DABT (current EL), IL = 32 bits
[    0.270011]   SET = 0, FnV = 0
[    0.273167]   EA = 0, S1PTW = 0
[    0.276421] Data abort info:
[    0.279398]   ISV = 0, ISS = 0x00000005
[    0.283372]   CM = 0, WnR = 0
[    0.286440] [00000000000000c8] user address but active_mm is swapper
[    0.293027] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[    0.298795] Modules linked in:
[    0.301958] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc1+ #389
[    0.308716] Hardware name: MediaTek MT7622 RFB1 board (DT)
[    0.314396] pstate: 80000005 (Nzcv daif -PAN -UAO)
[    0.319362] pc : mtk_hw_pin_field_get+0x28/0x118
[    0.324140] lr : mtk_hw_set_value+0x30/0x104
[    0.328557] sp : ffffff800801b6d0
[    0.331983] x29: ffffff800801b6d0 x28: ffffff80086b7970
[    0.337484] x27: 0000000000000000 x26: ffffff80087b8000
[    0.342986] x25: 0000000000000000 x24: ffffffc00324c230
[    0.348487] x23: 0000000000000003 x22: 0000000000000000
[    0.353988] x21: ffffff80087b8000 x20: 0000000000000000
[    0.359489] x19: 0000000000000054 x18: 00000000fffff7c0
[    0.364990] x17: 0000000000006300 x16: 000000000000003f
[    0.370492] x15: 000000000000000e x14: ffffffffffffffff
[    0.375993] x13: 0000000000000000 x12: 0000000000000020
[    0.381494] x11: 0000000000000006 x10: 0101010101010101
[    0.386995] x9 : fffffffffffffffa x8 : 0000000000000007
[    0.392496] x7 : ffffff80085d63f8 x6 : 0000000000000003
[    0.397997] x5 : 0000000000000054 x4 : ffffffc0031eb800
[    0.403499] x3 : ffffff800801b728 x2 : 0000000000000003
[    0.409000] x1 : 0000000000000054 x0 : 0000000000000000
[    0.414502] Process swapper/0 (pid: 1, stack limit = 0x000000002a913c1c)
[    0.421441] Call trace:
[    0.423968]  mtk_hw_pin_field_get+0x28/0x118
[    0.428387]  mtk_hw_set_value+0x30/0x104
[    0.432445]  mtk_gpio_set+0x20/0x28
[    0.436052]  mtk_gpio_direction_output+0x18/0x30
[    0.440833]  gpiod_direction_output_raw_commit+0x7c/0xa0
[    0.446333]  gpiod_direction_output+0x104/0x114
[    0.451022]  gpiod_configure_flags+0xbc/0xfc
[    0.455441]  gpiod_hog+0x8c/0x140
[    0.458869]  of_gpiochip_add+0x27c/0x2d4
[    0.462928]  gpiochip_add_data_with_key+0x338/0x5f0
[    0.467976]  mtk_pinctrl_probe+0x388/0x400
[    0.472217]  platform_drv_probe+0x58/0xa4
[    0.476365]  driver_probe_device+0x204/0x44c
[    0.480783]  __device_attach_driver+0xac/0x108
[    0.485384]  bus_for_each_drv+0x7c/0xac
[    0.489352]  __device_attach+0xa0/0x144
[    0.493320]  device_initial_probe+0x10/0x18
[    0.497647]  bus_probe_device+0x2c/0x8c
[    0.501616]  device_add+0x2f8/0x540
[    0.505226]  of_device_add+0x3c/0x44
[    0.508925]  of_platform_device_create_pdata+0x80/0xb8
[    0.514245]  of_platform_bus_create+0x290/0x3e8
[    0.518933]  of_platform_populate+0x78/0x100
[    0.523352]  of_platform_default_populate+0x24/0x2c
[    0.528403]  of_platform_default_populate_init+0x94/0xa4
[    0.533903]  do_one_initcall+0x98/0x130
[    0.537874]  kernel_init_freeable+0x13c/0x1d4
[    0.542385]  kernel_init+0x10/0xf8
[    0.545903]  ret_from_fork+0x10/0x18
[    0.549603] Code: 900020a1 f9400800 911dcc21 1400001f (f9406401)
[    0.555916] ---[ end trace de8c34787fdad3b3 ]---
[    0.560722] Kernel panic - not syncing: Attempted to kill init!
       exitcode=0x0000000b
[    0.560722]
[    0.570188] SMP: stopping secondary CPUs
[    0.574253] ---[ end Kernel panic - not syncing: Attempted to kill
       init! exitcode=0x0000000b
[    0.574253]

Cc: stable@vger.kernel.org
Fixes: d6ed93551320 ("pinctrl: mediatek: add pinctrl driver for MT7622 SoC")
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agopinctrl: mt7622: stop using the deprecated pinctrl_add_gpio_range
Sean Wang [Fri, 22 Jun 2018 03:49:07 +0000 (11:49 +0800)]
pinctrl: mt7622: stop using the deprecated pinctrl_add_gpio_range

commit de227ed7965d06dcfcd06376e03c107004a4881c upstream.

If the pinctrl node has the gpio-ranges property, the range will be added
by the gpio core and doesn't need to be added by the pinctrl driver.

But for keeping backward compatibility, an explicit pinctrl_add_gpio_range
is still needed to be called when there is a missing gpio-ranges in pinctrl
node in old dts files.

Cc: stable@vger.kernel.org
Fixes: d6ed93551320 ("pinctrl: mediatek: add pinctrl driver for MT7622 SoC")
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agopinctrl: mt7622: fix error path on failing at groups building
Sean Wang [Fri, 22 Jun 2018 03:49:04 +0000 (11:49 +0800)]
pinctrl: mt7622: fix error path on failing at groups building

commit fafa35cce34ba4c4f6fd7f1026c038de0a2884af upstream.

It should be to return an error code when failing at groups building.

Cc: stable@vger.kernel.org
Fixes: d6ed93551320 ("pinctrl: mediatek: add pinctrl driver for MT7622 SoC")
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agopinctrl: sh-pfc: r8a77970: remove SH_PFC_PIN_CFG_DRIVE_STRENGTH flag
Niklas Söderlund [Tue, 3 Jul 2018 15:18:42 +0000 (17:18 +0200)]
pinctrl: sh-pfc: r8a77970: remove SH_PFC_PIN_CFG_DRIVE_STRENGTH flag

commit 550b6f7e8cf93fc2753aa01e655ed5471012ab5a upstream.

The datasheet does not document any registers to control drive strength,
and no drive strength registers are for this reason described for this
SoC. The flags indicating that drive strength can be controlled are
however set for some pins in the driver.

This leads to a NULL pointer dereference when the sh-pfc core tries to
access the struct describing the drive strength registers, for example
when reading the sysfs file pinconf-pins.

Fix this by removing the SH_PFC_PIN_CFG_DRIVE_STRENGTH from all pins.

Fixes: b92ac66a1819602b ("pinctrl: sh-pfc: Add R8A77970 PFC support")
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agox86/paravirt: Make native_save_fl() extern inline
Nick Desaulniers [Thu, 21 Jun 2018 16:23:24 +0000 (09:23 -0700)]
x86/paravirt: Make native_save_fl() extern inline

commit d0a8d9378d16eb3c69bd8e6d23779fbdbee3a8c7 upstream.

native_save_fl() is marked static inline, but by using it as
a function pointer in arch/x86/kernel/paravirt.c, it MUST be outlined.

paravirt's use of native_save_fl() also requires that no GPRs other than
%rax are clobbered.

Compilers have different heuristics which they use to emit stack guard
code, the emittance of which can break paravirt's callee saved assumption
by clobbering %rcx.

Marking a function definition extern inline means that if this version
cannot be inlined, then the out-of-line version will be preferred. By
having the out-of-line version be implemented in assembly, it cannot be
instrumented with a stack protector, which might violate custom calling
conventions that code like paravirt rely on.

The semantics of extern inline has changed since gnu89. This means that
folks using GCC versions >= 5.1 may see symbol redefinition errors at
link time for subdirs that override KBUILD_CFLAGS (making the C standard
used implicit) regardless of this patch. This has been cleaned up
earlier in the patch set, but is left as a note in the commit message
for future travelers.

Reports:
 https://lkml.org/lkml/2018/5/7/534
 https://github.com/ClangBuiltLinux/linux/issues/16

Discussion:
 https://bugs.llvm.org/show_bug.cgi?id=37512
 https://lkml.org/lkml/2018/5/24/1371

Thanks to the many folks that participated in the discussion.

Debugged-by: Alistair Strachan <astrachan@google.com>
Debugged-by: Matthias Kaehlcke <mka@chromium.org>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Suggested-by: Tom Stellar <tstellar@redhat.com>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@redhat.com
Cc: akataria@vmware.com
Cc: akpm@linux-foundation.org
Cc: andrea.parri@amarulasolutions.com
Cc: ard.biesheuvel@linaro.org
Cc: aryabinin@virtuozzo.com
Cc: astrachan@google.com
Cc: boris.ostrovsky@oracle.com
Cc: brijesh.singh@amd.com
Cc: caoj.fnst@cn.fujitsu.com
Cc: geert@linux-m68k.org
Cc: ghackmann@google.com
Cc: gregkh@linuxfoundation.org
Cc: jan.kiszka@siemens.com
Cc: jarkko.sakkinen@linux.intel.com
Cc: joe@perches.com
Cc: jpoimboe@redhat.com
Cc: keescook@google.com
Cc: kirill.shutemov@linux.intel.com
Cc: kstewart@linuxfoundation.org
Cc: linux-efi@vger.kernel.org
Cc: linux-kbuild@vger.kernel.org
Cc: manojgupta@google.com
Cc: mawilcox@microsoft.com
Cc: michal.lkml@markovi.net
Cc: mjg59@google.com
Cc: mka@chromium.org
Cc: pombredanne@nexb.com
Cc: rientjes@google.com
Cc: rostedt@goodmis.org
Cc: thomas.lendacky@amd.com
Cc: tweek@google.com
Cc: virtualization@lists.linux-foundation.org
Cc: will.deacon@arm.com
Cc: yamada.masahiro@socionext.com
Link: http://lkml.kernel.org/r/20180621162324.36656-4-ndesaulniers@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>