]> git.itanic.dy.fi Git - linux-stable/log
linux-stable
7 years agoLinux 4.10.5 v4.10.5
Greg Kroah-Hartman [Wed, 22 Mar 2017 12:38:58 +0000 (13:38 +0100)]
Linux 4.10.5

7 years agocrypto: s5p-sss - Fix spinlock recursion on LRW(AES)
Krzysztof Kozlowski [Wed, 8 Mar 2017 21:14:20 +0000 (23:14 +0200)]
crypto: s5p-sss - Fix spinlock recursion on LRW(AES)

commit 28b62b1458685d8f68f67d9b2d511bf8fa32b746 upstream.

Running TCRYPT with LRW compiled causes spinlock recursion:

    testing speed of async lrw(aes) (lrw(ecb-aes-s5p)) encryption
    tcrypt: test 0 (256 bit key, 16 byte blocks): 19007 operations in 1 seconds (304112 bytes)
    tcrypt: test 1 (256 bit key, 64 byte blocks): 15753 operations in 1 seconds (1008192 bytes)
    tcrypt: test 2 (256 bit key, 256 byte blocks): 14293 operations in 1 seconds (3659008 bytes)
    tcrypt: test 3 (256 bit key, 1024 byte blocks): 11906 operations in 1 seconds (12191744 bytes)
    tcrypt: test 4 (256 bit key, 8192 byte blocks):
    BUG: spinlock recursion on CPU#1, irq/84-10830000/89
     lock: 0xeea99a68, .magic: dead4ead, .owner: irq/84-10830000/89, .owner_cpu: 1
    CPU: 1 PID: 89 Comm: irq/84-10830000 Not tainted 4.11.0-rc1-00001-g897ca6d0800d #559
    Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
    [<c010e1ec>] (unwind_backtrace) from [<c010ae1c>] (show_stack+0x10/0x14)
    [<c010ae1c>] (show_stack) from [<c03449c0>] (dump_stack+0x78/0x8c)
    [<c03449c0>] (dump_stack) from [<c015de68>] (do_raw_spin_lock+0x11c/0x120)
    [<c015de68>] (do_raw_spin_lock) from [<c0720110>] (_raw_spin_lock_irqsave+0x20/0x28)
    [<c0720110>] (_raw_spin_lock_irqsave) from [<c0572ca0>] (s5p_aes_crypt+0x2c/0xb4)
    [<c0572ca0>] (s5p_aes_crypt) from [<bf1d8aa4>] (do_encrypt+0x78/0xb0 [lrw])
    [<bf1d8aa4>] (do_encrypt [lrw]) from [<bf1d8b00>] (encrypt_done+0x24/0x54 [lrw])
    [<bf1d8b00>] (encrypt_done [lrw]) from [<c05732a0>] (s5p_aes_complete+0x60/0xcc)
    [<c05732a0>] (s5p_aes_complete) from [<c0573440>] (s5p_aes_interrupt+0x134/0x1a0)
    [<c0573440>] (s5p_aes_interrupt) from [<c01667c4>] (irq_thread_fn+0x1c/0x54)
    [<c01667c4>] (irq_thread_fn) from [<c0166a98>] (irq_thread+0x12c/0x1e0)
    [<c0166a98>] (irq_thread) from [<c0136a28>] (kthread+0x108/0x138)
    [<c0136a28>] (kthread) from [<c0107778>] (ret_from_fork+0x14/0x3c)

Interrupt handling routine was calling req->base.complete() under
spinlock.  In most cases this wasn't fatal but when combined with some
of the cipher modes (like LRW) this caused recursion - starting the new
encryption (s5p_aes_crypt()) while still holding the spinlock from
previous round (s5p_aes_complete()).

Beside that, the s5p_aes_interrupt() error handling path could execute
two completions in case of error for RX and TX blocks.

Rewrite the interrupt handling routine and the completion by:

1. Splitting the operations on scatterlist copies from
   s5p_aes_complete() into separate s5p_sg_done(). This still should be
   done under lock.
   The s5p_aes_complete() now only calls req->base.complete() and it has
   to be called outside of lock.

2. Moving the s5p_aes_complete() out of spinlock critical sections.
   In interrupt service routine s5p_aes_interrupts(), it appeared in few
   places, including error paths inside other functions called from ISR.
   This code was not so obvious to read so simplify it by putting the
   s5p_aes_complete() only within ISR level.

Reported-by: Nathan Royce <nroycea+kernel@gmail.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocrypto: powerpc - Fix initialisation of crc32c context
Daniel Axtens [Fri, 3 Mar 2017 06:56:55 +0000 (17:56 +1100)]
crypto: powerpc - Fix initialisation of crc32c context

commit aa2be9b3d6d2d699e9ca7cbfc00867c80e5da213 upstream.

Turning on crypto self-tests on a POWER8 shows:

    alg: hash: Test 1 failed for crc32c-vpmsum
    00000000: ff ff ff ff

Comparing the code with the Intel CRC32c implementation on which
ours is based shows that we are doing an init with 0, not ~0
as CRC32c requires.

This probably wasn't caught because btrfs does its own weird
open-coded initialisation.

Initialise our internal context to ~0 on init.

This makes the self-tests pass, and btrfs continues to work.

Fixes: 6dd7a82cc54e ("crypto: powerpc - Add POWER8 optimised crc32c")
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agolocking/rwsem: Fix down_write_killable() for CONFIG_RWSEM_GENERIC_SPINLOCK=y
Niklas Cassel [Sat, 25 Feb 2017 00:17:53 +0000 (01:17 +0100)]
locking/rwsem: Fix down_write_killable() for CONFIG_RWSEM_GENERIC_SPINLOCK=y

commit 17fcbd590d0c3e35bd9646e2215f86586378bc42 upstream.

We hang if SIGKILL has been sent, but the task is stuck in down_read()
(after do_exit()), even though no task is doing down_write() on the
rwsem in question:

  INFO: task libupnp:21868 blocked for more than 120 seconds.
  libupnp         D    0 21868      1 0x08100008
  ...
  Call Trace:
  __schedule()
  schedule()
  __down_read()
  do_exit()
  do_group_exit()
  __wake_up_parent()

This bug has already been fixed for CONFIG_RWSEM_XCHGADD_ALGORITHM=y in
the following commit:

 04cafed7fc19 ("locking/rwsem: Fix down_write_killable()")

... however, this bug also exists for CONFIG_RWSEM_GENERIC_SPINLOCK=y.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Niklas Cassel <niklass@axis.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: d47996082f52 ("locking/rwsem: Introduce basis for down_write_killable()")
Link: http://lkml.kernel.org/r/1487981873-12649-1-git-send-email-niklass@axis.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agofutex: Add missing error handling to FUTEX_REQUEUE_PI
Peter Zijlstra [Sat, 4 Mar 2017 09:27:19 +0000 (10:27 +0100)]
futex: Add missing error handling to FUTEX_REQUEUE_PI

commit 9bbb25afeb182502ca4f2c4f3f88af0681b34cae upstream.

Thomas spotted that fixup_pi_state_owner() can return errors and we
fail to unlock the rt_mutex in that case.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170304093558.867401760@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agofutex: Fix potential use-after-free in FUTEX_REQUEUE_PI
Peter Zijlstra [Sat, 4 Mar 2017 09:27:18 +0000 (10:27 +0100)]
futex: Fix potential use-after-free in FUTEX_REQUEUE_PI

commit c236c8e95a3d395b0494e7108f0d41cf36ec107c upstream.

While working on the futex code, I stumbled over this potential
use-after-free scenario. Dmitry triggered it later with syzkaller.

pi_mutex is a pointer into pi_state, which we drop the reference on in
unqueue_me_pi(). So any access to that pointer after that is bad.

Since other sites already do rt_mutex_unlock() with hb->lock held, see
for example futex_lock_pi(), simply move the unlock before
unqueue_me_pi().

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170304093558.801744246@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agox86/perf: Fix CR4.PCE propagation to use active_mm instead of mm
Andy Lutomirski [Thu, 16 Mar 2017 19:59:39 +0000 (12:59 -0700)]
x86/perf: Fix CR4.PCE propagation to use active_mm instead of mm

commit 5dc855d44c2ad960a86f593c60461f1ae1566b6d upstream.

If one thread mmaps a perf event while another thread in the same mm
is in some context where active_mm != mm (which can happen in the
scheduler, for example), refresh_pce() would write the wrong value
to CR4.PCE.  This broke some PAPI tests.

Reported-and-tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 7911d3f7af14 ("perf/x86: Only allow rdpmc if a perf_event is mapped")
Link: http://lkml.kernel.org/r/0c5b38a76ea50e405f9abe07a13dfaef87c173a1.1489694270.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agox86/intel_rdt: Put group node in rdtgroup_kn_unlock
Jiri Olsa [Tue, 14 Mar 2017 14:20:53 +0000 (15:20 +0100)]
x86/intel_rdt: Put group node in rdtgroup_kn_unlock

commit 49ec8f5b6ae3ab60385492cad900ffc8a523c895 upstream.

The rdtgroup_kn_unlock waits for the last user to release and put its
node. But it's calling kernfs_put on the node which calls the
rdtgroup_kn_unlock, which might not be the group's directory node, but
another group's file node.

This race could be easily reproduced by running 2 instances
of following script:

  mount -t resctrl resctrl /sys/fs/resctrl/
  pushd /sys/fs/resctrl/
  mkdir krava
  echo "krava" > krava/schemata
  rmdir krava
  popd
  umount  /sys/fs/resctrl

It triggers the slub debug error message with following command
line config: slub_debug=,kernfs_node_cache.

Call kernfs_put on the group's node to fix it.

Fixes: 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/1489501253-20248-1-git-send-email-jolsa@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agox86/kasan: Fix boot with KASAN=y and PROFILE_ANNOTATED_BRANCHES=y
Andrey Ryabinin [Mon, 13 Mar 2017 16:33:37 +0000 (19:33 +0300)]
x86/kasan: Fix boot with KASAN=y and PROFILE_ANNOTATED_BRANCHES=y

commit be3606ff739d1c1be36389f8737c577ad87e1f57 upstream.

The kernel doesn't boot with both PROFILE_ANNOTATED_BRANCHES=y and KASAN=y
options selected. With branch profiling enabled we end up calling
ftrace_likely_update() before kasan_early_init(). ftrace_likely_update() is
built with KASAN instrumentation, so calling it before kasan has been
initialized leads to crash.

Use DISABLE_BRANCH_PROFILING define to make sure that we don't call
ftrace_likely_update() from early code before kasan_early_init().

Fixes: ef7f0d6a6ca8 ("x86_64: add KASan support")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: kasan-dev@googlegroups.com
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: lkp@01.org
Cc: Dmitry Vyukov <dvyukov@google.com>
Link: http://lkml.kernel.org/r/20170313163337.1704-1-aryabinin@virtuozzo.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agox86/tsc: Fix ART for TSC_KNOWN_FREQ
Peter Zijlstra [Mon, 13 Mar 2017 14:57:12 +0000 (15:57 +0100)]
x86/tsc: Fix ART for TSC_KNOWN_FREQ

commit 44fee88cea43d3c2cac962e0439cb10a3cabff6d upstream.

Subhransu reported that convert_art_to_tsc() isn't working for him.

The ART to TSC relation is only set up for systems which use the refined
TSC calibration. Systems with known TSC frequency (available via CPUID 15)
are not using the refined calibration and therefor the ART to TSC relation
is never established.

Add the setup to the known frequency init path which skips ART
calibration. The init code needs to be duplicated as for systems which use
refined calibration the ART setup must be delayed until calibration has
been done.

The problem has been there since the ART support was introdduced, but only
detected now because Subhransu tested the first time on hardware which has
TSC frequency enumerated via CPUID 15.

Note for stable: The conditional has changed from TSC_RELIABLE to
         TSC_KNOWN_FREQUENCY.

[ tglx: Rewrote changelog and identified the proper 'Fixes' commit ]

Fixes: f9677e0f8308 ("x86/tsc: Always Running Timer (ART) correlated clocksource")
Reported-by: "Prusty, Subhransu S" <subhransu.s.prusty@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: christopher.s.hall@intel.com
Cc: kevin.b.stanton@intel.com
Cc: john.stultz@linaro.org
Cc: akataria@vmware.com
Link: http://lkml.kernel.org/r/20170313145712.GI3312@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agox86/unwind: Fix last frame check for aligned function stacks
Josh Poimboeuf [Tue, 14 Mar 2017 04:27:47 +0000 (23:27 -0500)]
x86/unwind: Fix last frame check for aligned function stacks

commit 87a6b2975f0d340c75b7488d22d61d2f98fb8abf upstream.

Pavel Machek reported the following warning on x86-32:

  WARNING: kernel stack frame pointer at f50cdf98 in swapper/2:0 has bad value   (null)

The warning is caused by the unwinder not realizing that it reached the
end of the stack, due to an unusual prologue which gcc sometimes
generates for aligned stacks.  The prologue is based on a gcc feature
called the Dynamic Realign Argument Pointer (DRAP).  It's almost always
enabled for aligned stacks when -maccumulate-outgoing-args isn't set.

This issue is similar to the one fixed by the following commit:

  8023e0e2a48d ("x86/unwind: Adjust last frame check for aligned function stacks")

... but that fix was specific to x86-64.

Make the fix more generic to cover x86-32 as well, and also ensure that
the return address referred to by the frame pointer is a copy of the
original return address.

Fixes: acb4608ad186 ("x86/unwind: Create stack frames for saved syscall registers")
Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: http://lkml.kernel.org/r/50d4924db716c264b14f1633037385ec80bf89d2.1489465609.git.jpoimboe@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrm/i915/lspcon: Fix resume time initialization due to unasserted HPD
Imre Deak [Fri, 27 Jan 2017 09:39:19 +0000 (11:39 +0200)]
drm/i915/lspcon: Fix resume time initialization due to unasserted HPD

commit 4b84b4a5507913ee0da27b1f1b27671937839de6 upstream.

During system resume time initialization the HPD level on LSPCON ports
can stay low for an extended amount of time, leading to failed AUX
transfers and LSPCON initialization. Fix this by waiting for HPD to get
asserted.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99178
Cc: Shashank Sharma <shashank.sharma@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Shashank Sharma <shashank.sharma@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1485509961-9010-3-git-send-email-imre.deak@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
(corrected stable tag)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrm/i915/gen9+: Enable hotplug detection early
Imre Deak [Fri, 27 Jan 2017 09:39:18 +0000 (11:39 +0200)]
drm/i915/gen9+: Enable hotplug detection early

commit 2a57d9cce1c08578097d965468e37f06d71fa495 upstream.

For LSPCON resume time initialization we need to sample the
corresponding pin's HPD level, but this is only available when HPD
detection is enabled. Currently we enable detection only when enabling
HPD interrupts which is too late, so bring the enabling of detection
earlier.

This is needed by the next patch.

Cc: Shashank Sharma <shashank.sharma@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Shashank Sharma <shashank.sharma@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1485509961-9010-2-git-send-email-imre.deak@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
(rebased onto v4.10.4 due to missing s/IS_BROXTON/IS_GEN9_LP/ upstream change)
(corrected stable tag)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrm/i915/lspcon: Enable AUX interrupts for resume time initialization
Imre Deak [Tue, 29 Nov 2016 19:40:29 +0000 (21:40 +0200)]
drm/i915/lspcon: Enable AUX interrupts for resume time initialization

commit 908764f6d0bd1ba496cb8da33b9b98297ed27351 upstream.

For LSPCON initialization during system resume we need AUX
functionality, but we call the corresponding encoder reset hook with all
interrupts disabled. Without interrupts we'll do a poll-wait for AUX
transfer completions, which adds a significant delay if the transfers
timeout/need to be retried for some reason.

Fix this by enabling interrupts before calling the reset hooks. Note
that while this will enable AUX interrupts it will keep HPD interrupts
disabled, in a similar way to the init time output setup code.

This issue existed since LSPCON support was added.

v2:
- Rebased on drm-tip.

Cc: Shashank Sharma <shashank.sharma@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: David Weinehall <david.weinehall@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1480448429-27739-1-git-send-email-imre.deak@intel.com
(rebased onto v4.10.4 due to missing s/dev/dev_priv/ upstream change)
(corrected stable tag)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoirqchip/gicv3-its: Add workaround for QDF2400 ITS erratum 0065
Shanker Donthineni [Tue, 7 Mar 2017 14:20:38 +0000 (08:20 -0600)]
irqchip/gicv3-its: Add workaround for QDF2400 ITS erratum 0065

commit 90922a2d03d84de36bf8a9979d62580102f31a92 upstream.

On Qualcomm Datacenter Technologies QDF2400 SoCs, the ITS hardware
implementation uses 16Bytes for Interrupt Translation Entry (ITE),
but reports an incorrect value of 8Bytes in GITS_TYPER.ITTE_size.

It might cause kernel memory corruption depending on the number
of MSI(x) that are configured and the amount of memory that has
been allocated for ITEs in its_create_device().

This patch fixes the potential memory corruption by setting the
correct ITE size to 16Bytes.

Cc: stable@vger.kernel.org
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoarm64: KVM: VHE: Clear HCR_TGE when invalidating guest TLBs
Marc Zyngier [Fri, 17 Feb 2017 14:32:18 +0000 (14:32 +0000)]
arm64: KVM: VHE: Clear HCR_TGE when invalidating guest TLBs

commit 68925176296a8b995e503349200e256674bfe5ac upstream.

When invalidating guest TLBs, special care must be taken to
actually shoot the guest TLBs and not the host ones if we're
running on a VHE system.  This is controlled by the HCR_EL2.TGE
bit, which we forget to clear before invalidating TLBs.

Address the issue by introducing two wrappers (__tlb_switch_to_guest
and __tlb_switch_to_host) that take care of both the VTTBR_EL2
and HCR_EL2.TGE switching.

Reported-by: Tomasz Nowicki <tnowicki@caviumnetworks.com>
Tested-by: Tomasz Nowicki <tnowicki@caviumnetworks.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodccp: fix memory leak during tear-down of unsuccessful connection request
Hannes Frederic Sowa [Sun, 12 Mar 2017 23:01:30 +0000 (00:01 +0100)]
dccp: fix memory leak during tear-down of unsuccessful connection request

[ Upstream commit 72ef9c4125c7b257e3a714d62d778ab46583d6a3 ]

This patch fixes a memory leak, which happens if the connection request
is not fulfilled between parsing the DCCP options and handling the SYN
(because e.g. the backlog is full), because we forgot to free the
list of ack vectors.

Reported-by: Jianwen Ji <jiji@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agotun: fix premature POLLOUT notification on tun devices
Hannes Frederic Sowa [Sun, 12 Mar 2017 23:00:26 +0000 (00:00 +0100)]
tun: fix premature POLLOUT notification on tun devices

[ Upstream commit b20e2d54789c6acbf6bd0efdbec2cf5fa4d90ef1 ]

aszlig observed failing ssh tunnels (-w) during initialization since
commit cc9da6cc4f56e0 ("ipv6: addrconf: use stable address generator for
ARPHRD_NONE"). We already had reports that the mentioned commit breaks
Juniper VPN connections. I can't clearly say that the Juniper VPN client
has the same problem, but it is worth a try to hint to this patch.

Because of the early generation of link local addresses, the kernel now
can start asking for routers on the local subnet much earlier than usual.
Those router solicitation packets arrive inside the ssh channels and
should be transmitted to the tun fd before the configuration scripts
might have upped the interface and made it ready for transmission.

ssh polls on the interface and receives back a POLL_OUT. It tries to send
the earily router solicitation packet to the tun interface.  Unfortunately
it hasn't been up'ed yet by config scripts, thus failing with -EIO. ssh
doesn't retry again and considers the tun interface broken forever.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=121131
Fixes: cc9da6cc4f56 ("ipv6: addrconf: use stable address generator for ARPHRD_NONE")
Cc: Bjørn Mork <bjorn@mork.no>
Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Reported-by: Jonas Lippuner <jonas@lippuner.ca>
Cc: Jonas Lippuner <jonas@lippuner.ca>
Reported-by: aszlig <aszlig@redmoonstudios.org>
Cc: aszlig <aszlig@redmoonstudios.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodccp/tcp: fix routing redirect race
Jon Maxwell [Fri, 10 Mar 2017 05:40:33 +0000 (16:40 +1100)]
dccp/tcp: fix routing redirect race

[ Upstream commit 45caeaa5ac0b4b11784ac6f932c0ad4c6b67cda0 ]

As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.

We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:

 #8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 #9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

 224 static bool tcp_in_quickack_mode(struct sock *sk)↩
 225 {↩
 226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);↩
 227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);↩
 228 ↩
 229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
 230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
 231 }↩

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a postitive LockDroppedIcmps value, e.g:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320610d6 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: use net->count to check whether a netns is alive or not
Andrey Vagin [Mon, 13 Mar 2017 04:36:18 +0000 (21:36 -0700)]
net: use net->count to check whether a netns is alive or not

[ Upstream commit 91864f5852f9996210fad400cf70fb85af091243 ]

The previous idea was to check whether a net namespace is in
net_exit_list or not. It doesn't work, because net->exit_list is used in
__register_pernet_operations and __unregister_pernet_operations where
all namespaces are added to a temporary list to make cleanup in a error
case, so list_empty(&net->exit_list) always returns false.

Reported-by: Mantas Mikulėnas <grawity@gmail.com>
Fixes: 002d8a1a6c11 ("net: skip genenerating uevents for network namespaces that are exiting")
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agobridge: drop netfilter fake rtable unconditionally
Florian Westphal [Mon, 13 Mar 2017 16:38:17 +0000 (17:38 +0100)]
bridge: drop netfilter fake rtable unconditionally

[ Upstream commit a13b2082ece95247779b9995c4e91b4246bed023 ]

Andreas reports kernel oops during rmmod of the br_netfilter module.
Hannes debugged the oops down to a NULL rt6info->rt6i_indev.

Problem is that br_netfilter has the nasty concept of adding a fake
rtable to skb->dst; this happens in a br_netfilter prerouting hook.

A second hook (in bridge LOCAL_IN) is supposed to remove these again
before the skb is handed up the stack.

However, on module unload hooks get unregistered which means an
skb could traverse the prerouting hook that attaches the fake_rtable,
while the 'fake rtable remove' hook gets removed from the hooklist
immediately after.

Fixes: 34666d467cbf1e2e3c7 ("netfilter: bridge: move br_netfilter out of the core")
Reported-by: Andreas Karis <akaris@redhat.com>
Debugged-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoipv6: avoid write to a possibly cloned skb
Florian Westphal [Mon, 13 Mar 2017 15:24:28 +0000 (16:24 +0100)]
ipv6: avoid write to a possibly cloned skb

[ Upstream commit 79e49503efe53a8c51d8b695bedc8a346c5e4a87 ]

ip6_fragment, in case skb has a fraglist, checks if the
skb is cloned.  If it is, it will move to the 'slow path' and allocates
new skbs for each fragment.

However, right before entering the slowpath loop, it updates the
nexthdr value of the last ipv6 extension header to NEXTHDR_FRAGMENT,
to account for the fragment header that will be inserted in the new
ipv6-fragment skbs.

In case original skb is cloned this munges nexthdr value of another
skb.  Avoid this by doing the nexthdr update for each of the new fragment
skbs separately.

This was observed with tcpdump on a bridge device where netfilter ipv6
reassembly is active:  tcpdump shows malformed fragment headers as
the l4 header (icmpv6, tcp, etc). is decoded as a fragment header.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reported-by: Andreas Karis <akaris@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoipv6: make ECMP route replacement less greedy
Sabrina Dubroca [Mon, 13 Mar 2017 12:28:09 +0000 (13:28 +0100)]
ipv6: make ECMP route replacement less greedy

[ Upstream commit 67e194007be08d071294456274dd53e0a04fdf90 ]

Commit 27596472473a ("ipv6: fix ECMP route replacement") introduced a
loop that removes all siblings of an ECMP route that is being
replaced. However, this loop doesn't stop when it has replaced
siblings, and keeps removing other routes with a higher metric.
We also end up triggering the WARN_ON after the loop, because after
this nsiblings < 0.

Instead, stop the loop when we have taken care of all routes with the
same metric as the route being replaced.

  Reproducer:
  ===========
    #!/bin/sh

    ip netns add ns1
    ip netns add ns2
    ip -net ns1 link set lo up

    for x in 0 1 2 ; do
        ip link add veth$x netns ns2 type veth peer name eth$x netns ns1
        ip -net ns1 link set eth$x up
        ip -net ns2 link set veth$x up
    done

    ip -net ns1 -6 r a 2000::/64 nexthop via fe80::0 dev eth0 \
            nexthop via fe80::1 dev eth1 nexthop via fe80::2 dev eth2
    ip -net ns1 -6 r a 2000::/64 via fe80::42 dev eth0 metric 256
    ip -net ns1 -6 r a 2000::/64 via fe80::43 dev eth0 metric 2048

    echo "before replace, 3 routes"
    ip -net ns1 -6 r | grep -v '^fe80\|^ff00'
    echo

    ip -net ns1 -6 r c 2000::/64 nexthop via fe80::4 dev eth0 \
            nexthop via fe80::5 dev eth1 nexthop via fe80::6 dev eth2

    echo "after replace, only 2 routes, metric 2048 is gone"
    ip -net ns1 -6 r | grep -v '^fe80\|^ff00'

Fixes: 27596472473a ("ipv6: fix ECMP route replacement")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agompls: Do not decrement alive counter for unregister events
David Ahern [Fri, 10 Mar 2017 22:11:39 +0000 (14:11 -0800)]
mpls: Do not decrement alive counter for unregister events

[ Upstream commit 79099aab38c8f5c746748b066ae74ba984fe2cc8 ]

Multipath routes can be rendered usesless when a device in one of the
paths is deleted. For example:

$ ip -f mpls ro ls
100
nexthop as to 200 via inet 172.16.2.2  dev virt12
nexthop as to 300 via inet 172.16.3.2  dev br0
101
nexthop as to 201 via inet6 2000:2::2  dev virt12
nexthop as to 301 via inet6 2000:3::2  dev br0

$ ip li del br0

When br0 is deleted the other hop is not considered in
mpls_select_multipath because of the alive check -- rt_nhn_alive
is 0.

rt_nhn_alive is decremented once in mpls_ifdown when the device is taken
down (NETDEV_DOWN) and again when it is deleted (NETDEV_UNREGISTER). For
a 2 hop route, deleting one device drops the alive count to 0. Since
devices are taken down before unregistering, the decrement on
NETDEV_UNREGISTER is redundant.

Fixes: c89359a42e2a4 ("mpls: support for dead routes")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agompls: Send route delete notifications when router module is unloaded
David Ahern [Fri, 10 Mar 2017 17:46:15 +0000 (09:46 -0800)]
mpls: Send route delete notifications when router module is unloaded

[ Upstream commit e37791ec1ad785b59022ae211f63a16189bacebf ]

When the mpls_router module is unloaded, mpls routes are deleted but
notifications are not sent to userspace leaving userspace caches
out of sync. Add the call to mpls_notify_route in mpls_net_exit as
routes are freed.

Fixes: 0189197f44160 ("mpls: Basic routing support")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoact_connmark: avoid crashing on malformed nlattrs with null parms
Etienne Noss [Fri, 10 Mar 2017 15:55:32 +0000 (16:55 +0100)]
act_connmark: avoid crashing on malformed nlattrs with null parms

[ Upstream commit 52491c7607c5527138095edf44c53169dc1ddb82 ]

tcf_connmark_init does not check in its configuration if TCA_CONNMARK_PARMS
is set, resulting in a null pointer dereference when trying to access it.

[501099.043007] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[501099.043039] IP: [<ffffffffc10c60fb>] tcf_connmark_init+0x8b/0x180 [act_connmark]
...
[501099.044334] Call Trace:
[501099.044345]  [<ffffffffa47270e8>] ? tcf_action_init_1+0x198/0x1b0
[501099.044363]  [<ffffffffa47271b0>] ? tcf_action_init+0xb0/0x120
[501099.044380]  [<ffffffffa47250a4>] ? tcf_exts_validate+0xc4/0x110
[501099.044398]  [<ffffffffc0f5fa97>] ? u32_set_parms+0xa7/0x270 [cls_u32]
[501099.044417]  [<ffffffffc0f60bf0>] ? u32_change+0x680/0x87b [cls_u32]
[501099.044436]  [<ffffffffa4725d1d>] ? tc_ctl_tfilter+0x4dd/0x8a0
[501099.044454]  [<ffffffffa44a23a1>] ? security_capable+0x41/0x60
[501099.044471]  [<ffffffffa470ca01>] ? rtnetlink_rcv_msg+0xe1/0x220
[501099.044490]  [<ffffffffa470c920>] ? rtnl_newlink+0x870/0x870
[501099.044507]  [<ffffffffa472cc61>] ? netlink_rcv_skb+0xa1/0xc0
[501099.044524]  [<ffffffffa47073f4>] ? rtnetlink_rcv+0x24/0x30
[501099.044541]  [<ffffffffa472c634>] ? netlink_unicast+0x184/0x230
[501099.044558]  [<ffffffffa472c9d8>] ? netlink_sendmsg+0x2f8/0x3b0
[501099.044576]  [<ffffffffa46d8880>] ? sock_sendmsg+0x30/0x40
[501099.044592]  [<ffffffffa46d8e03>] ? SYSC_sendto+0xd3/0x150
[501099.044608]  [<ffffffffa425fda1>] ? __do_page_fault+0x2d1/0x510
[501099.044626]  [<ffffffffa47fbd7b>] ? system_call_fast_compare_end+0xc/0x9b

Fixes: 22a5dc0e5e3e ("net: sched: Introduce connmark action")
Signed-off-by: Étienne Noss <etienne.noss@wifirst.fr>
Signed-off-by: Victorien Molle <victorien.molle@wifirst.fr>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoamd-xgbe: Enable IRQs only if napi_complete_done() is true
Lendacky, Thomas [Thu, 9 Mar 2017 23:48:23 +0000 (17:48 -0600)]
amd-xgbe: Enable IRQs only if napi_complete_done() is true

[ Upstream commit d7aba644ffdebf756e51e26a2229055211838e89 ]

Depending on the hardware, the amd-xgbe driver may use disable_irq_nosync()
and enable_irq() when an interrupt is received to process Rx packets. If
the napi_complete_done() return value isn't checked an unbalanced enable
for the IRQ could result, generating a warning stack trace.

Update the driver to only enable interrupts if napi_complete_done() returns
true.

Reported-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agouapi: fix linux/packet_diag.h userspace compilation error
Dmitry V. Levin [Tue, 7 Mar 2017 20:50:50 +0000 (23:50 +0300)]
uapi: fix linux/packet_diag.h userspace compilation error

[ Upstream commit 745cb7f8a5de0805cade3de3991b7a95317c7c73 ]

Replace MAX_ADDR_LEN with its numeric value to fix the following
linux/packet_diag.h userspace compilation error:

/usr/include/linux/packet_diag.h:67:17: error: 'MAX_ADDR_LEN' undeclared here (not in a function)
  __u8 pdmc_addr[MAX_ADDR_LEN];

This is not the first case in the UAPI where the numeric value
of MAX_ADDR_LEN is used instead of symbolic one, uapi/linux/if_link.h
already does the same:

$ grep MAX_ADDR_LEN include/uapi/linux/if_link.h
__u8 mac[32]; /* MAX_ADDR_LEN */

There are no UAPI headers besides these two that use MAX_ADDR_LEN.

Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet/tunnel: set inner protocol in network gro hooks
Paolo Abeni [Tue, 7 Mar 2017 17:33:31 +0000 (18:33 +0100)]
net/tunnel: set inner protocol in network gro hooks

[ Upstream commit 294acf1c01bace5cea5d30b510504238bf5f7c25 ]

The gso code of several tunnels type (gre and udp tunnels)
takes for granted that the skb->inner_protocol is properly
initialized and drops the packet elsewhere.

On the forwarding path no one is initializing such field,
so gro encapsulated packets are dropped on forward.

Since commit 38720352412a ("gre: Use inner_proto to obtain
inner header protocol"), this can be reproduced when the
encapsulated packets use gre as the tunneling protocol.

The issue happens also with vxlan and geneve tunnels since
commit 8bce6d7d0d1e ("udp: Generalize skb_udp_segment"), if the
forwarding host's ingress nic has h/w offload for such tunnel
and a vxlan/geneve device is configured on top of it, regardless
of the configured peer address and vni.

To address the issue, this change initialize the inner_protocol
field for encapsulated packets in both ipv4 and ipv6 gro complete
callbacks.

Fixes: 38720352412a ("gre: Use inner_proto to obtain inner header protocol")
Fixes: 8bce6d7d0d1e ("udp: Generalize skb_udp_segment")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agovrf: Fix use-after-free in vrf_xmit
David Ahern [Mon, 6 Mar 2017 16:53:04 +0000 (08:53 -0800)]
vrf: Fix use-after-free in vrf_xmit

[ Upstream commit f7887d40e541f74402df0684a1463c0a0bb68c68 ]

KASAN detected a use-after-free:

[  269.467067] BUG: KASAN: use-after-free in vrf_xmit+0x7f1/0x827 [vrf] at addr ffff8800350a21c0
[  269.467067] Read of size 4 by task ssh/1879
[  269.467067] CPU: 1 PID: 1879 Comm: ssh Not tainted 4.10.0+ #249
[  269.467067] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[  269.467067] Call Trace:
[  269.467067]  dump_stack+0x81/0xb6
[  269.467067]  kasan_object_err+0x21/0x78
[  269.467067]  kasan_report+0x2f7/0x450
[  269.467067]  ? vrf_xmit+0x7f1/0x827 [vrf]
[  269.467067]  ? ip_output+0xa4/0xdb
[  269.467067]  __asan_load4+0x6b/0x6d
[  269.467067]  vrf_xmit+0x7f1/0x827 [vrf]
...

Which corresponds to the skb access after xmit handling. Fix by saving
skb->len and using the saved value to update stats.

Fixes: 193125dbd8eb2 ("net: Introduce VRF device driver")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoteam: use ETH_MAX_MTU as max mtu
Jarod Wilson [Mon, 6 Mar 2017 13:48:58 +0000 (08:48 -0500)]
team: use ETH_MAX_MTU as max mtu

[ Upstream commit 3331aa378e9bcbd0d16de9034b0c20f4050e26b4 ]

This restores the ability to set a team device's mtu to anything higher
than 1500. Similar to the reported issue with bonding, the team driver
calls ether_setup(), which sets an initial max_mtu of 1500, while the
underlying hardware can handle something much larger. Just set it to
ETH_MAX_MTU to support all possible values, and the limitations of the
underlying devices will prevent setting anything too large.

Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra")
CC: Cong Wang <xiyou.wangcong@gmail.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodccp: fix use-after-free in dccp_feat_activate_values
Eric Dumazet [Sun, 5 Mar 2017 18:52:16 +0000 (10:52 -0800)]
dccp: fix use-after-free in dccp_feat_activate_values

[ Upstream commit 62f8f4d9066c1c6f2474845d1ca7e2891f2ae3fd ]

Dmitry reported crashes in DCCP stack [1]

Problem here is that when I got rid of listener spinlock, I missed the
fact that DCCP stores a complex state in struct dccp_request_sock,
while TCP does not.

Since multiple cpus could access it at the same time, we need to add
protection.

[1]
BUG: KASAN: use-after-free in dccp_feat_activate_values+0x967/0xab0
net/dccp/feat.c:1541 at addr ffff88003713be68
Read of size 8 by task syz-executor2/8457
CPU: 2 PID: 8457 Comm: syz-executor2 Not tainted 4.10.0-rc7+ #127
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:15 [inline]
 dump_stack+0x292/0x398 lib/dump_stack.c:51
 kasan_object_err+0x1c/0x70 mm/kasan/report.c:162
 print_address_description mm/kasan/report.c:200 [inline]
 kasan_report_error mm/kasan/report.c:289 [inline]
 kasan_report.part.1+0x20e/0x4e0 mm/kasan/report.c:311
 kasan_report mm/kasan/report.c:332 [inline]
 __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332
 dccp_feat_activate_values+0x967/0xab0 net/dccp/feat.c:1541
 dccp_create_openreq_child+0x464/0x610 net/dccp/minisocks.c:121
 dccp_v6_request_recv_sock+0x1f6/0x1960 net/dccp/ipv6.c:457
 dccp_check_req+0x335/0x5a0 net/dccp/minisocks.c:186
 dccp_v6_rcv+0x69e/0x1d00 net/dccp/ipv6.c:711
 ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
 NF_HOOK include/linux/netfilter.h:257 [inline]
 ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322
 dst_input include/net/dst.h:507 [inline]
 ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69
 NF_HOOK include/linux/netfilter.h:257 [inline]
 ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203
 __netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190
 __netif_receive_skb+0x2a/0x170 net/core/dev.c:4228
 process_backlog+0xe5/0x6c0 net/core/dev.c:4839
 napi_poll net/core/dev.c:5202 [inline]
 net_rx_action+0xe70/0x1900 net/core/dev.c:5267
 __do_softirq+0x2fb/0xb7d kernel/softirq.c:284
 do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902
 </IRQ>
 do_softirq.part.17+0x1e8/0x230 kernel/softirq.c:328
 do_softirq kernel/softirq.c:176 [inline]
 __local_bh_enable_ip+0x1f2/0x200 kernel/softirq.c:181
 local_bh_enable include/linux/bottom_half.h:31 [inline]
 rcu_read_unlock_bh include/linux/rcupdate.h:971 [inline]
 ip6_finish_output2+0xbb0/0x23d0 net/ipv6/ip6_output.c:123
 ip6_finish_output+0x302/0x960 net/ipv6/ip6_output.c:148
 NF_HOOK_COND include/linux/netfilter.h:246 [inline]
 ip6_output+0x1cb/0x8d0 net/ipv6/ip6_output.c:162
 ip6_xmit+0xcdf/0x20d0 include/net/dst.h:501
 inet6_csk_xmit+0x320/0x5f0 net/ipv6/inet6_connection_sock.c:179
 dccp_transmit_skb+0xb09/0x1120 net/dccp/output.c:141
 dccp_xmit_packet+0x215/0x760 net/dccp/output.c:280
 dccp_write_xmit+0x168/0x1d0 net/dccp/output.c:362
 dccp_sendmsg+0x79c/0xb10 net/dccp/proto.c:796
 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
 sock_sendmsg_nosec net/socket.c:635 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:645
 SYSC_sendto+0x660/0x810 net/socket.c:1687
 SyS_sendto+0x40/0x50 net/socket.c:1655
 entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x4458b9
RSP: 002b:00007f8ceb77bb58 EFLAGS: 00000282 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000017 RCX: 00000000004458b9
RDX: 0000000000000023 RSI: 0000000020e60000 RDI: 0000000000000017
RBP: 00000000006e1b90 R08: 00000000200f9fe1 R09: 0000000000000020
R10: 0000000000008010 R11: 0000000000000282 R12: 00000000007080a8
R13: 0000000000000000 R14: 00007f8ceb77c9c0 R15: 00007f8ceb77c700
Object at ffff88003713be50, in cache kmalloc-64 size: 64
Allocated:
PID = 8446
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
 save_stack+0x43/0xd0 mm/kasan/kasan.c:502
 set_track mm/kasan/kasan.c:514 [inline]
 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605
 kmem_cache_alloc_trace+0x82/0x270 mm/slub.c:2738
 kmalloc include/linux/slab.h:490 [inline]
 dccp_feat_entry_new+0x214/0x410 net/dccp/feat.c:467
 dccp_feat_push_change+0x38/0x220 net/dccp/feat.c:487
 __feat_register_sp+0x223/0x2f0 net/dccp/feat.c:741
 dccp_feat_propagate_ccid+0x22b/0x2b0 net/dccp/feat.c:949
 dccp_feat_server_ccid_dependencies+0x1b3/0x250 net/dccp/feat.c:1012
 dccp_make_response+0x1f1/0xc90 net/dccp/output.c:423
 dccp_v6_send_response+0x4ec/0xc20 net/dccp/ipv6.c:217
 dccp_v6_conn_request+0xaba/0x11b0 net/dccp/ipv6.c:377
 dccp_rcv_state_process+0x51e/0x1650 net/dccp/input.c:606
 dccp_v6_do_rcv+0x213/0x350 net/dccp/ipv6.c:632
 sk_backlog_rcv include/net/sock.h:893 [inline]
 __sk_receive_skb+0x36f/0xcc0 net/core/sock.c:479
 dccp_v6_rcv+0xba5/0x1d00 net/dccp/ipv6.c:742
 ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
 NF_HOOK include/linux/netfilter.h:257 [inline]
 ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322
 dst_input include/net/dst.h:507 [inline]
 ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69
 NF_HOOK include/linux/netfilter.h:257 [inline]
 ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203
 __netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190
 __netif_receive_skb+0x2a/0x170 net/core/dev.c:4228
 process_backlog+0xe5/0x6c0 net/core/dev.c:4839
 napi_poll net/core/dev.c:5202 [inline]
 net_rx_action+0xe70/0x1900 net/core/dev.c:5267
 __do_softirq+0x2fb/0xb7d kernel/softirq.c:284
Freed:
PID = 15
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
 save_stack+0x43/0xd0 mm/kasan/kasan.c:502
 set_track mm/kasan/kasan.c:514 [inline]
 kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:578
 slab_free_hook mm/slub.c:1355 [inline]
 slab_free_freelist_hook mm/slub.c:1377 [inline]
 slab_free mm/slub.c:2954 [inline]
 kfree+0xe8/0x2b0 mm/slub.c:3874
 dccp_feat_entry_destructor.part.4+0x48/0x60 net/dccp/feat.c:418
 dccp_feat_entry_destructor net/dccp/feat.c:416 [inline]
 dccp_feat_list_pop net/dccp/feat.c:541 [inline]
 dccp_feat_activate_values+0x57f/0xab0 net/dccp/feat.c:1543
 dccp_create_openreq_child+0x464/0x610 net/dccp/minisocks.c:121
 dccp_v6_request_recv_sock+0x1f6/0x1960 net/dccp/ipv6.c:457
 dccp_check_req+0x335/0x5a0 net/dccp/minisocks.c:186
 dccp_v6_rcv+0x69e/0x1d00 net/dccp/ipv6.c:711
 ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
 NF_HOOK include/linux/netfilter.h:257 [inline]
 ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322
 dst_input include/net/dst.h:507 [inline]
 ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69
 NF_HOOK include/linux/netfilter.h:257 [inline]
 ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203
 __netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190
 __netif_receive_skb+0x2a/0x170 net/core/dev.c:4228
 process_backlog+0xe5/0x6c0 net/core/dev.c:4839
 napi_poll net/core/dev.c:5202 [inline]
 net_rx_action+0xe70/0x1900 net/core/dev.c:5267
 __do_softirq+0x2fb/0xb7d kernel/softirq.c:284
Memory state around the buggy address:
 ffff88003713bd00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88003713bd80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88003713be00: fc fc fc fc fc fc fc fc fc fc fb fb fb fb fb fb
                                                          ^

Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet/sched: act_skbmod: remove unneeded rcu_read_unlock in tcf_skbmod_dump
Alexey Khoroshilov [Sun, 5 Mar 2017 00:01:55 +0000 (03:01 +0300)]
net/sched: act_skbmod: remove unneeded rcu_read_unlock in tcf_skbmod_dump

[ Upstream commit 6c4dc75c251721f517e9daeb5370ea606b5b35ce ]

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: fix socket refcounting in skb_complete_tx_timestamp()
Eric Dumazet [Sat, 4 Mar 2017 05:01:03 +0000 (21:01 -0800)]
net: fix socket refcounting in skb_complete_tx_timestamp()

[ Upstream commit 9ac25fc063751379cb77434fef9f3b088cd3e2f7 ]

TX skbs do not necessarily hold a reference on skb->sk->sk_refcnt
By the time TX completion happens, sk_refcnt might be already 0.

sock_hold()/sock_put() would then corrupt critical state, like
sk_wmem_alloc and lead to leaks or use after free.

Fixes: 62bccb8cdb69 ("net-timestamp: Make the clone operation stand-alone from phy timestamping")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: fix socket refcounting in skb_complete_wifi_ack()
Eric Dumazet [Sat, 4 Mar 2017 05:01:02 +0000 (21:01 -0800)]
net: fix socket refcounting in skb_complete_wifi_ack()

[ Upstream commit dd4f10722aeb10f4f582948839f066bebe44e5fb ]

TX skbs do not necessarily hold a reference on skb->sk->sk_refcnt
By the time TX completion happens, sk_refcnt might be already 0.

sock_hold()/sock_put() would then corrupt critical state, like
sk_wmem_alloc.

Fixes: bf7fa551e0ce ("mac80211: Resolve sk_refcnt/sk_wmem_alloc issue in wifi ack path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agotcp: fix various issues for sockets morphing to listen state
Eric Dumazet [Fri, 3 Mar 2017 22:08:21 +0000 (14:08 -0800)]
tcp: fix various issues for sockets morphing to listen state

[ Upstream commit 02b2faaf0af1d85585f6d6980e286d53612acfc2 ]

Dmitry Vyukov reported a divide by 0 triggered by syzkaller, exploiting
tcp_disconnect() path that was never really considered and/or used
before syzkaller ;)

I was not able to reproduce the bug, but it seems issues here are the
three possible actions that assumed they would never trigger on a
listener.

1) tcp_write_timer_handler
2) tcp_delack_timer_handler
3) MTU reduction

Only IPv6 MTU reduction was properly testing TCP_CLOSE and TCP_LISTEN
 states from tcp_v6_mtu_reduced()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agostrparser: destroy workqueue on module exit
WANG Cong [Fri, 3 Mar 2017 20:21:14 +0000 (12:21 -0800)]
strparser: destroy workqueue on module exit

[ Upstream commit f78ef7cd9a0686b979679d0de061c6dbfd8d649e ]

Fixes: 43a0c6751a32 ("strparser: Stream parser for messages")
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agobonding: use ETH_MAX_MTU as max mtu
WANG Cong [Thu, 2 Mar 2017 20:24:36 +0000 (12:24 -0800)]
bonding: use ETH_MAX_MTU as max mtu

[ Upstream commit 31c05415f5b471fd333fe42629788364faea8e0d ]

This restores the ability of setting bond device's mtu to 9000.

Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra")
Reported-by: daznis@gmail.com
Reported-by: Brad Campbell <lists2009@fnarfbargle.com>
Cc: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoamd-xgbe: Don't overwrite SFP PHY mod_absent settings
Lendacky, Thomas [Tue, 28 Feb 2017 21:03:10 +0000 (15:03 -0600)]
amd-xgbe: Don't overwrite SFP PHY mod_absent settings

[ Upstream commit 2697ea5a859b83ca49511dcfd98daf42584eb3cf ]

If an SFP module is not present, xgbe_phy_sfp_phy_settings() should
return after applying the default settings. Currently there is no return
statement and the default settings are overwritten.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoamd-xgbe: Be sure to set MDIO modes on device (re)start
Lendacky, Thomas [Tue, 28 Feb 2017 21:03:01 +0000 (15:03 -0600)]
amd-xgbe: Be sure to set MDIO modes on device (re)start

[ Upstream commit b42c6761fd1651f564491b53016046c9ebf0b2a9 ]

The MDIO register mode is set when the device is probed. But when the
device is brought down and then back up, the MDIO register mode has been
reset.  Be sure to reset the mode during device startup and only change
the mode of the address specified.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoamd-xgbe: Stop the PHY before releasing interrupts
Lendacky, Thomas [Tue, 28 Feb 2017 21:02:51 +0000 (15:02 -0600)]
amd-xgbe: Stop the PHY before releasing interrupts

[ Upstream commit 402168b4c2dc0734b8fbd282eff77da0275c5129 ]

Some configurations require the use of the hardware's MDIO support to
communicate with external PHYs. The MDIO commands indicate completion
through the device interrupt. When bringing down the device the interrupts
were released before stopping the external PHY, resulting in MDIO command
timeouts. Move the stopping of the PHY to before the releasing of the
interrupts.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodccp: Unlock sock before calling sk_free()
Arnaldo Carvalho de Melo [Wed, 1 Mar 2017 19:35:07 +0000 (16:35 -0300)]
dccp: Unlock sock before calling sk_free()

[ Upstream commit d5afb6f9b6bb2c57bd0c05e76e12489dc0d037d9 ]

The code where sk_clone() came from created a new socket and locked it,
but then, on the error path didn't unlock it.

This problem stayed there for a long while, till b0691c8ee7c2 ("net:
Unlock sock before calling sk_free()") fixed it, but unfortunately the
callers of sk_clone() (now sk_clone_locked()) were not audited and the
one in dccp_create_openreq_child() remained.

Now in the age of the syskaller fuzzer, this was finally uncovered, as
reported by Dmitry:

 ---- 8< ----

I've got the following report while running syzkaller fuzzer on
86292b33d4b7 ("Merge branch 'akpm' (patches from Andrew)")

  [ BUG: held lock freed! ]
  4.10.0+ #234 Not tainted
  -------------------------
  syz-executor6/6898 is freeing memory
  ffff88006286cac0-ffff88006286d3b7, with a lock still held there!
   (slock-AF_INET6){+.-...}, at: [<ffffffff8362c2c9>] spin_lock
  include/linux/spinlock.h:299 [inline]
   (slock-AF_INET6){+.-...}, at: [<ffffffff8362c2c9>]
  sk_clone_lock+0x3d9/0x12c0 net/core/sock.c:1504
  5 locks held by syz-executor6/6898:
   #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff839a34b4>] lock_sock
  include/net/sock.h:1460 [inline]
   #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff839a34b4>]
  inet_stream_connect+0x44/0xa0 net/ipv4/af_inet.c:681
   #1:  (rcu_read_lock){......}, at: [<ffffffff83bc1c2a>]
  inet6_csk_xmit+0x12a/0x5d0 net/ipv6/inet6_connection_sock.c:126
   #2:  (rcu_read_lock){......}, at: [<ffffffff8369b424>] __skb_unlink
  include/linux/skbuff.h:1767 [inline]
   #2:  (rcu_read_lock){......}, at: [<ffffffff8369b424>] __skb_dequeue
  include/linux/skbuff.h:1783 [inline]
   #2:  (rcu_read_lock){......}, at: [<ffffffff8369b424>]
  process_backlog+0x264/0x730 net/core/dev.c:4835
   #3:  (rcu_read_lock){......}, at: [<ffffffff83aeb5c0>]
  ip6_input_finish+0x0/0x1700 net/ipv6/ip6_input.c:59
   #4:  (slock-AF_INET6){+.-...}, at: [<ffffffff8362c2c9>] spin_lock
  include/linux/spinlock.h:299 [inline]
   #4:  (slock-AF_INET6){+.-...}, at: [<ffffffff8362c2c9>]
  sk_clone_lock+0x3d9/0x12c0 net/core/sock.c:1504

Fix it just like was done by b0691c8ee7c2 ("net: Unlock sock before calling
sk_free()").

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170301153510.GE15145@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoipv6: orphan skbs in reassembly unit
Eric Dumazet [Wed, 1 Mar 2017 22:45:06 +0000 (14:45 -0800)]
ipv6: orphan skbs in reassembly unit

[ Upstream commit 48cac18ecf1de82f76259a54402c3adb7839ad01 ]

Andrey reported a use-after-free in IPv6 stack.

Issue here is that we free the socket while it still has skb
in TX path and in some queues.

It happens here because IPv6 reassembly unit messes skb->truesize,
breaking skb_set_owner_w() badly.

We fixed a similar issue for IPV4 in commit 8282f27449bf ("inet: frag:
Always orphan skbs inside ip_defrag()")
Acked-by: Joe Stringer <joe@ovn.org>
==================================================================
BUG: KASAN: use-after-free in sock_wfree+0x118/0x120
Read of size 8 at addr ffff880062da0060 by task a.out/4140

page:ffffea00018b6800 count:1 mapcount:0 mapping:          (null)
index:0x0 compound_mapcount: 0
flags: 0x100000000008100(slab|head)
raw: 0100000000008100 0000000000000000 0000000000000000 0000000180130013
raw: dead000000000100 dead000000000200 ffff88006741f140 0000000000000000
page dumped because: kasan: bad access detected

CPU: 0 PID: 4140 Comm: a.out Not tainted 4.10.0-rc3+ #59
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:15
 dump_stack+0x292/0x398 lib/dump_stack.c:51
 describe_address mm/kasan/report.c:262
 kasan_report_error+0x121/0x560 mm/kasan/report.c:370
 kasan_report mm/kasan/report.c:392
 __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:413
 sock_flag ./arch/x86/include/asm/bitops.h:324
 sock_wfree+0x118/0x120 net/core/sock.c:1631
 skb_release_head_state+0xfc/0x250 net/core/skbuff.c:655
 skb_release_all+0x15/0x60 net/core/skbuff.c:668
 __kfree_skb+0x15/0x20 net/core/skbuff.c:684
 kfree_skb+0x16e/0x4e0 net/core/skbuff.c:705
 inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
 inet_frag_put ./include/net/inet_frag.h:133
 nf_ct_frag6_gather+0x1125/0x38b0 net/ipv6/netfilter/nf_conntrack_reasm.c:617
 ipv6_defrag+0x21b/0x350 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
 nf_hook_entry_hookfn ./include/linux/netfilter.h:102
 nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
 nf_hook ./include/linux/netfilter.h:212
 __ip6_local_out+0x52c/0xaf0 net/ipv6/output_core.c:160
 ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
 ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
 ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
 rawv6_push_pending_frames net/ipv6/raw.c:613
 rawv6_sendmsg+0x2cff/0x4130 net/ipv6/raw.c:927
 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
 sock_sendmsg_nosec net/socket.c:635
 sock_sendmsg+0xca/0x110 net/socket.c:645
 sock_write_iter+0x326/0x620 net/socket.c:848
 new_sync_write fs/read_write.c:499
 __vfs_write+0x483/0x760 fs/read_write.c:512
 vfs_write+0x187/0x530 fs/read_write.c:560
 SYSC_write fs/read_write.c:607
 SyS_write+0xfb/0x230 fs/read_write.c:599
 entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
RIP: 0033:0x7ff26e6f5b79
RSP: 002b:00007ff268e0ed98 EFLAGS: 00000206 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007ff268e0f9c0 RCX: 00007ff26e6f5b79
RDX: 0000000000000010 RSI: 0000000020f50fe1 RDI: 0000000000000003
RBP: 00007ff26ebc1220 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
R13: 00007ff268e0f9c0 R14: 00007ff26efec040 R15: 0000000000000003

The buggy address belongs to the object at ffff880062da0000
 which belongs to the cache RAWv6 of size 1504
The buggy address ffff880062da0060 is located 96 bytes inside
 of 1504-byte region [ffff880062da0000ffff880062da05e0)

Freed by task 4113:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
 save_stack+0x43/0xd0 mm/kasan/kasan.c:502
 set_track mm/kasan/kasan.c:514
 kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:578
 slab_free_hook mm/slub.c:1352
 slab_free_freelist_hook mm/slub.c:1374
 slab_free mm/slub.c:2951
 kmem_cache_free+0xb2/0x2c0 mm/slub.c:2973
 sk_prot_free net/core/sock.c:1377
 __sk_destruct+0x49c/0x6e0 net/core/sock.c:1452
 sk_destruct+0x47/0x80 net/core/sock.c:1460
 __sk_free+0x57/0x230 net/core/sock.c:1468
 sk_free+0x23/0x30 net/core/sock.c:1479
 sock_put ./include/net/sock.h:1638
 sk_common_release+0x31e/0x4e0 net/core/sock.c:2782
 rawv6_close+0x54/0x80 net/ipv6/raw.c:1214
 inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
 inet6_release+0x50/0x70 net/ipv6/af_inet6.c:431
 sock_release+0x8d/0x1e0 net/socket.c:599
 sock_close+0x16/0x20 net/socket.c:1063
 __fput+0x332/0x7f0 fs/file_table.c:208
 ____fput+0x15/0x20 fs/file_table.c:244
 task_work_run+0x19b/0x270 kernel/task_work.c:116
 exit_task_work ./include/linux/task_work.h:21
 do_exit+0x186b/0x2800 kernel/exit.c:839
 do_group_exit+0x149/0x420 kernel/exit.c:943
 SYSC_exit_group kernel/exit.c:954
 SyS_exit_group+0x1d/0x20 kernel/exit.c:952
 entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203

Allocated by task 4115:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
 save_stack+0x43/0xd0 mm/kasan/kasan.c:502
 set_track mm/kasan/kasan.c:514
 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605
 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544
 slab_post_alloc_hook mm/slab.h:432
 slab_alloc_node mm/slub.c:2708
 slab_alloc mm/slub.c:2716
 kmem_cache_alloc+0x1af/0x250 mm/slub.c:2721
 sk_prot_alloc+0x65/0x2a0 net/core/sock.c:1334
 sk_alloc+0x105/0x1010 net/core/sock.c:1396
 inet6_create+0x44d/0x1150 net/ipv6/af_inet6.c:183
 __sock_create+0x4f6/0x880 net/socket.c:1199
 sock_create net/socket.c:1239
 SYSC_socket net/socket.c:1269
 SyS_socket+0xf9/0x230 net/socket.c:1249
 entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203

Memory state around the buggy address:
 ffff880062d9ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff880062d9ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff880062da0000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                       ^
 ffff880062da0080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff880062da0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: net_enable_timestamp() can be called from irq contexts
Eric Dumazet [Wed, 1 Mar 2017 22:28:39 +0000 (14:28 -0800)]
net: net_enable_timestamp() can be called from irq contexts

[ Upstream commit 13baa00ad01bb3a9f893e3a08cbc2d072fc0c15d ]

It is now very clear that silly TCP listeners might play with
enabling/disabling timestamping while new children are added
to their accept queue.

Meaning net_enable_timestamp() can be called from BH context
while current state of the static key is not enabled.

Lets play safe and allow all contexts.

The work queue is scheduled only under the problematic cases,
which are the static key enable/disable transition, to not slow down
critical paths.

This extends and improves what we did in commit 5fa8bbda38c6 ("net: use
a work queue to defer net_disable_timestamp() work")

Fixes: b90e5794c5bd ("net: dont call jump_label_dec from irq context")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: don't call strlen() on the user buffer in packet_bind_spkt()
Alexander Potapenko [Wed, 1 Mar 2017 11:57:20 +0000 (12:57 +0100)]
net: don't call strlen() on the user buffer in packet_bind_spkt()

[ Upstream commit 540e2894f7905538740aaf122bd8e0548e1c34a4 ]

KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of
uninitialized memory in packet_bind_spkt():
Acked-by: Eric Dumazet <edumazet@google.com>
==================================================================
BUG: KMSAN: use of unitialized memory
CPU: 0 PID: 1074 Comm: packet Not tainted 4.8.0-rc6+ #1891
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
 0000000000000000 ffff88006b6dfc08 ffffffff82559ae8 ffff88006b6dfb48
 ffffffff818a7c91 ffffffff85b9c870 0000000000000092 ffffffff85b9c550
 0000000000000000 0000000000000092 00000000ec400911 0000000000000002
Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff82559ae8>] dump_stack+0x238/0x290 lib/dump_stack.c:51
 [<ffffffff818a6626>] kmsan_report+0x276/0x2e0 mm/kmsan/kmsan.c:1003
 [<ffffffff818a783b>] __msan_warning+0x5b/0xb0
mm/kmsan/kmsan_instr.c:424
 [<     inline     >] strlen lib/string.c:484
 [<ffffffff8259b58d>] strlcpy+0x9d/0x200 lib/string.c:144
 [<ffffffff84b2eca4>] packet_bind_spkt+0x144/0x230
net/packet/af_packet.c:3132
 [<ffffffff84242e4d>] SYSC_bind+0x40d/0x5f0 net/socket.c:1370
 [<ffffffff84242a22>] SyS_bind+0x82/0xa0 net/socket.c:1356
 [<ffffffff8515991b>] entry_SYSCALL_64_fastpath+0x13/0x8f
arch/x86/entry/entry_64.o:?
chained origin: 00000000eba00911
 [<ffffffff810bb787>] save_stack_trace+0x27/0x50
arch/x86/kernel/stacktrace.c:67
 [<     inline     >] kmsan_save_stack_with_flags mm/kmsan/kmsan.c:322
 [<     inline     >] kmsan_save_stack mm/kmsan/kmsan.c:334
 [<ffffffff818a59f8>] kmsan_internal_chain_origin+0x118/0x1e0
mm/kmsan/kmsan.c:527
 [<ffffffff818a7773>] __msan_set_alloca_origin4+0xc3/0x130
mm/kmsan/kmsan_instr.c:380
 [<ffffffff84242b69>] SYSC_bind+0x129/0x5f0 net/socket.c:1356
 [<ffffffff84242a22>] SyS_bind+0x82/0xa0 net/socket.c:1356
 [<ffffffff8515991b>] entry_SYSCALL_64_fastpath+0x13/0x8f
arch/x86/entry/entry_64.o:?
origin description: ----address@SYSC_bind (origin=00000000eb400911)
==================================================================
(the line numbers are relative to 4.8-rc6, but the bug persists
upstream)

, when I run the following program as root:

=====================================
 #include <string.h>
 #include <sys/socket.h>
 #include <netpacket/packet.h>
 #include <net/ethernet.h>

 int main() {
   struct sockaddr addr;
   memset(&addr, 0xff, sizeof(addr));
   addr.sa_family = AF_PACKET;
   int fd = socket(PF_PACKET, SOCK_PACKET, htons(ETH_P_ALL));
   bind(fd, &addr, sizeof(addr));
   return 0;
 }
=====================================

This happens because addr.sa_data copied from the userspace is not
zero-terminated, and copying it with strlcpy() in packet_bind_spkt()
results in calling strlen() on the kernel copy of that non-terminated
buffer.

Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: bridge: allow IPv6 when multicast flood is disabled
Mike Manning [Wed, 1 Mar 2017 09:55:28 +0000 (09:55 +0000)]
net: bridge: allow IPv6 when multicast flood is disabled

[ Upstream commit 8953de2f02ad7b15e4964c82f9afd60f128e4e98 ]

Even with multicast flooding turned off, IPv6 ND should still work so
that IPv6 connectivity is provided. Allow this by continuing to flood
multicast traffic originated by us.

Fixes: b6cb5ac8331b ("net: bridge: add per-port multicast flood flag")
Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Mike Manning <mmanning@brocade.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agotcp/dccp: block BH for SYN processing
Eric Dumazet [Wed, 1 Mar 2017 16:39:49 +0000 (08:39 -0800)]
tcp/dccp: block BH for SYN processing

[ Upstream commit 449809a66c1d0b1563dee84493e14bf3104d2d7e ]

SYN processing really was meant to be handled from BH.

When I got rid of BH blocking while processing socket backlog
in commit 5413d1babe8f ("net: do not block BH while processing socket
backlog"), I forgot that a malicious user could transition to TCP_LISTEN
from a state that allowed (SYN) packets to be parked in the socket
backlog while socket is owned by the thread doing the listen() call.

Sure enough syzkaller found this and reported the bug ;)

=================================
[ INFO: inconsistent lock state ]
4.10.0+ #60 Not tainted
---------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
syz-executor0/5090 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (&(&hashinfo->ehash_locks[i])->rlock){+.?...}, at:
[<ffffffff83a6a370>] spin_lock include/linux/spinlock.h:299 [inline]
 (&(&hashinfo->ehash_locks[i])->rlock){+.?...}, at:
[<ffffffff83a6a370>] inet_ehash_insert+0x240/0xad0
net/ipv4/inet_hashtables.c:407
{IN-SOFTIRQ-W} state was registered at:
  mark_irqflags kernel/locking/lockdep.c:2923 [inline]
  __lock_acquire+0xbcf/0x3270 kernel/locking/lockdep.c:3295
  lock_acquire+0x241/0x580 kernel/locking/lockdep.c:3753
  __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
  _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
  spin_lock include/linux/spinlock.h:299 [inline]
  inet_ehash_insert+0x240/0xad0 net/ipv4/inet_hashtables.c:407
  reqsk_queue_hash_req net/ipv4/inet_connection_sock.c:753 [inline]
  inet_csk_reqsk_queue_hash_add+0x1b7/0x2a0 net/ipv4/inet_connection_sock.c:764
  tcp_conn_request+0x25cc/0x3310 net/ipv4/tcp_input.c:6399
  tcp_v4_conn_request+0x157/0x220 net/ipv4/tcp_ipv4.c:1262
  tcp_rcv_state_process+0x802/0x4130 net/ipv4/tcp_input.c:5889
  tcp_v4_do_rcv+0x56b/0x940 net/ipv4/tcp_ipv4.c:1433
  tcp_v4_rcv+0x2e12/0x3210 net/ipv4/tcp_ipv4.c:1711
  ip_local_deliver_finish+0x4ce/0xc40 net/ipv4/ip_input.c:216
  NF_HOOK include/linux/netfilter.h:257 [inline]
  ip_local_deliver+0x1ce/0x710 net/ipv4/ip_input.c:257
  dst_input include/net/dst.h:492 [inline]
  ip_rcv_finish+0xb1d/0x2110 net/ipv4/ip_input.c:396
  NF_HOOK include/linux/netfilter.h:257 [inline]
  ip_rcv+0xd90/0x19c0 net/ipv4/ip_input.c:487
  __netif_receive_skb_core+0x1ad1/0x3400 net/core/dev.c:4179
  __netif_receive_skb+0x2a/0x170 net/core/dev.c:4217
  netif_receive_skb_internal+0x1d6/0x430 net/core/dev.c:4245
  napi_skb_finish net/core/dev.c:4602 [inline]
  napi_gro_receive+0x4e6/0x680 net/core/dev.c:4636
  e1000_receive_skb drivers/net/ethernet/intel/e1000/e1000_main.c:4033 [inline]
  e1000_clean_rx_irq+0x5e0/0x1490
drivers/net/ethernet/intel/e1000/e1000_main.c:4489
  e1000_clean+0xb9a/0x2910 drivers/net/ethernet/intel/e1000/e1000_main.c:3834
  napi_poll net/core/dev.c:5171 [inline]
  net_rx_action+0xe70/0x1900 net/core/dev.c:5236
  __do_softirq+0x2fb/0xb7d kernel/softirq.c:284
  invoke_softirq kernel/softirq.c:364 [inline]
  irq_exit+0x19e/0x1d0 kernel/softirq.c:405
  exiting_irq arch/x86/include/asm/apic.h:658 [inline]
  do_IRQ+0x81/0x1a0 arch/x86/kernel/irq.c:250
  ret_from_intr+0x0/0x20
  native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:53
  arch_safe_halt arch/x86/include/asm/paravirt.h:98 [inline]
  default_idle+0x8f/0x410 arch/x86/kernel/process.c:271
  arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:262
  default_idle_call+0x36/0x60 kernel/sched/idle.c:96
  cpuidle_idle_call kernel/sched/idle.c:154 [inline]
  do_idle+0x348/0x440 kernel/sched/idle.c:243
  cpu_startup_entry+0x18/0x20 kernel/sched/idle.c:345
  start_secondary+0x344/0x440 arch/x86/kernel/smpboot.c:272
  verify_cpu+0x0/0xfc
irq event stamp: 1741
hardirqs last  enabled at (1741): [<ffffffff84d49d77>]
__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160
[inline]
hardirqs last  enabled at (1741): [<ffffffff84d49d77>]
_raw_spin_unlock_irqrestore+0xf7/0x1a0 kernel/locking/spinlock.c:191
hardirqs last disabled at (1740): [<ffffffff84d4a732>]
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
hardirqs last disabled at (1740): [<ffffffff84d4a732>]
_raw_spin_lock_irqsave+0xa2/0x110 kernel/locking/spinlock.c:159
softirqs last  enabled at (1738): [<ffffffff84d4deff>]
__do_softirq+0x7cf/0xb7d kernel/softirq.c:310
softirqs last disabled at (1571): [<ffffffff84d4b92c>]
do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&hashinfo->ehash_locks[i])->rlock);
  <Interrupt>
    lock(&(&hashinfo->ehash_locks[i])->rlock);

 *** DEADLOCK ***

1 lock held by syz-executor0/5090:
 #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83406b43>] lock_sock
include/net/sock.h:1460 [inline]
 #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83406b43>]
sock_setsockopt+0x233/0x1e40 net/core/sock.c:683

stack backtrace:
CPU: 1 PID: 5090 Comm: syz-executor0 Not tainted 4.10.0+ #60
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:15 [inline]
 dump_stack+0x292/0x398 lib/dump_stack.c:51
 print_usage_bug+0x3ef/0x450 kernel/locking/lockdep.c:2387
 valid_state kernel/locking/lockdep.c:2400 [inline]
 mark_lock_irq kernel/locking/lockdep.c:2602 [inline]
 mark_lock+0xf30/0x1410 kernel/locking/lockdep.c:3065
 mark_irqflags kernel/locking/lockdep.c:2941 [inline]
 __lock_acquire+0x6dc/0x3270 kernel/locking/lockdep.c:3295
 lock_acquire+0x241/0x580 kernel/locking/lockdep.c:3753
 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
 _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
 spin_lock include/linux/spinlock.h:299 [inline]
 inet_ehash_insert+0x240/0xad0 net/ipv4/inet_hashtables.c:407
 reqsk_queue_hash_req net/ipv4/inet_connection_sock.c:753 [inline]
 inet_csk_reqsk_queue_hash_add+0x1b7/0x2a0 net/ipv4/inet_connection_sock.c:764
 dccp_v6_conn_request+0xada/0x11b0 net/dccp/ipv6.c:380
 dccp_rcv_state_process+0x51e/0x1660 net/dccp/input.c:606
 dccp_v6_do_rcv+0x213/0x350 net/dccp/ipv6.c:632
 sk_backlog_rcv include/net/sock.h:896 [inline]
 __release_sock+0x127/0x3a0 net/core/sock.c:2052
 release_sock+0xa5/0x2b0 net/core/sock.c:2539
 sock_setsockopt+0x60f/0x1e40 net/core/sock.c:1016
 SYSC_setsockopt net/socket.c:1782 [inline]
 SyS_setsockopt+0x2fb/0x3a0 net/socket.c:1765
 entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x4458b9
RSP: 002b:00007fe8b26c2b58 EFLAGS: 00000292 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00000000004458b9
RDX: 000000000000001a RSI: 0000000000000001 RDI: 0000000000000006
RBP: 00000000006e2110 R08: 0000000000000010 R09: 0000000000000000
R10: 00000000208c3000 R11: 0000000000000292 R12: 0000000000708000
R13: 0000000020000000 R14: 0000000000001000 R15: 0000000000000000

Fixes: 5413d1babe8f ("net: do not block BH while processing socket backlog")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agomlxsw: spectrum_router: Avoid potential packets loss
Ido Schimmel [Tue, 28 Feb 2017 07:55:40 +0000 (08:55 +0100)]
mlxsw: spectrum_router: Avoid potential packets loss

[ Upstream commit f7df4923fa986247e93ec2cdff5ca168fff14dcf ]

When the structure of the LPM tree changes (f.e., due to the addition of
a new prefix), we unbind the old tree and then bind the new one. This
may result in temporary packet loss.

Instead, overwrite the old binding with the new one.

Fixes: 6b75c4807db3 ("mlxsw: spectrum_router: Add virtual router management")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agogeneve: lock RCU on TX path
Jakub Kicinski [Fri, 24 Feb 2017 19:43:37 +0000 (11:43 -0800)]
geneve: lock RCU on TX path

[ Upstream commit a717e3f740803cc88bd5c9a70c93504f6a368663 ]

There is no guarantees that callers of the TX path will hold
the RCU lock.  Grab it explicitly.

Fixes: fceb9c3e3825 ("geneve: avoid using stale geneve socket.")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agovxlan: lock RCU on TX path
Jakub Kicinski [Fri, 24 Feb 2017 19:43:36 +0000 (11:43 -0800)]
vxlan: lock RCU on TX path

[ Upstream commit 56de859e9967c070464a9a9f4f18d73f9447298e ]

There is no guarantees that callers of the TX path will hold
the RCU lock.  Grab it explicitly.

Fixes: c6fcc4fc5f8b ("vxlan: avoid using stale vxlan socket.")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agol2tp: avoid use-after-free caused by l2tp_ip_backlog_recv
Paul Hüber [Sun, 26 Feb 2017 16:58:19 +0000 (17:58 +0100)]
l2tp: avoid use-after-free caused by l2tp_ip_backlog_recv

[ Upstream commit 51fb60eb162ab84c5edf2ae9c63cf0b878e5547e ]

l2tp_ip_backlog_recv may not return -1 if the packet gets dropped.
The return value is passed up to ip_local_deliver_finish, which treats
negative values as an IP protocol number for resubmission.

Signed-off-by: Paul Hüber <phueber@kernsp.in>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet sched actions: decrement module reference count after table flush.
Roman Mashak [Fri, 24 Feb 2017 16:00:32 +0000 (11:00 -0500)]
net sched actions: decrement module reference count after table flush.

[ Upstream commit edb9d1bff4bbe19b8ae0e71b1f38732591a9eeb2 ]

When tc actions are loaded as a module and no actions have been installed,
flushing them would result in actions removed from the memory, but modules
reference count not being decremented, so that the modules would not be
unloaded.

Following is example with GACT action:

% sudo modprobe act_gact
% lsmod
Module                  Size  Used by
act_gact               16384  0
%
% sudo tc actions ls action gact
%
% sudo tc actions flush action gact
% lsmod
Module                  Size  Used by
act_gact               16384  1
% sudo tc actions flush action gact
% lsmod
Module                  Size  Used by
act_gact               16384  2
% sudo rmmod act_gact
rmmod: ERROR: Module act_gact is in use
....

After the fix:
% lsmod
Module                  Size  Used by
act_gact               16384  0
%
% sudo tc actions add action pass index 1
% sudo tc actions add action pass index 2
% sudo tc actions add action pass index 3
% lsmod
Module                  Size  Used by
act_gact               16384  3
%
% sudo tc actions flush action gact
% lsmod
Module                  Size  Used by
act_gact               16384  0
%
% sudo tc actions flush action gact
% lsmod
Module                  Size  Used by
act_gact               16384  0
% sudo rmmod act_gact
% lsmod
Module                  Size  Used by
%

Fixes: f97017cdefef ("net-sched: Fix actions flushing")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agosctp: set sin_port for addr param when checking duplicate address
Xin Long [Fri, 24 Feb 2017 07:18:46 +0000 (15:18 +0800)]
sctp: set sin_port for addr param when checking duplicate address

[ Upstream commit 2e3ce5bc2aa938653c3866aa7f4901a1f199b1c8 ]

Commit b8607805dd15 ("sctp: not copying duplicate addrs to the assoc's
bind address list") tried to check for duplicate address before copying
to asoc's bind_addr list from global addr list.

But all the addrs' sin_ports in global addr list are 0 while the addrs'
sin_ports are bp->port in asoc's bind_addr list. It means even if it's
a duplicate address, af->cmp_addr will still return 0 as the their
sin_ports are different.

This patch is to fix it by setting the sin_port for addr param with
bp->port before comparing the addrs.

Fixes: b8607805dd15 ("sctp: not copying duplicate addrs to the assoc's bind address list")
Reported-by: Wei Chen <weichen@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoipv4: mask tos for input route
Julian Anastasov [Sun, 26 Feb 2017 15:14:35 +0000 (17:14 +0200)]
ipv4: mask tos for input route

[ Upstream commit 6e28099d38c0e50d62c1afc054e37e573adf3d21 ]

Restore the lost masking of TOS in input route code to
allow ip rules to match it properly.

Problem [1] noticed by Shmulik Ladkani <shmulik.ladkani@gmail.com>

[1] http://marc.info/?t=137331755300040&r=1&w=2

Fixes: 89aef8921bfb ("ipv4: Delete routing cache.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoipv4: add missing initialization for flowi4_uid
Julian Anastasov [Sun, 26 Feb 2017 13:50:52 +0000 (15:50 +0200)]
ipv4: add missing initialization for flowi4_uid

[ Upstream commit 8bcfd0925ef15f072ba1e7bee2c25e9e1b5fd6ca ]

Avoid matching of random stack value for uid when rules
are looked up on input route or when RP filter is used.
Problem should affect only setups that use ip rules with
uid range.

Fixes: 622ec2c9d524 ("net: core: add UID to flows, rules, and routes")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agovxlan: don't allow overwrite of config src addr
Brian Russell [Fri, 24 Feb 2017 17:47:11 +0000 (17:47 +0000)]
vxlan: don't allow overwrite of config src addr

[ Upstream commit 1158632b5a2dcce0786c1b1b99654e81cc867981 ]

When using IPv6 transport and a default dst, a pointer to the configured
source address is passed into the route lookup. If no source address is
configured, then the value is overwritten.

IPv6 route lookup ignores egress ifindex match if the source address is set,
so if egress ifindex match is desired, the source address must be passed
as any. The overwrite breaks this for subsequent lookups.

Avoid this by copying the configured address to an existing stack variable
and pass a pointer to that instead.

Fixes: 272d96a5ab10 ("net: vxlan: lwt: Use source ip address during route lookup.")
Signed-off-by: Brian Russell <brussell@brocade.com>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agovti6: return GRE_KEY for vti6
David Forster [Fri, 24 Feb 2017 14:20:32 +0000 (14:20 +0000)]
vti6: return GRE_KEY for vti6

[ Upstream commit 7dcdf941cdc96692ab99fd790c8cc68945514851 ]

Align vti6 with vti by returning GRE_KEY flag. This enables iproute2
to display tunnel keys on "ip -6 tunnel show"

Signed-off-by: David Forster <dforster@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agovxlan: correctly validate VXLAN ID against VXLAN_N_VID
Matthias Schiffer [Thu, 23 Feb 2017 16:19:41 +0000 (17:19 +0100)]
vxlan: correctly validate VXLAN ID against VXLAN_N_VID

[ Upstream commit 4e37d6911f36545b286d15073f6f2222f840e81c ]

The incorrect check caused an off-by-one error: the maximum VID 0xffffff
was unusable.

Fixes: d342894c5d2f ("vxlan: virtual extensible lan")
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agosctp: deny peeloff operation on asocs with threads sleeping on it
Marcelo Ricardo Leitner [Thu, 23 Feb 2017 12:31:18 +0000 (09:31 -0300)]
sctp: deny peeloff operation on asocs with threads sleeping on it

[ Upstream commit dfcb9f4f99f1e9a49e43398a7bfbf56927544af1 ]

commit 2dcab5984841 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf")
attempted to avoid a BUG_ON call when the association being used for a
sendmsg() is blocked waiting for more sndbuf and another thread did a
peeloff operation on such asoc, moving it to another socket.

As Ben Hutchings noticed, then in such case it would return without
locking back the socket and would cause two unlocks in a row.

Further analysis also revealed that it could allow a double free if the
application managed to peeloff the asoc that is created during the
sendmsg call, because then sctp_sendmsg() would try to free the asoc
that was created only for that call.

This patch takes another approach. It will deny the peeloff operation
if there is a thread sleeping on the asoc, so this situation doesn't
exist anymore. This avoids the issues described above and also honors
the syscalls that are already being handled (it can be multiple sendmsg
calls).

Joint work with Xin Long.

Fixes: 2dcab5984841 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf")
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet/mlx5e: Fix wrong CQE decompression
Tariq Toukan [Wed, 22 Feb 2017 15:20:16 +0000 (17:20 +0200)]
net/mlx5e: Fix wrong CQE decompression

[ Upstream commit 36154be40a28e4afaa0416da2681d80b7e2ca319 ]

In cqe compression with striding RQ, the decompression of the CQE field
wqe_counter was done with a wrong wraparound value.
This caused handling cqes with a wrong pointer to wqe (rx descriptor)
and creating SKBs with wrong data, pointing to wrong (and already consumed)
strides/pages.

The meaning of the CQE field wqe_counter in striding RQ holds the
stride index instead of the WQE index. Hence, when decompressing
a CQE, wqe_counter should have wrapped-around the number of strides
in a single multi-packet WQE.

We dropped this wrap-around mask at all in CQE decompression of striding
RQ. It is not needed as in such cases the CQE compression session would
break because of different value of wqe_id field, starting a new
compression session.

Tested:
 ethtool -K ethxx lro off/on
 ethtool --set-priv-flags ethxx rx_cqe_compress on
 super_netperf 16 {ipv4,ipv6} -t TCP_STREAM -m 50 -D
 verified no csum errors and no page refcount issues.

Fixes: 7219ab34f184 ("net/mlx5e: CQE compression")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Tom Herbert <tom@herbertland.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet/mlx5e: Update MPWQE stride size when modifying CQE compress state
Saeed Mahameed [Wed, 22 Feb 2017 15:20:15 +0000 (17:20 +0200)]
net/mlx5e: Update MPWQE stride size when modifying CQE compress state

[ Upstream commit 6dc4b54e77282caf17f0ff72aa32dd296037fbc0 ]

When the admin enables/disables cqe compression, updating
mpwqe stride size is required:
    CQE compress ON  ==> stride size = 256B
    CQE compress OFF ==> stride size = 64B

This is already done on driver load via mlx5e_set_rq_type_params, all we
need is just to call it on arbitrary admin changes of cqe compression
state via priv flags or when changing timestamping state
(as it is mutually exclusive with cqe compression).

This bug introduces no functional damage, it only makes cqe compression
occur less often, since in ConnectX4-LX CQE compression is performed
only on packets smaller than stride size.

Tested:
 ethtool --set-priv-flags ethxx rx_cqe_compress on
 pktgen with  64 < pkt size < 256 and netperf TCP_STREAM (IPv4/IPv6)
 verify `ethtool -S ethxx | grep compress` are advancing more often
 (rapidly)

Fixes: 7219ab34f184 ("net/mlx5e: CQE compression")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet/mlx5e: Fix broken CQE compression initialization
Tariq Toukan [Wed, 22 Feb 2017 15:20:14 +0000 (17:20 +0200)]
net/mlx5e: Fix broken CQE compression initialization

[ Upstream commit b0d4660b4cc52e6477ca3a43435351d565dfcedc ]

Some of RQ type parameters are derived from CQE compression state flag,
CQE compression flag was initialized only after RQ type parameters
setup. This leads to load RQ with stride size smaller than what we
want for when CQE compression is on.

This bug introduces no functional damage, it only makes CQE compression
occur less often, since in ConnectX4-LX CQE compression is performed
only on packets smaller than stride size.

Fix this by marking default status of CQE compression in PFLAG prior to
calling mlx5e_set_rq_priv_params(), as it inits some fields based on it.

Tested:
 load driver on systems where rx CQE compress will be on (MH)
 pktgen with  64 < pkt size < 256 and netperf TCP_STREAM (IPv4/IPv6)
 verify `ethtool -S ethxx | grep compress` are advancing more often
 (rapidly)

Fixes: 2fc4bfb7250d ("net/mlx5e: Dynamic RQ type infrastructure")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet/mlx5e: Do not reduce LRO WQE size when not using build_skb
Tariq Toukan [Wed, 22 Feb 2017 15:20:13 +0000 (17:20 +0200)]
net/mlx5e: Do not reduce LRO WQE size when not using build_skb

[ Upstream commit 4078e637c12f1e0a74293f1ec9563f42bff14a03 ]

When rq_type is Striding RQ, no room of SKB_RESERVE is needed
as SKB allocation is not done via build_skb.

Fixes: e4b85508072b ("net/mlx5e: Slightly reduce hardware LRO size")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet/mlx5e: Register/unregister vport representors on interface attach/detach
Saeed Mahameed [Wed, 22 Feb 2017 15:20:12 +0000 (17:20 +0200)]
net/mlx5e: Register/unregister vport representors on interface attach/detach

[ Upstream commit 6f08a22c5fb2b9aefb8ecd8496758e7a677c1fde ]

Currently vport representors are added only on driver load and removed on
driver unload.  Apparently we forgot to handle them when we added the
seamless reset flow feature.  This caused to leave the representors
netdevs alive and active with open HW resources on pci shutdown and on
error reset flows.

To overcome this we move their handling to interface attach/detach, so
they would be cleaned up on shutdown and recreated on reset flows.

Fixes: 26e59d8077a3 ("net/mlx5e: Implement mlx5e interface attach/detach callbacks")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoLinux 4.10.4 v4.10.4
Greg Kroah-Hartman [Sat, 18 Mar 2017 11:19:00 +0000 (19:19 +0800)]
Linux 4.10.4

7 years agoIB/mlx5: Verify that Q counters are supported
Kamal Heib [Wed, 18 Jan 2017 12:10:32 +0000 (14:10 +0200)]
IB/mlx5: Verify that Q counters are supported

commit 45bded2c216da6010184ac5ebe88c27f73439009 upstream.

Make sure that the Q counters are supported by the FW before trying
to allocate/deallocte them, this will avoid driver load failure when
they aren't supported by the FW.

Fixes: 0837e86a7a34 ('IB/mlx5: Add per port counters')
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoext4: don't BUG when truncating encrypted inodes on the orphan list
Theodore Ts'o [Tue, 14 Feb 2017 16:31:15 +0000 (11:31 -0500)]
ext4: don't BUG when truncating encrypted inodes on the orphan list

commit 0d06863f903ac5f4f6efb0273079d27de3e53a28 upstream.

Fix a BUG when the kernel tries to mount a file system constructed as
follows:

echo foo > foo.txt
mke2fs -Fq -t ext4 -O encrypt foo.img 100
debugfs -w foo.img << EOF
write foo.txt a
set_inode_field a i_flags 0x80800
set_super_value s_last_orphan 12
quit
EOF

root@kvm-xfstests:~# mount -o loop foo.img /mnt
[  160.238770] ------------[ cut here ]------------
[  160.240106] kernel BUG at /usr/projects/linux/ext4/fs/ext4/inode.c:3874!
[  160.240106] invalid opcode: 0000 [#1] SMP
[  160.240106] Modules linked in:
[  160.240106] CPU: 0 PID: 2547 Comm: mount Tainted: G        W       4.10.0-rc3-00034-gcdd33b941b67 #227
[  160.240106] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014
[  160.240106] task: f4518000 task.stack: f47b6000
[  160.240106] EIP: ext4_block_zero_page_range+0x1a7/0x2b4
[  160.240106] EFLAGS: 00010246 CPU: 0
[  160.240106] EAX: 00000001 EBX: f7be4b50 ECX: f47b7dc0 EDX: 00000007
[  160.240106] ESI: f43b05a8 EDI: f43babec EBP: f47b7dd0 ESP: f47b7dac
[  160.240106]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  160.240106] CR0: 80050033 CR2: bfd85b08 CR3: 34a00680 CR4: 000006f0
[  160.240106] Call Trace:
[  160.240106]  ext4_truncate+0x1e9/0x3e5
[  160.240106]  ext4_fill_super+0x286f/0x2b1e
[  160.240106]  ? set_blocksize+0x2e/0x7e
[  160.240106]  mount_bdev+0x114/0x15f
[  160.240106]  ext4_mount+0x15/0x17
[  160.240106]  ? ext4_calculate_overhead+0x39d/0x39d
[  160.240106]  mount_fs+0x58/0x115
[  160.240106]  vfs_kern_mount+0x4b/0xae
[  160.240106]  do_mount+0x671/0x8c3
[  160.240106]  ? _copy_from_user+0x70/0x83
[  160.240106]  ? strndup_user+0x31/0x46
[  160.240106]  SyS_mount+0x57/0x7b
[  160.240106]  do_int80_syscall_32+0x4f/0x61
[  160.240106]  entry_INT80_32+0x2f/0x2f
[  160.240106] EIP: 0xb76b919e
[  160.240106] EFLAGS: 00000246 CPU: 0
[  160.240106] EAX: ffffffda EBX: 08053838 ECX: 08052188 EDX: 080537e8
[  160.240106] ESI: c0ed0000 EDI: 00000000 EBP: 080537e8 ESP: bfa13660
[  160.240106]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[  160.240106] Code: 59 8b 00 a8 01 0f 84 09 01 00 00 8b 07 66 25 00 f0 66 3d 00 80 75 61 89 f8 e8 3e e2 ff ff 84 c0 74 56 83 bf 48 02 00 00 00 75 02 <0f> 0b 81 7d e8 00 10 00 00 74 02 0f 0b 8b 43 04 8b 53 08 31 c9
[  160.240106] EIP: ext4_block_zero_page_range+0x1a7/0x2b4 SS:ESP: 0068:f47b7dac
[  160.317241] ---[ end trace d6a773a375c810a5 ]---

The problem is that when the kernel tries to truncate an inode in
ext4_truncate(), it tries to clear any on-disk data beyond i_size.
Without the encryption key, it can't do that, and so it triggers a
BUG.

E2fsck does *not* provide this service, and in practice most file
systems have their orphan list processed by e2fsck, so to avoid
crashing, this patch skips this step if we don't have access to the
encryption key (which is the case when processing the orphan list; in
all other cases, we will have the encryption key, or the kernel
wouldn't have allowed the file to be opened).

An open question is whether the fact that e2fsck isn't clearing the
bytes beyond i_size causing problems --- and if we've lived with it
not doing it for so long, can we drop this from the kernel replay of
the orphan list in all cases (not just when we don't have the key for
encrypted inodes).

Addresses-Google-Bug: #35209576

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agorc: raw decoder for keymap protocol is not loaded on register
Sean Young [Wed, 22 Feb 2017 21:48:01 +0000 (18:48 -0300)]
rc: raw decoder for keymap protocol is not loaded on register

commit 413808685dd7c9b54bbc5af79da2eaddd0fc3cb2 upstream.

When the protocol is set via the sysfs protocols attribute, the
decoder is loaded. However, when it is not when a device is first
plugged in or registered.

Fixes: acc1c3c ("[media] media: rc: load decoder modules on-demand")
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodm: flush queued bios when process blocks to avoid deadlock
Mikulas Patocka [Wed, 15 Feb 2017 16:26:10 +0000 (11:26 -0500)]
dm: flush queued bios when process blocks to avoid deadlock

commit d67a5f4b5947aba4bfe9a80a2b86079c215ca755 upstream.

Commit df2cb6daa4 ("block: Avoid deadlocks with bio allocation by
stacking drivers") created a workqueue for every bio set and code
in bio_alloc_bioset() that tries to resolve some low-memory deadlocks
by redirecting bios queued on current->bio_list to the workqueue if the
system is low on memory.  However other deadlocks (see below **) may
happen, without any low memory condition, because generic_make_request
is queuing bios to current->bio_list (rather than submitting them).

** the related dm-snapshot deadlock is detailed here:
https://www.redhat.com/archives/dm-devel/2016-July/msg00065.html

Fix this deadlock by redirecting any bios on current->bio_list to the
bio_set's rescue workqueue on every schedule() call.  Consequently,
when the process blocks on a mutex, the bios queued on
current->bio_list are dispatched to independent workqueus and they can
complete without waiting for the mutex to be available.

The structure blk_plug contains an entry cb_list and this list can contain
arbitrary callback functions that are called when the process blocks.
To implement this fix DM (ab)uses the onstack plug's cb_list interface
to get its flush_current_bio_list() called at schedule() time.

This fixes the snapshot deadlock - if the map method blocks,
flush_current_bio_list() will be called and it redirects bios waiting
on current->bio_list to appropriate workqueues.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1267650
Depends-on: df2cb6daa4 ("block: Avoid deadlocks with bio allocation by stacking drivers")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoserial_ir: ensure we're ready to receive interrupts
Sean Young [Sat, 25 Feb 2017 11:28:16 +0000 (08:28 -0300)]
serial_ir: ensure we're ready to receive interrupts

commit 0265634eb9e04a16ae99941c320718c38eb865e0 upstream.

When the interrupt requested with devm_request_irq(), serial_ir.rcdev
is still null so will cause null deference if the irq handler is called
early on.

Also ensure that timeout_timer is setup.

Link: http://lkml.kernel.org/r/CA+55aFxsh2uF8gi5sN_guY3Z+tiLv7LpJYKBw+y8vqLzp+TsnQ@mail.gmail.com
[mchehab@s-opensource.com: moved serial_ir_probe() back to its original place]

Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrm/i915/gvt: Fix superfluous newline in GVT_DISPLAY_READY env var
Takashi Iwai [Mon, 20 Feb 2017 13:58:25 +0000 (14:58 +0100)]
drm/i915/gvt: Fix superfluous newline in GVT_DISPLAY_READY env var

commit d8e9b2b9097c117880dc22933239d05199c60b96 upstream.

send_display_send_uevent() sends two environment variable, and the
first one GVT_DISPLAY_READY is set including a new line at the end of
the string; that is obviously superfluous and wrong -- at least, it
*looks* so when you only read the code.

However, it doesn't appear in the actual output by a (supposedly
unexpected) trick.  The code uses snprintf() and truncates the string
in size 20 bytes.  This makes the string as GVT_DISPLAY_READY=0 or
...=1 including the trailing NUL-letter.  That is, the '\n' found in
the format string is always cut off as a result.

Although the code gives the correct result, it is confusing.  This
patch addresses it, just removing the superfluous '\n' from the format
string for avoiding further confusion.  If the argument "ready" were
not a  bool, the size 20 should be corrected as well.  But it's a
bool, so we can leave the magic number 20 as is for now.

FWIW, the bug was spotted by a new GCC7 warning:
  drivers/gpu/drm/i915/gvt/handlers.c: In function 'pvinfo_mmio_write':
  drivers/gpu/drm/i915/gvt/handlers.c:1042:34: error: 'snprintf' output truncated before the last format character [-Werror=format-truncation=]
    snprintf(display_ready_str, 20, "GVT_DISPLAY_READY=%d\n", ready);
                                    ^~~~~~~~~~~~~~~~~~~~~~~~
  drivers/gpu/drm/i915/gvt/handlers.c:1042:2: note: 'snprintf' output 21 bytes into a destination of size 20
    snprintf(display_ready_str, 20, "GVT_DISPLAY_READY=%d\n", ready);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: 04d348ae3f0a ("drm/i915/gvt: vGPU display virtualization")
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1025903
Reported-by: Richard Biener <rguenther@suse.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoKVM: arm/arm64: Let vcpu thread modify its own active state
Jintack Lim [Mon, 6 Mar 2017 13:42:37 +0000 (05:42 -0800)]
KVM: arm/arm64: Let vcpu thread modify its own active state

commit 370a0ec1819990f8e2a93df7cc9c0146980ed45f upstream.

Currently, if a vcpu thread tries to change the active state of an
interrupt which is already on the same vcpu's AP list, it will loop
forever. Since the VGIC mmio handler is called after a vcpu has
already synced back the LR state to the struct vgic_irq, we can just
let it proceed safely.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoKVM: s390: Fix guest migration for huge guests resulting in panic
Janosch Frank [Thu, 2 Mar 2017 14:23:42 +0000 (15:23 +0100)]
KVM: s390: Fix guest migration for huge guests resulting in panic

commit 2e4d88009f57057df7672fa69a32b5224af54d37 upstream.

While we can technically not run huge page guests right now, we can
setup a guest with huge pages. Trying to migrate it will trigger a
VM_BUG_ON and, if the kernel is not configured to panic on a BUG, it
will happily try to work on non-existing page table entries.

With this patch, we always return "dirty" if we encounter a large page
when migrating. This at least fixes the immediate problem until we
have proper handling for both kind of pages.

Fixes: 15f36eb ("KVM: s390: Add proper dirty bitmap support to S390 kvm.")
Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoserial: samsung: Continue to work if DMA request fails
Krzysztof Kozlowski [Sat, 25 Feb 2017 16:36:44 +0000 (18:36 +0200)]
serial: samsung: Continue to work if DMA request fails

commit f98c7bce570bdbe344b74ff5daa7dfeef3f22929 upstream.

If DMA is not available (even when configured in DeviceTree), the driver
will fail the startup procedure thus making serial console not
available.

For example this causes boot failure on QEMU ARMv7 (Exynos4210, SMDKC210):
    [    1.302575] OF: amba_device_add() failed (-19) for /amba/pdma@12680000
    ...
    [   11.435732] samsung-uart 13800000.serial: DMA request failed
    [   72.963893] samsung-uart 13800000.serial: DMA request failed
    [   73.143361] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000

DMA is not necessary for serial to work, so continue with UART startup
after emitting a warning.

Fixes: 62c37eedb74c ("serial: samsung: add dma reqest/release functions")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoUSB: serial: io_ti: fix information leak in completion handler
Johan Hovold [Mon, 6 Mar 2017 16:36:40 +0000 (17:36 +0100)]
USB: serial: io_ti: fix information leak in completion handler

commit 654b404f2a222f918af9b0cd18ad469d0c941a8e upstream.

Add missing sanity check to the bulk-in completion handler to avoid an
integer underflow that can be triggered by a malicious device.

This avoids leaking 128 kB of memory content from after the URB transfer
buffer to user space.

Fixes: 8c209e6782ca ("USB: make actual_length in struct urb field u32")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoUSB: serial: io_ti: fix NULL-deref in interrupt callback
Johan Hovold [Mon, 6 Mar 2017 16:36:37 +0000 (17:36 +0100)]
USB: serial: io_ti: fix NULL-deref in interrupt callback

commit 0b1d250afb8eb9d65afb568bac9b9f9253a82b49 upstream.

Fix a NULL-pointer dereference in the interrupt callback should a
malicious device send data containing a bad port number by adding the
missing sanity check.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoUSB: iowarrior: fix NULL-deref in write
Johan Hovold [Tue, 7 Mar 2017 15:11:04 +0000 (16:11 +0100)]
USB: iowarrior: fix NULL-deref in write

commit de46e56653de7b3b54baa625bd582635008b8d05 upstream.

Make sure to verify that we have the required interrupt-out endpoint for
IOWarrior56 devices to avoid dereferencing a NULL-pointer in write
should a malicious device lack such an endpoint.

Fixes: 946b960d13c1 ("USB: add driver for iowarrior devices.")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoUSB: iowarrior: fix NULL-deref at probe
Johan Hovold [Tue, 7 Mar 2017 15:11:03 +0000 (16:11 +0100)]
USB: iowarrior: fix NULL-deref at probe

commit b7321e81fc369abe353cf094d4f0dc2fe11ab95f upstream.

Make sure to check for the required interrupt-in endpoint to avoid
dereferencing a NULL-pointer should a malicious device lack such an
endpoint.

Note that a fairly recent change purported to fix this issue, but added
an insufficient test on the number of endpoints only, a test which can
now be removed.

Fixes: 4ec0ef3a8212 ("USB: iowarrior: fix oops with malicious USB descriptors")
Fixes: 946b960d13c1 ("USB: add driver for iowarrior devices.")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoUSB: serial: omninet: fix reference leaks at open
Johan Hovold [Mon, 6 Mar 2017 16:36:38 +0000 (17:36 +0100)]
USB: serial: omninet: fix reference leaks at open

commit 30572418b445d85fcfe6c8fe84c947d2606767d8 upstream.

This driver needlessly took another reference to the tty on open, a
reference which was then never released on close. This lead to not just
a leak of the tty, but also a driver reference leak that prevented the
driver from being unloaded after a port had once been opened.

Fixes: 4a90f09b20f4 ("tty: usb-serial krefs")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoUSB: serial: safe_serial: fix information leak in completion handler
Johan Hovold [Mon, 6 Mar 2017 16:36:41 +0000 (17:36 +0100)]
USB: serial: safe_serial: fix information leak in completion handler

commit 8c76d7cd520ebffc1ea9ea0850d87a224a50c7f2 upstream.

Add missing sanity check to the bulk-in completion handler to avoid an
integer underflow that could be triggered by a malicious device.

This avoids leaking up to 56 bytes from after the URB transfer buffer to
user space.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agousb: host: xhci-plat: Fix timeout on removal of hot pluggable xhci controllers
Guenter Roeck [Thu, 9 Mar 2017 13:39:37 +0000 (15:39 +0200)]
usb: host: xhci-plat: Fix timeout on removal of hot pluggable xhci controllers

commit dcc7620cad5ad1326a78f4031a7bf4f0e5b42984 upstream.

Upstream commit 98d74f9ceaef ("xhci: fix 10 second timeout on removal of
PCI hotpluggable xhci controllers") fixes a problem with hot pluggable PCI
xhci controllers which can result in excessive timeouts, to the point where
the system reports a deadlock.

The same problem is seen with hot pluggable xhci controllers using the
xhci-plat driver, such as the driver used for Type-C ports on rk3399.
Similar to hot-pluggable PCI controllers, the driver for this chip
removes the xhci controller from the system when the Type-C cable is
disconnected.

The solution for PCI devices works just as well for non-PCI devices
and avoids the problem.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agousb: host: xhci-dbg: HCIVERSION should be a binary number
Peter Chen [Thu, 9 Mar 2017 13:39:36 +0000 (15:39 +0200)]
usb: host: xhci-dbg: HCIVERSION should be a binary number

commit f95e60a7dbecd2de816bb3ad517b3d4fbc20b507 upstream.

According to xHCI spec, HCIVERSION containing a BCD encoding
of the xHCI specification revision number, 0100h corresponds
to xHCI version 1.0. Change "100" as "0x100".

Cc: Lu Baolu <baolu.lu@linux.intel.com>
Fixes: 04abb6de2825 ("xhci: Read and parse new xhci 1.1 capability register")
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoRevert "usb: gadget: uvc: Add missing call for additional setup data"
Roger Quadros [Thu, 2 Mar 2017 08:44:58 +0000 (10:44 +0200)]
Revert "usb: gadget: uvc: Add missing call for additional setup data"

commit eb38d913c27f32f4df173791051fecf6aca34173 upstream.

This reverts commit 4fbac5206afd01b717d4bdc58793d471f3391b4b.

This commit breaks g_webcam when used with uvc-gadget [1].

The user space application (e.g. uvc-gadget) is responsible for
sending response to UVC class specific requests on control endpoint
in uvc_send_response() in uvc_v4l2.c.

The bad commit was causing a duplicate response to be sent with
incorrect response data thus causing UVC probe to fail at the host
and broken control transfer endpoint at the gadget.

[1] - git://git.ideasonboard.org/uvc-gadget.git

Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agousb: gadget: function: f_fs: pass companion descriptor along
Felipe Balbi [Tue, 31 Jan 2017 12:54:45 +0000 (14:54 +0200)]
usb: gadget: function: f_fs: pass companion descriptor along

commit 2bfa0719ac2a9b2f3c91345873d3cdebd0296ba9 upstream.

If we're dealing with SuperSpeed endpoints, we need
to make sure to pass along the companion descriptor
and initialize fields needed by the Gadget
API. Eventually, f_fs.c should be converted to use
config_ep_by_speed() like all other functions,
though.

Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agousb: ohci-at91: Do not drop unhandled USB suspend control requests
Jelle Martijn Kok [Tue, 21 Feb 2017 11:48:18 +0000 (12:48 +0100)]
usb: ohci-at91: Do not drop unhandled USB suspend control requests

commit 85550f9148a852ed363a386577ad31b97b95dfb8 upstream.

In patch 2e2aa1bc7eff90ecm, USB suspend and wakeup control requests are
passed to SFR_OHCIICR register. If a processor does not have such a
register, this hub control request will be dropped.

If no such a SFR register is available, all USB suspend control requests
will now be processed using ohci_hub_control()
(like before patch 2e2aa1bc7eff90ecm.)

Tested on an Atmel AT91SAM9G20 with an on-board TI TUSB2046B hub chip
If the last USB device is unplugged from the USB hub, the hub goes into
sleep and will not wakeup when an USB devices is inserted.

Fixes: 2e2aa1bc7eff90ec ("usb: ohci-at91: Forcibly suspend ports while USB suspend")
Signed-off-by: Jelle Martijn Kok <jmkok@youcom.nl>
Tested-by: Wenyou Yang <wenyou.yang@atmel.com>
Cc: Wenyou Yang <wenyou.yang@atmel.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agousb: dwc3-omap: Fix missing break in dwc3_omap_set_mailbox()
Roger Quadros [Wed, 15 Feb 2017 11:38:22 +0000 (13:38 +0200)]
usb: dwc3-omap: Fix missing break in dwc3_omap_set_mailbox()

commit 0913750f9fb6f26bcd00c8f9dd9a8d1b8d031246 upstream.

We need to break from all cases if we want to treat
each one of them separately.

Reported-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Fixes: d2728fb3e01f ("usb: dwc3: omap: Pass VBUS and ID events transparently")
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agousb: dwc3: gadget: make Set Endpoint Configuration macros safe
Felipe Balbi [Tue, 31 Jan 2017 11:24:54 +0000 (13:24 +0200)]
usb: dwc3: gadget: make Set Endpoint Configuration macros safe

commit 7369090a9fb57c3fc705ce355d2e4523a5a24716 upstream.

Some gadget drivers are bad, bad boys. We notice
that ADB was passing bad Burst Size which caused top
bits of param0 to be overwritten which confused DWC3
when running this command.

In order to avoid future issues, we're going to make
sure values passed by macros are always safe for the
controller. Note that ADB still needs a fix to *not*
pass bad values.

Reported-by: Mohamed Abbas <mohamed.abbas@intel.com>
Sugested-by: Adam Andruszak <adam.andruszak@intel.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agousb: gadget: dummy_hcd: clear usb_gadget region before registration
Peter Chen [Tue, 28 Feb 2017 06:25:45 +0000 (14:25 +0800)]
usb: gadget: dummy_hcd: clear usb_gadget region before registration

commit 5bbc852676ae08e818241cf66a3ffe4be44225c4 upstream.

When the user does device unbind and rebind test, the kernel will
show below dump due to usb_gadget memory region is dirty after unbind.
Clear usb_gadget region for every new probe.

root@imx6qdlsolo:/sys/bus/platform/drivers/dummy_udc# echo dummy_udc.0 > bind
[  102.523312] kobject (eddd78b0): tried to init an initialized object, something is seriously wrong.
[  102.532447] CPU: 0 PID: 734 Comm: sh Not tainted 4.10.0-rc7-00872-g1b2b8e9 #1298
[  102.539866] Hardware name: Freescale i.MX6 SoloX (Device Tree)
[  102.545717] Backtrace:
[  102.548225] [<c010d090>] (dump_backtrace) from [<c010d338>] (show_stack+0x18/0x1c)
[  102.555822]  r7:ede34000 r6:60010013 r5:00000000 r4:c0f29418
[  102.561512] [<c010d320>] (show_stack) from [<c040c2a4>] (dump_stack+0xb4/0xe8)
[  102.568764] [<c040c1f0>] (dump_stack) from [<c040e6d4>] (kobject_init+0x80/0x9c)
[  102.576187]  r10:0000001f r9:eddd7000 r8:eeaf8c10 r7:eddd78a8 r6:c177891c r5:c0f3b060
[  102.584036]  r4:eddd78b0 r3:00000000
[  102.587641] [<c040e654>] (kobject_init) from [<c05359a4>] (device_initialize+0x28/0xf8)
[  102.595665]  r5:eebc4800 r4:eddd78a8
[  102.599268] [<c053597c>] (device_initialize) from [<c05382ac>] (device_register+0x14/0x20)
[  102.607556]  r7:eddd78a8 r6:00000000 r5:eebc4800 r4:eddd78a8
[  102.613256] [<c0538298>] (device_register) from [<c0668ef4>] (usb_add_gadget_udc_release+0x8c/0x1ec)
[  102.622410]  r5:eebc4800 r4:eddd7860
[  102.626015] [<c0668e68>] (usb_add_gadget_udc_release) from [<c0669068>] (usb_add_gadget_udc+0x14/0x18)
[  102.635351]  r10:0000001f r9:eddd7000 r8:eddd788c r7:bf003770 r6:eddd77f8 r5:eddd7818
[  102.643198]  r4:eddd785c r3:eddd7b24
[  102.646834] [<c0669054>] (usb_add_gadget_udc) from [<bf003428>] (dummy_udc_probe+0x170/0x1c4 [dummy_hcd])
[  102.656458] [<bf0032b8>] (dummy_udc_probe [dummy_hcd]) from [<c053d114>] (platform_drv_probe+0x54/0xb8)
[  102.665881]  r10:00000008 r9:c1778960 r8:bf004128 r7:fffffdfb r6:bf004128 r5:eeaf8c10
[  102.673727]  r4:eeaf8c10
[  102.676293] [<c053d0c0>] (platform_drv_probe) from [<c053b160>] (driver_probe_device+0x264/0x474)
[  102.685186]  r7:00000000 r6:00000000 r5:c1778960 r4:eeaf8c10
[  102.690876] [<c053aefc>] (driver_probe_device) from [<c05397c4>] (bind_store+0xb8/0x14c)
[  102.698994]  r10:eeb3bb4c r9:ede34000 r8:0000000c r7:eeaf8c44 r6:bf004128 r5:c0f3b668
[  102.706840]  r4:eeaf8c10
[  102.709402] [<c053970c>] (bind_store) from [<c0538ca8>] (drv_attr_store+0x28/0x34)
[  102.716998]  r9:ede34000 r8:00000000 r7:ee3863c0 r6:ee3863c0 r5:c0538c80 r4:c053970c
[  102.724776] [<c0538c80>] (drv_attr_store) from [<c029c930>] (sysfs_kf_write+0x50/0x54)
[  102.732711]  r5:c0538c80 r4:0000000c
[  102.736313] [<c029c8e0>] (sysfs_kf_write) from [<c029be84>] (kernfs_fop_write+0x100/0x214)
[  102.744599]  r7:ee3863c0 r6:eeb3bb40 r5:00000000 r4:00000000
[  102.750287] [<c029bd84>] (kernfs_fop_write) from [<c0222dd8>] (__vfs_write+0x34/0x120)
[  102.758231]  r10:00000000 r9:ede34000 r8:c0108bc4 r7:0000000c r6:ede35f80 r5:c029bd84
[  102.766077]  r4:ee223780
[  102.768638] [<c0222da4>] (__vfs_write) from [<c0224678>] (vfs_write+0xa8/0x170)
[  102.775974]  r9:ede34000 r8:c0108bc4 r7:ede35f80 r6:01861cb0 r5:ee223780 r4:0000000c
[  102.783743] [<c02245d0>] (vfs_write) from [<c0225498>] (SyS_write+0x4c/0xa8)
[  102.790818]  r9:ede34000 r8:c0108bc4 r7:0000000c r6:01861cb0 r5:ee223780 r4:ee223780
[  102.798595] [<c022544c>] (SyS_write) from [<c0108a20>] (ret_fast_syscall+0x0/0x1c)
[  102.806188]  r7:00000004 r6:b6e83d58 r5:01861cb0 r4:0000000c

Fixes: 90fccb529d24 ("usb: gadget: Gadget directory cleanup - group UDC drivers")
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Tested-by: Xiaolong Ye <xiaolong.ye@intel.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoPCI: Prevent VPD access for QLogic ISP2722
Ethan Zhao [Mon, 27 Feb 2017 08:08:44 +0000 (17:08 +0900)]
PCI: Prevent VPD access for QLogic ISP2722

commit 0d5370d1d85251e5893ab7c90a429464de2e140b upstream.

QLogic ISP2722-based 16/32Gb Fibre Channel to PCIe Adapter has the VPD
access issue too, while read the common pci-sysfs access interface shown as

 /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0/vpd

with simple 'cat' could cause system hang and panic:

  Kernel panic - not syncing: An NMI occurred. Depending on your system the reason for the NMI is logged in any one of the following resources:
  1. Integrated Management Log (IML)
  2. OA Syslog
  3. OA Forward Progress Log
  4. iLO Event Log
  CPU: 0 PID: 15070 Comm: udevadm Not tainted 4.1.12
  Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015
   0000000000000086 000000007f0cdf51 ffff880c4fa05d58 ffffffff817193de
   ffffffffa00b42d8 0000000000000075 ffff880c4fa05dd8 ffffffff81714072
   0000000000000008 ffff880c4fa05de8 ffff880c4fa05d88 000000007f0cdf51
  Call Trace:
   <NMI>  [<ffffffff817193de>] dump_stack+0x63/0x81
   [<ffffffff81714072>] panic+0xd0/0x20e
   [<ffffffffa00b390d>] hpwdt_pretimeout+0xdd/0xe0 [hpwdt]
   [<ffffffff81021fc9>] ? sched_clock+0x9/0x10
   [<ffffffff8101c101>] nmi_handle+0x91/0x170
   [<ffffffff8101c10c>] ? nmi_handle+0x9c/0x170
   [<ffffffff8101c5fe>] io_check_error+0x1e/0xa0
   [<ffffffff8101c719>] default_do_nmi+0x99/0x140
   [<ffffffff8101c8b4>] do_nmi+0xf4/0x170
   [<ffffffff817232c5>] end_repeat_nmi+0x1a/0x1e
   [<ffffffff815d724b>] ? pci_conf1_read+0xeb/0x120
   [<ffffffff815d724b>] ? pci_conf1_read+0xeb/0x120
   [<ffffffff815d724b>] ? pci_conf1_read+0xeb/0x120
   <<EOE>>  [<ffffffff815db4b3>] raw_pci_read+0x23/0x40
   [<ffffffff815db4fc>] pci_read+0x2c/0x30
   [<ffffffff8136f612>] pci_user_read_config_word+0x72/0x110
   [<ffffffff8136f746>] pci_vpd_pci22_wait+0x96/0x130
   [<ffffffff8136ff9b>] pci_vpd_pci22_read+0xdb/0x1a0
   [<ffffffff8136ea30>] pci_read_vpd+0x20/0x30
   [<ffffffff8137d590>] read_vpd_attr+0x30/0x40
   [<ffffffff8128e037>] sysfs_kf_bin_read+0x47/0x70
   [<ffffffff8128d24e>] kernfs_fop_read+0xae/0x180
   [<ffffffff8120dd97>] __vfs_read+0x37/0x100
   [<ffffffff812ba7e4>] ? security_file_permission+0x84/0xa0
   [<ffffffff8120e366>] ? rw_verify_area+0x56/0xe0
   [<ffffffff8120e476>] vfs_read+0x86/0x140
   [<ffffffff8120f3f5>] SyS_read+0x55/0xd0
   [<ffffffff81720f2e>] system_call_fastpath+0x12/0x71
  Shutting down cpus with NMI
  Kernel Offset: disabled
  drm_kms_helper: panic occurred, switching back to text console

So blacklist the access to its VPD.

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agopowerpc/xics: Work around limitations of OPAL XICS priority handling
Balbir Singh [Fri, 3 Mar 2017 00:58:44 +0000 (11:58 +1100)]
powerpc/xics: Work around limitations of OPAL XICS priority handling

commit a69e2fb70350a66f91175cd2625f1e8215c5b6e9 upstream.

The CPPR (Current Processor Priority Register) of a XICS interrupt
presentation controller contains a value N, such that only interrupts
with a priority "more favoured" than N will be received by the CPU,
where "more favoured" means "less than". So if the CPPR has the value 5
then only interrupts with a priority of 0-4 inclusive will be received.

In theory the CPPR can support a value of 0 to 255 inclusive.
In practice Linux only uses values of 0, 4, 5 and 0xff. Setting the CPPR
to 0 rejects all interrupts, setting it to 0xff allows all interrupts.
The values 4 and 5 are used to differentiate IPIs from external
interrupts. Setting the CPPR to 5 allows IPIs to be received but not
external interrupts.

The CPPR emulation in the OPAL XICS implementation only directly
supports priorities 0 and 0xff. All other priorities are considered
equivalent, and mapped to a single priority value internally. This means
when using icp-opal we can not allow IPIs but not externals.

This breaks Linux's use of priority values when a CPU is hot unplugged.
After migrating IRQs away from the CPU that is being offlined, we set
the priority to 5, meaning we still want the offline CPU to receive
IPIs. But the effect of the OPAL XICS emulation's use of a single
priority value is that all interrupts are rejected by the CPU. With the
CPU offline, and not receiving IPIs, we may not be able to wake it up to
bring it back online.

The first part of the fix is in icp_opal_set_cpu_priority(). CPPR values
of 0 to 4 inclusive will correctly cause all interrupts to be rejected,
so we pass those CPPR values through to OPAL. However if we are called
with a CPPR of 5 or greater, the caller is expecting to be able to allow
IPIs but not external interrupts. We know this doesn't work, so instead
of rejecting all interrupts we choose the opposite which is to allow all
interrupts. This is still not correct behaviour, but we know for the
only existing caller (xics_migrate_irqs_away()), that it is the better
option.

The other part of the fix is in xics_migrate_irqs_away(). Instead of
setting priority (CPPR) to 0, and then back to 5 before migrating IRQs,
we migrate the IRQs before setting the priority back to 5. This should
have no effect on an ICP backend with a working set_priority(), and on
icp-opal it means we will keep all interrupts blocked until after we've
finished doing the IRQ migration. Additionally we wait for 5ms after
doing the migration to make sure there are no IRQs in flight.

Fixes: d74361881f0d ("powerpc/xics: Add ICP OPAL backend")
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Reported-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Tested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
[mpe: Rewrote comments and change log, change delay to 5ms]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agopowerpc/booke: Fix boot crash due to null hugepd
Laurentiu Tudor [Thu, 16 Feb 2017 15:11:29 +0000 (09:11 -0600)]
powerpc/booke: Fix boot crash due to null hugepd

commit 3fb66a70a4ae886445743354e4b60e54058bb3ff upstream.

On 32-bit book-e machines, hugepd_ok() no longer takes into account null
hugepd values, causing this crash at boot:

  Unable to handle kernel paging request for data at address 0x80000000
  ...
  NIP [c0018378] follow_huge_addr+0x38/0xf0
  LR [c001836c] follow_huge_addr+0x2c/0xf0
  Call Trace:
   follow_huge_addr+0x2c/0xf0 (unreliable)
   follow_page_mask+0x40/0x3e0
   __get_user_pages+0xc8/0x450
   get_user_pages_remote+0x8c/0x250
   copy_strings+0x110/0x390
   copy_strings_kernel+0x2c/0x50
   do_execveat_common+0x478/0x630
   do_execve+0x2c/0x40
   try_to_run_init_process+0x18/0x60
   kernel_init+0xbc/0x110
   ret_from_kernel_thread+0x5c/0x64

This impacts all nxp (ex-freescale) 32-bit booke platforms.

This was caused by the change of hugepd_t.pd from signed to unsigned,
and the update to the nohash version of hugepd_ok(). Previously
hugepd_ok() could exclude all non-huge and NULL pgds using > 0, whereas
now we need to explicitly check that the value is not zero and also that
PD_HUGE is *clear*.

This isn't protected by the pgd_none() check in __find_linux_pte_or_hugepte()
because on 32-bit we use pgtable-nopud.h, which causes the pgd_none()
check to be always false.

Fixes: 20717e1ff526 ("powerpc/mm: Fix little-endian 4K hugetlb")
Reported-by: Madalin-Cristian Bucur <madalin.bucur@nxp.com>
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
[mpe: Flesh out change log details.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agopowerpc: Emulation support for load/store instructions on LE
Ravi Bangoria [Tue, 14 Feb 2017 09:16:42 +0000 (14:46 +0530)]
powerpc: Emulation support for load/store instructions on LE

commit e148bd17f48bd17fca2f4f089ec879fa6e47e34c upstream.

emulate_step() uses a number of underlying kernel functions that were
initially not enabled for LE. This has been rectified since. So, fix
emulate_step() for LE for the corresponding instructions.

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoi2c: add missing of_node_put in i2c_mux_del_adapters
Qi Hou [Fri, 3 Mar 2017 07:57:11 +0000 (15:57 +0800)]
i2c: add missing of_node_put in i2c_mux_del_adapters

commit 2e1e4949f9dfb053122785cd73540bb1e61f768b upstream.

Refcount of of_node is increased with of_node_get() in i2c_mux_add_adapter().
It must be decreased with of_node_put() in i2c_mux_del_adapters().

Signed-off-by: Qi Hou <qi.hou@windriver.com>
Reviewed-by: Zhang Xiao <xiao.zhang@windriver.com>
Signed-off-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodw2102: don't do DMA on stack
Jonathan McDowell [Wed, 15 Feb 2017 20:29:15 +0000 (18:29 -0200)]
dw2102: don't do DMA on stack

commit 606142af57dad981b78707234cfbd15f9f7b7125 upstream.

On Kernel 4.9, WARNINGs about doing DMA on stack are hit at
the dw2102 driver: one in su3000_power_ctrl() and the other in tt_s2_4600_frontend_attach().

Both were due to the use of buffers on the stack as parameters to
dvb_usb_generic_rw() and the resulting attempt to do DMA with them.

The device was non-functional as a result.

So, switch this driver over to use a buffer within the device state
structure, as has been done with other DVB-USB drivers.

Tested with TechnoTrend TT-connect S2-4600.

[mchehab@osg.samsung.com: fixed a warning at su3000_i2c_transfer() that
 state var were dereferenced before check 'd']
Signed-off-by: Jonathan McDowell <noodles@earth.li>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoefi/arm: Fix boot crash with CONFIG_CPUMASK_OFFSTACK=y
Ard Biesheuvel [Wed, 1 Mar 2017 19:05:54 +0000 (19:05 +0000)]
efi/arm: Fix boot crash with CONFIG_CPUMASK_OFFSTACK=y

commit d1eb98143c56f24fef125f5bbed49ae0b52fb7d6 upstream.

On ARM and arm64, we use a dedicated mm_struct to map the UEFI
Runtime Services regions, which allows us to map those regions
on demand, and in a way that is guaranteed to be compatible
with incoming kernels across kexec.

As it turns out, we don't fully initialize the mm_struct in the
same way as process mm_structs are initialized on fork(), which
results in the following crash on ARM if CONFIG_CPUMASK_OFFSTACK=y
is enabled:

  ...
  EFI Variables Facility v0.08 2004-May-17
  Unable to handle kernel NULL pointer dereference at virtual address 00000000
  [...]
  Process swapper/0 (pid: 1)
  ...
  __memzero()
  check_and_switch_context()
  virt_efi_get_next_variable()
  efivar_init()
  efivars_sysfs_init()
  do_one_initcall()
  ...

This is due to a missing call to mm_init_cpumask(), so add it.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1488395154-29786-1-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoucount: Remove the atomicity from ucount->count
Eric W. Biederman [Sun, 5 Mar 2017 21:03:22 +0000 (15:03 -0600)]
ucount: Remove the atomicity from ucount->count

commit 040757f738e13caaa9c5078bca79aa97e11dde88 upstream.

Always increment/decrement ucount->count under the ucounts_lock.  The
increments are there already and moving the decrements there means the
locking logic of the code is simpler.  This simplification in the
locking logic fixes a race between put_ucounts and get_ucounts that
could result in a use-after-free because the count could go zero then
be found by get_ucounts and then be freed by put_ucounts.

A bug presumably this one was found by a combination of syzkaller and
KASAN.  JongWhan Kim reported the syzkaller failure and Dmitry Vyukov
spotted the race in the code.

Fixes: f6b2db1a3e8d ("userns: Make the count of user namespaces per user")
Reported-by: JongHwan Kim <zzoru007@gmail.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agotracing: Add #undef to fix compile error
Rik van Riel [Thu, 29 Sep 2016 02:55:54 +0000 (22:55 -0400)]
tracing: Add #undef to fix compile error

commit bf7165cfa23695c51998231c4efa080fe1d3548d upstream.

There are several trace include files that define TRACE_INCLUDE_FILE.

Include several of them in the same .c file (as I currently have in
some code I am working on), and the compile will blow up with a
"warning: "TRACE_INCLUDE_FILE" redefined #define TRACE_INCLUDE_FILE syscalls"

Every other include file in include/trace/events/ avoids that issue
by having a #undef TRACE_INCLUDE_FILE before the #define; syscalls.h
should have one, too.

Link: http://lkml.kernel.org/r/20160928225554.13bd7ac6@annuminas.surriel.com
Fixes: b8007ef74222 ("tracing: Separate raw syscall from syscall tracer")
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoi2c: bcm2835: Avoid possible NULL ptr dereference
Stefan Wahren [Thu, 16 Feb 2017 21:20:45 +0000 (21:20 +0000)]
i2c: bcm2835: Avoid possible NULL ptr dereference

commit ababb08938df7ac245d30a58b95b94ecf8dc04fc upstream.

Since commit e2474541032d ("bcm2835: Fix hang for writing messages
larger than 16 bytes") the interrupt handler is prone to a possible
NULL pointer dereference. This could happen if an interrupt fires
before curr_msg is set by bcm2835_i2c_xfer_msg() and randomly occurs
on the RPi 3. Even this is an unexpected behavior the driver must
handle that with an error instead of a crash.

Reported-by: Peter Robinson <pbrobinson@gmail.com>
Fixes: e2474541032d ("bcm2835: Fix hang for writing messages larger than 16 bytes")
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Acked-by: Noralf Trønnes <noralf@tronnes.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoMIPS: ralink: Remove unused rt*_wdt_reset functions
Arnd Bergmann [Tue, 17 Jan 2017 15:18:43 +0000 (16:18 +0100)]
MIPS: ralink: Remove unused rt*_wdt_reset functions

commit 886f9c69fc68f56ddea34d3de51ac1fc2ac8dfbc upstream.

All pointers to these functions were removed, so now they produce
warnings:

arch/mips/ralink/rt305x.c:92:13: error: 'rt305x_wdt_reset' defined but not used [-Werror=unused-function]

This removes the functions. If we need them again, the patch can be
reverted later.

Fixes: f576fb6a0700 ("MIPS: ralink: cleanup the soc specific pinmux data")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: John Crispin <john@phrozen.org>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/15044/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoMIPS: ralink: Remove unused timer functions
Arnd Bergmann [Tue, 17 Jan 2017 15:18:41 +0000 (16:18 +0100)]
MIPS: ralink: Remove unused timer functions

commit d92240d12a9c010e0094bbc9280aae4a6be9a2d5 upstream.

The functions were originally used for the module unload path,
but are not referenced any more and just cause warnings:

arch/mips/ralink/timer.c:104:13: error: 'rt_timer_disable' defined but not used [-Werror=unused-function]
arch/mips/ralink/timer.c:74:13: error: 'rt_timer_free' defined but not used [-Werror=unused-function]

Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Fixes: 62ee73d284e7 ("MIPS: ralink: Make timer explicitly non-modular")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: John Crispin <john@phrozen.org>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/15041/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>