]> git.itanic.dy.fi Git - linux-stable/log
linux-stable
7 years agoLinux 4.9.4 v4.9.4
Greg Kroah-Hartman [Sun, 15 Jan 2017 12:43:07 +0000 (13:43 +0100)]
Linux 4.9.4

7 years agortlwifi: rtl_usb: Fix missing entry in USB driver's private data
Larry Finger [Wed, 21 Dec 2016 17:18:55 +0000 (11:18 -0600)]
rtlwifi: rtl_usb: Fix missing entry in USB driver's private data

commit 60f59ce0278557f7896d5158ae6d12a4855a72cc upstream.

These drivers need to be able to reference "struct ieee80211_hw" from
the driver's private data, and vice versa. The USB driver failed to
store the address of ieee80211_hw in the private data. Although this
bug has been present for a long time, it was not exposed until
commit ba9f93f82aba ("rtlwifi: Fix enter/exit power_save").

Fixes: ba9f93f82aba ("rtlwifi: Fix enter/exit power_save")
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agortlwifi: Fix enter/exit power_save
Larry Finger [Sat, 26 Nov 2016 20:43:35 +0000 (14:43 -0600)]
rtlwifi: Fix enter/exit power_save

commit ba9f93f82abafe2552eac942ebb11c2df4f8dd7f upstream.

In commit a5ffbe0a1993 ("rtlwifi: Fix scheduling while atomic bug") and
commit a269913c52ad ("rtlwifi: Rework rtl_lps_leave() and rtl_lps_enter()
to use work queue"), an error was introduced in the power-save routines
due to the fact that leaving PS was delayed by the use of a work queue.

This problem is fixed by detecting if the enter or leave routines are
in interrupt mode. If so, the workqueue is used to place the request.
If in normal mode, the enter or leave routines are called directly.

Fixes: a269913c52ad ("rtlwifi: Rework rtl_lps_leave() and rtl_lps_enter() to use work queue")
Reported-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Stable <stable@vger.kernel.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrm/i915/gen9: Fix PCODE polling during CDCLK change notification
Imre Deak [Mon, 5 Dec 2016 16:27:37 +0000 (18:27 +0200)]
drm/i915/gen9: Fix PCODE polling during CDCLK change notification

commit 2c7d0602c815277f7cb7c932b091288710d8aba7 upstream.

commit 848496e5902833600f7992f4faa82dc1546051ba
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Wed Jul 13 16:32:03 2016 +0300

    drm/i915: Wait up to 3ms for the pcu to ack the cdclk change request on SKL

increased the timeout to match the spec, but we still see a timeout on
at least one SKL. A CDCLK change request following the failed one will
succeed nevertheless.

I could reproduce this problem easily by running kms_pipe_crc_basic in a
loop. In all failure cases _wait_for() was pre-empted for >3ms and so in
the worst case - when the pre-emption happened right after calculating
timeout__ in _wait_for() - we called skl_cdclk_wait_for_pcu_ready() only
once which failed and so _wait_for() timed out. As opposed to this the
spec says to keep retrying the request for at most a 3ms period.

To fix this send the first request explicitly to guarantee that there is
3ms between the first and last request. Though this matches the spec, I
noticed that in rare cases this can still time out if we sent only a few
requests (in the worst case 2) _and_ PCODE is busy for some reason even
after a previous request and a 3ms delay. To work around this retry the
polling with pre-emption disabled to maximize the number of requests.
Also increase the timeout to 10ms to account for interrupts that could
reduce the number of requests. With this change I couldn't trigger
the problem.

v2:
- Use 1ms poll period instead of 10us. (Chris)
v3:
- Poll with pre-emption disabled to increase the number of request
  attempts. (Ville, Chris)
- Factor out a helper to poll, it's also needed by the next patch.
v4:
- Pass reply_mask, reply to skl_pcode_request(), instead of assuming the
  reply is generic. (Ville)
v5:
- List the request specific timeout values as code comment. (Ville)
v6:
- Try the poll first with preemption enabled.
- Add code comment about first request being queued by PCODE. (Art)
- Add timeout_base_ms argument. (Ville)
v7:
- Clarify code comment about first queued request. (Chris)

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Art Runyan <arthur.j.runyan@intel.com>
Cc: <stable@vger.kernel.org> # v4.2- : 3b2c171 : drm/i915: Wait up to 3ms
Cc: <stable@vger.kernel.org> # v4.2-
Fixes: 5d96d8afcfbb ("drm/i915/skl: Deinit/init the display at suspend/resume")
Reference: https://bugs.freedesktop.org/show_bug.cgi?id=97929
Testcase: igt/kms_pipe_crc_basic/suspend-read-crc-pipe-B
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1480955258-26311-1-git-send-email-imre.deak@intel.com
(cherry picked from commit a0b8a1fe34430c3a82258e8cb45f5968bdf31afd)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoALSA: usb-audio: Add a quirk for Plantronics BT600
Dennis Kadioglu [Mon, 9 Jan 2017 16:10:46 +0000 (17:10 +0100)]
ALSA: usb-audio: Add a quirk for Plantronics BT600

commit 2e40795c3bf344cfb5220d94566205796e3ef19a upstream.

Plantronics BT600 does not support reading the sample rate which leads
to many lines of "cannot get freq at ep 0x1" and "cannot get freq at
ep 0x82". This patch adds the USB ID of the BT600 to quirks.c and
avoids those error messages.

Signed-off-by: Dennis Kadioglu <denk@post.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agospi: mvebu: fix baudrate calculation for armada variant
Uwe Kleine-König [Thu, 8 Dec 2016 16:37:08 +0000 (17:37 +0100)]
spi: mvebu: fix baudrate calculation for armada variant

commit 7243e0b20729d372e97763617a7a9c89f29b33e1 upstream.

The calculation of SPR and SPPR doesn't round correctly at several
places which might result in baud rates that are too big. For example
with tclk_hz = 250000001 and target rate 25000000 it determined a
divider of 10 which is wrong.

Instead of fixing all the corner cases replace the calculation by an
algorithm without a loop which should even be quicker to execute apart
from being correct.

Fixes: df59fa7f4bca ("spi: orion: support armada extended baud rates")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoARM: omap2+: am437x: rollback to use omap3_gptimer_timer_init()
Grygorii Strashko [Mon, 5 Dec 2016 03:57:44 +0000 (09:27 +0530)]
ARM: omap2+: am437x: rollback to use omap3_gptimer_timer_init()

commit f86a2c875fd146d9b82c8fdd86d31084507bcf4c upstream.

The commit 55ee7017ee31 ("arm: omap2: board-generic: use
omap4_local_timer_init for AM437x") unintentionally changes the
clocksource devices for AM437x from OMAP GP Timer to SyncTimer32K.

Unfortunately, the SyncTimer32K is starving from frequency deviation
as mentioned in commit 5b5c01359152 ("ARM: OMAP2+: AM43x: Use gptimer
as clocksource") and, as reported by Franklin [1], even its monotonic
nature is under question (most probably there is a HW issue, but it's
still under investigation).

Taking into account above facts It's reasonable to rollback to the use
of omap3_gptimer_timer_init().

[1] http://www.spinics.net/lists/linux-omap/msg127425.html

Fixes: 55ee7017ee31 ("arm: omap2: board-generic: use
omap4_local_timer_init for AM437x")
Reported-by: Cooper Jr., Franklin <fcooper@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoARM: 8631/1: clkdev: Detect errors in clk_hw_register_clkdev() for mass registration
Geert Uytterhoeven [Tue, 22 Nov 2016 11:33:11 +0000 (12:33 +0100)]
ARM: 8631/1: clkdev: Detect errors in clk_hw_register_clkdev() for mass registration

commit 9388093db44356af911adf3d355b7544a13a63cd upstream.

Unlike clk_register_clkdev(), clk_hw_register_clkdev() doesn't check for
passed error objects from a previous registration call. Hence the caller
of clk_hw_register_*() has to check for errors before calling
clk_hw_register_clkdev*().

Make clk_hw_register_clkdev() more similar to clk_register_clkdev() by
adding this error check, removing the burden from callers that do mass
registration.

Fixes: e4f1b49bda6d6aa2 ("clkdev: Add clk_hw based registration APIs")
Fixes: 944b9a41e004534f ("clk: ls1x: Migrate to clk_hw based OF and registration APIs")
Fixes: 44ce9a9ae977736f ("MIPS: TXx9: Convert to Common Clock Framework")
Fixes: f48d947a162dfa9d ("clk: clps711x: Migrate to clk_hw based OF and registration APIs")
Fixes: b4626a7f489238a5 ("CLK: Add Loongson1C clock support")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoARM: OMAP4+: Fix bad fallthrough for cpuidle
Tony Lindgren [Mon, 7 Nov 2016 23:50:11 +0000 (16:50 -0700)]
ARM: OMAP4+: Fix bad fallthrough for cpuidle

commit cbf2642872333547b56b8c4d943f5ed04ac9a4ee upstream.

We don't want to fall through to a bunch of errors for retention
if PM_OMAP4_CPU_OSWR_DISABLE is not configured for a SoC.

Fixes: 6099dd37c669 ("ARM: OMAP5 / DRA7: Enable CPU RET on suspend")
Acked-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoARM: OMAP5: Fix build for PM code
Tony Lindgren [Mon, 7 Nov 2016 23:50:10 +0000 (16:50 -0700)]
ARM: OMAP5: Fix build for PM code

commit da6d5993bf951846956903bee4f0eafd918250db upstream.

It's CONFIG_SOC_OMAP5, not CONFIG_ARCH_OMAP5. Looks like make randconfig
builds have not hit this one yet.

Fixes: b3bf289c1c45 ("ARM: OMAP2+: Fix build with CONFIG_SMP and CONFIG_PM is not set")
Acked-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoARM: OMAP5: Fix mpuss_early_init
Tony Lindgren [Mon, 7 Nov 2016 23:50:11 +0000 (16:50 -0700)]
ARM: OMAP5: Fix mpuss_early_init

commit 8a8be46afeaa47aed1debe7e9b18152f9826a6b7 upstream.

We need to properly initialize mpuss also on omap5 like we do on omap4.
Otherwise we run into similar kexec problems like we had on omap4 when
trying to kexec from a kernel with PM initialized.

Fixes: 0573b957fc21 ("ARM: OMAP4+: Prevent CPU1 related hang with kexec")
Acked-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agobus: arm-ccn: Prevent hotplug callback leak
Thomas Gleixner [Thu, 22 Dec 2016 10:14:06 +0000 (11:14 +0100)]
bus: arm-ccn: Prevent hotplug callback leak

commit 26242b330093fd14c2e94fb6cbdf0f482ab26576 upstream.

In case the driver registration fails, the hotplug callback is leaked.

Not fatal, because it's never invoked as there are no instances registered,
but wrong nevertheless.

Fixes: fdc15a36d84e ("bus/arm-ccn: Convert to hotplug statemachine")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agosvcrdma: Clear xpt_bc_xps in xprt_setup_rdma_bc() error exit arm
Chuck Lever [Tue, 29 Nov 2016 16:04:26 +0000 (11:04 -0500)]
svcrdma: Clear xpt_bc_xps in xprt_setup_rdma_bc() error exit arm

commit 1b9f700b8cfc31089e2dfa5d0905c52fd4529b50 upstream.

Logic copied from xs_setup_bc_tcp().

Fixes: 39a9beab5acb ('rpc: share one xps between all backchannels')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoARM: qcom_defconfig: Fix MDM9515 LCC and GCC config
Neil Armstrong [Wed, 7 Sep 2016 08:18:42 +0000 (10:18 +0200)]
ARM: qcom_defconfig: Fix MDM9515 LCC and GCC config

commit 206787737e308bb447d18adef7da7749188212f5 upstream.

Correct prefix is MDM instead of MSM.

Fixes: 8aa788d3e59a ("ARM: configs: qualcomm: Add MDM9615 missing defconfigs")
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Andy Gross <andy.gross@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoARM: zynq: Reserve correct amount of non-DMA RAM
Kyle Roeschley [Mon, 31 Oct 2016 16:26:17 +0000 (11:26 -0500)]
ARM: zynq: Reserve correct amount of non-DMA RAM

commit 7a3cc2a7b2c723aa552028f4e66841cec183756d upstream.

On Zynq, we haven't been reserving the correct amount of DMA-incapable
RAM to keep DMA away from it (per the Zynq TRM Section 4.1, it should be
the first 512k). In older kernels, this was masked by the
memblock_reserve call in arm_memblock_init(). Now, reserve the correct
amount excplicitly rather than relying on swapper_pg_dir, which is an
address and not a size anyway.

Fixes: 46f5b96 ("ARM: zynq: Reserve not DMAable space in front of the kernel")
Signed-off-by: Kyle Roeschley <kyle.roeschley@ni.com>
Tested-by: Nathan Rossi <nathan@nathanrossi.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoARM: pxa: fix pxa25x interrupt init
Robert Jarzmik [Mon, 26 Sep 2016 07:21:51 +0000 (09:21 +0200)]
ARM: pxa: fix pxa25x interrupt init

commit e413bd33ac44b6d0bebc0ef2ac19cbe7558a7303 upstream.

In the device-tree case, the root interrupt controller cannot be
accessed through the 6th coprocessor, contrary to pxa27x and pxa3xx
architectures.

Fix it to behave as in non-devicetree builds.

Fixes: 32f17997c130 ("ARM: pxa: remove irq init from dt machines")
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoARM64: dts: bcm2835: Fix bcm2837 compatible string
Andreas Färber [Mon, 24 Oct 2016 15:09:50 +0000 (17:09 +0200)]
ARM64: dts: bcm2835: Fix bcm2837 compatible string

commit 4f24450c6e580ac8591942c8bf65355a06b44635 upstream.

bcm2837-rpi-3-b.dts, its only in-tree user, was overriding it as
"brcm,bcm2837" already.

Fixes: 9d56c22a7861 ("ARM: bcm2835: Add devicetree for the Raspberry Pi 3.")
Cc: Stephen Warren <swarren@wwwdotorg.org>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoARM64: dts: bcm2837-rpi-3-b: remove incorrect pwr LED
Andrea Merello [Fri, 11 Nov 2016 17:38:21 +0000 (09:38 -0800)]
ARM64: dts: bcm2837-rpi-3-b: remove incorrect pwr LED

commit a44e87b47148c6ee6b78509f47e6a15c0fae890a upstream.

We are incorrectly defining the pwr LED, attaching it to a gpio line
that is wired to the Wi-Fi SDIO module (which fails due to this).

The actual power LED is connected to the GPIO expander, which we don't
expose currently.

Fixes: 9d56c22a7861 ("ARM: bcm2835: Add devicetree for the Raspberry Pi 3.")
Thanks-to: Eric Anholt <eric@anholt.net> [for clarifying we can't control the LED]
Signed-off-by: Andrea Merello <andrea.merello@gmail.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoarm64: dts: mt8173: Fix auxadc node
Matthias Brugger [Wed, 26 Oct 2016 14:15:00 +0000 (16:15 +0200)]
arm64: dts: mt8173: Fix auxadc node

commit a3207d644fb89e5d7d5e01f00c04dcfc6d2d44d5 upstream.

The devicetree node for mt8173-auxadc lacks the clock and
io-channel-cells property. This leads to a non-working driver.

mt6577-auxadc 11001000.auxadc: failed to get auxadc clock
mt6577-auxadc: probe of 11001000.auxadc failed with error -2

Fix these fields to get the device up and running.

Fixes: 748c7d4de46a ("ARM64: dts: mt8173: Add thermal/auxadc device
nodes")
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agotools/virtio: fix READ_ONCE()
Mark Rutland [Thu, 24 Nov 2016 10:25:12 +0000 (10:25 +0000)]
tools/virtio: fix READ_ONCE()

commit 5da889c795b1fbefc9d8f058b54717ab8ab17891 upstream.

The virtio tools implementation of READ_ONCE() has a single parameter called
'var', but erroneously refers to 'val' for its cast, and thus won't work unless
there's a variable of the correct type that happens to be called 'var'.

Fix this with s/var/val/, making READ_ONCE() work as expected regardless.

Fixes: a7c490333df3cff5 ("tools/virtio: use virt_xxx barriers")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agopowerpc: Fix build warning on 32-bit PPC
Larry Finger [Fri, 23 Dec 2016 03:06:53 +0000 (21:06 -0600)]
powerpc: Fix build warning on 32-bit PPC

commit 8ae679c4bc2ea2d16d92620da8e3e9332fa4039f upstream.

I am getting the following warning when I build kernel 4.9-git on my
PowerBook G4 with a 32-bit PPC processor:

    AS      arch/powerpc/kernel/misc_32.o
  arch/powerpc/kernel/misc_32.S:299:7: warning: "CONFIG_FSL_BOOKE" is not defined [-Wundef]

This problem is evident after commit 989cea5c14be ("kbuild: prevent
lib-ksyms.o rebuilds"); however, this change in kbuild only exposes an
error that has been in the code since 2005 when this source file was
created.  That was with commit 9994a33865f4 ("powerpc: Introduce
entry_{32,64}.S, misc_{32,64}.S, systbl.S").

The offending line does not make a lot of sense.  This error does not
seem to cause any errors in the executable, thus I am not recommending
that it be applied to any stable versions.

Thanks to Nicholas Piggin for suggesting this solution.

Fixes: 9994a33865f4 ("powerpc: Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S")
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoALSA: firewire-tascam: Fix to handle error from initialization of stream data
Takashi Sakamoto [Tue, 3 Jan 2017 02:58:33 +0000 (11:58 +0900)]
ALSA: firewire-tascam: Fix to handle error from initialization of stream data

commit 6a2a2f45560a9cb7bc49820883b042e44f83726c upstream.

This module has a bug not to return error code in a case that data
structure for transmitted packets fails to be initialized.

This commit fixes the bug.

Fixes: 35efa5c489de ("ALSA: firewire-tascam: add streaming functionality")
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoHID: hid-cypress: validate length of report
Greg Kroah-Hartman [Fri, 6 Jan 2017 14:33:36 +0000 (15:33 +0100)]
HID: hid-cypress: validate length of report

commit 1ebb71143758f45dc0fa76e2f48429e13b16d110 upstream.

Make sure we have enough of a report structure to validate before
looking at it.

Reported-by: Benoit Camredon <benoit.camredon@airbus.com>
Tested-by: Benoit Camredon <benoit.camredon@airbus.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
7 years agonet: vrf: do not allow table id 0
David Ahern [Tue, 10 Jan 2017 23:22:25 +0000 (15:22 -0800)]
net: vrf: do not allow table id 0

[ Upstream commit 24c63bbc18e25d5d8439422aa5fd2d66390b88eb ]

Frank reported that vrf devices can be created with a table id of 0.
This breaks many of the run time table id checks and should not be
allowed. Detect this condition at create time and fail with EINVAL.

Fixes: 193125dbd8eb ("net: Introduce VRF device driver")
Reported-by: Frank Kellermann <frank.kellermann@atos.net>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: ipv4: Fix multipath selection with vrf
David Ahern [Tue, 10 Jan 2017 22:37:35 +0000 (14:37 -0800)]
net: ipv4: Fix multipath selection with vrf

[ Upstream commit 7a18c5b9fb31a999afc62b0e60978aa896fc89e9 ]

fib_select_path does not call fib_select_multipath if oif is set in the
flow struct. For VRF use cases oif is always set, so multipath route
selection is bypassed. Use the FLOWI_FLAG_SKIP_NH_OIF to skip the oif
check similar to what is done in fib_table_lookup.

Add saddr and proto to the flow struct for the fib lookup done by the
VRF driver to better match hash computation for a flow.

Fixes: 613d09b30f8b ("net: Use VRF device index for lookups on TX")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet/mlx5e: Remove WARN_ONCE from adaptive moderation code
Gil Rockah [Tue, 10 Jan 2017 20:33:38 +0000 (22:33 +0200)]
net/mlx5e: Remove WARN_ONCE from adaptive moderation code

[ Upstream commit 0bbcc0a8fc394d01988fe0263ccf7fddb77a12c3 ]

When trying to do interface down or changing interface configuration
under heavy traffic, some of the adaptive moderation corner cases can
occur and leave a WARN_ONCE call trace in the kernel log.

Those WARN_ONCE are meant for debug only, and should have been inserted
only under debug. We avoid such call traces by removing those WARN_ONCE.

Fixes: cb3c7fd4f839 ("net/mlx5e: Support adaptive RX coalescing")
Signed-off-by: Gil Rockah <gilr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agogro: Disable frag0 optimization on IPv6 ext headers
Herbert Xu [Tue, 10 Jan 2017 20:24:15 +0000 (12:24 -0800)]
gro: Disable frag0 optimization on IPv6 ext headers

[ Upstream commit 57ea52a865144aedbcd619ee0081155e658b6f7d ]

The GRO fast path caches the frag0 address.  This address becomes
invalid if frag0 is modified by pskb_may_pull or its variants.
So whenever that happens we must disable the frag0 optimization.

This is usually done through the combination of gro_header_hard
and gro_header_slow, however, the IPv6 extension header path did
the pulling directly and would continue to use the GRO fast path
incorrectly.

This patch fixes it by disabling the fast path when we enter the
IPv6 extension header path.

Fixes: 78a478d0efd9 ("gro: Inline skb_gro_header and cache frag0 virtual address")
Reported-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agogro: use min_t() in skb_gro_reset_offset()
Eric Dumazet [Wed, 11 Jan 2017 03:52:43 +0000 (19:52 -0800)]
gro: use min_t() in skb_gro_reset_offset()

[ Upstream commit 7cfd5fd5a9813f1430290d20c0fead9b4582a307 ]

On 32bit arches, (skb->end - skb->data) is not 'unsigned int',
so we shall use min_t() instead of min() to avoid a compiler error.

Fixes: 1272ce87fa01 ("gro: Enter slow-path if there is no tailroom")
Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agogro: Enter slow-path if there is no tailroom
Herbert Xu [Tue, 10 Jan 2017 20:24:01 +0000 (12:24 -0800)]
gro: Enter slow-path if there is no tailroom

[ Upstream commit 1272ce87fa017ca4cf32920764d879656b7a005a ]

The GRO path has a fast-path where we avoid calling pskb_may_pull
and pskb_expand by directly accessing frag0.  However, this should
only be done if we have enough tailroom in the skb as otherwise
we'll have to expand it later anyway.

This patch adds the check by capping frag0_len with the skb tailroom.

Fixes: cb18978cbf45 ("gro: Open-code final pskb_may_pull")
Reported-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: add the AF_QIPCRTR entries to family name tables
Anna, Suman [Tue, 10 Jan 2017 03:48:56 +0000 (21:48 -0600)]
net: add the AF_QIPCRTR entries to family name tables

[ Upstream commit 5d722b3024f6762addb8642ffddc9f275b5107ae ]

Commit bdabad3e363d ("net: Add Qualcomm IPC router") introduced a
new address family. Update the family name tables accordingly so
that the lockdep initialization can use the proper names for this
family.

Cc: Courtney Cavin <courtney.cavin@sonymobile.com>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: dsa: Ensure validity of dst->ds[0]
Florian Fainelli [Mon, 9 Jan 2017 19:58:34 +0000 (11:58 -0800)]
net: dsa: Ensure validity of dst->ds[0]

[ Upstream commit faf3a932fbeb77860226a8323eacb835edc98648 ]

It is perfectly possible to have non zero indexed switches being present
in a DSA switch tree, in such a case, we will be deferencing a NULL
pointer while dsa_cpu_port_ethtool_{setup,restore}. Be more defensive
and ensure that dst->ds[0] is valid before doing anything with it.

Fixes: 0c73c523cf73 ("net: dsa: Initialize CPU port ethtool ops per tree")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agor8152: fix rx issue for runtime suspend
hayeswang [Tue, 10 Jan 2017 09:04:07 +0000 (17:04 +0800)]
r8152: fix rx issue for runtime suspend

[ Upstream commit 75dc692eda114cb234a46cb11893a9c3ea520934 ]

Pause the rx and make sure the rx fifo is empty when the autosuspend
occurs.

If the rx data comes when the driver is canceling the rx urb, the host
controller would stop getting the data from the device and continue
it after next rx urb is submitted. That is, one continuing data is
split into two different urb buffers. That let the driver take the
data as a rx descriptor, and unexpected behavior happens.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agor8152: split rtl8152_suspend function
hayeswang [Tue, 10 Jan 2017 09:04:06 +0000 (17:04 +0800)]
r8152: split rtl8152_suspend function

[ Upstream commit 8fb280616878b81c0790a0c33acbeec59c5711f4 ]

Split rtl8152_suspend() into rtl8152_system_suspend() and
rtl8152_rumtime_suspend().

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: dsa: bcm_sf2: Utilize nested MDIO read/write
Florian Fainelli [Sun, 8 Jan 2017 05:01:57 +0000 (21:01 -0800)]
net: dsa: bcm_sf2: Utilize nested MDIO read/write

[ Upstream commit 2cfe8f8290bd28cf1ee67db914a6e76cf8e6437b ]

We are implementing a MDIO bus which is behind another one, so use the
nested version of the accessors to get lockdep annotations correct.

Fixes: 461cd1b03e32 ("net: dsa: bcm_sf2: Register our slave MDIO bus")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: dsa: bcm_sf2: Do not clobber b53_switch_ops
Florian Fainelli [Sun, 8 Jan 2017 05:01:56 +0000 (21:01 -0800)]
net: dsa: bcm_sf2: Do not clobber b53_switch_ops

[ Upstream commit a4c61b92b3a4cbda35bb0251a5063a68f0861b2c ]

We make the bcm_sf2 driver override ds->ops which points to
b53_switch_ops since b53_switch_alloc() did the assignent. This is all
well and good until a second b53 switch comes in, and ends up using the
bcm_sf2 operations. Make a proper local copy, substitute the ds->ops
pointer and then override the operations.

Fixes: f458995b9ad8 ("net: dsa: bcm_sf2: Utilize core B53 driver when possible")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agobpf: change back to orig prog on too many passes
Daniel Borkmann [Fri, 6 Jan 2017 23:26:33 +0000 (00:26 +0100)]
bpf: change back to orig prog on too many passes

[ Upstream commit 9d5ecb09d525469abd1a10c096cb5a17206523f2 ]

If after too many passes still no image could be emitted, then
swap back to the original program as we do in all other cases
and don't use the one with blinding.

Fixes: 959a75791603 ("bpf, x86: add support for constant blinding")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: vrf: Add missing Rx counters
David Ahern [Tue, 3 Jan 2017 17:37:55 +0000 (09:37 -0800)]
net: vrf: Add missing Rx counters

[ Upstream commit 926d93a33e59b2729afdbad357233c17184de9d2 ]

The move from rx-handler to L3 receive handler inadvertantly dropped the
rx counters. Restore them.

Fixes: 74b20582ac38 ("net: l3mdev: Add hook in ip and ipv6")
Reported-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoipv4: Do not allow MAIN to be alias for new LOCAL w/ custom rules
Alexander Duyck [Mon, 2 Jan 2017 21:32:54 +0000 (13:32 -0800)]
ipv4: Do not allow MAIN to be alias for new LOCAL w/ custom rules

[ Upstream commit 5350d54f6cd12eaff623e890744c79b700bd3f17 ]

In the case of custom rules being present we need to handle the case of the
LOCAL table being intialized after the new rule has been added.  To address
that I am adding a new check so that we can make certain we don't use an
alias of MAIN for LOCAL when allocating a new table.

Fixes: 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse")
Reported-by: Oliver Brunel <jjk@jjacky.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoigmp: Make igmp group member RFC 3376 compliant
Michal Tesar [Mon, 2 Jan 2017 13:38:36 +0000 (14:38 +0100)]
igmp: Make igmp group member RFC 3376 compliant

[ Upstream commit 7ababb782690e03b78657e27bd051e20163af2d6 ]

5.2. Action on Reception of a Query

 When a system receives a Query, it does not respond immediately.
 Instead, it delays its response by a random amount of time, bounded
 by the Max Resp Time value derived from the Max Resp Code in the
 received Query message.  A system may receive a variety of Queries on
 different interfaces and of different kinds (e.g., General Queries,
 Group-Specific Queries, and Group-and-Source-Specific Queries), each
 of which may require its own delayed response.

 Before scheduling a response to a Query, the system must first
 consider previously scheduled pending responses and in many cases
 schedule a combined response.  Therefore, the system must be able to
 maintain the following state:

 o A timer per interface for scheduling responses to General Queries.

 o A per-group and interface timer for scheduling responses to Group-
   Specific and Group-and-Source-Specific Queries.

 o A per-group and interface list of sources to be reported in the
   response to a Group-and-Source-Specific Query.

 When a new Query with the Router-Alert option arrives on an
 interface, provided the system has state to report, a delay for a
 response is randomly selected in the range (0, [Max Resp Time]) where
 Max Resp Time is derived from Max Resp Code in the received Query
 message.  The following rules are then used to determine if a Report
 needs to be scheduled and the type of Report to schedule.  The rules
 are considered in order and only the first matching rule is applied.

 1. If there is a pending response to a previous General Query
    scheduled sooner than the selected delay, no additional response
    needs to be scheduled.

 2. If the received Query is a General Query, the interface timer is
    used to schedule a response to the General Query after the
    selected delay.  Any previously pending response to a General
    Query is canceled.
--8<--

Currently the timer is rearmed with new random expiration time for
every incoming query regardless of possibly already pending report.
Which is not aligned with the above RFE.
It also might happen that higher rate of incoming queries can
postpone the report after the expiration time of the first query
causing group membership loss.

Now the per interface general query timer is rearmed only
when there is no pending report already scheduled on that interface or
the newly selected expiration time is before the already pending
scheduled report.

Signed-off-by: Michal Tesar <mtesar@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoflow_dissector: Update pptp handling to avoid null pointer deref.
Ian Kumlien [Mon, 2 Jan 2017 08:18:35 +0000 (09:18 +0100)]
flow_dissector: Update pptp handling to avoid null pointer deref.

[ Upstream commit d0af683407a26a4437d8fa6e283ea201f2ae8146 ]

__skb_flow_dissect can be called with a skb or a data packet, either
can be NULL. All calls seems to have been moved to __skb_header_pointer
except the pptp handling which is still calling skb_header_pointer.

skb_header_pointer will use skb->data and thus:
[  109.556866] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
[  109.557102] IP: [<ffffffff88dc02f8>] __skb_flow_dissect+0xa88/0xce0
[  109.557263] PGD 0
[  109.557338]
[  109.557484] Oops: 0000 [#1] SMP
[  109.557562] Modules linked in: chaoskey
[  109.557783] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.9.0 #79
[  109.557867] Hardware name: Supermicro A1SRM-LN7F/LN5F/A1SRM-LN7F-2758, BIOS 1.0c 11/04/2015
[  109.557957] task: ffff94085c27bc00 task.stack: ffffb745c0068000
[  109.558041] RIP: 0010:[<ffffffff88dc02f8>]  [<ffffffff88dc02f8>] __skb_flow_dissect+0xa88/0xce0
[  109.558203] RSP: 0018:ffff94087fc83d40  EFLAGS: 00010206
[  109.558286] RAX: 0000000000000130 RBX: ffffffff8975bf80 RCX: ffff94084fab6800
[  109.558373] RDX: 0000000000000010 RSI: 000000000000000c RDI: 0000000000000000
[  109.558460] RBP: 0000000000000b88 R08: 0000000000000000 R09: 0000000000000022
[  109.558547] R10: 0000000000000008 R11: ffff94087fc83e04 R12: 0000000000000000
[  109.558763] R13: ffff94084fab6800 R14: ffff94087fc83e04 R15: 000000000000002f
[  109.558979] FS:  0000000000000000(0000) GS:ffff94087fc80000(0000) knlGS:0000000000000000
[  109.559326] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  109.559539] CR2: 0000000000000080 CR3: 0000000281809000 CR4: 00000000001026e0
[  109.559753] Stack:
[  109.559957]  000000000000000c ffff94084fab6822 0000000000000001 ffff94085c2b5fc0
[  109.560578]  0000000000000001 0000000000002000 0000000000000000 0000000000000000
[  109.561200]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  109.561820] Call Trace:
[  109.562027]  <IRQ>
[  109.562108]  [<ffffffff88dfb4fa>] ? eth_get_headlen+0x7a/0xf0
[  109.562522]  [<ffffffff88c5a35a>] ? igb_poll+0x96a/0xe80
[  109.562737]  [<ffffffff88dc912b>] ? net_rx_action+0x20b/0x350
[  109.562953]  [<ffffffff88546d68>] ? __do_softirq+0xe8/0x280
[  109.563169]  [<ffffffff8854704a>] ? irq_exit+0xaa/0xb0
[  109.563382]  [<ffffffff8847229b>] ? do_IRQ+0x4b/0xc0
[  109.563597]  [<ffffffff8902d4ff>] ? common_interrupt+0x7f/0x7f
[  109.563810]  <EOI>
[  109.563890]  [<ffffffff88d57530>] ? cpuidle_enter_state+0x130/0x2c0
[  109.564304]  [<ffffffff88d57520>] ? cpuidle_enter_state+0x120/0x2c0
[  109.564520]  [<ffffffff8857eacf>] ? cpu_startup_entry+0x19f/0x1f0
[  109.564737]  [<ffffffff8848d55a>] ? start_secondary+0x12a/0x140
[  109.564950] Code: 83 e2 20 a8 80 0f 84 60 01 00 00 c7 04 24 08 00
00 00 66 85 d2 0f 84 be fe ff ff e9 69 fe ff ff 8b 34 24 89 f2 83 c2
04 66 85 c0 <41> 8b 84 24 80 00 00 00 0f 49 d6 41 8d 31 01 d6 41 2b 84
24 84
[  109.569959] RIP  [<ffffffff88dc02f8>] __skb_flow_dissect+0xa88/0xce0
[  109.570245]  RSP <ffff94087fc83d40>
[  109.570453] CR2: 0000000000000080

Fixes: ab10dccb1160 ("rps: Inspect PPTP encapsulated by GRE to get flow hash")
Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrop_monitor: consider inserted data in genlmsg_end
Reiter Wolfgang [Tue, 3 Jan 2017 00:39:10 +0000 (01:39 +0100)]
drop_monitor: consider inserted data in genlmsg_end

[ Upstream commit 3b48ab2248e61408910e792fe84d6ec466084c1a ]

Final nlmsg_len field update must reflect inserted net_dm_drop_point
data.

This patch depends on previous patch:
"drop_monitor: add missing call to genlmsg_end"

Signed-off-by: Reiter Wolfgang <wr0112358@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrop_monitor: add missing call to genlmsg_end
Reiter Wolfgang [Sat, 31 Dec 2016 20:11:57 +0000 (21:11 +0100)]
drop_monitor: add missing call to genlmsg_end

[ Upstream commit 4200462d88f47f3759bdf4705f87e207b0f5b2e4 ]

Update nlmsg_len field with genlmsg_end to enable userspace processing
using nlmsg_next helper. Also adds error handling.

Signed-off-by: Reiter Wolfgang <wr0112358@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: ipv4: dst for local input routes should use l3mdev if relevant
David Ahern [Thu, 29 Dec 2016 23:29:03 +0000 (15:29 -0800)]
net: ipv4: dst for local input routes should use l3mdev if relevant

[ Upstream commit f5a0aab84b74de68523599817569c057c7ac1622 ]

IPv4 output routes already use l3mdev device instead of loopback for dst's
if it is applicable. Change local input routes to do the same.

This fixes icmp responses for unreachable UDP ports which are directed
to the wrong table after commit 9d1a6c4ea43e4 because local_input
routes use the loopback device. Moving from ingress device to loopback
loses the L3 domain causing responses based on the dst to get to lost.

Fixes: 9d1a6c4ea43e4 ("net: icmp_route_lookup should use rt dev to
       determine L3 domain")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: fix incorrect original ingress device index in PKTINFO
Wei Zhang [Thu, 29 Dec 2016 08:45:04 +0000 (16:45 +0800)]
net: fix incorrect original ingress device index in PKTINFO

[ Upstream commit f0c16ba8933ed217c2688b277410b2a37ba81591 ]

When we send a packet for our own local address on a non-loopback
interface (e.g. eth0), due to the change had been introduced from
commit 0b922b7a829c ("net: original ingress device index in PKTINFO"), the
original ingress device index would be set as the loopback interface.
However, the packet should be considered as if it is being arrived via the
sending interface (eth0), otherwise it would break the expectation of the
userspace application (e.g. the DHCPRELEASE message from dhcp_release
binary would be ignored by the dnsmasq daemon, since it come from lo which
is not the interface dnsmasq bind to)

Fixes: 0b922b7a829c ("net: original ingress device index in PKTINFO")
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Wei Zhang <asuka.com@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agortnl: stats - add missing netlink message size checks
Mathias Krause [Wed, 28 Dec 2016 16:52:15 +0000 (17:52 +0100)]
rtnl: stats - add missing netlink message size checks

[ Upstream commit 4775cc1f2d5abca894ac32774eefc22c45347d1c ]

We miss to check if the netlink message is actually big enough to contain
a struct if_stats_msg.

Add a check to prevent userland from sending us short messages that would
make us access memory beyond the end of the message.

Fixes: 10c9ead9f3c6 ("rtnetlink: add new RTM_GETSTATS message to dump...")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet/mlx5e: Disable netdev after close
Saeed Mahameed [Wed, 28 Dec 2016 12:58:42 +0000 (14:58 +0200)]
net/mlx5e: Disable netdev after close

[ Upstream commit 37f304d10030bb425c19099e7b955d9c3ec4cba3 ]

Disable netdev should come after it was closed, although no harm of doing it
before -hence the MLX5E_STATE_DESTROYING bit- but it is more natural this way.

Fixes: 26e59d8077a3 ("net/mlx5e: Implement mlx5e interface attach/detach callbacks")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet/mlx5e: Don't sync netdev state when not registered
Saeed Mahameed [Wed, 28 Dec 2016 12:58:41 +0000 (14:58 +0200)]
net/mlx5e: Don't sync netdev state when not registered

[ Upstream commit 610e89e05c3f28a7394935aa6b91f99548c4fd3c ]

Skip setting netdev vxlan ports and netdev rx_mode on driver load
when netdev is not yet registered.

Synchronizing with netdev state is needed only on reset flow where the
netdev remains registered for the whole reset period.

This also fixes an access before initialization of net_device.addr_list_lock
- which for some reason initialized on register_netdev - where we queued
set_rx_mode work on driver load before netdev registration.

Fixes: 26e59d8077a3 ("net/mlx5e: Implement mlx5e interface attach/detach callbacks")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet/mlx5: Prevent setting multicast macs for VFs
Mohamad Haj Yahia [Wed, 28 Dec 2016 12:58:37 +0000 (14:58 +0200)]
net/mlx5: Prevent setting multicast macs for VFs

[ Upstream commit ccce1700263d8b5b219359d04180492a726cea16 ]

Need to check that VF mac address entered by the admin user is either
zero or unicast mac.
Multicast mac addresses are prohibited.

Fixes: 77256579c6b4 ('net/mlx5: E-Switch, Introduce Vport administration functions')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet/mlx5: Mask destination mac value in ethtool steering rules
Maor Gottlieb [Wed, 28 Dec 2016 12:58:35 +0000 (14:58 +0200)]
net/mlx5: Mask destination mac value in ethtool steering rules

[ Upstream commit 077b1e8069b9b74477b01d28f6b83774dc19a142 ]

We need to mask the destination mac value with the destination mac
mask when adding steering rule via ethtool.

Fixes: 1174fce8d1410 ('net/mlx5e: Support l3/l4 flow type specs in ethtool flow steering')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet/mlx5: Avoid shadowing numa_node
Eli Cohen [Wed, 28 Dec 2016 12:58:34 +0000 (14:58 +0200)]
net/mlx5: Avoid shadowing numa_node

[ Upstream commit d151d73dcc99de87c63bdefebcc4cb69de1cdc40 ]

Avoid using a local variable named numa_node to avoid shadowing a public
one.

Fixes: db058a186f98 ('net/mlx5_core: Set irq affinity hints')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet/mlx5: Cancel recovery work in remove flow
Daniel Jurgens [Wed, 28 Dec 2016 12:58:33 +0000 (14:58 +0200)]
net/mlx5: Cancel recovery work in remove flow

[ Upstream commit 689a248df83b6032edc57e86267b4e5cc8d7174e ]

If there is pending delayed work for health recovery it must be canceled
if the device is being unloaded.

Fixes: 05ac2c0b7438 ("net/mlx5: Fix race between PCI error handlers and health work")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet/mlx5: Check FW limitations on log_max_qp before setting it
Noa Osherovich [Wed, 28 Dec 2016 12:58:32 +0000 (14:58 +0200)]
net/mlx5: Check FW limitations on log_max_qp before setting it

[ Upstream commit 883371c453b937f9eb581fb4915210865982736f ]

When setting HCA capabilities, set log_max_qp to be the minimum
between the selected profile's value and the HCA limitation.

Fixes: 938fe83c8dcb ('net/mlx5_core: New device capabilities...')
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet/sched: cls_flower: Fix missing addr_type in classify
Paul Blakey [Wed, 28 Dec 2016 12:54:47 +0000 (14:54 +0200)]
net/sched: cls_flower: Fix missing addr_type in classify

[ Upstream commit 0df0f207aab4f42e5c96a807adf9a6845b69e984 ]

Since we now use a non zero mask on addr_type, we are matching on its
value (IPV4/IPV6). So before this fix, matching on enc_src_ip/enc_dst_ip
failed in SW/classify path since its value was zero.
This patch sets the proper value of addr_type for encapsulated packets.

Fixes: 970bfcd09791 ('net/sched: cls_flower: Use mask for addr_type')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: stmmac: Fix race between stmmac_drv_probe and stmmac_open
Florian Fainelli [Wed, 28 Dec 2016 02:23:06 +0000 (18:23 -0800)]
net: stmmac: Fix race between stmmac_drv_probe and stmmac_open

[ Upstream commit 5701659004d68085182d2fd4199c79172165fa65 ]

There is currently a small window during which the network device registered by
stmmac can be made visible, yet all resources, including and clock and MDIO bus
have not had a chance to be set up, this can lead to the following error to
occur:

[  473.919358] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized):
                stmmac_dvr_probe: warning: cannot get CSR clock
[  473.919382] stmmaceth 0000:01:00.0: no reset control found
[  473.919412] stmmac - user ID: 0x10, Synopsys ID: 0x42
[  473.919429] stmmaceth 0000:01:00.0: DMA HW capability register supported
[  473.919436] stmmaceth 0000:01:00.0: RX Checksum Offload Engine supported
[  473.919443] stmmaceth 0000:01:00.0: TX Checksum insertion supported
[  473.919451] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized):
                Enable RX Mitigation via HW Watchdog Timer
[  473.921395] libphy: PHY stmmac-1:00 not found
[  473.921417] stmmaceth 0000:01:00.0 eth0: Could not attach to PHY
[  473.921427] stmmaceth 0000:01:00.0 eth0: stmmac_open: Cannot attach to
                PHY (error: -19)
[  473.959710] libphy: stmmac: probed
[  473.959724] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 0 IRQ POLL
                (stmmac-1:00) active
[  473.959728] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 1 IRQ POLL
                (stmmac-1:01)
[  473.959731] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 2 IRQ POLL
                (stmmac-1:02)
[  473.959734] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 3 IRQ POLL
                (stmmac-1:03)

Fix this by making sure that register_netdev() is the last thing being done,
which guarantees that the clock and the MDIO bus are available.

Fixes: 4bfcbd7abce2 ("stmmac: Move the mdio_register/_unregister in probe/remove")
Reported-by: Kweh, Hock Leong <hock.leong.kweh@intel.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet, sched: fix soft lockup in tc_classify
Daniel Borkmann [Wed, 21 Dec 2016 17:04:11 +0000 (18:04 +0100)]
net, sched: fix soft lockup in tc_classify

[ Upstream commit 628185cfddf1dfb701c4efe2cfd72cf5b09f5702 ]

Shahar reported a soft lockup in tc_classify(), where we run into an
endless loop when walking the classifier chain due to tp->next == tp
which is a state we should never run into. The issue only seems to
trigger under load in the tc control path.

What happens is that in tc_ctl_tfilter(), thread A allocates a new
tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
with it. In that classifier callback we had to unlock/lock the rtnl
mutex and returned with -EAGAIN. One reason why we need to drop there
is, for example, that we need to request an action module to be loaded.

This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
after we loaded and found the requested action, we need to redo the
whole request so we don't race against others. While we had to unlock
rtnl in that time, thread B's request was processed next on that CPU.
Thread B added a new tp instance successfully to the classifier chain.
When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
and destroying its tp instance which never got linked, we goto replay
and redo A's request.

This time when walking the classifier chain in tc_ctl_tfilter() for
checking for existing tp instances we had a priority match and found
the tp instance that was created and linked by thread B. Now calling
again into tp->ops->change() with that tp was successful and returned
without error.

tp_created was never cleared in the second round, thus kernel thinks
that we need to link it into the classifier chain (once again). tp and
*back point to the same object due to the match we had earlier on. Thus
for thread B's already public tp, we reset tp->next to tp itself and
link it into the chain, which eventually causes the mentioned endless
loop in tc_classify() once a packet hits the data path.

Fix is to clear tp_created at the beginning of each request, also when
we replay it. On the paths that can cause -EAGAIN we already destroy
the original tp instance we had and on replay we really need to start
from scratch. It seems that this issue was first introduced in commit
12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
and avoid kernel panic when we use cls_cgroup").

Fixes: 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup")
Reported-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Tested-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoipv6: handle -EFAULT from skb_copy_bits
Dave Jones [Thu, 22 Dec 2016 16:16:22 +0000 (11:16 -0500)]
ipv6: handle -EFAULT from skb_copy_bits

[ Upstream commit a98f91758995cb59611e61318dddd8a6956b52c3 ]

By setting certain socket options on ipv6 raw sockets, we can confuse the
length calculation in rawv6_push_pending_frames triggering a BUG_ON.

RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40
RSP: 0018:ffff881f6c4a7c18  EFLAGS: 00010282
RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002
RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00
RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009
R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030
R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80

Call Trace:
 [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830
 [<ffffffff81772697>] inet_sendmsg+0x67/0xa0
 [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50
 [<ffffffff816d982f>] SYSC_sendto+0xef/0x170
 [<ffffffff816da27e>] SyS_sendto+0xe/0x10
 [<ffffffff81002910>] do_syscall_64+0x50/0xa0
 [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25

Handle by jumping to the failure path if skb_copy_bits gets an EFAULT.

Reproducer:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define LEN 504

int main(int argc, char* argv[])
{
int fd;
int zero = 0;
char buf[LEN];

memset(buf, 0, LEN);

fd = socket(AF_INET6, SOCK_RAW, 7);

setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4);
setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN);

sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110);
}

Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoinet: fix IP(V6)_RECVORIGDSTADDR for udp sockets
Willem de Bruijn [Thu, 22 Dec 2016 23:19:16 +0000 (18:19 -0500)]
inet: fix IP(V6)_RECVORIGDSTADDR for udp sockets

[ Upstream commit 39b2dd765e0711e1efd1d1df089473a8dd93ad48 ]

Socket cmsg IP(V6)_RECVORIGDSTADDR checks that port range lies within
the packet. For sockets that have transport headers pulled, transport
offset can be negative. Use signed comparison to avoid overflow.

Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
Reported-by: Nisar Jagabar <njagabar@cloudmark.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agosctp: sctp_transport_lookup_process should rcu_read_unlock when transport is null
Xin Long [Thu, 15 Dec 2016 15:05:52 +0000 (23:05 +0800)]
sctp: sctp_transport_lookup_process should rcu_read_unlock when transport is null

[ Upstream commit 08abb79542c9e8c367d1d8e44fe1026868d3f0a7 ]

Prior to this patch, sctp_transport_lookup_process didn't rcu_read_unlock
when it failed to find a transport by sctp_addrs_lookup_transport.

This patch is to fix it by moving up rcu_read_unlock right before checking
transport and also to remove the out path.

Fixes: 1cceda784980 ("sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: vrf: Drop conntrack data after pass through VRF device on Tx
David Ahern [Wed, 14 Dec 2016 22:31:11 +0000 (14:31 -0800)]
net: vrf: Drop conntrack data after pass through VRF device on Tx

[ Upstream commit eb63ecc1706b3e094d0f57438b6c2067cfc299f2 ]

Locally originated traffic in a VRF fails in the presence of a POSTROUTING
rule. For example,

    $ iptables -t nat -A POSTROUTING -s 11.1.1.0/24  -j MASQUERADE
    $ ping -I red -c1 11.1.1.3
    ping: Warning: source address might be selected on device other than red.
    PING 11.1.1.3 (11.1.1.3) from 11.1.1.2 red: 56(84) bytes of data.
    ping: sendmsg: Operation not permitted

Worse, the above causes random corruption resulting in a panic in random
places (I have not seen a consistent backtrace).

Call nf_reset to drop the conntrack info following the pass through the
VRF device.  The nf_reset is needed on Tx but not Rx because of the order
in which NF_HOOK's are hit: on Rx the VRF device is after the real ingress
device and on Tx it is is before the real egress device. Connection
tracking should be tied to the real egress device and not the VRF device.

Fixes: 8f58336d3f78a ("net: Add ethernet header for pass through VRF device")
Fixes: 35402e3136634 ("net: Add IPv6 support to VRF device")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: vrf: Fix NAT within a VRF
David Ahern [Wed, 14 Dec 2016 19:06:18 +0000 (11:06 -0800)]
net: vrf: Fix NAT within a VRF

[ Upstream commit a0f37efa82253994b99623dbf41eea8dd0ba169b ]

Connection tracking with VRF is broken because the pass through the VRF
device drops the connection tracking info. Removing the call to nf_reset
allows DNAT and MASQUERADE to work across interfaces within a VRF.

Fixes: 73e20b761acf ("net: vrf: Add support for PREROUTING rules on vrf device")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoLinux 4.9.3 v4.9.3
Greg Kroah-Hartman [Thu, 12 Jan 2017 10:41:42 +0000 (11:41 +0100)]
Linux 4.9.3

7 years agousb: gadget: composite: always set ep->mult to a sensible value
Felipe Balbi [Wed, 28 Sep 2016 09:33:31 +0000 (12:33 +0300)]
usb: gadget: composite: always set ep->mult to a sensible value

commit eaa496ffaaf19591fe471a36cef366146eeb9153 upstream.

ep->mult is supposed to be set to Isochronous and
Interrupt Endapoint's multiplier value. This value
is computed from different places depending on the
link speed.

If we're dealing with HighSpeed, then it's part of
bits [12:11] of wMaxPacketSize. This case wasn't
taken into consideration before.

While at that, also make sure the ep->mult defaults
to one so drivers can use it unconditionally and
assume they'll never multiply ep->maxpacket to zero.

Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoRevert "usb: gadget: composite: always set ep->mult to a sensible value"
Greg Kroah-Hartman [Thu, 12 Jan 2017 07:51:32 +0000 (08:51 +0100)]
Revert "usb: gadget: composite: always set ep->mult to a sensible value"

This reverts commit eab1c4e2d0ad4509ccb8476a604074547dc202e0 which is
commit eaa496ffaaf19591fe471a36cef366146eeb9153 upstream as it was
incorrectly backported.

Reported-by: Bin Liu <b-liu@ti.com>
Cc: Felipe Balbi <balbi@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoRevert "rtlwifi: Fix enter/exit power_save"
Greg Kroah-Hartman [Thu, 12 Jan 2017 07:29:09 +0000 (08:29 +0100)]
Revert "rtlwifi: Fix enter/exit power_save"

This reverts commit 98068574928f499b30f136ff57ef9a03cc575a36, which is
commit ba9f93f82abafe2552eac942ebb11c2df4f8dd7f upstream as it causes
problems.

Reported-by: Dmitry Osipenko <digetx@gmail.com>
Cc: Ping-Ke Shih <pkshih@realtek.com>
Cc: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
7 years agotick/broadcast: Prevent NULL pointer dereference
Thomas Gleixner [Thu, 15 Dec 2016 11:10:37 +0000 (12:10 +0100)]
tick/broadcast: Prevent NULL pointer dereference

commit c1a9eeb938b5433947e5ea22f89baff3182e7075 upstream.

When a disfunctional timer, e.g. dummy timer, is installed, the tick core
tries to setup the broadcast timer.

If no broadcast device is installed, the kernel crashes with a NULL pointer
dereference in tick_broadcast_setup_oneshot() because the function has no
sanity check.

Reported-by: Mason <slash.tmp@free.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Richard Cochran <rcochran@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Sebastian Frias <sf84@laposte.net>
Cc: Thibaud Cornic <thibaud_cornic@sigmadesigns.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Link: http://lkml.kernel.org/r/1147ef90-7877-e4d2-bb2b-5c4fa8d3144b@free.fr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoclocksource/dummy_timer: Move hotplug callback after the real timers
Thomas Gleixner [Thu, 15 Dec 2016 11:01:05 +0000 (12:01 +0100)]
clocksource/dummy_timer: Move hotplug callback after the real timers

commit 9bf11ecce5a2758e5a097c2f3a13d08552d0d6f9 upstream.

When the dummy timer callback is invoked before the real timer callbacks,
then it tries to install that timer for the starting CPU. If the platform
does not have a broadcast timer installed the installation fails with a
kernel crash. The crash happens due to a unconditional deference of the non
available broadcast device. This needs to be fixed in the timer core code.

But even when this is fixed in the core code then installing the dummy
timer before the real timers is a pointless exercise.

Move it to the end of the callback list.

Fixes: 00c1d17aab51 ("clocksource/dummy_timer: Convert to hotplug state machine")
Reported-and-tested-by: Mason <slash.tmp@free.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Richard Cochran <rcochran@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Sebastian Frias <sf84@laposte.net>
Cc: Thibaud Cornic <thibaud_cornic@sigmadesigns.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Link: http://lkml.kernel.org/r/1147ef90-7877-e4d2-bb2b-5c4fa8d3144b@free.fr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: fix max_retries _show and _store functions
Carlos Maiolino [Mon, 9 Jan 2017 15:39:03 +0000 (16:39 +0100)]
xfs: fix max_retries _show and _store functions

commit ff97f2399edac1e0fb3fa7851d5fbcbdf04717cf upstream.

max_retries _show and _store functions should test against cfg->max_retries,
not cfg->retry_timeout

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: fix crash and data corruption due to removal of busy COW extents
Christoph Hellwig [Mon, 9 Jan 2017 15:39:02 +0000 (16:39 +0100)]
xfs: fix crash and data corruption due to removal of busy COW extents

commit a1b7a4dea6166cf46be895bce4aac67ea5160fe8 upstream.

There is a race window between write_cache_pages calling
clear_page_dirty_for_io and XFS calling set_page_writeback, in which
the mapping for an inode is tagged neither as dirty, nor as writeback.

If the COW shrinker hits in exactly that window we'll remove the delayed
COW extents and writepages trying to write it back, which in release
kernels will manifest as corruption of the bmap btree, and in debug
kernels will trip the ASSERT about now calling xfs_bmapi_write with the
COWFORK flag for holes.  A complex customer load manages to hit this
window fairly reliably, probably by always having COW writeback in flight
while the cow shrinker runs.

This patch adds another check for having the I_DIRTY_PAGES flag set,
which is still set during this race window.  While this fixes the problem
I'm still not overly happy about the way the COW shrinker works as it
still seems a bit fragile.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: use the actual AG length when reserving blocks
Darrick J. Wong [Mon, 9 Jan 2017 15:39:01 +0000 (16:39 +0100)]
xfs: use the actual AG length when reserving blocks

commit 20e73b000bcded44a91b79429d8fa743247602ad upstream.

We need to use the actual AG length when making per-AG reservations,
since we could otherwise end up reserving more blocks out of the last
AG than there are actual blocks.

Complained-about-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: fix double-cleanup when CUI recovery fails
Darrick J. Wong [Mon, 9 Jan 2017 15:39:00 +0000 (16:39 +0100)]
xfs: fix double-cleanup when CUI recovery fails

commit 7a21272b088894070391a94fdd1c67014020fa1d upstream.

Dan Carpenter reported a double-free of rcur if _defer_finish fails
while we're recovering CUI items.  Fix the error recovery to prevent
this.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: use GPF_NOFS when allocating btree cursors
Darrick J. Wong [Mon, 9 Jan 2017 15:38:59 +0000 (16:38 +0100)]
xfs: use GPF_NOFS when allocating btree cursors

commit b24a978c377be5f14e798cb41238e66fe51aab2f upstream.

Use NOFS for allocating btree cursors, since they can be called
under the ilock.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: ignore leaf attr ichdr.count in verifier during log replay
Eric Sandeen [Mon, 9 Jan 2017 15:38:58 +0000 (16:38 +0100)]
xfs: ignore leaf attr ichdr.count in verifier during log replay

commit 2e1d23370e75d7d89350d41b4ab58c7f6a0e26b2 upstream.

When we create a new attribute, we first create a shortform
attribute, and try to fit the new attribute into it.
If that fails, we copy the (empty) attribute into a leaf attribute,
and do the copy again.  Thus there can be a transient state where
we have an empty leaf attribute.

If we encounter this during log replay, the verifier will fail.
So add a test to ignore this part of the leaf attr verification
during log replay.

Thanks as usual to dchinner for spotting the problem.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: don't cap maximum dedupe request length
Darrick J. Wong [Mon, 9 Jan 2017 15:38:57 +0000 (16:38 +0100)]
xfs: don't cap maximum dedupe request length

commit 1bb33a98702d8360947f18a44349df75ba555d5d upstream.

After various discussions on linux-fsdevel, it has been decided that it
is not necessary to cap the length of a dedupe request, and that
correctly-written userspace client programs will be able to absorb the
change.  Therefore, remove the length clamping behavior.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: don't allow di_size with high bit set
Darrick J. Wong [Mon, 9 Jan 2017 15:38:56 +0000 (16:38 +0100)]
xfs: don't allow di_size with high bit set

commit ef388e2054feedaeb05399ed654bdb06f385d294 upstream.

The on-disk field di_size is used to set i_size, which is a signed
integer of loff_t.  If the high bit of di_size is set, we'll end up with
a negative i_size, which will cause all sorts of problems.  Since the
VFS won't let us create a file with such length, we should catch them
here in the verifier too.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: error out if trying to add attrs and anextents > 0
Darrick J. Wong [Mon, 9 Jan 2017 15:38:55 +0000 (16:38 +0100)]
xfs: error out if trying to add attrs and anextents > 0

commit 0f352f8ee8412bd9d34fb2a6411241da61175c0e upstream.

We shouldn't assert if somehow we end up trying to add an attr fork to
an inode that apparently already has attr extents because this is an
indication of on-disk corruption.  Instead, return an error code to
userspace.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: don't crash if reading a directory results in an unexpected hole
Darrick J. Wong [Mon, 9 Jan 2017 15:38:54 +0000 (16:38 +0100)]
xfs: don't crash if reading a directory results in an unexpected hole

commit 96a3aefb8ffde23180130460b0b2407b328eb727 upstream.

In xfs_dir3_data_read, we can encounter the situation where err == 0 and
*bpp == NULL if the given bno offset happens to be a hole; this leads to
a crash if we try to set the buffer type after the _da_read_buf call.
Holes can happen due to corrupt or malicious entries in the bmbt data,
so be a little more careful when we're handling buffers.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: complain if we don't get nextents bmap records
Darrick J. Wong [Mon, 9 Jan 2017 15:38:53 +0000 (16:38 +0100)]
xfs: complain if we don't get nextents bmap records

commit 356a3225222e5bc4df88aef3419fb6424f18ab69 upstream.

When reading into memory all extents of a btree-format inode fork,
complain if the number of extents we find is not the same as the number
of extents reported in the inode core.  This is needed to stop an IO
action from accessing the garbage areas of the in-core fork.

[dchinner: removed redundant assert]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: check for bogus values in btree block headers
Darrick J. Wong [Mon, 9 Jan 2017 15:38:52 +0000 (16:38 +0100)]
xfs: check for bogus values in btree block headers

commit bb3be7e7c1c18e1b141d4cadeb98cc89ecf78099 upstream.

When we're reading a btree block, make sure that what we retrieved
matches the owner and level; and has a plausible number of records.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: forbid AG btrees with level == 0
Darrick J. Wong [Mon, 9 Jan 2017 15:38:51 +0000 (16:38 +0100)]
xfs: forbid AG btrees with level == 0

commit d2a047f31e86941fa896e0e3271536d50aba415e upstream.

There is no such thing as a zero-level AG btree since even a single-node
zero-records btree has one level.  Btree cursor constructors read
cur_nlevels straight from disk and then access things like
cur_bufs[cur_nlevels - 1] which is /really/ bad if cur_nlevels is zero!
Therefore, strengthen the verifiers to prevent this possibility.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: handle cow fork in xfs_bmap_trace_exlist
Eric Sandeen [Mon, 9 Jan 2017 15:38:50 +0000 (16:38 +0100)]
xfs: handle cow fork in xfs_bmap_trace_exlist

commit c44a1f22626c153976289e1cd67bdcdfefc16e1f upstream.

By inspection, xfs_bmap_trace_exlist isn't handling cow forks,
and will trace the data fork instead.

Fix this by setting state appropriately if whichfork
== XFS_COW_FORK.

()___()
< @ @ >
 |   |
 {o_o}
  (|)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: pass state not whichfork to trace_xfs_extlist
Eric Sandeen [Mon, 9 Jan 2017 15:38:49 +0000 (16:38 +0100)]
xfs: pass state not whichfork to trace_xfs_extlist

commit 7710517fc37b1899722707883b54694ea710b3c0 upstream.

When xfs_bmap_trace_exlist called trace_xfs_extlist,
it sent in the "whichfork" var instead of the bmap "state"
as expected (even though state was already set up for this
purpose).

As a result, the xfs_bmap_class in tracing code used
"whichfork" not state in xfs_iext_state_to_fork(), and got
the wrong ifork pointer.  It all goes downhill from
there, including an ASSERT when ifp_bytes is empty
by the time it reaches xfs_iext_get_ext():

XFS: Assertion failed: idx < ifp->if_bytes / sizeof(xfs_bmbt_rec_t)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: Move AGI buffer type setting to xfs_read_agi
Eric Sandeen [Mon, 9 Jan 2017 15:38:48 +0000 (16:38 +0100)]
xfs: Move AGI buffer type setting to xfs_read_agi

commit 200237d6746faaeaf7f4ff4abbf13f3917cee60a upstream.

We've missed properly setting the buffer type for
an AGI transaction in 3 spots now, so just move it
into xfs_read_agi() and set it if we are in a transaction
to avoid the problem in the future.

This is similar to how it is done in i.e. the dir3
and attr3 read functions.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: pass post-eof speculative prealloc blocks to bmapi
Brian Foster [Mon, 9 Jan 2017 15:38:47 +0000 (16:38 +0100)]
xfs: pass post-eof speculative prealloc blocks to bmapi

commit f782088c9e5d08e9494c63e68b4e85716df3e5f8 upstream.

xfs_file_iomap_begin_delay() implements post-eof speculative
preallocation by extending the block count of the requested delayed
allocation. Now that xfs_bmapi_reserve_delalloc() has been updated to
handle prealloc blocks separately and tag the inode, update
xfs_file_iomap_begin_delay() to use the new parameter and rely on the
former to tag the inode.

Note that this patch does not change behavior.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: use new extent lookup helpers xfs_file_iomap_begin_delay
Christoph Hellwig [Mon, 9 Jan 2017 15:38:46 +0000 (16:38 +0100)]
xfs: use new extent lookup helpers xfs_file_iomap_begin_delay

commit 656152e552e5cbe0c11ad261b524376217c2fb13 upstream.

And only lookup the previous extent inside xfs_iomap_prealloc_size
if we actually need it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: clean up cow fork reservation and tag inodes correctly
Brian Foster [Mon, 9 Jan 2017 15:38:45 +0000 (16:38 +0100)]
xfs: clean up cow fork reservation and tag inodes correctly

commit 0260d8ff5f76617e3a55a1c471383ecb4404c3ad upstream.

COW fork reservation is implemented via delayed allocation. The code is
modeled after the traditional delalloc allocation code, but is slightly
different in terms of how preallocation occurs. Rather than post-eof
speculative preallocation, COW fork preallocation is implemented via a
COW extent size hint that is designed to minimize fragmentation as a
reflinked file is split over time.

xfs_reflink_reserve_cow() still uses logic that is oriented towards
dealing with post-eof speculative preallocation, however, and is stale
or not necessarily correct. First, the EOF alignment to the COW extent
size hint is implemented in xfs_bmapi_reserve_delalloc() (which does so
correctly by aligning the start and end offsets) and so is not necessary
in xfs_reflink_reserve_cow(). The backoff and retry logic on ENOSPC is
also ineffective for the same reason, as xfs_bmapi_reserve_delalloc()
will simply perform the same allocation request on the retry. Finally,
since the COW extent size hint aligns the start and end offset of the
range to allocate, the end_fsb != orig_end_fsb logic is not sufficient.
Indeed, if a write request happens to end on an aligned offset, it is
possible that we do not tag the inode for COW preallocation even though
xfs_bmapi_reserve_delalloc() may have preallocated at the start offset.

Kill the unnecessary, duplicate code in xfs_reflink_reserve_cow().
Remove the inode tag logic as well since xfs_bmapi_reserve_delalloc()
has been updated to tag the inode correctly.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: use new extent lookup helpers in __xfs_reflink_reserve_cow
Christoph Hellwig [Mon, 9 Jan 2017 15:38:44 +0000 (16:38 +0100)]
xfs: use new extent lookup helpers in __xfs_reflink_reserve_cow

commit 2755fc4438501c8c28e7783df890e889f6772bee upstream.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: track preallocation separately in xfs_bmapi_reserve_delalloc()
Brian Foster [Mon, 9 Jan 2017 15:38:43 +0000 (16:38 +0100)]
xfs: track preallocation separately in xfs_bmapi_reserve_delalloc()

commit 974ae922efd93b07b6cdf989ae959883f6f05fd8 upstream.

Speculative preallocation is currently processed entirely by the callers
of xfs_bmapi_reserve_delalloc(). The caller determines how much
preallocation to include, adjusts the extent length and passes down the
resulting request.

While this works fine for post-eof speculative preallocation, it is not
as reliable for COW fork preallocation. COW fork preallocation is
implemented via the cowextszhint, which aligns the start offset as well
as the length of the extent. Further, it is difficult for the caller to
accurately identify when preallocation occurs because the returned
extent could have been merged with neighboring extents in the fork.

To simplify this situation and facilitate further COW fork preallocation
enhancements, update xfs_bmapi_reserve_delalloc() to take a separate
preallocation parameter to incorporate into the allocation request. The
preallocation blocks value is tacked onto the end of the request and
adjusted to accommodate neighboring extents and extent size limits.
Since xfs_bmapi_reserve_delalloc() now knows precisely how much
preallocation was included in the allocation, it can also tag the inodes
appropriately to support preallocation reclaim.

Note that xfs_bmapi_reserve_delalloc() callers are not yet updated to
use the preallocation mechanism. This patch should not change behavior
outside of correctly tagging reflink inodes when start offset
preallocation occurs (which the caller does not handle correctly).

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: remove prev argument to xfs_bmapi_reserve_delalloc
Christoph Hellwig [Mon, 9 Jan 2017 15:38:42 +0000 (16:38 +0100)]
xfs: remove prev argument to xfs_bmapi_reserve_delalloc

commit 65c5f419788d623a0410eca1866134f5e4628594 upstream.

We can easily lookup the previous extent for the cases where we need it,
which saves the callers from looking it up for us later in the series.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: always succeed when deduping zero bytes
Darrick J. Wong [Mon, 9 Jan 2017 15:38:41 +0000 (16:38 +0100)]
xfs: always succeed when deduping zero bytes

commit fba3e594ef0ad911fa8f559732d588172f212d71 upstream.

It turns out that btrfs and xfs had differing interpretations of what
to do when the dedupe length is zero.  Change xfs to follow btrfs'
semantics so that the userland interface is consistent.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: factor rmap btree size into the indlen calculations
Darrick J. Wong [Mon, 9 Jan 2017 15:38:40 +0000 (16:38 +0100)]
xfs: factor rmap btree size into the indlen calculations

commit fd26a88093bab6529ea2de819114ca92dbd1d71d upstream.

When we're estimating the amount of space it's going to take to satisfy
a delalloc reservation, we need to include the space that we might need
to grow the rmapbt.  This helps us to avoid running out of space later
when _iomap_write_allocate needs more space than we reserved.  Eryu Guan
observed this happening on generic/224 when sunit/swidth were set.

Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: new inode extent list lookup helpers
Christoph Hellwig [Mon, 9 Jan 2017 15:38:39 +0000 (16:38 +0100)]
xfs: new inode extent list lookup helpers

commit 93533c7855c3c78c8a900cac65c8d669bb14935d upstream.

xfs_iext_lookup_extent looks up a single extent at the passed in offset,
and returns the extent covering the area, or the one behind it in case
of a hole, as well as the index of the returned extent in arguments,
as well as a simple bool as return value that is set to false if no
extent could be found because the offset is behind EOF.  It is a simpler
replacement for xfs_bmap_search_extent that leaves looking up the rarely
needed previous extent to the caller and has a nicer calling convention.

xfs_iext_get_extent is a helper for iterating over the extent list,
it takes an extent index as input, and returns the extent at that index
in it's expanded form in an argument if it exists.  The actual return
value is a bool whether the index is valid or not.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: fix unbalanced inode reclaim flush locking
Brian Foster [Mon, 9 Jan 2017 15:38:38 +0000 (16:38 +0100)]
xfs: fix unbalanced inode reclaim flush locking

commit 98efe8af1c9ffac47e842b7a75ded903e2f028da upstream.

Filesystem shutdown testing on an older distro kernel has uncovered an
imbalanced locking pattern for the inode flush lock in
xfs_reclaim_inode(). Specifically, there is a double unlock sequence
between the call to xfs_iflush_abort() and xfs_reclaim_inode() at the
"reclaim:" label.

This actually does not cause obvious problems on current kernels due to
the current flush lock implementation. Older kernels use a counting
based flush lock mechanism, however, which effectively breaks the lock
indefinitely when an already unlocked flush lock is repeatedly unlocked.
Though this only currently occurs on filesystem shutdown, it has
reproduced the effect of elevating an fs shutdown to a system-wide crash
or hang.

As it turns out, the flush lock is not actually required for the reclaim
logic in xfs_reclaim_inode() because by that time we have already cycled
the flush lock once while holding ILOCK_EXCL. Therefore, remove the
additional flush lock/unlock cycle around the 'reclaim:' label and
update branches into this label to release the flush lock where
appropriate. Add an assert to xfs_ifunlock() to help prevent future
occurences of the same problem.

Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: check minimum block size for CRC filesystems
Darrick J. Wong [Mon, 9 Jan 2017 15:38:37 +0000 (16:38 +0100)]
xfs: check minimum block size for CRC filesystems

commit bec9d48d7a303a5bb95c05961ff07ec7eeb59058 upstream.

Check the minimum block size on v5 filesystems.

[dchinner: cleaned up XFS_MIN_CRC_BLOCKSIZE check]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: provide helper for counting extents from if_bytes
Eric Sandeen [Mon, 9 Jan 2017 15:38:36 +0000 (16:38 +0100)]
xfs: provide helper for counting extents from if_bytes

commit 5d829300bee000980a09ac2ccb761cb25867b67c upstream.

The open-coded pattern:

ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t)

is all over the xfs code; provide a new helper
xfs_iext_count(ifp) to count the number of inline extents
in an inode fork.

[dchinner: pick up several missed conversions]

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: don't BUG() on mixed direct and mapped I/O
Brian Foster [Mon, 9 Jan 2017 15:38:35 +0000 (16:38 +0100)]
xfs: don't BUG() on mixed direct and mapped I/O

commit 04197b341f23b908193308b8d63d17ff23232598 upstream.

We've had reports of generic/095 causing XFS to BUG() in
__xfs_get_blocks() due to the existence of delalloc blocks on a
direct I/O read. generic/095 issues a mix of various types of I/O,
including direct and memory mapped I/O to a single file. This is
clearly not supported behavior and is known to lead to such
problems. E.g., the lack of exclusion between the direct I/O and
write fault paths means that a write fault can allocate delalloc
blocks in a region of a file that was previously a hole after the
direct read has attempted to flush/inval the file range, but before
it actually reads the block mapping. In turn, the direct read
discovers a delalloc extent and cannot proceed.

While the appropriate solution here is to not mix direct and memory
mapped I/O to the same regions of the same file, the current
BUG_ON() behavior is probably overkill as it can crash the entire
system.  Instead, localize the failure to the I/O in question by
returning an error for a direct I/O that cannot be handled safely
due to delalloc blocks. Be careful to allow the case of a direct
write to post-eof delalloc blocks. This can occur due to speculative
preallocation and is safe as post-eof blocks are not accompanied by
dirty pages in pagecache (conversely, preallocation within eof must
have been zeroed, and thus dirtied, before the inode size could have
been increased beyond said blocks).

Finally, provide an additional warning if a direct I/O write occurs
while the file is memory mapped. This may not catch all problematic
scenarios, but provides a hint that some known-to-be-problematic I/O
methods are in use.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: don't skip cow forks w/ delalloc blocks in cowblocks scan
Brian Foster [Mon, 9 Jan 2017 15:38:34 +0000 (16:38 +0100)]
xfs: don't skip cow forks w/ delalloc blocks in cowblocks scan

commit 399372349a7f9b2d7e56e4fa4467c69822d07024 upstream.

The cowblocks background scanner currently clears the cowblocks tag
for inodes without any real allocations in the cow fork. This
excludes inodes with only delalloc blocks in the cow fork. While we
might never expect to clear delalloc blocks from the cow fork in the
background scanner, it is not necessarily correct to clear the
cowblocks tag from such inodes.

For example, if the background scanner happens to process an inode
between a buffered write and writeback, the scanner catches the
inode in a state after delalloc blocks have been allocated to the
cow fork but before the delalloc blocks have been converted to real
blocks by writeback. The background scanner then incorrectly clears
the cowblocks tag, even if part of the aforementioned delalloc
reservation will not be remapped to the data fork (i.e., extra
blocks due to the cowextsize hint). This means that any such
additional blocks in the cow fork might never be reclaimed by the
background scanner and could persist until the inode itself is
reclaimed.

To address this problem, only skip and clear inodes without any cow
fork allocations whatsoever from the background scanner. While we
generally do not want to cancel delalloc reservations from the
background scanner, the pagecache dirty check following the
cowblocks check should prevent that situation. If we do end up with
delalloc cow fork blocks without a dirty address space mapping, this
is probably an indication that something has gone wrong and the
blocks should be reclaimed, as they may never be converted to a real
allocation.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: check return value of _trans_reserve_quota_nblks
Darrick J. Wong [Mon, 9 Jan 2017 15:38:33 +0000 (16:38 +0100)]
xfs: check return value of _trans_reserve_quota_nblks

commit 4fd29ec47212c8cbf98916af519019ccc5e58e49 upstream.

Check the return value of xfs_trans_reserve_quota_nblks for errors.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: don't call xfs_sb_quota_from_disk twice
Eric Sandeen [Mon, 9 Jan 2017 15:38:32 +0000 (16:38 +0100)]
xfs: don't call xfs_sb_quota_from_disk twice

commit e6fc6fcf4447c9266038c55c25e4c7c14bee110c upstream.

Source xfsprogs commit: ee3754254e8c186c99b6cdd4d59f741759d04acb

Kernel commit 5ef828c4 ("xfs: avoid false quotacheck after unclean
shutdown") made xfs_sb_from_disk() also call xfs_sb_quota_from_disk
by default.

However, when this was merged to libxfs, existing separate
calls to libxfs_sb_quota_from_disk remained, and calling it
twice in a row on a V4 superblock leads to issues, because:

        if (sbp->sb_qflags & XFS_PQUOTA_ACCT)  {
...
                sbp->sb_pquotino = sbp->sb_gquotino;
                sbp->sb_gquotino = NULLFSINO;

and after the second call, we have set both pquotino and gquotino
to NULLFSINO.

Fix this by making it safe to call twice, and also remove the extra
calls to libxfs_sb_quota_from_disk.

This is only spotted when running xfstests with "-m crc=0" because
the sb_from_disk change came about after V5 became default, and
the above behavior only exists on a V4 superblock.

Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agotpm_tis: Check return values from get_burstcount.
Josh Zimmerman [Thu, 27 Oct 2016 21:50:09 +0000 (14:50 -0700)]
tpm_tis: Check return values from get_burstcount.

commit 26a137e31ffe6fbfdb008554a8d9b3d55bd5c86e upstream.

If the TPM we're connecting to uses a static burst count, it will report
a burst count of zero throughout the response read. However, get_burstcount
assumes that a response of zero indicates that the TPM is not ready to
receive more data. In this case, it returns a negative error code, which
is passed on to tpm_tis_{write,read}_bytes as a u16, causing
them to read/write far too many bytes.

This patch checks for negative return codes and bails out from recv_data
and tpm_tis_send_data.

Fixes: 1107d065fdf1 (tpm_tis: Introduce intermediate layer for TPM access)
Signed-off-by: Josh Zimmerman <joshz@google.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrm/i915/gen9: fix the WM memory bandwidth WA for Y tiling cases
Paulo Zanoni [Tue, 8 Nov 2016 20:22:11 +0000 (18:22 -0200)]
drm/i915/gen9: fix the WM memory bandwidth WA for Y tiling cases

commit 2ef32dee97fcf41987722a37eb6ff1a983915e99 upstream.

The previous spec version said "double Ytile planes minimum lines",
and I interpreted this as referring to what the spec calls "Y tile
minimum", but in fact it was referring to what the spec calls "Minimum
Scanlines for Y tile". I noticed that Mahesh Kumar had a different
interpretation, so I sent and email to the spec authors and got
clarification on the correct meaning. Also, BSpec was updated and
should be clear now.

Fixes: ee3d532fcb64 ("drm/i915/gen9: unconditionally apply the memory bandwidth WA")
Cc: stable@vger.kernel.org
Cc: Mahesh Kumar <mahesh1.kumar@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1478636531-6081-1-git-send-email-paulo.r.zanoni@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>