]> git.itanic.dy.fi Git - linux-stable/log
linux-stable
7 years agoLinux 4.8.15 v4.8.15
Greg Kroah-Hartman [Thu, 15 Dec 2016 16:50:48 +0000 (08:50 -0800)]
Linux 4.8.15

7 years agocrypto: rsa - Add Makefile dependencies to fix parallel builds
David Michael [Tue, 29 Nov 2016 19:15:12 +0000 (11:15 -0800)]
crypto: rsa - Add Makefile dependencies to fix parallel builds

commit 57891633eeef60e732e045731cf20e50ee80acb4 upstream.

Both asn1 headers are included by rsa_helper.c, so rsa_helper.o
should explicitly depend on them.

Signed-off-by: David Michael <david.michael@coreos.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Tuomas Tynkkynen <tuomas@tuxera.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agohotplug: Make register and unregister notifier API symmetric
Michal Hocko [Wed, 7 Dec 2016 13:54:38 +0000 (14:54 +0100)]
hotplug: Make register and unregister notifier API symmetric

commit 777c6e0daebb3fcefbbd6f620410a946b07ef6d0 upstream.

Yu Zhao has noticed that __unregister_cpu_notifier only unregisters its
notifiers when HOTPLUG_CPU=y while the registration might succeed even
when HOTPLUG_CPU=n if MODULE is enabled. This means that e.g. zswap
might keep a stale notifier on the list on the manual clean up during
the pool tear down and thus corrupt the list. Resulting in the following

[  144.964346] BUG: unable to handle kernel paging request at ffff880658a2be78
[  144.971337] IP: [<ffffffffa290b00b>] raw_notifier_chain_register+0x1b/0x40
<snipped>
[  145.122628] Call Trace:
[  145.125086]  [<ffffffffa28e5cf8>] __register_cpu_notifier+0x18/0x20
[  145.131350]  [<ffffffffa2a5dd73>] zswap_pool_create+0x273/0x400
[  145.137268]  [<ffffffffa2a5e0fc>] __zswap_param_set+0x1fc/0x300
[  145.143188]  [<ffffffffa2944c1d>] ? trace_hardirqs_on+0xd/0x10
[  145.149018]  [<ffffffffa2908798>] ? kernel_param_lock+0x28/0x30
[  145.154940]  [<ffffffffa2a3e8cf>] ? __might_fault+0x4f/0xa0
[  145.160511]  [<ffffffffa2a5e237>] zswap_compressor_param_set+0x17/0x20
[  145.167035]  [<ffffffffa2908d3c>] param_attr_store+0x5c/0xb0
[  145.172694]  [<ffffffffa290848d>] module_attr_store+0x1d/0x30
[  145.178443]  [<ffffffffa2b2b41f>] sysfs_kf_write+0x4f/0x70
[  145.183925]  [<ffffffffa2b2a5b9>] kernfs_fop_write+0x149/0x180
[  145.189761]  [<ffffffffa2a99248>] __vfs_write+0x18/0x40
[  145.194982]  [<ffffffffa2a9a412>] vfs_write+0xb2/0x1a0
[  145.200122]  [<ffffffffa2a9a732>] SyS_write+0x52/0xa0
[  145.205177]  [<ffffffffa2ff4d97>] entry_SYSCALL_64_fastpath+0x12/0x17

This can be even triggered manually by changing
/sys/module/zswap/parameters/compressor multiple times.

Fix this issue by making unregister APIs symmetric to the register so
there are no surprises.

Fixes: 47e627bc8c9a ("[PATCH] hotplug: Allow modules to use the cpu hotplug notifiers even if !CONFIG_HOTPLUG_CPU")
Reported-and-tested-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Link: http://lkml.kernel.org/r/20161207135438.4310-1-mhocko@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agobatman-adv: Check for alloc errors when preparing TT local data
Sven Eckelmann [Wed, 30 Nov 2016 20:47:09 +0000 (21:47 +0100)]
batman-adv: Check for alloc errors when preparing TT local data

commit c2d0f48a13e53b4747704c9e692f5e765e52041a upstream.

batadv_tt_prepare_tvlv_local_data can fail to allocate the memory for the
new TVLV block. The caller is informed about this problem with the returned
length of 0. Not checking this value results in an invalid memory access
when either tt_data or tt_change is accessed.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 7ea7b4a14275 ("batman-adv: make the TT CRC logic VLAN specific")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agom68k: Fix ndelay() macro
Boris Brezillon [Fri, 28 Oct 2016 15:12:28 +0000 (17:12 +0200)]
m68k: Fix ndelay() macro

commit 7e251bb21ae08ca2e4fb28cc0981fac2685a8efa upstream.

The current ndelay() macro definition has an extra semi-colon at the
end of the line thus leading to a compilation error when ndelay is used
in a conditional block without curly braces like this one:

if (cond)
ndelay(t);
else
...

which, after the preprocessor pass gives:

if (cond)
m68k_ndelay(t);;
else
...

thus leading to the following gcc error:

error: 'else' without a previous 'if'

Remove this extra semi-colon.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Fixes: c8ee038bd1488 ("m68k: Implement ndelay() based on the existing udelay() logic")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoceph: don't set req->r_locked_dir in ceph_d_revalidate
Jeff Layton [Wed, 30 Nov 2016 20:56:46 +0000 (15:56 -0500)]
ceph: don't set req->r_locked_dir in ceph_d_revalidate

commit c3f4688a08fd86f1bf8e055724c84b7a40a09733 upstream.

This function sets req->r_locked_dir which is supposed to indicate to
ceph_fill_trace that the parent's i_rwsem is locked for write.
Unfortunately, there is no guarantee that the dir will be locked when
d_revalidate is called, so we really don't want ceph_fill_trace to do
any dcache manipulation from this context. Clear req->r_locked_dir since
it's clearly not safe to do that.

What we really want to know with d_revalidate is whether the dentry
still points to the same inode. ceph_fill_trace installs a pointer to
the inode in req->r_target_inode, so we can just compare that to
d_inode(dentry) to see if it's the same one after the lookup.

Also, since we aren't generally interested in the parent here, we can
switch to using a GETATTR to hint that to the MDS, which also means that
we only need to reserve one cap.

Finally, just remove the d_unhashed check. That's really outside the
purview of a filesystem's d_revalidate. If the thing became unhashed
while we're checking it, then that's up to the VFS to handle anyway.

Fixes: 200fd27c8fa2 ("ceph: use lookup request to revalidate dentry")
Link: http://tracker.ceph.com/issues/18041
Reported-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoARM: dts: imx7d: fix LCDIF clock assignment
Stefan Agner [Wed, 23 Nov 2016 00:42:04 +0000 (16:42 -0800)]
ARM: dts: imx7d: fix LCDIF clock assignment

commit 4b707fa00a80b19b80bc8df6f1cbf4bdd9c91402 upstream.

The eLCDIF IP of the i.MX 7 SoC knows multiple clocks and lists them
separately:

Clock      Clock Root              Description
apb_clk    MAIN_AXI_CLK_ROOT       AXI clock
pix_clk    LCDIF_PIXEL_CLK_ROOT    Pixel clock
ipg_clk_s  MAIN_AXI_CLK_ROOT       Peripheral access clock

All of them are switched by a single gate, which is part of the
IMX7D_LCDIF_PIXEL_ROOT_CLK clock. Hence using that clock also for
the AXI bus clock (clock-name "axi") makes sure the gate gets
enabled when accessing registers.

There seem to be no separate AXI display clock, and the clock is
optional. Hence remove the dummy clock.

This fixes kernel freezes when starting the X-Server (which
disables/re-enables the display controller).

Fixes: e8ed73f691bd ("ARM: dts: imx7d: add lcdif support")
Signed-off-by: Stefan Agner <stefan@agner.ch>
Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>
Acked-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoARM: dts: orion5x: fix number of sata port for linkstation ls-gl
Roger Shimizu [Thu, 1 Dec 2016 15:11:12 +0000 (00:11 +0900)]
ARM: dts: orion5x: fix number of sata port for linkstation ls-gl

commit 038ccb3e8cee52e07dc118ff99f47eaebc1d0746 upstream.

Bug report from Debian [0] shows there's minor changed model of
Linkstation LS-GL that uses the 2nd SATA port of the SoC.
So it's necessary to enable two SATA ports, though for that specific
model only the 2nd one is used.

[0] https://bugs.debian.org/845611

Fixes: b1742ffa9ddb ("ARM: dts: orion5x: add device tree for buffalo linkstation ls-gl")
Reported-by: Ryan Tandy <ryan@nardis.ca>
Tested-by: Ryan Tandy <ryan@nardis.ca>
Signed-off-by: Roger Shimizu <rogershimizu@gmail.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoRevert "ACPI: Execute _PTS before system reboot"
Rafael J. Wysocki [Mon, 21 Nov 2016 13:25:49 +0000 (14:25 +0100)]
Revert "ACPI: Execute _PTS before system reboot"

commit 9713adc2a1a5488f4889c657a0c0ce0c16056d3c upstream.

Revert commit 2c85025c75df (ACPI: Execute _PTS before system reboot)
as it is reported to cause poweroff and reboot to hang on Dell
Latitude E7250.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=187061
Reported-by: Gianpaolo <gianpaoloc@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocan: peak: fix bad memory access and free sequence
추지호 [Thu, 8 Dec 2016 12:01:13 +0000 (12:01 +0000)]
can: peak: fix bad memory access and free sequence

commit b67d0dd7d0dc9e456825447bbeb935d8ef43ea7c upstream.

Fix for bad memory access while disconnecting. netdev is freed before
private data free, and dev is accessed after freeing netdev.

This makes a slub problem, and it raise kernel oops with slub debugger
config.

Signed-off-by: Jiho Chu <jiho.chu@samsung.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocan: raw: raw_setsockopt: limit number of can_filter that can be set
Marc Kleine-Budde [Mon, 5 Dec 2016 10:44:23 +0000 (11:44 +0100)]
can: raw: raw_setsockopt: limit number of can_filter that can be set

commit 332b05ca7a438f857c61a3c21a88489a21532364 upstream.

This patch adds a check to limit the number of can_filters that can be
set via setsockopt on CAN_RAW sockets. Otherwise allocations > MAX_ORDER
are not prevented resulting in a warning.

Reference: https://lkml.org/lkml/2016/12/2/230

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocrypto: marvell - Don't corrupt state of an STD req for re-stepped ahash
Romain Perier [Mon, 5 Dec 2016 08:56:39 +0000 (09:56 +0100)]
crypto: marvell - Don't corrupt state of an STD req for re-stepped ahash

commit 9e5f7a149e00d211177f6de8be427ebc72a1c363 upstream.

mv_cesa_hash_std_step() copies the creq->state into the SRAM at each
step, but this is only required on the first one. By doing that, we
overwrite the engine state, and get erroneous results when the crypto
request is split in several chunks to fit in the internal SRAM.

This commit changes the function to copy the state only on the first
step.

Fixes: commit 2786cee8e50b ("crypto: marvell - Move SRAM I/O op...")
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocrypto: mcryptd - Check mcryptd algorithm compatibility
tim [Mon, 5 Dec 2016 19:46:31 +0000 (11:46 -0800)]
crypto: mcryptd - Check mcryptd algorithm compatibility

commit 48a992727d82cb7db076fa15d372178743b1f4cd upstream.

Algorithms not compatible with mcryptd could be spawned by mcryptd
with a direct crypto_alloc_tfm invocation using a "mcryptd(alg)" name
construct.  This causes mcryptd to crash the kernel if an arbitrary
"alg" is incompatible and not intended to be used with mcryptd.  It is
an issue if AF_ALG tries to spawn mcryptd(alg) to expose it externally.
But such algorithms must be used internally and not be exposed.

We added a check to enforce that only internal algorithms are allowed
with mcryptd at the time mcryptd is spawning an algorithm.

Link: http://marc.info/?l=linux-crypto-vger&m=148063683310477&w=2
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocrypto: caam - fix pointer size for AArch64 boot loader, AArch32 kernel
Horia Geantă [Mon, 5 Dec 2016 09:06:58 +0000 (11:06 +0200)]
crypto: caam - fix pointer size for AArch64 boot loader, AArch32 kernel

commit 39eaf759466f4e3fbeaa39075512f4f345dffdc8 upstream.

Start with a clean slate before dealing with bit 16 (pointer size)
of Master Configuration Register.
This fixes the case of AArch64 boot loader + AArch32 kernel, when
the boot loader might set MCFGR[PS] and kernel would fail to clear it.

Reported-by: Alison Wang <alison.wang@nxp.com>
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Reviewed-By: Alison Wang <Alison.wang@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocrypto: marvell - Don't copy hash operation twice into the SRAM
Romain Perier [Mon, 5 Dec 2016 08:56:38 +0000 (09:56 +0100)]
crypto: marvell - Don't copy hash operation twice into the SRAM

commit 68c7f8c1c4e9b06e6b153fa3e9e0cda2ef5aaed8 upstream.

No need to copy the template of an hash operation twice into the SRAM
from the step function.

Fixes: commit 85030c5168f1 ("crypto: marvell - Add support for chai...")
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoacpi, nfit: fix bus vs dimm confusion in xlat_status
Dan Williams [Tue, 6 Dec 2016 23:06:55 +0000 (15:06 -0800)]
acpi, nfit: fix bus vs dimm confusion in xlat_status

commit d6eb270c57fef35798525004ddf2ac5dcdadd43b upstream.

Given dimms and bus commands share the same command number space we need
to be careful that we are translating status in the correct context.
Otherwise we can, for example, fail an ND_CMD_GET_CONFIG_SIZE command
because max_xfer is zero. It fails because that condition erroneously
correlates with the 'cleared == 0' failure of ND_CMD_CLEAR_ERROR.

Fixes: aef253382266 ("libnvdimm, nfit: centralize command status translation")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoacpi, nfit: validate ars_status output buffer size
Dan Williams [Tue, 6 Dec 2016 20:45:24 +0000 (12:45 -0800)]
acpi, nfit: validate ars_status output buffer size

commit 82aa37cf09867c5e2c0326649d570e5b25c1189a upstream.

If an ARS Status command returns truncated output, do not process
partial records or otherwise consume non-status fields.

Fixes: 0caeef63e6d2 ("libnvdimm: Add a poison list and export badblocks")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoacpi, nfit, libnvdimm: fix / harden ars_status output length handling
Dan Williams [Tue, 6 Dec 2016 17:10:12 +0000 (09:10 -0800)]
acpi, nfit, libnvdimm: fix / harden ars_status output length handling

commit efda1b5d87cbc3d8816f94a3815b413f1868e10d upstream.

Given ambiguities in the ACPI 6.1 definition of the "Output (Size)"
field of the ARS (Address Range Scrub) Status command, a firmware
implementation may in practice return 0, 4, or 8 to indicate that there
is no output payload to process.

The specification states "Size of Output Buffer in bytes, including this
field.". However, 'Output Buffer' is also the name of the entire
payload, and earlier in the specification it states "Max Query ARS
Status Output Buffer Size: Maximum size of buffer (including the Status
and Extended Status fields)".

Without this fix if the BIOS happens to return 0 it causes memory
corruption as evidenced by this result from the acpi_nfit_ctl() unit
test.

 ars_status00000000: 00020000 00000000                    ........
 BUG: stack guard page was hit at ffffc90001750000 (stack is ffffc9000174c000..ffffc9000174ffff)
 kernel stack overflow (page fault): 0000 [#1] SMP DEBUG_PAGEALLOC
 task: ffff8803332d2ec0 task.stack: ffffc9000174c000
 RIP: 0010:[<ffffffff814cfe72>]  [<ffffffff814cfe72>] __memcpy+0x12/0x20
 RSP: 0018:ffffc9000174f9a8  EFLAGS: 00010246
 RAX: ffffc9000174fab8 RBX: 0000000000000000 RCX: 000000001fffff56
 RDX: 0000000000000000 RSI: ffff8803231f5a08 RDI: ffffc90001750000
 RBP: ffffc9000174fa88 R08: ffffc9000174fab0 R09: ffff8803231f54b8
 R10: 0000000000000008 R11: 0000000000000001 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000003 R15: ffff8803231f54a0
 FS:  00007f3a611af640(0000) GS:ffff88033ed00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffc90001750000 CR3: 0000000325b20000 CR4: 00000000000406e0
 Stack:
  ffffffffa00bc60d 0000000000000008 ffffc90000000001 ffffc9000174faac
  0000000000000292 ffffffffa00c24e4 ffffffffa00c2914 0000000000000000
  0000000000000000 ffffffff00000003 ffff880331ae8ad0 0000000800000246
 Call Trace:
  [<ffffffffa00bc60d>] ? acpi_nfit_ctl+0x49d/0x750 [nfit]
  [<ffffffffa01f4fe0>] nfit_test_probe+0x670/0xb1b [nfit_test]

Fixes: 747ffe11b440 ("libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoacpi, nfit: fix extended status translations for ACPI DSMs
Vishal Verma [Tue, 6 Dec 2016 00:00:37 +0000 (17:00 -0700)]
acpi, nfit: fix extended status translations for ACPI DSMs

commit 9a901f5495e26e691c7d0ea7b6057a2f3e6330ed upstream.

ACPI DSMs can have an 'extended' status which can be non-zero to convey
additional information about the command. In the xlat_status routine,
where we translate the command statuses, we were returning an error for
a non-zero extended status, even if the primary status indicated success.

Return from each command's 'case' once we have verified both its status
and extend status are good.

Fixes: 11294d63ac91 ("nfit: fail DSMs that return non-zero status by default")
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoperf/x86: Fix full width counter, counter overflow
Peter Zijlstra (Intel) [Tue, 29 Nov 2016 20:33:28 +0000 (20:33 +0000)]
perf/x86: Fix full width counter, counter overflow

commit 7f612a7f0bc13a2361a152862435b7941156b6af upstream.

Lukasz reported that perf stat counters overflow handling is broken on KNL/SLM.

Both these parts have full_width_write set, and that does indeed have
a problem. In order to deal with counter wrap, we must sample the
counter at at least half the counter period (see also the sampling
theorem) such that we can unambiguously reconstruct the count.

However commit:

  069e0c3c4058 ("perf/x86/intel: Support full width counting")

sets the sampling interval to the full period, not half.

Fixing that exposes another issue, in that we must not sign extend the
delta value when we shift it right; the counter cannot have
decremented after all.

With both these issues fixed, counter overflow functions correctly
again.

Reported-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
Tested-by: Liang, Kan <kan.liang@intel.com>
Tested-by: Odzioba, Lukasz <lukasz.odzioba@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 069e0c3c4058 ("perf/x86/intel: Support full width counting")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agovhost-vsock: fix orphan connection reset
Peng Tao [Thu, 8 Dec 2016 17:10:46 +0000 (01:10 +0800)]
vhost-vsock: fix orphan connection reset

commit c4587631c7bad47c045e081d1553cd73a23be59a upstream.

local_addr.svm_cid is host cid. We should check guest cid instead,
which is remote_addr.svm_cid. Otherwise we end up resetting all
connections to all guests.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agosched/autogroup: Fix 64-bit kernel nice level adjustment
Mike Galbraith [Wed, 23 Nov 2016 10:33:37 +0000 (11:33 +0100)]
sched/autogroup: Fix 64-bit kernel nice level adjustment

commit 83929cce95251cc77e5659bf493bd424ae0e7a67 upstream.

Michael Kerrisk reported:

> Regarding the previous paragraph...  My tests indicate
> that writing *any* value to the autogroup [nice priority level]
> file causes the task group to get a lower priority.

Because autogroup didn't call the then meaningless scale_load()...

Autogroup nice level adjustment has been broken ever since load
resolution was increased for 64-bit kernels.  Use scale_load() to
scale group weight.

Michael Kerrisk tested this patch to fix the problem:

> Applied and tested against 4.9-rc6 on an Intel u7 (4 cores).
> Test setup:
>
> Terminal window 1: running 40 CPU burner jobs
> Terminal window 2: running 40 CPU burner jobs
> Terminal window 1: running  1 CPU burner job
>
> Demonstrated that:
> * Writing "0" to the autogroup file for TW1 now causes no change
>   to the rate at which the process on the terminal consume CPU.
> * Writing -20 to the autogroup file for TW1 caused those processes
>   to get the lion's share of CPU while TW2 TW3 get a tiny amount.
> * Writing -20 to the autogroup files for TW1 and TW3 allowed the
>   process on TW3 to get as much CPU as it was getting as when
>   the autogroup nice values for both terminals were 0.

Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Tested-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-man <linux-man@vger.kernel.org>
Link: http://lkml.kernel.org/r/1479897217.4306.6.camel@gmx.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoscsi: lpfc: fix oops/BUG in lpfc_sli_ringtxcmpl_put()
Mauricio Faria de Oliveira [Wed, 23 Nov 2016 12:33:19 +0000 (10:33 -0200)]
scsi: lpfc: fix oops/BUG in lpfc_sli_ringtxcmpl_put()

commit 2319f847a8910cff1d46c9b66aa1dd7cc3e836a9 upstream.

The BUG_ON() recently introduced in lpfc_sli_ringtxcmpl_put() is hit in
the lpfc_els_abort() > lpfc_sli_issue_abort_iotag() >
lpfc_sli_abort_iotag_issue() function path [similar names], due to
'piocb->vport == NULL':

BUG_ON(!piocb || !piocb->vport);

This happens because lpfc_sli_abort_iotag_issue() doesn't set the
'abtsiocbp->vport' pointer -- but this is not the problem.

Previously, lpfc_sli_ringtxcmpl_put() accessed 'piocb->vport' only if
'piocb->iocb.ulpCommand' is neither CMD_ABORT_XRI_CN nor
CMD_CLOSE_XRI_CN, which are the only possible values for
lpfc_sli_abort_iotag_issue():

    lpfc_sli_ringtxcmpl_put():

        if ((unlikely(pring->ringno == LPFC_ELS_RING)) &&
           (piocb->iocb.ulpCommand != CMD_ABORT_XRI_CN) &&
           (piocb->iocb.ulpCommand != CMD_CLOSE_XRI_CN) &&
            (!(piocb->vport->load_flag & FC_UNLOADING)))

    lpfc_sli_abort_iotag_issue():

        if (phba->link_state >= LPFC_LINK_UP)
                iabt->ulpCommand = CMD_ABORT_XRI_CN;
        else
                iabt->ulpCommand = CMD_CLOSE_XRI_CN;

So, this function path would not have hit this possible NULL pointer
dereference before.

In order to fix this regression, move the second part of the BUG_ON()
check prior to the pointer dereference that it does check for.

For reference, this is the stack trace observed. The problem happened
because an unsolicited event was received - a PLOGI was received after
our PLOGI was issued but not yet complete, so the discovery state
machine goes on to sw-abort our PLOGI.

    kernel BUG at drivers/scsi/lpfc/lpfc_sli.c:1326!
    Oops: Exception in kernel mode, sig: 5 [#1]
    <...>
    NIP [...] lpfc_sli_ringtxcmpl_put+0x1c/0xf0 [lpfc]
    LR  [...] __lpfc_sli_issue_iocb_s4+0x188/0x200 [lpfc]
    Call Trace:
    [...] [...] __lpfc_sli_issue_iocb_s4+0xb0/0x200 [lpfc] (unreliable)
    [...] [...] lpfc_sli_issue_abort_iotag+0x2b4/0x350 [lpfc]
    [...] [...] lpfc_els_abort+0x1a8/0x4a0 [lpfc]
    [...] [...] lpfc_rcv_plogi+0x6d4/0x700 [lpfc]
    [...] [...] lpfc_rcv_plogi_plogi_issue+0xd8/0x1d0 [lpfc]
    [...] [...] lpfc_disc_state_machine+0xc0/0x2b0 [lpfc]
    [...] [...] lpfc_els_unsol_buffer+0xcc0/0x26c0 [lpfc]
    [...] [...] lpfc_els_unsol_event+0xa8/0x220 [lpfc]
    [...] [...] lpfc_complete_unsol_iocb+0xb8/0x138 [lpfc]
    [...] [...] lpfc_sli4_handle_received_buffer+0x6a0/0xec0 [lpfc]
    [...] [...] lpfc_sli_handle_slow_ring_event_s4+0x1c4/0x240 [lpfc]
    [...] [...] lpfc_sli_handle_slow_ring_event+0x24/0x40 [lpfc]
    [...] [...] lpfc_do_work+0xd88/0x1970 [lpfc]
    [...] [...] kthread+0x108/0x130
    [...] [...] ret_from_kernel_thread+0x5c/0xbc
    <...>

Fixes: 22466da5b4b7 ("lpfc: Fix possible NULL pointer dereference")
Reported-by: Harsha Thyagaraja <hathyaga@in.ibm.com>
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodevice-dax: fix private mapping restriction, permit read-only
Dan Williams [Wed, 7 Dec 2016 01:03:35 +0000 (17:03 -0800)]
device-dax: fix private mapping restriction, permit read-only

commit 325896ffdf90f7cbd59fb873b7ba20d60d1ddf3c upstream.

Hugh notes in response to commit 4cb19355ea19 "device-dax: fail all
private mapping attempts":

  "I think that is more restrictive than you intended: haven't tried, but I
  believe it rejects a PROT_READ, MAP_SHARED, O_RDONLY fd mmap, leaving no
  way to mmap /dev/dax without write permission to it."

Indeed it does restrict read-only mappings, switch to checking
VM_MAYSHARE, not VM_SHARED.

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Pawel Lebioda <pawel.lebioda@intel.com>
Fixes: 4cb19355ea19 ("device-dax: fail all private mapping attempts")
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agolocking/rtmutex: Use READ_ONCE() in rt_mutex_owner()
Thomas Gleixner [Wed, 30 Nov 2016 21:04:42 +0000 (21:04 +0000)]
locking/rtmutex: Use READ_ONCE() in rt_mutex_owner()

commit 1be5d4fa0af34fb7bafa205aeb59f5c7cc7a089d upstream.

While debugging the rtmutex unlock vs. dequeue race Will suggested to use
READ_ONCE() in rt_mutex_owner() as it might race against the
cmpxchg_release() in unlock_rt_mutex_safe().

Will: "It's a minor thing which will most likely not matter in practice"

Careful search did not unearth an actual problem in todays code, but it's
better to be safe than surprised.

Suggested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20161130210030.431379999@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agolocking/rtmutex: Prevent dequeue vs. unlock race
Thomas Gleixner [Wed, 30 Nov 2016 21:04:41 +0000 (21:04 +0000)]
locking/rtmutex: Prevent dequeue vs. unlock race

commit dbb26055defd03d59f678cb5f2c992abe05b064a upstream.

David reported a futex/rtmutex state corruption. It's caused by the
following problem:

CPU0 CPU1 CPU2

l->owner=T1
rt_mutex_lock(l)
lock(l->wait_lock)
l->owner = T1 | HAS_WAITERS;
enqueue(T2)
boost()
  unlock(l->wait_lock)
schedule()

rt_mutex_lock(l)
lock(l->wait_lock)
l->owner = T1 | HAS_WAITERS;
enqueue(T3)
boost()
  unlock(l->wait_lock)
schedule()
signal(->T2) signal(->T3)
lock(l->wait_lock)
dequeue(T2)
deboost()
  unlock(l->wait_lock)
lock(l->wait_lock)
dequeue(T3)
  ===> wait list is now empty
deboost()
 unlock(l->wait_lock)
lock(l->wait_lock)
fixup_rt_mutex_waiters()
  if (wait_list_empty(l)) {
    owner = l->owner & ~HAS_WAITERS;
    l->owner = owner
     ==> l->owner = T1
  }

lock(l->wait_lock)
rt_mutex_unlock(l) fixup_rt_mutex_waiters()
  if (wait_list_empty(l)) {
    owner = l->owner & ~HAS_WAITERS;
cmpxchg(l->owner, T1, NULL)
 ===> Success (l->owner = NULL)
    l->owner = owner
     ==> l->owner = T1
  }

That means the problem is caused by fixup_rt_mutex_waiters() which does the
RMW to clear the waiters bit unconditionally when there are no waiters in
the rtmutexes rbtree.

This can be fatal: A concurrent unlock can release the rtmutex in the
fastpath because the waiters bit is not set. If the cmpxchg() gets in the
middle of the RMW operation then the previous owner, which just unlocked
the rtmutex is set as the owner again when the write takes place after the
successfull cmpxchg().

The solution is rather trivial: verify that the owner member of the rtmutex
has the waiters bit set before clearing it. This does not require a
cmpxchg() or other atomic operations because the waiters bit can only be
set and cleared with the rtmutex wait_lock held. It's also safe against the
fast path unlock attempt. The unlock attempt via cmpxchg() will either see
the bit set and take the slowpath or see the bit cleared and release it
atomically in the fastpath.

It's remarkable that the test program provided by David triggers on ARM64
and MIPS64 really quick, but it refuses to reproduce on x86-64, while the
problem exists there as well. That refusal might explain that this got not
discovered earlier despite the bug existing from day one of the rtmutex
implementation more than 10 years ago.

Thanks to David for meticulously instrumenting the code and providing the
information which allowed to decode this subtle problem.

Reported-by: David Daney <ddaney@caviumnetworks.com>
Tested-by: David Daney <david.daney@cavium.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core")
Link: http://lkml.kernel.org/r/20161130210030.351136722@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agozram: restrict add/remove attributes to root only
Sergey Senozhatsky [Wed, 7 Dec 2016 22:44:31 +0000 (14:44 -0800)]
zram: restrict add/remove attributes to root only

commit 5c7e9ccd91b90d87029261f8856294ee51934cab upstream.

zram hot_add sysfs attribute is a very 'special' attribute - reading
from it creates a new uninitialized zram device.  This file, by a
mistake, can be read by a 'normal' user at the moment, while only root
must be able to create a new zram device, therefore hot_add attribute
must have S_IRUSR mode, not S_IRUGO.

[akpm@linux-foundation.org: s/sence/sense/, reflow comment to use 80 cols]
Fixes: 6566d1a32bf72 ("zram: add dynamic device add/remove functionality")
Link: http://lkml.kernel.org/r/20161205155845.20129-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Steven Allen <steven@stebalien.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoparisc: Fix TLB related boot crash on SMP machines
Helge Deller [Thu, 8 Dec 2016 20:00:46 +0000 (21:00 +0100)]
parisc: Fix TLB related boot crash on SMP machines

commit 24d0492b7d5d321a9c5846c8c974eba9823ffaa0 upstream.

At bootup we run measurements to calculate the best threshold for when we
should be using full TLB flushes instead of just flushing a specific amount of
TLB entries.  This performance test is run over the kernel text segment.

But running this TLB performance test on the kernel text segment turned out to
crash some SMP machines when the kernel text pages were mapped as huge pages.

To avoid those crashes this patch simply skips this test on some SMP machines
and calculates an optimal threshold based on the maximum number of available
TLB entries and number of online CPUs.

On a technical side, this seems to happen:
The TLB measurement code uses flush_tlb_kernel_range() to flush specific TLB
entries with a page size of 4k (pdtlb 0(sr1,addr)). On UP systems this purge
instruction seems to work without problems even if the pages were mapped as
huge pages.  But on SMP systems the TLB purge instruction is broadcasted to
other CPUs. Those CPUs then crash the machine because the page size is not as
expected.  C8000 machines with PA8800/PA8900 CPUs were not affected by this
problem, because the required cache coherency prohibits to use huge pages at
all.  Sadly I didn't found any documentation about this behaviour, so this
finding is purely based on testing with phyiscal SMP machines (A500-44 and
J5000, both were 2-way boxes).

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoparisc: Remove unnecessary TLB purges from flush_dcache_page_asm and flush_icache_pag...
John David Anglin [Wed, 7 Dec 2016 03:02:01 +0000 (22:02 -0500)]
parisc: Remove unnecessary TLB purges from flush_dcache_page_asm and flush_icache_page_asm

commit febe42964fe182281859b3d43d844bb25ca49367 upstream.

We have four routines in pacache.S that use temporary alias pages:
copy_user_page_asm(), clear_user_page_asm(), flush_dcache_page_asm() and
flush_icache_page_asm().  copy_user_page_asm() and clear_user_page_asm()
don't purge the TLB entry used for the operation.
flush_dcache_page_asm() and flush_icache_page_asm do purge the entry.

Presumably, this was thought to optimize TLB use.  However, the
operation is quite heavy weight on PA 1.X processors as we need to take
the TLB lock and a TLB broadcast is sent to all processors.

This patch removes the purges from flush_dcache_page_asm() and
flush_icache_page_asm.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoparisc: Purge TLB before setting PTE
John David Anglin [Wed, 7 Dec 2016 02:47:04 +0000 (21:47 -0500)]
parisc: Purge TLB before setting PTE

commit c78e710c1c9fbeff43dddc0aa3d0ff458e70b0cc upstream.

The attached change interchanges the order of purging the TLB and
setting the corresponding page table entry.  TLB purges are strongly
ordered.  It occurred to me one night that setting the PTE first might
have subtle ordering issues on SMP machines and cause random memory
corruption.

A TLB lock guards the insertion of user TLB entries.  So after the TLB
is purged, a new entry can't be inserted until the lock is released.
This ensures that the new PTE value is used when the lock is released.

Since making this change, no random segmentation faults have been
observed on the Debian hppa buildd servers.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agofuse: fix clearing suid, sgid for chown()
Miklos Szeredi [Tue, 6 Dec 2016 15:18:45 +0000 (16:18 +0100)]
fuse: fix clearing suid, sgid for chown()

commit c01638f5d919728f565bf8b5e0a6a159642df0d9 upstream.

Basically, the pjdfstests set the ownership of a file to 06555, and then
chowns it (as root) to a new uid/gid. Prior to commit a09f99eddef4 ("fuse:
fix killing s[ug]id in setattr"), fuse would send down a setattr with both
the uid/gid change and a new mode.  Now, it just sends down the uid/gid
change.

Technically this is NOTABUG, since POSIX doesn't _require_ that we clear
these bits for a privileged process, but Linux (wisely) has done that and I
think we don't want to change that behavior here.

This is caused by the use of should_remove_suid(), which will always return
0 when the process has CAP_FSETID.

In fact we really don't need to be calling should_remove_suid() at all,
since we've already been indicated that we should remove the suid, we just
don't want to use a (very) stale mode for that.

This patch should fix the above as well as simplify the logic.

Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: a09f99eddef4 ("fuse: fix killing s[ug]id in setattr")
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agopowerpc/boot: Fix build failure in 32-bit boot wrapper
Ben Hutchings [Wed, 16 Nov 2016 18:27:56 +0000 (18:27 +0000)]
powerpc/boot: Fix build failure in 32-bit boot wrapper

commit 10c77dba40ff58fc03587b3b60725bb7fd723183 upstream.

OPAL is not callable from 32-bit mode and the assembly code for it
may not even build (depending on how binutils was configured).

References: https://buildd.debian.org/status/fetch.php?pkg=linux&arch=powerpcspe&ver=4.8.7-1&stamp=1479203712
Fixes: 656ad58ef19e ("powerpc/boot: Add OPAL console to epapr wrappers")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agopowerpc/mm: Fix lazy icache flush on pre-POWER5
Benjamin Herrenschmidt [Tue, 29 Nov 2016 02:13:46 +0000 (13:13 +1100)]
powerpc/mm: Fix lazy icache flush on pre-POWER5

commit dd7b2f035ec41a409f7a7cec7aabc0ec0eacf476 upstream.

On 64-bit CPUs with no-execute support and non-snooping icache, such as
970 or POWER4, we have a software mechanism to ensure coherency of the
cache (using exec faults when needed).

This was broken due to a logic error when the code was rewritten
from assembly to C, previously the assembly code did:

  BEGIN_FTR_SECTION
         mr      r4,r30
         mr      r5,r7
         bl      hash_page_do_lazy_icache
  END_FTR_SECTION(CPU_FTR_NOEXECUTE|CPU_FTR_COHERENT_ICACHE, CPU_FTR_NOEXECUTE)

Which tests that:
   (cpu_features & (NOEXECUTE | COHERENT_ICACHE)) == NOEXECUTE

Which says that the current cpu does have NOEXECUTE, but does not have
COHERENT_ICACHE.

Fixes: 91f1da99792a ("powerpc/mm: Convert 4k hash insert to C")
Fixes: 89ff725051d1 ("powerpc/mm: Convert __hash_page_64K to C")
Fixes: a43c0eb8364c ("powerpc/mm: Convert 4k insert from asm to C")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Change log verbosification]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agopowerpc/eeh: Fix deadlock when PE frozen state can't be cleared
Andrew Donnellan [Thu, 1 Dec 2016 00:23:05 +0000 (11:23 +1100)]
powerpc/eeh: Fix deadlock when PE frozen state can't be cleared

commit 409bf7f8a02ef88db5a0f2cdcf9489914f4b8508 upstream.

In eeh_reset_device(), we take the pci_rescan_remove_lock immediately after
after we call eeh_reset_pe() to reset the PCI controller. We then call
eeh_clear_pe_frozen_state(), which can return an error. In this case, we
bail out of eeh_reset_device() without calling pci_unlock_rescan_remove().

Add a call to pci_unlock_rescan_remove() in the eeh_clear_pe_frozen_state()
error path so that we don't cause a deadlock later on.

Reported-by: Pradipta Ghosh <pradghos@in.ibm.com>
Fixes: 78954700631f ("powerpc/eeh: Avoid I/O access during PE reset")
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoLinux 4.8.14 v4.8.14
Greg Kroah-Hartman [Sat, 10 Dec 2016 18:09:59 +0000 (19:09 +0100)]
Linux 4.8.14

7 years agoesp6: Fix integrity verification when ESN are used
Tobias Brunner [Tue, 29 Nov 2016 16:05:25 +0000 (17:05 +0100)]
esp6: Fix integrity verification when ESN are used

commit a55e23864d381c5a4ef110df94b00b2fe121a70d upstream.

When handling inbound packets, the two halves of the sequence number
stored on the skb are already in network order.

Fixes: 000ae7b2690e ("esp6: Switch to new AEAD interface")
Signed-off-by: Tobias Brunner <tobias@strongswan.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoesp4: Fix integrity verification when ESN are used
Tobias Brunner [Tue, 29 Nov 2016 16:05:20 +0000 (17:05 +0100)]
esp4: Fix integrity verification when ESN are used

commit 7c7fedd51c02f4418e8b2eed64bdab601f882aa4 upstream.

When handling inbound packets, the two halves of the sequence number
stored on the skb are already in network order.

Fixes: 7021b2e1cddd ("esp4: Switch to new AEAD interface")
Signed-off-by: Tobias Brunner <tobias@strongswan.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoflowcache: Increase threshold for refusing new allocations
Miroslav Urbanek [Mon, 21 Nov 2016 14:48:21 +0000 (15:48 +0100)]
flowcache: Increase threshold for refusing new allocations

commit 6b226487815574193c1da864f2eac274781a2b0c upstream.

The threshold for OOM protection is too small for systems with large
number of CPUs. Applications report ENOBUFs on connect() every 10
minutes.

The problem is that the variable net->xfrm.flow_cache_gc_count is a
global counter while the variable fc->high_watermark is a per-CPU
constant. Take the number of CPUs into account as well.

Fixes: 6ad3122a08e3 ("flowcache: Avoid OOM condition under preasure")
Reported-by: Lukáš Koldrt <lk@excello.cz>
Tested-by: Jan Hejl <jh@excello.cz>
Signed-off-by: Miroslav Urbanek <mu@miroslavurbanek.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoRevert: "ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()"
Eli Cooper [Thu, 1 Dec 2016 02:05:12 +0000 (10:05 +0800)]
Revert: "ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()"

commit 80d1106aeaf689ab5fdf33020c5fecd269b31c88 upstream.

This reverts commit ae148b085876fa771d9ef2c05f85d4b4bf09ce0d
("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()").

skb->protocol is now set in __ip_local_out() and __ip6_local_out() before
dst_output() is called. It is no longer necessary to do it for each tunnel.

Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoipv4: Set skb->protocol properly for local output
Eli Cooper [Thu, 1 Dec 2016 02:05:10 +0000 (10:05 +0800)]
ipv4: Set skb->protocol properly for local output

commit f4180439109aa720774baafdd798b3234ab1a0d2 upstream.

When xfrm is applied to TSO/GSO packets, it follows this path:

    xfrm_output() -> xfrm_output_gso() -> skb_gso_segment()

where skb_gso_segment() relies on skb->protocol to function properly.

This patch sets skb->protocol to ETH_P_IP before dst_output() is called,
fixing a bug where GSO packets sent through a sit tunnel are dropped
when xfrm is involved.

Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoipv6: Set skb->protocol properly for local output
Eli Cooper [Thu, 1 Dec 2016 02:05:11 +0000 (10:05 +0800)]
ipv6: Set skb->protocol properly for local output

commit b4e479a96fc398ccf83bb1cffb4ffef8631beaf1 upstream.

When xfrm is applied to TSO/GSO packets, it follows this path:

    xfrm_output() -> xfrm_output_gso() -> skb_gso_segment()

where skb_gso_segment() relies on skb->protocol to function properly.

This patch sets skb->protocol to ETH_P_IPV6 before dst_output() is called,
fixing a bug where GSO packets sent through an ipip6 tunnel are dropped
when xfrm is involved.

Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoDon't feed anything but regular iovec's to blk_rq_map_user_iov
Linus Torvalds [Wed, 7 Dec 2016 00:18:14 +0000 (16:18 -0800)]
Don't feed anything but regular iovec's to blk_rq_map_user_iov

commit a0ac402cfcdc904f9772e1762b3fda112dcc56a0 upstream.

In theory we could map other things, but there's a reason that function
is called "user_iov".  Using anything else (like splice can do) just
confuses it.

Reported-and-tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoconstify iov_iter_count() and iter_is_iovec()
Al Viro [Mon, 10 Oct 2016 17:57:37 +0000 (13:57 -0400)]
constify iov_iter_count() and iter_is_iovec()

commit b57332b4105abf1d518d93886e547ee2f98cd414 upstream.

[stable note, need this to prevent build warning in commit
a0ac402cfcdc904f9772e1762b3fda112dcc56a0]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agosparc32: Fix inverted invalid_frame_pointer checks on sigreturns
Andreas Larsson [Wed, 9 Nov 2016 09:43:05 +0000 (10:43 +0100)]
sparc32: Fix inverted invalid_frame_pointer checks on sigreturns

[ Upstream commit 07b5ab3f71d318e52c18cc3b73c1d44c908aacfa ]

Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agosparc64: fix compile warning section mismatch in find_node()
Thomas Tai [Sat, 12 Nov 2016 00:41:00 +0000 (16:41 -0800)]
sparc64: fix compile warning section mismatch in find_node()

[ Upstream commit 87a349f9cc0908bc0cfac0c9ece3179f650ae95a ]

A compile warning is introduced by a commit to fix the find_node().
This patch fix the compile warning by moving find_node() into __init
section. Because find_node() is only used by memblock_nid_range() which
is only used by a __init add_node_ranges(). find_node() and
memblock_nid_range() should also be inside __init section.

Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agosparc64: Fix find_node warning if numa node cannot be found
Thomas Tai [Thu, 3 Nov 2016 16:19:01 +0000 (09:19 -0700)]
sparc64: Fix find_node warning if numa node cannot be found

[ Upstream commit 74a5ed5c4f692df2ff0a2313ea71e81243525519 ]

When booting up LDOM, find_node() warns that a physical address
doesn't match a NUMA node.

WARNING: CPU: 0 PID: 0 at arch/sparc/mm/init_64.c:835
find_node+0xf4/0x120 find_node: A physical address doesn't
match a NUMA node rule. Some physical memory will be
owned by node 0.Modules linked in:

CPU: 0 PID: 0 Comm: swapper Not tainted 4.9.0-rc3 #4
Call Trace:
 [0000000000468ba0] __warn+0xc0/0xe0
 [0000000000468c74] warn_slowpath_fmt+0x34/0x60
 [00000000004592f4] find_node+0xf4/0x120
 [0000000000dd0774] add_node_ranges+0x38/0xe4
 [0000000000dd0b1c] numa_parse_mdesc+0x268/0x2e4
 [0000000000dd0e9c] bootmem_init+0xb8/0x160
 [0000000000dd174c] paging_init+0x808/0x8fc
 [0000000000dcb0d0] setup_arch+0x2c8/0x2f0
 [0000000000dc68a0] start_kernel+0x48/0x424
 [0000000000dcb374] start_early_boot+0x27c/0x28c
 [0000000000a32c08] tlb_fixup_done+0x4c/0x64
 [0000000000027f08] 0x27f08

It is because linux use an internal structure node_masks[] to
keep the best memory latency node only. However, LDOM mdesc can
contain single latency-group with multiple memory latency nodes.

If the address doesn't match the best latency node within
node_masks[], it should check for an alternative via mdesc.
The warning message should only be printed if the address
doesn't match any node_masks[] nor within mdesc. To minimize
the impact of searching mdesc every time, the last matched
mask and index is stored in a variable.

Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
Reviewed-by: Chris Hyser <chris.hyser@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoipv4: Drop suffix update from resize code
Alexander Duyck [Thu, 1 Dec 2016 12:27:57 +0000 (07:27 -0500)]
ipv4: Drop suffix update from resize code

[ Upstream commit a52ca62c4a6771028da9c1de934cdbcd93d54bb4 ]

It has been reported that update_suffix can be expensive when it is called
on a large node in which most of the suffix lengths are the same.  The time
required to add 200K entries had increased from around 3 seconds to almost
49 seconds.

In order to address this we need to move the code for updating the suffix
out of resize and instead just have it handled in the cases where we are
pushing a node that increases the suffix length, or will decrease the
suffix length.

Fixes: 5405afd1a306 ("fib_trie: Add tracking value for suffix length")
Reported-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Reviewed-by: Robert Shearman <rshearma@brocade.com>
Tested-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoipv4: Drop leaf from suffix pull/push functions
Alexander Duyck [Thu, 1 Dec 2016 12:27:52 +0000 (07:27 -0500)]
ipv4: Drop leaf from suffix pull/push functions

[ Upstream commit 1a239173cccff726b60ac6a9c79ae4a1e26cfa49 ]

It wasn't necessary to pass a leaf in when doing the suffix updates so just
drop it.  Instead just pass the suffix and work with that.

Since we dropped the leaf there is no need to include that in the name so
the names are updated to node_push_suffix and node_pull_suffix.

Finally I noticed that the logic for pulling the suffix length back
actually had some issues.  Specifically it would stop prematurely if there
was a longer suffix, but it was not as long as the original suffix.  I
updated the code to address that in node_pull_suffix.

Fixes: 5405afd1a306 ("fib_trie: Add tracking value for suffix length")
Suggested-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Reviewed-by: Robert Shearman <rshearma@brocade.com>
Tested-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoipv4: Fix memory leak in exception case for splitting tries
Alexander Duyck [Tue, 15 Nov 2016 10:46:12 +0000 (05:46 -0500)]
ipv4: Fix memory leak in exception case for splitting tries

[ Upstream commit 3114cdfe66c156345b0ae34e2990472f277e0c1b ]

Fix a small memory leak that can occur where we leak a fib_alias in the
event of us not being able to insert it into the local table.

Fixes: 0ddcf43d5d4a0 ("ipv4: FIB Local/MAIN table collapse")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoipv4: Restore fib_trie_flush_external function and fix call ordering
Alexander Duyck [Tue, 15 Nov 2016 10:46:06 +0000 (05:46 -0500)]
ipv4: Restore fib_trie_flush_external function and fix call ordering

[ Upstream commit 3b7093346b326e5d3590c7d49f6aefe6fa5b2c9a, the FIB offload
  removal didn't occur in 4.8 so that part of this patch isn't here.  However
  we still need to fib_unmerge() bits. ]

The patch that removed the FIB offload infrastructure was a bit too
aggressive and also removed code needed to clean up us splitting the table
if additional rules were added.  Specifically the function
fib_trie_flush_external was called at the end of a new rule being added to
flush the foreign trie entries from the main trie.

I updated the code so that we only call fib_trie_flush_external on the main
table so that we flush the entries for local from main.  This way we don't
call it for every rule change which is what was happening previously.

Fixes: 347e3b28c1ba2 ("switchdev: remove FIB offload infrastructure")
Reported-by: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: ping: check minimum size on ICMP header length
Kees Cook [Mon, 5 Dec 2016 18:34:38 +0000 (10:34 -0800)]
net: ping: check minimum size on ICMP header length

[ Upstream commit 0eab121ef8750a5c8637d51534d5e9143fb0633f ]

Prior to commit c0371da6047a ("put iov_iter into msghdr") in v3.19, there
was no check that the iovec contained enough bytes for an ICMP header,
and the read loop would walk across neighboring stack contents. Since the
iov_iter conversion, bad arguments are noticed, but the returned error is
EFAULT. Returning EINVAL is a clearer error and also solves the problem
prior to v3.19.

This was found using trinity with KASAN on v3.18:

BUG: KASAN: stack-out-of-bounds in memcpy_fromiovec+0x60/0x114 at addr ffffffc071077da0
Read of size 8 by task trinity-c2/9623
page:ffffffbe034b9a08 count:0 mapcount:0 mapping:          (null) index:0x0
flags: 0x0()
page dumped because: kasan: bad access detected
CPU: 0 PID: 9623 Comm: trinity-c2 Tainted: G    BU         3.18.0-dirty #15
Hardware name: Google Tegra210 Smaug Rev 1,3+ (DT)
Call trace:
[<ffffffc000209c98>] dump_backtrace+0x0/0x1ac arch/arm64/kernel/traps.c:90
[<ffffffc000209e54>] show_stack+0x10/0x1c arch/arm64/kernel/traps.c:171
[<     inline     >] __dump_stack lib/dump_stack.c:15
[<ffffffc000f18dc4>] dump_stack+0x7c/0xd0 lib/dump_stack.c:50
[<     inline     >] print_address_description mm/kasan/report.c:147
[<     inline     >] kasan_report_error mm/kasan/report.c:236
[<ffffffc000373dcc>] kasan_report+0x380/0x4b8 mm/kasan/report.c:259
[<     inline     >] check_memory_region mm/kasan/kasan.c:264
[<ffffffc00037352c>] __asan_load8+0x20/0x70 mm/kasan/kasan.c:507
[<ffffffc0005b9624>] memcpy_fromiovec+0x5c/0x114 lib/iovec.c:15
[<     inline     >] memcpy_from_msg include/linux/skbuff.h:2667
[<ffffffc000ddeba0>] ping_common_sendmsg+0x50/0x108 net/ipv4/ping.c:674
[<ffffffc000dded30>] ping_v4_sendmsg+0xd8/0x698 net/ipv4/ping.c:714
[<ffffffc000dc91dc>] inet_sendmsg+0xe0/0x12c net/ipv4/af_inet.c:749
[<     inline     >] __sock_sendmsg_nosec net/socket.c:624
[<     inline     >] __sock_sendmsg net/socket.c:632
[<ffffffc000cab61c>] sock_sendmsg+0x124/0x164 net/socket.c:643
[<     inline     >] SYSC_sendto net/socket.c:1797
[<ffffffc000cad270>] SyS_sendto+0x178/0x1d8 net/socket.c:1761

CVE-2016-8399

Reported-by: Qidan He <i@flanker017.me>
Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: avoid signed overflows for SO_{SND|RCV}BUFFORCE
Eric Dumazet [Fri, 2 Dec 2016 17:44:53 +0000 (09:44 -0800)]
net: avoid signed overflows for SO_{SND|RCV}BUFFORCE

[ Upstream commit b98b0bc8c431e3ceb4b26b0dfc8db509518fb290 ]

CAP_NET_ADMIN users should not be allowed to set negative
sk_sndbuf or sk_rcvbuf values, as it can lead to various memory
corruptions, crashes, OOM...

Note that before commit 82981930125a ("net: cleanups in
sock_setsockopt()"), the bug was even more serious, since SO_SNDBUF
and SO_RCVBUF were vulnerable.

This needs to be backported to all known linux kernels.

Again, many thanks to syzkaller team for discovering this gem.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agogeneve: avoid use-after-free of skb->data
Sabrina Dubroca [Fri, 2 Dec 2016 15:49:29 +0000 (16:49 +0100)]
geneve: avoid use-after-free of skb->data

[ Upstream commit 5b01014759991887b1e450c9def01e58c02ab81b ]

geneve{,6}_build_skb can end up doing a pskb_expand_head(), which
makes the ip_hdr(skb) reference we stashed earlier stale. Since it's
only needed as an argument to ip_tunnel_ecn_encap(), move this
directly in the function call.

Fixes: 08399efc6319 ("geneve: ensure ECN info is handled properly in all tx/rx paths")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agotipc: check minimum bearer MTU
Michal Kubeček [Fri, 2 Dec 2016 08:33:41 +0000 (09:33 +0100)]
tipc: check minimum bearer MTU

[ Upstream commit 3de81b758853f0b29c61e246679d20b513c4cfec ]

Qian Zhang (张谦) reported a potential socket buffer overflow in
tipc_msg_build() which is also known as CVE-2016-8632: due to
insufficient checks, a buffer overflow can occur if MTU is too short for
even tipc headers. As anyone can set device MTU in a user/net namespace,
this issue can be abused by a regular user.

As agreed in the discussion on Ben Hutchings' original patch, we should
check the MTU at the moment a bearer is attached rather than for each
processed packet. We also need to repeat the check when bearer MTU is
adjusted to new device MTU. UDP case also needs a check to avoid
overflow when calculating bearer MTU.

Fixes: b97bf3fd8f6a ("[TIPC] Initial merge")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reported-by: Qian Zhang (张谦) <zhangqian-c@360.cn>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agosh_eth: remove unchecked interrupts for RZ/A1
Chris Brandt [Thu, 1 Dec 2016 18:32:14 +0000 (13:32 -0500)]
sh_eth: remove unchecked interrupts for RZ/A1

[ Upstream commit 33d446dbba4d4d6a77e1e900d434fa99e0f02c86 ]

When streaming a lot of data and the RZ/A1 can't keep up, some status bits
will get set that are not being checked or cleared which cause the
following messages and the Ethernet driver to stop working. This
patch fixes that issue.

irq 21: nobody cared (try booting with the "irqpoll" option)
handlers:
[<c036b71c>] sh_eth_interrupt
Disabling IRQ #21

Fixes: db893473d313a4ad ("sh_eth: Add support for r7s72100")
Signed-off-by: Chris Brandt <chris.brandt@renesas.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: bcmgenet: Utilize correct struct device for all DMA operations
Florian Fainelli [Thu, 1 Dec 2016 17:45:45 +0000 (09:45 -0800)]
net: bcmgenet: Utilize correct struct device for all DMA operations

[ Upstream commit 8c4799ac799665065f9bf1364fd71bf4f7dc6a4a ]

__bcmgenet_tx_reclaim() and bcmgenet_free_rx_buffers() are not using the
same struct device during unmap that was used for the map operation,
which makes DMA-API debugging warn about it. Fix this by always using
&priv->pdev->dev throughout the driver, using an identical device
reference for all map/unmap calls.

Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocdc_ether: Fix handling connection notification
Kristian Evensen [Thu, 1 Dec 2016 13:23:17 +0000 (14:23 +0100)]
cdc_ether: Fix handling connection notification

[ Upstream commit d5c83d0d1d83b3798c71e0c8b7c3624d39c91d88 ]

Commit bfe9b9d2df66 ("cdc_ether: Improve ZTE MF823/831/910 handling")
introduced a work-around in usbnet_cdc_status() for devices that exported
cdc carrier on twice on connect. Before the commit, this behavior caused
the link state to be incorrect. It was assumed that all CDC Ethernet
devices would either export this behavior, or send one off and then one on
notification (which seems to be the default behavior).

Unfortunately, it turns out multiple devices sends a connection
notification multiple times per second (via an interrupt), even when
connection state does not change. This has been observed with several
different USB LAN dongles (at least), for example 13b1:0041 (Linksys).
After bfe9b9d2df66, the link state has been set as down and then up for
each notification. This has caused a flood of Netlink NEWLINK messages and
syslog to be flooded with messages similar to:

cdc_ether 2-1:2.0 eth1: kevent 12 may have been dropped

This commit fixes the behavior by reverting usbnet_cdc_status() to how it
was before bfe9b9d2df66. The work-around has been moved to a separate
status-function which is only called when a known, affect device is
detected.

v1->v2:

* Do not open-code netif_carrier_ok() (thanks Henning Schild).
* Call netif_carrier_off() instead of usb_link_change(). This prevents
calling schedule_work() twice without giving the work queue a chance to be
processed (thanks Bjørn Mork).

Fixes: bfe9b9d2df66 ("cdc_ether: Improve ZTE MF823/831/910 handling")
Reported-by: Henning Schild <henning.schild@siemens.com>
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoip6_offload: check segs for NULL in ipv6_gso_segment.
Artem Savkov [Thu, 1 Dec 2016 13:06:04 +0000 (14:06 +0100)]
ip6_offload: check segs for NULL in ipv6_gso_segment.

[ Upstream commit 6b6ebb6b01c873d0cfe3449e8a1219ee6e5fc022 ]

segs needs to be checked for being NULL in ipv6_gso_segment() before calling
skb_shinfo(segs), otherwise kernel can run into a NULL-pointer dereference:

[   97.811262] BUG: unable to handle kernel NULL pointer dereference at 00000000000000cc
[   97.819112] IP: [<ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0
[   97.825214] PGD 0 [   97.827047]
[   97.828540] Oops: 0000 [#1] SMP
[   97.831678] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 rpcsec_gss_krb5
nfsv4 dns_resolver nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4
iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter
bridge stp llc snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel
snd_hda_codec edac_mce_amd snd_hda_core edac_core snd_hwdep kvm_amd snd_seq kvm snd_seq_device
snd_pcm irqbypass snd_timer ppdev parport_serial snd parport_pc k10temp pcspkr soundcore parport
sp5100_tco shpchp sg wmi i2c_piix4 acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc
ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic pata_acpi amdkfd amd_iommu_v2 radeon
broadcom bcm_phy_lib i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
ttm ahci serio_raw tg3 firewire_ohci libahci pata_atiixp drm ptp libata firewire_core pps_core
i2c_core crc_itu_t fjes dm_mirror dm_region_hash dm_log dm_mod
[   97.927721] CPU: 1 PID: 3504 Comm: vhost-3495 Not tainted 4.9.0-7.el7.test.x86_64 #1
[   97.935457] Hardware name: AMD Snook/Snook, BIOS ESK0726A 07/26/2010
[   97.941806] task: ffff880129a1c080 task.stack: ffffc90001bcc000
[   97.947720] RIP: 0010:[<ffffffff816e52f9>]  [<ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0
[   97.956251] RSP: 0018:ffff88012fc43a10  EFLAGS: 00010207
[   97.961557] RAX: 0000000000000000 RBX: ffff8801292c8700 RCX: 0000000000000594
[   97.968687] RDX: 0000000000000593 RSI: ffff880129a846c0 RDI: 0000000000240000
[   97.975814] RBP: ffff88012fc43a68 R08: ffff880129a8404e R09: 0000000000000000
[   97.982942] R10: 0000000000000000 R11: ffff880129a84076 R12: 00000020002949b3
[   97.990070] R13: ffff88012a580000 R14: 0000000000000000 R15: ffff88012a580000
[   97.997198] FS:  0000000000000000(0000) GS:ffff88012fc40000(0000) knlGS:0000000000000000
[   98.005280] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   98.011021] CR2: 00000000000000cc CR3: 0000000126c5d000 CR4: 00000000000006e0
[   98.018149] Stack:
[   98.020157]  00000000ffffffff ffff88012fc43ac8 ffffffffa017ad0a 000000000000000e
[   98.027584]  0000001300000000 0000000077d59998 ffff8801292c8700 00000020002949b3
[   98.035010]  ffff88012a580000 0000000000000000 ffff88012a580000 ffff88012fc43a98
[   98.042437] Call Trace:
[   98.044879]  <IRQ> [   98.046803]  [<ffffffffa017ad0a>] ? tg3_start_xmit+0x84a/0xd60 [tg3]
[   98.053156]  [<ffffffff815eeee0>] skb_mac_gso_segment+0xb0/0x130
[   98.059158]  [<ffffffff815eefd3>] __skb_gso_segment+0x73/0x110
[   98.064985]  [<ffffffff815ef40d>] validate_xmit_skb+0x12d/0x2b0
[   98.070899]  [<ffffffff815ef5d2>] validate_xmit_skb_list+0x42/0x70
[   98.077073]  [<ffffffff81618560>] sch_direct_xmit+0xd0/0x1b0
[   98.082726]  [<ffffffff815efd86>] __dev_queue_xmit+0x486/0x690
[   98.088554]  [<ffffffff8135c135>] ? cpumask_next_and+0x35/0x50
[   98.094380]  [<ffffffff815effa0>] dev_queue_xmit+0x10/0x20
[   98.099863]  [<ffffffffa09ce057>] br_dev_queue_push_xmit+0xa7/0x170 [bridge]
[   98.106907]  [<ffffffffa09ce161>] br_forward_finish+0x41/0xc0 [bridge]
[   98.113430]  [<ffffffff81627cf2>] ? nf_iterate+0x52/0x60
[   98.118735]  [<ffffffff81627d6b>] ? nf_hook_slow+0x6b/0xc0
[   98.124216]  [<ffffffffa09ce32c>] __br_forward+0x14c/0x1e0 [bridge]
[   98.130480]  [<ffffffffa09ce120>] ? br_dev_queue_push_xmit+0x170/0x170 [bridge]
[   98.137785]  [<ffffffffa09ce4bd>] br_forward+0x9d/0xb0 [bridge]
[   98.143701]  [<ffffffffa09cfbb7>] br_handle_frame_finish+0x267/0x560 [bridge]
[   98.150834]  [<ffffffffa09d0064>] br_handle_frame+0x174/0x2f0 [bridge]
[   98.157355]  [<ffffffff8102fb89>] ? sched_clock+0x9/0x10
[   98.162662]  [<ffffffff810b63b2>] ? sched_clock_cpu+0x72/0xa0
[   98.168403]  [<ffffffff815eccf5>] __netif_receive_skb_core+0x1e5/0xa20
[   98.174926]  [<ffffffff813659f9>] ? timerqueue_add+0x59/0xb0
[   98.180580]  [<ffffffff815ed548>] __netif_receive_skb+0x18/0x60
[   98.186494]  [<ffffffff815ee625>] process_backlog+0x95/0x140
[   98.192145]  [<ffffffff815edccd>] net_rx_action+0x16d/0x380
[   98.197713]  [<ffffffff8170cff1>] __do_softirq+0xd1/0x283
[   98.203106]  [<ffffffff8170b2bc>] do_softirq_own_stack+0x1c/0x30
[   98.209107]  <EOI> [   98.211029]  [<ffffffff8108a5c0>] do_softirq+0x50/0x60
[   98.216166]  [<ffffffff815ec853>] netif_rx_ni+0x33/0x80
[   98.221386]  [<ffffffffa09eeff7>] tun_get_user+0x487/0x7f0 [tun]
[   98.227388]  [<ffffffffa09ef3ab>] tun_sendmsg+0x4b/0x60 [tun]
[   98.233129]  [<ffffffffa0b68932>] handle_tx+0x282/0x540 [vhost_net]
[   98.239392]  [<ffffffffa0b68c25>] handle_tx_kick+0x15/0x20 [vhost_net]
[   98.245916]  [<ffffffffa0abacfe>] vhost_worker+0x9e/0xf0 [vhost]
[   98.251919]  [<ffffffffa0abac60>] ? vhost_umem_alloc+0x40/0x40 [vhost]
[   98.258440]  [<ffffffff81003a47>] ? do_syscall_64+0x67/0x180
[   98.264094]  [<ffffffff810a44d9>] kthread+0xd9/0xf0
[   98.268965]  [<ffffffff810a4400>] ? kthread_park+0x60/0x60
[   98.274444]  [<ffffffff8170a4d5>] ret_from_fork+0x25/0x30
[   98.279836] Code: 8b 93 d8 00 00 00 48 2b 93 d0 00 00 00 4c 89 e6 48 89 df 66 89 93 c2 00 00 00 ff 10 48 3d 00 f0 ff ff 49 89 c2 0f 87 52 01 00 00 <41> 8b 92 cc 00 00 00 48 8b 80 d0 00 00 00 44 0f b7 74 10 06 66
[   98.299425] RIP  [<ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0
[   98.305612]  RSP <ffff88012fc43a10>
[   98.309094] CR2: 00000000000000cc
[   98.312406] ---[ end trace 726a2c7a2d2d78d0 ]---

Signed-off-by: Artem Savkov <asavkov@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agopacket: fix race condition in packet_set_ring
Philip Pettersson [Wed, 30 Nov 2016 22:55:36 +0000 (14:55 -0800)]
packet: fix race condition in packet_set_ring

[ Upstream commit 84ac7260236a49c79eede91617700174c2c19b0c ]

When packet_set_ring creates a ring buffer it will initialize a
struct timer_list if the packet version is TPACKET_V3. This value
can then be raced by a different thread calling setsockopt to
set the version to TPACKET_V1 before packet_set_ring has finished.

This leads to a use-after-free on a function pointer in the
struct timer_list when the socket is closed as the previously
initialized timer will not be deleted.

The bug is fixed by taking lock_sock(sk) in packet_setsockopt when
changing the packet version while also taking the lock at the start
of packet_set_ring.

Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")
Signed-off-by: Philip Pettersson <philip.pettersson@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoGSO: Reload iph after pskb_may_pull
Arnaldo Carvalho de Melo [Mon, 28 Nov 2016 15:36:58 +0000 (12:36 -0300)]
GSO: Reload iph after pskb_may_pull

[ Upstream commit a510887824171ad260cc4a2603396c6247fdd091 ]

As it may get stale and lead to use after free.

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Alexander Duyck <aduyck@mirantis.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Fixes: cbc53e08a793 ("GSO: Add GSO type for fixed IPv4 ID")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet/dccp: fix use-after-free in dccp_invalid_packet
Eric Dumazet [Mon, 28 Nov 2016 14:26:49 +0000 (06:26 -0800)]
net/dccp: fix use-after-free in dccp_invalid_packet

[ Upstream commit 648f0c28df282636c0c8a7a19ca3ce5fc80a39c3 ]

pskb_may_pull() can reallocate skb->head, we need to reload dh pointer
in dccp_invalid_packet() or risk use after free.

Bug found by Andrey Konovalov using syzkaller.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: macb: fix the RX queue reset in macb_rx()
Cyrille Pitchen [Mon, 28 Nov 2016 13:40:55 +0000 (14:40 +0100)]
net: macb: fix the RX queue reset in macb_rx()

[ Upstream commit a0b44eea372b449ef9744fb1d90491cc063289b8 ]

On macb only (not gem), when a RX queue corruption was detected from
macb_rx(), the RX queue was reset: during this process the RX ring
buffer descriptor was initialized by macb_init_rx_ring() but we forgot
to also set bp->rx_tail to 0.

Indeed, when processing the received frames, bp->rx_tail provides the
macb driver with the index in the RX ring buffer of the next buffer to
process. So when the whole ring buffer is reset we must also reset
bp->rx_tail so the driver is synchronized again with the hardware.

Since macb_init_rx_ring() is called from many locations, currently from
macb_rx() and macb_init_rings(), we'd rather add the "bp->rx_tail = 0;"
line inside macb_init_rx_ring() than add the very same line after each
call of this function.

Without this fix, the rx queue is not reset properly to recover from
queue corruption and connection drop may occur.

Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Fixes: 9ba723b081a2 ("net: macb: remove BUG_ON() and reset the queue to handle RX errors")
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonetlink: Do not schedule work from sk_destruct
Herbert Xu [Mon, 5 Dec 2016 07:28:21 +0000 (15:28 +0800)]
netlink: Do not schedule work from sk_destruct

[ Upstream commit ed5d7788a934a4b6d6d025e948ed4da496b4f12e ]

It is wrong to schedule a work from sk_destruct using the socket
as the memory reserve because the socket will be freed immediately
after the return from sk_destruct.

Instead we should do the deferral prior to sk_free.

This patch does just that.

Fixes: 707693c8a498 ("netlink: Call cb->done from a worker thread")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonetlink: Call cb->done from a worker thread
Herbert Xu [Mon, 28 Nov 2016 11:22:12 +0000 (19:22 +0800)]
netlink: Call cb->done from a worker thread

[ Upstream commit 707693c8a498697aa8db240b93eb76ec62e30892 ]

The cb->done interface expects to be called in process context.
This was broken by the netlink RCU conversion.  This patch fixes
it by adding a worker struct to make the cb->done call where
necessary.

Fixes: 21e4902aea80 ("netlink: Lockless lookup with RCU grace...")
Reported-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet/sched: pedit: make sure that offset is valid
Amir Vadai [Mon, 28 Nov 2016 10:56:40 +0000 (12:56 +0200)]
net/sched: pedit: make sure that offset is valid

[ Upstream commit 95c2027bfeda21a28eb245121e6a249f38d0788e ]

Add a validation function to make sure offset is valid:
1. Not below skb head (could happen when offset is negative).
2. Validate both 'offset' and 'at'.

Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: dsa: fix unbalanced dsa_switch_tree reference counting
Nikita Yushchenko [Mon, 28 Nov 2016 06:48:48 +0000 (09:48 +0300)]
net: dsa: fix unbalanced dsa_switch_tree reference counting

[ Upstream commit 7a99cd6e213685b78118382e6a8fed506c82ccb2 ]

_dsa_register_switch() gets a dsa_switch_tree object either via
dsa_get_dst() or via dsa_add_dst(). Former path does not increase kref
in returned object (resulting into caller not owning a reference),
while later path does create a new object (resulting into caller owning
a reference).

The rest of _dsa_register_switch() assumes that it owns a reference, and
calls dsa_put_dst().

This causes a memory breakage if first switch in the tree initialized
successfully, but second failed to initialize. In particular, freed
dsa_swith_tree object is left referenced by switch that was initialized,
and later access to sysfs attributes of that switch cause OOPS.

To fix, need to add kref_get() call to dsa_get_dst().

Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation")
Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet, sched: respect rcu grace period on cls destruction
Daniel Borkmann [Sun, 27 Nov 2016 00:18:01 +0000 (01:18 +0100)]
net, sched: respect rcu grace period on cls destruction

[ Upstream commit d936377414fadbafb4d17148d222fe45ca5442d4 ]

Roi reported a crash in flower where tp->root was NULL in ->classify()
callbacks. Reason is that in ->destroy() tp->root is set to NULL via
RCU_INIT_POINTER(). It's problematic for some of the classifiers, because
this doesn't respect RCU grace period for them, and as a result, still
outstanding readers from tc_classify() will try to blindly dereference
a NULL tp->root.

The tp->root object is strictly private to the classifier implementation
and holds internal data the core such as tc_ctl_tfilter() doesn't know
about. Within some classifiers, such as cls_bpf, cls_basic, etc, tp->root
is only checked for NULL in ->get() callback, but nowhere else. This is
misleading and seemed to be copied from old classifier code that was not
cleaned up properly. For example, d3fa76ee6b4a ("[NET_SCHED]: cls_basic:
fix NULL pointer dereference") moved tp->root initialization into ->init()
routine, where before it was part of ->change(), so ->get() had to deal
with tp->root being NULL back then, so that was indeed a valid case, after
d3fa76ee6b4a, not really anymore. We used to set tp->root to NULL long
ago in ->destroy(), see 47a1a1d4be29 ("pkt_sched: remove unnecessary xchg()
in packet classifiers"); but the NULLifying was reintroduced with the
RCUification, but it's not correct for every classifier implementation.

In the cases that are fixed here with one exception of cls_cgroup, tp->root
object is allocated and initialized inside ->init() callback, which is always
performed at a point in time after we allocate a new tp, which means tp and
thus tp->root was not globally visible in the tp chain yet (see tc_ctl_tfilter()).
Also, on destruction tp->root is strictly kfree_rcu()'ed in ->destroy()
handler, same for the tp which is kfree_rcu()'ed right when we return
from ->destroy() in tcf_destroy(). This means, the head object's lifetime
for such classifiers is always tied to the tp lifetime. The RCU callback
invocation for the two kfree_rcu() could be out of order, but that's fine
since both are independent.

Dropping the RCU_INIT_POINTER(tp->root, NULL) for these classifiers here
means that 1) we don't need a useless NULL check in fast-path and, 2) that
outstanding readers of that tp in tc_classify() can still execute under
respect with RCU grace period as it is actually expected.

Things that haven't been touched here: cls_fw and cls_route. They each
handle tp->root being NULL in ->classify() path for historic reasons, so
their ->destroy() implementation can stay as is. If someone actually
cares, they could get cleaned up at some point to avoid the test in fast
path. cls_u32 doesn't set tp->root to NULL. For cls_rsvp, I just added a
!head should anyone actually be using/testing it, so it at least aligns with
cls_fw and cls_route. For cls_flower we additionally need to defer rhashtable
destruction (to a sleepable context) after RCU grace period as concurrent
readers might still access it. (Note that in this case we need to hold module
reference to keep work callback address intact, since we only wait on module
unload for all call_rcu()s to finish.)

This fixes one race to bring RCU grace period guarantees back. Next step
as worked on by Cong however is to fix 1e052be69d04 ("net_sched: destroy
proto tp when all filters are gone") to get the order of unlinking the tp
in tc_ctl_tfilter() for the RTM_DELTFILTER case right by moving
RCU_INIT_POINTER() before tcf_destroy() and let the notification for
removal be done through the prior ->delete() callback. Both are independant
issues. Once we have that right, we can then clean tp->root up for a number
of classifiers by not making them RCU pointers, which requires a new callback
(->uninit) that is triggered from tp's RCU callback, where we just kfree()
tp->root from there.

Fixes: 1f947bf151e9 ("net: sched: rcu'ify cls_bpf")
Fixes: 9888faefe132 ("net: sched: cls_basic use RCU")
Fixes: 70da9f0bf999 ("net: sched: cls_flow use RCU")
Fixes: 77b9900ef53a ("tc: introduce Flower classifier")
Fixes: bf3994d2ed31 ("net/sched: introduce Match-all classifier")
Fixes: 952313bd6258 ("net: sched: cls_cgroup use RCU")
Reported-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Roi Dayan <roid@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: dsa: bcm_sf2: Ensure we re-negotiate EEE during after link change
Florian Fainelli [Tue, 22 Nov 2016 19:40:58 +0000 (11:40 -0800)]
net: dsa: bcm_sf2: Ensure we re-negotiate EEE during after link change

[ Upstream commit 76da8706d90d8641eeb9b8e579942ed80b6c0880 ]

In case the link change and EEE is enabled or disabled, always try to
re-negotiate this with the link partner.

Fixes: 450b05c15f9c ("net: dsa: bcm_sf2: add support for controlling EEE")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoudplite: call proper backlog handlers
Eric Dumazet [Tue, 22 Nov 2016 17:06:45 +0000 (09:06 -0800)]
udplite: call proper backlog handlers

[ Upstream commit 30c7be26fd3587abcb69587f781098e3ca2d565b ]

In commits 93821778def10 ("udp: Fix rcv socket locking") and
f7ad74fef3af ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into
__udpv6_queue_rcv_skb") UDP backlog handlers were renamed, but UDPlite
was forgotten.

This leads to crashes if UDPlite header is pulled twice, which happens
starting from commit e6afc8ace6dd ("udp: remove headers from UDP packets
before queueing")

Bug found by syzkaller team, thanks a lot guys !

Note that backlog use in UDP/UDPlite is scheduled to be removed starting
from linux-4.10, so this patch is only needed up to linux-4.9

Fixes: 93821778def1 ("udp: Fix rcv socket locking")
Fixes: f7ad74fef3af ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into __udpv6_queue_rcv_skb")
Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoipv6: bump genid when the IFA_F_TENTATIVE flag is clear
Paolo Abeni [Tue, 22 Nov 2016 15:57:40 +0000 (16:57 +0100)]
ipv6: bump genid when the IFA_F_TENTATIVE flag is clear

[ Upstream commit 764d3be6e415b40056834bfd29b994dc3f837606 ]

When an ipv6 address has the tentative flag set, it can't be
used as source for egress traffic, while the associated route,
if any, can be looked up and even stored into some dst_cache.

In the latter scenario, the source ipv6 address selected and
stored in the cache is most probably wrong (e.g. with
link-local scope) and the entity using the dst_cache will
experience lack of ipv6 connectivity until said cache is
cleared or invalidated.

Overall this may cause lack of connectivity over most IPv6 tunnels
(comprising geneve and vxlan), if the first egress packet reaches
the tunnel before the DaD is completed for the used ipv6
address.

This patch bumps a new genid after that the IFA_F_TENTATIVE flag
is cleared, so that dst_cache will be invalidated on
next lookup and ipv6 connectivity restored.

Fixes: 0c1d70af924b ("net: use dst_cache for vxlan device")
Fixes: 468dfffcd762 ("geneve: add dst caching support")
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agortnl: fix the loop index update error in rtnl_dump_ifinfo()
Zhang Shengju [Sat, 19 Nov 2016 15:28:32 +0000 (23:28 +0800)]
rtnl: fix the loop index update error in rtnl_dump_ifinfo()

[ Upstream commit 3f0ae05d6fea0ed5b19efdbc9c9f8e02685a3af3 ]

If the link is filtered out, loop index should also be updated. If not,
loop index will not be correct.

Fixes: dc599f76c22b0 ("net: Add support for filtering link dump by master device and kind")
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agol2tp: fix racy SOCK_ZAPPED flag check in l2tp_ip{,6}_bind()
Guillaume Nault [Fri, 18 Nov 2016 21:13:00 +0000 (22:13 +0100)]
l2tp: fix racy SOCK_ZAPPED flag check in l2tp_ip{,6}_bind()

[ Upstream commit 32c231164b762dddefa13af5a0101032c70b50ef ]

Lock socket before checking the SOCK_ZAPPED flag in l2tp_ip6_bind().
Without lock, a concurrent call could modify the socket flags between
the sock_flag(sk, SOCK_ZAPPED) test and the lock_sock() call. This way,
a socket could be inserted twice in l2tp_ip6_bind_table. Releasing it
would then leave a stale pointer there, generating use-after-free
errors when walking through the list or modifying adjacent entries.

BUG: KASAN: use-after-free in l2tp_ip6_close+0x22e/0x290 at addr ffff8800081b0ed8
Write of size 8 by task syz-executor/10987
CPU: 0 PID: 10987 Comm: syz-executor Not tainted 4.8.0+ #39
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
 ffff880031d97838 ffffffff829f835b ffff88001b5a1640 ffff8800081b0ec0
 ffff8800081b15a0 ffff8800081b6d20 ffff880031d97860 ffffffff8174d3cc
 ffff880031d978f0 ffff8800081b0e80 ffff88001b5a1640 ffff880031d978e0
Call Trace:
 [<ffffffff829f835b>] dump_stack+0xb3/0x118 lib/dump_stack.c:15
 [<ffffffff8174d3cc>] kasan_object_err+0x1c/0x70 mm/kasan/report.c:156
 [<     inline     >] print_address_description mm/kasan/report.c:194
 [<ffffffff8174d666>] kasan_report_error+0x1f6/0x4d0 mm/kasan/report.c:283
 [<     inline     >] kasan_report mm/kasan/report.c:303
 [<ffffffff8174db7e>] __asan_report_store8_noabort+0x3e/0x40 mm/kasan/report.c:329
 [<     inline     >] __write_once_size ./include/linux/compiler.h:249
 [<     inline     >] __hlist_del ./include/linux/list.h:622
 [<     inline     >] hlist_del_init ./include/linux/list.h:637
 [<ffffffff8579047e>] l2tp_ip6_close+0x22e/0x290 net/l2tp/l2tp_ip6.c:239
 [<ffffffff850b2dfd>] inet_release+0xed/0x1c0 net/ipv4/af_inet.c:415
 [<ffffffff851dc5a0>] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
 [<ffffffff84c4581d>] sock_release+0x8d/0x1d0 net/socket.c:570
 [<ffffffff84c45976>] sock_close+0x16/0x20 net/socket.c:1017
 [<ffffffff817a108c>] __fput+0x28c/0x780 fs/file_table.c:208
 [<ffffffff817a1605>] ____fput+0x15/0x20 fs/file_table.c:244
 [<ffffffff813774f9>] task_work_run+0xf9/0x170
 [<ffffffff81324aae>] do_exit+0x85e/0x2a00
 [<ffffffff81326dc8>] do_group_exit+0x108/0x330
 [<ffffffff81348cf7>] get_signal+0x617/0x17a0 kernel/signal.c:2307
 [<ffffffff811b49af>] do_signal+0x7f/0x18f0
 [<ffffffff810039bf>] exit_to_usermode_loop+0xbf/0x150 arch/x86/entry/common.c:156
 [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:190
 [<ffffffff81006060>] syscall_return_slowpath+0x1a0/0x1e0 arch/x86/entry/common.c:259
 [<ffffffff85e4d726>] entry_SYSCALL_64_fastpath+0xc4/0xc6
Object at ffff8800081b0ec0, in cache L2TP/IPv6 size: 1448
Allocated:
PID = 10987
 [ 1116.897025] [<ffffffff811ddcb6>] save_stack_trace+0x16/0x20
 [ 1116.897025] [<ffffffff8174c736>] save_stack+0x46/0xd0
 [ 1116.897025] [<ffffffff8174c9ad>] kasan_kmalloc+0xad/0xe0
 [ 1116.897025] [<ffffffff8174cee2>] kasan_slab_alloc+0x12/0x20
 [ 1116.897025] [<     inline     >] slab_post_alloc_hook mm/slab.h:417
 [ 1116.897025] [<     inline     >] slab_alloc_node mm/slub.c:2708
 [ 1116.897025] [<     inline     >] slab_alloc mm/slub.c:2716
 [ 1116.897025] [<ffffffff817476a8>] kmem_cache_alloc+0xc8/0x2b0 mm/slub.c:2721
 [ 1116.897025] [<ffffffff84c4f6a9>] sk_prot_alloc+0x69/0x2b0 net/core/sock.c:1326
 [ 1116.897025] [<ffffffff84c58ac8>] sk_alloc+0x38/0xae0 net/core/sock.c:1388
 [ 1116.897025] [<ffffffff851ddf67>] inet6_create+0x2d7/0x1000 net/ipv6/af_inet6.c:182
 [ 1116.897025] [<ffffffff84c4af7b>] __sock_create+0x37b/0x640 net/socket.c:1153
 [ 1116.897025] [<     inline     >] sock_create net/socket.c:1193
 [ 1116.897025] [<     inline     >] SYSC_socket net/socket.c:1223
 [ 1116.897025] [<ffffffff84c4b46f>] SyS_socket+0xef/0x1b0 net/socket.c:1203
 [ 1116.897025] [<ffffffff85e4d685>] entry_SYSCALL_64_fastpath+0x23/0xc6
Freed:
PID = 10987
 [ 1116.897025] [<ffffffff811ddcb6>] save_stack_trace+0x16/0x20
 [ 1116.897025] [<ffffffff8174c736>] save_stack+0x46/0xd0
 [ 1116.897025] [<ffffffff8174cf61>] kasan_slab_free+0x71/0xb0
 [ 1116.897025] [<     inline     >] slab_free_hook mm/slub.c:1352
 [ 1116.897025] [<     inline     >] slab_free_freelist_hook mm/slub.c:1374
 [ 1116.897025] [<     inline     >] slab_free mm/slub.c:2951
 [ 1116.897025] [<ffffffff81748b28>] kmem_cache_free+0xc8/0x330 mm/slub.c:2973
 [ 1116.897025] [<     inline     >] sk_prot_free net/core/sock.c:1369
 [ 1116.897025] [<ffffffff84c541eb>] __sk_destruct+0x32b/0x4f0 net/core/sock.c:1444
 [ 1116.897025] [<ffffffff84c5aca4>] sk_destruct+0x44/0x80 net/core/sock.c:1452
 [ 1116.897025] [<ffffffff84c5ad33>] __sk_free+0x53/0x220 net/core/sock.c:1460
 [ 1116.897025] [<ffffffff84c5af23>] sk_free+0x23/0x30 net/core/sock.c:1471
 [ 1116.897025] [<ffffffff84c5cb6c>] sk_common_release+0x28c/0x3e0 ./include/net/sock.h:1589
 [ 1116.897025] [<ffffffff8579044e>] l2tp_ip6_close+0x1fe/0x290 net/l2tp/l2tp_ip6.c:243
 [ 1116.897025] [<ffffffff850b2dfd>] inet_release+0xed/0x1c0 net/ipv4/af_inet.c:415
 [ 1116.897025] [<ffffffff851dc5a0>] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
 [ 1116.897025] [<ffffffff84c4581d>] sock_release+0x8d/0x1d0 net/socket.c:570
 [ 1116.897025] [<ffffffff84c45976>] sock_close+0x16/0x20 net/socket.c:1017
 [ 1116.897025] [<ffffffff817a108c>] __fput+0x28c/0x780 fs/file_table.c:208
 [ 1116.897025] [<ffffffff817a1605>] ____fput+0x15/0x20 fs/file_table.c:244
 [ 1116.897025] [<ffffffff813774f9>] task_work_run+0xf9/0x170
 [ 1116.897025] [<ffffffff81324aae>] do_exit+0x85e/0x2a00
 [ 1116.897025] [<ffffffff81326dc8>] do_group_exit+0x108/0x330
 [ 1116.897025] [<ffffffff81348cf7>] get_signal+0x617/0x17a0 kernel/signal.c:2307
 [ 1116.897025] [<ffffffff811b49af>] do_signal+0x7f/0x18f0
 [ 1116.897025] [<ffffffff810039bf>] exit_to_usermode_loop+0xbf/0x150 arch/x86/entry/common.c:156
 [ 1116.897025] [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:190
 [ 1116.897025] [<ffffffff81006060>] syscall_return_slowpath+0x1a0/0x1e0 arch/x86/entry/common.c:259
 [ 1116.897025] [<ffffffff85e4d726>] entry_SYSCALL_64_fastpath+0xc4/0xc6
Memory state around the buggy address:
 ffff8800081b0d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8800081b0e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8800081b0e80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
                                                    ^
 ffff8800081b0f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8800081b0f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

==================================================================

The same issue exists with l2tp_ip_bind() and l2tp_ip_bind_table.

Fixes: c51ce49735c1 ("l2tp: fix oops in L2TP IP sockets for connect() AF_UNSPEC case")
Reported-by: Baozeng Ding <sploving1@gmail.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agortnetlink: fix FDB size computation
Sabrina Dubroca [Fri, 18 Nov 2016 14:50:39 +0000 (15:50 +0100)]
rtnetlink: fix FDB size computation

[ Upstream commit f82ef3e10a870acc19fa04f80ef5877eaa26f41e ]

Add missing NDA_VLAN attribute's size.

Fixes: 1e53d5bb8878 ("net: Pass VLAN ID to rtnl_fdb_notify.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoaf_unix: conditionally use freezable blocking calls in read
WANG Cong [Thu, 17 Nov 2016 23:55:26 +0000 (15:55 -0800)]
af_unix: conditionally use freezable blocking calls in read

[ Upstream commit 06a77b07e3b44aea2b3c0e64de420ea2cfdcbaa9 ]

Commit 2b15af6f95 ("af_unix: use freezable blocking calls in read")
converts schedule_timeout() to its freezable version, it was probably
correct at that time, but later, commit 2b514574f7e8
("net: af_unix: implement splice for stream af_unix sockets") breaks
the strong requirement for a freezable sleep, according to
commit 0f9548ca1091:

    We shouldn't try_to_freeze if locks are held.  Holding a lock can cause a
    deadlock if the lock is later acquired in the suspend or hibernate path
    (e.g.  by dpm).  Holding a lock can also cause a deadlock in the case of
    cgroup_freezer if a lock is held inside a frozen cgroup that is later
    acquired by a process outside that group.

The pipe_lock is still held at that point.

So use freezable version only for the recvmsg call path, avoid impact for
Android.

Fixes: 2b514574f7e8 ("net: af_unix: implement splice for stream af_unix sockets")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Colin Cross <ccross@android.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: sky2: Fix shutdown crash
Jeremy Linton [Thu, 17 Nov 2016 15:14:25 +0000 (09:14 -0600)]
net: sky2: Fix shutdown crash

[ Upstream commit 06ba3b2133dc203e1e9bc36cee7f0839b79a9e8b ]

The sky2 frequently crashes during machine shutdown with:

sky2_get_stats+0x60/0x3d8 [sky2]
dev_get_stats+0x68/0xd8
rtnl_fill_stats+0x54/0x140
rtnl_fill_ifinfo+0x46c/0xc68
rtmsg_ifinfo_build_skb+0x7c/0xf0
rtmsg_ifinfo.part.22+0x3c/0x70
rtmsg_ifinfo+0x50/0x5c
netdev_state_change+0x4c/0x58
linkwatch_do_dev+0x50/0x88
__linkwatch_run_queue+0x104/0x1a4
linkwatch_event+0x30/0x3c
process_one_work+0x140/0x3e0
worker_thread+0x60/0x44c
kthread+0xdc/0xf0
ret_from_fork+0x10/0x50

This is caused by the sky2 being called after it has been shutdown.
A previous thread about this can be found here:

https://lkml.org/lkml/2016/4/12/410

An alternative fix is to assure that IFF_UP gets cleared by
calling dev_close() during shutdown. This is similar to what the
bnx2/tg3/xgene and maybe others are doing to assure that the driver
isn't being called following _shutdown().

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoip6_tunnel: disable caching when the traffic class is inherited
Paolo Abeni [Wed, 16 Nov 2016 15:26:46 +0000 (16:26 +0100)]
ip6_tunnel: disable caching when the traffic class is inherited

[ Upstream commit b5c2d49544e5930c96e2632a7eece3f4325a1888 ]

If an ip6 tunnel is configured to inherit the traffic class from
the inner header, the dst_cache must be disabled or it will foul
the policy routing.

The issue is apprently there since at leat Linux-2.6.12-rc2.

Reported-by: Liam McBirnie <liam.mcbirnie@boeing.com>
Cc: Liam McBirnie <liam.mcbirnie@boeing.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: check dead netns for peernet2id_alloc()
WANG Cong [Wed, 16 Nov 2016 18:27:02 +0000 (10:27 -0800)]
net: check dead netns for peernet2id_alloc()

[ Upstream commit cfc44a4d147ea605d66ccb917cc24467d15ff867 ]

Andrei reports we still allocate netns ID from idr after we destroy
it in cleanup_net().

cleanup_net():
  ...
  idr_destroy(&net->netns_ids);
  ...
  list_for_each_entry_reverse(ops, &pernet_list, list)
    ops_exit_list(ops, &net_exit_list);
      -> rollback_registered_many()
        -> rtmsg_ifinfo_build_skb()
         -> rtnl_fill_ifinfo()
           -> peernet2id_alloc()

After that point we should not even access net->netns_ids, we
should check the death of the current netns as early as we can in
peernet2id_alloc().

For net-next we can consider to avoid sending rtmsg totally,
it is a good optimization for netns teardown path.

Fixes: 0c7aecd4bde4 ("netns: add rtnl cmd to add and get peer netns ids")
Reported-by: Andrei Vagin <avagin@gmail.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: dsa: b53: Fix VLAN usage and how we treat CPU port
Florian Fainelli [Tue, 15 Nov 2016 23:58:15 +0000 (15:58 -0800)]
net: dsa: b53: Fix VLAN usage and how we treat CPU port

[ Upstream commit e47112d9d6009bf6b7438cedc0270316d6b0370d ]

We currently have a fundamental problem in how we treat the CPU port and
its VLAN membership. As soon as a second VLAN is configured to be
untagged, the CPU automatically becomes untagged for that VLAN as well,
and yet, we don't gracefully make sure that the CPU becomes tagged in
the other VLANs it could be a member of. This results in only one VLAN
being effectively usable from the CPU's perspective.

Instead of having some pretty complex logic which tries to maintain the
CPU port's default VLAN and its untagged properties, just do something
very simple which consists in neither altering the CPU port's PVID
settings, nor its untagged settings:

- whenever a VLAN is added, the CPU is automatically a member of this
  VLAN group, as a tagged member
- PVID settings for downstream ports do not alter the CPU port's PVID
  since it now is part of all VLANs in the system

This means that a typical example where e.g: LAN ports are in VLAN1, and
WAN port is in VLAN2, now require having two VLAN interfaces for the
host to properly terminate and send traffic from/to.

Fixes: Fixes: a2482d2ce349 ("net: dsa: b53: Plug in VLAN support")
Reported-by: Hartmut Knaack <knaack.h@gmx.de>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agovirtio-net: add a missing synchronize_net()
Eric Dumazet [Wed, 16 Nov 2016 06:24:12 +0000 (22:24 -0800)]
virtio-net: add a missing synchronize_net()

[ Upstream commit 963abe5c8a0273a1cf5913556da1b1189de0e57a ]

It seems many drivers do not respect napi_hash_del() contract.

When napi_hash_del() is used before netif_napi_del(), an RCU grace
period is needed before freeing NAPI object.

Fixes: 91815639d880 ("virtio-net: rx busy polling support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agogro_cells: mark napi struct as not busy poll candidates
Eric Dumazet [Tue, 15 Nov 2016 00:28:42 +0000 (16:28 -0800)]
gro_cells: mark napi struct as not busy poll candidates

[ Upstream commit e88a2766143a27bfe6704b4493b214de4094cf29 ]

Rolf Neugebauer reported very long delays at netns dismantle.

Eric W. Biederman was kind enough to look at this problem
and noticed synchronize_net() occurring from netif_napi_del() that was
added in linux-4.5

Busy polling makes no sense for tunnels NAPI.
If busy poll is used for sessions over tunnels, the poller will need to
poll the physical device queue anyway.

netif_tx_napi_add() could be used here, but function name is misleading,
and renaming it is not stable material, so set NAPI_STATE_NO_BUSY_POLL
bit directly.

This will avoid inserting gro_cells napi structures in napi_hash[]
and avoid the problematic synchronize_net() (per possible cpu) that
Rolf reported.

Fixes: 93d05d4a320c ("net: provide generic busy polling to all NAPI drivers")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Tested-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoLinux 4.8.13 v4.8.13
Greg Kroah-Hartman [Thu, 8 Dec 2016 06:16:36 +0000 (07:16 +0100)]
Linux 4.8.13

7 years agoarm64: suspend: Reconfigure PSTATE after resume from idle
James Morse [Tue, 18 Oct 2016 10:27:48 +0000 (11:27 +0100)]
arm64: suspend: Reconfigure PSTATE after resume from idle

commit d08544127d9fb4505635e3cb6871fd50a42947bd upstream.

The suspend/resume path in kernel/sleep.S, as used by cpu-idle, does not
save/restore PSTATE. As a result of this cpufeatures that were detected
and have bits in PSTATE get lost when we resume from idle.

UAO gets set appropriately on the next context switch. PAN will be
re-enabled next time we return from user-space, but on a preemptible
kernel we may run work accessing user space before this point.

Add code to re-enable theses two features in __cpu_suspend_exit().
We re-use uao_thread_switch() passing current.

Signed-off-by: James Morse <james.morse@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoarm64: mm: Set PSTATE.PAN from the cpu_enable_pan() call
James Morse [Tue, 18 Oct 2016 10:27:47 +0000 (11:27 +0100)]
arm64: mm: Set PSTATE.PAN from the cpu_enable_pan() call

commit 7209c868600bd8926e37c10b9aae83124ccc1dd8 upstream.

Commit 338d4f49d6f7 ("arm64: kernel: Add support for Privileged Access
Never") enabled PAN by enabling the 'SPAN' feature-bit in SCTLR_EL1.
This means the PSTATE.PAN bit won't be set until the next return to the
kernel from userspace. On a preemptible kernel we may schedule work that
accesses userspace on a CPU before it has done this.

Now that cpufeature enable() calls are scheduled via stop_machine(), we
can set PSTATE.PAN from the cpu_enable_pan() call.

Add WARN_ON_ONCE(in_interrupt()) to check the PSTATE value we updated
is not immediately discarded.

Reported-by: Tony Thompson <anthony.thompson@arm.com>
Reported-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
[will: fixed typo in comment]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoarm64: cpufeature: Schedule enable() calls instead of calling them via IPI
James Morse [Tue, 18 Oct 2016 10:27:46 +0000 (11:27 +0100)]
arm64: cpufeature: Schedule enable() calls instead of calling them via IPI

commit 2a6dcb2b5f3e21592ca8dfa198dcce7bec09b020 upstream.

The enable() call for a cpufeature/errata is called using on_each_cpu().
This issues a cross-call IPI to get the work done. Implicitly, this
stashes the running PSTATE in SPSR when the CPU receives the IPI, and
restores it when we return. This means an enable() call can never modify
PSTATE.

To allow PAN to do this, change the on_each_cpu() call to use
stop_machine(). This schedules the work on each CPU which allows
us to modify PSTATE.

This involves changing the protype of all the enable() functions.

enable_cpu_capabilities() is called during boot and enables the feature
on all online CPUs. This path now uses stop_machine(). CPU features for
hotplug'd CPUs are enabled by verify_local_cpu_features() which only
acts on the local CPU, and can already modify the running PSTATE as it
is called from secondary_start_kernel().

Reported-by: Tony Thompson <anthony.thompson@arm.com>
Reported-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
[Removed enable() hunks for A53 workaround]
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agobatman-adv: Detect missing primaryif during tp_send as error
Sven Eckelmann [Sat, 29 Oct 2016 07:18:43 +0000 (09:18 +0200)]
batman-adv: Detect missing primaryif during tp_send as error

commit e13258f38e927b61cdb5f4ad25309450d3b127d1 upstream.

The throughput meter detects different situations as problems for the
current test. It stops the test after these and reports it to userspace.
This also has to be done when the primary interface disappeared during the
test.

Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation")
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoclk: sunxi: Fix M factor computation for APB1
Stéphan Rafin [Thu, 3 Nov 2016 23:53:56 +0000 (00:53 +0100)]
clk: sunxi: Fix M factor computation for APB1

commit ac95330b96376550ae7a533d1396272d675adfa2 upstream.

commit cfa636886033 ("clk: sunxi: factors: Consolidate get_factors
parameters into a struct") introduced a regression for m factor
computation in sun4i_get_apb1_factors function.

The old code reassigned the "parent_rate" parameter to the targeted
divisor value and was buggy for the returned frequency but not for the
computed factors. Now, returned frequency is good but m factor is
incorrectly computed (its max value 31 is always set resulting in a
significantly slower frequency than the requested one...)

This patch simply restores the original proper computation for m while
keeping the good changes for returned rate.

Fixes: cfa636886033 ("clk: sunxi: factors: Consolidate get_factors parameters into a struct")
Signed-off-by: Stéphan Rafin <stephan@soliotek.com>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoperf/x86: Restore TASK_SIZE check on frame pointer
Johannes Weiner [Tue, 22 Nov 2016 09:57:42 +0000 (10:57 +0100)]
perf/x86: Restore TASK_SIZE check on frame pointer

commit ae31fe51a3cceaa0cabdb3058f69669ecb47f12e upstream.

The following commit:

  75925e1ad7f5 ("perf/x86: Optimize stack walk user accesses")

... switched from copy_from_user_nmi() to __copy_from_user_nmi() with a manual
access_ok() check.

Unfortunately, copy_from_user_nmi() does an explicit check against TASK_SIZE,
whereas the access_ok() uses whatever the current address limit of the task is.

We are getting NMIs when __probe_kernel_read() has switched to KERNEL_DS, and
then see vmalloc faults when we access what looks like pointers into vmalloc
space:

  [] WARNING: CPU: 3 PID: 3685731 at arch/x86/mm/fault.c:435 vmalloc_fault+0x289/0x290
  [] CPU: 3 PID: 3685731 Comm: sh Tainted: G        W       4.6.0-5_fbk1_223_gdbf0f40 #1
  [] Call Trace:
  []  <NMI>  [<ffffffff814717d1>] dump_stack+0x4d/0x6c
  []  [<ffffffff81076e43>] __warn+0xd3/0xf0
  []  [<ffffffff81076f2d>] warn_slowpath_null+0x1d/0x20
  []  [<ffffffff8104a899>] vmalloc_fault+0x289/0x290
  []  [<ffffffff8104b5a0>] __do_page_fault+0x330/0x490
  []  [<ffffffff8104b70c>] do_page_fault+0xc/0x10
  []  [<ffffffff81794e82>] page_fault+0x22/0x30
  []  [<ffffffff81006280>] ? perf_callchain_user+0x100/0x2a0
  []  [<ffffffff8115124f>] get_perf_callchain+0x17f/0x190
  []  [<ffffffff811512c7>] perf_callchain+0x67/0x80
  []  [<ffffffff8114e750>] perf_prepare_sample+0x2a0/0x370
  []  [<ffffffff8114e840>] perf_event_output+0x20/0x60
  []  [<ffffffff8114aee7>] ? perf_event_update_userpage+0xc7/0x130
  []  [<ffffffff8114ea01>] __perf_event_overflow+0x181/0x1d0
  []  [<ffffffff8114f484>] perf_event_overflow+0x14/0x20
  []  [<ffffffff8100a6e3>] intel_pmu_handle_irq+0x1d3/0x490
  []  [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10
  []  [<ffffffff81197191>] ? vunmap_page_range+0x1a1/0x2f0
  []  [<ffffffff811972f1>] ? unmap_kernel_range_noflush+0x11/0x20
  []  [<ffffffff814f2056>] ? ghes_copy_tofrom_phys+0x116/0x1f0
  []  [<ffffffff81040d1d>] ? x2apic_send_IPI_self+0x1d/0x20
  []  [<ffffffff8100411d>] perf_event_nmi_handler+0x2d/0x50
  []  [<ffffffff8101ea31>] nmi_handle+0x61/0x110
  []  [<ffffffff8101ef94>] default_do_nmi+0x44/0x110
  []  [<ffffffff8101f13b>] do_nmi+0xdb/0x150
  []  [<ffffffff81795187>] end_repeat_nmi+0x1a/0x1e
  []  [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10
  []  [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10
  []  [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10
  []  <<EOE>>  <IRQ>  [<ffffffff8115d05e>] ? __probe_kernel_read+0x3e/0xa0

Fix this by moving the valid_user_frame() check to before the uaccess
that loads the return address and the pointer to the next frame.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Fixes: 75925e1ad7f5 ("perf/x86: Optimize stack walk user accesses")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrm/mediatek: fix null pointer dereference
Matthias Brugger [Fri, 18 Nov 2016 10:06:10 +0000 (11:06 +0100)]
drm/mediatek: fix null pointer dereference

commit 5ad45307d990020b25a8f7486178b6e033790f70 upstream.

The probe function requests the interrupt before initializing
the ddp component. Which leads to a null pointer dereference at boot.
Fix this by requesting the interrput after all components got
initialized properly.

Fixes: 119f5173628a ("drm/mediatek: Add DRM Driver for Mediatek SoC MT8173.")
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I57193a7ab554dfb37c35a455900689333adf511c

7 years agopwm: Fix device reference leak
Johan Hovold [Tue, 1 Nov 2016 10:46:39 +0000 (11:46 +0100)]
pwm: Fix device reference leak

commit 0e1614ac84f1719d87bed577963bb8140d0c9ce8 upstream.

Make sure to drop the reference to the parent device taken by
class_find_device() after "unexporting" any children when deregistering
a PWM chip.

Fixes: 0733424c9ba9 ("pwm: Unexport children before chip removal")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoKVM: use after free in kvm_ioctl_create_device()
Dan Carpenter [Wed, 30 Nov 2016 19:21:05 +0000 (22:21 +0300)]
KVM: use after free in kvm_ioctl_create_device()

commit a0f1d21c1ccb1da66629627a74059dd7f5ac9c61 upstream.

We should move the ops->destroy(dev) after the list_del(&dev->vm_node)
so that we don't use "dev" after freeing it.

Fixes: a28ebea2adc4 ("KVM: Protect device ops->create and list_add with kvm->lock")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoarm64: dts: juno: fix cluster sleep state entry latency on all SoC versions
Sudeep Holla [Wed, 16 Nov 2016 17:31:31 +0000 (17:31 +0000)]
arm64: dts: juno: fix cluster sleep state entry latency on all SoC versions

commit 909e481e2467f202b97d42beef246e8829416a85 upstream.

The core and the cluster sleep state entry latencies can't be same as
cluster sleep involves more work compared to core level e.g. shared
cache maintenance.

Experiments have shown on an average about 100us more latency for the
cluster sleep state compared to the core level sleep. This patch fixes
the entry latency for the cluster sleep state.

Fixes: 28e10a8f3a03 ("arm64: dts: juno: Add idle-states to device tree")
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: "Jon Medhurst (Tixy)" <tixy@linaro.org>
Reviewed-by: Liviu Dudau <Liviu.Dudau@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrm/radeon: fix check for port PM availability
Alex Deucher [Mon, 28 Nov 2016 22:23:40 +0000 (17:23 -0500)]
drm/radeon: fix check for port PM availability

commit bcfdd5d5105087e6f33dfeb08a1ca6b2c0287b61 upstream.

The ATPX method does not always exist on the dGPU, it may be located at
the iGPU. The parent device of the iGPU is the root port for which
bridge_d3 is false. This accidentally enables the legacy PM method which
conflicts with port PM and prevented the dGPU from powering on.

Ported from amdgpu commit:
drm/amdgpu: fix check for port PM availability
from Peter Wu.

Fixes: d3ac31f3b4bf9fad (drm/radeon: fix power state when port pm is unavailable (v2))
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Peter Wu <peter@lekensteyn.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrm/amdgpu: fix check for port PM availability
Peter Wu [Sat, 26 Nov 2016 14:05:01 +0000 (15:05 +0100)]
drm/amdgpu: fix check for port PM availability

commit 7ac33e47d5769632010e537964c7e45498f8dc26 upstream.

The ATPX method does not always exist on the dGPU, it may be located at
the iGPU. The parent device of the iGPU is the root port for which
bridge_d3 is false. This accidentally enables the legacy PM method which
conflicts with port PM and prevented the dGPU from powering on.

Fixes: 1db4496f167b ("drm/amdgpu: fix power state when port pm is unavailable")
Reported-and-tested-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Peter Wu <peter@lekensteyn.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrm/radeon: fix power state when port pm is unavailable (v2)
Peter Wu [Wed, 23 Nov 2016 01:22:25 +0000 (02:22 +0100)]
drm/radeon: fix power state when port pm is unavailable (v2)

commit d3ac31f3b4bf9fade93d69770cb9c34912e017be upstream.

When PCIe port PM is not enabled (system BIOS is pre-2015 or the
pcie_port_pm=off parameter is set), legacy ATPX PM should still be
marked as supported. Otherwise the GPU can fail to power on after
runtime suspend. This affected a Dell Inspiron 5548.

Ideally the BIOS date in the PCI core is lowered to 2013 (the first year
where hybrid graphics platforms using power resources was introduced),
but that seems more risky at this point and would not solve the
pcie_port_pm=off issue.

v2: agd: fix typo

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98505
Signed-off-by: Peter Wu <peter@lekensteyn.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrm/amdgpu: fix power state when port pm is unavailable
Peter Wu [Wed, 23 Nov 2016 01:22:24 +0000 (02:22 +0100)]
drm/amdgpu: fix power state when port pm is unavailable

commit 1db4496f167bcc7c6541d449355ade2e7d339d52 upstream.

When PCIe port PM is not enabled (system BIOS is pre-2015 or the
pcie_port_pm=off parameter is set), legacy ATPX PM should still be
marked as supported. Otherwise the GPU can fail to power on after
runtime suspend. This affected a Dell Inspiron 5548.

Ideally the BIOS date in the PCI core is lowered to 2013 (the first year
where hybrid graphics platforms using power resources was introduced),
but that seems more risky at this point and would not solve the
pcie_port_pm=off issue.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98505
Reported-and-tested-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Signed-off-by: Peter Wu <peter@lekensteyn.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrm/i915: drop the struct_mutex when wedged or trying to reset
Matthew Auld [Mon, 28 Nov 2016 10:36:48 +0000 (10:36 +0000)]
drm/i915: drop the struct_mutex when wedged or trying to reset

commit e411072d5740a49cdc9d0713798c30440757e451 upstream.

We grab the struct_mutex in intel_crtc_page_flip, but if we are wedged
or a reset is in progress we bail early but never seem to actually
release the lock.

Fixes: 7f1847ebf48b ("drm/i915: Simplify checking of GPU reset_counter in display pageflips")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161128103648.9235-1-matthew.auld@intel.com
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit ddbb271aea87fc6004d3c8bcdb0710e980c7ec85)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrm/i915: Don't touch NULL sg on i915_gem_object_get_pages_gtt() error
Chris Wilson [Mon, 14 Nov 2016 11:29:30 +0000 (11:29 +0000)]
drm/i915: Don't touch NULL sg on i915_gem_object_get_pages_gtt() error

commit 2420489bcb8910188578acc0c11c75445c2e4b92 upstream.

On the DMA mapping error path, sg may be NULL (it has already been
marked as the last scatterlist entry), and we should avoid dereferencing
it again.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: e227330223a7 ("drm/i915: avoid leaking DMA mappings")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114112930.2033-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
(cherry picked from commit b17993b7b29612369270567643bcff814f4b3d7f)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoKVM: arm/arm64: vgic: Don't notify EOI for non-SPIs
Marc Zyngier [Wed, 23 Nov 2016 10:11:21 +0000 (10:11 +0000)]
KVM: arm/arm64: vgic: Don't notify EOI for non-SPIs

commit 8ca18eec2b2276b449c1dc86b98bf083c5fe4e09 upstream.

When we inject a level triggerered interrupt (and unless it
is backed by the physical distributor - timer style), we request
a maintenance interrupt. Part of the processing for that interrupt
is to feed to the rest of KVM (and to the eventfd subsystem) the
information that the interrupt has been EOIed.

But that notification only makes sense for SPIs, and not PPIs
(such as the PMU interrupt). Skip over the notification if
the interrupt is not an SPI.

Fixes: 140b086dd197 ("KVM: arm/arm64: vgic-new: Add GICv2 world switch backend")
Fixes: 59529f69f504 ("KVM: arm/arm64: vgic-new: Add GICv3 world switch backend")
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agomwifiex: printk() overflow with 32-byte SSIDs
Brian Norris [Wed, 9 Nov 2016 02:28:24 +0000 (18:28 -0800)]
mwifiex: printk() overflow with 32-byte SSIDs

commit fcd2042e8d36cf644bd2d69c26378d17158b17df upstream.

SSIDs aren't guaranteed to be 0-terminated. Let's cap the max length
when we print them out.

This can be easily noticed by connecting to a network with a 32-octet
SSID:

[ 3903.502925] mwifiex_pcie 0000:01:00.0: info: trying to associate to
'0123456789abcdef0123456789abcdef <uninitialized mem>' bssid
xx:xx:xx:xx:xx:xx

Fixes: 5e6e3a92b9a4 ("wireless: mwifiex: initial commit for Marvell mwifiex driver")
Signed-off-by: Brian Norris <briannorris@chromium.org>
Acked-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoPCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)
Johannes Thumshirn [Wed, 23 Nov 2016 16:56:28 +0000 (10:56 -0600)]
PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)

commit e42010d8207f9d15a605ceb8e321bcd9648071b0 upstream.

Per PCIe spec r3.0, sec 2.3.1.1, the Read Completion Boundary (RCB)
determines the naturally aligned address boundaries on which a Read Request
may be serviced with multiple Completions:

  - For a Root Complex, RCB is 64 bytes or 128 bytes
    This value is reported in the Link Control Register

    Note: Bridges and Endpoints may implement a corresponding command bit
    which may be set by system software to indicate the RCB value for the
    Root Complex, allowing the Bridge/Endpoint to optimize its behavior
    when the Root Complex’s RCB is 128 bytes.

  - For all other system elements, RCB is 128 bytes

Per sec 7.8.7, if a Root Port only supports a 64-byte RCB, the RCB of all
downstream devices must be clear, indicating an RCB of 64 bytes.  If the
Root Port supports a 128-byte RCB, we may optionally set the RCB of
downstream devices so they know they can generate larger Completions.

Some BIOSes supply an _HPX that tells us to set RCB, even though the Root
Port doesn't have RCB set, which may lead to Malformed TLP errors if the
Endpoint generates completions larger than the Root Port can handle.

The IBM x3850 X6 with BIOS version -[A8E120CUS-1.30]- 08/22/2016 supplies
such an _HPX and a Mellanox MT27500 ConnectX-3 device fails to initialize:

  mlx4_core 0000:41:00.0: command 0xfff timed out (go bit not cleared)
  mlx4_core 0000:41:00.0: device is going to be reset
  mlx4_core 0000:41:00.0: Failed to obtain HW semaphore, aborting
  mlx4_core 0000:41:00.0: Fail to reset HCA
  ------------[ cut here ]------------
  kernel BUG at drivers/net/ethernet/mellanox/mlx4/catas.c:193!

After 6cd33649fa83 ("PCI: Add pci_configure_device() during enumeration")
and 7a1562d4f2d0 ("PCI: Apply _HPX Link Control settings to all devices
with a link"), we apply _HPX settings to *all* devices, not just those
hot-added after boot.

Before 7a1562d4f2d0, we didn't touch the Mellanox RCB, and the device
worked.  After 7a1562d4f2d0, we set its RCB to 128, and it failed.

Set the RCB to 128 iff the Root Port supports a 128-byte RCB.  Otherwise,
set RCB to 64 bytes.  This effectively ignores what _HPX tells us about
RCB.

Note that this change only affects _HPX handling.  If we have no _HPX, this
does nothing with RCB.

[bhelgaas: changelog, clear RCB if not set for Root Port]
Fixes: 6cd33649fa83 ("PCI: Add pci_configure_device() during enumeration")
Fixes: 7a1562d4f2d0 ("PCI: Apply _HPX Link Control settings to all devices with a link")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=187781
Tested-by: Frank Danapfel <fdanapfe@redhat.com>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>