]> git.itanic.dy.fi Git - linux-stable/log
linux-stable
4 years agoLinux 5.3.13 v5.3.13
Greg Kroah-Hartman [Sun, 24 Nov 2019 07:17:01 +0000 (08:17 +0100)]
Linux 5.3.13

4 years agofbdev: Ditch fb_edid_add_monspecs
Daniel Vetter [Sun, 21 Jul 2019 20:19:56 +0000 (22:19 +0200)]
fbdev: Ditch fb_edid_add_monspecs

commit 3b8720e63f4a1fc6f422a49ecbaa3b59c86d5aaf upstream.

It's dead code ever since

commit 34280340b1dc74c521e636f45cd728f9abf56ee2
Author: Geert Uytterhoeven <geert+renesas@glider.be>
Date:   Fri Dec 4 17:01:43 2015 +0100

    fbdev: Remove unused SH-Mobile HDMI driver

Also with this gone we can remove the cea_modes db. This entire thing
is massively incomplete anyway, compared to the CEA parsing that
drm_edid.c does.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tavis Ormandy <taviso@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190721201956.941-1-daniel.vetter@ffwll.ch
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoarm64: uaccess: Ensure PAN is re-enabled after unhandled uaccess fault
Pavel Tatashin [Tue, 19 Nov 2019 22:10:06 +0000 (17:10 -0500)]
arm64: uaccess: Ensure PAN is re-enabled after unhandled uaccess fault

commit 94bb804e1e6f0a9a77acf20d7c70ea141c6c821e upstream.

A number of our uaccess routines ('__arch_clear_user()' and
'__arch_copy_{in,from,to}_user()') fail to re-enable PAN if they
encounter an unhandled fault whilst accessing userspace.

For CPUs implementing both hardware PAN and UAO, this bug has no effect
when both extensions are in use by the kernel.

For CPUs implementing hardware PAN but not UAO, this means that a kernel
using hardware PAN may execute portions of code with PAN inadvertently
disabled, opening us up to potential security vulnerabilities that rely
on userspace access from within the kernel which would usually be
prevented by this mechanism. In other words, parts of the kernel run the
same way as they would on a CPU without PAN implemented/emulated at all.

For CPUs not implementing hardware PAN and instead relying on software
emulation via 'CONFIG_ARM64_SW_TTBR0_PAN=y', the impact is unfortunately
much worse. Calling 'schedule()' with software PAN disabled means that
the next task will execute in the kernel using the page-table and ASID
of the previous process even after 'switch_mm()', since the actual
hardware switch is deferred until return to userspace. At this point, or
if there is a intermediate call to 'uaccess_enable()', the page-table
and ASID of the new process are installed. Sadly, due to the changes
introduced by KPTI, this is not an atomic operation and there is a very
small window (two instructions) where the CPU is configured with the
page-table of the old task and the ASID of the new task; a speculative
access in this state is disastrous because it would corrupt the TLB
entries for the new task with mappings from the previous address space.

As Pavel explains:

  | I was able to reproduce memory corruption problem on Broadcom's SoC
  | ARMv8-A like this:
  |
  | Enable software perf-events with PERF_SAMPLE_CALLCHAIN so userland's
  | stack is accessed and copied.
  |
  | The test program performed the following on every CPU and forking
  | many processes:
  |
  | unsigned long *map = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE,
  |   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  | map[0] = getpid();
  | sched_yield();
  | if (map[0] != getpid()) {
  | fprintf(stderr, "Corruption detected!");
  | }
  | munmap(map, PAGE_SIZE);
  |
  | From time to time I was getting map[0] to contain pid for a
  | different process.

Ensure that PAN is re-enabled when returning after an unhandled user
fault from our uaccess routines.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Cc: <stable@vger.kernel.org>
Fixes: 338d4f49d6f7 ("arm64: kernel: Add support for Privileged Access Never")
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
[will: rewrote commit message]
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agomm/memory_hotplug: fix updating the node span
David Hildenbrand [Wed, 6 Nov 2019 05:17:10 +0000 (21:17 -0800)]
mm/memory_hotplug: fix updating the node span

commit 656d571193262a11c2daa4012e53e4d645bbce56 upstream.

We recently started updating the node span based on the zone span to
avoid touching uninitialized memmaps.

Currently, we will always detect the node span to start at 0, meaning a
node can easily span too many pages.  pgdat_is_empty() will still work
correctly if all zones span no pages.  We should skip over all zones
without spanned pages and properly handle the first detected zone that
spans pages.

Unfortunately, in contrast to the zone span (/proc/zoneinfo), the node
span cannot easily be inspected and tested.  The node span gives no real
guarantees when an architecture supports memory hotplug, meaning it can
easily contain holes or span pages of different nodes.

The node span is not really used after init on architectures that
support memory hotplug.

E.g., we use it in mm/memory_hotplug.c:try_offline_node() and in
mm/kmemleak.c:kmemleak_scan().  These users seem to be fine.

Link: http://lkml.kernel.org/r/20191027222714.5313-1-david@redhat.com
Fixes: 00d6c019b5bc ("mm/memory_hotplug: don't access uninitialized memmaps in shrink_pgdat_span()")
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agomm/memory_hotplug: don't access uninitialized memmaps in shrink_pgdat_span()
David Hildenbrand [Sat, 19 Oct 2019 03:19:33 +0000 (20:19 -0700)]
mm/memory_hotplug: don't access uninitialized memmaps in shrink_pgdat_span()

commit 00d6c019b5bc175cee3770e0e659f2b5f4804ea5 upstream.

We might use the nid of memmaps that were never initialized.  For
example, if the memmap was poisoned, we will crash the kernel in
pfn_to_nid() right now.  Let's use the calculated boundaries of the
separate zones instead.  This now also avoids having to iterate over a
whole bunch of subsections again, after shrinking one zone.

Before commit d0dc12e86b31 ("mm/memory_hotplug: optimize memory
hotplug"), the memmap was initialized to 0 and the node was set to the
right value.  After that commit, the node might be garbage.

We'll have to fix shrink_zone_span() next.

Link: http://lkml.kernel.org/r/20191006085646.5768-4-david@redhat.com
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [d0dc12e86b319]
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Damian Tometzki <damian.tometzki@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jun Yao <yaojun8558363@gmail.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Pankaj Gupta <pagupta@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Rich Felker <dalias@libc.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoblock, bfq: deschedule empty bfq_queues not referred by any process
Paolo Valente [Thu, 14 Nov 2019 09:33:11 +0000 (10:33 +0100)]
block, bfq: deschedule empty bfq_queues not referred by any process

commit 478de3380c1c7dbb0f65f545ee0185848413f3fe upstream.

Since commit 3726112ec731 ("block, bfq: re-schedule empty queues if
they deserve I/O plugging"), to prevent the service guarantees of a
bfq_queue from being violated, the bfq_queue may be left busy, i.e.,
scheduled for service, even if empty (see comments in
__bfq_bfqq_expire() for details). But, if no process will send
requests to the bfq_queue any longer, then there is no point in
keeping the bfq_queue scheduled for service.

In addition, keeping the bfq_queue scheduled for service, but with no
process reference any longer, may cause the bfq_queue to be freed when
descheduled from service. But this is assumed to never happen, and
causes a UAF if it happens. This, in turn, caused crashes [1, 2].

This commit fixes this issue by descheduling an empty bfq_queue when
it remains with not process reference.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1767539
[2] https://bugzilla.kernel.org/show_bug.cgi?id=205447

Fixes: 3726112ec731 ("block, bfq: re-schedule empty queues if they deserve I/O plugging")
Reported-by: Chris Evich <cevich@redhat.com>
Reported-by: Patrick Dung <patdung100@gmail.com>
Reported-by: Thorsten Schubert <tschubert@bafh.org>
Tested-by: Thorsten Schubert <tschubert@bafh.org>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Laura Abbott <labbott@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agonet: cdc_ncm: Signedness bug in cdc_ncm_set_dgram_size()
Dan Carpenter [Wed, 13 Nov 2019 18:28:31 +0000 (21:28 +0300)]
net: cdc_ncm: Signedness bug in cdc_ncm_set_dgram_size()

commit a56dcc6b455830776899ce3686735f1172e12243 upstream.

This code is supposed to test for negative error codes and partial
reads, but because sizeof() is size_t (unsigned) type then negative
error codes are type promoted to high positive values and the condition
doesn't work as expected.

Fixes: 332f989a3b00 ("CDC-NCM: handle incomplete transfer of MTU")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoLinux 5.3.12 v5.3.12
Greg Kroah-Hartman [Wed, 20 Nov 2019 15:49:17 +0000 (16:49 +0100)]
Linux 5.3.12

4 years agoslcan: Fix memory leak in error path
Jouni Hogander [Wed, 13 Nov 2019 10:08:01 +0000 (12:08 +0200)]
slcan: Fix memory leak in error path

commit ed50e1600b4483c049ce76e6bd3b665a6a9300ed upstream.

This patch is fixing memory leak reported by Syzkaller:

BUG: memory leak unreferenced object 0xffff888067f65500 (size 4096):
  comm "syz-executor043", pid 454, jiffies 4294759719 (age 11.930s)
  hex dump (first 32 bytes):
    73 6c 63 61 6e 30 00 00 00 00 00 00 00 00 00 00 slcan0..........
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
  backtrace:
    [<00000000a06eec0d>] __kmalloc+0x18b/0x2c0
    [<0000000083306e66>] kvmalloc_node+0x3a/0xc0
    [<000000006ac27f87>] alloc_netdev_mqs+0x17a/0x1080
    [<0000000061a996c9>] slcan_open+0x3ae/0x9a0
    [<000000001226f0f9>] tty_ldisc_open.isra.1+0x76/0xc0
    [<0000000019289631>] tty_set_ldisc+0x28c/0x5f0
    [<000000004de5a617>] tty_ioctl+0x48d/0x1590
    [<00000000daef496f>] do_vfs_ioctl+0x1c7/0x1510
    [<0000000059068dbc>] ksys_ioctl+0x99/0xb0
    [<000000009a6eb334>] __x64_sys_ioctl+0x78/0xb0
    [<0000000053d0332e>] do_syscall_64+0x16f/0x580
    [<0000000021b83b99>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [<000000008ea75434>] 0xffffffffffffffff

Cc: Wolfgang Grandegger <wg@grandegger.com>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Jouni Hogander <jouni.hogander@unikie.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agommc: sdhci-of-at91: fix quirk2 overwrite
Eugen Hristev [Thu, 14 Nov 2019 12:59:26 +0000 (12:59 +0000)]
mmc: sdhci-of-at91: fix quirk2 overwrite

commit fed23c5829ecab4ddc712d7b0046e59610ca3ba4 upstream.

The quirks2 are parsed and set (e.g. from DT) before the quirk for broken
HS200 is set in the driver.
The driver needs to enable just this flag, not rewrite the whole quirk set.

Fixes: 7871aa60ae00 ("mmc: sdhci-of-at91: add quirk for broken HS200")
Signed-off-by: Eugen Hristev <eugen.hristev@microchip.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agomm/page_io.c: do not free shared swap slots
Vinayak Menon [Sat, 16 Nov 2019 01:35:00 +0000 (17:35 -0800)]
mm/page_io.c: do not free shared swap slots

commit 5df373e95689b9519b8557da7c5bd0db0856d776 upstream.

The following race is observed due to which a processes faulting on a
swap entry, finds the page neither in swapcache nor swap.  This causes
zram to give a zero filled page that gets mapped to the process,
resulting in a user space crash later.

Consider parent and child processes Pa and Pb sharing the same swap slot
with swap_count 2.  Swap is on zram with SWP_SYNCHRONOUS_IO set.
Virtual address 'VA' of Pa and Pb points to the shared swap entry.

Pa                                       Pb

fault on VA                              fault on VA
do_swap_page                             do_swap_page
lookup_swap_cache fails                  lookup_swap_cache fails
                                         Pb scheduled out
swapin_readahead (deletes zram entry)
swap_free (makes swap_count 1)
                                         Pb scheduled in
                                         swap_readpage (swap_count == 1)
                                         Takes SWP_SYNCHRONOUS_IO path
                                         zram enrty absent
                                         zram gives a zero filled page

Fix this by making sure that swap slot is freed only when swap count
drops down to one.

Link: http://lkml.kernel.org/r/1571743294-14285-1-git-send-email-vinmenon@codeaurora.org
Fixes: aa8d22a11da9 ("mm: swap: SWP_SYNCHRONOUS_IO: skip swapcache only if swapped page has no other reference")
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Suggested-by: Minchan Kim <minchan@google.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agomm/memory_hotplug: fix try_offline_node()
David Hildenbrand [Sat, 16 Nov 2019 01:34:57 +0000 (17:34 -0800)]
mm/memory_hotplug: fix try_offline_node()

commit 2c91f8fc6c999fe10185d8ad99fda1759f662f70 upstream.

try_offline_node() is pretty much broken right now:

 - The node span is updated when onlining memory, not when adding it. We
   ignore memory that was mever onlined. Bad.

 - We touch possible garbage memmaps. The pfn_to_nid(pfn) can easily
   trigger a kernel panic. Bad for memory that is offline but also bad
   for subsection hotadd with ZONE_DEVICE, whereby the memmap of the
   first PFN of a section might contain garbage.

 - Sections belonging to mixed nodes are not properly considered.

As memory blocks might belong to multiple nodes, we would have to walk
all pageblocks (or at least subsections) within present sections.
However, we don't have a way to identify whether a memmap that is not
online was initialized (relevant for ZONE_DEVICE).  This makes things
more complicated.

Luckily, we can piggy pack on the node span and the nid stored in memory
blocks.  Currently, the node span is grown when calling
move_pfn_range_to_zone() - e.g., when onlining memory, and shrunk when
removing memory, before calling try_offline_node().  Sysfs links are
created via link_mem_sections(), e.g., during boot or when adding
memory.

If the node still spans memory or if any memory block belongs to the
nid, we don't set the node offline.  As memory blocks that span multiple
nodes cannot get offlined, the nid stored in memory blocks is reliable
enough (for such online memory blocks, the node still spans the memory).

Introduce for_each_memory_block() to efficiently walk all memory blocks.

Note: We will soon stop shrinking the ZONE_DEVICE zone and the node span
when removing ZONE_DEVICE memory to fix similar issues (access of
garbage memmaps) - until we have a reliable way to identify whether
these memmaps were properly initialized.  This implies later, that once
a node had ZONE_DEVICE memory, we won't be able to set a node offline -
which should be acceptable.

Since commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate
hotadded memory to zones until online") memory that is added is not
assoziated with a zone/node (memmap not initialized).  The introducing
commit 60a5a19e7419 ("memory-hotplug: remove sysfs file of node")
already missed that we could have multiple nodes for a section and that
the zone/node span is updated when onlining pages, not when adding them.

I tested this by hotplugging two DIMMs to a memory-less and cpu-less
NUMA node.  The node is properly onlined when adding the DIMMs.  When
removing the DIMMs, the node is properly offlined.

Masayoshi Mizuma reported:

: Without this patch, memory hotplug fails as panic:
:
:  BUG: kernel NULL pointer dereference, address: 0000000000000000
:  ...
:  Call Trace:
:   remove_memory_block_devices+0x81/0xc0
:   try_remove_memory+0xb4/0x130
:   __remove_memory+0xa/0x20
:   acpi_memory_device_remove+0x84/0x100
:   acpi_bus_trim+0x57/0x90
:   acpi_bus_trim+0x2e/0x90
:   acpi_device_hotplug+0x2b2/0x4d0
:   acpi_hotplug_work_fn+0x1a/0x30
:   process_one_work+0x171/0x380
:   worker_thread+0x49/0x3f0
:   kthread+0xf8/0x130
:   ret_from_fork+0x35/0x40

[david@redhat.com: v3]
Link: http://lkml.kernel.org/r/20191102120221.7553-1-david@redhat.com
Link: http://lkml.kernel.org/r/20191028105458.28320-1-david@redhat.com
Fixes: 60a5a19e7419 ("memory-hotplug: remove sysfs file of node")
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visiable after d0dc12e86b319
Signed-off-by: David Hildenbrand <david@redhat.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Nayna Jain <nayna@linux.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agomm: slub: really fix slab walking for init_on_free
Laura Abbott [Sat, 16 Nov 2019 01:34:50 +0000 (17:34 -0800)]
mm: slub: really fix slab walking for init_on_free

commit aea4df4c53f754cc229edde6c5465e481311cc49 upstream.

Commit 1b7e816fc80e ("mm: slub: Fix slab walking for init_on_free")
fixed one problem with the slab walking but missed a key detail: When
walking the list, the head and tail pointers need to be updated since we
end up reversing the list as a result.  Without doing this, bulk free is
broken.

One way this is exposed is a NULL pointer with slub_debug=F:

  =============================================================================
  BUG skbuff_head_cache (Tainted: G                T): Object already free
  -----------------------------------------------------------------------------

  INFO: Slab 0x000000000d2d2f8f objects=16 used=3 fp=0x0000000064309071 flags=0x3fff00000000201
  BUG: kernel NULL pointer dereference, address: 0000000000000000
  Oops: 0000 [#1] PREEMPT SMP PTI
  RIP: 0010:print_trailer+0x70/0x1d5
  Call Trace:
   <IRQ>
   free_debug_processing.cold.37+0xc9/0x149
   __slab_free+0x22a/0x3d0
   kmem_cache_free_bulk+0x415/0x420
   __kfree_skb_flush+0x30/0x40
   net_rx_action+0x2dd/0x480
   __do_softirq+0xf0/0x246
   irq_exit+0x93/0xb0
   do_IRQ+0xa0/0x110
   common_interrupt+0xf/0xf
   </IRQ>

Given we're now almost identical to the existing debugging code which
correctly walks the list, combine with that.

Link: https://lkml.kernel.org/r/20191104170303.GA50361@gandi.net
Link: http://lkml.kernel.org/r/20191106222208.26815-1-labbott@redhat.com
Fixes: 1b7e816fc80e ("mm: slub: Fix slab walking for init_on_free")
Signed-off-by: Laura Abbott <labbott@redhat.com>
Reported-by: Thibaut Sautereau <thibaut.sautereau@clip-os.org>
Acked-by: David Rientjes <rientjes@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <clipos@ssi.gouv.fr>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agomm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup()
Roman Gushchin [Sat, 16 Nov 2019 01:34:46 +0000 (17:34 -0800)]
mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup()

commit 0362f326d86c645b5e96b7dbc3ee515986ed019d upstream.

An exiting task might belong to an offline cgroup.  In this case an
attempt to grab a cgroup reference from the task can end up with an
infinite loop in hugetlb_cgroup_charge_cgroup(), because neither the
cgroup will become online, neither the task will be migrated to a live
cgroup.

Fix this by switching over to css_tryget().  As css_tryget_online()
can't guarantee that the cgroup won't go offline, in most cases the
check doesn't make sense.  In this particular case users of
hugetlb_cgroup_charge_cgroup() are not affected by this change.

A similar problem is described by commit 18fa84a2db0e ("cgroup: Use
css_tryget() instead of css_tryget_online() in task_get_css()").

Link: http://lkml.kernel.org/r/20191106225131.3543616-2-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agomm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()
Roman Gushchin [Sat, 16 Nov 2019 01:34:43 +0000 (17:34 -0800)]
mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()

commit 00d484f354d85845991b40141d40ba9e5eb60faf upstream.

We've encountered a rcu stall in get_mem_cgroup_from_mm():

  rcu: INFO: rcu_sched self-detected stall on CPU
  rcu: 33-....: (21000 ticks this GP) idle=6c6/1/0x4000000000000002 softirq=35441/35441 fqs=5017
  (t=21031 jiffies g=324821 q=95837) NMI backtrace for cpu 33
  <...>
  RIP: 0010:get_mem_cgroup_from_mm+0x2f/0x90
  <...>
   __memcg_kmem_charge+0x55/0x140
   __alloc_pages_nodemask+0x267/0x320
   pipe_write+0x1ad/0x400
   new_sync_write+0x127/0x1c0
   __kernel_write+0x4f/0xf0
   dump_emit+0x91/0xc0
   writenote+0xa0/0xc0
   elf_core_dump+0x11af/0x1430
   do_coredump+0xc65/0xee0
   get_signal+0x132/0x7c0
   do_signal+0x36/0x640
   exit_to_usermode_loop+0x61/0xd0
   do_syscall_64+0xd4/0x100
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

The problem is caused by an exiting task which is associated with an
offline memcg.  We're iterating over and over in the do {} while
(!css_tryget_online()) loop, but obviously the memcg won't become online
and the exiting task won't be migrated to a live memcg.

Let's fix it by switching from css_tryget_online() to css_tryget().

As css_tryget_online() cannot guarantee that the memcg won't go offline,
the check is usually useless, except some rare cases when for example it
determines if something should be presented to a user.

A similar problem is described by commit 18fa84a2db0e ("cgroup: Use
css_tryget() instead of css_tryget_online() in task_get_css()").

Johannes:

: The bug aside, it doesn't matter whether the cgroup is online for the
: callers.  It used to matter when offlining needed to evacuate all charges
: from the memcg, and so needed to prevent new ones from showing up, but we
: don't care now.

Link: http://lkml.kernel.org/r/20191106225131.3543616-1-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Shakeel Butt <shakeeb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutn <mkoutny@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agomm: mempolicy: fix the wrong return value and potential pages leak of mbind
Yang Shi [Sat, 16 Nov 2019 01:34:33 +0000 (17:34 -0800)]
mm: mempolicy: fix the wrong return value and potential pages leak of mbind

commit a85dfc305a21acfc48fa28a0fa0a0cb6ad496120 upstream.

Commit d883544515aa ("mm: mempolicy: make the behavior consistent when
MPOL_MF_MOVE* and MPOL_MF_STRICT were specified") fixed the return value
of mbind() for a couple of corner cases.  But, it altered the errno for
some other cases, for example, mbind() should return -EFAULT when part
or all of the memory range specified by nodemask and maxnode points
outside your accessible address space, or there was an unmapped hole in
the specified memory range specified by addr and len.

Fix this by preserving the errno returned by queue_pages_range().  And,
the pagelist may be not empty even though queue_pages_range() returns
error, put the pages back to LRU since mbind_range() is not called to
really apply the policy so those pages should not be migrated, this is
also the old behavior before the problematic commit.

Link: http://lkml.kernel.org/r/1572454731-3925-1-git-send-email-yang.shi@linux.alibaba.com
Fixes: d883544515aa ("mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified")
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reported-by: Li Xinhai <lixinhai.lxh@gmail.com>
Reviewed-by: Li Xinhai <lixinhai.lxh@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org> [4.19 and 5.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoiommu/vt-d: Fix QI_DEV_IOTLB_PFSID and QI_DEV_EIOTLB_PFSID macros
Eric Auger [Fri, 8 Nov 2019 15:58:03 +0000 (16:58 +0100)]
iommu/vt-d: Fix QI_DEV_IOTLB_PFSID and QI_DEV_EIOTLB_PFSID macros

commit 4e7120d79edb31e4ee68e6f8421448e4603be1e9 upstream.

For both PASID-based-Device-TLB Invalidate Descriptor and
Device-TLB Invalidate Descriptor, the Physical Function Source-ID
value is split according to this layout:

PFSID[3:0] is set at offset 12 and PFSID[15:4] is put at offset 52.
Fix the part laid out at offset 52.

Fixes: 0f725561e1684 ("iommu/vt-d: Add definitions for PFSID")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: stable@vger.kernel.org # v4.19+
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agonet: ethernet: dwmac-sun8i: Use the correct function in exit path
Corentin Labbe [Sun, 10 Nov 2019 11:30:48 +0000 (11:30 +0000)]
net: ethernet: dwmac-sun8i: Use the correct function in exit path

commit 40a1dcee2d1846a24619fe9ca45c661ca0db7dda upstream.

When PHY is not powered, the probe function fail and some resource are
still unallocated.
Furthermore some BUG happens:
dwmac-sun8i 5020000.ethernet: EMAC reset timeout
------------[ cut here ]------------
kernel BUG at /linux-next/net/core/dev.c:9844!

So let's use the right function (stmmac_pltfr_remove) in the error path.

Fixes: 9f93ac8d4085 ("net-next: stmmac: Add dwmac-sun8i")
Cc: <stable@vger.kernel.org> # v4.15+
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agontp/y2038: Remove incorrect time_t truncation
Arnd Bergmann [Fri, 8 Nov 2019 20:34:24 +0000 (21:34 +0100)]
ntp/y2038: Remove incorrect time_t truncation

commit 2f5841349df281ecf8f81cc82d869b8476f0db0b upstream.

A cast to 'time_t' was accidentally left in place during the
conversion of __do_adjtimex() to 64-bit timestamps, so the
resulting value is incorrectly truncated.

Remove the cast so the 64-bit time gets propagated correctly.

Fixes: ead25417f82e ("timex: use __kernel_timex internally")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20191108203435.112759-2-arnd@arndb.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoRevert "drm/i915/ehl: Update MOCS table for EHL"
Matt Roper [Tue, 12 Nov 2019 22:47:56 +0000 (14:47 -0800)]
Revert "drm/i915/ehl: Update MOCS table for EHL"

commit ed77d88752aea56b33731aee42e7146379b90769 upstream.

This reverts commit f4071997f1de016780ec6b79c63d90cd5886ee83.

These extra EHL entries won't behave as expected without a bit more work
on the kernel side so let's drop them until that kernel work has had a
chance to land.  Userspace trying to use these new entries won't get the
advantage of the new functionality these entries are meant to provide,
but at least it won't misbehave.

When we do add these back in the future, we'll probably want to
explicitly use separate tables for ICL and EHL so that userspace
software that mistakenly uses these entries (which are undefined on ICL)
sees the same behavior it sees with all the other undefined entries.

Cc: Francisco Jerez <francisco.jerez.plata@intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: <stable@vger.kernel.org> # v5.3+
Fixes: f4071997f1de ("drm/i915/ehl: Update MOCS table for EHL")
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191112224757.25116-1-matthew.d.roper@intel.com
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit 046091758b50a5fff79726a31c1391614a3d84c8)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agodrm/i915: update rawclk also on resume
Jani Nikula [Fri, 1 Nov 2019 14:20:24 +0000 (16:20 +0200)]
drm/i915: update rawclk also on resume

commit 2f216a8507153578efc309c821528a6b81628cd2 upstream.

Since CNP it's possible for rawclk to have two different values, 19.2
and 24 MHz. If the value indicated by SFUSE_STRAP register is different
from the power on default for PCH_RAWCLK_FREQ, we'll end up having a
mismatch between the rawclk hardware and software states after
suspend/resume. On previous platforms this used to work by accident,
because the power on defaults worked just fine.

Update the rawclk also on resume. The natural place to do this would be
intel_modeset_init_hw(), however VLV/CHV need it done before
intel_power_domains_init_hw(). Thus put it there even if it feels
slightly out of place.

v2: Call intel_update_rawclck() in intel_power_domains_init_hw() for all
    platforms (Ville).

Reported-by: Shawn Lee <shawn.c.lee@intel.com>
Cc: Shawn Lee <shawn.c.lee@intel.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Shawn Lee <shawn.c.lee@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191101142024.13877-1-jani.nikula@intel.com
(cherry picked from commit 59ed05ccdded5eb18ce012eff3d01798ac8535fa)
Cc: <stable@vger.kernel.org> # v4.15+
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoio_uring: ensure registered buffer import returns the IO length
Jens Axboe [Wed, 13 Nov 2019 23:12:46 +0000 (16:12 -0700)]
io_uring: ensure registered buffer import returns the IO length

commit 5e559561a8d7e6d4adfce6aa8fbf3daa3dec1577 upstream.

A test case was reported where two linked reads with registered buffers
failed the second link always. This is because we set the expected value
of a request in req->result, and if we don't get this result, then we
fail the dependent links. For some reason the registered buffer import
returned -ERROR/0, while the normal import returns -ERROR/length. This
broke linked commands with registered buffers.

Fix this by making io_import_fixed() correctly return the mapped length.

Cc: stable@vger.kernel.org # v5.3
Reported-by: 李通洲 <carter.li@eoitek.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoecryptfs_lookup_interpose(): lower_dentry->d_parent is not stable either
Al Viro [Sun, 3 Nov 2019 18:55:43 +0000 (13:55 -0500)]
ecryptfs_lookup_interpose(): lower_dentry->d_parent is not stable either

commit 762c69685ff7ad5ad7fee0656671e20a0c9c864d upstream.

We need to get the underlying dentry of parent; sure, absent the races
it is the parent of underlying dentry, but there's nothing to prevent
losing a timeslice to preemtion in the middle of evaluation of
lower_dentry->d_parent->d_inode, having another process move lower_dentry
around and have its (ex)parent not pinned anymore and freed on memory
pressure.  Then we regain CPU and try to fetch ->d_inode from memory
that is freed by that point.

dentry->d_parent *is* stable here - it's an argument of ->lookup() and
we are guaranteed that it won't be moved anywhere until we feed it
to d_add/d_splice_alias.  So we safely go that way to get to its
underlying dentry.

Cc: stable@vger.kernel.org # since 2009 or so
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable
Al Viro [Sun, 3 Nov 2019 18:45:04 +0000 (13:45 -0500)]
ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable

commit e72b9dd6a5f17d0fb51f16f8685f3004361e83d0 upstream.

lower_dentry can't go from positive to negative (we have it pinned),
but it *can* go from negative to positive.  So fetching ->d_inode
into a local variable, doing a blocking allocation, checking that
now ->d_inode is non-NULL and feeding the value we'd fetched
earlier to a function that won't accept NULL is not a good idea.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agox86/quirks: Disable HPET on Intel Coffe Lake platforms
Kai-Heng Feng [Wed, 16 Oct 2019 10:38:16 +0000 (18:38 +0800)]
x86/quirks: Disable HPET on Intel Coffe Lake platforms

commit fc5db58539b49351e76f19817ed1102bf7c712d0 upstream.

Some Coffee Lake platforms have a skewed HPET timer once the SoCs entered
PC10, which in consequence marks TSC as unstable because HPET is used as
watchdog clocksource for TSC.

Harry Pan tried to work around it in the clocksource watchdog code [1]
thereby creating a circular dependency between HPET and TSC. This also
ignores the fact, that HPET is not only unsuitable as watchdog clocksource
on these systems, it becomes unusable in general.

Disable HPET on affected platforms.

Suggested-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203183
Link: https://lore.kernel.org/lkml/20190516090651.1396-1-harry.pan@intel.com/
Link: https://lkml.kernel.org/r/20191016103816.30650-1-kai.heng.feng@canonical.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoi2c: acpi: Force bus speed to 400KHz if a Silead touchscreen is present
Hans de Goede [Wed, 13 Nov 2019 18:29:38 +0000 (19:29 +0100)]
i2c: acpi: Force bus speed to 400KHz if a Silead touchscreen is present

commit 7574c0db2e68c4d0bae9d415a683bdd8b2a761e9 upstream.

Many cheap devices use Silead touchscreen controllers. Testing has shown
repeatedly that these touchscreen controllers work fine at 400KHz, but for
unknown reasons do not work properly at 100KHz. This has been seen on
both ARM and x86 devices using totally different i2c controllers.

On some devices the ACPI tables list another device at the same I2C-bus
as only being capable of 100KHz, testing has shown that these other
devices work fine at 400KHz (as can be expected of any recent I2C hw).

This commit makes i2c_acpi_find_bus_speed() always return 400KHz if a
Silead touchscreen controller is present, fixing the touchscreen not
working on devices which ACPI tables' wrongly list another device on the
same bus as only being capable of 100KHz.

Specifically this fixes the touchscreen on the Jumper EZpad 6 m4 not
working.

Reported-by: youling 257 <youling257@gmail.com>
Tested-by: youling 257 <youling257@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
[wsa: rewording warning a little]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoIB/hfi1: Use a common pad buffer for 9B and 16B packets
Mike Marciniszyn [Fri, 4 Oct 2019 20:49:34 +0000 (16:49 -0400)]
IB/hfi1: Use a common pad buffer for 9B and 16B packets

commit 22bb13653410424d9fce8d447506a41f8292f22f upstream.

There is no reason for a different pad buffer for the two
packet types.

Expand the current buffer allocation to allow for both
packet types.

Fixes: f8195f3b14a0 ("IB/hfi1: Eliminate allocation while atomic")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Link: https://lore.kernel.org/r/20191004204934.26838.13099.stgit@awfm-01.aw.intel.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoIB/hfi1: Ensure full Gen3 speed in a Gen4 system
James Erwin [Fri, 1 Nov 2019 19:20:59 +0000 (15:20 -0400)]
IB/hfi1: Ensure full Gen3 speed in a Gen4 system

commit a9c3c4c597704b3a1a2b9bef990e7d8a881f6533 upstream.

If an hfi1 card is inserted in a Gen4 systems, the driver will avoid the
gen3 speed bump and the card will operate at half speed.

This is because the driver avoids the gen3 speed bump when the parent bus
speed isn't identical to gen3, 8.0GT/s.  This is not compatible with gen4
and newer speeds.

Fix by relaxing the test to explicitly look for the lower capability
speeds which inherently allows for gen4 and all future speeds.

Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Link: https://lore.kernel.org/r/20191101192059.106248.1699.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: James Erwin <james.erwin@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoIB/hfi1: TID RDMA WRITE should not return IB_WC_RNR_RETRY_EXC_ERR
Kaike Wan [Fri, 25 Oct 2019 19:58:42 +0000 (15:58 -0400)]
IB/hfi1: TID RDMA WRITE should not return IB_WC_RNR_RETRY_EXC_ERR

commit ce8e8087cf3b5b4f19d29248bfc7deef95525490 upstream.

Normal RDMA WRITE request never returns IB_WC_RNR_RETRY_EXC_ERR to ULPs
because it does not need post receive buffer on the responder side.
Consequently, as an enhancement to normal RDMA WRITE request inside the
hfi1 driver, TID RDMA WRITE request should not return such an error status
to ULPs, although it does receive RNR NAKs from the responder when TID
resources are not available. This behavior is violated when
qp->s_rnr_retry_cnt is set in current hfi1 implementation.

This patch enforces these semantics by avoiding any reaction to the updates
of the RNR QP attributes.

Fixes: 3c6cb20a0d17 ("IB/hfi1: Add TID RDMA WRITE functionality into RDMA verbs")
Link: https://lore.kernel.org/r/20191025195842.106825.71532.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoIB/hfi1: Calculate flow weight based on QP MTU for TID RDMA
Kaike Wan [Fri, 25 Oct 2019 19:58:36 +0000 (15:58 -0400)]
IB/hfi1: Calculate flow weight based on QP MTU for TID RDMA

commit c2be3865a1763c4be39574937e1aae27e917af4d upstream.

For a TID RDMA WRITE request, a QP on the responder side could be put into
a queue when a hardware flow is not available. A RNR NAK will be returned
to the requester with a RNR timeout value based on the position of the QP
in the queue. The tid_rdma_flow_wt variable is used to calculate the
timeout value and is determined by using a MTU of 4096 at the module
loading time. This could reduce the timeout value by half from the desired
value, leading to excessive RNR retries.

This patch fixes the issue by calculating the flow weight with the real
MTU assigned to the QP.

Fixes: 07b923701e38 ("IB/hfi1: Add functions to receive TID RDMA WRITE request")
Link: https://lore.kernel.org/r/20191025195836.106825.77769.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoIB/hfi1: Ensure r_tid_ack is valid before building TID RDMA ACK packet
Kaike Wan [Fri, 25 Oct 2019 19:58:30 +0000 (15:58 -0400)]
IB/hfi1: Ensure r_tid_ack is valid before building TID RDMA ACK packet

commit c1abd865bd125015783286b353abb8da51644f59 upstream.

The index r_tid_ack is used to indicate the next TID RDMA WRITE request to
acknowledge in the ring s_ack_queue[] on the responder side and should be
set to a valid index other than its initial value before r_tid_tail is
advanced to the next TID RDMA WRITE request and particularly before a TID
RDMA ACK is built. Otherwise, a NULL pointer dereference may result:

  BUG: unable to handle kernel paging request at ffff9a32d27abff8
  IP: [<ffffffffc0d87ea6>] hfi1_make_tid_rdma_pkt+0x476/0xcb0 [hfi1]
  PGD 2749032067 PUD 0
  Oops: 0000 1 SMP
  Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) ib_ipoib(OE) hfi1(OE) rdmavt(OE) nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ib_isert iscsi_target_mod target_core_mod ib_ucm dm_mirror dm_region_hash dm_log mlx5_ib dm_mod zfs(POE) rpcrdma sunrpc rdma_ucm ib_uverbs opa_vnic ib_iser zunicode(POE) ib_umad zavl(POE) icp(POE) sb_edac intel_powerclamp coretemp rdma_cm intel_rapl iosf_mbi iw_cm libiscsi scsi_transport_iscsi kvm ib_cm iTCO_wdt mxm_wmi iTCO_vendor_support irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd zcommon(POE) znvpair(POE) pcspkr spl(OE) mei_me
  sg mei ioatdma lpc_ich joydev i2c_i801 shpchp ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 mlx5_core drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ixgbe ahci ttm mlxfw ib_core libahci devlink mdio crct10dif_pclmul crct10dif_common drm ptp libata megaraid_sas crc32c_intel i2c_algo_bit pps_core i2c_core dca [last unloaded: rdmavt]
  CPU: 15 PID: 68691 Comm: kworker/15:2H Kdump: loaded Tainted: P W OE ------------ 3.10.0-862.2.3.el7_lustre.x86_64 #1
  Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
  Workqueue: hfi0_0 _hfi1_do_tid_send [hfi1]
  task: ffff9a01f47faf70 ti: ffff9a11776a8000 task.ti: ffff9a11776a8000
  RIP: 0010:[<ffffffffc0d87ea6>] [<ffffffffc0d87ea6>] hfi1_make_tid_rdma_pkt+0x476/0xcb0 [hfi1]
  RSP: 0018:ffff9a11776abd08 EFLAGS: 00010002
  RAX: ffff9a32d27abfc0 RBX: ffff99f2d27aa000 RCX: 00000000ffffffff
  RDX: 0000000000000000 RSI: 0000000000000220 RDI: ffff99f2ffc05300
  RBP: ffff9a11776abd88 R08: 000000000001c310 R09: ffffffffc0d87ad4
  R10: 0000000000000000 R11: 0000000000000000 R12: ffff9a117a423c00
  R13: ffff9a117a423c00 R14: ffff9a03500c0000 R15: ffff9a117a423cb8
  FS: 0000000000000000(0000) GS:ffff9a117e9c0000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffff9a32d27abff8 CR3: 0000002748a0e000 CR4: 00000000001607e0
  Call Trace:
  [<ffffffffc0d88874>] _hfi1_do_tid_send+0x194/0x320 [hfi1]
  [<ffffffffaf0b2dff>] process_one_work+0x17f/0x440
  [<ffffffffaf0b3ac6>] worker_thread+0x126/0x3c0
  [<ffffffffaf0b39a0>] ? manage_workers.isra.24+0x2a0/0x2a0
  [<ffffffffaf0bae31>] kthread+0xd1/0xe0
  [<ffffffffaf0bad60>] ? insert_kthread_work+0x40/0x40
  [<ffffffffaf71f5f7>] ret_from_fork_nospec_begin+0x21/0x21
  [<ffffffffaf0bad60>] ? insert_kthread_work+0x40/0x40
  hfi1 0000:05:00.0: hfi1_0: reserved_op: opcode 0xf2, slot 2, rsv_used 1, rsv_ops 1
  Code: 00 00 41 8b 8d d8 02 00 00 89 c8 48 89 45 b0 48 c1 65 b0 06 48 8b 83 a0 01 00 00 48 01 45 b0 48 8b 45 b0 41 80 bd 10 03 00 00 00 <48> 8b 50 38 4c 8d 7a 50 74 45 8b b2 d0 00 00 00 85 f6 0f 85 72
  RIP [<ffffffffc0d87ea6>] hfi1_make_tid_rdma_pkt+0x476/0xcb0 [hfi1]
  RSP <ffff9a11776abd08>
  CR2: ffff9a32d27abff8

This problem can happen if a RESYNC request is received before r_tid_ack
is modified.

This patch fixes the issue by making sure that r_tid_ack is set to a valid
value before a TID RDMA ACK is built. Functions are defined to simplify
the code.

Fixes: 07b923701e38 ("IB/hfi1: Add functions to receive TID RDMA WRITE request")
Fixes: 7cf0ad679de4 ("IB/hfi1: Add a function to receive TID RDMA RESYNC packet")
Link: https://lore.kernel.org/r/20191025195830.106825.44022.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoKVM: MMU: Do not treat ZONE_DEVICE pages as being reserved
Sean Christopherson [Mon, 11 Nov 2019 22:12:27 +0000 (14:12 -0800)]
KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved

commit a78986aae9b2988f8493f9f65a587ee433e83bc3 upstream.

Explicitly exempt ZONE_DEVICE pages from kvm_is_reserved_pfn() and
instead manually handle ZONE_DEVICE on a case-by-case basis.  For things
like page refcounts, KVM needs to treat ZONE_DEVICE pages like normal
pages, e.g. put pages grabbed via gup().  But for flows such as setting
A/D bits or shifting refcounts for transparent huge pages, KVM needs to
to avoid processing ZONE_DEVICE pages as the flows in question lack the
underlying machinery for proper handling of ZONE_DEVICE pages.

This fixes a hang reported by Adam Borowski[*] in dev_pagemap_cleanup()
when running a KVM guest backed with /dev/dax memory, as KVM straight up
doesn't put any references to ZONE_DEVICE pages acquired by gup().

Note, Dan Williams proposed an alternative solution of doing put_page()
on ZONE_DEVICE pages immediately after gup() in order to simplify the
auditing needed to ensure is_zone_device_page() is called if and only if
the backing device is pinned (via gup()).  But that approach would break
kvm_vcpu_{un}map() as KVM requires the page to be pinned from map() 'til
unmap() when accessing guest memory, unlike KVM's secondary MMU, which
coordinates with mmu_notifier invalidations to avoid creating stale
page references, i.e. doesn't rely on pages being pinned.

[*] http://lkml.kernel.org/r/20190919115547.GA17963@angband.pl

Reported-by: Adam Borowski <kilobyte@angband.pl>
Analyzed-by: David Hildenbrand <david@redhat.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: stable@vger.kernel.org
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoInput: synaptics-rmi4 - destroy F54 poller workqueue when removing
Chuhong Yuan [Fri, 15 Nov 2019 19:32:36 +0000 (11:32 -0800)]
Input: synaptics-rmi4 - destroy F54 poller workqueue when removing

commit ba60cf9f78f0d7c8e73c7390608f7f818ee68aa0 upstream.

The driver forgets to destroy workqueue in remove() similarly to what is
done when probe() fails. Add a call to destroy_workqueue() to fix it.

Since unregistration will wait for the work to finish, we do not need to
cancel/flush the work instance in remove().

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20191114023405.31477-1-hslester96@gmail.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoInput: synaptics-rmi4 - clear IRQ enables for F54
Lucas Stach [Wed, 13 Nov 2019 00:47:08 +0000 (16:47 -0800)]
Input: synaptics-rmi4 - clear IRQ enables for F54

commit 549766ac2ac1f6c8bb85906bbcea759541bb19a2 upstream.

The driver for F54 just polls the status and doesn't even have a IRQ
handler registered. Make sure to disable all F54 IRQs, so we don't crash
the kernel on a nonexistent handler.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Link: https://lore.kernel.org/r/20191105114402.6009-1-l.stach@pengutronix.de
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoInput: synaptics-rmi4 - do not consume more data than we have (F11, F12)
Andrew Duggan [Tue, 5 Nov 2019 00:07:30 +0000 (16:07 -0800)]
Input: synaptics-rmi4 - do not consume more data than we have (F11, F12)

commit 5d40d95e7e64756cc30606c2ba169271704d47cb upstream.

Currently, rmi_f11_attention() and rmi_f12_attention() functions update
the attn_data data pointer and size based on the size of the expected
size of the attention data. However, if the actual valid data in the
attn buffer is less then the expected value then the updated data
pointer will point to memory beyond the end of the attn buffer. Using
the calculated valid_bytes instead will prevent this from happening.

Signed-off-by: Andrew Duggan <aduggan@synaptics.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20191025002527.3189-3-aduggan@synaptics.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoInput: synaptics-rmi4 - disable the relative position IRQ in the F12 driver
Andrew Duggan [Tue, 5 Nov 2019 00:06:44 +0000 (16:06 -0800)]
Input: synaptics-rmi4 - disable the relative position IRQ in the F12 driver

commit f6aabe1ff1d9d7bad0879253011216438bdb2530 upstream.

This patch fixes an issue seen on HID touchpads which report finger
positions using RMI4 Function 12. The issue manifests itself as
spurious button presses as described in:
https://www.spinics.net/lists/linux-input/msg58618.html

Commit 24d28e4f1271 ("Input: synaptics-rmi4 - convert irq distribution
to irq_domain") switched the RMI4 driver to using an irq_domain to handle
RMI4 function interrupts. Functions with more then one interrupt now have
each interrupt mapped to their own IRQ and IRQ handler. The result of
this change is that the F12 IRQ handler was now getting called twice. Once
for the absolute data interrupt and once for the relative data interrupt.
For HID devices, calling rmi_f12_attention() a second time causes the
attn_data data pointer and size to be set incorrectly. When the touchpad
button is pressed, F30 will generate an interrupt and attempt to read the
F30 data from the invalid attn_data data pointer and report incorrect
button events.

This patch disables the F12 relative interrupt which prevents
rmi_f12_attention() from being called twice.

Signed-off-by: Andrew Duggan <aduggan@synaptics.com>
Reported-by: Simon Wood <simon@mungewell.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20191025002527.3189-2-aduggan@synaptics.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoInput: synaptics-rmi4 - fix video buffer size
Lucas Stach [Mon, 4 Nov 2019 23:58:34 +0000 (15:58 -0800)]
Input: synaptics-rmi4 - fix video buffer size

commit 003f01c780020daa9a06dea1db495b553a868c29 upstream.

The video buffer used by the queue is a vb2_v4l2_buffer, not a plain
vb2_buffer. Using the wrong type causes the allocation of the buffer
storage to be too small, causing a out of bounds write when
__init_vb2_v4l2_buffer initializes the buffer.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Fixes: 3a762dbd5347 ("[media] Input: synaptics-rmi4 - add support for F54 diagnostics")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20191104114454.10500-1-l.stach@pengutronix.de
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoInput: ff-memless - kill timer in destroy()
Oliver Neukum [Fri, 15 Nov 2019 19:35:05 +0000 (11:35 -0800)]
Input: ff-memless - kill timer in destroy()

commit fa3a5a1880c91bb92594ad42dfe9eedad7996b86 upstream.

No timer must be left running when the device goes away.

Signed-off-by: Oliver Neukum <oneukum@suse.com>
Reported-and-tested-by: syzbot+b6c55daa701fc389e286@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1573726121.17351.3.camel@suse.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agocgroup: freezer: call cgroup_enter_frozen() with preemption disabled in ptrace_stop()
Oleg Nesterov [Wed, 9 Oct 2019 15:02:30 +0000 (17:02 +0200)]
cgroup: freezer: call cgroup_enter_frozen() with preemption disabled in ptrace_stop()

commit 937c6b27c73e02cd4114f95f5c37ba2c29fadba1 upstream.

ptrace_stop() does preempt_enable_no_resched() to avoid the preemption,
but after that cgroup_enter_frozen() does spin_lock/unlock and this adds
another preemption point.

Reported-and-tested-by: Bruce Ashfield <bruce.ashfield@gmail.com>
Fixes: 76f969e8948d ("cgroup: cgroup v2 freezer")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoBtrfs: fix log context list corruption after rename exchange operation
Filipe Manana [Fri, 8 Nov 2019 16:11:56 +0000 (16:11 +0000)]
Btrfs: fix log context list corruption after rename exchange operation

commit e6c617102c7e4ac1398cb0b98ff1f0727755b520 upstream.

During rename exchange we might have successfully log the new name in the
source root's log tree, in which case we leave our log context (allocated
on stack) in the root's list of log contextes. However we might fail to
log the new name in the destination root, in which case we fallback to
a transaction commit later and never sync the log of the source root,
which causes the source root log context to remain in the list of log
contextes. This later causes invalid memory accesses because the context
was allocated on stack and after rename exchange finishes the stack gets
reused and overwritten for other purposes.

The kernel's linked list corruption detector (CONFIG_DEBUG_LIST=y) can
detect this and report something like the following:

  [  691.489929] ------------[ cut here ]------------
  [  691.489947] list_add corruption. prev->next should be next (ffff88819c944530), but was ffff8881c23f7be4. (prev=ffff8881c23f7a38).
  [  691.489967] WARNING: CPU: 2 PID: 28933 at lib/list_debug.c:28 __list_add_valid+0x95/0xe0
  (...)
  [  691.489998] CPU: 2 PID: 28933 Comm: fsstress Not tainted 5.4.0-rc6-btrfs-next-62 #1
  [  691.490001] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
  [  691.490003] RIP: 0010:__list_add_valid+0x95/0xe0
  (...)
  [  691.490007] RSP: 0018:ffff8881f0b3faf8 EFLAGS: 00010282
  [  691.490010] RAX: 0000000000000000 RBX: ffff88819c944530 RCX: 0000000000000000
  [  691.490011] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffffa2c497e0
  [  691.490013] RBP: ffff8881f0b3fe68 R08: ffffed103eaa4115 R09: ffffed103eaa4114
  [  691.490015] R10: ffff88819c944000 R11: ffffed103eaa4115 R12: 7fffffffffffffff
  [  691.490016] R13: ffff8881b4035610 R14: ffff8881e7b84728 R15: 1ffff1103e167f7b
  [  691.490019] FS:  00007f4b25ea2e80(0000) GS:ffff8881f5500000(0000) knlGS:0000000000000000
  [  691.490021] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [  691.490022] CR2: 00007fffbb2d4eec CR3: 00000001f2a4a004 CR4: 00000000003606e0
  [  691.490025] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [  691.490027] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [  691.490029] Call Trace:
  [  691.490058]  btrfs_log_inode_parent+0x667/0x2730 [btrfs]
  [  691.490083]  ? join_transaction+0x24a/0xce0 [btrfs]
  [  691.490107]  ? btrfs_end_log_trans+0x80/0x80 [btrfs]
  [  691.490111]  ? dget_parent+0xb8/0x460
  [  691.490116]  ? lock_downgrade+0x6b0/0x6b0
  [  691.490121]  ? rwlock_bug.part.0+0x90/0x90
  [  691.490127]  ? do_raw_spin_unlock+0x142/0x220
  [  691.490151]  btrfs_log_dentry_safe+0x65/0x90 [btrfs]
  [  691.490172]  btrfs_sync_file+0x9f1/0xc00 [btrfs]
  [  691.490195]  ? btrfs_file_write_iter+0x1800/0x1800 [btrfs]
  [  691.490198]  ? rcu_read_lock_any_held.part.11+0x20/0x20
  [  691.490204]  ? __do_sys_newstat+0x88/0xd0
  [  691.490207]  ? cp_new_stat+0x5d0/0x5d0
  [  691.490218]  ? do_fsync+0x38/0x60
  [  691.490220]  do_fsync+0x38/0x60
  [  691.490224]  __x64_sys_fdatasync+0x32/0x40
  [  691.490228]  do_syscall_64+0x9f/0x540
  [  691.490233]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [  691.490235] RIP: 0033:0x7f4b253ad5f0
  (...)
  [  691.490239] RSP: 002b:00007fffbb2d6078 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
  [  691.490242] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f4b253ad5f0
  [  691.490244] RDX: 00007fffbb2d5fe0 RSI: 00007fffbb2d5fe0 RDI: 0000000000000003
  [  691.490245] RBP: 000000000000000d R08: 0000000000000001 R09: 00007fffbb2d608c
  [  691.490247] R10: 00000000000002e8 R11: 0000000000000246 R12: 00000000000001f4
  [  691.490248] R13: 0000000051eb851f R14: 00007fffbb2d6120 R15: 00005635a498bda0

This started happening recently when running some test cases from fstests
like btrfs/004 for example, because support for rename exchange was added
last week to fsstress from fstests.

So fix this by deleting the log context for the source root from the list
if we have logged the new name in the source root.

Reported-by: Su Yue <Damenly_Su@gmx.com>
Fixes: d4682ba03ef618 ("Btrfs: sync log after logging new name")
CC: stable@vger.kernel.org # 4.19+
Tested-by: Su Yue <Damenly_Su@gmx.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoALSA: usb-audio: Fix incorrect size check for processing/extension units
Takashi Iwai [Thu, 14 Nov 2019 16:56:12 +0000 (17:56 +0100)]
ALSA: usb-audio: Fix incorrect size check for processing/extension units

commit 976a68f06b2ea49e2ab67a5f84919a8b105db8be upstream.

The recently introduced unit descriptor validation had some bug for
processing and extension units, it counts a bControlSize byte twice so
it expected a bigger size than it should have been.  This seems
resulting in a probe error on a few devices.

Fix the calculation for proper checks of PU and EU.

Fixes: 57f8770620e9 ("ALSA: usb-audio: More validations of descriptor units")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191114165613.7422-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoALSA: usb-audio: Fix incorrect NULL check in create_yamaha_midi_quirk()
Takashi Iwai [Wed, 13 Nov 2019 11:12:59 +0000 (12:12 +0100)]
ALSA: usb-audio: Fix incorrect NULL check in create_yamaha_midi_quirk()

commit cc9dbfa9707868fb0ca864c05e0c42d3f4d15cf2 upstream.

The commit 60849562a5db ("ALSA: usb-audio: Fix possible NULL
dereference at create_yamaha_midi_quirk()") added NULL checks in
create_yamaha_midi_quirk(), but there was an overlook.  The code
allows one of either injd or outjd is NULL, but the second if check
made returning -ENODEV if any of them is NULL.  Fix it in a proper
form.

Fixes: 60849562a5db ("ALSA: usb-audio: Fix possible NULL dereference at create_yamaha_midi_quirk()")
Reported-by: Pavel Machek <pavel@denx.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191113111259.24123-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoALSA: usb-audio: not submit urb for stopped endpoint
Henry Lin [Wed, 13 Nov 2019 02:14:19 +0000 (10:14 +0800)]
ALSA: usb-audio: not submit urb for stopped endpoint

commit 528699317dd6dc722dccc11b68800cf945109390 upstream.

While output urb's snd_complete_urb() is executing, calling
prepare_outbound_urb() may cause endpoint stopped before
prepare_outbound_urb() returns and result in next urb submitted
to stopped endpoint. usb-audio driver cannot re-use it afterwards as
the urb is still hold by usb stack.

This change checks EP_FLAG_RUNNING flag after prepare_outbound_urb() again
to let snd_complete_urb() know the endpoint already stopped and does not
submit next urb. Below kind of error will be fixed:

[  213.153103] usb 1-2: timeout: still 1 active urbs on EP #1
[  213.164121] usb 1-2: cannot submit urb 0, error -16: unknown error

Signed-off-by: Henry Lin <henryl@nvidia.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191113021420.13377-1-henryl@nvidia.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoALSA: usb-audio: Fix missing error check at mixer resolution test
Takashi Iwai [Sat, 9 Nov 2019 18:16:58 +0000 (19:16 +0100)]
ALSA: usb-audio: Fix missing error check at mixer resolution test

commit 167beb1756791e0806365a3f86a0da10d7a327ee upstream.

A check of the return value from get_cur_mix_raw() is missing at the
resolution test code in get_min_max_with_quirks(), which may leave the
variable untouched, leading to a random uninitialized value, as
detected by syzkaller fuzzer.

Add the missing return error check for fixing that.

Reported-and-tested-by: syzbot+abe1ab7afc62c6bb6377@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191109181658.30368-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agonet/smc: fix refcount non-blocking connect() -part 2
Ursula Braun [Tue, 12 Nov 2019 15:03:41 +0000 (16:03 +0100)]
net/smc: fix refcount non-blocking connect() -part 2

[ Upstream commit 6d6dd528d5af05dc2d0c773951ed68d630a0c3f1 ]

If an SMC socket is immediately terminated after a non-blocking connect()
has been called, a memory leak is possible.
Due to the sock_hold move in
commit 301428ea3708 ("net/smc: fix refcounting for non-blocking connect()")
an extra sock_put() is needed in smc_connect_work(), if the internal
TCP socket is aborted and cancels the sk_stream_wait_connect() of the
connect worker.

Reported-by: syzbot+4b73ad6fc767e576e275@syzkaller.appspotmail.com
Fixes: 301428ea3708 ("net/smc: fix refcounting for non-blocking connect()")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agodevlink: Add method for time-stamp on reporter's dump
Aya Levin [Sun, 10 Nov 2019 12:11:56 +0000 (14:11 +0200)]
devlink: Add method for time-stamp on reporter's dump

[ Upstream commit d279505b723cba058b604ed8cf9cd4c854e2a041 ]

When setting the dump's time-stamp, use ktime_get_real in addition to
jiffies. This simplifies the user space implementation and bypasses
some inconsistent behavior with translating jiffies to current time.
The time taken is transformed into nsec, to comply with y2038 issue.

Fixes: c8e1da0bf923 ("devlink: Add health report functionality")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agodpaa2-eth: free already allocated channels on probe defer
Ioana Ciornei [Tue, 12 Nov 2019 16:21:52 +0000 (18:21 +0200)]
dpaa2-eth: free already allocated channels on probe defer

[ Upstream commit 5aa4277d4368c099223bbcd3a9086f3351a12ce9 ]

The setup_dpio() function tries to allocate a number of channels equal
to the number of CPUs online. When there are not enough DPCON objects
already probed, the function will return EPROBE_DEFER. When this
happens, the already allocated channels are not freed. This results in
the incapacity of properly probing the next time around.
Fix this by freeing the channels on the error path.

Fixes: d7f5a9d89a55 ("dpaa2-eth: defer probe on object allocate")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agotcp: remove redundant new line from tcp_event_sk_skb
Tony Lu [Sat, 9 Nov 2019 10:43:06 +0000 (18:43 +0800)]
tcp: remove redundant new line from tcp_event_sk_skb

[ Upstream commit dd3d792def0d4f33bbf319982b1878b0c8aaca34 ]

This removes '\n' from trace event class tcp_event_sk_skb to avoid
redundant new blank line and make output compact.

Fixes: af4325ecc24f ("tcp: expose sk_state in tcp_retransmit_skb tracepoint")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoslip: Fix memory leak in slip_open error path
Jouni Hogander [Wed, 13 Nov 2019 11:45:02 +0000 (13:45 +0200)]
slip: Fix memory leak in slip_open error path

[ Upstream commit 3b5a39979dafea9d0cd69c7ae06088f7a84cdafa ]

Driver/net/can/slcan.c is derived from slip.c. Memory leak was detected
by Syzkaller in slcan. Same issue exists in slip.c and this patch is
addressing the leak in slip.c.

Here is the slcan memory leak trace reported by Syzkaller:

BUG: memory leak unreferenced object 0xffff888067f65500 (size 4096):
  comm "syz-executor043", pid 454, jiffies 4294759719 (age 11.930s)
  hex dump (first 32 bytes):
    73 6c 63 61 6e 30 00 00 00 00 00 00 00 00 00 00 slcan0..........
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
  backtrace:
    [<00000000a06eec0d>] __kmalloc+0x18b/0x2c0
    [<0000000083306e66>] kvmalloc_node+0x3a/0xc0
    [<000000006ac27f87>] alloc_netdev_mqs+0x17a/0x1080
    [<0000000061a996c9>] slcan_open+0x3ae/0x9a0
    [<000000001226f0f9>] tty_ldisc_open.isra.1+0x76/0xc0
    [<0000000019289631>] tty_set_ldisc+0x28c/0x5f0
    [<000000004de5a617>] tty_ioctl+0x48d/0x1590
    [<00000000daef496f>] do_vfs_ioctl+0x1c7/0x1510
    [<0000000059068dbc>] ksys_ioctl+0x99/0xb0
    [<000000009a6eb334>] __x64_sys_ioctl+0x78/0xb0
    [<0000000053d0332e>] do_syscall_64+0x16f/0x580
    [<0000000021b83b99>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [<000000008ea75434>] 0xfffffffffffffff

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Jouni Hogander <jouni.hogander@unikie.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agonet: usb: qmi_wwan: add support for Foxconn T77W968 LTE modules
Aleksander Morgado [Wed, 13 Nov 2019 10:11:10 +0000 (11:11 +0100)]
net: usb: qmi_wwan: add support for Foxconn T77W968 LTE modules

[ Upstream commit 802753cb0b141cf5170ab97fe7e79f5ca10d06b0 ]

These are the Foxconn-branded variants of the Dell DW5821e modules,
same USB layout as those.

The QMI interface is exposed in USB configuration #1:

P:  Vendor=0489 ProdID=e0b4 Rev=03.18
S:  Manufacturer=FII
S:  Product=T77W968 LTE
S:  SerialNumber=0123456789ABCDEF
C:  #Ifs= 6 Cfg#= 1 Atr=a0 MxPwr=500mA
I:  If#=0x0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
I:  If#=0x1 Alt= 0 #EPs= 1 Cls=03(HID  ) Sub=00 Prot=00 Driver=usbhid
I:  If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
I:  If#=0x3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
I:  If#=0x4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
I:  If#=0x5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option

Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agonet/smc: fix fastopen for non-blocking connect()
Ursula Braun [Fri, 15 Nov 2019 11:39:30 +0000 (12:39 +0100)]
net/smc: fix fastopen for non-blocking connect()

[ Upstream commit 8204df72bea1a7d83d0777add6da98a41dfbdc34 ]

FASTOPEN does not work with SMC-sockets. Since SMC allows fallback to
TCP native during connection start, the FASTOPEN setsockopts trigger
this fallback, if the SMC-socket is still in state SMC_INIT.
But if a FASTOPEN setsockopt is called after a non-blocking connect(),
this is broken, and fallback does not make sense.
This change complements
commit cd2063604ea6 ("net/smc: avoid fallback in case of non-blocking connect")
and fixes the syzbot reported problem "WARNING in smc_unhash_sk".

Reported-by: syzbot+8488cc4cf1c9e09b8b86@syzkaller.appspotmail.com
Fixes: e1bbdd570474 ("net/smc: reduce sock_put() for fallback sockets")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agonet: gemini: add missed free_netdev
Chuhong Yuan [Fri, 15 Nov 2019 06:24:54 +0000 (14:24 +0800)]
net: gemini: add missed free_netdev

[ Upstream commit 18d647ae74116bfee38953978501cea2960a0c25 ]

This driver forgets to free allocated netdev in remove like
what is done in probe failure.
Add the free to fix it.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agomlxsw: core: Enable devlink reload only on probe
Jiri Pirko [Sun, 10 Nov 2019 15:31:23 +0000 (16:31 +0100)]
mlxsw: core: Enable devlink reload only on probe

[ Upstream commit 73a533ecf0af5f73ff72dd7c96d1c8598ca93649 ]

Call devlink enable only during probe time and avoid deadlock
during reload.

Reported-by: Shalom Toledo <shalomt@mellanox.com>
Fixes: 5a508a254bed ("devlink: disallow reload operation during device cleanup")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoipmr: Fix skb headroom in ipmr_get_route().
Guillaume Nault [Fri, 15 Nov 2019 17:29:52 +0000 (18:29 +0100)]
ipmr: Fix skb headroom in ipmr_get_route().

[ Upstream commit 7901cd97963d6cbde88fa25a4a446db3554c16c6 ]

In route.c, inet_rtm_getroute_build_skb() creates an skb with no
headroom. This skb is then used by inet_rtm_getroute() which may pass
it to rt_fill_info() and, from there, to ipmr_get_route(). The later
might try to reuse this skb by cloning it and prepending an IPv4
header. But since the original skb has no headroom, skb_push() triggers
skb_under_panic():

skbuff: skb_under_panic: text:00000000ca46ad8a len:80 put:20 head:00000000cd28494e data:000000009366fd6b tail:0x3c end:0xec0 dev:veth0
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:108!
invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 6 PID: 587 Comm: ip Not tainted 5.4.0-rc6+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
RIP: 0010:skb_panic+0xbf/0xd0
Code: 41 a2 ff 8b 4b 70 4c 8b 4d d0 48 c7 c7 20 76 f5 8b 44 8b 45 bc 48 8b 55 c0 48 8b 75 c8 41 54 41 57 41 56 41 55 e8 75 dc 7a ff <0f> 0b 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
RSP: 0018:ffff888059ddf0b0 EFLAGS: 00010286
RAX: 0000000000000086 RBX: ffff888060a315c0 RCX: ffffffff8abe4822
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88806c9a79cc
RBP: ffff888059ddf118 R08: ffffed100d9361b1 R09: ffffed100d9361b0
R10: ffff88805c68aee3 R11: ffffed100d9361b1 R12: ffff88805d218000
R13: ffff88805c689fec R14: 000000000000003c R15: 0000000000000ec0
FS:  00007f6af184b700(0000) GS:ffff88806c980000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffc8204a000 CR3: 0000000057b40006 CR4: 0000000000360ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 skb_push+0x7e/0x80
 ipmr_get_route+0x459/0x6fa
 rt_fill_info+0x692/0x9f0
 inet_rtm_getroute+0xd26/0xf20
 rtnetlink_rcv_msg+0x45d/0x630
 netlink_rcv_skb+0x1a5/0x220
 rtnetlink_rcv+0x15/0x20
 netlink_unicast+0x305/0x3a0
 netlink_sendmsg+0x575/0x730
 sock_sendmsg+0xb5/0xc0
 ___sys_sendmsg+0x497/0x4f0
 __sys_sendmsg+0xcb/0x150
 __x64_sys_sendmsg+0x48/0x50
 do_syscall_64+0xd2/0xac0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Actually the original skb used to have enough headroom, but the
reserve_skb() call was lost with the introduction of
inet_rtm_getroute_build_skb() by commit 404eb77ea766 ("ipv4: support
sport, dport and ip_proto in RTM_GETROUTE").

We could reserve some headroom again in inet_rtm_getroute_build_skb(),
but this function shouldn't be responsible for handling the special
case of ipmr_get_route(). Let's handle that directly in
ipmr_get_route() by calling skb_realloc_headroom() instead of
skb_clone().

Fixes: 404eb77ea766 ("ipv4: support sport, dport and ip_proto in RTM_GETROUTE")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agodevlink: disallow reload operation during device cleanup
Jiri Pirko [Sat, 9 Nov 2019 10:29:46 +0000 (11:29 +0100)]
devlink: disallow reload operation during device cleanup

[ Upstream commit 5a508a254bed9a2e36a5fb96c9065532a6bf1e9c ]

There is a race between driver code that does setup/cleanup of device
and devlink reload operation that in some drivers works with the same
code. Use after free could we easily obtained by running:

while true; do
        echo "0000:00:10.0" >/sys/bus/pci/drivers/mlxsw_spectrum2/bind
        devlink dev reload pci/0000:00:10.0 &
        echo "0000:00:10.0" >/sys/bus/pci/drivers/mlxsw_spectrum2/unbind
done

Fix this by enabling reload only after setup of device is complete and
disabling it at the beginning of the cleanup process.

Reported-by: Ido Schimmel <idosch@mellanox.com>
Fixes: 2d8dc5bbf4e7 ("devlink: Add support for reload")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoax88172a: fix information leak on short answers
Oliver Neukum [Thu, 14 Nov 2019 10:16:01 +0000 (11:16 +0100)]
ax88172a: fix information leak on short answers

[ Upstream commit a9a51bd727d141a67b589f375fe69d0e54c4fe22 ]

If a malicious device gives a short MAC it can elicit up to
5 bytes of leaked memory out of the driver. We need to check for
ETH_ALEN instead.

Reported-by: syzbot+a8d4acdad35e6bbca308@syzkaller.appspotmail.com
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoscsi: core: Handle drivers which set sg_tablesize to zero
Michael Schmitz [Tue, 5 Nov 2019 02:49:10 +0000 (15:49 +1300)]
scsi: core: Handle drivers which set sg_tablesize to zero

commit 9393c8de628cf0968d81a17cc11841e42191e041 upstream.

In scsi_mq_setup_tags(), cmd_size is calculated based on zero size for the
scatter-gather list in case the low level driver uses SG_NONE in its host
template.

cmd_size is passed on to the block layer for calculation of the request
size, and we've seen NULL pointer dereference errors from the block layer
in drivers where SG_NONE is used and a mq IO scheduler is active,
apparently as a consequence of this (see commit 68ab2d76e4be ("scsi:
cxlflash: Set sg_tablesize to 1 instead of SG_NONE"), and a recent patch by
Finn Thain converting the three m68k NFR5380 drivers to avoid setting
SG_NONE).

Try to avoid these errors by accounting for at least one sg list entry when
calculating cmd_size, regardless of whether the low level driver set a zero
sg_tablesize.

Tested on 030 m68k with the atari_scsi driver - setting sg_tablesize to
SG_NONE no longer results in a crash when loading this driver.

CC: Finn Thain <fthain@telegraphics.com.au>
Link: https://lore.kernel.org/r/1572922150-4358-1-git-send-email-schmitzmic@gmail.com
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoLinux 5.3.11 v5.3.11
Greg Kroah-Hartman [Tue, 12 Nov 2019 18:28:31 +0000 (19:28 +0100)]
Linux 5.3.11

4 years agokvm: x86: mmu: Recovery of shattered NX large pages
Junaid Shahid [Thu, 31 Oct 2019 23:14:14 +0000 (00:14 +0100)]
kvm: x86: mmu: Recovery of shattered NX large pages

commit 1aa9b9572b10529c2e64e2b8f44025d86e124308 upstream.

The page table pages corresponding to broken down large pages are zapped in
FIFO order, so that the large page can potentially be recovered, if it is
not longer being used for execution.  This removes the performance penalty
for walking deeper EPT page tables.

By default, one large page will last about one hour once the guest
reaches a steady state.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agokvm: Add helper function for creating VM worker threads
Junaid Shahid [Thu, 31 Oct 2019 23:14:08 +0000 (00:14 +0100)]
kvm: Add helper function for creating VM worker threads

commit c57c80467f90e5504c8df9ad3555d2c78800bf94 upstream.

Add a function to create a kernel thread associated with a given VM. In
particular, it ensures that the worker thread inherits the priority and
cgroups of the calling thread.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agokvm: mmu: ITLB_MULTIHIT mitigation
Paolo Bonzini [Mon, 4 Nov 2019 11:22:02 +0000 (12:22 +0100)]
kvm: mmu: ITLB_MULTIHIT mitigation

commit b8e8c8303ff28c61046a4d0f6ea99aea609a7dc0 upstream.

With some Intel processors, putting the same virtual address in the TLB
as both a 4 KiB and 2 MiB page can confuse the instruction fetch unit
and cause the processor to issue a machine check resulting in a CPU lockup.

Unfortunately when EPT page tables use huge pages, it is possible for a
malicious guest to cause this situation.

Add a knob to mark huge pages as non-executable. When the nx_huge_pages
parameter is enabled (and we are using EPT), all huge pages are marked as
NX. If the guest attempts to execute in one of those pages, the page is
broken down into 4K pages, which are then marked executable.

This is not an issue for shadow paging (except nested EPT), because then
the host is in control of TLB flushes and the problematic situation cannot
happen.  With nested EPT, again the nested guest can cause problems shadow
and direct EPT is treated in the same way.

[ tglx: Fixup default to auto and massage wording a bit ]

Originally-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agokvm: x86, powerpc: do not allow clearing largepages debugfs entry
Paolo Bonzini [Fri, 11 Oct 2019 09:59:48 +0000 (11:59 +0200)]
kvm: x86, powerpc: do not allow clearing largepages debugfs entry

commit 833b45de69a6016c4b0cebe6765d526a31a81580 upstream.

The largepages debugfs entry is incremented/decremented as shadow
pages are created or destroyed.  Clearing it will result in an
underflow, which is harmless to KVM but ugly (and could be
misinterpreted by tools that use debugfs information), so make
this particular statistic read-only.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm-ppc@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoDocumentation: Add ITLB_MULTIHIT documentation
Gomez Iglesias, Antonio [Mon, 4 Nov 2019 11:22:03 +0000 (12:22 +0100)]
Documentation: Add ITLB_MULTIHIT documentation

commit 7f00cc8d4a51074eb0ad4c3f16c15757b1ddfb7d upstream.

Add the initial ITLB_MULTIHIT documentation.

[ tglx: Add it to the index so it gets actually built. ]

Signed-off-by: Antonio Gomez Iglesias <antonio.gomez.iglesias@intel.com>
Signed-off-by: Nelson D'Souza <nelson.dsouza@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agocpu/speculation: Uninline and export CPU mitigations helpers
Tyler Hicks [Mon, 4 Nov 2019 11:22:02 +0000 (12:22 +0100)]
cpu/speculation: Uninline and export CPU mitigations helpers

commit 731dc9df975a5da21237a18c3384f811a7a41cc6 upstream.

A kernel module may need to check the value of the "mitigations=" kernel
command line parameter as part of its setup when the module needs
to perform software mitigations for a CPU flaw.

Uninline and export the helper functions surrounding the cpu_mitigations
enum to allow for their usage from a module.

Lastly, privatize the enum and cpu_mitigations variable since the value of
cpu_mitigations can be checked with the exported helper functions.

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agox86/cpu: Add Tremont to the cpu vulnerability whitelist
Pawan Gupta [Mon, 4 Nov 2019 11:22:01 +0000 (12:22 +0100)]
x86/cpu: Add Tremont to the cpu vulnerability whitelist

commit cad14885a8d32c1c0d8eaa7bf5c0152a22b6080e upstream.

Add the new cpu family ATOM_TREMONT_D to the cpu vunerability
whitelist. ATOM_TREMONT_D is not affected by X86_BUG_ITLB_MULTIHIT.

ATOM_TREMONT_D might have mitigations against other issues as well, but
only the ITLB multihit mitigation is confirmed at this point.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agox86/bugs: Add ITLB_MULTIHIT bug infrastructure
Vineela Tummalapalli [Mon, 4 Nov 2019 11:22:01 +0000 (12:22 +0100)]
x86/bugs: Add ITLB_MULTIHIT bug infrastructure

commit db4d30fbb71b47e4ecb11c4efa5d8aad4b03dfae upstream.

Some processors may incur a machine check error possibly resulting in an
unrecoverable CPU lockup when an instruction fetch encounters a TLB
multi-hit in the instruction TLB. This can occur when the page size is
changed along with either the physical address or cache type. The relevant
erratum can be found here:

   https://bugzilla.kernel.org/show_bug.cgi?id=205195

There are other processors affected for which the erratum does not fully
disclose the impact.

This issue affects both bare-metal x86 page tables and EPT.

It can be mitigated by either eliminating the use of large pages or by
using careful TLB invalidations when changing the page size in the page
tables.

Just like Spectre, Meltdown, L1TF and MDS, a new bit has been allocated in
MSR_IA32_ARCH_CAPABILITIES (PSCHANGE_MC_NO) and will be set on CPUs which
are mitigated against this issue.

Signed-off-by: Vineela Tummalapalli <vineela.tummalapalli@intel.com>
Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agox86/speculation/taa: Fix printing of TAA_MSG_SMT on IBRS_ALL CPUs
Josh Poimboeuf [Thu, 7 Nov 2019 02:26:46 +0000 (20:26 -0600)]
x86/speculation/taa: Fix printing of TAA_MSG_SMT on IBRS_ALL CPUs

commit 012206a822a8b6ac09125bfaa210a95b9eb8f1c1 upstream.

For new IBRS_ALL CPUs, the Enhanced IBRS check at the beginning of
cpu_bugs_smt_update() causes the function to return early, unintentionally
skipping the MDS and TAA logic.

This is not a problem for MDS, because there appears to be no overlap
between IBRS_ALL and MDS-affected CPUs.  So the MDS mitigation would be
disabled and nothing would need to be done in this function anyway.

But for TAA, the TAA_MSG_SMT string will never get printed on Cascade
Lake and newer.

The check is superfluous anyway: when 'spectre_v2_enabled' is
SPECTRE_V2_IBRS_ENHANCED, 'spectre_v2_user' is always
SPECTRE_V2_USER_NONE, and so the 'spectre_v2_user' switch statement
handles it appropriately by doing nothing.  So just remove the check.

Fixes: 1b42f017415b ("x86/speculation/taa: Add mitigation for TSX Async Abort")
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agox86/tsx: Add config options to set tsx=on|off|auto
Michal Hocko [Wed, 23 Oct 2019 10:35:50 +0000 (12:35 +0200)]
x86/tsx: Add config options to set tsx=on|off|auto

commit db616173d787395787ecc93eef075fa975227b10 upstream.

There is a general consensus that TSX usage is not largely spread while
the history shows there is a non trivial space for side channel attacks
possible. Therefore the tsx is disabled by default even on platforms
that might have a safe implementation of TSX according to the current
knowledge. This is a fair trade off to make.

There are, however, workloads that really do benefit from using TSX and
updating to a newer kernel with TSX disabled might introduce a
noticeable regressions. This would be especially a problem for Linux
distributions which will provide TAA mitigations.

Introduce config options X86_INTEL_TSX_MODE_OFF, X86_INTEL_TSX_MODE_ON
and X86_INTEL_TSX_MODE_AUTO to control the TSX feature. The config
setting can be overridden by the tsx cmdline options.

 [ bp: Text cleanups from Josh. ]

Suggested-by: Borislav Petkov <bpetkov@suse.de>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agox86/speculation/taa: Add documentation for TSX Async Abort
Pawan Gupta [Wed, 23 Oct 2019 10:32:55 +0000 (12:32 +0200)]
x86/speculation/taa: Add documentation for TSX Async Abort

commit a7a248c593e4fd7a67c50b5f5318fe42a0db335e upstream.

Add the documenation for TSX Async Abort. Include the description of
the issue, how to check the mitigation state, control the mitigation,
guidance for system administrators.

 [ bp: Add proper SPDX tags, touch ups by Josh and me. ]

Co-developed-by: Antonio Gomez Iglesias <antonio.gomez.iglesias@intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Antonio Gomez Iglesias <antonio.gomez.iglesias@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mark Gross <mgross@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agox86/tsx: Add "auto" option to the tsx= cmdline parameter
Pawan Gupta [Wed, 23 Oct 2019 10:28:57 +0000 (12:28 +0200)]
x86/tsx: Add "auto" option to the tsx= cmdline parameter

commit 7531a3596e3272d1f6841e0d601a614555dc6b65 upstream.

Platforms which are not affected by X86_BUG_TAA may want the TSX feature
enabled. Add "auto" option to the TSX cmdline parameter. When tsx=auto
disable TSX when X86_BUG_TAA is present, otherwise enable TSX.

More details on X86_BUG_TAA can be found here:
https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html

 [ bp: Extend the arg buffer to accommodate "auto\0". ]

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agokvm/x86: Export MDS_NO=0 to guests when TSX is enabled
Pawan Gupta [Wed, 23 Oct 2019 10:23:33 +0000 (12:23 +0200)]
kvm/x86: Export MDS_NO=0 to guests when TSX is enabled

commit e1d38b63acd843cfdd4222bf19a26700fd5c699e upstream.

Export the IA32_ARCH_CAPABILITIES MSR bit MDS_NO=0 to guests on TSX
Async Abort(TAA) affected hosts that have TSX enabled and updated
microcode. This is required so that the guests don't complain,

  "Vulnerable: Clear CPU buffers attempted, no microcode"

when the host has the updated microcode to clear CPU buffers.

Microcode update also adds support for MSR_IA32_TSX_CTRL which is
enumerated by the ARCH_CAP_TSX_CTRL bit in IA32_ARCH_CAPABILITIES MSR.
Guests can't do this check themselves when the ARCH_CAP_TSX_CTRL bit is
not exported to the guests.

In this case export MDS_NO=0 to the guests. When guests have
CPUID.MD_CLEAR=1, they deploy MDS mitigation which also mitigates TAA.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agox86/speculation/taa: Add sysfs reporting for TSX Async Abort
Pawan Gupta [Wed, 23 Oct 2019 10:19:51 +0000 (12:19 +0200)]
x86/speculation/taa: Add sysfs reporting for TSX Async Abort

commit 6608b45ac5ecb56f9e171252229c39580cc85f0f upstream.

Add the sysfs reporting file for TSX Async Abort. It exposes the
vulnerability and the mitigation state similar to the existing files for
the other hardware vulnerabilities.

Sysfs file path is:
/sys/devices/system/cpu/vulnerabilities/tsx_async_abort

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Reviewed-by: Mark Gross <mgross@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agox86/speculation/taa: Add mitigation for TSX Async Abort
Pawan Gupta [Wed, 23 Oct 2019 09:30:45 +0000 (11:30 +0200)]
x86/speculation/taa: Add mitigation for TSX Async Abort

commit 1b42f017415b46c317e71d41c34ec088417a1883 upstream.

TSX Async Abort (TAA) is a side channel vulnerability to the internal
buffers in some Intel processors similar to Microachitectural Data
Sampling (MDS). In this case, certain loads may speculatively pass
invalid data to dependent operations when an asynchronous abort
condition is pending in a TSX transaction.

This includes loads with no fault or assist condition. Such loads may
speculatively expose stale data from the uarch data structures as in
MDS. Scope of exposure is within the same-thread and cross-thread. This
issue affects all current processors that support TSX, but do not have
ARCH_CAP_TAA_NO (bit 8) set in MSR_IA32_ARCH_CAPABILITIES.

On CPUs which have their IA32_ARCH_CAPABILITIES MSR bit MDS_NO=0,
CPUID.MD_CLEAR=1 and the MDS mitigation is clearing the CPU buffers
using VERW or L1D_FLUSH, there is no additional mitigation needed for
TAA. On affected CPUs with MDS_NO=1 this issue can be mitigated by
disabling the Transactional Synchronization Extensions (TSX) feature.

A new MSR IA32_TSX_CTRL in future and current processors after a
microcode update can be used to control the TSX feature. There are two
bits in that MSR:

* TSX_CTRL_RTM_DISABLE disables the TSX sub-feature Restricted
Transactional Memory (RTM).

* TSX_CTRL_CPUID_CLEAR clears the RTM enumeration in CPUID. The other
TSX sub-feature, Hardware Lock Elision (HLE), is unconditionally
disabled with updated microcode but still enumerated as present by
CPUID(EAX=7).EBX{bit4}.

The second mitigation approach is similar to MDS which is clearing the
affected CPU buffers on return to user space and when entering a guest.
Relevant microcode update is required for the mitigation to work.  More
details on this approach can be found here:

  https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html

The TSX feature can be controlled by the "tsx" command line parameter.
If it is force-enabled then "Clear CPU buffers" (MDS mitigation) is
deployed. The effective mitigation state can be read from sysfs.

 [ bp:
   - massage + comments cleanup
   - s/TAA_MITIGATION_TSX_DISABLE/TAA_MITIGATION_TSX_DISABLED/g - Josh.
   - remove partial TAA mitigation in update_mds_branch_idle() - Josh.
   - s/tsx_async_abort_cmdline/tsx_async_abort_parse_cmdline/g
 ]

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agox86/cpu: Add a "tsx=" cmdline option with TSX disabled by default
Pawan Gupta [Wed, 23 Oct 2019 09:01:53 +0000 (11:01 +0200)]
x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default

commit 95c5824f75f3ba4c9e8e5a4b1a623c95390ac266 upstream.

Add a kernel cmdline parameter "tsx" to control the Transactional
Synchronization Extensions (TSX) feature. On CPUs that support TSX
control, use "tsx=on|off" to enable or disable TSX. Not specifying this
option is equivalent to "tsx=off". This is because on certain processors
TSX may be used as a part of a speculative side channel attack.

Carve out the TSX controlling functionality into a separate compilation
unit because TSX is a CPU feature while the TSX async abort control
machinery will go to cpu/bugs.c.

 [ bp: - Massage, shorten and clear the arg buffer.
       - Clarifications of the tsx= possible options - Josh.
       - Expand on TSX_CTRL availability - Pawan. ]

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agox86/cpu: Add a helper function x86_read_arch_cap_msr()
Pawan Gupta [Wed, 23 Oct 2019 08:52:35 +0000 (10:52 +0200)]
x86/cpu: Add a helper function x86_read_arch_cap_msr()

commit 286836a70433fb64131d2590f4bf512097c255e1 upstream.

Add a helper function to read the IA32_ARCH_CAPABILITIES MSR.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Reviewed-by: Mark Gross <mgross@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agox86/msr: Add the IA32_TSX_CTRL MSR
Pawan Gupta [Wed, 23 Oct 2019 08:45:50 +0000 (10:45 +0200)]
x86/msr: Add the IA32_TSX_CTRL MSR

commit c2955f270a84762343000f103e0640d29c7a96f3 upstream.

Transactional Synchronization Extensions (TSX) may be used on certain
processors as part of a speculative side channel attack.  A microcode
update for existing processors that are vulnerable to this attack will
add a new MSR - IA32_TSX_CTRL to allow the system administrator the
option to disable TSX as one of the possible mitigations.

The CPUs which get this new MSR after a microcode upgrade are the ones
which do not set MSR_IA32_ARCH_CAPABILITIES.MDS_NO (bit 5) because those
CPUs have CPUID.MD_CLEAR, i.e., the VERW implementation which clears all
CPU buffers takes care of the TAA case as well.

  [ Note that future processors that are not vulnerable will also
    support the IA32_TSX_CTRL MSR. ]

Add defines for the new IA32_TSX_CTRL MSR and its bits.

TSX has two sub-features:

1. Restricted Transactional Memory (RTM) is an explicitly-used feature
   where new instructions begin and end TSX transactions.
2. Hardware Lock Elision (HLE) is implicitly used when certain kinds of
   "old" style locks are used by software.

Bit 7 of the IA32_ARCH_CAPABILITIES indicates the presence of the
IA32_TSX_CTRL MSR.

There are two control bits in IA32_TSX_CTRL MSR:

  Bit 0: When set, it disables the Restricted Transactional Memory (RTM)
         sub-feature of TSX (will force all transactions to abort on the
 XBEGIN instruction).

  Bit 1: When set, it disables the enumeration of the RTM and HLE feature
         (i.e. it will make CPUID(EAX=7).EBX{bit4} and
  CPUID(EAX=7).EBX{bit11} read as 0).

The other TSX sub-feature, Hardware Lock Elision (HLE), is
unconditionally disabled by the new microcode but still enumerated
as present by CPUID(EAX=7).EBX{bit4}, unless disabled by
IA32_TSX_CTRL_MSR[1] - TSX_CTRL_CPUID_CLEAR.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Reviewed-by: Mark Gross <mgross@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agodrm/i915/cmdparser: Fix jump whitelist clearing
Ben Hutchings [Mon, 11 Nov 2019 16:13:24 +0000 (08:13 -0800)]
drm/i915/cmdparser: Fix jump whitelist clearing

commit ea0b163b13ffc52818c079adb00d55e227a6da6f upstream.

When a jump_whitelist bitmap is reused, it needs to be cleared.
Currently this is done with memset() and the size calculation assumes
bitmaps are made of 32-bit words, not longs.  So on 64-bit
architectures, only the first half of the bitmap is cleared.

If some whitelist bits are carried over between successive batches
submitted on the same context, this will presumably allow embedding
the rogue instructions that we're trying to reject.

Use bitmap_zero() instead, which gets the calculation right.

Fixes: f8c08d8faee5 ("drm/i915/cmdparser: Add support for backward jumps")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agodrm/i915/gen8+: Add RC6 CTX corruption WA
Imre Deak [Mon, 9 Jul 2018 15:24:27 +0000 (18:24 +0300)]
drm/i915/gen8+: Add RC6 CTX corruption WA

commit 7e34f4e4aad3fd34c02b294a3cf2321adf5b4438 upstream.

In some circumstances the RC6 context can get corrupted. We can detect
this and take the required action, that is disable RC6 and runtime PM.
The HW recovers from the corrupted state after a system suspend/resume
cycle, so detect the recovery and re-enable RC6 and runtime PM.

v2: rebase (Mika)
v3:
- Move intel_suspend_gt_powersave() to the end of the GEM suspend
  sequence.
- Add commit message.
v4:
- Rebased on intel_uncore_forcewake_put(i915->uncore, ...) API
  change.
v5: rebased on gem/gt split (Mika)

Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agodrm/i915: Lower RM timeout to avoid DSI hard hangs
Uma Shankar [Tue, 7 Aug 2018 15:45:35 +0000 (21:15 +0530)]
drm/i915: Lower RM timeout to avoid DSI hard hangs

commit 1d85a299c4db57c55e0229615132c964d17aa765 upstream.

In BXT/APL, device 2 MMIO reads from MIPI controller requires its PLL
to be turned ON. When MIPI PLL is turned off (MIPI Display is not
active or connected), and someone (host or GT engine) tries to read
MIPI registers, it causes hard hang. This is a hardware restriction
or limitation.

Driver by itself doesn't read MIPI registers when MIPI display is off.
But any userspace application can submit unprivileged batch buffer for
execution. In that batch buffer there can be mmio reads. And these
reads are allowed even for unprivileged applications. If these
register reads are for MIPI DSI controller and MIPI display is not
active during that time, then the MMIO read operation causes system
hard hang and only way to recover is hard reboot. A genuine
process/application won't submit batch buffer like this and doesn't
cause any issue. But on a compromised system, a malign userspace
process/app can generate such batch buffer and can trigger system
hard hang (denial of service attack).

The fix is to lower the internal MMIO timeout value to an optimum
value of 950us as recommended by hardware team. If the timeout is
beyond 1ms (which will hit for any value we choose if MMIO READ on a
DSI specific register is performed without PLL ON), it causes the
system hang. But if the timeout value is lower than it will be below
the threshold (even if timeout happens) and system will not get into
a hung state. This will avoid a system hang without losing any
programming or GT interrupts, taking the worst case of lowest CDCLK
frequency and early DC5 abort into account.

Signed-off-by: Uma Shankar <uma.shankar@intel.com>
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agodrm/i915/cmdparser: Ignore Length operands during command matching
Jon Bloomfield [Thu, 20 Sep 2018 16:45:10 +0000 (09:45 -0700)]
drm/i915/cmdparser: Ignore Length operands during command matching

commit 926abff21a8f29ef159a3ac893b05c6e50e043c3 upstream.

Some of the gen instruction macros (e.g. MI_DISPLAY_FLIP) have the
length directly encoded in them. Since these are used directly in
the tables, the Length becomes part of the comparison used for
matching during parsing. Thus, if the cmd being parsed has a
different length to that in the table, it is not matched and the
cmd is accepted via the default variable length path.

Fix by masking out everything except the Opcode in the cmd tables

Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agodrm/i915/cmdparser: Add support for backward jumps
Jon Bloomfield [Thu, 20 Sep 2018 16:58:36 +0000 (09:58 -0700)]
drm/i915/cmdparser: Add support for backward jumps

commit f8c08d8faee5567803c8c533865296ca30286bbf upstream.

To keep things manageable, the pre-gen9 cmdparser does not
attempt to track any form of nested BB_START's. This did not
prevent usermode from using nested starts, or even chained
batches because the cmdparser is not strictly enforced pre gen9.

Instead, the existence of a nested BB_START would cause the batch
to be emitted in insecure mode, and any privileged capabilities
would not be available.

For Gen9, the cmdparser becomes mandatory (for BCS at least), and
so not providing any form of nested BB_START support becomes
overly restrictive. Any such batch will simply not run.

We make heavy use of backward jumps in igt, and it is much easier
to add support for this restricted subset of nested jumps, than to
rewrite the whole of our test suite to avoid them.

Add the required logic to support limited backward jumps, to
instructions that have already been validated by the parser.

Note that it's not sufficient to simply approve any BB_START
that jumps backwards in the buffer because this would allow an
attacker to embed a rogue instruction sequence within the
operand words of a harmless instruction (say LRI) and jump to
that.

We introduce a bit array to track every instr offset successfully
validated, and test the target of BB_START against this. If the
target offset hits, it is re-written to the same offset in the
shadow buffer and the BB_START cmd is allowed.

Note: This patch deliberately ignores checkpatch issues in the
cmdtables, in order to match the style of the surrounding code.
We'll correct the entire file in one go in a later patch.

v2: set dispatch secure late (Mika)
v3: rebase (Mika)
v4: Clear whitelist on each parse
Minor review updates (Chris)
v5: Correct backward jump batching
v6: fix compilation error due to struct eb shuffle (Mika)

Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agodrm/i915/cmdparser: Use explicit goto for error paths
Jon Bloomfield [Thu, 27 Sep 2018 17:23:17 +0000 (10:23 -0700)]
drm/i915/cmdparser: Use explicit goto for error paths

commit 0546a29cd884fb8184731c79ab008927ca8859d0 upstream.

In the next patch we will be adding a second valid
termination condition which will require a small
amount of refactoring to share logic with the BB_END
case.

Refactor all error conditions to jump to a dedicated
exit path, with 'break' reserved only for a successful
parse.

Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agodrm/i915: Add gen9 BCS cmdparsing
Jon Bloomfield [Mon, 23 Apr 2018 18:12:15 +0000 (11:12 -0700)]
drm/i915: Add gen9 BCS cmdparsing

commit 0f2f39758341df70202ae1c42d5a1e4ee392b6d3 upstream.

For gen9 we enable cmdparsing on the BCS ring, specifically
to catch inadvertent accesses to sensitive registers

Unlike gen7/hsw, we use the parser only to block certain
registers. We can rely on h/w to block restricted commands,
so the command tables only provide enough info to allow the
parser to delineate each command, and identify commands that
access registers.

Note: This patch deliberately ignores checkpatch issues in
favour of matching the style of the surrounding code. We'll
correct the entire file in one go in a later patch.

v3: rebase (Mika)
v4: Add RING_TIMESTAMP registers to whitelist (Jon)

Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agodrm/i915: Allow parsing of unsized batches
Jon Bloomfield [Wed, 1 Aug 2018 16:45:50 +0000 (09:45 -0700)]
drm/i915: Allow parsing of unsized batches

commit 435e8fc059dbe0eec823a75c22da2972390ba9e0 upstream.

In "drm/i915: Add support for mandatory cmdparsing" we introduced the
concept of mandatory parsing. This allows the cmdparser to be invoked
even when user passes batch_len=0 to the execbuf ioctl's.

However, the cmdparser needs to know the extents of the buffer being
scanned. Refactor the code to ensure the cmdparser uses the actual
object size, instead of the incoming length, if user passes 0.

Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tyler Hicks <tyhicks@canonical.com>
Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agodrm/i915: Support ro ppgtt mapped cmdparser shadow buffers
Jon Bloomfield [Tue, 22 May 2018 20:59:06 +0000 (13:59 -0700)]
drm/i915: Support ro ppgtt mapped cmdparser shadow buffers

commit 4f7af1948abcb18b4772fe1bcd84d7d27d96258c upstream.

For Gen7, the original cmdparser motive was to permit limited
use of register read/write instructions in unprivileged BB's.
This worked by copying the user supplied bb to a kmd owned
bb, and running it in secure mode, from the ggtt, only if
the scanner finds no unsafe commands or registers.

For Gen8+ we can't use this same technique because running bb's
from the ggtt also disables access to ppgtt space. But we also
do not actually require 'secure' execution since we are only
trying to reduce the available command/register set. Instead we
will copy the user buffer to a kmd owned read-only bb in ppgtt,
and run in the usual non-secure mode.

Note that ro pages are only supported by ppgtt (not ggtt), but
luckily that's exactly what we need.

Add the required paths to map the shadow buffer to ppgtt ro for Gen8+

v2: IS_GEN7/IS_GEN (Mika)
v3: rebase
v4: rebase
v5: rebase

Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agodrm/i915: Add support for mandatory cmdparsing
Jon Bloomfield [Wed, 1 Aug 2018 16:33:59 +0000 (09:33 -0700)]
drm/i915: Add support for mandatory cmdparsing

commit 311a50e76a33d1e029563c24b2ff6db0c02b5afe upstream.

The existing cmdparser for gen7 can be bypassed by specifying
batch_len=0 in the execbuf call. This is safe because bypassing
simply reduces the cmd-set available.

In a later patch we will introduce cmdparsing for gen9, as a
security measure, which must be strictly enforced since without
it we are vulnerable to DoS attacks.

Introduce the concept of 'required' cmd parsing that cannot be
bypassed by submitting zero-length bb's.

v2: rebase (Mika)
v2: rebase (Mika)
v3: fix conflict on engine flags (Mika)

Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agodrm/i915: Remove Master tables from cmdparser
Jon Bloomfield [Fri, 8 Jun 2018 17:05:26 +0000 (10:05 -0700)]
drm/i915: Remove Master tables from cmdparser

commit 66d8aba1cd6db34af10de465c0d52af679288cb6 upstream.

The previous patch has killed support for secure batches
on gen6+, and hence the cmdparsers master tables are
now dead code. Remove them.

Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tyler Hicks <tyhicks@canonical.com>
Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agodrm/i915: Disable Secure Batches for gen6+
Jon Bloomfield [Fri, 8 Jun 2018 15:53:46 +0000 (08:53 -0700)]
drm/i915: Disable Secure Batches for gen6+

commit 44157641d448cbc0c4b73c5231d2b911f0cb0427 upstream.

Retroactively stop reporting support for secure batches
through the api for gen6+ so that older binaries trigger
the fallback path instead.

Older binaries use secure batches pre gen6 to access resources
that are not available to normal usermode processes. However,
all known userspace explicitly checks for HAS_SECURE_BATCHES
before relying on the secure batch feature.

Since there are no known binaries relying on this for newer gens
we can kill secure batches from gen6, via I915_PARAM_HAS_SECURE_BATCHES.

v2: rebase (Mika)
v3: rebase (Mika)

Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agodrm/i915: Rename gen7 cmdparser tables
Jon Bloomfield [Fri, 20 Apr 2018 21:26:01 +0000 (14:26 -0700)]
drm/i915: Rename gen7 cmdparser tables

commit 0a2f661b6c21815a7fa60e30babe975fee8e73c6 upstream.

We're about to introduce some new tables for later gens, and the
current naming for the gen7 tables will no longer make sense.

v2: rebase

Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agonet/ibmvnic: unlock rtnl_lock in reset so linkwatch_event can run
Juliet Kim [Fri, 20 Sep 2019 20:11:22 +0000 (16:11 -0400)]
net/ibmvnic: unlock rtnl_lock in reset so linkwatch_event can run

[ Upstream commit b27507bb59ed504d7fa4d6a35f25a8cc39903b54 ]

Commit a5681e20b541 ("net/ibmnvic: Fix deadlock problem in reset")
made the change to hold the RTNL lock during a reset to avoid deadlock
but linkwatch_event is fired during the reset and needs the RTNL lock.
That keeps linkwatch_event process from proceeding until the reset
is complete. The reset process cannot tolerate the linkwatch_event
processing after reset completes, so release the RTNL lock during the
process to allow a chance for linkwatch_event to run during reset.
This does not guarantee that the linkwatch_event will be processed as
soon as link state changes, but is an improvement over the current code
where linkwatch_event processing is always delayed, which prevents
transmissions on the device from being deactivated leading transmit
watchdog timer to time-out.

Release the RTNL lock before link state change and re-acquire after
the link state change to allow linkwatch_event to grab the RTNL lock
and run during the reset.

Fixes: a5681e20b541 ("net/ibmnvic: Fix deadlock problem in reset")
Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoarm64: errata: Update stale comment
Thierry Reding [Mon, 23 Sep 2019 09:12:29 +0000 (11:12 +0200)]
arm64: errata: Update stale comment

[ Upstream commit 7a292b6c7c9c35afee01ce3b2248f705869d0ff1 ]

Commit 73f381660959 ("arm64: Advertise mitigation of Spectre-v2, or lack
thereof") renamed the caller of the install_bp_hardening_cb() function
but forgot to update a comment, which can be confusing when trying to
follow the code flow.

Fixes: 73f381660959 ("arm64: Advertise mitigation of Spectre-v2, or lack thereof")
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agonetfilter: ipset: Copy the right MAC address in hash:ip,mac IPv6 sets
Stefano Brivio [Thu, 10 Oct 2019 17:18:14 +0000 (19:18 +0200)]
netfilter: ipset: Copy the right MAC address in hash:ip,mac IPv6 sets

[ Upstream commit 97664bc2c77e2b65cdedddcae2643fc93291d958 ]

Same as commit 1b4a75108d5b ("netfilter: ipset: Copy the right MAC
address in bitmap:ip,mac and hash:ip,mac sets"), another copy and paste
went wrong in commit 8cc4ccf58379 ("netfilter: ipset: Allow matching on
destination MAC address for mac and ipmac sets").

When I fixed this for IPv4 in 1b4a75108d5b, I didn't realise that
hash:ip,mac sets also support IPv6 as family, and this is covered by a
separate function, hash_ipmac6_kadt().

In hash:ip,mac sets, the first dimension is the IP address, and the
second dimension is the MAC address: check the IPSET_DIM_TWO_SRC flag
in flags while deciding which MAC address to copy, destination or
source.

This way, mixing source and destination matches for the two dimensions
of ip,mac hash type works as expected, also for IPv6. With this setup:

  ip netns add A
  ip link add veth1 type veth peer name veth2 netns A
  ip addr add 2001:db8::1/64 dev veth1
  ip -net A addr add 2001:db8::2/64 dev veth2
  ip link set veth1 up
  ip -net A link set veth2 up

  dst=$(ip netns exec A cat /sys/class/net/veth2/address)

  ip netns exec A ipset create test_hash hash:ip,mac family inet6
  ip netns exec A ipset add test_hash 2001:db8::1,${dst}
  ip netns exec A ip6tables -A INPUT -p icmpv6 --icmpv6-type 135 -j ACCEPT
  ip netns exec A ip6tables -A INPUT -m set ! --match-set test_hash src,dst -j DROP

ipset now correctly matches a test packet:

  # ping -c1 2001:db8::2 >/dev/null
  # echo $?
  0

Reported-by: Chen, Yi <yiche@redhat.com>
Fixes: 8cc4ccf58379 ("netfilter: ipset: Allow matching on destination MAC address for mac and ipmac sets")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agobonding: fix using uninitialized mode_lock
Taehee Yoo [Tue, 29 Oct 2019 09:12:32 +0000 (09:12 +0000)]
bonding: fix using uninitialized mode_lock

[ Upstream commit ad9bd8daf2f9938572b0604e1280fefa8f338581 ]

When a bonding interface is being created, it setups its mode and options.
At that moment, it uses mode_lock so mode_lock should be initialized
before that moment.

rtnl_newlink()
rtnl_create_link()
alloc_netdev_mqs()
->setup() //bond_setup()
->newlink //bond_newlink
bond_changelink()
register_netdevice()
->ndo_init() //bond_init()

After commit 089bca2caed0 ("bonding: use dynamic lockdep key instead of
subclass"), mode_lock is initialized in bond_init().
So in the bond_changelink(), un-initialized mode_lock can be used.
mode_lock should be initialized in bond_setup().
This patch partially reverts commit 089bca2caed0 ("bonding: use dynamic
lockdep key instead of subclass")

Test command:
    ip link add bond0 type bond mode 802.3ad lacp_rate 0

Splat looks like:
[   60.615127] INFO: trying to register non-static key.
[   60.615900] the code is fine but needs lockdep annotation.
[   60.616697] turning off the locking correctness validator.
[   60.617490] CPU: 1 PID: 957 Comm: ip Not tainted 5.4.0-rc3+ #109
[   60.618350] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   60.619481] Call Trace:
[   60.619918]  dump_stack+0x7c/0xbb
[   60.620453]  register_lock_class+0x1215/0x14d0
[   60.621131]  ? alloc_netdev_mqs+0x7b3/0xcc0
[   60.621771]  ? is_bpf_text_address+0x86/0xf0
[   60.622416]  ? is_dynamic_key+0x230/0x230
[   60.623032]  ? unwind_get_return_address+0x5f/0xa0
[   60.623757]  ? create_prof_cpu_mask+0x20/0x20
[   60.624408]  ? arch_stack_walk+0x83/0xb0
[   60.625023]  __lock_acquire+0xd8/0x3de0
[   60.625616]  ? stack_trace_save+0x82/0xb0
[   60.626225]  ? stack_trace_consume_entry+0x160/0x160
[   60.626957]  ? deactivate_slab.isra.80+0x2c5/0x800
[   60.627668]  ? register_lock_class+0x14d0/0x14d0
[   60.628380]  ? alloc_netdev_mqs+0x7b3/0xcc0
[   60.629020]  ? save_stack+0x69/0x80
[   60.629574]  ? save_stack+0x19/0x80
[   60.630121]  ? __kasan_kmalloc.constprop.4+0xa0/0xd0
[   60.630859]  ? __kmalloc_node+0x16f/0x480
[   60.631472]  ? alloc_netdev_mqs+0x7b3/0xcc0
[   60.632121]  ? rtnl_create_link+0x2ed/0xad0
[   60.634388]  ? __rtnl_newlink+0xad4/0x11b0
[   60.635024]  lock_acquire+0x164/0x3b0
[   60.635608]  ? bond_3ad_update_lacp_rate+0x91/0x200 [bonding]
[   60.636463]  _raw_spin_lock_bh+0x38/0x70
[   60.637084]  ? bond_3ad_update_lacp_rate+0x91/0x200 [bonding]
[   60.637930]  bond_3ad_update_lacp_rate+0x91/0x200 [bonding]
[   60.638753]  ? bond_3ad_lacpdu_recv+0xb30/0xb30 [bonding]
[   60.639552]  ? bond_opt_get_val+0x180/0x180 [bonding]
[   60.640307]  ? ___slab_alloc+0x5aa/0x610
[   60.640925]  bond_option_lacp_rate_set+0x71/0x140 [bonding]
[   60.641751]  __bond_opt_set+0x1ff/0xbb0 [bonding]
[   60.643217]  ? kasan_unpoison_shadow+0x30/0x40
[   60.643924]  bond_changelink+0x9a4/0x1700 [bonding]
[   60.644653]  ? memset+0x1f/0x40
[   60.742941]  ? bond_slave_changelink+0x1a0/0x1a0 [bonding]
[   60.752694]  ? alloc_netdev_mqs+0x8ea/0xcc0
[   60.753330]  ? rtnl_create_link+0x2ed/0xad0
[   60.753964]  bond_newlink+0x1e/0x60 [bonding]
[   60.754612]  __rtnl_newlink+0xb9f/0x11b0
[ ... ]

Reported-by: syzbot+8da67f407bcba2c72e6e@syzkaller.appspotmail.com
Reported-by: syzbot+0d083911ab18b710da71@syzkaller.appspotmail.com
Fixes: 089bca2caed0 ("bonding: use dynamic lockdep key instead of subclass")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agousbip: Fix free of unallocated memory in vhci tx
Suwan Kim [Tue, 22 Oct 2019 09:30:17 +0000 (18:30 +0900)]
usbip: Fix free of unallocated memory in vhci tx

[ Upstream commit d4d8257754c3300ea2a465dadf8d2b02c713c920 ]

iso_buffer should be set to NULL after use and free in the while loop.
In the case of isochronous URB in the while loop, iso_buffer is
allocated and after sending it to server, buffer is deallocated. And
then, if the next URB in the while loop is not a isochronous pipe,
iso_buffer still holds the previously deallocated buffer address and
kfree tries to free wrong buffer address.

Fixes: ea44d190764b ("usbip: Implement SG support to vhci-hcd and stub driver")
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Suwan Kim <suwan.kim027@gmail.com>
Reviewed-by: Julia Lawall <julia.lawall@lip6.fr>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20191022093017.8027-1-suwan.kim027@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoASoC: SOF: Intel: hda-stream: fix the CONFIG_ prefix missing
Keyon Jie [Fri, 25 Oct 2019 22:15:38 +0000 (17:15 -0500)]
ASoC: SOF: Intel: hda-stream: fix the CONFIG_ prefix missing

[ Upstream commit f792bd173a6fd51d1a4dde04263085ce67486aa3 ]

We are missing the 'CONFIG_' prefix when using the kernel configure item
SND_SOC_SOF_HDA_ALWAYS_ENABLE_DMI_L1, here correct them.

Fixes: 43b2ab9009b13b ('ASoC: SOF: Intel: hda: Disable DMI L1 entry during capture')
Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20191025221538.6668-1-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoARM: dts: stm32: change joystick pinctrl definition on stm32mp157c-ev1
Amelie Delaunay [Mon, 4 Nov 2019 10:55:29 +0000 (11:55 +0100)]
ARM: dts: stm32: change joystick pinctrl definition on stm32mp157c-ev1

[ Upstream commit f4d6e0f79bcde7810890563bac8e0f3479fe6d03 ]

Pins used for joystick are all configured as input. "push-pull" is not a
valid setting for an input pin.

Fixes: a502b343ebd0 ("pinctrl: stmfx: update pinconf settings")
Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: Amelie Delaunay <amelie.delaunay@st.com>
Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agocgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead
Tejun Heo [Fri, 8 Nov 2019 20:18:29 +0000 (12:18 -0800)]
cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead

commit 65de03e251382306a4575b1779c57c87889eee49 upstream.

cgroup writeback tries to refresh the associated wb immediately if the
current wb is dead.  This is to avoid keeping issuing IOs on the stale
wb after memcg - blkcg association has changed (ie. when blkcg got
disabled / enabled higher up in the hierarchy).

Unfortunately, the logic gets triggered spuriously on inodes which are
associated with dead cgroups.  When the logic is triggered on dead
cgroups, the attempt fails only after doing quite a bit of work
allocating and initializing a new wb.

While c3aab9a0bd91 ("mm/filemap.c: don't initiate writeback if mapping
has no dirty pages") alleviated the issue significantly as it now only
triggers when the inode has dirty pages.  However, the condition can
still be triggered before the inode is switched to a different cgroup
and the logic simply doesn't make sense.

Skip the immediate switching if the associated memcg is dying.

This is a simplified version of the following two patches:

 * https://lore.kernel.org/linux-mm/20190513183053.GA73423@dennisz-mbp/
 * http://lkml.kernel.org/r/156355839560.2063.5265687291430814589.stgit@buzz

Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: e8a7abf5a5bd ("writeback: disassociate inodes from dying bdi_writebacks")
Acked-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agomm/filemap.c: don't initiate writeback if mapping has no dirty pages
Konstantin Khlebnikov [Mon, 23 Sep 2019 22:34:45 +0000 (15:34 -0700)]
mm/filemap.c: don't initiate writeback if mapping has no dirty pages

commit c3aab9a0bd91b696a852169479b7db1ece6cbf8c upstream.

Functions like filemap_write_and_wait_range() should do nothing if inode
has no dirty pages or pages currently under writeback.  But they anyway
construct struct writeback_control and this does some atomic operations if
CONFIG_CGROUP_WRITEBACK=y - on fast path it locks inode->i_lock and
updates state of writeback ownership, on slow path might be more work.
Current this path is safely avoided only when inode mapping has no pages.

For example generic_file_read_iter() calls filemap_write_and_wait_range()
at each O_DIRECT read - pretty hot path.

This patch skips starting new writeback if mapping has no dirty tags set.
If writeback is already in progress filemap_write_and_wait_range() will
wait for it.

Link: http://lkml.kernel.org/r/156378816804.1087.8607636317907921438.stgit@buzz
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agotimekeeping/vsyscall: Update VDSO data unconditionally
Huacai Chen [Thu, 24 Oct 2019 03:28:29 +0000 (11:28 +0800)]
timekeeping/vsyscall: Update VDSO data unconditionally

[ Upstream commit 52338415cf4d4064ae6b8dd972dadbda841da4fa ]

The update of the VDSO data is depending on __arch_use_vsyscall() returning
True. This is a leftover from the attempt to map the features of various
architectures 1:1 into generic code.

The usage of __arch_use_vsyscall() in the actual vsyscall implementations
got dropped and replaced by the requirement for the architecture code to
return U64_MAX if the global clocksource is not usable in the VDSO.

But the __arch_use_vsyscall() check in the update code stayed which causes
the VDSO data to be stale or invalid when an architecture actually
implements that function and returns False when the current clocksource is
not usable in the VDSO.

As a consequence the VDSO implementations of clock_getres(), time(),
clock_gettime(CLOCK_.*_COARSE) operate on invalid data and return bogus
information.

Remove the __arch_use_vsyscall() check from the VDSO update function and
update the VDSO data unconditionally.

[ tglx: Massaged changelog and removed the now useless implementations in
   asm-generic/ARM64/MIPS ]

Fixes: 44f57d788e7deecb50 ("timekeeping: Provide a generic update_vsyscall() implementation")
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1571887709-11447-1-git-send-email-chenhc@lemote.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoclk: imx8m: Use SYS_PLL1_800M as intermediate parent of CLK_ARM
Leonard Crestez [Tue, 22 Oct 2019 19:21:28 +0000 (22:21 +0300)]
clk: imx8m: Use SYS_PLL1_800M as intermediate parent of CLK_ARM

[ Upstream commit b234fe9558615098d8d62516e7041ad7f99ebcea ]

During cpu frequency switching the main "CLK_ARM" is reparented to an
intermediate "step" clock. On imx8mm and imx8mn the 24M oscillator is
used for this purpose but it is extremely slow, increasing wakeup
latencies to the point that i2c transactions can timeout and system
becomes unresponsive.

Fix by switching the "step" clk to SYS_PLL1_800M, matching the behavior
of imx8m cpufreq drivers in imx vendor tree.

This bug was not immediately apparent because upstream arm64 defconfig
uses the "performance" governor by default so no cpufreq transitions
happen.

Fixes: ba5625c3e272 ("clk: imx: Add clock driver support for imx8mm")
Fixes: 96d6392b54db ("clk: imx: Add support for i.MX8MN clock driver")
Cc: stable@vger.kernel.org
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Link: https://lkml.kernel.org/r/f5d2b9c53f1ed5ccb1dd3c6624f56759d92e1689.1571771777.git.leonard.crestez@nxp.com
Acked-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>