]> git.itanic.dy.fi Git - linux-stable/log
linux-stable
6 years agoLinux 3.16.49 v3.16.49
Ben Hutchings [Thu, 12 Oct 2017 14:28:25 +0000 (15:28 +0100)]
Linux 3.16.49

6 years agodm: flush queued bios when process blocks to avoid deadlock
Mikulas Patocka [Wed, 15 Feb 2017 16:26:10 +0000 (11:26 -0500)]
dm: flush queued bios when process blocks to avoid deadlock

commit d67a5f4b5947aba4bfe9a80a2b86079c215ca755 upstream.

Commit df2cb6daa4 ("block: Avoid deadlocks with bio allocation by
stacking drivers") created a workqueue for every bio set and code
in bio_alloc_bioset() that tries to resolve some low-memory deadlocks
by redirecting bios queued on current->bio_list to the workqueue if the
system is low on memory.  However other deadlocks (see below **) may
happen, without any low memory condition, because generic_make_request
is queuing bios to current->bio_list (rather than submitting them).

** the related dm-snapshot deadlock is detailed here:
https://www.redhat.com/archives/dm-devel/2016-July/msg00065.html

Fix this deadlock by redirecting any bios on current->bio_list to the
bio_set's rescue workqueue on every schedule() call.  Consequently,
when the process blocks on a mutex, the bios queued on
current->bio_list are dispatched to independent workqueus and they can
complete without waiting for the mutex to be available.

The structure blk_plug contains an entry cb_list and this list can contain
arbitrary callback functions that are called when the process blocks.
To implement this fix DM (ab)uses the onstack plug's cb_list interface
to get its flush_current_bio_list() called at schedule() time.

This fixes the snapshot deadlock - if the map method blocks,
flush_current_bio_list() will be called and it redirects bios waiting
on current->bio_list to appropriate workqueues.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1267650
Depends-on: df2cb6daa4 ("block: Avoid deadlocks with bio allocation by stacking drivers")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agocpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flags
Zefan Li [Thu, 25 Sep 2014 01:41:02 +0000 (09:41 +0800)]
cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flags

commit 2ad654bc5e2b211e92f66da1d819e47d79a866f0 upstream.

When we change cpuset.memory_spread_{page,slab}, cpuset will flip
PF_SPREAD_{PAGE,SLAB} bit of tsk->flags for each task in that cpuset.
This should be done using atomic bitops, but currently we don't,
which is broken.

Tetsuo reported a hard-to-reproduce kernel crash on RHEL6, which happened
when one thread tried to clear PF_USED_MATH while at the same time another
thread tried to flip PF_SPREAD_PAGE/PF_SPREAD_SLAB. They both operate on
the same task.

Here's the full report:
https://lkml.org/lkml/2014/9/19/230

To fix this, we make PF_SPREAD_PAGE and PF_SPREAD_SLAB atomic flags.

v4:
- updated mm/slab.c. (Fengguang Wu)
- updated Documentation.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: Kees Cook <keescook@chromium.org>
Fixes: 950592f7b991 ("cpusets: update tasks' page/slab spread flags in time")
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agosched: add macros to define bitops for task atomic flags
Zefan Li [Thu, 25 Sep 2014 01:40:40 +0000 (09:40 +0800)]
sched: add macros to define bitops for task atomic flags

commit e0e5070b20e01f0321f97db4e4e174f3f6b49e50 upstream.

This will simplify code when we add new flags.

v3:
- Kees pointed out that no_new_privs should never be cleared, so we
shouldn't define task_clear_no_new_privs(). we define 3 macros instead
of a single one.

v2:
- updated scripts/tags.sh, suggested by Peter

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agosched: fix confusing PFA_NO_NEW_PRIVS constant
Zefan Li [Thu, 25 Sep 2014 01:40:17 +0000 (09:40 +0800)]
sched: fix confusing PFA_NO_NEW_PRIVS constant

commit a2b86f772227bcaf962c8b134f8d187046ac5f0e upstream.

Commit 1d4457f99928 ("sched: move no_new_privs into new atomic flags")
defined PFA_NO_NEW_PRIVS as hexadecimal value, but it is confusing
because it is used as bit number. Redefine it as decimal bit number.

Note this changes the bit position of PFA_NOW_NEW_PRIVS from 1 to 0.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Kees Cook <keescook@chromium.org>
[ lizf: slightly modified subject and changelog ]
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agosched: move no_new_privs into new atomic flags
Kees Cook [Wed, 21 May 2014 22:23:46 +0000 (15:23 -0700)]
sched: move no_new_privs into new atomic flags

commit 1d4457f99928a968767f6405b4a1f50845aa15fd upstream.

Since seccomp transitions between threads requires updates to the
no_new_privs flag to be atomic, the flag must be part of an atomic flag
set. This moves the nnp flag into a separate task field, and introduces
accessors.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoFix match_prepath()
Sachin Prabhu [Wed, 26 Apr 2017 13:05:46 +0000 (14:05 +0100)]
Fix match_prepath()

commit cd8c42968ee651b69e00f8661caff32b0086e82d upstream.

Incorrect return value for shares not using the prefix path means that
we will never match superblocks for these shares.

Fixes: commit c1d8b24d1819 ("Compare prepaths when comparing superblocks")
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Cc: Aurélien Aptel <aaptel@suse.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoFix regression which breaks DFS mounting
Sachin Prabhu [Tue, 6 Sep 2016 12:22:34 +0000 (13:22 +0100)]
Fix regression which breaks DFS mounting

commit d171356ff11ab1825e456dfb979755e01b3c54a1 upstream.

Patch a6b5058 results in -EREMOTE returned by is_path_accessible() in
cifs_mount() to be ignored which breaks DFS mounting.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoMove check for prefix path to within cifs_get_root()
Sachin Prabhu [Fri, 29 Jul 2016 21:38:21 +0000 (22:38 +0100)]
Move check for prefix path to within cifs_get_root()

commit 348c1bfa84dfc47da1f1234b7f2bf09fa798edea upstream.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Tested-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoCompare prepaths when comparing superblocks
Sachin Prabhu [Fri, 29 Jul 2016 21:38:20 +0000 (22:38 +0100)]
Compare prepaths when comparing superblocks

commit c1d8b24d18192764fe82067ec6aa8d4c3bf094e0 upstream.

The patch
fs/cifs: make share unaccessible at root level mountable
makes use of prepaths when any component of the underlying path is
inaccessible.

When mounting 2 separate shares having different prepaths but are other
wise similar in other respects, we end up sharing superblocks when we
shouldn't be doing so.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Tested-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoFix memory leaks in cifs_do_mount()
Sachin Prabhu [Fri, 29 Jul 2016 21:38:19 +0000 (22:38 +0100)]
Fix memory leaks in cifs_do_mount()

commit 4214ebf4654798309364d0c678b799e402f38288 upstream.

Fix memory leaks introduced by the patch
fs/cifs: make share unaccessible at root level mountable

Also move allocation of cifs_sb->prepath to cifs_setup_cifs_sb().

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Tested-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agofs/cifs: make share unaccessible at root level mountable
Aurelien Aptel [Wed, 25 May 2016 17:59:09 +0000 (19:59 +0200)]
fs/cifs: make share unaccessible at root level mountable

commit a6b5058fafdf508904bbf16c29b24042cef3c496 upstream.

if, when mounting //HOST/share/sub/dir/foo we can query /sub/dir/foo but
not any of the path components above:

- store the /sub/dir/foo prefix in the cifs super_block info
- in the superblock, set root dentry to the subpath dentry (instead of
  the share root)
- set a flag in the superblock to remember it
- use prefixpath when building path from a dentry

fixes bso#8950

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
[bwh: Backported to 3.16: use Jiri Slaby's backport of the change in
 cifs_root_iget()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agonetvsc: fix incorrect receive checksum offloading
Stephen Hemminger [Mon, 24 Oct 2016 04:32:47 +0000 (21:32 -0700)]
netvsc: fix incorrect receive checksum offloading

commit e52fed7177f74382f742c27de2cc5314790aebb6 upstream.

The Hyper-V netvsc driver was looking at the incorrect status bits
in the checksum info. It was setting the receive checksum unnecessary
flag based on the IP header checksum being correct. The checksum
flag is skb is about TCP and UDP checksum status. Because of this
bug, any packet received with bad TCP checksum would be passed
up the stack and to the application causing data corruption.
The problem is reproducible via netcat and netem.

This had a side effect of not doing receive checksum offload
on IPv6. The driver was also also always doing checksum offload
independent of the checksum setting done via ethtool.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoALSA: oxygen: Fix logical-not-parentheses warning
Tomer Barletz [Sun, 2 Aug 2015 09:08:57 +0000 (02:08 -0700)]
ALSA: oxygen: Fix logical-not-parentheses warning

commit 8ec7cfce3762299ae289c384e281b2f4010ae231 upstream.

This fixes the following warning, that is seen with gcc 5.1:
warning: logical not is only applied to the left hand side of comparison [-Wlogical-not-parentheses].

Signed-off-by: Tomer Barletz <barletz@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoPCI: Limit config space size for Netronome NFP4000
Simon Horman [Fri, 11 Dec 2015 02:30:12 +0000 (11:30 +0900)]
PCI: Limit config space size for Netronome NFP4000

commit c2e771b02792d222cbcd9617fe71482a64f52647 upstream.

Like the NFP6000, the NFP4000 as an erratum where reading/writing to PCI
config space addresses above 0x600 can cause the NFP to generate PCIe
completion timeouts.

Limit the NFP4000's PF's config space size to 0x600 bytes as is already
done for the NFP6000.

The NFP4000's VF is 0x6004 (PCI_DEVICE_ID_NETRONOME_NFP6000_VF), the same
device ID as the NFP6000's VF.  Thus, its config space is already limited
by the existing use of quirk_nfp6000().

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoPCI: Add Netronome NFP4000 PF device ID
Simon Horman [Fri, 11 Dec 2015 02:30:11 +0000 (11:30 +0900)]
PCI: Add Netronome NFP4000 PF device ID

commit 69874ec233871a62e1bc8c89e643993af93a8630 upstream.

Add the device ID for the PF of the NFP4000.  The device ID for the VF,
0x6003, is already present as PCI_DEVICE_ID_NETRONOME_NFP6000_VF.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoPCI: Limit config space size for Netronome NFP6000 family
Jason S. McMullan [Wed, 30 Sep 2015 06:35:07 +0000 (15:35 +0900)]
PCI: Limit config space size for Netronome NFP6000 family

commit 9f33a2ae59f24452c1076749deb615bccd435ca9 upstream.

The NFP6000 has an erratum where reading/writing to PCI config space
addresses above 0x600 can cause the NFP to generate PCIe completion
timeouts.

Limit the NFP6000's config space size to 0x600 bytes.

Signed-off-by: Jason S. McMullan <jason.mcmullan@netronome.com>
[simon: edited changelog]
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoPCI: Add Netronome vendor and device IDs
Jason S. McMullan [Wed, 30 Sep 2015 06:35:06 +0000 (15:35 +0900)]
PCI: Add Netronome vendor and device IDs

commit a755e169031dac9ebaed03302c4921687c271d62 upstream.

Device IDs for the Netronome NFP3200, NFP3240, NFP6000, and NFP6000 SR-IOV
devices.

Signed-off-by: Jason S. McMullan <jason.mcmullan@netronome.com>
[simon: edited changelog]
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoPCI: Support PCIe devices with short cfg_size
Jason S. McMullan [Wed, 30 Sep 2015 06:35:05 +0000 (15:35 +0900)]
PCI: Support PCIe devices with short cfg_size

commit c20aecf6963d1273d8f6d61c042b4845441ca592 upstream.

If a device quirk modifies the pci_dev->cfg_size to be less than
PCI_CFG_SPACE_EXP_SIZE (4096), but greater than PCI_CFG_SPACE_SIZE (256),
the PCI sysfs interface truncates the readable size to PCI_CFG_SPACE_SIZE.

Allow sysfs access to config space up to cfg_size, even if the device
doesn't support the entire 4096-byte PCIe config space.

Note that pci_read_config() and pci_write_config() limit access to
dev->cfg_size even though pcie_config_attr contains 4096 (the maximum
size).

Signed-off-by: Jason S. McMullan <jason.mcmullan@netronome.com>
[simon: edited changelog]
Signed-off-by: Simon Horman <simon.horman@netronome.com>
[bhelgaas: more changelog edits]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agomm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED
Andrea Arcangeli [Fri, 26 Feb 2016 23:19:28 +0000 (15:19 -0800)]
mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED

commit ad33bb04b2a6cee6c1f99fabb15cddbf93ff0433 upstream.

pmd_trans_unstable()/pmd_none_or_trans_huge_or_clear_bad() were
introduced to locklessy (but atomically) detect when a pmd is a regular
(stable) pmd or when the pmd is unstable and can infinitely transition
from pmd_none() and pmd_trans_huge() from under us, while only holding
the mmap_sem for reading (for writing not).

While holding the mmap_sem only for reading, MADV_DONTNEED can run from
under us and so before we can assume the pmd to be a regular stable pmd
we need to compare it against pmd_none() and pmd_trans_huge() in an
atomic way, with pmd_trans_unstable().  The old pmd_trans_huge() left a
tiny window for a race.

Useful applications are unlikely to notice the difference as doing
MADV_DONTNEED concurrently with a page fault would lead to undefined
behavior.

[akpm@linux-foundation.org: tidy up comment grammar/layout]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[js] 3.12 backport: no pmd_devmap in 3.12 yet.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agos390/seccomp: fix error return for filtered system calls
Jan Willeke [Tue, 22 Jul 2014 14:50:57 +0000 (16:50 +0200)]
s390/seccomp: fix error return for filtered system calls

commit dc295880c6752076f8b94ba3885d0bfff09e3e82 upstream.

The syscall_set_return_value function of s390 negates the error argument
before storing the value to the return register gpr2. This is incorrect,
the seccomp code already passes the negative error value.
Store the unmodified error value to gpr2.

Signed-off-by: Jan Willeke <willeke@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agomm/swap.c: flush lru pvecs on compound page arrival
Lukasz Odzioba [Fri, 24 Jun 2016 21:50:01 +0000 (14:50 -0700)]
mm/swap.c: flush lru pvecs on compound page arrival

commit 8f182270dfec432e93fae14f9208a6b9af01009f upstream.

Currently we can have compound pages held on per cpu pagevecs, which
leads to a lot of memory unavailable for reclaim when needed.  In the
systems with hundreads of processors it can be GBs of memory.

On of the way of reproducing the problem is to not call munmap
explicitly on all mapped regions (i.e.  after receiving SIGTERM).  After
that some pages (with THP enabled also huge pages) may end up on
lru_add_pvec, example below.

  void main() {
  #pragma omp parallel
  {
size_t size = 55 * 1000 * 1000; // smaller than  MEM/CPUS
void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS , -1, 0);
if (p != MAP_FAILED)
memset(p, 0, size);
//munmap(p, size); // uncomment to make the problem go away
  }
  }

When we run it with THP enabled it will leave significant amount of
memory on lru_add_pvec.  This memory will be not reclaimed if we hit
OOM, so when we run above program in a loop:

for i in `seq 100`; do ./a.out; done

many processes (95% in my case) will be killed by OOM.

The primary point of the LRU add cache is to save the zone lru_lock
contention with a hope that more pages will belong to the same zone and
so their addition can be batched.  The huge page is already a form of
batched addition (it will add 512 worth of memory in one go) so skipping
the batching seems like a safer option when compared to a potential
excess in the caching which can be quite large and much harder to fix
because lru_add_drain_all is way to expensive and it is not really clear
what would be a good moment to call it.

Similarly we can reproduce the problem on lru_deactivate_pvec by adding:
madvise(p, size, MADV_FREE); after memset.

This patch flushes lru pvecs on compound page arrival making the problem
less severe - after applying it kill rate of above example drops to 0%,
due to reducing maximum amount of memory held on pvec from 28MB (with
THP) to 56kB per CPU.

Suggested-by: Michal Hocko <mhocko@suse.com>
Link: http://lkml.kernel.org/r/1466180198-18854-1-git-send-email-lukasz.odzioba@intel.com
Signed-off-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Ming Li <mingli199x@qq.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16:
 - Drop change in deactivate_file_page()
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoarm64: Rework valid_user_regs
Mark Rutland [Tue, 1 Mar 2016 14:18:50 +0000 (14:18 +0000)]
arm64: Rework valid_user_regs

commit dbd4d7ca563fd0a8949718d35ce197e5642d5d9d upstream.

We validate pstate using PSR_MODE32_BIT, which is part of the
user-provided pstate (and cannot be trusted). Also, we conflate
validation of AArch32 and AArch64 pstate values, making the code
difficult to reason about.

Instead, validate the pstate value based on the associated task. The
task may or may not be current (e.g. when using ptrace), so this must be
passed explicitly by callers. To avoid circular header dependencies via
sched.h, is_compat_task is pulled out of asm/ptrace.h.

To make the code possible to reason about, the AArch64 and AArch32
validation is split into separate functions. Software must respect the
RES0 policy for SPSR bits, and thus the kernel mirrors the hardware
policy (RAZ/WI) for bits as-yet unallocated. When these acquire an
architected meaning writes may be permitted (potentially with additional
validation).

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[ rebased for v3.16
  v3.16 does not support SETEND, support for this was added by
  2d888f48e056 ("arm64: Emulate SETEND for AArch32 tasks") in v3.20
  This backport forces the kernel endianness on userspace.

  Added a DBG_SPSR_SS define hidden by #ifdefs to avoid conflicts with
  other backports.
]
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoMIPS: KVM: Fix modular KVM under QEMU
James Hogan [Thu, 9 Jun 2016 09:50:43 +0000 (10:50 +0100)]
MIPS: KVM: Fix modular KVM under QEMU

commit 797179bc4fe06c89e47a9f36f886f68640b423f8 upstream.

Copy __kvm_mips_vcpu_run() into unmapped memory, so that we can never
get a TLB refill exception in it when KVM is built as a module.

This was observed to happen with the host MIPS kernel running under
QEMU, due to a not entirely transparent optimisation in the QEMU TLB
handling where TLB entries replaced with TLBWR are copied to a separate
part of the TLB array. Code in those pages continue to be executable,
but those mappings persist only until the next ASID switch, even if they
are marked global.

An ASID switch happens in __kvm_mips_vcpu_run() at exception level after
switching to the guest exception base. Subsequent TLB mapped kernel
instructions just prior to switching to the guest trigger a TLB refill
exception, which enters the guest exception handlers without updating
EPC. This appears as a guest triggered TLB refill on a host kernel
mapped (host KSeg2) address, which is not handled correctly as user
(guest) mode accesses to kernel (host) segments always generate address
error exceptions.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: kvm@vger.kernel.org
Cc: linux-mips@linux-mips.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[james.hogan@imgtec.com: backported for stable 3.14]
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agomacintosh/therm_windtunnel: Export I2C module alias information
Javier Martinez Canillas [Thu, 30 Jul 2015 16:18:30 +0000 (18:18 +0200)]
macintosh/therm_windtunnel: Export I2C module alias information

commit cb0eefcc3271ea1d370476dd29685918b99c5a9f upstream.

The I2C core always reports the MODALIAS uevent as "i2c:<client name"
regardless if the driver was matched using the I2C id_table or the
of_match_table. So the driver needs to export the I2C table and this
be built into the module or udev won't have the necessary information
to auto load the correct module when the device is added.

Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoperf/x86: Fix undefined shift on 32-bit kernels
Andrey Ryabinin [Wed, 11 May 2016 13:51:51 +0000 (16:51 +0300)]
perf/x86: Fix undefined shift on 32-bit kernels

commit 6d6f2833bfbf296101f9f085e10488aef2601ba5 upstream.

Jim reported:

UBSAN: Undefined behaviour in arch/x86/events/intel/core.c:3708:12
shift exponent 35 is too large for 32-bit type 'long unsigned int'

The use of 'unsigned long' type obviously is not correct here, make it
'unsigned long long' instead.

Reported-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Imre Palik <imrep@amazon.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 2c33645d366d ("perf/x86: Honor the architectural performance monitoring version")
Link: http://lkml.kernel.org/r/1462974711-10037-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoperf/x86: Honor the architectural performance monitoring version
Palik, Imre [Mon, 8 Jun 2015 12:46:49 +0000 (14:46 +0200)]
perf/x86: Honor the architectural performance monitoring version

commit 2c33645d366d13b969d936b68b9f4875b1fdddea upstream.

Architectural performance monitoring, version 1, doesn't support fixed counters.

Currently, even if a hypervisor advertises support for architectural
performance monitoring version 1, perf may still try to use the fixed
counters, as the constraints are set up based on the CPU model.

This patch ensures that perf honors the architectural performance monitoring
version returned by CPUID, and it only uses the fixed counters for version 2
and above.

(Some of the ideas in this patch came from Peter Zijlstra.)

Signed-off-by: Imre Palik <imrep@amazon.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Anthony Liguori <aliguori@amazon.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1433767609-1039-1-git-send-email-imrep.amz@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoMIPS: Fix 64k page support for 32 bit kernels.
Ralf Baechle [Thu, 4 Feb 2016 00:24:40 +0000 (01:24 +0100)]
MIPS: Fix 64k page support for 32 bit kernels.

commit d7de413475f443957a0c1d256e405d19b3a2cb22 upstream.

TASK_SIZE was defined as 0x7fff8000UL which for 64k pages is not a
multiple of the page size.  Somewhere further down the math fails
such that executing an ELF binary fails.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Tested-by: Joshua Henderson <joshua.henderson@microchip.com>
Cc: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agomisc: ad525x_dpot: Fix the enabling of the "otpXen" attributes
Dan Bogdan Nechita [Tue, 23 Feb 2016 09:48:45 +0000 (11:48 +0200)]
misc: ad525x_dpot: Fix the enabling of the "otpXen" attributes

commit 1bb850a1b7f68b66361e658e334f9fdf8231f17d upstream.

Currently writing the attributes with "echo" will result in comparing:
"enabled\n" with "enabled\0" and attribute is always set to false.

Use the sysfs_streq() instead because it treats both NUL and
new-line-then-NUL as equivalent string terminations.

Signed-off-by: Dan Bogdan Nechita <dan.bogdan.nechita@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoserial: samsung: Reorder the sequence of clock control when call s3c24xx_serial_set_t...
Chanwoo Choi [Thu, 21 Apr 2016 09:58:31 +0000 (18:58 +0900)]
serial: samsung: Reorder the sequence of clock control when call s3c24xx_serial_set_termios()

commit b8995f527aac143e83d3900ff39357651ea4e0f6 upstream.

This patch fixes the broken serial log when changing the clock source
of uart device. Before disabling the original clock source, this patch
enables the new clock source to protect the clock off state for a split second.

Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoBtrfs: don't use src fd for printk
Josef Bacik [Fri, 25 Mar 2016 14:02:41 +0000 (10:02 -0400)]
Btrfs: don't use src fd for printk

commit c79b4713304f812d3d6c95826fc3e5fc2c0b0c14 upstream.

The fd we pass in may not be on a btrfs file system, so don't try to do
BTRFS_I() on it.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoARM: OMAP3: Fix booting with thumb2 kernel
Tony Lindgren [Thu, 28 May 2015 14:22:08 +0000 (07:22 -0700)]
ARM: OMAP3: Fix booting with thumb2 kernel

commit d8a50941c91a68da202aaa96a3dacd471ea9c693 upstream.

We get a NULL pointer dereference on omap3 for thumb2 compiled kernels:

Internal error: Oops: 80000005 [#1] SMP THUMB2
...
[<c046497b>] (_raw_spin_unlock_irqrestore) from [<c0024375>]
(omap3_enter_idle_bm+0xc5/0x178)
[<c0024375>] (omap3_enter_idle_bm) from [<c0374e63>]
(cpuidle_enter_state+0x77/0x27c)
[<c0374e63>] (cpuidle_enter_state) from [<c00627f1>]
(cpu_startup_entry+0x155/0x23c)
[<c00627f1>] (cpu_startup_entry) from [<c06b9a47>]
(start_kernel+0x32f/0x338)
[<c06b9a47>] (start_kernel) from [<8000807f>] (0x8000807f)

The power management related assembly on omaps needs to interact with
ARM mode bootrom code, so we need to keep most of the related assembly
in ARM mode.

Turns out this error is because of missing ENDPROC for assembly code
as suggested by Stephen Boyd <sboyd@codeaurora.org>. Let's fix the
problem by adding ENDPROC in two places to sleep34xx.S.

Let's also remove the now duplicate custom code for mode switching.
This has been unnecessary since commit 6ebbf2ce437b ("ARM: convert
all "mov.* pc, reg" to "bx reg" for ARMv6+").

And let's also remove the comments about local variables, they are
now just confusing after the ENDPROC.

The reason why ENDPROC makes a difference is it sets .type and then
the compiler knows what to do with the thumb bit as explained at:

https://wiki.ubuntu.com/ARM/Thumb2PortingHowto

Reported-by: Kevin Hilman <khilman@kernel.org>
Tested-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoInput: ads7846 - correct the value got from SPI
Andrey Gelman [Tue, 6 Oct 2015 22:43:43 +0000 (15:43 -0700)]
Input: ads7846 - correct the value got from SPI

commit 879f2fea8a5a748bcbf98d2cdce9139c045505d3 upstream.

According to the touch controller spec, SPI return a 16 bit value, only 12
bits are valid, they are bit[14-3].

The value of MISO and MOSI can be configured when SPI is in idle mode.
Currently this touch driver assumes the SPI bus sets the MOSI and MISO in
low level when SPI bus is in idle mode. So the bit[15] of the value got
from SPI bus is always 0. But when SPI bus congfigures the MOSI and MISO in
high level during the SPI idle mode, the bit[15] of the value get from SPI
is always 1. If bit[15] is not masked, we may get the wrong value.

Mask the invalid bit to make sure the correct value gets returned.
Regardless of the SPI bus idle configuration.

Signed-off-by: Andrey Gelman <andrey.gelman@compulab.co.il>
Signed-off-by: Haibo Chen <haibo.chen@freescale.com>
Signed-off-by: Igor Grinberg <grinberg@compulab.co.il>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agov4l2-dv-timings.h: fix polarity for 4k formats
Hans Verkuil [Mon, 2 May 2016 08:11:49 +0000 (10:11 +0200)]
v4l2-dv-timings.h: fix polarity for 4k formats

commit 3020ca711871fdaf0c15c8bab677a6bc302e28fe upstream.

The VSync polarity was negative instead of positive for the 4k CEA formats.
I probably copy-and-pasted these from the DMT 4k format, which does have a
negative VSync polarity.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Reported-by: Martin Bugge <marbugge@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agousb: musb: cppi41: improve rx channel abort routine
Bin Liu [Mon, 26 Jan 2015 22:22:07 +0000 (16:22 -0600)]
usb: musb: cppi41: improve rx channel abort routine

commit cb83df77f3ec151d68a1b6be957207e6fc7b7f50 upstream.

1. set AUTOREQ to NONE at the beginning of teardown;

2. add delay for dma pipeline to drain;

3. Do not set USB_TDOWN bit for RX teardown.

  The CPPI hw has an issue that when tearing down a RX channel, if
  another RX channel is receiving data, the CPPI will lockup.

  To workaround the issue, do not set the CPPI TD bit. The steps before
  this point ensures the CPPI channel will be torn down properly.

Signed-off-by: Bin Liu <b-liu@ti.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agousb: musb: cppi41: correct the macro name EP_MODE_AUTOREG_*
Bin Liu [Mon, 26 Jan 2015 22:22:06 +0000 (16:22 -0600)]
usb: musb: cppi41: correct the macro name EP_MODE_AUTOREG_*

commit 0149b07a9e28b0d8bd2fc1c238ffe7d530c2673f upstream.

The macro EP_MODE_AUTOREG_* should be called EP_MODE_AUTOREQ_*, as they
are used for register AUTOREQ.

Signed-off-by: Bin Liu <b-liu@ti.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agox86/efi: Avoid triple faults during EFI mixed mode calls
Matt Fleming [Tue, 13 Jan 2015 15:25:00 +0000 (15:25 +0000)]
x86/efi: Avoid triple faults during EFI mixed mode calls

commit 96738c69a7fcdbf0d7c9df0c8a27660011e82a7b upstream.

Andy pointed out that if an NMI or MCE is received while we're in the
middle of an EFI mixed mode call a triple fault will occur. This can
happen, for example, when issuing an EFI mixed mode call while running
perf.

The reason for the triple fault is that we execute the mixed mode call
in 32-bit mode with paging disabled but with 64-bit kernel IDT handlers
installed throughout the call.

At Andy's suggestion, stop playing the games we currently do at runtime,
such as disabling paging and installing a 32-bit GDT for __KERNEL_CS. We
can simply switch to the __KERNEL32_CS descriptor before invoking
firmware services, and run in compatibility mode. This way, if an
NMI/MCE does occur the kernel IDT handler will execute correctly, since
it'll jump to __KERNEL_CS automatically.

However, this change is only possible post-ExitBootServices(). Before
then the firmware "owns" the machine and expects for its 32-bit IDT
handlers to be left intact to service interrupts, etc.

So, we now need to distinguish between early boot and runtime
invocations of EFI services. During early boot, we need to restore the
GDT that the firmware expects to be present. We can only jump to the
__KERNEL32_CS code segment for mixed mode calls after ExitBootServices()
has been invoked.

A liberal sprinkling of comments in the thunking code should make the
differences in early and late environments more apparent.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
[bwh: Backported to 3.16: in arch/x86/boot/compressed/Makefile, add the new
 object file to VMLINUX_OBJS]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agodrm/irq: BUG_ON() -> WARN_ON()
Rob Clark [Sat, 8 Nov 2014 15:16:19 +0000 (10:16 -0500)]
drm/irq: BUG_ON() -> WARN_ON()

commit 7f907bf284ba7bb8d271f094b226699d3fef2142 upstream.

Let's make things a bit easier to debug when things go bad (potentially
under console_lock).

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoRevert "ACPI / EC: Add support to disallow QR_EC to be issued before completing previ...
Lv Zheng [Wed, 29 Oct 2014 03:33:43 +0000 (11:33 +0800)]
Revert "ACPI / EC: Add support to disallow QR_EC to be issued before completing previous QR_EC"

commit df9ff91801da603079018f21a9412385b62f0f8e upstream.

It is reported that the following commit breaks Samsung hardware:
 Commit: 558e4736f2e1b0e6323adf7a5e4df77ed6cfc1a4.
 Subject: ACPI / EC: Add support to disallow QR_EC to be issued before
          completing previous QR_EC

Which means the Samsung behavior conflicts with the Acer behavior.

1. Samsung may behave like:
   [ +event 1 ] SCI_EVT set
   [ +event 2 ] SCI_EVT set
                              write QR_EC
                              read event
   [ -event 1 ] SCI_EVT clear
   Without the above commit, Samsung can work:
   [ +event 1 ] SCI_EVT set
   [ +event 2 ] SCI_EVT set
                              write QR_EC
                              CAN prepare next QR_EC as SCI_EVT=1
                              read event
   [ -event 1 ] SCI_EVT clear
                              write QR_EC
                              read event
   [ -event 2 ] SCI_EVT clear
   With the above commit, Samsung cannot work:
   [ +event 1 ] SCI_EVT set
   [ +event 2 ] SCI_EVT set
                              write QR_EC
                              read event
   [ -event 1 ] SCI_EVT clear
                              CANNOT prepare next QR_EC as SCI_EVT=0
2. Acer may behave like:
   [ +event 1 ] SCI_EVT set
   [ +event 2 ]
                              write QR_EC
                              read event
   [ -event 1 ] SCI_EVT clear
   [ +event 2 ] SCI_EVT set
   Without the above commit, Acer cannot work when there is only 1 event:
   [ +event 1 ] SCI_EVT set
                              write QR_EC
                              can prepared next QR_EC as SCI_EVT=1
                              read event
   [ -event 1 ] SCI_EVT clear
                              CANNOT write QR_EC as SCI_EVT=0
   With the above commit, Acer can work:
   [ +event 1 ] SCI_EVT set
   [ +event 2 ]
                              write QR_EC
                              read event
   [ -event 1 ] SCI_EVT set
                              can prepare next QR_EC because SCI_EVT=0
                              CAN write QR_EC as SCI_EVT=1

Since Acer can also work with only the following commit applied:
 Commit: 3afcf2ece453e1a8c2c6de19cdf06da3772a1b08
 Subject: ACPI / EC: Add support to disallow QR_EC to be issued when
          SCI_EVT isn't set
commit 558e4736f2e1b0e6323adf7a5e4df77ed6cfc1a4 can be reverted.

Fixes: 558e4736f2e1 (ACPI / EC: Add support to disallow QR_EC to be issued ...)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=44161
Reported-and-tested-by: Ortwin Glück <odi@odi.ch>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agonet sched filters: fix notification of filter delete with proper handle
Jamal Hadi Salim [Tue, 25 Oct 2016 00:18:27 +0000 (20:18 -0400)]
net sched filters: fix notification of filter delete with proper handle

[ Upstream commit 9ee7837449b3d6f0fcf9132c6b5e5aaa58cc67d4 ]

Daniel says:

While trying out [1][2], I noticed that tc monitor doesn't show the
correct handle on delete:

$ tc monitor
qdisc clsact ffff: dev eno1 parent ffff:fff1
filter dev eno1 ingress protocol all pref 49152 bpf handle 0x2a [...]
deleted filter dev eno1 ingress protocol all pref 49152 bpf handle 0xf3be0c80

some context to explain the above:
The user identity of any tc filter is represented by a 32-bit
identifier encoded in tcm->tcm_handle. Example 0x2a in the bpf filter
above. A user wishing to delete, get or even modify a specific filter
uses this handle to reference it.
Every classifier is free to provide its own semantics for the 32 bit handle.
Example: classifiers like u32 use schemes like 800:1:801 to describe
the semantics of their filters represented as hash table, bucket and
node ids etc.
Classifiers also have internal per-filter representation which is different
from this externally visible identity. Most classifiers set this
internal representation to be a pointer address (which allows fast retrieval
of said filters in their implementations). This internal representation
is referenced with the "fh" variable in the kernel control code.

When a user successfuly deletes a specific filter, by specifying the correct
tcm->tcm_handle, an event is generated to user space which indicates
which specific filter was deleted.

Before this patch, the "fh" value was sent to user space as the identity.
As an example what is shown in the sample bpf filter delete event above
is 0xf3be0c80. This is infact a 32-bit truncation of 0xffff8807f3be0c80
which happens to be a 64-bit memory address of the internal filter
representation (address of the corresponding filter's struct cls_bpf_prog);

After this patch the appropriate user identifiable handle as encoded
in the originating request tcm->tcm_handle is generated in the event.
One of the cardinal rules of netlink rules is to be able to take an
event (such as a delete in this case) and reflect it back to the
kernel and successfully delete the filter. This patch achieves that.

Note, this issue has existed since the original TC action
infrastructure code patch back in 2004 as found in:
https://git.kernel.org/cgit/linux/kernel/git/history/history.git/commit/

[1] http://patchwork.ozlabs.org/patch/682828/
[2] http://patchwork.ozlabs.org/patch/682829/

Fixes: 4e54c4816bfe ("[NET]: Add tc extensions infrastructure.")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agonet: Don't forget pr_fmt on net_dbg_ratelimited for CONFIG_DYNAMIC_DEBUG
Jason A. Donenfeld [Wed, 15 Jun 2016 09:14:53 +0000 (11:14 +0200)]
net: Don't forget pr_fmt on net_dbg_ratelimited for CONFIG_DYNAMIC_DEBUG

commit daddef76c3deaaa7922f9d7b18edbf0a061215c3 upstream.

The implementation of net_dbg_ratelimited in the CONFIG_DYNAMIC_DEBUG
case was added with 2c94b5373 ("net: Implement net_dbg_ratelimited() for
CONFIG_DYNAMIC_DEBUG case"). The implementation strategy was to take the
usual definition of the dynamic_pr_debug macro, but alter it by adding a
call to "net_ratelimit()" in the if statement. This is, in fact, the
correct approach.

However, while doing this, the author of the commit forgot to surround
fmt by pr_fmt, resulting in unprefixed log messages appearing in the
console. So, this commit adds back the pr_fmt(fmt) invocation, making
net_dbg_ratelimited properly consistent across DEBUG, no DEBUG, and
DYNAMIC_DEBUG cases, and bringing parity with the behavior of
dynamic_pr_debug as well.

Fixes: 2c94b5373 ("net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Tim Bingham <tbingham@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agonet: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case
Tim Bingham [Fri, 29 Apr 2016 17:30:23 +0000 (13:30 -0400)]
net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case

commit 2c94b53738549d81dc7464a32117d1f5112c64d3 upstream.

Prior to commit d92cff89a0c8 ("net_dbg_ratelimited: turn into no-op
when !DEBUG") the implementation of net_dbg_ratelimited() was buggy
for both the DEBUG and CONFIG_DYNAMIC_DEBUG cases.

The bug was that net_ratelimit() was being called and, despite
returning true, nothing was being printed to the console. This
resulted in messages like the following -

"net_ratelimit: %d callbacks suppressed"

with no other output nearby.

After commit d92cff89a0c8 ("net_dbg_ratelimited: turn into no-op when
!DEBUG") the bug is fixed for the DEBUG case. However, there's no
output at all for CONFIG_DYNAMIC_DEBUG case.

This patch restores debug output (if enabled) for the
CONFIG_DYNAMIC_DEBUG case.

Add a definition of net_dbg_ratelimited() for the CONFIG_DYNAMIC_DEBUG
case. The implementation takes care to check that dynamic debugging is
enabled before calling net_ratelimit().

Fixes: d92cff89a0c8 ("net_dbg_ratelimited: turn into no-op when !DEBUG")
Signed-off-by: Tim Bingham <tbingham@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agonet_dbg_ratelimited: turn into no-op when !DEBUG
Jason A. Donenfeld [Tue, 4 Aug 2015 16:26:19 +0000 (18:26 +0200)]
net_dbg_ratelimited: turn into no-op when !DEBUG

commit d92cff89a0c80e7e49796366e441d97f07b5d321 upstream.

The pr_debug family of functions turns into a no-op when -DDEBUG is not
specified, opting instead to call "no_printk", which gets compiled to a
no-op (but retains gcc's nice warnings about printf-style arguments).

The problem with net_dbg_ratelimited is that it is defined to be a
variant of net_ratelimited_function, which expands to essentially:

    if (net_ratelimit())
        pr_debug(fmt, ...);

When DEBUG is not defined, then this becomes,

    if (net_ratelimit())
        ;

This seems benign, except it isn't. Firstly, there's the obvious
overhead of calling net_ratelimit needlessly, which does quite some book
keeping for the rate limiting. Given that the pr_debug and
net_dbg_ratelimited family of functions are sprinkled liberally through
performance critical code, with developers assuming they'll be compiled
out to a no-op most of the time, we certainly do not want this needless
book keeping. Secondly, and most visibly, even though no debug message
is printed when DEBUG is not defined, if there is a flood of
invocations, dmesg winds up peppered with messages such as
"net_ratelimit: 320 callbacks suppressed". This is because our
aforementioned net_ratelimit() function actually prints this text in
some circumstances. It's especially odd to see this when there isn't any
other accompanying debug message.

So, in sum, it doesn't make sense to have this function's current
behavior, and instead it should match what every other debug family of
functions in the kernel does with !DEBUG -- nothing.

This patch replaces calls to net_dbg_ratelimited when !DEBUG with
no_printk, keeping with the idiom of all the other debug print helpers.

Also, though not strictly neccessary, it guards the call with an if (0)
so that all evaluation of any arguments are sure to be compiled out.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agosparc64: Fix return from trap window fill crashes.
David S. Miller [Sun, 29 May 2016 03:41:12 +0000 (20:41 -0700)]
sparc64: Fix return from trap window fill crashes.

[ Upstream commit 7cafc0b8bf130f038b0ec2dcdd6a9de6dc59b65a ]

We must handle data access exception as well as memory address unaligned
exceptions from return from trap window fill faults, not just normal
TLB misses.

Otherwise we can get an OOPS that looks like this:

ld-linux.so.2(36808): Kernel bad sw trap 5 [#1]
CPU: 1 PID: 36808 Comm: ld-linux.so.2 Not tainted 4.6.0 #34
task: fff8000303be5c60 ti: fff8000301344000 task.ti: fff8000301344000
TSTATE: 0000004410001601 TPC: 0000000000a1a784 TNPC: 0000000000a1a788 Y: 00000002    Not tainted
TPC: <do_sparc64_fault+0x5c4/0x700>
g0: fff8000024fc8248 g1: 0000000000db04dc g2: 0000000000000000 g3: 0000000000000001
g4: fff8000303be5c60 g5: fff800030e672000 g6: fff8000301344000 g7: 0000000000000001
o0: 0000000000b95ee8 o1: 000000000000012b o2: 0000000000000000 o3: 0000000200b9b358
o4: 0000000000000000 o5: fff8000301344040 sp: fff80003013475c1 ret_pc: 0000000000a1a77c
RPC: <do_sparc64_fault+0x5bc/0x700>
l0: 00000000000007ff l1: 0000000000000000 l2: 000000000000005f l3: 0000000000000000
l4: fff8000301347e98 l5: fff8000024ff3060 l6: 0000000000000000 l7: 0000000000000000
i0: fff8000301347f60 i1: 0000000000102400 i2: 0000000000000000 i3: 0000000000000000
i4: 0000000000000000 i5: 0000000000000000 i6: fff80003013476a1 i7: 0000000000404d4c
I7: <user_rtt_fill_fixup+0x6c/0x7c>
Call Trace:
 [0000000000404d4c] user_rtt_fill_fixup+0x6c/0x7c

The window trap handlers are slightly clever, the trap table entries for them are
composed of two pieces of code.  First comes the code that actually performs
the window fill or spill trap handling, and then there are three instructions at
the end which are for exception processing.

The userland register window fill handler is:

add %sp, STACK_BIAS + 0x00, %g1; \
ldxa [%g1 + %g0] ASI, %l0; \
mov 0x08, %g2; \
mov 0x10, %g3; \
ldxa [%g1 + %g2] ASI, %l1; \
mov 0x18, %g5; \
ldxa [%g1 + %g3] ASI, %l2; \
ldxa [%g1 + %g5] ASI, %l3; \
add %g1, 0x20, %g1; \
ldxa [%g1 + %g0] ASI, %l4; \
ldxa [%g1 + %g2] ASI, %l5; \
ldxa [%g1 + %g3] ASI, %l6; \
ldxa [%g1 + %g5] ASI, %l7; \
add %g1, 0x20, %g1; \
ldxa [%g1 + %g0] ASI, %i0; \
ldxa [%g1 + %g2] ASI, %i1; \
ldxa [%g1 + %g3] ASI, %i2; \
ldxa [%g1 + %g5] ASI, %i3; \
add %g1, 0x20, %g1; \
ldxa [%g1 + %g0] ASI, %i4; \
ldxa [%g1 + %g2] ASI, %i5; \
ldxa [%g1 + %g3] ASI, %i6; \
ldxa [%g1 + %g5] ASI, %i7; \
restored; \
retry; nop; nop; nop; nop; \
b,a,pt %xcc, fill_fixup_dax; \
b,a,pt %xcc, fill_fixup_mna; \
b,a,pt %xcc, fill_fixup;

And the way this works is that if any of those memory accesses
generate an exception, the exception handler can revector to one of
those final three branch instructions depending upon which kind of
exception the memory access took.  In this way, the fault handler
doesn't have to know if it was a spill or a fill that it's handling
the fault for.  It just always branches to the last instruction in
the parent trap's handler.

For example, for a regular fault, the code goes:

winfix_trampoline:
rdpr %tpc, %g3
or %g3, 0x7c, %g3
wrpr %g3, %tnpc
done

All window trap handlers are 0x80 aligned, so if we "or" 0x7c into the
trap time program counter, we'll get that final instruction in the
trap handler.

On return from trap, we have to pull the register window in but we do
this by hand instead of just executing a "restore" instruction for
several reasons.  The largest being that from Niagara and onward we
simply don't have enough levels in the trap stack to fully resolve all
possible exception cases of a window fault when we are already at
trap level 1 (which we enter to get ready to return from the original
trap).

This is executed inline via the FILL_*_RTRAP handlers.  rtrap_64.S's
code branches directly to these to do the window fill by hand if
necessary.  Now if you look at them, we'll see at the end:

    ba,a,pt    %xcc, user_rtt_fill_fixup;
    ba,a,pt    %xcc, user_rtt_fill_fixup;
    ba,a,pt    %xcc, user_rtt_fill_fixup;

And oops, all three cases are handled like a fault.

This doesn't work because each of these trap types (data access
exception, memory address unaligned, and faults) store their auxiliary
info in different registers to pass on to the C handler which does the
real work.

So in the case where the stack was unaligned, the unaligned trap
handler sets up the arg registers one way, and then we branched to
the fault handler which expects them setup another way.

So the FAULT_TYPE_* value ends up basically being garbage, and
randomly would generate the backtrace seen above.

Reported-by: Nick Alcock <nix@esperi.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agosparc: Harden signal return frame checks.
David S. Miller [Sun, 29 May 2016 04:21:31 +0000 (21:21 -0700)]
sparc: Harden signal return frame checks.

[ Upstream commit d11c2a0de2824395656cf8ed15811580c9dd38aa ]

All signal frames must be at least 16-byte aligned, because that is
the alignment we explicitly create when we build signal return stack
frames.

All stack pointers must be at least 8-byte aligned.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agosparc64: Take ctx_alloc_lock properly in hugetlb_setup().
David S. Miller [Wed, 25 May 2016 19:51:20 +0000 (12:51 -0700)]
sparc64: Take ctx_alloc_lock properly in hugetlb_setup().

[ Upstream commit 9ea46abe22550e3366ff7cee2f8391b35b12f730 ]

On cheetahplus chips we take the ctx_alloc_lock in order to
modify the TLB lookup parameters for the indexed TLBs, which
are stored in the context register.

This is called with interrupts disabled, however ctx_alloc_lock
is an IRQ safe lock, therefore we must take acquire/release it
properly with spin_{lock,unlock}_irq().

Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agosparc/PCI: Fix for panic while enabling SR-IOV
Babu Moger [Thu, 24 Mar 2016 20:02:22 +0000 (13:02 -0700)]
sparc/PCI: Fix for panic while enabling SR-IOV

[ Upstream commit d0c31e02005764dae0aab130a57e9794d06b824d ]

We noticed this panic while enabling SR-IOV in sparc.

mlx4_core: Mellanox ConnectX core driver v2.2-1 (Jan  1 2015)
mlx4_core: Initializing 0007:01:00.0
mlx4_core 0007:01:00.0: Enabling SR-IOV with 5 VFs
mlx4_core: Initializing 0007:01:00.1
Unable to handle kernel NULL pointer dereference
insmod(10010): Oops [#1]
CPU: 391 PID: 10010 Comm: insmod Not tainted
4.1.12-32.el6uek.kdump2.sparc64 #1
TPC: <dma_supported+0x20/0x80>
I7: <__mlx4_init_one+0x324/0x500 [mlx4_core]>
Call Trace:
 [00000000104c5ea4] __mlx4_init_one+0x324/0x500 [mlx4_core]
 [00000000104c613c] mlx4_init_one+0xbc/0x120 [mlx4_core]
 [0000000000725f14] local_pci_probe+0x34/0xa0
 [0000000000726028] pci_call_probe+0xa8/0xe0
 [0000000000726310] pci_device_probe+0x50/0x80
 [000000000079f700] really_probe+0x140/0x420
 [000000000079fa24] driver_probe_device+0x44/0xa0
 [000000000079fb5c] __device_attach+0x3c/0x60
 [000000000079d85c] bus_for_each_drv+0x5c/0xa0
 [000000000079f588] device_attach+0x88/0xc0
 [000000000071acd0] pci_bus_add_device+0x30/0x80
 [0000000000736090] virtfn_add.clone.1+0x210/0x360
 [00000000007364a4] sriov_enable+0x2c4/0x520
 [000000000073672c] pci_enable_sriov+0x2c/0x40
 [00000000104c2d58] mlx4_enable_sriov+0xf8/0x180 [mlx4_core]
 [00000000104c49ac] mlx4_load_one+0x42c/0xd40 [mlx4_core]
Disabling lock debugging due to kernel taint
Caller[00000000104c5ea4]: __mlx4_init_one+0x324/0x500 [mlx4_core]
Caller[00000000104c613c]: mlx4_init_one+0xbc/0x120 [mlx4_core]
Caller[0000000000725f14]: local_pci_probe+0x34/0xa0
Caller[0000000000726028]: pci_call_probe+0xa8/0xe0
Caller[0000000000726310]: pci_device_probe+0x50/0x80
Caller[000000000079f700]: really_probe+0x140/0x420
Caller[000000000079fa24]: driver_probe_device+0x44/0xa0
Caller[000000000079fb5c]: __device_attach+0x3c/0x60
Caller[000000000079d85c]: bus_for_each_drv+0x5c/0xa0
Caller[000000000079f588]: device_attach+0x88/0xc0
Caller[000000000071acd0]: pci_bus_add_device+0x30/0x80
Caller[0000000000736090]: virtfn_add.clone.1+0x210/0x360
Caller[00000000007364a4]: sriov_enable+0x2c4/0x520
Caller[000000000073672c]: pci_enable_sriov+0x2c/0x40
Caller[00000000104c2d58]: mlx4_enable_sriov+0xf8/0x180 [mlx4_core]
Caller[00000000104c49ac]: mlx4_load_one+0x42c/0xd40 [mlx4_core]
Caller[00000000104c5f90]: __mlx4_init_one+0x410/0x500 [mlx4_core]
Caller[00000000104c613c]: mlx4_init_one+0xbc/0x120 [mlx4_core]
Caller[0000000000725f14]: local_pci_probe+0x34/0xa0
Caller[0000000000726028]: pci_call_probe+0xa8/0xe0
Caller[0000000000726310]: pci_device_probe+0x50/0x80
Caller[000000000079f700]: really_probe+0x140/0x420
Caller[000000000079fa24]: driver_probe_device+0x44/0xa0
Caller[000000000079fb08]: __driver_attach+0x88/0xa0
Caller[000000000079d90c]: bus_for_each_dev+0x6c/0xa0
Caller[000000000079f29c]: driver_attach+0x1c/0x40
Caller[000000000079e35c]: bus_add_driver+0x17c/0x220
Caller[00000000007a02d4]: driver_register+0x74/0x120
Caller[00000000007263fc]: __pci_register_driver+0x3c/0x60
Caller[00000000104f62bc]: mlx4_init+0x60/0xcc [mlx4_core]
Kernel panic - not syncing: Fatal exception
Press Stop-A (L1-A) to return to the boot prom
---[ end Kernel panic - not syncing: Fatal exception

Details:
Here is the call sequence
virtfn_add->__mlx4_init_one->dma_set_mask->dma_supported

The panic happened at line 760(file arch/sparc/kernel/iommu.c)

758 int dma_supported(struct device *dev, u64 device_mask)
759 {
760         struct iommu *iommu = dev->archdata.iommu;
761         u64 dma_addr_mask = iommu->dma_addr_mask;
762
763         if (device_mask >= (1UL << 32UL))
764                 return 0;
765
766         if ((device_mask & dma_addr_mask) == dma_addr_mask)
767                 return 1;
768
769 #ifdef CONFIG_PCI
770         if (dev_is_pci(dev))
771 return pci64_dma_supported(to_pci_dev(dev), device_mask);
772 #endif
773
774         return 0;
775 }
776 EXPORT_SYMBOL(dma_supported);

Same panic happened with Intel ixgbe driver also.

SR-IOV code looks for arch specific data while enabling
VFs. When VF device is added, driver probe function makes set
of calls to initialize the pci device. Because the VF device is
added different way than the normal PF device(which happens via
of_create_pci_dev for sparc), some of the arch specific initialization
does not happen for VF device.  That causes panic when archdata is
accessed.

To fix this, I have used already defined weak function
pcibios_setup_device to copy archdata from PF to VF.
Also verified the fix.

Signed-off-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agosparc64: Fix sparc64_set_context stack handling.
David S. Miller [Tue, 1 Mar 2016 05:25:32 +0000 (00:25 -0500)]
sparc64: Fix sparc64_set_context stack handling.

[ Upstream commit 397d1533b6cce0ccb5379542e2e6d079f6936c46 ]

Like a signal return, we should use synchronize_user_stack() rather
than flush_user_windows().

Reported-by: Ilya Malakhov <ilmalakhovthefirst@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agosparc64: Fix bootup regressions on some Kconfig combinations.
David S. Miller [Wed, 27 Apr 2016 21:27:37 +0000 (17:27 -0400)]
sparc64: Fix bootup regressions on some Kconfig combinations.

[ Upstream commit 49fa5230462f9f2c4e97c81356473a6bdf06c422 ]

The system call tracing bug fix mentioned in the Fixes tag
below increased the amount of assembler code in the sequence
of assembler files included by head_64.S

This caused to total set of code to exceed 0x4000 bytes in
size, which overflows the expression in head_64.S that works
to place swapper_tsb at address 0x408000.

When this is violated, the TSB is not properly aligned, and
also the trap table is not aligned properly either.  All of
this together results in failed boots.

So, do two things:

1) Simplify some code by using ba,a instead of ba/nop to get
   those bytes back.

2) Add a linker script assertion to make sure that if this
   happens again the build will fail.

Fixes: 1a40b95374f6 ("sparc: Fix system call tracing register handling.")
Reported-by: Meelis Roos <mroos@linux.ee>
Reported-by: Joerg Abraham <joerg.abraham@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agosparc: Fix system call tracing register handling.
Mike Frysinger [Mon, 18 Jan 2016 11:32:30 +0000 (06:32 -0500)]
sparc: Fix system call tracing register handling.

[ Upstream commit 1a40b95374f680625318ab61d81958e949e0afe3 ]

A system call trace trigger on entry allows the tracing
process to inspect and potentially change the traced
process's registers.

Account for that by reloading the %g1 (syscall number)
and %i0-%i5 (syscall argument) values.  We need to be
careful to revalidate the range of %g1, and reload the
system call table entry it corresponds to into %l7.

Reported-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agom32r: add io*_rep helpers
Sudip Mukherjee [Tue, 29 Dec 2015 22:54:19 +0000 (14:54 -0800)]
m32r: add io*_rep helpers

commit 92a8ed4c7643809123ef0a65424569eaacc5c6b0 upstream.

m32r allmodconfig was failing with the error:

  error: implicit declaration of function 'read'

On checking io.h it turned out that 'read' is not defined but 'readb' is
defined and 'ioread8' will then obviously mean 'readb'.

At the same time some of the helper functions ioreadN_rep() and
iowriteN_rep() were missing which also led to the build failure.

Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agom32r: add definition of ioremap_wc to io.h
Abhilash Kesavan [Fri, 6 Feb 2015 13:45:26 +0000 (19:15 +0530)]
m32r: add definition of ioremap_wc to io.h

commit 71a49d16f06de2ccdf52ca247d496a2bb1ca36fe upstream.

Before adding a resource managed ioremap_wc function, we need
to have ioremap_wc defined for m32r to prevent build errors.

Signed-off-by: Abhilash Kesavan <a.kesavan@samsung.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoipv4/fib: don't warn when primary address is missing if in_dev is dead
Paolo Abeni [Thu, 21 Apr 2016 20:23:31 +0000 (22:23 +0200)]
ipv4/fib: don't warn when primary address is missing if in_dev is dead

[ Upstream commit 391a20333b8393ef2e13014e6e59d192c5594471 ]

After commit fbd40ea0180a ("ipv4: Don't do expensive useless work
during inetdev destroy.") when deleting an interface,
fib_del_ifaddr() can be executed without any primary address
present on the dead interface.

The above is safe, but triggers some "bug: prim == NULL" warnings.

This commit avoids warning if the in_dev is dead

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agonet/route: enforce hoplimit max value
Paolo Abeni [Fri, 13 May 2016 16:33:41 +0000 (18:33 +0200)]
net/route: enforce hoplimit max value

[ Upstream commit 626abd59e51d4d8c6367e03aae252a8aa759ac78 ]

Currently, when creating or updating a route, no check is performed
in both ipv4 and ipv6 code to the hoplimit value.

The caller can i.e. set hoplimit to 256, and when such route will
 be used, packets will be sent with hoplimit/ttl equal to 0.

This commit adds checks for the RTAX_HOPLIMIT value, in both ipv4
ipv6 route code, substituting any value greater than 255 with 255.

This is consistent with what is currently done for ADVMSS and MTU
in the ipv4 code.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: for IPv6, add the check to fib6_commit_metrics()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agol2tp: avoid use-after-free caused by l2tp_ip_backlog_recv
Paul Hüber [Sun, 26 Feb 2017 16:58:19 +0000 (17:58 +0100)]
l2tp: avoid use-after-free caused by l2tp_ip_backlog_recv

commit 51fb60eb162ab84c5edf2ae9c63cf0b878e5547e upstream.

l2tp_ip_backlog_recv may not return -1 if the packet gets dropped.
The return value is passed up to ip_local_deliver_finish, which treats
negative values as an IP protocol number for resubmission.

Signed-off-by: Paul Hüber <phueber@kernsp.in>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoBluetooth: Properly check L2CAP config option output buffer length
Ben Seri [Sat, 9 Sep 2017 21:15:59 +0000 (23:15 +0200)]
Bluetooth: Properly check L2CAP config option output buffer length

commit e860d2c904d1a9f38a24eb44c9f34b8f915a6ea3 upstream.

Validate the output buffer length for L2CAP config requests and responses
to avoid overflowing the stack buffer used for building the option blocks.

Signed-off-by: Ben Seri <ben@armis.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoscsi: scsi_transport_iscsi: fix the issue that iscsi_if_rx doesn't parse nlmsg properly
Xin Long [Sun, 27 Aug 2017 12:25:26 +0000 (20:25 +0800)]
scsi: scsi_transport_iscsi: fix the issue that iscsi_if_rx doesn't parse nlmsg properly

commit c88f0e6b06f4092995688211a631bb436125d77b upstream.

ChunYu found a kernel crash by syzkaller:

[  651.617875] kasan: CONFIG_KASAN_INLINE enabled
[  651.618217] kasan: GPF could be caused by NULL-ptr deref or user memory access
[  651.618731] general protection fault: 0000 [#1] SMP KASAN
[  651.621543] CPU: 1 PID: 9539 Comm: scsi Not tainted 4.11.0.cov #32
[  651.621938] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  651.622309] task: ffff880117780000 task.stack: ffff8800a3188000
[  651.622762] RIP: 0010:skb_release_data+0x26c/0x590
[...]
[  651.627260] Call Trace:
[  651.629156]  skb_release_all+0x4f/0x60
[  651.629450]  consume_skb+0x1a5/0x600
[  651.630705]  netlink_unicast+0x505/0x720
[  651.632345]  netlink_sendmsg+0xab2/0xe70
[  651.633704]  sock_sendmsg+0xcf/0x110
[  651.633942]  ___sys_sendmsg+0x833/0x980
[  651.637117]  __sys_sendmsg+0xf3/0x240
[  651.638820]  SyS_sendmsg+0x32/0x50
[  651.639048]  entry_SYSCALL_64_fastpath+0x1f/0xc2

It's caused by skb_shared_info at the end of sk_buff was overwritten by
ISCSI_KEVENT_IF_ERROR when parsing nlmsg info from skb in iscsi_if_rx.

During the loop if skb->len == nlh->nlmsg_len and both are sizeof(*nlh),
ev = nlmsg_data(nlh) will acutally get skb_shinfo(SKB) instead and set a
new value to skb_shinfo(SKB)->nr_frags by ev->type.

This patch is to fix it by checking nlh->nlmsg_len properly there to
avoid over accessing sk_buff.

Reported-by: ChunYu Wang <chunwang@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoxfs: XFS_IS_REALTIME_INODE() should be false if no rt device present
Richard Wareing [Tue, 12 Sep 2017 23:09:35 +0000 (09:09 +1000)]
xfs: XFS_IS_REALTIME_INODE() should be false if no rt device present

commit b31ff3cdf540110da4572e3e29bd172087af65cc upstream.

If using a kernel with CONFIG_XFS_RT=y and we set the RHINHERIT flag on
a directory in a filesystem that does not have a realtime device and
create a new file in that directory, it gets marked as a real time file.
When data is written and a fsync is issued, the filesystem attempts to
flush a non-existent rt device during the fsync process.

This results in a crash dereferencing a null buftarg pointer in
xfs_blkdev_issue_flush():

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  IP: xfs_blkdev_issue_flush+0xd/0x20
  .....
  Call Trace:
    xfs_file_fsync+0x188/0x1c0
    vfs_fsync_range+0x3b/0xa0
    do_fsync+0x3d/0x70
    SyS_fsync+0x10/0x20
    do_syscall_64+0x4d/0xb0
    entry_SYSCALL64_slow_path+0x25/0x25

Setting RT inode flags does not require special privileges so any
unprivileged user can cause this oops to occur.  To reproduce, confirm
kernel is compiled with CONFIG_XFS_RT=y and run:

  # mkfs.xfs -f /dev/pmem0
  # mount /dev/pmem0 /mnt/test
  # mkdir /mnt/test/foo
  # xfs_io -c 'chattr +t' /mnt/test/foo
  # xfs_io -f -c 'pwrite 0 5m' -c fsync /mnt/test/foo/bar

Or just run xfstests with MKFS_OPTIONS="-d rtinherit=1" and wait.

Kernels built with CONFIG_XFS_RT=n are not exposed to this bug.

Fixes: f538d4da8d52 ("[XFS] write barrier support")
Signed-off-by: Richard Wareing <rwareing@fb.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agovideo: fbdev: aty: do not leak uninitialized padding in clk to userspace
Vladis Dronov [Mon, 4 Sep 2017 14:00:50 +0000 (16:00 +0200)]
video: fbdev: aty: do not leak uninitialized padding in clk to userspace

commit 8e75f7a7a00461ef6d91797a60b606367f6e344d upstream.

'clk' is copied to a userland with padding byte(s) after 'vclk_post_div'
field unitialized, leaking data from the stack. Fix this ensuring all of
'clk' is initialized to zero.

References: https://github.com/torvalds/linux/pull/441
Reported-by: sohu0106 <sohu0106@126.com>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agokvm: nVMX: Don't allow L2 to access the hardware CR8
Jim Mattson [Tue, 12 Sep 2017 20:02:54 +0000 (13:02 -0700)]
kvm: nVMX: Don't allow L2 to access the hardware CR8

commit 51aa68e7d57e3217192d88ce90fd5b8ef29ec94f upstream.

If L1 does not specify the "use TPR shadow" VM-execution control in
vmcs12, then L0 must specify the "CR8-load exiting" and "CR8-store
exiting" VM-execution controls in vmcs02. Failure to do so will give
the L2 VM unrestricted read/write access to the hardware CR8.

This fixes CVE-2017-12154.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agonl80211: check for the required netlink attributes presence
Vladis Dronov [Tue, 12 Sep 2017 22:21:21 +0000 (00:21 +0200)]
nl80211: check for the required netlink attributes presence

commit e785fa0a164aa11001cba931367c7f94ffaff888 upstream.

nl80211_set_rekey_data() does not check if the required attributes
NL80211_REKEY_DATA_{REPLAY_CTR,KEK,KCK} are present when processing
NL80211_CMD_SET_REKEY_OFFLOAD request. This request can be issued by
users with CAP_NET_ADMIN privilege and may result in NULL dereference
and a system crash. Add a check for the required attributes presence.
This patch is based on the patch by bo Zhang.

This fixes CVE-2017-12153.

References: https://bugzilla.redhat.com/show_bug.cgi?id=1491046
Fixes: e5497d766ad ("cfg80211/nl80211: support GTK rekey offload")
Reported-by: bo Zhang <zhangbo5891001@gmail.com>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agosaa7164: fix double fetch PCIe access condition
Steven Toth [Tue, 6 Jun 2017 12:30:27 +0000 (09:30 -0300)]
saa7164: fix double fetch PCIe access condition

commit 6fb05e0dd32e566facb96ea61a48c7488daa5ac3 upstream.

Avoid a double fetch by reusing the values from the prior transfer.

Originally reported via https://bugzilla.kernel.org/show_bug.cgi?id=195559

Thanks to Pengfei Wang <wpengfeinudt@gmail.com> for reporting.

Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reported-by: Pengfei Wang <wpengfeinudt@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agosaa7164: fix sparse warnings
Hans Verkuil [Fri, 7 Nov 2014 14:39:46 +0000 (11:39 -0300)]
saa7164: fix sparse warnings

commit 065e1477d277174242e73e7334c717b840d0693f upstream.

Fix many sparse warnings:

drivers/media/pci/saa7164/saa7164-core.c:97:18: warning: cast removes address space of expression
drivers/media/pci/saa7164/saa7164-core.c:122:31: warning: cast removes address space of expression
drivers/media/pci/saa7164/saa7164-core.c:122:31: warning: incorrect type in initializer (different address spaces)
drivers/media/pci/saa7164/saa7164-core.c:122:31:    expected unsigned char [noderef] [usertype] <asn:2>*bufcpu
drivers/media/pci/saa7164/saa7164-core.c:122:31:    got unsigned char [usertype] *<noident>
drivers/media/pci/saa7164/saa7164-core.c:282:44: warning: cast removes address space of expression
drivers/media/pci/saa7164/saa7164-core.c:286:38: warning: cast removes address space of expression
drivers/media/pci/saa7164/saa7164-core.c:286:35: warning: incorrect type in assignment (different address spaces)
drivers/media/pci/saa7164/saa7164-core.c:286:35:    expected unsigned char [noderef] [usertype] <asn:2>*p
drivers/media/pci/saa7164/saa7164-core.c:286:35:    got unsigned char [usertype] *<noident>
drivers/media/pci/saa7164/saa7164-core.c:352:44: warning: cast removes address space of expression
drivers/media/pci/saa7164/saa7164-core.c:527:53: warning: cast removes address space of expression
drivers/media/pci/saa7164/saa7164-core.c:129:30: warning: dereference of noderef expression
drivers/media/pci/saa7164/saa7164-core.c:133:38: warning: dereference of noderef expression
drivers/media/pci/saa7164/saa7164-core.c:133:72: warning: dereference of noderef expression
drivers/media/pci/saa7164/saa7164-core.c:134:35: warning: dereference of noderef expression
drivers/media/pci/saa7164/saa7164-core.c:287:61: warning: dereference of noderef expression
drivers/media/pci/saa7164/saa7164-core.c:288:65: warning: dereference of noderef expression
drivers/media/pci/saa7164/saa7164-core.c:289:65: warning: dereference of noderef expression
drivers/media/pci/saa7164/saa7164-core.c:290:65: warning: dereference of noderef expression
drivers/media/pci/saa7164/saa7164-core.c:291:65: warning: dereference of noderef expression
drivers/media/pci/saa7164/saa7164-core.c:292:65: warning: dereference of noderef expression
drivers/media/pci/saa7164/saa7164-core.c:293:65: warning: dereference of noderef expression
drivers/media/pci/saa7164/saa7164-core.c:294:65: warning: dereference of noderef expression
drivers/media/pci/saa7164/saa7164-fw.c:548:52: warning: incorrect type in argument 5 (different address spaces)
drivers/media/pci/saa7164/saa7164-fw.c:548:52:    expected unsigned char [usertype] *dst
drivers/media/pci/saa7164/saa7164-fw.c:548:52:    got unsigned char [noderef] [usertype] <asn:2>*
drivers/media/pci/saa7164/saa7164-fw.c:579:44: warning: incorrect type in argument 5 (different address spaces)
drivers/media/pci/saa7164/saa7164-fw.c:579:44:    expected unsigned char [usertype] *dst
drivers/media/pci/saa7164/saa7164-fw.c:579:44:    got unsigned char [noderef] [usertype] <asn:2>*
drivers/media/pci/saa7164/saa7164-fw.c:597:44: warning: incorrect type in argument 5 (different address spaces)
drivers/media/pci/saa7164/saa7164-fw.c:597:44:    expected unsigned char [usertype] *dst
drivers/media/pci/saa7164/saa7164-fw.c:597:44:    got unsigned char [noderef] [usertype] <asn:2>*
drivers/media/pci/saa7164/saa7164-bus.c:36:36: warning: cast removes address space of expression
drivers/media/pci/saa7164/saa7164-bus.c:41:36: warning: cast removes address space of expression
drivers/media/pci/saa7164/saa7164-bus.c:151:19: warning: incorrect type in assignment (different base types)
drivers/media/pci/saa7164/saa7164-bus.c:151:19:    expected unsigned short [unsigned] [usertype] size
drivers/media/pci/saa7164/saa7164-bus.c:151:19:    got restricted __le16 [usertype] <noident>
drivers/media/pci/saa7164/saa7164-bus.c:152:22: warning: incorrect type in assignment (different base types)
drivers/media/pci/saa7164/saa7164-bus.c:152:22:    expected unsigned int [unsigned] [usertype] command
drivers/media/pci/saa7164/saa7164-bus.c:152:22:    got restricted __le32 [usertype] <noident>
drivers/media/pci/saa7164/saa7164-bus.c:153:30: warning: incorrect type in assignment (different base types)
drivers/media/pci/saa7164/saa7164-bus.c:153:30:    expected unsigned short [unsigned] [usertype] controlselector
drivers/media/pci/saa7164/saa7164-bus.c:153:30:    got restricted __le16 [usertype] <noident>
drivers/media/pci/saa7164/saa7164-bus.c:172:20: warning: cast to restricted __le32
drivers/media/pci/saa7164/saa7164-bus.c:173:20: warning: cast to restricted __le32
drivers/media/pci/saa7164/saa7164-bus.c:206:28: warning: cast to restricted __le32
drivers/media/pci/saa7164/saa7164-bus.c:287:9: warning: incorrect type in argument 1 (different base types)
drivers/media/pci/saa7164/saa7164-bus.c:287:9:    expected unsigned int [unsigned] val
drivers/media/pci/saa7164/saa7164-bus.c:287:9:    got restricted __le32 [usertype] <noident>
drivers/media/pci/saa7164/saa7164-bus.c:339:20: warning: cast to restricted __le32
drivers/media/pci/saa7164/saa7164-bus.c:340:20: warning: cast to restricted __le32
drivers/media/pci/saa7164/saa7164-bus.c:463:9: warning: incorrect type in argument 1 (different base types)
drivers/media/pci/saa7164/saa7164-bus.c:463:9:    expected unsigned int [unsigned] val
drivers/media/pci/saa7164/saa7164-bus.c:463:9:    got restricted __le32 [usertype] <noident>
drivers/media/pci/saa7164/saa7164-bus.c:466:21: warning: cast to restricted __le16
drivers/media/pci/saa7164/saa7164-bus.c:467:24: warning: cast to restricted __le32
drivers/media/pci/saa7164/saa7164-bus.c:468:32: warning: cast to restricted __le16
drivers/media/pci/saa7164/saa7164-buffer.c:122:18: warning: incorrect type in assignment (different address spaces)
drivers/media/pci/saa7164/saa7164-buffer.c:122:18:    expected unsigned long long [noderef] [usertype] <asn:2>*cpu
drivers/media/pci/saa7164/saa7164-buffer.c:122:18:    got void *
drivers/media/pci/saa7164/saa7164-buffer.c:127:21: warning: incorrect type in assignment (different address spaces)
drivers/media/pci/saa7164/saa7164-buffer.c:127:21:    expected unsigned long long [noderef] [usertype] <asn:2>*pt_cpu
drivers/media/pci/saa7164/saa7164-buffer.c:127:21:    got void *
drivers/media/pci/saa7164/saa7164-buffer.c:134:20: warning: cast removes address space of expression
drivers/media/pci/saa7164/saa7164-buffer.c:156:63: warning: incorrect type in argument 3 (different address spaces)
drivers/media/pci/saa7164/saa7164-buffer.c:156:63:    expected void *vaddr
drivers/media/pci/saa7164/saa7164-buffer.c:156:63:    got unsigned long long [noderef] [usertype] <asn:2>*cpu
drivers/media/pci/saa7164/saa7164-buffer.c:179:57: warning: incorrect type in argument 3 (different address spaces)
drivers/media/pci/saa7164/saa7164-buffer.c:179:57:    expected void *vaddr
drivers/media/pci/saa7164/saa7164-buffer.c:179:57:    got unsigned long long [noderef] [usertype] <asn:2>*cpu
drivers/media/pci/saa7164/saa7164-buffer.c:180:56: warning: incorrect type in argument 3 (different address spaces)
drivers/media/pci/saa7164/saa7164-buffer.c:180:56:    expected void *vaddr
drivers/media/pci/saa7164/saa7164-buffer.c:180:56:    got unsigned long long [noderef] [usertype] <asn:2>*pt_cpu
drivers/media/pci/saa7164/saa7164-buffer.c:84:17: warning: dereference of noderef expression
drivers/media/pci/saa7164/saa7164-buffer.c:147:31: warning: dereference of noderef expression
drivers/media/pci/saa7164/saa7164-buffer.c:148:17: warning: dereference of noderef expression

Most are caused by pointers marked as __iomem when they aren't or not marked as
__iomem when they should.

Also note that readl/writel already do endian conversion, so there is no need to
do it again.

saa7164_bus_set/get were a bit tricky: you have to make sure the msg endian
conversion is done at the right time, and that the code isn't using fields that
are still little endian instead of cpu-endianness.

The approach chosen is to convert just before writing to the ring buffer
and to convert it back right after reading from the ring buffer.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Cc: Steven Toth <stoth@kernellabs.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agobtrfs: preserve i_mode if __btrfs_set_acl() fails
Ernesto A. Fernández [Wed, 2 Aug 2017 06:18:27 +0000 (03:18 -0300)]
btrfs: preserve i_mode if __btrfs_set_acl() fails

commit d7d824966530acfe32b94d1ed672e6fe1638cd68 upstream.

When changing a file's acl mask, btrfs_set_acl() will first set the
group bits of i_mode to the value of the mask, and only then set the
actual extended attribute representing the new acl.

If the second part fails (due to lack of space, for example) and the
file had no acl attribute to begin with, the system will from now on
assume that the mask permission bits are actual group permission bits,
potentially granting access to the wrong users.

Prevent this by restoring the original mode bits if __btrfs_set_acl
fails.

Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agof2fs: preserve i_mode if __f2fs_set_acl() fails
Ernesto A. Fernández [Mon, 24 Jul 2017 01:32:54 +0000 (22:32 -0300)]
f2fs: preserve i_mode if __f2fs_set_acl() fails

commit 14af20fcb1833dd776822361891963c90f7b0262 upstream.

When changing a file's acl mask, __f2fs_set_acl() will first set the
group bits of i_mode to the value of the mask, and only then set the
actual extended attribute representing the new acl.

If the second part fails (due to lack of space, for example) and the
file had no acl attribute to begin with, the system will from now on
assume that the mask permission bits are actual group permission bits,
potentially granting access to the wrong users.

Prevent this by only changing the inode mode after the acl has been set.

Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agof2fs: Don't clear SGID when inheriting ACLs
Jaegeuk Kim [Tue, 11 Jul 2017 21:56:49 +0000 (14:56 -0700)]
f2fs: Don't clear SGID when inheriting ACLs

commit c925dc162f770578ff4a65ec9b08270382dba9e6 upstream.

This patch copies commit b7f8a09f80:
"btrfs: Don't clear SGID when inheriting ACLs" written by Jan.

Fixes: 073931017b49d9458aa351605b43a7e34598caef
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoext4: Don't clear SGID when inheriting ACLs
Jan Kara [Mon, 31 Jul 2017 03:33:01 +0000 (23:33 -0400)]
ext4: Don't clear SGID when inheriting ACLs

commit a3bb2d5587521eea6dab2d05326abb0afb460abd upstream.

When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
set, DIR1 is expected to have SGID bit set (and owning group equal to
the owning group of 'DIR0'). However when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
'DIR1' to get cleared if user is not member of the owning group.

Fix the problem by moving posix_acl_update_mode() out of
__ext4_set_acl() into ext4_set_acl(). That way the function will not be
called when inheriting ACLs which is what we want as it prevents SGID
bit clearing and the mode has been properly set by posix_acl_create()
anyway.

Fixes: 073931017b49d9458aa351605b43a7e34598caef
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
[bwh: Backported to 3.16:
 - Keep using ext4_current_time()
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoext4: preserve i_mode if __ext4_set_acl() fails
Ernesto A. Fernández [Mon, 31 Jul 2017 02:43:41 +0000 (22:43 -0400)]
ext4: preserve i_mode if __ext4_set_acl() fails

commit 397e434176bb62bc6068d2210af1d876c6212a7e upstream.

When changing a file's acl mask, __ext4_set_acl() will first set the group
bits of i_mode to the value of the mask, and only then set the actual
extended attribute representing the new acl.

If the second part fails (due to lack of space, for example) and the file
had no acl attribute to begin with, the system will from now on assume
that the mask permission bits are actual group permission bits, potentially
granting access to the wrong users.

Prevent this by only changing the inode mode after the acl has been set.

Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
[bwh: Backported to 3.16: keep using ext4_current_time()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agojfs: preserve i_mode if __jfs_set_acl() fails
Ernesto A. Fernández [Wed, 12 Jul 2017 09:55:35 +0000 (06:55 -0300)]
jfs: preserve i_mode if __jfs_set_acl() fails

commit f070e5ac9bc7de71c34402048ce5526dccbd347c upstream.

When changing a file's acl mask, __jfs_set_acl() will first set the group
bits of i_mode to the value of the mask, and only then set the actual
extended attribute representing the new acl.

If the second part fails (due to lack of space, for example) and the file
had no acl attribute to begin with, the system will from now on assume
that the mask permission bits are actual group permission bits, potentially
granting access to the wrong users.

Prevent this by only changing the inode mode after the acl has been set.

Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
[bwh: Backported to 3.16: keep using CURRENT_TIME]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agojfs: Don't clear SGID when inheriting ACLs
Jan Kara [Thu, 22 Jun 2017 13:31:10 +0000 (15:31 +0200)]
jfs: Don't clear SGID when inheriting ACLs

commit 9bcf66c72d726322441ec82962994e69157613e4 upstream.

When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
set, DIR1 is expected to have SGID bit set (and owning group equal to
the owning group of 'DIR0'). However when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
'DIR1' to get cleared if user is not member of the owning group.

Fix the problem by moving posix_acl_update_mode() out of
__jfs_set_acl() into jfs_set_acl(). That way the function will not be
called when inheriting ACLs which is what we want as it prevents SGID
bit clearing and the mode has been properly set by posix_acl_create()
anyway.

Fixes: 073931017b49d9458aa351605b43a7e34598caef
CC: jfs-discussion@lists.sourceforge.net
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
[bwh: Backported to 3.16:
 - Keep using CURRENT_TIME
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoreiserfs: preserve i_mode if __reiserfs_set_acl() fails
Ernesto A. Fernández [Mon, 17 Jul 2017 16:42:41 +0000 (18:42 +0200)]
reiserfs: preserve i_mode if __reiserfs_set_acl() fails

commit fcea8aed91f53b51f9b943dc01f12d8aa666c720 upstream.

When changing a file's acl mask, reiserfs_set_acl() will first set the
group bits of i_mode to the value of the mask, and only then set the
actual extended attribute representing the new acl.

If the second part fails (due to lack of space, for example) and the
file had no acl attribute to begin with, the system will from now on
assume that the mask permission bits are actual group permission bits,
potentially granting access to the wrong users.

Prevent this by only changing the inode mode after the acl has been set.

Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoreiserfs: Don't clear SGID when inheriting ACLs
Jan Kara [Thu, 22 Jun 2017 07:32:49 +0000 (09:32 +0200)]
reiserfs: Don't clear SGID when inheriting ACLs

commit 6883cd7f68245e43e91e5ee583b7550abf14523f upstream.

When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
set, DIR1 is expected to have SGID bit set (and owning group equal to
the owning group of 'DIR0'). However when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
'DIR1' to get cleared if user is not member of the owning group.

Fix the problem by moving posix_acl_update_mode() out of
__reiserfs_set_acl() into reiserfs_set_acl(). That way the function will
not be called when inheriting ACLs which is what we want as it prevents
SGID bit clearing and the mode has been properly set by
posix_acl_create() anyway.

Fixes: 073931017b49d9458aa351605b43a7e34598caef
CC: reiserfs-devel@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agohfsplus: Don't clear SGID when inheriting ACLs
Jan Kara [Wed, 21 Jun 2017 13:02:47 +0000 (15:02 +0200)]
hfsplus: Don't clear SGID when inheriting ACLs

commit 84969465ddc4f8aeb3b993123b571aa01c5f2683 upstream.

When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
set, DIR1 is expected to have SGID bit set (and owning group equal to
the owning group of 'DIR0'). However when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
'DIR1' to get cleared if user is not member of the owning group.

Fix the problem by creating __hfsplus_set_posix_acl() function that does
not call posix_acl_update_mode() and use it when inheriting ACLs. That
prevents SGID bit clearing and the mode has been properly set by
posix_acl_create() anyway.

Fixes: 073931017b49d9458aa351605b43a7e34598caef
Signed-off-by: Jan Kara <jack@suse.cz>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoext3: preserve i_mode if ext2_set_acl() fails
Ben Hutchings [Sun, 8 Oct 2017 13:48:44 +0000 (14:48 +0100)]
ext3: preserve i_mode if ext2_set_acl() fails

Based on Ernesto A. Fernández's fix for ext2 (commit fe26569eb919), from
which the following description is taken:

> When changing a file's acl mask, ext2_set_acl() will first set the group
> bits of i_mode to the value of the mask, and only then set the actual
> extended attribute representing the new acl.
>
> If the second part fails (due to lack of space, for example) and the file
> had no acl attribute to begin with, the system will from now on assume
> that the mask permission bits are actual group permission bits, potentially
> granting access to the wrong users.
>
> Prevent this by only changing the inode mode after the acl has been set.

Cc: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoext3: Don't clear SGID when inheriting ACLs
Ben Hutchings [Fri, 6 Oct 2017 02:18:40 +0000 (03:18 +0100)]
ext3: Don't clear SGID when inheriting ACLs

Based on Jan Kara's fix for ext2 (commit a992f2d38e4c), from which the
following description is taken:

> When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
> set, DIR1 is expected to have SGID bit set (and owning group equal to
> the owning group of 'DIR0'). However when 'DIR0' also has some default
> ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
> 'DIR1' to get cleared if user is not member of the owning group.

Fix the problem by moving the posix_acl_update_mode() call up from
__ext3_set_acl() into ext3_set_acl().

Fixes: 073931017b49 ("posix_acl: Clear SGID bit when setting file permissions")
Cc: linux-ext4@vger.kernel.org
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoext2: preserve i_mode if ext2_set_acl() fails
Ernesto A. Fernández [Wed, 12 Jul 2017 09:54:19 +0000 (06:54 -0300)]
ext2: preserve i_mode if ext2_set_acl() fails

commit fe26569eb9197d845d73abe7dd20f603d79eb031 upstream.

When changing a file's acl mask, ext2_set_acl() will first set the group
bits of i_mode to the value of the mask, and only then set the actual
extended attribute representing the new acl.

If the second part fails (due to lack of space, for example) and the file
had no acl attribute to begin with, the system will from now on assume
that the mask permission bits are actual group permission bits, potentially
granting access to the wrong users.

Prevent this by only changing the inode mode after the acl has been set.

[JK: Rebased on top of "ext2: Don't clear SGID when inheriting ACLs"]
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoext2: Don't clear SGID when inheriting ACLs
Jan Kara [Wed, 21 Jun 2017 12:34:15 +0000 (14:34 +0200)]
ext2: Don't clear SGID when inheriting ACLs

commit a992f2d38e4ce17b8c7d1f7f67b2de0eebdea069 upstream.

When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
set, DIR1 is expected to have SGID bit set (and owning group equal to
the owning group of 'DIR0'). However when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
'DIR1' to get cleared if user is not member of the owning group.

Fix the problem by creating __ext2_set_acl() function that does not call
posix_acl_update_mode() and use it when inheriting ACLs. That prevents
SGID bit clearing and the mode has been properly set by
posix_acl_create() anyway.

Fixes: 073931017b49d9458aa351605b43a7e34598caef
CC: linux-ext4@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
[bwh: Backported to 3.16: keep using CURRENT_TIME_SEC]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agomm: fix overflow check in expand_upwards()
Helge Deller [Fri, 14 Jul 2017 21:49:38 +0000 (14:49 -0700)]
mm: fix overflow check in expand_upwards()

commit 37511fb5c91db93d8bd6e3f52f86e5a7ff7cfcdf upstream.

Jörn Engel noticed that the expand_upwards() function might not return
-ENOMEM in case the requested address is (unsigned long)-PAGE_SIZE and
if the architecture didn't defined TASK_SIZE as multiple of PAGE_SIZE.

Affected architectures are arm, frv, m68k, blackfin, h8300 and xtensa
which all define TASK_SIZE as 0xffffffff, but since none of those have
an upwards-growing stack we currently have no actual issue.

Nevertheless let's fix this just in case any of the architectures with
an upward-growing stack (currently parisc, metag and partly ia64) define
TASK_SIZE similar.

Link: http://lkml.kernel.org/r/20170702192452.GA11868@p100.box
Fixes: bd726c90b6b8 ("Allow stack to grow up to address space limit")
Signed-off-by: Helge Deller <deller@gmx.de>
Reported-by: Jörn Engel <joern@purestorage.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoubifs: Don't leak kernel memory to the MTD
Richard Weinberger [Fri, 16 Jun 2017 14:21:44 +0000 (16:21 +0200)]
ubifs: Don't leak kernel memory to the MTD

commit 4acadda74ff8b949c448c0282765ae747e088c87 upstream.

When UBIFS prepares data structures which will be written to the MTD it
ensues that their lengths are multiple of 8. Since it uses kmalloc() the
padded bytes are left uninitialized and we leak a few bytes of kernel
memory to the MTD.
To make sure that all bytes are initialized, let's switch to kzalloc().
Kzalloc() is fine in this case because the buffers are not huge and in
the IO path the performance bottleneck is anyway the MTD.

Fixes: 1e51764a3c2a ("UBIFS: add new flash file system")
Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
[bwh: Backported to 3.16:
 - Drop change in ubifs_jnl_xrename()
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoubifs: Correctly evict xattr inodes
Richard Weinberger [Tue, 16 May 2017 22:20:27 +0000 (00:20 +0200)]
ubifs: Correctly evict xattr inodes

commit 272eda8298dc82eb411ece82bbb2c62911087b24 upstream.

UBIFS handles extended attributes just like files, as consequence of
that, they also have inodes.
Therefore UBIFS does all the inode machinery also for xattrs. Since new
inodes have i_nlink of 1, a file or xattr inode will be evicted
if i_nlink goes down to 0 after an unlink. UBIFS assumes this model also
for xattrs, which is not correct.
One can create a file "foo" with xattr "user.test". By reading
"user.test" an inode will be created, and by deleting "user.test" it
will get evicted later. The assumption breaks if the file "foo", which
hosts the xattrs, will be removed. VFS nor UBIFS does not remove each
xattr via ubifs_xattr_remove(), it just removes the host inode from
the TNC and all underlying xattr nodes too and the inode will remain
in the cache and wastes memory.

To solve this problem, remove xattr inodes from the VFS inode cache in
ubifs_xattr_remove() to make sure that they get evicted.

Fixes: 1e51764a3c2ac05a ("UBIFS: add new flash file system")
Signed-off-by: Richard Weinberger <richard@nod.at>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agosunrpc: use constant time memory comparison for mac
Jason A. Donenfeld [Sat, 10 Jun 2017 02:59:07 +0000 (04:59 +0200)]
sunrpc: use constant time memory comparison for mac

commit 15a8b93fd5690de017ce665382ea45e5d61811a4 upstream.

Otherwise, we enable a MAC forgery via timing attack.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agosysctl: fix lax sysctl_check_table() sanity check
Luis R. Rodriguez [Wed, 12 Jul 2017 21:33:27 +0000 (14:33 -0700)]
sysctl: fix lax sysctl_check_table() sanity check

commit 89c5b53b16bf577079d4f0311406dbea3c71202c upstream.

Patch series "sysctl: few fixes", v5.

I've been working on making kmod more deterministic, and as I did that I
couldn't help but notice a few issues with sysctl.  My end goal was just
to fix unsigned int support, which back then was completely broken.
Liping Zhang has sent up small atomic fixes, however it still missed yet
one more fix and Alexey Dobriyan had also suggested to just drop array
support given its complexity.

I have inspected array support using Coccinelle and indeed its not that
popular, so if in fact we can avoid it for new interfaces, I agree its
best.

I did develop a sysctl stress driver but will hold that off for another
series.

This patch (of 5):

Commit 7c60c48f58a7 ("sysctl: Improve the sysctl sanity checks")
improved sanity checks considerbly, however the enhancements on
sysctl_check_table() meant adding a functional change so that only the
last table entry's sanity error is propagated.  It also changed the way
errors were propagated so that each new check reset the err value, this
means only last sanity check computed is used for an error.  This has
been in the kernel since v3.4 days.

Fix this by carrying on errors from previous checks and iterations as we
traverse the table and ensuring we keep any error from previous checks.
We keep iterating on the table even if an error is found so we can
complain for all errors found in one shot.  This works as -EINVAL is
always returned on error anyway, and the check for error is any non-zero
value.

Fixes: 7c60c48f58a7 ("sysctl: Improve the sysctl sanity checks")
Link: http://lkml.kernel.org/r/20170519033554.18592-2-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoInput: i8042 - fix crash at boot time
Chen Hong [Sun, 2 Jul 2017 22:11:10 +0000 (15:11 -0700)]
Input: i8042 - fix crash at boot time

commit 340d394a789518018f834ff70f7534fc463d3226 upstream.

The driver checks port->exists twice in i8042_interrupt(), first when
trying to assign temporary "serio" variable, and second time when deciding
whether it should call serio_interrupt(). The value of port->exists may
change between the 2 checks, and we may end up calling serio_interrupt()
with a NULL pointer:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
IP: [<ffffffff8150feaf>] _spin_lock_irqsave+0x1f/0x40
PGD 0
Oops: 0002 [#1] SMP
last sysfs file:
CPU 0
Modules linked in:

Pid: 1, comm: swapper Not tainted 2.6.32-358.el6.x86_64 #1 QEMU Standard PC (i440FX + PIIX, 1996)
RIP: 0010:[<ffffffff8150feaf>]  [<ffffffff8150feaf>] _spin_lock_irqsave+0x1f/0x40
RSP: 0018:ffff880028203cc0  EFLAGS: 00010082
RAX: 0000000000010000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000282 RSI: 0000000000000098 RDI: 0000000000000050
RBP: ffff880028203cc0 R08: ffff88013e79c000 R09: ffff880028203ee0
R10: 0000000000000298 R11: 0000000000000282 R12: 0000000000000050
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000098
FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000050 CR3: 0000000001a85000 CR4: 00000000001407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff88013e79c000, task ffff88013e79b500)
Stack:
ffff880028203d00 ffffffff813de186 ffffffffffffff02 0000000000000000
<d> 0000000000000000 0000000000000000 0000000000000000 0000000000000098
<d> ffff880028203d70 ffffffff813e0162 ffff880028203d20 ffffffff8103b8ac
Call Trace:
<IRQ>
 [<ffffffff813de186>] serio_interrupt+0x36/0xa0
[<ffffffff813e0162>] i8042_interrupt+0x132/0x3a0
[<ffffffff8103b8ac>] ? kvm_clock_read+0x1c/0x20
[<ffffffff8103b8b9>] ? kvm_clock_get_cycles+0x9/0x10
[<ffffffff810e1640>] handle_IRQ_event+0x60/0x170
[<ffffffff8103b154>] ? kvm_guest_apic_eoi_write+0x44/0x50
[<ffffffff810e3d8e>] handle_edge_irq+0xde/0x180
[<ffffffff8100de89>] handle_irq+0x49/0xa0
[<ffffffff81516c8c>] do_IRQ+0x6c/0xf0
[<ffffffff8100b9d3>] ret_from_intr+0x0/0x11
[<ffffffff81076f63>] ? __do_softirq+0x73/0x1e0
[<ffffffff8109b75b>] ? hrtimer_interrupt+0x14b/0x260
[<ffffffff8100c1cc>] ? call_softirq+0x1c/0x30
[<ffffffff8100de05>] ? do_softirq+0x65/0xa0
[<ffffffff81076d95>] ? irq_exit+0x85/0x90
[<ffffffff81516d80>] ? smp_apic_timer_interrupt+0x70/0x9b
[<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20

To avoid the issue let's change the second check to test whether serio is
NULL or not.

Also, let's take i8042_lock in i8042_start() and i8042_stop() instead of
trying to be overly smart and using memory barriers.

Signed-off-by: Chen Hong <chenhong3@huawei.com>
[dtor: take lock in i8042_start()/i8042_stop()]
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoPM / QoS: return -EINVAL for bogus strings
Dan Carpenter [Mon, 10 Jul 2017 07:21:40 +0000 (10:21 +0300)]
PM / QoS: return -EINVAL for bogus strings

commit 2ca30331c156ca9e97643ad05dd8930b8fe78b01 upstream.

In the current code, if the user accidentally writes a bogus command to
this sysfs file, then we set the latency tolerance to an uninitialized
variable.

Fixes: 2d984ad132a8 (PM / QoS: Introcuce latency tolerance device PM QoS type)
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agopowerpc/64: Fix atomic64_inc_not_zero() to return an int
Michael Ellerman [Tue, 11 Jul 2017 12:10:54 +0000 (22:10 +1000)]
powerpc/64: Fix atomic64_inc_not_zero() to return an int

commit 01e6a61aceb82e13bec29502a8eb70d9574f97ad upstream.

Although it's not documented anywhere, there is an expectation that
atomic64_inc_not_zero() returns a result which fits in an int. This is
the behaviour implemented on all arches except powerpc.

This has caused at least one bug in practice, in the percpu-refcount
code, where the long result from our atomic64_inc_not_zero() was
truncated to an int leading to lost references and stuck systems. That
was worked around in that code in commit 966d2b04e070 ("percpu-refcount:
fix reference leak during percpu-atomic transition").

To the best of my grepping abilities there are no other callers
in-tree which truncate the value, but we should fix it anyway. Because
the breakage is subtle and potentially very harmful I'm also tagging
it for stable.

Code generation is largely unaffected because in most cases the
callers are just using the result for a test anyway. In particular the
case of fget() that was mentioned in commit a6cf7ed5119f
("powerpc/atomic: Implement atomic*_inc_not_zero") generates exactly
the same code.

Fixes: a6cf7ed5119f ("powerpc/atomic: Implement atomic*_inc_not_zero")
Noticed-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agocrypto: atmel - only treat EBUSY as transient if backlog
Gilad Ben-Yossef [Wed, 28 Jun 2017 07:22:03 +0000 (10:22 +0300)]
crypto: atmel - only treat EBUSY as transient if backlog

commit 1606043f214f912a52195293614935811a6e3e53 upstream.

The Atmel SHA driver was treating -EBUSY as indication of queueing
to backlog without checking that backlog is enabled for the request.

Fix it by checking request flags.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agocrypto: caam - fix signals handling
Horia Geantă [Fri, 7 Jul 2017 13:57:06 +0000 (16:57 +0300)]
crypto: caam - fix signals handling

commit 7459e1d25ffefa2b1be799477fcc1f6c62f6cec7 upstream.

Driver does not properly handle the case when signals interrupt
wait_for_completion_interruptible():
-it does not check for return value
-completion structure is allocated on stack; in case a signal interrupts
the sleep, it will go out of scope, causing the worker thread
(caam_jr_dequeue) to fail when it accesses it

wait_for_completion_interruptible() is replaced with uninterruptable
wait_for_completion().
We choose to block all signals while waiting for I/O (device executing
the split key generation job descriptor) since the alternative - in
order to have a deterministic device state - would be to flush the job
ring (aborting *all* in-progress jobs).

Fixes: 045e36780f115 ("crypto: caam - ahash hmac support")
Fixes: 4c1ec1f930154 ("crypto: caam - refactor key_gen, sg")
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agopowerpc: Fix emulation of mfocrf in emulate_step()
Anton Blanchard [Wed, 14 Jun 2017 23:46:39 +0000 (09:46 +1000)]
powerpc: Fix emulation of mfocrf in emulate_step()

commit 64e756c55aa46fc18fd53e8f3598b73b528d8637 upstream.

From POWER4 onwards, mfocrf() only places the specified CR field into
the destination GPR, and the rest of it is set to 0. The PowerPC AS
from version 3.0 now requires this behaviour.

The emulation code currently puts the entire CR into the destination GPR.
Fix it.

Fixes: 6888199f7fe5 ("[POWERPC] Emulate more instructions in software")
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoiscsi-target: Add login_keys_workaround attribute for non RFC initiators
Nicholas Bellinger [Fri, 7 Jul 2017 21:45:49 +0000 (14:45 -0700)]
iscsi-target: Add login_keys_workaround attribute for non RFC initiators

commit 138d351eefb727ab9e41a3dc5f112ceb4f6e59f2 upstream.

This patch re-introduces part of a long standing login workaround that
was recently dropped by:

  commit 1c99de981f30b3e7868b8d20ce5479fa1c0fea46
  Author: Nicholas Bellinger <nab@linux-iscsi.org>
  Date:   Sun Apr 2 13:36:44 2017 -0700

      iscsi-target: Drop work-around for legacy GlobalSAN initiator

Namely, the workaround for FirstBurstLength ended up being required by
Mellanox Flexboot PXE boot ROMs as reported by Robert.

So this patch re-adds the work-around for FirstBurstLength within
iscsi_check_proposer_for_optional_reply(), and makes the key optional
to respond when the initiator does not propose, nor respond to it.

Also as requested by Arun, this patch introduces a new TPG attribute
named 'login_keys_workaround' that controls the use of both the
FirstBurstLength workaround, as well as the two other existing
workarounds for gPXE iSCSI boot client.

By default, the workaround is enabled with login_keys_workaround=1,
since Mellanox FlexBoot requires it, and Arun has verified the Qlogic
MSFT initiator already proposes FirstBurstLength, so it's uneffected
by this re-adding this part of the original work-around.

Reported-by: Robert LeBlanc <robert@leblancnet.us>
Cc: Robert LeBlanc <robert@leblancnet.us>
Reviewed-by: Arun Easi <arun.easi@cavium.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
[bwh: Backported to 3.16:
 - Use DEF_TPG_ATTRIB() + TPG_ATTR() to define the attribute
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoMIPS: Negate error syscall return in trace
James Hogan [Thu, 29 Jun 2017 09:12:34 +0000 (10:12 +0100)]
MIPS: Negate error syscall return in trace

commit 4f32a39d49b25eaa66d2420f1f03d371ea4cd906 upstream.

The sys_exit trace event takes a single return value for the system
call, which MIPS passes the value of the $v0 (result) register, however
MIPS returns positive error codes in $v0 with $a3 specifying that $v0
contains an error code. As a result erroring system calls are traced
returning positive error numbers that can't always be distinguished from
success.

Use regs_return_value() to negate the error code if $a3 is set.

Fixes: 1d7bf993e073 ("MIPS: ftrace: Add support for syscall tracepoints.")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/16651/
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agopowerpc/asm: Mark cr0 as clobbered in mftb()
Oliver O'Halloran [Thu, 6 Jul 2017 08:46:43 +0000 (18:46 +1000)]
powerpc/asm: Mark cr0 as clobbered in mftb()

commit 2400fd822f467cb4c886c879d8ad99feac9cf319 upstream.

The workaround for the CELL timebase bug does not correctly mark cr0 as
being clobbered. This means GCC doesn't know that the asm block changes cr0 and
might leave the result of an unrelated comparison in cr0 across the block, which
we then trash, leading to basically random behaviour.

Fixes: 859deea949c3 ("[POWERPC] Cell timebase bug workaround")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[mpe: Tweak change log and flag for stable]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agofs/dcache.c: fix spin lockup issue on nlru->lock
Sahitya Tummala [Mon, 10 Jul 2017 22:50:00 +0000 (15:50 -0700)]
fs/dcache.c: fix spin lockup issue on nlru->lock

commit b17c070fb624cf10162cf92ea5e1ec25cd8ac176 upstream.

__list_lru_walk_one() acquires nlru spin lock (nlru->lock) for longer
duration if there are more number of items in the lru list.  As per the
current code, it can hold the spin lock for upto maximum UINT_MAX
entries at a time.  So if there are more number of items in the lru
list, then "BUG: spinlock lockup suspected" is observed in the below
path:

  spin_bug+0x90
  do_raw_spin_lock+0xfc
  _raw_spin_lock+0x28
  list_lru_add+0x28
  dput+0x1c8
  path_put+0x20
  terminate_walk+0x3c
  path_lookupat+0x100
  filename_lookup+0x6c
  user_path_at_empty+0x54
  SyS_faccessat+0xd0
  el0_svc_naked+0x24

This nlru->lock is acquired by another CPU in this path -

  d_lru_shrink_move+0x34
  dentry_lru_isolate_shrink+0x48
  __list_lru_walk_one.isra.10+0x94
  list_lru_walk_node+0x40
  shrink_dcache_sb+0x60
  do_remount_sb+0xbc
  do_emergency_remount+0xb0
  process_one_work+0x228
  worker_thread+0x2e0
  kthread+0xf4
  ret_from_fork+0x10

Fix this lockup by reducing the number of entries to be shrinked from
the lru list to 1024 at once.  Also, add cond_resched() before
processing the lru list again.

Link: http://marc.info/?t=149722864900001&r=1&w=2
Link: http://lkml.kernel.org/r/1498707575-2472-1-git-send-email-stummala@codeaurora.org
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Suggested-by: Jan Kara <jack@suse.cz>
Suggested-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Alexander Polakov <apolyakov@beget.ru>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agomm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack
Michal Hocko [Mon, 10 Jul 2017 22:49:51 +0000 (15:49 -0700)]
mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack

commit 561b5e0709e4a248c67d024d4d94b6e31e3edf2f upstream.

Commit 1be7107fbe18 ("mm: larger stack guard gap, between vmas") has
introduced a regression in some rust and Java environments which are
trying to implement their own stack guard page.  They are punching a new
MAP_FIXED mapping inside the existing stack Vma.

This will confuse expand_{downwards,upwards} into thinking that the
stack expansion would in fact get us too close to an existing non-stack
vma which is a correct behavior wrt safety.  It is a real regression on
the other hand.

Let's work around the problem by considering PROT_NONE mapping as a part
of the stack.  This is a gros hack but overflowing to such a mapping
would trap anyway an we only can hope that usespace knows what it is
doing and handle it propely.

Fixes: 1be7107fbe18 ("mm: larger stack guard gap, between vmas")
Link: http://lkml.kernel.org/r/20170705182849.GA18027@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Debugged-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agodrm/radeon: Fix eDP for single-display iMac10,1 (v2)
Mario Kleiner [Fri, 7 Jul 2017 02:57:04 +0000 (04:57 +0200)]
drm/radeon: Fix eDP for single-display iMac10,1 (v2)

commit 564d8a2cf3abf16575af48bdc3e86e92ee8a617d upstream.

The late 2009, 27 inch Apple iMac10,1 has an
internal eDP display and an external Mini-
Displayport output, driven by a DCE-3.2, RV730
Radeon Mobility HD-4670.

The machine worked fine in a dual-display setup
with eDP panel + externally connected HDMI
or DVI-D digital display sink, connected via
MiniDP to DVI or HDMI adapter.

However, booting the machine single-display with
only eDP panel results in a completely black
display - even backlight powering off, as soon as
the radeon modesetting driver loads.

This patch fixes the single dispay eDP case by
assigning encoders based on dig->linkb, similar
to DCE-4+. While this should not be generally
necessary (Alex: "...atom on normal boards
should be able to handle any mapping."), Apple
seems to use some special routing here.

One remaining problem not solved by this patch
is that an external Minidisplayport->DP sink
does still not work on iMac10,1, whereas external
DVI and HDMI sinks continue to work.

The problem affects at least all tested kernels
since Linux 3.13 - didn't test earlier kernels, so
backporting to stable probably makes sense.

v2: With the original patch from 2016, Alex was worried it
    will break other DCE3.2 systems. Use dmi_match() to
    apply this special encoder assignment only for the
    Apple iMac 10,1 from late 2009.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[bwh: Backported to 3.16:
 - Return the selected encoder rather than assiging it
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agocfg80211: Validate frequencies nested in NL80211_ATTR_SCAN_FREQUENCIES
Srinivas Dasari [Thu, 6 Jul 2017 22:43:42 +0000 (01:43 +0300)]
cfg80211: Validate frequencies nested in NL80211_ATTR_SCAN_FREQUENCIES

commit d7f13f7450369281a5d0ea463cc69890a15923ae upstream.

validate_scan_freqs() retrieves frequencies from attributes
nested in the attribute NL80211_ATTR_SCAN_FREQUENCIES with
nla_get_u32(), which reads 4 bytes from each attribute
without validating the size of data received. Attributes
nested in NL80211_ATTR_SCAN_FREQUENCIES don't have an nla policy.

Validate size of each attribute before parsing to avoid potential buffer
overread.

Fixes: 2a519311926 ("cfg80211/nl80211: scanning (and mac80211 update to use it)")
Signed-off-by: Srinivas Dasari <dasaris@qti.qualcomm.com>
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agocfg80211: Define nla_policy for NL80211_ATTR_LOCAL_MESH_POWER_MODE
Srinivas Dasari [Thu, 6 Jul 2017 22:43:41 +0000 (01:43 +0300)]
cfg80211: Define nla_policy for NL80211_ATTR_LOCAL_MESH_POWER_MODE

commit 8feb69c7bd89513be80eb19198d48f154b254021 upstream.

Buffer overread may happen as nl80211_set_station() reads 4 bytes
from the attribute NL80211_ATTR_LOCAL_MESH_POWER_MODE without
validating the size of data received when userspace sends less
than 4 bytes of data with NL80211_ATTR_LOCAL_MESH_POWER_MODE.
Define nla_policy for NL80211_ATTR_LOCAL_MESH_POWER_MODE to avoid
the buffer overread.

Fixes: 3b1c5a5307f ("{cfg,nl}80211: mesh power mode primitives and userspace access")
Signed-off-by: Srinivas Dasari <dasaris@qti.qualcomm.com>
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agocfg80211: Check if PMKID attribute is of expected size
Srinivas Dasari [Thu, 6 Jul 2017 22:43:39 +0000 (01:43 +0300)]
cfg80211: Check if PMKID attribute is of expected size

commit 9361df14d1cbf966409d5d6f48bb334384fbe138 upstream.

nla policy checks for only maximum length of the attribute data
when the attribute type is NLA_BINARY. If userspace sends less
data than specified, the wireless drivers may access illegal
memory. When type is NLA_UNSPEC, nla policy check ensures that
userspace sends minimum specified length number of bytes.

Remove type assignment to NLA_BINARY from nla_policy of
NL80211_ATTR_PMKID to make this NLA_UNSPEC and to make sure minimum
WLAN_PMKID_LEN bytes are received from userspace with
NL80211_ATTR_PMKID.

Fixes: 67fbb16be69d ("nl80211: PMKSA caching support")
Signed-off-by: Srinivas Dasari <dasaris@qti.qualcomm.com>
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agotarget: Fix COMPARE_AND_WRITE caw_sem leak during se_cmd quiesce
Jiang Yi [Sun, 25 Jun 2017 19:28:50 +0000 (12:28 -0700)]
target: Fix COMPARE_AND_WRITE caw_sem leak during se_cmd quiesce

commit 1d6ef276594a781686058802996e09c8550fd767 upstream.

This patch addresses a COMPARE_AND_WRITE se_device->caw_sem leak,
that would be triggered during normal se_cmd shutdown or abort
via __transport_wait_for_tasks().

This would occur because target_complete_cmd() would catch this
early and do complete_all(&cmd->t_transport_stop_comp), but since
target_complete_ok_work() or target_complete_failure_work() are
never called to invoke se_cmd->transport_complete_callback(),
the COMPARE_AND_WRITE specific callbacks never release caw_sem.

To address this special case, go ahead and release caw_sem
directly from target_complete_cmd().

(Remove '&& success' from check, to release caw_sem regardless
 of scsi_status - nab)

Signed-off-by: Jiang Yi <jiangyilism@gmail.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agotpm: fix a kernel memory leak in tpm-sysfs.c
Jarkko Sakkinen [Tue, 20 Jun 2017 09:38:02 +0000 (11:38 +0200)]
tpm: fix a kernel memory leak in tpm-sysfs.c

commit 13b47cfcfc60495cde216eef4c01040d76174cbe upstream.

While cleaning up sysfs callback that prints EK we discovered a kernel
memory leak. This commit fixes the issue by zeroing the buffer used for
TPM command/response.

The leak happen when we use either tpm_vtpm_proxy, tpm_ibmvtpm or
xen-tpmfront.

Fixes: 0883743825e3 ("TPM: sysfs functions consolidation")
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agortc: rtc-nuc900: fix loop timeout test
Dan Carpenter [Fri, 23 Jun 2017 08:29:00 +0000 (11:29 +0300)]
rtc: rtc-nuc900: fix loop timeout test

commit d0a67c372df410b579197ea818596001fe20070d upstream.

We should change this post-op to a pre-op because we want the loop to
exit with "timeout" set to zero.

Fixes: 0a89b55364e0 ("nuc900/rtc: change the waiting for device ready implement")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>