git.itanic.dy.fi Git - linux-stable/log

Linux 3.18.125

sched/fair: Fix throttle_list starvation with low CFS quota

commit baa9be4ffb55876923dc9716abc0a448e510ba30 upstream.

With a very low cpu.cfs_quota_us setting, such as the minimum of 1000,
distribute_cfs_runtime may not empty the throttled_list before it runs
out of runtime to distribute. In that case, due to the change from
c06f04c7048 to put throttled entries at the head of the list, later entries
on the list will starve.  Essentially, the same X processes will get pulled
off the list, given CPU time and then, when expired, get put back on the
head of the list where distribute_cfs_runtime will give runtime to the same
set of processes leaving the rest.

Fix the issue by setting a bit in struct cfs_bandwidth when
distribute_cfs_runtime is running, so that the code in throttle_cfs_rq can
decide to put the throttled entry on the tail or the head of the list.  The
bit is set/cleared by the callers of distribute_cfs_runtime while they hold
cfs_bandwidth->lock.

This is easy to reproduce with a handful of CPU consumers. I use 'crash' on
the live system. In some cases you can simply look at the throttled list and
see the later entries are not changing:

  crash> list cfs_rq.throttled_list -H 0xffff90b54f6ade40 -s cfs_rq.runtime_remaining | paste - - | awk '{print $1"  "$4}' | pr -t -n3
    1     ffff90b56cb2d200  -976050
    2     ffff90b56cb2cc00  -484925
    3     ffff90b56cb2bc00  -658814
    4     ffff90b56cb2ba00  -275365
    5     ffff90b166a45600  -135138
    6     ffff90b56cb2da00  -282505
    7     ffff90b56cb2e000  -148065
    8     ffff90b56cb2fa00  -872591
    9     ffff90b56cb2c000  -84687
   10     ffff90b56cb2f000  -87237
   11     ffff90b166a40a00  -164582

  crash> list cfs_rq.throttled_list -H 0xffff90b54f6ade40 -s cfs_rq.runtime_remaining | paste - - | awk '{print $1"  "$4}' | pr -t -n3
    1     ffff90b56cb2d200  -994147
    2     ffff90b56cb2cc00  -306051
    3     ffff90b56cb2bc00  -961321
    4     ffff90b56cb2ba00  -24490
    5     ffff90b166a45600  -135138
    6     ffff90b56cb2da00  -282505
    7     ffff90b56cb2e000  -148065
    8     ffff90b56cb2fa00  -872591
    9     ffff90b56cb2c000  -84687
   10     ffff90b56cb2f000  -87237
   11     ffff90b166a40a00  -164582

Sometimes it is easier to see by finding a process getting starved and looking
at the sched_info:

  crash> task ffff8eb765994500 sched_info
  PID: 7800   TASK: ffff8eb765994500  CPU: 16  COMMAND: "cputest"
    sched_info = {
      pcount = 8,
      run_delay = 697094208,
      last_arrival = 240260125039,
      last_queued = 240260327513
    },
  crash> task ffff8eb765994500 sched_info
  PID: 7800   TASK: ffff8eb765994500  CPU: 16  COMMAND: "cputest"
    sched_info = {
      pcount = 8,
      run_delay = 697094208,
      last_arrival = 240260125039,
      last_queued = 240260327513
    },

Signed-off-by: Phil Auld <pauld@redhat.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: c06f04c70489 ("sched: Fix potential near-infinite distribute_cfs_runtime() loop")
Link: http://lkml.kernel.org/r/20181008143639.GA4019@pauld.bos.csb
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

USB: fix the usbfs flag sanitization for control transfers

commit 665c365a77fbfeabe52694aedf3446d5f2f1ce42 upstream.

Commit 7a68d9fb8510 ("USB: usbdevfs: sanitize flags more") checks the
transfer flags for URBs submitted from userspace via usbfs. However,
the check for whether the USBDEVFS_URB_SHORT_NOT_OK flag should be
allowed for a control transfer was added in the wrong place, before
the code has properly determined the direction of the control
transfer. (Control transfers are special because for them, the
direction is set by the bRequestType byte of the Setup packet rather
than direction bit of the endpoint address.)

This patch moves code which sets up the allow_short flag for control
transfers down after is_in has been set to the correct value.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: syzbot+24a30223a4b609bb802e@syzkaller.appspotmail.com
Fixes: 7a68d9fb8510 ("USB: usbdevfs: sanitize flags more")
CC: Oliver Neukum <oneukum@suse.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cdc-acm: correct counting of UART states in serial state notification

commit f976d0e5747ca65ccd0fb2a4118b193d70aa1836 upstream.

The usb standard ("Universal Serial Bus Class Definitions for Communication
Devices") distiguishes between "consistent signals" (DSR, DCD), and
"irregular signals" (break, ring, parity error, framing error, overrun).
The bits of "irregular signals" are set, if this error/event occurred on
the device side and are immeadeatly unset, if the serial state notification
was sent.
Like other drivers of real serial ports do, just the occurence of those
events should be counted in serial_icounter_struct (but no 1->0
transitions).

Signed-off-by: Tobias Herzog <t-herzog@gmx.de>
Acked-by: Oliver Neukum <oneukum@suse.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cachefiles: fix the race between cachefiles_bury_object() and rmdir(2)

commit 169b803397499be85bdd1e3d07d6f5e3d4bd669e upstream.

the victim might've been rmdir'ed just before the lock_rename();
unlike the normal callers, we do not look the source up after the
parents are locked - we know it beforehand and just recheck that it's
still the child of what used to be its parent. Unfortunately,
the check is too weak - we don't spot a dead directory since its
->d_parent is unchanged, dentry is positive, etc. So we sail all
the way to ->rename(), with hosting filesystems _not_ expecting
to be asked renaming an rmdir'ed subdirectory.

The fix is easy, fortunately - the lock on parent is sufficient for
making IS_DEADDIR() on child safe.

Cc: stable@vger.kernel.org
Fixes: 9ae326a69004 (CacheFiles: A cache that backs onto a mounted filesystem)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: sched: gred: pass the right attribute to gred_change_table_def()

[ Upstream commit 38b4f18d56372e1e21771ab7b0357b853330186c ]

gred_change_table_def() takes a pointer to TCA_GRED_DPS attribute,
and expects it will be able to interpret its contents as
struct tc_gred_sopt.  Pass the correct gred attribute, instead of
TCA_OPTIONS.

This bug meant the table definition could never be changed after
Qdisc was initialized (unless whatever TCA_OPTIONS contained both
passed netlink validation and was a valid struct tc_gred_sopt...).

Old behaviour:
$ ip link add type dummy
$ tc qdisc replace dev dummy0 parent root handle 7: \
     gred setup vqs 4 default 0
$ tc qdisc replace dev dummy0 parent root handle 7: \
     gred setup vqs 4 default 0
RTNETLINK answers: Invalid argument

Now:
$ ip link add type dummy
$ tc qdisc replace dev dummy0 parent root handle 7: \
     gred setup vqs 4 default 0
$ tc qdisc replace dev dummy0 parent root handle 7: \
     gred setup vqs 4 default 0
$ tc qdisc replace dev dummy0 parent root handle 7: \
     gred setup vqs 4 default 0

Fixes: f62d6b936df5 ("[PKT_SCHED]: GRED: Use central VQ change procedure")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

rtnetlink: Disallow FDB configuration for non-Ethernet device

[ Upstream commit da71577545a52be3e0e9225a946e5fd79cfab015 ]

When an FDB entry is configured, the address is validated to have the
length of an Ethernet address, but the device for which the address is
configured can be of any type.

The above can result in the use of uninitialized memory when the address
is later compared against existing addresses since 'dev->addr_len' is
used and it may be greater than ETH_ALEN, as with ip6tnl devices.

Fix this by making sure that FDB entries are only configured for
Ethernet devices.

BUG: KMSAN: uninit-value in memcmp+0x11d/0x180 lib/string.c:863
CPU: 1 PID: 4318 Comm: syz-executor998 Not tainted 4.19.0-rc3+ #49
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x14b/0x190 lib/dump_stack.c:113
  kmsan_report+0x183/0x2b0 mm/kmsan/kmsan.c:956
  __msan_warning+0x70/0xc0 mm/kmsan/kmsan_instr.c:645
  memcmp+0x11d/0x180 lib/string.c:863
  dev_uc_add_excl+0x165/0x7b0 net/core/dev_addr_lists.c:464
  ndo_dflt_fdb_add net/core/rtnetlink.c:3463 [inline]
  rtnl_fdb_add+0x1081/0x1270 net/core/rtnetlink.c:3558
  rtnetlink_rcv_msg+0xa0b/0x1530 net/core/rtnetlink.c:4715
  netlink_rcv_skb+0x36e/0x5f0 net/netlink/af_netlink.c:2454
  rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4733
  netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
  netlink_unicast+0x1638/0x1720 net/netlink/af_netlink.c:1343
  netlink_sendmsg+0x1205/0x1290 net/netlink/af_netlink.c:1908
  sock_sendmsg_nosec net/socket.c:621 [inline]
  sock_sendmsg net/socket.c:631 [inline]
  ___sys_sendmsg+0xe70/0x1290 net/socket.c:2114
  __sys_sendmsg net/socket.c:2152 [inline]
  __do_sys_sendmsg net/socket.c:2161 [inline]
  __se_sys_sendmsg+0x2a3/0x3d0 net/socket.c:2159
  __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2159
  do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
  entry_SYSCALL_64_after_hwframe+0x63/0xe7
RIP: 0033:0x440ee9
Code: e8 cc ab 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 bb 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fff6a93b518 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000440ee9
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000003
RBP: 0000000000000000 R08: 00000000004002c8 R09: 00000000004002c8
R10: 00000000004002c8 R11: 0000000000000213 R12: 000000000000b4b0
R13: 0000000000401ec0 R14: 0000000000000000 R15: 0000000000000000

Uninit was created at:
  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:256 [inline]
  kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:181
  kmsan_kmalloc+0x98/0x100 mm/kmsan/kmsan_hooks.c:91
  kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan_hooks.c:100
  slab_post_alloc_hook mm/slab.h:446 [inline]
  slab_alloc_node mm/slub.c:2718 [inline]
  __kmalloc_node_track_caller+0x9e7/0x1160 mm/slub.c:4351
  __kmalloc_reserve net/core/skbuff.c:138 [inline]
  __alloc_skb+0x2f5/0x9e0 net/core/skbuff.c:206
  alloc_skb include/linux/skbuff.h:996 [inline]
  netlink_alloc_large_skb net/netlink/af_netlink.c:1189 [inline]
  netlink_sendmsg+0xb49/0x1290 net/netlink/af_netlink.c:1883
  sock_sendmsg_nosec net/socket.c:621 [inline]
  sock_sendmsg net/socket.c:631 [inline]
  ___sys_sendmsg+0xe70/0x1290 net/socket.c:2114
  __sys_sendmsg net/socket.c:2152 [inline]
  __do_sys_sendmsg net/socket.c:2161 [inline]
  __se_sys_sendmsg+0x2a3/0x3d0 net/socket.c:2159
  __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2159
  do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
  entry_SYSCALL_64_after_hwframe+0x63/0xe7

v2:
* Make error message more specific (David)

Fixes: 090096bf3db1 ("net: generic fdb support for drivers without ndo_fdb_<op>")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-and-tested-by: syzbot+3a288d5f5530b901310e@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+d53ab4e92a1db04110ff@syzkaller.appspotmail.com
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: drop skb on failure in ip_check_defrag()

[ Upstream commit 7de414a9dd91426318df7b63da024b2b07e53df5 ]

Most callers of pskb_trim_rcsum() simply drop the skb when
it fails, however, ip_check_defrag() still continues to pass
the skb up to stack. This is suspicious.

In ip_check_defrag(), after we learn the skb is an IP fragment,
passing the skb to callers makes no sense, because callers expect
fragments are defrag'ed on success. So, dropping the skb when we
can't defrag it is reasonable.

Note, prior to commit 88078d98d1bb, this is not a big problem as
checksum will be fixed up anyway. After it, the checksum is not
correct on failure.

Found this during code review.

Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

sctp: fix race on sctp_id2asoc

[ Upstream commit b336decab22158937975293aea79396525f92bb3 ]

syzbot reported an use-after-free involving sctp_id2asoc.  Dmitry Vyukov
helped to root cause it and it is because of reading the asoc after it
was freed:

        CPU 1                       CPU 2
(working on socket 1)            (working on socket 2)
                         sctp_association_destroy
sctp_id2asoc
   spin lock
     grab the asoc from idr
   spin unlock
                                   spin lock
     remove asoc from idr
   spin unlock
   free(asoc)
   if asoc->base.sk != sk ... [*]

This can only be hit if trying to fetch asocs from different sockets. As
we have a single IDR for all asocs, in all SCTP sockets, their id is
unique on the system. An application can try to send stuff on an id
that matches on another socket, and the if in [*] will protect from such
usage. But it didn't consider that as that asoc may belong to another
socket, it may be freed in parallel (read: under another socket lock).

We fix it by moving the checks in [*] into the protected region. This
fixes it because the asoc cannot be freed while the lock is held.

Reported-by: syzbot+c7dd55d7aec49d48e49a@syzkaller.appspotmail.com
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

r8169: fix NAPI handling under high load

[ Upstream commit 6b839b6cf9eada30b086effb51e5d6076bafc761 ]

rtl_rx() and rtl_tx() are called only if the respective bits are set
in the interrupt status register. Under high load NAPI may not be
able to process all data (work_done == budget) and it will schedule
subsequent calls to the poll callback.
rtl_ack_events() however resets the bits in the interrupt status
register, therefore subsequent calls to rtl8169_poll() won't call
rtl_rx() and rtl_tx() - chip interrupts are still disabled.

Fix this by calling rtl_rx() and rtl_tx() independent of the bits
set in the interrupt status register. Both functions will detect
if there's nothing to do for them.

Fixes: da78dbff2e05 ("r8169: remove work from irq handler.")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: stmmac: Fix stmmac_mdio_reset() when building stmmac as modules

[ Upstream commit 30549aab146ccb1275230c3b4b4bc6b4181fd54e ]

When building stmmac, it is only possible to select CONFIG_DWMAC_GENERIC,
or any of the glue drivers, when CONFIG_STMMAC_PLATFORM is set.
The only exception is CONFIG_STMMAC_PCI.

When calling of_mdiobus_register(), it will call our ->reset()
callback, which is set to stmmac_mdio_reset().

Most of the code in stmmac_mdio_reset() is protected by a
"#if defined(CONFIG_STMMAC_PLATFORM)", which will evaluate
to false when CONFIG_STMMAC_PLATFORM=m.

Because of this, the phy reset gpio will only be pulled when
stmmac is built as built-in, but not when built as modules.

Fix this by using "#if IS_ENABLED()" instead of "#if defined()".

Signed-off-by: Niklas Cassel <niklas.cassel@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: socket: fix a missing-check bug

[ Upstream commit b6168562c8ce2bd5a30e213021650422e08764dc ]

In ethtool_ioctl(), the ioctl command 'ethcmd' is checked through a switch
statement to see whether it is necessary to pre-process the ethtool
structure, because, as mentioned in the comment, the structure
ethtool_rxnfc is defined with padding. If yes, a user-space buffer 'rxnfc'
is allocated through compat_alloc_user_space(). One thing to note here is
that, if 'ethcmd' is ETHTOOL_GRXCLSRLALL, the size of the buffer 'rxnfc' is
partially determined by 'rule_cnt', which is actually acquired from the
user-space buffer 'compat_rxnfc', i.e., 'compat_rxnfc->rule_cnt', through
get_user(). After 'rxnfc' is allocated, the data in the original user-space
buffer 'compat_rxnfc' is then copied to 'rxnfc' through copy_in_user(),
including the 'rule_cnt' field. However, after this copy, no check is
re-enforced on 'rxnfc->rule_cnt'. So it is possible that a malicious user
race to change the value in the 'compat_rxnfc->rule_cnt' between these two
copies. Through this way, the attacker can bypass the previous check on
'rule_cnt' and inject malicious data. This can cause undefined behavior of
the kernel and introduce potential security risk.

This patch avoids the above issue via copying the value acquired by
get_user() to 'rxnfc->rule_cn', if 'ethcmd' is ETHTOOL_GRXCLSRLALL.

Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net/ipv6: Fix index counter for unicast addresses in in6_dump_addrs

[ Upstream commit 4ba4c566ba8448a05e6257e0b98a21f1a0d55315 ]

The loop wants to skip previously dumped addresses, so loops until
current index >= saved index. If the message fills it wants to save
the index for the next address to dump - ie., the one that did not
fit in the current message.

Currently, it is incrementing the index counter before comparing to the
saved index, and then the saved index is off by 1 - it assumes the
current address is going to fit in the message.

Change the index handling to increment only after a succesful dump.

Fixes: 502a2ffd7376a ("ipv6: convert idev_list to list macros")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ipv6/ndisc: Preserve IPv6 control buffer if protocol error handlers are called

[ Upstream commit ee1abcf689353f36d9322231b4320926096bdee0 ]

Commit a61bbcf28a8c ("[NET]: Store skb->timestamp as offset to a base
timestamp") introduces a neighbour control buffer and zeroes it out in
ndisc_rcv(), as ndisc_recv_ns() uses it.

Commit f2776ff04722 ("[IPV6]: Fix address/interface handling in UDP and
DCCP, according to the scoping architecture.") introduces the usage of the
IPv6 control buffer in protocol error handlers (e.g. inet6_iif() in
present-day __udp6_lib_err()).

Now, with commit b94f1c0904da ("ipv6: Use icmpv6_notify() to propagate
redirect, instead of rt6_redirect()."), we call protocol error handlers
from ndisc_redirect_rcv(), after the control buffer is already stolen and
some parts are already zeroed out. This implies that inet6_iif() on this
path will always return zero.

This gives unexpected results on UDP socket lookup in __udp6_lib_err(), as
we might actually need to match sockets for a given interface.

Instead of always claiming the control buffer in ndisc_rcv(), do that only
when needed.

Fixes: b94f1c0904da ("ipv6: Use icmpv6_notify() to propagate redirect, instead of rt6_redirect().")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ipv6: mcast: fix a use-after-free in inet6_mc_check

[ Upstream commit dc012f3628eaecfb5ba68404a5c30ef501daf63d ]

syzbot found a use-after-free in inet6_mc_check [1]

The problem here is that inet6_mc_check() uses rcu
and read_lock(&iml->sflock)

So the fact that ip6_mc_leave_src() is called under RTNL
and the socket lock does not help us, we need to acquire
iml->sflock in write mode.

In the future, we should convert all this stuff to RCU.

[1]
BUG: KASAN: use-after-free in ipv6_addr_equal include/net/ipv6.h:521 [inline]
BUG: KASAN: use-after-free in inet6_mc_check+0xae7/0xb40 net/ipv6/mcast.c:649
Read of size 8 at addr ffff8801ce7f2510 by task syz-executor0/22432

CPU: 1 PID: 22432 Comm: syz-executor0 Not tainted 4.19.0-rc7+ #280
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
ipv6_addr_equal include/net/ipv6.h:521 [inline]
inet6_mc_check+0xae7/0xb40 net/ipv6/mcast.c:649
__raw_v6_lookup+0x320/0x3f0 net/ipv6/raw.c:98
ipv6_raw_deliver net/ipv6/raw.c:183 [inline]
raw6_local_deliver+0x3d3/0xcb0 net/ipv6/raw.c:240
ip6_input_finish+0x467/0x1aa0 net/ipv6/ip6_input.c:345
NF_HOOK include/linux/netfilter.h:289 [inline]
ip6_input+0xe9/0x600 net/ipv6/ip6_input.c:426
ip6_mc_input+0x48a/0xd20 net/ipv6/ip6_input.c:503
dst_input include/net/dst.h:450 [inline]
ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76
NF_HOOK include/linux/netfilter.h:289 [inline]
ipv6_rcv+0x120/0x640 net/ipv6/ip6_input.c:271
__netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4913
__netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5023
netif_receive_skb_internal+0x12c/0x620 net/core/dev.c:5126
napi_frags_finish net/core/dev.c:5664 [inline]
napi_gro_frags+0x75a/0xc90 net/core/dev.c:5737
tun_get_user+0x3189/0x4250 drivers/net/tun.c:1923
tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:1968
call_write_iter include/linux/fs.h:1808 [inline]
do_iter_readv_writev+0x8b0/0xa80 fs/read_write.c:680
do_iter_write+0x185/0x5f0 fs/read_write.c:959
vfs_writev+0x1f1/0x360 fs/read_write.c:1004
do_writev+0x11a/0x310 fs/read_write.c:1039
__do_sys_writev fs/read_write.c:1112 [inline]
__se_sys_writev fs/read_write.c:1109 [inline]
__x64_sys_writev+0x75/0xb0 fs/read_write.c:1109
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457421
Code: 75 14 b8 14 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 34 b5 fb ff c3 48 83 ec 08 e8 1a 2d 00 00 48 89 04 24 b8 14 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 63 2d 00 00 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:00007f2d30ecaba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 000000000000003e RCX: 0000000000457421
RDX: 0000000000000001 RSI: 00007f2d30ecabf0 RDI: 00000000000000f0
RBP: 0000000020000500 R08: 00000000000000f0 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000293 R12: 00007f2d30ecb6d4
R13: 00000000004c4890 R14: 00000000004d7b90 R15: 00000000ffffffff

Allocated by task 22437:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
__do_kmalloc mm/slab.c:3718 [inline]
__kmalloc+0x14e/0x760 mm/slab.c:3727
kmalloc include/linux/slab.h:518 [inline]
sock_kmalloc+0x15a/0x1f0 net/core/sock.c:1983
ip6_mc_source+0x14dd/0x1960 net/ipv6/mcast.c:427
do_ipv6_setsockopt.isra.9+0x3afb/0x45d0 net/ipv6/ipv6_sockglue.c:743
ipv6_setsockopt+0xbd/0x170 net/ipv6/ipv6_sockglue.c:933
rawv6_setsockopt+0x59/0x140 net/ipv6/raw.c:1069
sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3038
__sys_setsockopt+0x1ba/0x3c0 net/socket.c:1902
__do_sys_setsockopt net/socket.c:1913 [inline]
__se_sys_setsockopt net/socket.c:1910 [inline]
__x64_sys_setsockopt+0xbe/0x150 net/socket.c:1910
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 22430:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
__cache_free mm/slab.c:3498 [inline]
kfree+0xcf/0x230 mm/slab.c:3813
__sock_kfree_s net/core/sock.c:2004 [inline]
sock_kfree_s+0x29/0x60 net/core/sock.c:2010
ip6_mc_leave_src+0x11a/0x1d0 net/ipv6/mcast.c:2448
__ipv6_sock_mc_close+0x20b/0x4e0 net/ipv6/mcast.c:310
ipv6_sock_mc_close+0x158/0x1d0 net/ipv6/mcast.c:328
inet6_release+0x40/0x70 net/ipv6/af_inet6.c:452
__sock_release+0xd7/0x250 net/socket.c:579
sock_close+0x19/0x20 net/socket.c:1141
__fput+0x385/0xa30 fs/file_table.c:278
____fput+0x15/0x20 fs/file_table.c:309
task_work_run+0x1e8/0x2a0 kernel/task_work.c:113
tracehook_notify_resume include/linux/tracehook.h:193 [inline]
exit_to_usermode_loop+0x318/0x380 arch/x86/entry/common.c:166
prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
do_syscall_64+0x6be/0x820 arch/x86/entry/common.c:293
entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8801ce7f2500
which belongs to the cache kmalloc-192 of size 192
The buggy address is located 16 bytes inside of
192-byte region [ffff8801ce7f2500, ffff8801ce7f25c0)
The buggy address belongs to the page:
page:ffffea000739fc80 count:1 mapcount:0 mapping:ffff8801da800040 index:0x0
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffffea0006f6e548 ffffea000737b948 ffff8801da800040
raw: 0000000000000000 ffff8801ce7f2000 0000000100000010 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff8801ce7f2400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8801ce7f2480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>ffff8801ce7f2500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8801ce7f2580: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff8801ce7f2600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mremap: properly flush TLB before releasing the page

Commit eb66ae030829605d61fbef1909ce310e29f78821 upstream.

This is a backport to stable 3.18.y, based on Will Deacon's 4.4.y
backport.

Jann Horn points out that our TLB flushing was subtly wrong for the
mremap() case. What makes mremap() special is that we don't follow the
usual "add page to list of pages to be freed, then flush tlb, and then
free pages". No, mremap() obviously just _moves_ the page from one page
table location to another.

That matters, because mremap() thus doesn't directly control the
lifetime of the moved page with a freelist: instead, the lifetime of the
page is controlled by the page table locking, that serializes access to
the entry.

As a result, we need to flush the TLB not just before releasing the lock
for the source location (to avoid any concurrent accesses to the entry),
but also before we release the destination page table lock (to avoid the
TLB being flushed after somebody else has already done something to that
page).

This also makes the whole "need_flush" logic unnecessary, since we now
always end up flushing the TLB for every valid entry.

Reported-and-tested-by: Jann Horn <jannh@google.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Tested-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[will: backport to 4.4 stable]
Signed-off-by: Will Deacon <will.deacon@arm.com>
[ghackmann@google.com: adjust context]
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

/proc/iomem: only expose physical resource addresses to privileged users

commit 51d7b120418e99d6b3bf8df9eb3cc31e8171dee4 upstream.

In commit c4004b02f8e5b ("x86: remove the kernel code/data/bss resources
from /proc/iomem") I was hoping to remove the phyiscal kernel address
data from /proc/iomem entirely, but that had to be reverted because some
system programs actually use it.

This limits all the detailed resource information to properly
credentialed users instead.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

perf tools: Disable parallelism for 'make clean'

[ Upstream commit da15fc2fa9c07b23db8f5e479bd8a9f0d741ca07 ]

The Yocto build system does a 'make clean' when rebuilding due to
changed dependencies, and that consistently fails for me (causing the
whole BSP build to fail) with errors such as

| find: '[...]/perf/1.0-r9/perf-1.0/plugin_mac80211.so': No such file or directory
| find: '[...]/perf/1.0-r9/perf-1.0/plugin_mac80211.so': No such file or directory
| find: find: '[...]/perf/1.0-r9/perf-1.0/libtraceevent.a''[...]/perf/1.0-r9/perf-1.0/libtraceevent.a': No such file or directory: No such file or directory
|
[...]
| find: cannot delete '/mnt/xfs/devel/pil/yocto/tmp-glibc/work/wandboard-oe-linux-gnueabi/perf/1.0-r9/perf-1.0/util/.pstack.o.cmd': No such file or directory

Apparently (despite the comment), 'make clean' ends up launching
multiple sub-makes that all want to remove the same things - perhaps
this only happens in combination with a O=... parameter. In any case, we
don't lose much by explicitly disabling the parallelism for the clean
target, and it makes automated builds much more reliable.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180705131527.19749-1-linux@rasmusvillemoes.dk
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

fs/fat/fatent.c: add cond_resched() to fat_count_free_clusters()

[ Upstream commit ac081c3be3fae6d0cc3e1862507fca3862d30b67 ]

On non-preempt kernels this loop can take a long time (more than 50 ticks)
processing through entries.

Link: http://lkml.kernel.org/r/20181010172623.57033-1-khazhy@google.com
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

unix: correctly track in-flight fds in sending process user_struct

[ Upstream commit 415e3d3e90ce9e18727e8843ae343eda5a58fad6 ]

The commit referenced in the Fixes tag incorrectly accounted the number
of in-flight fds over a unix domain socket to the original opener
of the file-descriptor. This allows another process to arbitrary
deplete the original file-openers resource limit for the maximum of
open files. Instead the sending processes and its struct cred should
be credited.

To do so, we add a reference counted struct user_struct pointer to the
scm_fp_list and use it to account for the number of inflight unix fds.

Fixes: 712f4aad406bb1 ("unix: properly account for FDs passed over unix sockets")
Reported-by: David Herrmann <dh.herrmann@gmail.com>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

x86/PCI: Mark Broadwell-EP Home Agent 1 as having non-compliant BARs

[ Upstream commit da77b67195de1c65bef4908fa29967c4d0af2da2 ]

Commit b894157145e4 ("x86/PCI: Mark Broadwell-EP Home Agent & PCU as having
non-compliant BARs") marked Home Agent 0 & PCU has having non-compliant
BARs.  Home Agent 1 also has non-compliant BARs.

Mark Home Agent 1 as having non-compliant BARs so the PCI core doesn't
touch them.

The problem with these devices is documented in the Xeon v4 specification
update:

  BDF2          PCI BARs in the Home Agent Will Return Non-Zero Values
                During Enumeration

  Problem:      During system initialization the Operating System may access
                the standard PCI BARs (Base Address Registers).  Due to
                this erratum, accesses to the Home Agent BAR registers (Bus
                1; Device 18; Function 0,4; Offsets (0x14-0x24) will return
                non-zero values.

  Implication:  The operating system may issue a warning.  Intel has not
                observed any functional failures due to this erratum.

Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e5-v4-spec-update.html
Fixes: b894157145e4 ("x86/PCI: Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs")
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net: fix warnings in 'make htmldocs' by moving macro definition out of field declaration

[ Upstream commit 7bbadd2d1009575dad675afc16650ebb5aa10612 ]

Docbook does not like the definition of macros inside a field declaration
and adds a warning. Move the definition out.

Fixes: 79462ad02e86180 ("net: add validation for the socket syscall protocol argument")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

USB: hub: fix up early-exit pathway in hub_activate

[ Upstream commit ca5cbc8b02f9b21cc8cd1ab36668763ec34f9ee8 ]

The early-exit pathway in hub_activate, added by commit e50293ef9775
("USB: fix invalid memory access in hub_activate()") needs
improvement. It duplicates code that is already present at the end of
the subroutine, and it neglects to undo the effect of a
usb_autopm_get_interface_no_resume() call.

This patch fixes both problems by making the early-exit pathway jump
directly to the end of the subroutine. It simplifies the code at the
end by merging two conditionals that actually test the same condition
although they appear different: If type < HUB_INIT3 then type must be
either HUB_INIT2 or HUB_INIT, and it can't be HUB_INIT because in that
case the subroutine would have exited earlier.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: <stable@vger.kernel.org> #4.4+
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

KEYS: put keyring if install_session_keyring_to_cred() fails

[ Upstream commit d636bd9f12a66ea3775c9fabbf3f8e118253467a ]

In join_session_keyring(), if install_session_keyring_to_cred() were to
fail, we would leak the keyring reference, just like in the bug fixed by
commit 23567fd052a9 ("KEYS: Fix keyring ref leak in
join_session_keyring()"). Fortunately this cannot happen currently, but
we really should be more careful. Do this by adding and using a new
error label at which the keyring reference is dropped.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

igb: fix NULL derefs due to skipped SR-IOV enabling

[ Upstream commit be06998f96ecb93938ad2cce46c4289bf7cf45bc ]

The combined effect of commits 6423fc3416 ("igb: do not re-init SR-IOV
during probe") and ceee3450b3 ("igb: make sure SR-IOV init uses the
right number of queues") causes VFs no longer getting set up, leading
to NULL pointer dereferences due to the adapter's ->vf_data being NULL
while ->vfs_allocated_count is non-zero. The first commit not only
neglected the side effect of igb_sriov_reinit() that the second commit
tried to account for, but also that of setting IGB_FLAG_HAS_MSIX,
without which igb_enable_sriov() is effectively a no-op. Calling
igb_{,re}set_interrupt_capability() as done here seems to address this,
but I'm not sure whether this is better than sinply reverting the other
two commits.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ovl: fix open in stacked overlay

[ Upstream commit 1c8a47df36d72ace8cf78eb6c228aa0f8027d3c2 ]

If two overlayfs filesystems are stacked on top of each other, then we need
recursion in ovl_d_select_inode().

I guess d_backing_inode() is supposed to do that. But currently it doesn't
and that functionality is open coded in vfs_open(). This is now copied
into ovl_d_select_inode() to fix this regression.

Reported-by: Alban Crequy <alban.crequy@gmail.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay...")
Cc: David Howells <dhowells@redhat.com>
Cc: <stable@vger.kernel.org> # v4.2+
Signed-off-by: Sasha Levin <sashal@kernel.org>

iwlwifi: pcie: correctly define 7265-D cfg

[ Upstream commit 2b0e2b0f7bfe9a9098bda6109176adcf78f9b7ac ]

The trans cfg was not replaced for 7265-D cards. This led to a check of
the min-NVM version against a 7265-C card, causing very-old 7265-D cards
to operate incorrectly with the driver.

Fixes: 3fd0d3c170ad ("iwlwifi: pcie: support 7265-D devices")
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

sctp: translate network order to host order when users get a hmacid

[ Upstream commit 7a84bd46647ff181eb2659fdc99590e6f16e501d ]

Commit ed5a377d87dc ("sctp: translate host order to network order when
setting a hmacid") corrected the hmacid byte-order when setting a hmacid.
but the same issue also exists on getting a hmacid.

We fix it by changing hmacids to host order when users get them with
getsockopt.

Fixes: Commit ed5a377d87dc ("sctp: translate host order to network order when setting a hmacid")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

vfs: Make sendfile(2) killable even better

[ Upstream commit c725bfce7968009756ed2836a8cd7ba4dc163011 ]

Commit 296291cdd162 (mm: make sendfile(2) killable) fixed an issue where
sendfile(2) was doing a lot of tiny writes into a filesystem and thus
was unkillable for a long time. However sendfile(2) can be (mis)used to
issue lots of writes into arbitrary file descriptor such as evenfd or
similar special file descriptors which never hit the standard filesystem
write path and thus are still unkillable. E.g. the following example
from Dmitry burns CPU for ~16s on my test system without possibility to
be killed:

        int r1 = eventfd(0, 0);
        int r2 = memfd_create("", 0);
        unsigned long n = 1<<30;
        fallocate(r2, 0, 0, n);
        sendfile(r1, r2, 0, n);

There are actually quite a few tests for pending signals in sendfile
code however we data to write is always available none of them seems to
trigger. So fix the problem by adding a test for pending signal into
splice_from_pipe_next() also before the loop waiting for pipe buffers to
be available. This should fix all the lockup issues with sendfile of the
do-ton-of-tiny-writes nature.

CC: stable@vger.kernel.org
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>

PCI: Fix devfn for VPD access through function 0

[ Upstream commit 9d9240756e63dd87d6cbf5da8b98ceb8f8192b55 ]

Commit 932c435caba8 ("PCI: Add dev_flags bit to access VPD through function
0") passes PCI_SLOT(devfn) for the devfn parameter of pci_get_slot().
Generally this works because we're fairly well guaranteed that a PCIe
device is at slot address 0, but for the general case, including
conventional PCI, it's incorrect. We need to get the slot and then convert
it back into a devfn.

Fixes: 932c435caba8 ("PCI: Add dev_flags bit to access VPD through function 0")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
Acked-by: Myron Stowe <myron.stowe@redhat.com>
Acked-by: Mark Rustad <mark.d.rustad@intel.com>
CC: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>

x86/ldt: Fix small LDT allocation for Xen

[ Upstream commit f454b478861325f067fd58ba7ee9f1b5c4a9d6a0 ]

While the following commit:

37868fe113 ("x86/ldt: Make modify_ldt synchronous")

added a nice comment explaining that Xen needs page-aligned
whole page chunks for guest descriptor tables, it then
nevertheless used kzalloc() on the small size path.

As I'm unaware of guarantees for kmalloc(PAGE_SIZE, ) to return
page-aligned memory blocks, I believe this needs to be switched
back to __get_free_page() (or better get_zeroed_page()).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/55E735D6020000780009F1E6@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

Revert "SCSI: Fix NULL pointer dereference in runtime PM"

[ Upstream commit 1c69d3b6eb73e466ecbb8edaf1bc7fd585b288da ]

This reverts commit 49718f0fb8c9 ("SCSI: Fix NULL pointer dereference in
runtime PM")

The old commit may lead to a issue that blk_{pre|post}_runtime_suspend and
blk_{pre|post}_runtime_resume may not be called in pairs.

Take sr device as example, when sr device goes to runtime suspend,
blk_{pre|post}_runtime_suspend will be called since sr device defined
pm->runtime_suspend. But blk_{pre|post}_runtime_resume will not be called
since sr device doesn't have pm->runtime_resume. so, sr device can not
resume correctly anymore.

More discussion can be found from below link.
http://marc.info/?l=linux-scsi&m=144163730531875&w=2

Signed-off-by: Ken Xue <Ken.Xue@amd.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Cc: Xiangliang Yu <Xiangliang.Yu@amd.com>
Cc: James E.J. Bottomley <JBottomley@odin.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Michael Terry <Michael.terry@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

mm: migrate: hugetlb: putback destination hugepage to active list

[ Upstream commit 3aaa76e125c1dd58c9b599baa8c6021896874c12 ]

Since commit bcc54222309c ("mm: hugetlb: introduce page_huge_active")
each hugetlb page maintains its active flag to avoid a race condition
betwe= en multiple calls of isolate_huge_page(), but current kernel
doesn't set the f= lag on a hugepage allocated by migration because the
proper putback routine isn= 't called. This means that users could
still encounter the race referred to by bcc54222309c in this special
case, so this patch fixes it.

Fixes: bcc54222309c ("mm: hugetlb: introduce page_huge_active")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org> [4.1.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

perf: Fix PERF_EVENT_IOC_PERIOD deadlock

[ Upstream commit 642c2d671ceff40e9453203ea0c66e991e11e249 ]

Dmitry reported a fairly silly recursive lock deadlock for
PERF_EVENT_IOC_PERIOD, fix this by explicitly doing the inactive part of
__perf_event_period() instead of calling that function.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: c7999c6f3fed ("perf: Fix PERF_EVENT_IOC_PERIOD migration race")
Link: http://lkml.kernel.org/r/20151130115615.GJ17308@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

libata: blacklist Micron 500IT SSD with MU01 firmware

[ Upstream commit 136d769e0b3475d71350aa3648a116a6ee7a8f6c ]

While whitelisting Micron M500DC drives, the tweaked blacklist entry
enabled queued TRIM from M500IT variants also. But these do not support
queued TRIM. And while using those SSDs with the latest kernel we have
seen errors and even the partition table getting corrupted.

Some part from the dmesg:
[    6.727384] ata1.00: ATA-9: Micron_M500IT_MTFDDAK060MBD, MU01, max UDMA/133
[    6.727390] ata1.00: 117231408 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[    6.741026] ata1.00: supports DRM functions and may not be fully accessible
[    6.759887] ata1.00: configured for UDMA/133
[    6.762256] scsi 0:0:0:0: Direct-Access     ATA      Micron_M500IT_MT MU01 PQ: 0 ANSI: 5

and then for the error:
[  120.860334] ata1.00: exception Emask 0x1 SAct 0x7ffc0007 SErr 0x0 action 0x6 frozen
[  120.860338] ata1.00: irq_stat 0x40000008
[  120.860342] ata1.00: failed command: SEND FPDMA QUEUED
[  120.860351] ata1.00: cmd 64/01:00:00:00:00/00:00:00:00:00/a0 tag 0 ncq dma 512 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x5 (timeout)
[  120.860353] ata1.00: status: { DRDY }
[  120.860543] ata1: hard resetting link
[  121.166128] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  121.166376] ata1.00: supports DRM functions and may not be fully accessible
[  121.186238] ata1.00: supports DRM functions and may not be fully accessible
[  121.204445] ata1.00: configured for UDMA/133
[  121.204454] ata1.00: device reported invalid CHS sector 0
[  121.204541] sd 0:0:0:0: [sda] tag#18 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[  121.204546] sd 0:0:0:0: [sda] tag#18 Sense Key : 0x5 [current]
[  121.204550] sd 0:0:0:0: [sda] tag#18 ASC=0x21 ASCQ=0x4
[  121.204555] sd 0:0:0:0: [sda] tag#18 CDB: opcode=0x93 93 08 00 00 00 00 00 04 28 80 00 00 00 30 00 00
[  121.204559] print_req_error: I/O error, dev sda, sector 272512

After few reboots with these errors, and the SSD is corrupted.
After blacklisting it, the errors are not seen and the SSD does not get
corrupted any more.

Fixes: 243918be6393 ("libata: Do not blacklist Micron M500DC")
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

igb: Unpair the queues when changing the number of queues

[ Upstream commit 37a5d163fb447b39f7960d0534de30e88ad395bb ]

By the commit 72ddef0506da ("igb: Fix oops caused by missing queue
pairing"), the IGB_FLAG_QUEUE_PAIRS flag can now be set when changing the
number of queues by "ethtool -L", but it is never cleared unless the igb
driver is reloaded.
This patch clears it if queue pairing becomes unnecessary as a result of
"ethtool -L".

Signed-off-by: Shota Suzuki <suzuki_shota_t3@lab.ntt.co.jp>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

Btrfs: do not ignore errors from btrfs_lookup_xattr in do_setxattr

[ Upstream commit 5cdf83edb8e41cad1ec8eab2d402b4f9d9eb7ee0 ]

The return value from btrfs_lookup_xattr() can be a pointer encoding an
error, therefore deal with it. This fixes commit 5f5bc6b1e2d5
("Btrfs: make xattr replace operations atomic").

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

tty: audit: Fix audit source

[ Upstream commit 6b2a3d628aa752f0ab825fc6d4d07b09e274d1c1 ]

The data to audit/record is in the 'from' buffer (ie., the input
read buffer).

Fixes: 72586c6061ab ("n_tty: Fix auditing support for cannonical mode")
Cc: stable <stable@vger.kernel.org> # 4.1+
Cc: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Laura Abbott <labbott@fedoraproject.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ALSA: usb-audio: Add a more accurate volume quirk for AudioQuest DragonFly

[ Upstream commit 42e3121d90f42e57f6dbd6083dff2f57b3ec7daa ]

AudioQuest DragonFly DAC reports a volume control range of 0..50
(0x0000..0x0032) which in USB Audio means a range of 0 .. 0.2dB, which
is obviously incorrect and would cause software using the dB information
in e.g. volume sliders to have a massive volume difference in 100..102%
range.

Commit 2d1cb7f658fb ("ALSA: usb-audio: add dB range mapping for some
devices") added a dB range mapping for it with range 0..50 dB.

However, the actual volume mapping seems to be neither linear volume nor
linear dB scale, but instead quite close to the cubic mapping e.g.
alsamixer uses, with a range of approx. -53...0 dB.

Replace the previous quirk with a custom dB mapping based on some basic
output measurements, using a 10-item range TLV (which will still fit in
alsa-lib MAX_TLV_RANGE_SIZE).

Tested on AudioQuest DragonFly HW v1.2. The quirk is only applied if the
range is 0..50, so if this gets fixed/changed in later HW revisions it
will no longer be applied.

v2: incorporated Takashi Iwai's suggestion for the quirk application
method

Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ALSA: hda - Add headset mic support for Acer Aspire V5-573G

[ Upstream commit 0420694dddeb9e269a1ab2129a0119a5cea294a4 ]

Acer Aspire V5 with the ALC282 codec is given the wrong value for the
0x19 PIN by the laptop's BIOS. Overriding it with the correct value
adds support for the headset microphone which would not otherwise be
visible in the system.

The fix is based on commit 7819717b1134 with a similar quirk for Acer
Aspire with the ALC269 codec.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=96201
Cc: <stable@vger.kernel.org>
Signed-off-by: Mateusz Sylwestrzak <matisec7@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>

rtlwifi: rtl8821ae: Fix lockups on boot

[ Upstream commit eeec5d0ef7ee54a75e09e861c3cc44177b8752c7 ]

In commit 54328e64047a5 ("rtlwifi: rtl8821ae: Fix system lockups on boot"),
an attempt was made to fix a regression introduced in commit 1277fa2ab2f9
("rtlwifi: Remove the clear interrupt routine from all drivers").
Unfortunately, there were logic errors in that patch that prevented
affected boxes from booting even after that patch was applied.

The actual cause of the original problem is unknown as none of the
developers have systems that are affected.

Fixes: 54328e64047a ("rtlwifi: rtl8821ae: Fix system lockups on boot")
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Stable <stable@vger.kernel.org> [V4.1+]
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

rtlwifi: rtl8821ae: Fix system lockups on boot

[ Upstream commit 54328e64047a54b8fc2362c2e1f0fa16c90f739f ]

In commit 1277fa2ab2f9 ("rtlwifi: Remove the clear interrupt routine from all
drivers"), the code that cleared all interrupt enable bits before setting them
was removed for all PCI drivers. This fixed an issue that caused TX to be
blocked for 3-5 seconds. On some RTL8821AE units, this change causes soft
lockups to occur on boot. For that reason, the portion of the earlier commit
that applied to rtl8821ae is reverted. Kernels 4.1 and newer are affected.

See http://marc.info/?l=linux-wireless&m=144373370103285&w=2 and
https://bugzilla.opensuse.org/show_bug.cgi?id=944978 for two cases where
this regression affected user systems. Note that this bug does not appear on
any of the developer's setups. For those users whose systems are affected
by the TX blockage, but do not lock up on boot, a module parameter is added
to disable the interrupt clear

Fixes: 1277fa2ab2f9 ("rtlwifi: Remove the clear interrupt routine from all drivers")
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Stable <stable@vger.kernel.org> [V4.1+]
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

selftests: Introduce a new script to generate tc batch file

[ Upstream commit 7f071998474a9e5f7b98103d3058a1b8ca5887e6 ]

  # ./tdc_batch.py -h
  usage: tdc_batch.py [-h] [-n NUMBER] [-o] [-s] [-p] device file

  TC batch file generator

  positional arguments:
    device                device name
    file                  batch file name

  optional arguments:
    -h, --help            show this help message and exit
    -n NUMBER, --number NUMBER
                          how many lines in batch file
    -o, --skip_sw         skip_sw (offload), by default skip_hw
    -s, --share_action    all filters share the same action
    -p, --prio            all filters have different prio

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

mtd: blkdevs: fix potential deadlock + lockdep warnings

[ Upstream commit f3c63795e90f0c6238306883b6c72f14d5355721 ]

Commit 073db4a51ee4 ("mtd: fix: avoid race condition when accessing
mtd->usecount") fixed a race condition but due to poor ordering of the
mutex acquisition, introduced a potential deadlock.

The deadlock can occur, for example, when rmmod'ing the m25p80 module, which
will delete one or more MTDs, along with any corresponding mtdblock
devices. This could potentially race with an acquisition of the block
device as follows.

-> blktrans_open()
    ->  mutex_lock(&dev->lock);
    ->  mutex_lock(&mtd_table_mutex);

-> del_mtd_device()
    ->  mutex_lock(&mtd_table_mutex);
    ->  blktrans_notify_remove() -> del_mtd_blktrans_dev()
       ->  mutex_lock(&dev->lock);

This is a classic (potential) ABBA deadlock, which can be fixed by
making the A->B ordering consistent everywhere. There was no real
purpose to the ordering in the original patch, AFAIR, so this shouldn't
be a problem. This ordering was actually already present in
del_mtd_blktrans_dev(), for one, where the function tried to ensure that
its caller already held mtd_table_mutex before it acquired &dev->lock:

        if (mutex_trylock(&mtd_table_mutex)) {
                mutex_unlock(&mtd_table_mutex);
                BUG();
        }

So, reverse the ordering of acquisition of &dev->lock and &mtd_table_mutex so
we always acquire mtd_table_mutex first.

Snippets of the lockdep output follow:

  # modprobe -r m25p80
  [   53.419251]
  [   53.420838] ======================================================
  [   53.427300] [ INFO: possible circular locking dependency detected ]
  [   53.433865] 4.3.0-rc6 #96 Not tainted
  [   53.437686] -------------------------------------------------------
  [   53.444220] modprobe/372 is trying to acquire lock:
  [   53.449320]  (&new->lock){+.+...}, at: [<c043fe4c>] del_mtd_blktrans_dev+0x80/0xdc
  [   53.457271]
  [   53.457271] but task is already holding lock:
  [   53.463372]  (mtd_table_mutex){+.+.+.}, at: [<c0439994>] del_mtd_device+0x18/0x100
  [   53.471321]
  [   53.471321] which lock already depends on the new lock.
  [   53.471321]
  [   53.479856]
  [   53.479856] the existing dependency chain (in reverse order) is:
  [   53.487660]
  -> #1 (mtd_table_mutex){+.+.+.}:
  [   53.492331]        [<c043fc5c>] blktrans_open+0x34/0x1a4
  [   53.497879]        [<c01afce0>] __blkdev_get+0xc4/0x3b0
  [   53.503364]        [<c01b0bb8>] blkdev_get+0x108/0x320
  [   53.508743]        [<c01713c0>] do_dentry_open+0x218/0x314
  [   53.514496]        [<c0180454>] path_openat+0x4c0/0xf9c
  [   53.519959]        [<c0182044>] do_filp_open+0x5c/0xc0
  [   53.525336]        [<c0172758>] do_sys_open+0xfc/0x1cc
  [   53.530716]        [<c000f740>] ret_fast_syscall+0x0/0x1c
  [   53.536375]
  -> #0 (&new->lock){+.+...}:
  [   53.540587]        [<c063f124>] mutex_lock_nested+0x38/0x3cc
  [   53.546504]        [<c043fe4c>] del_mtd_blktrans_dev+0x80/0xdc
  [   53.552606]        [<c043f164>] blktrans_notify_remove+0x7c/0x84
  [   53.558891]        [<c04399f0>] del_mtd_device+0x74/0x100
  [   53.564544]        [<c043c670>] del_mtd_partitions+0x80/0xc8
  [   53.570451]        [<c0439aa0>] mtd_device_unregister+0x24/0x48
  [   53.576637]        [<c046ce6c>] spi_drv_remove+0x1c/0x34
  [   53.582207]        [<c03de0f0>] __device_release_driver+0x88/0x114
  [   53.588663]        [<c03de19c>] device_release_driver+0x20/0x2c
  [   53.594843]        [<c03dd9e8>] bus_remove_device+0xd8/0x108
  [   53.600748]        [<c03dacc0>] device_del+0x10c/0x210
  [   53.606127]        [<c03dadd0>] device_unregister+0xc/0x20
  [   53.611849]        [<c046d878>] __unregister+0x10/0x20
  [   53.617211]        [<c03da868>] device_for_each_child+0x50/0x7c
  [   53.623387]        [<c046eae8>] spi_unregister_master+0x58/0x8c
  [   53.629578]        [<c03e12f0>] release_nodes+0x15c/0x1c8
  [   53.635223]        [<c03de0f8>] __device_release_driver+0x90/0x114
  [   53.641689]        [<c03de900>] driver_detach+0xb4/0xb8
  [   53.647147]        [<c03ddc78>] bus_remove_driver+0x4c/0xa0
  [   53.652970]        [<c00cab50>] SyS_delete_module+0x11c/0x1e4
  [   53.658976]        [<c000f740>] ret_fast_syscall+0x0/0x1c
  [   53.664621]
  [   53.664621] other info that might help us debug this:
  [   53.664621]
  [   53.672979]  Possible unsafe locking scenario:
  [   53.672979]
  [   53.679169]        CPU0                    CPU1
  [   53.683900]        ----                    ----
  [   53.688633]   lock(mtd_table_mutex);
  [   53.692383]                                lock(&new->lock);
  [   53.698306]                                lock(mtd_table_mutex);
  [   53.704658]   lock(&new->lock);
  [   53.707946]
  [   53.707946]  *** DEADLOCK ***

Fixes: 073db4a51ee4 ("mtd: fix: avoid race condition when accessing mtd->usecount")
Reported-by: Felipe Balbi <balbi@ti.com>
Tested-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: dapm: Don't add prefix to widget stream name

[ Upstream commit a798c24a69b64f09e2d323ac8155a36373e5d5fd ]

Commit fdb6eb0a1287 ("ASoC: dapm: Modify widget stream name according to
prefix") fixed the case where a DAPM route between a DAI widget and a
DAC/ADC/AIF widget with a matching stream name was not created when the
DAPM context was using a prefix.

Unfortunately the patch introduced a few issues on its own like leaking the
dynamically allocated stream name memory and also not checking whether the
allocation succeeded in the first place.

It is also incomplete in that it still does not handle the case where
stream name of the widget is a substring of the stream name of the DAI,
which is explicitly allowed and works fine if no DAPM prefix is used.

Revert the commit and take a slightly different approach to solving the
issue. Instead of comparing the widget's stream name to the name of the DAI
widget compare it to the stream name of the DAI widget. The stream name of
the DAI widget is identical to the name of the DAI widget except that it
wont have the DAPM prefix added. So this approach behaves identical
regardless to whether the DAPM context uses a prefix or not.

We don't have to worry about potentially matching with a widget with the
same stream name, but from a different DAPM context with a different
prefix, since the code already makes sure that both the DAI widget and the
matched widget are from the same DAPM context.

Fixes: fdb6eb0a1287 ("ASoC: dapm: Modify widget stream name according to prefix")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>

lib: make memzero_explicit more robust against dead store elimination

[ Upstream commit 7829fb09a2b4268b30dd9bc782fa5ebee278b137 ]

In commit 0b053c951829 ("lib: memzero_explicit: use barrier instead
of OPTIMIZER_HIDE_VAR"), we made memzero_explicit() more robust in
case LTO would decide to inline memzero_explicit() and eventually
find out it could be elimiated as dead store.

While using barrier() works well for the case of gcc, recent efforts
from LLVMLinux people suggest to use llvm as an alternative to gcc,
and there, Stephan found in a simple stand-alone user space example
that llvm could nevertheless optimize and thus elimitate the memset().
A similar issue has been observed in the referenced llvm bug report,
which is regarded as not-a-bug.

Based on some experiments, icc is a bit special on its own, while it
doesn't seem to eliminate the memset(), it could do so with an own
implementation, and then result in similar findings as with llvm.

The fix in this patch now works for all three compilers (also tested
with more aggressive optimization levels). Arguably, in the current
kernel tree it's more of a theoretical issue, but imho, it's better
to be pedantic about it.

It's clearly visible with gcc/llvm though, with the below code: if we
would have used barrier() only here, llvm would have omitted clearing,
not so with barrier_data() variant:

  static inline void memzero_explicit(void *s, size_t count)
  {
    memset(s, 0, count);
    barrier_data(s);
  }

  int main(void)
  {
    char buff[20];
    memzero_explicit(buff, sizeof(buff));
    return 0;
  }

  $ gcc -O2 test.c
  $ gdb a.out
  (gdb) disassemble main
  Dump of assembler code for function main:
   0x0000000000400400  <+0>: lea   -0x28(%rsp),%rax
   0x0000000000400405  <+5>: movq  $0x0,-0x28(%rsp)
   0x000000000040040e <+14>: movq  $0x0,-0x20(%rsp)
   0x0000000000400417 <+23>: movl  $0x0,-0x18(%rsp)
   0x000000000040041f <+31>: xor   %eax,%eax
   0x0000000000400421 <+33>: retq
  End of assembler dump.

  $ clang -O2 test.c
  $ gdb a.out
  (gdb) disassemble main
  Dump of assembler code for function main:
   0x00000000004004f0  <+0>: xorps  %xmm0,%xmm0
   0x00000000004004f3  <+3>: movaps %xmm0,-0x18(%rsp)
   0x00000000004004f8  <+8>: movl   $0x0,-0x8(%rsp)
   0x0000000000400500 <+16>: lea    -0x18(%rsp),%rax
   0x0000000000400505 <+21>: xor    %eax,%eax
   0x0000000000400507 <+23>: retq
  End of assembler dump.

As gcc, clang, but also icc defines __GNUC__, it's sufficient to define
this in compiler-gcc.h only to be picked up. For a fallback or otherwise
unsupported compiler, we define it as a barrier. Similarly, for ecc which
does not support gcc inline asm.

Reference: https://llvm.org/bugs/show_bug.cgi?id=15495
Reported-by: Stephan Mueller <smueller@chronox.de>
Tested-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Stephan Mueller <smueller@chronox.de>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: mancha security <mancha1@zoho.com>
Cc: Mark Charlebois <charlebm@gmail.com>
Cc: Behan Webster <behanw@converseincode.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>

dm9000: Fix irq trigger type setup on non-dt platforms

[ Upstream commit a96d3b7593a3eefab62dd930e5c99201c3678ee4 ]

Commit b5a099c67a1c36b "net: ethernet: davicom: fix devicetree irq
resource" causes an interrupt storm after the ethernet interface
is activated on S3C24XX platform (ARM non-dt), due to the interrupt
trigger type not being set properly.

It seems, after adding parsing of IRQ flags in commit 7085a7401ba54e92b
"drivers: platform: parse IRQ flags from resources", there is no path
for non-dt platforms where irq_set_type callback could be invoked when
we don't pass the trigger type flags to the request_irq() call.

In case of a board where the regression is seen the interrupt trigger
type flags are passed through a platform device's resource and it is
not currently handled properly without passing the irq trigger type
flags to the request_irq() call. In case of OF an of_irq_get() call
within platform_get_irq() function seems to be ensuring required irq_chip
setup, but there is no equivalent code for non OF/ACPI platforms.

This patch mostly restores irq trigger type setting code which has been
removed in commit ("net: ethernet: davicom: fix devicetree irq resource").

Fixes: b5a099c67a1c36b913 ("net: ethernet: davicom: fix devicetree irq resource")
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

MIPS: Fix up obsolete cpu_set usage

[ Upstream commit 7363cb7de3999e84243bca79ffea257fd86a2cc6 ]

cpu_set was removed (along with a bunch of cpumask helpers) by
commit 2f0f267ea072 ("cpumask: remove deprecated functions.").

Fix this by replacing cpu_set with cpumask_set_cpu. Without this
fix the following error is triggered when CONFIG_MIPS_MT_FPAFF=y.

arch/mips/kernel/smp-cps.c: In function 'cps_smp_setup':
arch/mips/kernel/smp-cps.c:95:3: error: implicit declaration of function 'cpu_set' [-Werror=implicit-function-declaration]

Fixes: 90db024f140d ("MIPS: smp-cps: cpu_set FPU mask if FPU present")
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@imgtec.com>
Acked-by: Niklas Cassel <niklass@axis.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9912/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

perf bench numa: Fix to show proper convergence stats

[ Upstream commit 2b42b09b88c831ba4da2d669581dde371c38c2af ]

With commit: e1e455f4f4d3 (perf tools: Work around lack of sched_getcpu
in glibc < 2.6), perf_bench numa mem with -c or -m option is not able to
correctly calculate convergence.

With the above commit, sched_getcpu always seems to return -1. The
intention of commit e1e455f was to add a sched_getcpu in glibc < 2.6.
Hence keep the sched_getcpu definition under an ifdef.

This regression happened occurred between v4.0 and v4.1

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Vinson Lee <vlee@twitter.com>
Fixes: e1e455f4f4d3 ("perf tools: Work around lack of sched_getcpu in glibc < 2.6")
Link: http://lkml.kernel.org/r/20150624111004.GA5220@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net: ethernet: davicom: fix devicetree irq resource

[ Upstream commit b5a099c67a1c36b91356624ce86eb3f9f48a82c7 ]

The dm9000 driver doesn't work in at least one device-tree
configuration, spitting an error message on irq resource :
[    1.062495] dm9000 8000000.ethernet: insufficient resources
[    1.068439] dm9000 8000000.ethernet: not found (-2).
[    1.073451] dm9000: probe of 8000000.ethernet failed with error -2

The reason behind is that the interrupt might be provided by a gpio
controller, not probed when dm9000 is probed, and needing the probe
deferral mechanism to apply.

Currently, the interrupt is directly taken from resources. This patch
changes this to use the more generic platform_get_irq(), which handles
the deferral.

Moreover, since commit Fixes: 7085a7401ba5 ("drivers: platform: parse
IRQ flags from resources"), the interrupt trigger flags are honored in
platform_get_irq(), so remove the needless code in dm9000.

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Acked-by: Marcel Ziswiler <marcel@ziswiler.com>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Tested-by: Sergei Ianovich <ynvich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ext4: fix an ext3 collapse range regression in xfstests

[ Upstream commit b9576fc3624eb9fc88bec0d0ae883fd78be86239 ]

The xfstests test suite assumes that an attempt to collapse range on
the range (0, 1) will return EOPNOTSUPP if the file system does not
support collapse range. Commit 280227a75b56: "ext4: move check under
lock scope to close a race" broke this, and this caused xfstests to
fail when run when testing file systems that did not have the extents
feature enabled.

Reported-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>

x86/idle: Restore trace_cpu_idle to mwait_idle() calls

[ Upstream commit e43d0189ac02415fe4487f79fc35e8f147e9ea0d ]

Commit b253149b843f ("sched/idle/x86: Restore mwait_idle() to fix boot
hangs, to improve power savings and to improve performance") restores
mwait_idle(), but the trace_cpu_idle related calls are missing. This
causes powertop on my old desktop powered by Intel Core2 E6550 to
report zero wakeups and zero events.

Add them back to restore the proper behaviour.

Fixes: b253149b843f ("sched/idle/x86: Restore mwait_idle() to ...")
Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Cc: <len.brown@intel.com>
Cc: stable@vger.kernel.org # 4.1
Link: http://lkml.kernel.org/r/1440046479-4262-1-git-send-email-jszhang@marvell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>

tty: serial: fsl_lpuart: fix clearing of receive flag

[ Upstream commit d68827c62a105eec547945daedf4d1d3e283717d ]

Commit 8e4934c6d6c6 ("tty: serial: fsl_lpuart: clear receive flag on FIFO
flush") implemented clearing of the receive flag by reading the status register
only. It turned out that even though we flush the FIFO afterwards, a explicit
read of the data register is still required.

This leads to a FIFO underrun. To avoid this, follow the advice in the overrun
"Operation section": Unconditionally clear RXUF after using RXFLUSH.

Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Bhuvanchandra DV <bhuvanchandra.dv@toradex.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

iommu/vt-d: Fix VM domain ID leak

[ Upstream commit 46ebb7af7b93792de65e124e1ab8b89a108a41f2 ]

This continues the attempt to fix commit fb170fb4c548 ("iommu/vt-d:
Introduce helper functions to make code symmetric for readability").
The previous attempt in commit 71684406905f ("iommu/vt-d: Detach
domain *only* from attached iommus") overlooked the fact that
dmar_domain.iommu_bmp gets cleared for VM domains when devices are
detached:

intel_iommu_detach_device
  domain_remove_one_dev_info
    domain_detach_iommu

The domain is detached from the iommu, but the iommu is still attached
to the domain, for whatever reason.  Thus when we get to domain_exit(),
we can't rely on iommu_bmp for VM domains to find the active iommus,
we must check them all.  Without that, the corresponding bit in
intel_iommu.domain_ids doesn't get cleared and repeated VM domain
creation and destruction will run out of domain IDs.  Meanwhile we
still can't call iommu_detach_domain() on arbitrary non-VM domains or
we risk clearing in-use domain IDs, as 71684406905f attempted to
address.

It's tempting to modify iommu_detach_domain() to test the domain
iommu_bmp, but the call ordering from domain_remove_one_dev_info()
prevents it being able to work as fb170fb4c548 seems to have intended.
Caching of unused VM domains on the iommu object seems to be the root
of the problem, but this code is far too fragile for that kind of
rework to be proposed for stable, so we simply revert this chunk to
its state prior to fb170fb4c548.

Fixes: fb170fb4c548 ("iommu/vt-d: Introduce helper functions to make
                      code symmetric for readability")
Fixes: 71684406905f ("iommu/vt-d: Detach domain *only* from attached
                      iommus")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: stable@vger.kernel.org # v3.17+
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx4_en: Remove dependency between timestamping capability and service_task

[ Upstream commit fc9f5ea9b4ecbe9b7839c92f0a54261809c723d3 ]

Service task is responsible for other tasks in addition to timestamping
overflow check. Launch it even if timestamping is not supported by device.

Fixes: 07841f9d94c1 ('net/mlx4_en: Schedule napi when RX buffers allocation fails')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

arm/arm64: KVM: Take mmap_sem in stage2_unmap_vm

[ Upstream commit 90f6e150e44a0dc3883110eeb3ab35d1be42b6bb ]

We don't hold the mmap_sem while searching for the VMAs when
we try to unmap each memslot for a VM. Fix this properly to
avoid unexpected results.

Fixes: commit 957db105c997 ("arm/arm64: KVM: Introduce stage2_unmap_vm")
Cc: stable@vger.kernel.org # v3.19+
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

dm: fix AB-BA deadlock in __dm_destroy()

[ Upstream commit 2a708cff93f1845b9239bc7d6310aef54e716c6a ]

__dm_destroy() takes io_barrier SRCU lock (dm_get_live_table) and
suspend_lock in reverse order.  Doing so can cause AB-BA deadlock:

  __dm_destroy                    dm_swap_table
  ---------------------------------------------------
                                  mutex_lock(suspend_lock)
  dm_get_live_table()
    srcu_read_lock(io_barrier)
                                  dm_sync_table()
                                    synchronize_srcu(io_barrier)
                                      .. waiting for dm_put_live_table()
  mutex_lock(suspend_lock)
    .. waiting for suspend_lock

Fix this by taking the locks in proper order.

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Fixes: ab7c7bb6f4ab ("dm: hold suspend_lock while suspending device during device deletion")
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>

pinctrl: imx25: ensure that a pin with id i is at position i in the info array

[ Upstream commit 9911a2d5e9d14e39692b751929a92cb5a1d9d0e0 ]

The code in pinctrl-imx.c only works correctly if in the
imx_pinctrl_soc_info passed to imx_pinctrl_probe we have:

info->pins[i].number = i
conf_reg(info->pins[i]) = 4 * i

(which conf_reg(pin) being the offset of the pin's configuration
register).

When the imx25 specific part was introduced in b4a87c9b966f ("pinctrl:
pinctrl-imx: add imx25 pinctrl driver") we had:

info->pins[i].number = i + 1
conf_reg(info->pins[i]) = 4 * i

. Commit 34027ca2bbc6 ("pinctrl: imx25: fix numbering for pins") tried
to fix that but made the situation:

info->pins[i-1].number = i
conf_reg(info->pins[i-1]) = 4 * i

which is hardly better but fixed the error seen back then.

So insert another reserved entry in the array to finally yield:

info->pins[i].number = i
conf_reg(info->pins[i]) = 4 * i

Fixes: 34027ca2bbc6 ("pinctrl: imx25: fix numbering for pins")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

Btrfs: avoid syncing log in the fast fsync path when not necessary

[ Upstream commit b659ef027792219b590d67a2baf1643a93727d29 ]

Commit 3a8b36f37806 ("Btrfs: fix data loss in the fast fsync path") added
a performance regression for that causes an unnecessary sync of the log
trees (fs/subvol and root log trees) when 2 consecutive fsyncs are done
against a file, without no writes or any metadata updates to the inode in
between them and if a transaction is committed before the second fsync is
called.

Huang Ying reported this to lkml (https://lkml.org/lkml/2015/3/18/99)
after a test sysbench test that measured a -62% decrease of file io
requests per second for that tests' workload.

The test is:

  echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
  echo performance > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
  echo performance > /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
  echo performance > /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor
  mkfs -t btrfs /dev/sda2
  mount -t btrfs /dev/sda2 /fs/sda2
  cd /fs/sda2
  for ((i = 0; i < 1024; i++)); do fallocate -l 67108864 testfile.$i; done
  sysbench --test=fileio --max-requests=0 --num-threads=4 --max-time=600 \
    --file-test-mode=rndwr --file-total-size=68719476736 --file-io-mode=sync \
    --file-num=1024 run

A test on kvm guest, running a debug kernel gave me the following results:

Without 3a8b36f378060d:             16.01 reqs/sec
With 3a8b36f378060d:                 3.39 reqs/sec
With 3a8b36f378060d and this patch: 16.04 reqs/sec

Reported-by: Huang Ying <ying.huang@intel.com>
Tested-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

of/pci: Remove duplicate kfree in of_pci_get_host_bridge_resources()

[ Upstream commit feb28979c137ba3f649ad36fc27c85c64c111f78 ]

Commit d2be00c0fb5a ("of/pci: Free resources on failure in
of_pci_get_host_bridge_resources()") fixed the error path so it frees
everything on the "resources" list. That list includes the bus_range, so
we should not free it again.

Remove the superfluous free of bus_range.

[bhelgaas: changelog]
Fixes: d2be00c0fb5a ("of/pci: Free resources on failure in of_pci_get_host_bridge_resources()")
Reported-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

x86/irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable()

[ Upstream commit d97eb8966c91f2c9d05f0a22eb89ed5b76d966d1 ]

When an interrupt is migrated away from a cpu it will stay
in its vector_irq array until smp_irq_move_cleanup_interrupt
succeeded. The cfg->move_in_progress flag is cleared already
when the IPI was sent.

When the interrupt is destroyed after migration its 'struct
irq_desc' is freed and the vector_irq arrays are cleaned up.
But since cfg->move_in_progress is already 0 the references
at cpus before the last migration will not be cleared. So
this would leave a reference to an already destroyed irq
alive.

When the cpu is taken down at this point, the
check_irq_vectors_for_cpu_disable() function finds a valid irq
number in the vector_irq array, but gets NULL for its
descriptor and dereferences it, causing a kernel panic.

This has been observed on real systems at shutdown. Add a
check to check_irq_vectors_for_cpu_disable() for a valid
'struct irq_desc' to prevent this issue.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: alnovak@suse.com
Cc: joro@8bytes.org
Link: http://lkml.kernel.org/r/20150204132754.GA10078@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

rcu: Clear need_qs flag to prevent splat

[ Upstream commit c0135d07b013fa8f7ba9ec91b4369c372e6a28cb ]

If the scheduling-clock interrupt sets the current tasks need_qs flag,
but if the current CPU passes through a quiescent state in the meantime,
then rcu_preempt_qs() will fail to clear the need_qs flag, which can fool
RCU into thinking that additional rcu_read_unlock_special() processing
is needed.  This commit therefore clears the need_qs flag before checking
for additional processing.

For this problem to occur, we need rcu_preempt_data.passed_quiesce equal
to true and current->rcu_read_unlock_special.b.need_qs also equal to true.
This condition can occur as follows:

1. CPU 0 is aware of the current preemptible RCU grace period,
but has not yet passed through a quiescent state.  Among other
things, this means that rcu_preempt_data.passed_quiesce is false.

2. Task A running on CPU 0 enters a preemptible RCU read-side
critical section.

3. CPU 0 takes a scheduling-clock interrupt, which notices the
RCU read-side critical section and the need for a quiescent state,
and thus sets current->rcu_read_unlock_special.b.need_qs to true.

4. Task A is preempted, enters the scheduler, eventually invoking
rcu_preempt_note_context_switch() which in turn invokes
rcu_preempt_qs().

Because rcu_preempt_data.passed_quiesce is false,
control enters the body of the "if" statement, which sets
rcu_preempt_data.passed_quiesce to true.

5. At this point, CPU 0 takes an interrupt.  The interrupt
handler contains an RCU read-side critical section, and
the rcu_read_unlock() notes that current->rcu_read_unlock_special
is nonzero, and thus invokes rcu_read_unlock_special().

6. Once in rcu_read_unlock_special(), the fact that
current->rcu_read_unlock_special.b.need_qs is true becomes
apparent, so rcu_read_unlock_special() invokes rcu_preempt_qs().
Recursively, given that we interrupted out of that same
function in the preceding step.

7. Because rcu_preempt_data.passed_quiesce is now true,
rcu_preempt_qs() does nothing, and simply returns.

8. Upon return to rcu_read_unlock_special(), it is noted that
current->rcu_read_unlock_special is still nonzero (because
the interrupted rcu_preempt_qs() had not yet gotten around
to clearing current->rcu_read_unlock_special.b.need_qs).

9. Execution proceeds to the WARN_ON_ONCE(), which notes that
we are in an interrupt handler and thus duly splats.

The solution, as noted above, is to make rcu_read_unlock_special()
clear out current->rcu_read_unlock_special.b.need_qs after calling
rcu_preempt_qs().  The interrupted rcu_preempt_qs() will clear it again,
but this is harmless.  The worst that happens is that we clobber another
attempt to set this field, but this is not a problem because we just
got done reporting a quiescent state.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix embarrassing build bug noted by Sasha Levin. ]
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

nfs: fix high load average due to callback thread sleeping

[ Upstream commit 5d05e54af3cdbb13cf19c557ff2184781b91a22c ]

Chuck pointed out a problem that crept in with commit 6ffa30d3f734 (nfs:
don't call blocking operations while !TASK_RUNNING). Linux counts tasks
in uninterruptible sleep against the load average, so this caused the
system's load average to be pinned at at least 1 when there was a
NFSv4.1+ mount active.

Not a huge problem, but it's probably worth fixing before we get too
many complaints about it. This patch converts the code back to use
TASK_INTERRUPTIBLE sleep, simply has it flush any signals on each loop
iteration. In practice no one should really be signalling this thread at
all, so I think this is reasonably safe.

With this change, there's also no need to game the hung task watchdog so
we can also convert the schedule_timeout call back to a normal schedule.

Cc: <stable@vger.kernel.org>
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Fixes: commit 6ffa30d3f734 (“nfs: don't call blocking . . .”)
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

rtnl: don't account unused struct ifla_port_vsi in rtnl_port_size

[ Upstream commit 025331df34f6722f86b467cb13a69326444ab1bc ]

When allocating rtnl dump messages, struct ifla_port_vsi is never dumped,
so we can save header plus payload in rtnl_port_size(). Infact, attribute
IFLA_PORT_VSI_TYPE and struct ifla_port_vsi are not used anywhere in
the kernel. We only need to keep the nla policy should applications in
user space be filling this out. Same NLA_BINARY issue exists as was fixed
in 364d5716a7ad ("rtnetlink: ifla_vf_policy: fix misuses of NLA_BINARY")
and others, but then again IFLA_PORT_VSI_TYPE is not used anywhere, so
just add a comment that it's unused.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

quota: Fix maximum quota limit settings

[ Upstream commit 7e08da50cf706151f324349f9235ebd311226997 ]

Currently quota format that supports 64-bit usage sets maximum quota
limit as 2^64-1. However quota core code uses signed numbers to track
usage and even limits themselves are stored in long long. Checking of
maximum allowable limits worked by luck until commit 14bf61ffe6ac
(quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space
units) because variable we compared with was unsigned. After that commit
the type we compared against changed to signed and thus checks for
maximum limits with the newest VFS quota format started to refuse any
non-negative value. Later the problem was inadvertedly fixed by commit
b10a08194c2b (quota: Store maximum space limit in bytes) because we
started to compare against unsigned type as well.

Fix possible future problems of this kind by setting maximum limits to
2^63-1 to avoid overflow issues.

Reported-by: Carlos Carvalho <carlos@fisica.ufpr.br>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>

clk: rockchip: fix deadlock possibility in cpuclk

[ Upstream commit a5e1baf7dca10f8cf945394034013260297bc416 ]

Lockdep reported a possible deadlock between the cpuclk lock and for example
the i2c driver.

       CPU0                    CPU1
       ----                    ----
  lock(clk_lock);
                               local_irq_disable();
                               lock(&(&i2c->lock)->rlock);
                               lock(clk_lock);
  <Interrupt>
    lock(&(&i2c->lock)->rlock);

*** DEADLOCK ***

The generic clock-types of the core ccf already use spin_lock_irqsave when
touching clock registers, so do the same for the cpuclk.

Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Michael Turquette <mturquette@linaro.org>
[mturquette@linaro.org: removed initialization of "flags"]
Signed-off-by: Sasha Levin <sashal@kernel.org>

ARM: dts: disable CCI on exynos5420 based arndale-octa

[ Upstream commit 25217fef355174209eff68c0eb438a8af5d7b01c ]

The arndale-octa board was giving "imprecise external aborts" during
boot-up with MCPM enabled. CCI enablement of the boot cluster was found
to be the cause of these aborts (possibly because the secure f/w was not
allowing it). Hence, disable CCI for the arndale-octa board.

Signed-off-by: Abhilash Kesavan <a.kesavan@samsung.com>
Tested-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Tested-by: Kevin Hilman <khilman@linaro.org>
Tested-by: Tyler Baker <tyler.baker@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

drivers: bus: check cci device tree node status

[ Upstream commit 896ddd600ba4a3426aeb11710ae9c28dd7ce68ce ]

The arm-cci driver completes the probe sequence even if the cci node is
marked as disabled. Add a check in the driver to honour the cci status
in the device tree.

Signed-off-by: Abhilash Kesavan <a.kesavan@samsung.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

perf tools: Fix segfault for symbol annotation on TUI

[ Upstream commit 813ccd15452ed34e97aa526ffc70d6d8e6c466c5 ]

Currently the symbol structure is allocated with symbol_conf.priv_size
to carry sideband information like annotation, map browser on TUI and
sort-by-name tree node.  So retrieving these information from symbol
needs to care about the details of such placement.

However the annotation code just assumes that the symbol is placed after
the struct annotation.  But actually there's other info between them.
So accessing those struct will lead to an undefined behavior (usually a
crash) after they write their info to the same location.

To reproduce the problem, please follow the steps below:

  1. run perf report (TUI of course) with -v option
  2. open map browser (by pressing right arrow key for any entry)
  3. search any function (by pressing '/' key and input whatever..)
  4. return to the hist browser (by pressing 'q' or left arrow key)
  5. open annotation window for the same entry (by pressing 'a' key)

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1421234288-22758-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

perf tools: Avoid build splat for syscall numbers with uclibc

[ Upstream commit ea1fe3a88763d4dfef7e2529ba606f96e8e6b271 ]

This is due to duplicated unistd inclusion (via uClibc headers + kernel headers)
Also seen on ARM uClibc based tools

   ------- ARC build ---------->8-------------

  CC       util/evlist.o
In file included from
~/arc/k.org/arch/arc/include/uapi/asm/unistd.h:25:0,
                 from util/../perf-sys.h:10,
                 from util/../perf.h:15,
                 from util/event.h:7,
                 from util/event.c:3:
~/arc/k.org/include/uapi/asm-generic/unistd.h:906:0:
warning: "__NR_fcntl64" redefined [enabled by default]
#define __NR_fcntl64 __NR3264_fcntl
^
In file included from
~/arc/gnu/INSTALL_1412-arc-2014.12-rc1/arc-snps-linux-uclibc/sysroot/usr/include/sys/syscall.h:24:0,
                 from util/../perf-sys.h:6,
   ----------------->8-------------------

   ------- ARM build ---------->8-------------

  CC FPIC  plugin_scsi.o
In file included from util/../perf-sys.h:9:0,
                 from util/../perf.h:15,
                 from util/cache.h:7,
                 from perf.c:12:
~/arc/k.org/arch/arm/include/uapi/asm/unistd.h:28:0:
warning: "__NR_restart_syscall" redefined [enabled by default]
In file included from
~/buildroot/host/usr/arm-buildroot-linux-uclibcgnueabi/sysroot/usr/include/sys/syscall.h:25:0,
                 from util/../perf-sys.h:6,
                 from util/../perf.h:15,
                 from util/cache.h:7,
                 from perf.c:12:
~/buildroot/host/usr/arm-buildroot-linux-uclibcgnueabi/sysroot/usr/include/bits/sysnum.h:17:0:
note: this is the location of the previous definition
   ----------------->8-------------------

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Alexey Brodkin <Alexey.Brodkin@synopsys.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1421156604-30603-4-git-send-email-vgupta@synopsys.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

perf tools: Fix statfs.f_type data type mismatch build error with uclibc

[ Upstream commit db1806edcfef007d9594435a331dcf7e7f1b8fac ]

ARC Linux uses the no legacy syscalls abi and corresponding uClibc headers
statfs defines f_type to be U32 which causes perf build breakage

http://git.uclibc.org/uClibc/tree/libc/sysdeps/linux/common-generic/bits/statfs.h

  ----------->8---------------
    CC       fs/fs.o
  fs/fs.c: In function 'fs__valid_mount':
  fs/fs.c:82:24: error: comparison between signed and unsigned integer
  expressions [-Werror=sign-compare]
    else if (st_fs.f_type != magic)
                          ^
  cc1: all warnings being treated as errors
  ----------->8---------------

Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Cody P Schafer <dev@codyps.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Link: http://lkml.kernel.org/r/1420888254-17504-2-git-send-email-vgupta@synopsys.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

perf machine: Fix __machine__findnew_thread() error path

[ Upstream commit 260d819e3abdbdaa2b88fb983d1314f1b263f9e2 ]

When thread__init_map_groups() fails, a new thread should be removed
from the rbtree since it's gonna be freed. Also update last match cache
only if the function succeeded.

Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1420763892-15535-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

perf/x86/intel: Fix bug for "cycles:p" and "cycles:pp" on SLM

[ Upstream commit 33636732dcd7cc738a5913bb730d663c6b03c8fb ]

cycles:p and cycles:pp do not work on SLM since commit:

86a04461a99f ("perf/x86: Revamp PEBS event selection")

UOPS_RETIRED.ALL is not a PEBS capable event, so it should not be used
to count cycle number.

Actually SLM calls intel_pebs_aliases_core2() which uses INST_RETIRED.ANY_P
to count the number of cycles. It's a PEBS capable event. But inv and
cmask must be set to count cycles.

Considering SLM allows all events as PEBS with no flags, only
INST_RETIRED.ANY_P, inv=1, cmask=16 needs to handled specially.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1421084541-31639-1-git-send-email-kan.liang@intel.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

perf/rapl: Fix sysfs_show() initialization for RAPL PMU

[ Upstream commit 433678bdc6ed39f053c55da96b51de5bf0aeebb1 ]

This patch fixes a problem with the initialization of the
sysfs_show() routine for the RAPL PMU.

The current code was wrongly relying on the EVENT_ATTR_STR()
macro which uses the events_sysfs_show() function in the x86
PMU code. That function itself was relying on the x86_pmu data
structure. Yet RAPL and the core PMU (x86_pmu) have nothing to
do with each other. They should therefore not interact with
each other.

The x86_pmu structure is initialized at boot time based on
the host CPU model. When the host CPU is not supported, the
x86_pmu remains uninitialized and some of the callbacks it
contains are NULL.

The false dependency with x86_pmu could potentially cause crashes
in case the x86_pmu is not initialized while the RAPL PMU is. This
may, for instance, be the case in virtualized environments.

This patch fixes the problem by using a private sysfs_show()
routine for exporting the RAPL PMU events.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20150113225953.GA21525@thinkpad
Cc: vincent.weaver@maine.edu
Cc: jolsa@redhat.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

tracing: Fix enabling of syscall events on the command line

[ Upstream commit ce1039bd3a89e99e4f624e75fb1777fc92d76eb3 ]

Commit 5f893b2639b2 "tracing: Move enabling tracepoints to just after
rcu_init()" broke the enabling of system call events from the command
line. The reason was that the enabling of command line trace events
was moved before PID 1 started, and the syscall tracepoints require
that all tasks have the TIF_SYSCALL_TRACEPOINT flag set. But the
swapper task (pid 0) is not part of that. Since the swapper task is the
only task that is running at this early in boot, no task gets the
flag set, and the tracepoint never gets reached.

Instead of setting the swapper task flag (there should be no reason to
do that), re-enabled trace events again after the init thread (PID 1)
has been started. It requires disabling all command line events and
re-enabling them, as just enabling them again will not reset the logic
to set the TIF_SYSCALL_TRACEPOINT flag, as the syscall tracepoint will
be fooled into thinking that it was already set, and wont try setting
it again. For this reason, we must first disable it and re-enable it.

Link: http://lkml.kernel.org/r/1421188517-18312-1-git-send-email-mpe@ellerman.id.au
Link: http://lkml.kernel.org/r/20150115040506.216066449@goodmis.org
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

fbdev/broadsheetfb: fix memory leak

[ Upstream commit ef6899cdc8608e2f018e590683f04bb04a069704 ]

static code analysis from cppcheck reports:

[drivers/video/fbdev/broadsheetfb.c:673]:
(error) Memory leak: sector_buffer

sector_buffer is not being kfree'd on each call to
broadsheet_spiflash_rewrite_sector(), so free it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ARM: at91: board-dt-sama5: add phy_fixup to override NAND_Tree

[ Upstream commit b8659752c37ec157ee254cff443b1c9d523aea22 ]

Appearance: On some SAMA5D4EK boards, after power up, the Eth1 doesn't work.

Reason: The PIOE2 pin is connected to the NAND_Tree# of KSZ8081,
But it outputs LOW during the reset period, which cause the NAND_Tree# enabled.

Add phy_fixup() to disable NAND_Tree by overriding the Operation
Mode Strap Override register(i.e. Register 16h) to clear the NAND_Tree bit.

Signed-off-by: Wenyou Yang <wenyou.yang@atmel.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ARM: at91/dt: sam9263: Add missing clocks to lcdc node

[ Upstream commit 55eb9c343fdd3611ae3de6ab8a8512f303d3f581 ]

atmel_lcdfb needs also uses hclk clock, but AT91SAM9263 doesn't have that
specific clock, so use lcd_clk twice. The same was done in
arch/arm/mach-at91/at91sam9263.c

Signed-off-by: Alexander Stein <alexanders83@web.de>
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ARM: at91: sama5d3: dt: correct the sound route

[ Upstream commit 04582fd03fb263598e3b126c76cc42195aa0fd05 ]

The MICBIAS is a supply, should route to MIC while not IN1L.

Signed-off-by: Bo Shen <voice.shen@atmel.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ARM: at91/dt: sama5d4: fix the timer reg length

[ Upstream commit 0068b2e1b7f925a818fdc0a5d10ef0ad40f746e7 ]

The second property of reg is the length, so correct it for timer.

Signed-off-by: Bo Shen <voice.shen@atmel.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

mcb: mcb-pci: Only remap the 1st 0x200 bytes of BAR 0

[ Upstream commit 7b7c54914f73966976893747ee8e2ca58166a627 ]

Currently it is not possible to have a kernel with built-in MCB attached
devices. This results out of the fact that mcb-pci requests PCI BAR 0, then
parses the chameleon table and calls the driver's probe function before
releasing BAR 0 again. When building the kernel with modules this is not a
problem (and therefore it wasn't detected by my tests yet).

A solution is to only remap the 1st 0x200 bytes of a Chameleon PCI device.
0x200 bytes is the maximum size of a Chameleon v2 Table.

Also this patch stops disabling the PCI device on successful registration of MCB
devices.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@men.de>
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

serial: samsung: Add the support for Exynos5433 SoC

[ Upstream commit 31ec77aca72ee5920ed3ec3d047734dc0bc43342 ]

This patch adds new s3c24xx_serial_drv_data structure for Exynos5433 SoC
because Exynos5433 has different fifo size from existing Exynos4 SoC.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: linux-serial@vger.kernel.org
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Acked-by: Inki Dae <inki.dae@samsung.com>
Acked-by: Geunsik Lim <geunsik.lim@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

Revert "tty: Fix pty master poll() after slave closes v2"

[ Upstream commit 2ce3c10c0c3e0d418c1a7a4c838319ba42c75388 ]

This reverts commit c4dc304677e8d566572c4738d95c48be150c6606.
This fix is superseded by commit 52bce7f8d4fc633c9a9d0646eef58ba6ae9a3b73,
'pty, n_tty: Simplify input processing on final close'.

The final close now waits for input processing to complete before
destroying the pty, so poll() does not need to special case this
condition.

Cc: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

usb: host: ehci-tegra: request deferred probe when failing to get phy

[ Upstream commit f56e67f0a880a5b795cdb5f62614aafe264c5304 ]

The commit 1290a958d48e ("usb: phy: propagate __of_usb_find_phy()'s error on
failure") changed the condition to return -EPROBE_DEFER to host driver.
Originally the Tegra host driver depended on the returned -EPROBE_DEFER to
get the phy device later when booting. Now we have to do that explicitly.

Signed-off-by: Vince Hsu <vinceh@nvidia.com>
Tested-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

uas: disable UAS on Apricorn SATA dongles

[ Upstream commit 36d1ffdb210ec2d0d6a69e9f6466ae8727d34119 ]

The Apricorn SATA dongle will occasionally return "USBSUSBSUSB" in
response to SCSI commands when running in UAS mode. Therefore,
disable UAS mode on this dongle.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

USB: EHCI: adjust error return code

[ Upstream commit c401e7b4a808d50ab53ef45cb8d0b99b238bf2c9 ]

The USB stack uses error code -ENOSPC to indicate that the periodic
schedule is too full, with insufficient bandwidth to accommodate a new
allocation. It uses -EFBIG to indicate that an isochronous transfer
could not be linked into the schedule because it would exceed the
number of isochronous packets the host controller driver can handle
(generally because the new transfer would extend too far into the
future).

ehci-hcd uses the wrong error code at one point. This patch fixes it,
along with a misleading comment and debugging message.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

scsi: ->queue_rq can't sleep

[ Upstream commit 70a0f2c1898c6abf53670e55642b6e840b003892 ]

The blk-mq ->queue_rq method is always called from process context,
but might have preemption disabled. This means we still always
have to use GFP_ATOMIC for memory allocations, and thus need to
revert part of commit 3c356bde1 ("scsi: stop passing a gfp_mask
argument down the command setup path").

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Tested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

arm: dts: Use pmu_system_controller phandle for dp phy

[ Upstream commit e93e54544adf3aa6908b821e896cb17a562cb683 ]

DP PHY now require pmu-system-controller to handle PMU register
to control PHY's power isolation. Adding the same to dp-phy
node.

Signed-off-by: Vivek Gautam <gautam.vivek@samsung.com>
Reviewed-by: Jingoo Han <jg1.han@samsung.com>
Tested-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Signed-off-by: Kukjin Kim <kgene@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

NFSv4: Remove incorrect check in can_open_delegated()

[ Upstream commit 4e379d36c050b0117b5d10048be63a44f5036115 ]

Remove an incorrect check for NFS_DELEGATION_NEED_RECLAIM in
can_open_delegated(). We are allowed to cache opens even in
a situation where we're doing reboot recovery.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

NFS: Ignore transport protocol when detecting server trunking

[ Upstream commit 7a01edf0058df98d6cc734c5a4ecc51f929a86ec ]

Detect server trunking across transport protocols. Otherwise, an
RDMA mount and a TCP mount of the same server will end up with
separate nfs_clients using the same clientid4.

Reported-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

NFSv4/v4.1: Verify the client owner id during trunking detection

[ Upstream commit 55b9df93ddd684cbc4c2dee9b8a99f6e48348212 ]

While we normally expect the NFSv4 client to always send the same client
owner to all servers, there are a couple of situations where that is not
the case:
1) In NFSv4.0, switching between use of '-omigration' and not will cause
    the kernel to switch between using the non-uniform and uniform client
    strings.
2) In NFSv4.1, or NFSv4.0 when using uniform client strings, if the
    uniquifier string is suddenly changed.

This patch will catch those situations by checking the client owner id
in the trunking detection code, and will do the right thing if it notices
that the strings differ.

Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

NFSv4: Cache the NFSv4/v4.1 client owner_id in the struct nfs_client

[ Upstream commit ceb3a16c070c403f5f9ca46b46cf2bb79ea11750 ]

Ensure that we cache the NFSv4/v4.1 client owner_id so that we can
verify it when we're doing trunking detection.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ARM: dra7xx: Fix counter frequency drift for AM572x errata i856

[ Upstream commit afc9d590b8a150cfeaac0078ef5de6fb21a5ea6a ]

Errata i856 for the AM572x (DRA7xx) points out that the 32.768KHz external
crystal is not enabled at power up. Instead the CPU falls back to using
an emulation for the 32KHz clock which is SYSCLK1/610. SYSCLK1 is usually
20MHz on boards so far (which gives an emulated frequency of 32.786KHz),
but can also be 19.2 or 27MHz which result in much larger drift.

Since this is used to drive the master counter at 32.768KHz * 375 /
2 = 6.144MHz, the emulated speed for 20MHz is of by 570ppm, or about 43
seconds per day, and more than the 500ppm NTP is able to tolerate.

Checking the CTRL_CORE_BOOTSTRAP register can determine if the CPU
is using the real 32.768KHz crystal or the emulated SYSCLK1/610, and
by known that the real counter frequency can be determined and used.
The real speed is then SYSCLK1 / 610 * 375 / 2 or SYSCLK1 * 75 / 244.

Signed-off-by: Len Sorensen <lsorense@csclub.uwaterloo.ca>
Tested-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

iio: iio: Fix iio_channel_read return if channel havn't info

[ Upstream commit 65de7654d39c70c2b942f801cea01590cf7e3458 ]

When xilinx-xadc is used with hwmon driver to read voltage, offset used
for temperature is always applied whatever the channel.

iio_channel_read must return an error to avoid offset for channel
without IIO_CHAN_INFO_OFFSET property.

Signed-off-by: Fabien Proriol <fabien.proriol@jdsu.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

phy: phy-ti-pipe3: fix inconsistent enumeration of PCIe gen2 cards

[ Upstream commit 0bc09f9cdc589e0b54724096138996a00b19babb ]

Prior to DRA74x silicon rev 1.1, pcie_pcs register bits 8-15 and bits 16-23
were used to configure RC delay count for phy1 and phy2 respectively.
phyid was used as index to distinguish the phys and to configure the delay
values appropriately.

As of DRA74x silicon rev 1.1, pcie_pcs register definition has changed.
Bits 16-23 are used to configure delay values for *both* phy1 and phy2.

Hence phyid is no longer required.

So, drop id field from ti_pipe3 structure and its subsequent references
for configuring pcie_pcs register.

Also, pcie_pcs register now needs to be configured with delay value of 0x96
at bit positions 16-23. See register description of CTRL_CORE_PCIE_PCS in
ARM572x TRM, SPRUHZ6, October 2014, section 18.5.2.2, table 18-1804.

This is needed to ensure Gen2 cards are enumerated consistently.

DRA72x silicon behaves same way as DRA74x rev 1.1 as far as this functionality
is considered.

Test results on DRA74x and DRA72x EVMs:

Before patch
------------
DRA74x ES 1.0: Gen1 cards work, Gen2 cards do not work (expected result due to
silicon errata)
DRA74x ES 1.1: Gen1 cards work, Gen2 cards do not work sometimes due to incorrect
programming of register

DRA72x: Gen1 cards work, Gen2 cards do not work sometimes due to incorrect
programming of register

After patch
-----------
DRA74x ES 1.0: Gen1 cards work, Gen2 cards do not work (expected result due to
silicon errata)
DRA74x ES 1.1: Gen1 cards work, Gen2 cards work consistently.

DRA72x: Gen1 and Gen2 cards enumerate consistently.

Signed-off-by: Vignesh R <vigneshr@ti.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

phy-sun4i-usb: Change disconnect threshold value for sun6i

[ Upstream commit 372400344afb60e275a271f3f5ccce17af0e45cb ]

The allwinner SDK uses a value of 3 for the disconnect threshold setting on
sun6i, do the same in the kernel.

In my previous experience with sun5i problems getting the threshold right
is important to avoid usb2 devices being unplugged sometimes going unnoticed.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

usb: dwc2: gadget: kill requests with 'force' in s3c_hsotg_udc_stop()

[ Upstream commit 62f4f0651ce8ef966a0e5b6db6a7a524c268fdd2 ]

This makes us sure that all requests are completed before we unbind
gadget. There are assumptions in gadget API that all requests have to
be completed and leak of complete can break some usb function drivers.

For example unbind of ECM function can cause NULL pointer dereference:

[   26.396595] configfs-gadget gadget: unbind function
'cdc_ethernet'/e79c4c00
[   26.414999] Unable to handle kernel NULL pointer dereference at
virtual address 00000000
(...)
[   26.452223] PC is at ecm_unbind+0x6c/0x9c
[   26.456209] LR is at ecm_unbind+0x68/0x9c
(...)
[   26.603696] [<c033fdb4>] (ecm_unbind) from [<c033661c>]
(purge_configs_funcs+0x94/0xd8)
[   26.611674] [<c033661c>] (purge_configs_funcs) from [<c0336674>]
(configfs_composite_unbind+0x14/0x34)
[   26.620961] [<c0336674>] (configfs_composite_unbind) from
[<c0337124>] (usb_gadget_remove_driver+0x68/0x9c)
[   26.630683] [<c0337124>] (usb_gadget_remove_driver) from [<c03376c8>]
(usb_gadget_unregister_driver+0x64/0x94)
[   26.640664] [<c03376c8>] (usb_gadget_unregister_driver) from
[<c0336be8>] (unregister_gadget+0x20/0x3c)
[   26.650038] [<c0336be8>] (unregister_gadget) from [<c0336c84>]
(gadget_dev_desc_UDC_store+0x80/0xb8)
[   26.659152] [<c0336c84>] (gadget_dev_desc_UDC_store) from
[<c0335120>] (gadget_info_attr_store+0x1c/0x28)
[   26.668703] [<c0335120>] (gadget_info_attr_store) from [<c012135c>]
(configfs_write_file+0xe8/0x148)
[   26.677818] [<c012135c>] (configfs_write_file) from [<c00c8dd4>]
(vfs_write+0xb0/0x1a0)
[   26.685801] [<c00c8dd4>] (vfs_write) from [<c00c91b8>]
(SyS_write+0x44/0x84)
[   26.692834] [<c00c91b8>] (SyS_write) from [<c000e560>]
(ret_fast_syscall+0x0/0x30)
[   26.700381] Code: e30409f8 e34c0069 eb07b88d e59430a8 (e5930000)
[   26.706485] ---[ end trace f62a082b323838a2 ]---

It's because in some cases request is still running on endpoint during
unbind and kill_all_requests() called from s3c_hsotg_udc_stop() function
doesn't cause call of complete() of request. Missing complete() call
causes ecm->notify_req equals NULL in ecm_unbind() function, and this
is reason of this bug.

Similar breaks can be observed in another usb function drivers.

This patch fixes this bug forcing usb request completion in when
s3c_hsotg_ep_disable() is called from s3c_hsotg_udc_stop().

Acked-by: Paul Zimmerman <paulz@synopsys.com>
Signed-off-by: Robert Baldyga <r.baldyga@samsung.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

usb: musb: Fix randconfig build issues for Kconfig options

[ Upstream commit c0442479652b99b62dd1ffccb34231caff25751c ]

Commit 82c02f58ba3a ("usb: musb: Allow multiple glue layers to be
built in") enabled selecting multiple glue layers, which in turn
exposed things more for randconfig builds. If NOP_USB_XCEIV is
built-in and TUSB6010 is a loadable module, we will get:

drivers/built-in.o: In function `tusb_remove':
tusb6010.c:(.text+0x16a817): undefined reference to `usb_phy_generic_unregister'
drivers/built-in.o: In function `tusb_probe':
tusb6010.c:(.text+0x16b24e): undefined reference to `usb_phy_generic_register'
make: *** [vmlinux] Error 1

Let's fix this the same way as commit 70c1ff4b3c86 ("usb: musb:
tusb-dma can't be built-in if tusb is not").

And while at it, let's not allow selecting the glue layers except
on platforms really using them unless COMPILE_TEST is specified:

- TUSB6010 is in practise only used on omaps

- DSPS is only used on TI platforms

- UX500 is only used on STE platforms

Cc: Linus Walleij <linus.walleij@linaro.org>
Reported-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

usb: gadget: f_uac1: access freed memory at f_audio_free_inst

[ Upstream commit 4fde6204df052bb89ba3d915ed6ed9f306f3cfa1 ]

At f_audio_free_inst, it tries to access struct gaudio *card which is
freed at f_audio_free, it causes below oops if the audio device is not
there (do unload module may trigger the same problem). The gaudio_cleanup
is related to function, so it is better move to f_audio_free.

root@freescale ~$ modprobe g_audio
[  751.968931] g_audio gadget: unable to open sound control device file: /dev/snd/controlC0
[  751.977134] g_audio gadget: we need at least one control device
[  751.988633] Unable to handle kernel paging request at virtual address 455f448e
[  751.995963] pgd = bd42c000
[  751.998681] [455f448e] *pgd=00000000
[  752.002383] Internal error: Oops: 5 [#1] SMP ARM
[  752.007008] Modules linked in: usb_f_uac1 g_audio(+) usb_f_mass_storage libcomposite configfs [last unloaded: g_mass_storage]
[  752.018427] CPU: 0 PID: 692 Comm: modprobe Not tainted 3.18.0-rc4-00345-g842f57b #10
[  752.026176] task: bdb3ba80 ti: bd41a000 task.ti: bd41a000
[  752.031590] PC is at filp_close+0xc/0x84
[  752.035530] LR is at gaudio_cleanup+0x28/0x54 [usb_f_uac1]
[  752.041023] pc : [<800ec94c>]    lr : [<7f03c63c>]    psr: 20000013
[  752.041023] sp : bd41bcc8  ip : bd41bce8  fp : bd41bce4
[  752.052504] r10: 7f036234  r9 : 7f036220  r8 : 7f036500
[  752.057732] r7 : bd456480  r6 : 7f036500  r5 : 7f03626c  r4 : bd441000
[  752.064264] r3 : 7f03b3dc  r2 : 7f03cab0  r1 : 00000000  r0 : 455f4456
[  752.070798] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  752.077938] Control: 10c5387d  Table: bd42c04a  DAC: 00000015
[  752.083688] Process modprobe (pid: 692, stack limit = 0xbd41a240)
[  752.089786] Stack: (0xbd41bcc8 to 0xbd41c000)
[  752.094152] bcc0:                   7f03b3dc bd441000 7f03626c 7f036500 bd41bcfc bd41bce8
[  752.102337] bce0: 7f03c63c 800ec94c 7f03b3dc bdaa6b00 bd41bd14 bd41bd00 7f03b3f4 7f03c620
[  752.110521] bd00: 7f03b3dc 7f03cbd4 bd41bd2c bd41bd18 7f00f88c 7f03b3e8 00000000 fffffffe
[  752.118705] bd20: bd41bd5c bd41bd30 7f0380d8 7f00f874 7f038000 bd456480 7f036364 be392240
[  752.126889] bd40: 00000000 7f00f620 7f00f638 bd41a008 bd41bd94 bd41bd60 7f00f6d4 7f03800c
[  752.135073] bd60: 00000001 00000000 8047438c be3a4000 7f036364 7f036364 7f00db28 7f00f620
[  752.143257] bd80: 7f00f638 bd41a008 bd41bdb4 bd41bd98 804742ac 7f00f644 00000000 809adde0
[  752.151442] bda0: 7f036364 7f036364 bd41bdcc bd41bdb8 804743c8 80474284 7f03633c 7f036200
[  752.159626] bdc0: bd41bdf4 bd41bdd0 7f00d5b4 8047435c bd41a000 80974060 7f038158 00000000
[  752.167811] bde0: 80974060 bdaa9940 bd41be04 bd41bdf8 7f03816c 7f00d518 bd41be8c bd41be08
[  752.175995] be00: 80008a5c 7f038164 be001f00 7f0363c4 bd41bf48 00000000 bd41be54 bd41be28
[  752.184179] be20: 800e9498 800e8e74 00000002 00000003 bd4129c0 c0a07000 00000001 7f0363c4
[  752.192363] be40: bd41bf48 00000000 bd41be74 bd41be58 800de780 800e9320 bd41a000 7f0363d0
[  752.200547] be60: 00000000 bd41a000 7f0363d0 00000000 bd41beec 7f0363c4 bd41bf48 00000000
[  752.208731] be80: bd41bf44 bd41be90 80093e54 800089e0 ffff8000 00007fff 80091390 0000065f
[  752.216915] bea0: 00000000 c0a0834c bd41bf7c 00000086 bd41bf50 00000000 7f03651c 00000086
[  752.225099] bec0: bd41a010 00c28758 800ddcc4 800ddae0 000000d2 bd412a00 bd41bf24 00000000
[  752.233283] bee0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  752.241467] bf00: 00000000 00000000 00000000 00000000 00000000 00000000 bd41bf44 000025b0
[  752.249651] bf20: 00c28a08 00c28758 00000080 8000edc4 bd41a000 00000000 bd41bfa4 bd41bf48
[  752.257835] bf40: 800943e4 800932ec c0a07000 000025b0 c0a07f8c c0a07ea4 c0a08e5c 0000051c
[  752.266019] bf60: 0000088c 00000000 00000000 00000000 00000018 00000019 00000010 0000000b
[  752.274203] bf80: 00000009 00000000 00000000 000025b0 00000000 00c28758 00000000 bd41bfa8
[  752.282387] bfa0: 8000ec00 8009430c 000025b0 00000000 00c28a08 000025b0 00c28758 00c28980
[  752.290571] bfc0: 000025b0 00000000 00c28758 00000080 000a6a78 00000007 00c28718 00c28980
[  752.298756] bfe0: 7ebc1af0 7ebc1ae0 0001a32c 76e9c490 60000010 00c28a08 22013510 ecebffff
[  752.306933] Backtrace:
[  752.309414] [<800ec940>] (filp_close) from [<7f03c63c>] (gaudio_cleanup+0x28/0x54 [usb_f_uac1])
[  752.318115]  r6:7f036500 r5:7f03626c r4:bd441000 r3:7f03b3dc
[  752.323851] [<7f03c614>] (gaudio_cleanup [usb_f_uac1]) from [<7f03b3f4>] (f_audio_free_inst+0x18/0x68 [usb_f_uac1])
[  752.334288]  r4:bdaa6b00 r3:7f03b3dc
[  752.337931] [<7f03b3dc>] (f_audio_free_inst [usb_f_uac1]) from [<7f00f88c>] (usb_put_function_instance+0x24/0x30 [libcomposite])
[  752.349498]  r4:7f03cbd4 r3:7f03b3dc
[  752.353127] [<7f00f868>] (usb_put_function_instance [libcomposite]) from [<7f0380d8>] (audio_bind+0xd8/0xfc [g_audio])
[  752.363824]  r4:fffffffe r3:00000000
[  752.367456] [<7f038000>] (audio_bind [g_audio]) from [<7f00f6d4>] (composite_bind+0x9c/0x1e8 [libcomposite])
[  752.377284]  r10:bd41a008 r9:7f00f638 r8:7f00f620 r7:00000000 r6:be392240 r5:7f036364
[  752.385193]  r4:bd456480 r3:7f038000
[  752.388825] [<7f00f638>] (composite_bind [libcomposite]) from [<804742ac>] (udc_bind_to_driver+0x34/0xd8)
[  752.398394]  r10:bd41a008 r9:7f00f638 r8:7f00f620 r7:7f00db28 r6:7f036364 r5:7f036364
[  752.406302]  r4:be3a4000
[  752.408860] [<80474278>] (udc_bind_to_driver) from [<804743c8>] (usb_gadget_probe_driver+0x78/0xa8)
[  752.417908]  r6:7f036364 r5:7f036364 r4:809adde0 r3:00000000
[  752.423649] [<80474350>] (usb_gadget_probe_driver) from [<7f00d5b4>] (usb_composite_probe+0xa8/0xd4 [libcomposite])
[  752.434086]  r5:7f036200 r4:7f03633c
[  752.437713] [<7f00d50c>] (usb_composite_probe [libcomposite]) from [<7f03816c>] (audio_driver_init+0x14/0x1c [g_audio])
[  752.448498]  r9:bdaa9940 r8:80974060 r7:00000000 r6:7f038158 r5:80974060 r4:bd41a000
[  752.456330] [<7f038158>] (audio_driver_init [g_audio]) from [<80008a5c>] (do_one_initcall+0x88/0x1d4)
[  752.465564] [<800089d4>] (do_one_initcall) from [<80093e54>] (load_module+0xb74/0x1020)
[  752.473571]  r10:00000000 r9:bd41bf48 r8:7f0363c4 r7:bd41beec r6:00000000 r5:7f0363d0
[  752.481478]  r4:bd41a000
[  752.484037] [<800932e0>] (load_module) from [<800943e4>] (SyS_init_module+0xe4/0xf8)
[  752.491781]  r10:00000000 r9:bd41a000 r8:8000edc4 r7:00000080 r6:00c28758 r5:00c28a08
[  752.499689]  r4:000025b0
[  752.502252] [<80094300>] (SyS_init_module) from [<8000ec00>] (ret_fast_syscall+0x0/0x48)
[  752.510345]  r6:00c28758 r5:00000000 r4:000025b0
[  752.515013] Code: 808475b4 e1a0c00d e92dd878 e24cb004 (e5904038)
[  752.521223] ---[ end trace 70babe34de4ab99b ]---
Segmentation fault

Signed-off-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

usb: musb: Fix a few off-by-one lengths

[ Upstream commit e87c3f80ad0490d26ffe04754b7d094463b40f30 ]

!strncmp(buf, "force host", 9) is true if and only if buf starts with
"force hos". This was obviously not what was intended. The same error
exists for "force full-speed", "force high-speed" and "test
packet". Using strstarts avoids the error-prone hardcoding of the
prefix length.

For consistency, also change the other occurences of the !strncmp
idiom.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>