]> git.itanic.dy.fi Git - linux-stable/log
linux-stable
10 years agoLinux 3.12.18 v3.12.18
Jiri Slaby [Fri, 18 Apr 2014 09:14:28 +0000 (11:14 +0200)]
Linux 3.12.18

10 years agocrypto: ghash-clmulni-intel - use C implementation for setkey()
Ard Biesheuvel [Thu, 27 Mar 2014 17:14:40 +0000 (18:14 +0100)]
crypto: ghash-clmulni-intel - use C implementation for setkey()

commit 8ceee72808d1ae3fb191284afc2257a2be964725 upstream.

The GHASH setkey() function uses SSE registers but fails to call
kernel_fpu_begin()/kernel_fpu_end(). Instead of adding these calls, and
then having to deal with the restriction that they cannot be called from
interrupt context, move the setkey() implementation to the C domain.

Note that setkey() does not use any particular SSE features and is not
expected to become a performance bottleneck.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Fixes: 0e1227d356e9b (crypto: ghash - Add PCLMULQDQ accelerated implementation)
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoARC: [nsimosci] Unbork console
Vineet Gupta [Sat, 5 Apr 2014 10:00:22 +0000 (15:30 +0530)]
ARC: [nsimosci] Unbork console

commit 61fb4bfc010b0d2940f7fd87acbce6a0f03217cb upstream.

Despite the switch to right UART driver (prev patch), serial console
still doesn't work due to missing CONFIG_SERIAL_OF_PLATFORM

Also fix the default cmdline in DT to not refer to out-of-tree
ARC framebuffer driver for console.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Francois Bedard <Francois.Bedard@synopsys.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoARC: [nsimosci] Change .dts to use generic 8250 UART
Mischa Jonker [Thu, 16 May 2013 17:36:08 +0000 (19:36 +0200)]
ARC: [nsimosci] Change .dts to use generic 8250 UART

commit 6eda477b3c54b8236868c8784e5e042ff14244f0 upstream.

The Synopsys APB DW UART has a couple of special features that are not
in the System C model. In 3.8, the 8250_dw driver didn't really use these
features, but from 3.9 onwards, the 8250_dw driver has become incompatible
with our model.

Signed-off-by: Mischa Jonker <mjonker@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Francois Bedard <Francois.Bedard@synopsys.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agords: prevent dereference of a NULL device in rds_iw_laddr_check
Sasha Levin [Sun, 30 Mar 2014 00:39:35 +0000 (20:39 -0400)]
rds: prevent dereference of a NULL device in rds_iw_laddr_check

[ Upstream commit bf39b4247b8799935ea91d90db250ab608a58e50 ]

Binding might result in a NULL device which is later dereferenced
without checking.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoisdnloop: several buffer overflows
Dan Carpenter [Tue, 8 Apr 2014 09:23:09 +0000 (12:23 +0300)]
isdnloop: several buffer overflows

[ Upstream commit 7563487cbf865284dcd35e9ef5a95380da046737 ]

There are three buffer overflows addressed in this patch.

1) In isdnloop_fake_err() we add an 'E' to a 60 character string and
then copy it into a 60 character buffer.  I have made the destination
buffer 64 characters and I'm changed the sprintf() to a snprintf().

2) In isdnloop_parse_cmd(), p points to a 6 characters into a 60
character buffer so we have 54 characters.  The ->eazlist[] is 11
characters long.  I have modified the code to return if the source
buffer is too long.

3) In isdnloop_command() the cbuf[] array was 60 characters long but the
max length of the string then can be up to 79 characters.  I made the
cbuf array 80 characters long and changed the sprintf() to snprintf().
I also removed the temporary "dial" buffer and changed it to use "p"
directly.

Unfortunately, we pass the "cbuf" string from isdnloop_command() to
isdnloop_writecmd() which truncates anything over 60 characters to make
it fit in card->omsg[].  (It can accept values up to 255 characters so
long as there is a '\n' character every 60 characters).  For now I have
just fixed the memory corruption bug and left the other problems in this
driver alone.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoisdnloop: Validate NUL-terminated strings from user.
YOSHIFUJI Hideaki [Wed, 2 Apr 2014 03:48:42 +0000 (12:48 +0900)]
isdnloop: Validate NUL-terminated strings from user.

[ Upstream commit 77bc6bed7121936bb2e019a8c336075f4c8eef62 ]

Return -EINVAL unless all of user-given strings are correctly
NUL-terminated.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agonet: vxlan: fix crash when interface is created with no group
Mike Rapoport [Tue, 1 Apr 2014 06:23:01 +0000 (09:23 +0300)]
net: vxlan: fix crash when interface is created with no group

[ Upstream commit 5933a7bbb5de66482ea8aa874a7ebaf8e67603c4 ]

If the vxlan interface is created without explicit group definition,
there are corner cases which may cause kernel panic.

For instance, in the following scenario:

node A:
$ ip link add dev vxlan42  address 2c:c2:60:00:10:20 type vxlan id 42
$ ip addr add dev vxlan42 10.0.0.1/24
$ ip link set up dev vxlan42
$ arp -i vxlan42 -s 10.0.0.2 2c:c2:60:00:01:02
$ bridge fdb add dev vxlan42 to 2c:c2:60:00:01:02 dst <IPv4 address>
$ ping 10.0.0.2

node B:
$ ip link add dev vxlan42 address 2c:c2:60:00:01:02 type vxlan id 42
$ ip addr add dev vxlan42 10.0.0.2/24
$ ip link set up dev vxlan42
$ arp -i vxlan42 -s 10.0.0.1 2c:c2:60:00:10:20

node B crashes:

 vxlan42: 2c:c2:60:00:10:20 migrated from 4011:eca4:c0a8:6466:c0a8:6415:8e09:2118 to (invalid address)
 vxlan42: 2c:c2:60:00:10:20 migrated from 4011:eca4:c0a8:6466:c0a8:6415:8e09:2118 to (invalid address)
 BUG: unable to handle kernel NULL pointer dereference at 0000000000000046
 IP: [<ffffffff8143c459>] ip6_route_output+0x58/0x82
 PGD 7bd89067 PUD 7bd4e067 PMD 0
 Oops: 0000 [#1] SMP
 Modules linked in:
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.14.0-rc8-hvx-xen-00019-g97a5221-dirty #154
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 task: ffff88007c774f50 ti: ffff88007c79c000 task.ti: ffff88007c79c000
 RIP: 0010:[<ffffffff8143c459>]  [<ffffffff8143c459>] ip6_route_output+0x58/0x82
 RSP: 0018:ffff88007fd03668  EFLAGS: 00010282
 RAX: 0000000000000000 RBX: ffffffff8186a000 RCX: 0000000000000040
 RDX: 0000000000000000 RSI: ffff88007b0e4a80 RDI: ffff88007fd03754
 RBP: ffff88007fd03688 R08: ffff88007b0e4a80 R09: 0000000000000000
 R10: 0200000a0100000a R11: 0001002200000000 R12: ffff88007fd03740
 R13: ffff88007b0e4a80 R14: ffff88007b0e4a80 R15: ffff88007bba0c50
 FS:  0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 0000000000000046 CR3: 000000007bb60000 CR4: 00000000000006e0
 Stack:
  0000000000000000 ffff88007fd037a0 ffffffff8186a000 ffff88007fd03740
  ffff88007fd036c8 ffffffff814320bb 0000000000006e49 ffff88007b8b7360
  ffff88007bdbf200 ffff88007bcbc000 ffff88007b8b7000 ffff88007b8b7360
 Call Trace:
  <IRQ>
  [<ffffffff814320bb>] ip6_dst_lookup_tail+0x2d/0xa4
  [<ffffffff814322a5>] ip6_dst_lookup+0x10/0x12
  [<ffffffff81323b4e>] vxlan_xmit_one+0x32a/0x68c
  [<ffffffff814a325a>] ? _raw_spin_unlock_irqrestore+0x12/0x14
  [<ffffffff8104c551>] ? lock_timer_base.isra.23+0x26/0x4b
  [<ffffffff8132451a>] vxlan_xmit+0x66a/0x6a8
  [<ffffffff8141a365>] ? ipt_do_table+0x35f/0x37e
  [<ffffffff81204ba2>] ? selinux_ip_postroute+0x41/0x26e
  [<ffffffff8139d0c1>] dev_hard_start_xmit+0x2ce/0x3ce
  [<ffffffff8139d491>] __dev_queue_xmit+0x2d0/0x392
  [<ffffffff813b380f>] ? eth_header+0x28/0xb5
  [<ffffffff8139d569>] dev_queue_xmit+0xb/0xd
  [<ffffffff813a5aa6>] neigh_resolve_output+0x134/0x152
  [<ffffffff813db741>] ip_finish_output2+0x236/0x299
  [<ffffffff813dc074>] ip_finish_output+0x98/0x9d
  [<ffffffff813dc749>] ip_output+0x62/0x67
  [<ffffffff813da9f2>] dst_output+0xf/0x11
  [<ffffffff813dc11c>] ip_local_out+0x1b/0x1f
  [<ffffffff813dcf1b>] ip_send_skb+0x11/0x37
  [<ffffffff813dcf70>] ip_push_pending_frames+0x2f/0x33
  [<ffffffff813ff732>] icmp_push_reply+0x106/0x115
  [<ffffffff813ff9e4>] icmp_reply+0x142/0x164
  [<ffffffff813ffb3b>] icmp_echo.part.16+0x46/0x48
  [<ffffffff813c1d30>] ? nf_iterate+0x43/0x80
  [<ffffffff813d8037>] ? xfrm4_policy_check.constprop.11+0x52/0x52
  [<ffffffff813ffb62>] icmp_echo+0x25/0x27
  [<ffffffff814005f7>] icmp_rcv+0x1d2/0x20a
  [<ffffffff813d8037>] ? xfrm4_policy_check.constprop.11+0x52/0x52
  [<ffffffff813d810d>] ip_local_deliver_finish+0xd6/0x14f
  [<ffffffff813d8037>] ? xfrm4_policy_check.constprop.11+0x52/0x52
  [<ffffffff813d7fde>] NF_HOOK.constprop.10+0x4c/0x53
  [<ffffffff813d82bf>] ip_local_deliver+0x4a/0x4f
  [<ffffffff813d7f7b>] ip_rcv_finish+0x253/0x26a
  [<ffffffff813d7d28>] ? inet_add_protocol+0x3e/0x3e
  [<ffffffff813d7fde>] NF_HOOK.constprop.10+0x4c/0x53
  [<ffffffff813d856a>] ip_rcv+0x2a6/0x2ec
  [<ffffffff8139a9a0>] __netif_receive_skb_core+0x43e/0x478
  [<ffffffff812a346f>] ? virtqueue_poll+0x16/0x27
  [<ffffffff8139aa2f>] __netif_receive_skb+0x55/0x5a
  [<ffffffff8139aaaa>] process_backlog+0x76/0x12f
  [<ffffffff8139add8>] net_rx_action+0xa2/0x1ab
  [<ffffffff81047847>] __do_softirq+0xca/0x1d1
  [<ffffffff81047ace>] irq_exit+0x3e/0x85
  [<ffffffff8100b98b>] do_IRQ+0xa9/0xc4
  [<ffffffff814a37ad>] common_interrupt+0x6d/0x6d
  <EOI>
  [<ffffffff810378db>] ? native_safe_halt+0x6/0x8
  [<ffffffff810110c7>] default_idle+0x9/0xd
  [<ffffffff81011694>] arch_cpu_idle+0x13/0x1c
  [<ffffffff8107480d>] cpu_startup_entry+0xbc/0x137
  [<ffffffff8102e741>] start_secondary+0x1a0/0x1a5
 Code: 24 14 e8 f1 e5 01 00 31 d2 a8 32 0f 95 c2 49 8b 44 24 2c 49 0b 44 24 24 74 05 83 ca 04 eb 1c 4d 85 ed 74 17 49 8b 85 a8 02 00 00 <66> 8b 40 46 66 c1 e8 07 83 e0 07 c1 e0 03 09 c2 4c 89 e6 48 89
 RIP  [<ffffffff8143c459>] ip6_route_output+0x58/0x82
  RSP <ffff88007fd03668>
 CR2: 0000000000000046
 ---[ end trace 4612329caab37efd ]---

When vxlan interface is created without explicit group definition, the
default_dst protocol family is initialiazed to AF_UNSPEC and the driver
assumes IPv4 configuration. On the other side, the default_dst protocol
family is used to differentiate between IPv4 and IPv6 cases and, since,
AF_UNSPEC != AF_INET, the processing takes the IPv6 path.

Making the IPv4 assumption explicit by settting default_dst protocol
family to AF_INET4 and preventing mixing of IPv4 and IPv6 addresses in
snooped fdb entries fixes the corner case crashes.

Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoxen-netback: disable rogue vif in kthread context
Wei Liu [Tue, 1 Apr 2014 11:46:12 +0000 (12:46 +0100)]
xen-netback: disable rogue vif in kthread context

[ Upstream commit e9d8b2c2968499c1f96563e6522c56958d5a1d0d ]

When netback discovers frontend is sending malformed packet it will
disables the interface which serves that frontend.

However disabling a network interface involving taking a mutex which
cannot be done in softirq context, so we need to defer this process to
kthread context.

This patch does the following:
1. introduce a flag to indicate the interface is disabled.
2. check that flag in TX path, don't do any work if it's true.
3. check that flag in RX path, turn off that interface if it's true.

The reason to disable it in RX path is because RX uses kthread. After
this change the behavior of netback is still consistent -- it won't do
any TX work for a rogue frontend, and the interface will be eventually
turned off.

Also change a "continue" to "break" after xenvif_fatal_tx_err, as it
doesn't make sense to continue processing packets if frontend is rogue.

This is a fix for XSA-90.

Reported-by: Török Edwin <edwin@etorok.net>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agonetlink: don't compare the nul-termination in nla_strcmp
Pablo Neira [Tue, 1 Apr 2014 17:38:44 +0000 (19:38 +0200)]
netlink: don't compare the nul-termination in nla_strcmp

[ Upstream commit 8b7b932434f5eee495b91a2804f5b64ebb2bc835 ]

nla_strcmp compares the string length plus one, so it's implicitly
including the nul-termination in the comparison.

 int nla_strcmp(const struct nlattr *nla, const char *str)
 {
        int len = strlen(str) + 1;
        ...
                d = memcmp(nla_data(nla), str, len);

However, if NLA_STRING is used, userspace can send us a string without
the nul-termination. This is a problem since the string
comparison will not match as the last byte may be not the
nul-termination.

Fix this by skipping the comparison of the nul-termination if the
attribute data is nul-terminated. Suggested by Thomas Graf.

Cc: Florian Westphal <fw@strlen.de>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoipv6: some ipv6 statistic counters failed to disable bh
Hannes Frederic Sowa [Mon, 31 Mar 2014 18:14:10 +0000 (20:14 +0200)]
ipv6: some ipv6 statistic counters failed to disable bh

[ Upstream commit 43a43b6040165f7b40b5b489fe61a4cb7f8c4980 ]

After commit c15b1ccadb323ea ("ipv6: move DAD and addrconf_verify
processing to workqueue") some counters are now updated in process context
and thus need to disable bh before doing so, otherwise deadlocks can
happen on 32-bit archs. Fabio Estevam noticed this while while mounting
a NFS volume on an ARM board.

As a compensation for missing this I looked after the other *_STATS_BH
and found three other calls which need updating:

1) icmp6_send: ip6_fragment -> icmpv6_send -> icmp6_send (error handling)
2) ip6_push_pending_frames: rawv6_sendmsg -> rawv6_push_pending_frames -> ...
   (only in case of icmp protocol with raw sockets in error handling)
3) ping6_v6_sendmsg (error handling)

Fixes: c15b1ccadb323ea ("ipv6: move DAD and addrconf_verify processing to workqueue")
Reported-by: Fabio Estevam <festevam@gmail.com>
Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoxen-netback: remove pointless clause from if statement
Paul Durrant [Fri, 28 Mar 2014 11:39:05 +0000 (11:39 +0000)]
xen-netback: remove pointless clause from if statement

[ Upstream commit 0576eddf24df716d8570ef8ca11452a9f98eaab2 ]

This patch removes a test in start_new_rx_buffer() that checks whether
a copy operation is less than MAX_BUFFER_OFFSET in length, since
MAX_BUFFER_OFFSET is defined to be PAGE_SIZE and the only caller of
start_new_rx_buffer() already limits copy operations to PAGE_SIZE or less.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Reported-By: Sander Eikelenboom <linux@eikelenboom.it>
Tested-By: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agovhost: validate vhost_get_vq_desc return value
Michael S. Tsirkin [Thu, 27 Mar 2014 10:53:37 +0000 (12:53 +0200)]
vhost: validate vhost_get_vq_desc return value

[ Upstream commit a39ee449f96a2cd44ce056d8a0a112211a9b1a1f ]

vhost fails to validate negative error code
from vhost_get_vq_desc causing
a crash: we are using -EFAULT which is 0xfffffff2
as vector size, which exceeds the allocated size.

The code in question was introduced in commit
8dd014adfea6f173c1ef6378f7e5e7924866c923
    vhost-net: mergeable buffers support

CVE-2014-0055

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agovhost: fix total length when packets are too short
Michael S. Tsirkin [Thu, 27 Mar 2014 10:00:26 +0000 (12:00 +0200)]
vhost: fix total length when packets are too short

[ Upstream commit d8316f3991d207fe32881a9ac20241be8fa2bad0 ]

When mergeable buffers are disabled, and the
incoming packet is too large for the rx buffer,
get_rx_bufs returns success.

This was intentional in order for make recvmsg
truncate the packet and then handle_rx would
detect err != sock_len and drop it.

Unfortunately we pass the original sock_len to
recvmsg - which means we use parts of iov not fully
validated.

Fix this up by detecting this overrun and doing packet drop
immediately.

CVE-2014-0077

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agovlan: Set hard_header_len according to available acceleration
Vlad Yasevich [Wed, 26 Mar 2014 15:47:56 +0000 (11:47 -0400)]
vlan: Set hard_header_len according to available acceleration

[ Upstream commit fc0d48b8fb449ca007b2057328abf736cb516168 ]

Currently, if the card supports CTAG acceleration we do not
account for the vlan header even if we are configuring an
8021AD vlan.  This may not be best since we'll do software
tagging for 8021AD which will cause data copy on skb head expansion
Configure the length based on available hw offload capabilities and
vlan protocol.

CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agousbnet: include wait queue head in device structure
Oliver Neukum [Wed, 26 Mar 2014 13:32:51 +0000 (14:32 +0100)]
usbnet: include wait queue head in device structure

[ Upstream commit 14a0d635d18d0fb552dcc979d6d25106e6541f2e ]

This fixes a race which happens by freeing an object on the stack.
Quoting Julius:
> The issue is
> that it calls usbnet_terminate_urbs() before that, which temporarily
> installs a waitqueue in dev->wait in order to be able to wait on the
> tasklet to run and finish up some queues. The waiting itself looks
> okay, but the access to 'dev->wait' is totally unprotected and can
> race arbitrarily. I think in this case usbnet_bh() managed to succeed
> it's dev->wait check just before usbnet_terminate_urbs() sets it back
> to NULL. The latter then finishes and the waitqueue_t structure on its
> stack gets overwritten by other functions halfway through the
> wake_up() call in usbnet_bh().

The fix is to just not allocate the data structure on the stack.
As dev->wait is abused as a flag it also takes a runtime PM change
to fix this bug.

Signed-off-by: Oliver Neukum <oneukum@suse.de>
Reported-by: Grant Grundler <grundler@google.com>
Tested-by: Grant Grundler <grundler@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agotg3: Do not include vlan acceleration features in vlan_features
Vlad Yasevich [Mon, 24 Mar 2014 21:52:12 +0000 (17:52 -0400)]
tg3: Do not include vlan acceleration features in vlan_features

[ Upstream commit 51dfe7b944998eaeb2b34d314f3a6b16a5fd621b ]

Including hardware acceleration features in vlan_features breaks
stacked vlans (Q-in-Q) by marking the bottom vlan interface as
capable of acceleration.  This causes one of the tags to be lost
and the packets are sent with a sing vlan header.

CC: Nithin Nayak Sujir <nsujir@broadcom.com>
CC: Michael Chan <mchan@broadcom.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoip_tunnel: Fix dst ref-count.
Pravin B Shelar [Mon, 24 Mar 2014 05:06:36 +0000 (22:06 -0700)]
ip_tunnel: Fix dst ref-count.

[ Upstream commit fbd02dd405d0724a0f25897ed4a6813297c9b96f ]

Commit 10ddceb22ba (ip_tunnel:multicast process cause panic due
to skb->_skb_refdst NULL pointer) removed dst-drop call from
ip-tunnel-recv.

Following commit reintroduce dst-drop and fix the original bug by
checking loopback packet before releasing dst.
Original bug: https://bugzilla.kernel.org/show_bug.cgi?id=70681

CC: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agotipc: fix spinlock recursion bug for failed subscriptions
Erik Hugne [Mon, 24 Mar 2014 15:56:38 +0000 (16:56 +0100)]
tipc: fix spinlock recursion bug for failed subscriptions

[ Upstream commit a5d0e7c037119484a7006b883618bfa87996cb41 ]

If a topology event subscription fails for any reason, such as out
of memory, max number reached or because we received an invalid
request the correct behavior is to terminate the subscribers
connection to the topology server. This is currently broken and
produces the following oops:

[27.953662] tipc: Subscription rejected, illegal request
[27.955329] BUG: spinlock recursion on CPU#1, kworker/u4:0/6
[27.957066]  lock: 0xffff88003c67f408, .magic: dead4ead, .owner: kworker/u4:0/6, .owner_cpu: 1
[27.958054] CPU: 1 PID: 6 Comm: kworker/u4:0 Not tainted 3.14.0-rc6+ #5
[27.960230] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[27.960874] Workqueue: tipc_rcv tipc_recv_work [tipc]
[27.961430]  ffff88003c67f408 ffff88003de27c18 ffffffff815c0207 ffff88003de1c050
[27.962292]  ffff88003de27c38 ffffffff815beec5 ffff88003c67f408 ffffffff817f0a8a
[27.963152]  ffff88003de27c58 ffffffff815beeeb ffff88003c67f408 ffffffffa0013520
[27.964023] Call Trace:
[27.964292]  [<ffffffff815c0207>] dump_stack+0x45/0x56
[27.964874]  [<ffffffff815beec5>] spin_dump+0x8c/0x91
[27.965420]  [<ffffffff815beeeb>] spin_bug+0x21/0x26
[27.965995]  [<ffffffff81083df6>] do_raw_spin_lock+0x116/0x140
[27.966631]  [<ffffffff815c6215>] _raw_spin_lock_bh+0x15/0x20
[27.967256]  [<ffffffffa0008540>] subscr_conn_shutdown_event+0x20/0xa0 [tipc]
[27.968051]  [<ffffffffa000fde4>] tipc_close_conn+0xa4/0xb0 [tipc]
[27.968722]  [<ffffffffa00101ba>] tipc_conn_terminate+0x1a/0x30 [tipc]
[27.969436]  [<ffffffffa00089a2>] subscr_conn_msg_event+0x1f2/0x2f0 [tipc]
[27.970209]  [<ffffffffa0010000>] tipc_receive_from_sock+0x90/0xf0 [tipc]
[27.970972]  [<ffffffffa000fa79>] tipc_recv_work+0x29/0x50 [tipc]
[27.971633]  [<ffffffff8105dbf5>] process_one_work+0x165/0x3e0
[27.972267]  [<ffffffff8105e869>] worker_thread+0x119/0x3a0
[27.972896]  [<ffffffff8105e750>] ? manage_workers.isra.25+0x2a0/0x2a0
[27.973622]  [<ffffffff810648af>] kthread+0xdf/0x100
[27.974168]  [<ffffffff810647d0>] ? kthread_create_on_node+0x1a0/0x1a0
[27.974893]  [<ffffffff815ce13c>] ret_from_fork+0x7c/0xb0
[27.975466]  [<ffffffff810647d0>] ? kthread_create_on_node+0x1a0/0x1a0

The recursion occurs when subscr_terminate tries to grab the
subscriber lock, which is already taken by subscr_conn_msg_event.
We fix this by checking if the request to establish a new
subscription was successful, and if not we initiate termination of
the subscriber after we have released the subscriber lock.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agonetpoll: fix the skb check in pkt_is_ns
Li RongQing [Fri, 21 Mar 2014 12:53:57 +0000 (20:53 +0800)]
netpoll: fix the skb check in pkt_is_ns

[ Not applicable upstream commit, the code here has been removed
  upstream. ]

Neighbor Solicitation is ipv6 protocol, so we should check
skb->protocol with ETH_P_IPV6

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Cc: WANG Cong <amwang@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agonet: micrel : ks8851-ml: add vdd-supply support
Nishanth Menon [Fri, 21 Mar 2014 06:52:48 +0000 (01:52 -0500)]
net: micrel : ks8851-ml: add vdd-supply support

[ Upstream commit ebf4ad955d3e26d4d2a33709624fc7b5b9d3b969 ]

Few platforms use external regulator to keep the ethernet MAC supplied.
So, request and enable the regulator for driver functionality.

Fixes: 66fda75f47dc (regulator: core: Replace direct ops->disable usage)
Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Suggested-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoip6mr: fix mfc notification flags
Nicolas Dichtel [Wed, 19 Mar 2014 16:47:51 +0000 (17:47 +0100)]
ip6mr: fix mfc notification flags

[ Upstream commit f518338b16038beeb73e155e60d0f70beb9379f4 ]

Commit 812e44dd1829 ("ip6mr: advertise new mfc entries via rtnl") reuses the
function ip6mr_fill_mroute() to notify mfc events.
But this function was used only for dump and thus was always setting the
flag NLM_F_MULTI, which is wrong in case of a single notification.

Libraries like libnl will wait forever for NLMSG_DONE.

CC: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoipmr: fix mfc notification flags
Nicolas Dichtel [Wed, 19 Mar 2014 16:47:50 +0000 (17:47 +0100)]
ipmr: fix mfc notification flags

[ Upstream commit 65886f439ab0fdc2dff20d1fa87afb98c6717472 ]

Commit 8cd3ac9f9b7b ("ipmr: advertise new mfc entries via rtnl") reuses the
function ipmr_fill_mroute() to notify mfc events.
But this function was used only for dump and thus was always setting the
flag NLM_F_MULTI, which is wrong in case of a single notification.

Libraries like libnl will wait forever for NLMSG_DONE.

CC: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agortnetlink: fix fdb notification flags
Nicolas Dichtel [Wed, 19 Mar 2014 16:47:49 +0000 (17:47 +0100)]
rtnetlink: fix fdb notification flags

[ Upstream commit 1c104a6bebf3c16b6248408b84f91d09ac8a26b6 ]

Commit 3ff661c38c84 ("net: rtnetlink notify events for FDB NTF_SELF adds and
deletes") reuses the function nlmsg_populate_fdb_fill() to notify fdb events.
But this function was used only for dump and thus was always setting the
flag NLM_F_MULTI, which is wrong in case of a single notification.

Libraries like libnl will wait forever for NLMSG_DONE.

CC: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agotcp: syncookies: do not use getnstimeofday()
Eric Dumazet [Thu, 20 Mar 2014 04:02:21 +0000 (21:02 -0700)]
tcp: syncookies: do not use getnstimeofday()

[ Upstream commit 632623153196bf183a69686ed9c07eee98ff1bf8 ]

While it is true that getnstimeofday() uses about 40 cycles if TSC
is available, it can use 1600 cycles if hpet is the clocksource.

Switch to get_jiffies_64(), as this is more than enough, and
go back to 60 seconds periods.

Fixes: 8c27bd75f04f ("tcp: syncookies: reduce cookie lifetime to 128 seconds")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agovxlan: fix nonfunctional neigh_reduce()
David Stevens [Mon, 24 Mar 2014 14:39:58 +0000 (10:39 -0400)]
vxlan: fix nonfunctional neigh_reduce()

[ Upstream commit 4b29dba9c085a4fb79058fb1c45a2f6257ca3dfa ]

The VXLAN neigh_reduce() code is completely non-functional since
check-in. Specific errors:

1) The original code drops all packets with a multicast destination address,
even though neighbor solicitations are sent to the solicited-node
address, a multicast address. The code after this check was never run.
2) The neighbor table lookup used the IPv6 header destination, which is the
solicited node address, rather than the target address from the
neighbor solicitation. So neighbor lookups would always fail if it
got this far. Also for L3MISSes.
3) The code calls ndisc_send_na(), which does a send on the tunnel device.
The context for neigh_reduce() is the transmit path, vxlan_xmit(),
where the host or a bridge-attached neighbor is trying to transmit
a neighbor solicitation. To respond to it, the tunnel endpoint needs
to do a *receive* of the appropriate neighbor advertisement. Doing a
send, would only try to send the advertisement, encapsulated, to the
remote destinations in the fdb -- hosts that definitely did not do the
corresponding solicitation.
4) The code uses the tunnel endpoint IPv6 forwarding flag to determine the
isrouter flag in the advertisement. This has nothing to do with whether
or not the target is a router, and generally won't be set since the
tunnel endpoint is bridging, not routing, traffic.

The patch below creates a proxy neighbor advertisement to respond to
neighbor solicitions as intended, providing proper IPv6 support for neighbor
reduction.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agovxlan: fix potential NULL dereference in arp_reduce()
David Stevens [Tue, 18 Mar 2014 16:32:29 +0000 (12:32 -0400)]
vxlan: fix potential NULL dereference in arp_reduce()

[ Upstream commit 7346135dcd3f9b57f30a5512094848c678d7143e ]

This patch fixes a NULL pointer dereference in the event of an
skb allocation failure in arp_reduce().

Signed-Off-By: David L Stevens <dlstevens@us.ibm.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoipv6: ip6_append_data_mtu do not handle the mtu of the second fragment properly
lucien [Mon, 17 Mar 2014 04:51:01 +0000 (12:51 +0800)]
ipv6: ip6_append_data_mtu do not handle the mtu of the second fragment properly

[ Upstream commit e367c2d03dba4c9bcafad24688fadb79dd95b218 ]

In ip6_append_data_mtu(), when the xfrm mode is not tunnel(such as
transport),the ipsec header need to be added in the first fragment, so the mtu
will decrease to reserve space for it, then the second fragment come, the mtu
should be turn back, as the commit 0c1833797a5a6ec23ea9261d979aa18078720b74
said.  however, in the commit a493e60ac4bbe2e977e7129d6d8cbb0dd236be, it use
*mtu = min(*mtu, ...) to change the mtu, which lead to the new mtu is alway
equal with the first fragment's. and cannot turn back.

when I test through  ping6 -c1 -s5000 $ip (mtu=1280):
...frag (0|1232) ESP(spi=0x00002000,seq=0xb), length 1232
...frag (1232|1216)
...frag (2448|1216)
...frag (3664|1216)
...frag (4880|164)

which should be:
...frag (0|1232) ESP(spi=0x00001000,seq=0x1), length 1232
...frag (1232|1232)
...frag (2464|1232)
...frag (3696|1232)
...frag (4928|116)

so delete the min() when change back the mtu.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Fixes: 75a493e60ac4bb ("ipv6: ip6_append_data_mtu did not care about pmtudisc and frag_size")
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoipv6: Avoid unnecessary temporary addresses being generated
Heiner Kallweit [Wed, 12 Mar 2014 21:13:19 +0000 (22:13 +0100)]
ipv6: Avoid unnecessary temporary addresses being generated

[ Upstream commit ecab67015ef6e3f3635551dcc9971cf363cc1cd5 ]

tmp_prefered_lft is an offset to ifp->tstamp, not now. Therefore
age needs to be added to the condition.

Age calculation in ipv6_create_tempaddr is different from the one
in addrconf_verify and doesn't consider ADDRCONF_TIMER_FUZZ_MINUS.
This can cause age in ipv6_create_tempaddr to be less than the one
in addrconf_verify and therefore unnecessary temporary address to
be generated.
Use age calculation as in addrconf_modify to avoid this.

Signed-off-by: Heiner Kallweit <heiner.kallweit@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoeth: fec: Fix lost promiscuous mode after reconnecting cable
Stefan Wahren [Wed, 12 Mar 2014 10:28:19 +0000 (11:28 +0100)]
eth: fec: Fix lost promiscuous mode after reconnecting cable

[ Upstream commit 84fe61821e4ebab6322eeae3f3c27f77f0031978 ]

If the Freescale fec is in promiscuous mode and network cable is
reconnected then the promiscuous mode get lost. The problem is caused
by a too soon call of set_multicast_list to re-enable promisc mode.
The FEC_R_CNTRL register changes are overwritten by fec_restart.

This patch fixes this by moving the call behind the init of FEC_R_CNTRL
register in fec_restart.

Successful tested on a i.MX28 board.

Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agobonding: set correct vlan id for alb xmit path
dingtianhong [Wed, 12 Mar 2014 09:31:59 +0000 (17:31 +0800)]
bonding: set correct vlan id for alb xmit path

[ Upstream commit fb00bc2e6cd2046282ba4b03f4fe682aee70b2f8 ]

The commit d3ab3ffd1d728d7ee77340e7e7e2c7cfe6a4013e
(bonding: use rlb_client_info->vlan_id instead of ->tag)
remove the rlb_client_info->tag, but occur some issues,
The vlan_get_tag() will return 0 for success and -EINVAL for
error, so the client_info->vlan_id always be set to 0 if the
vlan_get_tag return 0 for success, so the client_info would
never get a correct vlan id.

We should only set the vlan id to 0 when the vlan_get_tag return error.

Fixes: d3ab3ffd1d7 (bonding: use rlb_client_info->vlan_id instead of ->tag)
CC: Ding Tianhong <dingtianhong@huawei.com>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agonet: socket: error on a negative msg_namelen
Matthew Leach [Tue, 11 Mar 2014 11:58:27 +0000 (11:58 +0000)]
net: socket: error on a negative msg_namelen

[ Upstream commit dbb490b96584d4e958533fb637f08b557f505657 ]

When copying in a struct msghdr from the user, if the user has set the
msg_namelen parameter to a negative value it gets clamped to a valid
size due to a comparison between signed and unsigned values.

Ensure the syscall errors when the user passes in a negative value.

Signed-off-by: Matthew Leach <matthew.leach@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agobridge: multicast: enable snooping on general queries only
Linus Lüssing [Mon, 10 Mar 2014 21:25:25 +0000 (22:25 +0100)]
bridge: multicast: enable snooping on general queries only

[ Upstream commit 20a599bec95a52fa72432b2376a2ce47c5bb68fb ]

Without this check someone could easily create a denial of service
by injecting multicast-specific queries to enable the bridge
snooping part if no real querier issuing periodic general queries
is present on the link which would result in the bridge wrongly
shutting down ports for multicast traffic as the bridge did not learn
about these listeners.

With this patch the snooping code is enabled upon receiving valid,
general queries only.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agobridge: multicast: add sanity check for general query destination
Linus Lüssing [Mon, 10 Mar 2014 21:25:24 +0000 (22:25 +0100)]
bridge: multicast: add sanity check for general query destination

[ Upstream commit 9ed973cc40c588abeaa58aea0683ea665132d11d ]

General IGMP and MLD queries are supposed to have the multicast
link-local all-nodes address as their destination according to RFC2236
section 9, RFC3376 section 4.1.12/9.1, RFC2710 section 8 and RFC3810
section 5.1.15.

Without this check, such malformed IGMP/MLD queries can result in a
denial of service: The queries are ignored by most IGMP/MLD listeners
therefore they will not respond with an IGMP/MLD report. However,
without this patch these malformed MLD queries would enable the
snooping part in the bridge code, potentially shutting down the
according ports towards these hosts for multicast traffic as the
bridge did not learn about these listeners.

Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agotcp: tcp_release_cb() should release socket ownership
Eric Dumazet [Mon, 10 Mar 2014 16:50:11 +0000 (09:50 -0700)]
tcp: tcp_release_cb() should release socket ownership

[ Upstream commit c3f9b01849ef3bc69024990092b9f42e20df7797 ]

Lars Persson reported following deadlock :

-000 |M:0x0:0x802B6AF8(asm) <-- arch_spin_lock
-001 |tcp_v4_rcv(skb = 0x8BD527A0) <-- sk = 0x8BE6B2A0
-002 |ip_local_deliver_finish(skb = 0x8BD527A0)
-003 |__netif_receive_skb_core(skb = 0x8BD527A0, ?)
-004 |netif_receive_skb(skb = 0x8BD527A0)
-005 |elk_poll(napi = 0x8C770500, budget = 64)
-006 |net_rx_action(?)
-007 |__do_softirq()
-008 |do_softirq()
-009 |local_bh_enable()
-010 |tcp_rcv_established(sk = 0x8BE6B2A0, skb = 0x87D3A9E0, th = 0x814EBE14, ?)
-011 |tcp_v4_do_rcv(sk = 0x8BE6B2A0, skb = 0x87D3A9E0)
-012 |tcp_delack_timer_handler(sk = 0x8BE6B2A0)
-013 |tcp_release_cb(sk = 0x8BE6B2A0)
-014 |release_sock(sk = 0x8BE6B2A0)
-015 |tcp_sendmsg(?, sk = 0x8BE6B2A0, ?, ?)
-016 |sock_sendmsg(sock = 0x8518C4C0, msg = 0x87D8DAA8, size = 4096)
-017 |kernel_sendmsg(?, ?, ?, ?, size = 4096)
-018 |smb_send_kvec()
-019 |smb_send_rqst(server = 0x87C4D400, rqst = 0x87D8DBA0)
-020 |cifs_call_async()
-021 |cifs_async_writev(wdata = 0x87FD6580)
-022 |cifs_writepages(mapping = 0x852096E4, wbc = 0x87D8DC88)
-023 |__writeback_single_inode(inode = 0x852095D0, wbc = 0x87D8DC88)
-024 |writeback_sb_inodes(sb = 0x87D6D800, wb = 0x87E4A9C0, work = 0x87D8DD88)
-025 |__writeback_inodes_wb(wb = 0x87E4A9C0, work = 0x87D8DD88)
-026 |wb_writeback(wb = 0x87E4A9C0, work = 0x87D8DD88)
-027 |wb_do_writeback(wb = 0x87E4A9C0, force_wait = 0)
-028 |bdi_writeback_workfn(work = 0x87E4A9CC)
-029 |process_one_work(worker = 0x8B045880, work = 0x87E4A9CC)
-030 |worker_thread(__worker = 0x8B045880)
-031 |kthread(_create = 0x87CADD90)
-032 |ret_from_kernel_thread(asm)

Bug occurs because __tcp_checksum_complete_user() enables BH, assuming
it is running from softirq context.

Lars trace involved a NIC without RX checksum support but other points
are problematic as well, like the prequeue stuff.

Problem is triggered by a timer, that found socket being owned by user.

tcp_release_cb() should call tcp_write_timer_handler() or
tcp_delack_timer_handler() in the appropriate context :

BH disabled and socket lock held, but 'owned' field cleared,
as if they were running from timer handlers.

Fixes: 6f458dfb4092 ("tcp: improve latencies of timer triggered events")
Reported-by: Lars Persson <lars.persson@axis.com>
Tested-by: Lars Persson <lars.persson@axis.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agovlan: Set correct source MAC address with TX VLAN offload enabled
Peter Boström [Mon, 10 Mar 2014 15:17:15 +0000 (16:17 +0100)]
vlan: Set correct source MAC address with TX VLAN offload enabled

[ Upstream commit dd38743b4cc2f86be250eaf156cf113ba3dd531a ]

With TX VLAN offload enabled the source MAC address for frames sent using the
VLAN interface is currently set to the address of the real interface. This is
wrong since the VLAN interface may be configured with a different address.

The bug was introduced in commit 2205369a314e12fcec4781cc73ac9c08fc2b47de
("vlan: Fix header ops passthru when doing TX VLAN offload.").

This patch sets the source address before calling the create function of the
real interface.

Signed-off-by: Peter Boström <peter.bostrom@netrounds.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agopkt_sched: fq: do not hold qdisc lock while allocating memory
Eric Dumazet [Fri, 7 Mar 2014 06:57:52 +0000 (22:57 -0800)]
pkt_sched: fq: do not hold qdisc lock while allocating memory

[ Upstream commit 2d8d40afd187bced0a3d056366fb58d66fe845e3 ]

Resizing fq hash table allocates memory while holding qdisc spinlock,
with BH disabled.

This is definitely not good, as allocation might sleep.

We can drop the lock and get it when needed, we hold RTNL so no other
changes can happen at the same time.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: afe4fd062416 ("pkt_sched: fq: Fair Queue packet scheduler")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agobnx2: Fix shutdown sequence
Michael Chan [Sun, 9 Mar 2014 23:45:32 +0000 (15:45 -0800)]
bnx2: Fix shutdown sequence

[ Upstream commit a8d9bc2e9f5d1c5a25e33cec096d2a1652d3fd52 ]

The pci shutdown handler added in:

    bnx2: Add pci shutdown handler
    commit 25bfb1dd4ba3b2d9a49ce9d9b0cd7be1840e15ed

created a shutdown down sequence without chip reset if the device was
never brought up.  This can cause the firmware to shutdown the PHY
prematurely and cause MMIO read cycles to be unresponsive.  On some
systems, it may generate NMI in the bnx2's pci shutdown handler.

The fix is to tell the firmware not to shutdown the PHY if there was
no prior chip reset.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoipv6: don't set DST_NOCOUNT for remotely added routes
Sabrina Dubroca [Thu, 6 Mar 2014 16:51:57 +0000 (17:51 +0100)]
ipv6: don't set DST_NOCOUNT for remotely added routes

[ Upstream commit c88507fbad8055297c1d1e21e599f46960cbee39 ]

DST_NOCOUNT should only be used if an authorized user adds routes
locally. In case of routes which are added on behalf of router
advertisments this flag must not get used as it allows an unlimited
number of routes getting added remotely.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoipv6: Fix exthdrs offload registration.
Anton Nayshtut [Wed, 5 Mar 2014 06:30:08 +0000 (08:30 +0200)]
ipv6: Fix exthdrs offload registration.

[ Upstream commit d2d273ffabd315eecefce21a4391d44b6e156b73 ]

Without this fix, ipv6_exthdrs_offload_init doesn't register IPPROTO_DSTOPTS
offload, but returns 0 (as the IPPROTO_ROUTING registration actually succeeds).

This then causes the ipv6_gso_segment to drop IPv6 packets with IPPROTO_DSTOPTS
header.

The issue detected and the fix verified by running MS HCK Offload LSO test on
top of QEMU Windows guests, as this test sends IPv6 packets with
IPPROTO_DSTOPTS.

Signed-off-by: Anton Nayshtut <anton@swortex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agonet: unix: non blocking recvmsg() should not return -EINTR
Eric Dumazet [Wed, 26 Mar 2014 01:42:27 +0000 (18:42 -0700)]
net: unix: non blocking recvmsg() should not return -EINTR

[ Upstream commit de1443916791d75fdd26becb116898277bb0273f ]

Some applications didn't expect recvmsg() on a non blocking socket
could return -EINTR. This possibility was added as a side effect
of commit b3ca9b02b00704 ("net: fix multithreaded signal handling in
unix recv routines").

To hit this bug, you need to be a bit unlucky, as the u->readlock
mutex is usually held for very small periods.

Fixes: b3ca9b02b00704 ("net: fix multithreaded signal handling in unix recv routines")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoinet: frag: make sure forced eviction removes all frags
Florian Westphal [Thu, 6 Mar 2014 17:06:41 +0000 (18:06 +0100)]
inet: frag: make sure forced eviction removes all frags

[ Upstream commit e588e2f286ed7da011ed357c24c5b9a554e26595 ]

Quoting Alexander Aring:
  While fragmentation and unloading of 6lowpan module I got this kernel Oops
  after few seconds:

  BUG: unable to handle kernel paging request at f88bbc30
  [..]
  Modules linked in: ipv6 [last unloaded: 6lowpan]
  Call Trace:
   [<c012af4c>] ? call_timer_fn+0x54/0xb3
   [<c012aef8>] ? process_timeout+0xa/0xa
   [<c012b66b>] run_timer_softirq+0x140/0x15f

Problem is that incomplete frags are still around after unload; when
their frag expire timer fires, we get crash.

When a netns is removed (also done when unloading module), inet_frag
calls the evictor with 'force' argument to purge remaining frags.

The evictor loop terminates when accounted memory ('work') drops to 0
or the lru-list becomes empty.  However, the mem accounting is done
via percpu counters and may not be accurate, i.e. loop may terminate
prematurely.

Alter evictor to only stop once the lru list is empty when force is
requested.

Reported-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Reported-by: Alexander Aring <alex.aring@gmail.com>
Tested-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agotipc: don't log disabled tasklet handler errors
Erik Hugne [Thu, 6 Mar 2014 13:40:21 +0000 (14:40 +0100)]
tipc: don't log disabled tasklet handler errors

[ Upstream commit 2892505ea170094f982516bb38105eac45f274b1 ]

Failure to schedule a TIPC tasklet with tipc_k_signal because the
tasklet handler is disabled is not an error. It means TIPC is
currently in the process of shutting down. We remove the error
logging in this case.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agotipc: fix memory leak during module removal
Erik Hugne [Thu, 6 Mar 2014 13:40:20 +0000 (14:40 +0100)]
tipc: fix memory leak during module removal

[ Upstream commit 1bb8dce57f4d15233688c68990852a10eb1cd79f ]

When the TIPC module is removed, the tasklet handler is disabled
before all other subsystems. This will cause lingering publications
in the name table because the node_down tasklets responsible to
clean up publications from an unreachable node will never run.
When the name table is shut down, these publications are detected
and an error message is logged:
tipc: nametbl_stop(): orphaned hash chain detected
This is actually a memory leak, introduced with commit
993b858e37b3120ee76d9957a901cca22312ffaa ("tipc: correct the order
of stopping services at rmmod")

Instead of just logging an error and leaking memory, we free
the orphaned entries during nametable shutdown.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agotipc: drop subscriber connection id invalidation
Erik Hugne [Thu, 6 Mar 2014 13:40:19 +0000 (14:40 +0100)]
tipc: drop subscriber connection id invalidation

[ Upstream commit edcc0511b5ee7235282a688cd604e3ae7f9e1fc9 ]

When a topology server subscriber is disconnected, the associated
connection id is set to zero. A check vs zero is then done in the
subscription timeout function to see if the subscriber have been
shut down. This is unnecessary, because all subscription timers
will be cancelled when a subscriber terminates. Setting the
connection id to zero is actually harmful because id zero is the
identity of the topology server listening socket, and can cause a
race that leads to this socket being closed instead.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agotipc: fix connection refcount leak
Ying Xue [Thu, 6 Mar 2014 13:40:17 +0000 (14:40 +0100)]
tipc: fix connection refcount leak

[ Upstream commit 4652edb70e8a7eebbe47fa931940f65522c36e8f ]

When tipc_conn_sendmsg() calls tipc_conn_lookup() to query a
connection instance, its reference count value is increased if
it's found. But subsequently if it's found that the connection is
closed, the work of sending message is not queued into its server
send workqueue, and the connection reference count is not decreased.
This will cause a reference count leak. To reproduce this problem,
an application would need to open and closes topology server
connections with high intensity.

We fix this by immediately decrementing the connection reference
count if a send fails due to the connection being closed.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agotipc: allow connection shutdown callback to be invoked in advance
Ying Xue [Thu, 6 Mar 2014 13:40:16 +0000 (14:40 +0100)]
tipc: allow connection shutdown callback to be invoked in advance

[ Upstream commit 6d4ebeb4df0176b1973875840a9f7e91394c0685 ]

Currently connection shutdown callback function is called when
connection instance is released in tipc_conn_kref_release(), and
receiving packets and sending packets are running in different
threads. Even if connection is closed by the thread of receiving
packets, its shutdown callback may not be called immediately as
the connection reference count is non-zero at that moment. So,
although the connection is shut down by the thread of receiving
packets, the thread of sending packets doesn't know it. Before
its shutdown callback is invoked to tell the sending thread its
connection has been closed, the sending thread may deliver
messages by tipc_conn_sendmsg(), this is why the following error
information appears:

"Sending subscription event failed, no memory"

To eliminate it, allow connection shutdown callback function to
be called before connection id is removed in tipc_close_conn(),
which makes the sending thread know the truth in time that its
socket is closed so that it doesn't send message to it. We also
remove the "Sending XXX failed..." error reporting for topology
and config services.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agobridge: multicast: add sanity check for query source addresses
Linus Lüssing [Tue, 4 Mar 2014 02:57:35 +0000 (03:57 +0100)]
bridge: multicast: add sanity check for query source addresses

[ Upstream commit 6565b9eeef194afbb3beec80d6dd2447f4091f8c ]

MLD queries are supposed to have an IPv6 link-local source address
according to RFC2710, section 4 and RFC3810, section 5.1.14. This patch
adds a sanity check to ignore such broken MLD queries.

Without this check, such malformed MLD queries can result in a
denial of service: The queries are ignored by any MLD listener
therefore they will not respond with an MLD report. However,
without this patch these malformed MLD queries would enable the
snooping part in the bridge code, potentially shutting down the
according ports towards these hosts for multicast traffic as the
bridge did not learn about these listeners.

Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agonet: sctp: fix skb leakage in COOKIE ECHO path of chunk->auth_chunk
Daniel Borkmann [Tue, 4 Mar 2014 15:35:51 +0000 (16:35 +0100)]
net: sctp: fix skb leakage in COOKIE ECHO path of chunk->auth_chunk

[ Upstream commit c485658bae87faccd7aed540fd2ca3ab37992310 ]

While working on ec0223ec48a9 ("net: sctp: fix sctp_sf_do_5_1D_ce to
verify if we/peer is AUTH capable"), we noticed that there's a skb
memory leakage in the error path.

Running the same reproducer as in ec0223ec48a9 and by unconditionally
jumping to the error label (to simulate an error condition) in
sctp_sf_do_5_1D_ce() receive path lets kmemleak detector bark about
the unfreed chunk->auth_chunk skb clone:

Unreferenced object 0xffff8800b8f3a000 (size 256):
  comm "softirq", pid 0, jiffies 4294769856 (age 110.757s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    89 ab 75 5e d4 01 58 13 00 00 00 00 00 00 00 00  ..u^..X.........
  backtrace:
    [<ffffffff816660be>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff8119f328>] kmem_cache_alloc+0xc8/0x210
    [<ffffffff81566929>] skb_clone+0x49/0xb0
    [<ffffffffa0467459>] sctp_endpoint_bh_rcv+0x1d9/0x230 [sctp]
    [<ffffffffa046fdbc>] sctp_inq_push+0x4c/0x70 [sctp]
    [<ffffffffa047e8de>] sctp_rcv+0x82e/0x9a0 [sctp]
    [<ffffffff815abd38>] ip_local_deliver_finish+0xa8/0x210
    [<ffffffff815a64af>] nf_reinject+0xbf/0x180
    [<ffffffffa04b4762>] nfqnl_recv_verdict+0x1d2/0x2b0 [nfnetlink_queue]
    [<ffffffffa04aa40b>] nfnetlink_rcv_msg+0x14b/0x250 [nfnetlink]
    [<ffffffff815a3269>] netlink_rcv_skb+0xa9/0xc0
    [<ffffffffa04aa7cf>] nfnetlink_rcv+0x23f/0x408 [nfnetlink]
    [<ffffffff815a2bd8>] netlink_unicast+0x168/0x250
    [<ffffffff815a2fa1>] netlink_sendmsg+0x2e1/0x3f0
    [<ffffffff8155cc6b>] sock_sendmsg+0x8b/0xc0
    [<ffffffff8155d449>] ___sys_sendmsg+0x369/0x380

What happens is that commit bbd0d59809f9 clones the skb containing
the AUTH chunk in sctp_endpoint_bh_rcv() when having the edge case
that an endpoint requires COOKIE-ECHO chunks to be authenticated:

  ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
  <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
  ------------------ AUTH; COOKIE-ECHO ---------------->
  <-------------------- COOKIE-ACK ---------------------

When we enter sctp_sf_do_5_1D_ce() and before we actually get to
the point where we process (and subsequently free) a non-NULL
chunk->auth_chunk, we could hit the "goto nomem_init" path from
an error condition and thus leave the cloned skb around w/o
freeing it.

The fix is to centrally free such clones in sctp_chunk_destroy()
handler that is invoked from sctp_chunk_free() after all refs have
dropped; and also move both kfree_skb(chunk->auth_chunk) there,
so that chunk->auth_chunk is either NULL (since sctp_chunkify()
allocs new chunks through kmem_cache_zalloc()) or non-NULL with
a valid skb pointer. chunk->skb and chunk->auth_chunk are the
only skbs in the sctp_chunk structure that need to be handeled.

While at it, we should use consume_skb() for both. It is the same
as dev_kfree_skb() but more appropriately named as we are not
a device but a protocol. Also, this effectively replaces the
kfree_skb() from both invocations into consume_skb(). Functions
are the same only that kfree_skb() assumes that the frame was
being dropped after a failure (e.g. for tools like drop monitor),
usage of consume_skb() seems more appropriate in function
sctp_chunk_destroy() though.

Fixes: bbd0d59809f9 ("[SCTP]: Implement the receive and verification of AUTH chunk")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <yasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agonet: fix for a race condition in the inet frag code
Nikolay Aleksandrov [Mon, 3 Mar 2014 22:19:18 +0000 (23:19 +0100)]
net: fix for a race condition in the inet frag code

[ Upstream commit 24b9bf43e93e0edd89072da51cf1fab95fc69dec ]

I stumbled upon this very serious bug while hunting for another one,
it's a very subtle race condition between inet_frag_evictor,
inet_frag_intern and the IPv4/6 frag_queue and expire functions
(basically the users of inet_frag_kill/inet_frag_put).

What happens is that after a fragment has been added to the hash chain
but before it's been added to the lru_list (inet_frag_lru_add) in
inet_frag_intern, it may get deleted (either by an expired timer if
the system load is high or the timer sufficiently low, or by the
fraq_queue function for different reasons) before it's added to the
lru_list, then after it gets added it's a matter of time for the
evictor to get to a piece of memory which has been freed leading to a
number of different bugs depending on what's left there.

I've been able to trigger this on both IPv4 and IPv6 (which is normal
as the frag code is the same), but it's been much more difficult to
trigger on IPv4 due to the protocol differences about how fragments
are treated.

The setup I used to reproduce this is: 2 machines with 4 x 10G bonded
in a RR bond, so the same flow can be seen on multiple cards at the
same time. Then I used multiple instances of ping/ping6 to generate
fragmented packets and flood the machines with them while running
other processes to load the attacked machine.

*It is very important to have the _same flow_ coming in on multiple CPUs
concurrently. Usually the attacked machine would die in less than 30
minutes, if configured properly to have many evictor calls and timeouts
it could happen in 10 minutes or so.

An important point to make is that any caller (frag_queue or timer) of
inet_frag_kill will remove both the timer refcount and the
original/guarding refcount thus removing everything that's keeping the
frag from being freed at the next inet_frag_put.  All of this could
happen before the frag was ever added to the LRU list, then it gets
added and the evictor uses a freed fragment.

An example for IPv6 would be if a fragment is being added and is at
the stage of being inserted in the hash after the hash lock is
released, but before inet_frag_lru_add executes (or is able to obtain
the lru lock) another overlapping fragment for the same flow arrives
at a different CPU which finds it in the hash, but since it's
overlapping it drops it invoking inet_frag_kill and thus removing all
guarding refcounts, and afterwards freeing it by invoking
inet_frag_put which removes the last refcount added previously by
inet_frag_find, then inet_frag_lru_add gets executed by
inet_frag_intern and we have a freed fragment in the lru_list.

The fix is simple, just move the lru_add under the hash chain locked
region so when a removing function is called it'll have to wait for
the fragment to be added to the lru_list, and then it'll remove it (it
works because the hash chain removal is done before the lru_list one
and there's no window between the two list adds when the frag can get
dropped). With this fix applied I couldn't kill the same machine in 24
hours with the same setup.

Fixes: 3ef0eb0db4bf ("net: frag, move LRU list maintenance outside of
rwlock")

CC: Florian Westphal <fw@strlen.de>
CC: Jesper Dangaard Brouer <brouer@redhat.com>
CC: David S. Miller <davem@davemloft.net>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agodrm/cirrus: use drm_set_preferred_mode
Gerd Hoffmann [Fri, 11 Oct 2013 08:01:09 +0000 (10:01 +0200)]
drm/cirrus: use drm_set_preferred_mode

commit 121a6a17439b000b9699c3fa876636db20fa4107 upstream.

Explicitly set 1024x768 as default mode, so the display doesn't come up
with the largest supported mode.

While being at it drop first three drm_add_modes_noedid calls.  As
drm_add_modes_noedid fills the mode list with modes from the database
*up to* the specified size it is pretty pointless to call it multiple
times with different sizes.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agodrm: add drm_set_preferred_mode
Gerd Hoffmann [Fri, 11 Oct 2013 08:01:08 +0000 (10:01 +0200)]
drm: add drm_set_preferred_mode

commit 3cf70dafd7bbbc91df0a9ecb081d46f9f3d867f6 upstream.

New helper function to set the preferred video mode.  Can be called
after drm_add_modes_noedid if you don't want the largest supported
video mode be used by default.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agofbdev: Make the switch from generic to native driver less alarming
Adam Jackson [Wed, 12 Feb 2014 21:02:02 +0000 (16:02 -0500)]
fbdev: Make the switch from generic to native driver less alarming

commit 13ba0ad4490c3dd08b15c430a7a01c6fb45d5bce upstream.

Calling this "conflicting" just makes people think there's a problem
when there's not.

Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agovideo/fb: Propagate error code from failing to unregister conflicting fb
Chris Wilson [Mon, 16 Dec 2013 15:57:39 +0000 (15:57 +0000)]
video/fb: Propagate error code from failing to unregister conflicting fb

commit 46eeb2c144956e88197439b5ee5cf221a91b0a81 upstream.

If we fail to remove a conflicting fb driver, we need to abort the
loading of the second driver to avoid likely kernel panics.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agofb: reorder the lock sequence to fix potential dead lock
Gu Zheng [Tue, 5 Nov 2013 10:00:57 +0000 (18:00 +0800)]
fb: reorder the lock sequence to fix potential dead lock

commit 3a41c5dbe8bc396a7fb16ca8739e945bb003342e upstream.

Following commits:

50e244cc79 fb: rework locking to fix lock ordering on takeover
e93a9a8687 fb: Yet another band-aid for fixing lockdep mess
054430e773 fbcon: fix locking harder

reworked locking to fix related lock ordering on takeover, and introduced console_lock
into fbmem, but it seems that the new lock sequence(fb_info->lock ---> console_lock)
is against with the one in console_callback(console_lock ---> fb_info->lock), and leads to
a potential dead lock as following:

[  601.079000] ======================================================
[  601.079000] [ INFO: possible circular locking dependency detected ]
[  601.079000] 3.11.0 #189 Not tainted
[  601.079000] -------------------------------------------------------
[  601.079000] kworker/0:3/619 is trying to acquire lock:
[  601.079000]  (&fb_info->lock){+.+.+.}, at: [<ffffffff81397566>] lock_fb_info+0x26/0x60
[  601.079000]
but task is already holding lock:
[  601.079000]  (console_lock){+.+.+.}, at: [<ffffffff8141aae3>] console_callback+0x13/0x160
[  601.079000]
which lock already depends on the new lock.

[  601.079000]
the existing dependency chain (in reverse order) is:
[  601.079000]
-> #1 (console_lock){+.+.+.}:
[  601.079000]        [<ffffffff810dc971>] lock_acquire+0xa1/0x140
[  601.079000]        [<ffffffff810c6267>] console_lock+0x77/0x80
[  601.079000]        [<ffffffff81399448>] register_framebuffer+0x1d8/0x320
[  601.079000]        [<ffffffff81cfb4c8>] efifb_probe+0x408/0x48f
[  601.079000]        [<ffffffff8144a963>] platform_drv_probe+0x43/0x80
[  601.079000]        [<ffffffff8144853b>] driver_probe_device+0x8b/0x390
[  601.079000]        [<ffffffff814488eb>] __driver_attach+0xab/0xb0
[  601.079000]        [<ffffffff814463bd>] bus_for_each_dev+0x5d/0xa0
[  601.079000]        [<ffffffff81447e6e>] driver_attach+0x1e/0x20
[  601.079000]        [<ffffffff81447a07>] bus_add_driver+0x117/0x290
[  601.079000]        [<ffffffff81448fea>] driver_register+0x7a/0x170
[  601.079000]        [<ffffffff8144a10a>] __platform_driver_register+0x4a/0x50
[  601.079000]        [<ffffffff8144a12d>] platform_driver_probe+0x1d/0xb0
[  601.079000]        [<ffffffff81cfb0a1>] efifb_init+0x273/0x292
[  601.079000]        [<ffffffff81002132>] do_one_initcall+0x102/0x1c0
[  601.079000]        [<ffffffff81cb80a6>] kernel_init_freeable+0x15d/0x1ef
[  601.079000]        [<ffffffff8166d2de>] kernel_init+0xe/0xf0
[  601.079000]        [<ffffffff816914ec>] ret_from_fork+0x7c/0xb0
[  601.079000]
-> #0 (&fb_info->lock){+.+.+.}:
[  601.079000]        [<ffffffff810dc1d8>] __lock_acquire+0x1e18/0x1f10
[  601.079000]        [<ffffffff810dc971>] lock_acquire+0xa1/0x140
[  601.079000]        [<ffffffff816835ca>] mutex_lock_nested+0x7a/0x3b0
[  601.079000]        [<ffffffff81397566>] lock_fb_info+0x26/0x60
[  601.079000]        [<ffffffff813a4aeb>] fbcon_blank+0x29b/0x2e0
[  601.079000]        [<ffffffff81418658>] do_blank_screen+0x1d8/0x280
[  601.079000]        [<ffffffff8141ab34>] console_callback+0x64/0x160
[  601.079000]        [<ffffffff8108d855>] process_one_work+0x1f5/0x540
[  601.079000]        [<ffffffff8108e04c>] worker_thread+0x11c/0x370
[  601.079000]        [<ffffffff81095fbd>] kthread+0xed/0x100
[  601.079000]        [<ffffffff816914ec>] ret_from_fork+0x7c/0xb0
[  601.079000]
other info that might help us debug this:

[  601.079000]  Possible unsafe locking scenario:

[  601.079000]        CPU0                    CPU1
[  601.079000]        ----                    ----
[  601.079000]   lock(console_lock);
[  601.079000]                                lock(&fb_info->lock);
[  601.079000]                                lock(console_lock);
[  601.079000]   lock(&fb_info->lock);
[  601.079000]
 *** DEADLOCK ***

so we reorder the lock sequence the same as it in console_callback() to
avoid this issue. And following Tomi's suggestion, fix these similar
issues all in fb subsystem.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agodrm: Prefer noninterlace cmdline mode unless explicitly specified
Takashi Iwai [Wed, 19 Mar 2014 13:53:13 +0000 (14:53 +0100)]
drm: Prefer noninterlace cmdline mode unless explicitly specified

commit c683f427bdc43525f61e26609d34e799e7ea4c12 upstream.

Currently drm_pick_cmdline_mode() doesn't care about the interlace
when the given mode line has no "i" suffix.  That is, when there are
multiple entries for the same resolution, an interlace mode might be
picked up just depending on the assigned order, and there is no way to
exclude it.

This patch changes the logic for the mode selection, to prefer the
noninterlace mode unless the interlace mode is explicitly given.
When no matching mode is found, it still tries the interlace mode as
fallback.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agodrm/radeon: enable speaker allocation setup on dce3.2
Alex Deucher [Tue, 18 Feb 2014 16:12:11 +0000 (11:12 -0500)]
drm/radeon: enable speaker allocation setup on dce3.2

commit 3803c8e5b50946dd6bc18972d9190757d05648f0 upstream.

Now that we disable audio while setting up the audio
hw, we should be able to set this up without hangs.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agodrm/radeon: change audio enable logic
Alex Deucher [Tue, 18 Feb 2014 16:07:55 +0000 (11:07 -0500)]
drm/radeon: change audio enable logic

commit 832eafaf34ff7d0348fe701e417900c6cf1f5656 upstream.

Disable audio around audio hw setup.  This may avoid
hangs on certain asics.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agodrm/cirrus: Fix cirrus drm driver for fbdev + qemu
Martin Koegler [Thu, 9 Jan 2014 09:05:07 +0000 (10:05 +0100)]
drm/cirrus: Fix cirrus drm driver for fbdev + qemu

commit 99d4a8ae93ead27b5a88cdbd09dc556fe96ac3a8 upstream.

Xorg fbdev driver requires smem_start/smem_len, otherwise
it tries to map 0 bytes as video memory.

Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=856760
Signed-off-by: Martin Koegler <martin.koegler@chello.at>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agodrm/i915: Undo the PIPEA quirk for i845
Chris Wilson [Tue, 8 Oct 2013 10:16:59 +0000 (11:16 +0100)]
drm/i915: Undo the PIPEA quirk for i845

commit a4945f9522d27e1e6d64a02ad055e83768cb0896 upstream.

The PIPEA quirk is specifically for the issue with the PIPEB PLL on
830gm being slaved to the PIPEA PLL, and so to use PIPEB requires PIPEA
running. i845 doesn't even have the second PLL or pipe, and enabling
the quirk results in a blank DVO LVDS.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agofloppy: bail out in open() if drive is not responding to block0 read
Jiri Kosina [Fri, 10 Jan 2014 01:08:13 +0000 (02:08 +0100)]
floppy: bail out in open() if drive is not responding to block0 read

commit 7b7b68bba5ef23734c35ffb0d8d82079ed604d33 upstream.

In case reading of block 0 during open() fails, it is not the right thing
to let open() succeed.

Fix this by introducing FD_OPEN_SHOULD_FAIL_BIT flag, and setting it in
case the bio callback encounters an error while trying to read block 0.

As a bonus, this works around certain broken userspace (blkid), which is
not able to properly handle read()s returning IO errors. Hence be nice to
those, and bail out during open() already; if block 0 is not readable,
read()s are not going to provide any meaningful data anyway.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoext4: Speedup WB_SYNC_ALL pass called from sync(2)
Jan Kara [Tue, 4 Mar 2014 15:50:50 +0000 (10:50 -0500)]
ext4: Speedup WB_SYNC_ALL pass called from sync(2)

commit 10542c229a4e8e25b40357beea66abe9dacda2c0 upstream.

When doing filesystem wide sync, there's no need to force transaction
commit (or synchronously write inode buffer) separately for each inode
because ext4_sync_fs() takes care of forcing commit at the end (VFS
takes care of flushing buffer cache, respectively). Most of the time
this slowness doesn't manifest because previous WB_SYNC_NONE writeback
doesn't leave much to write but when there are processes aggressively
creating new files and several filesystems to sync, the sync slowness
can be noticeable. In the following test script sync(1) takes around 6
minutes when there are two ext4 filesystems mounted on a standard SATA
drive. After this patch sync takes a couple of seconds so we have about
two orders of magnitude improvement.

      function run_writers
      {
        for (( i = 0; i < 10; i++ )); do
          mkdir $1/dir$i
          for (( j = 0; j < 40000; j++ )); do
            dd if=/dev/zero of=$1/dir$i/$j bs=4k count=4 &>/dev/null
          done &
        done
      }

      for dir in "$@"; do
        run_writers $dir
      done

      sleep 40
      time sync

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoSUNRPC: Fix potential memory scribble in xprt_free_bc_request()
Trond Myklebust [Tue, 11 Feb 2014 18:56:54 +0000 (13:56 -0500)]
SUNRPC: Fix potential memory scribble in xprt_free_bc_request()

commit 628356791b04ea988fee070f66a748a823d001bb upstream.

The call to xprt_free_allocation() will call list_del() on
req->rq_bc_pa_list, which is not attached to a list.
This patch moves the list_del() out of xprt_free_allocation()
and into those callers that need it.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoNFSv3: Fix return value of nfs3_proc_setacls
Trond Myklebust [Sun, 2 Feb 2014 19:36:42 +0000 (14:36 -0500)]
NFSv3: Fix return value of nfs3_proc_setacls

commit 8f493b9cfcd8941c6b27d6ce8e3b4a78c094b3c1 upstream.

nfs3_proc_setacls is used internally by the NFSv3 create operations
to set the acl after the file has been created. If the operation
fails because the server doesn't support acls, then it must return '0',
not -EOPNOTSUPP.

Reported-by: Russell King <linux@arm.linux.org.uk>
Link: http://lkml.kernel.org/r/20140201010328.GI15937@n2100.arm.linux.org.uk
Cc: Christoph Hellwig <hch@lst.de>
Tested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agonfs: initialize the ACL support bits to zero.
Malahal Naineni [Mon, 27 Jan 2014 21:31:09 +0000 (15:31 -0600)]
nfs: initialize the ACL support bits to zero.

commit a1800acaf7d1c2bf6d68b9a8f4ab8560cc66555a upstream.

Avoid returning incorrect acl mask attributes when the server doesn't
support ACLs.

Signed-off-by: Malahal Naineni <malahal@us.ibm.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoChar: ipmi_bt_sm, fix infinite loop
Jiri Slaby [Mon, 14 Apr 2014 14:46:50 +0000 (09:46 -0500)]
Char: ipmi_bt_sm, fix infinite loop

commit a94cdd1f4d30f12904ab528152731fb13a812a16 upstream.

In read_all_bytes, we do

  unsigned char i;
  ...
  bt->read_data[0] = BMC2HOST;
  bt->read_count = bt->read_data[0];
  ...
  for (i = 1; i <= bt->read_count; i++)
    bt->read_data[i] = BMC2HOST;

If bt->read_data[0] == bt->read_count == 255, we loop infinitely in the
'for' loop.  Make 'i' an 'int' instead of 'char' to get rid of the
overflow and finish the loop after 255 iterations every time.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reported-and-debugged-by: Rui Hui Dian <rhdian@novell.com>
Cc: Tomas Cech <tcech@suse.cz>
Cc: Corey Minyard <minyard@acm.org>
Cc: <openipmi-developer@lists.sourceforge.net>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agom68k: Skip futex_atomic_cmpxchg_inatomic() test
Finn Thain [Wed, 5 Mar 2014 23:29:27 +0000 (10:29 +1100)]
m68k: Skip futex_atomic_cmpxchg_inatomic() test

commit e571c58f313d35c56e0018470e3375ddd1fd320e upstream.

Skip the futex_atomic_cmpxchg_inatomic() test in futex_init(). It causes a
fatal exception on 68030 (and presumably 68020 also).

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1403061006440.5525@nippy.intranet
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agofutex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test
Heiko Carstens [Sun, 2 Mar 2014 12:09:47 +0000 (13:09 +0100)]
futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test

commit 03b8c7b623c80af264c4c8d6111e5c6289933666 upstream.

If an architecture has futex_atomic_cmpxchg_inatomic() implemented and there
is no runtime check necessary, allow to skip the test within futex_init().

This allows to get rid of some code which would always give the same result,
and also allows the compiler to optimize a couple of if statements away.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Finn Thain <fthain@telegraphics.com.au>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Link: http://lkml.kernel.org/r/20140302120947.GA3641@osiris
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[geert: Backported to v3.10..v3.13]
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoselinux: correctly label /proc inodes in use before the policy is loaded
Paul Moore [Wed, 19 Mar 2014 20:46:18 +0000 (16:46 -0400)]
selinux: correctly label /proc inodes in use before the policy is loaded

commit f64410ec665479d7b4b77b7519e814253ed0f686 upstream.

This patch is based on an earlier patch by Eric Paris, he describes
the problem below:

  "If an inode is accessed before policy load it will get placed on a
   list of inodes to be initialized after policy load.  After policy
   load we call inode_doinit() which calls inode_doinit_with_dentry()
   on all inodes accessed before policy load.  In the case of inodes
   in procfs that means we'll end up at the bottom where it does:

     /* Default to the fs superblock SID. */
     isec->sid = sbsec->sid;

     if ((sbsec->flags & SE_SBPROC) && !S_ISLNK(inode->i_mode)) {
             if (opt_dentry) {
                     isec->sclass = inode_mode_to_security_class(...)
                     rc = selinux_proc_get_sid(opt_dentry,
                                               isec->sclass,
                                               &sid);
                     if (rc)
                             goto out_unlock;
                     isec->sid = sid;
             }
     }

   Since opt_dentry is null, we'll never call selinux_proc_get_sid()
   and will leave the inode labeled with the label on the superblock.
   I believe a fix would be to mimic the behavior of xattrs.  Look
   for an alias of the inode.  If it can't be found, just leave the
   inode uninitialized (and pick it up later) if it can be found, we
   should be able to call selinux_proc_get_sid() ..."

On a system exhibiting this problem, you will notice a lot of files in
/proc with the generic "proc_t" type (at least the ones that were
accessed early in the boot), for example:

   # ls -Z /proc/sys/kernel/shmmax | awk '{ print $4 " " $5 }'
   system_u:object_r:proc_t:s0 /proc/sys/kernel/shmmax

However, with this patch in place we see the expected result:

   # ls -Z /proc/sys/kernel/shmmax | awk '{ print $4 " " $5 }'
   system_u:object_r:sysctl_kernel_t:s0 /proc/sys/kernel/shmmax

Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoPCI: mvebu: move clock enable before register access
Sebastian Hesselbarth [Tue, 13 Aug 2013 12:25:20 +0000 (14:25 +0200)]
PCI: mvebu: move clock enable before register access

commit b42285f66f871a9898a0e79e2d74bc7e7a101995 upstream.

The clock passed to PCI controller found on MVEBU SoCs may come from a
clock gate. This requires the clock to be enabled before any registers
are accessed. Therefore, move the clock enable before register iomap to
ensure it is enabled.

Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agopowernow-k6: reorder frequencies
Mikulas Patocka [Sun, 13 Apr 2014 14:22:21 +0000 (16:22 +0200)]
powernow-k6: reorder frequencies

commit 22c73795b101597051924556dce019385a1e2fa0 upstream.

This patch reorders reported frequencies from the highest to the lowest,
just like in other frequency drivers.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agopowernow-k6: correctly initialize default parameters
Mikulas Patocka [Sun, 13 Apr 2014 14:22:20 +0000 (16:22 +0200)]
powernow-k6: correctly initialize default parameters

commit d82b922a4acc1781d368aceac2f9da43b038cab2 upstream.

The powernow-k6 driver used to read the initial multiplier from the
powernow register. However, there is a problem with this:

* If there was a frequency transition before, the multiplier read from the
  register corresponds to the current multiplier.
* If there was no frequency transition since reset, the field in the
  register always reads as zero, regardless of the current multiplier that
  is set using switches on the mainboard and that the CPU is running at.

The zero value corresponds to multiplier 4.5, so as a consequence, the
powernow-k6 driver always assumes multiplier 4.5.

For example, if we have 550MHz CPU with bus frequency 100MHz and
multiplier 5.5, the powernow-k6 driver thinks that the multiplier is 4.5
and bus frequency is 122MHz. The powernow-k6 driver then sets the
multiplier to 4.5, underclocking the CPU to 450MHz, but reports the
current frequency as 550MHz.

There is no reliable way how to read the initial multiplier. I modified
the driver so that it contains a table of known frequencies (based on
parameters of existing CPUs and some common overclocking schemes) and sets
the multiplier according to the frequency. If the frequency is unknown
(because of unusual overclocking or underclocking), the user must supply
the bus speed and maximum multiplier as module parameters.

This patch should be backported to all stable kernels. If it doesn't
apply cleanly, change it, or ask me to change it.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agopowernow-k6: disable cache when changing frequency
Mikulas Patocka [Sun, 13 Apr 2014 14:22:20 +0000 (16:22 +0200)]
powernow-k6: disable cache when changing frequency

commit e20e1d0ac02308e2211306fc67abcd0b2668fb8b upstream.

I found out that a system with k6-3+ processor is unstable during network
server load. The system locks up or the network card stops receiving. The
reason for the instability is the CPU frequency scaling.

During frequency transition the processor is in "EPM Stop Grant" state.
The documentation says that the processor doesn't respond to inquiry
requests in this state. Consequently, coherency of processor caches and
bus master devices is not maintained, causing the system instability.

This patch flushes the cache during frequency transition. It fixes the
instability.

Other minor changes:
* u64 invalue changed to unsigned long because the variable is 32-bit
* move the logic to set the multiplier to a separate function
  powernow_k6_set_cpu_multiplier
* preserve lower 5 bits of the powernow port instead of 4 (the voltage
  field has 5 bits)
* mask interrupts when reading the multiplier, so that the port is not
  open during other activity (running other kernel code with the port open
  shouldn't cause any misbehavior, but we should better be safe and keep
  the port closed)

This patch should be backported to all stable kernels. If it doesn't
apply cleanly, change it, or ask me to change it.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoLinux 3.12.17 v3.12.17
Jiri Slaby [Sat, 22 Mar 2014 21:02:40 +0000 (22:02 +0100)]
Linux 3.12.17

10 years agonetfilter: nf_conntrack_dccp: fix skb_header_pointer API usages
Daniel Borkmann [Sun, 5 Jan 2014 23:57:54 +0000 (00:57 +0100)]
netfilter: nf_conntrack_dccp: fix skb_header_pointer API usages

commit b22f5126a24b3b2f15448c3f2a254fc10cbc2b92 upstream.

Some occurences in the netfilter tree use skb_header_pointer() in
the following way ...

  struct dccp_hdr _dh, *dh;
  ...
  skb_header_pointer(skb, dataoff, sizeof(_dh), &dh);

... where dh itself is a pointer that is being passed as the copy
buffer. Instead, we need to use &_dh as the forth argument so that
we're copying the data into an actual buffer that sits on the stack.

Currently, we probably could overwrite memory on the stack (e.g.
with a possibly mal-formed DCCP packet), but unintentionally, as
we only want the buffer to be placed into _dh variable.

Fixes: 2bc780499aa3 ("[NETFILTER]: nf_conntrack: add DCCP protocol support")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agomm: close PageTail race
David Rientjes [Mon, 3 Mar 2014 23:38:18 +0000 (15:38 -0800)]
mm: close PageTail race

commit 668f9abbd4334e6c29fa8acd71635c4f9101caa7 upstream.

Commit bf6bddf1924e ("mm: introduce compaction and migration for
ballooned pages") introduces page_count(page) into memory compaction
which dereferences page->first_page if PageTail(page).

This results in a very rare NULL pointer dereference on the
aforementioned page_count(page).  Indeed, anything that does
compound_head(), including page_count() is susceptible to racing with
prep_compound_page() and seeing a NULL or dangling page->first_page
pointer.

This patch uses Andrea's implementation of compound_trans_head() that
deals with such a race and makes it the default compound_head()
implementation.  This includes a read memory barrier that ensures that
if PageTail(head) is true that we return a head page that is neither
NULL nor dangling.  The patch then adds a store memory barrier to
prep_compound_page() to ensure page->first_page is set.

This is the safest way to ensure we see the head page that we are
expecting, PageTail(page) is already in the unlikely() path and the
memory barriers are unfortunately required.

Hugetlbfs is the exception, we don't enforce a store memory barrier
during init since no race is possible.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Holger Kiehl <Holger.Kiehl@dwd.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agonet: mvneta: fix usage as a module on RGMII configurations
Thomas Petazzoni [Tue, 25 Mar 2014 23:25:42 +0000 (00:25 +0100)]
net: mvneta: fix usage as a module on RGMII configurations

commit e3a8786c10e75903f1269474e21fe8cb49c3a670 upstream.

Commit 5445eaf309ff ('mvneta: Try to fix mvneta when compiled as
module') fixed the mvneta driver to make it work properly when loaded
as a module in SGMII configuration, which was tested successful by the
author on the Armada XP OpenBlocks AX3, which uses SGMII.

However, it turns out that the Armada XP GP, which uses RGMII, is
affected by a similar problem: its SERDES configuration is lost when
mvneta is loaded as a module, because this configuration is set by the
bootloader, and then lost because the clock is gated by the clock
framework until the mvneta driver is loaded again and the clock is
re-enabled.

However, it turns out that for the RGMII case, setting the SERDES
configuration is not sufficient: the PCS enable bit in the
MVNETA_GMAC_CTRL_2 register must also be set, like in the SGMII
configuration.

Therefore, this commit reworks the SGMII/RGMII initialization: the
only difference between the two now is a different SERDES
configuration, all the rest is identical.

In detail, to achieve this, the commit:

 * Renames MVNETA_SGMII_SERDES_CFG to MVNETA_SERDES_CFG because it is
   not specific to SGMII, but also used on RGMII configurations.

 * Adds a MVNETA_RGMII_SERDES_PROTO definition, that must be used as
   the MVNETA_SERDES_CFG value in RGMII configurations.

 * Removes the mvneta_gmac_rgmii_set() and mvneta_port_sgmii_config()
   functions, and instead directly do the SGMII/RGMII configuration in
   mvneta_port_up(), from where those functions where called. It is
   worth mentioning that mvneta_gmac_rgmii_set() had an 'enable'
   parameter that was always passed as '1', so it was pretty useless.

 * Reworks the mvneta_port_up() function to set the MVNETA_SERDES_CFG
   register to the appropriate value depending on the RGMII vs. SGMII
   configuration. It also unconditionally set the PCS_ENABLE bit (was
   already done for SGMII, but is now also needed for RGMII), and sets
   the PORT_RGMII bit (which was already done for both SGMII and
   RGMII).

This commit was successfully tested with mvneta compiled as a module,
on both the OpenBlocks AX3 (SGMII configuration) and the Armada XP GP
(RGMII configuration).

Reported-by: Steve McIntyre <steve@einval.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agonet: mvneta: rename MVNETA_GMAC2_PSC_ENABLE to MVNETA_GMAC2_PCS_ENABLE
Thomas Petazzoni [Tue, 25 Mar 2014 23:25:41 +0000 (00:25 +0100)]
net: mvneta: rename MVNETA_GMAC2_PSC_ENABLE to MVNETA_GMAC2_PCS_ENABLE

commit a79121d3b57e7ad61f0b5d23eae05214054f3ccd upstream.

Bit 3 of the MVNETA_GMAC_CTRL_2 is actually used to enable the PCS,
not the PSC: there was a typo in the name of the define, which this
commit fixes.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agomake prepend_name() work correctly when called with negative *buflen
Al Viro [Sun, 23 Mar 2014 04:28:40 +0000 (00:28 -0400)]
make prepend_name() work correctly when called with negative *buflen

commit e825196d48d2b89a6ec3a8eff280098d2a78207e upstream.

In all callchains leading to prepend_name(), the value left in *buflen
is eventually discarded unused if prepend_name() has returned a negative.
So we are free to do what prepend() does, and subtract from *buflen
*before* checking for underflow (which turns into checking the sign
of subtraction result, of course).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agox86: fix boot on uniprocessor systems
Artem Fetishev [Fri, 28 Mar 2014 20:33:39 +0000 (13:33 -0700)]
x86: fix boot on uniprocessor systems

commit 825600c0f20e595daaa7a6dd8970f84fa2a2ee57 upstream.

On x86 uniprocessor systems topology_physical_package_id() returns -1
which causes rapl_cpu_prepare() to leave rapl_pmu variable uninitialized
which leads to GPF in rapl_pmu_init().

See arch/x86/kernel/cpu/perf_event_intel_rapl.c.

It turns out that physical_package_id and core_id can actually be
retreived for uniprocessor systems too.  Enabling them also fixes
rapl_pmu code.

Signed-off-by: Artem Fetishev <artem_fetishev@epam.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agodrm/i915: Undo gtt scratch pte unmapping again
Daniel Vetter [Wed, 26 Mar 2014 19:10:09 +0000 (20:10 +0100)]
drm/i915: Undo gtt scratch pte unmapping again

commit 8ee661b505613ef2747b350ca2871a31b3781bee upstream.

It apparently blows up on some machines. This functionally reverts

commit 828c79087cec61eaf4c76bb32c222fbe35ac3930
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date:   Wed Oct 16 09:21:30 2013 -0700

    drm/i915: Disable GGTT PTEs on GEN6+ suspend

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64841
Reported-and-Tested-by: Brad Jackson <bjackson0971@gmail.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Todd Previte <tprevite@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoi2c: cpm: Fix build by adding of_address.h and of_irq.h
Scott Wood [Tue, 18 Mar 2014 21:10:24 +0000 (16:10 -0500)]
i2c: cpm: Fix build by adding of_address.h and of_irq.h

commit 5f12c5eca6e6b7aeb4b2028d579f614b4fe7a81f upstream.

Fixes a build break due to the undeclared use of irq_of_parse_and_map()
and of_iomap().  This build break was apparently introduced while the
driver was unbuildable due to the bug fixed by
62c19c9d29e65086e5ae76df371ed2e6b23f00cd ("i2c: Remove usage of
orphaned symbol OF_I2C").  When 62c19c was added in v3.14-rc7,
the driver was enabled again, breaking the powerpc mpc85xx_defconfig
and mpc85xx_smp_defconfig.

62c19c is marked for stable, so this should go there as well.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoRevert "xen: properly account for _PAGE_NUMA during xen pte translations"
David Vrabel [Tue, 25 Mar 2014 10:38:37 +0000 (10:38 +0000)]
Revert "xen: properly account for _PAGE_NUMA during xen pte translations"

commit 5926f87fdaad4be3ed10cec563bf357915e55a86 upstream.

This reverts commit a9c8e4beeeb64c22b84c803747487857fe424b68.

PTEs in Xen PV guests must contain machine addresses if _PAGE_PRESENT
is set and pseudo-physical addresses is _PAGE_PRESENT is clear.

This is because during a domain save/restore (migration) the page
table entries are "canonicalised" and uncanonicalised". i.e., MFNs are
converted to PFNs during domain save so that on a restore the page
table entries may be rewritten with the new MFNs on the destination.
This canonicalisation is only done for PTEs that are present.

This change resulted in writing PTEs with MFNs if _PAGE_PROTNONE (or
_PAGE_NUMA) was set but _PAGE_PRESENT was clear.  These PTEs would be
migrated as-is which would result in unexpected behaviour in the
destination domain.  Either a) the MFN would be translated to the
wrong PFN/page; b) setting the _PAGE_PRESENT bit would clear the PTE
because the MFN is no longer owned by the domain; or c) the present
bit would not get set.

Symptoms include "Bad page" reports when munmapping after migrating a
domain.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoxen/balloon: flush persistent kmaps in correct position
Wei Liu [Sat, 15 Mar 2014 16:11:47 +0000 (16:11 +0000)]
xen/balloon: flush persistent kmaps in correct position

commit 09ed3d5ba06137913960f9c9385f71fc384193ab upstream.

Xen balloon driver will update ballooned out pages' P2M entries to point
to scratch page for PV guests. In 24f69373e2 ("xen/balloon: don't alloc
page while non-preemptible", kmap_flush_unused was moved after updating
P2M table. In that case for 32 bit PV guest we might end up with

  P2M    X -----> S  (S is mfn of balloon scratch page)
  M2P    Y -----> X  (Y is mfn in persistent kmap entry)

kmap_flush_unused() iterates through all the PTEs in the kmap address
space, using pte_to_page() to obtain the page. If the p2m and the m2p
are inconsistent the incorrect page is returned.  This will clear
page->address on the wrong page which may cause subsequent oopses if
that page is currently kmap'ed.

Move the flush back between get_page and __set_phys_to_machine to fix
this.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoInput: cypress_ps2 - don't report as a button pads
Hans de Goede [Wed, 26 Mar 2014 20:30:52 +0000 (13:30 -0700)]
Input: cypress_ps2 - don't report as a button pads

commit 6797b39e6f6f34c74177736e146406e894b9482b upstream.

The cypress PS/2 trackpad models supported by the cypress_ps2 driver
emulate BTN_RIGHT events in firmware based on the finger position, as part
of this no motion events are sent when the finger is in the button area.

The INPUT_PROP_BUTTONPAD property is there to indicate to userspace that
BTN_RIGHT events should be emulated in userspace, which is not necessary
in this case.

When INPUT_PROP_BUTTONPAD is advertised userspace will wait for a motion
event before propagating the button event higher up the stack, as it needs
current abs x + y data for its BTN_RIGHT emulation. Since in the
cypress_ps2 pads don't report motion events in the button area, this means
that clicks in the button area end up being ignored, so
INPUT_PROP_BUTTONPAD actually causes problems for these touchpads, and
removing it fixes:

https://bugs.freedesktop.org/show_bug.cgi?id=76341

Reported-by: Adam Williamson <awilliam@redhat.com>
Tested-by: Adam Williamson <awilliam@redhat.com>
Reviewed-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoInput: synaptics - add manual min/max quirk for ThinkPad X240
Hans de Goede [Fri, 28 Mar 2014 08:01:38 +0000 (01:01 -0700)]
Input: synaptics - add manual min/max quirk for ThinkPad X240

commit 8a0435d958fb36d93b8df610124a0e91e5675c82 upstream.

This extends Benjamin Tissoires manual min/max quirk table with support for
the ThinkPad X240.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoInput: synaptics - add manual min/max quirk
Benjamin Tissoires [Fri, 28 Mar 2014 07:43:00 +0000 (00:43 -0700)]
Input: synaptics - add manual min/max quirk

commit 421e08c41fda1f0c2ff6af81a67b491389b653a5 upstream.

The new Lenovo Haswell series (-40's) contains a new Synaptics touchpad.
However, these new Synaptics devices report bad axis ranges.
Under Windows, it is not a problem because the Windows driver uses RMI4
over SMBus to talk to the device. Under Linux, we are using the PS/2
fallback interface and it occurs the reported ranges are wrong.

Of course, it would be too easy to have only one range for the whole
series, each touchpad seems to be calibrated in a different way.

We can not use SMBus to get the actual range because I suspect the firmware
will switch into the SMBus mode and stop talking through PS/2 (this is the
case for hybrid HID over I2C / PS/2 Synaptics touchpads).

So as a temporary solution (until RMI4 land into upstream), start a new
list of quirks with the min/max manually set.

Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoInput: mousedev - fix race when creating mixed device
Dmitry Torokhov [Thu, 6 Mar 2014 20:57:24 +0000 (12:57 -0800)]
Input: mousedev - fix race when creating mixed device

commit e4dbedc7eac7da9db363a36f2bd4366962eeefcc upstream.

We should not be using static variable mousedev_mix in methods that can be
called before that singleton gets assigned. While at it let's add open and
close methods to mousedev structure so that we do not need to test if we
are dealing with multiplexor or normal device and simply call appropriate
method directly.

This fixes: https://bugzilla.kernel.org/show_bug.cgi?id=71551

Reported-by: GiulioDP <depasquale.giulio@gmail.com>
Tested-by: GiulioDP <depasquale.giulio@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoext4: atomically set inode->i_flags in ext4_set_inode_flags()
Theodore Ts'o [Sun, 30 Mar 2014 14:20:01 +0000 (10:20 -0400)]
ext4: atomically set inode->i_flags in ext4_set_inode_flags()

commit 00a1a053ebe5febcfc2ec498bd894f035ad2aa06 upstream.

Use cmpxchg() to atomically set i_flags instead of clearing out the
S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the
EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race
where an immutable file has the immutable flag cleared for a brief
window of time.

Reported-by: John Sullivan <jsrhbz@kanargh.force9.co.uk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agos390/time,vdso: fix clock_gettime for CLOCK_MONOTONIC
Martin Schwidefsky [Mon, 2 Dec 2013 17:00:36 +0000 (18:00 +0100)]
s390/time,vdso: fix clock_gettime for CLOCK_MONOTONIC

commit ca5de58ba746b08c920b2024aaf01aa1500b110d upstream.

With git commit 79c74ecbebf76732f91b82a62ce7fc8a88326962
"s390/time,vdso: convert to the new update_vsyscall interface"
the new update_vsyscall function already does the sum of xtime
and wall_to_monotonic. The old update_vsyscall function only
copied the wall_to_monotonic offset. The vdso code needs to be
modified to take this into consideration.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoInput: wacom - add support for three new Intuos Pro devices
Ping Cheng [Thu, 27 Mar 2014 22:40:02 +0000 (15:40 -0700)]
Input: wacom - add support for three new Intuos Pro devices

commit b5fd2a3e92ca5c8c1f3c20d31ac5daed3ec4d604 upstream.

Two tablets in this series support both pen and touch. One (Intuos S)
only supports pen. This patch also updates the driver to process wireless
devices that do not support touch interface.

Tested-by: Jason Gerecke <killertofu@gmail.com>
Acked-by: Chris Bagwell <chris@cnpbagwell.com>
Signed-off-by: Ping Cheng <pingc@wacom.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoipvs: fix AF assignment in ip_vs_conn_new()
Michal Kubecek [Thu, 30 Jan 2014 07:50:20 +0000 (08:50 +0100)]
ipvs: fix AF assignment in ip_vs_conn_new()

commit 2a971354e74f3837d14b9c8d7f7983b0c9c330e4 upstream.

If a fwmark is passed to ip_vs_conn_new(), it is passed in
vaddr, not daddr. Therefore we should set AF to AF_UNSPEC in
vaddr assignment (like we do in ip_vs_ct_in_get()), otherwise we
may copy only first 4 bytes of an IPv6 address into cp->daddr.

Signed-off-by: Bogdano Arendartchuk <barendartchuk@suse.com>
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoFS-Cache: Handle removal of unadded object to the fscache_object_list rb tree
David Howells [Mon, 17 Feb 2014 15:01:47 +0000 (15:01 +0000)]
FS-Cache: Handle removal of unadded object to the fscache_object_list rb tree

commit 7026f1929e18921fd67bf478f475a8fdfdff16ae upstream.

When FS-Cache allocates an object, the following sequence of events can
occur:

 -->fscache_alloc_object()
    -->cachefiles_alloc_object() [via cache->ops->alloc_object]
    <--[returns new object]
    -->fscache_attach_object()
    <--[failed]
    -->cachefiles_put_object() [via cache->ops->put_object]
       -->fscache_object_destroy()
          -->fscache_objlist_remove()
             -->rb_erase() to remove the object from fscache_object_list.

resulting in a crash in the rbtree code.

The problem is that the object is only added to fscache_object_list on
the success path of fscache_attach_object() where it calls
fscache_objlist_add().

So if fscache_attach_object() fails, the object won't have been added to
the objlist rbtree.  We do, however, unconditionally try to remove the
object from the tree.

Thanks to NeilBrown for finding this and suggesting this solution.

Reported-by: NeilBrown <neilb@suse.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: (a customer of) NeilBrown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agodlm: set zero linger time on sctp socket
Dongmao Zhang [Tue, 10 Dec 2013 16:52:22 +0000 (10:52 -0600)]
dlm: set zero linger time on sctp socket

commit ece35848c1847cdf3dd07954578d3e99238ebbae upstream.

The recovery time for a failed node was taking a long
time because the failed node could not perform the full
shutdown process.  Removing the linger time speeds this
up.  The dlm does not care what happens to messages to
or from the failed node.

Signed-off-by: Dongmao Zhang <dmzhang@suse.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoNFSv4.1 free slot before resending I/O to MDS
Andy Adamson [Wed, 29 Jan 2014 16:34:38 +0000 (11:34 -0500)]
NFSv4.1 free slot before resending I/O to MDS

commit f9c96fcc501a43dbc292b17fc0ded4b54e63b79d upstream.

Fix a dynamic session slot leak where a slot is preallocated and I/O is
resent through the MDS.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agonfs: add memory barriers around NFS_INO_INVALID_DATA and NFS_INO_INVALIDATING
Jeff Layton [Tue, 28 Jan 2014 18:47:46 +0000 (13:47 -0500)]
nfs: add memory barriers around NFS_INO_INVALID_DATA and NFS_INO_INVALIDATING

commit 4db72b40fdbc706f8957e9773ae73b1574b8c694 upstream.

If the setting of NFS_INO_INVALIDATING gets reordered to before the
clearing of NFS_INO_INVALID_DATA, then another task may hit a race
window where both appear to be clear, even though the inode's pages are
still in need of invalidation. Fix this by adding the appropriate memory
barriers.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoNFS: Fix races in nfs_revalidate_mapping
Trond Myklebust [Tue, 28 Jan 2014 14:37:16 +0000 (09:37 -0500)]
NFS: Fix races in nfs_revalidate_mapping

commit 17dfeb9113397a6119091a491ef7182649f0c5a9 upstream.

Commit d529ef83c355f97027ff85298a9709fe06216a66 (NFS: fix the handling
of NFS_INO_INVALID_DATA flag in nfs_revalidate_mapping) introduces
a potential race, since it doesn't test the value of nfsi->cache_validity
and set the bitlock in nfsi->flags atomically.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agoNFS: fix the handling of NFS_INO_INVALID_DATA flag in nfs_revalidate_mapping
Jeff Layton [Mon, 27 Jan 2014 18:46:15 +0000 (13:46 -0500)]
NFS: fix the handling of NFS_INO_INVALID_DATA flag in nfs_revalidate_mapping

commit d529ef83c355f97027ff85298a9709fe06216a66 upstream.

There is a possible race in how the nfs_invalidate_mapping function is
handled.  Currently, we go and invalidate the pages in the file and then
clear NFS_INO_INVALID_DATA.

The problem is that it's possible for a stale page to creep into the
mapping after the page was invalidated (i.e., via readahead). If another
writer comes along and sets the flag after that happens but before
invalidate_inode_pages2 returns then we could clear the flag
without the cache having been properly invalidated.

So, we must clear the flag first and then invalidate the pages. Doing
this however, opens another race:

It's possible to have two concurrent read() calls that end up in
nfs_revalidate_mapping at the same time. The first one clears the
NFS_INO_INVALID_DATA flag and then goes to call nfs_invalidate_mapping.

Just before calling that though, the other task races in, checks the
flag and finds it cleared. At that point, it trusts that the mapping is
good and gets the lock on the page, allowing the read() to be satisfied
from the cache even though the data is no longer valid.

These effects are easily manifested by running diotest3 from the LTP
test suite on NFS. That program does a series of DIO writes and buffered
reads. The operations are serialized and page-aligned but the existing
code fails the test since it occasionally allows a read to come out of
the cache incorrectly. While mixing direct and buffered I/O isn't
recommended, I believe it's possible to hit this in other ways that just
use buffered I/O, though that situation is much harder to reproduce.

The problem is that the checking/clearing of that flag and the
invalidation of the mapping really need to be atomic. Fix this by
serializing concurrent invalidations with a bitlock.

At the same time, we also need to allow other places that check
NFS_INO_INVALID_DATA to check whether we might be in the middle of
invalidating the file, so fix up a couple of places that do that
to look for the new NFS_INO_INVALIDATING flag.

Doing this requires us to be careful not to set the bitlock
unnecessarily, so this code only does that if it believes it will
be doing an invalidation.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agopnfs: fix BUG in filelayout_recover_commit_reqs
Weston Andros Adamson [Tue, 21 Jan 2014 20:21:33 +0000 (15:21 -0500)]
pnfs: fix BUG in filelayout_recover_commit_reqs

commit 471252cd8b34b0609973740b25dcd1ff01dc1889 upstream.

cond_resched_lock(cinfo->lock) is called everywhere else while holding
the cinfo->lock spinlock.  Not holding this lock while calling
transfer_commit_list in filelayout_recover_commit_reqs causes the BUG
below.

It's true that we can't hold this lock while calling pnfs_put_lseg,
because that might try to lock the inode lock - which might be the
same lock as cinfo->lock.

To reproduce, mount a 2 DS pynfs server and run an O_DIRECT command
that crosses a stripe boundary and is not page aligned, such as:

 dd if=/dev/zero of=/mnt/f bs=17000 count=1 oflag=direct

BUG: sleeping function called from invalid context at linux/fs/nfs/nfs4filelayout.c:1161
in_atomic(): 0, irqs_disabled(): 0, pid: 27, name: kworker/0:1
2 locks held by kworker/0:1/27:
 #0:  (events){.+.+.+}, at: [<ffffffff810501d7>] process_one_work+0x175/0x3a5
 #1:  ((&dreq->work)){+.+...}, at: [<ffffffff810501d7>] process_one_work+0x175/0x3a5
CPU: 0 PID: 27 Comm: kworker/0:1 Not tainted 3.13.0-rc3-branch-dros_testing+ #21
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
Workqueue: events nfs_direct_write_schedule_work [nfs]
 0000000000000000 ffff88007a39bbb8 ffffffff81491256 ffff88007b87a130  ffff88007a39bbd8 ffffffff8105f103 ffff880079614000 ffff880079617d40  ffff88007a39bc20 ffffffffa011603e ffff880078988b98 0000000000000000
Call Trace:
 [<ffffffff81491256>] dump_stack+0x4d/0x66
 [<ffffffff8105f103>] __might_sleep+0x100/0x105
 [<ffffffffa011603e>] transfer_commit_list+0x94/0xf1 [nfs_layout_nfsv41_files]
 [<ffffffffa01160d6>] filelayout_recover_commit_reqs+0x3b/0x68 [nfs_layout_nfsv41_files]
 [<ffffffffa00ba53a>] nfs_direct_write_reschedule+0x9f/0x1d6 [nfs]
 [<ffffffff810705df>] ? mark_lock+0x1df/0x224
 [<ffffffff8106e617>] ? trace_hardirqs_off_caller+0x37/0xa4
 [<ffffffff8106e691>] ? trace_hardirqs_off+0xd/0xf
 [<ffffffffa00ba8f8>] nfs_direct_write_schedule_work+0x9d/0xb7 [nfs]
 [<ffffffff810501d7>] ? process_one_work+0x175/0x3a5
 [<ffffffff81050258>] process_one_work+0x1f6/0x3a5
 [<ffffffff810501d7>] ? process_one_work+0x175/0x3a5
 [<ffffffff8105187e>] worker_thread+0x149/0x1f5
 [<ffffffff81051735>] ? rescuer_thread+0x28d/0x28d
 [<ffffffff81056d74>] kthread+0xd2/0xda
 [<ffffffff81056ca2>] ? __kthread_parkme+0x61/0x61
 [<ffffffff8149e66c>] ret_from_fork+0x7c/0xb0
 [<ffffffff81056ca2>] ? __kthread_parkme+0x61/0x61

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
10 years agonfs: increment i_dio_count for reads, too
Christoph Hellwig [Thu, 14 Nov 2013 16:50:30 +0000 (08:50 -0800)]
nfs: increment i_dio_count for reads, too

commit 1f90ee27461e31a1c18e5d819f6ea6f5c7304b16 upstream.

i_dio_count is used to protect dio access against truncate.  We want
to make sure there are no dio reads pending either when doing a
truncate.  I suspect on plain NFS things might work even without
this, but once we use a pnfs layout driver that access backing devices
directly things will go bad without the proper synchronization.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>