git.itanic.dy.fi Git - linux-stable/log

Linux 2.6.32.61

Signed-off-by: Willy Tarreau <w@1wt.eu>

x86, ptrace: fix build breakage with gcc 4.7

Christoph Biedl reported that 2.6.32 does not build with gcc 4.7 on
i386 :

  CC      arch/x86/kernel/ptrace.o
arch/x86/kernel/ptrace.c:1472:17: error: conflicting types for 'syscall_trace_enter'
In file included from /«PKGBUILDDIR»/arch/x86/include/asm/vm86.h:130:0,
                 from /«PKGBUILDDIR»/arch/x86/include/asm/processor.h:10,
                 from /«PKGBUILDDIR»/arch/x86/include/asm/thread_info.h:22,
                 from include/linux/thread_info.h:56,
                 from include/linux/preempt.h:9,
                 from include/linux/spinlock.h:50,
                 from include/linux/seqlock.h:29,
                 from include/linux/time.h:8,
                 from include/linux/timex.h:56,
                 from include/linux/sched.h:56,
                 from arch/x86/kernel/ptrace.c:11:
/«PKGBUILDDIR»/arch/x86/include/asm/ptrace.h:145:13: note: previous declaration of 'syscall_trace_enter' was here
arch/x86/kernel/ptrace.c:1517:17: error: conflicting types for 'syscall_trace_leave'
In file included from /«PKGBUILDDIR»/arch/x86/include/asm/vm86.h:130:0,
                 from /«PKGBUILDDIR»/arch/x86/include/asm/processor.h:10,
                 from /«PKGBUILDDIR»/arch/x86/include/asm/thread_info.h:22,
                 from include/linux/thread_info.h:56,
                 from include/linux/preempt.h:9,
                 from include/linux/spinlock.h:50,
                 from include/linux/seqlock.h:29,
                 from include/linux/time.h:8,
                 from include/linux/timex.h:56,
                 from include/linux/sched.h:56,
                 from arch/x86/kernel/ptrace.c:11:
/«PKGBUILDDIR»/arch/x86/include/asm/ptrace.h:146:13: note: previous declaration of 'syscall_trace_leave' was here
make[4]: *** [arch/x86/kernel/ptrace.o] Error 1
make[3]: *** [arch/x86/kernel] Error 2
make[3]: *** Waiting for unfinished jobs....

He also found that this issue did not appear in more recent kernels since
this asmregparm disappeared in 3.0-rc1 with commit 1b4ac2a935 that was
applied after some UM changes that we don't necessarily want in 2.6.32.

Thus, the cleanest fix for older kernels is to make the declaration in
ptrace.h match the one in ptrace.c by specifying asmregparm on these
functions. They're only called from asm which explains why it used to
work despite the inconsistency in the declaration.

Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>

mpt2sas: Send default descriptor for RAID pass through in mpt2ctl

commit ebda4d38df542e1ff4747c4daadfc7da250b4fa6 upstream.

RAID_SCSI_IO_PASSTHROUGH: Driver needs to be sending the default
descriptor for RAID Passthru, currently its sending SCSI_IO descriptor.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>

tipc: fix info leaks via msg_name in recv_msg/recv_stream

commit 60085c3d009b0df252547adb336d1ccca5ce52ec upstream.

The code in set_orig_addr() does not initialize all of the members of
struct sockaddr_tipc when filling the sockaddr info -- namely the union
is only partly filled. This will make recv_msg() and recv_stream() --
the only users of this function -- leak kernel stack memory as the
msg_name member is a local variable in net/socket.c.

Additionally to that both recv_msg() and recv_stream() fail to update
the msg_namelen member to 0 while otherwise returning with 0, i.e.
"success". This is the case for, e.g., non-blocking sockets. This will
lead to a 128 byte kernel stack leak in net/socket.c.

Fix the first issue by initializing the memory of the union with
memset(0). Fix the second one by setting msg_namelen to 0 early as it
will be updated later if we're going to fill the msg_name member.

Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[dannf: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>

irda: Fix missing msg_namelen update in irda_recvmsg_dgram()

commit 5ae94c0d2f0bed41d6718be743985d61b7f5c47d upstream.

The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.

Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about irda_recvmsg_dgram() not filling the msg_name in case it was
set.

Cc: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[dannf: adjusted to apply to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>

rose: fix info leak via msg_name in rose_recvmsg()

[ Upstream commit 4a184233f21645cf0b719366210ed445d1024d72 ]

The code in rose_recvmsg() does not initialize all of the members of
struct sockaddr_rose/full_sockaddr_rose when filling the sockaddr info.
Nor does it initialize the padding bytes of the structure inserted by
the compiler for alignment. This will lead to leaking uninitialized
kernel stack bytes in net/socket.c.

Fix the issue by initializing the memory used for sockaddr info with
memset(0).

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

rds: set correct msg_namelen

commit 06b6a1cf6e776426766298d055bb3991957d90a7 upstream

Jay Fenlason (fenlason@redhat.com) found a bug,
that recvfrom() on an RDS socket can return the contents of random kernel
memory to userspace if it was called with a address length larger than
sizeof(struct sockaddr_in).
rds_recvmsg() also fails to set the addr_len paramater properly before
returning, but that's just a bug.
There are also a number of cases wher recvfrom() can return an entirely bogus
address. Anything in rds_recvmsg() that returns a non-negative value but does
not go through the "sin = (struct sockaddr_in *)msg->msg_name;" code path
at the end of the while(1) loop will return up to 128 bytes of kernel memory
to userspace.

And I write two test programs to reproduce this bug, you will see that in
rds_server, fromAddr will be overwritten and the following sock_fd will be
destroyed.
Yes, it is the programmer's fault to set msg_namelen incorrectly, but it is
better to make the kernel copy the real length of address to user space in
such case.

How to run the test programs ?
I test them on 32bit x86 system, 3.5.0-rc7.

1 compile
gcc -o rds_client rds_client.c
gcc -o rds_server rds_server.c

2 run ./rds_server on one console

3 run ./rds_client on another console

4 you will see something like:
server is waiting to receive data...
old socket fd=3
server received data from client:data from client
msg.msg_namelen=32
new socket fd=-1067277685
sendmsg()
: Bad file descriptor

/***************** rds_client.c ********************/

int main(void)
{
int sock_fd;
struct sockaddr_in serverAddr;
struct sockaddr_in toAddr;
char recvBuffer[128] = "data from client";
struct msghdr msg;
struct iovec iov;

sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
if (sock_fd < 0) {
perror("create socket error\n");
exit(1);
}

memset(&serverAddr, 0, sizeof(serverAddr));
serverAddr.sin_family = AF_INET;
serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
serverAddr.sin_port = htons(4001);

if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
perror("bind() error\n");
close(sock_fd);
exit(1);
}

memset(&toAddr, 0, sizeof(toAddr));
toAddr.sin_family = AF_INET;
toAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
toAddr.sin_port = htons(4000);
msg.msg_name = &toAddr;
msg.msg_namelen = sizeof(toAddr);
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_iov->iov_base = recvBuffer;
msg.msg_iov->iov_len = strlen(recvBuffer) + 1;
msg.msg_control = 0;
msg.msg_controllen = 0;
msg.msg_flags = 0;

if (sendmsg(sock_fd, &msg, 0) == -1) {
perror("sendto() error\n");
close(sock_fd);
exit(1);
}

printf("client send data:%s\n", recvBuffer);

memset(recvBuffer, '\0', 128);

msg.msg_name = &toAddr;
msg.msg_namelen = sizeof(toAddr);
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_iov->iov_base = recvBuffer;
msg.msg_iov->iov_len = 128;
msg.msg_control = 0;
msg.msg_controllen = 0;
msg.msg_flags = 0;
if (recvmsg(sock_fd, &msg, 0) == -1) {
perror("recvmsg() error\n");
close(sock_fd);
exit(1);
}

printf("receive data from server:%s\n", recvBuffer);

close(sock_fd);

return 0;
}

/***************** rds_server.c ********************/

int main(void)
{
struct sockaddr_in fromAddr;
int sock_fd;
struct sockaddr_in serverAddr;
unsigned int addrLen;
char recvBuffer[128];
struct msghdr msg;
struct iovec iov;

sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
if(sock_fd < 0) {
perror("create socket error\n");
exit(0);
}

memset(&serverAddr, 0, sizeof(serverAddr));
serverAddr.sin_family = AF_INET;
serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
serverAddr.sin_port = htons(4000);
if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
perror("bind error\n");
close(sock_fd);
exit(1);
}

printf("server is waiting to receive data...\n");
msg.msg_name = &fromAddr;

/*
* I add 16 to sizeof(fromAddr), ie 32,
* and pay attention to the definition of fromAddr,
* recvmsg() will overwrite sock_fd,
* since kernel will copy 32 bytes to userspace.
*
* If you just use sizeof(fromAddr), it works fine.
* */
msg.msg_namelen = sizeof(fromAddr) + 16;
/* msg.msg_namelen = sizeof(fromAddr); */
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_iov->iov_base = recvBuffer;
msg.msg_iov->iov_len = 128;
msg.msg_control = 0;
msg.msg_controllen = 0;
msg.msg_flags = 0;

while (1) {
printf("old socket fd=%d\n", sock_fd);
if (recvmsg(sock_fd, &msg, 0) == -1) {
perror("recvmsg() error\n");
close(sock_fd);
exit(1);
}
printf("server received data from client:%s\n", recvBuffer);
printf("msg.msg_namelen=%d\n", msg.msg_namelen);
printf("new socket fd=%d\n", sock_fd);
strcat(recvBuffer, "--data from server");
if (sendmsg(sock_fd, &msg, 0) == -1) {
perror("sendmsg()\n");
close(sock_fd);
exit(1);
}
}

close(sock_fd);
return 0;
}

Signed-off-by: Weiping Pan <wpan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[dannf: Adjusted to apply to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>

llc: Fix missing msg_namelen update in llc_ui_recvmsg()

[ Upstream commit c77a4b9cffb6215a15196ec499490d116dfad181 ]

For stream sockets the code misses to update the msg_namelen member
to 0 and therefore makes net/socket.c leak the local, uninitialized
sockaddr_storage variable to userland -- 128 bytes of kernel stack
memory. The msg_namelen update is also missing for datagram sockets
in case the socket is shutting down during receive.

Fix both issues by setting msg_namelen to 0 early. It will be
updated later if we're going to fill the msg_name member.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

llc: fix info leak via getsockname()

[ Upstream commit 3592aaeb80290bda0f2cf0b5456c97bfc638b192 ]

The LLC code wrongly returns 0, i.e. "success", when the socket is
zapped. Together with the uninitialized uaddrlen pointer argument from
sys_getsockname this leads to an arbitrary memory leak of up to 128
bytes kernel stack via the getsockname() syscall.

Return an error instead when the socket is zapped to prevent the info
leak. Also remove the unnecessary memset(0). We don't directly write to
the memory pointed by uaddr but memcpy() a local structure at the end of
the function that is properly initialized.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

iucv: Fix missing msg_namelen update in iucv_sock_recvmsg()

[ Upstream commit a5598bd9c087dc0efc250a5221e5d0e6f584ee88 ]

The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.

Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about iucv_sock_recvmsg() not filling the msg_name in case it was set.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

isdnloop: fix and simplify isdnloop_init()

[ Upstream commit 77f00f6324cb97cf1df6f9c4aaeea6ada23abdb2 ]

Fix a buffer overflow bug by removing the revision and printk.

[   22.016214] isdnloop-ISDN-driver Rev 1.11.6.7
[   22.097508] isdnloop: (loop0) virtual card added
[   22.174400] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff83244972
[   22.174400]
[   22.436157] Pid: 1, comm: swapper Not tainted 3.5.0-bisect-00018-gfa8bbb1-dirty #129
[   22.624071] Call Trace:
[   22.720558]  [<ffffffff832448c3>] ? CallcNew+0x56/0x56
[   22.815248]  [<ffffffff8222b623>] panic+0x110/0x329
[   22.914330]  [<ffffffff83244972>] ? isdnloop_init+0xaf/0xb1
[   23.014800]  [<ffffffff832448c3>] ? CallcNew+0x56/0x56
[   23.090763]  [<ffffffff8108e24b>] __stack_chk_fail+0x2b/0x30
[   23.185748]  [<ffffffff83244972>] isdnloop_init+0xaf/0xb1

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

ax25: fix info leak via msg_name in ax25_recvmsg()

[ Upstream commit ef3313e84acbf349caecae942ab3ab731471f1a1 ]

When msg_namelen is non-zero the sockaddr info gets filled out, as
requested, but the code fails to initialize the padding bytes of struct
sockaddr_ax25 inserted by the compiler for alignment. Additionally the
msg_namelen value is updated to sizeof(struct full_sockaddr_ax25) but is
not always filled up to this size.

Both issues lead to the fact that the code will leak uninitialized
kernel stack bytes in net/socket.c.

Fix both issues by initializing the memory with memset(0).

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

atm: fix info leak in getsockopt(SO_ATMPVC)

commit e862f1a9b7df4e8196ebec45ac62295138aa3fc2 upstream.

The ATM code fails to initialize the two padding bytes of struct
sockaddr_atmpvc inserted for alignment. Add an explicit memset(0)
before filling the structure to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 2.6.32: adjust context, indentation]
Signed-off-by: Willy Tarreau <w@1wt.eu>

atm: fix info leak via getsockname()

commit 3c0c5cfdcd4d69ffc4b9c0907cec99039f30a50a upstream.

The ATM code fails to initialize the two padding bytes of struct
sockaddr_atmpvc inserted for alignment. Add an explicit memset(0)
before filling the structure to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 2.6.32: adjust context]
Signed-off-by: Willy Tarreau <w@1wt.eu>

atm: update msg_namelen in vcc_recvmsg()

[ Upstream commit 9b3e617f3df53822345a8573b6d358f6b9e5ed87 ]

The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.

Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about vcc_recvmsg() not filling the msg_name in case it was set.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

ipvs: fix info leak in getsockopt(IP_VS_SO_GET_TIMEOUT)

commit 2d8a041b7bfe1097af21441cb77d6af95f4f4680 upstream.

If at least one of CONFIG_IP_VS_PROTO_TCP or CONFIG_IP_VS_PROTO_UDP is
not set, __ip_vs_get_timeouts() does not fully initialize the structure
that gets copied to userland and that for leaks up to 12 bytes of kernel
stack. Add an explicit memset(0) before passing the structure to
__ip_vs_get_timeouts() to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 2.6.32: adjust context]
Signed-off-by: Willy Tarreau <w@1wt.eu>

ipvs: IPv6 MTU checking cleanup and bugfix

Cleaning up the IPv6 MTU checking in the IPVS xmit code, by using
a common helper function __mtu_check_toobig_v6().

The MTU check for tunnel mode can also use this helper as
ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) is qual to
skb->len. And the 'mtu' variable have been adjusted before
calling helper.

Notice, this also fixes a bug, as the the MTU check in ip_vs_dr_xmit_v6()
were missing a check for skb_is_gso().

This bug e.g. caused issues for KVM IPVS setups, where different
Segmentation Offloading techniques are utilized, between guests,
via the virtio driver. This resulted in very bad performance,
due to the ICMPv6 "too big" messages didn't affect the sender.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
(cherry picked from commit 590e3f79a21edd2e9857ac3ced25ba6b2a491ef8)
Signed-off-by: Willy Tarreau <w@1wt.eu>

ipvs: allow transmit of GRO aggregated skbs

Attempt at allowing LVS to transmit skbs of greater than MTU length that
have been aggregated by GRO and can thus be deaggregated by GSO.

Cc: Julian Anastasov <ja@ssi.bg>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Simon Horman <horms@verge.net.au>
(cherry picked from commit 8f1b03a4c18e8f3f0801447b62330faa8ed3bb37)
Signed-off-by: Willy Tarreau <w@1wt.eu>

netfilter: nf_ct_ipv4: packets with wrong ihl are invalid

commit 07153c6ec074257ade76a461429b567cff2b3a1e upstream.

It was reported that the Linux kernel sometimes logs:

klogd: [2629147.402413] kernel BUG at net / netfilter /
nf_conntrack_proto_tcp.c: 447!
klogd: [1072212.887368] kernel BUG at net / netfilter /
nf_conntrack_proto_tcp.c: 392

ipv4_get_l4proto() in nf_conntrack_l3proto_ipv4.c and tcp_error() in
nf_conntrack_proto_tcp.c should catch malformed packets, so the errors
at the indicated lines - TCP options parsing - should not happen.
However, tcp_error() relies on the "dataoff" offset to the TCP header,
calculated by ipv4_get_l4proto(). But ipv4_get_l4proto() does not check
bogus ihl values in IPv4 packets, which then can slip through tcp_error()
and get caught at the TCP options parsing routines.

The patch fixes ipv4_get_l4proto() by invalidating packets with bogus
ihl value.

The patch closes netfilter bugzilla id 771.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

ipv6: make fragment identifications less predictable

[ Backport of upstream commit 87c48fa3b4630905f98268dde838ee43626a060c ]

Fernando Gont reported current IPv6 fragment identification generation
was not secure, because using a very predictable system-wide generator,
allowing various attacks.

IPv4 uses inetpeer cache to address this problem and to get good
performance. We'll use this mechanism when IPv6 inetpeer is stable
enough in linux-3.1

For the time being, we use jhash on destination address to provide less
predictable identifications. Also remove a spinlock and use cmpxchg() to
get better SMP performance.

Reported-by: Fernando Gont <fernando@gont.com.ar>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[bwh: Backport further to 2.6.32]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>

ipv6: discard overlapping fragment

commit 70789d7052239992824628db8133de08dc78e593 upstream

RFC5722 prohibits reassembling fragments when some data overlaps.

Bug spotted by Zhang Zuotao <zuotao.zhang@6wind.com>.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[dannf: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>

net: sctp: sctp_auth_key_put: use kzfree instead of kfree

[ Upstream commit 586c31f3bf04c290dc0a0de7fc91d20aa9a5ee53 ]

For sensitive data like keying material, it is common practice to zero
out keys before returning the memory back to the allocator. Thus, use
kzfree instead of kfree.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

net: sctp: sctp_endpoint_free: zero out secret key data

[ Upstream commit b5c37fe6e24eec194bb29d22fdd55d73bcc709bf ]

On sctp_endpoint_destroy, previously used sensitive keying material
should be zeroed out before the memory is returned, as we already do
with e.g. auth keys when released.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree

[ Upstream commit 6ba542a291a5e558603ac51cda9bded347ce7627 ]

In sctp_setsockopt_auth_key, we create a temporary copy of the user
passed shared auth key for the endpoint or association and after
internal setup, we free it right away. Since it's sensitive data, we
should zero out the key before returning the memory back to the
allocator. Thus, use kzfree instead of kfree, just as we do in
sctp_auth_key_put().

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

sctp: fix memory leak in sctp_datamsg_from_user() when copy from user space fails

[ Upstream commit be364c8c0f17a3dd42707b5a090b318028538eb9 ]

Trinity (the syscall fuzzer) discovered a memory leak in SCTP,
reproducible e.g. with the sendto() syscall by passing invalid
user space pointer in the second argument:

#include <string.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
         int fd;
         struct sockaddr_in sa;

         fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/);
         if (fd < 0)
                 return 1;

         memset(&sa, 0, sizeof(sa));
         sa.sin_family = AF_INET;
         sa.sin_addr.s_addr = inet_addr("127.0.0.1");
         sa.sin_port = htons(11111);

         sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa));

         return 0;
}

As far as I can tell, the leak has been around since ~2003.

Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

dcbnl: fix various netlink info leaks

commit 29cd8ae0e1a39e239a3a7b67da1986add1199fc0 upstream.

The dcb netlink interface leaks stack memory in various places:
* perm_addr[] buffer is only filled at max with 12 of the 32 bytes but
  copied completely,
* no in-kernel driver fills all fields of an IEEE 802.1Qaz subcommand,
  so we're leaking up to 58 bytes for ieee_ets structs, up to 136 bytes
  for ieee_pfc structs, etc.,
* the same is true for CEE -- no in-kernel driver fills the whole
  struct,

Prevent all of the above stack info leaks by properly initializing the
buffers/structures involved.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 2.6.32: no support for IEEE or CEE commands, so only
deal with perm_addr]
Signed-off-by: Willy Tarreau <w@1wt.eu>

unix: fix a race condition in unix_release()

[ Upstream commit ded34e0fe8fe8c2d595bfa30626654e4b87621e0 ]

As reported by Jan, and others over the past few years, there is a
race condition caused by unix_release setting the sock->sk pointer
to NULL before properly marking the socket as dead/orphaned. This
can cause a problem with the LSM hook security_unix_may_send() if
there is another socket attempting to write to this partially
released socket in between when sock->sk is set to NULL and it is
marked as dead/orphaned. This patch fixes this by only setting
sock->sk to NULL after the socket has been marked as dead; I also
take the opportunity to make unix_release_sock() a void function
as it only ever returned 0/success.

Dave, I think this one should go on the -stable pile.

Special thanks to Jan for coming up with a reproducer for this
problem.

Reported-by: Jan Stancek <jan.stancek@gmail.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

tcp: preserve ACK clocking in TSO

[ Upstream commit f4541d60a449afd40448b06496dcd510f505928e ]

A long standing problem with TSO is the fact that tcp_tso_should_defer()
rearms the deferred timer, while it should not.

Current code leads to following bad bursty behavior :

20:11:24.484333 IP A > B: . 297161:316921(19760) ack 1 win 119
20:11:24.484337 IP B > A: . ack 263721 win 1117
20:11:24.485086 IP B > A: . ack 265241 win 1117
20:11:24.485925 IP B > A: . ack 266761 win 1117
20:11:24.486759 IP B > A: . ack 268281 win 1117
20:11:24.487594 IP B > A: . ack 269801 win 1117
20:11:24.488430 IP B > A: . ack 271321 win 1117
20:11:24.489267 IP B > A: . ack 272841 win 1117
20:11:24.490104 IP B > A: . ack 274361 win 1117
20:11:24.490939 IP B > A: . ack 275881 win 1117
20:11:24.491775 IP B > A: . ack 277401 win 1117
20:11:24.491784 IP A > B: . 316921:332881(15960) ack 1 win 119
20:11:24.492620 IP B > A: . ack 278921 win 1117
20:11:24.493448 IP B > A: . ack 280441 win 1117
20:11:24.494286 IP B > A: . ack 281961 win 1117
20:11:24.495122 IP B > A: . ack 283481 win 1117
20:11:24.495958 IP B > A: . ack 285001 win 1117
20:11:24.496791 IP B > A: . ack 286521 win 1117
20:11:24.497628 IP B > A: . ack 288041 win 1117
20:11:24.498459 IP B > A: . ack 289561 win 1117
20:11:24.499296 IP B > A: . ack 291081 win 1117
20:11:24.500133 IP B > A: . ack 292601 win 1117
20:11:24.500970 IP B > A: . ack 294121 win 1117
20:11:24.501388 IP B > A: . ack 295641 win 1117
20:11:24.501398 IP A > B: . 332881:351881(19000) ack 1 win 119

While the expected behavior is more like :

20:19:49.259620 IP A > B: . 197601:202161(4560) ack 1 win 119
20:19:49.260446 IP B > A: . ack 154281 win 1212
20:19:49.261282 IP B > A: . ack 155801 win 1212
20:19:49.262125 IP B > A: . ack 157321 win 1212
20:19:49.262136 IP A > B: . 202161:206721(4560) ack 1 win 119
20:19:49.262958 IP B > A: . ack 158841 win 1212
20:19:49.263795 IP B > A: . ack 160361 win 1212
20:19:49.264628 IP B > A: . ack 161881 win 1212
20:19:49.264637 IP A > B: . 206721:211281(4560) ack 1 win 119
20:19:49.265465 IP B > A: . ack 163401 win 1212
20:19:49.265886 IP B > A: . ack 164921 win 1212
20:19:49.266722 IP B > A: . ack 166441 win 1212
20:19:49.266732 IP A > B: . 211281:215841(4560) ack 1 win 119
20:19:49.267559 IP B > A: . ack 167961 win 1212
20:19:49.268394 IP B > A: . ack 169481 win 1212
20:19:49.269232 IP B > A: . ack 171001 win 1212
20:19:49.269241 IP A > B: . 215841:221161(5320) ack 1 win 119

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Van Jacobson <vanj@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

tcp: fix MSG_SENDPAGE_NOTLAST logic

[ Upstream commit ae62ca7b03217be5e74759dc6d7698c95df498b3 ]

commit 35f9c09fe9c72e (tcp: tcp_sendpages() should call tcp_push() once)
added an internal flag : MSG_SENDPAGE_NOTLAST meant to be set on all
frags but the last one for a splice() call.

The condition used to set the flag in pipe_to_sendpage() relied on
splice() user passing the exact number of bytes present in the pipe,
or a smaller one.

But some programs pass an arbitrary high value, and the test fails.

The effect of this bug is a lack of tcp_push() at the end of a
splice(pipe -> socket) call, and possibly very slow or erratic TCP
sessions.

We should both test sd->total_len and fact that another fragment
is in the pipe (pipe->nrbufs > 1)

Many thanks to Willy for providing very clear bug report, bisection
and test programs.

Reported-by: Willy Tarreau <w@1wt.eu>
Bisected-by: Willy Tarreau <w@1wt.eu>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

tcp: allow splice() to build full TSO packets

[ This combines upstream commit
2f53384424251c06038ae612e56231b96ab610ee and the follow-on bug fix
commit 35f9c09fe9c72eb8ca2b8e89a593e1c151f28fc2 ]

vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a
time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096)

The call to tcp_push() at the end of do_tcp_sendpages() forces an
immediate xmit when pipe is not already filled, and tso_fragment() try
to split these skb to MSS multiples.

4096 bytes are usually split in a skb with 2 MSS, and a remaining
sub-mss skb (assuming MTU=1500)

This makes slow start suboptimal because many small frames are sent to
qdisc/driver layers instead of big ones (constrained by cwnd and packets
in flight of course)

In fact, applications using sendmsg() (adding an additional memory copy)
instead of vmsplice()/splice()/sendfile() are a bit faster because of
this anomaly, especially if serving small files in environments with
large initial [c]wnd.

Call tcp_push() only if MSG_MORE is not set in the flags parameter.

This bit is automatically provided by splice() internals but for the
last page, or on all pages if user specified SPLICE_F_MORE splice()
flag.

In some workloads, this can reduce number of sent logical packets by an
order of magnitude, making zero-copy TCP actually faster than
one-copy :)

Reported-by: Tom Herbert <therbert@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Maciej Å»enczykowski <maze@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Ilpo JÃ¤rvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

inet: add RCU protection to inet->opt

commit f6d8bd051c391c1c0458a30b2a7abcd939329259 upstream.

We lack proper synchronization to manipulate inet->opt ip_options

Problem is ip_make_skb() calls ip_setup_cork() and
ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options),
without any protection against another thread manipulating inet->opt.

Another thread can change inet->opt pointer and free old one under us.

Use RCU to protect inet->opt (changed to inet->inet_opt).

Instead of handling atomic refcounts, just copy ip_options when
necessary, to avoid cache line dirtying.

We cant insert an rcu_head in struct ip_options since its included in
skb->cb[], so this patch is large because I had to introduce a new
ip_options_rcu structure.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
[dannf/bwh: backported to Debian's 2.6.32]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>

net: fix info leak in compat dev_ifconf()

commit 43da5f2e0d0c69ded3d51907d9552310a6b545e8 upstream.

The implementation of dev_ifconf() for the compat ioctl interface uses
an intermediate ifc structure allocated in userland for the duration of
the syscall. Though, it fails to initialize the padding bytes inserted
for alignment and that for leaks four bytes of kernel stack. Add an
explicit memset(0) before filling the structure to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 2.6.32: adjust filename, context]
Signed-off-by: Willy Tarreau <w@1wt.eu>

net: guard tcp_set_keepalive() to tcp sockets

[ Upstream commit 3e10986d1d698140747fcfc2761ec9cb64c1d582 ]

Its possible to use RAW sockets to get a crash in
tcp_set_keepalive() / sk_reset_timer()

Fix is to make sure socket is a SOCK_STREAM one.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

net: fix divide by zero in tcp algorithm illinois

commit 8f363b77ee4fbf7c3bbcf5ec2c5ca482d396d664 upstream

Reading TCP stats when using TCP Illinois congestion control algorithm
can cause a divide by zero kernel oops.

The division by zero occur in tcp_illinois_info() at:
do_div(t, ca->cnt_rtt);
where ca->cnt_rtt can become zero (when rtt_reset is called)

Steps to Reproduce:
1. Register tcp_illinois:
     # sysctl -w net.ipv4.tcp_congestion_control=illinois
2. Monitor internal TCP information via command "ss -i"
     # watch -d ss -i
3. Establish new TCP conn to machine

Either it fails at the initial conn, or else it needs to wait
for a loss or a reset.

This is only related to reading stats.  The function avg_delay() also
performs the same divide, but is guarded with a (ca->cnt_rtt > 0) at its
calling point in update_params().  Thus, simply fix tcp_illinois_info().

Function tcp_illinois_info() / get_info() is called without
socket lock.  Thus, eliminate any race condition on ca->cnt_rtt
by using a local stack variable.  Simply reuse info.tcpv_rttcnt,
as its already set to ca->cnt_rtt.
Function avg_delay() is not affected by this race condition, as
its called with the socket lock.

Cc: Petr Matousek <pmatouse@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>

net: prevent setting ttl=0 via IP_TTL

[ Upstream commit c9be4a5c49cf51cc70a993f004c5bb30067a65ce ]

A regression is introduced by the following commit:

commit 4d52cfbef6266092d535237ba5a4b981458ab171
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Tue Jun 2 00:42:16 2009 -0700

    net: ipv4/ip_sockglue.c cleanups

    Pure cleanups

but it is not a pure cleanup...

-               if (val != -1 && (val < 1 || val>255))
+               if (val != -1 && (val < 0 || val > 255))

Since there is no reason provided to allow ttl=0, change it back.

Reported-by: nitin padalia <padalia.nitin@gmail.com>
Cc: nitin padalia <padalia.nitin@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

net: sched: integer overflow fix

[ Upstream commit d2fe85da52e89b8012ffad010ef352a964725d5f ]

Fixed integer overflow in function htb_dequeue

Signed-off-by: Stefan Hasko <hasko.stevo@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

net_sched: gact: Fix potential panic in tcf_gact().

[ Upstream commit 696ecdc10622d86541f2e35cc16e15b6b3b1b67e ]

gact_rand array is accessed by gact->tcfg_ptype whose value
is assumed to less than MAX_RAND, but any range checks are
not performed.

So add a check in tcf_gact_init(). And in tcf_gact(), we can
reduce a branch.

Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

ipv4: check rt_genid in dst_check

commit d11a4dc18bf41719c9f0d7ed494d295dd2973b92
Author: Timo Teräs <timo.teras@iki.fi>
Date:   Thu Mar 18 23:20:20 2010 +0000

    ipv4: check rt_genid in dst_check

    Xfrm_dst keeps a reference to ipv4 rtable entries on each
    cached bundle. The only way to renew xfrm_dst when the underlying
    route has changed, is to implement dst_check for this. This is
    what ipv6 side does too.

    The problems started after 87c1e12b5eeb7b30b4b41291bef8e0b41fc3dde9
    ("ipsec: Fix bogus bundle flowi") which fixed a bug causing xfrm_dst
    to not get reused, until that all lookups always generated new
    xfrm_dst with new route reference and path mtu worked. But after the
    fix, the old routes started to get reused even after they were expired
    causing pmtu to break (well it would occationally work if the rtable
    gc had run recently and marked the route obsolete causing dst_check to
    get called).

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit is based on the above, with the addition of verifying blackhole
routes in the same manner.

Fixing the issue with blackhole routes as it was accomplished in mainline
would require pulling in a lot more code, and people were not interested
in pulling in all of the dependencies given the much higher risk of trying
to select the right subset of changes to include.  The addition of the
single line of "dst->obsolete = -1;" in ipv4_dst_blackhole() was much
easier to verify, and is in the spirit of the patch in question.
This is the minimal set of changes to fix the bug in question.

A test case is available here :
  http://marc.info/?l=linux-netdev&m=135015076708950&w=2

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

bonding: Fix slave selection bug.

The returned slave is incorrect, if the net device under check is not
charged yet by the master.

Signed-off-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit af3e5bd5f650163c2e12297f572910a1af1b8236)
Signed-off-by: Willy Tarreau <w@1wt.eu>

bridge: set priority of STP packets

Spanning Tree Protocol packets should have always been marked as
control packets, this causes them to get queued in the high prirority
FIFO. As Radia Perlman mentioned in her LCA talk, STP dies if bridge
gets overloaded and can't communicate. This is a long-standing bug back
to the first versions of Linux bridge.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 547b4e718115eea74087e28d7fa70aec619200db)
Signed-off-by: Willy Tarreau <w@1wt.eu>

af_packet: remove BUG statement in tpacket_destruct_skb

[ Upstream commit 7f5c3e3a80e6654cf48dfba7cf94f88c6b505467 ]

Here's a quote of the comment about the BUG macro from asm-generic/bug.h:

Don't use BUG() or BUG_ON() unless there's really no way out; one
example might be detecting data structure corruption in the middle
of an operation that can't be backed out of.  If the (sub)system
can somehow continue operating, perhaps with reduced functionality,
it's probably not BUG-worthy.

If you're tempted to BUG(), think again:  is completely giving up
really the *only* solution?  There are usually better options, where
users don't need to reboot ASAP and can mostly shut down cleanly.

In our case, the status flag of a ring buffer slot is managed from both sides,
the kernel space and the user space. This means that even though the kernel
side might work as expected, the user space screws up and changes this flag
right between the send(2) is triggered when the flag is changed to
TP_STATUS_SENDING and a given skb is destructed after some time. Then, this
will hit the BUG macro. As David suggested, the best solution is to simply
remove this statement since it cannot be used for kernel side internal
consistency checks. I've tested it and the system still behaves /stable/ in
this case, so in accordance with the above comment, we should rather remove it.

Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

softirq: reduce latencies

In various network workloads, __do_softirq() latencies can be up
to 20 ms if HZ=1000, and 200 ms if HZ=100.

This is because we iterate 10 times in the softirq dispatcher,
and some actions can consume a lot of cycles.

This patch changes the fallback to ksoftirqd condition to :

- A time limit of 2 ms.
- need_resched() being set on current task

When one of this condition is met, we wakeup ksoftirqd for further
softirq processing if we still have pending softirqs.

Using need_resched() as the only condition can trigger RCU stalls,
as we can keep BH disabled for too long.

I ran several benchmarks and got no significant difference in
throughput, but a very significant reduction of latencies (one order
of magnitude) :

In following bench, 200 antagonist "netperf -t TCP_RR" are started in
background, using all available cpus.

Then we start one "netperf -t TCP_RR", bound to the cpu handling the NIC
IRQ (hard+soft)

Before patch :

RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
RT_LATENCY=550110.424
MIN_LATENCY=146858
MAX_LATENCY=997109
P50_LATENCY=305000
P90_LATENCY=550000
P99_LATENCY=710000
MEAN_LATENCY=376989.12
STDDEV_LATENCY=184046.92

After patch :

RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
RT_LATENCY=40545.492
MIN_LATENCY=9834
MAX_LATENCY=78366
P50_LATENCY=33583
P90_LATENCY=59000
P99_LATENCY=69000
MEAN_LATENCY=38364.67
STDDEV_LATENCY=12865.26

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Miller <davem@davemloft.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c10d73671ad30f54692f7f69f0e09e75d3a8926a)
Signed-off-by: Willy Tarreau <w@1wt.eu>

net: reduce net_rx_action() latency to 2 HZ

We should use time_after_eq() to get maximum latency of two ticks,
instead of three.

Bug added in commit 24f8b2385 (net: increase receive packet quantum)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d1f41b67ff7735193bc8b418b98ac99a448833e2)
Signed-off-by: Willy Tarreau <w@1wt.eu>

net/core: Fix potential memory leak in dev_set_alias()

[ Upstream commit 7364e445f62825758fa61195d237a5b8ecdd06ec ]

Do not leak memory by updating pointer with potentially NULL realloc return value.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

nfsd4: fix oops on unusual readlike compound

commit d5f50b0c290431c65377c4afa1c764e2c3fe5305 upstream.

If the argument and reply together exceed the maximum payload size, then
a reply with a read-like operation can overlow the rq_pages array.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

kernel panic when mount NFSv4

On Tue, 2010-12-14 at 16:58 +0800, Mi Jinlong wrote:
> Hi,
>
> When testing NFSv4 at RHEL6 with kernel 2.6.32, I got a kernel panic
> at NFS client's __rpc_create_common function.
>
> The panic place is:
>   rpc_mkpipe
>     __rpc_lookup_create()          <=== find pipefile *idmap*
>     __rpc_mkpipe()                 <=== pipefile is *idmap*
>       __rpc_create_common()
>        ******  BUG_ON(!d_unhashed(dentry)); ******    *panic*
>
> It means that the dentry's d_flags have be set DCACHE_UNHASHED,
> but it should not be set here.
>
> Is someone known this bug? or give me some idea?
>
> A reproduce program is append, but it can't reproduce the bug every time.
> the export is: "/nfsroot       *(rw,no_root_squash,fsid=0,insecure)"
>
> And the panic message is append.
>
> ============================================================================
> #!/bin/sh
>
> LOOPTOTAL=768
> LOOPCOUNT=0
> ret=0
>
> while [ $LOOPCOUNT -ne $LOOPTOTAL ]
> do
> ((LOOPCOUNT += 1))
> service nfs restart
> /usr/sbin/rpc.idmapd
> mount -t nfs4 127.0.0.1:/ /mnt|| return 1;
> ls -l /var/lib/nfs/rpc_pipefs/nfs/*/
> umount /mnt
> echo $LOOPCOUNT
> done
>
> ===============================================================================
> Code: af 60 01 00 00 89 fa 89 f0 e8 64 cf 89 f0 e8 5c 7c 64 cf 31 c0 8b 5c 24 10 8b
> 74 24 14 8b 7c 24 18 8b 6c 24 1c 83 c4 20 c3 <0f> 0b eb fc 8b 46 28 c7 44 24 08 20
> de ee f0 c7 44 24 04 56 ea
> EIP:[<f0ee92ea>] __rpc_create_common+0x8a/0xc0 [sunrpc] SS:ESP 0068:eccb5d28
> ---[ end trace 8f5606cd08928ed2]---
> Kernel panic - not syncing: Fatal exception
> Pid:7131, comm: mount.nfs4 Tainted: G     D   -------------------2.6.32 #1
> Call Trace:
>  [<c080ad18>] ? panic+0x42/0xed
>  [<c080e42c>] ? oops_end+0xbc/0xd0
>  [<c040b090>] ? do_invalid_op+0x0/0x90
>  [<c040b10f>] ? do_invalid_op+0x7f/0x90
>  [<f0ee92ea>] ? __rpc_create_common+0x8a/0xc0[sunrpc]
>  [<f0edc433>] ? rpc_free_task+0x33/0x70[sunrpc]
>  [<f0ed6508>] ? prc_call_sync+0x48/0x60[sunrpc]
>  [<f0ed656e>] ? rpc_ping+0x4e/0x60[sunrpc]
>  [<f0ed6eaf>] ? rpc_create+0x38f/0x4f0[sunrpc]
>  [<c080d80b>] ? error_code+0x73/0x78
>  [<f0ee92ea>] ? __rpc_create_common+0x8a/0xc0[sunrpc]
>  [<c0532bda>] ? d_lookup+0x2a/0x40
>  [<f0ee94b1>] ? rpc_mkpipe+0x111/0x1b0[sunrpc]
>  [<f10a59f4>] ? nfs_create_rpc_client+0xb4/0xf0[nfs]
>  [<f10d6c6d>] ? nfs_fscache_get_client_cookie+0x1d/0x50[nfs]
>  [<f10d3fcb>] ? nfs_idmap_new+0x7b/0x140[nfs]
>  [<c05e76aa>] ? strlcpy+0x3a/0x60
>  [<f10a60ca>] ? nfs4_set_client+0xea/0x2b0[nfs]
>  [<f10a6d0c>] ? nfs4_create_server+0xac/0x1b0[nfs]
>  [<c04f1400>] ? krealloc+0x40/0x50
>  [<f10b0e8b>] ? nfs4_remote_get_sb+0x6b/0x250[nfs]
>  [<c04f14ec>] ? kstrdup+0x3c/0x60
>  [<c0520739>] ? vfs_kern_mount+0x69/0x170
>  [<f10b1a3c>] ? nfs_do_root_mount+0x6c/0xa0[nfs]
>  [<f10b1b47>] ? nfs4_try_mount+0x37/0xa0[nfs]
>  [<f10afe6d>] ? nfs4_validate_text_mount_data+-x7d/0xf0[nfs]
>  [<f10b1c42>] ? nfs4_get_sb+0x92/0x2f0
>  [<c0520739>] ? vfs_kern_mount+0x69/0x170
>  [<c05366d2>] ? get_fs_type+0x32/0xb0
>  [<c052089f>] ? do_kern_mount+0x3f/0xe0
>  [<c053954f>] ? do_mount+0x2ef/0x740
>  [<c0537740>] ? copy_mount_options+0xb0/0x120
>  [<c0539a0e>] ? sys_mount+0x6e/0xa0

Hi,

Does the following patch fix the problem?

Cheers
  Trond

--------------------------
SUNRPC: Fix a BUG in __rpc_create_common

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Mi Jinlong reports:

When testing NFSv4 at RHEL6 with kernel 2.6.32, I got a kernel panic
at NFS client's __rpc_create_common function.

The panic place is:
  rpc_mkpipe
      __rpc_lookup_create()          <=== find pipefile *idmap*
      __rpc_mkpipe()                 <=== pipefile is *idmap*
        __rpc_create_common()
         ******  BUG_ON(!d_unhashed(dentry)); ****** *panic*

The test is wrong: we can find ourselves with a hashed negative dentry here
if the idmapper tried to look up the file before we got round to creating
it.

Just replace the BUG_ON() with a d_drop(dentry).

[2.6.32 background info from Jonathan below]
> Hi Willy et al,
>
> Please consider
>
>   beb0f0a9fba1 kernel panic when mount NFSv4, 2010-12-20
>
> for application to kernel.org's 2.6.32.y and 2.6.34.y trees.  The
> patch was applied upstream during the 2.6.38 merge window, so newer
> kernels don't need it.
>
> (Context: <http://bugs.debian.org/695872>.)  Tom Downes (cc-ed)
> experienced the bug on a Debian kernel close to 2.6.32.58 and
> confirmed that the patch doesn't seem to hurt.
>
> The patch is part of Fedora 13's 2.6.34-based and Fedora 14's
> 2.6.35-based kernels[1].  It was also included in the RHEL kernel at
> some point between 2.6.32-71.29.1.el6 and 2.6.32-131.0.15.el6[2].
>
> Thoughts of all kinds welcome, as always.
>
> Regards,
> Jonathan
>
> [1] https://bugzilla.redhat.com/673207
> [2] https://oss.oracle.com/git/?p=redpatch.git;a=commit;h=8028cccdc4b1

Reported-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from commit beb0f0a9fba1fa98b378329a9a5b0a73f25097ae)
Cc: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

btrfs: use rcu_barrier() to wait for bdev puts at unmount

commit bc178622d40d87e75abc131007342429c9b03351 upstream.

Doing this would reliably fail with -EBUSY for me:

# mount /dev/sdb2 /mnt/scratch; umount /mnt/scratch; mkfs.btrfs -f /dev/sdb2
...
unable to open /dev/sdb2: Device or resource busy

because mkfs.btrfs tries to open the device O_EXCL, and somebody still has it.

Using systemtap to track bdev gets & puts shows a kworker thread doing a
blkdev put after mkfs attempts a get; this is left over from the unmount
path:

btrfs_close_devices
__btrfs_close_devices
call_rcu(&device->rcu, free_device);
free_device
INIT_WORK(&device->rcu_work, __free_device);
schedule_work(&device->rcu_work);

so unmount might complete before __free_device fires & does its blkdev_put.

Adding an rcu_barrier() to btrfs_close_devices() causes unmount to wait
until all blkdev_put()s are done, and the device is truly free once
unmount completes.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

hfsplus: fix potential overflow in hfsplus_file_truncate()

commit 12f267a20aecf8b84a2a9069b9011f1661c779b4 upstream.

Change a u32 to loff_t hfsplus_file_truncate().

Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

NLS: improve UTF8 -> UTF16 string conversion routine

commit 0720a06a7518c9d0c0125bd5d1f3b6264c55c3dd upstream.

The utf8s_to_utf16s conversion routine needs to be improved.  Unlike
its utf16s_to_utf8s sibling, it doesn't accept arguments specifying
the maximum length of the output buffer or the endianness of its
16-bit output.

This patch (as1501) adds the two missing arguments, and adjusts the
only two places in the kernel where the function is called.  A
follow-on patch will add a third caller that does utilize the new
capabilities.

The two conversion routines are still annoyingly inconsistent in the
way they handle invalid byte combinations.  But that's a subject for a
different patch.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[bwh: Bakckported to 2.6.32: drop Hyper-V change]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>

fat: Fix stat->f_namelen

commit eeb5b4ae81f4a750355fa0c15f4fea22fdf83be1 upstream.

I found that the length of a file name when created cannot exceed 255
characters, yet, pathconf(), via statfs(), returns the maximum as 260.

Signed-off-by: Kevin Dankwardt <k@kcomputing.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Willy Tarreau <w@1wt.eu>

isofs: avoid info leak on export

commit fe685aabf7c8c9f138e5ea900954d295bf229175 upstream.

For type 1 the parent_offset member in struct isofs_fid gets copied
uninitialized to userland. Fix this by initializing it to 0.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

fs/cifs/cifs_dfs_ref.c: fix potential memory leakage

commit 10b8c7dff5d3633b69e77f57d404dab54ead3787 upstream.

When it goes to error through line 144, the memory allocated to *devname is
not freed, and the caller doesn't free it either in line 250. So we free the
memroy of *devname in function cifs_compose_mount_options() when it goes to
error.

Signed-off-by: Cong Ding <dinggnu@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

udf: Fix bitmap overflow on large filesystems with small block size

commit 89b1f39eb4189de745fae554b0d614d87c8d5c63 upstream.

For large UDF filesystems with 512-byte blocks the number of necessary
bitmap blocks is larger than 2^16 so s_nr_groups in udf_bitmap overflows
(the number will overflow for filesystems larger than 128 GB with
512-byte blocks). That results in ENOSPC errors despite the filesystem
has plenty of free space.

Fix the problem by changing s_nr_groups' type to 'int'. That is enough
even for filesystems 2^32 blocks (UDF maximum) and 512-byte blocksize.

Reported-and-tested-by: v10lator@myway.de
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Jim Trigg <jtrigg@spamcop.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

udf: avoid info leak on export

commit 0143fc5e9f6f5aad4764801015bc8d4b4a278200 upstream.

For type 0x51 the udf.parent_partref member in struct fid gets copied
uninitialized to userland. Fix this by initializing it to 0.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

udf: fix memory leak while allocating blocks during write

commit 2fb7d99d0de3fd8ae869f35ab682581d8455887a upstream.

Need to brelse the buffer_head stored in cur_epos and next_epos.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

ext4: avoid hang when mounting non-journal filesystems with orphan list

commit 0e9a9a1ad619e7e987815d20262d36a2f95717ca upstream.

When trying to mount a file system which does not contain a journal,
but which does have a orphan list containing an inode which needs to
be truncated, the mount call with hang forever in
ext4_orphan_cleanup() because ext4_orphan_del() will return
immediately without removing the inode from the orphan list, leading
to an uninterruptible loop in kernel code which will busy out one of
the CPU's on the system.

This can be trivially reproduced by trying to mount the file system
found in tests/f_orphan_extents_inode/image.gz from the e2fsprogs
source tree. If a malicious user were to put this on a USB stick, and
mount it on a Linux desktop which has automatic mounts enabled, this
could be considered a potential denial of service attack. (Not a big
deal in practice, but professional paranoids worry about such things,
and have even been known to allocate CVE numbers for such problems.)

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Cc: stable@vger.kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>

ext4: make orphan functions be no-op in no-journal mode

commit c9b92530a723ac5ef8e352885a1862b18f31b2f5 upstream.

Instead of checking whether the handle is valid, we check if journal
is enabled. This avoids taking the s_orphan_lock mutex in all cases
when there is no journal in use, including the error paths where
ext4_orphan_del() is called with a handle set to NULL.

Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>

CVE-2012-4508 kernel: ext4: AIO vs fallocate stale data exposure

CVE-2012-4508 kernel: ext4: AIO vs fallocate stale data exposure
[dannf: backported to Debian's 2.6.32]

According to Ben :
> The original upstream commits were c278531d39f3158bfee93dc67da0b77e09776de2,
> 60d4616f3dc63371b3dc367e5e88fd4b4f037f65 and (most importantly)
> dee1f973ca341c266229faa5a1a5bb268bed3531 by Dmitry Monakhov
> <dmonakhov@openvz.org>. They were backported into the RHEL 6 kernel by
> Lukas Czerner, according to its changelog. Dann got this version from
> Oracle's redpatch repository, where, if I understand rightly, Jamie Iles
> attempted to regenerate Lukas's patch(es).

Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jamie Iles <jamie@jamieiles.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

ext4: limit group search loop for non-extent files

commit e6155736ad76b2070652745f9e54cdea3f0d8567 upstream.

In the case where we are allocating for a non-extent file,
we must limit the groups we allocate from to those below
2^32 blocks, and ext4_mb_regular_allocator() attempts to
do this initially by putting a cap on ngroups for the
subsequent search loop.

However, the initial target group comes in from the
allocation context (ac), and it may already be beyond
the artificially limited ngroups. In this case,
the limit

if (group == ngroups)
group = 0;

at the top of the loop is never true, and the loop will
run away.

Catch this case inside the loop and reset the search to
start at group 0.

[sandeen@redhat.com: add commit msg & comments]

Signed-off-by: Lachlan McIlroy <lmcilroy@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

ext4: fix race in ext4_mb_add_n_trim()

commit f1167009711032b0d747ec89a632a626c901a1ad upstream.

In ext4_mb_add_n_trim(), lg_prealloc_lock should be taken when
changing the lg_prealloc_list.

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

ext4: lock i_mutex when truncating orphan inodes

commit 721e3eba21e43532e438652dd8f1fcdfce3187e7 upstream.

Commit c278531d39 added a warning when ext4_flush_unwritten_io() is
called without i_mutex being taken. It had previously not been taken
during orphan cleanup since races weren't possible at that point in
the mount process, but as a result of this c278531d39, we will now see
a kernel WARN_ON in this case. Take the i_mutex in
ext4_orphan_cleanup() to suppress this warning.

Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

ext4: fix fdatasync() for files with only i_size changes

commit b71fc079b5d8f42b2a52743c8d2f1d35d655b1c5 upstream.

Code tracking when transaction needs to be committed on fdatasync(2) forgets
to handle a situation when only inode's i_size is changed. Thus in such
situations fdatasync(2) doesn't force transaction with new i_size to disk
and that can result in wrong i_size after a crash.

Fix the issue by updating inode's i_datasync_tid whenever its size is
updated.

Reported-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

ext4: always set i_op in ext4_mknod()

commit 6a08f447facb4f9e29fcc30fb68060bb5a0d21c2 upstream.

ext4_special_inode_operations have their own ifdef CONFIG_EXT4_FS_XATTR
to mask those methods. And ext4_iget also always sets it, so there is
an inconsistency.

Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

ext4: online defrag is not supported for journaled files

commit f066055a3449f0e5b0ae4f3ceab4445bead47638 upstream.

Proper block swap for inodes with full journaling enabled is
truly non obvious task. In order to be on a safe side let's
explicitly disable it for now.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

ext4: fix memory leak in ext4_xattr_set_acl()'s error path

commit 24ec19b0ae83a385ad9c55520716da671274b96c upstream.

In ext4_xattr_set_acl(), if ext4_journal_start() returns an error,
posix_acl_release() will not be called for 'acl' which may result in a
memory leak.

This patch fixes that.

Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Eugene Shatokhin <eugene.shatokhin@rosalab.ru>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

ext4: Fix max file size and logical block counting of extent format file

commit f17722f917b2f21497deb6edc62fb1683daa08e6 upstream

Kazuya Mio reported that he was able to hit BUG_ON(next == lblock)
in ext4_ext_put_gap_in_cache() while creating a sparse file in extent
format and fill the tail of file up to its end. We will hit the BUG_ON
when we write the last block (2^32-1) into the sparse file.

The root cause of the problem lies in the fact that we specifically set
s_maxbytes so that block at s_maxbytes fit into on-disk extent format,
which is 32 bit long. However, we are not storing start and end block
number, but rather start block number and length in blocks. It means
that in order to cover extent from 0 to EXT_MAX_BLOCK we need
EXT_MAX_BLOCK+1 to fit into len (because we counting block 0 as well) -
and it does not.

The only way to fix it without changing the meaning of the struct
ext4_extent members is, as Kazuya Mio suggested, to lower s_maxbytes
by one fs block so we can cover the whole extent we can get by the
on-disk extent format.

Also in many places EXT_MAX_BLOCK is used as length instead of maximum
logical block number as the name suggests, it is all a bit messy. So
this commit renames it to EXT_MAX_BLOCKS and change its usage in some
places to actually be maximum number of blocks in the extent.

The bug which this commit fixes can be reproduced as follows:

dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-2))
sync
dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-1))

Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
[dannf: Applied the backport from RHEL6 to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>

ext4: don't dereference null pointer when make_indexed_dir() fails

Fix for a null pointer bug found while running punch hole tests

Signed-off-by: Allison Henderson <achender@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
(cherry picked from commit 6976a6f2acde2b0443cd64f1d08af90630e4ce81)
Signed-off-by: Willy Tarreau <w@1wt.eu>

ext4: Fix fs corruption when make_indexed_dir() fails

When make_indexed_dir() fails (e.g. because of ENOSPC) after it has
allocated block for index tree root, we did not properly mark all
changed buffers dirty. This lead to only some of these buffers being
written out and thus effectively corrupting the directory.

Fix the issue by marking all changed data dirty even in the error
failure case.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
(cherry picked from commit 7ad8e4e6ae2a7c95445ee1715b1714106fb95037)
Signed-off-by: Willy Tarreau <w@1wt.eu>

jbd: Fix lock ordering bug in journal_unmap_buffer()

commit 25389bb207987b5774182f763b9fb65ff08761c8 upstream.

Commit 09e05d48 introduced a wait for transaction commit into
journal_unmap_buffer() in the case we are truncating a buffer undergoing commit
in the page stradding i_size on a filesystem with blocksize < pagesize. Sadly
we forgot to drop buffer lock before waiting for transaction commit and thus
deadlock is possible when kjournald wants to lock the buffer.

Fix the problem by dropping the buffer lock before waiting for transaction
commit. Since we are still holding page lock (and that is OK), buffer cannot
disappear under us.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

jbd: Fix assertion failure in commit code due to lacking transaction credits

ext3 users of data=journal mode with blocksize < pagesize were occasionally
hitting assertion failure in journal_commit_transaction() checking whether the
transaction has at least as many credits reserved as buffers attached. The
core of the problem is that when a file gets truncated, buffers that still need
checkpointing or that are attached to the committing transaction are left with
buffer_mapped set. When this happens to buffers beyond i_size attached to a
page stradding i_size, subsequent write extending the file will see these
buffers and as they are mapped (but underlying blocks were freed) things go
awry from here.

The assertion failure just coincidentally (and in this case luckily as we would
start corrupting filesystem) triggers due to journal_head not being properly
cleaned up as well.

Under some rare circumstances this bug could even hit data=ordered mode users.
There the assertion won't trigger and we would end up corrupting the
filesystem.

We fix the problem by unmapping buffers if possible (in lots of cases we just
need a buffer attached to a transaction as a place holder but it must not be
written out anyway). And in one case, we just have to bite the bullet and wait
for transaction commit to finish.

Reviewed-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Jan Kara <jack@suse.cz>
(cherry picked from commit 09e05d4805e6c524c1af74e524e5d0528bb3fef3)
Signed-off-by: Willy Tarreau <w@1wt.eu>

jbd: Delay discarding buffers in journal_unmap_buffer

Delay discarding buffers in journal_unmap_buffer until
we know that "add to orphan" operation has definitely been
committed, otherwise the log space of committing transation
may be freed and reused before truncate get committed, updates
may get lost if crash happens.

This patch is a backport of JBD2 fix by dingdinghua <dingdinghua@nrchpc.ac.cn>.

Signed-off-by: Jan Kara <jack@suse.cz>
(cherry picked from commit 86963918965eb8fe0c8ae009e7c1b4c630f533d5)
Signed-off-by: Willy Tarreau <w@1wt.eu>

tmpfs: fix use-after-free of mempolicy object

commit 5f00110f7273f9ff04ac69a5f85bb535a4fd0987 upstream.

The tmpfs remount logic preserves filesystem mempolicy if the mpol=M
option is not specified in the remount request.  A new policy can be
specified if mpol=M is given.

Before this patch remounting an mpol bound tmpfs without specifying
mpol= mount option in the remount request would set the filesystem's
mempolicy object to a freed mempolicy object.

To reproduce the problem boot a DEBUG_PAGEALLOC kernel and run:
    # mkdir /tmp/x

    # mount -t tmpfs -o size=100M,mpol=interleave nodev /tmp/x

    # grep /tmp/x /proc/mounts
    nodev /tmp/x tmpfs rw,relatime,size=102400k,mpol=interleave:0-3 0 0

    # mount -o remount,size=200M nodev /tmp/x

    # grep /tmp/x /proc/mounts
    nodev /tmp/x tmpfs rw,relatime,size=204800k,mpol=??? 0 0
        # note ? garbage in mpol=... output above

    # dd if=/dev/zero of=/tmp/x/f count=1
        # panic here

Panic:
    BUG: unable to handle kernel NULL pointer dereference at           (null)
    IP: [<          (null)>]           (null)
    [...]
    Oops: 0010 [#1] SMP DEBUG_PAGEALLOC
    Call Trace:
      mpol_shared_policy_init+0xa5/0x160
      shmem_get_inode+0x209/0x270
      shmem_mknod+0x3e/0xf0
      shmem_create+0x18/0x20
      vfs_create+0xb5/0x130
      do_last+0x9a1/0xea0
      path_openat+0xb3/0x4d0
      do_filp_open+0x42/0xa0
      do_sys_open+0xfe/0x1e0
      compat_sys_open+0x1b/0x20
      cstar_dispatch+0x7/0x1f

Non-debug kernels will not crash immediately because referencing the
dangling mpol will not cause a fault.  Instead the filesystem will
reference a freed mempolicy object, which will cause unpredictable
behavior.

The problem boils down to a dropped mpol reference below if
shmem_parse_options() does not allocate a new mpol:

    config = *sbinfo
    shmem_parse_options(data, &config, true)
    mpol_put(sbinfo->mpol)
    sbinfo->mpol = config.mpol  /* BUG: saves unreferenced mpol */

This patch avoids the crash by not releasing the mempolicy if
shmem_parse_options() doesn't create a new mpol.

How far back does this issue go? I see it in both 2.6.36 and 3.3.  I did
not look back further.

Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

sysfs: sysfs_pathname/sysfs_add_one: Use strlcat() instead of strcat()

commit 66081a72517a131430dcf986775f3268aafcb546 upstream.

The warning check for duplicate sysfs entries can cause a buffer overflow
when printing the warning, as strcat() doesn't check buffer sizes.
Use strlcat() instead.

Since strlcat() doesn't return a pointer to the passed buffer, unlike
strcat(), I had to convert the nested concatenation in sysfs_add_one() to
an admittedly more obscure comma operator construct, to avoid emitting code
for the concatenation if CONFIG_BUG is disabled.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

fs/fscache/stats.c: fix memory leak

commit ec686c9239b4d472052a271c505d04dae84214cc upstream.

There is a kernel memory leak observed when the proc file
/proc/fs/fscache/stats is read.

The reason is that in fscache_stats_open, single_open is called and the
respective release function is not called during release. Hence fix
with correct release function - single_release().

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=57101

Signed-off-by: Anurup m <anurup.m@huawei.com>
Cc: shyju pv <shyju.pv@huawei.com>
Cc: Sanil kumar <sanil.kumar@huawei.com>
Cc: Nataraj m <nataraj.m@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

fs/compat_ioctl.c: VIDEO_SET_SPU_PALETTE missing error check

commit 12176503366885edd542389eed3aaf94be163fdb upstream.

The compat ioctl for VIDEO_SET_SPU_PALETTE was missing an error check
while converting ioctl arguments. This could lead to leaking kernel
stack contents into userspace.

Patch extracted from existing fix in grsecurity.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: David Miller <davem@davemloft.net>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: PaX Team <pageexec@freemail.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

epoll: prevent missed events on EPOLL_CTL_MOD

commit 128dd1759d96ad36c379240f8b9463e8acfd37a1 upstream.

EPOLL_CTL_MOD sets the interest mask before calling f_op->poll() to
ensure events are not missed. Since the modifications to the interest
mask are not protected by the same lock as ep_poll_callback, we need to
ensure the change is visible to other CPUs calling ep_poll_callback.

We also need to ensure f_op->poll() has an up-to-date view of past
events which occured before we modified the interest mask. So this
barrier also pairs with the barrier in wq_has_sleeper().

This should guarantee either ep_poll_callback or f_op->poll() (or both)
will notice the readiness of a recently-ready/modified item.

This issue was encountered by Andreas Voellmy and Junchang(Jason) Wang in:
http://thread.gmane.org/gmane.linux.kernel/1408782/

Signed-off-by: Eric Wong <normalperson@yhbt.net>
Cc: Hans Verkuil <hans.verkuil@cisco.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Voellmy <andreas.voellmy@yale.edu>
Tested-by: "Junchang(Jason) Wang" <junchang.wang@yale.edu>
Cc: netdev@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

USB: ftdi_sio: Quiet sparse noise about using plain integer was NULL pointer

commit a816e3113b63753c330ca4751ea1d208e93e3015 upstream.

Pointers should not be compared to plain integers.
Quiets the sparse warning:
warning: Using plain integer as NULL pointer

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Lotfi Manseur <lotfi.manseur@imag.fr>
Signed-off-by: Willy Tarreau <w@1wt.eu>

USB: serial: ftdi_sio: Handle the old_termios == 0 case e.g. uart_resume_port()

commit c515598e0f5769916c31c00392cc2bfe6af74e55 upstream.

Handle null old_termios in ftdi_set_termios() calls from uart_resume_port().

Signed-off-by: Andrew Worsley <amworsley@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Suggested-by: Lotfi Manseur <lotfi.manseur@imag.fr>
Signed-off-by: Willy Tarreau <w@1wt.eu>

USB: cdc-wdm: fix buffer overflow

commit c0f5ecee4e741667b2493c742b60b6218d40b3aa upstream.

The buffer for responses must not overflow.
If this would happen, set a flag, drop the data and return
an error after user space has read all remaining data.

Signed-off-by: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 2.6.32: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>

USB: io_ti: Fix NULL dereference in chase_port()

commit 1ee0a224bc9aad1de496c795f96bc6ba2c394811 upstream

The tty is NULL when the port is hanging up.
chase_port() needs to check for this.

This patch is intended for stable series.
The behavior was observed and tested in Linux 3.2 and 3.7.1.

Johan Hovold submitted a more elaborate patch for the mainline kernel.

[   56.277883] usb 1-1: edge_bulk_in_callback - nonzero read bulk status received: -84
[   56.278811] usb 1-1: USB disconnect, device number 3
[   56.278856] usb 1-1: edge_bulk_in_callback - stopping read!
[   56.279562] BUG: unable to handle kernel NULL pointer dereference at 00000000000001c8
[   56.280536] IP: [<ffffffff8144e62a>] _raw_spin_lock_irqsave+0x19/0x35
[   56.281212] PGD 1dc1b067 PUD 1e0f7067 PMD 0
[   56.282085] Oops: 0002 [#1] SMP
[   56.282744] Modules linked in:
[   56.283512] CPU 1
[   56.283512] Pid: 25, comm: khubd Not tainted 3.7.1 #1 innotek GmbH VirtualBox/VirtualBox
[   56.283512] RIP: 0010:[<ffffffff8144e62a>]  [<ffffffff8144e62a>] _raw_spin_lock_irqsave+0x19/0x35
[   56.283512] RSP: 0018:ffff88001fa99ab0  EFLAGS: 00010046
[   56.283512] RAX: 0000000000000046 RBX: 00000000000001c8 RCX: 0000000000640064
[   56.283512] RDX: 0000000000010000 RSI: ffff88001fa99b20 RDI: 00000000000001c8
[   56.283512] RBP: ffff88001fa99b20 R08: 0000000000000000 R09: 0000000000000000
[   56.283512] R10: 0000000000000000 R11: ffffffff812fcb4c R12: ffff88001ddf53c0
[   56.283512] R13: 0000000000000000 R14: 00000000000001c8 R15: ffff88001e19b9f4
[   56.283512] FS:  0000000000000000(0000) GS:ffff88001fd00000(0000) knlGS:0000000000000000
[   56.283512] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   56.283512] CR2: 00000000000001c8 CR3: 000000001dc51000 CR4: 00000000000006e0
[   56.283512] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   56.283512] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   56.283512] Process khubd (pid: 25, threadinfo ffff88001fa98000, task ffff88001fa94f80)
[   56.283512] Stack:
[   56.283512]  0000000000000046 00000000000001c8 ffffffff810578ec ffffffff812fcb4c
[   56.283512]  ffff88001e19b980 0000000000002710 ffffffff812ffe81 0000000000000001
[   56.283512]  ffff88001fa94f80 0000000000000202 ffffffff00000001 0000000000000296
[   56.283512] Call Trace:
[   56.283512]  [<ffffffff810578ec>] ? add_wait_queue+0x12/0x3c
[   56.283512]  [<ffffffff812fcb4c>] ? usb_serial_port_work+0x28/0x28
[   56.283512]  [<ffffffff812ffe81>] ? chase_port+0x84/0x2d6
[   56.283512]  [<ffffffff81063f27>] ? try_to_wake_up+0x199/0x199
[   56.283512]  [<ffffffff81263a5c>] ? tty_ldisc_hangup+0x222/0x298
[   56.283512]  [<ffffffff81300171>] ? edge_close+0x64/0x129
[   56.283512]  [<ffffffff810612f7>] ? __wake_up+0x35/0x46
[   56.283512]  [<ffffffff8106135b>] ? should_resched+0x5/0x23
[   56.283512]  [<ffffffff81264916>] ? tty_port_shutdown+0x39/0x44
[   56.283512]  [<ffffffff812fcb4c>] ? usb_serial_port_work+0x28/0x28
[   56.283512]  [<ffffffff8125d38c>] ? __tty_hangup+0x307/0x351
[   56.283512]  [<ffffffff812e6ddc>] ? usb_hcd_flush_endpoint+0xde/0xed
[   56.283512]  [<ffffffff8144e625>] ? _raw_spin_lock_irqsave+0x14/0x35
[   56.283512]  [<ffffffff812fd361>] ? usb_serial_disconnect+0x57/0xc2
[   56.283512]  [<ffffffff812ea99b>] ? usb_unbind_interface+0x5c/0x131
[   56.283512]  [<ffffffff8128d738>] ? __device_release_driver+0x7f/0xd5
[   56.283512]  [<ffffffff8128d9cd>] ? device_release_driver+0x1a/0x25
[   56.283512]  [<ffffffff8128d393>] ? bus_remove_device+0xd2/0xe7
[   56.283512]  [<ffffffff8128b7a3>] ? device_del+0x119/0x167
[   56.283512]  [<ffffffff812e8d9d>] ? usb_disable_device+0x6a/0x180
[   56.283512]  [<ffffffff812e2ae0>] ? usb_disconnect+0x81/0xe6
[   56.283512]  [<ffffffff812e4435>] ? hub_thread+0x577/0xe82
[   56.283512]  [<ffffffff8144daa7>] ? __schedule+0x490/0x4be
[   56.283512]  [<ffffffff8105798f>] ? abort_exclusive_wait+0x79/0x79
[   56.283512]  [<ffffffff812e3ebe>] ? usb_remote_wakeup+0x2f/0x2f
[   56.283512]  [<ffffffff812e3ebe>] ? usb_remote_wakeup+0x2f/0x2f
[   56.283512]  [<ffffffff810570b4>] ? kthread+0x81/0x89
[   56.283512]  [<ffffffff81057033>] ? __kthread_parkme+0x5c/0x5c
[   56.283512]  [<ffffffff8145387c>] ? ret_from_fork+0x7c/0xb0
[   56.283512]  [<ffffffff81057033>] ? __kthread_parkme+0x5c/0x5c
[   56.283512] Code: 8b 7c 24 08 e8 17 0b c3 ff 48 8b 04 24 48 83 c4 10 c3 53 48 89 fb 41 50 e8 e0 0a c3 ff 48 89 04 24 e8 e7 0a c3 ff ba 00 00 01 00
<f0> 0f c1 13 48 8b 04 24 89 d1 c1 ea 10 66 39 d1 74 07 f3 90 66
[   56.283512] RIP  [<ffffffff8144e62a>] _raw_spin_lock_irqsave+0x19/0x35
[   56.283512]  RSP <ffff88001fa99ab0>
[   56.283512] CR2: 00000000000001c8
[   56.283512] ---[ end trace 49714df27e1679ce ]---

Signed-off-by: Wolfgang Frisch <wfpub@roembden.net>
Cc: Johan Hovold <jhovold@gmail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

USB: garmin_gps: fix memory leak on disconnect

commit 618aa1068df29c37a58045fe940f9106664153fd upstream.

Remove bogus disconnect test introduced by 95bef012e ("USB: more serial
drivers writing after disconnect") which prevented queued data from
being freed on disconnect.

The possible IO it was supposed to prevent is long gone.

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

USB: mos7840: fix port-device leak in error path

commit 3eb55cc4ed88eee3b5230f66abcdbd2a91639eda upstream.

The driver set the usb-serial port pointers to NULL on errors in attach,
effectively preventing usb-serial core from decrementing the port ref
counters and releasing the port devices and associated data.

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

USB: mos7840: fix urb leak at release

commit 65a4cdbb170e4ec1a7fa0e94936d47e24a17b0e8 upstream.

Make sure control urb is freed at release.

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

USB: serial: Fix memory leak in sierra_release()

commit f7bc5051667b74c3861f79eed98c60d5c3b883f7 upstream.

I found a memory leak in sierra_release() (well sierra_probe() I guess)
that looses 8 bytes each time the driver releases a device.

Signed-off-by: Len Sorensen <lsorense@csclub.uwaterloo.ca>
Acked-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

USB: whiteheat: fix memory leak in error path

commit c129197c99550d356cf5f69b046994dd53cd1b9d upstream.

Make sure command buffer is deallocated in case of errors during attach.

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Cc: <support@connecttech.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

USB: EHCI: go back to using the system clock for QH unlinks

commit 004c19682884d4f40000ce1ded53f4a1d0b18206 upstream

This patch (as1477) fixes a problem affecting a few types of EHCI
controller.  Contrary to what one might expect, these controllers
automatically stop their internal frame counter when no ports are
enabled.  Since ehci-hcd currently relies on the frame counter for
determining when it should unlink QHs from the async schedule, those
controllers run into trouble: The frame counter stops and the QHs
never get unlinked.

Some systems have also experienced other problems traced back to
commit b963801164618e25fbdc0cd452ce49c3628b46c8 (USB: ehci-hcd unlink
speedups), which made the original switch from using the system clock
to using the frame counter.  It never became clear what the reason was
for these problems, but evidently it is related to use of the frame
counter.

To fix all these problems, this patch more or less reverts that commit
and goes back to using the system clock.  But this can't be done
cleanly because other changes have since been made to the scan_async()
subroutine.  One of these changes involved the tricky logic that tries
to avoid rescanning QHs that have already been seen when the scanning
loop is restarted, which happens whenever an URB is given back.
Switching back to clock-based unlinks would make this logic even more
complicated.

Therefore the new code doesn't rescan the entire async list whenever a
giveback occurs.  Instead it rescans only the current QH and continues
on from there.  This requires the use of a separate pointer to keep
track of the next QH to scan, since the current QH may be unlinked
while the scanning is in progress.  That new pointer must be global,
so that it can be adjusted forward whenever the _next_ QH gets
unlinked.  (uhci-hcd uses this same trick.)

Simplification of the scanning loop removes a level of indentation,
which accounts for the size of the patch.  The amount of code changed
is relatively small, and it isn't exactly a reversion of the
b963801164 commit.

This fixes Bugzilla #32432.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: <stable@kernel.org>
Tested-by: Matej Kenda <matejken@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Thomas Bork <tom@eisfair.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>

xhci: Make handover code more robust

commit e955a1cd086de4d165ae0f4c7be7289d84b63bdc upstream.

My test platform (Intel DX79SI) boots reliably under BIOS, but frequently
crashes when booting via UEFI. I finally tracked this down to the xhci
handoff code. It seems that reads from the device occasionally just return
0xff, resulting in xhci_find_next_cap_offset generating a value that's
larger than the resource region. We then oops when attempting to read the
value. Sanity checking that value lets us avoid the crash.

I've no idea what's causing the underlying problem, and xhci still doesn't
actually *work* even with this, but the machine at least boots which will
probably make further debugging easier.

This should be backported to kernels as old as 2.6.31, that contain the
commit 66d4eadd8d067269ea8fead1a50fe87c2979a80d "USB: xhci: BIOS handoff
and HW initialization."

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

Bluetooth: fix possible info leak in bt_sock_recvmsg()

commit 4683f42fde3977bdb4e8a09622788cc8b5313778 upstream.

In case the socket is already shutting down, bt_sock_recvmsg() returns
with 0 without updating msg_namelen leading to net/socket.c leaking the
local, uninitialized sockaddr_storage variable to userland -- 128 bytes
of kernel stack memory.

Fix this by moving the msg_namelen assignment in front of the shutdown
test.

Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[dannf: adjusted to apply to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>

Bluetooth: L2CAP - Fix info leak via getsockname()

commit 792039c73cf176c8e39a6e8beef2c94ff46522ed upstream.

The L2CAP code fails to initialize the l2_bdaddr_type member of struct
sockaddr_l2 and the padding byte added for alignment. It that for leaks
two bytes kernel stack via the getsockname() syscall. Add an explicit
memset(0) before filling the structure to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 2.6.32: adjust filename]
Signed-off-by: Willy Tarreau <w@1wt.eu>

Bluetooth: RFCOMM - Fix missing msg_namelen update in rfcomm_sock_recvmsg()

[ Upstream commit e11e0455c0d7d3d62276a0c55d9dfbc16779d691 ]

If RFCOMM_DEFER_SETUP is set in the flags, rfcomm_sock_recvmsg() returns
early with 0 without updating the possibly set msg_namelen member. This,
in turn, leads to a 128 byte kernel stack leak in net/socket.c.

Fix this by updating msg_namelen in this case. For all other cases it
will be handled in bt_sock_stream_recvmsg().

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

Bluetooth: RFCOMM - Fix info leak via getsockname()

[ Upstream commit 9344a972961d1a6d2c04d9008b13617bcb6ec2ef ]

The RFCOMM code fails to initialize the trailing padding byte of struct
sockaddr_rc added for alignment. It that for leaks one byte kernel stack
via the getsockname() syscall. Add an explicit memset(0) before filling
the structure to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

Bluetooth: HCI - Fix info leak in getsockopt(HCI_FILTER)

[ Upstream commit e15ca9a0ef9a86f0477530b0f44a725d67f889ee ]

The HCI code fails to initialize the two padding bytes of struct
hci_ufilter before copying it to userland -- that for leaking two
bytes kernel stack. Add an explicit memset(0) before filling the
structure to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

Bluetooth: Fix incorrect strncpy() in hidp_setup_hid()

commit 0a9ab9bdb3e891762553f667066190c1d22ad62b upstream

The length parameter should be sizeof(req->name) - 1 because there is no
guarantee that string provided by userspace will contain the trailing
'\0'.

Can be easily reproduced by manually setting req->name to 128 non-zero
bytes prior to ioctl(HIDPCONNADD) and checking the device name setup on
input subsystem:

$ cat /sys/devices/pnp0/00\:04/tty/ttyS0/hci0/hci0\:1/input8/name
AAAAAA[...]AAAAAAAAf0:af:f0:af:f0:af

("f0:af:f0:af:f0:af" is the device bluetooth address, taken from "phys"
field in struct hid_device due to overflow.)

Cc: stable@vger.kernel.org
Signed-off-by: Anderson Lizardo <anderson.lizardo@openbossa.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
[backported to 2.6.32 jmm]
Signed-off-by: Willy Tarreau <w@1wt.eu>

telephony: ijx: buffer overflow in ixj_write_cid()

[Not needed in 3.8 or newer as this driver is removed there. - gregkh]

We get this from user space and nothing has been done to ensure that
these strings are NUL terminated.

Reported-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

IPoIB: Fix use-after-free of multicast object

commit bea1e22df494a729978e7f2c54f7bda328f74bc3 upstream.

Fix a crash in ipoib_mcast_join_task().  (with help from Or Gerlitz)

Commit c8c2afe360b7 ("IPoIB: Use rtnl lock/unlock when changing device
flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which
is run from the ipoib_workqueue, and hence the workqueue can't be
flushed from the context of ipoib_stop().

In the current code, ipoib_stop() (which doesn't flush the workqueue)
calls ipoib_mcast_dev_flush(), which goes and deletes all the
multicast entries.  This takes place without any synchronization with
a possible running instance of ipoib_mcast_join_task() for the same
ipoib device, leading to a crash due to NULL pointer dereference.

Fix this by making sure that the workqueue is flushed before
ipoib_mcast_dev_flush() is called.  To make that possible, we move the
RTNL-lock wrapped code to ipoib_mcast_join_finish().

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

tg3: Avoid null pointer dereference in tg3_interrupt in netconsole mode

[ Upstream commit 9c13cb8bb477a83b9a3c9e5a5478a4e21294a760 ]

When netconsole is enabled, logging messages generated during tg3_open
can result in a null pointer dereference for the uninitialized tg3
status block. Use the irq_sync flag to disable polling in the early
stages. irq_sync is cleared when the driver is enabling interrupts after
all initialization is completed.

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

b43legacy: Fix crash on unload when firmware not available

commit 2d838bb608e2d1f6cb4280e76748cb812dc822e7 upstream.

When b43legacy is loaded without the firmware being available, a following
unload generates a kernel NULL pointer dereference BUG as follows:

[  214.330789] BUG: unable to handle kernel NULL pointer dereference at 0000004c
[  214.330997] IP: [<c104c395>] drain_workqueue+0x15/0x170
[  214.331179] *pde = 00000000
[  214.331311] Oops: 0000 [#1] SMP
[  214.331471] Modules linked in: b43legacy(-) ssb pcmcia mac80211 cfg80211 af_packet mperf arc4 ppdev sr_mod cdrom sg shpchp yenta_socket pcmcia_rsrc pci_hotplug pcmcia_core battery parport_pc parport floppy container ac button edd autofs4 ohci_hcd ehci_hcd usbcore usb_common thermal processor scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh fan thermal_sys hwmon ata_generic pata_ali libata [last unloaded: cfg80211]
[  214.333421] Pid: 3639, comm: modprobe Not tainted 3.6.0-rc6-wl+ #163 Source Technology VIC 9921/ALI Based Notebook
[  214.333580] EIP: 0060:[<c104c395>] EFLAGS: 00010246 CPU: 0
[  214.333687] EIP is at drain_workqueue+0x15/0x170
[  214.333788] EAX: c162ac40 EBX: cdfb8360 ECX: 0000002a EDX: 00002a2a
[  214.333890] ESI: 00000000 EDI: 00000000 EBP: cd767e7c ESP: cd767e5c
[  214.333957]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  214.333957] CR0: 8005003b CR2: 0000004c CR3: 0c96a000 CR4: 00000090
[  214.333957] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  214.333957] DR6: ffff0ff0 DR7: 00000400
[  214.333957] Process modprobe (pid: 3639, ti=cd766000 task=cf802e90 task.ti=cd766000)
[  214.333957] Stack:
[  214.333957]  00000292 cd767e74 c12c5e09 00000296 00000296 cdfb8360 cdfb9220 00000000
[  214.333957]  cd767e90 c104c4fd cdfb8360 cdfb9220 cd682800 cd767ea4 d0c10184 cd682800
[  214.333957]  cd767ea4 cba31064 cd767eb8 d0867908 cba31064 d087e09c cd96f034 cd767ec4
[  214.333957] Call Trace:
[  214.333957]  [<c12c5e09>] ? skb_dequeue+0x49/0x60
[  214.333957]  [<c104c4fd>] destroy_workqueue+0xd/0x150
[  214.333957]  [<d0c10184>] ieee80211_unregister_hw+0xc4/0x100 [mac80211]
[  214.333957]  [<d0867908>] b43legacy_remove+0x78/0x80 [b43legacy]
[  214.333957]  [<d083654d>] ssb_device_remove+0x1d/0x30 [ssb]
[  214.333957]  [<c126f15a>] __device_release_driver+0x5a/0xb0
[  214.333957]  [<c126fb07>] driver_detach+0x87/0x90
[  214.333957]  [<c126ef4c>] bus_remove_driver+0x6c/0xe0
[  214.333957]  [<c1270120>] driver_unregister+0x40/0x70
[  214.333957]  [<d083686b>] ssb_driver_unregister+0xb/0x10 [ssb]
[  214.333957]  [<d087c488>] b43legacy_exit+0xd/0xf [b43legacy]
[  214.333957]  [<c1089dde>] sys_delete_module+0x14e/0x2b0
[  214.333957]  [<c110a4a7>] ? vfs_write+0xf7/0x150
[  214.333957]  [<c1240050>] ? tty_write_lock+0x50/0x50
[  214.333957]  [<c110a6f8>] ? sys_write+0x38/0x70
[  214.333957]  [<c1397c55>] syscall_call+0x7/0xb
[  214.333957] Code: bc 27 00 00 00 00 a1 74 61 56 c1 55 89 e5 e8 a3 fc ff ff 5d c3 90 55 89 e5 57 56 89 c6 53 b8 40 ac 62 c1 83 ec 14 e8 bb b7 34 00 <8b> 46 4c 8d 50 01 85 c0 89 56 4c 75 03 83 0e 40 80 05 40 ac 62
[  214.333957] EIP: [<c104c395>] drain_workqueue+0x15/0x170 SS:ESP 0068:cd767e5c
[  214.333957] CR2: 000000000000004c
[  214.341110] ---[ end trace c7e90ec026d875a6 ]---Index: wireless-testing/drivers/net/wireless/b43legacy/main.c

The problem is fixed by making certain that the ucode pointer is not NULL
before deregistering the driver in mac80211.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

r8169: incorrect identifier for a 8168dp

Merge error.

See CFG_METHOD_8 (0x3c800000 + 0x00300000) since version 8.002.00
of Realtek's driver.

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: Hayes <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 17c99297212a2d1b1779a08caf4b0d83a85545df)
Cc: Thomas Bork <tom@eisfair.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>

r8169: Add support for D-Link 530T rev C1 (Kernel Bug 38862)

[ Upstream commit 93a3aa25933461d76141179fc94aa32d5f9d954a ]

The D-Link DGE-530T rev C1 is a re-badged Realtek 8169 named DLG10028C,
unlike the previous revisions which were skge based. It is probably
the same as the discontinued DGE-528T (0x4300) other than the PCI ID.

The PCI ID is 0x1186:0x4302.

Adding it to r8169.c where 0x1186:0x4300 is already found makes the card
be detected and work.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=38862

Signed-off-by: Len Sorensen <lsorense@csclub.uwaterloo.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
(cherry picked from commit 7106159f8bd33bd5e5b0ea2c87e499117fc22c69)
Cc: Thomas Bork <tom@eisfair.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>

r8169: remove the obsolete and incorrect AMD workaround

[ Upstream commit 5d0feaff230c0abfe4a112e6f09f096ed99e0b2d ]

This was introduced in commit 6dccd16 "r8169: merge with version
6.001.00 of Realtek's r8169 driver". I did not find the version
6.001.00 online, but in 6.002.00 or any later r8169 from Realtek
this hunk is no longer present.

Also commit 05af214 "r8169: fix Ethernet Hangup for RTL8110SC
rev d" claims to have fixed this issue otherwise.

The magic compare mask of 0xfffe000 is dubious as it masks
parts of the Reserved part, and parts of the VLAN tag. But this
does not make much sense as the VLAN tag parts are perfectly
valid there. In matter of fact this seems to be triggered with
any VLAN tagged packet as RxVlanTag bit is matched. I would
suspect 0xfffe0000 was intended to test reserved part only.

Finally, this hunk is evil as it can cause more packets to be
handled than what was NAPI quota causing net/core/dev.c:
net_rx_action(): WARN_ON_ONCE(work > weight) to trigger, and
mess up the NAPI state causing device to hang.

As result, any system using VLANs and having high receive
traffic (so that NAPI poll budget limits rtl_rx) would result
in device hang.

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3a42cce923b0242d0293cae0a162601afa89d552)
Cc: Thomas Bork <tom@eisfair.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>