]> git.itanic.dy.fi Git - linux-stable/log
linux-stable
8 years agoLinux 2.6.32.70 v2.6.32.70
Willy Tarreau [Fri, 29 Jan 2016 21:12:37 +0000 (22:12 +0100)]
Linux 2.6.32.70

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agokvm: x86: only channel 0 of the i8254 is linked to the HPET
Paolo Bonzini [Thu, 7 Jan 2016 12:50:38 +0000 (13:50 +0100)]
kvm: x86: only channel 0 of the i8254 is linked to the HPET

commit e5e57e7a03b1cdcb98e4aed135def2a08cbf3257 upstream.

While setting the KVM PIT counters in 'kvm_pit_load_count', if
'hpet_legacy_start' is set, the function disables the timer on
channel[0], instead of the respective index 'channel'. This is
because channels 1-3 are not linked to the HPET.  Fix the caller
to only activate the special HPET processing for channel 0.

Reported-by: P J P <pjp@fedoraproject.org>
Fixes: 0185604c2d82c560dab2f2933a18f797e74ab5a8
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit ef90cf3d0b59e3b1dcfe94d1a241107667e6e96a)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoKVM: x86: Reload pit counters for all channels when restoring state
Andrew Honig [Wed, 18 Nov 2015 22:50:23 +0000 (14:50 -0800)]
KVM: x86: Reload pit counters for all channels when restoring state

commit 0185604c2d82c560dab2f2933a18f797e74ab5a8 upstream.

Currently if userspace restores the pit counters with a count of 0
on channels 1 or 2 and the guest attempts to read the count on those
channels, then KVM will perform a mod of 0 and crash.  This will ensure
that 0 values are converted to 65536 as per the spec.

This is CVE-2015-7513.

Signed-off-by: Andy Honig <ahonig@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 08b8d1a6ccdefd3d517d04c472b7f42f51b3059b)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agomm/memory_hotplug.c: check for missing sections in test_pages_in_a_zone()
Andrew Banman [Tue, 29 Dec 2015 22:54:25 +0000 (14:54 -0800)]
mm/memory_hotplug.c: check for missing sections in test_pages_in_a_zone()

commit 5f0f2887f4de9508dcf438deab28f1de8070c271 upstream.

test_pages_in_a_zone() does not account for the possibility of missing
sections in the given pfn range.  pfn_valid_within always returns 1 when
CONFIG_HOLES_IN_ZONE is not set, allowing invalid pfns from missing
sections to pass the test, leading to a kernel oops.

Wrap an additional pfn loop with PAGES_PER_SECTION granularity to check
for missing sections before proceeding into the zone-check code.

This also prevents a crash from offlining memory devices with missing
sections.  Despite this, it may be a good idea to keep the related patch
'[PATCH 3/3] drivers: memory: prohibit offlining of memory blocks with
missing sections' because missing sections in a memory block may lead to
other problems not covered by the scope of this fix.

Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Alex Thorlton <athorlton@sgi.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Greg KH <greg@kroah.com>
Cc: Seth Jennings <sjennings@variantweb.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 17f6a291c98199d7ce15a850ce5f548ceef628bc)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agomm: add SECTION_ALIGN_UP() and SECTION_ALIGN_DOWN() macro
Daniel Kiper [Wed, 25 May 2011 00:12:51 +0000 (17:12 -0700)]
mm: add SECTION_ALIGN_UP() and SECTION_ALIGN_DOWN() macro

commit a539f3533b78e39a22723d6d3e1e11b6c14454d9 upstream.

Add SECTION_ALIGN_UP() and SECTION_ALIGN_DOWN() macro which aligns given
pfn to upper section and lower section boundary accordingly.

Required for the latest memory hotplug support for the Xen balloon driver.

Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[wt: only needed for next patch]
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoipv6/addrlabel: fix ip6addrlbl_get()
Andrey Ryabinin [Mon, 21 Dec 2015 09:54:45 +0000 (12:54 +0300)]
ipv6/addrlabel: fix ip6addrlbl_get()

commit e459dfeeb64008b2d23bdf600f03b3605dbb8152 upstream.

ip6addrlbl_get() has never worked. If ip6addrlbl_hold() succeeded,
ip6addrlbl_get() will exit with '-ESRCH'. If ip6addrlbl_hold() failed,
ip6addrlbl_get() will use about to be free ip6addrlbl_entry pointer.

Fix this by inverting ip6addrlbl_hold() check.

Fixes: 2a8cc6c89039 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Cong Wang <cwang@twopensource.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 39b214ba1a357359f9c0be6ef8d21f2e5187567a)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoparisc: Fix syscall restarts
Helge Deller [Mon, 21 Dec 2015 09:03:30 +0000 (10:03 +0100)]
parisc: Fix syscall restarts

commit 71a71fb5374a23be36a91981b5614590b9e722c3 upstream.

On parisc syscalls which are interrupted by signals sometimes failed to
restart and instead returned -ENOSYS which in the worst case lead to
userspace crashes.
A similiar problem existed on MIPS and was fixed by commit e967ef02
("MIPS: Fix restart of indirect syscalls").

On parisc the current syscall restart code assumes that all syscall
callers load the syscall number in the delay slot of the ble
instruction. That's how it is e.g. done in the unistd.h header file:
ble 0x100(%sr2, %r0)
ldi #syscall_nr, %r20
Because of that assumption the current code never restored %r20 before
returning to userspace.

This assumption is at least not true for code which uses the glibc
syscall() function, which instead uses this syntax:
ble 0x100(%sr2, %r0)
copy regX, %r20
where regX depend on how the compiler optimizes the code and register
usage.

This patch fixes this problem by adding code to analyze how the syscall
number is loaded in the delay branch and - if needed - copy the syscall
number to regX prior returning to userspace for the syscall restart.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 9f2dcffefc599c424ad1dd402e3a96da60639308)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoMIPS: Fix restart of indirect syscalls
Ed Swierk [Tue, 13 Jan 2015 05:10:30 +0000 (21:10 -0800)]
MIPS: Fix restart of indirect syscalls

commit e967ef022e00bb7c2e5b1a42007abfdd52055050 upstream.

When 32-bit MIPS userspace invokes a syscall indirectly via syscall(number,
arg1, ..., arg7), the kernel looks up the actual syscall based on the given
number, shifts the other arguments to the left, and jumps to the syscall.

If the syscall is interrupted by a signal and indicates it needs to be
restarted by the kernel (by returning ERESTARTNOINTR for example), the
syscall must be called directly, since the number is no longer the first
argument, and the other arguments are now staged for a direct call.

Before shifting the arguments, store the syscall number in pt_regs->regs[2].
This gets copied temporarily into pt_regs->regs[0] after the syscall returns.
If the syscall needs to be restarted, handle_signal()/do_signal() copies the
number back to pt_regs->reg[2], which ends up in $v0 once control returns to
userspace.

Signed-off-by: Ed Swierk <eswierk@skyportsystems.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/8929/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 08f865bba9c705aef95268a33393698e5385587e)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoUSB: fix invalid memory access in hub_activate()
Alan Stern [Wed, 16 Dec 2015 18:32:38 +0000 (13:32 -0500)]
USB: fix invalid memory access in hub_activate()

commit e50293ef9775c5f1cf3fcc093037dd6a8c5684ea upstream.

Commit 8520f38099cc ("USB: change hub initialization sleeps to
delayed_work") changed the hub_activate() routine to make part of it
run in a workqueue.  However, the commit failed to take a reference to
the usb_hub structure or to lock the hub interface while doing so.  As
a result, if a hub is plugged in and quickly unplugged before the work
routine can run, the routine will try to access memory that has been
deallocated.  Or, if the hub is unplugged while the routine is
running, the memory may be deallocated while it is in active use.

This patch fixes the problem by taking a reference to the usb_hub at
the start of hub_activate() and releasing it at the end (when the work
is finished), and by locking the hub interface while the work routine
is running.  It also adds a check at the start of the routine to see
if the hub has already been disconnected, in which nothing should be
done.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Alexandru Cornea <alexandru.cornea@intel.com>
Tested-by: Alexandru Cornea <alexandru.cornea@intel.com>
Fixes: 8520f38099cc ("USB: change hub initialization sleeps to delayed_work")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: add prototype for hub_release() before first use]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 10037421b529bc1fc18994e94e37d745184c4ea9)
[wt: made a few changes :
  - adjusted context due to some autopm code being added only in 2.6.33
  - no device_{lock,unlock}() in 2.6.32, use up/down(&->sem) instead]
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoUSB: ipaq.c: fix a timeout loop
Dan Carpenter [Wed, 16 Dec 2015 11:06:37 +0000 (14:06 +0300)]
USB: ipaq.c: fix a timeout loop

commit abdc9a3b4bac97add99e1d77dc6d28623afe682b upstream.

The code expects the loop to end with "retries" set to zero but, because
it is a post-op, it will end set to -1.  I have fixed this by moving the
decrement inside the loop.

Fixes: 014aa2a3c32e ('USB: ipaq: minor ipaq_open() cleanup.')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 53a68d3f1629de82ddeb4e0882b0727fc230a6f3)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agos390/dis: Fix handling of format specifiers
Michael Holzheu [Thu, 17 Dec 2015 18:06:02 +0000 (19:06 +0100)]
s390/dis: Fix handling of format specifiers

commit 272fa59ccb4fc802af28b1d699c2463db6a71bf7 upstream.

The print_insn() function returns strings like "lghi %r1,0". To escape the
'%' character in sprintf() a second '%' is used. For example "lghi %%r1,0"
is converted into "lghi %r1,0".

After print_insn() the output string is passed to printk(). Because format
specifiers like "%r" or "%f" are ignored by printk() this works by chance
most of the time. But for instructions with control registers like
"lctl %c6,%c6,780" this fails because printk() interprets "%c" as
character format specifier.

Fix this problem and escape the '%' characters twice.

For example "lctl %%%%c6,%%%%c6,780" is then converted by sprintf()
into "lctl %%c6,%%c6,780" and by printk() into "lctl %c6,%c6,780".

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[bwh: Backported to 3.2: drop the OPERAND_VR case]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 45f32e359d36dd045e4af0d1bea8bbbafd81eeb7)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agospi: fix parent-device reference leak
Johan Hovold [Mon, 14 Dec 2015 15:16:19 +0000 (16:16 +0100)]
spi: fix parent-device reference leak

commit 157f38f993919b648187ba341bfb05d0e91ad2f6 upstream.

Fix parent-device reference leak due to SPI-core taking an unnecessary
reference to the parent when allocating the master structure, a
reference that was never released.

Note that driver core takes its own reference to the parent when the
master device is registered.

Fixes: 49dce689ad4e ("spi doesn't need class_device")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 40ddf30ca43b50019304303949c9b2676cfbd820)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoser_gigaset: fix deallocation of platform device structure
Tilman Schmidt [Tue, 15 Dec 2015 17:11:30 +0000 (18:11 +0100)]
ser_gigaset: fix deallocation of platform device structure

commit 4c5e354a974214dfb44cd23fa0429327693bc3ea upstream.

When shutting down the device, the struct ser_cardstate must not be
kfree()d immediately after the call to platform_device_unregister()
since the embedded struct platform_device is still in use.
Move the kfree() call to the release method instead.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Fixes: 2869b23e4b95 ("drivers/isdn/gigaset: new M101 driver (v2)")
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 521e4101aaf46b2f37088453a1e935604cdd7f2a)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agomISDN: fix a loop count
Dan Carpenter [Tue, 15 Dec 2015 10:07:52 +0000 (13:07 +0300)]
mISDN: fix a loop count

commit 40d24c4d8a7430aa4dfd7a665fa3faf3b05b673f upstream.

There are two issue here.
1)  cnt starts as maxloop + 1 so all these loops iterate one more time
    than intended.
2)  At the end of the loop we test for "if (maxloop && !cnt)" but for
    the first two loops, we end with cnt equal to -1.  Changing this to
    a pre-op means we end with cnt set to 0.

Fixes: cae86d4a4e56 ('mISDN: Add driver for Infineon ISDN chipset family')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 7eb2a0151e7c8a95d1be33a923718fb690452c61)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agotty: Fix GPF in flush_to_ldisc()
Peter Hurley [Fri, 27 Nov 2015 19:25:08 +0000 (14:25 -0500)]
tty: Fix GPF in flush_to_ldisc()

commit 9ce119f318ba1a07c29149301f1544b6c4bea52a upstream.

A line discipline which does not define a receive_buf() method can
can cause a GPF if data is ever received [1]. Oddly, this was known
to the author of n_tracesink in 2011, but never fixed.

[1] GPF report
    BUG: unable to handle kernel NULL pointer dereference at           (null)
    IP: [<          (null)>]           (null)
    PGD 3752d067 PUD 37a7b067 PMD 0
    Oops: 0010 [#1] SMP KASAN
    Modules linked in:
    CPU: 2 PID: 148 Comm: kworker/u10:2 Not tainted 4.4.0-rc2+ #51
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Workqueue: events_unbound flush_to_ldisc
    task: ffff88006da94440 ti: ffff88006db60000 task.ti: ffff88006db60000
    RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
    RSP: 0018:ffff88006db67b50  EFLAGS: 00010246
    RAX: 0000000000000102 RBX: ffff88003ab32f88 RCX: 0000000000000102
    RDX: 0000000000000000 RSI: ffff88003ab330a6 RDI: ffff88003aabd388
    RBP: ffff88006db67c48 R08: ffff88003ab32f9c R09: ffff88003ab31fb0
    R10: ffff88003ab32fa8 R11: 0000000000000000 R12: dffffc0000000000
    R13: ffff88006db67c20 R14: ffffffff863df820 R15: ffff88003ab31fb8
    FS:  0000000000000000(0000) GS:ffff88006dc00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000000 CR3: 0000000037938000 CR4: 00000000000006e0
    Stack:
     ffffffff829f46f1 ffff88006da94bf8 ffff88006da94bf8 0000000000000000
     ffff88003ab31fb0 ffff88003aabd438 ffff88003ab31ff8 ffff88006430fd90
     ffff88003ab32f9c ffffed0007557a87 1ffff1000db6cf78 ffff88003ab32078
    Call Trace:
     [<ffffffff8127cf91>] process_one_work+0x8f1/0x17a0 kernel/workqueue.c:2030
     [<ffffffff8127df14>] worker_thread+0xd4/0x1180 kernel/workqueue.c:2162
     [<ffffffff8128faaf>] kthread+0x1cf/0x270 drivers/block/aoe/aoecmd.c:1302
     [<ffffffff852a7c2f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:468
    Code:  Bad RIP value.
    RIP  [<          (null)>]           (null)
     RSP <ffff88006db67b50>
    CR2: 0000000000000000
    ---[ end trace a587f8947e54d6ea ]---

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit b23324ffa8ef8cc96865db76db938905d61d949a)
[wt: applied to drivers/char/tty_buffer.c instead]
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoses: fix additional element traversal bug
James Bottomley [Fri, 11 Dec 2015 17:16:38 +0000 (09:16 -0800)]
ses: fix additional element traversal bug

commit 5e1033561da1152c57b97ee84371dba2b3d64c25 upstream.

KASAN found that our additional element processing scripts drop off
the end of the VPD page into unallocated space.  The reason is that
not every element has additional information but our traversal
routines think they do, leading to them expecting far more additional
information than is present.  Fix this by adding a gate to the
traversal routine so that it only processes elements that are expected
to have additional information (list is in SES-2 section 6.1.13.1:
Additional Element Status diagnostic page overview)

Reported-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Tested-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 344d6d02690a650607e6372bce8fe40e647a51b5)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoses: Fix problems with simple enclosures
James Bottomley [Tue, 8 Dec 2015 17:00:31 +0000 (09:00 -0800)]
ses: Fix problems with simple enclosures

commit 3417c1b5cb1fdc10261dbed42b05cc93166a78fd upstream.

Simple enclosure implementations (mostly USB) are allowed to return only
page 8 to every diagnostic query.  That really confuses our
implementation because we assume the return is the page we asked for and
end up doing incorrect offsets based on bogus information leading to
accesses outside of allocated ranges.  Fix that by checking the page
code of the return and giving an error if it isn't the one we asked for.
This should fix reported bugs with USB storage by simply refusing to
attach to enclosures that behave like this.  It's also good defensive
practise now that we're starting to see more USB enclosures.

Reported-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 25ef938516d26c3db9cb1f5b927ad7ee95be1022)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agorfkill: copy the name into the rfkill struct
Johannes Berg [Thu, 10 Dec 2015 09:37:51 +0000 (10:37 +0100)]
rfkill: copy the name into the rfkill struct

commit b7bb110008607a915298bf0f47d25886ecb94477 upstream.

Some users of rfkill, like NFC and cfg80211, use a dynamic name when
allocating rfkill, in those cases dev_name(). Therefore, the pointer
passed to rfkill_alloc() might not be valid forever, I specifically
found the case that the rfkill name was quite obviously an invalid
pointer (or at least garbage) when the wiphy had been renamed.

Fix this by making a copy of the rfkill name in rfkill_alloc().

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 6f23bc6f6be370267332a0278a4646126836baee)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoaf_unix: fix a fatal race with bit fields
Eric Dumazet [Wed, 1 May 2013 05:24:03 +0000 (05:24 +0000)]
af_unix: fix a fatal race with bit fields

commit 60bc851ae59bfe99be6ee89d6bc50008c85ec75d upstream.

Using bit fields is dangerous on ppc64/sparc64, as the compiler [1]
uses 64bit instructions to manipulate them.
If the 64bit word includes any atomic_t or spinlock_t, we can lose
critical concurrent changes.

This is happening in af_unix, where unix_sk(sk)->gc_candidate/
gc_maybe_cycle/lock share the same 64bit word.

This leads to fatal deadlock, as one/several cpus spin forever
on a spinlock that will never be available again.

A safer way would be to use a long to store flags.
This way we are sure compiler/arch wont do bad things.

As we own unix_gc_lock spinlock when clearing or setting bits,
we can use the non atomic __set_bit()/__clear_bit().

recursion_level can share the same 64bit location with the spinlock,
as it is set only with this spinlock held.

[1] bug fixed in gcc-4.8.0 :
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080

Reported-by: Ambrose Feinstein <ambrose@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 2ee9cbe7e7bfe2d36374288b818aa31b2c4981db)
[wt: adjusted context]
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agosctp: update the netstamp_needed counter when copying sockets
Marcelo Ricardo Leitner [Fri, 4 Dec 2015 17:14:04 +0000 (15:14 -0200)]
sctp: update the netstamp_needed counter when copying sockets

[ Upstream commit 01ce63c90170283a9855d1db4fe81934dddce648 ]

Dmitry Vyukov reported that SCTP was triggering a WARN on socket destroy
related to disabling sock timestamp.

When SCTP accepts an association or peel one off, it copies sock flags
but forgot to call net_enable_timestamp() if a packet timestamping flag
was copied, leading to extra calls to net_disable_timestamp() whenever
such clones were closed.

The fix is to call net_enable_timestamp() whenever we copy a sock with
that flag on, like tcp does.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: SK_FLAGS_TIMESTAMP is newly defined]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit d85242d91610acbe4f905624a5758a01ae7bb32c)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agonet, scm: fix PaX detected msg_controllen overflow in scm_detach_fds
Daniel Borkmann [Thu, 19 Nov 2015 23:11:56 +0000 (00:11 +0100)]
net, scm: fix PaX detected msg_controllen overflow in scm_detach_fds

[ Upstream commit 6900317f5eff0a7070c5936e5383f589e0de7a09 ]

David and HacKurx reported a following/similar size overflow triggered
in a grsecurity kernel, thanks to PaX's gcc size overflow plugin:

(Already fixed in later grsecurity versions by Brad and PaX Team.)

[ 1002.296137] PAX: size overflow detected in function scm_detach_fds net/core/scm.c:314
               cicus.202_127 min, count: 4, decl: msg_controllen; num: 0; context: msghdr;
[ 1002.296145] CPU: 0 PID: 3685 Comm: scm_rights_recv Not tainted 4.2.3-grsec+ #7
[ 1002.296149] Hardware name: Apple Inc. MacBookAir5,1/Mac-66F35F19FE2A0D05, [...]
[ 1002.296153]  ffffffff81c27366 0000000000000000 ffffffff81c27375 ffffc90007843aa8
[ 1002.296162]  ffffffff818129ba 0000000000000000 ffffffff81c27366 ffffc90007843ad8
[ 1002.296169]  ffffffff8121f838 fffffffffffffffc fffffffffffffffc ffffc90007843e60
[ 1002.296176] Call Trace:
[ 1002.296190]  [<ffffffff818129ba>] dump_stack+0x45/0x57
[ 1002.296200]  [<ffffffff8121f838>] report_size_overflow+0x38/0x60
[ 1002.296209]  [<ffffffff816a979e>] scm_detach_fds+0x2ce/0x300
[ 1002.296220]  [<ffffffff81791899>] unix_stream_read_generic+0x609/0x930
[ 1002.296228]  [<ffffffff81791c9f>] unix_stream_recvmsg+0x4f/0x60
[ 1002.296236]  [<ffffffff8178dc00>] ? unix_set_peek_off+0x50/0x50
[ 1002.296243]  [<ffffffff8168fac7>] sock_recvmsg+0x47/0x60
[ 1002.296248]  [<ffffffff81691522>] ___sys_recvmsg+0xe2/0x1e0
[ 1002.296257]  [<ffffffff81693496>] __sys_recvmsg+0x46/0x80
[ 1002.296263]  [<ffffffff816934fc>] SyS_recvmsg+0x2c/0x40
[ 1002.296271]  [<ffffffff8181a3ab>] entry_SYSCALL_64_fastpath+0x12/0x85

Further investigation showed that this can happen when an *odd* number of
fds are being passed over AF_UNIX sockets.

In these cases CMSG_LEN(i * sizeof(int)) and CMSG_SPACE(i * sizeof(int)),
where i is the number of successfully passed fds, differ by 4 bytes due
to the extra CMSG_ALIGN() padding in CMSG_SPACE() to an 8 byte boundary
on 64 bit. The padding is used to align subsequent cmsg headers in the
control buffer.

When the control buffer passed in from the receiver side *lacks* these 4
bytes (e.g. due to buggy/wrong API usage), then msg->msg_controllen will
overflow in scm_detach_fds():

  int cmlen = CMSG_LEN(i * sizeof(int));  <--- cmlen w/o tail-padding
  err = put_user(SOL_SOCKET, &cm->cmsg_level);
  if (!err)
    err = put_user(SCM_RIGHTS, &cm->cmsg_type);
  if (!err)
    err = put_user(cmlen, &cm->cmsg_len);
  if (!err) {
    cmlen = CMSG_SPACE(i * sizeof(int));  <--- cmlen w/ 4 byte extra tail-padding
    msg->msg_control += cmlen;
    msg->msg_controllen -= cmlen;         <--- iff no tail-padding space here ...
  }                                            ... wrap-around

F.e. it will wrap to a length of 18446744073709551612 bytes in case the
receiver passed in msg->msg_controllen of 20 bytes, and the sender
properly transferred 1 fd to the receiver, so that its CMSG_LEN results
in 20 bytes and CMSG_SPACE in 24 bytes.

In case of MSG_CMSG_COMPAT (scm_detach_fds_compat()), I haven't seen an
issue in my tests as alignment seems always on 4 byte boundary. Same
should be in case of native 32 bit, where we end up with 4 byte boundaries
as well.

In practice, passing msg->msg_controllen of 20 to recvmsg() while receiving
a single fd would mean that on successful return, msg->msg_controllen is
being set by the kernel to 24 bytes instead, thus more than the input
buffer advertised. It could f.e. become an issue if such application later
on zeroes or copies the control buffer based on the returned msg->msg_controllen
elsewhere.

Maximum number of fds we can send is a hard upper limit SCM_MAX_FD (253).

Going over the code, it seems like msg->msg_controllen is not being read
after scm_detach_fds() in scm_recv() anymore by the kernel, good!

Relevant recvmsg() handler are unix_dgram_recvmsg() (unix_seqpacket_recvmsg())
and unix_stream_recvmsg(). Both return back to their recvmsg() caller,
and ___sys_recvmsg() places the updated length, that is, new msg_control -
old msg_control pointer into msg->msg_controllen (hence the 24 bytes seen
in the example).

Long time ago, Wei Yongjun fixed something related in commit 1ac70e7ad24a
("[NET]: Fix function put_cmsg() which may cause usr application memory
overflow").

RFC3542, section 20.2. says:

  The fields shown as "XX" are possible padding, between the cmsghdr
  structure and the data, and between the data and the next cmsghdr
  structure, if required by the implementation. While sending an
  application may or may not include padding at the end of last
  ancillary data in msg_controllen and implementations must accept both
  as valid. On receiving a portable application must provide space for
  padding at the end of the last ancillary data as implementations may
  copy out the padding at the end of the control message buffer and
  include it in the received msg_controllen. When recvmsg() is called
  if msg_controllen is too small for all the ancillary data items
  including any trailing padding after the last item an implementation
  may set MSG_CTRUNC.

Since we didn't place MSG_CTRUNC for already quite a long time, just do
the same as in 1ac70e7ad24a to avoid an overflow.

Btw, even man-page author got this wrong :/ See db939c9b26e9 ("cmsg.3: Fix
error in SCM_RIGHTS code sample"). Some people must have copied this (?),
thus it got triggered in the wild (reported several times during boot by
David and HacKurx).

No Fixes tag this time as pre 2002 (that is, pre history tree).

Reported-by: David Sterba <dave@jikos.cz>
Reported-by: HacKurx <hackurx@gmail.com>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: Eric Dumazet <edumazet@google.com>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 831a2a17da39d93cf68981ff99cce5d31551044b)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agotcp: initialize tp->copied_seq in case of cross SYN connection
Eric Dumazet [Thu, 26 Nov 2015 16:18:14 +0000 (08:18 -0800)]
tcp: initialize tp->copied_seq in case of cross SYN connection

[ Upstream commit 142a2e7ece8d8ac0e818eb2c91f99ca894730e2a ]

Dmitry provided a syzkaller (http://github.com/google/syzkaller)
generated program that triggers the WARNING at
net/ipv4/tcp.c:1729 in tcp_recvmsg() :

WARN_ON(tp->copied_seq != tp->rcv_nxt &&
        !(flags & (MSG_PEEK | MSG_TRUNC)));

His program is specifically attempting a Cross SYN TCP exchange,
that we support (for the pleasure of hackers ?), but it looks we
lack proper tcp->copied_seq initialization.

Thanks again Dmitry for your report and testings.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 6cfa9781d3bf950eed455369966bbdf9d05871c5)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoipmi: move timer init to before irq is setup
Jan Stancek [Tue, 8 Dec 2015 18:57:51 +0000 (13:57 -0500)]
ipmi: move timer init to before irq is setup

commit 27f972d3e00b50639deb4cc1392afaeb08d3cecc upstream.

We encountered a panic on boot in ipmi_si on a dell per320 due to an
uninitialized timer as follows.

static int smi_start_processing(void       *send_info,
                                ipmi_smi_t intf)
{
        /* Try to claim any interrupts. */
        if (new_smi->irq_setup)
                new_smi->irq_setup(new_smi);

 --> IRQ arrives here and irq handler tries to modify uninitialized timer

    which triggers BUG_ON(!timer->function) in __mod_timer().

 Call Trace:
   <IRQ>
   [<ffffffffa0532617>] start_new_msg+0x47/0x80 [ipmi_si]
   [<ffffffffa053269e>] start_check_enables+0x4e/0x60 [ipmi_si]
   [<ffffffffa0532bd8>] smi_event_handler+0x1e8/0x640 [ipmi_si]
   [<ffffffff810f5584>] ? __rcu_process_callbacks+0x54/0x350
   [<ffffffffa053327c>] si_irq_handler+0x3c/0x60 [ipmi_si]
   [<ffffffff810efaf0>] handle_IRQ_event+0x60/0x170
   [<ffffffff810f245e>] handle_edge_irq+0xde/0x180
   [<ffffffff8100fc59>] handle_irq+0x49/0xa0
   [<ffffffff8154643c>] do_IRQ+0x6c/0xf0
   [<ffffffff8100ba53>] ret_from_intr+0x0/0x11

        /* Set up the timer that drives the interface. */
        setup_timer(&new_smi->si_timer, smi_timeout, (long)new_smi);

The following patch fixes the problem.

To: Openipmi-developer@lists.sourceforge.net
To: Corey Minyard <minyard@acm.org>
CC: linux-kernel@vger.kernel.org
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Tony Camuso <tcamuso@redhat.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 207ffa8cbcf6e780d862316a94c95a7e70d1e72e)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agosched/core: Remove false-positive warning from wake_up_process()
Sasha Levin [Tue, 1 Dec 2015 01:34:20 +0000 (20:34 -0500)]
sched/core: Remove false-positive warning from wake_up_process()

commit 119d6f6a3be8b424b200dcee56e74484d5445f7e upstream.

Because wakeups can (fundamentally) be late, a task might not be in
the expected state. Therefore testing against a task's state is racy,
and can yield false positives.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: oleg@redhat.com
Fixes: 9067ac85d533 ("wake_up_process() should be never used to wakeup a TASK_STOPPED/TRACED task")
Link: http://lkml.kernel.org/r/1448933660-23082-1-git-send-email-sasha.levin@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 0e796c1b57fdd97fc040b0b78ff7fea6c0a4a39d)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoipv4: igmp: Allow removing groups from a removed interface
Andrew Lunn [Tue, 1 Dec 2015 15:31:08 +0000 (16:31 +0100)]
ipv4: igmp: Allow removing groups from a removed interface

commit 4eba7bb1d72d9bde67d810d09bf62dc207b63c5c upstream.

When a multicast group is joined on a socket, a struct ip_mc_socklist
is appended to the sockets mc_list containing information about the
joined group.

If the interface is hot unplugged, this entry becomes stale. Prior to
commit 52ad353a5344f ("igmp: fix the problem when mc leave group") it
was possible to remove the stale entry by performing a
IP_DROP_MEMBERSHIP, passing either the old ifindex or ip address on
the interface. However, this fix enforces that the interface must
still exist. Thus with time, the number of stale entries grows, until
sysctl_igmp_max_memberships is reached and then it is not possible to
join and more groups.

The previous patch fixes an issue where a IP_DROP_MEMBERSHIP is
performed without specifying the interface, either by ifindex or ip
address. However here we do supply one of these. So loosen the
restriction on device existence to only apply when the interface has
not been specified. This then restores the ability to clean up the
stale entries.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 52ad353a5344f "(igmp: fix the problem when mc leave group")
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit b1fa852672fb1ea4b0e34bfa1136bb65dba66151)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agowan/x25: Fix use-after-free in x25_asy_open_tty()
Peter Hurley [Fri, 27 Nov 2015 19:18:39 +0000 (14:18 -0500)]
wan/x25: Fix use-after-free in x25_asy_open_tty()

commit ee9159ddce14bc1dec9435ae4e3bd3153e783706 upstream.

The N_X25 line discipline may access the previous line discipline's closed
and already-freed private data on open [1].

The tty->disc_data field _never_ refers to valid data on entry to the
line discipline's open() method. Rather, the ldisc is expected to
initialize that field for its own use for the lifetime of the instance
(ie. from open() to close() only).

[1]
    [  634.336761] ==================================================================
    [  634.338226] BUG: KASAN: use-after-free in x25_asy_open_tty+0x13d/0x490 at addr ffff8800a743efd0
    [  634.339558] Read of size 4 by task syzkaller_execu/8981
    [  634.340359] =============================================================================
    [  634.341598] BUG kmalloc-512 (Not tainted): kasan: bad access detected
    ...
    [  634.405018] Call Trace:
    [  634.405277] dump_stack (lib/dump_stack.c:52)
    [  634.405775] print_trailer (mm/slub.c:655)
    [  634.406361] object_err (mm/slub.c:662)
    [  634.406824] kasan_report_error (mm/kasan/report.c:138 mm/kasan/report.c:236)
    [  634.409581] __asan_report_load4_noabort (mm/kasan/report.c:279)
    [  634.411355] x25_asy_open_tty (drivers/net/wan/x25_asy.c:559 (discriminator 1))
    [  634.413997] tty_ldisc_open.isra.2 (drivers/tty/tty_ldisc.c:447)
    [  634.414549] tty_set_ldisc (drivers/tty/tty_ldisc.c:567)
    [  634.415057] tty_ioctl (drivers/tty/tty_io.c:2646 drivers/tty/tty_io.c:2879)
    [  634.423524] do_vfs_ioctl (fs/ioctl.c:43 fs/ioctl.c:607)
    [  634.427491] SyS_ioctl (fs/ioctl.c:622 fs/ioctl.c:613)
    [  634.427945] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:188)

Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 485274cf9a69504cf4a7ef182c508f3541eee84e)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agonfs: if we have no valid attrs, then don't declare the attribute cache valid
Jeff Layton [Wed, 25 Nov 2015 18:50:11 +0000 (13:50 -0500)]
nfs: if we have no valid attrs, then don't declare the attribute cache valid

commit c812012f9ca7cf89c9e1a1cd512e6c3b5be04b85 upstream.

If we pass in an empty nfs_fattr struct to nfs_update_inode, it will
(correctly) not update any of the attributes, but it then clears the
NFS_INO_INVALID_ATTR flag, which indicates that the attributes are
up to date. Don't clear the flag if the fattr struct has no valid
attrs to apply.

Reviewed-by: Steve French <steve.french@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit ddab0155aa9589600861e3f65d505982b719496a)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoext4: Fix handling of extended tv_sec
David Turner [Tue, 24 Nov 2015 19:34:37 +0000 (14:34 -0500)]
ext4: Fix handling of extended tv_sec

commit a4dad1ae24f850410c4e60f22823cba1289b8d52 upstream.

In ext4, the bottom two bits of {a,c,m}time_extra are used to extend
the {a,c,m}time fields, deferring the year 2038 problem to the year
2446.

When decoding these extended fields, for times whose bottom 32 bits
would represent a negative number, sign extension causes the 64-bit
extended timestamp to be negative as well, which is not what's
intended.  This patch corrects that issue, so that the only negative
{a,c,m}times are those between 1901 and 1970 (as per 32-bit signed
timestamps).

Some older kernels might have written pre-1970 dates with 1,1 in the
extra bits.  This patch treats those incorrectly-encoded dates as
pre-1970, instead of post-2311, until kernel 4.20 is released.
Hopefully by then e2fsck will have fixed up the bad data.

Also add a comment explaining the encoding of ext4's extra {a,c,m}time
bits.

Signed-off-by: David Turner <novalis@novalis.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Mark Harris <mh8928@yahoo.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=23732
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 6dfd5f6abf1825fc351f663bf630603f9b78251b)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agovfs: Avoid softlockups with sendfile(2)
Jan Kara [Mon, 23 Nov 2015 12:09:51 +0000 (13:09 +0100)]
vfs: Avoid softlockups with sendfile(2)

commit c2489e07c0a71a56fb2c84bc0ee66cddfca7d068 upstream.

The following test program from Dmitry can cause softlockups or RCU
stalls as it copies 1GB from tmpfs into eventfd and we don't have any
scheduling point at that path in sendfile(2) implementation:

        int r1 = eventfd(0, 0);
        int r2 = memfd_create("", 0);
        unsigned long n = 1<<30;
        fallocate(r2, 0, 0, n);
        sendfile(r1, r2, 0, n);

Add cond_resched() into __splice_from_pipe() to fix the problem.

CC: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 47ae562efbd0fd8440959304915291865b945c67)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agofix sysvfs symlinks
Al Viro [Tue, 24 Nov 2015 02:11:08 +0000 (21:11 -0500)]
fix sysvfs symlinks

commit 0ebf7f10d67a70e120f365018f1c5fce9ddc567d upstream.

The thing got broken back in 2002 - sysvfs does *not* have inline
symlinks; even short ones have bodies stored in the first block
of file.  sysv_symlink() handles that correctly; unfortunately,
attempting to look an existing symlink up will end up confusing
them for inline symlinks, and interpret the block number containing
the body as the body itself.

Nobody has noticed until now, which says something about the level
of testing sysvfs gets ;-/

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.2:
 - Adjust context
 - Also delete unused sysv_fast_symlink_inode_operations]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 081c769778f7a41f4eb3d633df26ad572575c980)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agofuse: break infinite loop in fuse_fill_write_pages()
Roman Gushchin [Mon, 12 Oct 2015 13:33:44 +0000 (16:33 +0300)]
fuse: break infinite loop in fuse_fill_write_pages()

commit 3ca8138f014a913f98e6ef40e939868e1e9ea876 upstream.

I got a report about unkillable task eating CPU. Further
investigation shows, that the problem is in the fuse_fill_write_pages()
function. If iov's first segment has zero length, we get an infinite
loop, because we never reach iov_iter_advance() call.

Fix this by calling iov_iter_advance() before repeating an attempt to
copy data from userspace.

A similar problem is described in 124d3b7041f ("fix writev regression:
pan hanging unkillable and un-straceable"). If zero-length segmend
is followed by segment with invalid address,
iov_iter_fault_in_readable() checks only first segment (zero-length),
iov_iter_copy_from_user_atomic() skips it, fails at second and
returns zero -> goto again without skipping zero-length segment.

Patch calls iov_iter_advance() before goto again: we'll skip zero-length
segment at second iteraction and iov_iter_fault_in_readable() will detect
invalid address.

Special thanks to Konstantin Khlebnikov, who helped a lot with the commit
description.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Maxim Patlasov <mpatlasov@parallels.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Fixes: ea9b9907b82a ("fuse: implement perform_write")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit a5b234167a1ff46f311f5835828eec2f971b9bb4)
[wt: adjusted context, as commit 478e0841b from 3.1 was never backported
 to call mark_page_accessed() eventhough it seems it should have been]
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agosctp: translate host order to network order when setting a hmacid
lucien [Thu, 12 Nov 2015 05:07:07 +0000 (13:07 +0800)]
sctp: translate host order to network order when setting a hmacid

commit ed5a377d87dc4c87fb3e1f7f698cba38cd893103 upstream.

now sctp auth cannot work well when setting a hmacid manually, which
is caused by that we didn't use the network order for hmacid, so fix
it by adding the transformation in sctp_auth_ep_set_hmacs.

even we set hmacid with the network order in userspace, it still
can't work, because of this condition in sctp_auth_ep_set_hmacs():

if (id > SCTP_AUTH_HMAC_ID_MAX)
return -EOPNOTSUPP;

so this wasn't working before and thus it won't break compatibility.

Fixes: 65b07e5d0d09 ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 98d37e7fed9568f37d4a4a150a3773cfafef91ed)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agobluetooth: Validate socket address length in sco_sock_bind().
David S. Miller [Tue, 15 Dec 2015 20:39:08 +0000 (15:39 -0500)]
bluetooth: Validate socket address length in sco_sock_bind().

commit 5233252fce714053f0151680933571a2da9cbfb4 upstream.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agonet: fix warnings in 'make htmldocs' by moving macro definition out of field declaration
Hannes Frederic Sowa [Mon, 14 Dec 2015 22:30:43 +0000 (23:30 +0100)]
net: fix warnings in 'make htmldocs' by moving macro definition out of field declaration

commit 7bbadd2d1009575dad675afc16650ebb5aa10612 upstream.

Docbook does not like the definition of macros inside a field declaration
and adds a warning. Move the definition out.

Fixes: 79462ad02e86180 ("net: add validation for the socket syscall protocol argument")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: keep open-coding U8_MAX]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 35da6d62b608fd3beb576d72e48e2bc866d714c9)
[wt: adjusted context]
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agonet: add validation for the socket syscall protocol argument
Hannes Frederic Sowa [Mon, 14 Dec 2015 21:03:39 +0000 (22:03 +0100)]
net: add validation for the socket syscall protocol argument

commit 79462ad02e861803b3840cc782248c7359451cd9 upstream.

é\83­æ°¸å\88\9a reported that one could simply crash the kernel as root by
using a simple program:

int socket_fd;
struct sockaddr_in addr;
addr.sin_port = 0;
addr.sin_addr.s_addr = INADDR_ANY;
addr.sin_family = 10;

socket_fd = socket(10,3,0x40000000);
connect(socket_fd , &addr,16);

AF_INET, AF_INET6 sockets actually only support 8-bit protocol
identifiers. inet_sock's skc_protocol field thus is sized accordingly,
thus larger protocol identifiers simply cut off the higher bits and
store a zero in the protocol fields.

This could lead to e.g. NULL function pointer because as a result of
the cut off inet_num is zero and we call down to inet_autobind, which
is NULL for raw sockets.

kernel: Call Trace:
kernel:  [<ffffffff816db90e>] ? inet_autobind+0x2e/0x70
kernel:  [<ffffffff816db9a4>] inet_dgram_connect+0x54/0x80
kernel:  [<ffffffff81645069>] SYSC_connect+0xd9/0x110
kernel:  [<ffffffff810ac51b>] ? ptrace_notify+0x5b/0x80
kernel:  [<ffffffff810236d8>] ? syscall_trace_enter_phase2+0x108/0x200
kernel:  [<ffffffff81645e0e>] SyS_connect+0xe/0x10
kernel:  [<ffffffff81779515>] tracesys_phase2+0x84/0x89

I found no particular commit which introduced this problem.

CVE: CVE-2015-8543
Cc: Cong Wang <cwang@twopensource.com>
Reported-by: é\83­æ°¸å\88\9a <guoyonggang@360.cn>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 2.6.32: open-code U8_MAX; adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoKEYS: Fix race between read and revoke
David Howells [Fri, 18 Dec 2015 01:34:26 +0000 (01:34 +0000)]
KEYS: Fix race between read and revoke

commit b4a1b4f5047e4f54e194681125c74c0aa64d637d upstream.

This fixes CVE-2015-7550.

There's a race between keyctl_read() and keyctl_revoke().  If the revoke
happens between keyctl_read() checking the validity of a key and the key's
semaphore being taken, then the key type read method will see a revoked key.

This causes a problem for the user-defined key type because it assumes in
its read method that there will always be a payload in a non-revoked key
and doesn't check for a NULL pointer.

Fix this by making keyctl_read() check the validity of a key after taking
semaphore instead of before.

I think the bug was introduced with the original keyrings code.

This was discovered by a multithreaded test program generated by syzkaller
(http://github.com/google/syzkaller).  Here's a cleaned up version:

#include <sys/types.h>
#include <keyutils.h>
#include <pthread.h>
void *thr0(void *arg)
{
key_serial_t key = (unsigned long)arg;
keyctl_revoke(key);
return 0;
}
void *thr1(void *arg)
{
key_serial_t key = (unsigned long)arg;
char buffer[16];
keyctl_read(key, buffer, 16);
return 0;
}
int main()
{
key_serial_t key = add_key("user", "%", "foo", 3, KEY_SPEC_USER_KEYRING);
pthread_t th[5];
pthread_create(&th[0], 0, thr0, (void *)(unsigned long)key);
pthread_create(&th[1], 0, thr1, (void *)(unsigned long)key);
pthread_create(&th[2], 0, thr0, (void *)(unsigned long)key);
pthread_create(&th[3], 0, thr1, (void *)(unsigned long)key);
pthread_join(th[0], 0);
pthread_join(th[1], 0);
pthread_join(th[2], 0);
pthread_join(th[3], 0);
return 0;
}

Build as:

cc -o keyctl-race keyctl-race.c -lkeyutils -lpthread

Run as:

while keyctl-race; do :; done

as it may need several iterations to crash the kernel.  The crash can be
summarised as:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffff81279b08>] user_read+0x56/0xa3
...
Call Trace:
 [<ffffffff81276aa9>] keyctl_read_key+0xb6/0xd7
 [<ffffffff81277815>] SyS_keyctl+0x83/0xe0
 [<ffffffff815dbb97>] entry_SYSCALL_64_fastpath+0x12/0x6f

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
[bwh: Backported to 2.6.32: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoudp: properly support MSG_PEEK with truncated buffers
Eric Dumazet [Wed, 30 Dec 2015 13:51:12 +0000 (08:51 -0500)]
udp: properly support MSG_PEEK with truncated buffers

commit 197c949e7798fbf28cfadc69d9ca0c2abbf93191 upstream.

Backport of this upstream commit into stable kernels :
89c22d8c3b27 ("net: Fix skb csum races when peeking")
exposed a bug in udp stack vs MSG_PEEK support, when user provides
a buffer smaller than skb payload.

In this case,
skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr),
                                 msg->msg_iov);
returns -EFAULT.

This bug does not happen in upstream kernels since Al Viro did a great
job to replace this into :
skb_copy_and_csum_datagram_msg(skb, sizeof(struct udphdr), msg);
This variant is safe vs short buffers.

For the time being, instead reverting Herbert Xu patch and add back
skb->ip_summed invalid changes, simply store the result of
udp_lib_checksum_complete() so that we avoid computing the checksum a
second time, and avoid the problematic
skb_copy_and_csum_datagram_iovec() call.

This patch can be applied on recent kernels as it avoids a double
checksumming, then backported to stable kernels as a bug fix.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 18a6eba2eabbcb50a78210b16f7dd43d888a537b)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoRevert "net: add length argument to skb_copy_and_csum_datagram_iovec"
Willy Tarreau [Sat, 23 Jan 2016 10:16:20 +0000 (11:16 +0100)]
Revert "net: add length argument to skb_copy_and_csum_datagram_iovec"

This reverts commit c507639ba963bb47e3f515670a7cace76af76ab6.
As reported by Michal Kubecek, this fix doesn't handle truncated
reads correctly. Next patch from Eric fixes it better.

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoext4: Fix null dereference in ext4_fill_super()
Ben Hutchings [Tue, 1 Dec 2015 02:50:10 +0000 (02:50 +0000)]
ext4: Fix null dereference in ext4_fill_super()

Fix failure paths in ext4_fill_super() that can lead to a null
dereference.  This was designated CVE-2015-8324.

Mostly extracted from commit 744692dc0598 ("ext4: use
ext4_get_block_write in buffer write").

However there's one more incorrect goto to fix, removed upstream in
commit cf40db137cc2 ("ext4: remove failed journal checksum check").

Reference: https://bugs.openvz.org/browse/OVZ-6541
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agounix: avoid use-after-free in ep_remove_wait_queue
Rainer Weikusat [Fri, 20 Nov 2015 22:07:23 +0000 (22:07 +0000)]
unix: avoid use-after-free in ep_remove_wait_queue

commit 7d267278a9ece963d77eefec61630223fce08c6c upstream.

Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
An AF_UNIX datagram socket being the client in an n:1 association with
some server socket is only allowed to send messages to the server if the
receive queue of this socket contains at most sk_max_ack_backlog
datagrams. This implies that prospective writers might be forced to go
to sleep despite none of the message presently enqueued on the server
receive queue were sent by them. In order to ensure that these will be
woken up once space becomes again available, the present unix_dgram_poll
routine does a second sock_poll_wait call with the peer_wait wait queue
of the server socket as queue argument (unix_dgram_recvmsg does a wake
up on this queue after a datagram was received). This is inherently
problematic because the server socket is only guaranteed to remain alive
for as long as the client still holds a reference to it. In case the
connection is dissolved via connect or by the dead peer detection logic
in unix_dgram_sendmsg, the server socket may be freed despite "the
polling mechanism" (in particular, epoll) still has a pointer to the
corresponding peer_wait queue. There's no way to forcibly deregister a
wait queue with epoll.

Based on an idea by Jason Baron, the patch below changes the code such
that a wait_queue_t belonging to the client socket is enqueued on the
peer_wait queue of the server whenever the peer receive queue full
condition is detected by either a sendmsg or a poll. A wake up on the
peer queue is then relayed to the ordinary wait queue of the client
socket via wake function. The connection to the peer wait queue is again
dissolved if either a wake up is about to be relayed or the client
socket reconnects or a dead peer is detected or the client socket is
itself closed. This enables removing the second sock_poll_wait from
unix_dgram_poll, thus avoiding the use-after-free, while still ensuring
that no blocked writer sleeps forever.

Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Fixes: ec0d215f9420 ("af_unix: fix 'poll for write'/connected DGRAM sockets")
Reviewed-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 2.6.32:
 - Access sk_sleep directly, not through sk_sleep() function
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoRDS: fix race condition when sending a message on unbound socket
Quentin Casasnovas [Tue, 24 Nov 2015 22:13:21 +0000 (17:13 -0500)]
RDS: fix race condition when sending a message on unbound socket

commit 8c7188b23474cca017b3ef354c4a58456f68303a upstream.

Sasha's found a NULL pointer dereference in the RDS connection code when
sending a message to an apparently unbound socket.  The problem is caused
by the code checking if the socket is bound in rds_sendmsg(), which checks
the rs_bound_addr field without taking a lock on the socket.  This opens a
race where rs_bound_addr is temporarily set but where the transport is not
in rds_bind(), leading to a NULL pointer dereference when trying to
dereference 'trans' in __rds_conn_create().

Vegard wrote a reproducer for this issue, so kindly ask him to share if
you're interested.

I cannot reproduce the NULL pointer dereference using Vegard's reproducer
with this patch, whereas I could without.

Complete earlier incomplete fix to CVE-2015-6937:

  74e98eb08588 ("RDS: verify the underlying transport exists before creating a connection")

Cc: David S. Miller <davem@davemloft.net>
Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com>
Reviewed-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoppp, slip: Validate VJ compression slot parameters completely
Ben Hutchings [Sun, 1 Nov 2015 16:22:53 +0000 (16:22 +0000)]
ppp, slip: Validate VJ compression slot parameters completely

commit 4ab42d78e37a294ac7bc56901d563c642e03c4ae upstream.

Currently slhc_init() treats out-of-range values of rslots and tslots
as equivalent to 0, except that if tslots is too large it will
dereference a null pointer (CVE-2015-7799).

Add a range-check at the top of the function and make it return an
ERR_PTR() on error instead of NULL.  Change the callers accordingly.

Compile-tested only.

Reported-by: é\83­æ°¸å\88\9a <guoyonggang@360.cn>
References: http://article.gmane.org/gmane.comp.security.oss.general/17908
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 2.6.32: adjust filenames, context, indentation]
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoisdn_ppp: Add checks for allocation failure in isdn_ppp_open()
Ben Hutchings [Sun, 1 Nov 2015 16:21:24 +0000 (16:21 +0000)]
isdn_ppp: Add checks for allocation failure in isdn_ppp_open()

commit 0baa57d8dc32db78369d8b5176ef56c5e2e18ab3 upstream.

Compile-tested only.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoip6mr: call del_timer_sync() in ip6mr_free_table()
WANG Cong [Tue, 31 Mar 2015 18:01:47 +0000 (11:01 -0700)]
ip6mr: call del_timer_sync() in ip6mr_free_table()

commit 7ba0c47c34a1ea5bc7a24ca67309996cce0569b5 upstream.

We need to wait for the flying timers, since we
are going to free the mrtable right after it.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Ben Hutchings <ben@decadent.org.uk>
[ wt: 2.6.32 has a single table hence a single timer. ip6_mr_init() has
  the same del_timer() call on the error path, but we don't need to
  change it since at this point the timer hasn't been started yet ]
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoLinux 2.6.32.69 v2.6.32.69
Willy Tarreau [Sat, 5 Dec 2015 23:48:59 +0000 (00:48 +0100)]
Linux 2.6.32.69

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agosplice: sendfile() at once fails for big files
Christophe Leroy [Wed, 6 May 2015 15:26:47 +0000 (17:26 +0200)]
splice: sendfile() at once fails for big files

commit 0ff28d9f4674d781e492bcff6f32f0fe48cf0fed upstream.

Using sendfile with below small program to get MD5 sums of some files,
it appear that big files (over 64kbytes with 4k pages system) get a
wrong MD5 sum while small files get the correct sum.
This program uses sendfile() to send a file to an AF_ALG socket
for hashing.

/* md5sum2.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <linux/if_alg.h>

int main(int argc, char **argv)
{
int sk = socket(AF_ALG, SOCK_SEQPACKET, 0);
struct stat st;
struct sockaddr_alg sa = {
.salg_family = AF_ALG,
.salg_type = "hash",
.salg_name = "md5",
};
int n;

bind(sk, (struct sockaddr*)&sa, sizeof(sa));

for (n = 1; n < argc; n++) {
int size;
int offset = 0;
char buf[4096];
int fd;
int sko;
int i;

fd = open(argv[n], O_RDONLY);
sko = accept(sk, NULL, 0);
fstat(fd, &st);
size = st.st_size;
sendfile(sko, fd, &offset, size);
size = read(sko, buf, sizeof(buf));
for (i = 0; i < size; i++)
printf("%2.2x", buf[i]);
printf("  %s\n", argv[n]);
close(fd);
close(sko);
}
exit(0);
}

Test below is done using official linux patch files. First result is
with a software based md5sum. Second result is with the program above.

root@vgoip:~# ls -l patch-3.6.*
-rw-r--r--    1 root     root         64011 Aug 24 12:01 patch-3.6.2.gz
-rw-r--r--    1 root     root         94131 Aug 24 12:01 patch-3.6.3.gz

root@vgoip:~# md5sum patch-3.6.*
b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
c5e8f687878457db77cb7158c38a7e43  patch-3.6.3.gz

root@vgoip:~# ./md5sum2 patch-3.6.*
b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
5fd77b24e68bb24dcc72d6e57c64790e  patch-3.6.3.gz

After investivation, it appears that sendfile() sends the files by blocks
of 64kbytes (16 times PAGE_SIZE). The problem is that at the end of each
block, the SPLICE_F_MORE flag is missing, therefore the hashing operation
is reset as if it was the end of the file.

This patch adds SPLICE_F_MORE to the flags when more data is pending.

With the patch applied, we get the correct sums:

root@vgoip:~# md5sum patch-3.6.*
b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
c5e8f687878457db77cb7158c38a7e43  patch-3.6.3.gz

root@vgoip:~# ./md5sum2 patch-3.6.*
b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
c5e8f687878457db77cb7158c38a7e43  patch-3.6.3.gz

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit fcb2781782b61631db4ed31e98757795eacd31cb)

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agonet: avoid NULL deref in inet_ctl_sock_destroy()
Eric Dumazet [Mon, 2 Nov 2015 15:50:07 +0000 (07:50 -0800)]
net: avoid NULL deref in inet_ctl_sock_destroy()

[ Upstream commit 8fa677d2706d325d71dab91bf6e6512c05214e37 ]

Under low memory conditions, tcp_sk_init() and icmp_sk_init()
can both iterate on all possible cpus and call inet_ctl_sock_destroy(),
with eventual NULL pointer.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit f79c83d6c41930362bc66fc71489e92975a2facf)

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoipmr: fix possible race resulting from improper usage of IP_INC_STATS_BH() in preempt...
Ani Sinha [Fri, 30 Oct 2015 23:54:31 +0000 (16:54 -0700)]
ipmr: fix possible race resulting from improper usage of IP_INC_STATS_BH() in preemptible context.

[ Upstream commit 44f49dd8b5a606870a1f21101522a0f9c4414784 ]

Fixes the following kernel BUG :

BUG: using __this_cpu_add() in preemptible [00000000] code: bash/2758
caller is __this_cpu_preempt_check+0x13/0x15
CPU: 0 PID: 2758 Comm: bash Tainted: P           O   3.18.19 #2
 ffffffff8170eaca ffff880110d1b788 ffffffff81482b2a 0000000000000000
 0000000000000000 ffff880110d1b7b8 ffffffff812010ae ffff880007cab800
 ffff88001a060800 ffff88013a899108 ffff880108b84240 ffff880110d1b7c8
Call Trace:
[<ffffffff81482b2a>] dump_stack+0x52/0x80
[<ffffffff812010ae>] check_preemption_disabled+0xce/0xe1
[<ffffffff812010d4>] __this_cpu_preempt_check+0x13/0x15
[<ffffffff81419d60>] ipmr_queue_xmit+0x647/0x70c
[<ffffffff8141a154>] ip_mr_forward+0x32f/0x34e
[<ffffffff8141af76>] ip_mroute_setsockopt+0xe03/0x108c
[<ffffffff810553fc>] ? get_parent_ip+0x11/0x42
[<ffffffff810e6974>] ? pollwake+0x4d/0x51
[<ffffffff81058ac0>] ? default_wake_function+0x0/0xf
[<ffffffff810553fc>] ? get_parent_ip+0x11/0x42
[<ffffffff810613d9>] ? __wake_up_common+0x45/0x77
[<ffffffff81486ea9>] ? _raw_spin_unlock_irqrestore+0x1d/0x32
[<ffffffff810618bc>] ? __wake_up_sync_key+0x4a/0x53
[<ffffffff8139a519>] ? sock_def_readable+0x71/0x75
[<ffffffff813dd226>] do_ip_setsockopt+0x9d/0xb55
[<ffffffff81429818>] ? unix_seqpacket_sendmsg+0x3f/0x41
[<ffffffff813963fe>] ? sock_sendmsg+0x6d/0x86
[<ffffffff813959d4>] ? sockfd_lookup_light+0x12/0x5d
[<ffffffff8139650a>] ? SyS_sendto+0xf3/0x11b
[<ffffffff810d5738>] ? new_sync_read+0x82/0xaa
[<ffffffff813ddd19>] compat_ip_setsockopt+0x3b/0x99
[<ffffffff813fb24a>] compat_raw_setsockopt+0x11/0x32
[<ffffffff81399052>] compat_sock_common_setsockopt+0x18/0x1f
[<ffffffff813c4d05>] compat_SyS_setsockopt+0x1a9/0x1cf
[<ffffffff813c4149>] compat_SyS_socketcall+0x180/0x1e3
[<ffffffff81488ea1>] cstar_dispatch+0x7/0x1e

Signed-off-by: Ani Sinha <ani@arista.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: ipmr doesn't implement IPSTATS_MIB_OUTOCTETS]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 33cf84ba8c25b40c7de52029efc8d4372725c95f)

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoRDS-TCP: Recover correctly from pskb_pull()/pksb_trim() failure in rds_tcp_data_recv
Sowmini Varadhan [Mon, 26 Oct 2015 16:46:37 +0000 (12:46 -0400)]
RDS-TCP: Recover correctly from pskb_pull()/pksb_trim() failure in rds_tcp_data_recv

[ Upstream commit 8ce675ff39b9958d1c10f86cf58e357efaafc856 ]

Either of pskb_pull() or pskb_trim() may fail under low memory conditions.
If rds_tcp_data_recv() ignores such failures, the application will
receive corrupted data because the skb has not been correctly
carved to the RDS datagram size.

Avoid this by handling pskb_pull/pskb_trim failure in the same
manner as the skb_clone failure: bail out of rds_tcp_data_recv(), and
retry via the deferred call to rds_send_worker() that gets set up on
ENOMEM from rds_tcp_read_sock()

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit f114d9374ba3e42c86b112c8b4dbcba50a7330e7)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agobinfmt_elf: Don't clobber passed executable's file header
Maciej W. Rozycki [Mon, 26 Oct 2015 15:48:19 +0000 (15:48 +0000)]
binfmt_elf: Don't clobber passed executable's file header

commit b582ef5c53040c5feef4c96a8f9585b6831e2441 upstream.

Do not clobber the buffer space passed from `search_binary_handler' and
originally preloaded by `prepare_binprm' with the executable's file
header by overwriting it with its interpreter's file header.  Instead
keep the buffer space intact and directly use the data structure locally
allocated for the interpreter's file header, fixing a bug introduced in
2.1.14 with loadable module support (linux-mips.org commit beb11695
[Import of Linux/MIPS 2.1.14], predating kernel.org repo's history).
Adjust the amount of data read from the interpreter's file accordingly.

This was not an issue before loadable module support, because back then
`load_elf_binary' was executed only once for a given ELF executable,
whether the function succeeded or failed.

With loadable module support supported and enabled, upon a failure of
`load_elf_binary' -- which may for example be caused by architecture
code rejecting an executable due to a missing hardware feature requested
in the file header -- a module load is attempted and then the function
reexecuted by `search_binary_handler'.  With the executable's file
header replaced with its interpreter's file header the executable can
then be erroneously accepted in this subsequent attempt.

Signed-off-by: Maciej W. Rozycki <macro@imgtec.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit beebd9fa9d8aeb8f1a3028acc1987c808b601e7d)

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agodevres: fix a for loop bounds check
Dan Carpenter [Mon, 21 Sep 2015 16:21:51 +0000 (19:21 +0300)]
devres: fix a for loop bounds check

commit 1f35d04a02a652f14566f875aef3a6f2af4cb77b upstream.

The iomap[] array has PCIM_IOMAP_MAX (6) elements and not
DEVICE_COUNT_RESOURCE (16).  This bug was found using a static checker.
It may be that the "if (!(mask & (1 << i)))" check means we never
actually go past the end of the array in real life.

Fixes: ec04b075843d ('iomap: implement pcim_iounmap_regions()')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit e7102453150c7081a27744989374c474d2ebea8e)

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoHID: core: Avoid uninitialized buffer access
Richard Purdie [Fri, 18 Sep 2015 23:31:33 +0000 (16:31 -0700)]
HID: core: Avoid uninitialized buffer access

commit 79b568b9d0c7c5d81932f4486d50b38efdd6da6d upstream.

hid_connect adds various strings to the buffer but they're all
conditional. You can find circumstances where nothing would be written
to it but the kernel will still print the supposedly empty buffer with
printk. This leads to corruption on the console/in the logs.

Ensure buf is initialized to an empty string.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
[dvhart: Initialize string to "" rather than assign buf[0] = NULL;]
Cc: Jiri Kosina <jikos@kernel.org>
Cc: linux-input@vger.kernel.org
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 604bfd00358e3d7fce8dc789fe52d2f2be0fa4c7)

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoethtool: Use kcalloc instead of kmalloc for ethtool_get_strings
Joe Perches [Wed, 14 Oct 2015 08:09:40 +0000 (01:09 -0700)]
ethtool: Use kcalloc instead of kmalloc for ethtool_get_strings

[ Upstream commit 077cb37fcf6f00a45f375161200b5ee0cd4e937b ]

It seems that kernel memory can leak into userspace by a
kmalloc, ethtool_get_strings, then copy_to_user sequence.

Avoid this by using kcalloc to zero fill the copied buffer.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 68c3e59aa9cdf2d8870d8fbe4f37b1a509d0abeb)

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agomvsas: Fix NULL pointer dereference in mvs_slot_task_free
\81vis MosÄ\81ns [Fri, 21 Aug 2015 04:29:22 +0000 (07:29 +0300)]
mvsas: Fix NULL pointer dereference in mvs_slot_task_free

commit 2280521719e81919283b82902ac24058f87dfc1b upstream.

When pci_pool_alloc fails in mvs_task_prep then task->lldd_task stays
NULL but it's later used in mvs_abort_task as slot which is passed
to mvs_slot_task_free causing NULL pointer dereference.

Just return from mvs_slot_task_free when passed with NULL slot.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=101891
Signed-off-by: DÄ\81vis MosÄ\81ns <davispuh@gmail.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit cc1875ecbc3c9fb2774097e03870280c91c1e0e1)

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agotty: fix stall caused by missing memory barrier in drivers/tty/n_tty.c
Kosuke Tatsukawa [Fri, 2 Oct 2015 08:27:05 +0000 (08:27 +0000)]
tty: fix stall caused by missing memory barrier in drivers/tty/n_tty.c

commit e81107d4c6bd098878af9796b24edc8d4a9524fd upstream.

My colleague ran into a program stall on a x86_64 server, where
n_tty_read() was waiting for data even if there was data in the buffer
in the pty.  kernel stack for the stuck process looks like below.
 #0 [ffff88303d107b58] __schedule at ffffffff815c4b20
 #1 [ffff88303d107bd0] schedule at ffffffff815c513e
 #2 [ffff88303d107bf0] schedule_timeout at ffffffff815c7818
 #3 [ffff88303d107ca0] wait_woken at ffffffff81096bd2
 #4 [ffff88303d107ce0] n_tty_read at ffffffff8136fa23
 #5 [ffff88303d107dd0] tty_read at ffffffff81368013
 #6 [ffff88303d107e20] __vfs_read at ffffffff811a3704
 #7 [ffff88303d107ec0] vfs_read at ffffffff811a3a57
 #8 [ffff88303d107f00] sys_read at ffffffff811a4306
 #9 [ffff88303d107f50] entry_SYSCALL_64_fastpath at ffffffff815c86d7

There seems to be two problems causing this issue.

First, in drivers/tty/n_tty.c, __receive_buf() stores the data and
updates ldata->commit_head using smp_store_release() and then checks
the wait queue using waitqueue_active().  However, since there is no
memory barrier, __receive_buf() could return without calling
wake_up_interactive_poll(), and at the same time, n_tty_read() could
start to wait in wait_woken() as in the following chart.

        __receive_buf()                         n_tty_read()
------------------------------------------------------------------------
if (waitqueue_active(&tty->read_wait))
/* Memory operations issued after the
   RELEASE may be completed before the
   RELEASE operation has completed */
                                        add_wait_queue(&tty->read_wait, &wait);
                                        ...
                                        if (!input_available_p(tty, 0)) {
smp_store_release(&ldata->commit_head,
                  ldata->read_head);
                                        ...
                                        timeout = wait_woken(&wait,
                                          TASK_INTERRUPTIBLE, timeout);
------------------------------------------------------------------------

The second problem is that n_tty_read() also lacks a memory barrier
call and could also cause __receive_buf() to return without calling
wake_up_interactive_poll(), and n_tty_read() to wait in wait_woken()
as in the chart below.

        __receive_buf()                         n_tty_read()
------------------------------------------------------------------------
                                        spin_lock_irqsave(&q->lock, flags);
                                        /* from add_wait_queue() */
                                        ...
                                        if (!input_available_p(tty, 0)) {
                                        /* Memory operations issued after the
                                           RELEASE may be completed before the
                                           RELEASE operation has completed */
smp_store_release(&ldata->commit_head,
                  ldata->read_head);
if (waitqueue_active(&tty->read_wait))
                                        __add_wait_queue(q, wait);
                                        spin_unlock_irqrestore(&q->lock,flags);
                                        /* from add_wait_queue() */
                                        ...
                                        timeout = wait_woken(&wait,
                                          TASK_INTERRUPTIBLE, timeout);
------------------------------------------------------------------------

There are also other places in drivers/tty/n_tty.c which have similar
calls to waitqueue_active(), so instead of adding many memory barrier
calls, this patch simply removes the call to waitqueue_active(),
leaving just wake_up*() behind.

This fixes both problems because, even though the memory access before
or after the spinlocks in both wake_up*() and add_wait_queue() can
sneak into the critical section, it cannot go past it and the critical
section assures that they will be serialized (please see "INTER-CPU
ACQUIRING BARRIER EFFECTS" in Documentation/memory-barriers.txt for a
better explanation).  Moreover, the resulting code is much simpler.

Latency measurement using a ping-pong test over a pty doesn't show any
visible performance drop.

Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2:
 - Use wake_up_interruptible(), not wake_up_interruptible_poll()
 - There are only two spurious uses of waitqueue_active() to remove]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 80910ccdd3ee35e4131df38bc73b86ee60abdf0b)
[wt: file is drivers/char/n_tty.c in 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agomm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault
Mel Gorman [Thu, 1 Oct 2015 22:36:57 +0000 (15:36 -0700)]
mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault

commit 2f84a8990ebbe235c59716896e017c6b2ca1200f upstream.

SunDong reported the following on

  https://bugzilla.kernel.org/show_bug.cgi?id=103841

I think I find a linux bug, I have the test cases is constructed. I
can stable recurring problems in fedora22(4.0.4) kernel version,
arch for x86_64.  I construct transparent huge page, when the parent
and child process with MAP_SHARE, MAP_PRIVATE way to access the same
huge page area, it has the opportunity to lead to huge page copy on
write failure, and then it will munmap the child corresponding mmap
area, but then the child mmap area with VM_MAYSHARE attributes, child
process munmap this area can trigger VM_BUG_ON in set_vma_resv_flags
functions (vma - > vm_flags & VM_MAYSHARE).

There were a number of problems with the report (e.g.  it's hugetlbfs that
triggers this, not transparent huge pages) but it was fundamentally
correct in that a VM_BUG_ON in set_vma_resv_flags() can be triggered that
looks like this

 vma ffff8804651fd0d0 start 00007fc474e00000 end 00007fc475e00000
 next ffff8804651fd018 prev ffff8804651fd188 mm ffff88046b1b1800
 prot 8000000000000027 anon_vma           (null) vm_ops ffffffff8182a7a0
 pgoff 0 file ffff88106bdb9800 private_data           (null)
 flags: 0x84400fb(read|write|shared|mayread|maywrite|mayexec|mayshare|dontexpand|hugetlb)
 ------------
 kernel BUG at mm/hugetlb.c:462!
 SMP
 Modules linked in: xt_pkttype xt_LOG xt_limit [..]
 CPU: 38 PID: 26839 Comm: map Not tainted 4.0.4-default #1
 Hardware name: Dell Inc. PowerEdge R810/0TT6JF, BIOS 2.7.4 04/26/2012
 set_vma_resv_flags+0x2d/0x30

The VM_BUG_ON is correct because private and shared mappings have
different reservation accounting but the warning clearly shows that the
VMA is shared.

When a private COW fails to allocate a new page then only the process
that created the VMA gets the page -- all the children unmap the page.
If the children access that data in the future then they get killed.

The problem is that the same file is mapped shared and private.  During
the COW, the allocation fails, the VMAs are traversed to unmap the other
private pages but a shared VMA is found and the bug is triggered.  This
patch identifies such VMAs and skips them.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: SunDong <sund_sky@126.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: David Rientjes <rientjes@google.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 846bc2d8bef1e0d253cdfabfe707f37fc8cd836d)

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agox86/process: Add proper bound checks in 64bit get_wchan()
Thomas Gleixner [Wed, 30 Sep 2015 08:38:22 +0000 (08:38 +0000)]
x86/process: Add proper bound checks in 64bit get_wchan()

commit eddd3826a1a0190e5235703d1e666affa4d13b96 upstream.

Dmitry Vyukov reported the following using trinity and the memory
error detector AddressSanitizer
(https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel).

[ 124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on
address ffff88002e280000
[ 124.576801] ffff88002e280000 is located 131938492886538 bytes to
the left of 28857600-byte region [ffffffff81282e0affffffff82e0830a)
[ 124.578633] Accessed by thread T10915:
[ 124.579295] inlined in describe_heap_address
./arch/x86/mm/asan/report.c:164
[ 124.579295] #0 ffffffff810dd277 in asan_report_error
./arch/x86/mm/asan/report.c:278
[ 124.580137] #1 ffffffff810dc6a0 in asan_check_region
./arch/x86/mm/asan/asan.c:37
[ 124.581050] #2 ffffffff810dd423 in __tsan_read8 ??:0
[ 124.581893] #3 ffffffff8107c093 in get_wchan
./arch/x86/kernel/process_64.c:444

The address checks in the 64bit implementation of get_wchan() are
wrong in several ways:

 - The lower bound of the stack is not the start of the stack
   page. It's the start of the stack page plus sizeof (struct
   thread_info)

 - The upper bound must be:

       top_of_stack - TOP_OF_KERNEL_STACK_PADDING - 2 * sizeof(unsigned long).

   The 2 * sizeof(unsigned long) is required because the stack pointer
   points at the frame pointer. The layout on the stack is: ... IP FP
   ... IP FP. So we need to make sure that both IP and FP are in the
   bounds.

Fix the bound checks and get rid of the mix of numeric constants, u64
and unsigned long. Making all unsigned long allows us to use the same
function for 32bit as well.

Use READ_ONCE() when accessing the stack. This does not prevent a
concurrent wakeup of the task and the stack changing, but at least it
avoids TOCTOU.

Also check task state at the end of the loop. Again that does not
prevent concurrent changes, but it avoids walking for nothing.

Add proper comments while at it.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Based-on-patch-from: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: kasan-dev <kasan-dev@googlegroups.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
Link: http://lkml.kernel.org/r/20150930083302.694788319@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.2:
 - s/READ_ONCE/ACCESS_ONCE/
 - Remove use of TOP_OF_KERNEL_STACK_PADDING, not defined here and would
   be defined as 0]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 5311d93d0d33ae878d5fbb35ea5693b9c813ba04)

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agomodule: Fix locking in symbol_put_addr()
Peter Zijlstra [Thu, 20 Aug 2015 01:04:59 +0000 (10:34 +0930)]
module: Fix locking in symbol_put_addr()

commit 275d7d44d802ef271a42dc87ac091a495ba72fc5 upstream.

Poma (on the way to another bug) reported an assertion triggering:

  [<ffffffff81150529>] module_assert_mutex_or_preempt+0x49/0x90
  [<ffffffff81150822>] __module_address+0x32/0x150
  [<ffffffff81150956>] __module_text_address+0x16/0x70
  [<ffffffff81150f19>] symbol_put_addr+0x29/0x40
  [<ffffffffa04b77ad>] dvb_frontend_detach+0x7d/0x90 [dvb_core]

Laura Abbott <labbott@redhat.com> produced a patch which lead us to
inspect symbol_put_addr(). This function has a comment claiming it
doesn't need to disable preemption around the module lookup
because it holds a reference to the module it wants to find, which
therefore cannot go away.

This is wrong (and a false optimization too, preempt_disable() is really
rather cheap, and I doubt any of this is on uber critical paths,
otherwise it would've retained a pointer to the actual module anyway and
avoided the second lookup).

While its true that the module cannot go away while we hold a reference
on it, the data structure we do the lookup in very much _CAN_ change
while we do the lookup. Therefore fix the comment and add the
required preempt_disable().

Reported-by: poma <pomidorabelisima@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: a6e6abd575fc ("module: remove module_text_address()")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 3895ff2d13043ecd091813b67a485ec487870b63)

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agonet: add length argument to skb_copy_and_csum_datagram_iovec
Sabrina Dubroca [Thu, 15 Oct 2015 12:25:03 +0000 (14:25 +0200)]
net: add length argument to skb_copy_and_csum_datagram_iovec

Without this length argument, we can read past the end of the iovec in
memcpy_toiovec because we have no way of knowing the total length of the
iovec's buffers.

This is needed for stable kernels where 89c22d8c3b27 ("net: Fix skb
csum races when peeking") has been backported but that don't have the
ioviter conversion, which is almost all the stable trees <= 3.18.

This also fixes a kernel crash for NFS servers when the client uses
 -onfsvers=3,proto=udp to mount the export.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
[bwh: Backported to 3.2: adjust context in include/linux/skbuff.h]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 127500d724f8c43f452610c9080444eedb5eaa6c)

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agonet: Fix skb csum races when peeking
Herbert Xu [Mon, 13 Jul 2015 12:01:42 +0000 (20:01 +0800)]
net: Fix skb csum races when peeking

[ Upstream commit 89c22d8c3b278212eef6a8cc66b570bc840a6f5a ]

When we calculate the checksum on the recv path, we store the
result in the skb as an optimisation in case we need the checksum
again down the line.

This is in fact bogus for the MSG_PEEK case as this is done without
any locking.  So multiple threads can peek and then store the result
to the same skb, potentially resulting in bogus skb states.

This patch fixes this by only storing the result if the skb is not
shared.  This preserves the optimisations for the few cases where
it can be done safely due to locking or other reasons, e.g., SIOCINQ.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 58a5897a53d535bf95523e6f381f88116217f5ca)

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoRDS: verify the underlying transport exists before creating a connection
Sasha Levin [Tue, 8 Sep 2015 14:53:40 +0000 (10:53 -0400)]
RDS: verify the underlying transport exists before creating a connection

commit 74e98eb085889b0d2d4908f59f6e00026063014f upstream.

There was no verification that an underlying transport exists when creating
a connection, this would cause dereferencing a NULL ptr.

It might happen on sockets that weren't properly bound before attempting to
send a message, which will cause a NULL ptr deref:

[135546.047719] kasan: GPF could be caused by NULL-ptr deref or user memory accessgeneral protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[135546.051270] Modules linked in:
[135546.051781] CPU: 4 PID: 15650 Comm: trinity-c4 Not tainted 4.2.0-next-20150902-sasha-00041-gbaa1222-dirty #2527
[135546.053217] task: ffff8800835bc000 ti: ffff8800bc708000 task.ti: ffff8800bc708000
[135546.054291] RIP: __rds_conn_create (net/rds/connection.c:194)
[135546.055666] RSP: 0018:ffff8800bc70fab0  EFLAGS: 00010202
[135546.056457] RAX: dffffc0000000000 RBX: 0000000000000f2c RCX: ffff8800835bc000
[135546.057494] RDX: 0000000000000007 RSI: ffff8800835bccd8 RDI: 0000000000000038
[135546.058530] RBP: ffff8800bc70fb18 R08: 0000000000000001 R09: 0000000000000000
[135546.059556] R10: ffffed014d7a3a23 R11: ffffed014d7a3a21 R12: 0000000000000000
[135546.060614] R13: 0000000000000001 R14: ffff8801ec3d0000 R15: 0000000000000000
[135546.061668] FS:  00007faad4ffb700(0000) GS:ffff880252000000(0000) knlGS:0000000000000000
[135546.062836] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[135546.063682] CR2: 000000000000846a CR3: 000000009d137000 CR4: 00000000000006a0
[135546.064723] Stack:
[135546.065048]  ffffffffafe2055c ffffffffafe23fc1 ffffed00493097bf ffff8801ec3d0008
[135546.066247]  0000000000000000 00000000000000d0 0000000000000000 ac194a24c0586342
[135546.067438]  1ffff100178e1f78 ffff880320581b00 ffff8800bc70fdd0 ffff880320581b00
[135546.068629] Call Trace:
[135546.069028] ? __rds_conn_create (include/linux/rcupdate.h:856 net/rds/connection.c:134)
[135546.069989] ? rds_message_copy_from_user (net/rds/message.c:298)
[135546.071021] rds_conn_create_outgoing (net/rds/connection.c:278)
[135546.071981] rds_sendmsg (net/rds/send.c:1058)
[135546.072858] ? perf_trace_lock (include/trace/events/lock.h:38)
[135546.073744] ? lockdep_init (kernel/locking/lockdep.c:3298)
[135546.074577] ? rds_send_drop_to (net/rds/send.c:976)
[135546.075508] ? __might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3795)
[135546.076349] ? __might_fault (mm/memory.c:3795)
[135546.077179] ? rds_send_drop_to (net/rds/send.c:976)
[135546.078114] sock_sendmsg (net/socket.c:611 net/socket.c:620)
[135546.078856] SYSC_sendto (net/socket.c:1657)
[135546.079596] ? SYSC_connect (net/socket.c:1628)
[135546.080510] ? trace_dump_stack (kernel/trace/trace.c:1926)
[135546.081397] ? ring_buffer_unlock_commit (kernel/trace/ring_buffer.c:2479 kernel/trace/ring_buffer.c:2558 kernel/trace/ring_buffer.c:2674)
[135546.082390] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
[135546.083410] ? trace_event_raw_event_sys_enter (include/trace/events/syscalls.h:16)
[135546.084481] ? do_audit_syscall_entry (include/trace/events/syscalls.h:16)
[135546.085438] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
[135546.085515] rds_ib_laddr_check(): addr 36.74.25.172 ret -99 node type -1

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 987ad6eef35223b149baf453171b74917c372cbc)

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agox86/paravirt: Replace the paravirt nop with a bona fide empty function
Andy Lutomirski [Sun, 20 Sep 2015 23:32:04 +0000 (16:32 -0700)]
x86/paravirt: Replace the paravirt nop with a bona fide empty function

commit fc57a7c68020dcf954428869eafd934c0ab1536f upstream.

PARAVIRT_ADJUST_EXCEPTION_FRAME generates this code (using nmi as an
example, trimmed for readability):

    ff 15 00 00 00 00       callq  *0x0(%rip)        # 2796 <nmi+0x6>
              2792: R_X86_64_PC32     pv_irq_ops+0x2c

That's a call through a function pointer to regular C function that
does nothing on native boots, but that function isn't protected
against kprobes, isn't marked notrace, and is certainly not
guaranteed to preserve any registers if the compiler is feeling
perverse.  This is bad news for a CLBR_NONE operation.

Of course, if everything works correctly, once paravirt ops are
patched, it gets nopped out, but what if we hit this code before
paravirt ops are patched in?  This can potentially cause breakage
that is very difficult to debug.

A more subtle failure is possible here, too: if _paravirt_nop uses
the stack at all (even just to push RBP), it will overwrite the "NMI
executing" variable if it's called in the NMI prologue.

The Xen case, perhaps surprisingly, is fine, because it's already
written in asm.

Fix all of the cases that default to paravirt_nop (including
adjust_exception_frame) with a big hammer: replace paravirt_nop with
an asm function that is just a ret instruction.

The Xen case may have other problems, so document them.

This is part of a fix for some random crashes that Sasha saw.

Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Link: http://lkml.kernel.org/r/8f5d2ba295f9d73751c33d97fda03e0495d9ade0.1442791737.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 81fbc9a5dd000126ef727dcdaea3ef5714d1e898)

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agohfs: fix B-tree corruption after insertion at position 0
Hin-Tak Leung [Wed, 9 Sep 2015 22:38:07 +0000 (15:38 -0700)]
hfs: fix B-tree corruption after insertion at position 0

commit b4cc0efea4f0bfa2477c56af406cfcf3d3e58680 upstream.

Fix B-tree corruption when a new record is inserted at position 0 in the
node in hfs_brec_insert().

This is an identical change to the corresponding hfs b-tree code to Sergei
Antonov's "hfsplus: fix B-tree corruption after insertion at position 0",
to keep similar code paths in the hfs and hfsplus drivers in sync, where
appropriate.

Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Sergei Antonov <saproj@gmail.com>
Cc: Joe Perches <joe@perches.com>
Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Anton Altaparmakov <anton@tuxera.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit d46a34909db526644edf8ae62058d8371dd5f7e9)

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agohfs,hfsplus: cache pages correctly between bnode_create and bnode_free
Hin-Tak Leung [Wed, 9 Sep 2015 22:38:04 +0000 (15:38 -0700)]
hfs,hfsplus: cache pages correctly between bnode_create and bnode_free

commit 7cb74be6fd827e314f81df3c5889b87e4c87c569 upstream.

Pages looked up by __hfs_bnode_create() (called by hfs_bnode_create() and
hfs_bnode_find() for finding or creating pages corresponding to an inode)
are immediately kmap()'ed and used (both read and write) and kunmap()'ed,
and should not be page_cache_release()'ed until hfs_bnode_free().

This patch fixes a problem I first saw in July 2012: merely running "du"
on a large hfsplus-mounted directory a few times on a reasonably loaded
system would get the hfsplus driver all confused and complaining about
B-tree inconsistencies, and generates a "BUG: Bad page state".  Most
recently, I can generate this problem on up-to-date Fedora 22 with shipped
kernel 4.0.5, by running "du /" (="/" + "/home" + "/mnt" + other smaller
mounts) and "du /mnt" simultaneously on two windows, where /mnt is a
lightly-used QEMU VM image of the full Mac OS X 10.9:

$ df -i / /home /mnt
Filesystem                  Inodes   IUsed      IFree IUse% Mounted on
/dev/mapper/fedora-root    3276800  551665    2725135   17% /
/dev/mapper/fedora-home   52879360  716221   52163139    2% /home
/dev/nbd0p2             4294967295 1387818 4293579477    1% /mnt

After applying the patch, I was able to run "du /" (60+ times) and "du
/mnt" (150+ times) continuously and simultaneously for 6+ hours.

There are many reports of the hfsplus driver getting confused under load
and generating "BUG: Bad page state" or other similar issues over the
years.  [1]

The unpatched code [2] has always been wrong since it entered the kernel
tree.  The only reason why it gets away with it is that the
kmap/memcpy/kunmap follow very quickly after the page_cache_release() so
the kernel has not had a chance to reuse the memory for something else,
most of the time.

The current RW driver appears to have followed the design and development
of the earlier read-only hfsplus driver [3], where-by version 0.1 (Dec
2001) had a B-tree node-centric approach to
read_cache_page()/page_cache_release() per bnode_get()/bnode_put(),
migrating towards version 0.2 (June 2002) of caching and releasing pages
per inode extents.  When the current RW code first entered the kernel [2]
in 2005, there was an REF_PAGES conditional (and "//" commented out code)
to switch between B-node centric paging to inode-centric paging.  There
was a mistake with the direction of one of the REF_PAGES conditionals in
__hfs_bnode_create().  In a subsequent "remove debug code" commit [4], the
read_cache_page()/page_cache_release() per bnode_get()/bnode_put() were
removed, but a page_cache_release() was mistakenly left in (propagating
the "REF_PAGES <-> !REF_PAGE" mistake), and the commented-out
page_cache_release() in bnode_release() (which should be spanned by
!REF_PAGES) was never enabled.

References:
[1]:
Michael Fox, Apr 2013
http://www.spinics.net/lists/linux-fsdevel/msg63807.html
("hfsplus volume suddenly inaccessable after 'hfs: recoff %d too large'")

Sasha Levin, Feb 2015
http://lkml.org/lkml/2015/2/20/85 ("use after free")

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/740814
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1027887
https://bugzilla.kernel.org/show_bug.cgi?id=42342
https://bugzilla.kernel.org/show_bug.cgi?id=63841
https://bugzilla.kernel.org/show_bug.cgi?id=78761

[2]:
http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
fs/hfs/bnode.c?id=d1081202f1d0ee35ab0beb490da4b65d4bc763db
commit d1081202f1d0ee35ab0beb490da4b65d4bc763db
Author: Andrew Morton <akpm@osdl.org>
Date:   Wed Feb 25 16:17:36 2004 -0800

    [PATCH] HFS rewrite

http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
fs/hfsplus/bnode.c?id=91556682e0bf004d98a529bf829d339abb98bbbd

commit 91556682e0bf004d98a529bf829d339abb98bbbd
Author: Andrew Morton <akpm@osdl.org>
Date:   Wed Feb 25 16:17:48 2004 -0800

    [PATCH] HFS+ support

[3]:
http://sourceforge.net/projects/linux-hfsplus/

http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/
http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/

http://linux-hfsplus.cvs.sourceforge.net/viewvc/linux-hfsplus/linux/\
fs/hfsplus/bnode.c?r1=1.4&r2=1.5

Date:   Thu Jun 6 09:45:14 2002 +0000
Use buffer cache instead of page cache in bnode.c. Cache inode extents.

[4]:
http://git.kernel.org/cgit/linux/kernel/git/\
stable/linux-stable.git/commit/?id=a5e3985fa014029eb6795664c704953720cc7f7d

commit a5e3985fa014029eb6795664c704953720cc7f7d
Author: Roman Zippel <zippel@linux-m68k.org>
Date:   Tue Sep 6 15:18:47 2005 -0700

[PATCH] hfs: remove debug code

Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Sougata Santra <sougata@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit dd04e674cde34f570509b9e2a6b549af89897640)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agopagemap: hide physical addresses from non-privileged users
Konstantin Khlebnikov [Tue, 8 Sep 2015 22:00:07 +0000 (15:00 -0700)]
pagemap: hide physical addresses from non-privileged users

commit 1c90308e7a77af6742a97d1021cca923b23b7f0d upstream.

This patch makes pagemap readable for normal users and hides physical
addresses from them.  For some use-cases PFN isn't required at all.

See http://lkml.kernel.org/r/1425935472-17949-1-git-send-email-kirill@shutemov.name

Fixes: ab676b7d6fbf ("pagemap: do not leak physical addresses to non-privileged userspace")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Mark Williamson <mwilliamson@undo-software.com>
Tested-by: Mark Williamson <mwilliamson@undo-software.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
 - Add the same check in the places where we look up a PFN
 - Add struct pagemapread * parameters where necessary
 - Open-code file_ns_capable()
 - Delete pagemap_open() entirely, as it would always return 0]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit b1fb185f26e85f76e3ac6ce557398d78797c9684)
[wt: adjusted context, no pagemap_hugetlb_range() in 2.6.32, needs
     cred argument to security_capable(), tested OK ]
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agosecurity: add cred argument to security_capable()
Chris Wright [Thu, 10 Feb 2011 06:11:51 +0000 (22:11 -0800)]
security: add cred argument to security_capable()

commit 6037b715d6fab139742c3df8851db4c823081561 upstream.

Expand security_capable() to include cred, so that it can be usable in a
wider range of call sites.

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: James Morris <jmorris@namei.org>
[wt: needed by next patch only]
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoInput: evdev - do not report errors form flush()
Takashi Iwai [Fri, 4 Sep 2015 05:20:00 +0000 (22:20 -0700)]
Input: evdev - do not report errors form flush()

commit eb38f3a4f6e86f8bb10a3217ebd85ecc5d763aae upstream.

We've got bug reports showing the old systemd-logind (at least
system-210) aborting unexpectedly, and this turned out to be because
of an invalid error code from close() call to evdev devices.  close()
is supposed to return only either EINTR or EBADFD, while the device
returned ENODEV.  logind was overreacting to it and decided to kill
itself when an unexpected error code was received.  What a tragedy.

The bad error code comes from flush fops, and actually evdev_flush()
returns ENODEV when device is disconnected or client's access to it is
revoked. But in these cases the fact that flush did not actually happen is
not an error, but rather normal behavior. For non-disconnected devices
result of flush is also not that interesting as there is no potential of
data loss and even if it fails application has no way of handling the
error. Because of that we are better off always returning success from
evdev_flush().

Also returning EINTR from flush()/close() is discouraged (as it is not
clear how application should handle this error), so let's stop taking
evdev->mutex interruptibly.

Bugzilla: http://bugzilla.suse.com/show_bug.cgi?id=939834
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
[bwh: Backported to 3.2: there's no revoked flag to test]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit a6706174cfe9fa100651b5012aec9796006a884b)

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoSUNRPC: xs_reset_transport must mark the connection as disconnected
Trond Myklebust [Sat, 29 Aug 2015 20:36:30 +0000 (13:36 -0700)]
SUNRPC: xs_reset_transport must mark the connection as disconnected

commit 0c78789e3a030615c6650fde89546cadf40ec2cc upstream.

In case the reconnection attempt fails.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
[bwh: Backported to 3.2: add local variable xprt]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 9434e4855e90d2af3751cd93b47b4a3e40bc2dc1)

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoxfs: Fix xfs_attr_leafblock definition
Jan Kara [Wed, 19 Aug 2015 00:34:32 +0000 (10:34 +1000)]
xfs: Fix xfs_attr_leafblock definition

commit ffeecc5213024ae663377b442eedcfbacf6d0c5d upstream.

struct xfs_attr_leafblock contains 'entries' array which is declared
with size 1 altough it can in fact contain much more entries. Since this
array is followed by further struct members, gcc (at least in version
4.8.3) thinks that the array has the fixed size of 1 element and thus
may optimize away all accesses beyond the end of array resulting in
non-working code. This problem was only observed with userspace code in
xfsprogs, however it's better to be safe in kernel as well and have
matching kernel and xfsprogs definitions.

Signed-off-by: Jan Kara <jack@suse.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 86cbc0072fa4fc7906dd8abfa6489638014300bb)

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agowindfarm: decrement client count when unregistering
Paul Bolle [Fri, 31 Jul 2015 12:08:58 +0000 (14:08 +0200)]
windfarm: decrement client count when unregistering

commit fe2b592173ff0274e70dc44d1d28c19bb995aa7c upstream.

wf_unregister_client() increments the client count when a client
unregisters. That is obviously incorrect. Decrement that client count
instead.

Fixes: 75722d3992f5 ("[PATCH] ppc64: Thermal control for SMU based machines")
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 48c46d4aed1c27f363e4673e988212340e82e1bb)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agodevres: fix devres_get()
Masahiro Yamada [Wed, 15 Jul 2015 01:29:00 +0000 (10:29 +0900)]
devres: fix devres_get()

commit 64526370d11ce8868ca495723d595b61e8697fbf upstream.

Currently, devres_get() passes devres_free() the pointer to devres,
but devres_free() should be given with the pointer to resource data.

Fixes: 9ac7849e35f7 ("devres: device resource management")
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit ebc0ae5a7159086a992f7904dc9ad849a13eecfc)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits
Herton R. Krzesinski [Fri, 14 Aug 2015 22:35:02 +0000 (15:35 -0700)]
ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits

commit 602b8593d2b4138c10e922eeaafe306f6b51817b upstream.

The current semaphore code allows a potential use after free: in
exit_sem we may free the task's sem_undo_list while there is still
another task looping through the same semaphore set and cleaning the
sem_undo list at freeary function (the task called IPC_RMID for the same
semaphore set).

For example, with a test program [1] running which keeps forking a lot
of processes (which then do a semop call with SEM_UNDO flag), and with
the parent right after removing the semaphore set with IPC_RMID, and a
kernel built with CONFIG_SLAB, CONFIG_SLAB_DEBUG and
CONFIG_DEBUG_SPINLOCK, you can easily see something like the following
in the kernel log:

   Slab corruption (Not tainted): kmalloc-64 start=ffff88003b45c1c0, len=64
   000: 6b 6b 6b 6b 6b 6b 6b 6b 00 6b 6b 6b 6b 6b 6b 6b  kkkkkkkk.kkkkkkk
   010: ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff  ....kkkk........
   Prev obj: start=ffff88003b45c180, len=64
   000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
   010: ff ff ff ff ff ff ff ff c0 fb 01 37 00 88 ff ff  ...........7....
   Next obj: start=ffff88003b45c200, len=64
   000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
   010: ff ff ff ff ff ff ff ff 68 29 a7 3c 00 88 ff ff  ........h).<....
   BUG: spinlock wrong CPU on CPU#2, test/18028
   general protection fault: 0000 [#1] SMP
   Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
   CPU: 2 PID: 18028 Comm: test Not tainted 4.2.0-rc5+ #1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
   RIP: spin_dump+0x53/0xc0
   Call Trace:
     spin_bug+0x30/0x40
     do_raw_spin_unlock+0x71/0xa0
     _raw_spin_unlock+0xe/0x10
     freeary+0x82/0x2a0
     ? _raw_spin_lock+0xe/0x10
     semctl_down.clone.0+0xce/0x160
     ? __do_page_fault+0x19a/0x430
     ? __audit_syscall_entry+0xa8/0x100
     SyS_semctl+0x236/0x2c0
     ? syscall_trace_leave+0xde/0x130
     entry_SYSCALL_64_fastpath+0x12/0x71
   Code: 8b 80 88 03 00 00 48 8d 88 60 05 00 00 48 c7 c7 a0 2c a4 81 31 c0 65 8b 15 eb 40 f3 7e e8 08 31 68 00 4d 85 e4 44 8b 4b 08 74 5e <45> 8b 84 24 88 03 00 00 49 8d 8c 24 60 05 00 00 8b 53 04 48 89
   RIP  [<ffffffff810d6053>] spin_dump+0x53/0xc0
    RSP <ffff88003750fd68>
   ---[ end trace 783ebb76612867a0 ]---
   NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [test:18053]
   Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
   CPU: 3 PID: 18053 Comm: test Tainted: G      D         4.2.0-rc5+ #1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
   RIP: native_read_tsc+0x0/0x20
   Call Trace:
     ? delay_tsc+0x40/0x70
     __delay+0xf/0x20
     do_raw_spin_lock+0x96/0x140
     _raw_spin_lock+0xe/0x10
     sem_lock_and_putref+0x11/0x70
     SYSC_semtimedop+0x7bf/0x960
     ? handle_mm_fault+0xbf6/0x1880
     ? dequeue_task_fair+0x79/0x4a0
     ? __do_page_fault+0x19a/0x430
     ? kfree_debugcheck+0x16/0x40
     ? __do_page_fault+0x19a/0x430
     ? __audit_syscall_entry+0xa8/0x100
     ? do_audit_syscall_entry+0x66/0x70
     ? syscall_trace_enter_phase1+0x139/0x160
     SyS_semtimedop+0xe/0x10
     SyS_semop+0x10/0x20
     entry_SYSCALL_64_fastpath+0x12/0x71
   Code: 47 10 83 e8 01 85 c0 89 47 10 75 08 65 48 89 3d 1f 74 ff 7e c9 c3 0f 1f 44 00 00 55 48 89 e5 e8 87 17 04 00 66 90 c9 c3 0f 1f 00 <55> 48 89 e5 0f 31 89 c1 48 89 d0 48 c1 e0 20 89 c9 48 09 c8 c9
   Kernel panic - not syncing: softlockup: hung tasks

I wasn't able to trigger any badness on a recent kernel without the
proper config debugs enabled, however I have softlockup reports on some
kernel versions, in the semaphore code, which are similar as above (the
scenario is seen on some servers running IBM DB2 which uses semaphore
syscalls).

The patch here fixes the race against freeary, by acquiring or waiting
on the sem_undo_list lock as necessary (exit_sem can race with freeary,
while freeary sets un->semid to -1 and removes the same sem_undo from
list_proc or when it removes the last sem_undo).

After the patch I'm unable to reproduce the problem using the test case
[1].

[1] Test case used below:

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>
    #include <sys/wait.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>
    #include <errno.h>

    #define NSEM 1
    #define NSET 5

    int sid[NSET];

    void thread()
    {
            struct sembuf op;
            int s;
            uid_t pid = getuid();

            s = rand() % NSET;
            op.sem_num = pid % NSEM;
            op.sem_op = 1;
            op.sem_flg = SEM_UNDO;

            semop(sid[s], &op, 1);
            exit(EXIT_SUCCESS);
    }

    void create_set()
    {
            int i, j;
            pid_t p;
            union {
                    int val;
                    struct semid_ds *buf;
                    unsigned short int *array;
                    struct seminfo *__buf;
            } un;

            /* Create and initialize semaphore set */
            for (i = 0; i < NSET; i++) {
                    sid[i] = semget(IPC_PRIVATE , NSEM, 0644 | IPC_CREAT);
                    if (sid[i] < 0) {
                            perror("semget");
                            exit(EXIT_FAILURE);
                    }
            }
            un.val = 0;
            for (i = 0; i < NSET; i++) {
                    for (j = 0; j < NSEM; j++) {
                            if (semctl(sid[i], j, SETVAL, un) < 0)
                                    perror("semctl");
                    }
            }

            /* Launch threads that operate on semaphore set */
            for (i = 0; i < NSEM * NSET * NSET; i++) {
                    p = fork();
                    if (p < 0)
                            perror("fork");
                    if (p == 0)
                            thread();
            }

            /* Free semaphore set */
            for (i = 0; i < NSET; i++) {
                    if (semctl(sid[i], NSEM, IPC_RMID))
                            perror("IPC_RMID");
            }

            /* Wait for forked processes to exit */
            while (wait(NULL)) {
                    if (errno == ECHILD)
                            break;
            };
    }

    int main(int argc, char **argv)
    {
            pid_t p;

            srand(time(NULL));

            while (1) {
                    p = fork();
                    if (p < 0) {
                            perror("fork");
                            exit(EXIT_FAILURE);
                    }
                    if (p == 0) {
                            create_set();
                            goto end;
                    }

                    /* Wait for forked processes to exit */
                    while (wait(NULL)) {
                            if (errno == ECHILD)
                                    break;
                    };
            }
    end:
            return 0;
    }

[akpm@linux-foundation.org: use normal comment layout]
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Rafael Aquini <aquini@redhat.com>
CC: Aristeu Rozanski <aris@redhat.com>
Cc: David Jeffery <djeffery@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit a1c4fb80c5d94ef61a77c1e891cae616a50d8d3c)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agonet: Fix skb_set_peeked use-after-free bug
Herbert Xu [Tue, 4 Aug 2015 07:42:47 +0000 (15:42 +0800)]
net: Fix skb_set_peeked use-after-free bug

commit a0a2a6602496a45ae838a96db8b8173794b5d398 upstream.

The commit 738ac1ebb96d02e0d23bc320302a6ea94c612dec ("net: Clone
skb before setting peeked flag") introduced a use-after-free bug
in skb_recv_datagram.  This is because skb_set_peeked may create
a new skb and free the existing one.  As it stands the caller will
continue to use the old freed skb.

This patch fixes it by making skb_set_peeked return the new skb
(or the old one if unchanged).

Fixes: 738ac1ebb96d ("net: Clone skb before setting peeked flag")
Reported-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Brenden Blanco <bblanco@plumgrid.com>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit e553622ccb6b6e06079f980f55cf04128db3420c)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agonet: Clone skb before setting peeked flag
Herbert Xu [Mon, 13 Jul 2015 08:04:13 +0000 (16:04 +0800)]
net: Clone skb before setting peeked flag

commit 738ac1ebb96d02e0d23bc320302a6ea94c612dec upstream.

Shared skbs must not be modified and this is crucial for broadcast
and/or multicast paths where we use it as an optimisation to avoid
unnecessary cloning.

The function skb_recv_datagram breaks this rule by setting peeked
without cloning the skb first.  This causes funky races which leads
to double-free.

This patch fixes this by cloning the skb and replacing the skb
in the list when setting skb->peeked.

Fixes: a59322be07c9 ("[UDP]: Only increment counter on first peek/recv")
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 72e6f0680249f5e0a87f2b282d033baefd90d84e)
[wt: adjusted context for 2.6.32. Introduces a bug, see next commit]
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agords: fix an integer overflow test in rds_info_getsockopt()
Dan Carpenter [Sat, 1 Aug 2015 12:33:26 +0000 (15:33 +0300)]
rds: fix an integer overflow test in rds_info_getsockopt()

commit 468b732b6f76b138c0926eadf38ac88467dcd271 upstream.

"len" is a signed integer.  We check that len is not negative, so it
goes from zero to INT_MAX.  PAGE_SIZE is unsigned long so the comparison
is type promoted to unsigned long.  ULONG_MAX - 4095 is a higher than
INT_MAX so the condition can never be true.

I don't know if this is harmful but it seems safe to limit "len" to
INT_MAX - 4095.

Fixes: a8c879a7ee98 ('RDS: Info and stats')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit f3a66bdc88c3261f55b942453476e623056b92d9)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoxhci: fix off by one error in TRB DMA address boundary check
Mathias Nyman [Mon, 3 Aug 2015 13:07:48 +0000 (16:07 +0300)]
xhci: fix off by one error in TRB DMA address boundary check

commit 7895086afde2a05fa24a0e410d8e6b75ca7c8fdd upstream.

We need to check that a TRB is part of the current segment
before calculating its DMA address.

Previously a ring segment didn't use a full memory page, and every
new ring segment got a new memory page, so the off by one
error in checking the upper bound was never seen.

Now that we use a full memory page, 256 TRBs (4096 bytes), the off by one
didn't catch the case when a TRB was the first element of the next segment.

This is triggered if the virtual memory pages for a ring segment are
next to each in increasing order where the ring buffer wraps around and
causes errors like:

[  106.398223] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
[  106.398230] xhci_hcd 0000:00:14.0: Looking for event-dma fffd3000 trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 seg-end fffd4ff0

The trb-end address is one outside the end-seg address.

Tested-by: Arkadiusz MiÅ\9bkiewicz <arekm@maven.pl>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 6e3ae6256145b5597bee0296eb5fc384cd86aa3d)
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoInitialize msg/shm IPC objects before doing ipc_addid()
Linus Torvalds [Wed, 30 Sep 2015 16:48:40 +0000 (12:48 -0400)]
Initialize msg/shm IPC objects before doing ipc_addid()

commit b9a532277938798b53178d5a66af6e2915cb27cf upstream.

As reported by Dmitry Vyukov, we really shouldn't do ipc_addid() before
having initialized the IPC object state.  Yes, we initialize the IPC
object in a locked state, but with all the lockless RCU lookup work,
that IPC object lock no longer means that the state cannot be seen.

We already did this for the IPC semaphore code (see commit e8577d1f0329:
"ipc/sem.c: fully initialize sem_array before making it visible") but we
clearly forgot about msg and shm.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 2.6.32:
 - Adjust context
 - The error path being moved looks a little different]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoipc/sem.c: fully initialize sem_array before making it visible
Manfred Spraul [Tue, 2 Dec 2014 23:59:34 +0000 (15:59 -0800)]
ipc/sem.c: fully initialize sem_array before making it visible

commit e8577d1f0329d4842e8302e289fb2c22156abef4 upstream.

ipc_addid() makes a new ipc identifier visible to everyone.  New objects
start as locked, so that the caller can complete the initialization
after the call.  Within struct sem_array, at least sma->sem_base and
sma->sem_nsems are accessed without any locks, therefore this approach
doesn't work.

Thus: Move the ipc_addid() to the end of the initialization.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reported-by: Rik van Riel <riel@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 2.6.32:
 - Adjust context
 - The error path being moved looks a little different]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoUSB: whiteheat: fix potential null-deref at probe
Johan Hovold [Wed, 23 Sep 2015 18:41:42 +0000 (11:41 -0700)]
USB: whiteheat: fix potential null-deref at probe

commit cbb4be652d374f64661137756b8f357a1827d6a4 upstream.

Fix potential null-pointer dereference at probe by making sure that the
required endpoints are present.

The whiteheat driver assumes there are at least five pairs of bulk
endpoints, of which the final pair is used for the "command port". An
attempt to bind to an interface with fewer bulk endpoints would
currently lead to an oops.

Fixes CVE-2015-5257.

Reported-by: Moein Ghasemzadeh <moein@istuary.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agovirtio-net: drop NETIF_F_FRAGLIST
Jason Wang [Wed, 5 Aug 2015 02:34:04 +0000 (10:34 +0800)]
virtio-net: drop NETIF_F_FRAGLIST

commit 48900cb6af4282fa0fb6ff4d72a81aa3dadb5c39 upstream.

virtio declares support for NETIF_F_FRAGLIST, but assumes
that there are at most MAX_SKB_FRAGS + 2 fragments which isn't
always true with a fraglist.

A longer fraglist in the skb will make the call to skb_to_sgvec overflow
the sg array, leading to memory corruption.

Drop NETIF_F_FRAGLIST so we only get what we can handle.

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 2.6.32: there's only a single features field]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoipv6: addrconf: validate new MTU before applying it
Marcelo Leitner [Mon, 23 Feb 2015 14:17:13 +0000 (11:17 -0300)]
ipv6: addrconf: validate new MTU before applying it

commit 77751427a1ff25b27d47a4c36b12c3c8667855ac upstream.

Currently we don't check if the new MTU is valid or not and this allows
one to configure a smaller than minimum allowed by RFCs or even bigger
than interface own MTU, which is a problem as it may lead to packet
drops.

If you have a daemon like NetworkManager running, this may be exploited
by remote attackers by forging RA packets with an invalid MTU, possibly
leading to a DoS. (NetworkManager currently only validates for values
too small, but not for too big ones.)

The fix is just to make sure the new value is valid. That is, between
IPV6_MIN_MTU and interface's MTU.

Note that similar check is already performed at
ndisc_router_discovery(), for when kernel itself parses the RA.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 2.6.32:
 - Add a strategy for the sysctl as we don't get a default strategy
 - Adjust context, spacing]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agomd: use kzalloc() when bitmap is disabled
Benjamin Randazzo [Sat, 25 Jul 2015 14:36:50 +0000 (16:36 +0200)]
md: use kzalloc() when bitmap is disabled

commit b6878d9e03043695dbf3fa1caa6dfc09db225b16 upstream.

In drivers/md/md.c get_bitmap_file() uses kmalloc() for creating a
mdu_bitmap_file_t called "file".

5769         file = kmalloc(sizeof(*file), GFP_NOIO);
5770         if (!file)
5771                 return -ENOMEM;

This structure is copied to user space at the end of the function.

5786         if (err == 0 &&
5787             copy_to_user(arg, file, sizeof(*file)))
5788                 err = -EFAULT

But if bitmap is disabled only the first byte of "file" is initialized
with zero, so it's possible to read some bytes (up to 4095) of kernel
space memory from user space. This is an information leak.

5775         /* bitmap disabled, zero the first byte and copy out */
5776         if (!mddev->bitmap_info.file)
5777                 file->pathname[0] = '\0';

Signed-off-by: Benjamin Randazzo <benjamin@randazzo.fr>
Signed-off-by: NeilBrown <neilb@suse.com>
[bwh: Backported to 2.6.32: patch both possible allocation calls]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoFailing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount
Olga Kornievskaia [Mon, 14 Sep 2015 23:54:36 +0000 (19:54 -0400)]
Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount

commit a41cbe86df3afbc82311a1640e20858c0cd7e065 upstream.

A test case is as the description says:
open(foobar, O_WRONLY);
sleep()  --> reboot the server
close(foobar)

The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
line before going to restart, there is
clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).

NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open
owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the
value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
out state and when we go to close it, â\80\9ccall_closeâ\80\9d doesnâ\80\99t get set as
state flag is not set and CLOSE doesnâ\80\99t go on the wire.

Signed-off-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agovfs: Test for and handle paths that are unreachable from their mnt_root
Eric W. Biederman [Sun, 16 Aug 2015 01:27:13 +0000 (20:27 -0500)]
vfs: Test for and handle paths that are unreachable from their mnt_root

commit 397d425dc26da728396e66d392d5dcb8dac30c37 upstream.

In rare cases a directory can be renamed out from under a bind mount.
In those cases without special handling it becomes possible to walk up
the directory tree to the root dentry of the filesystem and down
from the root dentry to every other file or directory on the filesystem.

Like division by zero .. from an unconnected path can not be given
a useful semantic as there is no predicting at which path component
the code will realize it is unconnected.  We certainly can not match
the current behavior as the current behavior is a security hole.

Therefore when encounting .. when following an unconnected path
return -ENOENT.

- Add a function path_connected to verify path->dentry is reachable
  from path->mnt.mnt_root.  AKA to validate that rename did not do
  something nasty to the bind mount.

  To avoid races path_connected must be called after following a path
  component to it's next path component.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agodcache: Handle escaped paths in prepend_path
Eric W. Biederman [Sat, 15 Aug 2015 18:36:12 +0000 (13:36 -0500)]
dcache: Handle escaped paths in prepend_path

commit cde93be45a8a90d8c264c776fab63487b5038a65 upstream.

A rename can result in a dentry that by walking up d_parent
will never reach it's mnt_root.  For lack of a better term
I call this an escaped path.

prepend_path is called by four different functions __d_path,
d_absolute_path, d_path, and getcwd.

__d_path only wants to see paths are connected to the root it passes
in.  So __d_path needs prepend_path to return an error.

d_absolute_path similarly wants to see paths that are connected to
some root.  Escaped paths are not connected to any mnt_root so
d_absolute_path needs prepend_path to return an error greater
than 1.  So escaped paths will be treated like paths on lazily
unmounted mounts.

getcwd needs to prepend "(unreachable)" so getcwd also needs
prepend_path to return an error.

d_path is the interesting hold out.  d_path just wants to print
something, and does not care about the weird cases.  Which raises
the question what should be printed?

Given that <escaped_path>/<anything> should result in -ENOENT I
believe it is desirable for escaped paths to be printed as empty
paths.  As there are not really any meaninful path components when
considered from the perspective of a mount tree.

So tweak prepend_path to return an empty path with an new error
code of 3 when it encounters an escaped path.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: For 2.6.32, implement the "(unreachable)" string in __d_path()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoLinux 2.6.32.68 v2.6.32.68
Willy Tarreau [Fri, 18 Sep 2015 11:51:49 +0000 (13:51 +0200)]
Linux 2.6.32.68

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoipv6: Fix return of xfrm6_tunnel_rcv()
David S. Miller [Tue, 24 May 2011 05:11:51 +0000 (01:11 -0400)]
ipv6: Fix return of xfrm6_tunnel_rcv()

commit 6ac3f6649223d916bbdf1e823926f8f3b34b5d99 upstream.

Like ipv4, just return xfrm6_rcv_spi()'s return value directly.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agodmaengine: fix missing 'cnt' in ?: in dmatest
Dr. David Alan Gilbert [Thu, 25 Aug 2011 23:13:55 +0000 (16:13 -0700)]
dmaengine: fix missing 'cnt' in ?: in dmatest

commit d07a74a546981a09ba490936645fbf0d1340b96c upstream.

Hi,
  On the latest tree my compiler has started giving the warning:

drivers/dma/dmatest.c:575:28: warning: the omitted middle operand in ?: will always be ?true?, suggest explicit middle operand [-Wparentheses]

The following patch fixes the missing middle clause with the same
fix that Nicolas Ferre used in the similar clauses.
(There seems to have been a race between him fixing that and
the extra clause going in a little later).

I don't actually know the dmatest code/structures, nor do I own
any hardware to test it on (assuming it needs a DMA engine);
 but this patch builds, the existing code is almost certainly
wrong and the fix is the same as the corresponding lines above it.

(WTH is x=y?:z legal C anyway?)

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agodccp: catch failed request_module call in dccp_probe init
Wang Weidong [Wed, 18 Dec 2013 02:24:33 +0000 (19:24 -0700)]
dccp: catch failed request_module call in dccp_probe init

commit 965cdea825693c821d200e38fac9402cde6dce6a upstream.

Check the return value of request_module during dccp_probe initialisation,
bail out if that call fails.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agodccp: Fix compile warning in probe code.
David S. Miller [Thu, 1 Dec 2011 19:45:49 +0000 (14:45 -0500)]
dccp: Fix compile warning in probe code.

commit d984e6197ecd2babc1537f42dc1e676133005cda upstream.

Commit 1386be55e32a3c5d8ef4a2b243c530a7b664c02c ("dccp: fix
auto-loading of dccp(_probe)") fixed a bug but created a new
compiler warning:

net/dccp/probe.c: In function â\80\98dccpprobe_initâ\80\99:
net/dccp/probe.c:166:2: warning: the omitted middle operand in ?: will always be â\80\98trueâ\80\99, suggest explicit middle operand [-Wparentheses]

try_then_request_module() is built for situations where the
"existence" test is some lookup function that returns a non-NULL
object on success, and with a reference count of some kind held.

Here we're looking for a success return of zero from the jprobe
registry.

Instead of fighting the way try_then_request_module() works, simply
open code what we want to happen in a local helper function.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agodccp: fix auto-loading of dccp(_probe)
Gerrit Renker [Tue, 2 Feb 2010 20:16:56 +0000 (20:16 +0000)]
dccp: fix auto-loading of dccp(_probe)

commit 1386be55e32a3c5d8ef4a2b243c530a7b664c02c upstream.

This fixes commit (38ff3e6bb987ec583268da8eb22628293095d43b) ("dccp_probe:
Fix module load dependencies between dccp and dccp_probe", from 15 Jan).

It fixes the construction of the first argument of try_then_request_module(),
where only valid return codes from the first argument should be returned.

What we do now is assign the result of register_jprobe() to ret, without
the side effect of the comparison.

Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agox86/xen: Probe target addresses in set_aliased_prot() before the hypercall
Andy Lutomirski [Thu, 30 Jul 2015 21:31:31 +0000 (14:31 -0700)]
x86/xen: Probe target addresses in set_aliased_prot() before the hypercall

commit aa1acff356bbedfd03b544051f5b371746735d89 upstream.

The update_va_mapping hypercall can fail if the VA isn't present
in the guest's page tables.  Under certain loads, this can
result in an OOPS when the target address is in unpopulated vmap
space.

While we're at it, add comments to help explain what's going on.

This isn't a great long-term fix.  This code should probably be
changed to use something like set_memory_ro.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <dvrabel@cantab.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org <security@kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit b48d6a721ba2cb475aea937c707f577aafa660a2)

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agolibata: increase the timeout when setting transfer mode
Mikulas Patocka [Wed, 8 Jul 2015 17:06:12 +0000 (13:06 -0400)]
libata: increase the timeout when setting transfer mode

commit d531be2ca2f27cca5f041b6a140504999144a617 upstream.

I have a ST4000DM000 disk. If Linux is booted while the disk is spun down,
the command that sets transfer mode causes the disk to spin up. The
spin-up takes longer than the default 5s timeout, so the command fails and
timeout is reported.

Fix this by increasing the timeout to 15s, which is enough for the disk to
spin up.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit d6ded32444c070ce41ad0d64fce8957d18009d72)

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agos390/process: fix sfpc inline assembly
Heiko Carstens [Mon, 6 Jul 2015 13:02:37 +0000 (15:02 +0200)]
s390/process: fix sfpc inline assembly

commit e47994dd44bcb4a77b4152bd0eada585934703c0 upstream.

The sfpc inline assembly within execve_tail() may incorrectly set bits
28-31 of the sfpc instruction to a value which is not zero.
These bits however are currently unused and therefore should be zero
so we won't get surprised if these bits will be used in the future.

Therefore remove the second operand from the inline assembly.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit b411a8a3b44d76e782ba4bc6893068f3f590fe8a)

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agomm: avoid setting up anonymous pages into file mapping
Kirill A. Shutemov [Mon, 6 Jul 2015 20:18:37 +0000 (23:18 +0300)]
mm: avoid setting up anonymous pages into file mapping

commit 6b7339f4c31ad69c8e9c0b2859276e22cf72176d upstream.

Reading page fault handler code I've noticed that under right
circumstances kernel would map anonymous pages into file mappings: if
the VMA doesn't have vm_ops->fault() and the VMA wasn't fully populated
on ->mmap(), kernel would handle page fault to not populated pte with
do_anonymous_page().

Let's change page fault handler to use do_anonymous_page() only on
anonymous VMA (->vm_ops == NULL) and make sure that the VMA is not
shared.

For file mappings without vm_ops->fault() or shred VMA without vm_ops,
page fault on pte_none() entry would lead to SIGBUS.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit e2506476534cff7bb3697fbe0654fdefd101bc80)

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agofuse: initialize fc->release before calling it
Miklos Szeredi [Wed, 1 Jul 2015 14:25:55 +0000 (16:25 +0200)]
fuse: initialize fc->release before calling it

commit 0ad0b3255a08020eaf50e34ef0d6df5bdf5e09ed upstream.

fc->release is called from fuse_conn_put() which was used in the error
cleanup before fc->release was initialized.

[Jeremiah Mahler <jmmahler@gmail.com>: assign fc->release after calling
fuse_conn_init(fc) instead of before.]

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Fixes: a325f9b92273 ("fuse: update fuse_conn_init() and separate out fuse_conn_kill()")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 1a713f9828a6abd288ecc9eef0bbe5c56d0ffc0b)

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agotracing/filter: Do not allow infix to exceed end of string
Steven Rostedt (Red Hat) [Thu, 25 Jun 2015 22:10:09 +0000 (18:10 -0400)]
tracing/filter: Do not allow infix to exceed end of string

commit 6b88f44e161b9ee2a803e5b2b1fbcf4e20e8b980 upstream.

While debugging a WARN_ON() for filtering, I found that it is possible
for the filter string to be referenced after its end. With the filter:

 # echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter

The filter_parse() function can call infix_get_op() which calls
infix_advance() that updates the infix filter pointers for the cnt
and tail without checking if the filter is already at the end, which
will put the cnt to zero and the tail beyond the end. The loop then calls
infix_next() that has

ps->infix.cnt--;
return ps->infix.string[ps->infix.tail++];

The cnt will now be below zero, and the tail that is returned is
already passed the end of the filter string. So far the allocation
of the filter string usually has some buffer that is zeroed out, but
if the filter string is of the exact size of the allocated buffer
there's no guarantee that the charater after the nul terminating
character will be zero.

Luckily, only root can write to the filter.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 7cc2315e7b9c148ee549d4cfbf68735a578b64db)

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agotracing/filter: Do not WARN on operand count going below zero
Steven Rostedt (Red Hat) [Thu, 25 Jun 2015 22:02:29 +0000 (18:02 -0400)]
tracing/filter: Do not WARN on operand count going below zero

commit b4875bbe7e68f139bd3383828ae8e994a0df6d28 upstream.

When testing the fix for the trace filter, I could not come up with
a scenario where the operand count goes below zero, so I added a
WARN_ON_ONCE(cnt < 0) to the logic. But there is legitimate case
that it can happen (although the filter would be wrong).

 # echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter

That is, a single operation without any operands will hit the path
where the WARN_ON_ONCE() can trigger. Although this is harmless,
and the filter is reported as a error. But instead of spitting out
a warning to the kernel dmesg, just fail nicely and report it via
the proper channels.

Link: http://lkml.kernel.org/r/558C6082.90608@oracle.com
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit b43dd35952747f563d0dec7aefb7570260f10353)

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agomm: kmemleak: allow safe memory scanning during kmemleak disabling
Catalin Marinas [Wed, 24 Jun 2015 23:58:26 +0000 (16:58 -0700)]
mm: kmemleak: allow safe memory scanning during kmemleak disabling

commit c5f3b1a51a591c18c8b33983908e7fdda6ae417e upstream.

The kmemleak scanning thread can run for minutes.  Callbacks like
kmemleak_free() are allowed during this time, the race being taken care
of by the object->lock spinlock.  Such lock also prevents a memory block
from being freed or unmapped while it is being scanned by blocking the
kmemleak_free() -> ...  -> __delete_object() function until the lock is
released in scan_object().

When a kmemleak error occurs (e.g.  it fails to allocate its metadata),
kmemleak_enabled is set and __delete_object() is no longer called on
freed objects.  If kmemleak_scan is running at the same time,
kmemleak_free() no longer waits for the object scanning to complete,
allowing the corresponding memory block to be freed or unmapped (in the
case of vfree()).  This leads to kmemleak_scan potentially triggering a
page fault.

This patch separates the kmemleak_free() enabling/disabling from the
overall kmemleak_enabled nob so that we can defer the disabling of the
object freeing tracking until the scanning thread completed.  The
kmemleak_free_part() is deliberately ignored by this patch since this is
only called during boot before the scanning thread started.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Vignesh Radhakrishnan <vigneshr@codeaurora.org>
Tested-by: Vignesh Radhakrishnan <vigneshr@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
 - Adjust context
 - Drop changes to kmemleak_free_percpu()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 3bc68ffc5b43468537a2f0aa415f3b57f3b19d16)

Signed-off-by: Willy Tarreau <w@1wt.eu>
8 years agoNET: ROSE: Don't dereference NULL neighbour pointer.
Ralf Baechle [Thu, 18 Jun 2015 22:46:53 +0000 (00:46 +0200)]
NET: ROSE: Don't dereference NULL neighbour pointer.

commit d496f7842aada20c61e6044b3395383fa972872c upstream.

A ROSE socket doesn't necessarily always have a neighbour pointer so check
if the neighbour pointer is valid before dereferencing it.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Tested-by: Bernard Pidoux <f6bvp@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 8bbe4f448c01949084ef404eded3622086f052a6)

Signed-off-by: Willy Tarreau <w@1wt.eu>