]> git.itanic.dy.fi Git - linux-stable/log
linux-stable
10 years agoLinux 3.0.98 v3.0.98
Greg Kroah-Hartman [Tue, 1 Oct 2013 15:55:54 +0000 (08:55 -0700)]
Linux 3.0.98

10 years agokernel-doc: bugfix - multi-line macros
Daniel Santos [Fri, 5 Oct 2012 00:15:05 +0000 (17:15 -0700)]
kernel-doc: bugfix - multi-line macros

commit 654784284430bf2739985914b65e09c7c35a7273 upstream.

Prior to this patch the following code breaks:

/**
 * multiline_example - this breaks kernel-doc
 */
 #define multiline_example( \
myparam)

Producing this error:

Error(somefile.h:983): cannot understand prototype: 'multiline_example( \ '

This patch fixes the issue by appending all lines ending in a blackslash
(optionally followed by whitespace), removing the backslash and any
whitespace after it prior to appending (just like the C pre-processor
would).

This fixes a break in kerel-doc introduced by the additions to rbtree.h.

Signed-off-by: Daniel Santos <daniel.santos@pobox.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agosfc: Fix efx_rx_buf_offset() for recycled pages
Ben Hutchings [Fri, 6 Sep 2013 21:39:20 +0000 (22:39 +0100)]
sfc: Fix efx_rx_buf_offset() for recycled pages

This bug fix is only for stable branches older than 3.10.  The bug was
fixed upstream by commit 2768935a4660 ('sfc: reuse pages to avoid DMA
mapping/unmapping costs'), but that change is totally unsuitable for
stable.

Commit b590ace09d51 ('sfc: Fix efx_rx_buf_offset() in the presence of
swiotlb') added an explicit page_offset member to struct
efx_rx_buffer, which must be set consistently with the u.page and
dma_addr fields.  However, it failed to add the necessary assignment
in efx_resurrect_rx_buffer().  It also did not correct the calculation
of efx_rx_buffer::dma_addr in efx_resurrect_rx_buffer(), which assumes
that DMA-mapping a page will result in a page-aligned DMA address
(exactly what swiotlb violates).

Add the assignment of efx_rx_buffer::page_offset and change the
calculation of dma_addr to make use of it.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoperf tools: Handle JITed code in shared memory
Andi Kleen [Thu, 25 Apr 2013 00:03:02 +0000 (17:03 -0700)]
perf tools: Handle JITed code in shared memory

commit 89365e6c9ad4c0e090e4c6a4b67a3ce319381d89 upstream.

Need to check for /dev/zero.

Most likely more strings are missing too.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1366848182-30449-1-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Vinson Lee <vlee@freedesktop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agofanotify: dont merge permission events
Lino Sanfilippo [Fri, 23 Mar 2012 01:42:23 +0000 (02:42 +0100)]
fanotify: dont merge permission events

commit 03a1cec1f17ac1a6041996b3e40f96b5a2f90e1b upstream.

Boyd Yang reported a problem for the case that multiple threads of the same
thread group are waiting for a reponse for a permission event.
In this case it is possible that some of the threads are never woken up, even
if the response for the event has been received
(see http://marc.info/?l=linux-kernel&m=131822913806350&w=2).

The reason is that we are currently merging permission events if they belong to
the same thread group. But we are not prepared to wake up more than one waiter
for each event. We do

wait_event(group->fanotify_data.access_waitq, event->response ||
atomic_read(&group->fanotify_data.bypass_perm));
and after that
  event->response = 0;

which is the reason that even if we woke up all waiters for the same event
some of them may see event->response being already set 0 again, then go back to
sleep and block forever.

With this patch we avoid that more than one thread is waiting for a response
by not merging permission events for the same thread group any more.

Reported-by: Boyd Yang <boyd.yang@gmail.com>
Signed-off-by: Lino Sanfilippo <LinoSanfilipp@gmx.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: Mihai Donțu <mihai.dontu@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoperf: Fix perf_cgroup_switch for sw-events
Peter Zijlstra [Tue, 2 Oct 2012 13:41:23 +0000 (15:41 +0200)]
perf: Fix perf_cgroup_switch for sw-events

commit 95cf59ea72331d0093010543b8951bb43f262cac upstream.

Jiri reported that he could trigger the WARN_ON_ONCE() in
perf_cgroup_switch() using sw-events. This is because sw-events share
a cpuctx with multiple PMUs.

Use the ->unique_pmu pointer to limit the pmu iteration to unique
cpuctx instances.

Reported-and-Tested-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-so7wi2zf3jjzrwcutm2mkz0j@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoperf: Clarify perf_cpu_context::active_pmu usage by renaming it to ::unique_pmu
Peter Zijlstra [Tue, 2 Oct 2012 13:38:52 +0000 (15:38 +0200)]
perf: Clarify perf_cpu_context::active_pmu usage by renaming it to ::unique_pmu

commit 3f1f33206c16c7b3839d71372bc2ac3f305aa802 upstream.

Stephane thought the perf_cpu_context::active_pmu name confusing and
suggested using 'unique_pmu' instead.

This pointer is a pointer to a 'random' pmu sharing the cpuctx
instance, therefore limiting a for_each_pmu loop to those where
cpuctx->unique_pmu matches the pmu we get a loop over unique cpuctx
instances.

Suggested-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-kxyjqpfj2fn9gt7kwu5ag9ks@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agocgroup: fail if monitored file and event_control are in different cgroup
Li Zefan [Mon, 18 Feb 2013 06:13:35 +0000 (14:13 +0800)]
cgroup: fail if monitored file and event_control are in different cgroup

commit f169007b2773f285e098cb84c74aac0154d65ff7 upstream.

If we pass fd of memory.usage_in_bytes of cgroup A to cgroup.event_control
of cgroup B, then we won't get memory usage notification from A but B!

What's worse, if A and B are in different mount hierarchy, we'll end up
accessing NULL pointer!

Disallow this kind of invalid usage.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Weng Meiling <wengmeiling.weng@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoSCSI: iscsi: don't hang in endless loop if no targets present
Sasha Levin [Thu, 26 Jan 2012 03:16:16 +0000 (22:16 -0500)]
SCSI: iscsi: don't hang in endless loop if no targets present

commit 46a7c17d26967922092f3a8291815ffb20f6cabe upstream.

iscsi_if_send_reply() may return -ESRCH if there were no targets to send
data to. Currently we're ignoring this value and looping in attempt to do it
over and over, which will usually lead in a hung task like this one:

[ 4920.817298] INFO: task trinity:9074 blocked for more than 120 seconds.
[ 4920.818527] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4920.819982] trinity         D 0000000000000000  5504  9074   2756 0x00000004
[ 4920.825374]  ffff880003961a98 0000000000000086 ffff8800001aa000 ffff8800001aa000
[ 4920.826791]  00000000001d4340 ffff880003961fd8 ffff880003960000 00000000001d4340
[ 4920.828241]  00000000001d4340 00000000001d4340 ffff880003961fd8 00000000001d4340
[ 4920.833231]
[ 4920.833519] Call Trace:
[ 4920.834010]  [<ffffffff826363fa>] schedule+0x3a/0x50
[ 4920.834953]  [<ffffffff82634ac9>] __mutex_lock_common+0x209/0x5b0
[ 4920.836226]  [<ffffffff81af805d>] ? iscsi_if_rx+0x2d/0x990
[ 4920.837281]  [<ffffffff81053943>] ? sched_clock+0x13/0x20
[ 4920.838305]  [<ffffffff81af805d>] ? iscsi_if_rx+0x2d/0x990
[ 4920.839336]  [<ffffffff82634eb0>] mutex_lock_nested+0x40/0x50
[ 4920.840423]  [<ffffffff81af805d>] iscsi_if_rx+0x2d/0x990
[ 4920.841434]  [<ffffffff810dffed>] ? sub_preempt_count+0x9d/0xd0
[ 4920.842548]  [<ffffffff82637bb0>] ? _raw_read_unlock+0x30/0x60
[ 4920.843666]  [<ffffffff821f71de>] netlink_unicast+0x1ae/0x1f0
[ 4920.844751]  [<ffffffff821f7997>] netlink_sendmsg+0x227/0x350
[ 4920.845850]  [<ffffffff821857bd>] ? sock_update_netprioidx+0xdd/0x1b0
[ 4920.847060]  [<ffffffff82185732>] ? sock_update_netprioidx+0x52/0x1b0
[ 4920.848276]  [<ffffffff8217f226>] sock_aio_write+0x166/0x180
[ 4920.849348]  [<ffffffff810dfe41>] ? get_parent_ip+0x11/0x50
[ 4920.850428]  [<ffffffff811d0d9a>] do_sync_write+0xda/0x120
[ 4920.851465]  [<ffffffff810dffed>] ? sub_preempt_count+0x9d/0xd0
[ 4920.852579]  [<ffffffff810dfe41>] ? get_parent_ip+0x11/0x50
[ 4920.853608]  [<ffffffff81791887>] ? security_file_permission+0x27/0xb0
[ 4920.854821]  [<ffffffff811d0f4c>] vfs_write+0x16c/0x180
[ 4920.855781]  [<ffffffff811d104f>] sys_write+0x4f/0xa0
[ 4920.856798]  [<ffffffff82638e79>] system_call_fastpath+0x16/0x1b
[ 4920.877487] 1 lock held by trinity/9074:
[ 4920.878239]  #0:  (rx_queue_mutex){+.+...}, at: [<ffffffff81af805d>] iscsi_if_rx+0x2d/0x990
[ 4920.880005] Kernel panic - not syncing: hung_task: blocked tasks

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agodrm/radeon: fix handling of variable sized arrays for router objects
Alex Deucher [Tue, 27 Aug 2013 16:36:01 +0000 (12:36 -0400)]
drm/radeon: fix handling of variable sized arrays for router objects

commit fb93df1c2d8b3b1fb16d6ee9e32554e0c038815d upstream.

The table has the following format:

typedef struct _ATOM_SRC_DST_TABLE_FOR_ONE_OBJECT         //usSrcDstTableOffset pointing to this structure
{
  UCHAR               ucNumberOfSrc;
  USHORT              usSrcObjectID[1];
  UCHAR               ucNumberOfDst;
  USHORT              usDstObjectID[1];
}ATOM_SRC_DST_TABLE_FOR_ONE_OBJECT;

usSrcObjectID[] and usDstObjectID[] are variably sized, so we
can't access them directly.  Use pointers and update the offset
appropriately when accessing the Dst members.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agodrm/radeon: fix resume on some rs4xx boards (v2)
Alex Deucher [Mon, 26 Aug 2013 21:52:12 +0000 (17:52 -0400)]
drm/radeon: fix resume on some rs4xx boards (v2)

commit acf88deb8ddbb73acd1c3fa32fde51af9153227f upstream.

Setting MC_MISC_CNTL.GART_INDEX_REG_EN causes hangs on
some boards on resume.  The systems seem to work fine
without touching this bit so leave it as is.

v2: read-modify-write the GART_INDEX_REG_EN bit.
I suspect the problem is that we are losing the other
settings in the register.

fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=52952

Reported-by: Ondrej Zary <linux@rainbow-software.org>
Tested-by: Daniel Tobias <dan.g.tob@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agodrm/radeon: update line buffer allocation for dce4.1/5
Alex Deucher [Mon, 19 Aug 2013 15:06:50 +0000 (11:06 -0400)]
drm/radeon: update line buffer allocation for dce4.1/5

commit 0b31e02363b0db4e7931561bc6c141436e729d9f upstream.

We need to allocate line buffer to each display when
setting up the watermarks.  Failure to do so can lead
to a blank screen.  This fixes blank screen problems
on dce4.1/5 asics.

Based on an initial fix from:
Jay Cornwall <jay.cornwall@amd.com>

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agodrm/radeon: fix LCD record parsing
Alex Deucher [Tue, 20 Aug 2013 18:59:01 +0000 (14:59 -0400)]
drm/radeon: fix LCD record parsing

commit 95663948ba22a4be8b99acd67fbf83e86ddffba4 upstream.

If the LCD table contains an EDID record, properly account
for the edid size when walking through the records.

This should fix error messages about unknown LCD records.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoHID: zeroplus: validate output report details
Kees Cook [Wed, 11 Sep 2013 19:56:51 +0000 (21:56 +0200)]
HID: zeroplus: validate output report details

commit 78214e81a1bf43740ce89bb5efda78eac2f8ef83 upstream.

The zeroplus HID driver was not checking the size of allocated values
in fields it used. A HID device could send a malicious output report
that would cause the driver to write beyond the output report allocation
during initialization, causing a heap overflow:

[ 1442.728680] usb 1-1: New USB device found, idVendor=0c12, idProduct=0005
...
[ 1466.243173] BUG kmalloc-192 (Tainted: G        W   ): Redzone overwritten

CVE-2013-2889

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoHID: provide a helper for validating hid reports
Kees Cook [Wed, 11 Sep 2013 19:56:50 +0000 (21:56 +0200)]
HID: provide a helper for validating hid reports

commit 331415ff16a12147d57d5c953f3a961b7ede348b upstream.

Many drivers need to validate the characteristics of their HID report
during initialization to avoid misusing the reports. This adds a common
helper to perform validation of the report exisitng, the field existing,
and the expected number of values within the field.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agort2800: fix wrong TX power compensation
Stanislaw Gruszka [Mon, 26 Aug 2013 13:18:53 +0000 (15:18 +0200)]
rt2800: fix wrong TX power compensation

commit 6e956da2027c767859128b9bfef085cf2a8e233b upstream.

We should not do temperature compensation on devices without
EXTERNAL_TX_ALC bit set (called DynamicTxAgcControl on vendor driver).
Such devices can have totally bogus TSSI parameters on the EEPROM,
but still threaded by us as valid and result doing wrong TX power
calculations.

This fix inability to connect to AP on slightly longer distance on
some Ralink chips/devices.

Reported-and-tested-by: Fabien ADAM <id2ndr@crocobox.org>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agonet: usb: cdc_ether: Use wwan interface for Telit modules
Fabio Porcedda [Mon, 16 Sep 2013 09:47:50 +0000 (11:47 +0200)]
net: usb: cdc_ether: Use wwan interface for Telit modules

commit 0092820407901a0b2c4e343e85f96bb7abfcded1 upstream.

Signed-off-by: Fabio Porcedda <fabio.porcedda@gmail.com>
Acked-by: Oliver Neukum <oliver@neukum.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoRevert "sctp: fix call to SCTP_CMD_PROCESS_SACK in sctp_cmd_interpreter()"
Greg Kroah-Hartman [Fri, 27 Sep 2013 15:34:49 +0000 (08:34 -0700)]
Revert "sctp: fix call to SCTP_CMD_PROCESS_SACK in sctp_cmd_interpreter()"

This reverts commit b23270416da409bd4e637a5acbe31a1126235fb6 which is
commit f6e80abeab928b7c47cc1fbf53df13b4398a2bec.

Michal writes:
Mainline commit f6e80abe was introduced in v3.7-rc2 as a
follow-up fix to commit

  edfee033  sctp: check src addr when processing SACK to update transport state

(from v3.7-rc1) which changed the interpretation of third
argument to sctp_cmd_process_sack() and sctp_outq_sack(). But as
commit edfee033 has never been backported to stable branches,
backport of commit f6e80abe actually breaks the code rather than
fixing it.

Reported-by: Michal Kubecek <mkubecek@suse.cz>
Cc: Zijie Pan <zijie.pan@6wind.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoLinux 3.0.97 v3.0.97
Greg Kroah-Hartman [Thu, 26 Sep 2013 23:53:10 +0000 (16:53 -0700)]
Linux 3.0.97

10 years agofuse: invalidate inode attributes on xattr modification
Anand Avati [Tue, 20 Aug 2013 06:21:07 +0000 (02:21 -0400)]
fuse: invalidate inode attributes on xattr modification

commit d331a415aef98717393dda0be69b7947da08eba3 upstream.

Calls like setxattr and removexattr result in updation of ctime.
Therefore invalidate inode attributes to force a refresh.

Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agofuse: postpone end_page_writeback() in fuse_writepage_locked()
Maxim Patlasov [Mon, 12 Aug 2013 16:39:30 +0000 (20:39 +0400)]
fuse: postpone end_page_writeback() in fuse_writepage_locked()

commit 4a4ac4eba1010ef9a804569058ab29e3450c0315 upstream.

The patch fixes a race between ftruncate(2), mmap-ed write and write(2):

1) An user makes a page dirty via mmap-ed write.
2) The user performs shrinking truncate(2) intended to purge the page.
3) Before fuse_do_setattr calls truncate_pagecache, the page goes to
   writeback. fuse_writepage_locked fills FUSE_WRITE request and releases
   the original page by end_page_writeback.
4) fuse_do_setattr() completes and successfully returns. Since now, i_mutex
   is free.
5) Ordinary write(2) extends i_size back to cover the page. Note that
   fuse_send_write_pages do wait for fuse writeback, but for another
   page->index.
6) fuse_writepage_locked proceeds by queueing FUSE_WRITE request.
   fuse_send_writepage is supposed to crop inarg->size of the request,
   but it doesn't because i_size has already been extended back.

Moving end_page_writeback to the end of fuse_writepage_locked fixes the
race because now the fact that truncate_pagecache is successfully returned
infers that fuse_writepage_locked has already called end_page_writeback.
And this, in turn, infers that fuse_flush_writepages has already called
fuse_send_writepage, and the latter used valid (shrunk) i_size. write(2)
could not extend it because of i_mutex held by ftruncate(2).

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoisofs: Refuse RW mount of the filesystem instead of making it RO
Jan Kara [Thu, 25 Jul 2013 09:49:11 +0000 (11:49 +0200)]
isofs: Refuse RW mount of the filesystem instead of making it RO

commit 17b7f7cf58926844e1dd40f5eb5348d481deca6a upstream.

Refuse RW mount of isofs filesystem. So far we just silently changed it
to RO mount but when the media is writeable, block layer won't notice
this change and thus will think device is used RW and will block eject
button of the drive. That is unexpected by users because for
non-writeable media eject button works just fine.

Userspace mount(8) command handles this just fine and retries mounting
with MS_RDONLY set so userspace shouldn't see any regression.  Plus any
tool mounting isofs is likely confronted with the case of read-only
media where block layer already refuses to mount the filesystem without
MS_RDONLY set so our behavior shouldn't be anything new for it.

Reported-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomm/huge_memory.c: fix potential NULL pointer dereference
Libin [Wed, 11 Sep 2013 21:20:38 +0000 (14:20 -0700)]
mm/huge_memory.c: fix potential NULL pointer dereference

commit a8f531ebc33052642b4bd7b812eedf397108ce64 upstream.

In collapse_huge_page() there is a race window between releasing the
mmap_sem read lock and taking the mmap_sem write lock, so find_vma() may
return NULL.  So check the return value to avoid NULL pointer dereference.

collapse_huge_page
khugepaged_alloc_page
up_read(&mm->mmap_sem)
down_write(&mm->mmap_sem)
vma = find_vma(mm, address)

Signed-off-by: Libin <huawei.libin@huawei.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomemcg: fix multiple large threshold notifications
Greg Thelen [Wed, 11 Sep 2013 21:23:08 +0000 (14:23 -0700)]
memcg: fix multiple large threshold notifications

commit 2bff24a3707093c435ab3241c47dcdb5f16e432b upstream.

A memory cgroup with (1) multiple threshold notifications and (2) at least
one threshold >=2G was not reliable.  Specifically the notifications would
either not fire or would not fire in the proper order.

The __mem_cgroup_threshold() signaling logic depends on keeping 64 bit
thresholds in sorted order.  mem_cgroup_usage_register_event() sorts them
with compare_thresholds(), which returns the difference of two 64 bit
thresholds as an int.  If the difference is positive but has bit[31] set,
then sort() treats the difference as negative and breaks sort order.

This fix compares the two arbitrary 64 bit thresholds returning the
classic -1, 0, 1 result.

The test below sets two notifications (at 0x1000 and 0x81001000):
  cd /sys/fs/cgroup/memory
  mkdir x
  for x in 4096 2164264960; do
    cgroup_event_listener x/memory.usage_in_bytes $x | sed "s/^/$x listener:/" &
  done
  echo $$ > x/cgroup.procs
  anon_leaker 500M

v3.11-rc7 fails to signal the 4096 event listener:
  Leaking...
  Done leaking pages.

Patched v3.11-rc7 properly notifies:
  Leaking...
  4096 listener:2013:8:31:14:13:36
  Done leaking pages.

The fixed bug is old.  It appears to date back to the introduction of
memcg threshold notifications in v2.6.34-rc1-116-g2e72b6347c94 "memcg:
implement memory thresholds"

Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoocfs2: fix the end cluster offset of FIEMAP
Jie Liu [Wed, 11 Sep 2013 21:20:05 +0000 (14:20 -0700)]
ocfs2: fix the end cluster offset of FIEMAP

commit 28e8be31803b19d0d8f76216cb11b480b8a98bec upstream.

Call fiemap ioctl(2) with given start offset as well as an desired mapping
range should show extents if possible.  However, we somehow figure out the
end offset of mapping via 'mapping_end -= cpos' before iterating the
extent records which would cause problems if the given fiemap length is
too small to a cluster size, e.g,

Cluster size 4096:
debugfs.ocfs2 1.6.3
        Block Size Bits: 12   Cluster Size Bits: 12

The extended fiemap test utility From David:
https://gist.github.com/anonymous/6172331

# dd if=/dev/urandom of=/ocfs2/test_file bs=1M count=1000
# ./fiemap /ocfs2/test_file 4096 10
start: 4096, length: 10
File /ocfs2/test_file has 0 extents:
# Logical          Physical         Length           Flags
^^^^^ <-- No extent is shown

In this case, at ocfs2_fiemap(): cpos == mapping_end == 1. Hence the
loop of searching extent records was not executed at all.

This patch remove the in question 'mapping_end -= cpos', and loops
until the cpos is larger than the mapping_end as usual.

# ./fiemap /ocfs2/test_file 4096 10
start: 4096, length: 10
File /ocfs2/test_file has 1 extents:
# Logical          Physical         Length           Flags
0: 0000000000000000 0000000056a01000 0000000006a00000 0000

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reported-by: David Weber <wb@munzinger.de>
Tested-by: David Weber <wb@munzinger.de>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Cc: Mark Fashen <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoHID: check for NULL field when setting values
Kees Cook [Wed, 28 Aug 2013 20:32:01 +0000 (22:32 +0200)]
HID: check for NULL field when setting values

commit be67b68d52fa28b9b721c47bb42068f0c1214855 upstream.

Defensively check that the field to be worked on is not NULL.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoHID: ntrig: validate feature report details
Kees Cook [Wed, 28 Aug 2013 20:31:28 +0000 (22:31 +0200)]
HID: ntrig: validate feature report details

commit 875b4e3763dbc941f15143dd1a18d10bb0be303b upstream.

A HID device could send a malicious feature report that would cause the
ntrig HID driver to trigger a NULL dereference during initialization:

[57383.031190] usb 3-1: New USB device found, idVendor=1b96, idProduct=0001
...
[57383.315193] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
[57383.315308] IP: [<ffffffffa08102de>] ntrig_probe+0x25e/0x420 [hid_ntrig]

CVE-2013-2896

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Rafi Rubin <rafi@seas.upenn.edu>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoHID: validate HID report id size
Kees Cook [Wed, 28 Aug 2013 20:29:55 +0000 (22:29 +0200)]
HID: validate HID report id size

commit 43622021d2e2b82ea03d883926605bdd0525e1d1 upstream.

The "Report ID" field of a HID report is used to build indexes of
reports. The kernel's index of these is limited to 256 entries, so any
malicious device that sets a Report ID greater than 255 will trigger
memory corruption on the host:

[ 1347.156239] BUG: unable to handle kernel paging request at ffff88094958a878
[ 1347.156261] IP: [<ffffffff813e4da0>] hid_register_report+0x2a/0x8b

CVE-2013-2888

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoHID: pantherlord: validate output report details
Kees Cook [Wed, 28 Aug 2013 20:30:49 +0000 (22:30 +0200)]
HID: pantherlord: validate output report details

commit 412f30105ec6735224535791eed5cdc02888ecb4 upstream.

A HID device could send a malicious output report that would cause the
pantherlord HID driver to write beyond the output report allocation
during initialization, causing a heap overflow:

[  310.939483] usb 1-1: New USB device found, idVendor=0e8f, idProduct=0003
...
[  315.980774] BUG kmalloc-192 (Tainted: G        W   ): Redzone overwritten

CVE-2013-2892

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoath9k: avoid accessing MRC registers on single-chain devices
Felix Fietkau [Tue, 13 Aug 2013 10:33:28 +0000 (12:33 +0200)]
ath9k: avoid accessing MRC registers on single-chain devices

commit a1c781bb20ac1e03280e420abd47a99eb8bbdd3b upstream.

They are not implemented, and accessing them might trigger errors

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoath9k: always clear ps filter bit on new assoc
Felix Fietkau [Tue, 6 Aug 2013 12:18:10 +0000 (14:18 +0200)]
ath9k: always clear ps filter bit on new assoc

commit 026d5b07c03458f9c0ccd19c3850564a5409c325 upstream.

Otherwise in some cases, EAPOL frames might be filtered during the
initial handshake, causing delays and assoc failures.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoALSA: hda - Add Toshiba Satellite C870 to MSI blacklist
Takashi Iwai [Mon, 9 Sep 2013 08:20:48 +0000 (10:20 +0200)]
ALSA: hda - Add Toshiba Satellite C870 to MSI blacklist

commit 83f72151352791836a1b9c1542614cc9bf71ac61 upstream.

Toshiba Satellite C870 shows interrupt problems occasionally when
certain mixer controls like "Mic Switch" is toggled.  This seems
worked around by not using MSI.

Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=833585
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoASoC: wm8960: Fix PLL register writes
Mike Dyer [Fri, 16 Aug 2013 17:36:28 +0000 (18:36 +0100)]
ASoC: wm8960: Fix PLL register writes

commit 85fa532b6ef920b32598df86b194571a7059a77c upstream.

Bit 9 of PLL2,3 and 4 is reserved as '0'. The 24bit fractional part
should be split across each register in 8bit chunks.

Signed-off-by: Mike Dyer <mike.dyer@md-soft.co.uk>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agorculist: list_first_or_null_rcu() should use list_entry_rcu()
Tejun Heo [Fri, 28 Jun 2013 17:34:48 +0000 (10:34 -0700)]
rculist: list_first_or_null_rcu() should use list_entry_rcu()

commit c34ac00caefbe49d40058ae7200bd58725cebb45 upstream.

list_first_or_null() should test whether the list is empty and return
pointer to the first entry if not in a RCU safe manner.  It's broken
in several ways.

* It compares __kernel @__ptr with __rcu @__next triggering the
  following sparse warning.

  net/core/dev.c:4331:17: error: incompatible types in comparison expression (different address spaces)

* It doesn't perform rcu_dereference*() and computes the entry address
  using container_of() directly from the __rcu pointer which is
  inconsitent with other rculist interface.  As a result, all three
  in-kernel users - net/core/dev.c, macvlan, cgroup - are buggy.  They
  dereference the pointer w/o going through read barrier.

* While ->next dereference passes through list_next_rcu(), the
  compiler is still free to fetch ->next more than once and thus
  nullify the "__ptr != __next" condition check.

Fix it by making list_first_or_null_rcu() dereference ->next directly
using ACCESS_ONCE() and then use list_entry_rcu() on it like other
rculist accessors.

v2: Paul pointed out that the compiler may fetch the pointer more than
    once nullifying the condition check.  ACCESS_ONCE() added on
    ->next dereference.

v3: Restored () around macro param which was accidentally removed.
    Spotted by Paul.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agousb: config->desc.bLength may not exceed amount of data returned by the device
Hans de Goede [Sat, 3 Aug 2013 14:37:48 +0000 (16:37 +0200)]
usb: config->desc.bLength may not exceed amount of data returned by the device

commit b4f17a488ae2e09bfcf95c0e0b4219c246f1116a upstream.

While reading the config parsing code I noticed this check is missing, without
this check config->desc.wTotalLength can end up with a value larger then the
dev->rawdescriptors length for the config, and when userspace then tries to
get the rawdescriptors bad things may happen.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoUSB: cdc-wdm: fix race between interrupt handler and tasklet
Oliver Neukum [Tue, 6 Aug 2013 12:22:59 +0000 (14:22 +0200)]
USB: cdc-wdm: fix race between interrupt handler and tasklet

commit 6dd433e6cf2475ce8abec1b467720858c24450eb upstream.

Both could want to submit the same URB. Some checks of the flag
intended to prevent that were missing.

Signed-off-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoUSB: mos7720: fix big-endian control requests
Johan Hovold [Mon, 19 Aug 2013 11:05:45 +0000 (13:05 +0200)]
USB: mos7720: fix big-endian control requests

commit 3b716caf190ccc6f2a09387210e0e6a26c1d81a4 upstream.

Fix endianess bugs in parallel-port code which caused corrupt
control-requests to be issued on big-endian machines.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoUSB: mos7720: use GFP_ATOMIC under spinlock
Dan Carpenter [Fri, 16 Aug 2013 07:16:59 +0000 (10:16 +0300)]
USB: mos7720: use GFP_ATOMIC under spinlock

commit d0bd9a41186e076ea543c397ad8a67a6cf604b55 upstream.

The write_parport_reg_nonblock() function shouldn't sleep because it's
called with spinlocks held.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agostaging: comedi: dt282x: dt282x_ai_insn_read() always fails
Dan Carpenter [Tue, 20 Aug 2013 08:57:35 +0000 (11:57 +0300)]
staging: comedi: dt282x: dt282x_ai_insn_read() always fails

commit 2c4283ca7cdcc6605859c836fc536fcd83a4525f upstream.

In dt282x_ai_insn_read() we call this macro like:
wait_for(!mux_busy(), comedi_error(dev, "timeout\n"); return -ETIME;);
Because the if statement doesn't have curly braces it means we always
return -ETIME and the function never succeeds.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agocifs: ensure that srv_mutex is held when dealing with ssocket pointer
Jeff Layton [Thu, 5 Sep 2013 12:38:10 +0000 (08:38 -0400)]
cifs: ensure that srv_mutex is held when dealing with ssocket pointer

commit 73e216a8a42c0ef3d08071705c946c38fdbe12b0 upstream.

Oleksii reported that he had seen an oops similar to this:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
IP: [<ffffffff814dcc13>] sock_sendmsg+0x93/0xd0
PGD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: ipt_MASQUERADE xt_REDIRECT xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables carl9170 ath usb_storage f2fs nfnetlink_log nfnetlink md4 cifs dns_resolver hid_generic usbhid hid af_packet uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev rfcomm btusb bnep bluetooth qmi_wwan qcserial cdc_wdm usb_wwan usbnet usbserial mii snd_hda_codec_hdmi snd_hda_codec_realtek iwldvm mac80211 coretemp intel_powerclamp kvm_intel kvm iwlwifi snd_hda_intel cfg80211 snd_hda_codec xhci_hcd e1000e ehci_pci snd_hwdep sdhci_pci snd_pcm ehci_hcd microcode psmouse sdhci thinkpad_acpi mmc_core i2c_i801 pcspkr usbcore hwmon snd_timer snd_page_alloc snd ptp rfkill pps_core soundcore evdev usb_common vboxnetflt(O) vboxdrv(O)Oops#2 Part8
 loop tun binfmt_misc fuse msr acpi_call(O) ipv6 autofs4
CPU: 0 PID: 21612 Comm: kworker/0:1 Tainted: G        W  O 3.10.1SIGN #28
Hardware name: LENOVO 2306CTO/2306CTO, BIOS G2ET92WW (2.52 ) 02/22/2013
Workqueue: cifsiod cifs_echo_request [cifs]
task: ffff8801e1f416f0 ti: ffff880148744000 task.ti: ffff880148744000
RIP: 0010:[<ffffffff814dcc13>]  [<ffffffff814dcc13>] sock_sendmsg+0x93/0xd0
RSP: 0000:ffff880148745b00  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff880148745b78 RCX: 0000000000000048
RDX: ffff880148745c90 RSI: ffff880181864a00 RDI: ffff880148745b78
RBP: ffff880148745c48 R08: 0000000000000048 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880181864a00
R13: ffff880148745c90 R14: 0000000000000048 R15: 0000000000000048
FS:  0000000000000000(0000) GS:ffff88021e200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000088 CR3: 000000020c42c000 CR4: 00000000001407b0
Oops#2 Part7
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffff880148745b30 ffffffff810c4af9 0000004848745b30 ffff880181864a00
 ffffffff81ffbc40 0000000000000000 ffff880148745c90 ffffffff810a5aab
 ffff880148745bc0 ffffffff81ffbc40 ffff880148745b60 ffffffff815a9fb8
Call Trace:
 [<ffffffff810c4af9>] ? finish_task_switch+0x49/0xe0
 [<ffffffff810a5aab>] ? lock_timer_base.isra.36+0x2b/0x50
 [<ffffffff815a9fb8>] ? _raw_spin_unlock_irqrestore+0x18/0x40
 [<ffffffff810a673f>] ? try_to_del_timer_sync+0x4f/0x70
 [<ffffffff815aa38f>] ? _raw_spin_unlock_bh+0x1f/0x30
 [<ffffffff814dcc87>] kernel_sendmsg+0x37/0x50
 [<ffffffffa081a0e0>] smb_send_kvec+0xd0/0x1d0 [cifs]
 [<ffffffffa081a263>] smb_send_rqst+0x83/0x1f0 [cifs]
 [<ffffffffa081ab6c>] cifs_call_async+0xec/0x1b0 [cifs]
 [<ffffffffa08245e0>] ? free_rsp_buf+0x40/0x40 [cifs]
Oops#2 Part6
 [<ffffffffa082606e>] SMB2_echo+0x8e/0xb0 [cifs]
 [<ffffffffa0808789>] cifs_echo_request+0x79/0xa0 [cifs]
 [<ffffffff810b45b3>] process_one_work+0x173/0x4a0
 [<ffffffff810b52a1>] worker_thread+0x121/0x3a0
 [<ffffffff810b5180>] ? manage_workers.isra.27+0x2b0/0x2b0
 [<ffffffff810bae00>] kthread+0xc0/0xd0
 [<ffffffff810bad40>] ? kthread_create_on_node+0x120/0x120
 [<ffffffff815b199c>] ret_from_fork+0x7c/0xb0
 [<ffffffff810bad40>] ? kthread_create_on_node+0x120/0x120
Code: 84 24 b8 00 00 00 4c 89 f1 4c 89 ea 4c 89 e6 48 89 df 4c 89 60 18 48 c7 40 28 00 00 00 00 4c 89 68 30 44 89 70 14 49 8b 44 24 28 <ff> 90 88 00 00 00 3d ef fd ff ff 74 10 48 8d 65 e0 5b 41 5c 41
 RIP  [<ffffffff814dcc13>] sock_sendmsg+0x93/0xd0
 RSP <ffff880148745b00>
CR2: 0000000000000088

The client was in the middle of trying to send a frame when the
server->ssocket pointer got zeroed out. In most places, that we access
that pointer, the srv_mutex is held. There's only one spot that I see
that the server->ssocket pointer gets set and the srv_mutex isn't held.
This patch corrects that.

The upstream bug report was here:

    https://bugzilla.kernel.org/show_bug.cgi?id=60557

Reported-by: Oleksii Shevchuk <alxchk@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agousb: xhci: Disable runtime PM suspend for quirky controllers
Shawn Nematbakhsh [Mon, 19 Aug 2013 17:36:13 +0000 (10:36 -0700)]
usb: xhci: Disable runtime PM suspend for quirky controllers

commit c8476fb855434c733099079063990e5bfa7ecad6 upstream.

If a USB controller with XHCI_RESET_ON_RESUME goes to runtime suspend,
a reset will be performed upon runtime resume. Any previously suspended
devices attached to the controller will be re-enumerated at this time.
This will cause problems, for example, if an open system call on the
device triggered the resume (the open call will fail).

Note that this change is only relevant when persist_enabled is not set
for USB devices.

This patch should be backported to kernels as old as 3.0, that
contain the commit c877b3b2ad5cb9d4fe523c5496185cc328ff3ae9 "xhci: Add
reset on resume quirk for asrock p67 host".

Signed-off-by: Shawn Nematbakhsh <shawnn@chromium.org>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoARM: PCI: versatile: Fix SMAP register offsets
Peter Maydell [Thu, 22 Aug 2013 16:47:50 +0000 (17:47 +0100)]
ARM: PCI: versatile: Fix SMAP register offsets

commit 99f2b130370b904ca5300079243fdbcafa2c708b upstream.

The SMAP register offsets in the versatile PCI controller code were
all off by four.  (This didn't have any observable bad effects
because on this board PHYS_OFFSET is zero, and (a) writing zero to
the flags register at offset 0x10 has no effect and (b) the reset
value of the SMAP register is zero anyway, so failing to write SMAP2
didn't matter.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoxen-gnt: prevent adding duplicate gnt callbacks
Roger Pau Monne [Wed, 31 Jul 2013 15:00:42 +0000 (17:00 +0200)]
xen-gnt: prevent adding duplicate gnt callbacks

commit 5f338d9001094a56cf87bd8a280b4e7ff953bb59 upstream.

With the current implementation, the callback in the tail of the list
can be added twice, because the check done in
gnttab_request_free_callback is bogus, callback->next can be NULL if
it is the last callback in the list. If we add the same callback twice
we end up with an infinite loop, were callback == callback->next.

Replace this check with a proper one that iterates over the list to
see if the callback has already been added.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Matt Wilson <msw@amazon.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agopowerpc: Handle unaligned ldbrx/stdbrx
Anton Blanchard [Tue, 6 Aug 2013 16:01:19 +0000 (02:01 +1000)]
powerpc: Handle unaligned ldbrx/stdbrx

commit 230aef7a6a23b6166bd4003bfff5af23c9bd381f upstream.

Normally when we haven't implemented an alignment handler for
a load or store instruction the process will be terminated.

The alignment handler uses the DSISR (or a pseudo one) to locate
the right handler. Unfortunately ldbrx and stdbrx overlap lfs and
stfs so we incorrectly think ldbrx is an lfs and stdbrx is an
stfs.

This bug is particularly nasty - instead of terminating the
process we apply an incorrect fixup and continue on.

With more and more overlapping instructions we should stop
creating a pseudo DSISR and index using the instruction directly,
but for now add a special case to catch ldbrx/stdbrx.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agocrypto: api - Fix race condition in larval lookup
Herbert Xu [Sun, 8 Sep 2013 04:33:50 +0000 (14:33 +1000)]
crypto: api - Fix race condition in larval lookup

commit 77dbd7a95e4a4f15264c333a9e9ab97ee27dc2aa upstream.

crypto_larval_lookup should only return a larval if it created one.
Any larval created by another entity must be processed through
crypto_larval_wait before being returned.

Otherwise this will lead to a larval being killed twice, which
will most likely lead to a crash.

Reported-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoSCSI: sd: Fix potential out-of-bounds access
Alan Stern [Fri, 6 Sep 2013 15:49:51 +0000 (11:49 -0400)]
SCSI: sd: Fix potential out-of-bounds access

commit 984f1733fcee3fbc78d47e26c5096921c5d9946a upstream.

This patch fixes an out-of-bounds error in sd_read_cache_type(), found
by Google's AddressSanitizer tool.  When the loop ends, we know that
"offset" lies beyond the end of the data in the buffer, so no Caching
mode page was found.  In theory it may be present, but the buffer size
is limited to 512 bytes.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoLinux 3.0.96 v3.0.96
Greg Kroah-Hartman [Sat, 14 Sep 2013 12:58:09 +0000 (05:58 -0700)]
Linux 3.0.96

10 years agoKVM: s390: move kvm_guest_enter,exit closer to sie
Dominik Dingel [Fri, 26 Jul 2013 13:04:00 +0000 (15:04 +0200)]
KVM: s390: move kvm_guest_enter,exit closer to sie

commit 2b29a9fdcb92bfc6b6f4c412d71505869de61a56 upstream.

Any uaccess between guest_enter and guest_exit could trigger a page fault,
the page fault handler would handle it as a guest fault and translate a
user address as guest address.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.2: adjust context and add the rc variable]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agom32r: make memset() global for CONFIG_KERNEL_BZIP2=y
Geert Uytterhoeven [Tue, 17 Jul 2012 22:48:05 +0000 (15:48 -0700)]
m32r: make memset() global for CONFIG_KERNEL_BZIP2=y

commit 9a75c6e5240f7edc5955e8da5b94bde6f96070b3 upstream.

Fix the m32r compile error:

  arch/m32r/boot/compressed/misc.c:31:14: error: static declaration of 'memset' follows non-static declaration
  make[5]: *** [arch/m32r/boot/compressed/misc.o] Error 1
  make[4]: *** [arch/m32r/boot/compressed/vmlinux] Error 2

by removing the static keyword.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agom32r: add memcpy() for CONFIG_KERNEL_GZIP=y
Geert Uytterhoeven [Tue, 17 Jul 2012 22:48:04 +0000 (15:48 -0700)]
m32r: add memcpy() for CONFIG_KERNEL_GZIP=y

commit a8abbca6617e1caa2344d2d38d0a35f3e5928b79 upstream.

Fix the m32r link error:

    LD      arch/m32r/boot/compressed/vmlinux
  arch/m32r/boot/compressed/misc.o: In function `zlib_updatewindow':
  misc.c:(.text+0x190): undefined reference to `memcpy'
  misc.c:(.text+0x190): relocation truncated to fit: R_M32R_26_PLTREL against undefined symbol `memcpy'
  make[5]: *** [arch/m32r/boot/compressed/vmlinux] Error 1

by adding our own implementation of memcpy().

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agom32r: consistently use "suffix-$(...)"
Geert Uytterhoeven [Tue, 17 Jul 2012 22:48:02 +0000 (15:48 -0700)]
m32r: consistently use "suffix-$(...)"

commit df12aef6a19bb2d69859a94936bda0e6ccaf3327 upstream.

Commit a556bec9955c ("m32r: fix arch/m32r/boot/compressed/Makefile")
changed "$(suffix_y)" to "$(suffix-y)", but didn't update any location
where "suffix_y" is set, causing:

  make[5]: *** No rule to make target `arch/m32r/boot/compressed/vmlinux.bin.', needed by `arch/m32r/boot/compressed/piggy.o'.  Stop.
  make[4]: *** [arch/m32r/boot/compressed/vmlinux] Error 2
  make[3]: *** [zImage] Error 2

Correct the other locations to fix this.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agopci: frv architecture needs generic setup-bus infrastructure
Paul Gortmaker [Wed, 18 Apr 2012 21:17:19 +0000 (17:17 -0400)]
pci: frv architecture needs generic setup-bus infrastructure

commit cd0a2bfb77a3edeecd652081e0b1a163d3b0696b upstream.

Otherwise we get this link failure for frv's defconfig:

   LD      .tmp_vmlinux1
 drivers/built-in.o: In function `pci_assign_resource':
 (.text+0xbf0c): undefined reference to `pci_cardbus_resource_alignment'
 drivers/built-in.o: In function `pci_setup':
 pci.c:(.init.text+0x174): undefined reference to `pci_realloc_get_opt'
 pci.c:(.init.text+0x1a0): undefined reference to `pci_realloc_get_opt'
 make[1]: *** [.tmp_vmlinux1] Error 1

Cc: David Howells <dhowells@redhat.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoPARISC: include <linux/prefetch.h> in drivers/parisc/iommu-helpers.h
Cong Wang [Fri, 3 Feb 2012 07:34:16 +0000 (15:34 +0800)]
PARISC: include <linux/prefetch.h> in drivers/parisc/iommu-helpers.h

commit 650275dbfb2f4c12bc91420ad5a99f955eabec98 upstream.

drivers/parisc/iommu-helpers.h:62: error: implicit declaration of function 'prefetchw'
make[3]: *** [drivers/parisc/sba_iommu.o] Error 1

drivers/parisc/iommu-helpers.h needs to #include <linux/prefetch.h>
where prefetchw is declared.

Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agotipc: fix lockdep warning during bearer initialization
Ying Xue [Thu, 16 Aug 2012 12:09:07 +0000 (12:09 +0000)]
tipc: fix lockdep warning during bearer initialization

[ Upstream commit 4225a398c1352a7a5c14dc07277cb5cc4473983b ]

When the lockdep validator is enabled, it will report the below
warning when we enable a TIPC bearer:

[ INFO: possible irq lock inversion dependency detected ]
---------------------------------------------------------
Possible interrupt unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(ptype_lock);
                                local_irq_disable();
                                lock(tipc_net_lock);
                                lock(ptype_lock);
   <Interrupt>
   lock(tipc_net_lock);

  *** DEADLOCK ***

the shortest dependencies between 2nd lock and 1st lock:
  -> (ptype_lock){+.+...} ops: 10 {
[...]
SOFTIRQ-ON-W at:
                      [<c1089418>] __lock_acquire+0x528/0x13e0
                      [<c108a360>] lock_acquire+0x90/0x100
                      [<c1553c38>] _raw_spin_lock+0x38/0x50
                      [<c14651ca>] dev_add_pack+0x3a/0x60
                      [<c182da75>] arp_init+0x1a/0x48
                      [<c182dce5>] inet_init+0x181/0x27e
                      [<c1001114>] do_one_initcall+0x34/0x170
                      [<c17f7329>] kernel_init+0x110/0x1b2
                      [<c155b6a2>] kernel_thread_helper+0x6/0x10
[...]
   ... key      at: [<c17e4b10>] ptype_lock+0x10/0x20
   ... acquired at:
    [<c108a360>] lock_acquire+0x90/0x100
    [<c1553c38>] _raw_spin_lock+0x38/0x50
    [<c14651ca>] dev_add_pack+0x3a/0x60
    [<c8bc18d2>] enable_bearer+0xf2/0x140 [tipc]
    [<c8bb283a>] tipc_enable_bearer+0x1ba/0x450 [tipc]
    [<c8bb3a04>] tipc_cfg_do_cmd+0x5c4/0x830 [tipc]
    [<c8bbc032>] handle_cmd+0x42/0xd0 [tipc]
    [<c148e802>] genl_rcv_msg+0x232/0x280
    [<c148d3f6>] netlink_rcv_skb+0x86/0xb0
    [<c148e5bc>] genl_rcv+0x1c/0x30
    [<c148d144>] netlink_unicast+0x174/0x1f0
    [<c148ddab>] netlink_sendmsg+0x1eb/0x2d0
    [<c1456bc1>] sock_aio_write+0x161/0x170
    [<c1135a7c>] do_sync_write+0xac/0xf0
    [<c11360f6>] vfs_write+0x156/0x170
    [<c11361e2>] sys_write+0x42/0x70
    [<c155b0df>] sysenter_do_call+0x12/0x38
[...]
}
  -> (tipc_net_lock){+..-..} ops: 4 {
[...]
    IN-SOFTIRQ-R at:
                     [<c108953a>] __lock_acquire+0x64a/0x13e0
                     [<c108a360>] lock_acquire+0x90/0x100
                     [<c15541cd>] _raw_read_lock_bh+0x3d/0x50
                     [<c8bb874d>] tipc_recv_msg+0x1d/0x830 [tipc]
                     [<c8bc195f>] recv_msg+0x3f/0x50 [tipc]
                     [<c146a5fa>] __netif_receive_skb+0x22a/0x590
                     [<c146ab0b>] netif_receive_skb+0x2b/0xf0
                     [<c13c43d2>] pcnet32_poll+0x292/0x780
                     [<c146b00a>] net_rx_action+0xfa/0x1e0
                     [<c103a4be>] __do_softirq+0xae/0x1e0
[...]
}

>From the log, we can see three different call chains between
CPU0 and CPU1:

Time 0 on CPU0:

  kernel_init()->inet_init()->dev_add_pack()

At time 0, the ptype_lock is held by CPU0 in dev_add_pack();

Time 1 on CPU1:

  tipc_enable_bearer()->enable_bearer()->dev_add_pack()

At time 1, tipc_enable_bearer() first holds tipc_net_lock, and then
wants to take ptype_lock to register TIPC protocol handler into the
networking stack.  But the ptype_lock has been taken by dev_add_pack()
on CPU0, so at this time the dev_add_pack() running on CPU1 has to be
busy looping.

Time 2 on CPU0:

  netif_receive_skb()->recv_msg()->tipc_recv_msg()

At time 2, an incoming TIPC packet arrives at CPU0, hence
tipc_recv_msg() will be invoked. In tipc_recv_msg(), it first wants
to hold tipc_net_lock.  At the moment, below scenario happens:

On CPU0, below is our sequence of taking locks:

  lock(ptype_lock)->lock(tipc_net_lock)

On CPU1, our sequence of taking locks looks like:

  lock(tipc_net_lock)->lock(ptype_lock)

Obviously deadlock may happen in this case.

But please note the deadlock possibly doesn't occur at all when the
first TIPC bearer is enabled.  Before enable_bearer() -- running on
CPU1 does not hold ptype_lock, so the TIPC receive handler (i.e.
recv_msg()) is not registered successfully via dev_add_pack(), so
the tipc_recv_msg() cannot be called by recv_msg() even if a TIPC
message comes to CPU0. But when the second TIPC bearer is
registered, the deadlock can perhaps really happen.

To fix it, we will push the work of registering TIPC protocol
handler into workqueue context. After the change, both paths taking
ptype_lock are always in process contexts, thus, the deadlock should
never occur.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoICMPv6: treat dest unreachable codes 5 and 6 as EACCES, not EPROTO
Jiri Bohac [Fri, 30 Aug 2013 09:18:45 +0000 (11:18 +0200)]
ICMPv6: treat dest unreachable codes 5 and 6 as EACCES, not EPROTO

[ Upstream commit 61e76b178dbe7145e8d6afa84bb4ccea71918994 ]

RFC 4443 has defined two additional codes for ICMPv6 type 1 (destination
unreachable) messages:
        5 - Source address failed ingress/egress policy
6 - Reject route to destination

Now they are treated as protocol error and icmpv6_err_convert() converts them
to EPROTO.

RFC 4443 says:
"Codes 5 and 6 are more informative subsets of code 1."

Treat codes 5 and 6 as code 1 (EACCES)

Btw, connect() returning -EPROTO confuses firefox, so that fallback to
other/IPv4 addresses does not work:
https://bugzilla.mozilla.org/show_bug.cgi?id=910773

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agonet: bridge: convert MLDv2 Query MRC into msecs_to_jiffies for max_delay
Daniel Borkmann [Thu, 29 Aug 2013 21:55:05 +0000 (23:55 +0200)]
net: bridge: convert MLDv2 Query MRC into msecs_to_jiffies for max_delay

[ Upstream commit 2d98c29b6fb3de44d9eaa73c09f9cf7209346383 ]

While looking into MLDv1/v2 code, I noticed that bridging code does
not convert it's max delay into jiffies for MLDv2 messages as we do
in core IPv6' multicast code.

RFC3810, 5.1.3. Maximum Response Code says:

  The Maximum Response Code field specifies the maximum time allowed
  before sending a responding Report. The actual time allowed, called
  the Maximum Response Delay, is represented in units of milliseconds,
  and is derived from the Maximum Response Code as follows: [...]

As we update timers that work with jiffies, we need to convert it.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Linus LĂĽssing <linus.luessing@web.de>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoipv6: Don't depend on per socket memory for neighbour discovery messages
Thomas Graf [Tue, 3 Sep 2013 11:37:01 +0000 (13:37 +0200)]
ipv6: Don't depend on per socket memory for neighbour discovery messages

[ Upstream commit 25a6e6b84fba601eff7c28d30da8ad7cfbef0d43 ]

Allocating skbs when sending out neighbour discovery messages
currently uses sock_alloc_send_skb() based on a per net namespace
socket and thus share a socket wmem buffer space.

If a netdevice is temporarily unable to transmit due to carrier
loss or for other reasons, the queued up ndisc messages will cosnume
all of the wmem space and will thus prevent from any more skbs to
be allocated even for netdevices that are able to transmit packets.

The number of neighbour discovery messages sent is very limited,
use of alloc_skb() bypasses the socket wmem buffer size enforcement
while the manual call to skb_set_owner_w() maintains the socket
reference needed for the IPv6 output path.

This patch has orginally been posted by Eric Dumazet in a modified
form.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: Fabio Estevam <festevam@gmail.com>
Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoipv6: drop packets with multiple fragmentation headers
Hannes Frederic Sowa [Fri, 16 Aug 2013 11:30:07 +0000 (13:30 +0200)]
ipv6: drop packets with multiple fragmentation headers

[ Upstream commit f46078cfcd77fa5165bf849f5e568a7ac5fa569c ]

It is not allowed for an ipv6 packet to contain multiple fragmentation
headers. So discard packets which were already reassembled by
fragmentation logic and send back a parameter problem icmp.

The updates for RFC 6980 will come in later, I have to do a bit more
research here.

Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoipv6: remove max_addresses check from ipv6_create_tempaddr
Hannes Frederic Sowa [Fri, 16 Aug 2013 11:02:27 +0000 (13:02 +0200)]
ipv6: remove max_addresses check from ipv6_create_tempaddr

[ Upstream commit 4b08a8f1bd8cb4541c93ec170027b4d0782dab52 ]

Because of the max_addresses check attackers were able to disable privacy
extensions on an interface by creating enough autoconfigured addresses:

<http://seclists.org/oss-sec/2012/q4/292>

But the check is not actually needed: max_addresses protects the
kernel to install too many ipv6 addresses on an interface and guards
addrconf_prefix_rcv to install further addresses as soon as this limit
is reached. We only generate temporary addresses in direct response of
a new address showing up. As soon as we filled up the maximum number of
addresses of an interface, we stop installing more addresses and thus
also stop generating more temp addresses.

Even if the attacker tries to generate a lot of temporary addresses
by announcing a prefix and removing it again (lifetime == 0) we won't
install more temp addresses, because the temporary addresses do count
to the maximum number of addresses, thus we would stop installing new
autoconfigured addresses when the limit is reached.

This patch fixes CVE-2013-0343 (but other layer-2 attacks are still
possible).

Thanks to Ding Tianhong to bring this topic up again.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Ding Tianhong <dingtianhong@huawei.com>
Cc: George Kargiotakis <kargig@void.gr>
Cc: P J P <ppandit@redhat.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agotun: signedness bug in tun_get_user()
Dan Carpenter [Thu, 15 Aug 2013 12:52:57 +0000 (15:52 +0300)]
tun: signedness bug in tun_get_user()

[ Upstream commit 15718ea0d844e4816dbd95d57a8a0e3e264ba90e ]

The recent fix d9bf5f1309 "tun: compare with 0 instead of total_len" is
not totally correct.  Because "len" and "sizeof()" are size_t type, that
means they are never less than zero.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoipv6: don't stop backtracking in fib6_lookup_1 if subtree does not match
Hannes Frederic Sowa [Wed, 7 Aug 2013 00:34:31 +0000 (02:34 +0200)]
ipv6: don't stop backtracking in fib6_lookup_1 if subtree does not match

[ Upstream commit 3e3be275851bc6fc90bfdcd732cd95563acd982b ]

In case a subtree did not match we currently stop backtracking and return
NULL (root table from fib_lookup). This could yield in invalid routing
table lookups when using subtrees.

Instead continue to backtrack until a valid subtree or node is found
and return this match.

Also remove unneeded NULL check.

Reported-by: Teco Boot <teco@inf-net.nl>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: David Lamparter <equinox@diac24.net>
Cc: <boutier@pps.univ-paris-diderot.fr>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agotcp: cubic: fix bug in bictcp_acked()
Eric Dumazet [Tue, 6 Aug 2013 03:05:12 +0000 (20:05 -0700)]
tcp: cubic: fix bug in bictcp_acked()

[ Upstream commit cd6b423afd3c08b27e1fed52db828ade0addbc6b ]

While investigating about strange increase of retransmit rates
on hosts ~24 days after boot, Van found hystart was disabled
if ca->epoch_start was 0, as following condition is true
when tcp_time_stamp high order bit is set.

(s32)(tcp_time_stamp - ca->epoch_start) < HZ

Quoting Van :

 At initialization & after every loss ca->epoch_start is set to zero so
 I believe that the above line will turn off hystart as soon as the 2^31
 bit is set in tcp_time_stamp & hystart will stay off for 24 days.
 I think we've observed that cubic's restart is too aggressive without
 hystart so this might account for the higher drop rate we observe.

Diagnosed-by: Van Jacobson <vanj@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agotcp: cubic: fix overflow error in bictcp_update()
Eric Dumazet [Tue, 6 Aug 2013 00:10:15 +0000 (17:10 -0700)]
tcp: cubic: fix overflow error in bictcp_update()

[ Upstream commit 2ed0edf9090bf4afa2c6fc4f38575a85a80d4b20 ]

commit 17a6e9f1aa9 ("tcp_cubic: fix clock dependency") added an
overflow error in bictcp_update() in following code :

/* change the unit from HZ to bictcp_HZ */
t = ((tcp_time_stamp + msecs_to_jiffies(ca->delay_min>>3) -
      ca->epoch_start) << BICTCP_HZ) / HZ;

Because msecs_to_jiffies() being unsigned long, compiler does
implicit type promotion.

We really want to constrain (tcp_time_stamp - ca->epoch_start)
to a signed 32bit value, or else 't' has unexpected high values.

This bugs triggers an increase of retransmit rates ~24 days after
boot [1], as the high order bit of tcp_time_stamp flips.

[1] for hosts with HZ=1000

Big thanks to Van Jacobson for spotting this problem.

Diagnosed-by: Van Jacobson <vanj@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agofib_trie: remove potential out of bound access
Eric Dumazet [Mon, 5 Aug 2013 18:18:49 +0000 (11:18 -0700)]
fib_trie: remove potential out of bound access

[ Upstream commit aab515d7c32a34300312416c50314e755ea6f765 ]

AddressSanitizer [1] dynamic checker pointed a potential
out of bound access in leaf_walk_rcu()

We could allocate one more slot in tnode_new() to leave the prefetch()
in-place but it looks not worth the pain.

Bug added in commit 82cfbb008572b ("[IPV4] fib_trie: iterator recode")

[1] :
https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agonet: check net.core.somaxconn sysctl values
Roman Gushchin [Fri, 2 Aug 2013 14:36:40 +0000 (18:36 +0400)]
net: check net.core.somaxconn sysctl values

[ Upstream commit 5f671d6b4ec3e6d66c2a868738af2cdea09e7509 ]

It's possible to assign an invalid value to the net.core.somaxconn
sysctl variable, because there is no checks at all.

The sk_max_ack_backlog field of the sock structure is defined as
unsigned short. Therefore, the backlog argument in inet_listen()
shouldn't exceed USHRT_MAX. The backlog argument in the listen() syscall
is truncated to the somaxconn value. So, the somaxconn value shouldn't
exceed 65535 (USHRT_MAX).
Also, negative values of somaxconn are meaningless.

before:
$ sysctl -w net.core.somaxconn=256
net.core.somaxconn = 256
$ sysctl -w net.core.somaxconn=65536
net.core.somaxconn = 65536
$ sysctl -w net.core.somaxconn=-100
net.core.somaxconn = -100

after:
$ sysctl -w net.core.somaxconn=256
net.core.somaxconn = 256
$ sysctl -w net.core.somaxconn=65536
error: "Invalid argument" setting key "net.core.somaxconn"
$ sysctl -w net.core.somaxconn=-100
error: "Invalid argument" setting key "net.core.somaxconn"

Based on a prior patch from Changli Gao.

Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Reported-by: Changli Gao <xiaosuo@gmail.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agohtb: fix sign extension bug
stephen hemminger [Fri, 2 Aug 2013 05:32:07 +0000 (22:32 -0700)]
htb: fix sign extension bug

[ Upstream commit cbd375567f7e4811b1c721f75ec519828ac6583f ]

When userspace passes a large priority value
the assignment of the unsigned value hopt->prio
to  signed int cl->prio causes cl->prio to become negative and the
comparison is with TC_HTB_NUMPRIO is always false.

The result is that HTB crashes by referencing outside
the array when processing packets. With this patch the large value
wraps around like other values outside the normal range.

See: https://bugzilla.kernel.org/show_bug.cgi?id=60669

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoLinux 3.0.95 v3.0.95
Greg Kroah-Hartman [Sun, 8 Sep 2013 04:49:47 +0000 (21:49 -0700)]
Linux 3.0.95

10 years agoSCSI: sg: Fix user memory corruption when SG_IO is interrupted by a signal
Roland Dreier [Tue, 6 Aug 2013 00:55:01 +0000 (17:55 -0700)]
SCSI: sg: Fix user memory corruption when SG_IO is interrupted by a signal

commit 35dc248383bbab0a7203fca4d722875bc81ef091 upstream.

There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances
leads to one process writing data into the address space of some other
random unrelated process if the ioctl is interrupted by a signal.
What happens is the following:

 - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the
   underlying SCSI command will transfer data from the SCSI device to
   the buffer provided in the ioctl)

 - Before the command finishes, a signal is sent to the process waiting
   in the ioctl.  This will end up waking up the sg_ioctl() code:

result = wait_event_interruptible(sfp->read_wait,
(srp_done(sfp, srp) || sdp->detached));

   but neither srp_done() nor sdp->detached is true, so we end up just
   setting srp->orphan and returning to userspace:

srp->orphan = 1;
write_unlock_irq(&sfp->rq_list_lock);
return result; /* -ERESTARTSYS because signal hit process */

   At this point the original process is done with the ioctl and
   blithely goes ahead handling the signal, reissuing the ioctl, etc.

 - Eventually, the SCSI command issued by the first ioctl finishes and
   ends up in sg_rq_end_io().  At the end of that function, we run through:

write_lock_irqsave(&sfp->rq_list_lock, iflags);
if (unlikely(srp->orphan)) {
if (sfp->keep_orphan)
srp->sg_io_owned = 0;
else
done = 0;
}
srp->done = done;
write_unlock_irqrestore(&sfp->rq_list_lock, iflags);

if (likely(done)) {
/* Now wake up any sg_read() that is waiting for this
 * packet.
 */
wake_up_interruptible(&sfp->read_wait);
kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
kref_put(&sfp->f_ref, sg_remove_sfp);
} else {
INIT_WORK(&srp->ew.work, sg_rq_end_io_usercontext);
schedule_work(&srp->ew.work);
}

   Since srp->orphan *is* set, we set done to 0 (assuming the
   userspace app has not set keep_orphan via an SG_SET_KEEP_ORPHAN
   ioctl), and therefore we end up scheduling sg_rq_end_io_usercontext()
   to run in a workqueue.

 - In workqueue context we go through sg_rq_end_io_usercontext() ->
   sg_finish_rem_req() -> blk_rq_unmap_user() -> ... ->
   bio_uncopy_user() -> __bio_copy_iov() -> copy_to_user().

   The key point here is that we are doing copy_to_user() on a
   workqueue -- that is, we're on a kernel thread with current->mm
   equal to whatever random previous user process was scheduled before
   this kernel thread.  So we end up copying whatever data the SCSI
   command returned to the virtual address of the buffer passed into
   the original ioctl, but it's quite likely we do this copying into a
   different address space!

As suggested by James Bottomley <James.Bottomley@hansenpartnership.com>,
add a check for current->mm (which is NULL if we're on a kernel thread
without a real userspace address space) in bio_uncopy_user(), and skip
the copy if we're on a kernel thread.

There's no reason that I can think of for any caller of bio_uncopy_user()
to want to do copying on a kernel thread with a random active userspace
address space.

Huge thanks to Costa Sapuntzakis <costa@purestorage.com> for the
original pointer to this bug in the sg code.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Tested-by: David Milburn <dmilburn@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
[lizf: backported to 3.4:
 - Use __bio_for_each_segment() instead of bio_for_each_segment_all()]
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agotarget: Fix trailing ASCII space usage in INQUIRY vendor+model
Nicholas Bellinger [Wed, 24 Jul 2013 23:15:08 +0000 (16:15 -0700)]
target: Fix trailing ASCII space usage in INQUIRY vendor+model

commit ee60bddba5a5f23e39598195d944aa0eb2d455e5 upstream.

This patch fixes spc_emulate_inquiry_std() to add trailing ASCII
spaces for INQUIRY vendor + model fields following SPC-4 text:

  "ASCII data fields described as being left-aligned shall have any
   unused bytes at the end of the field (i.e., highest offset) and
   the unused bytes shall be filled with ASCII space characters (20h)."

This addresses a problem with Falconstor NSS multipathing.

Reported-by: Tomas Molota <tomas.molota@lightstorm.sk>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoACPI / EC: Add ASUSTEK L4R to quirk list in order to validate ECDT
Lan Tianyu [Mon, 26 Aug 2013 02:19:18 +0000 (10:19 +0800)]
ACPI / EC: Add ASUSTEK L4R to quirk list in order to validate ECDT

commit 524f42fab787a9510be826ce3d736b56d454ac6d upstream.

The ECDT of ASUSTEK L4R doesn't provide correct command and data
I/O ports.  The DSDT provides the correct information instead.

For this reason, add this machine to quirk list for ECDT validation
and use the EC information from the DSDT.

[rjw: Changelog]
References: https://bugzilla.kernel.org/show_bug.cgi?id=60765
Reported-and-tested-by: Daniele Esposti <expo@expobrain.net>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoath9k_htc: Restore skb headroom when returning skb to mac80211
Helmut Schaa [Fri, 16 Aug 2013 19:39:40 +0000 (21:39 +0200)]
ath9k_htc: Restore skb headroom when returning skb to mac80211

commit d2e9fc141e2aa21f4b35ee27072d84e9aa6e2ba0 upstream.

ath9k_htc adds padding between the 802.11 header and the payload during
TX by moving the header. When handing the frame back to mac80211 for TX
status handling the header is not moved back into its original position.
This can result in a too small skb headroom when entering ath9k_htc
again (due to a soft retransmission for example) causing an
skb_under_panic oops.

Fix this by moving the 802.11 header back into its original position
before returning the frame to mac80211 as other drivers like rt2x00
or ath5k do.

Reported-by: Marc Kleine-Budde <mkl@blackshift.org>
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Tested-by: Marc Kleine-Budde <mkl@blackshift.org>
Signed-off-by: Marc Kleine-Budde <mkl@blackshift.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agodrivers/base/memory.c: fix show_mem_removable() to handle missing sections
Russ Anderson [Wed, 28 Aug 2013 23:35:18 +0000 (16:35 -0700)]
drivers/base/memory.c: fix show_mem_removable() to handle missing sections

commit 21ea9f5ace3a7317cc3ba1fbc749758021a83136 upstream.

"cat /sys/devices/system/memory/memory*/removable" crashed the system.

The problem is that show_mem_removable() is passing a
bad pfn to is_mem_section_removable(), which causes

    if (!node_online(page_to_nid(page)))

to blow up.  Why is it passing in a bad pfn?

The reason is that show_mem_removable() will loop sections_per_block
times.  sections_per_block is 16, but mem->section_count is 8,
indicating holes in this memory block.  Checking that the memory section
is present before checking to see if the memory section is removable
fixes the problem.

   harp5-sys:~ # cat /sys/devices/system/memory/memory*/removable
   0
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   BUG: unable to handle kernel paging request at ffffea00c3200000
   IP: [<ffffffff81117ed1>] is_pageblock_removable_nolock+0x1/0x90
   PGD 83ffd4067 PUD 37bdfce067 PMD 0
   Oops: 0000 [#1] SMP
   Modules linked in: autofs4 binfmt_misc rdma_ucm rdma_cm iw_cm ib_addr ib_srp scsi_transport_srp scsi_tgt ib_ipoib ib_cm ib_uverbs ib_umad iw_cxgb3 cxgb3 mdio mlx4_en mlx4_ib ib_sa mlx4_core ib_mthca ib_mad ib_core fuse nls_iso8859_1 nls_cp437 vfat fat joydev loop hid_generic usbhid hid hwperf(O) numatools(O) dm_mod iTCO_wdt ipv6 iTCO_vendor_support igb i2c_i801 ioatdma i2c_algo_bit ehci_pci pcspkr lpc_ich i2c_core ehci_hcd ptp sg mfd_core dca rtc_cmos pps_core mperf button xhci_hcd sd_mod crc_t10dif usbcore usb_common scsi_dh_emc scsi_dh_hp_sw scsi_dh_alua scsi_dh_rdac scsi_dh gru(O) xvma(O) xfs crc32c libcrc32c thermal sata_nv processor piix mptsas mptscsih scsi_transport_sas mptbase megaraid_sas fan thermal_sys hwmon ext3 jbd ata_piix ahci libahci libata scsi_mod
   CPU: 4 PID: 5991 Comm: cat Tainted: G           O 3.11.0-rc5-rja-uv+ #10
   Hardware name: SGI UV2000/ROMLEY, BIOS SGI UV 2000/3000 series BIOS 01/15/2013
   task: ffff88081f034580 ti: ffff880820022000 task.ti: ffff880820022000
   RIP: 0010:[<ffffffff81117ed1>]  [<ffffffff81117ed1>] is_pageblock_removable_nolock+0x1/0x90
   RSP: 0018:ffff880820023df8  EFLAGS: 00010287
   RAX: 0000000000040000 RBX: ffffea00c3200000 RCX: 0000000000000004
   RDX: ffffea00c30b0000 RSI: 00000000001c0000 RDI: ffffea00c3200000
   RBP: ffff880820023e38 R08: 0000000000000000 R09: 0000000000000001
   R10: 0000000000000000 R11: 0000000000000001 R12: ffffea00c33c0000
   R13: 0000160000000000 R14: 6db6db6db6db6db7 R15: 0000000000000001
   FS:  00007ffff7fb2700(0000) GS:ffff88083fc80000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   CR2: ffffea00c3200000 CR3: 000000081b954000 CR4: 00000000000407e0
   Call Trace:
     show_mem_removable+0x41/0x70
     dev_attr_show+0x2a/0x60
     sysfs_read_file+0xf7/0x1c0
     vfs_read+0xc8/0x130
     SyS_read+0x5d/0xa0
     system_call_fastpath+0x16/0x1b

Signed-off-by: Russ Anderson <rja@sgi.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoALSA: opti9xx: Fix conflicting driver object name
Takashi Iwai [Tue, 27 Aug 2013 10:03:01 +0000 (12:03 +0200)]
ALSA: opti9xx: Fix conflicting driver object name

commit fb615499f0ad28ed74201c1cdfddf9e64e205424 upstream.

The recent commit to delay the release of kobject triggered NULL
dereferences of opti9xx drivers.  The cause is that all
snd-opti92x-ad1848, snd-opti92x-cs4231 and snd-opti93x drivers
register the PnP card driver with the very same name, and also
snd-opti92x-ad1848 and -cs4231 drivers register the ISA driver with
the same name, too.  When these drivers are built in, quick
"register-release-and-re-register" actions occur, and this results in
Oops because of the same name is assigned to the kobject.

The fix is simply to assign individual names.  As a bonus, by using
KBUILD_MODNAME, the patch reduces more lines than it adds.

The fix is based on the suggestion by Russell King.

Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agojfs: fix readdir cookie incompatibility with NFSv4
Dave Kleikamp [Thu, 15 Aug 2013 20:36:49 +0000 (15:36 -0500)]
jfs: fix readdir cookie incompatibility with NFSv4

commit 44512449c0ab368889dd13ae0031fba74ee7e1d2 upstream.

NFSv4 reserves readdir cookie values 0-2 for special entries (. and ..),
but jfs allows a value of 2 for a non-special entry. This incompatibility
can result in the nfs client reporting a readdir loop.

This patch doesn't change the value stored internally, but adds one to
the value exposed to the iterate method.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
[bwh: Backported to 3.2:
 - Adjust context
 - s/ctx->pos/filp->f_pos/]
Tested-by: Christian Kujau <lists@nerdbynature.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoLinux 3.0.94 v3.0.94
Greg Kroah-Hartman [Thu, 29 Aug 2013 16:43:15 +0000 (09:43 -0700)]
Linux 3.0.94

10 years agoSCSI: zfcp: fix schedule-inside-lock in scsi_device list loops
Martin Peschke [Thu, 22 Aug 2013 15:45:37 +0000 (17:45 +0200)]
SCSI: zfcp: fix schedule-inside-lock in scsi_device list loops

commit 924dd584b198a58aa7cb3efefd8a03326550ce8f upstream.

BUG: sleeping function called from invalid context at kernel/workqueue.c:2752
in_atomic(): 1, irqs_disabled(): 1, pid: 360, name: zfcperp0.0.1700
CPU: 1 Not tainted 3.9.3+ #69
Process zfcperp0.0.1700 (pid: 360, task: 0000000075b7e080, ksp: 000000007476bc30)
<snip>
Call Trace:
([<00000000001165de>] show_trace+0x106/0x154)
 [<00000000001166a0>] show_stack+0x74/0xf4
 [<00000000006ff646>] dump_stack+0xc6/0xd4
 [<000000000017f3a0>] __might_sleep+0x128/0x148
 [<000000000015ece8>] flush_work+0x54/0x1f8
 [<00000000001630de>] __cancel_work_timer+0xc6/0x128
 [<00000000005067ac>] scsi_device_dev_release_usercontext+0x164/0x23c
 [<0000000000161816>] execute_in_process_context+0x96/0xa8
 [<00000000004d33d8>] device_release+0x60/0xc0
 [<000000000048af48>] kobject_release+0xa8/0x1c4
 [<00000000004f4bf2>] __scsi_iterate_devices+0xfa/0x130
 [<000003ff801b307a>] zfcp_erp_strategy+0x4da/0x1014 [zfcp]
 [<000003ff801b3caa>] zfcp_erp_thread+0xf6/0x2b0 [zfcp]
 [<000000000016b75a>] kthread+0xf2/0xfc
 [<000000000070c9de>] kernel_thread_starter+0x6/0xc
 [<000000000070c9d8>] kernel_thread_starter+0x0/0xc

Apparently, the ref_count for some scsi_device drops down to zero,
triggering device removal through execute_in_process_context(), while
the lldd error recovery thread iterates through a scsi device list.
Unfortunately, execute_in_process_context() decides to immediately
execute that device removal function, instead of scheduling asynchronous
execution, since it detects process context and thinks it is safe to do
so. But almost all calls to shost_for_each_device() in our lldd are
inside spin_lock_irq, even in thread context. Obviously, schedule()
inside spin_lock_irq sections is a bad idea.

Change the lldd to use the proper iterator function,
__shost_for_each_device(), in combination with required locking.

Occurences that need to be changed include all calls in zfcp_erp.c,
since those might be executed in zfcp error recovery thread context
with a lock held.

Other occurences of shost_for_each_device() in zfcp_fsf.c do not
need to be changed (no process context, no surrounding locking).

The problem was introduced in Linux 2.6.37 by commit
b62a8d9b45b971a67a0f8413338c230e3117dff5
"[SCSI] zfcp: Use SCSI device data zfcp_scsi_dev instead of zfcp_unit".

Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoSCSI: zfcp: fix lock imbalance by reworking request queue locking
Martin Peschke [Thu, 22 Aug 2013 15:45:36 +0000 (17:45 +0200)]
SCSI: zfcp: fix lock imbalance by reworking request queue locking

commit d79ff142624e1be080ad8d09101f7004d79c36e1 upstream.

This patch adds wait_event_interruptible_lock_irq_timeout(), which is a
straight-forward descendant of wait_event_interruptible_timeout() and
wait_event_interruptible_lock_irq().

The zfcp driver used to call wait_event_interruptible_timeout()
in combination with some intricate and error-prone locking. Using
wait_event_interruptible_lock_irq_timeout() as a replacement
nicely cleans up that locking.

This rework removes a situation that resulted in a locking imbalance
in zfcp_qdio_sbal_get():

BUG: workqueue leaked lock or atomic: events/1/0xffffff00/10
    last function: zfcp_fc_wka_port_offline+0x0/0xa0 [zfcp]

It was introduced by commit c2af7545aaff3495d9bf9a7608c52f0af86fb194
"[SCSI] zfcp: Do not wait for SBALs on stopped queue", which had a new
code path related to ZFCP_STATUS_ADAPTER_QDIOUP that took an early exit
without a required lock being held. The problem occured when a
special, non-SCSI I/O request was being submitted in process context,
when the adapter's queues had been torn down. In this case the bug
surfaced when the Fibre Channel port connection for a well-known address
was closed during a concurrent adapter shut-down procedure, which is a
rare constellation.

This patch also fixes these warnings from the sparse tool (make C=1):

drivers/s390/scsi/zfcp_qdio.c:224:12: warning: context imbalance in
 'zfcp_qdio_sbal_check' - wrong count at exit
drivers/s390/scsi/zfcp_qdio.c:244:5: warning: context imbalance in
 'zfcp_qdio_sbal_get' - unexpected unlock

Last but not least, we get rid of that crappy lock-unlock-lock
sequence at the beginning of the critical section.

It is okay to call zfcp_erp_adapter_reopen() with req_q_lock held.

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agolibata: apply behavioral quirks to sil3826 PMP
Terry Suereth [Sat, 17 Aug 2013 19:53:12 +0000 (15:53 -0400)]
libata: apply behavioral quirks to sil3826 PMP

commit 8ffff94d20b7eb446e848e0046107d51b17a20a8 upstream.

Fixing support for the Silicon Image 3826 port multiplier, by applying
to it the same quirks applied to the Silicon Image 3726.  Specifically
fixes the repeated timeout/reset process which previously afflicted
the 3726, as described from line 290.  Slightly based on notes from:

https://bugzilla.redhat.com/show_bug.cgi?id=890237

Signed-off-by: Terry Suereth <terry.suereth@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoHostap: copying wrong data prism2_ioctl_giwaplist()
Dan Carpenter [Fri, 9 Aug 2013 09:52:31 +0000 (12:52 +0300)]
Hostap: copying wrong data prism2_ioctl_giwaplist()

commit 909bd5926d474e275599094acad986af79671ac9 upstream.

We want the data stored in "addr" and "qual", but the extra ampersands
mean we are copying stack data instead.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agonilfs2: fix issue with counting number of bio requests for BIO_EOPNOTSUPP error detection
Vyacheslav Dubeyko [Thu, 22 Aug 2013 23:35:45 +0000 (16:35 -0700)]
nilfs2: fix issue with counting number of bio requests for BIO_EOPNOTSUPP error detection

commit 4bf93b50fd04118ac7f33a3c2b8a0a1f9fa80bc9 upstream.

Fix the issue with improper counting number of flying bio requests for
BIO_EOPNOTSUPP error detection case.

The sb_nbio must be incremented exactly the same number of times as
complete() function was called (or will be called) because
nilfs_segbuf_wait() will call wail_for_completion() for the number of
times set to sb_nbio:

  do {
      wait_for_completion(&segbuf->sb_bio_event);
  } while (--segbuf->sb_nbio > 0);

Two functions complete() and wait_for_completion() must be called the
same number of times for the same sb_bio_event.  Otherwise,
wait_for_completion() will hang or leak.

Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agonilfs2: remove double bio_put() in nilfs_end_bio_write() for BIO_EOPNOTSUPP error
Vyacheslav Dubeyko [Thu, 22 Aug 2013 23:35:44 +0000 (16:35 -0700)]
nilfs2: remove double bio_put() in nilfs_end_bio_write() for BIO_EOPNOTSUPP error

commit 2df37a19c686c2d7c4e9b4ce1505b5141e3e5552 upstream.

Remove double call of bio_put() in nilfs_end_bio_write() for the case of
BIO_EOPNOTSUPP error detection.  The issue was found by Dan Carpenter
and he suggests first version of the fix too.

Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoof: fdt: fix memory initialization for expanded DT
Wladislav Wiebe [Mon, 12 Aug 2013 11:06:53 +0000 (13:06 +0200)]
of: fdt: fix memory initialization for expanded DT

commit 9e40127526e857fa3f29d51e83277204fbdfc6ba upstream.

Already existing property flags are filled wrong for properties created from
initial FDT. This could cause problems if this DYNAMIC device-tree functions
are used later, i.e. properties are attached/detached/replaced. Simply dumping
flags from the running system show, that some initial static (not allocated via
kzmalloc()) nodes are marked as dynamic.

I putted some debug extensions to property_proc_show(..) :
..
+       if (OF_IS_DYNAMIC(pp))
+               pr_err("DEBUG: xxx : OF_IS_DYNAMIC\n");
+       if (OF_IS_DETACHED(pp))
+               pr_err("DEBUG: xxx : OF_IS_DETACHED\n");

when you operate on the nodes (e.g.: ~$ cat /proc/device-tree/*some_node*) you
will see that those flags are filled wrong, basically in most cases it will dump
a DYNAMIC or DETACHED status, which is in not true.
(BTW. this OF_IS_DETACHED is a own define for debug purposes which which just
make a test_bit(OF_DETACHED, &x->_flags)

If nodes are dynamic kernel is allowed to kfree() them. But it will crash
attempting to do so on the nodes from FDT -- they are not allocated via
kzmalloc().

Signed-off-by: Wladislav Wiebe <wladislav.kw@gmail.com>
Acked-by: Alexander Sverdlin <alexander.sverdlin@nsn.com>
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoxen/events: initialize local per-cpu mask for all possible events
David Vrabel [Thu, 15 Aug 2013 12:21:06 +0000 (13:21 +0100)]
xen/events: initialize local per-cpu mask for all possible events

commit 84ca7a8e45dafb49cd5ca90a343ba033e2885c17 upstream.

The sizeof() argument in init_evtchn_cpu_bindings() is incorrect
resulting in only the first 64 (or 32 in 32-bit guests) ports having
their bindings being initialized to VCPU 0.

In most cases this does not cause a problem as request_irq() will set
the irq affinity which will set the correct local per-cpu mask.
However, if the request_irq() is called on a VCPU other than 0, there
is a window between the unmasking of the event and the affinity being
set were an event may be lost because it is not locally unmasked on
any VCPU. If request_irq() is called on VCPU 0 then local irqs are
disabled during the window and the race does not occur.

Fix this by initializing all NR_EVENT_CHANNEL bits in the local
per-cpu masks.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agozd1201: do not use stack as URB transfer_buffer
Jussi Kivilinna [Tue, 6 Aug 2013 11:28:42 +0000 (14:28 +0300)]
zd1201: do not use stack as URB transfer_buffer

commit 1206ff4ff9d2ef7468a355328bc58ac6ebf5be44 upstream.

Patch fixes zd1201 not to use stack as URB transfer_buffer. URB buffers need
to be DMA-able, which stack is not.

Patch is only compile tested.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoLinux 3.0.93 v3.0.93
Greg Kroah-Hartman [Tue, 20 Aug 2013 17:23:58 +0000 (10:23 -0700)]
Linux 3.0.93

10 years agoRevert "genetlink: fix family dump race"
Greg Kroah-Hartman [Tue, 20 Aug 2013 17:06:19 +0000 (10:06 -0700)]
Revert "genetlink: fix family dump race"

This reverts commit bba2a9f0d381e510ba32f2f984e5ae1e705c90d1 which is
commit 58ad436fcf49810aa006016107f494c9ac9013db upstream, as there are
reported problems with it.

Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoLinux 3.0.92 v3.0.92
Greg Kroah-Hartman [Tue, 20 Aug 2013 15:21:23 +0000 (08:21 -0700)]
Linux 3.0.92

10 years agom68k: Truncate base in do_div()
Andreas Schwab [Fri, 9 Aug 2013 13:14:08 +0000 (15:14 +0200)]
m68k: Truncate base in do_div()

commit ea077b1b96e073eac5c3c5590529e964767fc5f7 upstream.

Explicitly truncate the second operand of do_div() to 32 bits to guard
against bogus code calling it with a 64-bit divisor.

[Thorsten]

After upgrading from 3.2 to 3.10, mounting a btrfs volume fails with:

btrfs: setting nodatacow, compression disabled
btrfs: enabling auto recovery
btrfs: disk space caching is enabled
 *** ZERO DIVIDE ***   FORMAT=2
Current process id is 722
BAD KERNEL TRAP: 00000000
Modules linked in: evdev mac_hid ext4 crc16 jbd2 mbcache btrfs xor lzo_compress zlib_deflate raid6_pq crc32c libcrc32c
PC: [<319535b2>] __btrfs_map_block+0x11c/0x119a [btrfs]
SR: 2000  SP: 30c1fab4  a2: 30f0faf0
d0: 00000000    d1: 00001000    d2: 00000000    d3: 00000000
d4: 00010000    d5: 00000000    a0: 3085c72c    a1: 3085c72c
Process mount (pid: 722, task=30f0faf0)
Frame format=2 instr addr=319535ae
Stack from 30c1faec:
        00000000 00000020 00000000 00001000 00000000 01401000 30253928 300ffc00
        00a843ac 3026f640 00000000 00010000 0009e250 00d106c0 00011220 00000000
        00001000 301c6830 0009e32a 000000ff 00000009 3085c72c 00000000 00000000
        30c1fd14 00000000 00000020 00000000 30c1fd14 0009e26c 00000020 00000003
        00000000 0009dd8a 300b0b6c 30253928 00a843ac 00001000 00000000 00000000
        0000a008 3194e76a 30253928 00a843ac 00001000 00000000 00000000 00000002
Call Trace: [<00001000>] kernel_pg_dir+0x0/0x1000

    [...]

Code: 222e ff74 2a2e ff5c 2c2e ff60 4c45 1402 <2d40> ff64 2d41 ff68 2205 4c2e 1800 ff68 4c04 0800 2041 d1c0 2206 4c2e 1400 ff68

[Geert]

As diagnosed by Andreas, fs/btrfs/volumes.c:__btrfs_map_block()
calls

    do_div(stripe_nr, stripe_len);

with stripe_len u64, while do_div() assumes the divisor is a 32-bit number.

Due to the lack of truncation in the m68k-specific implementation of
do_div(), the division is performed using the upper 32-bit word of
stripe_len, which is zero.

This was introduced by commit 53b381b3abeb86f12787a6c40fee9b2f71edc23b
("Btrfs: RAID5 and RAID6"), which changed the divisor from
map->stripe_len (struct map_lookup.stripe_len is int) to a 64-bit temporary.

Reported-by: Thorsten Glaser <tg@debian.org>
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Tested-by: Thorsten Glaser <tg@debian.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agovm: add no-mmu vm_iomap_memory() stub
Linus Torvalds [Sat, 27 Apr 2013 20:25:38 +0000 (13:25 -0700)]
vm: add no-mmu vm_iomap_memory() stub

commit 3c0b9de6d37a481673e81001c57ca0e410c72346 upstream.

I think we could just move the full vm_iomap_memory() function into
util.h or similar, but I didn't get any reply from anybody actually
using nommu even to this trivial patch, so I'm not going to touch it any
more than required.

Here's the fairly minimal stub to make the nommu case at least
potentially work.  It doesn't seem like anybody cares, though.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoARM: 7080/1: l2x0: make sure I&D are not locked down on init
Linus Walleij [Tue, 6 Sep 2011 06:45:46 +0000 (07:45 +0100)]
ARM: 7080/1: l2x0: make sure I&D are not locked down on init

commit bac7e6ecf60933b68af910eb4c83a775a8b20b19 upstream.

Fighting unfixed U-Boots and other beasts that may the cache in
a locked-down state when starting the kernel, we make sure to
disable all cache lock-down when initializing the l2x0 so we
are in a known state.

Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reported-by: Jan Rinze <janrinze@gmail.com>
Cc: Srinidhi Kasagar <srinidhi.kasagar@stericsson.com>
Cc: Rabin Vincent <rabin.vincent@stericsson.com>
Cc: Adrian Bunk <adrian.bunk@movial.com>
Cc: Rob Herring <robherring2@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Robert Marklund <robert.marklund@stericsson.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agom68k/atari: ARAnyM - Fix NatFeat module support
Geert Uytterhoeven [Thu, 25 Jul 2013 22:08:25 +0000 (00:08 +0200)]
m68k/atari: ARAnyM - Fix NatFeat module support

commit e8184e10f89736a23ea6eea8e24cd524c5c513d2 upstream.

As pointed out by Andreas Schwab, pointers passed to ARAnyM NatFeat calls
should be physical addresses, not virtual addresses.

Fortunately on Atari, physical and virtual kernel addresses are the same,
as long as normal kernel memory is concerned, so this usually worked fine
without conversion.

But for modules, pointers to literal strings are located in vmalloc()ed
memory. Depending on the version of ARAnyM, this causes the nf_get_id()
call to just fail, or worse, crash ARAnyM itself with e.g.

    Gotcha! Illegal memory access. Atari PC = $968c

This is a big issue for distro kernels, who want to have all drivers as
loadable modules in an initrd.

Add a wrapper for nf_get_id() that copies the literal to the stack to
work around this issue.

Reported-by: Thorsten Glaser <tg@debian.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agopowerpc: Use -mtraceback=no
Anton Blanchard [Thu, 30 Jun 2011 13:55:27 +0000 (13:55 +0000)]
powerpc: Use -mtraceback=no

commit af9719c3062dfe216a0c3de3fa52be6d22b4456c upstream.

gcc 4.7 will be more strict about parsing the -mtraceback option:

 gcc: error: unrecognized argument in option '-mtraceback=none'
 gcc: note: valid arguments to '-mtraceback=' are: full no part

gcc used to do a 2 char compare so both "no" and "none" would
match. Switch to using -mtraceback=no should work everywhere.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agosparc32: Add ucmpdi2.o to obj-y instead of lib-y.
David S. Miller [Sat, 19 May 2012 22:27:01 +0000 (15:27 -0700)]
sparc32: Add ucmpdi2.o to obj-y instead of lib-y.

commit 74c7b28953d4eaa6a479c187aeafcfc0280da5e8 upstream.

Otherwise if no references exist in the static kernel image,
we won't export the symbol properly to modules.

Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agosparc32: add ucmpdi2
Sam Ravnborg [Sat, 19 May 2012 09:54:11 +0000 (11:54 +0200)]
sparc32: add ucmpdi2

commit de36e66d5fa52bc6e2dacd95c701a1762b5308a7 upstream.

Based on copy from microblaze add ucmpdi2 implementation.
This fixes build of niu driver which failed with:

drivers/built-in.o: In function `niu_get_nfc':
niu.c:(.text+0x91494): undefined reference to `__ucmpdi2'

This driver will never be used on a sparc32 system,
but patch added to fix build breakage with all*config builds.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoUSB: mos7720: fix broken control requests
Johan Hovold [Tue, 13 Aug 2013 11:27:34 +0000 (13:27 +0200)]
USB: mos7720: fix broken control requests

commit ef6c8c1d733e244f0499035be0dabe1f4ed98c6f upstream.

The parallel-port code of the drivers used a stack allocated
control-request buffer for asynchronous (and possibly deferred) control
requests. This not only violates the no-DMA-from-stack requirement but
could also lead to corrupt control requests being submitted.

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agousb: add two quirky touchscreen
Oliver Neukum [Wed, 14 Aug 2013 09:01:46 +0000 (11:01 +0200)]
usb: add two quirky touchscreen

commit 304ab4ab079a8ed03ce39f1d274964a532db036b upstream.

These devices tend to become unresponsive after S3

Signed-off-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agogenetlink: fix family dump race
Johannes Berg [Tue, 13 Aug 2013 07:04:05 +0000 (09:04 +0200)]
genetlink: fix family dump race

commit 58ad436fcf49810aa006016107f494c9ac9013db upstream.

When dumping generic netlink families, only the first dump call
is locked with genl_lock(), which protects the list of families,
and thus subsequent calls can access the data without locking,
racing against family addition/removal. This can cause a crash.
Fix it - the locking needs to be conditional because the first
time around it's already locked.

A similar bug was reported to me on an old kernel (3.4.47) but
the exact scenario that happened there is no longer possible,
on those kernels the first round wasn't locked either. Looking
at the current code I found the race described above, which had
also existed on the old kernel.

Reported-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoaf_key: initialize satype in key_notify_policy_flush()
Nicolas Dichtel [Mon, 18 Feb 2013 15:24:20 +0000 (16:24 +0100)]
af_key: initialize satype in key_notify_policy_flush()

commit 85dfb745ee40232876663ae206cba35f24ab2a40 upstream.

This field was left uninitialized. Some user daemons perform check against this
field.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoCRIS: Add _sdata to vmlinux.lds.S
Jesper Nilsson [Mon, 24 Oct 2011 09:19:25 +0000 (11:19 +0200)]
CRIS: Add _sdata to vmlinux.lds.S

commit 473e162eea465e60578edb93341752e7f1c1dacc upstream.

Fixes link error:
  LD      vmlinux
kernel/built-in.o: In function `core_kernel_data':
(.text+0x13e44): undefined reference to `_sdata'

Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agofutex: Take hugepages into account when generating futex_key
Zhang Yi [Tue, 25 Jun 2013 13:19:31 +0000 (21:19 +0800)]
futex: Take hugepages into account when generating futex_key

commit 13d60f4b6ab5b702dc8d2ee20999f98a93728aec upstream.

The futex_keys of process shared futexes are generated from the page
offset, the mapping host and the mapping index of the futex user space
address. This should result in an unique identifier for each futex.

Though this is not true when futexes are located in different subpages
of an hugepage. The reason is, that the mapping index for all those
futexes evaluates to the index of the base page of the hugetlbfs
mapping. So a futex at offset 0 of the hugepage mapping and another
one at offset PAGE_SIZE of the same hugepage mapping have identical
futex_keys. This happens because the futex code blindly uses
page->index.

Steps to reproduce the bug:

1. Map a file from hugetlbfs. Initialize pthread_mutex1 at offset 0
   and pthread_mutex2 at offset PAGE_SIZE of the hugetlbfs
   mapping.

   The mutexes must be initialized as PTHREAD_PROCESS_SHARED because
   PTHREAD_PROCESS_PRIVATE mutexes are not affected by this issue as
   their keys solely depend on the user space address.

2. Lock mutex1 and mutex2

3. Create thread1 and in the thread function lock mutex1, which
   results in thread1 blocking on the locked mutex1.

4. Create thread2 and in the thread function lock mutex2, which
   results in thread2 blocking on the locked mutex2.

5. Unlock mutex2. Despite the fact that mutex2 got unlocked, thread2
   still blocks on mutex2 because the futex_key points to mutex1.

To solve this issue we need to take the normal page index of the page
which contains the futex into account, if the futex is in an hugetlbfs
mapping. In other words, we calculate the normal page mapping index of
the subpage in the hugetlbfs mapping.

Mappings which are not based on hugetlbfs are not affected and still
use page->index.

Thanks to Mel Gorman who provided a patch for adding proper evaluation
functions to the hugetlbfs code to avoid exposing hugetlbfs specific
details to the futex code.

[ tglx: Massaged changelog ]

Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
Tested-by: Ma Chenggong <ma.chenggong@zte.com.cn>
Reviewed-by: 'Mel Gorman' <mgorman@suse.de>
Acked-by: 'Darren Hart' <dvhart@linux.intel.com>
Cc: 'Peter Zijlstra' <peterz@infradead.org>
Link: http://lkml.kernel.org/r/000101ce71a6%24a83c5880%24f8b50980%24@com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>