In order to be able to use pull_dl_task() from a callback, we need to
do away with the return value.
Since the return value indicates if we should reschedule, do this
inside the function. Since not all callers currently do this, this can
increase the number of reschedules due rt balancing.
Too many reschedules is not a correctness issues, too few are.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: oleg@redhat.com Cc: wanpeng.li@linux.intel.com Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150611124742.859398977@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[Conflicts: kernel/sched/deadline.c] Signed-off-by: Byungchul Park <byungchul.park@lge.com>
[bwh: Backported to 3.16: use resched_task() instead of resched_curr()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
In order to be able to use pull_rt_task() from a callback, we need to
do away with the return value.
Since the return value indicates if we should reschedule, do this
inside the function. Since not all callers currently do this, this can
increase the number of reschedules due rt balancing.
Too many reschedules is not a correctness issues, too few are.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: oleg@redhat.com Cc: wanpeng.li@linux.intel.com Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150611124742.679002000@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[Conflicts: kernel/sched/rt.c] Signed-off-by: Byungchul Park <byungchul.park@lge.com>
[bwh: Backported to 3.16: use resched_task() instead of resched_curr()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
In order to remove dropping rq->lock from the
switched_{to,from}()/prio_changed() sched_class methods, run the
balance callbacks after it.
We need to remove dropping rq->lock because its buggy,
suppose using sched_setattr()/sched_setscheduler() to change a running
task from FIFO to OTHER.
By the time we get to switched_from_rt() the task is already enqueued
on the cfs runqueues. If switched_from_rt() does pull_rt_task() and
drops rq->lock, load-balancing can come in and move our task @p to
another rq.
The subsequent switched_to_fair() still assumes @p is on @rq and bad
things will happen.
By using balance callbacks we delay the load-balancing operations
{rt,dl}x{push,pull} until we've done all the important work and the
task is fully set up.
Furthermore, the balance callbacks do not know about @p, therefore
they cannot get confused like this.
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: oleg@redhat.com Cc: wanpeng.li@linux.intel.com Link: http://lkml.kernel.org/r/20150611124742.615343911@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[Conflicts: kernel/sched/core.c] Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
An arm64 allmodconfig fails to build with GCC 5 due to __asmeq
assertions in the PSCI firmware calling code firing due to mcount
preambles breaking our assumptions about register allocation of function
arguments:
If CONFIG_BALLOON_COMPACTION=n balloon_page_insert() does not link pages
with balloon and doesn't set PagePrivate flag, as a result
balloon_page_dequeue() cannot get any pages because it thinks that all
of them are isolated. Without balloon compaction nobody can isolate
ballooned pages. It's safe to remove this check.
Fixes: d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management"). Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com> Reported-by: Matt Mullins <mmullins@mmlx.us> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Jiri Kosina <jkosina@suse.cz> Cc: jian wang <wangjian@bytedance.com>
Sasha Levin reported KASAN splash inside isolate_migratepages_range().
Problem is in the function __is_movable_balloon_page() which tests
AS_BALLOON_MAP in page->mapping->flags. This function has no protection
against anonymous pages. As result it tried to check address space flags
inside struct anon_vma.
Further investigation shows more problems in current implementation:
* Special branch in __unmap_and_move() never works:
balloon_page_movable() checks page flags and page_count. In
__unmap_and_move() page is locked, reference counter is elevated, thus
balloon_page_movable() always fails. As a result execution goes to the
normal migration path. virtballoon_migratepage() returns
MIGRATEPAGE_BALLOON_SUCCESS instead of MIGRATEPAGE_SUCCESS,
move_to_new_page() thinks this is an error code and assigns
newpage->mapping to NULL. Newly migrated page lose connectivity with
balloon an all ability for further migration.
* lru_lock erroneously required in isolate_migratepages_range() for
isolation ballooned page. This function releases lru_lock periodically,
this makes migration mostly impossible for some pages.
* balloon_page_dequeue have a tight race with balloon_page_isolate:
balloon_page_isolate could be executed in parallel with dequeue between
picking page from list and locking page_lock. Race is rare because they
use trylock_page() for locking.
This patch fixes all of them.
Instead of fake mapping with special flag this patch uses special state of
page->_mapcount: PAGE_BALLOON_MAPCOUNT_VALUE = -256. Buddy allocator uses
PAGE_BUDDY_MAPCOUNT_VALUE = -128 for similar purpose. Storing mark
directly in struct page makes everything safer and easier.
PagePrivate is used to mark pages present in page list (i.e. not
isolated, like PageLRU for normal pages). It replaces special rules for
reference counter and makes balloon migration similar to migration of
normal pages. This flag is protected by page_lock together with link to
the balloon device.
Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Link: http://lkml.kernel.org/p/53E6CEAA.9020105@oracle.com Cc: Rafael Aquini <aquini@redhat.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16:
- Remove an additional check for MIGRATEPAGE_BALLOON_SUCCESS in
__unmap_and_move()
- Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Jiri Kosina <jkosina@suse.cz> Cc: jian wang <wangjian@bytedance.com>
The Performance Monitors extension is an optional feature of the
AArch64 architecture, therefore, in order to access Performance
Monitors registers safely, the kernel should detect the architected
PMU unit presence through the ID_AA64DFR0_EL1 register PMUVer field
before accessing them.
This patch implements a guard by reading the ID_AA64DFR0_EL1 register
PMUVer field to detect the architected PMU presence and prevent accessing
PMU system registers if the Performance Monitors extension is not
implemented in the core.
Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Fixes: 60792ad349f3 ("arm64: kernel: enforce pmuserenr_el0 initialization and restore") Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Stack object "dte_facilities" is allocated in x25_rx_call_request(),
which is supposed to be initialized in x25_negotiate_facilities.
However, 5 fields (8 bytes in total) are not initialized. This
object is then copied to userland via copy_to_user, thus infoleak
occurs.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The stack object “map” has a total size of 32 bytes. Its last 4
bytes are padding generated by compiler. These padding bytes are
not initialized and sent out via “nla_put”.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context, indentation] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
clear_page_dirty_for_io() has accumulated writeback and memcg subtleties
since v2.6.16 first introduced page migration; and the set_page_dirty()
which completed its migration of PageDirty, later had to be moderated to
__set_page_dirty_nobuffers(); then PageSwapBacked had to skip that too.
No actual problems seen with this procedure recently, but if you look into
what the clear_page_dirty_for_io(page)+set_page_dirty(newpage) is actually
achieving, it turns out to be nothing more than moving the PageDirty flag,
and its NR_FILE_DIRTY stat from one zone to another.
It would be good to avoid a pile of irrelevant decrementations and
incrementations, and improper event counting, and unnecessary descent of
the radix_tree under tree_lock (to set the PAGECACHE_TAG_DIRTY which
radix_tree_replace_slot() left in place anyway).
Do the NR_FILE_DIRTY movement, like the other stats movements, while
interrupts still disabled in migrate_page_move_mapping(); and don't even
bother if the zone is the same. Do the PageDirty movement there under
tree_lock too, where old page is frozen and newpage not yet visible:
bearing in mind that as soon as newpage becomes visible in radix_tree, an
un-page-locked set_page_dirty() might interfere (or perhaps that's just
not possible: anything doing so should already hold an additional
reference to the old page, preventing its migration; but play safe).
But we do still need to transfer PageDirty in migrate_page_copy(), for
those who don't go the mapping route through migrate_page_move_mapping().
Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16: adjust context. This is not just an optimisation,
but turned out to fix a possible oops (CVE-2016-3070).] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
In the ASN.1 decoder, when the length field of an ASN.1 value is extracted,
it isn't validated against the remaining amount of data before being added
to the cursor. With a sufficiently large size indicated, the check:
datalen - dp < 2
may then fail due to integer overflow.
Fix this by checking the length indicated against the amount of remaining
data in both places a definite length is determined.
Whilst we're at it, make the following changes:
(1) Check the maximum size of extended length does not exceed the capacity
of the variable it's being stored in (len) rather than the type that
variable is assumed to be (size_t).
(2) Compare the EOC tag to the symbolic constant ASN1_EOC rather than the
integer 0.
(3) To reduce confusion, move the initialisation of len outside of:
for (len = 0; n > 0; n--) {
since it doesn't have anything to do with the loop counter n.
Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Acked-by: Peter Jones <pjones@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Right now xfs_vm_page_mkwrite gets this the wrong way around,
While technically it cannot deadlock due to the current freeze
ordering, it's still a landmine that might explode if we change
anything in future. Hence we need to nest the locks correctly.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Jan Kara <jack@suse.cz> Cc: xfs@oss.sgi.com
Extent swap operations are another extent manipulation operation
that we need to ensure does not race against mmap page faults. The
current code returns if the file is mapped prior to the swap being
done, but it could potentially race against new page faults while
the swap is in progress. Hence we should use the XFS_MMAPLOCK_EXCL
for this operation, too.
While there, fix the error path handling that can result in double
unlocks of the inodes when cancelling the swapext transaction.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
[bwh: Backported to 3.16:
- The obsoleted check for mmap'd files was directly in xfs_swap_extents()
and used VN_MAPPED
- Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Now that truncate locks out new page faults, we no longer need to do
special writeback hacks in truncate to work around potential races
between page faults, page cache truncation and file size updates to
ensure we get write page faults for extending truncates on sub-page
block size filesystems. Hence we can remove the code in
xfs_setattr_size() that handles this and update the comments around
the code tha thandles page cache truncate and size updates to
reflect the new reality.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
[bwh: Backported to 3.16: we never had the previous hack, so just update the
comment] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Now we have the i_mmap_lock being held across the page fault IO
path, we now add extent manipulation operation exclusion by adding
the lock to the paths that directly modify extent maps. This
includes truncate, hole punching and other fallocate based
operations. The operations will now take both the i_iolock and the
i_mmaplock in exclusive mode, thereby ensuring that all IO and page
faults block without holding any page locks while the extent
manipulation is in progress.
This gives us the lock order during truncate of i_iolock ->
i_mmaplock -> page_lock -> i_lock, hence providing the same
lock order as the iolock provides the normal IO path without
involving the mmap_sem.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
[bwh: Backported to 3.16:
- We never need to break layouts, so take both i_iolock and i_mmaplock at the
same time
- Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Jan Kara <jack@suse.cz> Cc: xfs@oss.sgi.com
Take the i_mmaplock over write page faults. These come through the
->page_mkwrite callout, so we need to wrap that calls with the
i_mmaplock.
This gives us a lock order of mmap_sem -> i_mmaplock -> page_lock
-> i_lock.
Also, move the page_mkwrite wrapper to the same region of xfs_file.c
as the read fault wrappers and add a tracepoint.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Jan Kara <jack@suse.cz> Cc: xfs@oss.sgi.com
Take the i_mmaplock over read page faults. These come through the
->fault callout, so we need to wrap the generic implementation
with the i_mmaplock. While there, add tracepoints for the read
fault as it passes through XFS.
This gives us a lock order of mmap_sem -> i_mmaplock -> page_lock
-> i_lock.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Jan Kara <jack@suse.cz> Cc: xfs@oss.sgi.com
Right now we cannot serialise mmap against truncate or hole punch
sanely. ->page_mkwrite is not able to take locks that the read IO
path normally takes (i.e. the inode iolock) because that could
result in lock inversions (read - iolock - page fault - page_mkwrite
- iolock) and so we cannot use an IO path lock to serialise page
write faults against truncate operations.
Instead, introduce a new lock that is used *only* in the
->page_mkwrite path that is the equivalent of the iolock. The lock
ordering in a page fault is i_mmaplock -> page lock -> i_ilock,
and so in truncate we can i_iolock -> i_mmaplock and so lock out
new write faults during the process of truncation.
Because i_mmap_lock is outside the page lock, we can hold it across
all the same operations we hold the i_iolock for. The only
difference is that we never hold the i_mmaplock in the normal IO
path and so do not ever have the possibility that we can page fault
inside it. Hence there are no recursion issues on the i_mmap_lock
and so we can use it to serialise page fault IO against inode
modification operations that affect the IO path.
This patch introduces the i_mmaplock infrastructure, lockdep
annotations and initialisation/destruction code. Use of the new lock
will be in subsequent patches.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Jan Kara <jack@suse.cz> Cc: xfs@oss.sgi.com
xfs_swap_extents() holds the ilock over a call to
filemap_write_and_wait(), which can then try to write data and take
the ilock. That causes a self-deadlock.
Fix the deadlock and clean up the code by separating the locking
appropriately. Add a lockflags variable to track what locks we are
holding as we gain and drop them and cleanup the error handling to
always use "out_unlock" with the lockflags variable.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The slab name ends up being visible in the directory structure under
/sys, and even if you don't have access rights to the file you can see
the filenames.
Just use a 64-bit counter instead of the pointer to the 'net' structure
to generate a unique name.
This code will go away in 4.7 when the conntrack code moves to a single
kmemcache, but this is the backportable simple solution to avoiding
leaking kernel pointers to user space.
Fixes: 5b3501faa874 ("netfilter: nf_conntrack: per netns nf_conntrack_cachep") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Commit 702e5bc68ad2 ("ocfs2: use generic posix ACL infrastructure")
refactored code to use posix_acl_create. The problem with this function
is that it is not mindful of the cluster wide inode lock making it
unsuitable for use with ocfs2 inode creation with ACLs. For example,
when used in ocfs2_mknod, this function can cause deadlock as follows.
The parent dir inode lock is taken when calling posix_acl_create ->
get_acl -> ocfs2_iop_get_acl which takes the inode lock again. This can
cause deadlock if there is a blocked remote lock request waiting for the
lock to be downconverted. And same deadlock happened in ocfs2_reflink.
This fix is to revert back using ocfs2_init_acl.
Fixes: 702e5bc68ad2 ("ocfs2: use generic posix ACL infrastructure") Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The code at the "out" label assumes that "default_acl" and "acl" are NULL,
but actually the pointers can be NULL, unitialized, or freed.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
At the end of process_filter(), collapse_tree() was changed to update
the parg parameter, but the reassignment after the call wasn't removed.
What happens is that the "current_op" gets modified and freed and parg
is assigned to the new allocated argument. But after the call to
collapse_tree(), parg is assigned again to the just freed "current_op",
and this causes the tool to crash.
The current_op variable must also be assigned to NULL in case of error,
otherwise it will cause it to be free()ed twice.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Fixes: 42d6194d133c ("tools lib traceevent: Refactor process_filter()") Link: http://lkml.kernel.org/r/20160511150936.678c18a1@gandalf.local.home Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Subwoofer does not work out of the box on ASUS N751/N551 laptops. This
patch fixes it. Patch tested on N751 laptop. N551 part is not tested,
but according to [1] and [2] this laptop requires similar changes, so I
included them in the patch.
LPT is pch, so might run into the fdi bandwidth constraint (especially
since it has only 2 lanes). But right now we just force pipe_bpp back
to 24, resulting in a nice loop (which we bail out with a loud
WARN_ON). Fix this.
Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=93477 Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1462264381-7573-1-git-send-email-daniel.vetter@ffwll.ch
(cherry picked from commit f58a1acc7e4a1f37d26124ce4c875c647fbcc61f) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
If GSO packet is segmented and its segments are properly queued,
we call consume_skb() instead of kfree_skb() to be drop monitor
friendly.
Fixes: 3e4f8b7873709 ("macvtap: Perform GSO on forwarding path.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Payloads of NM entries are not supposed to contain NUL. When we run
into such, only the part prior to the first NUL goes into the
concatenation (i.e. the directory entry name being encoded by a bunch
of NM entries). We do stop when the amount collected so far + the
claimed amount in the current NM entry exceed 254. So far, so good,
but what we return as the total length is the sum of *claimed*
sizes, not the actual amount collected. And that can grow pretty
large - not unlimited, since you'd need to put CE entries in
between to be able to get more than the maximum that could be
contained in one isofs directory entry / continuation chunk and
we are stop once we'd encountered 32 CEs, but you can get about 8Kb
easily. And that's what will be passed to readdir callback as the
name length. 8Kb __copy_to_user() from a buffer allocated by
__get_free_page()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Do not load one entry beyond the end of the syscall table when the
syscall number of a traced process equals to __NR_Linux_syscalls.
Similar bug with regular processes was fixed by commit 3bb457af4fa8
("[PARISC] Fix bug when syscall nr is __NR_Linux_syscalls").
This bug was found by strace test suite.
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Helge Deller <deller@gmx.de>
[bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Currently we read the tsc radio: ratio = (MSR_PLATFORM_INFO >> 8) & 0x1f;
Thus we get bit 8-12 of MSR_PLATFORM_INFO, however according to the SDM
(35.5), the ratio bits are bit 8-15.
Ignoring the upper bits can result in an incorrect tsc ratio, which causes the
TSC calibration and the Local APIC timer frequency to be incorrect.
Fix this problem by masking 0xff instead.
[ tglx: Massaged changelog ]
Fixes: 7da7c1561366 "x86, tsc: Add static (MSR) TSC calibration on Intel Atom SoCs" Signed-off-by: Chen Yu <yu.c.chen@intel.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Bin Gao <bin.gao@intel.com> Cc: Len Brown <lenb@kernel.org> Link: http://lkml.kernel.org/r/1462505619-5516-1-git-send-email-yu.c.chen@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
If /proc/<PID>/environ gets read before the envp[] array is fully set up
in create_{aout,elf,elf_fdpic,flat}_tables(), we might end up trying to
read more bytes than are actually written, as env_start will already be
set but env_end will still be zero, making the range calculation
underflow, allowing to read beyond the end of what has been written.
Fix this as it is done for /proc/<PID>/cmdline by testing env_end for
zero. It is, apparently, intentionally set last in create_*_tables().
This bug was found by the PaX size_overflow plugin that detected the
arithmetic underflow of 'this_len = env_end - (env_start + src)' when
env_end is still zero.
The expected consequence is that userland trying to access
/proc/<PID>/environ of a not yet fully set up process may get
inconsistent data as we're in the middle of copying in the environment
variables.
Fixes: https://forums.grsecurity.net/viewtopic.php?f=3&t=4363 Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=116461 Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Emese Revfy <re.emese@gmail.com> Cc: Pax Team <pageexec@freemail.hu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Jarod Wilson <jarod@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
On DCE6.1 PPLL2 is exclusively available to UNIPHYA, so it should not
be taken into consideration when looking for an already enabled PLL
to be shared with other outputs.
This fixes the broken VGA port (TRAVIS DP->VGA bridge) on my Richland
based laptop, where the internal display is connected to UNIPHYA through
a TRAVIS DP->LVDS bridge.
This oops happens with the namespace_sem held and can be triggered by
non-root users. An all around not pleasant experience.
To avoid this scenario when finding the appropriate source mount to
copy stop the walk up the mnt_master chain when the first source mount
is encountered.
Further rewrite the walk up the last_source mnt_master chain so that
it is clear what is going on.
The reason why the first source mount is special is that it it's
mnt_parent is not a mount in the dest_mnt propagation tree, and as
such termination conditions based up on the dest_mnt mount propgation
tree do not make sense.
To avoid other kinds of confusion last_dest is not changed when
computing last_source. last_dest is only used once in propagate_one
and that is above the point of the code being modified, so changing
the global variable is meaningless and confusing.
propagate_one(m) calculates "type" argument for copy_tree() like this:
> if (m->mnt_group_id == last_dest->mnt_group_id) {
> type = CL_MAKE_SHARED;
> } else {
> type = CL_SLAVE;
> if (IS_MNT_SHARED(m))
> type |= CL_MAKE_SHARED;
> }
The "type" argument then governs clone_mnt() behavior with respect to flags
and mnt_master of new mount. When we iterate through a slave group, it is
possible that both current "m" and "last_dest" are not shared (although,
both are slaves, i.e. have non-NULL mnt_master-s). Then the comparison
above erroneously makes new mount shared and sets its mnt_master to
last_source->mnt_master. The patch fixes the problem by handling zero
mnt_group_id-s as though they are unequal.
The similar problem exists in the implementation of "else" clause above
when we have to ascend upward in the master/slave tree by calling:
proper number of times. The last step is governed by
"n->mnt_group_id != last_dest->mnt_group_id" condition that may lie if
both are zero. The patch fixes this case in the same way as the former one.
[AV: don't open-code an obvious helper...]
Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The code for checking whether a BAR address range is valid will break
out of the loop when a start address of 0x0 is encountered.
This behaviour is wrong since by breaking out of the loop we may miss
the BAR that describes the EFI frame buffer in a later iteration.
Because of this bug I can't use video=efifb: boot parameter to get
efifb on my new ThinkPad E550 for my old linux system hard disk with
3.10 kernel. In 3.10, efifb is the only choice due to DRM/I915 not
supporting the GPU.
This patch also add a trivial optimization to break out after we find
the frame buffer address range without testing later BARs.
Signed-off-by: Wang YanQing <udknight@gmail.com>
[ Rewrote changelog. ] Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Reviewed-by: Peter Jones <pjones@redhat.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: David Herrmann <dh.herrmann@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1462454061-21561-2-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The crypto hash walk code is broken when supplied with an offset
greater than or equal to PAGE_SIZE. This patch fixes it by adjusting
walk->pg and walk->offset when this happens.
Reported-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Set the mutex owner thread ID.
Original patch from: Prarit Bhargava <prarit@redhat.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=115121 Link: https://github.com/acpica/acpica/commit/7a3bd2d9 Signed-off-by: Prarit Bhargava <prarit@redhat.com> Tested-by: Andy Lutomirski <luto@kernel.org> # On a Dell XPS 13 9350 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Mark reported that having asterisks on the end of directory names
confuses get_maintainer.pl when it encounters subdirectories, and that
my name does not appear when run on drivers/firmware/efi/libstub.
Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1462303781-8686-2-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Currently register functions for events will be called
through the 'reg' field of event class directly without
any check when seting up triggers.
Triggers for events that don't support register through
debug fs (events under events/ftrace are for trace-cmd to
read event format, and most of them don't have a register
function except events/ftrace/functionx) can't be enabled
at all, and an oops will be hit when setting up trigger
for those events, so just not creating them is an easy way
to avoid the oops.
This is a fairly minimal fixup to the horribly bad behavior of hash_64()
with certain input patterns.
In particular, because the multiplicative value used for the 64-bit hash
was intentionally bit-sparse (so that the multiply could be done with
shifts and adds on architectures without hardware multipliers), some
bits did not get spread out very much. In particular, certain fairly
common bit ranges in the input (roughly bits 12-20: commonly with the
most information in them when you hash things like byte offsets in files
or memory that have block factors that mean that the low bits are often
zero) would not necessarily show up much in the result.
There's a bigger patch-series brewing to fix up things more completely,
but this is the fairly minimal fix for the 64-bit hashing problem. It
simply picks a much better constant multiplier, spreading the bits out a
lot better.
NOTE! For 32-bit architectures, the bad old hash_64() remains the same
for now, since 64-bit multiplies are expensive. The bigger hashing
cleanup will replace the 32-bit case with something better.
The new constants were picked by George Spelvin who wrote that bigger
cleanup series. I just picked out the constants and part of the comment
from that series.
Cc: George Spelvin <linux@horizon.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The hash_64() function historically does the multiply by the
GOLDEN_RATIO_PRIME_64 number with explicit shifts and adds, because
unlike the 32-bit case, gcc seems unable to turn the constant multiply
into the more appropriate shift and adds when required.
However, that means that we generate those shifts and adds even when the
architecture has a fast multiplier, and could just do it better in
hardware.
Use the now-cleaned-up CONFIG_ARCH_HAS_FAST_MULTIPLIER (together with
"is it a 64-bit architecture") to decide whether to use an integer
multiply or the explicit sequence of shift/add instructions.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[bwh: This has no immediate effect in 3.16 because nothing defines
CONFIG_ARCH_HAS_FAST_MULTIPLIER. However the following fix removes
that condition.]
using the "r" constraint for %2. r0 is a valid register in the "r" set,
but addi X,r0,X turns it into an li:
li r7,-1
andc r7,r7,r0
popcntd r4,r7
Fix this by using the "b" constraint, for which r0 is not a valid
register.
This was found with a kernel build using gcc trunk, narrowed down to
when -frename-registers was enabled at -O2. It is just luck however
that we aren't seeing this on older toolchains.
Thanks to Segher for working with me to find this issue.
Fixes: d0cebfa650a0 ("powerpc: word-at-a-time optimization for 64-bit Little Endian") Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[bwh: Backported to 3.16: same issue exists with a different variable in
find_zero()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
On the consumer side, we have interrupt driven flow management of the
producer. It is sufficient to base the signaling decision on the
amount of space that is available to write after the read is complete.
The current code samples the previous available space and uses this
in making the signaling decision. This state can be stale and is
unnecessary. Since the state can be stale, we end up not signaling
the host (when we should) and this can result in a hang. Fix this
problem by removing the unnecessary check. I would like to thank
Arseney Romanenko <arseneyr@microsoft.com> for pointing out this issue.
Also, issue a full memory barrier before making the signaling descision
to correctly deal with potential reordering of the write (read index)
followed by the read of pending_sz.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Tested-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes a bug where previously hv_ringbuffer_read would pass in the old
number of bytes available to read instead of the expected old read index
when calculating when to signal to the host that the ringbuffer is empty.
Since the previous write size is already saved, also changes the
hv_need_to_signal_on_read to use the previously read value rather than
recalculating it.
Signed-off-by: Christopher Oo <t-chriso@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* if we have a hashed negative dentry and either CREAT|EXCL on
r/o filesystem, or CREAT|TRUNC on r/o filesystem, or CREAT|EXCL
with failing may_o_create(), we should fail with EROFS or the
error may_o_create() has returned, but not ENOENT. Which is what
the current code ends up returning.
* if we have CREAT|TRUNC hitting a regular file on a read-only
filesystem, we can't fail with EROFS here. At the very least,
not until we'd done follow_managed() - we might have a writable
file (or a device, for that matter) bound on top of that one.
Moreover, the code downstream will see that O_TRUNC and attempt
to grab the write access (*after* following possible mount), so
if we really should fail with EROFS, it will happen. No need
to do that inside atomic_open().
The real logics is much simpler than what the current code is
trying to do - if we decided to go for simple lookup, ended
up with a negative dentry *and* had create_error set, fail with
create_error. No matter whether we'd got that negative dentry
from lookup_real() or had found it in dcache.
Acked-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.16: deleted code was slightly different] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Both of these drivers can return NOTIFY_BAD, but this terminates
processing other callbacks that were registered later on the chain.
Since the driver did nothing to log the error it seems wrong to prevent
other interested parties from seeing it. E.g. neither of them had even
bothered to check the type of the error to see if it was a memory error
before the return NOTIFY_BAD.
The batadv_tt_local_entry was specific to a batadv_softif_vlan and held an
implicit reference to it. But this reference was never stored in form of a
pointer in the tt_local_entry itself. Instead batadv_tt_local_remove,
batadv_tt_local_table_free and batadv_tt_local_purge_pending_clients depend
on a consistent state of bat_priv->softif_vlan_list and that
batadv_softif_vlan_get always returns the batadv_softif_vlan object which
it has a reference for. But batadv_softif_vlan_get cannot guarantee that
because it is working only with rcu_read_lock on this list. It can
therefore happen that an vid is in this list twice or that
batadv_softif_vlan_get cannot find the batadv_softif_vlan for an vid due to
some other list operations taking place at the same time.
Instead add a batadv_softif_vlan pointer directly in batadv_tt_local_entry
which will be used for the reference counter decremented on release of
batadv_tt_local_entry.
Fixes: 35df3b298fc8 ("batman-adv: fix TT VLAN inconsistency on VLAN re-add") Signed-off-by: Sven Eckelmann <sven@narfation.org> Acked-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
[bwh: Backported to 3.16:
- s/_put/_free_ref/ in various function names
- Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Now that DAT is VLAN aware, it must use the VID when
computing the DHT address of the candidate nodes where
an entry is going to be stored/retrieved.
Fixes: be1db4f6615b ("batman-adv: make the Distributed ARP Table vlan aware") Signed-off-by: Antonio Quartulli <a@unstable.cc>
[sven@narfation.org: fix conflicts with current version] Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
batadv_dat_select_candidates provides an u32 to batadv_hash_dat but it
needs a batadv_dat_entry with at least ip and vid filled in.
Fixes: 3e26722bc9f2 ("batman-adv: make the Distributed ARP Table vlan aware") Signed-off-by: Sven Eckelmann <sven@open-mesh.com> Acked-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Khugepaged detects own VMAs by checking vm_file and vm_ops but this way
it cannot distinguish private /dev/zero mappings from other special
mappings like /dev/hpet which has no vm_ops and popultes PTEs in mmap.
This fixes false-positive VM_BUG_ON and prevents installing THP where
they are not expected.
Link: http://lkml.kernel.org/r/CACT4Y+ZmuZMV5CjSFOeXviwQdABAgT7T+StKfTqan9YDtgEi5g@mail.gmail.com Fixes: 78f11a255749 ("mm: thp: fix /dev/zero MAP_PRIVATE and vm_flags cleanups") Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16: deleted assertions used VM_BUG_ON()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The drivers/infiniband stack uses write() as a replacement for
bi-directional ioctl(). This is not safe. There are ways to
trigger write calls that result in the return structure that
is normally written to user space being shunted off to user
specified kernel memory instead.
For the immediate repair, detect and deny suspicious accesses to
the write API.
For long term, update the user space libraries and the kernel API
to something that doesn't present the same security vulnerabilities
(likely a structured ioctl() interface).
The impacted uAPI interfaces are generally only available if
hardware from drivers/infiniband is installed in the system.
Reported-by: Jann Horn <jann@thejh.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
[ Expanded check to all known write() entry points ] Signed-off-by: Doug Ledford <dledford@redhat.com>
[bwh: Backported to 3.2:
- Drop changes to hfi1
- ipath_write() has the same problem, so add the same restriction there] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
A while ago, commit 9875201e1049 ("rbd: fix use-after free of
rbd_dev->disk") fixed rbd unmap vs notify race by introducing
an exported wrapper for flushing notifies and sticking it into
do_rbd_remove().
A similar problem exists on the rbd map path, though: the watch is
registered in rbd_dev_image_probe(), while the disk is set up quite
a few steps later, in rbd_dev_device_setup(). Nothing prevents
a notify from coming in and crashing on a NULL rbd_dev->disk:
If an error occurs during rbd map, we have to error out, potentially
tearing down a watch. Just like on rbd unmap, notifies have to be
flushed, otherwise rbd_watch_cb() may end up trying to read in the
image header after rbd_dev_image_release() has run:
Assertion failure in rbd_dev_header_info() at line 4722:
To fix this, a) check if RBD_DEV_FLAG_EXISTS is set before calling
revalidate_disk(), b) move ceph_osdc_flush_notifies() call into
rbd_dev_header_unwatch_sync() to cover rbd map error paths and c) turn
header read-in into a critical section. The latter also happens to
take care of rbd map foo@bar vs rbd snap rm foo@bar race.
Fixes: http://tracker.ceph.com/issues/15490 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Josh Durgin <jdurgin@redhat.com>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The secondary CPU starts up in ARM mode. When the kernel is compiled in
thumb2 mode we have to explicitly compile the secondary startup
trampoline in ARM mode, otherwise the CPU will go to Nirvana.
During system resume we depended on pci_enable_device() also putting the
device into PCI D0 state. This won't work if the PCI device was already
enabled but still in D3 state. This is because pci_enable_device() is
refcounted and will not change the HW state if called with a non-zero
refcount. Leaving the device in D3 will make all subsequent device
accesses fail.
This didn't cause a problem most of the time, since we resumed with an
enable refcount of 0. But it fails at least after module reload because
after that we also happen to leak a PCI device enable reference: During
probing we call drm_get_pci_dev() which will enable the PCI device, but
during device removal drm_put_dev() won't disable it. This is a bug of
its own in DRM core, but without much harm as it only leaves the PCI
device enabled. Fixing it is also a bit more involved, due to DRM
mid-layering and because it affects non-i915 drivers too. The fix in
this patch is valid regardless of the problem in DRM core.
v2:
- Add a code comment about the relation of this fix to the freeze/thaw
vs. the suspend/resume phases. (Ville)
- Add a code comment about the inconsistent ordering of set power state
and device enable calls. (Chris)
So blk_flush_plug_list() was called with from_schedule == true.
If from_schedule is true, that means that finally blk_mq_insert_requests()
offloads execution of __blk_mq_run_hw_queue() and uses kblockd workqueue,
i.e. it calls kblockd_schedule_delayed_work_on().
That means, that we race with another CPU, which is about to execute
__blk_mq_run_hw_queue() work.
Further debugging shows the following traces from different CPUs:
CPU#0 CPU#1
---------------------------------- -------------------------------
reqeust A inserted
STORE hctx->ctx_map[0] bit marked
kblockd_schedule...() returns 1
<schedule to kblockd workqueue>
request B inserted
STORE hctx->ctx_map[1] bit marked
kblockd_schedule...() returns 0
*** WORK PENDING bit is cleared ***
flush_busy_ctxs() is executed, but
bit 1, set by CPU#1, is not observed
As a result request B pended forever.
This behaviour can be explained by speculative LOAD of hctx->ctx_map on
CPU#0, which is reordered with clear of PENDING bit and executed _before_
actual STORE of bit 1 on CPU#1.
The proper fix is an explicit full barrier <mfence>, which guarantees
that clear of PENDING bit is to be executed before all possible
speculative LOADS or STORES inside actual work function.
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Cc: Gioh Kim <gi-oh.kim@profitbricks.com> Cc: Michael Wang <yun.wang@profitbricks.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When multiple skb are TX-completed in a row, we might incorrectly keep
a timestamp of a prior skb and cause extra work.
Fixes: ec693d47010e8 ("net/mlx4_en: Add HW timestamping (TS) support") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(1) new MON session is established
(2) old "none" ac is destroyed
(3) new "cephx" ac is constructed
...
(4) old OSD session (w/ "none" authorizer) is put
ceph_auth_destroy_authorizer(ac, osd->o_auth.authorizer)
osd->o_auth.authorizer in the "none" case is just a bare pointer into
ac, which contains a single static copy for all services. By the time
we get to (4), "none" ac, freed in (2), is long gone. On top of that,
a new vtable installed in (3) points us at ceph_x_destroy_authorizer(),
so we end up trying to destroy a "none" authorizer with a "cephx"
destructor operating on invalid memory!
To fix this, decouple authorizer destruction from ac and do away with
a single static "none" authorizer by making a copy for each OSD or MDS
session. Authorizers themselves are independent of ac and so there is
no reason for destroy_authorizer() to be an ac op. Make it an op on
the authorizer itself by turning ceph_authorizer into a real struct.
Fixes: http://tracker.ceph.com/issues/15447 Reported-by: Alan Zhang <alan.zhang@linux.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
[bwh: Backported to 3.16:
- Implementation of ceph_x_destroy_authorizer() is different
- Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
a255651d4cad ("ceph: ensure auth ops are defined before use") made
kfree() in put_osd() conditional on the authorizer. A mechanical
mistake most likely - fix it.
Cc: Alex Elder <elder@linaro.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Adding VID:PID for Straizona Focusers to cp210x driver.
Signed-off-by: Jasem Mutlaq <mutlaqja@ikarustech.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The Link ECU is an aftermarket ECU computer for vehicles that provides
full tuning abilities as well as datalogging and displaying capabilities
via the USB to Serial adapter built into the device.
Signed-off-by: Mike Manning <michael@bsch.com.au> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When removing a single interface while a broadcast or ogm packet is
still pending then we will free the forward packet without releasing the
queue slots again.
This patch is supposed to fix this issue.
Fixes: 6d5808d4ae1b ("batman-adv: Add missing hardif_free_ref in forw_packet_free") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
[sven@narfation.org: fix conflicts with current version] Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
_batadv_update_route rcu_derefences orig_ifinfo->router outside of a
spinlock protected region to print some information messages to the debug
log. But this pointer is not checked again when the new pointer is assigned
in the spinlock protected region. Thus is can happen that the value of
orig_ifinfo->router changed in the meantime and thus the reference counter
of the wrong router gets reduced after the spinlock protected region.
Just rcu_dereferencing the value of orig_ifinfo->router inside the spinlock
protected region (which also set the new pointer) is enough to get the
correct old router object.
Fixes: e1a5382f978b ("batman-adv: Make orig_node->router an rcu protected pointer") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The encapsulated ethernet and VLAN header may be outside the received
ethernet frame. Thus the skb buffer size has to be checked before it can be
parsed to find out if it encapsulates another batman-adv packet.
Fixes: 420193573f11 ("batman-adv: softif bridge loop avoidance") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The variable_matches() function can currently read "var_name[len]", for
example when:
- var_name[0] == 'a',
- len == 1
- match_name points to the NUL-terminated string "ab".
This function is supposed to accept "var_name" inputs that are not
NUL-terminated (hence the "len" parameter"). Document the function, and
access "var_name[*match]" only if "*match" is smaller than "len".
Reported-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Laszlo Ersek <lersek@redhat.com> Cc: Peter Jones <pjones@redhat.com> Cc: Matthew Garrett <mjg59@coreos.com> Cc: Jason Andryuk <jandryuk@gmail.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Link: http://thread.gmane.org/gmane.comp.freedesktop.xorg.drivers.intel/86906 Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The exynos5 I2C controller driver always prepares and enables a clock
before using it and then disables unprepares it when the clock is not
used anymore.
But this can cause a possible ABBA deadlock in some scenarios since a
driver that uses regmap to access its I2C registers, will first grab
the regmap lock and then the I2C xfer function will grab the prepare
lock when preparing the I2C clock. But since the clock driver also
uses regmap for I2C accesses, preparing a clock will first grab the
prepare lock and then the regmap lock when using the regmap API.
An example of this happens on the Exynos5422 Odroid XU4 board where a
s2mps11 PMIC is used and both the s2mps11 regulators and clk drivers
share the same I2C regmap.
The possible deadlock is reported by the kernel lockdep:
Fix it by leaving the code prepared on probe and use {en,dis}able in
the I2C transfer function.
This patch is similar to commit 34e81ad5f0b6 ("i2c: s3c2410: fix ABBA
deadlock by keeping clock prepared") that fixes the same bug in other
driver for an I2C controller found in Samsung SoCs.
Reported-by: Anand Moon <linux.amoon@gmail.com> Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Reviewed-by: Anand Moon <linux.amoon@gmail.com> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Huge pages are not normally available to PV guests. Not suppressing
hugetlbfs use results in an endless loop of page faults when user mode
code tries to access a hugetlbfs mapped area (since the hypervisor
denies such PTEs to be created, but error indications can't be
propagated out of xen_set_pte_at(), just like for various of its
siblings), and - once killed in an oops like this:
On s390 we only can enable hugepages if the underlying hardware/hypervisor
also does support this. Common code now would assume this to be
signaled by setting HPAGE_SHIFT to 0. But on s390, where we only
support one hugepage size, there is a link between HPAGE_SHIFT and
pageblock_order.
So instead of setting HPAGE_SHIFT to 0, we will implement the check for
the hardware capability.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
s390 has a constant hugepage size, by setting HPAGE_SHIFT we also change
e.g. the pageblock_order, which should be independent in respect to
hugepage support.
With this patch every architecture is free to define how to check
for hugepage support.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
atl2 includes NETIF_F_SG in hw_features even though it has no support
for non-linear skbs. This bug was originally harmless since the
driver does not claim to implement checksum offload and that used to
be a requirement for SG.
Now that SG and checksum offload are independent features, if you
explicitly enable SG *and* use one of the rare protocols that can use
SG without checkusm offload, this potentially leaks sensitive
information (before you notice that it just isn't working). Therefore
this obscure bug has been designated CVE-2016-2117.
Reported-by: Justin Yackoski <jyackoski@crypto-nite.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Fixes: ec5f06156423 ("net: Kill link between CSUM and SG features.") Signed-off-by: David S. Miller <davem@davemloft.net>
When the DaVinci emac driver is removed and re-probed, the actual
pdev->dev.platform_data is populated with an unwanted valid pointer saved by
the previous davinci_emac_of_get_pdata() call, causing a kernel crash when
calling priv->int_disable() in emac_int_disable().
Unable to handle kernel paging request at virtual address c8622a80
...
[<c0426fb4>] (emac_int_disable) from [<c0427700>] (emac_dev_open+0x290/0x5f8)
[<c0427700>] (emac_dev_open) from [<c04c00ec>] (__dev_open+0xb8/0x120)
[<c04c00ec>] (__dev_open) from [<c04c0370>] (__dev_change_flags+0x88/0x14c)
[<c04c0370>] (__dev_change_flags) from [<c04c044c>] (dev_change_flags+0x18/0x48)
[<c04c044c>] (dev_change_flags) from [<c052bafc>] (devinet_ioctl+0x6b4/0x7ac)
[<c052bafc>] (devinet_ioctl) from [<c04a1428>] (sock_ioctl+0x1d8/0x2c0)
[<c04a1428>] (sock_ioctl) from [<c014f054>] (do_vfs_ioctl+0x41c/0x600)
[<c014f054>] (do_vfs_ioctl) from [<c014f2a4>] (SyS_ioctl+0x6c/0x7c)
[<c014f2a4>] (SyS_ioctl) from [<c000ff60>] (ret_fast_syscall+0x0/0x1c)
Fixes: 42f59967a091 ("net: ethernet: davinci_emac: add OF support") Cc: Brian Hutchinson <b.hutchman@gmail.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
In order to avoid an Unbalanced pm_runtime_enable in the DaVinci
emac driver when the device is removed and re-probed, and a
pm_runtime_disable() call in davinci_emac_remove().
Actually, using unbind/bind on a TI DM8168 SoC gives :
$ echo 4a120000.ethernet > /sys/bus/platform/drivers/davinci_emac/unbind
net eth1: DaVinci EMAC: davinci_emac_remove()
$ echo 4a120000.ethernet > /sys/bus/platform/drivers/davinci_emac/bind
davinci_emac 4a120000.ethernet: Unbalanced pm_runtime_enable
Cc: Brian Hutchinson <b.hutchman@gmail.com> Fixes: 3ba97381343b ("net: ethernet: davinci_emac: add pm_runtime support") Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Otherwise an incoming waker on the dest hash bucket can miss
the waiter adding itself to the plist during the lockless
check optimization (small window but still the correct way
of doing this); similarly to the decrement counterpart.
Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: bigeasy@linutronix.de Cc: dvhart@infradead.org Link: http://lkml.kernel.org/r/1461208164-29150-1-git-send-email-dave@stgolabs.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The REAL_LE feature entry in the ibm_pa_feature struct is missing an MMU
feature value, meaning all the remaining elements initialise the wrong
values.
This means instead of checking for byte 5, bit 0, we check for byte 0,
bit 0, and then we incorrectly set the CPU feature bit as well as MMU
feature bit 1 and CPU user feature bits 0 and 2 (5).
Checking byte 0 bit 0 (IBM numbering), means we're looking at the
"Memory Management Unit (MMU)" feature - ie. does the CPU have an MMU.
In practice that bit is set on all platforms which have the property.
This means we set CPU_FTR_REAL_LE always. In practice that seems not to
matter because all the modern cpus which have this property also
implement REAL_LE, and we've never needed to disable it.
We're also incorrectly setting MMU feature bit 1, which is:
#define MMU_FTR_TYPE_8xx 0x00000002
Luckily the only place that looks for MMU_FTR_TYPE_8xx is in Book3E
code, which can't run on the same cpus as scan_features(). So this also
doesn't matter in practice.
Finally in the CPU user feature mask, we're setting bits 0 and 2. Bit 2
is not currently used, and bit 0 is:
#define PPC_FEATURE_PPC_LE 0x00000001
Which says the CPU supports the old style "PPC Little Endian" mode.
Again this should be harmless in practice as no 64-bit CPUs implement
that mode.
Fix the code by adding the missing initialisation of the MMU feature.
Also add a comment marking CPU user feature bit 2 (0x4) as reserved. It
would be unsafe to start using it as old kernels incorrectly set it.
Fixes: 44ae3ab3358e ("powerpc: Free up some CPU feature bits by moving out MMU-related features") Signed-off-by: Anton Blanchard <anton@samba.org>
[mpe: Flesh out changelog, add comment reserving 0x4] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Holding a reference to the containing task_struct is not sufficient to
prevent the mm_struct from being reaped under memory pressure. If this
happens whilst we are calling get_user_pages(), explosions erupt -
sometimes an immediate GPF, sometimes page flag corruption. To prevent
the target mm from being reaped as we are reading from it, acquire a
reference before we begin.
Testcase: igt/gem_shrink/*userptr Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1459864801-28606-2-git-send-email-chris@chris-wilson.co.uk
(cherry picked from commit 40313f0cd0b711a7a5905e5182422799e157d8aa) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
[bwh: Backported to 3.16:
- s/get_user_pages_remote/get_user_pages/
- s/npages/num_pages/] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The trigger delay algorithm that converts from microseconds to
the register value looks incorrect. According to most of the PMIC
documentation, the equation is
delay (Seconds) = (1 / 1024) * 2 ^ (x + 4)
except for one case where the documentation looks to have a
formatting issue and the equation looks like
delay (Seconds) = (1 / 1024) * 2 x + 4
Most likely this driver was written with the improper
documentation to begin with. According to the downstream sources
the valid delays are from 2 seconds to 1/64 second, and the
latter equation just doesn't make sense for that. Let's fix the
algorithm and the range check to match the documentation and the
downstream sources.
Reported-by: Bjorn Andersson <bjorn.andersson@linaro.org> Fixes: 92d57a73e410 ("input: Add support for Qualcomm PMIC8XXX power key") Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Tested-by: John Stultz <john.stultz@linaro.org> Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
In certain probe conditions the interrupt came right after registering
the handler causing a NULL pointer exception because of uninitialized
waitqueue:
$ udevadm trigger
i2c-gpio i2c-gpio-1: using pins 143 (SDA) and 144 (SCL)
i2c-gpio i2c-gpio-3: using pins 53 (SDA) and 52 (SCL)
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = e8b38000
[00000000] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in: snd_soc_i2s(+) i2c_gpio(+) snd_soc_idma snd_soc_s3c_dma snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer snd soundcore ac97_bus spi_s3c64xx pwm_samsung dwc2 exynos_adc phy_exynos_usb2 exynosdrm exynos_rng rng_core rtc_s3c
CPU: 0 PID: 717 Comm: data-provider-m Not tainted 4.6.0-rc1-next-20160401-00011-g1b8d87473b9e-dirty #101
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
(...)
(__wake_up_common) from [<c0379624>] (__wake_up+0x38/0x4c)
(__wake_up) from [<c0a41d30>] (ak8975_irq_handler+0x28/0x30)
(ak8975_irq_handler) from [<c0386720>] (handle_irq_event_percpu+0x88/0x140)
(handle_irq_event_percpu) from [<c038681c>] (handle_irq_event+0x44/0x68)
(handle_irq_event) from [<c0389c40>] (handle_edge_irq+0xf0/0x19c)
(handle_edge_irq) from [<c0385e04>] (generic_handle_irq+0x24/0x34)
(generic_handle_irq) from [<c05ee360>] (exynos_eint_gpio_irq+0x50/0x68)
(exynos_eint_gpio_irq) from [<c0386720>] (handle_irq_event_percpu+0x88/0x140)
(handle_irq_event_percpu) from [<c038681c>] (handle_irq_event+0x44/0x68)
(handle_irq_event) from [<c0389a70>] (handle_fasteoi_irq+0xb4/0x194)
(handle_fasteoi_irq) from [<c0385e04>] (generic_handle_irq+0x24/0x34)
(generic_handle_irq) from [<c03860b4>] (__handle_domain_irq+0x5c/0xb4)
(__handle_domain_irq) from [<c0301774>] (gic_handle_irq+0x54/0x94)
(gic_handle_irq) from [<c030c910>] (__irq_usr+0x50/0x80)
The bug was reproduced on exynos4412-trats2 (with a max77693 device also
using i2c-gpio) after building max77693 as a module.
Fixes: 94a6d5cf7caa ("iio:ak8975 Implement data ready interrupt handling") Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Tested-by: Gregor Boirie <gregor.boirie@parrot.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
On 64bit kernels, device stats are 64bit wide, not 32bit.
Fixes: 1c1008c793fa4 ("net: bcmgenet: add main driver file") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
arch_spin_lock_wait_flags() checks if a spinlock is not held before
trying a compare and swap instruction. If the lock is unlocked it
tries the compare and swap instruction, however if a different cpu
grabbed the lock in the meantime the instruction will fail as
expected.
Subsequently the arch_spin_lock_wait_flags() incorrectly tries to
figure out if the cpu that holds the lock is running. However it is
using the wrong cpu number for this (-1) and then will also yield the
current cpu to the wrong cpu.
Fix this by adding a missing continue statement.
Fixes: 470ada6b1a1d ("s390/spinlock: refactor arch_spin_lock_wait[_flags]") Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Prevent information from leaking to userspace by doing a memset to 0 of
the export state structure before setting the structure values and copying
it. This prevents un-initialized padding areas from being copied into the
export area.
Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
pcs_parse_bits_in_pinctrl_entry uses ffs which gives bit indices
ranging from 1 to MAX. This leads to a corner case where we try to request
the pin number = MAX and fails.
bit_pos value is being calculted using ffs. pin_num_from_lsb uses
bit_pos value. pins array is populated with:
pin + pin_num_from_lsb.
The above is 1 more than usual bit indices as bit_pos uses ffs to compute
first set bit. Hence the last of the pins array is populated with the MAX
value and not MAX - 1 which causes error when we call pin_request.
mask_pos is rightly calculated as ((pcs->fmask) << (bit_pos - 1))
Consequently val_pos and submask are correct.
Hence use __ffs which gives (ffs(x) - 1) as the first bit set.
fixes: 4e7e8017a8 ("pinctrl: pinctrl-single: enhance to configure multiple pins of different modules") Signed-off-by: Keerthy <j-keerthy@ti.com> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Specifically for the case of reads that use the Extended Register
Read Long command, a multi-byte read operation is broken up into
8-byte chunks. However the call to spmi_ext_register_readl() is
incorrectly passing 'val_size', which if greater than 8 will
always fail. The argument should instead be 'len'.
Fixes: c9afbb05a9ff ("regmap: spmi: support base and extended register spaces") Signed-off-by: Jack Pham <jackp@codeaurora.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Because we miss to wipe the remainder of i->addr[] in packet_mc_add(),
pdiag_put_mclist() leaks uninitialized heap bytes via the
PACKET_DIAG_MCLIST netlink attribute.
Fix this by explicitly memset(0)ing the remaining bytes in i->addr[].
Fixes: eea68e2f1a00 ("packet: Report socket mclist info via diag module") Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Acked-by: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
On BXT platform Host Controller and Device Controller figure as
same PCI device but with different device function. HCD should
not pass data to Device Controller but only to Host Controllers.
Checking if companion device is Host Controller, otherwise skip.
Signed-off-by: Robert Dobrowolski <robert.dobrowolski@linux.intel.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Add a new NO_REPORT_LUNS quirk and set it for Seagate drives with
an usb-id of: 0bc2:331a, as these will fail to respond to a
REPORT_LUNS command.
Reported-and-tested-by: David Webb <djw@noc.ac.uk> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Based on Sergey's test patch [1], this fixes zram with lz4 compression
on big endian cpus.
Note that the 64-bit preprocessor test is not a cleanup, it's part of
the fix, since those identifiers are bogus (for example, __ppc64__
isn't defined anywhere else in the kernel, which means we'd fall into
the 32-bit definitions on ppc64).
We clamp frame_len_words to a maximum of 4096, but do not actually
limit the number of words written or read through the DATA registers
or the length added to spi_message::actual_length. This results in
silent data corruption for commands longer than this maximum.
Recalculate the length of each transfer, taking frame_len_words into
account. Use this length in qspi_{read,write}_msg(), and to increment
spi_message::actual_length.
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Mark Brown <broonie@kernel.org>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Each transfer can specify 8, 16 or 32 bits per word independently of
the default for the device being addressed. However, currently we
calculate the number of words in the frame assuming that the word size
is the device default.
If multiple transfers in the same message have differing
bits_per_word, we bitwise-or the different values in the WLEN register
field.
Fix both of these. Also rename 'frame_length' to 'frame_len_words' to
make clear that it's not a byte count like spi_message::frame_length.
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Mark Brown <broonie@kernel.org>
[bwh: Backported to 3.16:
- QSPI_WLEN_MAX_BITS is not defined so use the literal value 128
- Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
A non-privileged user can create a netlink socket with the same port_id as
used by an existing open nl80211 netlink socket (e.g. as used by a hostapd
process) with a different protocol number.
Closing this socket will then lead to the notification going to nl80211's
socket release notification handler, and possibly cause an action such as
removing a virtual interface.
Fix this issue by checking that the netlink protocol is NETLINK_GENERIC.
Since generic netlink has no notifier chain of its own, we can't fix the
problem more generically.
Fixes: 026331c4d9b5 ("cfg80211/mac80211: allow registering for and sending action frames") Signed-off-by: Dmitry Ivanov <dima@ubnt.com>
[rewrite commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>