The slab name ends up being visible in the directory structure under
/sys, and even if you don't have access rights to the file you can see
the filenames.
Just use a 64-bit counter instead of the pointer to the 'net' structure
to generate a unique name.
This code will go away in 4.7 when the conntrack code moves to a single
kmemcache, but this is the backportable simple solution to avoiding
leaking kernel pointers to user space.
Fixes: 5b3501faa874 ("netfilter: nf_conntrack: per netns nf_conntrack_cachep") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Commit 702e5bc68ad2 ("ocfs2: use generic posix ACL infrastructure")
refactored code to use posix_acl_create. The problem with this function
is that it is not mindful of the cluster wide inode lock making it
unsuitable for use with ocfs2 inode creation with ACLs. For example,
when used in ocfs2_mknod, this function can cause deadlock as follows.
The parent dir inode lock is taken when calling posix_acl_create ->
get_acl -> ocfs2_iop_get_acl which takes the inode lock again. This can
cause deadlock if there is a blocked remote lock request waiting for the
lock to be downconverted. And same deadlock happened in ocfs2_reflink.
This fix is to revert back using ocfs2_init_acl.
Fixes: 702e5bc68ad2 ("ocfs2: use generic posix ACL infrastructure") Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
introduced this issue. ocfs2_setattr called by chmod command holds
cluster wide inode lock when calling posix_acl_chmod. This latter
function in turn calls ocfs2_iop_get_acl and ocfs2_iop_set_acl. These
two are also called directly from vfs layer for getfacl/setfacl commands
and therefore acquire the cluster wide inode lock. If a remote
conversion request comes after the first inode lock in ocfs2_setattr,
OCFS2_LOCK_BLOCKED will be set. And this will cause the second call to
inode lock from the ocfs2_iop_get_acl() to block indefinetly.
The deleted version of ocfs2_acl_chmod() calls __posix_acl_chmod() which
does not call back into the filesystem. Therefore, we restore
ocfs2_acl_chmod(), modify it slightly for locking as needed, and use that
instead.
Fixes: 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()") Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Commit 8f1eb48758aa ("ocfs2: fix umask ignored issue") introduced an
issue, SGID of sub dir was not inherited from its parents dir. It is
because SGID is set into "inode->i_mode" in ocfs2_get_init_inode(), but
is overwritten by "mode" which don't have SGID set later.
Fixes: 8f1eb48758aa ("ocfs2: fix umask ignored issue") Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Acked-by: Srinivas Eeda <srinivas.eeda@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
| If a down prepare callback fails, then DOWN_FAILED is invoked for all
| callbacks which have successfully executed DOWN_PREPARE.
|
| But, workqueue has actually two notifiers. One which handles
| UP/DOWN_FAILED/ONLINE and one which handles DOWN_PREPARE.
|
| Now look at the priorities of those callbacks:
|
| CPU_PRI_WORKQUEUE_UP = 5
| CPU_PRI_WORKQUEUE_DOWN = -5
|
| So the call order on DOWN_PREPARE is:
|
| CB 1
| CB ...
| CB workqueue_up() -> Ignores DOWN_PREPARE
| CB ...
| CB X ---> Fails
|
| So we call up to CB X with DOWN_FAILED
|
| CB 1
| CB ...
| CB workqueue_up() -> Handles DOWN_FAILED
| CB ...
| CB X-1
|
| So the problem is that the workqueue stuff handles DOWN_FAILED in the up
| callback, while it should do it in the down callback. Which is not a good idea
| either because it wants to be called early on rollback...
|
| Brilliant stuff, isn't it? The hotplug rework will solve this problem because
| the callbacks become symetric, but for the existing mess, we need some
| workaround in the workqueue code.
The boot CPU handles housekeeping duty(unbound timers, workqueues,
timekeeping, ...) on behalf of full dynticks CPUs. It must remain
online when nohz full is enabled. There is a priority set to every
notifier_blocks:
So tick_nohz_cpu_down callback failed when down prepare cpu 0, and
notifier_blocks behind tick_nohz_cpu_down will not be called any
more, which leads to workers are actually not unbound. Then hotplug
state machine will fallback to undo and online cpu 0 again. Workers
will be rebound unconditionally even if they are not unbound and
trigger the warning in this progress.
This patch fix it by catching !DISASSOCIATED to avoid rebind bound
workers.
Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: stable@vger.kernel.org Suggested-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
At the end of process_filter(), collapse_tree() was changed to update
the parg parameter, but the reassignment after the call wasn't removed.
What happens is that the "current_op" gets modified and freed and parg
is assigned to the new allocated argument. But after the call to
collapse_tree(), parg is assigned again to the just freed "current_op",
and this causes the tool to crash.
The current_op variable must also be assigned to NULL in case of error,
otherwise it will cause it to be free()ed twice.
Subwoofer does not work out of the box on ASUS N751/N551 laptops. This
patch fixes it. Patch tested on N751 laptop. N551 part is not tested,
but according to [1] and [2] this laptop requires similar changes, so I
included them in the patch.
LPT is pch, so might run into the fdi bandwidth constraint (especially
since it has only 2 lanes). But right now we just force pipe_bpp back
to 24, resulting in a nice loop (which we bail out with a loud
WARN_ON). Fix this.
Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=93477 Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1462264381-7573-1-git-send-email-daniel.vetter@ffwll.ch
(cherry picked from commit f58a1acc7e4a1f37d26124ce4c875c647fbcc61f) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
NULL pointer derefence happens when booting with DTB because the
platform data for haptic device is not set in supplied data from parent
MFD device.
The MFD device creates only platform data (from Device Tree) for itself,
not for haptic child.
Unable to handle kernel NULL pointer dereference at virtual address 0000009c
pgd = c0004000
[0000009c] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT SMP ARM
(max8997_haptic_probe) from [<c03f9cec>] (platform_drv_probe+0x4c/0xb0)
(platform_drv_probe) from [<c03f8440>] (driver_probe_device+0x214/0x2c0)
(driver_probe_device) from [<c03f8598>] (__driver_attach+0xac/0xb0)
(__driver_attach) from [<c03f67ac>] (bus_for_each_dev+0x68/0x9c)
(bus_for_each_dev) from [<c03f7a38>] (bus_add_driver+0x1a0/0x218)
(bus_add_driver) from [<c03f8db0>] (driver_register+0x78/0xf8)
(driver_register) from [<c0101774>] (do_one_initcall+0x90/0x1d8)
(do_one_initcall) from [<c0a00dbc>] (kernel_init_freeable+0x15c/0x1fc)
(kernel_init_freeable) from [<c06bb5b4>] (kernel_init+0x8/0x114)
(kernel_init) from [<c0107938>] (ret_from_fork+0x14/0x3c)
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Cc: <stable@vger.kernel.org> Fixes: 104594b01ce7 ("Input: add driver support for MAX8997-haptic")
[k.kozlowski: Write commit message, add CC-stable] Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Payloads of NM entries are not supposed to contain NUL. When we run
into such, only the part prior to the first NUL goes into the
concatenation (i.e. the directory entry name being encoded by a bunch
of NM entries). We do stop when the amount collected so far + the
claimed amount in the current NM entry exceed 254. So far, so good,
but what we return as the total length is the sum of *claimed*
sizes, not the actual amount collected. And that can grow pretty
large - not unlimited, since you'd need to put CE entries in
between to be able to get more than the maximum that could be
contained in one isofs directory entry / continuation chunk and
we are stop once we'd encountered 32 CEs, but you can get about 8Kb
easily. And that's what will be passed to readdir callback as the
name length. 8Kb __copy_to_user() from a buffer allocated by
__get_free_page()
Do not load one entry beyond the end of the syscall table when the
syscall number of a traced process equals to __NR_Linux_syscalls.
Similar bug with regular processes was fixed by commit 3bb457af4fa8
("[PARISC] Fix bug when syscall nr is __NR_Linux_syscalls").
Mike Frysinger reported that his ptrace testcase showed strange
behaviour on parisc: It was not possible to avoid a syscall and the
return value of a syscall couldn't be changed.
To modify a syscall number, we were missing to save the new syscall
number to gr20 which is then picked up later in assembly again.
The effect that the return value couldn't be changed is a side-effect of
another bug in the assembly code. When a process is ptraced, userspace
expects each syscall to report entrance and exit of a syscall. If a
syscall number was given which doesn't exist, we jumped to the normal
syscall exit code instead of informing userspace that the (non-existant)
syscall exits. This unexpected behaviour confuses userspace and thus the
bug was misinterpreted as if we can't change the return value.
This patch fixes both problems and was tested on 64bit kernel with
32bit userspace.
Signed-off-by: Helge Deller <deller@gmx.de> Cc: Mike Frysinger <vapier@gentoo.org> Cc: stable@vger.kernel.org # v4.0+ Tested-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
/proc/sys/vm/stat_refresh warns nr_isolated_anon and nr_isolated_file go
increasingly negative under compaction: which would add delay when
should be none, or no delay when should delay. The bug in compaction
was due to a recent mmotm patch, but much older instance of the bug was
also noticed in isolate_migratepages_range() which is used for CMA and
gigantic hugepage allocations.
The bug is caused by putback_movable_pages() in an error path
decrementing the isolated counters without them being previously
incremented by acct_isolated(). Fix isolate_migratepages_range() by
removing the error-path putback, thus reaching acct_isolated() with
migratepages still isolated, and leaving putback to caller like most
other places do.
Fixes: edc2ca612496 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()")
[vbabka@suse.cz: expanded the changelog] Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
This oops happens with the namespace_sem held and can be triggered by
non-root users. An all around not pleasant experience.
To avoid this scenario when finding the appropriate source mount to
copy stop the walk up the mnt_master chain when the first source mount
is encountered.
Further rewrite the walk up the last_source mnt_master chain so that
it is clear what is going on.
The reason why the first source mount is special is that it it's
mnt_parent is not a mount in the dest_mnt propagation tree, and as
such termination conditions based up on the dest_mnt mount propgation
tree do not make sense.
To avoid other kinds of confusion last_dest is not changed when
computing last_source. last_dest is only used once in propagate_one
and that is above the point of the code being modified, so changing
the global variable is meaningless and confusing.
propagate_one(m) calculates "type" argument for copy_tree() like this:
> if (m->mnt_group_id == last_dest->mnt_group_id) {
> type = CL_MAKE_SHARED;
> } else {
> type = CL_SLAVE;
> if (IS_MNT_SHARED(m))
> type |= CL_MAKE_SHARED;
> }
The "type" argument then governs clone_mnt() behavior with respect to flags
and mnt_master of new mount. When we iterate through a slave group, it is
possible that both current "m" and "last_dest" are not shared (although,
both are slaves, i.e. have non-NULL mnt_master-s). Then the comparison
above erroneously makes new mount shared and sets its mnt_master to
last_source->mnt_master. The patch fixes the problem by handling zero
mnt_group_id-s as though they are unequal.
The similar problem exists in the implementation of "else" clause above
when we have to ascend upward in the master/slave tree by calling:
proper number of times. The last step is governed by
"n->mnt_group_id != last_dest->mnt_group_id" condition that may lie if
both are zero. The patch fixes this case in the same way as the former one.
The code for checking whether a BAR address range is valid will break
out of the loop when a start address of 0x0 is encountered.
This behaviour is wrong since by breaking out of the loop we may miss
the BAR that describes the EFI frame buffer in a later iteration.
Because of this bug I can't use video=efifb: boot parameter to get
efifb on my new ThinkPad E550 for my old linux system hard disk with
3.10 kernel. In 3.10, efifb is the only choice due to DRM/I915 not
supporting the GPU.
This patch also add a trivial optimization to break out after we find
the frame buffer address range without testing later BARs.
Signed-off-by: Wang YanQing <udknight@gmail.com>
[ Rewrote changelog. ] Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Reviewed-by: Peter Jones <pjones@redhat.com> Cc: <stable@vger.kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: David Herrmann <dh.herrmann@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1462454061-21561-2-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
The crypto hash walk code is broken when supplied with an offset
greater than or equal to PAGE_SIZE. This patch fixes it by adjusting
walk->pg and walk->offset when this happens.
Set the mutex owner thread ID.
Original patch from: Prarit Bhargava <prarit@redhat.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=115121 Link: https://github.com/acpica/acpica/commit/7a3bd2d9 Signed-off-by: Prarit Bhargava <prarit@redhat.com> Tested-by: Andy Lutomirski <luto@kernel.org> # On a Dell XPS 13 9350 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Mark reported that having asterisks on the end of directory names
confuses get_maintainer.pl when it encounters subdirectories, and that
my name does not appear when run on drivers/firmware/efi/libstub.
Currently register functions for events will be called
through the 'reg' field of event class directly without
any check when seting up triggers.
Triggers for events that don't support register through
debug fs (events under events/ftrace are for trace-cmd to
read event format, and most of them don't have a register
function except events/ftrace/functionx) can't be enabled
at all, and an oops will be hit when setting up trigger
for those events, so just not creating them is an easy way
to avoid the oops.
This is a fairly minimal fixup to the horribly bad behavior of hash_64()
with certain input patterns.
In particular, because the multiplicative value used for the 64-bit hash
was intentionally bit-sparse (so that the multiply could be done with
shifts and adds on architectures without hardware multipliers), some
bits did not get spread out very much. In particular, certain fairly
common bit ranges in the input (roughly bits 12-20: commonly with the
most information in them when you hash things like byte offsets in files
or memory that have block factors that mean that the low bits are often
zero) would not necessarily show up much in the result.
There's a bigger patch-series brewing to fix up things more completely,
but this is the fairly minimal fix for the 64-bit hashing problem. It
simply picks a much better constant multiplier, spreading the bits out a
lot better.
NOTE! For 32-bit architectures, the bad old hash_64() remains the same
for now, since 64-bit multiplies are expensive. The bigger hashing
cleanup will replace the 32-bit case with something better.
The new constants were picked by George Spelvin who wrote that bigger
cleanup series. I just picked out the constants and part of the comment
from that series.
Cc: stable@vger.kernel.org Cc: George Spelvin <linux@horizon.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
using the "r" constraint for %2. r0 is a valid register in the "r" set,
but addi X,r0,X turns it into an li:
li r7,-1
andc r7,r7,r0
popcntd r4,r7
Fix this by using the "b" constraint, for which r0 is not a valid
register.
This was found with a kernel build using gcc trunk, narrowed down to
when -frename-registers was enabled at -O2. It is just luck however
that we aren't seeing this on older toolchains.
Thanks to Segher for working with me to find this issue.
Cc: stable@vger.kernel.org Fixes: d0cebfa650a0 ("powerpc: word-at-a-time optimization for 64-bit Little Endian") Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
On the consumer side, we have interrupt driven flow management of the
producer. It is sufficient to base the signaling decision on the
amount of space that is available to write after the read is complete.
The current code samples the previous available space and uses this
in making the signaling decision. This state can be stale and is
unnecessary. Since the state can be stale, we end up not signaling
the host (when we should) and this can result in a hang. Fix this
problem by removing the unnecessary check. I would like to thank
Arseney Romanenko <arseneyr@microsoft.com> for pointing out this issue.
Also, issue a full memory barrier before making the signaling descision
to correctly deal with potential reordering of the write (read index)
followed by the read of pending_sz.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Tested-by: Dexuan Cui <decui@microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Fixes a bug where previously hv_ringbuffer_read would pass in the old
number of bytes available to read instead of the expected old read index
when calculating when to signal to the host that the ringbuffer is empty.
Since the previous write size is already saved, also changes the
hv_need_to_signal_on_read to use the previously read value rather than
recalculating it.
Signed-off-by: Christopher Oo <t-chriso@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Convert 6+-string comments repeating function names to normal kernel-style
comments and fix a couple of other comment style issues. No textual or
functional changes intended.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
* if we have a hashed negative dentry and either CREAT|EXCL on
r/o filesystem, or CREAT|TRUNC on r/o filesystem, or CREAT|EXCL
with failing may_o_create(), we should fail with EROFS or the
error may_o_create() has returned, but not ENOENT. Which is what
the current code ends up returning.
* if we have CREAT|TRUNC hitting a regular file on a read-only
filesystem, we can't fail with EROFS here. At the very least,
not until we'd done follow_managed() - we might have a writable
file (or a device, for that matter) bound on top of that one.
Moreover, the code downstream will see that O_TRUNC and attempt
to grab the write access (*after* following possible mount), so
if we really should fail with EROFS, it will happen. No need
to do that inside atomic_open().
The real logics is much simpler than what the current code is
trying to do - if we decided to go for simple lookup, ended
up with a negative dentry *and* had create_error set, fail with
create_error. No matter whether we'd got that negative dentry
from lookup_real() or had found it in dcache.
Both of these drivers can return NOTIFY_BAD, but this terminates
processing other callbacks that were registered later on the chain.
Since the driver did nothing to log the error it seems wrong to prevent
other interested parties from seeing it. E.g. neither of them had even
bothered to check the type of the error to see if it was a memory error
before the return NOTIFY_BAD.
Phoenix Audio MT202pcs (1de7:0114) and MT202exe (1de7:0013) need the
same workaround as TMX320 for avoiding the firmware bug. It fixes the
frequent error about the sample rate inquiries and the slow device
probe as consequence.
Khugepaged detects own VMAs by checking vm_file and vm_ops but this way
it cannot distinguish private /dev/zero mappings from other special
mappings like /dev/hpet which has no vm_ops and popultes PTEs in mmap.
This fixes false-positive VM_BUG_ON and prevents installing THP where
they are not expected.
The drivers/infiniband stack uses write() as a replacement for
bi-directional ioctl(). This is not safe. There are ways to
trigger write calls that result in the return structure that
is normally written to user space being shunted off to user
specified kernel memory instead.
For the immediate repair, detect and deny suspicious accesses to
the write API.
For long term, update the user space libraries and the kernel API
to something that doesn't present the same security vulnerabilities
(likely a structured ioctl() interface).
The impacted uAPI interfaces are generally only available if
hardware from drivers/infiniband is installed in the system.
Reported-by: Jann Horn <jann@thejh.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
[ Expanded check to all known write() entry points ] Cc: stable@vger.kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
The secondary CPU starts up in ARM mode. When the kernel is compiled in
thumb2 mode we have to explicitly compile the secondary startup
trampoline in ARM mode, otherwise the CPU will go to Nirvana.
The driver's VDD on/off logic assumes that whenever the VDD is on we
also hold an AUX power domain reference. Since BIOS can leave the VDD on
during booting and resuming and on DDI platforms we won't take a
corresponding power reference, the above assumption won't hold on those
platforms and an eventual delayed VDD off work will do an extraneous AUX
power domain put resulting in a refcount underflow. Fix this the same
way we did this for non-DDI DP encoders:
commit 6d93c0c41760c0 ("drm/i915: fix VDD state tracking after system
resume")
At the same time call the DP encoder suspend handler the same way as the
non-DDI DP encoders do to flush any pending VDD off work. Leaving the
work running may cause a HW access where we don't expect this (at a point
where power domains are suspended already).
While at it remove an unnecessary function call indirection.
This fixed for me AUX refcount underflow problems on BXT during
suspend/resume.
When we suspend we turn everything off so the pps should be idle, and we
also (or at least should) disable all power wells which will reset the
power sequencer port assignment. So when we resume all power sequencers
should be in their reset state. However it's at least theoretically
possible that the BIOS would touch the power seuqencer(s), so to be safe
we ought to read out the current port assignment like we do at driver
init time.
To do that we can simply call vlv_initial_power_sequencer_setup() from
the encoder ->reset() hook before calling intel_edp_panel_vdd_sanitize().
There's no danger or clobbering the pps delays since we now have those
stored within intel_dp and we don't change them once initialized.
This will make sure that the vdd state gets correctly tracked post-resume
in case the BIOS enabled it.
We need to shuffle things around a bit to get the locking right, and
while at it, make intel_edp_panel_vdd_sanitize() static and move it
around a bit to avoid a forward declaration.
Cc: Imre Deak <imre.deak@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Keep IRQ mappings on context teardown. This won't leak IRQs as if we
allocate the mapping again, the generic code will give the same
mapping used last time.
Doing this works around a race in the generic code. Masking the
interrupt introduces a race which can crash the kernel or result in
IRQ that is never EOIed. The lost of EOI results in all subsequent
mappings to the same HW IRQ never receiving an interrupt.
We've seen this race with cxl test cases which are doing heavy context
startup and teardown at the same time as heavy interrupt load.
A fix to the generic code is being investigated also.
Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: stable@vger.kernel.org # 3.8 Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Tested-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Some hubs are forgetful, and end up forgetting whatever GUID we set
previously after we do a suspend/resume cycle. This can lead to
hotplugging breaking (along with probably other things) since the hub
will start sending connection notifications with the wrong GUID. As
such, we need to check on resume whether or not the GUID the hub is
giving us is valid.
We can thank KASAN for finding this, otherwise I probably would have spent
hours on it. This fixes a somewhat harder to trigger kernel panic, occuring
while enabling MST where the port we were currently updating the payload on
would have all of it's refs dropped before we finished what we were doing:
==================================================================
BUG: KASAN: use-after-free in drm_dp_update_payload_part1+0xb3f/0xdb0 [drm_kms_helper] at addr ffff8800d29de018
Read of size 4 by task Xorg/973
=============================================================================
BUG kmalloc-2048 (Tainted: G B W ): kasan: bad access detected
-----------------------------------------------------------------------------
So blk_flush_plug_list() was called with from_schedule == true.
If from_schedule is true, that means that finally blk_mq_insert_requests()
offloads execution of __blk_mq_run_hw_queue() and uses kblockd workqueue,
i.e. it calls kblockd_schedule_delayed_work_on().
That means, that we race with another CPU, which is about to execute
__blk_mq_run_hw_queue() work.
Further debugging shows the following traces from different CPUs:
CPU#0 CPU#1
---------------------------------- -------------------------------
reqeust A inserted
STORE hctx->ctx_map[0] bit marked
kblockd_schedule...() returns 1
<schedule to kblockd workqueue>
request B inserted
STORE hctx->ctx_map[1] bit marked
kblockd_schedule...() returns 0
*** WORK PENDING bit is cleared ***
flush_busy_ctxs() is executed, but
bit 1, set by CPU#1, is not observed
As a result request B pended forever.
This behaviour can be explained by speculative LOAD of hctx->ctx_map on
CPU#0, which is reordered with clear of PENDING bit and executed _before_
actual STORE of bit 1 on CPU#1.
The proper fix is an explicit full barrier <mfence>, which guarantees
that clear of PENDING bit is to be executed before all possible
speculative LOADS or STORES inside actual work function.
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Cc: Gioh Kim <gi-oh.kim@profitbricks.com> Cc: Michael Wang <yun.wang@profitbricks.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
blk_queue_split marks bio unmergeable, which makes sense for normal bio.
But if dispatching the bio to underlayer disk, the blk_queue_split
checks are invalid, hence it's possible the bio becomes mergeable.
In the reported bug, this bug causes trim against raid0 performance slash
https://bugzilla.kernel.org/show_bug.cgi?id=117051
Reported-and-tested-by: Park Ju Hyung <qkrwngud825@gmail.com> Fixes: 6ac45aeb6bca(block: avoid to merge splitted bio) Cc: stable@vger.kernel.org (v4.3+) Cc: Ming Lei <ming.lei@canonical.com> Cc: Neil Brown <neilb@suse.de> Reviewed-by: Jens Axboe <axboe@fb.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
The VSync polarity was negative instead of positive for the 4k CEA formats.
I probably copy-and-pasted these from the DMT 4k format, which does have a
negative VSync polarity.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Reported-by: Martin Bugge <marbugge@cisco.com> Cc: <stable@vger.kernel.org> # for v4.1 and up Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
The Link ECU is an aftermarket ECU computer for vehicles that provides
full tuning abilities as well as datalogging and displaying capabilities
via the USB to Serial adapter built into the device.
Signed-off-by: Mike Manning <michael@bsch.com.au> Cc: stable <stable@vger.kernel.org> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
The variable_matches() function can currently read "var_name[len]", for
example when:
- var_name[0] == 'a',
- len == 1
- match_name points to the NUL-terminated string "ab".
This function is supposed to accept "var_name" inputs that are not
NUL-terminated (hence the "len" parameter"). Document the function, and
access "var_name[*match]" only if "*match" is smaller than "len".
Reported-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Laszlo Ersek <lersek@redhat.com> Cc: Peter Jones <pjones@redhat.com> Cc: Matthew Garrett <mjg59@coreos.com> Cc: Jason Andryuk <jandryuk@gmail.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: <stable@vger.kernel.org> # v3.10+ Link: http://thread.gmane.org/gmane.comp.freedesktop.xorg.drivers.intel/86906 Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
In certain probe conditions the interrupt came right after registering
the handler causing a NULL pointer exception because of uninitialized
waitqueue:
$ udevadm trigger
i2c-gpio i2c-gpio-1: using pins 143 (SDA) and 144 (SCL)
i2c-gpio i2c-gpio-3: using pins 53 (SDA) and 52 (SCL)
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = e8b38000
[00000000] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in: snd_soc_i2s(+) i2c_gpio(+) snd_soc_idma snd_soc_s3c_dma snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer snd soundcore ac97_bus spi_s3c64xx pwm_samsung dwc2 exynos_adc phy_exynos_usb2 exynosdrm exynos_rng rng_core rtc_s3c
CPU: 0 PID: 717 Comm: data-provider-m Not tainted 4.6.0-rc1-next-20160401-00011-g1b8d87473b9e-dirty #101
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
(...)
(__wake_up_common) from [<c0379624>] (__wake_up+0x38/0x4c)
(__wake_up) from [<c0a41d30>] (ak8975_irq_handler+0x28/0x30)
(ak8975_irq_handler) from [<c0386720>] (handle_irq_event_percpu+0x88/0x140)
(handle_irq_event_percpu) from [<c038681c>] (handle_irq_event+0x44/0x68)
(handle_irq_event) from [<c0389c40>] (handle_edge_irq+0xf0/0x19c)
(handle_edge_irq) from [<c0385e04>] (generic_handle_irq+0x24/0x34)
(generic_handle_irq) from [<c05ee360>] (exynos_eint_gpio_irq+0x50/0x68)
(exynos_eint_gpio_irq) from [<c0386720>] (handle_irq_event_percpu+0x88/0x140)
(handle_irq_event_percpu) from [<c038681c>] (handle_irq_event+0x44/0x68)
(handle_irq_event) from [<c0389a70>] (handle_fasteoi_irq+0xb4/0x194)
(handle_fasteoi_irq) from [<c0385e04>] (generic_handle_irq+0x24/0x34)
(generic_handle_irq) from [<c03860b4>] (__handle_domain_irq+0x5c/0xb4)
(__handle_domain_irq) from [<c0301774>] (gic_handle_irq+0x54/0x94)
(gic_handle_irq) from [<c030c910>] (__irq_usr+0x50/0x80)
The bug was reproduced on exynos4412-trats2 (with a max77693 device also
using i2c-gpio) after building max77693 as a module.
Cc: <stable@vger.kernel.org> Fixes: 94a6d5cf7caa ("iio:ak8975 Implement data ready interrupt handling") Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Tested-by: Gregor Boirie <gregor.boirie@parrot.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Specifically for the case of reads that use the Extended Register
Read Long command, a multi-byte read operation is broken up into
8-byte chunks. However the call to spmi_ext_register_readl() is
incorrectly passing 'val_size', which if greater than 8 will
always fail. The argument should instead be 'len'.
Fixes: c9afbb05a9ff ("regmap: spmi: support base and extended register spaces") Signed-off-by: Jack Pham <jackp@codeaurora.org> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
On some SOCs PORTS_IMPL register value is never programmed by the
firmware and left at zero value. Which means that no sata ports are
available for software. AHCI driver used to cope up with this by
fabricating the port_map if the PORTS_IMPL register is read zero,
but recent patch broke this workaround as zero value was valid for
NVMe disks.
This patch adds ports-implemented DT bindings as workaround for this issue
in a way that DT can can override the PORTS_IMPL register in cases where
the firmware did not program it already.
In usecases where force_port_map is used saved_port_map is never set,
resulting in not programming the PORTS_IMPL register as part of initial
config. This patch fixes this by setting it to port_map even in case
where force_port_map is used, making it more inline with other parts of
the code.
The buck9 regulator of S2MPS11 PMIC had incorrect vsel_mask (0xff
instead of 0x1f) thus reading entire register as buck9's voltage. This
effectively caused regulator core to interpret values as higher voltages
than they were and then to set real voltage much lower than intended.
The buck9 provides power to other regulators, including LDO13
and LDO19 which supply the MMC2 (SD card). On Odroid XU3/XU4 the lower
voltage caused SD card detection errors on Odroid XU3/XU4:
mmc1: card never left busy state
mmc1: error -110 whilst initialising SD card
During driver probe the regulator core was checking whether initial
voltage matches the constraints. With incorrect vsel_mask of 0xff and
default value of 0x50, the core interpreted this as 5 V which is outside
of constraints (3-3.775 V). Then the regulator core was adjusting the
voltage to match the constraints. With incorrect vsel_mask this new
voltage mapped to a vere low voltage in the driver.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Reviewed-by: Javier Martinez Canillas <javier@osg.samsung.com> Tested-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Regardless of the previous CPU a timer was on, add_timer_on()
currently simply sets timer->flags to the new CPU. As the caller must
be seeing the timer as idle, this is locally fine, but the timer
leaving the old base while unlocked can lead to race conditions as
follows.
Let's say timer was on cpu 0.
cpu 0 cpu 1
-----------------------------------------------------------------------------
del_timer(timer) succeeds
del_timer(timer)
lock_timer_base(timer) locks cpu_0_base
add_timer_on(timer, 1)
spin_lock(&cpu_1_base->lock)
timer->flags set to cpu_1_base
operates on @timer operates on @timer
This triggered with mod_delayed_work_on() which contains
"if (del_timer()) add_timer_on()" sequence eventually leading to the
following oops.
This upstream commit is causing an oops: d8f00cd685f5 ("usb: hub: do not clear BOS field during reset device")
This patch has already been included in several -stable kernels. Here
are the affected kernels:
4.5.0-rc4 (current git)
4.4.2
4.3.6 (currently in review)
4.1.18
3.18.27
3.14.61
How to reproduce the problem:
Boot kernel with slub debugging enabled (otherwise memory corruption
will cause random oopses later instead of immediately)
Plug in USB 3.0 disk to xhci USB 3.0 port
dd if=/dev/sdc of=/dev/null bs=65536
(where /dev/sdc is the USB 3.0 disk)
Unplug USB cable while dd is still going
Oops is immediate:
Reported-by: Tony Battersby <tonyb@cybernetics.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
eb1af3b71f9d ("Fix computation of channel address")
I switched the "sck_way" variable from holding the log2 value read
from the h/w to instead be the actual number. Unfortunately it
is needed in log2 form when used to shift the address.
Huge pages are not normally available to PV guests. Not suppressing
hugetlbfs use results in an endless loop of page faults when user mode
code tries to access a hugetlbfs mapped area (since the hypervisor
denies such PTEs to be created, but error indications can't be
propagated out of xen_set_pte_at(), just like for various of its
siblings), and - once killed in an oops like this:
On s390 we only can enable hugepages if the underlying hardware/hypervisor
also does support this. Common code now would assume this to be
signaled by setting HPAGE_SHIFT to 0. But on s390, where we only
support one hugepage size, there is a link between HPAGE_SHIFT and
pageblock_order.
So instead of setting HPAGE_SHIFT to 0, we will implement the check for
the hardware capability.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
s390 has a constant hugepage size, by setting HPAGE_SHIFT we also change
e.g. the pageblock_order, which should be independent in respect to
hugepage support.
With this patch every architecture is free to define how to check
for hugepage support.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Allowing userptr bo which are basicly a list of page from some vma
(so either anonymous page or file backed page) would lead to serious
corruption of kernel structures and counters (because we overwrite
the page->mapping field when mapping buffer).
This will already block if the buffer was populated before anyone does
try to mmap it because then TTM_PAGE_FLAG_SG would be set in in the
ttm_tt flags. But that flag is check before ttm_tt_populate in the ttm
vm fault handler.
So to be safe just add a check to verify_access() callback.
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
The commit [9bef72bdb26e: ALSA: pcxhr: Use nonatomic PCM ops]
converted to non-atomic PCM ops, but shamelessly with an unbalanced
mutex locking, which leads to the hangup easily. Fix it.
If userspace calls UNLOCK_PI unconditionally without trying the TID -> 0
transition in user space first then the user space value might not have the
waiters bit set. This opens the following race:
ATPX is the ACPI method for controlling AMD PowerXpress laptops.
There are flags to indicate which methods are supported. If
the dGPU power down flag is not supported, the driver needs to
implement the dGPU power down manually. We had previously
always forced the driver to assume the ATPX dGPU power down
was present, but this causes problems on boards where it is
not, leading to GPU hangs when attempting to power down the
dGPU. Manual dGPU power down is not currently supported in
the Linux driver. Some laptops indicate that the ATPX
dGPU power down method is not present, but it actually
apparently is. I'm not sure if this is a bios bug and it should
be set or if there is a reason it was unset and the method should
not be used. This is not an issue on other OSes since both the
ATPX and the manual driver power down methods are supported.
This is apparently fairly widespread, so just revert for now.
The REAL_LE feature entry in the ibm_pa_feature struct is missing an MMU
feature value, meaning all the remaining elements initialise the wrong
values.
This means instead of checking for byte 5, bit 0, we check for byte 0,
bit 0, and then we incorrectly set the CPU feature bit as well as MMU
feature bit 1 and CPU user feature bits 0 and 2 (5).
Checking byte 0 bit 0 (IBM numbering), means we're looking at the
"Memory Management Unit (MMU)" feature - ie. does the CPU have an MMU.
In practice that bit is set on all platforms which have the property.
This means we set CPU_FTR_REAL_LE always. In practice that seems not to
matter because all the modern cpus which have this property also
implement REAL_LE, and we've never needed to disable it.
We're also incorrectly setting MMU feature bit 1, which is:
#define MMU_FTR_TYPE_8xx 0x00000002
Luckily the only place that looks for MMU_FTR_TYPE_8xx is in Book3E
code, which can't run on the same cpus as scan_features(). So this also
doesn't matter in practice.
Finally in the CPU user feature mask, we're setting bits 0 and 2. Bit 2
is not currently used, and bit 0 is:
#define PPC_FEATURE_PPC_LE 0x00000001
Which says the CPU supports the old style "PPC Little Endian" mode.
Again this should be harmless in practice as no 64-bit CPUs implement
that mode.
Fix the code by adding the missing initialisation of the MMU feature.
Also add a comment marking CPU user feature bit 2 (0x4) as reserved. It
would be unsafe to start using it as old kernels incorrectly set it.
Fixes: 44ae3ab3358e ("powerpc: Free up some CPU feature bits by moving out MMU-related features") Signed-off-by: Anton Blanchard <anton@samba.org> Cc: stable@vger.kernel.org
[mpe: Flesh out changelog, add comment reserving 0x4] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Prevent information from leaking to userspace by doing a memset to 0 of
the export state structure before setting the structure values and copying
it. This prevents un-initialized padding areas from being copied into the
export area.
Cc: <stable@vger.kernel.org> # 3.14.x- Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
In sha_complete_job, incorrect mcryptd_hash_request_ctx pointer is used
when check and complete other jobs. If the memory of first completed req
is freed, while still completing other jobs in the func, kernel will
crash since NULL pointer is assigned to RIP.
Cc: <stable@vger.kernel.org> Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com> Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
A non-privileged user can create a netlink socket with the same port_id as
used by an existing open nl80211 netlink socket (e.g. as used by a hostapd
process) with a different protocol number.
Closing this socket will then lead to the notification going to nl80211's
socket release notification handler, and possibly cause an action such as
removing a virtual interface.
Fix this issue by checking that the netlink protocol is NETLINK_GENERIC.
Since generic netlink has no notifier chain of its own, we can't fix the
problem more generically.
Fixes: 026331c4d9b5 ("cfg80211/mac80211: allow registering for and sending action frames") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Ivanov <dima@ubnt.com>
[rewrite commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
The gtco driver expects at least one valid endpoint. If given malicious
descriptors that specify 0 for the number of endpoints, it will crash in
the probe function. Ensure there is at least one endpoint on the interface
before using it.
Also let's fix a minor coding style issue.
The full correct report of this issue can be found in the public
Red Hat Bugzilla:
Tracking idle time in bictcp_cwnd_event() is imprecise, as epoch_start
is normally set at ACK processing time, not at send time.
Doing a proper fix would need to add an additional state variable,
and does not seem worth the trouble, given CUBIC bug has been there
forever before Jana noticed it.
Let's simply not set epoch_start in the future, otherwise
bictcp_update() could overflow and CUBIC would again
grow cwnd too fast.
This was detected thanks to a packetdrill test Neal wrote that was flaky
before applying this fix.
Fixes: 30927520dbae ("tcp_cubic: better follow cubic curve after idle period") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Cc: Jana Iyengar <jri@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
When we call btrfs_commit_transaction(), we splice the list "ordered"
of our transaction handle into the transaction's "pending_ordered"
list, but we don't re-initialize the "ordered" list of our transaction
handle, this means it still points to the same elements it used to
before the splice. Then we check if the current transaction's state is
>= TRANS_STATE_COMMIT_START and if it is we end up calling
btrfs_end_transaction() which simply splices again the "ordered" list
of our handle into the transaction's "pending_ordered" list, leaving
multiple pointers to the same ordered extents which results in list
corruption when we are iterating, removing and freeing ordered extents
at btrfs_wait_pending_ordered(), resulting in access to dangling
pointers / use-after-free issues.
Similarly, btrfs_end_transaction() can end up in some cases calling
btrfs_commit_transaction(), and both did a list splice of the transaction
handle's "ordered" list into the transaction's "pending_ordered" without
re-initializing the handle's "ordered" list, resulting in exactly the
same problem.
This produces the following warning on a kernel with linked list
debugging enabled:
On a non-debug kernel this leads to invalid memory accesses, causing a
crash. Fix this by using list_splice_init() instead of list_splice() in
btrfs_commit_transaction() and btrfs_end_transaction().
Cc: stable@vger.kernel.org Fixes: 50d9aa99bd35 ("Btrfs: make sure logged extents complete in the current transaction V3" Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Mike Galbraith [Sat, 23 Apr 2016 00:38:23 +0000 (20:38 -0400)]
Correct backport of fa3c776 ("Thermal: Ignore invalid trip points")
Backport of 81ad4276b505e987dd8ebbdf63605f92cd172b52 failed to adjust
for intervening ->get_trip_temp() argument type change, thus causing
stack protector to panic.
drivers/thermal/thermal_core.c: In function ‘thermal_zone_device_register’:
drivers/thermal/thermal_core.c:1569:41: warning: passing argument 3 of
‘tz->ops->get_trip_temp’ from incompatible pointer type [-Wincompatible-pointer-types]
if (tz->ops->get_trip_temp(tz, count, &trip_temp))
^
drivers/thermal/thermal_core.c:1569:41: note: expected ‘long unsigned int *’
but argument is of type ‘int *’
CC: <stable@vger.kernel.org> #3.18,#4.1 Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Jana Iyengar found an interesting issue on CUBIC :
The epoch is only updated/reset initially and when experiencing losses.
The delta "t" of now - epoch_start can be arbitrary large after app idle
as well as the bic_target. Consequentially the slope (inverse of
ca->cnt) would be really large, and eventually ca->cnt would be
lower-bounded in the end to 2 to have delayed-ACK slow-start behavior.
This particularly shows up when slow_start_after_idle is disabled
as a dangerous cwnd inflation (1.5 x RTT) after few seconds of idle
time.
Jana initial fix was to reset epoch_start if app limited,
but Neal pointed out it would ask the CUBIC algorithm to recalculate the
curve so that we again start growing steeply upward from where cwnd is
now (as CUBIC does just after a loss). Ideally we'd want the cwnd growth
curve to be the same shape, just shifted later in time by the amount of
the idle period.
Reported-by: Jana Iyengar <jri@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Sangtae Ha <sangtae.ha@gmail.com> Cc: Lawrence Brakmo <lawrence@brakmo.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
On BXT platform Host Controller and Device Controller figure as
same PCI device but with different device function. HCD should
not pass data to Device Controller but only to Host Controllers.
Checking if companion device is Host Controller, otherwise skip.
Cc: <stable@vger.kernel.org> Signed-off-by: Robert Dobrowolski <robert.dobrowolski@linux.intel.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
PCI hotpluggable xhci controllers such as some Alpine Ridge solutions will
remove the xhci controller from the PCI bus when the last USB device is
disconnected.
Add a flag to indicate that the host is being removed to avoid queueing
configure_endpoint commands for the dropped endpoints.
For PCI hotplugged controllers this will prevent 5 second command timeouts
For static xhci controllers the configure_endpoint command is not needed
in the removal case as everything will be returned, freed, and the
controller is reset.
For now the flag is only set for PCI connected host controllers.
The problem seems to be that if a new device is detected
while we have already removed the shared HCD, then many of the
xhci operations (e.g. xhci_alloc_dev(), xhci_setup_device())
hang as command never completes.
I don't think XHCI can operate without the shared HCD as we've
already called xhci_halt() in xhci_only_stop_hcd() when shared HCD
goes away. We need to prevent new commands from being queued
not only when HCD is dying but also when HCD is halted.
The following lockup was detected while testing the otg state
machine.
[ 178.199951] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
[ 178.205799] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 1
[ 178.214458] xhci-hcd xhci-hcd.0.auto: hcc params 0x0220f04c hci version 0x100 quirks 0x00010010
[ 178.223619] xhci-hcd xhci-hcd.0.auto: irq 400, io mem 0x48890000
[ 178.230677] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
[ 178.237796] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 178.245358] usb usb1: Product: xHCI Host Controller
[ 178.250483] usb usb1: Manufacturer: Linux 4.0.0-rc1-00024-g6111320 xhci-hcd
[ 178.257783] usb usb1: SerialNumber: xhci-hcd.0.auto
[ 178.267014] hub 1-0:1.0: USB hub found
[ 178.272108] hub 1-0:1.0: 1 port detected
[ 178.278371] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
[ 178.284171] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 2
[ 178.294038] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003
[ 178.301183] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 178.308776] usb usb2: Product: xHCI Host Controller
[ 178.313902] usb usb2: Manufacturer: Linux 4.0.0-rc1-00024-g6111320 xhci-hcd
[ 178.321222] usb usb2: SerialNumber: xhci-hcd.0.auto
[ 178.329061] hub 2-0:1.0: USB hub found
[ 178.333126] hub 2-0:1.0: 1 port detected
[ 178.567585] dwc3 48890000.usb: usb_otg_start_host 0
[ 178.572707] xhci-hcd xhci-hcd.0.auto: remove, state 4
[ 178.578064] usb usb2: USB disconnect, device number 1
[ 178.586565] xhci-hcd xhci-hcd.0.auto: USB bus 2 deregistered
[ 178.592585] xhci-hcd xhci-hcd.0.auto: remove, state 1
[ 178.597924] usb usb1: USB disconnect, device number 1
[ 178.603248] usb 1-1: new high-speed USB device number 2 using xhci-hcd
[ 190.597337] INFO: task kworker/u4:0:6 blocked for more than 10 seconds.
[ 190.604273] Not tainted 4.0.0-rc1-00024-g6111320 #1058
[ 190.610228] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 190.618443] kworker/u4:0 D c05c0ac0 0 6 2 0x00000000
[ 190.625120] Workqueue: usb_otg usb_otg_work
[ 190.629533] [<c05c0ac0>] (__schedule) from [<c05c10ac>] (schedule+0x34/0x98)
[ 190.636915] [<c05c10ac>] (schedule) from [<c05c1318>] (schedule_preempt_disabled+0xc/0x10)
[ 190.645591] [<c05c1318>] (schedule_preempt_disabled) from [<c05c23d0>] (mutex_lock_nested+0x1ac/0x3fc)
[ 190.655353] [<c05c23d0>] (mutex_lock_nested) from [<c046cf8c>] (usb_disconnect+0x3c/0x208)
[ 190.664043] [<c046cf8c>] (usb_disconnect) from [<c0470cf0>] (_usb_remove_hcd+0x98/0x1d8)
[ 190.672535] [<c0470cf0>] (_usb_remove_hcd) from [<c0485da8>] (usb_otg_start_host+0x50/0xf4)
[ 190.681299] [<c0485da8>] (usb_otg_start_host) from [<c04849a4>] (otg_set_protocol+0x5c/0xd0)
[ 190.690153] [<c04849a4>] (otg_set_protocol) from [<c0484b88>] (otg_set_state+0x170/0xbfc)
[ 190.698735] [<c0484b88>] (otg_set_state) from [<c0485740>] (otg_statemachine+0x12c/0x470)
[ 190.707326] [<c0485740>] (otg_statemachine) from [<c0053c84>] (process_one_work+0x1b4/0x4a0)
[ 190.716162] [<c0053c84>] (process_one_work) from [<c00540f8>] (worker_thread+0x154/0x44c)
[ 190.724742] [<c00540f8>] (worker_thread) from [<c0058f88>] (kthread+0xd4/0xf0)
[ 190.732328] [<c0058f88>] (kthread) from [<c000e810>] (ret_from_fork+0x14/0x24)
[ 190.739898] 5 locks held by kworker/u4:0/6:
[ 190.744274] #0: ("%s""usb_otg"){.+.+.+}, at: [<c0053bf4>] process_one_work+0x124/0x4a0
[ 190.752799] #1: ((&otgd->work)){+.+.+.}, at: [<c0053bf4>] process_one_work+0x124/0x4a0
[ 190.761326] #2: (&otgd->fsm.lock){+.+.+.}, at: [<c048562c>] otg_statemachine+0x18/0x470
[ 190.769934] #3: (usb_bus_list_lock){+.+.+.}, at: [<c0470ce8>] _usb_remove_hcd+0x90/0x1d8
[ 190.778635] #4: (&dev->mutex){......}, at: [<c046cf8c>] usb_disconnect+0x3c/0x208
[ 190.786700] INFO: task kworker/1:0:14 blocked for more than 10 seconds.
[ 190.793633] Not tainted 4.0.0-rc1-00024-g6111320 #1058
[ 190.799567] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 190.807783] kworker/1:0 D c05c0ac0 0 14 2 0x00000000
[ 190.814457] Workqueue: usb_hub_wq hub_event
[ 190.818866] [<c05c0ac0>] (__schedule) from [<c05c10ac>] (schedule+0x34/0x98)
[ 190.826252] [<c05c10ac>] (schedule) from [<c05c4e40>] (schedule_timeout+0x13c/0x1ec)
[ 190.834377] [<c05c4e40>] (schedule_timeout) from [<c05c19f0>] (wait_for_common+0xbc/0x150)
[ 190.843062] [<c05c19f0>] (wait_for_common) from [<bf068a3c>] (xhci_setup_device+0x164/0x5cc [xhci_hcd])
[ 190.852986] [<bf068a3c>] (xhci_setup_device [xhci_hcd]) from [<c046b7f4>] (hub_port_init+0x3f4/0xb10)
[ 190.862667] [<c046b7f4>] (hub_port_init) from [<c046eb64>] (hub_event+0x704/0x1018)
[ 190.870704] [<c046eb64>] (hub_event) from [<c0053c84>] (process_one_work+0x1b4/0x4a0)
[ 190.878919] [<c0053c84>] (process_one_work) from [<c00540f8>] (worker_thread+0x154/0x44c)
[ 190.887503] [<c00540f8>] (worker_thread) from [<c0058f88>] (kthread+0xd4/0xf0)
[ 190.895076] [<c0058f88>] (kthread) from [<c000e810>] (ret_from_fork+0x14/0x24)
[ 190.902650] 5 locks held by kworker/1:0/14:
[ 190.907023] #0: ("usb_hub_wq"){.+.+.+}, at: [<c0053bf4>] process_one_work+0x124/0x4a0
[ 190.915454] #1: ((&hub->events)){+.+.+.}, at: [<c0053bf4>] process_one_work+0x124/0x4a0
[ 190.924070] #2: (&dev->mutex){......}, at: [<c046e490>] hub_event+0x30/0x1018
[ 190.931768] #3: (&port_dev->status_lock){+.+.+.}, at: [<c046eb50>] hub_event+0x6f0/0x1018
[ 190.940558] #4: (&bus->usb_address0_mutex){+.+.+.}, at: [<c046b458>] hub_port_init+0x58/0xb10
On some xHCI controllers (e.g. R-Car SoCs), the AC64 bit (bit 0) of
HCCPARAMS1 is set to 1. However, the xHCs don't support 64-bit
address memory pointers actually. So, in this case, this driver should
call dma_set_coherent_mask(dev, DMA_BIT_MASK(32)) in xhci_gen_setup().
Otherwise, the xHCI controller will be died after a usb device is
connected if it runs on above 4GB physical memory environment.
So, this patch adds a new quirk XHCI_NO_64BIT_SUPPORT to resolve
such an issue.
Give USB3 devices a better chance to enumerate at USB 3 speeds if
they are connected to a suspended host.
Solves an issue with NEC uPD720200 host hanging when partially
enumerating a USB3 device as USB2 after host controller runtime resume.
Based on Sergey's test patch [1], this fixes zram with lz4 compression
on big endian cpus.
Note that the 64-bit preprocessor test is not a cleanup, it's part of
the fix, since those identifiers are bogus (for example, __ppc64__
isn't defined anywhere else in the kernel, which means we'd fall into
the 32-bit definitions on ppc64).
The commit 895005202987 ("dmaengine: dw: apply both HS interfaces and remove
slave_id usage") cleaned up the code to avoid usage of depricated slave_id
member of generic slave configuration.
Meanwhile it broke the master selection by removing important call to
dwc_set_masters() in ->device_alloc_chan_resources() which copied masters from
custom slave configuration to the internal channel structure.
Everything works until now since there is no customized connection of
DesignWare DMA IP to the bus, i.e. one bus and one or more masters are in use.
The configurations where 2 masters are connected to the different masters are
not working anymore. We are expecting one user of such configuration and need
to select masters properly. Besides that it is obviously a performance
regression since only one master is in use in multi-master configuration.
Select masters in accordance with what user asked for. Keep this patch in a form
more suitable for back porting.
We are safe to take necessary data in ->device_alloc_chan_resources() because
we don't support generic slave configuration embedded into custom one, and thus
the only way to provide such is to use the parameter to a filter function which
is called exactly before channel resource allocation.
While here, replase BUG_ON to less noisy dev_warn() and prevent channel
allocation in case of error.
Fixes: 895005202987 ("dmaengine: dw: apply both HS interfaces and remove slave_id usage") Cc: stable@vger.kernel.org Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
This is Dell usb dock audio workaround.
It was fixed the master volume keep lower.
[Some background: the patch essentially skips the controls of a couple
of FU volumes. Although the firmware exposes the dB and the value
information via the usb descriptor, changing the values (we set the
min volume as default) screws up the device. Although this has been
fixed in the newer firmware, the devices are shipped with the old
firmware, thus we need the workaround in the driver side. -- tiwai]