Reported-by: syzbot+7d6a57304857423318a5@syzkaller.appspotmail.com Fixes: 408cbe695350 ("vfs: Convert fuse to use the new mount API") Cc: David Howells <dhowells@redhat.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
RIP: 0010:__apic_accept_irq+0x46/0x740 arch/x86/kvm/lapic.c:1029
Call Trace:
kvm_apic_set_irq+0xb4/0x140 arch/x86/kvm/lapic.c:558
stimer_notify_direct arch/x86/kvm/hyperv.c:648 [inline]
stimer_expiration arch/x86/kvm/hyperv.c:659 [inline]
kvm_hv_process_stimers+0x594/0x1650 arch/x86/kvm/hyperv.c:686
vcpu_enter_guest+0x2b2a/0x54b0 arch/x86/kvm/x86.c:7896
vcpu_run+0x393/0xd40 arch/x86/kvm/x86.c:8152
kvm_arch_vcpu_ioctl_run+0x636/0x900 arch/x86/kvm/x86.c:8360
kvm_vcpu_ioctl+0x6cf/0xaf0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2765
The testcase programs HV_X64_MSR_STIMERn_CONFIG/HV_X64_MSR_STIMERn_COUNT,
in addition, there is no lapic in the kernel, the counters value are small
enough in order that kvm_hv_process_stimers() inject this already-expired
timer interrupt into the guest through lapic in the kernel which triggers
the NULL deferencing. This patch fixes it by don't advertise direct mode
synthetic timers and discarding the inject when lapic is not in kernel.
syzbot found that a thread can stall for minutes inside kexec_load() after
that thread was killed by SIGKILL [1]. It turned out that the reproducer
was trying to allocate 2408MB of memory using kimage_alloc_page() from
kimage_load_normal_segment(). Let's check for SIGKILL before doing memory
allocation.
nfc_genl_deactivate_target() relies on the NFC_ATTR_TARGET_INDEX
attribute being present, but doesn't check whether it is actually
provided by the user. Same goes for nfc_genl_fw_download() and
NFC_ATTR_FIRMWARE_NAME.
This patch adds appropriate checks.
Found with syzkaller.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Unit of 'chunk_size' is byte, instead of sector, so fix it by setting
the queue_limits' max_discard_sectors to rs->md.chunk_sectors. Also,
rename chunk_size to chunk_size_bytes.
Without this fix, too big max_discard_sectors is applied on the request
queue of dm-raid, finally raid code has to split the bio again.
This re-split done by raid causes the following nested clone_endio:
1) one big bio 'A' is submitted to dm queue, and served as the original
bio
2) one new bio 'B' is cloned from the original bio 'A', and .map()
is run on this bio of 'B', and B's original bio points to 'A'
3) raid code sees that 'B' is too big, and split 'B' and re-submit
the remainded part of 'B' to dm-raid queue via generic_make_request().
4) now dm will handle 'B' as new original bio, then allocate a new
clone bio of 'C' and run .map() on 'C'. Meantime C's original bio
points to 'B'.
5) suppose now 'C' is completed by raid directly, then the following
clone_endio() is called recursively:
clone_endio(C)
->clone_endio(B) #B is original bio of 'C'
->bio_endio(A)
'A' can be big enough to make hundreds of nested clone_endio(), then
stack can be corrupted easily.
Fixes: 61697a6abd24a ("dm: eliminate 'split_discard_bios' flag from DM target interface") Cc: stable@vger.kernel.org Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
inode_smack::smk_lock is taken during smack_d_instantiate(), which is
called during a filesystem transaction when creating a file on ext4.
Therefore to avoid a deadlock, all code that takes this lock must use
GFP_NOFS, to prevent memory reclaim from waiting for the filesystem
transaction to complete.
There is a logic bug in the current smack_bprm_set_creds():
If LSM_UNSAFE_PTRACE is set, but the ptrace state is deemed to be
acceptable (e.g. because the ptracer detached in the meantime), the other
->unsafe flags aren't checked. As far as I can tell, this means that
something like the following could work (but I haven't tested it):
- task A: create task B with fork()
- task B: set NO_NEW_PRIVS
- task B: install a seccomp filter that makes open() return 0 under some
conditions
- task B: replace fd 0 with a malicious library
- task A: attach to task B with PTRACE_ATTACH
- task B: execve() a file with an SMACK64EXEC extended attribute
- task A: while task B is still in the middle of execve(), exit (which
destroys the ptrace relationship)
Make sure that if any flags other than LSM_UNSAFE_PTRACE are set in
bprm->unsafe, we reject the execve().
Cc: stable@vger.kernel.org Fixes: 5663884caab1 ("Smack: unify all ptrace accesses in the smack") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This commit takes care of stack randomization and stack guard gap when
computing mmap base address and checks if the task asked for
randomization. This fixes the problem uncovered and not fixed for arm
here: https://lkml.kernel.org/r/20170622200033.25714-1-riel@redhat.com
Link: http://lkml.kernel.org/r/20190730055113.23635-7-alex@ghiti.fr Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Cc: James Hogan <jhogan@kernel.org> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
This commit takes care of stack randomization and stack guard gap when
computing mmap base address and checks if the task asked for
randomization. This fixes the problem uncovered and not fixed for arm
here: https://lkml.kernel.org/r/20170622200033.25714-1-riel@redhat.com
Link: http://lkml.kernel.org/r/20190730055113.23635-10-alex@ghiti.fr Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Paul Burton <paul.burton@mips.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Cc: James Hogan <jhogan@kernel.org> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Do not offset mmap base address because of stack randomization if current
task does not want randomization. Note that x86 already implements this
behaviour.
Link: http://lkml.kernel.org/r/20190730055113.23635-4-alex@ghiti.fr Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: James Hogan <jhogan@kernel.org> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
There is a scenario causing ocfs2 umount hang when multiple hosts are
rebooting at the same time.
NODE1 NODE2 NODE3
send unlock requset to NODE2
dies
become recovery master
recover NODE2
find NODE2 dead
mark resource RECOVERING
directly remove lock from grant list
calculate usage but RECOVERING marked
**miss the window of purging
clear RECOVERING
To reproduce this issue, crash a host and then umount ocfs2
from another node.
To solve this, just let unlock progress wait for recovery done.
Link: http://lkml.kernel.org/r/1550124866-20367-1-git-send-email-gechangwei@live.cn Signed-off-by: Changwei Ge <gechangwei@live.cn> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Since 9e3596b0c653 ("kbuild: initramfs cleanup, set target from Kconfig")
"make clean" leaves behind compressed initramfs images. Example:
$ make defconfig
$ sed -i 's|CONFIG_INITRAMFS_SOURCE=""|CONFIG_INITRAMFS_SOURCE="/tmp/ir.cpio"|' .config
$ make olddefconfig
$ make -s
$ make -s clean
$ git clean -ndxf | grep initramfs
Would remove usr/initramfs_data.cpio.gz
clean rules do not have CONFIG_* context so they do not know which
compression format was used. Thus they don't know which files to delete.
Tell clean to delete all possible compression formats.
Once patched usr/initramfs_data.cpio.gz and friends are deleted by
"make clean".
Link: http://lkml.kernel.org/r/20190722063251.55541-1-gthelen@google.com Fixes: 9e3596b0c653 ("kbuild: initramfs cleanup, set target from Kconfig") Signed-off-by: Greg Thelen <gthelen@google.com> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
First, when sgl_current->next is valid, @hw_sgl will be freed in the
first loop, but it free again after the loop.
Second, sgl_current and sgl_current->next_sgl is not match when
dma_pool_free() is invoked, the third parameter should be the dma
address of sgl_current, but sgl_current->next_sgl is the dma address
of next chain, so use sgl_current->next_sgl is wrong.
Fix this by deleting the last dma_pool_free() in sec_free_hw_sgl(),
modifying the condition for while loop, and matching the address for
dma_pool_free().
In hypfs_fill_super(), if hypfs_create_update_file() fails,
sbi->update_file is left holding an error number. This is passed to
hypfs_kill_super() which doesn't check for this.
Fix this by not setting sbi->update_value until after we've checked for
error.
Fixes: 24bbb1faf3f0 ("[PATCH] s390_hypfs filesystem") Signed-off-by: David Howells <dhowells@redhat.com>
cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
cc: Heiko Carstens <heiko.carstens@de.ibm.com>
cc: linux-s390@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
Anatoly reports that he gets the below warning when booting -git on
a sparc64 box on debian unstable:
...
[ 13.352975] aes_sparc64: Using sparc64 aes opcodes optimized AES
implementation
[ 13.428002] ------------[ cut here ]------------
[ 13.428081] WARNING: CPU: 21 PID: 586 at
drivers/block/pktcdvd.c:2597 pkt_setup_dev+0x2e4/0x5a0 [pktcdvd]
[ 13.428147] Attempt to register a non-SCSI queue
[ 13.428184] Modules linked in: pktcdvd libdes cdrom aes_sparc64
n2_rng md5_sparc64 sha512_sparc64 rng_core sha256_sparc64 flash
sha1_sparc64 ip_tables x_tables ipv6 crc_ccitt nf_defrag_ipv6 autofs4
ext4 crc16 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy
async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear
md_mod crc32c_sparc64
[ 13.428452] CPU: 21 PID: 586 Comm: pktsetup Not tainted 5.3.0-10169-g574cc4539762 #1234
[ 13.428507] Call Trace:
[ 13.428542] [00000000004635c0] __warn+0xc0/0x100
[ 13.428582] [0000000000463634] warn_slowpath_fmt+0x34/0x60
[ 13.428626] [000000001045b244] pkt_setup_dev+0x2e4/0x5a0 [pktcdvd]
[ 13.428674] [000000001045ccf4] pkt_ctl_ioctl+0x94/0x220 [pktcdvd]
[ 13.428724] [00000000006b95c8] do_vfs_ioctl+0x628/0x6e0
[ 13.428764] [00000000006b96c8] ksys_ioctl+0x48/0x80
[ 13.428803] [00000000006b9714] sys_ioctl+0x14/0x40
[ 13.428847] [0000000000406294] linux_sparc_syscall+0x34/0x44
[ 13.428890] irq event stamp: 4181
[ 13.428924] hardirqs last enabled at (4189): [<00000000004e0a74>]
console_unlock+0x634/0x6c0
[ 13.428984] hardirqs last disabled at (4196): [<00000000004e0540>]
console_unlock+0x100/0x6c0
[ 13.429048] softirqs last enabled at (3978): [<0000000000b2e2d8>]
__do_softirq+0x498/0x520
[ 13.429110] softirqs last disabled at (3967): [<000000000042cfb4>]
do_softirq_own_stack+0x34/0x60
[ 13.429172] ---[ end trace 2220ca468f32967d ]---
[ 13.430018] pktcdvd: setup of pktcdvd device failed
[ 13.455589] des_sparc64: Using sparc64 des opcodes optimized DES
implementation
[ 13.515334] camellia_sparc64: Using sparc64 camellia opcodes
optimized CAMELLIA implementation
[ 13.522856] pktcdvd: setup of pktcdvd device failed
[ 13.529327] pktcdvd: setup of pktcdvd device failed
[ 13.532932] pktcdvd: setup of pktcdvd device failed
[ 13.536165] pktcdvd: setup of pktcdvd device failed
[ 13.539372] pktcdvd: setup of pktcdvd device failed
[ 13.542834] pktcdvd: setup of pktcdvd device failed
[ 13.546536] pktcdvd: setup of pktcdvd device failed
[ 15.431071] XFS (dm-0): Mounting V5 Filesystem
...
Apparently debian auto-attaches any cdrom like device to pktcdvd, which
can lead to the above warning. There's really no reason to warn for this
situation, kill it.
If userspace reads the buffer via blockdev while mounting,
sb_getblk()+modify can race with buffer read via blockdev.
For example,
FS userspace
bh = sb_getblk()
modify bh->b_data
read
ll_rw_block(bh)
fill bh->b_data by on-disk data
/* lost modified data by FS */
set_buffer_uptodate(bh)
set_buffer_uptodate(bh)
Userspace should not use the blockdev while mounting though, the udev
seems to be already doing this. Although I think the udev should try to
avoid this, workaround the race by small overhead.
Link: http://lkml.kernel.org/r/87pnk7l3sw.fsf_-_@mail.parknet.co.jp Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Reported-by: Jan Stancek <jstancek@redhat.com> Tested-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The calculation of memblock_limit in adjust_lowmem_bounds() assumes that
bank 0 starts from a PMD-aligned address. However, the beginning of the
first bank may be NOMAP memory and the start of usable memory
will be not aligned to PMD boundary. In such case the memblock_limit will
be set to the end of the NOMAP region, which will prevent any memblock
allocations.
Mark the region between the end of the NOMAP area and the next PMD-aligned
address as NOMAP as well, so that the usable memory will start at
PMD-aligned address.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently, multi_v7_defconfig + CONFIG_FUNCTION_TRACER fails to build
with clang:
arm-linux-gnueabi-ld: kernel/softirq.o: in function `_local_bh_enable':
softirq.c:(.text+0x504): undefined reference to `mcount'
arm-linux-gnueabi-ld: kernel/softirq.o: in function `__local_bh_enable_ip':
softirq.c:(.text+0x58c): undefined reference to `mcount'
arm-linux-gnueabi-ld: kernel/softirq.o: in function `do_softirq':
softirq.c:(.text+0x6c8): undefined reference to `mcount'
arm-linux-gnueabi-ld: kernel/softirq.o: in function `irq_enter':
softirq.c:(.text+0x75c): undefined reference to `mcount'
arm-linux-gnueabi-ld: kernel/softirq.o: in function `irq_exit':
softirq.c:(.text+0x840): undefined reference to `mcount'
arm-linux-gnueabi-ld: kernel/softirq.o:softirq.c:(.text+0xa50): more undefined references to `mcount' follow
clang can emit a working mcount symbol, __gnu_mcount_nc, when
'-meabi gnu' is passed to it. Until r369147 in LLVM, this was
broken and caused the kernel not to boot with '-pg' because the
calling convention was not correct. Always build with '-meabi gnu'
when using clang but ensure that '-pg' (which is added with
CONFIG_FUNCTION_TRACER and its prereq CONFIG_HAVE_FUNCTION_TRACER)
cannot be added with it unless this is fixed (which means using
clang 10.0.0 and newer).
Move the static keyword to the front of declarations of pci_regs_behavior[]
and pcie_cap_regs_behavior[], which resolves compiler warnings when
building with "W=1":
drivers/pci/pci-bridge-emul.c:41:1: warning: ‘static’ is not at beginning of
declaration [-Wold-style-declaration]
const static struct pci_bridge_reg_behavior pci_regs_behavior[] = {
^
drivers/pci/pci-bridge-emul.c:176:1: warning: ‘static’ is not at beginning of
declaration [-Wold-style-declaration]
const static struct pci_bridge_reg_behavior pcie_cap_regs_behavior[] = {
^
devm_of_phy_get() can fail for a number of reasons besides probe
deferral. It can for example return -ENOMEM if it runs out of memory as
it tries to allocate devres structures. Propagating only -EPROBE_DEFER
is problematic because it results in these legitimately fatal errors
being treated as "PHY not specified in DT".
What we really want is to ignore the optional PHYs only if they have not
been specified in DT. devm_of_phy_get() returns -ENODEV in this case, so
that's the special case that we need to handle. So we propagate all
errors, except -ENODEV, so that real failures will still cause the
driver to fail probe.
Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Andrew Murray <andrew.murray@arm.com> Cc: Jingoo Han <jingoohan1@gmail.com> Cc: Kukjin Kim <kgene@kernel.org> Cc: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
regulator_get_optional() can fail for a number of reasons besides probe
deferral. It can for example return -ENOMEM if it runs out of memory as
it tries to allocate data structures. Propagating only -EPROBE_DEFER is
problematic because it results in these legitimately fatal errors being
treated as "regulator not specified in DT".
What we really want is to ignore the optional regulators only if they
have not been specified in DT. regulator_get_optional() returns -ENODEV
in this case, so that's the special case that we need to handle. So we
propagate all errors, except -ENODEV, so that real failures will still
cause the driver to fail probe.
Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Andrew Murray <andrew.murray@arm.com> Cc: Richard Zhu <hongxing.zhu@nxp.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Shawn Guo <shawnguo@kernel.org> Cc: Sascha Hauer <s.hauer@pengutronix.de> Cc: Fabio Estevam <festevam@gmail.com> Cc: kernel@pengutronix.de Cc: linux-imx@nxp.com Signed-off-by: Sasha Levin <sashal@kernel.org>
regulator_get_optional() can fail for a number of reasons besides probe
deferral. It can for example return -ENOMEM if it runs out of memory as
it tries to allocate data structures. Propagating only -EPROBE_DEFER is
problematic because it results in these legitimately fatal errors being
treated as "regulator not specified in DT".
What we really want is to ignore the optional regulators only if they
have not been specified in DT. regulator_get_optional() returns -ENODEV
in this case, so that's the special case that we need to handle. So we
propagate all errors, except -ENODEV, so that real failures will still
cause the driver to fail probe.
Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Andrew Murray <andrew.murray@arm.com> Cc: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
regulator_get_optional() can fail for a number of reasons besides probe
deferral. It can for example return -ENOMEM if it runs out of memory as
it tries to allocate data structures. Propagating only -EPROBE_DEFER is
problematic because it results in these legitimately fatal errors being
treated as "regulator not specified in DT".
What we really want is to ignore the optional regulators only if they
have not been specified in DT. regulator_get_optional() returns -ENODEV
in this case, so that's the special case that we need to handle. So we
propagate all errors, except -ENODEV, so that real failures will still
cause the driver to fail probe.
Tested-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Andrew Murray <andrew.murray@arm.com> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Acked-by: Shawn Lin <shawn.lin@rock-chips.com> Cc: Shawn Lin <shawn.lin@rock-chips.com> Cc: Heiko Stuebner <heiko@sntech.de> Cc: linux-rockchip@lists.infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>
This fixes an issue in which key down events for function keys would be
repeatedly emitted even after the user has raised the physical key. For
example, the driver fails to emit the F5 key up event when going through
the following steps:
- fnmode=1: hold FN, hold F5, release FN, release F5
- fnmode=2: hold F5, hold FN, release F5, release FN
The repeated F5 key down events can be easily verified using xev.
Signed-off-by: Joao Moreno <mail@joaomoreno.com> Co-developed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Do not use printk_ratelimit() in drivers/pci/pci.c as it shares the rate
limiting state with all other callers to the printk_ratelimit().
Add pci_info_ratelimited() (similar to pci_notice_ratelimited() added in
the commit a88a7b3eb076 ("vfio: Use dev_printk() when possible")) and use
it instead of printk_ratelimit() + pci_info().
We need to use selinux_cred() to fetch the SELinux cred blob instead
of directly using current->security or current_security(). There
were a couple of lingering uses of current_security() in the SELinux code
that were apparently missed during the earlier conversions. IIUC, this
would only manifest as a bug if multiple security modules including
SELinux are enabled and SELinux is not first in the lsm order. After
this change, there appear to be no other users of current_security()
in-tree; perhaps we should remove it altogether.
Fixes: bbd3662a8348 ("Infrastructure management of the cred security blob") Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Why:
- Relative commit: 8b9f9d4dc511 ("regmap: verify if register is
writeable before writing operations"), this patch
will always check for unwritable registers, it will compare reg
with max_register in regmap_writeable.
- The pcf85363/pcf85263 has the capability of address wrapping
which means if you access an address outside the allowed range
(0x00-0x2f) hardware actually wraps the access to a lower address.
The rtc-pcf85363 driver will use this feature to configure the time
and execute 2 actions in the same i2c write operation (stopping the
clock and configure the time). However the driver has also
configured the `regmap maxregister` protection mechanism that will
block accessing addresses outside valid range (0x00-0x2f).
How:
- Split of writing regs to two parts, first part writes control
registers about stop_enable and resets, second part writes
RTC time and date registers.
The RTC IRQ is requested before the struct rtc_device is allocated,
this may lead to a NULL pointer dereference in IRQ handler.
To fix this issue, allocating the rtc_device struct before requesting
the RTC IRQ using devm_rtc_allocate_device, and use rtc_register_device
to register the RTC device.
Clang produces references to __aeabi_uidivmod and __aeabi_idivmod for
arm-linux-gnueabi and arm-linux-gnueabihf targets incorrectly when AEABI
is not selected (such as when OABI_COMPAT is selected).
While this means that OABI userspaces wont be able to upgraded to
kernels built with Clang, it means that boards that don't enable AEABI
like s3c2410_defconfig will stop failing to link in KernelCI when built
with Clang.
Translation faults arising from cache maintenance instructions are
rather unhelpfully reported with an FSR value where the WnR field is set
to 1, indicating that the faulting access was a write. Since cache
maintenance instructions on 32-bit ARM do not require any particular
permissions, this can cause our private 'cacheflush' system call to fail
spuriously if a translation fault is generated due to page aging when
targetting a read-only VMA.
In this situation, we will return -EFAULT to userspace, although this is
unfortunately suppressed by the popular '__builtin___clear_cache()'
intrinsic provided by GCC, which returns void.
Although it's tempting to write this off as a userspace issue, we can
actually do a little bit better on CPUs that support LPAE, even if the
short-descriptor format is in use. On these CPUs, cache maintenance
faults additionally set the CM field in the FSR, which we can use to
suppress the write permission checks in the page fault handler and
succeed in performing cache maintenance to read-only areas even in the
presence of a translation fault.
Reported-by: Orion Hodson <oth@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
Recent probing at the Linux Kernel Memory Model uncovered a
'surprise'. Strongly ordered architectures where the atomic RmW
primitive implies full memory ordering and
smp_mb__{before,after}_atomic() are a simple barrier() (such as MIPS
without WEAK_REORDERING_BEYOND_LLSC) fail for:
Because, while the atomic_inc() implies memory order, it
(surprisingly) does not provide a compiler barrier. This then allows
the compiler to re-order like so:
klp_module_coming() is called for every module appearing in the system.
It sets obj->mod to a patched module for klp_object obj. Unfortunately
it leaves it set even if an error happens later in the function and the
patched module is not allowed to be loaded.
klp_is_object_loaded() uses obj->mod variable and could currently give a
wrong return value. The bug is probably harmless as of now.
Signed-off-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The comment describing the loongson_llsc_mb() reorder case doesn't
make any sense what so ever. Instruction re-ordering is not an SMP
artifact, but rather a CPU local phenomenon. Clarify the comment by
explaining that these issue cause a coherence fail.
For the branch speculation case; if futex_atomic_cmpxchg_inatomic()
needs one at the bne branch target, then surely the normal
__cmpxch_asm() implementation does too. We cannot rely on the
barriers from cmpxchg() because cmpxchg_local() is implemented with
the same macro, and branch prediction and speculation are, too, CPU
local.
Fixes: e02e07e3127d ("MIPS: Loongson: Introduce and use loongson_llsc_mb()") Cc: Huacai Chen <chenhc@lemote.com> Cc: Huang Pei <huangpei@loongson.cn> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paul Burton <paul.burton@mips.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Addresses a few issues that were noticed when compiling with non-default
warnings enabled. The trimmed-down warnings in the order they are fixed
below are:
* declaration of 'size' shadows a parameter
* '%s' directive output may be truncated writing up to 5 bytes into a
region of size between 1 and 64
* pointer targets in initialization of 'char *' from 'unsigned char *'
differ in signedness
Each iteration of for_each_child_of_node() executes of_node_put() on the
previous node, but in some return paths in the middle of the loop
of_node_put() is missing thus causing a reference leak.
Hence stash these mid-loop return values in a variable 'err' and add a
new label err_node_put which executes of_node_put() on the previous node
and returns 'err' on failure.
Change mid-loop return statements to point to jump to this label to
fix the reference leak.
Goodix touchpad may drop its first couple input events when
i2c-designware-platdrv and intel-lpss it connects to took too long to
runtime resume from runtime suspended state.
This issue happens becuase the touchpad has a rather small buffer to
store up to 13 input events, so if the host doesn't read those events in
time (i.e. runtime resume takes too long), events are dropped from the
touchpad's buffer.
The bottleneck is D3cold delay it waits when transitioning from D3cold
to D0, hence remove the delay to make the resume faster. I've tested
some systems with intel-lpss and haven't seen any regression.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202683 Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The problem is that the CHT Whiskey Cove PMIC's builtin i2c-adapter is
itself a part of an i2c-client (the PMIC). This means that transfers done
through it take adapter->bus_lock twice, once for the parent i2c-adapter
and once for its own bus_lock. Lockdep does not like this nested locking.
To make lockdep happy in the case of busses with muxes, the i2c-core's
i2c_adapter_lock_bus function calls:
But i2c_adapter_depth only works when the direct parent of the adapter is
another adapter, as it is only meant for muxes. In this case there is an
i2c-client and MFD instantiated platform_device in the parent->child chain
between the 2 devices.
This commit overrides the default i2c_lock_operations, passing a hardcoded
depth of 1 to rt_mutex_lock_nested, making lockdep happy.
Note that if there were to be a mux attached to the i2c-wc-cht adapter,
this would break things again since the i2c-mux code expects the
root-adapter to have a locking depth of 0. But the i2c-wc-cht adapter
always has only 1 client directly attached in the form of the charger IC
paired with the CHT Whiskey Cove PMIC.
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
arch/mips/mm/tlbex.c:634:19: error: use of logical '&&' with constant
operand [-Werror,-Wconstant-logical-operand]
if (cpu_has_rixi && _PAGE_NO_EXEC) {
^ ~~~~~~~~~~~~~
arch/mips/mm/tlbex.c:634:19: note: use '&' for a bitwise operation
if (cpu_has_rixi && _PAGE_NO_EXEC) {
^~
&
arch/mips/mm/tlbex.c:634:19: note: remove constant to silence this
warning
if (cpu_has_rixi && _PAGE_NO_EXEC) {
~^~~~~~~~~~~~~~~~
1 error generated.
Explicitly cast this value to a boolean so that clang understands we
intend for this to be a non-zero value.
arch/mips/kernel/branch.c:148:8: error: variable 'bc_false' is used
uninitialized whenever switch case is taken
[-Werror,-Wsometimes-uninitialized]
case mm_bc2t_op:
^~~~~~~~~~
arch/mips/kernel/branch.c:157:8: note: uninitialized use occurs here
if (bc_false)
^~~~~~~~
arch/mips/kernel/branch.c:149:8: error: variable 'bc_false' is used
uninitialized whenever switch case is taken
[-Werror,-Wsometimes-uninitialized]
case mm_bc1t_op:
^~~~~~~~~~
arch/mips/kernel/branch.c:157:8: note: uninitialized use occurs here
if (bc_false)
^~~~~~~~
arch/mips/kernel/branch.c:142:4: note: variable 'bc_false' is declared
here
int bc_false = 0;
^
2 errors generated.
When mm_bc1t_op and mm_bc2t_op are taken, the bc_false initialization
does not happen, which leads to a garbage value upon use, as illustrated
below with a small sample program.
$ mipsel-linux-gnu-gcc --version | head -n1
mipsel-linux-gnu-gcc (Debian 8.3.0-2) 8.3.0
$ clang --version | head -n1
ClangBuiltLinux clang version 9.0.0 (git://github.com/llvm/llvm-project 544315b4197034a3be8acd12cba56a75fb1f08dc) (based on LLVM 9.0.0svn)
$ cat test.c
#include <stdio.h>
static void switch_scoped(int opcode)
{
switch (opcode) {
case 1:
case 2: {
int bc_false = 0;
bc_false = 4;
case 3:
case 4:
printf("\t* switch scoped bc_false = %d\n", bc_false);
}
}
}
static void function_scoped(int opcode)
{
int bc_false = 0;
switch (opcode) {
case 1:
case 2: {
bc_false = 4;
case 3:
case 4:
printf("\t* function scoped bc_false = %d\n", bc_false);
}
}
}
In order to further reduce power consumption, the XBurst core
by default attempts to avoid branch target buffer lookups by
detecting & special casing loops. This feature will cause
BogoMIPS and lpj calculate in error. Set cp0 config7 bit 4 to
disable this feature.
Remount process will release system zone which was allocated before if
"noblock_validity" is specified. If we mount an ext4 file system to two
mountpoints with default mount options, and then remount one of them
with "noblock_validity", it may trigger a use after free problem when
someone accessing the other one.
This problem can also be reproduced by one mountpint, At the same time,
add_system_zone() can get called during remount as well so there can be
racing ext4_data_block_valid() reading the rbtree at the same time.
This patch add RCU to protect system zone from releasing or building
when doing a remount which inverse current "noblock_validity" mount
option. It assign the rbtree after the whole tree was complete and
do actual freeing after rcu grace period, avoid any intermediate state.
Reported-by: syzbot+1e470567330b7ad711d5@syzkaller.appspotmail.com Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
The reason is if disable_checkpoint mount option is on, meta dirty
pages can remain during umount, and then be flushed by iput() of
meta_inode, however node_inode has been iput()ed before
meta_inode's iput().
Since checkpoint is disabled, all meta/node datas are useless and
should be dropped in next mount, so in umount, let's adjust
drop_inode() to give a hint to iput_final() to drop all those dirty
datas correctly.
During release of the syncpt, we remove it from the list of syncpt and
the tree, but only if it is not already been removed. However, during
signaling, we first remove the syncpt from the list. So, if we
concurrently free and signal the syncpt, the free may decide that it is
not part of the tree and immediately free itself -- meanwhile the
signaler goes on to use the now freed datastructure.
In particular, we get struck by commit 0e2f733addbf ("dma-buf: make
dma_fence structure a bit smaller v2") as the cb_list is immediately
clobbered by the kfree_rcu.
v2: Avoid calling into timeline_fence_release() from under the spinlock
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111381 Fixes: d3862e44daa7 ("dma-buf/sw-sync: Fix locking around sync_timeline lists")
References: 0e2f733addbf ("dma-buf: make dma_fence structure a bit smaller v2") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Sean Paul <seanpaul@chromium.org> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: Christian König <christian.koenig@amd.com> Cc: <stable@vger.kernel.org> # v4.14+ Acked-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190812154247.20508-1-chris@chris-wilson.co.uk Signed-off-by: Sasha Levin <sashal@kernel.org>
The data structure used for log messages is so large that it can cause a
boot failure. Since allocations from that data structure can fail anyway,
use kmalloc() / kfree() instead of that data structure.
See also https://bugzilla.kernel.org/show_bug.cgi?id=204119.
See also commit ded85c193a39 ("scsi: Implement per-cpu logging buffer") # v4.0.
Reported-by: Jan Palus <jpalus@fastmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Jan Palus <jpalus@fastmail.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Since commit 4388c9b3a6ee ("powerpc: Do not send system reset request
through the oops path"), pstore dmesg file is not updated when dump is
triggered from HMC. This commit modified system reset (sreset) handler
to invoke fadump or kdump (if configured), without pushing dmesg to
pstore. This leaves pstore to have old dmesg data which won't be much
of a help if kdump fails to capture the dump. This patch fixes that by
calling kmsg_dump() before heading to fadump ot kdump.
Fixes: 4388c9b3a6ee ("powerpc: Do not send system reset request through the oops path") Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190904075949.15607-1-ganeshgr@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
When registering the PLL, unbypass the PLL.
The PLL has two bypass control bit, BYPASS and EXT_BYPASS.
we will expose EXT_BYPASS to clk driver for mux usage, and keep
BYPASS inside pll14xx usage. The PLL has a restriction that
when M/P change, need to RESET/BYPASS pll to avoid glitch, so
we could not expose BYPASS.
To make it easy for clk driver usage, unbypass PLL which does
not hurt current function.
Fixes: 8646d4dcc7fb ("clk: imx: Add PLLs driver for imx8mm soc") Reviewed-by: Leonard Crestez <leonard.crestez@nxp.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Link: https://lkml.kernel.org/r/1568043491-20680-3-git-send-email-peng.fan@nxp.com Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
According to PLL1443XA and PLL1416X spec,
"When BYPASS is 0 and RESETB is changed from 0 to 1, FOUT starts to
output unstable clock until lock time passes. PLL1416X/PLL1443XA may
generate a glitch at FOUT."
So set BYPASS when RESETB is changed from 0 to 1 to avoid glitch.
In the end of set rate, BYPASS will be cleared.
When prepare clock, also need to take care to avoid glitch. So
we also follow Spec to set BYPASS before RESETB changed from 0 to 1.
And add a check if the RESETB is already 0, directly return 0;
Fixes: 8646d4dcc7fb ("clk: imx: Add PLLs driver for imx8mm soc") Reviewed-by: Leonard Crestez <leonard.crestez@nxp.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Link: https://lkml.kernel.org/r/1568043491-20680-2-git-send-email-peng.fan@nxp.com Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Selecting the right parent for the main clock is done using only
main oscillator enabled bit.
In case we have this oscillator bypassed by an external signal (no driving
on the XOUT line), we still use external clock, but with BYPASS bit set.
So, in this case we must select the same parent as before.
Create a macro that will select the right parent considering both bits from
the MOR register.
Use this macro when looking for the right parent.
On arm64 build with clang, sometimes the __cmpxchg_mb is not inlined
when CONFIG_OPTIMIZE_INLINING is set.
Clang then fails a compile-time assertion, because it cannot tell at
compile time what the size of the argument is:
mm/memcontrol.o: In function `__cmpxchg_mb':
memcontrol.c:(.text+0x1a4c): undefined reference to `__compiletime_assert_175'
memcontrol.c:(.text+0x1a4c): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `__compiletime_assert_175'
Mark all of the cmpxchg() style functions as __always_inline to
ensure that the compiler can see the result.
Acked-by: Nick Desaulniers <ndesaulniers@google.com> Reported-by: Nathan Chancellor <natechancellor@gmail.com> Link: https://github.com/ClangBuiltLinux/linux/issues/648 Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Andrew Murray <andrew.murray@arm.com> Tested-by: Andrew Murray <andrew.murray@arm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
GCE hardware stored event information in own internal sysram,
if the initial value in those sysram is not zero value
it will cause a situation that gce can wait the event immediately
after client ask gce to wait event but not really trigger the
corresponding hardware.
In order to make sure that the wait event function is
exactly correct, we need to clear the sysram value in
cmdq initial flow.
prep_irq_for_idle() is intended to be called before entering
H_CEDE (and it is used by the pseries cpuidle driver). However the
default pseries idle routine does not call it, leading to mismanaged
lazy irq state when the cpuidle driver isn't in use. Manifestations of
this include:
* Dropped IPIs in the time immediately after a cpu comes
online (before it has installed the cpuidle handler), making the
online operation block indefinitely waiting for the new cpu to
respond.
* Hitting this WARN_ON in arch_local_irq_restore():
/*
* We should already be hard disabled here. We had bugs
* where that wasn't the case so let's dbl check it and
* warn if we are wrong. Only do that when IRQ tracing
* is enabled as mfmsr() can be costly.
*/
if (WARN_ON_ONCE(mfmsr() & MSR_EE))
__hard_irq_disable();
Call prep_irq_for_idle() from pseries_lpar_idle() and honor its
result.
Fixes: 363edbe2614a ("powerpc: Default arch idle could cede processor on pseries") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190910225244.25056-1-nathanl@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Some MMC cards fail to enumerate properly when inserted into an MMC slot
on sdm845 devices. This is because the clk ops for qcom clks round the
frequency up to the nearest rate instead of down to the nearest rate.
For example, the MMC driver requests a frequency of 52MHz from
clk_set_rate() but the qcom implementation for these clks rounds 52MHz
up to the next supported frequency of 100MHz. The MMC driver could be
modified to request clk rate ranges but for now we can fix this in the
clk driver by changing the rounding policy for this clk to be round down
instead of round up.
Fixes: 06391eddb60a ("clk: qcom: Add Global Clock controller (GCC) driver for SDM845") Reported-by: Douglas Anderson <dianders@chromium.org> Cc: Taniya Das <tdas@codeaurora.org> Signed-off-by: Stephen Boyd <swboyd@chromium.org> Link: https://lkml.kernel.org/r/20190830195142.103564-1-swboyd@chromium.org Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When the last device in an eeh_pe is removed the eeh_pe structure itself
(and any empty parents) are freed since they are no longer needed. This
results in a crash when a hotplug driver is involved since the following
may occur:
1. Device is suprise removed.
2. Driver performs an MMIO, which fails and queues and eeh_event.
3. Hotplug driver receives a hotplug interrupt and removes any
pci_devs that were under the slot.
4. pci_dev is torn down and the eeh_pe is freed.
5. The EEH event handler thread processes the eeh_event and crashes
since the eeh_pe pointer in the eeh_event structure is no
longer valid.
Crashing is generally considered poor form. Instead of doing that use
the fact PEs are marked as EEH_PE_INVALID to keep them around until the
end of the recovery cycle, at which point we can safely prune any empty
PEs.
Bare metal machine checks run an "early" handler in real mode before
running the main handler which reports the event.
The main handler runs exactly as a normal interrupt handler, after the
"windup" which sets registers back as they were at interrupt entry.
CFAR does not get restored by the windup code, so that will be wrong
when the handler is run.
Restore the CFAR to the saved value before running the late handler.
Comparing adev->family with CHIP constants is not correct.
adev->family can only be compared with AMDGPU_FAMILY constants and
adev->asic_type is the struct member to compare with CHIP constants.
They are separate identification spaces.
Signed-off-by: Jean Delvare <jdelvare@suse.de> Fixes: 62a37553414a ("drm/amdgpu: add si implementation v10") Cc: Ken Wang <Qingqing.Wang@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
TM test tm-unavailable must take into account aborts due to host aborting
a transactin because of a facility unavailable exception, just like it
already does for aborts on reschedules (TM_CAUSE_KVM_RESCHED).
Reported-by: Desnes A. Nunes do Rosario <desnesn@linux.ibm.com> Tested-by: Desnes A. Nunes do Rosario <desnesn@linux.ibm.com> Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1566341651-19747-1-git-send-email-gromero@linux.vnet.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
[Description]
port spdif fix to staging:
spdif hardwired to afmt inst 1.
spdif func pointer
spdif resource allocation (reserve last audio endpoint for spdif only)
Signed-off-by: Charlene Liu <charlene.liu@amd.com> Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The CPG/MSSR Clock Domain driver does not implement the
generic_pm_domain.power_{on,off}() callbacks, as the domain itself
cannot be powered down. Hence the domain should be marked as always-on
by setting the GENPD_FLAG_ALWAYS_ON flag, to prevent the core PM Domain
code from considering it for power-off, and doing unnessary processing.
Note that this only affects RZ/A2 SoCs. On R-Car Gen2 and Gen3 SoCs,
the R-Car SYSC driver handles Clock Domain creation, and offloads only
device attachment/detachment to the CPG/MSSR driver.
The CPG/MSTP Clock Domain driver does not implement the
generic_pm_domain.power_{on,off}() callbacks, as the domain itself
cannot be powered down. Hence the domain should be marked as always-on
by setting the GENPD_FLAG_ALWAYS_ON flag, to prevent the core PM Domain
code from considering it for power-off, and doing unnessary processing.
This also gets rid of a boot warning when the Clock Domain contains an
IRQ-safe device, e.g. on RZ/A1:
sh_mtu2 fcff0000.timer: PM domain cpg_clocks will not be powered off
When cold-booting Asus X434DA, GPIO 7 is found to be already configured
as an interrupt, and the GPIO level is found to be in a state that
causes the interrupt to fire.
As soon as pinctrl-amd probes, this interrupt fires and invokes
amd_gpio_irq_handler(). The IRQ is acked, but no GPIO-IRQ handler was
invoked, so the GPIO level being unchanged just causes another interrupt
to fire again immediately after.
This results in an interrupt storm causing this platform to hang
during boot, right after pinctrl-amd is probed.
Detect this situation and disable the GPIO interrupt when this happens.
This enables the affected platform to boot as normal. GPIO 7 actually is
the I2C touchpad interrupt line, and later on, i2c-multitouch loads and
re-enables this interrupt when it is ready to handle it.
Instead of this approach, I considered disabling all GPIO interrupts at
probe time, however that seems a little risky, and I also confirmed that
Windows does not seem to have this behaviour: the same 41 GPIO IRQs are
enabled under both Linux and Windows, which is a far larger collection
than the GPIOs referenced by the DSDT on this platform.
Some, mostly Fermi, vbioses appear to have zero max voltage. That causes Nouveau to not parse voltage entries, thus users not being able to set higher clocks.
When changing this value Nvidia driver still appeared to ignore it, and I wasn't able to find out why, thus the code is ignoring the value if it is zero.
CC: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Mark Menzynski <mmenzyns@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
On Turing, an input LUT is required to transform inputs in fixed-point
formats to FP16 for the internal display pipe. We provide an identity
mapping whenever a window is enabled for this reason.
HW has error checks to ensure when the input is already FP16, that the
input LUT is also disabled.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
vfio_pci_enable() saves the device's initial configuration information
with the intent that it is restored in vfio_pci_disable(). However,
the commit referenced in Fixes: below replaced the call to
__pci_reset_function_locked(), which is not wrapped in a state save
and restore, with pci_try_reset_function(), which overwrites the
restored device state with the current state before applying it to the
device. Reinstate use of __pci_reset_function_locked() to return to
the desired behavior.
Fixes: 890ed578df82 ("vfio-pci: Use pci "try" reset interface") Signed-off-by: hexin <hexin15@baidu.com> Signed-off-by: Liu Qi <liuqi16@baidu.com> Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The EEH_DEV_NO_HANDLER flag is used by the EEH system to prevent the
use of driver callbacks in drivers that have been bound part way
through the recovery process. This is necessary to prevent later stage
handlers from being called when the earlier stage handlers haven't,
which can be confusing for drivers.
However, the flag is set for all devices that are added after boot
time and only cleared at the end of the EEH recovery process. This
results in hot plugged devices erroneously having the flag set during
the first recovery after they are added (causing their driver's
handlers to be incorrectly ignored).
To remedy this, clear the flag at the beginning of recovery
processing. The flag is still cleared at the end of recovery
processing, although it is no longer really necessary.
Also clear the flag during eeh_handle_special_event(), for the same
reasons.
pmx_writel uses writel which inserts write barrier before the
register write.
This patch has fix to replace writel with writel_relaxed followed
by a readback and memory barrier to ensure write operation is
completed for successful pinctrl change.
After a partition migration, pseries_devicetree_update() processes
changes to the device tree communicated from the platform to
Linux. This is a relatively heavyweight operation, with multiple
device tree searches, memory allocations, and conversations with
partition firmware.
There's a few levels of nested loops which are bounded only by
decisions made by the platform, outside of Linux's control, and indeed
we have seen RCU stalls on large systems while executing this call
graph. Use cond_resched() in these loops so that the cpu is yielded
when needed.
create_physical_mapping expects physical addresses, but creating and
splitting these mappings after boot is supplying virtual (effective)
addresses. This can be irritated by booting with mem= to limit memory
then probing an unused physical memory range:
echo <addr> > /sys/devices/system/memory/probe
This mostly works by accident, firstly because __va(__va(x)) == __va(x)
so the virtual address does not get corrupted. Secondly because pfn_pte
masks out the upper bits of the pfn beyond the physical address limit,
so a pfn constructed with a 0xc000000000000000 virtual linear address
will be masked back to the correct physical address in the pte.
We see warnings such as:
kernel/futex.c: In function 'do_futex':
kernel/futex.c:1676:17: warning: 'oldval' may be used uninitialized in this function [-Wmaybe-uninitialized]
return oldval == cmparg;
^
kernel/futex.c:1651:6: note: 'oldval' was declared here
int oldval, ret;
^
This is because arch_futex_atomic_op_inuser() only sets *oval if ret
is 0 and GCC doesn't see that it will only use it when ret is 0.
Anyway, the non-zero ret path is an error path that won't suffer from
setting *oval, and as *oval is a local var in futex_atomic_op_inuser()
it will have no impact.
This sets cpu 7 online in all respects except for the cpu's
corresponding struct device; dev->offline remains true.
3. Set cpu 7 online via sysfs. _cpu_up() determines that cpu 7 is
already online and returns success. The driver core (device_online)
sets dev->offline = false.
4. The migration completes and restores cpu 7 to offline state:
This leaves cpu7 in a state where the driver core considers the cpu
device online, but in all other respects it is offline and
unused. Attempts to online the cpu via sysfs appear to succeed but the
driver core actually does not pass the request to the lower-level
cpuhp support code. This makes the cpu unusable until the cpu device
is manually set offline and then online again via sysfs.
Instead of directly calling cpu_up/cpu_down, the migration code should
use the higher-level device core APIs to maintain consistent state and
serialize operations.
Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190802192926.19277-2-nathanl@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently, the xmon 'dx' command calls OPAL to dump the XIVE state in
the OPAL logs and also outputs some of the fields of the internal XIVE
structures in Linux. The OPAL calls can only be done on baremetal
(PowerNV) and they crash a pseries machine. Fix by checking the
hypervisor feature of the CPU.
A future patch is going to change semantics of clk_register() so that
clk_hw::init is guaranteed to be NULL after a clk is registered. Avoid
referencing this member here so that we don't run into NULL pointer
exceptions.
A future patch is going to change semantics of clk_register() so that
clk_hw::init is guaranteed to be NULL after a clk is registered. Avoid
referencing this member here so that we don't run into NULL pointer
exceptions.
Cc: Chunyan Zhang <zhang.chunyan@linaro.org> Cc: Baolin Wang <baolin.wang@linaro.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Link: https://lkml.kernel.org/r/20190731193517.237136-8-sboyd@kernel.org Acked-by: Baolin Wang <baolin.wang@linaro.org> Acked-by: Chunyan Zhang <zhang.chunyan@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
A future patch is going to change semantics of clk_register() so that
clk_hw::init is guaranteed to be NULL after a clk is registered. Avoid
referencing this member here so that we don't run into NULL pointer
exceptions.
Cc: Neil Armstrong <narmstrong@baylibre.com> Cc: Jerome Brunet <jbrunet@baylibre.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Link: https://lkml.kernel.org/r/20190731193517.237136-4-sboyd@kernel.org Acked-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
A future patch is going to change semantics of clk_register() so that
clk_hw::init is guaranteed to be NULL after a clk is registered. Avoid
referencing this member here so that we don't run into NULL pointer
exceptions.
A future patch is going to change semantics of clk_register() so that
clk_hw::init is guaranteed to be NULL after a clk is registered. Avoid
referencing this member here so that we don't run into NULL pointer
exceptions.
Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Link: https://lkml.kernel.org/r/20190731193517.237136-2-sboyd@kernel.org
[sboyd@kernel.org: Move name to after checking for error or NULL hw] Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
We allocate only the first level of multilevel TCE tables for KVM
already (alloc_userspace_copy==true), and the rest is allocated on demand.
This is not enabled though for bare metal.
This removes the KVM limitation (implicit, via the alloc_userspace_copy
parameter) and always allocates just the first level. The on-demand
allocation of missing levels is already implemented.
As from now on DMA map might happen with disabled interrupts, this
allocates TCEs with GFP_ATOMIC; otherwise lockdep reports errors 1].
In practice just a single page is allocated there so chances for failure
are quite low.
To save time when creating a new clean table, this skips non-allocated
indirect TCE entries in pnv_tce_free just like we already do in
the VFIO IOMMU TCE driver.
This changes the default level number from 1 to 2 to reduce the amount
of memory required for the default 32bit DMA window at the boot time.
The default window size is up to 2GB which requires 4MB of TCEs which is
unlikely to be used entirely or at all as most devices these days are
64bit capable so by switching to 2 levels by default we save 4032KB of
RAM per a device.
While at this, add __GFP_NOWARN to alloc_pages_node() as the userspace
can trigger this path via VFIO, see the failure and try creating a table
again with different parameters which might succeed.
========================================================
hardirqs last enabled at (2305): [<c00000000000e4c8>] fast_exc_return_irq+0x28/0x34
hardirqs last disabled at (2303): [<c000000000cb9fd0>] __do_softirq+0x4a0/0x654
WARNING: possible irq lock inversion dependency detected
5.2.0-rc6-le_nv2_aikATfstn1-p1 #634 Tainted: G W
softirqs last enabled at (2304): [<c000000000cba054>] __do_softirq+0x524/0x654
softirqs last disabled at (2297): [<c00000000010f278>] irq_exit+0x128/0x180
--------------------------------------------------------
swapper/0/0 just changed the state of lock: 0000000006cf56a6 (&(&host->lock)->rlock){-...}, at: ahci_single_level_irq_intr+0xac/0x120
but this lock took another, HARDIRQ-unsafe lock in the past:
(fs_reclaim){+.+.}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
[Why]
The vm config will be clear to 0 when system enter S4. It will
cause hubbub didn't know how to fetch data when system resume.
The flip always pending because earliest_inuse_address and
request_address are different.
[How]
Reprogram VM config when system resume
Signed-off-by: Lewis Huang <Lewis.Huang@amd.com> Reviewed-by: Jun Lei <Jun.Lei@amd.com> Acked-by: Eric Yang <eric.yang2@amd.com> Acked-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[Why]
The math on deciding on how many
"frames to insert" sometimes sent us over the max refresh rate.
Also integer overflow can occur if we have high refresh rates.
[How]
Instead of clipping the frame duration such that it doesn’t go below the min,
just remove a frame from the number of frames to insert. +
Use unsigned long long for intermediate calculations to prevent
integer overflow.
Signed-off-by: Bayan Zabihiyan <bayan.zabihiyan@amd.com> Reviewed-by: Aric Cyr <Aric.Cyr@amd.com> Acked-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[Why]
When endpoint is at the boundary of a region, such as at 2^0=1
we find that the last segment has a sharp slope and some points
are clipped at the top.
[How]
If end point is 1, which is exactly at the 2^0 region boundary, we
need to program an additional region beyond this point.
Signed-off-by: Anthony Koo <Anthony.Koo@amd.com> Reviewed-by: Aric Cyr <Aric.Cyr@amd.com> Acked-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In the definition of the p5020 chip, the p2041 chip's info was used
instead. The p5020 and p2041 chips have different info. This is most
likely a typo.
ipmi_thread() uses back-to-back schedule() to poll for command
completion which, on some machines, can push up CPU consumption and
heavily tax the scheduler locks leading to noticeable overall
performance degradation.
This was originally added so firmware updates through IPMI would
complete in a timely manner. But we can't kill the scheduler
locks for that one use case.
Instead, only run schedule() continuously in maintenance mode,
where firmware updates should run.
According to the following tab (coming from STMFX datasheet), updates
have to done in stmfx_pinconf_set function:
-"type" has to be set when "bias" is configured as "pull-up or pull-down"
-PIN_CONFIG_DRIVE_PUSH_PULL should only be used when gpio is configured as
output. There is so no need to check direction.
DIR | TYPE | PUPD | MFX GPIO configuration
----|------|------|---------------------------------------------------
1 | 1 | 1 | OUTPUT open drain with internal pull-up resistor
----|------|------|---------------------------------------------------
1 | 1 | 0 | OUTPUT open drain with internal pull-down resistor
----|------|------|---------------------------------------------------
1 | 0 | 0/1 | OUTPUT push pull no pull
----|------|------|---------------------------------------------------
0 | 1 | 1 | INPUT with internal pull-up resistor
----|------|------|---------------------------------------------------
0 | 1 | 0 | INPUT with internal pull-down resistor
----|------|------|---------------------------------------------------
0 | 0 | 1 | INPUT floating
----|------|------|---------------------------------------------------
0 | 0 | 0 | analog (GPIO not used, default setting)
When building with -Wsometimes-uninitialized, clang warns:
drivers/pci/hotplug/rpaphp_core.c:243:14: warning: variable 'fndit' is
used uninitialized whenever 'for' loop exits because its condition is
false [-Wsometimes-uninitialized]
for (j = 0; j < entries; j++) {
^~~~~~~~~~~
drivers/pci/hotplug/rpaphp_core.c:256:6: note: uninitialized use occurs
here
if (fndit)
^~~~~
drivers/pci/hotplug/rpaphp_core.c:243:14: note: remove the condition if
it is always true
for (j = 0; j < entries; j++) {
^~~~~~~~~~~
drivers/pci/hotplug/rpaphp_core.c:233:14: note: initialize the variable
'fndit' to silence this warning
int j, fndit;
^
= 0
fndit is only used to gate a sprintf call, which can be moved into the
loop to simplify the code and eliminate the local variable, which will
fix this warning.
Fixes: 2fcf3ae508c2 ("hotplug/drc-info: Add code to search ibm,drc-info property") Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Tyrel Datwyler <tyreld@linux.ibm.com> Acked-by: Joel Savitz <jsavitz@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://github.com/ClangBuiltLinux/linux/issues/504 Link: https://lore.kernel.org/r/20190603221157.58502-1-natechancellor@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>