There is a use-after-free problem in the ion driver.
This is caused by a race condition in the ion_ioctl()
function.
A handle has ref count of 1 and two tasks on different
cpus calls ION_IOC_FREE simultaneously.
cpu 0 cpu 1
-------------------------------------------------------
ion_handle_get_by_id()
(ref == 2)
ion_handle_get_by_id()
(ref == 3)
ion_free()
(ref == 2)
ion_handle_put()
(ref == 1)
ion_free()
(ref == 0 so ion_handle_destroy() is
called
and the handle is freed.)
ion_handle_put() is called and it
decreases the slub's next free pointer
The problem is detected as an unaligned access in the
spin lock functions since it uses load exclusive
instruction. In some cases it corrupts the slub's
free pointer which causes a mis-aligned access to the
next free pointer.(kmalloc returns a pointer like ffffc0745b4580aa). And it causes lots of other
hard-to-debug problems.
This symptom is caused since the first member in the
ion_handle structure is the reference count and the
ion driver decrements the reference after it has been
freed.
To fix this problem client->lock mutex is extended
to protect all the codes that uses the handle.
Signed-off-by: Eun Taik Lee <eun.taik.lee@samsung.com> Reviewed-by: Laura Abbott <labbott@redhat.com> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
index 7ff2a7ec871f..33b390e7ea31
The VFIO_DEVICE_SET_IRQS ioctl did not sufficiently sanitize
user-supplied integers, potentially allowing memory corruption. This
patch adds appropriate integer overflow checks, checks the range bounds
for VFIO_IRQ_SET_DATA_NONE, and also verifies that only single element
in the VFIO_IRQ_SET_DATA_TYPE_MASK bitmask is set.
VFIO_IRQ_SET_ACTION_TYPE_MASK is already correctly checked later in
vfio_pci_set_irqs_ioctl().
Furthermore, a kzalloc is changed to a kcalloc because the use of a
kzalloc with an integer multiplication allowed an integer overflow
condition to be reached without this patch. kcalloc checks for overflow
and should prevent a similar occurrence.
Signed-off-by: Vlad Tsyrklevich <vlad@tsyrklevich.net> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If struct xc2028_config is passed without a firmware name,
the following trouble may happen:
[11009.907205] xc2028 5-0061: type set to XCeive xc2028/xc3028 tuner
[11009.907491] ==================================================================
[11009.907750] BUG: KASAN: use-after-free in strcmp+0x96/0xb0 at addr ffff8803bd78ab40
[11009.907992] Read of size 1 by task modprobe/28992
[11009.907994] =============================================================================
[11009.907997] BUG kmalloc-16 (Tainted: G W ): kasan: bad access detected
[11009.907999] -----------------------------------------------------------------------------
[11009.908130] Bytes b4 ffff8803bd78ab30: 01 00 00 00 2a 07 00 00 9d 28 00 00 01 00 00 00 ....*....(......
[11009.908133] Object ffff8803bd78ab40: 01 00 00 00 00 00 00 00 b0 1d c3 6a 00 88 ff ff ...........j....
[11009.908137] CPU: 3 PID: 28992 Comm: modprobe Tainted: G B W 4.5.0-rc1+ #43
[11009.908140] Hardware name: /NUC5i7RYB, BIOS RYBDWi35.86A.0350.2015.0812.1722 08/12/2015
[11009.908142] ffff8803bd78a000ffff8802c273f1b8ffffffff81932007ffff8803c6407a80
[11009.908148] ffff8802c273f1e8ffffffff81556759ffff8803c6407a80ffffea000ef5e280
[11009.908153] ffff8803bd78ab40dffffc0000000000ffff8802c273f210ffffffff8155ccb4
[11009.908158] Call Trace:
[11009.908162] [<ffffffff81932007>] dump_stack+0x4b/0x64
[11009.908165] [<ffffffff81556759>] print_trailer+0xf9/0x150
[11009.908168] [<ffffffff8155ccb4>] object_err+0x34/0x40
[11009.908171] [<ffffffff8155f260>] kasan_report_error+0x230/0x550
[11009.908175] [<ffffffff81237d71>] ? trace_hardirqs_off_caller+0x21/0x290
[11009.908179] [<ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
[11009.908182] [<ffffffff8155f5c3>] __asan_report_load1_noabort+0x43/0x50
[11009.908185] [<ffffffff8155ea00>] ? __asan_register_globals+0x50/0xa0
[11009.908189] [<ffffffff8194cea6>] ? strcmp+0x96/0xb0
[11009.908192] [<ffffffff8194cea6>] strcmp+0x96/0xb0
[11009.908196] [<ffffffffa13ba4ac>] xc2028_set_config+0x15c/0x630 [tuner_xc2028]
[11009.908200] [<ffffffffa13bac90>] xc2028_attach+0x310/0x8a0 [tuner_xc2028]
[11009.908203] [<ffffffff8155ea78>] ? memset+0x28/0x30
[11009.908206] [<ffffffffa13ba980>] ? xc2028_set_config+0x630/0x630 [tuner_xc2028]
[11009.908211] [<ffffffffa157a59a>] em28xx_attach_xc3028.constprop.7+0x1f9/0x30d [em28xx_dvb]
[11009.908215] [<ffffffffa157aa2a>] ? em28xx_dvb_init.part.3+0x37c/0x5cf4 [em28xx_dvb]
[11009.908219] [<ffffffffa157a3a1>] ? hauppauge_hvr930c_init+0x487/0x487 [em28xx_dvb]
[11009.908222] [<ffffffffa01795ac>] ? lgdt330x_attach+0x1cc/0x370 [lgdt330x]
[11009.908226] [<ffffffffa01793e0>] ? i2c_read_demod_bytes.isra.2+0x210/0x210 [lgdt330x]
[11009.908230] [<ffffffff812e87d0>] ? ref_module.part.15+0x10/0x10
[11009.908233] [<ffffffff812e56e0>] ? module_assert_mutex_or_preempt+0x80/0x80
[11009.908238] [<ffffffffa157af92>] em28xx_dvb_init.part.3+0x8e4/0x5cf4 [em28xx_dvb]
[11009.908242] [<ffffffffa157a6ae>] ? em28xx_attach_xc3028.constprop.7+0x30d/0x30d [em28xx_dvb]
[11009.908245] [<ffffffff8195222d>] ? string+0x14d/0x1f0
[11009.908249] [<ffffffff8195381f>] ? symbol_string+0xff/0x1a0
[11009.908253] [<ffffffff81953720>] ? uuid_string+0x6f0/0x6f0
[11009.908257] [<ffffffff811a775e>] ? __kernel_text_address+0x7e/0xa0
[11009.908260] [<ffffffff8104b02f>] ? print_context_stack+0x7f/0xf0
[11009.908264] [<ffffffff812e9846>] ? __module_address+0xb6/0x360
[11009.908268] [<ffffffff8137fdc9>] ? is_ftrace_trampoline+0x99/0xe0
[11009.908271] [<ffffffff811a775e>] ? __kernel_text_address+0x7e/0xa0
[11009.908275] [<ffffffff81240a70>] ? debug_check_no_locks_freed+0x290/0x290
[11009.908278] [<ffffffff8104a24b>] ? dump_trace+0x11b/0x300
[11009.908282] [<ffffffffa13e8143>] ? em28xx_register_extension+0x23/0x190 [em28xx]
[11009.908285] [<ffffffff81237d71>] ? trace_hardirqs_off_caller+0x21/0x290
[11009.908289] [<ffffffff8123ff56>] ? trace_hardirqs_on_caller+0x16/0x590
[11009.908292] [<ffffffff812404dd>] ? trace_hardirqs_on+0xd/0x10
[11009.908296] [<ffffffffa13e8143>] ? em28xx_register_extension+0x23/0x190 [em28xx]
[11009.908299] [<ffffffff822dcbb0>] ? mutex_trylock+0x400/0x400
[11009.908302] [<ffffffff810021a1>] ? do_one_initcall+0x131/0x300
[11009.908306] [<ffffffff81296dc7>] ? call_rcu_sched+0x17/0x20
[11009.908309] [<ffffffff8159e708>] ? put_object+0x48/0x70
[11009.908314] [<ffffffffa1579f11>] em28xx_dvb_init+0x81/0x8a [em28xx_dvb]
[11009.908317] [<ffffffffa13e81f9>] em28xx_register_extension+0xd9/0x190 [em28xx]
[11009.908320] [<ffffffffa0150000>] ? 0xffffffffa0150000
[11009.908324] [<ffffffffa0150010>] em28xx_dvb_register+0x10/0x1000 [em28xx_dvb]
[11009.908327] [<ffffffff810021b1>] do_one_initcall+0x141/0x300
[11009.908330] [<ffffffff81002070>] ? try_to_run_init_process+0x40/0x40
[11009.908333] [<ffffffff8123ff56>] ? trace_hardirqs_on_caller+0x16/0x590
[11009.908337] [<ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
[11009.908340] [<ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
[11009.908343] [<ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
[11009.908346] [<ffffffff8155ea37>] ? __asan_register_globals+0x87/0xa0
[11009.908350] [<ffffffff8144da7b>] do_init_module+0x1d0/0x5ad
[11009.908353] [<ffffffff812f2626>] load_module+0x6666/0x9ba0
[11009.908356] [<ffffffff812e9c90>] ? symbol_put_addr+0x50/0x50
[11009.908361] [<ffffffffa1580037>] ? em28xx_dvb_init.part.3+0x5989/0x5cf4 [em28xx_dvb]
[11009.908366] [<ffffffff812ebfc0>] ? module_frob_arch_sections+0x20/0x20
[11009.908369] [<ffffffff815bc940>] ? open_exec+0x50/0x50
[11009.908374] [<ffffffff811671bb>] ? ns_capable+0x5b/0xd0
[11009.908377] [<ffffffff812f5e58>] SyS_finit_module+0x108/0x130
[11009.908379] [<ffffffff812f5d50>] ? SyS_init_module+0x1f0/0x1f0
[11009.908383] [<ffffffff81004044>] ? lockdep_sys_exit_thunk+0x12/0x14
[11009.908394] [<ffffffff822e6936>] entry_SYSCALL_64_fastpath+0x16/0x76
[11009.908396] Memory state around the buggy address:
[11009.908398] ffff8803bd78aa00: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[11009.908401] ffff8803bd78aa80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[11009.908403] >ffff8803bd78ab00: fc fc fc fc fc fc fc fc 00 00 fc fc fc fc fc fc
[11009.908405] ^
[11009.908407] ffff8803bd78ab80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[11009.908409] ffff8803bd78ac00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[11009.908411] ==================================================================
In order to avoid it, let's set the cached value of the firmware
name to NULL after freeing it. While here, return an error if
the memory allocation fails.
In Thumb2 mode, the stack register r13 is deprecated if the
destination register is the program counter (r15). Similar to
head.S, head-nommu.S uses r13 to store the return address used
after configuring the CPU's CP15 register. However, since we do
not enable a MMU, there will be no address switch and it is
possible to use branch with link instruction to call
__after_proc_init.
Avoid using r13 completely by using bl to call __after_proc_init
and get rid of __secondary_switched.
Beside removing unnecessary complexity, this also fixes a
compiler warning when compiling a !MMU kernel:
Warning: Use of r13 as a source register is deprecated when r15
is the destination register.
Tested-by: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Stefan Agner <stefan@agner.ch> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Using "make tinyconfig" produces a couple of annoying warnings that show
up for build test machines all the time:
.config:966:warning: override: NOHIGHMEM changes choice state
.config:965:warning: override: SLOB changes choice state
.config:963:warning: override: KERNEL_XZ changes choice state
.config:962:warning: override: CC_OPTIMIZE_FOR_SIZE changes choice state
.config:933:warning: override: SLOB changes choice state
.config:930:warning: override: CC_OPTIMIZE_FOR_SIZE changes choice state
.config:870:warning: override: SLOB changes choice state
.config:868:warning: override: KERNEL_XZ changes choice state
.config:867:warning: override: CC_OPTIMIZE_FOR_SIZE changes choice state
I've made a previous attempt at fixing them and we discussed a number of
alternatives.
I tried changing the Makefile to use "merge_config.sh -n
$(fragment-list)" but couldn't get that to work properly.
This is yet another approach, based on the observation that we do want
to see a warning for conflicting 'choice' options, and that we can
simply make them non-conflicting by listing all other options as
disabled. This is a trivial patch that we can apply independent of
plans for other changes.
Over the years the code has been changed various times leading to
argc/argv being defined in a different function to where we actually
use the variables. Clean this up by moving them to prom_init_cmdline().
If no user settings are found it's pointless trying to
read them from flash. So skip that step.
This also fixes a compilation warning about uninitialized variables in
aic94xx.
Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <JBottomley@Odin.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The aurora cache controller is the only remaining user of a couple
of functions in this file and are completely unused when that is
disabled, leading to build warnings:
arch/arm/mm/cache-l2x0.c:167:13: warning: 'l2x0_cache_sync' defined but not used [-Wunused-function]
arch/arm/mm/cache-l2x0.c:184:13: warning: 'l2x0_flush_all' defined but not used [-Wunused-function]
arch/arm/mm/cache-l2x0.c:194:13: warning: 'l2x0_disable' defined but not used [-Wunused-function]
With the knowledge that the code is now aurora-specific, we can
simplify it noticeably:
- The pl310 errata workarounds are not needed on aurora and can be removed
- As confirmed by Thomas Petazzoni from the data sheet, the cache_wait()
macro is never needed.
- No need to hold the lock across atomic cache sync
- We can load the l2x0_base into a local variable across operations
There should be no functional change in this patch, but readability
and the generated object code improves, along with avoiding the
warnings.
(on Armada 370 RD and Armada XP GP, boot tested, plus a little bit of
DMA traffic by reading data from a SD card)
Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Tested-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suppress the following warning displayed on building 32bit (i686) kernel.
===============================================================================
...
CC [M] fs/btrfs/extent_io.o
fs/btrfs/extent_io.c: In function ‘btrfs_free_io_failure_record’:
fs/btrfs/extent_io.c:2193:13: warning: cast to pointer from integer of
different size [-Wint-to-pointer-cast]
failrec = (struct io_failure_record *)state->private;
...
===============================================================================
Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Reported-by: Chris Murphy <chris@colorremedies.com> Signed-off-by: Chris Mason <clm@fb.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We get a bogus warning about a potential uninitialized variable
use in gfs2, because the compiler does not figure out that we
never use the leaf number if get_leaf_nr() returns an error:
fs/gfs2/dir.c: In function 'get_first_leaf':
fs/gfs2/dir.c:802:9: warning: 'leaf_no' may be used uninitialized in this function [-Wmaybe-uninitialized]
fs/gfs2/dir.c: In function 'dir_split_leaf':
fs/gfs2/dir.c:1021:8: warning: 'leaf_no' may be used uninitialized in this function [-Wmaybe-uninitialized]
Changing the 'if (!error)' to 'if (!IS_ERR_VALUE(error))' is
sufficient to let gcc understand that this is exactly the same
condition as in IS_ERR() so it can optimize the code path enough
to understand it.
The sunxi mmc driver tries to calculate a dma address by using pointer
arithmetic, which causes a warning when dma_addr_t is wider than a pointer:
drivers/mmc/host/sunxi-mmc.c: In function 'sunxi_mmc_init_idma_des':
drivers/mmc/host/sunxi-mmc.c:296:35: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
struct sunxi_idma_des *pdes_pa = (struct sunxi_idma_des *)host->sg_dma;
^
To avoid this warning and to simplify the logic, this changes
the code to avoid the cast and calculate the correct address
manually. The behavior should be unchanged.
This is the last remaining warning for arm64, and I'd like to get rid of
it. We don't really know the cache line size, architecturally it would
be at least 16 bytes, but all implementations I found have 64 or 128
bytes. Configuring tulip for 32-byte lines as we do on ARM32 seems to
be the safe but slow default, and nobody who cares about performance these
days would use a tulip chip anyway, so we can just use that.
To save the next person the job of trying to find out what this is for
and picking a default for their architecture just to kill off the warning,
I'm now removing the preprocessor #warning and turning it into a pr_warn
or dev_warn that prints the equivalent information when the driver gets
loaded.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Grant Grundler <grundler@parisc-linux.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The driver reads a value from hfa384x_from_bap(), which may fail,
and then assigns the value to a local variable. gcc detects that
in in the failure case, the 'rlen' variable now contains
uninitialized data:
In file included from ../drivers/net/wireless/intersil/hostap/hostap_pci.c:220:0:
drivers/net/wireless/intersil/hostap/hostap_hw.c: In function 'hfa384x_get_rid':
drivers/net/wireless/intersil/hostap/hostap_hw.c:842:5: warning: 'rec' may be used uninitialized in this function [-Wmaybe-uninitialized]
if (le16_to_cpu(rec.len) == 0) {
This restructures the function as suggested by Russell King, to
make it more readable and get more reliable error handling, by
handling each failure mode using a goto.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The loop for measuring the square wave periods over some cycles is
refactored to be more easily readable. This includes avoiding a
"by-hand-implemented" for loop with a "real" one and adding some
comments.
Furthermore the following compiler warning is avoided by this patch:
drivers/misc/ioc4.c: In function ‘ioc4_probe’:
drivers/misc/ioc4.c:194:16: warning: ‘start’ may be used uninitialized
in this function [-Wmaybe-uninitialized]
period = (end - start) /
^
drivers/misc/ioc4.c:148:11: note: ‘start’ was declared here
uint64_t start, end, period;
^
When CONFIG_PCI_MSI is disabled, we get warnings about unused functions
in the vxge driver:
drivers/net/ethernet/neterion/vxge/vxge-main.c:2121:13: warning: 'adaptive_coalesce_tx_interrupts' defined but not used [-Wunused-function]
drivers/net/ethernet/neterion/vxge/vxge-main.c:2149:13: warning: 'adaptive_coalesce_rx_interrupts' defined but not used [-Wunused-function]
We could add another #ifdef here, but it's nicer to avoid those warnings
for good by converting the existing #ifdef to if(IS_ENABLED()), which has
the same effect but provides better compile-time coverage in general,
and lets the compiler understand better when the function is intentionally
unused.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The nozomi wireless data driver has its own helper function to
transfer data from a FIFO, doing an extra byte swap on big-endian
architectures, presumably to bring the data back into byte-serial
order after readw() or readl() perform their implicit byteswap.
This helper function is used in the receive_data() function to
first read the length into a 32-bit variable, which causes
a compile-time warning:
drivers/tty/nozomi.c: In function 'receive_data':
drivers/tty/nozomi.c:857:9: warning: 'size' may be used uninitialized in this function [-Wmaybe-uninitialized]
The problem is that gcc is unsure whether the data was actually
read or not. We know that it is at this point, so we can replace
it with a single readl() to shut up that warning.
I am leaving the byteswap in there, to preserve the existing
behavior, even though this seems fishy: Reading the length of
the data into a cpu-endian variable should normally not use
a second byteswap on big-endian systems, unless the hardware
is aware of the CPU endianess.
There appears to be a lot more confusion about endianess in this
driver, so it probably has not worked on big-endian systems in
a long time, if ever, and I have no way to test it. It's well
possible that this driver has not been used by anyone in a while,
the last patch that looks like it was tested on the hardware is
from 2008.
gcc-5.0 gained a new warning in the fwsignal portion of the brcmfmac
driver:
drivers/net/wireless/brcm80211/brcmfmac/fwsignal.c: In function 'brcmf_fws_txs_process':
drivers/net/wireless/brcm80211/brcmfmac/fwsignal.c:1478:8: warning: 'skb' may be used uninitialized in this function [-Wmaybe-uninitialized]
This is a false positive, and marking the brcmf_fws_hanger_poppkt function
as 'static inline' makes the warning go away. I have checked the object
file output and while a little code gets moved around, the size of
the binary remains identical.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 2ae83bf93882d1 ("[CIFS] Fix setting time before epoch (negative
time values)") changed "u64 t" to "s64 t", which makes do_div() complain
about a pointer signedness mismatch:
CC fs/cifs/netmisc.o
In file included from ./arch/mips/include/asm/div64.h:12:0,
from include/linux/kernel.h:124,
from include/linux/list.h:8,
from include/linux/wait.h:6,
from include/linux/net.h:23,
from fs/cifs/netmisc.c:25:
fs/cifs/netmisc.c: In function ‘cifs_NTtimeToUnix’:
include/asm-generic/div64.h:43:28: warning: comparison of distinct pointer types lacks a cast [enabled by default]
(void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
^
fs/cifs/netmisc.c:941:22: note: in expansion of macro ‘do_div’
ts.tv_nsec = (long)do_div(t, 10000000) * 100;
Introduce a temporary "u64 abs_t" variable to fix this.
Signed-off-by: Kevin Cernekee <cernekee@gmail.com> Signed-off-by: Steve French <steve.french@primarydata.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
gcc-5.0 warns about a potential uninitialized variable use in nfsd:
fs/nfsd/nfs4state.c: In function 'nfsd4_process_open2':
fs/nfsd/nfs4state.c:3781:3: warning: 'old_deny_bmap' may be used uninitialized in this function [-Wmaybe-uninitialized]
reset_union_bmap_deny(old_deny_bmap, stp);
^
fs/nfsd/nfs4state.c:3760:16: note: 'old_deny_bmap' was declared here
unsigned char old_deny_bmap;
^
This is a false positive, the code path that is warned about cannot
actually be reached.
This adds an initialization for the variable to make the warning go
away.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are certain places where the code uses .set mips32 or .set mips64
or .set arch=r4000. In preparation of MIPS R6 support, and in order to
use as less #ifdefs as possible, we define new macros to set similar
annotations for MIPS R6.
cpmac_start_xmit() used the max() macro on skb->len (an unsigned int)
and ETH_ZLEN (a signed int literal). This led to the following compiler
warning:
In file included from include/linux/list.h:8:0,
from include/linux/module.h:9,
from drivers/net/ethernet/ti/cpmac.c:19:
drivers/net/ethernet/ti/cpmac.c: In function 'cpmac_start_xmit':
include/linux/kernel.h:748:17: warning: comparison of distinct pointer
types lacks a cast
(void) (&_max1 == &_max2); \
^
drivers/net/ethernet/ti/cpmac.c:560:8: note: in expansion of macro 'max'
len = max(skb->len, ETH_ZLEN);
^
On top of this, it assigned the result of the max() macro to a signed
integer whilst all further uses of it result in it being cast to varying
widths of unsigned integer.
Fix this up by using max_t to ensure the comparison is performed as
unsigned integers, and for consistency change the type of the len
variable to unsigned int.
Signed-off-by: Paul Burton <paul.burton@imgtec.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As a part of memory initialisation the architecture passes an array to
free_area_init_nodes() which specifies the max PFN of each memory zone.
This array is not necessarily monotonic (due to unused zones) so this
array is parsed to build monotonic lists of the min and max PFN for each
zone. ZONE_MOVABLE is special cased here as its limits are managed by
the mm subsystem rather than the architecture. Unfortunately, this
special casing is broken when ZONE_MOVABLE is the not the last zone in
the zone list. The core of the issue is:
if (i == ZONE_MOVABLE)
continue;
arch_zone_lowest_possible_pfn[i] =
arch_zone_highest_possible_pfn[i-1];
As ZONE_MOVABLE is skipped the lowest_possible_pfn of the next zone will
be set to zero. This patch fixes this bug by adding explicitly tracking
where the next zone should start rather than relying on the contents
arch_zone_highest_possible_pfn[].
Thie is low priority. To get bitten by this you need to enable a zone
that appears after ZONE_MOVABLE in the zone_type enum. As far as I can
tell this means running a kernel with ZONE_DEVICE or ZONE_CMA enabled,
so I can't see this affecting too many people.
I only noticed this because I've been fiddling with ZONE_DEVICE on
powerpc and 4.6 broke my test kernel. This bug, in conjunction with the
changes in Taku Izumi's kernelcore=mirror patch (d91749c1dda71) and
powerpc being the odd architecture which initialises max_zone_pfn[] to
~0ul instead of 0 caused all of system memory to be placed into
ZONE_DEVICE at boot, followed a panic since device memory cannot be used
for kernel allocations. I've already submitted a patch to fix the
powerpc specific bits, but I figured this should be fixed too.
Link: http://lkml.kernel.org/r/1462435033-15601-1-git-send-email-oohall@gmail.com Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Cc: Anton Blanchard <anton@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The brand new GCC 5.1.0 warns by default on using a boolean in the
switch condition. This results in the following warning:
fs/nfs/nfs4proc.c: In function 'nfs4_proc_get_rootfh':
fs/nfs/nfs4proc.c:3100:10: warning: switch condition has boolean value [-Wswitch-bool]
switch (auth_probe) {
^
This code was obviously using switch to make use of the fall-through
semantics (without the usual comment, though).
Rewrite that code using if statements to avoid the warning and make
the code a bit more readable on the way.
Both Linus (most recent) and Steve (a while ago) reported that perf
related callbacks have massive stack bloat.
The problem is that software events need a pt_regs in order to
properly report the event location and unwind stack. And because we
could not assume one was present we allocated one on stack and filled
it with minimal bits required for operation.
Now, pt_regs is quite large, so this is undesirable. Furthermore it
turns out that most sites actually have a pt_regs pointer available,
making this even more onerous, as the stack space is pointless waste.
This patch addresses the problem by observing that software events
have well defined nesting semantics, therefore we can use static
per-cpu storage instead of on-stack.
Linus made the further observation that all but the scheduler callers
of perf_sw_event() have a pt_regs available, so we change the regular
perf_sw_event() to require a valid pt_regs (where it used to be
optional) and add perf_sw_event_sched() for the scheduler.
We have a scheduler specific call instead of a more generic _noregs()
like construct because we can assume non-recursion from the scheduler
and thereby simplify the code further (_noregs would have to put the
recursion context call inline in order to assertain which __perf_regs
element to use).
One last note on the implementation of perf_trace_buf_prepare(); we
allow .regs = NULL for those cases where we already have a pt_regs
pointer available and do not need another.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Javi Merino <javi.merino@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Petr Mladek <pmladek@suse.cz> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Vaibhav Nagarnaik <vnagarnaik@google.com> Link: http://lkml.kernel.org/r/20141216115041.GW3337@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Somehow the wrong version of the patch to remove the use of custom
gpio.h on mips has been merged. This patch add the missing fixes for a
build error on jz4740 because linux/gpio.h doesn't provide any machine
specfics definitions anymore.
mips-gcc-5.3 warns about correct code on linux-3.18 and earlier:
In file included from ../include/linux/blkdev.h:4:0,
from ../drivers/md/dm-bufio.h:12,
from ../drivers/md/dm-bufio.c:9:
../drivers/md/dm-bufio.c: In function 'alloc_buffer':
../include/linux/sched.h:1975:56: warning: 'noio_flag' may be used uninitialized in this function [-Wmaybe-uninitialized]
current->flags = (current->flags & ~PF_MEMALLOC_NOIO) | flags;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
../drivers/md/dm-bufio.c:325:11: note: 'noio_flag' was declared here
The warning disappeared on later kernels with this commit: be0c37c985ed
("MIPS: Rearrange PTE bits into fixed positions.") I assume this only
happened because it changed some inlining decisions.
On 3.18.y, we can shut up the warning by adding an extra initialization.
gadgetfs: fix uninitialized variable in error handling
gcc warns about a bug in 3.18.y:
drivers/usb/gadget/legacy/inode.c:648:10: warning: 'value' may be used
This is caused by the backport of f01d35a15fa0416 from 4.0 to 3.18: c81fc59be42c6e0 gadgetfs: use-after-free in ->aio_read()
The backported patch was buggy, but the mainline code was rewritten
in a larger patch directly following this one in a way that fixed the
bug.
For stable, we should need only a one-line change to make sure we
return an proper error code. It is very unlikely that anybody ever
ran into the out-of-memory case here in practice, but the compiler
is right in theory.
clk: at91: usb: fix determine_rate prototype again
We had an incorrect backport of 4591243102fa ("clk: at91: usb: propagate rate modification to the parent clk")
that was fixed incorrectly in linux-3.18.y by 76723e7ed589 ("clk: at91: usb: fix determine_rate prototype")
as shown by this warning:
drivers/clk/at91/clk-usb.c:155:20: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
drivers/clk/at91/clk-usb.c:193:20: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
Generally, taking an unexpected exception should be a fatal event, and
bad_mode is intended to cater for this. However, it should be possible
to contain unexpected synchronous exceptions from EL0 without bringing
the kernel down, by sending a SIGILL to the task.
We tried to apply this approach in commit 9955ac47f4ba1c95 ("arm64:
don't kill the kernel on a bad esr from el0"), by sending a signal for
any bad_mode call resulting from an EL0 exception.
However, this also applies to other unexpected exceptions, such as
SError and FIQ. The entry paths for these exceptions branch to bad_mode
without configuring the link register, and have no kernel_exit. Thus, if
we take one of these exceptions from EL0, bad_mode will eventually
return to the original user link register value.
This patch fixes this by introducing a new bad_el0_sync handler to cater
for the recoverable case, and restoring bad_mode to its original state,
whereby it calls panic() and never returns. The recoverable case
branches to bad_el0_sync with a bl, and returns to userspace via the
usual ret_to_user mechanism.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Fixes: 9955ac47f4ba1c95 ("arm64: don't kill the kernel on a bad esr from el0") Reported-by: Mark Salter <msalter@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
unmap_stage2_range() on the entire memory range for the guest. This could
cause problems with other callers (e.g, munmap on a memslot) trying to
unmap a range. And since we have to unmap the entire Guest memory range
holding a spinlock, make sure we yield the lock if necessary, after we
unmap each PUD range.
Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup") Cc: Paolo Bonzini <pbonzin@redhat.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
[ Avoid vCPU starvation and lockup detector warnings ] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
MCA bank 3 is reserved on systems pre-Fam17h, so it didn't have a name.
However, MCA bank 3 is defined on Fam17h systems and can be accessed
using legacy MSRs. Without a name we get a stack trace on Fam17h systems
when trying to register sysfs files for bank 3 on kernels that don't
recognize Scalable MCA.
Call MCA bank 3 "decode_unit" since this is what it represents on
Fam17h. This will allow kernels without SMCA support to see this bank on
Fam17h+ and prevent the stack trace. This will not affect older systems
since this bank is reserved on them, i.e. it'll be ignored.
Tested on AMD Fam15h and Fam17h systems.
WARNING: CPU: 26 PID: 1 at lib/kobject.c:210 kobject_add_internal
kobject: (ffff88085bb256c0): attempted to be registered with empty name!
...
Call Trace:
kobject_add_internal
kobject_add
kobject_create_and_add
threshold_create_device
threshold_init_device
On a 64-bit system when the user probes on a 'stdu' instruction, the kernel does
not emulate actual store in emulate_step() because it may corrupt the exception
frame. So the kernel does the actual store operation in exception return code
i.e. resume_kernel().
resume_kernel() loads the saved stack pointer from memory using lwz, which only
loads the low 32-bits of the address, causing the kernel crash.
Fix this by loading the 64-bit value instead.
Fixes: be96f63375a1 ("powerpc: Split out instruction analysis part of emulate_step()") Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
[mpe: Change log massage, add stable tag] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit 6afaf8a484cb ("UBI: flush wl before clearing update marker") I
managed to trigger and fix a similar bug. Now here is another version of
which I assumed it wouldn't matter back then but it turns out UBI has a
check for it and will error out like this:
|ubi0 warning: validate_vid_hdr: inconsistent used_ebs
|ubi0 error: validate_vid_hdr: inconsistent VID header at PEB 592
All you need to trigger this is? "ubiupdatevol /dev/ubi0_0 file" + a
powercut in the middle of the operation.
ubi_start_update() sets the update-marker and puts all EBs on the erase
list. After that userland can proceed to write new data while the old EB
aren't erased completely. A powercut at this point is usually not that
much of a tragedy. UBI won't give read access to the static volume
because it has the update marker. It will most likely set the corrupted
flag because it misses some EBs.
So we are all good. Unless the size of the image that has been written
differs from the old image in the magnitude of at least one EB. In that
case UBI will find two different values for `used_ebs' and refuse to
attach the image with the error message mentioned above.
So in order not to get in the situation, the patch will ensure that we
wait until everything is removed before it tries to write any data.
The alternative would be to detect such a case and remove all EBs at the
attached time after we processed the volume-table and see the
update-marker set. The patch looks bigger and I doubt it is worth it
since usually the write() will wait from time to time for a new EB since
usually there not that many spare EB that can be used.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently for DDR50 card, it need tuning in default. We meet tuning fail
issue for DDR50 card and some data CRC error when DDR50 sd card works.
This is because the default pad I/O drive strength can't make sure DDR50
card work stable. So increase the pad I/O drive strength for DDR50 card,
and use pins_100mhz.
This fixes DDR50 card support for IMX since DDR50 tuning was enabled from
commit 9faac7b95ea4 ("mmc: sdhci: enable tuning for DDR50")
gcc -O2 cannot always prove that the loop in acpi_power_get_inferred_state()
is enterered at least once, so it assumes that cur_state might not get
initialized:
drivers/acpi/power.c: In function 'acpi_power_get_inferred_state':
drivers/acpi/power.c:222:9: error: 'cur_state' may be used uninitialized in this function [-Werror=maybe-uninitialized]
This sets the variable to zero at the start of the loop, to ensure that
there is well-defined behavior even for an empty list. This gets rid of
the warning.
The warning first showed up when the -Os flag got removed in a bug fix
patch in linux-4.11-rc5.
I would suggest merging this addon patch on top of that bug fix to avoid
introducing a new warning in the stable kernels.
Fixes: 61b79e16c68d (ACPI: Fix incompatibility with mcount-based function graph tracing) Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On heavy paging with KSM I see guest data corruption. Turns out that
KSM will add pages to its tree, where the mapping return true for
pte_unused (or might become as such later). KSM will unmap such pages
and reinstantiate with different attributes (e.g. write protected or
special, e.g. in replace_page or write_protect_page)). This uncovered
a bug in our pagetable handling: We must remove the unused flag as
soon as an entry becomes present again.
Signed-of-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
STATUS_BAD_NETWORK_NAME can be received during node failover,
causing the flag to be set and making the reconnect thread
always unsuccessful, thereafter.
Once the only place where it is set is removed, the remaining
bits are rendered moot.
Removing it does not prevent "mount" from failing when a non
existent share is passed.
What happens when the share really ceases to exist while the
share is mounted is undefined now as much as it was before.
Signed-off-by: Germano Percossi <germano.percossi@citrix.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4fcd1813e640 ("Fix reconnect to not defer smb3 session reconnect
long after socket reconnect") added support for Negotiate requests to
be initiated by echo calls.
To avoid delays in calling echo after a reconnect, I added the patch
introduced by the commit b8c600120fc8 ("Call echo service immediately
after socket reconnect").
This has however caused a regression with cifs shares which do not have
support for echo calls to trigger Negotiate requests. On connections
which need to call Negotiation, the echo calls trigger an error which
triggers a reconnect which in turn triggers another echo call. This
results in a loop which is only broken when an operation is performed on
the cifs share. For an idle share, it can DOS a server.
The patch uses the smb_operation can_echo() for cifs so that it is
called only if connection has been already been setup.
kernel bz: 194531
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Tested-by: Jonathan Liu <net147@gmail.com> Acked-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I noticed that reading the snapshot file when it is empty no longer gives a
status. It suppose to show the status of the snapshot buffer as well as how
to allocate and use it. For example:
># cat snapshot
# tracer: nop
#
#
# * Snapshot is allocated *
#
# Snapshot commands:
# echo 0 > snapshot : Clears and frees snapshot buffer
# echo 1 > snapshot : Allocates snapshot buffer, if not already allocated.
# Takes a snapshot of the main buffer.
# echo 2 > snapshot : Clears snapshot buffer (but does not allocate or free)
# (Doesn't have to be '2' works with any number that
# is not a '0' or '1')
What happened was that it was using the ring_buffer_iter_empty() function to
see if it was empty, and if it was, it showed the status. But that function
was returning false when it was empty. The reason was that the iter header
page was on the reader page, and the reader page was empty, but so was the
buffer itself. The check only tested to see if the iter was on the commit
page, but the commit page was no longer pointing to the reader page, but as
all pages were empty, the buffer is also.
Fixes: 651e22f2701b ("ring-buffer: Always reset iterator to reader page") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently the snapshot trigger enables the probe and then allocates the
snapshot. If the probe triggers before the allocation, it could cause the
snapshot to fail and turn tracing off. It's best to allocate the snapshot
buffer first, and then enable the trigger. If something goes wrong in the
enabling of the trigger, the snapshot buffer is still allocated, but it can
also be freed by the user by writting zero into the snapshot buffer file.
Also add a check of the return status of alloc_snapshot().
Fixes: 77fd5c15e3 ("tracing: Add snapshot trigger to function probes") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Running the following program as an unprivileged user exhausts kernel
memory by leaking thread keyrings:
#include <keyutils.h>
int main()
{
for (;;)
keyctl_set_reqkey_keyring(KEY_REQKEY_DEFL_THREAD_KEYRING);
}
Fix it by only creating a new thread keyring if there wasn't one before.
To make things more consistent, make install_thread_keyring_to_cred()
and install_process_keyring_to_cred() both return 0 if the corresponding
keyring is already present.
Fixes: d84f4f992cbd ("CRED: Inaugurate COW credentials") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Userspace should not be able to do things with the "dead" key type as it
doesn't have some of the helper functions set upon it that the kernel
needs. Attempting to use it may cause the kernel to crash.
Fix this by changing the name of the type to ".dead" so that it's rejected
up front on userspace syscalls by key_get_type_from_user().
Though this doesn't seem to affect recent kernels, it does affect older
ones, certainly those prior to:
commit c06cfb08b88dfbe13be44a69ae2fdc3a7c902d81
Author: David Howells <dhowells@redhat.com>
Date: Tue Sep 16 17:36:06 2014 +0100
KEYS: Remove key_type::match in favour of overriding default by match_preparse
which went in before 3.18-rc1.
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Keyrings whose name begin with a '.' are special internal keyrings and so
userspace isn't allowed to create keyrings by this name to prevent
shadowing. However, the patch that added the guard didn't fix
KEYCTL_JOIN_SESSION_KEYRING. Not only can that create dot-named keyrings,
it can also subscribe to them as a session keyring if they grant SEARCH
permission to the user.
This, for example, allows a root process to set .builtin_trusted_keys as
its session keyring, at which point it has full access because now the
possessor permissions are added. This permits root to add extra public
keys, thereby bypassing module verification.
This also affects kexec and IMA.
This can be tested by (as root):
keyctl session .builtin_trusted_keys
keyctl add user a a @s
keyctl list @s
gcc-7 has an "optimization" pass that completely screws up, and
generates the code expansion for the (impossible) case of calling
ilog2() with a zero constant, even when the code gcc compiles does not
actually have a zero constant.
And we try to generate a compile-time error for anybody doing ilog2() on
a constant where that doesn't make sense (be it zero or negative). So
now gcc7 will fail the build due to our sanity checking, because it
created that constant-zero case that didn't actually exist in the source
code.
There's a whole long discussion on the kernel mailing about how to work
around this gcc bug. The gcc people themselevs have discussed their
"feature" in
but it's all water under the bridge, because while it looked at one
point like it would be solved by the time gcc7 was released, that was
not to be.
So now we have to deal with this compiler braindamage.
And the only simple approach seems to be to just delete the code that
tries to warn about bad uses of ilog2().
So now "ilog2()" will just return 0 not just for the value 1, but for
any non-positive value too.
It's not like I can recall anybody having ever actually tried to use
this function on any invalid value, but maybe the sanity check just
meant that such code never made it out in public.
Reported-by: Laura Abbott <labbott@redhat.com> Cc: John Stultz <john.stultz@linaro.org>, Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The rapf copy loops in the Meta usercopy code is missing some extable
entries for HTP cores with unaligned access checking enabled, where
faults occur on the instruction immediately after the faulting access.
Add the fixup labels and extable entries for these cases so that corner
case user copy failures don't cause kernel crashes.
The fixup code to rewind the source pointer in
__asm_copy_from_user_{32,64}bit_rapf_loop() always rewound the source by
a single unit (4 or 8 bytes), however this is insufficient if the fault
didn't occur on the first load in the loop, as the source pointer will
have been incremented but nothing will have been stored until all 4
register [pairs] are loaded.
Read the LSM_STEP field of TXSTATUS (which is already loaded into a
register), a bit like the copy_to_user versions, to determine how many
iterations of MGET[DL] have taken place, all of which need rewinding.
The fixup code for the copy_to_user rapf loops reads TXStatus.LSM_STEP
to decide how far to rewind the source pointer. There is a special case
for the last execution of an MGETL/MGETD, since it leaves LSM_STEP=0
even though the number of MGETLs/MGETDs attempted was 4. This uses ADDZ
which is conditional upon the Z condition flag, but the AND instruction
which masked the TXStatus.LSM_STEP field didn't set the condition flags
based on the result.
Fix that now by using ANDS which does set the flags, and also marking
the condition codes as clobbered by the inline assembly.
Currently we try to zero the destination for a failed read from userland
in fixup code in the usercopy.c macros. The rest of the destination
buffer is then zeroed from __copy_user_zeroing(), which is used for both
copy_from_user() and __copy_from_user().
Unfortunately we fail to zero in the fixup code as D1Ar1 is set to 0
before the fixup code entry labels, and __copy_from_user() shouldn't even
be zeroing the rest of the buffer.
Move the zeroing out into copy_from_user() and rename
__copy_user_zeroing() to raw_copy_from_user() since it no longer does
any zeroing. This also conveniently matches the name needed for
RAW_COPY_USER support in a later patch.
Fixes: 373cd784d0fc ("metag: Memory handling") Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: linux-metag@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When copying to userland on Meta, if any faults are encountered
immediately abort the copy instead of continuing on and repeatedly
faulting, and worse potentially copying further bytes successfully to
subsequent valid pages.
Fixes: 373cd784d0fc ("metag: Memory handling") Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: linux-metag@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix the error checking of the alignment adjustment code in
raw_copy_from_user(), which mistakenly considers it safe to skip the
error check when aligning the source buffer on a 2 or 4 byte boundary.
If the destination buffer was unaligned it may have started to copy
using byte or word accesses, which could well be at the start of a new
(valid) source page. This would result in it appearing to have copied 1
or 2 bytes at the end of the first (invalid) page rather than none at
all.
Metag's lib/usercopy.c has a bunch of copy_from_user macros for larger
copies between 5 and 16 bytes which are completely unused. Before fixing
zeroing lets drop these macros so there is less to fix.
Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-metag@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In case of error, the function kthread_run() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check
should be replaced with IS_ERR().
In the case that compat_get_bitmap fails we do not want to copy the
bitmap to the user as it will contain uninitialized stack data and leak
sensitive data.
After parsing TRX we should skip to the first block placed behind it.
Our code was working only with TRX with length not aligned to the
blocksize. In other cases (length aligned) it was missing the block
places right after TRX.
This fixes calculation and simplifies the comment.
This bug is triggered when pmd_present() returns true for non-present
hugetlb, so fixing the present check in follow_huge_pmd() prevents it.
Using pmd_present() to determine present/non-present for hugetlb is not
correct, because pmd_present() checks multiple bits (not only
_PAGE_PRESENT) for historical reason and it can misjudge hugetlb state.
Fixes: e66f17ff7177 ("mm/hugetlb: take page table lock in follow_huge_pmd()") Link: http://lkml.kernel.org/r/1490149898-20231-1-git-send-email-n-horiguchi@ah.jp.nec.com Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Hugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Clearing the status bit on irq_unmask will discard any pending interrupt
that did arrive after the irq_ack, i.e. while the IRQ handler function
was executing.
Fixes: f365be092572 ("pinctrl: Add Qualcomm TLMM driver") Cc: Stephen Boyd <sboyd@codeaurora.org> Reported-by: Timur Tabi <timur@codeaurora.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When init_vqs runs, virtio_balloon.stats is either uninitialized or
contains stale values. The host updates its state with garbage data
because it has no way of knowing that this is just a marker buffer
used for signaling.
This patch updates the stats before pushing the initial buffer.
Alternative fixes:
* Push an empty buffer in init_vqs. Not easily done with the current
virtio implementation and violates the spec "Driver MUST supply the
same subset of statistics in all buffers submitted to the statsq".
* Push a buffer with invalid tags in init_vqs. Violates the same
spec clause, plus "invalid tag" is not really defined.
Note: the spec says:
When using the legacy interface, the device SHOULD ignore all values in
the first buffer in the statsq supplied by the driver after device
initialization. Note: Historically, drivers supplied an uninitialized
buffer in the first buffer.
Unfortunately QEMU does not seem to implement the recommendation
even for the legacy interface.
Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The WRITE_SAME commands are not present in the blk_default_cmd_filter
write_ok list, and thus are failed with -EPERM when the SG_IO ioctl()
is executed without CAP_SYS_RAWIO capability (e.g., unprivileged users).
[ sg_io() -> blk_fill_sghdr_rq() > blk_verify_command() -> -EPERM ]
The problem can be reproduced with the sg_write_same command
# sg_write_same --num 1 --xferlen 512 /dev/sda
#
# capsh --drop=cap_sys_rawio -- -c \
'sg_write_same --num 1 --xferlen 512 /dev/sda'
Write same: pass through os error: Operation not permitted
#
For comparison, the WRITE_VERIFY command does not observe this problem,
since it is in that list:
That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices
(qemu "-device scsi-block" [1], libvirt "<disk type='block' device='lun'>" [2]),
which employs the SG_IO ioctl() and runs as an unprivileged user (libvirt-qemu).
In that scenario, when a filesystem (e.g., ext4) performs its zero-out calls,
which are translated to write-same calls in the guest kernel, and then into
SG_IO ioctls to the host kernel, SCSI I/O errors may be observed in the guest:
Some devices have invalid baSourceID references, causing uvc_scan_chain()
to fail, but if we just take the entities we can find and put them
together in the most sensible chain we can think of, turns out they do
work anyway. Note: This heuristic assumes there is a single chain.
At the time of writing, devices known to have such a broken chain are
- Acer Integrated Camera (5986:055a)
- Realtek rtl157a7 (0bda:57a7)
During a PCI error recovery, like the ones provoked by EEH in the ppc64
platform, all IO to the device must be blocked while the recovery is
completed. Current 8250_pci implementation only suspends the port
instead of detaching it, which doesn't prevent incoming accesses like
TIOCMGET and TIOCMSET calls from reaching the device. Those end up
racing with the EEH recovery, crashing it. Similar races were also
observed when opening the device and when shutting it down during
recovery.
This patch implements a more robust IO blockage for the 8250_pci
recovery by unregistering the port at the beginning of the procedure and
re-adding it afterwards. Since the port is detached from the uart
layer, we can be sure that no request will make through to the device
during recovery. This is similar to the solution used by the JSM serial
driver.
I thank Peter Hurley <peter@hurleysoftware.com> for valuable input on
this one over one year ago.
Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
WARNING: CPU: 0 PID: 774 at /build/linux-ROBWaj/linux-4.9.13/kernel/trace/trace_functions_graph.c:233 ftrace_return_to_handler+0x1aa/0x1e0
Bad frame pointer: expected f6919d98, received f6919db0
from func acpi_pm_device_sleep_wake return to c43b6f9d
The warning means that function graph tracing is broken for the
acpi_pm_device_sleep_wake() function. That's because the ACPI Makefile
unconditionally sets the '-Os' gcc flag to optimize for size. That's an
issue because mcount-based function graph tracing is incompatible with
'-Os' on x86, thanks to the following gcc bug:
I have another patch pending which will ensure that mcount-based
function graph tracing is never used with CONFIG_CC_OPTIMIZE_FOR_SIZE on
x86.
But this patch is needed in addition to that one because the ACPI
Makefile overrides that config option for no apparent reason. It has
had this flag since the beginning of git history, and there's no related
comment, so I don't know why it's there. As far as I can tell, there's
no reason for it to be there. The appropriate behavior is for it to
honor CONFIG_CC_OPTIMIZE_FOR_{SIZE,PERFORMANCE} like the rest of the
kernel.
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If we try to allocate memory pages to back an xfs_buf that we're trying
to read, it's possible that we'll be so short on memory that the page
allocation fails. For a blocking read we'll just wait, but for
readahead we simply dump all the pages we've collected so far.
Unfortunately, after dumping the pages we neglect to clear the
_XBF_PAGES state, which means that the subsequent call to xfs_buf_free
thinks that b_pages still points to pages we own. It then double-frees
the b_pages pages.
This results in screaming about negative page refcounts from the memory
manager, which xfs oughtn't be triggering. To reproduce this case,
mount a filesystem where the size of the inodes far outweighs the
availalble memory (a ~500M inode filesystem on a VM with 300MB memory
did the trick here) and run bulkstat in parallel with other memory
eating processes to put a huge load on the system. The "check summary"
phase of xfs_scrub also works for this purpose.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There have been several reports over the years of NULL pointer
dereferences in xfs_trans_log_inode during xfs_fsr processes,
when the process is doing an fput and tearing down extents
on the temporary inode, something like:
As it turns out, this is because the i_itemp pointer, along
with the d_ops pointer, has been overwritten with zeros
when we tear down the extents during truncate. When the in-core
inode fork on the temporary inode used by xfs_fsr was originally
set up during the extent swap, we mistakenly looked at di_nextents
to determine whether all extents fit inline, but this misses extents
generated by speculative preallocation; we should be using if_bytes
instead.
This mistake corrupts the in-memory inode, and code in
xfs_iext_remove_inline eventually gets bad inputs, causing
it to memmove and memset incorrect ranges; this became apparent
because the two values in ifp->if_u2.if_inline_ext[1] contained
what should have been in d_ops and i_itemp; they were memmoved due
to incorrect array indexing and then the original locations
were zeroed with memset, again due to an array overrun.
Fix this by properly using i_df.if_bytes to determine the number
of extents, not di_nextents.
Thanks to dchinner for looking at this with me and spotting the
root cause.
[nborisov: backported to 4.4]
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The on-disk field di_size is used to set i_size, which is a signed
integer of loff_t. If the high bit of di_size is set, we'll end up with
a negative i_size, which will cause all sorts of problems. Since the
VFS won't let us create a file with such length, we should catch them
here in the verifier too.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sometimes firmware may not properly initialize I347AT4_PAGE_SELECT causing
the probe of an igb i210 NIC to fail. This patch adds an addition zeroing
of this register during igb_get_phy_id to workaround this issue.
Thanks for Jochen Henneberg for the idea and original patch.
Signed-off-by: Chris J Arges <christopherarges@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Disabling interrupts for even a millisecond can cause problems for some
devices. That can happen when sdhci changes clock frequency because it
waits for the clock to become stable under a spin lock.
The spin lock is not necessary here. Anything that is racing with changes
to the I/O state is already broken. The mmc core already provides
synchronization via "claiming" the host.
Although the spin lock probably should be removed from the code paths that
lead to this point, such a patch would touch too much code to be suitable
for stable trees. Consequently, for this patch, just drop the spin lock
while waiting.
If ext4_convert_inline_data() was called on a directory with inline
data, the filesystem was left in an inconsistent state (as considered by
e2fsck) because the file size was not increased to cover the new block.
This happened because the inode was not marked dirty after i_disksize
was updated. Fix this by marking the inode dirty at the end of
ext4_finish_convert_inline_dir().
This bug was probably not noticed before because most users mark the
inode dirty afterwards for other reasons. But if userspace executed
FS_IOC_SET_ENCRYPTION_POLICY with invalid parameters, as exercised by
'kvm-xfstests -c adv generic/396', then the inode was never marked dirty
after updating i_disksize.
The tiadc_irq_h(int irq, void *private) function is handling FIFO
overruns by clearing flags, disabling and enabling the ADC to
recover.
If the ADC is running in continuous mode a FIFO overrun happens
regularly. If the disabling of the ADC happens concurrently with
a new conversion. It might happen that the enabling of the ADC
is ignored by the hardware. This stops the ADC permanently. No
more interrupts are triggered.
According to the AM335x Reference Manual (SPRUH73H October 2011 -
Revised April 2013 - Chapter 12.4 and 12.5) it is necessary to
check the ADC FSM bits in REG_ADCFSM before enabling the ADC
again. Because the disabling of the ADC is done right after the
current conversion has been finished.
To trigger this bug it is necessary to run the ADC in continuous
mode. The ADC values of all channels need to be read in an endless
loop. The bug appears within the first 6 hours (~5.4 million
handled FIFO overruns). The user space application will hang on
reading new values from the character device.
Fixes: ca9a563805f7a ("iio: ti_am335x_adc: Add continuous sampling
support") Signed-off-by: Michael Engl <michael.engl@wjw-solutions.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
USBTMC devices are required to have a bulk-in and a bulk-out endpoint,
but the driver failed to verify this, something which could lead to the
endpoint addresses being taken from uninitialised memory.
Make sure to zero all private data as part of allocation, and add the
missing endpoint sanity check.
Note that this also addresses a more recently introduced issue, where
the interrupt-in-presence flag would also be uninitialised whenever the
optional interrupt-in endpoint is not present. This in turn could lead
to an interrupt urb being allocated, initialised and submitted based on
uninitialised values.
Fixes: dbf3e7f654c0 ("Implement an ioctl to support the USMTMC-USB488 READ_STATUS_BYTE operation.") Fixes: 5b775f672cc9 ("USB: add USB test and measurement class driver") Signed-off-by: Johan Hovold <johan@kernel.org>
[ johan: backport to v4.4 ] Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
icsk_ack.lrcvtime has a 0 value at socket creation time.
tcpi_last_data_recv can have bogus value if no payload is ever received.
This patch initializes icsk_ack.lrcvtime for active sessions
in tcp_finish_connect(), and for passive sessions in
tcp_create_openreq_child()
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In sk_clone_lock(), we create a new socket and inherit most of the
parent's members via sock_copy() which memcpy()'s various sections.
Now, in case the parent socket had a BPF socket filter attached,
then newsk->sk_filter points to the same instance as the original
sk->sk_filter.
sk_filter_charge() is then called on the newsk->sk_filter to take a
reference and should that fail due to hitting max optmem, we bail
out and release the newsk instance.
The issue is that commit 278571baca2a ("net: filter: simplify socket
charging") wrongly combined the dismantle path with the failure path
of xfrm_sk_clone_policy(). This means, even when charging failed, we
call sk_free_unlock_clone() on the newsk, which then still points to
the same sk_filter as the original sk.
Thus, sk_free_unlock_clone() calls into __sk_destruct() eventually
where it tests for present sk_filter and calls sk_filter_uncharge()
on it, which potentially lets sk_omem_alloc wrap around and releases
the eBPF prog and sk_filter structure from the (still intact) parent.
Fix it by making sure that when sk_filter_charge() failed, we reset
newsk->sk_filter back to NULL before passing to sk_free_unlock_clone(),
so that we don't mess with the parents sk_filter.
Only if xfrm_sk_clone_policy() fails, we did reach the point where
either the parent's filter was NULL and as a result newsk's as well
or where we previously had a successful sk_filter_charge(), thus for
that case, we do need sk_filter_uncharge() to release the prior taken
reference on sk_filter.
Fixes: 278571baca2a ("net: filter: simplify socket charging") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dmitry has reported that a BUG_ON() condition in unix_notinflight()
may be triggered by a simple code that forwards unix socket in an
SCM_RIGHTS message.
That is caused by incorrect unix socket GC implementation in unix_gc().
The GC first collects list of candidates, then (a) decrements their
"children's" inflight counter, (b) checks which inflight counters are
now 0, and then (c) increments all inflight counters back.
(a) and (c) are done by calling scan_children() with inc_inflight or
dec_inflight as the second argument.
Commit 6209344f5a37 ("net: unix: fix inflight counting bug in garbage
collector") changed scan_children() such that it no longer considers
sockets that do not have UNIX_GC_CANDIDATE flag. It also added a block
of code that that unsets this flag _before_ invoking
scan_children(, dec_iflight, ). This may lead to incorrect inflight
counters for some sockets.
This change fixes this bug by changing order of operations:
UNIX_GC_CANDIDATE is now unset only after all inflight counters are
restored to the original state.
I mistakenly added the code to release sk->sk_frag in
sk_common_release() instead of sk_destruct()
TCP sockets using sk->sk_allocation == GFP_ATOMIC do no call
sk_common_release() at close time, thus leaking one (order-3) page.
iSCSI is using such sockets.
Fixes: 5640f7685831 ("net: use a per task frag allocator") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With ConnectX-4 sharing SRQs from the same space as QPs, we hit a
limit preventing some applications to allocate needed QPs amount.
Double the size to 256K.
Fixes: e126ba97dba9e ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The gadget code exports the bitfield for serial status changes
over the wire in its internal endianness. The fix is to convert
to little endian before sending it over the wire.
Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer or accessing memory that lie beyond the end of the endpoint
array should a malicious device lack the expected endpoints.
Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer should a malicious device lack endpoints.
Fixes: c04148f915e5 ("Input: add driver for USB VoIP phones with CM109...") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer or accessing memory that lie beyond the end of the endpoint
array should a malicious device lack the expected endpoints.
Alexander reported a KMSAN splat caused by reads of uninitialized
field (tb_id_in) from user provided struct fib_result_nl
It turns out nl_fib_input() sanity tests on user input is a bit
wrong :
User can pretend nlh->nlmsg_len is big enough, but provide
at sendmsg() time a too small buffer.
Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Update to pcpu_nr_empty_pop_pages in pcpu_alloc() is currently done
without holding pcpu_lock. This can lead to bad updates to the variable.
Add missing lock calls.
Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer should a malicious device lack endpoints.
Fixes: cf7776dc05b8 ("[PATCH] isdn4linux: Siemens Gigaset drivers -
direct USB connection") Cc: Hansjoerg Lipp <hjlipp@web.de> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As reported by Max, the Windows 2008 R2 chkdsk utility expects
VERIFY_16 to be supported, and does not handle the returned
CHECK_CONDITION properly, resulting in an infinite loop.