Currently, in PREEMPT_COUNT=n kernel, kvm_async_pf_task_wait() could call
schedule() to reschedule in some cases. This could result in
accidentally ending the current RCU read-side critical section early,
causing random memory corruption in the guest, or otherwise preempting
the currently running task inside between preempt_disable and
preempt_enable.
The difficulty to handle this well is because we don't know whether an
async PF delivered in a preemptible section or RCU read-side critical section
for PREEMPT_COUNT=n, since preempt_disable()/enable() and rcu_read_lock/unlock()
are both no-ops in that case.
To cure this, we treat any async PF interrupting a kernel context as one
that cannot be preempted, preventing kvm_async_pf_task_wait() from choosing
the schedule() path in that case.
To do so, a second parameter for kvm_async_pf_task_wait() is introduced,
so that we know whether it's called from a context interrupting the
kernel, and the parameter is set properly in all the callsites.
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
[bwh: Backported to 3.16:
- Use user_mode_vm() as equivalent to upstream user_mode()
- Adjust filename, context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This happened when the host hit a page fault, and delivered it as in an
async page fault, while the guest was in an RCU read-side critical
section. The guest then tries to reschedule in kvm_async_pf_task_wait(),
but rcu_preempt_note_context_switch() would treat the reschedule as a
sleep in RCU read-side critical section, which is not allowed (even in
preemptible RCU). Thus the WARN.
To cure this, make kvm_async_pf_task_wait() go to the halt path if the
PF happens in a RCU read-side critical section.
Reported-by: Sasha Levin <levinsasha928@gmail.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The check in vmci_transport_peer_detach_cb should only allow a
detach when the qp handle of the transport matches the one in
the detach message.
Testing: Before this change, a detach from a peer on a different
socket would cause an active stream socket to register a detach.
Reviewed-by: George Zhang <georgezhang@vmware.com> Signed-off-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The recent fix for the vsock sock_put issue used the wrong
initializer for the transport spin_lock causing an issue when
running with lockdep checking.
Testing: Verified fix on kernel with lockdep enabled.
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
In the vsock vmci_transport driver, sock_put wasn't safe to call
in interrupt context, since that may call the vsock destructor
which in turn calls several functions that should only be called
from process context. This change defers the callling of these
functions to a worker thread. All these functions were
deallocation of resources related to the transport itself.
Furthermore, an unused callback was removed to simplify the
cleanup.
Multiple customers have been hitting this issue when using
VMware tools on vSphere 2015.
Also added a version to the vmci transport module (starting from
1.0.2.0-k since up until now it appears that this module was
sharing version with vsock that is currently at 1.0.1.0-k).
Reviewed-by: Aditya Asarwade <asarwade@vmware.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
For the reinstall prevention, the code I had added compares the
whole key. It turns out though that iwlwifi firmware doesn't
provide the TKIP TX MIC key as it's not needed in client mode,
and thus the comparison will always return false.
For client mode, thus always zero out the TX MIC key part before
doing the comparison in order to avoid accepting the reinstall
of the key with identical encryption and RX MIC key, but not the
same TX MIC key (since the supplicant provides the real one.)
Fixes: fdf7cb4185b6 ("mac80211: accept key reinstall without changing anything") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Otherwise we risk leaking information via timing side channel.
Fixes: fdf7cb4185b6 ("mac80211: accept key reinstall without changing anything") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fix by simply ignoring the bogus descriptor, as it is optional
for QMI devices anyway.
Fixes: 423ce8caab7e ("net: usb: qmi_wwan: New driver for Huawei QMI based WWAN devices") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Setting dev->hard_mtu to 0 will cause a divide error in
usbnet_probe. Protect against devices with bogus CDC Ethernet
functional descriptors by ignoring a zero wMaxSegmentSize.
Signed-off-by: Bjørn Mork <bjorn@mork.no> Acked-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: parsing code is organised differently] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
parse_hid_report_descriptor() has a while (i < length) loop, which
only guarantees that there's at least 1 byte in the buffer, but the
loop body can read multiple bytes which causes out-of-bounds access.
Make sure to check that we actually have an Interface Association
Descriptor before dereferencing it during probe to avoid dereferencing a
NULL-pointer.
Fixes: e0d3bafd0258 ("V4L/DVB (10954): Add cx231xx USB driver") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Johan Hovold <johan@kernel.org> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Make sure to reset the USB-console port pointer when console setup fails
in order to avoid having the struct usb_serial be prematurely freed by
the console code when the device is later disconnected.
Fixes: 73e487fdb75f ("[PATCH] USB console: fix disconnection issues") Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Now when peeling off an association to the sock in another netns, all
transports in this assoc are not to be rehashed and keep use the old
key in hashtable.
As a transport uses sk->net as the hash key to insert into hashtable,
it would miss removing these transports from hashtable due to the new
netns when closing the sock and all transports are being freeed, then
later an use-after-free issue could be caused when looking up an asoc
and dereferencing those transports.
This is a very old issue since very beginning, ChunYu found it with
syzkaller fuzz testing with this series:
This patch is to block this call when peeling one assoc off from one
netns to another one, so that the netns of all transport would not
go out-sync with the key in hashtable.
Note that this patch didn't fix it by rehashing transports, as it's
difficult to handle the situation when the tuple is already in use
in the new netns. Besides, no one would like to peel off one assoc
to another netns, considering ipaddrs, ifaces, etc. are usually
different.
Reported-by: ChunYu Wang <chunwang@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
After running some memory intensive workload in guest, I catch the kworker
which completes the GUP too quickly, and queues an "Page Ready" #PF exception
after the "Page not Present" exception before the next vmentry as the above
trace which will result in #DF injected to guest.
This patch fixes it by clearing the queue for "Page not Present" if "Page Ready"
occurs before the next vmentry since the GUP has already got the required page
and shadow page table has already been fixed by "Page Ready" handler.
Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Fixes: 7c90705bf2a3 ("KVM: Inject asynchronous page fault into a PV guest if page is swapped out.")
[Changed indentation and added clearing of injected. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
[bwh: Backported to 3.16: Don't assign to kvm_queued_exception::injected or
x86_exception::async_page_fault] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
gcc-7 points out that a negative port_num value would overflow the
string buffer:
drivers/infiniband/hw/mlx4/sysfs.c: In function 'mlx4_ib_device_register_sysfs':
drivers/infiniband/hw/mlx4/sysfs.c:251:16: error: 'sprintf' may write a terminating nul past the end of the destination [-Werror=format-overflow=]
drivers/infiniband/hw/mlx4/sysfs.c:251:2: note: 'sprintf' output between 2 and 11 bytes into a destination of size 10
drivers/infiniband/hw/mlx4/sysfs.c:303:17: error: 'sprintf' may write a terminating nul past the end of the destination [-Werror=format-overflow=]
drivers/infiniband/hw/mlx4/sysfs.c:303:3: note: 'sprintf' output between 2 and 11 bytes into a destination of size 10
While we should be able to assume that port_num is positive here, making
the buffer one byte longer has no downsides and avoids the warning.
Fixes: c1e7e466120b ("IB/mlx4: Add iov directory in sysfs under the ib device") Link: http://lkml.kernel.org/r/20170714120720.906842-23-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Configure pause time to 0xffff when tx flow control enabled
Set pause time to 0xffff in the pause frame to indicate the
partner to stop sending the packets. When RX buffer frees up,
the device sends pause frame with pause time zero for partner to
resume transmission.
Fixes: 2f7ca802bdae ("Add SMSC LAN9500 USB2.0 10/100 ethernet adapter driver") Signed-off-by: Nisar Sayed <Nisar.Sayed@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
We should only see devices with interrupt endpoints. Ignore any other
endpoints that we find, so we don't send try to send them interrupt URBs
and trigger a WARN down in the USB stack.
The order of endpoints is well defined on official Xbox pads, but
we have found at least one 3rd-party pad that doesn't follow the
standard ("Titanfall 2 Xbox One controller" 0e6f:0165).
Fortunately, we get lucky with this specific pad because it uses
endpoint addresses that differ only by direction. We know that
there are other pads out where this is not true, so let's go
ahead and fix this.
Signed-off-by: Cameron Gutman <aicommander@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
[bwh: Backported to 3.16:
- Use 'fail3' label in case of failure
- Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Xbox One controllers require an initialization message to start sending
data, so xpad_init_output becomes a required function. The Xbox One
controller does not have LEDs like the Xbox 360 controller, so that
functionality is not implemented. The format of messages controlling rumble
is currently undocumented, so rumble support is not yet implemented.
Note that Xbox One controller advertises three interfaces with the same
interface class, subclass and protocol, so we have to also match against
interface number.
IPv6 FIB should use FIB6_TABLE_HASHSZ, not FIB_TABLE_HASHSZ.
Fixes: ba1cc08d9488 ("ipv6: fix memory leak with multiple tables during netns destruction") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
fib6_net_exit only frees the main and local tables. If another table was
created with fib6_alloc_table, we leak it when the netns is destroyed.
Fix this in the same way ip_fib_net_exit cleans up tables, by walking
through the whole hashtable of fib6_table's. We can get rid of the
special cases for local and main, since they're also part of the
hashtable.
Reproducer:
ip netns add x
ip -net x -6 rule add from 6003:1::/64 table 100
ip netns del x
Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: 58f09b78b730 ("[NETNS][IPV6] ip6_fib - make it per network namespace") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
bcache uses a Proportion-Differentiation Controller algorithm to control
writeback rate to cached devices. In the PD controller algorithm, dirty
stripes of thin flash device should not be counted in, because flash only
volumes never write back dirty data.
Currently dirty stripe counter for thin flash device is not initialized
when the thin flash device starts. Which means the following calculation
in PD controller will reference an undefined dirty stripes number, and
all cached devices attached to the same cache set where the thin flash
device lies on may have an inaccurate writeback rate.
This patch calles bch_sectors_dirty_init() in flash_dev_run(), to
correctly initialize dirty stripe counter when the thin flash device
starts to run. This patch also does following parameter data type change,
-void bch_sectors_dirty_init(struct cached_dev *dc);
+void bch_sectors_dirty_init(struct bcache_device *);
to call this function conveniently in flash_dev_run().
(Commit log is composed by Coly Li)
Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Reviewed-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
for_each_active_irq() iterates the sparse irq allocation bitmap. The caller
must hold sparse_irq_lock. Several code pathes expect that an active bit in
the sparse bitmap also has a valid interrupt descriptor.
Unfortunately that's not true. The (de)allocation is a two step process,
which holds the sparse_irq_lock only across the queue/remove from the radix
tree and the set/clear in the allocation bitmap.
If a iteration locks sparse_irq_lock between the two steps, then it might
see an active bit but the corresponding irq descriptor is NULL. If that is
dereferenced unconditionally, then the kernel oopses. Of course, all
iterator sites could be audited and fixed, but....
There is no reason why the sparse_irq_lock needs to be dropped between the
two steps, in fact the code becomes simpler when the mutex is held across
both and the semantics become more straight forward, so future problems of
missing NULL pointer checks in the iteration are avoided and all existing
sites are fixed in one go.
Expand the lock held sections so both operations are covered and the bitmap
and the radixtree are in sync.
Fixes: a05a900a51c7 ("genirq: Make sparse_lock a mutex") Reported-and-tested-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Currently trace_clock timestamps are applied to both regular and max
buffers only for global trace. For instance trace, trace_clock
timestamps are applied only to regular buffer. But, regular and max
buffers can be swapped, for example, following a snapshot. So, for
instance trace, bad timestamps can be seen following a snapshot.
Let's apply trace_clock timestamps to instance max buffer as well.
Most importantly, solve a crash where %llu was used to format signed
numbers. This would cause a buffer overflow when reading sysfs
writeback_rate_debug, as only 20 bytes were allocated for this and
%llu writes 20 characters plus a null.
Always use the units mechanism rather than having different output
paths for simplicity.
Also, correct problems with display output where 1.10 was a larger
number than 1.09, by multiplying by 10 and then dividing by 1024 instead
of dividing by 100. (Remainders of >= 1000 would print as .10).
Minor changes: Always display the decimal point instead of trying to
omit it based on number of digits shown. Decide what units to use
based on 1000 as a threshold, not 1024 (in other words, always print
at most 3 digits before the decimal point).
Signed-off-by: Michael Lyle <mlyle@lyle.org> Reported-by: Dmitry Yu Okunev <dyokunev@ut.mephi.ru> Acked-by: Kent Overstreet <kent.overstreet@gmail.com> Reviewed-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
If you encounter any errors in bch_cached_dev_attach it will return
a negative error code. The variable 'v' which stores the result is
unsigned, thus user space sees a very large value returned for bytes
written which can cause incorrect user space behavior. Utilize 1
signed variable to use throughout the function to preserve error return
capability.
Signed-off-by: Tony Asleson <tasleson@redhat.com> Acked-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
__update_write_rate() uses a Proportion-Differentiation Controller
algorithm to control writeback rate. A dirty target number is used in
this PD controller to control writeback rate. A larger target number
will make the writeback rate smaller, on the versus, a smaller target
number will make the writeback rate larger.
bcache uses the following steps to calculate the target number,
1) cache_sectors = all-buckets-of-cache-set * buckets-size
2) cache_dirty_target = cache_sectors * cached-device-writeback_percent
3) target = cache_dirty_target *
(sectors-of-cached-device/sectors-of-all-cached-devices-of-this-cache-set)
The calculation at step 1) for cache_sectors is incorrect, which does
not consider dirty blocks occupied by flash only volume.
A flash only volume can be took as a bcache device without cached
device. All data sectors allocated for it are persistent on cache device
and marked dirty, they are not touched by bcache writeback and garbage
collection code. So data blocks of flash only volume should be ignore
when calculating cache_sectors of cache set.
Current code does not subtract dirty sectors of flash only volume, which
results a larger target number from the above 3 steps. And in sequence
the cache device's writeback rate is smaller then a correct value,
writeback speed is slower on all cached devices.
This patch fixes the incorrect slower writeback rate by subtracting
dirty sectors of flash only volumes in __update_writeback_rate().
(Commit log composed by Coly Li to pass checkpatch.pl checking)
Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Reviewed-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Sequential write IOs were tested with bs=1M by FIO in writeback cache
mode, these IOs were expected to be bypassed, but actually they did not.
We debug the code, and find in check_should_bypass():
if (!congested &&
mode == CACHE_MODE_WRITEBACK &&
op_is_write(bio_op(bio)) &&
(bio->bi_opf & REQ_SYNC))
goto rescale
that means, If in writeback mode, a write IO with REQ_SYNC flag will not
be bypassed though it is a sequential large IO, It's not a correct thing
to do actually, so this patch remove these codes.
Signed-off-by: tang.junhui <tang.junhui@zte.com.cn> Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com> Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
[bwh: Backported to 3.16: deleted code is slightly different] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
If blkdev_get_by_path() in register_bcache() fails, we try to lookup the
block device using lookup_bdev() to detect which situation we are in to
properly report error. However we never drop the reference returned to
us from lookup_bdev(). Fix that.
Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The stack unwinding code uses the mips_instuction union to decode the
instructions it finds. That union uses the __BITFIELD_FIELD macro to
reorder depending on endianness. The stack unwinding code always places
16bit instructions in halfword 1 of the union. This makes the union
accesses correct for little endian systems. Similarly, 32bit
instructions are reordered such that they are correct for little endian
systems. This handling leaves unwinding the stack on big endian systems
broken, as the mips_instruction union will then look for the fields in
the wrong halfword.
To fix this, use a logical shift to place the 16bit instruction into the
correct position in the word field of the union. Use the same shifting
to order the 2 halfwords of 32bit instuctions. Then replace accesses to
the halfword with accesses to the shifted word.
In the case of the ADDIUS5 instruction, switch to using the
mm16_r5_format union member to avoid the need for a 16bit shift.
Fixes: 34c2f668d0f6 ("MIPS: microMIPS: Add unaligned access support.") Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com> Reviewed-by: James Hogan <james.hogan@imgtec.com> Cc: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul Burton <paul.burton@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16956/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When the immediate encoded in the instruction is accessed, it is sign
extended due to being a signed value being assigned to a signed integer.
The ISA specifies that this operation is an unsigned operation.
The sign extension leads us to incorrectly decode:
Since the instruction format does not specify signed/unsigned, and this
is currently the only location to use this instuction format, change it
to an unsigned immediate.
Fixes: bb9bc4689b9c ("MIPS: Calculate microMIPS ra properly when unwinding the stack") Suggested-by: Paul Burton <paul.burton@imgtec.com> Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com> Reviewed-by: James Hogan <james.hogan@imgtec.com> Cc: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Cc: Miodrag Dinic <miodrag.dinic@imgtec.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: David Daney <david.daney@cavium.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16957/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Commit 34c2f668d0f6 ("MIPS: microMIPS: Add unaligned access support.")
added handling of microMIPS instructions to manipulate the stack
pointer. Unfortunately the decoding of the addiusp instruction was
incorrect, and performed a left shift by 2 bits to the raw immediate,
rather than decoding the immediate and then performing the shift, as
documented in the ISA.
This led to incomplete stack traces, due to incorrect frame sizes being
calculated. For example the instruction: 801faee0 <do_sys_poll>: 801faee0: 4e25 addiu sp,sp,-952
As decoded by objdump, would be interpreted by the existing code as
having manipulated the stack pointer by +1096.
Fix this by changing the order of decoding the immediate and applying
the left shift. Also change to accessing the instuction through the
union to avoid the endianness problem of accesing halfword[0], which
will fail on big endian systems.
Cope with the special behaviour of immediates 0x0, 0x1, 0x1fe and 0x1ff
by XORing with 0x100 again if mod(immediate) < 4. This logic was tested
with the following test code:
int main(int argc, char **argv)
{
unsigned int enc;
int imm;
The addiusp instruction uses the pool16d opcode, with bit 0 of the
immediate set. The test for the addiusp opcode erroneously did a logical
and of the immediate with mm_addiusp_func, which has value 1, so this
test always passes when the immediate is non-zero.
Fix the test by replacing the logical and with a bitwise and.
Fixes: 34c2f668d0f6 ("MIPS: microMIPS: Add unaligned access support.") Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com> Cc: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul Burton <paul.burton@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16954/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Commit 34c2f668d0f6b ("MIPS: microMIPS: Add unaligned access support.")
added fairly broken support for handling 16bit microMIPS instructions in
get_frame_info(). It adjusts the instruction pointer by 16bits in the
case of a 16bit sp move instruction, but not any other 16bit
instruction.
Commit b6c7a324df37 ("MIPS: Fix get_frame_info() handling of microMIPS
function size") goes some way to fixing get_frame_info() to iterate over
microMIPS instuctions, but the instruction pointer is still manipulated
using a postincrement, and is of union mips_instruction type. Since the
union is sized to the largest member (a word), but microMIPS
instructions are a mix of halfword and word sizes, the function does not
always iterate correctly, ending up misaligned with the instruction
stream and interpreting it incorrectly.
Since the instruction modifying the stack pointer is usually the first
in the function, that one is usually handled correctly. But the
instruction which saves the return address to the sp is some variable
number of instructions into the frame and is frequently missed due to
not being on a word boundary, leading to incomplete walking of the
stack.
Fix this by incrementing the instruction pointer based on the size of
the previously decoded instruction (& remove the hack introduced by
commit 34c2f668d0f6b ("MIPS: microMIPS: Add unaligned access support.")
which adjusts the instruction pointer in the case of a 16bit sp move
instruction, but not any other).
Fixes: 34c2f668d0f6b ("MIPS: microMIPS: Add unaligned access support.") Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com> Cc: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul Burton <paul.burton@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16953/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Make the behaviour of clk_get_rate consistent with common clk's
clk_get_rate by accepting NULL clocks as parameter. Some device
drivers rely on this, and will cause an OOPS otherwise.
Make the behaviour of clk_get_rate consistent with common clk's
clk_get_rate by accepting NULL clocks as parameter, as some device
drivers rely on this.
Make the behaviour of clk_get_rate consistent with common clk's
clk_get_rate by accepting NULL clocks as parameter. Some device
drivers rely on this, and will cause an OOPS otherwise.
Fixes: f8ede0f700f5 ("MIPS: Loongson 2F: Add CPU frequency scaling support") Reported-by: Mathias Kresin <dev@kresin.me> Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16777/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
[bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Make the behaviour of clk_get_rate consistent with common clk's
clk_get_rate by accepting NULL clocks as parameter. Some device
drivers rely on this, and will cause an OOPS otherwise.
Fixes: e7300d04bd08 ("MIPS: BCM63xx: Add support for the Broadcom BCM63xx family of SOCs.") Reported-by: Mathias Kresin <dev@kresin.me> Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Cc: bcm-kernel-feedback-list@broadcom.com Cc: James Hogan <james.hogan@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16776/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Make the behaviour of clk_get_rate consistent with common clk's
clk_get_rate by accepting NULL clocks as parameter. Some device
drivers rely on this, and will cause an OOPS otherwise.
Fixes: 780019ddf02f ("MIPS: AR7: Implement clock API") Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Reported-by: Mathias Kresin <dev@kresin.me> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16775/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The order in __tlb_flush_mm_lazy is to flush TLB first and then clear
the mm->context.flush_mm bit. This can lead to missed flushes as the
bit can be set anytime, the order needs to be the other way aronud.
But this leads to a different race, __tlb_flush_mm_lazy may be called
on two CPUs concurrently. If mm->context.flush_mm is cleared first then
another CPU can bypass __tlb_flush_mm_lazy although the first CPU has
not done the flush yet. In a virtualized environment the time until the
flush is finally completed can be arbitrarily long.
Add a spinlock to serialize __tlb_flush_mm_lazy and use the function
in finish_arch_post_lock_switch as well.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When HW ROC is supported it is possible that after the HW notified
that the ROC has started, the ROC was cancelled and another ROC was
added while the hw_roc_start worker is waiting on the mutex (since
cancelling the ROC and adding another one also holds the same mutex).
As a result, the hw_roc_start worker will continue to run after the
new ROC is added but before it is actually started by the HW.
This may result in notifying userspace that the ROC has started before
it actually does, or in case of management tx ROC, in an attempt to
tx while not on the right channel.
In addition, when the driver will notify mac80211 that the second ROC
has started, mac80211 will warn that this ROC has already been
notified.
Fix this by flushing the hw_roc_start work before cancelling an ROC.
Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
In struct ieee80211_tx_info, control.vif pointer and rate_driver_data[0]
falls on the same place, depending on the union usage.
During the whole TX process, the union is referred to as a control struct,
which holds the vif that is later used in the tx flow, especially in order
to derive the used tx power.
Referring direcly to rate_driver_data[0] and assigning a value to it,
overwrites the vif pointer, hence making all later references irrelevant.
Moreover, rate_driver_data[0] isn't used later in the flow in order to
retrieve the channel that it is pointing to.
Signed-off-by: Beni Lev <beni.lev@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
gcc-8 notices that the register number calculation is wrong
when the offset is an 'u8' but the number is larger than 256:
drivers/mfd/omap-usb-tll.c: In function 'omap_tll_init':
drivers/mfd/omap-usb-tll.c:90:46: error: overflow in conversion from 'int' to 'u8 {aka unsigned char}' chages value from 'i * 256 + 2070' to '22' [-Werror=overflow]
This addresses it by always using a 32-bit offset number for
the register. This is apparently an old problem that previous
compilers did not find.
Fixes: 16fa3dc75c22 ("mfd: omap-usb-tll: HOST TLL platform driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
if 'max8998_i2c_parse_dt_pdata() fails (when out of memory), a NULL
pointer dereference will occur in the error handling code.
Return directly instead.
Fixes: ee999fb3f17f("mfd: max8998: Add support for Device Tree") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Using l2tp_tunnel_find() in pppol2tp_session_create() and
l2tp_eth_create() is racy, because no reference is held on the
returned session. These functions are only used to implement the
->session_create callback which is run by l2tp_nl_cmd_session_create().
Therefore searching for the parent tunnel isn't necessary because
l2tp_nl_cmd_session_create() already has a pointer to it and holds a
reference.
This patch modifies ->session_create()'s prototype to directly pass the
the parent tunnel as parameter, thus avoiding searching for it in
pppol2tp_session_create() and l2tp_eth_create().
Since we have to touch the ->session_create() call in
l2tp_nl_cmd_session_create(), let's also remove the useless conditional:
we know that ->session_create isn't NULL at this point because it's
already been checked earlier in this same function.
Finally, one might be tempted to think that the removed
l2tp_tunnel_find() calls were harmless because they would return the
same tunnel as the one held by l2tp_nl_cmd_session_create() anyway.
But that tunnel might be removed and a new one created with same tunnel
Id before the l2tp_tunnel_find() call. In this case l2tp_tunnel_find()
would return the new tunnel which wouldn't be protected by the
reference held by l2tp_nl_cmd_session_create().
Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP") Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
l2tp_tunnel_destruct() sets tunnel->sock to NULL, then removes the
tunnel from the pernet list and finally closes all its sessions.
Therefore, it's possible to add a session to a tunnel that is still
reachable, but for which tunnel->sock has already been reset. This can
make l2tp_session_create() dereference a NULL pointer when calling
sock_hold(tunnel->sock).
This patch adds the .acpt_newsess field to struct l2tp_tunnel, which is
used by l2tp_tunnel_closeall() to prevent addition of new sessions to
tunnels. Resetting tunnel->sock is done after l2tp_tunnel_closeall()
returned, so that l2tp_session_add_to_tunnel() can safely take a
reference on it when .acpt_newsess is true.
The .acpt_newsess field is modified in l2tp_tunnel_closeall(), rather
than in l2tp_tunnel_destruct(), so that it benefits all tunnel removal
mechanisms. E.g. on UDP tunnels, a session could be added to a tunnel
after l2tp_udp_encap_destroy() proceeded. This would prevent the tunnel
from being removed because of the references held by this new session
on the tunnel and its socket. Even though the session could be removed
manually later on, this defeats the purpose of
commit 9980d001cec8 ("l2tp: add udp encap socket destroy handler").
Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
There is a bug in fragmentation codes use of the percpu_counter API,
that can cause issues on systems with many CPUs.
The frag_mem_limit() just reads the global counter (fbc->count),
without considering other CPUs can have upto batch size (130K) that
haven't been subtracted yet. Due to the 3MBytes lower thresh limit,
this become dangerous at >=24 CPUs (3*1024*1024/130000=24).
The correct API usage would be to use __percpu_counter_compare() which
does the right thing, and takes into account the number of (online)
CPUs and batch size, to account for this and call __percpu_counter_sum()
when needed.
We choose to revert the use of the lib/percpu_counter API for frag
memory accounting for several reasons:
1) On systems with CPUs > 24, the heavier fully locked
__percpu_counter_sum() is always invoked, which will be more
expensive than the atomic_t that is reverted to.
Given systems with more than 24 CPUs are becoming common this doesn't
seem like a good option. To mitigate this, the batch size could be
decreased and thresh be increased.
2) The add_frag_mem_limit+sub_frag_mem_limit pairs happen on the RX
CPU, before SKBs are pushed into sockets on remote CPUs. Given
NICs can only hash on L2 part of the IP-header, the NIC-RXq's will
likely be limited. Thus, a fair chance that atomic add+dec happen
on the same CPU.
Revert note that commit 1d6119baf061 ("net: fix percpu memory leaks")
removed init_frag_mem_limit() and instead use inet_frags_init_net().
After this revert, inet_frags_uninit_net() becomes empty.
Fixes: 6d7b857d541e ("net: use lib/percpu_counter API for fragmentation mem accounting") Fixes: 1d6119baf061 ("net: fix percpu memory leaks") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When calling into _xfs_log_force{,_lsn}() with a pointer
to log_flushed variable, log_flushed will be set to 1 if:
1. xlog_sync() is called to flush the active log buffer
AND/OR
2. xlog_wait() is called to wait on a syncing log buffers
xfs_file_fsync() checks the value of log_flushed after
_xfs_log_force_lsn() call to optimize away an explicit
PREFLUSH request to the data block device after writing
out all the file's pages to disk.
This optimization is incorrect in the following sequence of events:
The write X is not guarantied to be on persistent storage
when PREFLUSH request in completed, because write A was submitted
after the PREFLUSH request, but xfs_file_fsync() of task A will
be notified of log_flushed=1 and will skip explicit flush.
If the system crashes after fsync of task A, write X may not be
present on disk after reboot.
This bug was discovered and demonstrated using Josef Bacik's
dm-log-writes target, which can be used to record block io operations
and then replay a subset of these operations onto the target device.
The test goes something like this:
- Use fsx to execute ops of a file and record ops on log device
- Every now and then fsync the file, store md5 of file and mark
the location in the log
- Then replay log onto device for each mark, mount fs and compare
md5 of file to stored value
Cc: Christoph Hellwig <hch@lst.de> Cc: Josef Bacik <jbacik@fb.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
I recently came upon a scenario where I would get a double fault
machine check exception tiriggered by a kernel module.
However the ensuing crash stacktrace (ksym lookup) was not working
correctly.
Turns out that machine check auto-disables MMU while modules are allocated
in kernel vaddr spapce.
This patch re-enables the MMU before start printing the stacktrace
making stacktracing of modules work upon a fatal exception.
Signed-off-by: Jose Abreu <joabreu@synopsys.com> Reviewed-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: moved code into low level handler to avoid in 2 places] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
In the second iteration of trace_selftest_ops(), the error goto label is
wrong in the case where trace_selftest_test_global_cnt is off. In the
case of error, it leaks the dynamic ops that was allocated.
Fixes: 95950c2e ("ftrace: Add self-tests for multiple function trace users") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
bitmap_resize() does not work for file-backed bitmaps.
The buffer_heads are allocated and initialized when
the bitmap is read from the file, but resize doesn't
read from the file, it loads from the internal bitmap.
When it comes time to write the new bitmap, the bh is
non-existent and we crash.
The common case when growing an array involves making the array larger,
and that normally means making the bitmap larger. Doing
that inside the kernel is possible, but would need more code.
It is probably easier to require people who use file-backed
bitmaps to remove them and re-add after a reshape.
So this patch disables the resizing of arrays which have
file-backed bitmaps. This is better than crashing.
Reported-by: Zhilong Liu <zlliu@suse.com> Fixes: d60b479d177a ("md/bitmap: add bitmap_resize function to allow bitmap resizing.") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When booting Linux as Xen guest with CONFIG_DEBUG_ATOMIC, the following
splat appears:
[ 0.002323] Mountpoint-cache hash table entries: 1024 (order: 1, 8192 bytes)
[ 0.019717] ASID allocator initialised with 65536 entries
[ 0.020019] xen:grant_table: Grant tables using version 1 layout
[ 0.020051] Grant table initialized
[ 0.020069] BUG: sleeping function called from invalid context at /data/src/linux/mm/page_alloc.c:4046
[ 0.020100] in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: swapper/0
[ 0.020123] no locks held by swapper/0/1.
[ 0.020143] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc5 #598
[ 0.020166] Hardware name: FVP Base (DT)
[ 0.020182] Call trace:
[ 0.020199] [<ffff00000808a5c0>] dump_backtrace+0x0/0x270
[ 0.020222] [<ffff00000808a95c>] show_stack+0x24/0x30
[ 0.020244] [<ffff000008c1ef20>] dump_stack+0xb8/0xf0
[ 0.020267] [<ffff0000081128c0>] ___might_sleep+0x1c8/0x1f8
[ 0.020291] [<ffff000008112948>] __might_sleep+0x58/0x90
[ 0.020313] [<ffff0000082171b8>] __alloc_pages_nodemask+0x1c0/0x12e8
[ 0.020338] [<ffff00000827a110>] alloc_page_interleave+0x38/0x88
[ 0.020363] [<ffff00000827a904>] alloc_pages_current+0xdc/0xf0
[ 0.020387] [<ffff000008211f38>] __get_free_pages+0x28/0x50
[ 0.020411] [<ffff0000086566a4>] evtchn_fifo_alloc_control_block+0x2c/0xa0
[ 0.020437] [<ffff0000091747b0>] xen_evtchn_fifo_init+0x38/0xb4
[ 0.020461] [<ffff0000091746c0>] xen_init_IRQ+0x44/0xc8
[ 0.020484] [<ffff000009128adc>] xen_guest_init+0x250/0x300
[ 0.020507] [<ffff000008083974>] do_one_initcall+0x44/0x130
[ 0.020531] [<ffff000009120df8>] kernel_init_freeable+0x120/0x288
[ 0.020556] [<ffff000008c31ca8>] kernel_init+0x18/0x110
[ 0.020578] [<ffff000008083710>] ret_from_fork+0x10/0x40
[ 0.020606] xen:events: Using FIFO-based ABI
[ 0.020658] Xen: initializing cpu0
[ 0.027727] Hierarchical SRCU implementation.
[ 0.036235] EFI services will not be available.
[ 0.043810] smp: Bringing up secondary CPUs ...
This is because get_cpu() in xen_evtchn_fifo_init() will disable
preemption, but __get_free_page() might sleep (GFP_ATOMIC is not set).
xen_evtchn_fifo_init() will always be called before SMP is initialized,
so {get,put}_cpu() could be replaced by a simple smp_processor_id().
This also avoid to modify evtchn_fifo_alloc_control_block that will be
called in other context.
Signed-off-by: Julien Grall <julien.grall@arm.com> Reported-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Fixes: 1fe565517b57 ("xen/events: use the FIFO-based ABI if available") Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The instruction code for xxlor that commit 0016a4cf5582 ("powerpc:
Emulate most Book I instructions in emulate_step()", 2010-06-15)
added is actually the code for xxlnor. It is used in get_vsr()
and put_vsr() and the effect of the error is that if emulate_step
is used to emulate a VSX load or store from any register other
than vsr0, the bitwise complement of the correct value will be
loaded or stored. This corrects the error.
Fixes: 0016a4cf5582 ("powerpc: Emulate most Book I instructions in emulate_step()") Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Anton noticed that if we fault part way through emulating an unaligned
instruction, we don't update the DAR to reflect that.
The DAR value is eventually reported back to userspace as the address
in the SEGV signal, and if userspace is using that value to demand
fault then it can be confused by us not setting the value correctly.
This patch is ugly as hell, but is intended to be the minimal fix and
back ports easily.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
My static checker complains that 0x00001800 >> 13 is zero. Looking at
the context, it seems like a copy and paste bug from the line below
and probably 0x3 << 13 or 0x00006000 was intended.
Fixes: 2af59f7d5c3e ("[POWERPC] 4xx: Add 405GPr and 405EP support in boot wrapper") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The value of "size" comes from the user. When we add "start + size" it
could lead to an integer overflow bug.
It means we vmalloc() a lot more memory than we had intended. I believe
that on 64 bit systems vmalloc() can succeed even if we ask it to
allocate huge 4GB buffers. So we would get memory corruption and likely
a crash when we call ha->isp_ops->write_optrom() and ->read_optrom().
Only root can trigger this bug.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=194061 Fixes: b7cc176c9eb3 ("[SCSI] qla2xxx: Allow region-based flash-part accesses.") Reported-by: shqking <shqking@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
If "regl_pdata->n_regulators == 0" is true then we accidentally return
PTR_ERR(<some_valid_pointer>) instead of an error code. I've changed it
to return -ENODEV instead.
Fixes: 69ca3e58d178 ("regulator: da9063: Add Dialog DA9063 voltage regulators support.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Use bcast station for all non bufferable frames on AP and AD-HOC.
The host is no longer aware of STAs PS status because of buffer
station offload, so we can't rely on mac80211 to toggle on
IEEE80211_TX_CTL_NO_PS_BUFFER bit.
A possible issue with buffering such frames, beside the obvious spec
violation, is when a station disconnects while in PS but the AP isn't
aware of that. In such scenarios the AP won't be able to send probe
responses or auth frames so the STA won't be able to reconnect and
the AP will have a queue hang.
Fixes: 3e56eadfb6a1 ("iwlwifi: mvm: implement AP/GO uAPSD support") Signed-off-by: David Spinadel <david.spinadel@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This fixes a potential race condition observed on Power systems.
Several places throughout the aacraid driver call aac_fib_send or
similar to send a command to the aacraid adapter, then check the return
code to determine if the command was actually sent to the adapter, then
update the phase field in the scsi command scratch pad area to track
that the firmware now owns this command. However, there is nothing that
ensures that by the time the aac_fib_send function returns and we go to
write to the scsi command, that the command hasn't already completed and
the scsi command has been freed. This was causing random crashes in the
TCP stack which was tracked down to be caused by memory that had been a
struct request + scsi_cmnd being now used for an skbuff. Memory
poisoning was enabled in the kernel to debug this which showed that the
last owner of the memory that had been freed was aacraid and that it was
a struct request. The memory that was corrupted was the exact data
pattern of AAC_OWNER_FIRMWARE and it was at the same offset that aacraid
writes, which is scsicmd->SCp.phase. The patch below resolves this
issue.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Tested-by: Wen Xiong <wenxiong@linux.vnet.ibm.com> Reviewed-by: Dave Carroll <david.carroll@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[bwh: Backported to 3.16:
- Drop changes to aac_send_hba_fib()
- Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The "lg" variable is declared as int so in all places where this variable
is used as a shift operand, the output will be int too.
This produces the following smatch warning:
drivers/net/ethernet/mellanox/mlx4/fw.c:1532 mlx4_map_cmd() warn:
should '1 << lg' be a 64 bit type?
Simple declaration of "1" to be "1ULL" will fix the issue.
Fixes: 225c7b1feef1 ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When there's a fatal signal pending, arm's do_page_fault()
implementation returns 0. The intent is that we'll return to the
faulting userspace instruction, delivering the signal on the way.
However, if we take a fatal signal during fixing up a uaccess, this
results in a return to the faulting kernel instruction, which will be
instantly retried, resulting in the same fault being taken forever. As
the task never reaches userspace, the signal is not delivered, and the
task is left unkillable. While the task is stuck in this state, it can
inhibit the forward progress of the system.
To avoid this, we must ensure that when a fatal signal is pending, we
apply any necessary fixup for a faulting kernel instruction. Thus we
will return to an error path, and it is up to that code to make forward
progress towards delivering the fatal signal.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Section 9.7.7.2.5 of the 1.3 IBTA spec clearly says that receive
credits should never apply to RDMA write.
qib and hfi1 were doing that. The following situation will result
in a QP hang:
- A prior SEND or RDMA_WRITE with immmediate consumed the last
credit for a QP using RC receive buffer credits
- The prior op is acked so there are no more acks
- The peer ULP fails to post receive for some reason
- An RDMA write sees that the credits are exhausted and waits
- The peer ULP posts receive buffers
- The ULP posts a send or RDMA write that will be hung
The fix is to avoid the credit test for the RDMA write operation.
Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
[bwh: Backported to 3.16:
- Drop changes to hfi1
- Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
ACPI defines a number of instructions to use for triggering errors. However
we are currently removing the address resources from the trigger resources
for only the WRITE_REGISTER_VALUE instruction. This leads to a resource
conflict for any other valid instruction.
Check that the instruction is less than or equal to the
WRITE_REGISTER_VALUE instruction. This allows all valid memory access
instructions and protects against invalid instructions.
Fixes: b4e008dc53a3 (ACPI, APEI, EINJ, Refine the fix of resource conflict) Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Acked-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The following commit cause a regression on ATI chipsets.
'commit e788787ef4f9 ("usb:xhci:Add quirk for Certain
failing HP keyboard on reset after resume")'
This causes pinfo->smbus_dev to be wrongly set to NULL on
systems with the ATI chipset that this function checks for first.
Added conditional check for AMD chipsets to avoid the overwriting
pinfo->smbus_dev.
Reported-by: Ben Hutchings <ben@decadent.org.uk> Fixes: e788787ef4f9 ("usb:xhci:Add quirk for Certain
failing HP keyboard on reset after resume")
cc: Nehal Shah <Nehal-bakulchandra.Shah@amd.com> Signed-off-by: Sandeep Singh <Sandeep.Singh@amd.com> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Commit e0429362ab15
("usb: Add device quirk for Logitech HD Pro Webcams C920 and C930e")
introduced quirk to workaround an issue with some Logitech webcams.
Apparently model C920-C has the same issue so applying
the same quirk as well.
See aforementioned commit message for detailed explanation of the problem.
Corsair Strafe RGB keyboard has trouble to initialize:
[ 1.679455] usb 3-6: new full-speed USB device number 4 using xhci_hcd
[ 6.871136] usb 3-6: unable to read config index 0 descriptor/all
[ 6.871138] usb 3-6: can't read configurations, error -110
[ 6.991019] usb 3-6: new full-speed USB device number 5 using xhci_hcd
[ 12.246642] usb 3-6: unable to read config index 0 descriptor/all
[ 12.246644] usb 3-6: can't read configurations, error -110
[ 12.366555] usb 3-6: new full-speed USB device number 6 using xhci_hcd
[ 17.622145] usb 3-6: unable to read config index 0 descriptor/all
[ 17.622147] usb 3-6: can't read configurations, error -110
[ 17.742093] usb 3-6: new full-speed USB device number 7 using xhci_hcd
[ 22.997715] usb 3-6: unable to read config index 0 descriptor/all
[ 22.997716] usb 3-6: can't read configurations, error -110
Although it may work after several times unpluging/pluging:
[ 68.195240] usb 3-6: new full-speed USB device number 11 using xhci_hcd
[ 68.337459] usb 3-6: New USB device found, idVendor=1b1c, idProduct=1b20
[ 68.337463] usb 3-6: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 68.337466] usb 3-6: Product: Corsair STRAFE RGB Gaming Keyboard
[ 68.337468] usb 3-6: Manufacturer: Corsair
[ 68.337470] usb 3-6: SerialNumber: 0F013021AEB8046755A93ED3F5001941
Tried three quirks: USB_QUIRK_DELAY_INIT, USB_QUIRK_NO_LPM and
USB_QUIRK_DEVICE_QUALIFIER, user confirmed that USB_QUIRK_DELAY_INIT alone
can workaround this issue. Hence add the quirk for Corsair Strafe RGB.
While running reboot tests w/ a specific set of USB devices (and
slub_debug enabled), I found that once every few hours my device would
be crashed with a stack that looked like this:
Investigation using kgdb() found that the wait queue that was passed
into wake_up() had been freed (it was filled with slub_debug poison).
I analyzed and instrumented the code and reproduced. My current
belief is that this is happening:
1. async_completed() is called (from IRQ). Moves "as" onto the
completed list.
2. On another CPU, proc_reapurbnonblock_compat() calls
async_getcompleted(). Blocks on spinlock.
3. async_completed() releases the lock; keeps running; gets blocked
midway through wake_up().
4. proc_reapurbnonblock_compat() => async_getcompleted() gets the
lock; removes "as" from completed list and frees it.
5. usbdev_release() is called. Frees "ps".
6. async_completed() finally continues running wake_up(). ...but
wake_up() has a pointer to the freed "ps".
The instrumentation that led me to believe this was based on adding
some trace_printk() calls in a select few functions and then using
kdb's "ftdump" at crash time. The trace follows (NOTE: in the trace
below I cheated a little bit and added a udelay(1000) in
async_completed() after releasing the spinlock because I wanted it to
trigger quicker):
<...>-2104 0d.h2 13759034us!: async_completed at start: as=ffffffc0cc638200
mtpd-2055 3.... 13759356us : async_getcompleted before spin_lock_irqsave
mtpd-2055 3d..1 13759362us : async_getcompleted after list_del_init: as=ffffffc0cc638200
mtpd-2055 3.... 13759371us+: proc_reapurbnonblock_compat: free_async(ffffffc0cc638200)
mtpd-2055 3.... 13759422us+: async_getcompleted before spin_lock_irqsave
mtpd-2055 3.... 13759479us : usbdev_release at start: ps=ffffffc0cc042080
mtpd-2055 3.... 13759487us : async_getcompleted before spin_lock_irqsave
mtpd-2055 3.... 13759497us!: usbdev_release after kfree(ps): ps=ffffffc0cc042080
<...>-2104 0d.h2 13760294us : async_completed before wake_up(): as=ffffffc0cc638200
To fix this problem we can just move the wake_up() under the ps->lock.
There should be no issues there that I'm aware of.
Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The copy_from_user() function returns the number of bytes which we
weren't able to copy. We don't want to return that to the user but
instead we want to return -EFAULT.
Fixes: d7e09d0397e8 ("staging: add Lustre file system client support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Make the behaviour of clk_get_rate consistent with common clk's
clk_get_rate by accepting NULL clocks as parameter. Some device
drivers rely on this, and will cause an OOPS otherwise.
The calculation of the left volume looks suspect, the value of
0x1f - ((val << 8) & 0x1f) is always 0x1f. The debug prior to the
assignment of value[1] prints the left volume setting using the
calculation 0x1f - (val >> 8) & 0x1f which looks correct to me.
Fix the left volume by using the correct expression as used in
the debug.
Detected by CoverityScan, CID#146140 ("Wrong operator used")
Fixes: 850d24a5a861 ("[media] em28xx-alsa: add mixer support for AC97 volume controls") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Hans Verkuil <hansverk@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Sparse tool complains with the following error:
drivers/infiniband/hw/usnic/usnic_ib_main.c:445:16: warning: cast removes
address space of expression
Fix it by doing casting on correct field and convert function helper which
sets ifaddr to be void, because there are no users who are interested in
returned value.
Fixes: c7845bcafe4d ("IB/usnic: Add UDP support in u*verbs.c, u*main.c and u*util.h") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
In the error path of sa1100_rtc_open(), info->clk is disabled which will
happen again in sa1100_rtc_remove() when the module is removed whereas it
is only enabled once in sa1100_rtc_init().
Fixes: 0cc0c38e9139 ("drivers/rtc/rtc-sa1100.c: move clock enable/disable to probe/remove") Acked-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The mask of sns_key_info1 suggests the upper nybble is being extracted
however the following shift of 8 bits is too large and always results in
0. Fix this by shifting only by 4 bits to correctly get the upper nybble.
Detected by CoverityScan, CID#142891 ("Operands don't affect result")
Fixes: fa590c222fba ("staging: rts5208: add support for rts5208 and rts5288") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This driver cannot send pulse, it only accepts driver-dependent codes.
Signed-off-by: Sean Young <sean@mess.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The size of uvc_control_mapping is user controlled leading to a
potential heap overflow in the uvc driver. This adds a check to verify
the user provided size fits within the bounds of the defined buffer
size.
Originally-from: Richard Simmons <rssimmo@amazon.com>
If kobject_init_and_add failed, then the failure path would
decrement the reference count of the queue kobject whose reference
count was already zero.
Fixes: 114cf5802165 ("bql: Byte queue limits") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Ensure that the members of struct skd_msg_buf have been transferred
to the PCIe adapter before the doorbell is triggered. This patch
avoids that I/O fails sporadically and that the following error
message is reported:
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Since put_disk() triggers a disk_release() call and since that
last function calls blk_put_queue() if disk->queue != NULL, clear
the disk->queue pointer before calling put_disk(). This avoids
that unloading the skd kernel module triggers the following
use-after-free:
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Calling blk_start_queue() from interrupt context with the queue
lock held and without disabling IRQs, as the skd driver does, is
safe. This patch avoids that loading the skd driver triggers the
following warning:
Fixes: commit a038e2536472 ("[PATCH] blk_start_queue() must be called with irq disabled - add warning") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Andrew Morton <akpm@osdl.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When fail to get needed page for pool, need to put allocated pages
into pool. But current code has a miscalculation of allocated pages,
correct it.
Signed-off-by: Xiangliang.Yu <Xiangliang.Yu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Monk Liu <monk.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Andi reported problems when parse errors were detected with vendor
events (json), because in the yyparse/parse_events_parse function we
dereferenced the _data parameter to two different structs, with
different layouts, which ended up making parse_events_evlist->error to
point to random stack addresses.
Fix it by making _data to always be struct parse_events_state, changing
the only place where 'struct parse_events_term' was used in
parse_events.y.
Reported-by: Andi Kleen <ak@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-bc27lshz823hxl8n9nkelcgh@git.kernel.org Fixes: 90e2b22dee90 ("perf/tool: Add support to reuse event grammar to parse out terms") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Rename it from 'parse_events_evlist' to 'parse_events_state' to better
state that this is parsing state that has to be passed around.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-dursqtg2h2w98ztaa297u43x@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
[bwh: Backported to 3.16: change all uses of the name in this version] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Several distributions mount the "proper root" as ro during initrd and
then remount it as rw before pivot_root(2). Thus, if a rescan had been
aborted by a previous shutdown, the rescan would never be resumed.
This issue would manifest itself as several btrfs ioctl(2)s causing the
entire machine to hang when btrfs_qgroup_wait_for_completion was hit
(due to the fs_info->qgroup_rescan_running flag being set but the rescan
itself not being resumed). Notably, Docker's btrfs storage driver makes
regular use of BTRFS_QUOTA_CTL_DISABLE and BTRFS_IOC_QUOTA_RESCAN_WAIT
(causing this problem to be manifested on boot for some machines).
Cc: Jeff Mahoney <jeffm@suse.com> Fixes: b382a324b60f ("Btrfs: fix qgroup rescan resume on mount") Signed-off-by: Aleksa Sarai <asarai@suse.de> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Tested-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
[bwh: Backported to 3.16: add #include "qgroup.h"] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>