The OTP in some SKUs have erroneously allowed 40MHz and 80MHz channels
in the 5.2GHz band. The firmware has been modified to not allow this
in those SKUs, so the driver needs to do the same otherwise the
firmware will assert when we try to use it.
Cpusets vs. suspend-resume is _completely_ broken. And it got noticed
because it now resulted in non-cpuset usage breaking too.
On suspend cpuset_cpu_inactive() doesn't call into
cpuset_update_active_cpus() because it doesn't want to move tasks about,
there is no need, all tasks are frozen and won't run again until after
we've resumed everything.
But this means that when we finally do call into
cpuset_update_active_cpus() after resuming the last frozen cpu in
cpuset_cpu_active(), the top_cpuset will not have any difference with
the cpu_active_mask and this it will not in fact do _anything_.
So the cpuset configuration will not be restored. This was largely
hidden because we would unconditionally create identity domains and
mobile users would not in fact use cpusets much. And servers what do use
cpusets tend to not suspend-resume much.
An addition problem is that we'd not in fact wait for the cpuset work to
finish before resuming the tasks, allowing spurious migrations outside
of the specified domains.
Fix the rebuild by introducing cpuset_force_rebuild() and fix the
ordering with cpuset_wait_for_hotplug().
Reported-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: deb7aa308ea2 ("cpuset: reorganize CPU / memory hotplug handling") Link: http://lkml.kernel.org/r/20170907091338.orwxrqkbfkki3c24@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Most importantly, solve a crash where %llu was used to format signed
numbers. This would cause a buffer overflow when reading sysfs
writeback_rate_debug, as only 20 bytes were allocated for this and
%llu writes 20 characters plus a null.
Always use the units mechanism rather than having different output
paths for simplicity.
Also, correct problems with display output where 1.10 was a larger
number than 1.09, by multiplying by 10 and then dividing by 1024 instead
of dividing by 100. (Remainders of >= 1000 would print as .10).
Minor changes: Always display the decimal point instead of trying to
omit it based on number of digits shown. Decide what units to use
based on 1000 as a threshold, not 1024 (in other words, always print
at most 3 digits before the decimal point).
Signed-off-by: Michael Lyle <mlyle@lyle.org> Reported-by: Dmitry Yu Okunev <dyokunev@ut.mephi.ru> Acked-by: Kent Overstreet <kent.overstreet@gmail.com> Reviewed-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sequential write IOs were tested with bs=1M by FIO in writeback cache
mode, these IOs were expected to be bypassed, but actually they did not.
We debug the code, and find in check_should_bypass():
if (!congested &&
mode == CACHE_MODE_WRITEBACK &&
op_is_write(bio_op(bio)) &&
(bio->bi_opf & REQ_SYNC))
goto rescale
that means, If in writeback mode, a write IO with REQ_SYNC flag will not
be bypassed though it is a sequential large IO, It's not a correct thing
to do actually, so this patch remove these codes.
Signed-off-by: tang.junhui <tang.junhui@zte.com.cn> Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com> Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If you encounter any errors in bch_cached_dev_attach it will return
a negative error code. The variable 'v' which stores the result is
unsigned, thus user space sees a very large value returned for bytes
written which can cause incorrect user space behavior. Utilize 1
signed variable to use throughout the function to preserve error return
capability.
Signed-off-by: Tony Asleson <tasleson@redhat.com> Acked-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
__update_write_rate() uses a Proportion-Differentiation Controller
algorithm to control writeback rate. A dirty target number is used in
this PD controller to control writeback rate. A larger target number
will make the writeback rate smaller, on the versus, a smaller target
number will make the writeback rate larger.
bcache uses the following steps to calculate the target number,
1) cache_sectors = all-buckets-of-cache-set * buckets-size
2) cache_dirty_target = cache_sectors * cached-device-writeback_percent
3) target = cache_dirty_target *
(sectors-of-cached-device/sectors-of-all-cached-devices-of-this-cache-set)
The calculation at step 1) for cache_sectors is incorrect, which does
not consider dirty blocks occupied by flash only volume.
A flash only volume can be took as a bcache device without cached
device. All data sectors allocated for it are persistent on cache device
and marked dirty, they are not touched by bcache writeback and garbage
collection code. So data blocks of flash only volume should be ignore
when calculating cache_sectors of cache set.
Current code does not subtract dirty sectors of flash only volume, which
results a larger target number from the above 3 steps. And in sequence
the cache device's writeback rate is smaller then a correct value,
writeback speed is slower on all cached devices.
This patch fixes the incorrect slower writeback rate by subtracting
dirty sectors of flash only volumes in __update_writeback_rate().
(Commit log composed by Coly Li to pass checkpatch.pl checking)
If blkdev_get_by_path() in register_bcache() fails, we try to lookup the
block device using lookup_bdev() to detect which situation we are in to
properly report error. However we never drop the reference returned to
us from lookup_bdev(). Fix that.
Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
bcache uses a Proportion-Differentiation Controller algorithm to control
writeback rate to cached devices. In the PD controller algorithm, dirty
stripes of thin flash device should not be counted in, because flash only
volumes never write back dirty data.
Currently dirty stripe counter for thin flash device is not initialized
when the thin flash device starts. Which means the following calculation
in PD controller will reference an undefined dirty stripes number, and
all cached devices attached to the same cache set where the thin flash
device lies on may have an inaccurate writeback rate.
This patch calles bch_sectors_dirty_init() in flash_dev_run(), to
correctly initialize dirty stripe counter when the thin flash device
starts to run. This patch also does following parameter data type change,
-void bch_sectors_dirty_init(struct cached_dev *dc);
+void bch_sectors_dirty_init(struct bcache_device *);
to call this function conveniently in flash_dev_run().
ALSA sequencer core has a mechanism to load the enumerated devices
automatically, and it's performed in an off-load work. This seems
causing some race when a sequencer is removed while the pending
autoload work is running. As syzkaller spotted, it may lead to some
use-after-free:
BUG: KASAN: use-after-free in snd_rawmidi_dev_seq_free+0x69/0x70
sound/core/rawmidi.c:1617
Write of size 8 at addr ffff88006c611d90 by task kworker/2:1/567
When the devfreq_add_device fails to register deivce, the memory
leak of devfreq instance happen. So, this patch fix the memory
leak issue. Before freeing the devfreq instance checks whether
devfreq instance is NULL or not because the device_unregister()
frees the devfreq instance when jumping to the 'err_init'.
It is to prevent the duplicate the kfee(devfreq).
The size of uvc_control_mapping is user controlled leading to a
potential heap overflow in the uvc driver. This adds a check to verify
the user provided size fits within the bounds of the defined buffer
size.
Originally-from: Richard Simmons <rssimmo@amazon.com>
Only when the lirc device is freed, should we drop our reference to
rc_dev, else we the rc_dev is freed to early. If userspace has
a file descriptor open during unplug, it goes bang.
==================================================================
BUG: KASAN: use-after-free in __lock_acquire+0x7bb/0x1e10
Read of size 8 at addr ffff8801d7d61ed0 by task ir-rec/2609
Certain syscalls like recvmmsg support 64 bit timespec values for the
X32 ABI. The helper function compat_put_timespec converts a timespec
value to a 32 bit or 64 bit value depending on what ABI is used. The
v4l2 compat layer, however, is not designed to support 64 bit timespec
values and always uses 32 bit values. Hence, compat_put_timespec must
not be used.
Without this patch, user space will be provided with bad timestamp
values from the VIDIOC_DQEVENT ioctl. Also, fields of the struct
v4l2_event32 that come immediately after timestamp get overwritten,
namely the field named id.
Fixes: 81993e81a994 ("compat: Get rid of (get|put)_compat_time(val|spec)") Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Cc: Tiffany Lin <tiffany.lin@mediatek.com> Cc: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Daniel Mentz <danielmentz@google.com> Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The order in __tlb_flush_mm_lazy is to flush TLB first and then clear
the mm->context.flush_mm bit. This can lead to missed flushes as the
bit can be set anytime, the order needs to be the other way aronud.
But this leads to a different race, __tlb_flush_mm_lazy may be called
on two CPUs concurrently. If mm->context.flush_mm is cleared first then
another CPU can bypass __tlb_flush_mm_lazy although the first CPU has
not done the flush yet. In a virtualized environment the time until the
flush is finally completed can be arbitrarily long.
Add a spinlock to serialize __tlb_flush_mm_lazy and use the function
in finish_arch_post_lock_switch as well.
The local TLB flushing code keeps an additional mask in the mm.context,
the cpu_attach_mask. At the time a global flush of an address space is
done the cpu_attach_mask is copied to the mm_cpumask in order to avoid
future global flushes in case the mm is used by a single CPU only after
the flush.
Trouble is that the reset of the mm_cpumask is racy against the detach
of an mm address space by switch_mm. The current order is first the
global TLB flush and then the copy of the cpu_attach_mask to the
mm_cpumask. The order needs to be the other way around.
As we want to remove spin_unlock_wait() and replace it with explicit
spin_lock()/spin_unlock() calls, we can use this to simplify the
locking.
In addition:
- Reading nf_conntrack_locks_all needs ACQUIRE memory ordering.
- The new code avoids the backwards loop.
Only slightly tested, I did not manage to trigger calls to
nf_conntrack_all_lock().
V2: With improved comments, to clearly show how the barriers
pair.
Fixes: b16c29191dc8 ("netfilter: nf_conntrack: use safer way to lock all buckets") Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: netfilter-devel@vger.kernel.org Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a power fault occurs, the power controller sets Power Fault Detected
in the Slot Status register, and pciehp_isr() queues an INT_POWER_FAULT
event to handle it.
It also clears Power Fault Detected, but since nothing has yet changed to
correct the power fault, the power controller will likely set it again
immediately, which may cause an infinite loop when pcie_isr() rechecks
Slot Status.
Fix that by masking off Power Fault Detected from new events if the driver
hasn't seen the power fault clear from the previous handling attempt.
Fixes: fad214b0aa72 ("PCI: pciehp: Process all hotplug events before looking for new ones") Signed-off-by: Keith Busch <keith.busch@intel.com>
[bhelgaas: changelog, pull test out and add comment] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Mayurkumar Patel <mayurkumar.patel@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
An SHPC may generate MSIs to notify software about slot or controller
events (SHPC spec r1.0, sec 4.7). A PCI device can only generate an MSI if
it has bus mastering enabled.
Enable bus mastering if the bridge contains an SHPC that uses MSI for event
notifications.
Signed-off-by: Aleksandr Bezzubikov <zuban32s@gmail.com>
[bhelgaas: changelog] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I recently came upon a scenario where I would get a double fault
machine check exception tiriggered by a kernel module.
However the ensuing crash stacktrace (ksym lookup) was not working
correctly.
Turns out that machine check auto-disables MMU while modules are allocated
in kernel vaddr spapce.
This patch re-enables the MMU before start printing the stacktrace
making stacktracing of modules work upon a fatal exception.
Signed-off-by: Jose Abreu <joabreu@synopsys.com> Reviewed-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: moved code into low level handler to avoid in 2 places] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently trace_clock timestamps are applied to both regular and max
buffers only for global trace. For instance trace, trace_clock
timestamps are applied only to regular buffer. But, regular and max
buffers can be swapped, for example, following a snapshot. So, for
instance trace, bad timestamps can be seen following a snapshot.
Let's apply trace_clock timestamps to instance max buffer as well.
When disabling one trace event, the RECORDED_TGID flag in the event
file is not correctly cleared. It's clearing RECORDED_CMD flag when
it should clear RECORDED_TGID flag.
Link: http://lkml.kernel.org/r/1504589806-8425-1-git-send-email-chuhu@redhat.com Cc: Joel Fernandes <joelaf@google.com> Fixes: d914ba37d7 ("tracing: Add support for recording tgid of tasks") Signed-off-by: Chunyu Hu <chuhu@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
trace_printk() uses 4 buffers, one for each context (normal, softirq, irq
and NMI), such that it does not need to worry about one context preempting
the other. There's a nesting counter that gets incremented to figure out
which buffer to use. If the context gets preempted by another context which
calls trace_printk() it will increment the counter and use the next buffer,
and restore the counter when it is finished.
The problem is that gcc may optimize the modification of the buffer nesting
counter and it may not be incremented in memory before the buffer is used.
If this happens, and the context gets interrupted by another context, it
could pick the same buffer and corrupt the one that is being used.
Compiler barriers need to be added after the nesting variable is incremented
and before it is decremented to prevent usage of the context buffers by more
than one context at the same time.
Cc: Andy Lutomirski <luto@kernel.org> Fixes: e2ace00117 ("tracing: Choose static tp_printk buffer by explicit nesting count") Hat-tip-to: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If function tracing is disabled by the user via the function-trace option or
the proc sysctl file, and a ftrace_ops that was allocated on the heap is
unregistered, then the shutdown code exits out without doing the proper
clean up. This was found via kmemleak and running the ftrace selftests, as
one of the tests unregisters with function tracing disabled.
In the second iteration of trace_selftest_ops(), the error goto label is
wrong in the case where trace_selftest_test_global_cnt is off. In the
case of error, it leaks the dynamic ops that was allocated.
Fixes: 95950c2e ("ftrace: Add self-tests for multiple function trace users") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
stack_tracer_disable()/stack_tracer_enable() had been using the wrong
name for the config symbol to enable their preempt-debugging checks --
fix with a word swap.
Link: http://lkml.kernel.org/r/20170831154036.4xldyakmmhuts5x7@hatter.bewilderbeest.net Fixes: 8aaf1ee70e ("tracing: Rename trace_active to disable_stack_tracer and inline its modification") Signed-off-by: Zev Weiss <zev@bewilderbeest.net> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The mask used in CMPL_START_ADDR_VALUE() should be 27bits instead of
26bits. This incorrect mask was causing completion writes to 40bits
physical address fail.
This patch fixes mask used in CMPL_START_ADDR_VALUE() macro.
Fixes: dbc049eee730 ("mailbox: Add driver for Broadcom FlexRM
ring manager")
Signed-off-by: Anup Patel <anup.patel@broadcom.com> Reviewed-by: Ray Jui <ray.jui@broadcom.com> Reviewed-by: Scott Branden <scott.branden@broadcom.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The value of "size" comes from the user. When we add "start + size" it
could lead to an integer overflow bug.
It means we vmalloc() a lot more memory than we had intended. I believe
that on 64 bit systems vmalloc() can succeed even if we ask it to
allocate huge 4GB buffers. So we would get memory corruption and likely
a crash when we call ha->isp_ops->write_optrom() and ->read_optrom().
Only root can trigger this bug.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=194061 Fixes: b7cc176c9eb3 ("[SCSI] qla2xxx: Allow region-based flash-part accesses.") Reported-by: shqking <shqking@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When calling SG_GET_REQUEST_TABLE ioctl only a half-filled table is
returned; the remaining part will then contain stale kernel memory
information. This patch zeroes out the entire table to avoid this
issue.
Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Factor out sg_fill_request_table() for better readability.
[mkp: typos, applied by hand]
Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When storvsc is sending I/O to Hyper-v, it may allocate a bigger buffer
descriptor for large data payload that can't fit into a pre-allocated
buffer descriptor. This bigger buffer is freed on return path.
If I/O request to Hyper-v fails due to ring buffer busy, the storvsc
allocated buffer descriptor should also be freed.
[mkp: applied by hand]
Fixes: be0cf6ca301c ("scsi: storvsc: Set the tablesize based on the information given by the host") Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After a kill adapter, since the cmd_status is not set, the IOCTLs will
be hung in driver resulting in application hang. Set cmd_status
MFI_STAT_WRONG_STATE when completing pended IOCTLs.
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Setting resetwaittime to 0 during a FW fault will result in driver not
calling the OCR.
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Driver allocated 256 byte MFI frames bytes but while sending MFI frame
(embedded inside chain frame of MPT frame) to firmware, driver sets the
length as 4k. This results in DMA read error messages during boot.
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes a potential race condition observed on Power systems.
Several places throughout the aacraid driver call aac_fib_send or
similar to send a command to the aacraid adapter, then check the return
code to determine if the command was actually sent to the adapter, then
update the phase field in the scsi command scratch pad area to track
that the firmware now owns this command. However, there is nothing that
ensures that by the time the aac_fib_send function returns and we go to
write to the scsi command, that the command hasn't already completed and
the scsi command has been freed. This was causing random crashes in the
TCP stack which was tracked down to be caused by memory that had been a
struct request + scsi_cmnd being now used for an skbuff. Memory
poisoning was enabled in the kernel to debug this which showed that the
last owner of the memory that had been freed was aacraid and that it was
a struct request. The memory that was corrupted was the exact data
pattern of AAC_OWNER_FIRMWARE and it was at the same offset that aacraid
writes, which is scsicmd->SCp.phase. The patch below resolves this
issue.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Tested-by: Wen Xiong <wenxiong@linux.vnet.ibm.com> Reviewed-by: Dave Carroll <david.carroll@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The > here should be >= or we end up reading one element beyond the end
of the qedi->itt_map[] array. The qedi->itt_map[] array is allocated in
qedi_alloc_itt().
Fixes: ace7f46ba5fd ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Manish Rangankar <Manish.Rangankar@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Complements debugging aspects of the otherwise functionally complete
v3.17 commit 9cb78c16f5da ("scsi: use 64-bit LUNs").
While I don't have access to a target exporting 3 or 4 level LUNs,
I did test it by explicitly attaching a non-existent fake 4 level LUN
by means of zfcp sysfs attribute "unit_add".
In order to see corresponding trace records of otherwise successful
events, we had to increase the trace level of area SCSI and HBA to 6.
At the default trace level, we only trace unsuccessful events including
FSF responses.
zfcp_dbf_hba_fsf_response() only used protocol status and FSF status to
decide on an unsuccessful response. However, this is only one of multiple
possible sources determining a failed struct zfcp_fsf_req.
An FSF request can also "fail" if its response runs into an ERP timeout
or if it gets dismissed because a higher level recovery was triggered
[trace tags "erscf_1" or "erscf_2" in zfcp_erp_strategy_check_fsfreq()].
FSF requests with ERP timeout are:
FSF_QTCB_EXCHANGE_CONFIG_DATA, FSF_QTCB_EXCHANGE_PORT_DATA,
FSF_QTCB_OPEN_PORT_WITH_DID or FSF_QTCB_CLOSE_PORT or
FSF_QTCB_CLOSE_PHYSICAL_PORT for target ports,
FSF_QTCB_OPEN_LUN, FSF_QTCB_CLOSE_LUN.
One example is slow queue processing which can cause follow-on errors,
e.g. FSF_PORT_ALREADY_OPEN after FSF_QTCB_OPEN_PORT_WITH_DID timed out.
In order to see the root cause, we need to see late responses even if the
channel presented them successfully with FSF_PROT_GOOD and FSF_GOOD.
Example trace records formatted with zfcpdbf from the s390-tools package:
Timestamp : ...
Area : REC
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : ...
Record ID : 1
Tag : fcegpf1
LUN : 0xffffffffffffffff
WWPN : 0x<WWPN>
D_ID : 0x00<D_ID>
Adapter status : 0x5400050b
Port status : 0x41200000
LUN status : 0x00000000
Ready count : 0x00000001
Running count : 0x...
ERP want : 0x02 ZFCP_ERP_ACTION_REOPEN_PORT
ERP need : 0x02 ZFCP_ERP_ACTION_REOPEN_PORT
|
Timestamp : ... 30 seconds later
Area : REC
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : ...
Record ID : 2
Tag : erscf_2
LUN : 0xffffffffffffffff
WWPN : 0x<WWPN>
D_ID : 0x00<D_ID>
Adapter status : 0x5400050b
Port status : 0x41200000
LUN status : 0x00000000
Request ID : 0x<request_ID>
ERP status : 0x10000000 ZFCP_STATUS_ERP_TIMEDOUT
ERP step : 0x0800 ZFCP_ERP_STEP_PORT_OPENING
ERP action : 0x02 ZFCP_ERP_ACTION_REOPEN_PORT
ERP count : 0x00
|
Timestamp : ... later than previous record
Area : HBA
Subarea : 00
Level : 5 > default level => 3 <= default level
Exception : -
CPU ID : 00
Caller : ...
Record ID : 1
Tag : fs_qtcb => fs_rerr
Request ID : 0x<request_ID>
Request status : 0x00001010 ZFCP_STATUS_FSFREQ_DISMISSED
| ZFCP_STATUS_FSFREQ_CLEANUP
FSF cmnd : 0x00000005
FSF sequence no: 0x...
FSF issued : ... > 30 seconds ago
FSF stat : 0x00000000 FSF_GOOD
FSF stat qual : 00000000000000000000000000000000
Prot stat : 0x00000001 FSF_PROT_GOOD
Prot stat qual : 00000000000000000000000000000000
Port handle : 0x...
LUN handle : 0x00000000
QTCB log length: ...
QTCB log info : ...
In case of problems detecting that new responses are waiting on the input
queue, we sooner or later trigger adapter recovery due to an FSF request
timeout (trace tag "fsrth_1").
FSF requests with FSF request timeout are:
typically FSF_QTCB_ABORT_FCP_CMND; but theoretically also
FSF_QTCB_EXCHANGE_CONFIG_DATA or FSF_QTCB_EXCHANGE_PORT_DATA via sysfs,
FSF_QTCB_OPEN_PORT_WITH_DID or FSF_QTCB_CLOSE_PORT for WKA ports,
FSF_QTCB_FCP_CMND for task management function (LUN / target reset).
One or more pending requests can meanwhile have FSF_PROT_GOOD and FSF_GOOD
because the channel filled in the response via DMA into the request's QTCB.
In a theroretical case, inject code can create an erroneous FSF request
on purpose. If data router is enabled, it uses deferred error reporting.
A READ SCSI command can succeed with FSF_PROT_GOOD, FSF_GOOD, and
SAM_STAT_GOOD. But on writing the read data to host memory via DMA,
it can still fail, e.g. if an intentionally wrong scatter list does not
provide enough space. Rather than getting an unsuccessful response,
we get a QDIO activate check which in turn triggers adapter recovery.
One or more pending requests can meanwhile have FSF_PROT_GOOD and FSF_GOOD
because the channel filled in the response via DMA into the request's QTCB.
Example trace records formatted with zfcpdbf from the s390-tools package:
Timestamp : ...
Area : HBA
Subarea : 00
Level : 6 > default level => 3 <= default level
Exception : -
CPU ID : ..
Caller : ...
Record ID : 1
Tag : fs_norm => fs_rerr
Request ID : 0x<request_ID2>
Request status : 0x00001010 ZFCP_STATUS_FSFREQ_DISMISSED
| ZFCP_STATUS_FSFREQ_CLEANUP
FSF cmnd : 0x00000001
FSF sequence no: 0x...
FSF issued : ...
FSF stat : 0x00000000 FSF_GOOD
FSF stat qual : 00000000000000000000000000000000
Prot stat : 0x00000001 FSF_PROT_GOOD
Prot stat qual : ........ ........ 0000000000000000
Port handle : 0x...
LUN handle : 0x...
|
Timestamp : ...
Area : SCSI
Subarea : 00
Level : 3
Exception : -
CPU ID : ..
Caller : ...
Record ID : 1
Tag : rsl_err
Request ID : 0x<request_ID2>
SCSI ID : 0x...
SCSI LUN : 0x...
SCSI result : 0x000e0000 DID_TRANSPORT_DISRUPTED
SCSI retries : 0x00
SCSI allowed : 0x05
SCSI scribble : 0x<request_ID2>
SCSI opcode : 28... Read(10)
FCP rsp inf cod: 0x00
FCP rsp IU : 00000000000000000000000000000000
^^ SAM_STAT_GOOD 0000000000000000
Only with luck in both above cases, we could see a follow-on trace record
of an unsuccesful event following a successful but late FSF response with
FSF_PROT_GOOD and FSF_GOOD. Typically this was the case for I/O requests
resulting in a SCSI trace record "rsl_err" with DID_TRANSPORT_DISRUPTED
[On ZFCP_STATUS_FSFREQ_DISMISSED, zfcp_fsf_protstatus_eval() sets
ZFCP_STATUS_FSFREQ_ERROR seen by the request handler functions as failure].
However, the reason for this follow-on trace was invisible because the
corresponding HBA trace record was missing at the default trace level
(by default hidden records with tags "fs_norm", "fs_qtcb", or "fs_open").
On adapter recovery, after we had shut down the QDIO queues, we perform
unsuccessful pseudo completions with flag ZFCP_STATUS_FSFREQ_DISMISSED
for each pending FSF request in zfcp_fsf_req_dismiss_all().
In order to find the root cause, we need to see all pseudo responses even
if the channel presented them successfully with FSF_PROT_GOOD and FSF_GOOD.
Therefore, check zfcp_fsf_req.status for ZFCP_STATUS_FSFREQ_DISMISSED
or ZFCP_STATUS_FSFREQ_ERROR and trace with a new tag "fs_rerr".
It does not matter that there are numerous places which set
ZFCP_STATUS_FSFREQ_ERROR after the location where we trace an FSF response
early. These cases are based on protocol status != FSF_PROT_GOOD or
== FSF_PROT_FSF_STATUS_PRESENTED and are thus already traced by default
as trace tag "fs_perr" or "fs_ferr" respectively.
NB: The trace record with tag "fssrh_1" for status read buffers on dismiss
all remains. zfcp_fsf_req_complete() handles this and returns early.
All other FSF request types are handled separately and as described above.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 8a36e4532ea1 ("[SCSI] zfcp: enhancement of zfcp debug features") Fixes: 2e261af84cdb ("[SCSI] zfcp: Only collect FSF/HBA debug data for matching trace levels") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the FCP_RSP UI has optional parts (FCP_SNS_INFO or FCP_RSP_INFO) and
thus does not fit into the fsp_rsp field built into a SCSI trace record,
trace the full FCP_RSP UI with all optional parts as payload record
instead of just FCP_SNS_INFO as payload and
a 1 byte RSP_INFO_CODE part of FCP_RSP_INFO built into the SCSI record.
That way we would also get the full FCP_SNS_INFO in case a
target would ever send more than
min(SCSI_SENSE_BUFFERSIZE==96, ZFCP_DBF_PAY_MAX_REC==256)==96.
The mandatory part of FCP_RSP IU is only 24 bytes.
PAYload costs at least one full PAY record of 256 bytes anyway.
We cap to the hardware response size which is only FSF_FCP_RSP_SIZE==128.
So we can just put the whole FCP_RSP IU with any optional parts into
PAYload similarly as we do for SAN PAY since v4.9 commit aceeffbb59bb
("zfcp: trace full payload of all SAN records (req,resp,iels)").
This does not cause any additional trace records wasting memory.
Decoded trace records were confusing because they showed a hard-coded
sense data length of 96 even if the FCP_RSP_IU field FCP_SNS_LEN showed
actually less.
Since the same commit, we set pl_len for SAN traces to the full length of a
request/response even if we cap the corresponding trace.
In contrast, here for SCSI traces we set pl_len to the pre-computed
length of FCP_RSP IU considering SNS_LEN or RSP_LEN if valid.
Nonetheless we trace a hardcoded payload of length FSF_FCP_RSP_SIZE==128
if there were optional parts.
This makes it easier for the zfcpdbf tool to format only the relevant
part of the long FCP_RSP UI buffer. And any trailing information is still
available in the payload trace record just in case.
Rename the payload record tag from "fcp_sns" to "fcp_riu" to make the new
content explicit to zfcpdbf which can then pick a suitable field name such
as "FCP rsp IU all:" instead of "Sense info :"
Also, the same zfcpdbf can still be backwards compatible with "fcp_sns".
Old example trace record before this fix, formatted with the tool zfcpdbf
from s390-tools:
Timestamp : ...
Area : SCSI
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1
Tag : lr_okay
Request ID : 0x<request_id>
SCSI ID : 0x...
SCSI LUN : 0x...
SCSI result : 0x00000000
SCSI retries : 0x00
SCSI allowed : 0x05
SCSI scribble : 0x<request_id>
SCSI opcode : <CDB of unrelated SCSI command passed to eh handler>
FCP rsp inf cod: 0x00
FCP rsp IU : 00000000000000000000010000000000 0000000000000008
FCP rsp IU len : 32
FCP rsp IU all : 00000000000000000000010000000000
^^==FCP_RSP_LEN_VALID 00000000000000080000000000000000
^^^^^^^^==FCP_RSP_LEN
^^^^^^^^^^^^^^^^^==FCP_RSP_INFO
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 250a1352b95e ("[SCSI] zfcp: Redesign of the debug tracing for SCSI records.") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For problem determination we need to see that we were in scsi_eh
as well as whether and why we were successful or not.
The following commits introduced new early returns without adding
a trace record:
v2.6.35 commit a1dbfddd02d2
("[SCSI] zfcp: Pass return code from fc_block_scsi_eh to scsi eh")
on fc_block_scsi_eh() returning != 0 which is FAST_IO_FAIL,
v2.6.30 commit 63caf367e1c9
("[SCSI] zfcp: Improve reliability of SCSI eh handlers in zfcp")
on not having gotten an FSF request after the maximum number of retry
attempts and thus could not issue a TMF and has to return FAILED.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: a1dbfddd02d2 ("[SCSI] zfcp: Pass return code from fc_block_scsi_eh to scsi eh") Fixes: 63caf367e1c9 ("[SCSI] zfcp: Improve reliability of SCSI eh handlers in zfcp") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Without this fix we get SCSI trace records on task management functions
which cannot be correlated to HBA trace records because all fields
related to the FSF request are empty (zero).
Also, the FCP_RSP_IU is missing as well as any sense data if available.
This was caused by v2.6.14 commit 8a36e4532ea1 ("[SCSI] zfcp: enhancement
of zfcp debug features") introducing trace records for TMFs but
hard coding NULL for a possibly existing TMF FSF request.
The scsi_cmnd scribble is also zero or unrelated for the TMF request
so it also could not lookup a suitable FSF request from there.
A broken example trace record formatted with zfcpdbf from the s390-tools
package:
Timestamp : ...
Area : SCSI
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1
Tag : lr_fail
Request ID : 0x0000000000000000
^^^^^^^^^^^^^^^^ no correlation to HBA record
SCSI ID : 0x<scsitarget>
SCSI LUN : 0x<scsilun>
SCSI result : 0x000e0000
SCSI retries : 0x00
SCSI allowed : 0x05
SCSI scribble : 0x0000000000000000
SCSI opcode : 2a0000173bb800000800000000000000
FCP rsp inf cod: 0x00
^^ no TMF response
FCP rsp IU : 00000000000000000000000000000000
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 0000000000000000
^^^^^^^^^^^^^^^^^ no interesting FCP_RSP_IU
Sense len : ...
^^^^^^^^^^^^^^^^^^^^ no sense data length
Sense info : ...
^^^^^^^^^^^^^^^^^^^^ no sense data content, even if present
There are some true cases where we really do not have an FSF request:
"rsl_fai" from zfcp_dbf_scsi_fail_send() called for early
returns / completions in zfcp_scsi_queuecommand(),
"abrt_or", "abrt_bl", "abrt_ru", "abrt_ar" from
zfcp_scsi_eh_abort_handler() where we did not get as far,
"lr_nres", "tr_nres" from zfcp_task_mgmt_function() where we're
successful and do not need to do anything because adapter stopped.
For these cases it's correct to pass NULL for fsf_req to _zfcp_dbf_scsi().
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 8a36e4532ea1 ("[SCSI] zfcp: enhancement of zfcp debug features") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
v4.9 commit aceeffbb59bb ("zfcp: trace full payload of all SAN records
(req,resp,iels)") fixed trace data loss of 2.6.38 commit 2c55b750a884
("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
necessary for problem determination, e.g. to see the
currently active zone set during automatic port scan.
While it already saves space by not dumping any empty residual entries
of the large successful GPN_FT response (4 pages), there are seldom cases
where the GPN_FT response is unsuccessful and likely does not have
FC_NS_FID_LAST set in fp_flags so we did not cap the trace record.
We typically see such case for an initiator WWPN, which is not in any zone.
Cap unsuccessful responses to at least the actual basic CT_IU response
plus whatever fits the SAN trace record built-in "payload" buffer
just in case there's trailing information
of which we would at least see the existence and its beginning.
In order not to erroneously cap successful responses, we need to swap
calling the trace function and setting the CT / ELS status to success (0).
The fix saves all but one of the previously associated 64 PAYload trace
record chunks of size 256 bytes each.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: aceeffbb59bb ("zfcp: trace full payload of all SAN records (req,resp,iels)") Fixes: 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Up until now zfcp would just ignore the FCP_RESID_OVER flag in the FCP
response IU. When this flag is set, it is possible, in regards to the
FCP standard, that the storage-server processes the command normally, up
to the point where data is missing and simply ignores those.
In this case no CHECK CONDITION would be set, and because we ignored the
FCP_RESID_OVER flag we resulted in at least a data loss or even
-corruption as a follow-up error, depending on how the
applications/layers on top behave. To prevent this, we now set the
host-byte of the corresponding scsi_cmnd to DID_ERROR.
Other storage-behaviors, where the same condition results in a CHECK
CONDITION set in the answer, don't need to be changed as they are
handled in the mid-layer already.
Following is an example trace record decoded with zfcpdbf from the
s390-tools package. We forcefully injected a fc_dl which is one byte too
small:
Timestamp : ...
Area : SCSI
Subarea : 00
Level : 3
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1
Tag : rsl_err
Request ID : 0x...
SCSI ID : 0x...
SCSI LUN : 0x...
SCSI result : 0x00070000
^^DID_ERROR
SCSI retries : 0x..
SCSI allowed : 0x..
SCSI scribble : 0x...
SCSI opcode : 2a000000000000000800000000000000
FCP rsp inf cod: 0x00
FCP rsp IU : 00000000000000000000040000000001
^^fr_flags==FCP_RESID_OVER
^^fr_status==SAM_STAT_GOOD
^^^^^^^^fr_resid 0000000000000000
As of now, we don't actively handle to possibility that a response IU
has both flags - FCP_RESID_OVER and FCP_RESID_UNDER - set at once.
Reported-by: Luke M. Hopkins <lmhopkin@us.ibm.com> Reviewed-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 553448f6c483 ("[SCSI] zfcp: Message cleanup") Fixes: ea127f975424 ("[PATCH] s390 (7/7): zfcp host adapter.") (tglx/history.git) Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit db007fc5e20c ("[SCSI] Command protection operation"),
scsi_eh_prep_cmnd() saves scmd->prot_op and temporarily resets it to
SCSI_PROT_NORMAL.
Other FCP LLDDs such as qla2xxx and lpfc shield their queuecommand()
to only access any of scsi_prot_sg...() if
(scsi_get_prot_op(cmd) != SCSI_PROT_NORMAL).
Do the same thing for zfcp, which introduced DIX support with
commit ef3eb71d8ba4 ("[SCSI] zfcp: Introduce experimental support for
DIF/DIX").
Otherwise, TUR SCSI commands as part of scsi_eh likely fail in zfcp,
because the regular SCSI command with DIX protection data, that scsi_eh
re-uses in scsi_send_eh_cmnd(), of course still has
(scsi_prot_sg_count() != 0) and so zfcp sends down bogus requests to the
FCP channel hardware.
This causes scsi_eh_test_devices() to have (finish_cmds == 0)
[not SCSI device is online or not scsi_eh_tur() failed]
so regular SCSI commands, that caused / were affected by scsi_eh,
are moved to work_q and scsi_eh_test_devices() itself returns false.
In turn, it unnecessarily escalates in our case in scsi_eh_ready_devs()
beyond host reset to finally scsi_eh_offline_sdevs()
which sets affected SCSI devices offline with the following kernel message:
"kernel: sd H:0:T:L: Device offlined - not ready after error recovery"
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: ef3eb71d8ba4 ("[SCSI] zfcp: Introduce experimental support for DIF/DIX") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ensure that the members of struct skd_msg_buf have been transferred
to the PCIe adapter before the doorbell is triggered. This patch
avoids that I/O fails sporadically and that the following error
message is reported:
Since put_disk() triggers a disk_release() call and since that
last function calls blk_put_queue() if disk->queue != NULL, clear
the disk->queue pointer before calling put_disk(). This avoids
that unloading the skd kernel module triggers the following
use-after-free:
bitmap_resize() does not work for file-backed bitmaps.
The buffer_heads are allocated and initialized when
the bitmap is read from the file, but resize doesn't
read from the file, it loads from the internal bitmap.
When it comes time to write the new bitmap, the bh is
non-existent and we crash.
The common case when growing an array involves making the array larger,
and that normally means making the bitmap larger. Doing
that inside the kernel is possible, but would need more code.
It is probably easier to require people who use file-backed
bitmaps to remove them and re-add after a reshape.
So this patch disables the resizing of arrays which have
file-backed bitmaps. This is better than crashing.
Reported-by: Zhilong Liu <zlliu@suse.com> Fixes: d60b479d177a ("md/bitmap: add bitmap_resize function to allow bitmap resizing.") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
raid5 cache could write bitmap superblock before bitmap superblock is
initialized. The bitmap superblock is less than 512B. The current code will
only copy the superblock to a new page and write the whole 512B, which will
zero the the data after the superblock. Unfortunately the data could include
bitmap, which we should preserve. The patch will make superblock read do 4k
chunk and we always copy the 4k data to new page, so the superblock write will
old data to disk and we don't change the bitmap.
Reported-by: Song Liu <songliubraving@fb.com> Reviewed-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A NULL pointer crash was reported for the case of having the BFQ IO
scheduler attached to the underlying blk-mq paths of a DM multipath
device. The crash occured in blk_mq_sched_insert_request()'s call to
e->type->ops.mq.insert_requests().
Paolo Valente correctly summarized why the crash occured with:
"the call chain (dm_mq_queue_rq -> map_request -> setup_clone ->
blk_rq_prep_clone) creates a cloned request without invoking
e->type->ops.mq.prepare_request for the target elevator e. The cloned
request is therefore not initialized for the scheduler, but it is
however inserted into the scheduler by blk_mq_sched_insert_request."
All said, a request-based DM multipath device's IO scheduler should be
the only one used -- when the original requests are issued to the
underlying paths as cloned requests they are inserted directly in the
underlying dispatch queue(s) rather than through an additional elevator.
But commit bd166ef18 ("blk-mq-sched: add framework for MQ capable IO
schedulers") switched blk_insert_cloned_request() from using
blk_mq_insert_request() to blk_mq_sched_insert_request(). Which
incorrectly added elevator machinery into a call chain that isn't
supposed to have any.
To fix this introduce a blk-mq private blk_mq_request_bypass_insert()
that blk_insert_cloned_request() calls to insert the request without
involving any elevator that may be attached to the cloned request's
request_queue.
Fixes: bd166ef183c2 ("blk-mq-sched: add framework for MQ capable IO schedulers") Reported-by: Bart Van Assche <Bart.VanAssche@wdc.com> Tested-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Calling blk_start_queue() from interrupt context with the queue
lock held and without disabling IRQs, as the skd driver does, is
safe. This patch avoids that loading the skd driver triggers the
following warning:
Fixes: commit a038e2536472 ("[PATCH] blk_start_queue() must be called with irq disabled - add warning") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Andrew Morton <akpm@osdl.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Anton noticed that if we fault part way through emulating an unaligned
instruction, we don't update the DAR to reflect that.
The DAR value is eventually reported back to userspace as the address
in the SEGV signal, and if userspace is using that value to demand
fault then it can be confused by us not setting the value correctly.
This patch is ugly as hell, but is intended to be the minimal fix and
back ports easily.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Check if an LMB is assigned before attempting to call dlpar_acquire_drc
in order to avoid any unnecessary rtas calls. This substantially
reduces the running time of memory hot add on lpars with large amounts
of memory.
[mpe: We need to explicitly set rc to 0 in the success case, otherwise
the compiler might think we use rc without initialising it.]
Fixes: c21f515c7436 ("powerpc/pseries: Make the acquire/release of the drc for memory a seperate step") Signed-off-by: John Allen <jallen@linux.vnet.ibm.com> Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The nest MMU tlb flush needs to happen before the GPU translation
shootdown is launched to avoid the GPU refilling its tlb with stale
nmmu translations prior to the nmmu flush completing.
cxl keeps a driver use count, which is used with the hash memory model
on p8 to know when to upgrade local TLBIs to global and to trigger
callbacks to manage the MMU for PSL8.
If a process opens a context and closes without attaching or fails the
attachment, the driver use count is never decremented. As a
consequence, TLB invalidations remain global, even if there are no
active cxl contexts.
We should increment the driver use count when the process is attaching
to the cxl adapter, and not on open. It's not needed before the
adapter starts using the context and the use count is decremented on
the detach path, so it makes more sense.
It affects only the user api. The kernel api is already doing The
Right Thing.
Quota does not get enabled for read-only mounts if filesystem
has quota feature, so that quotas cannot updated during orphan
cleanup, which will lead to quota inconsistency.
This patch turn on quotas during orphan cleanup for this case,
make sure quotas can be updated correctly.
Reported-by: Jan Kara <jack@suse.cz> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Current ext4 quota should always "usage enabled" if the
quota feautre is enabled. But in ext4_orphan_cleanup(), it
turn quotas off directly (used for the older journaled
quota), so we cannot turn it on again via "quotaon" unless
umount and remount ext4.
#reboot and mount
mount -o prjquota /dev/vdb1 /mnt
#query status
quotaon -Ppv /dev/vdb1
#output
quotaon: Cannot find mountpoint for device /dev/vdb1
quotaon: No correct mountpoint specified.
This patch add check for journaled quotas to avoid incorrect
quotaoff when ext4 has quota feautre.
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the ext4 implementations of SEEK_HOLE and SEEK_DATA, make sure we
return -ENXIO for negative offsets instead of banging around inside
the extent code and returning -EFSCORRUPTED.
Reported-by: Mateusz S <muttdini@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As the association status changes the driver needs to configure the
hardware. This is done based on information in the "sta" acquired by
ieee80211_find_sta(), which requires the caller to ensure that the "sta"
is valid while its being used; generally by entering an rcu read
section.
But the operations acting on the "sta" has to communicate with the
firmware and may therefor sleep, resulting in the following report:
In order to ensure that the "sta" remains alive (and consistent) for the
duration of bss_info_changed() mutual exclusion has to be ensured with
sta_remove().
This is done by introducing a mutex to cover firmware configuration
changes, which is made to also ensure mutual exclusion between other
operations changing the state or configuration of the firmware. With
this we can drop the rcu read lock.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The original patch from Tony uses standby mode bit inverted, which is
not correct. This fixes all instances in the driver code for get & set
mode. This did not yet make problems, since mode has not been changed
by any mainline driver so far.
Fixes: 0ad4c07edd41 ("regulator: cpcap: Add basic regulator support") Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stephan Mueller [Thu, 21 Sep 2017 08:16:53 +0000 (10:16 +0200)]
crypto: AF_ALG - remove SGL terminator indicator when chaining
Fixed differently upstream as commit 2d97591ef43d ("crypto: af_alg - consolidation of duplicate code")
The SGL is MAX_SGL_ENTS + 1 in size. The last SG entry is used for the
chaining and is properly updated with the sg_chain invocation. During
the filling-in of the initial SG entries, sg_mark_end is called for each
SG entry. This is appropriate as long as no additional SGL is chained
with the current SGL. However, when a new SGL is chained and the last
SG entry is updated with sg_chain, the last but one entry still contains
the end marker from the sg_mark_end. This end marker must be removed as
otherwise a walk of the chained SGLs will cause a NULL pointer
dereference at the last but one SG entry, because sg_next will return
NULL.
The patch only applies to all kernels up to and including 4.13. The
patch 2d97591ef43d0587be22ad1b0d758d6df4999a0b added to 4.14-rc1
introduced a complete new code base which addresses this bug in
a different way. Yet, that patch is too invasive for stable kernels
and was therefore not marked for stable.
Fixes: 8ff590903d5fc ("crypto: algif_skcipher - User-space interface for skcipher operations") Signed-off-by: Stephan Mueller <smueller@chronox.de> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
caam/qi needs a fix similar to what was done for caam/jr in
commit "crypto: caam/qi - properly set IV after {en,de}crypt",
to allow for ablkcipher/skcipher chunking/streaming.
Fixes: b189817cf789 ("crypto: caam/qi - add ablkcipher and authenc algorithms") Suggested-by: David Gstir <david@sigma-star.at> Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Due to the use of per-CPU buffers, scomp_acomp_comp_decomp() executes
with preemption disabled, and so whether the CRYPTO_TFM_REQ_MAY_SLEEP
flag is set is irrelevant, since we cannot sleep anyway. So disregard
the flag, and use GFP_ATOMIC unconditionally.
Version 5 CCPs have some new requirements for XTS-AES: the type field
must be specified, and the key requires 512 bits, with each part
occupying 256 bits and padded with zeroes.
Signed-off-by: Gary R Hook <ghook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6807c84652b0 ("x86: Enable KASLR by default") enables KASLR
by default on x86. While KASLR will confuse gdb which resolve kernel
symbol address from symbol table of vmlinux. We should turn off KASLR for
kernel debugging.
Implement fused multiply-add with correct accuracy.
Fused multiply-add operation has better accuracy than respective
sequential execution of multiply and add operations applied on the
same inputs. This is because accuracy errors accumulate in latter
case.
This patch implements fused multiply-add with the same accuracy
as it is implemented in hardware, using 128-bit intermediate
calculations.
One test case example (raw bits) that this patch fixes:
MADDF.D fd,fs,ft:
fd = 0x00000ca000000000
fs = ft = 0x3f40624dd2f1a9fc
Fixes: e24c3bec3e8e ("MIPS: math-emu: Add support for the MIPS R6 MADDF FPU instruction") Fixes: 83d43305a1df ("MIPS: math-emu: Add support for the MIPS R6 MSUBF FPU instruction") Signed-off-by: Douglas Leung <douglas.leung@imgtec.com> Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com> Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com> Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com> Cc: Douglas Leung <douglas.leung@imgtec.com> Cc: Bo Hu <bohu@google.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Jin Qian <jinqian@google.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Petar Jovanovic <petar.jovanovic@imgtec.com> Cc: Raghu Gandham <raghu.gandham@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16891/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Implement fused multiply-add with correct accuracy.
Fused multiply-add operation has better accuracy than respective
sequential execution of multiply and add operations applied on the
same inputs. This is because accuracy errors accumulate in latter
case.
This patch implements fused multiply-add with the same accuracy
as it is implemented in hardware, using 64-bit intermediate
calculations.
One test case example (raw bits) that this patch fixes:
MADDF.S fd,fs,ft:
fd = 0x22575225
fs = ft = 0x3727c5ac
Fixes: e24c3bec3e8e ("MIPS: math-emu: Add support for the MIPS R6 MADDF FPU instruction") Fixes: 83d43305a1df ("MIPS: math-emu: Add support for the MIPS R6 MSUBF FPU instruction") Signed-off-by: Douglas Leung <douglas.leung@imgtec.com> Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com> Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com> Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com> Cc: Douglas Leung <douglas.leung@imgtec.com> Cc: Bo Hu <bohu@google.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Jin Qian <jinqian@google.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Petar Jovanovic <petar.jovanovic@imgtec.com> Cc: Raghu Gandham <raghu.gandham@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16890/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix definition and usage of "maddf_flags" enumeration. Avoid duplicate
definition and apply more common capitalization.
This patch does not change any scenario. It just makes MADDF and
MSUBF emulation code more readable and easier to maintain, and
hopefully prevents future bugs as well.
Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com> Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com> Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com> Reviewed-by: James Hogan <james.hogan@imgtec.com> Cc: Bo Hu <bohu@google.com> Cc: Douglas Leung <douglas.leung@imgtec.com> Cc: Jin Qian <jinqian@google.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Petar Jovanovic <petar.jovanovic@imgtec.com> Cc: Raghu Gandham <raghu.gandham@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16889/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix the cases of <MADDF|MSUBF>.<D|S> when any of two multiplicands is
+0 or -0, and the third input is also +0 or -0. Depending on the signs
of inputs, certain special cases must be handled.
A relevant example:
MADDF.S fd,fs,ft:
If fs contains +0.0, ft contains -0.0, and fd contains 0.0, fd is
going to contain +0.0 (without this patch, it used to contain -0.0).
Fixes: e24c3bec3e8e ("MIPS: math-emu: Add support for the MIPS R6 MADDF FPU instruction") Fixes: 83d43305a1df ("MIPS: math-emu: Add support for the MIPS R6 MSUBF FPU instruction") Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com> Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com> Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com> Reviewed-by: James Hogan <james.hogan@imgtec.com> Cc: Bo Hu <bohu@google.com> Cc: Douglas Leung <douglas.leung@imgtec.com> Cc: Jin Qian <jinqian@google.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Petar Jovanovic <petar.jovanovic@imgtec.com> Cc: Raghu Gandham <raghu.gandham@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16888/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix the cases of <MADDF|MSUBF>.<D|S> when any of two multiplicands is
infinity. The correct behavior in such cases is affected by the nature
of third input. Cases of addition of infinities with opposite signs
and subtraction of infinities with same signs may arise and must be
handles separately. Also, the value od flags argument (that determines
whether the instruction is MADDF or MSUBF) affects the outcome.
Relevant examples:
MADDF.S fd,fs,ft:
If fs contains +inf, ft contains +inf, and fd contains -inf, fd is
going to contain indef (without this patch, it used to contain
-inf).
MSUBF.S fd,fs,ft:
If fs contains +inf, ft contains 1.0, and fd contains +0.0, fd is
going to contain -inf (without this patch, it used to contain +inf).
Fixes: e24c3bec3e8e ("MIPS: math-emu: Add support for the MIPS R6 MADDF FPU instruction") Fixes: 83d43305a1df ("MIPS: math-emu: Add support for the MIPS R6 MSUBF FPU instruction") Signed-off-by: Douglas Leung <douglas.leung@imgtec.com> Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com> Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com> Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com> Reviewed-by: James Hogan <james.hogan@imgtec.com> Cc: Douglas Leung <douglas.leung@imgtec.com> Cc: Bo Hu <bohu@google.com> Cc: Jin Qian <jinqian@google.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Petar Jovanovic <petar.jovanovic@imgtec.com> Cc: Raghu Gandham <raghu.gandham@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16887/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix the cases of <MADDF|MSUBF>.<D|S> when any of three inputs is any
NaN. Correct behavior of <MADDF|MSUBF>.<D|S> fd, fs, ft is following:
- if any of inputs is sNaN, return a sNaN using following rules: if
only one input is sNaN, return that one; if more than one input is
sNaN, order of precedence for return value is fd, fs, ft
- if no input is sNaN, but at least one of inputs is qNaN, return a
qNaN using following rules: if only one input is qNaN, return that
one; if more than one input is qNaN, order of precedence for
return value is fd, fs, ft
The previous code contained correct handling of some above cases, but
not all. Also, such handling was scattered into various cases of
"switch (CLPAIR(xc, yc))" statement, and elsewhere. With this patch,
this logic is placed in one place, and "switch (CLPAIR(xc, yc))" is
significantly simplified.
A relevant example:
MADDF.S fd,fs,ft:
If fs contains qNaN1, ft contains qNaN2, and fd contains qNaN3, fd
is going to contain qNaN3 (without this patch, it used to contain
qNaN1).
Fixes: e24c3bec3e8e ("MIPS: math-emu: Add support for the MIPS R6 MADDF FPU instruction") Fixes: 83d43305a1df ("MIPS: math-emu: Add support for the MIPS R6 MSUBF FPU instruction") Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com> Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com> Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com> Reviewed-by: James Hogan <james.hogan@imgtec.com> Cc: Bo Hu <bohu@google.com> Cc: Douglas Leung <douglas.leung@imgtec.com> Cc: Jin Qian <jinqian@google.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Petar Jovanovic <petar.jovanovic@imgtec.com> Cc: Raghu Gandham <raghu.gandham@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16886/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
- if one of the inputs is zero, and the other is subnormal, normal,
or infinity, the value of the former should be returned (that is,
a zero).
- if one of the inputs is infinity, and the other input is normal,
or subnormal, the value of the latter should be returned.
The previous implementation's logic for such cases was incorrect - it
appears as if it implements MAXA, and not MINA instruction.
A relevant example:
MINA.S fd,fs,ft:
If fs contains 100.0, and ft contains 0.0, fd is going to contain
0.0 (without this patch, it used to contain 100.0).
Fixes: a79f5f9ba508 ("MIPS: math-emu: Add support for the MIPS R6 MAX{, A} FPU instruction") Fixes: 4e9561b20e2f ("MIPS: math-emu: Add support for the MIPS R6 MIN{, A} FPU instruction") Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com> Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com> Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com> Reviewed-by: James Hogan <james.hogan@imgtec.com> Cc: Bo Hu <bohu@google.com> Cc: Douglas Leung <douglas.leung@imgtec.com> Cc: Jin Qian <jinqian@google.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Petar Jovanovic <petar.jovanovic@imgtec.com> Cc: Raghu Gandham <raghu.gandham@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16885/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix the value returned by <MAXA|MINA>.<D|S> fd,fs,ft, if both inputs
are infinite. The previous implementation returned always the value
contained in ft in such cases. The correct behavior is specified
in Mips instruction set manual and is as follows:
Fix the value returned by <MAX|MIN>.<D|S>, if both inputs are negative
normal fp numbers. The previous logic did not take into account that
if both inputs have the same sign, there should be separate treatment
of the cases when both inputs are negative and when both inputs are
positive.
A relevant example:
MAX.S fd,fs,ft:
If fs contains -5.0, and ft contains -7.0, fd is going to contain
-5.0 (without this patch, it used to contain -7.0).
Fixes: a79f5f9ba508 ("MIPS: math-emu: Add support for the MIPS R6 MAX{, A} FPU instruction") Fixes: 4e9561b20e2f ("MIPS: math-emu: Add support for the MIPS R6 MIN{, A} FPU instruction") Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com> Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com> Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com> Reviewed-by: James Hogan <james.hogan@imgtec.com> Cc: Bo Hu <bohu@google.com> Cc: Douglas Leung <douglas.leung@imgtec.com> Cc: Jin Qian <jinqian@google.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Petar Jovanovic <petar.jovanovic@imgtec.com> Cc: Raghu Gandham <raghu.gandham@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16882/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix the value returned by <MAX|MAXA|MIN|MINA>.<D|S>, if both inputs
are zeros. The right behavior in such cases is stated in instruction
reference manual and is as follows:
fs ft MAX MIN MAXA MINA
---------------------------------------------
0 0 0 0 0 0
0 -0 0 -0 0 -0
-0 0 0 -0 0 -0
-0 -0 -0 -0 -0 -0
Prior to this patch, some of the above cases were yielding correct
results. However, for the sake of code consistency, all such cases
are rewritten in this patch.
A relevant example:
MAX.S fd,fs,ft:
If fs contains +0.0, and ft contains -0.0, fd is going to contain
+0.0 (without this patch, it used to contain -0.0).
Fixes: a79f5f9ba508 ("MIPS: math-emu: Add support for the MIPS R6 MAX{, A} FPU instruction") Fixes: 4e9561b20e2f ("MIPS: math-emu: Add support for the MIPS R6 MIN{, A} FPU instruction") Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com> Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com> Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com> Reviewed-by: James Hogan <james.hogan@imgtec.com> Cc: Bo Hu <bohu@google.com> Cc: Douglas Leung <douglas.leung@imgtec.com> Cc: Jin Qian <jinqian@google.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Petar Jovanovic <petar.jovanovic@imgtec.com> Cc: Raghu Gandham <raghu.gandham@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16881/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix the value returned by <MAX|MAXA|MIN|MINA>.<D|S> fd,fs,ft, if both
inputs are quiet NaNs. The <MAX|MAXA|MIN|MINA>.<D|S> specifications
state that the returned value in such cases should be the quiet NaN
contained in register fs.
A relevant example:
MAX.S fd,fs,ft:
If fs contains qNaN1, and ft contains qNaN2, fd is going to contain
qNaN1 (without this patch, it used to contain qNaN2).
Fixes: a79f5f9ba508 ("MIPS: math-emu: Add support for the MIPS R6 MAX{, A} FPU instruction") Fixes: 4e9561b20e2f ("MIPS: math-emu: Add support for the MIPS R6 MIN{, A} FPU instruction") Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com> Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com> Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com> Reviewed-by: James Hogan <james.hogan@imgtec.com> Cc: Bo Hu <bohu@google.com> Cc: Douglas Leung <douglas.leung@imgtec.com> Cc: Jin Qian <jinqian@google.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Petar Jovanovic <petar.jovanovic@imgtec.com> Cc: Raghu Gandham <raghu.gandham@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16880/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After commit 8b1bd11c1f8f ("pinctrl: samsung: Add the support the
multiple IORESOURCE_MEM for one pin-bank"), the S3C24xx (and probably
S3C64xx as well) fails:
Unable to handle kernel NULL pointer dereference at virtual address 000000a8
...
(s3c24xx_demux_eint4_7) from [<c004469c>] (__handle_domain_irq+0x6c/0xcc)
(__handle_domain_irq) from [<c0009444>] (s3c24xx_handle_irq+0x6c/0x12c)
(s3c24xx_handle_irq) from [<c000e5fc>] (__irq_svc+0x5c/0x78)
Mentioned commit moved the pointer to controller's base IO memory address
from each controller's driver data (samsung_pinctrl_drv_data) to per-bank
structure (samsung_pin_bank). The external interrupt demux
handlers (s3c24xx_demux_eint()) tried to get this base address from opaque
pointer stored under irq_chip data:
which is wrong because this is hardware irq and it bank was never set
for this irq_chip.
For S3C24xx and S3C64xx, this partially reverts mentioned commit by
bringing back the virt_base stored under each controller's driver data
(samsung_pinctrl_drv_data). This virt_base address will be now
duplicated:
- samsung_pinctrl_drv_data->virt_base: used on S3C24xx and S3C64xx,
- samsung_pin_bank->pctl_base: used on Exynos.
Fixes: 8b1bd11c1f8f ("pinctrl: samsung: Add the support the multiple IORESOURCE_MEM for one pin-bank") Cc: Sergio Prado <sergio.prado@e-labworks.com> Reported-by: Sergio Prado <sergio.prado@e-labworks.com> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Tested-by: Lihua Yao <ylhuajnu@163.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When setting the pin function for external interrupts, the driver used
wrong IO memory address base. The pin function register is always under
pctl_base, not the eint_base.
By updating wrong register, the external interrupts for chosen GPIO
would not work at all and some other GPIO might be configured to wrong
value. For example on Exynos5433-based boards, the external interrupts
for gpf{1-5}-X GPIOs should not work at all (driver toggled reserved
registers from ALIVE bank instead).
Platforms other than Exynos5433 should not be affected as eint_base
equals pctl_base in such case.
Fixes: 8b1bd11c1f8f ("pinctrl: samsung: Add the support the multiple IORESOURCE_MEM for one pin-bank") Reported-by: Tomasz Figa <tomasz.figa@gmail.com> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Reviewed-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Tested-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sergey noticed a small but fatal mistake in __tty_insert_flip_char,
leading to an oops in an interrupt handler when using any serial
port.
The problem is that I accidentally took the tty_buffer pointer
before calling __tty_buffer_request_room(), which replaces the
buffer. This moves the pointer lookup to the right place after
allocating the new buffer space.
While working on improving the fast path of tty_insert_flip_char(),
I noticed that by calling tty_buffer_request_room(), we needlessly
move to the separate flag buffer mode for the tty, even when all
characters use TTY_NORMAL as the flag.
This changes the code to call __tty_buffer_request_room() with the
correct flag, which will then allocate a regular buffer when it rounds
out of space but no special flags have been used. I'm guessing that
this is the behavior that Peter Hurley intended when he introduced
the compacted flip buffers.
kernelci.org reports a crazy stack usage for the VT code when CONFIG_KASAN
is enabled:
drivers/tty/vt/keyboard.c: In function 'kbd_keycode':
drivers/tty/vt/keyboard.c:1452:1: error: the frame size of 2240 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
The problem is that tty_insert_flip_char() gets inlined many times into
kbd_keycode(), and also into other functions, and each copy requires 128
bytes for stack redzone to check for a possible out-of-bounds access on
the 'ch' and 'flags' arguments that are passed into
tty_insert_flip_string_flags as a variable-length string.
This introduces a new __tty_insert_flip_char() function for the slow
path, which receives the two arguments by value. This completely avoids
the problem and the stack usage goes back down to around 100 bytes.
Without KASAN, this is also slightly better, as we don't have to
spill the arguments to the stack but can simply pass 'ch' and 'flag'
in registers, saving a few bytes in .text for each call site.
This should be backported to linux-4.0 or later, which first introduced
the stack sanitizer in the kernel.
We should only see devices with interrupt endpoints. Ignore any other
endpoints that we find, so we don't send try to send them interrupt URBs
and trigger a WARN down in the USB stack.
If cpuhp_store_callbacks() is called for CPUHP_AP_ONLINE_DYN or
CPUHP_BP_PREPARE_DYN, which are the indicators for dynamically allocated
states, then cpuhp_store_callbacks() allocates a new dynamic state. The
first allocation in each range returns CPUHP_AP_ONLINE_DYN or
CPUHP_BP_PREPARE_DYN.
If cpuhp_remove_state() is invoked for one of these states, then there is
no protection against the allocation mechanism. So the removal, which
should clear the callbacks and the name, gets a new state assigned and
clears that one.
As a consequence the state which should be cleared stays initialized. A
consecutive CPU hotplug operation dereferences the state callbacks and
accesses either freed or reused memory, resulting in crashes.
Add a protection against this by checking the name argument for NULL. If
it's NULL it's a removal. If not, it's an allocation.
[ tglx: Added a comment and massaged changelog ]
Fixes: 5b7aa87e0482 ("cpu/hotplug: Implement setup/removal interface") Signed-off-by: Ethan Barnes <ethan.barnes@sandisk.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.or> Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu> Cc: Sebastian Siewior <bigeasy@linutronix.d> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/DM2PR04MB398242FC7776D603D9F99C894A60@DM2PR04MB398.namprd04.prod.outlook.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tree RCU guarantees that every online CPU has a memory barrier between
any given grace period and any of that CPU's RCU read-side sections that
must be ordered against that grace period. Since RCU doesn't always
know where read-side critical sections are, the actual implementation
guarantees order against prior and subsequent non-idle non-offline code,
whether in an RCU read-side critical section or not. As a result, there
does not need to be a memory barrier at the end of synchronize_rcu()
and friends because the ordering internal to the grace period has
ordered every CPU's post-grace-period execution against each CPU's
pre-grace-period execution, again for all non-idle online CPUs.
In contrast, SRCU can have non-idle online CPUs that are completely
uninvolved in a given SRCU grace period, for example, a CPU that
never runs any SRCU read-side critical sections and took no part in
the grace-period processing. It is in theory possible for a given
synchronize_srcu()'s wakeup to be delivered to a CPU that was completely
uninvolved in the prior SRCU grace period, which could mean that the
code following that synchronize_srcu() would end up being unordered with
respect to both the grace period and any pre-existing SRCU read-side
critical sections.
This commit therefore adds an smp_mb() to the end of __synchronize_srcu(),
which prevents this scenario from occurring.
Reported-by: Lance Roy <ldr709@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Lance Roy <ldr709@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we have a miss in one order of the mkey cache, we try to get
an mkey from a higher order.
We still need to check that the higher order can be used with UMR
before using it. Otherwise, we will get an mkey with 0 entries and
the post send operation that is used to fill it will complete with
the following error:
Section 9.7.7.2.5 of the 1.3 IBTA spec clearly says that receive
credits should never apply to RDMA write.
qib and hfi1 were doing that. The following situation will result
in a QP hang:
- A prior SEND or RDMA_WRITE with immmediate consumed the last
credit for a QP using RC receive buffer credits
- The prior op is acked so there are no more acks
- The peer ULP fails to post receive for some reason
- An RDMA write sees that the credits are exhausted and waits
- The peer ULP posts receive buffers
- The ULP posts a send or RDMA write that will be hung
The fix is to avoid the credit test for the RDMA write operation.
Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>