When registering for the Xenstore watch of the node control/sysrq the
handler will be called at once. Don't issue an error message if the
Xenstore node isn't there, as it will be created only when an event
is being triggered.
Currently, if the kernel is running on a POWER9 processor under a
hypervisor, it will try to use the radix MMU even though it doesn't have
the necessary code to use radix under a hypervisor (it doesn't negotiate
use of radix, and it doesn't do the H_REGISTER_PROC_TBL hcall). The
result is that the guest kernel will crash when it tries to turn on the
MMU.
This fixes it by looking for the /chosen/ibm,architecture-vec-5
property, and if it exists, clears the radix MMU feature bit, before we
decide whether to initialize for radix or HPT. This property is created
by the hypervisor as a result of the guest calling the
ibm,client-architecture-support method to indicate its capabilities, so
it will indicate whether the hypervisor agreed to us using radix.
Systems without a hypervisor may have this property also (for example,
skiboot creates it), so we check the HV bit in the MSR to see whether we
are running as a guest or not. If we are in hypervisor mode, then we can
do whatever we like including using the radix MMU.
The reason for using this property is that in future, when we have
support for using radix under a hypervisor, we will need to check this
property to see whether the hypervisor agreed to us using radix.
Fixes: 2bfd65e45e87 ("powerpc/mm/radix: Add radix callbacks for early init routines") Cc: stable@vger.kernel.org # v4.7+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
pci_lock is an IRQ-safe spinlock that protects all accesses to PCI
configuration space (see PCI_OP_READ() and PCI_OP_WRITE() in pci/access.c).
The pci_cfg_access_unlock() path acquires pci_lock, then p->pi_lock (inside
wake_up_all()). According to lockdep, there is a possible path involving
snbep_uncore_pci_read_counter() that could acquire them in the reverse
order: acquiring p->pi_lock, then pci_lock, which could result in a
deadlock. Lockdep details are in the bugzilla below.
Avoid the possible deadlock by dropping pci_lock before waking up any
config access waiters.
The size computations done in the ioctl function use an integer.
If userspace submits a request with req->cmd_nr or req->cmd_buf_nr
set to INT_MAX, the integer computations overflow later, leading
to potential (kernel) memory corruption.
Prevent this issue by enforcing a limit on the number of submitted
commands, so that we have enough headroom later for the size
computations.
Note that this change has no impact on the currently available
users in userspace, like e.g. libdrm/exynos.
While at it, also make a comment about the size computation more
detailed.
In fips mode only xts keys with 128 bit or 125 bit are allowed.
This fix extends the xts_aes_set_key function to check for these
valid key lengths in fips mode.
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The generate_entropy function used a sha256 for compacting
together 256 bits of entropy into 32 bytes hash. However, it
is questionable if a sha256 can really be used here, as
potential collisions may reduce the max entropy fitting into
a 32 byte hash value. So this batch introduces the use of
sha512 instead and the required buffer adjustments for the
calling functions.
Further more the working buffer for the generate_entropy
function has been widened from one page to two pages. So now
1024 stckf invocations are used to gather 256 bits of
entropy. This has been done to be on the save side if the
jitters of stckf values isn't as good as supposed.
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Request for a notification from a disconnected client will be ignored
silently by the FW but the caller should know that the operation hasn't
succeeded.
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch corrects an omission in bytcr_rt5640 and bytcr_rt5651.
All existing machine drivers shall not use .pm_ops to avoid a double
suspend, as initially implemented by 3f2dcbeaeb2b
("ASoC: Intel: Remove soc pm handling to allow platform driver handle it").
Reported-by: Shrirang Bagul <shrirang.bagul@canonical.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
may_create() rejects creation of inodes with ids which lack a
mapping into s_user_ns. However for O_CREAT may_o_create() is
is used instead. Add a similar check there.
Fixes: 036d523641c6 ("vfs: Don't create inodes with a uid or gid unknown to the vfs") Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This method may be unsupported (see: USB bus) or may just fail (see:
SDIO bus).
While at it rework logic in brcmf_sdio_bus_get_memdump function to avoid
too many conditional code nesting levels.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This issue is found by smatch; has been reported as-
Unchecked usage of potential ERR_PTR result in lmv_hsm_req_count
and lmv_hsm_req_build. Added ERR_PTR in both functions and also
return value check added.
This patch resolves IO vs eviction race.
After eviction failed export stayed at stale list,
a client had IO processing and reconnected during it.
A client sent brw rpc with last lock cookie and new connection.
The lock with failed export was found and assert was happened.
(ost_handler.c:1812:ost_prolong_lock_one())
ASSERTION( lock->l_export == opd->opd_exp ) failed:
1. Skip the lock at ldlm_handle2lock if lock export failed.
2. Validation of lock for IO was added at hpreq_check(). The lock
searching is based on granted interval tree. If server doesn`t
have a valid lock, it reply to client with ESTALE.
The function hai_dump_data_field will do a stack buffer
overrun when cat'ing /sys/fs/lustre/.../hsm/actions if an action has
some data in it.
hai_dump_data_field uses snprintf. But there is no check for
truncation, and the value returned by snprintf is used as-is. The
coordinator code calls hai_dump_data_field with 12 bytes in the
buffer. The 6th byte of data is printed incompletely to make room for
the terminating NUL. However snprintf still returns 2, so when
hai_dump_data_field writes the final NUL, it does it outside the
reserved buffer, in the 13th byte of the buffer. This stack buffer
overrun hangs my VM.
Fix by checking that there is enough room for the next 2 characters
plus the NUL terminator. Don't print half bytes. Change the format to
02X instead of .2X, which makes more sense.
Signed-off-by: frank zago <fzago@cray.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8171
Reviewed-on: http://review.whamcloud.com/20338 Reviewed-by: John L. Hammond <john.hammond@intel.com> Reviewed-by: Jean-Baptiste Riaux <riaux.jb@intel.com> Reviewed-by: Oleg Drokin <oleg.drokin@intel.com> Signed-off-by: James Simmons <jsimmons@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the driver is built as a module, autoload won't work because the module
alias information is not filled. So user-space can't match the registered
device with the corresponding module.
Export the module alias information using the MODULE_DEVICE_TABLE() macro.
Before this patch:
$ modinfo drivers/platform/x86/intel_mid_thermal.ko | grep alias
$
After this patch:
$ modinfo drivers/platform/x86/intel_mid_thermal.ko | grep alias
alias: platform:msic_thermal
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This kzalloc() could fail. Let's bail out with -ENOMEM here
instead of NULL dereferencing. That silences static checkers. We
should also cleanup on the error path even though this function
returning an error probably means the system won't boot.
Cc: Chen-Yu Tsai <wens@csie.org> Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With QCA4019 platform, SRAM address can be accessed directly from host but
currently, we are assuming sram addresses cannot be accessed directly and
hence we convert the addresses.
While there, clean up growing hw checks during conversion of target CPU
address to CE address. Now we have function pointer pertaining to different
chips.
Signed-off-by: Ashok Raj Nagarajan <arnagara@qti.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The DP83867 when not properly bootstrapped - especially with LED_0 pin -
can enter N/A MODE4 for "port mirroring" feature.
To provide normal operation of the PHY, one needs not only to explicitly
disable the port mirroring feature, but as well stop some IC internal
testing (which disables RGMII communication).
To do that the STRAP_STS1 (0x006E) register must be read and RESERVED bit
11 examined. When it is set, the another RESERVED bit (11) at PHYCR
(0x0010) register must be clear to disable testing mode and enable RGMII
communication.
Thorough explanation of the problem can be found at following e2e thread:
"DP83867IR: Problem with RESERVED bits in PHY Control Register (PHYCR) -
Linux driver"
This erratum describes a bug in logic outside the core, so MIDR can't be
used to identify its presence, and reading an SoC-specific revision
register from common arch timer code would be awkward. So, describe it
in the device tree.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
clk_prepare_enable() may fail, so we should better check its return
value.
Also place the of_node_put() function right after clk_prepare_enable(),
in order to avoid calling of_node_put() twice in case clk_prepare_enable()
fails.
When we send a deauth to a station we don't know about, we
need to use the PROBE_RESP queue. This can happen when we
send a deauth to a station that is not associated to us.
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes the condition where the controller has not fully completed its
final transfer and leaves the bus and controller in a undesirable state.
At the end of the last transmitted byte, the existing driver would just
signal for a STOP condition to be transmitted then immediately signal
completion. However, the full STOP procedure might not have fully taken
place by the time the runtime PM shuts off the peripheral clock, leaving
the bus in a suspended state.
Alternatively, the STOP condition on the bus may have completed, but when
the next transaction is requested by the upper layer, not all the
necessary register cleanup was finished from the last transfer which made
the driver return BUS BUSY when it really wasn't.
This patch now makes all transmit and receive transactions wait for the
STOP condition to fully complete before signaling a completed transaction.
With this new method, runtime PM no longer seems to be an issue.
Fixes: 310c18a41450 ("i2c: riic: add driver") Signed-off-by: Chris Brandt <chris.brandt@renesas.com> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
and we try to allocate 64MB extent, we will end up directly in
ext4_mb_complex_scan_group(). This is because the request is detected
as power-of-two allocation (so we start in ext4_mb_regular_allocator()
with ac_criteria == 0) however the check before
ext4_mb_simple_scan_group() refuses the direct buddy scan because the
allocation request is too large. Since cr == 0, the check whether we
should use ext4_mb_scan_aligned() fails as well and we fall back to
ext4_mb_complex_scan_group().
Fix the problem by checking for upper limit on power-of-two requests
directly when detecting them.
Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
drivers/net/ethernet/marvell/mvneta.c:2694:26: error: storage size of 'status' isn't known
drivers/net/ethernet/marvell/mvneta.c:2695:26: error: storage size of 'changed' isn't known
drivers/net/ethernet/marvell/mvneta.c:2695:9: error: variable 'changed' has initializer but incomplete type
drivers/net/ethernet/marvell/mvneta.c:2709:2: error: implicit declaration of function 'fixed_phy_update_state' [-Werror=implicit-function-declaration]
Add linux/phy_fixed.h to mvneta.c
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the current boot, clients making use of the AB8500 sysctrl
may be probed before the ab8500-sysctrl driver. This gives them
-EINVAL, but should rather give -EPROBE_DEFER.
Before this, the abx500 clock driver didn't probe properly,
and as a result the codec driver in turn using the clocks did
not probe properly. After this patch, everything probes
properly.
Also add OF compatible-string probing. This driver is all
device tree, so let's just make a drive-by-fix of that as
well.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
drivers/mmc/host/s3cmci.h:69:24: error: field 'pio_tasklet' has incomplete type
struct tasklet_struct pio_tasklet;
drivers/mmc/host/s3cmci.c: In function 's3cmci_enable_irq':
drivers/mmc/host/s3cmci.c:390:4: error: implicit declaration of function 'enable_irq';did you mean 'enable_imask'? [-Werror=implicit-function-declaration]
While I haven't found out why this happened now and not earlier, the
solution is obvious, we should include the header that defines
the structure.
Compiling the fsl-mc bus driver will yield a couple of static analysis
errors:
warning: symbol 'fsl_mc_msi_domain_alloc_irqs' was not declared
warning: symbol 'fsl_mc_msi_domain_free_irqs' was not declared.
warning: symbol 'its_fsl_mc_msi_init' was not declared.
warning: symbol 'its_fsl_mc_msi_cleanup' was not declared.
Since these are properly declared, but the header is not included, add
it in the source files. This way the symbol is properly exported.
If new_policy is set in cpufreq_online(), the policy object has just
been created and its real_cpus mask has been zeroed on allocation,
and the driver's ->init() callback should not touch it.
It doesn't need to be cleared again, so don't do that.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 6575257c60e1 ("tracing/samples: Fix creation and deletion of
simple_thread_fn creation") introduced a new warning due to using a
boolean as a counter.
Just make it "int".
Fixes: 6575257c60e1 ("tracing/samples: Fix creation and deletion of simple_thread_fn creation") Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 7496946a8 ("tracing: Add samples of DECLARE_EVENT_CLASS() and
DEFINE_EVENT()") added template examples for all the events. It created a
DEFINE_EVENT_FN() example which reused the foo_bar_reg and foo_bar_unreg
functions.
Enabling both the TRACE_EVENT_FN() and DEFINE_EVENT_FN() example trace
events caused the foo_bar_reg to be called twice, creating the test thread
twice. The foo_bar_unreg would remove it only once, even if it was called
multiple times, leaving a thread existing when the module is unloaded,
causing an oops.
Add a ref count and allow foo_bar_reg() and foo_bar_unreg() be called by
multiple trace events.
Fixes: 7496946a8 ("tracing: Add samples of DECLARE_EVENT_CLASS() and DEFINE_EVENT()") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We recently added an integer overflow check but it needs an additional
tweak to work properly on 32 bit systems.
The problem is that we're doing the right hand side of the assignment as
type unsigned long so the max it will have an integer overflow instead
of being larger than SIZE_MAX. That means the "sz > SIZE_MAX" condition
is never true even on 32 bit systems. We need to first cast it to u64
and then do the math.
Fixes: 4a630fadbb29 ("drm/msm: Fix potential buffer overflow issue") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@gmail.com> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In function submit_create, if nr_cmds or nr_bos is assigned with
negative value, the allocated buffer may be small than intended.
Using this buffer will lead to buffer overflow issue.
Signed-off-by: Kasin Li <donglil@codeaurora.org> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@gmail.com> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Per my reading of the eDP spec, DP_DPCD_DISPLAY_CONTROL_CAPABLE bit in
DP_EDP_CONFIGURATION_CAP should be set if the eDP display control
registers starting at offset DP_EDP_DPCD_REV are "enabled". Currently we
check the bit before reading the registers, and DP_EDP_DPCD_REV is the
only way to detect eDP revision.
Turns out there are (likely buggy) displays that require eDP 1.4+
features, such as supported link rates and link rate select, but do not
have the bit set. Read the display control registers
unconditionally. They are supposed to read zero anyway if they are not
supported, so there should be no harm in this.
This fixes the referenced bug by enabling the eDP version check, and
thus reading of the supported link rates. The panel in question has 0 in
DP_MAX_LINK_RATE which is only supported in eDP 1.4+. Without the
supported link rates method we default to RBR which is insufficient for
the panel native mode. As a curiosity, the panel also has a bogus value
of 0x12 in DP_EDP_DPCD_REV, but that passes our check for >= DP_EDP_14
(which is 0x03).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103400 Reported-and-tested-by: Nicolas P. <issun.artiste@gmail.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Manasi Navare <manasi.d.navare@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171026142932.17737-1-jani.nikula@intel.com
(cherry picked from commit 0501a3b0eb01ac2209ef6fce76153e5d6b07034e) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The first cluster group descriptor is not stored at the start of the
group but at an offset from the start. We need to take this into
account while doing fstrim on the first cluster group. Otherwise we
will wrongly start fstrim a few blocks after the desired start block and
the range can cross over into the next cluster group and zero out the
group descriptor there. This can cause filesytem corruption that cannot
be fixed by fsck.
Link: http://lkml.kernel.org/r/1507835579-7308-1-git-send-email-ashish.samant@oracle.com Signed-off-by: Ashish Samant <ashish.samant@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes init failures on polaris cards with harvested UVD.
Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The asm-generic/unaligned.h header provides two different implementations
for accessing unaligned variables: the access_ok.h version used when
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is set pretends that all pointers
are in fact aligned, while the le_struct.h version convinces gcc that the
alignment of a pointer is '1', to make it issue the correct load/store
instructions depending on the architecture flags.
On ARMv5 and older, we always use the second version, to let the compiler
use byte accesses. On ARMv6 and newer, we currently use the access_ok.h
version, so the compiler can use any instruction including stm/ldm and
ldrd/strd that will cause an alignment trap. This trap can significantly
impact performance when we have to do a lot of fixups and, worse, has
led to crashes in the LZ4 decompressor code that does not have a trap
handler.
This adds an ARM specific version of asm/unaligned.h that uses the
le_struct.h/be_struct.h implementation unconditionally. This should lead
to essentially the same code on ARMv6+ as before, with the exception of
using regular load/store instructions instead of the trapping instructions
multi-register variants.
The crash in the LZ4 decompressor code was probably introduced by the
patch replacing the LZ4 implementation, commit 4e1a33b105dd ("lib: update
LZ4 compressor module"), so linux-4.11 and higher would be affected most.
However, we probably want to have this backported to all older stable
kernels as well, to help with the performance issues.
There are two follow-ups that I think we should also work on, but not
backport to stable kernels, first to change the asm-generic version of
the header to remove the ARM special case, and second to review all
other uses of CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS to see if they
might be affected by the same problem on ARM.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a exception is trapped to EL2, hardware uses ELR_ELx to hold
the current fault instruction address. If KVM wants to inject a
abort to 32 bit guest, it needs to set the LR register for the
guest to emulate this abort happened in the guest. Because ARM32
architecture is pipelined execution, so the LR value has an offset to
the fault instruction address.
The offsets applied to Link value for exceptions as shown below,
which should be added for the ARM32 link register(LR).
Table taken from ARMv8 ARM DDI0487B-B, table G1-10:
Exception Offset, for PE state of:
A32 T32
Undefined Instruction +4 +2
Prefetch Abort +4 +4
Data Abort +8 +8
IRQ or FIQ +4 +4
[ Removed unused variables in inject_abt to avoid compile warnings.
-- Christoffer ]
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com> Tested-by: Haibin Zhang <zhanghaibin7@huawei.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ADC in the ADAU1361 (and possibly other Analog Devices codecs)
exhibits a cyclic variation in the noise floor (in our test setup between
-87 and -93 dB), a new value being attained within this range whenever a
new capture stream is started. The cycle repeats after about 10 or 11
restarts.
The workaround recommended by the manufacturer is to toggle the ADOSR bit
in the Converter Control 0 register each time a new capture stream is
started.
I have verified that the patch fixes this problem on the ADAU1361, and
according to the manufacturer toggling the bit in question in this manner
will at least have no detrimental effect on other chips served by this
driver.
Signed-off-by: Ricard Wanderlof <ricardw@axis.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
syzkaller with KASAN reported an out-of-bounds read in
asn1_ber_decoder(). It can be reproduced by the following command,
assuming CONFIG_X509_CERTIFICATE_PARSER=y and CONFIG_KASAN=y:
keyctl add asymmetric desc $'\x30\x30' @s
The bug is that the length of an ASN.1 data value isn't validated in the
case where it is encoded using the short form, causing the decoder to
read past the end of the input buffer. Fix it by validating the length.
The bug report was:
BUG: KASAN: slab-out-of-bounds in asn1_ber_decoder+0x10cb/0x1730 lib/asn1_decoder.c:233
Read of size 1 at addr ffff88003cccfa02 by task syz-executor0/6818
Commit e645016abc80 ("KEYS: fix writing past end of user-supplied buffer
in keyring_read()") made keyring_read() stop corrupting userspace memory
when the user-supplied buffer is too small. However it also made the
return value in that case be the short buffer size rather than the size
required, yet keyctl_read() is actually documented to return the size
required. Therefore, switch it over to the documented behavior.
Note that for now we continue to have it fill the short buffer, since it
did that before (pre-v3.13) and dump_key_tree_aux() in keyutils arguably
relies on it.
Fixes: e645016abc80 ("KEYS: fix writing past end of user-supplied buffer in keyring_read()") Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Signed-off-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: David Disseldorp <ddiss@samba.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
syzkaller reported the lockdep splat due to the possible deadlock of
grp->list_mutex of each sequencer client object. Actually this is
rather a false-positive report due to the missing nested lock
annotations. The sequencer client may deliver the event directly to
another client which takes another own lock.
For addressing this issue, this patch replaces the simple down_read()
with down_read_nested(). As a lock subclass, the already existing
"hop" can be re-used, which indicates the depth of the call.
The races among ioctl and other operations were protected by the
commit af368027a49a ("ALSA: timer: Fix race among timer ioctls") and
later fixes, but one code path was forgotten in the scenario: the
32bit compat ioctl. As syzkaller recently spotted, a very similar
use-after-free may happen with the combination of compat ioctls.
The fix is simply to apply the same ioctl_lock to the compat_ioctl
callback, too.
In eCryptfs, we failed to verify that the authentication token keys are
not revoked before dereferencing their payloads, which is problematic
because the payload of a revoked key is NULL. request_key() *does* skip
revoked keys, but there is still a window where the key can be revoked
before we acquire the key semaphore.
Fix it by updating ecryptfs_get_key_payload_data() to return
-EKEYREVOKED if the key payload is NULL. For completeness we check this
for "encrypted" keys as well as "user" keys, although encrypted keys
cannot be revoked currently.
Alternatively we could use key_validate(), but since we'll also need to
fix ecryptfs_get_key_payload_data() to validate the payload length, it
seems appropriate to just check the payload pointer.
Fixes: 237fead61998 ("[PATCH] ecryptfs: fs/Makefile and fs/Kconfig") Reviewed-by: James Morris <james.l.morris@oracle.com> Cc: Michael Halcrow <mhalcrow@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The device tree nodes all correctly describe the regulators as
syr827 or syr828, but the I2C device id is currently set to the
wildcard value of syr82x in the driver. This causes udev to fail
to match the driver module with the modalias data from sysfs.
Fix this by replacing the I2C device ids with ones that match the
device tree descriptions, with syr827 and syr828. Tested on
Firefly rk3288 board. The syr82x id was not used anywhere.
Fixes: e80c47bd738b (regulator: fan53555: Export I2C module alias information) Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
An independent security researcher, Mohamed Ghannam, has reported
this vulnerability to Beyond Security's SecuriTeam Secure Disclosure
program.
The xfrm_dump_policy_done function expects xfrm_dump_policy to
have been called at least once or it will crash. This can be
triggered if a dump fails because the target socket's receive
buffer is full.
This patch fixes it by using the cb->start mechanism to ensure that
the initialisation is always done regardless of the buffer situation.
Fixes: 12a169e7d8f4 ("ipsec: Put dumpers on the dump list") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If we try to connect while already connected/connecting, but
this fails, we set ssid_len=0 but leave current_bss hanging,
leading to errors.
Check all of this better, first of all ensuring that we can't
try to connect to a different SSID while connected/ing; ensure
that prev_bssid is set for re-association attempts even in the
case of the driver supporting the connect() method, and don't
reset ssid_len in the failure cases.
While at it, also reset ssid_len while disconnecting unless we
were connected and expect a disconnected event, and warn on a
successful connection without ssid_len being set.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To avoid kernel warning "Unhandled message (68)", ignore the
CMD_FLUSH_QUEUE_REPLY message for now.
As of Leaf v2 firmware version v4.1.844 (2017-02-15), flush tx queue is
synchronous. There is a capability bit indicating whether flushing tx
queue is synchronous or asynchronous.
A proper solution would be to query the device for capabilities. If the
synchronous tx flush capability bit is set, we should wait for
CMD_FLUSH_QUEUE_REPLY message, while flushing the tx queue.
Signed-off-by: Jimmy Assarsson <jimmyassarsson@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 109bade9c625 ("scsi: sg: use standard lists for sg_requests")
introduced an off-by-one error in sg_ioctl(), which was fixed by commit bd46fc406b30 ("scsi: sg: off by one in sg_ioctl()").
Unfortunately commit 4759df905a47 ("scsi: sg: factor out
sg_fill_request_table()") moved that code, and reintroduced the
bug (perhaps due to a botched rebase). Fix it again.
Fixes: 4759df905a47 ("scsi: sg: factor out sg_fill_request_table()") Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
v4.10 commit 6f2ce1c6af37 ("scsi: zfcp: fix rport unblock race with LUN
recovery") extended accessing parent pointer fields of struct
zfcp_erp_action for tracing. If an erp_action has never been enqueued
before, these parent pointer fields are uninitialized and NULL. Examples
are zfcp objects freshly added to the parent object's children list,
before enqueueing their first recovery subsequently. In
zfcp_erp_try_rport_unblock(), we iterate such list. Accessing erp_action
fields can cause a NULL pointer dereference. Since the kernel can read
from lowcore on s390, it does not immediately cause a kernel page
fault. Instead it can cause hangs on trying to acquire the wrong
erp_action->adapter->dbf->rec_lock in zfcp_dbf_rec_action_lvl()
^bogus^
while holding already other locks with IRQs disabled.
Real life example from attaching lots of LUNs in parallel on many CPUs:
crash> bt 17723
PID: 17723 TASK: ... CPU: 25 COMMAND: "zfcperp0.0.1800"
LOWCORE INFO:
-psw : 0x0404300180000000 0x000000000038e424
-function : _raw_spin_lock_wait_flags at 38e424
...
#0 [fdde8fc90] zfcp_dbf_rec_action_lvl at 3e0004e9862 [zfcp]
#1 [fdde8fce8] zfcp_erp_try_rport_unblock at 3e0004dfddc [zfcp]
#2 [fdde8fd38] zfcp_erp_strategy at 3e0004e0234 [zfcp]
#3 [fdde8fda8] zfcp_erp_thread at 3e0004e0a12 [zfcp]
#4 [fdde8fe60] kthread at 173550
#5 [fdde8feb8] kernel_thread_starter at 10add2
crash> zfcp_unit <address>
struct zfcp_unit {
erp_action = {
adapter = 0x0,
port = 0x0,
unit = 0x0,
},
}
zfcp_erp_action is always fully embedded into its container object. Such
container object is never moved in its object tree (only add or delete).
Hence, erp_action parent pointers can never change.
To fix the issue, initialize the erp_action parent pointers before
adding the erp_action container to any list and thus before it becomes
accessible from outside of its initializing function.
In order to also close the time window between zfcp_erp_setup_act()
memsetting the entire erp_action to zero and setting the parent pointers
again, drop the memset and instead explicitly initialize individually
all erp_action fields except for parent pointers. To be extra careful
not to introduce any other unintended side effect, even keep zeroing the
erp_action fields for list and timer. Also double-check with
WARN_ON_ONCE that erp_action parent pointers never change, so we get to
know when we would deviate from previous behavior.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 6f2ce1c6af37 ("scsi: zfcp: fix rport unblock race with LUN recovery") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix a case in the assoc_array implementation in which a new leaf is
added that needs to go into a node that happens to be full, where the
existing leaves in that node cluster together at that level to the
exclusion of new leaf.
What needs to happen is that the existing leaves get moved out to a new
node, N1, at level + 1 and the existing node needs replacing with one,
N0, that has pointers to the new leaf and to N1.
The code that tries to do this gets this wrong in two ways:
(1) The pointer that should've pointed from N0 to N1 is set to point
recursively to N0 instead.
(2) The backpointer from N0 needs to be set correctly in the case N0 is
either the root node or reached through a shortcut.
Fix this by removing this path and using the split_node path instead,
which achieves the same end, but in a more general way (thanks to Eric
Biggers for spotting the redundancy).
The problem manifests itself as:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: assoc_array_apply_edit+0x59/0xe5
Fixes: 3cb989501c26 ("Add a generic associative array implementation.") Reported-and-tested-by: WU Fan <u3536072@connect.hku.hk> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
parse_hid_report_descriptor() has a while (i < length) loop, which
only guarantees that there's at least 1 byte in the buffer, but the
loop body can read multiple bytes which causes out-of-bounds access.
In case gntdev_mmap() succeeds only partially in mapping grant pages
it will leave some vital information uninitialized needed later for
cleanup. This will lead to an out of bounds array access when unmapping
the already mapped pages.
So just initialize the data needed for unmapping the pages a little bit
earlier.
Reported-by: Arthur Borsboom <arthurborsboom@gmail.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There was an inversion in how the error path in bcm_qspi_probe() is done
which would make us trip over a KASAN use-after-free report. Turns out
that qspi->dev_ids does not get allocated until later in the probe
process. Fix this by introducing a new lable: qspi_resource_err which
takes care of cleaning up the SPI master instance.
Fixes: fa236a7ef240 ("spi: bcm-qspi: Add Broadcom MSPI driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The SPI_IOC_MESSAGE() macro references _IOC_SIZEBITS. Add linux/ioctl.h
to make sure this macro is defined. This fixes the following build
failure of lcdproc with the musl libc:
In file included from .../sysroot/usr/include/sys/ioctl.h:7:0,
from hd44780-spi.c:31:
hd44780-spi.c: In function 'spi_transfer':
hd44780-spi.c:89:24: error: '_IOC_SIZEBITS' undeclared (first use in this function)
status = ioctl(p->fd, SPI_IOC_MESSAGE(1), &xfer);
^
Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This happens because when using the global KVM fd with
KVM_CHECK_EXTENSION, kvm_vm_ioctl_check_extension() gets
called with a NULL kvm argument, which gets dereferenced
in is_kvmppc_hv_enabled(). Spotted while reading the code.
Let's use the hv_enabled fallback variable, like everywhere
else in this function.
Fixes: 23528bb21ee2 ("KVM: PPC: Introduce KVM_CAP_PPC_HTM") Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
xhci_stop_device() calls xhci_queue_stop_endpoint() multiple times
without checking the return value. xhci_queue_stop_endpoint() can
return error if the HC is already halted or unable to queue commands.
This can cause a deadlock condition as xhci_stop_device() would
end up waiting indefinitely for a completion for the command that
didn't get queued. Fix this by checking the return value and bailing
out of xhci_stop_device() in case of error. This patch happens to fix
potential memory leaks of the allocated command structures as well.
We have several Dell laptops which use the codec alc236, the headset
mic can't work on these machines. Following the commit 736f20a70, we
add the pin cfg table to make the headset mic work.
Josef reported a HARDIRQ-safe -> HARDIRQ-unsafe lock order detected by
lockdep:
[ 1270.472259] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
[ 1270.472783] 4.14.0-rc1-xfstests-12888-g76833e8 #110 Not tainted
[ 1270.473240] -----------------------------------------------------
[ 1270.473710] kworker/u5:2/5157 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[ 1270.474239] (&(&lock->wait_lock)->rlock){+.+.}, at: [<ffffffff8da253d2>] __mutex_unlock_slowpath+0xa2/0x280
[ 1270.474994]
[ 1270.474994] and this task is already holding:
[ 1270.475440] (&pool->lock/1){-.-.}, at: [<ffffffff8d2992f6>] worker_thread+0x366/0x3c0
[ 1270.476046] which would create a new lock dependency:
[ 1270.476436] (&pool->lock/1){-.-.} -> (&(&lock->wait_lock)->rlock){+.+.}
[ 1270.476949]
[ 1270.476949] but this new dependency connects a HARDIRQ-irq-safe lock:
[ 1270.477553] (&pool->lock/1){-.-.}
...
[ 1270.488900] to a HARDIRQ-irq-unsafe lock:
[ 1270.489327] (&(&lock->wait_lock)->rlock){+.+.}
...
[ 1270.494735] Possible interrupt unsafe locking scenario:
[ 1270.494735]
[ 1270.495250] CPU0 CPU1
[ 1270.495600] ---- ----
[ 1270.495947] lock(&(&lock->wait_lock)->rlock);
[ 1270.496295] local_irq_disable();
[ 1270.496753] lock(&pool->lock/1);
[ 1270.497205] lock(&(&lock->wait_lock)->rlock);
[ 1270.497744] <Interrupt>
[ 1270.497948] lock(&pool->lock/1);
, which will cause a irq inversion deadlock if the above lock scenario
happens.
The root cause of this safe -> unsafe lock order is the
mutex_unlock(pool->manager_arb) in manage_workers() with pool->lock
held.
Unlocking mutex while holding an irq spinlock was never safe and this
problem has been around forever but it never got noticed because the
only time the mutex is usually trylocked while holding irqlock making
actual failures very unlikely and lockdep annotation missed the
condition until the recent b9c16a0e1f73 ("locking/mutex: Fix
lockdep_assert_held() fail").
Using mutex for pool->manager_arb has always been a bit of stretch.
It primarily is an mechanism to arbitrate managership between workers
which can easily be done with a pool flag. The only reason it became
a mutex is that pool destruction path wants to exclude parallel
managing operations.
This patch replaces the mutex with a new pool flag POOL_MANAGER_ACTIVE
and make the destruction path wait for the current manager on a wait
queue.
v2: Drop unnecessary flag clearing before pool destruction as
suggested by Boqun.
Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the file /proc/fs/fscache/objects (available with
CONFIG_FSCACHE_OBJECT_LIST=y) is opened, we request a user key with
description "fscache:objlist", then access its payload. However, a
revoked key has a NULL payload, and we failed to check for this.
request_key() *does* skip revoked keys, but there is still a window
where the key can be revoked before we access its payload.
Fix it by checking for a NULL payload, treating it like a key which was
already revoked at the time it was requested.
Fixes: 4fbf4291aa15 ("FS-Cache: Allow the current state of all objects to be dumped") Reviewed-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Consolidate KEY_FLAG_INSTANTIATED, KEY_FLAG_NEGATIVE and the rejection
error into one field such that:
(1) The instantiation state can be modified/read atomically.
(2) The error can be accessed atomically with the state.
(3) The error isn't stored unioned with the payload pointers.
This deals with the problem that the state is spread over three different
objects (two bits and a separate variable) and reading or updating them
atomically isn't practical, given that not only can uninstantiated keys
change into instantiated or rejected keys, but rejected keys can also turn
into instantiated keys - and someone accessing the key might not be using
any locking.
The main side effect of this problem is that what was held in the payload
may change, depending on the state. For instance, you might observe the
key to be in the rejected state. You then read the cached error, but if
the key semaphore wasn't locked, the key might've become instantiated
between the two reads - and you might now have something in hand that isn't
actually an error code.
The state is now KEY_IS_UNINSTANTIATED, KEY_IS_POSITIVE or a negative error
code if the key is negatively instantiated. The key_is_instantiated()
function is replaced with key_is_positive() to avoid confusion as negative
keys are also 'instantiated'.
Additionally, barriering is included:
(1) Order payload-set before state-set during instantiation.
(2) Order state-read before payload-read when using the key.
Further separate barriering is necessary if RCU is being used to access the
payload content after reading the payload pointers.
Fixes: 146aa8b1453b ("KEYS: Merge the type-specific data with the payload data") Reported-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When an fscrypt-encrypted file is opened, we request the file's master
key from the keyrings service as a logon key, then access its payload.
However, a revoked key has a NULL payload, and we failed to check for
this. request_key() *does* skip revoked keys, but there is still a
window where the key can be revoked before we acquire its semaphore.
Fix it by checking for a NULL payload, treating it like a key which was
already revoked at the time it was requested.
Fixes: 88bd6ccdcdd6 ("ext4 crypto: add encryption key management facilities") Reviewed-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The writeback rework in commit fbcc02561359 ("xfs: Introduce
writeback context for writepages") introduced a subtle change in
behavior with regard to the block mapping used across the
->writepages() sequence. The previous xfs_cluster_write() code would
only flush pages up to EOF at the time of the writepage, thus
ensuring that any pages due to file-extending writes would be
handled on a separate cycle and with a new, updated block mapping.
The updated code establishes a block mapping in xfs_writepage_map()
that could extend beyond EOF if the file has post-eof preallocation.
Because we now use the generic writeback infrastructure and pass the
cached mapping to each writepage call, there is no implicit EOF
limit in place. If eofblocks trimming occurs during ->writepages(),
any post-eof portion of the cached mapping becomes invalid. The
eofblocks code has no means to serialize against writeback because
there are no pages associated with post-eof blocks. Therefore if an
eofblocks trim occurs and is followed by a file-extending buffered
write, not only has the mapping become invalid, but we could end up
writing a page to disk based on the invalid mapping.
Consider the following sequence of events:
- A buffered write creates a delalloc extent and post-eof
speculative preallocation.
- Writeback starts and on the first writepage cycle, the delalloc
extent is converted to real blocks (including the post-eof blocks)
and the mapping is cached.
- The file is closed and xfs_release() trims post-eof blocks. The
cached writeback mapping is now invalid.
- Another buffered write appends the file with a delalloc extent.
- The concurrent writeback cycle picks up the just written page
because the writeback range end is LLONG_MAX. xfs_writepage_map()
attributes it to the (now invalid) cached mapping and writes the
data to an incorrect location on disk (and where the file offset is
still backed by a delalloc extent).
This problem is reproduced by xfstests test generic/464, which
triggers racing writes, appends, open/closes and writeback requests.
To address this problem, trim the mapping used during writeback to
within EOF when the mapping is validated. This ensures the mapping
is revalidated for any pages encountered beyond EOF as of the time
the current mapping was cached or last validated.
Reported-by: Eryu Guan <eguan@redhat.com> Diagnosed-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Recently we've had warnings arise from the vm handing us pages
without bufferheads attached to them. This should not ever occur
in XFS, but we don't defend against it properly if it does. The only
place where we remove bufferheads from a page is in
xfs_vm_releasepage(), but we can't tell the difference here between
"page is dirty so don't release" and "page is dirty but is being
invalidated so release it".
In some places that are invalidating pages ask for pages to be
released and follow up afterward calling ->releasepage by checking
whether the page was dirty and then aborting the invalidation. This
is a possible vector for releasing buffers from a page but then
leaving it in the mapping, so we really do need to avoid dirty pages
in xfs_vm_releasepage().
To differentiate between invalidated pages and normal pages, we need
to clear the page dirty flag when invalidating the pages. This can
be done through xfs_vm_invalidatepage(), and will result
xfs_vm_releasepage() seeing the page as clean which matches the
bufferhead state on the page after calling block_invalidatepage().
Hence we can re-add the page dirty check in xfs_vm_releasepage to
catch the case where we might be releasing a page that is actually
dirty and so should not have the bufferheads on it removed. This
will remove one possible vector of "dirty page with no bufferheads"
and so help narrow down the search for the root cause of that
problem.
Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jason reported that a corrupted filesystem failed to replay
the log with a metadata block out of bounds warning:
XFS (dm-2): _xfs_buf_find: Block out of range: block 0x80270fff8, EOFS 0x9c40000
_xfs_buf_find() and xfs_btree_get_bufs() return NULL if
that happens, and then when xfs_alloc_fix_freelist() calls
xfs_trans_binval() on that NULL bp, we oops with:
BUG: unable to handle kernel NULL pointer dereference at 00000000000000f8
We don't handle _xfs_buf_find errors very well, every
caller higher up the stack gets to guess at why it failed.
But we should at least handle it somehow, so return
EFSCORRUPTED here.
Reported-by: Jason L Tibbitts III <tibbs@math.uh.edu> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
xfs_attr3_root_inactive() walks the attr fork tree to invalidate the
associated blocks. xfs_attr3_node_inactive() recursively descends
from internal blocks to leaf blocks, caching block address values
along the way to revisit parent blocks, locate the next entry and
descend down that branch of the tree.
The code that attempts to reread the parent block is unsafe because
it assumes that the local xfs_da_node_entry pointer remains valid
after an xfs_trans_brelse() and re-read of the parent buffer. Under
heavy memory pressure, it is possible that the buffer has been
reclaimed and reallocated by the time the parent block is reread.
This means that 'btree' can point to an invalid memory address, lead
to a random/garbage value for child_fsb and cause the subsequent
read of the attr fork to go off the rails and return a NULL buffer
for an attr fork offset that is most likely not allocated.
Note that this problem can be manufactured by setting
XFS_ATTR_BTREE_REF to 0 to prevent LRU caching of attr buffers,
creating a file with a multi-level attr fork and removing it to
trigger inactivation.
To address this problem, reinit the node/btree pointers to the
parent buffer after it has been re-read. This ensures btree points
to a valid record and allows the walk to proceed.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>