The get_dumpable() return value is not boolean. Most users of the
function actually want to be testing for non-SUID_DUMP_USER(1) rather than
SUID_DUMP_DISABLE(0). The SUID_DUMP_ROOT(2) is also considered a
protected state. Almost all places did this correctly, excepting the two
places fixed in this patch.
Wrong logic:
if (dumpable == SUID_DUMP_DISABLE) { /* be protective */ }
or
if (dumpable == 0) { /* be protective */ }
or
if (!dumpable) { /* be protective */ }
Correct logic:
if (dumpable != SUID_DUMP_USER) { /* be protective */ }
or
if (dumpable != 1) { /* be protective */ }
Without this patch, if the system had set the sysctl fs/suid_dumpable=2, a
user was able to ptrace attach to processes that had dropped privileges to
that user. (This may have been partially mitigated if Yama was enabled.)
The macros have been moved into the file that declares get/set_dumpable(),
which means things like the ia64 code can see them too.
Many btusb devices have 2 modes, a hid mode and a bluetooth hci mode. These
devices default to hid mode for BIOS use. This means that after having been
reset they will revert to HID mode, and are no longer usable as a HCI.
Therefor it is a very bad idea to just blindly make reset_resume point to
the regular resume handler. Note that the btusb driver has no clue how to
switch these devices from hid to hci mode, this is done in userspace through
udev rules, so the proper way to deal with this is to not have a reset-resume
handler and instead let the usb-system re-enumerate the device, and re-run
the udev rules.
I must also note, that the commit message for the commit causing this
problem has a very weak motivation for the change:
"Add missing reset_resume dev_pm_ops. Missing reset_resume results in the
following message after power management device test. This change sets
reset_resume to btusb_resume().
[ 2506.936134] btusb 1-1.5:1.0: no reset_resume for driver btusb?
[ 2506.936137] btusb 1-1.5:1.1: no reset_resume for driver btusb?"
Making a change solely to silence a warning while also changing important
behavior (normal resume handling versus re-enumeration) requires a commit
message with a proper explanation why it is safe to do so, which clearly lacks
here, and unsurprisingly it turns out to not be safe to make this change.
Reverting the commit in question fixes bt no longer working on my Dell
E6430 after a suspend/resume, and I believe it likely also fixes the
following bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=988481
https://bugzilla.redhat.com/show_bug.cgi?id=1010649
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1213239
In commit 3d81535ea5940446510a8a5cee1c6ad23c90c753
(rt2800: 5592: add chip specific vgc calculations)
the rt2800_link_tuner function has been modified to
adjust VGC level for the RT5592 chipset.
On the RT5592 chipset, the VGC level must be adjusted
only if rssi is greater than -65. However the current
code adjusts the VGC value by 0x10 regardless of the
actual chipset if the rssi value is between -80 and
-65.
Fix the broken behaviour by reordering the if-else
statements.
Signed-off-by: Gabor Juhos <juhosg@openwrt.org> Acked-by: Stanislaw Gruszka <stf_xl@wp.pl> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit "rt2x00: fix HT TX descriptor settings regression"
assumes that the control parameter to rt2x00mac_tx is always non-NULL.
There is an internal call in rt2x00lib_bc_buffer_iter where NULL is
passed. Fix the resulting crash by adding an initialized dummy on-stack
ieee80211_tx_control struct.
Signed-off-by: Felix Fietkau <nbd@openwrt.org> Acked-by: Gertjan van Wingerde <gwingerde@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
`comedi_alloc_spriv()` allocates private storage for a comedi subdevice
and sets the `SRF_FREE_SPRIV` flag in the `runflags` member of the
subdevice to allow the private storage to be automatically freed when
the comedi device is being cleaned up. Unfortunately, the flag gets
clobbered by `do_cmd_ioctl()` which calls
`comedi_set_subdevice_runflags()` with a mask value `~0` and only the
`SRF_USER` and `SRF_RUNNING` flags set, all the other SRF flags being
cleared.
Change the calls to `comedi_set_subdevice_runflags()` that currently use
a mask value of `~0` to use a more relevant mask value. For
`do_cmd_ioctl()`, the relevant SRF flags are `SRF_USER`, `SRF_ERROR` and
`SRF_RUNNING`. (At one time, `SRF_RT` would be included in that set of
flags, but it is no longer used.) For `comedi_alloc_spriv()` replace
the call to `comedi_set_subdevice_runflags()` with a simple
OR-assignment to avoid unnecessary use of a spin-lock.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fixes the bug in reset_store caused by accessing NULL pointer.
The bdev gets its value from bdget_disk() which could fail when memory
pressure is severe and hence can return NULL because allocation of
inode in bdget could fail.
Hence, this patch introduces a check for bdev to prevent reference to a
NULL pointer in the later part of the code. It also removes unnecessary
check of bdev for fsync_bdev().
According to the ACPI spec (5.0, Section 6.3.5), the "Device
insertion in progress (pending)" (0x80) _OST status code is
reserved for the "Insertion Processing" (0x200) source event
which is "a result of an OSPM action". Specifically, it is not
a notification, so that status code should not be used during
notification processing, which unfortunately is done by
acpi_scan_bus_device_check().
For this reason, drop the ACPI_OST_SC_INSERT_IN_PROGRESS _OST
status evaluation from there (it was a mistake to put it in there
in the first place).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is required to do get_device() on the struct acpi_device in
question before passing it to acpi_bus_hot_remove_device() through
acpi_os_hotplug_execute(), because acpi_bus_hot_remove_device()
calls acpi_scan_hot_remove() that does put_device() on that
object.
The ACPI PCI root removal routine, handle_root_bridge_removal(),
doesn't do that, which may lead to premature freeing of the
device object or to executing put_device() on an object that
has been freed already.
Fix this problem by making handle_root_bridge_removal() use
get_device() as appropriate.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some firmware doesn't initialize initial backlight level to a proper
value and _BQC will return 0 on first time evaluation. We used to be
able to detect such incorrect value with our code logic, as value 0
normally isn't a valid value in _BCL. But with the introduction of Win8,
firmware begins to fill _BCL with values from 0 to 100, now 0 becomes
a valid value but that value will make user's screen black. This patch
test initial _BQC for value 0, if such a value is returned, do not use
it.
References: https://bugzilla.kernel.org/show_bug.cgi?id=64031
References: https://bugzilla.kernel.org/show_bug.cgi?id=61231
References: https://bugzilla.kernel.org/show_bug.cgi?id=63111 Reported-by: Qingshuai Tian <qingshuai.tian@intel.com> Tested-by: Aaron Lu <aaron.lu@intel.com> # on "Idealpad u330p" Reported-and-tested-by: <erno@iki.fi> # on "Acer Aspire V5-573G" Reported-and-tested-by: Kirill Tkhai <tkhai@yandex.ru> # on "HP 250 G1" Signed-off-by: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A bug was introduced by commit b76b51ba0cef ('ACPI / EC: Add more debug
info and trivial code cleanup') that erroneously caused the struct member
to be accessed before acquiring the required lock. This change fixes
it by ensuring the lock acquisition is done first.
Found by Aaron Durbin <adurbin@chromium.org>
Fixes: b76b51ba0cef ('ACPI / EC: Add more debug info and trivial code cleanup')
References: http://crbug.com/319019 Signed-off-by: Puneet Kumar <puneetster@chromium.org> Reviewed-by: Aaron Durbin <adurbin@chromium.org>
[olof: Commit message reworded a bit] Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current default perf paranoid level is "1" which has
"perf_paranoid_kernel()" return false, and giving any operations that
use it, access to normal users. Unfortunately, this includes function
tracing and normal users should not be allowed to enable function
tracing by default.
The proper level is defined at "-1" (full perf access), which
"perf_paranoid_tracepoint_raw()" will only give access to. Use that
check instead for enabling function tracing.
Reported-by: Dave Jones <davej@redhat.com> Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com>
CVE: CVE-2013-2930 Fixes: ced39002f5ea ("ftrace, perf: Add support to use function tracepoint in perf") Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Previously we allowed callers to access Slot Capabilities, Status, and
Control for Root Ports even if the Root Port did not implement a slot.
This seems dubious because the spec only requires these registers if a
slot is implemented.
It's true that even Root Ports without slots must have *space* for these
slot registers, because the Root Capabilities, Status, and Control
registers are after the slot registers in the capability. However,
for a v1 PCIe Capability, the *semantics* of the slot registers are
undefined unless a slot is implemented.
Previously we relied on the PCIe r3.0, sec 7.8, spec language that says
"For Functions that do not implement the [Link, Slot, Root] registers,
these spaces must be hardwired to 0b," which means that for v2 PCIe
capabilities, we don't need to check the device type at all.
But it's simpler if we don't need to check the capability version at all,
and I think the spec is explicit enough about which registers are required
for which types that we can remove the version checks.
Every PCIe device has a link, except Root Complex Integrated Endpoints
and Root Complex Event Collectors. Previously we didn't give access
to PCIe capability link-related registers for Upstream Ports, Downstream
Ports, and Bridges, so attempts to read PCI_EXP_LNKCTL incorrectly
returned zero. See PCIe spec r3.0, sec 7.8 and 1.3.2.3.
Mike reported that commit 7d1a9417 ("x86: Use generic idle loop")
regressed several workloads and caused excessive reschedule
interrupts.
The patch in question failed to notice that the x86 code had an
inverted sense of the polling state versus the new generic code (x86:
default polling, generic: default !polling).
Fix the two prominent x86 mwait based idle drivers and introduce a few
new generic polling helpers (fixing the wrong smp_mb__after_clear_bit
usage).
Also switch the idle routines to using tif_need_resched() which is an
immediate TIF_NEED_RESCHED test as opposed to need_resched which will
end up being slightly different.
The NFS layer needs to know when a key has expired.
This change also returns -EKEYEXPIRED to the application, and the informative
"Key has expired" error message is displayed. The user then knows that
credential renewal is required.
The oops is caused because shm_destroy() calls fput() after dropping the
ipc_lock. fput() clears the file's f_inode, f_path.dentry, and
f_path.mnt, which causes various NULL pointer references in task 2. I
reliably see the oops in task 2 if with shmlock, shmu
This patch fixes the races by:
1) set shm_file=NULL in shm_destroy() while holding ipc_object_lock().
2) modify at risk operations to check shm_file while holding
ipc_object_lock().
Example workloads, which each trigger oops...
Workload 1:
while true; do
id=$(shmget 1 4096)
shm_rmid $id &
shmlock $id &
wait
done
The oops stack shows accessing NULL f_inode due to racing fput:
_raw_spin_lock
shmem_lock
SyS_shmctl
Workload 2:
while true; do
id=$(shmget 1 4096)
shmat $id 4096 &
shm_rmid $id &
wait
done
The oops stack is similar to workload 1 due to NULL f_inode:
touch_atime
shmem_mmap
shm_mmap
mmap_region
do_mmap_pgoff
do_shmat
SyS_shmat
Workload 3:
while true; do
id=$(shmget 1 4096)
shmlock $id
shm_rmid $id &
shmunlock $id &
wait
done
The oops stack shows second fput tripping on an NULL f_inode. The
first fput() completed via from shm_destroy(), but a racing thread did
a get_file() and queued this fput():
locks_remove_flock
__fput
____fput
task_work_run
do_notify_resume
int_signal
Fixes: c2c737a0461e ("ipc,shm: shorten critical region for shmat") Fixes: 2caacaa82a51 ("ipc,shm: shorten critical region for shmctl") Signed-off-by: Greg Thelen <gthelen@google.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Rik van Riel <riel@redhat.com> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 2caacaa82a51 ("ipc,shm: shorten critical region for shmctl")
restructured the ipc shm to shorten critical region, but introduced a
path where the return value could be -EPERM, even if the operation
actually was performed.
Before the commit, the err return value was reset by the return value
from security_shm_shmctl() after the if (!ns_capable(...)) statement.
Now, we still exit the if statement with err set to -EPERM, and in the
case of SHM_UNLOCK, it is not reset at all, and used as the return value
from shmctl.
To fix this, we only set err when errors occur, leaving the fallthrough
case alone.
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Rik van Riel <riel@redhat.com> Cc: Michel Lespinasse <walken@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes bug 62491 (https://bugzilla.kernel.org/show_bug.cgi?id=62491).
After resuming some users got the following error flooding the kernel log:
alx 0000:02:00.0: invalid PHY speed/duplex: 0xffff
Signed-off-by: Jonas Hahnfeld <linux@hahnjo.de> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: hahnjo <linux@hahnjo.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The cbc-aes-s390 algorithm incorrectly places the IV in the tfm
data structure. As the tfm is shared between multiple threads,
this introduces a possibility of data corruption.
This patch fixes this by moving the parameter block containing
the IV and key onto the stack (the block is 48 bytes long).
The same bug exists elsewhere in the s390 crypto system and they
will be fixed in subsequent patches.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stephan Mueller reported to me recently a error in random number generation in
the ansi cprng. If several small requests are made that are less than the
instances block size, the remainder for loop code doesn't increment
rand_data_valid in the last iteration, meaning that the last bytes in the
rand_data buffer gets reused on the subsequent smaller-than-a-block request for
random data.
The fix is pretty easy, just re-code the for loop to make sure that
rand_data_valid gets incremented appropriately
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Stephan Mueller <stephan.mueller@atsec.com> CC: Stephan Mueller <stephan.mueller@atsec.com> CC: Petr Matousek <pmatouse@redhat.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: Luis Henriques <luis.henriques@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A user reported a problem where they were getting csum errors when running a
balance and running systemd's journal. This is because systemd is awesome and
fallocate()'s its log space and writes into it. Unfortunately we assume that
when we read in all the csums for an extent that they are sequential starting at
the bytenr we care about. This obviously isn't the case for prealloc extents,
where we could have written to the middle of the prealloc extent only, which
means the csum would be for the bytenr in the middle of our range and not the
front of our range. Fix this by offsetting the new bytenr we are logging to
based on the original bytenr the csum was for. With this patch I no longer see
the csum errors I was seeing. Thanks,
Reported-by: Chris Murphy <lists@colorremedies.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some devices, like the Kvaser Memorator Professional, have several bulk in
endpoints. Only the first one found must be used by the driver. The same holds
for the bulk out endpoint. The official Kvaser driver (leaf) was used as
reference for this patch.
This change fixes a problem where a Store operation to an ArgX object
that contained a reference to a field object did not complete the
automatic dereference and then write to the actual field object.
Instead, the object type of the field object was inadvertently changed
to match the type of the source operand. The new behavior will actually
write to the field object (buffer field or field unit), thus matching
the correct ACPI-defined behavior.
Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Disallow the dereference of a reference (via index) to an uninitialized
package element. Provides compatibility with other ACPI
implementations. ACPICA BZ 1003.
References: https://bugs.acpica.org/show_bug.cgi?id=431 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It appears that driver runs into a problem here if fibsize is too small
because we allocate user_srbcmd with fibsize size only but later we
access it until user_srbcmd->sg.count to copy it over to srbcmd.
It is not correct to test (fibsize < sizeof(*user_srbcmd)) because this
structure already includes one sg element and this is not needed for
commands without data. So, we would recommend to add the following
(instead of test for fibsize == 0).
Previously, references to these objects were resolved only to the actual
FieldUnit or BufferField object. The correct behavior is to resolve these
references to an actual value.
The problem is that DerefOf did not resolve these objects to actual
values. An "Integer" object is simple, return the value. But a field in
an operation region will require a read operation. For a BufferField, the
appropriate data must be extracted from the parent buffer.
NOTE: It appears that this issues is present in Windows7 but not
Windows8.
Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ignoring usb_hub_create_port_device() errors cause later NULL pointer
deference when uninitialized hub->ports[i] entries are dereferenced
after port memory allocation error.
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the hub_configure() fails after setting the hdev->maxchild
the hub->ports might be NULL or point to uninitialized kzallocated
memory causing NULL pointer dereference in hub_quiesce() during cleanup.
Now after such error the hdev->maxchild is set to 0 to avoid cleanup
of uninitialized ports.
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Two drivers (atmel-pwm-bl and leds-atmel-pwm) currently depend on the
atmel_pwm driver to have bound to any pwm-device before their devices
are probed.
Support deferred probing of such devices by making sure to return
-EPROBE_DEFER from pwm_channel_alloc when no pwm-device has yet been
bound.
Signed-off-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The PPC64 people noticed a missing memory barrier and crufty old
comments in the perf ring buffer code. So update all the comments and
add the missing barrier.
When the architecture implements local_t using atomic_long_t there
will be double barriers issued; but short of introducing more
conditional barrier primitives this is the best we can do.
Reported-by: Victor Kaplansky <victork@il.ibm.com> Tested-by: Victor Kaplansky <victork@il.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: michael@ellerman.id.au Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: anton@samba.org Cc: benh@kernel.crashing.org Link: http://lkml.kernel.org/r/20131025173749.GG19466@laptop.lan Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Michael Neuling <mikey@neuling.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This isn't a real fix to the problem, but rather a stopgap measure while
trying to find a proper solution.
There are several laptops out there that fail to light up the eDP panel
in UEFI boot mode. They seem to be mostly IVB machines, including but
apparently not limited to Dell XPS 13, Asus TX300, Asus UX31A, Asus
UX32VD, Acer Aspire S7. They seem to work in CSM or legacy boot.
The difference between UEFI and CSM is that the BIOS provides a
different VBT to the kernel. The UEFI VBT typically specifies 18 bpp and
1.62 GHz link for eDP, while CSM VBT has 24 bpp and 2.7 GHz link. We end
up clamping to 18 bpp in UEFI mode, which we can fit in the 1.62 Ghz
link, and for reasons yet unknown fail to light up the panel.
Dithering from 24 to 18 bpp itself seems to work; if we use 18 bpp with
2.7 GHz link, the eDP panel lights up. So essentially this is a link
speed issue, and *not* a bpp clamping issue.
The bug raised its head since
commit 657445fe8660100ad174600ebfa61536392b7624
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Sat May 4 10:09:18 2013 +0200
which started clamping bpp *before* computing the link requirements, and
thus affecting the required bandwidth. Clamping after the computations
kept the link at 2.7 GHz.
Even though the BIOS tells us to use 18 bpp through the VBT, it happily
boots up at 24 bpp and 2.7 GHz itself! Use this information to
selectively ignore the VBT provided value.
We can't ignore the VBT eDP bpp altogether, as there are other laptops
that do require the clamping to be used due to EDID reporting higher bpp
than the panel can support.
Shadow bytes around the buggy address: ffff8800359c9700: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd ffff8800359c9780: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa ffff8800359c9800: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9880: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>ffff8800359c9980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00[03]fb ffff8800359c9a00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9a80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9b00: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00 ffff8800359c9b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8800359c9c00: 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap redzone: fa
Heap kmalloc redzone: fb
Freed heap region: fd
Shadow gap: fe
The out-of-bounds access happens on 'parser->buffer[parser->idx] = 0;'
Although the crash happened in ftrace_regex_open() the real bug
occurred in trace_get_user() where there's an incrementation to
parser->idx without a check against the size. The way it is triggered
is if userspace sends in 128 characters (EVENT_BUF_SIZE + 1), the loop
that reads the last character stores it and then breaks out because
there is no more characters. Then the last character is read to determine
what to do next, and the index is incremented without checking size.
Then the caller of trace_get_user() usually nulls out the last character
with a zero, but since the index is equal to the size, it writes a nul
character after the allocated space, which can corrupt memory.
Luckily, only root user has write access to this file.
hdmi_setup_fake_chmap() is supposed to set the reported channel map when
the channel map is not specified by the user.
However, the function indexes channel_allocations[] with a wrong value
and extracts the wrong nibble from hdmi_channel_mapping[], causing wrong
channel maps to be shown.
Fix those issues.
Tested on Intel HDMI to correctly generate various channel maps, for
example 3,4,14,15,7,8,5,6 (instead of incorrect 3,4,8,7,5,6,14,0) for
standard 7.1 channel audio. (Note that the side and rear channels are
reported as RL/RR and RLC/RRC, respectively, as per the CEA-861
standard, instead of the more traditional SL/SR and RL/RR.)
Note that this only fixes the layouts that only contain traditional 7.1
speakers (2.0, 2.1, 4.0, 5.1, 7.1, etc.). E.g. the rear center of 6.1
is still being shown wrongly due to an issue with from_cea_slot()
which will be fixed in a later patch.
This patch adds a pci stub driver to hyper-fb. The hyperv framebuffer
driver will bind to the pci device then, so linux kernel and userspace
know there is a proper kernel driver for the device active. lspci shows
this for example:
[root@dhcp231 ~]# lspci -vs8
00:08.0 VGA compatible controller: Microsoft Corporation Hyper-V virtual
VGA (prog-if 00 [VGA controller])
Flags: bus master, fast devsel, latency 0, IRQ 11
Memory at f8000000 (32-bit, non-prefetchable) [size=64M]
Expansion ROM at <unassigned> [disabled]
Kernel driver in use: hyperv_fb
Another effect is that the xorg vesa driver will not attach to the
device and thus the Xorg server will automatically use the fbdev
driver instead.
x86_pkg_temp receives thermal notifications via a callback from a
therm_throt driver, where thermal interrupts are processed.
This callback is pkg_temp_thermal_platform_thermal_notify. Here to
avoid multiple interrupts from cores in a package, we disable the
source and also set a variable to avoid scheduling delayed work function.
This variable is protected via spin_lock_irqsave. On one buggy platform,
we still receiving interrupts even if the source is disabled. This
can cause deadlock/lockdep warning, when interrupt is generated while under
spinlock in work function.
Change spin_lock to spin_lock_irqsave and spin_unlock to
spin_unlock_irqrestore as the data it is trying to protect can also
be modified in a notification call called from interrupt handler.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Paul Durrant <Paul.Durrant@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the frontend state changes netback now specifies its desired state to
a new function, set_backend_state(), which transitions through any
necessary intermediate states.
This fixes an issue observed with some old Windows frontend drivers where
they failed to transition through the Closing state and netback would not
behave correctly.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On receiving a packet too big icmp error we update the expire value by
calling rt6_update_expires. This function uses dst_set_expires which is
implemented that it can only reduce the expiration value of the dst entry.
If we insert new routing non-expiry information into the ipv6 fib where
we already have a matching rt6_info we only clear the RTF_EXPIRES flag
in rt6i_flags and leave the dst.expires value as is.
When new mtu information arrives for that cached dst_entry we again
call dst_set_expires. This time it won't update the dst.expire value
because we left the dst.expire value intact from the last update. So
dst_set_expires won't touch dst.expires.
Fix this by resetting dst.expires when clearing the RTF_EXPIRE flag.
dst_set_expires checks for a zero expiration and updates the
dst.expires.
In the past this (not updating dst.expires) was necessary because
dst.expire was placed in a union with the dst_entry *from reference
and rt6_clean_expires did assign NULL to it. This split happend in ecd9883724b78cc72ed92c98bcb1a46c764fff21 ("ipv6: fix race condition
regarding dst->expires and dst->from").
Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com> Reported-by: Valentijn Sessink <valentyn@blub.net> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Eric Dumazet <edumazet@google.com> Tested-by: Valentijn Sessink <valentyn@blub.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On receiving a packet too big icmp error we check if our current cached
dst_entry in the socket is still valid. This validation check did not
care about the expiration of the (cached) route.
The error path I traced down:
The socket receives a packet too big mtu notification. It still has a
valid dst_entry and thus issues the ip6_rt_pmtu_update on this dst_entry,
setting RTF_EXPIRE and updates the dst.expiration value (which could
fail because of not up-to-date expiration values, see previous patch).
In some seldom cases we race with a) the ip6_fib gc or b) another routing
lookup which would result in a recreation of the cached rt6_info from its
parent non-cached rt6_info. While copying the rt6_info we reinitialize the
metrics store by copying it over from the parent thus invalidating the
just installed pmtu update (both dsts use the same key to the inetpeer
storage). The dst_entry with the just invalidated metrics data would
just get its RTF_EXPIRES flag cleared and would continue to stay valid
for the socket.
We should have not issued the pmtu update on the already expired dst_entry
in the first placed. By checking the expiration on the dst entry and
doing a relookup in case it is out of date we close the race because
we would install a new rt6_info into the fib before we issue the pmtu
update, thus closing this race.
Not reliably updating the dst.expire value was fixed by the patch "ipv6:
reset dst.expires value when clearing expire flag".
Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com> Reported-by: Valentijn Sessink <valentyn@blub.net> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Tested-by: Valentijn Sessink <valentyn@blub.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6ff50cd55545 ("tcp: gso: do not generate out of order packets")
had an heuristic that can trigger a warning in skb_try_coalesce(),
because skb->truesize of the gso segments were exactly set to mss.
ifconfig lo mtu 1500
ethtool -K lo tso off
netperf
As the skbs are looped into the TCP networking stack, skb_try_coalesce()
warns us of these skb under-estimating their truesize.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The length calculation here is now invalid on 32-bit architectures,
since sk_buff::tail is a pointer and sk_buff::transport_header is
an integer offset:
drivers/net/ethernet/chelsio/cxgb3/sge.c: In function 'write_ofld_wr':
drivers/net/ethernet/chelsio/cxgb3/sge.c:1603:9: warning: passing argument 4 of 'make_sgl' makes integer from pointer without a cast [enabled by default]
adap->pdev);
^
drivers/net/ethernet/chelsio/cxgb3/sge.c:964:28: note: expected 'unsigned int' but argument is of type 'sk_buff_data_t'
static inline unsigned int make_sgl(const struct sk_buff *skb,
^
Use the appropriate skb accessor functions.
Compile-tested only.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Fixes: 1a37e412a022 ('net: Use 16bits for *_headers fields of struct skbuff') Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
time_after_eq() only works if the delta is < MAX_ULONG/2.
For a 32bit Dom0, if netfront sends packets at a very low rate, the time
between subsequent calls to tx_credit_exceeded() may exceed MAX_ULONG/2
and the test for timer_after_eq() will be incorrect. Credit will not be
replenished and the guest may become unable to send packets (e.g., if
prior to the long gap, all credit was exhausted).
Use jiffies_64 variant to mitigate this problem for 32bit Dom0.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Jason Luan <jianhai.luan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3ab098df35f8b98b6553edc2e40234af512ba877 (virtio-net: don't respond to
cpu hotplug notifier if we're not ready) tries to bypass the cpu hotplug
notifier by checking the config_enable and does nothing is it was false. So it
need to try to hold the config_lock mutex which may happen in atomic
environment which leads the following warnings:
A correct fix is to unregister the hotcpu notifier during restore and register a
new one in resume.
Reported-by: Fengguang Wu <fengguang.wu@intel.com> Tested-by: Fengguang Wu <fengguang.wu@intel.com> Cc: Wanlong Gao <gaowanlong@cn.fujitsu.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We don't validate iph->ihl which may lead a dead loop if we meet a IPIP
skb whose iph->ihl is zero. Fix this by failing immediately when iph->ihl
is evil (less than 5).
Signed-off-by: Jason Wang <jasowang@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Petr Matousek <pmatouse@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Daniel Borkmann <dborkman@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Introduced in f9e42b853523 ("net: sctp: sideeffect: throw BUG if
primary_path is NULL"), we intended to find a buggy assoc that's
part of the assoc hash table with a primary_path that is NULL.
However, we better remove the BUG_ON for now and find a more
suitable place to assert for these things as Mark reports that
this also triggers the bug when duplication cookie processing
happens, and the assoc is not part of the hash table (so all
good in this case). Such a situation can for example easily be
reproduced by:
tc qdisc add dev eth0 root handle 1: prio bands 2 priomap 1 1 1 1 1 1
tc qdisc add dev eth0 parent 1:2 handle 20: netem loss 20%
tc filter add dev eth0 protocol ip parent 1: prio 2 u32 match ip \
protocol 132 0xff match u8 0x0b 0xff at 32 flowid 1:2
This drops 20% of COOKIE-ACK packets. After some follow-up
discussion with Vlad we came to the conclusion that for now we
should still better remove this BUG_ON() assertion, and come up
with two follow-ups later on, that is, i) find a more suitable
place for this assertion, and possibly ii) have a special
allocator/initializer for such kind of temporary assocs.
Reported-by: Mark Thomas <Mark.Thomas@metaswitch.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In function mlx4_master_deactivate_admin_state() __mlx4_unregister_mac was
called using the MAC index. It should be called with the value of the MAC itself.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Debugfs was setup in NTB to only have a single debugfs directory. This
resulted in the leaking of debugfs directories and files when multiple
NTB devices were present, due to each device stomping on the variables
containing the previous device's values (thus preventing them from being
freed on cleanup). Correct this by creating a secondary directory of
the PCI BDF for each device present, and nesting the previously existing
information in those directories.
Signed-off-by: Jon Mason <jon.mason@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Due to ambiguous documentation, the USD/DSD identification is backward
when compared to the setting in BIOS. Correct the bits to match the
BIOS setting.
Signed-off-by: Jon Mason <jon.mason@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The NTB Xeon hardware has 16 scratch pad registers and 16 back-to-back
scratch pad registers. Correct the #define to represent this and update
the variable names to reflect their usage.
Signed-off-by: Jon Mason <jon.mason@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If an error is encountered in ntb_device_setup, it is possible that the
spci_cmd isn't populated. Writes to the offset can result in a NULL
pointer dereference. This issue is easily encountered by running in
NTB-RP mode, as it currently is not supported and will generate an
error. To get around this issue, return if an error is encountered
prior to attempting to write to the spci_cmd offset.
Signed-off-by: Jon Mason <jon.mason@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This issue was first pointed out by Jiaxing Wang several months ago, but no
further comments:
https://lkml.org/lkml/2013/6/29/41
As we know pread() does not change f_pos, so after pread(), file->f_pos
and m->read_pos become different. And seq_lseek() does not update file->f_pos
if offset equals to m->read_pos, so after pread() and seq_lseek()(lseek to
m->read_pos), then a subsequent read may read from a wrong position, the
following program produces the problem:
char str1[32] = { 0 };
char str2[32] = { 0 };
int poffset = 10;
int count = 20;
/*open any seq file*/
int fd = open("/proc/modules", O_RDONLY);
Commit 040a0a37 ("mutex: Add support for wound/wait style locks")
used "!__builtin_constant_p(p == NULL)" but gcc 3.x cannot
handle such expression correctly, leading to boot failure when
built with CONFIG_DEBUG_MUTEXES=y.
Fix it by explicitly passing a bool which tells whether p != NULL
or not.
[ PeterZ: This is a sad patch, but provided it actually generates
similar code I suppose its the best we can do bar whole
sale deprecating gcc-3. ]
Originally I've thought that this is leftover hw state dirt from the
BIOS. But after way too much helpless flailing around on my part I've
noticed that the actual bug is when we change the state of an already
active pipe.
For example when we change the fdi lines from 2 to 3 without switching
off outputs in-between we'll never see the crucial on->off transition
in the ->modeset_global_resources hook the current logic relies on.
Patch version 2 got this right by instead also checking whether the
pipe is indeed active. But that in turn broke things when pipes have
been turned off through dpms since the bifurcate enabling is done in
the ->crtc_mode_set callback.
To address this issues discussed with Ville in the patch review move
the setting of the bifurcate bit into the ->crtc_enable hook. That way
we won't wreak havoc with this state when userspace puts all other
outputs into dpms off state. This also moves us forward with our
overall goal to unify the modeset and dpms on paths (which we need to
have to allow runtime pm in the dpms off state).
Unfortunately this requires us to move the bifurcate helpers around a
bit.
Also update the commit message, I've misanalyzed the bug rather badly.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70507 Tested-by: Jan-Michael Brummer <jan.brummer@tabos.org> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Intel D410PT(LW) and D425KT Mini-ITX desktop boards both show up as
having LVDS but the hardware is not populated. This patch adds them to
the list of such systems. Patch is against 3.11.4
v2: Patch revised to match the D425KT exactly as the D425KTW does have
LVDS. According to Intel's documentation, the D410PTL and D410PLTW
don't.
Signed-off-by: Rob Pearce <rob@flitspace.org.uk>
[danvet: Pimp commit message to my liking and add cc: stable.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On CTG+ read out the pipe bpp setting from hardware and fill it into
pipe config. Also check it appropriately.
v2: Don't do the pipe_bpp extraction inside the PCH only code block on
ILK+.
Avoid the PIPECONF read as we already have read it for the
PIPECONF_EANBLE check.
Note: This is already in drm-intel-next-queued as
commit 42571aefafb1d330ef84eb29418832f72e7dfb4c
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date: Fri Sep 6 23:29:00 2013 +0300
drm/i915: Add support for pipe_bpp readout
but is needed for the following bugfix.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pavel Roskin reported that DRM_IOCTL_MODE_GETCONNECTOR was overwritting
the 4 bytes beyond the end of its structure with a 32-bit userspace
running on a 64-bit kernel. This is due to the padding gcc inserts as
the drm_mode_get_connector struct includes a u64 and its size is not a
natural multiple of u64s.
Fortuituously we can insert explicit padding to the tail of our
structures without breaking ABI.
Reported-by: Pavel Roskin <proski@gnu.org> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dave Airlie <airlied@redhat.com> Cc: dri-devel@lists.freedesktop.org Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
drm: block userspace under allocating buffer and having drivers overwrite it (v2)
to the core ioctl structs as well, for we found one instance where there
is a 32-/64-bit size mismatch and were guilty of writing beyond the end
of the user's buffer.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dave Airlie <airlied@redhat.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: dri-devel@lists.freedesktop.org Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The w/a db makes the recommendation to both use a non-default value for
the initial clock and then to retry with an alternative clock for
Haswell with the Lakeport PCH.
"On LPT:H, use a divider value of 63 decimal (03Fh). If there is a
failure, retry at least three times with 63, then retry at least three
times with 72 decimal (048h)."
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
DRI clients that tried to grab the TTM lock when the master (X server) was
switched away during a VT switch were sent the SIGTERM signal by the
kernel. Fix this so that they are only sent that signal when the master has
exited.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Jakob Bornecrantz <jakob@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When walk_page_range walk a memory map's page tables, it'll skip
VM_PFNMAP area, then variable 'next' will to assign to vma->vm_end, it
maybe larger than 'end'. In next loop, 'addr' will be larger than
'next'. Then in /proc/XXXX/pagemap file reading procedure, the 'addr'
will growing forever in pagemap_pte_range, pte_to_pagemap_entry will
access the wrong pte.
BUG: Bad page map in process procrank pte:8437526f pmd:785de067
addr:9108d000 vm_flags:00200073 anon_vma:f0d99020 mapping: (null) index:9108d
CPU: 1 PID: 4974 Comm: procrank Tainted: G B W O 3.10.1+ #1
Call Trace:
dump_stack+0x16/0x18
print_bad_pte+0x114/0x1b0
vm_normal_page+0x56/0x60
pagemap_pte_range+0x17a/0x1d0
walk_page_range+0x19e/0x2c0
pagemap_read+0x16e/0x200
vfs_read+0x84/0x150
SyS_read+0x4a/0x80
syscall_call+0x7/0xb
Signed-off-by: Liu ShuoX <shuox.liu@intel.com> Signed-off-by: Chen LinX <linx.z.chen@intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If a page we are inspecting is in swap we may occasionally report it as
having soft dirty bit (even if it is clean). The pte_soft_dirty helper
should be called on present pte only.
A THP PMD update is accounted for as 512 pages updated in vmstat. This is
large difference when estimating the cost of automatic NUMA balancing and
can be misleading when comparing results that had collapsed versus split
THP. This patch addresses the accounting issue.
Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-10-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
During hours of testing, one crashed with weird errors and while I have
no direct evidence, I suspect something like the race above happened.
This patch extends the page lock to being held until the pmd_numa is
cleared to prevent migration starting in parallel while the pmd_numa is
being cleared. It also flushes the old pmd entry and orders pagetable
insertion before rmap insertion.
Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-9-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
- do_huge_pmd_numa_page():
Accounts against the current node, not the node where the
page resides, unless we migrated, in which case it accounts
against the node we migrated to.
- do_numa_page():
Accounts against the current node, not the node where the
page resides, unless we migrated, in which case it accounts
against the node we migrated to.
- do_pmd_numa_page():
Accounts not at all when the page isn't migrated, otherwise
accounts against the node we migrated towards.
This seems wrong to me; all three sites should have the same
sementaics, furthermore we should accounts against where the page
really is, we already know where the task is.
So modify all three sites to always account; we did after all receive
the fault; and always account to where the page is after migration,
regardless of success.
They all still differ on when they clear the PTE/PMD; ideally that
would get sorted too.
Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-8-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
THP migrations are serialised by the page lock but on its own that does
not prevent THP splits. If the page is split during THP migration then
the pmd_same checks will prevent page table corruption but the unlock page
and other fix-ups potentially will cause corruption. This patch takes the
anon_vma lock to prevent parallel splits during migration.
Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-7-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The locking for migrating THP is unusual. While normal page migration
prevents parallel accesses using a migration PTE, THP migration relies on
a combination of the page_table_lock, the page lock and the existance of
the NUMA hinting PTE to guarantee safety but there is a bug in the scheme.
If a THP page is currently being migrated and another thread traps a
fault on the same page it checks if the page is misplaced. If it is not,
then pmd_numa is cleared. The problem is that it checks if the page is
misplaced without holding the page lock meaning that the racing thread
can be migrating the THP when the second thread clears the NUMA bit
and faults a stale page.
This patch checks if the page is potentially being migrated and stalls
using the lock_page if it is potentially being migrated before checking
if the page is misplaced or not.
Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-6-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes a regression for the Nomadik on the main system
timers.
The Nomadik seemed a bit slow and its heartbeat wasn't looking
healthy. And it was not strange, because it has been connected
to the 32768 Hz clock at boot, while being told by the clock driver
that it was 2.4MHz. Actually connect the TIMCLK to 2.4MHz by
default as this is what we want for nice scheduling, clocksource
and clock event.
The order of arguments in the call to vco_set() for the ICST clocks appears to
have been switched in error, which results in the VCO not being initialised
correctly. This in turn stops the integrated LCD on things like Integrator/CP
from working correctly.
This patch fixes the order and restores the expected functionality.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Jonathan Austin <jonathan.austin@arm.com> Signed-off-by: Mike Turquette <mturquette@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit d496f94d22d1 ('[SCSI] aacraid: fix security weakness') we
added a check on CAP_SYS_RAWIO to the ioctl. The compat ioctls need the
check as well.
Commit b1adaf65ba03 ("[SCSI] block: add sg buffer copy helper
functions") introduces two sg buffer copy helpers, and calls
flush_kernel_dcache_page() on pages in SG list after these pages are
written to.
Unfortunately, the commit may introduce a potential bug:
- Before sending some SCSI commands, kmalloc() buffer may be passed to
block layper, so flush_kernel_dcache_page() can see a slab page
finally
- According to cachetlb.txt, flush_kernel_dcache_page() is only called
on "a user page", which surely can't be a slab page.
- ARCH's implementation of flush_kernel_dcache_page() may use page
mapping information to do optimization so page_mapping() will see the
slab page, then VM_BUG_ON() is triggered.
Aaro Koskinen reported the bug on ARM/kirkwood when DEBUG_VM is enabled,
and this patch fixes the bug by adding test of '!PageSlab(miter->page)'
before calling flush_kernel_dcache_page().
Signed-off-by: Ming Lei <ming.lei@canonical.com> Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi> Tested-by: Simon Baatz <gmbnomis@gmail.com> Cc: Russell King - ARM Linux <linux@arm.linux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: Aaro Koskinen <aaro.koskinen@iki.fi> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Tejun Heo <tj@kernel.org> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nico Golde reports a few straggling uses of [io_]remap_pfn_range() that
really should use the vm_iomap_memory() helper. This trivially converts
two of them to the helper, and comments about why the third one really
needs to continue to use remap_pfn_range(), and adds the missing size
check.
According to create_thread(3): "The new thread does not inherit the creating
thread's alternate signal stack". Since commit f9a3879a (Fix sigaltstack
corruption among cloned threads), current->sas_ss_size is set to 0 for cloned
processes sharing VM with their parent. Don't use the (nonexistent) alternate
signal stack in this case. This has been broken since commit 29c4dfd9 ([XTENSA]
Remove non-rt signal handling).
Fixes the SA_ONSTACK part of the nptl/tst-cancel20 test from uClibc.
Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Chris Zankel <chris@zankel.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>