When copy_from_user fails, return -EFAULT, not -ENOMEM
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Reported-by: Robert Elliott <elliott@hp.com> Reviewed-by: Joe Handzik <joseph.t.handzik@hp.com> Reviewed-by: Scott Teel <scott.teel@hp.com>
Reviewed by: Mike MIller <michael.miller@canonical.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
If the timer irqs are resumed during device resume it is possible in
certain circumstances for the resume to hang early on, before device
interrupts are resumed. For an Ubuntu 14.04 PVHVM guest this would
occur in ~0.5% of resume attempts.
It is not entirely clear what is occuring the point of the hang but I
think a task necessary for the resume calls schedule_timeout(),
waiting for a timer interrupt (which never arrives). This failure may
require specific tasks to be running on the other VCPUs to trigger
(processes are not frozen during a suspend/resume if PREEMPT is
disabled).
Add IRQF_EARLY_RESUME to the timer interrupts so they are resumed in
syscore_resume().
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Without CONFIG_RELOCATABLE the early boot code will decompress the
kernel to LOAD_PHYSICAL_ADDR. While this may have been fine in the BIOS
days, that isn't going to fly with UEFI since parts of the firmware
code/data may be located at LOAD_PHYSICAL_ADDR.
Straying outside of the bounds of the regions we've explicitly requested
from the firmware will cause all sorts of trouble. Bruno reports that
his machine resets while trying to decompress the kernel image.
We already go to great pains to ensure the kernel is loaded into a
suitably aligned buffer, it's just that the address isn't necessarily
LOAD_PHYSICAL_ADDR, because we can't guarantee that address isn't in-use
by the firmware.
Explicitly enforce CONFIG_RELOCATABLE for the EFI boot stub, so that we
can load the kernel at any address with the correct alignment.
Reported-by: Bruno Prémont <bonbons@linux-vserver.org> Tested-by: Bruno Prémont <bonbons@linux-vserver.org> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Commit 30919b0bf356 ("x86: avoid low BIOS area when allocating address
space") moved the test for resource allocations that fall within the first
1MB of address space from the PCI-specific path to a generic path, such
that all resource allocations will avoid this area. However, this breaks
ISA cards which need to allocate a memory region within the first 1MB. An
example is the i82365 PCMCIA controller and derivatives like the Ricoh
RF5C296/396 which map part of the PCMCIA socket memory address space into
the first 1MB of system memory address space. They do not work anymore as
no usable memory region exists due to this change:
We can't do ASPM configuration at enumeration-time because enabling it
makes some defective hardware unresponsive, even if ASPM is disabled later
(see 41cd766b0659 ("PCI: Don't enable aspm before drivers have had a chance
to veto it"). Therefore, we have to do it after a driver claims the
device.
We previously configured ASPM in pci_set_power_state(), but that's not a
very good place because it's not really related to setting the PCI device
power state, and doing it there means:
- We incorrectly skipped ASPM config when setting a device that's
already in D0 to D0.
- We unnecessarily configured ASPM when setting a device to a low-power
state (the ASPM feature only applies when the device is in D0).
- We unnecessarily configured ASPM when called from a .resume() method
(ASPM configuration needs to be restored during resume, but
pci_restore_pcie_state() should already do this).
Move ASPM configuration from pci_set_power_state() to
do_pci_enable_device() so we do it when a driver enables a device.
The third parameter of kvm_iommu_put_pages is wrong,
It should be 'gfn - slot->base_gfn'.
By making gfn very large, malicious guest or userspace can cause kvm to
go to this error path, and subsequently to pass a huge value as size.
Alternatively if gfn is small, then pages would be pinned but never
unpinned, causing host memory leak and local DOS.
Passing a reasonable but large value could be the most dangerous case,
because it would unpin a page that should have stayed pinned, and thus
allow the device to DMA into arbitrary memory. However, this cannot
happen because of the condition that can trigger the error:
- out of memory (where you can't allocate even a single page)
should not be possible for the attacker to trigger
- when exceeding the iommu's address space, guest pages after gfn
will also exceed the iommu's address space, and inside
kvm_iommu_put_pages() the iommu_iova_to_phys() will fail. The
page thus would not be unpinned at all.
Reported-by: Jack Morgenstein <jackm@mellanox.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
SeaBIOS has a limit on the number of MTRRs that it can handle,
and this patch exceeded the limit. Better revert it.
Thanks to Nadav Amit for debugging the cause.
Reported-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Currently, the EOI exit bitmap (used for APICv) does not include
interrupts that are masked. However, this can cause a bug that manifests
as an interrupt storm inside the guest. Alex Williamson reported the
bug and is the one who really debugged this; I only wrote the patch. :)
The scenario involves a multi-function PCI device with OHCI and EHCI
USB functions and an audio function, all assigned to the guest, where
both USB functions use legacy INTx interrupts.
As soon as the guest boots, interrupts for these devices turn into an
interrupt storm in the guest; the host does not see the interrupt storm.
Basically the EOI path does not work, and the guest continues to see the
interrupt over and over, even after it attempts to mask it at the APIC.
The bug is only visible with older kernels (RHEL6.5, based on 2.6.32
with not many changes in the area of APIC/IOAPIC handling).
Alex then tried forcing bit 59 (corresponding to the USB functions' IRQ)
on in the eoi_exit_bitmap and TMR, and things then work. What happens
is that VFIO asserts IRQ11, then KVM recomputes the EOI exit bitmap.
It does not have set bit 59 because the RTE was masked, so the IOAPIC
never sees the EOI and the interrupt continues to fire in the guest.
My guess was that the guest is masking the interrupt in the redirection
table in the interrupt routine, i.e. while the interrupt is set in a
LAPIC's ISR, The simplest fix is to ignore the masking state, we would
rather have an unnecessary exit rather than a missed IRQ ACK and anyway
IOAPIC interrupts are not as performance-sensitive as for example MSIs.
Alex tested this patch and it fixed his bug.
[Thanks to Alex for his precise description of the problem
and initial debugging effort. A lot of the text above is
based on emails exchanged with him.]
Reported-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Return unhandlable error on inter-privilege level ret instruction. This is
since the current emulation does not check the privilege level correctly when
loading the CS, and does not pop RSP/SS as needed.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
[ I'm currently running my tests on it now, and so far, after a few
hours it has yet to blow up. I'll run it for 24 hours which it never
succeeded in the past. ]
The tracing code has a way to make directories within the debugfs file
system as well as deleting them using mkdir/rmdir in the instance
directory. This is very limited in functionality, such as there is
no renames, and the parent directory "instance" can not be modified.
The tracing code creates the instance directory from the debugfs code
and then replaces the dentry->d_inode->i_op with its own to allow
for mkdir/rmdir to work.
When these are called, the d_entry and inode locks need to be released
to call the instance creation and deletion code. That code has its own
accounting and locking to serialize everything to prevent multiple
users from causing harm. As the parent "instance" directory can not
be modified this simplifies things.
I created a stress test that creates several threads that randomly
creates and deletes directories thousands of times a second. The code
stood up to this test and I submitted it a while ago.
Recently I added a new test that adds readers to the mix. While the
instance directories were being added and deleted, readers would read
from these directories and even enable tracing within them. This test
was able to trigger a bug:
Where the child->d_u.d_child seemed to be corrupted. I added lots of
trace_printk()s to see what was wrong, and sure enough, it was always
the child's d_u.d_child field. I looked around to see what touches
it and noticed that in __dentry_kill() which calls dentry_free():
static void dentry_free(struct dentry *dentry)
{
/* if dentry was never visible to RCU, immediate free is OK */
if (!(dentry->d_flags & DCACHE_RCUACCESS))
__d_free(&dentry->d_u.d_rcu);
else
call_rcu(&dentry->d_u.d_rcu, __d_free);
}
I also noticed that __dentry_kill() unlinks the child->d_u.child
under the parent->d_lock spin_lock.
Looking back at the loop in debugfs_remove_recursive() it never takes the
parent->d_lock to do the list walk. Adding more tracing, I was able to
prove this was the issue:
The dentry_kill freed ffff88006d573600 just as the remove recursive was walking
it.
In order to fix this, the list walk needs to be modified a bit to take
the parent->d_lock. The safe version is no longer necessary, as every
time we remove a child, the parent->d_lock must be released and the
list walk must start over. Each time a child is removed, even though it
may still be on the list, it should be skipped by the first check
in the loop:
if (!debugfs_positive(child))
continue;
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The interrupt handler in the ux500 crypto driver has an obviously
incorrect way to access the data buffer, which for a while has
caused this build warning:
../ux500/cryp/cryp_core.c: In function 'cryp_interrupt_handler':
../ux500/cryp/cryp_core.c:234:5: warning: passing argument 1 of '__fswab32' makes integer from pointer without a cast [enabled by default]
writel_relaxed(ctx->indata,
^
In file included from ../include/linux/swab.h:4:0,
from ../include/uapi/linux/byteorder/big_endian.h:12,
from ../include/linux/byteorder/big_endian.h:4,
from ../arch/arm/include/uapi/asm/byteorder.h:19,
from ../include/asm-generic/bitops/le.h:5,
from ../arch/arm/include/asm/bitops.h:340,
from ../include/linux/bitops.h:33,
from ../include/linux/kernel.h:10,
from ../include/linux/clk.h:16,
from ../drivers/crypto/ux500/cryp/cryp_core.c:12:
../include/uapi/linux/swab.h:57:119: note: expected '__u32' but argument is of type 'const u8 *'
static inline __attribute_const__ __u32 __fswab32(__u32 val)
There are at least two, possibly three problems here:
a) when writing into the FIFO, we copy the pointer rather than the
actual data we want to give to the hardware
b) the data pointer is an array of 8-bit values, while the FIFO
is 32-bit wide, so both the read and write access fail to do
a proper type conversion
c) This seems incorrect for big-endian kernels, on which we need to
byte-swap any register access, but not normally FIFO accesses,
at least the DMA case doesn't do it either.
This converts the bogus loop to use the same readsl/writesl pair
that we use for the two other modes (DMA and polling). This is
more efficient and consistent, and probably correct for endianess.
The bug has existed since the driver was first merged, and was
probably never detected because nobody tried to use interrupt mode.
It might make sense to backport this fix to stable kernels, depending
on how the crypto maintainers feel about that.
When a tty is opened for the serial console, the termios c_cflag
settings are inherited from the console line settings.
However, if the tty is subsequently closed, the termios settings
are lost. This results in a garbled console if the console is later
suspended and resumed.
Preserve the termios c_cflag for the serial console when the tty
is shutdown; this reflects the most recent line settings.
Fixes: Bugzilla #69751, 'serial console does not wake from S3' Reported-by: Valerio Vanni <valerio.vanni@inwind.it> Acked-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
If there is a failure while allocating the preallocation structure, a
number of blocks can end up getting marked in the in-memory buddy
bitmap, and then not getting released. This can result in the
following corruption getting reported by the kernel:
EXT4-fs error (device sda3): ext4_mb_generate_buddy:758: group 1126,
12793 clusters in bitmap, 12729 in gd
In that case, we need to release the blocks using mb_free_blocks().
Tested: fs smoke test; also demonstrated that with injected errors,
the file system is no longer getting corrupted
Most device drivers do call 'tpm_do_selftest' which executes a
TPM_ContinueSelfTest. tpm_i2c_stm_st33 is just pointlessly different,
I think it is bug.
These days we have the general assumption that the TPM is usable by
the kernel immediately after the driver is finished, so we can no
longer defer the mandatory self test to userspace.
Reported-by: Richard Marciel <rmaciel@linux.vnet.ibm.com> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Peter Huewe <peterhuewe@gmx.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
On platforms with sizeof(int) < sizeof(long), writing a temperature
limit larger than MAXINT will result in unpredictable limit values
written to the chip. Avoid auto-conversion from long to int to fix
the problem.
Voltage limits, fan minimum speed, pwm frequency, pwm ramp rate, and
other attributes have the same problem, fix them as well.
Zone temperature limits are signed, but were cached as u8, causing
unepected values to be reported for negative temperatures. Cache as
s8 to fix the problem.
vrm is an u8, so the written value needs to be limited to [0, 255].
Signed-off-by: Axel Lin <axel.lin@ingics.com>
[Guenter Roeck: Fix zone temperature cache] Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Current code uses data_rate as array index in ads1015_read_adc() and uses pga
as array index in ads1015_reg_to_mv, so we must make sure both data_rate and
pga settings are in valid value range.
Return -EINVAL if the setting is out-of-range.
Signed-off-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Temperature limit register writes did not account for negative numbers.
As a result, writing -127000 resulted in -126000 written into the
temperature limit register. This problem affected temp[1-3]_min,
temp[1-3]_max, temp[1-3]_auto_temp_crit, and temp[1-3]_auto_temp_min.
When writing pwm[1-3]_freq, a long variable was auto-converted into an int
without range check. Wiring values larger than MAXINT resulted in unexpected
register values.
When writing temp[1-3]_auto_temp_max, an unsigned long variable was
auto-converted into an int without range check. Writing values larger than
MAXINT resulted in unexpected register values.
vrm is an u8, so the written value needs to be limited to [0, 255].
Cc: Axel Lin <axel.lin@ingics.com> Reviewed-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
On platforms with sizeof(int) < sizeof(unsigned long), writing a rpm value
larger than MAXINT will result in unpredictable limit values written to the
chip. Avoid auto-conversion from unsigned long to int to fix the problem.
Signed-off-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
On platforms with sizeof(int) < sizeof(long), writing a temperature
limit larger than MAXINT will result in unpredictable limit values
written to the chip. Avoid auto-conversion from long to int to fix
the problem.
Cc: Axel Lin <axel.lin@ingics.com> Reviewed-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Ensure mutex lock protects the read-modify-write period to prevent possible
race condition bug.
In additional, update data->valid should also be protected by the mutex lock.
Signed-off-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
On platforms with sizeof(int) < sizeof(long), writing a temperature
limit larger than MAXINT will result in unpredictable limit values
written to the chip. Avoid auto-conversion from long to int to fix
the problem.
Signed-off-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Regular randconfig nightly testing has detected problems with omapdrm.
omapdrm fails to build when the kernel is built to support 64-bit DMA
addresses and/or 64-bit physical addresses due to an assumption about
the width of these types.
Use %pad to print DMA addresses, rather than %x or %Zx (which is even
more wrong than %x). Avoid passing a uint32_t pointer into a function
which expects dma_addr_t pointer.
drivers/gpu/drm/omapdrm/omap_plane.c: In function 'omap_plane_pre_apply':
drivers/gpu/drm/omapdrm/omap_plane.c:145:2: error: format '%x' expects argument of type 'unsigned int', but argument 5 has type 'dma_addr_t' [-Werror=format]
drivers/gpu/drm/omapdrm/omap_plane.c:145:2: error: format '%x' expects argument of type 'unsigned int', but argument 6 has type 'dma_addr_t' [-Werror=format]
make[5]: *** [drivers/gpu/drm/omapdrm/omap_plane.o] Error 1
drivers/gpu/drm/omapdrm/omap_gem.c: In function 'omap_gem_get_paddr':
drivers/gpu/drm/omapdrm/omap_gem.c:794:4: error: format '%x' expects argument of type 'unsigned int', but argument 3 has type 'dma_addr_t' [-Werror=format]
drivers/gpu/drm/omapdrm/omap_gem.c: In function 'omap_gem_describe':
drivers/gpu/drm/omapdrm/omap_gem.c:991:4: error: format '%Zx' expects argument of type 'size_t', but argument 7 has type 'dma_addr_t' [-Werror=format]
drivers/gpu/drm/omapdrm/omap_gem.c: In function 'omap_gem_init':
drivers/gpu/drm/omapdrm/omap_gem.c:1470:4: error: format '%x' expects argument of type 'unsigned int', but argument 7 has type 'dma_addr_t' [-Werror=format]
make[5]: *** [drivers/gpu/drm/omapdrm/omap_gem.o] Error 1
drivers/gpu/drm/omapdrm/omap_dmm_tiler.c: In function 'dmm_txn_append':
drivers/gpu/drm/omapdrm/omap_dmm_tiler.c:226:2: error: passing argument 3 of 'alloc_dma' from incompatible pointer type [-Werror]
make[5]: *** [drivers/gpu/drm/omapdrm/omap_dmm_tiler.o] Error 1
make[5]: Target `__build' not remade because of errors.
make[4]: *** [drivers/gpu/drm/omapdrm] Error 2
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
According to the comment “restore_es3: applies to 34xx >= ES3.0" in
"arch/arm/mach-omap2/sleep34xx.S”, omap3_restore_es3 should be used
if the revision of an OMAP34xx is ES3.1.2.
Signed-off-by: Jeremy Vial <jvial@adeneo-embedded.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Link must be reset in case the fw doesn't
respond to client disconnect request.
We did charge the timer only in irq path
from mei_cl_irq_close and not in mei_cl_disconnect
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
ALC269 & co have many vendor-specific setups with COEF verbs.
However, some verbs seem specific to some codec versions and they
result in the codec stalling. Typically, such a case can be avoided
by checking the return value from reading a COEF. If the return value
is -1, it implies that the COEF is invalid, thus it shouldn't be
written.
This patch adds the invalid COEF checks in appropriate places
accessing ALC269 and its variants. The patch actually fixes the
resume problem on Acer AO725 laptop.
On some HP laptops, the mute led is controlled by codec gpio.
When some machine resume from s3/s4, the codec gpio data will be
cleared to 0 by BIOS:
Before suspend:
IO[3]: enable=1, dir=1, wake=0, sticky=0, data=1, unsol=0
After resume:
IO[3]: enable=1, dir=1, wake=0, sticky=0, data=0, unsol=0
To skip the AFG node to enter D3 can't fix this problem.
A workaround is to restore the gpio data when the system resume
back from s3/s4. It is safe even on the machines without this
problem.
BugLink: https://bugs.launchpad.net/bugs/1358116 Tested-by: Franz Hsieh <franz.hsieh@canonical.com> Signed-off-by: Hui Wang <hui.wang@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
CA0132 driver tries to reload the firmware at resume. Usually this
works since the firmware loader core caches the firmware contents by
itself. However, if the driver failed to load the firmwares
(e.g. missing files), reloading the firmware at resume goes through
the actual file loading code path, and triggers a kernel WARNING like:
WARNING: CPU: 10 PID:11371 at drivers/base/firmware_class.c:1105 _request_firmware+0x9ab/0x9d0()
For avoiding this situation, this patch makes CA0132 skipping the f/w
loading at resume when it failed at probe time.
Problem Summary: Problem has been observed generally with PM states
where VBUS goes off during suspend. There are some SS USB devices which
take longer time for link training compared to many others. Such
devices fail to reconnect with same old address which was associated
with it before suspend.
When system resumes, at some point of time (dpm_run_callback->
usb_dev_resume->usb_resume->usb_resume_both->usb_resume_device->
usb_port_resume) SW reads hub status. If device is present,
then it finishes port resume and re-enumerates device with same
address. If device is not present then, SW thinks that device was
removed during suspend and therefore does logical disconnection
and removes all the resource allocated for this device.
Now, if I put sufficient delay just before root hub status read in
usb_resume_device then, SW sees always that device is present. In normal
course(without any delay) SW sees that no device is present and then SW
removes all resource associated with the device at this port. In the
latter case, after sometime, device says that hey I am here, now host
enumerates it, but with new address.
Problem had been reproduced when I connect verbatim USB3.0 hard disc
with my STiH407 XHCI host running with 3.10 kernel.
I see that similar problem has been reported here.
https://bugzilla.kernel.org/show_bug.cgi?id=53211
Reading above it seems that bug was not in 3.6.6 and was present in 3.8
and again it was not present for some in 3.12.6, while it was present
for few others. I tested with 3.13-FC19 running at i686 desktop, problem
was still there. However, I was failed to reproduce it with 3.16-RC4
running at same i686 machine. I would say it is just a random
observation. Problem for few devices is always there, as I am unable to
find a proper fix for the issue.
So, now question is what should be the amount of delay so that host is
always able to recognize suspended device after resume.
XHCI specs 4.19.4 says that when Link training is successful, port sets
CSC bit to 1. So if SW reads port status before successful link
training, then it will not find device to be present. USB Analyzer log
with such buggy devices show that in some cases device switch on the
RX termination after long delay of host enabling the VBUS. In few other
cases it has been seen that device fails to negotiate link training in
first attempt. It has been reported till now that few devices take as
long as 2000 ms to train the link after host enabling its VBUS and
RX termination. This patch implements a 2000 ms timeout for CSC bit to set
ie for link training. If in a case link trains before timeout, loop will
exit earlier.
This patch implements above delay, but only for SS device and when
persist is enabled.
So, for the good device overhead is almost none. While for the bad
devices penalty could be the time which it take for link training.
But, If a device was connected before suspend, and was removed
while system was asleep, then the penalty would be the timeout ie
2000 ms.
Results:
Verbatim USB SS hard disk connected with STiH407 USB host running 3.10
Kernel resumes in 461 msecs without this patch, but hard disk is
assigned a new device address. Same system resumes in 790 msecs with
this patch, but with old device address.
The EHCI packet buffer in/out threshold is programmable for Intel Quark X1000
USB host controller, and the default value is 0x20 dwords. The in/out threshold
can be programmed to 0x80 dwords (512 Bytes) to maximize the perfomrance,
but only when isochronous/interrupt transactions are not initiated by the USB
host controller. This patch is to reconfigure the packet buffer in/out
threshold as maximal as possible to maximize the performance, and 0x7F dwords
(508 Bytes) should be used because the USB host controller initiates
isochronous/interrupt transactions.
usbfs allows user space to pass down an URB which sets URB_SHORT_NOT_OK
for output URBs. That causes usbcore to log messages without limit
for a nonsensical disallowed combination. The fix is to silently drop
the attribute in usbfs.
The problem is reported to exist since 3.14
https://www.virtualbox.org/ticket/13085
Signed-off-by: Oliver Neukum <oneukum@suse.de> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
This patch fixes a bug in ohci-hcd. When an URB is unlinked, the
corresponding Endpoint Descriptor is added to the ed_rm_list and taken
off the hardware schedule. Once the ED is no longer visible to the
hardware, finish_unlinks() handles the URBs that were unlinked or have
completed. If any URBs remain attached to the ED, the ED is added
back to the hardware schedule -- but only if the controller is
running.
This fails when a controller dies. A non-empty ED does not get added
back to the hardware schedule and does not remain on the ed_rm_list;
ohci-hcd loses track of it. The remaining URBs cannot be unlinked,
which causes the USB stack to hang.
The patch changes finish_unlinks() so that non-empty EDs remain on
the ed_rm_list if the controller isn't running. This requires moving
some of the existing code around, to avoid modifying the ED's hardware
fields more than once.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The debug routine fill_async_buffer() in ohci-hcd is buggy: It never
produces any output because it forgets to initialize the output buffer
size. Also, the debug routine ohci_dump() has an unused argument.
This patch adds the correct initialization and removes the unused
argument.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
We did not check relocated directory in any way when processing Rock
Ridge 'CL' tag. Thus a corrupted isofs image can possibly have a CL
entry pointing to another CL entry leading to possibly unbounded
recursion in kernel code and thus stack overflow or deadlocks (if there
is a loop created from CL entries).
Fix the problem by not allowing CL entry to point to a directory entry
with CL entry (such use makes no good sense anyway) and by checking
whether CL entry doesn't point to itself.
Reported-by: Chris Evans <cevans@google.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
device_index is a char type and the size of paired_dj_deivces is 7
elements, therefore proper bounds checking has to be applied to
device_index before it is used.
We are currently performing the bounds checking in
logi_dj_recv_add_djhid_device(), which is too late, as malicious device
could send REPORT_TYPE_NOTIF_DEVICE_UNPAIRED early enough and trigger the
problem in one of the report forwarding functions called from
logi_dj_raw_event().
Fix this by performing the check at the earliest possible ocasion in
logi_dj_raw_event().
Reported-by: Ben Hawkes <hawkes@google.com> Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
This adds a pci_upstream_bridge() interface to find the PCI-to-PCI bridge
upstream from a device. This is typically just "dev->bus->self", but in
the case of a VF on a virtual bus, we have to start from the corresponding
PF. Returns NULL if there is no upstream PCI bridge, i.e., if the device
is on a root bus.
Commit 08778795 ("block: Fix nr_vecs for inline integrity vectors") from
Martin introduces the function bip_integrity_vecs(get the useful vectors)
to fix the issue about nr_vecs for inline integrity vectors that reported
by David Milburn.
But it seems that bip_integrity_vecs() will return the wrong number if the
bio is not based on any bio_set for some reason(bio->bi_pool == NULL),
because in that case, the bip_inline_vecs[0] is malloced directly. So
here we add the bip_max_vcnt to record the count of vector slots, and
cleanup the function bip_integrity_vecs().
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Jens Axboe <axboe@fb.com>
So snapshot.c mark the pfn to forbidden pages map. But, this
page is also in the memory bitmap in snapshot image because it's an
original page used by image kernel, so it will also mark as an
unsafe(free) page in prepare_image().
That means the page in e820 when resuming mark as "forbidden" and
"free", it causes get_buffer() treat it as an allocated unsafe page.
Then snapshot_write_next() return this page to load_image, load_image
writing content to this address, but this page didn't really allocated
. So, we got page fault.
Although the root cause is from BIOS, I think aggressive check and
significant message in kernel will better then a page fault for
issue tracking, especially when serial console unavailable.
This patch adds code in mark_unsafe_pages() for check does free pages in
nosave region. If so, then it print message and return fault to stop whole
S4 resume process:
I am using a USB keyborad that give me "usb_submit_urb(ctrl) failed: -1" error
when I plugin it. and I need to wait for 10s for this device to be ready.
By adding this quirks, the usb keyborad is usable right after plugin
This patch adds a usb quirk to support devices with interupt endpoints
and bInterval values expressed as microframes. The quirk causes the
parse endpoint function to modify the reported bInterval to a standards
conforming value.
There is currently code in the endpoint parser that checks for
bIntervals that are outside of the valid range (1-16 for USB 2+ high
speed and super speed interupt endpoints). In this case, the code assumes
the bInterval is being reported in 1ms frames. As well, the correction
is only applied if the original bInterval value is out of the 1-16 range.
With this quirk applied to the device, the bInterval will be
accurately adjusted from microframes to an exponent.
Signed-off-by: James P Michels III <james.p.michels@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The usb device will autoresume from choose_wakeup() if it is
autosuspended with the wrong wakeup setting, but below errors occur
because usb3503 misc driver will switch to standby mode when suspended.
As add USB_QUIRK_RESET_RESUME, it can stop setting wrong wakeup from
autosuspend_check().
[ 7.734717] usb 1-3: reset high-speed USB device number 3 using exynos-ehci
[ 7.854658] usb 1-3: device descriptor read/64, error -71
[ 8.079657] usb 1-3: device descriptor read/64, error -71
[ 8.294664] usb 1-3: reset high-speed USB device number 3 using exynos-ehci
[ 8.414658] usb 1-3: device descriptor read/64, error -71
[ 8.639657] usb 1-3: device descriptor read/64, error -71
[ 8.854667] usb 1-3: reset high-speed USB device number 3 using exynos-ehci
[ 9.264598] usb 1-3: device not accepting address 3, error -71
[ 9.374655] usb 1-3: reset high-speed USB device number 3 using exynos-ehci
[ 9.784601] usb 1-3: device not accepting address 3, error -71
[ 9.784838] usb usb1-port3: device 1-3 not suspended yet
This `usb_reset_device` command has been around since the driver was
originally reverse engineered. It doesn't cause much issue on single
interface CP210x devices, but on the CP2105 and CP2108 with 2 and 4
interfaces respectively it will cause instability on enumeration and
delays enumeration noticably. There should be no reason to reset a device
at startup, per the CP210x AN571 spec.
Signed-off-by: Preston Fick <preston.fick@silabs.com> Cc: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The musb/cppi41 code installs a hrtimer to work around DMA completion
interrupts that have fired too early on AM335x hardware. This timer
is currently programmed to first fire 140 microseconds after the DMA
completion callback. According to the commit which introduced it
(a655f481d83, "usb: musb: musb_cppi41: handle pre-mature TX complete
interrupt"), that value is is considered a 'rule of thumb' that worked
well with the test case described in the commit log.
Test show, however, that for USB audio devices and much smaller packet
sizes, the timer has to fire earlier in order to correctly handle the audio
stream. The original test case had output transfer sizes of 1514 bytes, and
a delay of 140 microseconds. For audio devices with 24 bytes channel size, 3
microseconds seem to work well.
Hence, let's assume that the time it takes to clear the bit correlates with
the number of bytes transferred. The referenced commit log mentions such a
suspicion as well. Let the timer fire in cppi41_channel->total_len/10
microseconds to correctly handle both cases.
Also, shorten the interval in which the timer fires again in case of
a non-empty early_tx list.
With these changes in place, both FS and HS audio devices appear to work
well on AM335x hardware.
Signed-off-by: Daniel Mack <zonque@gmail.com> Reported-by: Sebastian Reimers <sebastian.reimers@googlemail.com> Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The real fix is where we check the bytes we need against how much is
remaining - we also need to check for a journal entry bigger than our
buffer, we'll never write those and it would be bad if we tried to read
one.
Also improve the diagnostic messages.
Signed-off-by: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
In __rtc_read_alarm(), if the alarm time retrieved by
rtc_read_alarm_internal() from the device contains invalid values (e.g.
month=2,mday=31) and the year not set (=-1), the initialization will
loop infinitely because the year-fixing loop expects the time being
invalid due to leap year.
Fix reduces the loop to the leap years and adds final validity check.
In particular seeing zero in eft->month is problematic, as it results in
-1 (converted to unsigned int, i.e. yielding 0xffffffff) getting passed
to rtc_year_days(), where the value gets used as an array index
(normally resulting in a crash). This was observed with the driver
enabled on x86 on some Fujitsu system (with possibly not up to date
firmware, but anyway).
Perhaps efi_read_alarm() should not fail if neither enabled nor pending
are set, but the returned time is invalid?
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reported-by: Raymund Will <rw@suse.de> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Jingoo Han <jg1.han@samsung.com> Acked-by: Lee, Chun-Yi <jlee@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Compared source code of rtc-lib.c::rtc_year_days() with
efirtc.c::rtc_year_days(), found the code in rtc-efi decreases value of
day twice when it computing year days. rtc-lib.c::rtc_year_days() has
already decrease days and return the year days from 0 to 365.
This fix (not very clean though) should fix the long time USB3
issue that was spotted last year. The rational has been given by
Hans de Goede:
----
I think the most likely cause for this is a firmware bug
in the unifying receiver, likely a race condition.
The most prominent difference between having a USB-2 device
plugged into an EHCI (so USB-2 only) port versus an XHCI
port will be inter packet timing. Specifically if you
send packets (ie hid reports) one at a time, then with
the EHCI controller their will be a significant pause
between them, where with XHCI they will be very close
together in time.
The reason for this is the difference in EHCI / XHCI
controller OS <-> driver interfaces.
For non periodic endpoints (control, bulk) the EHCI uses a
circular linked-list of commands in dma-memory, which it
follows to execute commands, if the list is empty, it
will go into an idle state and re-check periodically.
The XHCI uses a ring of commands per endpoint, and if the OS
places anything new on the ring it will do an ioport write,
waking up the XHCI making it send the new packet immediately.
For periodic transfers (isoc, interrupt) the delay between
packets when sending one at a time (rather then queuing them
up) will be even larger, because they need to be inserted into
the EHCI schedule 2 ms in the future so the OS driver can be
sure that the EHCI driver does not try to start executing the
time slot in question before the insertion has completed.
So a possible fix may be to insert a delay between packets
being send to the receiver.
----
I tested this on a buggy Haswell USB 3.0 motherboard, and I always
get the notification after adding the msleep.
Numerical values stored in the device tree are encoded in Big Endian and
should be byte swapped when running in Little Endian.
The RPA hotplug module should convert those values as well.
Note that in rpaphp_get_drc_props(), the comparison between indexes[i+1]
and *index is done using the BE values (whatever is the current endianess).
This doesn't matter since we are checking for equality here. This way only
the returned value is byte swapped.
RPA also made RTAS calls which implies BE values to be used. According to
the patch done in RTAS (http://patchwork.ozlabs.org/patch/336865), no
additional conversion is required in RPA.
tipc_msg_build() calls skb_copy_to_linear_data_offset() to copy data
from user space to kernel space. However, the latter function does
in its turn call memcpy() to perform the actual copying. This poses
an obvious security and robustness risk, since memcpy() never makes
any validity check on the pointer it is copying from.
To correct this, we the replace the offending function call with
a call to memcpy_fromiovecend(), which uses copy_from_user() to
perform the copying.
Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
This patch adds support for 57764, 57765, 57787, 57782 and 57786
devices.
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Since commit 3fb43eb ("bnx2x: Change to D3hot only on removal") nvram
is accessible whenever the driver is loaded - Thus it is possible to
test it during self-test even if the interface is down
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The cpl_abort_req struct has several reserved members which need to be
cleared to avoid disclosing kernel information. I have added a memset()
so now it matches the cxgb4 version of this function.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
netxen_process_lro() contains two bounds checks. One for the ring number
against the number of rings, and one for the Rx buffer ID against the
array of receive buffers.
Both of these have off-by-one errors, using > instead of >=. The correct
versions are used in netxen_process_rcv(), they're just wrong in
netxen_process_lro().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The fallback to 32-bit DMA mask is rather odd:
if (!dma_set_mask(&pdev->dev, DMA_BIT_MASK(64)) &&
!dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64))) {
*using_dac = true;
} else {
err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
if (err) {
err = dma_set_coherent_mask(&pdev->dev,
DMA_BIT_MASK(32));
if (err)
goto release_regions;
}
This means we only try and set the coherent DMA mask if we failed to
set a 32-bit DMA mask, and only if both fail do we fail the driver.
Adjust this so that if either setting fails, we fail the driver - and
thereby end up properly setting both the DMA mask and the coherent
DMA mask in the fallback case.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
If new_mtu is very large then "new_mtu + ETH_HLEN + ETH_FCS_LEN" can
wrap and the check on the next line can underflow. This is one of those
bugs which can be triggered by the user if you have namespaces
configured.
Also since this is something the user can trigger then we don't want to
have dev_err() message.
This is a static checker fix and I'm not sure what the impact is.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Tested-by: Sibai Li Sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The fallback to 32-bit DMA mask is rather odd:
err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
if (!err) {
err = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64));
if (!err)
pci_using_dac = 1;
} else {
err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
if (err) {
err = dma_set_coherent_mask(&pdev->dev,
DMA_BIT_MASK(32));
if (err) {
dev_err(&pdev->dev, "No usable DMA "
"configuration, aborting\n");
goto err_dma;
}
}
}
This means we only set the coherent DMA mask in the fallback path if
the DMA mask set failed, which is silly. This fixes it to set the
coherent DMA mask only if dma_set_mask() succeeded, and to error out
if either fails.
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
commit fa44f2f185f7f9da19d331929bb1b56c1ccd1d93 broke reloading of igb, when
VFs are assigned to a guest, in several ways.
1. on module load adapter->vf_data does not get properly allocated,
resulting in a null pointer exception when accessing adapter->vf_data in
igb_reset() on module reload.
modprobe -r igb ; modprobe igb max_vfs=7
[ 215.215837] igb 0000:01:00.1: removed PHC on eth1
[ 216.932072] igb 0000:01:00.1: IOV Disabled
[ 216.937038] igb 0000:01:00.0: removed PHC on eth0
[ 217.127032] igb 0000:01:00.0: Cannot deallocate SR-IOV virtual functions while they are assigned - VFs will not be deallocated
[ 217.146178] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.0.5-k
[ 217.154050] igb: Copyright (c) 2007-2013 Intel Corporation.
[ 217.160688] igb 0000:01:00.0: Enabling SR-IOV VFs using the module parameter is deprecated - please use the pci sysfs interface.
[ 217.173703] igb 0000:01:00.0: irq 103 for MSI/MSI-X
[ 217.179227] igb 0000:01:00.0: irq 104 for MSI/MSI-X
[ 217.184735] igb 0000:01:00.0: irq 105 for MSI/MSI-X
[ 217.220082] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[ 217.228846] IP: [<ffffffffa007c5e5>] igb_reset+0xc5/0x4b0 [igb]
[ 217.235472] PGD 3607ec067 PUD 36170b067 PMD 0
[ 217.240461] Oops: 0002 [#1] SMP
[ 217.244085] Modules linked in: igb(+) igbvf mptsas mptscsih mptbase scsi_transport_sas [last unloaded: igb]
[ 217.255040] CPU: 4 PID: 4833 Comm: modprobe Not tainted 3.11.0+ #46
[...]
[ 217.390007] [<ffffffffa007fab2>] igb_probe+0x892/0xfd0 [igb]
[ 217.396422] [<ffffffff81470b3e>] local_pci_probe+0x1e/0x40
[ 217.402641] [<ffffffff81472029>] pci_device_probe+0xf9/0x110
[...]
2. A follow up issue, pci_enable_sriov() should only be called if no VFs were
still allocated on module unload. Otherwise pci_enable_sriov() gets called
multiple times in a row rendering the NIC unusable until reset.
3. simply calling igb_enable_sriov() in igb_probe_vfs() is not enough as the
interrupts need to be re-setup. Switching that to igb_pci_enable_sriov().
Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Tested-by: Sibai Li <Sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
This patch calls code to set the master/slave mode for all m88 gen 2
PHY's. This patch also removes the call to this function for I210 devices
only from the function that is not called by I210 devices.
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Tested-by: Jeff Pieper <jeffrey.e.pieper@gmail.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The fallback to 32-bit DMA mask is rather odd:
err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
if (!err) {
err = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64));
if (!err)
pci_using_dac = 1;
} else {
err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
if (err) {
err = dma_set_coherent_mask(&pdev->dev,
DMA_BIT_MASK(32));
if (err) {
dev_err(&pdev->dev,
"No usable DMA configuration, aborting\n");
goto err_dma;
}
}
}
This means we only set the coherent DMA mask in the fallback path if
the DMA mask set failed, which is silly. This fixes it to set the
coherent DMA mask only if dma_set_mask() succeeded, and to error out
if either fails.
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Since we are already checking for read failure in check_link we don't need
to do it here. Instead just make sure the watchdog task gets scheduled, if
we are up, and it can be done there. This will better follow igbvf method
of handling a mailbox event and message timeout.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The fallback to 32-bit DMA mask is rather odd:
if (!dma_set_mask(&pdev->dev, DMA_BIT_MASK(64)) &&
!dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64))) {
pci_using_dac = 1;
} else {
err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
if (err) {
err = dma_set_coherent_mask(&pdev->dev,
DMA_BIT_MASK(32));
if (err) {
dev_err(&pdev->dev, "No usable DMA "
"configuration, aborting\n");
goto err_dma;
}
}
pci_using_dac = 0;
}
This means we only set the coherent DMA mask in the fallback path if
the DMA mask set failed, which is silly. This fixes it to set the
coherent DMA mask only if dma_set_mask() succeeded, and to error out
if either fails.
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
This patch resolves an issue where the MTA table can be cleared when the
interface is reset while in promisc mode. As result IPv6 traffic between
VFs will be interrupted.
This patch makes the update of the MTA table unconditional to avoid the
inconsistent clearing on reset.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
ixgbe_napi_disable_all calls napi_disable on each queue, however the busy
polling code introduced a local_bh_disable()d context around the napi_disable.
The original author did not realize that napi_disable might sleep, which would
cause a sleep while atomic BUG. In addition, on a single processor system, the
ixgbe_qv_lock_napi loop shouldn't have to mdelay. This patch adds an
ixgbe_qv_disable along with a new IXGBE_QV_STATE_DISABLED bit, which it uses to
indicate to the poll and napi routines that the q_vector has been disabled. Now
the ixgbe_napi_disable_all function will wait until all pending work has been
finished and prevent any future work from being started.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com> Cc: Alexander Duyck <alexander.duyck@intel.com> Cc: Hyong-Youb Kim <hykim@myri.com> Cc: Amir Vadai <amirv@mellanox.com> Cc: Dmitry Kravkov <dmitry@broadcom.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
This patch resolves an issue where the logic used to detect changes in rx-usecs
was incorrect and was masked by the call to ixgbe_update_rsc().
Setting rx-usecs between 0,2-9 and 1,10 and up requires a reset to allow
ixgbe_configure_tx_ring() to set the correct value for TXDCTL.WTHRESH in
order to avoid Tx hangs with BQL enabled.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The fallback to 32-bit DMA mask is rather odd:
if (!dma_set_mask(&pdev->dev, DMA_BIT_MASK(64)) &&
!dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64))) {
pci_using_dac = 1;
} else {
err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
if (err) {
err = dma_set_coherent_mask(&pdev->dev,
DMA_BIT_MASK(32));
if (err) {
dev_err(&pdev->dev,
"No usable DMA configuration, aborting\n");
goto err_dma;
}
}
pci_using_dac = 0;
}
This means we only set the coherent DMA mask in the fallback path if
the DMA mask set failed, which is silly. This fixes it to set the
coherent DMA mask only if dma_set_mask() succeeded, and to error out
if either fails.
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
On e1000_down(), we should ensure every asynchronous work is canceled
before proceeding. Since the watchdog_task can schedule other works
apart from itself, it should be stopped first, but currently it is
stopped after the reset_task. This can result in the following race
leading to the reset_task running after the module unload:
The patch moves cancel_delayed_work_sync(watchdog_task) at the beginning
of e1000_down_and_stop() thus ensuring the race is impossible.
Cc: Tushar Dave <tushar.n.dave@intel.com> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
This change is based on a similar change made to e1000e support in
commit bb9e44d0d0f4 ("e1000e: prevent oops when adapter is being closed
and reset simultaneously"). The same issue has also been observed
on the older e1000 cards.
Here, we have increased the RESET_COUNT value to 50 because there are too
many accesses to e1000 nic on stress tests to e1000 nic, it is not enough
to set RESET_COUT 25. Experimentation has shown that it is enough to set
RESET_COUNT 50.
Signed-off-by: yzhu1 <yanjun.zhu@windriver.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Commit 7509963c703b (e1000e: Fix a compile flag mis-match for
suspend/resume) moved suspend and resume hooks to be available when
CONFIG_PM is set. However, it can be set even if CONFIG_PM_SLEEP is not set
causing following warnings to be emitted:
drivers/net/ethernet/intel/e1000e/netdev.c:6178:12: warning:
‘e1000_suspend’ defined but not used [-Wunused-function]
drivers/net/ethernet/intel/e1000e/netdev.c:6185:12: warning:
‘e1000_resume’ defined but not used [-Wunused-function]
To fix this make the hooks to be available only when CONFIG_PM_SLEEP is set
and remove CONFIG_PM wrapping from driver ops because this is already
handled by SET_SYSTEM_SLEEP_PM_OPS() and SET_RUNTIME_PM_OPS().
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Dave Ertman <davidx.m.ertman@intel.com> Cc: Aaron Brown <aaron.f.brown@intel.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
This patch addresses a mis-match between the declaration and usage of
the e1000_suspend and e1000_resume functions. Previously, these
functions were declared in a CONFIG_PM_SLEEP wrapper, and then utilized
within a CONFIG_PM wrapper. Both the declaration and usage will now be
contained within CONFIG_PM wrappers.
Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The fallback to 32-bit DMA mask is rather odd:
err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
if (!err) {
err = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64));
if (!err)
pci_using_dac = 1;
} else {
err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
if (err) {
err = dma_set_coherent_mask(&pdev->dev,
DMA_BIT_MASK(32));
if (err) {
dev_err(&pdev->dev,
"No usable DMA configuration, aborting\n");
goto err_dma;
}
}
}
This means we only set the coherent DMA mask in the fallback path if
the DMA mask set failed, which is silly. This fixes it to set the
coherent DMA mask only if dma_set_mask() succeeded, and to error out
if either fails.
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Provide a helper to set both the DMA and coherent DMA masks to the
same value - this avoids duplicated code in a number of drivers,
sometimes with buggy error handling, and also allows us identify
which drivers do things differently.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
When FB_EVENT_FB_UNBIND is sent, fbcon has two paths, one path taken
when there is another frame buffer to switch any affected vcs to and
another path when there isn't.
In the case where there is another frame buffer to use,
fbcon_fb_unbind calls set_con2fb_map to remap all of the affected vcs
to the replacement frame buffer. set_con2fb_map will eventually call
con2fb_release_oldinfo when the last vcs gets unmapped from the old
frame buffer.
con2fb_release_oldinfo frees the fbcon data that is hooked off of the
fb_info structure, including the cursor timer.
In the case where there isn't another frame buffer to use,
fbcon_fb_unbind simply calls fbcon_unbind, which doesn't clear the
con2fb_map or free the fbcon data hooked from the fb_info
structure. In particular, it doesn't stop the cursor blink timer. When
the fb_info structure is then freed, we end up with a timer queue
pointing into freed memory and "bad things" start happening.
This patch first changes con2fb_release_oldinfo so that it can take a
NULL pointer for the new frame buffer, but still does all of the
deallocation and cursor timer cleanup.
Finally, the patch tries to replicate some of what set_con2fb_map does
by clearing the con2fb_map for the affected vcs and calling the
modified con2fb_release_info function to clean up the fb_info structure.
Signed-off-by: Keith Packard <keithp@keithp.com> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
free_holes_block() passed local variable as a block pointer
to ext4_clear_blocks(). Thus ext4_clear_blocks() zeroed out this local
variable instead of proper place in inode / indirect block. We later
zero out proper place in inode / indirect block but don't dirty the
inode / buffer again which can lead to subtle issues (some changes e.g.
to inode can be lost).
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
While invesgiating the issue where in "mount --bind -oremount,ro ..."
would result in later "mount --bind -oremount,rw" succeeding even if
the mount started off locked I realized that there are several
additional mount flags that should be locked and are not.
In particular MNT_NOSUID, MNT_NODEV, MNT_NOEXEC, and the atime
flags in addition to MNT_READONLY should all be locked. These
flags are all per superblock, can all be changed with MS_BIND,
and should not be changable if set by a more privileged user.
The following additions to the current logic are added in this patch.
- nosuid may not be clearable by a less privileged user.
- nodev may not be clearable by a less privielged user.
- noexec may not be clearable by a less privileged user.
- atime flags may not be changeable by a less privileged user.
The logic with atime is that always setting atime on access is a
global policy and backup software and auditing software could break if
atime bits are not updated (when they are configured to be updated),
and serious performance degradation could result (DOS attack) if atime
updates happen when they have been explicitly disabled. Therefore an
unprivileged user should not be able to mess with the atime bits set
by a more privileged user.
The additional restrictions are implemented with the addition of
MNT_LOCK_NOSUID, MNT_LOCK_NODEV, MNT_LOCK_NOEXEC, and MNT_LOCK_ATIME
mnt flags.
Taken together these changes and the fixes for MNT_LOCK_READONLY
should make it safe for an unprivileged user to create a user
namespace and to call "mount --bind -o remount,... ..." without
the danger of mount flags being changed maliciously.
Cc: stable@vger.kernel.org Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
There are no races as locked mount flags are guaranteed to never change.
Moving the test into do_remount makes it more visible, and ensures all
filesystem remounts pass the MNT_LOCK_READONLY permission check. This
second case is not an issue today as filesystem remounts are guarded
by capable(CAP_DAC_ADMIN) and thus will always fail in less privileged
mount namespaces, but it could become an issue in the future.
Cc: stable@vger.kernel.org Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Kenton Varda <kenton@sandstorm.io> discovered that by remounting a
read-only bind mount read-only in a user namespace the
MNT_LOCK_READONLY bit would be cleared, allowing an unprivileged user
to the remount a read-only mount read-write.
Correct this by replacing the mask of mount flags to preserve
with a mask of mount flags that may be changed, and preserve
all others. This ensures that any future bugs with this mask and
remount will fail in an easy to detect way where new mount flags
simply won't change.
Cc: stable@vger.kernel.org Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>