The fatal_skb_slots is the threshold to determine whether a packet is
malicious.
XEN_NETBK_LEGACY_SLOTS_MAX is the maximum slots a valid packet can have at
this point. It is defined to be XEN_NETIF_NR_SLOTS_MIN because that's
guaranteed to be supported by all backends.
Suggested-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johannes Berg [Thu, 23 May 2013 20:24:11 +0000 (22:24 +0200)]
mac80211_hwsim: remove P2P_DEVICE support
Unfortunately, advertising P2P_DEVICE support was a little
premature, a number of issues came up in testing and have
been fixed for 3.10. Rather than try to backport all the
different fixes, disable P2P_DEVICE support in the drivers
using it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johannes Berg [Thu, 23 May 2013 20:24:31 +0000 (22:24 +0200)]
iwlwifi: mvm: remove P2P_DEVICE support
Unfortunately, advertising P2P_DEVICE support was a little
premature, a number of issues came up in testing and have
been fixed for 3.10. Rather than try to backport all the
different fixes, disable P2P_DEVICE support in the drivers
using it. For iwlmvm that implies disabling P2P completely
as it can't support P2P operation w/o P2P Device.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The maximum packet including header that can be handled by netfront / netback
wire format is 65535. Reduce gso_max_size accordingly.
Drop skb and print warning when skb->len > 65535. This can 1) save the effort
to send malformed packet to netback, 2) help spotting misconfiguration of
netfront in the future.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tune xen_netbk_count_requests to not touch working array beyond limit, so that
we can make working array size constant.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tracking down from the caller, first_idx is always equal to vif->tx.req_cons.
Remove it to avoid confusion.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some frontend drivers are sending packets > 64 KiB in length. This length
overflows the length field in the first slot making the following slots have
an invalid length.
Turn this error back into a non-fatal error by dropping the packet. To avoid
having the following slots having fatal errors, consume all slots in the
packet.
This does not reopen the security hole in XSA-39 as if the packet as an
invalid number of slots it will still hit fatal error case.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch tries to coalesce tx requests when constructing grant copy
structures. It enables netback to deal with situation when frontend's
MAX_SKB_FRAGS is larger than backend's MAX_SKB_FRAGS.
With the help of coalescing, this patch tries to address two regressions
avoid reopening the security hole in XSA-39.
Regression 1. The reduction of the number of supported ring entries (slots)
per packet (from 18 to 17). This regression has been around for some time but
remains unnoticed until XSA-39 security fix. This is fixed by coalescing
slots.
Regression 2. The XSA-39 security fix turning "too many frags" errors from
just dropping the packet to a fatal error and disabling the VIF. This is fixed
by coalescing slots (handling 18 slots when backend's MAX_SKB_FRAGS is 17)
which rules out false positive (using 18 slots is legit) and dropping packets
using 19 to `max_skb_slots` slots.
To avoid reopening security hole in XSA-39, frontend sending packet using more
than max_skb_slots is considered malicious.
The behavior of netback for packet is thus:
1-18 slots: valid
19-max_skb_slots slots: drop and respond with an error
max_skb_slots+ slots: fatal error
max_skb_slots is configurable by admin, default value is 20.
Also change variable name from "frags" to "slots" in netbk_count_requests.
Please note that RX path still has dependency on MAX_SKB_FRAGS. This will be
fixed with separate patch.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit a99d76f (leds: leds-gpio: use gpio_request_one)
and commit 2d7c22f (leds: leds-gpio: set devm_gpio_request_one()
flags param correctly) which was a fix of the first one.
The conversion to devm_gpio_request in commit e3b1d44c (leds:
leds-gpio: use devm_gpio_request_one) is not reverted.
The problem is that gpio_cansleep() and gpio_get_value_cansleep()
calls can crash if the gpio is not first reserved. Incidentally this
same bug existed earlier and was fixed similarly in commit d95cbe61
(leds: Fix potential leds-gpio oops). But the OOPS is real. It happens
when GPIOs are provided by module which is not yet loaded.
So this fixes the following BUG during my ALIX boot (3.9.2-vanilla):
This patch fixes a bug where FILEIO was incorrectly reporting the number
of logical blocks (+ 1) when using non struct block_device export mode.
It changes fd_get_blocks() to follow all other backend ->get_blocks() cases,
and reduces the calculated dev_size by one dev->dev_attrib.block_size
number of bytes, and also fixes initial fd_block_size assignment at
fd_configure_device() time introduced in commit 0fd97ccf4.
Switch back to pre commit 1c7b13fe652 list splicing logic for active I/O
shutdown with tcm_qla2xxx + ib_srpt fabrics.
The original commit was done under the incorrect assumption that it's safe to
walk se_sess->sess_cmd_list unprotected in target_wait_for_sess_cmds() after
sess->sess_tearing_down = 1 has been set by target_sess_cmd_list_set_waiting()
during session shutdown.
So instead of adding sess->sess_cmd_lock protection around sess->sess_cmd_list
during target_wait_for_sess_cmds(), switch back to sess->sess_wait_list to
allow wait_for_completion() + TFO->release_cmd() to occur without having to
walk ->sess_cmd_list after the list_splice.
Also add a check to exit if target_sess_cmd_list_set_waiting() has already
been called, and add a WARN_ON to check for any fabric bug where new se_cmds
are added to sess->sess_cmd_list after sess->sess_tearing_down = 1 has already
been set.
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Cc: Joern Engel <joern@logfs.org> Cc: Roland Dreier <roland@kernel.org> Signed-off-by: Lingzhu Xiang <lxiang@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix bug introduced by commit 4582a4ab2a "FUSE: Adapt readdirplus to application
usage patterns".
We need to check for a positive dentry; negative dentries are not added by
readdirplus. Secondly we need to advise the use of readdirplus on the *parent*,
otherwise the whole thing is useless. Thirdly all this is only relevant if
"readdirplus_auto" mode is selected by the filesystem.
We advise the use of readdirplus only if the dentry was still valid. If we had
to redo the lookup then there was no use in doing the -plus version.
McASP serial audio engine needs different rotation values on TX and RX
channels. Commit dde109fb462 ("ASoC: McASP: Fix data rotation for
playback. Enables 24bit audio playback") changed the calculation to fix
the playback format, but broke the capture stream by doing it for both
TXFMT and RXFMT.
Signed-off-by: Daniel Mack <zonque@gmail.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Lingzhu Xiang <lxiang@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 819a087316a6 ("IB/iser: Avoid error prints on EAGAIN
registration failures") not only eliminated the error print on that
case, but rather also modified the code such that it doesn't return
any error to upper layers. As a result a wrong mapping was used. Fix
this to correctly return the error in that case.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Unlike Kvaser Leaf light devices, some other Kvaser devices (like USBcan
Pro, USBcan R) receive CAN messages in CMD_LOG_MESSAGE frames. This
patch adds support for it.
Signed-off-by: Jonas Peterson <jonas.peterson@gmail.com> Signed-off-by: Olivier Sobrie <olivier@sobrie.be> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 091f0ea30074bc43f9250961b3247af713024bc6 "tg3: Add New 5719 Read
DMA workaround" added a workaround for TX DMA stall on the 5719. This
workaround needs to be applied to the 5720 as well.
Reported-by: Roland Dreier <roland@purestorage.com> Tested-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If a P2P-Device is present and another virtual interface triggers
the connection work, the system crash because it tries to check
if the P2P-Device's netdev (which doesn't exist) is up. Skip any
wdevs that have no netdev to fix this.
Without this command, the firmware will filter out all the
multicast frames. Let them all in as for now. Later we will
want to optimize this to save power.
I tried to avoid to send zeroed LQ cmd, but I made a (very)
stupid mistake in the memcmp.
Since this patch has been ported to stable, the fix should
go to stable too.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=58341
Reported-by: Hinnerk van Bruinehsen <h.v.bruinehsen@fu-berlin.de> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since Eric's commit efe117ab8 ("Speedup ieee80211_remove_interfaces")
there's a bug in mac80211 when it unregisters with AP_VLAN interfaces
up. If the AP_VLAN interface was registered after the AP it belongs
to (which is the typical case) and then we get into this code path,
unregister_netdevice_many() will crash because it isn't prepared to
deal with interfaces being closed in the middle of it. Exactly this
happens though, because we iterate the list, find the AP master this
AP_VLAN belongs to and dev_close() the dependent VLANs. After this,
unregister_netdevice_many() won't pick up the fact that the AP_VLAN
is already down and will do it again, causing a crash.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We send direct probe to broadcast address, as some APs do not respond to
unicast PROBE frames when unassociated. Broadcast frames are not acked,
so we can not use that for trigger MLME state machine, but we need to
use old timeout mechanism.
This fixes authentication timed out like below:
[ 1024.671974] wlan6: authenticate with 54:e6:fc:98:63:fe
[ 1024.694125] wlan6: direct probe to 54:e6:fc:98:63:fe (try 1/3)
[ 1024.695450] wlan6: direct probe to 54:e6:fc:98:63:fe (try 2/3)
[ 1024.700586] wlan6: send auth to 54:e6:fc:98:63:fe (try 3/3)
[ 1024.701441] wlan6: authentication with 54:e6:fc:98:63:fe timed out
With fix, we have:
[ 4524.198978] wlan6: authenticate with 54:e6:fc:98:63:fe
[ 4524.220692] wlan6: direct probe to 54:e6:fc:98:63:fe (try 1/3)
[ 4524.421784] wlan6: send auth to 54:e6:fc:98:63:fe (try 2/3)
[ 4524.423272] wlan6: authenticated
[ 4524.423811] wlan6: associate with 54:e6:fc:98:63:fe (try 1/3)
[ 4524.427492] wlan6: RX AssocResp from 54:e6:fc:98:63:fe (capab=0x431 status=0 aid=1)
The falcon is present, but the rest of the copy engine doesn't appear to
be... PUNITS doesn't report disabled (maybe the bits for the copy engines
got added later?), so we end up trying to use a non-functional CE1, and
bust all sorts of things.. Most notably, suspend/resume..
Like on UL30VT, the ACPI video driver can't control backlight correctly on
Asus UL30A. Vendor driver (asus-laptop) can work. This patch is to
add "Asus UL30A" to ACPI video detect blacklist in order to use
asus-laptop for video control on the "Asus UL30A" rather than ACPI
video driver.
Signed-off-by: Bastian Triller <bastian.triller@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, drivers/acpi/device_pm.c depends on CONFIG_PM and all of
the functions defined in there are replaced with static inline stubs
if that option is unset. However, CONFIG_PM means, roughly, "runtime
PM or suspend/hibernation support" and some of those functions are
useful regardless of that. For example, they are used by the ACPI
fan driver for controlling fans and acpi_device_set_power() is called
during device removal. Moreover, device initialization may depend on
setting device power states properly.
For these reasons, make the routines manipulating ACPI device power
states defined in drivers/acpi/device_pm.c available for CONFIG_PM
unset too.
Reported-by: Zhang Rui <rui.zhang@intel.com> Reported-and-tested-by: Michel Lespinasse <walken@google.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Work around an IOMMU hardware bug where clearing the
EVT_INT or PPR_INT bit in the status register may race with
the hardware trying to set it again. When not handled the
bit might not be cleared and we lose all future event or ppr
interrupts.
Current driver does not clear the IOMMU event log interrupt bit
in the IOMMU status register after processing an interrupt.
This causes the IOMMU hardware to generate event log interrupt only once.
This has been observed in both IOMMU v1 and V2 hardware.
This patch clears the bit by writing 1 to bit 1 of the IOMMU
status register (MMIO Offset 2020h)
Move the counter for non-AMPDU frames to mvm. It is needed
for the drain flow which happens once the ieee80211_sta has
been freed, so keeping it in iwl_mvm_sta which is embed into
ieee80211_sta is not a good idea.
Also, since its purpose it to remove the STA in the fw only
after all the frames for this station have exited the shared
Tx queues, we need to decrement it in the reclaim flow. This
flow can happen after ieee80211_sta has been removed, which
means that we have no iwl_mvm_sta there. So we can't know
what is the vif type. Hence, we know audit these frames for
all the vif types.
In order to avoid spawning sta_drained_wk all the time, we
now check that we are in a flow in which draining might
happen - only when mvmsta is NULL. This is better than
previous code that would spawn sta_drained_wk all the time
in AP mode.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Reviewed-by: Ilan Peer <ilan.peer@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Lingzhu Xiang <lxiang@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fixes races uncovered by xfstests testcase 068.
One race is the result of jfs_sync() trying to write a sync point to the
journal after it has been frozen (or possibly in the process). Since
freezing sync's the journal, there is no need to write a sync point so
we simply want to return.
The second involves jfs_write_inode() being called on a deleted inode.
It calls jfs_flush_journal which is held up by the jfs_commit thread
doing the final iput on the same deleted inode, which itself is
waiting for the I_SYNC flag to be cleared. jfs_write_inode need not
do anything when i_nlink is zero, which is the easy fix.
Reported-by: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After sleeping for filldir(), we check to see if the file system has
changed and research. The next_pos pointer is updated but its value
isn't pushed into the key used for the search itself. As a result,
the search returns the same item that the last cycle of the loop did
and filldir() is called multiple times with the same data.
The end result is that the buffer can contain the same name multiple
times. This can be returned to userspace or used internally in the
xattr code where it can manifest with the following warning:
jdm-20004 reiserfs_delete_xattrs: Couldn't delete all xattrs (-2)
reiserfs_for_each_xattr uses reiserfs_readdir_dentry to iterate over
the xattr names and ends up trying to unlink the same name twice. The
second attempt fails with -ENOENT and the error is returned. At some
point I'll need to add support into reiserfsck to remove the orphaned
directories left behind when this occurs.
The fix is to push the value into the key before researching.
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jonghwan Choi <jhbird.choi@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
reiserfs_chown_xattrs() takes the iattr struct passed into ->setattr
and uses it to iterate over all the attrs associated with a file to change
ownership of xattrs (and transfer quota associated with the xattr files).
When the setuid bit is cleared during chown, ATTR_MODE and iattr->ia_mode
are passed to all the xattrs as well. This means that the xattr directory
will have S_IFREG added to its mode bits.
This has been prevented in practice by a missing IS_PRIVATE check
in reiserfs_acl_chmod, which caused a double-lock to occur while holding
the write lock. Since the file system was completely locked up, the
writeout of the corrupted mode never happened.
This patch temporarily clears everything but ATTR_UID|ATTR_GID for the
calls to reiserfs_setattr and adds the missing IS_PRIVATE check.
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jonghwan Choi <jhbird.choi@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reiserfs is currently able to be deadlocked by having two NFS clients
where one has removed and recreated a file and another is accessing the
file with an open file handle.
If one client deletes and recreates a file with timing such that the
recreated file obtains the same [dirid, objectid] pair as the original
file while another client accesses the file via file handle, the create
and lookup can race and deadlock if the lookup manages to create the
in-memory inode first.
The create thread, in insert_inode_locked4, will hold the write lock
while waiting on the other inode to be unlocked. The lookup thread,
anywhere in the iget path, will release and reacquire the write lock while
it schedules. If it needs to reacquire the lock while the create thread
has it, it will never be able to make forward progress because it needs
to reacquire the lock before ultimately unlocking the inode.
This patch drops the write lock across the insert_inode_locked4 call so
that the ordering of inode_wait -> write lock is retained. Since this
would have been the case before the BKL push-down, this is safe.
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jonghwan Choi <jhbird.choi@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Adam Lackorzynski reported the following build failure on
!CONFIG_HOTPLUG_CPU configuration:
CC arch/powerpc/kernel/rtas.o
arch/powerpc/kernel/rtas.c: In function ‘rtas_cpu_state_change_mask’:
arch/powerpc/kernel/rtas.c:843:4: error: implicit declaration of function ‘cpu_down’ [-Werror=implicit-function-declaration]
cc1: all warnings being treated as errors
make[1]: *** [arch/powerpc/kernel/rtas.o] Error 1
make: *** [arch/powerpc/kernel] Error 2
The build fails because cpu_down() is defined only under CONFIG_HOTPLUG_CPU.
Looking further, the mobility code in pseries is one of the call-sites which
uses rtas_ibm_suspend_me(), which in turn calls rtas_cpu_state_change_mask().
And the mobility code is unconditionally compiled-in (it does not fall under
any Kconfig option). And commit 120496ac (powerpc: Bring all threads online
prior to migration/hibernation) which introduced this build regression is
critical for the proper functioning of the migration code. So it appears
that the only solution to this problem is to enable CONFIG_HOTPLUG_CPU if
SMP is enabled on PPC_PSERIES platforms. So make that change in the Kconfig.
Reported-by: Adam Lackorzynski <adam@os.inf.tu-dresden.de> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
regulator_enable_regmap() uses enable_reg to enable the regulator.
But enable_reg for smps10 points to SMPS10_STATUS which is a
read-only register. Fixed the same by having enable_reg
set to SMPS10_CTRL.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is encountered when booting RHEL5.9 64-bit. There is another bug
after this one that is not a simple emulation failure, but this one lets
the boot proceed a bit.
Given that srpt_release_channel_work() calls target_wait_for_sess_cmds()
to allow outstanding se_cmd_t->cmd_kref a change to complete, the call
to perform target_sess_cmd_list_set_waiting() needs to happen in
srpt_shutdown_session()
Also, this patch adds an explicit call to srpt_shutdown_session() within
srpt_drain_channel() so that target_sess_cmd_list_set_waiting() will be
called in the cases where TFO->shutdown_session() is not triggered
directly by TCM.
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Cc: Joern Engel <joern@logfs.org> Cc: Roland Dreier <roland@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If a key was larger than 64 bytes, as checked by iscsi_check_key(), the
error response packet, generated by iscsi_add_notunderstood_response(),
would still attempt to copy the entire key into the packet, overflowing
the structure on the heap.
Remote preauthentication kernel memory corruption was possible if a
target was configured and listening on the network.
If we are emulating an instruction inside an active user transaction that
touches memory, the kernel can't emulate it as it operates in transactional
suspend context. We need to abort these transactions and send them back to
userspace for the hardware to rollback.
We can service these if the user transaction is in suspend mode, since the
kernel will operate in the same suspend context.
This adds a check to all alignment faults and to specific instruction
emulations (only string instructions for now). If the user process is in an
active (non-suspended) transaction, we abort the transaction go back to
userspace allowing the HW to roll back the transaction and tell the user of the
failure. This also adds new tm abort cause codes to report the reason of the
persistent error to the user.
Crappy test case here http://neuling.org/devel/junkcode/aligntm.c
Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When in an active transaction that takes a signal, we need to be careful with
the stack. It's possible that the stack has moved back up after the tbegin.
The obvious case here is when the tbegin is called inside a function that
returns before a tend. In this case, the stack is part of the checkpointed
transactional memory state. If we write over this non transactionally or in
suspend, we are in trouble because if we get a tm abort, the program counter
and stack pointer will be back at the tbegin but our in memory stack won't be
valid anymore.
To avoid this, when taking a signal in an active transaction, we need to use
the stack pointer from the checkpointed state, rather than the speculated
state. This ensures that the signal context (written tm suspended) will be
written below the stack required for the rollback. The transaction is aborted
becuase of the treclaim, so any memory written between the tbegin and the
signal will be rolled back anyway.
For signals taken in non-TM or suspended mode, we use the
normal/non-checkpointed stack pointer.
Tested with 64 and 32 bit signals
Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While returning from exception handling in case of PREEMPT enabled,
_TIF_NEED_RESCHED bit is checked in TI_FLAGS (thread_info flag) of current
task. Only if this bit is set, it should continue with the process of
calling preempt_schedule_irq() to schedule highest priority task if
available.
Current code assumes that r8 contains TI_FLAGS and check this for
_TIF_NEED_RESCHED, but as r8 is modified in the code which executes before
this check, r8 no longer contains the expected TI_FLAGS information.
As a result check for comparison with _TIF_NEED_RESCHED was failing even if
NEED_RESCHED bit is set in the current thread_info flag. Due to this,
preempt_schedule_irq() and in turn scheduler was not getting called even if
highest priority task is ready for execution.
So, store temporary results in r0 instead of r8 to prevent r8 from getting
modified as subsequent code is dependent on its value.
When cgroup_next_descendant_pre() initiates a walk, it checks whether
the subtree root doesn't have any children and if not returns NULL.
Later code assumes that the subtree isn't empty. This is broken
because the subtree may become empty inbetween, which can lead to the
traversal escaping the subtree by walking to the sibling of the
subtree root.
There's no reason to have the early exit path. Remove it along with
the later assumption that the subtree isn't empty. This simplifies
the code a bit and fixes the subtle bug.
While at it, fix the comment of cgroup_for_each_descendant_pre() which
was incorrectly referring to ->css_offline() instead of
->css_online().
cgroup_create_file() calls d_instantiate(), which may decide to look
at the xattrs on the file. Smack always does this and SELinux can be
configured to do so.
But cgroup_add_file() didn't initialize xattrs before calling
cgroup_create_file(), which finally leads to dereferencing NULL
dentry->d_fsdata.
This bug has been there since cgroup xattr was introduced.
Reported-by: Ivan Bulatovic <combuster@archlinux.us> Reported-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The present code does not wait for the SCC to finish resetting itself
before trying to initialise the device. The result is that the SCC
interrupt sources become enabled (if they weren't already). This leads to
an early boot crash (unexpected interrupt) given CONFIG_EARLY_PRINTK. Fix
this by adding a delay. A successful reset disables the interrupt sources.
Also, after the reset for channel A setup, the SCC then gets a second
reset for channel B setup which leaves channel A uninitialised again. Fix
this by performing the reset only once.
libata honors DMADIR for regular commands, but not for internal commands
used (among other) during device initialisation.
This makes SATA-host-to-PATA-device bridges based on Silicon Image SiL3611
(such as "Abit Serillel 2") end up disabled when used with an ATAPI device
after a few tries.
Log output of the bridge being hot-plugged with an ATAPI drive:
[ 9631.212901] ata1: exception Emask 0x10 SAct 0x0 SErr 0x40c0000 action 0xe frozen
[ 9631.212913] ata1: irq_stat 0x00000040, connection status changed
[ 9631.212923] ata1: SError: { CommWake 10B8B DevExch }
[ 9631.212939] ata1: hard resetting link
[ 9632.104962] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 9632.106393] ata1.00: ATAPI: PIONEER DVD-RW DVR-115, 1.06, max UDMA/33
[ 9632.106407] ata1.00: applying bridge limits
[ 9632.108151] ata1.00: configured for UDMA/33
[ 9637.105303] ata1.00: qc timeout (cmd 0xa0)
[ 9637.105324] ata1.00: failed to clear UNIT ATTENTION (err_mask=0x5)
[ 9637.105335] ata1: hard resetting link
[ 9638.044599] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 9638.047878] ata1.00: configured for UDMA/33
[ 9643.044933] ata1.00: qc timeout (cmd 0xa0)
[ 9643.044953] ata1.00: failed to clear UNIT ATTENTION (err_mask=0x5)
[ 9643.044963] ata1: limiting SATA link speed to 1.5 Gbps
[ 9643.044971] ata1.00: limiting speed to UDMA/33:PIO3
[ 9643.044979] ata1: hard resetting link
[ 9643.984225] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 9643.987471] ata1.00: configured for UDMA/33
[ 9648.984591] ata1.00: qc timeout (cmd 0xa0)
[ 9648.984612] ata1.00: failed to clear UNIT ATTENTION (err_mask=0x5)
[ 9648.984619] ata1.00: disabled
[ 9649.000593] ata1: hard resetting link
[ 9649.939902] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 9649.955864] ata1: EH complete
With this patch, the drive enumerates correctly when libata is loaded with
atapi_dmadir=1:
The driver's interrupt handling code is too picky in deciding whether it should
handle an interrupt or not which causes completely unneeded spurious interrupts.
Thus make sata_rcar_{ata|serr}_interrupt() *void*; add ATA status register read
to sata_rcar_ata_interrupt() to clear an unexpected ATA interrupt -- it doesn't
get cleared by writing to the SATAINTSTAT register in the interrupt mode we use.
Also, in sata_rcar_ata_interrupt() we should check SATAINTSTAT register only for
enabled interrupts and we should clear only those interrupts that we have read
as active first time around, because else we have a race and risk clearing an
interrupt that can occur between read and write of the SATAINTSTAT register
and never registering it...
Iff bmdma_setup() has to stop a DMA transfer before starting a new
one, then the STOP bit in the ATAPI_CONTROL1 register will remain set
(it's only cleared when setting the START bit to 1) and then
bmdma_start() method will set both START and STOP bits simultaneously
which should abort the transfer being just started. Avoid that by
explicitly clearing the STOP bit in bmdma_start() method (in this case
it will be ignored on write).
Consider the case where we have a very short ip= string in the original
mount options, and when we chase a referral we end up with a very long
IPv6 address. Be sure to allow for that possibility when estimating the
size of the string to allocate.
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Newer asics have variable numbers of crtcs. Use that
rather than the asic family to determine which crtcs
to check. This avoids checking non-existent crtcs or
missing crtcs on certain asics.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The default register value for MASTERA_VOL is 0x00, the same as
MASTERB_VOL.
Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Acked-by: Brian Austin <brian.austin@cirrus.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As of f025adf191924e3a75ce80e130afcd2485b53bb8 "sunrpc: Properly decode
kuids and kgids in RPC_AUTH_UNIX credentials" any rpc containing a -1
(0xffff) uid or gid would fail with a badcred error.
Reported symptoms were xmbc clients failing on upgrade of the NFS
server; examination of the network trace showed them sending -1 as the
gid.
Reported-by: Julian Sikorski <belegdol@gmail.com> Tested-by: Julian Sikorski <belegdol@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The lockless RPC_IS_QUEUED() test in __rpc_execute means that we need to
be careful about ordering the calls to rpc_test_and_set_running(task) and
rpc_clear_queued(task). If we get the order wrong, then we may end up
testing the RPC_TASK_RUNNING flag after __rpc_execute() has looped
and changed the state of the rpc_task.
The s5p_csis_phy_enable/s5p_dsim_phy_enable functions are now used
directly by corresponding drivers and thus need to be exported so
the drivers can be built as modules.
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Rather than completely killing the kernel if we receive an esr value we
can't deal with in the el0 handlers, send the process a SIGILL and log
the esr value in the hope that we can debug it. If we receive a bad esr
from el1, we'll die() as before.
XFS has failed to kill suid/sgid bits correctly when truncating
files of non-zero size since commit c4ed4243 ("xfs: split
xfs_setattr") introduced in the 3.1 kernel. Fix it.
Fix it.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Otherwise we get a race between unload and reload of the same module:
the new module doesn't see the old one in the list, but then fails because
it can't register over the still-extant entries in sysfs:
binutils prior to 2.18 (e.g. the ones found on SLE10) don't support
assembling PEXTRD, so a macro based approach like the one for PCLMULQDQ
in the same file should be used.
This requires making the helper macros capable of recognizing 32-bit
general purpose register operands.
[ hpa: tagging for stable as it is a low risk build fix ]
Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/51A6142A02000078000D99D8@nat28.tlf.novell.com Cc: Alexander Boyko <alexander_boyko@xyratex.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Christian found v3.9 does not work with E350 with EFI is enabled.
[ 1.658832] Trying to unpack rootfs image as initramfs...
[ 1.679935] BUG: unable to handle kernel paging request at ffff88006e3fd000
[ 1.686940] IP: [<ffffffff813661df>] memset+0x1f/0xb0
[ 1.692010] PGD 1f77067 PUD 1f7a067 PMD 61420067 PTE 0
but early memtest report all memory could be accessed without problem.
early page table is set in following sequence:
[ 0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
[ 0.000000] init_memory_mapping: [mem 0x6e600000-0x6e7fffff]
[ 0.000000] init_memory_mapping: [mem 0x6c000000-0x6e5fffff]
[ 0.000000] init_memory_mapping: [mem 0x00100000-0x6bffffff]
[ 0.000000] init_memory_mapping: [mem 0x6e800000-0x6ea07fff]
but later efi_enter_virtual_mode try set mapping again wrongly.
[ 0.010644] pid_max: default: 32768 minimum: 301
[ 0.015302] init_memory_mapping: [mem 0x640c5000-0x6e3fcfff]
that means it fails with pfn_range_is_mapped.
It turns out that we have a bug in add_range_with_merge and it does not
merge range properly when new add one fill the hole between two exsiting
ranges. In the case when [mem 0x00100000-0x6bffffff] is the hole between
[mem 0x00000000-0x000fffff] and [mem 0x6c000000-0x6e7fffff].
Fix the add_range_with_merge by calling itself recursively.
In head_64.S, a switchover has been used to handle kernel crossing
1G, 512G boundaries.
And commit 8170e6bed465b4b0c7687f93e9948aca4358a33b
x86, 64bit: Use a #PF handler to materialize early mappings on demand
said:
During the switchover in head_64.S, before #PF handler is available,
we use three pages to handle kernel crossing 1G, 512G boundaries with
sharing page by playing games with page aliasing: the same page is
mapped twice in the higher-level tables with appropriate wraparound.
But from the switchover code, when we set up the PUD table:
114 addq $4096, %rdx
115 movq %rdi, %rax
116 shrq $PUD_SHIFT, %rax
117 andl $(PTRS_PER_PUD-1), %eax
118 movq %rdx, (4096+0)(%rbx,%rax,8)
119 movq %rdx, (4096+8)(%rbx,%rax,8)
It seems line 119 has a potential bug there. For example,
if the kernel is loaded at physical address 511G+1008M, that is 000000000111111111111111000000000000000000000000
and the kernel _end is 512G+2M, that is 000000001000000000000000001000000000000000000000
So in this example, when using the 2nd page to setup PUD (line 114~119),
rax is 511.
In line 118, we put rdx which is the address of the PMD page (the 3rd page)
into entry 511 of the PUD table. But in line 119, the entry we calculate from
(4096+8)(%rbx,%rax,8) has exceeded the PUD page. IMO, the entry in line
119 should be wraparound into entry 0 of the PUD table.
The patch fixes the bug.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Link: http://lkml.kernel.org/r/5191DE5A.3020302@cn.fujitsu.com Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With the addition of eagerfpu the irq_fpu_usable() now returns false
negatives especially in the case of ksoftirqd and interrupted idle task,
two common cases for FPU use for example in networking/crypto. With
eagerfpu=off FPU use is possible in those contexts. This is because of
the eagerfpu check in interrupted_kernel_fpu_idle():
...
* For now, with eagerfpu we will return interrupted kernel FPU
* state as not-idle. TBD: Ideally we can change the return value
* to something like __thread_has_fpu(current). But we need to
* be careful of doing __thread_clear_has_fpu() before saving
* the FPU etc for supporting nested uses etc. For now, take
* the simple route!
...
if (use_eager_fpu())
return 0;
As eagerfpu is automatically "on" on those CPUs that also have the
features like AES-NI this patch changes the eagerfpu check to return 1 in
case the kernel_fpu_begin() has not been said yet. Once it has been the
__thread_has_fpu() will start returning 0.
Notice that with eagerfpu the __thread_has_fpu is always true initially.
FPU use is thus always possible no matter what task is under us, unless
the state has already been saved with kernel_fpu_begin().
[ hpa: this is a performance regression, not a correctness regression,
but since it can be quite serious on CPUs which need encryption at
interrupt time I am marking this for urgent/stable. ]
We should not use set_pmd_at to update pmd_t with pgtable_t pointer.
set_pmd_at is used to set pmd with huge pte entries and architectures
like ppc64, clear few flags from the pte when saving a new entry.
Without this change we observe bad pte errors like below on ppc64 with
THP enabled.
A panic can be caused by simply cat'ing /proc/<pid>/smaps while an
application has a VM_PFNMAP range. It happened in-house when a
benchmarker was trying to decipher the memory layout of his program.
/proc/<pid>/smaps and similar walks through a user page table should not
be looking at VM_PFNMAP areas.
Certain tests in walk_page_range() (specifically split_huge_page_pmd())
assume that all the mapped PFN's are backed with page structures. And
this is not usually true for VM_PFNMAP areas. This can result in panics
on kernel page faults when attempting to address those page structures.
There are a half dozen callers of walk_page_range() that walk through a
task's entire page table (as N. Horiguchi pointed out). So rather than
change all of them, this patch changes just walk_page_range() to ignore
VM_PFNMAP areas.
The logic of hugetlb_vma() is moved back into walk_page_range(), as we
want to test any vma in the range.
VM_PFNMAP areas are used by:
- graphics memory manager gpu/drm/drm_gem.c
- global reference unit sgi-gru/grufile.c
- sgi special memory char/mspec.c
- and probably several out-of-tree modules
[akpm@linux-foundation.org: remove now-unused hugetlb_vma() stub] Signed-off-by: Cliff Wickman <cpw@sgi.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: David Sterba <dsterba@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The index on the page must be set before it is inserted in the radix
tree. Otherwise there is a small race which can occur during lookup
where the page can be found with the incorrect index. This will trigger
the BUG_ON() in brd_lookup_page().
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Reported-by: Chris Wedgwood <cw@f00f.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 0c59b89c81ea ("mm: memcg: push down PageSwapCache check into
uncharge entry functions") added a VM_BUG_ON() on PageSwapCache in the
uncharge path after checking that page flag once, assuming that the
state is stable in all paths, but this is not the case and the condition
triggers in user environments. An uncharge after the last page table
reference to the page goes away can race with reclaim adding the page to
swap cache.
Swap cache pages are usually uncharged when they are freed after
swapout, from a path that also handles swap usage accounting and memcg
lifetime management. However, since the last page table reference is
gone and thus no references to the swap slot left, the swap slot will be
freed shortly when reclaim attempts to write the page to disk. The
whole swap accounting is not even necessary.
So while the race condition for which this VM_BUG_ON was added is real
and actually existed all along, there are no negative effects. Remove
the VM_BUG_ON again.
Commit 751efd8610d3 ("mmu_notifier_unregister NULL Pointer deref and
multiple ->release()") breaks the fix 3ad3d901bbcf ("mm: mmu_notifier:
fix freed page still mapped in secondary MMU").
Since hlist_for_each_entry_rcu() is changed now, we can not revert that
patch directly, so this patch reverts the commit and simply fix the bug
spotted by that patch
There is a race condition between mmu_notifier_unregister() and
__mmu_notifier_release().
Assume two tasks, one calling mmu_notifier_unregister() as a result
of a filp_close() ->flush() callout (task A), and the other calling
mmu_notifier_release() from an mmput() (task B).
A B
t1 srcu_read_lock()
t2 if (!hlist_unhashed())
t3 srcu_read_unlock()
t4 srcu_read_lock()
t5 hlist_del_init_rcu()
t6 synchronize_srcu()
t7 srcu_read_unlock()
t8 hlist_del_rcu() <--- NULL pointer deref.
This can be fixed by using hlist_del_init_rcu instead of hlist_del_rcu.
The another issue spotted in the commit is "multiple ->release()
callouts", we needn't care it too much because it is really rare (e.g,
can not happen on kvm since mmu-notify is unregistered after
exit_mmap()) and the later call of multiple ->release should be fast
since all the pages have already been released by the first call.
Anyway, this issue should be fixed in a separate patch.
-stable suggestions: Any version that has commit 751efd8610d3 need to be
backported. I find the oldest version has this commit is 3.0-stable.
nilfs2: fix issue of nilfs_set_page_dirty for page at EOF boundary
DESCRIPTION:
There are use-cases when NILFS2 file system (formatted with block size
lesser than 4 KB) can be remounted in RO mode because of encountering of
"broken bmap" issue.
The issue was reported by Anthony Doggett <Anthony2486@interfaces.org.uk>:
"The machine I've been trialling nilfs on is running Debian Testing,
Linux version 3.2.0-4-686-pae (debian-kernel@lists.debian.org) (gcc
version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.35-2), but I've
also reproduced it (identically) with Debian Unstable amd64 and Debian
Experimental (using the 3.8-trunk kernel). The problematic partitions
were formatted with "mkfs.nilfs2 -b 1024 -B 8192"."
SYMPTOMS:
(1) System log contains error messages likewise:
(2) The NILFS2 file system is remounted in RO mode.
REPRODUSING PATH:
(1) Create volume group with name "unencrypted" by means of vgcreate utility.
(2) Run script (prepared by Anthony Doggett <Anthony2486@interfaces.org.uk>):
The error message "NILFS error (device dm-17): nilfs_bmap_assign: broken
bmap (inode number=28)" takes place because of trying to get block
number for third block of the file with logical offset #3072 bytes. As
it is possible to see from above output, the file has 60 bytes of the
whole size. So, it is enough one block (1 KB in size) allocation for
the whole file. Trying to operate with several blocks instead of one
takes place because of discovering several dirty buffers for this file
in nilfs_segctor_scan_file() method.
The root cause of this issue is in nilfs_set_page_dirty function which
is called just before writing to an mmapped page.
When nilfs_page_mkwrite function handles a page at EOF boundary, it
fills hole blocks only inside EOF through __block_page_mkwrite().
The __block_page_mkwrite() function calls set_page_dirty() after filling
hole blocks, thus nilfs_set_page_dirty function (=
a_ops->set_page_dirty) is called. However, the current implementation
of nilfs_set_page_dirty() wrongly marks all buffers dirty even for page
at EOF boundary.
As a result, buffers outside EOF are inconsistently marked dirty and
queued for write even though they are not mapped with nilfs_get_block
function.
FIX:
This modifies nilfs_set_page_dirty() not to mark hole blocks dirty.
Thanks to Vyacheslav Dubeyko for his effort on analysis and proposals
for this issue.
Many callers of the wait_event_timeout() and
wait_event_interruptible_timeout() expect that the return value will be
positive if the specified condition becomes true before the timeout
elapses. However, at the moment this isn't guaranteed. If the wake-up
handler is delayed enough, the time remaining until timeout will be
calculated as 0 - and passed back as a return value - even if the
condition became true before the timeout has passed.
Fix this by returning at least 1 if the condition becomes true. This
semantic is in line with what wait_for_condition_timeout() does; see
commit bb10ed09 ("sched: fix wait_for_completion_timeout() spurious
failure under heavy load").
Daniel said "We have 3 instances of this bug in drm/i915. One case even
where we switch between the interruptible and not interruptible
wait_event_timeout variants, foolishly presuming they have the same
semantics. I very much like this."
One such bug is reported at
https://bugs.freedesktop.org/show_bug.cgi?id=64133
Signed-off-by: Imre Deak <imre.deak@intel.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: David Howells <dhowells@redhat.com> Acked-by: Jens Axboe <axboe@kernel.dk> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dave Jones <davej@redhat.com> Cc: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is a race between klist_remove and klist_release. klist_remove
uses a local var waiter saved on stack. When klist_release calls
wake_up_process(waiter->process) to wake up the waiter, waiter might run
immediately and reuse the stack. Then, klist_release calls
list_del(&waiter->list) to change previous
wait data and cause prior waiter thread corrupt.
Page 'new' during MIGRATION can't be flushed with flush_cache_page().
Using flush_cache_page(vma, addr, pfn) is justified only if the page is
already placed in process page table, and that is done right after
flush_cache_page(). But without it the arch function has no knowledge
of process PTE and does nothing.
Besides that, flush_cache_page() flushes an application cache page, but
the kernel has a different page virtual address and dirtied it.
Replace it with flush_dcache_page(new) which is the proper usage.
The old page is flushed in try_to_unmap_one() before migration.
This bug takes place in Sead3 board with M14Kc MIPS CPU without cache
aliasing (but Harvard arch - separate I and D cache) in tight memory
environment (128MB) each 1-3days on SOAK test. It fails in cc1 during
kernel build (SIGILL, SIGBUS, SIGSEG) if CONFIG_COMPACTION is switched
ON.
Signed-off-by: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com> Cc: Leonid Yegoshin <yegoshin@mips.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Michal Hocko <mhocko@suse.cz> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix bug in MSI interrupt handling which causes loss of event
notifications.
Typical indication of lost MSI interrupts are stalled message and
doorbell transfers between RapidIO endpoints. To avoid loss of MSI
interrupts all interrupts from the device must be disabled on entering
the interrupt handler routine and re-enabled when exiting it.
Re-enabling device interrupts will trigger new MSI message(s) if Tsi721
registered new events since entering interrupt handler routine.
This patch is applicable to kernel versions starting from v3.2.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
During the development of this driver an in-house register documentation
was used. The last week some integration tests were done and this
problem was found. It turned out that the released register
documentation is wrong.
The fix is very simple: shift all masks by one.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Cc: Bryan Wu <cooloney@gmail.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Enable KW_PCIE1 on QNAP TS-11x/TS-21x devices as newer revisions
(rev 1.3) have a USB 3.0 chip from Etron on PCIe port 1. Thanks
to Marek Vasut for identifying this issue!
Signed-off-by: Martin Michlmayr <tbm@cyrius.com> Tested-by: Marek Vasut <marex@denx.de> Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Last time we found there is lock/unlock bug in ocfs2_file_aio_write, and
then we did a thorough search for all lock resources in
ocfs2_inode_info, including rw, inode and open lockres and found this
bug. My kernel version is 3.0.13, and it is also in the lastest version
3.9. In ocfs2_fiemap, once ocfs2_get_clusters_nocache failed, it should
goto out_unlock instead of out, because we need release buffer head, up
read alloc sem and unlock inode.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Acked-by: Sunil Mushran <sunil.mushran@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Manual peak calibration is currently enabled only for
AR9462 and AR9565. This is also required for AR9485.
The initvals are also modified to disable HW peak calibration.
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The comparison between traced and symbol addresses is backwards: if
the traced address doesn't exactly match a symbol (which we don't
expect it to), we'll show the next symbol and the offset to it,
whereas we should show the previous symbol and the offset from it.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This works much better if we don't treat protocol numbers as addresses.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The 5725 family of devices (asic rev 5762), corrupts TSO packets where
the buffer is within MSS bytes of a 4G boundary (4G, 8G etc.). Detect
this condition and trigger the workaround path.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On the 5718, 5719 and 5720 serdes devices, powering down function 0
results in all the other ports being powered down. Add code to skip
function 0 power down.
v2:
- Modify tg3_phy_power_bug() function to use a switch instead of a
complicated if statement. Suggested by Joe Perches.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 902c098a3663 ("random: use lockless techniques in the interrupt
path") turned IRQ path from being spinlock protected into lockless
cmpxchg-retry update.
That commit removed r->lock serialization between crediting entropy bits
from IRQ context and accounting when extracting entropy on userspace
read path, but didn't turn the r->entropy_count reads/updates in
account() to use cmpxchg as well.
It has been observed, that under certain circumstances this leads to
read() on /dev/urandom to return 0 (EOF), as r->entropy_count gets
corrupted and becomes negative, which in turn results in propagating 0
all the way from account() to the actual read() call.
Convert the accounting code to be the proper lockless counterpart of
what has been partially done by 902c098a3663.
Commit ec8f02da9ea5 ("random: prime last_data value per fips
requirements") added priming of last_data per fips requirements.
Unfortuantely, it did so in a way that can lead to multiple threads all
incrementing nbytes, but only one actually doing anything with the extra
data, which leads to some fun random corruption and panics.
The fix is to simply do everything needed to prime last_data in a single
shot, so there's no window for multiple cpus to increment nbytes -- in
fact, we won't even increment or decrement nbytes anymore, we'll just
extract the needed EXTRACT_SIZE one time per pool and then carry on with
the normal routine.
All these changes have been tested across multiple hosts and
architectures where panics were previously encoutered. The code changes
are are strictly limited to areas only touched when when booted in fips
mode.
This change should also go into 3.8-stable, to make the myriads of fips
users on 3.8.x happy.
Signed-off-by: Jarod Wilson <jarod@redhat.com> Tested-by: Jan Stancek <jstancek@redhat.com> Tested-by: Jan Stodola <jstodola@redhat.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Neil Horman <nhorman@tuxdriver.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Matt Mackall <mpm@selenic.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-------------->8---------------------
[ARCLinux]$ ip address show lo | grep inet
[ARCLinux]$ ip address show lo | grep inet
[ARCLinux]$ ip address show lo | grep inet
[ARCLinux]$
[ARCLinux]$ ip address show lo | grep inet
inet 127.0.0.1/8 scope host lo
-------------->8---------------------
The user mode page permission templates used to have all Kernel mode
access bits enabled.
This caused a tricky race condition observed with uClibc buffered file
read and UNIX pipes.
1. Read access to an anon mapped page in libc .bss: write-protected
zero_page mapped: TLB Entry installed with Ur + K[rwx]
2. grep calls libc:getc() -> buffered read layer calls read(2) with the
internal read buffer in same .bss page.
The read() call is on STDIN which has been redirected to a pipe.
read(2) => sys_read() => pipe_read() => copy_to_user()
3. Since page has Kernel-write permission (despite being user-mode
write-protected), copy_to_user() suceeds w/o taking a MMU TLB-Miss
Exception (page-fault for ARC). core-MM is unaware that kernel
erroneously wrote to the reserved read-only zero-page (BUG #1)
4. Control returns to userspace which now does a write to same .bss page
Since Linux MM is not aware that page has been modified by kernel, it
simply reassigns a new writable zero-init page to mapping, loosing the
prior write by kernel - effectively zero'ing out the libc read buffer
under the hood - hence grep doesn't see right data (BUG #2)
The fix is to make all kernel-mode access permissions mirror the
user-mode ones. Note that the kernel still has full access to pages,
when accessed directly (w/o MMU) - this fix ensures that kernel-mode
access in copy_to_from() path uses the same faulting access model as for
pure user accesses to keep MM fully aware of page state.
The issue is peudo-random because it only shows up if the TLB entry
installed in #1 is present at the time of #3. If it is evicted out, due
to TLB pressure or some-such, then copy_to_user() does take a TLB Miss
Exception, with a routine write-to-anon COW processing installing a
fresh page for kernel writes and also usable as it is in userspace.
Further the issue was dormant for so long as it depends on where the
libc internal read buffer (in .bss) is mapped at runtime.
If it happens to reside in file-backed data mapping of libc (in the
page-aligned slack space trailing the file backed data), loader zero
padding the slack space, does the early cow page replacement, setting
things up at the very beginning itself.
With gcc 4.8 based builds, the libc buffer got pushed out to a real
anon mapping which triggers the issue.
It's generally not safe to reset the inode ops once they've been set. In
the case where the inode was originally thought to be a directory and
then later found to be a DFS referral, this can lead to an oops when we
try to trigger an inode op on it after changing the ops to the blank
referral operations.
Reported-and-Tested-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linux' notion of cpuid is different from the Host's notion of CPUID. In the
call to bind the channel interrupts, we should use the host's notion of
CPU Ids. Fix this bug.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
HP's virtual UHCI host controller takes a long time to suspend
(several hundred microseconds), even when no devices are attached.
This provokes a warning message from uhci-hcd in the auto-stop case.
To prevent this from happening, this patch adds a test to avoid
performing an auto-stop when the wait_for_hp quirk flag is set. The
controller will still suspend through the normal runtime PM mechanism.
And since that pathway includes a 1-ms delay, the slowness of the
virtual hardware won't matter.