The hardware and firmware for the RTL8192EE utilize a FIFO list of
descriptors. There were some problems with the initial implementation.
The worst of these failed to detect that the FIFO was becoming full,
which led to the device needing to be power cycled. As this condition
is not relevant to most of the devices supported by rtlwifi, a callback
routine was added to detect this situation. This patch implements the
necessary changes in the pci handler, and the linkage into the appropriate
rtl8192ee routine.
Signed-off-by: Troy Tan <troy_tan@realsil.com.cn> Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: Stable <stable@vger.kernel.org> [V3.18] Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We have a race condition between move_pages() and freeing hugepages, where
move_pages() calls follow_page(FOLL_GET) for hugepages internally and
tries to get its refcount without preventing concurrent freeing. This
race crashes the kernel, so this patch fixes it by moving FOLL_GET code
for hugepages into follow_huge_pmd() with taking the page table lock.
This patch intentionally removes page==NULL check after pte_page.
This is justified because pte_page() never returns NULL for any
architectures or configurations.
This patch changes the behavior of follow_huge_pmd() for tail pages and
then tail pages can be pinned/returned. So the caller must be changed to
properly handle the returned tail pages.
We could have a choice to add the similar locking to
follow_huge_(addr|pud) for consistency, but it's not necessary because
currently these functions don't support FOLL_GET flag, so let's leave it
for future development.
int main(int argc, char *argv[]) {
int i;
int nr_hp = strtol(argv[1], NULL, 0);
int nr_p = nr_hp * HPS / PS;
int ret;
void **addrs;
int *status;
int *nodes;
pid_t pid;
Currently we have many duplicates in definitions around
follow_huge_addr(), follow_huge_pmd(), and follow_huge_pud(), so this
patch tries to remove the m. The basic idea is to put the default
implementation for these functions in mm/hugetlb.c as weak symbols
(regardless of CONFIG_ARCH_WANT_GENERAL_HUGETL B), and to implement
arch-specific code only when the arch needs it.
For follow_huge_addr(), only powerpc and ia64 have their own
implementation, and in all other architectures this function just returns
ERR_PTR(-EINVAL). So this patch sets returning ERR_PTR(-EINVAL) as
default.
As for follow_huge_(pmd|pud)(), if (pmd|pud)_huge() is implemented to
always return 0 in your architecture (like in ia64 or sparc,) it's never
called (the callsite is optimized away) no matter how implemented it is.
So in such architectures, we don't need arch-specific implementation.
In some architecture (like mips, s390 and tile,) their current
arch-specific follow_huge_(pmd|pud)() are effectively identical with the
common code, so this patch lets these architecture use the common code.
One exception is metag, where pmd_huge() could return non-zero but it
expects follow_huge_pmd() to always return NULL. This means that we need
arch-specific implementation which returns NULL. This behavior looks
strange to me (because non-zero pmd_huge() implies that the architecture
supports PMD-based hugepage, so follow_huge_pmd() can/should return some
relevant value,) but that's beyond this cleanup patch, so let's keep it.
Justification of non-trivial changes:
- in s390, follow_huge_pmd() checks !MACHINE_HAS_HPAGE at first, and this
patch removes the check. This is OK because we can assume MACHINE_HAS_HPAGE
is true when follow_huge_pmd() can be called (note that pmd_huge() has
the same check and always returns 0 for !MACHINE_HAS_HPAGE.)
- in s390 and mips, we use HPAGE_MASK instead of PMD_MASK as done in common
code. This patch forces these archs use PMD_MASK, but it's OK because
they are identical in both archs.
In s390, both of HPAGE_SHIFT and PMD_SHIFT are 20.
In mips, HPAGE_SHIFT is defined as (PAGE_SHIFT + PAGE_SHIFT - 3) and
PMD_SHIFT is define as (PAGE_SHIFT + PAGE_SHIFT + PTE_ORDER - 3), but
PTE_ORDER is always 0, so these are identical.
[n-horiguchi@ah.jp.nec.com: resolve conflict to apply to v3.19.1] Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Steve Capper <steve.capper@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reading of analog input channels by the `INSN_READ` comedi instruction
is broken for all except channel 0. `pci171x_ai_insn_read()` calls
`pci171x_ai_read_sample()` with the wrong value for the third parameter.
It is supposed to be the current index in a channel list (which is
always of length 1 in this case, so the index should be 0), but instead
it is passing the actual channel number. `pci171x_ai_read_sample()`
checks the channel number encoded in the raw sample value read from the
hardware matches the channel number stored in the specified index of the
previously set up channel list and returns `-ENODATA` if it doesn't
match. Since the index should always be 0 in this case, the match will
fail unless the channel number is also 0. Fix it by passing 0 as the
channel index.
Note that when the bug first appeared, it was `pci171x_ai_dropout()`
that was called with the wrong parameter value. `pci171x_ai_dropout()`
got replaced with `pci171x_ai_read_sample()` in commit 7fd2dae2500d
("staging: comedi: adv_pci1710: introduce pci171x_ai_read_sample()").
If EPT was enabled, unrestricted_guest was allowed in L1 regardless of
L0. L1 triple faulted when running L2 guest that required emulation.
Another side effect was 'WARN_ON_ONCE(vmx->nested.nested_run_pending)'
in L0's dmesg:
WARNING: CPU: 0 PID: 0 at arch/x86/kvm/vmx.c:9190 nested_vmx_vmexit+0x96e/0xb00 [kvm_intel] ()
Prevent this scenario by masking SECONDARY_EXEC_UNRESTRICTED_GUEST when
the host doesn't have it enabled.
tg3_init_one() calls tg3_halt() without tp->lock despite its assumption
and causes deadlock.
If lockdep is enabled, a warning like this shows up before the stall:
[ BUG: bad unlock balance detected! ]
3.19.0test #3 Tainted: G E
-------------------------------------
insmod/369 is trying to release lock (&(&tp->lock)->rlock) at:
[<ffffffffa02d5a1d>] tg3_chip_reset+0x14d/0x780 [tg3]
but there are no more locks to release!
tg3_init_one() doesn't call tg3_halt() under normal situation but
during kexec kdump I hit this problem.
Fixes: 932f19de ("tg3: Release tp->lock before invoking synchronize_irq()") Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cdc_ncm disagrees with usbnet about how much framing overhead should
be counted in the tx_bytes statistics, and tries 'fix' this by
decrementing tx_bytes on the transmit path. But statistics must never
be decremented except due to roll-over; this will thoroughly confuse
user-space. Also, tx_bytes is only incremented by usbnet in the
completion path.
Fix this by requiring drivers that set FLAG_MULTI_FRAME to set a
tx_bytes delta along with the tx_packets count.
Currently the usbnet core does not update the tx_packets statistic for
drivers with FLAG_MULTI_PACKET and there is no hook in the TX
completion path where they could do this.
cdc_ncm and dependent drivers are bumping tx_packets stat on the
transmit path while asix and sr9800 aren't updating it at all.
Add a packet count in struct skb_data so these drivers can fill it
in, initialise it to 1 for other drivers, and add the packet count
to the tx_packets statistic on completion.
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Tested-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
handle_offloads() calls skb_reset_inner_headers() to store
the layer pointers to the encapsulated packet. However, we
currently push the vlag tag (if there is one) onto the packet
afterwards. This changes the MAC header for the encapsulated
packet but it is not reflected in skb->inner_mac_header, which
breaks GSO and drivers which attempt to use this for encapsulation
offloads.
Fixes: 1eaa8178 ("vxlan: Add tx-vlan offload support.") Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On Wed, Apr 15, 2015 at 05:41:26PM +0200, Nicolas Dichtel wrote:
> Le 15/04/2015 15:57, Herbert Xu a écrit :
> >On Wed, Apr 15, 2015 at 06:22:29PM +0800, Herbert Xu wrote:
> [snip]
> >Subject: skbuff: Do not scrub skb mark within the same name space
> >
> >The commit ea23192e8e577dfc51e0f4fc5ca113af334edff9 ("tunnels:
> Maybe add a Fixes tag?
> Fixes: ea23192e8e57 ("tunnels: harmonize cleanup done on skb on rx path")
>
> >harmonize cleanup done on skb on rx path") broke anyone trying to
> >use netfilter marking across IPv4 tunnels. While most of the
> >fields that are cleared by skb_scrub_packet don't matter, the
> >netfilter mark must be preserved.
> >
> >This patch rearranges skb_scurb_packet to preserve the mark field.
> nit: s/scurb/scrub
>
> Else it's fine for me.
Sure.
PS I used the wrong email for James the first time around. So
let me repeat the question here. Should secmark be preserved
or cleared across tunnels within the same name space? In fact,
do our security models even support name spaces?
---8<---
The commit ea23192e8e577dfc51e0f4fc5ca113af334edff9 ("tunnels:
harmonize cleanup done on skb on rx path") broke anyone trying to
use netfilter marking across IPv4 tunnels. While most of the
fields that are cleared by skb_scrub_packet don't matter, the
netfilter mark must be preserved.
This patch rearranges skb_scrub_packet to preserve the mark field.
Fixes: ea23192e8e57 ("tunnels: harmonize cleanup done on skb on rx path") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch reverts commit b8fb4e0648a2ab3734140342002f68fb0c7d1602
because the secmark must be preserved even when a packet crosses
namespace boundaries. The reason is that security labels apply to
the system as a whole and is not per-namespace.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 9a2620c877454 ("bnx2x: prevent WARN during driver unload")
switched the napi/busy_lock locking mechanism from spin_lock() into
spin_lock_bh(), breaking inter-operability with netconsole, as netpoll
disables interrupts prior to calling our napi mechanism.
This switches the driver into using atomic assignments instead of the
spinlock mechanisms previously employed.
Based on initial patch from Yuval Mintz & Ariel Elior
I basically added softirq starvation avoidance, and mixture
of atomic operations, plain writes and barriers.
Note this slightly reduces the overhead for this driver when no
busy_poll sockets are in use.
Fixes: 9a2620c877454 ("bnx2x: prevent WARN during driver unload") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I noticed tcpdump was giving funky timestamps for locally
generated SYNACK messages on loopback interface.
11:42:46.938990 IP 127.0.0.1.48245 > 127.0.0.2.23850: S 945476042:945476042(0) win 43690 <mss 65495,nop,nop,sackOK,nop,wscale 7>
20:28:58.502209 IP 127.0.0.2.23850 > 127.0.0.1.48245: S 3160535375:3160535375(0) ack 945476043 win 43690 <mss
65495,nop,nop,sackOK,nop,wscale 7>
This is because we need to clear skb->tstamp before
entering lower stack, otherwise net_timestamp_check()
does not set skb->tstamp.
Fixes: 7faee5c0d514 ("tcp: remove TCP_SKB_CB(skb)->when") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 1daa4303b4ca ("net/mlx4_core: Deprecate error message at
ConnectX-2 cards startup to debug") did the deprecation only for port 1
of the card. Need to deprecate for port 2 as well.
Fixes: 1daa4303b4ca ("net/mlx4_core: Deprecate error message at ConnectX-2 cards startup to debug") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We should not consult skb->sk for output decisions in xmit recursion
levels > 0 in the stack. Otherwise local socket settings could influence
the result of e.g. tunnel encapsulation process.
ipv6 does not conform with this in three places:
1) ip6_fragment: we do consult ipv6_npinfo for frag_size
2) sk_mc_loop in ipv6 uses skb->sk and checks if we should
loop the packet back to the local socket
3) ip6_skb_dst_mtu could query the settings from the user socket and
force a wrong MTU
Furthermore:
In sk_mc_loop we could potentially land in WARN_ON(1) if we use a
PF_PACKET socket ontop of an IPv6-backed vxlan device.
Reuse xmit_recursion as we are currently only interested in protecting
tunnel devices.
Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On processing cumulative ACKs, the FRTO code was not checking the
SACKed bit, meaning that there could be a spurious FRTO undo on a
cumulative ACK of a previously SACKed skb.
The FRTO code should only consider a cumulative ACK to indicate that
an original/unretransmitted skb is newly ACKed if the skb was not yet
SACKed.
The effect of the spurious FRTO undo would typically be to make the
connection think that all previously-sent packets were in flight when
they really weren't, leading to a stall and an RTO.
Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Fixes: e33099f96d99c ("tcp: implement RFC5682 F-RTO") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
xen-netfront limits transmitted skbs to be at most 44 segments in size. However,
GSO permits up to 65536 bytes, which means a maximum of 45 segments of 1448
bytes each. This slight reduction in the size of packets means a slight loss in
efficiency.
Since c/s 9ecd1a75d, xen-netfront sets gso_max_size to
XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER,
where XEN_NETIF_MAX_TX_SIZE is 65535 bytes.
The calculation used by tcp_tso_autosize (and also tcp_xmit_size_goal since c/s 6c09fa09d) in determining when to split an skb into two is
sk->sk_gso_max_size - 1 - MAX_TCP_HEADER.
So the maximum permitted size of an skb is calculated to be
(XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER) - 1 - MAX_TCP_HEADER.
Intuitively, this looks like the wrong formula -- we don't need two TCP headers.
Instead, there is no need to deviate from the default gso_max_size of 65536 as
this already accommodates the size of the header.
Currently, the largest skb transmitted by netfront is 63712 bytes (44 segments
of 1448 bytes each), as observed via tcpdump. This patch makes netfront send
skbs of up to 65160 bytes (45 segments of 1448 bytes each).
Similarly, the maximum allowable mtu does not need to subtract MAX_TCP_HEADER as
it relates to the size of the whole packet, including the header.
Fixes: 9ecd1a75d977 ("xen-netfront: reduce gso_max_size to account for max TCP header") Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Return module reference before invoking the respective vport
->destroy() function. This is needed as ovs_vport_del() is not
invoked inside an RCU read side critical section so the kfree
can occur immediately before returning to ovs_vport_del().
Returning the module reference before ->destroy() is safe because
the module unregistration is blocked on ovs_lock which we hold
while destroying the datapath.
Fixes: 62b9c8d0372d ("ovs: Turn vports with dependencies into separate modules") Reported-by: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Before commit 3900f29021f0bc7fe9815aa32f1a993b7dfdd402 ("bonding: slight
optimizztion for bond_slave_override()") the override logic was to send packets
with non-zero queue_id through the slave with corresponding queue_id, under two
conditions only - if the slave can transmit and it's up.
The above mentioned commit changed this logic by introducing an additional
condition - whether the bond is active (indirectly, using the slave_can_tx and
later - bond_is_active_slave), that prevents the user from implementing more
complex policies according to the Documentation/networking/bonding.txt.
Signed-off-by: Anton Nayshtut <anton@swortex.com> Signed-off-by: Alexey Bogoslavsky <alexey@swortex.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
tcp_v6_fill_cb() will be called twice if socket's state changes from
TCP_TIME_WAIT to TCP_LISTEN. That can result in control buffer data
corruption because in the second tcp_v6_fill_cb() call it's not copying
IP6CB(skb) anymore, but 'seq', 'end_seq', etc., so we can get weird and
unpredictable results. Performance loss of up to 1200% has been observed
in LTP/vxlan03 test.
This can be fixed by copying inet6_skb_parm to the beginning of 'cb'
only if xfrm6_policy_check() and tcp_v6_fill_cb() are going to be
called again.
Fixes: 2dc49d1680b53 ("tcp6: don't move IP6CB before xfrm6_policy_check()") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Instead of -1 with EAGAIN, read on a O_NONBLOCK tun fd will return 0. This
fixes this by properly returning the error code from __skb_recv_datagram.
Signed-off-by: Alex Gartrell <agartrell@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A local route may have a lower hop_limit set than global routes do.
RFC 3756, Section 4.2.7, "Parameter Spoofing"
> 1. The attacker includes a Current Hop Limit of one or another small
> number which the attacker knows will cause legitimate packets to
> be dropped before they reach their destination.
> As an example, one possible approach to mitigate this threat is to
> ignore very small hop limits. The nodes could implement a
> configurable minimum hop limit, and ignore attempts to set it below
> said limit.
Signed-off-by: D.S. Ljungmark <ljungmark@modio.se> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Netdevice registration should be performed a the end of the driver
initialization flow. If we don't do that, after calling register_netdevice,
device callbacks may be issued by higher layers of the stack before
final configuration of the device is done.
For example (VXLAN configuration race), mlx4_SET_PORT_VXLAN was issued
after the register_netdev command. System network scripts may configure
the interface (UP) right after the registration, which also attach
unicast VXLAN steering rule, before mlx4_SET_PORT_VXLAN was called,
causing the firmware to fail the rule attachment.
Fixes: 837052d0ccc5 ("net/mlx4_en: Add netdev support for TCP/IP offloads of vxlan tunneling") Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Master change notifications may occur other than when joining or
leaving a bridge, for example when being added to or removed from
a bond or Open vSwitch.
Previously in those cases rocker_port_bridge_leave() was called
which results in a null-pointer dereference as rocker_port->bridge_dev
is NULL because there is no bridge device.
This patch makes provision for doing nothing in such cases.
Fixes: 6c7079450071f ("rocker: implement L2 bridge offloading") Acked-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On s390x, gcc 4.8 compiles this part of tcp_v6_early_demux()
struct dst_entry *dst = sk->sk_rx_dst;
if (dst)
dst = dst_check(dst, inet6_sk(sk)->rx_dst_cookie);
to code reading sk->sk_rx_dst twice, once for the test and once for
the argument of ip6_dst_check() (dst_check() is inline). This allows
ip6_dst_check() to be called with null first argument, causing a crash.
Protect sk->sk_rx_dst access by READ_ONCE() both in IPv4 and IPv6
TCP early demux code.
Fixes: 41063e9dd119 ("ipv4: Early TCP socket demux.") Fixes: c7109986db3c ("ipv6: Early TCP socket demux") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Failure happens when attempting to allocate pages for
'struct kvm_memslots', however it doesn't have to be
present in physically contiguous (kmalloc-ed) address
space, change allocation to kvm_kvzalloc() so that
it will be vmalloc-ed when its size is more then a page.
It is platform/output depenedent when exactly the pipe will start
running. Sometimes we just need the (cpu) pipe enabled, in other cases
the pch transcoder is enough and in yet other cases the (DP) port is
sending the frame start signal.
In a perfect world we'd put the drm_crtc_vblank_on call exactly where
the pipe starts running, but due to cloning and similar things this
will get messy. And the current approach of picking the most
conservative place for all combinations also doesn't work since that
results in legit vblank waits (in encoder->enable hooks, e.g. the 2
vblank waits for sdvo) failing.
isn't great either since screaming when the vblank wait work because
the pipe is off is kinda nice.
Pick a compromise and move the drm_crtc_vblank_on right before the
encoder->enable call. This is a lie on some outputs/platforms, but
after the ->enable callback the pipe is guaranteed to run everywhere.
So not that bad really. Suggested by Ville.
v2: Same treatment for drm_crtc_vblank_off and encoder->disable: I've
missed the ibx pipe B select w/a, which also has a vblank wait in the
disable function (while the pipe is obviously still running).
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A new fsync vs power fail test in xfstests indicated that XFS can
have unreliable data consistency when doing extending truncates that
require block zeroing. The blocks beyond EOF get zeroed in memory,
but we never force those changes to disk before we run the
transaction that extends the file size and exposes those blocks to
userspace. This can result in the blocks not being correctly zeroed
after a crash.
Because in-memory behaviour is correct, tools like fsx don't pick up
any coherency problems - it's not until the filesystem is shutdown
or the system crashes after writing the truncate transaction to the
journal but before the zeroed data in the page cache is flushed that
the issue is exposed.
Fix this by also flushing the dirty data in memory region between
the old size and new size when we've found blocks that need zeroing
in the truncate process.
Reported-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 4f579ae7de56 (ext4: fix punch hole on files with indirect
mapping) rewrote FALLOC_FL_PUNCH_HOLE for ext4 files with indirect
mapping. However, there are bugs in several corner cases. This fixes 5
distinct bugs:
1. When there is at least one entire level of indirection between the
start and end of the punch range and the end of the punch range is the
first block of its level, we can't return early; we have to free the
intervening levels.
2. When the end is at a higher level of indirection than the start and
ext4_find_shared returns a top branch for the end, we still need to free
the rest of the shared branch it returns; we can't decrement partial2.
3. When a punch happens within one level of indirection, we need to
converge on an indirect block that contains the start and end. However,
because the branches returned from ext4_find_shared do not necessarily
start at the same level (e.g., the partial2 chain will be shallower if
the last block occurs at the beginning of an indirect group), the walk
of the two chains can end up "missing" each other and freeing a bunch of
extra blocks in the process. This mismatch can be handled by first
making sure that the chains are at the same level, then walking them
together until they converge.
4. When the punch happens within one level of indirection and
ext4_find_shared returns a top branch for the start, we must free it,
but only if the end does not occur within that branch.
5. When the punch happens within one level of indirection and
ext4_find_shared returns a top branch for the end, then we shouldn't
free the block referenced by the end of the returned chain (this mirrors
the different levels case).
The hrtimer_{start/cancel} functions call into tracing which uses RCU.
But it is not legal to call into RCU in cpuidle because it is one of the
quiescent states. Hence protect this region with RCU_NONIDLE which informs
RCU that the cpu is momentarily non-idle.
As an aside it is helpful to point out that the clock event device that is
programmed here is not a per-cpu clock device; it is a
pseudo clock device, used by the broadcast framework alone.
The per-cpu clock device programming never goes through bc_set_next().
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: linuxppc-dev@ozlabs.org Cc: mpe@ellerman.id.au Cc: tglx@linutronix.de Link: http://lkml.kernel.org/r/20150318104705.17763.56668.stgit@preeti.in.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For RoCE ports, we set the u32 PMA values based on u64 HCA counters. In case of
overflow, according to the IB spec, we have to saturate a counter to its
max value, do that.
Fixes: c37791349cc7 ('IB/mlx4: Support PMA counters for IBoE') Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The rate provided at the output of a clk-divider is calculated as:
DIV_ROUND_UP(parent_rate, div)
since commit b11d282dbea2 (clk: divider: fix rate calculation for
fractional rates). So to yield a rate not bigger than r parent_rate
must be <= r * div.
The effect of choosing a parent rate that is too big as was done before
this patch results in wrongly ruling out good dividers.
Note that this is not a complete fix as __clk_round_rate might return a
value >= its 2nd parameter. Also for dividers with
CLK_DIVIDER_ROUND_CLOSEST set the calculation is not accurate. But this
fixes the test case by Sascha Hauer that uses a chain of three dividers
under a fixed clock.
It's an invalid approach to assume that among two divider values
the one nearer the exact divider is the better one.
Assume a parent rate of 1000 Hz, a divider with CLK_DIVIDER_POWER_OF_TWO
and a target rate of 89 Hz. The exact divider is ~ 11.236 so 8 and 16
are the candidates to choose from yielding rates 125 Hz and 62.5 Hz
respectivly. While 8 is nearer to 11.236 than 16 is, the latter is still
the better divider as 62.5 is nearer to 89 than 125 is.
Fixes: 774b514390b1 (clk: divider: Add round to closest divider) Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Sascha Hauer <s.hauer@pengutronix.de> Acked-by: Maxime Coquelin <maxime.coquelin@st.com> Signed-off-by: Michael Turquette <mturquette@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is due to a race condition between stopping the thread and
calling vb2_internal_streamoff(). While I have not been able to deduce
the exact mechanism how this race condition can produce this warning,
I can see that the way the stream is stopped is likely to lead to a
race somewhere.
This patch simplifies how this is done by first ensuring that the
thread is completely stopped before cleaning up the vb2 queue. It
does that by setting threadio->stop to true, followed by a call to
vb2_queue_error() which will wake up the thread. The thread sees that
'stop' is true and it will exit.
The call to kthread_stop() waits until the thread has exited, and only
then is the queue cleaned up by calling __vb2_cleanup_fileio().
This is a much cleaner sequence and the warning has now disappeared.
The last argument of vb2_dc_get_user_pages() is of type enum
dma_data_direction, but the caller, vb2_dc_get_userptr() passes a value
which is the result of comparison dma_dir == DMA_FROM_DEVICE. This results
in the write parameter to get_user_pages() being zero in all cases, i.e.
that the caller has no intent to write there.
This was broken by patch "vb2: replace 'write' by 'dma_dir'".
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Acked-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Unlike scan_async_group(), soc_of_bind() doesn't allocate its
soc_camera_async_client structure using devm_kzalloc(), but has it
embedded inside the soc_of_info structure. Hence on failure, it must
free the whole soc_of_info structure, and not just the embedded
soc_camera_async_client structure, as the latter causes a warning, and
may cause slab corruption:
TASK_SIZE is depends on the systems architecture (32 or 64 bits) and it
should not be used for defining offset boundary for mmaping buffers for
CAPTURE and OUTPUT queues. This patch fixes support for MMAP calls on
the CAPTURE queue on 64bit architectures (like ARM64).
This fixes a oops due to a double list add when adding a reject PDU for
iscsit_allocate_iovecs allocation failures. The cmd has already been
added to the conn_cmd_list in iscsit_setup_scsi_cmd, so this has us call
iscsit_reject_cmd.
Note that for ERL0 the reject PDU is not actually sent, so this patch
is not completely tested. Just verified we do not oops. The problem is the
add reject functions return -1 which is returned all the way up to
iscsi_target_rx_thread which for ERL0 will drop the connection.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If we fail past the aio_setup_ring(), we need to destroy the
mapping. We don't need to care about anybody having found ctx,
or added requests to it, since the last failure exit is exactly
the failure to make ctx visible to lookups.
Reproducer (based on one by Joe Mario <jmario@redhat.com>):
int main()
{
io_context_t *ctx;
int created, limit, i, destroyed;
FILE *f;
count("before");
if ((f = fopen("/proc/sys/fs/aio-max-nr", "r")) == NULL)
perror("opening aio-max-nr");
else if (fscanf(f, "%d", &limit) != 1)
fprintf(stderr, "can't parse aio-max-nr\n");
else if ((ctx = calloc(limit, sizeof(io_context_t))) == NULL)
perror("allocating aio_context_t array");
else {
for (i = 0, created = 0; i < limit; i++) {
if (io_setup(1000, ctx + created) == 0)
created++;
}
for (i = 0, destroyed = 0; i < created; i++)
if (io_destroy(ctx[i]) == 0)
destroyed++;
printf("created %d, failed %d, destroyed %d\n",
created, limit - created, destroyed);
count("after");
}
}
Found-by: Joe Mario <jmario@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
"ocfs2 syncs the wrong range" had been broken; prior to it the
code was doing the wrong thing in case of O_APPEND, all right,
but _after_ it we were syncing the wrong range in 100% cases.
*ppos, aka iocb->ki_pos is incremented prior to that point,
so we are always doing sync on the area _after_ the one we'd
written to.
Spotted by Joseph Qi <joseph.qi@huawei.com> back in January;
unfortunately, I'd missed his mail back then ;-/
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kernel panic was happening as iscsi_host_remove() was called on
a host which was not yet added.
Signed-off-by: John Soni Jose <sony.john-n@emulex.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dirty page throttling should be sufficient for us in the general case
so there is no need to use __GFP_MEMALLOC - it would be needed only in
the swap-over-rbd case, which we currently don't support. (It would
probably take approximately the commit that is being reverted to add
that support, but we would also need the "swap" option to distinguish
from the general case and make sure swap ceph_client-s aren't shared
with anything else.) See ceph-devel threads [1] and [2] for the
details of why enabling pfmemalloc reserves for all cases is a bad
thing.
On top of potential system lockups related to drained emergency
reserves, this turned out to cause ceph lockups in case peers are on
the same host and communicating via loopback due to sk_filter()
dropping pfmemalloc skbs on the receiving side because the receiving
loopback socket is not tagged with SOCK_MEMALLOC.
[1] "SOCK_MEMALLOC vs loopback"
http://www.spinics.net/lists/ceph-devel/msg22998.html
[2] "[PATCH] libceph: don't set memalloc flags in loopback case"
http://www.spinics.net/lists/ceph-devel/msg23392.html
Commit 84c91b7ae07c (PM / hibernate: avoid unsafe pages in e820 reserved
regions) is reported to make resume from hibernation on Lenovo x230
unreliable, so revert it.
We will revisit the issue the commit in question was supposed to fix
in the future.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=96111 Reported-by: rhn <kebuac.rhn@porcupinefactory.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Transmission of an AP beacon does not call the TX interrupt service routine,
which usually does the cleanup. Instead, cleanup is handled in a tasklet
completion routine. Unfortunately, this routine has a serious bug in that it does
not release the DMA mapping before it frees the skb, thus one IOMMU mapping is
leaked for each beacon. The test system failed with no free IOMMU mapping slots
approximately one hour after hostapd was used to start an AP.
This issue was reported and tested at https://github.com/lwfinger/rtlwifi_new/issues/30.
Reported-and-tested-by: Kevin Mullican <kevin@mullican.com> Cc: Kevin Mullican <kevin@mullican.com> Signed-off-by: Shao Fu <shaofu@realtek.com> Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Device domains never span IOMMU hardware units, which allows the
domain ID space for each IOMMU to be an independent address space.
Therefore we can have multiple, independent domains, each with the
same domain->id, but attached to different hardware units. This is
also why we need to do a heavy-weight search for VM domains since
they can span multiple IOMMUs hardware units and we don't require a
single global ID to use for all hardware units.
Therefore, if we call iommu_detach_domain() across all active IOMMU
hardware units for a non-VM domain, the result is that we clear domain
IDs that are not associated with our domain, allowing them to be
re-allocated and causing apparent coherency issues when the device
cannot access IOVAs for the intended domain.
This bug was introduced in commit fb170fb4c548 ("iommu/vt-d: Introduce
helper functions to make code symmetric for readability"), but is
significantly exacerbated by the more recent commit 62c22167dd70
("iommu/vt-d: Fix dmar_domain leak in iommu_attach_device") which calls
domain_exit() more frequently to resolve a domain leak.
Fixes: fb170fb4c548 ("iommu/vt-d: Introduce helper functions to make code symmetric for readability") Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The BCM43362 firmware falsely reports it is capable of providing
MBSS. As a result AP mode no longer works for this device. Therefor
disable MBSS in the driver for this chipset.
Reported-by: Jorg Krause <jkrause@posteo.de> Reviewed-by: Hante Meuleman <meuleman@broadcom.com> Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com> Signed-off-by: Arend van Spriel <arend@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Under intermittent network outages, find_writable_file() is susceptible
to the following race condition, which results in a user-after-free in
the cifs_writepages code-path:
At this point we loop back through with an invalid inv_file pointer
and a refind value of 1. On second pass, inv_file is not overwritten on
openFileList traversal, and is subsequently dereferenced.
Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While attempting to clone a file on a samba server, we receive a
STATUS_INVALID_DEVICE_REQUEST. This is mapped to -EOPNOTSUPP which
isn't handled in smb2_clone_range(). We end up looping in the while loop
making same call to the samba server over and over again.
The proposed fix is to exit and return the error value when encountered
with an unhandled error.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Steve French <steve.french@primarydata.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In canon mode, the read buffer head will advance over the buffer tail
if the input > 4095 bytes without receiving a line termination char.
Discard additional input until a line termination is received.
Before evaluating for overflow, the 'room' value is normalized for
I_PARMRK and 1 byte is reserved for line termination (even in !icanon
mode, in case the mode is switched). The following table shows the
transform:
actual buffer | 'room' value before overflow calc
space avail | !I_PARMRK | I_PARMRK
--------------------------------------------------
0 | -1 | -1
1 | 0 | 0
2 | 1 | 0
3 | 2 | 0
4+ | 3 | 1
When !icanon or when icanon and the read buffer contains newlines,
normalized 'room' values of -1 and 0 are clamped to 0, and
'overflow' is 0, so read_head is not adjusted and the input i/o loop
exits (setting no_room if called from flush_to_ldisc()). No input
is discarded since the reader does have input available to read
which ensures forward progress.
When icanon and the read buffer does not contain newlines and the
normalized 'room' value is 0, then overflow and room are reset to 1,
so that the i/o loop will process the next input char normally
(except for parity errors which are ignored). Thus, erasures, signalling
chars, 7-bit mode, etc. will continue to be handled properly.
If the input char processed was not a line termination char, then
the canon_head index will not have advanced, so the normalized 'room'
value will now be -1 and 'overflow' will be set, which indicates the
read_head can safely be reset, effectively erasing the last char
processed.
If the input char processed was a line termination, then the
canon_head index will have advanced, so 'overflow' is cleared to 0,
the read_head is not reset, and 'room' is cleared to 0, which exits
the i/o loop (because the reader now have input available to read
which ensures forward progress).
Note that it is possible for a line termination to be received, and
for the reader to copy the line to the user buffer before the
input i/o loop is ready to process the next input char. This is
why the i/o loop recomputes the room/overflow state with every
input char while handling overflow.
Finally, if the input data was processed without receiving
a line termination (so that overflow is still set), the pty
driver must receive a write wakeup. A pty writer may be waiting
to write more data in n_tty_write() but without unthrottling
here that wakeup will not arrive, and forward progress will halt.
(Normally, the pty writer is woken when the reader reads data out
of the buffer and more space become available).
Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(backported from commit fb5ef9e7da39968fec6d6f37f20a23d23740c75e) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
When the receiver was enabled during startup, a character could
have been in the FIFO when the UART get initially used. The
driver configures the (receive) watermark level, and flushes the
FIFO. However, the receive flag (RDRF) could still be set at that
stage (as mentioned in the register description of UARTx_RWFIFO).
This leads to an interrupt which won't be handled properly in
interrupt mode: The receive interrupt function lpuart_rxint checks
the FIFO count, which is 0 at that point (due to the flush
during initialization). The problem does not manifest when using
DMA to receive characters.
Fix this situation by explicitly read the status register, which
leads to clearing of the RDRF flag. Due to the flush just after
the status flag read, a explicit data read is not to required.
Signed-off-by: Stefan Agner <stefan@agner.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Specify transmit FIFO size which might be different depending on
LPUART instance. This makes sure uart_wait_until_sent in serial
core getting called, which in turn waits and checks if the FIFO
is really empty on shutdown by using the tx_empty callback.
Without the call of this callback, the last several characters
might not yet be transmitted when closing the serial port. This
can be reproduced by simply using echo and redirect the output to
a ttyLP device.
Signed-off-by: Stefan Agner <stefan@agner.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a device with an isochronous endpoint is plugged into the Intel
xHCI host controller, and the driver submits multiple frames per URB,
the xHCI driver will set the Block Event Interrupt (BEI) flag on all
but the last TD for the URB. This causes the host controller to place
an event on the event ring, but not send an interrupt. When the last
TD for the URB completes, BEI is cleared, and we get an interrupt for
the whole URB.
However, under Intel xHCI host controllers, if the event ring is full
of events from transfers with BEI set, an "Event Ring is Full" event
will be posted to the last entry of the event ring, but no interrupt
is generated. Host will cease all transfer and command executions and
wait until software completes handling the pending events in the event
ring. That means xHC stops, but event of "event ring is full" is not
notified. As the result, the xHC looks like dead to user.
This patch is to apply XHCI_AVOID_BEI quirk to Intel xHC devices. And
it should be backported to kernels as old as 3.0, that contains the
commit 69e848c2090a ("Intel xhci: Support EHCI/xHCI port switching.").
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Alistair Grant <akgrant0710@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linux xHCI driver doesn't report and handle port cofig error change.
If Port Configure Error for root hub port occurs, CEC bit in PORTSC
would be set by xHC and remains 1. This happends when the root port
fails to configure its link partner, e.g. the port fails to exchange
port capabilities information using Port Capability LMPs.
Then the Port Status Change Events will be blocked until all status
change bits(CEC is one of the change bits) are cleared('0') (refer to
xHCI spec 4.19.2). Otherwise, the port status change event for this
root port will not be generated anymore, then root port would look
like dead for user and can't be recovered until a Host Controller
Reset(HCRST).
This patch is to check CEC bit in PORTSC in xhci_get_port_status()
and set a Config Error in the return status if CEC is set. This will
cause a ClearPortFeature request, where CEC bit is cleared in
xhci_clear_port_change_bit().
[The commit log is based on initial Marvell patch posted at
http://marc.info/?l=linux-kernel&m=142323612321434&w=2]
Fix a bug that leads to showing the name and description of C-state C0
as "<null>" in sysfs after the ACPI C-states changed (e.g. after AC->DC
or DC->AC
transition).
The function poll_idle_init() in drivers/cpuidle/driver.c initializes the
state 0 during cpuidle_register_driver(), so we better do not overwrite it
again with '\0' during acpi_processor_cst_has_changed().
Signed-off-by: Thomas Schlichter <thomas.schlichter@web.de> Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Schlichter reports the following issue on his Samsung NC20:
"The C-states C1 and C2 to the OS when connected to AC, and additionally
provides the C3 C-state when disconnected from AC. However, the number
of C-states shown in sysfs is fixed to the number of C-states present
at boot.
If I boot with AC connected, I always only see the C-states up to C2
even if I disconnect AC.
The reason is commit 130a5f692425 (ACPI / cpuidle: remove dev->state_count
setting). It removes the update of dev->state_count, but sysfs uses
exactly this variable to show the C-states.
The fix is to use drv->state_count in sysfs. As this is currently the
last user of dev->state_count, this variable can be completely removed."
Remove dev->state_count as per the above.
Reported-by: Thomas Schlichter <thomas.schlichter@web.de> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
[ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
dmi_num is a u16, dmi_len is a u32, so this construct:
dmi_num = dmi_len / 4;
would result in an integer overflow for a DMI table larger than
256 kB. I've never see such a large table so far, but SMBIOS 3.0
makes it possible so maybe we'll see such tables in the future.
So instead of faking a structure count when the entry point does
not provide it, adjust the loop condition in dmi_table() to properly
deal with the case where dmi_num is not set.
This bug was introduced with the initial SMBIOS 3.0 support in commit fc43026278b2 ("dmi: add support for SMBIOS 3.0 64-bit entry point").
Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Return EPROBE_DEFER if Regulator returns EPROBE_DEFER
If the Flexcan driver is built into kernel and a regulator is used to
enable the CAN transceiver, the Flexcan driver may not use the regulator.
When initializing the Flexcan device with a regulator defined in the device
tree, but not initialized, the regulator subsystem returns EPROBE_DEFER, hence
the Flexcan init fails.
The solution for this is to return EPROBE_DEFER if regulator is not initialized
and wait until the regulator is initialized.
Signed-off-by: Andreas Werner <kernel@andy89.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ASRock Q1900DC-ITX mainboard (Baytrail-D) hangs randomly in
both BIOS and UEFI mode while rebooting unless reboot=pci is
used. Add a quirk to reboot via the pci method.
The problem is very intermittent and hard to debug, it might succeed
rebooting just fine 40 times in a row - but fails half a dozen times
the next day. It seems to be slightly less common in BIOS CSM mode
than native UEFI (with the CSM disabled), but it does happen in either
mode. Since I've started testing this patch in late january, rebooting
has been 100% reliable.
Most of the time it already hangs during POST, but occasionally it
might even make it through the bootloader and the kernel might even
start booting, but then hangs before the mode switch. The same symptoms
occur with grub-efi, gummiboot and grub-pc, just as well as (at least)
kernel 3.16-3.19 and 4.0-rc6 (I haven't tried older kernels than 3.16).
Upgrading to the most current mainboard firmware of the ASRock
Q1900DC-ITX, version 1.20, does not improve the situation.
( Searching the web seems to suggest that other Bay Trail-D mainboards
might be affected as well. )
-- Signed-off-by: Stefan Lippers-Hollmann <s.l-h@gmx.de> Cc: Matt Fleming <matt.fleming@intel.com> Link: http://lkml.kernel.org/r/20150330224427.0fb58e42@mir Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sc->nbcnvifs tracks assigned beacon slots, not enabled beacons.
Therefore, it cannot be used to decide if cur_conf->enable_beacon (bool)
should be updated, or if beacons have been enabled already.
With the current code (depending on the order of calls), beacons often
do not get enabled in an AP+STA setup.
To fix tracking of enabled beacons, convert cur_conf->enable_beacon to a
bitmask of enabled beacon slots.
Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In omap_dma_start_desc the vdesc->node is removed from the virt-dma
framework managed lists (to be precise from the desc_issued list).
If a terminate_all comes before the transfer finishes the omap_desc will
not be freed up because it is not in any of the lists and we stopped the
DMA channel so the transfer will not going to complete.
There is no special sequence for leaking memory when using cyclic (audio)
transfer: with every start and stop of a cyclic transfer the driver leaks
struct omap_desc worth of memory.
Free up the allocated memory directly in omap_dma_terminate_all() since the
framework will not going to do that for us.
If edma_terminate_all() was called while a transfer was running (i.e. after
edma_execute() but before edma_callback()) the echan->edesc was not freed.
This was due to the fact that a running transfer is on none of the
vchan lists: desc_submitted, desc_issued, desc_completed (edma_execute()
removes it from the desc_issued list), so the vchan_dma_desc_free_list()
called at the end of edma_terminate_all() didn't find it and didn't free it.
This bug was found on an AM1808 based hardware (very similar to da850evm,
however using the second MMC/SD controller), where intense operations on the SD
card wasted the device 128MB RAM within a couple of days.
Peter Ujfalusi:
The issue is even more severe since it affects cyclic (audio) transfers as
well. In this case starting/stopping audio will results memory leak.
Signed-off-by: Petr Kulhavy <petr@barix.com> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> CC: <linux-omap@vger.kernel.org> Signed-off-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch uses iio_trigger_get to increment the reference
count of trigger device, to avoid incorrect assignment.
Can result in a null pointer dereference during removal if the
trigger has been changed before removal.
This patch refers to a similar situation encountered through the
following discussion:
http://www.spinics.net/lists/linux-iio/msg13669.html
Depending on conversion mode used, the ADC clock (ADCK) needs
to be below a maximum frequency. According to Vybrid's data
sheet this is 20MHz for the low power conversion mode.
The ADC clock is depending on input clock, which is the bus
clock by default. Vybrid SoC are typically clocked at at 400MHz
or 500MHz, which leads to 66MHz or 83MHz bus clock respectively.
Hence, a divider of 8 is required to stay below the specified
maximum clock of 20MHz.
Due to the different bus clock speeds, the resulting sampling
frequency is not static. Hence use the ADC clock and calculate
the actual available sampling frequency dynamically.
This fixes bogous values observed on some 500MHz clocked Vybrid
SoC. The resulting value usually showed Bit 9 being stuck at 1,
or 0, which lead to a value of +/-512.
Signed-off-by: Stefan Agner <stefan@agner.ch> Acked-by: Fugang Duan <B38611@freescale.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently driver reports device bandwidth list as available
sampling frequency. But sampling frequency is actually twice
the device bandwidth. This patch fixes this issue.
SCSI transport drivers and SCSI LLDs block a SCSI device if the
transport layer is not operational. This means that in this state
no requests should be processed, even if the REQ_PREEMPT flag has
been set. This patch avoids that a rescan shortly after a cable
pull sporadically triggers the following kernel oops:
This patch uses the existing CALAO Systems ftdi_8u2232c_probe in order
to avoid attaching a TTY to the JTAG port as this board is based on the
CALAO Systems reference design and needs the same fix up.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
[johan: clean up probe logic ] Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add USB VID/PID for Xircom PGMFHUB USB/serial component. (The hub and SCSI
bridge on that hardware are recognized out of the box by existing drivers.)
Tested VID/PID using new_id and loopback connection and was met with
success, but that's all the testing done.
Use readb() and memcpy_fromio() accessors instead.
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Micron has released an updated firmware (MU02) for M510/M550/MX100
drives to fix the issues with queued TRIM. Queued TRIM remains broken on
M500 but is working fine on later drives such as M600 and MX200.
Tweak our blacklist to reflect the above.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=71371 Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2f800fbd777b ("writeback: fix dirtied pages accounting on redirty")
introduced account_page_redirty() which reverts stat updates for a
redirtied page, making BDI_DIRTIED no longer monotonically increasing.
bdi_update_write_bandwidth() uses the delta in BDI_DIRTIED as the
basis for bandwidth calculation. While unlikely, since the above
patch, the newer value may be lower than the recorded past value and
underflow the bandwidth calculation leading to a wild result.
Fix it by subtracing min of the old and new values when calculating
delta. AFAIK, there hasn't been any report of it happening but the
resulting erratic behavior would be non-critical and temporary, so
it's possible that the issue is happening without being reported. The
risk of the fix is very low, so tagged for -stable.
global_update_bandwidth() uses static variable update_time as the
timestamp for the last update but forgets to initialize it to
INITIALIZE_JIFFIES.
This means that global_dirty_limit will be 5 mins into the future on
32bit and some large amount jiffies into the past on 64bit. This
isn't critical as the only effect is that global_dirty_limit won't be
updated for the first 5 mins after booting on 32bit machines,
especially given the auxiliary nature of global_dirty_limit's role -
protecting against global dirty threshold's sudden dips; however, it
does lead to unintended suboptimal behavior. Fix it.
All CPUs leaving the first-online CPU are hotplugged out on suspend and
and cpufreq core stops managing them.
On resume, we need to call cpufreq_update_policy() for this CPU's policy
to make sure its frequency is in sync with cpufreq's cached value, as it
might have got updated by hardware during suspend/resume.
The policies are always added to the top of the policy-list. So, in
normal circumstances, CPU 0's policy will be the last one in the list.
And so the code checks for the last policy.
But there are cases where it will fail. Consider quad-core system, with
policy-per core. If CPU0 is hotplugged out and added back again, the
last policy will be on CPU1 :(
To fix this in a proper way, always look for the policy of the first
online CPU. That way we will be sure that we are calling
cpufreq_update_policy() for the only CPU that wasn't hotplugged out.
Fixes: 2f0aea936360 ("cpufreq: suspend governors on system suspend/hibernate") Reported-by: Saravana Kannan <skannan@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Saravana Kannan <skannan@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When non-realtime tasks get priority-inheritance boosted to a realtime
scheduling class, RLIMIT_RTTIME starts to apply to them. However, the
counter used for checking this (the same one used for SCHED_RR
timeslices) was not getting reset. This meant that tasks running with a
non-realtime scheduling class which are repeatedly boosted to a realtime
one, but never block while they are running realtime, eventually hit the
timeout without ever running for a time over the limit. This patch
resets the realtime timeslice counter when un-PI-boosting from an RT to
a non-RT scheduling class.
I have some test code with two threads and a shared PTHREAD_PRIO_INHERIT
mutex which induces priority boosting and spins while boosted that gets
killed by a SIGXCPU on non-fixed kernels but doesn't with this patch
applied. It happens much faster with a CONFIG_PREEMPT_RT kernel, and
does happen eventually with PREEMPT_VOLUNTARY kernels.
Commit 3c605096d315 ("mm/page_alloc: restrict max order of merging on
isolated pageblock") changed the logic of unset_migratetype_isolate to
check the buddy allocator and explicitly call __free_pages to merge.
The page that is being freed in this path never had prep_new_page called
so set_page_refcounted is called explicitly but there is no call to
kernel_map_pages. With the default kernel_map_pages this is mostly
harmless but if kernel_map_pages does any manipulation of the page
tables (unmapping or setting pages to read only) this may trigger a
fault:
The cause is the "memset(pgdat, 0, sizeof(*pgdat))" at the end of
try_offline_node, which will reset all the content of pgdat to 0, as the
pgdat is accessed lock-free, so that the users still using the pgdat
will panic, such as the vmstat_update routine.
process A: offline node XX:
vmstat_updat()
refresh_cpu_vm_stats()
for_each_populated_zone()
find online node XX
cond_resched()
offline cpu and memory, then try_offline_node()
node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
zone = next_zone(zone)
pg_data_t *pgdat = zone->zone_pgdat; // here pgdat is NULL now
next_online_pgdat(pgdat)
next_online_node(pgdat->node_id); // NULL pointer access
So the solution here is postponing the reset of obsolete pgdat from
try_offline_node() to hotadd_new_pgdat(), and just resetting
pgdat->nr_zones and pgdat->classzone_idx to be 0 rather than the memset
0 to avoid breaking pointer information in pgdat.
I have constantly stumbled upon "kernel BUG at mm/rmap.c:399!" after
upgrading to 3.19 and had no luck with 4.0-rc1 neither.
So, after looking into new logic introduced by commit 7a3ef208e662 ("mm:
prevent endless growth of anon_vma hierarchy"), I found chances are that
unlink_anon_vmas() is called without incrementing dst->anon_vma->degree
in anon_vma_clone() due to allocation failure. If dst->anon_vma is not
NULL in error path, its degree will be incorrectly decremented in
unlink_anon_vmas() and eventually underflow when exiting as a result of
another call to unlink_anon_vmas(). That's how "kernel BUG at
mm/rmap.c:399!" is triggered for me.
This patch fixes the underflow by dropping dst->anon_vma when allocation
fails. It's safe to do so regardless of original value of dst->anon_vma
because dst->anon_vma doesn't have valid meaning if anon_vma_clone()
fails. Besides, callers don't care dst->anon_vma in such case neither.
Also suggested by Michal Hocko, we can clean up vma_adjust() a bit as
anon_vma_clone() now does the work.
[akpm@linux-foundation.org: tweak comment] Fixes: 7a3ef208e662 ("mm: prevent endless growth of anon_vma hierarchy") Signed-off-by: Leon Yu <chianglungyu@gmail.com> Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There's an issue with the way the RX A-MPDU reorder timer is
deleted that can cause a kernel crash like this:
* tid_rx is removed - call_rcu(ieee80211_free_tid_rx)
* station is destroyed
* reorder timer fires before ieee80211_free_tid_rx() runs,
accessing the station, thus potentially crashing due to
the use-after-free
The station deletion is protected by synchronize_net(), but
that isn't enough -- ieee80211_free_tid_rx() need not have
run when that returns (it deletes the timer.) We could use
rcu_barrier() instead of synchronize_net(), but that's much
more expensive.
Instead, to fix this, add a field tracking that the session
is being deleted. In this case, the only re-arming of the
timer happens with the reorder spinlock held, so make that
code not rearm it if the session is being deleted and also
delete the timer after setting that field. This ensures the
timer cannot fire after ___ieee80211_stop_rx_ba_session()
returns, which fixes the problem.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Implement arch_irq_work_has_interrupt() for powerpc
Commit 9b01f5bf3 introduced a dependency on "IRQ work self-IPIs" for
full dynamic ticks to be enabled, by expecting architectures to
implement a suitable arch_irq_work_has_interrupt() routine.
Several arches have implemented this routine, including x86 (3010279f)
and arm (09f6edd4), but powerpc was omitted.
This patch implements this routine for powerpc.
The symptom, at boot (on powerpc systems) with "nohz_full=<CPU list>"
is displayed:
NO_HZ: Can't run full dynticks because arch doesn't support irq work self-IPIs
after this patch:
NO_HZ: Full dynticks CPUs: <CPU list>.
Tested against 3.19.
powerpc implements "IRQ work self-IPIs" by setting the decrementer to 1 in
arch_irq_work_raise(), which causes a decrementer exception on the next
timebase tick. We then handle the work in __timer_interrupt().
CC: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Paul A. Clarke <pc@us.ibm.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[mpe: Flesh out change log, fix ws & include guards, remove include of processor.h] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Space allocated for paca is based off nr_cpu_ids,
but pnv_alloc_idle_core_states() iterates paca with
cpu_nr_cores()*threads_per_core, which is using NR_CPUS.
This causes pnv_alloc_idle_core_states() to write over memory,
which is outside of paca array and may later lead to various panics.
Fixes: 7cba160ad789 (powernv/cpuidle: Redesign idle states management) Signed-off-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Preet U. Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We currently have a problem that SELinux policy is being enforced when
creating debugfs files. If a debugfs file is created as a side effect of
doing some syscall, then that creation can fail if the SELinux policy
for that process prevents it.
This seems wrong. We don't do that for files under /proc, for instance,
so Bruce has proposed a patch to fix that.
While discussing that patch however, Greg K.H. stated:
"No kernel code should care / fail if a debugfs function fails, so
please fix up the sunrpc code first."
This patch converts all of the sunrpc debugfs setup code to be void
return functins, and the callers to not look for errors from those
functions.
This should allow rpc_clnt and rpc_xprt creation to work, even if the
kernel fails to create debugfs files for some reason.
Symptoms were failing krb5 mounts on systems using gss-proxy and
selinux.
Fixes: 388f0c776781 "sunrpc: add a debugfs rpc_xprt directory..." Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
alloc_init_lock_stateowner can return an already freed entry if there is
a race to put openowners in the hashtable.
Noticed by inspection after Jeff Layton fixed the same bug for open
owners. Depending on client behavior, this one may be trickier to
trigger in practice.
Fixes: c58c6610ec24 "nfsd: Protect adding/removing lock owners using client_lock" Cc: Trond Myklebust <trond.myklebust@primarydata.com> Acked-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
alloc_init_open_stateowner can return an already freed entry if there is
a race to put openowners in the hashtable.
In commit 7ffb588086e9, we changed it so that we allocate and initialize
an openowner, and then check to see if a matching one got stuffed into
the hashtable in the meantime. If it did, then we free the one we just
allocated and take a reference on the one already there. There is a bug
here though. The code will then return the pointer to the one that was
allocated (and has now been freed).
This wasn't evident before as this race almost never occurred. The Linux
kernel client used to serialize requests for a single openowner. That
has changed now with v4.0 kernels, and this race can now easily occur.
Fixes: 7ffb588086e9 Cc: Trond Myklebust <trond.myklebust@primarydata.com> Reported-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To be able to add memory pages which were added via memory hotplug to
a pv domain, the pages must be "invalid" instead of "identity" in the
p2m list before they can be added.
Suggested-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 054954eb051f35e74b75a566a96fe756015352c8 ("xen: switch to linear
virtual mapped sparse p2m list") introduced a regression regarding to
memory hotplug for a pv-domain: as the virtual space for the p2m list
is allocated for the to be expected memory size of the domain only,
hotplugged memory above that size will not be usable by the domain.
Correct this by using a configurable size for the p2m list in case of
memory hotplug enabled (default supported memory size is 512 GB for
64 bit domains and 4 GB for 32 bit domains).
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The assumption before this patch was that we don't need to
run again the INIT firmware after the system booted. The
INIT firmware runs calibrations which impact the physical
layer's behavior.
Users reported that it may be helpful to run these
calibrations again every time the interface is brought up.
The penatly is minimal, since the calibrations run fast.
This fixes:
https://bugzilla.kernel.org/show_bug.cgi?id=94341
Properly verify that the resulting page aligned end address is larger
than both the start address and the length of the memory area requested.
Both the start and length arguments for ib_umem_get are controlled by
the user. A misbehaving user can provide values which will cause an
integer overflow when calculating the page aligned end address.
This overflow can cause also miscalculation of the number of pages
mapped, and additional logic issues.
Addresses: CVE-2014-8159 Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>