Now that all of the user copy routines are converted to return
accurate residual lengths when an exception occurs, we no longer need
the broken fixup routines.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The fixup helper function mechanism for handling user copy fault
handling is not %100 accurrate, and can never be made so.
We are going to transition the code to return the running return
return length, which is always kept track in one or more registers
of each of these routines.
In order to convert them one by one, we have to allow the existing
behavior to continue functioning.
Therefore make all the copy code that wants the fixup helper to be
used return negative one.
After all of the user copy routines have been converted, this logic
and the fixup helpers themselves can be removed completely.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the vmalloc area gets fragmented, and because the firmware
mapping area sits between where modules live and the vmalloc area, we
can sometimes receive requests for enormous kernel TLB range flushes.
When this happens the cpu just spins flushing billions of pages and
this triggers the NMI watchdog and other problems.
We took care of this on the TSB side by doing a linear scan of the
table once we pass a certain threshold.
Do something similar for the TLB flush, however we are limited by
the TLB flush facilities provided by the different chip variants.
First of all we use an (mostly arbitrary) cut-off of 256K which is
about 32 pages. This can be tuned in the future.
The huge range code path for each chip works as follows:
1) On spitfire we flush all non-locked TLB entries using diagnostic
acceses.
2) On cheetah we use the "flush all" TLB flush.
3) On sun4v/hypervisor we do a TLB context flush on context 0, which
unlike previous chips does not remove "permanent" or locked
entries.
We could probably do something better on spitfire, such as limiting
the flush to kernel TLB entries or even doing range comparisons.
However that probably isn't worth it since those chips are old and
the TLB only had 64 entries.
Reported-by: James Clarke <jrtc27@jrtc27.com> Tested-by: James Clarke <jrtc27@jrtc27.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we copy code over to patch another piece of code, we can only use
PC-relative branches that target code within that piece of code.
Such PC-relative branches cannot be made to external symbols because
the patch moves the location of the code and thus modifies the
relative address of external symbols.
Use an absolute jmpl to fix this problem.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the number of pages we are flushing is more than twice the number
of entries in the TSB, just scan the TSB table for matches rather
than probing each and every page in the range.
Based upon a patch and report by James Clarke.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Additionally, if the offset will overflow the immediate for a ba,pt
instruction, fall back on a standard ba to get an extra 3 bits.
Signed-off-by: James Clarke <jrtc27@jrtc27.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
do_sparc64_fault() calculates both the base and huge page RSS sizes and
uses this information in calls to tsb_grow(). The calculation for base
page TSB size is not correct if the task uses hugetlb pages. hugetlb
pages are not accounted for in RSS, therefore the call to get_mm_rss(mm)
does not include hugetlb pages. However, the number of pages based on
huge_pte_count (which does include hugetlb pages) is subtracted from
this value. This will result in an artificially small and often negative
RSS calculation. The base TSB size is then often set to max_tsb_size
as the passed RSS is unsigned, so a negative value looks really big.
THP pages are also accounted for in huge_pte_count, and THP pages are
accounted for in RSS so the calculation in do_sparc64_fault() is correct
if a task only uses THP pages.
A single huge_pte_count is not sufficient for TSB sizing if both hugetlb
and THP pages can be used. Instead of a single counter, use two: one
for hugetlb and one for THP.
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We accidentally take the "port->lock" twice in a row. This old code
was supposed to be deleted.
Fixes: e58e241c1788 ('sparc: serial: Clean up the locking for -rt') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On pre-Niagara systems, we fetch the fault address on data TLB
exceptions from the TLB_TAG_ACCESS register. But this register also
contains the context ID assosciated with the fault in the low 13 bits
of the register value.
This propagates into current_thread_info()->fault_address and can
cause trouble later on.
So clear the low 13-bits out of the TLB_TAG_ACCESS value in the cases
where it matters.
Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Line discipline drivers may mistakenly misuse ldisc-related fields
when initializing. For example, a failure to initialize tty->receive_room
in the N_GIGASET_M101 line discipline was recently found and fixed [1].
Now, the N_X25 line discipline has been discovered accessing the previous
line discipline's already-freed private data [2].
Harden the ldisc interface against misuse by initializing revelant
tty fields before instancing the new line discipline.
With syzkaller help, Marco Grassi found a bug in TCP stack,
crashing in tcp_collapse()
Root cause is that sk_filter() can truncate the incoming skb,
but TCP stack was not really expecting this to happen.
It probably was expecting a simple DROP or ACCEPT behavior.
We first need to make sure no part of TCP header could be removed.
Then we need to adjust TCP_SKB_CB(skb)->end_seq
Many thanks to syzkaller team and Marco for giving us a reproducer.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Marco Grassi <marco.gra@gmail.com> Reported-by: Vladis Dronov <vdronov@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In v2.6, ip_rt_redirect() calls arp_bind_neighbour() which returns 0
and then the state of the neigh for the new_gw is checked. If the state
isn't valid then the redirected route is deleted. This behavior is
maintained up to v3.5.7 by check_peer_redirect() because rt->rt_gateway
is assigned to peer->redirect_learned.a4 before calling
ipv4_neigh_lookup().
After commit 5943634fc559 ("ipv4: Maintain redirect and PMTU info in
struct rtable again."), ipv4_neigh_lookup() is performed without the
rt_gateway assigned to the new_gw. In the case when rt_gateway (old_gw)
isn't zero, the function uses it as the key. The neigh is most likely
valid since the old_gw is the one that sends the ICMP redirect message.
Then the new_gw is assigned to fib_nh_exception. The problem is: the
new_gw ARP may never gets resolved and the traffic is blackholed.
So, use the new_gw for neigh lookup.
Changes from v1:
- use __ipv4_neigh_lookup instead (per Eric Dumazet).
Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.") Signed-off-by: Stephen Suryaputra Lin <ssurya@ieee.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After Tom patch, thoff field could point past the end of the buffer,
this could fool some callers.
If an skb was provided, skb->len should be the upper limit.
If not, hlen is supposed to be the upper limit.
Fixes: a6e544b0a88b ("flow_dissector: Jump to exit code in __skb_flow_dissect") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Yibin Yang <yibyang@cisco.com Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Do not send the next message in sendmmsg for partial sendmsg
invocations.
sendmmsg assumes that it can continue sending the next message
when the return value of the individual sendmsg invocations
is positive. It results in corrupting the data for TCP,
SCTP, and UNIX streams.
For example, sendmmsg([["abcd"], ["efgh"]]) can result in a stream
of "aefgh" if the first sendmsg invocation sends only the first
byte while the second sendmsg goes through.
Datagram sockets either send the entire datagram or fail, so
this patch affects only sockets of type SOCK_STREAM and
SOCK_SEQPACKET.
Fixes: 228e548e6020 ("net: Add sendmmsg socket system call") Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The display of /proc/net/route has had a couple issues due to the fact that
when I originally rewrote most of fib_trie I made it so that the iterator
was tracking the next value to use instead of the current.
In addition it had an off by 1 error where I was tracking the first piece
of data as position 0, even though in reality that belonged to the
SEQ_START_TOKEN.
This patch updates the code so the iterator tracks the last reported
position and key instead of the next expected position and key. In
addition it shifts things so that all of the leaves start at 1 instead of
trying to report leaves starting with offset 0 as being valid. With these
two issues addressed this should resolve any off by one errors that were
present in the display of /proc/net/route.
Fixes: 25b97c016b26 ("ipv4: off-by-one in continuation handling in /proc/net/route") Cc: Andy Whitcroft <apw@canonical.com> Reported-by: Jason Baron <jbaron@akamai.com> Tested-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sctp_wait_for_connect() currently already holds the asoc to keep it
alive during the sleep, in case another thread release it. But Andrey
Konovalov and Dmitry Vyukov reported an use-after-free in such
situation.
Problem is that __sctp_connect() doesn't get a ref on the asoc and will
do a read on the asoc after calling sctp_wait_for_connect(), but by then
another thread may have closed it and the _put on sctp_wait_for_connect
will actually release it, causing the use-after-free.
Fix is, instead of doing the read after waiting for the connect, do it
before so, and avoid this issue as the socket is still locked by then.
There should be no issue on returning the asoc id in case of failure as
the application shouldn't trust on that number in such situations
anyway.
This issue doesn't exist in sctp_sendmsg() path.
Reported-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
dccp_v6_err() does not use pskb_may_pull() and might access garbage.
We only need 4 bytes at the beginning of the DCCP header, like TCP,
so the 8 bytes pulled in icmpv6_notify() are more than enough.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
dccp_v4_err() does not use pskb_may_pull() and might access garbage.
We only need 4 bytes at the beginning of the DCCP header, like TCP,
so the 8 bytes pulled in icmp_socket_deliver() are more than enough.
This patch might allow to process more ICMP messages, as some routers
are still limiting the size of reflected bytes to 28 (RFC 792), instead
of extended lengths (RFC 1812 4.3.2.3)
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Imagine initial value of max_skb_frags is 17, and last
skb in write queue has 15 frags.
Then max_skb_frags is lowered to 14 or smaller value.
tcp_sendmsg() will then be allowed to add additional page frags
and eventually go past MAX_SKB_FRAGS, overflowing struct
skb_shared_info.
Fixes: 5f74f82ea34c ("net:Add sysctl_max_skb_frags") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hans Westgaard Ry <hans.westgaard.ry@oracle.com> Cc: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
skb->cb may contain data from previous layers. In the observed scenario,
the garbage data were misinterpreted as IP6CB(skb)->frag_max_size, so
that small packets sent through the tunnel are mistakenly fragmented.
This patch unconditionally clears the control buffer in ip6tunnel_xmit(),
which affects ip6_tunnel, ip6_udp_tunnel and ip6_gre. Currently none of
these tunnels set IP6CB(skb)->flags, otherwise it needs to be done earlier.
Cc: stable@vger.kernel.org Signed-off-by: Eli Cooper <elicooper@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Current bgmac code initializes some DMA settings in the receive control
register for some hardware and then immediately clears those settings.
Not clearing those settings results in ~420Mbps *improvement* in
throughput; this system can now receive frames at line-rate on Broadcom
5871x hardware compared to ~520Mbps today. I also tested a few other
values but found there to be no discernible difference in CPU
utilization even if burst size and prefetching values are different.
On the hardware tested there was no need to keep the code that cleared
all but bits 16-17, but since there is a wide variety of hardware that
used this driver (I did not look at all hardware docs for hardware using
this IP block), I find it wise to move this call up and clear bits just
after reading the default value from the hardware rather than completely
removing it.
This is a good candidate for -stable >=3.14 since that is when the code
that was supposed to improve performance (but did not) was introduced.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Fixes: 56ceecde1f29 ("bgmac: initialize the DMA controller of core...") Cc: Hauke Mehrtens <hauke@hauke-m.de> Acked-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sending zero checksum is ok for TCP, but not for UDP.
UDPv6 receiver should by default drop a frame with a 0 checksum,
and UDPv4 would not verify the checksum and might accept a corrupted
packet.
Simply replace such checksum by 0xffff, regardless of transport.
This error was caught on SIT tunnels, but seems generic.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
At accept() time, it is possible the parent has a non zero
sk_err_soft, leftover from a prior error.
Make sure we do not leave this value in the child, as it
makes future getsockopt(SO_ERROR) calls quite unreliable.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
... which makes sense for reno (it sets ssthresh to half the current cwnd),
but it makes no sense for dctcp, which sets ssthresh based on the current
congestion estimate.
This can cause severe growth of cwnd (eventually overflowing u32).
Fix this by saving last cwnd on loss and restore cwnd based on that,
similar to cubic and other algorithms.
Fixes: e3118e8359bb7c ("net: tcp: add DCTCP congestion control algorithm") Cc: Lawrence Brakmo <brakmo@fb.com> Cc: Andrew Shewmaker <agshew@gmail.com> Cc: Glenn Judd <glenn.judd@morganstanley.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
nf_log_proc_dostring() used current's network namespace instead of the one
corresponding to the sysctl file the write was performed on. Because the
permission check happens at open time and the nf_log files in namespaces
are accessible for the namespace owner, this can be abused by an
unprivileged user to effectively write to the init namespace's nf_log
sysctls.
Stash the "struct net *" in extra2 - data and extra1 are already used.
uid_t outer_uid;
gid_t outer_gid;
int stolen_fd = -1;
void writefile(char *path, char *buf) {
int fd = open(path, O_WRONLY);
if (fd == -1)
err(1, "unable to open thing");
if (write(fd, buf, strlen(buf)) != strlen(buf))
err(1, "unable to write thing");
close(fd);
}
int child_fn(void *p_) {
if (mount("proc", "/proc", "proc", MS_NOSUID|MS_NODEV|MS_NOEXEC,
NULL))
err(1, "mount");
/* Yes, we need to set the maps for the net sysctls to recognize us
* as namespace root.
*/
char buf[1000];
sprintf(buf, "0 %d 1\n", (int)outer_uid);
writefile("/proc/1/uid_map", buf);
writefile("/proc/1/setgroups", "deny");
sprintf(buf, "0 %d 1\n", (int)outer_gid);
writefile("/proc/1/gid_map", buf);
While free'ing qgroup->reserved resources, we much check if
the page has not been invalidated by a truncate operation
by checking if the page is still dirty before reducing the
qgroup resources. Resources in such a case are free'd when
the entire extent is released by delayed_ref.
This fixes a double accounting while releasing resources
in case of truncating a file, reproduced by the following testcase.
SCRATCH_DEV=/dev/vdb
SCRATCH_MNT=/mnt
mkfs.btrfs -f $SCRATCH_DEV
mount -t btrfs $SCRATCH_DEV $SCRATCH_MNT
cd $SCRATCH_MNT
btrfs quota enable $SCRATCH_MNT
btrfs subvolume create a
btrfs qgroup limit 500m a $SCRATCH_MNT
sync
for c in {1..15}; do
dd if=/dev/zero bs=1M count=40 of=$SCRATCH_MNT/a/file;
done
An interrupt may occur right after devm_request_irq() is called and
prior to the spinlock initialization, leading to a kernel oops,
as the interrupt handler uses the spinlock.
In order to prevent this problem, move the spinlock initialization
prior to requesting the interrupts.
When sun4i_codec_create_card fails, we do not assign a proper error
code to the return value. The return value would be 0 from the previous
function call, or we would have bailed out sooner. This would confuse
the driver core into thinking the device probe succeeded, when in fact
it didn't, leaving various devres based resources lingering.
Make the create_card function pass back a meaningful error code, and
assign it to the return value.
Fixes: 45fb6b6f2aa3 ("ASoC: sunxi: add support for the on-chip codec on
early Allwinner SoCs") Signed-off-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
hw_random carefully avoids using a stack buffer except in
add_early_randomness(). This causes a crash in virtio_rng if
CONFIG_VMAP_STACK=y.
Reported-by: Matt Mullins <mmullins@mmlx.us> Tested-by: Matt Mullins <mmullins@mmlx.us> Fixes: d3cc7996473a ("hwrng: fetch randomness only after device init") Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
gen_pool_alloc_algo() iterates over the chunks of a pool trying to find
a contiguous block of memory that satisfies the allocation request.
The shortcut
if (size > atomic_read(&chunk->avail))
continue;
makes the loop skip over chunks that do not have enough bytes left to
fulfill the request. There are two situations, though, where an
allocation might still fail:
(1) The available memory is not contiguous, i.e. the request cannot
be fulfilled due to external fragmentation.
(2) A race condition. Another thread runs the same code concurrently
and is quicker to grab the available memory.
In those situations, the loop calls pool->algo() to search the entire
chunk, and pool->algo() returns some value that is >= end_bit to
indicate that the search failed. This return value is then assigned to
start_bit. The variables start_bit and end_bit describe the range that
should be searched, and this range should be reset for every chunk that
is searched. Today, the code fails to reset start_bit to 0. As a
result, prefixes of subsequent chunks are ignored. Memory allocations
might fail even though there is plenty of room left in these prefixes of
those other chunks.
Fixes: 7f184275aa30 ("lib, Make gen_pool memory allocator lockless") Link: http://lkml.kernel.org/r/1477420604-28918-1-git-send-email-danielmentz@google.com Signed-off-by: Daniel Mentz <danielmentz@google.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
NFC version reply size checked against only header size, not against
full message size. That may lead potentially to uninitialized memory access
in version data.
That leads to warnings when version data is accessed:
drivers/misc/mei/bus-fixup.c: warning: '*((void *)&ver+11)' may be used uninitialized in this function [-Wuninitialized]: => 212:2
Reported in
Build regressions/improvements in v4.9-rc3
https://lkml.org/lkml/2016/10/30/57
Fixes: 59fcd7c63abf (mei: nfc: Initial nfc implementation) Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It turns out that the disable_dmar_iommu() code-path tried
to get the device_domain_lock recursivly, which will
dead-lock when this code runs on dmar removal. Fix both
code-paths that could lead to the dead-lock.
Fixes: 55d940430ab9 ('iommu/vt-d: Get rid of domain->iommu_lock') Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After commit 1cf6e8fc8341 ("tty/serial: at91: fix RTS line management
when hardware handshake is enabled"), the hardware handshake wasn't
functional anymore on Atmel platforms (beside SAMA5D2).
To understand why, one has to understand the flag ATMEL_US_USMODE_HWHS
first:
Before commit 1cf6e8fc8341 ("tty/serial: at91: fix RTS line management
when hardware handshake is enabled"), this flag was never set.
Thus, the CTS/RTS where only handled by serial_core (and everything
worked just fine).
This commit introduced the use of the ATMEL_US_USMODE_HWHS flag,
enabling it for all boards when the user space enables flow control.
When the ATMEL_US_USMODE_HWHS is set, the Atmel USART controller
handles a part of the flow control job:
- disable the transmitter when the CTS pin gets high.
- drive the RTS pin high when the DMA buffer transfer is completed or
PDC RX buffer full or RX FIFO is beyond threshold. (depending on the
controller version).
NB: This feature is *not* mandatory for the flow control to work.
(Nevertheless, it's very useful if low latencies are needed.)
Now, the specifics of the ATMEL_US_USMODE_HWHS flag:
- For platforms with DMAC and no FIFOs (sam9x25, sam9x35, sama5D3,
sama5D4, sam9g15, sam9g25, sam9g35)* this feature simply doesn't work.
( source: https://lkml.org/lkml/2016/9/7/598 )
Tested it on sam9g35, the RTS pins always stays up, even when RXEN=1
or a new DMA transfer descriptor is set.
=> ATMEL_US_USMODE_HWHS must not be used for those platforms
- For platforms with a PDC (sam926{0,1,3}, sam9g10, sam9g20, sam9g45,
sam9g46)*, there's another kind of problem. Once the flag
ATMEL_US_USMODE_HWHS is set, the RTS pin can't be driven anymore via
RTSEN/RTSDIS in USART Control Register. The RTS pin can only be driven
by enabling/disabling the receiver or setting RCR=RNCR=0 in the PDC
(Receive (Next) Counter Register).
=> Doing this is beyond the scope of this patch and could add other
bugs, so the original (and working) behaviour should be set for those
platforms (meaning ATMEL_US_USMODE_HWHS flag should be unset).
- For platforms with a FIFO (sama5d2)*, the RTS pin is driven according
to the RX FIFO thresholds, and can be also driven by RTSEN/RTSDIS in
USART Control Register. No problem here.
(This was the use case of commit 1cf6e8fc8341 ("tty/serial: at91: fix
RTS line management when hardware handshake is enabled"))
NB: If the CTS pin declared as a GPIO in the DTS, (for instance
cts-gpios = <&pioA PIN_PB31 GPIO_ACTIVE_LOW>), the transmitter will be
disabled.
=> ATMEL_US_USMODE_HWHS flag can be set for this platform ONLY IF the
CTS pin is not a GPIO.
So, the only case when ATMEL_US_USMODE_HWHS can be enabled is when
(atmel_use_fifo(port) &&
!mctrl_gpio_to_gpiod(atmel_port->gpios, UART_GPIO_CTS))
Tested on all Atmel USART controller flavours:
AT91SAM9G35-CM (DMAC flavour), AT91SAM9G20-EK (PDC flavour),
SAMA5D2xplained (FIFO flavour).
* the list may not be exhaustive
Fixes: 1cf6e8fc8341 ("tty/serial: at91: fix RTS line management when hardware handshake is enabled") Signed-off-by: Richard Genoud <richard.genoud@gmail.com> Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Acked-by: Cyrille Pitchen <cyrille.pitchen@atmel.com> Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
[nicolas.ferre@atmel.com: adapt to 4.4.x kernel for stable by adding
the atmel_port variable declaration which was missing] Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When setting the channel configuration register, the perid field is not
set to 0 since it is useless for mem2mem transfers. Unfortunately, a
device has 0 as perid. It could cause spurious flags status because
the controller could mix some events from the two channels.
For that reason, use the highest perid value for mem2mem transfers since it
doesn't match the perid of other devices.
The VBT provides the platform a way to mix and match the DDI ports vs.
GMBUS pins. Currently we only trust the VBT for DDI E, which I suppose
has no standard GMBUS pin assignment. However, there are machines out
there that use a non-standard mapping for the other ports as well.
Let's start trusting the VBT on this one for all ports on DDI platforms.
I've structured the code such that other platforms could easily start
using this as well, by simply filling in the ddi_port_info. IIRC there
may be CHV system that might actually need this.
v2: Include a commit message, include a debug message during init
The advancing of the PC when completing an MMIO load is done before
re-entering the guest, i.e. before restoring the guest ASID. However if
the load is in a branch delay slot it may need to access guest code to
read the prior branch instruction. This isn't safe in TLB mapped code at
the moment, nor in the future when we'll access unmapped guest segments
using direct user accessors too, as it could read the branch from host
user memory instead.
Therefore calculate the resume PC in advance while we're still in the
right context and save it in the new vcpu->arch.io_pc (replacing the no
longer needed vcpu->arch.pending_load_cause), and restore it on MMIO
completion.
Fixes: e685c689f3a8 ("KVM/MIPS32: Privileged instruction/target branch emulation.") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[james.hogan@imgtec.com: Backport to 3.18..4.4] Signed-off-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
That check should be for the line sas_target_priv_data->raid_device =
raid_device;
Due to above hunk, we are not initializing raid_device's starget for
raid volumes, and so during raid disk deletion driver is not calling
scsi_remove_target() API as driver observes starget field of
raid_device's structure as NULL.
A system can get hung task timeouts if a qlogic board fails during
initialization (if the board breaks again or fails the init). The hang
involves the scsi scan.
In a nutshell, since commit beb9e315e6e0 ("qla2xxx: Prevent removal and
board_disable race"):
...it is possible to have freed ha (base_vha->hw) early by a call to
qla2x00_remove_one when pdev->enable_cnt equals zero:
if (!atomic_read(&pdev->enable_cnt)) {
scsi_host_put(base_vha->host);
kfree(ha);
pci_set_drvdata(pdev, NULL);
return;
Almost always, the scsi_host_put above frees the vha structure
(attached to the end of the Scsi_Host we're putting) since it's the last
put, and life is good. However, if we are entering this routine because
the adapter has broken sometime during initialization AND a scsi scan is
already in progress (and has done its own scsi_host_get), vha will not
be freed. What's worse, the scsi scan will access the freed ha structure
through qla2xxx_scan_finished:
if (time > vha->hw->loop_reset_delay * HZ)
return 1;
The scsi scan keeps checking to see if a scan is complete by calling
qla2xxx_scan_finished. There is a timeout value that limits the length
of time a scan can take (hw->loop_reset_delay, usually set to 5
seconds), but this definition is in the data structure (hw) that can get
freed early.
This can yield unpredictable results, the worst of which is that the
scsi scan can hang indefinitely. This happens when the freed structure
gets reused and loop_reset_delay gets overwritten with garbage, which
the scan obliviously uses as its timeout value.
The fix for this is simple: at the top of qla2xxx_scan_finished, check
for the UNLOADING bit in the vha structure (_vha is not freed at this
point). If UNLOADING is set, we exit the scan for this adapter
immediately. After this last reference to the ha structure, we'll exit
the scan for this adapter, and continue on.
This problem is hard to hit, but I have run into it doing negative
testing many times now (with a test specifically designed to bring it
out), so I can verify that this fix works. My testing has been against a
RHEL7 driver variant, but the bug and patch are equally relevant to to
the upstream driver.
Fixes: beb9e315e6e0 ("qla2xxx: Prevent removal and board_disable race") Signed-off-by: Bill Kuzeja <william.kuzeja@stratus.com> Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While testing, it was observed that on some platforms the scale value
from iio sysfs for gyroscope is always 0 (E.g. Yoga 260). This results
in the final angular velocity component values to be zeros.
This is caused by insufficient precision of scale value displayed in sysfs.
If the precision is changed to nano from current micro, then this is
sufficient to display the scale value on this platform.
Since this can be a problem for all other HID sensors, increase scale
precision of all HID sensors to nano from current micro.
Results on Yoga 260:
name scale before scale now
--------------------------------------------
gyro_3d 0.000000 0.000000174
als 0.001000 0.001000000
magn_3d 0.000001 0.000001000
accel_3d 0.000009 0.000009806
Signed-off-by: Song Hongyan <hongyan.song@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The boot-time frequency of a CPU is considered its rated maximum, as we
have no other source of such information. However, this was previously
only used for chips with 80% restrictions on secondary PLLs. This
usually wasn't a problem because most chips/configs boot with a divider
of /1, with other dividers being used only for dynamic frequency
reduction. However, at least one config (LS1021A at less than 1 GHz)
uses a different divider for top speed. This was causing cpufreq to set
a frequency beyond the chip's rated speed.
This is fixed by applying a 100%-of-initial-speed limit to all CPU PLLs,
similar to the existing 80% limit that only applied to some.
Signed-off-by: Scott Wood <oss@buserror.net> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug 150611 uncovered that the WMI ID used by the toshiba-wmi driver
is not Toshiba specific, and as such, the driver was being loaded
on non Toshiba laptops too.
This patch adds a DMI matching list checking for TOSHIBA as the
vendor, refusing to load if it is not.
Also the WMI GUID was renamed, dropping the TOSHIBA_ prefix, to
better reflect that such GUID is not a Toshiba specific one.
Don't pass a size larger than iov_len to kernel_sendmsg().
Otherwise it will cause a NULL pointer deref when kernel_sendmsg()
returns with rv < size.
DRBD as external module has been around in the kernel 2.4 days already.
We used to be compatible to 2.4 and very early 2.6 kernels,
we used to use
rv = sock_sendmsg(sock, &msg, iov.iov_len);
then later changed to
rv = kernel_sendmsg(sock, &msg, &iov, 1, size);
when we should have used
rv = kernel_sendmsg(sock, &msg, &iov, 1, iov.iov_len);
tcp_sendmsg() used to totally ignore the size parameter. 57be5bd ip: convert tcp_sendmsg() to iov_iter primitives
changes that, and exposes our long standing error.
Even with this error exposed, to trigger the bug, we would need to have
an environment (config or otherwise) causing us to not use sendpage()
for larger transfers, a failing connection, and have it fail "just at the
right time". Apparently that was unlikely enough for most, so this went
unnoticed for years.
Still, it is known to trigger at least some of these,
and suspected for the others:
[0] http://lists.linbit.com/pipermail/drbd-user/2016-July/023112.html
[1] http://lists.linbit.com/pipermail/drbd-dev/2016-March/003362.html
[2] https://forums.grsecurity.net/viewtopic.php?f=3&t=4546
[3] https://ubuntuforums.org/showthread.php?t=2336150
[4] http://e2.howsolveproblem.com/i/1175162/
This should go into 4.9,
and into all stable branches since and including v4.0,
which is the first to contain the exposing change.
It is correct for all stable branches older than that as well
(which contain the DRBD driver; which is 2.6.33 and up).
It requires a small "conflict" resolution for v4.4 and earlier, with v4.5
we dropped the comment block immediately preceding the kernel_sendmsg().
Fixes: b411b3637fa7 ("The DRBD driver") Cc: viro@zeniv.linux.org.uk Cc: christoph.lechleitner@iteg.at Cc: wolfgang.glas@iteg.at Reported-by: Christoph Lechleitner <christoph.lechleitner@iteg.at> Tested-by: Christoph Lechleitner <christoph.lechleitner@iteg.at> Signed-off-by: Richard Weinberger <richard@nod.at>
[changed oneliner to be "obvious" without context; more verbose message] Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
According to Dave Miller "the networking stack has a
hard requirement that all SKBs which are transmitted
must have their completion signalled in a fininte
amount of time. This is because, until the SKB is
freed by the driver, it holds onto socket,
netfilter, and other subsystem resources."
In summary, this means that using TX IRQ throttling
for the networking gadgets is, at least, complex and
we should avoid it for the time being.
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The TIOCMIWAIT implementation would return -EINVAL if any of the three
supported signals were included in the mask.
Instead of returning an error in case TIOCM_CTS is included, simply
drop the mask check completely, which is in accordance with how other
drivers implement this ioctl.
Fixes: 5a6a62bdb925 ("cdc-acm: add TIOCMIWAIT") Signed-off-by: Johan Hovold <johan@kernel.org> Acked-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This basicly reverts commit e534f3e9 (staging:nvec: Introduce the use of
the managed version of kzalloc). Serio struct should never by managed
because it is refcounted. Doing so will lead to a double free oops on module
remove.
Signed-off-by: Marc Dietrich <marvin24@gmx.de> Fixes: e534f3e9429f ("staging:nvec: Introduce the use of the managed version of kzalloc") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is necessary to detect paz00 (ac100) touchpad properly as one
speaking ETPS/2 protocol. Without it X.org's synaptics driver doesn't
work as the touchpad is detected as an ImPS/2 mouse instead.
A pass through port is an additional PS/2 port used to connect a slave
device to a master device that is using PS/2 to communicate with the
host (so slave's PS/2 communication is tunneled over master's PS/2
link). "Synaptics PS/2 TouchPad Interfacing Guide" describes such a
setup (PS/2 PASS-THROUGH OPTION section).
Since paz00's embedded controller is not connected to a PS/2 port
itself, the PS/2 interface it exposes is not a pass-through one.
Signed-off-by: Paul Fertser <fercerpav@gmail.com> Acked-by: Marc Dietrich <marvin24@gmx.de> Fixes: 36b30d6138f4 ("staging: nvec: ps2: change serio type to passthrough") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This command was sent behind serio's back and the answer to it was
confusing atkbd probe function which lead to the elantech touchpad
getting detected as a keyboard.
To prevent this from happening just let every party do its part of the
job.
Signed-off-by: Paul Fertser <fercerpav@gmail.com> Acked-by: Marc Dietrich <marvin24@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ad5933_i2c_read function returns an error code to indicate
whether it could read data or not. However ad5933_work() ignores
this return code and just accesses the data unconditionally,
which gets detected by gcc as a possible bug:
drivers/staging/iio/impedance-analyzer/ad5933.c: In function 'ad5933_work':
drivers/staging/iio/impedance-analyzer/ad5933.c:649:16: warning: 'status' may be used uninitialized in this function [-Wmaybe-uninitialized]
This adds minimal error handling so we only evaluate the
data if it was correctly read.
When the system is suspended to S3 the BIOS might re-initialize certain
GPIO pins back to their original state or it may re-program interrupt mask
of others. For example Acer TravelMate B116-M had BIOS bug where certain
GPIO pin (MF_ISH_GPIO_5) was programmed to trigger on high level, and the
pin state was high once the BIOS gave control to the OS on resume.
We reset the mask back to known state in chv_pinctrl_resume() but that is
called only after device interrupts have already been enabled.
Now, this particular issue was fixed by upgrading the BIOS to the latest
(v1.23) but not everybody upgrades their BIOSes so we fix it up in the
driver as well.
Prevent the possible interrupt storm by moving suspend and resume hooks to
be called at _noirq time instead. Since device interrupts are still
disabled we can restore the mask back to known state before interrupt storm
happens.
Reported-by: Christian Steiner <christian.steiner@outlook.de> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If async suspend is enabled, the driver may access registers concurrently
with another instance which may fail because of the bug in Cherryview GPIO
hardware. Prevent this by taking the shared lock while accessing the
hardware in suspend and resume hooks.
The current code doesn't even compile as somehow the inline assembly
can't see the register names defined as ARC_RTC_*
I'm pretty sure It worked when I first got it merged, but the tools were
definitely different then.
Since commit d86bd1bece6f ("mm/slub: support left redzone") it is no longer
guaranteed that kmalloc(PAGE_SIZE) returns page aligned memory.
After the above commit we get an error for diag224 because aligned
memory is required. This leads to the following user visible error:
# mount none -t s390_hypfs /sys/hypervisor/
mount: unknown filesystem type 's390_hypfs'
# dmesg | grep hypfs
hypfs.cccfb8: The hardware system does not provide all functions
required by hypfs
hypfs.7a79f0: Initialization of hypfs failed with rc=-61
Fix this problem and use get_free_page() instead of kmalloc() to get
correctly aligned memory.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It could be not possible to freeze coredumping task when it waits for
'core_state->startup' completion, because threads are frozen in
get_signal() before they got a chance to complete 'core_state->startup'.
Inability to freeze a task during suspend will cause suspend to fail.
Also CRIU uses cgroup freezer during dump operation. So with an
unfreezable task the CRIU dump will fail because it waits for a
transition from 'FREEZING' to 'FROZEN' state which will never happen.
Use freezer_do_not_count() to tell freezer to ignore coredumping task
while it waits for core_state->startup completion.
Link: http://lkml.kernel.org/r/1475225434-3753-1-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: Pavel Machek <pavel@ucw.cz> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Tejun Heo <tj@kernel.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When root activates a swap partition whose header has the wrong
endianness, nr_badpages elements of badpages are swabbed before
nr_badpages has been checked, leading to a buffer overrun of up to 8GB.
This normally is not a security issue because it can only be exploited
by root (more specifically, a process with CAP_SYS_ADMIN or the ability
to modify a swap file/partition), and such a process can already e.g.
modify swapped-out memory of any other userspace process on the system.
When receiving a nec repeat, ensure the correct scancode is repeated
rather than a random value from the stack. This removes the need for
the bogus uninitialized_var() and also fixes the warnings:
drivers/media/usb/dvb-usb/dib0700_core.c: In function ‘dib0700_rc_urb_completion’:
drivers/media/usb/dvb-usb/dib0700_core.c:679: warning: ‘protocol’ may be used uninitialized in this function
[sean addon: So after writing the patch and submitting it, I've bought the
hardware on ebay. Without this patch you get random scancodes
on nec repeats, which the patch indeed fixes.]
Signed-off-by: Sean Young <sean@mess.org> Tested-by: Sean Young <sean@mess.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mismatching stream names in DAPM route and widget definitions are
causing compilation errors. Fixing these names allows the cs4270
driver to compile and function.
[Errors must be at probe time not compile time -- broonie]
Signed-off-by: Murray Foster <mrafoster@gmail.com> Acked-by: Paul Handrigan <Paul.Handrigan@cirrus.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ALSA proc handler allows currently the write in the unlimited size
until kmalloc() fails. But basically the write is supposed to be only
for small inputs, mostly for one line inputs, and we don't have to
handle too large sizes at all. Since the kmalloc error results in the
kernel warning, it's better to limit the size beforehand.
This patch adds the limit of 16kB, which must be large enough for the
currently existing code.
Currently the ALSA proc handler allows read or write even if the proc
file were write-only or read-only. It's mostly harmless, does thing
but allocating memory and ignores the input/output. But it doesn't
tell user about the invalid use, and it's confusing and inconsistent
in comparison with other proc files.
This patch adds some sanity checks and let the proc handler returning
an -EIO error when the invalid read/write is performed.
This patch will fix regression caused by commit 1e793f6fc0db ("scsi:
megaraid_sas: Fix data integrity failure for JBOD (passthrough)
devices").
The problem was that the MEGASAS_IS_LOGICAL macro did not have braces
and as a result the driver ended up exposing a lot of non-existing SCSI
devices (all SCSI commands to channels 1,2,3 were returned as
SUCCESS-DID_OK by driver).
When I fixed the dp rate selection in: 092c96a8ab9d1bd60ada2ed385cc364ce084180e
drm/radeon: fix dp link rate selection (v2)
I accidently dropped the special handling for NUTMEG
DP bridge chips. They require a fixed link rate.
Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Ken Wang <Qingqing.Wang@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Tested-by: Ken Moffat <zarniwhoop@ntlworld.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When I fixed the dp rate selection in: 3b73b168cffd9c392584d3f665021fa2190f8612
drm/amdgpu: fix dp link rate selection (v2)
I accidently dropped the special handling for NUTMEG
DP bridge chips. They require a fixed link rate.
Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Ken Wang <Qingqing.Wang@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a guest TLB entry is replaced by TLBWI or TLBWR, we only invalidate
TLB entries on the local CPU. This doesn't work correctly on an SMP host
when the guest is migrated to a different physical CPU, as it could pick
up stale TLB mappings from the last time the vCPU ran on that physical
CPU.
Therefore invalidate both user and kernel host ASIDs on other CPUs,
which will cause new ASIDs to be generated when it next runs on those
CPUs.
We're careful only to do this if the TLB entry was already valid, and
only for the kernel ASID where the virtual address it mapped is outside
of the guest user address range.
Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Cc: <stable@vger.kernel.org> # 3.17.x-
[james.hogan@imgtec.com: Backport to 3.17..4.4] Signed-off-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
pageblock_order can be (at least) an unsigned int or an unsigned long
depending on the kernel config and architecture, so use max_t(unsigned
long ...) when comparing it.
fixes these warnings:
In file included from include/linux/list.h:8:0,
from include/linux/kobject.h:20,
from include/linux/of.h:21,
from drivers/of/of_reserved_mem.c:17:
drivers/of/of_reserved_mem.c: In function ‘__reserved_mem_alloc_size’:
include/linux/kernel.h:748:17: warning: comparison of distinct pointer types lacks a cast
(void) (&_max1 == &_max2); \
^
include/linux/kernel.h:747:9: note: in definition of macro ‘max’
typeof(y) _max2 = (y); \
^
drivers/of/of_reserved_mem.c:131:48: note: in expansion of macro ‘max’
align = max(align, (phys_addr_t)PAGE_SIZE << max(MAX_ORDER - 1, pageblock_ord
^
include/linux/kernel.h:748:17: warning: comparison of distinct pointer types lacks a cast
(void) (&_max1 == &_max2); \
^
include/linux/kernel.h:747:21: note: in definition of macro ‘max’
typeof(y) _max2 = (y); \
^
drivers/of/of_reserved_mem.c:131:48: note: in expansion of macro ‘max’
align = max(align, (phys_addr_t)PAGE_SIZE << max(MAX_ORDER - 1, pageblock_ord
^
Fixes: 1cc8e3458b51 ("drivers: of: of_reserved_mem: fixup the alignment with CMA setup") Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Rob Herring <robh@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When transmitting on a packet socket with PACKET_VNET_HDR and
PACKET_QDISC_BYPASS, validate device support for features requested
in vnet_hdr.
Drop TSO packets sent to devices that do not support TSO or have the
feature disabled. Note that the latter currently do process those
packets correctly, regardless of not advertising the feature.
Because of SKB_GSO_DODGY, it is not sufficient to test device features
with netif_needs_gso. Full validate_xmit_skb is needed.
Switch to software checksum for non-TSO packets that request checksum
offload if that device feature is unsupported or disabled. Note that
similar to the TSO case, device drivers may perform checksum offload
correctly even when not advertising it.
When switching to software checksum, packets hit skb_checksum_help,
which has two BUG_ON checksum not in linear segment. Packet sockets
always allocate at least up to csum_start + csum_off + 2 as linear.
Tested by running github.com/wdebruij/kerneltools/psock_txring_vnet.c
Fixes: d346a3fae3ff ("packet: introduce PACKET_QDISC_BYPASS socket option") Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andrey Konovalov reported that KASAN detected that SCTP was using a slab
beyond the boundaries. It was caused because when handling out of the
blue packets in function sctp_sf_ootb() it was checking the chunk len
only after already processing the first chunk, validating only for the
2nd and subsequent ones.
The fix is to just move the check upwards so it's also validated for the
1st chunk.
Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While trying out [1][2], I noticed that tc monitor doesn't show the
correct handle on delete:
$ tc monitor
qdisc clsact ffff: dev eno1 parent ffff:fff1
filter dev eno1 ingress protocol all pref 49152 bpf handle 0x2a [...]
deleted filter dev eno1 ingress protocol all pref 49152 bpf handle 0xf3be0c80
some context to explain the above:
The user identity of any tc filter is represented by a 32-bit
identifier encoded in tcm->tcm_handle. Example 0x2a in the bpf filter
above. A user wishing to delete, get or even modify a specific filter
uses this handle to reference it.
Every classifier is free to provide its own semantics for the 32 bit handle.
Example: classifiers like u32 use schemes like 800:1:801 to describe
the semantics of their filters represented as hash table, bucket and
node ids etc.
Classifiers also have internal per-filter representation which is different
from this externally visible identity. Most classifiers set this
internal representation to be a pointer address (which allows fast retrieval
of said filters in their implementations). This internal representation
is referenced with the "fh" variable in the kernel control code.
When a user successfuly deletes a specific filter, by specifying the correct
tcm->tcm_handle, an event is generated to user space which indicates
which specific filter was deleted.
Before this patch, the "fh" value was sent to user space as the identity.
As an example what is shown in the sample bpf filter delete event above
is 0xf3be0c80. This is infact a 32-bit truncation of 0xffff8807f3be0c80
which happens to be a 64-bit memory address of the internal filter
representation (address of the corresponding filter's struct cls_bpf_prog);
After this patch the appropriate user identifiable handle as encoded
in the originating request tcm->tcm_handle is generated in the event.
One of the cardinal rules of netlink rules is to be able to take an
event (such as a delete in this case) and reflect it back to the
kernel and successfully delete the filter. This patch achieves that.
Note, this issue has existed since the original TC action
infrastructure code patch back in 2004 as found in:
https://git.kernel.org/cgit/linux/kernel/git/history/history.git/commit/
Fixes: 4e54c4816bfe ("[NET]: Add tc extensions infrastructure.") Reported-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
First bug was added in commit ad6f939ab193 ("ip: Add offset parameter to
ip_cmsg_recv") : Tom missed that ipv4 udp messages could be received on
AF_INET6 socket. ip_cmsg_recv(msg, skb) should have been replaced by
ip_cmsg_recv_offset(msg, skb, sizeof(struct udphdr));
Then commit e6afc8ace6dd ("udp: remove headers from UDP packets before
queueing") forgot to adjust the offsets now UDP headers are pulled
before skb are put in receive queue.
Fixes: ad6f939ab193 ("ip: Add offset parameter to ip_cmsg_recv") Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Sam Kumar <samanthakumar@google.com> Cc: Willem de Bruijn <willemb@google.com> Tested-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Most of getsockopt handlers in net/sctp/socket.c check len against
sizeof some structure like:
if (len < sizeof(int))
return -EINVAL;
On the first look, the check seems to be correct. But since len is int
and sizeof returns size_t, int gets promoted to unsigned size_t too. So
the test returns false for negative lengths. Yes, (-1 < sizeof(long)) is
false.
Fix this in sctp by explicitly checking len < 0 before any getsockopt
handler is called.
Note that sctp_getsockopt_events already handled the negative case.
Since we added the < 0 check elsewhere, this one can be removed.
This reverts commit a681574c99be23e4d20b769bf0e543239c364af5
("ipv4: disable BH in set_ping_group_range()") because we never
read ping_group_range in BH context (unlike local_port_range).
Then, since we already have a lock for ping_group_range, those
using ip_local_ports.lock for ping_group_range are clearly typos.
We might consider to share a same lock for both ping_group_range
and local_port_range w.r.t. space saving, but that should be for
net-next.
Fixes: a681574c99be ("ipv4: disable BH in set_ping_group_range()") Fixes: ba6b918ab234 ("ping: move ping_group_range out of CONFIG_SYSCTL") Cc: Eric Dumazet <edumazet@google.com> Cc: Eric Salo <salo@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit 4ee3bd4a8c746 ("ipv4: disable BH when changing ip local port
range") Cong added BH protection in set_local_port_range() but missed
that same fix was needed in set_ping_group_range()
Fixes: b8f1a55639e6 ("udp: Add function to make source port for UDP tunnels") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Eric Salo <salo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, GRO can do unlimited recursion through the gro_receive
handlers. This was fixed for tunneling protocols by limiting tunnel GRO
to one level with encap_mark, but both VLAN and TEB still have this
problem. Thus, the kernel is vulnerable to a stack overflow, if we
receive a packet composed entirely of VLAN headers.
This patch adds a recursion counter to the GRO layer to prevent stack
overflow. When a gro_receive function hits the recursion limit, GRO is
aborted for this skb and it is processed normally. This recursion
counter is put in the GRO CB, but could be turned into a percpu counter
if we run out of space in the CB.
Thanks to Vladimír Beneš <vbenes@redhat.com> for the initial bug report.
Fixes: CVE-2016-7039 Fixes: 9b174d88c257 ("net: Add Transparent Ethernet Bridging GRO support.") Fixes: 66e5133f19e9 ("vlan: Add GRO support for non hardware accelerated vlan") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Jiri Benc <jbenc@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The offload flag is a status flag and should not be used by
FIB semantics for comparison.
Fixes: 37ed9493699c ("rtnetlink: add RTNH_F_EXTERNAL flag for fib offload") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Satish reported a problem with the perm multicast router ports not getting
reenabled after some series of events, in particular if it happens that the
multicast snooping has been disabled and the port goes to disabled state
then it will be deleted from the router port list, but if it moves into
non-disabled state it will not be re-added because the mcast snooping is
still disabled, and enabling snooping later does nothing.
Here are the steps to reproduce, setup br0 with snooping enabled and eth1
added as a perm router (multicast_router = 2):
1. $ echo 0 > /sys/class/net/br0/bridge/multicast_snooping
2. $ ip l set eth1 down
^ This step deletes the interface from the router list
3. $ ip l set eth1 up
^ This step does not add it again because mcast snooping is disabled
4. $ echo 1 > /sys/class/net/br0/bridge/multicast_snooping
5. $ bridge -d -s mdb show
<empty>
At this point we have mcast enabled and eth1 as a perm router (value = 2)
but it is not in the router list which is incorrect.
After this change:
1. $ echo 0 > /sys/class/net/br0/bridge/multicast_snooping
2. $ ip l set eth1 down
^ This step deletes the interface from the router list
3. $ ip l set eth1 up
^ This step does not add it again because mcast snooping is disabled
4. $ echo 1 > /sys/class/net/br0/bridge/multicast_snooping
5. $ bridge -d -s mdb show
router ports on br0: eth1
Note: we can directly do br_multicast_enable_port for all because the
querier timer already has checks for the port state and will simply
expire if it's in blocking/disabled. See the comment added by
commit 9aa66382163e7 ("bridge: multicast: add a comment to
br_port_state_selection about blocking state")
Fixes: 561f1103a2b7 ("bridge: Add multicast_snooping sysfs toggle") Reported-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After Jesper commit back in linux-3.18, we trigger a lockdep
splat in proc_create_data() while allocating memory from
pktgen_change_name().
This patch converts t->if_lock to a mutex, since it is now only
used from control path, and adds proper locking to pktgen_change_name()
1) pktgen_thread_lock to protect the outer loop (iterating threads)
2) t->if_lock to protect the inner loop (iterating devices)
Note that before Jesper patch, pktgen_change_name() was lacking proper
protection, but lockdep was not able to detect the problem.
Fixes: 8788370a1d4b ("pktgen: RCU-ify "if_list" to remove lock in next_to_run()") Reported-by: John Sperbeck <jsperbeck@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The goal of the patch is to fix this scenario:
ip link add dummy1 type dummy
ip link set dummy1 up
ip link set lo down ; ip link set lo up
After that sequence, the local route to the link layer address of dummy1 is
not there anymore.
When the loopback is set down, all local routes are deleted by
addrconf_ifdown()/rt6_ifdown(). At this time, the rt6_info entry still
exists, because the corresponding idev has a reference on it. After the rcu
grace period, dst_rcu_free() is called, and thus ___dst_free(), which will
set obsolete to DST_OBSOLETE_DEAD.
In this case, init_loopback() is called before dst_rcu_free(), thus
obsolete is still sets to something <= 0. So, the function doesn't add the
route again. To avoid that race, let's check the rt6 refcnt instead.
Fixes: 25fb6ca4ed9c ("net IPv6 : Fix broken IPv6 routing table after loopback down-up") Fixes: a881ae1f625c ("ipv6: don't call addrconf_dst_alloc again when enable lo") Fixes: 33d99113b110 ("ipv6: reallocate addrconf router for ipv6 address when lo device up") Reported-by: Francesco Santoro <francesco.santoro@6wind.com> Reported-by: Samuel Gauthier <samuel.gauthier@6wind.com> CC: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com> CC: Maruthi Thotad <Maruthi.Thotad@ap.sony.com> CC: Sabrina Dubroca <sd@queasysnail.net> CC: Hannes Frederic Sowa <hannes@stressinduktion.org> CC: Weilong Chen <chenweilong@huawei.com> CC: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The commit ea3dc9601bda ("ip6_tunnel: Add support for wildcard tunnel
endpoints.") introduces support for wildcards in tunnels endpoints,
but in some rare circumstances ip6_tnl_lookup selects wrong tunnel
interface relying only on source or destination address of the packet
and not checking presence of wildcard in tunnels endpoints. Later in
ip6_tnl_rcv this packets can be dicarded because of difference in
ipproto even if fallback device have proper ipproto configuration.
This patch adds checks of wildcard endpoint in tunnel avoiding such
behavior
Fixes: ea3dc9601bda ("ip6_tunnel: Add support for wildcard tunnel endpoints.") Signed-off-by: Vadim Fedorenko <junk@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Issue is that ip6_datagram_recv_specific_ctl() expects to find IP6CB
data that was moved at a different place in tcp_v6_rcv()
This patch moves tcp_v6_restore_cb() up and calls it from
tcp_v6_do_rcv() when np->pktoptions is set.
Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Baozeng Ding <sploving1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>