]> git.itanic.dy.fi Git - linux-stable/log
linux-stable
5 years agoLinux 4.14.81 v4.14.81
Greg Kroah-Hartman [Tue, 13 Nov 2018 19:15:18 +0000 (11:15 -0800)]
Linux 4.14.81

5 years agoMD: fix invalid stored role for a disk - try2
Shaohua Li [Mon, 15 Oct 2018 00:05:07 +0000 (17:05 -0700)]
MD: fix invalid stored role for a disk - try2

commit 9e753ba9b9b405e3902d9f08aec5f2ea58a0c317 upstream.

Commit d595567dc4f0 (MD: fix invalid stored role for a disk) broke linear
hotadd. Let's only fix the role for disks in raid1/10.
Based on Guoqing's original patch.

Reported-by: kernel test robot <rong.a.chen@intel.com>
Cc: Gioh Kim <gi-oh.kim@profitbricks.com>
Cc: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobpf: wait for running BPF programs when updating map-in-map
Daniel Colascione [Fri, 12 Oct 2018 10:54:27 +0000 (03:54 -0700)]
bpf: wait for running BPF programs when updating map-in-map

commit 1ae80cf31938c8f77c37a29bbe29e7f1cd492be8 upstream.

The map-in-map frequently serves as a mechanism for atomic
snapshotting of state that a BPF program might record.  The current
implementation is dangerous to use in this way, however, since
userspace has no way of knowing when all programs that might have
retrieved the "old" value of the map may have completed.

This change ensures that map update operations on map-in-map map types
always wait for all references to the old map to drop before returning
to userspace.

Signed-off-by: Daniel Colascione <dancol@google.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
[fengc@google.com: 4.14 backport: adjust context]
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet: sched: Remove TCA_OPTIONS from policy
David Ahern [Wed, 24 Oct 2018 15:32:49 +0000 (08:32 -0700)]
net: sched: Remove TCA_OPTIONS from policy

commit e72bde6b66299602087c8c2350d36a525e75d06e upstream.

Marco reported an error with hfsc:
root@Calimero:~# tc qdisc add dev eth0 root handle 1:0 hfsc default 1
Error: Attribute failed policy validation.

Apparently a few implementations pass TCA_OPTIONS as a binary instead
of nested attribute, so drop TCA_OPTIONS from the policy.

Fixes: 8b4c3cdd9dd8 ("net: sched: Add policy validation for tc attributes")
Reported-by: Marco Berizzi <pupilla@libero.it>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoBtrfs: fix fsync after hole punching when using no-holes feature
Filipe Manana [Mon, 26 Mar 2018 22:59:00 +0000 (23:59 +0100)]
Btrfs: fix fsync after hole punching when using no-holes feature

commit 4ee3fad34a9cc2cf33303dfbd0cf554248651c86 upstream.

When we have the no-holes mode enabled and fsync a file after punching a
hole in it, we can end up not logging the whole hole range in the log tree.
This happens if the file has extent items that span more than one leaf and
we punch a hole that covers a range that starts in a leaf but does not go
beyond the offset of the first extent in the next leaf.

Example:

  $ mkfs.btrfs -f -O no-holes -n 65536 /dev/sdb
  $ mount /dev/sdb /mnt
  $ for ((i = 0; i <= 831; i++)); do
offset=$((i * 2 * 256 * 1024))
xfs_io -f -c "pwrite -S 0xab -b 256K $offset 256K" \
/mnt/foobar >/dev/null
    done
  $ sync

  # We now have 2 leafs in our filesystem fs tree, the first leaf has an
  # item corresponding the extent at file offset 216530944 and the second
  # leaf has a first item corresponding to the extent at offset 217055232.
  # Now we punch a hole that partially covers the range of the extent at
  # offset 216530944 but does go beyond the offset 217055232.

  $ xfs_io -c "fpunch $((216530944 + 128 * 1024 - 4000)) 256K" /mnt/foobar
  $ xfs_io -c "fsync" /mnt/foobar

  <power fail>

  # mount to replay the log
  $ mount /dev/sdb /mnt

  # Before this patch, only the subrange [216658016216662016[ (length of
  # 4000 bytes) was logged, leaving an incorrect file layout after log
  # replay.

Fix this by checking if there is a hole between the last extent item that
we processed and the first extent item in the next leaf, and if there is
one, log an explicit hole extent item.

Fixes: 16e7549f045d ("Btrfs: incompatible format change to remove hole extents")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoBtrfs: fix use-after-free when dumping free space
Filipe Manana [Mon, 22 Oct 2018 09:43:06 +0000 (10:43 +0100)]
Btrfs: fix use-after-free when dumping free space

commit 9084cb6a24bf5838a665af92ded1af8363f9e563 upstream.

We were iterating a block group's free space cache rbtree without locking
first the lock that protects it (the free_space_ctl->free_space_offset
rbtree is protected by the free_space_ctl->tree_lock spinlock).

KASAN reported an use-after-free problem when iterating such a rbtree due
to a concurrent rbtree delete:

[ 9520.359168] ==================================================================
[ 9520.359656] BUG: KASAN: use-after-free in rb_next+0x13/0x90
[ 9520.359949] Read of size 8 at addr ffff8800b7ada500 by task btrfs-transacti/1721
[ 9520.360357]
[ 9520.360530] CPU: 4 PID: 1721 Comm: btrfs-transacti Tainted: G             L    4.19.0-rc8-nbor #555
[ 9520.360990] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 9520.362682] Call Trace:
[ 9520.362887]  dump_stack+0xa4/0xf5
[ 9520.363146]  print_address_description+0x78/0x280
[ 9520.363412]  kasan_report+0x263/0x390
[ 9520.363650]  ? rb_next+0x13/0x90
[ 9520.363873]  __asan_load8+0x54/0x90
[ 9520.364102]  rb_next+0x13/0x90
[ 9520.364380]  btrfs_dump_free_space+0x146/0x160 [btrfs]
[ 9520.364697]  dump_space_info+0x2cd/0x310 [btrfs]
[ 9520.364997]  btrfs_reserve_extent+0x1ee/0x1f0 [btrfs]
[ 9520.365310]  __btrfs_prealloc_file_range+0x1cc/0x620 [btrfs]
[ 9520.365646]  ? btrfs_update_time+0x180/0x180 [btrfs]
[ 9520.365923]  ? _raw_spin_unlock+0x27/0x40
[ 9520.366204]  ? btrfs_alloc_data_chunk_ondemand+0x2c0/0x5c0 [btrfs]
[ 9520.366549]  btrfs_prealloc_file_range_trans+0x23/0x30 [btrfs]
[ 9520.366880]  cache_save_setup+0x42e/0x580 [btrfs]
[ 9520.367220]  ? btrfs_check_data_free_space+0xd0/0xd0 [btrfs]
[ 9520.367518]  ? lock_downgrade+0x2f0/0x2f0
[ 9520.367799]  ? btrfs_write_dirty_block_groups+0x11f/0x6e0 [btrfs]
[ 9520.368104]  ? kasan_check_read+0x11/0x20
[ 9520.368349]  ? do_raw_spin_unlock+0xa8/0x140
[ 9520.368638]  btrfs_write_dirty_block_groups+0x2af/0x6e0 [btrfs]
[ 9520.368978]  ? btrfs_start_dirty_block_groups+0x870/0x870 [btrfs]
[ 9520.369282]  ? do_raw_spin_unlock+0xa8/0x140
[ 9520.369534]  ? _raw_spin_unlock+0x27/0x40
[ 9520.369811]  ? btrfs_run_delayed_refs+0x1b8/0x230 [btrfs]
[ 9520.370137]  commit_cowonly_roots+0x4b9/0x610 [btrfs]
[ 9520.370560]  ? commit_fs_roots+0x350/0x350 [btrfs]
[ 9520.370926]  ? btrfs_run_delayed_refs+0x1b8/0x230 [btrfs]
[ 9520.371285]  btrfs_commit_transaction+0x5e5/0x10e0 [btrfs]
[ 9520.371612]  ? btrfs_apply_pending_changes+0x90/0x90 [btrfs]
[ 9520.371943]  ? start_transaction+0x168/0x6c0 [btrfs]
[ 9520.372257]  transaction_kthread+0x21c/0x240 [btrfs]
[ 9520.372537]  kthread+0x1d2/0x1f0
[ 9520.372793]  ? btrfs_cleanup_transaction+0xb50/0xb50 [btrfs]
[ 9520.373090]  ? kthread_park+0xb0/0xb0
[ 9520.373329]  ret_from_fork+0x3a/0x50
[ 9520.373567]
[ 9520.373738] Allocated by task 1804:
[ 9520.373974]  kasan_kmalloc+0xff/0x180
[ 9520.374208]  kasan_slab_alloc+0x11/0x20
[ 9520.374447]  kmem_cache_alloc+0xfc/0x2d0
[ 9520.374731]  __btrfs_add_free_space+0x40/0x580 [btrfs]
[ 9520.375044]  unpin_extent_range+0x4f7/0x7a0 [btrfs]
[ 9520.375383]  btrfs_finish_extent_commit+0x15f/0x4d0 [btrfs]
[ 9520.375707]  btrfs_commit_transaction+0xb06/0x10e0 [btrfs]
[ 9520.376027]  btrfs_alloc_data_chunk_ondemand+0x237/0x5c0 [btrfs]
[ 9520.376365]  btrfs_check_data_free_space+0x81/0xd0 [btrfs]
[ 9520.376689]  btrfs_delalloc_reserve_space+0x25/0x80 [btrfs]
[ 9520.377018]  btrfs_direct_IO+0x42e/0x6d0 [btrfs]
[ 9520.377284]  generic_file_direct_write+0x11e/0x220
[ 9520.377587]  btrfs_file_write_iter+0x472/0xac0 [btrfs]
[ 9520.377875]  aio_write+0x25c/0x360
[ 9520.378106]  io_submit_one+0xaa0/0xdc0
[ 9520.378343]  __se_sys_io_submit+0xfa/0x2f0
[ 9520.378589]  __x64_sys_io_submit+0x43/0x50
[ 9520.378840]  do_syscall_64+0x7d/0x240
[ 9520.379081]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 9520.379387]
[ 9520.379557] Freed by task 1802:
[ 9520.379782]  __kasan_slab_free+0x173/0x260
[ 9520.380028]  kasan_slab_free+0xe/0x10
[ 9520.380262]  kmem_cache_free+0xc1/0x2c0
[ 9520.380544]  btrfs_find_space_for_alloc+0x4cd/0x4e0 [btrfs]
[ 9520.380866]  find_free_extent+0xa99/0x17e0 [btrfs]
[ 9520.381166]  btrfs_reserve_extent+0xd5/0x1f0 [btrfs]
[ 9520.381474]  btrfs_get_blocks_direct+0x60b/0xbd0 [btrfs]
[ 9520.381761]  __blockdev_direct_IO+0x10ee/0x58a1
[ 9520.382059]  btrfs_direct_IO+0x25a/0x6d0 [btrfs]
[ 9520.382321]  generic_file_direct_write+0x11e/0x220
[ 9520.382623]  btrfs_file_write_iter+0x472/0xac0 [btrfs]
[ 9520.382904]  aio_write+0x25c/0x360
[ 9520.383172]  io_submit_one+0xaa0/0xdc0
[ 9520.383416]  __se_sys_io_submit+0xfa/0x2f0
[ 9520.383678]  __x64_sys_io_submit+0x43/0x50
[ 9520.383927]  do_syscall_64+0x7d/0x240
[ 9520.384165]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 9520.384439]
[ 9520.384610] The buggy address belongs to the object at ffff8800b7ada500
                which belongs to the cache btrfs_free_space of size 72
[ 9520.385175] The buggy address is located 0 bytes inside of
                72-byte region [ffff8800b7ada500ffff8800b7ada548)
[ 9520.385691] The buggy address belongs to the page:
[ 9520.385957] page:ffffea0002deb680 count:1 mapcount:0 mapping:ffff880108a1d700 index:0x0 compound_mapcount: 0
[ 9520.388030] flags: 0x8100(slab|head)
[ 9520.388281] raw: 0000000000008100 ffffea0002deb608 ffffea0002728808 ffff880108a1d700
[ 9520.388722] raw: 0000000000000000 0000000000130013 00000001ffffffff 0000000000000000
[ 9520.389169] page dumped because: kasan: bad access detected
[ 9520.389473]
[ 9520.389658] Memory state around the buggy address:
[ 9520.389943]  ffff8800b7ada400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 9520.390368]  ffff8800b7ada480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 9520.390796] >ffff8800b7ada500: fb fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc
[ 9520.391223]                    ^
[ 9520.391461]  ffff8800b7ada580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 9520.391885]  ffff8800b7ada600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 9520.392313] ==================================================================
[ 9520.392772] BTRFS critical (device vdc): entry offset 2258497536, bytes 131072, bitmap no
[ 9520.393247] BUG: unable to handle kernel NULL pointer dereference at 0000000000000011
[ 9520.393705] PGD 800000010dbab067 P4D 800000010dbab067 PUD 107551067 PMD 0
[ 9520.394059] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 9520.394378] CPU: 4 PID: 1721 Comm: btrfs-transacti Tainted: G    B        L    4.19.0-rc8-nbor #555
[ 9520.394858] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 9520.395350] RIP: 0010:rb_next+0x3c/0x90
[ 9520.396461] RSP: 0018:ffff8801074ff780 EFLAGS: 00010292
[ 9520.396762] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff81b5ac4c
[ 9520.397115] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000011
[ 9520.397468] RBP: ffff8801074ff7a0 R08: ffffed0021d64ccc R09: ffffed0021d64ccc
[ 9520.397821] R10: 0000000000000001 R11: ffffed0021d64ccb R12: ffff8800b91e0000
[ 9520.398188] R13: ffff8800a3ceba48 R14: ffff8800b627bf80 R15: 0000000000020000
[ 9520.398555] FS:  0000000000000000(0000) GS:ffff88010eb00000(0000) knlGS:0000000000000000
[ 9520.399007] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9520.399335] CR2: 0000000000000011 CR3: 0000000106b52000 CR4: 00000000000006a0
[ 9520.399679] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 9520.400023] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 9520.400400] Call Trace:
[ 9520.400648]  btrfs_dump_free_space+0x146/0x160 [btrfs]
[ 9520.400974]  dump_space_info+0x2cd/0x310 [btrfs]
[ 9520.401287]  btrfs_reserve_extent+0x1ee/0x1f0 [btrfs]
[ 9520.401609]  __btrfs_prealloc_file_range+0x1cc/0x620 [btrfs]
[ 9520.401952]  ? btrfs_update_time+0x180/0x180 [btrfs]
[ 9520.402232]  ? _raw_spin_unlock+0x27/0x40
[ 9520.402522]  ? btrfs_alloc_data_chunk_ondemand+0x2c0/0x5c0 [btrfs]
[ 9520.402882]  btrfs_prealloc_file_range_trans+0x23/0x30 [btrfs]
[ 9520.403261]  cache_save_setup+0x42e/0x580 [btrfs]
[ 9520.403570]  ? btrfs_check_data_free_space+0xd0/0xd0 [btrfs]
[ 9520.403871]  ? lock_downgrade+0x2f0/0x2f0
[ 9520.404161]  ? btrfs_write_dirty_block_groups+0x11f/0x6e0 [btrfs]
[ 9520.404481]  ? kasan_check_read+0x11/0x20
[ 9520.404732]  ? do_raw_spin_unlock+0xa8/0x140
[ 9520.405026]  btrfs_write_dirty_block_groups+0x2af/0x6e0 [btrfs]
[ 9520.405375]  ? btrfs_start_dirty_block_groups+0x870/0x870 [btrfs]
[ 9520.405694]  ? do_raw_spin_unlock+0xa8/0x140
[ 9520.405958]  ? _raw_spin_unlock+0x27/0x40
[ 9520.406243]  ? btrfs_run_delayed_refs+0x1b8/0x230 [btrfs]
[ 9520.406574]  commit_cowonly_roots+0x4b9/0x610 [btrfs]
[ 9520.406899]  ? commit_fs_roots+0x350/0x350 [btrfs]
[ 9520.407253]  ? btrfs_run_delayed_refs+0x1b8/0x230 [btrfs]
[ 9520.407589]  btrfs_commit_transaction+0x5e5/0x10e0 [btrfs]
[ 9520.407925]  ? btrfs_apply_pending_changes+0x90/0x90 [btrfs]
[ 9520.408262]  ? start_transaction+0x168/0x6c0 [btrfs]
[ 9520.408582]  transaction_kthread+0x21c/0x240 [btrfs]
[ 9520.408870]  kthread+0x1d2/0x1f0
[ 9520.409138]  ? btrfs_cleanup_transaction+0xb50/0xb50 [btrfs]
[ 9520.409440]  ? kthread_park+0xb0/0xb0
[ 9520.409682]  ret_from_fork+0x3a/0x50
[ 9520.410508] Dumping ftrace buffer:
[ 9520.410764]    (ftrace buffer empty)
[ 9520.411007] CR2: 0000000000000011
[ 9520.411297] ---[ end trace 01a0863445cf360a ]---
[ 9520.411568] RIP: 0010:rb_next+0x3c/0x90
[ 9520.412644] RSP: 0018:ffff8801074ff780 EFLAGS: 00010292
[ 9520.412932] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff81b5ac4c
[ 9520.413274] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000011
[ 9520.413616] RBP: ffff8801074ff7a0 R08: ffffed0021d64ccc R09: ffffed0021d64ccc
[ 9520.414007] R10: 0000000000000001 R11: ffffed0021d64ccb R12: ffff8800b91e0000
[ 9520.414349] R13: ffff8800a3ceba48 R14: ffff8800b627bf80 R15: 0000000000020000
[ 9520.416074] FS:  0000000000000000(0000) GS:ffff88010eb00000(0000) knlGS:0000000000000000
[ 9520.416536] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9520.416848] CR2: 0000000000000011 CR3: 0000000106b52000 CR4: 00000000000006a0
[ 9520.418477] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 9520.418846] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 9520.419204] Kernel panic - not syncing: Fatal exception
[ 9520.419666] Dumping ftrace buffer:
[ 9520.419930]    (ftrace buffer empty)
[ 9520.420168] Kernel Offset: disabled
[ 9520.420406] ---[ end Kernel panic - not syncing: Fatal exception ]---

Fix this by acquiring the respective lock before iterating the rbtree.

Reported-by: Nikolay Borisov <nborisov@suse.com>
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoBtrfs: fix use-after-free during inode eviction
Filipe Manana [Fri, 12 Oct 2018 12:02:48 +0000 (13:02 +0100)]
Btrfs: fix use-after-free during inode eviction

commit 421f0922a2cfb0c75acd9746454aaa576c711a65 upstream.

At inode.c:evict_inode_truncate_pages(), when we iterate over the
inode's extent states, we access an extent state record's "state" field
after we unlocked the inode's io tree lock. This can lead to a
use-after-free issue because after we unlock the io tree that extent
state record might have been freed due to being merged into another
adjacent extent state record (a previous inflight bio for a read
operation finished in the meanwhile which unlocked a range in the io
tree and cause a merge of extent state records, as explained in the
comment before the while loop added in commit 6ca0709756710 ("Btrfs: fix
hang during inode eviction due to concurrent readahead")).

Fix this by keeping a copy of the extent state's flags in a local
variable and using it after unlocking the io tree.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201189
Fixes: b9d0b38928e2 ("btrfs: Add handler for invalidate page")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobtrfs: move the dio_sem higher up the callchain
Josef Bacik [Fri, 12 Oct 2018 19:32:32 +0000 (15:32 -0400)]
btrfs: move the dio_sem higher up the callchain

commit c495144bc6962186feae31d687596d2472000e45 upstream.

We're getting a lockdep splat because we take the dio_sem under the
log_mutex.  What we really need is to protect fsync() from logging an
extent map for an extent we never waited on higher up, so just guard the
whole thing with dio_sem.

======================================================
WARNING: possible circular locking dependency detected
4.18.0-rc4-xfstests-00025-g5de5edbaf1d4 #411 Not tainted
------------------------------------------------------
aio-dio-invalid/30928 is trying to acquire lock:
0000000092621cfd (&mm->mmap_sem){++++}, at: get_user_pages_unlocked+0x5a/0x1e0

but task is already holding lock:
00000000cefe6b35 (&ei->dio_sem){++++}, at: btrfs_direct_IO+0x3be/0x400

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #5 (&ei->dio_sem){++++}:
       lock_acquire+0xbd/0x220
       down_write+0x51/0xb0
       btrfs_log_changed_extents+0x80/0xa40
       btrfs_log_inode+0xbaf/0x1000
       btrfs_log_inode_parent+0x26f/0xa80
       btrfs_log_dentry_safe+0x50/0x70
       btrfs_sync_file+0x357/0x540
       do_fsync+0x38/0x60
       __ia32_sys_fdatasync+0x12/0x20
       do_fast_syscall_32+0x9a/0x2f0
       entry_SYSENTER_compat+0x84/0x96

-> #4 (&ei->log_mutex){+.+.}:
       lock_acquire+0xbd/0x220
       __mutex_lock+0x86/0xa10
       btrfs_record_unlink_dir+0x2a/0xa0
       btrfs_unlink+0x5a/0xc0
       vfs_unlink+0xb1/0x1a0
       do_unlinkat+0x264/0x2b0
       do_fast_syscall_32+0x9a/0x2f0
       entry_SYSENTER_compat+0x84/0x96

-> #3 (sb_internal#2){.+.+}:
       lock_acquire+0xbd/0x220
       __sb_start_write+0x14d/0x230
       start_transaction+0x3e6/0x590
       btrfs_evict_inode+0x475/0x640
       evict+0xbf/0x1b0
       btrfs_run_delayed_iputs+0x6c/0x90
       cleaner_kthread+0x124/0x1a0
       kthread+0x106/0x140
       ret_from_fork+0x3a/0x50

-> #2 (&fs_info->cleaner_delayed_iput_mutex){+.+.}:
       lock_acquire+0xbd/0x220
       __mutex_lock+0x86/0xa10
       btrfs_alloc_data_chunk_ondemand+0x197/0x530
       btrfs_check_data_free_space+0x4c/0x90
       btrfs_delalloc_reserve_space+0x20/0x60
       btrfs_page_mkwrite+0x87/0x520
       do_page_mkwrite+0x31/0xa0
       __handle_mm_fault+0x799/0xb00
       handle_mm_fault+0x7c/0xe0
       __do_page_fault+0x1d3/0x4a0
       async_page_fault+0x1e/0x30

-> #1 (sb_pagefaults){.+.+}:
       lock_acquire+0xbd/0x220
       __sb_start_write+0x14d/0x230
       btrfs_page_mkwrite+0x6a/0x520
       do_page_mkwrite+0x31/0xa0
       __handle_mm_fault+0x799/0xb00
       handle_mm_fault+0x7c/0xe0
       __do_page_fault+0x1d3/0x4a0
       async_page_fault+0x1e/0x30

-> #0 (&mm->mmap_sem){++++}:
       __lock_acquire+0x42e/0x7a0
       lock_acquire+0xbd/0x220
       down_read+0x48/0xb0
       get_user_pages_unlocked+0x5a/0x1e0
       get_user_pages_fast+0xa4/0x150
       iov_iter_get_pages+0xc3/0x340
       do_direct_IO+0xf93/0x1d70
       __blockdev_direct_IO+0x32d/0x1c20
       btrfs_direct_IO+0x227/0x400
       generic_file_direct_write+0xcf/0x180
       btrfs_file_write_iter+0x308/0x58c
       aio_write+0xf8/0x1d0
       io_submit_one+0x3a9/0x620
       __ia32_compat_sys_io_submit+0xb2/0x270
       do_int80_syscall_32+0x5b/0x1a0
       entry_INT80_compat+0x88/0xa0

other info that might help us debug this:

Chain exists of:
  &mm->mmap_sem --> &ei->log_mutex --> &ei->dio_sem

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&ei->dio_sem);
                               lock(&ei->log_mutex);
                               lock(&ei->dio_sem);
  lock(&mm->mmap_sem);

 *** DEADLOCK ***

1 lock held by aio-dio-invalid/30928:
 #0: 00000000cefe6b35 (&ei->dio_sem){++++}, at: btrfs_direct_IO+0x3be/0x400

stack backtrace:
CPU: 0 PID: 30928 Comm: aio-dio-invalid Not tainted 4.18.0-rc4-xfstests-00025-g5de5edbaf1d4 #411
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Call Trace:
 dump_stack+0x7c/0xbb
 print_circular_bug.isra.37+0x297/0x2a4
 check_prev_add.constprop.45+0x781/0x7a0
 ? __lock_acquire+0x42e/0x7a0
 validate_chain.isra.41+0x7f0/0xb00
 __lock_acquire+0x42e/0x7a0
 lock_acquire+0xbd/0x220
 ? get_user_pages_unlocked+0x5a/0x1e0
 down_read+0x48/0xb0
 ? get_user_pages_unlocked+0x5a/0x1e0
 get_user_pages_unlocked+0x5a/0x1e0
 get_user_pages_fast+0xa4/0x150
 iov_iter_get_pages+0xc3/0x340
 do_direct_IO+0xf93/0x1d70
 ? __alloc_workqueue_key+0x358/0x490
 ? __blockdev_direct_IO+0x14b/0x1c20
 __blockdev_direct_IO+0x32d/0x1c20
 ? btrfs_run_delalloc_work+0x40/0x40
 ? can_nocow_extent+0x490/0x490
 ? kvm_clock_read+0x1f/0x30
 ? can_nocow_extent+0x490/0x490
 ? btrfs_run_delalloc_work+0x40/0x40
 btrfs_direct_IO+0x227/0x400
 ? btrfs_run_delalloc_work+0x40/0x40
 generic_file_direct_write+0xcf/0x180
 btrfs_file_write_iter+0x308/0x58c
 aio_write+0xf8/0x1d0
 ? kvm_clock_read+0x1f/0x30
 ? __might_fault+0x3e/0x90
 io_submit_one+0x3a9/0x620
 ? io_submit_one+0xe5/0x620
 __ia32_compat_sys_io_submit+0xb2/0x270
 do_int80_syscall_32+0x5b/0x1a0
 entry_INT80_compat+0x88/0xa0

CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobtrfs: don't run delayed_iputs in commit
Josef Bacik [Thu, 11 Oct 2018 19:54:31 +0000 (15:54 -0400)]
btrfs: don't run delayed_iputs in commit

commit 30928e9baac238a7330085a1c5747f0b5df444b4 upstream.

This could result in a really bad case where we do something like

evict
  evict_refill_and_join
    btrfs_commit_transaction
      btrfs_run_delayed_iputs
        evict
          evict_refill_and_join
            btrfs_commit_transaction
... forever

We have plenty of other places where we run delayed iputs that are much
safer, let those do the work.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobtrfs: only free reserved extent if we didn't insert it
Josef Bacik [Thu, 11 Oct 2018 19:54:21 +0000 (15:54 -0400)]
btrfs: only free reserved extent if we didn't insert it

commit 49940bdd57779c78462da7aa5a8650b2fea8c2ff upstream.

When we insert the file extent once the ordered extent completes we free
the reserved extent reservation as it'll have been migrated to the
bytes_used counter.  However if we error out after this step we'll still
clear the reserved extent reservation, resulting in a negative
accounting of the reserved bytes for the block group and space info.
Fix this by only doing the free if we didn't successfully insert a file
extent for this extent.

CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobtrfs: don't use ctl->free_space for max_extent_size
Josef Bacik [Thu, 11 Oct 2018 19:54:09 +0000 (15:54 -0400)]
btrfs: don't use ctl->free_space for max_extent_size

commit fb5c39d7a887108087de6ff93d3f326b01b4ef41 upstream.

max_extent_size is supposed to be the largest contiguous range for the
space info, and ctl->free_space is the total free space in the block
group.  We need to keep track of these separately and _only_ use the
max_free_space if we don't have a max_extent_size, as that means our
original request was too large to search any of the block groups for and
therefore wouldn't have a max_extent_size set.

CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobtrfs: set max_extent_size properly
Josef Bacik [Fri, 12 Oct 2018 19:32:33 +0000 (15:32 -0400)]
btrfs: set max_extent_size properly

commit ad22cf6ea47fa20fbe11ac324a0a15c0a9a4a2a9 upstream.

We can't use entry->bytes if our entry is a bitmap entry, we need to use
entry->max_extent_size in that case.  Fix up all the logic to make this
consistent.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoBtrfs: fix assertion on fsync of regular file when using no-holes feature
Filipe Manana [Mon, 15 Oct 2018 08:51:00 +0000 (09:51 +0100)]
Btrfs: fix assertion on fsync of regular file when using no-holes feature

commit 7ed586d0a8241e81d58c656c5b315f781fa6fc97 upstream.

When using the NO_HOLES feature and logging a regular file, we were
expecting that if we find an inline extent, that either its size in RAM
(uncompressed and unenconded) matches the size of the file or if it does
not, that it matches the sector size and it represents compressed data.
This assertion does not cover a case where the length of the inline extent
is smaller than the sector size and also smaller the file's size, such
case is possible through fallocate. Example:

  $ mkfs.btrfs -f -O no-holes /dev/sdb
  $ mount /dev/sdb /mnt

  $ xfs_io -f -c "pwrite -S 0xb60 0 21" /mnt/foobar
  $ xfs_io -c "falloc 40 40" /mnt/foobar
  $ xfs_io -c "fsync" /mnt/foobar

In the above example we trigger the assertion because the inline extent's
length is 21 bytes while the file size is 80 bytes. The fallocate() call
merely updated the file's size and did not touch the existing inline
extent, as expected.

So fix this by adjusting the assertion so that an inline extent length
smaller than the file size is valid if the file size is smaller than the
filesystem's sector size.

A test case for fstests follows soon.

Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Fixes: a89ca6f24ffe ("Btrfs: fix fsync after truncate when no_holes feature is enabled")
CC: stable@vger.kernel.org # 4.14+
Link: https://lore.kernel.org/linux-btrfs/CAE5jQCfRSBC7n4pUTFJcmHh109=gwyT9mFkCOL+NKfzswmR=_Q@mail.gmail.com/
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoBtrfs: fix null pointer dereference on compressed write path error
Filipe Manana [Fri, 12 Oct 2018 23:37:25 +0000 (00:37 +0100)]
Btrfs: fix null pointer dereference on compressed write path error

commit 3527a018c00e5dbada2f9d7ed5576437b6dd5cfb upstream.

At inode.c:compress_file_range(), under the "free_pages_out" label, we can
end up dereferencing the "pages" pointer when it has a NULL value. This
case happens when "start" has a value of 0 and we fail to allocate memory
for the "pages" pointer. When that happens we jump to the "cont" label and
then enter the "if (start == 0)" branch where we immediately call the
cow_file_range_inline() function. If that function returns 0 (success
creating an inline extent) or an error (like -ENOMEM for example) we jump
to the "free_pages_out" label and then access "pages[i]" leading to a NULL
pointer dereference, since "nr_pages" has a value greater than zero at
that point.

Fix this by setting "nr_pages" to 0 when we fail to allocate memory for
the "pages" pointer.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201119
Fixes: 771ed689d2cd ("Btrfs: Optimize compressed writeback and reads")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobtrfs: qgroup: Dirty all qgroups before rescan
Qu Wenruo [Fri, 10 Aug 2018 02:20:26 +0000 (10:20 +0800)]
btrfs: qgroup: Dirty all qgroups before rescan

commit 9c7b0c2e8dbfbcd80a71e2cbfe02704f26c185c6 upstream.

[BUG]
In the following case, rescan won't zero out the number of qgroup 1/0:

  $ mkfs.btrfs -fq $DEV
  $ mount $DEV /mnt

  $ btrfs quota enable /mnt
  $ btrfs qgroup create 1/0 /mnt
  $ btrfs sub create /mnt/sub
  $ btrfs qgroup assign 0/257 1/0 /mnt

  $ dd if=/dev/urandom of=/mnt/sub/file bs=1k count=1000
  $ btrfs sub snap /mnt/sub /mnt/snap
  $ btrfs quota rescan -w /mnt
  $ btrfs qgroup show -pcre /mnt
  qgroupid         rfer         excl     max_rfer     max_excl parent  child
  --------         ----         ----     --------     -------- ------  -----
  0/5          16.00KiB     16.00KiB         none         none ---     ---
  0/257      1016.00KiB     16.00KiB         none         none 1/0     ---
  0/258      1016.00KiB     16.00KiB         none         none ---     ---
  1/0        1016.00KiB     16.00KiB         none         none ---     0/257

So far so good, but:

  $ btrfs qgroup remove 0/257 1/0 /mnt
  WARNING: quotas may be inconsistent, rescan needed
  $ btrfs quota rescan -w /mnt
  $ btrfs qgroup show -pcre  /mnt
  qgoupid         rfer         excl     max_rfer     max_excl parent  child
  --------         ----         ----     --------     -------- ------  -----
  0/5          16.00KiB     16.00KiB         none         none ---     ---
  0/257      1016.00KiB     16.00KiB         none         none ---     ---
  0/258      1016.00KiB     16.00KiB         none         none ---     ---
  1/0        1016.00KiB     16.00KiB         none         none ---     ---
     ^^^^^^^^^^     ^^^^^^^^ not cleared

[CAUSE]
Before rescan we call qgroup_rescan_zero_tracking() to zero out all
qgroups' accounting numbers.

However we don't mark all qgroups dirty, but rely on rescan to do so.

If we have any high level qgroup without children, it won't be marked
dirty during rescan, since we cannot reach that qgroup.

This will cause QGROUP_INFO items of childless qgroups never get updated
in the quota tree, thus their numbers will stay the same in "btrfs
qgroup show" output.

[FIX]
Just mark all qgroups dirty in qgroup_rescan_zero_tracking(), so even if
we have childless qgroups, their QGROUP_INFO items will still get
updated during rescan.

Reported-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Tested-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoBtrfs: fix wrong dentries after fsync of file that got its parent replaced
Filipe Manana [Tue, 9 Oct 2018 14:05:29 +0000 (15:05 +0100)]
Btrfs: fix wrong dentries after fsync of file that got its parent replaced

commit 0f375eed92b5a407657532637ed9652611a682f5 upstream.

In a scenario like the following:

  mkdir /mnt/A               # inode 258
  mkdir /mnt/B               # inode 259
  touch /mnt/B/bar           # inode 260

  sync

  mv /mnt/B/bar /mnt/A/bar
  mv -T /mnt/A /mnt/B
  fsync /mnt/B/bar

  <power fail>

After replaying the log we end up with file bar having 2 hard links, both
with the name 'bar' and one in the directory with inode number 258 and the
other in the directory with inode number 259. Also, we end up with the
directory inode 259 still existing and with the directory inode 258 still
named as 'A', instead of 'B'. In this scenario, file 'bar' should only
have one hard link, located at directory inode 258, the directory inode
259 should not exist anymore and the name for directory inode 258 should
be 'B'.

This incorrect behaviour happens because when attempting to log the old
parents of an inode, we skip any parents that no longer exist. Fix this
by forcing a full commit if an old parent no longer exists.

A test case for fstests follows soon.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoBtrfs: fix warning when replaying log after fsync of a tmpfile
Filipe Manana [Mon, 8 Oct 2018 10:12:55 +0000 (11:12 +0100)]
Btrfs: fix warning when replaying log after fsync of a tmpfile

commit f2d72f42d5fa3bf33761d9e47201745f624fcff5 upstream.

When replaying a log which contains a tmpfile (which necessarily has a
link count of 0) we end up calling inc_nlink(), at
fs/btrfs/tree-log.c:replay_one_buffer(), which produces a warning like
the following:

  [195191.943673] WARNING: CPU: 0 PID: 6924 at fs/inode.c:342 inc_nlink+0x33/0x40
  [195191.943723] CPU: 0 PID: 6924 Comm: mount Not tainted 4.19.0-rc6-btrfs-next-38 #1
  [195191.943724] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
  [195191.943726] RIP: 0010:inc_nlink+0x33/0x40
  [195191.943728] RSP: 0018:ffffb96e425e3870 EFLAGS: 00010246
  [195191.943730] RAX: 0000000000000000 RBX: ffff8c0d1e6af4f0 RCX: 0000000000000006
  [195191.943731] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8c0d1e6af4f0
  [195191.943731] RBP: 0000000000000097 R08: 0000000000000001 R09: 0000000000000000
  [195191.943732] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb96e425e3a60
  [195191.943733] R13: ffff8c0d10cff0c8 R14: ffff8c0d0d515348 R15: ffff8c0d78a1b3f8
  [195191.943735] FS:  00007f570ee24480(0000) GS:ffff8c0dfb200000(0000) knlGS:0000000000000000
  [195191.943736] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [195191.943737] CR2: 00005593286277c8 CR3: 00000000bb8f2006 CR4: 00000000003606f0
  [195191.943739] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [195191.943740] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [195191.943741] Call Trace:
  [195191.943778]  replay_one_buffer+0x797/0x7d0 [btrfs]
  [195191.943802]  walk_up_log_tree+0x1c1/0x250 [btrfs]
  [195191.943809]  ? rcu_read_lock_sched_held+0x3f/0x70
  [195191.943825]  walk_log_tree+0xae/0x1d0 [btrfs]
  [195191.943840]  btrfs_recover_log_trees+0x1d7/0x4d0 [btrfs]
  [195191.943856]  ? replay_dir_deletes+0x280/0x280 [btrfs]
  [195191.943870]  open_ctree+0x1c3b/0x22a0 [btrfs]
  [195191.943887]  btrfs_mount_root+0x6b4/0x800 [btrfs]
  [195191.943894]  ? rcu_read_lock_sched_held+0x3f/0x70
  [195191.943899]  ? pcpu_alloc+0x55b/0x7c0
  [195191.943906]  ? mount_fs+0x3b/0x140
  [195191.943908]  mount_fs+0x3b/0x140
  [195191.943912]  ? __init_waitqueue_head+0x36/0x50
  [195191.943916]  vfs_kern_mount+0x62/0x160
  [195191.943927]  btrfs_mount+0x134/0x890 [btrfs]
  [195191.943936]  ? rcu_read_lock_sched_held+0x3f/0x70
  [195191.943938]  ? pcpu_alloc+0x55b/0x7c0
  [195191.943943]  ? mount_fs+0x3b/0x140
  [195191.943952]  ? btrfs_remount+0x570/0x570 [btrfs]
  [195191.943954]  mount_fs+0x3b/0x140
  [195191.943956]  ? __init_waitqueue_head+0x36/0x50
  [195191.943960]  vfs_kern_mount+0x62/0x160
  [195191.943963]  do_mount+0x1f9/0xd40
  [195191.943967]  ? memdup_user+0x4b/0x70
  [195191.943971]  ksys_mount+0x7e/0xd0
  [195191.943974]  __x64_sys_mount+0x21/0x30
  [195191.943977]  do_syscall_64+0x60/0x1b0
  [195191.943980]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [195191.943983] RIP: 0033:0x7f570e4e524a
  [195191.943986] RSP: 002b:00007ffd83589478 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
  [195191.943989] RAX: ffffffffffffffda RBX: 0000563f335b2060 RCX: 00007f570e4e524a
  [195191.943990] RDX: 0000563f335b2240 RSI: 0000563f335b2280 RDI: 0000563f335b2260
  [195191.943992] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000020
  [195191.943993] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 0000563f335b2260
  [195191.943994] R13: 0000563f335b2240 R14: 0000000000000000 R15: 00000000ffffffff
  [195191.944002] irq event stamp: 8688
  [195191.944010] hardirqs last  enabled at (8687): [<ffffffff9cb004c3>] console_unlock+0x503/0x640
  [195191.944012] hardirqs last disabled at (8688): [<ffffffff9ca037dd>] trace_hardirqs_off_thunk+0x1a/0x1c
  [195191.944018] softirqs last  enabled at (8638): [<ffffffff9cc0a5d1>] __set_page_dirty_nobuffers+0x101/0x150
  [195191.944020] softirqs last disabled at (8634): [<ffffffff9cc26bbe>] wb_wakeup_delayed+0x2e/0x60
  [195191.944022] ---[ end trace 5d6e873a9a0b811a ]---

This happens because the inode does not have the flag I_LINKABLE set,
which is a runtime only flag, not meant to be persisted, set when the
inode is created through open(2) if the flag O_EXCL is not passed to it.
Except for the warning, there are no other consequences (like corruptions
or metadata inconsistencies).

Since it's pointless to replay a tmpfile as it would be deleted in a
later phase of the log replay procedure (it has a link count of 0), fix
this by not logging tmpfiles and if a tmpfile is found in a log (created
by a kernel without this change), skip the replay of the inode.

A test case for fstests follows soon.

Fixes: 471d557afed1 ("Btrfs: fix loss of prealloc extents past i_size after fsync log replay")
CC: stable@vger.kernel.org # 4.18+
Reported-by: Martin Steigerwald <martin@lichtvoll.de>
Link: https://lore.kernel.org/linux-btrfs/3666619.NTnn27ZJZE@merkaba/
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobtrfs: make sure we create all new block groups
Josef Bacik [Fri, 28 Sep 2018 11:18:02 +0000 (07:18 -0400)]
btrfs: make sure we create all new block groups

commit 545e3366db823dc3342ca9d7fea803f829c9062f upstream.

Allocating new chunks modifies both the extent and chunk tree, which can
trigger new chunk allocations.  So instead of doing list_for_each_safe,
just do while (!list_empty()) so we make sure we don't exit with other
pending bg's still on our list.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobtrfs: reset max_extent_size on clear in a bitmap
Josef Bacik [Fri, 28 Sep 2018 11:18:00 +0000 (07:18 -0400)]
btrfs: reset max_extent_size on clear in a bitmap

commit 553cceb49681d60975d00892877d4c871bf220f9 upstream.

We need to clear the max_extent_size when we clear bits from a bitmap
since it could have been from the range that contains the
max_extent_size.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobtrfs: protect space cache inode alloc with GFP_NOFS
Josef Bacik [Fri, 28 Sep 2018 11:17:49 +0000 (07:17 -0400)]
btrfs: protect space cache inode alloc with GFP_NOFS

commit 84de76a2fb217dc1b6bc2965cc397d1648aa1404 upstream.

If we're allocating a new space cache inode it's likely going to be
under a transaction handle, so we need to use memalloc_nofs_save() in
order to avoid deadlocks, and more importantly lockdep messages that
make xfstests fail.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobtrfs: wait on caching when putting the bg cache
Josef Bacik [Wed, 12 Sep 2018 14:45:45 +0000 (10:45 -0400)]
btrfs: wait on caching when putting the bg cache

commit 3aa7c7a31c26321696b92841d5103461c6f3f517 upstream.

While testing my backport I noticed there was a panic if I ran
generic/416 generic/417 generic/418 all in a row.  This just happened to
uncover a race where we had outstanding IO after we destroy all of our
workqueues, and then we'd go to queue the endio work on those free'd
workqueues.

This is because we aren't waiting for the caching threads to be done
before freeing everything up, so to fix this make sure we wait on any
outstanding caching that's being done before we free up the block group,
so we're sure to be done with all IO by the time we get to
btrfs_stop_all_workers().  This fixes the panic I was seeing
consistently in testing.

------------[ cut here ]------------
kernel BUG at fs/btrfs/volumes.c:6112!
SMP PTI
Modules linked in:
CPU: 1 PID: 27165 Comm: kworker/u4:7 Not tainted 4.16.0-02155-g3553e54a578d-dirty #875
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Workqueue: btrfs-cache btrfs_cache_helper
RIP: 0010:btrfs_map_bio+0x346/0x370
RSP: 0000:ffffc900061e79d0 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff880071542e00 RCX: 0000000000533000
RDX: ffff88006bb74380 RSI: 0000000000000008 RDI: ffff880078160000
RBP: 0000000000000001 R08: ffff8800781cd200 R09: 0000000000503000
R10: ffff88006cd21200 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff8800781cd200 R15: ffff880071542e00
FS:  0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000817ffc4 CR3: 0000000078314000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 btree_submit_bio_hook+0x8a/0xd0
 submit_one_bio+0x5d/0x80
 read_extent_buffer_pages+0x18a/0x320
 btree_read_extent_buffer_pages+0xbc/0x200
 ? alloc_extent_buffer+0x359/0x3e0
 read_tree_block+0x3d/0x60
 read_block_for_search.isra.30+0x1a5/0x360
 btrfs_search_slot+0x41b/0xa10
 btrfs_next_old_leaf+0x212/0x470
 caching_thread+0x323/0x490
 normal_work_helper+0xc5/0x310
 process_one_work+0x141/0x340
 worker_thread+0x44/0x3c0
 kthread+0xf8/0x130
 ? process_one_work+0x340/0x340
 ? kthread_bind+0x10/0x10
 ret_from_fork+0x35/0x40
RIP: btrfs_map_bio+0x346/0x370 RSP: ffffc900061e79d0
---[ end trace 827eb13e50846033 ]---
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobtrfs: don't attempt to trim devices that don't support it
Jeff Mahoney [Thu, 6 Sep 2018 21:18:15 +0000 (17:18 -0400)]
btrfs: don't attempt to trim devices that don't support it

commit 0be88e367fd8fbdb45257615d691f4675dda062f upstream.

We check whether any device the file system is using supports discard in
the ioctl call, but then we attempt to trim free extents on every device
regardless of whether discard is supported.  Due to the way we mask off
EOPNOTSUPP, we can end up issuing the trim operations on each free range
on devices that don't support it, just wasting time.

Fixes: 499f377f49f08 ("btrfs: iterate over unused chunk space in FITRIM")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobtrfs: iterate all devices during trim, instead of fs_devices::alloc_list
Jeff Mahoney [Thu, 6 Sep 2018 21:18:14 +0000 (17:18 -0400)]
btrfs: iterate all devices during trim, instead of fs_devices::alloc_list

commit d4e329de5e5e21594df2e0dd59da9acee71f133b upstream.

btrfs_trim_fs iterates over the fs_devices->alloc_list while holding the
device_list_mutex.  The problem is that ->alloc_list is protected by the
chunk mutex.  We don't want to hold the chunk mutex over the trim of the
entire file system.  Fortunately, the ->dev_list list is protected by
the dev_list mutex and while it will give us all devices, including
read-only devices, we already just skip the read-only devices.  Then we
can continue to take and release the chunk mutex while scanning each
device.

Fixes: 499f377f49f ("btrfs: iterate over unused chunk space in FITRIM")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobtrfs: Ensure btrfs_trim_fs can trim the whole filesystem
Qu Wenruo [Fri, 7 Sep 2018 06:16:24 +0000 (14:16 +0800)]
btrfs: Ensure btrfs_trim_fs can trim the whole filesystem

commit 6ba9fc8e628becf0e3ec94083450d089b0dec5f5 upstream.

[BUG]
fstrim on some btrfs only trims the unallocated space, not trimming any
space in existing block groups.

[CAUSE]
Before fstrim_range passed to btrfs_trim_fs(), it gets truncated to
range [0, super->total_bytes).  So later btrfs_trim_fs() will only be
able to trim block groups in range [0, super->total_bytes).

While for btrfs, any bytenr aligned to sectorsize is valid, since btrfs
uses its logical address space, there is nothing limiting the location
where we put block groups.

For filesystem with frequent balance, it's quite easy to relocate all
block groups and bytenr of block groups will start beyond
super->total_bytes.

In that case, btrfs will not trim existing block groups.

[FIX]
Just remove the truncation in btrfs_ioctl_fitrim(), so btrfs_trim_fs()
can get the unmodified range, which is normally set to [0, U64_MAX].

Reported-by: Chris Murphy <lists@colorremedies.com>
Fixes: f4c697e6406d ("btrfs: return EINVAL if start > total_bytes in fitrim ioctl")
CC: <stable@vger.kernel.org> # v4.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobtrfs: Enhance btrfs_trim_fs function to handle error better
Qu Wenruo [Fri, 7 Sep 2018 06:16:23 +0000 (14:16 +0800)]
btrfs: Enhance btrfs_trim_fs function to handle error better

commit 93bba24d4b5ad1e5cd8b43f64e66ff9d6355dd20 upstream.

Function btrfs_trim_fs() doesn't handle errors in a consistent way. If
error happens when trimming existing block groups, it will skip the
remaining blocks and continue to trim unallocated space for each device.

The return value will only reflect the final error from device trimming.

This patch will fix such behavior by:

1) Recording the last error from block group or device trimming
   The return value will also reflect the last error during trimming.
   Make developer more aware of the problem.

2) Continuing trimming if possible
   If we failed to trim one block group or device, we could still try
   the next block group or device.

3) Report number of failures during block group and device trimming
   It would be less noisy, but still gives user a brief summary of
   what's going wrong.

Such behavior can avoid confusion for cases like failure to trim the
first block group and then only unallocated space is trimmed.

Reported-by: Chris Murphy <lists@colorremedies.com>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add bg_ret and dev_ret to the messages ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobtrfs: fix error handling in free_log_tree
Jeff Mahoney [Thu, 6 Sep 2018 20:59:33 +0000 (16:59 -0400)]
btrfs: fix error handling in free_log_tree

commit 374b0e2d6ba5da7fd1cadb3247731ff27d011f6f upstream.

When we hit an I/O error in free_log_tree->walk_log_tree during file system
shutdown we can crash due to there not being a valid transaction handle.

Use btrfs_handle_fs_error when there's no transaction handle to use.

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
  IP: free_log_tree+0xd2/0x140 [btrfs]
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
  Modules linked in: <modules>
  CPU: 2 PID: 23544 Comm: umount Tainted: G        W        4.12.14-kvmsmall #9 SLE15 (unreleased)
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
  task: ffff96bfd3478880 task.stack: ffffa7cf40d78000
  RIP: 0010:free_log_tree+0xd2/0x140 [btrfs]
  RSP: 0018:ffffa7cf40d7bd10 EFLAGS: 00010282
  RAX: 00000000fffffffb RBX: 00000000fffffffb RCX: 0000000000000002
  RDX: 0000000000000000 RSI: ffff96c02f07d4c8 RDI: 0000000000000282
  RBP: ffff96c013cf1000 R08: ffff96c02f07d4c8 R09: ffff96c02f07d4d0
  R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000000000
  R13: ffff96c005e800c0 R14: ffffa7cf40d7bdb8 R15: 0000000000000000
  FS:  00007f17856bcfc0(0000) GS:ffff96c03f600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000060 CR3: 0000000045ed6002 CR4: 00000000003606e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   ? wait_for_writer+0xb0/0xb0 [btrfs]
   btrfs_free_log+0x17/0x30 [btrfs]
   btrfs_drop_and_free_fs_root+0x9a/0xe0 [btrfs]
   btrfs_free_fs_roots+0xc0/0x130 [btrfs]
   ? wait_for_completion+0xf2/0x100
   close_ctree+0xea/0x2e0 [btrfs]
   ? kthread_stop+0x161/0x260
   generic_shutdown_super+0x6c/0x120
   kill_anon_super+0xe/0x20
   btrfs_kill_super+0x13/0x100 [btrfs]
   deactivate_locked_super+0x3f/0x70
   cleanup_mnt+0x3b/0x70
   task_work_run+0x78/0x90
   exit_to_usermode_loop+0x77/0xa6
   do_syscall_64+0x1c5/0x1e0
   entry_SYSCALL_64_after_hwframe+0x42/0xb7
  RIP: 0033:0x7f1784f90827
  RSP: 002b:00007ffdeeb03118 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
  RAX: 0000000000000000 RBX: 0000556a60c62970 RCX: 00007f1784f90827
  RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000556a60c62b50
  RBP: 0000000000000000 R08: 0000000000000005 R09: 00000000ffffffff
  R10: 0000556a60c63900 R11: 0000000000000246 R12: 0000556a60c62b50
  R13: 00007f17854a81c4 R14: 0000000000000000 R15: 0000000000000000
  RIP: free_log_tree+0xd2/0x140 [btrfs] RSP: ffffa7cf40d7bd10
  CR2: 0000000000000060

Fixes: 681ae50917df9 ("Btrfs: cleanup reserved space when freeing tree log on error")
CC: <stable@vger.kernel.org> # v3.13
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobtrfs: locking: Add extra check in btrfs_init_new_buffer() to avoid deadlock
Qu Wenruo [Tue, 21 Aug 2018 01:53:47 +0000 (09:53 +0800)]
btrfs: locking: Add extra check in btrfs_init_new_buffer() to avoid deadlock

commit b72c3aba09a53fc7c1824250d71180ca154517a7 upstream.

[BUG]
For certain crafted image, whose csum root leaf has missing backref, if
we try to trigger write with data csum, it could cause deadlock with the
following kernel WARN_ON():

  WARNING: CPU: 1 PID: 41 at fs/btrfs/locking.c:230 btrfs_tree_lock+0x3e2/0x400
  CPU: 1 PID: 41 Comm: kworker/u4:1 Not tainted 4.18.0-rc1+ #8
  Workqueue: btrfs-endio-write btrfs_endio_write_helper
  RIP: 0010:btrfs_tree_lock+0x3e2/0x400
  Call Trace:
   btrfs_alloc_tree_block+0x39f/0x770
   __btrfs_cow_block+0x285/0x9e0
   btrfs_cow_block+0x191/0x2e0
   btrfs_search_slot+0x492/0x1160
   btrfs_lookup_csum+0xec/0x280
   btrfs_csum_file_blocks+0x2be/0xa60
   add_pending_csums+0xaf/0xf0
   btrfs_finish_ordered_io+0x74b/0xc90
   finish_ordered_fn+0x15/0x20
   normal_work_helper+0xf6/0x500
   btrfs_endio_write_helper+0x12/0x20
   process_one_work+0x302/0x770
   worker_thread+0x81/0x6d0
   kthread+0x180/0x1d0
   ret_from_fork+0x35/0x40

[CAUSE]
That crafted image has missing backref for csum tree root leaf.  And
when we try to allocate new tree block, since there is no
EXTENT/METADATA_ITEM for csum tree root, btrfs consider it's free slot
and use it.

The extent tree of the image looks like:

  Normal image                      |       This fuzzed image
  ----------------------------------+--------------------------------
  BG 29360128                       | BG 29360128
   One empty slot                   |  One empty slot
  29364224: backref to UUID tree    | 29364224: backref to UUID tree
   Two empty slots                  |  Two empty slots
  29376512: backref to CSUM tree    |  One empty slot (bad type) <<<
  29380608: backref to D_RELOC tree | 29380608: backref to D_RELOC tree
  ...                               | ...

Since bytenr 29376512 has no METADATA/EXTENT_ITEM, when btrfs try to
alloc tree block, it's an valid slot for btrfs.

And for finish_ordered_write, when we need to insert csum, we try to CoW
csum tree root.

By accident, empty slots at bytenr BG_OFFSET, BG_OFFSET + 8K,
BG_OFFSET + 12K is already used by tree block COW for other trees, the
next empty slot is BG_OFFSET + 16K, which should be the backref for CSUM
tree.

But due to the bad type, btrfs can recognize it and still consider it as
an empty slot, and will try to use it for csum tree CoW.

Then in the following call trace, we will try to lock the new tree
block, which turns out to be the old csum tree root which is already
locked:

btrfs_search_slot() called on csum tree root, which is at 29376512
|- btrfs_cow_block()
   |- btrfs_set_lock_block()
   |  |- Now locks tree block 29376512 (old csum tree root)
   |- __btrfs_cow_block()
      |- btrfs_alloc_tree_block()
         |- btrfs_reserve_extent()
            | Now it returns tree block 29376512, which extent tree
            | shows its empty slot, but it's already hold by csum tree
            |- btrfs_init_new_buffer()
               |- btrfs_tree_lock()
                  | Triggers WARN_ON(eb->lock_owner == current->pid)
                  |- wait_event()
                     Wait lock owner to release the lock, but it's
                     locked by ourself, so it will deadlock

[FIX]
This patch will do the lock_owner and current->pid check at
btrfs_init_new_buffer().
So above deadlock can be avoided.

Since such problem can only happen in crafted image, we will still
trigger kernel warning for later aborted transaction, but with a little
more meaningful warning message.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200405
Reported-by: Xu Wen <wen.xu@gatech.edu>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobtrfs: Handle owner mismatch gracefully when walking up tree
Qu Wenruo [Tue, 21 Aug 2018 01:42:03 +0000 (09:42 +0800)]
btrfs: Handle owner mismatch gracefully when walking up tree

commit 65c6e82becec33731f48786e5a30f98662c86b16 upstream.

[BUG]
When mounting certain crafted image, btrfs will trigger kernel BUG_ON()
when trying to recover balance:

  kernel BUG at fs/btrfs/extent-tree.c:8956!
  invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 1 PID: 662 Comm: mount Not tainted 4.18.0-rc1-custom+ #10
  RIP: 0010:walk_up_proc+0x336/0x480 [btrfs]
  RSP: 0018:ffffb53540c9b890 EFLAGS: 00010202
  Call Trace:
   walk_up_tree+0x172/0x1f0 [btrfs]
   btrfs_drop_snapshot+0x3a4/0x830 [btrfs]
   merge_reloc_roots+0xe1/0x1d0 [btrfs]
   btrfs_recover_relocation+0x3ea/0x420 [btrfs]
   open_ctree+0x1af3/0x1dd0 [btrfs]
   btrfs_mount_root+0x66b/0x740 [btrfs]
   mount_fs+0x3b/0x16a
   vfs_kern_mount.part.9+0x54/0x140
   btrfs_mount+0x16d/0x890 [btrfs]
   mount_fs+0x3b/0x16a
   vfs_kern_mount.part.9+0x54/0x140
   do_mount+0x1fd/0xda0
   ksys_mount+0xba/0xd0
   __x64_sys_mount+0x21/0x30
   do_syscall_64+0x60/0x210
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

[CAUSE]
Extent tree corruption.  In this particular case, reloc tree root's
owner is DATA_RELOC_TREE (should be TREE_RELOC), thus its backref is
corrupted and we failed the owner check in walk_up_tree().

[FIX]
It's pretty hard to take care of every extent tree corruption, but at
least we can remove such BUG_ON() and exit more gracefully.

And since in this particular image, DATA_RELOC_TREE and TREE_RELOC share
the same root (which is obviously invalid), we needs to make
__del_reloc_root() more robust to detect such invalid sharing to avoid
possible NULL dereference as root->node can be NULL in this case.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200411
Reported-by: Xu Wen <wen.xu@gatech.edu>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobtrfs: qgroup: Avoid calling qgroup functions if qgroup is not enabled
Qu Wenruo [Tue, 9 Oct 2018 06:36:45 +0000 (14:36 +0800)]
btrfs: qgroup: Avoid calling qgroup functions if qgroup is not enabled

commit 3628b4ca64f24a4ec55055597d0cb1c814729f8b upstream.

Some qgroup trace events like btrfs_qgroup_release_data() and
btrfs_qgroup_free_delayed_ref() can still be triggered even if qgroup is
not enabled.

This is caused by the lack of qgroup status check before calling some
qgroup functions.  Thankfully the functions can handle quota disabled
case well and just do nothing for qgroup disabled case.

This patch will do earlier check before triggering related trace events.

And for enabled <-> disabled race case:

1) For enabled->disabled case
   Disable will wipe out all qgroups data including reservation and
   excl/rfer. Even if we leak some reservation or numbers, it will
   still be cleared, so nothing will go wrong.

2) For disabled -> enabled case
   Current btrfs_qgroup_release_data() will use extent_io tree to ensure
   we won't underflow reservation. And for delayed_ref we use
   head->qgroup_reserved to record the reserved space, so in that case
   head->qgroup_reserved should be 0 and we won't underflow.

CC: stable@vger.kernel.org # 4.14+
Reported-by: Chris Murphy <lists@colorremedies.com>
Link: https://lore.kernel.org/linux-btrfs/CAJCQCtQau7DtuUUeycCkZ36qjbKuxNzsgqJ7+sJ6W0dK_NLE3w@mail.gmail.com/
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoselftests/powerpc: Fix ptrace tm failure
Breno Leitao [Mon, 22 Oct 2018 21:16:26 +0000 (18:16 -0300)]
selftests/powerpc: Fix ptrace tm failure

commit 48dc0ef19044bfb69193302fbe3a834e3331b7ae upstream.

Test ptrace-tm-spd-gpr fails on current kernel (4.19) due to a segmentation
fault that happens on the child process prior to setting cptr[2] = 1. This
causes the parent process to wait forever at 'while (!pptr[2])' and the test to
be killed by the test harness framework by timeout, thus, failing.

The segmentation fault happens because of a inline assembly being
generated as:

0x10000355c <tm_spd_gpr+492>    lfs    f0, 0(0)

This is reading memory position 0x0 and causing the segmentation fault.

This code is being generated by ASM_LOAD_FPR_SINGLE_PRECISION(flt_4), where
flt_4 is passed to the inline assembly block as:

[flt_4] "r" (&d)

Since the inline assembly 'r' constraint means any GPR, gpr0 is being
chosen, thus causing this issue when issuing a Load Floating-Point Single
instruction.

This patch simply changes the constraint to 'b', which specify that this
register will be used as base, and r0 is not allowed to be used, avoiding
this issue.

Other than that, removing flt_2 register from the input operands, since it
is not used by the inline assembly code at all.

Cc: stable@vger.kernel.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agosoc/tegra: pmc: Fix child-node lookup
Johan Hovold [Wed, 15 Nov 2017 09:44:58 +0000 (10:44 +0100)]
soc/tegra: pmc: Fix child-node lookup

commit 1dc6bd5e39a29453bdcc17348dd2a89f1aa4004e upstream.

Fix child-node lookup during probe, which ended up searching the whole
device tree depth-first starting at the parent rather than just matching
on its children.

To make things worse, the parent pmc node could end up being prematurely
freed as of_find_node_by_name() drops a reference to its first argument.

Fixes: 3568df3d31d6 ("soc: tegra: Add thermal reset (thermtrip) support to PMC")
Cc: stable <stable@vger.kernel.org> # 4.0
Cc: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoarm64: dts: stratix10: Correct System Manager register size
Thor Thayer [Tue, 25 Sep 2018 15:31:52 +0000 (10:31 -0500)]
arm64: dts: stratix10: Correct System Manager register size

commit 74121b9aa3cd571ddfff014a9f47db36cae3cda9 upstream.

Correct the register size of the System Manager node.

Cc: stable@vger.kernel.org
Fixes: 78cd6a9d8e154 ("arm64: dts: Add base stratix 10 dtsi")
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoARM: dts: socfpga: Fix SDRAM node address for Arria10
Thor Thayer [Tue, 25 Sep 2018 15:21:10 +0000 (10:21 -0500)]
ARM: dts: socfpga: Fix SDRAM node address for Arria10

commit ce3bf934f919a7d675c5b7fa4cc233ded9c6256e upstream.

The address in the SDRAM node was incorrect. Fix this to agree with the
correct address and to match the reg definition block.

Cc: stable@vger.kernel.org
Fixes: 54b4a8f57848b("arm: socfpga: dts: Add Arria10 SDRAM EDAC DTS support")
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoCramfs: fix abad comparison when wrap-arounds occur
Nicolas Pitre [Tue, 30 Oct 2018 17:26:15 +0000 (13:26 -0400)]
Cramfs: fix abad comparison when wrap-arounds occur

commit 672ca9dd13f1aca0c17516f76fc5b0e8344b3e46 upstream.

It is possible for corrupted filesystem images to produce very large
block offsets that may wrap when a length is added, and wrongly pass
the buffer size test.

Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agorpmsg: smd: fix memory leak on channel create
Colin Ian King [Thu, 27 Sep 2018 21:36:27 +0000 (22:36 +0100)]
rpmsg: smd: fix memory leak on channel create

commit 940c620d6af8fca7d115de40f19870fba415efac upstream.

Currently a failed allocation of channel->name leads to an
immediate return without freeing channel. Fix this by setting
ret to -ENOMEM and jumping to an exit path that kfree's channel.

Detected by CoverityScan, CID#1473692 ("Resource Leak")

Fixes: 53e2822e56c7 ("rpmsg: Introduce Qualcomm SMD backend")
Cc: stable@vger.kernel.org
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoarm64: lse: remove -fcall-used-x0 flag
Tri Vo [Wed, 19 Sep 2018 19:27:50 +0000 (12:27 -0700)]
arm64: lse: remove -fcall-used-x0 flag

commit 2a6c7c367de82951c98a290a21156770f6f82c84 upstream.

x0 is not callee-saved in the PCS. So there is no need to specify
-fcall-used-x0.

Clang doesn't currently support -fcall-used flags. This patch will help
building the kernel with clang.

Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Tri Vo <trong@android.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agomedia: media colorspaces*.rst: rename AdobeRGB to opRGB
Hans Verkuil [Thu, 13 Sep 2018 11:47:28 +0000 (07:47 -0400)]
media: media colorspaces*.rst: rename AdobeRGB to opRGB

commit a58c37978cf02f6d35d05ee4e9288cb8455f1401 upstream.

Drop all Adobe references and use the official opRGB standard
instead.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Cc: stable@vger.kernel.org
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agomedia: em28xx: make v4l2-compliance happier by starting sequence on zero
Mauro Carvalho Chehab [Fri, 14 Sep 2018 02:46:29 +0000 (22:46 -0400)]
media: em28xx: make v4l2-compliance happier by starting sequence on zero

commit afeaade90db4c5dab93f326d9582be1d5954a198 upstream.

The v4l2-compliance tool complains if a video doesn't start
with a zero sequence number.

While this shouldn't cause any real problem for apps, let's
make it happier, in order to better check the v4l2-compliance
differences before and after patchsets.

This is actually an old issue. It is there since at least its
videobuf2 conversion, e. g. changeset 3829fadc461 ("[media]
em28xx: convert to videobuf2"), if VB1 wouldn't suffer from
the same issue.

Cc: stable@vger.kernel.org
Fixes: d3829fadc461 ("[media] em28xx: convert to videobuf2")
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agomedia: em28xx: fix input name for Terratec AV 350
Mauro Carvalho Chehab [Fri, 14 Sep 2018 04:20:21 +0000 (00:20 -0400)]
media: em28xx: fix input name for Terratec AV 350

commit 15644bfa195bd166d0a5ed76ae2d587f719c3dac upstream.

Instead of using a register value, use an AMUX name, as otherwise
VIDIOC_G_AUDIO would fail.

Cc: stable@vger.kernel.org
Fixes: 766ed64de554 ("V4L/DVB (11827): Add support for Terratec Grabster AV350")
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agomedia: tvp5150: avoid going past array on v4l2_querymenu()
Mauro Carvalho Chehab [Thu, 13 Sep 2018 20:49:51 +0000 (16:49 -0400)]
media: tvp5150: avoid going past array on v4l2_querymenu()

commit 5c4c4505b716cb782ad7263091edc466c4d1fbd4 upstream.

The parameters of v4l2_ctrl_new_std_menu_items() are tricky: instead of
the number of possible values, it requires the number of the maximum
value. In other words, the ARRAY_SIZE() value should be decremented,
otherwise it will go past the array bounds, as warned by KASAN:

[  279.839688] BUG: KASAN: global-out-of-bounds in v4l2_querymenu+0x10d/0x180 [videodev]
[  279.839709] Read of size 8 at addr ffffffffc10a4cb0 by task v4l2-compliance/16676

[  279.839736] CPU: 1 PID: 16676 Comm: v4l2-compliance Not tainted 4.18.0-rc2+ #120
[  279.839741] Hardware name:  /NUC5i7RYB, BIOS RYBDWi35.86A.0364.2017.0511.0949 05/11/2017
[  279.839743] Call Trace:
[  279.839758]  dump_stack+0x71/0xab
[  279.839807]  ? v4l2_querymenu+0x10d/0x180 [videodev]
[  279.839817]  print_address_description+0x1c9/0x270
[  279.839863]  ? v4l2_querymenu+0x10d/0x180 [videodev]
[  279.839871]  kasan_report+0x237/0x360
[  279.839918]  v4l2_querymenu+0x10d/0x180 [videodev]
[  279.839964]  __video_do_ioctl+0x2c8/0x590 [videodev]
[  279.840011]  ? copy_overflow+0x20/0x20 [videodev]
[  279.840020]  ? avc_ss_reset+0xa0/0xa0
[  279.840028]  ? check_stack_object+0x21/0x60
[  279.840036]  ? __check_object_size+0xe7/0x240
[  279.840080]  video_usercopy+0xed/0x730 [videodev]
[  279.840123]  ? copy_overflow+0x20/0x20 [videodev]
[  279.840167]  ? v4l_enumstd+0x40/0x40 [videodev]
[  279.840177]  ? __handle_mm_fault+0x9f9/0x1ba0
[  279.840186]  ? __pmd_alloc+0x2c0/0x2c0
[  279.840193]  ? __vfs_write+0xb6/0x350
[  279.840200]  ? kernel_read+0xa0/0xa0
[  279.840244]  ? video_usercopy+0x730/0x730 [videodev]
[  279.840284]  v4l2_ioctl+0xa1/0xb0 [videodev]
[  279.840295]  do_vfs_ioctl+0x117/0x8a0
[  279.840303]  ? selinux_file_ioctl+0x211/0x2f0
[  279.840313]  ? ioctl_preallocate+0x120/0x120
[  279.840319]  ? selinux_capable+0x20/0x20
[  279.840332]  ksys_ioctl+0x70/0x80
[  279.840342]  __x64_sys_ioctl+0x3d/0x50
[  279.840351]  do_syscall_64+0x6d/0x1c0
[  279.840361]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  279.840367] RIP: 0033:0x7fdfb46275d7
[  279.840369] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
[  279.840474] RSP: 002b:00007ffee1179038 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[  279.840483] RAX: ffffffffffffffda RBX: 00007ffee1179180 RCX: 00007fdfb46275d7
[  279.840488] RDX: 00007ffee11790c0 RSI: 00000000c02c5625 RDI: 0000000000000003
[  279.840493] RBP: 0000000000000002 R08: 0000000000000020 R09: 00000000009f0902
[  279.840497] R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffee117a5a0
[  279.840501] R13: 00007ffee11790c0 R14: 0000000000000002 R15: 0000000000000000

[  279.840515] The buggy address belongs to the variable:
[  279.840535]  tvp5150_test_patterns+0x10/0xffffffffffffe360 [tvp5150]

Fixes: c43875f66140 ("[media] tvp5150: replace MEDIA_ENT_F_CONN_TEST by a control")
Cc: stable@vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agomedia: em28xx: use a default format if TRY_FMT fails
Mauro Carvalho Chehab [Fri, 14 Sep 2018 03:22:40 +0000 (23:22 -0400)]
media: em28xx: use a default format if TRY_FMT fails

commit f823ce2a1202d47110a7ef86b65839f0be8adc38 upstream.

Follow the V4L2 spec, as warned by v4l2-compliance:

warn: v4l2-test-formats.cpp(732): TRY_FMT cannot handle an invalid pixelformat.
warn: v4l2-test-formats.cpp(733): This may or may not be a problem. For more information see:

warn: v4l2-test-formats.cpp(734): http://www.mail-archive.com/linux-media@vger.kernel.org/msg56550.html

Cc: stable@vger.kernel.org
Fixes: bddcf63313c6 ("V4L/DVB (9927): em28xx: use a more standard way to specify video formats")
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoxen-blkfront: fix kernel panic with negotiate_mq error path
Manjunath Patil [Tue, 30 Oct 2018 16:49:21 +0000 (09:49 -0700)]
xen-blkfront: fix kernel panic with negotiate_mq error path

commit 6cc4a0863c9709c512280c64e698d68443ac8053 upstream.

info->nr_rings isn't adjusted in case of ENOMEM error from
negotiate_mq(). This leads to kernel panic in error path.

Typical call stack involving panic -
 #8 page_fault at ffffffff8175936f
    [exception RIP: blkif_free_ring+33]
    RIP: ffffffffa0149491  RSP: ffff8804f7673c08  RFLAGS: 00010292
 ...
 #9 blkif_free at ffffffffa0149aaa [xen_blkfront]
 #10 talk_to_blkback at ffffffffa014c8cd [xen_blkfront]
 #11 blkback_changed at ffffffffa014ea8b [xen_blkfront]
 #12 xenbus_otherend_changed at ffffffff81424670
 #13 backend_changed at ffffffff81426dc3
 #14 xenwatch_thread at ffffffff81422f29
 #15 kthread at ffffffff810abe6a
 #16 ret_from_fork at ffffffff81754078

Cc: stable@vger.kernel.org
Fixes: 7ed8ce1c5fc7 ("xen-blkfront: move negotiate_mq to cover all cases of new VBDs")
Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoxen: fix xen_qlock_wait()
Juergen Gross [Thu, 8 Nov 2018 07:35:06 +0000 (08:35 +0100)]
xen: fix xen_qlock_wait()

commit d3132b3860f6cf35ff7609a76bbcdbb814bd027c upstream.

Commit a856531951dc80 ("xen: make xen_qlock_wait() nestable")
introduced a regression for Xen guests running fully virtualized
(HVM or PVH mode). The Xen hypervisor wouldn't return from the poll
hypercall with interrupts disabled in case of an interrupt (for PV
guests it does).

So instead of disabling interrupts in xen_qlock_wait() use a nesting
counter to avoid calling xen_clear_irq_pending() in case
xen_qlock_wait() is nested.

Fixes: a856531951dc80 ("xen: make xen_qlock_wait() nestable")
Cc: stable@vger.kernel.org
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agokgdboc: Passing ekgdboc to command line causes panic
He Zhe [Fri, 17 Aug 2018 14:42:28 +0000 (22:42 +0800)]
kgdboc: Passing ekgdboc to command line causes panic

commit 1bd54d851f50dea6af30c3e6ff4f3e9aab5558f9 upstream.

kgdboc_option_setup does not check input argument before passing it
to strlen. The argument would be a NULL pointer if "ekgdboc", without
its value, is set in command line and thus cause the following panic.

PANIC: early exception 0xe3 IP 10:ffffffff8fbbb620 error 0 cr2 0x0
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.18-rc8+ #1
[    0.000000] RIP: 0010:strlen+0x0/0x20
...
[    0.000000] Call Trace
[    0.000000]  ? kgdboc_option_setup+0x9/0xa0
[    0.000000]  ? kgdboc_early_init+0x6/0x1b
[    0.000000]  ? do_early_param+0x4d/0x82
[    0.000000]  ? parse_args+0x212/0x330
[    0.000000]  ? rdinit_setup+0x26/0x26
[    0.000000]  ? parse_early_options+0x20/0x23
[    0.000000]  ? rdinit_setup+0x26/0x26
[    0.000000]  ? parse_early_param+0x2d/0x39
[    0.000000]  ? setup_arch+0x2f7/0xbf4
[    0.000000]  ? start_kernel+0x5e/0x4c2
[    0.000000]  ? load_ucode_bsp+0x113/0x12f
[    0.000000]  ? secondary_startup_64+0xa5/0xb0

This patch adds a check to prevent the panic.

Cc: stable@vger.kernel.org
Cc: jason.wessel@windriver.com
Cc: gregkh@linuxfoundation.org
Cc: jslaby@suse.com
Signed-off-by: He Zhe <zhe.he@windriver.com>
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agomedia: v4l2-tpg: fix kernel oops when enabling HFLIP and OSD
Hans Verkuil [Mon, 8 Oct 2018 19:08:27 +0000 (15:08 -0400)]
media: v4l2-tpg: fix kernel oops when enabling HFLIP and OSD

commit 250854eed5d45a73d81e4137dfd85180af6f2ec3 upstream.

When the OSD is on (i.e. vivid displays text on top of the test pattern), and
you enable hflip, then the driver crashes.

The cause turned out to be a division of a negative number by an unsigned value.
You expect that -8 / 2U would be -4, but in reality it is 2147483644 :-(

Fixes: 3e14e7a82c1ef ("vivid-tpg: add hor/vert downsampling support to tpg_gen_text")
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Reported-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: <stable@vger.kernel.org> # for v4.1 and up
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoTC: Set DMA masks for devices
Maciej W. Rozycki [Wed, 3 Oct 2018 12:21:07 +0000 (13:21 +0100)]
TC: Set DMA masks for devices

commit 3f2aa244ee1a0d17ed5b6c86564d2c1b24d1c96b upstream.

Fix a TURBOchannel support regression with commit 205e1b7f51e4
("dma-mapping: warn when there is no coherent_dma_mask") that caused
coherent DMA allocations to produce a warning such as:

defxx: v1.11 2014/07/01  Lawrence V. Stefani and others
tc1: DEFTA at MMIO addr = 0x1e900000, IRQ = 20, Hardware addr = 08-00-2b-a3-a3-29
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at ./include/linux/dma-mapping.h:516 dfx_dev_register+0x670/0x678
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Not tainted 4.19.0-rc6 #2
Stack : ffffffff8009ffc0 fffffffffffffec0 0000000000000000 ffffffff80647650
        0000000000000000 0000000000000000 ffffffff806f5f80 ffffffffffffffff
        0000000000000000 0000000000000000 0000000000000001 ffffffff8065d4e8
        98000000031b6300 ffffffff80563478 ffffffff805685b0 ffffffffffffffff
        0000000000000000 ffffffff805d6720 0000000000000204 ffffffff80388df8
        0000000000000000 0000000000000009 ffffffff8053efd0 ffffffff806657d0
        0000000000000000 ffffffff803177f8 0000000000000000 ffffffff806d0000
        9800000003078000 980000000307b9e0 000000001e900000 ffffffff80067940
        0000000000000000 ffffffff805d6720 0000000000000204 ffffffff80388df8
        ffffffff805176c0 ffffffff8004dc78 0000000000000000 ffffffff80067940
        ...
Call Trace:
[<ffffffff8004dc78>] show_stack+0xa0/0x130
[<ffffffff80067940>] __warn+0x128/0x170
---[ end trace b1d1e094f67f3bb2 ]---

This is because the TURBOchannel bus driver fails to set the coherent
DMA mask for devices enumerated.

Set the regular and coherent DMA masks for TURBOchannel devices then,
observing that the bus protocol supports a 34-bit (16GiB) DMA address
space, by interpreting the value presented in the address cycle across
the 32 `ad' lines as a 32-bit word rather than byte address[1].  The
architectural size of the TURBOchannel DMA address space exceeds the
maximum amount of RAM any actual TURBOchannel system in existence may
have, hence both masks are the same.

This removes the warning shown above.

References:

[1] "TURBOchannel Hardware Specification", EK-369AA-OD-007B, Digital
    Equipment Corporation, January 1993, Section "DMA", pp. 1-15 -- 1-17

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/20835/
Fixes: 205e1b7f51e4 ("dma-mapping: warn when there is no coherent_dma_mask")
Cc: stable@vger.kernel.org # 4.16+
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoiommu/arm-smmu: Ensure that page-table updates are visible before TLBI
Will Deacon [Mon, 1 Oct 2018 11:42:49 +0000 (12:42 +0100)]
iommu/arm-smmu: Ensure that page-table updates are visible before TLBI

commit 7d321bd3542500caf125249f44dc37cb4e738013 upstream.

The IO-pgtable code relies on the driver TLB invalidation callbacks to
ensure that all page-table updates are visible to the IOMMU page-table
walker.

In the case that the page-table walker is cache-coherent, we cannot rely
on an implicit DSB from the DMA-mapping code, so we must ensure that we
execute a DSB in our tlb_add_flush() callback prior to triggering the
invalidation.

Cc: <stable@vger.kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Fixes: 2df7a25ce4a7 ("iommu/arm-smmu: Clean up DMA API usage")
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoMIPS: OCTEON: fix out of bounds array access on CN68XX
Aaro Koskinen [Fri, 26 Oct 2018 22:46:34 +0000 (01:46 +0300)]
MIPS: OCTEON: fix out of bounds array access on CN68XX

commit c0fae7e2452b90c31edd2d25eb3baf0c76b400ca upstream.

The maximum number of interfaces is returned by
cvmx_helper_get_number_of_interfaces(), and the value is used to access
interface_port_count[]. When CN68XX support was added, we forgot
to increase the array size. Fix that.

Fixes: 2c8c3f0201333 ("MIPS: Octeon: Support additional interfaces on CN68XX")
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/20949/
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org # v4.3+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agopowerpc/msi: Fix compile error on mpc83xx
Christophe Leroy [Fri, 19 Oct 2018 06:12:50 +0000 (06:12 +0000)]
powerpc/msi: Fix compile error on mpc83xx

commit 0f99153def98134403c9149128e59d3e1786cf04 upstream.

mpic_get_primary_version() is not defined when not using MPIC.
The compile error log like:

arch/powerpc/sysdev/built-in.o: In function `fsl_of_msi_probe':
fsl_msi.c:(.text+0x150c): undefined reference to `fsl_mpic_primary_get_version'

Signed-off-by: Jia Hongtao <hongtao.jia@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Reported-by: Radu Rendec <radu.rendec@gmail.com>
Fixes: 807d38b73b6 ("powerpc/mpic: Add get_version API both for internal and external use")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agodm zoned: fix various dmz_get_mblock() issues
Damien Le Moal [Wed, 17 Oct 2018 09:05:08 +0000 (18:05 +0900)]
dm zoned: fix various dmz_get_mblock() issues

commit 3d4e738311327bc4ba1d55fbe2f1da3de4a475f9 upstream.

dmz_fetch_mblock() called from dmz_get_mblock() has a race since the
allocation of the new metadata block descriptor and its insertion in
the cache rbtree with the READING state is not atomic. Two different
contexts requesting the same block may end up each adding two different
descriptors of the same block to the cache.

Another problem for this function is that the BIO for processing the
block read is allocated after the metadata block descriptor is inserted
in the cache rbtree. If the BIO allocation fails, the metadata block
descriptor is freed without first being removed from the rbtree.

Fix the first problem by checking again if the requested block is not in
the cache right before inserting the newly allocated descriptor,
atomically under the mblk_lock spinlock. The second problem is fixed by
simply allocating the BIO before inserting the new block in the cache.

Finally, since dmz_fetch_mblock() also increments a block reference
counter, rename the function to dmz_get_mblock_slow(). To be symmetric
and clear, also rename dmz_lookup_mblock() to dmz_get_mblock_fast() and
increment the block reference counter directly in that function rather
than in dmz_get_mblock().

Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agodm zoned: fix metadata block ref counting
Damien Le Moal [Wed, 17 Oct 2018 09:05:07 +0000 (18:05 +0900)]
dm zoned: fix metadata block ref counting

commit 33c2865f8d011a2ca9f67124ddab9dc89382e9f1 upstream.

Since the ref field of struct dmz_mblock is always used with the
spinlock of struct dmz_metadata locked, there is no need to use an
atomic_t type. Change the type of the ref field to an unsigne
integer.

Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agodm ioctl: harden copy_params()'s copy_from_user() from malicious users
Wenwen Wang [Wed, 3 Oct 2018 16:43:59 +0000 (11:43 -0500)]
dm ioctl: harden copy_params()'s copy_from_user() from malicious users

commit 800a7340ab7dd667edf95e74d8e4f23a17e87076 upstream.

In copy_params(), the struct 'dm_ioctl' is first copied from the user
space buffer 'user' to 'param_kernel' and the field 'data_size' is
checked against 'minimum_data_size' (size of 'struct dm_ioctl' payload
up to its 'data' member).  If the check fails, an error code EINVAL will be
returned.  Otherwise, param_kernel->data_size is used to do a second copy,
which copies from the same user-space buffer to 'dmi'.  After the second
copy, only 'dmi->data_size' is checked against 'param_kernel->data_size'.
Given that the buffer 'user' resides in the user space, a malicious
user-space process can race to change the content in the buffer between
the two copies.  This way, the attacker can inject inconsistent data
into 'dmi' (versus previously validated 'param_kernel').

Fix redundant copying of 'minimum_data_size' from user-space buffer by
using the first copy stored in 'param_kernel'.  Also remove the
'data_size' check after the second copy because it is now unnecessary.

Cc: stable@vger.kernel.org
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agolockd: fix access beyond unterminated strings in prints
Amir Goldstein [Fri, 28 Sep 2018 17:41:48 +0000 (20:41 +0300)]
lockd: fix access beyond unterminated strings in prints

commit 93f38b6fae0ea8987e22d9e6c38f8dfdccd867ee upstream.

printk format used %*s instead of %.*s, so hostname_len does not limit
the number of bytes accessed from hostname.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonfsd: Fix an Oops in free_session()
Trond Myklebust [Tue, 9 Oct 2018 19:54:15 +0000 (15:54 -0400)]
nfsd: Fix an Oops in free_session()

commit bb6ad5572c0022e17e846b382d7413cdcf8055be upstream.

In call_xpt_users(), we delete the entry from the list, but we
do not reinitialise it. This triggers the list poisoning when
we later call unregister_xpt_user() in nfsd4_del_conns().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonfs: Fix a missed page unlock after pg_doio()
Benjamin Coddington [Thu, 18 Oct 2018 19:01:48 +0000 (15:01 -0400)]
nfs: Fix a missed page unlock after pg_doio()

commit fdbd1a2e4a71adcb1ae219fcfd964930d77a7f84 upstream.

We must check pg_error and call error_cleanup after any call to pg_doio.
Currently, we are skipping the unlock of a page if we encounter an error in
nfs_pageio_complete() before handing off the work to the RPC layer.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoNFSv4.1: Fix the r/wsize checking
Trond Myklebust [Tue, 18 Sep 2018 14:07:44 +0000 (10:07 -0400)]
NFSv4.1: Fix the r/wsize checking

commit 943cff67b842839f4f35364ba2db5c2d3f025d94 upstream.

The intention of nfs4_session_set_rwsize() was to cap the r/wsize to the
buffer sizes negotiated by the CREATE_SESSION. The initial code had a
bug whereby we would not check the values negotiated by nfs_probe_fsinfo()
(the assumption being that CREATE_SESSION will always negotiate buffer values
that are sane w.r.t. the server's preferred r/wsizes) but would only check
values set by the user in the 'mount' command.

The code was changed in 4.11 to _always_ set the r/wsize, meaning that we
now never use the server preferred r/wsizes. This is the regression that
this patch fixes.
Also rename the function to nfs4_session_limit_rwsize() in order to avoid
future confusion.

Fixes: 033853325fe3 (NFSv4.1 respect server's max size in CREATE_SESSION")
Cc: stable@vger.kernel.org # v4.11+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agogenirq: Fix race on spurious interrupt detection
Lukas Wunner [Thu, 18 Oct 2018 13:15:05 +0000 (15:15 +0200)]
genirq: Fix race on spurious interrupt detection

commit 746a923b863a1065ef77324e1e43f19b1a3eab5c upstream.

Commit 1e77d0a1ed74 ("genirq: Sanitize spurious interrupt detection of
threaded irqs") made detection of spurious interrupts work for threaded
handlers by:

a) incrementing a counter every time the thread returns IRQ_HANDLED, and
b) checking whether that counter has increased every time the thread is
   woken.

However for oneshot interrupts, the commit unmasks the interrupt before
incrementing the counter.  If another interrupt occurs right after
unmasking but before the counter is incremented, that interrupt is
incorrectly considered spurious:

time
 |  irq_thread()
 |    irq_thread_fn()
 |      action->thread_fn()
 |      irq_finalize_oneshot()
 |        unmask_threaded_irq()            /* interrupt is unmasked */
 |
 |                  /* interrupt fires, incorrectly deemed spurious */
 |
 |    atomic_inc(&desc->threads_handled); /* counter is incremented */
 v

This is observed with a hi3110 CAN controller receiving data at high volume
(from a separate machine sending with "cangen -g 0 -i -x"): The controller
signals a huge number of interrupts (hundreds of millions per day) and
every second there are about a dozen which are deemed spurious.

In theory with high CPU load and the presence of higher priority tasks, the
number of incorrectly detected spurious interrupts might increase beyond
the 99,900 threshold and cause disablement of the interrupt.

In practice it just increments the spurious interrupt count. But that can
cause people to waste time investigating it over and over.

Fix it by moving the accounting before the invocation of
irq_finalize_oneshot().

[ tglx: Folded change log update ]

Fixes: 1e77d0a1ed74 ("genirq: Sanitize spurious interrupt detection of threaded irqs")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mathias Duckeck <m.duckeck@kunbus.de>
Cc: Akshay Bhat <akshay.bhat@timesys.com>
Cc: Casey Fitzpatrick <casey.fitzpatrick@timesys.com>
Cc: stable@vger.kernel.org # v3.16+
Link: https://lkml.kernel.org/r/1dfd8bbd16163940648045495e3e9698e63b50ad.1539867047.git.lukas@wunner.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoprintk: Fix panic caused by passing log_buf_len to command line
He Zhe [Sat, 29 Sep 2018 16:45:50 +0000 (00:45 +0800)]
printk: Fix panic caused by passing log_buf_len to command line

commit 277fcdb2cfee38ccdbe07e705dbd4896ba0c9930 upstream.

log_buf_len_setup does not check input argument before passing it to
simple_strtoull. The argument would be a NULL pointer if "log_buf_len",
without its value, is set in command line and thus causes the following
panic.

PANIC: early exception 0xe3 IP 10:ffffffffaaeacd0d error 0 cr2 0x0
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc4-yocto-standard+ #1
[    0.000000] RIP: 0010:_parse_integer_fixup_radix+0xd/0x70
...
[    0.000000] Call Trace:
[    0.000000]  simple_strtoull+0x29/0x70
[    0.000000]  memparse+0x26/0x90
[    0.000000]  log_buf_len_setup+0x17/0x22
[    0.000000]  do_early_param+0x57/0x8e
[    0.000000]  parse_args+0x208/0x320
[    0.000000]  ? rdinit_setup+0x30/0x30
[    0.000000]  parse_early_options+0x29/0x2d
[    0.000000]  ? rdinit_setup+0x30/0x30
[    0.000000]  parse_early_param+0x36/0x4d
[    0.000000]  setup_arch+0x336/0x99e
[    0.000000]  start_kernel+0x6f/0x4ee
[    0.000000]  x86_64_start_reservations+0x24/0x26
[    0.000000]  x86_64_start_kernel+0x6f/0x72
[    0.000000]  secondary_startup_64+0xa4/0xb0

This patch adds a check to prevent the panic.

Link: http://lkml.kernel.org/r/1538239553-81805-1-git-send-email-zhe.he@windriver.com
Cc: stable@vger.kernel.org
Cc: rostedt@goodmis.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: He Zhe <zhe.he@windriver.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agosmb3: on kerberos mount if server doesn't specify auth type use krb5
Steve French [Sun, 28 Oct 2018 18:13:23 +0000 (13:13 -0500)]
smb3: on kerberos mount if server doesn't specify auth type use krb5

commit 926674de6705f0f1dbf29a62fd758d0977f535d6 upstream.

Some servers (e.g. Azure) do not include a spnego blob in the SMB3
negotiate protocol response, so on kerberos mounts ("sec=krb5")
we can fail, as we expected the server to list its supported
auth types (OIDs in the spnego blob in the negprot response).
Change this so that on krb5 mounts we default to trying krb5 if the
server doesn't list its supported protocol mechanisms.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agosmb3: do not attempt cifs operation in smb3 query info error path
Steve French [Fri, 19 Oct 2018 05:45:21 +0000 (00:45 -0500)]
smb3: do not attempt cifs operation in smb3 query info error path

commit 1e77a8c204c9d1b655c61751b8ad0fde22421dbb upstream.

If backupuid mount option is sent, we can incorrectly retry
(on access denied on query info) with a cifs (FindFirst) operation
on an smb3 mount which causes the server to force the session close.

We set backup intent on open so no need for this fallback.

See kernel bugzilla 201435

Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agosmb3: allow stats which track session and share reconnects to be reset
Steve French [Sun, 16 Sep 2018 04:04:41 +0000 (23:04 -0500)]
smb3: allow stats which track session and share reconnects to be reset

commit 2c887635cd6ab3af619dc2be94e5bf8f2e172b78 upstream.

Currently, "echo 0 > /proc/fs/cifs/Stats" resets all of the stats
except the session and share reconnect counts.  Fix it to
reset those as well.

CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agow1: omap-hdq: fix missing bus unregister at removal
Andreas Kemnade [Sat, 22 Sep 2018 19:20:54 +0000 (21:20 +0200)]
w1: omap-hdq: fix missing bus unregister at removal

commit a007734618fee1bf35556c04fa498d41d42c7301 upstream.

The bus master was not removed after unloading the module
or unbinding the driver. That lead to oopses like this

[  127.842987] Unable to handle kernel paging request at virtual address bf01d04c
[  127.850646] pgd = 70e3cd9a
[  127.853698] [bf01d04c] *pgd=8f908811, *pte=00000000, *ppte=00000000
[  127.860412] Internal error: Oops: 80000007 [#1] PREEMPT SMP ARM
[  127.866668] Modules linked in: bq27xxx_battery overlay [last unloaded: omap_hdq]
[  127.874542] CPU: 0 PID: 1022 Comm: w1_bus_master1 Not tainted 4.19.0-rc4-00001-g2d51da718324 #12
[  127.883819] Hardware name: Generic OMAP36xx (Flattened Device Tree)
[  127.890441] PC is at 0xbf01d04c
[  127.893798] LR is at w1_search_process_cb+0x4c/0xfc
[  127.898956] pc : [<bf01d04c>]    lr : [<c05f9580>]    psr: a0070013
[  127.905609] sp : cf885f48  ip : bf01d04c  fp : ddf1e11c
[  127.911132] r10: cf8fe040  r9 : c05f8d00  r8 : cf8fe040
[  127.916656] r7 : 000000f0  r6 : cf8fe02c  r5 : cf8fe000  r4 : cf8fe01c
[  127.923553] r3 : c05f8d00  r2 : 000000f0  r1 : cf8fe000  r0 : dde1ef10
[  127.930450] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  127.938018] Control: 10c5387d  Table: 8f8f0019  DAC: 00000051
[  127.944091] Process w1_bus_master1 (pid: 1022, stack limit = 0x9135699f)
[  127.951171] Stack: (0xcf885f48 to 0xcf886000)
[  127.955810] 5f40:                   cf8fe000 00000000 cf884000 cf8fe090 000003e8 c05f8d00
[  127.964477] 5f60: dde5fc34 c05f9700 ddf1e100 ddf1e540 cf884000 cf8fe000 c05f9694 00000000
[  127.973114] 5f80: dde5fc34 c01499a4 00000000 ddf1e540 c0149874 00000000 00000000 00000000
[  127.981781] 5fa0: 00000000 00000000 00000000 c01010e8 00000000 00000000 00000000 00000000
[  127.990447] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  127.999114] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[  128.007781] [<c05f9580>] (w1_search_process_cb) from [<c05f9700>] (w1_process+0x6c/0x118)
[  128.016479] [<c05f9700>] (w1_process) from [<c01499a4>] (kthread+0x130/0x148)
[  128.024047] [<c01499a4>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
[  128.031677] Exception stack(0xcf885fb0 to 0xcf885ff8)
[  128.037017] 5fa0:                                     00000000 00000000 00000000 00000000
[  128.045684] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  128.054351] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  128.061340] Code: bad PC value
[  128.064697] ---[ end trace af066e33c0e14119 ]---

Cc: <stable@vger.kernel.org>
Signed-off-by: Andreas Kemnade <andreas@kemnade.info>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoiio: adc: at91: fix wrong channel number in triggered buffer mode
Eugen Hristev [Mon, 24 Sep 2018 07:51:44 +0000 (10:51 +0300)]
iio: adc: at91: fix wrong channel number in triggered buffer mode

commit aea835f2dc8a682942b859179c49ad1841a6c8b9 upstream.

When channels are registered, the hardware channel number is not the
actual iio channel number.
This is because the driver is probed with a certain number of accessible
channels. Some pins are routed and some not, depending on the description of
the board in the DT.
Because of that, channels 0,1,2,3 can correspond to hardware channels
2,3,4,5 for example.
In the buffered triggered case, we need to do the translation accordingly.
Fixed the channel number to stop reading the wrong channel.

Fixes: 0e589d5fb ("ARM: AT91: IIO: Add AT91 ADC driver.")
Cc: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Eugen Hristev <eugen.hristev@microchip.com>
Acked-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoiio: adc: at91: fix acking DRDY irq on simple conversions
Eugen Hristev [Mon, 24 Sep 2018 07:51:43 +0000 (10:51 +0300)]
iio: adc: at91: fix acking DRDY irq on simple conversions

commit bc1b45326223e7e890053cf6266357adfa61942d upstream.

When doing simple conversions, the driver did not acknowledge the DRDY irq.
If this irq status is not acked, it will be left pending, and as soon as a
trigger is enabled, the irq handler will be called, it doesn't know why
this status has occurred because no channel is pending, and then it will go
int a irq loop and board will hang.
To avoid this situation, read the LCDR after a raw conversion is done.

Fixes: 0e589d5fb ("ARM: AT91: IIO: Add AT91 ADC driver.")
Cc: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Eugen Hristev <eugen.hristev@microchip.com>
Acked-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoiio: adc: imx25-gcq: Fix leak of device_node in mx25_gcq_setup_cfgs()
Alexey Khoroshilov [Fri, 21 Sep 2018 21:58:02 +0000 (00:58 +0300)]
iio: adc: imx25-gcq: Fix leak of device_node in mx25_gcq_setup_cfgs()

commit d3fa21c73c391975488818b085b894c2980ea052 upstream.

Leaving for_each_child_of_node loop we should release child device node,
if it is not stored for future use.

Found by Linux Driver Verification project (linuxtesting.org).

JC: I'm not sending this as a quick fix as it's been wrong for years,
but good to pick up for stable after the merge window.

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Fixes: 6df2e98c3ea56 ("iio: adc: Add imx25-gcq ADC driver")
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoiio: ad5064: Fix regulator handling
Lars-Peter Clausen [Fri, 28 Sep 2018 09:23:40 +0000 (11:23 +0200)]
iio: ad5064: Fix regulator handling

commit 8911a43bc198877fad9f4b0246a866b26bb547ab upstream.

The correct way to handle errors returned by regualtor_get() and friends is
to propagate the error since that means that an regulator was specified,
but something went wrong when requesting it.

For handling optional regulators, e.g. when the device has an internal
vref, regulator_get_optional() should be used to avoid getting the dummy
regulator that the regulator core otherwise provides.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agokbuild: fix kernel/bounds.c 'W=1' warning
Arnd Bergmann [Tue, 30 Oct 2018 22:07:32 +0000 (15:07 -0700)]
kbuild: fix kernel/bounds.c 'W=1' warning

commit 6a32c2469c3fbfee8f25bcd20af647326650a6cf upstream.

Building any configuration with 'make W=1' produces a warning:

kernel/bounds.c:16:6: warning: no previous prototype for 'foo' [-Wmissing-prototypes]

When also passing -Werror, this prevents us from building any other files.
Nobody ever calls the function, but we can't make it 'static' either
since we want the compiler output.

Calling it 'main' instead however avoids the warning, because gcc
does not insist on having a declaration for main.

Link: http://lkml.kernel.org/r/20181005083313.2088252-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoKVM: arm64: Fix caching of host MDCR_EL2 value
Mark Rutland [Wed, 17 Oct 2018 16:42:10 +0000 (17:42 +0100)]
KVM: arm64: Fix caching of host MDCR_EL2 value

commit da5a3ce66b8bb51b0ea8a89f42aac153903f90fb upstream.

At boot time, KVM stashes the host MDCR_EL2 value, but only does this
when the kernel is not running in hyp mode (i.e. is non-VHE). In these
cases, the stashed value of MDCR_EL2.HPMN happens to be zero, which can
lead to CONSTRAINED UNPREDICTABLE behaviour.

Since we use this value to derive the MDCR_EL2 value when switching
to/from a guest, after a guest have been run, the performance counters
do not behave as expected. This has been observed to result in accesses
via PMXEVTYPER_EL0 and PMXEVCNTR_EL0 not affecting the relevant
counters, resulting in events not being counted. In these cases, only
the fixed-purpose cycle counter appears to work as expected.

Fix this by always stashing the host MDCR_EL2 value, regardless of VHE.

Cc: Christopher Dall <christoffer.dall@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@vger.kernel.org
Fixes: 1e947bad0b63b351 ("arm64: KVM: Skip HYP setup when already running in HYP")
Tested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agomm/rmap: map_pte() was not handling private ZONE_DEVICE page properly
Ralph Campbell [Tue, 30 Oct 2018 22:04:11 +0000 (15:04 -0700)]
mm/rmap: map_pte() was not handling private ZONE_DEVICE page properly

commit aab8d0520e6e7c2a61f71195e6ce7007a4843afb upstream.

Private ZONE_DEVICE pages use a special pte entry and thus are not
present.  Properly handle this case in map_pte(), it is already handled in
check_pte(), the map_pte() part was lost in some rebase most probably.

Without this patch the slow migration path can not migrate back to any
private ZONE_DEVICE memory to regular memory.  This was found after stress
testing migration back to system memory.  This ultimatly can lead to the
CPU constantly page fault looping on the special swap entry.

Link: http://lkml.kernel.org/r/20181019160442.18723-3-jglisse@redhat.com
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agohugetlbfs: dirty pages as they are added to pagecache
Mike Kravetz [Fri, 26 Oct 2018 22:10:58 +0000 (15:10 -0700)]
hugetlbfs: dirty pages as they are added to pagecache

commit 22146c3ce98962436e401f7b7016a6f664c9ffb5 upstream.

Some test systems were experiencing negative huge page reserve counts and
incorrect file block counts.  This was traced to /proc/sys/vm/drop_caches
removing clean pages from hugetlbfs file pagecaches.  When non-hugetlbfs
explicit code removes the pages, the appropriate accounting is not
performed.

This can be recreated as follows:
 fallocate -l 2M /dev/hugepages/foo
 echo 1 > /proc/sys/vm/drop_caches
 fallocate -l 2M /dev/hugepages/foo
 grep -i huge /proc/meminfo
   AnonHugePages:         0 kB
   ShmemHugePages:        0 kB
   HugePages_Total:    2048
   HugePages_Free:     2047
   HugePages_Rsvd:    18446744073709551615
   HugePages_Surp:        0
   Hugepagesize:       2048 kB
   Hugetlb:         4194304 kB
 ls -lsh /dev/hugepages/foo
   4.0M -rw-r--r--. 1 root root 2.0M Oct 17 20:05 /dev/hugepages/foo

To address this issue, dirty pages as they are added to pagecache.  This
can easily be reproduced with fallocate as shown above.  Read faulted
pages will eventually end up being marked dirty.  But there is a window
where they are clean and could be impacted by code such as drop_caches.
So, just dirty them all as they are added to the pagecache.

Link: http://lkml.kernel.org/r/b5be45b8-5afe-56cd-9482-28384699a049@oracle.com
Fixes: 6bda666a03f0 ("hugepages: fold find_or_alloc_pages into huge_no_page()")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Mihcla Hocko <mhocko@suse.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoima: fix showing large 'violations' or 'runtime_measurements_count'
Eric Biggers [Fri, 7 Sep 2018 21:33:24 +0000 (14:33 -0700)]
ima: fix showing large 'violations' or 'runtime_measurements_count'

commit 1e4c8dafbb6bf72fb5eca035b861e39c5896c2b7 upstream.

The 12 character temporary buffer is not necessarily long enough to hold
a 'long' value.  Increase it.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agomm: /proc/pid/smaps_rollup: fix NULL pointer deref in smaps_pte_range()
Vlastimil Babka [Fri, 26 Oct 2018 22:02:16 +0000 (15:02 -0700)]
mm: /proc/pid/smaps_rollup: fix NULL pointer deref in smaps_pte_range()

commit fa76da461bb0be13c8339d984dcf179151167c8f upstream.

Leonardo reports an apparent regression in 4.19-rc7:

 BUG: unable to handle kernel NULL pointer dereference at 00000000000000f0
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 3 PID: 6032 Comm: python Not tainted 4.19.0-041900rc7-lowlatency #201810071631
 Hardware name: LENOVO 80UG/Toronto 4A2, BIOS 0XCN45WW 08/09/2018
 RIP: 0010:smaps_pte_range+0x32d/0x540
 Code: 80 00 00 00 00 74 a9 48 89 de 41 f6 40 52 40 0f 85 04 02 00 00 49 2b 30 48 c1 ee 0c 49 03 b0 98 00 00 00 49 8b 80 a0 00 00 00 <48> 8b b8 f0 00 00 00 e8 b7 ef ec ff 48 85 c0 0f 84 71 ff ff ff a8
 RSP: 0018:ffffb0cbc484fb88 EFLAGS: 00010202
 RAX: 0000000000000000 RBX: 0000560ddb9e9000 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000560ddb9e9 RDI: 0000000000000001
 RBP: ffffb0cbc484fbc0 R08: ffff94a5a227a578 R09: ffff94a5a227a578
 R10: 0000000000000000 R11: 0000560ddbbe7000 R12: ffffe903098ba728
 R13: ffffb0cbc484fc78 R14: ffffb0cbc484fcf8 R15: ffff94a5a2e9cf48
 FS:  00007f6dfb683740(0000) GS:ffff94a5aaf80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00000000000000f0 CR3: 000000011c118001 CR4: 00000000003606e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  __walk_page_range+0x3c2/0x6f0
  walk_page_vma+0x42/0x60
  smap_gather_stats+0x79/0xe0
  ? gather_pte_stats+0x320/0x320
  ? gather_hugetlb_stats+0x70/0x70
  show_smaps_rollup+0xcd/0x1c0
  seq_read+0x157/0x400
  __vfs_read+0x3a/0x180
  ? security_file_permission+0x93/0xc0
  ? security_file_permission+0x93/0xc0
  vfs_read+0x8f/0x140
  ksys_read+0x55/0xc0
  __x64_sys_read+0x1a/0x20
  do_syscall_64+0x5a/0x110
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Decoded code matched to local compilation+disassembly points to
smaps_pte_entry():

        } else if (unlikely(IS_ENABLED(CONFIG_SHMEM) && mss->check_shmem_swap
                                                        && pte_none(*pte))) {
                page = find_get_entry(vma->vm_file->f_mapping,
                                                linear_page_index(vma, addr));

Here, vma->vm_file is NULL.  mss->check_shmem_swap should be false in that
case, however for smaps_rollup, smap_gather_stats() can set the flag true
for one vma and leave it true for subsequent vma's where it should be
false.

To fix, reset the check_shmem_swap flag to false.  There's also related
bug which sets mss->swap to shmem_swapped, which in the context of
smaps_rollup overwrites any value accumulated from previous vma's.  Fix
that as well.

Note that the report suggests a regression between 4.17.19 and 4.19-rc7,
which makes the 4.19 series ending with commit 258f669e7e88 ("mm:
/proc/pid/smaps_rollup: convert to single value seq_file") suspicious.
But the mss was reused for rollup since 493b0e9d945f ("mm: add
/proc/pid/smaps_rollup") so let's play it safe with the stable backport.

Link: http://lkml.kernel.org/r/555fbd1f-4ac9-0b58-dcd4-5dc4380ff7ca@suse.cz
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201377
Fixes: 493b0e9d945f ("mm: add /proc/pid/smaps_rollup")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Leonardo Soares Müller <leozinho29_eu@hotmail.com>
Tested-by: Leonardo Soares Müller <leozinho29_eu@hotmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Daniel Colascione <dancol@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agocrypto: tcrypt - fix ghash-generic speed test
Horia Geantă [Wed, 12 Sep 2018 13:20:48 +0000 (16:20 +0300)]
crypto: tcrypt - fix ghash-generic speed test

commit 331351f89c36bf7d03561a28b6f64fa10a9f6f3a upstream.

ghash is a keyed hash algorithm, thus setkey needs to be called.
Otherwise the following error occurs:
$ modprobe tcrypt mode=318 sec=1
testing speed of async ghash-generic (ghash-generic)
tcrypt: test  0 (   16 byte blocks,   16 bytes per update,   1 updates):
tcrypt: hashing failed ret=-126

Cc: <stable@vger.kernel.org> # 4.6+
Fixes: 0660511c0bee ("crypto: tcrypt - Use ahash")
Tested-by: Franck Lenormand <franck.lenormand@nxp.com>
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agocrypto: lrw - Fix out-of bounds access on counter overflow
Ondrej Mosnacek [Thu, 13 Sep 2018 08:51:31 +0000 (10:51 +0200)]
crypto: lrw - Fix out-of bounds access on counter overflow

commit fbe1a850b3b1522e9fc22319ccbbcd2ab05328d2 upstream.

When the LRW block counter overflows, the current implementation returns
128 as the index to the precomputed multiplication table, which has 128
entries. This patch fixes it to return the correct value (127).

Fixes: 64470f1b8510 ("[CRYPTO] lrw: Liskov Rivest Wagner, a tweakable narrow block cipher mode")
Cc: <stable@vger.kernel.org> # 2.6.20+
Reported-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agosignal: Guard against negative signal numbers in copy_siginfo_from_user32
Eric W. Biederman [Thu, 11 Oct 2018 01:29:44 +0000 (20:29 -0500)]
signal: Guard against negative signal numbers in copy_siginfo_from_user32

commit a36700589b85443e28170be59fa11c8a104130a5 upstream.

While fixing an out of bounds array access in known_siginfo_layout
reported by the kernel test robot it became apparent that the same bug
exists in siginfo_layout and affects copy_siginfo_from_user32.

The straight forward fix that makes guards against making this mistake
in the future and should keep the code size small is to just take an
unsigned signal number instead of a signed signal number, as I did to
fix known_siginfo_layout.

Cc: stable@vger.kernel.org
Fixes: cc731525f26a ("signal: Remove kernel interal si_code magic")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agosignal/GenWQE: Fix sending of SIGKILL
Eric W. Biederman [Thu, 13 Sep 2018 09:28:01 +0000 (11:28 +0200)]
signal/GenWQE: Fix sending of SIGKILL

commit 0ab93e9c99f8208c0a1a7b7170c827936268c996 upstream.

The genweq_add_file and genwqe_del_file by caching current without
using reference counting embed the assumption that a file descriptor
will never be passed from one process to another.  It even embeds the
assumption that the the thread that opened the file will be in
existence when the process terminates.   Neither of which are
guaranteed to be true.

Therefore replace caching the task_struct of the opener with
pid of the openers thread group id.  All the knowledge of the
opener is used for is as the target of SIGKILL and a SIGKILL
will kill the entire process group.

Rename genwqe_force_sig to genwqe_terminate, remove it's unncessary
signal argument, update it's ownly caller, and use kill_pid
instead of force_sig.

The work force_sig does in changing signal handling state is not
relevant to SIGKILL sent as SEND_SIG_PRIV.  The exact same processess
will be killed just with less work, and less confusion.  The work done
by force_sig is really only needed for handling syncrhonous
exceptions.

It will still be possible to cause genwqe_device_remove to wait
8 seconds by passing a file descriptor to another process but
the possible user after free is fixed.

Fixes: eaf4722d4645 ("GenWQE Character device and DDCB queue")
Cc: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Frank Haverkamp <haver@linux.vnet.ibm.com>
Cc: Joerg-Stephan Vogt <jsvogt@de.ibm.com>
Cc: Michael Jung <mijung@gmx.net>
Cc: Michael Ruettger <michael@ibmra.de>
Cc: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Eberhard S. Amann <esa@linux.vnet.ibm.com>
Cc: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Cc: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoPCI: vmd: White list for fast interrupt handlers
Keith Busch [Tue, 8 May 2018 16:00:22 +0000 (10:00 -0600)]
PCI: vmd: White list for fast interrupt handlers

commit a7f58b9ecfd3c0f63703ec10f4a592cc38dbd1b8 upstream.

Devices with slow interrupt handlers are significantly harming
performance when their interrupt vector is shared with a fast device.

Create a class code white list for devices with known fast interrupt
handlers and let all other devices share a single vector so that they
don't interfere with performance.

At the moment, only the NVM Express class code is on the list, but more
may be added if VMD users desire to use other low-latency devices in
these domains.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[lorenzo.pieralisi@arm.com: changelog]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Jon Derrick: <jonathan.derrick@intel.com>
Cc: "Heitke, Kenneth" <kenneth.heitke@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoPCI: Add Device IDs for Intel GPU "spurious interrupt" quirk
Bin Meng [Wed, 26 Sep 2018 15:14:01 +0000 (08:14 -0700)]
PCI: Add Device IDs for Intel GPU "spurious interrupt" quirk

commit d0c9606b31a21028fb5b753c8ad79626292accfd upstream.

Add Device IDs to the Intel GPU "spurious interrupt" quirk table.

For these devices, unplugging the VGA cable and plugging it in again causes
spurious interrupts from the IGD.  Linux eventually disables the interrupt,
but of course that disables any other devices sharing the interrupt.

The theory is that this is a VGA BIOS defect: it should have disabled the
IGD interrupt but failed to do so.

See f67fd55fa96f ("PCI: Add quirk for still enabled interrupts on Intel
Sandy Bridge GPUs") and 7c82126a94e6 ("PCI: Add new ID for Intel GPU
"spurious interrupt" quirk") for some history.

[bhelgaas: See link below for discussion about how to fix this more
generically instead of adding device IDs for every new Intel GPU.  I hope
this is the last patch to add device IDs.]

Link: https://lore.kernel.org/linux-pci/1537974841-29928-1-git-send-email-bmeng.cn@gmail.com
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v3.4+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoPCI/ASPM: Fix link_state teardown on device removal
Lukas Wunner [Tue, 4 Sep 2018 17:34:18 +0000 (12:34 -0500)]
PCI/ASPM: Fix link_state teardown on device removal

commit aeae4f3e5c38d47bdaef50446dc0ec857307df68 upstream.

Upon removal of the last device on a bus, the link_state of the bridge
leading to that bus is sought to be torn down by having pci_stop_dev()
call pcie_aspm_exit_link_state().

When ASPM was originally introduced by commit 7d715a6c1ae5 ("PCI: add
PCI Express ASPM support"), it determined whether the device being
removed is the last one by calling list_empty() on the bridge's
subordinate devices list.  That didn't work because the device is only
removed from the list slightly later in pci_destroy_dev().

Commit 3419c75e15f8 ("PCI: properly clean up ASPM link state on device
remove") attempted to fix it by calling list_is_last(), but that's not
correct either because it checks whether the device is at the *end* of
the list, not whether it's the last one *left* in the list.  If the user
removes the device which happens to be at the end of the list via sysfs
but other devices are preceding the device in the list, the link_state
is torn down prematurely.

The real fix is to move the invocation of pcie_aspm_exit_link_state() to
pci_destroy_dev() and reinstate the call to list_empty().  Remove a
duplicate check for dev->bus->self because pcie_aspm_exit_link_state()
already contains an identical check.

Fixes: 7d715a6c1ae5 ("PCI: add PCI Express ASPM support")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: stable@vger.kernel.org # v2.6.26
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoARM: dts: dra7: Fix up unaligned access setting for PCIe EP
Vignesh R [Tue, 25 Sep 2018 05:21:51 +0000 (10:51 +0530)]
ARM: dts: dra7: Fix up unaligned access setting for PCIe EP

commit 6d0af44a82be87c13f2320821e9fbb8b8cf5a56f upstream.

Bit positions of PCIE_SS1_AXI2OCP_LEGACY_MODE_ENABLE and
PCIE_SS1_AXI2OCP_LEGACY_MODE_ENABLE in CTRL_CORE_SMA_SW_7 are
incorrectly documented in the TRM. In fact, the bit positions are
swapped. Update the DT bindings for PCIe EP to reflect the same.

Fixes: d23f3839fe97 ("ARM: dts: DRA7: Add pcie1 dt node for EP mode")
Cc: stable@vger.kernel.org
Signed-off-by: Vignesh R <vigneshr@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoEDAC, skx_edac: Fix logical channel intermediate decoding
Qiuxu Zhuo [Tue, 9 Oct 2018 17:20:25 +0000 (10:20 -0700)]
EDAC, skx_edac: Fix logical channel intermediate decoding

commit 8f18973877204dc8ca4ce1004a5d28683b9a7086 upstream.

The code "lchan = (lchan << 1) | ~lchan" for logical channel
intermediate decoding is wrong. The wrong intermediate decoding
result is {0xffffffff, 0xfffffffe}.

Fix it by replacing '~' with '!'. The correct intermediate
decoding result is {0x1, 0x2}.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Aristeu Rozanski <aris@redhat.com>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: linux-edac <linux-edac@vger.kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20181009172025.18594-1-tony.luck@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoEDAC, {i7core,sb,skx}_edac: Fix uncorrected error counting
Tony Luck [Fri, 28 Sep 2018 21:39:34 +0000 (14:39 -0700)]
EDAC, {i7core,sb,skx}_edac: Fix uncorrected error counting

commit 432de7fd7630c84ad24f1c2acd1e3bb4ce3741ca upstream.

The count of errors is picked up from bits 52:38 of the machine check
bank status register. But this is the count of *corrected* errors. If an
uncorrected error is being logged, the h/w sets this field to 0. Which
means that when edac_mc_handle_error() is called, the EDAC core will
carefully add zero to the appropriate uncorrected error counts.

Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180928213934.19890-1-tony.luck@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoEDAC, amd64: Add Family 17h, models 10h-2fh support
Michael Jin [Thu, 16 Aug 2018 19:28:40 +0000 (15:28 -0400)]
EDAC, amd64: Add Family 17h, models 10h-2fh support

commit 8960de4a5ca7980ed1e19e7ca5a774d3b7a55c38 upstream.

Add new device IDs for family 17h, models 10h-2fh.

This is required by amd64_edac_mod in order to properly detect PCI
device functions 0 and 6.

Signed-off-by: Michael Jin <mikhail.jin@gmail.com>
Reviewed-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180816192840.31166-1-mikhail.jin@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoHID: hiddev: fix potential Spectre v1
Breno Leitao [Fri, 19 Oct 2018 20:01:33 +0000 (17:01 -0300)]
HID: hiddev: fix potential Spectre v1

commit f11274396a538b31bc010f782e05c2ce3f804c13 upstream.

uref->usage_index can be indirectly controlled by userspace, hence leading
to a potential exploitation of the Spectre variant 1 vulnerability.

This field is used as an array index by the hiddev_ioctl_usage() function,
when 'cmd' is either HIDIOCGCOLLECTIONINDEX, HIDIOCGUSAGES or
HIDIOCSUSAGES.

For cmd == HIDIOCGCOLLECTIONINDEX case, uref->usage_index is compared to
field->maxusage and then used as an index to dereference field->usage
array. The same thing happens to the cmd == HIDIOC{G,S}USAGES cases, where
uref->usage_index is checked against an array maximum value and then it is
used as an index in an array.

This is a summary of the HIDIOCGCOLLECTIONINDEX case, which matches the
traditional Spectre V1 first load:

copy_from_user(uref, user_arg, sizeof(*uref))
if (uref->usage_index >= field->maxusage)
goto inval;
i = field->usage[uref->usage_index].collection_index;
return i;

This patch fixes this by sanitizing field uref->usage_index before using it
to index field->usage (HIDIOCGCOLLECTIONINDEX) or field->value in
HIDIOC{G,S}USAGES arrays, thus, avoiding speculation in the first load.

Cc: <stable@vger.kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
v2: Contemplate cmd == HIDIOC{G,S}USAGES case
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoext4: fix use-after-free race in ext4_remount()'s error path
Theodore Ts'o [Fri, 12 Oct 2018 13:28:09 +0000 (09:28 -0400)]
ext4: fix use-after-free race in ext4_remount()'s error path

commit 33458eaba4dfe778a426df6a19b7aad2ff9f7eec upstream.

It's possible for ext4_show_quota_options() to try reading
s_qf_names[i] while it is being modified by ext4_remount() --- most
notably, in ext4_remount's error path when the original values of the
quota file name gets restored.

Reported-by: syzbot+a2872d6feea6918008a9@syzkaller.appspotmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org # 3.2+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoext4: propagate error from dquot_initialize() in EXT4_IOC_FSSETXATTR
Wang Shilong [Wed, 3 Oct 2018 16:19:21 +0000 (12:19 -0400)]
ext4: propagate error from dquot_initialize() in EXT4_IOC_FSSETXATTR

commit 182a79e0c17147d2c2d3990a9a7b6b58a1561c7a upstream.

We return most failure of dquota_initialize() except
inode evict, this could make a bit sense, for example
we allow file removal even quota files are broken?

But it dosen't make sense to allow setting project
if quota files etc are broken.

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoext4: fix setattr project check in fssetxattr ioctl
Wang Shilong [Wed, 3 Oct 2018 14:33:32 +0000 (10:33 -0400)]
ext4: fix setattr project check in fssetxattr ioctl

commit dc7ac6c4cae3b58724c2f1e21a7c05ce19ecd5a8 upstream.

Currently, project quota could be changed by fssetxattr
ioctl, and existed permission check inode_owner_or_capable()
is obviously not enough, just think that common users could
change project id of file, that could make users to
break project quota easily.

This patch try to follow same regular of xfs project
quota:

"Project Quota ID state is only allowed to change from
within the init namespace. Enforce that restriction only
if we are trying to change the quota ID state.
Everything else is allowed in user namespaces."

Besides that, check and set project id'state should
be an atomic operation, protect whole operation with
inode lock, ext4_ioctl_setproject() is only used for
ioctl EXT4_IOC_FSSETXATTR, we have held mnt_want_write_file()
before ext4_ioctl_setflags(), and ext4_ioctl_setproject()
is called after ext4_ioctl_setflags(), we could share
codes, so remove it inside ext4_ioctl_setproject().

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoext4: initialize retries variable in ext4_da_write_inline_data_begin()
Lukas Czerner [Wed, 3 Oct 2018 01:18:45 +0000 (21:18 -0400)]
ext4: initialize retries variable in ext4_da_write_inline_data_begin()

commit 625ef8a3acd111d5f496d190baf99d1a815bd03e upstream.

Variable retries is not initialized in ext4_da_write_inline_data_begin()
which can lead to nondeterministic number of retries in case we hit
ENOSPC. Initialize retries to zero as we do everywhere else.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fixes: bc0ca9df3b2a ("ext4: retry allocation when inline->extent conversion failed")
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agogfs2_meta: ->mount() can get NULL dev_name
Al Viro [Sat, 13 Oct 2018 04:19:13 +0000 (00:19 -0400)]
gfs2_meta: ->mount() can get NULL dev_name

commit 3df629d873f8683af6f0d34dfc743f637966d483 upstream.

get in sync with mount_bdev() handling of the same

Reported-by: syzbot+c54f8e94e6bba03b04e9@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agojbd2: fix use after free in jbd2_log_do_checkpoint()
Jan Kara [Fri, 5 Oct 2018 22:44:40 +0000 (18:44 -0400)]
jbd2: fix use after free in jbd2_log_do_checkpoint()

commit ccd3c4373eacb044eb3832966299d13d2631f66f upstream.

The code cleaning transaction's lists of checkpoint buffers has a bug
where it increases bh refcount only after releasing
journal->j_list_lock. Thus the following race is possible:

CPU0 CPU1
jbd2_log_do_checkpoint()
jbd2_journal_try_to_free_buffers()
  __journal_try_to_free_buffer(bh)
  ...
  while (transaction->t_checkpoint_io_list)
  ...
    if (buffer_locked(bh)) {

<-- IO completes now, buffer gets unlocked -->

      spin_unlock(&journal->j_list_lock);
    spin_lock(&journal->j_list_lock);
    __jbd2_journal_remove_checkpoint(jh);
    spin_unlock(&journal->j_list_lock);
  try_to_free_buffers(page);
      get_bh(bh) <-- accesses freed bh

Fix the problem by grabbing bh reference before unlocking
journal->j_list_lock.

Fixes: dc6e8d669cf5 ("jbd2: don't call get_bh() before calling __jbd2_journal_remove_checkpoint()")
Fixes: be1158cc615f ("jbd2: fold __process_buffer() into jbd2_log_do_checkpoint()")
Reported-by: syzbot+7f4a27091759e2fe7453@syzkaller.appspotmail.com
CC: stable@vger.kernel.org
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoIB/mlx5: Fix MR cache initialization
Artemy Kovalyov [Mon, 15 Oct 2018 11:13:35 +0000 (14:13 +0300)]
IB/mlx5: Fix MR cache initialization

commit 013c2403bf32e48119aeb13126929f81352cc7ac upstream.

Schedule MR cache work only after bucket was initialized.

Cc: <stable@vger.kernel.org> # 4.10
Fixes: 49780d42dfc9 ("IB/mlx5: Expose MR cache for mlx5_ib")
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoASoC: intel: skylake: Add missing break in skl_tplg_get_token()
Takashi Iwai [Wed, 3 Oct 2018 17:31:44 +0000 (19:31 +0200)]
ASoC: intel: skylake: Add missing break in skl_tplg_get_token()

commit 9c80c5a8831471e0a3e139aad1b0d4c0fdc50b2f upstream.

skl_tplg_get_token() misses a break in the big switch() block for
SKL_TKN_U8_CORE_ID entry.
Spotted nicely by -Wimplicit-fallthrough compiler option.

Fixes: 6277e83292a2 ("ASoC: Intel: Skylake: Parse vendor tokens to build module data")
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agolibnvdimm, region: Fail badblocks listing for inactive regions
Dan Williams [Thu, 27 Sep 2018 22:01:55 +0000 (15:01 -0700)]
libnvdimm, region: Fail badblocks listing for inactive regions

commit 5d394eee2c102453278d81d9a7cf94c80253486a upstream.

While experimenting with region driver loading the following backtrace
was triggered:

 INFO: trying to register non-static key.
 the code is fine but needs lockdep annotation.
 turning off the locking correctness validator.
 [..]
 Call Trace:
  dump_stack+0x85/0xcb
  register_lock_class+0x571/0x580
  ? __lock_acquire+0x2ba/0x1310
  ? kernfs_seq_start+0x2a/0x80
  __lock_acquire+0xd4/0x1310
  ? dev_attr_show+0x1c/0x50
  ? __lock_acquire+0x2ba/0x1310
  ? kernfs_seq_start+0x2a/0x80
  ? lock_acquire+0x9e/0x1a0
  lock_acquire+0x9e/0x1a0
  ? dev_attr_show+0x1c/0x50
  badblocks_show+0x70/0x190
  ? dev_attr_show+0x1c/0x50
  dev_attr_show+0x1c/0x50

This results from a missing successful call to devm_init_badblocks()
from nd_region_probe(). Block attempts to show badblocks while the
region is not enabled.

Fixes: 6a6bef90425e ("libnvdimm: add mechanism to publish badblocks...")
Cc: <stable@vger.kernel.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agolibnvdimm: Hold reference on parent while scheduling async init
Alexander Duyck [Tue, 25 Sep 2018 20:53:02 +0000 (13:53 -0700)]
libnvdimm: Hold reference on parent while scheduling async init

commit b6eae0f61db27748606cc00dafcfd1e2c032f0a5 upstream.

Unlike asynchronous initialization in the core we have not yet associated
the device with the parent, and as such the device doesn't hold a reference
to the parent.

In order to resolve that we should be holding a reference on the parent
until the asynchronous initialization has completed.

Cc: <stable@vger.kernel.org>
Fixes: 4d88a97aa9e8 ("libnvdimm: ...base ... infrastructure")
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agodmaengine: stm32-dma: fix incomplete configuration in cyclic mode
Pierre Yves MORDRET [Tue, 13 Mar 2018 16:42:06 +0000 (17:42 +0100)]
dmaengine: stm32-dma: fix incomplete configuration in cyclic mode

commit e57cb3b3f10d005410f09d4598cc6d62b833f2b0 upstream.

When in cyclic mode, the configuration is updated after having started the
DMA hardware (STM32_DMA_SCR_EN) leading to incomplete configuration of
SMxAR registers.

Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
Signed-off-by: Hugues Fruchet <hugues.fruchet@st.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agodmaengine: ppc4xx: fix off-by-one build failure
Christian Lamparter [Sun, 14 Oct 2018 21:28:50 +0000 (23:28 +0200)]
dmaengine: ppc4xx: fix off-by-one build failure

commit 27d8d2d7a9b7eb05c4484b74b749eaee7b50b845 upstream.

There are two poly_store, but one should have been poly_show.

|adma.c:4382:16: error: conflicting types for 'poly_store'
| static ssize_t poly_store(struct device_driver *dev, const char *buf,
|                ^~~~~~~~~~
|adma.c:4363:16: note: previous definition of 'poly_store' was here
| static ssize_t poly_store(struct device_driver *dev, char *buf)
|                ^~~~~~~~~~

CC: stable@vger.kernel.org
Fixes: 13efe1a05384 ("dmaengine: ppc4xx: remove DRIVER_ATTR() usage")
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet/ipv4: defensive cipso option parsing
Stefan Nuernberger [Mon, 17 Sep 2018 17:46:53 +0000 (19:46 +0200)]
net/ipv4: defensive cipso option parsing

commit 076ed3da0c9b2f88d9157dbe7044a45641ae369e upstream.

commit 40413955ee26 ("Cipso: cipso_v4_optptr enter infinite loop") fixed
a possible infinite loop in the IP option parsing of CIPSO. The fix
assumes that ip_options_compile filtered out all zero length options and
that no other one-byte options beside IPOPT_END and IPOPT_NOOP exist.
While this assumption currently holds true, add explicit checks for zero
length and invalid length options to be safe for the future. Even though
ip_options_compile should have validated the options, the introduction of
new one-byte options can still confuse this code without the additional
checks.

Signed-off-by: Stefan Nuernberger <snu@amazon.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Simon Veith <sveith@amazon.de>
Cc: stable@vger.kernel.org
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoiwlwifi: mvm: check return value of rs_rate_from_ucode_rate()
Luca Coelho [Sat, 13 Oct 2018 06:46:08 +0000 (09:46 +0300)]
iwlwifi: mvm: check return value of rs_rate_from_ucode_rate()

commit 3d71c3f1f50cf309bd20659422af549bc784bfff upstream.

The rs_rate_from_ucode_rate() function may return -EINVAL if the rate
is invalid, but none of the callsites check for the error, potentially
making us access arrays with index IWL_RATE_INVALID, which is larger
than the arrays, causing an out-of-bounds access.  This will trigger
KASAN warnings, such as the one reported in the bugzilla issue
mentioned below.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=200659

Cc: stable@vger.kernel.org
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agousb: gadget: udc: renesas_usb3: Fix b-device mode for "workaround"
Yoshihiro Shimoda [Tue, 2 Oct 2018 11:57:44 +0000 (20:57 +0900)]
usb: gadget: udc: renesas_usb3: Fix b-device mode for "workaround"

commit afc92514a34c7414b28047b1205a6b709103c699 upstream.

If the "workaround_for_vbus" is true, the driver will not call
usb_disconnect(). So, since the controller keeps some registers'
value, the driver doesn't re-enumarate suitable speed after
the b-device mode is disabled. To fix the issue, this patch
adds usb_disconnect() calling in renesas_usb3_b_device_write()
if workaround_for_vbus is true.

Fixes: 43ba968b00ea ("usb: gadget: udc: renesas_usb3: add debugfs to set the b-device mode")
Cc: <stable@vger.kernel.org> # v4.14+
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agousbip:vudc: BUG kmalloc-2048 (Not tainted): Poison overwritten
Shuah Khan (Samsung OSG) [Thu, 18 Oct 2018 16:19:29 +0000 (10:19 -0600)]
usbip:vudc: BUG kmalloc-2048 (Not tainted): Poison overwritten

commit e28fd56ad5273be67d0fae5bedc7e1680e729952 upstream.

In rmmod path, usbip_vudc does platform_device_put() twice once from
platform_device_unregister() and then from put_vudc_device().

The second put results in:

BUG kmalloc-2048 (Not tainted): Poison overwritten error or
BUG: KASAN: use-after-free in kobject_put+0x1e/0x230 if KASAN is
enabled.

[  169.042156] calling  init+0x0/0x1000 [usbip_vudc] @ 1697
[  169.042396] =============================================================================
[  169.043678] probe of usbip-vudc.0 returned 1 after 350 usecs
[  169.044508] BUG kmalloc-2048 (Not tainted): Poison overwritten
[  169.044509] -----------------------------------------------------------------------------
...
[  169.057849] INFO: Freed in device_release+0x2b/0x80 age=4223 cpu=3 pid=1693
[  169.057852]  kobject_put+0x86/0x1b0
[  169.057853]  0xffffffffc0c30a96
[  169.057855]  __x64_sys_delete_module+0x157/0x240

Fix it to call platform_device_del() instead and let put_vudc_device() do
the platform_device_put().

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>