Syzbot reported slab-out-of-bounds Write bug in hid-betopff driver.
The problem is the driver assumes the device must have an input report but
some malicious devices violate this assumption.
So this patch checks hid_device's input is non empty before it's been used.
1) If we ccp_init_data() fails for &src then we need to free aad.
Use goto e_aad instead of goto e_ctx.
2) The label to free the &final_wa was named incorrectly as "e_tag" but
it should have been "e_final_wa". One error path leaked &final_wa.
3) The &tag was leaked on one error path. In that case, I added a free
before the goto because the resource was local to that block.
Fixes: 36cf515b9bbe ("crypto: ccp - Enable support for AES GCM on v5 CCPs") Reported-by: "minihanshen(沈明航)" <minihanshen@tencent.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: John Allen <john.allen@amd.com> Tested-by: John Allen <john.allen@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are two invocation sites of hso_free_net_device. After
refactoring hso_create_net_device, this parameter is useless.
Remove the bailout in the hso_free_net_device and change the invocation
sites of this function.
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current error handling code of hso_create_net_device is
hso_free_net_device, no matter which errors lead to. For example,
WARNING in hso_free_net_device [1].
Fix this by refactoring the error handling code of
hso_create_net_device by handling different errors by different code.
Reported-by: syzbot+44d53c7255bb1aea22d2@syzkaller.appspotmail.com Fixes: 5fcfb6d0bfcd ("hso: fix bailout in error case of probe") Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The driver tries to reuse code for disconnect in case
of a failed probe.
If resources need to be freed after an error in probe, the
netdev must not be freed because it has never been registered.
Fix it by telling the helper which path we are in.
Signed-off-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org> Signed-off-by: sumiyawang <sumiyawang@tencent.com> Reviewed-by: yongduan <yongduan@tencent.com> Link: https://lore.kernel.org/r/1629632949-14749-1-git-send-email-sumiyawang@tencent.com Fixes: 50f44ee7248a ("mm/devm_memremap_pages: fix final page put race") Signed-off-by: Dan Williams <dan.j.williams@intel.com>
[tyhicks: Minor contextual change in pmem_attach_disk() due to the
transition to 'struct range' not yet taking place. Preserve the
memcpy() call rather than initializing the range struct. That change
was introduced in v5.10 with commit a4574f63edc6 ("mm/memremap_pages:
convert to 'struct range'")] Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The PCI code has several paths where the struct pci_host_bridge is freed
directly. This is wrong because it contains a struct device which is
refcounted and should be freed using put_device(). This can result in
use-after-free errors. I think this problem has existed since 2012 with
commit 7b5436635800 ("PCI: add generic device into pci_host_bridge
struct"). It generally hasn't mattered as most host bridge drivers are
still built-in and can't unbind.
The problem is a struct device should never be freed directly once
device_initialize() is called and a ref is held, but that doesn't happen
until pci_register_host_bridge(). There's then a window between allocating
the host bridge and pci_register_host_bridge() where kfree should be used.
This is fragile and requires callers to do the right thing. To fix this, we
need to split device_register() into device_initialize() and device_add()
calls, so that the host bridge struct is always freed by using a
put_device().
devm_pci_alloc_host_bridge() is using devm_kzalloc() to allocate struct
pci_host_bridge which will be freed directly. Instead, we can use a custom
devres action to call put_device().
Link: https://lore.kernel.org/r/20200513223859.11295-2-robh@kernel.org Reported-by: Anders Roxell <anders.roxell@linaro.org> Tested-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Arnd Bergmann <arnd@arndb.de>
[tyhicks: Minor contextual change in pci_init_host_bridge() due to the
lack of a native_dpc member in the pci_host_bridge struct. It was added
in v5.7 with commit ac1c8e35a326 ("PCI/DPC: Add Error Disconnect
Recover (EDR) support")] Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 14b41a2959fb ("net: stmmac: Delete txtimer in suspend") was the
first attempt to fix a race between mod_timer() and setup_timer()
during stmmac_resume(). However the issue still exists as the commit
only addressed half of the issue.
Same race can still happen as stmmac_resume() re-attaches interface
way too early - even before hardware is fully initialized. Worse,
doing so allows network traffic to restart and stmmac_tx_timer_arm()
being called in the middle of stmmac_resume(), which re-init tx timers
in stmmac_init_coalesce(). timer_list will be corrupted and system
crashes as a result of race between mod_timer() and setup_timer().
Fix this by deferring netif_device_attach() to the end of
stmmac_resume().
Signed-off-by: Leon Yu <leoyu@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Macpaul Lin <macpaul.lin@mediatek.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
up->corkflag field can be read or written without any lock.
Annotate accesses to avoid possible syzbot/KCSAN reports.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since the actual_length calculation is performed unsigned, packets
shorter than 7 bytes (e.g. packets without data or otherwise truncated)
or non-received packets ("zero" bytes) can cause buffer overflow.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214437 Fixes: 42337b9d4d958("HID: add driver for U2F Zero built-in LED and RNG") Signed-off-by: Andrej Shadura <andrew.shadura@collabora.co.uk> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When ext4_htree_fill_tree() fails, ext4_dx_readdir() can run into an
infinite loop since if info->last_pos != ctx->pos this will reset the
directory scan and reread the failing entry. For example:
1. a dx_dir which has 3 block, block 0 as dx_root block, block 1/2 as
leaf block which own the ext4_dir_entry_2
2. block 1 read ok and call_filldir which will fill the dirent and update
the ctx->pos
3. block 2 read fail, but we has already fill some dirent, so we will
return back to userspace will a positive return val(see ksys_getdents64)
4. the second ext4_dx_readdir will reset the world since info->last_pos
!= ctx->pos, and will also init the curr_hash which pos to block 1
5. So we will read block1 too, and once block2 still read fail, we can
only fill one dirent because the hash of the entry in block1(besides
the last one) won't greater than curr_hash
6. this time, we forget update last_pos too since the read for block2
will fail, and since we has got the one entry, ksys_getdents64 can
return success
7. Latter we will trapped in a loop with step 4~6
Cc: stable@kernel.org Signed-off-by: yangerkun <yangerkun@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20210914111415.3921954-1-yangerkun@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When ext4_insert_delayed block receives and recovers from an error from
ext4_es_insert_delayed_block(), e.g., ENOMEM, it does not release the
space it has reserved for that block insertion as it should. One effect
of this bug is that s_dirtyclusters_counter is not decremented and
remains incorrectly elevated until the file system has been unmounted.
This can result in premature ENOSPC returns and apparent loss of free
space.
Another effect of this bug is that
/sys/fs/ext4/<dev>/delayed_allocation_blocks can remain non-zero even
after syncfs has been executed on the filesystem.
Besides, add check for s_dirtyclusters_counter when inode is going to be
evicted and freed. s_dirtyclusters_counter can still keep non-zero until
inode is written back in .evict_inode(), and thus the check is delayed
to .destroy_inode().
We should use unsigned long long rather than loff_t to avoid
overflow in ext4_max_bitmap_size() for comparison before returning.
w/o this patch sbi->s_bitmap_maxbytes was becoming a negative
value due to overflow of upper_limit (with has_huge_files as true)
Below is a quick test to trigger it on a 64KB pagesize system.
sudo mkfs.ext4 -b 65536 -O ^has_extents,^64bit /dev/loop2
sudo mount /dev/loop2 /mnt
sudo echo "hello" > /mnt/hello -> This will error out with
"echo: write error: File too large"
Registration of the ipoctal tty devices is unlikely to fail, but if it
ever does, make sure not to deregister a never registered tty device
(and dereference a NULL pointer) when the driver is later unbound.
The tty driver name is used also after registering the driver and must
specifically not be allocated on the stack to avoid leaking information
to user space (or triggering an oops).
Drivers should not try to encode topology information in the tty device
name but this one snuck in through staging without anyone noticing and
another driver has since copied this malpractice.
Fixing the ABI is a separate issue, but this at least plugs the security
hole.
Fixes: ba4dc61fe8c5 ("Staging: ipack: add support for IP-OCTAL mezzanine board") Cc: stable@vger.kernel.org # 3.5 Acked-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com> Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20210917114622.5412-2-johan@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit b212921b13bd ("elf: don't use MAP_FIXED_NOREPLACE for elf
executable mappings") we still leave MAP_FIXED_NOREPLACE in place for
load_elf_interp.
Unfortunately, this will cause kernel to fail to start with:
1 (init): Uhuuh, elf segment at 00003ffff7ffd000 requested but the memory is mapped already
Failed to execute /init (error -17)
The reason is that the elf interpreter (ld.so) has overlapping segments.
readelf -l ld-2.31.so
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x000000000002c94c 0x000000000002c94c R E 0x10000
LOAD 0x000000000002dae0 0x000000000003dae0 0x000000000003dae0
0x00000000000021e8 0x0000000000002320 RW 0x10000
LOAD 0x000000000002fe00 0x000000000003fe00 0x000000000003fe00
0x00000000000011ac 0x0000000000001328 RW 0x10000
The reason for this problem is the same as described in commit ad55eac74f20 ("elf: enforce MAP_FIXED on overlaying elf segments").
Not only executable binaries, elf interpreters (e.g. ld.so) can have
overlapping elf segments, so we better drop MAP_FIXED_NOREPLACE and go
back to MAP_FIXED in load_elf_interp.
Fixes: 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map") Cc: <stable@vger.kernel.org> # v4.19 Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Chen Jingwen <chenjingwen6@huawei.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jann Horn reported that SO_PEERCRED and SO_PEERGROUPS implementations
are racy, as af_unix can concurrently change sk_peer_pid and sk_peer_cred.
In order to fix this issue, this patch adds a new spinlock that needs
to be used whenever these fields are read or written.
Jann also pointed out that l2cap_sock_get_peer_pid_cb() is currently
reading sk->sk_peer_pid which makes no sense, as this field
is only possibly set by AF_UNIX sockets.
We will have to clean this in a separate patch.
This could be done by reverting b48596d1dc25 "Bluetooth: L2CAP: Add get_peer_pid callback"
or implementing what was truly expected.
Fixes: 109f6e39fa07 ("af_unix: Allow SO_PEERCRED to work across namespaces.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jann Horn <jannh@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Cc: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Patch that refactored fl_walk() to use idr_for_each_entry_continue_ul()
also removed rcu protection of individual filters which causes following
use-after-free when filter is deleted concurrently. Fix fl_walk() to obtain
rcu read lock while iterating and taking the filter reference and temporary
release the lock while calling arg->fn() callback that can sleep.
KASAN trace:
[ 352.773640] ==================================================================
[ 352.775041] BUG: KASAN: use-after-free in fl_walk+0x159/0x240 [cls_flower]
[ 352.776304] Read of size 4 at addr ffff8881c8251480 by task tc/2987
hns3_nic_net_open() is not allowed to called repeatly, but there
is no checking for this. When doing device reset and setup tc
concurrently, there is a small oppotunity to call hns3_nic_net_open
repeatedly, and cause kernel bug by calling napi_enable twice.
The calltrace information is like below:
[ 3078.222780] ------------[ cut here ]------------
[ 3078.230255] kernel BUG at net/core/dev.c:6991!
[ 3078.236224] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 3078.243431] Modules linked in: hns3 hclgevf hclge hnae3 vfio_iommu_type1 vfio_pci vfio_virqfd vfio pv680_mii(O)
[ 3078.258880] CPU: 0 PID: 295 Comm: kworker/u8:5 Tainted: G O 5.14.0-rc4+ #1
[ 3078.269102] Hardware name: , BIOS KpxxxFPGA 1P B600 V181 08/12/2021
[ 3078.276801] Workqueue: hclge hclge_service_task [hclge]
[ 3078.288774] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[ 3078.296168] pc : napi_enable+0x80/0x84
tc qdisc sho[w 3d0e7v8 .e3t0h218 79] lr : hns3_nic_net_open+0x138/0x510 [hns3]
Once hns3_nic_net_open() is excute success, the flag
HNS3_NIC_STATE_DOWN will be cleared. So add checking for this
flag, directly return when HNS3_NIC_STATE_DOWN is no set.
Fixes: e888402789b9 ("net: hns3: call hns3_nic_net_open() while doing HNAE3_UP_CLIENT") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Both cxgb4 and csiostor drivers run on their own independent Physical
Function. But when cxgb4 and csiostor are both being loaded in parallel via
modprobe, there is a race when firmware upgrade is attempted by both the
drivers.
When the cxgb4 driver initiates the firmware upgrade, it halts the firmware
and the chip until upgrade is complete. When the csiostor driver is coming
up in parallel, the firmware mailbox communication fails with timeouts and
the csiostor driver probe fails.
Add a module soft dependency on cxgb4 driver to ensure loading csiostor
triggers cxgb4 to load first when available to avoid the firmware upgrade
race.
It's not enough to set net.ipv4.conf.all.rp_filter=0, that does not override
a greater rp_filter value on the individual interfaces. We also need to set
net.ipv4.conf.default.rp_filter=0 before creating the interfaces. That way,
they'll also get their own rp_filter value of zero.
The e100_get_regs function is used to implement a simple register dump
for the e100 device. The data is broken into a couple of MAC control
registers, and then a series of PHY registers, followed by a memory dump
buffer.
The total length of the register dump is defined as (1 + E100_PHY_REGS)
* sizeof(u32) + sizeof(nic->mem->dump_buf).
The logic for filling in the PHY registers uses a convoluted inverted
count for loop which counts from E100_PHY_REGS (0x1C) down to 0, and
assigns the slots 1 + E100_PHY_REGS - i. The first loop iteration will
fill in [1] and the final loop iteration will fill in [1 + 0x1C]. This
is actually one more than the supposed number of PHY registers.
The memory dump buffer is then filled into the space at
[2 + E100_PHY_REGS] which will cause that memcpy to assign 4 bytes past
the total size.
The end result is that we overrun the total buffer size allocated by the
kernel, which could lead to a panic or other issues due to memory
corruption.
It is difficult to determine the actual total number of registers
here. The only 8255x datasheet I could find indicates there are 28 total
MDI registers. However, we're reading 29 here, and reading them in
reverse!
In addition, the ethtool e100 register dump interface appears to read
the first PHY register to determine if the device is in MDI or MDIx
mode. This doesn't appear to be documented anywhere within the 8255x
datasheet. I can only assume it must be in register 28 (the extra
register we're reading here).
Lets not change any of the intended meaning of what we copy here. Just
extend the space by 4 bytes to account for the extra register and
continue copying the data out in the same order.
Change the E100_PHY_REGS value to be the correct total (29) so that the
total register dump size is calculated properly. Fix the offset for
where we copy the dump buffer so that it doesn't overrun the total size.
Re-write the for loop to use counting up instead of the convoluted
down-counting. Correct the mdio_read offset to use the 0-based register
offsets, but maintain the bizarre reverse ordering so that we have the
ABI expected by applications like ethtool. This requires and additional
subtraction of 1. It seems a bit odd but it makes the flow of assignment
into the register buffer easier to follow.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Felicitas Hetzelt <felicitashetzelt@gmail.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit abf9b902059f ("e100: cleanup unneeded math") tried to simplify
e100_get_regs_len and remove a double 'divide and then multiply'
calculation that the e100_reg_regs_len function did.
This change broke the size calculation entirely as it failed to account
for the fact that the numbered registers are actually 4 bytes wide and
not 1 byte. This resulted in a significant under allocation of the
register buffer used by e100_get_regs.
Fix this by properly multiplying the register count by u32 first before
adding the size of the dump buffer.
Multipath RTA_FLOW is embedded in nexthop. Dump it in fib_add_nexthop()
to get the length of rtnexthop correct.
Fixes: b0f60193632e ("ipv4: Refactor nexthop attributes in fib_dump_info") Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
For both local and remote sensors all the supported ICs can report an
"undervoltage lockout" condition which means the conversion wasn't
properly performed due to insufficient power supply voltage and so the
measurement results can't be trusted.
Fixes: 3acb50c18d8d ("sctp: delay as much as possible skb_linearize") Reported-by: syzbot+581aff2ae6b860625116@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Thomas explained in https://lore.kernel.org/r/87mtoeb4hb.ffs@tglx
that our handling of the hrtimer here is wrong: If the timer fires
late (e.g. due to vCPU scheduling, as reported by Dmitry/syzbot)
then it tries to actually rearm the timer at the next deadline,
which might be in the past already:
1 2 3 N N+1
| | | ... | |
^ intended to fire here (1)
^ next deadline here (2)
^ actually fired here
The next time it fires, it's later, but will still try to schedule
for the next deadline (now 3), etc. until it catches up with N,
but that might take a long time, causing stalls etc.
Now, all of this is simulation, so we just have to fix it, but
note that the behaviour is wrong even per spec, since there's no
value then in sending all those beacons unaligned - they should be
aligned to the TBTT (1, 2, 3, ... in the picture), and if we're a
bit (or a lot) late, then just resume at that point.
Therefore, change the code to use hrtimer_forward_now() which will
ensure that the next firing of the timer would be at N+1 (in the
picture), i.e. the next interval point after the current time.
Suggested-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: syzbot+0e964fad69a9c462bc1e@syzkaller.appspotmail.com Fixes: 01e59e467ecf ("mac80211_hwsim: hrtimer beacon") Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210915112936.544f383472eb.I3f9712009027aa09244b65399bf18bf482a8c4f1@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In ieee80211_amsdu_aggregate() set a pointer frag_tail point to the
end of skb_shinfo(head)->frag_list, and use it to bind other skb in
the end of this function. But when execute ieee80211_amsdu_aggregate()
->ieee80211_amsdu_realloc_pad()->pskb_expand_head(), the address of
skb_shinfo(head)->frag_list will be changed. However, the
ieee80211_amsdu_aggregate() not update frag_tail after call
pskb_expand_head(). That will cause the second skb can't bind to the
head skb appropriately.So we update the address of frag_tail to fix it.
Fan speed minimum can be enforced from sysfs. For example, setting
current fan speed to 20 is used to enforce fan speed to be at 100%
speed, 19 - to be not below 90% speed, etcetera. This feature provides
ability to limit fan speed according to some system wise
considerations, like absence of some replaceable units or high system
ambient temperature.
Request for changing fan minimum speed is configuration request and can
be set only through 'sysfs' write procedure. In this situation value of
argument 'state' is above nominal fan speed maximum.
Return non-zero code in this case to avoid
thermal_cooling_device_stats_update() call, because in this case
statistics update violates thermal statistics table range.
The issues is observed in case kernel is configured with option
CONFIG_THERMAL_STATISTICS.
Here is the trace from KASAN:
[ 159.506659] BUG: KASAN: slab-out-of-bounds in thermal_cooling_device_stats_update+0x7d/0xb0
[ 159.516016] Read of size 4 at addr ffff888116163840 by task hw-management.s/7444
[ 159.545625] Call Trace:
[ 159.548366] dump_stack+0x92/0xc1
[ 159.552084] ? thermal_cooling_device_stats_update+0x7d/0xb0
[ 159.635869] thermal_zone_device_update+0x345/0x780
[ 159.688711] thermal_zone_device_set_mode+0x7d/0xc0
[ 159.694174] mlxsw_thermal_modules_init+0x48f/0x590 [mlxsw_core]
[ 159.700972] ? mlxsw_thermal_set_cur_state+0x5a0/0x5a0 [mlxsw_core]
[ 159.731827] mlxsw_thermal_init+0x763/0x880 [mlxsw_core]
[ 160.070233] RIP: 0033:0x7fd995909970
[ 160.074239] Code: 73 01 c3 48 8b 0d 28 d5 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 99 2d 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ..
[ 160.095242] RSP: 002b:00007fff54f5d938 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 160.103722] RAX: ffffffffffffffda RBX: 0000000000000013 RCX: 00007fd995909970
[ 160.111710] RDX: 0000000000000013 RSI: 0000000001906008 RDI: 0000000000000001
[ 160.119699] RBP: 0000000001906008 R08: 00007fd995bc9760 R09: 00007fd996210700
[ 160.127687] R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000013
[ 160.135673] R13: 0000000000000001 R14: 00007fd995bc8600 R15: 0000000000000013
[ 160.143671]
[ 160.145338] Allocated by task 2924:
[ 160.149242] kasan_save_stack+0x19/0x40
[ 160.153541] __kasan_kmalloc+0x7f/0xa0
[ 160.157743] __kmalloc+0x1a2/0x2b0
[ 160.161552] thermal_cooling_device_setup_sysfs+0xf9/0x1a0
[ 160.167687] __thermal_cooling_device_register+0x1b5/0x500
[ 160.173833] devm_thermal_of_cooling_device_register+0x60/0xa0
[ 160.180356] mlxreg_fan_probe+0x474/0x5e0 [mlxreg_fan]
[ 160.248140]
[ 160.249807] The buggy address belongs to the object at ffff888116163400
[ 160.249807] which belongs to the cache kmalloc-1k of size 1024
[ 160.263814] The buggy address is located 64 bytes to the right of
[ 160.263814] 1024-byte region [ffff888116163400, ffff888116163800)
[ 160.277536] The buggy address belongs to the page:
[ 160.282898] page:0000000012275840 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888116167000 pfn:0x116160
[ 160.294872] head:0000000012275840 order:3 compound_mapcount:0 compound_pincount:0
[ 160.303251] flags: 0x200000000010200(slab|head|node=0|zone=2)
[ 160.309694] raw: 0200000000010200ffffea00046f7208ffffea0004928208ffff88810004dbc0
[ 160.318367] raw: ffff88811616700000000000000a000600000001ffffffff0000000000000000
[ 160.327033] page dumped because: kasan: bad access detected
[ 160.333270]
[ 160.334937] Memory state around the buggy address:
[ 160.356469] >ffff888116163800: fc ..
Fixes: 65afb4c8e7e4 ("hwmon: (mlxreg-fan) Add support for Mellanox FAN driver") Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Link: https://lore.kernel.org/r/20210916183151.869427-1-vadimp@nvidia.com Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
ip_vs_conn_tab_bits may be provided by the user through the
conn_tab_bits module parameter. If this value is greater than 31, or
less than 0, the shift operator used to derive tab_size causes undefined
behaviour.
Fix this checking ip_vs_conn_tab_bits value to be in the range specified
in ipvs Kconfig. If not, simply use default value.
Fixes: 6f7edb4881bf ("IPVS: Allow boot time change of hash size") Reported-by: Yi Chen <yiche@redhat.com> Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
There're other modules might use hv_clock_per_cpu variable like ptp_kvm,
so move it into kvmclock.h and export the symbol to make it visiable to
other modules.
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com> Cc: <stable@vger.kernel.org>
Message-Id: <1632892429-101194-2-git-send-email-zelin.deng@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When PN checking is done in mac80211, for fragmentation we need
to copy the PN to the RX struct so we can later use it to do a
comparison, since commit bf30ca922a0c ("mac80211: check defrag
PN against current frame").
Unfortunately, in that commit I used the 'hdr' variable without
it being necessarily valid, so use-after-free could occur if it
was necessary to reallocate (parts of) the frame.
Fix this by reloading the variable after the code that results
in the reallocations, if any.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=214401.
If driver read val value sufficient for
(val & 0x08) && (!(val & 0x80)) && ((val & 0x7) == ((val >> 4) & 0x7))
from device then Null pointer dereference occurs.
(It is possible if tmp = 0b0xyz1xyz, where same literals mean same numbers)
Also lm75[] does not serve a purpose anymore after switching to
devm_i2c_new_dummy_device() in w83791d_detect_subclients().
The patch fixes possible NULL pointer dereference by removing lm75[].
Found by Linux Driver Verification project (linuxtesting.org).
If driver read val value sufficient for
(val & 0x08) && (!(val & 0x80)) && ((val & 0x7) == ((val >> 4) & 0x7))
from device then Null pointer dereference occurs.
(It is possible if tmp = 0b0xyz1xyz, where same literals mean same numbers)
Also lm75[] does not serve a purpose anymore after switching to
devm_i2c_new_dummy_device() in w83791d_detect_subclients().
The patch fixes possible NULL pointer dereference by removing lm75[].
Found by Linux Driver Verification project (linuxtesting.org).
If driver read tmp value sufficient for
(tmp & 0x08) && (!(tmp & 0x80)) && ((tmp & 0x7) == ((tmp >> 4) & 0x7))
from device then Null pointer dereference occurs.
(It is possible if tmp = 0b0xyz1xyz, where same literals mean same numbers)
Also lm75[] does not serve a purpose anymore after switching to
devm_i2c_new_dummy_device() in w83791d_detect_subclients().
The patch fixes possible NULL pointer dereference by removing lm75[].
Found by Linux Driver Verification project (linuxtesting.org).
If the file size is almost S64_MAX, the calculated number of Merkle tree
levels exceeds FS_VERITY_MAX_LEVELS, causing FS_IOC_ENABLE_VERITY to
fail. This is unintentional, since as the comment above the definition
of FS_VERITY_MAX_LEVELS states, it is enough for over U64_MAX bytes of
data using SHA-256 and 4K blocks. (Specifically, 4096*128**8 >= 2**64.)
The bug is actually that when the number of blocks in the first level is
calculated from i_size, there is a signed integer overflow due to i_size
being signed. Fix this by treating i_size as unsigned.
For DEV_VER_V3 version there exist race condition between clearing
ep_sts.EP_STS_TRBERR and setting ep_cmd.EP_CMD_DRDY bit.
Setting EP_CMD_DRDY will be ignored by controller when
EP_STS_TRBERR is set. So, between these two instructions we have
a small time gap in which the EP_STS_TRBERR can be set. In such case
the transfer will not start after setting doorbell.
Since commit e5c6b312ce3c ("cpufreq: schedutil: Use kobject release()
method to free sugov_tunables") kobject_put() has kfree()d the
attr_set before gov_attr_set_put() returns.
kobject_put() isn't the last user of attr_set in gov_attr_set_put(),
the subsequent mutex_destroy() triggers a use-after-free:
| BUG: KASAN: use-after-free in mutex_is_locked+0x20/0x60
| Read of size 8 at addr ffff000800ca4250 by task cpuhp/2/20
|
| CPU: 2 PID: 20 Comm: cpuhp/2 Not tainted 5.15.0-rc1 #12369
| Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development
| Platform, BIOS EDK II Jul 30 2018
| Call trace:
| dump_backtrace+0x0/0x380
| show_stack+0x1c/0x30
| dump_stack_lvl+0x8c/0xb8
| print_address_description.constprop.0+0x74/0x2b8
| kasan_report+0x1f4/0x210
| kasan_check_range+0xfc/0x1a4
| __kasan_check_read+0x38/0x60
| mutex_is_locked+0x20/0x60
| mutex_destroy+0x80/0x100
| gov_attr_set_put+0xfc/0x150
| sugov_exit+0x78/0x190
| cpufreq_offline.isra.0+0x2c0/0x660
| cpuhp_cpufreq_offline+0x14/0x24
| cpuhp_invoke_callback+0x430/0x6d0
| cpuhp_thread_fun+0x1b0/0x624
| smpboot_thread_fn+0x5e0/0xa6c
| kthread+0x3a0/0x450
| ret_from_fork+0x10/0x20
Swap the order of the calls.
Fixes: e5c6b312ce3c ("cpufreq: schedutil: Use kobject release() method to free sugov_tunables") Cc: 4.7+ <stable@vger.kernel.org> # 4.7+ Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
So split the original sugov_tunables_free() into two functions,
sugov_clear_global_tunables() is just used to clear the global_tunables
and the new sugov_tunables_free() is used as kobj_type::release to
release the sugov_tunables safely.
Fixes: 9bdcb44e391d ("cpufreq: schedutil: New governor based on scheduler utilization data") Cc: 4.7+ <stable@vger.kernel.org> # 4.7+ Signed-off-by: Kevin Hao <haokexin@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
This issue happens when a userspace program does an ioctl
FBIOPUT_VSCREENINFO passing the fb_var_screeninfo struct
containing only the fields xres, yres, and bits_per_pixel
with values.
If this struct is the same as the previous ioctl, the
vc_resize() detects it and doesn't call the resize_screen(),
leaving the fb_var_screeninfo incomplete. And this leads to
the updatescrollmode() calculates a wrong value to
fbcon_display->vrows, which makes the real_y() return a
wrong value of y, and that value, eventually, causes
the imageblit to access an out-of-bound address value.
To solve this issue I made the resize_screen() be called
even if the screen does not need any resizing, so it will
"fix and fill" the fb_var_screeninfo independently.
Cc: stable <stable@vger.kernel.org> # after 5.15-rc2 is out, give it time to bake Reported-and-tested-by: syzbot+858dc7a2f7ef07c2c219@syzkaller.appspotmail.com Signed-off-by: Igor Matheus Andrade Torrente <igormtorrente@gmail.com> Link: https://lore.kernel.org/r/20210628134509.15895-1-igormtorrente@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In commit b7213ffa0e58 ("qnx4: avoid stringop-overread errors") I tried
to teach gcc about how the directory entry structure can be two
different things depending on a status flag. It made the code clearer,
and it seemed to make gcc happy.
However, Arnd points to a gcc bug, where despite using two different
members of a union, gcc then gets confused, and uses the size of one of
the members to decide if a string overrun happens. And not necessarily
the rigth one.
End result: with some configurations, gcc-11 will still complain about
the source buffer size being overread:
because gcc will get confused about which union member entry is actually
getting accessed, even when the source code is very clear about it. Gcc
internally will have combined two "redundant" pointers (pointing to
different union elements that are at the same offset), and takes the
size checking from one or the other - not necessarily the right one.
This is clearly a gcc bug, but we can work around it fairly easily. The
biggest thing here is the big honking comment about why we do what we
do.
Commit 8480ed9c2bbd56 ("xen/balloon: use a kernel thread instead a
workqueue") switched the Xen balloon driver to use a kernel thread.
Unfortunately the patch omitted to call try_to_freeze() or to use
wait_event_freezable_timeout(), causing a system suspend to fail.
Fixes: 8480ed9c2bbd56 ("xen/balloon: use a kernel thread instead a workqueue") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20210920100345.21939-1-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Current PCIe MEM space of size 16 MB is not enough for some combination
of PCIe cards (e.g. NVMe disk together with ath11k wifi card). ARM Trusted
Firmware for Armada 3700 platform already assigns 128 MB for PCIe window,
so extend PCIe MEM space to the end of 128 MB PCIe window which allows to
allocate more PCIe BARs for more PCIe cards.
Without this change some combination of PCIe cards cannot be used and
kernel show error messages in dmesg during initialization:
pci 0000:00:00.0: BAR 8: no space for [mem size 0x01800000]
pci 0000:00:00.0: BAR 8: failed to assign [mem size 0x01800000]
pci 0000:00:00.0: BAR 6: assigned [mem 0xe8000000-0xe80007ff pref]
pci 0000:01:00.0: BAR 8: no space for [mem size 0x01800000]
pci 0000:01:00.0: BAR 8: failed to assign [mem size 0x01800000]
pci 0000:02:03.0: BAR 8: no space for [mem size 0x01000000]
pci 0000:02:03.0: BAR 8: failed to assign [mem size 0x01000000]
pci 0000:02:07.0: BAR 8: no space for [mem size 0x00100000]
pci 0000:02:07.0: BAR 8: failed to assign [mem size 0x00100000]
pci 0000:03:00.0: BAR 0: no space for [mem size 0x01000000 64bit]
pci 0000:03:00.0: BAR 0: failed to assign [mem size 0x01000000 64bit]
Due to bugs in U-Boot port for Turris Mox, the second range in Turris Mox
kernel DTS file for PCIe must start at 16 MB offset. Otherwise U-Boot
crashes during loading of kernel DTB file. This bug is present only in
U-Boot code for Turris Mox and therefore other Armada 3700 devices are not
affected by this bug. Bug is fixed in U-Boot version 2021.07.
To not break booting new kernels on existing versions of U-Boot on Turris
Mox, use first 16 MB range for IO and second range with rest of PCIe window
for MEM.
Signed-off-by: Pali Rohár <pali@kernel.org> Fixes: 76f6386b25cc ("arm64: dts: marvell: Add Aardvark PCIe support for Armada 3700") Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After upgrading to Linux 5.13.3 I noticed my laptop would shutdown due
to overheat (when it should not). It turned out this was due to commit fe6a6de6692e ("thermal/drivers/int340x/processor_thermal: Fix tcc setting").
What happens is this drivers uses a global variable to keep track of the
tcc offset (tcc_offset_save) and uses it on resume. The issue is this
variable is initialized to 0, but is only set in
tcc_offset_degree_celsius_store, i.e. when the tcc offset is explicitly
set by userspace. If that does not happen, the resume path will set the
offset to 0 (in my case the h/w default being 3, the offset would become
too low after a suspend/resume cycle).
The issue did not arise before commit fe6a6de6692e, as the function
setting the offset would return if the offset was 0. This is no longer
the case (rightfully).
Fix this by not applying the offset if it wasn't saved before, reverting
back to the old logic. A better approach will come later, but this will
be easier to apply to stable kernels.
The logic to restore the offset after a resume was there long before
commit fe6a6de6692e, but as a value of 0 was considered invalid I'm
referencing the commit that made the issue possible in the Fixes tag
instead.
Without CONFIG_PM enabled, the SET_RUNTIME_PM_OPS() macro ends up being
empty, and the only use of tegra_slink_runtime_{resume,suspend} goes
away, resulting in
drivers/spi/spi-tegra20-slink.c:1200:12: error: ‘tegra_slink_runtime_resume’ defined but not used [-Werror=unused-function]
1200 | static int tegra_slink_runtime_resume(struct device *dev)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/spi/spi-tegra20-slink.c:1188:12: error: ‘tegra_slink_runtime_suspend’ defined but not used [-Werror=unused-function]
1188 | static int tegra_slink_runtime_suspend(struct device *dev)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
mark the functions __maybe_unused to make the build happy.
This hits the alpha allmodconfig build (and others).
tx timeout and slot time are currently specified in units of HZ. On
Alpha, HZ is defined as 1024. When building alpha:allmodconfig, this
results in the following error message.
drivers/net/hamradio/6pack.c: In function 'sixpack_open':
drivers/net/hamradio/6pack.c:71:41: error:
unsigned conversion from 'int' to 'unsigned char'
changes value from '256' to '0'
In the 6PACK protocol, tx timeout is specified in units of 10 ms and
transmitted over the wire:
https://www.linux-ax25.org/wiki/6PACK
Defining a value dependent on HZ doesn't really make sense, and
presumably comes from the (very historical) situation where HZ was
originally 100.
Note that the SIXP_SLOTTIME use explicitly is about 10ms granularity:
Some drivers pass a pointer to volatile data to virt_to_bus() and
virt_to_phys(), and that works fine. One exception is alpha. This
results in a number of compile errors such as
drivers/net/wan/lmc/lmc_main.c: In function 'lmc_softreset':
drivers/net/wan/lmc/lmc_main.c:1782:50: error:
passing argument 1 of 'virt_to_bus' discards 'volatile'
qualifier from pointer target type
drivers/atm/ambassador.c: In function 'do_loader_command':
drivers/atm/ambassador.c:1747:58: error:
passing argument 1 of 'virt_to_bus' discards 'volatile'
qualifier from pointer target type
Declare the parameter of virt_to_phys and virt_to_bus as pointer to
volatile to fix the problem.
__stack_chk_guard is setup once while init stage and never changed
after that.
Although the modification of this variable at runtime will usually
cause the kernel to crash (so does the attacker), it should be marked
as __ro_after_init, and it should not affect performance if it is
placed in the ro_after_init section.
The qnx4 directory entries are 64-byte blocks that have different
contents depending on the a status byte that is in the last byte of the
block.
In particular, a directory entry can be either a "link info" entry with
a 48-byte name and pointers to the real inode information, or an "inode
entry" with a smaller 16-byte name and the full inode information.
But the code was written to always just treat the directory name as if
it was part of that "inode entry", and just extend the name to the
longer case if the status byte said it was a link entry.
That work just fine and gives the right results, but now that gcc is
tracking data structure accesses much more, the code can trigger a
compiler error about using up to 48 bytes (the long name) in a structure
that only has that shorter name in it:
fs/qnx4/dir.c: In function ‘qnx4_readdir’:
fs/qnx4/dir.c:51:32: error: ‘strnlen’ specified bound 48 exceeds source size 16 [-Werror=stringop-overread]
51 | size = strnlen(de->di_fname, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from fs/qnx4/qnx4.h:3,
from fs/qnx4/dir.c:16:
include/uapi/linux/qnx4_fs.h:45:25: note: source object declared here
45 | char di_fname[QNX4_SHORT_NAME_MAX];
| ^~~~~~~~
which is because the source code doesn't really make this whole "one of
two different types" explicit.
Fix this by introducing a very explicit union of the two types, and
basically explaining to the compiler what is really going on.
The sparc mdesc code does pointer games with 'struct mdesc_hdr', but
didn't describe to the compiler how that header is then followed by the
data that the header describes.
As a result, gcc is now unhappy since it does stricter pointer range
tracking, and doesn't understand about how these things work. This
results in various errors like:
arch/sparc/kernel/mdesc.c: In function ‘mdesc_node_by_name’:
arch/sparc/kernel/mdesc.c:647:22: error: ‘strcmp’ reading 1 or more bytes from a region of size 0 [-Werror=stringop-overread]
647 | if (!strcmp(names + ep[ret].name_offset, name))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
which are easily avoided by just describing 'struct mdesc_hdr' better,
and making the node_block() helper function look into that unsized
data[] that follows the header.
This makes the sparc64 build happy again at least for my cross-compiler
version (gcc version 11.2.1).
gcc 11.x reports the following compiler warning/error.
drivers/net/ethernet/i825xx/82596.c: In function 'i82596_probe':
arch/m68k/include/asm/string.h:72:25: error:
'__builtin_memcpy' reading 6 bytes from a region of size 0 [-Werror=stringop-overread]
Use absolute_pointer() to work around the problem.
absolute_pointer() disassociates a pointer from its originating symbol
type and context. Use it to prevent compiler warnings/errors such as
drivers/net/ethernet/i825xx/82596.c: In function 'i82596_probe':
arch/m68k/include/asm/string.h:72:25: error:
'__builtin_memcpy' reading 6 bytes from a region of size 0 [-Werror=stringop-overread]
Such warnings may be reported by gcc 11.x for string and memory
operations on fixed addresses.
KASAN reports a use-after-free report when doing fuzz test:
[693354.104835] ==================================================================
[693354.105094] BUG: KASAN: use-after-free in bfq_io_set_weight_legacy+0xd3/0x160
[693354.105336] Read of size 4 at addr ffff888be0a35664 by task sh/1453338
[693354.108511] Memory state around the buggy address:
[693354.108671] ffff888be0a35500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[693354.116396] ffff888be0a35580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[693354.124473] >ffff888be0a35600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[693354.132421] ^
[693354.140284] ffff888be0a35680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[693354.147912] ffff888be0a35700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[693354.155281] ==================================================================
blkgs are protected by both queue and blkcg locks and holding
either should stabilize them. However, the path of destroying
blkg policy data is only protected by queue lock in
blkcg_activate_policy()/blkcg_deactivate_policy(). Other tasks
can get the blkg policy data before the blkg policy data is
destroyed, and use it after destroyed, which will result in a
use-after-free.
Commit 53b7670e5735 ("sparc: factor the dma coherent mapping into
helper") lost the page align for the calls to dma_make_coherent and
srmmu_unmapiorange. The latter cannot handle a non page aligned len
argument.
Signed-off-by: Andreas Larsson <andreas@gaisler.com> Reviewed-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
nvme_update_ana_state() has a deficiency that results in a failure to
properly update the ana state for a namespace in the following case:
NSIDs in ctrl->namespaces: 1, 3, 4
NSIDs in desc->nsids: 1, 2, 3, 4
Loop iteration 0:
ns index = 0, n = 0, ns->head->ns_id = 1, nsid = 1, MATCH.
Loop iteration 1:
ns index = 1, n = 1, ns->head->ns_id = 3, nsid = 2, NO MATCH.
Loop iteration 2:
ns index = 2, n = 2, ns->head->ns_id = 4, nsid = 4, MATCH.
Where the update to the ANA state of NSID 3 is missed. To fix this
increment n and retry the update with the same ns when ns->head->ns_id is
higher than nsid,
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Sasha Levin <sashal@kernel.org>
Today the Xen ballooning is done via delayed work in a workqueue. This
might result in workqueue hangups being reported in case of large
amounts of memory are being ballooned in one go (here 16GB):
Commit 7661809d493b ("mm: don't allow oversized kvmalloc() calls") add the
oversize check. When the allocation is larger than what kmalloc() supports,
the following warning triggered:
only increase fib6_sernum in net namespace after add fib6_info
successfully.
Signed-off-by: zhang kai <zhangkaiheb@126.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
arch/m68k/include/asm/raw_io.h:20:19: error:
cast to pointer from integer of different size
arch/m68k/include/asm/raw_io.h:30:32: error:
cast to pointer from integer of different size [-Werror=int-to-p
On m68k, io functions are defined as macros. The problem is seen if the
macro parameter variable size differs from the size of a pointer. Cast
the parameter of all io macros to unsigned long before casting it to
a pointer to fix the problem.
The Synopsys Ethernet IP uses the CSR clock as a base clock for MDC.
The divisor used is set in the MAC_MDIO_Address register field CR
(Clock Rate)
The divisor is there to change the CSR clock into a clock that falls
below the IEEE 802.3 specified max frequency of 2.5MHz.
If the CSR clock is 300MHz, the code falls back to using the reset
value in the MAC_MDIO_Address register, as described in the comment
above this code.
However, 300MHz is actually an allowed value and the proper divider
can be estimated quite easily (it's just 1Hz difference!)
A CSR frequency of 300MHz with the maximum clock rate value of 0x5
(STMMAC_CSR_250_300M, a divisor of 124) gives somewhere around
~2.42MHz which is below the IEEE 802.3 specified maximum.
For the ARTPEC-8 SoC, the CSR clock is this problematic 300MHz,
and unfortunately, the reset-value of the MAC_MDIO_Address CR field
is 0x0.
This leads to a clock rate of zero and a divisor of 42, and gives an
MDC frequency of ~7.14MHz.
Allow CSR clock of 300MHz by making the comparison inclusive.
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
plat_dev->dev->platform_data is released by platform_device_unregister(),
use of pclk and hclk is a use-after-free. Since device unregister won't
need a clk device we adjust the function call sequence to fix this issue.
[ 31.261225] BUG: KASAN: use-after-free in macb_remove+0x77/0xc6 [macb_pci]
[ 31.275563] Freed by task 306:
[ 30.276782] platform_device_release+0x25/0x80
Suggested-by: Nicolas Ferre <Nicolas.Ferre@microchip.com> Signed-off-by: Tong Zhang <ztong0001@gmail.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
ioctl$SG_IO(/dev/sda, SG_IO, ...)
// Enters trace_note_tsk() after blk_trace_free() returned
// Use mdelay in rcu region rather than msleep(which may schedule out)
Remove blk_trace from running_list before calling blk_trace_free() by
sysfs if blk_trace is at Blktrace_running state.
Commit b0140891a8cea3 ("md: Fix race when creating a new md device.")
not only moved assigning mddev->gendisk before calling add_disk, which
fixes the races described in the commit log, but also added a
mddev->open_mutex critical section over add_disk and creation of the
md kobj. Adding a kobject after add_disk is racy vs deleting the gendisk
right after adding it, but md already prevents against that by holding
a mddev->active reference.
On the other hand taking this lock added a lock order reversal with what
is not disk->open_mutex (used to be bdev->bd_mutex when the commit was
added) for partition devices, which need that lock for the internal open
for the partition scan, and a recent commit also takes it for
non-partitioned devices, leading to further lockdep splatter.
Fixes: b0140891a8ce ("md: Fix race when creating a new md device.") Fixes: d62633873590 ("block: support delayed holder registration") Reported-by: syzbot+fadc0aaf497e6a493b9f@syzkaller.appspotmail.com Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: syzbot+fadc0aaf497e6a493b9f@syzkaller.appspotmail.com Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The limit should be "PAGE_SIZE - len" instead of "PAGE_SIZE". We're not
going to hit the limit so this fix will not affect runtime.
Link: https://lore.kernel.org/r/20210916132331.GE25094@kili Fixes: 5b9e70b22cc5 ("scsi: lpfc: raise sg count for nvme to use available sg resources") Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The problem is the mismatched types between "ctx->total_len" which is
an unsigned int, "rc" which is an int, and "ctx->rc" which is a
ssize_t. The code does:
ctx->rc = (rc == 0) ? ctx->total_len : rc;
We want "ctx->rc" to store the negative "rc" error code. But what
happens is that "rc" is type promoted to a high unsigned int and
'ctx->rc" will store the high positive value instead of a negative
value.
The fix is to change "rc" from an int to a ssize_t.
Fixes: c610c4b619e5 ("CIFS: Add asynchronous write support through kernel AIO") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
After printing the list of thermal governors, then this function prints
a newline character. The problem is that "size" has not been updated
after printing the last governor. This means that it can write one
character (the NUL terminator) beyond the end of the buffer.
Get rid of the "size" variable and just use "PAGE_SIZE - count" directly.
Fixes: 1b4f48494eb2 ("thermal: core: group functions related to governor handling") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210916131342.GB25094@kili Signed-off-by: Sasha Levin <sashal@kernel.org>
'set_signals()' in synclink_gt.c conflicts with an exported symbol
in arch/um/, so change set_signals() to set_gtsignals(). Keep
the function names similar by also changing get_signals() to
get_gtsignals().
../drivers/tty/synclink_gt.c:442:13: error: conflicting types for ‘set_signals’
static void set_signals(struct slgt_info *info);
^~~~~~~~~~~
In file included from ../include/linux/irqflags.h:16:0,
from ../include/linux/spinlock.h:58,
from ../include/linux/mm_types.h:9,
from ../include/linux/buildid.h:5,
from ../include/linux/module.h:14,
from ../drivers/tty/synclink_gt.c:46:
../arch/um/include/asm/irqflags.h:6:5: note: previous declaration of ‘set_signals’ was here
int set_signals(int enable);
^~~~~~~~~~~
Forward declarations make the code larger and rewrites harder. Harder as
they are often omitted from global changes. Remove forward declarations
which are not really needed, i.e. the definition of the function is
before its first use.
ISCSI_NET_PARAM_IFACE_ENABLE belongs to enum iscsi_net_param instead of
iscsi_iface_param so move it to ISCSI_NET_PARAM. Otherwise, when we call
into the driver, we might not match and return that we don't want attr
visible in sysfs. Found in code review.
Link: https://lore.kernel.org/r/20210901085336.2264295-1-libaokun1@huawei.com Fixes: e746f3451ec7 ("scsi: iscsi: Fix iface sysfs attr detection") Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If the HW device is during recovery, the HW resources will never return,
hence we shouldn't wait for the CID (HW context ID) bitmaps to clear.
This fix speeds up the error recovery flow.
Fixes: 64515dc899df ("qed: Add infrastructure for error detection and recovery") Signed-off-by: Michal Kalderon <mkalderon@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: Shai Malin <smalin@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
The smallest TX ring size we support must fit a TX SKB with MAX_SKB_FRAGS
+ 1. Because the first TX BD for a packet is always a long TX BD, we
need an extra TX BD to fit this packet. Define BNXT_MIN_TX_DESC_CNT with
this value to make this more clear. The current code uses a minimum
that is off by 1. Fix it using this constant.
The tx_wake_thresh to determine when to wake up the TX queue is half the
ring size but we must have at least BNXT_MIN_TX_DESC_CNT for the next
packet which may have maximum fragments. So the comparison of the
available TX BDs with tx_wake_thresh should be >= instead of > in the
current code. Otherwise, at the smallest ring size, we will never wake
up the TX queue and will cause TX timeout.
Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadocm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
irq_set_affinity_hit() stores a reference to the cpumask_t
parameter in the irq descriptor, and that reference can be
accessed later from irq_affinity_hint_proc_show(). Since
the cpu_mask parameter passed to irq_set_affinity_hit() has
only temporary storage (it's on the stack memory), later
accesses to it are illegal. Thus reads from the corresponding
procfs affinity_hint file can result in paging request oops.
The issue is fixed by the get_cpu_mask() helper, which provides
a permanent storage for the cpumask_t parameter.
Fixes: d4fd0404c1c9 ("enetc: Introduce basic PF and VF ENETC ethernet drivers") Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
The AFS filesystem is currently triggering the silly-rename cleanup from
afs_d_revalidate() when it sees that a dentry has been changed by a third
party[1]. It should not be doing this as the cleanup includes deleting the
silly-rename target file on iput.
Fix this by removing the places in the d_revalidate handling that validate
anything other than the directory and the dirent. It probably should not
be looking to validate the target inode of the dentry also.
This includes removing the point in afs_d_revalidate() where the inode that
a dentry used to point to was marked as being deleted (AFS_VNODE_DELETED).
We don't know it got deleted. It could have been renamed or it could have
hard links remaining.
This was reproduced by cloning a git repo onto an afs volume on one
machine, switching to another machine and doing "git status", then
switching back to the first and doing "git status". The second status
would show weird output due to ".git/index" getting deleted by the above
mentioned mechanism.
A simpler way to do it is to do:
machine 1: touch a
machine 2: touch b; mv -f b a
machine 1: stat a
on an afs volume. The bug shows up as the stat failing with ENOENT and the
file server log showing that machine 1 deleted "a".
If resource allocation and registration fail for a muxed tty device
(e.g. if there are no more minor numbers) the driver should not try to
deregister the never-registered (or already-deregistered) tty.
Fix up the error handling to avoid dereferencing a NULL pointer when
attempting to remove the character device.
Fixes: 72dc1c096c70 ("HSO: add option hso driver") Cc: stable@vger.kernel.org # 2.6.27 Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>