14285 Commits

Author SHA1 Message Date
Mina Almasry
678f6e28b5 net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags
Add an interface for the user to notify the kernel that it is done
reading the devmem dmabuf frags returned as cmsg. The kernel will
drop the reference on the frags to make them available for reuse.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240910171458.219195-11-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11 20:44:32 -07:00
Mina Almasry
8f0b3cc9a4 tcp: RX path for devmem TCP
In tcp_recvmsg_locked(), detect if the skb being received by the user
is a devmem skb. In this case - if the user provided the MSG_SOCK_DEVMEM
flag - pass it to tcp_recvmsg_devmem() for custom handling.

tcp_recvmsg_devmem() copies any data in the skb header to the linear
buffer, and returns a cmsg to the user indicating the number of bytes
returned in the linear buffer.

tcp_recvmsg_devmem() then loops over the unaccessible devmem skb frags,
and returns to the user a cmsg_devmem indicating the location of the
data in the dmabuf device memory. cmsg_devmem contains this information:

1. the offset into the dmabuf where the payload starts. 'frag_offset'.
2. the size of the frag. 'frag_size'.
3. an opaque token 'frag_token' to return to the kernel when the buffer
is to be released.

The pages awaiting freeing are stored in the newly added
sk->sk_user_frags, and each page passed to userspace is get_page()'d.
This reference is dropped once the userspace indicates that it is
done reading this page.  All pages are released when the socket is
destroyed.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240910171458.219195-10-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11 20:44:32 -07:00
Mina Almasry
3efd7ab46d net: netdev netlink api to bind dma-buf to a net device
API takes the dma-buf fd as input, and binds it to the netdevice. The
user can specify the rx queues to bind the dma-buf to.

Suggested-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-3-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11 20:44:31 -07:00
Tejun Heo
0b1777f0fa Merge branch 'tip/sched/core' into sched_ext/for-6.12
Pull in tip/sched/core to resolve two merge conflicts:

- 96fd6c65efc6 ("sched: Factor out update_other_load_avgs() from __update_blocked_others()")
  5d871a63997f ("sched/fair: Move effective_cpu_util() and effective_cpu_util() in fair.c")

  A simple context conflict. The former added __update_blocked_others() in
  the same #ifdef CONFIG_SMP block that effective_cpu_util() and
  sched_cpu_util() are in and the latter moved those functions to fair.c.
  This makes __update_blocked_others() more out of place. Will follow up
  with a patch to relocate.

- 96fd6c65efc6 ("sched: Factor out update_other_load_avgs() from __update_blocked_others()")
  84d265281d6c ("sched/pelt: Use rq_clock_task() for hw_pressure")

  The former factored out the body of __update_blocked_others() into
  update_other_load_avgs(). The latter changed how update_hw_load_avg() is
  called in the body. Resolved by applying the change to
  update_other_load_avgs() instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-11 08:43:26 -10:00
Pavel Begunkov
50c52250e2 block: implement async io_uring discard cmd
io_uring allows implementing custom file specific asynchronous
operations via the fops->uring_cmd callback, a.k.a. IORING_OP_URING_CMD
requests or just io_uring commands. Use it to add support for async
discards.

Normally, it first tries to queue up bios in a non-blocking context,
and if that fails, we'd retry from a blocking context by returning
-EAGAIN to the core io_uring. We always get the result from bios
asynchronously by setting a custom bi_end_io callback, at which point
we drag the request into the task context to either reissue or complete
it and post a completion to the user.

Unlike ioctl(BLKDISCARD) with stronger guarantees against races, we only
do a best effort attempt to invalidate page cache, and it can race with
any writes and reads and leave page cache stale. It's the same kind of
races we allow to direct writes.

Also, apart from cases where discarding is not allowed at all, e.g.
discards are not supported or the file/device is read only, the user
should assume that the sector range on disk is not valid anymore, even
when an error was returned to the user.

Suggested-by: Conrad Meyer <conradmeyer@meta.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2b5210443e4fa0257934f73dfafcc18a77cd0e09.1726072086.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-09-11 10:45:28 -06:00
Jens Axboe
6d0f8dcb3a Merge branch 'for-6.12/io_uring' into for-6.12/io_uring-discard
* for-6.12/io_uring: (31 commits)
  io_uring/io-wq: inherit cpuset of cgroup in io worker
  io_uring/io-wq: do not allow pinning outside of cpuset
  io_uring/rw: drop -EOPNOTSUPP check in __io_complete_rw_common()
  io_uring/rw: treat -EOPNOTSUPP for IOCB_NOWAIT like -EAGAIN
  io_uring/sqpoll: do not allow pinning outside of cpuset
  io_uring/eventfd: move refs to refcount_t
  io_uring: remove unused rsrc_put_fn
  io_uring: add new line after variable declaration
  io_uring: add GCOV_PROFILE_URING Kconfig option
  io_uring/kbuf: add support for incremental buffer consumption
  io_uring/kbuf: pass in 'len' argument for buffer commit
  Revert "io_uring: Require zeroed sqe->len on provided-buffers send"
  io_uring/kbuf: move io_ring_head_to_buf() to kbuf.h
  io_uring/kbuf: add io_kbuf_commit() helper
  io_uring/kbuf: shrink nr_iovs/mode in struct buf_sel_arg
  io_uring: wire up min batch wake timeout
  io_uring: add support for batch wait timeout
  io_uring: implement our own schedule timeout handling
  io_uring: move schedule wait logic into helper
  io_uring: encapsulate extraneous wait flags into a separate struct
  ...
2024-09-11 10:42:40 -06:00
Jens Axboe
318ad4283a Merge branch 'for-6.12/block' into for-6.12/io_uring-discard
* for-6.12/block: (115 commits)
  block: unpin user pages belonging to a folio at once
  mm: release number of pages of a folio
  block: introduce folio awareness and add a bigger size from folio
  block: Added folio-ized version of bio_add_hw_page()
  block, bfq: factor out a helper to split bfqq in bfq_init_rq()
  block, bfq: remove local variable 'bfqq_already_existing' in bfq_init_rq()
  block, bfq: remove local variable 'split' in bfq_init_rq()
  block, bfq: remove bfq_log_bfqg()
  block, bfq: merge bfq_release_process_ref() into bfq_put_cooperator()
  block, bfq: fix procress reference leakage for bfqq in merge chain
  block, bfq: fix uaf for accessing waker_bfqq after splitting
  blk-throttle: support prioritized processing of metadata
  blk-throttle: remove last_low_overflow_time
  drbd: Add NULL check for net_conf to prevent dereference in state validation
  blk-mq: add missing unplug trace event
  mtip32xx: Remove redundant null pointer checks in mtip_hw_debugfs_init()
  md: Add new_level sysfs interface
  zram: Shrink zram_table_entry::flags.
  zram: Remove ZRAM_LOCK
  zram: Replace bit spinlocks with a spinlock_t.
  ...
2024-09-11 10:42:37 -06:00
Christian Loehle
6ebf2d021a sched/deadline: Clarify nanoseconds in uapi
Specify the time values of the deadline parameters of deadline,
runtime, and period as being in nanoseconds explicitly as they always
have been.

Signed-off-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20240813144348.1180344-3-christian.loehle@arm.com
2024-09-11 11:23:56 +02:00
Simona Vetter
b615b9c36c Linux 6.11-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmbeHCQeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGwfwH/ijnVvDWt0L1mpkE
 oIPmKV+2018CA5ww/Hh+ncToWn/aCmrczHc1SEUOk/SbZnGyXJj/6KiNEK6XpJyu
 Hb90y53D5B9jkEq8WPbSy5RtqCU598gYPeBxkczjj431jer9EsZVzqsKxGRzdAud
 2+Ft/qLiVL8AP5P8IPuU7G8CU6OE0fUL5PyuzMGDtstL3R6lPpG+kf/VrJGV1mp7
 DVZO8hKwIi5Vu+ciaTJv+9PMHzXRnMhLIGabtGIzM8nhmrQx/Kv/PMjiEl/OBkmk
 6PzafEkxVtBKDNK2Qhp+QMTQJATuPccZI8Kn6peZhqoNWYHBqx7d88Q/2iiAGj0z
 skPW5Gs=
 =orf8
 -----END PGP SIGNATURE-----

Merge v6.11-rc7 into drm-next

Thomas needs 5a498d4d06d6 ("drm/fbdev-dma: Only install deferred I/O
if necessary") in drm-misc, so start the backmerge cascade.

Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
2024-09-11 09:18:15 +02:00
Bjorn Helgaas
87f10faf16 PCI: Rename CRS Completion Status to RRS
PCIe r6.0 changed the abbreviation for "Configuration Request Retry Status"
Completion Status from "CRS" to "RRS" and uses the terminology of
"Configuration RRS Software Visibility" instead of "CRS Software
Visibility".

Align the Linux usage with the r6.0 spec language.  No functional change
intended.

It's confusing to make this change, but I think "RRS" *is* a better
abbreviation because it was easy to interpret "CRS" as "Completion Retry
Status", which really didn't make any sense.

Link: https://lore.kernel.org/r/20240827234848.4429-4-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-09-10 19:52:30 -05:00
Jason Xing
be8e9eb375 net-timestamp: introduce SOF_TIMESTAMPING_OPT_RX_FILTER flag
introduce a new flag SOF_TIMESTAMPING_OPT_RX_FILTER in the receive
path. User can set it with SOF_TIMESTAMPING_SOFTWARE to filter
out rx software timestamp report, especially after a process turns on
netstamp_needed_key which can time stamp every incoming skb.

Previously, we found out if an application starts first which turns on
netstamp_needed_key, then another one only passing SOF_TIMESTAMPING_SOFTWARE
could also get rx timestamp. Now we handle this case by introducing this
new flag without breaking users.

Quoting Willem to explain why we need the flag:
"why a process would want to request software timestamp reporting, but
not receive software timestamp generation. The only use I see is when
the application does request
SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_TX_SOFTWARE."

Similarly, this new flag could also be used for hardware case where we
can set it with SOF_TIMESTAMPING_RAW_HARDWARE, then we won't receive
hardware receive timestamp.

Another thing about errqueue in this patch I have a few words to say:
In this case, we need to handle the egress path carefully, or else
reporting the tx timestamp will fail. Egress path and ingress path will
finally call sock_recv_timestamp(). We have to distinguish them.
Errqueue is a good indicator to reflect the flow direction.

Suggested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240909015612.3856-2-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-10 16:55:23 -07:00
Takashi Iwai
5516e3f476 Merge branch 'for-linus' into for-next
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-09-10 10:28:00 +02:00
Cindy Lu
2f87e9cf0c vdpa: support set mac address from vdpa tool
Add new UAPI to support the mac address from vdpa tool
Function vdpa_nl_cmd_dev_attr_set_doit() will get the
new MAC address from the vdpa tool and then set it to the device.

The usage is: vdpa dev set name vdpa_name mac **:**:**:**:**:**

Here is example:
root@L1# vdpa -jp dev config show vdpa0
{
    "config": {
        "vdpa0": {
            "mac": "82:4d:e9:5d:d7:e6",
            "link ": "up",
            "link_announce ": false,
            "mtu": 1500
        }
    }
}

root@L1# vdpa dev set name vdpa0 mac 00:11:22:33:44:55

root@L1# vdpa -jp dev config show vdpa0
{
    "config": {
        "vdpa0": {
            "mac": "00:11:22:33:44:55",
            "link ": "up",
            "link_announce ": false,
            "mtu": 1500
        }
    }
}

Signed-off-by: Cindy Lu <lulu@redhat.com>
Message-Id: <20240731031653.1047692-2-lulu@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2024-09-10 02:51:48 -04:00
zhenwei pi
74c025c5d7 virtio_balloon: introduce memory scan/reclaim info
Expose memory scan/reclaim information to the host side via virtio
balloon device.

Now we have a metric to analyze the memory performance:

y: counter increases
n: counter does not changes
h: the rate of counter change is high
l: the rate of counter change is low

OOM: VIRTIO_BALLOON_S_OOM_KILL
STALL: VIRTIO_BALLOON_S_ALLOC_STALL
ASCAN: VIRTIO_BALLOON_S_SCAN_ASYNC
DSCAN: VIRTIO_BALLOON_S_SCAN_DIRECT
ARCLM: VIRTIO_BALLOON_S_RECLAIM_ASYNC
DRCLM: VIRTIO_BALLOON_S_RECLAIM_DIRECT

- OOM[y], STALL[*], ASCAN[*], DSCAN[*], ARCLM[*], DRCLM[*]:
  the guest runs under really critial memory pressure

- OOM[n], STALL[h], ASCAN[*], DSCAN[l], ARCLM[*], DRCLM[l]:
  the memory allocation stalls due to cgroup, not the global memory
  pressure.

- OOM[n], STALL[h], ASCAN[*], DSCAN[h], ARCLM[*], DRCLM[h]:
  the memory allocation stalls due to global memory pressure. The
  performance gets hurt a lot. A high ratio between DRCLM/DSCAN shows
  quite effective memory reclaiming.

- OOM[n], STALL[h], ASCAN[*], DSCAN[h], ARCLM[*], DRCLM[l]:
  the memory allocation stalls due to global memory pressure.
  the ratio between DRCLM/DSCAN gets low, the guest OS is thrashing
  heavily, the serious case leads poor performance and difficult
  trouble shooting. Ex, sshd may block on memory allocation when
  accepting new connections, a user can't login a VM by ssh command.

- OOM[n], STALL[n], ASCAN[h], DSCAN[n], ARCLM[l], DRCLM[n]:
  the low ratio between ARCLM/ASCAN shows that the guest tries to
  reclaim more memory, but it can't. Once more memory is required in
  future, it will struggle to reclaim memory.

Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Message-Id: <20240423034109.1552866-5-pizhenwei@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-09-10 02:51:48 -04:00
zhenwei pi
c5b70a26aa virtio_balloon: introduce memory allocation stall counter
Memory allocation stall counter represents the performance/latency of
memory allocation, expose this counter to the host side by virtio
balloon device via out-of-bound way.

Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Message-Id: <20240423034109.1552866-4-pizhenwei@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-09-10 02:51:47 -04:00
zhenwei pi
6cf1c97dad virtio_balloon: introduce oom-kill invocations
When the guest OS runs under critical memory pressure, the guest
starts to kill processes. A guest monitor agent may scan 'oom_kill'
from /proc/vmstat, and reports the OOM KILL event. However, the agent
may be killed and we will loss this critical event(and the later
events).

For now we can also grep for magic words in guest kernel log from host
side. Rather than this unstable way, virtio balloon reports OOM-KILL
invocations instead.

Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Message-Id: <20240423034109.1552866-3-pizhenwei@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-09-10 02:51:47 -04:00
Greg Kroah-Hartman
f299cd11f7 Merge 6.11-rc7 into usb-next
We need the USB fixes in here as well, and this also resolves the merge
conflict in:
	drivers/usb/typec/ucsi/ucsi.c

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-09 08:40:22 +02:00
Greg Kroah-Hartman
895b4fae93 Merge 6.11-rc7 into char-misc-next
We need the char-misc fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-09 08:36:23 +02:00
Mahesh Bandewar
c259acab83 ptp/ioctl: support MONOTONIC{,_RAW} timestamps for PTP_SYS_OFFSET_EXTENDED
The ability to read the PHC (Physical Hardware Clock) alongside
multiple system clocks is currently dependent on the specific
hardware architecture. This limitation restricts the use of
PTP_SYS_OFFSET_PRECISE to certain hardware configurations.

The generic soultion which would work across all architectures
is to read the PHC along with the latency to perform PHC-read as
offered by PTP_SYS_OFFSET_EXTENDED which provides pre and post
timestamps.  However, these timestamps are currently limited
to the CLOCK_REALTIME timebase. Since CLOCK_REALTIME is affected
by NTP (or similar time synchronization services), it can
experience significant jumps forward or backward. This hinders
the precise latency measurements that PTP_SYS_OFFSET_EXTENDED
is designed to provide.

This problem could be addressed by supporting MONOTONIC_RAW
timestamps within PTP_SYS_OFFSET_EXTENDED. Unlike CLOCK_REALTIME
or CLOCK_MONOTONIC, the MONOTONIC_RAW timebase is unaffected
by NTP adjustments.

This enhancement can be implemented by utilizing one of the three
reserved words within the PTP_SYS_OFFSET_EXTENDED struct to pass
the clock-id for timestamps.  The current behavior aligns with
clock-id for CLOCK_REALTIME timebase (value of 0), ensuring
backward compatibility of the UAPI.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-08 18:40:33 +01:00
Jakub Kicinski
f723224742 netfilter pull request 24-09-06
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmbaOvIACgkQ1V2XiooU
 IOT/oQ/+JTsIkXRn8XAOgjsbxOEOvrUPAzb72Atz/cCA0RPQkHXbdZtxLDPbcN1v
 lQG6R+ZK+trS70fIMqnfSbEB/eaCWum+/kd9ZSp5RCFW4M9OVde+KTJj+IfEzsQZ
 spZRR53VnAN5jSeI2U3w4iYnyCWn5Xtp2sGETrjh43yK3cirvo7sZd/+477gZiGp
 qBDEgZrzcDzfm8IxJCCUeJdcNeM7ytoMhuyITT9YrvUt0Qo6+qPsx5hVFwMFly/M
 WkvxCR/1DR+Unhp4a30STEPPxDR0f284WoaiuxEvNAN2yP7p7O35mcStzyfhlOh+
 wB/Cc4ESBa3fPRhA+l3FDsdyrlHsi3c8VUwBWcXVryeD5e1mzyveXye9O2HtWmET
 wBtukfdPORu8JBBHxf3kmv+ZLAJLjAwyO1G1DHFruL/yEAJIDq4gluxlR+71rg7n
 qAZUvvV3MGQMCNIO3GlQ6ODtl0UcIUTHwW5//MEaxOC/aqWN/fr/keSz8xGE2Qkt
 47TFbBiGC6UR0KD+wWGAWfOlWN4G9m7E4SG++vCkXJGio4bvyGl8TxorWsh99vCv
 BMq59ZRtsS1xiEcWF48Q0Y5YtURIdCih/LcfDdbIQFzkNlHzzGpo68MHN/anqgu/
 GE4JTdgjf79lfDqJDqdnQiio7P44NZqhkeUT8yQTE1xbIKsQRNY=
 =Uxb1
 -----END PGP SIGNATURE-----

Merge tag 'nf-next-24-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

Patch #1 adds ctnetlink support for kernel side filtering for
	 deletions, from Changliang Wu.

Patch #2 updates nft_counter support to Use u64_stats_t,
	 from Sebastian Andrzej Siewior.

Patch #3 uses kmemdup_array() in all xtables frontends,
	 from Yan Zhen.

Patch #4 is a oneliner to use ERR_CAST() in nf_conntrack instead
	 opencoded casting, from Shen Lichuan.

Patch #5 removes unused argument in nftables .validate interface,
	 from Florian Westphal.

Patch #6 is a oneliner to correct a typo in nftables kdoc,
	 from Simon Horman.

Patch #7 fixes missing kdoc in nftables, also from Simon.

Patch #8 updates nftables to handle timeout less than CONFIG_HZ.

Patch #9 rejects element expiration if timeout is zero,
	 otherwise it is silently ignored.

Patch #10 disallows element expiration larger than timeout.

Patch #11 removes unnecessary READ_ONCE annotation while mutex is held.

Patch #12 adds missing READ_ONCE/WRITE_ONCE annotation in dynset.

Patch #13 annotates data-races around element expiration.

Patch #14 allocates timeout and expiration in one single set element
	  extension, they are tighly couple, no reason to keep them
	  separated anymore.

Patch #15 updates nftables to interpret zero timeout element as never
	  times out. Note that it is already possible to declare sets
	  with elements that never time out but this generalizes to all
	  kind of set with timeouts.

Patch #16 supports for element timeout and expiration updates.

* tag 'nf-next-24-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_tables: set element timeout update support
  netfilter: nf_tables: zero timeout means element never times out
  netfilter: nf_tables: consolidate timeout extension for elements
  netfilter: nf_tables: annotate data-races around element expiration
  netfilter: nft_dynset: annotate data-races around set timeout
  netfilter: nf_tables: remove annotation to access set timeout while holding lock
  netfilter: nf_tables: reject expiration higher than timeout
  netfilter: nf_tables: reject element expiration with no timeout
  netfilter: nf_tables: elements with timeout below CONFIG_HZ never expire
  netfilter: nf_tables: Add missing Kernel doc
  netfilter: nf_tables: Correct spelling in nf_tables.h
  netfilter: nf_tables: drop unused 3rd argument from validate callback ops
  netfilter: conntrack: Convert to use ERR_CAST()
  netfilter: Use kmemdup_array instead of kmemdup for multiple allocation
  netfilter: nft_counter: Use u64_stats_t for statistic.
  netfilter: ctnetlink: support CTA_FILTER for flush
====================

Link: https://patch.msgid.link/20240905232920.5481-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-06 18:39:31 -07:00
Philip Yang
663b0f1e14 drm/amdkfd: Document and define SVM events message macro
Document how to use SMI system management interface to enable and
receive SVM events. Document SVM event triggers.

Define SVM events message string format macro that could be used by user
mode for sscanf to parse the event. Add it to uAPI header file to make
it obvious that is changing uAPI in future.

No functional changes.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-06 17:55:05 -04:00
Linus Torvalds
703896be30 sound fixes for 6.11-rc7
Hopefully the last PR for 6.11, at least for this level of amount.
 
 In addition to the usual HD-audio quirks, there are more changes in
 ASoC, but all look small and device-specific fixes, and nothing stands
 out.  The only slightly big change is sunxi I2S fix, which looks quite
 safe to apply, too.
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmbaxkIOHHRpd2FpQHN1
 c2UuZGUACgkQLtJE4w1nLE/xDQ/9ESOhr5vmViDZ/LOXrmA36Mc7byvDE0ku9TsN
 bEBRSo2J5GJX3ty9ST6Pwm4jRlPAXRrqqWNhP2b+nQTTdzRWFQC58m79BSGmGWRA
 qyuFPakn1tiCdl5rvHS7hpYfzMkc8amVILhhJyhbv7Zg0UPriczPL3JI/W2GAJRs
 vMuHAD2ulSTVcJdYGgvny5JGWVQ649mQw7rfp+h3G9kXRLyOXKamGLU2e68TEZLq
 owUifBS1rC7+jXUn6dZ5MU86BLMQFCCDCOUjEEklH08Y+1DzqlS+JE80NcJlUBo8
 xNHoY1xFqDb0DAPb3w7P5IfCMBkb1I2uyHyMQg+SJeN05qXiSAKIsZhr5/NeivFq
 clB6R9l3tmN66lQEqr1ZxBEx6Sgr40Deh2qFOjyq7xsm+PRU4oPdels4+awRTWPy
 8fr/L2wWtYGGILM7iDkasgVeYQZGFJ7AG0gC0AyUoKtEcZZsQUiwfAWT85K4GLP9
 mmUMwqZH3nKtOxXgphSGbxOnep9cWNT7IJ/NXZ1iNGlByLXVDk5fIlOUK8983gCx
 duuFA1CpR3RwayGOo2UIwT62+qOTtfwGGKkHAPWjbBWa/daNxI8UEaHH/XEH3qyy
 iK8xFIkKimi95hjAA8FmVN6aGkeu3OCnQvdzbP3f3k0Tj1xe3t42xhPmQVl9yrpu
 lwIdl20=
 =npz1
 -----END PGP SIGNATURE-----

Merge tag 'sound-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "Hopefully the last PR for 6.11, at least for this level of amount.

  In addition to the usual HD-audio quirks, there are more changes in
  ASoC, but all look small and device-specific fixes, and nothing stands
  out. The only slightly big change is sunxi I2S fix, which looks quite
  safe to apply, too"

* tag 'sound-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (21 commits)
  ALSA: hda/realtek - Fix inactive headset mic jack for ASUS Vivobook 15 X1504VAP
  ALSA: hda/realtek: Support mute LED on HP Laptop 14-dq2xxx
  ALSA: hda/realtek: Enable Mute Led for HP Victus 15-fb1xxx
  ALSA: hda/realtek: extend quirks for Clevo V5[46]0
  ASoC: codecs: lpass-va-macro: set the default codec version for sm8250
  ALSA: hda: add HDMI codec ID for Intel PTL
  ALSA: hda/realtek: add patch for internal mic in Lenovo V145
  ASoC: sunxi: sun4i-i2s: fix LRCLK polarity in i2s mode
  ASoC: amd: yc: Add a quirk for MSI Bravo 17 (D7VEK)
  ASoC: mediatek: mt8188-mt6359: Modify key
  ASoc: SOF: topology: Clear SOF link platform name upon unload
  ALSA: hda/conexant: Add pincfg quirk to enable top speakers on Sirius devices
  ASoC: SOF: ipc: replace "enum sof_comp_type" field with "uint32_t"
  ASoC: fix module autoloading
  ASoC: tda7419: fix module autoloading
  ASoC: google: fix module autoloading
  ASoC: intel: fix module autoloading
  ASoC: tegra: Fix CBB error during probe()
  ASoC: dapm: Fix UAF for snd_soc_pcm_runtime object
  ASoC: Intel: soc-acpi-cht: Make Lenovo Yoga Tab 3 X90F DMI match less strict
  ...
2024-09-06 11:56:03 -07:00
Wouter Verhelst
e49dacc71e nbd: implement the WRITE_ZEROES command
The NBD protocol defines a message for zeroing out a region of an export

Add support to the kernel driver for that message.

Signed-off-by: Wouter Verhelst <w@uter.be>
Cc: Eric Blake <eblake@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240812133032.115134-3-w@uter.be
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-09-06 08:31:40 -06:00
Takashi Iwai
c491b044cf ASoC: Fixes for v6.11
A larger set of fixes than I'd like at this point, but mainly due to
 people working on fixing module autoloading by adding missing exports of
 ID tables rather than anything particularly concerning.  There are some
 other runtime fixes and quirks, and a tweak to the ABI definition for
 SOF which ensures that a struct layout doesn't vary depending on the
 architecture of the host.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmbaKqkACgkQJNaLcl1U
 h9B6FAf/QlcNrFWx2m0SLhRqeeNRX3pwp7cjNLSo1TlIaBJy5NZJ1QjE7epl1/vx
 4nzWi+3p1oql3h64RLIGUZ3dIUlFBoRBjmfF/aunxQ9VT7F8cqvLol+4sl+TcpTc
 JWL5vuoCYcCsJDRwydUxcCyEClsC51PNn+WomqDP829yvgErob6VBXHhdRyZYd35
 ffPBhAUOW0C+SOu7zGCAVqbtki2WMKc8gYweUhsuqEFQebOMkrO/sqJJsLbWje4D
 LPGFfVDQaLQ+0h7pKR0m9Fwh17VTbsHGKEqt0qWK4GUfs61odsEs+C6BWfpM+S5k
 comuzgnjtxgCV3K9URmYZx0435i+PQ==
 =kMAy
 -----END PGP SIGNATURE-----

Merge tag 'asoc-fix-v6.11-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.11

A larger set of fixes than I'd like at this point, but mainly due to
people working on fixing module autoloading by adding missing exports of
ID tables rather than anything particularly concerning.  There are some
other runtime fixes and quirks, and a tweak to the ABI definition for
SOF which ensures that a struct layout doesn't vary depending on the
architecture of the host.
2024-09-06 08:24:56 +02:00
Dave Airlie
ca10367a5a A zpos normalization fix for komeda, a register bitmask fix for nouveau,
a memory leak fix for imagination, three fixes for the recent bridge
 HDMI work, a potential DoS fix and a cache coherency for panthor, a
 change of panel compatible and a deferred-io fix when used with
 non-highmem memory.
 -----BEGIN PGP SIGNATURE-----
 
 iJUEABMJAB0WIQTkHFbLp4ejekA/qfgnX84Zoj2+dgUCZtm9wgAKCRAnX84Zoj2+
 djrnAX913s+RTPRiNY6ym8XQo7jABX8/XxfHK9kbhyWF1aoOmd2kBxp/wP15hAmu
 ZSvMyPkBewek4CdFAS0GlkrdTkFXgvRIG415PtHTVwQU2bkdzc/3OzIhBNfkILX1
 HGzozDl5ew==
 =ui/V
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-fixes-2024-09-05' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

A zpos normalization fix for komeda, a register bitmask fix for nouveau,
a memory leak fix for imagination, three fixes for the recent bridge
HDMI work, a potential DoS fix and a cache coherency for panthor, a
change of panel compatible and a deferred-io fix when used with
non-highmem memory.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240905-original-radical-guan-e7a2ae@houat
2024-09-06 11:25:46 +10:00
Hans Verkuil
c7a2925873 media: input: serio.h: add SERIO_EXTRON_DA_HD_PLUS
Add a new serio ID for the Extron DA HD 4K Plus series of 4K HDMI
Distribution Amplifiers. These devices support CEC over the serial
port, so a new serio ID is needed to be able to associate the CEC
driver.

Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2024-09-05 20:11:43 +02:00
Erling Ljunggren
6fe0593bfc media: videodev2.h: add V4L2_CAP_EDID
Add capability flag to indicate that the device is an EDID-only device.

Signed-off-by: Erling Ljunggren <hljunggr@cisco.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Reviewed-by: Sebastian Fricke <sebastian.fricke@collabora.com>
Reviewed-by: Ricardo Ribalda <ribalda@chromium.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2024-09-05 20:10:34 +02:00
Aleksa Sarai
4356d575ef fhandle: expose u64 mount id to name_to_handle_at(2)
Now that we provide a unique 64-bit mount ID interface in statx(2), we
can now provide a race-free way for name_to_handle_at(2) to provide a
file handle and corresponding mount without needing to worry about
racing with /proc/mountinfo parsing or having to open a file just to do
statx(2).

While this is not necessary if you are using AT_EMPTY_PATH and don't
care about an extra statx(2) call, users that pass full paths into
name_to_handle_at(2) need to know which mount the file handle comes from
(to make sure they don't try to open_by_handle_at a file handle from a
different filesystem) and switching to AT_EMPTY_PATH would require
allocating a file for every name_to_handle_at(2) call, turning

  err = name_to_handle_at(-EBADF, "/foo/bar/baz", &handle, &mntid,
                          AT_HANDLE_MNT_ID_UNIQUE);

into

  int fd = openat(-EBADF, "/foo/bar/baz", O_PATH | O_CLOEXEC);
  err1 = name_to_handle_at(fd, "", &handle, &unused_mntid, AT_EMPTY_PATH);
  err2 = statx(fd, "", AT_EMPTY_PATH, STATX_MNT_ID_UNIQUE, &statxbuf);
  mntid = statxbuf.stx_mnt_id;
  close(fd);

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Link: https://lore.kernel.org/r/20240828-exportfs-u64-mount-id-v3-2-10c2c4c16708@cyphar.com
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-05 11:39:17 +02:00
Aleksa Sarai
b4fef22c2f uapi: explain how per-syscall AT_* flags should be allocated
Unfortunately, the way we have gone about adding new AT_* flags has
been a little messy. In the beginning, all of the AT_* flags had generic
meanings and so it made sense to share the flag bits indiscriminately.
However, we inevitably ran into syscalls that needed their own
syscall-specific flags. Due to the lack of a planned out policy, we
ended up with the following situations:

 * Existing syscalls adding new features tended to use new AT_* bits,
   with some effort taken to try to re-use bits for flags that were so
   obviously syscall specific that they only make sense for a single
   syscall (such as the AT_EACCESS/AT_REMOVEDIR/AT_HANDLE_FID triplet).

   Given the constraints of bitflags, this works well in practice, but
   ideally (to avoid future confusion) we would plan ahead and define a
   set of "per-syscall bits" ahead of time so that when allocating new
   bits we don't end up with a complete mish-mash of which bits are
   supposed to be per-syscall and which aren't.

 * New syscalls dealt with this in several ways:

   - Some syscalls (like renameat2(2), move_mount(2), fsopen(2), and
     fspick(2)) created their separate own flag spaces that have no
     overlap with the AT_* flags. Most of these ended up allocating
     their bits sequentually.

     In the case of move_mount(2) and fspick(2), several flags have
     identical meanings to AT_* flags but were allocated in their own
     flag space.

     This makes sense for syscalls that will never share AT_* flags, but
     for some syscalls this leads to duplication with AT_* flags in a
     way that could cause confusion (if renameat2(2) grew a
     RENAME_EMPTY_PATH it seems likely that users could mistake it for
     AT_EMPTY_PATH since it is an *at(2) syscall).

   - Some syscalls unfortunately ended up both creating their own flag
     space while also using bits from other flag spaces. The most
     obvious example is open_tree(2), where the standard usage ends up
     using flags from *THREE* separate flag spaces:

       open_tree(AT_FDCWD, "/foo", OPEN_TREE_CLONE|O_CLOEXEC|AT_RECURSIVE);

     (Note that O_CLOEXEC is also platform-specific, so several future
     OPEN_TREE_* bits are also made unusable in one fell swoop.)

It's not entirely clear to me what the "right" choice is for new
syscalls. Just saying that all future VFS syscalls should use AT_* flags
doesn't seem practical. openat2(2) has RESOLVE_* flags (many of which
don't make much sense to burn generic AT_* flags for) and move_mount(2)
has separate AT_*-like flags for both the source and target so separate
flags are needed anyway (though it seems possible that renameat2(2)
could grow *_EMPTY_PATH flags at some point, and it's a bit of a shame
they can't be reused).

But at least for syscalls that _do_ choose to use AT_* flags, we should
explicitly state the policy that 0x2ff is currently intended for
per-syscall flags and that new flags should err on the side of
overlapping with existing flag bits (so we can extend the scope of
generic flags in the future if necessary).

And add AT_* aliases for the RENAME_* flags to further cement that
renameat2(2) is an *at(2) flag, just with its own per-syscall flags.

Suggested-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Link: https://lore.kernel.org/r/20240828-exportfs-u64-mount-id-v3-1-10c2c4c16708@cyphar.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-05 11:37:45 +02:00
Mary Guillemard
5f7762042f drm/panthor: Restrict high priorities on group_create
We were allowing any users to create a high priority group without any
permission checks. As a result, this was allowing possible denial of
service.

We now only allow the DRM master or users with the CAP_SYS_NICE
capability to set higher priorities than PANTHOR_GROUP_PRIORITY_MEDIUM.

As the sole user of that uAPI lives in Mesa and hardcode a value of
MEDIUM [1], this should be safe to do.

Additionally, as those checks are performed at the ioctl level,
panthor_group_create now only check for priority level validity.

[1]f390835074/src/gallium/drivers/panfrost/pan_csf.c (L1038)

Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block")
Cc: stable@vger.kernel.org
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240903144955.144278-2-mary.guillemard@collabora.com
2024-09-05 09:33:33 +02:00
Ido Schimmel
1083d733eb ipv4: Fix user space build failure due to header change
RT_TOS() from include/uapi/linux/in_route.h is defined using
IPTOS_TOS_MASK from include/uapi/linux/ip.h. This is problematic for
files such as include/net/ip_fib.h that want to use RT_TOS() as without
including both header files kernel compilation fails:

In file included from ./include/net/ip_fib.h:25,
                 from ./include/net/route.h:27,
                 from ./include/net/lwtunnel.h:9,
                 from net/core/dst.c:24:
./include/net/ip_fib.h: In function ‘fib_dscp_masked_match’:
./include/uapi/linux/in_route.h:31:32: error: ‘IPTOS_TOS_MASK’ undeclared (first use in this function)
   31 | #define RT_TOS(tos)     ((tos)&IPTOS_TOS_MASK)
      |                                ^~~~~~~~~~~~~~
./include/net/ip_fib.h:440:45: note: in expansion of macro ‘RT_TOS’
  440 |         return dscp == inet_dsfield_to_dscp(RT_TOS(fl4->flowi4_tos));

Therefore, cited commit changed linux/in_route.h to include linux/ip.h.
However, as reported by David, this breaks iproute2 compilation due
overlapping definitions between linux/ip.h and
/usr/include/netinet/ip.h:

In file included from ../include/uapi/linux/in_route.h:5,
                 from iproute.c:19:
../include/uapi/linux/ip.h:25:9: warning: "IPTOS_TOS" redefined
   25 | #define IPTOS_TOS(tos)          ((tos)&IPTOS_TOS_MASK)
      |         ^~~~~~~~~
In file included from iproute.c:17:
/usr/include/netinet/ip.h:222:9: note: this is the location of the previous definition
  222 | #define IPTOS_TOS(tos)          ((tos) & IPTOS_TOS_MASK)

Fix by changing include/net/ip_fib.h to include linux/ip.h. Note that
usage of RT_TOS() should not spread further in the kernel due to recent
work in this area.

Fixes: 1fa3314c14c6 ("ipv4: Centralize TOS matching")
Reported-by: David Ahern <dsahern@kernel.org>
Closes: https://lore.kernel.org/netdev/2f5146ff-507d-4cab-a195-b28c0c9e654e@kernel.org/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20240903133554.2807343-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-04 16:40:33 -07:00
Mariusz Tkaczyk
4e893545ef PCI/NPEM: Add Native PCIe Enclosure Management support
Native PCIe Enclosure Management (NPEM, PCIe r6.1 sec 6.28) allows managing
LEDs in storage enclosures. NPEM is indication oriented and it does not
give direct access to LEDs. Although each indication *could* represent an
individual LED, multiple indications could also be represented as a single,
multi-color LED or a single LED blinking in a specific interval.  The
specification leaves that open.

Each enabled indication (capability register bit on) is represented as a
ledclass_dev which can be controlled through sysfs. For every ledclass
device only 2 brightness states are allowed: LED_ON (1) or LED_OFF (0).
This corresponds to the NPEM control register (Indication bit on/off).

Ledclass devices appear in sysfs as child devices (subdirectory) of PCI
device which has an NPEM Extended Capability and indication is enabled in
NPEM capability register. For example, these are LEDs created for pcieport
"10000:02:05.0" on my setup:

  leds/
  ├── 10000:02:05.0:enclosure:fail
  ├── 10000:02:05.0:enclosure:locate
  ├── 10000:02:05.0:enclosure:ok
  └── 10000:02:05.0:enclosure:rebuild

They can be also found in "/sys/class/leds" directory. The parent PCIe
device domain/bus/device/function address is used to guarantee uniqueness
across leds subsystem.

To enable/disable a "fail" indication, the "brightness" file can be edited:

  echo 1 > ./leds/10000:02:05.0:enclosure:fail/brightness
  echo 0 > ./leds/10000:02:05.0:enclosure:fail/brightness

PCIe r6.1, sec 7.9.19.2 defines the possible indications.

Multiple indications for same parent PCIe device can conflict and hardware
may update them when processing new request. To avoid issues, driver
refresh all indications by reading back control register.

This driver expects to be the exclusive NPEM extended capability manager.
It waits up to 1 second after imposing new request, it doesn't verify if
controller is busy before write, and it assumes the mutex lock gives
protection from concurrent updates.

If _DSM LED management is available, we assume the platform may be using
NPEM for its own purposes (see PCI Firmware Spec r3.3 sec 4.7), so the
driver does not use NPEM. A future patch will add _DSM support; an info
message notes whether NPEM or _DSM is being used.

NPEM is a PCIe extended capability so it should be registered in
pcie_init_capabilities() but it is not possible due to LED dependency.  The
parent pci_device must be added earlier for led_classdev_register() to be
successful. NPEM does not require configuration on kernel side, so it is
safe to register LED devices later.

Link: https://lore.kernel.org/r/20240904104848.23480-3-mariusz.tkaczyk@linux.intel.com
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-09-04 17:25:12 -05:00
Tejun Heo
649e980dad Merge branch 'bpf/master' into for-6.12
Pull bpf/master to receive baebe9aaba1e ("bpf: allow passing struct
bpf_iter_<type> as kfunc arguments") and related changes in preparation for
the DSQ iterator patchset.

Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-04 11:41:32 -10:00
Alexander Mikhalitsyn
16e1503eaf fuse: allow idmapped mounts
Now we have everything in place and we can allow idmapped mounts
by setting the FS_ALLOW_IDMAP flag. Notice that real availability
of idmapped mounts will depend on the fuse daemon. Fuse daemon
have to set FUSE_ALLOW_IDMAP flag in the FUSE_INIT reply.

To discuss:
- we enable idmapped mounts support only if "default_permissions" mode is
enabled, because otherwise we would need to deal with UID/GID mappings in
the userspace side OR provide the userspace with idmapped
req->in.h.uid/req->in.h.gid values which is not something that we probably
want to. Idmapped mounts philosophy is not about faking caller uid/gid.

Some extra links and examples:

- libfuse support
https://github.com/mihalicyn/libfuse/commits/idmap_support

- fuse-overlayfs support:
https://github.com/mihalicyn/fuse-overlayfs/commits/idmap_support

- cephfs-fuse conversion example
https://github.com/mihalicyn/ceph/commits/fuse_idmap

- glusterfs conversion example
https://github.com/mihalicyn/glusterfs/commits/fuse_idmap

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-09-04 16:51:11 +02:00
Alexander Mikhalitsyn
aa16880d9f fuse: add basic infrastructure to support idmappings
Add some preparational changes in fuse_get_req/fuse_force_creds
to handle idmappings.

Miklos suggested [1], [2] to change the meaning of in.h.uid/in.h.gid
fields when daemon declares support for idmapped mounts. In a new semantic,
we fill uid/gid values in fuse header with a id-mapped caller uid/gid (for
requests which create new inodes), for all the rest cases we just send -1
to userspace.

No functional changes intended.

Link: https://lore.kernel.org/all/CAJfpegsVY97_5mHSc06mSw79FehFWtoXT=hhTUK_E-Yhr7OAuQ@mail.gmail.com/ [1]
Link: https://lore.kernel.org/all/CAJfpegtHQsEUuFq1k4ZbTD3E1h-GsrN3PWyv7X8cg6sfU_W2Yw@mail.gmail.com/ [2]
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-09-04 16:48:22 +02:00
Joey Gouly
1751981992 arm64/ptrace: add support for FEAT_POE
Add a regset for POE containing POR_EL0.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20240822151113.1479789-21-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-04 12:54:05 +01:00
Pablo Neira Ayuso
8bfb74ae12 netfilter: nf_tables: zero timeout means element never times out
This patch uses zero as timeout marker for those elements that never expire
when the element is created.

If userspace provides no timeout for an element, then the default set
timeout applies. However, if no default set timeout is specified and
timeout flag is set on, then timeout extension is allocated and timeout
is set to zero to allow for future updates.

Use of zero a never timeout marker has been suggested by Phil Sutter.

Note that, in older kernels, it is already possible to define elements
that never expire by declaring a set with the set timeout flag set on
and no global set timeout, in this case, new element with no explicit
timeout never expire do not allocate the timeout extension, hence, they
never expire. This approach makes it complicated to accomodate element
timeout update, because element extensions do not support reallocations.
Therefore, allocate the timeout extension and use the new marker for
this case, but do not expose it to userspace to retain backward
compatibility in the set listing.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-09-03 18:19:40 +02:00
Greg Kroah-Hartman
f53835f110 IIO: 1st set of new device support, features and cleanup for 6.12
Includes a merge of spi-mos-config branch from spi.git that brings
 support needed for the AD4000 driver.
 
 Lots of new device support this time including 9 new drivers and substantial
 changes to add new support to several more.
 
 New device support
 ------------------
 
 Given we have a lot of new support, I've subcategorized them:
 
 Substantial changes, or new driver
 **********************************
 
 adi,ad4000
 - New driver for this high speed ADC.
 adi,ad4695
 - New driver supporting AD4690, AD4696, AD4697 and AD4698 ADCs.
 - Follow up series added triggered buffer support.
 adi,ad7380
 - Add support for single ended parts, AD7386, ADC7387, AD7388 and -4 variants.
   (driver previously only support differential parts).
   These variants have an additional front end MUX so only half the channels
   can be sampled efficiently.
 adi,ad9467
 - Refactor and extend driver to support ad9643, ad9449 and ad9652 high speed
   ADCs.
 adi,adxl380
 - New driver for this low power accelerometer.
 adi,ltc2664
 - New driver supporting LTC2664 and LTC2672 DACs.
 microchip,pac1921
 - New driver for this power/current monitor chip.
 rohm,bh1745
 - New driver for this RGBC colour sensor.
 rohm,bu27034anuc
 - The original bu27034 was canceled before mass production, so the
   driver is modified to support the BU27034ANUC which had some significant
   differences.  DT compatible changed to avoid chance of old driver ever
   binding to real hardware.
 sciosense,ens210
 - New driver for ens210, ens210a, ens211, ens212, ens213a, and ens215
   temperature and humidity sensors (all register compatible up to some
   conversion time differences)
 sensiron,sdp500
 - New driver for this differential pressure sensor.
 tyhx,hx9023s
 - New driver to support this capacitive proximity sensor.
 
 Minor changes to support new devices
 ************************************
 
 adi,adf4377
 - Add support for the single output adf4378.
 kionix,kxcjk-1013
 - Add support for KX022-1020 accelerometer (binding and ID table only)
 liteon,ltrf216a
 - Add support for ltr-308.  A few minor differences in features set
 rockchip,saradc
 - Add ID for rk3576-saradc
 sensortek,stk3310
 - Add ID for stk3013 proximity sensor which (despite documentation) has
   an ambient light sensor and is compatible with existing parts.
 
 Documentation updates
 ---------------------
 
 Generalize ABI docs for shunt resistor attribute
 Improve calibscale and calibbias related documentation.  A couple of follow
 up patches to resolve duplicate documentation that resulted.
 
 New core features
 -----------------
 
 backend
 - Add option for debugfs - useful for test pattern control
 - Use this for both adi-axi-adc and adi-axi-dac
 trigger suspend
 - Add functions to allow triggers to be suspended. This avoids problems
   when a device enters suspend to idle with a sysfs trigger. Use it for now
   in the bmi323 only.
 
 New driver features
 -------------------
 
 adi,ad7192
 - Add option to be a clock provider (+ additional clock config options)
 adi,ad7380
 - Add documentation for this fairly new driver.
 adi,ad9461
 - Provide control of test modes and backend validation blocks used
   to identify problems (via debugfs)
 adi,ad9739
 - Add backend debugfs and docs for what is provided via adi-axi-dac
 avago,apds9960
 - Add proximity and gesture calibration offset control
 bosch,bmp280
 - Triggered buffer support including adding raw+scale output for sysfs.
 liteon,ltr390
 - Add configuration of integration time and scale.
 stm,dfsdm
 - Convert this SD modulator driver to backend framework and add support
   for channel scaling + modern channel bindings.
 
 Treewide cleanup
 ----------------
 
 iio_dev->masklength: Making it private.
 - Provide access function to read the core compute channel mask length
   and a macro to iterate over elements in the active_scan_mask.
 - Enables marking masklength __private preventing drivers from
   writing it without triggering a build warning whilst minimizing overhead
   in what are typically hot paths.
 - Convert all drivers and finally mark it private.
   Merge conflicts resolved in drivers applied after this point.
 Constify regmap_bus
 - These are never modified, so mark them const.
 
 Core cleanup
 ------------
 
 backend
 - A few late breaking bits of feedback (unused variable, error messages)
 dma-buffer
 - Namespace exports.
 core
 - Drop unused assignment.
 
 Driver cleanup
 --------------
 
 adi,ad4695
 - Fixing binding to reflect that common-mode-channel is a scalar.
 adi,ad7280a
 - Use __free(kfree) to simplify freeing of receive buffer.
 adi,ad7606
 - Various dt-binding cleanup and improvements.
 - Fix oversampling related gpio handling.
 - Make polarity of standby gpio match documentation.
 - use guard() to simplify lock handling.
 adi,ad7768
 - Use device_for_each_child_node_scoped() instead of fwnode equivalent.
 adi,ad7124
 - Reduce SPI transfers by avoiding separate writes to different fields
   in the same register.
 - Start the ADC in idle mode.
 adi,adis
 - Drop ifdefs in favor of IS_ENABLED.
 adi,admv8818
 - Fix wrong ABI docs.
 asahi-kasei,ak8975
 - Drop a prefix free compatible accidentally added recently.
 aspeed,adc
 - Use of_property_present() instead of of_find_property() to see if the
   property is there or not.
 atmel,at91,
 - Use __free(kfree) to simplify freeing of channel related array.
 bosch,bma400
 - Use __free(kfree) to simplify freeing a locally allocated string.
 bosch,bmc150
 - Add missing mount-matrix binding docs.
 bosch,bme680
 - Fix read/write to ensure multiple necessary sequential reads without
   device configuration change.
 - Drop unnecessary type casts and use more appropriate data types.
 - Drop some left over ACPI code as ACPI support was removed due to invalid
   IDs (and no known users).
 - Sort headers consistently.
 - Avoid unnecessary duplicate read and redundant read of gas config.
 - Use bulk reads to get calibration data.
 - Reorder allocation of IIO device to be prior to device init.
 - Add remaining read/write buffers to the union used already for all others.
 - Tidy up error checks for consistency of style, including dev_err_probe()
 - Bring the device startup procedure inline with the vendor code.
 - Reorder code so mode forcing is more obvious occurring where needed.
 - Tidy up data locality in reading functions so no magic data is stored
   in state structures just to get it across function calls.
 - Make a local lookup table static to avoid placing it on the stack.
 bosch,bmp280
 - Fix BME280 regmap to not include registers it doesn't have.
 - Wait a little longer after config to allow for maximum possible necessary
   wait.
 - Reorganize headers.
 - Make conversion_time_max array static to avoid placing it on the stack.
 maxim,max1363
 - Use __free(kfree) to simplify freeing transmission buffer.
 microchip,mcp3964
 - Use devm_regulator_get_enable_read_voltage()
 microchip,mcp3911
 - Use devm_regulator_get_enable_read_voltage()
 microchip,mcp4728
 - Use devm_regulator_get_enable_read_voltage()
 microchip,mcp4922
 - Use devm_regulator_get_enable_read_voltage() and devm_* to allow
   dropping of explicit remove() callback.
 onnn,noa1305
 - Various tidy up.
 - Provide available scale values.
 - Make integration time configurable.
 - Fix up integration time look up (/2 error)
 ti,dac7311
 - Check if spi_setup() succeeded.
 ti,tsc2046
 - Use __free(kfree) to simplify freeing rx and tx buffers.
 - Use devm_regulator_get_enable_read_voltage()
 
 Various minor fixes not called out explicitly.
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEEbilms4eEBlKRJoGxVIU0mcT0FogFAmbHhnARHGppYzIzQGtl
 cm5lbC5vcmcACgkQVIU0mcT0FohpshAApFDoTkyYMa7x1r5WUZ/5j474319LvwDO
 /9UIDIgR8qSzR2fDYl+LR03ZWknsOXF4lfCrCf65zPaR/8bB7TsjD8A7uIPVAKDF
 Tu+nSgBworJcvokPzygtrjoor2u2LCXdZVurYrFggMZ833LY5HTotFDAB32wx3QM
 p7p7OU0LgAZ8VR+ykzkbwp9NjOSrgD2mD7emy7Enu4h/OzLzst0c15KkUaOpnSUZ
 8R/+tz5lERrF+ACjWm+sWSe8ry2SkQppd8G8pSXyUM0uD2KO0I78FEpA3wUB2H++
 wiki1cm1kOM/ljHbXn2tqp5s+A8p6d0/LOCZm9bUi9kmtP5J2ky2iZmpZPraO52d
 +jbnHh/GyvoyIzeZRJZtp9h4hWTPNV2pgvb5BHD7Fek5rxOXXBlulDd695Ygbfq5
 vxiXYfN+ozVQk3/1mm0FwA34VZSoHADvzTxANQE9Vi99ywenpqJ5VYWQm/Bf0oHm
 HMH1sCcrmPHF9NOEUPV2uCanTQ20Q+OO89xOUBDGma1FKh6108wSont5c6GX/dKu
 sChUdllXSlNUR8VoiAYSFEP/U+gXnRE8Scxuk1Xx12RuKYpe0NNdRyRtj86kTU+1
 e6gHY90NskQCSVvOiivvo/rNTO08EZND9V3pbD/2HxaFvM4zw/iJtNR49DLdTkpX
 DfiCl2BAbLw=
 =eXUA
 -----END PGP SIGNATURE-----

Merge tag 'iio-for-6.12a' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-testing

Jonathan writes:

IIO: 1st set of new device support, features and cleanup for 6.12

Includes a merge of spi-mos-config branch from spi.git that brings
support needed for the AD4000 driver.

Lots of new device support this time including 9 new drivers and substantial
changes to add new support to several more.

New device support
------------------

Given we have a lot of new support, I've subcategorized them:

Substantial changes, or new driver
**********************************

adi,ad4000
- New driver for this high speed ADC.
adi,ad4695
- New driver supporting AD4690, AD4696, AD4697 and AD4698 ADCs.
- Follow up series added triggered buffer support.
adi,ad7380
- Add support for single ended parts, AD7386, ADC7387, AD7388 and -4 variants.
  (driver previously only support differential parts).
  These variants have an additional front end MUX so only half the channels
  can be sampled efficiently.
adi,ad9467
- Refactor and extend driver to support ad9643, ad9449 and ad9652 high speed
  ADCs.
adi,adxl380
- New driver for this low power accelerometer.
adi,ltc2664
- New driver supporting LTC2664 and LTC2672 DACs.
microchip,pac1921
- New driver for this power/current monitor chip.
rohm,bh1745
- New driver for this RGBC colour sensor.
rohm,bu27034anuc
- The original bu27034 was canceled before mass production, so the
  driver is modified to support the BU27034ANUC which had some significant
  differences.  DT compatible changed to avoid chance of old driver ever
  binding to real hardware.
sciosense,ens210
- New driver for ens210, ens210a, ens211, ens212, ens213a, and ens215
  temperature and humidity sensors (all register compatible up to some
  conversion time differences)
sensiron,sdp500
- New driver for this differential pressure sensor.
tyhx,hx9023s
- New driver to support this capacitive proximity sensor.

Minor changes to support new devices
************************************

adi,adf4377
- Add support for the single output adf4378.
kionix,kxcjk-1013
- Add support for KX022-1020 accelerometer (binding and ID table only)
liteon,ltrf216a
- Add support for ltr-308.  A few minor differences in features set
rockchip,saradc
- Add ID for rk3576-saradc
sensortek,stk3310
- Add ID for stk3013 proximity sensor which (despite documentation) has
  an ambient light sensor and is compatible with existing parts.

Documentation updates
---------------------

Generalize ABI docs for shunt resistor attribute
Improve calibscale and calibbias related documentation.  A couple of follow
up patches to resolve duplicate documentation that resulted.

New core features
-----------------

backend
- Add option for debugfs - useful for test pattern control
- Use this for both adi-axi-adc and adi-axi-dac
trigger suspend
- Add functions to allow triggers to be suspended. This avoids problems
  when a device enters suspend to idle with a sysfs trigger. Use it for now
  in the bmi323 only.

New driver features
-------------------

adi,ad7192
- Add option to be a clock provider (+ additional clock config options)
adi,ad7380
- Add documentation for this fairly new driver.
adi,ad9461
- Provide control of test modes and backend validation blocks used
  to identify problems (via debugfs)
adi,ad9739
- Add backend debugfs and docs for what is provided via adi-axi-dac
avago,apds9960
- Add proximity and gesture calibration offset control
bosch,bmp280
- Triggered buffer support including adding raw+scale output for sysfs.
liteon,ltr390
- Add configuration of integration time and scale.
stm,dfsdm
- Convert this SD modulator driver to backend framework and add support
  for channel scaling + modern channel bindings.

Treewide cleanup
----------------

iio_dev->masklength: Making it private.
- Provide access function to read the core compute channel mask length
  and a macro to iterate over elements in the active_scan_mask.
- Enables marking masklength __private preventing drivers from
  writing it without triggering a build warning whilst minimizing overhead
  in what are typically hot paths.
- Convert all drivers and finally mark it private.
  Merge conflicts resolved in drivers applied after this point.
Constify regmap_bus
- These are never modified, so mark them const.

Core cleanup
------------

backend
- A few late breaking bits of feedback (unused variable, error messages)
dma-buffer
- Namespace exports.
core
- Drop unused assignment.

Driver cleanup
--------------

adi,ad4695
- Fixing binding to reflect that common-mode-channel is a scalar.
adi,ad7280a
- Use __free(kfree) to simplify freeing of receive buffer.
adi,ad7606
- Various dt-binding cleanup and improvements.
- Fix oversampling related gpio handling.
- Make polarity of standby gpio match documentation.
- use guard() to simplify lock handling.
adi,ad7768
- Use device_for_each_child_node_scoped() instead of fwnode equivalent.
adi,ad7124
- Reduce SPI transfers by avoiding separate writes to different fields
  in the same register.
- Start the ADC in idle mode.
adi,adis
- Drop ifdefs in favor of IS_ENABLED.
adi,admv8818
- Fix wrong ABI docs.
asahi-kasei,ak8975
- Drop a prefix free compatible accidentally added recently.
aspeed,adc
- Use of_property_present() instead of of_find_property() to see if the
  property is there or not.
atmel,at91,
- Use __free(kfree) to simplify freeing of channel related array.
bosch,bma400
- Use __free(kfree) to simplify freeing a locally allocated string.
bosch,bmc150
- Add missing mount-matrix binding docs.
bosch,bme680
- Fix read/write to ensure multiple necessary sequential reads without
  device configuration change.
- Drop unnecessary type casts and use more appropriate data types.
- Drop some left over ACPI code as ACPI support was removed due to invalid
  IDs (and no known users).
- Sort headers consistently.
- Avoid unnecessary duplicate read and redundant read of gas config.
- Use bulk reads to get calibration data.
- Reorder allocation of IIO device to be prior to device init.
- Add remaining read/write buffers to the union used already for all others.
- Tidy up error checks for consistency of style, including dev_err_probe()
- Bring the device startup procedure inline with the vendor code.
- Reorder code so mode forcing is more obvious occurring where needed.
- Tidy up data locality in reading functions so no magic data is stored
  in state structures just to get it across function calls.
- Make a local lookup table static to avoid placing it on the stack.
bosch,bmp280
- Fix BME280 regmap to not include registers it doesn't have.
- Wait a little longer after config to allow for maximum possible necessary
  wait.
- Reorganize headers.
- Make conversion_time_max array static to avoid placing it on the stack.
maxim,max1363
- Use __free(kfree) to simplify freeing transmission buffer.
microchip,mcp3964
- Use devm_regulator_get_enable_read_voltage()
microchip,mcp3911
- Use devm_regulator_get_enable_read_voltage()
microchip,mcp4728
- Use devm_regulator_get_enable_read_voltage()
microchip,mcp4922
- Use devm_regulator_get_enable_read_voltage() and devm_* to allow
  dropping of explicit remove() callback.
onnn,noa1305
- Various tidy up.
- Provide available scale values.
- Make integration time configurable.
- Fix up integration time look up (/2 error)
ti,dac7311
- Check if spi_setup() succeeded.
ti,tsc2046
- Use __free(kfree) to simplify freeing rx and tx buffers.
- Use devm_regulator_get_enable_read_voltage()

Various minor fixes not called out explicitly.

* tag 'iio-for-6.12a' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (250 commits)
  drivers:iio:Fix the NULL vs IS_ERR() bug for debugfs_create_dir()
  iio: sgp40: retain documentation in driver
  iio: ABI: remove duplicate in_resistance_calibbias
  dt-bindings: iio: st,stm32-adc: add top-level constraints
  iio: ABI: add missing calibbias attributes
  iio: ABI: add missing calibscale attributes
  iio: ABI: sort calibscale attributes
  iio: ABI: document calibscale_available attributes
  iio: light: ltr390: Calculate 'counts_per_uvi' dynamically
  iio: light: ltr390: Add ALS channel and support for gain and resolution
  doc: iio: ad4695: document buffered read
  iio: adc: ad4695: implement triggered buffer
  iio: proximity: hx9023s: Fix error code in hx9023s_property_get()
  iio: light: noa1305: Fix up integration time look up
  iio: humidity: Add support for ENS210
  dt-bindings: iio: humidity: add ENS210 sensor family
  iio: imu: adis16460: drop ifdef around CONFIG_DEBUG_FS
  iio: imu: adis16400: drop ifdef around CONFIG_DEBUG_FS
  iio: imu: adis16480: drop ifdef around CONFIG_DEBUG_FS
  iio: imu: adis16475: drop ifdef around CONFIG_DEBUG_FS
  ...
2024-09-03 11:32:16 +02:00
Greg Kroah-Hartman
35f4a62964 Merge 6.11-rc6 into usb-next
We need the USB fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-03 07:42:58 +02:00
Chandramohan Akula
181028a0d8 RDMA/bnxt_re: Share a page to expose per SRQ info with userspace
Gen P7 adapters needs to share a toggle bits information received
in kernel driver with the user space. User space needs this
info to arm the SRQ.

User space application can get this page using the
UAPI routines. Library will mmap this page and get the
toggle bits to be used in the next ARM Doorbell.

Uses a hash list to map the SRQ structure from the SRQ ID.
SRQ structure is retrieved from the hash list while the
library calls the UAPI routine to get the toggle page
mapping. Currently the full page is mapped per SRQ. This
can be optimized to enable multiple SRQs from the same
application share the same page and different offsets
in the page

Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://patch.msgid.link/1724945645-14989-4-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-02 10:10:36 +03:00
Matthew Wilcox (Oracle)
09022bc196 mm: remove PG_error
The PG_error bit is now unused; delete it and free up a bit in
page->flags.

Link: https://lkml.kernel.org/r/20240807193528.1865100-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:26:05 -07:00
Connor Abbott
d7eafed322 drm/msm: Expose expanded UBWC config uapi
This adds extra parameters that affect UBWC tiling that will be used by
the Mesa implementation of VK_EXT_host_image_copy.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/607401/
Signed-off-by: Rob Clark <robdclark@chromium.org>
2024-08-30 10:41:19 -07:00
Ian Kent
433f9d76a0 autofs: add per dentry expire timeout
Add ability to set per-dentry mount expire timeout to autofs.

There are two fairly well known automounter map formats, the autofs
format and the amd format (more or less System V and Berkley).

Some time ago Linux autofs added an amd map format parser that
implemented a fair amount of the amd functionality. This was done
within the autofs infrastructure and some functionality wasn't
implemented because it either didn't make sense or required extra
kernel changes. The idea was to restrict changes to be within the
existing autofs functionality as much as possible and leave changes
with a wider scope to be considered later.

One of these changes is implementing the amd options:
1) "unmount", expire this mount according to a timeout (same as the
   current autofs default).
2) "nounmount", don't expire this mount (same as setting the autofs
   timeout to 0 except only for this specific mount) .
3) "utimeout=<seconds>", expire this mount using the specified
   timeout (again same as setting the autofs timeout but only for
   this mount).

To implement these options per-dentry expire timeouts need to be
implemented for autofs indirect mounts. This is because all map keys
(mounts) for autofs indirect mounts use an expire timeout stored in
the autofs mount super block info. structure and all indirect mounts
use the same expire timeout.

Now I have a request to add the "nounmount" option so I need to add
the per-dentry expire handling to the kernel implementation to do this.

The implementation uses the trailing path component to identify the
mount (and is also used as the autofs map key) which is passed in the
autofs_dev_ioctl structure path field. The expire timeout is passed
in autofs_dev_ioctl timeout field (well, of the timeout union).

If the passed in timeout is equal to -1 the per-dentry timeout and
flag are cleared providing for the "unmount" option. If the timeout
is greater than or equal to 0 the timeout is set to the value and the
flag is also set. If the dentry timeout is 0 the dentry will not expire
by timeout which enables the implementation of the "nounmount" option
for the specific mount. When the dentry timeout is greater than zero it
allows for the implementation of the "utimeout=<seconds>" option.

Signed-off-by: Ian Kent <raven@themaw.net>
Link: https://lore.kernel.org/r/20240814090231.963520-1-raven@themaw.net
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30 08:22:36 +02:00
Jakub Kicinski
3cbd2090d3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/faraday/ftgmac100.c
  4186c8d9e6af ("net: ftgmac100: Ensure tx descriptor updates are visible")
  e24a6c874601 ("net: ftgmac100: Get link speed and duplex for NC-SI")
https://lore.kernel.org/0b851ec5-f91d-4dd3-99da-e81b98c9ed28@kernel.org

net/ipv4/tcp.c
  bac76cf89816 ("tcp: fix forever orphan socket caused by tcp_abort")
  edefba66d929 ("tcp: rstreason: introduce SK_RST_REASON_TCP_STATE for active reset")
https://lore.kernel.org/20240828112207.5c199d41@canb.auug.org.au

No adjacent changes.

Link: https://patch.msgid.link/20240829130829.39148-1-pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-29 11:49:10 -07:00
Jens Axboe
ae98dbf43d io_uring/kbuf: add support for incremental buffer consumption
By default, any recv/read operation that uses provided buffers will
consume at least 1 buffer fully (and maybe more, in case of bundles).
This adds support for incremental consumption, meaning that an
application may add large buffers, and each read/recv will just consume
the part of the buffer that it needs.

For example, let's say an application registers 1MB buffers in a
provided buffer ring, for streaming receives. If it gets a short recv,
then the full 1MB buffer will be consumed and passed back to the
application. With incremental consumption, only the part that was
actually used is consumed, and the buffer remains the current one.

This means that both the application and the kernel needs to keep track
of what the current receive point is. Each recv will still pass back a
buffer ID and the size consumed, the only difference is that before the
next receive would always be the next buffer in the ring. Now the same
buffer ID may return multiple receives, each at an offset into that
buffer from where the previous receive left off. Example:

Application registers a provided buffer ring, and adds two 32K buffers
to the ring.

Buffer1 address: 0x1000000 (buffer ID 0)
Buffer2 address: 0x2000000 (buffer ID 1)

A recv completion is received with the following values:

cqe->res	0x1000	(4k bytes received)
cqe->flags	0x11	(CQE_F_BUFFER|CQE_F_BUF_MORE set, buffer ID 0)

and the application now knows that 4096b of data is available at
0x1000000, the start of that buffer, and that more data from this buffer
will be coming. Now the next receive comes in:

cqe->res	0x2010	(8k bytes received)
cqe->flags	0x11	(CQE_F_BUFFER|CQE_F_BUF_MORE set, buffer ID 0)

which tells the application that 8k is available where the last
completion left off, at 0x1001000. Next completion is:

cqe->res	0x5000	(20k bytes received)
cqe->flags	0x1	(CQE_F_BUFFER set, buffer ID 0)

and the application now knows that 20k of data is available at
0x1003000, which is where the previous receive ended. CQE_F_BUF_MORE
isn't set, as no more data is available in this buffer ID. The next
completion is then:

cqe->res	0x1000	(4k bytes received)
cqe->flags	0x10001	(CQE_F_BUFFER|CQE_F_BUF_MORE set, buffer ID 1)

which tells the application that buffer ID 1 is now the current one,
hence there's 4k of valid data at 0x2000000. 0x2001000 will be the next
receive point for this buffer ID.

When a buffer will be reused by future CQE completions,
IORING_CQE_BUF_MORE will be set in cqe->flags. This tells the application
that the kernel isn't done with the buffer yet, and that it should expect
more completions for this buffer ID. Will only be set by provided buffer
rings setup with IOU_PBUF_RING INC, as that's the only type of buffer
that will see multiple consecutive completions for the same buffer ID.
For any other provided buffer type, any completion that passes back
a buffer to the application is final.

Once a buffer has been fully consumed, the buffer ring head is
incremented and the next receive will indicate the next buffer ID in the
CQE cflags.

On the send side, the application can manage how much data is sent from
an existing buffer by setting sqe->len to the desired send length.

An application can request incremental consumption by setting
IOU_PBUF_RING_INC in the provided buffer ring registration. Outside of
that, any provided buffer ring setup and buffer additions is done like
before, no changes there. The only change is in how an application may
see multiple completions for the same buffer ID, hence needing to know
where the next receive will happen.

Note that like existing provided buffer rings, this should not be used
with IOSQE_ASYNC, as both really require the ring to remain locked over
the duration of the buffer selection and the operation completion. It
will consume a buffer otherwise regardless of the size of the IO done.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-08-29 08:44:58 -06:00
Peter Hutterer
b31c9d9dc3 HID: hidraw: add HIDIOCREVOKE ioctl
There is a need for userspace applications to open HID devices directly.
Use-cases include configuration of gaming mice or direct access to
joystick devices. The latter is currently handled by the uaccess tag in
systemd, other devices include more custom/local configurations or just
sudo.

A better approach is what we already have for evdev devices: give the
application a file descriptor and revoke it when it may no longer access
that device.

This patch is the hidraw equivalent to the EVIOCREVOKE ioctl, see
commit c7dc65737c9a ("Input: evdev - add EVIOCREVOKE ioctl") for full
details.

An MR for systemd-logind has been filed here:
https://github.com/systemd/systemd/pull/33970

Signed-off-by: Peter Hutterer <peter.hutterer@who-t.net>
Link: https://patch.msgid.link/20240827-hidraw-revoke-v5-1-d004a7451aea@kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2024-08-29 10:39:37 +02:00
Christoph Hellwig
57413d8e17 fs: sort out the fallocate mode vs flag mess
The fallocate system call takes a mode argument, but that argument
contains a wild mix of exclusive modes and an optional flags.

Replace FALLOC_FL_SUPPORTED_MASK with FALLOC_FL_MODE_MASK, which excludes
the optional flag bit, so that we can use switch statement on the value
to easily enumerate the cases while getting the check for duplicate modes
for free.

To make this (and in the future the file system implementations) more
readable also add a symbolic name for the 0 mode used to allocate blocks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240827065123.1762168-4-hch@lst.de
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-28 16:53:57 +02:00
Anshuman Khandual
947697c6f0 uapi: Define GENMASK_U128
This adds GENMASK_U128() and __GENMASK_U128() macros using __BITS_PER_U128
and __int128 data types. These macros will be used in providing support for
generating 128 bit masks.

The macros wouldn't work in all assembler flavors for reasons described
in the comments on top of declarations. Enforce it for more by adding
!__ASSEMBLY__ guard.

Cc: Yury Norov <yury.norov@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Arnd Bergmann <arnd@arndb.de>>
Cc: linux-kernel@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
2024-08-28 06:53:58 -07:00
Jason Gunthorpe
34cd192881 Merge branch 'bnxt_re_variable_wqes' into rdma.git for-next
Selvin Xavier says:

=============
Enable the Variable size Work Queue entry support for Gen P7
adapters. This would help in the better utilization of the queue memory
and pci bandwidth due to the smaller send queue Work entries.
=============

Based on v6.11-rc5 for dependencies.

* bnxt_re_variable_wqes: (829 commits)
  RDMA/bnxt_re: Enable variable size WQEs for user space applications
  RDMA/bnxt_re: Handle variable WQE support for user applications
  RDMA/bnxt_re: Fix the table size for PSN/MSN entries
  RDMA/bnxt_re: Get the WQE index from slot index while completing the WQEs
  RDMA/bnxt_re: Add support for Variable WQE in Genp7 adapters
  Linux 6.11-rc5
  ...

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-08-28 09:11:29 -03:00
Rodrigo Vivi
04cf420bbc
Merge drm/drm-next into drm-intel-next
Need to take some Xe bo definition in here before
we can add the BMG display 64k aligned size restrictions.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-08-27 17:06:28 -04:00