When invoke memfd_pin_folios, we need offer an array to save each folio
which we pinned.
The current way is dynamic alloc an array(use kvmalloc), get folios,
save into udmabuf and then free.
Depend on the size, kvmalloc can do something different:
Below PAGE_SIZE, slab allocator will be used, which have good alloc
performance, due to it cached page.
PAGE_SIZE - PCP Order, PCP(per-cpu-pageset) also given buddy page a
cache in each CPU, so different CPU no need to hold some lock(zone or
some) to get the locally page. If PCP cached page, the access also fast.
PAGE_SIZE - BUDDY_MAX, try to get page from buddy, due to kvmalloc adjusted
the gfp flags, if zone freelist can't alloc page(fast path), we will not
enter slowpath to reclaim memory. Due to need hold lock and check, may
slow, but still fast than vmalloc.
Anything wrong will fallback into vmalloc to alloc memory, it obtains
contiguous virtual addresses by loop alloc order 0 page(PAGE_SIZE), and
then map it into vmalloc area. If necessary, page alloc may enter
slowpath to reclaim memory. Hence, if fallback into vmalloc, it's slow.
When create, we need to iter each udmabuf item, then pin it's range
folios, if each item's range folio's count is large, we may fallback each
into vmalloc.
This patch find the largest range folio in items, then alloc this size's
folio array. When pin range folios, reuse this array.
Signed-off-by: Huan Yang <link@vivo.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240918025238.2957823-8-link@vivo.com
Currently, udmabuf handles folio by create an unpin list to record
each folio obtained from the list and unpinning them when released. To
maintain this, many struct have been established.
However, maintain this requires a significant amount of memory and
iter the list is a substantial overhead, which is not friendly to the
CPU cache.
When create, we arranged the folio array in the order of pin and set
the offset according to pgcnt. So, if record each pinned folio when
create, then can easy unpin it. Compare to use list to record it,
an array also can do this.
Hence, this patch setup a pinned_folios array(size is the pgcnt) to
instead of udmabuf_folio struct, it record each folio which pinned when
invoke memfd_pin_folios, then unpin folio by iter pinned_folios.
Note that, since a folio may be pinned multiple times, each folio can be
added to pinned_folios multiple times, depend on how many times the
folio has been pinned when create.
Compare to udmabuf_folio(24 byte size), a folio pointer is 8 byte, if no
large folio - each folio is PAGE_SIZE - and need to unpin when release.
So need to record each folio, by this patch, each folio can save 16 byte.
But if large folio used, depend on the large folio's number, the
pinned_folios array may take more memory, but it still can makes unpin
access more cache-friendly.
Signed-off-by: Huan Yang <link@vivo.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240918025238.2957823-7-link@vivo.com
After udmabuf is allocated, its resources need to be initialized,
including various array structures. The current array structure has
already been greatly expanded.
Also, before udmabuf needs to be kfree, the occupied resources need to
be released.
This part is repetitive and maybe overlooked.
This patch give a helper function when init and deinit, by this,
reduce duplicate code.
Signed-off-by: Huan Yang <link@vivo.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240918025238.2957823-6-link@vivo.com
This patch aim to simplify the memfd folio pin during the udmabuf
create. No functional changes.
This patch create a udmabuf_pin_folios function, in this, do the memfd
pin folio and then record each pinned folio, offset.
This patch simplify the pinned folio record, iter by each pinned folio,
and then record each offset in it.
Compare to iter by pgcnt, more readable.
Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Huan Yang <link@vivo.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240918025238.2957823-5-link@vivo.com
Currently vmap_udmabuf set page's array by each folio.
But, ubuf->folios is only contain's the folio's head page.
That mean we repeatedly mapped the folio head page to the vmalloc area.
Due to udmabuf can use hugetlb, if HVO enabled, tail page may not exist,
so, we can't use page array to map, instead, use pfn array.
By this, we removed page usage in udmabuf totally.
Fixes: 5e72b2b41a21 ("udmabuf: convert udmabuf driver to use folios")
Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Huan Yang <link@vivo.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240918025238.2957823-4-link@vivo.com
When PAGE_SIZE 4096, MAX_PAGE_ORDER 10, 64bit machine,
page_alloc only support 4MB.
If above this, trigger this warn and return NULL.
udmabuf can change size limit, if change it to 3072(3GB), and then alloc
3GB udmabuf, will fail create.
[ 4080.876581] ------------[ cut here ]------------
[ 4080.876843] WARNING: CPU: 3 PID: 2015 at mm/page_alloc.c:4556 __alloc_pages+0x2c8/0x350
[ 4080.878839] RIP: 0010:__alloc_pages+0x2c8/0x350
[ 4080.879470] Call Trace:
[ 4080.879473] <TASK>
[ 4080.879473] ? __alloc_pages+0x2c8/0x350
[ 4080.879475] ? __warn.cold+0x8e/0xe8
[ 4080.880647] ? __alloc_pages+0x2c8/0x350
[ 4080.880909] ? report_bug+0xff/0x140
[ 4080.881175] ? handle_bug+0x3c/0x80
[ 4080.881556] ? exc_invalid_op+0x17/0x70
[ 4080.881559] ? asm_exc_invalid_op+0x1a/0x20
[ 4080.882077] ? udmabuf_create+0x131/0x400
Because MAX_PAGE_ORDER, kmalloc can max alloc 4096 * (1 << 10), 4MB
memory, each array entry is pointer(8byte), so can save 524288 pages(2GB).
Further more, costly order(order 3) may not be guaranteed that it can be
applied for, due to fragmentation.
This patch change udmabuf array use kvmalloc_array, this can fallback
alloc into vmalloc, which can guarantee allocation for any size and does
not affect the performance of kmalloc allocations.
Signed-off-by: Huan Yang <link@vivo.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240918025238.2957823-3-link@vivo.com
The current udmabuf mmap only fills the physical memory to the
corresponding virtual address when the user actually accesses the
virtual address.
However, the current udmabuf has already obtained and pinned the folio
upon completion of the creation.This means that the physical memory has
already been acquired, rather than being accessed dynamically.
As a result, the page fault has lost its purpose as a demanding
page. Due to the fact that page fault requires trapping into kernel mode
and filling in when accessing the corresponding virtual address in mmap,
when creating a large size udmabuf, this represents a considerable
overhead.
This patch fill the pfn into page table, and then pre-fault each pfn
into vma, when first access.
Notice, if anything wrong , we do not return an error during this
pre-fault step. However, an error will be returned if the failure occurs
when the addr is truly accessed
Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Huan Yang <link@vivo.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240918025238.2957823-2-link@vivo.com
Using memfd_pin_folios() will ensure that the pages are pinned
correctly using FOLL_PIN. And, this also ensures that we don't
accidentally break features such as memory hotunplug as it would
not allow pinning pages in the movable zone.
Using this new API also simplifies the code as we no longer have
to deal with extracting individual pages from their mappings or
handle shmem and hugetlb cases separately.
Link: https://lkml.kernel.org/r/20240624063952.1572359-9-vivek.kasireddy@intel.com
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Junxiao Chang <junxiao.chang@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This is mainly a preparatory patch to use memfd_pin_folios() API for
pinning folios. Using folios instead of pages makes sense as the udmabuf
driver needs to handle both shmem and hugetlb cases. And, using the
memfd_pin_folios() API makes this easier as we no longer need to
separately handle shmem vs hugetlb cases in the udmabuf driver.
Note that, the function vmap_udmabuf() still needs a list of pages; so, we
collect all the head pages into a local array in this case.
Other changes in this patch include the addition of helpers for checking
the memfd seals and exporting dmabuf. Moving code from udmabuf_create()
into these helpers improves readability given that udmabuf_create() is a
bit long.
Link: https://lkml.kernel.org/r/20240624063952.1572359-8-vivek.kasireddy@intel.com
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Junxiao Chang <junxiao.chang@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
A user or admin can configure a VMM (Qemu) Guest's memory to be backed by
hugetlb pages for various reasons. However, a Guest OS would still
allocate (and pin) buffers that are backed by regular 4k sized pages. In
order to map these buffers and create dma-bufs for them on the Host, we
first need to find the hugetlb pages where the buffer allocations are
located and then determine the offsets of individual chunks (within those
pages) and use this information to eventually populate a scatterlist.
Testcase: default_hugepagesz=2M hugepagesz=2M hugepages=2500 options
were passed to the Host kernel and Qemu was launched with these
relevant options: qemu-system-x86_64 -m 4096m....
-device virtio-gpu-pci,max_outputs=1,blob=true,xres=1920,yres=1080
-display gtk,gl=on
-object memory-backend-memfd,hugetlb=on,id=mem1,size=4096M
-machine memory-backend=mem1
Replacing -display gtk,gl=on with -display gtk,gl=off above would
exercise the mmap handler.
Link: https://lkml.kernel.org/r/20240624063952.1572359-7-vivek.kasireddy@intel.com
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com> (v2)
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Junxiao Chang <junxiao.chang@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add VM_PFNMAP to vm_flags in the mmap handler to ensure that the mappings
would be managed without using struct page.
And, in the vm_fault handler, use vmf_insert_pfn to share the page's pfn
to userspace instead of directly sharing the page (via struct page *).
Link: https://lkml.kernel.org/r/20240624063952.1572359-6-vivek.kasireddy@intel.com
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Junxiao Chang <junxiao.chang@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
UAPI Changes:
* fbdev:
* Make fbdev userspace interfaces optional; only leaves the
framebuffer console active
* prime:
* Support dma-buf self-import for all drivers automatically: improves
support for many userspace compositors
Cross-subsystem Changes:
* backlight:
* Fix interaction with fbdev in several drivers
* base: Convert struct platform.remove to return void; part of a larger,
tree-wide effort
* dma-buf: Acquire reservation lock for mmap() in exporters; part
of an on-going effort to simplify locking around dma-bufs
* fbdev:
* Use Linux device instead of fbdev device in many places
* Use deferred-I/O helper macros in various drivers
* i2c: Convert struct i2c from .probe_new to .probe; part of a larger,
tree-wide effort
* video:
* Avoid including <linux/screen_info.h>
Core Changes:
* atomic:
* Improve logging
* prime:
* Remove struct drm_driver.gem_prime_mmap plus driver updates: all
drivers now implement this callback with drm_gem_prime_mmap()
* gem:
* Support execution contexts: provides locking over multiple GEM
objects
* ttm:
* Support init_on_free
* Swapout fixes
Driver Changes:
* accel:
* ivpu: MMU updates; Support debugfs
* ast:
* Improve device-model detection
* Cleanups
* bridge:
* dw-hdmi: Improve support for YUV420 bus format
* dw-mipi-dsi: Fix enable/disable of DSI controller
* lt9611uxc: Use MODULE_FIRMWARE()
* ps8640: Remove broken EDID code
* samsung-dsim: Fix command transfer
* tc358764: Handle HS/VS polarity; Use BIT() macro; Various cleanups
* Cleanups
* ingenic:
* Kconfig REGMAP fixes
* loongson:
* Support display controller
* mgag200:
* Minor fixes
* mxsfb:
* Support disabling overlay planes
* nouveau:
* Improve VRAM detection
* Various fixes and cleanups
* panel:
* panel-edp: Support AUO B116XAB01.4
* Support Visionox R66451 plus DT bindings
* Cleanups
* ssd130x:
* Support per-controller default resolution plus DT bindings
* Reduce memory-allocation overhead
* Cleanups
* tidss:
* Support TI AM625 plus DT bindings
* Implement new connector model plus driver updates
* vkms
* Improve write-back support
* Documentation fixes
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmSvvRAACgkQaA3BHVML
eiNpGQgAs8jq1XjN9t8jZsdgXnoCbkZyVUI2NO0HwoVwpRCLgbXp5AX5qq2oRciE
TBhe4Fceh/ZsYqHTZQahnguxgRKM5JgXwbI4Z0iiOVcqasNbycaKAqipxJJ7kdo1
qPhGCbgQFVX7oIq2xjfXehh6O0SYX+R9r88X8dMJxMYv/pcLwOHG74kS040WOcQq
uATgcnobOf/D8ZmlqvfKGAeTUoFo/RSR2Uhlauka58qgeUbicrTELZT2barY9d+k
as6U5vv4wx2zMklTkjrlkMpAT1ZpbB9d3jGHwL27VEnjlfd3wV2bdH7Dzn9qZRf/
gn0ALg/b3u5yBWk/k7YBvijXyNcH6Q==
=bBuG
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-next-2023-07-13' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for v6.6:
UAPI Changes:
* fbdev:
* Make fbdev userspace interfaces optional; only leaves the
framebuffer console active
* prime:
* Support dma-buf self-import for all drivers automatically: improves
support for many userspace compositors
Cross-subsystem Changes:
* backlight:
* Fix interaction with fbdev in several drivers
* base: Convert struct platform.remove to return void; part of a larger,
tree-wide effort
* dma-buf: Acquire reservation lock for mmap() in exporters; part
of an on-going effort to simplify locking around dma-bufs
* fbdev:
* Use Linux device instead of fbdev device in many places
* Use deferred-I/O helper macros in various drivers
* i2c: Convert struct i2c from .probe_new to .probe; part of a larger,
tree-wide effort
* video:
* Avoid including <linux/screen_info.h>
Core Changes:
* atomic:
* Improve logging
* prime:
* Remove struct drm_driver.gem_prime_mmap plus driver updates: all
drivers now implement this callback with drm_gem_prime_mmap()
* gem:
* Support execution contexts: provides locking over multiple GEM
objects
* ttm:
* Support init_on_free
* Swapout fixes
Driver Changes:
* accel:
* ivpu: MMU updates; Support debugfs
* ast:
* Improve device-model detection
* Cleanups
* bridge:
* dw-hdmi: Improve support for YUV420 bus format
* dw-mipi-dsi: Fix enable/disable of DSI controller
* lt9611uxc: Use MODULE_FIRMWARE()
* ps8640: Remove broken EDID code
* samsung-dsim: Fix command transfer
* tc358764: Handle HS/VS polarity; Use BIT() macro; Various cleanups
* Cleanups
* ingenic:
* Kconfig REGMAP fixes
* loongson:
* Support display controller
* mgag200:
* Minor fixes
* mxsfb:
* Support disabling overlay planes
* nouveau:
* Improve VRAM detection
* Various fixes and cleanups
* panel:
* panel-edp: Support AUO B116XAB01.4
* Support Visionox R66451 plus DT bindings
* Cleanups
* ssd130x:
* Support per-controller default resolution plus DT bindings
* Reduce memory-allocation overhead
* Cleanups
* tidss:
* Support TI AM625 plus DT bindings
* Implement new connector model plus driver updates
* vkms
* Improve write-back support
* Documentation fixes
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230713090830.GA23281@linux-uq9g
Don't assert held dma-buf reservation lock on memory mapping of exported
buffer.
We're going to change dma-buf mmap() locking policy such that exporters
will have to handle the lock. The previous locking policy caused deadlock
problem for DRM drivers in a case of self-imported dma-bufs once these
drivers are moved to use reservation lock universally. The problem is
solved by moving the lock down to exporters. This patch prepares udmabuf
for the locking policy update.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230529223935.2672495-4-dmitry.osipenko@collabora.com
This effectively reverts commit 16c243e99d33 ("udmabuf: Add support for
mapping hugepages (v4)"). Recently, Junxiao Chang found a BUG with page
map counting as described here [1]. This issue pointed out that the
udmabuf driver was making direct use of subpages of hugetlb pages. This
is not a good idea, and no other mm code attempts such use. In addition
to the mapcount issue, this also causes issues with hugetlb vmemmap
optimization and page poisoning.
For now, remove hugetlb support.
If udmabuf wants to be used on hugetlb mappings, it should be changed to
only use complete hugetlb pages. This will require different alignment
and size requirements on the UDMABUF_CREATE API.
[1] https://lore.kernel.org/linux-mm/20230512072036.1027784-1-junxiao.chang@intel.com/
Link: https://lkml.kernel.org/r/20230608204927.88711-1-mike.kravetz@oracle.com
Fixes: 16c243e99d33 ("udmabuf: Add support for mapping hugepages (v4)")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Junxiao Chang <junxiao.chang@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Since commit 8b41fc4454e ("kbuild: create modules.builtin without
Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations
are used to identify modules. As a consequence, uses of the macro
in non-modules will cause modprobe to misidentify their containing
object file as a module when it is not (false positives), and modprobe
might succeed rather than failing with a suitable error message.
So remove it in the files in this commit, none of which can be built as
modules.
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Suggested-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: linux-modules@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Hitomi Hasegawa <hasegawa-hitomi@fujitsu.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
The reason behind that patch is associated with videobuf2 subsystem
(or more genrally with v4l2 framework) and user created
dma buffers (udmabuf). In some circumstances
when dealing with V4L2_MEMORY_DMABUF buffers videobuf2 subsystem
wants to use dma_buf_vmap() method on the attached dma buffer.
As udmabuf does not have .vmap operation implemented,
such dma_buf_vmap() natually fails.
videobuf2_common: __vb2_queue_alloc: allocated 3 buffers, 1 plane(s) each
videobuf2_common: __prepare_dmabuf: buffer for plane 0 changed
videobuf2_common: __prepare_dmabuf: failed to map dmabuf for plane 0
videobuf2_common: __buf_prepare: buffer preparation failed: -14
The patch itself seems to be strighforward.
It adds implementation of .vmap and .vunmap methods
to 'struct dma_buf_ops udmabuf_ops'.
.vmap method itself uses vm_map_ram() to map pages linearly
into the kernel virtual address space.
.vunmap removes mapping created earlier by .vmap.
All locking and 'vmapping counting' is done in dma_buf.c
so it seems to be redundant/unnecessary in .vmap/.vunmap.
Signed-off-by: Lukasz Wiecaszek <lukasz.wiecaszek@gmail.com>
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221117171810.75637-1-lukasz.wiecaszek@gmail.com
Signed-off-by: Christian König <christian.koenig@amd.com>
When userspace mmaps dma-buf's fd, the dma-buf reservation lock must be
held. Add locking sanity check to the dma-buf mmaping callback to ensure
that the locking assumption won't regress in the future.
Suggested-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221110201349.351294-4-dmitry.osipenko@collabora.com
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYzxj0gAKCRBZ7Krx/gZQ
66/1AQC/KfIAINNOPxozsZaxOaOKo0ouVJ7sJV4ZGsPKpU69gwD/UodJZCtyZ52h
wwkmfzTDjAgGt1QCKj96zk2XFqg4swE=
=u0pv
-----END PGP SIGNATURE-----
Merge tag 'pull-file_inode' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull file_inode() updates from Al Vrio:
"whack-a-mole: cropped up open-coded file_inode() uses..."
* tag 'pull-file_inode' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
orangefs: use ->f_mapping
_nfs42_proc_copy(): use ->f_mapping instead of file_inode()->i_mapping
dma_buf: no need to bother with file_inode()->i_mapping
nfs_finish_open(): don't open-code file_inode()
bprm_fill_uid(): don't open-code file_inode()
sgx: use ->f_mapping...
exfat_iterate(): don't open-code file_inode(file)
ibmvmc: don't open-code file_inode()
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmLLR2MeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG+hMH/jKGMOAbicR/CRq8
WLKmpb1eTJP2dbeiEs5amBk9DZQhqjx6tIQRCpZoGxBL+XWq7DX2fRLkAT56yS5/
NwferpR6IR9GlhjbfczF0JuQkP6eRUXnLrIKS5MViLI5QrCI80kkj4/mdqUXSiBV
cMfXl5T1j+pb3zHUVXjnmvY+77q6rZTPoGxa/l8d6MaIhAg+jhu2E1HaSaSCX/YK
TViq7ciI9cXoFV9yqhLkkBdGjBV8VQsKmeWEcA738bdSy1WAJSV1SVTJqLFvwdPI
PM1asxkPoQ7jRrwsY4G8pZ3zPskJMS4Qwdn64HK+no2AKhJt2p6MePD1XblcrGHK
QNStMY0=
=LfuD
-----END PGP SIGNATURE-----
Backmerge tag 'v5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-next
Backmerge in rc6 so I can merge msm next easier.
Linux 5.19-rc6
Signed-off-by: Dave Airlie <airlied@redhat.com>
Check vm_fault->pgoff before using it. When we removed the warning, we
also removed the check.
Fixes: 7b26e4e2119d ("udmabuf: drop WARN_ON() check.")
Reported-by: zdi-disclosures@trendmicro.com
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Syzbot has reported GPF in sg_alloc_append_table_from_pages(). The
problem was in ubuf->pages == ZERO_PTR.
ubuf->pagecount is calculated from arguments passed from user-space. If
user creates udmabuf with list.size == 0 then ubuf->pagecount will be
also equal to zero; it causes kmalloc_array() to return ZERO_PTR.
Fix it by validating ubuf->pagecount before passing it to
kmalloc_array().
Fixes: fbb0de795078 ("Add udmabuf misc device")
Reported-and-tested-by: syzbot+2c56b725ec547fa9cb29@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20211230142649.23022-1-paskripkin@gmail.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Syzbot reported general protection fault in udmabuf_create. The problem
was in wrong error handling.
In commit 16c243e99d33 ("udmabuf: Add support for mapping hugepages (v4)")
shmem_read_mapping_page() call was replaced with find_get_page_flags(),
but find_get_page_flags() returns NULL on failure instead PTR_ERR().
Wrong error checking was causing GPF in get_page(), since passed page
was equal to NULL. Fix it by changing if (IS_ER(!hpage)) to if (!hpage)
Reported-by: syzbot+e9cd3122a37c5d6c51e8@syzkaller.appspotmail.com
Fixes: 16c243e99d33 ("udmabuf: Add support for mapping hugepages (v4)")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20210811175052.21254-1-paskripkin@gmail.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Default list_limit and size_limit_mb are not big enough to cover all
possible use cases. For example, list_limit could be well over its default,
1024 if only one or several pages are chained in all individual list entries
when creating dmabuf backed by >4MB buffer. list_limit and size_limit_mb are
now defined as module parameters so that those can be optionally configured
by root with proper values to remove these constraints.
Cc: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20210611212107.9876-1-dongwon.kim@intel.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
If the VMM's (Qemu) memory backend is backed up by memfd + Hugepages
(hugetlbfs and not THP), we have to first find the hugepage(s) where
the Guest allocations are located and then extract the regular 4k
sized subpages from them.
v2: Ensure that the subpage and hugepage offsets are calculated correctly
when the range of subpage allocations cuts across multiple hugepages.
v3: Instead of repeatedly looking up the hugepage for each subpage,
only do it when the subpage allocation crosses over into a different
hugepage. (suggested by Gerd and DW)
v4: Fix the following warning identified by checkpatch:
CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
Cc: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20210609182915.592743-1-vivek.kasireddy@intel.com
[ kraxel: one more checkpatch format tweak ]
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
UAPI Changes:
Cross-subsystem Changes:
Core Changes:
- dev: More devm_drm convertions and removal of drm_dev_init
Driver Changes:
- i915: selftests improvements
- panfrost: support for Amlogic SoC
- vc4: one fix
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCX2jGxQAKCRDj7w1vZxhR
xR3DAQCiZOnaxVcY49iG4343Z1aHHaIEShbnB0bDdaWstn7kiQD/UXBXUoOSFoFQ
FkTsW31JsdXNnWP5e6/eJd2Lb6waVAA=
=VlsU
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-next-2020-09-21' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for 5.10:
UAPI Changes:
Cross-subsystem Changes:
- virtio: Merged a PR for patches that will affect drm/virtio
Core Changes:
- dev: More devm_drm convertions and removal of drm_dev_init
- atomic: Split out drm_atomic_helper_calc_timestamping_constants of
drm_atomic_helper_update_legacy_modeset_state
- ttm: More rework
Driver Changes:
- i915: selftests improvements
- panfrost: support for Amlogic SoC
- vc4: one fix
- tree-wide: conversions to devm_drm_dev_alloc,
- ast: simplifications of the atomic modesetting code
- panfrost: multiple fixes
- vc4: multiple fixes
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20200921152956.2gxnsdgxmwhvjyut@gilmour.lan
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg must be called with the original number of the entries
passed to the dma_map_sg().
struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).
It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.
To avoid such issues, lets use a common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page
iterators where possible. This, almost always, hides references to the
nents and orig_nents entries, making the code robust, easier to follow
and copy/paste safe.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
I'm just going to put Chia's review comment here since it sums
the issue rather nicely:
"(1) Semantically, a dma-buf is in DMA domain. CPU access from the
importer must be surrounded by {begin,end}_cpu_access. This gives the
exporter a chance to move the buffer to the CPU domain temporarily.
(2) When the exporter itself has other means to do CPU access, it is
only reasonable for the exporter to move the buffer to the CPU domain
before access, and to the DMA domain after access. The exporter can
potentially reuse {begin,end}_cpu_access for that purpose.
Because of (1), udmabuf does need to implement the
{begin,end}_cpu_access hooks. But "begin" should mean
dma_sync_sg_for_cpu and "end" should mean dma_sync_sg_for_device.
Because of (2), if userspace wants to continuing accessing through the
memfd mapping, it should call udmabuf's {begin,end}_cpu_access to
avoid cache issues."
Reported-by: Chia-I Wu <olvaffe@gmail.com>
Suggested-by: Chia-I Wu <olvaffe@gmail.com>
Fixes: 284562e1f348 ("udmabuf: implement begin_cpu_access/end_cpu_access hooks")
Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20191217230228.453-1-gurchetansingh@chromium.org
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
With the misc device, we should end up using the result of
get_arch_dma_ops(..) or dma-direct ops.
This can allow us to have WC mappings in the guest after
synchronization.
Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20191203013627.85991-4-gurchetansingh@chromium.org
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
We accidentally forgot to set "ret" on this error path so it means we
return NULL instead of an error pointer. The caller checks for NULL and
changes it to an error pointer so it doesn't cause an issue at run time.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20180914065615.GA12043@mwanda
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Create variable for the list length limit. Serves as documentation,
also allows to make it a module parameter if needed.
Also add a total size limit.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20180911134216.9760-8-kraxel@redhat.com