/*
 * Compressed RAM block device
 *
 * Copyright (C) 2008, 2009, 2010 Nitin Gupta
 *               2012, 2013 Minchan Kim
 *
 * This code is released using a dual license strategy: BSD/GPL
 * You can choose the license that better fits your requirements.
 *
 * Released under the terms of 3-clause BSD License
 * Released under the terms of GNU General Public License Version 2.0
 *
 */

#define KMSG_COMPONENT "zram"
#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/bio.h>
#include <linux/bitops.h>
#include <linux/blkdev.h>
#include <linux/buffer_head.h>
#include <linux/device.h>
#include <linux/highmem.h>
#include <linux/slab.h>
#include <linux/backing-dev.h>
#include <linux/string.h>
#include <linux/vmalloc.h>
#include <linux/err.h>
#include <linux/idr.h>
#include <linux/sysfs.h>
#include <linux/debugfs.h>
#include <linux/cpuhotplug.h>
#include <linux/part_stat.h>
#include <linux/kernel_read_file.h>

#include "zram_drv.h"

static DEFINE_IDR(zram_index_idr);
/* idr index must be protected */
static DEFINE_MUTEX(zram_index_mutex);

static int zram_major;
static const char *default_compressor = CONFIG_ZRAM_DEF_COMP;

/* Module params (documentation at end) */
static unsigned int num_devices = 1;
/*
 * Pages that compress to sizes equal to or greater than this are stored
 * uncompressed in memory.
 */
static size_t huge_class_size;

static const struct block_device_operations zram_devops;

static void zram_free_page(struct zram *zram, size_t index);
static int zram_read_from_zspool(struct zram *zram, struct page *page,
				 u32 index);
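
/*
 * Slot (zram->table entry) locking. The trylock variant exists because
 * zram_slot_free_notify() can run in softirq context (e.g. from
 * end_swap_bio_read()), where blocking on a contended slot lock is not
 * an option.
 */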
static int zram_slot_trylock(struct zram *zram, u32 index)
{
	return spin_trylock(&zram->table[index].lock);
}

static void zram_slot_lock(struct zram *zram, u32 index)
{
	spin_lock(&zram->table[index].lock);
}

static void zram_slot_unlock(struct zram *zram, u32 index)
{
	spin_unlock(&zram->table[index].lock);
}
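
/* The device is considered initialized once a non-zero disksize is set */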
static inline bool init_done(struct zram *zram)
{
	return zram->disksize;
}

static inline struct zram *dev_to_zram(struct device *dev)
{
	return (struct zram *)dev_to_disk(dev)->private_data;
}

static unsigned long zram_get_handle(struct zram *zram, u32 index)
{
	return zram->table[index].handle;
}

static void zram_set_handle(struct zram *zram, u32 index, unsigned long handle)
{
	zram->table[index].handle = handle;
}

/* flag operations require table entry lock being held */
static bool zram_test_flag(struct zram *zram, u32 index,
			enum zram_pageflags flag)
{
	return zram->table[index].flags & BIT(flag);
}

static void zram_set_flag(struct zram *zram, u32 index,
			enum zram_pageflags flag)
{
	zram->table[index].flags |= BIT(flag);
}

static void zram_clear_flag(struct zram *zram, u32 index,
			enum zram_pageflags flag)
{
	zram->table[index].flags &= ~BIT(flag);
}
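
/*
 * table[index].flags packs two things: the size of the stored object in
 * the low ZRAM_FLAG_SHIFT bits, and the zram_pageflags in the bits above.
 */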
static size_t zram_get_obj_size(struct zram *zram, u32 index)
{
	return zram->table[index].flags & (BIT(ZRAM_FLAG_SHIFT) - 1);
}

static void zram_set_obj_size(struct zram *zram,
					u32 index, size_t size)
{
	unsigned long flags = zram->table[index].flags >> ZRAM_FLAG_SHIFT;

	zram->table[index].flags = (flags << ZRAM_FLAG_SHIFT) | size;
}
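
/*
 * A slot is in use if it holds a compressed object, is marked same-filled
 * (ZRAM_SAME), or has been written back to the backing device (ZRAM_WB).
 */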
static inline bool zram_allocated(struct zram *zram, u32 index)
{
	return zram_get_obj_size(zram, index) ||
			zram_test_flag(zram, index, ZRAM_SAME) ||
			zram_test_flag(zram, index, ZRAM_WB);
}
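
/*
 * Lock-free update of the max_used_pages watermark: retry the cmpxchg
 * until the stored maximum is either already >= pages or raised to pages.
 */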
static inline void update_used_max(struct zram *zram, const unsigned long pages)
{
	unsigned long cur_max = atomic_long_read(&zram->stats.max_used_pages);

	do {
		if (cur_max >= pages)
			return;
	} while (!atomic_long_try_cmpxchg(&zram->stats.max_used_pages,
					  &cur_max, pages));
}

static bool zram_can_store_page(struct zram *zram)
{
	unsigned long alloced_pages;

	alloced_pages = zs_get_total_pages(zram->mem_pool);
	update_used_max(zram, alloced_pages);

	return !zram->limit_pages || alloced_pages <= zram->limit_pages;
}

#if PAGE_SIZE != 4096
static inline bool is_partial_io(struct bio_vec *bvec)
{
	return bvec->bv_len != PAGE_SIZE;
}
#define ZRAM_PARTIAL_IO		1
#else
static inline bool is_partial_io(struct bio_vec *bvec)
{
	return false;
}
#endif
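
/*
 * For multi-compressor configurations, the priority of the stream that
 * compressed the object is packed into table[index].flags starting at
 * ZRAM_COMP_PRIORITY_BIT1.
 */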
static inline void zram_set_priority(struct zram *zram, u32 index, u32 prio)
{
	prio &= ZRAM_COMP_PRIORITY_MASK;
	/*
	 * Clear the previous priority value first, in case we are
	 * recompressing an already recompressed page.
	 */
	zram->table[index].flags &= ~(ZRAM_COMP_PRIORITY_MASK <<
				      ZRAM_COMP_PRIORITY_BIT1);
	zram->table[index].flags |= (prio << ZRAM_COMP_PRIORITY_BIT1);
}

static inline u32 zram_get_priority(struct zram *zram, u32 index)
{
	u32 prio = zram->table[index].flags >> ZRAM_COMP_PRIORITY_BIT1;

	return prio & ZRAM_COMP_PRIORITY_MASK;
}
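
/*
 * Called on slot access: an accessed slot is neither idle nor a valid
 * post-processing candidate, so drop ZRAM_IDLE and ZRAM_PP_SLOT and,
 * if configured, record the access time.
 */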
static void zram_accessed(struct zram *zram, u32 index)
{
	zram_clear_flag(zram, index, ZRAM_IDLE);
	zram_clear_flag(zram, index, ZRAM_PP_SLOT);
#ifdef CONFIG_ZRAM_TRACK_ENTRY_ACTIME
	zram->table[index].ac_time = ktime_get_boottime();
#endif
}

#if defined CONFIG_ZRAM_WRITEBACK || defined CONFIG_ZRAM_MULTI_COMP
struct zram_pp_slot {
	unsigned long index;
	struct list_head entry;
};

/*
 * A post-processing bucket is, essentially, a size class; this defines
 * the range (in bytes) of pp-slot sizes in a particular bucket.
 */
#define PP_BUCKET_SIZE_RANGE	64
#define NUM_PP_BUCKETS		((PAGE_SIZE / PP_BUCKET_SIZE_RANGE) + 1)
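
/*
 * For example, with PAGE_SIZE == 4096 there are 65 buckets: a 100-byte
 * object lands in bucket 100 / 64 = 1, a 3000-byte object in bucket
 * 3000 / 64 = 46, and an object stored uncompressed (PAGE_SIZE bytes)
 * in the last bucket, 4096 / 64 = 64.
 */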

struct zram_pp_ctl {
	struct list_head pp_buckets[NUM_PP_BUCKETS];
};
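
/*
 * Sketch of the expected post-processing flow (writeback/recompression):
 * allocate a ctl with init_pp_ctl(), scan zram->table and place_pp_slot()
 * each candidate, repeatedly select_pp_slot() to pick the largest
 * candidates first, and finally release_pp_ctl() to drop any leftovers.
 */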
static struct zram_pp_ctl *init_pp_ctl(void)
{
	struct zram_pp_ctl *ctl;
	u32 idx;

	ctl = kmalloc(sizeof(*ctl), GFP_KERNEL);
	if (!ctl)
		return NULL;

	for (idx = 0; idx < NUM_PP_BUCKETS; idx++)
		INIT_LIST_HEAD(&ctl->pp_buckets[idx]);
	return ctl;
}

static void release_pp_slot(struct zram *zram, struct zram_pp_slot *pps)
{
	list_del_init(&pps->entry);

	zram_slot_lock(zram, pps->index);
	zram_clear_flag(zram, pps->index, ZRAM_PP_SLOT);
	zram_slot_unlock(zram, pps->index);

	kfree(pps);
}

static void release_pp_ctl(struct zram *zram, struct zram_pp_ctl *ctl)
{
	u32 idx;

	if (!ctl)
		return;

	for (idx = 0; idx < NUM_PP_BUCKETS; idx++) {
		while (!list_empty(&ctl->pp_buckets[idx])) {
			struct zram_pp_slot *pps;

			pps = list_first_entry(&ctl->pp_buckets[idx],
					       struct zram_pp_slot,
					       entry);
			release_pp_slot(zram, pps);
		}
	}

	kfree(ctl);
}
|
|
|
|
|
|
|
|
static void place_pp_slot(struct zram *zram, struct zram_pp_ctl *ctl,
|
|
|
|
struct zram_pp_slot *pps)
|
|
|
|
{
|
|
|
|
u32 idx;
|
|
|
|
|
|
|
|
idx = zram_get_obj_size(zram, pps->index) / PP_BUCKET_SIZE_RANGE;
|
|
|
|
list_add(&pps->entry, &ctl->pp_buckets[idx]);
|
|
|
|
|
|
|
|
zram_set_flag(zram, pps->index, ZRAM_PP_SLOT);
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct zram_pp_slot *select_pp_slot(struct zram_pp_ctl *ctl)
|
|
|
|
{
|
|
|
|
struct zram_pp_slot *pps = NULL;
|
|
|
|
s32 idx = NUM_PP_BUCKETS - 1;
|
|
|
|
|
|
|
|
/* The higher the bucket id the more optimal slot post-processing is */
|
|
|
|
while (idx >= 0) {
|
|
|
|
pps = list_first_entry_or_null(&ctl->pp_buckets[idx],
|
|
|
|
struct zram_pp_slot,
|
|
|
|
entry);
|
|
|
|
if (pps)
|
|
|
|
break;
|
|
|
|
|
|
|
|
idx--;
|
|
|
|
}
|
|
|
|
return pps;
|
|
|
|
}
|
|
|
|
#endif
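
/*
 * A minimal lifecycle sketch of the helpers above (illustrative only,
 * not part of the driver; error handling omitted). The scan phase fills
 * the buckets with candidate slots, and the consume phase then drains
 * them largest-first:
 *
 *	struct zram_pp_ctl *ctl = init_pp_ctl();
 *	struct zram_pp_slot *pps;
 *
 *	scan_slots_for_writeback(zram, mode, nr_pages, 0, ctl);
 *	while ((pps = select_pp_slot(ctl))) {
 *		... post-process the slot at pps->index ...
 *		release_pp_slot(zram, pps);
 *	}
 *	release_pp_ctl(zram, ctl);
 */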
static inline void zram_fill_page(void *ptr, unsigned long len,
				  unsigned long value)
{
	WARN_ON_ONCE(!IS_ALIGNED(len, sizeof(unsigned long)));
	memset_l(ptr, value, len / sizeof(unsigned long));
}

static bool page_same_filled(void *ptr, unsigned long *element)
{
	unsigned long *page;
	unsigned long val;
	unsigned int pos, last_pos = PAGE_SIZE / sizeof(*page) - 1;

	page = (unsigned long *)ptr;
	val = page[0];

	if (val != page[last_pos])
		return false;

	for (pos = 1; pos < last_pos; pos++) {
		if (val != page[pos])
			return false;
	}

	*element = val;

	return true;
}

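/*
 * A rough sketch of how the write path uses the two helpers above
 * (simplified; the exact flag and stats handling lives in the zram
 * write path, and the helper calls below assume the current driver):
 *
 *	if (page_same_filled(mem, &element)) {
 *		... no zsmalloc allocation is needed, only the pattern
 *		    is recorded so that reads can zram_fill_page() it
 *		    back ...
 *		zram_set_flag(zram, index, ZRAM_SAME);
 *		zram_set_element(zram, index, element);
 *	}
 */
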
static ssize_t initstate_show(struct device *dev,
		struct device_attribute *attr, char *buf)
{
	u32 val;
	struct zram *zram = dev_to_zram(dev);

	down_read(&zram->init_lock);
	val = init_done(zram);
	up_read(&zram->init_lock);

	return scnprintf(buf, PAGE_SIZE, "%u\n", val);
}

zram: reorganize code layout
This patch looks big, but basically it just moves code blocks.
No functional changes.
Our current code layout looks like a sandwich.
For example,
a) between read/write handlers, we have the update_used_max() helper
function:
static int zram_decompress_page
static int zram_bvec_read
static inline void update_used_max
static int zram_bvec_write
static int zram_bvec_rw
b) RW request handlers __zram_make_request/zram_bio_discard are divided by
the sysfs attr reset_store() function and the corresponding
zram_reset_device() handler:
static void zram_bio_discard
static void zram_reset_device
static ssize_t disksize_store
static ssize_t reset_store
static void __zram_make_request
c) we first have a bunch of sysfs read/store functions, then a number of
one-liners, then helper functions, RW functions, sysfs functions, helper
functions again, and so on.
Reorganize the layout to be more logically grouped (a brief description;
`cat zram_drv.c | grep static` gives a bigger picture):
-- one-liners: zram_test_flag/etc.
-- helpers: is_partial_io/update_position/etc
-- sysfs attr show/store functions + ZRAM_ATTR_RO() generated stats
show() functions
exception: reset and disksize store functions are required to be after
meta() functions, because we do device create/destroy actions in these
sysfs handlers.
-- "mm" functions: meta get/put, meta alloc/free, page free
static inline bool zram_meta_get
static inline void zram_meta_put
static void zram_meta_free
static struct zram_meta *zram_meta_alloc
static void zram_free_page
-- a block of I/O functions
static int zram_decompress_page
static int zram_bvec_read
static int zram_bvec_write
static void zram_bio_discard
static int zram_bvec_rw
static void __zram_make_request
static void zram_make_request
static void zram_slot_free_notify
static int zram_rw_page
-- device control: add/remove/init/reset functions (+zram-control class
will sit here)
static int zram_reset_device
static ssize_t reset_store
static ssize_t disksize_store
static int zram_add
static void zram_remove
static int __init zram_init
static void __exit zram_exit
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

static ssize_t disksize_show(struct device *dev,
		struct device_attribute *attr, char *buf)
{
	struct zram *zram = dev_to_zram(dev);

	return scnprintf(buf, PAGE_SIZE, "%llu\n", zram->disksize);
}

static ssize_t mem_limit_store(struct device *dev,
		struct device_attribute *attr, const char *buf, size_t len)
{
	u64 limit;
	char *tmp;
	struct zram *zram = dev_to_zram(dev);

	limit = memparse(buf, &tmp);
	if (buf == tmp) /* no chars parsed, invalid input */
		return -EINVAL;

	down_write(&zram->init_lock);
	zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
	up_write(&zram->init_lock);

	return len;
}

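/*
 * memparse() accepts K/M/G suffixes, so the limit can be set from
 * userspace along these lines (illustrative, zram0 assumed):
 *
 *	echo 512M > /sys/block/zram0/mem_limit
 *	echo 0 > /sys/block/zram0/mem_limit	# 0 disables the limit
 */
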
static ssize_t mem_used_max_store(struct device *dev,
		struct device_attribute *attr, const char *buf, size_t len)
{
	int err;
	unsigned long val;
	struct zram *zram = dev_to_zram(dev);

	err = kstrtoul(buf, 10, &val);
	if (err || val != 0)
		return -EINVAL;

	down_read(&zram->init_lock);
	if (init_done(zram)) {
		atomic_long_set(&zram->stats.max_used_pages,
				zs_get_total_pages(zram->mem_pool));
	}
	up_read(&zram->init_lock);

	return len;
}

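/*
 * Writing "0" is the only accepted input; it resets the max_used_pages
 * counter to the pool's current size (illustrative):
 *
 *	echo 0 > /sys/block/zram0/mem_used_max
 */
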
/*
 * Mark all pages which are older than or equal to cutoff as IDLE.
 * Callers should hold the zram init lock in read mode
 */
static void mark_idle(struct zram *zram, ktime_t cutoff)
{
	int is_idle = 1;
	unsigned long nr_pages = zram->disksize >> PAGE_SHIFT;
	int index;

	for (index = 0; index < nr_pages; index++) {
		/*
		 * Do not mark ZRAM_SAME slots as ZRAM_IDLE, because no
		 * post-processing (recompress, writeback) happens to the
		 * ZRAM_SAME slot.
		 *
		 * And ZRAM_WB slots simply cannot be ZRAM_IDLE.
		 */
		zram_slot_lock(zram, index);
		if (!zram_allocated(zram, index) ||
		    zram_test_flag(zram, index, ZRAM_WB) ||
		    zram_test_flag(zram, index, ZRAM_SAME)) {
			zram_slot_unlock(zram, index);
			continue;
		}

#ifdef CONFIG_ZRAM_TRACK_ENTRY_ACTIME
		is_idle = !cutoff ||
			ktime_after(cutoff, zram->table[index].ac_time);
#endif
		if (is_idle)
			zram_set_flag(zram, index, ZRAM_IDLE);
		else
			zram_clear_flag(zram, index, ZRAM_IDLE);
		zram_slot_unlock(zram, index);
	}
}

static ssize_t idle_store(struct device *dev,
		struct device_attribute *attr, const char *buf, size_t len)
{
	struct zram *zram = dev_to_zram(dev);
	ktime_t cutoff_time = 0;
	ssize_t rv = -EINVAL;

	if (!sysfs_streq(buf, "all")) {
		/*
		 * If it did not parse as 'all' try to treat it as an integer
		 * when we have memory tracking enabled.
		 */
		u64 age_sec;

		if (IS_ENABLED(CONFIG_ZRAM_TRACK_ENTRY_ACTIME) && !kstrtoull(buf, 0, &age_sec))
			cutoff_time = ktime_sub(ktime_get_boottime(),
					ns_to_ktime(age_sec * NSEC_PER_SEC));
		else
			goto out;
	}

	down_read(&zram->init_lock);
	if (!init_done(zram))
		goto out_unlock;

	/*
	 * A cutoff_time of 0 marks everything as idle, this is the
	 * "all" behavior.
	 */
	mark_idle(zram, cutoff_time);
	rv = len;

out_unlock:
	up_read(&zram->init_lock);
out:
	return rv;
}

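/*
 * Userspace usage (illustrative): mark every slot idle, or only slots
 * untouched for the last hour when access-time tracking is enabled:
 *
 *	echo all > /sys/block/zram0/idle
 *	echo 3600 > /sys/block/zram0/idle
 */
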
#ifdef CONFIG_ZRAM_WRITEBACK

static ssize_t writeback_limit_enable_store(struct device *dev,
		struct device_attribute *attr, const char *buf, size_t len)
{
	struct zram *zram = dev_to_zram(dev);
	u64 val;
	ssize_t ret = -EINVAL;

	if (kstrtoull(buf, 10, &val))
		return ret;

	down_read(&zram->init_lock);
	spin_lock(&zram->wb_limit_lock);
	zram->wb_limit_enable = val;
	spin_unlock(&zram->wb_limit_lock);
	up_read(&zram->init_lock);
	ret = len;

	return ret;
}

static ssize_t writeback_limit_enable_show(struct device *dev,
		struct device_attribute *attr, char *buf)
{
	bool val;
	struct zram *zram = dev_to_zram(dev);

	down_read(&zram->init_lock);
	spin_lock(&zram->wb_limit_lock);
	val = zram->wb_limit_enable;
	spin_unlock(&zram->wb_limit_lock);
	up_read(&zram->init_lock);

	return scnprintf(buf, PAGE_SIZE, "%d\n", val);
}

static ssize_t writeback_limit_store(struct device *dev,
		struct device_attribute *attr, const char *buf, size_t len)
{
	struct zram *zram = dev_to_zram(dev);
	u64 val;
	ssize_t ret = -EINVAL;

	if (kstrtoull(buf, 10, &val))
		return ret;

	down_read(&zram->init_lock);
	spin_lock(&zram->wb_limit_lock);
	zram->bd_wb_limit = val;
	spin_unlock(&zram->wb_limit_lock);
	up_read(&zram->init_lock);
	ret = len;

	return ret;
}

static ssize_t writeback_limit_show(struct device *dev,
		struct device_attribute *attr, char *buf)
{
	u64 val;
	struct zram *zram = dev_to_zram(dev);

	down_read(&zram->init_lock);
	spin_lock(&zram->wb_limit_lock);
	val = zram->bd_wb_limit;
	spin_unlock(&zram->wb_limit_lock);
	up_read(&zram->init_lock);

	return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
}

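/*
 * The limit is accounted in PAGE_SIZE units of writeback. Capping
 * writeback at roughly 400MB on a 4K-page system might look like
 * (illustrative):
 *
 *	echo 1 > /sys/block/zram0/writeback_limit_enable
 *	echo $((400 * 1024 / 4)) > /sys/block/zram0/writeback_limit
 */
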
static void reset_bdev(struct zram *zram)
{
	if (!zram->backing_dev)
		return;

	/* hope filp_close flush all of IO */
	filp_close(zram->backing_dev, NULL);
	zram->backing_dev = NULL;
	zram->bdev = NULL;
	zram->disk->fops = &zram_devops;
	kvfree(zram->bitmap);
	zram->bitmap = NULL;
}

static ssize_t backing_dev_show(struct device *dev,
		struct device_attribute *attr, char *buf)
{
	struct file *file;
	struct zram *zram = dev_to_zram(dev);
	char *p;
	ssize_t ret;

	down_read(&zram->init_lock);
	file = zram->backing_dev;
	if (!file) {
		memcpy(buf, "none\n", 5);
		up_read(&zram->init_lock);
		return 5;
	}

	p = file_path(file, buf, PAGE_SIZE - 1);
	if (IS_ERR(p)) {
		ret = PTR_ERR(p);
		goto out;
	}

	ret = strlen(p);
	memmove(buf, p, ret);
	buf[ret++] = '\n';
out:
	up_read(&zram->init_lock);
	return ret;
}

static ssize_t backing_dev_store(struct device *dev,
		struct device_attribute *attr, const char *buf, size_t len)
{
	char *file_name;
	size_t sz;
	struct file *backing_dev = NULL;
	struct inode *inode;
	unsigned int bitmap_sz;
	unsigned long nr_pages, *bitmap = NULL;
	int err;
	struct zram *zram = dev_to_zram(dev);

	file_name = kmalloc(PATH_MAX, GFP_KERNEL);
	if (!file_name)
		return -ENOMEM;

	down_write(&zram->init_lock);
	if (init_done(zram)) {
		pr_info("Can't setup backing device for initialized device\n");
		err = -EBUSY;
		goto out;
	}

	strscpy(file_name, buf, PATH_MAX);
	/* ignore trailing newline */
	sz = strlen(file_name);
	if (sz > 0 && file_name[sz - 1] == '\n')
		file_name[sz - 1] = 0x00;

	backing_dev = filp_open(file_name, O_RDWR | O_LARGEFILE | O_EXCL, 0);
	if (IS_ERR(backing_dev)) {
		err = PTR_ERR(backing_dev);
		backing_dev = NULL;
		goto out;
	}

	inode = backing_dev->f_mapping->host;

	/* Support only block device in this moment */
	if (!S_ISBLK(inode->i_mode)) {
		err = -ENOTBLK;
		goto out;
	}

	nr_pages = i_size_read(inode) >> PAGE_SHIFT;
	/* Refuse to use zero sized device (also prevents self reference) */
	if (!nr_pages) {
		err = -EINVAL;
		goto out;
	}

	bitmap_sz = BITS_TO_LONGS(nr_pages) * sizeof(long);
	bitmap = kvzalloc(bitmap_sz, GFP_KERNEL);
	if (!bitmap) {
		err = -ENOMEM;
		goto out;
	}

	reset_bdev(zram);

	zram->bdev = I_BDEV(inode);
	zram->backing_dev = backing_dev;
	zram->bitmap = bitmap;
	zram->nr_pages = nr_pages;
	up_write(&zram->init_lock);

	pr_info("setup backing device %s\n", file_name);
	kfree(file_name);

	return len;
out:
	kvfree(bitmap);

	if (backing_dev)
		filp_close(backing_dev, NULL);

	up_write(&zram->init_lock);

	kfree(file_name);

	return err;
}

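/*
 * Typical setup (illustrative; /dev/sdb1 is a placeholder): the backing
 * device must be a block device and must be set before disksize:
 *
 *	echo /dev/sdb1 > /sys/block/zram0/backing_dev
 */
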
zram: fix lockdep warning of free block handling
Patch series "zram idle page writeback", v3.
Inherently, a swap device has many idle pages which are rarely touched
after they were allocated. It is never a problem if we use a storage
device as swap. However, it's just a waste for zram-swap.
This patchset supports the zram idle page writeback feature.
* Admin can define what an idle page is: "no access since X time ago"
* Admin can define when zram should write them back
* Admin can define when zram should stop writeback to prevent wearout
Details are in each patch's description.
This patch (of 7):
================================
WARNING: inconsistent lock state
4.19.0+ #390 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
zram_verify/2095 [HC0[0]:SC1[1]:HE1:SE0] takes:
00000000b1828693 (&(&zram->bitmap_lock)->rlock){+.?.}, at: put_entry_bdev+0x1e/0x50
{SOFTIRQ-ON-W} state was registered at:
  _raw_spin_lock+0x2c/0x40
  zram_make_request+0x755/0xdc9
  generic_make_request+0x373/0x6a0
  submit_bio+0x6c/0x140
  __swap_writepage+0x3a8/0x480
  shrink_page_list+0x1102/0x1a60
  shrink_inactive_list+0x21b/0x3f0
  shrink_node_memcg.constprop.99+0x4f8/0x7e0
  shrink_node+0x7d/0x2f0
  do_try_to_free_pages+0xe0/0x300
  try_to_free_pages+0x116/0x2b0
  __alloc_pages_slowpath+0x3f4/0xf80
  __alloc_pages_nodemask+0x2a2/0x2f0
  __handle_mm_fault+0x42e/0xb50
  handle_mm_fault+0x55/0xb0
  __do_page_fault+0x235/0x4b0
  page_fault+0x1e/0x30
irq event stamp: 228412
hardirqs last enabled at (228412): [<ffffffff98245846>] __slab_free+0x3e6/0x600
hardirqs last disabled at (228411): [<ffffffff98245625>] __slab_free+0x1c5/0x600
softirqs last enabled at (228396): [<ffffffff98e0031e>] __do_softirq+0x31e/0x427
softirqs last disabled at (228403): [<ffffffff98072051>] irq_exit+0xd1/0xe0
other info that might help us debug this:
 Possible unsafe locking scenario:
       CPU0
       ----
  lock(&(&zram->bitmap_lock)->rlock);
  <Interrupt>
    lock(&(&zram->bitmap_lock)->rlock);
 *** DEADLOCK ***
no locks held by zram_verify/2095.
stack backtrace:
CPU: 5 PID: 2095 Comm: zram_verify Not tainted 4.19.0+ #390
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Call Trace:
 <IRQ>
 dump_stack+0x67/0x9b
 print_usage_bug+0x1bd/0x1d3
 mark_lock+0x4aa/0x540
 __lock_acquire+0x51d/0x1300
 lock_acquire+0x90/0x180
 _raw_spin_lock+0x2c/0x40
 put_entry_bdev+0x1e/0x50
 zram_free_page+0xf6/0x110
 zram_slot_free_notify+0x42/0xa0
 end_swap_bio_read+0x5b/0x170
 blk_update_request+0x8f/0x340
 scsi_end_request+0x2c/0x1e0
 scsi_io_completion+0x98/0x650
 blk_done_softirq+0x9e/0xd0
 __do_softirq+0xcc/0x427
 irq_exit+0xd1/0xe0
 do_IRQ+0x93/0x120
 common_interrupt+0xf/0xf
 </IRQ>
With the writeback feature, zram_slot_free_notify() can be called in
softirq context by end_swap_bio_read(). However, bitmap_lock is not
aware of that, so lockdep yells out:
get_entry_bdev
spin_lock(bitmap->lock);
irq
softirq
end_swap_bio_read
zram_slot_free_notify
zram_slot_lock <-- deadlock prone
zram_free_page
put_entry_bdev
spin_lock(bitmap->lock); <-- deadlock prone
With akpm's suggestion (i.e. bitmap operations are already atomic), we
can remove the bitmap lock. It might fail to find an empty slot if
serious contention happens. However, that's not a severe problem because
huge page writeback already has the possibility of failing under severe
memory pressure. The worst case is just keeping the incompressible page
in memory, not on storage.
The other problem is zram_slot_lock in zram_slot_free_notify. To make it
safe, this patch introduces zram_slot_trylock(), which
zram_slot_free_notify() uses. Although the lock is rarely contended,
this patch adds a new debug stat "miss_free" to keep monitoring how
often it happens.
Link: http://lkml.kernel.org/r/20181127055429.251614-2-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Joey Pabalinas <joeypabalinas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

static unsigned long alloc_block_bdev(struct zram *zram)
{
	unsigned long blk_idx = 1;
retry:
	/* skip 0 bit to confuse zram.handle = 0 */
	blk_idx = find_next_zero_bit(zram->bitmap, zram->nr_pages, blk_idx);
	if (blk_idx == zram->nr_pages)
		return 0;

	if (test_and_set_bit(blk_idx, zram->bitmap))
		goto retry;

	atomic64_inc(&zram->stats.bd_count);
	return blk_idx;
}

static void free_block_bdev(struct zram *zram, unsigned long blk_idx)
{
	int was_set;

	was_set = test_and_clear_bit(blk_idx, zram->bitmap);
	WARN_ON_ONCE(!was_set);
	atomic64_dec(&zram->stats.bd_count);
}

static void read_from_bdev_async(struct zram *zram, struct page *page,
			unsigned long entry, struct bio *parent)
{
	struct bio *bio;

	bio = bio_alloc(zram->bdev, 1, parent->bi_opf, GFP_NOIO);
	bio->bi_iter.bi_sector = entry * (PAGE_SIZE >> 9);
	__bio_add_page(bio, page, PAGE_SIZE, 0);
	bio_chain(bio, parent);
	submit_bio(bio);
}

#define PAGE_WB_SIG "page_index="

#define PAGE_WRITEBACK			0
#define HUGE_WRITEBACK			(1<<0)
#define IDLE_WRITEBACK			(1<<1)
#define INCOMPRESSIBLE_WRITEBACK	(1<<2)

static int scan_slots_for_writeback(struct zram *zram, u32 mode,
				    unsigned long nr_pages,
				    unsigned long index,
				    struct zram_pp_ctl *ctl)
{
	struct zram_pp_slot *pps = NULL;

	for (; nr_pages != 0; index++, nr_pages--) {
		if (!pps)
			pps = kmalloc(sizeof(*pps), GFP_KERNEL);
		if (!pps)
			return -ENOMEM;

		INIT_LIST_HEAD(&pps->entry);

		zram_slot_lock(zram, index);
		if (!zram_allocated(zram, index))
			goto next;

		if (zram_test_flag(zram, index, ZRAM_WB) ||
		    zram_test_flag(zram, index, ZRAM_SAME))
			goto next;

		if (mode & IDLE_WRITEBACK &&
		    !zram_test_flag(zram, index, ZRAM_IDLE))
			goto next;
		if (mode & HUGE_WRITEBACK &&
		    !zram_test_flag(zram, index, ZRAM_HUGE))
			goto next;
		if (mode & INCOMPRESSIBLE_WRITEBACK &&
		    !zram_test_flag(zram, index, ZRAM_INCOMPRESSIBLE))
			goto next;

		pps->index = index;
		place_pp_slot(zram, ctl, pps);
		pps = NULL;
next:
		zram_slot_unlock(zram, index);
	}

	kfree(pps);
	return 0;
}

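/*
 * Note on the scan loop above: a zram_pp_slot allocation that was not
 * placed (because the slot failed one of the filters) is kept and
 * reused for the next candidate, so at most one allocation is wasted
 * per scan.
 */
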
zram: support idle/huge page writeback
Add a new feature "zram idle/huge page writeback". In the zram-swap use
case, zram usually has many idle/huge swap pages. It's pointless to keep
them in memory (i.e., zram).
To solve this problem, this feature introduces idle/huge page writeback
to the backing device; the goal is to save more memory space on embedded
systems.
The normal sequence to use the idle/huge page writeback feature is as
follows:
while (1) {
    # mark allocated zram slots as idle
    echo all > /sys/block/zram0/idle
    # leave the system working for several hours
    # Unless some blocks on zram are accessed,
    # they remain IDLE-marked pages.
    echo "idle" > /sys/block/zram0/writeback
    or/and
    echo "huge" > /sys/block/zram0/writeback
    # write the IDLE or/and huge marked slots to the backing device
    # and free the memory.
}
Per the discussion at
https://lore.kernel.org/lkml/20181122065926.GG3441@jagdpanzerIV/T/#u,
this patch removes the direct incompressible page writeback feature
(d2afd25114f4 ("zram: write incompressible pages to backing device")).
Below are the concerns from Sergey:
== &< ==
"IDLE writeback" is superior to "incompressible writeback".
"incompressible writeback" is completely unpredictable and uncontrollable;
it depends on data patterns and compression algorithms. While "IDLE
writeback" is predictable.
I even suspect that, *ideally*, we can remove "incompressible writeback".
"IDLE pages" is a super set which also includes "incompressible" pages.
So, technically, we still can do "incompressible writeback" from the
"IDLE writeback" path; but a much more reasonable one, based on a page
idling period.
I understand that you want to keep "direct incompressible writeback"
around. ZRAM is especially popular on devices which do suffer from flash
wearout, so I can see the "incompressible writeback" path becoming dead
code, long term.
== &< ==
Below are the concerns from Minchan:
== &< ==
My concern is that if we enable CONFIG_ZRAM_WRITEBACK in this
implementation, both hugepage/idlepage writeback will turn on. However,
some users want to enable only idlepage writeback, so we need to
introduce a turn on/off knob for hugepage or a new
CONFIG_ZRAM_IDLEPAGE_WRITEBACK for that use case. I don't want to make
it complicated *if possible*.
Long term, I imagine we need to make the VM aware of a new swap
hierarchy a little bit different from as-is. For example, the first high
priority swap can return -EIO or -ENOCOMP, and swap will try to fall
back to the next lower priority swap device. With that, hugepage
writeback will work transparently.
So we could regard it as a regression because incompressible pages don't
go to backing storage automatically. Instead, the user should do it
manually via "echo huge > /sys/block/zram/writeback".
== &< ==
Link: http://lkml.kernel.org/r/20181127055429.251614-6-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Joey Pabalinas <joeypabalinas@gmail.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

static ssize_t writeback_store(struct device *dev,
		struct device_attribute *attr, const char *buf, size_t len)
{
	struct zram *zram = dev_to_zram(dev);
	unsigned long nr_pages = zram->disksize >> PAGE_SHIFT;
	struct zram_pp_ctl *ctl = NULL;
	struct zram_pp_slot *pps;
	unsigned long index = 0;
2018-12-28 00:36:47 -08:00
|
|
|
struct bio bio;
|
|
|
|
struct bio_vec bio_vec;
|
|
|
|
struct page *page;
|
2020-01-30 22:15:25 -08:00
|
|
|
ssize_t ret = len;
|
2021-03-12 21:08:38 -08:00
|
|
|
int mode, err;
|
2018-12-28 00:36:47 -08:00
|
|
|
unsigned long blk_idx = 0;
|
|
|
|
|
2019-03-28 20:44:24 -07:00
|
|
|
if (sysfs_streq(buf, "idle"))
|
2018-12-28 00:36:47 -08:00
|
|
|
mode = IDLE_WRITEBACK;
|
2019-03-28 20:44:24 -07:00
|
|
|
else if (sysfs_streq(buf, "huge"))
|
2018-12-28 00:36:47 -08:00
|
|
|
mode = HUGE_WRITEBACK;
|
2022-04-29 14:36:59 -07:00
|
|
|
else if (sysfs_streq(buf, "huge_idle"))
|
|
|
|
mode = IDLE_WRITEBACK | HUGE_WRITEBACK;
|
2022-11-09 20:50:46 +09:00
|
|
|
else if (sysfs_streq(buf, "incompressible"))
|
|
|
|
mode = INCOMPRESSIBLE_WRITEBACK;
|
2020-12-14 19:14:28 -08:00
|
|
|
else {
|
|
|
|
if (strncmp(buf, PAGE_WB_SIG, sizeof(PAGE_WB_SIG) - 1))
|
|
|
|
return -EINVAL;
|
|
|
|
|
2021-03-12 21:08:41 -08:00
|
|
|
if (kstrtol(buf + sizeof(PAGE_WB_SIG) - 1, 10, &index) ||
|
|
|
|
index >= nr_pages)
|
2020-12-14 19:14:28 -08:00
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
nr_pages = 1;
|
|
|
|
mode = PAGE_WRITEBACK;
|
|
|
|
}
|
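For the single-page mode, the attribute accepts a string of the form "page_index=<n>", matched by the PAGE_WB_SIG prefix and bounds-checked against nr_pages. A minimal user-space sketch of the same parse-and-validate step follows, using strtoul in place of the kernel's kstrtol; the helper name is illustrative only.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_WB_SIG "page_index="

/* Parse "page_index=<n>" and bounds-check n against nr_pages. */
static int parse_page_index(const char *buf, unsigned long nr_pages,
			    unsigned long *index)
{
	char *end;

	if (strncmp(buf, PAGE_WB_SIG, sizeof(PAGE_WB_SIG) - 1))
		return -EINVAL;	/* missing "page_index=" prefix */

	errno = 0;
	*index = strtoul(buf + sizeof(PAGE_WB_SIG) - 1, &end, 10);
	if (errno || end == buf + sizeof(PAGE_WB_SIG) - 1)
		return -EINVAL;	/* no digits, or overflow */
	if (*index >= nr_pages)
		return -EINVAL;	/* beyond the device's page range */
	return 0;
}

int main(void)
{
	unsigned long idx;

	if (!parse_page_index("page_index=42", 1024, &idx))
		printf("writing back page %lu\n", idx);
	return 0;
}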
2018-12-28 00:36:47 -08:00
|
|
|
|
|
|
|
down_read(&zram->init_lock);
|
|
|
|
if (!init_done(zram)) {
|
|
|
|
ret = -EINVAL;
|
|
|
|
goto release_init_lock;
|
|
|
|
}
|
|
|
|
|
2024-09-17 11:09:07 +09:00
|
|
|
/* Do not permit concurrent post-processing actions. */
|
|
|
|
if (atomic_xchg(&zram->pp_in_progress, 1)) {
|
|
|
|
up_read(&zram->init_lock);
|
|
|
|
return -EAGAIN;
|
|
|
|
}
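The atomic_xchg() above is a classic try-lock: the first caller flips pp_in_progress from 0 to 1 and proceeds, while any concurrent caller sees the old value 1 and backs off with -EAGAIN. A minimal user-space analogue using C11 atomics (hypothetical names, not the kernel API):

#include <stdatomic.h>
#include <stdbool.h>

static atomic_int pp_in_progress;

/* Returns true if we won the race and own the post-processing phase. */
static bool pp_try_begin(void)
{
	/* atomic_exchange returns the previous value; 0 means we got it. */
	return atomic_exchange(&pp_in_progress, 1) == 0;
}

static void pp_end(void)
{
	atomic_store(&pp_in_progress, 0);
}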
|
|
|
|
|
2018-12-28 00:36:47 -08:00
|
|
|
if (!zram->backing_dev) {
|
|
|
|
ret = -ENODEV;
|
|
|
|
goto release_init_lock;
|
|
|
|
}
|
|
|
|
|
|
|
|
page = alloc_page(GFP_KERNEL);
|
|
|
|
if (!page) {
|
|
|
|
ret = -ENOMEM;
|
|
|
|
goto release_init_lock;
|
|
|
|
}
|
|
|
|
|
2024-09-17 11:09:09 +09:00
|
|
|
ctl = init_pp_ctl();
|
|
|
|
if (!ctl) {
|
|
|
|
ret = -ENOMEM;
|
|
|
|
goto release_init_lock;
|
|
|
|
}
|
|
|
|
|
|
|
|
scan_slots_for_writeback(zram, mode, nr_pages, index, ctl);
|
|
|
|
|
|
|
|
while ((pps = select_pp_slot(ctl))) {
|
2019-01-08 15:22:53 -08:00
|
|
|
spin_lock(&zram->wb_limit_lock);
|
|
|
|
if (zram->wb_limit_enable && !zram->bd_wb_limit) {
|
|
|
|
spin_unlock(&zram->wb_limit_lock);
|
2018-12-28 00:36:54 -08:00
|
|
|
ret = -EIO;
|
|
|
|
break;
|
|
|
|
}
|
2019-01-08 15:22:53 -08:00
|
|
|
spin_unlock(&zram->wb_limit_lock);
|
2018-12-28 00:36:54 -08:00
|
|
|
|
2018-12-28 00:36:47 -08:00
|
|
|
if (!blk_idx) {
|
|
|
|
blk_idx = alloc_block_bdev(zram);
|
|
|
|
if (!blk_idx) {
|
|
|
|
ret = -ENOSPC;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2024-09-17 11:09:09 +09:00
|
|
|
index = pps->index;
|
2018-12-28 00:36:47 -08:00
|
|
|
zram_slot_lock(zram, index);
|
|
|
|
/*
|
2024-09-17 11:09:12 +09:00
|
|
|
* scan_slots() sets ZRAM_PP_SLOT and releases the slot lock, so
|
|
|
|
* slots can change in the meantime. If slots are accessed or
|
|
|
|
* freed, they lose the ZRAM_PP_SLOT flag and hence we don't
|
|
|
|
* post-process them.
|
2018-12-28 00:36:47 -08:00
|
|
|
*/
|
2024-09-17 11:09:12 +09:00
|
|
|
if (!zram_test_flag(zram, index, ZRAM_PP_SLOT))
|
|
|
|
goto next;
|
2024-12-18 15:34:23 +09:00
|
|
|
if (zram_read_from_zspool(zram, page, index))
|
|
|
|
goto next;
|
2018-12-28 00:36:47 -08:00
|
|
|
zram_slot_unlock(zram, index);
|
2024-09-17 11:09:09 +09:00
|
|
|
|
2024-04-28 18:55:47 -04:00
|
|
|
bio_init(&bio, zram->bdev, &bio_vec, 1,
|
2022-01-24 10:11:06 +01:00
|
|
|
REQ_OP_WRITE | REQ_SYNC);
|
2018-12-28 00:36:47 -08:00
|
|
|
bio.bi_iter.bi_sector = blk_idx * (PAGE_SIZE >> 9);
|
2023-05-31 04:50:34 -07:00
|
|
|
__bio_add_page(&bio, page, PAGE_SIZE, 0);
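The sector computation above converts a page-sized block index into 512-byte sectors: PAGE_SIZE >> 9 is the number of sectors per page (8 on a 4 KiB-page system), so block blk_idx starts at sector blk_idx * 8. A tiny self-contained check of that arithmetic, assuming a 4 KiB page size (the kernel value is per-architecture):

#include <stdio.h>

#define PAGE_SIZE 4096UL	/* assumed; per-arch in the kernel */
#define SECTOR_SHIFT 9		/* 512-byte sectors */

int main(void)
{
	unsigned long blk_idx = 3;
	unsigned long sector = blk_idx * (PAGE_SIZE >> SECTOR_SHIFT);

	/* 4096 >> 9 == 8 sectors per page, so block 3 starts at sector 24. */
	printf("block %lu -> sector %lu\n", blk_idx, sector);
	return 0;
}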
|
2018-12-28 00:36:47 -08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* XXX: A single page IO would be inefficient for write
|
|
|
|
* but it is not a bad starting point.
|
|
|
|
*/
|
2021-03-12 21:08:38 -08:00
|
|
|
err = submit_bio_wait(&bio);
|
|
|
|
if (err) {
|
2024-09-17 11:09:09 +09:00
|
|
|
release_pp_slot(zram, pps);
|
2021-03-12 21:08:38 -08:00
|
|
|
/*
|
2022-11-09 20:50:40 +09:00
|
|
|
* BIO errors are not fatal, we continue and simply
|
|
|
|
* attempt to write back the remaining objects (pages).
|
|
|
|
* At the same time we need to signal user-space that
|
|
|
|
* some writes (at least one, but also could be all of
|
|
|
|
* them) were not successful and we do so by returning
|
|
|
|
* the most recent BIO error.
|
2021-03-12 21:08:38 -08:00
|
|
|
*/
|
|
|
|
ret = err;
|
2018-12-28 00:36:47 -08:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2018-12-28 00:36:51 -08:00
|
|
|
atomic64_inc(&zram->stats.bd_writes);
|
2024-09-17 11:09:12 +09:00
|
|
|
zram_slot_lock(zram, index);
|
2018-12-28 00:36:47 -08:00
|
|
|
/*
|
2024-09-17 11:09:12 +09:00
|
|
|
* Same as above, we release slot lock during writeback so
|
|
|
|
* slot can change under us: slot_free() or slot_free() and
|
|
|
|
* reallocation (zram_write_page()). In both cases slot loses
|
|
|
|
* ZRAM_PP_SLOT flag. No concurrent post-processing can set
|
|
|
|
* ZRAM_PP_SLOT on such slots until current post-processing
|
|
|
|
* finishes.
|
2018-12-28 00:36:47 -08:00
|
|
|
*/
|
2024-09-17 11:09:12 +09:00
|
|
|
if (!zram_test_flag(zram, index, ZRAM_PP_SLOT))
|
2018-12-28 00:36:47 -08:00
|
|
|
goto next;
|
|
|
|
|
|
|
|
zram_free_page(zram, index);
|
|
|
|
zram_set_flag(zram, index, ZRAM_WB);
|
2024-12-18 15:34:19 +09:00
|
|
|
zram_set_handle(zram, index, blk_idx);
|
2018-12-28 00:36:47 -08:00
|
|
|
blk_idx = 0;
|
|
|
|
atomic64_inc(&zram->stats.pages_stored);
|
2019-01-08 15:22:53 -08:00
|
|
|
spin_lock(&zram->wb_limit_lock);
|
|
|
|
if (zram->wb_limit_enable && zram->bd_wb_limit > 0)
|
|
|
|
zram->bd_wb_limit -= 1UL << (PAGE_SHIFT - 12);
|
|
|
|
spin_unlock(&zram->wb_limit_lock);
|
2018-12-28 00:36:47 -08:00
|
|
|
next:
|
|
|
|
zram_slot_unlock(zram, index);
|
2024-09-17 11:09:09 +09:00
|
|
|
release_pp_slot(zram, pps);
|
2024-12-18 15:34:24 +09:00
|
|
|
|
|
|
|
cond_resched();
|
2018-12-28 00:36:47 -08:00
|
|
|
}
|
|
|
|
|
|
|
|
if (blk_idx)
|
|
|
|
free_block_bdev(zram, blk_idx);
|
|
|
|
__free_page(page);
|
|
|
|
release_init_lock:
|
2024-09-17 11:09:09 +09:00
|
|
|
release_pp_ctl(zram, ctl);
|
2024-09-17 11:09:07 +09:00
|
|
|
atomic_set(&zram->pp_in_progress, 0);
|
2018-12-28 00:36:47 -08:00
|
|
|
up_read(&zram->init_lock);
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2017-09-06 16:20:07 -07:00
|
|
|
struct zram_work {
|
|
|
|
struct work_struct work;
|
|
|
|
struct zram *zram;
|
|
|
|
unsigned long entry;
|
2023-04-11 19:14:56 +02:00
|
|
|
struct page *page;
|
2023-04-11 19:14:59 +02:00
|
|
|
int error;
|
2017-09-06 16:20:07 -07:00
|
|
|
};
|
|
|
|
|
|
|
|
static void zram_sync_read(struct work_struct *work)
|
|
|
|
{
|
|
|
|
struct zram_work *zw = container_of(work, struct zram_work, work);
|
2023-04-11 19:14:58 +02:00
|
|
|
struct bio_vec bv;
|
|
|
|
struct bio bio;
|
2017-09-06 16:20:07 -07:00
|
|
|
|
2024-04-28 18:55:47 -04:00
|
|
|
bio_init(&bio, zw->zram->bdev, &bv, 1, REQ_OP_READ);
|
2023-04-11 19:14:58 +02:00
|
|
|
bio.bi_iter.bi_sector = zw->entry * (PAGE_SIZE >> 9);
|
|
|
|
__bio_add_page(&bio, zw->page, PAGE_SIZE, 0);
|
2023-04-11 19:14:59 +02:00
|
|
|
zw->error = submit_bio_wait(&bio);
|
2017-09-06 16:20:07 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2020-07-01 10:59:43 +02:00
|
|
|
* The block layer wants one ->submit_bio to be active at a time, so if we use
|
|
|
|
* chained IO with the parent IO in the same context, it's a deadlock. To avoid that,
|
|
|
|
* use a worker thread context.
|
2017-09-06 16:20:07 -07:00
|
|
|
*/
|
2023-04-11 19:14:56 +02:00
|
|
|
static int read_from_bdev_sync(struct zram *zram, struct page *page,
|
2023-04-11 19:14:58 +02:00
|
|
|
unsigned long entry)
|
2017-09-06 16:20:07 -07:00
|
|
|
{
|
|
|
|
struct zram_work work;
|
|
|
|
|
2023-04-11 19:14:56 +02:00
|
|
|
work.page = page;
|
2017-09-06 16:20:07 -07:00
|
|
|
work.zram = zram;
|
|
|
|
work.entry = entry;
|
|
|
|
|
|
|
|
INIT_WORK_ONSTACK(&work.work, zram_sync_read);
|
|
|
|
queue_work(system_unbound_wq, &work.work);
|
|
|
|
flush_work(&work.work);
|
|
|
|
destroy_work_on_stack(&work.work);
|
|
|
|
|
2023-04-11 19:14:59 +02:00
|
|
|
return work.error;
|
2017-09-06 16:20:07 -07:00
|
|
|
}
|
|
|
|
|
2023-04-11 19:14:56 +02:00
|
|
|
static int read_from_bdev(struct zram *zram, struct page *page,
|
2023-04-11 19:14:58 +02:00
|
|
|
unsigned long entry, struct bio *parent)
|
2017-09-06 16:20:07 -07:00
|
|
|
{
|
2018-12-28 00:36:51 -08:00
|
|
|
atomic64_inc(&zram->stats.bd_reads);
|
2023-04-11 19:14:58 +02:00
|
|
|
if (!parent) {
|
2023-04-11 19:14:43 +02:00
|
|
|
if (WARN_ON_ONCE(!IS_ENABLED(ZRAM_PARTIAL_IO)))
|
|
|
|
return -EIO;
|
2023-04-11 19:14:58 +02:00
|
|
|
return read_from_bdev_sync(zram, page, entry);
|
2023-04-11 19:14:43 +02:00
|
|
|
}
|
2023-04-11 19:14:57 +02:00
|
|
|
read_from_bdev_async(zram, page, entry, parent);
|
2023-04-11 19:14:59 +02:00
|
|
|
return 0;
|
2017-09-06 16:20:07 -07:00
|
|
|
}
|
2017-09-06 16:19:54 -07:00
|
|
|
#else
|
|
|
|
static inline void reset_bdev(struct zram *zram) {};
|
2023-04-11 19:14:56 +02:00
|
|
|
static int read_from_bdev(struct zram *zram, struct page *page,
|
2023-04-11 19:14:58 +02:00
|
|
|
unsigned long entry, struct bio *parent)
|
2017-09-06 16:20:07 -07:00
|
|
|
{
|
|
|
|
return -EIO;
|
|
|
|
}
|
2018-12-28 00:36:40 -08:00
|
|
|
|
|
|
|
static void free_block_bdev(struct zram *zram, unsigned long blk_idx) {};
|
2017-09-06 16:19:54 -07:00
|
|
|
#endif
|
|
|
|
|
2018-06-07 17:05:49 -07:00
|
|
|
#ifdef CONFIG_ZRAM_MEMORY_TRACKING
|
|
|
|
|
|
|
|
static struct dentry *zram_debugfs_root;
|
|
|
|
|
|
|
|
static void zram_debugfs_create(void)
|
|
|
|
{
|
|
|
|
zram_debugfs_root = debugfs_create_dir("zram", NULL);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void zram_debugfs_destroy(void)
|
|
|
|
{
|
|
|
|
debugfs_remove_recursive(zram_debugfs_root);
|
|
|
|
}
|
|
|
|
|
|
|
|
static ssize_t read_block_state(struct file *file, char __user *buf,
|
|
|
|
size_t count, loff_t *ppos)
|
|
|
|
{
|
|
|
|
char *kbuf;
|
|
|
|
ssize_t index, written = 0;
|
|
|
|
struct zram *zram = file->private_data;
|
|
|
|
unsigned long nr_pages = zram->disksize >> PAGE_SHIFT;
|
|
|
|
struct timespec64 ts;
|
|
|
|
|
|
|
|
kbuf = kvmalloc(count, GFP_KERNEL);
|
|
|
|
if (!kbuf)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
down_read(&zram->init_lock);
|
|
|
|
if (!init_done(zram)) {
|
|
|
|
up_read(&zram->init_lock);
|
|
|
|
kvfree(kbuf);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
for (index = *ppos; index < nr_pages; index++) {
|
|
|
|
int copied;
|
|
|
|
|
|
|
|
zram_slot_lock(zram, index);
|
|
|
|
if (!zram_allocated(zram, index))
|
|
|
|
goto next;
|
|
|
|
|
|
|
|
ts = ktime_to_timespec64(zram->table[index].ac_time);
|
|
|
|
copied = snprintf(kbuf + written, count,
|
2022-11-09 20:50:47 +09:00
|
|
|
"%12zd %12lld.%06lu %c%c%c%c%c%c\n",
|
2018-06-07 17:05:49 -07:00
|
|
|
index, (s64)ts.tv_sec,
|
|
|
|
ts.tv_nsec / NSEC_PER_USEC,
|
|
|
|
zram_test_flag(zram, index, ZRAM_SAME) ? 's' : '.',
|
|
|
|
zram_test_flag(zram, index, ZRAM_WB) ? 'w' : '.',
|
2018-12-28 00:36:44 -08:00
|
|
|
zram_test_flag(zram, index, ZRAM_HUGE) ? 'h' : '.',
|
2022-11-09 20:50:39 +09:00
|
|
|
zram_test_flag(zram, index, ZRAM_IDLE) ? 'i' : '.',
|
2022-11-09 20:50:47 +09:00
|
|
|
zram_get_priority(zram, index) ? 'r' : '.',
|
|
|
|
zram_test_flag(zram, index,
|
|
|
|
ZRAM_INCOMPRESSIBLE) ? 'n' : '.');
|
2018-06-07 17:05:49 -07:00
|
|
|
|
2021-11-05 13:45:12 -07:00
|
|
|
if (count <= copied) {
|
2018-06-07 17:05:49 -07:00
|
|
|
zram_slot_unlock(zram, index);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
written += copied;
|
|
|
|
count -= copied;
|
|
|
|
next:
|
|
|
|
zram_slot_unlock(zram, index);
|
|
|
|
*ppos += 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
up_read(&zram->init_lock);
|
|
|
|
if (copy_to_user(buf, kbuf, written))
|
|
|
|
written = -EFAULT;
|
|
|
|
kvfree(kbuf);
|
|
|
|
|
|
|
|
return written;
|
|
|
|
}
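For reference, each line emitted above is the slot index, the last access
time and a six-character flag field: s (ZRAM_SAME), w (ZRAM_WB), h
(ZRAM_HUGE), i (ZRAM_IDLE), r (recompressed, i.e. non-zero priority) and n
(ZRAM_INCOMPRESSIBLE). A constructed sample (the values are illustrative
only):

         300           75.033841 .wh...
         301           77.036941 s.....
         302           79.039020 ...i.n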
|
|
|
|
|
|
|
|
static const struct file_operations proc_zram_block_state_op = {
|
|
|
|
.open = simple_open,
|
|
|
|
.read = read_block_state,
|
|
|
|
.llseek = default_llseek,
|
|
|
|
};
|
|
|
|
|
|
|
|
static void zram_debugfs_register(struct zram *zram)
|
|
|
|
{
|
|
|
|
if (!zram_debugfs_root)
|
|
|
|
return;
|
|
|
|
|
|
|
|
zram->debugfs_dir = debugfs_create_dir(zram->disk->disk_name,
|
|
|
|
zram_debugfs_root);
|
|
|
|
debugfs_create_file("block_state", 0400, zram->debugfs_dir,
|
|
|
|
zram, &proc_zram_block_state_op);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void zram_debugfs_unregister(struct zram *zram)
|
|
|
|
{
|
|
|
|
debugfs_remove_recursive(zram->debugfs_dir);
|
|
|
|
}
|
|
|
|
#else
|
|
|
|
static void zram_debugfs_create(void) {};
|
|
|
|
static void zram_debugfs_destroy(void) {};
|
|
|
|
static void zram_debugfs_register(struct zram *zram) {};
|
|
|
|
static void zram_debugfs_unregister(struct zram *zram) {};
|
|
|
|
#endif
|
2017-09-06 16:19:54 -07:00
|
|
|
|
2016-05-20 16:59:59 -07:00
|
|
|
/*
|
|
|
|
* We switched to per-cpu streams and this attr is not needed anymore.
|
|
|
|
* However, we will keep it around for some time, because:
|
|
|
|
* a) we may revert per-cpu streams in the future
|
|
|
|
* b) it's visible to user space and we need to follow our 2 years
|
|
|
|
* retirement rule; but we already have a number of 'soon to be
|
|
|
|
* altered' attrs, so max_comp_streams needs to wait for the next
|
|
|
|
* layoff cycle.
|
|
|
|
*/
|
zram: reorganize code layout
This patch looks big, but basically it just moves code blocks.
No functional changes.
Our current code layout looks like a sandwich.
For example,
a) between read/write handlers, we have update_used_max() helper function:
static int zram_decompress_page
static int zram_bvec_read
static inline void update_used_max
static int zram_bvec_write
static int zram_bvec_rw
b) RW request handlers __zram_make_request/zram_bio_discard are divided by
sysfs attr reset_store() function and corresponding zram_reset_device()
handler:
static void zram_bio_discard
static void zram_reset_device
static ssize_t disksize_store
static ssize_t reset_store
static void __zram_make_request
c) we first have a bunch of sysfs read/store functions, then a number of
one-liners, then helper functions, RW functions, sysfs functions, helper
functions again, and so on.
Reorganize layout to be more logically grouped (a brief description,
`cat zram_drv.c | grep static` gives a bigger picture):
-- one-liners: zram_test_flag/etc.
-- helpers: is_partial_io/update_position/etc
-- sysfs attr show/store functions + ZRAM_ATTR_RO() generated stats
show() functions
exception: the reset and disksize store functions are required to come
after the meta() functions, because we do device create/destroy actions
in these sysfs handlers.
-- "mm" functions: meta get/put, meta alloc/free, page free
static inline bool zram_meta_get
static inline void zram_meta_put
static void zram_meta_free
static struct zram_meta *zram_meta_alloc
static void zram_free_page
-- a block of I/O functions
static int zram_decompress_page
static int zram_bvec_read
static int zram_bvec_write
static void zram_bio_discard
static int zram_bvec_rw
static void __zram_make_request
static void zram_make_request
static void zram_slot_free_notify
static int zram_rw_page
-- device control: add/remove/init/reset functions (+zram-control class
will sit here)
static int zram_reset_device
static ssize_t reset_store
static ssize_t disksize_store
static int zram_add
static void zram_remove
static int __init zram_init
static void __exit zram_exit
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 15:00:08 -07:00
|
|
|
static ssize_t max_comp_streams_show(struct device *dev,
|
|
|
|
struct device_attribute *attr, char *buf)
|
|
|
|
{
|
2016-05-20 16:59:59 -07:00
|
|
|
return scnprintf(buf, PAGE_SIZE, "%d\n", num_online_cpus());
|
2015-06-25 15:00:08 -07:00
|
|
|
}
|
|
|
|
|
zram: add multi stream functionality
The existing zram (zcomp) implementation has only one compression stream
(buffer and algorithm private part), so in order to prevent data
corruption only one write (compress operation) can use this compression
stream, forcing all concurrent write operations to wait for the stream
lock to be released. This patch changes zcomp to keep a list of
compression streams of user-defined size (via a sysfs device attr). Each
write operation still exclusively holds a compression stream; the
difference is that we can have N write operations (depending on the size
of the streams list) executing in parallel. See the TEST section later
in the commit message for performance data.
Introduce struct zcomp_strm_multi and a set of functions to manage
zcomp_strm stream access. zcomp_strm_multi has a list of idle
zcomp_strm structs, spinlock to protect idle list and wait queue, making
it possible to perform parallel compressions.
The following set of functions added:
- zcomp_strm_multi_find()/zcomp_strm_multi_release()
find and release a compression stream, implement required locking
- zcomp_strm_multi_create()/zcomp_strm_multi_destroy()
create and destroy zcomp_strm_multi
zcomp ->strm_find() and ->strm_release() callbacks are set during
initialisation to zcomp_strm_multi_find()/zcomp_strm_multi_release()
correspondingly.
Each time zcomp issues a zcomp_strm_multi_find() call, the following set
of operations is performed:
- spin lock strm_lock
- if the idle list is not empty, remove a zcomp_strm from the idle list,
spin unlock and return the zcomp stream pointer to the caller
- if the idle list is empty, the current task adds itself to the wait
queue; it will be awakened by a zcomp_strm_multi_release() caller.
zcomp_strm_multi_release():
- spin lock strm_lock
- add zcomp stream to idle list
- spin unlock, wake up sleeper
Minchan Kim reported that the spinlock-based locking scheme demonstrated
a severe performance regression for the single compression stream case,
compared to the mutex-based one (see https://lkml.org/lkml/2014/2/18/16)
base spinlock mutex
==Initial write ==Initial write ==Initial write
records: 5 records: 5 records: 5
avg: 1642424.35 avg: 699610.40 avg: 1655583.71
std: 39890.95(2.43%) std: 232014.19(33.16%) std: 52293.96
max: 1690170.94 max: 1163473.45 max: 1697164.75
min: 1568669.52 min: 573429.88 min: 1553410.23
==Rewrite ==Rewrite ==Rewrite
records: 5 records: 5 records: 5
avg: 1611775.39 avg: 501406.64 avg: 1684419.11
std: 17144.58(1.06%) std: 15354.41(3.06%) std: 18367.42
max: 1641800.95 max: 531356.78 max: 1706445.84
min: 1593515.27 min: 488817.78 min: 1655335.73
When only one compression stream is available, a mutex with spin-on-owner
tends to perform much better than frequent wait_event()/wake_up(). This
is why the single-stream case is implemented specially, with mutex locking.
Introduce and document zram device attribute max_comp_streams. This
attr shows and stores current zcomp's max number of zcomp streams
(max_strm). Extend zcomp's zcomp_create() with `max_strm' parameter.
`max_strm' limits the number of zcomp_strm structs in compression
backend's idle list (max_comp_streams).
max_comp_streams is used during initialisation as follows:
-- passing to zcomp_create() max_strm equals to 1 will initialise zcomp
using single compression stream zcomp_strm_single (mutex-based locking).
-- passing to zcomp_create() max_strm greater than 1 will initialise zcomp
using multi compression stream zcomp_strm_multi (spinlock-based locking).
The default max_comp_streams value is 1, meaning that zram will be
initialised with a single stream.
Later patch will introduce configuration knob to change max_comp_streams
on already initialised and used zcomp.
TEST
iozone -t 3 -R -r 16K -s 60M -I +Z
test base 1 strm (mutex) 3 strm (spinlock)
-----------------------------------------------------------------------
Initial write 589286.78 583518.39 718011.05
Rewrite 604837.97 596776.38 1515125.72
Random write 584120.11 595714.58 1388850.25
Pwrite 535731.17 541117.38 739295.27
Fwrite 1418083.88 1478612.72 1484927.06
Usage example:
set max_comp_streams to 4
echo 4 > /sys/block/zram0/max_comp_streams
show current max_comp_streams (default value is 1).
cat /sys/block/zram0/max_comp_streams
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 15:38:14 -07:00
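A condensed sketch of the find/release protocol described above. The struct
layouts here are assumptions for illustration, not the exact zcomp
definitions:

#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/wait.h>

struct zcomp_strm {
	void *buffer;			/* algorithm scratch space */
	struct list_head list;		/* entry in the idle list */
};

struct zcomp_strm_multi {
	spinlock_t strm_lock;		/* protects idle_strm */
	struct list_head idle_strm;	/* idle compression streams */
	wait_queue_head_t strm_wait;	/* writers waiting for a stream */
};

static struct zcomp_strm *zcomp_strm_multi_find(struct zcomp_strm_multi *zs)
{
	struct zcomp_strm *zstrm;

	for (;;) {
		spin_lock(&zs->strm_lock);
		if (!list_empty(&zs->idle_strm)) {
			/* pop an idle stream and hand it to the writer */
			zstrm = list_first_entry(&zs->idle_strm,
						 struct zcomp_strm, list);
			list_del(&zstrm->list);
			spin_unlock(&zs->strm_lock);
			return zstrm;
		}
		spin_unlock(&zs->strm_lock);
		/* idle list is empty: sleep until a release wakes us up */
		wait_event(zs->strm_wait, !list_empty(&zs->idle_strm));
	}
}

static void zcomp_strm_multi_release(struct zcomp_strm_multi *zs,
				     struct zcomp_strm *zstrm)
{
	spin_lock(&zs->strm_lock);
	list_add(&zstrm->list, &zs->idle_strm);
	spin_unlock(&zs->strm_lock);
	wake_up(&zs->strm_wait);
}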
|
|
|
static ssize_t max_comp_streams_store(struct device *dev,
|
|
|
|
struct device_attribute *attr, const char *buf, size_t len)
|
|
|
|
{
|
2016-05-20 16:59:59 -07:00
|
|
|
return len;
|
2014-04-07 15:38:14 -07:00
|
|
|
}
|
|
|
|
|
2022-11-09 20:50:36 +09:00
|
|
|
static void comp_algorithm_set(struct zram *zram, u32 prio, const char *alg)
|
2014-04-07 15:38:17 -07:00
|
|
|
{
|
2022-11-09 20:50:36 +09:00
|
|
|
/* Do not free statically defined compression algorithms */
|
|
|
|
if (zram->comp_algs[prio] != default_compressor)
|
|
|
|
kfree(zram->comp_algs[prio]);
|
|
|
|
|
|
|
|
zram->comp_algs[prio] = alg;
|
|
|
|
}
|
|
|
|
|
|
|
|
static ssize_t __comp_algorithm_show(struct zram *zram, u32 prio, char *buf)
|
|
|
|
{
|
|
|
|
ssize_t sz;
|
2014-04-07 15:38:17 -07:00
|
|
|
|
|
|
|
down_read(&zram->init_lock);
|
2022-11-09 20:50:36 +09:00
|
|
|
sz = zcomp_available_show(zram->comp_algs[prio], buf);
|
2014-04-07 15:38:17 -07:00
|
|
|
up_read(&zram->init_lock);
|
|
|
|
|
|
|
|
return sz;
|
|
|
|
}
|
|
|
|
|
2022-11-09 20:50:36 +09:00
|
|
|
static int __comp_algorithm_store(struct zram *zram, u32 prio, const char *buf)
|
2014-04-07 15:38:17 -07:00
|
|
|
{
|
2022-11-09 20:50:35 +09:00
|
|
|
char *compressor;
|
2015-06-25 15:00:29 -07:00
|
|
|
size_t sz;
|
|
|
|
|
2022-11-09 20:50:35 +09:00
|
|
|
sz = strlen(buf);
|
|
|
|
if (sz >= CRYPTO_MAX_ALG_NAME)
|
|
|
|
return -E2BIG;
|
|
|
|
|
|
|
|
compressor = kstrdup(buf, GFP_KERNEL);
|
|
|
|
if (!compressor)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
zram: use crypto api to check alg availability
There is no way to get a string with all the crypto comp algorithms
supported by the crypto comp engine, so we need to maintain our own
backends list. At the same time we additionally need to use
crypto_has_comp() to make sure that the user has requested a compression
algorithm that is recognized by the crypto comp engine. Relying on
/proc/crypto is not an option here, because it does not show
not-yet-inserted compression modules.
Example:
modprobe zram
cat /proc/crypto | grep -i lz4
modprobe lz4
cat /proc/crypto | grep -i lz4
name : lz4
driver : lz4-generic
module : lz4
So the user can't tell from the /proc/crypto output whether lz4 is really
supported, unless someone or something has loaded it.
This patch also adds crypto_has_comp() to zcomp_available_show(). We
store all the compression algorithm names in zcomp's `backends' array,
regardless of the CONFIG_CRYPTO_FOO configuration, but show only those
that are also supported by the crypto engine. This helps the user know
the exact list of compression algorithms that can be used.
Example:
module lz4 is not loaded yet, but is supported by the crypto
engine. /proc/crypto has no information on this module, while
zram's `comp_algorithm' lists it:
cat /proc/crypto | grep -i lz4
cat /sys/block/zram0/comp_algorithm
[lzo] lz4 deflate lz4hc 842
We still use the `backends' array to determine if the requested
compression backend is known to the crypto API. This array, however, may
not contain some entries, so as the last step we call the
crypto_has_comp() function, which attempts to insmod the requested
compression algorithm to determine if the crypto API supports it. The
advantage of this method is that we now permit the use of out-of-tree
crypto compression modules (implementing S/W or H/W compression).
[sergey.senozhatsky@gmail.com: zram-use-crypto-api-to-check-alg-availability-v3]
Link: http://lkml.kernel.org/r/20160604024902.11778-4-sergey.senozhatsky@gmail.com
Link: http://lkml.kernel.org/r/20160531122017.2878-5-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 15:22:48 -07:00
|
|
|
/* ignore trailing newline */
|
|
|
|
if (sz > 0 && compressor[sz - 1] == '\n')
|
|
|
|
compressor[sz - 1] = 0x00;
|
|
|
|
|
2022-11-09 20:50:35 +09:00
|
|
|
if (!zcomp_available_algorithm(compressor)) {
|
|
|
|
kfree(compressor);
|
2015-11-06 16:29:01 -08:00
|
|
|
return -EINVAL;
|
2022-11-09 20:50:35 +09:00
|
|
|
}
|
2015-11-06 16:29:01 -08:00
|
|
|
|
2014-04-07 15:38:17 -07:00
|
|
|
down_write(&zram->init_lock);
|
|
|
|
if (init_done(zram)) {
|
|
|
|
up_write(&zram->init_lock);
|
2022-11-09 20:50:35 +09:00
|
|
|
kfree(compressor);
|
2014-04-07 15:38:17 -07:00
|
|
|
pr_info("Can't change algorithm for initialized device\n");
|
|
|
|
return -EBUSY;
|
|
|
|
}
|
2015-06-25 15:00:29 -07:00
|
|
|
|
2022-11-09 20:50:36 +09:00
|
|
|
comp_algorithm_set(zram, prio, compressor);
|
2014-04-07 15:38:17 -07:00
|
|
|
up_write(&zram->init_lock);
|
2022-11-09 20:50:36 +09:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2024-09-02 19:56:04 +09:00
|
|
|
static void comp_params_reset(struct zram *zram, u32 prio)
|
2024-09-02 19:56:03 +09:00
|
|
|
{
|
2024-09-02 19:56:04 +09:00
|
|
|
struct zcomp_params *params = &zram->params[prio];
|
|
|
|
|
|
|
|
vfree(params->dict);
|
|
|
|
params->level = ZCOMP_PARAM_NO_LEVEL;
|
|
|
|
params->dict_sz = 0;
|
|
|
|
params->dict = NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int comp_params_store(struct zram *zram, u32 prio, s32 level,
|
|
|
|
const char *dict_path)
|
|
|
|
{
|
|
|
|
ssize_t sz = 0;
|
|
|
|
|
|
|
|
comp_params_reset(zram, prio);
|
|
|
|
|
|
|
|
if (dict_path) {
|
|
|
|
sz = kernel_read_file_from_path(dict_path, 0,
|
|
|
|
&zram->params[prio].dict,
|
|
|
|
INT_MAX,
|
|
|
|
NULL,
|
|
|
|
READING_POLICY);
|
|
|
|
if (sz < 0)
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
zram->params[prio].dict_sz = sz;
|
2024-09-02 19:56:03 +09:00
|
|
|
zram->params[prio].level = level;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static ssize_t algorithm_params_store(struct device *dev,
|
|
|
|
struct device_attribute *attr,
|
|
|
|
const char *buf,
|
|
|
|
size_t len)
|
|
|
|
{
|
|
|
|
s32 prio = ZRAM_PRIMARY_COMP, level = ZCOMP_PARAM_NO_LEVEL;
|
2024-09-02 19:56:04 +09:00
|
|
|
char *args, *param, *val, *algo = NULL, *dict_path = NULL;
|
2024-09-02 19:56:03 +09:00
|
|
|
struct zram *zram = dev_to_zram(dev);
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
args = skip_spaces(buf);
|
|
|
|
while (*args) {
|
|
|
|
args = next_arg(args, ¶m, &val);
|
|
|
|
|
|
|
|
if (!val || !*val)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (!strcmp(param, "priority")) {
|
|
|
|
ret = kstrtoint(val, 10, &prio);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!strcmp(param, "level")) {
|
|
|
|
ret = kstrtoint(val, 10, &level);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!strcmp(param, "algo")) {
|
|
|
|
algo = val;
|
|
|
|
continue;
|
|
|
|
}
|
2024-09-02 19:56:04 +09:00
|
|
|
|
|
|
|
if (!strcmp(param, "dict")) {
|
|
|
|
dict_path = val;
|
|
|
|
continue;
|
|
|
|
}
|
2024-09-02 19:56:03 +09:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Lookup priority by algorithm name */
|
|
|
|
if (algo) {
|
|
|
|
s32 p;
|
|
|
|
|
|
|
|
prio = -EINVAL;
|
|
|
|
for (p = ZRAM_PRIMARY_COMP; p < ZRAM_MAX_COMPS; p++) {
|
|
|
|
if (!zram->comp_algs[p])
|
|
|
|
continue;
|
|
|
|
|
|
|
|
if (!strcmp(zram->comp_algs[p], algo)) {
|
|
|
|
prio = p;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (prio < ZRAM_PRIMARY_COMP || prio >= ZRAM_MAX_COMPS)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2024-09-02 19:56:04 +09:00
|
|
|
ret = comp_params_store(zram, prio, level, dict_path);
|
2024-09-02 19:56:03 +09:00
|
|
|
return ret ? ret : len;
|
|
|
|
}
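Following the usage-example convention of the messages in this log, a store
into algorithm_params takes space-separated param=value pairs; hypothetical
invocations (the algorithm name, level and dictionary path are examples, not
requirements):

echo "priority=0 level=3" > /sys/block/zram0/algorithm_params
echo "algo=zstd level=8 dict=/etc/zram/zstd.dict" > /sys/block/zram0/algorithm_params

The second form resolves the priority by scanning comp_algs[] for the named
algorithm, as the lookup loop above shows.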
|
|
|
|
|
2022-11-09 20:50:36 +09:00
|
|
|
static ssize_t comp_algorithm_show(struct device *dev,
|
|
|
|
struct device_attribute *attr,
|
|
|
|
char *buf)
|
|
|
|
{
|
|
|
|
struct zram *zram = dev_to_zram(dev);
|
|
|
|
|
|
|
|
return __comp_algorithm_show(zram, ZRAM_PRIMARY_COMP, buf);
|
|
|
|
}
|
|
|
|
|
|
|
|
static ssize_t comp_algorithm_store(struct device *dev,
|
|
|
|
struct device_attribute *attr,
|
|
|
|
const char *buf,
|
|
|
|
size_t len)
|
|
|
|
{
|
|
|
|
struct zram *zram = dev_to_zram(dev);
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
ret = __comp_algorithm_store(zram, ZRAM_PRIMARY_COMP, buf);
|
|
|
|
return ret ? ret : len;
|
2014-04-07 15:38:17 -07:00
|
|
|
}
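As with the earlier examples in this log, the primary algorithm can only be
selected before the device is initialised (the store handler above returns
-EBUSY once init_done() is true); assuming the algorithm is available to the
crypto API:

echo zstd > /sys/block/zram0/comp_algorithm
cat /sys/block/zram0/comp_algorithm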
|
|
|
|
|
2022-11-09 20:50:36 +09:00
|
|
|
#ifdef CONFIG_ZRAM_MULTI_COMP
|
|
|
|
static ssize_t recomp_algorithm_show(struct device *dev,
|
|
|
|
struct device_attribute *attr,
|
|
|
|
char *buf)
|
|
|
|
{
|
|
|
|
struct zram *zram = dev_to_zram(dev);
|
|
|
|
ssize_t sz = 0;
|
|
|
|
u32 prio;
|
|
|
|
|
|
|
|
for (prio = ZRAM_SECONDARY_COMP; prio < ZRAM_MAX_COMPS; prio++) {
|
|
|
|
if (!zram->comp_algs[prio])
|
|
|
|
continue;
|
|
|
|
|
|
|
|
sz += scnprintf(buf + sz, PAGE_SIZE - sz - 2, "#%d: ", prio);
|
|
|
|
sz += __comp_algorithm_show(zram, prio, buf + sz);
|
|
|
|
}
|
|
|
|
|
|
|
|
return sz;
|
|
|
|
}
|
|
|
|
|
|
|
|
static ssize_t recomp_algorithm_store(struct device *dev,
|
|
|
|
struct device_attribute *attr,
|
|
|
|
const char *buf,
|
|
|
|
size_t len)
|
|
|
|
{
|
|
|
|
struct zram *zram = dev_to_zram(dev);
|
|
|
|
int prio = ZRAM_SECONDARY_COMP;
|
|
|
|
char *args, *param, *val;
|
|
|
|
char *alg = NULL;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
args = skip_spaces(buf);
|
|
|
|
while (*args) {
|
|
|
|
args = next_arg(args, ¶m, &val);
|
|
|
|
|
2023-01-03 12:01:19 +09:00
|
|
|
if (!val || !*val)
|
2022-11-09 20:50:36 +09:00
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (!strcmp(param, "algo")) {
|
|
|
|
alg = val;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!strcmp(param, "priority")) {
|
|
|
|
ret = kstrtoint(val, 10, &prio);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!alg)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (prio < ZRAM_SECONDARY_COMP || prio >= ZRAM_MAX_COMPS)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
ret = __comp_algorithm_store(zram, prio, alg);
|
|
|
|
return ret ? ret : len;
|
|
|
|
}
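A hypothetical store for the parser above, registering a recompression
algorithm (the name is illustrative; priority defaults to
ZRAM_SECONDARY_COMP when omitted):

echo "algo=lz4 priority=1" > /sys/block/zram0/recomp_algorithm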
|
|
|
|
#endif
|
|
|
|
|
2015-06-25 15:00:08 -07:00
|
|
|
static ssize_t compact_store(struct device *dev,
|
|
|
|
struct device_attribute *attr, const char *buf, size_t len)
|
2009-09-22 10:26:53 +05:30
|
|
|
{
|
2015-06-25 15:00:08 -07:00
|
|
|
struct zram *zram = dev_to_zram(dev);
|
2009-09-22 10:26:53 +05:30
|
|
|
|
2015-06-25 15:00:08 -07:00
|
|
|
down_read(&zram->init_lock);
|
|
|
|
if (!init_done(zram)) {
|
|
|
|
up_read(&zram->init_lock);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
2009-09-22 10:26:53 +05:30
|
|
|
|
2017-05-03 14:55:47 -07:00
|
|
|
zs_compact(zram->mem_pool);
|
2015-06-25 15:00:08 -07:00
|
|
|
up_read(&zram->init_lock);
|
zram: replace global tb_lock with fine grain lock
Currently, we use a rwlock tb_lock to protect concurrent access to the
whole zram meta table. However, according to the actual access model,
there is only a small chance for an upper-layer user to access the same
table[index], so the current lock granularity is too coarse.
The idea of optimization is to change the lock granularity from whole
meta table to per table entry (table -> table[index]), so that we can
protect concurrent access to the same table[index], meanwhile allow the
maximum concurrency.
With this in mind, several kinds of locks which could be used as a
per-entry lock were tested and compared:
Test environment:
x86-64 Intel Core2 Q8400, system memory 4GB, Ubuntu 12.04,
kernel v3.15.0-rc3 as base, zram with 4 max_comp_streams LZO.
iozone test:
iozone -t 4 -R -r 16K -s 200M -I +Z
(1GB zram with ext4 filesystem, take the average of 10 tests, KB/s)
Test base CAS spinlock rwlock bit_spinlock
-------------------------------------------------------------------
Initial write 1381094 1425435 1422860 1423075 1421521
Rewrite 1529479 1641199 1668762 1672855 1654910
Read 8468009 11324979 11305569 11117273 10997202
Re-read 8467476 11260914 11248059 11145336 10906486
Reverse Read 6821393 8106334 8282174 8279195 8109186
Stride read 7191093 8994306 9153982 8961224 9004434
Random read 7156353 8957932 9167098 8980465 8940476
Mixed workload 4172747 5680814 5927825 5489578 5972253
Random write 1483044 1605588 1594329 1600453 1596010
Pwrite 1276644 1303108 1311612 1314228 1300960
Pread 4324337 4632869 4618386 4457870 4500166
To increase the chance of accessing the same table[index] concurrently,
we set zram to a small disksize (10MB) and let threads run with a large
loop count.
fio test:
fio --bs=32k --randrepeat=1 --randseed=100 --refill_buffers
--scramble_buffers=1 --direct=1 --loops=3000 --numjobs=4
--filename=/dev/zram0 --name=seq-write --rw=write --stonewall
--name=seq-read --rw=read --stonewall --name=seq-readwrite
--rw=rw --stonewall --name=rand-readwrite --rw=randrw --stonewall
(10MB zram raw block device, take the average of 10 tests, KB/s)
Test base CAS spinlock rwlock bit_spinlock
-------------------------------------------------------------
seq-write 933789 999357 1003298 995961 1001958
seq-read 5634130 6577930 6380861 6243912 6230006
seq-rw 1405687 1638117 1640256 1633903 1634459
rand-rw 1386119 1614664 1617211 1609267 1612471
All the optimization methods show a higher performance than the base,
however, it is hard to say which method is the most appropriate.
On the other hand, zram is mostly used on small embedded systems, so we
don't want to increase any memory footprint.
This patch picks the bit_spinlock method, packing the object size and
page flags into an unsigned long table.value, so as not to increase any
memory overhead on either 32-bit or 64-bit systems.
Finally, even though different kinds of locks perform differently, we
can ignore the difference here: if zram is used as a swap device, the
swap subsystem prevents concurrent access to the same swap slot; if zram
is used as a block device with a filesystem on top, the filesystem and
the page cache mostly prevent concurrent access to the same block.
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06 16:08:31 -07:00
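A sketch of the packing the message describes. The field width and bit
positions here are assumptions for illustration, not the exact zram
definitions: the low bits of table[index].value hold the object size, the
bits above hold page flags, and one flag bit doubles as the per-entry bit
spinlock:

#include <linux/bit_spinlock.h>
#include <linux/types.h>

#define SKETCH_FLAG_SHIFT	24			/* size lives below this bit */
#define SKETCH_ACCESS_BIT	SKETCH_FLAG_SHIFT	/* flag bit used as the lock */

static void entry_lock(unsigned long *value)
{
	bit_spin_lock(SKETCH_ACCESS_BIT, value);
}

static void entry_unlock(unsigned long *value)
{
	bit_spin_unlock(SKETCH_ACCESS_BIT, value);
}

static size_t entry_obj_size(unsigned long value)
{
	return value & ((1UL << SKETCH_FLAG_SHIFT) - 1);
}

static void entry_set_obj_size(unsigned long *value, size_t size)
{
	/* caller holds the entry lock; size must fit below FLAG_SHIFT */
	*value = (*value & ~((1UL << SKETCH_FLAG_SHIFT) - 1)) | size;
}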
|
|
|
|
zram: reorganize code layout
This patch looks big, but basically it just moves code blocks.
No functional changes.
Our current code layout looks like a sandwitch.
For example,
a) between read/write handlers, we have update_used_max() helper function:
static int zram_decompress_page
static int zram_bvec_read
static inline void update_used_max
static int zram_bvec_write
static int zram_bvec_rw
b) RW request handlers __zram_make_request/zram_bio_discard are divided by
sysfs attr reset_store() function and corresponding zram_reset_device()
handler:
static void zram_bio_discard
static void zram_reset_device
static ssize_t disksize_store
static ssize_t reset_store
static void __zram_make_request
c) we first a bunch of sysfs read/store functions. then a number of
one-liners, then helper functions, RW functions, sysfs functions, helper
functions again, and so on.
Reorganize layout to be more logically grouped (a brief description,
`cat zram_drv.c | grep static` gives a bigger picture):
-- one-liners: zram_test_flag/etc.
-- helpers: is_partial_io/update_position/etc
-- sysfs attr show/store functions + ZRAM_ATTR_RO() generated stats
show() functions
exception: reset and disksize store functions are required to be after
meta() functions. because we do device create/destroy actions in these
sysfs handlers.
-- "mm" functions: meta get/put, meta alloc/free, page free
static inline bool zram_meta_get
static inline void zram_meta_put
static void zram_meta_free
static struct zram_meta *zram_meta_alloc
static void zram_free_page
-- a block of I/O functions
static int zram_decompress_page
static int zram_bvec_read
static int zram_bvec_write
static void zram_bio_discard
static int zram_bvec_rw
static void __zram_make_request
static void zram_make_request
static void zram_slot_free_notify
static int zram_rw_page
-- device control: add/remove/init/reset functions (+zram-control class
will sit here)
static int zram_reset_device
static ssize_t reset_store
static ssize_t disksize_store
static int zram_add
static void zram_remove
static int __init zram_init
static void __exit zram_exit
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 15:00:08 -07:00
|
|
|
return len;
|
2014-08-06 16:08:31 -07:00
|
|
|
}
|
|
|
|
|
2015-06-25 15:00:08 -07:00
|
|
|
static ssize_t io_stat_show(struct device *dev,
|
|
|
|
struct device_attribute *attr, char *buf)
|
2014-08-06 16:08:31 -07:00
|
|
|
{
|
2015-06-25 15:00:08 -07:00
|
|
|
struct zram *zram = dev_to_zram(dev);
|
|
|
|
ssize_t ret;
|
2014-08-06 16:08:31 -07:00
|
|
|
|
2015-06-25 15:00:08 -07:00
|
|
|
down_read(&zram->init_lock);
|
|
|
|
ret = scnprintf(buf, PAGE_SIZE,
|
2023-04-11 19:14:44 +02:00
|
|
|
"%8llu %8llu 0 %8llu\n",
|
2015-06-25 15:00:08 -07:00
|
|
|
(u64)atomic64_read(&zram->stats.failed_reads),
|
|
|
|
(u64)atomic64_read(&zram->stats.failed_writes),
|
|
|
|
(u64)atomic64_read(&zram->stats.notify_free));
|
|
|
|
up_read(&zram->init_lock);
|
2009-09-22 10:26:53 +05:30
|
|
|
|
2015-06-25 15:00:08 -07:00
|
|
|
return ret;
|
2013-06-22 03:21:18 +03:00
|
|
|
}
|
|
|
|
|
2015-06-25 15:00:08 -07:00
|
|
|
static ssize_t mm_stat_show(struct device *dev,
|
|
|
|
struct device_attribute *attr, char *buf)
|
2013-06-22 03:21:18 +03:00
|
|
|
{
|
2015-06-25 15:00:08 -07:00
|
|
|
struct zram *zram = dev_to_zram(dev);
|
2015-09-08 15:04:35 -07:00
|
|
|
struct zs_pool_stats pool_stats;
|
2015-06-25 15:00:08 -07:00
|
|
|
u64 orig_size, mem_used = 0;
|
|
|
|
long max_used;
|
|
|
|
ssize_t ret;
|
2013-08-08 23:53:24 +05:30
|
|
|
|
2015-09-08 15:04:35 -07:00
|
|
|
memset(&pool_stats, 0x00, sizeof(struct zs_pool_stats));
|
|
|
|
|
2015-06-25 15:00:08 -07:00
|
|
|
down_read(&zram->init_lock);
|
2015-09-08 15:04:35 -07:00
|
|
|
if (init_done(zram)) {
|
2017-05-03 14:55:47 -07:00
|
|
|
mem_used = zs_get_total_pages(zram->mem_pool);
|
|
|
|
zs_pool_stats(zram->mem_pool, &pool_stats);
|
2015-09-08 15:04:35 -07:00
|
|
|
}
|
2013-06-22 03:21:18 +03:00
|
|
|
|
2015-06-25 15:00:08 -07:00
|
|
|
orig_size = atomic64_read(&zram->stats.pages_stored);
|
|
|
|
max_used = atomic_long_read(&zram->stats.max_used_pages);
|
2013-06-22 03:21:18 +03:00
|
|
|
|
2015-06-25 15:00:08 -07:00
|
|
|
ret = scnprintf(buf, PAGE_SIZE,
|
2020-12-14 19:14:32 -08:00
|
|
|
"%8llu %8llu %8llu %8lu %8ld %8llu %8lu %8llu %8llu\n",
|
2015-06-25 15:00:08 -07:00
|
|
|
orig_size << PAGE_SHIFT,
|
|
|
|
(u64)atomic64_read(&zram->stats.compr_data_size),
|
|
|
|
mem_used << PAGE_SHIFT,
|
|
|
|
zram->limit_pages << PAGE_SHIFT,
|
|
|
|
max_used << PAGE_SHIFT,
|
2017-02-24 14:59:27 -08:00
|
|
|
(u64)atomic64_read(&zram->stats.same_pages),
|
2021-02-25 17:18:31 -08:00
|
|
|
atomic_long_read(&pool_stats.pages_compacted),
|
2020-12-14 19:14:32 -08:00
|
|
|
(u64)atomic64_read(&zram->stats.huge_pages),
|
|
|
|
(u64)atomic64_read(&zram->stats.huge_pages_since));
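/*
 * mm_stat columns: orig_data_size compr_data_size mem_used_total
 * mem_limit mem_used_max same_pages pages_compacted huge_pages
 * huge_pages_since; the first five are byte values (page counts
 * shifted by PAGE_SHIFT), the last four are page counts.
 */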
|
2015-06-25 15:00:08 -07:00
|
|
|
up_read(&zram->init_lock);
|
2013-06-22 03:21:18 +03:00
|
|
|
|
2015-06-25 15:00:08 -07:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2018-12-28 00:36:51 -08:00
|
|
|
#ifdef CONFIG_ZRAM_WRITEBACK
|
2018-12-28 00:36:54 -08:00
|
|
|
#define FOUR_K(x) ((x) * (1 << (PAGE_SHIFT - 12)))
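/*
 * FOUR_K() rescales a page count into 4K units so bd_stat is reported
 * in the same units regardless of PAGE_SIZE: with 4K pages
 * (PAGE_SHIFT == 12) it is the identity, while with 64K pages
 * (PAGE_SHIFT == 16) each page counts as 16 units.
 */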
|
2018-12-28 00:36:51 -08:00
|
|
|
static ssize_t bd_stat_show(struct device *dev,
|
|
|
|
struct device_attribute *attr, char *buf)
|
|
|
|
{
|
|
|
|
struct zram *zram = dev_to_zram(dev);
|
|
|
|
ssize_t ret;
|
|
|
|
|
|
|
|
down_read(&zram->init_lock);
|
|
|
|
ret = scnprintf(buf, PAGE_SIZE,
|
|
|
|
"%8llu %8llu %8llu\n",
|
2018-12-28 00:36:54 -08:00
|
|
|
FOUR_K((u64)atomic64_read(&zram->stats.bd_count)),
|
|
|
|
FOUR_K((u64)atomic64_read(&zram->stats.bd_reads)),
|
|
|
|
FOUR_K((u64)atomic64_read(&zram->stats.bd_writes)));
|
2018-12-28 00:36:51 -08:00
|
|
|
up_read(&zram->init_lock);
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2016-05-20 17:00:02 -07:00
|
|
|
static ssize_t debug_stat_show(struct device *dev,
|
|
|
|
struct device_attribute *attr, char *buf)
|
|
|
|
{
|
Revert "zram: remove double compression logic"
This reverts commit e7be8d1dd983156b ("zram: remove double compression
logic") as it causes zram failures. It does not revert cleanly, PTR_ERR
handling was introduced in the meantime. This is handled by appropriate
IS_ERR.
When under memory pressure, zs_malloc() can fail. Before the above
commit, the allocation was retried with direct reclaim enabled (GFP_NOIO).
After the commit, it is not -- only __GFP_KSWAPD_RECLAIM is tried.
So when the failure occurs under memory pressure, an overlaying
filesystem such as ext2 (mounted by the ext4 module in this case) can emit
failures, making the (file)system unusable:
EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing to inode 16386 starting block 159744)
Buffer I/O error on device zram0, logical block 159744
With direct reclaim, memory is really reclaimed and allocation succeeds,
eventually. In the worst case, the OOM killer is invoked, which is the
proper outcome if the user sets up zram too large (in comparison to the
available RAM). This very diff doesn't apply cleanly to 5.19 (stable)
(see the PTR_ERR note above); use a direct revert of e7be8d1dd983 there.
Link: https://bugzilla.suse.com/show_bug.cgi?id=1202203
Link: https://lkml.kernel.org/r/20220810070609.14402-1-jslaby@suse.cz
Fixes: e7be8d1dd983 ("zram: remove double compression logic")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Dmitry Rokosov <ddrokosov@sberdevices.ru>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: <stable@vger.kernel.org> [5.19]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
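For reference, a condensed sketch of the restored write-path behaviour
(loosely following zram_bvec_write; recompression details and most error
handling are elided): the fast path allocates without direct reclaim, and
only on failure does it record a write stall and retry with GFP_NOIO:
	/* fast path: no direct reclaim, may fail under memory pressure */
	handle = zs_malloc(zram->mem_pool, comp_len,
			   __GFP_KSWAPD_RECLAIM |
			   __GFP_NOWARN |
			   __GFP_HIGHMEM |
			   __GFP_MOVABLE);
	if (IS_ERR_VALUE(handle)) {
		/* slow path: account the stall, allow direct reclaim */
		atomic64_inc(&zram->stats.writestall);
		handle = zs_malloc(zram->mem_pool, comp_len,
				   GFP_NOIO | __GFP_HIGHMEM | __GFP_MOVABLE);
		if (IS_ERR_VALUE(handle))
			return PTR_ERR((void *)handle);
		/* the compression stream was dropped; compress again */
		goto compress_again;
	}
The writestall counter incremented here is the one debug_stat reports
below.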
2022-08-10 09:06:09 +02:00
|
|
|
int version = 1;
|
2016-05-20 17:00:02 -07:00
|
|
|
struct zram *zram = dev_to_zram(dev);
|
|
|
|
ssize_t ret;
|
|
|
|
|
|
|
|
down_read(&zram->init_lock);
|
|
|
|
ret = scnprintf(buf, PAGE_SIZE,
|
Revert "zram: remove double compression logic"
This reverts commit e7be8d1dd983156b ("zram: remove double compression
logic") as it causes zram failures. It does not revert cleanly, PTR_ERR
handling was introduced in the meantime. This is handled by appropriate
IS_ERR.
When under memory pressure, zs_malloc() can fail. Before the above
commit, the allocation was retried with direct reclaim enabled (GFP_NOIO).
After the commit, it is not -- only __GFP_KSWAPD_RECLAIM is tried.
So when the failure occurs under memory pressure, the overlaying
filesystem such as ext2 (mounted by ext4 module in this case) can emit
failures, making the (file)system unusable:
EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing to inode 16386 starting block 159744)
Buffer I/O error on device zram0, logical block 159744
With direct reclaim, memory is really reclaimed and allocation succeeds,
eventually. In the worst case, the oom killer is invoked, which is proper
outcome if user sets up zram too large (in comparison to available RAM).
This very diff doesn't apply to 5.19 (stable) cleanly (see PTR_ERR note
above). Use revert of e7be8d1dd983 directly.
Link: https://bugzilla.suse.com/show_bug.cgi?id=1202203
Link: https://lkml.kernel.org/r/20220810070609.14402-1-jslaby@suse.cz
Fixes: e7be8d1dd983 ("zram: remove double compression logic")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Dmitry Rokosov <ddrokosov@sberdevices.ru>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: <stable@vger.kernel.org> [5.19]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-10 09:06:09 +02:00
|
|
|
"version: %d\n%8llu %8llu\n",
|
2016-05-20 17:00:02 -07:00
|
|
|
version,
|
Revert "zram: remove double compression logic"
This reverts commit e7be8d1dd983156b ("zram: remove double compression
logic") as it causes zram failures. It does not revert cleanly, PTR_ERR
handling was introduced in the meantime. This is handled by appropriate
IS_ERR.
When under memory pressure, zs_malloc() can fail. Before the above
commit, the allocation was retried with direct reclaim enabled (GFP_NOIO).
After the commit, it is not -- only __GFP_KSWAPD_RECLAIM is tried.
So when the failure occurs under memory pressure, the overlaying
filesystem such as ext2 (mounted by ext4 module in this case) can emit
failures, making the (file)system unusable:
EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing to inode 16386 starting block 159744)
Buffer I/O error on device zram0, logical block 159744
With direct reclaim, memory is really reclaimed and allocation succeeds,
eventually. In the worst case, the oom killer is invoked, which is proper
outcome if user sets up zram too large (in comparison to available RAM).
This very diff doesn't apply to 5.19 (stable) cleanly (see PTR_ERR note
above). Use revert of e7be8d1dd983 directly.
Link: https://bugzilla.suse.com/show_bug.cgi?id=1202203
Link: https://lkml.kernel.org/r/20220810070609.14402-1-jslaby@suse.cz
Fixes: e7be8d1dd983 ("zram: remove double compression logic")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Dmitry Rokosov <ddrokosov@sberdevices.ru>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: <stable@vger.kernel.org> [5.19]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-10 09:06:09 +02:00
|
|
|
(u64)atomic64_read(&zram->stats.writestall),
|
zram: fix lockdep warning of free block handling
Patch series "zram idle page writeback", v3.
Inherently, a swap device has many idle pages which are rarely touched
after they were allocated. That is never a problem if we use a storage
device as swap. However, it is just a waste for zram-swap.
This patchset supports the zram idle page writeback feature:
* Admin can define what an idle page is: "no access since X time ago"
* Admin can define when zram should write such pages back
* Admin can define when zram should stop writeback to prevent wearout
Details are in each patch's description.
This patch (of 7):
================================
WARNING: inconsistent lock state
4.19.0+ #390 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
zram_verify/2095 [HC0[0]:SC1[1]:HE1:SE0] takes:
00000000b1828693 (&(&zram->bitmap_lock)->rlock){+.?.}, at: put_entry_bdev+0x1e/0x50
{SOFTIRQ-ON-W} state was registered at:
_raw_spin_lock+0x2c/0x40
zram_make_request+0x755/0xdc9
generic_make_request+0x373/0x6a0
submit_bio+0x6c/0x140
__swap_writepage+0x3a8/0x480
shrink_page_list+0x1102/0x1a60
shrink_inactive_list+0x21b/0x3f0
shrink_node_memcg.constprop.99+0x4f8/0x7e0
shrink_node+0x7d/0x2f0
do_try_to_free_pages+0xe0/0x300
try_to_free_pages+0x116/0x2b0
__alloc_pages_slowpath+0x3f4/0xf80
__alloc_pages_nodemask+0x2a2/0x2f0
__handle_mm_fault+0x42e/0xb50
handle_mm_fault+0x55/0xb0
__do_page_fault+0x235/0x4b0
page_fault+0x1e/0x30
irq event stamp: 228412
hardirqs last enabled at (228412): [<ffffffff98245846>] __slab_free+0x3e6/0x600
hardirqs last disabled at (228411): [<ffffffff98245625>] __slab_free+0x1c5/0x600
softirqs last enabled at (228396): [<ffffffff98e0031e>] __do_softirq+0x31e/0x427
softirqs last disabled at (228403): [<ffffffff98072051>] irq_exit+0xd1/0xe0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&(&zram->bitmap_lock)->rlock);
<Interrupt>
lock(&(&zram->bitmap_lock)->rlock);
*** DEADLOCK ***
no locks held by zram_verify/2095.
stack backtrace:
CPU: 5 PID: 2095 Comm: zram_verify Not tainted 4.19.0+ #390
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x67/0x9b
print_usage_bug+0x1bd/0x1d3
mark_lock+0x4aa/0x540
__lock_acquire+0x51d/0x1300
lock_acquire+0x90/0x180
_raw_spin_lock+0x2c/0x40
put_entry_bdev+0x1e/0x50
zram_free_page+0xf6/0x110
zram_slot_free_notify+0x42/0xa0
end_swap_bio_read+0x5b/0x170
blk_update_request+0x8f/0x340
scsi_end_request+0x2c/0x1e0
scsi_io_completion+0x98/0x650
blk_done_softirq+0x9e/0xd0
__do_softirq+0xcc/0x427
irq_exit+0xd1/0xe0
do_IRQ+0x93/0x120
common_interrupt+0xf/0xf
</IRQ>
With the writeback feature, zram_slot_free_notify can be called in softirq
context by end_swap_bio_read. However, bitmap_lock is not aware of that,
so lockdep yells out:
get_entry_bdev
spin_lock(bitmap->lock);
irq
softirq
end_swap_bio_read
zram_slot_free_notify
zram_slot_lock <-- deadlock prone
zram_free_page
put_entry_bdev
spin_lock(bitmap->lock); <-- deadlock prone
With akpm's suggestion (i.e. the bitmap operation is already atomic), we
can remove the bitmap lock. It might fail to find an empty slot if
serious contention happens. However, that is not a severe problem,
because huge page writeback already has the possibility to fail under
severe memory pressure. The worst case is just keeping the
incompressible page in memory instead of on storage.
The other problem is zram_slot_lock in zram_slot_free_notify. To make it
safe, this patch introduces zram_slot_trylock, which
zram_slot_free_notify uses. Although contention is rare, this patch adds
a new debug stat, "miss_free", to keep monitoring how often it happens.
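A minimal sketch of the trylock-based free path described above (the
exact driver code differs slightly; this only illustrates the
bail-out-instead-of-spin idea):

/* Sketch: never spin on the slot lock from softirq context. */
static void zram_slot_free_notify(struct block_device *bdev,
				  unsigned long index)
{
	struct zram *zram = bdev->bd_disk->private_data;

	if (!zram_slot_trylock(zram, index)) {
		/* Contended: record the miss instead of deadlocking. */
		atomic64_inc(&zram->stats.miss_free);
		return;
	}
	zram_free_page(zram, index);
	zram_slot_unlock(zram, index);
}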
Link: http://lkml.kernel.org/r/20181127055429.251614-2-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Joey Pabalinas <joeypabalinas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28 00:36:33 -08:00
|
|
|
(u64)atomic64_read(&zram->stats.miss_free));
|
2016-05-20 17:00:02 -07:00
|
|
|
up_read(&zram->init_lock);
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
zram: reorganize code layout
This patch looks big, but basically it just moves code blocks.
No functional changes.
Our current code layout looks like a sandwich.
For example,
a) between read/write handlers, we have update_used_max() helper function:
static int zram_decompress_page
static int zram_bvec_read
static inline void update_used_max
static int zram_bvec_write
static int zram_bvec_rw
b) RW request handlers __zram_make_request/zram_bio_discard are divided by
sysfs attr reset_store() function and corresponding zram_reset_device()
handler:
static void zram_bio_discard
static void zram_reset_device
static ssize_t disksize_store
static ssize_t reset_store
static void __zram_make_request
c) we first have a bunch of sysfs read/store functions, then a number
of one-liners, then helper functions, RW functions, sysfs functions,
helper functions again, and so on.
Reorganize layout to be more logically grouped (a brief description,
`cat zram_drv.c | grep static` gives a bigger picture):
-- one-liners: zram_test_flag/etc.
-- helpers: is_partial_io/update_position/etc
-- sysfs attr show/store functions + ZRAM_ATTR_RO() generated stats
show() functions
exception: the reset and disksize store functions are required to come
after the meta() functions, because we do device create/destroy actions
in these sysfs handlers.
-- "mm" functions: meta get/put, meta alloc/free, page free
static inline bool zram_meta_get
static inline void zram_meta_put
static void zram_meta_free
static struct zram_meta *zram_meta_alloc
static void zram_free_page
-- a block of I/O functions
static int zram_decompress_page
static int zram_bvec_read
static int zram_bvec_write
static void zram_bio_discard
static int zram_bvec_rw
static void __zram_make_request
static void zram_make_request
static void zram_slot_free_notify
static int zram_rw_page
-- device control: add/remove/init/reset functions (+zram-control class
will sit here)
static int zram_reset_device
static ssize_t reset_store
static ssize_t disksize_store
static int zram_add
static void zram_remove
static int __init zram_init
static void __exit zram_exit
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 15:00:08 -07:00
|
|
|
static DEVICE_ATTR_RO(io_stat);
|
|
|
|
static DEVICE_ATTR_RO(mm_stat);
|
2018-12-28 00:36:51 -08:00
|
|
|
#ifdef CONFIG_ZRAM_WRITEBACK
|
|
|
|
static DEVICE_ATTR_RO(bd_stat);
|
|
|
|
#endif
|
2016-05-20 17:00:02 -07:00
|
|
|
static DEVICE_ATTR_RO(debug_stat);
|
zram: reorganize code layout
This patch looks big, but basically it just moves code blocks.
No functional changes.
Our current code layout looks like a sandwich.
For example,
a) between read/write handlers, we have update_used_max() helper function:
static int zram_decompress_page
static int zram_bvec_read
static inline void update_used_max
static int zram_bvec_write
static int zram_bvec_rw
b) RW request handlers __zram_make_request/zram_bio_discard are divided by
sysfs attr reset_store() function and corresponding zram_reset_device()
handler:
static void zram_bio_discard
static void zram_reset_device
static ssize_t disksize_store
static ssize_t reset_store
static void __zram_make_request
c) we first have a bunch of sysfs read/store functions, then a number
of one-liners, then helper functions, RW functions, sysfs functions,
helper functions again, and so on.
Reorganize layout to be more logically grouped (a brief description,
`cat zram_drv.c | grep static` gives a bigger picture):
-- one-liners: zram_test_flag/etc.
-- helpers: is_partial_io/update_position/etc
-- sysfs attr show/store functions + ZRAM_ATTR_RO() generated stats
show() functions
exception: the reset and disksize store functions are required to come
after the meta() functions, because we do device create/destroy actions
in these sysfs handlers.
-- "mm" functions: meta get/put, meta alloc/free, page free
static inline bool zram_meta_get
static inline void zram_meta_put
static void zram_meta_free
static struct zram_meta *zram_meta_alloc
static void zram_free_page
-- a block of I/O functions
static int zram_decompress_page
static int zram_bvec_read
static int zram_bvec_write
static void zram_bio_discard
static int zram_bvec_rw
static void __zram_make_request
static void zram_make_request
static void zram_slot_free_notify
static int zram_rw_page
-- device control: add/remove/init/reset functions (+zram-control class
will sit here)
static int zram_reset_device
static ssize_t reset_store
static ssize_t disksize_store
static int zram_add
static void zram_remove
static int __init zram_init
static void __exit zram_exit
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 15:00:08 -07:00
|
|
|
|
2017-05-03 14:55:47 -07:00
|
|
|
static void zram_meta_free(struct zram *zram, u64 disksize)
|
zram: reorganize code layout
This patch looks big, but basically it just moves code blocks.
No functional changes.
Our current code layout looks like a sandwich.
For example,
a) between read/write handlers, we have update_used_max() helper function:
static int zram_decompress_page
static int zram_bvec_read
static inline void update_used_max
static int zram_bvec_write
static int zram_bvec_rw
b) RW request handlers __zram_make_request/zram_bio_discard are divided by
sysfs attr reset_store() function and corresponding zram_reset_device()
handler:
static void zram_bio_discard
static void zram_reset_device
static ssize_t disksize_store
static ssize_t reset_store
static void __zram_make_request
c) we first have a bunch of sysfs read/store functions, then a number
of one-liners, then helper functions, RW functions, sysfs functions,
helper functions again, and so on.
Reorganize layout to be more logically grouped (a brief description,
`cat zram_drv.c | grep static` gives a bigger picture):
-- one-liners: zram_test_flag/etc.
-- helpers: is_partial_io/update_position/etc
-- sysfs attr show/store functions + ZRAM_ATTR_RO() generated stats
show() functions
exception: the reset and disksize store functions are required to come
after the meta() functions, because we do device create/destroy actions
in these sysfs handlers.
-- "mm" functions: meta get/put, meta alloc/free, page free
static inline bool zram_meta_get
static inline void zram_meta_put
static void zram_meta_free
static struct zram_meta *zram_meta_alloc
static void zram_free_page
-- a block of I/O functions
static int zram_decompress_page
static int zram_bvec_read
static int zram_bvec_write
static void zram_bio_discard
static int zram_bvec_rw
static void __zram_make_request
static void zram_make_request
static void zram_slot_free_notify
static int zram_rw_page
-- device control: add/remove/init/reset functions (+zram-control class
will sit here)
static int zram_reset_device
static ssize_t reset_store
static ssize_t disksize_store
static int zram_add
static void zram_remove
static int __init zram_init
static void __exit zram_exit
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 15:00:08 -07:00
|
|
|
{
|
|
|
|
size_t num_pages = disksize >> PAGE_SHIFT;
|
|
|
|
size_t index;
|
2015-02-12 15:00:33 -08:00
|
|
|
|
2024-12-10 00:57:16 +08:00
|
|
|
if (!zram->table)
|
|
|
|
return;
|
|
|
|
|
2015-02-12 15:00:33 -08:00
|
|
|
/* Free all pages that are still in this zram device */
|
2017-05-03 14:55:53 -07:00
|
|
|
for (index = 0; index < num_pages; index++)
|
|
|
|
zram_free_page(zram, index);
|
2015-02-12 15:00:33 -08:00
|
|
|
|
2017-05-03 14:55:47 -07:00
|
|
|
zs_destroy_pool(zram->mem_pool);
|
|
|
|
vfree(zram->table);
|
2024-12-10 00:57:16 +08:00
|
|
|
zram->table = NULL;
|
2013-06-22 03:21:18 +03:00
|
|
|
}
|
|
|
|
|
2017-05-03 14:55:47 -07:00
|
|
|
static bool zram_meta_alloc(struct zram *zram, u64 disksize)
|
2013-06-22 03:21:18 +03:00
|
|
|
{
|
2024-09-06 16:14:43 +02:00
|
|
|
size_t num_pages, index;
|
2013-06-22 03:21:18 +03:00
|
|
|
|
|
|
|
num_pages = disksize >> PAGE_SHIFT;
|
treewide: Use array_size() in vzalloc()
The vzalloc() function has no 2-factor argument form, so multiplication
factors need to be wrapped in array_size(). This patch replaces cases of:
vzalloc(a * b)
with:
vzalloc(array_size(a, b))
as well as handling cases of:
vzalloc(a * b * c)
with:
vzalloc(array3_size(a, b, c))
This does, however, attempt to ignore constant size factors like:
vzalloc(4 * 1024)
though any constants defined via macros get caught up in the conversion.
Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.
The Coccinelle script used for this was:
// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@
(
vzalloc(
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
vzalloc(
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)
// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@
(
vzalloc(
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
vzalloc(
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
vzalloc(
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
vzalloc(
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
vzalloc(
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
vzalloc(
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
vzalloc(
- sizeof(char) * COUNT
+ COUNT
, ...)
|
vzalloc(
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)
// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@
(
vzalloc(
- sizeof(TYPE) * (COUNT_ID)
+ array_size(COUNT_ID, sizeof(TYPE))
, ...)
|
vzalloc(
- sizeof(TYPE) * COUNT_ID
+ array_size(COUNT_ID, sizeof(TYPE))
, ...)
|
vzalloc(
- sizeof(TYPE) * (COUNT_CONST)
+ array_size(COUNT_CONST, sizeof(TYPE))
, ...)
|
vzalloc(
- sizeof(TYPE) * COUNT_CONST
+ array_size(COUNT_CONST, sizeof(TYPE))
, ...)
|
vzalloc(
- sizeof(THING) * (COUNT_ID)
+ array_size(COUNT_ID, sizeof(THING))
, ...)
|
vzalloc(
- sizeof(THING) * COUNT_ID
+ array_size(COUNT_ID, sizeof(THING))
, ...)
|
vzalloc(
- sizeof(THING) * (COUNT_CONST)
+ array_size(COUNT_CONST, sizeof(THING))
, ...)
|
vzalloc(
- sizeof(THING) * COUNT_CONST
+ array_size(COUNT_CONST, sizeof(THING))
, ...)
)
// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@
vzalloc(
- SIZE * COUNT
+ array_size(COUNT, SIZE)
, ...)
// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@
(
vzalloc(
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
vzalloc(
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
vzalloc(
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
vzalloc(
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
vzalloc(
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
vzalloc(
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
vzalloc(
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
vzalloc(
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)
// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@
(
vzalloc(
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
vzalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
vzalloc(
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
vzalloc(
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
vzalloc(
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
vzalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)
// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@
(
vzalloc(
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vzalloc(
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vzalloc(
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vzalloc(
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vzalloc(
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vzalloc(
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vzalloc(
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vzalloc(
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)
// Any remaining multi-factor products, first at least 3-factor products
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@
(
vzalloc(C1 * C2 * C3, ...)
|
vzalloc(
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)
// And then all remaining 2 factors products when they're not all constants.
@@
expression E1, E2;
constant C1, C2;
@@
(
vzalloc(C1 * C2, ...)
|
vzalloc(
- E1 * E2
+ array_size(E1, E2)
, ...)
)
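For the zram table allocation, the conversion performed by the script
amounts to the following (illustrative before/after snippet; on
overflow array_size() saturates to SIZE_MAX, so vzalloc() fails cleanly
instead of allocating a wrapped-around size):

#include <linux/overflow.h>
#include <linux/vmalloc.h>

/* Before: num_pages * sizeof(*zram->table) can silently overflow. */
zram->table = vzalloc(num_pages * sizeof(*zram->table));

/* After: overflow saturates to SIZE_MAX and vzalloc() returns NULL. */
zram->table = vzalloc(array_size(num_pages, sizeof(*zram->table)));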
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 14:27:37 -07:00
|
|
|
zram->table = vzalloc(array_size(num_pages, sizeof(*zram->table)));
|
2017-05-03 14:55:47 -07:00
|
|
|
if (!zram->table)
|
|
|
|
return false;
|
2013-06-22 03:21:18 +03:00
|
|
|
|
2017-05-03 14:55:47 -07:00
|
|
|
zram->mem_pool = zs_create_pool(zram->disk->disk_name);
|
|
|
|
if (!zram->mem_pool) {
|
|
|
|
vfree(zram->table);
|
2025-01-07 14:54:46 +08:00
|
|
|
zram->table = NULL;
|
2017-05-03 14:55:47 -07:00
|
|
|
return false;
|
2013-06-22 03:21:18 +03:00
|
|
|
}
|
|
|
|
|
2018-04-05 16:24:47 -07:00
|
|
|
if (!huge_class_size)
|
|
|
|
huge_class_size = zs_huge_class_size(zram->mem_pool);
|
2024-09-06 16:14:43 +02:00
|
|
|
|
|
|
|
for (index = 0; index < num_pages; index++)
|
|
|
|
spin_lock_init(&zram->table[index].lock);
|
2017-05-03 14:55:47 -07:00
|
|
|
return true;
|
2013-06-22 03:21:18 +03:00
|
|
|
}
|
|
|
|
|
zram: replace global tb_lock with fine grain lock
Currently, we use an rwlock, tb_lock, to protect concurrent access to
the whole zram meta table. However, according to the actual access
model, there is only a small chance for an upper-layer user to access
the same table[index], so the current lock granularity is too big.
The idea of the optimization is to change the lock granularity from the
whole meta table to a per-table-entry lock (table -> table[index]), so
that we protect concurrent access to the same table[index] while
allowing maximum concurrency.
With this in mind, several kinds of locks which could be used as a
per-entry lock were tested and compared:
Test environment:
x86-64 Intel Core2 Q8400, system memory 4GB, Ubuntu 12.04,
kernel v3.15.0-rc3 as base, zram with 4 max_comp_streams LZO.
iozone test:
iozone -t 4 -R -r 16K -s 200M -I +Z
(1GB zram with ext4 filesystem, take the average of 10 tests, KB/s)
Test base CAS spinlock rwlock bit_spinlock
-------------------------------------------------------------------
Initial write 1381094 1425435 1422860 1423075 1421521
Rewrite 1529479 1641199 1668762 1672855 1654910
Read 8468009 11324979 11305569 11117273 10997202
Re-read 8467476 11260914 11248059 11145336 10906486
Reverse Read 6821393 8106334 8282174 8279195 8109186
Stride read 7191093 8994306 9153982 8961224 9004434
Random read 7156353 8957932 9167098 8980465 8940476
Mixed workload 4172747 5680814 5927825 5489578 5972253
Random write 1483044 1605588 1594329 1600453 1596010
Pwrite 1276644 1303108 1311612 1314228 1300960
Pread 4324337 4632869 4618386 4457870 4500166
To enhance the possibility of accessing the same table[index]
concurrently, we set zram to a small disksize (10MB) and let the
threads run with a large loop count.
fio test:
fio --bs=32k --randrepeat=1 --randseed=100 --refill_buffers
--scramble_buffers=1 --direct=1 --loops=3000 --numjobs=4
--filename=/dev/zram0 --name=seq-write --rw=write --stonewall
--name=seq-read --rw=read --stonewall --name=seq-readwrite
--rw=rw --stonewall --name=rand-readwrite --rw=randrw --stonewall
(10MB zram raw block device, take the average of 10 tests, KB/s)
Test base CAS spinlock rwlock bit_spinlock
-------------------------------------------------------------
seq-write 933789 999357 1003298 995961 1001958
seq-read 5634130 6577930 6380861 6243912 6230006
seq-rw 1405687 1638117 1640256 1633903 1634459
rand-rw 1386119 1614664 1617211 1609267 1612471
All the optimization methods show higher performance than the base;
however, it is hard to say which method is the most appropriate.
On the other hand, zram is mostly used on small embedded systems, so we
don't want to increase the memory footprint at all.
This patch picks the bit_spinlock method, packing the object size and
page flags into an unsigned long table.value, so as not to increase any
memory overhead on either 32-bit or 64-bit systems.
Finally, even though different kinds of locks have different
performance, we can ignore the difference: if zram is used as a zram
swapfile, the swap subsystem prevents concurrent access to the same
swap slot; if zram is used as zram-blk with a filesystem set up on it,
the upper filesystem and the page cache mostly prevent concurrent
access to the same block. So we can ignore the performance differences
among the locks.
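A hedged sketch of the chosen scheme (the bit position and struct
layout below are illustrative, not the driver's exact definitions): one
bit of the packed value word doubles as the per-entry lock, so no extra
lock field is needed:

#include <linux/bit_spinlock.h>

/* Illustrative: the lock bit lives alongside size/flags in one word. */
#define ZRAM_ENTRY_LOCK_BIT	0	/* assumed bit position */

struct zram_table_entry {
	unsigned long handle;
	unsigned long value;	/* object size + page flags + lock bit */
};

static void zram_entry_lock(struct zram_table_entry *entry)
{
	bit_spin_lock(ZRAM_ENTRY_LOCK_BIT, &entry->value);
}

static void zram_entry_unlock(struct zram_table_entry *entry)
{
	bit_spin_unlock(ZRAM_ENTRY_LOCK_BIT, &entry->value);
}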
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06 16:08:31 -07:00
|
|
|
/*
|
|
|
|
* To protect concurrent access to the same index entry,
|
|
|
|
* caller should hold this table index entry's bit_spinlock to
|
|
|
|
* indicate that this index entry is being accessed.
|
|
|
|
*/
|
2010-06-01 13:31:25 +05:30
|
|
|
static void zram_free_page(struct zram *zram, size_t index)
|
2009-09-22 10:26:53 +05:30
|
|
|
{
|
2017-09-06 16:20:03 -07:00
|
|
|
unsigned long handle;
|
|
|
|
|
2023-11-15 11:42:12 +09:00
|
|
|
#ifdef CONFIG_ZRAM_TRACK_ENTRY_ACTIME
|
2018-12-28 00:36:40 -08:00
|
|
|
zram->table[index].ac_time = 0;
|
|
|
|
#endif
|
2024-09-17 11:09:11 +09:00
|
|
|
|
|
|
|
zram_clear_flag(zram, index, ZRAM_IDLE);
|
|
|
|
zram_clear_flag(zram, index, ZRAM_INCOMPRESSIBLE);
|
|
|
|
zram_clear_flag(zram, index, ZRAM_PP_SLOT);
|
|
|
|
zram_set_priority(zram, index, 0);
|
2018-12-28 00:36:44 -08:00
|
|
|
|
2018-06-07 17:05:42 -07:00
|
|
|
if (zram_test_flag(zram, index, ZRAM_HUGE)) {
|
|
|
|
zram_clear_flag(zram, index, ZRAM_HUGE);
|
|
|
|
atomic64_dec(&zram->stats.huge_pages);
|
|
|
|
}
|
|
|
|
|
2018-12-28 00:36:40 -08:00
|
|
|
if (zram_test_flag(zram, index, ZRAM_WB)) {
|
|
|
|
zram_clear_flag(zram, index, ZRAM_WB);
|
2024-12-18 15:34:19 +09:00
|
|
|
free_block_bdev(zram, zram_get_handle(zram, index));
|
2018-12-28 00:36:40 -08:00
|
|
|
goto out;
|
2017-09-06 16:20:03 -07:00
|
|
|
}
|
2009-09-22 10:26:53 +05:30
|
|
|
|
2017-02-24 14:59:27 -08:00
|
|
|
/*
|
|
|
|
* No memory is allocated for same-element-filled pages.
|
|
|
|
* Simply clear the same-page flag.
|
|
|
|
*/
|
2017-05-03 14:55:47 -07:00
|
|
|
if (zram_test_flag(zram, index, ZRAM_SAME)) {
|
|
|
|
zram_clear_flag(zram, index, ZRAM_SAME);
|
2017-02-24 14:59:27 -08:00
|
|
|
atomic64_dec(&zram->stats.same_pages);
|
2018-12-28 00:36:40 -08:00
|
|
|
goto out;
|
2009-09-22 10:26:53 +05:30
|
|
|
}
|
|
|
|
|
2017-09-06 16:20:03 -07:00
|
|
|
handle = zram_get_handle(zram, index);
|
2017-02-24 14:59:27 -08:00
|
|
|
if (!handle)
|
|
|
|
return;
|
|
|
|
|
2017-05-03 14:55:47 -07:00
|
|
|
zs_free(zram->mem_pool, handle);
|
2009-09-22 10:26:53 +05:30
|
|
|
|
2017-05-03 14:55:47 -07:00
|
|
|
atomic64_sub(zram_get_obj_size(zram, index),
|
2024-09-17 11:09:11 +09:00
|
|
|
&zram->stats.compr_data_size);
|
2018-12-28 00:36:40 -08:00
|
|
|
out:
|
2014-04-07 15:38:03 -07:00
|
|
|
atomic64_dec(&zram->stats.pages_stored);
|
2017-05-03 14:55:50 -07:00
|
|
|
zram_set_handle(zram, index, 0);
|
2017-05-03 14:55:47 -07:00
|
|
|
zram_set_obj_size(zram, index, 0);
|
2009-09-22 10:26:53 +05:30
|
|
|
}
|
|
|
|
|
2024-12-18 15:34:22 +09:00
|
|
|
static int read_same_filled_page(struct zram *zram, struct page *page,
|
2022-11-09 20:50:37 +09:00
|
|
|
u32 index)
|
2009-09-22 10:26:53 +05:30
|
|
|
{
|
2024-12-18 15:34:22 +09:00
|
|
|
void *mem;
|
|
|
|
|
|
|
|
mem = kmap_local_page(page);
|
|
|
|
zram_fill_page(mem, PAGE_SIZE, zram_get_handle(zram, index));
|
|
|
|
kunmap_local(mem);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int read_incompressible_page(struct zram *zram, struct page *page,
|
|
|
|
u32 index)
|
|
|
|
{
|
2014-01-30 15:46:03 -08:00
|
|
|
unsigned long handle;
|
2017-05-03 14:55:41 -07:00
|
|
|
void *src, *dst;
|
|
|
|
|
2017-05-03 14:55:50 -07:00
|
|
|
handle = zram_get_handle(zram, index);
|
2024-12-18 15:34:22 +09:00
|
|
|
src = zs_map_object(zram->mem_pool, handle, ZS_MM_RO);
|
|
|
|
dst = kmap_local_page(page);
|
|
|
|
copy_page(dst, src);
|
|
|
|
kunmap_local(dst);
|
|
|
|
zs_unmap_object(zram->mem_pool, handle);
|
2017-10-03 16:15:19 -07:00
|
|
|
|
2024-12-18 15:34:22 +09:00
|
|
|
return 0;
|
|
|
|
}
|
2017-10-03 16:15:19 -07:00
|
|
|
|
2024-12-18 15:34:22 +09:00
|
|
|
static int read_compressed_page(struct zram *zram, struct page *page, u32 index)
|
|
|
|
{
|
|
|
|
struct zcomp_strm *zstrm;
|
|
|
|
unsigned long handle;
|
|
|
|
unsigned int size;
|
|
|
|
void *src, *dst;
|
|
|
|
int ret, prio;
|
2009-09-22 10:26:53 +05:30
|
|
|
|
2024-12-18 15:34:22 +09:00
|
|
|
handle = zram_get_handle(zram, index);
|
|
|
|
size = zram_get_obj_size(zram, index);
|
|
|
|
prio = zram_get_priority(zram, index);
|
2020-10-19 12:13:53 +02:00
|
|
|
|
2024-12-18 15:34:22 +09:00
|
|
|
zstrm = zcomp_stream_get(zram->comps[prio]);
|
2017-05-03 14:55:47 -07:00
|
|
|
src = zs_map_object(zram->mem_pool, handle, ZS_MM_RO);
|
2024-12-18 15:34:22 +09:00
|
|
|
dst = kmap_local_page(page);
|
|
|
|
ret = zcomp_decompress(zram->comps[prio], zstrm, src, size, dst);
|
|
|
|
kunmap_local(dst);
|
2017-05-03 14:55:47 -07:00
|
|
|
zs_unmap_object(zram->mem_pool, handle);
|
2024-12-18 15:34:22 +09:00
|
|
|
zcomp_stream_put(zram->comps[prio]);
|
|
|
|
|
2022-11-09 20:50:37 +09:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2024-12-18 15:34:22 +09:00
|
|
|
/*
|
|
|
|
* Reads (decompresses if needed) a page from zspool (zsmalloc).
|
|
|
|
* Corresponding ZRAM slot should be locked.
|
|
|
|
*/
|
|
|
|
static int zram_read_from_zspool(struct zram *zram, struct page *page,
|
|
|
|
u32 index)
|
|
|
|
{
|
|
|
|
if (zram_test_flag(zram, index, ZRAM_SAME) ||
|
|
|
|
!zram_get_handle(zram, index))
|
|
|
|
return read_same_filled_page(zram, page, index);
|
|
|
|
|
|
|
|
if (!zram_test_flag(zram, index, ZRAM_HUGE))
|
|
|
|
return read_compressed_page(zram, page, index);
|
|
|
|
else
|
|
|
|
return read_incompressible_page(zram, page, index);
|
|
|
|
}
|
|
|
|
|
2023-04-11 19:14:51 +02:00
|
|
|
static int zram_read_page(struct zram *zram, struct page *page, u32 index,
|
2023-04-11 19:14:58 +02:00
|
|
|
struct bio *parent)
|
2022-11-09 20:50:37 +09:00
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
zram_slot_lock(zram, index);
|
|
|
|
if (!zram_test_flag(zram, index, ZRAM_WB)) {
|
|
|
|
/* Slot should be locked throughout the function call */
|
|
|
|
ret = zram_read_from_zspool(zram, page, index);
|
|
|
|
zram_slot_unlock(zram, index);
|
|
|
|
} else {
|
2023-04-11 19:14:56 +02:00
|
|
|
/*
|
|
|
|
* The slot should be unlocked before reading from the backing
|
|
|
|
* device.
|
|
|
|
*/
|
2022-11-09 20:50:37 +09:00
|
|
|
zram_slot_unlock(zram, index);
|
|
|
|
|
2024-12-18 15:34:19 +09:00
|
|
|
ret = read_from_bdev(zram, page, zram_get_handle(zram, index),
|
2023-04-11 19:14:58 +02:00
|
|
|
parent);
|
2022-11-09 20:50:37 +09:00
|
|
|
}
|
Staging: ramzswap: Support generic I/O requests
Currently, ramzswap devices (/dev/ramzswapX) can only
be used as swap disks since the driver was hard-coded to
consider only the first request in a bio vector.
Now, we iterate over all the segments in an incoming
bio which allows us to handle all kinds of I/O requests.
ramzswap devices can still handle PAGE_SIZE aligned and
multiple-of-PAGE_SIZE sized I/O requests only. To ensure
that we always get only such requests, we set the following
request_queue attributes to PAGE_SIZE:
- physical_block_size
- logical_block_size
- io_min
- io_opt
Note: physical and logical block sizes were already set
equal to PAGE_SIZE and that seems to be sufficient to get
PAGE_SIZE aligned I/O.
Since we are no longer limited to handling swap requests
only, the next few patches rename ramzswap to zram. So,
the devices will then be called /dev/zram{0, 1, 2, ...}
Usage/Examples:
1) Use as /tmp storage
- mkfs.ext4 /dev/zram0
- mount /dev/zram0 /tmp
2) Use as swap:
- mkswap /dev/zram0
- swapon /dev/zram0 -p 10 # give highest priority to zram0
Performance:
- I/O benchmark done with the 'dd' command. Details can be
found here:
http://code.google.com/p/compcache/wiki/zramperf
Summary:
- Maximum read speed (approx):
- ram disk: 1200 MB/sec
- zram disk: 600 MB/sec
- Maximum write speed (approx):
- ram disk: 500 MB/sec
- zram disk: 160 MB/sec
Issues:
- Double caching: We can potentially waste memory by having
two copies of a page -- one in the page cache (uncompressed) and a
second in the device memory (compressed). However, during
reclaim, clean page cache pages are quickly freed, so this
does not seem to be a big problem.
- Stale data: Not all filesystems support issuing 'discard'
requests to underlying block devices. So, if such filesystems
are used over zram devices, we can accumulate a lot of stale
data in memory. Even for filesystems that do support discard
(for example, ext4), we need to see how effective it is.
- Scalability: There is only one (per-device) de/compression
buffer and one set of stats. This can lead to significant contention,
especially when the device is used for generic (non-swap) purposes.
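With the block-layer helpers of that era, the queue setup described
above would look roughly like this (a sketch under that assumption, not
necessarily the patch's exact code):

#include <linux/blkdev.h>

/* Sketch: constrain the queue so all I/O is PAGE_SIZE aligned/sized. */
static void setup_page_size_limits(struct request_queue *queue)
{
	blk_queue_physical_block_size(queue, PAGE_SIZE);
	blk_queue_logical_block_size(queue, PAGE_SIZE);
	blk_queue_io_min(queue, PAGE_SIZE);
	blk_queue_io_opt(queue, PAGE_SIZE);
}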
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-06-01 13:31:23 +05:30
|
|
|
|
2011-06-10 15:28:47 +02:00
|
|
|
/* Should NEVER happen. Return bio error if it does. */
|
2022-11-09 20:50:37 +09:00
|
|
|
if (WARN_ON(ret < 0))
|
2011-06-10 15:28:47 +02:00
|
|
|
pr_err("Decompression failed! err=%d, page=%u\n", ret, index);
|
2009-09-22 10:26:53 +05:30
|
|
|
|
2017-05-03 14:55:41 -07:00
|
|
|
return ret;
|
2009-09-22 10:26:53 +05:30
|
|
|
}
|
|
|
|
|
2023-04-11 19:14:53 +02:00
|
|
|
/*
|
|
|
|
* Use a temporary buffer to decompress the page, as the decompressor
|
|
|
|
* always expects a full page for the output.
|
|
|
|
*/
|
|
|
|
static int zram_bvec_read_partial(struct zram *zram, struct bio_vec *bvec,
|
2023-04-11 19:14:58 +02:00
|
|
|
u32 index, int offset)
|
2011-06-10 15:28:48 +02:00
|
|
|
{
|
2023-04-11 19:14:53 +02:00
|
|
|
struct page *page = alloc_page(GFP_NOIO);
|
2011-06-10 15:28:48 +02:00
|
|
|
int ret;
|
2012-10-30 22:40:23 +03:00
|
|
|
|
2023-04-11 19:14:53 +02:00
|
|
|
if (!page)
|
|
|
|
return -ENOMEM;
|
2023-04-11 19:14:58 +02:00
|
|
|
ret = zram_read_page(zram, page, index, NULL);
|
2023-04-11 19:14:53 +02:00
|
|
|
if (likely(!ret))
|
2023-04-11 19:14:50 +02:00
|
|
|
memcpy_to_bvec(bvec, page_address(page) + offset);
|
2023-04-11 19:14:53 +02:00
|
|
|
__free_page(page);
|
2012-10-30 22:40:23 +03:00
|
|
|
return ret;
|
2011-06-10 15:28:48 +02:00
|
|
|
}
|
2012-10-30 22:40:23 +03:00
|
|
|
|
2023-04-11 19:14:53 +02:00
|
|
|
static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
|
|
|
|
u32 index, int offset, struct bio *bio)
|
|
|
|
{
|
2012-10-30 22:40:23 +03:00
|
|
|
if (is_partial_io(bvec))
|
2023-04-11 19:14:58 +02:00
|
|
|
return zram_bvec_read_partial(zram, bvec, index, offset);
|
|
|
|
return zram_read_page(zram, bvec->bv_page, index, bio);
|
2011-06-10 15:28:48 +02:00
|
|
|
}
|
|
|
|
|
2024-12-18 15:34:20 +09:00
|
|
|
static int write_same_filled_page(struct zram *zram, unsigned long fill,
|
|
|
|
u32 index)
|
|
|
|
{
|
|
|
|
zram_slot_lock(zram, index);
|
|
|
|
zram_set_flag(zram, index, ZRAM_SAME);
|
|
|
|
zram_set_handle(zram, index, fill);
|
|
|
|
zram_slot_unlock(zram, index);
|
|
|
|
|
|
|
|
atomic64_inc(&zram->stats.same_pages);
|
|
|
|
atomic64_inc(&zram->stats.pages_stored);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2024-12-18 15:34:21 +09:00
|
|
|
static int write_incompressible_page(struct zram *zram, struct page *page,
|
|
|
|
u32 index)
|
|
|
|
{
|
|
|
|
unsigned long handle;
|
|
|
|
void *src, *dst;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* This function is called from preemptible context so we don't need
|
|
|
|
* to do an optimistic allocation and fall back to a pessimistic one,
|
|
|
|
* like we do for compressible pages.
|
|
|
|
*/
|
|
|
|
handle = zs_malloc(zram->mem_pool, PAGE_SIZE,
|
|
|
|
GFP_NOIO | __GFP_HIGHMEM | __GFP_MOVABLE);
|
|
|
|
if (IS_ERR_VALUE(handle))
|
|
|
|
return PTR_ERR((void *)handle);
|
|
|
|
|
|
|
|
if (!zram_can_store_page(zram)) {
|
|
|
|
zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
|
|
|
|
zs_free(zram->mem_pool, handle);
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
|
|
|
|
dst = zs_map_object(zram->mem_pool, handle, ZS_MM_WO);
|
|
|
|
src = kmap_local_page(page);
|
|
|
|
memcpy(dst, src, PAGE_SIZE);
|
|
|
|
kunmap_local(src);
|
|
|
|
zs_unmap_object(zram->mem_pool, handle);
|
|
|
|
|
|
|
|
zram_slot_lock(zram, index);
|
|
|
|
zram_set_flag(zram, index, ZRAM_HUGE);
|
|
|
|
zram_set_handle(zram, index, handle);
|
|
|
|
zram_set_obj_size(zram, index, PAGE_SIZE);
|
|
|
|
zram_slot_unlock(zram, index);
|
|
|
|
|
|
|
|
atomic64_add(PAGE_SIZE, &zram->stats.compr_data_size);
|
|
|
|
atomic64_inc(&zram->stats.huge_pages);
|
|
|
|
atomic64_inc(&zram->stats.huge_pages_since);
|
|
|
|
atomic64_inc(&zram->stats.pages_stored);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2023-04-11 19:14:54 +02:00
|
|
|
static int zram_write_page(struct zram *zram, struct page *page, u32 index)
|
2009-09-22 10:26:53 +05:30
|
|
|
{
|
2017-09-06 16:20:00 -07:00
|
|
|
int ret = 0;
|
Revert "zram: remove double compression logic"
This reverts commit e7be8d1dd983156b ("zram: remove double compression
logic") as it causes zram failures. It does not revert cleanly, PTR_ERR
handling was introduced in the meantime. This is handled by appropriate
IS_ERR.
When under memory pressure, zs_malloc() can fail. Before the above
commit, the allocation was retried with direct reclaim enabled (GFP_NOIO).
After the commit, it is not -- only __GFP_KSWAPD_RECLAIM is tried.
So when the failure occurs under memory pressure, an overlaying
filesystem such as ext2 (mounted by the ext4 module in this case) can
emit failures, making the (file)system unusable:
EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing to inode 16386 starting block 159744)
Buffer I/O error on device zram0, logical block 159744
With direct reclaim, memory is really reclaimed and allocation succeeds,
eventually. In the worst case, the oom killer is invoked, which is the
proper outcome if the user sets up zram too large (in comparison to the
available RAM).
This very diff doesn't apply to 5.19 (stable) cleanly (see PTR_ERR note
above). Use revert of e7be8d1dd983 directly.
Link: https://bugzilla.suse.com/show_bug.cgi?id=1202203
Link: https://lkml.kernel.org/r/20220810070609.14402-1-jslaby@suse.cz
Fixes: e7be8d1dd983 ("zram: remove double compression logic")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Dmitry Rokosov <ddrokosov@sberdevices.ru>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: <stable@vger.kernel.org> [5.19]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-10 09:06:09 +02:00
|
|
|
unsigned long handle = -ENOMEM;
|
2017-09-06 16:19:47 -07:00
|
|
|
unsigned int comp_len = 0;
|
2024-12-18 15:34:21 +09:00
|
|
|
void *dst, *mem;
|
2017-09-06 16:19:47 -07:00
|
|
|
struct zcomp_strm *zstrm;
|
|
|
|
unsigned long element = 0;
|
2024-12-18 15:34:20 +09:00
|
|
|
bool same_filled;
|
2017-09-06 16:19:47 -07:00
|
|
|
|
zram: free slot memory early during write
Patch series "zram: split page type read/write handling", v2.
This is a subset of [1] series which contains only fixes and improvements
(no new features, as ZRAM_HUGE split is still under consideration).
The motivation for factoring out is that zram_write_page() gets more and
more complex all the time, because it tries to handle too many scenarios:
ZRAM_SAME store, ZRAM_HUGE store, compressed page store with zs_malloc
allocation slowpath and conditional recompression, etc. Factor those out
and make things easier to handle.
Addition of cond_resched() is simply a fix; I can trigger the watchdog
from zram writeback(). And early slot free is just a reasonable thing
to do.
[1] https://lore.kernel.org/linux-kernel/20241119072057.3440039-1-senozhatsky@chromium.org
This patch (of 7):
In the current implementation an entry's previously allocated memory is
released at the very last moment, when we have already allocated new
memory for the new data. This, basically, temporarily increases memory
usage for no good reason. For example, consider the case when both the
old (stale) and the new entry data are incompressible: such an entry
will temporarily use two physical pages - one for the stale (old) data
and one for the new data. We can release the old memory as soon as we
get a write request for the entry.
Link: https://lkml.kernel.org/r/20241218063513.297475-1-senozhatsky@chromium.org
Link: https://lkml.kernel.org/r/20241218063513.297475-2-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-18 15:34:18 +09:00
|
|
|
/* First, free memory allocated to this slot (if any) */
|
|
|
|
zram_slot_lock(zram, index);
|
|
|
|
zram_free_page(zram, index);
|
|
|
|
zram_slot_unlock(zram, index);
|
|
|
|
|
2023-11-28 17:22:07 +09:00
|
|
|
mem = kmap_local_page(page);
|
2024-12-18 15:34:20 +09:00
|
|
|
same_filled = page_same_filled(mem, &element);
|
2023-11-28 17:22:07 +09:00
|
|
|
kunmap_local(mem);
|
2024-12-18 15:34:20 +09:00
|
|
|
if (same_filled)
|
|
|
|
return write_same_filled_page(zram, element, index);
|
2011-06-10 15:28:48 +02:00
|
|
|
|
Revert "zram: remove double compression logic"
This reverts commit e7be8d1dd983156b ("zram: remove double compression
logic") as it causes zram failures. It does not revert cleanly, PTR_ERR
handling was introduced in the meantime. This is handled by appropriate
IS_ERR.
When under memory pressure, zs_malloc() can fail. Before the above
commit, the allocation was retried with direct reclaim enabled (GFP_NOIO).
After the commit, it is not -- only __GFP_KSWAPD_RECLAIM is tried.
So when the failure occurs under memory pressure, an overlaying
filesystem such as ext2 (mounted by the ext4 module in this case) can
emit failures, making the (file)system unusable:
EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing to inode 16386 starting block 159744)
Buffer I/O error on device zram0, logical block 159744
With direct reclaim, memory is really reclaimed and allocation succeeds,
eventually. In the worst case, the oom killer is invoked, which is the
proper outcome if the user sets up zram too large (in comparison to the
available RAM).
This very diff doesn't apply to 5.19 (stable) cleanly (see PTR_ERR note
above). Use revert of e7be8d1dd983 directly.
Link: https://bugzilla.suse.com/show_bug.cgi?id=1202203
Link: https://lkml.kernel.org/r/20220810070609.14402-1-jslaby@suse.cz
Fixes: e7be8d1dd983 ("zram: remove double compression logic")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Dmitry Rokosov <ddrokosov@sberdevices.ru>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: <stable@vger.kernel.org> [5.19]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-10 09:06:09 +02:00
|
|
|
compress_again:
|
2022-11-09 20:50:35 +09:00
|
|
|
zstrm = zcomp_stream_get(zram->comps[ZRAM_PRIMARY_COMP]);
|
2024-12-18 15:34:21 +09:00
|
|
|
mem = kmap_local_page(page);
|
2024-09-02 19:55:52 +09:00
|
|
|
ret = zcomp_compress(zram->comps[ZRAM_PRIMARY_COMP], zstrm,
|
2024-12-18 15:34:21 +09:00
|
|
|
mem, &comp_len);
|
|
|
|
kunmap_local(mem);
|
2009-09-22 10:26:53 +05:30
|
|
|
|
2014-04-07 15:38:12 -07:00
|
|
|
if (unlikely(ret)) {
|
2022-11-09 20:50:35 +09:00
|
|
|
zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
|
2011-06-10 15:28:47 +02:00
|
|
|
pr_err("Compression failed! err=%d\n", ret);
|
Revert "zram: remove double compression logic"
This reverts commit e7be8d1dd983156b ("zram: remove double compression
logic") as it causes zram failures. It does not revert cleanly, PTR_ERR
handling was introduced in the meantime. This is handled by appropriate
IS_ERR.
When under memory pressure, zs_malloc() can fail. Before the above
commit, the allocation was retried with direct reclaim enabled (GFP_NOIO).
After the commit, it is not -- only __GFP_KSWAPD_RECLAIM is tried.
So when the failure occurs under memory pressure, an overlaying
filesystem such as ext2 (mounted by the ext4 module in this case) can
emit failures, making the (file)system unusable:
EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing to inode 16386 starting block 159744)
Buffer I/O error on device zram0, logical block 159744
With direct reclaim, memory is really reclaimed and allocation succeeds,
eventually. In the worst case, the oom killer is invoked, which is the
proper outcome if the user sets up zram too large (in comparison to the
available RAM).
This very diff doesn't apply to 5.19 (stable) cleanly (see PTR_ERR note
above). Use revert of e7be8d1dd983 directly.
Link: https://bugzilla.suse.com/show_bug.cgi?id=1202203
Link: https://lkml.kernel.org/r/20220810070609.14402-1-jslaby@suse.cz
Fixes: e7be8d1dd983 ("zram: remove double compression logic")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Dmitry Rokosov <ddrokosov@sberdevices.ru>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: <stable@vger.kernel.org> [5.19]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-10 09:06:09 +02:00
|
|
|
zs_free(zram->mem_pool, handle);
|
2017-05-03 14:55:41 -07:00
|
|
|
return ret;
|
2011-06-10 15:28:47 +02:00
|
|
|
}
|
zram: use per-cpu compression streams
Remove the idle streams list and keep compression streams in per-cpu
data. This removes two contended spin_lock()/spin_unlock() calls from
the write path and also prevents a write OP from being preempted while
holding the compression stream, which can cause slowdowns.
For instance, let's assume that we have N cpus and N-2
max_comp_streams. TASK1 owns the last idle stream; TASK2-TASK3 come in
with write requests:
TASK1 TASK2 TASK3
zram_bvec_write()
spin_lock
find stream
spin_unlock
compress
<<preempted>> zram_bvec_write()
spin_lock
find stream
spin_unlock
no_stream
schedule
zram_bvec_write()
spin_lock
find_stream
spin_unlock
no_stream
schedule
spin_lock
release stream
spin_unlock
wake up TASK2
Not only will TASK2 and TASK3 not get the stream, but TASK1 will be
preempted in the middle of its operation, while we would prefer it to
finish the compression and release the stream.
Test environment: x86_64, 4 CPU box, 3G zram, lzo
The following fio tests were executed:
read, randread, write, randwrite, rw, randrw
with the increasing number of jobs from 1 to 10.
4 streams 8 streams per-cpu
===========================================================
jobs1
READ: 2520.1MB/s 2566.5MB/s 2491.5MB/s
READ: 2102.7MB/s 2104.2MB/s 2091.3MB/s
WRITE: 1355.1MB/s 1320.2MB/s 1378.9MB/s
WRITE: 1103.5MB/s 1097.2MB/s 1122.5MB/s
READ: 434013KB/s 435153KB/s 439961KB/s
WRITE: 433969KB/s 435109KB/s 439917KB/s
READ: 403166KB/s 405139KB/s 403373KB/s
WRITE: 403223KB/s 405197KB/s 403430KB/s
jobs2
READ: 7958.6MB/s 8105.6MB/s 8073.7MB/s
READ: 6864.9MB/s 6989.8MB/s 7021.8MB/s
WRITE: 2438.1MB/s 2346.9MB/s 3400.2MB/s
WRITE: 1994.2MB/s 1990.3MB/s 2941.2MB/s
READ: 981504KB/s 973906KB/s 1018.8MB/s
WRITE: 981659KB/s 974060KB/s 1018.1MB/s
READ: 937021KB/s 938976KB/s 987250KB/s
WRITE: 934878KB/s 936830KB/s 984993KB/s
jobs3
READ: 13280MB/s 13553MB/s 13553MB/s
READ: 11534MB/s 11785MB/s 11755MB/s
WRITE: 3456.9MB/s 3469.9MB/s 4810.3MB/s
WRITE: 3029.6MB/s 3031.6MB/s 4264.8MB/s
READ: 1363.8MB/s 1362.6MB/s 1448.9MB/s
WRITE: 1361.9MB/s 1360.7MB/s 1446.9MB/s
READ: 1309.4MB/s 1310.6MB/s 1397.5MB/s
WRITE: 1307.4MB/s 1308.5MB/s 1395.3MB/s
jobs4
READ: 20244MB/s 20177MB/s 20344MB/s
READ: 17886MB/s 17913MB/s 17835MB/s
WRITE: 4071.6MB/s 4046.1MB/s 6370.2MB/s
WRITE: 3608.9MB/s 3576.3MB/s 5785.4MB/s
READ: 1824.3MB/s 1821.6MB/s 1997.5MB/s
WRITE: 1819.8MB/s 1817.4MB/s 1992.5MB/s
READ: 1765.7MB/s 1768.3MB/s 1937.3MB/s
WRITE: 1767.5MB/s 1769.1MB/s 1939.2MB/s
jobs5
READ: 18663MB/s 18986MB/s 18823MB/s
READ: 16659MB/s 16605MB/s 16954MB/s
WRITE: 3912.4MB/s 3888.7MB/s 6126.9MB/s
WRITE: 3506.4MB/s 3442.5MB/s 5519.3MB/s
READ: 1798.2MB/s 1746.5MB/s 1935.8MB/s
WRITE: 1792.7MB/s 1740.7MB/s 1929.1MB/s
READ: 1727.6MB/s 1658.2MB/s 1917.3MB/s
WRITE: 1726.5MB/s 1657.2MB/s 1916.6MB/s
jobs6
READ: 21017MB/s 20922MB/s 21162MB/s
READ: 19022MB/s 19140MB/s 18770MB/s
WRITE: 3968.2MB/s 4037.7MB/s 6620.8MB/s
WRITE: 3643.5MB/s 3590.2MB/s 6027.5MB/s
READ: 1871.8MB/s 1880.5MB/s 2049.9MB/s
WRITE: 1867.8MB/s 1877.2MB/s 2046.2MB/s
READ: 1755.8MB/s 1710.3MB/s 1964.7MB/s
WRITE: 1750.5MB/s 1705.9MB/s 1958.8MB/s
jobs7
READ: 21103MB/s 20677MB/s 21482MB/s
READ: 18522MB/s 18379MB/s 19443MB/s
WRITE: 4022.5MB/s 4067.4MB/s 6755.9MB/s
WRITE: 3691.7MB/s 3695.5MB/s 5925.6MB/s
READ: 1841.5MB/s 1933.9MB/s 2090.5MB/s
WRITE: 1842.7MB/s 1935.3MB/s 2091.9MB/s
READ: 1832.4MB/s 1856.4MB/s 1971.5MB/s
WRITE: 1822.3MB/s 1846.2MB/s 1960.6MB/s
jobs8
READ: 20463MB/s 20194MB/s 20862MB/s
READ: 18178MB/s 17978MB/s 18299MB/s
WRITE: 4085.9MB/s 4060.2MB/s 7023.8MB/s
WRITE: 3776.3MB/s 3737.9MB/s 6278.2MB/s
READ: 1957.6MB/s 1944.4MB/s 2109.5MB/s
WRITE: 1959.2MB/s 1946.2MB/s 2111.4MB/s
READ: 1900.6MB/s 1885.7MB/s 2082.1MB/s
WRITE: 1896.2MB/s 1881.4MB/s 2078.3MB/s
jobs9
READ: 19692MB/s 19734MB/s 19334MB/s
READ: 17678MB/s 18249MB/s 17666MB/s
WRITE: 4004.7MB/s 4064.8MB/s 6990.7MB/s
WRITE: 3724.7MB/s 3772.1MB/s 6193.6MB/s
READ: 1953.7MB/s 1967.3MB/s 2105.6MB/s
WRITE: 1953.4MB/s 1966.7MB/s 2104.1MB/s
READ: 1860.4MB/s 1897.4MB/s 2068.5MB/s
WRITE: 1858.9MB/s 1895.9MB/s 2066.8MB/s
jobs10
READ: 19730MB/s 19579MB/s 19492MB/s
READ: 18028MB/s 18018MB/s 18221MB/s
WRITE: 4027.3MB/s 4090.6MB/s 7020.1MB/s
WRITE: 3810.5MB/s 3846.8MB/s 6426.8MB/s
READ: 1956.1MB/s 1994.6MB/s 2145.2MB/s
WRITE: 1955.9MB/s 1993.5MB/s 2144.8MB/s
READ: 1852.8MB/s 1911.6MB/s 2075.8MB/s
WRITE: 1855.7MB/s 1914.6MB/s 2078.1MB/s
perf stat
4 streams 8 streams per-cpu
====================================================================================================================
jobs1
stalled-cycles-frontend 23,174,811,209 ( 38.21%) 23,220,254,188 ( 38.25%) 23,061,406,918 ( 38.34%)
stalled-cycles-backend 11,514,174,638 ( 18.98%) 11,696,722,657 ( 19.27%) 11,370,852,810 ( 18.90%)
instructions 73,925,005,782 ( 1.22) 73,903,177,632 ( 1.22) 73,507,201,037 ( 1.22)
branches 14,455,124,835 ( 756.063) 14,455,184,779 ( 755.281) 14,378,599,509 ( 758.546)
branch-misses 69,801,336 ( 0.48%) 80,225,529 ( 0.55%) 72,044,726 ( 0.50%)
jobs2
stalled-cycles-frontend 49,912,741,782 ( 46.11%) 50,101,189,290 ( 45.95%) 32,874,195,633 ( 35.11%)
stalled-cycles-backend 27,080,366,230 ( 25.02%) 27,949,970,232 ( 25.63%) 16,461,222,706 ( 17.58%)
instructions 122,831,629,690 ( 1.13) 122,919,846,419 ( 1.13) 121,924,786,775 ( 1.30)
branches 23,725,889,239 ( 692.663) 23,733,547,140 ( 688.062) 23,553,950,311 ( 794.794)
branch-misses 90,733,041 ( 0.38%) 96,320,895 ( 0.41%) 84,561,092 ( 0.36%)
jobs3
stalled-cycles-frontend 66,437,834,608 ( 45.58%) 63,534,923,344 ( 43.69%) 42,101,478,505 ( 33.19%)
stalled-cycles-backend 34,940,799,661 ( 23.97%) 34,774,043,148 ( 23.91%) 21,163,324,388 ( 16.68%)
instructions 171,692,121,862 ( 1.18) 171,775,373,044 ( 1.18) 170,353,542,261 ( 1.34)
branches 32,968,962,622 ( 628.723) 32,987,739,894 ( 630.512) 32,729,463,918 ( 717.027)
branch-misses 111,522,732 ( 0.34%) 110,472,894 ( 0.33%) 99,791,291 ( 0.30%)
jobs4
stalled-cycles-frontend 98,741,701,675 ( 49.72%) 94,797,349,965 ( 47.59%) 54,535,655,381 ( 33.53%)
stalled-cycles-backend 54,642,609,615 ( 27.51%) 55,233,554,408 ( 27.73%) 27,882,323,541 ( 17.14%)
instructions 220,884,807,851 ( 1.11) 220,930,887,273 ( 1.11) 218,926,845,851 ( 1.35)
branches 42,354,518,180 ( 592.105) 42,362,770,587 ( 590.452) 41,955,552,870 ( 716.154)
branch-misses 138,093,449 ( 0.33%) 131,295,286 ( 0.31%) 121,794,771 ( 0.29%)
jobs5
stalled-cycles-frontend 116,219,747,212 ( 48.14%) 110,310,397,012 ( 46.29%) 66,373,082,723 ( 33.70%)
stalled-cycles-backend 66,325,434,776 ( 27.48%) 64,157,087,914 ( 26.92%) 32,999,097,299 ( 16.76%)
instructions 270,615,008,466 ( 1.12) 270,546,409,525 ( 1.14) 268,439,910,948 ( 1.36)
branches 51,834,046,557 ( 599.108) 51,811,867,722 ( 608.883) 51,412,576,077 ( 729.213)
branch-misses 158,197,086 ( 0.31%) 142,639,805 ( 0.28%) 133,425,455 ( 0.26%)
jobs6
stalled-cycles-frontend 138,009,414,492 ( 48.23%) 139,063,571,254 ( 48.80%) 75,278,568,278 ( 32.80%)
stalled-cycles-backend 79,211,949,650 ( 27.68%) 79,077,241,028 ( 27.75%) 37,735,797,899 ( 16.44%)
instructions 319,763,993,731 ( 1.12) 319,937,782,834 ( 1.12) 316,663,600,784 ( 1.38)
branches 61,219,433,294 ( 595.056) 61,250,355,540 ( 598.215) 60,523,446,617 ( 733.706)
branch-misses 169,257,123 ( 0.28%) 154,898,028 ( 0.25%) 141,180,587 ( 0.23%)
jobs7
stalled-cycles-frontend 162,974,812,119 ( 49.20%) 159,290,061,987 ( 48.43%) 88,046,641,169 ( 33.21%)
stalled-cycles-backend 92,223,151,661 ( 27.84%) 91,667,904,406 ( 27.87%) 44,068,454,971 ( 16.62%)
instructions 369,516,432,430 ( 1.12) 369,361,799,063 ( 1.12) 365,290,380,661 ( 1.38)
branches 70,795,673,950 ( 594.220) 70,743,136,124 ( 597.876) 69,803,996,038 ( 732.822)
branch-misses 181,708,327 ( 0.26%) 165,767,821 ( 0.23%) 150,109,797 ( 0.22%)
jobs8
stalled-cycles-frontend 185,000,017,027 ( 49.30%) 182,334,345,473 ( 48.37%) 99,980,147,041 ( 33.26%)
stalled-cycles-backend 105,753,516,186 ( 28.18%) 107,937,830,322 ( 28.63%) 51,404,177,181 ( 17.10%)
instructions 418,153,161,055 ( 1.11) 418,308,565,828 ( 1.11) 413,653,475,581 ( 1.38)
branches 80,035,882,398 ( 592.296) 80,063,204,510 ( 589.843) 79,024,105,589 ( 730.530)
branch-misses 199,764,528 ( 0.25%) 177,936,926 ( 0.22%) 160,525,449 ( 0.20%)
jobs9
stalled-cycles-frontend 210,941,799,094 ( 49.63%) 204,714,679,254 ( 48.55%) 114,251,113,756 ( 33.96%)
stalled-cycles-backend 122,640,849,067 ( 28.85%) 122,188,553,256 ( 28.98%) 58,360,041,127 ( 17.35%)
instructions 468,151,025,415 ( 1.10) 467,354,869,323 ( 1.11) 462,665,165,216 ( 1.38)
branches 89,657,067,510 ( 585.628) 89,411,550,407 ( 588.990) 88,360,523,943 ( 730.151)
branch-misses 218,292,301 ( 0.24%) 191,701,247 ( 0.21%) 178,535,678 ( 0.20%)
jobs10
stalled-cycles-frontend 233,595,958,008 ( 49.81%) 227,540,615,689 ( 49.11%) 160,341,979,938 ( 43.07%)
stalled-cycles-backend 136,153,676,021 ( 29.03%) 133,635,240,742 ( 28.84%) 65,909,135,465 ( 17.70%)
instructions 517,001,168,497 ( 1.10) 516,210,976,158 ( 1.11) 511,374,038,613 ( 1.37)
branches 98,911,641,329 ( 585.796) 98,700,069,712 ( 591.583) 97,646,761,028 ( 728.712)
branch-misses 232,341,823 ( 0.23%) 199,256,308 ( 0.20%) 183,135,268 ( 0.19%)
Per-cpu streams tend to cause significantly fewer stalled cycles,
execute fewer branches, and hit fewer branch misses.
perf stat reported execution time
4 streams 8 streams per-cpu
====================================================================
jobs1
seconds elapsed 20.909073870 20.875670495 20.817838540
jobs2
seconds elapsed 18.529488399 18.720566469 16.356103108
jobs3
seconds elapsed 18.991159531 18.991340812 16.766216066
jobs4
seconds elapsed 19.560643828 19.551323547 16.246621715
jobs5
seconds elapsed 24.746498464 25.221646740 20.696112444
jobs6
seconds elapsed 28.258181828 28.289765505 22.885688857
jobs7
seconds elapsed 32.632490241 31.909125381 26.272753738
jobs8
seconds elapsed 35.651403851 36.027596308 29.108024711
jobs9
seconds elapsed 40.569362365 40.024227989 32.898204012
jobs10
seconds elapsed 44.673112304 43.874898137 35.632952191
Please see
Link: http://marc.info/?l=linux-kernel&m=146166970727530
Link: http://marc.info/?l=linux-kernel&m=146174716719650
for more test results (under low memory conditions).
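The core of the per-cpu scheme can be sketched as follows (struct and
field names are illustrative, not the driver's exact code):
get_cpu_ptr() disables preemption, so a task owning a stream cannot be
scheduled out mid-compression and no shared list or lock is needed:

#include <linux/percpu.h>

struct zcomp_strm;

/* Illustrative container for per-cpu compression streams. */
struct zcomp {
	struct zcomp_strm __percpu *stream;
};

static struct zcomp_strm *zcomp_stream_get(struct zcomp *comp)
{
	/* Pins the task to this CPU (preemption off) until _put(). */
	return get_cpu_ptr(comp->stream);
}

static void zcomp_stream_put(struct zcomp *comp)
{
	put_cpu_ptr(comp->stream);	/* re-enables preemption */
}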
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Suggested-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 16:59:51 -07:00
|
|
|
|
2024-12-18 15:34:21 +09:00
|
|
|
if (comp_len >= huge_class_size) {
|
|
|
|
zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
|
|
|
|
return write_incompressible_page(zram, page, index);
|
|
|
|
}
|
|
|
|
|
Revert "zram: remove double compression logic"
This reverts commit e7be8d1dd983156b ("zram: remove double compression
logic") as it causes zram failures. It does not revert cleanly, PTR_ERR
handling was introduced in the meantime. This is handled by appropriate
IS_ERR.
When under memory pressure, zs_malloc() can fail. Before the above
commit, the allocation was retried with direct reclaim enabled (GFP_NOIO).
After the commit, it is not -- only __GFP_KSWAPD_RECLAIM is tried.
So when the failure occurs under memory pressure, an overlaying
filesystem such as ext2 (mounted by the ext4 module in this case) can
emit failures, making the (file)system unusable:
EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing to inode 16386 starting block 159744)
Buffer I/O error on device zram0, logical block 159744
With direct reclaim, memory is really reclaimed and allocation succeeds,
eventually. In the worst case, the oom killer is invoked, which is the
proper outcome if the user sets up zram too large (in comparison to the
available RAM).
This very diff doesn't apply to 5.19 (stable) cleanly (see PTR_ERR note
above). Use revert of e7be8d1dd983 directly.
Link: https://bugzilla.suse.com/show_bug.cgi?id=1202203
Link: https://lkml.kernel.org/r/20220810070609.14402-1-jslaby@suse.cz
Fixes: e7be8d1dd983 ("zram: remove double compression logic")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Dmitry Rokosov <ddrokosov@sberdevices.ru>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: <stable@vger.kernel.org> [5.19]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-10 09:06:09 +02:00
|
|
|
/*
|
|
|
|
* handle allocation has 2 paths:
|
|
|
|
* a) fast path is executed with preemption disabled (for
|
|
|
|
* per-cpu streams) and has the __GFP_DIRECT_RECLAIM bit clear,
|
|
|
|
* since we can't sleep;
|
|
|
|
* b) slow path enables preemption and attempts to allocate
|
|
|
|
* the page with the __GFP_DIRECT_RECLAIM bit set. We have to
|
|
|
|
* put the per-cpu compression stream and, thus, re-do
|
|
|
|
* the compression once the handle is allocated.
|
|
|
|
*
|
|
|
|
* if we have a 'non-null' handle here then we are coming
|
|
|
|
* from the slow path and the handle has already been allocated.
|
|
|
|
*/
|
2022-11-09 20:50:41 +09:00
|
|
|
if (IS_ERR_VALUE(handle))
|
Revert "zram: remove double compression logic"
This reverts commit e7be8d1dd983156b ("zram: remove double compression
logic") as it causes zram failures. It does not revert cleanly, PTR_ERR
handling was introduced in the meantime. This is handled by appropriate
IS_ERR.
When under memory pressure, zs_malloc() can fail. Before the above
commit, the allocation was retried with direct reclaim enabled (GFP_NOIO).
After the commit, it is not -- only __GFP_KSWAPD_RECLAIM is tried.
So when the failure occurs under memory pressure, an overlaying
filesystem such as ext2 (mounted by the ext4 module in this case) can
emit failures, making the (file)system unusable:
EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing to inode 16386 starting block 159744)
Buffer I/O error on device zram0, logical block 159744
With direct reclaim, memory is really reclaimed and allocation succeeds,
eventually. In the worst case, the oom killer is invoked, which is the
proper outcome if the user sets up zram too large (in comparison to the
available RAM).
This very diff doesn't apply to 5.19 (stable) cleanly (see PTR_ERR note
above). Use revert of e7be8d1dd983 directly.
Link: https://bugzilla.suse.com/show_bug.cgi?id=1202203
Link: https://lkml.kernel.org/r/20220810070609.14402-1-jslaby@suse.cz
Fixes: e7be8d1dd983 ("zram: remove double compression logic")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Dmitry Rokosov <ddrokosov@sberdevices.ru>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: <stable@vger.kernel.org> [5.19]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-10 09:06:09 +02:00
|
|
|
handle = zs_malloc(zram->mem_pool, comp_len,
|
2024-12-18 15:34:21 +09:00
|
|
|
__GFP_KSWAPD_RECLAIM |
|
|
|
|
__GFP_NOWARN |
|
|
|
|
__GFP_HIGHMEM |
|
|
|
|
__GFP_MOVABLE);
|
2022-11-09 20:50:41 +09:00
|
|
|
if (IS_ERR_VALUE(handle)) {
|
2022-11-09 20:50:35 +09:00
|
|
|
zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
|
Revert "zram: remove double compression logic"
This reverts commit e7be8d1dd983156b ("zram: remove double compression
logic") as it causes zram failures. It does not revert cleanly, PTR_ERR
handling was introduced in the meantime. This is handled by appropriate
IS_ERR.
When under memory pressure, zs_malloc() can fail. Before the above
commit, the allocation was retried with direct reclaim enabled (GFP_NOIO).
After the commit, it is not -- only __GFP_KSWAPD_RECLAIM is tried.
So when the failure occurs under memory pressure, an overlaying
filesystem such as ext2 (mounted by the ext4 module in this case) can
emit failures, making the (file)system unusable:
EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing to inode 16386 starting block 159744)
Buffer I/O error on device zram0, logical block 159744
With direct reclaim, memory is really reclaimed and allocation succeeds,
eventually. In the worst case, the oom killer is invoked, which is the
proper outcome if the user sets up zram too large (in comparison to the
available RAM).
This very diff doesn't apply to 5.19 (stable) cleanly (see PTR_ERR note
above). Use revert of e7be8d1dd983 directly.
Link: https://bugzilla.suse.com/show_bug.cgi?id=1202203
Link: https://lkml.kernel.org/r/20220810070609.14402-1-jslaby@suse.cz
Fixes: e7be8d1dd983 ("zram: remove double compression logic")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Dmitry Rokosov <ddrokosov@sberdevices.ru>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: <stable@vger.kernel.org> [5.19]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-10 09:06:09 +02:00
|
|
|
atomic64_inc(&zram->stats.writestall);
|
|
|
|
handle = zs_malloc(zram->mem_pool, comp_len,
|
2024-12-18 15:34:21 +09:00
|
|
|
GFP_NOIO | __GFP_HIGHMEM |
|
|
|
|
__GFP_MOVABLE);
|
2022-11-09 20:50:41 +09:00
|
|
|
if (IS_ERR_VALUE(handle))
|
2022-08-24 14:31:17 +03:00
|
|
|
return PTR_ERR((void *)handle);
|
|
|
|
|
2024-12-18 15:34:21 +09:00
|
|
|
goto compress_again;
|
2011-06-10 15:28:47 +02:00
|
|
|
}
|
2014-10-09 15:29:53 -07:00
|
|
|
|
2024-12-18 15:34:21 +09:00
|
|
|
if (!zram_can_store_page(zram)) {
|
2022-11-09 20:50:35 +09:00
|
|
|
zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
|
2017-05-03 14:55:47 -07:00
|
|
|
zs_free(zram->mem_pool, handle);
|
2017-05-03 14:55:41 -07:00
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
|
2017-05-03 14:55:47 -07:00
|
|
|
dst = zs_map_object(zram->mem_pool, handle, ZS_MM_WO);
|
2017-05-03 14:55:41 -07:00
|
|
|
|
2024-12-18 15:34:21 +09:00
|
|
|
memcpy(dst, zstrm->buffer, comp_len);
|
2022-11-09 20:50:35 +09:00
|
|
|
zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
|
2017-05-03 14:55:47 -07:00
|
|
|
zs_unmap_object(zram->mem_pool, handle);
|
2024-12-18 15:34:20 +09:00
|
|
|
|
2017-05-03 14:55:44 -07:00
|
|
|
zram_slot_lock(zram, index);
|
2024-12-18 15:34:20 +09:00
|
|
|
zram_set_handle(zram, index, handle);
|
|
|
|
zram_set_obj_size(zram, index, comp_len);
|
2017-05-03 14:55:44 -07:00
|
|
|
zram_slot_unlock(zram, index);
|
2009-09-22 10:26:53 +05:30
|
|
|
|
2011-06-10 15:28:47 +02:00
|
|
|
/* Update stats */
|
2014-04-07 15:38:03 -07:00
|
|
|
atomic64_inc(&zram->stats.pages_stored);
|
2024-12-18 15:34:21 +09:00
|
|
|
atomic64_add(comp_len, &zram->stats.compr_data_size);
|
|
|
|
|
2017-09-06 16:20:00 -07:00
|
|
|
return ret;
|
2017-05-03 14:55:41 -07:00
|
|
|
}
|
|
|
|
|
2023-04-11 19:14:55 +02:00
|
|
|
/*
|
|
|
|
* This is a partial IO. Read the full page before writing the changes.
|
|
|
|
*/
|
|
|
|
static int zram_bvec_write_partial(struct zram *zram, struct bio_vec *bvec,
|
|
|
|
u32 index, int offset, struct bio *bio)
|
2017-05-03 14:55:41 -07:00
|
|
|
{
|
2023-04-11 19:14:55 +02:00
|
|
|
struct page *page = alloc_page(GFP_NOIO);
|
2017-05-03 14:55:41 -07:00
|
|
|
int ret;
|
|
|
|
|
2023-04-11 19:14:55 +02:00
|
|
|
if (!page)
|
|
|
|
return -ENOMEM;
|
2017-05-03 14:55:41 -07:00
|
|
|
|
2023-04-11 19:14:58 +02:00
|
|
|
ret = zram_read_page(zram, page, index, bio);
|
2023-04-11 19:14:55 +02:00
|
|
|
if (!ret) {
|
2023-04-11 19:14:50 +02:00
|
|
|
memcpy_from_bvec(page_address(page) + offset, bvec);
|
2023-04-11 19:14:55 +02:00
|
|
|
ret = zram_write_page(zram, page, index);
|
2017-05-03 14:55:41 -07:00
|
|
|
}
|
2023-04-11 19:14:55 +02:00
|
|
|
__free_page(page);
|
|
|
|
return ret;
|
|
|
|
}
|
2017-05-03 14:55:41 -07:00
|
|
|
|
2023-04-11 19:14:55 +02:00
|
|
|
static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
|
|
|
|
u32 index, int offset, struct bio *bio)
|
|
|
|
{
|
2013-01-02 08:53:41 -08:00
|
|
|
if (is_partial_io(bvec))
|
2023-04-11 19:14:55 +02:00
|
|
|
return zram_bvec_write_partial(zram, bvec, index, offset, bio);
|
|
|
|
return zram_write_page(zram, bvec->bv_page, index);
|
2011-06-10 15:28:47 +02:00
|
|
|
}
|
|
|
|
|
2022-11-09 20:50:38 +09:00
|
|
|
#ifdef CONFIG_ZRAM_MULTI_COMP
|
zram: rework recompress target selection strategy
Target slot selection for recompression is just a simple iteration over
zram->table entries (stored pages) from slot 0 to the max slot. Given that
zram->table slots are written in random order and are not sorted by size,
a simple iteration over slots selects suboptimal targets for
recompression. This is not a problem if we recompress every single
zram->table slot, but we never do that in reality. In reality we limit
the number of slots we can recompress (via the max_pages parameter), and
hence proper slot selection becomes very important. The strategy is quite
simple: suppose we have two candidate slots for recompression, one of size
48 bytes and one of size 2800 bytes, and we can recompress only one. It
certainly makes more sense to pick the 2800 byte entry for recompression,
because even if we manage to compress the 48 byte object even further,
the savings are going to be very small. Potential savings after a good
re-compression of a 2800 byte object are much higher.
This patch reworks slot selection and introduces the strategy described
above: among candidate slots, always select the biggest ones first.
For that, the patch introduces a zram_pp_ctl (post-processing) structure
which holds NUM_PP_BUCKETS pp buckets of slots. Slots are assigned to a
particular bucket based on their sizes - the larger the size of the slot,
the higher the bucket index. This, basically, sorts slots by size in
linear time (we still perform just one iteration over zram->table slots).
When we select a slot for recompression, we always look in the higher pp
buckets first (those that hold the largest slots), which achieves the
desired behavior.
TEST
====
A very simple demonstration: zram is configured with zstd, and zstd with
dict as a recompression stream. A limited (max 4096 pages) recompression
is then performed, with a log of the sizes of slots that were recompressed.
You can see that the patched zram selects slots for recompression in a
significantly different manner, which leads to higher memory savings (see
column #2 of the mm_stat output).
BASE
----
*** initial state of zram device
/sys/block/zram0/mm_stat
1750994944 504491413 514203648 0 514203648 1 0 34204 34204
*** recompress idle max_pages=4096
/sys/block/zram0/mm_stat
1750994944 504262229 514953216 0 514203648 1 0 34204 34204
Sizes of selected objects for recompression:
... 45 58 24 226 91 40 24 24 24 424 2104 93 2078 2078 2078 959 154 ...
PATCHED
-------
*** initial state of zram device
/sys/block/zram0/mm_stat
1750982656 504492801 514170880 0 514170880 1 0 34204 34204
*** recompress idle max_pages=4096
/sys/block/zram0/mm_stat
1750982656 503716710 517586944 0 514170880 1 0 34204 34204
Sizes of selected objects for recompression:
... 3680 3694 3667 3590 3614 3553 3537 3548 3550 3542 3543 3537 ...
Note, pp-slots are not strictly sorted; there is a PP_BUCKET_SIZE_RANGE
variation of sizes within a particular bucket.
[senozhatsky@chromium.org: do not skip the first bucket]
Link: https://lkml.kernel.org/r/20241001085634.1948384-1-senozhatsky@chromium.org
Link: https://lkml.kernel.org/r/20240917021020.883356-4-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-17 11:09:08 +09:00
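A minimal sketch of the bucketing idea described above; the bucket width
and count below are assumptions for illustration, not the literal
constants from the patch.

/* Illustrative only: PP_BUCKET_SIZE_RANGE and NUM_PP_BUCKETS are assumed. */
#define PP_BUCKET_SIZE_RANGE	64
#define NUM_PP_BUCKETS		((PAGE_SIZE / PP_BUCKET_SIZE_RANGE) + 1)

static u32 pp_bucket_idx(size_t obj_size)
{
	/*
	 * Larger objects land in higher buckets. Selection scans buckets
	 * from the highest index down, so the biggest candidates are
	 * recompressed first. Within one bucket, sizes may vary by up to
	 * PP_BUCKET_SIZE_RANGE bytes, i.e. slots are not strictly sorted.
	 */
	return min_t(u32, obj_size / PP_BUCKET_SIZE_RANGE,
		     NUM_PP_BUCKETS - 1);
}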
|
|
|
#define RECOMPRESS_IDLE (1 << 0)
|
|
|
|
#define RECOMPRESS_HUGE (1 << 1)
|
|
|
|
|
|
|
|
static int scan_slots_for_recompress(struct zram *zram, u32 mode,
|
|
|
|
struct zram_pp_ctl *ctl)
|
|
|
|
{
|
|
|
|
unsigned long nr_pages = zram->disksize >> PAGE_SHIFT;
|
|
|
|
struct zram_pp_slot *pps = NULL;
|
|
|
|
unsigned long index;
|
|
|
|
|
|
|
|
for (index = 0; index < nr_pages; index++) {
|
|
|
|
if (!pps)
|
|
|
|
pps = kmalloc(sizeof(*pps), GFP_KERNEL);
|
|
|
|
if (!pps)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
INIT_LIST_HEAD(&pps->entry);
|
|
|
|
|
|
|
|
zram_slot_lock(zram, index);
|
|
|
|
if (!zram_allocated(zram, index))
|
|
|
|
goto next;
|
|
|
|
|
|
|
|
if (mode & RECOMPRESS_IDLE &&
|
|
|
|
!zram_test_flag(zram, index, ZRAM_IDLE))
|
|
|
|
goto next;
|
|
|
|
|
|
|
|
if (mode & RECOMPRESS_HUGE &&
|
|
|
|
!zram_test_flag(zram, index, ZRAM_HUGE))
|
|
|
|
goto next;
|
|
|
|
|
|
|
|
if (zram_test_flag(zram, index, ZRAM_WB) ||
|
|
|
|
zram_test_flag(zram, index, ZRAM_SAME) ||
|
|
|
|
zram_test_flag(zram, index, ZRAM_INCOMPRESSIBLE))
|
|
|
|
goto next;
|
|
|
|
|
|
|
|
pps->index = index;
|
|
|
|
place_pp_slot(zram, ctl, pps);
|
|
|
|
pps = NULL;
|
|
|
|
next:
|
|
|
|
zram_slot_unlock(zram, index);
|
|
|
|
}
|
|
|
|
|
|
|
|
kfree(pps);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2022-11-09 20:50:38 +09:00
|
|
|
/*
|
|
|
|
* This function will decompress (unless it's ZRAM_HUGE) the page and then
|
|
|
|
* attempt to compress it using the provided compression algorithm priority
|
|
|
|
* (which is potentially more effective).
|
|
|
|
*
|
|
|
|
* The corresponding ZRAM slot should be locked.
|
|
|
|
*/
|
2024-09-17 11:09:08 +09:00
|
|
|
static int recompress_slot(struct zram *zram, u32 index, struct page *page,
|
2024-03-29 18:39:41 +09:00
|
|
|
u64 *num_recomp_pages, u32 threshold, u32 prio,
|
|
|
|
u32 prio_max)
|
2022-11-09 20:50:38 +09:00
|
|
|
{
|
|
|
|
struct zcomp_strm *zstrm = NULL;
|
|
|
|
unsigned long handle_old;
|
|
|
|
unsigned long handle_new;
|
|
|
|
unsigned int comp_len_old;
|
|
|
|
unsigned int comp_len_new;
|
2022-11-09 20:50:42 +09:00
|
|
|
unsigned int class_index_old;
|
|
|
|
unsigned int class_index_new;
|
2022-11-09 20:50:44 +09:00
|
|
|
u32 num_recomps = 0;
|
2022-11-09 20:50:38 +09:00
|
|
|
void *src, *dst;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
handle_old = zram_get_handle(zram, index);
|
|
|
|
if (!handle_old)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
comp_len_old = zram_get_obj_size(zram, index);
|
|
|
|
/*
|
|
|
|
* Do not recompress objects that are already "small enough".
|
|
|
|
*/
|
|
|
|
if (comp_len_old < threshold)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
ret = zram_read_from_zspool(zram, page, index);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2024-10-29 00:36:14 +09:00
|
|
|
/*
|
|
|
|
* We touched this entry so mark it as non-IDLE. This makes sure that
|
|
|
|
* we don't preserve the IDLE flag and don't incorrectly pick this entry
|
|
|
|
* for a different post-processing type (e.g. writeback).
|
|
|
|
*/
|
|
|
|
zram_clear_flag(zram, index, ZRAM_IDLE);
|
|
|
|
|
2022-11-09 20:50:42 +09:00
|
|
|
class_index_old = zs_lookup_class_index(zram->mem_pool, comp_len_old);
|
2022-11-09 20:50:38 +09:00
|
|
|
/*
|
|
|
|
* Iterate the secondary comp algorithms list (in order of priority)
|
|
|
|
* and try to recompress the page.
|
|
|
|
*/
|
|
|
|
for (; prio < prio_max; prio++) {
|
|
|
|
if (!zram->comps[prio])
|
|
|
|
continue;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Skip if the object is already re-compressed with a higher
|
|
|
|
* priority algorithm (or same algorithm).
|
|
|
|
*/
|
|
|
|
if (prio <= zram_get_priority(zram, index))
|
|
|
|
continue;
|
|
|
|
|
2022-11-09 20:50:44 +09:00
|
|
|
num_recomps++;
|
2022-11-09 20:50:38 +09:00
|
|
|
zstrm = zcomp_stream_get(zram->comps[prio]);
|
2023-11-28 17:22:07 +09:00
|
|
|
src = kmap_local_page(page);
|
2024-09-02 19:55:52 +09:00
|
|
|
ret = zcomp_compress(zram->comps[prio], zstrm,
|
|
|
|
src, &comp_len_new);
|
2023-11-28 17:22:07 +09:00
|
|
|
kunmap_local(src);
|
2022-11-09 20:50:38 +09:00
|
|
|
|
|
|
|
if (ret) {
|
|
|
|
zcomp_stream_put(zram->comps[prio]);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2022-11-09 20:50:42 +09:00
|
|
|
class_index_new = zs_lookup_class_index(zram->mem_pool,
|
|
|
|
comp_len_new);
|
|
|
|
|
2022-11-09 20:50:38 +09:00
|
|
|
/* Continue until we make progress */
|
2022-11-09 20:50:43 +09:00
|
|
|
if (class_index_new >= class_index_old ||
|
2022-11-09 20:50:38 +09:00
|
|
|
(threshold && comp_len_new >= threshold)) {
|
|
|
|
zcomp_stream_put(zram->comps[prio]);
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Recompression was successful so break out */
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We did not try to recompress, e.g. when we have only one
|
|
|
|
* secondary algorithm and the page is already recompressed
|
|
|
|
* using that algorithm.
|
|
|
|
*/
|
|
|
|
if (!zstrm)
|
|
|
|
return 0;
|
|
|
|
|
2024-03-29 18:39:41 +09:00
|
|
|
/*
|
|
|
|
* Decrement the limit (if set) on pages we can recompress, even
|
|
|
|
* when the current recompression was unsuccessful or did not compress
|
|
|
|
* the page below the threshold, because we still spent resources
|
|
|
|
* on it.
|
|
|
|
*/
|
|
|
|
if (*num_recomp_pages)
|
|
|
|
*num_recomp_pages -= 1;
|
|
|
|
|
2022-11-09 20:50:43 +09:00
|
|
|
if (class_index_new >= class_index_old) {
|
2022-11-09 20:50:44 +09:00
|
|
|
/*
|
|
|
|
* Secondary algorithms failed to re-compress the page
|
|
|
|
* in a way that would save memory, so mark the object as
|
|
|
|
* incompressible so that we will not try to compress
|
|
|
|
* it again.
|
|
|
|
*
|
|
|
|
* We need to make sure that all secondary algorithms have
|
|
|
|
* failed, so we test if the number of recompressions matches
|
|
|
|
* the number of active secondary algorithms.
|
|
|
|
*/
|
|
|
|
if (num_recomps == zram->num_active_comps - 1)
|
|
|
|
zram_set_flag(zram, index, ZRAM_INCOMPRESSIBLE);
|
2022-11-09 20:50:38 +09:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Successful recompression but above threshold */
|
|
|
|
if (threshold && comp_len_new >= threshold)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* No direct reclaim (slow path) for handle allocation and no
|
2023-04-11 19:14:54 +02:00
|
|
|
* re-compression attempt (unlike in zram_bvec_write()) since
|
2022-11-09 20:50:38 +09:00
|
|
|
* we have already stored that object in zsmalloc. If we cannot
|
|
|
|
* alloc memory for the recompressed object then we bail out and
|
|
|
|
* simply keep the old (existing) object in zsmalloc.
|
|
|
|
*/
|
|
|
|
handle_new = zs_malloc(zram->mem_pool, comp_len_new,
|
|
|
|
__GFP_KSWAPD_RECLAIM |
|
|
|
|
__GFP_NOWARN |
|
|
|
|
__GFP_HIGHMEM |
|
|
|
|
__GFP_MOVABLE);
|
|
|
|
if (IS_ERR_VALUE(handle_new)) {
|
|
|
|
zcomp_stream_put(zram->comps[prio]);
|
|
|
|
return PTR_ERR((void *)handle_new);
|
|
|
|
}
|
|
|
|
|
|
|
|
dst = zs_map_object(zram->mem_pool, handle_new, ZS_MM_WO);
|
|
|
|
memcpy(dst, zstrm->buffer, comp_len_new);
|
|
|
|
zcomp_stream_put(zram->comps[prio]);
|
|
|
|
|
|
|
|
zs_unmap_object(zram->mem_pool, handle_new);
|
|
|
|
|
|
|
|
zram_free_page(zram, index);
|
|
|
|
zram_set_handle(zram, index, handle_new);
|
|
|
|
zram_set_obj_size(zram, index, comp_len_new);
|
|
|
|
zram_set_priority(zram, index, prio);
|
|
|
|
|
|
|
|
atomic64_add(comp_len_new, &zram->stats.compr_data_size);
|
|
|
|
atomic64_inc(&zram->stats.pages_stored);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static ssize_t recompress_store(struct device *dev,
|
|
|
|
struct device_attribute *attr,
|
|
|
|
const char *buf, size_t len)
|
|
|
|
{
|
2022-11-09 20:50:44 +09:00
|
|
|
u32 prio = ZRAM_SECONDARY_COMP, prio_max = ZRAM_MAX_COMPS;
|
2022-11-09 20:50:38 +09:00
|
|
|
struct zram *zram = dev_to_zram(dev);
|
2022-11-09 20:50:44 +09:00
|
|
|
char *args, *param, *val, *algo = NULL;
|
2024-03-29 18:39:41 +09:00
|
|
|
u64 num_recomp_pages = ULLONG_MAX;
|
2024-09-17 11:09:08 +09:00
|
|
|
struct zram_pp_ctl *ctl = NULL;
|
|
|
|
struct zram_pp_slot *pps;
|
2022-11-09 20:50:44 +09:00
|
|
|
u32 mode = 0, threshold = 0;
|
2022-11-09 20:50:38 +09:00
|
|
|
struct page *page;
|
|
|
|
ssize_t ret;
|
|
|
|
|
|
|
|
args = skip_spaces(buf);
|
|
|
|
while (*args) {
|
|
|
|
args = next_arg(args, ¶m, &val);
|
|
|
|
|
2023-01-03 12:01:19 +09:00
|
|
|
if (!val || !*val)
|
2022-11-09 20:50:38 +09:00
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (!strcmp(param, "type")) {
|
|
|
|
if (!strcmp(val, "idle"))
|
|
|
|
mode = RECOMPRESS_IDLE;
|
|
|
|
if (!strcmp(val, "huge"))
|
|
|
|
mode = RECOMPRESS_HUGE;
|
|
|
|
if (!strcmp(val, "huge_idle"))
|
|
|
|
mode = RECOMPRESS_IDLE | RECOMPRESS_HUGE;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2024-03-29 18:39:41 +09:00
|
|
|
if (!strcmp(param, "max_pages")) {
|
|
|
|
/*
|
|
|
|
* Limit the number of entries (pages) we attempt to
|
|
|
|
* recompress.
|
|
|
|
*/
|
|
|
|
ret = kstrtoull(val, 10, &num_recomp_pages);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2022-11-09 20:50:38 +09:00
|
|
|
if (!strcmp(param, "threshold")) {
|
|
|
|
/*
|
|
|
|
* We will re-compress only idle objects equal to or
|
|
|
|
* greater in size than the watermark.
|
|
|
|
*/
|
|
|
|
ret = kstrtouint(val, 10, &threshold);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
continue;
|
|
|
|
}
|
2022-11-09 20:50:44 +09:00
|
|
|
|
|
|
|
if (!strcmp(param, "algo")) {
|
|
|
|
algo = val;
|
|
|
|
continue;
|
|
|
|
}
|
2024-09-02 19:56:12 +09:00
|
|
|
|
|
|
|
if (!strcmp(param, "priority")) {
|
|
|
|
ret = kstrtouint(val, 10, &prio);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
if (prio == ZRAM_PRIMARY_COMP)
|
|
|
|
prio = ZRAM_SECONDARY_COMP;
|
|
|
|
|
|
|
|
prio_max = min(prio + 1, ZRAM_MAX_COMPS);
|
|
|
|
continue;
|
|
|
|
}
|
2022-11-09 20:50:38 +09:00
|
|
|
}
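	/*
	 * Parameters recognized by the loop above:
	 *   type=idle|huge|huge_idle  - which slots to consider
	 *   max_pages=N               - cap on the number of slots to recompress
	 *   threshold=N               - skip objects smaller than N bytes
	 *   algo=NAME                 - recompress with this registered algorithm
	 *   priority=N                - recompress with this priority only
	 * e.g. writing "type=huge_idle threshold=1000 algo=zstd" to the
	 * recompress device attribute.
	 */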
|
|
|
|
|
2023-06-14 23:13:12 +09:00
|
|
|
if (threshold >= huge_class_size)
|
2022-11-09 20:50:38 +09:00
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
down_read(&zram->init_lock);
|
|
|
|
if (!init_done(zram)) {
|
|
|
|
ret = -EINVAL;
|
|
|
|
goto release_init_lock;
|
|
|
|
}
|
|
|
|
|
2024-09-17 11:09:07 +09:00
|
|
|
/* Do not permit concurrent post-processing actions. */
|
|
|
|
if (atomic_xchg(&zram->pp_in_progress, 1)) {
|
|
|
|
up_read(&zram->init_lock);
|
|
|
|
return -EAGAIN;
|
|
|
|
}
|
|
|
|
|
2022-11-09 20:50:44 +09:00
|
|
|
if (algo) {
|
|
|
|
bool found = false;
|
|
|
|
|
|
|
|
for (; prio < ZRAM_MAX_COMPS; prio++) {
|
|
|
|
if (!zram->comp_algs[prio])
|
|
|
|
continue;
|
|
|
|
|
|
|
|
if (!strcmp(zram->comp_algs[prio], algo)) {
|
|
|
|
prio_max = min(prio + 1, ZRAM_MAX_COMPS);
|
|
|
|
found = true;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!found) {
|
|
|
|
ret = -EINVAL;
|
|
|
|
goto release_init_lock;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2022-11-09 20:50:38 +09:00
|
|
|
page = alloc_page(GFP_KERNEL);
|
|
|
|
if (!page) {
|
|
|
|
ret = -ENOMEM;
|
|
|
|
goto release_init_lock;
|
|
|
|
}
|
|
|
|
|
2024-09-17 11:09:08 +09:00
|
|
|
ctl = init_pp_ctl();
|
|
|
|
if (!ctl) {
|
|
|
|
ret = -ENOMEM;
|
|
|
|
goto release_init_lock;
|
|
|
|
}
|
|
|
|
|
|
|
|
scan_slots_for_recompress(zram, mode, ctl);
|
|
|
|
|
2022-11-09 20:50:38 +09:00
|
|
|
ret = len;
|
2024-09-17 11:09:08 +09:00
|
|
|
while ((pps = select_pp_slot(ctl))) {
|
2022-11-09 20:50:38 +09:00
|
|
|
int err = 0;
|
|
|
|
|
2024-03-29 18:39:41 +09:00
|
|
|
if (!num_recomp_pages)
|
|
|
|
break;
|
|
|
|
|
2024-09-17 11:09:08 +09:00
|
|
|
zram_slot_lock(zram, pps->index);
|
|
|
|
if (!zram_test_flag(zram, pps->index, ZRAM_PP_SLOT))
|
2022-11-09 20:50:38 +09:00
|
|
|
goto next;
|
|
|
|
|
2024-09-17 11:09:08 +09:00
|
|
|
err = recompress_slot(zram, pps->index, page,
|
|
|
|
&num_recomp_pages, threshold,
|
|
|
|
prio, prio_max);
|
2022-11-09 20:50:38 +09:00
|
|
|
next:
|
2024-09-17 11:09:08 +09:00
|
|
|
zram_slot_unlock(zram, pps->index);
|
|
|
|
release_pp_slot(zram, pps);
|
|
|
|
|
2022-11-09 20:50:38 +09:00
|
|
|
if (err) {
|
|
|
|
ret = err;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
cond_resched();
|
|
|
|
}
|
|
|
|
|
|
|
|
__free_page(page);
|
|
|
|
|
|
|
|
release_init_lock:
|
2024-09-17 11:09:08 +09:00
|
|
|
release_pp_ctl(zram, ctl);
|
2024-09-17 11:09:07 +09:00
|
|
|
atomic_set(&zram->pp_in_progress, 0);
|
2022-11-09 20:50:38 +09:00
|
|
|
up_read(&zram->init_lock);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2023-04-11 19:14:45 +02:00
|
|
|
static void zram_bio_discard(struct zram *zram, struct bio *bio)
|
2014-04-07 15:38:24 -07:00
|
|
|
{
|
|
|
|
size_t n = bio->bi_iter.bi_size;
|
2023-04-11 19:14:45 +02:00
|
|
|
u32 index = bio->bi_iter.bi_sector >> SECTORS_PER_PAGE_SHIFT;
|
|
|
|
u32 offset = (bio->bi_iter.bi_sector & (SECTORS_PER_PAGE - 1)) <<
|
|
|
|
SECTOR_SHIFT;
|
2014-04-07 15:38:24 -07:00
|
|
|
|
|
|
|
/*
|
|
|
|
* zram manages data in physical block size units. Because logical block
|
|
|
|
* size isn't identical to the physical block size on some arches, we
|
|
|
|
* could get a discard request pointing to a specific offset within a
|
|
|
|
* certain physical block. Although we can handle this request by
|
|
|
|
* reading that physical block and decompressing and partially zeroing
|
|
|
|
* and re-compressing and then re-storing it, this isn't reasonable
|
|
|
|
* because our intent with a discard request is to save memory. So
|
|
|
|
* skipping this logical block is appropriate here.
|
|
|
|
*/
|
|
|
|
if (offset) {
|
zram: correct offset usage in zram_bio_discard
We want to skip a physical block (PAGE_SIZE) which is only partially
covered by the discard bio, so we check the remaining size and subtract
it if we need to go to the next physical block.
The current offset usage in zram_bio_discard is incorrect and will cause
the filesystem above it to break down. Consider the following scenario:
on some architectures or configs PAGE_SIZE is 64K, for example, and a
filesystem is set up on the zram disk without PAGE_SIZE alignment. A
discard bio then leads to offset = 4K and size = 72K. Normally, it
should not really discard any physical block, as it only partially
covers two physical blocks. However, with the current offset usage, it
will discard the second physical block and free its memory, which will
cause filesystem breakdown.
This patch corrects the offset usage in zram_bio_discard.
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:11:06 -07:00
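To make the arithmetic concrete, the 64K scenario from the message above
works out as follows with the corrected code below:

	/*
	 * PAGE_SIZE = 64K, offset = 4K, n = 72K:
	 *   n > PAGE_SIZE - offset (72K > 60K), so skip the partial first
	 *   block: n = 72K - 60K = 12K, index++.
	 *   Now n (12K) < PAGE_SIZE (64K), so the while loop frees nothing:
	 *   neither physical block is fully covered, and none is discarded.
	 */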
|
|
|
if (n <= (PAGE_SIZE - offset))
|
2014-04-07 15:38:24 -07:00
|
|
|
return;
|
|
|
|
|
2014-06-04 16:11:06 -07:00
|
|
|
n -= (PAGE_SIZE - offset);
|
2014-04-07 15:38:24 -07:00
|
|
|
index++;
|
|
|
|
}
|
|
|
|
|
|
|
|
while (n >= PAGE_SIZE) {
|
2017-05-03 14:55:44 -07:00
|
|
|
zram_slot_lock(zram, index);
|
2014-04-07 15:38:24 -07:00
|
|
|
zram_free_page(zram, index);
|
2017-05-03 14:55:44 -07:00
|
|
|
zram_slot_unlock(zram, index);
|
2014-10-09 15:29:57 -07:00
|
|
|
atomic64_inc(&zram->stats.notify_free);
|
2014-04-07 15:38:24 -07:00
|
|
|
index++;
|
|
|
|
n -= PAGE_SIZE;
|
|
|
|
}
|
2023-04-11 19:14:45 +02:00
|
|
|
|
|
|
|
bio_endio(bio);
|
2014-04-07 15:38:24 -07:00
|
|
|
}
|
|
|
|
|
2023-04-11 19:14:49 +02:00
|
|
|
static void zram_bio_read(struct zram *zram, struct bio *bio)
|
2013-06-22 03:21:18 +03:00
|
|
|
{
|
2023-08-05 07:55:37 +02:00
|
|
|
unsigned long start_time = bio_start_io_acct(bio);
|
|
|
|
struct bvec_iter iter = bio->bi_iter;
|
2013-06-22 03:21:18 +03:00
|
|
|
|
2023-08-05 07:55:37 +02:00
|
|
|
do {
|
2023-04-11 19:14:49 +02:00
|
|
|
u32 index = iter.bi_sector >> SECTORS_PER_PAGE_SHIFT;
|
|
|
|
u32 offset = (iter.bi_sector & (SECTORS_PER_PAGE - 1)) <<
|
|
|
|
SECTOR_SHIFT;
|
2023-08-05 07:55:37 +02:00
|
|
|
struct bio_vec bv = bio_iter_iovec(bio, iter);
|
|
|
|
|
|
|
|
bv.bv_len = min_t(u32, bv.bv_len, PAGE_SIZE - offset);
|
2018-06-07 17:05:45 -07:00
|
|
|
|
2023-04-11 19:14:49 +02:00
|
|
|
if (zram_bvec_read(zram, &bv, index, offset, bio) < 0) {
|
zram: reorganize code layout
This patch looks big, but basically it just moves code blocks.
No functional changes.
Our current code layout looks like a sandwich.
For example,
a) between read/write handlers, we have the update_used_max() helper function:
static int zram_decompress_page
static int zram_bvec_read
static inline void update_used_max
static int zram_bvec_write
static int zram_bvec_rw
b) RW request handlers __zram_make_request/zram_bio_discard are divided by
the sysfs attr reset_store() function and the corresponding
zram_reset_device() handler:
static void zram_bio_discard
static void zram_reset_device
static ssize_t disksize_store
static ssize_t reset_store
static void __zram_make_request
c) first we have a bunch of sysfs read/store functions, then a number of
one-liners, then helper functions, RW functions, sysfs functions, helper
functions again, and so on.
Reorganize layout to be more logically grouped (a brief description,
`cat zram_drv.c | grep static` gives a bigger picture):
-- one-liners: zram_test_flag/etc.
-- helpers: is_partial_io/update_position/etc
-- sysfs attr show/store functions + ZRAM_ATTR_RO() generated stats
show() functions
exception: the reset and disksize store functions are required to be
after the meta() functions, because we do device create/destroy actions
in these sysfs handlers.
-- "mm" functions: meta get/put, meta alloc/free, page free
static inline bool zram_meta_get
static inline void zram_meta_put
static void zram_meta_free
static struct zram_meta *zram_meta_alloc
static void zram_free_page
-- a block of I/O functions
static int zram_decompress_page
static int zram_bvec_read
static int zram_bvec_write
static void zram_bio_discard
static int zram_bvec_rw
static void __zram_make_request
static void zram_make_request
static void zram_slot_free_notify
static int zram_rw_page
-- device control: add/remove/init/reset functions (+zram-control class
will sit here)
static int zram_reset_device
static ssize_t reset_store
static ssize_t disksize_store
static int zram_add
static void zram_remove
static int __init zram_init
static void __exit zram_exit
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 15:00:08 -07:00
|
|
|
atomic64_inc(&zram->stats.failed_reads);
|
2023-04-11 19:14:49 +02:00
|
|
|
bio->bi_status = BLK_STS_IOERR;
|
|
|
|
break;
|
2023-04-11 19:14:48 +02:00
|
|
|
}
|
2023-04-11 19:14:49 +02:00
|
|
|
flush_dcache_page(bv.bv_page);
|
2013-06-22 03:21:18 +03:00
|
|
|
|
2023-04-11 19:14:49 +02:00
|
|
|
zram_slot_lock(zram, index);
|
|
|
|
zram_accessed(zram, index);
|
|
|
|
zram_slot_unlock(zram, index);
|
2023-08-05 07:55:37 +02:00
|
|
|
|
|
|
|
bio_advance_iter_single(bio, &iter, bv.bv_len);
|
|
|
|
} while (iter.bi_size);
|
|
|
|
|
2023-04-11 19:14:49 +02:00
|
|
|
bio_end_io_acct(bio, start_time);
|
|
|
|
bio_endio(bio);
|
2011-06-10 15:28:47 +02:00
|
|
|
}
|
|
|
|
|
2023-04-11 19:14:49 +02:00
|
|
|
static void zram_bio_write(struct zram *zram, struct bio *bio)
|
2011-06-10 15:28:47 +02:00
|
|
|
{
|
2023-08-05 07:55:37 +02:00
|
|
|
unsigned long start_time = bio_start_io_acct(bio);
|
|
|
|
struct bvec_iter iter = bio->bi_iter;
|
2011-06-10 15:28:47 +02:00
|
|
|
|
2023-08-05 07:55:37 +02:00
|
|
|
do {
|
2023-04-11 19:14:46 +02:00
|
|
|
u32 index = iter.bi_sector >> SECTORS_PER_PAGE_SHIFT;
|
|
|
|
u32 offset = (iter.bi_sector & (SECTORS_PER_PAGE - 1)) <<
|
|
|
|
SECTOR_SHIFT;
|
2023-08-05 07:55:37 +02:00
|
|
|
struct bio_vec bv = bio_iter_iovec(bio, iter);
|
|
|
|
|
|
|
|
bv.bv_len = min_t(u32, bv.bv_len, PAGE_SIZE - offset);
|
2011-06-10 15:28:48 +02:00
|
|
|
|
2023-04-11 19:14:49 +02:00
|
|
|
if (zram_bvec_write(zram, &bv, index, offset, bio) < 0) {
|
|
|
|
atomic64_inc(&zram->stats.failed_writes);
|
2023-04-11 19:14:46 +02:00
|
|
|
bio->bi_status = BLK_STS_IOERR;
|
|
|
|
break;
|
|
|
|
}
|
2011-06-10 15:28:48 +02:00
|
|
|
|
2023-04-11 19:14:49 +02:00
|
|
|
zram_slot_lock(zram, index);
|
|
|
|
zram_accessed(zram, index);
|
|
|
|
zram_slot_unlock(zram, index);
|
2023-08-05 07:55:37 +02:00
|
|
|
|
|
|
|
bio_advance_iter_single(bio, &iter, bv.bv_len);
|
|
|
|
} while (iter.bi_size);
|
|
|
|
|
2020-05-27 07:24:11 +02:00
|
|
|
bio_end_io_acct(bio, start_time);
|
2015-07-20 15:29:37 +02:00
|
|
|
bio_endio(bio);
|
2009-09-22 10:26:53 +05:30
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2010-06-01 13:31:25 +05:30
|
|
|
* Handler function for all zram I/O requests.
|
2009-09-22 10:26:53 +05:30
|
|
|
*/
|
2021-10-12 13:12:24 +02:00
|
|
|
static void zram_submit_bio(struct bio *bio)
|
2009-09-22 10:26:53 +05:30
|
|
|
{
|
2021-01-24 11:02:34 +01:00
|
|
|
struct zram *zram = bio->bi_bdev->bd_disk->private_data;
|
2009-09-22 10:26:53 +05:30
|
|
|
|
2023-04-11 19:14:47 +02:00
|
|
|
switch (bio_op(bio)) {
|
|
|
|
case REQ_OP_READ:
|
2023-04-11 19:14:49 +02:00
|
|
|
zram_bio_read(zram, bio);
|
|
|
|
break;
|
2023-04-11 19:14:47 +02:00
|
|
|
case REQ_OP_WRITE:
|
2023-04-11 19:14:49 +02:00
|
|
|
zram_bio_write(zram, bio);
|
2023-04-11 19:14:47 +02:00
|
|
|
break;
|
|
|
|
case REQ_OP_DISCARD:
|
|
|
|
case REQ_OP_WRITE_ZEROES:
|
|
|
|
zram_bio_discard(zram, bio);
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
WARN_ON_ONCE(1);
|
|
|
|
bio_endio(bio);
|
2011-02-17 17:11:49 +01:00
|
|
|
}
|
2009-09-22 10:26:53 +05:30
|
|
|
}
|
|
|
|
|
2011-09-09 19:01:00 -04:00
|
|
|
static void zram_slot_free_notify(struct block_device *bdev,
|
|
|
|
unsigned long index)
|
2010-05-17 11:02:44 +05:30
|
|
|
{
|
2010-06-01 13:31:25 +05:30
|
|
|
struct zram *zram;
|
2010-05-17 11:02:44 +05:30
|
|
|
|
2010-06-01 13:31:25 +05:30
|
|
|
zram = bdev->bd_disk->private_data;
|
2013-08-12 15:13:56 +09:00
|
|
|
|
zram: fix lockdep warning of free block handling
Patch series "zram idle page writeback", v3.
Inherently, a swap device has many idle pages which are rarely touched
after they are allocated. This is never a problem if we use a storage
device as swap. However, it's just a waste for zram-swap.
This patchset supports a zram idle page writeback feature.
* Admin can define what an idle page is: "no access since X time ago"
* Admin can define when zram should write them back
* Admin can define when zram should stop writeback to prevent wearout
Details are in each patch's description.
This patch (of 7):
================================
WARNING: inconsistent lock state
4.19.0+ #390 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
zram_verify/2095 [HC0[0]:SC1[1]:HE1:SE0] takes:
00000000b1828693 (&(&zram->bitmap_lock)->rlock){+.?.}, at: put_entry_bdev+0x1e/0x50
{SOFTIRQ-ON-W} state was registered at:
_raw_spin_lock+0x2c/0x40
zram_make_request+0x755/0xdc9
generic_make_request+0x373/0x6a0
submit_bio+0x6c/0x140
__swap_writepage+0x3a8/0x480
shrink_page_list+0x1102/0x1a60
shrink_inactive_list+0x21b/0x3f0
shrink_node_memcg.constprop.99+0x4f8/0x7e0
shrink_node+0x7d/0x2f0
do_try_to_free_pages+0xe0/0x300
try_to_free_pages+0x116/0x2b0
__alloc_pages_slowpath+0x3f4/0xf80
__alloc_pages_nodemask+0x2a2/0x2f0
__handle_mm_fault+0x42e/0xb50
handle_mm_fault+0x55/0xb0
__do_page_fault+0x235/0x4b0
page_fault+0x1e/0x30
irq event stamp: 228412
hardirqs last enabled at (228412): [<ffffffff98245846>] __slab_free+0x3e6/0x600
hardirqs last disabled at (228411): [<ffffffff98245625>] __slab_free+0x1c5/0x600
softirqs last enabled at (228396): [<ffffffff98e0031e>] __do_softirq+0x31e/0x427
softirqs last disabled at (228403): [<ffffffff98072051>] irq_exit+0xd1/0xe0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&(&zram->bitmap_lock)->rlock);
<Interrupt>
lock(&(&zram->bitmap_lock)->rlock);
*** DEADLOCK ***
no locks held by zram_verify/2095.
stack backtrace:
CPU: 5 PID: 2095 Comm: zram_verify Not tainted 4.19.0+ #390
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x67/0x9b
print_usage_bug+0x1bd/0x1d3
mark_lock+0x4aa/0x540
__lock_acquire+0x51d/0x1300
lock_acquire+0x90/0x180
_raw_spin_lock+0x2c/0x40
put_entry_bdev+0x1e/0x50
zram_free_page+0xf6/0x110
zram_slot_free_notify+0x42/0xa0
end_swap_bio_read+0x5b/0x170
blk_update_request+0x8f/0x340
scsi_end_request+0x2c/0x1e0
scsi_io_completion+0x98/0x650
blk_done_softirq+0x9e/0xd0
__do_softirq+0xcc/0x427
irq_exit+0xd1/0xe0
do_IRQ+0x93/0x120
common_interrupt+0xf/0xf
</IRQ>
With the writeback feature, zram_slot_free_notify can be called in softirq
context by end_swap_bio_read. However, bitmap_lock is not aware of that,
so lockdep yells out:
get_entry_bdev
spin_lock(bitmap->lock);
irq
softirq
end_swap_bio_read
zram_slot_free_notify
zram_slot_lock <-- deadlock prone
zram_free_page
put_entry_bdev
spin_lock(bitmap->lock); <-- deadlock prone
Following akpm's suggestion (bitmap operations are already atomic), we
can remove the bitmap lock. Allocation might then fail to find an empty
slot under serious contention, but that is not a severe problem: huge
page writeback can already fail under heavy memory pressure, and the
worst case is just keeping the incompressible page in memory rather than
on storage.
The other problem is zram_slot_lock in zram_slot_free_notify. To make
it safe, this patch introduces zram_slot_trylock, which
zram_slot_free_notify uses. Although contention is rare, the patch adds
a new debug stat, "miss_free", to monitor how often it happens.
Link: http://lkml.kernel.org/r/20181127055429.251614-2-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Joey Pabalinas <joeypabalinas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28 00:36:33 -08:00
|
|
|
atomic64_inc(&zram->stats.notify_free);
|
|
|
|
if (!zram_slot_trylock(zram, index)) {
|
|
|
|
atomic64_inc(&zram->stats.miss_free);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2014-01-30 15:46:04 -08:00
|
|
|
zram_free_page(zram, index);
|
2017-05-03 14:55:44 -07:00
|
|
|
zram_slot_unlock(zram, index);
|
2010-05-17 11:02:44 +05:30
|
|
|
}
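As a brief illustration of the trylock approach described in the commit
message above: the per-slot lock is taken with a non-blocking primitive,
so a softirq caller simply bails out instead of spinning. A minimal
sketch, assuming the per-slot lock is a ZRAM_LOCK bit in the table
entry's flags word (the field names here are illustrative, not
necessarily the driver's exact layout):

static int zram_slot_trylock(struct zram *zram, u32 index)
{
	/* Non-blocking: returns 0 if the slot is already locked. */
	return bit_spin_trylock(ZRAM_LOCK, &zram->table[index].flags);
}

static void zram_slot_unlock(struct zram *zram, u32 index)
{
	bit_spin_unlock(ZRAM_LOCK, &zram->table[index].flags);
}

Because zram_slot_free_notify only drops a slot opportunistically, a
failed trylock is harmless: the slot is freed later by the normal
overwrite path, and the "miss_free" counter records the skipped free.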
|
|
|
|
|
2024-09-02 19:56:01 +09:00
|
|
|
static void zram_comp_params_reset(struct zram *zram)
|
|
|
|
{
|
|
|
|
u32 prio;
|
|
|
|
|
|
|
|
for (prio = ZRAM_PRIMARY_COMP; prio < ZRAM_MAX_COMPS; prio++) {
|
2024-09-02 19:56:04 +09:00
|
|
|
comp_params_reset(zram, prio);
|
2024-09-02 19:56:01 +09:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2022-11-09 20:50:35 +09:00
|
|
|
static void zram_destroy_comps(struct zram *zram)
|
|
|
|
{
|
|
|
|
u32 prio;
|
|
|
|
|
2024-10-09 13:28:00 +09:00
|
|
|
for (prio = ZRAM_PRIMARY_COMP; prio < ZRAM_MAX_COMPS; prio++) {
|
2022-11-09 20:50:35 +09:00
|
|
|
struct zcomp *comp = zram->comps[prio];
|
|
|
|
|
|
|
|
zram->comps[prio] = NULL;
|
|
|
|
if (!comp)
|
|
|
|
continue;
|
|
|
|
zcomp_destroy(comp);
|
2022-11-09 20:50:44 +09:00
|
|
|
zram->num_active_comps--;
|
2022-11-09 20:50:35 +09:00
|
|
|
}
|
2024-09-02 19:56:01 +09:00
|
|
|
|
2024-09-23 19:48:43 +03:00
|
|
|
for (prio = ZRAM_PRIMARY_COMP; prio < ZRAM_MAX_COMPS; prio++) {
|
|
|
|
/* Do not free statically defined compression algorithms */
|
|
|
|
if (zram->comp_algs[prio] != default_compressor)
|
|
|
|
kfree(zram->comp_algs[prio]);
|
2024-09-11 11:54:56 +09:00
|
|
|
zram->comp_algs[prio] = NULL;
|
|
|
|
}
|
|
|
|
|
2024-09-02 19:56:01 +09:00
|
|
|
zram_comp_params_reset(zram);
|
2022-11-09 20:50:35 +09:00
|
|
|
}
|
|
|
|
|
zram: reorganize code layout
This patch looks big, but basically it just moves code blocks.
No functional changes.
Our current code layout looks like a sandwich.
For example,
a) between the read/write handlers, we have the update_used_max() helper:
static int zram_decompress_page
static int zram_bvec_read
static inline void update_used_max
static int zram_bvec_write
static int zram_bvec_rw
b) the RW request handlers __zram_make_request/zram_bio_discard are divided by
the sysfs attr reset_store() function and the corresponding zram_reset_device()
handler:
static void zram_bio_discard
static void zram_reset_device
static ssize_t disksize_store
static ssize_t reset_store
static void __zram_make_request
c) we first have a bunch of sysfs read/store functions, then a number of
one-liners, then helper functions, RW functions, sysfs functions, helper
functions again, and so on.
Reorganize the layout to be more logically grouped (a brief description;
`cat zram_drv.c | grep static` gives a bigger picture):
-- one-liners: zram_test_flag/etc.
-- helpers: is_partial_io/update_position/etc
-- sysfs attr show/store functions + ZRAM_ATTR_RO() generated stats
show() functions
exception: the reset and disksize store functions are required to come after
the meta() functions, because we do device create/destroy actions in these
sysfs handlers.
-- "mm" functions: meta get/put, meta alloc/free, page free
static inline bool zram_meta_get
static inline void zram_meta_put
static void zram_meta_free
static struct zram_meta *zram_meta_alloc
static void zram_free_page
-- a block of I/O functions
static int zram_decompress_page
static int zram_bvec_read
static int zram_bvec_write
static void zram_bio_discard
static int zram_bvec_rw
static void __zram_make_request
static void zram_make_request
static void zram_slot_free_notify
static int zram_rw_page
-- device control: add/remove/init/reset functions (the zram-control class
will sit here too)
static int zram_reset_device
static ssize_t reset_store
static ssize_t disksize_store
static int zram_add
static void zram_remove
static int __init zram_init
static void __exit zram_exit
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 15:00:08 -07:00
|
|
|
static void zram_reset_device(struct zram *zram)
|
|
|
|
{
|
|
|
|
down_write(&zram->init_lock);
|
2013-06-22 03:21:18 +03:00
|
|
|
|
2015-06-25 15:00:08 -07:00
|
|
|
zram->limit_pages = 0;
|
|
|
|
|
2020-11-16 15:57:09 +01:00
|
|
|
set_capacity_and_notify(zram->disk, 0);
|
2020-11-24 09:36:54 +01:00
|
|
|
part_stat_set_all(zram->disk->part0, 0);
|
2015-06-25 15:00:08 -07:00
|
|
|
|
|
|
|
/* I/O operations on all CPUs are done, so free the metadata */
|
2022-08-24 12:51:00 +09:00
|
|
|
zram_meta_free(zram, zram->disksize);
|
|
|
|
zram->disksize = 0;
|
2022-11-09 20:50:35 +09:00
|
|
|
zram_destroy_comps(zram);
|
2017-05-03 14:55:53 -07:00
|
|
|
memset(&zram->stats, 0, sizeof(zram->stats));
|
2024-09-17 11:09:07 +09:00
|
|
|
atomic_set(&zram->pp_in_progress, 0);
|
2017-09-06 16:19:54 -07:00
|
|
|
reset_bdev(zram);
|
2021-10-25 10:54:23 +08:00
|
|
|
|
2022-11-09 20:50:35 +09:00
|
|
|
comp_algorithm_set(zram, ZRAM_PRIMARY_COMP, default_compressor);
|
2021-10-25 10:54:23 +08:00
|
|
|
up_write(&zram->init_lock);
|
2015-06-25 15:00:08 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
static ssize_t disksize_store(struct device *dev,
|
|
|
|
struct device_attribute *attr, const char *buf, size_t len)
|
2015-04-15 16:16:03 -07:00
|
|
|
{
|
2015-06-25 15:00:08 -07:00
|
|
|
u64 disksize;
|
|
|
|
struct zcomp *comp;
|
2015-04-15 16:16:03 -07:00
|
|
|
struct zram *zram = dev_to_zram(dev);
|
2015-06-25 15:00:08 -07:00
|
|
|
int err;
|
2022-11-09 20:50:35 +09:00
|
|
|
u32 prio;
|
2015-04-15 16:16:03 -07:00
|
|
|
|
2015-06-25 15:00:08 -07:00
|
|
|
disksize = memparse(buf, NULL);
|
|
|
|
if (!disksize)
|
|
|
|
return -EINVAL;
|
2015-04-15 16:16:03 -07:00
|
|
|
|
2017-05-03 14:55:47 -07:00
|
|
|
down_write(&zram->init_lock);
|
|
|
|
if (init_done(zram)) {
|
|
|
|
pr_info("Cannot change disksize for initialized device\n");
|
|
|
|
err = -EBUSY;
|
|
|
|
goto out_unlock;
|
|
|
|
}
|
|
|
|
|
2015-06-25 15:00:08 -07:00
|
|
|
disksize = PAGE_ALIGN(disksize);
|
2017-05-03 14:55:47 -07:00
|
|
|
if (!zram_meta_alloc(zram, disksize)) {
|
|
|
|
err = -ENOMEM;
|
|
|
|
goto out_unlock;
|
|
|
|
}
|
2015-06-25 15:00:08 -07:00
|
|
|
|
2024-10-09 13:28:00 +09:00
|
|
|
for (prio = ZRAM_PRIMARY_COMP; prio < ZRAM_MAX_COMPS; prio++) {
|
2022-11-09 20:50:35 +09:00
|
|
|
if (!zram->comp_algs[prio])
|
|
|
|
continue;
|
|
|
|
|
2024-09-02 19:56:01 +09:00
|
|
|
comp = zcomp_create(zram->comp_algs[prio],
|
|
|
|
&zram->params[prio]);
|
2022-11-09 20:50:35 +09:00
|
|
|
if (IS_ERR(comp)) {
|
|
|
|
pr_err("Cannot initialise %s compressing backend\n",
|
|
|
|
zram->comp_algs[prio]);
|
|
|
|
err = PTR_ERR(comp);
|
|
|
|
goto out_free_comps;
|
|
|
|
}
|
2015-06-25 15:00:08 -07:00
|
|
|
|
2022-11-09 20:50:35 +09:00
|
|
|
zram->comps[prio] = comp;
|
2022-11-09 20:50:44 +09:00
|
|
|
zram->num_active_comps++;
|
2022-11-09 20:50:35 +09:00
|
|
|
}
|
2015-06-25 15:00:08 -07:00
|
|
|
zram->disksize = disksize;
|
2020-11-16 15:57:09 +01:00
|
|
|
set_capacity_and_notify(zram->disk, zram->disksize >> SECTOR_SHIFT);
|
2017-01-10 16:58:18 -08:00
|
|
|
up_write(&zram->init_lock);
|
2015-06-25 15:00:08 -07:00
|
|
|
|
|
|
|
return len;
|
|
|
|
|
2022-11-09 20:50:35 +09:00
|
|
|
out_free_comps:
|
|
|
|
zram_destroy_comps(zram);
|
2017-05-03 14:55:47 -07:00
|
|
|
zram_meta_free(zram, disksize);
|
|
|
|
out_unlock:
|
|
|
|
up_write(&zram->init_lock);
|
2015-06-25 15:00:08 -07:00
|
|
|
return err;
|
2015-04-15 16:16:03 -07:00
|
|
|
}
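The disksize string written to the sysfs attribute above is parsed with
memparse(), which accepts the usual K/M/G-style suffixes, and the result
is then rounded up with PAGE_ALIGN() before the metadata is allocated.
As an illustrative userspace analogue of that suffix handling (parse_size
is a hypothetical helper written for this sketch, not the kernel
function):

#include <stdio.h>
#include <stdlib.h>

static unsigned long long parse_size(const char *s)
{
	char *end;
	unsigned long long v = strtoull(s, &end, 0);

	/* Cascade the binary suffixes: G = M * 1024 = K * 1024 * 1024. */
	switch (*end) {
	case 'G': case 'g': v <<= 10; /* fall through */
	case 'M': case 'm': v <<= 10; /* fall through */
	case 'K': case 'k': v <<= 10; break;
	default: break;
	}
	return v;
}

int main(void)
{
	/* Roughly what `echo 512M > /sys/block/zram0/disksize` hands the driver. */
	printf("%llu\n", parse_size("512M")); /* prints 536870912 */
	return 0;
}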
|
|
|
|
|
2015-06-25 15:00:08 -07:00
|
|
|
static ssize_t reset_store(struct device *dev,
|
|
|
|
struct device_attribute *attr, const char *buf, size_t len)
|
2015-04-15 16:16:06 -07:00
|
|
|
{
|
2015-06-25 15:00:08 -07:00
|
|
|
int ret;
|
|
|
|
unsigned short do_reset;
|
|
|
|
struct zram *zram;
|
2022-03-30 07:29:04 +02:00
|
|
|
struct gendisk *disk;
|
2015-04-15 16:16:06 -07:00
|
|
|
|
2015-06-25 15:00:21 -07:00
|
|
|
ret = kstrtou16(buf, 10, &do_reset);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
if (!do_reset)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2015-06-25 15:00:08 -07:00
|
|
|
zram = dev_to_zram(dev);
|
2022-03-30 07:29:04 +02:00
|
|
|
disk = zram->disk;
|
2015-04-15 16:16:06 -07:00
|
|
|
|
2022-03-30 07:29:04 +02:00
|
|
|
mutex_lock(&disk->open_mutex);
|
2015-06-25 15:00:21 -07:00
|
|
|
/* Do not reset an active or claimed device */
|
2022-03-30 07:29:06 +02:00
|
|
|
if (disk_openers(disk) || zram->claim) {
|
2022-03-30 07:29:04 +02:00
|
|
|
mutex_unlock(&disk->open_mutex);
|
2015-06-25 15:00:21 -07:00
|
|
|
return -EBUSY;
|
2015-06-25 15:00:08 -07:00
|
|
|
}
|
|
|
|
|
2015-06-25 15:00:21 -07:00
|
|
|
/* From now on, no one can open /dev/zram[0-9] */
|
|
|
|
zram->claim = true;
|
2022-03-30 07:29:04 +02:00
|
|
|
mutex_unlock(&disk->open_mutex);
|
2015-06-25 15:00:08 -07:00
|
|
|
|
2015-06-25 15:00:21 -07:00
|
|
|
/* Make sure all the pending I/O are finished */
|
2022-03-30 07:29:04 +02:00
|
|
|
sync_blockdev(disk->part0);
|
2015-06-25 15:00:08 -07:00
|
|
|
zram_reset_device(zram);
|
|
|
|
|
2022-03-30 07:29:04 +02:00
|
|
|
mutex_lock(&disk->open_mutex);
|
2015-06-25 15:00:21 -07:00
|
|
|
zram->claim = false;
|
2022-03-30 07:29:04 +02:00
|
|
|
mutex_unlock(&disk->open_mutex);
|
2015-06-25 15:00:21 -07:00
|
|
|
|
2015-06-25 15:00:08 -07:00
|
|
|
return len;
|
2015-06-25 15:00:21 -07:00
|
|
|
}
|
|
|
|
|
2023-06-08 13:02:55 +02:00
|
|
|
static int zram_open(struct gendisk *disk, blk_mode_t mode)
|
2015-06-25 15:00:21 -07:00
|
|
|
{
|
2023-06-08 13:02:36 +02:00
|
|
|
struct zram *zram = disk->private_data;
|
2015-06-25 15:00:21 -07:00
|
|
|
|
2023-06-08 13:02:36 +02:00
|
|
|
WARN_ON(!mutex_is_locked(&disk->open_mutex));
|
2015-06-25 15:00:21 -07:00
|
|
|
|
|
|
|
/* zram was claimed for reset, so the open request fails */
|
|
|
|
if (zram->claim)
|
2023-06-08 13:02:36 +02:00
|
|
|
return -EBUSY;
|
|
|
|
return 0;
|
2015-04-15 16:16:06 -07:00
|
|
|
}
|
|
|
|
|
2015-06-25 15:00:08 -07:00
|
|
|
static const struct block_device_operations zram_devops = {
|
2015-06-25 15:00:21 -07:00
|
|
|
.open = zram_open,
|
2020-07-01 10:59:43 +02:00
|
|
|
.submit_bio = zram_submit_bio,
|
2015-06-25 15:00:08 -07:00
|
|
|
.swap_slot_free_notify = zram_slot_free_notify,
|
|
|
|
.owner = THIS_MODULE
|
|
|
|
};
|
|
|
|
|
|
|
|
static DEVICE_ATTR_WO(compact);
|
|
|
|
static DEVICE_ATTR_RW(disksize);
|
|
|
|
static DEVICE_ATTR_RO(initstate);
|
|
|
|
static DEVICE_ATTR_WO(reset);
|
2017-02-22 15:46:45 -08:00
|
|
|
static DEVICE_ATTR_WO(mem_limit);
|
|
|
|
static DEVICE_ATTR_WO(mem_used_max);
|
2018-12-28 00:36:44 -08:00
|
|
|
static DEVICE_ATTR_WO(idle);
|
2015-06-25 15:00:08 -07:00
|
|
|
static DEVICE_ATTR_RW(max_comp_streams);
|
|
|
|
static DEVICE_ATTR_RW(comp_algorithm);
|
2017-09-06 16:19:54 -07:00
|
|
|
#ifdef CONFIG_ZRAM_WRITEBACK
|
|
|
|
static DEVICE_ATTR_RW(backing_dev);
|
zram: support idle/huge page writeback
Add a new feature, "zram idle/huge page writeback". In the zram-swap use
case, zram usually has many idle/huge swap pages. It's pointless to keep
them in memory (i.e., in zram).
To solve this problem, this feature introduces idle/huge page writeback
to the backing device; the goal is to save more memory space on embedded
systems.
The normal sequence to use the idle/huge page writeback feature is as
follows:
while (1) {
# mark allocated zram slots as idle
echo all > /sys/block/zram0/idle
# leave the system working for several hours
# Unless some blocks on zram are accessed, they remain
# IDLE-marked pages.
echo "idle" > /sys/block/zram0/writeback
and/or
echo "huge" > /sys/block/zram0/writeback
# write the IDLE and/or huge marked slots to the backing device
# and free the memory.
}
Per the discussion at
https://lore.kernel.org/lkml/20181122065926.GG3441@jagdpanzerIV/T/#u,
this patch removes the direct incompressible page writeback feature
(d2afd25114f4 ("zram: write incompressible pages to backing device")).
Below are the concerns from Sergey:
== &< ==
"IDLE writeback" is superior to "incompressible writeback".
"incompressible writeback" is completely unpredictable and
uncontrollable; it depends on data patterns and compression algorithms,
while "IDLE writeback" is predictable.
I even suspect that, *ideally*, we can remove "incompressible
writeback": "IDLE pages" is a superset which also includes
"incompressible" pages. So, technically, we can still do "incompressible
writeback" from the "IDLE writeback" path, but a much more reasonable
one, based on a page idling period.
I understand that you want to keep "direct incompressible writeback"
around. ZRAM is especially popular on devices which do suffer from flash
wearout, so I can see the "incompressible writeback" path becoming dead
code, long term.
== &< ==
Below are the concerns from Minchan:
== &< ==
My concern is that if we enable CONFIG_ZRAM_WRITEBACK in this
implementation, both hugepage and idlepage writeback will turn on.
However, some users want to enable only idlepage writeback, so we would
need to introduce an on/off knob for hugepage, or a new
CONFIG_ZRAM_IDLEPAGE_WRITEBACK for that use case. I don't want to make
it complicated *if possible*.
Long term, I imagine we need to make the VM aware of a new swap
hierarchy, a little different from the current one. For example, the
first high-priority swap device can return -EIO or -ENOCOMP, and swap
will try to fall back to the next lower-priority swap device. With that,
hugepage writeback will work transparently.
So we could regard this as a regression, because incompressible pages
don't go to backing storage automatically. Instead, the user should do
it manually via "echo huge > /sys/block/zram/writeback".
== &< ==
Link: http://lkml.kernel.org/r/20181127055429.251614-6-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Joey Pabalinas <joeypabalinas@gmail.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28 00:36:47 -08:00
|
|
|
static DEVICE_ATTR_WO(writeback);
|
2018-12-28 00:36:54 -08:00
|
|
|
static DEVICE_ATTR_RW(writeback_limit);
|
2019-01-08 15:22:53 -08:00
|
|
|
static DEVICE_ATTR_RW(writeback_limit_enable);
|
2017-09-06 16:19:54 -07:00
|
|
|
#endif
|
2022-11-09 20:50:36 +09:00
|
|
|
#ifdef CONFIG_ZRAM_MULTI_COMP
|
|
|
|
static DEVICE_ATTR_RW(recomp_algorithm);
|
2022-11-09 20:50:38 +09:00
|
|
|
static DEVICE_ATTR_WO(recompress);
|
2022-11-09 20:50:36 +09:00
|
|
|
#endif
|
2024-09-02 19:56:03 +09:00
|
|
|
static DEVICE_ATTR_WO(algorithm_params);
|
2014-04-07 15:38:04 -07:00
|
|
|
|
2013-06-22 03:21:18 +03:00
|
|
|
static struct attribute *zram_disk_attrs[] = {
|
|
|
|
&dev_attr_disksize.attr,
|
|
|
|
&dev_attr_initstate.attr,
|
|
|
|
&dev_attr_reset.attr,
|
2015-05-05 16:23:25 -07:00
|
|
|
&dev_attr_compact.attr,
|
2014-10-09 15:29:53 -07:00
|
|
|
&dev_attr_mem_limit.attr,
|
2014-10-09 15:29:55 -07:00
|
|
|
&dev_attr_mem_used_max.attr,
|
2018-12-28 00:36:44 -08:00
|
|
|
&dev_attr_idle.attr,
|
zram: add multi stream functionality
The existing zram (zcomp) implementation has only one compression stream
(buffer and algorithm private part), so in order to prevent data
corruption only one write (compress operation) can use this compression
stream, forcing all concurrent write operations to wait for the stream
lock to be released. This patch changes zcomp to keep a list of
compression streams of user-defined size (via a sysfs device attr). Each
write operation still exclusively holds a compression stream; the
difference is that we can have N write operations (depending on the size
of the streams list) executing in parallel. See the TEST section later
in the commit message for performance data.
Introduce struct zcomp_strm_multi and a set of functions to manage
zcomp_strm stream access. zcomp_strm_multi has a list of idle
zcomp_strm structs, a spinlock to protect the idle list, and a wait
queue, making it possible to perform parallel compressions.
The following functions are added:
- zcomp_strm_multi_find()/zcomp_strm_multi_release()
find and release a compression stream, implementing the required locking
- zcomp_strm_multi_create()/zcomp_strm_multi_destroy()
create and destroy zcomp_strm_multi
The zcomp ->strm_find() and ->strm_release() callbacks are set during
initialisation to zcomp_strm_multi_find()/zcomp_strm_multi_release(),
correspondingly.
Each time zcomp issues a zcomp_strm_multi_find() call, the following
operations are performed (see the sketch after the usage example below):
- spin lock strm_lock
- if the idle list is not empty, remove a zcomp_strm from the idle list,
spin unlock and return the zcomp stream pointer to the caller
- if the idle list is empty, the current task adds itself to the wait
queue; it will be woken up by a zcomp_strm_multi_release() caller.
zcomp_strm_multi_release():
- spin lock strm_lock
- add the zcomp stream to the idle list
- spin unlock, wake up a sleeper
Minchan Kim reported that the spinlock-based locking scheme demonstrates
a severe performance regression for the single compression stream case,
compared to the mutex-based one (see https://lkml.org/lkml/2014/2/18/16):
                 base                 spinlock             mutex
==Initial write
records:         5                    5                    5
avg:             1642424.35           699610.40            1655583.71
std:             39890.95 (2.43%)     232014.19 (33.16%)   52293.96
max:             1690170.94           1163473.45           1697164.75
min:             1568669.52           573429.88            1553410.23
==Rewrite
records:         5                    5                    5
avg:             1611775.39           501406.64            1684419.11
std:             17144.58 (1.06%)     15354.41 (3.06%)     18367.42
max:             1641800.95           531356.78            1706445.84
min:             1593515.27           488817.78            1655335.73
When only one compression stream is available, a mutex with spin on
owner tends to perform much better than frequent wait_event()/wake_up().
This is why the single stream case is implemented as a special case with
mutex locking.
Introduce and document the zram device attribute max_comp_streams. This
attr shows and stores the current zcomp's max number of zcomp streams
(max_strm). Extend zcomp's zcomp_create() with a `max_strm' parameter.
`max_strm' limits the number of zcomp_strm structs in the compression
backend's idle list (max_comp_streams).
max_comp_streams is used during initialisation as follows:
-- passing max_strm equal to 1 to zcomp_create() will initialise zcomp
using the single compression stream zcomp_strm_single (mutex-based
locking).
-- passing max_strm greater than 1 to zcomp_create() will initialise
zcomp using the multi compression stream zcomp_strm_multi
(spinlock-based locking).
The default max_comp_streams value is 1, meaning that zram will be
initialised with a single stream.
A later patch will introduce a configuration knob to change
max_comp_streams on an already initialised and used zcomp.
TEST
iozone -t 3 -R -r 16K -s 60M -I +Z
test             base           1 strm (mutex)   3 strm (spinlock)
-----------------------------------------------------------------------
Initial write    589286.78      583518.39        718011.05
Rewrite          604837.97      596776.38        1515125.72
Random write     584120.11      595714.58        1388850.25
Pwrite           535731.17      541117.38        739295.27
Fwrite           1418083.88     1478612.72       1484927.06
Usage example:
set max_comp_streams to 4
echo 4 > /sys/block/zram0/max_comp_streams
show current max_comp_streams (default value is 1).
cat /sys/block/zram0/max_comp_streams
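For illustration, here is a minimal C sketch of the find/release pair
described above. This is a simplified rendering of the described locking
scheme, not the exact zcomp code; the struct layout, field names and the
comp->stream pointer are assumptions:
	struct zcomp_strm_multi {
		spinlock_t strm_lock;		/* protects the idle list */
		struct list_head idle_strm;	/* idle zcomp_strm structs */
		wait_queue_head_t strm_wait;	/* tasks waiting for a stream */
	};
	static struct zcomp_strm *zcomp_strm_multi_find(struct zcomp *comp)
	{
		struct zcomp_strm_multi *zs = comp->stream;
		struct zcomp_strm *zstrm;
		while (1) {
			spin_lock(&zs->strm_lock);
			if (!list_empty(&zs->idle_strm)) {
				/* take an idle stream, hand it to the caller */
				zstrm = list_first_entry(&zs->idle_strm,
						struct zcomp_strm, list);
				list_del(&zstrm->list);
				spin_unlock(&zs->strm_lock);
				return zstrm;
			}
			spin_unlock(&zs->strm_lock);
			/* idle list empty: sleep until a stream is released */
			wait_event(zs->strm_wait, !list_empty(&zs->idle_strm));
		}
	}
	static void zcomp_strm_multi_release(struct zcomp *comp,
					struct zcomp_strm *zstrm)
	{
		struct zcomp_strm_multi *zs = comp->stream;
		spin_lock(&zs->strm_lock);
		/* put the stream back on the idle list, wake one sleeper */
		list_add(&zstrm->list, &zs->idle_strm);
		spin_unlock(&zs->strm_lock);
		wake_up(&zs->strm_wait);
	}
Note how a woken waiter re-takes the spinlock and re-checks the list in
the loop: another task may have grabbed the released stream first.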
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 15:38:14 -07:00
|
|
|
&dev_attr_max_comp_streams.attr,
|
2014-04-07 15:38:17 -07:00
|
|
|
&dev_attr_comp_algorithm.attr,
|
2017-09-06 16:19:54 -07:00
|
|
|
#ifdef CONFIG_ZRAM_WRITEBACK
|
|
|
|
&dev_attr_backing_dev.attr,
|
2018-12-28 00:36:47 -08:00
|
|
|
&dev_attr_writeback.attr,
|
2018-12-28 00:36:54 -08:00
|
|
|
&dev_attr_writeback_limit.attr,
|
2019-01-08 15:22:53 -08:00
|
|
|
&dev_attr_writeback_limit_enable.attr,
|
2017-09-06 16:19:54 -07:00
|
|
|
#endif
|
2015-04-15 16:16:03 -07:00
|
|
|
&dev_attr_io_stat.attr,
|
2015-04-15 16:16:06 -07:00
|
|
|
&dev_attr_mm_stat.attr,
|
2018-12-28 00:36:51 -08:00
|
|
|
#ifdef CONFIG_ZRAM_WRITEBACK
|
|
|
|
&dev_attr_bd_stat.attr,
|
|
|
|
#endif
|
2016-05-20 17:00:02 -07:00
|
|
|
&dev_attr_debug_stat.attr,
|
2022-11-09 20:50:36 +09:00
|
|
|
#ifdef CONFIG_ZRAM_MULTI_COMP
|
|
|
|
&dev_attr_recomp_algorithm.attr,
|
2022-11-09 20:50:38 +09:00
|
|
|
&dev_attr_recompress.attr,
|
2022-11-09 20:50:36 +09:00
|
|
|
#endif
|
2024-09-02 19:56:03 +09:00
|
|
|
&dev_attr_algorithm_params.attr,
|
2013-06-22 03:21:18 +03:00
|
|
|
NULL,
|
|
|
|
};
|
|
|
|
|
2022-01-14 14:09:22 -08:00
|
|
|
ATTRIBUTE_GROUPS(zram_disk);
|
2018-09-28 08:17:22 +02:00
|
|
|
|
2015-06-25 15:00:19 -07:00
|
|
|
/*
|
|
|
|
* Allocate and initialize a new zram device. The function returns
|
|
|
|
* a '>= 0' device_id upon success, and a negative value otherwise.
|
|
|
|
*/
|
|
|
|
static int zram_add(void)
|
2009-09-22 10:26:53 +05:30
|
|
|
{
|
2024-02-15 08:10:51 +01:00
|
|
|
struct queue_limits lim = {
|
|
|
|
.logical_block_size = ZRAM_LOGICAL_BLOCK_SIZE,
|
|
|
|
/*
|
|
|
|
* To ensure that we always get PAGE_SIZE aligned and
|
|
|
|
* n*PAGE_SIZE-sized I/O requests.
|
|
|
|
*/
|
|
|
|
.physical_block_size = PAGE_SIZE,
|
|
|
|
.io_min = PAGE_SIZE,
|
|
|
|
.io_opt = PAGE_SIZE,
|
|
|
|
.max_hw_discard_sectors = UINT_MAX,
|
|
|
|
/*
|
|
|
|
* zram_bio_discard() will clear all logical blocks if logical
|
|
|
|
* block size is identical to the physical block size (PAGE_SIZE).
|
|
|
|
* But if it is different, we will skip discarding some parts of
|
|
|
|
* logical blocks in the part of the request range which isn't
|
|
|
|
* aligned to physical block size. So we can't ensure that all
|
|
|
|
* discarded logical blocks are zeroed.
|
|
|
|
*/
|
|
|
|
#if ZRAM_LOGICAL_BLOCK_SIZE == PAGE_SIZE
|
|
|
|
.max_write_zeroes_sectors = UINT_MAX,
|
|
|
|
#endif
|
2024-06-17 08:04:45 +02:00
|
|
|
.features = BLK_FEAT_STABLE_WRITES |
|
|
|
|
BLK_FEAT_SYNCHRONOUS,
|
2024-02-15 08:10:51 +01:00
|
|
|
};
|
2015-06-25 15:00:06 -07:00
|
|
|
struct zram *zram;
|
2015-06-25 15:00:19 -07:00
|
|
|
int ret, device_id;
|
2015-06-25 15:00:06 -07:00
|
|
|
|
|
|
|
zram = kzalloc(sizeof(struct zram), GFP_KERNEL);
|
|
|
|
if (!zram)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
2015-06-25 15:00:19 -07:00
|
|
|
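/* allocate the lowest available device id and bind it to this zram */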
ret = idr_alloc(&zram_index_idr, zram, 0, 0, GFP_KERNEL);
|
2015-06-25 15:00:06 -07:00
|
|
|
if (ret < 0)
|
|
|
|
goto out_free_dev;
|
2015-06-25 15:00:19 -07:00
|
|
|
device_id = ret;
|
2010-01-28 21:13:40 +05:30
|
|
|
|
2011-09-06 15:02:11 +02:00
|
|
|
init_rwsem(&zram->init_lock);
|
2019-01-08 15:22:53 -08:00
|
|
|
#ifdef CONFIG_ZRAM_WRITEBACK
|
|
|
|
spin_lock_init(&zram->wb_limit_lock);
|
|
|
|
#endif
|
2009-09-22 10:26:53 +05:30
|
|
|
|
2015-06-25 15:00:06 -07:00
|
|
|
/* gendisk structure */
|
2024-02-15 08:10:51 +01:00
|
|
|
zram->disk = blk_alloc_disk(&lim, NUMA_NO_NODE);
|
2024-02-15 08:10:47 +01:00
|
|
|
if (IS_ERR(zram->disk)) {
|
2015-09-08 15:04:58 -07:00
|
|
|
pr_err("Error allocating disk structure for device %d\n",
|
2009-09-22 10:26:53 +05:30
|
|
|
device_id);
|
2024-02-15 08:10:47 +01:00
|
|
|
ret = PTR_ERR(zram->disk);
|
2021-05-21 07:51:00 +02:00
|
|
|
goto out_free_idr;
|
2009-09-22 10:26:53 +05:30
|
|
|
}
|
|
|
|
|
2010-06-01 13:31:25 +05:30
|
|
|
zram->disk->major = zram_major;
|
|
|
|
zram->disk->first_minor = device_id;
|
2021-05-21 07:51:00 +02:00
|
|
|
zram->disk->minors = 1;
|
2021-11-22 14:06:22 +01:00
|
|
|
zram->disk->flags |= GENHD_FL_NO_PART;
|
2010-06-01 13:31:25 +05:30
|
|
|
zram->disk->fops = &zram_devops;
|
|
|
|
zram->disk->private_data = zram;
|
|
|
|
snprintf(zram->disk->disk_name, 16, "zram%d", device_id);
|
2024-09-17 11:09:07 +09:00
|
|
|
atomic_set(&zram->pp_in_progress, 0);
|
2024-11-08 18:01:47 +08:00
|
|
|
zram_comp_params_reset(zram);
|
|
|
|
comp_algorithm_set(zram, ZRAM_PRIMARY_COMP, default_compressor);
|
2009-09-22 10:26:53 +05:30
|
|
|
|
2022-12-23 13:03:31 +09:00
|
|
|
/* Actual capacity set using sysfs (/sys/block/zram<id>/disksize) */
|
2010-06-01 13:31:25 +05:30
|
|
|
set_capacity(zram->disk, 0);
|
2022-01-14 14:09:22 -08:00
|
|
|
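/* register the disk; zram_disk_groups attaches the sysfs attrs above */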
ret = device_add_disk(NULL, zram->disk, zram_disk_groups);
|
2021-10-15 16:52:14 -07:00
|
|
|
if (ret)
|
|
|
|
goto out_cleanup_disk;
|
2018-09-28 08:17:22 +02:00
|
|
|
|
2018-06-07 17:05:49 -07:00
|
|
|
zram_debugfs_register(zram);
|
2015-06-25 15:00:14 -07:00
|
|
|
pr_info("Added device: %s\n", zram->disk->disk_name);
|
2015-06-25 15:00:19 -07:00
|
|
|
return device_id;
|
2010-01-28 21:13:40 +05:30
|
|
|
|
2021-10-15 16:52:14 -07:00
|
|
|
out_cleanup_disk:
|
2022-06-19 08:05:52 +02:00
|
|
|
put_disk(zram->disk);
|
2015-06-25 15:00:06 -07:00
|
|
|
out_free_idr:
|
|
|
|
idr_remove(&zram_index_idr, device_id);
|
|
|
|
out_free_dev:
|
|
|
|
kfree(zram);
|
2010-01-28 21:13:40 +05:30
|
|
|
return ret;
|
2009-09-22 10:26:53 +05:30
|
|
|
}
|
|
|
|
|
2015-06-25 15:00:24 -07:00
|
|
|
static int zram_remove(struct zram *zram)
|
2009-09-22 10:26:53 +05:30
|
|
|
{
|
2021-10-25 10:54:24 +08:00
|
|
|
bool claimed;
|
2015-06-25 15:00:24 -07:00
|
|
|
|
2022-03-30 07:29:05 +02:00
|
|
|
mutex_lock(&zram->disk->open_mutex);
|
2022-03-30 07:29:06 +02:00
|
|
|
if (disk_openers(zram->disk)) {
|
2022-03-30 07:29:05 +02:00
|
|
|
mutex_unlock(&zram->disk->open_mutex);
|
2015-06-25 15:00:24 -07:00
|
|
|
return -EBUSY;
|
|
|
|
}
|
|
|
|
|
2021-10-25 10:54:24 +08:00
|
|
|
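/* claim the device (unless reset_store() already did) so new opens fail */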
claimed = zram->claim;
|
|
|
|
if (!claimed)
|
|
|
|
zram->claim = true;
|
2022-03-30 07:29:05 +02:00
|
|
|
mutex_unlock(&zram->disk->open_mutex);
|
2015-06-25 15:00:24 -07:00
|
|
|
|
2018-06-07 17:05:49 -07:00
|
|
|
zram_debugfs_unregister(zram);
|
2009-09-22 10:26:53 +05:30
|
|
|
|
2021-10-25 10:54:24 +08:00
|
|
|
if (claimed) {
|
|
|
|
/*
|
|
|
|
* If we were claimed by reset_store(), del_gendisk() will
|
|
|
|
* wait until reset_store() is done, so there is nothing to do.
|
|
|
|
*/
|
|
|
|
;
|
|
|
|
} else {
|
|
|
|
/* Make sure all the pending I/O are finished */
|
2022-03-30 07:29:05 +02:00
|
|
|
sync_blockdev(zram->disk->part0);
|
2021-10-25 10:54:24 +08:00
|
|
|
zram_reset_device(zram);
|
|
|
|
}
|
2015-06-25 15:00:24 -07:00
|
|
|
|
|
|
|
pr_info("Removed device: %s\n", zram->disk->disk_name);
|
|
|
|
|
2015-06-25 15:00:06 -07:00
|
|
|
del_gendisk(zram->disk);
|
2021-10-25 10:54:24 +08:00
|
|
|
|
|
|
|
/* del_gendisk drains pending reset_store */
|
|
|
|
WARN_ON_ONCE(claimed && zram->claim);
|
|
|
|
|
2021-10-25 10:54:25 +08:00
|
|
|
/*
|
|
|
|
* disksize_store() may be called in between zram_reset_device()
|
|
|
|
* and del_gendisk(), so run the last reset to avoid leaking
|
|
|
|
* anything allocated with disksize_store()
|
|
|
|
*/
|
|
|
|
zram_reset_device(zram);
|
|
|
|
|
2022-06-19 08:05:52 +02:00
|
|
|
put_disk(zram->disk);
|
2015-06-25 15:00:06 -07:00
|
|
|
kfree(zram);
|
2015-06-25 15:00:24 -07:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* zram-control sysfs attributes */
|
2017-06-08 10:12:39 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* NOTE: hot_add is not a usual read-only sysfs attribute, in the
|
|
|
|
* sense that reading from this file does alter the state of your system -- it
|
|
|
|
* creates a new un-initialized zram device and returns this device's
|
|
|
|
* device_id (or an error code if it fails to create a new device).
|
|
|
|
*/
|
2023-03-25 09:45:37 +01:00
|
|
|
static ssize_t hot_add_show(const struct class *class,
|
|
|
|
const struct class_attribute *attr,
|
2015-06-25 15:00:24 -07:00
|
|
|
char *buf)
|
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
mutex_lock(&zram_index_mutex);
|
|
|
|
ret = zram_add();
|
|
|
|
mutex_unlock(&zram_index_mutex);
|
|
|
|
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
|
|
|
return scnprintf(buf, PAGE_SIZE, "%d\n", ret);
|
|
|
|
}
|
2023-04-18 15:47:15 +02:00
|
|
|
/* This attribute must be set to 0400, so CLASS_ATTR_RO() can not be used */
|
|
|
|
static struct class_attribute class_attr_hot_add =
|
|
|
|
__ATTR(hot_add, 0400, hot_add_show, NULL);
|
2015-06-25 15:00:24 -07:00
|
|
|
|
2023-03-25 09:45:37 +01:00
|
|
|
static ssize_t hot_remove_store(const struct class *class,
|
|
|
|
const struct class_attribute *attr,
|
2015-06-25 15:00:24 -07:00
|
|
|
const char *buf,
|
|
|
|
size_t count)
|
|
|
|
{
|
|
|
|
struct zram *zram;
|
|
|
|
int ret, dev_id;
|
|
|
|
|
|
|
|
/* dev_id is gendisk->first_minor, which is `int' */
|
|
|
|
ret = kstrtoint(buf, 10, &dev_id);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
if (dev_id < 0)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
mutex_lock(&zram_index_mutex);
|
|
|
|
|
|
|
|
zram = idr_find(&zram_index_idr, dev_id);
|
2016-01-15 16:54:48 -08:00
|
|
|
if (zram) {
|
2015-06-25 15:00:24 -07:00
|
|
|
ret = zram_remove(zram);
|
2016-11-30 15:54:08 -08:00
|
|
|
if (!ret)
|
|
|
|
idr_remove(&zram_index_idr, dev_id);
|
2016-01-15 16:54:48 -08:00
|
|
|
} else {
|
2015-06-25 15:00:24 -07:00
|
|
|
ret = -ENODEV;
|
2016-01-15 16:54:48 -08:00
|
|
|
}
|
2015-06-25 15:00:24 -07:00
|
|
|
|
|
|
|
mutex_unlock(&zram_index_mutex);
|
|
|
|
return ret ? ret : count;
|
2015-06-25 15:00:06 -07:00
|
|
|
}
|
2017-06-08 10:12:39 +02:00
|
|
|
static CLASS_ATTR_WO(hot_remove);
|
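Together, hot_add and hot_remove above form the zram-control class
interface. A usage sketch (the device id below is only an example):
	# create a new uninitialized device and read back its id
	cat /sys/class/zram-control/hot_add
	# remove the device with id 4 (zram4)
	echo 4 > /sys/class/zram-control/hot_remove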
zram: rework reset and destroy path
We need to return set_capacity(disk, 0) from reset_store() back to
zram_reset_device(), a catch by Ganesh Mahendran. Potentially, we can
race set_capacity() calls from the init and reset paths.
The problem is that zram_reset_device() is also getting called from
zram_exit(), which performs operations in a misleadingly reversed order:
we first create_device() and then init it, while zram_exit() performs
destroy_device() first and then does zram_reset_device(). This is done
to remove the sysfs group before we reset the device, so we can continue
with device reset/destruction without being raced by a sysfs attr write
(e.g. disksize).
Apart from that, destroy_device() releases zram->disk (but we still have
the ->disk pointer), so we cannot access zram->disk in a later
zram_reset_device() call, which may cause additional errors in the
future.
So, this patch reworks and cleans up the destroy path:
1) remove several unneeded goto labels in zram_init()
2) factor out the zram_init() error path and zram_exit() into a
destroy_devices() function, which takes the number of devices to
destroy as its argument.
3) remove the sysfs group in destroy_devices() first, so we can reorder
operations -- device reset (as expected) goes before disk destroy and
queue cleanup. So we can always access ->disk in zram_reset_device().
4) and, finally, return set_capacity() back under ->init_lock.
[akpm@linux-foundation.org: tweak comment]
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 15:00:39 -08:00
|
|
|
|
2017-06-08 10:12:39 +02:00
|
|
|
static struct attribute *zram_control_class_attrs[] = {
|
|
|
|
&class_attr_hot_add.attr,
|
|
|
|
&class_attr_hot_remove.attr,
|
|
|
|
NULL,
|
2015-06-25 15:00:24 -07:00
|
|
|
};
|
2017-06-08 10:12:39 +02:00
|
|
|
ATTRIBUTE_GROUPS(zram_control_class);
|
2015-06-25 15:00:24 -07:00
|
|
|
|
|
|
|
static struct class zram_control_class = {
|
|
|
|
.name = "zram-control",
|
2017-06-08 10:12:39 +02:00
|
|
|
.class_groups = zram_control_class_groups,
|
2015-06-25 15:00:24 -07:00
|
|
|
};
|
|
|
|
|
2015-06-25 15:00:06 -07:00
|
|
|
static int zram_remove_cb(int id, void *ptr, void *data)
|
|
|
|
{
|
2021-10-25 10:54:24 +08:00
|
|
|
WARN_ON_ONCE(zram_remove(ptr));
|
2015-06-25 15:00:06 -07:00
|
|
|
return 0;
|
|
|
|
}
|
2015-02-12 15:00:39 -08:00
|
|
|
|
2015-06-25 15:00:06 -07:00
|
|
|
static void destroy_devices(void)
|
|
|
|
{
|
2015-06-25 15:00:24 -07:00
|
|
|
class_unregister(&zram_control_class);
|
2015-06-25 15:00:06 -07:00
|
|
|
idr_for_each(&zram_index_idr, &zram_remove_cb, NULL);
|
2018-06-07 17:05:49 -07:00
|
|
|
zram_debugfs_destroy();
|
2015-06-25 15:00:06 -07:00
|
|
|
idr_destroy(&zram_index_idr);
|
2015-02-12 15:00:39 -08:00
|
|
|
unregister_blkdev(zram_major, "zram");
|
2016-11-27 00:13:46 +01:00
|
|
|
cpuhp_remove_multi_state(CPUHP_ZCOMP_PREPARE);
|
2009-09-22 10:26:53 +05:30
|
|
|
}
|
|
|
|
|
2010-06-01 13:31:25 +05:30
|
|
|
static int __init zram_init(void)
|
2009-09-22 10:26:53 +05:30
|
|
|
{
|
2024-09-06 16:14:45 +02:00
|
|
|
struct zram_table_entry zram_te;
|
2015-06-25 15:00:19 -07:00
|
|
|
int ret;
|
2009-09-22 10:26:53 +05:30
|
|
|
|
2024-09-06 16:14:45 +02:00
|
|
|
BUILD_BUG_ON(__NR_ZRAM_PAGEFLAGS > sizeof(zram_te.flags) * 8);
|
2022-09-13 00:27:44 +09:00
|
|
|
|
2016-11-27 00:13:46 +01:00
|
|
|
ret = cpuhp_setup_state_multi(CPUHP_ZCOMP_PREPARE, "block/zram:prepare",
|
|
|
|
zcomp_cpu_up_prepare, zcomp_cpu_dead);
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
|
|
|
|
2015-06-25 15:00:24 -07:00
|
|
|
ret = class_register(&zram_control_class);
|
|
|
|
if (ret) {
|
2015-09-08 15:04:58 -07:00
|
|
|
pr_err("Unable to register zram-control class\n");
|
2016-11-27 00:13:46 +01:00
|
|
|
cpuhp_remove_multi_state(CPUHP_ZCOMP_PREPARE);
|
2015-06-25 15:00:24 -07:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2018-06-07 17:05:49 -07:00
|
|
|
zram_debugfs_create();
|
2010-06-01 13:31:25 +05:30
|
|
|
zram_major = register_blkdev(0, "zram");
|
|
|
|
if (zram_major <= 0) {
|
2015-09-08 15:04:58 -07:00
|
|
|
pr_err("Unable to get major number\n");
|
2015-06-25 15:00:24 -07:00
|
|
|
class_unregister(&zram_control_class);
|
2016-11-27 00:13:46 +01:00
|
|
|
cpuhp_remove_multi_state(CPUHP_ZCOMP_PREPARE);
|
2015-02-12 15:00:39 -08:00
|
|
|
return -EBUSY;
|
2009-09-22 10:26:53 +05:30
|
|
|
}
|
|
|
|
|
2015-06-25 15:00:19 -07:00
|
|
|
while (num_devices != 0) {
|
2015-06-25 15:00:24 -07:00
|
|
|
mutex_lock(&zram_index_mutex);
|
2015-06-25 15:00:19 -07:00
|
|
|
ret = zram_add();
|
2015-06-25 15:00:24 -07:00
|
|
|
mutex_unlock(&zram_index_mutex);
|
2015-06-25 15:00:19 -07:00
|
|
|
if (ret < 0)
|
2015-02-12 15:00:39 -08:00
|
|
|
goto out_error;
|
2015-06-25 15:00:19 -07:00
|
|
|
num_devices--;
|
2010-01-28 21:13:40 +05:30
|
|
|
}
|
|
|
|
|
2009-09-22 10:26:53 +05:30
|
|
|
return 0;
|
2010-01-28 21:13:40 +05:30
|
|
|
|
2015-02-12 15:00:39 -08:00
|
|
|
out_error:
|
2015-06-25 15:00:06 -07:00
|
|
|
destroy_devices();
|
2009-09-22 10:26:53 +05:30
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2010-06-01 13:31:25 +05:30
|
|
|
static void __exit zram_exit(void)
|
2009-09-22 10:26:53 +05:30
|
|
|
{
|
2015-06-25 15:00:06 -07:00
|
|
|
destroy_devices();
|
2009-09-22 10:26:53 +05:30
|
|
|
}
|
|
|
|
|
2010-06-01 13:31:25 +05:30
|
|
|
module_init(zram_init);
|
|
|
|
module_exit(zram_exit);
|
2009-09-22 10:26:53 +05:30
|
|
|
|
2013-06-22 03:21:18 +03:00
|
|
|
module_param(num_devices, uint, 0);
|
2015-06-25 15:00:11 -07:00
|
|
|
MODULE_PARM_DESC(num_devices, "Number of pre-created zram devices");
|
2013-06-22 03:21:18 +03:00
|
|
|
|
2009-09-22 10:26:53 +05:30
|
|
|
MODULE_LICENSE("Dual BSD/GPL");
|
|
|
|
MODULE_AUTHOR("Nitin Gupta <ngupta@vflare.org>");
|
2010-06-01 13:31:25 +05:30
|
|
|
MODULE_DESCRIPTION("Compressed RAM Block Device");
|