bcachefs updates for 6.8:

  - btree write buffer rewrite: instead of adding keys to the btree write
    buffer at transaction commit time, we now journal them with a
    different journal entry type and copy them from the journal to the
    write buffer just prior to journal write.
 
    This reduces the number of atomic operations on shared cachelines
    in the transaction commit path and is a significant performance
    improvement on some workloads: multithreaded 4k random writes went
    from ~650k iops to ~850k iops.
 
  - Bring back optimistic spinning for six locks: the new implementation
    doesn't use osq locks; instead we add to the lock waitlist as normal,
    and then spin on the lock_acquired bit in the waitlist entry, _not_
    the lock itself.
 
  - BCH_IOCTL_DEV_USAGE_V2, which allows for new data types
  - BCH_IOCTL_OFFLINE_FSCK, which runs the kernel implementation of fsck
    but without mounting: useful for transparently using the kernel
    version of fsck from 'bcachefs fsck' when the kernel version is a
    better match for the on disk filesystem.
 
  - BCH_IOCTL_ONLINE_FSCK: online fsck. Not all passes are supported yet,
    but the passes that are supported are fully featured - errors may be
    corrected as normal.
 
    The new ioctls use the new 'thread_with_file' abstraction for kicking
    off a kthread that's tied to a file descriptor returned to userspace
    via the ioctl.
 
  - btree_paths within a btree_trans are now dynamically growable,
    instead of being limited to 64. This is important for the
    check_directory_structure phase of fsck, and also fixes some issues
    we were having with btree path overflow in the reflink btree.
 
  - Trigger refactoring; prep work for the upcoming disk space accounting
    rewrite
 
  - Numerous bugfixes :)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmWe8PUACgkQE6szbY3K
 bnYw6g/9GAXfIGasTZZwK2XEr36RYtEFYMwd/m9V1ET0DH6d/MFH9G7tTYl52AQ4
 k9cDFb0d2qdtNk2Rlml1lHFrxMzkp2Q7j9S4YcETrE+/Dir8ODVcJXrGeNTCMGmz
 B+C12mTOpWrzGMrioRgFZjWAnacsY3RP8NFRTT9HIJHO9UCP+xN5y++sX10C5Gwv
 7UVWTaUwjkgdYWkR8RCKGXuG5cNNlRp4Y0eeK2XruG1iI9VAilir1glcD/YMOY8M
 vECQzmf2ZLGFS/tpnmqVhNbNwVWpTQMYassvKaisWNHLDUgskOoF8YfoYSH27t7F
 GBb1154O2ga6ea866677FDeNVlg386mGCTUy2xOhMpDL3zW+/Is+8MdfJI4MJP5R
 EwcjHnn2bk0C2kULbAohw0gnU42FulfvsLNnrfxCeygmZrDoOOCL1HpvnBG4vskc
 Fp6NK83l974QnyLdPsjr1yB2d2pgb+uMP1v76IukQi0IjNSAyvwSa5nloPTHRzpC
 j6e2cFpdtX+6vEu6KngXVKTblSEnwhVBTaTR37Lr8PX1sZqFS/+mjRDgg3HZa/GI
 u0fC0mQyVL9KjDs5LJGpTc/qs8J4mpoS5+dfzn38MI76dFxd5TYZKWVfILTrOtDF
 ugDnoLkMuYFdueKI2M3YzxXyaA7HBT+7McAdENuJJzJnEuSAZs0=
 =JvA2
 -----END PGP SIGNATURE-----

Merge tag 'bcachefs-2024-01-10' of https://evilpiepirate.org/git/bcachefs

Pull bcachefs updates from Kent Overstreet:

 - btree write buffer rewrite: instead of adding keys to the btree write
   buffer at transaction commit time, we now journal them with a
   different journal entry type and copy them from the journal to the
   write buffer just prior to journal write.

   This reduces the number of atomic operations on shared cachelines in
   the transaction commit path and is a significant performance
   improvement on some workloads: multithreaded 4k random writes went
   from ~650k iops to ~850k iops.
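
   A rough sketch of the idea (function, field and entry-type names below
   are illustrative, not the exact symbols in the tree): the commit path
   only writes into journal space the transaction already owns, and the
   copy into the shared write buffer happens once per journal write.

      /* Phase 1, transaction commit: append the key to our own journal
       * reservation under a dedicated entry type; no shared write buffer
       * cachelines are touched here (entry->u64s bookkeeping omitted). */
      static void commit_queue_wb_key(struct jset_entry *entry,
                                      enum btree_id btree, struct bkey_i *k)
      {
              entry->type     = JSET_ENTRY_write_buffer_keys; /* illustrative */
              entry->btree_id = btree;
              memcpy(entry->start, k, bkey_bytes(&k->k));
      }

      /* Phase 2, just before the journal buffer is written: one thread
       * walks the buffer and moves the queued keys into the write buffer,
       * so the shared-state updates are batched instead of per-commit. */
      static void journal_keys_to_write_buffer(struct btree_write_buffer *wb,
                                               struct jset *buf)
      {
              struct jset_entry *entry;

              for_each_journal_entry(buf, entry)        /* hypothetical iterator */
                      if (entry->type == JSET_ENTRY_write_buffer_keys)
                              wb_append_keys(wb, entry); /* hypothetical helper */
      }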

 - Bring back optimistic spinning for six locks: the new implementation
   doesn't use osq locks; instead we add to the lock waitlist as normal,
   and then spin on the lock_acquired bit in the waitlist entry, _not_
   the lock itself.
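
   Roughly the shape of that spin, as a simplified sketch (names and the
   spin bound are illustrative; the real code is in fs/bcachefs/six.c
   behind the new CONFIG_BCACHEFS_SIX_OPTIMISTIC_SPIN option). The flag
   lives in the waiter's own wait-list entry and is only written by the
   thread handing the lock over, so spinning waiters don't bounce the
   lock word's cacheline between CPUs:

      /* Called after the waiter has been added to the lock's wait list. */
      static bool six_optimistic_spin_sketch(struct six_lock_waiter *wait,
                                             struct task_struct *owner)
      {
              unsigned loops = 0;

              while (!READ_ONCE(wait->lock_acquired)) {
                      /* Only worth spinning while the current owner is
                       * actually running and we haven't spun too long. */
                      if (!owner || !task_is_running(owner) || ++loops > 1000)
                              return false;   /* give up, sleep as usual */
                      cpu_relax();
              }
              return true;    /* lock was handed to us while spinning */
      }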

 - New ioctls:

    - BCH_IOCTL_DEV_USAGE_V2, which allows for new data types

    - BCH_IOCTL_OFFLINE_FSCK, which runs the kernel implementation of
      fsck but without mounting: useful for transparently using the
      kernel version of fsck from 'bcachefs fsck' when the kernel
      version is a better match for the on disk filesystem.

    - BCH_IOCTL_ONLINE_FSCK: online fsck. Not all passes are supported
      yet, but the passes that are supported are fully featured - errors
      may be corrected as normal.

   The new ioctls use the new 'thread_with_file' abstraction for kicking
   off a kthread that's tied to a file descriptor returned to userspace
   via the ioctl.
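
   As a hedged sketch of that shape (the helper and context struct here
   are made up, not the actual thread_with_file API): the ioctl allocates
   an anonymous-inode file whose read/poll ops stream the thread's log
   output, starts a kthread, and returns the file descriptor; closing the
   fd is what stops the thread.

      static long run_thread_with_file_sketch(struct fsck_thread_ctx *ctx,
                                              const struct file_operations *fops,
                                              int (*fn)(void *))
      {
              /* fd userspace will read()/poll() to follow the kthread */
              int fd = anon_inode_getfd("[bcachefs-fsck]", fops, ctx,
                                        O_RDONLY|O_CLOEXEC);
              if (fd < 0)
                      return fd;

              ctx->task = kthread_run(fn, ctx, "bch-fsck");
              if (IS_ERR(ctx->task))
                      return PTR_ERR(ctx->task); /* real code must also drop the fd */

              return fd;      /* handed back to userspace as the ioctl result */
      }

   Userspace (e.g. the 'bcachefs fsck' tool) can then simply read() that
   fd to stream the fsck log and close() it when finished.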

 - btree_paths within a btree_trans are now dynamically growable,
   instead of being limited to 64. This is important for the
   check_directory_structure phase of fsck, and also fixes some issues
   we were having with btree path overflow in the reflink btree.
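
   Conceptually (an illustrative sketch only; the field names below are
   made up, and the real code also has to keep outstanding references to
   paths valid across the reallocation):

      /* Grow the transaction's paths array when it is full, instead of
       * failing once a fixed limit of 64 paths is hit. */
      static int btree_trans_grow_paths_sketch(struct btree_trans *trans)
      {
              if (trans->nr_paths_used < trans->nr_paths_allocated)
                      return 0;

              size_t new_nr = trans->nr_paths_allocated * 2;
              struct btree_path *p = krealloc(trans->paths,
                                              new_nr * sizeof(*p), GFP_KERNEL);
              if (!p)
                      return -ENOMEM; /* or restart the transaction */

              trans->paths = p;
              trans->nr_paths_allocated = new_nr;
              return 0;
      }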

 - Trigger refactoring; prep work for the upcoming disk space accounting
   rewrite

 - Numerous bugfixes :)

* tag 'bcachefs-2024-01-10' of https://evilpiepirate.org/git/bcachefs: (226 commits)
  bcachefs: eytzinger0_find() search should be const
  bcachefs: move "ptrs not changing" optimization to bch2_trigger_extent()
  bcachefs: fix simultaneously upgrading & downgrading
  bcachefs: Restart recovery passes more reliably
  bcachefs: bch2_dump_bset() doesn't choke on u64s == 0
  bcachefs: improve checksum error messages
  bcachefs: improve validate_bset_keys()
  bcachefs: print sb magic when relevant
  bcachefs: __bch2_sb_field_to_text()
  bcachefs: %pg is banished
  bcachefs: Improve would_deadlock trace event
  bcachefs: fsck_err()s don't need to manually check c->sb.version anymore
  bcachefs: Upgrades now specify errors to fix, like downgrades
  bcachefs: no thread_with_file in userspace
  bcachefs: Don't autofix errors we can't fix
  bcachefs: add missing bch2_latency_acct() call
  bcachefs: increase max_active on io_complete_wq
  bcachefs: add time_stats for btree_node_read_done()
  bcachefs: don't clear accessed bit in btree node fill
  bcachefs: Add an option to control btree node prefetching
  ...
Linus Torvalds 2024-01-10 16:34:17 -08:00
commit 999a36b52b
125 changed files with 7058 additions and 5918 deletions

@@ -3502,7 +3502,7 @@ F:	drivers/net/hamradio/baycom*
 
 BCACHE (BLOCK LAYER CACHE)
 M:	Coly Li <colyli@suse.de>
-M:	Kent Overstreet <kent.overstreet@gmail.com>
+M:	Kent Overstreet <kent.overstreet@linux.dev>
 L:	linux-bcache@vger.kernel.org
 S:	Maintained
 W:	http://bcache.evilpiepirate.org

@@ -23,6 +23,8 @@ EXPORT_SYMBOL_GPL(powerpc_firmware_features);
 
 #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_KVM_GUEST)
 DEFINE_STATIC_KEY_FALSE(kvm_guest);
+EXPORT_SYMBOL_GPL(kvm_guest);
+
 int __init check_kvm_guest(void)
 {
 	struct device_node *hyper_node;

@@ -50,14 +50,6 @@ config BCACHEFS_POSIX_ACL
 	depends on BCACHEFS_FS
 	select FS_POSIX_ACL
 
-config BCACHEFS_DEBUG_TRANSACTIONS
-	bool "bcachefs runtime info"
-	depends on BCACHEFS_FS
-	help
-	  This makes the list of running btree transactions available in debugfs.
-	  This is a highly useful debugging feature but does add a small amount of overhead.
-
 config BCACHEFS_DEBUG
 	bool "bcachefs debugging"
 	depends on BCACHEFS_FS
@@ -85,6 +77,16 @@ config BCACHEFS_NO_LATENCY_ACCT
 	help
 	  This disables device latency tracking and time stats, only for performance testing
 
+config BCACHEFS_SIX_OPTIMISTIC_SPIN
+	bool "Optimistic spinning for six locks"
+	depends on BCACHEFS_FS
+	depends on SMP
+	default y
+	help
+	  Instead of immediately sleeping when attempting to take a six lock that
+	  is held by another thread, spin for a short while, as long as the
+	  thread owning the lock is running.
+
 config MEAN_AND_VARIANCE_UNIT_TEST
 	tristate "mean_and_variance unit tests" if !KUNIT_ALL_TESTS
 	depends on KUNIT

@@ -82,6 +82,7 @@
 	super-io.o		\
 	sysfs.o			\
 	tests.o			\
+	thread_with_file.o	\
 	trace.o			\
 	two_state_shared_lock.o	\
 	util.o			\

@ -261,10 +261,8 @@ int bch2_alloc_v4_invalid(struct bch_fs *c, struct bkey_s_c k,
case BCH_DATA_free: case BCH_DATA_free:
case BCH_DATA_need_gc_gens: case BCH_DATA_need_gc_gens:
case BCH_DATA_need_discard: case BCH_DATA_need_discard:
bkey_fsck_err_on(a.v->dirty_sectors || bkey_fsck_err_on(bch2_bucket_sectors(*a.v) || a.v->stripe,
a.v->cached_sectors || c, err, alloc_key_empty_but_have_data,
a.v->stripe, c, err,
alloc_key_empty_but_have_data,
"empty data type free but have data"); "empty data type free but have data");
break; break;
case BCH_DATA_sb: case BCH_DATA_sb:
@ -272,22 +270,21 @@ int bch2_alloc_v4_invalid(struct bch_fs *c, struct bkey_s_c k,
case BCH_DATA_btree: case BCH_DATA_btree:
case BCH_DATA_user: case BCH_DATA_user:
case BCH_DATA_parity: case BCH_DATA_parity:
bkey_fsck_err_on(!a.v->dirty_sectors, c, err, bkey_fsck_err_on(!bch2_bucket_sectors_dirty(*a.v),
alloc_key_dirty_sectors_0, c, err, alloc_key_dirty_sectors_0,
"data_type %s but dirty_sectors==0", "data_type %s but dirty_sectors==0",
bch2_data_types[a.v->data_type]); bch2_data_types[a.v->data_type]);
break; break;
case BCH_DATA_cached: case BCH_DATA_cached:
bkey_fsck_err_on(!a.v->cached_sectors || bkey_fsck_err_on(!a.v->cached_sectors ||
a.v->dirty_sectors || bch2_bucket_sectors_dirty(*a.v) ||
a.v->stripe, c, err, a.v->stripe,
alloc_key_cached_inconsistency, c, err, alloc_key_cached_inconsistency,
"data type inconsistency"); "data type inconsistency");
bkey_fsck_err_on(!a.v->io_time[READ] && bkey_fsck_err_on(!a.v->io_time[READ] &&
c->curr_recovery_pass > BCH_RECOVERY_PASS_check_alloc_to_lru_refs, c->curr_recovery_pass > BCH_RECOVERY_PASS_check_alloc_to_lru_refs,
c, err, c, err, alloc_key_cached_but_read_time_zero,
alloc_key_cached_but_read_time_zero,
"cached bucket with read_time == 0"); "cached bucket with read_time == 0");
break; break;
case BCH_DATA_stripe: case BCH_DATA_stripe:
@ -537,18 +534,12 @@ void bch2_bucket_gens_to_text(struct printbuf *out, struct bch_fs *c, struct bke
int bch2_bucket_gens_init(struct bch_fs *c) int bch2_bucket_gens_init(struct bch_fs *c)
{ {
struct btree_trans *trans = bch2_trans_get(c); struct btree_trans *trans = bch2_trans_get(c);
struct btree_iter iter;
struct bkey_s_c k;
struct bch_alloc_v4 a;
struct bkey_i_bucket_gens g; struct bkey_i_bucket_gens g;
bool have_bucket_gens_key = false; bool have_bucket_gens_key = false;
unsigned offset;
struct bpos pos;
u8 gen;
int ret; int ret;
for_each_btree_key(trans, iter, BTREE_ID_alloc, POS_MIN, ret = for_each_btree_key(trans, iter, BTREE_ID_alloc, POS_MIN,
BTREE_ITER_PREFETCH, k, ret) { BTREE_ITER_PREFETCH, k, ({
/* /*
* Not a fsck error because this is checked/repaired by * Not a fsck error because this is checked/repaired by
* bch2_check_alloc_key() which runs later: * bch2_check_alloc_key() which runs later:
@ -556,13 +547,14 @@ int bch2_bucket_gens_init(struct bch_fs *c)
if (!bch2_dev_bucket_exists(c, k.k->p)) if (!bch2_dev_bucket_exists(c, k.k->p))
continue; continue;
gen = bch2_alloc_to_v4(k, &a)->gen; struct bch_alloc_v4 a;
pos = alloc_gens_pos(iter.pos, &offset); u8 gen = bch2_alloc_to_v4(k, &a)->gen;
unsigned offset;
struct bpos pos = alloc_gens_pos(iter.pos, &offset);
if (have_bucket_gens_key && bkey_cmp(iter.pos, pos)) { if (have_bucket_gens_key && bkey_cmp(iter.pos, pos)) {
ret = commit_do(trans, NULL, NULL, ret = commit_do(trans, NULL, NULL,
BTREE_INSERT_NOFAIL| BCH_TRANS_COMMIT_no_enospc,
BTREE_INSERT_LAZY_RW,
bch2_btree_insert_trans(trans, BTREE_ID_bucket_gens, &g.k_i, 0)); bch2_btree_insert_trans(trans, BTREE_ID_bucket_gens, &g.k_i, 0));
if (ret) if (ret)
break; break;
@ -576,45 +568,37 @@ int bch2_bucket_gens_init(struct bch_fs *c)
} }
g.v.gens[offset] = gen; g.v.gens[offset] = gen;
} 0;
bch2_trans_iter_exit(trans, &iter); }));
if (have_bucket_gens_key && !ret) if (have_bucket_gens_key && !ret)
ret = commit_do(trans, NULL, NULL, ret = commit_do(trans, NULL, NULL,
BTREE_INSERT_NOFAIL| BCH_TRANS_COMMIT_no_enospc,
BTREE_INSERT_LAZY_RW,
bch2_btree_insert_trans(trans, BTREE_ID_bucket_gens, &g.k_i, 0)); bch2_btree_insert_trans(trans, BTREE_ID_bucket_gens, &g.k_i, 0));
bch2_trans_put(trans); bch2_trans_put(trans);
if (ret) bch_err_fn(c, ret);
bch_err_fn(c, ret);
return ret; return ret;
} }
int bch2_alloc_read(struct bch_fs *c) int bch2_alloc_read(struct bch_fs *c)
{ {
struct btree_trans *trans = bch2_trans_get(c); struct btree_trans *trans = bch2_trans_get(c);
struct btree_iter iter;
struct bkey_s_c k;
struct bch_dev *ca;
int ret; int ret;
down_read(&c->gc_lock); down_read(&c->gc_lock);
if (c->sb.version_upgrade_complete >= bcachefs_metadata_version_bucket_gens) { if (c->sb.version_upgrade_complete >= bcachefs_metadata_version_bucket_gens) {
const struct bch_bucket_gens *g; ret = for_each_btree_key(trans, iter, BTREE_ID_bucket_gens, POS_MIN,
u64 b; BTREE_ITER_PREFETCH, k, ({
for_each_btree_key(trans, iter, BTREE_ID_bucket_gens, POS_MIN,
BTREE_ITER_PREFETCH, k, ret) {
u64 start = bucket_gens_pos_to_alloc(k.k->p, 0).offset; u64 start = bucket_gens_pos_to_alloc(k.k->p, 0).offset;
u64 end = bucket_gens_pos_to_alloc(bpos_nosnap_successor(k.k->p), 0).offset; u64 end = bucket_gens_pos_to_alloc(bpos_nosnap_successor(k.k->p), 0).offset;
if (k.k->type != KEY_TYPE_bucket_gens) if (k.k->type != KEY_TYPE_bucket_gens)
continue; continue;
g = bkey_s_c_to_bucket_gens(k).v; const struct bch_bucket_gens *g = bkey_s_c_to_bucket_gens(k).v;
/* /*
* Not a fsck error because this is checked/repaired by * Not a fsck error because this is checked/repaired by
@ -623,19 +607,17 @@ int bch2_alloc_read(struct bch_fs *c)
if (!bch2_dev_exists2(c, k.k->p.inode)) if (!bch2_dev_exists2(c, k.k->p.inode))
continue; continue;
ca = bch_dev_bkey_exists(c, k.k->p.inode); struct bch_dev *ca = bch_dev_bkey_exists(c, k.k->p.inode);
for (b = max_t(u64, ca->mi.first_bucket, start); for (u64 b = max_t(u64, ca->mi.first_bucket, start);
b < min_t(u64, ca->mi.nbuckets, end); b < min_t(u64, ca->mi.nbuckets, end);
b++) b++)
*bucket_gen(ca, b) = g->gens[b & KEY_TYPE_BUCKET_GENS_MASK]; *bucket_gen(ca, b) = g->gens[b & KEY_TYPE_BUCKET_GENS_MASK];
} 0;
bch2_trans_iter_exit(trans, &iter); }));
} else { } else {
struct bch_alloc_v4 a; ret = for_each_btree_key(trans, iter, BTREE_ID_alloc, POS_MIN,
BTREE_ITER_PREFETCH, k, ({
for_each_btree_key(trans, iter, BTREE_ID_alloc, POS_MIN,
BTREE_ITER_PREFETCH, k, ret) {
/* /*
* Not a fsck error because this is checked/repaired by * Not a fsck error because this is checked/repaired by
* bch2_check_alloc_key() which runs later: * bch2_check_alloc_key() which runs later:
@ -643,19 +625,18 @@ int bch2_alloc_read(struct bch_fs *c)
if (!bch2_dev_bucket_exists(c, k.k->p)) if (!bch2_dev_bucket_exists(c, k.k->p))
continue; continue;
ca = bch_dev_bkey_exists(c, k.k->p.inode); struct bch_dev *ca = bch_dev_bkey_exists(c, k.k->p.inode);
struct bch_alloc_v4 a;
*bucket_gen(ca, k.k->p.offset) = bch2_alloc_to_v4(k, &a)->gen; *bucket_gen(ca, k.k->p.offset) = bch2_alloc_to_v4(k, &a)->gen;
} 0;
bch2_trans_iter_exit(trans, &iter); }));
} }
bch2_trans_put(trans); bch2_trans_put(trans);
up_read(&c->gc_lock); up_read(&c->gc_lock);
if (ret) bch_err_fn(c, ret);
bch_err_fn(c, ret);
return ret; return ret;
} }
@ -768,83 +749,177 @@ static noinline int bch2_bucket_gen_update(struct btree_trans *trans,
return ret; return ret;
} }
int bch2_trans_mark_alloc(struct btree_trans *trans, int bch2_trigger_alloc(struct btree_trans *trans,
enum btree_id btree_id, unsigned level, enum btree_id btree, unsigned level,
struct bkey_s_c old, struct bkey_i *new, struct bkey_s_c old, struct bkey_s new,
unsigned flags) unsigned flags)
{ {
struct bch_fs *c = trans->c; struct bch_fs *c = trans->c;
struct bch_alloc_v4 old_a_convert, *new_a;
const struct bch_alloc_v4 *old_a;
u64 old_lru, new_lru;
int ret = 0; int ret = 0;
/* if (bch2_trans_inconsistent_on(!bch2_dev_bucket_exists(c, new.k->p), trans,
* Deletion only happens in the device removal path, with "alloc key for invalid device or bucket"))
* BTREE_TRIGGER_NORUN: return -EIO;
*/
BUG_ON(new->k.type != KEY_TYPE_alloc_v4);
old_a = bch2_alloc_to_v4(old, &old_a_convert); struct bch_dev *ca = bch_dev_bkey_exists(c, new.k->p.inode);
new_a = &bkey_i_to_alloc_v4(new)->v;
new_a->data_type = alloc_data_type(*new_a, new_a->data_type); struct bch_alloc_v4 old_a_convert;
const struct bch_alloc_v4 *old_a = bch2_alloc_to_v4(old, &old_a_convert);
if (new_a->dirty_sectors > old_a->dirty_sectors || if (flags & BTREE_TRIGGER_TRANSACTIONAL) {
new_a->cached_sectors > old_a->cached_sectors) { struct bch_alloc_v4 *new_a = bkey_s_to_alloc_v4(new).v;
new_a->io_time[READ] = max_t(u64, 1, atomic64_read(&c->io_clock[READ].now));
new_a->io_time[WRITE]= max_t(u64, 1, atomic64_read(&c->io_clock[WRITE].now)); new_a->data_type = alloc_data_type(*new_a, new_a->data_type);
SET_BCH_ALLOC_V4_NEED_INC_GEN(new_a, true);
SET_BCH_ALLOC_V4_NEED_DISCARD(new_a, true); if (bch2_bucket_sectors(*new_a) > bch2_bucket_sectors(*old_a)) {
new_a->io_time[READ] = max_t(u64, 1, atomic64_read(&c->io_clock[READ].now));
new_a->io_time[WRITE]= max_t(u64, 1, atomic64_read(&c->io_clock[WRITE].now));
SET_BCH_ALLOC_V4_NEED_INC_GEN(new_a, true);
SET_BCH_ALLOC_V4_NEED_DISCARD(new_a, true);
}
if (data_type_is_empty(new_a->data_type) &&
BCH_ALLOC_V4_NEED_INC_GEN(new_a) &&
!bch2_bucket_is_open_safe(c, new.k->p.inode, new.k->p.offset)) {
new_a->gen++;
SET_BCH_ALLOC_V4_NEED_INC_GEN(new_a, false);
}
if (old_a->data_type != new_a->data_type ||
(new_a->data_type == BCH_DATA_free &&
alloc_freespace_genbits(*old_a) != alloc_freespace_genbits(*new_a))) {
ret = bch2_bucket_do_index(trans, old, old_a, false) ?:
bch2_bucket_do_index(trans, new.s_c, new_a, true);
if (ret)
return ret;
}
if (new_a->data_type == BCH_DATA_cached &&
!new_a->io_time[READ])
new_a->io_time[READ] = max_t(u64, 1, atomic64_read(&c->io_clock[READ].now));
u64 old_lru = alloc_lru_idx_read(*old_a);
u64 new_lru = alloc_lru_idx_read(*new_a);
if (old_lru != new_lru) {
ret = bch2_lru_change(trans, new.k->p.inode,
bucket_to_u64(new.k->p),
old_lru, new_lru);
if (ret)
return ret;
}
new_a->fragmentation_lru = alloc_lru_idx_fragmentation(*new_a,
bch_dev_bkey_exists(c, new.k->p.inode));
if (old_a->fragmentation_lru != new_a->fragmentation_lru) {
ret = bch2_lru_change(trans,
BCH_LRU_FRAGMENTATION_START,
bucket_to_u64(new.k->p),
old_a->fragmentation_lru, new_a->fragmentation_lru);
if (ret)
return ret;
}
if (old_a->gen != new_a->gen) {
ret = bch2_bucket_gen_update(trans, new.k->p, new_a->gen);
if (ret)
return ret;
}
/*
* need to know if we're getting called from the invalidate path or
* not:
*/
if ((flags & BTREE_TRIGGER_BUCKET_INVALIDATE) &&
old_a->cached_sectors) {
ret = bch2_update_cached_sectors_list(trans, new.k->p.inode,
-((s64) old_a->cached_sectors));
if (ret)
return ret;
}
} }
if (data_type_is_empty(new_a->data_type) && if (!(flags & BTREE_TRIGGER_TRANSACTIONAL) && (flags & BTREE_TRIGGER_INSERT)) {
BCH_ALLOC_V4_NEED_INC_GEN(new_a) && struct bch_alloc_v4 *new_a = bkey_s_to_alloc_v4(new).v;
!bch2_bucket_is_open_safe(c, new->k.p.inode, new->k.p.offset)) { u64 journal_seq = trans->journal_res.seq;
new_a->gen++; u64 bucket_journal_seq = new_a->journal_seq;
SET_BCH_ALLOC_V4_NEED_INC_GEN(new_a, false);
if ((flags & BTREE_TRIGGER_INSERT) &&
data_type_is_empty(old_a->data_type) !=
data_type_is_empty(new_a->data_type) &&
new.k->type == KEY_TYPE_alloc_v4) {
struct bch_alloc_v4 *v = bkey_s_to_alloc_v4(new).v;
/*
* If the btree updates referring to a bucket weren't flushed
* before the bucket became empty again, then the we don't have
* to wait on a journal flush before we can reuse the bucket:
*/
v->journal_seq = bucket_journal_seq =
data_type_is_empty(new_a->data_type) &&
(journal_seq == v->journal_seq ||
bch2_journal_noflush_seq(&c->journal, v->journal_seq))
? 0 : journal_seq;
}
if (!data_type_is_empty(old_a->data_type) &&
data_type_is_empty(new_a->data_type) &&
bucket_journal_seq) {
ret = bch2_set_bucket_needs_journal_commit(&c->buckets_waiting_for_journal,
c->journal.flushed_seq_ondisk,
new.k->p.inode, new.k->p.offset,
bucket_journal_seq);
if (ret) {
bch2_fs_fatal_error(c,
"error setting bucket_needs_journal_commit: %i", ret);
return ret;
}
}
percpu_down_read(&c->mark_lock);
if (new_a->gen != old_a->gen)
*bucket_gen(ca, new.k->p.offset) = new_a->gen;
bch2_dev_usage_update(c, ca, old_a, new_a, journal_seq, false);
if (new_a->data_type == BCH_DATA_free &&
(!new_a->journal_seq || new_a->journal_seq < c->journal.flushed_seq_ondisk))
closure_wake_up(&c->freelist_wait);
if (new_a->data_type == BCH_DATA_need_discard &&
(!bucket_journal_seq || bucket_journal_seq < c->journal.flushed_seq_ondisk))
bch2_do_discards(c);
if (old_a->data_type != BCH_DATA_cached &&
new_a->data_type == BCH_DATA_cached &&
should_invalidate_buckets(ca, bch2_dev_usage_read(ca)))
bch2_do_invalidates(c);
if (new_a->data_type == BCH_DATA_need_gc_gens)
bch2_do_gc_gens(c);
percpu_up_read(&c->mark_lock);
} }
if (old_a->data_type != new_a->data_type || if ((flags & BTREE_TRIGGER_GC) &&
(new_a->data_type == BCH_DATA_free && (flags & BTREE_TRIGGER_BUCKET_INVALIDATE)) {
alloc_freespace_genbits(*old_a) != alloc_freespace_genbits(*new_a))) { struct bch_alloc_v4 new_a_convert;
ret = bch2_bucket_do_index(trans, old, old_a, false) ?: const struct bch_alloc_v4 *new_a = bch2_alloc_to_v4(new.s_c, &new_a_convert);
bch2_bucket_do_index(trans, bkey_i_to_s_c(new), new_a, true);
if (ret)
return ret;
}
if (new_a->data_type == BCH_DATA_cached && percpu_down_read(&c->mark_lock);
!new_a->io_time[READ]) struct bucket *g = gc_bucket(ca, new.k->p.offset);
new_a->io_time[READ] = max_t(u64, 1, atomic64_read(&c->io_clock[READ].now));
old_lru = alloc_lru_idx_read(*old_a); bucket_lock(g);
new_lru = alloc_lru_idx_read(*new_a);
if (old_lru != new_lru) { g->gen_valid = 1;
ret = bch2_lru_change(trans, new->k.p.inode, g->gen = new_a->gen;
bucket_to_u64(new->k.p), g->data_type = new_a->data_type;
old_lru, new_lru); g->stripe = new_a->stripe;
if (ret) g->stripe_redundancy = new_a->stripe_redundancy;
return ret; g->dirty_sectors = new_a->dirty_sectors;
} g->cached_sectors = new_a->cached_sectors;
new_a->fragmentation_lru = alloc_lru_idx_fragmentation(*new_a, bucket_unlock(g);
bch_dev_bkey_exists(c, new->k.p.inode)); percpu_up_read(&c->mark_lock);
if (old_a->fragmentation_lru != new_a->fragmentation_lru) {
ret = bch2_lru_change(trans,
BCH_LRU_FRAGMENTATION_START,
bucket_to_u64(new->k.p),
old_a->fragmentation_lru, new_a->fragmentation_lru);
if (ret)
return ret;
}
if (old_a->gen != new_a->gen) {
ret = bch2_bucket_gen_update(trans, new->k.p, new_a->gen);
if (ret)
return ret;
} }
return 0; return 0;
@ -869,8 +944,9 @@ static struct bkey_s_c bch2_get_key_or_hole(struct btree_iter *iter, struct bpos
bch2_trans_copy_iter(&iter2, iter); bch2_trans_copy_iter(&iter2, iter);
if (!bpos_eq(iter->path->l[0].b->key.k.p, SPOS_MAX)) struct btree_path *path = btree_iter_path(iter->trans, iter);
end = bkey_min(end, bpos_nosnap_successor(iter->path->l[0].b->key.k.p)); if (!bpos_eq(path->l[0].b->key.k.p, SPOS_MAX))
end = bkey_min(end, bpos_nosnap_successor(path->l[0].b->key.k.p));
end = bkey_min(end, POS(iter->pos.inode, iter->pos.offset + U32_MAX - 1)); end = bkey_min(end, POS(iter->pos.inode, iter->pos.offset + U32_MAX - 1));
@ -898,7 +974,6 @@ static struct bkey_s_c bch2_get_key_or_hole(struct btree_iter *iter, struct bpos
static bool next_bucket(struct bch_fs *c, struct bpos *bucket) static bool next_bucket(struct bch_fs *c, struct bpos *bucket)
{ {
struct bch_dev *ca; struct bch_dev *ca;
unsigned iter;
if (bch2_dev_bucket_exists(c, *bucket)) if (bch2_dev_bucket_exists(c, *bucket))
return true; return true;
@ -916,8 +991,7 @@ static bool next_bucket(struct bch_fs *c, struct bpos *bucket)
} }
rcu_read_lock(); rcu_read_lock();
iter = bucket->inode; ca = __bch2_next_dev_idx(c, bucket->inode, NULL);
ca = __bch2_next_dev(c, &iter, NULL);
if (ca) if (ca)
*bucket = POS(ca->dev_idx, ca->mi.first_bucket); *bucket = POS(ca->dev_idx, ca->mi.first_bucket);
rcu_read_unlock(); rcu_read_unlock();
@ -1158,9 +1232,6 @@ int bch2_check_alloc_hole_bucket_gens(struct btree_trans *trans,
unsigned i, gens_offset, gens_end_offset; unsigned i, gens_offset, gens_end_offset;
int ret; int ret;
if (c->sb.version < bcachefs_metadata_version_bucket_gens)
return 0;
bch2_btree_iter_set_pos(bucket_gens_iter, alloc_gens_pos(start, &gens_offset)); bch2_btree_iter_set_pos(bucket_gens_iter, alloc_gens_pos(start, &gens_offset));
k = bch2_btree_iter_peek_slot(bucket_gens_iter); k = bch2_btree_iter_peek_slot(bucket_gens_iter);
@ -1212,7 +1283,7 @@ int bch2_check_alloc_hole_bucket_gens(struct btree_trans *trans,
return ret; return ret;
} }
static noinline_for_stack int __bch2_check_discard_freespace_key(struct btree_trans *trans, static noinline_for_stack int bch2_check_discard_freespace_key(struct btree_trans *trans,
struct btree_iter *iter) struct btree_iter *iter)
{ {
struct bch_fs *c = trans->c; struct bch_fs *c = trans->c;
@ -1267,28 +1338,10 @@ static noinline_for_stack int __bch2_check_discard_freespace_key(struct btree_tr
ret = bch2_btree_delete_extent_at(trans, iter, ret = bch2_btree_delete_extent_at(trans, iter,
iter->btree_id == BTREE_ID_freespace ? 1 : 0, 0) ?: iter->btree_id == BTREE_ID_freespace ? 1 : 0, 0) ?:
bch2_trans_commit(trans, NULL, NULL, bch2_trans_commit(trans, NULL, NULL,
BTREE_INSERT_NOFAIL|BTREE_INSERT_LAZY_RW); BCH_TRANS_COMMIT_no_enospc);
goto out; goto out;
} }
static int bch2_check_discard_freespace_key(struct btree_trans *trans,
struct btree_iter *iter,
struct bpos end)
{
if (!btree_id_is_extents(iter->btree_id)) {
return __bch2_check_discard_freespace_key(trans, iter);
} else {
int ret = 0;
while (!bkey_eq(iter->pos, end) &&
!(ret = btree_trans_too_many_iters(trans) ?:
__bch2_check_discard_freespace_key(trans, iter)))
bch2_btree_iter_set_pos(iter, bpos_nosnap_successor(iter->pos));
return ret;
}
}
/* /*
* We've already checked that generation numbers in the bucket_gens btree are * We've already checked that generation numbers in the bucket_gens btree are
* valid for buckets that exist; this just checks for keys for nonexistent * valid for buckets that exist; this just checks for keys for nonexistent
@ -1422,8 +1475,7 @@ int bch2_check_alloc_info(struct bch_fs *c)
} }
ret = bch2_trans_commit(trans, NULL, NULL, ret = bch2_trans_commit(trans, NULL, NULL,
BTREE_INSERT_NOFAIL| BCH_TRANS_COMMIT_no_enospc);
BTREE_INSERT_LAZY_RW);
if (ret) if (ret)
goto bkey_err; goto bkey_err;
@ -1442,23 +1494,50 @@ int bch2_check_alloc_info(struct bch_fs *c)
if (ret < 0) if (ret < 0)
goto err; goto err;
ret = for_each_btree_key2(trans, iter, ret = for_each_btree_key(trans, iter,
BTREE_ID_need_discard, POS_MIN, BTREE_ID_need_discard, POS_MIN,
BTREE_ITER_PREFETCH, k, BTREE_ITER_PREFETCH, k,
bch2_check_discard_freespace_key(trans, &iter, k.k->p)) ?: bch2_check_discard_freespace_key(trans, &iter));
for_each_btree_key2(trans, iter, if (ret)
BTREE_ID_freespace, POS_MIN, goto err;
BTREE_ITER_PREFETCH, k,
bch2_check_discard_freespace_key(trans, &iter, k.k->p)) ?: bch2_trans_iter_init(trans, &iter, BTREE_ID_freespace, POS_MIN,
for_each_btree_key_commit(trans, iter, BTREE_ITER_PREFETCH);
while (1) {
bch2_trans_begin(trans);
k = bch2_btree_iter_peek(&iter);
if (!k.k)
break;
ret = bkey_err(k) ?:
bch2_check_discard_freespace_key(trans, &iter);
if (bch2_err_matches(ret, BCH_ERR_transaction_restart)) {
ret = 0;
continue;
}
if (ret) {
struct printbuf buf = PRINTBUF;
bch2_bkey_val_to_text(&buf, c, k);
bch_err(c, "while checking %s", buf.buf);
printbuf_exit(&buf);
break;
}
bch2_btree_iter_set_pos(&iter, bpos_nosnap_successor(iter.pos));
}
bch2_trans_iter_exit(trans, &iter);
if (ret)
goto err;
ret = for_each_btree_key_commit(trans, iter,
BTREE_ID_bucket_gens, POS_MIN, BTREE_ID_bucket_gens, POS_MIN,
BTREE_ITER_PREFETCH, k, BTREE_ITER_PREFETCH, k,
NULL, NULL, BTREE_INSERT_NOFAIL|BTREE_INSERT_LAZY_RW, NULL, NULL, BCH_TRANS_COMMIT_no_enospc,
bch2_check_bucket_gens_key(trans, &iter, k)); bch2_check_bucket_gens_key(trans, &iter, k));
err: err:
bch2_trans_put(trans); bch2_trans_put(trans);
if (ret) bch_err_fn(c, ret);
bch_err_fn(c, ret);
return ret; return ret;
} }
@ -1486,6 +1565,27 @@ static int bch2_check_alloc_to_lru_ref(struct btree_trans *trans,
if (a->data_type != BCH_DATA_cached) if (a->data_type != BCH_DATA_cached)
return 0; return 0;
if (fsck_err_on(!a->io_time[READ], c,
alloc_key_cached_but_read_time_zero,
"cached bucket with read_time 0\n"
" %s",
(printbuf_reset(&buf),
bch2_bkey_val_to_text(&buf, c, alloc_k), buf.buf))) {
struct bkey_i_alloc_v4 *a_mut =
bch2_alloc_to_v4_mut(trans, alloc_k);
ret = PTR_ERR_OR_ZERO(a_mut);
if (ret)
goto err;
a_mut->v.io_time[READ] = atomic64_read(&c->io_clock[READ].now);
ret = bch2_trans_update(trans, alloc_iter,
&a_mut->k_i, BTREE_TRIGGER_NORUN);
if (ret)
goto err;
a = &a_mut->v;
}
lru_k = bch2_bkey_get_iter(trans, &lru_iter, BTREE_ID_lru, lru_k = bch2_bkey_get_iter(trans, &lru_iter, BTREE_ID_lru,
lru_pos(alloc_k.k->p.inode, lru_pos(alloc_k.k->p.inode,
bucket_to_u64(alloc_k.k->p), bucket_to_u64(alloc_k.k->p),
@ -1494,41 +1594,18 @@ static int bch2_check_alloc_to_lru_ref(struct btree_trans *trans,
if (ret) if (ret)
return ret; return ret;
if (fsck_err_on(!a->io_time[READ], c, if (fsck_err_on(lru_k.k->type != KEY_TYPE_set, c,
alloc_key_cached_but_read_time_zero,
"cached bucket with read_time 0\n"
" %s",
(printbuf_reset(&buf),
bch2_bkey_val_to_text(&buf, c, alloc_k), buf.buf)) ||
fsck_err_on(lru_k.k->type != KEY_TYPE_set, c,
alloc_key_to_missing_lru_entry, alloc_key_to_missing_lru_entry,
"missing lru entry\n" "missing lru entry\n"
" %s", " %s",
(printbuf_reset(&buf), (printbuf_reset(&buf),
bch2_bkey_val_to_text(&buf, c, alloc_k), buf.buf))) { bch2_bkey_val_to_text(&buf, c, alloc_k), buf.buf))) {
u64 read_time = a->io_time[READ] ?:
atomic64_read(&c->io_clock[READ].now);
ret = bch2_lru_set(trans, ret = bch2_lru_set(trans,
alloc_k.k->p.inode, alloc_k.k->p.inode,
bucket_to_u64(alloc_k.k->p), bucket_to_u64(alloc_k.k->p),
read_time); a->io_time[READ]);
if (ret) if (ret)
goto err; goto err;
if (a->io_time[READ] != read_time) {
struct bkey_i_alloc_v4 *a_mut =
bch2_alloc_to_v4_mut(trans, alloc_k);
ret = PTR_ERR_OR_ZERO(a_mut);
if (ret)
goto err;
a_mut->v.io_time[READ] = read_time;
ret = bch2_trans_update(trans, alloc_iter,
&a_mut->k_i, BTREE_TRIGGER_NORUN);
if (ret)
goto err;
}
} }
err: err:
fsck_err: fsck_err:
@ -1539,17 +1616,12 @@ static int bch2_check_alloc_to_lru_ref(struct btree_trans *trans,
int bch2_check_alloc_to_lru_refs(struct bch_fs *c) int bch2_check_alloc_to_lru_refs(struct bch_fs *c)
{ {
struct btree_iter iter; int ret = bch2_trans_run(c,
struct bkey_s_c k;
int ret = 0;
ret = bch2_trans_run(c,
for_each_btree_key_commit(trans, iter, BTREE_ID_alloc, for_each_btree_key_commit(trans, iter, BTREE_ID_alloc,
POS_MIN, BTREE_ITER_PREFETCH, k, POS_MIN, BTREE_ITER_PREFETCH, k,
NULL, NULL, BTREE_INSERT_NOFAIL|BTREE_INSERT_LAZY_RW, NULL, NULL, BCH_TRANS_COMMIT_no_enospc,
bch2_check_alloc_to_lru_ref(trans, &iter))); bch2_check_alloc_to_lru_ref(trans, &iter)));
if (ret) bch_err_fn(c, ret);
bch_err_fn(c, ret);
return ret; return ret;
} }
@ -1655,11 +1727,11 @@ static int bch2_discard_one_bucket(struct btree_trans *trans,
ret = bch2_trans_update(trans, &iter, &a->k_i, 0) ?: ret = bch2_trans_update(trans, &iter, &a->k_i, 0) ?:
bch2_trans_commit(trans, NULL, NULL, bch2_trans_commit(trans, NULL, NULL,
BCH_WATERMARK_btree| BCH_WATERMARK_btree|
BTREE_INSERT_NOFAIL); BCH_TRANS_COMMIT_no_enospc);
if (ret) if (ret)
goto out; goto out;
this_cpu_inc(c->counters[BCH_COUNTER_bucket_discard]); count_event(c, bucket_discard);
(*discarded)++; (*discarded)++;
out: out:
(*seen)++; (*seen)++;
@ -1672,8 +1744,6 @@ static int bch2_discard_one_bucket(struct btree_trans *trans,
static void bch2_do_discards_work(struct work_struct *work) static void bch2_do_discards_work(struct work_struct *work)
{ {
struct bch_fs *c = container_of(work, struct bch_fs, discard_work); struct bch_fs *c = container_of(work, struct bch_fs, discard_work);
struct btree_iter iter;
struct bkey_s_c k;
u64 seen = 0, open = 0, need_journal_commit = 0, discarded = 0; u64 seen = 0, open = 0, need_journal_commit = 0, discarded = 0;
struct bpos discard_pos_done = POS_MAX; struct bpos discard_pos_done = POS_MAX;
int ret; int ret;
@ -1684,8 +1754,8 @@ static void bch2_do_discards_work(struct work_struct *work)
* successful commit: * successful commit:
*/ */
ret = bch2_trans_run(c, ret = bch2_trans_run(c,
for_each_btree_key2(trans, iter, for_each_btree_key(trans, iter,
BTREE_ID_need_discard, POS_MIN, 0, k, BTREE_ID_need_discard, POS_MIN, 0, k,
bch2_discard_one_bucket(trans, &iter, &discard_pos_done, bch2_discard_one_bucket(trans, &iter, &discard_pos_done,
&seen, &seen,
&open, &open,
@ -1760,7 +1830,7 @@ static int invalidate_one_bucket(struct btree_trans *trans,
BTREE_TRIGGER_BUCKET_INVALIDATE) ?: BTREE_TRIGGER_BUCKET_INVALIDATE) ?:
bch2_trans_commit(trans, NULL, NULL, bch2_trans_commit(trans, NULL, NULL,
BCH_WATERMARK_btree| BCH_WATERMARK_btree|
BTREE_INSERT_NOFAIL); BCH_TRANS_COMMIT_no_enospc);
if (ret) if (ret)
goto out; goto out;
@ -1795,22 +1865,18 @@ static int invalidate_one_bucket(struct btree_trans *trans,
static void bch2_do_invalidates_work(struct work_struct *work) static void bch2_do_invalidates_work(struct work_struct *work)
{ {
struct bch_fs *c = container_of(work, struct bch_fs, invalidate_work); struct bch_fs *c = container_of(work, struct bch_fs, invalidate_work);
struct bch_dev *ca;
struct btree_trans *trans = bch2_trans_get(c); struct btree_trans *trans = bch2_trans_get(c);
struct btree_iter iter;
struct bkey_s_c k;
unsigned i;
int ret = 0; int ret = 0;
ret = bch2_btree_write_buffer_flush(trans); ret = bch2_btree_write_buffer_tryflush(trans);
if (ret) if (ret)
goto err; goto err;
for_each_member_device(ca, c, i) { for_each_member_device(c, ca) {
s64 nr_to_invalidate = s64 nr_to_invalidate =
should_invalidate_buckets(ca, bch2_dev_usage_read(ca)); should_invalidate_buckets(ca, bch2_dev_usage_read(ca));
ret = for_each_btree_key2_upto(trans, iter, BTREE_ID_lru, ret = for_each_btree_key_upto(trans, iter, BTREE_ID_lru,
lru_pos(ca->dev_idx, 0, 0), lru_pos(ca->dev_idx, 0, 0),
lru_pos(ca->dev_idx, U64_MAX, LRU_TIME_MAX), lru_pos(ca->dev_idx, U64_MAX, LRU_TIME_MAX),
BTREE_ITER_INTENT, k, BTREE_ITER_INTENT, k,
@ -1884,8 +1950,7 @@ int bch2_dev_freespace_init(struct bch_fs *c, struct bch_dev *ca,
ret = bch2_bucket_do_index(trans, k, a, true) ?: ret = bch2_bucket_do_index(trans, k, a, true) ?:
bch2_trans_commit(trans, NULL, NULL, bch2_trans_commit(trans, NULL, NULL,
BTREE_INSERT_LAZY_RW| BCH_TRANS_COMMIT_no_enospc);
BTREE_INSERT_NOFAIL);
if (ret) if (ret)
goto bkey_err; goto bkey_err;
@ -1905,8 +1970,7 @@ int bch2_dev_freespace_init(struct bch_fs *c, struct bch_dev *ca,
ret = bch2_btree_insert_trans(trans, BTREE_ID_freespace, freespace, 0) ?: ret = bch2_btree_insert_trans(trans, BTREE_ID_freespace, freespace, 0) ?:
bch2_trans_commit(trans, NULL, NULL, bch2_trans_commit(trans, NULL, NULL,
BTREE_INSERT_LAZY_RW| BCH_TRANS_COMMIT_no_enospc);
BTREE_INSERT_NOFAIL);
if (ret) if (ret)
goto bkey_err; goto bkey_err;
@ -1937,8 +2001,6 @@ int bch2_dev_freespace_init(struct bch_fs *c, struct bch_dev *ca,
int bch2_fs_freespace_init(struct bch_fs *c) int bch2_fs_freespace_init(struct bch_fs *c)
{ {
struct bch_dev *ca;
unsigned i;
int ret = 0; int ret = 0;
bool doing_init = false; bool doing_init = false;
@ -1947,7 +2009,7 @@ int bch2_fs_freespace_init(struct bch_fs *c)
* every mount: * every mount:
*/ */
for_each_member_device(ca, c, i) { for_each_member_device(c, ca) {
if (ca->mi.freespace_initialized) if (ca->mi.freespace_initialized)
continue; continue;
@ -2007,15 +2069,13 @@ int bch2_bucket_io_time_reset(struct btree_trans *trans, unsigned dev,
void bch2_recalc_capacity(struct bch_fs *c) void bch2_recalc_capacity(struct bch_fs *c)
{ {
struct bch_dev *ca;
u64 capacity = 0, reserved_sectors = 0, gc_reserve; u64 capacity = 0, reserved_sectors = 0, gc_reserve;
unsigned bucket_size_max = 0; unsigned bucket_size_max = 0;
unsigned long ra_pages = 0; unsigned long ra_pages = 0;
unsigned i;
lockdep_assert_held(&c->state_lock); lockdep_assert_held(&c->state_lock);
for_each_online_member(ca, c, i) { for_each_online_member(c, ca) {
struct backing_dev_info *bdi = ca->disk_sb.bdev->bd_disk->bdi; struct backing_dev_info *bdi = ca->disk_sb.bdev->bd_disk->bdi;
ra_pages += bdi->ra_pages; ra_pages += bdi->ra_pages;
@ -2023,7 +2083,7 @@ void bch2_recalc_capacity(struct bch_fs *c)
bch2_set_ra_pages(c, ra_pages); bch2_set_ra_pages(c, ra_pages);
for_each_rw_member(ca, c, i) { for_each_rw_member(c, ca) {
u64 dev_reserve = 0; u64 dev_reserve = 0;
/* /*
@ -2079,11 +2139,9 @@ void bch2_recalc_capacity(struct bch_fs *c)
u64 bch2_min_rw_member_capacity(struct bch_fs *c) u64 bch2_min_rw_member_capacity(struct bch_fs *c)
{ {
struct bch_dev *ca;
unsigned i;
u64 ret = U64_MAX; u64 ret = U64_MAX;
for_each_rw_member(ca, c, i) for_each_rw_member(c, ca)
ret = min(ret, ca->mi.nbuckets * ca->mi.bucket_size); ret = min(ret, ca->mi.nbuckets * ca->mi.bucket_size);
return ret; return ret;
} }

@@ -71,6 +71,24 @@ static inline enum bch_data_type bucket_data_type(enum bch_data_type data_type)
 	return data_type == BCH_DATA_stripe ? BCH_DATA_user : data_type;
 }
 
+static inline unsigned bch2_bucket_sectors(struct bch_alloc_v4 a)
+{
+	return a.dirty_sectors + a.cached_sectors;
+}
+
+static inline unsigned bch2_bucket_sectors_dirty(struct bch_alloc_v4 a)
+{
+	return a.dirty_sectors;
+}
+
+static inline unsigned bch2_bucket_sectors_fragmented(struct bch_dev *ca,
+						      struct bch_alloc_v4 a)
+{
+	int d = bch2_bucket_sectors_dirty(a);
+
+	return d ? max(0, ca->mi.bucket_size - d) : 0;
+}
+
 static inline u64 alloc_lru_idx_read(struct bch_alloc_v4 a)
 {
 	return a.data_type == BCH_DATA_cached ? a.io_time[READ] : 0;
@@ -90,10 +108,11 @@ static inline u64 alloc_lru_idx_fragmentation(struct bch_alloc_v4 a,
 					    struct bch_dev *ca)
 {
 	if (!data_type_movable(a.data_type) ||
-	    a.dirty_sectors >= ca->mi.bucket_size)
+	    !bch2_bucket_sectors_fragmented(ca, a))
 		return 0;
 
-	return div_u64((u64) a.dirty_sectors * (1ULL << 31), ca->mi.bucket_size);
+	u64 d = bch2_bucket_sectors_dirty(a);
+	return div_u64(d * (1ULL << 31), ca->mi.bucket_size);
 }
 
 static inline u64 alloc_freespace_genbits(struct bch_alloc_v4 a)
@@ -163,24 +182,21 @@ void bch2_alloc_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
 
 #define bch2_bkey_ops_alloc ((struct bkey_ops) {	\
 	.key_invalid	= bch2_alloc_v1_invalid,	\
 	.val_to_text	= bch2_alloc_to_text,		\
-	.trans_trigger	= bch2_trans_mark_alloc,	\
-	.atomic_trigger	= bch2_mark_alloc,		\
+	.trigger	= bch2_trigger_alloc,		\
 	.min_val_size	= 8,				\
 })
 
 #define bch2_bkey_ops_alloc_v2 ((struct bkey_ops) {	\
 	.key_invalid	= bch2_alloc_v2_invalid,	\
 	.val_to_text	= bch2_alloc_to_text,		\
-	.trans_trigger	= bch2_trans_mark_alloc,	\
-	.atomic_trigger	= bch2_mark_alloc,		\
+	.trigger	= bch2_trigger_alloc,		\
 	.min_val_size	= 8,				\
 })
 
 #define bch2_bkey_ops_alloc_v3 ((struct bkey_ops) {	\
 	.key_invalid	= bch2_alloc_v3_invalid,	\
 	.val_to_text	= bch2_alloc_to_text,		\
-	.trans_trigger	= bch2_trans_mark_alloc,	\
-	.atomic_trigger	= bch2_mark_alloc,		\
+	.trigger	= bch2_trigger_alloc,		\
 	.min_val_size	= 16,				\
 })
@@ -188,8 +204,7 @@ void bch2_alloc_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
 	.key_invalid	= bch2_alloc_v4_invalid,	\
 	.val_to_text	= bch2_alloc_to_text,		\
 	.swab		= bch2_alloc_v4_swab,		\
-	.trans_trigger	= bch2_trans_mark_alloc,	\
-	.atomic_trigger	= bch2_mark_alloc,		\
+	.trigger	= bch2_trigger_alloc,		\
 	.min_val_size	= 48,				\
 })
 
@@ -213,8 +228,8 @@ static inline bool bkey_is_alloc(const struct bkey *k)
 
 int bch2_alloc_read(struct bch_fs *);
 
-int bch2_trans_mark_alloc(struct btree_trans *, enum btree_id, unsigned,
-			  struct bkey_s_c, struct bkey_i *, unsigned);
+int bch2_trigger_alloc(struct btree_trans *, enum btree_id, unsigned,
+		       struct bkey_s_c, struct bkey_s, unsigned);
 int bch2_check_alloc_info(struct bch_fs *);
 int bch2_check_alloc_to_lru_refs(struct bch_fs *);
 void bch2_do_discards(struct bch_fs *);

@ -69,11 +69,8 @@ const char * const bch2_watermarks[] = {
void bch2_reset_alloc_cursors(struct bch_fs *c) void bch2_reset_alloc_cursors(struct bch_fs *c)
{ {
struct bch_dev *ca;
unsigned i;
rcu_read_lock(); rcu_read_lock();
for_each_member_device_rcu(ca, c, i, NULL) for_each_member_device_rcu(c, ca, NULL)
ca->alloc_cursor = 0; ca->alloc_cursor = 0;
rcu_read_unlock(); rcu_read_unlock();
} }
@ -239,9 +236,8 @@ static struct open_bucket *__try_alloc_bucket(struct bch_fs *c, struct bch_dev *
if (cl) if (cl)
closure_wait(&c->open_buckets_wait, cl); closure_wait(&c->open_buckets_wait, cl);
if (!c->blocked_allocate_open_bucket) track_event_change(&c->times[BCH_TIME_blocked_allocate_open_bucket],
c->blocked_allocate_open_bucket = local_clock(); &c->blocked_allocate_open_bucket, true);
spin_unlock(&c->freelist_lock); spin_unlock(&c->freelist_lock);
return ERR_PTR(-BCH_ERR_open_buckets_empty); return ERR_PTR(-BCH_ERR_open_buckets_empty);
} }
@ -267,19 +263,11 @@ static struct open_bucket *__try_alloc_bucket(struct bch_fs *c, struct bch_dev *
ca->nr_open_buckets++; ca->nr_open_buckets++;
bch2_open_bucket_hash_add(c, ob); bch2_open_bucket_hash_add(c, ob);
if (c->blocked_allocate_open_bucket) { track_event_change(&c->times[BCH_TIME_blocked_allocate_open_bucket],
bch2_time_stats_update( &c->blocked_allocate_open_bucket, false);
&c->times[BCH_TIME_blocked_allocate_open_bucket],
c->blocked_allocate_open_bucket);
c->blocked_allocate_open_bucket = 0;
}
if (c->blocked_allocate) { track_event_change(&c->times[BCH_TIME_blocked_allocate],
bch2_time_stats_update( &c->blocked_allocate, false);
&c->times[BCH_TIME_blocked_allocate],
c->blocked_allocate);
c->blocked_allocate = 0;
}
spin_unlock(&c->freelist_lock); spin_unlock(&c->freelist_lock);
return ob; return ob;
@ -377,9 +365,9 @@ static struct open_bucket *try_alloc_bucket(struct btree_trans *trans, struct bc
ob = __try_alloc_bucket(c, ca, b, watermark, a, s, cl); ob = __try_alloc_bucket(c, ca, b, watermark, a, s, cl);
if (!ob) if (!ob)
iter.path->preserve = false; set_btree_iter_dontneed(&iter);
err: err:
if (iter.trans && iter.path) if (iter.path)
set_btree_iter_dontneed(&iter); set_btree_iter_dontneed(&iter);
bch2_trans_iter_exit(trans, &iter); bch2_trans_iter_exit(trans, &iter);
printbuf_exit(&buf); printbuf_exit(&buf);
@ -447,7 +435,7 @@ bch2_bucket_alloc_early(struct btree_trans *trans,
ob = __try_alloc_bucket(trans->c, ca, k.k->p.offset, watermark, a, s, cl); ob = __try_alloc_bucket(trans->c, ca, k.k->p.offset, watermark, a, s, cl);
next: next:
citer.path->preserve = false; set_btree_iter_dontneed(&citer);
bch2_trans_iter_exit(trans, &citer); bch2_trans_iter_exit(trans, &citer);
if (ob) if (ob)
break; break;
@ -502,7 +490,7 @@ static struct open_bucket *bch2_bucket_alloc_freelist(struct btree_trans *trans,
ob = try_alloc_bucket(trans, ca, watermark, ob = try_alloc_bucket(trans, ca, watermark,
alloc_cursor, s, k, cl); alloc_cursor, s, k, cl);
if (ob) { if (ob) {
iter.path->preserve = false; set_btree_iter_dontneed(&iter);
break; break;
} }
} }
@ -567,8 +555,8 @@ static struct open_bucket *bch2_bucket_alloc_trans(struct btree_trans *trans,
goto again; goto again;
} }
if (!c->blocked_allocate) track_event_change(&c->times[BCH_TIME_blocked_allocate],
c->blocked_allocate = local_clock(); &c->blocked_allocate, true);
ob = ERR_PTR(-BCH_ERR_freelist_empty); ob = ERR_PTR(-BCH_ERR_freelist_empty);
goto err; goto err;
@ -697,11 +685,9 @@ static int add_new_bucket(struct bch_fs *c,
bch_dev_bkey_exists(c, ob->dev)->mi.durability; bch_dev_bkey_exists(c, ob->dev)->mi.durability;
BUG_ON(*nr_effective >= nr_replicas); BUG_ON(*nr_effective >= nr_replicas);
BUG_ON(flags & BCH_WRITE_ONLY_SPECIFIED_DEVS);
__clear_bit(ob->dev, devs_may_alloc->d); __clear_bit(ob->dev, devs_may_alloc->d);
*nr_effective += (flags & BCH_WRITE_ONLY_SPECIFIED_DEVS) *nr_effective += durability;
? durability : 1;
*have_cache |= !durability; *have_cache |= !durability;
ob_push(c, ptrs, ob); ob_push(c, ptrs, ob);
@ -972,8 +958,8 @@ static int __open_bucket_add_buckets(struct btree_trans *trans,
devs = target_rw_devs(c, wp->data_type, target); devs = target_rw_devs(c, wp->data_type, target);
/* Don't allocate from devices we already have pointers to: */ /* Don't allocate from devices we already have pointers to: */
for (i = 0; i < devs_have->nr; i++) darray_for_each(*devs_have, i)
__clear_bit(devs_have->devs[i], devs.d); __clear_bit(*i, devs.d);
open_bucket_for_each(c, ptrs, ob, i) open_bucket_for_each(c, ptrs, ob, i)
__clear_bit(ob->dev, devs.d); __clear_bit(ob->dev, devs.d);

@ -3,6 +3,7 @@
#include "bbpos.h" #include "bbpos.h"
#include "alloc_background.h" #include "alloc_background.h"
#include "backpointers.h" #include "backpointers.h"
#include "bkey_buf.h"
#include "btree_cache.h" #include "btree_cache.h"
#include "btree_update.h" #include "btree_update.h"
#include "btree_update_interior.h" #include "btree_update_interior.h"
@ -136,15 +137,30 @@ static noinline int backpointer_mod_err(struct btree_trans *trans,
} }
int bch2_bucket_backpointer_mod_nowritebuffer(struct btree_trans *trans, int bch2_bucket_backpointer_mod_nowritebuffer(struct btree_trans *trans,
struct bkey_i_backpointer *bp_k, struct bpos bucket,
struct bch_backpointer bp, struct bch_backpointer bp,
struct bkey_s_c orig_k, struct bkey_s_c orig_k,
bool insert) bool insert)
{ {
struct btree_iter bp_iter; struct btree_iter bp_iter;
struct bkey_s_c k; struct bkey_s_c k;
struct bkey_i_backpointer *bp_k;
int ret; int ret;
bp_k = bch2_trans_kmalloc_nomemzero(trans, sizeof(struct bkey_i_backpointer));
ret = PTR_ERR_OR_ZERO(bp_k);
if (ret)
return ret;
bkey_backpointer_init(&bp_k->k_i);
bp_k->k.p = bucket_pos_to_bp(trans->c, bucket, bp.bucket_offset);
bp_k->v = bp;
if (!insert) {
bp_k->k.type = KEY_TYPE_deleted;
set_bkey_val_u64s(&bp_k->k, 0);
}
k = bch2_bkey_get_iter(trans, &bp_iter, BTREE_ID_backpointers, k = bch2_bkey_get_iter(trans, &bp_iter, BTREE_ID_backpointers,
bp_k->k.p, bp_k->k.p,
BTREE_ITER_INTENT| BTREE_ITER_INTENT|
@ -375,39 +391,32 @@ static int bch2_check_btree_backpointer(struct btree_trans *trans, struct btree_
/* verify that every backpointer has a corresponding alloc key */ /* verify that every backpointer has a corresponding alloc key */
int bch2_check_btree_backpointers(struct bch_fs *c) int bch2_check_btree_backpointers(struct bch_fs *c)
{ {
struct btree_iter iter; int ret = bch2_trans_run(c,
struct bkey_s_c k;
int ret;
ret = bch2_trans_run(c,
for_each_btree_key_commit(trans, iter, for_each_btree_key_commit(trans, iter,
BTREE_ID_backpointers, POS_MIN, 0, k, BTREE_ID_backpointers, POS_MIN, 0, k,
NULL, NULL, BTREE_INSERT_LAZY_RW|BTREE_INSERT_NOFAIL, NULL, NULL, BCH_TRANS_COMMIT_no_enospc,
bch2_check_btree_backpointer(trans, &iter, k))); bch2_check_btree_backpointer(trans, &iter, k)));
if (ret) bch_err_fn(c, ret);
bch_err_fn(c, ret);
return ret; return ret;
} }
struct bpos_level {
unsigned level;
struct bpos pos;
};
static int check_bp_exists(struct btree_trans *trans, static int check_bp_exists(struct btree_trans *trans,
struct bpos bucket, struct bpos bucket,
struct bch_backpointer bp, struct bch_backpointer bp,
struct bkey_s_c orig_k, struct bkey_s_c orig_k,
struct bpos bucket_start, struct bpos bucket_start,
struct bpos bucket_end, struct bpos bucket_end,
struct bpos_level *last_flushed) struct bkey_buf *last_flushed)
{ {
struct bch_fs *c = trans->c; struct bch_fs *c = trans->c;
struct btree_iter bp_iter = { NULL }; struct btree_iter bp_iter = { NULL };
struct printbuf buf = PRINTBUF; struct printbuf buf = PRINTBUF;
struct bkey_s_c bp_k; struct bkey_s_c bp_k;
struct bkey_buf tmp;
int ret; int ret;
bch2_bkey_buf_init(&tmp);
if (bpos_lt(bucket, bucket_start) || if (bpos_lt(bucket, bucket_start) ||
bpos_gt(bucket, bucket_end)) bpos_gt(bucket, bucket_end))
return 0; return 0;
@ -424,13 +433,22 @@ static int check_bp_exists(struct btree_trans *trans,
if (bp_k.k->type != KEY_TYPE_backpointer || if (bp_k.k->type != KEY_TYPE_backpointer ||
memcmp(bkey_s_c_to_backpointer(bp_k).v, &bp, sizeof(bp))) { memcmp(bkey_s_c_to_backpointer(bp_k).v, &bp, sizeof(bp))) {
if (last_flushed->level != bp.level || if (!bpos_eq(orig_k.k->p, last_flushed->k->k.p) ||
!bpos_eq(last_flushed->pos, orig_k.k->p)) { bkey_bytes(orig_k.k) != bkey_bytes(&last_flushed->k->k) ||
last_flushed->level = bp.level; memcmp(orig_k.v, &last_flushed->k->v, bkey_val_bytes(orig_k.k))) {
last_flushed->pos = orig_k.k->p; bch2_bkey_buf_reassemble(&tmp, c, orig_k);
ret = bch2_btree_write_buffer_flush_sync(trans) ?: if (bp.level) {
-BCH_ERR_transaction_restart_write_buffer_flush; bch2_trans_unlock(trans);
bch2_btree_interior_updates_flush(c);
}
ret = bch2_btree_write_buffer_flush_sync(trans);
if (ret)
goto err;
bch2_bkey_buf_copy(last_flushed, c, tmp.k);
ret = -BCH_ERR_transaction_restart_write_buffer_flush;
goto out; goto out;
} }
goto missing; goto missing;
@ -439,6 +457,7 @@ static int check_bp_exists(struct btree_trans *trans,
err: err:
fsck_err: fsck_err:
bch2_trans_iter_exit(trans, &bp_iter); bch2_trans_iter_exit(trans, &bp_iter);
bch2_bkey_buf_exit(&tmp, c);
printbuf_exit(&buf); printbuf_exit(&buf);
return ret; return ret;
missing: missing:
@ -448,8 +467,7 @@ static int check_bp_exists(struct btree_trans *trans,
prt_printf(&buf, "\nbp pos "); prt_printf(&buf, "\nbp pos ");
bch2_bpos_to_text(&buf, bp_iter.pos); bch2_bpos_to_text(&buf, bp_iter.pos);
if (c->sb.version_upgrade_complete < bcachefs_metadata_version_backpointers || if (c->opts.reconstruct_alloc ||
c->opts.reconstruct_alloc ||
fsck_err(c, ptr_to_missing_backpointer, "%s", buf.buf)) fsck_err(c, ptr_to_missing_backpointer, "%s", buf.buf))
ret = bch2_bucket_backpointer_mod(trans, bucket, bp, orig_k, true); ret = bch2_bucket_backpointer_mod(trans, bucket, bp, orig_k, true);
@ -457,25 +475,18 @@ static int check_bp_exists(struct btree_trans *trans,
} }
static int check_extent_to_backpointers(struct btree_trans *trans, static int check_extent_to_backpointers(struct btree_trans *trans,
struct btree_iter *iter, enum btree_id btree, unsigned level,
struct bpos bucket_start, struct bpos bucket_start,
struct bpos bucket_end, struct bpos bucket_end,
struct bpos_level *last_flushed) struct bkey_buf *last_flushed,
struct bkey_s_c k)
{ {
struct bch_fs *c = trans->c; struct bch_fs *c = trans->c;
struct bkey_ptrs_c ptrs; struct bkey_ptrs_c ptrs;
const union bch_extent_entry *entry; const union bch_extent_entry *entry;
struct extent_ptr_decoded p; struct extent_ptr_decoded p;
struct bkey_s_c k;
int ret; int ret;
k = bch2_btree_iter_peek_all_levels(iter);
ret = bkey_err(k);
if (ret)
return ret;
if (!k.k)
return 0;
ptrs = bch2_bkey_ptrs_c(k); ptrs = bch2_bkey_ptrs_c(k);
bkey_for_each_ptr_decode(k.k, ptrs, p, entry) { bkey_for_each_ptr_decode(k.k, ptrs, p, entry) {
struct bpos bucket_pos; struct bpos bucket_pos;
@ -484,7 +495,7 @@ static int check_extent_to_backpointers(struct btree_trans *trans,
if (p.ptr.cached) if (p.ptr.cached)
continue; continue;
bch2_extent_ptr_to_bp(c, iter->btree_id, iter->path->level, bch2_extent_ptr_to_bp(c, btree, level,
k, p, &bucket_pos, &bp); k, p, &bucket_pos, &bp);
ret = check_bp_exists(trans, bucket_pos, bp, k, ret = check_bp_exists(trans, bucket_pos, bp, k,
@ -501,44 +512,33 @@ static int check_btree_root_to_backpointers(struct btree_trans *trans,
enum btree_id btree_id, enum btree_id btree_id,
struct bpos bucket_start, struct bpos bucket_start,
struct bpos bucket_end, struct bpos bucket_end,
struct bpos_level *last_flushed) struct bkey_buf *last_flushed,
int *level)
{ {
struct bch_fs *c = trans->c; struct bch_fs *c = trans->c;
struct btree_root *r = bch2_btree_id_root(c, btree_id);
struct btree_iter iter; struct btree_iter iter;
struct btree *b; struct btree *b;
struct bkey_s_c k; struct bkey_s_c k;
struct bkey_ptrs_c ptrs;
struct extent_ptr_decoded p;
const union bch_extent_entry *entry;
int ret; int ret;
retry:
bch2_trans_node_iter_init(trans, &iter, btree_id, POS_MIN, 0, r->level, 0); bch2_trans_node_iter_init(trans, &iter, btree_id, POS_MIN,
0, bch2_btree_id_root(c, btree_id)->b->c.level, 0);
b = bch2_btree_iter_peek_node(&iter); b = bch2_btree_iter_peek_node(&iter);
ret = PTR_ERR_OR_ZERO(b); ret = PTR_ERR_OR_ZERO(b);
if (ret) if (ret)
goto err; goto err;
BUG_ON(b != btree_node_root(c, b)); if (b != btree_node_root(c, b)) {
bch2_trans_iter_exit(trans, &iter);
goto retry;
}
*level = b->c.level;
k = bkey_i_to_s_c(&b->key); k = bkey_i_to_s_c(&b->key);
ptrs = bch2_bkey_ptrs_c(k); ret = check_extent_to_backpointers(trans, btree_id, b->c.level + 1,
bkey_for_each_ptr_decode(k.k, ptrs, p, entry) {
struct bpos bucket_pos;
struct bch_backpointer bp;
if (p.ptr.cached)
continue;
bch2_extent_ptr_to_bp(c, iter.btree_id, b->c.level + 1,
k, p, &bucket_pos, &bp);
ret = check_bp_exists(trans, bucket_pos, bp, k,
bucket_start, bucket_end, bucket_start, bucket_end,
last_flushed); last_flushed, k);
if (ret)
goto err;
}
err: err:
bch2_trans_iter_exit(trans, &iter); bch2_trans_iter_exit(trans, &iter);
return ret; return ret;
@ -616,43 +616,60 @@ static int bch2_check_extents_to_backpointers_pass(struct btree_trans *trans,
struct bch_fs *c = trans->c; struct bch_fs *c = trans->c;
struct btree_iter iter; struct btree_iter iter;
enum btree_id btree_id; enum btree_id btree_id;
struct bpos_level last_flushed = { UINT_MAX, POS_MIN }; struct bkey_s_c k;
struct bkey_buf last_flushed;
int ret = 0; int ret = 0;
bch2_bkey_buf_init(&last_flushed);
bkey_init(&last_flushed.k->k);
for (btree_id = 0; btree_id < btree_id_nr_alive(c); btree_id++) { for (btree_id = 0; btree_id < btree_id_nr_alive(c); btree_id++) {
unsigned depth = btree_type_has_ptrs(btree_id) ? 0 : 1; int level, depth = btree_type_has_ptrs(btree_id) ? 0 : 1;
bch2_trans_node_iter_init(trans, &iter, btree_id, POS_MIN, 0,
depth,
BTREE_ITER_ALL_LEVELS|
BTREE_ITER_PREFETCH);
do {
ret = commit_do(trans, NULL, NULL,
BTREE_INSERT_LAZY_RW|
BTREE_INSERT_NOFAIL,
check_extent_to_backpointers(trans, &iter,
bucket_start, bucket_end,
&last_flushed));
if (ret)
break;
} while (!bch2_btree_iter_advance(&iter));
bch2_trans_iter_exit(trans, &iter);
if (ret)
break;
ret = commit_do(trans, NULL, NULL, ret = commit_do(trans, NULL, NULL,
BTREE_INSERT_LAZY_RW| BCH_TRANS_COMMIT_no_enospc,
BTREE_INSERT_NOFAIL,
check_btree_root_to_backpointers(trans, btree_id, check_btree_root_to_backpointers(trans, btree_id,
bucket_start, bucket_end, bucket_start, bucket_end,
&last_flushed)); &last_flushed, &level));
if (ret) if (ret)
break; return ret;
while (level >= depth) {
bch2_trans_node_iter_init(trans, &iter, btree_id, POS_MIN, 0,
level,
BTREE_ITER_PREFETCH);
while (1) {
bch2_trans_begin(trans);
k = bch2_btree_iter_peek(&iter);
if (!k.k)
break;
ret = bkey_err(k) ?:
check_extent_to_backpointers(trans, btree_id, level,
bucket_start, bucket_end,
&last_flushed, k) ?:
bch2_trans_commit(trans, NULL, NULL,
BCH_TRANS_COMMIT_no_enospc);
if (bch2_err_matches(ret, BCH_ERR_transaction_restart)) {
ret = 0;
continue;
}
if (ret)
break;
if (bpos_eq(iter.pos, SPOS_MAX))
break;
bch2_btree_iter_advance(&iter);
}
bch2_trans_iter_exit(trans, &iter);
if (ret)
return ret;
--level;
}
} }
return ret;
bch2_bkey_buf_exit(&last_flushed, c);
return 0;
} }
static struct bpos bucket_pos_to_bp_safe(const struct bch_fs *c, static struct bpos bucket_pos_to_bp_safe(const struct bch_fs *c,
@ -746,8 +763,7 @@ int bch2_check_extents_to_backpointers(struct bch_fs *c)
} }
bch2_trans_put(trans); bch2_trans_put(trans);
if (ret) bch_err_fn(c, ret);
bch_err_fn(c, ret);
return ret; return ret;
} }
@@ -801,13 +817,11 @@ static int bch2_check_backpointers_to_extents_pass(struct btree_trans *trans,
 					   struct bbpos start,
 					   struct bbpos end)
 {
-	struct btree_iter iter;
-	struct bkey_s_c k;
 	struct bpos last_flushed_pos = SPOS_MAX;
 
 	return for_each_btree_key_commit(trans, iter, BTREE_ID_backpointers,
 				  POS_MIN, BTREE_ITER_PREFETCH, k,
-				  NULL, NULL, BTREE_INSERT_LAZY_RW|BTREE_INSERT_NOFAIL,
+				  NULL, NULL, BCH_TRANS_COMMIT_no_enospc,
 		check_one_backpointer(trans, start, end,
 				      bkey_s_c_to_backpointer(k),
 				      &last_flushed_pos));
@@ -854,7 +868,6 @@ int bch2_check_backpointers_to_extents(struct bch_fs *c)
 	}
 
 	bch2_trans_put(trans);
-	if (ret)
-		bch_err_fn(c, ret);
+	bch_err_fn(c, ret);
 	return ret;
 }


@@ -63,7 +63,7 @@ static inline struct bpos bucket_pos_to_bp(const struct bch_fs *c,
 	return ret;
 }
 
-int bch2_bucket_backpointer_mod_nowritebuffer(struct btree_trans *, struct bkey_i_backpointer *,
+int bch2_bucket_backpointer_mod_nowritebuffer(struct btree_trans *, struct bpos bucket,
 				struct bch_backpointer, struct bkey_s_c, bool);
 
 static inline int bch2_bucket_backpointer_mod(struct btree_trans *trans,
@@ -72,28 +72,21 @@ static inline int bch2_bucket_backpointer_mod(struct btree_trans *trans,
 				struct bkey_s_c orig_k,
 				bool insert)
 {
-	struct bch_fs *c = trans->c;
-	struct bkey_i_backpointer *bp_k;
-	int ret;
+	if (unlikely(bch2_backpointers_no_use_write_buffer))
+		return bch2_bucket_backpointer_mod_nowritebuffer(trans, bucket, bp, orig_k, insert);
 
-	bp_k = bch2_trans_kmalloc_nomemzero(trans, sizeof(struct bkey_i_backpointer));
-	ret = PTR_ERR_OR_ZERO(bp_k);
-	if (ret)
-		return ret;
+	struct bkey_i_backpointer bp_k;
 
-	bkey_backpointer_init(&bp_k->k_i);
-	bp_k->k.p = bucket_pos_to_bp(c, bucket, bp.bucket_offset);
-	bp_k->v = bp;
+	bkey_backpointer_init(&bp_k.k_i);
+	bp_k.k.p = bucket_pos_to_bp(trans->c, bucket, bp.bucket_offset);
+	bp_k.v = bp;
 
 	if (!insert) {
-		bp_k->k.type = KEY_TYPE_deleted;
-		set_bkey_val_u64s(&bp_k->k, 0);
+		bp_k.k.type = KEY_TYPE_deleted;
+		set_bkey_val_u64s(&bp_k.k, 0);
 	}
 
-	if (unlikely(bch2_backpointers_no_use_write_buffer))
-		return bch2_bucket_backpointer_mod_nowritebuffer(trans, bp_k, bp, orig_k, insert);
-
-	return bch2_trans_update_buffered(trans, BTREE_ID_backpointers, &bp_k->k_i);
+	return bch2_trans_update_buffered(trans, BTREE_ID_backpointers, &bp_k.k_i);
 }
static inline enum bch_data_type bkey_ptr_data_type(enum btree_id btree_id, unsigned level, static inline enum bch_data_type bkey_ptr_data_type(enum btree_id btree_id, unsigned level,


@ -193,6 +193,7 @@
#include <linux/mutex.h> #include <linux/mutex.h>
#include <linux/percpu-refcount.h> #include <linux/percpu-refcount.h>
#include <linux/percpu-rwsem.h> #include <linux/percpu-rwsem.h>
#include <linux/refcount.h>
#include <linux/rhashtable.h> #include <linux/rhashtable.h>
#include <linux/rwsem.h> #include <linux/rwsem.h>
#include <linux/semaphore.h> #include <linux/semaphore.h>
@ -223,9 +224,11 @@
#define race_fault(...) dynamic_fault("bcachefs:race") #define race_fault(...) dynamic_fault("bcachefs:race")
#define count_event(_c, _name) this_cpu_inc((_c)->counters[BCH_COUNTER_##_name])
#define trace_and_count(_c, _name, ...) \ #define trace_and_count(_c, _name, ...) \
do { \ do { \
this_cpu_inc((_c)->counters[BCH_COUNTER_##_name]); \ count_event(_c, _name); \
trace_##_name(__VA_ARGS__); \ trace_##_name(__VA_ARGS__); \
} while (0) } while (0)
@ -262,46 +265,76 @@ do { \
#define bch2_fmt(_c, fmt) bch2_log_msg(_c, fmt "\n") #define bch2_fmt(_c, fmt) bch2_log_msg(_c, fmt "\n")
__printf(2, 3)
void __bch2_print(struct bch_fs *c, const char *fmt, ...);
#define maybe_dev_to_fs(_c) _Generic((_c), \
struct bch_dev *: ((struct bch_dev *) (_c))->fs, \
struct bch_fs *: (_c))
#define bch2_print(_c, ...) __bch2_print(maybe_dev_to_fs(_c), __VA_ARGS__)
#define bch2_print_ratelimited(_c, ...) \
do { \
static DEFINE_RATELIMIT_STATE(_rs, \
DEFAULT_RATELIMIT_INTERVAL, \
DEFAULT_RATELIMIT_BURST); \
\
if (__ratelimit(&_rs)) \
bch2_print(_c, __VA_ARGS__); \
} while (0)
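
Aside: the _Generic dispatch in maybe_dev_to_fs() above is what lets the same bch_* logging macros accept either a struct bch_fs * or a struct bch_dev *. A minimal standalone sketch of the pattern follows; the types and names here are illustrative only, not bcachefs code:

	#include <stdio.h>

	struct fs  { int id; };
	struct dev { struct fs *fs; };

	#define to_fs(_c) _Generic((_c),			\
		struct dev *: ((struct dev *) (_c))->fs,	\
		struct fs *:  (_c))

	#define log_id(_c) printf("fs %d\n", to_fs(_c)->id)

	int main(void)
	{
		struct fs  f = { .id = 1 };
		struct dev d = { .fs = &f };

		log_id(&f);	/* dispatches on struct fs *  */
		log_id(&d);	/* dispatches on struct dev * */
		return 0;
	}
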
#define bch_info(c, fmt, ...) \ #define bch_info(c, fmt, ...) \
printk(KERN_INFO bch2_fmt(c, fmt), ##__VA_ARGS__) bch2_print(c, KERN_INFO bch2_fmt(c, fmt), ##__VA_ARGS__)
#define bch_notice(c, fmt, ...) \ #define bch_notice(c, fmt, ...) \
printk(KERN_NOTICE bch2_fmt(c, fmt), ##__VA_ARGS__) bch2_print(c, KERN_NOTICE bch2_fmt(c, fmt), ##__VA_ARGS__)
#define bch_warn(c, fmt, ...) \ #define bch_warn(c, fmt, ...) \
printk(KERN_WARNING bch2_fmt(c, fmt), ##__VA_ARGS__) bch2_print(c, KERN_WARNING bch2_fmt(c, fmt), ##__VA_ARGS__)
#define bch_warn_ratelimited(c, fmt, ...) \ #define bch_warn_ratelimited(c, fmt, ...) \
printk_ratelimited(KERN_WARNING bch2_fmt(c, fmt), ##__VA_ARGS__) bch2_print_ratelimited(c, KERN_WARNING bch2_fmt(c, fmt), ##__VA_ARGS__)
#define bch_err(c, fmt, ...) \ #define bch_err(c, fmt, ...) \
printk(KERN_ERR bch2_fmt(c, fmt), ##__VA_ARGS__) bch2_print(c, KERN_ERR bch2_fmt(c, fmt), ##__VA_ARGS__)
#define bch_err_dev(ca, fmt, ...) \ #define bch_err_dev(ca, fmt, ...) \
printk(KERN_ERR bch2_fmt_dev(ca, fmt), ##__VA_ARGS__) bch2_print(c, KERN_ERR bch2_fmt_dev(ca, fmt), ##__VA_ARGS__)
#define bch_err_dev_offset(ca, _offset, fmt, ...) \ #define bch_err_dev_offset(ca, _offset, fmt, ...) \
printk(KERN_ERR bch2_fmt_dev_offset(ca, _offset, fmt), ##__VA_ARGS__) bch2_print(c, KERN_ERR bch2_fmt_dev_offset(ca, _offset, fmt), ##__VA_ARGS__)
#define bch_err_inum(c, _inum, fmt, ...) \ #define bch_err_inum(c, _inum, fmt, ...) \
printk(KERN_ERR bch2_fmt_inum(c, _inum, fmt), ##__VA_ARGS__) bch2_print(c, KERN_ERR bch2_fmt_inum(c, _inum, fmt), ##__VA_ARGS__)
#define bch_err_inum_offset(c, _inum, _offset, fmt, ...) \ #define bch_err_inum_offset(c, _inum, _offset, fmt, ...) \
printk(KERN_ERR bch2_fmt_inum_offset(c, _inum, _offset, fmt), ##__VA_ARGS__) bch2_print(c, KERN_ERR bch2_fmt_inum_offset(c, _inum, _offset, fmt), ##__VA_ARGS__)
#define bch_err_ratelimited(c, fmt, ...) \ #define bch_err_ratelimited(c, fmt, ...) \
printk_ratelimited(KERN_ERR bch2_fmt(c, fmt), ##__VA_ARGS__) bch2_print_ratelimited(c, KERN_ERR bch2_fmt(c, fmt), ##__VA_ARGS__)
#define bch_err_dev_ratelimited(ca, fmt, ...) \ #define bch_err_dev_ratelimited(ca, fmt, ...) \
printk_ratelimited(KERN_ERR bch2_fmt_dev(ca, fmt), ##__VA_ARGS__) bch2_print_ratelimited(ca, KERN_ERR bch2_fmt_dev(ca, fmt), ##__VA_ARGS__)
#define bch_err_dev_offset_ratelimited(ca, _offset, fmt, ...) \ #define bch_err_dev_offset_ratelimited(ca, _offset, fmt, ...) \
printk_ratelimited(KERN_ERR bch2_fmt_dev_offset(ca, _offset, fmt), ##__VA_ARGS__) bch2_print_ratelimited(ca, KERN_ERR bch2_fmt_dev_offset(ca, _offset, fmt), ##__VA_ARGS__)
#define bch_err_inum_ratelimited(c, _inum, fmt, ...) \ #define bch_err_inum_ratelimited(c, _inum, fmt, ...) \
printk_ratelimited(KERN_ERR bch2_fmt_inum(c, _inum, fmt), ##__VA_ARGS__) bch2_print_ratelimited(c, KERN_ERR bch2_fmt_inum(c, _inum, fmt), ##__VA_ARGS__)
#define bch_err_inum_offset_ratelimited(c, _inum, _offset, fmt, ...) \ #define bch_err_inum_offset_ratelimited(c, _inum, _offset, fmt, ...) \
printk_ratelimited(KERN_ERR bch2_fmt_inum_offset(c, _inum, _offset, fmt), ##__VA_ARGS__) bch2_print_ratelimited(c, KERN_ERR bch2_fmt_inum_offset(c, _inum, _offset, fmt), ##__VA_ARGS__)
static inline bool should_print_err(int err)
{
return err && !bch2_err_matches(err, BCH_ERR_transaction_restart);
}
#define bch_err_fn(_c, _ret) \ #define bch_err_fn(_c, _ret) \
do { \ do { \
if (_ret && !bch2_err_matches(_ret, BCH_ERR_transaction_restart))\ if (should_print_err(_ret)) \
bch_err(_c, "%s(): error %s", __func__, bch2_err_str(_ret));\ bch_err(_c, "%s(): error %s", __func__, bch2_err_str(_ret));\
} while (0) } while (0)
#define bch_err_fn_ratelimited(_c, _ret) \
do { \
if (should_print_err(_ret)) \
bch_err_ratelimited(_c, "%s(): error %s", __func__, bch2_err_str(_ret));\
} while (0)
#define bch_err_msg(_c, _ret, _msg, ...) \ #define bch_err_msg(_c, _ret, _msg, ...) \
do { \ do { \
if (_ret && !bch2_err_matches(_ret, BCH_ERR_transaction_restart))\ if (should_print_err(_ret)) \
bch_err(_c, "%s(): error " _msg " %s", __func__, \ bch_err(_c, "%s(): error " _msg " %s", __func__, \
##__VA_ARGS__, bch2_err_str(_ret)); \ ##__VA_ARGS__, bch2_err_str(_ret)); \
} while (0) } while (0)
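
A recurring cleanup throughout this series follows from should_print_err(): callers no longer wrap the call in "if (ret)", since the macro already skips zero and transaction-restart errors. Roughly, with a hypothetical helper name:

	int bch2_do_thing(struct bch_fs *c)
	{
		int ret = bch2_trans_run(c, do_thing_in_trans(trans));	/* do_thing_in_trans() is hypothetical */

		/* prints only for real errors, never for transaction restarts */
		bch_err_fn(c, ret);
		return ret;
	}
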
@ -392,6 +425,7 @@ BCH_DEBUG_PARAMS_DEBUG()
x(btree_node_merge) \ x(btree_node_merge) \
x(btree_node_sort) \ x(btree_node_sort) \
x(btree_node_read) \ x(btree_node_read) \
x(btree_node_read_done) \
x(btree_interior_update_foreground) \ x(btree_interior_update_foreground) \
x(btree_interior_update_total) \ x(btree_interior_update_total) \
x(btree_gc) \ x(btree_gc) \
@ -401,9 +435,12 @@ BCH_DEBUG_PARAMS_DEBUG()
x(journal_flush_write) \ x(journal_flush_write) \
x(journal_noflush_write) \ x(journal_noflush_write) \
x(journal_flush_seq) \ x(journal_flush_seq) \
x(blocked_journal) \ x(blocked_journal_low_on_space) \
x(blocked_journal_low_on_pin) \
x(blocked_journal_max_in_flight) \
x(blocked_allocate) \ x(blocked_allocate) \
x(blocked_allocate_open_bucket) \ x(blocked_allocate_open_bucket) \
x(blocked_write_buffer_full) \
x(nocow_lock_contended) x(nocow_lock_contended)
enum bch_time_stats { enum bch_time_stats {
@ -428,6 +465,7 @@ enum bch_time_stats {
#include "replicas_types.h" #include "replicas_types.h"
#include "subvolume_types.h" #include "subvolume_types.h"
#include "super_types.h" #include "super_types.h"
#include "thread_with_file_types.h"
/* Number of nodes btree coalesce will try to coalesce at once */ /* Number of nodes btree coalesce will try to coalesce at once */
#define GC_MERGE_NODES 4U #define GC_MERGE_NODES 4U
@@ -564,32 +602,35 @@ struct bch_dev {
 	struct io_count __percpu *io_done;
 };
 
-enum {
-	/* startup: */
-	BCH_FS_STARTED,
-	BCH_FS_MAY_GO_RW,
-	BCH_FS_RW,
-	BCH_FS_WAS_RW,
-
-	/* shutdown: */
-	BCH_FS_STOPPING,
-	BCH_FS_EMERGENCY_RO,
-	BCH_FS_GOING_RO,
-	BCH_FS_WRITE_DISABLE_COMPLETE,
-	BCH_FS_CLEAN_SHUTDOWN,
-
-	/* fsck passes: */
-	BCH_FS_FSCK_DONE,
-	BCH_FS_INITIAL_GC_UNFIXED,	/* kill when we enumerate fsck errors */
-	BCH_FS_NEED_ANOTHER_GC,
-	BCH_FS_NEED_DELETE_DEAD_SNAPSHOTS,
-
-	/* errors: */
-	BCH_FS_ERROR,
-	BCH_FS_TOPOLOGY_ERROR,
-	BCH_FS_ERRORS_FIXED,
-	BCH_FS_ERRORS_NOT_FIXED,
+/*
+ * initial_gc_unfixed
+ * error
+ * topology error
+ */
+
+#define BCH_FS_FLAGS()			\
+	x(started)			\
+	x(may_go_rw)			\
+	x(rw)				\
+	x(was_rw)			\
+	x(stopping)			\
+	x(emergency_ro)			\
+	x(going_ro)			\
+	x(write_disable_complete)	\
+	x(clean_shutdown)		\
+	x(fsck_running)			\
+	x(initial_gc_unfixed)		\
+	x(need_another_gc)		\
+	x(need_delete_dead_snapshots)	\
+	x(error)			\
+	x(topology_error)		\
+	x(errors_fixed)			\
+	x(errors_not_fixed)
+
+enum bch_fs_flags {
+#define x(n)		BCH_FS_##n,
+	BCH_FS_FLAGS()
+#undef x
 };
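
The x-macro above replaces the old upper-case enum with generated lower-case constants, which is why later hunks switch test_bit()/set_bit() call sites from BCH_FS_GOING_RO to BCH_FS_going_ro and so on. A sketch of the expansion and a typical test; the wrapper function here is illustrative, not from the diff:

	/*
	 * #define x(n) BCH_FS_##n,
	 * BCH_FS_FLAGS() expands to: BCH_FS_started, BCH_FS_may_go_rw, BCH_FS_rw, ...
	 */
	static inline bool bch2_fs_is_stopping(struct bch_fs *c)	/* illustrative wrapper */
	{
		return test_bit(BCH_FS_stopping, &c->flags);
	}
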
struct btree_debug { struct btree_debug {
@ -599,10 +640,11 @@ struct btree_debug {
#define BCH_TRANSACTIONS_NR 128 #define BCH_TRANSACTIONS_NR 128
struct btree_transaction_stats { struct btree_transaction_stats {
struct bch2_time_stats duration;
struct bch2_time_stats lock_hold_times; struct bch2_time_stats lock_hold_times;
struct mutex lock; struct mutex lock;
unsigned nr_max_paths; unsigned nr_max_paths;
unsigned wb_updates_size; unsigned journal_entries_size;
unsigned max_mem; unsigned max_mem;
char *max_paths_text; char *max_paths_text;
}; };
@ -664,7 +706,8 @@ struct btree_trans_buf {
x(invalidate) \ x(invalidate) \
x(delete_dead_snapshots) \ x(delete_dead_snapshots) \
x(snapshot_delete_pagecache) \ x(snapshot_delete_pagecache) \
x(sysfs) x(sysfs) \
x(btree_write_buffer)
enum bch_write_ref { enum bch_write_ref {
#define x(n) BCH_WRITE_REF_##n, #define x(n) BCH_WRITE_REF_##n,
@ -689,6 +732,8 @@ struct bch_fs {
struct super_block *vfs_sb; struct super_block *vfs_sb;
dev_t dev; dev_t dev;
char name[40]; char name[40];
struct stdio_redirect *stdio;
struct task_struct *stdio_filter;
/* ro/rw, add/remove/resize devices: */ /* ro/rw, add/remove/resize devices: */
struct rw_semaphore state_lock; struct rw_semaphore state_lock;
@ -699,6 +744,13 @@ struct bch_fs {
#else #else
struct percpu_ref writes; struct percpu_ref writes;
#endif #endif
/*
 * Analogous to c->writes, for asynchronous ops that don't necessarily
* need fs to be read-write
*/
refcount_t ro_ref;
wait_queue_head_t ro_ref_wait;
struct work_struct read_only_work; struct work_struct read_only_work;
struct bch_dev __rcu *devs[BCH_SB_MEMBERS_MAX]; struct bch_dev __rcu *devs[BCH_SB_MEMBERS_MAX];
@ -1002,10 +1054,21 @@ struct bch_fs {
/* RECOVERY */ /* RECOVERY */
u64 journal_replay_seq_start; u64 journal_replay_seq_start;
u64 journal_replay_seq_end; u64 journal_replay_seq_end;
/*
* Two different uses:
* "Has this fsck pass?" - i.e. should this type of error be an
* emergency read-only
* And, in certain situations fsck will rewind to an earlier pass: used
* for signaling to the toplevel code which pass we want to run now.
*/
enum bch_recovery_pass curr_recovery_pass; enum bch_recovery_pass curr_recovery_pass;
/* bitmap of explicitly enabled recovery passes: */ /* bitmap of explicitly enabled recovery passes: */
u64 recovery_passes_explicit; u64 recovery_passes_explicit;
/* bitmask of recovery passes that we actually ran */
u64 recovery_passes_complete; u64 recovery_passes_complete;
/* never rewinds version of curr_recovery_pass */
enum bch_recovery_pass recovery_pass_done;
struct semaphore online_fsck_mutex;
/* DEBUG JUNK */ /* DEBUG JUNK */
struct dentry *fs_debug_dir; struct dentry *fs_debug_dir;
@ -1065,10 +1128,20 @@ static inline void bch2_write_ref_get(struct bch_fs *c, enum bch_write_ref ref)
#endif #endif
} }
static inline bool __bch2_write_ref_tryget(struct bch_fs *c, enum bch_write_ref ref)
{
#ifdef BCH_WRITE_REF_DEBUG
return !test_bit(BCH_FS_going_ro, &c->flags) &&
atomic_long_inc_not_zero(&c->writes[ref]);
#else
return percpu_ref_tryget(&c->writes);
#endif
}
static inline bool bch2_write_ref_tryget(struct bch_fs *c, enum bch_write_ref ref) static inline bool bch2_write_ref_tryget(struct bch_fs *c, enum bch_write_ref ref)
{ {
#ifdef BCH_WRITE_REF_DEBUG #ifdef BCH_WRITE_REF_DEBUG
return !test_bit(BCH_FS_GOING_RO, &c->flags) && return !test_bit(BCH_FS_going_ro, &c->flags) &&
atomic_long_inc_not_zero(&c->writes[ref]); atomic_long_inc_not_zero(&c->writes[ref]);
#else #else
return percpu_ref_tryget_live(&c->writes); return percpu_ref_tryget_live(&c->writes);
@ -1087,13 +1160,27 @@ static inline void bch2_write_ref_put(struct bch_fs *c, enum bch_write_ref ref)
if (atomic_long_read(&c->writes[i])) if (atomic_long_read(&c->writes[i]))
return; return;
set_bit(BCH_FS_WRITE_DISABLE_COMPLETE, &c->flags); set_bit(BCH_FS_write_disable_complete, &c->flags);
wake_up(&bch2_read_only_wait); wake_up(&bch2_read_only_wait);
#else #else
percpu_ref_put(&c->writes); percpu_ref_put(&c->writes);
#endif #endif
} }
static inline bool bch2_ro_ref_tryget(struct bch_fs *c)
{
if (test_bit(BCH_FS_stopping, &c->flags))
return false;
return refcount_inc_not_zero(&c->ro_ref);
}
static inline void bch2_ro_ref_put(struct bch_fs *c)
{
if (refcount_dec_and_test(&c->ro_ref))
wake_up(&c->ro_ref_wait);
}
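
A plausible use of the new ro_ref, sketched with hypothetical function names: operations that only need the filesystem object to stay alive (not to be read-write) bracket themselves with tryget/put, and the final put wakes whoever is waiting on ro_ref_wait during shutdown:

	static void example_async_op(struct bch_fs *c)	/* hypothetical caller */
	{
		if (!bch2_ro_ref_tryget(c))
			return;			/* fs is stopping; don't start new work */

		do_background_work(c);		/* hypothetical */

		bch2_ro_ref_put(c);		/* last put wakes c->ro_ref_wait */
	}
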
static inline void bch2_set_ra_pages(struct bch_fs *c, unsigned ra_pages) static inline void bch2_set_ra_pages(struct bch_fs *c, unsigned ra_pages)
{ {
#ifndef NO_BCACHEFS_FS #ifndef NO_BCACHEFS_FS
@ -1158,6 +1245,15 @@ static inline bool bch2_dev_exists2(const struct bch_fs *c, unsigned dev)
return dev < c->sb.nr_devices && c->devs[dev]; return dev < c->sb.nr_devices && c->devs[dev];
} }
static inline struct stdio_redirect *bch2_fs_stdio_redirect(struct bch_fs *c)
{
struct stdio_redirect *stdio = c->stdio;
if (c->stdio_filter && c->stdio_filter != current)
stdio = NULL;
return stdio;
}
#define BKEY_PADDED_ONSTACK(key, pad) \ #define BKEY_PADDED_ONSTACK(key, pad) \
struct { struct bkey_i key; __u64 key ## _pad[pad]; } struct { struct bkey_i key; __u64 key ## _pad[pad]; }


@ -307,6 +307,13 @@ struct bkey_i {
struct bch_val v; struct bch_val v;
}; };
#define POS_KEY(_pos) \
((struct bkey) { \
.u64s = BKEY_U64s, \
.format = KEY_FORMAT_CURRENT, \
.p = _pos, \
})
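
POS_KEY() builds a zero-size, valueless key at a position. A small illustration, with a hypothetical helper:

	static struct bkey make_pos_key(struct bpos pos)	/* illustrative helper */
	{
		return POS_KEY(pos);	/* u64s == BKEY_U64s, size == 0, p == pos */
	}
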
#define KEY(_inode, _offset, _size) \ #define KEY(_inode, _offset, _size) \
((struct bkey) { \ ((struct bkey) { \
.u64s = BKEY_U64s, \ .u64s = BKEY_U64s, \
@ -1296,6 +1303,7 @@ struct bch_member {
__le64 errors[BCH_MEMBER_ERROR_NR]; __le64 errors[BCH_MEMBER_ERROR_NR];
__le64 errors_at_reset[BCH_MEMBER_ERROR_NR]; __le64 errors_at_reset[BCH_MEMBER_ERROR_NR];
__le64 errors_reset_time; __le64 errors_reset_time;
__le64 seq;
}; };
#define BCH_MEMBER_V1_BYTES 56 #define BCH_MEMBER_V1_BYTES 56
@ -1442,7 +1450,7 @@ struct bch_sb_field_replicas_v0 {
struct bch_replicas_entry_v0 entries[]; struct bch_replicas_entry_v0 entries[];
} __packed __aligned(8); } __packed __aligned(8);
struct bch_replicas_entry { struct bch_replicas_entry_v1 {
__u8 data_type; __u8 data_type;
__u8 nr_devs; __u8 nr_devs;
__u8 nr_required; __u8 nr_required;
@ -1454,7 +1462,7 @@ struct bch_replicas_entry {
struct bch_sb_field_replicas { struct bch_sb_field_replicas {
struct bch_sb_field field; struct bch_sb_field field;
struct bch_replicas_entry entries[]; struct bch_replicas_entry_v1 entries[];
} __packed __aligned(8); } __packed __aligned(8);
/* BCH_SB_FIELD_quota: */ /* BCH_SB_FIELD_quota: */
@ -1571,7 +1579,9 @@ struct bch_sb_field_disk_groups {
x(write_super, 73) \ x(write_super, 73) \
x(trans_restart_would_deadlock_recursion_limit, 74) \ x(trans_restart_would_deadlock_recursion_limit, 74) \
x(trans_restart_write_buffer_flush, 75) \ x(trans_restart_write_buffer_flush, 75) \
x(trans_restart_split_race, 76) x(trans_restart_split_race, 76) \
x(write_buffer_flush_slowpath, 77) \
x(write_buffer_flush_sync, 78)
enum bch_persistent_counters { enum bch_persistent_counters {
#define x(t, n, ...) BCH_COUNTER_##t, #define x(t, n, ...) BCH_COUNTER_##t,
@ -1662,69 +1672,41 @@ struct bch_sb_field_downgrade {
#define BCH_VERSION_MINOR(_v) ((__u16) ((_v) & ~(~0U << 10))) #define BCH_VERSION_MINOR(_v) ((__u16) ((_v) & ~(~0U << 10)))
#define BCH_VERSION(_major, _minor) (((_major) << 10)|(_minor) << 0) #define BCH_VERSION(_major, _minor) (((_major) << 10)|(_minor) << 0)
#define RECOVERY_PASS_ALL_FSCK (1ULL << 63)
/*
 * field 1: version name
 * field 2: BCH_VERSION(major, minor)
 * field 3: recovery passes required on upgrade
 */
#define BCH_METADATA_VERSIONS() \ #define BCH_METADATA_VERSIONS() \
x(bkey_renumber, BCH_VERSION(0, 10), \ x(bkey_renumber, BCH_VERSION(0, 10)) \
RECOVERY_PASS_ALL_FSCK) \ x(inode_btree_change, BCH_VERSION(0, 11)) \
x(inode_btree_change, BCH_VERSION(0, 11), \ x(snapshot, BCH_VERSION(0, 12)) \
RECOVERY_PASS_ALL_FSCK) \ x(inode_backpointers, BCH_VERSION(0, 13)) \
x(snapshot, BCH_VERSION(0, 12), \ x(btree_ptr_sectors_written, BCH_VERSION(0, 14)) \
RECOVERY_PASS_ALL_FSCK) \ x(snapshot_2, BCH_VERSION(0, 15)) \
x(inode_backpointers, BCH_VERSION(0, 13), \ x(reflink_p_fix, BCH_VERSION(0, 16)) \
RECOVERY_PASS_ALL_FSCK) \ x(subvol_dirent, BCH_VERSION(0, 17)) \
x(btree_ptr_sectors_written, BCH_VERSION(0, 14), \ x(inode_v2, BCH_VERSION(0, 18)) \
RECOVERY_PASS_ALL_FSCK) \ x(freespace, BCH_VERSION(0, 19)) \
x(snapshot_2, BCH_VERSION(0, 15), \ x(alloc_v4, BCH_VERSION(0, 20)) \
BIT_ULL(BCH_RECOVERY_PASS_fs_upgrade_for_subvolumes)| \ x(new_data_types, BCH_VERSION(0, 21)) \
BIT_ULL(BCH_RECOVERY_PASS_initialize_subvolumes)| \ x(backpointers, BCH_VERSION(0, 22)) \
RECOVERY_PASS_ALL_FSCK) \ x(inode_v3, BCH_VERSION(0, 23)) \
x(reflink_p_fix, BCH_VERSION(0, 16), \ x(unwritten_extents, BCH_VERSION(0, 24)) \
BIT_ULL(BCH_RECOVERY_PASS_fix_reflink_p)) \ x(bucket_gens, BCH_VERSION(0, 25)) \
x(subvol_dirent, BCH_VERSION(0, 17), \ x(lru_v2, BCH_VERSION(0, 26)) \
RECOVERY_PASS_ALL_FSCK) \ x(fragmentation_lru, BCH_VERSION(0, 27)) \
x(inode_v2, BCH_VERSION(0, 18), \ x(no_bps_in_alloc_keys, BCH_VERSION(0, 28)) \
RECOVERY_PASS_ALL_FSCK) \ x(snapshot_trees, BCH_VERSION(0, 29)) \
x(freespace, BCH_VERSION(0, 19), \ x(major_minor, BCH_VERSION(1, 0)) \
RECOVERY_PASS_ALL_FSCK) \ x(snapshot_skiplists, BCH_VERSION(1, 1)) \
x(alloc_v4, BCH_VERSION(0, 20), \ x(deleted_inodes, BCH_VERSION(1, 2)) \
RECOVERY_PASS_ALL_FSCK) \ x(rebalance_work, BCH_VERSION(1, 3)) \
x(new_data_types, BCH_VERSION(0, 21), \ x(member_seq, BCH_VERSION(1, 4))
RECOVERY_PASS_ALL_FSCK) \
x(backpointers, BCH_VERSION(0, 22), \
RECOVERY_PASS_ALL_FSCK) \
x(inode_v3, BCH_VERSION(0, 23), \
RECOVERY_PASS_ALL_FSCK) \
x(unwritten_extents, BCH_VERSION(0, 24), \
RECOVERY_PASS_ALL_FSCK) \
x(bucket_gens, BCH_VERSION(0, 25), \
BIT_ULL(BCH_RECOVERY_PASS_bucket_gens_init)| \
RECOVERY_PASS_ALL_FSCK) \
x(lru_v2, BCH_VERSION(0, 26), \
RECOVERY_PASS_ALL_FSCK) \
x(fragmentation_lru, BCH_VERSION(0, 27), \
RECOVERY_PASS_ALL_FSCK) \
x(no_bps_in_alloc_keys, BCH_VERSION(0, 28), \
RECOVERY_PASS_ALL_FSCK) \
x(snapshot_trees, BCH_VERSION(0, 29), \
RECOVERY_PASS_ALL_FSCK) \
x(major_minor, BCH_VERSION(1, 0), \
0) \
x(snapshot_skiplists, BCH_VERSION(1, 1), \
BIT_ULL(BCH_RECOVERY_PASS_check_snapshots)) \
x(deleted_inodes, BCH_VERSION(1, 2), \
BIT_ULL(BCH_RECOVERY_PASS_check_inodes)) \
x(rebalance_work, BCH_VERSION(1, 3), \
BIT_ULL(BCH_RECOVERY_PASS_set_fs_needs_rebalance))
enum bcachefs_metadata_version { enum bcachefs_metadata_version {
bcachefs_metadata_version_min = 9, bcachefs_metadata_version_min = 9,
#define x(t, n, upgrade_passes) bcachefs_metadata_version_##t = n, #define x(t, n) bcachefs_metadata_version_##t = n,
BCH_METADATA_VERSIONS() BCH_METADATA_VERSIONS()
#undef x #undef x
bcachefs_metadata_version_max bcachefs_metadata_version_max
@ -1786,7 +1768,8 @@ struct bch_sb {
__le32 time_base_hi; __le32 time_base_hi;
__le32 time_precision; __le32 time_precision;
__le64 flags[8]; __le64 flags[7];
__le64 write_time;
__le64 features[2]; __le64 features[2];
__le64 compat[2]; __le64 compat[2];
@ -2153,7 +2136,8 @@ static inline __u64 __bset_magic(struct bch_sb *sb)
x(clock, 7) \ x(clock, 7) \
x(dev_usage, 8) \ x(dev_usage, 8) \
x(log, 9) \ x(log, 9) \
x(overwrite, 10) x(overwrite, 10) \
x(write_buffer_keys, 11)
enum { enum {
#define x(f, nr) BCH_JSET_ENTRY_##f = nr, #define x(f, nr) BCH_JSET_ENTRY_##f = nr,
@ -2162,6 +2146,19 @@ enum {
BCH_JSET_ENTRY_NR BCH_JSET_ENTRY_NR
}; };
static inline bool jset_entry_is_key(struct jset_entry *e)
{
switch (e->type) {
case BCH_JSET_ENTRY_btree_keys:
case BCH_JSET_ENTRY_btree_root:
case BCH_JSET_ENTRY_overwrite:
case BCH_JSET_ENTRY_write_buffer_keys:
return true;
}
return false;
}
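
A sketch of how a journal-scanning loop might use the new helper, assuming the existing vstruct_for_each() iterator: it totals how many u64s worth of keys a journal entry set carries, with write_buffer_keys entries now counted like any other key-bearing entry:

	static unsigned jset_key_u64s(struct jset *j)	/* illustrative helper */
	{
		struct jset_entry *entry;
		unsigned u64s = 0;

		vstruct_for_each(j, entry)
			if (jset_entry_is_key(entry))
				u64s += le16_to_cpu(entry->u64s);
		return u64s;
	}
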
/* /*
* Journal sequence numbers can be blacklisted: bsets record the max sequence * Journal sequence numbers can be blacklisted: bsets record the max sequence
* number of all the journal entries they contain updates for, so that on * number of all the journal entries they contain updates for, so that on
@ -2203,7 +2200,7 @@ struct jset_entry_usage {
struct jset_entry_data_usage { struct jset_entry_data_usage {
struct jset_entry entry; struct jset_entry entry;
__le64 v; __le64 v;
struct bch_replicas_entry r; struct bch_replicas_entry_v1 r;
} __packed; } __packed;
struct jset_entry_clock { struct jset_entry_clock {
@ -2224,8 +2221,8 @@ struct jset_entry_dev_usage {
__le32 dev; __le32 dev;
__u32 pad; __u32 pad;
__le64 buckets_ec; __le64 _buckets_ec; /* No longer used */
__le64 _buckets_unavailable; /* No longer used */ __le64 _buckets_unavailable; /* No longer used */
struct jset_entry_dev_usage_type d[]; struct jset_entry_dev_usage_type d[];
}; };
@ -2239,7 +2236,7 @@ static inline unsigned jset_entry_dev_usage_nr_types(struct jset_entry_dev_usage
struct jset_entry_log { struct jset_entry_log {
struct jset_entry entry; struct jset_entry entry;
u8 d[]; u8 d[];
} __packed; } __packed __aligned(8);
/* /*
* On disk format for a journal entry: * On disk format for a journal entry:


@ -81,6 +81,11 @@ struct bch_ioctl_incremental {
#define BCH_IOCTL_SUBVOLUME_CREATE _IOW(0xbc, 16, struct bch_ioctl_subvolume) #define BCH_IOCTL_SUBVOLUME_CREATE _IOW(0xbc, 16, struct bch_ioctl_subvolume)
#define BCH_IOCTL_SUBVOLUME_DESTROY _IOW(0xbc, 17, struct bch_ioctl_subvolume) #define BCH_IOCTL_SUBVOLUME_DESTROY _IOW(0xbc, 17, struct bch_ioctl_subvolume)
#define BCH_IOCTL_DEV_USAGE_V2 _IOWR(0xbc, 18, struct bch_ioctl_dev_usage_v2)
#define BCH_IOCTL_FSCK_OFFLINE _IOW(0xbc, 19, struct bch_ioctl_fsck_offline)
#define BCH_IOCTL_FSCK_ONLINE _IOW(0xbc, 20, struct bch_ioctl_fsck_online)
/* ioctls below act on a particular file, not the filesystem as a whole: */
#define BCHFS_IOC_REINHERIT_ATTRS _IOR(0xbc, 64, const char __user *) #define BCHFS_IOC_REINHERIT_ATTRS _IOR(0xbc, 64, const char __user *)
@ -173,12 +178,18 @@ struct bch_ioctl_disk_set_state {
__u64 dev; __u64 dev;
}; };
#define BCH_DATA_OPS() \
x(scrub, 0) \
x(rereplicate, 1) \
x(migrate, 2) \
x(rewrite_old_nodes, 3) \
x(drop_extra_replicas, 4)
enum bch_data_ops { enum bch_data_ops {
BCH_DATA_OP_SCRUB = 0, #define x(t, n) BCH_DATA_OP_##t = n,
BCH_DATA_OP_REREPLICATE = 1, BCH_DATA_OPS()
BCH_DATA_OP_MIGRATE = 2, #undef x
BCH_DATA_OP_REWRITE_OLD_NODES = 3, BCH_DATA_OP_NR
BCH_DATA_OP_NR = 4,
}; };
/* /*
@ -237,7 +248,7 @@ struct bch_ioctl_data_event {
struct bch_replicas_usage { struct bch_replicas_usage {
__u64 sectors; __u64 sectors;
struct bch_replicas_entry r; struct bch_replicas_entry_v1 r;
} __packed; } __packed;
static inline struct bch_replicas_usage * static inline struct bch_replicas_usage *
@ -268,7 +279,7 @@ struct bch_ioctl_fs_usage {
__u32 replica_entries_bytes; __u32 replica_entries_bytes;
__u32 pad; __u32 pad;
struct bch_replicas_usage replicas[0]; struct bch_replicas_usage replicas[];
}; };
/* /*
@ -292,7 +303,20 @@ struct bch_ioctl_dev_usage {
__u64 buckets; __u64 buckets;
__u64 sectors; __u64 sectors;
__u64 fragmented; __u64 fragmented;
} d[BCH_DATA_NR]; } d[10];
};
struct bch_ioctl_dev_usage_v2 {
__u64 dev;
__u32 flags;
__u8 state;
__u8 nr_data_types;
__u8 pad[6];
__u32 bucket_size;
__u64 nr_buckets;
struct bch_ioctl_dev_usage_type d[];
}; };
/* /*
@ -365,4 +389,24 @@ struct bch_ioctl_subvolume {
#define BCH_SUBVOL_SNAPSHOT_CREATE (1U << 0) #define BCH_SUBVOL_SNAPSHOT_CREATE (1U << 0)
#define BCH_SUBVOL_SNAPSHOT_RO (1U << 1) #define BCH_SUBVOL_SNAPSHOT_RO (1U << 1)
/*
* BCH_IOCTL_FSCK_OFFLINE: run fsck from the 'bcachefs fsck' userspace command,
* but with the kernel's implementation of fsck:
*/
struct bch_ioctl_fsck_offline {
__u64 flags;
__u64 opts; /* string */
__u64 nr_devs;
__u64 devs[] __counted_by(nr_devs);
};
/*
* BCH_IOCTL_FSCK_ONLINE: run fsck from the 'bcachefs fsck' userspace command,
* but with the kernel's implementation of fsck:
*/
struct bch_ioctl_fsck_online {
__u64 flags;
__u64 opts; /* string */
};
#endif /* _BCACHEFS_IOCTL_H */ #endif /* _BCACHEFS_IOCTL_H */


@ -28,10 +28,8 @@ struct bkey_ops {
void (*swab)(struct bkey_s); void (*swab)(struct bkey_s);
bool (*key_normalize)(struct bch_fs *, struct bkey_s); bool (*key_normalize)(struct bch_fs *, struct bkey_s);
bool (*key_merge)(struct bch_fs *, struct bkey_s, struct bkey_s_c); bool (*key_merge)(struct bch_fs *, struct bkey_s, struct bkey_s_c);
int (*trans_trigger)(struct btree_trans *, enum btree_id, unsigned, int (*trigger)(struct btree_trans *, enum btree_id, unsigned,
struct bkey_s_c, struct bkey_i *, unsigned); struct bkey_s_c, struct bkey_s, unsigned);
int (*atomic_trigger)(struct btree_trans *, enum btree_id, unsigned,
struct bkey_s_c, struct bkey_s_c, unsigned);
void (*compat)(enum btree_id id, unsigned version, void (*compat)(enum btree_id id, unsigned version,
unsigned big_endian, int write, unsigned big_endian, int write,
struct bkey_s); struct bkey_s);
@ -78,84 +76,86 @@ static inline bool bch2_bkey_maybe_mergable(const struct bkey *l, const struct b
bool bch2_bkey_merge(struct bch_fs *, struct bkey_s, struct bkey_s_c); bool bch2_bkey_merge(struct bch_fs *, struct bkey_s, struct bkey_s_c);
static inline int bch2_mark_key(struct btree_trans *trans,
enum btree_id btree, unsigned level,
struct bkey_s_c old, struct bkey_s_c new,
unsigned flags)
{
const struct bkey_ops *ops = bch2_bkey_type_ops(old.k->type ?: new.k->type);
return ops->atomic_trigger
? ops->atomic_trigger(trans, btree, level, old, new, flags)
: 0;
}
enum btree_update_flags { enum btree_update_flags {
__BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE = __BTREE_ITER_FLAGS_END, __BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE = __BTREE_ITER_FLAGS_END,
__BTREE_UPDATE_NOJOURNAL, __BTREE_UPDATE_NOJOURNAL,
__BTREE_UPDATE_PREJOURNAL,
__BTREE_UPDATE_KEY_CACHE_RECLAIM, __BTREE_UPDATE_KEY_CACHE_RECLAIM,
__BTREE_TRIGGER_NORUN, /* Don't run triggers at all */ __BTREE_TRIGGER_NORUN,
__BTREE_TRIGGER_TRANSACTIONAL,
__BTREE_TRIGGER_INSERT, __BTREE_TRIGGER_INSERT,
__BTREE_TRIGGER_OVERWRITE, __BTREE_TRIGGER_OVERWRITE,
__BTREE_TRIGGER_GC, __BTREE_TRIGGER_GC,
__BTREE_TRIGGER_BUCKET_INVALIDATE, __BTREE_TRIGGER_BUCKET_INVALIDATE,
__BTREE_TRIGGER_NOATOMIC,
}; };
#define BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE (1U << __BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE) #define BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE (1U << __BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE)
#define BTREE_UPDATE_NOJOURNAL (1U << __BTREE_UPDATE_NOJOURNAL) #define BTREE_UPDATE_NOJOURNAL (1U << __BTREE_UPDATE_NOJOURNAL)
#define BTREE_UPDATE_PREJOURNAL (1U << __BTREE_UPDATE_PREJOURNAL)
#define BTREE_UPDATE_KEY_CACHE_RECLAIM (1U << __BTREE_UPDATE_KEY_CACHE_RECLAIM) #define BTREE_UPDATE_KEY_CACHE_RECLAIM (1U << __BTREE_UPDATE_KEY_CACHE_RECLAIM)
/* Don't run triggers at all */
#define BTREE_TRIGGER_NORUN (1U << __BTREE_TRIGGER_NORUN) #define BTREE_TRIGGER_NORUN (1U << __BTREE_TRIGGER_NORUN)
/*
* If set, we're running transactional triggers as part of a transaction commit:
* triggers may generate new updates
*
* If cleared, and either BTREE_TRIGGER_INSERT|BTREE_TRIGGER_OVERWRITE are set,
* we're running atomic triggers during a transaction commit: we have our
* journal reservation, we're holding btree node write locks, and we know the
* transaction is going to commit (returning an error here is a fatal error,
* causing us to go emergency read-only)
*/
#define BTREE_TRIGGER_TRANSACTIONAL (1U << __BTREE_TRIGGER_TRANSACTIONAL)
/* @new is entering the btree */
#define BTREE_TRIGGER_INSERT (1U << __BTREE_TRIGGER_INSERT) #define BTREE_TRIGGER_INSERT (1U << __BTREE_TRIGGER_INSERT)
/* @old is leaving the btree */
#define BTREE_TRIGGER_OVERWRITE (1U << __BTREE_TRIGGER_OVERWRITE) #define BTREE_TRIGGER_OVERWRITE (1U << __BTREE_TRIGGER_OVERWRITE)
/* We're in gc/fsck: running triggers to recalculate e.g. disk usage */
#define BTREE_TRIGGER_GC (1U << __BTREE_TRIGGER_GC) #define BTREE_TRIGGER_GC (1U << __BTREE_TRIGGER_GC)
/* signal from bucket invalidate path to alloc trigger */
#define BTREE_TRIGGER_BUCKET_INVALIDATE (1U << __BTREE_TRIGGER_BUCKET_INVALIDATE) #define BTREE_TRIGGER_BUCKET_INVALIDATE (1U << __BTREE_TRIGGER_BUCKET_INVALIDATE)
#define BTREE_TRIGGER_NOATOMIC (1U << __BTREE_TRIGGER_NOATOMIC)
static inline int bch2_trans_mark_key(struct btree_trans *trans, static inline int bch2_key_trigger(struct btree_trans *trans,
enum btree_id btree_id, unsigned level, enum btree_id btree, unsigned level,
struct bkey_s_c old, struct bkey_i *new, struct bkey_s_c old, struct bkey_s new,
unsigned flags) unsigned flags)
{ {
const struct bkey_ops *ops = bch2_bkey_type_ops(old.k->type ?: new->k.type); const struct bkey_ops *ops = bch2_bkey_type_ops(old.k->type ?: new.k->type);
return ops->trans_trigger return ops->trigger
? ops->trans_trigger(trans, btree_id, level, old, new, flags) ? ops->trigger(trans, btree, level, old, new, flags)
: 0; : 0;
} }
static inline int bch2_trans_mark_old(struct btree_trans *trans, static inline int bch2_key_trigger_old(struct btree_trans *trans,
enum btree_id btree_id, unsigned level, enum btree_id btree_id, unsigned level,
struct bkey_s_c old, unsigned flags) struct bkey_s_c old, unsigned flags)
{ {
struct bkey_i deleted; struct bkey_i deleted;
bkey_init(&deleted.k); bkey_init(&deleted.k);
deleted.k.p = old.k->p; deleted.k.p = old.k->p;
return bch2_trans_mark_key(trans, btree_id, level, old, &deleted, return bch2_key_trigger(trans, btree_id, level, old, bkey_i_to_s(&deleted),
BTREE_TRIGGER_OVERWRITE|flags); BTREE_TRIGGER_OVERWRITE|flags);
} }
static inline int bch2_trans_mark_new(struct btree_trans *trans, static inline int bch2_key_trigger_new(struct btree_trans *trans,
enum btree_id btree_id, unsigned level, enum btree_id btree_id, unsigned level,
struct bkey_i *new, unsigned flags) struct bkey_s new, unsigned flags)
{ {
struct bkey_i deleted; struct bkey_i deleted;
bkey_init(&deleted.k); bkey_init(&deleted.k);
deleted.k.p = new->k.p; deleted.k.p = new.k->p;
return bch2_trans_mark_key(trans, btree_id, level, bkey_i_to_s_c(&deleted), new, return bch2_key_trigger(trans, btree_id, level, bkey_i_to_s_c(&deleted), new,
BTREE_TRIGGER_INSERT|flags); BTREE_TRIGGER_INSERT|flags);
} }
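
The old atomic/transactional trigger pair has collapsed into a single hook; which path is being run is now conveyed by flags. A sketch of a trigger for a hypothetical key type, dispatching on the flags defined above (the two helpers are hypothetical):

	static int bch2_trigger_example(struct btree_trans *trans,
					enum btree_id btree, unsigned level,
					struct bkey_s_c old, struct bkey_s new,
					unsigned flags)
	{
		if (flags & BTREE_TRIGGER_TRANSACTIONAL)
			return example_update_accounting(trans, old, new);	/* may emit further updates */

		if (flags & BTREE_TRIGGER_GC)
			return example_recalc_usage_gc(trans->c, new.s_c);	/* gc/fsck recount */

		return 0;
	}
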
void bch2_bkey_renumber(enum btree_node_type, struct bkey_packed *, int); void bch2_bkey_renumber(enum btree_node_type, struct bkey_packed *, int);


@ -68,6 +68,12 @@ void bch2_dump_bset(struct bch_fs *c, struct btree *b,
_k = _n) { _k = _n) {
_n = bkey_p_next(_k); _n = bkey_p_next(_k);
if (!_k->u64s) {
printk(KERN_ERR "block %u key %5zu - u64s 0? aieee!\n", set,
_k->_data - i->_data);
break;
}
k = bkey_disassemble(b, _k, &uk); k = bkey_disassemble(b, _k, &uk);
printbuf_reset(&buf); printbuf_reset(&buf);


@ -500,19 +500,21 @@ void bch2_fs_btree_cache_init_early(struct btree_cache *bc)
* cannibalize_bucket() will take. This means every time we unlock the root of * cannibalize_bucket() will take. This means every time we unlock the root of
* the btree, we need to release this lock if we have it held. * the btree, we need to release this lock if we have it held.
*/ */
void bch2_btree_cache_cannibalize_unlock(struct bch_fs *c) void bch2_btree_cache_cannibalize_unlock(struct btree_trans *trans)
{ {
struct bch_fs *c = trans->c;
struct btree_cache *bc = &c->btree_cache; struct btree_cache *bc = &c->btree_cache;
if (bc->alloc_lock == current) { if (bc->alloc_lock == current) {
trace_and_count(c, btree_cache_cannibalize_unlock, c); trace_and_count(c, btree_cache_cannibalize_unlock, trans);
bc->alloc_lock = NULL; bc->alloc_lock = NULL;
closure_wake_up(&bc->alloc_wait); closure_wake_up(&bc->alloc_wait);
} }
} }
int bch2_btree_cache_cannibalize_lock(struct bch_fs *c, struct closure *cl) int bch2_btree_cache_cannibalize_lock(struct btree_trans *trans, struct closure *cl)
{ {
struct bch_fs *c = trans->c;
struct btree_cache *bc = &c->btree_cache; struct btree_cache *bc = &c->btree_cache;
struct task_struct *old; struct task_struct *old;
@ -521,7 +523,7 @@ int bch2_btree_cache_cannibalize_lock(struct bch_fs *c, struct closure *cl)
goto success; goto success;
if (!cl) { if (!cl) {
trace_and_count(c, btree_cache_cannibalize_lock_fail, c); trace_and_count(c, btree_cache_cannibalize_lock_fail, trans);
return -BCH_ERR_ENOMEM_btree_cache_cannibalize_lock; return -BCH_ERR_ENOMEM_btree_cache_cannibalize_lock;
} }
@ -535,11 +537,11 @@ int bch2_btree_cache_cannibalize_lock(struct bch_fs *c, struct closure *cl)
goto success; goto success;
} }
trace_and_count(c, btree_cache_cannibalize_lock_fail, c); trace_and_count(c, btree_cache_cannibalize_lock_fail, trans);
return -BCH_ERR_btree_cache_cannibalize_lock_blocked; return -BCH_ERR_btree_cache_cannibalize_lock_blocked;
success: success:
trace_and_count(c, btree_cache_cannibalize_lock, c); trace_and_count(c, btree_cache_cannibalize_lock, trans);
return 0; return 0;
} }
@ -673,7 +675,7 @@ struct btree *bch2_btree_node_mem_alloc(struct btree_trans *trans, bool pcpu_rea
mutex_unlock(&bc->lock); mutex_unlock(&bc->lock);
trace_and_count(c, btree_cache_cannibalize, c); trace_and_count(c, btree_cache_cannibalize, trans);
goto out; goto out;
} }
@ -717,12 +719,6 @@ static noinline struct btree *bch2_btree_node_fill(struct btree_trans *trans,
if (IS_ERR(b)) if (IS_ERR(b))
return b; return b;
/*
* Btree nodes read in from disk should not have the accessed bit set
* initially, so that linear scans don't thrash the cache:
*/
clear_btree_node_accessed(b);
bkey_copy(&b->key, k); bkey_copy(&b->key, k);
if (bch2_btree_node_hash_insert(bc, b, level, btree_id)) { if (bch2_btree_node_hash_insert(bc, b, level, btree_id)) {
/* raced with another fill: */ /* raced with another fill: */
@ -749,7 +745,7 @@ static noinline struct btree *bch2_btree_node_fill(struct btree_trans *trans,
if (path && sync) if (path && sync)
bch2_trans_unlock_noassert(trans); bch2_trans_unlock_noassert(trans);
bch2_btree_node_read(c, b, sync); bch2_btree_node_read(trans, b, sync);
if (!sync) if (!sync)
return NULL; return NULL;
@ -1039,7 +1035,7 @@ struct btree *bch2_btree_node_get_noiter(struct btree_trans *trans,
goto retry; goto retry;
if (IS_ERR(b) && if (IS_ERR(b) &&
!bch2_btree_cache_cannibalize_lock(c, NULL)) !bch2_btree_cache_cannibalize_lock(trans, NULL))
goto retry; goto retry;
if (IS_ERR(b)) if (IS_ERR(b))
@ -1087,7 +1083,7 @@ struct btree *bch2_btree_node_get_noiter(struct btree_trans *trans,
EBUG_ON(BTREE_NODE_LEVEL(b->data) != level); EBUG_ON(BTREE_NODE_LEVEL(b->data) != level);
btree_check_header(c, b); btree_check_header(c, b);
out: out:
bch2_btree_cache_cannibalize_unlock(c); bch2_btree_cache_cannibalize_unlock(trans);
return b; return b;
} }


@ -17,8 +17,8 @@ int __bch2_btree_node_hash_insert(struct btree_cache *, struct btree *);
int bch2_btree_node_hash_insert(struct btree_cache *, struct btree *, int bch2_btree_node_hash_insert(struct btree_cache *, struct btree *,
unsigned, enum btree_id); unsigned, enum btree_id);
void bch2_btree_cache_cannibalize_unlock(struct bch_fs *); void bch2_btree_cache_cannibalize_unlock(struct btree_trans *);
int bch2_btree_cache_cannibalize_lock(struct bch_fs *, struct closure *); int bch2_btree_cache_cannibalize_lock(struct btree_trans *, struct closure *);
struct btree *__bch2_btree_node_mem_alloc(struct bch_fs *); struct btree *__bch2_btree_node_mem_alloc(struct bch_fs *);
struct btree *bch2_btree_node_mem_alloc(struct btree_trans *, bool); struct btree *bch2_btree_node_mem_alloc(struct btree_trans *, bool);


@ -41,6 +41,14 @@
#define DROP_THIS_NODE 10 #define DROP_THIS_NODE 10
#define DROP_PREV_NODE 11 #define DROP_PREV_NODE 11
static struct bkey_s unsafe_bkey_s_c_to_s(struct bkey_s_c k)
{
return (struct bkey_s) {{{
(struct bkey *) k.k,
(struct bch_val *) k.v
}}};
}
static bool should_restart_for_topology_repair(struct bch_fs *c) static bool should_restart_for_topology_repair(struct bch_fs *c)
{ {
return c->opts.fix_errors != FSCK_FIX_no && return c->opts.fix_errors != FSCK_FIX_no &&
@ -108,7 +116,7 @@ static int bch2_gc_check_topology(struct bch_fs *c,
ret = bch2_run_explicit_recovery_pass(c, BCH_RECOVERY_PASS_check_topology); ret = bch2_run_explicit_recovery_pass(c, BCH_RECOVERY_PASS_check_topology);
goto err; goto err;
} else { } else {
set_bit(BCH_FS_INITIAL_GC_UNFIXED, &c->flags); set_bit(BCH_FS_initial_gc_unfixed, &c->flags);
} }
} }
} }
@ -134,7 +142,7 @@ static int bch2_gc_check_topology(struct bch_fs *c,
ret = bch2_run_explicit_recovery_pass(c, BCH_RECOVERY_PASS_check_topology); ret = bch2_run_explicit_recovery_pass(c, BCH_RECOVERY_PASS_check_topology);
goto err; goto err;
} else { } else {
set_bit(BCH_FS_INITIAL_GC_UNFIXED, &c->flags); set_bit(BCH_FS_initial_gc_unfixed, &c->flags);
} }
} }
@ -414,10 +422,9 @@ static int bch2_btree_repair_topology_recurse(struct btree_trans *trans, struct
continue; continue;
} }
if (ret) { bch_err_msg(c, ret, "getting btree node");
bch_err_msg(c, ret, "getting btree node"); if (ret)
break; break;
}
ret = btree_repair_node_boundaries(c, b, prev, cur); ret = btree_repair_node_boundaries(c, b, prev, cur);
@ -482,10 +489,9 @@ static int bch2_btree_repair_topology_recurse(struct btree_trans *trans, struct
false); false);
ret = PTR_ERR_OR_ZERO(cur); ret = PTR_ERR_OR_ZERO(cur);
if (ret) { bch_err_msg(c, ret, "getting btree node");
bch_err_msg(c, ret, "getting btree node"); if (ret)
goto err; goto err;
}
ret = bch2_btree_repair_topology_recurse(trans, cur); ret = bch2_btree_repair_topology_recurse(trans, cur);
six_unlock_read(&cur->c.lock); six_unlock_read(&cur->c.lock);
@ -619,7 +625,7 @@ static int bch2_check_fix_ptrs(struct btree_trans *trans, enum btree_id btree_id
g->data_type = 0; g->data_type = 0;
g->dirty_sectors = 0; g->dirty_sectors = 0;
g->cached_sectors = 0; g->cached_sectors = 0;
set_bit(BCH_FS_NEED_ANOTHER_GC, &c->flags); set_bit(BCH_FS_need_another_gc, &c->flags);
} else { } else {
do_update = true; do_update = true;
} }
@ -664,7 +670,7 @@ static int bch2_check_fix_ptrs(struct btree_trans *trans, enum btree_id btree_id
bch2_bkey_val_to_text(&buf, c, *k), buf.buf))) { bch2_bkey_val_to_text(&buf, c, *k), buf.buf))) {
if (data_type == BCH_DATA_btree) { if (data_type == BCH_DATA_btree) {
g->data_type = data_type; g->data_type = data_type;
set_bit(BCH_FS_NEED_ANOTHER_GC, &c->flags); set_bit(BCH_FS_need_another_gc, &c->flags);
} else { } else {
do_update = true; do_update = true;
} }
@ -707,8 +713,8 @@ static int bch2_check_fix_ptrs(struct btree_trans *trans, enum btree_id btree_id
new = kmalloc(bkey_bytes(k->k), GFP_KERNEL); new = kmalloc(bkey_bytes(k->k), GFP_KERNEL);
if (!new) { if (!new) {
bch_err_msg(c, ret, "allocating new key");
ret = -BCH_ERR_ENOMEM_gc_repair_key; ret = -BCH_ERR_ENOMEM_gc_repair_key;
bch_err_msg(c, ret, "allocating new key");
goto err; goto err;
} }
@ -807,9 +813,6 @@ static int bch2_gc_mark_key(struct btree_trans *trans, enum btree_id btree_id,
struct bch_fs *c = trans->c; struct bch_fs *c = trans->c;
struct bkey deleted = KEY(0, 0, 0); struct bkey deleted = KEY(0, 0, 0);
struct bkey_s_c old = (struct bkey_s_c) { &deleted, NULL }; struct bkey_s_c old = (struct bkey_s_c) { &deleted, NULL };
unsigned flags =
BTREE_TRIGGER_GC|
(initial ? BTREE_TRIGGER_NOATOMIC : 0);
int ret = 0; int ret = 0;
deleted.p = k->k->p; deleted.p = k->k->p;
@ -831,11 +834,10 @@ static int bch2_gc_mark_key(struct btree_trans *trans, enum btree_id btree_id,
} }
ret = commit_do(trans, NULL, NULL, 0, ret = commit_do(trans, NULL, NULL, 0,
bch2_mark_key(trans, btree_id, level, old, *k, flags)); bch2_key_trigger(trans, btree_id, level, old, unsafe_bkey_s_c_to_s(*k), BTREE_TRIGGER_GC));
fsck_err: fsck_err:
err: err:
if (ret) bch_err_fn(c, ret);
bch_err_fn(c, ret);
return ret; return ret;
} }
@ -996,7 +998,7 @@ static int bch2_gc_btree_init_recurse(struct btree_trans *trans, struct btree *b
/* Continue marking when opted to not /* Continue marking when opted to not
* fix the error: */ * fix the error: */
ret = 0; ret = 0;
set_bit(BCH_FS_INITIAL_GC_UNFIXED, &c->flags); set_bit(BCH_FS_initial_gc_unfixed, &c->flags);
continue; continue;
} }
} else if (ret) { } else if (ret) {
@ -1068,8 +1070,7 @@ static int bch2_gc_btree_init(struct btree_trans *trans,
fsck_err: fsck_err:
six_unlock_read(&b->c.lock); six_unlock_read(&b->c.lock);
if (ret < 0) bch_err_fn(c, ret);
bch_err_fn(c, ret);
printbuf_exit(&buf); printbuf_exit(&buf);
return ret; return ret;
} }
@ -1105,10 +1106,8 @@ static int bch2_gc_btrees(struct bch_fs *c, bool initial, bool metadata_only)
: bch2_gc_btree(trans, i, initial, metadata_only); : bch2_gc_btree(trans, i, initial, metadata_only);
} }
if (ret < 0)
bch_err_fn(c, ret);
bch2_trans_put(trans); bch2_trans_put(trans);
bch_err_fn(c, ret);
return ret; return ret;
} }
@ -1159,13 +1158,10 @@ static void bch2_mark_dev_superblock(struct bch_fs *c, struct bch_dev *ca,
static void bch2_mark_superblocks(struct bch_fs *c) static void bch2_mark_superblocks(struct bch_fs *c)
{ {
struct bch_dev *ca;
unsigned i;
mutex_lock(&c->sb_lock); mutex_lock(&c->sb_lock);
gc_pos_set(c, gc_phase(GC_PHASE_SB)); gc_pos_set(c, gc_phase(GC_PHASE_SB));
for_each_online_member(ca, c, i) for_each_online_member(c, ca)
bch2_mark_dev_superblock(c, ca, BTREE_TRIGGER_GC); bch2_mark_dev_superblock(c, ca, BTREE_TRIGGER_GC);
mutex_unlock(&c->sb_lock); mutex_unlock(&c->sb_lock);
} }
@ -1190,13 +1186,10 @@ static void bch2_mark_pending_btree_node_frees(struct bch_fs *c)
static void bch2_gc_free(struct bch_fs *c) static void bch2_gc_free(struct bch_fs *c)
{ {
struct bch_dev *ca;
unsigned i;
genradix_free(&c->reflink_gc_table); genradix_free(&c->reflink_gc_table);
genradix_free(&c->gc_stripes); genradix_free(&c->gc_stripes);
for_each_member_device(ca, c, i) { for_each_member_device(c, ca) {
kvpfree(rcu_dereference_protected(ca->buckets_gc, 1), kvpfree(rcu_dereference_protected(ca->buckets_gc, 1),
sizeof(struct bucket_array) + sizeof(struct bucket_array) +
ca->mi.nbuckets * sizeof(struct bucket)); ca->mi.nbuckets * sizeof(struct bucket));
@ -1218,7 +1211,7 @@ static int bch2_gc_done(struct bch_fs *c,
bool verify = !metadata_only && bool verify = !metadata_only &&
!c->opts.reconstruct_alloc && !c->opts.reconstruct_alloc &&
(!initial || (c->sb.compat & (1ULL << BCH_COMPAT_alloc_info))); (!initial || (c->sb.compat & (1ULL << BCH_COMPAT_alloc_info)));
unsigned i, dev; unsigned i;
int ret = 0; int ret = 0;
percpu_down_write(&c->mark_lock); percpu_down_write(&c->mark_lock);
@ -1230,14 +1223,14 @@ static int bch2_gc_done(struct bch_fs *c,
, ##__VA_ARGS__, dst->_f, src->_f))) \ , ##__VA_ARGS__, dst->_f, src->_f))) \
dst->_f = src->_f dst->_f = src->_f
#define copy_dev_field(_err, _f, _msg, ...) \ #define copy_dev_field(_err, _f, _msg, ...) \
copy_field(_err, _f, "dev %u has wrong " _msg, dev, ##__VA_ARGS__) copy_field(_err, _f, "dev %u has wrong " _msg, ca->dev_idx, ##__VA_ARGS__)
#define copy_fs_field(_err, _f, _msg, ...) \ #define copy_fs_field(_err, _f, _msg, ...) \
copy_field(_err, _f, "fs has wrong " _msg, ##__VA_ARGS__) copy_field(_err, _f, "fs has wrong " _msg, ##__VA_ARGS__)
for (i = 0; i < ARRAY_SIZE(c->usage); i++) for (i = 0; i < ARRAY_SIZE(c->usage); i++)
bch2_fs_usage_acc_to_base(c, i); bch2_fs_usage_acc_to_base(c, i);
for_each_member_device(ca, c, dev) { __for_each_member_device(c, ca) {
struct bch_dev_usage *dst = ca->usage_base; struct bch_dev_usage *dst = ca->usage_base;
struct bch_dev_usage *src = (void *) struct bch_dev_usage *src = (void *)
bch2_acc_percpu_u64s((u64 __percpu *) ca->usage_gc, bch2_acc_percpu_u64s((u64 __percpu *) ca->usage_gc,
@ -1251,9 +1244,6 @@ static int bch2_gc_done(struct bch_fs *c,
copy_dev_field(dev_usage_fragmented_wrong, copy_dev_field(dev_usage_fragmented_wrong,
d[i].fragmented, "%s fragmented", bch2_data_types[i]); d[i].fragmented, "%s fragmented", bch2_data_types[i]);
} }
copy_dev_field(dev_usage_buckets_ec_wrong,
buckets_ec, "buckets_ec");
} }
{ {
@ -1284,7 +1274,7 @@ static int bch2_gc_done(struct bch_fs *c,
} }
for (i = 0; i < c->replicas.nr; i++) { for (i = 0; i < c->replicas.nr; i++) {
struct bch_replicas_entry *e = struct bch_replicas_entry_v1 *e =
cpu_replicas_entry(&c->replicas, i); cpu_replicas_entry(&c->replicas, i);
if (metadata_only && if (metadata_only &&
@ -1307,8 +1297,7 @@ static int bch2_gc_done(struct bch_fs *c,
fsck_err: fsck_err:
if (ca) if (ca)
percpu_ref_put(&ca->ref); percpu_ref_put(&ca->ref);
if (ret) bch_err_fn(c, ret);
bch_err_fn(c, ret);
percpu_up_write(&c->mark_lock); percpu_up_write(&c->mark_lock);
printbuf_exit(&buf); printbuf_exit(&buf);
@ -1317,9 +1306,6 @@ static int bch2_gc_done(struct bch_fs *c,
static int bch2_gc_start(struct bch_fs *c) static int bch2_gc_start(struct bch_fs *c)
{ {
struct bch_dev *ca = NULL;
unsigned i;
BUG_ON(c->usage_gc); BUG_ON(c->usage_gc);
c->usage_gc = __alloc_percpu_gfp(fs_usage_u64s(c) * sizeof(u64), c->usage_gc = __alloc_percpu_gfp(fs_usage_u64s(c) * sizeof(u64),
@ -1329,7 +1315,7 @@ static int bch2_gc_start(struct bch_fs *c)
return -BCH_ERR_ENOMEM_gc_start; return -BCH_ERR_ENOMEM_gc_start;
} }
for_each_member_device(ca, c, i) { for_each_member_device(c, ca) {
BUG_ON(ca->usage_gc); BUG_ON(ca->usage_gc);
ca->usage_gc = alloc_percpu(struct bch_dev_usage); ca->usage_gc = alloc_percpu(struct bch_dev_usage);
@ -1348,10 +1334,7 @@ static int bch2_gc_start(struct bch_fs *c)
static int bch2_gc_reset(struct bch_fs *c) static int bch2_gc_reset(struct bch_fs *c)
{ {
struct bch_dev *ca; for_each_member_device(c, ca) {
unsigned i;
for_each_member_device(ca, c, i) {
free_percpu(ca->usage_gc); free_percpu(ca->usage_gc);
ca->usage_gc = NULL; ca->usage_gc = NULL;
} }
@ -1389,9 +1372,6 @@ static int bch2_alloc_write_key(struct btree_trans *trans,
enum bch_data_type type; enum bch_data_type type;
int ret; int ret;
if (bkey_ge(iter->pos, POS(ca->dev_idx, ca->mi.nbuckets)))
return 1;
old = bch2_alloc_to_v4(k, &old_convert); old = bch2_alloc_to_v4(k, &old_convert);
new = *old; new = *old;
@ -1488,52 +1468,36 @@ static int bch2_alloc_write_key(struct btree_trans *trans,
static int bch2_gc_alloc_done(struct bch_fs *c, bool metadata_only) static int bch2_gc_alloc_done(struct bch_fs *c, bool metadata_only)
{ {
struct btree_trans *trans = bch2_trans_get(c);
struct btree_iter iter;
struct bkey_s_c k;
struct bch_dev *ca;
unsigned i;
int ret = 0; int ret = 0;
for_each_member_device(ca, c, i) { for_each_member_device(c, ca) {
ret = for_each_btree_key_commit(trans, iter, BTREE_ID_alloc, ret = bch2_trans_run(c,
POS(ca->dev_idx, ca->mi.first_bucket), for_each_btree_key_upto_commit(trans, iter, BTREE_ID_alloc,
BTREE_ITER_SLOTS|BTREE_ITER_PREFETCH, k, POS(ca->dev_idx, ca->mi.first_bucket),
NULL, NULL, BTREE_INSERT_LAZY_RW, POS(ca->dev_idx, ca->mi.nbuckets - 1),
bch2_alloc_write_key(trans, &iter, k, metadata_only)); BTREE_ITER_SLOTS|BTREE_ITER_PREFETCH, k,
NULL, NULL, BCH_TRANS_COMMIT_lazy_rw,
if (ret < 0) { bch2_alloc_write_key(trans, &iter, k, metadata_only)));
bch_err_fn(c, ret); if (ret) {
percpu_ref_put(&ca->ref); percpu_ref_put(&ca->ref);
break; break;
} }
} }
bch2_trans_put(trans); bch_err_fn(c, ret);
return ret < 0 ? ret : 0; return ret;
} }
static int bch2_gc_alloc_start(struct bch_fs *c, bool metadata_only) static int bch2_gc_alloc_start(struct bch_fs *c, bool metadata_only)
{ {
struct bch_dev *ca; for_each_member_device(c, ca) {
struct btree_trans *trans = bch2_trans_get(c);
struct btree_iter iter;
struct bkey_s_c k;
struct bucket *g;
struct bch_alloc_v4 a_convert;
const struct bch_alloc_v4 *a;
unsigned i;
int ret;
for_each_member_device(ca, c, i) {
struct bucket_array *buckets = kvpmalloc(sizeof(struct bucket_array) + struct bucket_array *buckets = kvpmalloc(sizeof(struct bucket_array) +
ca->mi.nbuckets * sizeof(struct bucket), ca->mi.nbuckets * sizeof(struct bucket),
GFP_KERNEL|__GFP_ZERO); GFP_KERNEL|__GFP_ZERO);
if (!buckets) { if (!buckets) {
percpu_ref_put(&ca->ref); percpu_ref_put(&ca->ref);
bch_err(c, "error allocating ca->buckets[gc]"); bch_err(c, "error allocating ca->buckets[gc]");
ret = -BCH_ERR_ENOMEM_gc_alloc_start; return -BCH_ERR_ENOMEM_gc_alloc_start;
goto err;
} }
buckets->first_bucket = ca->mi.first_bucket; buckets->first_bucket = ca->mi.first_bucket;
@ -1541,42 +1505,38 @@ static int bch2_gc_alloc_start(struct bch_fs *c, bool metadata_only)
rcu_assign_pointer(ca->buckets_gc, buckets); rcu_assign_pointer(ca->buckets_gc, buckets);
} }
ret = for_each_btree_key2(trans, iter, BTREE_ID_alloc, POS_MIN, int ret = bch2_trans_run(c,
BTREE_ITER_PREFETCH, k, ({ for_each_btree_key(trans, iter, BTREE_ID_alloc, POS_MIN,
ca = bch_dev_bkey_exists(c, k.k->p.inode); BTREE_ITER_PREFETCH, k, ({
g = gc_bucket(ca, k.k->p.offset); struct bch_dev *ca = bch_dev_bkey_exists(c, k.k->p.inode);
struct bucket *g = gc_bucket(ca, k.k->p.offset);
a = bch2_alloc_to_v4(k, &a_convert); struct bch_alloc_v4 a_convert;
const struct bch_alloc_v4 *a = bch2_alloc_to_v4(k, &a_convert);
g->gen_valid = 1; g->gen_valid = 1;
g->gen = a->gen; g->gen = a->gen;
if (metadata_only && if (metadata_only &&
(a->data_type == BCH_DATA_user || (a->data_type == BCH_DATA_user ||
a->data_type == BCH_DATA_cached || a->data_type == BCH_DATA_cached ||
a->data_type == BCH_DATA_parity)) { a->data_type == BCH_DATA_parity)) {
g->data_type = a->data_type; g->data_type = a->data_type;
g->dirty_sectors = a->dirty_sectors; g->dirty_sectors = a->dirty_sectors;
g->cached_sectors = a->cached_sectors; g->cached_sectors = a->cached_sectors;
g->stripe = a->stripe; g->stripe = a->stripe;
g->stripe_redundancy = a->stripe_redundancy; g->stripe_redundancy = a->stripe_redundancy;
} }
0; 0;
})); })));
err: bch_err_fn(c, ret);
bch2_trans_put(trans);
if (ret)
bch_err_fn(c, ret);
return ret; return ret;
} }
static void bch2_gc_alloc_reset(struct bch_fs *c, bool metadata_only) static void bch2_gc_alloc_reset(struct bch_fs *c, bool metadata_only)
{ {
struct bch_dev *ca; for_each_member_device(c, ca) {
unsigned i;
for_each_member_device(ca, c, i) {
struct bucket_array *buckets = gc_bucket_array(ca); struct bucket_array *buckets = gc_bucket_array(ca);
struct bucket *g; struct bucket *g;
@ -1634,7 +1594,7 @@ static int bch2_gc_write_reflink_key(struct btree_trans *trans,
if (!r->refcount) if (!r->refcount)
new->k.type = KEY_TYPE_deleted; new->k.type = KEY_TYPE_deleted;
else else
*bkey_refcount(new) = cpu_to_le64(r->refcount); *bkey_refcount(bkey_i_to_s(new)) = cpu_to_le64(r->refcount);
} }
fsck_err: fsck_err:
printbuf_exit(&buf); printbuf_exit(&buf);
@ -1643,64 +1603,52 @@ static int bch2_gc_write_reflink_key(struct btree_trans *trans,
static int bch2_gc_reflink_done(struct bch_fs *c, bool metadata_only) static int bch2_gc_reflink_done(struct bch_fs *c, bool metadata_only)
{ {
struct btree_trans *trans;
struct btree_iter iter;
struct bkey_s_c k;
size_t idx = 0; size_t idx = 0;
int ret = 0;
if (metadata_only) if (metadata_only)
return 0; return 0;
trans = bch2_trans_get(c); int ret = bch2_trans_run(c,
for_each_btree_key_commit(trans, iter,
ret = for_each_btree_key_commit(trans, iter, BTREE_ID_reflink, POS_MIN,
BTREE_ID_reflink, POS_MIN, BTREE_ITER_PREFETCH, k,
BTREE_ITER_PREFETCH, k, NULL, NULL, BCH_TRANS_COMMIT_no_enospc,
NULL, NULL, BTREE_INSERT_NOFAIL, bch2_gc_write_reflink_key(trans, &iter, k, &idx)));
bch2_gc_write_reflink_key(trans, &iter, k, &idx));
c->reflink_gc_nr = 0; c->reflink_gc_nr = 0;
bch2_trans_put(trans);
return ret; return ret;
} }
static int bch2_gc_reflink_start(struct bch_fs *c, static int bch2_gc_reflink_start(struct bch_fs *c,
bool metadata_only) bool metadata_only)
{ {
struct btree_trans *trans;
struct btree_iter iter;
struct bkey_s_c k;
struct reflink_gc *r;
int ret = 0;
if (metadata_only) if (metadata_only)
return 0; return 0;
trans = bch2_trans_get(c);
c->reflink_gc_nr = 0; c->reflink_gc_nr = 0;
for_each_btree_key(trans, iter, BTREE_ID_reflink, POS_MIN, int ret = bch2_trans_run(c,
BTREE_ITER_PREFETCH, k, ret) { for_each_btree_key(trans, iter, BTREE_ID_reflink, POS_MIN,
const __le64 *refcount = bkey_refcount_c(k); BTREE_ITER_PREFETCH, k, ({
const __le64 *refcount = bkey_refcount_c(k);
if (!refcount) if (!refcount)
continue; continue;
r = genradix_ptr_alloc(&c->reflink_gc_table, c->reflink_gc_nr++, struct reflink_gc *r = genradix_ptr_alloc(&c->reflink_gc_table,
GFP_KERNEL); c->reflink_gc_nr++, GFP_KERNEL);
if (!r) { if (!r) {
ret = -BCH_ERR_ENOMEM_gc_reflink_start; ret = -BCH_ERR_ENOMEM_gc_reflink_start;
break; break;
} }
r->offset = k.k->p.offset; r->offset = k.k->p.offset;
r->size = k.k->size; r->size = k.k->size;
r->refcount = 0; r->refcount = 0;
} 0;
bch2_trans_iter_exit(trans, &iter); })));
bch2_trans_put(trans); bch_err_fn(c, ret);
return ret; return ret;
} }
@ -1768,24 +1716,15 @@ static int bch2_gc_write_stripes_key(struct btree_trans *trans,
static int bch2_gc_stripes_done(struct bch_fs *c, bool metadata_only) static int bch2_gc_stripes_done(struct bch_fs *c, bool metadata_only)
{ {
struct btree_trans *trans;
struct btree_iter iter;
struct bkey_s_c k;
int ret = 0;
if (metadata_only) if (metadata_only)
return 0; return 0;
trans = bch2_trans_get(c); return bch2_trans_run(c,
for_each_btree_key_commit(trans, iter,
ret = for_each_btree_key_commit(trans, iter, BTREE_ID_stripes, POS_MIN,
BTREE_ID_stripes, POS_MIN, BTREE_ITER_PREFETCH, k,
BTREE_ITER_PREFETCH, k, NULL, NULL, BCH_TRANS_COMMIT_no_enospc,
NULL, NULL, BTREE_INSERT_NOFAIL, bch2_gc_write_stripes_key(trans, &iter, k)));
bch2_gc_write_stripes_key(trans, &iter, k));
bch2_trans_put(trans);
return ret;
} }
 static void bch2_gc_stripes_reset(struct bch_fs *c, bool metadata_only)
@@ -1848,7 +1787,7 @@ int bch2_gc(struct bch_fs *c, bool initial, bool metadata_only)
 #endif
 	c->gc_count++;
 
-	if (test_bit(BCH_FS_NEED_ANOTHER_GC, &c->flags) ||
+	if (test_bit(BCH_FS_need_another_gc, &c->flags) ||
 	    (!iter && bch2_test_restart_gc)) {
 		if (iter++ > 2) {
 			bch_info(c, "Unable to fix bucket gens, looping");
@@ -1860,7 +1799,7 @@ int bch2_gc(struct bch_fs *c, bool initial, bool metadata_only)
 		 * XXX: make sure gens we fixed got saved
 		 */
 		bch_info(c, "Second GC pass needed, restarting:");
-		clear_bit(BCH_FS_NEED_ANOTHER_GC, &c->flags);
+		clear_bit(BCH_FS_need_another_gc, &c->flags);
 		__gc_pos_set(c, gc_phase(GC_PHASE_NOT_RUNNING));
 
 		bch2_gc_stripes_reset(c, metadata_only);
@@ -1900,9 +1839,7 @@ int bch2_gc(struct bch_fs *c, bool initial, bool metadata_only)
 	 * allocator thread - issue wakeup in case they blocked on gc_lock:
 	 */
 	closure_wake_up(&c->freelist_wait);
-	if (ret)
-		bch_err_fn(c, ret);
+	bch_err_fn(c, ret);
 	return ret;
 }
@@ -1912,7 +1849,6 @@ static int gc_btree_gens_key(struct btree_trans *trans,
 {
 	struct bch_fs *c = trans->c;
 	struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
-	const struct bch_extent_ptr *ptr;
 	struct bkey_i *u;
 	int ret;
@@ -1970,12 +1906,7 @@ static int bch2_alloc_write_oldest_gen(struct btree_trans *trans, struct btree_i
 int bch2_gc_gens(struct bch_fs *c)
 {
-	struct btree_trans *trans;
-	struct btree_iter iter;
-	struct bkey_s_c k;
-	struct bch_dev *ca;
 	u64 b, start_time = local_clock();
-	unsigned i;
 	int ret;
 	/*
@@ -1988,9 +1919,8 @@ int bch2_gc_gens(struct bch_fs *c)
 	trace_and_count(c, gc_gens_start, c);
 	down_read(&c->gc_lock);
 
-	trans = bch2_trans_get(c);
-
-	for_each_member_device(ca, c, i) {
+	for_each_member_device(c, ca) {
 		struct bucket_gens *gens = bucket_gens(ca);
 
 		BUG_ON(ca->oldest_gen);
@@ -2007,33 +1937,31 @@ int bch2_gc_gens(struct bch_fs *c)
 			ca->oldest_gen[b] = gens->b[b];
 	}
 
-	for (i = 0; i < BTREE_ID_NR; i++)
+	for (unsigned i = 0; i < BTREE_ID_NR; i++)
 		if (btree_type_has_ptrs(i)) {
 			c->gc_gens_btree	= i;
 			c->gc_gens_pos		= POS_MIN;
-			ret = for_each_btree_key_commit(trans, iter, i,
-					POS_MIN,
-					BTREE_ITER_PREFETCH|BTREE_ITER_ALL_SNAPSHOTS,
-					k,
-					NULL, NULL,
-					BTREE_INSERT_NOFAIL,
-				gc_btree_gens_key(trans, &iter, k));
-			if (ret && !bch2_err_matches(ret, EROFS))
-				bch_err_fn(c, ret);
+			ret = bch2_trans_run(c,
+				for_each_btree_key_commit(trans, iter, i,
+						POS_MIN,
+						BTREE_ITER_PREFETCH|BTREE_ITER_ALL_SNAPSHOTS,
+						k,
+						NULL, NULL,
+						BCH_TRANS_COMMIT_no_enospc,
+					gc_btree_gens_key(trans, &iter, k)));
 			if (ret)
 				goto err;
 		}
 
-	ret = for_each_btree_key_commit(trans, iter, BTREE_ID_alloc,
-			POS_MIN,
-			BTREE_ITER_PREFETCH,
-			k,
-			NULL, NULL,
-			BTREE_INSERT_NOFAIL,
-		bch2_alloc_write_oldest_gen(trans, &iter, k));
-	if (ret && !bch2_err_matches(ret, EROFS))
-		bch_err_fn(c, ret);
+	ret = bch2_trans_run(c,
+		for_each_btree_key_commit(trans, iter, BTREE_ID_alloc,
+				POS_MIN,
+				BTREE_ITER_PREFETCH,
+				k,
+				NULL, NULL,
+				BCH_TRANS_COMMIT_no_enospc,
+			bch2_alloc_write_oldest_gen(trans, &iter, k)));
 	if (ret)
 		goto err;
 
@@ -2045,14 +1973,15 @@ int bch2_gc_gens(struct bch_fs *c)
 	bch2_time_stats_update(&c->times[BCH_TIME_btree_gc], start_time);
 	trace_and_count(c, gc_gens_end, c);
 err:
-	for_each_member_device(ca, c, i) {
+	for_each_member_device(c, ca) {
 		kvfree(ca->oldest_gen);
 		ca->oldest_gen = NULL;
 	}
 
-	bch2_trans_put(trans);
 	up_read(&c->gc_lock);
 	mutex_unlock(&c->gc_gens_lock);
+	if (!bch2_err_matches(ret, EROFS))
+		bch_err_fn(c, ret);
 	return ret;
 }
@@ -2062,7 +1991,6 @@ static int bch2_gc_thread(void *arg)
 	struct io_clock *clock = &c->io_clock[WRITE];
 	unsigned long last = atomic64_read(&clock->now);
 	unsigned last_kick = atomic_read(&c->kick_gc);
-	int ret;
 
 	set_freezable();
 
@@ -2102,11 +2030,8 @@ static int bch2_gc_thread(void *arg)
 #if 0
 		ret = bch2_gc(c, false, false);
 #else
-		ret = bch2_gc_gens(c);
+		bch2_gc_gens(c);
 #endif
-		if (ret < 0)
-			bch_err_fn(c, ret);
-
 		debug_check_no_locks_held();
 	}


@@ -524,7 +524,8 @@ static void btree_err_msg(struct printbuf *out, struct bch_fs *c,
 	prt_printf(out, "at btree ");
 	bch2_btree_pos_to_text(out, c, b);
-	prt_printf(out, "\n  node offset %u", b->written);
+	prt_printf(out, "\n  node offset %u/%u",
+		   b->written, btree_ptr_sectors_written(&b->key));
 	if (i)
 		prt_printf(out, " bset u64s %u", le16_to_cpu(i->u64s));
 	prt_str(out, ": ");
@@ -830,6 +831,23 @@ static int bset_key_invalid(struct bch_fs *c, struct btree *b,
 		(rw == WRITE ? bch2_bkey_val_invalid(c, k, READ, err) : 0);
 }
static bool __bkey_valid(struct bch_fs *c, struct btree *b,
struct bset *i, struct bkey_packed *k)
{
if (bkey_p_next(k) > vstruct_last(i))
return false;
if (k->format > KEY_FORMAT_CURRENT)
return false;
struct printbuf buf = PRINTBUF;
struct bkey tmp;
struct bkey_s u = __bkey_disassemble(b, k, &tmp);
bool ret = __bch2_bkey_invalid(c, u.s_c, btree_node_type(b), READ, &buf);
printbuf_exit(&buf);
return ret;
}
 static int validate_bset_keys(struct bch_fs *c, struct btree *b,
 			      struct bset *i, int write,
 			      bool have_retry, bool *saw_error)
@@ -845,6 +863,7 @@ static int validate_bset_keys(struct bch_fs *c, struct btree *b,
 	     k != vstruct_last(i);) {
 		struct bkey_s u;
 		struct bkey tmp;
+		unsigned next_good_key;
 
 		if (btree_err_on(bkey_p_next(k) > vstruct_last(i),
 				 -BCH_ERR_btree_node_read_err_fixable,
@@ -859,12 +878,8 @@ static int validate_bset_keys(struct bch_fs *c, struct btree *b,
 				 -BCH_ERR_btree_node_read_err_fixable,
 				 c, NULL, b, i,
 				 btree_node_bkey_bad_format,
-				 "invalid bkey format %u", k->format)) {
-			i->u64s = cpu_to_le16(le16_to_cpu(i->u64s) - k->u64s);
-			memmove_u64s_down(k, bkey_p_next(k),
-					  (u64 *) vstruct_end(i) - (u64 *) k);
-			continue;
-		}
+				 "invalid bkey format %u", k->format))
+			goto drop_this_key;
 
 		/* XXX: validate k->u64s */
 		if (!write)
@@ -885,11 +900,7 @@ static int validate_bset_keys(struct bch_fs *c, struct btree *b,
 				  c, NULL, b, i,
 				  btree_node_bad_bkey,
 				  "invalid bkey: %s", buf.buf);
-
-			i->u64s = cpu_to_le16(le16_to_cpu(i->u64s) - k->u64s);
-			memmove_u64s_down(k, bkey_p_next(k),
-					  (u64 *) vstruct_end(i) - (u64 *) k);
-			continue;
+			goto drop_this_key;
 		}
 
 		if (write)
@@ -906,21 +917,45 @@ static int validate_bset_keys(struct bch_fs *c, struct btree *b,
 			prt_printf(&buf, " > ");
 			bch2_bkey_to_text(&buf, u.k);
 
-			bch2_dump_bset(c, b, i, 0);
-
 			if (btree_err(-BCH_ERR_btree_node_read_err_fixable,
 				      c, NULL, b, i,
 				      btree_node_bkey_out_of_order,
-				      "%s", buf.buf)) {
-				i->u64s = cpu_to_le16(le16_to_cpu(i->u64s) - k->u64s);
-				memmove_u64s_down(k, bkey_p_next(k),
-						  (u64 *) vstruct_end(i) - (u64 *) k);
-				continue;
-			}
+				      "%s", buf.buf))
+				goto drop_this_key;
 		}
 
 		prev = k;
 		k = bkey_p_next(k);
+		continue;
drop_this_key:
next_good_key = k->u64s;
if (!next_good_key ||
(BSET_BIG_ENDIAN(i) == CPU_BIG_ENDIAN &&
version >= bcachefs_metadata_version_snapshot)) {
/*
* only do scanning if bch2_bkey_compat() has nothing to
* do
*/
if (!__bkey_valid(c, b, i, (void *) ((u64 *) k + next_good_key))) {
for (next_good_key = 1;
next_good_key < (u64 *) vstruct_last(i) - (u64 *) k;
next_good_key++)
if (__bkey_valid(c, b, i, (void *) ((u64 *) k + next_good_key)))
goto got_good_key;
}
/*
* didn't find a good key, have to truncate the rest of
* the bset
*/
next_good_key = (u64 *) vstruct_last(i) - (u64 *) k;
}
got_good_key:
le16_add_cpu(&i->u64s, -next_good_key);
memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
 	}
 fsck_err:
 	printbuf_exit(&buf);
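The drop_this_key path added above recovers from a bad key by first trying the key's claimed length and, if that does not land on something that validates, scanning forward one u64 at a time; if nothing in the remainder of the bset validates, the bset is truncated at the bad key. Below is a toy, compilable model of that scan only: the single-u64 "key" header and magic byte are invented for illustration and are not the real bkey format, and the endian/version guard from the kernel code is omitted.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define KEY_MAGIC	0xb1

/* stands in for __bkey_valid(): bounds check plus a cheap format check */
static bool key_valid(const uint64_t *k, const uint64_t *end)
{
	if (k >= end)
		return false;

	uint64_t u64s = *k & 0xff;

	return u64s && k + u64s <= end && ((*k >> 8) & 0xff) == KEY_MAGIC;
}

/* how many u64s to drop at @k so that parsing can resume (or reach @end) */
static size_t drop_bad_key(const uint64_t *k, const uint64_t *end)
{
	size_t next_good_key = *k & 0xff;	/* the bad key's claimed length */

	if (!next_good_key || !key_valid(k + next_good_key, end)) {
		/* scan forward one u64 at a time for something that validates */
		for (next_good_key = 1; k + next_good_key < end; next_good_key++)
			if (key_valid(k + next_good_key, end))
				return next_good_key;

		next_good_key = end - k;	/* nothing valid: drop the rest */
	}

	return next_good_key;
}

int main(void)
{
	/* one valid 2-u64 key, one garbage u64, then another valid key */
	uint64_t bset[] = { 0xb102, 0, 0xdeadbeef, 0xb101 };
	const uint64_t *end = bset + 4;

	printf("drop %zu u64s at the bad key\n", drop_bad_key(bset + 2, end));
	return 0;
}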
@ -934,7 +969,6 @@ int bch2_btree_node_read_done(struct bch_fs *c, struct bch_dev *ca,
struct sort_iter *iter; struct sort_iter *iter;
struct btree_node *sorted; struct btree_node *sorted;
struct bkey_packed *k; struct bkey_packed *k;
struct bch_extent_ptr *ptr;
struct bset *i; struct bset *i;
bool used_mempool, blacklisted; bool used_mempool, blacklisted;
bool updated_range = b->key.k.type == KEY_TYPE_btree_ptr_v2 && bool updated_range = b->key.k.type == KEY_TYPE_btree_ptr_v2 &&
@ -943,6 +977,7 @@ int bch2_btree_node_read_done(struct bch_fs *c, struct bch_dev *ca,
unsigned ptr_written = btree_ptr_sectors_written(&b->key); unsigned ptr_written = btree_ptr_sectors_written(&b->key);
struct printbuf buf = PRINTBUF; struct printbuf buf = PRINTBUF;
int ret = 0, retry_read = 0, write = READ; int ret = 0, retry_read = 0, write = READ;
u64 start_time = local_clock();
b->version_ondisk = U16_MAX; b->version_ondisk = U16_MAX;
/* We might get called multiple times on read retry: */ /* We might get called multiple times on read retry: */
@ -968,12 +1003,20 @@ int bch2_btree_node_read_done(struct bch_fs *c, struct bch_dev *ca,
struct bch_btree_ptr_v2 *bp = struct bch_btree_ptr_v2 *bp =
&bkey_i_to_btree_ptr_v2(&b->key)->v; &bkey_i_to_btree_ptr_v2(&b->key)->v;
bch2_bpos_to_text(&buf, b->data->min_key);
prt_str(&buf, "-");
bch2_bpos_to_text(&buf, b->data->max_key);
btree_err_on(b->data->keys.seq != bp->seq, btree_err_on(b->data->keys.seq != bp->seq,
-BCH_ERR_btree_node_read_err_must_retry, -BCH_ERR_btree_node_read_err_must_retry,
c, ca, b, NULL, c, ca, b, NULL,
btree_node_bad_seq, btree_node_bad_seq,
"got wrong btree node (seq %llx want %llx)", "got wrong btree node (want %llx got %llx)\n"
b->data->keys.seq, bp->seq); "got btree %s level %llu pos %s",
bp->seq, b->data->keys.seq,
bch2_btree_id_str(BTREE_NODE_ID(b->data)),
BTREE_NODE_LEVEL(b->data),
buf.buf);
} else { } else {
btree_err_on(!b->data->keys.seq, btree_err_on(!b->data->keys.seq,
-BCH_ERR_btree_node_read_err_must_retry, -BCH_ERR_btree_node_read_err_must_retry,
@ -999,8 +1042,8 @@ int bch2_btree_node_read_done(struct bch_fs *c, struct bch_dev *ca,
nonce = btree_nonce(i, b->written << 9); nonce = btree_nonce(i, b->written << 9);
csum_bad = bch2_crc_cmp(b->data->csum, struct bch_csum csum = csum_vstruct(c, BSET_CSUM_TYPE(i), nonce, b->data);
csum_vstruct(c, BSET_CSUM_TYPE(i), nonce, b->data)); csum_bad = bch2_crc_cmp(b->data->csum, csum);
if (csum_bad) if (csum_bad)
bch2_io_error(ca, BCH_MEMBER_ERROR_checksum); bch2_io_error(ca, BCH_MEMBER_ERROR_checksum);
@ -1008,7 +1051,10 @@ int bch2_btree_node_read_done(struct bch_fs *c, struct bch_dev *ca,
-BCH_ERR_btree_node_read_err_want_retry, -BCH_ERR_btree_node_read_err_want_retry,
c, ca, b, i, c, ca, b, i,
bset_bad_csum, bset_bad_csum,
"invalid checksum"); "%s",
(printbuf_reset(&buf),
bch2_csum_err_msg(&buf, BSET_CSUM_TYPE(i), b->data->csum, csum),
buf.buf));
ret = bset_encrypt(c, i, b->written << 9); ret = bset_encrypt(c, i, b->written << 9);
if (bch2_fs_fatal_err_on(ret, c, if (bch2_fs_fatal_err_on(ret, c,
@ -1037,8 +1083,8 @@ int bch2_btree_node_read_done(struct bch_fs *c, struct bch_dev *ca,
"unknown checksum type %llu", BSET_CSUM_TYPE(i)); "unknown checksum type %llu", BSET_CSUM_TYPE(i));
nonce = btree_nonce(i, b->written << 9); nonce = btree_nonce(i, b->written << 9);
csum_bad = bch2_crc_cmp(bne->csum, struct bch_csum csum = csum_vstruct(c, BSET_CSUM_TYPE(i), nonce, bne);
csum_vstruct(c, BSET_CSUM_TYPE(i), nonce, bne)); csum_bad = bch2_crc_cmp(bne->csum, csum);
if (csum_bad) if (csum_bad)
bch2_io_error(ca, BCH_MEMBER_ERROR_checksum); bch2_io_error(ca, BCH_MEMBER_ERROR_checksum);
@ -1046,7 +1092,10 @@ int bch2_btree_node_read_done(struct bch_fs *c, struct bch_dev *ca,
-BCH_ERR_btree_node_read_err_want_retry, -BCH_ERR_btree_node_read_err_want_retry,
c, ca, b, i, c, ca, b, i,
bset_bad_csum, bset_bad_csum,
"invalid checksum"); "%s",
(printbuf_reset(&buf),
bch2_csum_err_msg(&buf, BSET_CSUM_TYPE(i), bne->csum, csum),
buf.buf));
ret = bset_encrypt(c, i, b->written << 9); ret = bset_encrypt(c, i, b->written << 9);
if (bch2_fs_fatal_err_on(ret, c, if (bch2_fs_fatal_err_on(ret, c,
@ -1202,6 +1251,7 @@ int bch2_btree_node_read_done(struct bch_fs *c, struct bch_dev *ca,
out: out:
mempool_free(iter, &c->fill_iter); mempool_free(iter, &c->fill_iter);
printbuf_exit(&buf); printbuf_exit(&buf);
bch2_time_stats_update(&c->times[BCH_TIME_btree_node_read_done], start_time);
return retry_read; return retry_read;
fsck_err: fsck_err:
if (ret == -BCH_ERR_btree_node_read_err_want_retry || if (ret == -BCH_ERR_btree_node_read_err_want_retry ||
@ -1575,16 +1625,17 @@ static int btree_node_read_all_replicas(struct bch_fs *c, struct btree *b, bool
return 0; return 0;
} }
void bch2_btree_node_read(struct bch_fs *c, struct btree *b, void bch2_btree_node_read(struct btree_trans *trans, struct btree *b,
bool sync) bool sync)
{ {
struct bch_fs *c = trans->c;
struct extent_ptr_decoded pick; struct extent_ptr_decoded pick;
struct btree_read_bio *rb; struct btree_read_bio *rb;
struct bch_dev *ca; struct bch_dev *ca;
struct bio *bio; struct bio *bio;
int ret; int ret;
trace_and_count(c, btree_node_read, c, b); trace_and_count(c, btree_node_read, trans, b);
if (bch2_verify_all_btree_replicas && if (bch2_verify_all_btree_replicas &&
!btree_node_read_all_replicas(c, b, sync)) !btree_node_read_all_replicas(c, b, sync))
@ -1637,7 +1688,7 @@ void bch2_btree_node_read(struct bch_fs *c, struct btree *b,
if (sync) { if (sync) {
submit_bio_wait(bio); submit_bio_wait(bio);
bch2_latency_acct(ca, rb->start_time, READ);
btree_node_read_work(&rb->work); btree_node_read_work(&rb->work);
} else { } else {
submit_bio(bio); submit_bio(bio);
@ -1663,12 +1714,12 @@ static int __bch2_btree_root_read(struct btree_trans *trans, enum btree_id id,
closure_init_stack(&cl); closure_init_stack(&cl);
do { do {
ret = bch2_btree_cache_cannibalize_lock(c, &cl); ret = bch2_btree_cache_cannibalize_lock(trans, &cl);
closure_sync(&cl); closure_sync(&cl);
} while (ret); } while (ret);
b = bch2_btree_node_mem_alloc(trans, level != 0); b = bch2_btree_node_mem_alloc(trans, level != 0);
bch2_btree_cache_cannibalize_unlock(c); bch2_btree_cache_cannibalize_unlock(trans);
BUG_ON(IS_ERR(b)); BUG_ON(IS_ERR(b));
@ -1677,7 +1728,7 @@ static int __bch2_btree_root_read(struct btree_trans *trans, enum btree_id id,
set_btree_node_read_in_flight(b); set_btree_node_read_in_flight(b);
bch2_btree_node_read(c, b, true); bch2_btree_node_read(trans, b, true);
if (btree_node_read_error(b)) { if (btree_node_read_error(b)) {
bch2_btree_node_hash_remove(&c->btree_cache, b); bch2_btree_node_hash_remove(&c->btree_cache, b);
@ -1789,8 +1840,10 @@ static void btree_node_write_work(struct work_struct *work)
bch2_bkey_drop_ptrs(bkey_i_to_s(&wbio->key), ptr, bch2_bkey_drop_ptrs(bkey_i_to_s(&wbio->key), ptr,
bch2_dev_list_has_dev(wbio->wbio.failed, ptr->dev)); bch2_dev_list_has_dev(wbio->wbio.failed, ptr->dev));
if (!bch2_bkey_nr_ptrs(bkey_i_to_s_c(&wbio->key))) if (!bch2_bkey_nr_ptrs(bkey_i_to_s_c(&wbio->key))) {
ret = -BCH_ERR_btree_write_all_failed;
goto err; goto err;
}
if (wbio->wbio.first_btree_write) { if (wbio->wbio.first_btree_write) {
if (wbio->wbio.failed.nr) { if (wbio->wbio.failed.nr) {
@ -1800,9 +1853,9 @@ static void btree_node_write_work(struct work_struct *work)
ret = bch2_trans_do(c, NULL, NULL, 0, ret = bch2_trans_do(c, NULL, NULL, 0,
bch2_btree_node_update_key_get_iter(trans, b, &wbio->key, bch2_btree_node_update_key_get_iter(trans, b, &wbio->key,
BCH_WATERMARK_reclaim| BCH_WATERMARK_reclaim|
BTREE_INSERT_JOURNAL_RECLAIM| BCH_TRANS_COMMIT_journal_reclaim|
BTREE_INSERT_NOFAIL| BCH_TRANS_COMMIT_no_enospc|
BTREE_INSERT_NOCHECK_RW, BCH_TRANS_COMMIT_no_check_rw,
!wbio->wbio.failed.nr)); !wbio->wbio.failed.nr));
if (ret) if (ret)
goto err; goto err;
@ -1885,7 +1938,6 @@ static int validate_bset_for_write(struct bch_fs *c, struct btree *b,
static void btree_write_submit(struct work_struct *work) static void btree_write_submit(struct work_struct *work)
{ {
struct btree_write_bio *wbio = container_of(work, struct btree_write_bio, work); struct btree_write_bio *wbio = container_of(work, struct btree_write_bio, work);
struct bch_extent_ptr *ptr;
BKEY_PADDED_ONSTACK(k, BKEY_BTREE_PTR_VAL_U64s_MAX) tmp; BKEY_PADDED_ONSTACK(k, BKEY_BTREE_PTR_VAL_U64s_MAX) tmp;
bkey_copy(&tmp.k, &wbio->key); bkey_copy(&tmp.k, &wbio->key);


@ -130,7 +130,7 @@ void bch2_btree_init_next(struct btree_trans *, struct btree *);
int bch2_btree_node_read_done(struct bch_fs *, struct bch_dev *, int bch2_btree_node_read_done(struct bch_fs *, struct bch_dev *,
struct btree *, bool, bool *); struct btree *, bool, bool *);
void bch2_btree_node_read(struct bch_fs *, struct btree *, bool); void bch2_btree_node_read(struct btree_trans *, struct btree *, bool);
int bch2_btree_root_read(struct bch_fs *, enum btree_id, int bch2_btree_root_read(struct bch_fs *, enum btree_id,
const struct bkey_i *, unsigned); const struct bkey_i *, unsigned);

[file diff suppressed because it is too large]

@ -63,60 +63,57 @@ static inline void btree_trans_sort_paths(struct btree_trans *trans)
__bch2_btree_trans_sort_paths(trans); __bch2_btree_trans_sort_paths(trans);
} }
static inline struct btree_path * static inline unsigned long *trans_paths_nr(struct btree_path *paths)
__trans_next_path(struct btree_trans *trans, unsigned idx)
{ {
u64 l; return &container_of(paths, struct btree_trans_paths, paths[0])->nr_paths;
if (idx == BTREE_ITER_MAX)
return NULL;
l = trans->paths_allocated >> idx;
if (!l)
return NULL;
idx += __ffs64(l);
EBUG_ON(idx >= BTREE_ITER_MAX);
EBUG_ON(trans->paths[idx].idx != idx);
return &trans->paths[idx];
} }
#define trans_for_each_path_from(_trans, _path, _start) \ static inline unsigned long *trans_paths_allocated(struct btree_path *paths)
for (_path = __trans_next_path((_trans), _start); \ {
(_path); \ unsigned long *v = trans_paths_nr(paths);
_path = __trans_next_path((_trans), (_path)->idx + 1)) return v - BITS_TO_LONGS(*v);
}
#define trans_for_each_path(_trans, _path) \ #define trans_for_each_path_idx_from(_paths_allocated, _nr, _idx, _start)\
trans_for_each_path_from(_trans, _path, 0) for (_idx = _start; \
(_idx = find_next_bit(_paths_allocated, _nr, _idx)) < _nr; \
_idx++)
 static inline struct btree_path *
-__trans_next_path_safe(struct btree_trans *trans, unsigned *idx)
+__trans_next_path(struct btree_trans *trans, unsigned *idx)
 {
-	u64 l;
+	unsigned long *w = trans->paths_allocated + *idx / BITS_PER_LONG;
+	/*
+	 * Open coded find_next_bit(), because
+	 *  - this is fast path, we can't afford the function call
+	 *  - and we know that nr_paths is a multiple of BITS_PER_LONG,
+	 */
+	while (*idx < trans->nr_paths) {
+		unsigned long v = *w >> (*idx & (BITS_PER_LONG - 1));
+		if (v) {
+			*idx += __ffs(v);
+			return trans->paths + *idx;
+		}
 
-	if (*idx == BTREE_ITER_MAX)
-		return NULL;
+		*idx += BITS_PER_LONG;
+		*idx &= ~(BITS_PER_LONG - 1);
+		w++;
+	}
 
-	l = trans->paths_allocated >> *idx;
-	if (!l)
-		return NULL;
-
-	*idx += __ffs64(l);
-	EBUG_ON(*idx >= BTREE_ITER_MAX);
-	return &trans->paths[*idx];
+	return NULL;
 }
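As the comment in the new __trans_next_path() notes, it open-codes find_next_bit(), scanning the allocated-paths bitmap one word at a time. Here is a standalone sketch of that word-at-a-time scan, using __builtin_ctzl() in place of __ffs() and an invented find_next_set_bit() name; it is illustrative only, not the kernel helper.

#include <stdio.h>

#define BITS_PER_LONG	(8 * sizeof(unsigned long))

/* return the index of the next set bit at or after @idx, or @nr if none */
static unsigned find_next_set_bit(const unsigned long *bitmap, unsigned nr,
				  unsigned idx)
{
	const unsigned long *w = bitmap + idx / BITS_PER_LONG;

	while (idx < nr) {
		unsigned long v = *w >> (idx & (BITS_PER_LONG - 1));

		if (v)
			return idx + __builtin_ctzl(v);	/* __ffs() equivalent */

		/* advance to the start of the next word */
		idx += BITS_PER_LONG;
		idx &= ~(BITS_PER_LONG - 1);
		w++;
	}

	return nr;
}

int main(void)
{
	unsigned long bitmap[2] = { 1UL << 3, 1UL << 1 };
	unsigned nr = 2 * BITS_PER_LONG;

	for (unsigned i = find_next_set_bit(bitmap, nr, 0);
	     i < nr;
	     i = find_next_set_bit(bitmap, nr, i + 1))
		printf("bit %u set\n", i);
	return 0;
}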
/* /*
* This version is intended to be safe for use on a btree_trans that is owned by * This version is intended to be safe for use on a btree_trans that is owned by
* another thread, for bch2_btree_trans_to_text(); * another thread, for bch2_btree_trans_to_text();
*/ */
#define trans_for_each_path_safe_from(_trans, _path, _idx, _start) \ #define trans_for_each_path_from(_trans, _path, _idx, _start) \
for (_idx = _start; \ for (_idx = _start; \
(_path = __trans_next_path_safe((_trans), &_idx)); \ (_path = __trans_next_path((_trans), &_idx)); \
_idx++) _idx++)
#define trans_for_each_path_safe(_trans, _path, _idx) \ #define trans_for_each_path(_trans, _path, _idx) \
trans_for_each_path_safe_from(_trans, _path, _idx, 0) trans_for_each_path_from(_trans, _path, _idx, 1)
static inline struct btree_path *next_btree_path(struct btree_trans *trans, struct btree_path *path) static inline struct btree_path *next_btree_path(struct btree_trans *trans, struct btree_path *path)
{ {
@ -138,10 +135,23 @@ static inline struct btree_path *prev_btree_path(struct btree_trans *trans, stru
: NULL; : NULL;
} }
#define trans_for_each_path_inorder(_trans, _path, _i) \ #define trans_for_each_path_idx_inorder(_trans, _iter) \
for (_i = 0; \ for (_iter = (struct trans_for_each_path_inorder_iter) { 0 }; \
((_path) = (_trans)->paths + trans->sorted[_i]), (_i) < (_trans)->nr_sorted;\ (_iter.path_idx = trans->sorted[_iter.sorted_idx], \
_i++) _iter.sorted_idx < (_trans)->nr_sorted); \
_iter.sorted_idx++)
struct trans_for_each_path_inorder_iter {
btree_path_idx_t sorted_idx;
btree_path_idx_t path_idx;
};
#define trans_for_each_path_inorder(_trans, _path, _iter) \
for (_iter = (struct trans_for_each_path_inorder_iter) { 0 }; \
(_iter.path_idx = trans->sorted[_iter.sorted_idx], \
_path = (_trans)->paths + _iter.path_idx, \
_iter.sorted_idx < (_trans)->nr_sorted); \
_iter.sorted_idx++)
#define trans_for_each_path_inorder_reverse(_trans, _path, _i) \ #define trans_for_each_path_inorder_reverse(_trans, _path, _i) \
for (_i = trans->nr_sorted - 1; \ for (_i = trans->nr_sorted - 1; \
@ -157,67 +167,65 @@ static inline bool __path_has_node(const struct btree_path *path,
static inline struct btree_path * static inline struct btree_path *
__trans_next_path_with_node(struct btree_trans *trans, struct btree *b, __trans_next_path_with_node(struct btree_trans *trans, struct btree *b,
unsigned idx) unsigned *idx)
{ {
struct btree_path *path = __trans_next_path(trans, idx); struct btree_path *path;
while (path && !__path_has_node(path, b)) while ((path = __trans_next_path(trans, idx)) &&
path = __trans_next_path(trans, path->idx + 1); !__path_has_node(path, b))
(*idx)++;
return path; return path;
} }
#define trans_for_each_path_with_node(_trans, _b, _path) \ #define trans_for_each_path_with_node(_trans, _b, _path, _iter) \
for (_path = __trans_next_path_with_node((_trans), (_b), 0); \ for (_iter = 1; \
(_path); \ (_path = __trans_next_path_with_node((_trans), (_b), &_iter));\
_path = __trans_next_path_with_node((_trans), (_b), \ _iter++)
(_path)->idx + 1))
struct btree_path *__bch2_btree_path_make_mut(struct btree_trans *, struct btree_path *, btree_path_idx_t __bch2_btree_path_make_mut(struct btree_trans *, btree_path_idx_t,
bool, unsigned long); bool, unsigned long);
static inline struct btree_path * __must_check static inline btree_path_idx_t __must_check
bch2_btree_path_make_mut(struct btree_trans *trans, bch2_btree_path_make_mut(struct btree_trans *trans,
struct btree_path *path, bool intent, btree_path_idx_t path, bool intent,
unsigned long ip) unsigned long ip)
{ {
if (path->ref > 1 || path->preserve) if (trans->paths[path].ref > 1 ||
trans->paths[path].preserve)
path = __bch2_btree_path_make_mut(trans, path, intent, ip); path = __bch2_btree_path_make_mut(trans, path, intent, ip);
path->should_be_locked = false; trans->paths[path].should_be_locked = false;
return path; return path;
} }
struct btree_path * __must_check btree_path_idx_t __must_check
__bch2_btree_path_set_pos(struct btree_trans *, struct btree_path *, __bch2_btree_path_set_pos(struct btree_trans *, btree_path_idx_t,
struct bpos, bool, unsigned long, int); struct bpos, bool, unsigned long);
static inline struct btree_path * __must_check static inline btree_path_idx_t __must_check
bch2_btree_path_set_pos(struct btree_trans *trans, bch2_btree_path_set_pos(struct btree_trans *trans,
struct btree_path *path, struct bpos new_pos, btree_path_idx_t path, struct bpos new_pos,
bool intent, unsigned long ip) bool intent, unsigned long ip)
{ {
int cmp = bpos_cmp(new_pos, path->pos); return !bpos_eq(new_pos, trans->paths[path].pos)
? __bch2_btree_path_set_pos(trans, path, new_pos, intent, ip)
return cmp
? __bch2_btree_path_set_pos(trans, path, new_pos, intent, ip, cmp)
: path; : path;
} }
int __must_check bch2_btree_path_traverse_one(struct btree_trans *, struct btree_path *, int __must_check bch2_btree_path_traverse_one(struct btree_trans *,
btree_path_idx_t,
unsigned, unsigned long); unsigned, unsigned long);
static inline int __must_check bch2_btree_path_traverse(struct btree_trans *trans, static inline int __must_check bch2_btree_path_traverse(struct btree_trans *trans,
struct btree_path *path, unsigned flags) btree_path_idx_t path, unsigned flags)
{ {
if (path->uptodate < BTREE_ITER_NEED_RELOCK) if (trans->paths[path].uptodate < BTREE_ITER_NEED_RELOCK)
return 0; return 0;
return bch2_btree_path_traverse_one(trans, path, flags, _RET_IP_); return bch2_btree_path_traverse_one(trans, path, flags, _RET_IP_);
} }
int __must_check bch2_btree_path_traverse(struct btree_trans *, btree_path_idx_t bch2_path_get(struct btree_trans *, enum btree_id, struct bpos,
struct btree_path *, unsigned);
struct btree_path *bch2_path_get(struct btree_trans *, enum btree_id, struct bpos,
unsigned, unsigned, unsigned, unsigned long); unsigned, unsigned, unsigned, unsigned long);
struct bkey_s_c bch2_btree_path_peek_slot(struct btree_path *, struct bkey *); struct bkey_s_c bch2_btree_path_peek_slot(struct btree_path *, struct bkey *);
@ -269,7 +277,7 @@ void bch2_btree_node_iter_fix(struct btree_trans *trans, struct btree_path *,
int bch2_btree_path_relock_intent(struct btree_trans *, struct btree_path *); int bch2_btree_path_relock_intent(struct btree_trans *, struct btree_path *);
void bch2_path_put(struct btree_trans *, struct btree_path *, bool); void bch2_path_put(struct btree_trans *, btree_path_idx_t, bool);
int bch2_trans_relock(struct btree_trans *); int bch2_trans_relock(struct btree_trans *);
int bch2_trans_relock_notrace(struct btree_trans *); int bch2_trans_relock_notrace(struct btree_trans *);
@ -335,7 +343,7 @@ static inline void bch2_btree_path_downgrade(struct btree_trans *trans,
void bch2_trans_downgrade(struct btree_trans *); void bch2_trans_downgrade(struct btree_trans *);
void bch2_trans_node_add(struct btree_trans *trans, struct btree *); void bch2_trans_node_add(struct btree_trans *trans, struct btree_path *, struct btree *);
void bch2_trans_node_reinit_iter(struct btree_trans *, struct btree *); void bch2_trans_node_reinit_iter(struct btree_trans *, struct btree *);
int __must_check __bch2_btree_iter_traverse(struct btree_iter *iter); int __must_check __bch2_btree_iter_traverse(struct btree_iter *iter);
@ -348,8 +356,6 @@ struct btree *bch2_btree_iter_next_node(struct btree_iter *);
struct bkey_s_c bch2_btree_iter_peek_upto(struct btree_iter *, struct bpos); struct bkey_s_c bch2_btree_iter_peek_upto(struct btree_iter *, struct bpos);
struct bkey_s_c bch2_btree_iter_next(struct btree_iter *); struct bkey_s_c bch2_btree_iter_next(struct btree_iter *);
struct bkey_s_c bch2_btree_iter_peek_all_levels(struct btree_iter *);
static inline struct bkey_s_c bch2_btree_iter_peek(struct btree_iter *iter) static inline struct bkey_s_c bch2_btree_iter_peek(struct btree_iter *iter)
{ {
return bch2_btree_iter_peek_upto(iter, SPOS_MAX); return bch2_btree_iter_peek_upto(iter, SPOS_MAX);
@ -376,10 +382,12 @@ static inline void __bch2_btree_iter_set_pos(struct btree_iter *iter, struct bpo
static inline void bch2_btree_iter_set_pos(struct btree_iter *iter, struct bpos new_pos) static inline void bch2_btree_iter_set_pos(struct btree_iter *iter, struct bpos new_pos)
{ {
struct btree_trans *trans = iter->trans;
if (unlikely(iter->update_path)) if (unlikely(iter->update_path))
bch2_path_put(iter->trans, iter->update_path, bch2_path_put(trans, iter->update_path,
iter->flags & BTREE_ITER_INTENT); iter->flags & BTREE_ITER_INTENT);
iter->update_path = NULL; iter->update_path = 0;
if (!(iter->flags & BTREE_ITER_ALL_SNAPSHOTS)) if (!(iter->flags & BTREE_ITER_ALL_SNAPSHOTS))
new_pos.snapshot = iter->snapshot; new_pos.snapshot = iter->snapshot;
@ -408,9 +416,6 @@ static inline unsigned __bch2_btree_iter_flags(struct btree_trans *trans,
unsigned btree_id, unsigned btree_id,
unsigned flags) unsigned flags)
{ {
if (flags & BTREE_ITER_ALL_LEVELS)
flags |= BTREE_ITER_ALL_SNAPSHOTS|__BTREE_ITER_ALL_SNAPSHOTS;
if (!(flags & (BTREE_ITER_ALL_SNAPSHOTS|BTREE_ITER_NOT_EXTENTS)) && if (!(flags & (BTREE_ITER_ALL_SNAPSHOTS|BTREE_ITER_NOT_EXTENTS)) &&
btree_id_is_extents(btree_id)) btree_id_is_extents(btree_id))
flags |= BTREE_ITER_IS_EXTENTS; flags |= BTREE_ITER_IS_EXTENTS;
@ -450,14 +455,16 @@ static inline void bch2_trans_iter_init_common(struct btree_trans *trans,
unsigned flags, unsigned flags,
unsigned long ip) unsigned long ip)
{ {
memset(iter, 0, sizeof(*iter)); iter->trans = trans;
iter->trans = trans; iter->update_path = 0;
iter->btree_id = btree_id; iter->key_cache_path = 0;
iter->flags = flags; iter->btree_id = btree_id;
iter->snapshot = pos.snapshot; iter->min_depth = 0;
iter->pos = pos; iter->flags = flags;
iter->k.p = pos; iter->snapshot = pos.snapshot;
iter->pos = pos;
iter->k = POS_KEY(pos);
iter->journal_idx = 0;
#ifdef CONFIG_BCACHEFS_DEBUG #ifdef CONFIG_BCACHEFS_DEBUG
iter->ip_allocated = ip; iter->ip_allocated = ip;
#endif #endif
@ -489,8 +496,10 @@ void bch2_trans_copy_iter(struct btree_iter *, struct btree_iter *);
static inline void set_btree_iter_dontneed(struct btree_iter *iter) static inline void set_btree_iter_dontneed(struct btree_iter *iter)
{ {
if (!iter->trans->restarted) struct btree_trans *trans = iter->trans;
iter->path->preserve = false;
if (!trans->restarted)
btree_iter_path(trans, iter)->preserve = false;
} }
void *__bch2_trans_kmalloc(struct btree_trans *, size_t); void *__bch2_trans_kmalloc(struct btree_trans *, size_t);
@ -512,7 +521,7 @@ static inline void *bch2_trans_kmalloc(struct btree_trans *trans, size_t size)
static inline void *bch2_trans_kmalloc_nomemzero(struct btree_trans *trans, size_t size) static inline void *bch2_trans_kmalloc_nomemzero(struct btree_trans *trans, size_t size)
{ {
size = roundup(size, 8); size = round_up(size, 8);
if (likely(trans->mem_top + size <= trans->mem_bytes)) { if (likely(trans->mem_top + size <= trans->mem_bytes)) {
void *p = trans->mem + trans->mem_top; void *p = trans->mem + trans->mem_top;
@ -581,7 +590,6 @@ static inline int __bch2_bkey_get_val_typed(struct btree_trans *trans,
KEY_TYPE_##_type, sizeof(*_val), _val) KEY_TYPE_##_type, sizeof(*_val), _val)
void bch2_trans_srcu_unlock(struct btree_trans *); void bch2_trans_srcu_unlock(struct btree_trans *);
void bch2_trans_srcu_lock(struct btree_trans *);
u32 bch2_trans_begin(struct btree_trans *); u32 bch2_trans_begin(struct btree_trans *);
@ -606,8 +614,6 @@ u32 bch2_trans_begin(struct btree_trans *);
static inline struct bkey_s_c bch2_btree_iter_peek_prev_type(struct btree_iter *iter, static inline struct bkey_s_c bch2_btree_iter_peek_prev_type(struct btree_iter *iter,
unsigned flags) unsigned flags)
{ {
BUG_ON(flags & BTREE_ITER_ALL_LEVELS);
return flags & BTREE_ITER_SLOTS ? bch2_btree_iter_peek_slot(iter) : return flags & BTREE_ITER_SLOTS ? bch2_btree_iter_peek_slot(iter) :
bch2_btree_iter_peek_prev(iter); bch2_btree_iter_peek_prev(iter);
} }
@ -615,8 +621,7 @@ static inline struct bkey_s_c bch2_btree_iter_peek_prev_type(struct btree_iter *
static inline struct bkey_s_c bch2_btree_iter_peek_type(struct btree_iter *iter, static inline struct bkey_s_c bch2_btree_iter_peek_type(struct btree_iter *iter,
unsigned flags) unsigned flags)
{ {
return flags & BTREE_ITER_ALL_LEVELS ? bch2_btree_iter_peek_all_levels(iter) : return flags & BTREE_ITER_SLOTS ? bch2_btree_iter_peek_slot(iter) :
flags & BTREE_ITER_SLOTS ? bch2_btree_iter_peek_slot(iter) :
bch2_btree_iter_peek(iter); bch2_btree_iter_peek(iter);
} }
@ -633,61 +638,34 @@ static inline struct bkey_s_c bch2_btree_iter_peek_upto_type(struct btree_iter *
return bch2_btree_iter_peek_slot(iter); return bch2_btree_iter_peek_slot(iter);
} }
int __bch2_btree_trans_too_many_iters(struct btree_trans *);
static inline int btree_trans_too_many_iters(struct btree_trans *trans) static inline int btree_trans_too_many_iters(struct btree_trans *trans)
{ {
if (hweight64(trans->paths_allocated) > BTREE_ITER_MAX - 8) { if (bitmap_weight(trans->paths_allocated, trans->nr_paths) > BTREE_ITER_INITIAL - 8)
trace_and_count(trans->c, trans_restart_too_many_iters, trans, _THIS_IP_); return __bch2_btree_trans_too_many_iters(trans);
return btree_trans_restart(trans, BCH_ERR_transaction_restart_too_many_iters);
}
return 0; return 0;
} }
struct bkey_s_c bch2_btree_iter_peek_and_restart_outlined(struct btree_iter *); /*
* goto instead of loop, so that when used inside for_each_btree_key2()
static inline struct bkey_s_c * break/continue work correctly
__bch2_btree_iter_peek_and_restart(struct btree_trans *trans, */
struct btree_iter *iter, unsigned flags)
{
struct bkey_s_c k;
while (btree_trans_too_many_iters(trans) ||
(k = bch2_btree_iter_peek_type(iter, flags),
bch2_err_matches(bkey_err(k), BCH_ERR_transaction_restart)))
bch2_trans_begin(trans);
return k;
}
static inline struct bkey_s_c
__bch2_btree_iter_peek_upto_and_restart(struct btree_trans *trans,
struct btree_iter *iter,
struct bpos end,
unsigned flags)
{
struct bkey_s_c k;
while (btree_trans_too_many_iters(trans) ||
(k = bch2_btree_iter_peek_upto_type(iter, end, flags),
bch2_err_matches(bkey_err(k), BCH_ERR_transaction_restart)))
bch2_trans_begin(trans);
return k;
}
 #define lockrestart_do(_trans, _do)					\
 ({									\
+	__label__ transaction_restart;					\
 	u32 _restart_count;						\
 	int _ret2;							\
+transaction_restart:							\
+	_restart_count = bch2_trans_begin(_trans);			\
+	_ret2 = (_do);							\
 									\
-	do {								\
-		_restart_count = bch2_trans_begin(_trans);		\
-		_ret2 = (_do);						\
-	} while (bch2_err_matches(_ret2, BCH_ERR_transaction_restart));\
+	if (bch2_err_matches(_ret2, BCH_ERR_transaction_restart))	\
+		goto transaction_restart;				\
 									\
 	if (!_ret2)							\
 		bch2_trans_verify_not_restarted(_trans, _restart_count);\
-									\
 	_ret2;								\
 })
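lockrestart_do() now retries via a local label and goto rather than a do/while loop; as the comment earlier in this header notes, that way a break or continue in the body binds to the caller's enclosing loop (e.g. for_each_btree_key()) instead of being absorbed by the retry loop. A minimal compilable sketch of the retry-on-restart shape follows; ERR_RESTART and retry_do() are invented stand-ins for the bcachefs error codes and macro.

#include <stdio.h>

#define ERR_RESTART	(-1)

static int attempts;

static int do_work(void)
{
	/* pretend the first attempt hits a transaction restart */
	return attempts++ == 0 ? ERR_RESTART : 0;
}

/* evaluate _do, retrying as long as it reports a restart */
#define retry_do(_do)						\
({								\
	__label__ transaction_restart;				\
	int _ret;						\
transaction_restart:						\
	_ret = (_do);						\
	if (_ret == ERR_RESTART)				\
		goto transaction_restart;			\
	_ret;							\
})

int main(void)
{
	int ret = retry_do(do_work());

	printf("ret=%d after %d attempts\n", ret, attempts);
	return ret;
}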
@ -716,91 +694,56 @@ __bch2_btree_iter_peek_upto_and_restart(struct btree_trans *trans,
_ret2 ?: trans_was_restarted(_trans, _restart_count); \ _ret2 ?: trans_was_restarted(_trans, _restart_count); \
}) })
#define for_each_btree_key2(_trans, _iter, _btree_id, \ #define for_each_btree_key_upto(_trans, _iter, _btree_id, \
_start, _flags, _k, _do) \ _start, _end, _flags, _k, _do) \
({ \ ({ \
struct btree_iter _iter; \
struct bkey_s_c _k; \
int _ret3 = 0; \ int _ret3 = 0; \
\ \
bch2_trans_iter_init((_trans), &(_iter), (_btree_id), \ bch2_trans_iter_init((_trans), &(_iter), (_btree_id), \
(_start), (_flags)); \ (_start), (_flags)); \
\ \
while (1) { \ do { \
u32 _restart_count = bch2_trans_begin(_trans); \ _ret3 = lockrestart_do(_trans, ({ \
(_k) = bch2_btree_iter_peek_upto_type(&(_iter), \
_end, (_flags)); \
if (!(_k).k) \
break; \
\ \
_ret3 = 0; \ bkey_err(_k) ?: (_do); \
(_k) = bch2_btree_iter_peek_type(&(_iter), (_flags)); \ })); \
if (!(_k).k) \ } while (!_ret3 && bch2_btree_iter_advance(&(_iter))); \
break; \
\
_ret3 = bkey_err(_k) ?: (_do); \
if (bch2_err_matches(_ret3, BCH_ERR_transaction_restart))\
continue; \
if (_ret3) \
break; \
bch2_trans_verify_not_restarted(_trans, _restart_count);\
if (!bch2_btree_iter_advance(&(_iter))) \
break; \
} \
\ \
bch2_trans_iter_exit((_trans), &(_iter)); \ bch2_trans_iter_exit((_trans), &(_iter)); \
_ret3; \ _ret3; \
}) })
#define for_each_btree_key2_upto(_trans, _iter, _btree_id, \ #define for_each_btree_key(_trans, _iter, _btree_id, \
_start, _end, _flags, _k, _do) \ _start, _flags, _k, _do) \
({ \ for_each_btree_key_upto(_trans, _iter, _btree_id, _start, \
int _ret3 = 0; \ SPOS_MAX, _flags, _k, _do)
\
bch2_trans_iter_init((_trans), &(_iter), (_btree_id), \
(_start), (_flags)); \
\
while (1) { \
u32 _restart_count = bch2_trans_begin(_trans); \
\
_ret3 = 0; \
(_k) = bch2_btree_iter_peek_upto_type(&(_iter), _end, (_flags));\
if (!(_k).k) \
break; \
\
_ret3 = bkey_err(_k) ?: (_do); \
if (bch2_err_matches(_ret3, BCH_ERR_transaction_restart))\
continue; \
if (_ret3) \
break; \
bch2_trans_verify_not_restarted(_trans, _restart_count);\
if (!bch2_btree_iter_advance(&(_iter))) \
break; \
} \
\
bch2_trans_iter_exit((_trans), &(_iter)); \
_ret3; \
})
#define for_each_btree_key_reverse(_trans, _iter, _btree_id, \ #define for_each_btree_key_reverse(_trans, _iter, _btree_id, \
_start, _flags, _k, _do) \ _start, _flags, _k, _do) \
({ \ ({ \
struct btree_iter _iter; \
struct bkey_s_c _k; \
int _ret3 = 0; \ int _ret3 = 0; \
\ \
bch2_trans_iter_init((_trans), &(_iter), (_btree_id), \ bch2_trans_iter_init((_trans), &(_iter), (_btree_id), \
(_start), (_flags)); \ (_start), (_flags)); \
\ \
while (1) { \ do { \
u32 _restart_count = bch2_trans_begin(_trans); \ _ret3 = lockrestart_do(_trans, ({ \
(_k) = bch2_btree_iter_peek_prev_type(&(_iter), (_flags));\ (_k) = bch2_btree_iter_peek_prev_type(&(_iter), \
if (!(_k).k) { \ (_flags)); \
_ret3 = 0; \ if (!(_k).k) \
break; \ break; \
} \
\ \
_ret3 = bkey_err(_k) ?: (_do); \ bkey_err(_k) ?: (_do); \
if (bch2_err_matches(_ret3, BCH_ERR_transaction_restart))\ })); \
continue; \ } while (!_ret3 && bch2_btree_iter_rewind(&(_iter))); \
if (_ret3) \
break; \
bch2_trans_verify_not_restarted(_trans, _restart_count);\
if (!bch2_btree_iter_rewind(&(_iter))) \
break; \
} \
\ \
bch2_trans_iter_exit((_trans), &(_iter)); \ bch2_trans_iter_exit((_trans), &(_iter)); \
_ret3; \ _ret3; \
@ -810,7 +753,7 @@ __bch2_btree_iter_peek_upto_and_restart(struct btree_trans *trans,
_start, _iter_flags, _k, \ _start, _iter_flags, _k, \
_disk_res, _journal_seq, _commit_flags,\ _disk_res, _journal_seq, _commit_flags,\
_do) \ _do) \
for_each_btree_key2(_trans, _iter, _btree_id, _start, _iter_flags, _k,\ for_each_btree_key(_trans, _iter, _btree_id, _start, _iter_flags, _k,\
(_do) ?: bch2_trans_commit(_trans, (_disk_res),\ (_do) ?: bch2_trans_commit(_trans, (_disk_res),\
(_journal_seq), (_commit_flags))) (_journal_seq), (_commit_flags)))
@ -826,11 +769,27 @@ __bch2_btree_iter_peek_upto_and_restart(struct btree_trans *trans,
_start, _end, _iter_flags, _k, \ _start, _end, _iter_flags, _k, \
_disk_res, _journal_seq, _commit_flags,\ _disk_res, _journal_seq, _commit_flags,\
_do) \ _do) \
for_each_btree_key2_upto(_trans, _iter, _btree_id, _start, _end, _iter_flags, _k,\ for_each_btree_key_upto(_trans, _iter, _btree_id, _start, _end, _iter_flags, _k,\
(_do) ?: bch2_trans_commit(_trans, (_disk_res),\ (_do) ?: bch2_trans_commit(_trans, (_disk_res),\
(_journal_seq), (_commit_flags))) (_journal_seq), (_commit_flags)))
#define for_each_btree_key(_trans, _iter, _btree_id, \ struct bkey_s_c bch2_btree_iter_peek_and_restart_outlined(struct btree_iter *);
static inline struct bkey_s_c
__bch2_btree_iter_peek_and_restart(struct btree_trans *trans,
struct btree_iter *iter, unsigned flags)
{
struct bkey_s_c k;
while (btree_trans_too_many_iters(trans) ||
(k = bch2_btree_iter_peek_type(iter, flags),
bch2_err_matches(bkey_err(k), BCH_ERR_transaction_restart)))
bch2_trans_begin(trans);
return k;
}
#define for_each_btree_key_old(_trans, _iter, _btree_id, \
_start, _flags, _k, _ret) \ _start, _flags, _k, _ret) \
for (bch2_trans_iter_init((_trans), &(_iter), (_btree_id), \ for (bch2_trans_iter_init((_trans), &(_iter), (_btree_id), \
(_start), (_flags)); \ (_start), (_flags)); \
@ -838,23 +797,6 @@ __bch2_btree_iter_peek_upto_and_restart(struct btree_trans *trans,
!((_ret) = bkey_err(_k)) && (_k).k; \ !((_ret) = bkey_err(_k)) && (_k).k; \
bch2_btree_iter_advance(&(_iter))) bch2_btree_iter_advance(&(_iter)))
#define for_each_btree_key_upto(_trans, _iter, _btree_id, \
_start, _end, _flags, _k, _ret) \
for (bch2_trans_iter_init((_trans), &(_iter), (_btree_id), \
(_start), (_flags)); \
(_k) = __bch2_btree_iter_peek_upto_and_restart((_trans), \
&(_iter), _end, _flags),\
!((_ret) = bkey_err(_k)) && (_k).k; \
bch2_btree_iter_advance(&(_iter)))
#define for_each_btree_key_norestart(_trans, _iter, _btree_id, \
_start, _flags, _k, _ret) \
for (bch2_trans_iter_init((_trans), &(_iter), (_btree_id), \
(_start), (_flags)); \
(_k) = bch2_btree_iter_peek_type(&(_iter), _flags), \
!((_ret) = bkey_err(_k)) && (_k).k; \
bch2_btree_iter_advance(&(_iter)))
#define for_each_btree_key_upto_norestart(_trans, _iter, _btree_id, \ #define for_each_btree_key_upto_norestart(_trans, _iter, _btree_id, \
_start, _end, _flags, _k, _ret) \ _start, _end, _flags, _k, _ret) \
for (bch2_trans_iter_init((_trans), &(_iter), (_btree_id), \ for (bch2_trans_iter_init((_trans), &(_iter), (_btree_id), \
@ -863,24 +805,20 @@ __bch2_btree_iter_peek_upto_and_restart(struct btree_trans *trans,
!((_ret) = bkey_err(_k)) && (_k).k; \ !((_ret) = bkey_err(_k)) && (_k).k; \
bch2_btree_iter_advance(&(_iter))) bch2_btree_iter_advance(&(_iter)))
#define for_each_btree_key_continue(_trans, _iter, _flags, _k, _ret) \
for (; \
(_k) = __bch2_btree_iter_peek_and_restart((_trans), &(_iter), _flags),\
!((_ret) = bkey_err(_k)) && (_k).k; \
bch2_btree_iter_advance(&(_iter)))
#define for_each_btree_key_continue_norestart(_iter, _flags, _k, _ret) \
for (; \
(_k) = bch2_btree_iter_peek_type(&(_iter), _flags), \
!((_ret) = bkey_err(_k)) && (_k).k; \
bch2_btree_iter_advance(&(_iter)))
#define for_each_btree_key_upto_continue_norestart(_iter, _end, _flags, _k, _ret)\ #define for_each_btree_key_upto_continue_norestart(_iter, _end, _flags, _k, _ret)\
for (; \ for (; \
(_k) = bch2_btree_iter_peek_upto_type(&(_iter), _end, _flags), \ (_k) = bch2_btree_iter_peek_upto_type(&(_iter), _end, _flags), \
!((_ret) = bkey_err(_k)) && (_k).k; \ !((_ret) = bkey_err(_k)) && (_k).k; \
bch2_btree_iter_advance(&(_iter))) bch2_btree_iter_advance(&(_iter)))
#define for_each_btree_key_norestart(_trans, _iter, _btree_id, \
_start, _flags, _k, _ret) \
for_each_btree_key_upto_norestart(_trans, _iter, _btree_id, _start,\
SPOS_MAX, _flags, _k, _ret)
#define for_each_btree_key_continue_norestart(_iter, _flags, _k, _ret) \
for_each_btree_key_upto_continue_norestart(_iter, SPOS_MAX, _flags, _k, _ret)
#define drop_locks_do(_trans, _do) \ #define drop_locks_do(_trans, _do) \
({ \ ({ \
bch2_trans_unlock(_trans); \ bch2_trans_unlock(_trans); \
@ -912,10 +850,7 @@ __bch2_btree_iter_peek_upto_and_restart(struct btree_trans *trans,
_p; \ _p; \
}) })
/* new multiple iterator interface: */
void bch2_trans_updates_to_text(struct printbuf *, struct btree_trans *); void bch2_trans_updates_to_text(struct printbuf *, struct btree_trans *);
void bch2_btree_path_to_text(struct printbuf *, struct btree_path *);
void bch2_trans_paths_to_text(struct printbuf *, struct btree_trans *); void bch2_trans_paths_to_text(struct printbuf *, struct btree_trans *);
void bch2_dump_trans_updates(struct btree_trans *); void bch2_dump_trans_updates(struct btree_trans *);
void bch2_dump_trans_paths_updates(struct btree_trans *); void bch2_dump_trans_paths_updates(struct btree_trans *);


@ -73,6 +73,7 @@ static size_t bch2_journal_key_search(struct journal_keys *keys,
return idx_to_pos(keys, __bch2_journal_key_search(keys, id, level, pos)); return idx_to_pos(keys, __bch2_journal_key_search(keys, id, level, pos));
} }
/* Returns first non-overwritten key >= search key: */
struct bkey_i *bch2_journal_keys_peek_upto(struct bch_fs *c, enum btree_id btree_id, struct bkey_i *bch2_journal_keys_peek_upto(struct bch_fs *c, enum btree_id btree_id,
unsigned level, struct bpos pos, unsigned level, struct bpos pos,
struct bpos end_pos, size_t *idx) struct bpos end_pos, size_t *idx)
@ -86,12 +87,26 @@ struct bkey_i *bch2_journal_keys_peek_upto(struct bch_fs *c, enum btree_id btree
if (!*idx) if (!*idx)
*idx = __bch2_journal_key_search(keys, btree_id, level, pos); *idx = __bch2_journal_key_search(keys, btree_id, level, pos);
while (*idx &&
__journal_key_cmp(btree_id, level, end_pos, idx_to_key(keys, *idx - 1)) <= 0) {
--(*idx);
iters++;
if (iters == 10) {
*idx = 0;
goto search;
}
}
while ((k = *idx < keys->nr ? idx_to_key(keys, *idx) : NULL)) { while ((k = *idx < keys->nr ? idx_to_key(keys, *idx) : NULL)) {
if (__journal_key_cmp(btree_id, level, end_pos, k) < 0) if (__journal_key_cmp(btree_id, level, end_pos, k) < 0)
return NULL; return NULL;
if (__journal_key_cmp(btree_id, level, pos, k) <= 0 && if (k->overwritten) {
!k->overwritten) (*idx)++;
continue;
}
if (__journal_key_cmp(btree_id, level, pos, k) <= 0)
return k->k; return k->k;
(*idx)++; (*idx)++;
@ -162,7 +177,7 @@ int bch2_journal_key_insert_take(struct bch_fs *c, enum btree_id id,
struct journal_keys *keys = &c->journal_keys; struct journal_keys *keys = &c->journal_keys;
size_t idx = bch2_journal_key_search(keys, id, level, k->k.p); size_t idx = bch2_journal_key_search(keys, id, level, k->k.p);
BUG_ON(test_bit(BCH_FS_RW, &c->flags)); BUG_ON(test_bit(BCH_FS_rw, &c->flags));
if (idx < keys->size && if (idx < keys->size &&
journal_key_cmp(&n, &keys->d[idx]) == 0) { journal_key_cmp(&n, &keys->d[idx]) == 0) {
@ -452,9 +467,7 @@ static void __journal_keys_sort(struct journal_keys *keys)
src = dst = keys->d; src = dst = keys->d;
while (src < keys->d + keys->nr) { while (src < keys->d + keys->nr) {
while (src + 1 < keys->d + keys->nr && while (src + 1 < keys->d + keys->nr &&
src[0].btree_id == src[1].btree_id && !journal_key_cmp(src, src + 1))
src[0].level == src[1].level &&
bpos_eq(src[0].k->k.p, src[1].k->k.p))
src++; src++;
*dst++ = *src++; *dst++ = *src++;


@ -630,7 +630,7 @@ static int btree_key_cache_flush_pos(struct btree_trans *trans,
if (ret) if (ret)
goto out; goto out;
ck = (void *) c_iter.path->l[0].b; ck = (void *) btree_iter_path(trans, &c_iter)->l[0].b;
if (!ck) if (!ck)
goto out; goto out;
@@ -645,22 +645,29 @@ static int btree_key_cache_flush_pos(struct btree_trans *trans,
 	if (journal_seq && ck->journal.seq != journal_seq)
 		goto out;
 
+	trans->journal_res.seq = ck->journal.seq;
+
 	/*
-	 * Since journal reclaim depends on us making progress here, and the
-	 * allocator/copygc depend on journal reclaim making progress, we need
-	 * to be using alloc reserves:
+	 * If we're at the end of the journal, we really want to free up space
+	 * in the journal right away - we don't want to pin that old journal
+	 * sequence number with a new btree node write, we want to re-journal
+	 * the update
 	 */
+	if (ck->journal.seq == journal_last_seq(j))
+		commit_flags |= BCH_WATERMARK_reclaim;
+
+	if (ck->journal.seq != journal_last_seq(j) ||
+	    j->watermark == BCH_WATERMARK_stripe)
+		commit_flags |= BCH_TRANS_COMMIT_no_journal_res;
+
 	ret   = bch2_btree_iter_traverse(&b_iter) ?:
 		bch2_trans_update(trans, &b_iter, ck->k,
 				  BTREE_UPDATE_KEY_CACHE_RECLAIM|
 				  BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE|
 				  BTREE_TRIGGER_NORUN) ?:
 		bch2_trans_commit(trans, NULL, NULL,
-				  BTREE_INSERT_NOCHECK_RW|
-				  BTREE_INSERT_NOFAIL|
-				  (ck->journal.seq == journal_last_seq(j)
-				   ? BCH_WATERMARK_reclaim
-				   : 0)|
+				  BCH_TRANS_COMMIT_no_check_rw|
+				  BCH_TRANS_COMMIT_no_enospc|
 				  commit_flags);
bch2_fs_fatal_err_on(ret && bch2_fs_fatal_err_on(ret &&
@ -673,7 +680,8 @@ static int btree_key_cache_flush_pos(struct btree_trans *trans,
bch2_journal_pin_drop(j, &ck->journal); bch2_journal_pin_drop(j, &ck->journal);
BUG_ON(!btree_node_locked(c_iter.path, 0)); struct btree_path *path = btree_iter_path(trans, &c_iter);
BUG_ON(!btree_node_locked(path, 0));
if (!evict) { if (!evict) {
if (test_bit(BKEY_CACHED_DIRTY, &ck->flags)) { if (test_bit(BKEY_CACHED_DIRTY, &ck->flags)) {
@ -682,19 +690,20 @@ static int btree_key_cache_flush_pos(struct btree_trans *trans,
} }
} else { } else {
struct btree_path *path2; struct btree_path *path2;
unsigned i;
evict: evict:
trans_for_each_path(trans, path2) trans_for_each_path(trans, path2, i)
if (path2 != c_iter.path) if (path2 != path)
__bch2_btree_path_unlock(trans, path2); __bch2_btree_path_unlock(trans, path2);
bch2_btree_node_lock_write_nofail(trans, c_iter.path, &ck->c); bch2_btree_node_lock_write_nofail(trans, path, &ck->c);
if (test_bit(BKEY_CACHED_DIRTY, &ck->flags)) { if (test_bit(BKEY_CACHED_DIRTY, &ck->flags)) {
clear_bit(BKEY_CACHED_DIRTY, &ck->flags); clear_bit(BKEY_CACHED_DIRTY, &ck->flags);
atomic_long_dec(&c->btree_key_cache.nr_dirty); atomic_long_dec(&c->btree_key_cache.nr_dirty);
} }
mark_btree_node_locked_noreset(c_iter.path, 0, BTREE_NODE_UNLOCKED); mark_btree_node_locked_noreset(path, 0, BTREE_NODE_UNLOCKED);
bkey_cached_evict(&c->btree_key_cache, ck); bkey_cached_evict(&c->btree_key_cache, ck);
bkey_cached_free_fast(&c->btree_key_cache, ck); bkey_cached_free_fast(&c->btree_key_cache, ck);
} }
@ -732,9 +741,9 @@ int bch2_btree_key_cache_journal_flush(struct journal *j,
} }
six_unlock_read(&ck->c.lock); six_unlock_read(&ck->c.lock);
ret = commit_do(trans, NULL, NULL, 0, ret = lockrestart_do(trans,
btree_key_cache_flush_pos(trans, key, seq, btree_key_cache_flush_pos(trans, key, seq,
BTREE_INSERT_JOURNAL_RECLAIM, false)); BCH_TRANS_COMMIT_journal_reclaim, false));
unlock: unlock:
srcu_read_unlock(&c->btree_trans_barrier, srcu_idx); srcu_read_unlock(&c->btree_trans_barrier, srcu_idx);
@ -742,28 +751,12 @@ int bch2_btree_key_cache_journal_flush(struct journal *j,
return ret; return ret;
} }
/*
* Flush and evict a key from the key cache:
*/
int bch2_btree_key_cache_flush(struct btree_trans *trans,
enum btree_id id, struct bpos pos)
{
struct bch_fs *c = trans->c;
struct bkey_cached_key key = { id, pos };
/* Fastpath - assume it won't be found: */
if (!bch2_btree_key_cache_find(c, id, pos))
return 0;
return btree_key_cache_flush_pos(trans, key, 0, 0, true);
}
bool bch2_btree_insert_key_cached(struct btree_trans *trans, bool bch2_btree_insert_key_cached(struct btree_trans *trans,
unsigned flags, unsigned flags,
struct btree_insert_entry *insert_entry) struct btree_insert_entry *insert_entry)
{ {
struct bch_fs *c = trans->c; struct bch_fs *c = trans->c;
struct bkey_cached *ck = (void *) insert_entry->path->l[0].b; struct bkey_cached *ck = (void *) (trans->paths + insert_entry->path)->l[0].b;
struct bkey_i *insert = insert_entry->k; struct bkey_i *insert = insert_entry->k;
bool kick_reclaim = false; bool kick_reclaim = false;
@ -773,7 +766,7 @@ bool bch2_btree_insert_key_cached(struct btree_trans *trans,
ck->valid = true; ck->valid = true;
if (!test_bit(BKEY_CACHED_DIRTY, &ck->flags)) { if (!test_bit(BKEY_CACHED_DIRTY, &ck->flags)) {
EBUG_ON(test_bit(BCH_FS_CLEAN_SHUTDOWN, &c->flags)); EBUG_ON(test_bit(BCH_FS_clean_shutdown, &c->flags));
set_bit(BKEY_CACHED_DIRTY, &ck->flags); set_bit(BKEY_CACHED_DIRTY, &ck->flags);
atomic_long_inc(&c->btree_key_cache.nr_dirty); atomic_long_inc(&c->btree_key_cache.nr_dirty);
@ -1000,7 +993,7 @@ void bch2_fs_btree_key_cache_exit(struct btree_key_cache *bc)
if (atomic_long_read(&bc->nr_dirty) && if (atomic_long_read(&bc->nr_dirty) &&
!bch2_journal_error(&c->journal) && !bch2_journal_error(&c->journal) &&
test_bit(BCH_FS_WAS_RW, &c->flags)) test_bit(BCH_FS_was_rw, &c->flags))
panic("btree key cache shutdown error: nr_dirty nonzero (%li)\n", panic("btree key cache shutdown error: nr_dirty nonzero (%li)\n",
atomic_long_read(&bc->nr_dirty)); atomic_long_read(&bc->nr_dirty));


@ -31,8 +31,6 @@ int bch2_btree_path_traverse_cached(struct btree_trans *, struct btree_path *,
bool bch2_btree_insert_key_cached(struct btree_trans *, unsigned, bool bch2_btree_insert_key_cached(struct btree_trans *, unsigned,
struct btree_insert_entry *); struct btree_insert_entry *);
int bch2_btree_key_cache_flush(struct btree_trans *,
enum btree_id, struct bpos);
void bch2_btree_key_cache_drop(struct btree_trans *, void bch2_btree_key_cache_drop(struct btree_trans *,
struct btree_path *); struct btree_path *);


@ -32,13 +32,14 @@ struct six_lock_count bch2_btree_node_lock_counts(struct btree_trans *trans,
{ {
struct btree_path *path; struct btree_path *path;
struct six_lock_count ret; struct six_lock_count ret;
unsigned i;
memset(&ret, 0, sizeof(ret)); memset(&ret, 0, sizeof(ret));
if (IS_ERR_OR_NULL(b)) if (IS_ERR_OR_NULL(b))
return ret; return ret;
trans_for_each_path(trans, path) trans_for_each_path(trans, path, i)
if (path != skip && &path->l[level].b->c == b) { if (path != skip && &path->l[level].b->c == b) {
int t = btree_node_locked_type(path, level); int t = btree_node_locked_type(path, level);
@ -85,8 +86,14 @@ static noinline void print_cycle(struct printbuf *out, struct lock_graph *g)
prt_printf(out, "Found lock cycle (%u entries):", g->nr); prt_printf(out, "Found lock cycle (%u entries):", g->nr);
prt_newline(out); prt_newline(out);
for (i = g->g; i < g->g + g->nr; i++) for (i = g->g; i < g->g + g->nr; i++) {
struct task_struct *task = READ_ONCE(i->trans->locking_wait.task);
if (!task)
continue;
bch2_btree_trans_to_text(out, i->trans); bch2_btree_trans_to_text(out, i->trans);
bch2_prt_task_backtrace(out, task, i == g->g ? 5 : 1);
}
} }
static noinline void print_chain(struct printbuf *out, struct lock_graph *g) static noinline void print_chain(struct printbuf *out, struct lock_graph *g)
@ -94,9 +101,10 @@ static noinline void print_chain(struct printbuf *out, struct lock_graph *g)
struct trans_waiting_for_lock *i; struct trans_waiting_for_lock *i;
for (i = g->g; i != g->g + g->nr; i++) { for (i = g->g; i != g->g + g->nr; i++) {
struct task_struct *task = i->trans->locking_wait.task;
if (i != g->g) if (i != g->g)
prt_str(out, "<- "); prt_str(out, "<- ");
prt_printf(out, "%u ", i->trans->locking_wait.task->pid); prt_printf(out, "%u ", task ?task->pid : 0);
} }
prt_newline(out); prt_newline(out);
} }
@ -142,10 +150,27 @@ static bool lock_graph_remove_non_waiters(struct lock_graph *g)
return false; return false;
} }
static void trace_would_deadlock(struct lock_graph *g, struct btree_trans *trans)
{
struct bch_fs *c = trans->c;
count_event(c, trans_restart_would_deadlock);
if (trace_trans_restart_would_deadlock_enabled()) {
struct printbuf buf = PRINTBUF;
buf.atomic++;
print_cycle(&buf, g);
trace_trans_restart_would_deadlock(trans, buf.buf);
printbuf_exit(&buf);
}
}
static int abort_lock(struct lock_graph *g, struct trans_waiting_for_lock *i) static int abort_lock(struct lock_graph *g, struct trans_waiting_for_lock *i)
{ {
if (i == g->g) { if (i == g->g) {
trace_and_count(i->trans->c, trans_restart_would_deadlock, i->trans, _RET_IP_); trace_would_deadlock(g, i->trans);
return btree_trans_restart(i->trans, BCH_ERR_transaction_restart_would_deadlock); return btree_trans_restart(i->trans, BCH_ERR_transaction_restart_would_deadlock);
} else { } else {
i->trans->lock_must_abort = true; i->trans->lock_must_abort = true;
@ -202,7 +227,7 @@ static noinline int break_cycle(struct lock_graph *g, struct printbuf *cycle)
prt_printf(&buf, "backtrace:"); prt_printf(&buf, "backtrace:");
prt_newline(&buf); prt_newline(&buf);
printbuf_indent_add(&buf, 2); printbuf_indent_add(&buf, 2);
bch2_prt_task_backtrace(&buf, trans->locking_wait.task); bch2_prt_task_backtrace(&buf, trans->locking_wait.task, 2);
printbuf_indent_sub(&buf, 2); printbuf_indent_sub(&buf, 2);
prt_newline(&buf); prt_newline(&buf);
} }
@ -262,27 +287,40 @@ int bch2_check_for_deadlock(struct btree_trans *trans, struct printbuf *cycle)
struct lock_graph g; struct lock_graph g;
struct trans_waiting_for_lock *top; struct trans_waiting_for_lock *top;
struct btree_bkey_cached_common *b; struct btree_bkey_cached_common *b;
struct btree_path *path; btree_path_idx_t path_idx;
unsigned path_idx; int ret = 0;
int ret;
g.nr = 0;
if (trans->lock_must_abort) { if (trans->lock_must_abort) {
if (cycle) if (cycle)
return -1; return -1;
trace_and_count(trans->c, trans_restart_would_deadlock, trans, _RET_IP_); trace_would_deadlock(&g, trans);
return btree_trans_restart(trans, BCH_ERR_transaction_restart_would_deadlock); return btree_trans_restart(trans, BCH_ERR_transaction_restart_would_deadlock);
} }
g.nr = 0;
lock_graph_down(&g, trans); lock_graph_down(&g, trans);
/* trans->paths is rcu protected vs. freeing */
rcu_read_lock();
if (cycle)
cycle->atomic++;
next: next:
if (!g.nr) if (!g.nr)
return 0; goto out;
top = &g.g[g.nr - 1]; top = &g.g[g.nr - 1];
trans_for_each_path_safe_from(top->trans, path, path_idx, top->path_idx) { struct btree_path *paths = rcu_dereference(top->trans->paths);
if (!paths)
goto up;
unsigned long *paths_allocated = trans_paths_allocated(paths);
trans_for_each_path_idx_from(paths_allocated, *trans_paths_nr(paths),
path_idx, top->path_idx) {
struct btree_path *path = paths + path_idx;
if (!path->nodes_locked) if (!path->nodes_locked)
continue; continue;
@ -348,18 +386,23 @@ int bch2_check_for_deadlock(struct btree_trans *trans, struct printbuf *cycle)
ret = lock_graph_descend(&g, trans, cycle); ret = lock_graph_descend(&g, trans, cycle);
if (ret) if (ret)
return ret; goto out;
goto next; goto next;
} }
raw_spin_unlock(&b->lock.wait_lock); raw_spin_unlock(&b->lock.wait_lock);
} }
} }
up:
if (g.nr > 1 && cycle) if (g.nr > 1 && cycle)
print_chain(cycle, &g); print_chain(cycle, &g);
lock_graph_up(&g); lock_graph_up(&g);
goto next; goto next;
out:
if (cycle)
--cycle->atomic;
rcu_read_unlock();
return ret;
} }
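
The cycle printbuf is used while holding rcu_read_lock() and lock wait_locks, so its atomic count is bumped so that prt_*() calls won't block on memory allocation (output may be truncated instead). A minimal sketch of the same pattern, with a hypothetical helper name:

static void prt_task_pid_atomic(struct printbuf *out, struct task_struct *task)
{
        /* mark the buffer atomic: no sleeping allocations from prt_*() */
        out->atomic++;
        prt_printf(out, "%u ", task ? task->pid : 0);
        --out->atomic;
}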
int bch2_six_check_for_deadlock(struct six_lock *lock, void *p) int bch2_six_check_for_deadlock(struct six_lock *lock, void *p)
@ -398,7 +441,7 @@ void bch2_btree_node_lock_write_nofail(struct btree_trans *trans,
struct btree_bkey_cached_common *b) struct btree_bkey_cached_common *b)
{ {
struct btree_path *linked; struct btree_path *linked;
unsigned i; unsigned i, iter;
int ret; int ret;
/* /*
@ -412,7 +455,7 @@ void bch2_btree_node_lock_write_nofail(struct btree_trans *trans,
* already taken are no longer needed: * already taken are no longer needed:
*/ */
trans_for_each_path(trans, linked) { trans_for_each_path(trans, linked, iter) {
if (!linked->nodes_locked) if (!linked->nodes_locked)
continue; continue;
@ -624,8 +667,6 @@ bool __bch2_btree_path_upgrade(struct btree_trans *trans,
unsigned new_locks_want, unsigned new_locks_want,
struct get_locks_fail *f) struct get_locks_fail *f)
{ {
struct btree_path *linked;
if (bch2_btree_path_upgrade_noupgrade_sibs(trans, path, new_locks_want, f)) if (bch2_btree_path_upgrade_noupgrade_sibs(trans, path, new_locks_want, f))
return true; return true;
@ -648,8 +689,11 @@ bool __bch2_btree_path_upgrade(struct btree_trans *trans,
* before interior nodes - now that's handled by * before interior nodes - now that's handled by
* bch2_btree_path_traverse_all(). * bch2_btree_path_traverse_all().
*/ */
if (!path->cached && !trans->in_traverse_all) if (!path->cached && !trans->in_traverse_all) {
trans_for_each_path(trans, linked) struct btree_path *linked;
unsigned i;
trans_for_each_path(trans, linked, i)
if (linked != path && if (linked != path &&
linked->cached == path->cached && linked->cached == path->cached &&
linked->btree_id == path->btree_id && linked->btree_id == path->btree_id &&
@ -657,6 +701,7 @@ bool __bch2_btree_path_upgrade(struct btree_trans *trans,
linked->locks_want = new_locks_want; linked->locks_want = new_locks_want;
btree_path_get_locks(trans, linked, true, NULL); btree_path_get_locks(trans, linked, true, NULL);
} }
}
return false; return false;
} }
@ -665,7 +710,7 @@ void __bch2_btree_path_downgrade(struct btree_trans *trans,
struct btree_path *path, struct btree_path *path,
unsigned new_locks_want) unsigned new_locks_want)
{ {
unsigned l; unsigned l, old_locks_want = path->locks_want;
if (trans->restarted) if (trans->restarted)
return; return;
@ -689,8 +734,7 @@ void __bch2_btree_path_downgrade(struct btree_trans *trans,
bch2_btree_path_verify_locks(path); bch2_btree_path_verify_locks(path);
path->downgrade_seq++; trace_path_downgrade(trans, _RET_IP_, path, old_locks_want);
trace_path_downgrade(trans, _RET_IP_, path);
} }
/* Btree transaction locking: */ /* Btree transaction locking: */
@ -698,22 +742,24 @@ void __bch2_btree_path_downgrade(struct btree_trans *trans,
void bch2_trans_downgrade(struct btree_trans *trans) void bch2_trans_downgrade(struct btree_trans *trans)
{ {
struct btree_path *path; struct btree_path *path;
unsigned i;
if (trans->restarted) if (trans->restarted)
return; return;
trans_for_each_path(trans, path) trans_for_each_path(trans, path, i)
bch2_btree_path_downgrade(trans, path); bch2_btree_path_downgrade(trans, path);
} }
int bch2_trans_relock(struct btree_trans *trans) int bch2_trans_relock(struct btree_trans *trans)
{ {
struct btree_path *path; struct btree_path *path;
unsigned i;
if (unlikely(trans->restarted)) if (unlikely(trans->restarted))
return -((int) trans->restarted); return -((int) trans->restarted);
trans_for_each_path(trans, path) trans_for_each_path(trans, path, i)
if (path->should_be_locked && if (path->should_be_locked &&
!bch2_btree_path_relock_norestart(trans, path, _RET_IP_)) { !bch2_btree_path_relock_norestart(trans, path, _RET_IP_)) {
trace_and_count(trans->c, trans_restart_relock, trans, _RET_IP_, path); trace_and_count(trans->c, trans_restart_relock, trans, _RET_IP_, path);
@ -725,11 +771,12 @@ int bch2_trans_relock(struct btree_trans *trans)
int bch2_trans_relock_notrace(struct btree_trans *trans) int bch2_trans_relock_notrace(struct btree_trans *trans)
{ {
struct btree_path *path; struct btree_path *path;
unsigned i;
if (unlikely(trans->restarted)) if (unlikely(trans->restarted))
return -((int) trans->restarted); return -((int) trans->restarted);
trans_for_each_path(trans, path) trans_for_each_path(trans, path, i)
if (path->should_be_locked && if (path->should_be_locked &&
!bch2_btree_path_relock_norestart(trans, path, _RET_IP_)) { !bch2_btree_path_relock_norestart(trans, path, _RET_IP_)) {
return btree_trans_restart(trans, BCH_ERR_transaction_restart_relock); return btree_trans_restart(trans, BCH_ERR_transaction_restart_relock);
@ -740,16 +787,18 @@ int bch2_trans_relock_notrace(struct btree_trans *trans)
void bch2_trans_unlock_noassert(struct btree_trans *trans) void bch2_trans_unlock_noassert(struct btree_trans *trans)
{ {
struct btree_path *path; struct btree_path *path;
unsigned i;
trans_for_each_path(trans, path) trans_for_each_path(trans, path, i)
__bch2_btree_path_unlock(trans, path); __bch2_btree_path_unlock(trans, path);
} }
void bch2_trans_unlock(struct btree_trans *trans) void bch2_trans_unlock(struct btree_trans *trans)
{ {
struct btree_path *path; struct btree_path *path;
unsigned i;
trans_for_each_path(trans, path) trans_for_each_path(trans, path, i)
__bch2_btree_path_unlock(trans, path); __bch2_btree_path_unlock(trans, path);
} }
@ -762,8 +811,9 @@ void bch2_trans_unlock_long(struct btree_trans *trans)
bool bch2_trans_locked(struct btree_trans *trans) bool bch2_trans_locked(struct btree_trans *trans)
{ {
struct btree_path *path; struct btree_path *path;
unsigned i;
trans_for_each_path(trans, path) trans_for_each_path(trans, path, i)
if (path->nodes_locked) if (path->nodes_locked)
return true; return true;
return false; return false;
@ -809,8 +859,9 @@ void bch2_btree_path_verify_locks(struct btree_path *path)
void bch2_trans_verify_locks(struct btree_trans *trans) void bch2_trans_verify_locks(struct btree_trans *trans)
{ {
struct btree_path *path; struct btree_path *path;
unsigned i;
trans_for_each_path(trans, path) trans_for_each_path(trans, path, i)
bch2_btree_path_verify_locks(path); bch2_btree_path_verify_locks(path);
} }


@ -122,12 +122,9 @@ static void btree_trans_lock_hold_time_update(struct btree_trans *trans,
struct btree_path *path, unsigned level) struct btree_path *path, unsigned level)
{ {
#ifdef CONFIG_BCACHEFS_LOCK_TIME_STATS #ifdef CONFIG_BCACHEFS_LOCK_TIME_STATS
struct btree_transaction_stats *s = btree_trans_stats(trans); __bch2_time_stats_update(&btree_trans_stats(trans)->lock_hold_times,
path->l[level].lock_taken_time,
if (s) local_clock());
__bch2_time_stats_update(&s->lock_hold_times,
path->l[level].lock_taken_time,
local_clock());
#endif #endif
} }
@ -175,6 +172,7 @@ bch2_btree_node_unlock_write_inlined(struct btree_trans *trans, struct btree_pat
struct btree *b) struct btree *b)
{ {
struct btree_path *linked; struct btree_path *linked;
unsigned i;
EBUG_ON(path->l[b->c.level].b != b); EBUG_ON(path->l[b->c.level].b != b);
EBUG_ON(path->l[b->c.level].lock_seq != six_lock_seq(&b->c.lock)); EBUG_ON(path->l[b->c.level].lock_seq != six_lock_seq(&b->c.lock));
@ -182,7 +180,7 @@ bch2_btree_node_unlock_write_inlined(struct btree_trans *trans, struct btree_pat
mark_btree_node_locked_noreset(path, b->c.level, BTREE_NODE_INTENT_LOCKED); mark_btree_node_locked_noreset(path, b->c.level, BTREE_NODE_INTENT_LOCKED);
trans_for_each_path_with_node(trans, b, linked) trans_for_each_path_with_node(trans, b, linked, i)
linked->l[b->c.level].lock_seq++; linked->l[b->c.level].lock_seq++;
six_unlock_write(&b->c.lock); six_unlock_write(&b->c.lock);
@ -242,8 +240,9 @@ static inline bool btree_node_lock_increment(struct btree_trans *trans,
enum btree_node_locked_type want) enum btree_node_locked_type want)
{ {
struct btree_path *path; struct btree_path *path;
unsigned i;
trans_for_each_path(trans, path) trans_for_each_path(trans, path, i)
if (&path->l[level].b->c == b && if (&path->l[level].b->c == b &&
btree_node_locked_type(path, level) >= want) { btree_node_locked_type(path, level) >= want) {
six_lock_increment(&b->lock, (enum six_lock_type) want); six_lock_increment(&b->lock, (enum six_lock_type) want);
@ -263,7 +262,6 @@ static inline int btree_node_lock(struct btree_trans *trans,
int ret = 0; int ret = 0;
EBUG_ON(level >= BTREE_MAX_DEPTH); EBUG_ON(level >= BTREE_MAX_DEPTH);
EBUG_ON(!(trans->paths_allocated & (1ULL << path->idx)));
if (likely(six_trylock_type(&b->lock, type)) || if (likely(six_trylock_type(&b->lock, type)) ||
btree_node_lock_increment(trans, b, level, (enum btree_node_locked_type) type) || btree_node_lock_increment(trans, b, level, (enum btree_node_locked_type) type) ||
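
The dropped EBUG_ON above reflects that trans->paths_allocated is no longer a u64 mask but a bitmap sized to the growable paths array. A hedged sketch of the equivalent check under that assumption, using a hypothetical helper and the standard bitmap API:

static bool path_idx_is_allocated(struct btree_trans *trans,
                                  btree_path_idx_t idx)
{
        /* paths_allocated is an unsigned long bitmap, one bit per path slot */
        return test_bit(idx, trans->paths_allocated);
}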


@ -12,6 +12,7 @@
#include "errcode.h" #include "errcode.h"
#include "error.h" #include "error.h"
#include "journal.h" #include "journal.h"
#include "journal_io.h"
#include "journal_reclaim.h" #include "journal_reclaim.h"
#include "replicas.h" #include "replicas.h"
#include "snapshot.h" #include "snapshot.h"
@ -23,7 +24,7 @@ static void verify_update_old_key(struct btree_trans *trans, struct btree_insert
#ifdef CONFIG_BCACHEFS_DEBUG #ifdef CONFIG_BCACHEFS_DEBUG
struct bch_fs *c = trans->c; struct bch_fs *c = trans->c;
struct bkey u; struct bkey u;
struct bkey_s_c k = bch2_btree_path_peek_slot_exact(i->path, &u); struct bkey_s_c k = bch2_btree_path_peek_slot_exact(trans->paths + i->path, &u);
if (unlikely(trans->journal_replay_not_finished)) { if (unlikely(trans->journal_replay_not_finished)) {
struct bkey_i *j_k = struct bkey_i *j_k =
@ -41,23 +42,23 @@ static void verify_update_old_key(struct btree_trans *trans, struct btree_insert
#endif #endif
} }
static inline struct btree_path_level *insert_l(struct btree_insert_entry *i) static inline struct btree_path_level *insert_l(struct btree_trans *trans, struct btree_insert_entry *i)
{ {
return i->path->l + i->level; return (trans->paths + i->path)->l + i->level;
} }
static inline bool same_leaf_as_prev(struct btree_trans *trans, static inline bool same_leaf_as_prev(struct btree_trans *trans,
struct btree_insert_entry *i) struct btree_insert_entry *i)
{ {
return i != trans->updates && return i != trans->updates &&
insert_l(&i[0])->b == insert_l(&i[-1])->b; insert_l(trans, &i[0])->b == insert_l(trans, &i[-1])->b;
} }
static inline bool same_leaf_as_next(struct btree_trans *trans, static inline bool same_leaf_as_next(struct btree_trans *trans,
struct btree_insert_entry *i) struct btree_insert_entry *i)
{ {
return i + 1 < trans->updates + trans->nr_updates && return i + 1 < trans->updates + trans->nr_updates &&
insert_l(&i[0])->b == insert_l(&i[1])->b; insert_l(trans, &i[0])->b == insert_l(trans, &i[1])->b;
} }
inline void bch2_btree_node_prep_for_write(struct btree_trans *trans, inline void bch2_btree_node_prep_for_write(struct btree_trans *trans,
@ -84,7 +85,7 @@ static noinline int trans_lock_write_fail(struct btree_trans *trans, struct btre
if (same_leaf_as_prev(trans, i)) if (same_leaf_as_prev(trans, i))
continue; continue;
bch2_btree_node_unlock_write(trans, i->path, insert_l(i)->b); bch2_btree_node_unlock_write(trans, trans->paths + i->path, insert_l(trans, i)->b);
} }
trace_and_count(trans->c, trans_restart_would_deadlock_write, trans); trace_and_count(trans->c, trans_restart_would_deadlock_write, trans);
@ -93,19 +94,17 @@ static noinline int trans_lock_write_fail(struct btree_trans *trans, struct btre
static inline int bch2_trans_lock_write(struct btree_trans *trans) static inline int bch2_trans_lock_write(struct btree_trans *trans)
{ {
struct btree_insert_entry *i;
EBUG_ON(trans->write_locked); EBUG_ON(trans->write_locked);
trans_for_each_update(trans, i) { trans_for_each_update(trans, i) {
if (same_leaf_as_prev(trans, i)) if (same_leaf_as_prev(trans, i))
continue; continue;
if (bch2_btree_node_lock_write(trans, i->path, &insert_l(i)->b->c)) if (bch2_btree_node_lock_write(trans, trans->paths + i->path, &insert_l(trans, i)->b->c))
return trans_lock_write_fail(trans, i); return trans_lock_write_fail(trans, i);
if (!i->cached) if (!i->cached)
bch2_btree_node_prep_for_write(trans, i->path, insert_l(i)->b); bch2_btree_node_prep_for_write(trans, trans->paths + i->path, insert_l(trans, i)->b);
} }
trans->write_locked = true; trans->write_locked = true;
@ -115,12 +114,10 @@ static inline int bch2_trans_lock_write(struct btree_trans *trans)
static inline void bch2_trans_unlock_write(struct btree_trans *trans) static inline void bch2_trans_unlock_write(struct btree_trans *trans)
{ {
if (likely(trans->write_locked)) { if (likely(trans->write_locked)) {
struct btree_insert_entry *i;
trans_for_each_update(trans, i) trans_for_each_update(trans, i)
if (!same_leaf_as_prev(trans, i)) if (!same_leaf_as_prev(trans, i))
bch2_btree_node_unlock_write_inlined(trans, i->path, bch2_btree_node_unlock_write_inlined(trans,
insert_l(i)->b); trans->paths + i->path, insert_l(trans, i)->b);
trans->write_locked = false; trans->write_locked = false;
} }
} }
@ -287,7 +284,7 @@ inline void bch2_btree_insert_key_leaf(struct btree_trans *trans,
bch2_btree_add_journal_pin(c, b, journal_seq); bch2_btree_add_journal_pin(c, b, journal_seq);
if (unlikely(!btree_node_dirty(b))) { if (unlikely(!btree_node_dirty(b))) {
EBUG_ON(test_bit(BCH_FS_CLEAN_SHUTDOWN, &c->flags)); EBUG_ON(test_bit(BCH_FS_clean_shutdown, &c->flags));
set_btree_node_dirty_acct(c, b); set_btree_node_dirty_acct(c, b);
} }
@ -311,10 +308,12 @@ inline void bch2_btree_insert_key_leaf(struct btree_trans *trans,
static inline void btree_insert_entry_checks(struct btree_trans *trans, static inline void btree_insert_entry_checks(struct btree_trans *trans,
struct btree_insert_entry *i) struct btree_insert_entry *i)
{ {
BUG_ON(!bpos_eq(i->k->k.p, i->path->pos)); struct btree_path *path = trans->paths + i->path;
BUG_ON(i->cached != i->path->cached);
BUG_ON(i->level != i->path->level); BUG_ON(!bpos_eq(i->k->k.p, path->pos));
BUG_ON(i->btree_id != i->path->btree_id); BUG_ON(i->cached != path->cached);
BUG_ON(i->level != path->level);
BUG_ON(i->btree_id != path->btree_id);
EBUG_ON(!i->level && EBUG_ON(!i->level &&
btree_type_has_snapshots(i->btree_id) && btree_type_has_snapshots(i->btree_id) &&
!(i->flags & BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE) && !(i->flags & BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE) &&
@ -361,8 +360,6 @@ noinline static int
btree_key_can_insert_cached_slowpath(struct btree_trans *trans, unsigned flags, btree_key_can_insert_cached_slowpath(struct btree_trans *trans, unsigned flags,
struct btree_path *path, unsigned new_u64s) struct btree_path *path, unsigned new_u64s)
{ {
struct bch_fs *c = trans->c;
struct btree_insert_entry *i;
struct bkey_cached *ck = (void *) path->l[0].b; struct bkey_cached *ck = (void *) path->l[0].b;
struct bkey_i *new_k; struct bkey_i *new_k;
int ret; int ret;
@ -372,7 +369,7 @@ btree_key_can_insert_cached_slowpath(struct btree_trans *trans, unsigned flags,
new_k = kmalloc(new_u64s * sizeof(u64), GFP_KERNEL); new_k = kmalloc(new_u64s * sizeof(u64), GFP_KERNEL);
if (!new_k) { if (!new_k) {
bch_err(c, "error allocating memory for key cache key, btree %s u64s %u", bch_err(trans->c, "error allocating memory for key cache key, btree %s u64s %u",
bch2_btree_id_str(path->btree_id), new_u64s); bch2_btree_id_str(path->btree_id), new_u64s);
return -BCH_ERR_ENOMEM_btree_key_cache_insert; return -BCH_ERR_ENOMEM_btree_key_cache_insert;
} }
@ -401,7 +398,6 @@ static int btree_key_can_insert_cached(struct btree_trans *trans, unsigned flags
{ {
struct bch_fs *c = trans->c; struct bch_fs *c = trans->c;
struct bkey_cached *ck = (void *) path->l[0].b; struct bkey_cached *ck = (void *) path->l[0].b;
struct btree_insert_entry *i;
unsigned new_u64s; unsigned new_u64s;
struct bkey_i *new_k; struct bkey_i *new_k;
@ -409,7 +405,7 @@ static int btree_key_can_insert_cached(struct btree_trans *trans, unsigned flags
if (!test_bit(BKEY_CACHED_DIRTY, &ck->flags) && if (!test_bit(BKEY_CACHED_DIRTY, &ck->flags) &&
bch2_btree_key_cache_must_wait(c) && bch2_btree_key_cache_must_wait(c) &&
!(flags & BTREE_INSERT_JOURNAL_RECLAIM)) !(flags & BCH_TRANS_COMMIT_journal_reclaim))
return -BCH_ERR_btree_insert_need_journal_reclaim; return -BCH_ERR_btree_insert_need_journal_reclaim;
/* /*
@ -455,22 +451,15 @@ static int run_one_mem_trigger(struct btree_trans *trans,
if (!btree_node_type_needs_gc(__btree_node_type(i->level, i->btree_id))) if (!btree_node_type_needs_gc(__btree_node_type(i->level, i->btree_id)))
return 0; return 0;
if (old_ops->atomic_trigger == new_ops->atomic_trigger) { if (old_ops->trigger == new_ops->trigger) {
ret = bch2_mark_key(trans, i->btree_id, i->level, ret = bch2_key_trigger(trans, i->btree_id, i->level,
old, bkey_i_to_s_c(new), old, bkey_i_to_s(new),
BTREE_TRIGGER_INSERT|BTREE_TRIGGER_OVERWRITE|flags); BTREE_TRIGGER_INSERT|BTREE_TRIGGER_OVERWRITE|flags);
} else { } else {
struct bkey _deleted = KEY(0, 0, 0); ret = bch2_key_trigger_new(trans, i->btree_id, i->level,
struct bkey_s_c deleted = (struct bkey_s_c) { &_deleted, NULL }; bkey_i_to_s(new), flags) ?:
bch2_key_trigger_old(trans, i->btree_id, i->level,
_deleted.p = i->path->pos; old, flags);
ret = bch2_mark_key(trans, i->btree_id, i->level,
deleted, bkey_i_to_s_c(new),
BTREE_TRIGGER_INSERT|flags) ?:
bch2_mark_key(trans, i->btree_id, i->level,
old, deleted,
BTREE_TRIGGER_OVERWRITE|flags);
} }
return ret; return ret;
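
Where the old and new key types share a trigger, the rewritten run_one_mem_trigger() above issues a single call carrying both the INSERT and OVERWRITE flags; otherwise it falls back to separate bch2_key_trigger_new()/bch2_key_trigger_old() calls. A small illustrative wrapper for the combined case (hypothetical name, not from the patch):

static int mark_insert_and_overwrite(struct btree_trans *trans,
                                     enum btree_id btree, unsigned level,
                                     struct bkey_s_c old, struct bkey_s new,
                                     unsigned flags)
{
        /* one unified trigger hook instead of separate mark/trans_mark paths */
        return bch2_key_trigger(trans, btree, level, old, new,
                                BTREE_TRIGGER_INSERT|BTREE_TRIGGER_OVERWRITE|flags);
}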
@ -488,6 +477,7 @@ static int run_one_trans_trigger(struct btree_trans *trans, struct btree_insert_
struct bkey_s_c old = { &old_k, i->old_v }; struct bkey_s_c old = { &old_k, i->old_v };
const struct bkey_ops *old_ops = bch2_bkey_type_ops(old.k->type); const struct bkey_ops *old_ops = bch2_bkey_type_ops(old.k->type);
const struct bkey_ops *new_ops = bch2_bkey_type_ops(i->k->k.type); const struct bkey_ops *new_ops = bch2_bkey_type_ops(i->k->k.type);
unsigned flags = i->flags|BTREE_TRIGGER_TRANSACTIONAL;
verify_update_old_key(trans, i); verify_update_old_key(trans, i);
@ -497,19 +487,18 @@ static int run_one_trans_trigger(struct btree_trans *trans, struct btree_insert_
if (!i->insert_trigger_run && if (!i->insert_trigger_run &&
!i->overwrite_trigger_run && !i->overwrite_trigger_run &&
old_ops->trans_trigger == new_ops->trans_trigger) { old_ops->trigger == new_ops->trigger) {
i->overwrite_trigger_run = true; i->overwrite_trigger_run = true;
i->insert_trigger_run = true; i->insert_trigger_run = true;
return bch2_trans_mark_key(trans, i->btree_id, i->level, old, i->k, return bch2_key_trigger(trans, i->btree_id, i->level, old, bkey_i_to_s(i->k),
BTREE_TRIGGER_INSERT| BTREE_TRIGGER_INSERT|
BTREE_TRIGGER_OVERWRITE| BTREE_TRIGGER_OVERWRITE|flags) ?: 1;
i->flags) ?: 1;
} else if (overwrite && !i->overwrite_trigger_run) { } else if (overwrite && !i->overwrite_trigger_run) {
i->overwrite_trigger_run = true; i->overwrite_trigger_run = true;
return bch2_trans_mark_old(trans, i->btree_id, i->level, old, i->flags) ?: 1; return bch2_key_trigger_old(trans, i->btree_id, i->level, old, flags) ?: 1;
} else if (!overwrite && !i->insert_trigger_run) { } else if (!overwrite && !i->insert_trigger_run) {
i->insert_trigger_run = true; i->insert_trigger_run = true;
return bch2_trans_mark_new(trans, i->btree_id, i->level, i->k, i->flags) ?: 1; return bch2_key_trigger_new(trans, i->btree_id, i->level, bkey_i_to_s(i->k), flags) ?: 1;
} else { } else {
return 0; return 0;
} }
@ -551,7 +540,7 @@ static int run_btree_triggers(struct btree_trans *trans, enum btree_id btree_id,
static int bch2_trans_commit_run_triggers(struct btree_trans *trans) static int bch2_trans_commit_run_triggers(struct btree_trans *trans)
{ {
struct btree_insert_entry *i = NULL, *btree_id_start = trans->updates; struct btree_insert_entry *btree_id_start = trans->updates;
unsigned btree_id = 0; unsigned btree_id = 0;
int ret = 0; int ret = 0;
@ -598,7 +587,6 @@ static int bch2_trans_commit_run_triggers(struct btree_trans *trans)
static noinline int bch2_trans_commit_run_gc_triggers(struct btree_trans *trans) static noinline int bch2_trans_commit_run_gc_triggers(struct btree_trans *trans)
{ {
struct bch_fs *c = trans->c; struct bch_fs *c = trans->c;
struct btree_insert_entry *i;
int ret = 0; int ret = 0;
trans_for_each_update(trans, i) { trans_for_each_update(trans, i) {
@ -608,7 +596,7 @@ static noinline int bch2_trans_commit_run_gc_triggers(struct btree_trans *trans)
*/ */
BUG_ON(i->cached || i->level); BUG_ON(i->cached || i->level);
if (gc_visited(c, gc_pos_btree_node(insert_l(i)->b))) { if (gc_visited(c, gc_pos_btree_node(insert_l(trans, i)->b))) {
ret = run_one_mem_trigger(trans, i, i->flags|BTREE_TRIGGER_GC); ret = run_one_mem_trigger(trans, i, i->flags|BTREE_TRIGGER_GC);
if (ret) if (ret)
break; break;
@ -624,8 +612,6 @@ bch2_trans_commit_write_locked(struct btree_trans *trans, unsigned flags,
unsigned long trace_ip) unsigned long trace_ip)
{ {
struct bch_fs *c = trans->c; struct bch_fs *c = trans->c;
struct btree_insert_entry *i;
struct btree_write_buffered_key *wb;
struct btree_trans_commit_hook *h; struct btree_trans_commit_hook *h;
unsigned u64s = 0; unsigned u64s = 0;
int ret; int ret;
@ -650,23 +636,21 @@ bch2_trans_commit_write_locked(struct btree_trans *trans, unsigned flags,
u64s += i->k->k.u64s; u64s += i->k->k.u64s;
ret = !i->cached ret = !i->cached
? btree_key_can_insert(trans, insert_l(i)->b, u64s) ? btree_key_can_insert(trans, insert_l(trans, i)->b, u64s)
: btree_key_can_insert_cached(trans, flags, i->path, u64s); : btree_key_can_insert_cached(trans, flags, trans->paths + i->path, u64s);
if (ret) { if (ret) {
*stopped_at = i; *stopped_at = i;
return ret; return ret;
} }
}
if (trans->nr_wb_updates && i->k->k.needs_whiteout = false;
trans->nr_wb_updates + c->btree_write_buffer.state.nr > c->btree_write_buffer.size) }
return -BCH_ERR_btree_insert_need_flush_buffer;
/* /*
* Don't get journal reservation until after we know insert will * Don't get journal reservation until after we know insert will
* succeed: * succeed:
*/ */
if (likely(!(flags & BTREE_INSERT_JOURNAL_REPLAY))) { if (likely(!(flags & BCH_TRANS_COMMIT_no_journal_res))) {
ret = bch2_trans_journal_res_get(trans, ret = bch2_trans_journal_res_get(trans,
(flags & BCH_WATERMARK_MASK)| (flags & BCH_WATERMARK_MASK)|
JOURNAL_RES_GET_NONBLOCK); JOURNAL_RES_GET_NONBLOCK);
@ -675,8 +659,6 @@ bch2_trans_commit_write_locked(struct btree_trans *trans, unsigned flags,
if (unlikely(trans->journal_transaction_names)) if (unlikely(trans->journal_transaction_names))
journal_transaction_name(trans); journal_transaction_name(trans);
} else {
trans->journal_res.seq = c->journal.replay_journal_seq;
} }
/* /*
@ -685,7 +667,7 @@ bch2_trans_commit_write_locked(struct btree_trans *trans, unsigned flags,
*/ */
if (IS_ENABLED(CONFIG_BCACHEFS_DEBUG) && if (IS_ENABLED(CONFIG_BCACHEFS_DEBUG) &&
!(flags & BTREE_INSERT_JOURNAL_REPLAY)) { !(flags & BCH_TRANS_COMMIT_no_journal_res)) {
if (bch2_journal_seq_verify) if (bch2_journal_seq_verify)
trans_for_each_update(trans, i) trans_for_each_update(trans, i)
i->k->k.version.lo = trans->journal_res.seq; i->k->k.version.lo = trans->journal_res.seq;
@ -698,14 +680,6 @@ bch2_trans_commit_write_locked(struct btree_trans *trans, unsigned flags,
bch2_trans_fs_usage_apply(trans, trans->fs_usage_deltas)) bch2_trans_fs_usage_apply(trans, trans->fs_usage_deltas))
return -BCH_ERR_btree_insert_need_mark_replicas; return -BCH_ERR_btree_insert_need_mark_replicas;
if (trans->nr_wb_updates) {
EBUG_ON(flags & BTREE_INSERT_JOURNAL_REPLAY);
ret = bch2_btree_insert_keys_write_buffer(trans);
if (ret)
goto revert_fs_usage;
}
h = trans->hooks; h = trans->hooks;
while (h) { while (h) {
ret = h->fn(trans, h); ret = h->fn(trans, h);
@ -727,16 +701,7 @@ bch2_trans_commit_write_locked(struct btree_trans *trans, unsigned flags,
goto fatal_err; goto fatal_err;
} }
if (unlikely(trans->extra_journal_entries.nr)) { if (likely(!(flags & BCH_TRANS_COMMIT_no_journal_res))) {
memcpy_u64s_small(journal_res_entry(&c->journal, &trans->journal_res),
trans->extra_journal_entries.data,
trans->extra_journal_entries.nr);
trans->journal_res.offset += trans->extra_journal_entries.nr;
trans->journal_res.u64s -= trans->extra_journal_entries.nr;
}
if (likely(!(flags & BTREE_INSERT_JOURNAL_REPLAY))) {
struct journal *j = &c->journal; struct journal *j = &c->journal;
struct jset_entry *entry; struct jset_entry *entry;
@ -765,33 +730,27 @@ bch2_trans_commit_write_locked(struct btree_trans *trans, unsigned flags,
bkey_copy((struct bkey_i *) entry->start, i->k); bkey_copy((struct bkey_i *) entry->start, i->k);
} }
trans_for_each_wb_update(trans, wb) { memcpy_u64s_small(journal_res_entry(&c->journal, &trans->journal_res),
entry = bch2_journal_add_entry(j, &trans->journal_res, trans->journal_entries,
BCH_JSET_ENTRY_btree_keys, trans->journal_entries_u64s);
wb->btree, 0,
wb->k.k.u64s); trans->journal_res.offset += trans->journal_entries_u64s;
bkey_copy((struct bkey_i *) entry->start, &wb->k); trans->journal_res.u64s -= trans->journal_entries_u64s;
}
if (trans->journal_seq) if (trans->journal_seq)
*trans->journal_seq = trans->journal_res.seq; *trans->journal_seq = trans->journal_res.seq;
} }
trans_for_each_update(trans, i) { trans_for_each_update(trans, i) {
i->k->k.needs_whiteout = false; struct btree_path *path = trans->paths + i->path;
if (!i->cached) { if (!i->cached) {
u64 seq = trans->journal_res.seq; bch2_btree_insert_key_leaf(trans, path, i->k, trans->journal_res.seq);
if (i->flags & BTREE_UPDATE_PREJOURNAL)
seq = i->seq;
bch2_btree_insert_key_leaf(trans, i->path, i->k, seq);
} else if (!i->key_cache_already_flushed) } else if (!i->key_cache_already_flushed)
bch2_btree_insert_key_cached(trans, flags, i); bch2_btree_insert_key_cached(trans, flags, i);
else { else {
bch2_btree_key_cache_drop(trans, i->path); bch2_btree_key_cache_drop(trans, path);
btree_path_set_dirty(i->path, BTREE_ITER_NEED_TRAVERSE); btree_path_set_dirty(path, BTREE_ITER_NEED_TRAVERSE);
} }
} }
@ -806,14 +765,8 @@ bch2_trans_commit_write_locked(struct btree_trans *trans, unsigned flags,
static noinline void bch2_drop_overwrites_from_journal(struct btree_trans *trans) static noinline void bch2_drop_overwrites_from_journal(struct btree_trans *trans)
{ {
struct btree_insert_entry *i;
struct btree_write_buffered_key *wb;
trans_for_each_update(trans, i) trans_for_each_update(trans, i)
bch2_journal_key_overwritten(trans->c, i->btree_id, i->level, i->k->k.p); bch2_journal_key_overwritten(trans->c, i->btree_id, i->level, i->k->k.p);
trans_for_each_wb_update(trans, wb)
bch2_journal_key_overwritten(trans->c, wb->btree, 0, wb->k.k.p);
} }
static noinline int bch2_trans_commit_bkey_invalid(struct btree_trans *trans, static noinline int bch2_trans_commit_bkey_invalid(struct btree_trans *trans,
@ -841,6 +794,33 @@ static noinline int bch2_trans_commit_bkey_invalid(struct btree_trans *trans,
return -EINVAL; return -EINVAL;
} }
static noinline int bch2_trans_commit_journal_entry_invalid(struct btree_trans *trans,
struct jset_entry *i)
{
struct bch_fs *c = trans->c;
struct printbuf buf = PRINTBUF;
prt_printf(&buf, "invalid bkey on insert from %s", trans->fn);
prt_newline(&buf);
printbuf_indent_add(&buf, 2);
bch2_journal_entry_to_text(&buf, c, i);
prt_newline(&buf);
bch2_print_string_as_lines(KERN_ERR, buf.buf);
bch2_inconsistent_error(c);
bch2_dump_trans_updates(trans);
return -EINVAL;
}
static int bch2_trans_commit_journal_pin_flush(struct journal *j,
struct journal_entry_pin *_pin, u64 seq)
{
return 0;
}
/* /*
* Get journal reservation, take write locks, and attempt to do btree update(s): * Get journal reservation, take write locks, and attempt to do btree update(s):
*/ */
@ -849,7 +829,6 @@ static inline int do_bch2_trans_commit(struct btree_trans *trans, unsigned flags
unsigned long trace_ip) unsigned long trace_ip)
{ {
struct bch_fs *c = trans->c; struct bch_fs *c = trans->c;
struct btree_insert_entry *i;
int ret = 0, u64s_delta = 0; int ret = 0, u64s_delta = 0;
trans_for_each_update(trans, i) { trans_for_each_update(trans, i) {
@ -884,13 +863,15 @@ static inline int do_bch2_trans_commit(struct btree_trans *trans, unsigned flags
if (!ret && trans->journal_pin) if (!ret && trans->journal_pin)
bch2_journal_pin_add(&c->journal, trans->journal_res.seq, bch2_journal_pin_add(&c->journal, trans->journal_res.seq,
trans->journal_pin, NULL); trans->journal_pin,
bch2_trans_commit_journal_pin_flush);
/* /*
* Drop journal reservation after dropping write locks, since dropping * Drop journal reservation after dropping write locks, since dropping
* the journal reservation may kick off a journal write: * the journal reservation may kick off a journal write:
*/ */
bch2_journal_res_put(&c->journal, &trans->journal_res); if (likely(!(flags & BCH_TRANS_COMMIT_no_journal_res)))
bch2_journal_res_put(&c->journal, &trans->journal_res);
return ret; return ret;
} }
@ -916,7 +897,8 @@ int bch2_trans_commit_error(struct btree_trans *trans, unsigned flags,
case -BCH_ERR_btree_insert_btree_node_full: case -BCH_ERR_btree_insert_btree_node_full:
ret = bch2_btree_split_leaf(trans, i->path, flags); ret = bch2_btree_split_leaf(trans, i->path, flags);
if (bch2_err_matches(ret, BCH_ERR_transaction_restart)) if (bch2_err_matches(ret, BCH_ERR_transaction_restart))
trace_and_count(c, trans_restart_btree_node_split, trans, trace_ip, i->path); trace_and_count(c, trans_restart_btree_node_split, trans,
trace_ip, trans->paths + i->path);
break; break;
case -BCH_ERR_btree_insert_need_mark_replicas: case -BCH_ERR_btree_insert_need_mark_replicas:
ret = drop_locks_do(trans, ret = drop_locks_do(trans,
@ -927,7 +909,7 @@ int bch2_trans_commit_error(struct btree_trans *trans, unsigned flags,
* XXX: this should probably be a separate BTREE_INSERT_NONBLOCK * XXX: this should probably be a separate BTREE_INSERT_NONBLOCK
* flag * flag
*/ */
if ((flags & BTREE_INSERT_JOURNAL_RECLAIM) && if ((flags & BCH_TRANS_COMMIT_journal_reclaim) &&
(flags & BCH_WATERMARK_MASK) != BCH_WATERMARK_reclaim) { (flags & BCH_WATERMARK_MASK) != BCH_WATERMARK_reclaim) {
ret = -BCH_ERR_journal_reclaim_would_deadlock; ret = -BCH_ERR_journal_reclaim_would_deadlock;
break; break;
@ -950,30 +932,6 @@ int bch2_trans_commit_error(struct btree_trans *trans, unsigned flags,
ret = bch2_trans_relock(trans); ret = bch2_trans_relock(trans);
break; break;
case -BCH_ERR_btree_insert_need_flush_buffer: {
struct btree_write_buffer *wb = &c->btree_write_buffer;
ret = 0;
if (wb->state.nr > wb->size * 3 / 4) {
bch2_trans_unlock(trans);
mutex_lock(&wb->flush_lock);
if (wb->state.nr > wb->size * 3 / 4) {
bch2_trans_begin(trans);
ret = __bch2_btree_write_buffer_flush(trans,
flags|BTREE_INSERT_NOCHECK_RW, true);
if (!ret) {
trace_and_count(c, trans_restart_write_buffer_flush, trans, _THIS_IP_);
ret = btree_trans_restart(trans, BCH_ERR_transaction_restart_write_buffer_flush);
}
} else {
mutex_unlock(&wb->flush_lock);
ret = bch2_trans_relock(trans);
}
}
break;
}
default: default:
BUG_ON(ret >= 0); BUG_ON(ret >= 0);
break; break;
@ -982,8 +940,7 @@ int bch2_trans_commit_error(struct btree_trans *trans, unsigned flags,
BUG_ON(bch2_err_matches(ret, BCH_ERR_transaction_restart) != !!trans->restarted); BUG_ON(bch2_err_matches(ret, BCH_ERR_transaction_restart) != !!trans->restarted);
bch2_fs_inconsistent_on(bch2_err_matches(ret, ENOSPC) && bch2_fs_inconsistent_on(bch2_err_matches(ret, ENOSPC) &&
!(flags & BTREE_INSERT_NOWAIT) && (flags & BCH_TRANS_COMMIT_no_enospc), c,
(flags & BTREE_INSERT_NOFAIL), c,
"%s: incorrectly got %s\n", __func__, bch2_err_str(ret)); "%s: incorrectly got %s\n", __func__, bch2_err_str(ret));
return ret; return ret;
@ -995,8 +952,8 @@ bch2_trans_commit_get_rw_cold(struct btree_trans *trans, unsigned flags)
struct bch_fs *c = trans->c; struct bch_fs *c = trans->c;
int ret; int ret;
if (likely(!(flags & BTREE_INSERT_LAZY_RW)) || if (likely(!(flags & BCH_TRANS_COMMIT_lazy_rw)) ||
test_bit(BCH_FS_STARTED, &c->flags)) test_bit(BCH_FS_started, &c->flags))
return -BCH_ERR_erofs_trans_commit; return -BCH_ERR_erofs_trans_commit;
ret = drop_locks_do(trans, bch2_fs_read_write_early(c)); ret = drop_locks_do(trans, bch2_fs_read_write_early(c));
@ -1016,7 +973,6 @@ static noinline int
do_bch2_trans_commit_to_journal_replay(struct btree_trans *trans) do_bch2_trans_commit_to_journal_replay(struct btree_trans *trans)
{ {
struct bch_fs *c = trans->c; struct bch_fs *c = trans->c;
struct btree_insert_entry *i;
int ret = 0; int ret = 0;
trans_for_each_update(trans, i) { trans_for_each_update(trans, i) {
@ -1030,19 +986,14 @@ do_bch2_trans_commit_to_journal_replay(struct btree_trans *trans)
int __bch2_trans_commit(struct btree_trans *trans, unsigned flags) int __bch2_trans_commit(struct btree_trans *trans, unsigned flags)
{ {
struct btree_insert_entry *errored_at = NULL;
struct bch_fs *c = trans->c; struct bch_fs *c = trans->c;
struct btree_insert_entry *i = NULL;
struct btree_write_buffered_key *wb;
int ret = 0; int ret = 0;
if (!trans->nr_updates && if (!trans->nr_updates &&
!trans->nr_wb_updates && !trans->journal_entries_u64s)
!trans->extra_journal_entries.nr)
goto out_reset; goto out_reset;
if (flags & BTREE_INSERT_GC_LOCK_HELD)
lockdep_assert_held(&c->gc_lock);
ret = bch2_trans_commit_run_triggers(trans); ret = bch2_trans_commit_run_triggers(trans);
if (ret) if (ret)
goto out_reset; goto out_reset;
@ -1051,7 +1002,7 @@ int __bch2_trans_commit(struct btree_trans *trans, unsigned flags)
struct printbuf buf = PRINTBUF; struct printbuf buf = PRINTBUF;
enum bkey_invalid_flags invalid_flags = 0; enum bkey_invalid_flags invalid_flags = 0;
if (!(flags & BTREE_INSERT_JOURNAL_REPLAY)) if (!(flags & BCH_TRANS_COMMIT_no_journal_res))
invalid_flags |= BKEY_INVALID_WRITE|BKEY_INVALID_COMMIT; invalid_flags |= BKEY_INVALID_WRITE|BKEY_INVALID_COMMIT;
if (unlikely(bch2_bkey_invalid(c, bkey_i_to_s_c(i->k), if (unlikely(bch2_bkey_invalid(c, bkey_i_to_s_c(i->k),
@ -1064,47 +1015,52 @@ int __bch2_trans_commit(struct btree_trans *trans, unsigned flags)
return ret; return ret;
} }
if (unlikely(!test_bit(BCH_FS_MAY_GO_RW, &c->flags))) { for (struct jset_entry *i = trans->journal_entries;
i != (void *) ((u64 *) trans->journal_entries + trans->journal_entries_u64s);
i = vstruct_next(i)) {
enum bkey_invalid_flags invalid_flags = 0;
if (!(flags & BCH_TRANS_COMMIT_no_journal_res))
invalid_flags |= BKEY_INVALID_WRITE|BKEY_INVALID_COMMIT;
if (unlikely(bch2_journal_entry_validate(c, NULL, i,
bcachefs_metadata_version_current,
CPU_BIG_ENDIAN, invalid_flags)))
ret = bch2_trans_commit_journal_entry_invalid(trans, i);
if (ret)
return ret;
}
if (unlikely(!test_bit(BCH_FS_may_go_rw, &c->flags))) {
ret = do_bch2_trans_commit_to_journal_replay(trans); ret = do_bch2_trans_commit_to_journal_replay(trans);
goto out_reset; goto out_reset;
} }
if (!(flags & BTREE_INSERT_NOCHECK_RW) && if (!(flags & BCH_TRANS_COMMIT_no_check_rw) &&
unlikely(!bch2_write_ref_tryget(c, BCH_WRITE_REF_trans))) { unlikely(!bch2_write_ref_tryget(c, BCH_WRITE_REF_trans))) {
ret = bch2_trans_commit_get_rw_cold(trans, flags); ret = bch2_trans_commit_get_rw_cold(trans, flags);
if (ret) if (ret)
goto out_reset; goto out_reset;
} }
if (c->btree_write_buffer.state.nr > c->btree_write_buffer.size / 2 && EBUG_ON(test_bit(BCH_FS_clean_shutdown, &c->flags));
mutex_trylock(&c->btree_write_buffer.flush_lock)) {
bch2_trans_begin(trans);
bch2_trans_unlock(trans);
ret = __bch2_btree_write_buffer_flush(trans, trans->journal_u64s = trans->journal_entries_u64s;
flags|BTREE_INSERT_NOCHECK_RW, true);
if (!ret) {
trace_and_count(c, trans_restart_write_buffer_flush, trans, _THIS_IP_);
ret = btree_trans_restart(trans, BCH_ERR_transaction_restart_write_buffer_flush);
}
goto out;
}
EBUG_ON(test_bit(BCH_FS_CLEAN_SHUTDOWN, &c->flags));
trans->journal_u64s = trans->extra_journal_entries.nr;
trans->journal_transaction_names = READ_ONCE(c->opts.journal_transaction_names); trans->journal_transaction_names = READ_ONCE(c->opts.journal_transaction_names);
if (trans->journal_transaction_names) if (trans->journal_transaction_names)
trans->journal_u64s += jset_u64s(JSET_ENTRY_LOG_U64s); trans->journal_u64s += jset_u64s(JSET_ENTRY_LOG_U64s);
trans_for_each_update(trans, i) { trans_for_each_update(trans, i) {
EBUG_ON(!i->path->should_be_locked); struct btree_path *path = trans->paths + i->path;
ret = bch2_btree_path_upgrade(trans, i->path, i->level + 1); EBUG_ON(!path->should_be_locked);
ret = bch2_btree_path_upgrade(trans, path, i->level + 1);
if (unlikely(ret)) if (unlikely(ret))
goto out; goto out;
EBUG_ON(!btree_node_intent_locked(i->path, i->level)); EBUG_ON(!btree_node_intent_locked(path, i->level));
if (i->key_cache_already_flushed) if (i->key_cache_already_flushed)
continue; continue;
@ -1120,22 +1076,21 @@ int __bch2_trans_commit(struct btree_trans *trans, unsigned flags)
trans->journal_u64s += jset_u64s(i->old_k.u64s); trans->journal_u64s += jset_u64s(i->old_k.u64s);
} }
trans_for_each_wb_update(trans, wb) if (trans->extra_disk_res) {
trans->journal_u64s += jset_u64s(wb->k.k.u64s);
if (trans->extra_journal_res) {
ret = bch2_disk_reservation_add(c, trans->disk_res, ret = bch2_disk_reservation_add(c, trans->disk_res,
trans->extra_journal_res, trans->extra_disk_res,
(flags & BTREE_INSERT_NOFAIL) (flags & BCH_TRANS_COMMIT_no_enospc)
? BCH_DISK_RESERVATION_NOFAIL : 0); ? BCH_DISK_RESERVATION_NOFAIL : 0);
if (ret) if (ret)
goto err; goto err;
} }
retry: retry:
errored_at = NULL;
bch2_trans_verify_not_in_restart(trans); bch2_trans_verify_not_in_restart(trans);
memset(&trans->journal_res, 0, sizeof(trans->journal_res)); if (likely(!(flags & BCH_TRANS_COMMIT_no_journal_res)))
memset(&trans->journal_res, 0, sizeof(trans->journal_res));
ret = do_bch2_trans_commit(trans, flags, &i, _RET_IP_); ret = do_bch2_trans_commit(trans, flags, &errored_at, _RET_IP_);
/* make sure we didn't drop or screw up locks: */ /* make sure we didn't drop or screw up locks: */
bch2_trans_verify_locks(trans); bch2_trans_verify_locks(trans);
@ -1145,7 +1100,7 @@ int __bch2_trans_commit(struct btree_trans *trans, unsigned flags)
trace_and_count(c, transaction_commit, trans, _RET_IP_); trace_and_count(c, transaction_commit, trans, _RET_IP_);
out: out:
if (likely(!(flags & BTREE_INSERT_NOCHECK_RW))) if (likely(!(flags & BCH_TRANS_COMMIT_no_check_rw)))
bch2_write_ref_put(c, BCH_WRITE_REF_trans); bch2_write_ref_put(c, BCH_WRITE_REF_trans);
out_reset: out_reset:
if (!ret) if (!ret)
@ -1154,9 +1109,21 @@ int __bch2_trans_commit(struct btree_trans *trans, unsigned flags)
return ret; return ret;
err: err:
ret = bch2_trans_commit_error(trans, flags, i, ret, _RET_IP_); ret = bch2_trans_commit_error(trans, flags, errored_at, ret, _RET_IP_);
if (ret) if (ret)
goto out; goto out;
/*
* We might have done another transaction commit in the error path -
* i.e. btree write buffer flush - which will have made use of
* trans->journal_res, but with BCH_TRANS_COMMIT_no_journal_res that is
* how the journal sequence number to pin is passed in - so we must
* restart:
*/
if (flags & BCH_TRANS_COMMIT_no_journal_res) {
ret = -BCH_ERR_transaction_restart_nested;
goto out;
}
goto retry; goto retry;
} }
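
A sketch of how a write-buffer key might be staged under the new scheme: it is formatted as a jset entry in trans->journal_entries (grown via __bch2_trans_jset_entry_alloc(), whose out-of-line path appears in a later hunk) and copied into the journal reservation wholesale at commit time. The helper name is hypothetical, and BCH_JSET_ENTRY_write_buffer_keys is assumed to be the new journal entry type this series introduces.

static int stage_write_buffer_key(struct btree_trans *trans,
                                  enum btree_id btree, struct bkey_i *k)
{
        struct jset_entry *e =
                __bch2_trans_jset_entry_alloc(trans, jset_u64s(k->k.u64s));

        if (IS_ERR(e))
                return PTR_ERR(e);

        /* zero the header, then describe the payload: size, btree, level, type */
        memset(e, 0, sizeof(*e));
        e->u64s     = cpu_to_le16(k->k.u64s);
        e->btree_id = btree;
        e->level    = 0;
        e->type     = BCH_JSET_ENTRY_write_buffer_keys;
        bkey_copy((struct bkey_i *) e->start, k);
        return 0;
}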


@ -185,33 +185,32 @@ struct btree_node_iter {
* Iterate over all possible positions, synthesizing deleted keys for holes: * Iterate over all possible positions, synthesizing deleted keys for holes:
*/ */
static const __maybe_unused u16 BTREE_ITER_SLOTS = 1 << 0; static const __maybe_unused u16 BTREE_ITER_SLOTS = 1 << 0;
static const __maybe_unused u16 BTREE_ITER_ALL_LEVELS = 1 << 1;
/* /*
* Indicates that intent locks should be taken on leaf nodes, because we expect * Indicates that intent locks should be taken on leaf nodes, because we expect
* to be doing updates: * to be doing updates:
*/ */
static const __maybe_unused u16 BTREE_ITER_INTENT = 1 << 2; static const __maybe_unused u16 BTREE_ITER_INTENT = 1 << 1;
/* /*
* Causes the btree iterator code to prefetch additional btree nodes from disk: * Causes the btree iterator code to prefetch additional btree nodes from disk:
*/ */
static const __maybe_unused u16 BTREE_ITER_PREFETCH = 1 << 3; static const __maybe_unused u16 BTREE_ITER_PREFETCH = 1 << 2;
/* /*
* Used in bch2_btree_iter_traverse(), to indicate whether we're searching for * Used in bch2_btree_iter_traverse(), to indicate whether we're searching for
* @pos or the first key strictly greater than @pos * @pos or the first key strictly greater than @pos
*/ */
static const __maybe_unused u16 BTREE_ITER_IS_EXTENTS = 1 << 4; static const __maybe_unused u16 BTREE_ITER_IS_EXTENTS = 1 << 3;
static const __maybe_unused u16 BTREE_ITER_NOT_EXTENTS = 1 << 5; static const __maybe_unused u16 BTREE_ITER_NOT_EXTENTS = 1 << 4;
static const __maybe_unused u16 BTREE_ITER_CACHED = 1 << 6; static const __maybe_unused u16 BTREE_ITER_CACHED = 1 << 5;
static const __maybe_unused u16 BTREE_ITER_WITH_KEY_CACHE = 1 << 7; static const __maybe_unused u16 BTREE_ITER_WITH_KEY_CACHE = 1 << 6;
static const __maybe_unused u16 BTREE_ITER_WITH_UPDATES = 1 << 8; static const __maybe_unused u16 BTREE_ITER_WITH_UPDATES = 1 << 7;
static const __maybe_unused u16 BTREE_ITER_WITH_JOURNAL = 1 << 9; static const __maybe_unused u16 BTREE_ITER_WITH_JOURNAL = 1 << 8;
static const __maybe_unused u16 __BTREE_ITER_ALL_SNAPSHOTS = 1 << 10; static const __maybe_unused u16 __BTREE_ITER_ALL_SNAPSHOTS = 1 << 9;
static const __maybe_unused u16 BTREE_ITER_ALL_SNAPSHOTS = 1 << 11; static const __maybe_unused u16 BTREE_ITER_ALL_SNAPSHOTS = 1 << 10;
static const __maybe_unused u16 BTREE_ITER_FILTER_SNAPSHOTS = 1 << 12; static const __maybe_unused u16 BTREE_ITER_FILTER_SNAPSHOTS = 1 << 11;
static const __maybe_unused u16 BTREE_ITER_NOPRESERVE = 1 << 13; static const __maybe_unused u16 BTREE_ITER_NOPRESERVE = 1 << 12;
static const __maybe_unused u16 BTREE_ITER_CACHED_NOFILL = 1 << 14; static const __maybe_unused u16 BTREE_ITER_CACHED_NOFILL = 1 << 13;
static const __maybe_unused u16 BTREE_ITER_KEY_CACHE_FILL = 1 << 15; static const __maybe_unused u16 BTREE_ITER_KEY_CACHE_FILL = 1 << 14;
#define __BTREE_ITER_FLAGS_END 16 #define __BTREE_ITER_FLAGS_END 15
enum btree_path_uptodate { enum btree_path_uptodate {
BTREE_ITER_UPTODATE = 0, BTREE_ITER_UPTODATE = 0,
@ -223,13 +222,12 @@ enum btree_path_uptodate {
#define TRACK_PATH_ALLOCATED #define TRACK_PATH_ALLOCATED
#endif #endif
typedef u16 btree_path_idx_t;
struct btree_path { struct btree_path {
u8 idx; btree_path_idx_t sorted_idx;
u8 sorted_idx;
u8 ref; u8 ref;
u8 intent_ref; u8 intent_ref;
u32 alloc_seq;
u32 downgrade_seq;
/* btree_iter_copy starts here: */ /* btree_iter_copy starts here: */
struct bpos pos; struct bpos pos;
@ -283,13 +281,12 @@ static inline unsigned long btree_path_ip_allocated(struct btree_path *path)
*/ */
struct btree_iter { struct btree_iter {
struct btree_trans *trans; struct btree_trans *trans;
struct btree_path *path; btree_path_idx_t path;
struct btree_path *update_path; btree_path_idx_t update_path;
struct btree_path *key_cache_path; btree_path_idx_t key_cache_path;
enum btree_id btree_id:8; enum btree_id btree_id:8;
unsigned min_depth:3; u8 min_depth;
unsigned advanced:1;
/* btree_iter_copy starts here: */ /* btree_iter_copy starts here: */
u16 flags; u16 flags;
@ -306,7 +303,6 @@ struct btree_iter {
/* BTREE_ITER_WITH_JOURNAL: */ /* BTREE_ITER_WITH_JOURNAL: */
size_t journal_idx; size_t journal_idx;
struct bpos journal_pos;
#ifdef TRACK_PATH_ALLOCATED #ifdef TRACK_PATH_ALLOCATED
unsigned long ip_allocated; unsigned long ip_allocated;
#endif #endif
@ -354,16 +350,16 @@ struct btree_insert_entry {
* to the size of the key being overwritten in the btree: * to the size of the key being overwritten in the btree:
*/ */
u8 old_btree_u64s; u8 old_btree_u64s;
btree_path_idx_t path;
struct bkey_i *k; struct bkey_i *k;
struct btree_path *path;
u64 seq;
/* key being overwritten: */ /* key being overwritten: */
struct bkey old_k; struct bkey old_k;
const struct bch_val *old_v; const struct bch_val *old_v;
unsigned long ip_allocated; unsigned long ip_allocated;
}; };
#define BTREE_ITER_MAX 64 #define BTREE_ITER_INITIAL 64
#define BTREE_ITER_MAX (1U << 10)
struct btree_trans_commit_hook; struct btree_trans_commit_hook;
typedef int (btree_trans_commit_hook_fn)(struct btree_trans *, struct btree_trans_commit_hook *); typedef int (btree_trans_commit_hook_fn)(struct btree_trans *, struct btree_trans_commit_hook *);
@ -377,25 +373,30 @@ struct btree_trans_commit_hook {
#define BTREE_TRANS_MAX_LOCK_HOLD_TIME_NS 10000 #define BTREE_TRANS_MAX_LOCK_HOLD_TIME_NS 10000
struct btree_trans_paths {
unsigned long nr_paths;
struct btree_path paths[];
};
struct btree_trans { struct btree_trans {
struct bch_fs *c; struct bch_fs *c;
const char *fn;
struct closure ref;
struct list_head list;
u64 last_begin_time;
u8 lock_may_not_fail; unsigned long *paths_allocated;
u8 lock_must_abort; struct btree_path *paths;
struct btree_bkey_cached_common *locking; btree_path_idx_t *sorted;
struct six_lock_waiter locking_wait; struct btree_insert_entry *updates;
int srcu_idx; void *mem;
unsigned mem_top;
unsigned mem_bytes;
btree_path_idx_t nr_sorted;
btree_path_idx_t nr_paths;
btree_path_idx_t nr_paths_max;
u8 fn_idx; u8 fn_idx;
u8 nr_sorted;
u8 nr_updates; u8 nr_updates;
u8 nr_wb_updates; u8 lock_must_abort;
u8 wb_updates_size; bool lock_may_not_fail:1;
bool srcu_held:1; bool srcu_held:1;
bool used_mempool:1; bool used_mempool:1;
bool in_traverse_all:1; bool in_traverse_all:1;
@ -407,41 +408,56 @@ struct btree_trans {
bool write_locked:1; bool write_locked:1;
enum bch_errcode restarted:16; enum bch_errcode restarted:16;
u32 restart_count; u32 restart_count;
u64 last_begin_time;
unsigned long last_begin_ip; unsigned long last_begin_ip;
unsigned long last_restarted_ip; unsigned long last_restarted_ip;
unsigned long srcu_lock_time; unsigned long srcu_lock_time;
/* const char *fn;
* For when bch2_trans_update notices we'll be splitting a compressed struct btree_bkey_cached_common *locking;
* extent: struct six_lock_waiter locking_wait;
*/ int srcu_idx;
unsigned extra_journal_res;
unsigned nr_max_paths;
u64 paths_allocated;
unsigned mem_top;
unsigned mem_max;
unsigned mem_bytes;
void *mem;
u8 sorted[BTREE_ITER_MAX + 8];
struct btree_path paths[BTREE_ITER_MAX];
struct btree_insert_entry updates[BTREE_ITER_MAX];
struct btree_write_buffered_key *wb_updates;
/* update path: */ /* update path: */
u16 journal_entries_u64s;
u16 journal_entries_size;
struct jset_entry *journal_entries;
struct btree_trans_commit_hook *hooks; struct btree_trans_commit_hook *hooks;
darray_u64 extra_journal_entries;
struct journal_entry_pin *journal_pin; struct journal_entry_pin *journal_pin;
struct journal_res journal_res; struct journal_res journal_res;
u64 *journal_seq; u64 *journal_seq;
struct disk_reservation *disk_res; struct disk_reservation *disk_res;
unsigned journal_u64s; unsigned journal_u64s;
unsigned extra_disk_res; /* XXX kill */
struct replicas_delta_list *fs_usage_deltas; struct replicas_delta_list *fs_usage_deltas;
/* Entries before this are zeroed out on every bch2_trans_get() call */
struct list_head list;
struct closure ref;
unsigned long _paths_allocated[BITS_TO_LONGS(BTREE_ITER_INITIAL)];
struct btree_trans_paths trans_paths;
struct btree_path _paths[BTREE_ITER_INITIAL];
btree_path_idx_t _sorted[BTREE_ITER_INITIAL + 4];
struct btree_insert_entry _updates[BTREE_ITER_INITIAL];
}; };
static inline struct btree_path *btree_iter_path(struct btree_trans *trans, struct btree_iter *iter)
{
return trans->paths + iter->path;
}
static inline struct btree_path *btree_iter_key_cache_path(struct btree_trans *trans, struct btree_iter *iter)
{
return iter->key_cache_path
? trans->paths + iter->key_cache_path
: NULL;
}
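
Since iterators now carry btree_path_idx_t indices rather than pointers, code that used to dereference iter->path goes through btree_iter_path(), defined just above. A hedged sketch with a hypothetical helper name:

static struct btree *iter_leaf_node(struct btree_trans *trans,
                                    struct btree_iter *iter)
{
        struct btree_path *path = btree_iter_path(trans, iter);

        /* level 0 of a traversed, non-cached path points at the leaf node */
        return path->l[0].b;
}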
#define BCH_BTREE_WRITE_TYPES() \ #define BCH_BTREE_WRITE_TYPES() \
x(initial, 0) \ x(initial, 0) \
x(init_next_bset, 1) \ x(init_next_bset, 1) \


@ -24,7 +24,7 @@ static inline int btree_insert_entry_cmp(const struct btree_insert_entry *l,
} }
static int __must_check static int __must_check
bch2_trans_update_by_path(struct btree_trans *, struct btree_path *, bch2_trans_update_by_path(struct btree_trans *, btree_path_idx_t,
struct bkey_i *, enum btree_update_flags, struct bkey_i *, enum btree_update_flags,
unsigned long ip); unsigned long ip);
@ -200,7 +200,7 @@ int bch2_trans_update_extent_overwrite(struct btree_trans *trans,
*/ */
if (nr_splits > 1 && if (nr_splits > 1 &&
(compressed_sectors = bch2_bkey_sectors_compressed(old))) (compressed_sectors = bch2_bkey_sectors_compressed(old)))
trans->extra_journal_res += compressed_sectors * (nr_splits - 1); trans->extra_disk_res += compressed_sectors * (nr_splits - 1);
if (front_split) { if (front_split) {
update = bch2_bkey_make_mut_noupdate(trans, old); update = bch2_bkey_make_mut_noupdate(trans, old);
@ -339,21 +339,22 @@ static int bch2_trans_update_extent(struct btree_trans *trans,
} }
static noinline int flush_new_cached_update(struct btree_trans *trans, static noinline int flush_new_cached_update(struct btree_trans *trans,
struct btree_path *path,
struct btree_insert_entry *i, struct btree_insert_entry *i,
enum btree_update_flags flags, enum btree_update_flags flags,
unsigned long ip) unsigned long ip)
{ {
struct btree_path *btree_path;
struct bkey k; struct bkey k;
int ret; int ret;
btree_path = bch2_path_get(trans, path->btree_id, path->pos, 1, 0, btree_path_idx_t path_idx =
BTREE_ITER_INTENT, _THIS_IP_); bch2_path_get(trans, i->btree_id, i->old_k.p, 1, 0,
ret = bch2_btree_path_traverse(trans, btree_path, 0); BTREE_ITER_INTENT, _THIS_IP_);
ret = bch2_btree_path_traverse(trans, path_idx, 0);
if (ret) if (ret)
goto out; goto out;
struct btree_path *btree_path = trans->paths + path_idx;
/* /*
* The old key in the insert entry might actually refer to an existing * The old key in the insert entry might actually refer to an existing
* key in the btree that has been deleted from cache and not yet * key in the btree that has been deleted from cache and not yet
@ -368,43 +369,34 @@ static noinline int flush_new_cached_update(struct btree_trans *trans,
i->flags |= BTREE_TRIGGER_NORUN; i->flags |= BTREE_TRIGGER_NORUN;
btree_path_set_should_be_locked(btree_path); btree_path_set_should_be_locked(btree_path);
ret = bch2_trans_update_by_path(trans, btree_path, i->k, flags, ip); ret = bch2_trans_update_by_path(trans, path_idx, i->k, flags, ip);
out: out:
bch2_path_put(trans, btree_path, true); bch2_path_put(trans, path_idx, true);
return ret; return ret;
} }
static int __must_check static int __must_check
bch2_trans_update_by_path(struct btree_trans *trans, struct btree_path *path, bch2_trans_update_by_path(struct btree_trans *trans, btree_path_idx_t path_idx,
struct bkey_i *k, enum btree_update_flags flags, struct bkey_i *k, enum btree_update_flags flags,
unsigned long ip) unsigned long ip)
{ {
struct bch_fs *c = trans->c; struct bch_fs *c = trans->c;
struct btree_insert_entry *i, n; struct btree_insert_entry *i, n;
u64 seq = 0;
int cmp; int cmp;
struct btree_path *path = trans->paths + path_idx;
EBUG_ON(!path->should_be_locked); EBUG_ON(!path->should_be_locked);
EBUG_ON(trans->nr_updates >= BTREE_ITER_MAX); EBUG_ON(trans->nr_updates >= trans->nr_paths);
EBUG_ON(!bpos_eq(k->k.p, path->pos)); EBUG_ON(!bpos_eq(k->k.p, path->pos));
/*
* The transaction journal res hasn't been allocated at this point.
* That occurs at commit time. Reuse the seq field to pass in the seq
* of a prejournaled key.
*/
if (flags & BTREE_UPDATE_PREJOURNAL)
seq = trans->journal_res.seq;
n = (struct btree_insert_entry) { n = (struct btree_insert_entry) {
.flags = flags, .flags = flags,
.bkey_type = __btree_node_type(path->level, path->btree_id), .bkey_type = __btree_node_type(path->level, path->btree_id),
.btree_id = path->btree_id, .btree_id = path->btree_id,
.level = path->level, .level = path->level,
.cached = path->cached, .cached = path->cached,
.path = path, .path = path_idx,
.k = k, .k = k,
.seq = seq,
.ip_allocated = ip, .ip_allocated = ip,
}; };
@ -418,7 +410,7 @@ bch2_trans_update_by_path(struct btree_trans *trans, struct btree_path *path,
* Pending updates are kept sorted: first, find position of new update, * Pending updates are kept sorted: first, find position of new update,
* then delete/trim any updates the new update overwrites: * then delete/trim any updates the new update overwrites:
*/ */
trans_for_each_update(trans, i) { for (i = trans->updates; i < trans->updates + trans->nr_updates; i++) {
cmp = btree_insert_entry_cmp(&n, i); cmp = btree_insert_entry_cmp(&n, i);
if (cmp <= 0) if (cmp <= 0)
break; break;
@ -432,7 +424,6 @@ bch2_trans_update_by_path(struct btree_trans *trans, struct btree_path *path,
i->cached = n.cached; i->cached = n.cached;
i->k = n.k; i->k = n.k;
i->path = n.path; i->path = n.path;
i->seq = n.seq;
i->ip_allocated = n.ip_allocated; i->ip_allocated = n.ip_allocated;
} else { } else {
array_insert_item(trans->updates, trans->nr_updates, array_insert_item(trans->updates, trans->nr_updates,
@ -452,7 +443,7 @@ bch2_trans_update_by_path(struct btree_trans *trans, struct btree_path *path,
} }
} }
__btree_path_get(i->path, true); __btree_path_get(trans->paths + i->path, true);
/* /*
* If a key is present in the key cache, it must also exist in the * If a key is present in the key cache, it must also exist in the
@ -462,7 +453,7 @@ bch2_trans_update_by_path(struct btree_trans *trans, struct btree_path *path,
* work: * work:
*/ */
if (path->cached && bkey_deleted(&i->old_k)) if (path->cached && bkey_deleted(&i->old_k))
return flush_new_cached_update(trans, path, i, flags, ip); return flush_new_cached_update(trans, i, flags, ip);
return 0; return 0;
} }
@ -471,9 +462,11 @@ static noinline int bch2_trans_update_get_key_cache(struct btree_trans *trans,
struct btree_iter *iter, struct btree_iter *iter,
struct btree_path *path) struct btree_path *path)
{ {
if (!iter->key_cache_path || struct btree_path *key_cache_path = btree_iter_key_cache_path(trans, iter);
!iter->key_cache_path->should_be_locked ||
!bpos_eq(iter->key_cache_path->pos, iter->pos)) { if (!key_cache_path ||
!key_cache_path->should_be_locked ||
!bpos_eq(key_cache_path->pos, iter->pos)) {
struct bkey_cached *ck; struct bkey_cached *ck;
int ret; int ret;
@ -488,19 +481,18 @@ static noinline int bch2_trans_update_get_key_cache(struct btree_trans *trans,
iter->flags & BTREE_ITER_INTENT, iter->flags & BTREE_ITER_INTENT,
_THIS_IP_); _THIS_IP_);
ret = bch2_btree_path_traverse(trans, iter->key_cache_path, ret = bch2_btree_path_traverse(trans, iter->key_cache_path, BTREE_ITER_CACHED);
BTREE_ITER_CACHED);
if (unlikely(ret)) if (unlikely(ret))
return ret; return ret;
ck = (void *) iter->key_cache_path->l[0].b; ck = (void *) trans->paths[iter->key_cache_path].l[0].b;
if (test_bit(BKEY_CACHED_DIRTY, &ck->flags)) { if (test_bit(BKEY_CACHED_DIRTY, &ck->flags)) {
trace_and_count(trans->c, trans_restart_key_cache_raced, trans, _RET_IP_); trace_and_count(trans->c, trans_restart_key_cache_raced, trans, _RET_IP_);
return btree_trans_restart(trans, BCH_ERR_transaction_restart_key_cache_raced); return btree_trans_restart(trans, BCH_ERR_transaction_restart_key_cache_raced);
} }
btree_path_set_should_be_locked(iter->key_cache_path); btree_path_set_should_be_locked(trans->paths + iter->key_cache_path);
} }
return 0; return 0;
@ -509,7 +501,7 @@ static noinline int bch2_trans_update_get_key_cache(struct btree_trans *trans,
int __must_check bch2_trans_update(struct btree_trans *trans, struct btree_iter *iter, int __must_check bch2_trans_update(struct btree_trans *trans, struct btree_iter *iter,
struct bkey_i *k, enum btree_update_flags flags) struct bkey_i *k, enum btree_update_flags flags)
{ {
struct btree_path *path = iter->update_path ?: iter->path; btree_path_idx_t path_idx = iter->update_path ?: iter->path;
int ret; int ret;
if (iter->flags & BTREE_ITER_IS_EXTENTS) if (iter->flags & BTREE_ITER_IS_EXTENTS)
@ -529,6 +521,7 @@ int __must_check bch2_trans_update(struct btree_trans *trans, struct btree_iter
/* /*
* Ensure that updates to cached btrees go to the key cache: * Ensure that updates to cached btrees go to the key cache:
*/ */
struct btree_path *path = trans->paths + path_idx;
if (!(flags & BTREE_UPDATE_KEY_CACHE_RECLAIM) && if (!(flags & BTREE_UPDATE_KEY_CACHE_RECLAIM) &&
!path->cached && !path->cached &&
!path->level && !path->level &&
@ -537,27 +530,15 @@ int __must_check bch2_trans_update(struct btree_trans *trans, struct btree_iter
if (ret) if (ret)
return ret; return ret;
path = iter->key_cache_path; path_idx = iter->key_cache_path;
} }
return bch2_trans_update_by_path(trans, path, k, flags, _RET_IP_); return bch2_trans_update_by_path(trans, path_idx, k, flags, _RET_IP_);
} }
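
For callers the conversion is invisible: updates still go through a btree_iter, and bch2_trans_update() resolves the path index internally. A sketch of the usual sequence, using a hypothetical helper and the standard iterator calls:

static int set_key(struct btree_trans *trans, enum btree_id btree,
                   struct bkey_i *k)
{
        struct btree_iter iter;
        int ret;

        /* take intent locks, traverse, queue the update, then drop the iter */
        bch2_trans_iter_init(trans, &iter, btree, k->k.p, BTREE_ITER_INTENT);
        ret = bch2_btree_iter_traverse(&iter) ?:
              bch2_trans_update(trans, &iter, k, 0);
        bch2_trans_iter_exit(trans, &iter);
        return ret;
}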
-/*
- * Add a transaction update for a key that has already been journaled.
- */
-int __must_check bch2_trans_update_seq(struct btree_trans *trans, u64 seq,
-				       struct btree_iter *iter, struct bkey_i *k,
-				       enum btree_update_flags flags)
-{
-	trans->journal_res.seq = seq;
-	return bch2_trans_update(trans, iter, k, flags|BTREE_UPDATE_NOJOURNAL|
-						 BTREE_UPDATE_PREJOURNAL);
-}
-
-static noinline int bch2_btree_insert_clone_trans(struct btree_trans *trans,
-						  enum btree_id btree,
-						  struct bkey_i *k)
+int bch2_btree_insert_clone_trans(struct btree_trans *trans,
+				  enum btree_id btree,
+				  struct bkey_i *k)
{
	struct bkey_i *n = bch2_trans_kmalloc(trans, bkey_bytes(&k->k));
	int ret = PTR_ERR_OR_ZERO(n);
@@ -568,60 +549,30 @@ static noinline int bch2_btree_insert_clone_trans(struct btree_trans *trans,
	return bch2_btree_insert_trans(trans, btree, n, 0);
}

-int __must_check bch2_trans_update_buffered(struct btree_trans *trans,
-					    enum btree_id btree,
-					    struct bkey_i *k)
+struct jset_entry *__bch2_trans_jset_entry_alloc(struct btree_trans *trans, unsigned u64s)
{
-	struct btree_write_buffered_key *i;
-	int ret;
-
-	EBUG_ON(trans->nr_wb_updates > trans->wb_updates_size);
-	EBUG_ON(k->k.u64s > BTREE_WRITE_BUFERED_U64s_MAX);
-
-	if (unlikely(trans->journal_replay_not_finished))
-		return bch2_btree_insert_clone_trans(trans, btree, k);
-
-	trans_for_each_wb_update(trans, i) {
-		if (i->btree == btree && bpos_eq(i->k.k.p, k->k.p)) {
-			bkey_copy(&i->k, k);
-			return 0;
-		}
-	}
-
-	if (!trans->wb_updates ||
-	    trans->nr_wb_updates == trans->wb_updates_size) {
-		struct btree_write_buffered_key *u;
-
-		if (trans->nr_wb_updates == trans->wb_updates_size) {
-			struct btree_transaction_stats *s = btree_trans_stats(trans);
-
-			BUG_ON(trans->wb_updates_size > U8_MAX / 2);
-			trans->wb_updates_size = max(1, trans->wb_updates_size * 2);
-			if (s)
-				s->wb_updates_size = trans->wb_updates_size;
-		}
-
-		u = bch2_trans_kmalloc_nomemzero(trans,
-					trans->wb_updates_size *
-					sizeof(struct btree_write_buffered_key));
-		ret = PTR_ERR_OR_ZERO(u);
-		if (ret)
-			return ret;
-
-		if (trans->nr_wb_updates)
-			memcpy(u, trans->wb_updates, trans->nr_wb_updates *
-			       sizeof(struct btree_write_buffered_key));
-		trans->wb_updates = u;
-	}
-
-	trans->wb_updates[trans->nr_wb_updates] = (struct btree_write_buffered_key) {
-		.btree	= btree,
-	};
-
-	bkey_copy(&trans->wb_updates[trans->nr_wb_updates].k, k);
-	trans->nr_wb_updates++;
-
-	return 0;
+	unsigned new_top = trans->journal_entries_u64s + u64s;
+	unsigned old_size = trans->journal_entries_size;
+
+	if (new_top > trans->journal_entries_size) {
+		trans->journal_entries_size = roundup_pow_of_two(new_top);
+
+		btree_trans_stats(trans)->journal_entries_size = trans->journal_entries_size;
+	}
+
+	struct jset_entry *n =
+		bch2_trans_kmalloc_nomemzero(trans,
+				trans->journal_entries_size * sizeof(u64));
+	if (IS_ERR(n))
+		return ERR_CAST(n);
+
+	if (trans->journal_entries)
+		memcpy(n, trans->journal_entries, old_size * sizeof(u64));
+	trans->journal_entries = n;
+
+	struct jset_entry *e = btree_trans_journal_entries_top(trans);
+	trans->journal_entries_u64s = new_top;
+	return e;
}
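
The new __bch2_trans_jset_entry_alloc() above grows a flat array of u64s by rounding the required size up to the next power of two and copying the old contents over, much as the per-transaction write buffer array used to. A rough userspace sketch of that growth policy, with invented names and plain malloc() standing in for bch2_trans_kmalloc_nomemzero() (an illustration only, not the in-tree code):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Like the kernel's roundup_pow_of_two(), for the sketch only */
static size_t round_up_pow2(size_t n)
{
	size_t r = 1;
	while (r < n)
		r <<= 1;
	return r;
}

struct u64_buf {
	uint64_t	*data;
	size_t		nr;	/* u64s used */
	size_t		size;	/* u64s allocated */
};

/* Reserve @u64s contiguous u64s at the top of the buffer, growing if needed */
static uint64_t *u64_buf_reserve(struct u64_buf *b, size_t u64s)
{
	size_t new_top = b->nr + u64s;

	if (new_top > b->size) {
		size_t new_size = round_up_pow2(new_top);
		uint64_t *n = malloc(new_size * sizeof(*n));

		if (!n)
			return NULL;
		if (b->data)
			memcpy(n, b->data, b->nr * sizeof(*n));
		free(b->data);
		b->data = n;
		b->size = new_size;
	}

	uint64_t *ret = b->data + b->nr;
	b->nr = new_top;
	return ret;
}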
int bch2_bkey_get_empty_slot(struct btree_trans *trans, struct btree_iter *iter,
@@ -733,20 +684,6 @@ int bch2_btree_delete_at(struct btree_trans *trans,
	return bch2_btree_delete_extent_at(trans, iter, 0, update_flags);
}

-int bch2_btree_delete_at_buffered(struct btree_trans *trans,
-				  enum btree_id btree, struct bpos pos)
-{
-	struct bkey_i *k;
-
-	k = bch2_trans_kmalloc(trans, sizeof(*k));
-	if (IS_ERR(k))
-		return PTR_ERR(k);
-
-	bkey_init(&k->k);
-	k->k.p = pos;
-	return bch2_trans_update_buffered(trans, btree, k);
-}
-
int bch2_btree_delete(struct btree_trans *trans,
		      enum btree_id btree, struct bpos pos,
		      unsigned update_flags)
@@ -809,7 +746,7 @@ int bch2_btree_delete_range_trans(struct btree_trans *trans, enum btree_id id,
		ret   = bch2_trans_update(trans, &iter, &delete, update_flags) ?:
			bch2_trans_commit(trans, &disk_res, journal_seq,
-					  BTREE_INSERT_NOFAIL);
+					  BCH_TRANS_COMMIT_no_enospc);
		bch2_disk_reservation_put(trans->c, &disk_res);
err:
		/*
@@ -851,56 +788,26 @@ int bch2_btree_delete_range(struct bch_fs *c, enum btree_id id,
int bch2_btree_bit_mod(struct btree_trans *trans, enum btree_id btree,
		       struct bpos pos, bool set)
{
-	struct bkey_i *k;
-	int ret = 0;
+	struct bkey_i k;

-	k = bch2_trans_kmalloc_nomemzero(trans, sizeof(*k));
-	ret = PTR_ERR_OR_ZERO(k);
-	if (unlikely(ret))
-		return ret;
+	bkey_init(&k.k);
+	k.k.type = set ? KEY_TYPE_set : KEY_TYPE_deleted;
+	k.k.p = pos;

-	bkey_init(&k->k);
-	k->k.type = set ? KEY_TYPE_set : KEY_TYPE_deleted;
-	k->k.p = pos;
-
-	return bch2_trans_update_buffered(trans, btree, k);
+	return bch2_trans_update_buffered(trans, btree, &k);
}
-__printf(2, 0)
-static int __bch2_trans_log_msg(darray_u64 *entries, const char *fmt, va_list args)
+static int __bch2_trans_log_msg(struct btree_trans *trans, struct printbuf *buf, unsigned u64s)
{
-	struct printbuf buf = PRINTBUF;
-	struct jset_entry_log *l;
-	unsigned u64s;
-	int ret;
-
-	prt_vprintf(&buf, fmt, args);
-	ret = buf.allocation_failure ? -BCH_ERR_ENOMEM_trans_log_msg : 0;
+	struct jset_entry *e = bch2_trans_jset_entry_alloc(trans, jset_u64s(u64s));
+	int ret = PTR_ERR_OR_ZERO(e);
	if (ret)
-		goto err;
-
-	u64s = DIV_ROUND_UP(buf.pos, sizeof(u64));
-
-	ret = darray_make_room(entries, jset_u64s(u64s));
-	if (ret)
-		goto err;
-
-	l = (void *) &darray_top(*entries);
-	l->entry.u64s		= cpu_to_le16(u64s);
-	l->entry.btree_id	= 0;
-	l->entry.level		= 1;
-	l->entry.type		= BCH_JSET_ENTRY_log;
-	l->entry.pad[0]		= 0;
-	l->entry.pad[1]		= 0;
-	l->entry.pad[2]		= 0;
-	memcpy(l->d, buf.buf, buf.pos);
-	while (buf.pos & 7)
-		l->d[buf.pos++] = '\0';
-
-	entries->nr += jset_u64s(u64s);
-err:
-	printbuf_exit(&buf);
-	return ret;
+		return ret;
+
+	struct jset_entry_log *l = container_of(e, struct jset_entry_log, entry);
+
+	journal_entry_init(e, BCH_JSET_ENTRY_log, 0, 1, u64s);
+	memcpy(l->d, buf->buf, buf->pos);
+	return 0;
}
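
The jset_entry_log wrapper is now recovered from the embedded jset_entry with container_of() rather than an open-coded cast. For readers less familiar with the idiom, a minimal userspace analogue (struct names invented for the example, not the on-disk layout):

#include <stddef.h>
#include <stdio.h>

/* Userspace equivalent of the kernel's container_of() */
#define container_of(ptr, type, member) \
	((type *) ((char *) (ptr) - offsetof(type, member)))

struct entry {
	int type;
};

struct entry_log {
	struct entry	entry;	/* embedded header, like jset_entry inside jset_entry_log */
	char		d[32];	/* payload */
};

int main(void)
{
	struct entry_log l = { .entry.type = 7 };
	struct entry *e = &l.entry;

	/* Recover the containing struct from a pointer to the embedded member: */
	struct entry_log *back = container_of(e, struct entry_log, entry);

	printf("type=%d %s\n", back->entry.type, back == &l ? "same object" : "bug");
	return 0;
}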
__printf(3, 0)
@@ -908,16 +815,32 @@
__bch2_fs_log_msg(struct bch_fs *c, unsigned commit_flags, const char *fmt,
		  va_list args)
{
-	int ret;
+	struct printbuf buf = PRINTBUF;
+
+	prt_vprintf(&buf, fmt, args);
+
+	unsigned u64s = DIV_ROUND_UP(buf.pos, sizeof(u64));
+	prt_chars(&buf, '\0', u64s * sizeof(u64) - buf.pos);
+
+	int ret = buf.allocation_failure ? -BCH_ERR_ENOMEM_trans_log_msg : 0;
+	if (ret)
+		goto err;

	if (!test_bit(JOURNAL_STARTED, &c->journal.flags)) {
-		ret = __bch2_trans_log_msg(&c->journal.early_journal_entries, fmt, args);
+		ret = darray_make_room(&c->journal.early_journal_entries, jset_u64s(u64s));
+		if (ret)
+			goto err;
+
+		struct jset_entry_log *l = (void *) &darray_top(c->journal.early_journal_entries);
+
+		journal_entry_init(&l->entry, BCH_JSET_ENTRY_log, 0, 1, u64s);
+		memcpy(l->d, buf.buf, buf.pos);
+		c->journal.early_journal_entries.nr += jset_u64s(u64s);
	} else {
		ret = bch2_trans_do(c, NULL, NULL,
-				    BTREE_INSERT_LAZY_RW|commit_flags,
-			__bch2_trans_log_msg(&trans->extra_journal_entries, fmt, args));
+				    BCH_TRANS_COMMIT_lazy_rw|commit_flags,
+			__bch2_trans_log_msg(trans, &buf, u64s));
	}
+err:
+	printbuf_exit(&buf);

	return ret;
}
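
__bch2_fs_log_msg() now sizes the log entry up front: the formatted message is rounded up to a whole number of u64s and NUL-padded before it is copied, where the old code padded after the memcpy. A small standalone example of the arithmetic (assuming an 8-byte u64, which is the only case here):

#include <stdio.h>

#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

int main(void)
{
	unsigned msg_len = 13;				/* bytes of formatted text */
	unsigned u64s	 = DIV_ROUND_UP(msg_len, 8);	/* u64s the entry payload occupies */
	unsigned pad	 = u64s * 8 - msg_len;		/* trailing NULs appended up front */

	printf("u64s=%u pad=%u total=%u bytes\n", u64s, pad, u64s * 8);
	/* prints: u64s=2 pad=3 total=16 bytes */
	return 0;
}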


@@ -21,42 +21,32 @@ void bch2_btree_add_journal_pin(struct bch_fs *, struct btree *, u64);
void bch2_btree_insert_key_leaf(struct btree_trans *, struct btree_path *,
				struct bkey_i *, u64);

-enum btree_insert_flags {
+#define BCH_TRANS_COMMIT_FLAGS()							\
+	x(no_enospc,	"don't check for enospc")					\
+	x(no_check_rw,	"don't attempt to take a ref on c->writes")			\
+	x(lazy_rw,	"go read-write if we haven't yet - only for use in recovery")	\
+	x(no_journal_res, "don't take a journal reservation, instead "			\
+			"pin journal entry referred to by trans->journal_res.seq")	\
+	x(journal_reclaim, "operation required for journal reclaim; may return error"	\
+			"instead of deadlocking if BCH_WATERMARK_reclaim not specified")\
+
+enum __bch_trans_commit_flags {
	/* First bits for bch_watermark: */
-	__BTREE_INSERT_NOFAIL = BCH_WATERMARK_BITS,
-	__BTREE_INSERT_NOCHECK_RW,
-	__BTREE_INSERT_LAZY_RW,
-	__BTREE_INSERT_JOURNAL_REPLAY,
-	__BTREE_INSERT_JOURNAL_RECLAIM,
-	__BTREE_INSERT_NOWAIT,
-	__BTREE_INSERT_GC_LOCK_HELD,
-	__BCH_HASH_SET_MUST_CREATE,
-	__BCH_HASH_SET_MUST_REPLACE,
+	__BCH_TRANS_COMMIT_FLAGS_START = BCH_WATERMARK_BITS,
+#define x(n, ...)	__BCH_TRANS_COMMIT_##n,
+	BCH_TRANS_COMMIT_FLAGS()
+#undef x
};

-/* Don't check for -ENOSPC: */
-#define BTREE_INSERT_NOFAIL		BIT(__BTREE_INSERT_NOFAIL)
-
-#define BTREE_INSERT_NOCHECK_RW		BIT(__BTREE_INSERT_NOCHECK_RW)
-#define BTREE_INSERT_LAZY_RW		BIT(__BTREE_INSERT_LAZY_RW)
-
-/* Insert is for journal replay - don't get journal reservations: */
-#define BTREE_INSERT_JOURNAL_REPLAY	BIT(__BTREE_INSERT_JOURNAL_REPLAY)
-
-/* Insert is being called from journal reclaim path: */
-#define BTREE_INSERT_JOURNAL_RECLAIM	BIT(__BTREE_INSERT_JOURNAL_RECLAIM)
-
-/* Don't block on allocation failure (for new btree nodes: */
-#define BTREE_INSERT_NOWAIT		BIT(__BTREE_INSERT_NOWAIT)
-
-#define BTREE_INSERT_GC_LOCK_HELD	BIT(__BTREE_INSERT_GC_LOCK_HELD)
-#define BCH_HASH_SET_MUST_CREATE	BIT(__BCH_HASH_SET_MUST_CREATE)
-#define BCH_HASH_SET_MUST_REPLACE	BIT(__BCH_HASH_SET_MUST_REPLACE)
+enum bch_trans_commit_flags {
+#define x(n, ...)	BCH_TRANS_COMMIT_##n = BIT(__BCH_TRANS_COMMIT_##n),
+	BCH_TRANS_COMMIT_FLAGS()
+#undef x
+};

int bch2_btree_delete_extent_at(struct btree_trans *, struct btree_iter *,
				unsigned, unsigned);
int bch2_btree_delete_at(struct btree_trans *, struct btree_iter *, unsigned);
-int bch2_btree_delete_at_buffered(struct btree_trans *, enum btree_id, struct bpos);
int bch2_btree_delete(struct btree_trans *, enum btree_id, struct bpos, unsigned);

int bch2_btree_insert_nonextent(struct btree_trans *, enum btree_id,
@@ -74,6 +64,12 @@ int bch2_btree_delete_range(struct bch_fs *, enum btree_id,

int bch2_btree_bit_mod(struct btree_trans *, enum btree_id, struct bpos, bool);

+static inline int bch2_btree_delete_at_buffered(struct btree_trans *trans,
+						enum btree_id btree, struct bpos pos)
+{
+	return bch2_btree_bit_mod(trans, btree, pos, false);
+}
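
The hand-maintained enum-plus-#define pairs above are collapsed into a single BCH_TRANS_COMMIT_FLAGS() x-macro that is expanded once for bit positions and once for BIT() masks (the real enum starts at BCH_WATERMARK_BITS so the low bits stay reserved for the watermark). A self-contained sketch of the pattern with invented flag names, including a third expansion for a debug string table:

#include <stdio.h>

#define BIT(n)	(1U << (n))

/* Single source of truth: each x() names a flag and its description */
#define MY_FLAGS()						\
	x(no_enospc,	"don't check for enospc")		\
	x(lazy_rw,	"go read-write if needed")		\
	x(reclaim,	"called from reclaim path")

/* Expansion 1: bit positions */
enum __my_flags {
#define x(n, ...)	__MY_FLAG_##n,
	MY_FLAGS()
#undef x
};

/* Expansion 2: bit masks */
enum my_flags {
#define x(n, ...)	MY_FLAG_##n = BIT(__MY_FLAG_##n),
	MY_FLAGS()
#undef x
};

/* Expansion 3: human-readable names, e.g. for debug output */
static const char * const my_flag_strs[] = {
#define x(n, desc)	[__MY_FLAG_##n] = #n,
	MY_FLAGS()
#undef x
};

int main(void)
{
	printf("%s = 0x%x\n", my_flag_strs[__MY_FLAG_lazy_rw],
	       (unsigned) MY_FLAG_lazy_rw);
	return 0;
}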
int __bch2_insert_snapshot_whiteouts(struct btree_trans *, enum btree_id, int __bch2_insert_snapshot_whiteouts(struct btree_trans *, enum btree_id,
struct bpos, struct bpos); struct bpos, struct bpos);
@ -105,10 +101,44 @@ int bch2_bkey_get_empty_slot(struct btree_trans *, struct btree_iter *,
int __must_check bch2_trans_update(struct btree_trans *, struct btree_iter *, int __must_check bch2_trans_update(struct btree_trans *, struct btree_iter *,
struct bkey_i *, enum btree_update_flags); struct bkey_i *, enum btree_update_flags);
-int __must_check bch2_trans_update_seq(struct btree_trans *, u64, struct btree_iter *,
-				       struct bkey_i *, enum btree_update_flags);
-int __must_check bch2_trans_update_buffered(struct btree_trans *,
-					    enum btree_id, struct bkey_i *);
+struct jset_entry *__bch2_trans_jset_entry_alloc(struct btree_trans *, unsigned);
+
+static inline struct jset_entry *btree_trans_journal_entries_top(struct btree_trans *trans)
{
return (void *) ((u64 *) trans->journal_entries + trans->journal_entries_u64s);
}
static inline struct jset_entry *
bch2_trans_jset_entry_alloc(struct btree_trans *trans, unsigned u64s)
{
if (!trans->journal_entries ||
trans->journal_entries_u64s + u64s > trans->journal_entries_size)
return __bch2_trans_jset_entry_alloc(trans, u64s);
struct jset_entry *e = btree_trans_journal_entries_top(trans);
trans->journal_entries_u64s += u64s;
return e;
}
int bch2_btree_insert_clone_trans(struct btree_trans *, enum btree_id, struct bkey_i *);
static inline int __must_check bch2_trans_update_buffered(struct btree_trans *trans,
enum btree_id btree,
struct bkey_i *k)
{
if (unlikely(trans->journal_replay_not_finished))
return bch2_btree_insert_clone_trans(trans, btree, k);
struct jset_entry *e = bch2_trans_jset_entry_alloc(trans, jset_u64s(k->k.u64s));
int ret = PTR_ERR_OR_ZERO(e);
if (ret)
return ret;
journal_entry_init(e, BCH_JSET_ENTRY_write_buffer_keys, btree, 0, k->k.u64s);
bkey_copy(e->start, k);
return 0;
}
void bch2_trans_commit_hook(struct btree_trans *, void bch2_trans_commit_hook(struct btree_trans *,
struct btree_trans_commit_hook *); struct btree_trans_commit_hook *);
@ -157,28 +187,19 @@ static inline int bch2_trans_commit(struct btree_trans *trans,
bch2_trans_run(_c, commit_do(trans, _disk_res, _journal_seq, _flags, _do)) bch2_trans_run(_c, commit_do(trans, _disk_res, _journal_seq, _flags, _do))
#define trans_for_each_update(_trans, _i)					\
-	for ((_i) = (_trans)->updates;						\
+	for (struct btree_insert_entry *_i = (_trans)->updates;		\
	     (_i) < (_trans)->updates + (_trans)->nr_updates;			\
	     (_i)++)

-#define trans_for_each_wb_update(_trans, _i)					\
-	for ((_i) = (_trans)->wb_updates;					\
-	     (_i) < (_trans)->wb_updates + (_trans)->nr_wb_updates;		\
-	     (_i)++)
-
static inline void bch2_trans_reset_updates(struct btree_trans *trans)
{
-	struct btree_insert_entry *i;
-
	trans_for_each_update(trans, i)
		bch2_path_put(trans, i->path, true);

-	trans->extra_journal_res	= 0;
	trans->nr_updates		= 0;
-	trans->nr_wb_updates		= 0;
-	trans->wb_updates		= NULL;
+	trans->journal_entries_u64s	= 0;
	trans->hooks			= NULL;
-	trans->extra_journal_entries.nr	= 0;
+	trans->extra_disk_res		= 0;

	if (trans->fs_usage_deltas) {
		trans->fs_usage_deltas->used = 0;


@ -25,24 +25,24 @@
#include <linux/random.h> #include <linux/random.h>
static int bch2_btree_insert_node(struct btree_update *, struct btree_trans *,
-				  struct btree_path *, struct btree *,
+				  btree_path_idx_t, struct btree *,
				  struct keylist *, unsigned);
static void bch2_btree_update_add_new_node(struct btree_update *, struct btree *);

-static struct btree_path *get_unlocked_mut_path(struct btree_trans *trans,
+static btree_path_idx_t get_unlocked_mut_path(struct btree_trans *trans,
					      enum btree_id btree_id,
					      unsigned level,
					      struct bpos pos)
{
-	struct btree_path *path;
-
-	path = bch2_path_get(trans, btree_id, pos, level + 1, level,
+	btree_path_idx_t path_idx = bch2_path_get(trans, btree_id, pos, level + 1, level,
			     BTREE_ITER_NOPRESERVE|
			     BTREE_ITER_INTENT, _RET_IP_);
-	path = bch2_btree_path_make_mut(trans, path, true, _RET_IP_);
+	path_idx = bch2_btree_path_make_mut(trans, path_idx, true, _RET_IP_);
+
+	struct btree_path *path = trans->paths + path_idx;
	bch2_btree_path_downgrade(trans, path);
	__bch2_btree_path_unlock(trans, path);
-	return path;
+	return path_idx;
}
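
Interior-update code now passes btree_path_idx_t instead of struct btree_path * because trans->paths can be reallocated when the (now dynamically growable) path array fills up; an index survives the move, a raw pointer does not. A tiny standalone illustration of that difference (not bcachefs code; error handling omitted for brevity):

#include <stdio.h>
#include <stdlib.h>

struct path { int level; };

struct table {
	struct path	*paths;
	size_t		nr, size;
};

/* Append an element, doubling the backing array when full (this may move it) */
static size_t table_add(struct table *t, struct path p)
{
	if (t->nr == t->size) {
		t->size = t->size ? t->size * 2 : 4;
		t->paths = realloc(t->paths, t->size * sizeof(*t->paths));
	}
	t->paths[t->nr] = p;
	return t->nr++;			/* hand back an index, not a pointer */
}

int main(void)
{
	struct table t = {0};
	size_t idx = table_add(&t, (struct path) { .level = 1 });
	struct path *ptr = &t.paths[idx];	/* pointer taken before the array grows */

	for (int i = 0; i < 1000; i++)
		table_add(&t, (struct path) { .level = 0 });

	/*
	 * realloc() has likely moved t.paths by now: 'ptr' must not be used,
	 * but the index still names the same logical slot:
	 */
	printf("via index: level=%d\n", t.paths[idx].level);
	(void) ptr;
	free(t.paths);
	return 0;
}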
/* Debug code: */ /* Debug code: */
@ -164,9 +164,11 @@ static bool bch2_btree_node_format_fits(struct bch_fs *c, struct btree *b,
/* Btree node freeing/allocation: */ /* Btree node freeing/allocation: */
static void __btree_node_free(struct bch_fs *c, struct btree *b) static void __btree_node_free(struct btree_trans *trans, struct btree *b)
{ {
trace_and_count(c, btree_node_free, c, b); struct bch_fs *c = trans->c;
trace_and_count(c, btree_node_free, trans, b);
BUG_ON(btree_node_write_blocked(b)); BUG_ON(btree_node_write_blocked(b));
BUG_ON(btree_node_dirty(b)); BUG_ON(btree_node_dirty(b));
@ -188,15 +190,15 @@ static void bch2_btree_node_free_inmem(struct btree_trans *trans,
struct btree *b) struct btree *b)
{ {
struct bch_fs *c = trans->c; struct bch_fs *c = trans->c;
unsigned level = b->c.level; unsigned i, level = b->c.level;
bch2_btree_node_lock_write_nofail(trans, path, &b->c); bch2_btree_node_lock_write_nofail(trans, path, &b->c);
bch2_btree_node_hash_remove(&c->btree_cache, b); bch2_btree_node_hash_remove(&c->btree_cache, b);
__btree_node_free(c, b); __btree_node_free(trans, b);
six_unlock_write(&b->c.lock); six_unlock_write(&b->c.lock);
mark_btree_node_locked_noreset(path, level, BTREE_NODE_INTENT_LOCKED); mark_btree_node_locked_noreset(path, level, BTREE_NODE_INTENT_LOCKED);
trans_for_each_path(trans, path) trans_for_each_path(trans, path, i)
if (path->l[level].b == b) { if (path->l[level].b == b) {
btree_node_unlock(trans, path, level); btree_node_unlock(trans, path, level);
path->l[level].b = ERR_PTR(-BCH_ERR_no_btree_node_init); path->l[level].b = ERR_PTR(-BCH_ERR_no_btree_node_init);
@ -210,7 +212,7 @@ static void bch2_btree_node_free_never_used(struct btree_update *as,
struct bch_fs *c = as->c; struct bch_fs *c = as->c;
struct prealloc_nodes *p = &as->prealloc_nodes[b->c.lock.readers != NULL]; struct prealloc_nodes *p = &as->prealloc_nodes[b->c.lock.readers != NULL];
struct btree_path *path; struct btree_path *path;
unsigned level = b->c.level; unsigned i, level = b->c.level;
BUG_ON(!list_empty(&b->write_blocked)); BUG_ON(!list_empty(&b->write_blocked));
BUG_ON(b->will_make_reachable != (1UL|(unsigned long) as)); BUG_ON(b->will_make_reachable != (1UL|(unsigned long) as));
@ -233,7 +235,7 @@ static void bch2_btree_node_free_never_used(struct btree_update *as,
six_unlock_intent(&b->c.lock); six_unlock_intent(&b->c.lock);
trans_for_each_path(trans, path) trans_for_each_path(trans, path, i)
if (path->l[level].b == b) { if (path->l[level].b == b) {
btree_node_unlock(trans, path, level); btree_node_unlock(trans, path, level);
path->l[level].b = ERR_PTR(-BCH_ERR_no_btree_node_init); path->l[level].b = ERR_PTR(-BCH_ERR_no_btree_node_init);
@ -363,7 +365,7 @@ static struct btree *bch2_btree_node_alloc(struct btree_update *as,
ret = bch2_btree_node_hash_insert(&c->btree_cache, b, level, as->btree_id); ret = bch2_btree_node_hash_insert(&c->btree_cache, b, level, as->btree_id);
BUG_ON(ret); BUG_ON(ret);
trace_and_count(c, btree_node_alloc, c, b); trace_and_count(c, btree_node_alloc, trans, b);
bch2_increment_clock(c, btree_sectors(c), WRITE); bch2_increment_clock(c, btree_sectors(c), WRITE);
return b; return b;
} }
@ -453,7 +455,7 @@ static void bch2_btree_reserve_put(struct btree_update *as, struct btree_trans *
btree_node_lock_nopath_nofail(trans, &b->c, SIX_LOCK_intent); btree_node_lock_nopath_nofail(trans, &b->c, SIX_LOCK_intent);
btree_node_lock_nopath_nofail(trans, &b->c, SIX_LOCK_write); btree_node_lock_nopath_nofail(trans, &b->c, SIX_LOCK_write);
__btree_node_free(c, b); __btree_node_free(trans, b);
six_unlock_write(&b->c.lock); six_unlock_write(&b->c.lock);
six_unlock_intent(&b->c.lock); six_unlock_intent(&b->c.lock);
} }
@ -466,7 +468,6 @@ static int bch2_btree_reserve_get(struct btree_trans *trans,
unsigned flags, unsigned flags,
struct closure *cl) struct closure *cl)
{ {
struct bch_fs *c = as->c;
struct btree *b; struct btree *b;
unsigned interior; unsigned interior;
int ret = 0; int ret = 0;
@ -476,11 +477,8 @@ static int bch2_btree_reserve_get(struct btree_trans *trans,
/* /*
* Protects reaping from the btree node cache and using the btree node * Protects reaping from the btree node cache and using the btree node
* open bucket reserve: * open bucket reserve:
*
* BTREE_INSERT_NOWAIT only applies to btree node allocation, not
* blocking on this lock:
*/ */
ret = bch2_btree_cache_cannibalize_lock(c, cl); ret = bch2_btree_cache_cannibalize_lock(trans, cl);
if (ret) if (ret)
return ret; return ret;
@ -488,9 +486,8 @@ static int bch2_btree_reserve_get(struct btree_trans *trans,
struct prealloc_nodes *p = as->prealloc_nodes + interior; struct prealloc_nodes *p = as->prealloc_nodes + interior;
while (p->nr < nr_nodes[interior]) { while (p->nr < nr_nodes[interior]) {
b = __bch2_btree_node_alloc(trans, &as->disk_res, b = __bch2_btree_node_alloc(trans, &as->disk_res, cl,
flags & BTREE_INSERT_NOWAIT ? NULL : cl, interior, flags);
interior, flags);
if (IS_ERR(b)) { if (IS_ERR(b)) {
ret = PTR_ERR(b); ret = PTR_ERR(b);
goto err; goto err;
@ -500,7 +497,7 @@ static int bch2_btree_reserve_get(struct btree_trans *trans,
} }
} }
err: err:
bch2_btree_cache_cannibalize_unlock(c); bch2_btree_cache_cannibalize_unlock(trans);
return ret; return ret;
} }
@ -559,24 +556,20 @@ static void btree_update_add_key(struct btree_update *as,
static int btree_update_nodes_written_trans(struct btree_trans *trans, static int btree_update_nodes_written_trans(struct btree_trans *trans,
struct btree_update *as) struct btree_update *as)
{ {
struct bkey_i *k; struct jset_entry *e = bch2_trans_jset_entry_alloc(trans, as->journal_u64s);
int ret; int ret = PTR_ERR_OR_ZERO(e);
ret = darray_make_room(&trans->extra_journal_entries, as->journal_u64s);
if (ret) if (ret)
return ret; return ret;
memcpy(&darray_top(trans->extra_journal_entries), memcpy(e, as->journal_entries, as->journal_u64s * sizeof(u64));
as->journal_entries,
as->journal_u64s * sizeof(u64));
trans->extra_journal_entries.nr += as->journal_u64s;
trans->journal_pin = &as->journal; trans->journal_pin = &as->journal;
for_each_keylist_key(&as->old_keys, k) { for_each_keylist_key(&as->old_keys, k) {
unsigned level = bkey_i_to_btree_ptr_v2(k)->v.mem_ptr; unsigned level = bkey_i_to_btree_ptr_v2(k)->v.mem_ptr;
ret = bch2_trans_mark_old(trans, as->btree_id, level, bkey_i_to_s_c(k), 0); ret = bch2_key_trigger_old(trans, as->btree_id, level, bkey_i_to_s_c(k),
BTREE_TRIGGER_TRANSACTIONAL);
if (ret) if (ret)
return ret; return ret;
} }
@ -584,7 +577,8 @@ static int btree_update_nodes_written_trans(struct btree_trans *trans,
for_each_keylist_key(&as->new_keys, k) { for_each_keylist_key(&as->new_keys, k) {
unsigned level = bkey_i_to_btree_ptr_v2(k)->v.mem_ptr; unsigned level = bkey_i_to_btree_ptr_v2(k)->v.mem_ptr;
ret = bch2_trans_mark_new(trans, as->btree_id, level, k, 0); ret = bch2_key_trigger_new(trans, as->btree_id, level, bkey_i_to_s(k),
BTREE_TRIGGER_TRANSACTIONAL);
if (ret) if (ret)
return ret; return ret;
} }
@ -645,9 +639,9 @@ static void btree_update_nodes_written(struct btree_update *as)
*/ */
ret = commit_do(trans, &as->disk_res, &journal_seq, ret = commit_do(trans, &as->disk_res, &journal_seq,
BCH_WATERMARK_reclaim| BCH_WATERMARK_reclaim|
BTREE_INSERT_NOFAIL| BCH_TRANS_COMMIT_no_enospc|
BTREE_INSERT_NOCHECK_RW| BCH_TRANS_COMMIT_no_check_rw|
BTREE_INSERT_JOURNAL_RECLAIM, BCH_TRANS_COMMIT_journal_reclaim,
btree_update_nodes_written_trans(trans, as)); btree_update_nodes_written_trans(trans, as));
bch2_trans_unlock(trans); bch2_trans_unlock(trans);
@ -655,10 +649,11 @@ static void btree_update_nodes_written(struct btree_update *as)
"%s(): error %s", __func__, bch2_err_str(ret)); "%s(): error %s", __func__, bch2_err_str(ret));
err: err:
if (as->b) { if (as->b) {
struct btree_path *path;
b = as->b; b = as->b;
path = get_unlocked_mut_path(trans, as->btree_id, b->c.level, b->key.k.p); btree_path_idx_t path_idx = get_unlocked_mut_path(trans,
as->btree_id, b->c.level, b->key.k.p);
struct btree_path *path = trans->paths + path_idx;
/* /*
* @b is the node we did the final insert into: * @b is the node we did the final insert into:
* *
@ -728,7 +723,7 @@ static void btree_update_nodes_written(struct btree_update *as)
btree_node_write_if_need(c, b, SIX_LOCK_intent); btree_node_write_if_need(c, b, SIX_LOCK_intent);
btree_node_unlock(trans, path, b->c.level); btree_node_unlock(trans, path, b->c.level);
bch2_path_put(trans, path, true); bch2_path_put(trans, path_idx, true);
} }
bch2_journal_pin_drop(&c->journal, &as->journal); bch2_journal_pin_drop(&c->journal, &as->journal);
@ -815,6 +810,12 @@ static void btree_update_updated_node(struct btree_update *as, struct btree *b)
mutex_unlock(&c->btree_interior_update_lock); mutex_unlock(&c->btree_interior_update_lock);
} }
static int bch2_update_reparent_journal_pin_flush(struct journal *j,
struct journal_entry_pin *_pin, u64 seq)
{
return 0;
}
static void btree_update_reparent(struct btree_update *as, static void btree_update_reparent(struct btree_update *as,
struct btree_update *child) struct btree_update *child)
{ {
@ -825,7 +826,8 @@ static void btree_update_reparent(struct btree_update *as,
child->b = NULL; child->b = NULL;
child->mode = BTREE_INTERIOR_UPDATING_AS; child->mode = BTREE_INTERIOR_UPDATING_AS;
bch2_journal_pin_copy(&c->journal, &as->journal, &child->journal, NULL); bch2_journal_pin_copy(&c->journal, &as->journal, &child->journal,
bch2_update_reparent_journal_pin_flush);
} }
static void btree_update_updated_root(struct btree_update *as, struct btree *b) static void btree_update_updated_root(struct btree_update *as, struct btree *b)
@ -934,6 +936,12 @@ static void bch2_btree_update_get_open_buckets(struct btree_update *as, struct b
b->ob.v[--b->ob.nr]; b->ob.v[--b->ob.nr];
} }
static int bch2_btree_update_will_free_node_journal_pin_flush(struct journal *j,
struct journal_entry_pin *_pin, u64 seq)
{
return 0;
}
/* /*
* @b is being split/rewritten: it may have pointers to not-yet-written btree * @b is being split/rewritten: it may have pointers to not-yet-written btree
* nodes and thus outstanding btree_updates - redirect @b's * nodes and thus outstanding btree_updates - redirect @b's
@ -985,11 +993,13 @@ static void bch2_btree_interior_update_will_free_node(struct btree_update *as,
* when the new nodes are persistent and reachable on disk: * when the new nodes are persistent and reachable on disk:
*/ */
w = btree_current_write(b); w = btree_current_write(b);
bch2_journal_pin_copy(&c->journal, &as->journal, &w->journal, NULL); bch2_journal_pin_copy(&c->journal, &as->journal, &w->journal,
bch2_btree_update_will_free_node_journal_pin_flush);
bch2_journal_pin_drop(&c->journal, &w->journal); bch2_journal_pin_drop(&c->journal, &w->journal);
w = btree_prev_write(b); w = btree_prev_write(b);
bch2_journal_pin_copy(&c->journal, &as->journal, &w->journal, NULL); bch2_journal_pin_copy(&c->journal, &as->journal, &w->journal,
bch2_btree_update_will_free_node_journal_pin_flush);
bch2_journal_pin_drop(&c->journal, &w->journal); bch2_journal_pin_drop(&c->journal, &w->journal);
mutex_unlock(&c->btree_interior_update_lock); mutex_unlock(&c->btree_interior_update_lock);
@ -1039,7 +1049,7 @@ bch2_btree_update_start(struct btree_trans *trans, struct btree_path *path,
struct bch_fs *c = trans->c; struct bch_fs *c = trans->c;
struct btree_update *as; struct btree_update *as;
u64 start_time = local_clock(); u64 start_time = local_clock();
int disk_res_flags = (flags & BTREE_INSERT_NOFAIL) int disk_res_flags = (flags & BCH_TRANS_COMMIT_no_enospc)
? BCH_DISK_RESERVATION_NOFAIL : 0; ? BCH_DISK_RESERVATION_NOFAIL : 0;
unsigned nr_nodes[2] = { 0, 0 }; unsigned nr_nodes[2] = { 0, 0 };
unsigned update_level = level; unsigned update_level = level;
@ -1057,7 +1067,7 @@ bch2_btree_update_start(struct btree_trans *trans, struct btree_path *path,
flags &= ~BCH_WATERMARK_MASK; flags &= ~BCH_WATERMARK_MASK;
flags |= watermark; flags |= watermark;
if (!(flags & BTREE_INSERT_JOURNAL_RECLAIM) && if (!(flags & BCH_TRANS_COMMIT_journal_reclaim) &&
watermark < c->journal.watermark) { watermark < c->journal.watermark) {
struct journal_res res = { 0 }; struct journal_res res = { 0 };
@ -1094,9 +1104,7 @@ bch2_btree_update_start(struct btree_trans *trans, struct btree_path *path,
split = path->l[update_level].b->nr.live_u64s > BTREE_SPLIT_THRESHOLD(c); split = path->l[update_level].b->nr.live_u64s > BTREE_SPLIT_THRESHOLD(c);
} }
if (flags & BTREE_INSERT_GC_LOCK_HELD) if (!down_read_trylock(&c->gc_lock)) {
lockdep_assert_held(&c->gc_lock);
else if (!down_read_trylock(&c->gc_lock)) {
ret = drop_locks_do(trans, (down_read(&c->gc_lock), 0)); ret = drop_locks_do(trans, (down_read(&c->gc_lock), 0));
if (ret) { if (ret) {
up_read(&c->gc_lock); up_read(&c->gc_lock);
@ -1110,7 +1118,7 @@ bch2_btree_update_start(struct btree_trans *trans, struct btree_path *path,
as->c = c; as->c = c;
as->start_time = start_time; as->start_time = start_time;
as->mode = BTREE_INTERIOR_NO_UPDATE; as->mode = BTREE_INTERIOR_NO_UPDATE;
as->took_gc_lock = !(flags & BTREE_INSERT_GC_LOCK_HELD); as->took_gc_lock = true;
as->btree_id = path->btree_id; as->btree_id = path->btree_id;
as->update_level = update_level; as->update_level = update_level;
INIT_LIST_HEAD(&as->list); INIT_LIST_HEAD(&as->list);
@ -1153,7 +1161,7 @@ bch2_btree_update_start(struct btree_trans *trans, struct btree_path *path,
* flag * flag
*/ */
if (bch2_err_matches(ret, ENOSPC) && if (bch2_err_matches(ret, ENOSPC) &&
(flags & BTREE_INSERT_JOURNAL_RECLAIM) && (flags & BCH_TRANS_COMMIT_journal_reclaim) &&
watermark != BCH_WATERMARK_reclaim) { watermark != BCH_WATERMARK_reclaim) {
ret = -BCH_ERR_journal_reclaim_would_deadlock; ret = -BCH_ERR_journal_reclaim_would_deadlock;
goto err; goto err;
@ -1183,6 +1191,9 @@ bch2_btree_update_start(struct btree_trans *trans, struct btree_path *path,
return as; return as;
err: err:
bch2_btree_update_free(as, trans); bch2_btree_update_free(as, trans);
if (!bch2_err_matches(ret, ENOSPC) &&
!bch2_err_matches(ret, EROFS))
bch_err_fn_ratelimited(c, ret);
return ERR_PTR(ret); return ERR_PTR(ret);
} }
@ -1214,7 +1225,7 @@ static void bch2_btree_set_root(struct btree_update *as,
struct bch_fs *c = as->c; struct bch_fs *c = as->c;
struct btree *old; struct btree *old;
trace_and_count(c, btree_node_set_root, c, b); trace_and_count(c, btree_node_set_root, trans, b);
old = btree_node_root(c, b); old = btree_node_root(c, b);
@ -1445,10 +1456,12 @@ static void __btree_split_node(struct btree_update *as,
*/ */
static void btree_split_insert_keys(struct btree_update *as, static void btree_split_insert_keys(struct btree_update *as,
struct btree_trans *trans, struct btree_trans *trans,
struct btree_path *path, btree_path_idx_t path_idx,
struct btree *b, struct btree *b,
struct keylist *keys) struct keylist *keys)
{ {
struct btree_path *path = trans->paths + path_idx;
if (!bch2_keylist_empty(keys) && if (!bch2_keylist_empty(keys) &&
bpos_le(bch2_keylist_front(keys)->k.p, b->data->max_key)) { bpos_le(bch2_keylist_front(keys)->k.p, b->data->max_key)) {
struct btree_node_iter node_iter; struct btree_node_iter node_iter;
@ -1462,25 +1475,25 @@ static void btree_split_insert_keys(struct btree_update *as,
} }
static int btree_split(struct btree_update *as, struct btree_trans *trans, static int btree_split(struct btree_update *as, struct btree_trans *trans,
struct btree_path *path, struct btree *b, btree_path_idx_t path, struct btree *b,
struct keylist *keys, unsigned flags) struct keylist *keys, unsigned flags)
{ {
struct bch_fs *c = as->c; struct bch_fs *c = as->c;
struct btree *parent = btree_node_parent(path, b); struct btree *parent = btree_node_parent(trans->paths + path, b);
struct btree *n1, *n2 = NULL, *n3 = NULL; struct btree *n1, *n2 = NULL, *n3 = NULL;
struct btree_path *path1 = NULL, *path2 = NULL; btree_path_idx_t path1 = 0, path2 = 0;
u64 start_time = local_clock(); u64 start_time = local_clock();
int ret = 0; int ret = 0;
BUG_ON(!parent && (b != btree_node_root(c, b))); BUG_ON(!parent && (b != btree_node_root(c, b)));
BUG_ON(parent && !btree_node_intent_locked(path, b->c.level + 1)); BUG_ON(parent && !btree_node_intent_locked(trans->paths + path, b->c.level + 1));
bch2_btree_interior_update_will_free_node(as, b); bch2_btree_interior_update_will_free_node(as, b);
if (b->nr.live_u64s > BTREE_SPLIT_THRESHOLD(c)) { if (b->nr.live_u64s > BTREE_SPLIT_THRESHOLD(c)) {
struct btree *n[2]; struct btree *n[2];
trace_and_count(c, btree_node_split, c, b); trace_and_count(c, btree_node_split, trans, b);
n[0] = n1 = bch2_btree_node_alloc(as, trans, b->c.level); n[0] = n1 = bch2_btree_node_alloc(as, trans, b->c.level);
n[1] = n2 = bch2_btree_node_alloc(as, trans, b->c.level); n[1] = n2 = bch2_btree_node_alloc(as, trans, b->c.level);
@ -1501,15 +1514,15 @@ static int btree_split(struct btree_update *as, struct btree_trans *trans,
six_unlock_write(&n2->c.lock); six_unlock_write(&n2->c.lock);
six_unlock_write(&n1->c.lock); six_unlock_write(&n1->c.lock);
path1 = get_unlocked_mut_path(trans, path->btree_id, n1->c.level, n1->key.k.p); path1 = get_unlocked_mut_path(trans, as->btree_id, n1->c.level, n1->key.k.p);
six_lock_increment(&n1->c.lock, SIX_LOCK_intent); six_lock_increment(&n1->c.lock, SIX_LOCK_intent);
mark_btree_node_locked(trans, path1, n1->c.level, BTREE_NODE_INTENT_LOCKED); mark_btree_node_locked(trans, trans->paths + path1, n1->c.level, BTREE_NODE_INTENT_LOCKED);
bch2_btree_path_level_init(trans, path1, n1); bch2_btree_path_level_init(trans, trans->paths + path1, n1);
path2 = get_unlocked_mut_path(trans, path->btree_id, n2->c.level, n2->key.k.p); path2 = get_unlocked_mut_path(trans, as->btree_id, n2->c.level, n2->key.k.p);
six_lock_increment(&n2->c.lock, SIX_LOCK_intent); six_lock_increment(&n2->c.lock, SIX_LOCK_intent);
mark_btree_node_locked(trans, path2, n2->c.level, BTREE_NODE_INTENT_LOCKED); mark_btree_node_locked(trans, trans->paths + path2, n2->c.level, BTREE_NODE_INTENT_LOCKED);
bch2_btree_path_level_init(trans, path2, n2); bch2_btree_path_level_init(trans, trans->paths + path2, n2);
/* /*
* Note that on recursive parent_keys == keys, so we * Note that on recursive parent_keys == keys, so we
@ -1526,11 +1539,11 @@ static int btree_split(struct btree_update *as, struct btree_trans *trans,
bch2_btree_update_add_new_node(as, n3); bch2_btree_update_add_new_node(as, n3);
six_unlock_write(&n3->c.lock); six_unlock_write(&n3->c.lock);
path2->locks_want++; trans->paths[path2].locks_want++;
BUG_ON(btree_node_locked(path2, n3->c.level)); BUG_ON(btree_node_locked(trans->paths + path2, n3->c.level));
six_lock_increment(&n3->c.lock, SIX_LOCK_intent); six_lock_increment(&n3->c.lock, SIX_LOCK_intent);
mark_btree_node_locked(trans, path2, n3->c.level, BTREE_NODE_INTENT_LOCKED); mark_btree_node_locked(trans, trans->paths + path2, n3->c.level, BTREE_NODE_INTENT_LOCKED);
bch2_btree_path_level_init(trans, path2, n3); bch2_btree_path_level_init(trans, trans->paths + path2, n3);
n3->sib_u64s[0] = U16_MAX; n3->sib_u64s[0] = U16_MAX;
n3->sib_u64s[1] = U16_MAX; n3->sib_u64s[1] = U16_MAX;
@ -1538,7 +1551,7 @@ static int btree_split(struct btree_update *as, struct btree_trans *trans,
btree_split_insert_keys(as, trans, path, n3, &as->parent_keys); btree_split_insert_keys(as, trans, path, n3, &as->parent_keys);
} }
} else { } else {
trace_and_count(c, btree_node_compact, c, b); trace_and_count(c, btree_node_compact, trans, b);
n1 = bch2_btree_node_alloc_replacement(as, trans, b); n1 = bch2_btree_node_alloc_replacement(as, trans, b);
@ -1551,10 +1564,10 @@ static int btree_split(struct btree_update *as, struct btree_trans *trans,
bch2_btree_update_add_new_node(as, n1); bch2_btree_update_add_new_node(as, n1);
six_unlock_write(&n1->c.lock); six_unlock_write(&n1->c.lock);
path1 = get_unlocked_mut_path(trans, path->btree_id, n1->c.level, n1->key.k.p); path1 = get_unlocked_mut_path(trans, as->btree_id, n1->c.level, n1->key.k.p);
six_lock_increment(&n1->c.lock, SIX_LOCK_intent); six_lock_increment(&n1->c.lock, SIX_LOCK_intent);
mark_btree_node_locked(trans, path1, n1->c.level, BTREE_NODE_INTENT_LOCKED); mark_btree_node_locked(trans, trans->paths + path1, n1->c.level, BTREE_NODE_INTENT_LOCKED);
bch2_btree_path_level_init(trans, path1, n1); bch2_btree_path_level_init(trans, trans->paths + path1, n1);
if (parent) if (parent)
bch2_keylist_add(&as->parent_keys, &n1->key); bch2_keylist_add(&as->parent_keys, &n1->key);
@ -1568,10 +1581,10 @@ static int btree_split(struct btree_update *as, struct btree_trans *trans,
if (ret) if (ret)
goto err; goto err;
} else if (n3) { } else if (n3) {
bch2_btree_set_root(as, trans, path, n3); bch2_btree_set_root(as, trans, trans->paths + path, n3);
} else { } else {
/* Root filled up but didn't need to be split */ /* Root filled up but didn't need to be split */
bch2_btree_set_root(as, trans, path, n1); bch2_btree_set_root(as, trans, trans->paths + path, n1);
} }
if (n3) { if (n3) {
@ -1591,13 +1604,13 @@ static int btree_split(struct btree_update *as, struct btree_trans *trans,
* node after another thread has locked and updated the new node, thus * node after another thread has locked and updated the new node, thus
* seeing stale data: * seeing stale data:
*/ */
bch2_btree_node_free_inmem(trans, path, b); bch2_btree_node_free_inmem(trans, trans->paths + path, b);
if (n3) if (n3)
bch2_trans_node_add(trans, n3); bch2_trans_node_add(trans, trans->paths + path, n3);
if (n2) if (n2)
bch2_trans_node_add(trans, n2); bch2_trans_node_add(trans, trans->paths + path2, n2);
bch2_trans_node_add(trans, n1); bch2_trans_node_add(trans, trans->paths + path1, n1);
if (n3) if (n3)
six_unlock_intent(&n3->c.lock); six_unlock_intent(&n3->c.lock);
@ -1606,11 +1619,11 @@ static int btree_split(struct btree_update *as, struct btree_trans *trans,
six_unlock_intent(&n1->c.lock); six_unlock_intent(&n1->c.lock);
out: out:
if (path2) { if (path2) {
__bch2_btree_path_unlock(trans, path2); __bch2_btree_path_unlock(trans, trans->paths + path2);
bch2_path_put(trans, path2, true); bch2_path_put(trans, path2, true);
} }
if (path1) { if (path1) {
__bch2_btree_path_unlock(trans, path1); __bch2_btree_path_unlock(trans, trans->paths + path1);
bch2_path_put(trans, path1, true); bch2_path_put(trans, path1, true);
} }
@ -1638,13 +1651,14 @@ bch2_btree_insert_keys_interior(struct btree_update *as,
struct keylist *keys) struct keylist *keys)
{ {
struct btree_path *linked; struct btree_path *linked;
unsigned i;
__bch2_btree_insert_keys_interior(as, trans, path, b, __bch2_btree_insert_keys_interior(as, trans, path, b,
path->l[b->c.level].iter, keys); path->l[b->c.level].iter, keys);
btree_update_updated_node(as, b); btree_update_updated_node(as, b);
trans_for_each_path_with_node(trans, b, linked) trans_for_each_path_with_node(trans, b, linked, i)
bch2_btree_node_iter_peek(&linked->l[b->c.level].iter, b); bch2_btree_node_iter_peek(&linked->l[b->c.level].iter, b);
bch2_trans_verify_paths(trans); bch2_trans_verify_paths(trans);
@ -1655,7 +1669,7 @@ bch2_btree_insert_keys_interior(struct btree_update *as,
* *
* @as: btree_update object * @as: btree_update object
* @trans: btree_trans object * @trans: btree_trans object
* @path: path that points to current node * @path_idx: path that points to current node
* @b: node to insert keys into * @b: node to insert keys into
* @keys: list of keys to insert * @keys: list of keys to insert
* @flags: transaction commit flags * @flags: transaction commit flags
@ -1667,10 +1681,11 @@ bch2_btree_insert_keys_interior(struct btree_update *as,
* for leaf nodes -- inserts into interior nodes have to be atomic. * for leaf nodes -- inserts into interior nodes have to be atomic.
*/ */
static int bch2_btree_insert_node(struct btree_update *as, struct btree_trans *trans, static int bch2_btree_insert_node(struct btree_update *as, struct btree_trans *trans,
struct btree_path *path, struct btree *b, btree_path_idx_t path_idx, struct btree *b,
struct keylist *keys, unsigned flags) struct keylist *keys, unsigned flags)
{ {
struct bch_fs *c = as->c; struct bch_fs *c = as->c;
struct btree_path *path = trans->paths + path_idx;
int old_u64s = le16_to_cpu(btree_bset_last(b)->u64s); int old_u64s = le16_to_cpu(btree_bset_last(b)->u64s);
int old_live_u64s = b->nr.live_u64s; int old_live_u64s = b->nr.live_u64s;
int live_u64s_added, u64s_added; int live_u64s_added, u64s_added;
@ -1723,19 +1738,22 @@ static int bch2_btree_insert_node(struct btree_update *as, struct btree_trans *t
return btree_trans_restart(trans, BCH_ERR_transaction_restart_split_race); return btree_trans_restart(trans, BCH_ERR_transaction_restart_split_race);
} }
return btree_split(as, trans, path, b, keys, flags); return btree_split(as, trans, path_idx, b, keys, flags);
} }
int bch2_btree_split_leaf(struct btree_trans *trans, int bch2_btree_split_leaf(struct btree_trans *trans,
struct btree_path *path, btree_path_idx_t path,
unsigned flags) unsigned flags)
{ {
struct btree *b = path_l(path)->b; /* btree_split & merge may both cause paths array to be reallocated */
struct btree *b = path_l(trans->paths + path)->b;
struct btree_update *as; struct btree_update *as;
unsigned l; unsigned l;
int ret = 0; int ret = 0;
as = bch2_btree_update_start(trans, path, path->level, as = bch2_btree_update_start(trans, trans->paths + path,
trans->paths[path].level,
true, flags); true, flags);
if (IS_ERR(as)) if (IS_ERR(as))
return PTR_ERR(as); return PTR_ERR(as);
@ -1748,20 +1766,21 @@ int bch2_btree_split_leaf(struct btree_trans *trans,
bch2_btree_update_done(as, trans); bch2_btree_update_done(as, trans);
for (l = path->level + 1; btree_node_intent_locked(path, l) && !ret; l++) for (l = trans->paths[path].level + 1;
btree_node_intent_locked(&trans->paths[path], l) && !ret;
l++)
ret = bch2_foreground_maybe_merge(trans, path, l, flags); ret = bch2_foreground_maybe_merge(trans, path, l, flags);
return ret; return ret;
} }
int __bch2_foreground_maybe_merge(struct btree_trans *trans, int __bch2_foreground_maybe_merge(struct btree_trans *trans,
struct btree_path *path, btree_path_idx_t path,
unsigned level, unsigned level,
unsigned flags, unsigned flags,
enum btree_node_sibling sib) enum btree_node_sibling sib)
{ {
struct bch_fs *c = trans->c; struct bch_fs *c = trans->c;
struct btree_path *sib_path = NULL, *new_path = NULL;
struct btree_update *as; struct btree_update *as;
struct bkey_format_state new_s; struct bkey_format_state new_s;
struct bkey_format new_f; struct bkey_format new_f;
@ -1769,13 +1788,15 @@ int __bch2_foreground_maybe_merge(struct btree_trans *trans,
struct btree *b, *m, *n, *prev, *next, *parent; struct btree *b, *m, *n, *prev, *next, *parent;
struct bpos sib_pos; struct bpos sib_pos;
size_t sib_u64s; size_t sib_u64s;
enum btree_id btree = trans->paths[path].btree_id;
btree_path_idx_t sib_path = 0, new_path = 0;
u64 start_time = local_clock(); u64 start_time = local_clock();
int ret = 0; int ret = 0;
BUG_ON(!path->should_be_locked); BUG_ON(!trans->paths[path].should_be_locked);
BUG_ON(!btree_node_locked(path, level)); BUG_ON(!btree_node_locked(&trans->paths[path], level));
b = path->l[level].b; b = trans->paths[path].l[level].b;
if ((sib == btree_prev_sib && bpos_eq(b->data->min_key, POS_MIN)) || if ((sib == btree_prev_sib && bpos_eq(b->data->min_key, POS_MIN)) ||
(sib == btree_next_sib && bpos_eq(b->data->max_key, SPOS_MAX))) { (sib == btree_next_sib && bpos_eq(b->data->max_key, SPOS_MAX))) {
@ -1787,18 +1808,18 @@ int __bch2_foreground_maybe_merge(struct btree_trans *trans,
? bpos_predecessor(b->data->min_key) ? bpos_predecessor(b->data->min_key)
: bpos_successor(b->data->max_key); : bpos_successor(b->data->max_key);
sib_path = bch2_path_get(trans, path->btree_id, sib_pos, sib_path = bch2_path_get(trans, btree, sib_pos,
U8_MAX, level, BTREE_ITER_INTENT, _THIS_IP_); U8_MAX, level, BTREE_ITER_INTENT, _THIS_IP_);
ret = bch2_btree_path_traverse(trans, sib_path, false); ret = bch2_btree_path_traverse(trans, sib_path, false);
if (ret) if (ret)
goto err; goto err;
btree_path_set_should_be_locked(sib_path); btree_path_set_should_be_locked(trans->paths + sib_path);
m = sib_path->l[level].b; m = trans->paths[sib_path].l[level].b;
if (btree_node_parent(path, b) != if (btree_node_parent(trans->paths + path, b) !=
btree_node_parent(sib_path, m)) { btree_node_parent(trans->paths + sib_path, m)) {
b->sib_u64s[sib] = U16_MAX; b->sib_u64s[sib] = U16_MAX;
goto out; goto out;
} }
@ -1851,14 +1872,14 @@ int __bch2_foreground_maybe_merge(struct btree_trans *trans,
if (b->sib_u64s[sib] > c->btree_foreground_merge_threshold) if (b->sib_u64s[sib] > c->btree_foreground_merge_threshold)
goto out; goto out;
parent = btree_node_parent(path, b); parent = btree_node_parent(trans->paths + path, b);
as = bch2_btree_update_start(trans, path, level, false, as = bch2_btree_update_start(trans, trans->paths + path, level, false,
BTREE_INSERT_NOFAIL|flags); BCH_TRANS_COMMIT_no_enospc|flags);
ret = PTR_ERR_OR_ZERO(as); ret = PTR_ERR_OR_ZERO(as);
if (ret) if (ret)
goto err; goto err;
trace_and_count(c, btree_node_merge, c, b); trace_and_count(c, btree_node_merge, trans, b);
bch2_btree_interior_update_will_free_node(as, b); bch2_btree_interior_update_will_free_node(as, b);
bch2_btree_interior_update_will_free_node(as, m); bch2_btree_interior_update_will_free_node(as, m);
@ -1882,10 +1903,10 @@ int __bch2_foreground_maybe_merge(struct btree_trans *trans,
bch2_btree_update_add_new_node(as, n); bch2_btree_update_add_new_node(as, n);
six_unlock_write(&n->c.lock); six_unlock_write(&n->c.lock);
new_path = get_unlocked_mut_path(trans, path->btree_id, n->c.level, n->key.k.p); new_path = get_unlocked_mut_path(trans, btree, n->c.level, n->key.k.p);
six_lock_increment(&n->c.lock, SIX_LOCK_intent); six_lock_increment(&n->c.lock, SIX_LOCK_intent);
mark_btree_node_locked(trans, new_path, n->c.level, BTREE_NODE_INTENT_LOCKED); mark_btree_node_locked(trans, trans->paths + new_path, n->c.level, BTREE_NODE_INTENT_LOCKED);
bch2_btree_path_level_init(trans, new_path, n); bch2_btree_path_level_init(trans, trans->paths + new_path, n);
bkey_init(&delete.k); bkey_init(&delete.k);
delete.k.p = prev->key.k.p; delete.k.p = prev->key.k.p;
@ -1903,10 +1924,10 @@ int __bch2_foreground_maybe_merge(struct btree_trans *trans,
bch2_btree_update_get_open_buckets(as, n); bch2_btree_update_get_open_buckets(as, n);
bch2_btree_node_write(c, n, SIX_LOCK_intent, 0); bch2_btree_node_write(c, n, SIX_LOCK_intent, 0);
bch2_btree_node_free_inmem(trans, path, b); bch2_btree_node_free_inmem(trans, trans->paths + path, b);
bch2_btree_node_free_inmem(trans, sib_path, m); bch2_btree_node_free_inmem(trans, trans->paths + sib_path, m);
bch2_trans_node_add(trans, n); bch2_trans_node_add(trans, trans->paths + path, n);
bch2_trans_verify_paths(trans); bch2_trans_verify_paths(trans);
@ -1934,16 +1955,16 @@ int bch2_btree_node_rewrite(struct btree_trans *trans,
unsigned flags) unsigned flags)
{ {
struct bch_fs *c = trans->c; struct bch_fs *c = trans->c;
struct btree_path *new_path = NULL;
struct btree *n, *parent; struct btree *n, *parent;
struct btree_update *as; struct btree_update *as;
btree_path_idx_t new_path = 0;
int ret; int ret;
flags |= BTREE_INSERT_NOFAIL; flags |= BCH_TRANS_COMMIT_no_enospc;
parent = btree_node_parent(iter->path, b); struct btree_path *path = btree_iter_path(trans, iter);
as = bch2_btree_update_start(trans, iter->path, b->c.level, parent = btree_node_parent(path, b);
false, flags); as = bch2_btree_update_start(trans, path, b->c.level, false, flags);
ret = PTR_ERR_OR_ZERO(as); ret = PTR_ERR_OR_ZERO(as);
if (ret) if (ret)
goto out; goto out;
@ -1958,27 +1979,27 @@ int bch2_btree_node_rewrite(struct btree_trans *trans,
new_path = get_unlocked_mut_path(trans, iter->btree_id, n->c.level, n->key.k.p); new_path = get_unlocked_mut_path(trans, iter->btree_id, n->c.level, n->key.k.p);
six_lock_increment(&n->c.lock, SIX_LOCK_intent); six_lock_increment(&n->c.lock, SIX_LOCK_intent);
mark_btree_node_locked(trans, new_path, n->c.level, BTREE_NODE_INTENT_LOCKED); mark_btree_node_locked(trans, trans->paths + new_path, n->c.level, BTREE_NODE_INTENT_LOCKED);
bch2_btree_path_level_init(trans, new_path, n); bch2_btree_path_level_init(trans, trans->paths + new_path, n);
trace_and_count(c, btree_node_rewrite, c, b); trace_and_count(c, btree_node_rewrite, trans, b);
if (parent) { if (parent) {
bch2_keylist_add(&as->parent_keys, &n->key); bch2_keylist_add(&as->parent_keys, &n->key);
ret = bch2_btree_insert_node(as, trans, iter->path, parent, ret = bch2_btree_insert_node(as, trans, iter->path,
&as->parent_keys, flags); parent, &as->parent_keys, flags);
if (ret) if (ret)
goto err; goto err;
} else { } else {
bch2_btree_set_root(as, trans, iter->path, n); bch2_btree_set_root(as, trans, btree_iter_path(trans, iter), n);
} }
bch2_btree_update_get_open_buckets(as, n); bch2_btree_update_get_open_buckets(as, n);
bch2_btree_node_write(c, n, SIX_LOCK_intent, 0); bch2_btree_node_write(c, n, SIX_LOCK_intent, 0);
bch2_btree_node_free_inmem(trans, iter->path, b); bch2_btree_node_free_inmem(trans, btree_iter_path(trans, iter), b);
bch2_trans_node_add(trans, n); bch2_trans_node_add(trans, trans->paths + iter->path, n);
six_unlock_intent(&n->c.lock); six_unlock_intent(&n->c.lock);
bch2_btree_update_done(as, trans); bch2_btree_update_done(as, trans);
@ -2047,8 +2068,7 @@ static void async_btree_node_rewrite_work(struct work_struct *work)
ret = bch2_trans_do(c, NULL, NULL, 0, ret = bch2_trans_do(c, NULL, NULL, 0,
async_btree_node_rewrite_trans(trans, a)); async_btree_node_rewrite_trans(trans, a));
if (ret) bch_err_fn(c, ret);
bch_err_fn(c, ret);
bch2_write_ref_put(c, BCH_WRITE_REF_node_rewrite); bch2_write_ref_put(c, BCH_WRITE_REF_node_rewrite);
kfree(a); kfree(a);
} }
@ -2071,7 +2091,7 @@ void bch2_btree_node_rewrite_async(struct bch_fs *c, struct btree *b)
a->seq = b->data->keys.seq; a->seq = b->data->keys.seq;
INIT_WORK(&a->work, async_btree_node_rewrite_work); INIT_WORK(&a->work, async_btree_node_rewrite_work);
if (unlikely(!test_bit(BCH_FS_MAY_GO_RW, &c->flags))) { if (unlikely(!test_bit(BCH_FS_may_go_rw, &c->flags))) {
mutex_lock(&c->pending_node_rewrites_lock); mutex_lock(&c->pending_node_rewrites_lock);
list_add(&a->list, &c->pending_node_rewrites); list_add(&a->list, &c->pending_node_rewrites);
mutex_unlock(&c->pending_node_rewrites_lock); mutex_unlock(&c->pending_node_rewrites_lock);
@ -2079,15 +2099,15 @@ void bch2_btree_node_rewrite_async(struct bch_fs *c, struct btree *b)
} }
if (!bch2_write_ref_tryget(c, BCH_WRITE_REF_node_rewrite)) { if (!bch2_write_ref_tryget(c, BCH_WRITE_REF_node_rewrite)) {
if (test_bit(BCH_FS_STARTED, &c->flags)) { if (test_bit(BCH_FS_started, &c->flags)) {
bch_err(c, "%s: error getting c->writes ref", __func__); bch_err(c, "%s: error getting c->writes ref", __func__);
kfree(a); kfree(a);
return; return;
} }
ret = bch2_fs_read_write_early(c); ret = bch2_fs_read_write_early(c);
bch_err_msg(c, ret, "going read-write");
if (ret) { if (ret) {
bch_err_msg(c, ret, "going read-write");
kfree(a); kfree(a);
return; return;
} }
@ -2138,13 +2158,12 @@ static int __bch2_btree_node_update_key(struct btree_trans *trans,
int ret; int ret;
if (!skip_triggers) { if (!skip_triggers) {
ret = bch2_trans_mark_old(trans, b->c.btree_id, b->c.level + 1, ret = bch2_key_trigger_old(trans, b->c.btree_id, b->c.level + 1,
bkey_i_to_s_c(&b->key), 0); bkey_i_to_s_c(&b->key),
if (ret) BTREE_TRIGGER_TRANSACTIONAL) ?:
return ret; bch2_key_trigger_new(trans, b->c.btree_id, b->c.level + 1,
bkey_i_to_s(new_key),
ret = bch2_trans_mark_new(trans, b->c.btree_id, b->c.level + 1, BTREE_TRIGGER_TRANSACTIONAL);
new_key, 0);
if (ret) if (ret)
return ret; return ret;
} }
@ -2156,7 +2175,7 @@ static int __bch2_btree_node_update_key(struct btree_trans *trans,
BUG_ON(ret); BUG_ON(ret);
} }
parent = btree_node_parent(iter->path, b); parent = btree_node_parent(btree_iter_path(trans, iter), b);
if (parent) { if (parent) {
bch2_trans_copy_iter(&iter2, iter); bch2_trans_copy_iter(&iter2, iter);
@ -2164,10 +2183,11 @@ static int __bch2_btree_node_update_key(struct btree_trans *trans,
iter2.flags & BTREE_ITER_INTENT, iter2.flags & BTREE_ITER_INTENT,
_THIS_IP_); _THIS_IP_);
BUG_ON(iter2.path->level != b->c.level); struct btree_path *path2 = btree_iter_path(trans, &iter2);
BUG_ON(!bpos_eq(iter2.path->pos, new_key->k.p)); BUG_ON(path2->level != b->c.level);
BUG_ON(!bpos_eq(path2->pos, new_key->k.p));
btree_path_set_level_up(trans, iter2.path); btree_path_set_level_up(trans, path2);
trans->paths_sorted = false; trans->paths_sorted = false;
@ -2178,23 +2198,23 @@ static int __bch2_btree_node_update_key(struct btree_trans *trans,
} else { } else {
BUG_ON(btree_node_root(c, b) != b); BUG_ON(btree_node_root(c, b) != b);
ret = darray_make_room(&trans->extra_journal_entries, struct jset_entry *e = bch2_trans_jset_entry_alloc(trans,
jset_u64s(new_key->k.u64s)); jset_u64s(new_key->k.u64s));
ret = PTR_ERR_OR_ZERO(e);
if (ret) if (ret)
return ret; return ret;
journal_entry_set((void *) &darray_top(trans->extra_journal_entries), journal_entry_set(e,
BCH_JSET_ENTRY_btree_root, BCH_JSET_ENTRY_btree_root,
b->c.btree_id, b->c.level, b->c.btree_id, b->c.level,
new_key, new_key->k.u64s); new_key, new_key->k.u64s);
trans->extra_journal_entries.nr += jset_u64s(new_key->k.u64s);
} }
ret = bch2_trans_commit(trans, NULL, NULL, commit_flags); ret = bch2_trans_commit(trans, NULL, NULL, commit_flags);
if (ret) if (ret)
goto err; goto err;
bch2_btree_node_lock_write_nofail(trans, iter->path, &b->c); bch2_btree_node_lock_write_nofail(trans, btree_iter_path(trans, iter), &b->c);
if (new_hash) { if (new_hash) {
mutex_lock(&c->btree_cache.lock); mutex_lock(&c->btree_cache.lock);
@ -2209,7 +2229,7 @@ static int __bch2_btree_node_update_key(struct btree_trans *trans,
bkey_copy(&b->key, new_key); bkey_copy(&b->key, new_key);
} }
bch2_btree_node_unlock_write(trans, iter->path, b); bch2_btree_node_unlock_write(trans, btree_iter_path(trans, iter), b);
out: out:
bch2_trans_iter_exit(trans, &iter2); bch2_trans_iter_exit(trans, &iter2);
return ret; return ret;
@ -2228,7 +2248,7 @@ int bch2_btree_node_update_key(struct btree_trans *trans, struct btree_iter *ite
{ {
struct bch_fs *c = trans->c; struct bch_fs *c = trans->c;
struct btree *new_hash = NULL; struct btree *new_hash = NULL;
struct btree_path *path = iter->path; struct btree_path *path = btree_iter_path(trans, iter);
struct closure cl; struct closure cl;
int ret = 0; int ret = 0;
@ -2243,7 +2263,7 @@ int bch2_btree_node_update_key(struct btree_trans *trans, struct btree_iter *ite
* btree_iter_traverse(): * btree_iter_traverse():
*/ */
if (btree_ptr_hash_val(new_key) != b->hash_val) { if (btree_ptr_hash_val(new_key) != b->hash_val) {
ret = bch2_btree_cache_cannibalize_lock(c, &cl); ret = bch2_btree_cache_cannibalize_lock(trans, &cl);
if (ret) { if (ret) {
ret = drop_locks_do(trans, (closure_sync(&cl), 0)); ret = drop_locks_do(trans, (closure_sync(&cl), 0));
if (ret) if (ret)
@ -2267,7 +2287,7 @@ int bch2_btree_node_update_key(struct btree_trans *trans, struct btree_iter *ite
six_unlock_intent(&new_hash->c.lock); six_unlock_intent(&new_hash->c.lock);
} }
closure_sync(&cl); closure_sync(&cl);
bch2_btree_cache_cannibalize_unlock(c); bch2_btree_cache_cannibalize_unlock(trans);
return ret; return ret;
} }
@ -2286,7 +2306,7 @@ int bch2_btree_node_update_key_get_iter(struct btree_trans *trans,
goto out; goto out;
/* has node been freed? */ /* has node been freed? */
if (iter.path->l[b->c.level].b != b) { if (btree_iter_path(trans, &iter)->l[b->c.level].b != b) {
/* node has been freed: */ /* node has been freed: */
BUG_ON(!btree_node_dying(b)); BUG_ON(!btree_node_dying(b));
goto out; goto out;
@ -2328,12 +2348,12 @@ static int __bch2_btree_root_alloc(struct btree_trans *trans, enum btree_id id)
closure_init_stack(&cl); closure_init_stack(&cl);
do { do {
ret = bch2_btree_cache_cannibalize_lock(c, &cl); ret = bch2_btree_cache_cannibalize_lock(trans, &cl);
closure_sync(&cl); closure_sync(&cl);
} while (ret); } while (ret);
b = bch2_btree_node_mem_alloc(trans, false); b = bch2_btree_node_mem_alloc(trans, false);
bch2_btree_cache_cannibalize_unlock(c); bch2_btree_cache_cannibalize_unlock(trans);
set_btree_node_fake(b); set_btree_node_fake(b);
set_btree_node_need_rewrite(b); set_btree_node_need_rewrite(b);


@ -117,16 +117,17 @@ struct btree *__bch2_btree_node_alloc_replacement(struct btree_update *,
struct btree *, struct btree *,
struct bkey_format); struct bkey_format);
int bch2_btree_split_leaf(struct btree_trans *, struct btree_path *, unsigned); int bch2_btree_split_leaf(struct btree_trans *, btree_path_idx_t, unsigned);
int __bch2_foreground_maybe_merge(struct btree_trans *, struct btree_path *, int __bch2_foreground_maybe_merge(struct btree_trans *, btree_path_idx_t,
unsigned, unsigned, enum btree_node_sibling); unsigned, unsigned, enum btree_node_sibling);
static inline int bch2_foreground_maybe_merge_sibling(struct btree_trans *trans, static inline int bch2_foreground_maybe_merge_sibling(struct btree_trans *trans,
struct btree_path *path, btree_path_idx_t path_idx,
unsigned level, unsigned flags, unsigned level, unsigned flags,
enum btree_node_sibling sib) enum btree_node_sibling sib)
{ {
struct btree_path *path = trans->paths + path_idx;
struct btree *b; struct btree *b;
EBUG_ON(!btree_node_locked(path, level)); EBUG_ON(!btree_node_locked(path, level));
@ -135,11 +136,11 @@ static inline int bch2_foreground_maybe_merge_sibling(struct btree_trans *trans,
if (b->sib_u64s[sib] > trans->c->btree_foreground_merge_threshold) if (b->sib_u64s[sib] > trans->c->btree_foreground_merge_threshold)
return 0; return 0;
return __bch2_foreground_maybe_merge(trans, path, level, flags, sib); return __bch2_foreground_maybe_merge(trans, path_idx, level, flags, sib);
} }
static inline int bch2_foreground_maybe_merge(struct btree_trans *trans, static inline int bch2_foreground_maybe_merge(struct btree_trans *trans,
struct btree_path *path, btree_path_idx_t path,
unsigned level, unsigned level,
unsigned flags) unsigned flags)
{ {


@@ -7,45 +7,144 @@
 #include "btree_write_buffer.h"
 #include "error.h"
 #include "journal.h"
+#include "journal_io.h"
 #include "journal_reclaim.h"
 
-#include <linux/sort.h>
+#include <linux/prefetch.h>
 
-static int btree_write_buffered_key_cmp(const void *_l, const void *_r)
-{
-	const struct btree_write_buffered_key *l = _l;
-	const struct btree_write_buffered_key *r = _r;
-
-	return  cmp_int(l->btree, r->btree) ?:
-		bpos_cmp(l->k.k.p, r->k.k.p) ?:
-		cmp_int(l->journal_seq, r->journal_seq) ?:
-		cmp_int(l->journal_offset, r->journal_offset);
-}
-
-static int btree_write_buffered_journal_cmp(const void *_l, const void *_r)
-{
-	const struct btree_write_buffered_key *l = _l;
-	const struct btree_write_buffered_key *r = _r;
-
-	return  cmp_int(l->journal_seq, r->journal_seq);
-}
+static int bch2_btree_write_buffer_journal_flush(struct journal *,
+				struct journal_entry_pin *, u64);
+
+static int bch2_journal_keys_to_write_buffer(struct bch_fs *, struct journal_buf *);
+
+static inline bool __wb_key_ref_cmp(const struct wb_key_ref *l, const struct wb_key_ref *r)
+{
+	return (cmp_int(l->hi, r->hi) ?:
+		cmp_int(l->mi, r->mi) ?:
+		cmp_int(l->lo, r->lo)) >= 0;
+}
+
+static inline bool wb_key_ref_cmp(const struct wb_key_ref *l, const struct wb_key_ref *r)
+{
+#ifdef CONFIG_X86_64
+	int cmp;
+
+	asm("mov   (%[l]), %%rax;"
+	    "sub   (%[r]), %%rax;"
+	    "mov  8(%[l]), %%rax;"
+	    "sbb  8(%[r]), %%rax;"
+	    "mov 16(%[l]), %%rax;"
+	    "sbb 16(%[r]), %%rax;"
+	    : "=@ccae" (cmp)
+	    : [l] "r" (l), [r] "r" (r)
+	    : "rax", "cc");
+
+	EBUG_ON(cmp != __wb_key_ref_cmp(l, r));
+	return cmp;
+#else
+	return __wb_key_ref_cmp(l, r);
+#endif
+}
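The CONFIG_X86_64 branch above folds the three-word comparison into a single sub/sbb chain and reads back the "above or equal" condition flag, i.e. an unsigned 192-bit l >= r with hi as the most significant word. A portable sketch of the same borrow propagation (illustration only, not part of the patch):

static inline bool wb_key_ref_cmp_portable(const struct wb_key_ref *l,
					   const struct wb_key_ref *r)
{
	/* borrow out of the least significant word (lo)... */
	bool borrow = l->lo < r->lo;

	/* ...propagated through mi... */
	borrow = l->mi < r->mi || (l->mi == r->mi && borrow);

	/* ...and l >= r iff the final subtraction of hi does not borrow */
	return !(l->hi < r->hi || (l->hi == r->hi && borrow));
}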
static int bch2_btree_write_buffer_flush_one(struct btree_trans *trans, /* Compare excluding idx, the low 24 bits: */
struct btree_iter *iter, static inline bool wb_key_eq(const void *_l, const void *_r)
struct btree_write_buffered_key *wb, {
unsigned commit_flags, const struct wb_key_ref *l = _l;
bool *write_locked, const struct wb_key_ref *r = _r;
size_t *fast)
return !((l->hi ^ r->hi)|
(l->mi ^ r->mi)|
((l->lo >> 24) ^ (r->lo >> 24)));
}
static noinline void wb_sort(struct wb_key_ref *base, size_t num)
{
size_t n = num, a = num / 2;
if (!a) /* num < 2 || size == 0 */
return;
for (;;) {
size_t b, c, d;
if (a) /* Building heap: sift down --a */
--a;
else if (--n) /* Sorting: Extract root to --n */
swap(base[0], base[n]);
else /* Sort complete */
break;
/*
* Sift element at "a" down into heap. This is the
* "bottom-up" variant, which significantly reduces
* calls to cmp_func(): we find the sift-down path all
* the way to the leaves (one compare per level), then
* backtrack to find where to insert the target element.
*
* Because elements tend to sift down close to the leaves,
* this uses fewer compares than doing two per level
* on the way down. (A bit more than half as many on
* average, 3/4 worst-case.)
*/
for (b = a; c = 2*b + 1, (d = c + 1) < n;)
b = wb_key_ref_cmp(base + c, base + d) ? c : d;
if (d == n) /* Special case last leaf with no sibling */
b = c;
/* Now backtrack from "b" to the correct location for "a" */
while (b != a && wb_key_ref_cmp(base + a, base + b))
b = (b - 1) / 2;
c = b; /* Where "a" belongs */
while (b != a) { /* Shift it into place */
b = (b - 1) / 2;
swap(base[b], base[c]);
}
}
}
static noinline int wb_flush_one_slowpath(struct btree_trans *trans,
struct btree_iter *iter,
struct btree_write_buffered_key *wb)
{
struct btree_path *path = btree_iter_path(trans, iter);
bch2_btree_node_unlock_write(trans, path, path->l[0].b);
trans->journal_res.seq = wb->journal_seq;
return bch2_trans_update(trans, iter, &wb->k,
BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE) ?:
bch2_trans_commit(trans, NULL, NULL,
BCH_TRANS_COMMIT_no_enospc|
BCH_TRANS_COMMIT_no_check_rw|
BCH_TRANS_COMMIT_no_journal_res|
BCH_TRANS_COMMIT_journal_reclaim);
}
static inline int wb_flush_one(struct btree_trans *trans, struct btree_iter *iter,
struct btree_write_buffered_key *wb,
bool *write_locked, size_t *fast)
{ {
struct bch_fs *c = trans->c; struct bch_fs *c = trans->c;
struct btree_path *path; struct btree_path *path;
int ret; int ret;
EBUG_ON(!wb->journal_seq);
EBUG_ON(!c->btree_write_buffer.flushing.pin.seq);
EBUG_ON(c->btree_write_buffer.flushing.pin.seq > wb->journal_seq);
ret = bch2_btree_iter_traverse(iter); ret = bch2_btree_iter_traverse(iter);
if (ret) if (ret)
return ret; return ret;
path = iter->path; /*
* We can't clone a path that has write locks: unshare it now, before
* set_pos and traverse():
*/
if (btree_iter_path(trans, iter)->ref > 1)
iter->path = __bch2_btree_path_make_mut(trans, iter->path, true, _THIS_IP_);
path = btree_iter_path(trans, iter);
if (!*write_locked) { if (!*write_locked) {
ret = bch2_btree_node_lock_write(trans, path, &path->l[0].b->c); ret = bch2_btree_node_lock_write(trans, path, &path->l[0].b->c);
@ -56,52 +155,14 @@ static int bch2_btree_write_buffer_flush_one(struct btree_trans *trans,
*write_locked = true; *write_locked = true;
} }
if (!bch2_btree_node_insert_fits(c, path->l[0].b, wb->k.k.u64s)) { if (unlikely(!bch2_btree_node_insert_fits(c, path->l[0].b, wb->k.k.u64s))) {
bch2_btree_node_unlock_write(trans, path, path->l[0].b);
*write_locked = false; *write_locked = false;
goto trans_commit; return wb_flush_one_slowpath(trans, iter, wb);
} }
bch2_btree_insert_key_leaf(trans, path, &wb->k, wb->journal_seq); bch2_btree_insert_key_leaf(trans, path, &wb->k, wb->journal_seq);
(*fast)++; (*fast)++;
if (path->ref > 1) {
/*
* We can't clone a path that has write locks: if the path is
* shared, unlock before set_pos(), traverse():
*/
bch2_btree_node_unlock_write(trans, path, path->l[0].b);
*write_locked = false;
}
return 0; return 0;
trans_commit:
return bch2_trans_update_seq(trans, wb->journal_seq, iter, &wb->k,
BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE) ?:
bch2_trans_commit(trans, NULL, NULL,
commit_flags|
BTREE_INSERT_NOCHECK_RW|
BTREE_INSERT_NOFAIL|
BTREE_INSERT_JOURNAL_RECLAIM);
}
static union btree_write_buffer_state btree_write_buffer_switch(struct btree_write_buffer *wb)
{
union btree_write_buffer_state old, new;
u64 v = READ_ONCE(wb->state.v);
do {
old.v = new.v = v;
new.nr = 0;
new.idx++;
} while ((v = atomic64_cmpxchg_acquire(&wb->state.counter, old.v, new.v)) != old.v);
while (old.idx == 0 ? wb->state.ref0 : wb->state.ref1)
cpu_relax();
smp_mb();
return old;
} }
/* /*
@ -124,41 +185,87 @@ btree_write_buffered_insert(struct btree_trans *trans,
bch2_trans_iter_init(trans, &iter, wb->btree, bkey_start_pos(&wb->k.k), bch2_trans_iter_init(trans, &iter, wb->btree, bkey_start_pos(&wb->k.k),
BTREE_ITER_CACHED|BTREE_ITER_INTENT); BTREE_ITER_CACHED|BTREE_ITER_INTENT);
trans->journal_res.seq = wb->journal_seq;
ret = bch2_btree_iter_traverse(&iter) ?: ret = bch2_btree_iter_traverse(&iter) ?:
bch2_trans_update_seq(trans, wb->journal_seq, &iter, &wb->k, bch2_trans_update(trans, &iter, &wb->k,
BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE); BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE);
bch2_trans_iter_exit(trans, &iter); bch2_trans_iter_exit(trans, &iter);
return ret; return ret;
} }
int __bch2_btree_write_buffer_flush(struct btree_trans *trans, unsigned commit_flags, static void move_keys_from_inc_to_flushing(struct btree_write_buffer *wb)
bool locked) {
struct bch_fs *c = container_of(wb, struct bch_fs, btree_write_buffer);
struct journal *j = &c->journal;
if (!wb->inc.keys.nr)
return;
bch2_journal_pin_add(j, wb->inc.keys.data[0].journal_seq, &wb->flushing.pin,
bch2_btree_write_buffer_journal_flush);
darray_resize(&wb->flushing.keys, min_t(size_t, 1U << 20, wb->flushing.keys.nr + wb->inc.keys.nr));
darray_resize(&wb->sorted, wb->flushing.keys.size);
if (!wb->flushing.keys.nr && wb->sorted.size >= wb->inc.keys.nr) {
swap(wb->flushing.keys, wb->inc.keys);
goto out;
}
size_t nr = min(darray_room(wb->flushing.keys),
wb->sorted.size - wb->flushing.keys.nr);
nr = min(nr, wb->inc.keys.nr);
memcpy(&darray_top(wb->flushing.keys),
wb->inc.keys.data,
sizeof(wb->inc.keys.data[0]) * nr);
memmove(wb->inc.keys.data,
wb->inc.keys.data + nr,
sizeof(wb->inc.keys.data[0]) * (wb->inc.keys.nr - nr));
wb->flushing.keys.nr += nr;
wb->inc.keys.nr -= nr;
out:
if (!wb->inc.keys.nr)
bch2_journal_pin_drop(j, &wb->inc.pin);
else
bch2_journal_pin_update(j, wb->inc.keys.data[0].journal_seq, &wb->inc.pin,
bch2_btree_write_buffer_journal_flush);
if (j->watermark) {
spin_lock(&j->lock);
bch2_journal_set_watermark(j);
spin_unlock(&j->lock);
}
BUG_ON(wb->sorted.size < wb->flushing.keys.nr);
}
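In other words: keys arrive in wb->inc (filled from journal buffers under inc.lock), and the flush path drains them into wb->flushing here, growing flushing and the sorted-ref array up to a cap of 1 << 20 entries per pass; when flushing is empty and the sorted array is already large enough, the two darrays are simply swapped instead of copied. The journal pins are then adjusted so that inc.pin and flushing.pin each cover the oldest key still sitting in their respective buffer.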
static int bch2_btree_write_buffer_flush_locked(struct btree_trans *trans)
{ {
struct bch_fs *c = trans->c; struct bch_fs *c = trans->c;
struct journal *j = &c->journal; struct journal *j = &c->journal;
struct btree_write_buffer *wb = &c->btree_write_buffer; struct btree_write_buffer *wb = &c->btree_write_buffer;
struct journal_entry_pin pin;
struct btree_write_buffered_key *i, *keys;
struct btree_iter iter = { NULL }; struct btree_iter iter = { NULL };
size_t nr = 0, skipped = 0, fast = 0, slowpath = 0; size_t skipped = 0, fast = 0, slowpath = 0;
bool write_locked = false; bool write_locked = false;
union btree_write_buffer_state s;
int ret = 0; int ret = 0;
memset(&pin, 0, sizeof(pin)); bch2_trans_unlock(trans);
bch2_trans_begin(trans);
if (!locked && !mutex_trylock(&wb->flush_lock)) mutex_lock(&wb->inc.lock);
return 0; move_keys_from_inc_to_flushing(wb);
mutex_unlock(&wb->inc.lock);
bch2_journal_pin_copy(j, &pin, &wb->journal_pin, NULL); for (size_t i = 0; i < wb->flushing.keys.nr; i++) {
bch2_journal_pin_drop(j, &wb->journal_pin); wb->sorted.data[i].idx = i;
wb->sorted.data[i].btree = wb->flushing.keys.data[i].btree;
s = btree_write_buffer_switch(wb); memcpy(&wb->sorted.data[i].pos, &wb->flushing.keys.data[i].k.k.p, sizeof(struct bpos));
keys = wb->keys[s.idx]; }
nr = s.nr; wb->sorted.nr = wb->flushing.keys.nr;
if (race_fault())
goto slowpath;
/* /*
* We first sort so that we can detect and skip redundant updates, and * We first sort so that we can detect and skip redundant updates, and
@ -168,208 +275,373 @@ int __bch2_btree_write_buffer_flush(struct btree_trans *trans, unsigned commit_f
* However, since we're not flushing in the order they appear in the * However, since we're not flushing in the order they appear in the
* journal we won't be able to drop our journal pin until everything is * journal we won't be able to drop our journal pin until everything is
* flushed - which means this could deadlock the journal if we weren't * flushed - which means this could deadlock the journal if we weren't
* passing BTREE_INSERT_JOURNAL_RECLAIM. This causes the update to fail * passing BCH_TRANS_COMMIT_journal_reclaim. This causes the update to fail
* if it would block taking a journal reservation. * if it would block taking a journal reservation.
* *
* If that happens, simply skip the key so we can optimistically insert * If that happens, simply skip the key so we can optimistically insert
* as many keys as possible in the fast path. * as many keys as possible in the fast path.
*/ */
sort(keys, nr, sizeof(keys[0]), wb_sort(wb->sorted.data, wb->sorted.nr);
btree_write_buffered_key_cmp, NULL);
darray_for_each(wb->sorted, i) {
struct btree_write_buffered_key *k = &wb->flushing.keys.data[i->idx];
for (struct wb_key_ref *n = i + 1; n < min(i + 4, &darray_top(wb->sorted)); n++)
prefetch(&wb->flushing.keys.data[n->idx]);
BUG_ON(!k->journal_seq);
if (i + 1 < &darray_top(wb->sorted) &&
wb_key_eq(i, i + 1)) {
struct btree_write_buffered_key *n = &wb->flushing.keys.data[i[1].idx];
for (i = keys; i < keys + nr; i++) {
if (i + 1 < keys + nr &&
i[0].btree == i[1].btree &&
bpos_eq(i[0].k.k.p, i[1].k.k.p)) {
skipped++; skipped++;
i->journal_seq = 0; n->journal_seq = min_t(u64, n->journal_seq, k->journal_seq);
k->journal_seq = 0;
continue; continue;
} }
if (write_locked && if (write_locked) {
(iter.path->btree_id != i->btree || struct btree_path *path = btree_iter_path(trans, &iter);
bpos_gt(i->k.k.p, iter.path->l[0].b->key.k.p))) {
bch2_btree_node_unlock_write(trans, iter.path, iter.path->l[0].b); if (path->btree_id != i->btree ||
write_locked = false; bpos_gt(k->k.k.p, path->l[0].b->key.k.p)) {
bch2_btree_node_unlock_write(trans, path, path->l[0].b);
write_locked = false;
}
} }
if (!iter.path || iter.path->btree_id != i->btree) { if (!iter.path || iter.btree_id != k->btree) {
bch2_trans_iter_exit(trans, &iter); bch2_trans_iter_exit(trans, &iter);
bch2_trans_iter_init(trans, &iter, i->btree, i->k.k.p, bch2_trans_iter_init(trans, &iter, k->btree, k->k.k.p,
BTREE_ITER_INTENT|BTREE_ITER_ALL_SNAPSHOTS); BTREE_ITER_INTENT|BTREE_ITER_ALL_SNAPSHOTS);
} }
bch2_btree_iter_set_pos(&iter, i->k.k.p); bch2_btree_iter_set_pos(&iter, k->k.k.p);
iter.path->preserve = false; btree_iter_path(trans, &iter)->preserve = false;
do { do {
ret = bch2_btree_write_buffer_flush_one(trans, &iter, i, if (race_fault()) {
commit_flags, &write_locked, &fast); ret = -BCH_ERR_journal_reclaim_would_deadlock;
break;
}
ret = wb_flush_one(trans, &iter, k, &write_locked, &fast);
if (!write_locked) if (!write_locked)
bch2_trans_begin(trans); bch2_trans_begin(trans);
} while (bch2_err_matches(ret, BCH_ERR_transaction_restart)); } while (bch2_err_matches(ret, BCH_ERR_transaction_restart));
if (ret == -BCH_ERR_journal_reclaim_would_deadlock) { if (!ret) {
k->journal_seq = 0;
} else if (ret == -BCH_ERR_journal_reclaim_would_deadlock) {
slowpath++; slowpath++;
continue; ret = 0;
} } else
if (ret)
break; break;
i->journal_seq = 0;
} }
if (write_locked) if (write_locked) {
bch2_btree_node_unlock_write(trans, iter.path, iter.path->l[0].b); struct btree_path *path = btree_iter_path(trans, &iter);
bch2_btree_node_unlock_write(trans, path, path->l[0].b);
}
bch2_trans_iter_exit(trans, &iter); bch2_trans_iter_exit(trans, &iter);
trace_write_buffer_flush(trans, nr, skipped, fast, wb->size); if (ret)
goto err;
if (slowpath) if (slowpath) {
goto slowpath; /*
* Flush in the order they were present in the journal, so that
* we can release journal pins:
* The fastpath zapped the seq of keys that were successfully flushed so
* we can skip those here.
*/
trace_and_count(c, write_buffer_flush_slowpath, trans, slowpath, wb->flushing.keys.nr);
bch2_fs_fatal_err_on(ret, c, "%s: insert error %s", __func__, bch2_err_str(ret)); darray_for_each(wb->flushing.keys, i) {
out: if (!i->journal_seq)
bch2_journal_pin_drop(j, &pin); continue;
mutex_unlock(&wb->flush_lock);
return ret;
slowpath:
trace_write_buffer_flush_slowpath(trans, i - keys, nr);
/* bch2_journal_pin_update(j, i->journal_seq, &wb->flushing.pin,
* Now sort the rest by journal seq and bump the journal pin as we go. bch2_btree_write_buffer_journal_flush);
* The slowpath zapped the seq of keys that were successfully flushed so
* we can skip those here.
*/
sort(keys, nr, sizeof(keys[0]),
btree_write_buffered_journal_cmp,
NULL);
commit_flags &= ~BCH_WATERMARK_MASK; bch2_trans_begin(trans);
commit_flags |= BCH_WATERMARK_reclaim;
for (i = keys; i < keys + nr; i++) { ret = commit_do(trans, NULL, NULL,
if (!i->journal_seq) BCH_WATERMARK_reclaim|
continue; BCH_TRANS_COMMIT_no_check_rw|
BCH_TRANS_COMMIT_no_enospc|
if (i->journal_seq > pin.seq) { BCH_TRANS_COMMIT_no_journal_res|
struct journal_entry_pin pin2; BCH_TRANS_COMMIT_journal_reclaim,
btree_write_buffered_insert(trans, i));
memset(&pin2, 0, sizeof(pin2)); if (ret)
goto err;
bch2_journal_pin_add(j, i->journal_seq, &pin2, NULL);
bch2_journal_pin_drop(j, &pin);
bch2_journal_pin_copy(j, &pin, &pin2, NULL);
bch2_journal_pin_drop(j, &pin2);
} }
}
err:
bch2_fs_fatal_err_on(ret, c, "%s: insert error %s", __func__, bch2_err_str(ret));
trace_write_buffer_flush(trans, wb->flushing.keys.nr, skipped, fast, 0);
bch2_journal_pin_drop(j, &wb->flushing.pin);
wb->flushing.keys.nr = 0;
return ret;
}
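A standalone sketch of the dedup pass inside the flush loop above (hypothetical helper, not part of the patch): with the refs sorted by (btree, pos, idx), duplicate updates to one position sit next to each other, only the newest one is applied, and it inherits the oldest journal_seq so the journal pin still covers every skipped entry.

static void wb_dedup_sorted_refs_example(struct wb_key_ref *refs, size_t nr,
					 struct btree_write_buffered_key *keys)
{
	for (size_t i = 0; i + 1 < nr; i++)
		if (wb_key_eq(&refs[i], &refs[i + 1])) {
			struct btree_write_buffered_key *cur  = &keys[refs[i].idx];
			struct btree_write_buffered_key *next = &keys[refs[i + 1].idx];

			next->journal_seq = min_t(u64, next->journal_seq, cur->journal_seq);
			cur->journal_seq  = 0;	/* journal_seq == 0 means "already handled, skip" */
		}
}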
ret = commit_do(trans, NULL, NULL, static int fetch_wb_keys_from_journal(struct bch_fs *c, u64 seq)
commit_flags| {
BTREE_INSERT_NOFAIL| struct journal *j = &c->journal;
BTREE_INSERT_JOURNAL_RECLAIM, struct journal_buf *buf;
btree_write_buffered_insert(trans, i)); int ret = 0;
if (bch2_fs_fatal_err_on(ret, c, "%s: insert error %s", __func__, bch2_err_str(ret)))
break; while (!ret && (buf = bch2_next_write_buffer_flush_journal_buf(j, seq))) {
ret = bch2_journal_keys_to_write_buffer(c, buf);
mutex_unlock(&j->buf_lock);
} }
goto out; return ret;
} }
int bch2_btree_write_buffer_flush_sync(struct btree_trans *trans) static int btree_write_buffer_flush_seq(struct btree_trans *trans, u64 seq)
{ {
bch2_trans_unlock(trans); struct bch_fs *c = trans->c;
mutex_lock(&trans->c->btree_write_buffer.flush_lock); struct btree_write_buffer *wb = &c->btree_write_buffer;
return __bch2_btree_write_buffer_flush(trans, 0, true); int ret = 0, fetch_from_journal_err;
}
int bch2_btree_write_buffer_flush(struct btree_trans *trans) do {
{ bch2_trans_unlock(trans);
return __bch2_btree_write_buffer_flush(trans, 0, false);
fetch_from_journal_err = fetch_wb_keys_from_journal(c, seq);
/*
* On memory allocation failure, bch2_btree_write_buffer_flush_locked()
* is not guaranteed to empty wb->inc:
*/
mutex_lock(&wb->flushing.lock);
ret = bch2_btree_write_buffer_flush_locked(trans);
mutex_unlock(&wb->flushing.lock);
} while (!ret &&
(fetch_from_journal_err ||
(wb->inc.pin.seq && wb->inc.pin.seq <= seq) ||
(wb->flushing.pin.seq && wb->flushing.pin.seq <= seq)));
return ret;
} }
static int bch2_btree_write_buffer_journal_flush(struct journal *j, static int bch2_btree_write_buffer_journal_flush(struct journal *j,
struct journal_entry_pin *_pin, u64 seq) struct journal_entry_pin *_pin, u64 seq)
{ {
struct bch_fs *c = container_of(j, struct bch_fs, journal); struct bch_fs *c = container_of(j, struct bch_fs, journal);
struct btree_write_buffer *wb = &c->btree_write_buffer;
mutex_lock(&wb->flush_lock); return bch2_trans_run(c, btree_write_buffer_flush_seq(trans, seq));
return bch2_trans_run(c,
__bch2_btree_write_buffer_flush(trans, BTREE_INSERT_NOCHECK_RW, true));
} }
static inline u64 btree_write_buffer_ref(int idx) int bch2_btree_write_buffer_flush_sync(struct btree_trans *trans)
{ {
return ((union btree_write_buffer_state) { struct bch_fs *c = trans->c;
.ref0 = idx == 0,
.ref1 = idx == 1, trace_and_count(c, write_buffer_flush_sync, trans, _RET_IP_);
}).v;
return btree_write_buffer_flush_seq(trans, journal_cur_seq(&c->journal));
} }
int bch2_btree_insert_keys_write_buffer(struct btree_trans *trans) int bch2_btree_write_buffer_flush_nocheck_rw(struct btree_trans *trans)
{ {
struct bch_fs *c = trans->c; struct bch_fs *c = trans->c;
struct btree_write_buffer *wb = &c->btree_write_buffer; struct btree_write_buffer *wb = &c->btree_write_buffer;
struct btree_write_buffered_key *i;
union btree_write_buffer_state old, new;
int ret = 0; int ret = 0;
u64 v;
trans_for_each_wb_update(trans, i) { if (mutex_trylock(&wb->flushing.lock)) {
EBUG_ON(i->k.k.u64s > BTREE_WRITE_BUFERED_U64s_MAX); ret = bch2_btree_write_buffer_flush_locked(trans);
mutex_unlock(&wb->flushing.lock);
i->journal_seq = trans->journal_res.seq;
i->journal_offset = trans->journal_res.offset;
} }
preempt_disable();
v = READ_ONCE(wb->state.v);
do {
old.v = new.v = v;
new.v += btree_write_buffer_ref(new.idx);
new.nr += trans->nr_wb_updates;
if (new.nr > wb->size) {
ret = -BCH_ERR_btree_insert_need_flush_buffer;
goto out;
}
} while ((v = atomic64_cmpxchg_acquire(&wb->state.counter, old.v, new.v)) != old.v);
memcpy(wb->keys[new.idx] + old.nr,
trans->wb_updates,
sizeof(trans->wb_updates[0]) * trans->nr_wb_updates);
bch2_journal_pin_add(&c->journal, trans->journal_res.seq, &wb->journal_pin,
bch2_btree_write_buffer_journal_flush);
atomic64_sub_return_release(btree_write_buffer_ref(new.idx), &wb->state.counter);
out:
preempt_enable();
return ret; return ret;
} }
int bch2_btree_write_buffer_tryflush(struct btree_trans *trans)
{
struct bch_fs *c = trans->c;
if (!bch2_write_ref_tryget(c, BCH_WRITE_REF_btree_write_buffer))
return -BCH_ERR_erofs_no_writes;
int ret = bch2_btree_write_buffer_flush_nocheck_rw(trans);
bch2_write_ref_put(c, BCH_WRITE_REF_btree_write_buffer);
return ret;
}
static void bch2_btree_write_buffer_flush_work(struct work_struct *work)
{
struct bch_fs *c = container_of(work, struct bch_fs, btree_write_buffer.flush_work);
struct btree_write_buffer *wb = &c->btree_write_buffer;
int ret;
mutex_lock(&wb->flushing.lock);
do {
ret = bch2_trans_run(c, bch2_btree_write_buffer_flush_locked(trans));
} while (!ret && bch2_btree_write_buffer_should_flush(c));
mutex_unlock(&wb->flushing.lock);
bch2_write_ref_put(c, BCH_WRITE_REF_btree_write_buffer);
}
int bch2_journal_key_to_wb_slowpath(struct bch_fs *c,
struct journal_keys_to_wb *dst,
enum btree_id btree, struct bkey_i *k)
{
struct btree_write_buffer *wb = &c->btree_write_buffer;
int ret;
retry:
ret = darray_make_room_gfp(&dst->wb->keys, 1, GFP_KERNEL);
if (!ret && dst->wb == &wb->flushing)
ret = darray_resize(&wb->sorted, wb->flushing.keys.size);
if (unlikely(ret)) {
if (dst->wb == &c->btree_write_buffer.flushing) {
mutex_unlock(&dst->wb->lock);
dst->wb = &c->btree_write_buffer.inc;
bch2_journal_pin_add(&c->journal, dst->seq, &dst->wb->pin,
bch2_btree_write_buffer_journal_flush);
goto retry;
}
return ret;
}
dst->room = darray_room(dst->wb->keys);
if (dst->wb == &wb->flushing)
dst->room = min(dst->room, wb->sorted.size - wb->flushing.keys.nr);
BUG_ON(!dst->room);
BUG_ON(!dst->seq);
struct btree_write_buffered_key *wb_k = &darray_top(dst->wb->keys);
wb_k->journal_seq = dst->seq;
wb_k->btree = btree;
bkey_copy(&wb_k->k, k);
dst->wb->keys.nr++;
dst->room--;
return 0;
}
void bch2_journal_keys_to_write_buffer_start(struct bch_fs *c, struct journal_keys_to_wb *dst, u64 seq)
{
struct btree_write_buffer *wb = &c->btree_write_buffer;
if (mutex_trylock(&wb->flushing.lock)) {
mutex_lock(&wb->inc.lock);
move_keys_from_inc_to_flushing(wb);
/*
* Attempt to skip wb->inc, and add keys directly to
* wb->flushing, saving us a copy later:
*/
if (!wb->inc.keys.nr) {
dst->wb = &wb->flushing;
} else {
mutex_unlock(&wb->flushing.lock);
dst->wb = &wb->inc;
}
} else {
mutex_lock(&wb->inc.lock);
dst->wb = &wb->inc;
}
dst->room = darray_room(dst->wb->keys);
if (dst->wb == &wb->flushing)
dst->room = min(dst->room, wb->sorted.size - wb->flushing.keys.nr);
dst->seq = seq;
bch2_journal_pin_add(&c->journal, seq, &dst->wb->pin,
bch2_btree_write_buffer_journal_flush);
}
void bch2_journal_keys_to_write_buffer_end(struct bch_fs *c, struct journal_keys_to_wb *dst)
{
struct btree_write_buffer *wb = &c->btree_write_buffer;
if (!dst->wb->keys.nr)
bch2_journal_pin_drop(&c->journal, &dst->wb->pin);
if (bch2_btree_write_buffer_should_flush(c) &&
__bch2_write_ref_tryget(c, BCH_WRITE_REF_btree_write_buffer) &&
!queue_work(system_unbound_wq, &c->btree_write_buffer.flush_work))
bch2_write_ref_put(c, BCH_WRITE_REF_btree_write_buffer);
if (dst->wb == &wb->flushing)
mutex_unlock(&wb->flushing.lock);
mutex_unlock(&wb->inc.lock);
}
static int bch2_journal_keys_to_write_buffer(struct bch_fs *c, struct journal_buf *buf)
{
struct journal_keys_to_wb dst;
struct jset_entry *entry;
struct bkey_i *k;
int ret = 0;
bch2_journal_keys_to_write_buffer_start(c, &dst, le64_to_cpu(buf->data->seq));
for_each_jset_entry_type(entry, buf->data, BCH_JSET_ENTRY_write_buffer_keys) {
jset_entry_for_each_key(entry, k) {
ret = bch2_journal_key_to_wb(c, &dst, entry->btree_id, k);
if (ret)
goto out;
}
entry->type = BCH_JSET_ENTRY_btree_keys;
}
buf->need_flush_to_write_buffer = false;
out:
bch2_journal_keys_to_write_buffer_end(c, &dst);
return ret;
}
static int wb_keys_resize(struct btree_write_buffer_keys *wb, size_t new_size)
{
if (wb->keys.size >= new_size)
return 0;
if (!mutex_trylock(&wb->lock))
return -EINTR;
int ret = darray_resize(&wb->keys, new_size);
mutex_unlock(&wb->lock);
return ret;
}
int bch2_btree_write_buffer_resize(struct bch_fs *c, size_t new_size)
{
struct btree_write_buffer *wb = &c->btree_write_buffer;
return wb_keys_resize(&wb->flushing, new_size) ?:
wb_keys_resize(&wb->inc, new_size);
}
void bch2_fs_btree_write_buffer_exit(struct bch_fs *c) void bch2_fs_btree_write_buffer_exit(struct bch_fs *c)
{ {
struct btree_write_buffer *wb = &c->btree_write_buffer; struct btree_write_buffer *wb = &c->btree_write_buffer;
BUG_ON(wb->state.nr && !bch2_journal_error(&c->journal)); BUG_ON((wb->inc.keys.nr || wb->flushing.keys.nr) &&
!bch2_journal_error(&c->journal));
kvfree(wb->keys[1]); darray_exit(&wb->sorted);
kvfree(wb->keys[0]); darray_exit(&wb->flushing.keys);
darray_exit(&wb->inc.keys);
} }
int bch2_fs_btree_write_buffer_init(struct bch_fs *c) int bch2_fs_btree_write_buffer_init(struct bch_fs *c)
{ {
struct btree_write_buffer *wb = &c->btree_write_buffer; struct btree_write_buffer *wb = &c->btree_write_buffer;
mutex_init(&wb->flush_lock); mutex_init(&wb->inc.lock);
wb->size = c->opts.btree_write_buffer_size; mutex_init(&wb->flushing.lock);
INIT_WORK(&wb->flush_work, bch2_btree_write_buffer_flush_work);
wb->keys[0] = kvmalloc_array(wb->size, sizeof(*wb->keys[0]), GFP_KERNEL); /* Will be resized by journal as needed: */
wb->keys[1] = kvmalloc_array(wb->size, sizeof(*wb->keys[1]), GFP_KERNEL); unsigned initial_size = 1 << 16;
if (!wb->keys[0] || !wb->keys[1])
return -BCH_ERR_ENOMEM_fs_btree_write_buffer_init;
return 0; return darray_make_room(&wb->inc.keys, initial_size) ?:
darray_make_room(&wb->flushing.keys, initial_size) ?:
darray_make_room(&wb->sorted, initial_size);
} }


@ -2,12 +2,59 @@
#ifndef _BCACHEFS_BTREE_WRITE_BUFFER_H #ifndef _BCACHEFS_BTREE_WRITE_BUFFER_H
#define _BCACHEFS_BTREE_WRITE_BUFFER_H #define _BCACHEFS_BTREE_WRITE_BUFFER_H
int __bch2_btree_write_buffer_flush(struct btree_trans *, unsigned, bool); #include "bkey.h"
static inline bool bch2_btree_write_buffer_should_flush(struct bch_fs *c)
{
struct btree_write_buffer *wb = &c->btree_write_buffer;
return wb->inc.keys.nr + wb->flushing.keys.nr > wb->inc.keys.size / 4;
}
static inline bool bch2_btree_write_buffer_must_wait(struct bch_fs *c)
{
struct btree_write_buffer *wb = &c->btree_write_buffer;
return wb->inc.keys.nr > wb->inc.keys.size * 3 / 4;
}
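For example, with the initial darray size of 1 << 16 keys set up in bch2_fs_btree_write_buffer_init(), bch2_btree_write_buffer_should_flush() starts queueing background flushes once more than 16384 keys are buffered across inc and flushing combined, and bch2_btree_write_buffer_must_wait() starts returning true once inc alone holds more than 49152 keys. Since the darrays are resized by the journal code as needed, both thresholds track the current size rather than a fixed constant.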
struct btree_trans;
int bch2_btree_write_buffer_flush_sync(struct btree_trans *); int bch2_btree_write_buffer_flush_sync(struct btree_trans *);
int bch2_btree_write_buffer_flush(struct btree_trans *); int bch2_btree_write_buffer_flush_nocheck_rw(struct btree_trans *);
int bch2_btree_write_buffer_tryflush(struct btree_trans *);
int bch2_btree_insert_keys_write_buffer(struct btree_trans *); struct journal_keys_to_wb {
struct btree_write_buffer_keys *wb;
size_t room;
u64 seq;
};
int bch2_journal_key_to_wb_slowpath(struct bch_fs *,
struct journal_keys_to_wb *,
enum btree_id, struct bkey_i *);
static inline int bch2_journal_key_to_wb(struct bch_fs *c,
struct journal_keys_to_wb *dst,
enum btree_id btree, struct bkey_i *k)
{
EBUG_ON(!dst->seq);
if (unlikely(!dst->room))
return bch2_journal_key_to_wb_slowpath(c, dst, btree, k);
struct btree_write_buffered_key *wb_k = &darray_top(dst->wb->keys);
wb_k->journal_seq = dst->seq;
wb_k->btree = btree;
bkey_copy(&wb_k->k, k);
dst->wb->keys.nr++;
dst->room--;
return 0;
}
void bch2_journal_keys_to_write_buffer_start(struct bch_fs *, struct journal_keys_to_wb *, u64);
void bch2_journal_keys_to_write_buffer_end(struct bch_fs *, struct journal_keys_to_wb *);
int bch2_btree_write_buffer_resize(struct bch_fs *, size_t);
void bch2_fs_btree_write_buffer_exit(struct bch_fs *); void bch2_fs_btree_write_buffer_exit(struct bch_fs *);
int bch2_fs_btree_write_buffer_init(struct bch_fs *); int bch2_fs_btree_write_buffer_init(struct bch_fs *);
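These declarations are used in a start → per-key copy → end sequence when a journal buffer is drained into the write buffer; a minimal caller sketch, mirroring bch2_journal_keys_to_write_buffer() in btree_write_buffer.c above:

static int journal_buf_to_wb_example(struct bch_fs *c, struct journal_buf *buf)
{
	struct journal_keys_to_wb dst;
	struct jset_entry *entry;
	struct bkey_i *k;
	int ret = 0;

	bch2_journal_keys_to_write_buffer_start(c, &dst, le64_to_cpu(buf->data->seq));

	for_each_jset_entry_type(entry, buf->data, BCH_JSET_ENTRY_write_buffer_keys)
		jset_entry_for_each_key(entry, k) {
			ret = bch2_journal_key_to_wb(c, &dst, entry->btree_id, k);
			if (ret)
				goto out;
		}
out:
	bch2_journal_keys_to_write_buffer_end(c, &dst);
	return ret;
}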


@@ -2,43 +2,56 @@
 #ifndef _BCACHEFS_BTREE_WRITE_BUFFER_TYPES_H
 #define _BCACHEFS_BTREE_WRITE_BUFFER_TYPES_H
 
+#include "darray.h"
 #include "journal_types.h"
 
 #define BTREE_WRITE_BUFERED_VAL_U64s_MAX	4
 #define BTREE_WRITE_BUFERED_U64s_MAX	(BKEY_U64s + BTREE_WRITE_BUFERED_VAL_U64s_MAX)
 
+struct wb_key_ref {
+union {
+	struct {
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+		unsigned			idx:24;
+		u8				pos[sizeof(struct bpos)];
+		enum btree_id			btree:8;
+#else
+		enum btree_id			btree:8;
+		u8				pos[sizeof(struct bpos)];
+		unsigned			idx:24;
+#endif
+	} __packed;
+	struct {
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+		u64 lo;
+		u64 mi;
+		u64 hi;
+#else
+		u64 hi;
+		u64 mi;
+		u64 lo;
+#endif
+	};
+};
+};
+
 struct btree_write_buffered_key {
-	u64			journal_seq;
-	unsigned		journal_offset;
-	enum btree_id		btree;
+	enum btree_id		btree:8;
+	u64			journal_seq:56;
 	__BKEY_PADDED(k, BTREE_WRITE_BUFERED_VAL_U64s_MAX);
 };
 
-union btree_write_buffer_state {
-	struct {
-		atomic64_t	counter;
-	};
-	struct {
-		u64		v;
-	};
-	struct {
-		u64		nr:23;
-		u64		idx:1;
-		u64		ref0:20;
-		u64		ref1:20;
-	};
+struct btree_write_buffer_keys {
+	DARRAY(struct btree_write_buffered_key) keys;
+	struct journal_entry_pin	pin;
+	struct mutex			lock;
 };
 
 struct btree_write_buffer {
-	struct mutex		flush_lock;
-	struct journal_entry_pin journal_pin;
-	union btree_write_buffer_state state;
-	size_t			size;
-	struct btree_write_buffered_key *keys[2];
+	DARRAY(struct wb_key_ref)	sorted;
+	struct btree_write_buffer_keys	inc;
+	struct btree_write_buffer_keys	flushing;
+	struct work_struct		flush_work;
 };
 
 #endif /* _BCACHEFS_BTREE_WRITE_BUFFER_TYPES_H */
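The lo/mi/hi view of struct wb_key_ref only works if the packed bit-field view covers exactly the same 24 bytes — a 3-byte idx, struct bpos, and a 1-byte btree id — which relies on struct bpos keeping its packed 20-byte layout (an assumption here, not stated in this hunk). A hypothetical compile-time check would be:

static_assert(sizeof(struct wb_key_ref) == 3 * sizeof(u64));
static_assert(3 + sizeof(struct bpos) + 1 == 3 * sizeof(u64));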

[diff for one file suppressed: too large to display]


@ -203,6 +203,7 @@ static inline struct bch_dev_usage bch2_dev_usage_read(struct bch_dev *ca)
} }
void bch2_dev_usage_init(struct bch_dev *); void bch2_dev_usage_init(struct bch_dev *);
void bch2_dev_usage_to_text(struct printbuf *, struct bch_dev_usage *);
static inline u64 bch2_dev_buckets_reserved(struct bch_dev *ca, enum bch_watermark watermark) static inline u64 bch2_dev_buckets_reserved(struct bch_dev *ca, enum bch_watermark watermark)
{ {
@ -301,6 +302,12 @@ u64 bch2_fs_sectors_used(struct bch_fs *, struct bch_fs_usage_online *);
struct bch_fs_usage_short struct bch_fs_usage_short
bch2_fs_usage_read_short(struct bch_fs *); bch2_fs_usage_read_short(struct bch_fs *);
void bch2_dev_usage_update(struct bch_fs *, struct bch_dev *,
const struct bch_alloc_v4 *,
const struct bch_alloc_v4 *, u64, bool);
void bch2_dev_usage_update_m(struct bch_fs *, struct bch_dev *,
struct bucket *, struct bucket *);
/* key/bucket marking: */ /* key/bucket marking: */
static inline struct bch_fs_usage *fs_usage_ptr(struct bch_fs *c, static inline struct bch_fs_usage *fs_usage_ptr(struct bch_fs *c,
@ -315,44 +322,40 @@ static inline struct bch_fs_usage *fs_usage_ptr(struct bch_fs *c,
: c->usage[journal_seq & JOURNAL_BUF_MASK]); : c->usage[journal_seq & JOURNAL_BUF_MASK]);
} }
int bch2_update_replicas(struct bch_fs *, struct bkey_s_c,
struct bch_replicas_entry_v1 *, s64,
unsigned, bool);
int bch2_update_replicas_list(struct btree_trans *,
struct bch_replicas_entry_v1 *, s64);
int bch2_update_cached_sectors_list(struct btree_trans *, unsigned, s64);
int bch2_replicas_deltas_realloc(struct btree_trans *, unsigned); int bch2_replicas_deltas_realloc(struct btree_trans *, unsigned);
void bch2_fs_usage_initialize(struct bch_fs *); void bch2_fs_usage_initialize(struct bch_fs *);
int bch2_check_bucket_ref(struct btree_trans *, struct bkey_s_c,
const struct bch_extent_ptr *,
s64, enum bch_data_type, u8, u8, u32);
int bch2_mark_metadata_bucket(struct bch_fs *, struct bch_dev *, int bch2_mark_metadata_bucket(struct bch_fs *, struct bch_dev *,
size_t, enum bch_data_type, unsigned, size_t, enum bch_data_type, unsigned,
struct gc_pos, unsigned); struct gc_pos, unsigned);
int bch2_mark_alloc(struct btree_trans *, enum btree_id, unsigned, int bch2_trigger_extent(struct btree_trans *, enum btree_id, unsigned,
struct bkey_s_c, struct bkey_s_c, unsigned); struct bkey_s_c, struct bkey_s, unsigned);
int bch2_mark_extent(struct btree_trans *, enum btree_id, unsigned, int bch2_trigger_reservation(struct btree_trans *, enum btree_id, unsigned,
struct bkey_s_c, struct bkey_s_c, unsigned); struct bkey_s_c, struct bkey_s, unsigned);
int bch2_mark_stripe(struct btree_trans *, enum btree_id, unsigned,
struct bkey_s_c, struct bkey_s_c, unsigned);
int bch2_mark_reservation(struct btree_trans *, enum btree_id, unsigned,
struct bkey_s_c, struct bkey_s_c, unsigned);
int bch2_mark_reflink_p(struct btree_trans *, enum btree_id, unsigned,
struct bkey_s_c, struct bkey_s_c, unsigned);
int bch2_trans_mark_extent(struct btree_trans *, enum btree_id, unsigned, struct bkey_s_c, struct bkey_i *, unsigned); #define trigger_run_overwrite_then_insert(_fn, _trans, _btree_id, _level, _old, _new, _flags)\
int bch2_trans_mark_stripe(struct btree_trans *, enum btree_id, unsigned, struct bkey_s_c, struct bkey_i *, unsigned);
int bch2_trans_mark_reservation(struct btree_trans *, enum btree_id, unsigned, struct bkey_s_c, struct bkey_i *, unsigned);
int bch2_trans_mark_reflink_p(struct btree_trans *, enum btree_id, unsigned, struct bkey_s_c, struct bkey_i *, unsigned);
#define mem_trigger_run_overwrite_then_insert(_fn, _trans, _btree_id, _level, _old, _new, _flags)\
({ \ ({ \
int ret = 0; \ int ret = 0; \
\ \
if (_old.k->type) \ if (_old.k->type) \
ret = _fn(_trans, _btree_id, _level, _old, _flags & ~BTREE_TRIGGER_INSERT); \ ret = _fn(_trans, _btree_id, _level, _old, _flags & ~BTREE_TRIGGER_INSERT); \
if (!ret && _new.k->type) \ if (!ret && _new.k->type) \
ret = _fn(_trans, _btree_id, _level, _new, _flags & ~BTREE_TRIGGER_OVERWRITE); \ ret = _fn(_trans, _btree_id, _level, _new.s_c, _flags & ~BTREE_TRIGGER_OVERWRITE);\
ret; \ ret; \
}) })
#define trigger_run_overwrite_then_insert(_fn, _trans, _btree_id, _level, _old, _new, _flags) \
mem_trigger_run_overwrite_then_insert(_fn, _trans, _btree_id, _level, _old, bkey_i_to_s_c(_new), _flags)
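At a call site (hypothetical example, not from this hunk), a trigger such as bch2_trigger_extent() is therefore run twice: first for the key being overwritten with BTREE_TRIGGER_INSERT masked off, then for the key being inserted with BTREE_TRIGGER_OVERWRITE masked off:

	ret = mem_trigger_run_overwrite_then_insert(bch2_trigger_extent, trans,
						    btree_id, level, old, new, flags);

where old is a struct bkey_s_c and new a struct bkey_s, matching bch2_trigger_extent()'s prototype above.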
void bch2_trans_fs_usage_revert(struct btree_trans *, struct replicas_delta_list *); void bch2_trans_fs_usage_revert(struct btree_trans *, struct replicas_delta_list *);
int bch2_trans_fs_usage_apply(struct btree_trans *, struct replicas_delta_list *); int bch2_trans_fs_usage_apply(struct btree_trans *, struct replicas_delta_list *);


@ -33,8 +33,6 @@ struct bucket_gens {
}; };
struct bch_dev_usage { struct bch_dev_usage {
u64 buckets_ec;
struct { struct {
u64 buckets; u64 buckets;
u64 sectors; /* _compressed_ sectors: */ u64 sectors; /* _compressed_ sectors: */


@ -7,22 +7,27 @@
#include "chardev.h" #include "chardev.h"
#include "journal.h" #include "journal.h"
#include "move.h" #include "move.h"
#include "recovery.h"
#include "replicas.h" #include "replicas.h"
#include "super.h" #include "super.h"
#include "super-io.h" #include "super-io.h"
#include "thread_with_file.h"
#include <linux/anon_inodes.h>
#include <linux/cdev.h> #include <linux/cdev.h>
#include <linux/device.h> #include <linux/device.h>
#include <linux/file.h>
#include <linux/fs.h> #include <linux/fs.h>
#include <linux/ioctl.h> #include <linux/ioctl.h>
#include <linux/kthread.h>
#include <linux/major.h> #include <linux/major.h>
#include <linux/sched/task.h> #include <linux/sched/task.h>
#include <linux/slab.h> #include <linux/slab.h>
#include <linux/uaccess.h> #include <linux/uaccess.h>
__must_check
static int copy_to_user_errcode(void __user *to, const void *from, unsigned long n)
{
return copy_to_user(to, from, n) ? -EFAULT : 0;
}
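Because it collapses copy_to_user()'s residual byte count into 0 or -EFAULT, the helper chains cleanly with ?:, as bch2_data_job_read() below does when it wants to return the number of bytes copied on success:

	return copy_to_user_errcode(buf, &e, sizeof(e)) ?: sizeof(e);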
/* returns with ref on ca->ref */ /* returns with ref on ca->ref */
static struct bch_dev *bch2_device_lookup(struct bch_fs *c, u64 dev, static struct bch_dev *bch2_device_lookup(struct bch_fs *c, u64 dev,
unsigned flags) unsigned flags)
@ -132,8 +137,106 @@ static long bch2_ioctl_incremental(struct bch_ioctl_incremental __user *user_arg
} }
#endif #endif
struct fsck_thread {
struct thread_with_stdio thr;
struct bch_fs *c;
char **devs;
size_t nr_devs;
struct bch_opts opts;
};
static void bch2_fsck_thread_exit(struct thread_with_stdio *_thr)
{
struct fsck_thread *thr = container_of(_thr, struct fsck_thread, thr);
if (thr->devs)
for (size_t i = 0; i < thr->nr_devs; i++)
kfree(thr->devs[i]);
kfree(thr->devs);
kfree(thr);
}
static int bch2_fsck_offline_thread_fn(void *arg)
{
struct fsck_thread *thr = container_of(arg, struct fsck_thread, thr);
struct bch_fs *c = bch2_fs_open(thr->devs, thr->nr_devs, thr->opts);
thr->thr.thr.ret = PTR_ERR_OR_ZERO(c);
if (!thr->thr.thr.ret)
bch2_fs_stop(c);
thread_with_stdio_done(&thr->thr);
return 0;
}
static long bch2_ioctl_fsck_offline(struct bch_ioctl_fsck_offline __user *user_arg)
{
struct bch_ioctl_fsck_offline arg;
struct fsck_thread *thr = NULL;
u64 *devs = NULL;
long ret = 0;
if (copy_from_user(&arg, user_arg, sizeof(arg)))
return -EFAULT;
if (arg.flags)
return -EINVAL;
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
if (!(devs = kcalloc(arg.nr_devs, sizeof(*devs), GFP_KERNEL)) ||
!(thr = kzalloc(sizeof(*thr), GFP_KERNEL)) ||
!(thr->devs = kcalloc(arg.nr_devs, sizeof(*thr->devs), GFP_KERNEL))) {
ret = -ENOMEM;
goto err;
}
thr->opts = bch2_opts_empty();
thr->nr_devs = arg.nr_devs;
if (copy_from_user(devs, &user_arg->devs[0],
array_size(sizeof(user_arg->devs[0]), arg.nr_devs))) {
ret = -EINVAL;
goto err;
}
for (size_t i = 0; i < arg.nr_devs; i++) {
thr->devs[i] = strndup_user((char __user *)(unsigned long) devs[i], PATH_MAX);
ret = PTR_ERR_OR_ZERO(thr->devs[i]);
if (ret)
goto err;
}
if (arg.opts) {
char *optstr = strndup_user((char __user *)(unsigned long) arg.opts, 1 << 16);
ret = PTR_ERR_OR_ZERO(optstr) ?:
bch2_parse_mount_opts(NULL, &thr->opts, optstr);
kfree(optstr);
if (ret)
goto err;
}
opt_set(thr->opts, stdio, (u64)(unsigned long)&thr->thr.stdio);
ret = bch2_run_thread_with_stdio(&thr->thr,
bch2_fsck_thread_exit,
bch2_fsck_offline_thread_fn);
err:
if (ret < 0) {
if (thr)
bch2_fsck_thread_exit(&thr->thr);
pr_err("ret %s", bch2_err_str(ret));
}
kfree(devs);
return ret;
}
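A userspace sketch of driving BCH_IOCTL_FSCK_OFFLINE (assumptions, since they are not shown in this hunk: the global control node is /dev/bcachefs-ctl, and struct bch_ioctl_fsck_offline carries flags, nr_devs, an opts string pointer and a trailing devs[] array of string pointers, which is how the handler above reads it — bcachefs_ioctl.h is authoritative):

/* Hypothetical userspace caller; returns an fd whose reads stream fsck output. */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
/* plus the bcachefs ioctl ABI header for struct bch_ioctl_fsck_offline / BCH_IOCTL_FSCK_OFFLINE */

static int fsck_offline_example(const char *dev, const char *opts)
{
	int ctl = open("/dev/bcachefs-ctl", O_RDWR);	/* assumed device node */
	if (ctl < 0)
		return -1;

	struct {
		struct bch_ioctl_fsck_offline	arg;
		__u64				devs[1];
	} i = {
		.arg.opts	= (__u64)(unsigned long) opts,
		.arg.nr_devs	= 1,
		.devs		= { (__u64)(unsigned long) dev },
	};

	int fsck_fd = ioctl(ctl, BCH_IOCTL_FSCK_OFFLINE, &i);
	close(ctl);
	return fsck_fd;
}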
static long bch2_global_ioctl(unsigned cmd, void __user *arg) static long bch2_global_ioctl(unsigned cmd, void __user *arg)
{ {
long ret;
switch (cmd) { switch (cmd) {
#if 0 #if 0
case BCH_IOCTL_ASSEMBLE: case BCH_IOCTL_ASSEMBLE:
@ -141,18 +244,25 @@ static long bch2_global_ioctl(unsigned cmd, void __user *arg)
case BCH_IOCTL_INCREMENTAL: case BCH_IOCTL_INCREMENTAL:
return bch2_ioctl_incremental(arg); return bch2_ioctl_incremental(arg);
#endif #endif
default: case BCH_IOCTL_FSCK_OFFLINE: {
return -ENOTTY; ret = bch2_ioctl_fsck_offline(arg);
break;
} }
default:
ret = -ENOTTY;
break;
}
if (ret < 0)
ret = bch2_err_class(ret);
return ret;
} }
static long bch2_ioctl_query_uuid(struct bch_fs *c, static long bch2_ioctl_query_uuid(struct bch_fs *c,
struct bch_ioctl_query_uuid __user *user_arg) struct bch_ioctl_query_uuid __user *user_arg)
{ {
if (copy_to_user(&user_arg->uuid, &c->sb.user_uuid, return copy_to_user_errcode(&user_arg->uuid, &c->sb.user_uuid,
sizeof(c->sb.user_uuid))) sizeof(c->sb.user_uuid));
return -EFAULT;
return 0;
} }
#if 0 #if 0
@ -295,31 +405,27 @@ static long bch2_ioctl_disk_set_state(struct bch_fs *c,
} }
struct bch_data_ctx { struct bch_data_ctx {
struct thread_with_file thr;
struct bch_fs *c; struct bch_fs *c;
struct bch_ioctl_data arg; struct bch_ioctl_data arg;
struct bch_move_stats stats; struct bch_move_stats stats;
int ret;
struct task_struct *thread;
}; };
static int bch2_data_thread(void *arg) static int bch2_data_thread(void *arg)
{ {
struct bch_data_ctx *ctx = arg; struct bch_data_ctx *ctx = container_of(arg, struct bch_data_ctx, thr);
ctx->ret = bch2_data_job(ctx->c, &ctx->stats, ctx->arg);
ctx->thr.ret = bch2_data_job(ctx->c, &ctx->stats, ctx->arg);
ctx->stats.data_type = U8_MAX; ctx->stats.data_type = U8_MAX;
return 0; return 0;
} }
static int bch2_data_job_release(struct inode *inode, struct file *file) static int bch2_data_job_release(struct inode *inode, struct file *file)
{ {
struct bch_data_ctx *ctx = file->private_data; struct bch_data_ctx *ctx = container_of(file->private_data, struct bch_data_ctx, thr);
kthread_stop(ctx->thread); bch2_thread_with_file_exit(&ctx->thr);
put_task_struct(ctx->thread);
kfree(ctx); kfree(ctx);
return 0; return 0;
} }
@ -327,7 +433,7 @@ static int bch2_data_job_release(struct inode *inode, struct file *file)
static ssize_t bch2_data_job_read(struct file *file, char __user *buf, static ssize_t bch2_data_job_read(struct file *file, char __user *buf,
size_t len, loff_t *ppos) size_t len, loff_t *ppos)
{ {
struct bch_data_ctx *ctx = file->private_data; struct bch_data_ctx *ctx = container_of(file->private_data, struct bch_data_ctx, thr);
struct bch_fs *c = ctx->c; struct bch_fs *c = ctx->c;
struct bch_ioctl_data_event e = { struct bch_ioctl_data_event e = {
.type = BCH_DATA_EVENT_PROGRESS, .type = BCH_DATA_EVENT_PROGRESS,
@ -341,10 +447,7 @@ static ssize_t bch2_data_job_read(struct file *file, char __user *buf,
if (len < sizeof(e)) if (len < sizeof(e))
return -EINVAL; return -EINVAL;
if (copy_to_user(buf, &e, sizeof(e))) return copy_to_user_errcode(buf, &e, sizeof(e)) ?: sizeof(e);
return -EFAULT;
return sizeof(e);
} }
static const struct file_operations bcachefs_data_ops = { static const struct file_operations bcachefs_data_ops = {
@ -356,10 +459,8 @@ static const struct file_operations bcachefs_data_ops = {
static long bch2_ioctl_data(struct bch_fs *c, static long bch2_ioctl_data(struct bch_fs *c,
struct bch_ioctl_data arg) struct bch_ioctl_data arg)
{ {
struct bch_data_ctx *ctx = NULL; struct bch_data_ctx *ctx;
struct file *file = NULL; int ret;
unsigned flags = O_RDONLY|O_CLOEXEC|O_NONBLOCK;
int ret, fd = -1;
if (!capable(CAP_SYS_ADMIN)) if (!capable(CAP_SYS_ADMIN))
return -EPERM; return -EPERM;
@ -374,36 +475,11 @@ static long bch2_ioctl_data(struct bch_fs *c,
ctx->c = c; ctx->c = c;
ctx->arg = arg; ctx->arg = arg;
ctx->thread = kthread_create(bch2_data_thread, ctx, ret = bch2_run_thread_with_file(&ctx->thr,
"bch-data/%s", c->name); &bcachefs_data_ops,
if (IS_ERR(ctx->thread)) { bch2_data_thread);
ret = PTR_ERR(ctx->thread);
goto err;
}
ret = get_unused_fd_flags(flags);
if (ret < 0) if (ret < 0)
goto err; kfree(ctx);
fd = ret;
file = anon_inode_getfile("[bcachefs]", &bcachefs_data_ops, ctx, flags);
if (IS_ERR(file)) {
ret = PTR_ERR(file);
goto err;
}
fd_install(fd, file);
get_task_struct(ctx->thread);
wake_up_process(ctx->thread);
return fd;
err:
if (fd >= 0)
put_unused_fd(fd);
if (!IS_ERR_OR_NULL(ctx->thread))
kthread_stop(ctx->thread);
kfree(ctx);
return ret; return ret;
} }
@ -417,7 +493,7 @@ static long bch2_ioctl_fs_usage(struct bch_fs *c,
unsigned i; unsigned i;
int ret = 0; int ret = 0;
if (!test_bit(BCH_FS_STARTED, &c->flags)) if (!test_bit(BCH_FS_started, &c->flags))
return -EINVAL; return -EINVAL;
if (get_user(replica_entries_bytes, &user_arg->replica_entries_bytes)) if (get_user(replica_entries_bytes, &user_arg->replica_entries_bytes))
@ -444,7 +520,7 @@ static long bch2_ioctl_fs_usage(struct bch_fs *c,
dst_end = (void *) arg->replicas + replica_entries_bytes; dst_end = (void *) arg->replicas + replica_entries_bytes;
for (i = 0; i < c->replicas.nr; i++) { for (i = 0; i < c->replicas.nr; i++) {
struct bch_replicas_entry *src_e = struct bch_replicas_entry_v1 *src_e =
cpu_replicas_entry(&c->replicas, i); cpu_replicas_entry(&c->replicas, i);
/* check that we have enough space for one replicas entry */ /* check that we have enough space for one replicas entry */
@ -474,14 +550,15 @@ static long bch2_ioctl_fs_usage(struct bch_fs *c,
if (ret) if (ret)
goto err; goto err;
if (copy_to_user(user_arg, arg,
sizeof(*arg) + arg->replica_entries_bytes)) ret = copy_to_user_errcode(user_arg, arg,
ret = -EFAULT; sizeof(*arg) + arg->replica_entries_bytes);
err: err:
kfree(arg); kfree(arg);
return ret; return ret;
} }
/* obsolete, didn't allow for new data types: */
static long bch2_ioctl_dev_usage(struct bch_fs *c, static long bch2_ioctl_dev_usage(struct bch_fs *c,
struct bch_ioctl_dev_usage __user *user_arg) struct bch_ioctl_dev_usage __user *user_arg)
{ {
@ -490,7 +567,7 @@ static long bch2_ioctl_dev_usage(struct bch_fs *c,
struct bch_dev *ca; struct bch_dev *ca;
unsigned i; unsigned i;
if (!test_bit(BCH_FS_STARTED, &c->flags)) if (!test_bit(BCH_FS_started, &c->flags))
return -EINVAL; return -EINVAL;
if (copy_from_user(&arg, user_arg, sizeof(arg))) if (copy_from_user(&arg, user_arg, sizeof(arg)))
@ -511,7 +588,6 @@ static long bch2_ioctl_dev_usage(struct bch_fs *c,
arg.state = ca->mi.state; arg.state = ca->mi.state;
arg.bucket_size = ca->mi.bucket_size; arg.bucket_size = ca->mi.bucket_size;
arg.nr_buckets = ca->mi.nbuckets - ca->mi.first_bucket; arg.nr_buckets = ca->mi.nbuckets - ca->mi.first_bucket;
arg.buckets_ec = src.buckets_ec;
for (i = 0; i < BCH_DATA_NR; i++) { for (i = 0; i < BCH_DATA_NR; i++) {
arg.d[i].buckets = src.d[i].buckets; arg.d[i].buckets = src.d[i].buckets;
@ -521,10 +597,58 @@ static long bch2_ioctl_dev_usage(struct bch_fs *c,
percpu_ref_put(&ca->ref); percpu_ref_put(&ca->ref);
if (copy_to_user(user_arg, &arg, sizeof(arg))) return copy_to_user_errcode(user_arg, &arg, sizeof(arg));
}
static long bch2_ioctl_dev_usage_v2(struct bch_fs *c,
struct bch_ioctl_dev_usage_v2 __user *user_arg)
{
struct bch_ioctl_dev_usage_v2 arg;
struct bch_dev_usage src;
struct bch_dev *ca;
int ret = 0;
if (!test_bit(BCH_FS_started, &c->flags))
return -EINVAL;
if (copy_from_user(&arg, user_arg, sizeof(arg)))
return -EFAULT; return -EFAULT;
return 0; if ((arg.flags & ~BCH_BY_INDEX) ||
arg.pad[0] ||
arg.pad[1] ||
arg.pad[2])
return -EINVAL;
ca = bch2_device_lookup(c, arg.dev, arg.flags);
if (IS_ERR(ca))
return PTR_ERR(ca);
src = bch2_dev_usage_read(ca);
arg.state = ca->mi.state;
arg.bucket_size = ca->mi.bucket_size;
arg.nr_data_types = min(arg.nr_data_types, BCH_DATA_NR);
arg.nr_buckets = ca->mi.nbuckets - ca->mi.first_bucket;
ret = copy_to_user_errcode(user_arg, &arg, sizeof(arg));
if (ret)
goto err;
for (unsigned i = 0; i < arg.nr_data_types; i++) {
struct bch_ioctl_dev_usage_type t = {
.buckets = src.d[i].buckets,
.sectors = src.d[i].sectors,
.fragmented = src.d[i].fragmented,
};
ret = copy_to_user_errcode(&user_arg->d[i], &t, sizeof(t));
if (ret)
goto err;
}
err:
percpu_ref_put(&ca->ref);
return ret;
} }
static long bch2_ioctl_read_super(struct bch_fs *c, static long bch2_ioctl_read_super(struct bch_fs *c,
@ -561,9 +685,8 @@ static long bch2_ioctl_read_super(struct bch_fs *c,
goto err; goto err;
} }
if (copy_to_user((void __user *)(unsigned long)arg.sb, sb, ret = copy_to_user_errcode((void __user *)(unsigned long)arg.sb, sb,
vstruct_bytes(sb))) vstruct_bytes(sb));
ret = -EFAULT;
err: err:
if (!IS_ERR_OR_NULL(ca)) if (!IS_ERR_OR_NULL(ca))
percpu_ref_put(&ca->ref); percpu_ref_put(&ca->ref);
@ -575,8 +698,6 @@ static long bch2_ioctl_disk_get_idx(struct bch_fs *c,
struct bch_ioctl_disk_get_idx arg) struct bch_ioctl_disk_get_idx arg)
{ {
dev_t dev = huge_decode_dev(arg.dev); dev_t dev = huge_decode_dev(arg.dev);
struct bch_dev *ca;
unsigned i;
if (!capable(CAP_SYS_ADMIN)) if (!capable(CAP_SYS_ADMIN))
return -EPERM; return -EPERM;
@ -584,10 +705,10 @@ static long bch2_ioctl_disk_get_idx(struct bch_fs *c,
if (!dev) if (!dev)
return -EINVAL; return -EINVAL;
for_each_online_member(ca, c, i) for_each_online_member(c, ca)
if (ca->dev == dev) { if (ca->dev == dev) {
percpu_ref_put(&ca->io_ref); percpu_ref_put(&ca->io_ref);
return i; return ca->dev_idx;
} }
return -BCH_ERR_ENOENT_dev_idx_not_found; return -BCH_ERR_ENOENT_dev_idx_not_found;
@ -642,6 +763,97 @@ static long bch2_ioctl_disk_resize_journal(struct bch_fs *c,
return ret; return ret;
} }
static int bch2_fsck_online_thread_fn(void *arg)
{
struct fsck_thread *thr = container_of(arg, struct fsck_thread, thr);
struct bch_fs *c = thr->c;
c->stdio_filter = current;
c->stdio = &thr->thr.stdio;
/*
* XXX: can we figure out a way to do this without mucking with c->opts?
*/
unsigned old_fix_errors = c->opts.fix_errors;
if (opt_defined(thr->opts, fix_errors))
c->opts.fix_errors = thr->opts.fix_errors;
else
c->opts.fix_errors = FSCK_FIX_ask;
c->opts.fsck = true;
set_bit(BCH_FS_fsck_running, &c->flags);
c->curr_recovery_pass = BCH_RECOVERY_PASS_check_alloc_info;
int ret = bch2_run_online_recovery_passes(c);
clear_bit(BCH_FS_fsck_running, &c->flags);
bch_err_fn(c, ret);
c->stdio = NULL;
c->stdio_filter = NULL;
c->opts.fix_errors = old_fix_errors;
thread_with_stdio_done(&thr->thr);
up(&c->online_fsck_mutex);
bch2_ro_ref_put(c);
return 0;
}
static long bch2_ioctl_fsck_online(struct bch_fs *c,
struct bch_ioctl_fsck_online arg)
{
struct fsck_thread *thr = NULL;
long ret = 0;
if (arg.flags)
return -EINVAL;
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
if (!bch2_ro_ref_tryget(c))
return -EROFS;
if (down_trylock(&c->online_fsck_mutex)) {
bch2_ro_ref_put(c);
return -EAGAIN;
}
thr = kzalloc(sizeof(*thr), GFP_KERNEL);
if (!thr) {
ret = -ENOMEM;
goto err;
}
thr->c = c;
thr->opts = bch2_opts_empty();
if (arg.opts) {
char *optstr = strndup_user((char __user *)(unsigned long) arg.opts, 1 << 16);
ret = PTR_ERR_OR_ZERO(optstr) ?:
bch2_parse_mount_opts(c, &thr->opts, optstr);
kfree(optstr);
if (ret)
goto err;
}
ret = bch2_run_thread_with_stdio(&thr->thr,
bch2_fsck_thread_exit,
bch2_fsck_online_thread_fn);
err:
if (ret < 0) {
bch_err_fn(c, ret);
if (thr)
bch2_fsck_thread_exit(&thr->thr);
up(&c->online_fsck_mutex);
bch2_ro_ref_put(c);
}
return ret;
}
#define BCH_IOCTL(_name, _argtype) \ #define BCH_IOCTL(_name, _argtype) \
do { \ do { \
_argtype i; \ _argtype i; \
@ -663,6 +875,8 @@ long bch2_fs_ioctl(struct bch_fs *c, unsigned cmd, void __user *arg)
return bch2_ioctl_fs_usage(c, arg); return bch2_ioctl_fs_usage(c, arg);
case BCH_IOCTL_DEV_USAGE: case BCH_IOCTL_DEV_USAGE:
return bch2_ioctl_dev_usage(c, arg); return bch2_ioctl_dev_usage(c, arg);
case BCH_IOCTL_DEV_USAGE_V2:
return bch2_ioctl_dev_usage_v2(c, arg);
#if 0 #if 0
case BCH_IOCTL_START: case BCH_IOCTL_START:
BCH_IOCTL(start, struct bch_ioctl_start); BCH_IOCTL(start, struct bch_ioctl_start);
@ -675,7 +889,7 @@ long bch2_fs_ioctl(struct bch_fs *c, unsigned cmd, void __user *arg)
BCH_IOCTL(disk_get_idx, struct bch_ioctl_disk_get_idx); BCH_IOCTL(disk_get_idx, struct bch_ioctl_disk_get_idx);
} }
if (!test_bit(BCH_FS_STARTED, &c->flags)) if (!test_bit(BCH_FS_started, &c->flags))
return -EINVAL; return -EINVAL;
switch (cmd) { switch (cmd) {
@ -695,7 +909,8 @@ long bch2_fs_ioctl(struct bch_fs *c, unsigned cmd, void __user *arg)
BCH_IOCTL(disk_resize, struct bch_ioctl_disk_resize); BCH_IOCTL(disk_resize, struct bch_ioctl_disk_resize);
case BCH_IOCTL_DISK_RESIZE_JOURNAL: case BCH_IOCTL_DISK_RESIZE_JOURNAL:
BCH_IOCTL(disk_resize_journal, struct bch_ioctl_disk_resize_journal); BCH_IOCTL(disk_resize_journal, struct bch_ioctl_disk_resize_journal);
case BCH_IOCTL_FSCK_ONLINE:
BCH_IOCTL(fsck_online, struct bch_ioctl_fsck_online);
default: default:
return -ENOTTY; return -ENOTTY;
} }


@ -45,6 +45,29 @@ struct bch_csum bch2_checksum(struct bch_fs *, unsigned, struct nonce,
bch2_checksum(_c, _type, _nonce, _start, vstruct_end(_i) - _start);\ bch2_checksum(_c, _type, _nonce, _start, vstruct_end(_i) - _start);\
}) })
static inline void bch2_csum_to_text(struct printbuf *out,
enum bch_csum_type type,
struct bch_csum csum)
{
const u8 *p = (u8 *) &csum;
unsigned bytes = type < BCH_CSUM_NR ? bch_crc_bytes[type] : 16;
for (unsigned i = 0; i < bytes; i++)
prt_hex_byte(out, p[i]);
}
static inline void bch2_csum_err_msg(struct printbuf *out,
enum bch_csum_type type,
struct bch_csum expected,
struct bch_csum got)
{
prt_printf(out, "checksum error: got ");
bch2_csum_to_text(out, type, got);
prt_str(out, " should be ");
bch2_csum_to_text(out, type, expected);
prt_printf(out, " type %s", bch2_csum_types[type]);
}
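A minimal caller sketch (hypothetical, not from this hunk; crc, csum and c stand in for the caller's context) showing how the helpers combine with a printbuf when a read hits a checksum mismatch:

	struct printbuf buf = PRINTBUF;

	bch2_csum_err_msg(&buf, crc.csum_type, crc.csum, csum);	/* expected, then got */
	bch_err(c, "%s", buf.buf);
	printbuf_exit(&buf);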
int bch2_chacha_encrypt_key(struct bch_key *, struct nonce, void *, size_t); int bch2_chacha_encrypt_key(struct bch_key *, struct nonce, void *, size_t);
int bch2_request_key(struct bch_sb *, struct bch_key *); int bch2_request_key(struct bch_sb *, struct bch_key *);
#ifndef __KERNEL__ #ifndef __KERNEL__


@ -572,10 +572,6 @@ static int __bch2_fs_compress_init(struct bch_fs *c, u64 features)
ZSTD_parameters params = zstd_get_params(zstd_max_clevel(), ZSTD_parameters params = zstd_get_params(zstd_max_clevel(),
c->opts.encoded_extent_max); c->opts.encoded_extent_max);
/*
* ZSTD is lying: if we allocate the size of the workspace it says it
* requires, it returns memory allocation errors
*/
c->zstd_workspace_size = zstd_cctx_workspace_bound(&params.cParams); c->zstd_workspace_size = zstd_cctx_workspace_bound(&params.cParams);
struct { struct {


@ -20,6 +20,7 @@ struct { \
#define DARRAY(_type) DARRAY_PREALLOCATED(_type, 0) #define DARRAY(_type) DARRAY_PREALLOCATED(_type, 0)
typedef DARRAY(char) darray_char; typedef DARRAY(char) darray_char;
typedef DARRAY(char *) darray_str;
int __bch2_darray_resize(darray_char *, size_t, size_t, gfp_t); int __bch2_darray_resize(darray_char *, size_t, size_t, gfp_t);
@ -81,11 +82,14 @@ static inline int __darray_make_room(darray_char *d, size_t t_size, size_t more,
#define darray_remove_item(_d, _pos) \ #define darray_remove_item(_d, _pos) \
array_remove_item((_d)->data, (_d)->nr, (_pos) - (_d)->data) array_remove_item((_d)->data, (_d)->nr, (_pos) - (_d)->data)
#define __darray_for_each(_d, _i) \
for ((_i) = (_d).data; _i < (_d).data + (_d).nr; _i++)
#define darray_for_each(_d, _i) \ #define darray_for_each(_d, _i) \
for (_i = (_d).data; _i < (_d).data + (_d).nr; _i++) for (typeof(&(_d).data[0]) _i = (_d).data; _i < (_d).data + (_d).nr; _i++)
#define darray_for_each_reverse(_d, _i) \ #define darray_for_each_reverse(_d, _i) \
for (_i = (_d).data + (_d).nr - 1; _i >= (_d).data; --_i) for (typeof(&(_d).data[0]) _i = (_d).data + (_d).nr - 1; _i >= (_d).data; --_i)
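Since the macro now declares its own cursor via typeof, callers no longer pass in an iterator variable; an illustrative use with the new darray_str type (variable names are hypothetical):

	darray_str paths = {};

	darray_for_each(paths, i)	/* i is a char **, scoped to the loop */
		pr_debug("%s\n", *i);

	darray_exit(&paths);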
#define darray_init(_d) \ #define darray_init(_d) \
do { \ do { \


@@ -267,6 +267,20 @@ static int __bch2_data_update_index_update(struct btree_trans *trans,
goto out;
}

+if (trace_data_update_enabled()) {
+struct printbuf buf = PRINTBUF;
+
+prt_str(&buf, "\nold: ");
+bch2_bkey_val_to_text(&buf, c, old);
+prt_str(&buf, "\nk: ");
+bch2_bkey_val_to_text(&buf, c, k);
+prt_str(&buf, "\nnew: ");
+bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(insert));
+
+trace_data_update(c, buf.buf);
+printbuf_exit(&buf);
+}
+
ret = bch2_insert_snapshot_whiteouts(trans, m->btree_id,
k.k->p, bkey_start_pos(&insert->k)) ?:
bch2_insert_snapshot_whiteouts(trans, m->btree_id,
@@ -278,8 +292,8 @@ static int __bch2_data_update_index_update(struct btree_trans *trans,
BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE) ?:
bch2_trans_commit(trans, &op->res,
NULL,
-BTREE_INSERT_NOCHECK_RW|
-BTREE_INSERT_NOFAIL|
+BCH_TRANS_COMMIT_no_check_rw|
+BCH_TRANS_COMMIT_no_enospc|
m->data_opts.btree_insert_flags);
if (!ret) {
bch2_btree_iter_set_pos(&iter, next_pos);
@@ -300,14 +314,14 @@ static int __bch2_data_update_index_update(struct btree_trans *trans,
}
continue;
nowork:
-if (m->stats && m->stats) {
+if (m->stats) {
BUG_ON(k.k->p.offset <= iter.pos.offset);
atomic64_inc(&m->stats->keys_raced);
atomic64_add(k.k->p.offset - iter.pos.offset,
&m->stats->sectors_raced);
}

-this_cpu_inc(c->counters[BCH_COUNTER_move_extent_fail]);
+count_event(c, move_extent_fail);

bch2_btree_iter_advance(&iter);
goto next;
@@ -342,7 +356,6 @@ void bch2_data_update_exit(struct data_update *update)
struct bch_fs *c = update->op.c;
struct bkey_ptrs_c ptrs =
bch2_bkey_ptrs_c(bkey_i_to_s_c(update->k.k));
-const struct bch_extent_ptr *ptr;

bkey_for_each_ptr(ptrs, ptr) {
if (c->opts.nocow_enabled)
@@ -363,7 +376,6 @@ static void bch2_update_unwritten_extent(struct btree_trans *trans,
struct bio *bio = &update->op.wbio.bio;
struct bkey_i_extent *e;
struct write_point *wp;
-struct bch_extent_ptr *ptr;
struct closure cl;
struct btree_iter iter;
struct bkey_s_c k;
@@ -404,6 +416,8 @@ static void bch2_update_unwritten_extent(struct btree_trans *trans,
continue;
}

+bch_err_fn_ratelimited(c, ret);
+
if (ret)
return;
@@ -476,7 +490,7 @@ int bch2_extent_drop_ptrs(struct btree_trans *trans,
return bch2_trans_relock(trans) ?:
bch2_trans_update(trans, iter, n, BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE) ?:
-bch2_trans_commit(trans, NULL, NULL, BTREE_INSERT_NOFAIL);
+bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc);
}

int bch2_data_update_init(struct btree_trans *trans,
@@ -493,7 +507,6 @@ int bch2_data_update_init(struct btree_trans *trans,
struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
const union bch_extent_entry *entry;
struct extent_ptr_decoded p;
-const struct bch_extent_ptr *ptr;
unsigned i, reserve_sectors = k.k->size * data_opts.extra_replicas;
unsigned ptrs_locked = 0;
int ret = 0;
@@ -639,7 +652,6 @@ int bch2_data_update_init(struct btree_trans *trans,
void bch2_data_update_opts_normalize(struct bkey_s_c k, struct data_update_opts *opts)
{
struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
-const struct bch_extent_ptr *ptr;
unsigned i = 0;

bkey_for_each_ptr(ptrs, ptr) {


@@ -366,35 +366,23 @@ static ssize_t bch2_read_btree(struct file *file, char __user *buf,
size_t size, loff_t *ppos)
{
struct dump_iter *i = file->private_data;
-struct btree_trans *trans;
-struct btree_iter iter;
-struct bkey_s_c k;
-ssize_t ret;

i->ubuf = buf;
i->size = size;
i->ret = 0;

-ret = flush_buf(i);
-if (ret)
-return ret;
-
-trans = bch2_trans_get(i->c);
-ret = for_each_btree_key2(trans, iter, i->id, i->from,
-BTREE_ITER_PREFETCH|
-BTREE_ITER_ALL_SNAPSHOTS, k, ({
-bch2_bkey_val_to_text(&i->buf, i->c, k);
-prt_newline(&i->buf);
-drop_locks_do(trans, flush_buf(i));
-}));
-i->from = iter.pos;
-
-bch2_trans_put(trans);
-
-if (!ret)
-ret = flush_buf(i);
-
-return ret ?: i->ret;
+return flush_buf(i) ?:
+bch2_trans_run(i->c,
+for_each_btree_key(trans, iter, i->id, i->from,
+BTREE_ITER_PREFETCH|
+BTREE_ITER_ALL_SNAPSHOTS, k, ({
+bch2_bkey_val_to_text(&i->buf, i->c, k);
+prt_newline(&i->buf);
+bch2_trans_unlock(trans);
+i->from = bpos_successor(iter.pos);
+flush_buf(i);
+}))) ?:
+i->ret;
}

static const struct file_operations btree_debug_ops = {
@@ -462,44 +450,32 @@ static ssize_t bch2_read_bfloat_failed(struct file *file, char __user *buf,
size_t size, loff_t *ppos)
{
struct dump_iter *i = file->private_data;
-struct btree_trans *trans;
-struct btree_iter iter;
-struct bkey_s_c k;
-ssize_t ret;

i->ubuf = buf;
i->size = size;
i->ret = 0;

-ret = flush_buf(i);
-if (ret)
-return ret;
-
-trans = bch2_trans_get(i->c);
-
-ret = for_each_btree_key2(trans, iter, i->id, i->from,
-BTREE_ITER_PREFETCH|
-BTREE_ITER_ALL_SNAPSHOTS, k, ({
-struct btree_path_level *l = &iter.path->l[0];
-struct bkey_packed *_k =
-bch2_btree_node_iter_peek(&l->iter, l->b);
-
-if (bpos_gt(l->b->key.k.p, i->prev_node)) {
-bch2_btree_node_to_text(&i->buf, i->c, l->b);
-i->prev_node = l->b->key.k.p;
-}
-
-bch2_bfloat_to_text(&i->buf, l->b, _k);
-drop_locks_do(trans, flush_buf(i));
-}));
-i->from = iter.pos;
-
-bch2_trans_put(trans);
-
-if (!ret)
-ret = flush_buf(i);
-
-return ret ?: i->ret;
+return flush_buf(i) ?:
+bch2_trans_run(i->c,
+for_each_btree_key(trans, iter, i->id, i->from,
+BTREE_ITER_PREFETCH|
+BTREE_ITER_ALL_SNAPSHOTS, k, ({
+struct btree_path_level *l =
+&btree_iter_path(trans, &iter)->l[0];
+struct bkey_packed *_k =
+bch2_btree_node_iter_peek(&l->iter, l->b);
+
+if (bpos_gt(l->b->key.k.p, i->prev_node)) {
+bch2_btree_node_to_text(&i->buf, i->c, l->b);
+i->prev_node = l->b->key.k.p;
+}
+
+bch2_bfloat_to_text(&i->buf, l->b, _k);
+bch2_trans_unlock(trans);
+i->from = bpos_successor(iter.pos);
+flush_buf(i);
+}))) ?:
+i->ret;
}

static const struct file_operations bfloat_failed_debug_ops = {
@@ -616,7 +592,6 @@ static const struct file_operations cached_btree_nodes_ops = {
.read = bch2_cached_btree_nodes_read,
};

-#ifdef CONFIG_BCACHEFS_DEBUG_TRANSACTIONS
static ssize_t bch2_btree_transactions_read(struct file *file, char __user *buf,
size_t size, loff_t *ppos)
{
@@ -632,7 +607,9 @@ static ssize_t bch2_btree_transactions_read(struct file *file, char __user *buf,
restart:
seqmutex_lock(&c->btree_trans_lock);
list_for_each_entry(trans, &c->btree_trans_list, list) {
-if (trans->locking_wait.task->pid <= i->iter)
+struct task_struct *task = READ_ONCE(trans->locking_wait.task);
+
+if (!task || task->pid <= i->iter)
continue;

closure_get(&trans->ref);
@@ -650,11 +627,11 @@ static ssize_t bch2_btree_transactions_read(struct file *file, char __user *buf,
prt_printf(&i->buf, "backtrace:");
prt_newline(&i->buf);
printbuf_indent_add(&i->buf, 2);
-bch2_prt_task_backtrace(&i->buf, trans->locking_wait.task);
+bch2_prt_task_backtrace(&i->buf, task, 0);
printbuf_indent_sub(&i->buf, 2);
prt_newline(&i->buf);

-i->iter = trans->locking_wait.task->pid;
+i->iter = task->pid;

closure_put(&trans->ref);
@@ -678,7 +655,6 @@ static const struct file_operations btree_transactions_ops = {
.release = bch2_dump_release,
.read = bch2_btree_transactions_read,
};
-#endif /* CONFIG_BCACHEFS_DEBUG_TRANSACTIONS */

static ssize_t bch2_journal_pins_read(struct file *file, char __user *buf,
size_t size, loff_t *ppos)
@@ -717,7 +693,7 @@ static const struct file_operations journal_pins_ops = {
.read = bch2_journal_pins_read,
};

-static int lock_held_stats_open(struct inode *inode, struct file *file)
+static int btree_transaction_stats_open(struct inode *inode, struct file *file)
{
struct bch_fs *c = inode->i_private;
struct dump_iter *i;
@@ -727,7 +703,7 @@ static int lock_held_stats_open(struct inode *inode, struct file *file)
if (!i)
return -ENOMEM;

-i->iter = 0;
+i->iter = 1;
i->c = c;
i->buf = PRINTBUF;
file->private_data = i;
@@ -735,7 +711,7 @@ static int lock_held_stats_open(struct inode *inode, struct file *file)
return 0;
}

-static int lock_held_stats_release(struct inode *inode, struct file *file)
+static int btree_transaction_stats_release(struct inode *inode, struct file *file)
{
struct dump_iter *i = file->private_data;
@@ -745,8 +721,8 @@ static int lock_held_stats_release(struct inode *inode, struct file *file)
return 0;
}

-static ssize_t lock_held_stats_read(struct file *file, char __user *buf,
-size_t size, loff_t *ppos)
+static ssize_t btree_transaction_stats_read(struct file *file, char __user *buf,
+size_t size, loff_t *ppos)
{
struct dump_iter *i = file->private_data;
struct bch_fs *c = i->c;
@@ -779,6 +755,13 @@ static ssize_t lock_held_stats_read(struct file *file, char __user *buf,
prt_printf(&i->buf, "Max mem used: %u", s->max_mem);
prt_newline(&i->buf);

+prt_printf(&i->buf, "Transaction duration:");
+prt_newline(&i->buf);
+
+printbuf_indent_add(&i->buf, 2);
+bch2_time_stats_to_text(&i->buf, &s->duration);
+printbuf_indent_sub(&i->buf, 2);
+
if (IS_ENABLED(CONFIG_BCACHEFS_LOCK_TIME_STATS)) {
prt_printf(&i->buf, "Lock hold times:");
prt_newline(&i->buf);
@@ -810,11 +793,11 @@ static ssize_t lock_held_stats_read(struct file *file, char __user *buf,
return i->ret;
}

-static const struct file_operations lock_held_stats_op = {
+static const struct file_operations btree_transaction_stats_op = {
.owner = THIS_MODULE,
-.open = lock_held_stats_open,
-.release = lock_held_stats_release,
-.read = lock_held_stats_read,
+.open = btree_transaction_stats_open,
+.release = btree_transaction_stats_release,
+.read = btree_transaction_stats_read,
};

static ssize_t bch2_btree_deadlock_read(struct file *file, char __user *buf,
@@ -835,7 +818,9 @@ static ssize_t bch2_btree_deadlock_read(struct file *file, char __user *buf,
restart:
seqmutex_lock(&c->btree_trans_lock);
list_for_each_entry(trans, &c->btree_trans_list, list) {
-if (trans->locking_wait.task->pid <= i->iter)
+struct task_struct *task = READ_ONCE(trans->locking_wait.task);
+
+if (!task || task->pid <= i->iter)
continue;

closure_get(&trans->ref);
@@ -850,7 +835,7 @@ static ssize_t bch2_btree_deadlock_read(struct file *file, char __user *buf,
bch2_check_for_deadlock(trans, &i->buf);

-i->iter = trans->locking_wait.task->pid;
+i->iter = task->pid;

closure_put(&trans->ref);
@@ -897,16 +882,14 @@ void bch2_fs_debug_init(struct bch_fs *c)
debugfs_create_file("cached_btree_nodes", 0400, c->fs_debug_dir,
c->btree_debug, &cached_btree_nodes_ops);

-#ifdef CONFIG_BCACHEFS_DEBUG_TRANSACTIONS
debugfs_create_file("btree_transactions", 0400, c->fs_debug_dir,
c->btree_debug, &btree_transactions_ops);
-#endif

debugfs_create_file("journal_pins", 0400, c->fs_debug_dir,
c->btree_debug, &journal_pins_ops);

debugfs_create_file("btree_transaction_stats", 0400, c->fs_debug_dir,
-c, &lock_held_stats_op);
+c, &btree_transaction_stats_op);

debugfs_create_file("btree_deadlock", 0400, c->fs_debug_dir,
c->btree_debug, &btree_deadlock_ops);
@@ -947,8 +930,6 @@ void bch2_debug_exit(void)
int __init bch2_debug_init(void)
{
-int ret = 0;
-
bch_debug = debugfs_create_dir("bcachefs", NULL);
-return ret;
+return 0;
}
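
A side note, purely for illustration (not from the patch): the rewritten debugfs readers lean on the GNU "a ?: b" extension, which evaluates to a when it is nonzero, so a chain of int-returning helpers propagates the first nonzero error code. A standalone sketch with invented helper names:

	static int step_one(void)  { return 0; }	/* succeeds */
	static int step_two(void)  { return -EIO; }	/* fails    */

	static int run_both(void)
	{
		/* returns step_one()'s result if nonzero, otherwise step_two()'s */
		return step_one() ?: step_two();
	}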


@@ -65,7 +65,7 @@ static bool dirent_cmp_key(struct bkey_s_c _l, const void *_r)
const struct qstr l_name = bch2_dirent_get_name(l);
const struct qstr *r_name = _r;

-return l_name.len - r_name->len ?: memcmp(l_name.name, r_name->name, l_name.len);
+return !qstr_eq(l_name, *r_name);
}

static bool dirent_cmp_bkey(struct bkey_s_c _l, struct bkey_s_c _r)
@@ -75,7 +75,7 @@ static bool dirent_cmp_bkey(struct bkey_s_c _l, struct bkey_s_c _r)
const struct qstr l_name = bch2_dirent_get_name(l);
const struct qstr r_name = bch2_dirent_get_name(r);

-return l_name.len - r_name.len ?: memcmp(l_name.name, r_name.name, l_name.len);
+return !qstr_eq(l_name, r_name);
}

static bool dirent_is_visible(subvol_inum inum, struct bkey_s_c k)
@@ -198,10 +198,39 @@ static struct bkey_i_dirent *dirent_create_key(struct btree_trans *trans,
return dirent;
}
int bch2_dirent_create_snapshot(struct btree_trans *trans,
u64 dir, u32 snapshot,
const struct bch_hash_info *hash_info,
u8 type, const struct qstr *name, u64 dst_inum,
u64 *dir_offset,
bch_str_hash_flags_t str_hash_flags)
{
subvol_inum zero_inum = { 0 };
struct bkey_i_dirent *dirent;
int ret;
dirent = dirent_create_key(trans, zero_inum, type, name, dst_inum);
ret = PTR_ERR_OR_ZERO(dirent);
if (ret)
return ret;
dirent->k.p.inode = dir;
dirent->k.p.snapshot = snapshot;
ret = bch2_hash_set_snapshot(trans, bch2_dirent_hash_desc, hash_info,
zero_inum, snapshot,
&dirent->k_i, str_hash_flags,
BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE);
*dir_offset = dirent->k.p.offset;
return ret;
}
int bch2_dirent_create(struct btree_trans *trans, subvol_inum dir,
const struct bch_hash_info *hash_info,
u8 type, const struct qstr *name, u64 dst_inum,
-u64 *dir_offset, int flags)
+u64 *dir_offset,
+bch_str_hash_flags_t str_hash_flags)
{
struct bkey_i_dirent *dirent;
int ret;
@@ -212,7 +241,7 @@ int bch2_dirent_create(struct btree_trans *trans, subvol_inum dir,
return ret;

ret = bch2_hash_set(trans, bch2_dirent_hash_desc, hash_info,
-dir, &dirent->k_i, flags);
+dir, &dirent->k_i, str_hash_flags);
*dir_offset = dirent->k.p.offset;

return ret;
@@ -470,17 +499,11 @@ u64 bch2_dirent_lookup(struct bch_fs *c, subvol_inum dir,
const struct qstr *name, subvol_inum *inum)
{
struct btree_trans *trans = bch2_trans_get(c);
-struct btree_iter iter;
-int ret;
-retry:
-bch2_trans_begin(trans);
-
-ret = __bch2_dirent_lookup_trans(trans, &iter, dir, hash_info,
-name, inum, 0);
-if (bch2_err_matches(ret, BCH_ERR_transaction_restart))
-goto retry;
-if (!ret)
-bch2_trans_iter_exit(trans, &iter);
+struct btree_iter iter = { NULL };
+
+int ret = lockrestart_do(trans,
+__bch2_dirent_lookup_trans(trans, &iter, dir, hash_info, name, inum, 0));
+bch2_trans_iter_exit(trans, &iter);
bch2_trans_put(trans);
return ret;
}


@@ -35,9 +35,14 @@ static inline unsigned dirent_val_u64s(unsigned len)
int bch2_dirent_read_target(struct btree_trans *, subvol_inum,
struct bkey_s_c_dirent, subvol_inum *);

+int bch2_dirent_create_snapshot(struct btree_trans *, u64, u32,
+const struct bch_hash_info *, u8,
+const struct qstr *, u64, u64 *,
+bch_str_hash_flags_t);
int bch2_dirent_create(struct btree_trans *, subvol_inum,
const struct bch_hash_info *, u8,
-const struct qstr *, u64, u64 *, int);
+const struct qstr *, u64, u64 *,
+bch_str_hash_flags_t);

static inline unsigned vfs_d_type(unsigned type)
{


@@ -89,19 +89,14 @@ static int bch2_sb_disk_groups_validate(struct bch_sb *sb,

void bch2_disk_groups_to_text(struct printbuf *out, struct bch_fs *c)
{
-struct bch_disk_groups_cpu *g;
-struct bch_dev *ca;
-int i;
-unsigned iter;
-
out->atomic++;
rcu_read_lock();

-g = rcu_dereference(c->disk_groups);
+struct bch_disk_groups_cpu *g = rcu_dereference(c->disk_groups);
if (!g)
goto out;

-for (i = 0; i < g->nr; i++) {
+for (unsigned i = 0; i < g->nr; i++) {
if (i)
prt_printf(out, " ");
@@ -111,7 +106,7 @@ void bch2_disk_groups_to_text(struct printbuf *out, struct bch_fs *c)
}

prt_printf(out, "[parent %d devs", g->entries[i].parent);
-for_each_member_device_rcu(ca, c, iter, &g->entries[i].devs)
+for_each_member_device_rcu(c, ca, &g->entries[i].devs)
prt_printf(out, " %s", ca->name);
prt_printf(out, "]");
}
@@ -562,7 +557,7 @@ void bch2_target_to_text(struct printbuf *out, struct bch_fs *c, unsigned v)
: NULL;

if (ca && percpu_ref_tryget(&ca->io_ref)) {
-prt_printf(out, "/dev/%pg", ca->disk_sb.bdev);
+prt_printf(out, "/dev/%s", ca->name);
percpu_ref_put(&ca->io_ref);
} else if (ca) {
prt_printf(out, "offline device %u", t.dev);


@@ -3,6 +3,7 @@
/* erasure coding */

#include "bcachefs.h"
+#include "alloc_background.h"
#include "alloc_foreground.h"
#include "backpointers.h"
#include "bkey_buf.h"
@@ -156,12 +157,311 @@ void bch2_stripe_to_text(struct printbuf *out, struct bch_fs *c,
}
}
/* Triggers: */
static int bch2_trans_mark_stripe_bucket(struct btree_trans *trans,
struct bkey_s_c_stripe s,
unsigned idx, bool deleting)
{
struct bch_fs *c = trans->c;
const struct bch_extent_ptr *ptr = &s.v->ptrs[idx];
struct btree_iter iter;
struct bkey_i_alloc_v4 *a;
enum bch_data_type data_type = idx >= s.v->nr_blocks - s.v->nr_redundant
? BCH_DATA_parity : 0;
s64 sectors = data_type ? le16_to_cpu(s.v->sectors) : 0;
int ret = 0;
if (deleting)
sectors = -sectors;
a = bch2_trans_start_alloc_update(trans, &iter, PTR_BUCKET_POS(c, ptr));
if (IS_ERR(a))
return PTR_ERR(a);
ret = bch2_check_bucket_ref(trans, s.s_c, ptr, sectors, data_type,
a->v.gen, a->v.data_type,
a->v.dirty_sectors);
if (ret)
goto err;
if (!deleting) {
if (bch2_trans_inconsistent_on(a->v.stripe ||
a->v.stripe_redundancy, trans,
"bucket %llu:%llu gen %u data type %s dirty_sectors %u: multiple stripes using same bucket (%u, %llu)",
iter.pos.inode, iter.pos.offset, a->v.gen,
bch2_data_types[a->v.data_type],
a->v.dirty_sectors,
a->v.stripe, s.k->p.offset)) {
ret = -EIO;
goto err;
}
if (bch2_trans_inconsistent_on(data_type && a->v.dirty_sectors, trans,
"bucket %llu:%llu gen %u data type %s dirty_sectors %u: data already in stripe bucket %llu",
iter.pos.inode, iter.pos.offset, a->v.gen,
bch2_data_types[a->v.data_type],
a->v.dirty_sectors,
s.k->p.offset)) {
ret = -EIO;
goto err;
}
a->v.stripe = s.k->p.offset;
a->v.stripe_redundancy = s.v->nr_redundant;
a->v.data_type = BCH_DATA_stripe;
} else {
if (bch2_trans_inconsistent_on(a->v.stripe != s.k->p.offset ||
a->v.stripe_redundancy != s.v->nr_redundant, trans,
"bucket %llu:%llu gen %u: not marked as stripe when deleting stripe %llu (got %u)",
iter.pos.inode, iter.pos.offset, a->v.gen,
s.k->p.offset, a->v.stripe)) {
ret = -EIO;
goto err;
}
a->v.stripe = 0;
a->v.stripe_redundancy = 0;
a->v.data_type = alloc_data_type(a->v, BCH_DATA_user);
}
a->v.dirty_sectors += sectors;
if (data_type)
a->v.data_type = !deleting ? data_type : 0;
ret = bch2_trans_update(trans, &iter, &a->k_i, 0);
if (ret)
goto err;
err:
bch2_trans_iter_exit(trans, &iter);
return ret;
}
static int mark_stripe_bucket(struct btree_trans *trans,
struct bkey_s_c k,
unsigned ptr_idx,
unsigned flags)
{
struct bch_fs *c = trans->c;
const struct bch_stripe *s = bkey_s_c_to_stripe(k).v;
unsigned nr_data = s->nr_blocks - s->nr_redundant;
bool parity = ptr_idx >= nr_data;
enum bch_data_type data_type = parity ? BCH_DATA_parity : BCH_DATA_stripe;
s64 sectors = parity ? le16_to_cpu(s->sectors) : 0;
const struct bch_extent_ptr *ptr = s->ptrs + ptr_idx;
struct bch_dev *ca = bch_dev_bkey_exists(c, ptr->dev);
struct bucket old, new, *g;
struct printbuf buf = PRINTBUF;
int ret = 0;
BUG_ON(!(flags & BTREE_TRIGGER_GC));
/* XXX doesn't handle deletion */
percpu_down_read(&c->mark_lock);
g = PTR_GC_BUCKET(ca, ptr);
if (g->dirty_sectors ||
(g->stripe && g->stripe != k.k->p.offset)) {
bch2_fs_inconsistent(c,
"bucket %u:%zu gen %u: multiple stripes using same bucket\n%s",
ptr->dev, PTR_BUCKET_NR(ca, ptr), g->gen,
(bch2_bkey_val_to_text(&buf, c, k), buf.buf));
ret = -EINVAL;
goto err;
}
bucket_lock(g);
old = *g;
ret = bch2_check_bucket_ref(trans, k, ptr, sectors, data_type,
g->gen, g->data_type,
g->dirty_sectors);
if (ret)
goto err;
g->data_type = data_type;
g->dirty_sectors += sectors;
g->stripe = k.k->p.offset;
g->stripe_redundancy = s->nr_redundant;
new = *g;
err:
bucket_unlock(g);
if (!ret)
bch2_dev_usage_update_m(c, ca, &old, &new);
percpu_up_read(&c->mark_lock);
printbuf_exit(&buf);
return ret;
}
int bch2_trigger_stripe(struct btree_trans *trans,
enum btree_id btree_id, unsigned level,
struct bkey_s_c old, struct bkey_s _new,
unsigned flags)
{
struct bkey_s_c new = _new.s_c;
struct bch_fs *c = trans->c;
u64 idx = new.k->p.offset;
const struct bch_stripe *old_s = old.k->type == KEY_TYPE_stripe
? bkey_s_c_to_stripe(old).v : NULL;
const struct bch_stripe *new_s = new.k->type == KEY_TYPE_stripe
? bkey_s_c_to_stripe(new).v : NULL;
if (flags & BTREE_TRIGGER_TRANSACTIONAL) {
/*
* If the pointers aren't changing, we don't need to do anything:
*/
if (new_s && old_s &&
new_s->nr_blocks == old_s->nr_blocks &&
new_s->nr_redundant == old_s->nr_redundant &&
!memcmp(old_s->ptrs, new_s->ptrs,
new_s->nr_blocks * sizeof(struct bch_extent_ptr)))
return 0;
BUG_ON(new_s && old_s &&
(new_s->nr_blocks != old_s->nr_blocks ||
new_s->nr_redundant != old_s->nr_redundant));
if (new_s) {
s64 sectors = le16_to_cpu(new_s->sectors);
struct bch_replicas_padded r;
bch2_bkey_to_replicas(&r.e, new);
int ret = bch2_update_replicas_list(trans, &r.e, sectors * new_s->nr_redundant);
if (ret)
return ret;
}
if (old_s) {
s64 sectors = -((s64) le16_to_cpu(old_s->sectors));
struct bch_replicas_padded r;
bch2_bkey_to_replicas(&r.e, old);
int ret = bch2_update_replicas_list(trans, &r.e, sectors * old_s->nr_redundant);
if (ret)
return ret;
}
unsigned nr_blocks = new_s ? new_s->nr_blocks : old_s->nr_blocks;
for (unsigned i = 0; i < nr_blocks; i++) {
if (new_s && old_s &&
!memcmp(&new_s->ptrs[i],
&old_s->ptrs[i],
sizeof(new_s->ptrs[i])))
continue;
if (new_s) {
int ret = bch2_trans_mark_stripe_bucket(trans,
bkey_s_c_to_stripe(new), i, false);
if (ret)
return ret;
}
if (old_s) {
int ret = bch2_trans_mark_stripe_bucket(trans,
bkey_s_c_to_stripe(old), i, true);
if (ret)
return ret;
}
}
}
if (!(flags & (BTREE_TRIGGER_TRANSACTIONAL|BTREE_TRIGGER_GC))) {
struct stripe *m = genradix_ptr(&c->stripes, idx);
if (!m) {
struct printbuf buf1 = PRINTBUF;
struct printbuf buf2 = PRINTBUF;
bch2_bkey_val_to_text(&buf1, c, old);
bch2_bkey_val_to_text(&buf2, c, new);
bch_err_ratelimited(c, "error marking nonexistent stripe %llu while marking\n"
"old %s\n"
"new %s", idx, buf1.buf, buf2.buf);
printbuf_exit(&buf2);
printbuf_exit(&buf1);
bch2_inconsistent_error(c);
return -1;
}
if (!new_s) {
bch2_stripes_heap_del(c, m, idx);
memset(m, 0, sizeof(*m));
} else {
m->sectors = le16_to_cpu(new_s->sectors);
m->algorithm = new_s->algorithm;
m->nr_blocks = new_s->nr_blocks;
m->nr_redundant = new_s->nr_redundant;
m->blocks_nonempty = 0;
for (unsigned i = 0; i < new_s->nr_blocks; i++)
m->blocks_nonempty += !!stripe_blockcount_get(new_s, i);
if (!old_s)
bch2_stripes_heap_insert(c, m, idx);
else
bch2_stripes_heap_update(c, m, idx);
}
}
if (flags & BTREE_TRIGGER_GC) {
struct gc_stripe *m =
genradix_ptr_alloc(&c->gc_stripes, idx, GFP_KERNEL);
if (!m) {
bch_err(c, "error allocating memory for gc_stripes, idx %llu",
idx);
return -BCH_ERR_ENOMEM_mark_stripe;
}
/*
* This will be wrong when we bring back runtime gc: we should
* be unmarking the old key and then marking the new key
*/
m->alive = true;
m->sectors = le16_to_cpu(new_s->sectors);
m->nr_blocks = new_s->nr_blocks;
m->nr_redundant = new_s->nr_redundant;
for (unsigned i = 0; i < new_s->nr_blocks; i++)
m->ptrs[i] = new_s->ptrs[i];
bch2_bkey_to_replicas(&m->r.e, new);
/*
* gc recalculates this field from stripe ptr
* references:
*/
memset(m->block_sectors, 0, sizeof(m->block_sectors));
for (unsigned i = 0; i < new_s->nr_blocks; i++) {
int ret = mark_stripe_bucket(trans, new, i, flags);
if (ret)
return ret;
}
int ret = bch2_update_replicas(c, new, &m->r.e,
((s64) m->sectors * m->nr_redundant),
0, true);
if (ret) {
struct printbuf buf = PRINTBUF;
bch2_bkey_val_to_text(&buf, c, new);
bch2_fs_fatal_error(c, "no replicas entry for %s", buf.buf);
printbuf_exit(&buf);
return ret;
}
}
return 0;
}
/* returns blocknr in stripe that we matched: */
static const struct bch_extent_ptr *bkey_matches_stripe(struct bch_stripe *s,
struct bkey_s_c k, unsigned *block)
{
struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
-const struct bch_extent_ptr *ptr;
unsigned i, nr_data = s->nr_blocks - s->nr_redundant;

bkey_for_each_ptr(ptrs, ptr)
@@ -791,28 +1091,22 @@ static void ec_stripe_delete_work(struct work_struct *work)
{
struct bch_fs *c =
container_of(work, struct bch_fs, ec_stripe_delete_work);
-struct btree_trans *trans = bch2_trans_get(c);
-int ret;
-u64 idx;

while (1) {
mutex_lock(&c->ec_stripes_heap_lock);
-idx = stripe_idx_to_delete(c);
+u64 idx = stripe_idx_to_delete(c);
mutex_unlock(&c->ec_stripes_heap_lock);

if (!idx)
break;

-ret = commit_do(trans, NULL, NULL, BTREE_INSERT_NOFAIL,
-ec_stripe_delete(trans, idx));
-if (ret) {
-bch_err_fn(c, ret);
-break;
-}
+int ret = bch2_trans_do(c, NULL, NULL, BCH_TRANS_COMMIT_no_enospc,
+ec_stripe_delete(trans, idx));
+bch_err_fn(c, ret);
+if (ret)
+break;
}

-bch2_trans_put(trans);
-
bch2_write_ref_put(c, BCH_WRITE_REF_stripe_delete);
}

@@ -983,8 +1277,8 @@ static int ec_stripe_update_bucket(struct btree_trans *trans, struct ec_stripe_b
while (1) {
ret = commit_do(trans, NULL, NULL,
-BTREE_INSERT_NOCHECK_RW|
-BTREE_INSERT_NOFAIL,
+BCH_TRANS_COMMIT_no_check_rw|
+BCH_TRANS_COMMIT_no_enospc,
ec_stripe_update_extent(trans, bucket_pos, bucket.gen,
s, &bp_pos));
if (ret)
@@ -1005,7 +1299,7 @@ static int ec_stripe_update_extents(struct bch_fs *c, struct ec_stripe_buf *s)
unsigned i, nr_data = v->nr_blocks - v->nr_redundant;
int ret = 0;

-ret = bch2_btree_write_buffer_flush(trans);
+ret = bch2_btree_write_buffer_flush_sync(trans);
if (ret)
goto err;

@@ -1121,21 +1415,20 @@ static void ec_stripe_create(struct ec_stripe_new *s)
}

ret = bch2_trans_do(c, &s->res, NULL,
-BTREE_INSERT_NOCHECK_RW|
-BTREE_INSERT_NOFAIL,
+BCH_TRANS_COMMIT_no_check_rw|
+BCH_TRANS_COMMIT_no_enospc,
ec_stripe_key_update(trans,
bkey_i_to_stripe(&s->new_stripe.key),
!s->have_existing_stripe));
+bch_err_msg(c, ret, "creating stripe key");
if (ret) {
-bch_err(c, "error creating stripe: error creating stripe key");
goto err;
}

ret = ec_stripe_update_extents(c, &s->new_stripe);
-if (ret) {
-bch_err_msg(c, ret, "creating stripe: error updating pointers");
+bch_err_msg(c, ret, "error updating extents");
+if (ret)
goto err;
-}
err:
bch2_disk_reservation_put(c, &s->res);
@@ -1250,18 +1543,17 @@ static int unsigned_cmp(const void *_l, const void *_r)
static unsigned pick_blocksize(struct bch_fs *c,
struct bch_devs_mask *devs)
{
-struct bch_dev *ca;
-unsigned i, nr = 0, sizes[BCH_SB_MEMBERS_MAX];
+unsigned nr = 0, sizes[BCH_SB_MEMBERS_MAX];
struct {
unsigned nr, size;
} cur = { 0, 0 }, best = { 0, 0 };

-for_each_member_device_rcu(ca, c, i, devs)
+for_each_member_device_rcu(c, ca, devs)
sizes[nr++] = ca->mi.bucket_size;

sort(sizes, nr, sizeof(unsigned), unsigned_cmp, NULL);

-for (i = 0; i < nr; i++) {
+for (unsigned i = 0; i < nr; i++) {
if (sizes[i] != cur.size) {
if (cur.nr > best.nr)
best = cur;
@@ -1344,8 +1636,6 @@ ec_new_stripe_head_alloc(struct bch_fs *c, unsigned target,
enum bch_watermark watermark)
{
struct ec_stripe_head *h;
-struct bch_dev *ca;
-unsigned i;

h = kzalloc(sizeof(*h), GFP_KERNEL);
if (!h)
@@ -1362,13 +1652,13 @@ ec_new_stripe_head_alloc(struct bch_fs *c, unsigned target,
rcu_read_lock();
h->devs = target_rw_devs(c, BCH_DATA_user, target);

-for_each_member_device_rcu(ca, c, i, &h->devs)
+for_each_member_device_rcu(c, ca, &h->devs)
if (!ca->mi.durability)
-__clear_bit(i, h->devs.d);
+__clear_bit(ca->dev_idx, h->devs.d);

h->blocksize = pick_blocksize(c, &h->devs);

-for_each_member_device_rcu(ca, c, i, &h->devs)
+for_each_member_device_rcu(c, ca, &h->devs)
if (ca->mi.bucket_size == h->blocksize)
h->nr_active_devs++;

@@ -1415,7 +1705,7 @@ __bch2_ec_stripe_head_get(struct btree_trans *trans,
if (ret)
return ERR_PTR(ret);

-if (test_bit(BCH_FS_GOING_RO, &c->flags)) {
+if (test_bit(BCH_FS_going_ro, &c->flags)) {
h = ERR_PTR(-BCH_ERR_erofs_no_writes);
goto found;
}
@@ -1833,44 +2123,32 @@ void bch2_fs_ec_flush(struct bch_fs *c)

int bch2_stripes_read(struct bch_fs *c)
{
-struct btree_trans *trans = bch2_trans_get(c);
-struct btree_iter iter;
-struct bkey_s_c k;
-const struct bch_stripe *s;
-struct stripe *m;
-unsigned i;
-int ret;
-
-for_each_btree_key(trans, iter, BTREE_ID_stripes, POS_MIN,
-BTREE_ITER_PREFETCH, k, ret) {
-if (k.k->type != KEY_TYPE_stripe)
-continue;
-
-ret = __ec_stripe_mem_alloc(c, k.k->p.offset, GFP_KERNEL);
-if (ret)
-break;
-
-s = bkey_s_c_to_stripe(k).v;
-
-m = genradix_ptr(&c->stripes, k.k->p.offset);
-m->sectors = le16_to_cpu(s->sectors);
-m->algorithm = s->algorithm;
-m->nr_blocks = s->nr_blocks;
-m->nr_redundant = s->nr_redundant;
-m->blocks_nonempty = 0;
-
-for (i = 0; i < s->nr_blocks; i++)
-m->blocks_nonempty += !!stripe_blockcount_get(s, i);
-
-bch2_stripes_heap_insert(c, m, k.k->p.offset);
-}
-bch2_trans_iter_exit(trans, &iter);
-bch2_trans_put(trans);
-
-if (ret)
-bch_err_fn(c, ret);
-
+int ret = bch2_trans_run(c,
+for_each_btree_key(trans, iter, BTREE_ID_stripes, POS_MIN,
+BTREE_ITER_PREFETCH, k, ({
+if (k.k->type != KEY_TYPE_stripe)
+continue;
+
+ret = __ec_stripe_mem_alloc(c, k.k->p.offset, GFP_KERNEL);
+if (ret)
+break;
+
+const struct bch_stripe *s = bkey_s_c_to_stripe(k).v;
+
+struct stripe *m = genradix_ptr(&c->stripes, k.k->p.offset);
+m->sectors = le16_to_cpu(s->sectors);
+m->algorithm = s->algorithm;
+m->nr_blocks = s->nr_blocks;
+m->nr_redundant = s->nr_redundant;
+m->blocks_nonempty = 0;
+
+for (unsigned i = 0; i < s->nr_blocks; i++)
+m->blocks_nonempty += !!stripe_blockcount_get(s, i);
+
+bch2_stripes_heap_insert(c, m, k.k->p.offset);
+0;
+})));
+bch_err_fn(c, ret);
return ret;
}


@@ -12,13 +12,14 @@ int bch2_stripe_invalid(struct bch_fs *, struct bkey_s_c,
enum bkey_invalid_flags, struct printbuf *);
void bch2_stripe_to_text(struct printbuf *, struct bch_fs *,
struct bkey_s_c);
+int bch2_trigger_stripe(struct btree_trans *, enum btree_id, unsigned,
+struct bkey_s_c, struct bkey_s, unsigned);

#define bch2_bkey_ops_stripe ((struct bkey_ops) { \
.key_invalid = bch2_stripe_invalid, \
.val_to_text = bch2_stripe_to_text, \
.swab = bch2_ptr_swab, \
-.trans_trigger = bch2_trans_mark_stripe, \
-.atomic_trigger = bch2_mark_stripe, \
+.trigger = bch2_trigger_stripe, \
.min_val_size = 8, \
})


@@ -5,7 +5,7 @@
#include "bcachefs_format.h"

struct bch_replicas_padded {
-struct bch_replicas_entry e;
+struct bch_replicas_entry_v1 e;
u8 pad[BCH_BKEY_PTRS_MAX];
};


@@ -73,7 +73,6 @@
x(ENOMEM, ENOMEM_fsck_add_nlink) \
x(ENOMEM, ENOMEM_journal_key_insert) \
x(ENOMEM, ENOMEM_journal_keys_sort) \
-x(ENOMEM, ENOMEM_journal_replay) \
x(ENOMEM, ENOMEM_read_superblock_clean) \
x(ENOMEM, ENOMEM_fs_alloc) \
x(ENOMEM, ENOMEM_fs_name_alloc) \
@@ -152,7 +151,6 @@
x(BCH_ERR_btree_insert_fail, btree_insert_need_mark_replicas) \
x(BCH_ERR_btree_insert_fail, btree_insert_need_journal_res) \
x(BCH_ERR_btree_insert_fail, btree_insert_need_journal_reclaim) \
-x(BCH_ERR_btree_insert_fail, btree_insert_need_flush_buffer) \
x(0, backpointer_to_overwritten_btree_node) \
x(0, lock_fail_root_changed) \
x(0, journal_reclaim_would_deadlock) \
@@ -172,10 +170,12 @@
x(EINVAL, device_size_too_small) \
x(EINVAL, device_not_a_member_of_filesystem) \
x(EINVAL, device_has_been_removed) \
+x(EINVAL, device_splitbrain) \
x(EINVAL, device_already_online) \
x(EINVAL, insufficient_devices_to_start) \
x(EINVAL, invalid) \
x(EINVAL, internal_fsck_err) \
+x(EINVAL, opt_parse_error) \
x(EROFS, erofs_trans_commit) \
x(EROFS, erofs_no_writes) \
x(EROFS, erofs_journal_err) \
@@ -224,6 +224,8 @@
x(BCH_ERR_invalid, invalid_bkey) \
x(BCH_ERR_operation_blocked, nocow_lock_blocked) \
x(EIO, btree_node_read_err) \
+x(EIO, sb_not_downgraded) \
+x(EIO, btree_write_all_failed) \
x(BCH_ERR_btree_node_read_err, btree_node_read_err_fixable) \
x(BCH_ERR_btree_node_read_err, btree_node_read_err_want_retry) \
x(BCH_ERR_btree_node_read_err, btree_node_read_err_must_retry) \
@@ -235,6 +237,7 @@
x(BCH_ERR_nopromote, nopromote_unwritten) \
x(BCH_ERR_nopromote, nopromote_congested) \
x(BCH_ERR_nopromote, nopromote_in_flight) \
+x(BCH_ERR_nopromote, nopromote_no_writes) \
x(BCH_ERR_nopromote, nopromote_enomem)

enum bch_errcode {
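
For illustration only (not part of the patch): the errcode changes above edit an x-macro table, where each x(class, name) row expands once per use of the list. A minimal standalone sketch of the same technique, with invented names:

	#define EXAMPLE_ERRS()			\
		x(EINVAL,	bad_option)	\
		x(EIO,		read_failed)

	enum example_errcode {
		EXAMPLE_ERR_START = 2048,
	#define x(class, name) EXAMPLE_ERR_##name,
		EXAMPLE_ERRS()
	#undef x
		EXAMPLE_ERR_MAX
	};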


@@ -2,12 +2,13 @@
#include "bcachefs.h"
#include "error.h"
#include "super.h"
+#include "thread_with_file.h"

#define FSCK_ERR_RATELIMIT_NR 10

bool bch2_inconsistent_error(struct bch_fs *c)
{
-set_bit(BCH_FS_ERROR, &c->flags);
+set_bit(BCH_FS_error, &c->flags);

switch (c->opts.errors) {
case BCH_ON_ERROR_continue:
@@ -26,8 +27,8 @@ bool bch2_inconsistent_error(struct bch_fs *c)

void bch2_topology_error(struct bch_fs *c)
{
-set_bit(BCH_FS_TOPOLOGY_ERROR, &c->flags);
-if (test_bit(BCH_FS_FSCK_DONE, &c->flags))
+set_bit(BCH_FS_topology_error, &c->flags);
+if (!test_bit(BCH_FS_fsck_running, &c->flags))
bch2_inconsistent_error(c);
}

@@ -69,29 +70,11 @@ enum ask_yn {
YN_ALLYES,
};

-#ifdef __KERNEL__
-#define bch2_fsck_ask_yn() YN_NO
-#else
-#include "tools-util.h"
-
-enum ask_yn bch2_fsck_ask_yn(void)
+static enum ask_yn parse_yn_response(char *buf)
{
-char *buf = NULL;
-size_t buflen = 0;
-bool ret;
-
-while (true) {
-fputs(" (y,n, or Y,N for all errors of this type) ", stdout);
-fflush(stdout);
-
-if (getline(&buf, &buflen, stdin) < 0)
-die("error reading from standard input");
-
-strim(buf);
-if (strlen(buf) != 1)
-continue;
+buf = strim(buf);

+if (strlen(buf) == 1)
switch (buf[0]) {
case 'n':
return YN_NO;
@@ -102,7 +85,51 @@ enum ask_yn bch2_fsck_ask_yn(void)
case 'Y':
return YN_ALLYES;
}
-}
+return -1;
+}
+
#ifdef __KERNEL__
static enum ask_yn bch2_fsck_ask_yn(struct bch_fs *c)
{
struct stdio_redirect *stdio = c->stdio;
if (c->stdio_filter && c->stdio_filter != current)
stdio = NULL;
if (!stdio)
return YN_NO;
char buf[100];
int ret;
do {
bch2_print(c, " (y,n, or Y,N for all errors of this type) ");
int r = bch2_stdio_redirect_readline(stdio, buf, sizeof(buf) - 1);
if (r < 0)
return YN_NO;
buf[r] = '\0';
} while ((ret = parse_yn_response(buf)) < 0);
return ret;
}
#else
#include "tools-util.h"
static enum ask_yn bch2_fsck_ask_yn(struct bch_fs *c)
{
char *buf = NULL;
size_t buflen = 0;
int ret;
do {
fputs(" (y,n, or Y,N for all errors of this type) ", stdout);
fflush(stdout);
if (getline(&buf, &buflen, stdin) < 0)
die("error reading from standard input");
} while ((ret = parse_yn_response(buf)) < 0);
free(buf);
return ret;
@@ -114,7 +141,7 @@ static struct fsck_err_state *fsck_err_get(struct bch_fs *c, const char *fmt)
{
struct fsck_err_state *s;

-if (test_bit(BCH_FS_FSCK_DONE, &c->flags))
+if (!test_bit(BCH_FS_fsck_running, &c->flags))
return NULL;

list_for_each_entry(s, &c->fsck_error_msgs, list)
@@ -152,7 +179,8 @@ int bch2_fsck_err(struct bch_fs *c,
struct printbuf buf = PRINTBUF, *out = &buf;
int ret = -BCH_ERR_fsck_ignore;

-if (test_bit(err, c->sb.errors_silent))
+if ((flags & FSCK_CAN_FIX) &&
+test_bit(err, c->sb.errors_silent))
return -BCH_ERR_fsck_fix;

bch2_sb_error_count(c, err);
@@ -196,7 +224,7 @@ int bch2_fsck_err(struct bch_fs *c,
prt_printf(out, bch2_log_msg(c, ""));
#endif

-if (test_bit(BCH_FS_FSCK_DONE, &c->flags)) {
+if (!test_bit(BCH_FS_fsck_running, &c->flags)) {
if (c->opts.errors != BCH_ON_ERROR_continue ||
!(flags & (FSCK_CAN_FIX|FSCK_CAN_IGNORE))) {
prt_str(out, ", shutting down");
@@ -221,10 +249,13 @@ int bch2_fsck_err(struct bch_fs *c,
int ask;

prt_str(out, ": fix?");
-bch2_print_string_as_lines(KERN_ERR, out->buf);
+if (bch2_fs_stdio_redirect(c))
+bch2_print(c, "%s", out->buf);
+else
+bch2_print_string_as_lines(KERN_ERR, out->buf);
print = false;

-ask = bch2_fsck_ask_yn();
+ask = bch2_fsck_ask_yn(c);

if (ask >= YN_ALLNO && s)
s->fix = ask == YN_ALLNO
@@ -253,10 +284,14 @@ int bch2_fsck_err(struct bch_fs *c,
!(flags & FSCK_CAN_IGNORE)))
ret = -BCH_ERR_fsck_errors_not_fixed;

-if (print)
-bch2_print_string_as_lines(KERN_ERR, out->buf);
+if (print) {
+if (bch2_fs_stdio_redirect(c))
+bch2_print(c, "%s\n", out->buf);
+else
+bch2_print_string_as_lines(KERN_ERR, out->buf);
+}

-if (!test_bit(BCH_FS_FSCK_DONE, &c->flags) &&
+if (test_bit(BCH_FS_fsck_running, &c->flags) &&
(ret != -BCH_ERR_fsck_fix &&
ret != -BCH_ERR_fsck_ignore))
bch_err(c, "Unable to continue, halting");
@@ -274,10 +309,10 @@ int bch2_fsck_err(struct bch_fs *c,
bch2_inconsistent_error(c);

if (ret == -BCH_ERR_fsck_fix) {
-set_bit(BCH_FS_ERRORS_FIXED, &c->flags);
+set_bit(BCH_FS_errors_fixed, &c->flags);
} else {
-set_bit(BCH_FS_ERRORS_NOT_FIXED, &c->flags);
-set_bit(BCH_FS_ERROR, &c->flags);
+set_bit(BCH_FS_errors_not_fixed, &c->flags);
+set_bit(BCH_FS_error, &c->flags);
}

return ret;


@@ -100,7 +100,7 @@ static int count_iters_for_insert(struct btree_trans *trans,
return ret2 ?: ret;
}

-#define EXTENT_ITERS_MAX (BTREE_ITER_MAX / 3)
+#define EXTENT_ITERS_MAX (BTREE_ITER_INITIAL / 3)

int bch2_extent_atomic_end(struct btree_trans *trans,
struct btree_iter *iter,

@@ -843,7 +843,6 @@ void bch2_bkey_drop_device_noerror(struct bkey_s k, unsigned dev)
const struct bch_extent_ptr *bch2_bkey_has_device_c(struct bkey_s_c k, unsigned dev)
{
struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
-const struct bch_extent_ptr *ptr;

bkey_for_each_ptr(ptrs, ptr)
if (ptr->dev == dev)
@@ -855,7 +854,6 @@ const struct bch_extent_ptr *bch2_bkey_has_device_c(struct bkey_s_c k, unsigned
bool bch2_bkey_has_target(struct bch_fs *c, struct bkey_s_c k, unsigned target)
{
struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
-const struct bch_extent_ptr *ptr;

bkey_for_each_ptr(ptrs, ptr)
if (bch2_dev_in_target(c, ptr->dev, target) &&
@@ -1065,7 +1063,6 @@ static int extent_ptr_invalid(struct bch_fs *c,
struct printbuf *err)
{
struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
-const struct bch_extent_ptr *ptr2;
u64 bucket;
u32 bucket_offset;
struct bch_dev *ca;
@@ -1307,7 +1304,6 @@ unsigned bch2_bkey_ptrs_need_rebalance(struct bch_fs *c, struct bkey_s_c k,
}
incompressible:
if (target && bch2_target_accepts_data(c, BCH_DATA_user, target)) {
-const struct bch_extent_ptr *ptr;
unsigned i = 0;

bkey_for_each_ptr(ptrs, ptr) {


@@ -300,7 +300,7 @@ static inline struct bkey_ptrs bch2_bkey_ptrs(struct bkey_s k)
bkey_extent_entry_for_each_from(_p, _entry, _p.start)

#define __bkey_for_each_ptr(_start, _end, _ptr) \
-for ((_ptr) = (_start); \
+for (typeof(_start) (_ptr) = (_start); \
((_ptr) = __bkey_ptr_next(_ptr, _end)); \
(_ptr)++)

@@ -415,8 +415,7 @@ void bch2_btree_ptr_v2_compat(enum btree_id, unsigned, unsigned,
.key_invalid = bch2_btree_ptr_invalid, \
.val_to_text = bch2_btree_ptr_to_text, \
.swab = bch2_ptr_swab, \
-.trans_trigger = bch2_trans_mark_extent, \
-.atomic_trigger = bch2_mark_extent, \
+.trigger = bch2_trigger_extent, \
})

#define bch2_bkey_ops_btree_ptr_v2 ((struct bkey_ops) { \
@@ -424,8 +423,7 @@ void bch2_btree_ptr_v2_compat(enum btree_id, unsigned, unsigned,
.val_to_text = bch2_btree_ptr_v2_to_text, \
.swab = bch2_ptr_swab, \
.compat = bch2_btree_ptr_v2_compat, \
-.trans_trigger = bch2_trans_mark_extent, \
-.atomic_trigger = bch2_mark_extent, \
+.trigger = bch2_trigger_extent, \
.min_val_size = 40, \
})

@@ -439,8 +437,7 @@ bool bch2_extent_merge(struct bch_fs *, struct bkey_s, struct bkey_s_c);
.swab = bch2_ptr_swab, \
.key_normalize = bch2_extent_normalize, \
.key_merge = bch2_extent_merge, \
-.trans_trigger = bch2_trans_mark_extent, \
-.atomic_trigger = bch2_mark_extent, \
+.trigger = bch2_trigger_extent, \
})

/* KEY_TYPE_reservation: */
@@ -454,8 +451,7 @@ bool bch2_reservation_merge(struct bch_fs *, struct bkey_s, struct bkey_s_c);
.key_invalid = bch2_reservation_invalid, \
.val_to_text = bch2_reservation_to_text, \
.key_merge = bch2_reservation_merge, \
-.trans_trigger = bch2_trans_mark_reservation, \
-.atomic_trigger = bch2_mark_reservation, \
+.trigger = bch2_trigger_reservation, \
.min_val_size = 8, \
})

@@ -547,7 +543,6 @@ static inline bool bkey_extent_is_allocation(const struct bkey *k)
static inline bool bkey_extent_is_unwritten(struct bkey_s_c k)
{
struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
-const struct bch_extent_ptr *ptr;

bkey_for_each_ptr(ptrs, ptr)
if (ptr->unwritten)
@@ -565,10 +560,9 @@ static inline struct bch_devs_list bch2_bkey_devs(struct bkey_s_c k)
{
struct bch_devs_list ret = (struct bch_devs_list) { 0 };
struct bkey_ptrs_c p = bch2_bkey_ptrs_c(k);
-const struct bch_extent_ptr *ptr;

bkey_for_each_ptr(p, ptr)
-ret.devs[ret.nr++] = ptr->dev;
+ret.data[ret.nr++] = ptr->dev;

return ret;
}
@@ -577,11 +571,10 @@ static inline struct bch_devs_list bch2_bkey_dirty_devs(struct bkey_s_c k)
{
struct bch_devs_list ret = (struct bch_devs_list) { 0 };
struct bkey_ptrs_c p = bch2_bkey_ptrs_c(k);
-const struct bch_extent_ptr *ptr;

bkey_for_each_ptr(p, ptr)
if (!ptr->cached)
-ret.devs[ret.nr++] = ptr->dev;
+ret.data[ret.nr++] = ptr->dev;

return ret;
}
@@ -590,11 +583,10 @@ static inline struct bch_devs_list bch2_bkey_cached_devs(struct bkey_s_c k)
{
struct bch_devs_list ret = (struct bch_devs_list) { 0 };
struct bkey_ptrs_c p = bch2_bkey_ptrs_c(k);
-const struct bch_extent_ptr *ptr;

bkey_for_each_ptr(p, ptr)
if (ptr->cached)
-ret.devs[ret.nr++] = ptr->dev;
+ret.data[ret.nr++] = ptr->dev;

return ret;
}
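
For illustration only (not part of the patch): since __bkey_for_each_ptr() now declares _ptr itself via typeof(), callers drop their local "const struct bch_extent_ptr *ptr;" declarations, which is why those lines disappear throughout this series. A minimal sketch of a caller:

	static unsigned nr_cached_ptrs(struct bkey_s_c k)
	{
		struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
		unsigned nr = 0;

		bkey_for_each_ptr(ptrs, ptr)	/* ptr is declared by the macro */
			nr += ptr->cached;
		return nr;
	}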


@@ -261,11 +261,11 @@ static inline ssize_t eytzinger0_find_le(void *base, size_t nr, size_t size,
#define eytzinger0_find(base, nr, size, _cmp, search) \
({ \
void *_base = (base); \
-void *_search = (search); \
+const void *_search = (search); \
size_t _nr = (nr); \
size_t _size = (size); \
size_t _i = 0; \
int _res; \
\
while (_i < _nr && \


@@ -166,10 +166,8 @@ int bch2_create_trans(struct btree_trans *trans,
if (ret)
goto err;

-if (c->sb.version >= bcachefs_metadata_version_inode_backpointers) {
-new_inode->bi_dir = dir_u->bi_inum;
-new_inode->bi_dir_offset = dir_offset;
-}
+new_inode->bi_dir = dir_u->bi_inum;
+new_inode->bi_dir_offset = dir_offset;
}

inode_iter.flags &= ~BTREE_ITER_ALL_SNAPSHOTS;
@@ -228,10 +226,8 @@ int bch2_link_trans(struct btree_trans *trans,
if (ret)
goto err;

-if (c->sb.version >= bcachefs_metadata_version_inode_backpointers) {
-inode_u->bi_dir = dir.inum;
-inode_u->bi_dir_offset = dir_offset;
-}
+inode_u->bi_dir = dir.inum;
+inode_u->bi_dir_offset = dir_offset;

ret = bch2_inode_write(trans, &dir_iter, dir_u) ?:
bch2_inode_write(trans, &inode_iter, inode_u);
@@ -414,21 +410,19 @@ int bch2_rename_trans(struct btree_trans *trans,
goto err;
}

-if (c->sb.version >= bcachefs_metadata_version_inode_backpointers) {
src_inode_u->bi_dir = dst_dir_u->bi_inum;
src_inode_u->bi_dir_offset = dst_offset;

if (mode == BCH_RENAME_EXCHANGE) {
dst_inode_u->bi_dir = src_dir_u->bi_inum;
dst_inode_u->bi_dir_offset = src_offset;
}

if (mode == BCH_RENAME_OVERWRITE &&
dst_inode_u->bi_dir == dst_dir_u->bi_inum &&
dst_inode_u->bi_dir_offset == src_offset) {
dst_inode_u->bi_dir = 0;
dst_inode_u->bi_dir_offset = 0;
}
-}

if (mode == BCH_RENAME_OVERWRITE) {


@@ -52,26 +52,20 @@ struct readpages_iter {
static int readpages_iter_init(struct readpages_iter *iter,
struct readahead_control *ractl)
{
-struct folio **fi;
-int ret;
-
-memset(iter, 0, sizeof(*iter));
-
-iter->mapping = ractl->mapping;
-
-ret = bch2_filemap_get_contig_folios_d(iter->mapping,
-ractl->_index << PAGE_SHIFT,
-(ractl->_index + ractl->_nr_pages) << PAGE_SHIFT,
-0, mapping_gfp_mask(iter->mapping),
-&iter->folios);
-if (ret)
-return ret;
-
-darray_for_each(iter->folios, fi) {
-ractl->_nr_pages -= 1U << folio_order(*fi);
-__bch2_folio_create(*fi, __GFP_NOFAIL|GFP_KERNEL);
-folio_put(*fi);
-folio_put(*fi);
-}
+struct folio *folio;
+
+*iter = (struct readpages_iter) { ractl->mapping };
+
+while ((folio = __readahead_folio(ractl))) {
+if (!bch2_folio_create(folio, GFP_KERNEL) ||
+darray_push(&iter->folios, folio)) {
+bch2_folio_release(folio);
+ractl->_nr_pages += folio_nr_pages(folio);
+ractl->_index -= folio_nr_pages(folio);
+return iter->folios.nr ? 0 : -ENOMEM;
+}
+
+folio_put(folio);
+}

return 0;
@@ -273,12 +267,12 @@ void bch2_readahead(struct readahead_control *ractl)
struct btree_trans *trans = bch2_trans_get(c);
struct folio *folio;
struct readpages_iter readpages_iter;
-int ret;

bch2_inode_opts_get(&opts, c, &inode->ei_inode);

-ret = readpages_iter_init(&readpages_iter, ractl);
-BUG_ON(ret);
+int ret = readpages_iter_init(&readpages_iter, ractl);
+if (ret)
+return;

bch2_pagecache_add_get(inode);
@@ -638,7 +632,7 @@ static int __bch2_writepage(struct folio *folio,
/* Check for writing past i_size: */
WARN_ONCE((bio_end_sector(&w->io->op.wbio.bio) << 9) >
round_up(i_size, block_bytes(c)) &&
-!test_bit(BCH_FS_EMERGENCY_RO, &c->flags),
+!test_bit(BCH_FS_emergency_ro, &c->flags),
"writing past i_size: %llu > %llu (unrounded %llu)\n",
bio_end_sector(&w->io->op.wbio.bio) << 9,
round_up(i_size, block_bytes(c)),
@@ -826,7 +820,7 @@ static int __bch2_buffered_write(struct bch_inode_info *inode,
struct bch_fs *c = inode->v.i_sb->s_fs_info;
struct bch2_folio_reservation res;
folios fs;
-struct folio **fi, *f;
+struct folio *f;
unsigned copied = 0, f_offset, f_copied;
u64 end = pos + len, f_pos, f_len;
loff_t last_folio_pos = inode->v.i_size;


@@ -77,9 +77,6 @@ static int bch2_direct_IO_read(struct kiocb *req, struct iov_iter *iter)

bch2_inode_opts_get(&opts, c, &inode->ei_inode);

-if ((offset|iter->count) & (block_bytes(c) - 1))
-return -EINVAL;
-
ret = min_t(loff_t, iter->count,
max_t(loff_t, 0, i_size_read(&inode->v) - offset));


@@ -192,13 +192,17 @@ int bch2_fsync(struct file *file, loff_t start, loff_t end, int datasync)
{
struct bch_inode_info *inode = file_bch_inode(file);
struct bch_fs *c = inode->v.i_sb->s_fs_info;
-int ret, ret2, ret3;
+int ret;

ret = file_write_and_wait_range(file, start, end);
-ret2 = sync_inode_metadata(&inode->v, 1);
-ret3 = bch2_flush_inode(c, inode);
-
-return bch2_err_class(ret ?: ret2 ?: ret3);
+if (ret)
+goto out;
+ret = sync_inode_metadata(&inode->v, 1);
+if (ret)
+goto out;
+ret = bch2_flush_inode(c, inode);
+out:
+return bch2_err_class(ret);
}

/* truncate: */
@@ -861,7 +865,8 @@ loff_t bch2_remap_file_range(struct file *file_src, loff_t pos_src,
abs(pos_src - pos_dst) < len)
return -EINVAL;

-bch2_lock_inodes(INODE_LOCK|INODE_PAGECACHE_BLOCK, src, dst);
+lock_two_nondirectories(&src->v, &dst->v);
+bch2_lock_inodes(INODE_PAGECACHE_BLOCK, src, dst);

inode_dio_wait(&src->v);
inode_dio_wait(&dst->v);
@@ -914,7 +919,8 @@ loff_t bch2_remap_file_range(struct file *file_src, loff_t pos_src,
ret = bch2_flush_inode(c, dst);
err:
bch2_quota_reservation_put(c, dst, &quota_res);
-bch2_unlock_inodes(INODE_LOCK|INODE_PAGECACHE_BLOCK, src, dst);
+bch2_unlock_inodes(INODE_PAGECACHE_BLOCK, src, dst);
+unlock_two_nondirectories(&src->v, &dst->v);

return bch2_err_class(ret);
}


@@ -285,34 +285,26 @@ static int bch2_ioc_goingdown(struct bch_fs *c, u32 __user *arg)
bch_notice(c, "shutdown by ioctl type %u", flags);

-down_write(&c->vfs_sb->s_umount);
-
switch (flags) {
case FSOP_GOING_FLAGS_DEFAULT:
ret = bdev_freeze(c->vfs_sb->s_bdev);
if (ret)
-goto err;
+break;
bch2_journal_flush(&c->journal);
-c->vfs_sb->s_flags |= SB_RDONLY;
bch2_fs_emergency_read_only(c);
bdev_thaw(c->vfs_sb->s_bdev);
break;
case FSOP_GOING_FLAGS_LOGFLUSH:
bch2_journal_flush(&c->journal);
fallthrough;
case FSOP_GOING_FLAGS_NOLOGFLUSH:
-c->vfs_sb->s_flags |= SB_RDONLY;
bch2_fs_emergency_read_only(c);
break;
default:
ret = -EINVAL;
break;
}
-err:
-up_write(&c->vfs_sb->s_umount);

return ret;
}


@ -93,7 +93,7 @@ int __must_check bch2_write_inode(struct bch_fs *c,
BTREE_ITER_INTENT) ?: BTREE_ITER_INTENT) ?:
(set ? set(trans, inode, &inode_u, p) : 0) ?: (set ? set(trans, inode, &inode_u, p) : 0) ?:
bch2_inode_write(trans, &iter, &inode_u) ?: bch2_inode_write(trans, &iter, &inode_u) ?:
bch2_trans_commit(trans, NULL, NULL, BTREE_INSERT_NOFAIL); bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc);
/* /*
* the btree node lock protects inode->ei_inode, not ei_update_lock; * the btree node lock protects inode->ei_inode, not ei_update_lock;
@ -455,7 +455,7 @@ int __bch2_unlink(struct inode *vdir, struct dentry *dentry,
bch2_lock_inodes(INODE_UPDATE_LOCK, dir, inode); bch2_lock_inodes(INODE_UPDATE_LOCK, dir, inode);
ret = commit_do(trans, NULL, NULL, ret = commit_do(trans, NULL, NULL,
BTREE_INSERT_NOFAIL, BCH_TRANS_COMMIT_no_enospc,
bch2_unlink_trans(trans, bch2_unlink_trans(trans,
inode_inum(dir), &dir_u, inode_inum(dir), &dir_u,
&inode_u, &dentry->d_name, &inode_u, &dentry->d_name,
@ -729,7 +729,7 @@ int bch2_setattr_nonsize(struct mnt_idmap *idmap,
ret = bch2_inode_write(trans, &inode_iter, &inode_u) ?: ret = bch2_inode_write(trans, &inode_iter, &inode_u) ?:
bch2_trans_commit(trans, NULL, NULL, bch2_trans_commit(trans, NULL, NULL,
BTREE_INSERT_NOFAIL); BCH_TRANS_COMMIT_no_enospc);
btree_err: btree_err:
bch2_trans_iter_exit(trans, &inode_iter); bch2_trans_iter_exit(trans, &inode_iter);
@ -1012,15 +1012,13 @@ static int bch2_vfs_readdir(struct file *file, struct dir_context *ctx)
{ {
struct bch_inode_info *inode = file_bch_inode(file); struct bch_inode_info *inode = file_bch_inode(file);
struct bch_fs *c = inode->v.i_sb->s_fs_info; struct bch_fs *c = inode->v.i_sb->s_fs_info;
int ret;
if (!dir_emit_dots(file, ctx)) if (!dir_emit_dots(file, ctx))
return 0; return 0;
ret = bch2_readdir(c, inode_inum(inode), ctx); int ret = bch2_readdir(c, inode_inum(inode), ctx);
if (ret)
bch_err_fn(c, ret);
bch_err_fn(c, ret);
return bch2_err_class(ret); return bch2_err_class(ret);
} }
@ -1500,7 +1498,7 @@ static void bch2_evict_inode(struct inode *vinode)
void bch2_evict_subvolume_inodes(struct bch_fs *c, snapshot_id_list *s) void bch2_evict_subvolume_inodes(struct bch_fs *c, snapshot_id_list *s)
{ {
struct bch_inode_info *inode, **i; struct bch_inode_info *inode;
DARRAY(struct bch_inode_info *) grabbed; DARRAY(struct bch_inode_info *) grabbed;
bool clean_pass = false, this_pass_clean; bool clean_pass = false, this_pass_clean;
@ -1626,43 +1624,18 @@ static struct bch_fs *bch2_path_to_fs(const char *path)
return c ?: ERR_PTR(-ENOENT); return c ?: ERR_PTR(-ENOENT);
} }
static char **split_devs(const char *_dev_name, unsigned *nr)
{
char *dev_name = NULL, **devs = NULL, *s;
size_t i = 0, nr_devs = 0;
dev_name = kstrdup(_dev_name, GFP_KERNEL);
if (!dev_name)
return NULL;
for (s = dev_name; s; s = strchr(s + 1, ':'))
nr_devs++;
devs = kcalloc(nr_devs + 1, sizeof(const char *), GFP_KERNEL);
if (!devs) {
kfree(dev_name);
return NULL;
}
while ((s = strsep(&dev_name, ":")))
devs[i++] = s;
*nr = nr_devs;
return devs;
}
static int bch2_remount(struct super_block *sb, int *flags, char *data) static int bch2_remount(struct super_block *sb, int *flags, char *data)
{ {
struct bch_fs *c = sb->s_fs_info; struct bch_fs *c = sb->s_fs_info;
struct bch_opts opts = bch2_opts_empty(); struct bch_opts opts = bch2_opts_empty();
int ret; int ret;
opt_set(opts, read_only, (*flags & SB_RDONLY) != 0);
ret = bch2_parse_mount_opts(c, &opts, data); ret = bch2_parse_mount_opts(c, &opts, data);
if (ret) if (ret)
goto err; goto err;
opt_set(opts, read_only, (*flags & SB_RDONLY) != 0);
if (opts.read_only != c->opts.read_only) { if (opts.read_only != c->opts.read_only) {
down_write(&c->state_lock); down_write(&c->state_lock);
@ -1696,11 +1669,9 @@ static int bch2_remount(struct super_block *sb, int *flags, char *data)
static int bch2_show_devname(struct seq_file *seq, struct dentry *root) static int bch2_show_devname(struct seq_file *seq, struct dentry *root)
{ {
struct bch_fs *c = root->d_sb->s_fs_info; struct bch_fs *c = root->d_sb->s_fs_info;
struct bch_dev *ca;
unsigned i;
bool first = true; bool first = true;
for_each_online_member(ca, c, i) { for_each_online_member(c, ca) {
if (!first) if (!first)
seq_putc(seq, ':'); seq_putc(seq, ':');
first = false; first = false;
@ -1770,7 +1741,7 @@ static int bch2_unfreeze(struct super_block *sb)
struct bch_fs *c = sb->s_fs_info; struct bch_fs *c = sb->s_fs_info;
int ret; int ret;
if (test_bit(BCH_FS_EMERGENCY_RO, &c->flags)) if (test_bit(BCH_FS_emergency_ro, &c->flags))
return 0; return 0;
down_write(&c->state_lock); down_write(&c->state_lock);
@ -1805,17 +1776,18 @@ static int bch2_noset_super(struct super_block *s, void *data)
return -EBUSY; return -EBUSY;
} }
typedef DARRAY(struct bch_fs *) darray_fs;
static int bch2_test_super(struct super_block *s, void *data) static int bch2_test_super(struct super_block *s, void *data)
{ {
struct bch_fs *c = s->s_fs_info; struct bch_fs *c = s->s_fs_info;
struct bch_fs **devs = data; darray_fs *d = data;
unsigned i;
if (!c) if (!c)
return false; return false;
for (i = 0; devs[i]; i++) darray_for_each(*d, i)
if (c != devs[i]) if (c != *i)
return false; return false;
return true; return true;
} }
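/*
 * Illustrative sketch (not part of the patch) of the darray helpers used
 * in the hunks above and below: DARRAY(), darray_push(), darray_for_each()
 * and darray_exit() form a small push/iterate/free pattern.  The element
 * type and values here are made up for illustration:
 */
typedef DARRAY(int) darray_int;

static int darray_example(void)
{
	darray_int xs = {};
	int ret = 0;

	for (int v = 0; v < 4 && !ret; v++)
		ret = darray_push(&xs, v);	/* may fail with -ENOMEM */

	darray_for_each(xs, i)			/* i is a pointer into xs.data */
		pr_info("%i\n", *i);

	darray_exit(&xs);			/* frees the backing allocation */
	return ret;
}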
@ -1824,13 +1796,9 @@ static struct dentry *bch2_mount(struct file_system_type *fs_type,
int flags, const char *dev_name, void *data) int flags, const char *dev_name, void *data)
{ {
struct bch_fs *c; struct bch_fs *c;
struct bch_dev *ca;
struct super_block *sb; struct super_block *sb;
struct inode *vinode; struct inode *vinode;
struct bch_opts opts = bch2_opts_empty(); struct bch_opts opts = bch2_opts_empty();
char **devs;
struct bch_fs **devs_to_fs = NULL;
unsigned i, nr_devs;
int ret; int ret;
opt_set(opts, read_only, (flags & SB_RDONLY) != 0); opt_set(opts, read_only, (flags & SB_RDONLY) != 0);
@ -1842,25 +1810,25 @@ static struct dentry *bch2_mount(struct file_system_type *fs_type,
if (!dev_name || strlen(dev_name) == 0) if (!dev_name || strlen(dev_name) == 0)
return ERR_PTR(-EINVAL); return ERR_PTR(-EINVAL);
devs = split_devs(dev_name, &nr_devs); darray_str devs;
if (!devs) ret = bch2_split_devs(dev_name, &devs);
return ERR_PTR(-ENOMEM); if (ret)
return ERR_PTR(ret);
devs_to_fs = kcalloc(nr_devs + 1, sizeof(void *), GFP_KERNEL); darray_fs devs_to_fs = {};
if (!devs_to_fs) { darray_for_each(devs, i) {
sb = ERR_PTR(-ENOMEM); ret = darray_push(&devs_to_fs, bch2_path_to_fs(*i));
goto got_sb; if (ret) {
sb = ERR_PTR(ret);
goto got_sb;
}
} }
for (i = 0; i < nr_devs; i++) sb = sget(fs_type, bch2_test_super, bch2_noset_super, flags|SB_NOSEC, &devs_to_fs);
devs_to_fs[i] = bch2_path_to_fs(devs[i]);
sb = sget(fs_type, bch2_test_super, bch2_noset_super,
flags|SB_NOSEC, devs_to_fs);
if (!IS_ERR(sb)) if (!IS_ERR(sb))
goto got_sb; goto got_sb;
c = bch2_fs_open(devs, nr_devs, opts); c = bch2_fs_open(devs.data, devs.nr, opts);
if (IS_ERR(c)) { if (IS_ERR(c)) {
sb = ERR_CAST(c); sb = ERR_CAST(c);
goto got_sb; goto got_sb;
@ -1880,9 +1848,8 @@ static struct dentry *bch2_mount(struct file_system_type *fs_type,
if (IS_ERR(sb)) if (IS_ERR(sb))
bch2_fs_stop(c); bch2_fs_stop(c);
got_sb: got_sb:
kfree(devs_to_fs); darray_exit(&devs_to_fs);
kfree(devs[0]); bch2_darray_str_exit(&devs);
kfree(devs);
if (IS_ERR(sb)) { if (IS_ERR(sb)) {
ret = PTR_ERR(sb); ret = PTR_ERR(sb);
@ -1923,7 +1890,7 @@ static struct dentry *bch2_mount(struct file_system_type *fs_type,
sb->s_bdi->ra_pages = VM_READAHEAD_PAGES; sb->s_bdi->ra_pages = VM_READAHEAD_PAGES;
for_each_online_member(ca, c, i) { for_each_online_member(c, ca) {
struct block_device *bdev = ca->disk_sb.bdev; struct block_device *bdev = ca->disk_sb.bdev;
/* XXX: create an anonymous device for multi device filesystems */ /* XXX: create an anonymous device for multi device filesystems */
@ -1944,10 +1911,9 @@ static struct dentry *bch2_mount(struct file_system_type *fs_type,
vinode = bch2_vfs_inode_get(c, BCACHEFS_ROOT_SUBVOL_INUM); vinode = bch2_vfs_inode_get(c, BCACHEFS_ROOT_SUBVOL_INUM);
ret = PTR_ERR_OR_ZERO(vinode); ret = PTR_ERR_OR_ZERO(vinode);
if (ret) { bch_err_msg(c, ret, "mounting: error getting root inode");
bch_err_msg(c, ret, "mounting: error getting root inode"); if (ret)
goto err_put_super; goto err_put_super;
}
sb->s_root = d_make_root(vinode); sb->s_root = d_make_root(vinode);
if (!sb->s_root) { if (!sb->s_root) {


@ -77,9 +77,8 @@ static inline int ptrcmp(void *l, void *r)
} }
enum bch_inode_lock_op { enum bch_inode_lock_op {
INODE_LOCK = (1U << 0), INODE_PAGECACHE_BLOCK = (1U << 0),
INODE_PAGECACHE_BLOCK = (1U << 1), INODE_UPDATE_LOCK = (1U << 1),
INODE_UPDATE_LOCK = (1U << 2),
}; };
#define bch2_lock_inodes(_locks, ...) \ #define bch2_lock_inodes(_locks, ...) \
@ -91,8 +90,6 @@ do { \
\ \
for (i = 1; i < ARRAY_SIZE(a); i++) \ for (i = 1; i < ARRAY_SIZE(a); i++) \
if (a[i] != a[i - 1]) { \ if (a[i] != a[i - 1]) { \
if ((_locks) & INODE_LOCK) \
down_write_nested(&a[i]->v.i_rwsem, i); \
if ((_locks) & INODE_PAGECACHE_BLOCK) \ if ((_locks) & INODE_PAGECACHE_BLOCK) \
bch2_pagecache_block_get(a[i]);\ bch2_pagecache_block_get(a[i]);\
if ((_locks) & INODE_UPDATE_LOCK) \ if ((_locks) & INODE_UPDATE_LOCK) \
@ -109,8 +106,6 @@ do { \
\ \
for (i = 1; i < ARRAY_SIZE(a); i++) \ for (i = 1; i < ARRAY_SIZE(a); i++) \
if (a[i] != a[i - 1]) { \ if (a[i] != a[i - 1]) { \
if ((_locks) & INODE_LOCK) \
up_write(&a[i]->v.i_rwsem); \
if ((_locks) & INODE_PAGECACHE_BLOCK) \ if ((_locks) & INODE_PAGECACHE_BLOCK) \
bch2_pagecache_block_put(a[i]);\ bch2_pagecache_block_put(a[i]);\
if ((_locks) & INODE_UPDATE_LOCK) \ if ((_locks) & INODE_UPDATE_LOCK) \

File diff suppressed because it is too large


@ -561,64 +561,46 @@ static inline bool bkey_is_deleted_inode(struct bkey_s_c k)
return bkey_inode_flags(k) & BCH_INODE_unlinked; return bkey_inode_flags(k) & BCH_INODE_unlinked;
} }
int bch2_trans_mark_inode(struct btree_trans *trans, int bch2_trigger_inode(struct btree_trans *trans,
enum btree_id btree_id, unsigned level, enum btree_id btree_id, unsigned level,
struct bkey_s_c old, struct bkey_s_c old,
struct bkey_i *new, struct bkey_s new,
unsigned flags) unsigned flags)
{ {
int nr = bkey_is_inode(&new->k) - bkey_is_inode(old.k); s64 nr = bkey_is_inode(new.k) - bkey_is_inode(old.k);
bool old_deleted = bkey_is_deleted_inode(old);
bool new_deleted = bkey_is_deleted_inode(bkey_i_to_s_c(new));
if (nr) { if (flags & BTREE_TRIGGER_TRANSACTIONAL) {
int ret = bch2_replicas_deltas_realloc(trans, 0); if (nr) {
struct replicas_delta_list *d = trans->fs_usage_deltas; int ret = bch2_replicas_deltas_realloc(trans, 0);
if (ret)
return ret;
if (ret) trans->fs_usage_deltas->nr_inodes += nr;
return ret; }
d->nr_inodes += nr; bool old_deleted = bkey_is_deleted_inode(old);
bool new_deleted = bkey_is_deleted_inode(new.s_c);
if (old_deleted != new_deleted) {
int ret = bch2_btree_bit_mod(trans, BTREE_ID_deleted_inodes, new.k->p, new_deleted);
if (ret)
return ret;
}
} }
if (old_deleted != new_deleted) { if (!(flags & BTREE_TRIGGER_TRANSACTIONAL) && (flags & BTREE_TRIGGER_INSERT)) {
int ret = bch2_btree_bit_mod(trans, BTREE_ID_deleted_inodes, new->k.p, new_deleted); BUG_ON(!trans->journal_res.seq);
if (ret)
return ret;
}
return 0; bkey_s_to_inode_v3(new).v->bi_journal_seq = cpu_to_le64(trans->journal_res.seq);
}
int bch2_mark_inode(struct btree_trans *trans,
enum btree_id btree_id, unsigned level,
struct bkey_s_c old, struct bkey_s_c new,
unsigned flags)
{
struct bch_fs *c = trans->c;
struct bch_fs_usage *fs_usage;
u64 journal_seq = trans->journal_res.seq;
if (flags & BTREE_TRIGGER_INSERT) {
struct bch_inode_v3 *v = (struct bch_inode_v3 *) new.v;
BUG_ON(!journal_seq);
BUG_ON(new.k->type != KEY_TYPE_inode_v3);
v->bi_journal_seq = cpu_to_le64(journal_seq);
} }
if (flags & BTREE_TRIGGER_GC) { if (flags & BTREE_TRIGGER_GC) {
struct bch_fs *c = trans->c;
percpu_down_read(&c->mark_lock); percpu_down_read(&c->mark_lock);
preempt_disable(); this_cpu_add(c->usage_gc->nr_inodes, nr);
fs_usage = fs_usage_ptr(c, journal_seq, flags & BTREE_TRIGGER_GC);
fs_usage->nr_inodes += bkey_is_inode(new.k);
fs_usage->nr_inodes -= bkey_is_inode(old.k);
preempt_enable();
percpu_up_read(&c->mark_lock); percpu_up_read(&c->mark_lock);
} }
return 0; return 0;
} }
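/*
 * Illustrative sketch of the unified trigger shape used by
 * bch2_trigger_inode() above: a single hook dispatches on BTREE_TRIGGER_*
 * flags instead of separate trans_trigger/atomic_trigger methods.  The two
 * example_* helpers named here are hypothetical placeholders:
 */
static int example_trigger(struct btree_trans *trans,
			   enum btree_id btree, unsigned level,
			   struct bkey_s_c old, struct bkey_s new,
			   unsigned flags)
{
	if (flags & BTREE_TRIGGER_TRANSACTIONAL) {
		/* runs at transaction commit; may do further btree updates */
		int ret = example_update_deltas(trans, old, new);
		if (ret)
			return ret;
	}

	if (flags & BTREE_TRIGGER_GC) {
		/* runs from gc; only bumps in-memory usage counters */
		example_update_gc_counters(trans->c, old, new);
	}

	return 0;
}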
@ -831,7 +813,7 @@ static int bch2_inode_delete_keys(struct btree_trans *trans,
ret = bch2_trans_update(trans, &iter, &delete, 0) ?: ret = bch2_trans_update(trans, &iter, &delete, 0) ?:
bch2_trans_commit(trans, NULL, NULL, bch2_trans_commit(trans, NULL, NULL,
BTREE_INSERT_NOFAIL); BCH_TRANS_COMMIT_no_enospc);
err: err:
if (ret && !bch2_err_matches(ret, BCH_ERR_transaction_restart)) if (ret && !bch2_err_matches(ret, BCH_ERR_transaction_restart))
break; break;
@ -894,7 +876,7 @@ int bch2_inode_rm(struct bch_fs *c, subvol_inum inum)
ret = bch2_trans_update(trans, &iter, &delete.k_i, 0) ?: ret = bch2_trans_update(trans, &iter, &delete.k_i, 0) ?:
bch2_trans_commit(trans, NULL, NULL, bch2_trans_commit(trans, NULL, NULL,
BTREE_INSERT_NOFAIL); BCH_TRANS_COMMIT_no_enospc);
err: err:
bch2_trans_iter_exit(trans, &iter); bch2_trans_iter_exit(trans, &iter);
if (bch2_err_matches(ret, BCH_ERR_transaction_restart)) if (bch2_err_matches(ret, BCH_ERR_transaction_restart))
@ -1058,7 +1040,7 @@ int bch2_inode_rm_snapshot(struct btree_trans *trans, u64 inum, u32 snapshot)
ret = bch2_trans_update(trans, &iter, &delete.k_i, 0) ?: ret = bch2_trans_update(trans, &iter, &delete.k_i, 0) ?:
bch2_trans_commit(trans, NULL, NULL, bch2_trans_commit(trans, NULL, NULL,
BTREE_INSERT_NOFAIL); BCH_TRANS_COMMIT_no_enospc);
err: err:
bch2_trans_iter_exit(trans, &iter); bch2_trans_iter_exit(trans, &iter);
if (bch2_err_matches(ret, BCH_ERR_transaction_restart)) if (bch2_err_matches(ret, BCH_ERR_transaction_restart))
@ -1155,51 +1137,48 @@ static int may_delete_deleted_inode(struct btree_trans *trans,
int bch2_delete_dead_inodes(struct bch_fs *c) int bch2_delete_dead_inodes(struct bch_fs *c)
{ {
struct btree_trans *trans = bch2_trans_get(c); struct btree_trans *trans = bch2_trans_get(c);
struct btree_iter iter;
struct bkey_s_c k;
bool need_another_pass; bool need_another_pass;
int ret; int ret;
again: again:
need_another_pass = false; need_another_pass = false;
ret = bch2_btree_write_buffer_flush_sync(trans);
if (ret)
goto err;
/* /*
* Weird transaction restart handling here because on successful delete, * Weird transaction restart handling here because on successful delete,
* bch2_inode_rm_snapshot() will return a nested transaction restart, * bch2_inode_rm_snapshot() will return a nested transaction restart,
* but we can't retry because the btree write buffer won't have been * but we can't retry because the btree write buffer won't have been
* flushed and we'd spin: * flushed and we'd spin:
*/ */
for_each_btree_key(trans, iter, BTREE_ID_deleted_inodes, POS_MIN, ret = for_each_btree_key_commit(trans, iter, BTREE_ID_deleted_inodes, POS_MIN,
BTREE_ITER_PREFETCH|BTREE_ITER_ALL_SNAPSHOTS, k, ret) { BTREE_ITER_PREFETCH|BTREE_ITER_ALL_SNAPSHOTS, k,
ret = commit_do(trans, NULL, NULL, NULL, NULL, BCH_TRANS_COMMIT_no_enospc, ({
BTREE_INSERT_NOFAIL| ret = may_delete_deleted_inode(trans, &iter, k.k->p, &need_another_pass);
BTREE_INSERT_LAZY_RW, if (ret > 0) {
may_delete_deleted_inode(trans, &iter, k.k->p, &need_another_pass));
if (ret < 0)
break;
if (ret) {
if (!test_bit(BCH_FS_RW, &c->flags)) {
bch2_trans_unlock(trans);
bch2_fs_lazy_rw(c);
}
bch_verbose(c, "deleting unlinked inode %llu:%u", k.k->p.offset, k.k->p.snapshot); bch_verbose(c, "deleting unlinked inode %llu:%u", k.k->p.offset, k.k->p.snapshot);
ret = bch2_inode_rm_snapshot(trans, k.k->p.offset, k.k->p.snapshot); ret = bch2_inode_rm_snapshot(trans, k.k->p.offset, k.k->p.snapshot);
if (ret && !bch2_err_matches(ret, BCH_ERR_transaction_restart)) /*
break; * We don't want to loop here: a transaction restart
* error here means we handled a transaction restart and
* we're actually done, but if we loop we'll retry the
* same key because the write buffer hasn't been flushed
* yet
*/
if (bch2_err_matches(ret, BCH_ERR_transaction_restart)) {
ret = 0;
continue;
}
} }
}
bch2_trans_iter_exit(trans, &iter);
if (!ret && need_another_pass) ret;
}));
if (!ret && need_another_pass) {
ret = bch2_btree_write_buffer_flush_sync(trans);
if (ret)
goto err;
goto again; goto again;
}
err: err:
bch2_trans_put(trans); bch2_trans_put(trans);
return ret; return ret;
} }


@ -17,32 +17,27 @@ int bch2_inode_v3_invalid(struct bch_fs *, struct bkey_s_c,
enum bkey_invalid_flags, struct printbuf *); enum bkey_invalid_flags, struct printbuf *);
void bch2_inode_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c); void bch2_inode_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
int bch2_trans_mark_inode(struct btree_trans *, enum btree_id, unsigned, int bch2_trigger_inode(struct btree_trans *, enum btree_id, unsigned,
struct bkey_s_c, struct bkey_i *, unsigned); struct bkey_s_c, struct bkey_s, unsigned);
int bch2_mark_inode(struct btree_trans *, enum btree_id, unsigned,
struct bkey_s_c, struct bkey_s_c, unsigned);
#define bch2_bkey_ops_inode ((struct bkey_ops) { \ #define bch2_bkey_ops_inode ((struct bkey_ops) { \
.key_invalid = bch2_inode_invalid, \ .key_invalid = bch2_inode_invalid, \
.val_to_text = bch2_inode_to_text, \ .val_to_text = bch2_inode_to_text, \
.trans_trigger = bch2_trans_mark_inode, \ .trigger = bch2_trigger_inode, \
.atomic_trigger = bch2_mark_inode, \
.min_val_size = 16, \ .min_val_size = 16, \
}) })
#define bch2_bkey_ops_inode_v2 ((struct bkey_ops) { \ #define bch2_bkey_ops_inode_v2 ((struct bkey_ops) { \
.key_invalid = bch2_inode_v2_invalid, \ .key_invalid = bch2_inode_v2_invalid, \
.val_to_text = bch2_inode_to_text, \ .val_to_text = bch2_inode_to_text, \
.trans_trigger = bch2_trans_mark_inode, \ .trigger = bch2_trigger_inode, \
.atomic_trigger = bch2_mark_inode, \
.min_val_size = 32, \ .min_val_size = 32, \
}) })
#define bch2_bkey_ops_inode_v3 ((struct bkey_ops) { \ #define bch2_bkey_ops_inode_v3 ((struct bkey_ops) { \
.key_invalid = bch2_inode_v3_invalid, \ .key_invalid = bch2_inode_v3_invalid, \
.val_to_text = bch2_inode_to_text, \ .val_to_text = bch2_inode_to_text, \
.trans_trigger = bch2_trans_mark_inode, \ .trigger = bch2_trigger_inode, \
.atomic_trigger = bch2_mark_inode, \
.min_val_size = 48, \ .min_val_size = 48, \
}) })


@ -34,8 +34,7 @@ int bch2_extent_fallocate(struct btree_trans *trans,
struct open_buckets open_buckets = { 0 }; struct open_buckets open_buckets = { 0 };
struct bkey_s_c k; struct bkey_s_c k;
struct bkey_buf old, new; struct bkey_buf old, new;
unsigned sectors_allocated = 0; unsigned sectors_allocated = 0, new_replicas;
bool have_reservation = false;
bool unwritten = opts.nocow && bool unwritten = opts.nocow &&
c->sb.version >= bcachefs_metadata_version_unwritten_extents; c->sb.version >= bcachefs_metadata_version_unwritten_extents;
int ret; int ret;
@ -50,28 +49,20 @@ int bch2_extent_fallocate(struct btree_trans *trans,
return ret; return ret;
sectors = min_t(u64, sectors, k.k->p.offset - iter->pos.offset); sectors = min_t(u64, sectors, k.k->p.offset - iter->pos.offset);
new_replicas = max(0, (int) opts.data_replicas -
(int) bch2_bkey_nr_ptrs_fully_allocated(k));
if (!have_reservation) { /*
unsigned new_replicas = * Get a disk reservation before (in the nocow case) calling
max(0, (int) opts.data_replicas - * into the allocator:
(int) bch2_bkey_nr_ptrs_fully_allocated(k)); */
/* ret = bch2_disk_reservation_get(c, &disk_res, sectors, new_replicas, 0);
* Get a disk reservation before (in the nocow case) calling if (unlikely(ret))
* into the allocator: goto err_noprint;
*/
ret = bch2_disk_reservation_get(c, &disk_res, sectors, new_replicas, 0);
if (unlikely(ret))
goto err;
bch2_bkey_buf_reassemble(&old, c, k); bch2_bkey_buf_reassemble(&old, c, k);
}
if (have_reservation) { if (!unwritten) {
if (!bch2_extents_match(k, bkey_i_to_s_c(old.k)))
goto err;
bch2_key_resize(&new.k->k, sectors);
} else if (!unwritten) {
struct bkey_i_reservation *reservation; struct bkey_i_reservation *reservation;
bch2_bkey_buf_realloc(&new, c, sizeof(*reservation) / sizeof(u64)); bch2_bkey_buf_realloc(&new, c, sizeof(*reservation) / sizeof(u64));
@ -83,7 +74,6 @@ int bch2_extent_fallocate(struct btree_trans *trans,
struct bkey_i_extent *e; struct bkey_i_extent *e;
struct bch_devs_list devs_have; struct bch_devs_list devs_have;
struct write_point *wp; struct write_point *wp;
struct bch_extent_ptr *ptr;
devs_have.nr = 0; devs_have.nr = 0;
@ -118,14 +108,17 @@ int bch2_extent_fallocate(struct btree_trans *trans,
ptr->unwritten = true; ptr->unwritten = true;
} }
have_reservation = true;
ret = bch2_extent_update(trans, inum, iter, new.k, &disk_res, ret = bch2_extent_update(trans, inum, iter, new.k, &disk_res,
0, i_sectors_delta, true); 0, i_sectors_delta, true);
err: err:
if (!ret && sectors_allocated) if (!ret && sectors_allocated)
bch2_increment_clock(c, sectors_allocated, WRITE); bch2_increment_clock(c, sectors_allocated, WRITE);
if (should_print_err(ret))
bch_err_inum_offset_ratelimited(c,
inum.inum,
iter->pos.offset << 9,
"%s(): error: %s", __func__, bch2_err_str(ret));
err_noprint:
bch2_open_buckets_put(c, &open_buckets); bch2_open_buckets_put(c, &open_buckets);
bch2_disk_reservation_put(c, &disk_res); bch2_disk_reservation_put(c, &disk_res);
bch2_bkey_buf_exit(&new, c); bch2_bkey_buf_exit(&new, c);
@ -256,7 +249,7 @@ static int __bch2_resume_logged_op_truncate(struct btree_trans *trans,
u64 new_i_size = le64_to_cpu(op->v.new_i_size); u64 new_i_size = le64_to_cpu(op->v.new_i_size);
int ret; int ret;
ret = commit_do(trans, NULL, NULL, BTREE_INSERT_NOFAIL, ret = commit_do(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc,
truncate_set_isize(trans, inum, new_i_size)); truncate_set_isize(trans, inum, new_i_size));
if (ret) if (ret)
goto err; goto err;
@ -378,7 +371,7 @@ case LOGGED_OP_FINSERT_start:
op->v.state = LOGGED_OP_FINSERT_shift_extents; op->v.state = LOGGED_OP_FINSERT_shift_extents;
if (insert) { if (insert) {
ret = commit_do(trans, NULL, NULL, BTREE_INSERT_NOFAIL, ret = commit_do(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc,
adjust_i_size(trans, inum, src_offset, len) ?: adjust_i_size(trans, inum, src_offset, len) ?:
bch2_logged_op_update(trans, &op->k_i)); bch2_logged_op_update(trans, &op->k_i));
if (ret) if (ret)
@ -390,7 +383,7 @@ case LOGGED_OP_FINSERT_start:
if (ret && !bch2_err_matches(ret, BCH_ERR_transaction_restart)) if (ret && !bch2_err_matches(ret, BCH_ERR_transaction_restart))
goto err; goto err;
ret = commit_do(trans, NULL, NULL, BTREE_INSERT_NOFAIL, ret = commit_do(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc,
bch2_logged_op_update(trans, &op->k_i)); bch2_logged_op_update(trans, &op->k_i));
} }
@ -455,7 +448,7 @@ case LOGGED_OP_FINSERT_shift_extents:
bch2_btree_insert_trans(trans, BTREE_ID_extents, &delete, 0) ?: bch2_btree_insert_trans(trans, BTREE_ID_extents, &delete, 0) ?:
bch2_btree_insert_trans(trans, BTREE_ID_extents, copy, 0) ?: bch2_btree_insert_trans(trans, BTREE_ID_extents, copy, 0) ?:
bch2_logged_op_update(trans, &op->k_i) ?: bch2_logged_op_update(trans, &op->k_i) ?:
bch2_trans_commit(trans, &disk_res, NULL, BTREE_INSERT_NOFAIL); bch2_trans_commit(trans, &disk_res, NULL, BCH_TRANS_COMMIT_no_enospc);
btree_err: btree_err:
bch2_disk_reservation_put(c, &disk_res); bch2_disk_reservation_put(c, &disk_res);
@ -470,12 +463,12 @@ case LOGGED_OP_FINSERT_shift_extents:
op->v.state = LOGGED_OP_FINSERT_finish; op->v.state = LOGGED_OP_FINSERT_finish;
if (!insert) { if (!insert) {
ret = commit_do(trans, NULL, NULL, BTREE_INSERT_NOFAIL, ret = commit_do(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc,
adjust_i_size(trans, inum, src_offset, shift) ?: adjust_i_size(trans, inum, src_offset, shift) ?:
bch2_logged_op_update(trans, &op->k_i)); bch2_logged_op_update(trans, &op->k_i));
} else { } else {
/* We need an inode update to update bi_journal_seq for fsync: */ /* We need an inode update to update bi_journal_seq for fsync: */
ret = commit_do(trans, NULL, NULL, BTREE_INSERT_NOFAIL, ret = commit_do(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc,
adjust_i_size(trans, inum, 0, 0) ?: adjust_i_size(trans, inum, 0, 0) ?:
bch2_logged_op_update(trans, &op->k_i)); bch2_logged_op_update(trans, &op->k_i));
} }


@ -80,7 +80,7 @@ struct promote_op {
struct bpos pos; struct bpos pos;
struct data_update write; struct data_update write;
struct bio_vec bi_inline_vecs[0]; /* must be last */ struct bio_vec bi_inline_vecs[]; /* must be last */
}; };
static const struct rhashtable_params bch_promote_params = { static const struct rhashtable_params bch_promote_params = {
@ -172,11 +172,13 @@ static struct promote_op *__promote_alloc(struct btree_trans *trans,
int ret; int ret;
if (!bch2_write_ref_tryget(c, BCH_WRITE_REF_promote)) if (!bch2_write_ref_tryget(c, BCH_WRITE_REF_promote))
return NULL; return ERR_PTR(-BCH_ERR_nopromote_no_writes);
op = kzalloc(sizeof(*op) + sizeof(struct bio_vec) * pages, GFP_NOFS); op = kzalloc(sizeof(*op) + sizeof(struct bio_vec) * pages, GFP_KERNEL);
if (!op) if (!op) {
ret = -BCH_ERR_nopromote_enomem;
goto err; goto err;
}
op->start_time = local_clock(); op->start_time = local_clock();
op->pos = pos; op->pos = pos;
@ -187,24 +189,29 @@ static struct promote_op *__promote_alloc(struct btree_trans *trans,
*/ */
*rbio = kzalloc(sizeof(struct bch_read_bio) + *rbio = kzalloc(sizeof(struct bch_read_bio) +
sizeof(struct bio_vec) * pages, sizeof(struct bio_vec) * pages,
GFP_NOFS); GFP_KERNEL);
if (!*rbio) if (!*rbio) {
ret = -BCH_ERR_nopromote_enomem;
goto err; goto err;
}
rbio_init(&(*rbio)->bio, opts); rbio_init(&(*rbio)->bio, opts);
bio_init(&(*rbio)->bio, NULL, (*rbio)->bio.bi_inline_vecs, pages, 0); bio_init(&(*rbio)->bio, NULL, (*rbio)->bio.bi_inline_vecs, pages, 0);
if (bch2_bio_alloc_pages(&(*rbio)->bio, sectors << 9, if (bch2_bio_alloc_pages(&(*rbio)->bio, sectors << 9, GFP_KERNEL)) {
GFP_NOFS)) ret = -BCH_ERR_nopromote_enomem;
goto err; goto err;
}
(*rbio)->bounce = true; (*rbio)->bounce = true;
(*rbio)->split = true; (*rbio)->split = true;
(*rbio)->kmalloc = true; (*rbio)->kmalloc = true;
if (rhashtable_lookup_insert_fast(&c->promote_table, &op->hash, if (rhashtable_lookup_insert_fast(&c->promote_table, &op->hash,
bch_promote_params)) bch_promote_params)) {
ret = -BCH_ERR_nopromote_in_flight;
goto err; goto err;
}
bio = &op->write.op.wbio.bio; bio = &op->write.op.wbio.bio;
bio_init(bio, NULL, bio->bi_inline_vecs, pages, 0); bio_init(bio, NULL, bio->bi_inline_vecs, pages, 0);
@ -223,9 +230,8 @@ static struct promote_op *__promote_alloc(struct btree_trans *trans,
* -BCH_ERR_ENOSPC_disk_reservation: * -BCH_ERR_ENOSPC_disk_reservation:
*/ */
if (ret) { if (ret) {
ret = rhashtable_remove_fast(&c->promote_table, &op->hash, BUG_ON(rhashtable_remove_fast(&c->promote_table, &op->hash,
bch_promote_params); bch_promote_params));
BUG_ON(ret);
goto err; goto err;
} }
@ -239,7 +245,7 @@ static struct promote_op *__promote_alloc(struct btree_trans *trans,
*rbio = NULL; *rbio = NULL;
kfree(op); kfree(op);
bch2_write_ref_put(c, BCH_WRITE_REF_promote); bch2_write_ref_put(c, BCH_WRITE_REF_promote);
return NULL; return ERR_PTR(ret);
} }
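/*
 * Illustrative sketch of the error-pointer convention adopted above:
 * the allocation path now returns ERR_PTR(-BCH_ERR_nopromote_*) instead of
 * NULL, and the caller collapses the pointer into an int with
 * PTR_ERR_OR_ZERO().  The struct and function names below are made up:
 */
struct example_op { u64 start_time; };

static struct example_op *example_alloc(gfp_t gfp)
{
	struct example_op *op = kzalloc(sizeof(*op), gfp);

	if (!op)
		return ERR_PTR(-ENOMEM);
	op->start_time = local_clock();
	return op;
}

static int example_use(void)
{
	struct example_op *op = example_alloc(GFP_KERNEL);
	int ret = PTR_ERR_OR_ZERO(op);

	if (ret)
		return ret;
	/* ... use op ... */
	kfree(op);
	return 0;
}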
noinline noinline
@ -274,10 +280,9 @@ static struct promote_op *promote_alloc(struct btree_trans *trans,
? BTREE_ID_reflink ? BTREE_ID_reflink
: BTREE_ID_extents, : BTREE_ID_extents,
k, pos, pick, opts, sectors, rbio); k, pos, pick, opts, sectors, rbio);
if (!promote) { ret = PTR_ERR_OR_ZERO(promote);
ret = -BCH_ERR_nopromote_enomem; if (ret)
goto nopromote; goto nopromote;
}
*bounce = true; *bounce = true;
*read_full = promote_full; *read_full = promote_full;
@ -526,7 +531,7 @@ static int __bch2_rbio_narrow_crcs(struct btree_trans *trans,
static noinline void bch2_rbio_narrow_crcs(struct bch_read_bio *rbio) static noinline void bch2_rbio_narrow_crcs(struct bch_read_bio *rbio)
{ {
bch2_trans_do(rbio->c, NULL, NULL, BTREE_INSERT_NOFAIL, bch2_trans_do(rbio->c, NULL, NULL, BCH_TRANS_COMMIT_no_enospc,
__bch2_rbio_narrow_crcs(trans, rbio)); __bch2_rbio_narrow_crcs(trans, rbio));
} }
@ -637,12 +642,17 @@ static void __bch2_read_endio(struct work_struct *work)
goto out; goto out;
} }
struct printbuf buf = PRINTBUF;
buf.atomic++;
prt_str(&buf, "data ");
bch2_csum_err_msg(&buf, crc.csum_type, rbio->pick.crc.csum, csum);
bch_err_inum_offset_ratelimited(ca, bch_err_inum_offset_ratelimited(ca,
rbio->read_pos.inode, rbio->read_pos.inode,
rbio->read_pos.offset << 9, rbio->read_pos.offset << 9,
"data checksum error: expected %0llx:%0llx got %0llx:%0llx (type %s)", "data %s", buf.buf);
rbio->pick.crc.csum.hi, rbio->pick.crc.csum.lo, printbuf_exit(&buf);
csum.hi, csum.lo, bch2_csum_types[crc.csum_type]);
bch2_io_error(ca, BCH_MEMBER_ERROR_checksum); bch2_io_error(ca, BCH_MEMBER_ERROR_checksum);
bch2_rbio_error(rbio, READ_RETRY_AVOID, BLK_STS_IOERR); bch2_rbio_error(rbio, READ_RETRY_AVOID, BLK_STS_IOERR);
goto out; goto out;


@ -316,8 +316,8 @@ int bch2_extent_update(struct btree_trans *trans,
i_sectors_delta) ?: i_sectors_delta) ?:
bch2_trans_update(trans, iter, k, 0) ?: bch2_trans_update(trans, iter, k, 0) ?:
bch2_trans_commit(trans, disk_res, NULL, bch2_trans_commit(trans, disk_res, NULL,
BTREE_INSERT_NOCHECK_RW| BCH_TRANS_COMMIT_no_check_rw|
BTREE_INSERT_NOFAIL); BCH_TRANS_COMMIT_no_enospc);
if (unlikely(ret)) if (unlikely(ret))
return ret; return ret;
@ -396,17 +396,14 @@ void bch2_submit_wbio_replicas(struct bch_write_bio *wbio, struct bch_fs *c,
bool nocow) bool nocow)
{ {
struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(bkey_i_to_s_c(k)); struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(bkey_i_to_s_c(k));
const struct bch_extent_ptr *ptr;
struct bch_write_bio *n; struct bch_write_bio *n;
struct bch_dev *ca;
BUG_ON(c->opts.nochanges); BUG_ON(c->opts.nochanges);
bkey_for_each_ptr(ptrs, ptr) { bkey_for_each_ptr(ptrs, ptr) {
BUG_ON(ptr->dev >= BCH_SB_MEMBERS_MAX || BUG_ON(!bch2_dev_exists2(c, ptr->dev));
!c->devs[ptr->dev]);
ca = bch_dev_bkey_exists(c, ptr->dev); struct bch_dev *ca = bch_dev_bkey_exists(c, ptr->dev);
if (to_entry(ptr + 1) < ptrs.end) { if (to_entry(ptr + 1) < ptrs.end) {
n = to_wbio(bio_alloc_clone(NULL, &wbio->bio, n = to_wbio(bio_alloc_clone(NULL, &wbio->bio,
@ -1109,16 +1106,14 @@ static bool bch2_extent_is_writeable(struct bch_write_op *op,
static inline void bch2_nocow_write_unlock(struct bch_write_op *op) static inline void bch2_nocow_write_unlock(struct bch_write_op *op)
{ {
struct bch_fs *c = op->c; struct bch_fs *c = op->c;
const struct bch_extent_ptr *ptr;
struct bkey_i *k;
for_each_keylist_key(&op->insert_keys, k) { for_each_keylist_key(&op->insert_keys, k) {
struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(bkey_i_to_s_c(k)); struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(bkey_i_to_s_c(k));
bkey_for_each_ptr(ptrs, ptr) bkey_for_each_ptr(ptrs, ptr)
bch2_bucket_nocow_unlock(&c->nocow_locks, bch2_bucket_nocow_unlock(&c->nocow_locks,
PTR_BUCKET_POS(c, ptr), PTR_BUCKET_POS(c, ptr),
BUCKET_NOCOW_LOCK_UPDATE); BUCKET_NOCOW_LOCK_UPDATE);
} }
} }
@ -1128,25 +1123,20 @@ static int bch2_nocow_write_convert_one_unwritten(struct btree_trans *trans,
struct bkey_s_c k, struct bkey_s_c k,
u64 new_i_size) u64 new_i_size)
{ {
struct bkey_i *new;
struct bkey_ptrs ptrs;
struct bch_extent_ptr *ptr;
int ret;
if (!bch2_extents_match(bkey_i_to_s_c(orig), k)) { if (!bch2_extents_match(bkey_i_to_s_c(orig), k)) {
/* trace this */ /* trace this */
return 0; return 0;
} }
new = bch2_bkey_make_mut_noupdate(trans, k); struct bkey_i *new = bch2_bkey_make_mut_noupdate(trans, k);
ret = PTR_ERR_OR_ZERO(new); int ret = PTR_ERR_OR_ZERO(new);
if (ret) if (ret)
return ret; return ret;
bch2_cut_front(bkey_start_pos(&orig->k), new); bch2_cut_front(bkey_start_pos(&orig->k), new);
bch2_cut_back(orig->k.p, new); bch2_cut_back(orig->k.p, new);
ptrs = bch2_bkey_ptrs(bkey_i_to_s(new)); struct bkey_ptrs ptrs = bch2_bkey_ptrs(bkey_i_to_s(new));
bkey_for_each_ptr(ptrs, ptr) bkey_for_each_ptr(ptrs, ptr)
ptr->unwritten = 0; ptr->unwritten = 0;
@ -1167,16 +1157,12 @@ static void bch2_nocow_write_convert_unwritten(struct bch_write_op *op)
{ {
struct bch_fs *c = op->c; struct bch_fs *c = op->c;
struct btree_trans *trans = bch2_trans_get(c); struct btree_trans *trans = bch2_trans_get(c);
struct btree_iter iter;
struct bkey_i *orig;
struct bkey_s_c k;
int ret;
for_each_keylist_key(&op->insert_keys, orig) { for_each_keylist_key(&op->insert_keys, orig) {
ret = for_each_btree_key_upto_commit(trans, iter, BTREE_ID_extents, int ret = for_each_btree_key_upto_commit(trans, iter, BTREE_ID_extents,
bkey_start_pos(&orig->k), orig->k.p, bkey_start_pos(&orig->k), orig->k.p,
BTREE_ITER_INTENT, k, BTREE_ITER_INTENT, k,
NULL, NULL, BTREE_INSERT_NOFAIL, ({ NULL, NULL, BCH_TRANS_COMMIT_no_enospc, ({
bch2_nocow_write_convert_one_unwritten(trans, &iter, orig, k, op->new_i_size); bch2_nocow_write_convert_one_unwritten(trans, &iter, orig, k, op->new_i_size);
})); }));
@ -1228,10 +1214,7 @@ static void bch2_nocow_write(struct bch_write_op *op)
struct btree_trans *trans; struct btree_trans *trans;
struct btree_iter iter; struct btree_iter iter;
struct bkey_s_c k; struct bkey_s_c k;
struct bkey_ptrs_c ptrs;
const struct bch_extent_ptr *ptr;
DARRAY_PREALLOCATED(struct bucket_to_lock, 3) buckets; DARRAY_PREALLOCATED(struct bucket_to_lock, 3) buckets;
struct bucket_to_lock *i;
u32 snapshot; u32 snapshot;
struct bucket_to_lock *stale_at; struct bucket_to_lock *stale_at;
int ret; int ret;
@ -1273,7 +1256,7 @@ static void bch2_nocow_write(struct bch_write_op *op)
break; break;
/* Get iorefs before dropping btree locks: */ /* Get iorefs before dropping btree locks: */
ptrs = bch2_bkey_ptrs_c(k); struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
bkey_for_each_ptr(ptrs, ptr) { bkey_for_each_ptr(ptrs, ptr) {
struct bpos b = PTR_BUCKET_POS(c, ptr); struct bpos b = PTR_BUCKET_POS(c, ptr);
struct nocow_lock_bucket *l = struct nocow_lock_bucket *l =
@ -1464,6 +1447,10 @@ static void __bch2_write(struct bch_write_op *op)
op->flags |= BCH_WRITE_DONE; op->flags |= BCH_WRITE_DONE;
if (ret < 0) { if (ret < 0) {
bch_err_inum_offset_ratelimited(c,
op->pos.inode,
op->pos.offset << 9,
"%s(): error: %s", __func__, bch2_err_str(ret));
op->error = ret; op->error = ret;
break; break;
} }


@ -10,6 +10,7 @@
#include "bkey_methods.h" #include "bkey_methods.h"
#include "btree_gc.h" #include "btree_gc.h"
#include "btree_update.h" #include "btree_update.h"
#include "btree_write_buffer.h"
#include "buckets.h" #include "buckets.h"
#include "error.h" #include "error.h"
#include "journal.h" #include "journal.h"
@ -184,6 +185,8 @@ static void __journal_entry_close(struct journal *j, unsigned closed_val)
/* Close out old buffer: */ /* Close out old buffer: */
buf->data->u64s = cpu_to_le32(old.cur_entry_offset); buf->data->u64s = cpu_to_le32(old.cur_entry_offset);
trace_journal_entry_close(c, vstruct_bytes(buf->data));
sectors = vstruct_blocks_plus(buf->data, c->block_bits, sectors = vstruct_blocks_plus(buf->data, c->block_bits,
buf->u64s_reserved) << c->block_bits; buf->u64s_reserved) << c->block_bits;
BUG_ON(sectors > buf->sectors); BUG_ON(sectors > buf->sectors);
@ -330,6 +333,7 @@ static int journal_entry_open(struct journal *j)
buf->must_flush = false; buf->must_flush = false;
buf->separate_flush = false; buf->separate_flush = false;
buf->flush_time = 0; buf->flush_time = 0;
buf->need_flush_to_write_buffer = true;
memset(buf->data, 0, sizeof(*buf->data)); memset(buf->data, 0, sizeof(*buf->data));
buf->data->seq = cpu_to_le64(journal_cur_seq(j)); buf->data->seq = cpu_to_le64(journal_cur_seq(j));
@ -363,11 +367,6 @@ static int journal_entry_open(struct journal *j)
} while ((v = atomic64_cmpxchg(&j->reservations.counter, } while ((v = atomic64_cmpxchg(&j->reservations.counter,
old.v, new.v)) != old.v); old.v, new.v)) != old.v);
if (j->res_get_blocked_start)
bch2_time_stats_update(j->blocked_time,
j->res_get_blocked_start);
j->res_get_blocked_start = 0;
mod_delayed_work(c->io_complete_wq, mod_delayed_work(c->io_complete_wq,
&j->write_work, &j->write_work,
msecs_to_jiffies(c->opts.journal_flush_delay)); msecs_to_jiffies(c->opts.journal_flush_delay));
@ -467,15 +466,12 @@ static int __journal_res_get(struct journal *j, struct journal_res *res,
__journal_entry_close(j, JOURNAL_ENTRY_CLOSED_VAL); __journal_entry_close(j, JOURNAL_ENTRY_CLOSED_VAL);
ret = journal_entry_open(j); ret = journal_entry_open(j);
if (ret == JOURNAL_ERR_max_in_flight) if (ret == JOURNAL_ERR_max_in_flight) {
track_event_change(&c->times[BCH_TIME_blocked_journal_max_in_flight],
&j->max_in_flight_start, true);
trace_and_count(c, journal_entry_full, c); trace_and_count(c, journal_entry_full, c);
unlock:
if ((ret && ret != JOURNAL_ERR_insufficient_devices) &&
!j->res_get_blocked_start) {
j->res_get_blocked_start = local_clock() ?: 1;
trace_and_count(c, journal_full, c);
} }
unlock:
can_discard = j->can_discard; can_discard = j->can_discard;
spin_unlock(&j->lock); spin_unlock(&j->lock);
@ -774,6 +770,48 @@ void bch2_journal_block(struct journal *j)
journal_quiesce(j); journal_quiesce(j);
} }
static struct journal_buf *__bch2_next_write_buffer_flush_journal_buf(struct journal *j, u64 max_seq)
{
struct journal_buf *ret = NULL;
mutex_lock(&j->buf_lock);
spin_lock(&j->lock);
max_seq = min(max_seq, journal_cur_seq(j));
for (u64 seq = journal_last_unwritten_seq(j);
seq <= max_seq;
seq++) {
unsigned idx = seq & JOURNAL_BUF_MASK;
struct journal_buf *buf = j->buf + idx;
if (buf->need_flush_to_write_buffer) {
if (seq == journal_cur_seq(j))
__journal_entry_close(j, JOURNAL_ENTRY_CLOSED_VAL);
union journal_res_state s;
s.v = atomic64_read_acquire(&j->reservations.counter);
ret = journal_state_count(s, idx)
? ERR_PTR(-EAGAIN)
: buf;
break;
}
}
spin_unlock(&j->lock);
if (IS_ERR_OR_NULL(ret))
mutex_unlock(&j->buf_lock);
return ret;
}
struct journal_buf *bch2_next_write_buffer_flush_journal_buf(struct journal *j, u64 max_seq)
{
struct journal_buf *ret;
wait_event(j->wait, (ret = __bch2_next_write_buffer_flush_journal_buf(j, max_seq)) != ERR_PTR(-EAGAIN));
return ret;
}
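/*
 * Illustrative sketch of a caller of the helper added above: a non-NULL
 * buffer is returned with j->buf_lock held (taken in
 * __bch2_next_write_buffer_flush_journal_buf()), while NULL means nothing
 * is left to flush up to max_seq.  example_move_keys() is a placeholder
 * for the real copy into the btree write buffer:
 */
static int example_flush_up_to(struct journal *j, u64 max_seq)
{
	struct journal_buf *buf =
		bch2_next_write_buffer_flush_journal_buf(j, max_seq);

	if (!buf)
		return 0;

	int ret = example_move_keys(j, buf);	/* placeholder */
	mutex_unlock(&j->buf_lock);
	return ret;
}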
/* allocate journal on a device: */ /* allocate journal on a device: */
static int __bch2_set_nr_journal_buckets(struct bch_dev *ca, unsigned nr, static int __bch2_set_nr_journal_buckets(struct bch_dev *ca, unsigned nr,
@ -955,8 +993,7 @@ int bch2_set_nr_journal_buckets(struct bch_fs *c, struct bch_dev *ca,
break; break;
} }
if (ret) bch_err_fn(c, ret);
bch_err_fn(c, ret);
unlock: unlock:
up_write(&c->state_lock); up_write(&c->state_lock);
return ret; return ret;
@ -986,17 +1023,13 @@ int bch2_dev_journal_alloc(struct bch_dev *ca)
ret = __bch2_set_nr_journal_buckets(ca, nr, true, NULL); ret = __bch2_set_nr_journal_buckets(ca, nr, true, NULL);
err: err:
if (ret) bch_err_fn(ca, ret);
bch_err_fn(ca, ret);
return ret; return ret;
} }
int bch2_fs_journal_alloc(struct bch_fs *c) int bch2_fs_journal_alloc(struct bch_fs *c)
{ {
struct bch_dev *ca; for_each_online_member(c, ca) {
unsigned i;
for_each_online_member(ca, c, i) {
if (ca->journal.nr) if (ca->journal.nr)
continue; continue;
@ -1225,6 +1258,7 @@ int bch2_fs_journal_init(struct journal *j)
static struct lock_class_key res_key; static struct lock_class_key res_key;
unsigned i; unsigned i;
mutex_init(&j->buf_lock);
spin_lock_init(&j->lock); spin_lock_init(&j->lock);
spin_lock_init(&j->err_lock); spin_lock_init(&j->err_lock);
init_waitqueue_head(&j->wait); init_waitqueue_head(&j->wait);
@ -1260,10 +1294,8 @@ void __bch2_journal_debug_to_text(struct printbuf *out, struct journal *j)
{ {
struct bch_fs *c = container_of(j, struct bch_fs, journal); struct bch_fs *c = container_of(j, struct bch_fs, journal);
union journal_res_state s; union journal_res_state s;
struct bch_dev *ca;
unsigned long now = jiffies; unsigned long now = jiffies;
u64 seq; u64 nr_writes = j->nr_flush_writes + j->nr_noflush_writes;
unsigned i;
if (!out->nr_tabstops) if (!out->nr_tabstops)
printbuf_tabstop_push(out, 24); printbuf_tabstop_push(out, 24);
@ -1275,20 +1307,23 @@ void __bch2_journal_debug_to_text(struct printbuf *out, struct journal *j)
prt_printf(out, "dirty journal entries:\t%llu/%llu\n", fifo_used(&j->pin), j->pin.size); prt_printf(out, "dirty journal entries:\t%llu/%llu\n", fifo_used(&j->pin), j->pin.size);
prt_printf(out, "seq:\t\t\t%llu\n", journal_cur_seq(j)); prt_printf(out, "seq:\t\t\t%llu\n", journal_cur_seq(j));
prt_printf(out, "seq_ondisk:\t\t%llu\n", j->seq_ondisk); prt_printf(out, "seq_ondisk:\t\t%llu\n", j->seq_ondisk);
prt_printf(out, "last_seq:\t\t%llu\n", journal_last_seq(j)); prt_printf(out, "last_seq:\t\t%llu\n", journal_last_seq(j));
prt_printf(out, "last_seq_ondisk:\t%llu\n", j->last_seq_ondisk); prt_printf(out, "last_seq_ondisk:\t%llu\n", j->last_seq_ondisk);
prt_printf(out, "flushed_seq_ondisk:\t%llu\n", j->flushed_seq_ondisk); prt_printf(out, "flushed_seq_ondisk:\t%llu\n", j->flushed_seq_ondisk);
prt_printf(out, "watermark:\t\t%s\n", bch2_watermarks[j->watermark]); prt_printf(out, "watermark:\t\t%s\n", bch2_watermarks[j->watermark]);
prt_printf(out, "each entry reserved:\t%u\n", j->entry_u64s_reserved); prt_printf(out, "each entry reserved:\t%u\n", j->entry_u64s_reserved);
prt_printf(out, "nr flush writes:\t%llu\n", j->nr_flush_writes); prt_printf(out, "nr flush writes:\t%llu\n", j->nr_flush_writes);
prt_printf(out, "nr noflush writes:\t%llu\n", j->nr_noflush_writes); prt_printf(out, "nr noflush writes:\t%llu\n", j->nr_noflush_writes);
prt_printf(out, "nr direct reclaim:\t%llu\n", j->nr_direct_reclaim); prt_printf(out, "average write size:\t");
prt_human_readable_u64(out, nr_writes ? div64_u64(j->entry_bytes_written, nr_writes) : 0);
prt_newline(out);
prt_printf(out, "nr direct reclaim:\t%llu\n", j->nr_direct_reclaim);
prt_printf(out, "nr background reclaim:\t%llu\n", j->nr_background_reclaim); prt_printf(out, "nr background reclaim:\t%llu\n", j->nr_background_reclaim);
prt_printf(out, "reclaim kicked:\t\t%u\n", j->reclaim_kicked); prt_printf(out, "reclaim kicked:\t\t%u\n", j->reclaim_kicked);
prt_printf(out, "reclaim runs in:\t%u ms\n", time_after(j->next_reclaim, now) prt_printf(out, "reclaim runs in:\t%u ms\n", time_after(j->next_reclaim, now)
? jiffies_to_msecs(j->next_reclaim - jiffies) : 0); ? jiffies_to_msecs(j->next_reclaim - jiffies) : 0);
prt_printf(out, "current entry sectors:\t%u\n", j->cur_entry_sectors); prt_printf(out, "current entry sectors:\t%u\n", j->cur_entry_sectors);
prt_printf(out, "current entry error:\t%s\n", bch2_journal_errors[j->cur_entry_error]); prt_printf(out, "current entry error:\t%s\n", bch2_journal_errors[j->cur_entry_error]);
prt_printf(out, "current entry:\t\t"); prt_printf(out, "current entry:\t\t");
switch (s.cur_entry_offset) { switch (s.cur_entry_offset) {
@ -1305,10 +1340,10 @@ void __bch2_journal_debug_to_text(struct printbuf *out, struct journal *j)
prt_newline(out); prt_newline(out);
for (seq = journal_cur_seq(j); for (u64 seq = journal_cur_seq(j);
seq >= journal_last_unwritten_seq(j); seq >= journal_last_unwritten_seq(j);
--seq) { --seq) {
i = seq & JOURNAL_BUF_MASK; unsigned i = seq & JOURNAL_BUF_MASK;
prt_printf(out, "unwritten entry:"); prt_printf(out, "unwritten entry:");
prt_tab(out); prt_tab(out);
@ -1352,8 +1387,7 @@ void __bch2_journal_debug_to_text(struct printbuf *out, struct journal *j)
j->space[journal_space_total].next_entry, j->space[journal_space_total].next_entry,
j->space[journal_space_total].total); j->space[journal_space_total].total);
for_each_member_device_rcu(ca, c, i, for_each_member_device_rcu(c, ca, &c->rw_devs[BCH_DATA_journal]) {
&c->rw_devs[BCH_DATA_journal]) {
struct journal_device *ja = &ca->journal; struct journal_device *ja = &ca->journal;
if (!test_bit(ca->dev_idx, c->rw_devs[BCH_DATA_journal].d)) if (!test_bit(ca->dev_idx, c->rw_devs[BCH_DATA_journal].d))
@ -1362,7 +1396,7 @@ void __bch2_journal_debug_to_text(struct printbuf *out, struct journal *j)
if (!ja->nr) if (!ja->nr)
continue; continue;
prt_printf(out, "dev %u:\n", i); prt_printf(out, "dev %u:\n", ca->dev_idx);
prt_printf(out, "\tnr\t\t%u\n", ja->nr); prt_printf(out, "\tnr\t\t%u\n", ja->nr);
prt_printf(out, "\tbucket size\t%u\n", ca->mi.bucket_size); prt_printf(out, "\tbucket size\t%u\n", ca->mi.bucket_size);
prt_printf(out, "\tavailable\t%u:%u\n", bch2_journal_dev_buckets_available(j, ja, journal_space_discarded), ja->sectors_free); prt_printf(out, "\tavailable\t%u:%u\n", bch2_journal_dev_buckets_available(j, ja, journal_space_discarded), ja->sectors_free);


@ -119,7 +119,6 @@ static inline void journal_wake(struct journal *j)
{ {
wake_up(&j->wait); wake_up(&j->wait);
closure_wake_up(&j->async_wait); closure_wake_up(&j->async_wait);
closure_wake_up(&j->preres_wait);
} }
static inline struct journal_buf *journal_cur_buf(struct journal *j) static inline struct journal_buf *journal_cur_buf(struct journal *j)
@ -239,8 +238,6 @@ bch2_journal_add_entry(struct journal *j, struct journal_res *res,
static inline bool journal_entry_empty(struct jset *j) static inline bool journal_entry_empty(struct jset *j)
{ {
struct jset_entry *i;
if (j->seq != j->last_seq) if (j->seq != j->last_seq)
return false; return false;
@ -426,6 +423,7 @@ static inline void bch2_journal_set_replay_done(struct journal *j)
void bch2_journal_unblock(struct journal *); void bch2_journal_unblock(struct journal *);
void bch2_journal_block(struct journal *); void bch2_journal_block(struct journal *);
struct journal_buf *bch2_next_write_buffer_flush_journal_buf(struct journal *j, u64 max_seq);
void __bch2_journal_debug_to_text(struct printbuf *, struct journal *); void __bch2_journal_debug_to_text(struct printbuf *, struct journal *);
void bch2_journal_debug_to_text(struct printbuf *, struct journal *); void bch2_journal_debug_to_text(struct printbuf *, struct journal *);


@ -4,6 +4,7 @@
#include "alloc_foreground.h" #include "alloc_foreground.h"
#include "btree_io.h" #include "btree_io.h"
#include "btree_update_interior.h" #include "btree_update_interior.h"
#include "btree_write_buffer.h"
#include "buckets.h" #include "buckets.h"
#include "checksum.h" #include "checksum.h"
#include "disk_groups.h" #include "disk_groups.h"
@ -26,11 +27,15 @@ static struct nonce journal_nonce(const struct jset *jset)
}}; }};
} }
static bool jset_csum_good(struct bch_fs *c, struct jset *j) static bool jset_csum_good(struct bch_fs *c, struct jset *j, struct bch_csum *csum)
{ {
return bch2_checksum_type_valid(c, JSET_CSUM_TYPE(j)) && if (!bch2_checksum_type_valid(c, JSET_CSUM_TYPE(j))) {
!bch2_crc_cmp(j->csum, *csum = (struct bch_csum) {};
csum_vstruct(c, JSET_CSUM_TYPE(j), journal_nonce(j), j)); return false;
}
*csum = csum_vstruct(c, JSET_CSUM_TYPE(j), journal_nonce(j), j);
return !bch2_crc_cmp(j->csum, *csum);
} }
static inline u32 journal_entry_radix_idx(struct bch_fs *c, u64 seq) static inline u32 journal_entry_radix_idx(struct bch_fs *c, u64 seq)
@ -687,8 +692,6 @@ static void journal_entry_dev_usage_to_text(struct printbuf *out, struct bch_fs
le64_to_cpu(u->d[i].sectors), le64_to_cpu(u->d[i].sectors),
le64_to_cpu(u->d[i].fragmented)); le64_to_cpu(u->d[i].fragmented));
} }
prt_printf(out, " buckets_ec: %llu", le64_to_cpu(u->buckets_ec));
} }
static int journal_entry_log_validate(struct bch_fs *c, static int journal_entry_log_validate(struct bch_fs *c,
@ -725,6 +728,22 @@ static void journal_entry_overwrite_to_text(struct printbuf *out, struct bch_fs
journal_entry_btree_keys_to_text(out, c, entry); journal_entry_btree_keys_to_text(out, c, entry);
} }
static int journal_entry_write_buffer_keys_validate(struct bch_fs *c,
struct jset *jset,
struct jset_entry *entry,
unsigned version, int big_endian,
enum bkey_invalid_flags flags)
{
return journal_entry_btree_keys_validate(c, jset, entry,
version, big_endian, READ);
}
static void journal_entry_write_buffer_keys_to_text(struct printbuf *out, struct bch_fs *c,
struct jset_entry *entry)
{
journal_entry_btree_keys_to_text(out, c, entry);
}
struct jset_entry_ops { struct jset_entry_ops {
int (*validate)(struct bch_fs *, struct jset *, int (*validate)(struct bch_fs *, struct jset *,
struct jset_entry *, unsigned, int, struct jset_entry *, unsigned, int,
@ -768,7 +787,6 @@ void bch2_journal_entry_to_text(struct printbuf *out, struct bch_fs *c,
static int jset_validate_entries(struct bch_fs *c, struct jset *jset, static int jset_validate_entries(struct bch_fs *c, struct jset *jset,
enum bkey_invalid_flags flags) enum bkey_invalid_flags flags)
{ {
struct jset_entry *entry;
unsigned version = le32_to_cpu(jset->version); unsigned version = le32_to_cpu(jset->version);
int ret = 0; int ret = 0;
@ -920,6 +938,7 @@ static int journal_read_bucket(struct bch_dev *ca,
u64 offset = bucket_to_sector(ca, ja->buckets[bucket]), u64 offset = bucket_to_sector(ca, ja->buckets[bucket]),
end = offset + ca->mi.bucket_size; end = offset + ca->mi.bucket_size;
bool saw_bad = false, csum_good; bool saw_bad = false, csum_good;
struct printbuf err = PRINTBUF;
int ret = 0; int ret = 0;
pr_debug("reading %u", bucket); pr_debug("reading %u", bucket);
@ -952,7 +971,7 @@ static int journal_read_bucket(struct bch_dev *ca,
* found on a different device, and missing or * found on a different device, and missing or
* no journal entries will be handled later * no journal entries will be handled later
*/ */
return 0; goto out;
} }
j = buf->data; j = buf->data;
@ -969,12 +988,12 @@ static int journal_read_bucket(struct bch_dev *ca,
ret = journal_read_buf_realloc(buf, ret = journal_read_buf_realloc(buf,
vstruct_bytes(j)); vstruct_bytes(j));
if (ret) if (ret)
return ret; goto err;
} }
goto reread; goto reread;
case JOURNAL_ENTRY_NONE: case JOURNAL_ENTRY_NONE:
if (!saw_bad) if (!saw_bad)
return 0; goto out;
/* /*
* On checksum error we don't really trust the size * On checksum error we don't really trust the size
* field of the journal entry we read, so try reading * field of the journal entry we read, so try reading
@ -983,7 +1002,7 @@ static int journal_read_bucket(struct bch_dev *ca,
sectors = block_sectors(c); sectors = block_sectors(c);
goto next_block; goto next_block;
default: default:
return ret; goto err;
} }
/* /*
@ -993,20 +1012,28 @@ static int journal_read_bucket(struct bch_dev *ca,
* bucket: * bucket:
*/ */
if (le64_to_cpu(j->seq) < ja->bucket_seq[bucket]) if (le64_to_cpu(j->seq) < ja->bucket_seq[bucket])
return 0; goto out;
ja->bucket_seq[bucket] = le64_to_cpu(j->seq); ja->bucket_seq[bucket] = le64_to_cpu(j->seq);
csum_good = jset_csum_good(c, j); enum bch_csum_type csum_type = JSET_CSUM_TYPE(j);
struct bch_csum csum;
csum_good = jset_csum_good(c, j, &csum);
if (bch2_dev_io_err_on(!csum_good, ca, BCH_MEMBER_ERROR_checksum, if (bch2_dev_io_err_on(!csum_good, ca, BCH_MEMBER_ERROR_checksum,
"journal checksum error")) "%s",
(printbuf_reset(&err),
prt_str(&err, "journal "),
bch2_csum_err_msg(&err, csum_type, j->csum, csum),
err.buf)))
saw_bad = true; saw_bad = true;
ret = bch2_encrypt(c, JSET_CSUM_TYPE(j), journal_nonce(j), ret = bch2_encrypt(c, JSET_CSUM_TYPE(j), journal_nonce(j),
j->encrypted_start, j->encrypted_start,
vstruct_end(j) - (void *) j->encrypted_start); vstruct_end(j) - (void *) j->encrypted_start);
bch2_fs_fatal_err_on(ret, c, bch2_fs_fatal_err_on(ret, c,
"error decrypting journal entry: %i", ret); "error decrypting journal entry: %s",
bch2_err_str(ret));
mutex_lock(&jlist->lock); mutex_lock(&jlist->lock);
ret = journal_entry_add(c, ca, (struct journal_ptr) { ret = journal_entry_add(c, ca, (struct journal_ptr) {
@ -1025,7 +1052,7 @@ static int journal_read_bucket(struct bch_dev *ca,
case JOURNAL_ENTRY_ADD_OUT_OF_RANGE: case JOURNAL_ENTRY_ADD_OUT_OF_RANGE:
break; break;
default: default:
return ret; goto err;
} }
next_block: next_block:
pr_debug("next"); pr_debug("next");
@ -1034,7 +1061,11 @@ static int journal_read_bucket(struct bch_dev *ca,
j = ((void *) j) + (sectors << 9); j = ((void *) j) + (sectors << 9);
} }
return 0; out:
ret = 0;
err:
printbuf_exit(&err);
return ret;
} }
static CLOSURE_CALLBACK(bch2_journal_read_device) static CLOSURE_CALLBACK(bch2_journal_read_device)
@ -1156,8 +1187,6 @@ int bch2_journal_read(struct bch_fs *c,
struct journal_list jlist; struct journal_list jlist;
struct journal_replay *i, **_i, *prev = NULL; struct journal_replay *i, **_i, *prev = NULL;
struct genradix_iter radix_iter; struct genradix_iter radix_iter;
struct bch_dev *ca;
unsigned iter;
struct printbuf buf = PRINTBUF; struct printbuf buf = PRINTBUF;
bool degraded = false, last_write_torn = false; bool degraded = false, last_write_torn = false;
u64 seq; u64 seq;
@ -1168,7 +1197,7 @@ int bch2_journal_read(struct bch_fs *c,
jlist.last_seq = 0; jlist.last_seq = 0;
jlist.ret = 0; jlist.ret = 0;
for_each_member_device(ca, c, iter) { for_each_member_device(c, ca) {
if (!c->opts.fsck && if (!c->opts.fsck &&
!(bch2_dev_has_data(c, ca) & (1 << BCH_DATA_journal))) !(bch2_dev_has_data(c, ca) & (1 << BCH_DATA_journal)))
continue; continue;
@ -1334,7 +1363,7 @@ int bch2_journal_read(struct bch_fs *c,
continue; continue;
for (ptr = 0; ptr < i->nr_ptrs; ptr++) { for (ptr = 0; ptr < i->nr_ptrs; ptr++) {
ca = bch_dev_bkey_exists(c, i->ptrs[ptr].dev); struct bch_dev *ca = bch_dev_bkey_exists(c, i->ptrs[ptr].dev);
if (!i->ptrs[ptr].csum_good) if (!i->ptrs[ptr].csum_good)
bch_err_dev_offset(ca, i->ptrs[ptr].sector, bch_err_dev_offset(ca, i->ptrs[ptr].sector,
@ -1505,6 +1534,8 @@ static int journal_write_alloc(struct journal *j, struct journal_buf *w)
static void journal_buf_realloc(struct journal *j, struct journal_buf *buf) static void journal_buf_realloc(struct journal *j, struct journal_buf *buf)
{ {
struct bch_fs *c = container_of(j, struct bch_fs, journal);
/* we aren't holding j->lock: */ /* we aren't holding j->lock: */
unsigned new_size = READ_ONCE(j->buf_size_want); unsigned new_size = READ_ONCE(j->buf_size_want);
void *new_buf; void *new_buf;
@ -1512,6 +1543,11 @@ static void journal_buf_realloc(struct journal *j, struct journal_buf *buf)
if (buf->buf_size >= new_size) if (buf->buf_size >= new_size)
return; return;
size_t btree_write_buffer_size = new_size / 64;
if (bch2_btree_write_buffer_resize(c, btree_write_buffer_size))
return;
new_buf = kvpmalloc(new_size, GFP_NOFS|__GFP_NOWARN); new_buf = kvpmalloc(new_size, GFP_NOFS|__GFP_NOWARN);
if (!new_buf) if (!new_buf)
return; return;
@ -1604,6 +1640,9 @@ static CLOSURE_CALLBACK(journal_write_done)
bch2_journal_reclaim_fast(j); bch2_journal_reclaim_fast(j);
bch2_journal_space_available(j); bch2_journal_space_available(j);
track_event_change(&c->times[BCH_TIME_blocked_journal_max_in_flight],
&j->max_in_flight_start, false);
closure_wake_up(&w->wait); closure_wake_up(&w->wait);
journal_wake(j); journal_wake(j);
@ -1656,7 +1695,6 @@ static CLOSURE_CALLBACK(do_journal_write)
struct bch_fs *c = container_of(j, struct bch_fs, journal); struct bch_fs *c = container_of(j, struct bch_fs, journal);
struct bch_dev *ca; struct bch_dev *ca;
struct journal_buf *w = journal_last_unwritten_buf(j); struct journal_buf *w = journal_last_unwritten_buf(j);
struct bch_extent_ptr *ptr;
struct bio *bio; struct bio *bio;
unsigned sectors = vstruct_sectors(w->data, c->block_bits); unsigned sectors = vstruct_sectors(w->data, c->block_bits);
@ -1700,11 +1738,13 @@ static CLOSURE_CALLBACK(do_journal_write)
static int bch2_journal_write_prep(struct journal *j, struct journal_buf *w) static int bch2_journal_write_prep(struct journal *j, struct journal_buf *w)
{ {
struct bch_fs *c = container_of(j, struct bch_fs, journal); struct bch_fs *c = container_of(j, struct bch_fs, journal);
struct jset_entry *start, *end, *i, *next, *prev = NULL; struct jset_entry *start, *end;
struct jset *jset = w->data; struct jset *jset = w->data;
struct journal_keys_to_wb wb = { NULL };
unsigned sectors, bytes, u64s; unsigned sectors, bytes, u64s;
bool validate_before_checksum = false;
unsigned long btree_roots_have = 0; unsigned long btree_roots_have = 0;
bool validate_before_checksum = false;
u64 seq = le64_to_cpu(jset->seq);
int ret; int ret;
/* /*
@ -1715,7 +1755,7 @@ static int bch2_journal_write_prep(struct journal *j, struct journal_buf *w)
* If we wanted to be really fancy here, we could sort all the keys in * If we wanted to be really fancy here, we could sort all the keys in
* the jset and drop keys that were overwritten - probably not worth it: * the jset and drop keys that were overwritten - probably not worth it:
*/ */
vstruct_for_each_safe(jset, i, next) { vstruct_for_each(jset, i) {
unsigned u64s = le16_to_cpu(i->u64s); unsigned u64s = le16_to_cpu(i->u64s);
/* Empty entry: */ /* Empty entry: */
@ -1732,40 +1772,40 @@ static int bch2_journal_write_prep(struct journal *j, struct journal_buf *w)
* to c->btree_roots we have to get any missing btree roots and * to c->btree_roots we have to get any missing btree roots and
* add them to this journal entry: * add them to this journal entry:
*/ */
if (i->type == BCH_JSET_ENTRY_btree_root) { switch (i->type) {
case BCH_JSET_ENTRY_btree_root:
bch2_journal_entry_to_btree_root(c, i); bch2_journal_entry_to_btree_root(c, i);
__set_bit(i->btree_id, &btree_roots_have); __set_bit(i->btree_id, &btree_roots_have);
} break;
case BCH_JSET_ENTRY_write_buffer_keys:
EBUG_ON(!w->need_flush_to_write_buffer);
/* Can we merge with previous entry? */ if (!wb.wb)
if (prev && bch2_journal_keys_to_write_buffer_start(c, &wb, seq);
i->btree_id == prev->btree_id &&
i->level == prev->level &&
i->type == prev->type &&
i->type == BCH_JSET_ENTRY_btree_keys &&
le16_to_cpu(prev->u64s) + u64s <= U16_MAX) {
memmove_u64s_down(vstruct_next(prev),
i->_data,
u64s);
le16_add_cpu(&prev->u64s, u64s);
continue;
}
/* Couldn't merge, move i into new position (after prev): */ struct bkey_i *k;
prev = prev ? vstruct_next(prev) : jset->start; jset_entry_for_each_key(i, k) {
if (i != prev) ret = bch2_journal_key_to_wb(c, &wb, i->btree_id, k);
memmove_u64s_down(prev, i, jset_u64s(u64s)); if (ret) {
bch2_fs_fatal_error(c, "-ENOMEM flushing journal keys to btree write buffer");
bch2_journal_keys_to_write_buffer_end(c, &wb);
return ret;
}
}
i->type = BCH_JSET_ENTRY_btree_keys;
break;
}
} }
prev = prev ? vstruct_next(prev) : jset->start; if (wb.wb)
jset->u64s = cpu_to_le32((u64 *) prev - jset->_data); bch2_journal_keys_to_write_buffer_end(c, &wb);
w->need_flush_to_write_buffer = false;
start = end = vstruct_last(jset); start = end = vstruct_last(jset);
end = bch2_btree_roots_to_journal_entries(c, end, btree_roots_have); end = bch2_btree_roots_to_journal_entries(c, end, btree_roots_have);
bch2_journal_super_entries_add_common(c, &end, bch2_journal_super_entries_add_common(c, &end, seq);
le64_to_cpu(jset->seq));
u64s = (u64 *) end - (u64 *) start; u64s = (u64 *) end - (u64 *) start;
BUG_ON(u64s > j->entry_u64s_reserved); BUG_ON(u64s > j->entry_u64s_reserved);
@ -1788,7 +1828,7 @@ static int bch2_journal_write_prep(struct journal *j, struct journal_buf *w)
SET_JSET_CSUM_TYPE(jset, bch2_meta_checksum_type(c)); SET_JSET_CSUM_TYPE(jset, bch2_meta_checksum_type(c));
if (!JSET_NO_FLUSH(jset) && journal_entry_empty(jset)) if (!JSET_NO_FLUSH(jset) && journal_entry_empty(jset))
j->last_empty_seq = le64_to_cpu(jset->seq); j->last_empty_seq = seq;
if (bch2_csum_type_is_encryption(JSET_CSUM_TYPE(jset))) if (bch2_csum_type_is_encryption(JSET_CSUM_TYPE(jset)))
validate_before_checksum = true; validate_before_checksum = true;
@ -1847,7 +1887,7 @@ static int bch2_journal_write_pick_flush(struct journal *j, struct journal_buf *
(!w->must_flush && (!w->must_flush &&
(jiffies - j->last_flush_write) < msecs_to_jiffies(c->opts.journal_flush_delay) && (jiffies - j->last_flush_write) < msecs_to_jiffies(c->opts.journal_flush_delay) &&
test_bit(JOURNAL_MAY_SKIP_FLUSH, &j->flags))) { test_bit(JOURNAL_MAY_SKIP_FLUSH, &j->flags))) {
w->noflush = true; w->noflush = true;
SET_JSET_NO_FLUSH(w->data, true); SET_JSET_NO_FLUSH(w->data, true);
w->data->last_seq = 0; w->data->last_seq = 0;
w->last_seq = 0; w->last_seq = 0;
@ -1866,12 +1906,11 @@ CLOSURE_CALLBACK(bch2_journal_write)
{ {
closure_type(j, struct journal, io); closure_type(j, struct journal, io);
struct bch_fs *c = container_of(j, struct bch_fs, journal); struct bch_fs *c = container_of(j, struct bch_fs, journal);
struct bch_dev *ca;
struct journal_buf *w = journal_last_unwritten_buf(j); struct journal_buf *w = journal_last_unwritten_buf(j);
struct bch_replicas_padded replicas; struct bch_replicas_padded replicas;
struct bio *bio; struct bio *bio;
struct printbuf journal_debug_buf = PRINTBUF; struct printbuf journal_debug_buf = PRINTBUF;
unsigned i, nr_rw_members = 0; unsigned nr_rw_members = 0;
int ret; int ret;
BUG_ON(BCH_SB_CLEAN(c->disk_sb.sb)); BUG_ON(BCH_SB_CLEAN(c->disk_sb.sb));
@ -1884,12 +1923,16 @@ CLOSURE_CALLBACK(bch2_journal_write)
if (ret) if (ret)
goto err; goto err;
mutex_lock(&j->buf_lock);
journal_buf_realloc(j, w); journal_buf_realloc(j, w);
ret = bch2_journal_write_prep(j, w); ret = bch2_journal_write_prep(j, w);
mutex_unlock(&j->buf_lock);
if (ret) if (ret)
goto err; goto err;
j->entry_bytes_written += vstruct_bytes(w->data);
while (1) { while (1) {
spin_lock(&j->lock); spin_lock(&j->lock);
ret = journal_write_alloc(j, w); ret = journal_write_alloc(j, w);
@ -1927,7 +1970,7 @@ CLOSURE_CALLBACK(bch2_journal_write)
if (c->opts.nochanges) if (c->opts.nochanges)
goto no_io; goto no_io;
for_each_rw_member(ca, c, i) for_each_rw_member(c, ca)
nr_rw_members++; nr_rw_members++;
if (nr_rw_members > 1) if (nr_rw_members > 1)
@ -1944,7 +1987,7 @@ CLOSURE_CALLBACK(bch2_journal_write)
goto err; goto err;
if (!JSET_NO_FLUSH(w->data) && w->separate_flush) { if (!JSET_NO_FLUSH(w->data) && w->separate_flush) {
for_each_rw_member(ca, c, i) { for_each_rw_member(c, ca) {
percpu_ref_get(&ca->io_ref); percpu_ref_get(&ca->io_ref);
bio = ca->journal.bio; bio = ca->journal.bio;

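The write-buffer path added above follows a lazily-started accumulate/flush shape: bch2_journal_keys_to_write_buffer_start() is only called once the first BCH_JSET_ENTRY_write_buffer_keys entry is seen, each key is then appended with bch2_journal_key_to_wb(), and the context is closed exactly once with bch2_journal_keys_to_write_buffer_end() - on the error path as well as on success. A standalone sketch of that shape, with illustrative names that are not the bcachefs API:

#include <stdio.h>
#include <stdlib.h>

struct wb_ctx {
        int    *keys;
        size_t  nr, capacity;
        int     started;
};

static int wb_start(struct wb_ctx *wb)
{
        wb->capacity = 8;
        wb->keys     = malloc(wb->capacity * sizeof(*wb->keys));
        wb->nr       = 0;
        wb->started  = 1;
        return wb->keys ? 0 : -1;
}

static int wb_add_key(struct wb_ctx *wb, int key)
{
        /* Start lazily, only once the first key shows up: */
        if (!wb->started && wb_start(wb))
                return -1;

        if (wb->nr == wb->capacity) {
                int *n = realloc(wb->keys, wb->capacity * 2 * sizeof(*wb->keys));
                if (!n)
                        return -1;
                wb->keys     = n;
                wb->capacity *= 2;
        }
        wb->keys[wb->nr++] = key;
        return 0;
}

static void wb_end(struct wb_ctx *wb)
{
        /* In the real code this is where the accumulated keys are handed off: */
        printf("flushed %zu keys\n", wb->nr);
        free(wb->keys);
        *wb = (struct wb_ctx) {};
}

int main(void)
{
        struct wb_ctx wb = {};

        for (int key = 1; key <= 3; key++) {
                if (wb_add_key(&wb, key)) {
                        /* On error the context must still be closed: */
                        wb_end(&wb);
                        return 1;
                }
        }

        if (wb.started)
                wb_end(&wb);
        return 0;
}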

@ -3,6 +3,7 @@
#include "bcachefs.h" #include "bcachefs.h"
#include "btree_key_cache.h" #include "btree_key_cache.h"
#include "btree_update.h" #include "btree_update.h"
#include "btree_write_buffer.h"
#include "buckets.h" #include "buckets.h"
#include "errcode.h" #include "errcode.h"
#include "error.h" #include "error.h"
@ -50,17 +51,24 @@ unsigned bch2_journal_dev_buckets_available(struct journal *j,
return available; return available;
} }
static inline void journal_set_watermark(struct journal *j, bool low_on_space) void bch2_journal_set_watermark(struct journal *j)
{ {
unsigned watermark = BCH_WATERMARK_stripe; struct bch_fs *c = container_of(j, struct bch_fs, journal);
bool low_on_space = j->space[journal_space_clean].total * 4 <=
j->space[journal_space_total].total;
bool low_on_pin = fifo_free(&j->pin) < j->pin.size / 4;
bool low_on_wb = bch2_btree_write_buffer_must_wait(c);
unsigned watermark = low_on_space || low_on_pin || low_on_wb
? BCH_WATERMARK_reclaim
: BCH_WATERMARK_stripe;
if (low_on_space) if (track_event_change(&c->times[BCH_TIME_blocked_journal_low_on_space],
watermark = max_t(unsigned, watermark, BCH_WATERMARK_reclaim); &j->low_on_space_start, low_on_space) ||
if (fifo_free(&j->pin) < j->pin.size / 4) track_event_change(&c->times[BCH_TIME_blocked_journal_low_on_pin],
watermark = max_t(unsigned, watermark, BCH_WATERMARK_reclaim); &j->low_on_pin_start, low_on_pin) ||
track_event_change(&c->times[BCH_TIME_blocked_write_buffer_full],
if (watermark == j->watermark) &j->write_buffer_full_start, low_on_wb))
return; trace_and_count(c, journal_full, c);
swap(watermark, j->watermark); swap(watermark, j->watermark);
if (watermark > j->watermark) if (watermark > j->watermark)
@ -128,15 +136,13 @@ static struct journal_space __journal_space_available(struct journal *j, unsigne
enum journal_space_from from) enum journal_space_from from)
{ {
struct bch_fs *c = container_of(j, struct bch_fs, journal); struct bch_fs *c = container_of(j, struct bch_fs, journal);
struct bch_dev *ca; unsigned pos, nr_devs = 0;
unsigned i, pos, nr_devs = 0;
struct journal_space space, dev_space[BCH_SB_MEMBERS_MAX]; struct journal_space space, dev_space[BCH_SB_MEMBERS_MAX];
BUG_ON(nr_devs_want > ARRAY_SIZE(dev_space)); BUG_ON(nr_devs_want > ARRAY_SIZE(dev_space));
rcu_read_lock(); rcu_read_lock();
for_each_member_device_rcu(ca, c, i, for_each_member_device_rcu(c, ca, &c->rw_devs[BCH_DATA_journal]) {
&c->rw_devs[BCH_DATA_journal]) {
if (!ca->journal.nr) if (!ca->journal.nr)
continue; continue;
@ -165,19 +171,17 @@ static struct journal_space __journal_space_available(struct journal *j, unsigne
void bch2_journal_space_available(struct journal *j) void bch2_journal_space_available(struct journal *j)
{ {
struct bch_fs *c = container_of(j, struct bch_fs, journal); struct bch_fs *c = container_of(j, struct bch_fs, journal);
struct bch_dev *ca;
unsigned clean, clean_ondisk, total; unsigned clean, clean_ondisk, total;
unsigned max_entry_size = min(j->buf[0].buf_size >> 9, unsigned max_entry_size = min(j->buf[0].buf_size >> 9,
j->buf[1].buf_size >> 9); j->buf[1].buf_size >> 9);
unsigned i, nr_online = 0, nr_devs_want; unsigned nr_online = 0, nr_devs_want;
bool can_discard = false; bool can_discard = false;
int ret = 0; int ret = 0;
lockdep_assert_held(&j->lock); lockdep_assert_held(&j->lock);
rcu_read_lock(); rcu_read_lock();
for_each_member_device_rcu(ca, c, i, for_each_member_device_rcu(c, ca, &c->rw_devs[BCH_DATA_journal]) {
&c->rw_devs[BCH_DATA_journal]) {
struct journal_device *ja = &ca->journal; struct journal_device *ja = &ca->journal;
if (!ja->nr) if (!ja->nr)
@ -208,7 +212,7 @@ void bch2_journal_space_available(struct journal *j)
nr_devs_want = min_t(unsigned, nr_online, c->opts.metadata_replicas); nr_devs_want = min_t(unsigned, nr_online, c->opts.metadata_replicas);
for (i = 0; i < journal_space_nr; i++) for (unsigned i = 0; i < journal_space_nr; i++)
j->space[i] = __journal_space_available(j, nr_devs_want, i); j->space[i] = __journal_space_available(j, nr_devs_want, i);
clean_ondisk = j->space[journal_space_clean_ondisk].total; clean_ondisk = j->space[journal_space_clean_ondisk].total;
@ -226,7 +230,7 @@ void bch2_journal_space_available(struct journal *j)
else else
clear_bit(JOURNAL_MAY_SKIP_FLUSH, &j->flags); clear_bit(JOURNAL_MAY_SKIP_FLUSH, &j->flags);
journal_set_watermark(j, clean * 4 <= total); bch2_journal_set_watermark(j);
out: out:
j->cur_entry_sectors = !ret ? j->space[journal_space_discarded].next_entry : 0; j->cur_entry_sectors = !ret ? j->space[journal_space_discarded].next_entry : 0;
j->cur_entry_error = ret; j->cur_entry_error = ret;
@ -255,12 +259,10 @@ static bool should_discard_bucket(struct journal *j, struct journal_device *ja)
void bch2_journal_do_discards(struct journal *j) void bch2_journal_do_discards(struct journal *j)
{ {
struct bch_fs *c = container_of(j, struct bch_fs, journal); struct bch_fs *c = container_of(j, struct bch_fs, journal);
struct bch_dev *ca;
unsigned iter;
mutex_lock(&j->discard_lock); mutex_lock(&j->discard_lock);
for_each_rw_member(ca, c, iter) { for_each_rw_member(c, ca) {
struct journal_device *ja = &ca->journal; struct journal_device *ja = &ca->journal;
while (should_discard_bucket(j, ja)) { while (should_discard_bucket(j, ja)) {
@ -299,6 +301,7 @@ void bch2_journal_reclaim_fast(struct journal *j)
* all btree nodes got written out * all btree nodes got written out
*/ */
while (!fifo_empty(&j->pin) && while (!fifo_empty(&j->pin) &&
j->pin.front <= j->seq_ondisk &&
!atomic_read(&fifo_peek_front(&j->pin).count)) { !atomic_read(&fifo_peek_front(&j->pin).count)) {
j->pin.front++; j->pin.front++;
popped = true; popped = true;
@ -367,15 +370,36 @@ static enum journal_pin_type journal_pin_type(journal_pin_flush_fn fn)
return JOURNAL_PIN_other; return JOURNAL_PIN_other;
} }
void bch2_journal_pin_set(struct journal *j, u64 seq, static inline void bch2_journal_pin_set_locked(struct journal *j, u64 seq,
struct journal_entry_pin *pin, struct journal_entry_pin *pin,
journal_pin_flush_fn flush_fn) journal_pin_flush_fn flush_fn,
enum journal_pin_type type)
{
struct journal_entry_pin_list *pin_list = journal_seq_pin(j, seq);
/*
* flush_fn is how we identify journal pins in debugfs, so must always
* exist, even if it doesn't do anything:
*/
BUG_ON(!flush_fn);
atomic_inc(&pin_list->count);
pin->seq = seq;
pin->flush = flush_fn;
list_add(&pin->list, &pin_list->list[type]);
}
void bch2_journal_pin_copy(struct journal *j,
struct journal_entry_pin *dst,
struct journal_entry_pin *src,
journal_pin_flush_fn flush_fn)
{ {
struct journal_entry_pin_list *pin_list;
bool reclaim; bool reclaim;
spin_lock(&j->lock); spin_lock(&j->lock);
u64 seq = READ_ONCE(src->seq);
if (seq < journal_last_seq(j)) { if (seq < journal_last_seq(j)) {
/* /*
* bch2_journal_pin_copy() raced with bch2_journal_pin_drop() on * bch2_journal_pin_copy() raced with bch2_journal_pin_drop() on
@ -387,18 +411,34 @@ void bch2_journal_pin_set(struct journal *j, u64 seq,
return; return;
} }
pin_list = journal_seq_pin(j, seq); reclaim = __journal_pin_drop(j, dst);
bch2_journal_pin_set_locked(j, seq, dst, flush_fn, journal_pin_type(flush_fn));
if (reclaim)
bch2_journal_reclaim_fast(j);
spin_unlock(&j->lock);
/*
* If the journal is currently full, we might want to call flush_fn
* immediately:
*/
journal_wake(j);
}
void bch2_journal_pin_set(struct journal *j, u64 seq,
struct journal_entry_pin *pin,
journal_pin_flush_fn flush_fn)
{
bool reclaim;
spin_lock(&j->lock);
BUG_ON(seq < journal_last_seq(j));
reclaim = __journal_pin_drop(j, pin); reclaim = __journal_pin_drop(j, pin);
atomic_inc(&pin_list->count); bch2_journal_pin_set_locked(j, seq, pin, flush_fn, journal_pin_type(flush_fn));
pin->seq = seq;
pin->flush = flush_fn;
if (flush_fn)
list_add(&pin->list, &pin_list->list[journal_pin_type(flush_fn)]);
else
list_add(&pin->list, &pin_list->flushed);
if (reclaim) if (reclaim)
bch2_journal_reclaim_fast(j); bch2_journal_reclaim_fast(j);
@ -537,13 +577,11 @@ static size_t journal_flush_pins(struct journal *j,
static u64 journal_seq_to_flush(struct journal *j) static u64 journal_seq_to_flush(struct journal *j)
{ {
struct bch_fs *c = container_of(j, struct bch_fs, journal); struct bch_fs *c = container_of(j, struct bch_fs, journal);
struct bch_dev *ca;
u64 seq_to_flush = 0; u64 seq_to_flush = 0;
unsigned iter;
spin_lock(&j->lock); spin_lock(&j->lock);
for_each_rw_member(ca, c, iter) { for_each_rw_member(c, ca) {
struct journal_device *ja = &ca->journal; struct journal_device *ja = &ca->journal;
unsigned nr_buckets, bucket_to_flush; unsigned nr_buckets, bucket_to_flush;
@ -747,10 +785,9 @@ int bch2_journal_reclaim_start(struct journal *j)
p = kthread_create(bch2_journal_reclaim_thread, j, p = kthread_create(bch2_journal_reclaim_thread, j,
"bch-reclaim/%s", c->name); "bch-reclaim/%s", c->name);
ret = PTR_ERR_OR_ZERO(p); ret = PTR_ERR_OR_ZERO(p);
if (ret) { bch_err_msg(c, ret, "creating journal reclaim thread");
bch_err_msg(c, ret, "creating journal reclaim thread"); if (ret)
return ret; return ret;
}
get_task_struct(p); get_task_struct(p);
j->reclaim_thread = p; j->reclaim_thread = p;
@ -796,6 +833,7 @@ static int journal_flush_done(struct journal *j, u64 seq_to_flush,
bool bch2_journal_flush_pins(struct journal *j, u64 seq_to_flush) bool bch2_journal_flush_pins(struct journal *j, u64 seq_to_flush)
{ {
/* time_stats this */
bool did_work = false; bool did_work = false;
if (!test_bit(JOURNAL_STARTED, &j->flags)) if (!test_bit(JOURNAL_STARTED, &j->flags))

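bch2_journal_set_watermark() above feeds the low-on-space, low-on-pin and write-buffer-full conditions into track_event_change(), which in this diff takes a struct bch2_time_stats slot plus a per-condition start-time field: it latches the time when a condition first becomes true and accounts the blocked duration once it clears. A self-contained approximation of that latch pattern (simplified; not the kernel helper):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

struct cond_timer {
        uint64_t start_ns;      /* nonzero while the condition is active */
        uint64_t total_ns;      /* accumulated time spent blocked */
};

static uint64_t now_ns(void)
{
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t) ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

/* Returns true only on a false->true transition, so the caller can emit a
 * tracepoint the moment the journal first becomes blocked: */
static bool cond_timer_update(struct cond_timer *t, bool active)
{
        if (active && !t->start_ns) {
                t->start_ns = now_ns();
                return true;
        }
        if (!active && t->start_ns) {
                t->total_ns += now_ns() - t->start_ns;
                t->start_ns  = 0;
        }
        return false;
}

int main(void)
{
        struct cond_timer low_on_space = {};

        if (cond_timer_update(&low_on_space, true))     /* condition starts */
                printf("journal blocked: low on space\n");

        /* ... reclaim runs, space is freed ... */

        cond_timer_update(&low_on_space, false);        /* condition clears */
        printf("blocked for %llu ns total\n",
               (unsigned long long) low_on_space.total_ns);
        return 0;
}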

@ -16,6 +16,7 @@ static inline void journal_reclaim_kick(struct journal *j)
unsigned bch2_journal_dev_buckets_available(struct journal *, unsigned bch2_journal_dev_buckets_available(struct journal *,
struct journal_device *, struct journal_device *,
enum journal_space_from); enum journal_space_from);
void bch2_journal_set_watermark(struct journal *);
void bch2_journal_space_available(struct journal *); void bch2_journal_space_available(struct journal *);
static inline bool journal_pin_active(struct journal_entry_pin *pin) static inline bool journal_pin_active(struct journal_entry_pin *pin)
@ -47,17 +48,10 @@ static inline void bch2_journal_pin_add(struct journal *j, u64 seq,
bch2_journal_pin_set(j, seq, pin, flush_fn); bch2_journal_pin_set(j, seq, pin, flush_fn);
} }
static inline void bch2_journal_pin_copy(struct journal *j, void bch2_journal_pin_copy(struct journal *,
struct journal_entry_pin *dst, struct journal_entry_pin *,
struct journal_entry_pin *src, struct journal_entry_pin *,
journal_pin_flush_fn flush_fn) journal_pin_flush_fn);
{
/* Guard against racing with journal_pin_drop(src): */
u64 seq = READ_ONCE(src->seq);
if (seq)
bch2_journal_pin_add(j, seq, dst, flush_fn);
}
static inline void bch2_journal_pin_update(struct journal *j, u64 seq, static inline void bch2_journal_pin_update(struct journal *j, u64 seq,
struct journal_entry_pin *pin, struct journal_entry_pin *pin,


@ -267,7 +267,7 @@ void bch2_blacklist_entries_gc(struct work_struct *work)
while (!(ret = PTR_ERR_OR_ZERO(b)) && while (!(ret = PTR_ERR_OR_ZERO(b)) &&
b && b &&
!test_bit(BCH_FS_STOPPING, &c->flags)) !test_bit(BCH_FS_stopping, &c->flags))
b = bch2_btree_iter_next_node(&iter); b = bch2_btree_iter_next_node(&iter);
if (bch2_err_matches(ret, BCH_ERR_transaction_restart)) if (bch2_err_matches(ret, BCH_ERR_transaction_restart))


@ -36,6 +36,7 @@ struct journal_buf {
bool noflush; /* write has already been kicked off, and was noflush */ bool noflush; /* write has already been kicked off, and was noflush */
bool must_flush; /* something wants a flush */ bool must_flush; /* something wants a flush */
bool separate_flush; bool separate_flush;
bool need_flush_to_write_buffer;
}; };
/* /*
@ -181,6 +182,12 @@ struct journal {
*/ */
darray_u64 early_journal_entries; darray_u64 early_journal_entries;
/*
* Protects journal_buf->data, when accessing without a journal
* reservation: for synchronization between the btree write buffer code
* and the journal write path:
*/
struct mutex buf_lock;
/* /*
* Two journal entries -- one is currently open for new entries, the * Two journal entries -- one is currently open for new entries, the
* other is possibly being written out. * other is possibly being written out.
@ -195,7 +202,6 @@ struct journal {
/* Used when waiting because the journal was full */ /* Used when waiting because the journal was full */
wait_queue_head_t wait; wait_queue_head_t wait;
struct closure_waitlist async_wait; struct closure_waitlist async_wait;
struct closure_waitlist preres_wait;
struct closure io; struct closure io;
struct delayed_work write_work; struct delayed_work write_work;
@ -262,15 +268,19 @@ struct journal {
unsigned long last_flush_write; unsigned long last_flush_write;
u64 res_get_blocked_start;
u64 write_start_time; u64 write_start_time;
u64 nr_flush_writes; u64 nr_flush_writes;
u64 nr_noflush_writes; u64 nr_noflush_writes;
u64 entry_bytes_written;
u64 low_on_space_start;
u64 low_on_pin_start;
u64 max_in_flight_start;
u64 write_buffer_full_start;
struct bch2_time_stats *flush_write_time; struct bch2_time_stats *flush_write_time;
struct bch2_time_stats *noflush_write_time; struct bch2_time_stats *noflush_write_time;
struct bch2_time_stats *blocked_time;
struct bch2_time_stats *flush_seq_time; struct bch2_time_stats *flush_seq_time;
#ifdef CONFIG_DEBUG_LOCK_ALLOC #ifdef CONFIG_DEBUG_LOCK_ALLOC


@ -43,8 +43,6 @@ void bch2_keylist_pop_front(struct keylist *l)
#ifdef CONFIG_BCACHEFS_DEBUG #ifdef CONFIG_BCACHEFS_DEBUG
void bch2_verify_keylist_sorted(struct keylist *l) void bch2_verify_keylist_sorted(struct keylist *l)
{ {
struct bkey_i *k;
for_each_keylist_key(l, k) for_each_keylist_key(l, k)
BUG_ON(bkey_next(k) != l->top && BUG_ON(bkey_next(k) != l->top &&
bpos_ge(k->k.p, bkey_next(k)->k.p)); bpos_ge(k->k.p, bkey_next(k)->k.p));


@ -50,18 +50,16 @@ static inline struct bkey_i *bch2_keylist_front(struct keylist *l)
} }
#define for_each_keylist_key(_keylist, _k) \ #define for_each_keylist_key(_keylist, _k) \
for (_k = (_keylist)->keys; \ for (struct bkey_i *_k = (_keylist)->keys; \
_k != (_keylist)->top; \ _k != (_keylist)->top; \
_k = bkey_next(_k)) _k = bkey_next(_k))
static inline u64 keylist_sectors(struct keylist *keys) static inline u64 keylist_sectors(struct keylist *keys)
{ {
struct bkey_i *k;
u64 ret = 0; u64 ret = 0;
for_each_keylist_key(keys, k) for_each_keylist_key(keys, k)
ret += k->k.size; ret += k->k.size;
return ret; return ret;
} }

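The for_each_keylist_key() change above is an instance of a cleanup applied throughout this series: the loop macro now declares its own iterator, so callers such as keylist_sectors() and bch2_verify_keylist_sorted() can drop their local struct bkey_i *k declarations. A minimal standalone illustration of the idiom, using illustrative types:

#include <stdio.h>

struct item {
        int v;
        struct item *next;
};

/* The macro owns the cursor declaration, like for_each_keylist_key() now does */
#define for_each_item(_head, _it)                               \
        for (struct item *_it = (_head); _it; _it = _it->next)

static int sum_items(struct item *head)
{
        int sum = 0;

        for_each_item(head, it)         /* no separate 'struct item *it;' needed */
                sum += it->v;
        return sum;
}

int main(void)
{
        struct item c = { 3, NULL }, b = { 2, &c }, a = { 1, &b };

        printf("%d\n", sum_items(&a));  /* prints 6 */
        return 0;
}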

@ -54,16 +54,12 @@ static int resume_logged_op(struct btree_trans *trans, struct btree_iter *iter,
int bch2_resume_logged_ops(struct bch_fs *c) int bch2_resume_logged_ops(struct bch_fs *c)
{ {
struct btree_iter iter; int ret = bch2_trans_run(c,
struct bkey_s_c k; for_each_btree_key(trans, iter,
int ret; BTREE_ID_logged_ops, POS_MIN,
BTREE_ITER_PREFETCH, k,
ret = bch2_trans_run(c,
for_each_btree_key2(trans, iter,
BTREE_ID_logged_ops, POS_MIN, BTREE_ITER_PREFETCH, k,
resume_logged_op(trans, &iter, k))); resume_logged_op(trans, &iter, k)));
if (ret) bch_err_fn(c, ret);
bch_err_fn(c, ret);
return ret; return ret;
} }
@ -85,13 +81,13 @@ static int __bch2_logged_op_start(struct btree_trans *trans, struct bkey_i *k)
int bch2_logged_op_start(struct btree_trans *trans, struct bkey_i *k) int bch2_logged_op_start(struct btree_trans *trans, struct bkey_i *k)
{ {
return commit_do(trans, NULL, NULL, BTREE_INSERT_NOFAIL, return commit_do(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc,
__bch2_logged_op_start(trans, k)); __bch2_logged_op_start(trans, k));
} }
void bch2_logged_op_finish(struct btree_trans *trans, struct bkey_i *k) void bch2_logged_op_finish(struct btree_trans *trans, struct bkey_i *k)
{ {
int ret = commit_do(trans, NULL, NULL, BTREE_INSERT_NOFAIL, int ret = commit_do(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc,
bch2_btree_delete(trans, BTREE_ID_logged_ops, k->k.p, 0)); bch2_btree_delete(trans, BTREE_ID_logged_ops, k->k.p, 0));
/* /*
* This needs to be a fatal error because we've left an unfinished * This needs to be a fatal error because we've left an unfinished


@ -147,18 +147,13 @@ static int bch2_check_lru_key(struct btree_trans *trans,
int bch2_check_lrus(struct bch_fs *c) int bch2_check_lrus(struct bch_fs *c)
{ {
struct btree_iter iter;
struct bkey_s_c k;
struct bpos last_flushed_pos = POS_MIN; struct bpos last_flushed_pos = POS_MIN;
int ret = 0; int ret = bch2_trans_run(c,
ret = bch2_trans_run(c,
for_each_btree_key_commit(trans, iter, for_each_btree_key_commit(trans, iter,
BTREE_ID_lru, POS_MIN, BTREE_ITER_PREFETCH, k, BTREE_ID_lru, POS_MIN, BTREE_ITER_PREFETCH, k,
NULL, NULL, BTREE_INSERT_NOFAIL|BTREE_INSERT_LAZY_RW, NULL, NULL, BCH_TRANS_COMMIT_no_enospc|BCH_TRANS_COMMIT_lazy_rw,
bch2_check_lru_key(trans, &iter, k, &last_flushed_pos))); bch2_check_lru_key(trans, &iter, k, &last_flushed_pos)));
if (ret) bch_err_fn(c, ret);
bch_err_fn(c, ret);
return ret; return ret;
} }


@ -62,6 +62,7 @@ EXPORT_SYMBOL_GPL(u128_div);
/** /**
* mean_and_variance_get_mean() - get mean from @s * mean_and_variance_get_mean() - get mean from @s
* @s: mean and variance number of samples and their sums
*/ */
s64 mean_and_variance_get_mean(struct mean_and_variance s) s64 mean_and_variance_get_mean(struct mean_and_variance s)
{ {
@ -71,6 +72,7 @@ EXPORT_SYMBOL_GPL(mean_and_variance_get_mean);
/** /**
* mean_and_variance_get_variance() - get variance from @s1 * mean_and_variance_get_variance() - get variance from @s1
* @s1: mean and variance number of samples and sums
* *
* see linked pdf equation 12. * see linked pdf equation 12.
*/ */
@ -89,6 +91,7 @@ EXPORT_SYMBOL_GPL(mean_and_variance_get_variance);
/** /**
* mean_and_variance_get_stddev() - get standard deviation from @s * mean_and_variance_get_stddev() - get standard deviation from @s
* @s: mean and variance number of samples and their sums
*/ */
u32 mean_and_variance_get_stddev(struct mean_and_variance s) u32 mean_and_variance_get_stddev(struct mean_and_variance s)
{ {
@ -98,8 +101,8 @@ EXPORT_SYMBOL_GPL(mean_and_variance_get_stddev);
/** /**
* mean_and_variance_weighted_update() - exponentially weighted variant of mean_and_variance_update() * mean_and_variance_weighted_update() - exponentially weighted variant of mean_and_variance_update()
* @s1: .. * @s: mean and variance number of samples and their sums
* @s2: .. * @x: new value to include in the &mean_and_variance_weighted
* *
* see linked pdf: function derived from equations 140-143 where alpha = 2^w. * see linked pdf: function derived from equations 140-143 where alpha = 2^w.
* values are stored bitshifted for performance and added precision. * values are stored bitshifted for performance and added precision.
@ -129,6 +132,7 @@ EXPORT_SYMBOL_GPL(mean_and_variance_weighted_update);
/** /**
* mean_and_variance_weighted_get_mean() - get mean from @s * mean_and_variance_weighted_get_mean() - get mean from @s
* @s: mean and variance number of samples and their sums
*/ */
s64 mean_and_variance_weighted_get_mean(struct mean_and_variance_weighted s) s64 mean_and_variance_weighted_get_mean(struct mean_and_variance_weighted s)
{ {
@ -138,6 +142,7 @@ EXPORT_SYMBOL_GPL(mean_and_variance_weighted_get_mean);
/** /**
* mean_and_variance_weighted_get_variance() -- get variance from @s * mean_and_variance_weighted_get_variance() -- get variance from @s
* @s: mean and variance number of samples and their sums
*/ */
u64 mean_and_variance_weighted_get_variance(struct mean_and_variance_weighted s) u64 mean_and_variance_weighted_get_variance(struct mean_and_variance_weighted s)
{ {
@ -148,6 +153,7 @@ EXPORT_SYMBOL_GPL(mean_and_variance_weighted_get_variance);
/** /**
* mean_and_variance_weighted_get_stddev() - get standard deviation from @s * mean_and_variance_weighted_get_stddev() - get standard deviation from @s
* @s: mean and variance number of samples and their sums
*/ */
u32 mean_and_variance_weighted_get_stddev(struct mean_and_variance_weighted s) u32 mean_and_variance_weighted_get_stddev(struct mean_and_variance_weighted s)
{ {

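The kernel-doc added above for mean_and_variance_weighted_update() notes that the weight is a power of two and that values are stored bitshifted for performance and added precision. A minimal userspace sketch of that idea for a plain exponentially weighted mean (generic EWMA, not the mean_and_variance_weighted API itself):

#include <stdint.h>
#include <stdio.h>

struct ewma {
        int64_t  mean_shifted;  /* running mean, stored << w for precision */
        unsigned w;             /* weight: effectively alpha = 2^-w */
};

static void ewma_update(struct ewma *s, int64_t x)
{
        int64_t x_shifted = x << s->w;

        if (!s->mean_shifted)
                s->mean_shifted = x_shifted;    /* first sample seeds the mean */
        else
                /* mean += (x - mean) >> w; relies on arithmetic right shift */
                s->mean_shifted += (x_shifted - s->mean_shifted) >> s->w;
}

static int64_t ewma_read(const struct ewma *s)
{
        return s->mean_shifted >> s->w;
}

int main(void)
{
        struct ewma s = { .w = 3 };             /* weight 1/8 */
        int64_t samples[] = { 10, 12, 8, 100, 11 };

        for (unsigned i = 0; i < 5; i++) {
                ewma_update(&s, samples[i]);
                printf("sample %3lld -> mean %lld\n",
                       (long long) samples[i], (long long) ewma_read(&s));
        }
        return 0;
}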

@ -12,9 +12,12 @@
/* /*
* u128_u: u128 user mode, because not all architectures support a real int128 * u128_u: u128 user mode, because not all architectures support a real int128
* type * type
*
* We don't use this version in userspace, because in userspace we link with
* Rust and rustc has issues with u128.
*/ */
#ifdef __SIZEOF_INT128__ #if defined(__SIZEOF_INT128__) && defined(__KERNEL__)
typedef struct { typedef struct {
unsigned __int128 v; unsigned __int128 v;


@ -79,8 +79,6 @@ static int bch2_dev_usrdata_drop_key(struct btree_trans *trans,
static int bch2_dev_usrdata_drop(struct bch_fs *c, unsigned dev_idx, int flags) static int bch2_dev_usrdata_drop(struct bch_fs *c, unsigned dev_idx, int flags)
{ {
struct btree_trans *trans = bch2_trans_get(c); struct btree_trans *trans = bch2_trans_get(c);
struct btree_iter iter;
struct bkey_s_c k;
enum btree_id id; enum btree_id id;
int ret = 0; int ret = 0;
@ -90,7 +88,7 @@ static int bch2_dev_usrdata_drop(struct bch_fs *c, unsigned dev_idx, int flags)
ret = for_each_btree_key_commit(trans, iter, id, POS_MIN, ret = for_each_btree_key_commit(trans, iter, id, POS_MIN,
BTREE_ITER_PREFETCH|BTREE_ITER_ALL_SNAPSHOTS, k, BTREE_ITER_PREFETCH|BTREE_ITER_ALL_SNAPSHOTS, k,
NULL, NULL, BTREE_INSERT_NOFAIL, NULL, NULL, BCH_TRANS_COMMIT_no_enospc,
bch2_dev_usrdata_drop_key(trans, &iter, k, dev_idx, flags)); bch2_dev_usrdata_drop_key(trans, &iter, k, dev_idx, flags));
if (ret) if (ret)
break; break;
@ -145,10 +143,9 @@ static int bch2_dev_metadata_drop(struct bch_fs *c, unsigned dev_idx, int flags)
continue; continue;
} }
if (ret) { bch_err_msg(c, ret, "updating btree node key");
bch_err_msg(c, ret, "updating btree node key"); if (ret)
break; break;
}
next: next:
bch2_btree_iter_next_node(&iter); bch2_btree_iter_next_node(&iter);
} }


@ -27,6 +27,13 @@
#include <linux/ioprio.h> #include <linux/ioprio.h>
#include <linux/kthread.h> #include <linux/kthread.h>
const char * const bch2_data_ops_strs[] = {
#define x(t, n, ...) [n] = #t,
BCH_DATA_OPS()
#undef x
NULL
};
static void trace_move_extent2(struct bch_fs *c, struct bkey_s_c k) static void trace_move_extent2(struct bch_fs *c, struct bkey_s_c k)
{ {
if (trace_move_extent_enabled()) { if (trace_move_extent_enabled()) {
@ -63,7 +70,7 @@ struct moving_io {
struct data_update write; struct data_update write;
/* Must be last since it is variable size */ /* Must be last since it is variable size */
struct bio_vec bi_inline_vecs[0]; struct bio_vec bi_inline_vecs[];
}; };
static void move_free(struct moving_io *io) static void move_free(struct moving_io *io)
@ -152,7 +159,7 @@ void bch2_move_ctxt_wait_for_io(struct moving_context *ctxt)
atomic_read(&ctxt->write_sectors) != sectors_pending); atomic_read(&ctxt->write_sectors) != sectors_pending);
} }
static void bch2_moving_ctxt_flush_all(struct moving_context *ctxt) void bch2_moving_ctxt_flush_all(struct moving_context *ctxt)
{ {
move_ctxt_wait_event(ctxt, list_empty(&ctxt->reads)); move_ctxt_wait_event(ctxt, list_empty(&ctxt->reads));
bch2_trans_unlock_long(ctxt->trans); bch2_trans_unlock_long(ctxt->trans);
@ -211,7 +218,7 @@ void bch2_move_stats_exit(struct bch_move_stats *stats, struct bch_fs *c)
trace_move_data(c, stats); trace_move_data(c, stats);
} }
void bch2_move_stats_init(struct bch_move_stats *stats, char *name) void bch2_move_stats_init(struct bch_move_stats *stats, const char *name)
{ {
memset(stats, 0, sizeof(*stats)); memset(stats, 0, sizeof(*stats));
stats->data_type = BCH_DATA_user; stats->data_type = BCH_DATA_user;
@ -342,7 +349,8 @@ int bch2_move_extent(struct moving_context *ctxt,
bch2_err_matches(ret, BCH_ERR_transaction_restart)) bch2_err_matches(ret, BCH_ERR_transaction_restart))
return ret; return ret;
this_cpu_inc(c->counters[BCH_COUNTER_move_extent_start_fail]); count_event(c, move_extent_start_fail);
if (trace_move_extent_start_fail_enabled()) { if (trace_move_extent_start_fail_enabled()) {
struct printbuf buf = PRINTBUF; struct printbuf buf = PRINTBUF;
@ -364,13 +372,10 @@ struct bch_io_opts *bch2_move_get_io_opts(struct btree_trans *trans,
int ret = 0; int ret = 0;
if (io_opts->cur_inum != extent_k.k->p.inode) { if (io_opts->cur_inum != extent_k.k->p.inode) {
struct btree_iter iter;
struct bkey_s_c k;
io_opts->d.nr = 0; io_opts->d.nr = 0;
for_each_btree_key(trans, iter, BTREE_ID_inodes, POS(0, extent_k.k->p.inode), ret = for_each_btree_key(trans, iter, BTREE_ID_inodes, POS(0, extent_k.k->p.inode),
BTREE_ITER_ALL_SNAPSHOTS, k, ret) { BTREE_ITER_ALL_SNAPSHOTS, k, ({
if (k.k->p.offset != extent_k.k->p.inode) if (k.k->p.offset != extent_k.k->p.inode)
break; break;
@ -383,11 +388,8 @@ struct bch_io_opts *bch2_move_get_io_opts(struct btree_trans *trans,
struct snapshot_io_opts_entry e = { .snapshot = k.k->p.snapshot }; struct snapshot_io_opts_entry e = { .snapshot = k.k->p.snapshot };
bch2_inode_opts_get(&e.io_opts, trans->c, &inode); bch2_inode_opts_get(&e.io_opts, trans->c, &inode);
ret = darray_push(&io_opts->d, e); darray_push(&io_opts->d, e);
if (ret) }));
break;
}
bch2_trans_iter_exit(trans, &iter);
io_opts->cur_inum = extent_k.k->p.inode; io_opts->cur_inum = extent_k.k->p.inode;
} }
@ -395,12 +397,10 @@ struct bch_io_opts *bch2_move_get_io_opts(struct btree_trans *trans,
if (ret) if (ret)
return ERR_PTR(ret); return ERR_PTR(ret);
if (extent_k.k->p.snapshot) { if (extent_k.k->p.snapshot)
struct snapshot_io_opts_entry *i;
darray_for_each(io_opts->d, i) darray_for_each(io_opts->d, i)
if (bch2_snapshot_is_ancestor(c, extent_k.k->p.snapshot, i->snapshot)) if (bch2_snapshot_is_ancestor(c, extent_k.k->p.snapshot, i->snapshot))
return &i->io_opts; return &i->io_opts;
}
return &io_opts->fs_io_opts; return &io_opts->fs_io_opts;
} }
@ -628,7 +628,7 @@ int bch2_move_data(struct bch_fs *c,
return ret; return ret;
} }
int __bch2_evacuate_bucket(struct moving_context *ctxt, int bch2_evacuate_bucket(struct moving_context *ctxt,
struct move_bucket_in_flight *bucket_in_flight, struct move_bucket_in_flight *bucket_in_flight,
struct bpos bucket, int gen, struct bpos bucket, int gen,
struct data_update_opts _data_opts) struct data_update_opts _data_opts)
@ -664,21 +664,19 @@ int __bch2_evacuate_bucket(struct moving_context *ctxt,
bkey_err(k = bch2_btree_iter_peek_slot(&iter))); bkey_err(k = bch2_btree_iter_peek_slot(&iter)));
bch2_trans_iter_exit(trans, &iter); bch2_trans_iter_exit(trans, &iter);
if (ret) { bch_err_msg(c, ret, "looking up alloc key");
bch_err_msg(c, ret, "looking up alloc key"); if (ret)
goto err; goto err;
}
a = bch2_alloc_to_v4(k, &a_convert); a = bch2_alloc_to_v4(k, &a_convert);
dirty_sectors = a->dirty_sectors; dirty_sectors = bch2_bucket_sectors_dirty(*a);
bucket_size = bch_dev_bkey_exists(c, bucket.inode)->mi.bucket_size; bucket_size = bch_dev_bkey_exists(c, bucket.inode)->mi.bucket_size;
fragmentation = a->fragmentation_lru; fragmentation = a->fragmentation_lru;
ret = bch2_btree_write_buffer_flush(trans); ret = bch2_btree_write_buffer_tryflush(trans);
if (ret) { bch_err_msg(c, ret, "flushing btree write buffer");
bch_err_msg(c, ret, "flushing btree write buffer"); if (ret)
goto err; goto err;
}
while (!(ret = bch2_move_ratelimit(ctxt))) { while (!(ret = bch2_move_ratelimit(ctxt))) {
if (is_kthread && kthread_should_stop()) if (is_kthread && kthread_should_stop())
@ -697,9 +695,6 @@ int __bch2_evacuate_bucket(struct moving_context *ctxt,
break; break;
if (!bp.level) { if (!bp.level) {
const struct bch_extent_ptr *ptr;
unsigned i = 0;
k = bch2_backpointer_get_key(trans, &iter, bp_pos, bp, 0); k = bch2_backpointer_get_key(trans, &iter, bp_pos, bp, 0);
ret = bkey_err(k); ret = bkey_err(k);
if (bch2_err_matches(ret, BCH_ERR_transaction_restart)) if (bch2_err_matches(ret, BCH_ERR_transaction_restart))
@ -722,6 +717,7 @@ int __bch2_evacuate_bucket(struct moving_context *ctxt,
data_opts.target = io_opts.background_target; data_opts.target = io_opts.background_target;
data_opts.rewrite_ptrs = 0; data_opts.rewrite_ptrs = 0;
unsigned i = 0;
bkey_for_each_ptr(bch2_bkey_ptrs_c(k), ptr) { bkey_for_each_ptr(bch2_bkey_ptrs_c(k), ptr) {
if (ptr->dev == bucket.inode) { if (ptr->dev == bucket.inode) {
data_opts.rewrite_ptrs |= 1U << i; data_opts.rewrite_ptrs |= 1U << i;
@ -789,31 +785,13 @@ int __bch2_evacuate_bucket(struct moving_context *ctxt,
return ret; return ret;
} }
int bch2_evacuate_bucket(struct bch_fs *c,
struct bpos bucket, int gen,
struct data_update_opts data_opts,
struct bch_ratelimit *rate,
struct bch_move_stats *stats,
struct write_point_specifier wp,
bool wait_on_copygc)
{
struct moving_context ctxt;
int ret;
bch2_moving_ctxt_init(&ctxt, c, rate, stats, wp, wait_on_copygc);
ret = __bch2_evacuate_bucket(&ctxt, NULL, bucket, gen, data_opts);
bch2_moving_ctxt_exit(&ctxt);
return ret;
}
typedef bool (*move_btree_pred)(struct bch_fs *, void *, typedef bool (*move_btree_pred)(struct bch_fs *, void *,
struct btree *, struct bch_io_opts *, struct btree *, struct bch_io_opts *,
struct data_update_opts *); struct data_update_opts *);
static int bch2_move_btree(struct bch_fs *c, static int bch2_move_btree(struct bch_fs *c,
enum btree_id start_btree_id, struct bpos start_pos, struct bbpos start,
enum btree_id end_btree_id, struct bpos end_pos, struct bbpos end,
move_btree_pred pred, void *arg, move_btree_pred pred, void *arg,
struct bch_move_stats *stats) struct bch_move_stats *stats)
{ {
@ -823,7 +801,7 @@ static int bch2_move_btree(struct bch_fs *c,
struct btree_trans *trans; struct btree_trans *trans;
struct btree_iter iter; struct btree_iter iter;
struct btree *b; struct btree *b;
enum btree_id id; enum btree_id btree;
struct data_update_opts data_opts; struct data_update_opts data_opts;
int ret = 0; int ret = 0;
@ -834,15 +812,15 @@ static int bch2_move_btree(struct bch_fs *c,
stats->data_type = BCH_DATA_btree; stats->data_type = BCH_DATA_btree;
for (id = start_btree_id; for (btree = start.btree;
id <= min_t(unsigned, end_btree_id, btree_id_nr_alive(c) - 1); btree <= min_t(unsigned, end.btree, btree_id_nr_alive(c) - 1);
id++) { btree ++) {
stats->pos = BBPOS(id, POS_MIN); stats->pos = BBPOS(btree, POS_MIN);
if (!bch2_btree_id_root(c, id)->b) if (!bch2_btree_id_root(c, btree)->b)
continue; continue;
bch2_trans_node_iter_init(trans, &iter, id, POS_MIN, 0, 0, bch2_trans_node_iter_init(trans, &iter, btree, POS_MIN, 0, 0,
BTREE_ITER_PREFETCH); BTREE_ITER_PREFETCH);
retry: retry:
ret = 0; ret = 0;
@ -852,8 +830,8 @@ static int bch2_move_btree(struct bch_fs *c,
if (kthread && kthread_should_stop()) if (kthread && kthread_should_stop())
break; break;
if ((cmp_int(id, end_btree_id) ?: if ((cmp_int(btree, end.btree) ?:
bpos_cmp(b->key.k.p, end_pos)) > 0) bpos_cmp(b->key.k.p, end.pos)) > 0)
break; break;
stats->pos = BBPOS(iter.btree_id, iter.pos); stats->pos = BBPOS(iter.btree_id, iter.pos);
@ -910,7 +888,6 @@ static bool migrate_pred(struct bch_fs *c, void *arg,
struct data_update_opts *data_opts) struct data_update_opts *data_opts)
{ {
struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k); struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
const struct bch_extent_ptr *ptr;
struct bch_ioctl_data *op = arg; struct bch_ioctl_data *op = arg;
unsigned i = 0; unsigned i = 0;
@ -990,8 +967,8 @@ int bch2_scan_old_btree_nodes(struct bch_fs *c, struct bch_move_stats *stats)
int ret; int ret;
ret = bch2_move_btree(c, ret = bch2_move_btree(c,
0, POS_MIN, BBPOS_MIN,
BTREE_ID_NR, SPOS_MAX, BBPOS_MAX,
rewrite_old_nodes_pred, c, stats); rewrite_old_nodes_pred, c, stats);
if (!ret) { if (!ret) {
mutex_lock(&c->sb_lock); mutex_lock(&c->sb_lock);
@ -1006,71 +983,101 @@ int bch2_scan_old_btree_nodes(struct bch_fs *c, struct bch_move_stats *stats)
return ret; return ret;
} }
static bool drop_extra_replicas_pred(struct bch_fs *c, void *arg,
struct bkey_s_c k,
struct bch_io_opts *io_opts,
struct data_update_opts *data_opts)
{
unsigned durability = bch2_bkey_durability(c, k);
unsigned replicas = bkey_is_btree_ptr(k.k)
? c->opts.metadata_replicas
: io_opts->data_replicas;
const union bch_extent_entry *entry;
struct extent_ptr_decoded p;
unsigned i = 0;
bkey_for_each_ptr_decode(k.k, bch2_bkey_ptrs_c(k), p, entry) {
unsigned d = bch2_extent_ptr_durability(c, &p);
if (d && durability - d >= replicas) {
data_opts->kill_ptrs |= BIT(i);
durability -= d;
}
i++;
}
return data_opts->kill_ptrs != 0;
}
static bool drop_extra_replicas_btree_pred(struct bch_fs *c, void *arg,
struct btree *b,
struct bch_io_opts *io_opts,
struct data_update_opts *data_opts)
{
return drop_extra_replicas_pred(c, arg, bkey_i_to_s_c(&b->key), io_opts, data_opts);
}
int bch2_data_job(struct bch_fs *c, int bch2_data_job(struct bch_fs *c,
struct bch_move_stats *stats, struct bch_move_stats *stats,
struct bch_ioctl_data op) struct bch_ioctl_data op)
{ {
struct bbpos start = BBPOS(op.start_btree, op.start_pos);
struct bbpos end = BBPOS(op.end_btree, op.end_pos);
int ret = 0; int ret = 0;
if (op.op >= BCH_DATA_OP_NR)
return -EINVAL;
bch2_move_stats_init(stats, bch2_data_ops_strs[op.op]);
switch (op.op) { switch (op.op) {
case BCH_DATA_OP_REREPLICATE: case BCH_DATA_OP_rereplicate:
bch2_move_stats_init(stats, "rereplicate");
stats->data_type = BCH_DATA_journal; stats->data_type = BCH_DATA_journal;
ret = bch2_journal_flush_device_pins(&c->journal, -1); ret = bch2_journal_flush_device_pins(&c->journal, -1);
ret = bch2_move_btree(c, start, end,
ret = bch2_move_btree(c,
op.start_btree, op.start_pos,
op.end_btree, op.end_pos,
rereplicate_btree_pred, c, stats) ?: ret; rereplicate_btree_pred, c, stats) ?: ret;
ret = bch2_replicas_gc2(c) ?: ret; ret = bch2_move_data(c, start, end,
ret = bch2_move_data(c,
(struct bbpos) { op.start_btree, op.start_pos },
(struct bbpos) { op.end_btree, op.end_pos },
NULL, NULL,
stats, stats,
writepoint_hashed((unsigned long) current), writepoint_hashed((unsigned long) current),
true, true,
rereplicate_pred, c) ?: ret; rereplicate_pred, c) ?: ret;
ret = bch2_replicas_gc2(c) ?: ret; ret = bch2_replicas_gc2(c) ?: ret;
bch2_move_stats_exit(stats, c);
break; break;
case BCH_DATA_OP_MIGRATE: case BCH_DATA_OP_migrate:
if (op.migrate.dev >= c->sb.nr_devices) if (op.migrate.dev >= c->sb.nr_devices)
return -EINVAL; return -EINVAL;
bch2_move_stats_init(stats, "migrate");
stats->data_type = BCH_DATA_journal; stats->data_type = BCH_DATA_journal;
ret = bch2_journal_flush_device_pins(&c->journal, op.migrate.dev); ret = bch2_journal_flush_device_pins(&c->journal, op.migrate.dev);
ret = bch2_move_btree(c, start, end,
ret = bch2_move_btree(c,
op.start_btree, op.start_pos,
op.end_btree, op.end_pos,
migrate_btree_pred, &op, stats) ?: ret; migrate_btree_pred, &op, stats) ?: ret;
ret = bch2_replicas_gc2(c) ?: ret; ret = bch2_move_data(c, start, end,
ret = bch2_move_data(c,
(struct bbpos) { op.start_btree, op.start_pos },
(struct bbpos) { op.end_btree, op.end_pos },
NULL, NULL,
stats, stats,
writepoint_hashed((unsigned long) current), writepoint_hashed((unsigned long) current),
true, true,
migrate_pred, &op) ?: ret; migrate_pred, &op) ?: ret;
ret = bch2_replicas_gc2(c) ?: ret; ret = bch2_replicas_gc2(c) ?: ret;
bch2_move_stats_exit(stats, c);
break; break;
case BCH_DATA_OP_REWRITE_OLD_NODES: case BCH_DATA_OP_rewrite_old_nodes:
bch2_move_stats_init(stats, "rewrite_old_nodes");
ret = bch2_scan_old_btree_nodes(c, stats); ret = bch2_scan_old_btree_nodes(c, stats);
bch2_move_stats_exit(stats, c); break;
case BCH_DATA_OP_drop_extra_replicas:
ret = bch2_move_btree(c, start, end,
drop_extra_replicas_btree_pred, c, stats) ?: ret;
ret = bch2_move_data(c, start, end, NULL, stats,
writepoint_hashed((unsigned long) current),
true,
drop_extra_replicas_pred, c) ?: ret;
ret = bch2_replicas_gc2(c) ?: ret;
break; break;
default: default:
ret = -EINVAL; ret = -EINVAL;
} }
bch2_move_stats_exit(stats, c);
return ret; return ret;
} }

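The new bch2_data_ops_strs[] table near the top of this file uses the x-macro pattern: the BCH_DATA_OPS() list expands once into the enum values (BCH_DATA_OP_rereplicate, BCH_DATA_OP_migrate, ...) and once into the matching string table, so names and numbering cannot drift apart and bch2_move_stats_init() can simply be handed bch2_data_ops_strs[op.op]. A standalone sketch of the pattern, with illustrative indices rather than the real list from the bcachefs headers:

#include <stdio.h>

#define DATA_OPS()                      \
        x(rereplicate,         0)       \
        x(migrate,             1)       \
        x(rewrite_old_nodes,   2)       \
        x(drop_extra_replicas, 3)

enum data_op {
#define x(t, n) DATA_OP_##t = n,
        DATA_OPS()
#undef x
        DATA_OP_NR
};

static const char * const data_op_strs[] = {
#define x(t, n) [n] = #t,
        DATA_OPS()
#undef x
        NULL
};

int main(void)
{
        for (int i = 0; i < DATA_OP_NR; i++)
                printf("%d -> %s\n", i, data_op_strs[i]);
        return 0;
}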

@ -75,12 +75,15 @@ do { \
typedef bool (*move_pred_fn)(struct bch_fs *, void *, struct bkey_s_c, typedef bool (*move_pred_fn)(struct bch_fs *, void *, struct bkey_s_c,
struct bch_io_opts *, struct data_update_opts *); struct bch_io_opts *, struct data_update_opts *);
extern const char * const bch2_data_ops_strs[];
void bch2_moving_ctxt_exit(struct moving_context *); void bch2_moving_ctxt_exit(struct moving_context *);
void bch2_moving_ctxt_init(struct moving_context *, struct bch_fs *, void bch2_moving_ctxt_init(struct moving_context *, struct bch_fs *,
struct bch_ratelimit *, struct bch_move_stats *, struct bch_ratelimit *, struct bch_move_stats *,
struct write_point_specifier, bool); struct write_point_specifier, bool);
struct moving_io *bch2_moving_ctxt_next_pending_write(struct moving_context *); struct moving_io *bch2_moving_ctxt_next_pending_write(struct moving_context *);
void bch2_moving_ctxt_do_pending_writes(struct moving_context *); void bch2_moving_ctxt_do_pending_writes(struct moving_context *);
void bch2_moving_ctxt_flush_all(struct moving_context *);
void bch2_move_ctxt_wait_for_io(struct moving_context *); void bch2_move_ctxt_wait_for_io(struct moving_context *);
int bch2_move_ratelimit(struct moving_context *); int bch2_move_ratelimit(struct moving_context *);
@ -133,23 +136,17 @@ int bch2_move_data(struct bch_fs *,
bool, bool,
move_pred_fn, void *); move_pred_fn, void *);
int __bch2_evacuate_bucket(struct moving_context *, int bch2_evacuate_bucket(struct moving_context *,
struct move_bucket_in_flight *, struct move_bucket_in_flight *,
struct bpos, int, struct bpos, int,
struct data_update_opts); struct data_update_opts);
int bch2_evacuate_bucket(struct bch_fs *, struct bpos, int,
struct data_update_opts,
struct bch_ratelimit *,
struct bch_move_stats *,
struct write_point_specifier,
bool);
int bch2_data_job(struct bch_fs *, int bch2_data_job(struct bch_fs *,
struct bch_move_stats *, struct bch_move_stats *,
struct bch_ioctl_data); struct bch_ioctl_data);
void bch2_move_stats_to_text(struct printbuf *, struct bch_move_stats *); void bch2_move_stats_to_text(struct printbuf *, struct bch_move_stats *);
void bch2_move_stats_exit(struct bch_move_stats *, struct bch_fs *); void bch2_move_stats_exit(struct bch_move_stats *, struct bch_fs *);
void bch2_move_stats_init(struct bch_move_stats *, char *); void bch2_move_stats_init(struct bch_move_stats *, const char *);
void bch2_fs_moving_ctxts_to_text(struct printbuf *, struct bch_fs *); void bch2_fs_moving_ctxts_to_text(struct printbuf *, struct bch_fs *);


@ -91,7 +91,7 @@ static int bch2_bucket_is_movable(struct btree_trans *trans,
a = bch2_alloc_to_v4(k, &_a); a = bch2_alloc_to_v4(k, &_a);
b->k.gen = a->gen; b->k.gen = a->gen;
b->sectors = a->dirty_sectors; b->sectors = bch2_bucket_sectors_dirty(*a);
ret = data_type_movable(a->data_type) && ret = data_type_movable(a->data_type) &&
a->fragmentation_lru && a->fragmentation_lru &&
@ -145,20 +145,21 @@ static int bch2_copygc_get_buckets(struct moving_context *ctxt,
{ {
struct btree_trans *trans = ctxt->trans; struct btree_trans *trans = ctxt->trans;
struct bch_fs *c = trans->c; struct bch_fs *c = trans->c;
struct btree_iter iter;
struct bkey_s_c k;
size_t nr_to_get = max_t(size_t, 16U, buckets_in_flight->nr / 4); size_t nr_to_get = max_t(size_t, 16U, buckets_in_flight->nr / 4);
size_t saw = 0, in_flight = 0, not_movable = 0, sectors = 0; size_t saw = 0, in_flight = 0, not_movable = 0, sectors = 0;
int ret; int ret;
move_buckets_wait(ctxt, buckets_in_flight, false); move_buckets_wait(ctxt, buckets_in_flight, false);
ret = bch2_btree_write_buffer_flush(trans); ret = bch2_btree_write_buffer_tryflush(trans);
if (bch2_fs_fatal_err_on(ret, c, "%s: error %s from bch2_btree_write_buffer_flush()", if (bch2_err_matches(ret, EROFS))
return ret;
if (bch2_fs_fatal_err_on(ret, c, "%s: error %s from bch2_btree_write_buffer_tryflush()",
__func__, bch2_err_str(ret))) __func__, bch2_err_str(ret)))
return ret; return ret;
ret = for_each_btree_key2_upto(trans, iter, BTREE_ID_lru, ret = for_each_btree_key_upto(trans, iter, BTREE_ID_lru,
lru_pos(BCH_LRU_FRAGMENTATION_START, 0, 0), lru_pos(BCH_LRU_FRAGMENTATION_START, 0, 0),
lru_pos(BCH_LRU_FRAGMENTATION_START, U64_MAX, LRU_TIME_MAX), lru_pos(BCH_LRU_FRAGMENTATION_START, U64_MAX, LRU_TIME_MAX),
0, k, ({ 0, k, ({
@ -167,15 +168,23 @@ static int bch2_copygc_get_buckets(struct moving_context *ctxt,
saw++; saw++;
if (!bch2_bucket_is_movable(trans, &b, lru_pos_time(k.k->p))) ret2 = bch2_bucket_is_movable(trans, &b, lru_pos_time(k.k->p));
if (ret2 < 0)
goto err;
if (!ret2)
not_movable++; not_movable++;
else if (bucket_in_flight(buckets_in_flight, b.k)) else if (bucket_in_flight(buckets_in_flight, b.k))
in_flight++; in_flight++;
else { else {
ret2 = darray_push(buckets, b) ?: buckets->nr >= nr_to_get; ret2 = darray_push(buckets, b);
if (ret2 >= 0) if (ret2)
sectors += b.sectors; goto err;
sectors += b.sectors;
} }
ret2 = buckets->nr >= nr_to_get;
err:
ret2; ret2;
})); }));
@ -198,7 +207,6 @@ static int bch2_copygc(struct moving_context *ctxt,
}; };
move_buckets buckets = { 0 }; move_buckets buckets = { 0 };
struct move_bucket_in_flight *f; struct move_bucket_in_flight *f;
struct move_bucket *i;
u64 moved = atomic64_read(&ctxt->stats->sectors_moved); u64 moved = atomic64_read(&ctxt->stats->sectors_moved);
int ret = 0; int ret = 0;
@ -221,7 +229,7 @@ static int bch2_copygc(struct moving_context *ctxt,
break; break;
} }
ret = __bch2_evacuate_bucket(ctxt, f, f->bucket.k.bucket, ret = bch2_evacuate_bucket(ctxt, f, f->bucket.k.bucket,
f->bucket.k.gen, data_opts); f->bucket.k.gen, data_opts);
if (ret) if (ret)
goto err; goto err;
@ -259,19 +267,16 @@ static int bch2_copygc(struct moving_context *ctxt,
*/ */
unsigned long bch2_copygc_wait_amount(struct bch_fs *c) unsigned long bch2_copygc_wait_amount(struct bch_fs *c)
{ {
struct bch_dev *ca;
unsigned dev_idx;
s64 wait = S64_MAX, fragmented_allowed, fragmented; s64 wait = S64_MAX, fragmented_allowed, fragmented;
unsigned i;
for_each_rw_member(ca, c, dev_idx) { for_each_rw_member(c, ca) {
struct bch_dev_usage usage = bch2_dev_usage_read(ca); struct bch_dev_usage usage = bch2_dev_usage_read(ca);
fragmented_allowed = ((__dev_buckets_available(ca, usage, BCH_WATERMARK_stripe) * fragmented_allowed = ((__dev_buckets_available(ca, usage, BCH_WATERMARK_stripe) *
ca->mi.bucket_size) >> 1); ca->mi.bucket_size) >> 1);
fragmented = 0; fragmented = 0;
for (i = 0; i < BCH_DATA_NR; i++) for (unsigned i = 0; i < BCH_DATA_NR; i++)
if (data_type_movable(i)) if (data_type_movable(i))
fragmented += usage.d[i].fragmented; fragmented += usage.d[i].fragmented;
@ -313,9 +318,9 @@ static int bch2_copygc_thread(void *arg)
if (!buckets) if (!buckets)
return -ENOMEM; return -ENOMEM;
ret = rhashtable_init(&buckets->table, &bch_move_bucket_params); ret = rhashtable_init(&buckets->table, &bch_move_bucket_params);
bch_err_msg(c, ret, "allocating copygc buckets in flight");
if (ret) { if (ret) {
kfree(buckets); kfree(buckets);
bch_err_msg(c, ret, "allocating copygc buckets in flight");
return ret; return ret;
} }
@ -334,7 +339,8 @@ static int bch2_copygc_thread(void *arg)
if (!c->copy_gc_enabled) { if (!c->copy_gc_enabled) {
move_buckets_wait(&ctxt, buckets, true); move_buckets_wait(&ctxt, buckets, true);
kthread_wait_freezable(c->copy_gc_enabled); kthread_wait_freezable(c->copy_gc_enabled ||
kthread_should_stop());
} }
if (unlikely(freezing(current))) { if (unlikely(freezing(current))) {
@ -411,10 +417,9 @@ int bch2_copygc_start(struct bch_fs *c)
t = kthread_create(bch2_copygc_thread, c, "bch-copygc/%s", c->name); t = kthread_create(bch2_copygc_thread, c, "bch-copygc/%s", c->name);
ret = PTR_ERR_OR_ZERO(t); ret = PTR_ERR_OR_ZERO(t);
if (ret) { bch_err_msg(c, ret, "creating copygc thread");
bch_err_msg(c, ret, "creating copygc thread"); if (ret)
return ret; return ret;
}
get_task_struct(t); get_task_struct(t);


@ -279,14 +279,14 @@ int bch2_opt_validate(const struct bch_option *opt, u64 v, struct printbuf *err)
if (err) if (err)
prt_printf(err, "%s: not a multiple of 512", prt_printf(err, "%s: not a multiple of 512",
opt->attr.name); opt->attr.name);
return -EINVAL; return -BCH_ERR_opt_parse_error;
} }
if ((opt->flags & OPT_MUST_BE_POW_2) && !is_power_of_2(v)) { if ((opt->flags & OPT_MUST_BE_POW_2) && !is_power_of_2(v)) {
if (err) if (err)
prt_printf(err, "%s: must be a power of two", prt_printf(err, "%s: must be a power of two",
opt->attr.name); opt->attr.name);
return -EINVAL; return -BCH_ERR_opt_parse_error;
} }
if (opt->fn.validate) if (opt->fn.validate)


@ -233,11 +233,6 @@ enum fsck_err_opts {
OPT_BOOL(), \ OPT_BOOL(), \
BCH2_NO_SB_OPT, true, \ BCH2_NO_SB_OPT, true, \
NULL, "Stash pointer to in memory btree node in btree ptr")\ NULL, "Stash pointer to in memory btree node in btree ptr")\
x(btree_write_buffer_size, u32, \
OPT_FS|OPT_MOUNT, \
OPT_UINT(16, (1U << 20) - 1), \
BCH2_NO_SB_OPT, 1U << 13, \
NULL, "Number of btree write buffer entries") \
x(gc_reserve_percent, u8, \ x(gc_reserve_percent, u8, \
OPT_FS|OPT_FORMAT|OPT_MOUNT|OPT_RUNTIME, \ OPT_FS|OPT_FORMAT|OPT_MOUNT|OPT_RUNTIME, \
OPT_UINT(5, 21), \ OPT_UINT(5, 21), \
@ -394,7 +389,7 @@ enum fsck_err_opts {
BCH2_NO_SB_OPT, BCH_SB_SECTOR, \ BCH2_NO_SB_OPT, BCH_SB_SECTOR, \
"offset", "Sector offset of superblock") \ "offset", "Sector offset of superblock") \
x(read_only, u8, \ x(read_only, u8, \
OPT_FS, \ OPT_FS|OPT_MOUNT, \
OPT_BOOL(), \ OPT_BOOL(), \
BCH2_NO_SB_OPT, false, \ BCH2_NO_SB_OPT, false, \
NULL, NULL) \ NULL, NULL) \
@ -419,6 +414,11 @@ enum fsck_err_opts {
OPT_BOOL(), \ OPT_BOOL(), \
BCH2_NO_SB_OPT, false, \ BCH2_NO_SB_OPT, false, \
NULL, "Allocate the buckets_nouse bitmap") \ NULL, "Allocate the buckets_nouse bitmap") \
x(stdio, u64, \
0, \
OPT_UINT(0, S64_MAX), \
BCH2_NO_SB_OPT, false, \
NULL, "Pointer to a struct stdio_redirect") \
x(project, u8, \ x(project, u8, \
OPT_INODE, \ OPT_INODE, \
OPT_BOOL(), \ OPT_BOOL(), \
@ -458,7 +458,13 @@ enum fsck_err_opts {
OPT_UINT(0, BCH_REPLICAS_MAX), \ OPT_UINT(0, BCH_REPLICAS_MAX), \
BCH2_NO_SB_OPT, 1, \ BCH2_NO_SB_OPT, 1, \
"n", "Data written to this device will be considered\n"\ "n", "Data written to this device will be considered\n"\
"to have already been replicated n times") "to have already been replicated n times") \
x(btree_node_prefetch, u8, \
OPT_FS|OPT_MOUNT|OPT_RUNTIME, \
OPT_BOOL(), \
BCH2_NO_SB_OPT, true, \
NULL, "BTREE_ITER_PREFETCH casuse btree nodes to be\n"\
" prefetched sequentially")
struct bch_opts { struct bch_opts {
#define x(_name, _bits, ...) unsigned _name##_defined:1; #define x(_name, _bits, ...) unsigned _name##_defined:1;


@ -599,14 +599,9 @@ static int bch2_fs_quota_read_inode(struct btree_trans *trans,
int bch2_fs_quota_read(struct bch_fs *c) int bch2_fs_quota_read(struct bch_fs *c)
{ {
struct bch_sb_field_quota *sb_quota;
struct btree_trans *trans;
struct btree_iter iter;
struct bkey_s_c k;
int ret;
mutex_lock(&c->sb_lock); mutex_lock(&c->sb_lock);
sb_quota = bch2_sb_get_or_create_quota(&c->disk_sb); struct bch_sb_field_quota *sb_quota = bch2_sb_get_or_create_quota(&c->disk_sb);
if (!sb_quota) { if (!sb_quota) {
mutex_unlock(&c->sb_lock); mutex_unlock(&c->sb_lock);
return -BCH_ERR_ENOSPC_sb_quota; return -BCH_ERR_ENOSPC_sb_quota;
@ -615,19 +610,14 @@ int bch2_fs_quota_read(struct bch_fs *c)
bch2_sb_quota_read(c); bch2_sb_quota_read(c);
mutex_unlock(&c->sb_lock); mutex_unlock(&c->sb_lock);
trans = bch2_trans_get(c); int ret = bch2_trans_run(c,
for_each_btree_key(trans, iter, BTREE_ID_quotas, POS_MIN,
ret = for_each_btree_key2(trans, iter, BTREE_ID_quotas, BTREE_ITER_PREFETCH, k,
POS_MIN, BTREE_ITER_PREFETCH, k, __bch2_quota_set(c, k, NULL)) ?:
__bch2_quota_set(c, k, NULL)) ?: for_each_btree_key(trans, iter, BTREE_ID_inodes, POS_MIN,
for_each_btree_key2(trans, iter, BTREE_ID_inodes, BTREE_ITER_PREFETCH|BTREE_ITER_ALL_SNAPSHOTS, k,
POS_MIN, BTREE_ITER_PREFETCH|BTREE_ITER_ALL_SNAPSHOTS, k, bch2_fs_quota_read_inode(trans, &iter, k)));
bch2_fs_quota_read_inode(trans, &iter, k)); bch_err_fn(c, ret);
bch2_trans_put(trans);
if (ret)
bch_err_fn(c, ret);
return ret; return ret;
} }


@ -69,7 +69,7 @@ static int __bch2_set_rebalance_needs_scan(struct btree_trans *trans, u64 inum)
int bch2_set_rebalance_needs_scan(struct bch_fs *c, u64 inum) int bch2_set_rebalance_needs_scan(struct bch_fs *c, u64 inum)
{ {
int ret = bch2_trans_do(c, NULL, NULL, BTREE_INSERT_NOFAIL|BTREE_INSERT_LAZY_RW, int ret = bch2_trans_do(c, NULL, NULL, BCH_TRANS_COMMIT_no_enospc|BCH_TRANS_COMMIT_lazy_rw,
__bch2_set_rebalance_needs_scan(trans, inum)); __bch2_set_rebalance_needs_scan(trans, inum));
rebalance_wakeup(c); rebalance_wakeup(c);
return ret; return ret;
@ -125,7 +125,7 @@ static int bch2_bkey_clear_needs_rebalance(struct btree_trans *trans,
extent_entry_drop(bkey_i_to_s(n), extent_entry_drop(bkey_i_to_s(n),
(void *) bch2_bkey_rebalance_opts(bkey_i_to_s_c(n))); (void *) bch2_bkey_rebalance_opts(bkey_i_to_s_c(n)));
return bch2_trans_commit(trans, NULL, NULL, BTREE_INSERT_NOFAIL); return bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc);
} }
static struct bkey_s_c next_rebalance_extent(struct btree_trans *trans, static struct bkey_s_c next_rebalance_extent(struct btree_trans *trans,
@ -171,6 +171,21 @@ static struct bkey_s_c next_rebalance_extent(struct btree_trans *trans,
return bkey_s_c_null; return bkey_s_c_null;
} }
if (trace_rebalance_extent_enabled()) {
struct printbuf buf = PRINTBUF;
prt_str(&buf, "target=");
bch2_target_to_text(&buf, c, r->target);
prt_str(&buf, " compression=");
struct bch_compression_opt opt = __bch2_compression_decode(r->compression);
prt_str(&buf, bch2_compression_opts[opt.type]);
prt_str(&buf, " ");
bch2_bkey_val_to_text(&buf, c, k);
trace_rebalance_extent(c, buf.buf);
printbuf_exit(&buf);
}
return k; return k;
} }
@ -273,7 +288,7 @@ static int do_rebalance_scan(struct moving_context *ctxt, u64 inum, u64 cookie)
r->state = BCH_REBALANCE_scanning; r->state = BCH_REBALANCE_scanning;
ret = __bch2_move_data(ctxt, r->scan_start, r->scan_end, rebalance_pred, NULL) ?: ret = __bch2_move_data(ctxt, r->scan_start, r->scan_end, rebalance_pred, NULL) ?:
commit_do(trans, NULL, NULL, BTREE_INSERT_NOFAIL, commit_do(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc,
bch2_clear_rebalance_needs_scan(trans, inum, cookie)); bch2_clear_rebalance_needs_scan(trans, inum, cookie));
bch2_move_stats_exit(&r->scan_stats, trans->c); bch2_move_stats_exit(&r->scan_stats, trans->c);
@ -317,8 +332,16 @@ static int do_rebalance(struct moving_context *ctxt)
BTREE_ID_rebalance_work, POS_MIN, BTREE_ID_rebalance_work, POS_MIN,
BTREE_ITER_ALL_SNAPSHOTS); BTREE_ITER_ALL_SNAPSHOTS);
while (!bch2_move_ratelimit(ctxt) && while (!bch2_move_ratelimit(ctxt)) {
!kthread_wait_freezable(r->enabled)) { if (!r->enabled) {
bch2_moving_ctxt_flush_all(ctxt);
kthread_wait_freezable(r->enabled ||
kthread_should_stop());
}
if (kthread_should_stop())
break;
bch2_trans_begin(trans); bch2_trans_begin(trans);
ret = bkey_err(k = next_rebalance_entry(trans, &rebalance_work_iter)); ret = bkey_err(k = next_rebalance_entry(trans, &rebalance_work_iter));
@ -447,10 +470,9 @@ int bch2_rebalance_start(struct bch_fs *c)
p = kthread_create(bch2_rebalance_thread, c, "bch-rebalance/%s", c->name); p = kthread_create(bch2_rebalance_thread, c, "bch-rebalance/%s", c->name);
ret = PTR_ERR_OR_ZERO(p); ret = PTR_ERR_OR_ZERO(p);
if (ret) { bch_err_msg(c, ret, "creating rebalance thread");
bch_err_msg(c, ret, "creating rebalance thread"); if (ret)
return ret; return ret;
}
get_task_struct(p); get_task_struct(p);
rcu_assign_pointer(c->rebalance.thread, p); rcu_assign_pointer(c->rebalance.thread, p);


@ -99,6 +99,11 @@ static int bch2_journal_replay_key(struct btree_trans *trans,
unsigned update_flags = BTREE_TRIGGER_NORUN; unsigned update_flags = BTREE_TRIGGER_NORUN;
int ret; int ret;
if (k->overwritten)
return 0;
trans->journal_res.seq = k->journal_seq;
/* /*
* BTREE_UPDATE_KEY_CACHE_RECLAIM disables key cache lookup/update to * BTREE_UPDATE_KEY_CACHE_RECLAIM disables key cache lookup/update to
* keep the key cache coherent with the underlying btree. Nothing * keep the key cache coherent with the underlying btree. Nothing
@ -140,27 +145,13 @@ static int journal_sort_seq_cmp(const void *_l, const void *_r)
static int bch2_journal_replay(struct bch_fs *c) static int bch2_journal_replay(struct bch_fs *c)
{ {
struct journal_keys *keys = &c->journal_keys; struct journal_keys *keys = &c->journal_keys;
struct journal_key **keys_sorted, *k; DARRAY(struct journal_key *) keys_sorted = { 0 };
struct journal *j = &c->journal; struct journal *j = &c->journal;
u64 start_seq = c->journal_replay_seq_start; u64 start_seq = c->journal_replay_seq_start;
u64 end_seq = c->journal_replay_seq_start; u64 end_seq = c->journal_replay_seq_start;
size_t i; struct btree_trans *trans = bch2_trans_get(c);
int ret = 0; int ret = 0;
move_gap(keys->d, keys->nr, keys->size, keys->gap, keys->nr);
keys->gap = keys->nr;
keys_sorted = kvmalloc_array(keys->nr, sizeof(*keys_sorted), GFP_KERNEL);
if (!keys_sorted)
return -BCH_ERR_ENOMEM_journal_replay;
for (i = 0; i < keys->nr; i++)
keys_sorted[i] = &keys->d[i];
sort(keys_sorted, keys->nr,
sizeof(keys_sorted[0]),
journal_sort_seq_cmp, NULL);
if (keys->nr) { if (keys->nr) {
ret = bch2_journal_log_msg(c, "Starting journal replay (%zu keys in entries %llu-%llu)", ret = bch2_journal_log_msg(c, "Starting journal replay (%zu keys in entries %llu-%llu)",
keys->nr, start_seq, end_seq); keys->nr, start_seq, end_seq);
@ -170,27 +161,67 @@ static int bch2_journal_replay(struct bch_fs *c)
BUG_ON(!atomic_read(&keys->ref)); BUG_ON(!atomic_read(&keys->ref));
for (i = 0; i < keys->nr; i++) { /*
k = keys_sorted[i]; * First, attempt to replay keys in sorted order. This is more
* efficient - better locality of btree access - but some might fail if
* that would cause a journal deadlock.
*/
for (size_t i = 0; i < keys->nr; i++) {
cond_resched(); cond_resched();
struct journal_key *k = keys->d + i;
/* Skip fastpath if we're low on space in the journal */
ret = c->journal.watermark ? -1 :
commit_do(trans, NULL, NULL,
BCH_TRANS_COMMIT_no_enospc|
BCH_TRANS_COMMIT_journal_reclaim|
(!k->allocated ? BCH_TRANS_COMMIT_no_journal_res : 0),
bch2_journal_replay_key(trans, k));
BUG_ON(!ret && !k->overwritten);
if (ret) {
ret = darray_push(&keys_sorted, k);
if (ret)
goto err;
}
}
/*
* Now, replay any remaining keys in the order in which they appear in
* the journal, unpinning those journal entries as we go:
*/
sort(keys_sorted.data, keys_sorted.nr,
sizeof(keys_sorted.data[0]),
journal_sort_seq_cmp, NULL);
darray_for_each(keys_sorted, kp) {
cond_resched();
struct journal_key *k = *kp;
replay_now_at(j, k->journal_seq); replay_now_at(j, k->journal_seq);
ret = bch2_trans_do(c, NULL, NULL, ret = commit_do(trans, NULL, NULL,
BTREE_INSERT_LAZY_RW| BCH_TRANS_COMMIT_no_enospc|
BTREE_INSERT_NOFAIL| (!k->allocated
(!k->allocated ? BCH_TRANS_COMMIT_no_journal_res|BCH_WATERMARK_reclaim
? BTREE_INSERT_JOURNAL_REPLAY|BCH_WATERMARK_reclaim : 0),
: 0),
bch2_journal_replay_key(trans, k)); bch2_journal_replay_key(trans, k));
if (ret) { bch_err_msg(c, ret, "while replaying key at btree %s level %u:",
bch_err(c, "journal replay: error while replaying key at btree %s level %u: %s", bch2_btree_id_str(k->btree_id), k->level);
bch2_btree_id_str(k->btree_id), k->level, bch2_err_str(ret)); if (ret)
goto err; goto err;
}
BUG_ON(!k->overwritten);
} }
/*
* We need to put our btree_trans before calling flush_all_pins(), since
* that will use a btree_trans internally
*/
bch2_trans_put(trans);
trans = NULL;
if (!c->opts.keep_journal) if (!c->opts.keep_journal)
bch2_journal_keys_put_initial(c); bch2_journal_keys_put_initial(c);
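The comments in the hunk above describe the new two-phase replay strategy: keys are first attempted in sorted order for btree locality, and any that fail (for example because committing them would require journal reclaim) are collected and retried in journal-sequence order while pinning the journal. Below is a minimal standalone sketch of that pattern; the types and helper names (demo_key, try_replay_fast, replay_pinning_journal) are hypothetical stand-ins, not the bcachefs code.

	#include <stdlib.h>
	#include <stdbool.h>
	#include <stddef.h>

	struct demo_key {
		unsigned long long	journal_seq;
		bool			replayed;
	};

	/* stand-ins for the fast-path commit and the journal-pinning fallback: */
	static int try_replay_fast(struct demo_key *k)        { k->replayed = true; return 0; }
	static int replay_pinning_journal(struct demo_key *k)  { k->replayed = true; return 0; }

	static int cmp_journal_seq(const void *_l, const void *_r)
	{
		const struct demo_key *l = *(const struct demo_key * const *) _l;
		const struct demo_key *r = *(const struct demo_key * const *) _r;

		return (l->journal_seq > r->journal_seq) - (l->journal_seq < r->journal_seq);
	}

	static int replay_two_phase(struct demo_key *keys, size_t nr)
	{
		struct demo_key **retry = malloc(nr * sizeof(*retry));
		size_t nr_retry = 0;
		int ret = 0;

		if (!retry)
			return -1;

		/* phase 1: sorted order, better btree locality; defer any failures */
		for (size_t i = 0; i < nr; i++)
			if (try_replay_fast(&keys[i]))
				retry[nr_retry++] = &keys[i];

		/* phase 2: retry the failures in journal-sequence order */
		qsort(retry, nr_retry, sizeof(retry[0]), cmp_journal_seq);
		for (size_t i = 0; i < nr_retry && !ret; i++)
			ret = replay_pinning_journal(retry[i]);

		free(retry);
		return ret;
	}
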
@ -198,16 +229,14 @@ static int bch2_journal_replay(struct bch_fs *c)
j->replay_journal_seq = 0; j->replay_journal_seq = 0;
bch2_journal_set_replay_done(j); bch2_journal_set_replay_done(j);
bch2_journal_flush_all_pins(j);
ret = bch2_journal_error(j);
if (keys->nr && !ret) if (keys->nr)
bch2_journal_log_msg(c, "journal replay finished"); bch2_journal_log_msg(c, "journal replay finished");
err: err:
kvfree(keys_sorted); if (trans)
bch2_trans_put(trans);
if (ret) darray_exit(&keys_sorted);
bch_err_fn(c, ret); bch_err_fn(c, ret);
return ret; return ret;
} }
@ -275,8 +304,6 @@ static int journal_replay_entry_early(struct bch_fs *c,
struct bch_dev *ca = bch_dev_bkey_exists(c, le32_to_cpu(u->dev)); struct bch_dev *ca = bch_dev_bkey_exists(c, le32_to_cpu(u->dev));
unsigned i, nr_types = jset_entry_dev_usage_nr_types(u); unsigned i, nr_types = jset_entry_dev_usage_nr_types(u);
ca->usage_base->buckets_ec = le64_to_cpu(u->buckets_ec);
for (i = 0; i < min_t(unsigned, nr_types, BCH_DATA_NR); i++) { for (i = 0; i < min_t(unsigned, nr_types, BCH_DATA_NR); i++) {
ca->usage_base->d[i].buckets = le64_to_cpu(u->d[i].buckets); ca->usage_base->d[i].buckets = le64_to_cpu(u->d[i].buckets);
ca->usage_base->d[i].sectors = le64_to_cpu(u->d[i].sectors); ca->usage_base->d[i].sectors = le64_to_cpu(u->d[i].sectors);
@ -317,14 +344,11 @@ static int journal_replay_entry_early(struct bch_fs *c,
static int journal_replay_early(struct bch_fs *c, static int journal_replay_early(struct bch_fs *c,
struct bch_sb_field_clean *clean) struct bch_sb_field_clean *clean)
{ {
struct jset_entry *entry;
int ret;
if (clean) { if (clean) {
for (entry = clean->start; for (struct jset_entry *entry = clean->start;
entry != vstruct_end(&clean->field); entry != vstruct_end(&clean->field);
entry = vstruct_next(entry)) { entry = vstruct_next(entry)) {
ret = journal_replay_entry_early(c, entry); int ret = journal_replay_entry_early(c, entry);
if (ret) if (ret)
return ret; return ret;
} }
@ -339,7 +363,7 @@ static int journal_replay_early(struct bch_fs *c,
continue; continue;
vstruct_for_each(&i->j, entry) { vstruct_for_each(&i->j, entry) {
ret = journal_replay_entry_early(c, entry); int ret = journal_replay_entry_early(c, entry);
if (ret) if (ret)
return ret; return ret;
} }
@ -435,8 +459,7 @@ static int bch2_initialize_subvolumes(struct bch_fs *c)
ret = bch2_btree_insert(c, BTREE_ID_snapshot_trees, &root_tree.k_i, NULL, 0) ?: ret = bch2_btree_insert(c, BTREE_ID_snapshot_trees, &root_tree.k_i, NULL, 0) ?:
bch2_btree_insert(c, BTREE_ID_snapshots, &root_snapshot.k_i, NULL, 0) ?: bch2_btree_insert(c, BTREE_ID_snapshots, &root_snapshot.k_i, NULL, 0) ?:
bch2_btree_insert(c, BTREE_ID_subvolumes, &root_volume.k_i, NULL, 0); bch2_btree_insert(c, BTREE_ID_subvolumes, &root_volume.k_i, NULL, 0);
if (ret) bch_err_fn(c, ret);
bch_err_fn(c, ret);
return ret; return ret;
} }
@ -474,10 +497,9 @@ static int __bch2_fs_upgrade_for_subvolumes(struct btree_trans *trans)
noinline_for_stack noinline_for_stack
static int bch2_fs_upgrade_for_subvolumes(struct bch_fs *c) static int bch2_fs_upgrade_for_subvolumes(struct bch_fs *c)
{ {
int ret = bch2_trans_do(c, NULL, NULL, BTREE_INSERT_LAZY_RW, int ret = bch2_trans_do(c, NULL, NULL, BCH_TRANS_COMMIT_lazy_rw,
__bch2_fs_upgrade_for_subvolumes(trans)); __bch2_fs_upgrade_for_subvolumes(trans));
if (ret) bch_err_fn(c, ret);
bch_err_fn(c, ret);
return ret; return ret;
} }
@ -495,7 +517,20 @@ static int bch2_check_allocations(struct bch_fs *c)
static int bch2_set_may_go_rw(struct bch_fs *c) static int bch2_set_may_go_rw(struct bch_fs *c)
{ {
set_bit(BCH_FS_MAY_GO_RW, &c->flags); struct journal_keys *keys = &c->journal_keys;
/*
* After we go RW, the journal keys buffer can't be modified (except for
* setting journal_key->overwritten), since it will be accessed by multiple
* threads.
*/
move_gap(keys->d, keys->nr, keys->size, keys->gap, keys->nr);
keys->gap = keys->nr;
set_bit(BCH_FS_may_go_rw, &c->flags);
if (keys->nr || c->opts.fsck || !c->sb.clean)
return bch2_fs_read_write_early(c);
return 0; return 0;
} }
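The journal keys array is a gap buffer, and the move_gap(keys->d, keys->nr, keys->size, keys->gap, keys->nr) call above pushes the gap to the end so the live keys become one contiguous array before other threads start reading it. The following is a hypothetical standalone sketch of that operation (demo_key and demo_move_gap are illustrative names, not the bcachefs implementation):

	#include <string.h>
	#include <stddef.h>

	struct demo_key { unsigned long long seq; };

	/*
	 * d[] holds nr live entries in a buffer of capacity size, with a gap of
	 * (size - nr) unused slots starting at index old_gap: live entries occupy
	 * d[0..old_gap) and d[old_gap + gap_size..size).  Moving the gap to
	 * new_gap == nr leaves d[0..nr) contiguous.
	 */
	static void demo_move_gap(struct demo_key *d, size_t nr, size_t size,
				  size_t old_gap, size_t new_gap)
	{
		size_t gap_size = size - nr;

		if (new_gap > old_gap)
			/* slide the entries between the two gap positions down over the old gap */
			memmove(d + old_gap,
				d + old_gap + gap_size,
				(new_gap - old_gap) * sizeof(*d));
		else if (new_gap < old_gap)
			/* slide entries up to open the gap earlier in the array */
			memmove(d + new_gap + gap_size,
				d + new_gap,
				(old_gap - new_gap) * sizeof(*d));
	}
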
@ -589,17 +624,15 @@ static bool check_version_upgrade(struct bch_fs *c)
bch2_version_to_text(&buf, new_version); bch2_version_to_text(&buf, new_version);
prt_newline(&buf); prt_newline(&buf);
u64 recovery_passes = bch2_upgrade_recovery_passes(c, old_version, new_version); struct bch_sb_field_ext *ext = bch2_sb_field_get(c->disk_sb.sb, ext);
if (recovery_passes) { __le64 passes = ext->recovery_passes_required[0];
if ((recovery_passes & RECOVERY_PASS_ALL_FSCK) == RECOVERY_PASS_ALL_FSCK) bch2_sb_set_upgrade(c, old_version, new_version);
prt_str(&buf, "fsck required"); passes = ext->recovery_passes_required[0] & ~passes;
else {
prt_str(&buf, "running recovery passes: ");
prt_bitflags(&buf, bch2_recovery_passes, recovery_passes);
}
c->recovery_passes_explicit |= recovery_passes; if (passes) {
c->opts.fix_errors = FSCK_FIX_yes; prt_str(&buf, " running recovery passes: ");
prt_bitflags(&buf, bch2_recovery_passes,
bch2_recovery_passes_from_stable(le64_to_cpu(passes)));
} }
bch_info(c, "%s", buf.buf); bch_info(c, "%s", buf.buf);
@ -625,7 +658,7 @@ u64 bch2_fsck_recovery_passes(void)
static bool should_run_recovery_pass(struct bch_fs *c, enum bch_recovery_pass pass) static bool should_run_recovery_pass(struct bch_fs *c, enum bch_recovery_pass pass)
{ {
struct recovery_pass_fn *p = recovery_pass_fns + c->curr_recovery_pass; struct recovery_pass_fn *p = recovery_pass_fns + pass;
if (c->opts.norecovery && pass > BCH_RECOVERY_PASS_snapshots_read) if (c->opts.norecovery && pass > BCH_RECOVERY_PASS_snapshots_read)
return false; return false;
@ -642,24 +675,17 @@ static bool should_run_recovery_pass(struct bch_fs *c, enum bch_recovery_pass pa
static int bch2_run_recovery_pass(struct bch_fs *c, enum bch_recovery_pass pass) static int bch2_run_recovery_pass(struct bch_fs *c, enum bch_recovery_pass pass)
{ {
struct recovery_pass_fn *p = recovery_pass_fns + pass;
int ret; int ret;
c->curr_recovery_pass = pass; if (!(p->when & PASS_SILENT))
bch2_print(c, KERN_INFO bch2_log_msg(c, "%s..."),
if (should_run_recovery_pass(c, pass)) { bch2_recovery_passes[pass]);
struct recovery_pass_fn *p = recovery_pass_fns + pass; ret = p->fn(c);
if (ret)
if (!(p->when & PASS_SILENT)) return ret;
printk(KERN_INFO bch2_log_msg(c, "%s..."), if (!(p->when & PASS_SILENT))
bch2_recovery_passes[pass]); bch2_print(c, KERN_CONT " done\n");
ret = p->fn(c);
if (ret)
return ret;
if (!(p->when & PASS_SILENT))
printk(KERN_CONT " done\n");
c->recovery_passes_complete |= BIT_ULL(pass);
}
return 0; return 0;
} }
@ -669,12 +695,42 @@ static int bch2_run_recovery_passes(struct bch_fs *c)
int ret = 0; int ret = 0;
while (c->curr_recovery_pass < ARRAY_SIZE(recovery_pass_fns)) { while (c->curr_recovery_pass < ARRAY_SIZE(recovery_pass_fns)) {
ret = bch2_run_recovery_pass(c, c->curr_recovery_pass); if (should_run_recovery_pass(c, c->curr_recovery_pass)) {
if (bch2_err_matches(ret, BCH_ERR_restart_recovery)) unsigned pass = c->curr_recovery_pass;
ret = bch2_run_recovery_pass(c, c->curr_recovery_pass);
if (bch2_err_matches(ret, BCH_ERR_restart_recovery) ||
(ret && c->curr_recovery_pass < pass))
continue;
if (ret)
break;
c->recovery_passes_complete |= BIT_ULL(c->curr_recovery_pass);
}
c->curr_recovery_pass++;
c->recovery_pass_done = max(c->recovery_pass_done, c->curr_recovery_pass);
}
return ret;
}
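bch2_run_recovery_passes() above now lets a pass rewind curr_recovery_pass to request that earlier passes run again, and the loop restarts from the rewound position instead of advancing. A toy sketch of that control flow follows; next_pass, should_run() and run_pass() are hypothetical stand-ins (a real pass could lower next_pass to request a restart):

	static unsigned next_pass;	/* analogous to c->curr_recovery_pass */

	static bool should_run(unsigned pass) { return true; }
	static int  run_pass(unsigned pass)   { return 0; }

	static int run_passes(unsigned nr_passes)
	{
		int ret = 0;

		while (next_pass < nr_passes) {
			if (should_run(next_pass)) {
				unsigned pass = next_pass;

				ret = run_pass(pass);
				if (next_pass < pass)	/* pass requested a rewind */
					continue;	/* restart from the earlier pass */
				if (ret)
					break;
			}
			next_pass++;
		}
		return ret;
	}
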
int bch2_run_online_recovery_passes(struct bch_fs *c)
{
int ret = 0;
for (unsigned i = 0; i < ARRAY_SIZE(recovery_pass_fns); i++) {
struct recovery_pass_fn *p = recovery_pass_fns + i;
if (!(p->when & PASS_ONLINE))
continue; continue;
ret = bch2_run_recovery_pass(c, i);
if (bch2_err_matches(ret, BCH_ERR_restart_recovery)) {
i = c->curr_recovery_pass;
continue;
}
if (ret) if (ret)
break; break;
c->curr_recovery_pass++;
} }
return ret; return ret;
@ -779,6 +835,9 @@ int bch2_fs_recovery(struct bch_fs *c)
if (c->opts.fsck && IS_ENABLED(CONFIG_BCACHEFS_DEBUG)) if (c->opts.fsck && IS_ENABLED(CONFIG_BCACHEFS_DEBUG))
c->recovery_passes_explicit |= BIT_ULL(BCH_RECOVERY_PASS_check_topology); c->recovery_passes_explicit |= BIT_ULL(BCH_RECOVERY_PASS_check_topology);
if (c->opts.fsck)
set_bit(BCH_FS_fsck_running, &c->flags);
ret = bch2_blacklist_table_initialize(c); ret = bch2_blacklist_table_initialize(c);
if (ret) { if (ret) {
bch_err(c, "error initializing blacklist table"); bch_err(c, "error initializing blacklist table");
@ -919,13 +978,17 @@ int bch2_fs_recovery(struct bch_fs *c)
if (ret) if (ret)
goto err; goto err;
clear_bit(BCH_FS_fsck_running, &c->flags);
/* If we fixed errors, verify that fs is actually clean now: */ /* If we fixed errors, verify that fs is actually clean now: */
if (IS_ENABLED(CONFIG_BCACHEFS_DEBUG) && if (IS_ENABLED(CONFIG_BCACHEFS_DEBUG) &&
test_bit(BCH_FS_ERRORS_FIXED, &c->flags) && test_bit(BCH_FS_errors_fixed, &c->flags) &&
!test_bit(BCH_FS_ERRORS_NOT_FIXED, &c->flags) && !test_bit(BCH_FS_errors_not_fixed, &c->flags) &&
!test_bit(BCH_FS_ERROR, &c->flags)) { !test_bit(BCH_FS_error, &c->flags)) {
bch2_flush_fsck_errs(c);
bch_info(c, "Fixed errors, running fsck a second time to verify fs is clean"); bch_info(c, "Fixed errors, running fsck a second time to verify fs is clean");
clear_bit(BCH_FS_ERRORS_FIXED, &c->flags); clear_bit(BCH_FS_errors_fixed, &c->flags);
c->curr_recovery_pass = BCH_RECOVERY_PASS_check_alloc_info; c->curr_recovery_pass = BCH_RECOVERY_PASS_check_alloc_info;
@ -933,13 +996,13 @@ int bch2_fs_recovery(struct bch_fs *c)
if (ret) if (ret)
goto err; goto err;
if (test_bit(BCH_FS_ERRORS_FIXED, &c->flags) || if (test_bit(BCH_FS_errors_fixed, &c->flags) ||
test_bit(BCH_FS_ERRORS_NOT_FIXED, &c->flags)) { test_bit(BCH_FS_errors_not_fixed, &c->flags)) {
bch_err(c, "Second fsck run was not clean"); bch_err(c, "Second fsck run was not clean");
set_bit(BCH_FS_ERRORS_NOT_FIXED, &c->flags); set_bit(BCH_FS_errors_not_fixed, &c->flags);
} }
set_bit(BCH_FS_ERRORS_FIXED, &c->flags); set_bit(BCH_FS_errors_fixed, &c->flags);
} }
if (enabled_qtypes(c)) { if (enabled_qtypes(c)) {
@ -958,13 +1021,13 @@ int bch2_fs_recovery(struct bch_fs *c)
write_sb = true; write_sb = true;
} }
if (!test_bit(BCH_FS_ERROR, &c->flags) && if (!test_bit(BCH_FS_error, &c->flags) &&
!(c->disk_sb.sb->compat[0] & cpu_to_le64(1ULL << BCH_COMPAT_alloc_info))) { !(c->disk_sb.sb->compat[0] & cpu_to_le64(1ULL << BCH_COMPAT_alloc_info))) {
c->disk_sb.sb->compat[0] |= cpu_to_le64(1ULL << BCH_COMPAT_alloc_info); c->disk_sb.sb->compat[0] |= cpu_to_le64(1ULL << BCH_COMPAT_alloc_info);
write_sb = true; write_sb = true;
} }
if (!test_bit(BCH_FS_ERROR, &c->flags)) { if (!test_bit(BCH_FS_error, &c->flags)) {
struct bch_sb_field_ext *ext = bch2_sb_field_get(c->disk_sb.sb, ext); struct bch_sb_field_ext *ext = bch2_sb_field_get(c->disk_sb.sb, ext);
if (ext && if (ext &&
(!bch2_is_zero(ext->recovery_passes_required, sizeof(ext->recovery_passes_required)) || (!bch2_is_zero(ext->recovery_passes_required, sizeof(ext->recovery_passes_required)) ||
@ -976,8 +1039,8 @@ int bch2_fs_recovery(struct bch_fs *c)
} }
if (c->opts.fsck && if (c->opts.fsck &&
!test_bit(BCH_FS_ERROR, &c->flags) && !test_bit(BCH_FS_error, &c->flags) &&
!test_bit(BCH_FS_ERRORS_NOT_FIXED, &c->flags)) { !test_bit(BCH_FS_errors_not_fixed, &c->flags)) {
SET_BCH_SB_HAS_ERRORS(c->disk_sb.sb, 0); SET_BCH_SB_HAS_ERRORS(c->disk_sb.sb, 0);
SET_BCH_SB_HAS_TOPOLOGY_ERRORS(c->disk_sb.sb, 0); SET_BCH_SB_HAS_TOPOLOGY_ERRORS(c->disk_sb.sb, 0);
write_sb = true; write_sb = true;
@ -993,8 +1056,12 @@ int bch2_fs_recovery(struct bch_fs *c)
bch2_move_stats_init(&stats, "recovery"); bch2_move_stats_init(&stats, "recovery");
bch_info(c, "scanning for old btree nodes"); struct printbuf buf = PRINTBUF;
ret = bch2_fs_read_write(c) ?: bch2_version_to_text(&buf, c->sb.version_min);
bch_info(c, "scanning for old btree nodes: min_version %s", buf.buf);
printbuf_exit(&buf);
ret = bch2_fs_read_write_early(c) ?:
bch2_scan_old_btree_nodes(c, &stats); bch2_scan_old_btree_nodes(c, &stats);
if (ret) if (ret)
goto err; goto err;
@ -1007,7 +1074,6 @@ int bch2_fs_recovery(struct bch_fs *c)
ret = 0; ret = 0;
out: out:
set_bit(BCH_FS_FSCK_DONE, &c->flags);
bch2_flush_fsck_errs(c); bch2_flush_fsck_errs(c);
if (!c->opts.keep_journal && if (!c->opts.keep_journal &&
@ -1015,13 +1081,14 @@ int bch2_fs_recovery(struct bch_fs *c)
bch2_journal_keys_put_initial(c); bch2_journal_keys_put_initial(c);
kfree(clean); kfree(clean);
if (!ret && test_bit(BCH_FS_NEED_DELETE_DEAD_SNAPSHOTS, &c->flags)) { if (!ret &&
test_bit(BCH_FS_need_delete_dead_snapshots, &c->flags) &&
!c->opts.nochanges) {
bch2_fs_read_write_early(c); bch2_fs_read_write_early(c);
bch2_delete_dead_snapshots_async(c); bch2_delete_dead_snapshots_async(c);
} }
if (ret) bch_err_fn(c, ret);
bch_err_fn(c, ret);
return ret; return ret;
err: err:
fsck_err: fsck_err:
@ -1034,8 +1101,6 @@ int bch2_fs_initialize(struct bch_fs *c)
struct bch_inode_unpacked root_inode, lostfound_inode; struct bch_inode_unpacked root_inode, lostfound_inode;
struct bkey_inode_buf packed_inode; struct bkey_inode_buf packed_inode;
struct qstr lostfound = QSTR("lost+found"); struct qstr lostfound = QSTR("lost+found");
struct bch_dev *ca;
unsigned i;
int ret; int ret;
bch_notice(c, "initializing new filesystem"); bch_notice(c, "initializing new filesystem");
@ -1054,13 +1119,12 @@ int bch2_fs_initialize(struct bch_fs *c)
mutex_unlock(&c->sb_lock); mutex_unlock(&c->sb_lock);
c->curr_recovery_pass = ARRAY_SIZE(recovery_pass_fns); c->curr_recovery_pass = ARRAY_SIZE(recovery_pass_fns);
set_bit(BCH_FS_MAY_GO_RW, &c->flags); set_bit(BCH_FS_may_go_rw, &c->flags);
set_bit(BCH_FS_FSCK_DONE, &c->flags);
for (i = 0; i < BTREE_ID_NR; i++) for (unsigned i = 0; i < BTREE_ID_NR; i++)
bch2_btree_root_alloc(c, i); bch2_btree_root_alloc(c, i);
for_each_member_device(ca, c, i) for_each_member_device(c, ca)
bch2_dev_usage_init(ca); bch2_dev_usage_init(ca);
ret = bch2_fs_journal_alloc(c); ret = bch2_fs_journal_alloc(c);
@ -1088,7 +1152,7 @@ int bch2_fs_initialize(struct bch_fs *c)
if (ret) if (ret)
goto err; goto err;
for_each_online_member(ca, c, i) for_each_online_member(c, ca)
ca->new_fs_bucket_idx = 0; ca->new_fs_bucket_idx = 0;
ret = bch2_fs_freespace_init(c); ret = bch2_fs_freespace_init(c);
@ -1112,10 +1176,9 @@ int bch2_fs_initialize(struct bch_fs *c)
packed_inode.inode.k.p.snapshot = U32_MAX; packed_inode.inode.k.p.snapshot = U32_MAX;
ret = bch2_btree_insert(c, BTREE_ID_inodes, &packed_inode.inode.k_i, NULL, 0); ret = bch2_btree_insert(c, BTREE_ID_inodes, &packed_inode.inode.k_i, NULL, 0);
if (ret) { bch_err_msg(c, ret, "creating root directory");
bch_err_msg(c, ret, "creating root directory"); if (ret)
goto err; goto err;
}
bch2_inode_init_early(c, &lostfound_inode); bch2_inode_init_early(c, &lostfound_inode);
@ -1126,10 +1189,11 @@ int bch2_fs_initialize(struct bch_fs *c)
&lostfound, &lostfound,
0, 0, S_IFDIR|0700, 0, 0, 0, S_IFDIR|0700, 0,
NULL, NULL, (subvol_inum) { 0 }, 0)); NULL, NULL, (subvol_inum) { 0 }, 0));
if (ret) { bch_err_msg(c, ret, "creating lost+found");
bch_err_msg(c, ret, "creating lost+found"); if (ret)
goto err; goto err;
}
c->recovery_pass_done = ARRAY_SIZE(recovery_pass_fns) - 1;
if (enabled_qtypes(c)) { if (enabled_qtypes(c)) {
ret = bch2_fs_quota_read(c); ret = bch2_fs_quota_read(c);
@ -1138,10 +1202,9 @@ int bch2_fs_initialize(struct bch_fs *c)
} }
ret = bch2_journal_flush(&c->journal); ret = bch2_journal_flush(&c->journal);
if (ret) { bch_err_msg(c, ret, "writing first journal entry");
bch_err_msg(c, ret, "writing first journal entry"); if (ret)
goto err; goto err;
}
mutex_lock(&c->sb_lock); mutex_lock(&c->sb_lock);
SET_BCH_SB_INITIALIZED(c->disk_sb.sb, true); SET_BCH_SB_INITIALIZED(c->disk_sb.sb, true);
@ -1152,6 +1215,6 @@ int bch2_fs_initialize(struct bch_fs *c)
return 0; return 0;
err: err:
bch_err_fn(ca, ret); bch_err_fn(c, ret);
return ret; return ret;
} }


@ -31,6 +31,7 @@ static inline int bch2_run_explicit_recovery_pass(struct bch_fs *c,
} }
} }
int bch2_run_online_recovery_passes(struct bch_fs *);
u64 bch2_fsck_recovery_passes(void); u64 bch2_fsck_recovery_passes(void);
int bch2_fs_recovery(struct bch_fs *); int bch2_fs_recovery(struct bch_fs *);


@ -6,6 +6,7 @@
#define PASS_FSCK BIT(1) #define PASS_FSCK BIT(1)
#define PASS_UNCLEAN BIT(2) #define PASS_UNCLEAN BIT(2)
#define PASS_ALWAYS BIT(3) #define PASS_ALWAYS BIT(3)
#define PASS_ONLINE BIT(4)
/* /*
* Passes may be reordered, but the second field is a persistent identifier and * Passes may be reordered, but the second field is a persistent identifier and
@ -22,18 +23,18 @@
x(fs_journal_alloc, 7, PASS_ALWAYS|PASS_SILENT) \ x(fs_journal_alloc, 7, PASS_ALWAYS|PASS_SILENT) \
x(set_may_go_rw, 8, PASS_ALWAYS|PASS_SILENT) \ x(set_may_go_rw, 8, PASS_ALWAYS|PASS_SILENT) \
x(journal_replay, 9, PASS_ALWAYS) \ x(journal_replay, 9, PASS_ALWAYS) \
x(check_alloc_info, 10, PASS_FSCK) \ x(check_alloc_info, 10, PASS_ONLINE|PASS_FSCK) \
x(check_lrus, 11, PASS_FSCK) \ x(check_lrus, 11, PASS_ONLINE|PASS_FSCK) \
x(check_btree_backpointers, 12, PASS_FSCK) \ x(check_btree_backpointers, 12, PASS_ONLINE|PASS_FSCK) \
x(check_backpointers_to_extents, 13, PASS_FSCK) \ x(check_backpointers_to_extents, 13, PASS_ONLINE|PASS_FSCK) \
x(check_extents_to_backpointers, 14, PASS_FSCK) \ x(check_extents_to_backpointers, 14, PASS_ONLINE|PASS_FSCK) \
x(check_alloc_to_lru_refs, 15, PASS_FSCK) \ x(check_alloc_to_lru_refs, 15, PASS_ONLINE|PASS_FSCK) \
x(fs_freespace_init, 16, PASS_ALWAYS|PASS_SILENT) \ x(fs_freespace_init, 16, PASS_ALWAYS|PASS_SILENT) \
x(bucket_gens_init, 17, 0) \ x(bucket_gens_init, 17, 0) \
x(check_snapshot_trees, 18, PASS_FSCK) \ x(check_snapshot_trees, 18, PASS_ONLINE|PASS_FSCK) \
x(check_snapshots, 19, PASS_FSCK) \ x(check_snapshots, 19, PASS_ONLINE|PASS_FSCK) \
x(check_subvols, 20, PASS_FSCK) \ x(check_subvols, 20, PASS_ONLINE|PASS_FSCK) \
x(delete_dead_snapshots, 21, PASS_FSCK) \ x(delete_dead_snapshots, 21, PASS_ONLINE|PASS_FSCK) \
x(fs_upgrade_for_subvolumes, 22, 0) \ x(fs_upgrade_for_subvolumes, 22, 0) \
x(resume_logged_ops, 23, PASS_ALWAYS) \ x(resume_logged_ops, 23, PASS_ALWAYS) \
x(check_inodes, 24, PASS_FSCK) \ x(check_inodes, 24, PASS_FSCK) \
@ -41,8 +42,8 @@
x(check_indirect_extents, 26, PASS_FSCK) \ x(check_indirect_extents, 26, PASS_FSCK) \
x(check_dirents, 27, PASS_FSCK) \ x(check_dirents, 27, PASS_FSCK) \
x(check_xattrs, 28, PASS_FSCK) \ x(check_xattrs, 28, PASS_FSCK) \
x(check_root, 29, PASS_FSCK) \ x(check_root, 29, PASS_ONLINE|PASS_FSCK) \
x(check_directory_structure, 30, PASS_FSCK) \ x(check_directory_structure, 30, PASS_ONLINE|PASS_FSCK) \
x(check_nlinks, 31, PASS_FSCK) \ x(check_nlinks, 31, PASS_FSCK) \
x(delete_dead_inodes, 32, PASS_FSCK|PASS_UNCLEAN) \ x(delete_dead_inodes, 32, PASS_FSCK|PASS_UNCLEAN) \
x(fix_reflink_p, 33, 0) \ x(fix_reflink_p, 33, 0) \


@ -3,6 +3,7 @@
#include "bkey_buf.h" #include "bkey_buf.h"
#include "btree_update.h" #include "btree_update.h"
#include "buckets.h" #include "buckets.h"
#include "error.h"
#include "extents.h" #include "extents.h"
#include "inode.h" #include "inode.h"
#include "io_misc.h" #include "io_misc.h"
@ -33,15 +34,14 @@ int bch2_reflink_p_invalid(struct bch_fs *c, struct bkey_s_c k,
struct printbuf *err) struct printbuf *err)
{ {
struct bkey_s_c_reflink_p p = bkey_s_c_to_reflink_p(k); struct bkey_s_c_reflink_p p = bkey_s_c_to_reflink_p(k);
int ret = 0;
if (c->sb.version >= bcachefs_metadata_version_reflink_p_fix && bkey_fsck_err_on(le64_to_cpu(p.v->idx) < le32_to_cpu(p.v->front_pad),
le64_to_cpu(p.v->idx) < le32_to_cpu(p.v->front_pad)) { c, err, reflink_p_front_pad_bad,
prt_printf(err, "idx < front_pad (%llu < %u)", "idx < front_pad (%llu < %u)",
le64_to_cpu(p.v->idx), le32_to_cpu(p.v->front_pad)); le64_to_cpu(p.v->idx), le32_to_cpu(p.v->front_pad));
return -EINVAL; fsck_err:
} return ret;
return 0;
} }
void bch2_reflink_p_to_text(struct printbuf *out, struct bch_fs *c, void bch2_reflink_p_to_text(struct printbuf *out, struct bch_fs *c,
@ -73,6 +73,184 @@ bool bch2_reflink_p_merge(struct bch_fs *c, struct bkey_s _l, struct bkey_s_c _r
return true; return true;
} }
static int trans_trigger_reflink_p_segment(struct btree_trans *trans,
struct bkey_s_c_reflink_p p,
u64 *idx, unsigned flags)
{
struct bch_fs *c = trans->c;
struct btree_iter iter;
struct bkey_i *k;
__le64 *refcount;
int add = !(flags & BTREE_TRIGGER_OVERWRITE) ? 1 : -1;
struct printbuf buf = PRINTBUF;
int ret;
k = bch2_bkey_get_mut_noupdate(trans, &iter,
BTREE_ID_reflink, POS(0, *idx),
BTREE_ITER_WITH_UPDATES);
ret = PTR_ERR_OR_ZERO(k);
if (ret)
goto err;
refcount = bkey_refcount(bkey_i_to_s(k));
if (!refcount) {
bch2_bkey_val_to_text(&buf, c, p.s_c);
bch2_trans_inconsistent(trans,
"nonexistent indirect extent at %llu while marking\n %s",
*idx, buf.buf);
ret = -EIO;
goto err;
}
if (!*refcount && (flags & BTREE_TRIGGER_OVERWRITE)) {
bch2_bkey_val_to_text(&buf, c, p.s_c);
bch2_trans_inconsistent(trans,
"indirect extent refcount underflow at %llu while marking\n %s",
*idx, buf.buf);
ret = -EIO;
goto err;
}
if (flags & BTREE_TRIGGER_INSERT) {
struct bch_reflink_p *v = (struct bch_reflink_p *) p.v;
u64 pad;
pad = max_t(s64, le32_to_cpu(v->front_pad),
le64_to_cpu(v->idx) - bkey_start_offset(&k->k));
BUG_ON(pad > U32_MAX);
v->front_pad = cpu_to_le32(pad);
pad = max_t(s64, le32_to_cpu(v->back_pad),
k->k.p.offset - p.k->size - le64_to_cpu(v->idx));
BUG_ON(pad > U32_MAX);
v->back_pad = cpu_to_le32(pad);
}
le64_add_cpu(refcount, add);
bch2_btree_iter_set_pos_to_extent_start(&iter);
ret = bch2_trans_update(trans, &iter, k, 0);
if (ret)
goto err;
*idx = k->k.p.offset;
err:
bch2_trans_iter_exit(trans, &iter);
printbuf_exit(&buf);
return ret;
}
static s64 gc_trigger_reflink_p_segment(struct btree_trans *trans,
struct bkey_s_c_reflink_p p,
u64 *idx, unsigned flags, size_t r_idx)
{
struct bch_fs *c = trans->c;
struct reflink_gc *r;
int add = !(flags & BTREE_TRIGGER_OVERWRITE) ? 1 : -1;
u64 start = le64_to_cpu(p.v->idx);
u64 end = le64_to_cpu(p.v->idx) + p.k->size;
u64 next_idx = end + le32_to_cpu(p.v->back_pad);
s64 ret = 0;
struct printbuf buf = PRINTBUF;
if (r_idx >= c->reflink_gc_nr)
goto not_found;
r = genradix_ptr(&c->reflink_gc_table, r_idx);
next_idx = min(next_idx, r->offset - r->size);
if (*idx < next_idx)
goto not_found;
BUG_ON((s64) r->refcount + add < 0);
r->refcount += add;
*idx = r->offset;
return 0;
not_found:
if (fsck_err(c, reflink_p_to_missing_reflink_v,
"pointer to missing indirect extent\n"
" %s\n"
" missing range %llu-%llu",
(bch2_bkey_val_to_text(&buf, c, p.s_c), buf.buf),
*idx, next_idx)) {
struct bkey_i *update = bch2_bkey_make_mut_noupdate(trans, p.s_c);
ret = PTR_ERR_OR_ZERO(update);
if (ret)
goto err;
if (next_idx <= start) {
bkey_i_to_reflink_p(update)->v.front_pad = cpu_to_le32(start - next_idx);
} else if (*idx >= end) {
bkey_i_to_reflink_p(update)->v.back_pad = cpu_to_le32(*idx - end);
} else {
bkey_error_init(update);
update->k.p = p.k->p;
update->k.p.offset = next_idx;
update->k.size = next_idx - *idx;
set_bkey_val_u64s(&update->k, 0);
}
ret = bch2_btree_insert_trans(trans, BTREE_ID_extents, update, BTREE_TRIGGER_NORUN);
}
*idx = next_idx;
err:
fsck_err:
printbuf_exit(&buf);
return ret;
}
static int __trigger_reflink_p(struct btree_trans *trans,
enum btree_id btree_id, unsigned level,
struct bkey_s_c k, unsigned flags)
{
struct bch_fs *c = trans->c;
struct bkey_s_c_reflink_p p = bkey_s_c_to_reflink_p(k);
int ret = 0;
u64 idx = le64_to_cpu(p.v->idx) - le32_to_cpu(p.v->front_pad);
u64 end = le64_to_cpu(p.v->idx) + p.k->size + le32_to_cpu(p.v->back_pad);
if (flags & BTREE_TRIGGER_TRANSACTIONAL) {
while (idx < end && !ret)
ret = trans_trigger_reflink_p_segment(trans, p, &idx, flags);
}
if (flags & BTREE_TRIGGER_GC) {
size_t l = 0, r = c->reflink_gc_nr;
while (l < r) {
size_t m = l + (r - l) / 2;
struct reflink_gc *ref = genradix_ptr(&c->reflink_gc_table, m);
if (ref->offset <= idx)
l = m + 1;
else
r = m;
}
while (idx < end && !ret)
ret = gc_trigger_reflink_p_segment(trans, p, &idx, flags, l++);
}
return ret;
}
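The GC branch above locates its starting point in the reflink_gc table with a lower-bound binary search: find the first entry whose end offset lies past idx. A flat-array version of the same search is sketched below for clarity; demo_reflink_gc and first_entry_past are hypothetical names, and the real table is a genradix rather than a plain array.

	#include <stddef.h>

	struct demo_reflink_gc { unsigned long long offset; };	/* end of the indirect extent */

	static size_t first_entry_past(const struct demo_reflink_gc *tbl, size_t nr,
				       unsigned long long idx)
	{
		size_t l = 0, r = nr;

		while (l < r) {
			size_t m = l + (r - l) / 2;

			if (tbl[m].offset <= idx)
				l = m + 1;
			else
				r = m;
		}
		return l;	/* == nr if every entry ends at or before idx */
	}
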
int bch2_trigger_reflink_p(struct btree_trans *trans,
enum btree_id btree_id, unsigned level,
struct bkey_s_c old,
struct bkey_s new,
unsigned flags)
{
if ((flags & BTREE_TRIGGER_TRANSACTIONAL) &&
(flags & BTREE_TRIGGER_INSERT)) {
struct bch_reflink_p *v = bkey_s_to_reflink_p(new).v;
v->front_pad = v->back_pad = 0;
}
return trigger_run_overwrite_then_insert(__trigger_reflink_p, trans, btree_id, level, old, new, flags);
}
/* indirect extents */ /* indirect extents */
int bch2_reflink_v_invalid(struct bch_fs *c, struct bkey_s_c k, int bch2_reflink_v_invalid(struct bch_fs *c, struct bkey_s_c k,
@ -104,32 +282,26 @@ bool bch2_reflink_v_merge(struct bch_fs *c, struct bkey_s _l, struct bkey_s_c _r
} }
#endif #endif
static inline void check_indirect_extent_deleting(struct bkey_i *new, unsigned *flags) static inline void check_indirect_extent_deleting(struct bkey_s new, unsigned *flags)
{ {
if ((*flags & BTREE_TRIGGER_INSERT) && !*bkey_refcount(new)) { if ((*flags & BTREE_TRIGGER_INSERT) && !*bkey_refcount(new)) {
new->k.type = KEY_TYPE_deleted; new.k->type = KEY_TYPE_deleted;
new->k.size = 0; new.k->size = 0;
set_bkey_val_u64s(&new->k, 0);; set_bkey_val_u64s(new.k, 0);
*flags &= ~BTREE_TRIGGER_INSERT; *flags &= ~BTREE_TRIGGER_INSERT;
} }
} }
int bch2_trans_mark_reflink_v(struct btree_trans *trans, int bch2_trans_mark_reflink_v(struct btree_trans *trans,
enum btree_id btree_id, unsigned level, enum btree_id btree_id, unsigned level,
struct bkey_s_c old, struct bkey_i *new, struct bkey_s_c old, struct bkey_s new,
unsigned flags) unsigned flags)
{ {
check_indirect_extent_deleting(new, &flags); if ((flags & BTREE_TRIGGER_TRANSACTIONAL) &&
(flags & BTREE_TRIGGER_INSERT))
check_indirect_extent_deleting(new, &flags);
if (old.k->type == KEY_TYPE_reflink_v && return bch2_trigger_extent(trans, btree_id, level, old, new, flags);
new->k.type == KEY_TYPE_reflink_v &&
old.k->u64s == new->k.u64s &&
!memcmp(bkey_s_c_to_reflink_v(old).v->start,
bkey_i_to_reflink_v(new)->v.start,
bkey_val_bytes(&new->k) - 8))
return 0;
return bch2_trans_mark_extent(trans, btree_id, level, old, new, flags);
} }
/* indirect inline data */ /* indirect inline data */
@ -154,7 +326,7 @@ void bch2_indirect_inline_data_to_text(struct printbuf *out,
int bch2_trans_mark_indirect_inline_data(struct btree_trans *trans, int bch2_trans_mark_indirect_inline_data(struct btree_trans *trans,
enum btree_id btree_id, unsigned level, enum btree_id btree_id, unsigned level,
struct bkey_s_c old, struct bkey_i *new, struct bkey_s_c old, struct bkey_s new,
unsigned flags) unsigned flags)
{ {
check_indirect_extent_deleting(new, &flags); check_indirect_extent_deleting(new, &flags);
@ -197,7 +369,7 @@ static int bch2_make_extent_indirect(struct btree_trans *trans,
set_bkey_val_bytes(&r_v->k, sizeof(__le64) + bkey_val_bytes(&orig->k)); set_bkey_val_bytes(&r_v->k, sizeof(__le64) + bkey_val_bytes(&orig->k));
refcount = bkey_refcount(r_v); refcount = bkey_refcount(bkey_i_to_s(r_v));
*refcount = 0; *refcount = 0;
memcpy(refcount + 1, &orig->v, bkey_val_bytes(&orig->k)); memcpy(refcount + 1, &orig->v, bkey_val_bytes(&orig->k));
@ -398,7 +570,7 @@ s64 bch2_remap_range(struct bch_fs *c,
inode_u.bi_size = new_i_size; inode_u.bi_size = new_i_size;
ret2 = bch2_inode_write(trans, &inode_iter, &inode_u) ?: ret2 = bch2_inode_write(trans, &inode_iter, &inode_u) ?:
bch2_trans_commit(trans, NULL, NULL, bch2_trans_commit(trans, NULL, NULL,
BTREE_INSERT_NOFAIL); BCH_TRANS_COMMIT_no_enospc);
} }
bch2_trans_iter_exit(trans, &inode_iter); bch2_trans_iter_exit(trans, &inode_iter);


@ -9,13 +9,14 @@ int bch2_reflink_p_invalid(struct bch_fs *, struct bkey_s_c,
void bch2_reflink_p_to_text(struct printbuf *, struct bch_fs *, void bch2_reflink_p_to_text(struct printbuf *, struct bch_fs *,
struct bkey_s_c); struct bkey_s_c);
bool bch2_reflink_p_merge(struct bch_fs *, struct bkey_s, struct bkey_s_c); bool bch2_reflink_p_merge(struct bch_fs *, struct bkey_s, struct bkey_s_c);
int bch2_trigger_reflink_p(struct btree_trans *, enum btree_id, unsigned,
struct bkey_s_c, struct bkey_s, unsigned);
#define bch2_bkey_ops_reflink_p ((struct bkey_ops) { \ #define bch2_bkey_ops_reflink_p ((struct bkey_ops) { \
.key_invalid = bch2_reflink_p_invalid, \ .key_invalid = bch2_reflink_p_invalid, \
.val_to_text = bch2_reflink_p_to_text, \ .val_to_text = bch2_reflink_p_to_text, \
.key_merge = bch2_reflink_p_merge, \ .key_merge = bch2_reflink_p_merge, \
.trans_trigger = bch2_trans_mark_reflink_p, \ .trigger = bch2_trigger_reflink_p, \
.atomic_trigger = bch2_mark_reflink_p, \
.min_val_size = 16, \ .min_val_size = 16, \
}) })
@ -24,14 +25,13 @@ int bch2_reflink_v_invalid(struct bch_fs *, struct bkey_s_c,
void bch2_reflink_v_to_text(struct printbuf *, struct bch_fs *, void bch2_reflink_v_to_text(struct printbuf *, struct bch_fs *,
struct bkey_s_c); struct bkey_s_c);
int bch2_trans_mark_reflink_v(struct btree_trans *, enum btree_id, unsigned, int bch2_trans_mark_reflink_v(struct btree_trans *, enum btree_id, unsigned,
struct bkey_s_c, struct bkey_i *, unsigned); struct bkey_s_c, struct bkey_s, unsigned);
#define bch2_bkey_ops_reflink_v ((struct bkey_ops) { \ #define bch2_bkey_ops_reflink_v ((struct bkey_ops) { \
.key_invalid = bch2_reflink_v_invalid, \ .key_invalid = bch2_reflink_v_invalid, \
.val_to_text = bch2_reflink_v_to_text, \ .val_to_text = bch2_reflink_v_to_text, \
.swab = bch2_ptr_swab, \ .swab = bch2_ptr_swab, \
.trans_trigger = bch2_trans_mark_reflink_v, \ .trigger = bch2_trans_mark_reflink_v, \
.atomic_trigger = bch2_mark_extent, \
.min_val_size = 8, \ .min_val_size = 8, \
}) })
@ -41,13 +41,13 @@ void bch2_indirect_inline_data_to_text(struct printbuf *,
struct bch_fs *, struct bkey_s_c); struct bch_fs *, struct bkey_s_c);
int bch2_trans_mark_indirect_inline_data(struct btree_trans *, int bch2_trans_mark_indirect_inline_data(struct btree_trans *,
enum btree_id, unsigned, enum btree_id, unsigned,
struct bkey_s_c, struct bkey_i *, struct bkey_s_c, struct bkey_s,
unsigned); unsigned);
#define bch2_bkey_ops_indirect_inline_data ((struct bkey_ops) { \ #define bch2_bkey_ops_indirect_inline_data ((struct bkey_ops) { \
.key_invalid = bch2_indirect_inline_data_invalid, \ .key_invalid = bch2_indirect_inline_data_invalid, \
.val_to_text = bch2_indirect_inline_data_to_text, \ .val_to_text = bch2_indirect_inline_data_to_text, \
.trans_trigger = bch2_trans_mark_indirect_inline_data, \ .trigger = bch2_trans_mark_indirect_inline_data, \
.min_val_size = 8, \ .min_val_size = 8, \
}) })
@ -63,13 +63,13 @@ static inline const __le64 *bkey_refcount_c(struct bkey_s_c k)
} }
} }
static inline __le64 *bkey_refcount(struct bkey_i *k) static inline __le64 *bkey_refcount(struct bkey_s k)
{ {
switch (k->k.type) { switch (k.k->type) {
case KEY_TYPE_reflink_v: case KEY_TYPE_reflink_v:
return &bkey_i_to_reflink_v(k)->v.refcount; return &bkey_s_to_reflink_v(k).v->refcount;
case KEY_TYPE_indirect_inline_data: case KEY_TYPE_indirect_inline_data:
return &bkey_i_to_indirect_inline_data(k)->v.refcount; return &bkey_s_to_indirect_inline_data(k).v->refcount;
default: default:
return NULL; return NULL;
} }


@ -11,7 +11,7 @@ static int bch2_cpu_replicas_to_sb_replicas(struct bch_fs *,
/* Replicas tracking - in memory: */ /* Replicas tracking - in memory: */
static void verify_replicas_entry(struct bch_replicas_entry *e) static void verify_replicas_entry(struct bch_replicas_entry_v1 *e)
{ {
#ifdef CONFIG_BCACHEFS_DEBUG #ifdef CONFIG_BCACHEFS_DEBUG
unsigned i; unsigned i;
@ -26,7 +26,7 @@ static void verify_replicas_entry(struct bch_replicas_entry *e)
#endif #endif
} }
void bch2_replicas_entry_sort(struct bch_replicas_entry *e) void bch2_replicas_entry_sort(struct bch_replicas_entry_v1 *e)
{ {
bubble_sort(e->devs, e->nr_devs, u8_cmp); bubble_sort(e->devs, e->nr_devs, u8_cmp);
} }
@ -53,7 +53,7 @@ static void bch2_replicas_entry_v0_to_text(struct printbuf *out,
} }
void bch2_replicas_entry_to_text(struct printbuf *out, void bch2_replicas_entry_to_text(struct printbuf *out,
struct bch_replicas_entry *e) struct bch_replicas_entry_v1 *e)
{ {
unsigned i; unsigned i;
@ -68,7 +68,7 @@ void bch2_replicas_entry_to_text(struct printbuf *out,
prt_printf(out, "]"); prt_printf(out, "]");
} }
int bch2_replicas_entry_validate(struct bch_replicas_entry *r, int bch2_replicas_entry_validate(struct bch_replicas_entry_v1 *r,
struct bch_sb *sb, struct bch_sb *sb,
struct printbuf *err) struct printbuf *err)
{ {
@ -98,7 +98,7 @@ int bch2_replicas_entry_validate(struct bch_replicas_entry *r,
void bch2_cpu_replicas_to_text(struct printbuf *out, void bch2_cpu_replicas_to_text(struct printbuf *out,
struct bch_replicas_cpu *r) struct bch_replicas_cpu *r)
{ {
struct bch_replicas_entry *e; struct bch_replicas_entry_v1 *e;
bool first = true; bool first = true;
for_each_cpu_replicas_entry(r, e) { for_each_cpu_replicas_entry(r, e) {
@ -111,7 +111,7 @@ void bch2_cpu_replicas_to_text(struct printbuf *out,
} }
static void extent_to_replicas(struct bkey_s_c k, static void extent_to_replicas(struct bkey_s_c k,
struct bch_replicas_entry *r) struct bch_replicas_entry_v1 *r)
{ {
struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k); struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
const union bch_extent_entry *entry; const union bch_extent_entry *entry;
@ -131,7 +131,7 @@ static void extent_to_replicas(struct bkey_s_c k,
} }
static void stripe_to_replicas(struct bkey_s_c k, static void stripe_to_replicas(struct bkey_s_c k,
struct bch_replicas_entry *r) struct bch_replicas_entry_v1 *r)
{ {
struct bkey_s_c_stripe s = bkey_s_c_to_stripe(k); struct bkey_s_c_stripe s = bkey_s_c_to_stripe(k);
const struct bch_extent_ptr *ptr; const struct bch_extent_ptr *ptr;
@ -144,7 +144,7 @@ static void stripe_to_replicas(struct bkey_s_c k,
r->devs[r->nr_devs++] = ptr->dev; r->devs[r->nr_devs++] = ptr->dev;
} }
void bch2_bkey_to_replicas(struct bch_replicas_entry *e, void bch2_bkey_to_replicas(struct bch_replicas_entry_v1 *e,
struct bkey_s_c k) struct bkey_s_c k)
{ {
e->nr_devs = 0; e->nr_devs = 0;
@ -169,12 +169,10 @@ void bch2_bkey_to_replicas(struct bch_replicas_entry *e,
bch2_replicas_entry_sort(e); bch2_replicas_entry_sort(e);
} }
void bch2_devlist_to_replicas(struct bch_replicas_entry *e, void bch2_devlist_to_replicas(struct bch_replicas_entry_v1 *e,
enum bch_data_type data_type, enum bch_data_type data_type,
struct bch_devs_list devs) struct bch_devs_list devs)
{ {
unsigned i;
BUG_ON(!data_type || BUG_ON(!data_type ||
data_type == BCH_DATA_sb || data_type == BCH_DATA_sb ||
data_type >= BCH_DATA_NR); data_type >= BCH_DATA_NR);
@ -183,8 +181,8 @@ void bch2_devlist_to_replicas(struct bch_replicas_entry *e,
e->nr_devs = 0; e->nr_devs = 0;
e->nr_required = 1; e->nr_required = 1;
for (i = 0; i < devs.nr; i++) darray_for_each(devs, i)
e->devs[e->nr_devs++] = devs.devs[i]; e->devs[e->nr_devs++] = *i;
bch2_replicas_entry_sort(e); bch2_replicas_entry_sort(e);
} }
@ -192,7 +190,7 @@ void bch2_devlist_to_replicas(struct bch_replicas_entry *e,
static struct bch_replicas_cpu static struct bch_replicas_cpu
cpu_replicas_add_entry(struct bch_fs *c, cpu_replicas_add_entry(struct bch_fs *c,
struct bch_replicas_cpu *old, struct bch_replicas_cpu *old,
struct bch_replicas_entry *new_entry) struct bch_replicas_entry_v1 *new_entry)
{ {
unsigned i; unsigned i;
struct bch_replicas_cpu new = { struct bch_replicas_cpu new = {
@ -225,7 +223,7 @@ cpu_replicas_add_entry(struct bch_fs *c,
} }
static inline int __replicas_entry_idx(struct bch_replicas_cpu *r, static inline int __replicas_entry_idx(struct bch_replicas_cpu *r,
struct bch_replicas_entry *search) struct bch_replicas_entry_v1 *search)
{ {
int idx, entry_size = replicas_entry_bytes(search); int idx, entry_size = replicas_entry_bytes(search);
@ -243,7 +241,7 @@ static inline int __replicas_entry_idx(struct bch_replicas_cpu *r,
} }
int bch2_replicas_entry_idx(struct bch_fs *c, int bch2_replicas_entry_idx(struct bch_fs *c,
struct bch_replicas_entry *search) struct bch_replicas_entry_v1 *search)
{ {
bch2_replicas_entry_sort(search); bch2_replicas_entry_sort(search);
@ -251,13 +249,13 @@ int bch2_replicas_entry_idx(struct bch_fs *c,
} }
static bool __replicas_has_entry(struct bch_replicas_cpu *r, static bool __replicas_has_entry(struct bch_replicas_cpu *r,
struct bch_replicas_entry *search) struct bch_replicas_entry_v1 *search)
{ {
return __replicas_entry_idx(r, search) >= 0; return __replicas_entry_idx(r, search) >= 0;
} }
bool bch2_replicas_marked(struct bch_fs *c, bool bch2_replicas_marked(struct bch_fs *c,
struct bch_replicas_entry *search) struct bch_replicas_entry_v1 *search)
{ {
bool marked; bool marked;
@ -374,7 +372,7 @@ static int replicas_table_update(struct bch_fs *c,
static unsigned reserve_journal_replicas(struct bch_fs *c, static unsigned reserve_journal_replicas(struct bch_fs *c,
struct bch_replicas_cpu *r) struct bch_replicas_cpu *r)
{ {
struct bch_replicas_entry *e; struct bch_replicas_entry_v1 *e;
unsigned journal_res_u64s = 0; unsigned journal_res_u64s = 0;
/* nr_inodes: */ /* nr_inodes: */
@ -399,7 +397,7 @@ static unsigned reserve_journal_replicas(struct bch_fs *c,
noinline noinline
static int bch2_mark_replicas_slowpath(struct bch_fs *c, static int bch2_mark_replicas_slowpath(struct bch_fs *c,
struct bch_replicas_entry *new_entry) struct bch_replicas_entry_v1 *new_entry)
{ {
struct bch_replicas_cpu new_r, new_gc; struct bch_replicas_cpu new_r, new_gc;
int ret = 0; int ret = 0;
@ -464,7 +462,7 @@ static int bch2_mark_replicas_slowpath(struct bch_fs *c,
goto out; goto out;
} }
int bch2_mark_replicas(struct bch_fs *c, struct bch_replicas_entry *r) int bch2_mark_replicas(struct bch_fs *c, struct bch_replicas_entry_v1 *r)
{ {
return likely(bch2_replicas_marked(c, r)) return likely(bch2_replicas_marked(c, r))
? 0 : bch2_mark_replicas_slowpath(c, r); ? 0 : bch2_mark_replicas_slowpath(c, r);
@ -515,7 +513,7 @@ int bch2_replicas_gc_end(struct bch_fs *c, int ret)
int bch2_replicas_gc_start(struct bch_fs *c, unsigned typemask) int bch2_replicas_gc_start(struct bch_fs *c, unsigned typemask)
{ {
struct bch_replicas_entry *e; struct bch_replicas_entry_v1 *e;
unsigned i = 0; unsigned i = 0;
lockdep_assert_held(&c->replicas_gc_lock); lockdep_assert_held(&c->replicas_gc_lock);
@ -590,7 +588,7 @@ int bch2_replicas_gc2(struct bch_fs *c)
} }
for (i = 0; i < c->replicas.nr; i++) { for (i = 0; i < c->replicas.nr; i++) {
struct bch_replicas_entry *e = struct bch_replicas_entry_v1 *e =
cpu_replicas_entry(&c->replicas, i); cpu_replicas_entry(&c->replicas, i);
if (e->data_type == BCH_DATA_journal || if (e->data_type == BCH_DATA_journal ||
@ -621,7 +619,7 @@ int bch2_replicas_gc2(struct bch_fs *c)
} }
int bch2_replicas_set_usage(struct bch_fs *c, int bch2_replicas_set_usage(struct bch_fs *c,
struct bch_replicas_entry *r, struct bch_replicas_entry_v1 *r,
u64 sectors) u64 sectors)
{ {
int ret, idx = bch2_replicas_entry_idx(c, r); int ret, idx = bch2_replicas_entry_idx(c, r);
@ -654,7 +652,7 @@ static int
__bch2_sb_replicas_to_cpu_replicas(struct bch_sb_field_replicas *sb_r, __bch2_sb_replicas_to_cpu_replicas(struct bch_sb_field_replicas *sb_r,
struct bch_replicas_cpu *cpu_r) struct bch_replicas_cpu *cpu_r)
{ {
struct bch_replicas_entry *e, *dst; struct bch_replicas_entry_v1 *e, *dst;
unsigned nr = 0, entry_size = 0, idx = 0; unsigned nr = 0, entry_size = 0, idx = 0;
for_each_replicas_entry(sb_r, e) { for_each_replicas_entry(sb_r, e) {
@ -692,7 +690,7 @@ __bch2_sb_replicas_v0_to_cpu_replicas(struct bch_sb_field_replicas_v0 *sb_r,
nr++; nr++;
} }
entry_size += sizeof(struct bch_replicas_entry) - entry_size += sizeof(struct bch_replicas_entry_v1) -
sizeof(struct bch_replicas_entry_v0); sizeof(struct bch_replicas_entry_v0);
cpu_r->entries = kcalloc(nr, entry_size, GFP_KERNEL); cpu_r->entries = kcalloc(nr, entry_size, GFP_KERNEL);
@ -703,7 +701,7 @@ __bch2_sb_replicas_v0_to_cpu_replicas(struct bch_sb_field_replicas_v0 *sb_r,
cpu_r->entry_size = entry_size; cpu_r->entry_size = entry_size;
for_each_replicas_entry(sb_r, e) { for_each_replicas_entry(sb_r, e) {
struct bch_replicas_entry *dst = struct bch_replicas_entry_v1 *dst =
cpu_replicas_entry(cpu_r, idx++); cpu_replicas_entry(cpu_r, idx++);
dst->data_type = e->data_type; dst->data_type = e->data_type;
@ -747,7 +745,7 @@ static int bch2_cpu_replicas_to_sb_replicas_v0(struct bch_fs *c,
{ {
struct bch_sb_field_replicas_v0 *sb_r; struct bch_sb_field_replicas_v0 *sb_r;
struct bch_replicas_entry_v0 *dst; struct bch_replicas_entry_v0 *dst;
struct bch_replicas_entry *src; struct bch_replicas_entry_v1 *src;
size_t bytes; size_t bytes;
bytes = sizeof(struct bch_sb_field_replicas); bytes = sizeof(struct bch_sb_field_replicas);
@ -785,7 +783,7 @@ static int bch2_cpu_replicas_to_sb_replicas(struct bch_fs *c,
struct bch_replicas_cpu *r) struct bch_replicas_cpu *r)
{ {
struct bch_sb_field_replicas *sb_r; struct bch_sb_field_replicas *sb_r;
struct bch_replicas_entry *dst, *src; struct bch_replicas_entry_v1 *dst, *src;
bool need_v1 = false; bool need_v1 = false;
size_t bytes; size_t bytes;
@ -836,7 +834,7 @@ static int bch2_cpu_replicas_validate(struct bch_replicas_cpu *cpu_r,
memcmp, NULL); memcmp, NULL);
for (i = 0; i < cpu_r->nr; i++) { for (i = 0; i < cpu_r->nr; i++) {
struct bch_replicas_entry *e = struct bch_replicas_entry_v1 *e =
cpu_replicas_entry(cpu_r, i); cpu_replicas_entry(cpu_r, i);
int ret = bch2_replicas_entry_validate(e, sb, err); int ret = bch2_replicas_entry_validate(e, sb, err);
@ -844,7 +842,7 @@ static int bch2_cpu_replicas_validate(struct bch_replicas_cpu *cpu_r,
return ret; return ret;
if (i + 1 < cpu_r->nr) { if (i + 1 < cpu_r->nr) {
struct bch_replicas_entry *n = struct bch_replicas_entry_v1 *n =
cpu_replicas_entry(cpu_r, i + 1); cpu_replicas_entry(cpu_r, i + 1);
BUG_ON(memcmp(e, n, cpu_r->entry_size) > 0); BUG_ON(memcmp(e, n, cpu_r->entry_size) > 0);
@ -881,7 +879,7 @@ static void bch2_sb_replicas_to_text(struct printbuf *out,
struct bch_sb_field *f) struct bch_sb_field *f)
{ {
struct bch_sb_field_replicas *r = field_to_type(f, replicas); struct bch_sb_field_replicas *r = field_to_type(f, replicas);
struct bch_replicas_entry *e; struct bch_replicas_entry_v1 *e;
bool first = true; bool first = true;
for_each_replicas_entry(r, e) { for_each_replicas_entry(r, e) {
@ -943,7 +941,7 @@ const struct bch_sb_field_ops bch_sb_field_ops_replicas_v0 = {
bool bch2_have_enough_devs(struct bch_fs *c, struct bch_devs_mask devs, bool bch2_have_enough_devs(struct bch_fs *c, struct bch_devs_mask devs,
unsigned flags, bool print) unsigned flags, bool print)
{ {
struct bch_replicas_entry *e; struct bch_replicas_entry_v1 *e;
bool ret = true; bool ret = true;
percpu_down_read(&c->mark_lock); percpu_down_read(&c->mark_lock);
@ -1003,7 +1001,7 @@ unsigned bch2_sb_dev_has_data(struct bch_sb *sb, unsigned dev)
replicas_v0 = bch2_sb_field_get(sb, replicas_v0); replicas_v0 = bch2_sb_field_get(sb, replicas_v0);
if (replicas) { if (replicas) {
struct bch_replicas_entry *r; struct bch_replicas_entry_v1 *r;
for_each_replicas_entry(replicas, r) for_each_replicas_entry(replicas, r)
for (i = 0; i < r->nr_devs; i++) for (i = 0; i < r->nr_devs; i++)


@ -6,28 +6,28 @@
#include "eytzinger.h" #include "eytzinger.h"
#include "replicas_types.h" #include "replicas_types.h"
void bch2_replicas_entry_sort(struct bch_replicas_entry *); void bch2_replicas_entry_sort(struct bch_replicas_entry_v1 *);
void bch2_replicas_entry_to_text(struct printbuf *, void bch2_replicas_entry_to_text(struct printbuf *,
struct bch_replicas_entry *); struct bch_replicas_entry_v1 *);
int bch2_replicas_entry_validate(struct bch_replicas_entry *, int bch2_replicas_entry_validate(struct bch_replicas_entry_v1 *,
struct bch_sb *, struct printbuf *); struct bch_sb *, struct printbuf *);
void bch2_cpu_replicas_to_text(struct printbuf *, struct bch_replicas_cpu *); void bch2_cpu_replicas_to_text(struct printbuf *, struct bch_replicas_cpu *);
static inline struct bch_replicas_entry * static inline struct bch_replicas_entry_v1 *
cpu_replicas_entry(struct bch_replicas_cpu *r, unsigned i) cpu_replicas_entry(struct bch_replicas_cpu *r, unsigned i)
{ {
return (void *) r->entries + r->entry_size * i; return (void *) r->entries + r->entry_size * i;
} }
int bch2_replicas_entry_idx(struct bch_fs *, int bch2_replicas_entry_idx(struct bch_fs *,
struct bch_replicas_entry *); struct bch_replicas_entry_v1 *);
void bch2_devlist_to_replicas(struct bch_replicas_entry *, void bch2_devlist_to_replicas(struct bch_replicas_entry_v1 *,
enum bch_data_type, enum bch_data_type,
struct bch_devs_list); struct bch_devs_list);
bool bch2_replicas_marked(struct bch_fs *, struct bch_replicas_entry *); bool bch2_replicas_marked(struct bch_fs *, struct bch_replicas_entry_v1 *);
int bch2_mark_replicas(struct bch_fs *, int bch2_mark_replicas(struct bch_fs *,
struct bch_replicas_entry *); struct bch_replicas_entry_v1 *);
static inline struct replicas_delta * static inline struct replicas_delta *
replicas_delta_next(struct replicas_delta *d) replicas_delta_next(struct replicas_delta *d)
@ -37,9 +37,9 @@ replicas_delta_next(struct replicas_delta *d)
int bch2_replicas_delta_list_mark(struct bch_fs *, struct replicas_delta_list *); int bch2_replicas_delta_list_mark(struct bch_fs *, struct replicas_delta_list *);
void bch2_bkey_to_replicas(struct bch_replicas_entry *, struct bkey_s_c); void bch2_bkey_to_replicas(struct bch_replicas_entry_v1 *, struct bkey_s_c);
static inline void bch2_replicas_entry_cached(struct bch_replicas_entry *e, static inline void bch2_replicas_entry_cached(struct bch_replicas_entry_v1 *e,
unsigned dev) unsigned dev)
{ {
e->data_type = BCH_DATA_cached; e->data_type = BCH_DATA_cached;
@ -59,7 +59,7 @@ int bch2_replicas_gc_start(struct bch_fs *, unsigned);
int bch2_replicas_gc2(struct bch_fs *); int bch2_replicas_gc2(struct bch_fs *);
int bch2_replicas_set_usage(struct bch_fs *, int bch2_replicas_set_usage(struct bch_fs *,
struct bch_replicas_entry *, struct bch_replicas_entry_v1 *,
u64); u64);
#define for_each_cpu_replicas_entry(_r, _i) \ #define for_each_cpu_replicas_entry(_r, _i) \


@ -5,12 +5,12 @@
struct bch_replicas_cpu { struct bch_replicas_cpu {
unsigned nr; unsigned nr;
unsigned entry_size; unsigned entry_size;
struct bch_replicas_entry *entries; struct bch_replicas_entry_v1 *entries;
}; };
struct replicas_delta { struct replicas_delta {
s64 delta; s64 delta;
struct bch_replicas_entry r; struct bch_replicas_entry_v1 r;
} __packed; } __packed;
struct replicas_delta_list { struct replicas_delta_list {
@ -21,7 +21,7 @@ struct replicas_delta_list {
u64 nr_inodes; u64 nr_inodes;
u64 persistent_reserved[BCH_REPLICAS_MAX]; u64 persistent_reserved[BCH_REPLICAS_MAX];
struct {} memset_end; struct {} memset_end;
struct replicas_delta d[0]; struct replicas_delta d[];
}; };
#endif /* _BCACHEFS_REPLICAS_TYPES_H */ #endif /* _BCACHEFS_REPLICAS_TYPES_H */


@ -191,13 +191,10 @@ void bch2_journal_super_entries_add_common(struct bch_fs *c,
struct jset_entry **end, struct jset_entry **end,
u64 journal_seq) u64 journal_seq)
{ {
struct bch_dev *ca;
unsigned i, dev;
percpu_down_read(&c->mark_lock); percpu_down_read(&c->mark_lock);
if (!journal_seq) { if (!journal_seq) {
for (i = 0; i < ARRAY_SIZE(c->usage); i++) for (unsigned i = 0; i < ARRAY_SIZE(c->usage); i++)
bch2_fs_usage_acc_to_base(c, i); bch2_fs_usage_acc_to_base(c, i);
} else { } else {
bch2_fs_usage_acc_to_base(c, journal_seq & JOURNAL_BUF_MASK); bch2_fs_usage_acc_to_base(c, journal_seq & JOURNAL_BUF_MASK);
@ -223,7 +220,7 @@ void bch2_journal_super_entries_add_common(struct bch_fs *c,
u->v = cpu_to_le64(atomic64_read(&c->key_version)); u->v = cpu_to_le64(atomic64_read(&c->key_version));
} }
for (i = 0; i < BCH_REPLICAS_MAX; i++) { for (unsigned i = 0; i < BCH_REPLICAS_MAX; i++) {
struct jset_entry_usage *u = struct jset_entry_usage *u =
container_of(jset_entry_init(end, sizeof(*u)), container_of(jset_entry_init(end, sizeof(*u)),
struct jset_entry_usage, entry); struct jset_entry_usage, entry);
@ -234,8 +231,8 @@ void bch2_journal_super_entries_add_common(struct bch_fs *c,
u->v = cpu_to_le64(c->usage_base->persistent_reserved[i]); u->v = cpu_to_le64(c->usage_base->persistent_reserved[i]);
} }
for (i = 0; i < c->replicas.nr; i++) { for (unsigned i = 0; i < c->replicas.nr; i++) {
struct bch_replicas_entry *e = struct bch_replicas_entry_v1 *e =
cpu_replicas_entry(&c->replicas, i); cpu_replicas_entry(&c->replicas, i);
struct jset_entry_data_usage *u = struct jset_entry_data_usage *u =
container_of(jset_entry_init(end, sizeof(*u) + e->nr_devs), container_of(jset_entry_init(end, sizeof(*u) + e->nr_devs),
@ -247,7 +244,7 @@ void bch2_journal_super_entries_add_common(struct bch_fs *c,
"embedded variable length struct"); "embedded variable length struct");
} }
for_each_member_device(ca, c, dev) { for_each_member_device(c, ca) {
unsigned b = sizeof(struct jset_entry_dev_usage) + unsigned b = sizeof(struct jset_entry_dev_usage) +
sizeof(struct jset_entry_dev_usage_type) * BCH_DATA_NR; sizeof(struct jset_entry_dev_usage_type) * BCH_DATA_NR;
struct jset_entry_dev_usage *u = struct jset_entry_dev_usage *u =
@ -255,10 +252,9 @@ void bch2_journal_super_entries_add_common(struct bch_fs *c,
struct jset_entry_dev_usage, entry); struct jset_entry_dev_usage, entry);
u->entry.type = BCH_JSET_ENTRY_dev_usage; u->entry.type = BCH_JSET_ENTRY_dev_usage;
u->dev = cpu_to_le32(dev); u->dev = cpu_to_le32(ca->dev_idx);
u->buckets_ec = cpu_to_le64(ca->usage_base->buckets_ec);
for (i = 0; i < BCH_DATA_NR; i++) { for (unsigned i = 0; i < BCH_DATA_NR; i++) {
u->d[i].buckets = cpu_to_le64(ca->usage_base->d[i].buckets); u->d[i].buckets = cpu_to_le64(ca->usage_base->d[i].buckets);
u->d[i].sectors = cpu_to_le64(ca->usage_base->d[i].sectors); u->d[i].sectors = cpu_to_le64(ca->usage_base->d[i].sectors);
u->d[i].fragmented = cpu_to_le64(ca->usage_base->d[i].fragmented); u->d[i].fragmented = cpu_to_le64(ca->usage_base->d[i].fragmented);
@ -267,7 +263,7 @@ void bch2_journal_super_entries_add_common(struct bch_fs *c,
percpu_up_read(&c->mark_lock); percpu_up_read(&c->mark_lock);
for (i = 0; i < 2; i++) { for (unsigned i = 0; i < 2; i++) {
struct jset_entry_clock *clock = struct jset_entry_clock *clock =
container_of(jset_entry_init(end, sizeof(*clock)), container_of(jset_entry_init(end, sizeof(*clock)),
struct jset_entry_clock, entry); struct jset_entry_clock, entry);


@ -12,33 +12,105 @@
#include "sb-errors.h" #include "sb-errors.h"
#include "super-io.h" #include "super-io.h"
#define RECOVERY_PASS_ALL_FSCK BIT_ULL(63)
/* /*
* Downgrade table: * Upgrade, downgrade tables - run certain recovery passes, fix certain errors
* When dowgrading past certain versions, we need to run certain recovery passes
* and fix certain errors:
* *
* x(version, recovery_passes, errors...) * x(version, recovery_passes, errors...)
*/ */
#define UPGRADE_TABLE() \
x(backpointers, \
RECOVERY_PASS_ALL_FSCK) \
x(inode_v3, \
RECOVERY_PASS_ALL_FSCK) \
x(unwritten_extents, \
RECOVERY_PASS_ALL_FSCK) \
x(bucket_gens, \
BIT_ULL(BCH_RECOVERY_PASS_bucket_gens_init)| \
RECOVERY_PASS_ALL_FSCK) \
x(lru_v2, \
RECOVERY_PASS_ALL_FSCK) \
x(fragmentation_lru, \
RECOVERY_PASS_ALL_FSCK) \
x(no_bps_in_alloc_keys, \
RECOVERY_PASS_ALL_FSCK) \
x(snapshot_trees, \
RECOVERY_PASS_ALL_FSCK) \
x(snapshot_skiplists, \
BIT_ULL(BCH_RECOVERY_PASS_check_snapshots), \
BCH_FSCK_ERR_snapshot_bad_depth, \
BCH_FSCK_ERR_snapshot_bad_skiplist) \
x(deleted_inodes, \
BIT_ULL(BCH_RECOVERY_PASS_check_inodes), \
BCH_FSCK_ERR_unlinked_inode_not_on_deleted_list) \
x(rebalance_work, \
BIT_ULL(BCH_RECOVERY_PASS_set_fs_needs_rebalance))
#define DOWNGRADE_TABLE() #define DOWNGRADE_TABLE()
struct downgrade_entry { struct upgrade_downgrade_entry {
u64 recovery_passes; u64 recovery_passes;
u16 version; u16 version;
u16 nr_errors; u16 nr_errors;
const u16 *errors; const u16 *errors;
}; };
#define x(ver, passes, ...) static const u16 ver_##errors[] = { __VA_ARGS__ }; #define x(ver, passes, ...) static const u16 upgrade_##ver##_errors[] = { __VA_ARGS__ };
DOWNGRADE_TABLE() UPGRADE_TABLE()
#undef x #undef x
static const struct downgrade_entry downgrade_table[] = { static const struct upgrade_downgrade_entry upgrade_table[] = {
#define x(ver, passes, ...) { \ #define x(ver, passes, ...) { \
.recovery_passes = passes, \ .recovery_passes = passes, \
.version = bcachefs_metadata_version_##ver,\ .version = bcachefs_metadata_version_##ver,\
.nr_errors = ARRAY_SIZE(ver_##errors), \ .nr_errors = ARRAY_SIZE(upgrade_##ver##_errors), \
.errors = ver_##errors, \ .errors = upgrade_##ver##_errors, \
},
UPGRADE_TABLE()
#undef x
};
void bch2_sb_set_upgrade(struct bch_fs *c,
unsigned old_version,
unsigned new_version)
{
lockdep_assert_held(&c->sb_lock);
struct bch_sb_field_ext *ext = bch2_sb_field_get(c->disk_sb.sb, ext);
for (const struct upgrade_downgrade_entry *i = upgrade_table;
i < upgrade_table + ARRAY_SIZE(upgrade_table);
i++)
if (i->version > old_version && i->version <= new_version) {
u64 passes = i->recovery_passes;
if (passes & RECOVERY_PASS_ALL_FSCK)
passes |= bch2_fsck_recovery_passes();
passes &= ~RECOVERY_PASS_ALL_FSCK;
ext->recovery_passes_required[0] |=
cpu_to_le64(bch2_recovery_passes_to_stable(passes));
for (const u16 *e = i->errors;
e < i->errors + i->nr_errors;
e++) {
__set_bit(*e, c->sb.errors_silent);
ext->errors_silent[*e / 64] |= cpu_to_le64(BIT_ULL(*e % 64));
}
}
}
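UPGRADE_TABLE() above is an x-macro: it is expanded once with x() generating a static error array per version, and again with x() generating an initializer for upgrade_table[]. Here is a stripped-down illustration of that pattern with made-up version names, pass bits, and error codes (DEMO_TABLE, demo_entry, etc. are hypothetical):

	#define DEMO_TABLE()				\
		x(demo_ver_a, 0x1, 10, 11)		\
		x(demo_ver_b, 0x2, 12)

	/* first expansion: one error array per table row */
	#define x(ver, passes, ...) static const unsigned short ver##_errors[] = { __VA_ARGS__ };
	DEMO_TABLE()
	#undef x

	struct demo_entry {
		unsigned long long	recovery_passes;
		const unsigned short	*errors;
		unsigned		nr_errors;
	};

	/* second expansion: one table entry per row, referencing the arrays above */
	static const struct demo_entry demo_table[] = {
	#define x(ver, passes, ...) {						\
		.recovery_passes = passes,					\
		.errors		 = ver##_errors,				\
		.nr_errors	 = sizeof(ver##_errors) / sizeof(ver##_errors[0]), \
	},
		DEMO_TABLE()
	#undef x
	};
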
#define x(ver, passes, ...) static const u16 downgrade_ver_##errors[] = { __VA_ARGS__ };
DOWNGRADE_TABLE()
#undef x
static const struct upgrade_downgrade_entry downgrade_table[] = {
#define x(ver, passes, ...) { \
.recovery_passes = passes, \
.version = bcachefs_metadata_version_##ver,\
.nr_errors = ARRAY_SIZE(downgrade_##ver##_errors), \
.errors = downgrade_##ver##_errors, \
}, },
DOWNGRADE_TABLE() DOWNGRADE_TABLE()
#undef x #undef x
@ -118,7 +190,7 @@ int bch2_sb_downgrade_update(struct bch_fs *c)
darray_char table = {}; darray_char table = {};
int ret = 0; int ret = 0;
for (const struct downgrade_entry *src = downgrade_table; for (const struct upgrade_downgrade_entry *src = downgrade_table;
src < downgrade_table + ARRAY_SIZE(downgrade_table); src < downgrade_table + ARRAY_SIZE(downgrade_table);
src++) { src++) {
if (BCH_VERSION_MAJOR(src->version) != BCH_VERSION_MAJOR(le16_to_cpu(c->disk_sb.sb->version))) if (BCH_VERSION_MAJOR(src->version) != BCH_VERSION_MAJOR(le16_to_cpu(c->disk_sb.sb->version)))

Some files were not shown because too many files have changed in this diff.