This patch makes the cpuidle_states structure global (single copy)
instead of per-cpu. The statistics needed on per-cpu basis
by the governor are kept per-cpu. This simplifies the cpuidle
subsystem as state registration is done by single cpu only.
Having single copy of cpuidle_states saves memory. Rare case
of asymmetric C-states can be handled within the cpuidle driver
and architectures such as POWER do not have asymmetric C-states.
Having single/global registration of all the idle states,
dynamic C-state transitions on x86 are handled by
the boot cpu. Here, the boot cpu would disable all the devices,
re-populate the states and later enable all the devices,
irrespective of the cpu that would receive the notification first.
Reference:
https://lkml.org/lkml/2011/4/25/83
Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Tested-by: Jean Pihet <j-pihet@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
This is the first step towards global registration of cpuidle
states. The statistics used primarily by the governor are per-cpu
and have to be split from rest of the fields inside cpuidle_state,
which would be made global i.e. single copy. The driver_data field
is also per-cpu and moved.
Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Tested-by: Jean Pihet <j-pihet@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The cpuidle_device->prepare() mechanism causes updates to the
cpuidle_state[].flags, setting and clearing CPUIDLE_FLAG_IGNORE
to tell the governor not to chose a state on a per-cpu basis at
run-time. State demotion is now handled by the driver and it returns
the actual state entered. Hence, this mechanism is not required.
Also this removes per-cpu flags from cpuidle_state enabling
it to be made global.
Reference:
https://lkml.org/lkml/2011/3/25/52
Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Tested-by: Jean Pihet <j-pihet@ti.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Cpuidle governor only suggests the state to enter using the
governor->select() interface, but allows the low level driver to
override the recommended state. The actual entered state
may be different because of software or hardware demotion. Software
demotion is done by the back-end cpuidle driver and can be accounted
correctly. Current cpuidle code uses last_state field to capture the
actual state entered and based on that updates the statistics for the
state entered.
Ideally the driver enter routine should update the counters,
and it should return the state actually entered rather than the time
spent there. The generic cpuidle code should simply handle where
the counters live in the sysfs namespace, not updating the counters.
Reference:
https://lkml.org/lkml/2011/3/25/52
Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Tested-by: Jean Pihet <j-pihet@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Recently the ACPI ops structs were constified but the inline version
of register_hotplug_dock_device() was overlooked (see also commit
9c8b04b, June 25 2011). Update the inline function
register_hotplug_dock_device() that is enabled with
CONFIG_ACPI_DOCK=n too. This patch fixes at least the following
compiler warnings:
drivers/ata/libata-acpi.c: In function .ata_acpi_associate.:
drivers/ata/libata-acpi.c:266:11: warning: passing argument 2 of .register_hotplug_dock_device. discards qualifiers from pointer target type
include/acpi/acpi_drivers.h:146:19: note: expected .struct acpi_dock_ops *. but argument is of type .const struct acpi_dock_ops *.
drivers/ata/libata-acpi.c:275:11: warning: passing argument 2 of .register_hotplug_dock_device. discards qualifiers from pointer target type
include/acpi/acpi_drivers.h:146:19: note: expected .struct acpi_dock_ops *. but argument is of type .const struct acpi_dock_ops *.
Cc: stable@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
There are a lot userspace approaches to detect the usage of the
platform (laptop, workstation, server, ...) and adjust kernel tunables
accordingly (io/process scheduler, power management, ...).
These approaches need constant maintaining and are ugly to implement
(detect PCMCIA controller -> laptop,
does not work on recent systems anymore, ...)
On ACPI systems there is an easy and reliable way (if implemented
in BIOS and most recent platforms have this value set).
-> export it to userspace.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
The thermal driver should use a freezable workqueue to schedule
polling to prevent thermal_zone_device_update() from being run
during system suspend, when the devices it relies on may be inactive.
Make it use the system freezable workqueue for this purpose.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
ACPI_NO_HARDWARE_INIT is only used by acpi_early_init() and
acpi_bus_init() when calling acpi_enable_subsystem(), but
acpi_enable_subsystem() doesn't check that flag, so it can be
dropped.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
Callers to __acpi_ioremap_fast() pass the bit_width that they found in the
acpi_generic_address structure. Convert from bits to bytes when passing to
__acpi_find_iomap() - as it wants to see bytes, not bits.
cc: stable@kernel.org
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph/super.c: quiet sparse noise
ceph/mds_client.c: quiet sparse noise
ceph: use new D_COMPLETE dentry flag
ceph: clear parent D_COMPLETE flag when on dentry prune
* git://github.com/rustyrussell/linux:
module,bug: Add TAINT_OOT_MODULE flag for modules not built in-tree
module: Enable dynamic debugging regardless of taint
* 'rmobile-latest' of git://github.com/pmundt/linux-sh: (21 commits)
ARM: mach-shmobile: ag5evm needs CONFIG_I2C
ARM: mach-shmobile: sh73a0 and AG5EVM PINT support
ARM: mach-shmobile: Add support for PINT though INTC macros
ARM: mach-shmobile: SDHI0 GPIO hotplug for AG5EVM
ARM: mach-shmobile: Use common INTC IRQ code on sh73a0
ARM: mach-shmobile: Use common INTC IRQ code on sh7372
ARM: mach-shmobile: Use common INTC IRQ code on sh7377
ARM: mach-shmobile: Use common INTC IRQ code on sh7367
ARM: mach-shmobile: sh73a0 GPIO IRQ support
ARM: sh7372 ap4evb NOR Flash USB boot fix
ARM: mach-shmobile: sh7372 Mackerel NOR Flash USB boot fix
sh: intc: Allow triggering on both edges for ARM SoCs
ARM: mach-shmobile: Break out INTC IRQ code
ARM: mach-shmobile: Kota2 SDHI0 and SDHI1 support
ARM: mach-shmobile: Kota2 SCIFA4 and SCIFB support
ARM: mach-shmobile: Kota2 MMCIF support
ARM: mach-shmobile: Kota2 GPIO LEDs support
ARM: mach-shmobile: Kota2 GPIO Keys support
ARM: mach-shmobile: Kota2 KEYSC support
ARM: mach-shmobile: Kota2 SCIFA2 and SMSC911X support
...
The attached patch simplifies 29df8d8f8702f0f53c1375015f09f04bc8d023c1. As
the "pnp_xxx" structs are not designed to cope with IORESOURCE_DISABLED, and
hence no code can test for this value, setting this value is actually a "no op"
and can be skipped altogether. It is sufficient to remove the checks for
"empty" resources and continue processing.
The patch is applied against 3.1.
Signed-off-by: Witold Szczeponik <Witold.Szczeponik@gmx.net>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (106 commits)
powerpc/p3060qds: Add support for P3060QDS board
powerpc/83xx: Add shutdown request support to MCU handling on MPC8349 MITX
powerpc/85xx: Make kexec to interate over online cpus
powerpc/fsl_booke: Fix comment in head_fsl_booke.S
powerpc/85xx: issue 15 EOI after core reset for FSL CoreNet devices
powerpc/8xxx: Fix interrupt handling in MPC8xxx GPIO driver
powerpc/85xx: Add 'fsl,pq3-gpio' compatiable for GPIO driver
powerpc/86xx: Correct Gianfar support for GE boards
powerpc/cpm: Clear muram before it is in use.
drivers/virt: add ioctl for 32-bit compat on 64-bit to fsl-hv-manager
powerpc/fsl_msi: add support for "msi-address-64" property
powerpc/85xx: Setup secondary cores PIR with hard SMP id
powerpc/fsl-booke: Fix settlbcam for 64-bit
powerpc/85xx: Adding DCSR node to dtsi device trees
powerpc/85xx: clean up FPGA device tree nodes for Freecsale QorIQ boards
powerpc/85xx: fix PHYS_64BIT selection for P1022DS
powerpc/fsl-booke: Fix setup_initial_memory_limit to not blindly map
powerpc: respect mem= setting for early memory limit setup
powerpc: Update corenet64_smp_defconfig
powerpc: Update mpc85xx/corenet 32-bit defconfigs
...
Fix up trivial conflicts in:
- arch/powerpc/configs/40x/hcu4_defconfig
removed stale file, edited elsewhere
- arch/powerpc/include/asm/udbg.h, arch/powerpc/kernel/udbg.c:
added opal and gelic drivers vs added ePAPR driver
- drivers/tty/serial/8250.c
moved UPIO_TSI to powerpc vs removed UPIO_DWAPB support
Calling pm-suspend might trigger a recursive lock in it's code path.
In function acpi_hw_clear_acpi_status, acpi_os_acquire_lock holds
the lock acpi_gbl_hardware_lock before calling acpi_hw_register_write(),
then without releasing acpi_gbl_hardware_lock, this function calls
acpi_ev_walk_gpe_list, which tries to hold acpi_gbl_gpe_lock.
Both acpi_gbl_hardware_lock and acpi_gbl_gpe_lock are at same
lock-class and which might cause lock recursion deadlock.
Following patch fixes this scenario by just releasing
acpi_gbl_hardware_lock before calling acpi_ev_walk_gpe_list.
Changes since v0(https://lkml.org/lkml/2011/9/21/355):
- Fix changelog, thanks to Lin Ming.
Changes since v1 (https://lkml.org/lkml/2011/11/3/89):
- Update changelog and rename goto label, courtesy Srivatsa S. Bhat.
Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
Use kstrdup rather than duplicating its implementation
The semantic patch that makes this output is available
in scripts/coccinelle/api/kstrdup.cocci.
More information about semantic patching is available at
http://coccinelle.lip6.fr/
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Len Brown <len.brown@intel.com>
During log replay, can commit the transaction before the fs_root
pointers are setup, so we have to make sure they are not null before
trying to use them.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Use of the GPL or a compatible licence doesn't necessarily make the code
any good. We already consider staging modules to be suspect, and this
should also be true for out-of-tree modules which may receive very
little review.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Reviewed-by: Dave Jones <davej@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (patched oops-tracing.txt)
Dynamic debugging is currently disabled for tainted modules, except
for TAINT_CRAP. This prevents use of dynamic debugging for
out-of-tree modules once the next patch is applied.
This condition was apparently intended to avoid a crash if a force-
loaded module has an incompatible definition of dynamic debug
structures. However, a administrator that forces us to load a module
is claiming that it *is* compatible even though it fails our version
checks. If they are mistaken, there are any number of ways the module
could crash the system.
As a side-effect, proprietary and other tainted modules can now use
dynamic_debug.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* 'next/move' of git://git.linaro.org/people/arnd/arm-soc:
ARM: EXYNOS: Add ARCH_EXYNOS and reorganize arch/arm/mach-exynos
ARM: EXYNOS4: convert MCT to percpu interrupt API
ARM: SAMSUNG: Add clk enable/disable of pwm
ARM: SAMSUNG: Fix compile error due to kfree
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Revert the check of NO_PRESENCE pincfg default bit
ALSA: hda - Fix a regression for DMA-position check with CA0110
ALSA: hda - Fix silent output regression with ALC861
ALSA: control: remove compilation warning on 32-bit
ALSA: ua101: fix crash when unplugging
Commit 2265cef2 (hwmon: (w83627ehf) Properly report PECI and AMD-SI
sensor types) results in kernel panic if data->temp_label was not
initialized.
The problem was found with chip W83627DHG-P.
Add check if data->temp->label was set before use.
Based on incomplete patch by Alexander Beregalov.
Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Tested-by: Alexander Beregalov <a.beregalov@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
The definition of TO_ATTR_NO in the non-SMP case is wrong. As the SMP
definition resolves to the correct value, just use this for both
cases.
Without this fix the temperature attributes are named temp0_* instead
of temp2_*, so libsensors won't pick them. Broken since kernel 3.0.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Tested-by: Phil Sutter <phil@nwl.cc>
Cc: stable@kernel.org
Acked-by: Durgadoss R <Durgadoss.r@intel.com>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
The implementation on commit [08a1f5eb: ALSA: hda - Check NO_PRESENCE
pincfg default bit] seems like a mis-interpretation of specification.
The spec gives the reversed bit definition. But, following the spec
also causes to change so many existing device configurations, thus we
can't change it so easily for now. For 3.2-rc1, it's safer to revert
this check (actually this patch comments out the code).
We may re-introduced the fixed version once after the wider test-case
coverages are done.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The regression-fix in 3.1 for the check of DMA-position validity caused
yet another regression for CA0110. As usual, this hardware seems working
only with LPIB properly. Adding the appropriate driver-caps bit to force
LPIB fixes the problem.
Reported-and-tested-by: Andres Freund <andres@anarazel.de>
Cc: <stable@kernel.org> [v3.1]
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The 3.1 kernel has a regression for ALC861 codec where no sound output
is heard with the default setup. It's because the amps in DACs aren't
properly unmuted while the output mixers are assigned only to pins.
This patch fixes the missing initialization of DACs when no mixer is
assigned to them.
Tested-by: Andrea Iob <andrea_iob@yahoo.it>
Cc: <stable@kernel.org> [v3.1+]
Signed-off-by: Takashi Iwai <tiwai@suse.de>
This was introduced by 'ALSA: control: add support for ENUMERATED user
space controls' which adds a u64 variable that gets cast to a pointer:
sound/core/control.c: In function 'snd_ctl_elem_init_enum_names':
sound/core/control.c:1089: warning: cast to pointer from integer of different size
Cast to uintptr_t before casting to pointer to avoid the warning.
Signed-off-by: Olof Johansson <olof@lixom.net>
[cl: replace long with uintptr_t]
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
If the device is unplugged while running, it is possible for a PCM
device to be closed after the disconnect callback has returned. This
means that kill_stream_urb() and disable_iso_interface() would try to
access already-invalid or freed USB data structures.
The function free_usb_related_resources() was intended to prevent this,
but forgot to clear the affected variables.
Reported-and-tested-by: Olivier Courtay <olivier@courtay.org>
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Cc: 2.6.33+ <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
While we're allocating ram for a new transaction, we drop our spinlock.
When we get the lock back, we do check to see if a transaction started
while we slept, but we don't check to make sure it isn't blocked
because a commit has already started.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
In case we were able to map less than we wanted (length < PAGE_SIZE
clause is true) btrfs_bio is still allocated and we have to free it.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The scrub readahead branch brought in a new error handling hook,
but it was leaking extent_buffer references.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The new ioctls to follow backrefs are not clean for 32/64 bit
compat. This reworks them for u64s everywhere. They are brand new, so
there are no problems with changing the interface now.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
We all keep getting those stupid warnings from use_block_rsv when running
stress.sh, and it's because the delayed insertion stuff is being stupid. It's
not the delayed insertion stuffs fault, it's all just stupid. When marking an
inode dirty for oh say updating the time on it, we just do a
btrfs_join_transaction, which doesn't reserve any space. This is stupid because
we're going to have to have space reserve to make this change, but we do it
because it's fast because chances are we're going to call it over and over again
and it doesn't matter. Well thanks to the delayed insertion stuff this is
mostly the case, so we do actually need to make this reservation. So if
trans->bytes_reserved is 0 then try to do a normal reservation. If not return
ENOSPC which will make the btrfs_dirty_inode start a proper transaction which
will let it do the whole ENOSPC dance and reserve enough space for the delayed
insertion to steal the reservation from the transaction.
The other stupid thing we do is not reserve space for the inode when writing to
the thing. Usually this is ok since we have to update the time so we'd have
already done all this work before we get to the endio stuff, so it doesn't
matter. But this is stupid because we could write the data after the
transaction commits where we changed the mtime of the inode so we have to cow
all the way down to the inode anyway. This used to be masked by the delalloc
reservation stuff, but because we delay the update it doesn't get masked in this
case. So again the delayed insertion stuff bites us in the ass. So if our
trans->block_rsv is delalloc, just steal the reservation from the delalloc
reserve. Hopefully this won't bite us in the ass, but I've said that before.
With this patch stress.sh no longer spits out those stupid warnings (famous last
words). Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Failure testing was tripping up over stale PageError bits in
metadata pages. If we have an io error on a block, and later on
end up reusing it, nobody ever clears PageError on those pages.
During commit, we'll find PageError and think we had trouble writing
the block, which will lead to aborts and other problems.
This changes clean_tree_block and the btrfs writepage code to
clear the PageError bit. In both cases we're either completely
done with the page or the page has good stuff and the error bit
is no longer valid.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Because of the overcommit stuff I had to make it so that we committed the
transaction all the time in reserve_metadata_bytes in case we had overcommitted
because of delayed items. This was because previously we had no way of knowing
how much space was reserved for delayed items. Now that we have the
delayed_block_rsv we can check it to see if committing the transaction would get
us anywhere. This patch breaks out the committing logic into a helper function
that will check to see if committing the transaction would free enough space for
us to get anything done. With this patch xfstests 83 goes from taking 445
seconds to taking 28 seconds on my box. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
I've been hitting warnings in use_block_rsv when running the delayed insertion
stuff. It's because we will readjust global block rsv based on what is in use,
which means we could end up discarding reservations that are for the delayed
insertion stuff. So instead create a seperate block rsv for the delayed
insertion stuff. This will also make it easier to debug problems with the
delayed insertion reservations since we will know that only the delayed
insertion code touches this block_rsv. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This takes some of the free space in the btrfs super block
to record information about most of the roots in the last four
commits.
It also adds a -o recovery to use the root history log when
we're not able to read the tree of tree roots, the extent
tree root, the device tree root or the csum root.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
fs_info has now ~9kb, more than fits into one page. This will cause
mount failure when memory is too fragmented. Top space consumers are
super block structures super_copy and super_for_commit, ~2.8kb each.
Allocate them dynamically. fs_info will be ~3.5kb. (measured on x86_64)
Add a wrapper for freeing fs_info and all of it's dynamically allocated
members.
Signed-off-by: David Sterba <dsterba@suse.cz>
We no longer use the orphan block rsv for holding the reservation for truncating
the inode, so instead use the global block rsv and check to make sure it has
enough space for us to truncate the space. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
I fixed a problem where we weren't reserving space for an orphan item when we
had to fallback to using the global reserve for an unlink, but I introduced
another problem. I was migrating the bytes from the transaction reserve to the
global reserve and then releasing from the global reserve in
btrfs_end_transaction(). The problem with this is that a migrate will jack up
the size for the destination, but leave the size alone for the source, with the
idea that you can do a release normally on the source and it all washes out, and
then you can do a release again on the destination and it works out right. My
way was skipping the release on the trans_block_rsv which still had the jacked
up size from our original reservation. So instead release manually from the
global reserve if this transaction was using it, and then set the
trans->block_rsv back to the trans_block_rsv so that btrfs_end_transaction
cleans everything up properly. With this patch xfstest 83 doesn't emit warnings
about leaking space. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
write_cache_pages tries to build up a large bio to stuff down the pipe.
But if it needs to wait for a page lock, it needs to make sure and send
down any pending writes so we don't deadlock with anyone who has the
page lock and is waiting for writeback of things inside the bio.
Dave Sterba triggered this as a deadlock between the autodefrag code and
the extent write_cache_pages
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The tree log had two important bugs that could cause corruptions after a
crash. Sometimes we were allowing tree log blocks to be reused after
the tree log was committed but before the transaction commit was done.
This allowed a future metadata write to overwrite the tree log data. It
is fixed by adding a new variant of freeing reserved extents that always
pins them. Credit goes to Stefan Behrens and Arne Jansen for many many
hours spent tracking this bug down.
During tree log replay, we do a pass through the tree log and pin all
the extents we find. This makes sure the replay code won't go in and
use any of those blocks for new allocations during replay. The problem
is the free space cache isn't honoring these pinned extents. So the
allocator can end up handing them out, leading to all kinds of problems
during replay.
The fix here is to force any free space cache to load while we pin the
extents, and then to make sure we remove the pinned extents from the
free space rbtree.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Reported-by: Stefan Behrens <sbehrens@giantdisaster.de>