275325 Commits

Author SHA1 Message Date
Deepthi Dharwar
46bcfad7a8 cpuidle: Single/Global registration of idle states
This patch makes the cpuidle_states structure global (single copy)
instead of per-cpu. The statistics needed on per-cpu basis
by the governor are kept per-cpu. This simplifies the cpuidle
subsystem as state registration is done by single cpu only.
Having single copy of cpuidle_states saves memory. Rare case
of asymmetric C-states can be handled within the cpuidle driver
and architectures such as POWER do not have asymmetric C-states.

Having single/global registration of all the idle states,
dynamic C-state transitions on x86 are handled by
the boot cpu. Here, the boot cpu  would disable all the devices,
re-populate the states and later enable all the devices,
irrespective of the cpu that would receive the notification first.

Reference:
https://lkml.org/lkml/2011/4/25/83

Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Tested-by: Jean Pihet <j-pihet@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-11-06 21:13:58 -05:00
Deepthi Dharwar
4202735e8a cpuidle: Split cpuidle_state structure and move per-cpu statistics fields
This is the first step towards global registration of cpuidle
states. The statistics used primarily by the governor are per-cpu
and have to be split from rest of the fields inside cpuidle_state,
which would be made global i.e. single copy. The driver_data field
is also per-cpu and moved.

Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Tested-by: Jean Pihet <j-pihet@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-11-06 21:13:49 -05:00
Deepthi Dharwar
b25edc42bf cpuidle: Remove CPUIDLE_FLAG_IGNORE and dev->prepare()
The cpuidle_device->prepare() mechanism causes updates to the
cpuidle_state[].flags, setting and clearing CPUIDLE_FLAG_IGNORE
to tell the governor not to chose a state on a per-cpu basis at
run-time. State demotion is now handled by the driver and it returns
the actual state entered. Hence, this mechanism is not required.
Also this removes per-cpu flags from cpuidle_state enabling
it to be made global.

Reference:
https://lkml.org/lkml/2011/3/25/52

Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Tested-by: Jean Pihet <j-pihet@ti.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-11-06 21:13:43 -05:00
Deepthi Dharwar
e978aa7d7d cpuidle: Move dev->last_residency update to driver enter routine; remove dev->last_state
Cpuidle governor only suggests the state to enter using the
governor->select() interface, but allows the low level driver to
override the recommended state. The actual entered state
may be different because of software or hardware demotion. Software
demotion is done by the back-end cpuidle driver and can be accounted
correctly. Current cpuidle code uses last_state field to capture the
actual state entered and based on that updates the statistics for the
state entered.

Ideally the driver enter routine should update the counters,
and it should return the state actually entered rather than the time
spent there. The generic cpuidle code should simply handle where
the counters live in the sysfs namespace, not updating the counters.

Reference:
https://lkml.org/lkml/2011/3/25/52

Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Tested-by: Jean Pihet <j-pihet@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-11-06 21:13:30 -05:00
Bart Van Assche
c1056b42a8 ACPI: Fix CONFIG_ACPI_DOCK=n compiler warning
Recently the ACPI ops structs were constified but the inline version
of register_hotplug_dock_device() was overlooked (see also commit
9c8b04b, June 25 2011). Update the inline function
register_hotplug_dock_device() that is enabled with
CONFIG_ACPI_DOCK=n too. This patch fixes at least the following
compiler warnings:

drivers/ata/libata-acpi.c: In function .ata_acpi_associate.:
drivers/ata/libata-acpi.c:266:11: warning: passing argument 2 of .register_hotplug_dock_device. discards qualifiers from pointer target type
include/acpi/acpi_drivers.h:146:19: note: expected .struct acpi_dock_ops *. but argument is of type .const struct acpi_dock_ops *.
drivers/ata/libata-acpi.c:275:11: warning: passing argument 2 of .register_hotplug_dock_device. discards qualifiers from pointer target type
include/acpi/acpi_drivers.h:146:19: note: expected .struct acpi_dock_ops *. but argument is of type .const struct acpi_dock_ops *.

Cc: stable@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
2011-11-06 20:58:17 -05:00
Thomas Renninger
362b646062 ACPI: Export FADT pm_profile integer value to userspace
There are a lot userspace approaches to detect the usage of the
platform (laptop, workstation, server, ...) and adjust kernel tunables
accordingly (io/process scheduler, power management, ...).

These approaches need constant maintaining and are ugly to implement
(detect PCMCIA controller -> laptop,
does not work on recent systems anymore, ...)
On ACPI systems there is an easy and reliable way (if implemented
in BIOS and most recent platforms have this value set).
-> export it to userspace.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-11-06 20:48:42 -05:00
Rafael J. Wysocki
51e20d0e3a thermal: Prevent polling from happening during system suspend
The thermal driver should use a freezable workqueue to schedule
polling to prevent thermal_zone_device_update() from being run
during system suspend, when the devices it relies on may be inactive.
Make it use the system freezable workqueue for this purpose.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-11-06 20:38:49 -05:00
Rafael J. Wysocki
4505a2015f ACPI: Drop ACPI_NO_HARDWARE_INIT
ACPI_NO_HARDWARE_INIT is only used by acpi_early_init() and
acpi_bus_init() when calling acpi_enable_subsystem(), but
acpi_enable_subsystem() doesn't check that flag, so it can be
dropped.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-11-06 20:32:31 -05:00
Luck, Tony
3bf3f8b19d ACPI atomicio: Convert width in bits to bytes in __acpi_ioremap_fast()
Callers to __acpi_ioremap_fast() pass the bit_width that they found in the
acpi_generic_address structure. Convert from bits to bytes when passing to
__acpi_find_iomap() - as it wants to see bytes, not bits.

cc: stable@kernel.org
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-11-06 20:30:23 -05:00
Linus Torvalds
5d5a8d2d9d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph/super.c: quiet sparse noise
  ceph/mds_client.c: quiet sparse noise
  ceph: use new D_COMPLETE dentry flag
  ceph: clear parent D_COMPLETE flag when on dentry prune
2011-11-06 17:28:44 -08:00
Linus Torvalds
d4a2e61f0b Merge git://github.com/rustyrussell/linux
* git://github.com/rustyrussell/linux:
  module,bug: Add TAINT_OOT_MODULE flag for modules not built in-tree
  module: Enable dynamic debugging regardless of taint
2011-11-06 17:28:32 -08:00
Linus Torvalds
0e4c9dc2f2 Merge branch 'rmobile-latest' of git://github.com/pmundt/linux-sh
* 'rmobile-latest' of git://github.com/pmundt/linux-sh: (21 commits)
  ARM: mach-shmobile: ag5evm needs CONFIG_I2C
  ARM: mach-shmobile: sh73a0 and AG5EVM PINT support
  ARM: mach-shmobile: Add support for PINT though INTC macros
  ARM: mach-shmobile: SDHI0 GPIO hotplug for AG5EVM
  ARM: mach-shmobile: Use common INTC IRQ code on sh73a0
  ARM: mach-shmobile: Use common INTC IRQ code on sh7372
  ARM: mach-shmobile: Use common INTC IRQ code on sh7377
  ARM: mach-shmobile: Use common INTC IRQ code on sh7367
  ARM: mach-shmobile: sh73a0 GPIO IRQ support
  ARM: sh7372 ap4evb NOR Flash USB boot fix
  ARM: mach-shmobile: sh7372 Mackerel NOR Flash USB boot fix
  sh: intc: Allow triggering on both edges for ARM SoCs
  ARM: mach-shmobile: Break out INTC IRQ code
  ARM: mach-shmobile: Kota2 SDHI0 and SDHI1 support
  ARM: mach-shmobile: Kota2 SCIFA4 and SCIFB support
  ARM: mach-shmobile: Kota2 MMCIF support
  ARM: mach-shmobile: Kota2 GPIO LEDs support
  ARM: mach-shmobile: Kota2 GPIO Keys support
  ARM: mach-shmobile: Kota2 KEYSC support
  ARM: mach-shmobile: Kota2 SCIFA2 and SMSC911X support
  ...
2011-11-06 17:28:13 -08:00
Witold Szczeponik
18fd470a48 PNPACPI: Simplify disabled resource registration
The attached patch simplifies 29df8d8f8702f0f53c1375015f09f04bc8d023c1.  As
the "pnp_xxx" structs are not designed to cope with IORESOURCE_DISABLED, and
hence no code can test for this value, setting this value is actually a "no op"
and can be skipped altogether.  It is sufficient to remove the checks for
"empty" resources and continue processing.

The patch is applied against 3.1.

Signed-off-by: Witold Szczeponik <Witold.Szczeponik@gmx.net>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-11-06 20:13:55 -05:00
Linus Torvalds
1197ab2942 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (106 commits)
  powerpc/p3060qds: Add support for P3060QDS board
  powerpc/83xx: Add shutdown request support to MCU handling on MPC8349 MITX
  powerpc/85xx: Make kexec to interate over online cpus
  powerpc/fsl_booke: Fix comment in head_fsl_booke.S
  powerpc/85xx: issue 15 EOI after core reset for FSL CoreNet devices
  powerpc/8xxx: Fix interrupt handling in MPC8xxx GPIO driver
  powerpc/85xx: Add 'fsl,pq3-gpio' compatiable for GPIO driver
  powerpc/86xx: Correct Gianfar support for GE boards
  powerpc/cpm: Clear muram before it is in use.
  drivers/virt: add ioctl for 32-bit compat on 64-bit to fsl-hv-manager
  powerpc/fsl_msi: add support for "msi-address-64" property
  powerpc/85xx: Setup secondary cores PIR with hard SMP id
  powerpc/fsl-booke: Fix settlbcam for 64-bit
  powerpc/85xx: Adding DCSR node to dtsi device trees
  powerpc/85xx: clean up FPGA device tree nodes for Freecsale QorIQ boards
  powerpc/85xx: fix PHYS_64BIT selection for P1022DS
  powerpc/fsl-booke: Fix setup_initial_memory_limit to not blindly map
  powerpc: respect mem= setting for early memory limit setup
  powerpc: Update corenet64_smp_defconfig
  powerpc: Update mpc85xx/corenet 32-bit defconfigs
  ...

Fix up trivial conflicts in:
 - arch/powerpc/configs/40x/hcu4_defconfig
	removed stale file, edited elsewhere
 - arch/powerpc/include/asm/udbg.h, arch/powerpc/kernel/udbg.c:
	added opal and gelic drivers vs added ePAPR driver
 - drivers/tty/serial/8250.c
	moved UPIO_TSI to powerpc vs removed UPIO_DWAPB support
2011-11-06 17:12:03 -08:00
Rakib Mullick
f7f71cfbf0 ACPI: Fix possible recursive locking in hwregs.c
Calling pm-suspend might trigger a recursive lock in it's code path.
In function acpi_hw_clear_acpi_status, acpi_os_acquire_lock holds
the lock acpi_gbl_hardware_lock before calling acpi_hw_register_write(),
then without releasing acpi_gbl_hardware_lock, this function calls
acpi_ev_walk_gpe_list, which tries to hold acpi_gbl_gpe_lock.
Both acpi_gbl_hardware_lock and acpi_gbl_gpe_lock are at same
lock-class and which might cause lock recursion deadlock.

Following patch fixes this scenario by just releasing
acpi_gbl_hardware_lock before calling acpi_ev_walk_gpe_list.

 Changes since v0(https://lkml.org/lkml/2011/9/21/355):
	- Fix changelog, thanks to Lin Ming.

 Changes since v1 (https://lkml.org/lkml/2011/11/3/89):
	- Update changelog and rename goto label, courtesy Srivatsa S. Bhat.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-11-06 20:09:23 -05:00
Linus Torvalds
ec773e99ab Merge branch 'fixes' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm
* 'fixes' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm:
  ARM: PXA: fix includes in pxa2xx_cm_x2xx PCMCIA driver
  ARM: PXA: fix gpio-pxa.h build errors
  ARM: 7142/1: davinci: mark GPIO implementation complex
  ARM: 7134/1: Revert "EXYNOS4: Fix routing timer interrupt to offline CPU"
  ARM: PXA: eseries: fix eseries_register_clks section mismatch warning
  ARM: PXA: fix lubbock PCMCIA driver build error
2011-11-06 16:58:33 -08:00
Thomas Meyer
581de59e8d ACPI: use kstrdup()
Use kstrdup rather than duplicating its implementation

 The semantic patch that makes this output is available
 in scripts/coccinelle/api/kstrdup.cocci.

 More information about semantic patching is available at
 http://coccinelle.lip6.fr/

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-11-06 19:13:44 -05:00
Chris Mason
7c7e82a77f Btrfs: check for a null fs root when writing to the backup root log
During log replay, can commit the transaction before the fs_root
pointers are setup, so we have to make sure they are not null before
trying to use them.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 18:50:56 -05:00
Len Brown
22f4521d66 mrst pmu: update comment
referenced MeeGo, in particular, but really means Linux, in general.

Signed-off-by: Len Brown <len.brown@intel.com>
2011-11-06 18:32:45 -05:00
Ben Hutchings
2449b8ba07 module,bug: Add TAINT_OOT_MODULE flag for modules not built in-tree
Use of the GPL or a compatible licence doesn't necessarily make the code
any good.  We already consider staging modules to be suspect, and this
should also be true for out-of-tree modules which may receive very
little review.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Reviewed-by: Dave Jones <davej@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (patched oops-tracing.txt)
2011-11-07 07:54:42 +10:30
Ben Hutchings
1cd0d6c302 module: Enable dynamic debugging regardless of taint
Dynamic debugging is currently disabled for tainted modules, except
for TAINT_CRAP.  This prevents use of dynamic debugging for
out-of-tree modules once the next patch is applied.

This condition was apparently intended to avoid a crash if a force-
loaded module has an incompatible definition of dynamic debug
structures.  However, a administrator that forces us to load a module
is claiming that it *is* compatible even though it fails our version
checks.  If they are mistaken, there are any number of ways the module
could crash the system.

As a side-effect, proprietary and other tainted modules can now use
dynamic_debug.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-11-07 07:54:40 +10:30
Linus Torvalds
534baf55dd Merge branch 'next/move' of git://git.linaro.org/people/arnd/arm-soc
* 'next/move' of git://git.linaro.org/people/arnd/arm-soc:
  ARM: EXYNOS: Add ARCH_EXYNOS and reorganize arch/arm/mach-exynos
  ARM: EXYNOS4: convert MCT to percpu interrupt API
  ARM: SAMSUNG: Add clk enable/disable of pwm
  ARM: SAMSUNG: Fix compile error due to kfree
2011-11-06 12:25:11 -08:00
Linus Torvalds
ddf8a0d385 Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
  hwmon: (w83627ehf) Fix broken driver init
  hwmon: (coretemp) Fix for non-SMP builds
2011-11-06 12:15:26 -08:00
Linus Torvalds
9991357259 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Revert the check of NO_PRESENCE pincfg default bit
  ALSA: hda - Fix a regression for DMA-position check with CA0110
  ALSA: hda - Fix silent output regression with ALC861
  ALSA: control: remove compilation warning on 32-bit
  ALSA: ua101: fix crash when unplugging
2011-11-06 12:14:22 -08:00
Guenter Roeck
bfa02b0da6 hwmon: (w83627ehf) Fix broken driver init
Commit 2265cef2 (hwmon: (w83627ehf) Properly report PECI and AMD-SI
sensor types) results in kernel panic if data->temp_label was not
initialized.
The problem was found with chip W83627DHG-P.

Add check if data->temp->label was set before use.

Based on incomplete patch by Alexander Beregalov.

Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Tested-by: Alexander Beregalov <a.beregalov@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2011-11-06 20:25:18 +01:00
Jean Delvare
2aba6cac2a hwmon: (coretemp) Fix for non-SMP builds
The definition of TO_ATTR_NO in the non-SMP case is wrong. As the SMP
definition resolves to the correct value, just use this for both
cases.

Without this fix the temperature attributes are named temp0_* instead
of temp2_*, so libsensors won't pick them. Broken since kernel 3.0.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Tested-by: Phil Sutter <phil@nwl.cc>
Cc: stable@kernel.org
Acked-by: Durgadoss R <Durgadoss.r@intel.com>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
2011-11-06 20:25:18 +01:00
Takashi Iwai
f441917256 ALSA: hda - Revert the check of NO_PRESENCE pincfg default bit
The implementation on commit [08a1f5eb: ALSA: hda - Check NO_PRESENCE
pincfg default bit] seems like a mis-interpretation of specification.
The spec gives the reversed bit definition.  But, following the spec
also causes to change so many existing device configurations, thus we
can't change it so easily for now.  For 3.2-rc1, it's safer to revert
this check (actually this patch comments out the code).

We may re-introduced the fixed version once after the wider test-case
coverages are done.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
2011-11-06 14:07:37 +01:00
Takashi Iwai
69f9ba9b0c ALSA: hda - Fix a regression for DMA-position check with CA0110
The regression-fix in 3.1 for the check of DMA-position validity caused
yet another regression for CA0110.  As usual, this hardware seems working
only with LPIB properly.  Adding the appropriate driver-caps bit to force
LPIB fixes the problem.

Reported-and-tested-by: Andres Freund <andres@anarazel.de>
Cc: <stable@kernel.org> [v3.1]
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2011-11-06 13:49:13 +01:00
Takashi Iwai
43dea228a3 ALSA: hda - Fix silent output regression with ALC861
The 3.1 kernel has a regression for ALC861 codec where no sound output
is heard with the default setup.  It's because the amps in DACs aren't
properly unmuted while the output mixers are assigned only to pins.

This patch fixes the missing initialization of DACs when no mixer is
assigned to them.

Tested-by: Andrea Iob <andrea_iob@yahoo.it>
Cc: <stable@kernel.org> [v3.1+]
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2011-11-06 11:25:34 +01:00
Olof Johansson
447c6f93ab ALSA: control: remove compilation warning on 32-bit
This was introduced by 'ALSA: control: add support for ENUMERATED user
space controls' which adds a u64 variable that gets cast to a pointer:

sound/core/control.c: In function 'snd_ctl_elem_init_enum_names':
sound/core/control.c:1089: warning: cast to pointer from integer of different size

Cast to uintptr_t before casting to pointer to avoid the warning.

Signed-off-by: Olof Johansson <olof@lixom.net>
[cl: replace long with uintptr_t]
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2011-11-06 11:22:15 +01:00
Clemens Ladisch
862a6244eb ALSA: ua101: fix crash when unplugging
If the device is unplugged while running, it is possible for a PCM
device to be closed after the disconnect callback has returned.  This
means that kill_stream_urb() and disable_iso_interface() would try to
access already-invalid or freed USB data structures.

The function free_usb_related_resources() was intended to prevent this,
but forgot to clear the affected variables.

Reported-and-tested-by: Olivier Courtay <olivier@courtay.org>
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Cc: 2.6.33+ <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2011-11-06 11:21:42 +01:00
Chris Mason
d43317dcd0 Btrfs: fix race during transaction joins
While we're allocating ram for a new transaction, we drop our spinlock.
When we get the lock back, we do check to see if a transaction started
while we slept, but we don't check to make sure it isn't blocked
because a commit has already started.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:26:19 -05:00
Ilya Dryomov
56d2a48f81 Btrfs: fix a potential btrfs_bio leak on scrub fixups
In case we were able to map less than we wanted (length < PAGE_SIZE
clause is true) btrfs_bio is still allocated and we have to free it.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:11:29 -05:00
Ilya Dryomov
21ca543efc Btrfs: rename btrfs_bio multi -> bbio for consistency
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:11:21 -05:00
Ilya Dryomov
9510dc4c62 Btrfs: stop leaking btrfs_bios on readahead
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:11:08 -05:00
Chris Mason
306c8b68c8 Btrfs: stop the readahead threads on failed mount
If we don't stop them, they linger around corrupting
memory by using pointers to freed things.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:09:41 -05:00
Chris Mason
c674e04e1c Btrfs: fix extent_buffer leak in the metadata IO error handling
The scrub readahead branch brought in a new error handling hook,
but it was leaking extent_buffer references.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:09:10 -05:00
Chris Mason
740c3d226c Btrfs: fix the new inspection ioctls for 32 bit compat
The new ioctls to follow backrefs are not clean for 32/64 bit
compat.  This reworks them for u64s everywhere.  They are brand new, so
there are no problems with changing the interface now.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:08:49 -05:00
Chris Mason
806468f8bf Merge git://git.jan-o-sch.net/btrfs-unstable into integration
Conflicts:
	fs/btrfs/Makefile
	fs/btrfs/extent_io.c
	fs/btrfs/extent_io.h
	fs/btrfs/scrub.c

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:07:10 -05:00
Chris Mason
531f4b1ae5 Merge branch 'for-chris' of git://github.com/sensille/linux into integration
Conflicts:
	fs/btrfs/ctree.h

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:05:08 -05:00
Josef Bacik
c06a0e120a Btrfs: fix delayed insertion reservation
We all keep getting those stupid warnings from use_block_rsv when running
stress.sh, and it's because the delayed insertion stuff is being stupid.  It's
not the delayed insertion stuffs fault, it's all just stupid.  When marking an
inode dirty for oh say updating the time on it, we just do a
btrfs_join_transaction, which doesn't reserve any space.  This is stupid because
we're going to have to have space reserve to make this change, but we do it
because it's fast because chances are we're going to call it over and over again
and it doesn't matter.  Well thanks to the delayed insertion stuff this is
mostly the case, so we do actually need to make this reservation.  So if
trans->bytes_reserved is 0 then try to do a normal reservation.  If not return
ENOSPC which will make the btrfs_dirty_inode start a proper transaction which
will let it do the whole ENOSPC dance and reserve enough space for the delayed
insertion to steal the reservation from the transaction.

The other stupid thing we do is not reserve space for the inode when writing to
the thing.  Usually this is ok since we have to update the time so we'd have
already done all this work before we get to the endio stuff, so it doesn't
matter.  But this is stupid because we could write the data after the
transaction commits where we changed the mtime of the inode so we have to cow
all the way down to the inode anyway.  This used to be masked by the delalloc
reservation stuff, but because we delay the update it doesn't get masked in this
case.  So again the delayed insertion stuff bites us in the ass.  So if our
trans->block_rsv is delalloc, just steal the reservation from the delalloc
reserve.  Hopefully this won't bite us in the ass, but I've said that before.

With this patch stress.sh no longer spits out those stupid warnings (famous last
words).  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:04:20 -05:00
Chris Mason
bf0da8c183 Btrfs: ClearPageError during writepage and clean_tree_block
Failure testing was tripping up over stale PageError bits in
metadata pages.  If we have an io error on a block, and later on
end up reusing it, nobody ever clears PageError on those pages.

During commit, we'll find PageError and think we had trouble writing
the block, which will lead to aborts and other problems.

This changes clean_tree_block and the btrfs writepage code to
clear the PageError bit.  In both cases we're either completely
done with the page or the page has good stuff and the error bit
is no longer valid.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:04:20 -05:00
Josef Bacik
663350ac38 Btrfs: be smarter about committing the transaction in reserve_metadata_bytes
Because of the overcommit stuff I had to make it so that we committed the
transaction all the time in reserve_metadata_bytes in case we had overcommitted
because of delayed items.  This was because previously we had no way of knowing
how much space was reserved for delayed items.  Now that we have the
delayed_block_rsv we can check it to see if committing the transaction would get
us anywhere.  This patch breaks out the committing logic into a helper function
that will check to see if committing the transaction would free enough space for
us to get anything done.  With this patch xfstests 83 goes from taking 445
seconds to taking 28 seconds on my box.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:04:19 -05:00
Josef Bacik
6d668dda0c Btrfs: make a delayed_block_rsv for the delayed item insertion
I've been hitting warnings in use_block_rsv when running the delayed insertion
stuff.  It's because we will readjust global block rsv based on what is in use,
which means we could end up discarding reservations that are for the delayed
insertion stuff.  So instead create a seperate block rsv for the delayed
insertion stuff.  This will also make it easier to debug problems with the
delayed insertion reservations since we will know that only the delayed
insertion code touches this block_rsv.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:04:18 -05:00
Chris Mason
af31f5e5b8 Btrfs: add a log of past tree roots
This takes some of the free space in the btrfs super block
to record information about most of the roots in the last four
commits.

It also adds a -o recovery to use the root history log when
we're not able to read the tree of tree roots, the extent
tree root, the device tree root or the csum root.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:04:15 -05:00
David Sterba
6c41761fc6 btrfs: separate superblock items out of fs_info
fs_info has now ~9kb, more than fits into one page. This will cause
mount failure when memory is too fragmented. Top space consumers are
super block structures super_copy and super_for_commit, ~2.8kb each.
Allocate them dynamically. fs_info will be ~3.5kb. (measured on x86_64)

Add a wrapper for freeing fs_info and all of it's dynamically allocated
members.

Signed-off-by: David Sterba <dsterba@suse.cz>
2011-11-06 03:04:01 -05:00
Josef Bacik
c8174313a8 Btrfs: use the global reserve when truncating the free space cache inode
We no longer use the orphan block rsv for holding the reservation for truncating
the inode, so instead use the global block rsv and check to make sure it has
enough space for us to truncate the space.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:03:50 -05:00
Josef Bacik
5a77d76c24 Btrfs: release metadata from global reserve if we have to fallback for unlink
I fixed a problem where we weren't reserving space for an orphan item when we
had to fallback to using the global reserve for an unlink, but I introduced
another problem.  I was migrating the bytes from the transaction reserve to the
global reserve and then releasing from the global reserve in
btrfs_end_transaction().  The problem with this is that a migrate will jack up
the size for the destination, but leave the size alone for the source, with the
idea that you can do a release normally on the source and it all washes out, and
then you can do a release again on the destination and it works out right.  My
way was skipping the release on the trans_block_rsv which still had the jacked
up size from our original reservation.  So instead release manually from the
global reserve if this transaction was using it, and then set the
trans->block_rsv back to the trans_block_rsv so that btrfs_end_transaction
cleans everything up properly.  With this patch xfstest 83 doesn't emit warnings
about leaking space.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:03:49 -05:00
Chris Mason
01d658f2ca Btrfs: make sure to flush queued bios if write_cache_pages waits
write_cache_pages tries to build up a large bio to stuff down the pipe.
But if it needs to wait for a page lock, it needs to make sure and send
down any pending writes so we don't deadlock with anyone who has the
page lock and is waiting for writeback of things inside the bio.

Dave Sterba triggered this as a deadlock between the autodefrag code and
the extent write_cache_pages

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:03:48 -05:00
Chris Mason
e688b7252f Btrfs: fix extent pinning bugs in the tree log
The tree log had two important bugs that could cause corruptions after a
crash.  Sometimes we were allowing tree log blocks to be reused after
the tree log was committed but before the transaction commit was done.

This allowed a future metadata write to overwrite the tree log data.  It
is fixed by adding a new variant of freeing reserved extents that always
pins them.  Credit goes to Stefan Behrens and Arne Jansen for many many
hours spent tracking this bug down.

During tree log replay, we do a pass through the tree log and pin all
the extents we find.  This makes sure the replay code won't go in and
use any of those blocks for new allocations during replay.  The problem
is the free space cache isn't honoring these pinned extents.  So the
allocator can end up handing them out, leading to all kinds of problems
during replay.

The fix here is to force any free space cache to load while we pin the
extents, and then to make sure we remove the pinned extents from the
free space rbtree.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Reported-by: Stefan Behrens <sbehrens@giantdisaster.de>
2011-11-06 03:03:48 -05:00