334346 Commits

Author SHA1 Message Date
Vladimir Murzin
9b76beb071 openrisc: Make cpu_relax() invoke barrier()
Make cpu_relax() invoke barrier() to be the same as other arches.

Signed-off-by: Vladimir Murzin <murzin.v@gmail.com>
Signed-off-by: Jonas Bonn <jonas@southpole.se>
2012-10-11 11:27:19 +02:00
Ralf Baechle
35bafbee4b UAPI Disintegration 2012-10-09
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAUHPmWhOxKuMESys7AQLOPQ//asL1dnbW+99GyGVjtvfQKUMbTF23I4u5
 2VkYAZl3cNSFMQTEY8BYEPOaQM/NhRXrEgExnsyNukbzVtV9lFq91VvesRYOhPEc
 94p6Q/TLj+y/rh/5N5Yz+5HyyPfTnzmaY5kL0uY9guHgZKtCMOf/15i6sPaXzZqZ
 7Qn1BRROZ71a8wlvP/3OyuiKDFyYtcE++nUQKNA8Msn87Xuu3tR6Zw7apJdTWuWi
 5oc/R0AzKXX5UiYShF38lVBpyrQlW2jshZPhKjboyE3oCClo+q0z+36gQ9Wpowkl
 ITgxKrJiEGM22RC0gr/hsZGhIxuO2cPvGlX+fwbMkQzTRXn2q6c3gZsx1w0Jfm4Z
 vbKMdNojP1bIflrMC47dGvKspwoFBut2dFRpeVShcRMjnYqx8YaBWM3HL0eKC5O0
 1nXVs7/4vwiIcqWjXzPNak+O/Z8egCvzY2nBwTD7aU6pufGZvTEsDnzi8jos6apQ
 /FxZGHGOVcpkbWdMOKyJXc7mkdm2zKfMsDbIYY9wxRDBSVM4Eq030lOqp+/jvqkc
 d5NT97n87BMOIOFW4noBqJW54DnvXhD3Oi9e+TmpYNIk78F0ewg3ac28Xy4aGil5
 xH/id7Nm7BWbXlnhHXFyw5WaeVmXhBo4UpRyxeLD1wmusrNAsvjrhpqOCqpbU1IT
 IelBSr0Qx5A=
 =XolC
 -----END PGP SIGNATURE-----

Merge tag 'disintegrate-mips-20121009' of git://git.infradead.org/users/dhowells/linux-headers into mips-for-linux-next

UAPI Disintegration 2012-10-09

Patchwork: https://patchwork.linux-mips.org/patch/4414/
2012-10-11 11:15:03 +02:00
Thomas Bogendoerfer
49a94e9482 MIPS: SNI: Switch RM400 serial to SCCNXP driver
The new SCCNXP driver supports the SC2681 chips used in RM400 machines.
We now use the new driver instead of the old SC26xx driver.

Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/4417/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:14:13 +02:00
Ralf Baechle
fd9e8392c3 MIPS: Remove unused empty_bad_pmd_table[] declaration.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:14:12 +02:00
Ralf Baechle
2551aebc67 MIPS: MT: Remove kspd.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:14:12 +02:00
Ralf Baechle
2eaaac508a MIPS: Malta: Fix section mismatch.
LD      arch/mips/pci/built-in.o
WARNING: arch/mips/pci/built-in.o(.devinit.text+0x2a0): Section mismatch in reference from the function malta_piix_func0_fixup() to the variable .init.data:pci_irq
The function __devinit malta_piix_func0_fixup() references
a variable __initdata pci_irq.
If pci_irq is only used by malta_piix_func0_fixup then
annotate pci_irq with a matching annotation.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:14:12 +02:00
Ralf Baechle
3efd5a0db5 MIPS: asm-offset.c: Delete unused irq_cpustat_t struct offsets.
Originally added in 05b541489c48e7fbeec19a92acf8683230750d0a [Merge with
Linux 2.5.5.] over 10 years ago but never been used.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:11:20 +02:00
Manuel Lauss
851d4f5d38 MIPS: Alchemy: Merge PB1100/1500 support into DB1000 code.
The PB1100/1500 are similar to their DB-cousins but with a few
more devices on the bus.

This patch adds PB1100/1500 support to the existing DB1100/1500
code.

Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Cc: lnux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/4338/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:11:20 +02:00
Manuel Lauss
24e8c1a611 MIPS: Alchemy: merge PB1550 support into DB1550 code
The PB1550 is more or less a DB1550 without the PCI IDE controller,
a more complicated (read: configurable) Flash setup and some other
minor changes.  Like the DB1550 it can be automatically detected by
reading the CPLD ID register bits.

This patch adds PB1550 detection and setup to the DB1550 code.

Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/4337/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:11:20 +02:00
Manuel Lauss
bd8510df88 MIPS: Alchemy: Single kernel for DB1200/1300/1550
Combine support for the DB1200/PB1200, DB1300 and DB1550 boards into
a single kernel image.

defconfig-generated image verified on DB1200, DB1300 and DB1550.

Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/4335/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:11:20 +02:00
David Daney
748e787eb6 MIPS: Optimize TLB refill for RI/XI configurations.
We don't have to do a separate shift to eliminate the software bits,
just rotate them into the fill and they will be ignored.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/4294/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:11:20 +02:00
Ralf Baechle
981ef0de49 MIPS: proc: Cleanup printing of ASEs.
The number of %s was just getting ridiculous.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:10:43 +02:00
Ralf Baechle
475032564e MIPS: Hardwire detection of DSP ASE Rev 2 for systems, as required.
Most supported systems currently hardwire cpu_has_dsp to 0, so we also
can disable support for cpu_has_dsp2 resulting in a slightly smaller
kernel.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:10:43 +02:00
Steven J. Hill
ee80f7c73d MIPS: Add detection of DSP ASE Revision 2.
[ralf@linux-mips.org: This patch really only detects the ASE and passes its
existence on to userland via /proc/cpuinfo.  The DSP ASE Rev 2. adds new
resources but no resources that would need management by the kernel.]

Signed-off-by: Steven J. Hill <sjhill@mips.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/4165/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:05:03 +02:00
David Daney
f59a2d22a0 MIPS: Optimize pgd_init and pmd_init
On a dual issue processor GCC generates code that saves a couple of
clock cycles per loop if we rearrange things slightly.  Checking for
p != end saves a SLTU per loop, moving the increment to the middle can
let it dual issue on multi-issue processors.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/4249/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:04:35 +02:00
Al Cooper
a7911a8fd1 MIPS: perf: Add perf functionality for BMIPS5000
Add hardware performance counter support to kernel "perf" code for
BMIPS5000. The BMIPS5000 performance counters are similar to MIPS
MTI cores, so the changes were mostly made in perf_event_mipsxx.c
which is typically for MTI cores.

Signed-off-by: Al Cooper <alcooperx@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/4109/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:04:34 +02:00
Al Cooper
399aaa2568 MIPS: perf: Split the Kconfig option CONFIG_MIPS_MT_SMP
Split the Kconfig option CONFIG_MIPS_MT_SMP into CONFIG_MIPS_MT_SMP
and CONFIG_MIPS_PERF_SHARED_TC_COUNTERS so some of the code used
for performance counters that are shared between threads can be used
for MIPS cores that are not MT_SMP.

Signed-off-by: Al Cooper <alcooperx@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/4108/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:04:34 +02:00
Al Cooper
ecb8ee8a89 MIPS: perf: Remove unnecessary #ifdef
The #ifdef for CONFIG_HW_PERF_EVENTS is not needed because the
Makefile will only compile the module if this config option is set.
This means that the code under #else would never be compiled. This
may have been done to leave the original broken code around for
reference, but the FIXME comment above the code already shows the
broken code.

Signed-off-by: Al Cooper <alcooperx@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/4107/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:04:34 +02:00
Al Cooper
da4b62cd67 MIPS: perf: Add cpu feature bit for PCI (performance counter interrupt)
The PCI (Program Counter Interrupt) bit in the "cause" register
is mandatory for MIPS32R2 cores, but has also been added to some R1
cores (BMIPS5000). This change adds a cpu feature bit to make it
easier to check for and use this feature.

Signed-off-by: Al Cooper <alcooperx@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/4106/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:04:34 +02:00
Al Cooper
c5600b2dd9 MIPS: perf: Change the "mips_perf_event" table unsupported indicator.
Change the indicator from 0xffffffff in the "event_id" member to
zero in the "cntr_mask" member. This removes the need to initialize
entries that are unsupported. This also solves a problem where the
number of entries in the table was increased based on a globel enum
used for all platforms, but the new unsupported entries were not added
for mips. This was leaving new table entries of all zeros that we not
marked UNSUPPORTED.

Signed-off-by: Al Cooper <alcooperx@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/4110/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:02:41 +02:00
David Daney
485172b3df MIPS: Align swapper_pg_dir to 64K for better TLB Refill code.
We can save an instruction in the TLB Refill path for kernel mappings
by aligning swapper_pg_dir on a 64K boundary.  The address of
swapper_pg_dir can be generated with a single LUI instead of
LUI/{D}ADDUI.

The alignment of __init_end is bumped up to 64K so there are no holes
between it and swapper_pg_dir, which is placed at the very beginning
of .bss.

The alignment of invalid_pmd_table and invalid_pte_table can be
relaxed to PAGE_SIZE.  We do this by using __page_aligned_bss, which
has the added benefit of eliminating alignment holes in .bss.

Signed-off-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Cc: linux-arch@vger.kernel.org,
Cc: linux-kernel@vger.kernel.org
Acked-by: Arnd Bergmann <arnd@arndb.de>
Patchwork: https://patchwork.linux-mips.org/patch/4220/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:02:40 +02:00
David Daney
c87728ca82 vmlinux.lds.h: Allow architectures to add sections to the front of .bss
Follow-on MIPS patch will put an object here that needs 64K alignment
to minimize padding.

For those architectures that don't define BSS_FIRST_SECTIONS, there is
no change.

Signed-off-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Cc: linux-arch@vger.kernel.org,
Cc: linux-kernel@vger.kernel.org
Acked-by: Arnd Bergmann <arnd@arndb.de>
Patchwork: https://patchwork.linux-mips.org/patch/4221/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:02:37 +02:00
Joshua Kinard
b4f2a17ba9 Improve atomic.h robustness
I've maintained this patch, originally from Thiemo Seufer in 2004, for a
really long time, but I think it's time for it to get a look at for
possible inclusion.  I have had no problems with it across various SGI
systems over the years.

To quote the post here:
http://www.linux-mips.org/archives/linux-mips/2004-12/msg00000.html

"the atomic functions use so far memory references for the inline
assembler to access the semaphore. This can lead to additional
instructions in the ll/sc loop, because newer compilers don't
expand the memory reference any more but leave it to the assembler.

The appended patch uses registers instead, and makes the ll/sc
arguments more explicit. In some cases it will lead also to better
register scheduling because the register isn't bound to an output
any more."

Signed-off-by: Joshua Kinard <kumba@gentoo.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/4029/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:02:36 +02:00
Dmitry Torokhov
0cc8d6a9d2 Merge branch 'next' into for-linus
Prepare second set of updates for 3.7 merge window (Wacom driver update
and patches extending number of input minors).
2012-10-11 00:45:21 -07:00
NeilBrown
72f36d5972 md: refine reporting of resync/reshape delays.
If 'resync_max' is set to 0 (as is often done when starting a
reshape, so the mdadm can remain in control during a sensitive
period), and if the reshape request is initially delayed because
another array using the same array is resyncing or reshaping etc,
when user-space cannot easily tell when the delay changes from being
due to a conflicting reshape, to being due to resync_max = 0.

So introduce a new state: (curr_resync == 3) to reflect this, make
sure it is visible both via /proc/mdstat and via the "sync_completed"
sysfs attribute, and ensure that the event transition from one delay
state to the other is properly notified.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 14:25:57 +11:00
NeilBrown
e56108d65f md/raid5: be careful not to resize_stripes too big.
When a RAID5 is reshaping, conf->raid_disks is increased
before mddev->delta_disks becomes zero.
This can result in check_reshape calling resize_stripes with a
number that is too large.  This particularly happens
when md_check_recovery calls ->check_reshape().

If we use ->previous_raid_disks, we don't risk this.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 14:24:13 +11:00
NeilBrown
db07d85ef6 md: make sure manual changes to recovery checkpoint are saved.
If you make an array bigger but suppress resync of the new region with
  mdadm --grow /dev/mdX --size=max --assume-clean

then stop the array before anything is written to it, the effect of
the "--assume-clean" is lost and the array will resync the new space
when restarted.
So ensure that we update the metadata in the case.

Reported-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 14:22:17 +11:00
Dan Carpenter
91502f099d md/raid10: use correct limit variable
Clang complains that we are assigning a variable to itself.  This should
be using bad_sectors like the similar earlier check does.

Bug has been present since 3.1-rc1.  It is minor but could
conceivably cause corruption or other bad behaviour.

Cc: stable@vger.kernel.org
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 14:20:58 +11:00
NeilBrown
48c26ddc9f md: writing to sync_action should clear the read-auto state.
In some cases array are started in 'read-auto' state where in
nothing gets written to any device until the array is written
to.  The purpose of this is to make accidental auto-assembly
of the wrong arrays less of a risk, and to allow arrays to be
started to read suspend-to-disk images without actually changing
anything (as might happen if the array were dirty and a
resync seemed necessary).

Explicitly writing the 'sync_action' for a read-auto array currently
doesn't clear the read-auto state, so the sync action doesn't
happen, which can be confusing.

So allow any successful write to sync_action to clear any read-auto
state.

Reported-by: Alexander Kühn <alexander.kuehn@nagilum.de>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 14:19:39 +11:00
Jianpeng Ma
7f7583d420 Subject: [PATCH] md:change resync_mismatches to atomic64_t to avoid races
Now that multiple threads can handle stripes, it is safer to
use an atomic64_t for resync_mismatches, to avoid update races.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 14:17:59 +11:00
Hiroaki SHIMODA
8edc0e624d e1000e: Change wthresh to 1 to avoid possible Tx stalls
This patch originated from Hiroaki SHIMODA but has been modified
by Intel with some minor cleanups and additional commit log text.

Denys Fedoryshchenko and others reported Tx stalls on e1000e with
BQL enabled.  Issue was root caused to hardware delays. They were
introduced because some of the e1000e hardware with transmit
writeback bursting enabled, waits until the driver does an
explict flush OR there are WTHRESH descriptors to write back.

Sometimes the delays in question were on the order of seconds,
causing visible lag for ssh sessions and unacceptable tx
completion latency, especially for BQL enabled kernels.

To avoid possible Tx stalls, change WTHRESH back to 1.

The current plan is to investigate a method for re-enabling
WTHRESH while not harming BQL, but those patches will be later
for net-next if they work.

please enqueue for stable since v3.3 as this bug was introduced in
commit 3f0cfa3bc11e7f00c9994e0f469cbc0e7da7b00c
Author: Tom Herbert <therbert@google.com>
Date:   Mon Nov 28 16:33:16 2011 +0000

    e1000e: Support for byte queue limits

    Changes to e1000e to use byte queue limits.

Reported-by: Denys Fedoryshchenko <denys@visp.net.lb>
Tested-by: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
CC: eric.dumazet@gmail.com
CC: therbert@google.com
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:59:18 -04:00
David S. Miller
959859d2fe Merge branch 'uapi-for-3.7' of git://gitorious.org/linux-can/linux-can
Marc Kleine-Budde says:

====================
this pull request for net, i.e. the v3.7 release cycle, contains the patch by
David Howells to move the UAPI related headers for the CAN subsystem.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:57:16 -04:00
stephen hemminger
68aaed54e7 ipv4: fix route mark sparse warning
Sparse complains about RTA_MARK which is should be host order according
to include file and usage in iproute.

net/ipv4/route.c:2223:46: warning: incorrect type in argument 3 (different base types)
net/ipv4/route.c:2223:46:    expected restricted __be32 [usertype] value
net/ipv4/route.c:2223:46:    got unsigned int [unsigned] [usertype] flowic_mark

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:54:59 -04:00
Ian Campbell
6a8ed462f1 xen: netback: handle compound page fragments on transmit.
An SKB paged fragment can consist of a compound page with order > 0.
However the netchannel protocol deals only in PAGE_SIZE frames.

Handle this in netbk_gop_frag_copy and xen_netbk_count_skb_slots by
iterating over the frames which make up the page.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:50:45 -04:00
Sarveshwar Bandi
6caab7b054 bridge: Pull ip header into skb->data before looking into ip header.
If lower layer driver leaves the ip header in the skb fragment, it needs to
be first pulled into skb->data before inspecting ip header length or ip version
number.

Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:50:45 -04:00
Dan Carpenter
435f08a721 isdn: fix a wrapping bug in isdn_ppp_ioctl()
"protos" is an array of unsigned longs and "i" is the number of bits in
an unsigned long so we need to use 1UL as well to prevent the shift
from wrapping around.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:50:45 -04:00
NeilBrown
1ed850f356 md/raid5: make sure to_read and to_write never go negative.
to_read and to_write are part of the result of analysing
a stripe before handling it.
Their use is to avoid some loops and tests if the values are
known to be zero.  Thus it is not a problem if they are a
little bit larger than they should be.

So decrementing them in handle_failed_stripe serves little value, and
due to races it could cause some loops to be skipped incorrectly.

So remove those decrements.

Reported-by: "Jianpeng Ma" <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 13:50:13 +11:00
Alexander Lyakas
a7854487cd md: When RAID5 is dirty, force reconstruct-write instead of read-modify-write.
Signed-off-by: Alex Lyakas <alex@zadarastorage.com>
Suggested-by: Yair Hershko <yair@zadarastorage.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 13:50:12 +11:00
NeilBrown
b97390aec4 md/raid5: protect debug message against NULL derefernce.
The pr_debug in add_stripe_bio could race with something
changing *bip, so it is best to hold the lock until
after the pr_debug.

Reported-by: "Jianpeng Ma" <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 13:50:12 +11:00
NeilBrown
143c4d0573 md/raid5: add some missing locking in handle_failed_stripe.
We really should hold the stripe_lock while accessing
'toread' else we could race with add_stripe_bio and corrupt
a list.

Reported-by: "Jianpeng Ma" <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 13:50:12 +11:00
Shaohua Li
9e44476851 MD: raid5 avoid unnecessary zero page for trim
We want to avoid zero discarded dev page, because it's useless for discard.
But if we don't zero it, another read/write hit such page in the cache and will
get inconsistent data.

To avoid zero the page, we don't set R5_UPTODATE flag after construction is
done. In this way, discard write request is still issued and finished, but read
will not hit the page. If the stripe gets accessed soon, we need reread the
stripe, but since the chance is low, the reread isn't a big deal.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 13:49:49 +11:00
Shaohua Li
620125f2bf MD: raid5 trim support
Discard for raid4/5/6 has limitation. If discard request size is
small, we do discard for one disk, but we need calculate parity and
write parity disk.  To correctly calculate parity, zero_after_discard
must be guaranteed. Even it's true, we need do discard for one disk
but write another disks, which makes the parity disks wear out
fast. This doesn't make sense. So an efficient discard for raid4/5/6
should discard all data disks and parity disks, which requires the
write pattern to be (A, A+chunk_size, A+chunk_size*2...). If A's size
is smaller than chunk_size, such pattern is almost impossible in
practice. So in this patch, I only handle the case that A's size
equals to chunk_size. That is discard request should be aligned to
stripe size and its size is multiple of stripe size.

Since we can only handle request with specific alignment and size (or
part of the request fitting stripes), we can't guarantee
zero_after_discard even zero_after_discard is true in low level
drives.

The block layer doesn't send down correctly aligned requests even
correct discard alignment is set, so I must filter out.

For raid4/5/6 parity calculation, if data is 0, parity is 0. So if
zero_after_discard is true for all disks, data is consistent after
discard.  Otherwise, data might be lost. Let's consider a scenario:
discard a stripe, write data to one disk and write parity disk. The
stripe could be still inconsistent till then depending on using data
from other data disks or parity disks to calculate new parity. If the
disk is broken, we can't restore it. So in this patch, we only enable
discard support if all disks have zero_after_discard.

If discard fails in one disk, we face the similar inconsistent issue
above. The patch will make discard follow the same path as normal
write request. If discard fails, a resync will be scheduled to make
the data consistent. This isn't good to have extra writes, but data
consistency is important.

If a subsequent read/write request hits raid5 cache of a discarded
stripe, the discarded dev page should have zero filled, so the data is
consistent. This patch will always zero dev page for discarded request
stripe. This isn't optimal because discard request doesn't need such
payload. Next patch will avoid it.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 13:49:05 +11:00
Jianpeng Ma
582e2e056a md/bitmap:Don't use IS_ERR to judge alloc_page().
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 13:45:36 +11:00
NeilBrown
7ad4d4a68a md/raid1: Don't release reference to device while handling read error.
When we get a read error, we arrange for raid1d to handle it.
Currently we release the reference on the device.  This can result
in
   conf->mirrors[read_disk].rdev
being NULL in fix_read_error, if the device happens to get removed
before the read error is handled.

So instead keep the reference until the read error has been fully
handled.

Reported-by: hank <pyu@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 13:44:30 +11:00
Michael Wang
fd177481b4 raid: replace list_for_each_continue_rcu with new interface
This patch replaces list_for_each_continue_rcu() with
list_for_each_entry_continue_rcu() to save a few lines
of code and allow removing list_for_each_continue_rcu().

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 13:43:21 +11:00
Jan Beulich
af7cf25dd1 add further __init annotations to crypto/xor.c
Allow particularly do_xor_speed() to be discarded post-init.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 13:42:32 +11:00
Jonathan Brassow
761becff01 DM RAID: Fix for "sync" directive ineffectiveness
There are two table arguments that can be given to a DM RAID target
that control whether the array is forced to (re)synchronize or skip
initialization: "sync" and "nosync".  When "sync" is given, we set
mddev->recovery_cp to 0 in order to cause the device to resynchronize.
This is insufficient if there is a bitmap in use, because the array
will simply look at the bitmap and see that there is no recovery
necessary.

The fix is to skip over the loading of the superblocks when "sync" is
given, causing new superblocks to be written that will force the array
to go through initialization (i.e. synchronization).

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 13:42:19 +11:00
stephen hemminger
34e02aa1fb vxlan: fix oops when give unknown ifindex
If vxlan is created and the ifindex is passed; there are two cases which
are incorrectly handled by the existing code. The ifindex could be zero
(i.e. no device) or there could be no device with that ifindex.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:41:22 -04:00
stephen hemminger
d97c00a321 vxlan: fix receive checksum handling
Vxlan was trying to use postpull_rcsum to allow receive checksum
offload to work on drivers using CHECKSUM_COMPLETE method. But this
doesn't work correctly. Just force full receive checksum on received
packet.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:41:22 -04:00
stephen hemminger
2840bf2286 vxlan: add additional headroom
Tell upper layer protocols to allocate skb with additional headroom.
This avoids allocation and copy in local packet sends.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:41:22 -04:00