The CRIS SMP code cannot be built since there is no (and appears to
never have been) a CONFIG_SMP Kconfig option in arch/cris/. Remove it.
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Jesper Nilsson <jespern@axis.com>
INIT_THREAD enables interrupts in the thread_struct's saved flags. This
means that interrupts get enabled in the middle of context_switch()
while switching to new tasks that get forked off the init task during
boot. Don't do this.
Fixes the following splat on boot with spinlock debugging on:
BUG: spinlock cpu recursion on CPU#0, swapper/2
lock: runqueues+0x0/0x47c, .magic: dead4ead, .owner: swapper/0,
.owner_cpu: 0
CPU: 0 PID: 2 Comm: swapper Not tainted 3.19.0-08796-ga747b55 #285
Call Trace:
[<c0032b80>] spin_bug+0x2a/0x36
[<c0032c98>] do_raw_spin_lock+0xa2/0x126
[<c01964b0>] _raw_spin_lock+0x20/0x2a
[<c00286c8>] scheduler_tick+0x22/0x76
[<c003db2c>] update_process_times+0x5e/0x72
[<c0007a94>] timer_interrupt+0x4e/0x6a
[<c00378d6>] handle_irq_event_percpu+0x54/0xf2
[<c00379c4>] handle_irq_event+0x50/0x74
[<c003988e>] handle_simple_irq+0x6c/0xbe
[<c0037270>] generic_handle_irq+0x2a/0x36
[<c0004c40>] do_IRQ+0x38/0x84
[<c000662e>] crisv32_do_IRQ+0x54/0x60
[<c0006204>] IRQ0x4b_interrupt+0x34/0x3c
[<c0192baa>] __schedule+0x24a/0x532
[<c00056b4>] ret_from_kernel_thread+0x0/0x14
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Jesper Nilsson <jespern@axis.com>
Al Viro noted that CRIS fails to handle multiple signals.
This fixes the problem for CRISv32 by making it use a C work_pending
handling loop similar to the ARM implementation in 0a267fa6a15d41c
("ARM: 7472/1: pull all work_pending logics into C function").
This also happens to fixes the warnings which currently trigger on
CRISv32 due to do_signal() being called with interrupts disabled.
Test case (should die of the SIGSEGV which gets raised when setting up
the stack for SIGALRM, but instead reaches and executes the _exit(1)):
#include <unistd.h>
#include <signal.h>
#include <sys/time.h>
#include <err.h>
static void handler(int sig) { }
int main(int argc, char *argv[])
{
int ret;
struct itimerval t1 = { .it_value = {1} };
stack_t ss = {
.ss_sp = NULL,
.ss_size = SIGSTKSZ,
};
struct sigaction action = {
.sa_handler = handler,
.sa_flags = SA_ONSTACK,
};
ret = sigaltstack(&ss, NULL);
if (ret < 0)
err(1, "sigaltstack");
sigaction(SIGALRM, &action, NULL);
setitimer(ITIMER_REAL, &t1, NULL);
pause();
_exit(1);
return 0;
}
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Link: http://lkml.kernel.org/r/20121208074429.GC4939@ZenIV.linux.org.uk
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Jesper Nilsson <jespern@axis.com>
Al Viro noted that CRIS is vulnerable to bogus restarts on sigreturn.
The fixes CRISv32 by using regs->exs as an additional indicator to
whether we should attempt to restart the syscall or not. EXS is only
used in the sigtrap handling, and in that path we already have r9 (the
other indicator, which indicates if we're in a syscall or not) cleared.
Test case, a port of Al's ARM version from 653d48b22166db2d8 ("arm: fix
really nasty sigreturn bug"):
#include <unistd.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/time.h>
#include <errno.h>
void f(int n)
{
register int r10 asm ("r10") = n;
__asm__ __volatile__(
"ba 1f \n"
"nop \n"
"break 8 \n"
"1: ba . \n"
"nop \n"
:
: "r" (r10)
: "memory");
}
void handler1(int sig) { }
void handler2(int sig) { raise(1); }
void handler3(int sig) { exit(0); }
int main(int argc, char *argv[])
{
struct sigaction s = {.sa_handler = handler2};
struct itimerval t1 = { .it_value = {1} };
struct itimerval t2 = { .it_value = {2} };
signal(1, handler1);
sigemptyset(&s.sa_mask);
sigaddset(&s.sa_mask, 1);
sigaction(SIGALRM, &s, NULL);
signal(SIGVTALRM, handler3);
setitimer(ITIMER_REAL, &t1, NULL);
setitimer(ITIMER_VIRTUAL, &t2, NULL);
f(-513); /* -ERESTARTNOINTR */
return 0;
}
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Link: http://lkml.kernel.org/r/20121208074429.GC4939@ZenIV.linux.org.uk
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Jesper Nilsson <jespern@axis.com>
r9 is used to determine whether syscall restarting must be performed or
not. Unfortunately, r9 is never set to zero in the non-syscall path,
and r9 is on top of that a callee-saved register which can be set to
non-zero by the C functions that are called during IRQ handling.
This means that if r10 (used for the syscall return value) is one of the
-ERESTART* values when a hardware interrupt occurs which leads to a
signal being delivered to the process, the kernel will "restart" a
syscall which never occurred. This will lead to the PC being moved back
by 2 on return to user space.
Fix the problem by setting r9 to zero in the interrupt path.
Test case (should loop forever but ends up executing the break 8 trap
instruction):
#include <signal.h>
#include <stdlib.h>
#include <sys/time.h>
void f(int n)
{
register int r9 asm ("r9") = 1;
register int r10 asm ("r10") = n;
__asm__ __volatile__(
"ba 1f \n"
"nop \n"
"break 8 \n"
"1: ba . \n"
"nop \n"
:
: "r" (r9), "r" (r10)
: "memory");
}
void handler1(int sig) { }
int main(int argc, char *argv[])
{
struct itimerval t1 = { .it_value = {1} };
signal(SIGALRM, handler1);
setitimer(ITIMER_REAL, &t1, NULL);
f(-513); /* -ERESTARTNOINTR */
return 0;
}
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Jesper Nilsson <jespern@axis.com>
Add a minimal device tree for the ETRAX FS SoC and the Axis 88 developer
board.
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Jesper Nilsson <jespern@axis.com>
Add support for booting CRISv32 with a built-in device tree.
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Add support for IRQ domains to the CRISv32 interrupt controller.
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Enable GPIOLIB on CRIS so that we can use the generic GPIO APIs.
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
While working on arch/cris/include/asm/uaccess.h, I noticed
that some macros within this header are made harder to read because they
violate a coding style rule: space is missing after comma.
Fix it up.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
virtio wants to read bitwise types from userspace using get_user. At the
moment this triggers sparse errors, since the value is passed through an
integer.
Fix that up using __force.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
If an attacker can cause a controlled kernel stack overflow, overwriting
the restart block is a very juicy exploit target. This is because the
restart_block is held in the same memory allocation as the kernel stack.
Moving the restart block to struct task_struct prevents this exploit by
making the restart_block harder to locate.
Note that there are other fields in thread_info that are also easy
targets, at least on some architectures.
It's also a decent simplification, since the restart code is more or less
identical on all architectures.
[james.hogan@imgtec.com: metag: align thread_info::supervisor_stack]
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Miller <davem@davemloft.net>
Acked-by: Richard Weinberger <richard@nod.at>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
LKP has triggered a compiler warning after my recent patch "mm: account
pmd page tables to the process":
mm/mmap.c: In function 'exit_mmap':
>> mm/mmap.c:2857:2: warning: right shift count >= width of type [enabled by default]
The code:
> 2857 WARN_ON(mm_nr_pmds(mm) >
2858 round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT);
In this, on tile, we have FIRST_USER_ADDRESS defined as 0. round_up() has
the same type -- int. PUD_SHIFT.
I think the best way to fix it is to define FIRST_USER_ADDRESS as unsigned
long. On every arch for consistency.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
"you should SIGSEGV" error, because the SIGSEGV case was generally
handled by the caller - usually the architecture fault handler.
That results in lots of duplication - all the architecture fault
handlers end up doing very similar "look up vma, check permissions, do
retries etc" - but it generally works. However, there are cases where
the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.
In particular, when accessing the stack guard page, libsigsegv expects a
SIGSEGV. And it usually got one, because the stack growth is handled by
that duplicated architecture fault handler.
However, when the generic VM layer started propagating the error return
from the stack expansion in commit fee7e49d4514 ("mm: propagate error
from stack expansion even for guard page"), that now exposed the
existing VM_FAULT_SIGBUS result to user space. And user space really
expected SIGSEGV, not SIGBUS.
To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
duplicate architecture fault handlers about it. They all already have
the code to handle SIGSEGV, so it's about just tying that new return
value to the existing code, but it's all a bit annoying.
This is the mindless minimal patch to do this. A more extensive patch
would be to try to gather up the mostly shared fault handling logic into
one generic helper routine, and long-term we really should do that
cleanup.
Just from this patch, you can generally see that most architectures just
copied (directly or indirectly) the old x86 way of doing things, but in
the meantime that original x86 model has been improved to hold the VM
semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
"newer" things, so it would be a good idea to bring all those
improvements to the generic case and teach other architectures about
them too.
Reported-and-tested-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Jan Engelhardt <jengelh@inai.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots"
Cc: linux-arch@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The time Kconfig expects that NR_CPUS is defined.
This patch removes this config warning:
"kernel/time/Kconfig:163:warning: range is invalid"
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Avoids the warning about:
warning: 'bite_in_progress' defined but not used [-Wunused-variable]
Variable is only used if the Kconfig CONFIG_ETRAX_WATCHDOG_NICE_DOGGY
is set.
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Move declaration of waitqueue to beginning of block,
avoids warning about mixing declarations and code.
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Allows that symbol to be used in modules, and fixes
the following on allmodconfig:
ERROR: "csum_partial_copy_nocheck" [net/ipv6/ipv6.ko] undefined!
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Pull vfs fixes from Al Viro:
"A couple of fixes - deadlock in CIFS and build breakage in cris serial
driver (resurfaced f_dentry in there)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
VFS: Convert file->f_dentry->d_inode to file_inode()
fix deadlock in cifs_ioctl_clone()
Convert file->f_dentry->d_inode to file_inode() so as to get layered
filesystems right.
Found with: git grep '[.>]f_dentry'
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Nothing needs the module pointer any more, and the next patch will
call it from RCU, where the module itself might no longer exist.
Removing the arg is the safest approach.
This just codifies the use of the module_alloc/module_free pattern
which ftrace and bpf use.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: x86@kernel.org
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: linux-cris-kernel@axis.com
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: nios2-dev@lists.rocketboards.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Cc: netdev@vger.kernel.org
Move pinmux alloc/dealloc code into functions that don't take
the spinlock so we can use from code that has the spinlock already.
CRISv32 has no working SMP, so spinlocks becomes a NOP,
so deadlock was never seen.
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
- Add free_initrd_mem as found by Guenter Roeck <linux@roeck-us.net>
- Add free_init_pages
- Export empty_zero_page symbol
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Don't enter watchdog handling if we're already in watchdog handling.
Also some minor formatting tweaks.
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
strcmp was lost when all other string functions were removed,
but we still have an optimized version for this on CRISv32,
so any driver built as a module would not have access to this symbol.
In a similar manner, we had optimized versions of
csum_partial_copy_from_user and __do_clear_user
but no exported symbols for them, breaking bunch of other drivers
when built as a module.
At the same time, move EXPORT_SYMBOL(__copy_user) and
EXPORT_SYMBOL(__copy_user_zeroing) C-files so it's
located together with the function definition.
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Fix headers_install by adjusting the path to arch files.
And delete unused Kbuild file.
Drop special handling of cris in the headers.sh script
as a nice side-effect.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Fixes the following compile error.
arch/cris/arch-v32/kernel/time.c: In function 'reset_watchdog':
arch/cris/arch-v32/kernel/time.c:121:2:
error: implicit declaration of function 'global_page_state'
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Pull networking updates from David Miller:
1) New offloading infrastructure and example 'rocker' driver for
offloading of switching and routing to hardware.
This work was done by a large group of dedicated individuals, not
limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend,
Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu
2) Start making the networking operate on IOV iterators instead of
modifying iov objects in-situ during transfers. Thanks to Al Viro
and Herbert Xu.
3) A set of new netlink interfaces for the TIPC stack, from Richard
Alpe.
4) Remove unnecessary looping during ipv6 routing lookups, from Martin
KaFai Lau.
5) Add PAUSE frame generation support to gianfar driver, from Matei
Pavaluca.
6) Allow for larger reordering levels in TCP, which are easily
achievable in the real world right now, from Eric Dumazet.
7) Add a variable of napi_schedule that doesn't need to disable cpu
interrupts, from Eric Dumazet.
8) Use a doubly linked list to optimize neigh_parms_release(), from
Nicolas Dichtel.
9) Various enhancements to the kernel BPF verifier, and allow eBPF
programs to actually be attached to sockets. From Alexei
Starovoitov.
10) Support TSO/LSO in sunvnet driver, from David L Stevens.
11) Allow controlling ECN usage via routing metrics, from Florian
Westphal.
12) Remote checksum offload, from Tom Herbert.
13) Add split-header receive, BQL, and xmit_more support to amd-xgbe
driver, from Thomas Lendacky.
14) Add MPLS support to openvswitch, from Simon Horman.
15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen
Klassert.
16) Do gro flushes on a per-device basis using a timer, from Eric
Dumazet. This tries to resolve the conflicting goals between the
desired handling of bulk vs. RPC-like traffic.
17) Allow userspace to ask for the CPU upon what a packet was
received/steered, via SO_INCOMING_CPU. From Eric Dumazet.
18) Limit GSO packets to half the current congestion window, from Eric
Dumazet.
19) Add a generic helper so that all drivers set their RSS keys in a
consistent way, from Eric Dumazet.
20) Add xmit_more support to enic driver, from Govindarajulu
Varadarajan.
21) Add VLAN packet scheduler action, from Jiri Pirko.
22) Support configurable RSS hash functions via ethtool, from Eyal
Perry.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits)
Fix race condition between vxlan_sock_add and vxlan_sock_release
net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
net/mlx4: Add support for A0 steering
net/mlx4: Refactor QUERY_PORT
net/mlx4_core: Add explicit error message when rule doesn't meet configuration
net/mlx4: Add A0 hybrid steering
net/mlx4: Add mlx4_bitmap zone allocator
net/mlx4: Add a check if there are too many reserved QPs
net/mlx4: Change QP allocation scheme
net/mlx4_core: Use tasklet for user-space CQ completion events
net/mlx4_core: Mask out host side virtualization features for guests
net/mlx4_en: Set csum level for encapsulated packets
be2net: Export tunnel offloads only when a VxLAN tunnel is created
gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
net: fec: only enable mdio interrupt before phy device link up
net: fec: clear all interrupt events to support i.MX6SX
net: fec: reset fep link status in suspend function
net: sock: fix access via invalid file descriptor
net: introduce helper macro for_each_cmsghdr
...
As there are now no remaining users of arch_fast_hash(), lets kill
it entirely.
This basically reverts commit 71ae8aac3e19 ("lib: introduce arch
optimized hash library") and follow-up work, that is f.e., commit
237217546d44 ("lib: hash: follow-up fixups for arch hash"),
commit e3fec2f74f7f ("lib: Add missing arch generic-y entries for
asm-generic/hash.h") and last but not least commit 6a02652df511
("perf tools: Fix include for non x86 architectures").
Cc: Francesco Fusco <fusco@ntop.org>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
introduce new setsockopt() command:
setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd))
where prog_fd was received from syscall bpf(BPF_PROG_LOAD, attr, ...)
and attr->prog_type == BPF_PROG_TYPE_SOCKET_FILTER
setsockopt() calls bpf_prog_get() which increments refcnt of the program,
so it doesn't get unloaded while socket is using the program.
The same eBPF program can be attached to multiple sockets.
User task exit automatically closes socket which calls sk_filter_uncharge()
which decrements refcnt of eBPF program
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alternative to RPS/RFS is to use hardware support for multiple
queues.
Then split a set of million of sockets into worker threads, each
one using epoll() to manage events on its own socket pool.
Ideally, we want one thread per RX/TX queue/cpu, but we have no way to
know after accept() or connect() on which queue/cpu a socket is managed.
We normally use one cpu per RX queue (IRQ smp_affinity being properly
set), so remembering on socket structure which cpu delivered last packet
is enough to solve the problem.
After accept(), connect(), or even file descriptor passing around
processes, applications can use :
int cpu;
socklen_t len = sizeof(cpu);
getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len);
And use this information to put the socket into the right silo
for optimal performance, as all networking stack should run
on the appropriate cpu, without need to send IPI (RPS/RFS).
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
write{b,w,l}_relaxed are implemented by some architectures in order to
permit memory-mapped I/O accesses with weaker barrier semantics than the
non-relaxed variants.
This patch adds dummy macros for the write accessors to Cris, in the same
vein as the dummy definitions for the relaxed read accessors.
Cc: Mikael Starvik <starvik@axis.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle were:
- Optimized support for Intel "Cluster-on-Die" (CoD) topologies (Dave
Hansen)
- Various sched/idle refinements for better idle handling (Nicolas
Pitre, Daniel Lezcano, Chuansheng Liu, Vincent Guittot)
- sched/numa updates and optimizations (Rik van Riel)
- sysbench speedup (Vincent Guittot)
- capacity calculation cleanups/refactoring (Vincent Guittot)
- Various cleanups to thread group iteration (Oleg Nesterov)
- Double-rq-lock removal optimization and various refactorings
(Kirill Tkhai)
- various sched/deadline fixes
... and lots of other changes"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
sched/dl: Use dl_bw_of() under rcu_read_lock_sched()
sched/fair: Delete resched_cpu() from idle_balance()
sched, time: Fix build error with 64 bit cputime_t on 32 bit systems
sched: Improve sysbench performance by fixing spurious active migration
sched/x86: Fix up typo in topology detection
x86, sched: Add new topology for multi-NUMA-node CPUs
sched/rt: Use resched_curr() in task_tick_rt()
sched: Use rq->rd in sched_setaffinity() under RCU read lock
sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask'
sched: Use dl_bw_of() under RCU read lock
sched/fair: Remove duplicate code from can_migrate_task()
sched, mips, ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW
sched: print_rq(): Don't use tasklist_lock
sched: normalize_rt_tasks(): Don't use _irqsave for tasklist_lock, use task_rq_lock()
sched: Fix the task-group check in tg_has_rt_tasks()
sched/fair: Leverage the idle state info when choosing the "idlest" cpu
sched: Let the scheduler see CPU idle states
sched/deadline: Fix inter- exclusive cpusets migrations
sched/deadline: Clear dl_entity params when setscheduling to different class
sched/numa: Kill the wrong/dead TASK_DEAD check in task_numa_fault()
...
Pull arch atomic cleanups from Ingo Molnar:
"This is a series kept separate from the main locking tree, which
cleans up and improves various details in the atomics type handling:
- Remove the unused atomic_or_long() method
- Consolidate and compress atomic ops implementations between
architectures, to reduce linecount and to make it easier to add new
ops.
- Rewrite generic atomic support to only require cmpxchg() from an
architecture - generate all other methods from that"
* 'locking-arch-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
locking,arch: Use ACCESS_ONCE() instead of cast to volatile in atomic_read()
locking, mips: Fix atomics
locking, sparc64: Fix atomics
locking,arch: Rewrite generic atomic support
locking,arch,xtensa: Fold atomic_ops
locking,arch,sparc: Fold atomic_ops
locking,arch,sh: Fold atomic_ops
locking,arch,powerpc: Fold atomic_ops
locking,arch,parisc: Fold atomic_ops
locking,arch,mn10300: Fold atomic_ops
locking,arch,mips: Fold atomic_ops
locking,arch,metag: Fold atomic_ops
locking,arch,m68k: Fold atomic_ops
locking,arch,m32r: Fold atomic_ops
locking,arch,ia64: Fold atomic_ops
locking,arch,hexagon: Fold atomic_ops
locking,arch,cris: Fold atomic_ops
locking,arch,avr32: Fold atomic_ops
locking,arch,arm64: Fold atomic_ops
locking,arch,arm: Fold atomic_ops
...
Use the much more reader friendly ACCESS_ONCE() instead of the cast to volatile.
This is purely a stylistic change.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Link: http://lkml.kernel.org/r/1411482607-20948-1-git-send-email-bobby.prani@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The nohz full code needs irq work to trigger its own interrupt so that
the subsystem can work even when the tick is stopped.
Lets introduce arch_irq_work_has_interrupt() that archs can override to
tell about their support for this ability.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>