linux/arch/arc/Kconfig

567 lines
15 KiB
Plaintext
Raw Normal View History

# SPDX-License-Identifier: GPL-2.0-only
#
# Copyright (C) 2004, 2007-2010, 2011-2012 Synopsys, Inc. (www.synopsys.com)
#
config ARC
def_bool y
select ARC_TIMERS
mm: generalize ARCH_HAS_CACHE_LINE_SIZE Patch series "mm: some config cleanups", v2. This series contains config cleanup patches which reduces code duplication across platforms and also improves maintainability. There is no functional change intended with this series. This patch (of 6): ARCH_HAS_CACHE_LINE_SIZE config has duplicate definitions on platforms that subscribe it. Instead, just make it a generic option which can be selected on applicable platforms. This change reduces code duplication and makes it cleaner. Link: https://lkml.kernel.org/r/1617259448-22529-1-git-send-email-anshuman.khandual@arm.com Link: https://lkml.kernel.org/r/1617259448-22529-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc] Cc: Will Deacon <will@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmerdabbelt@google.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-05 01:38:09 +00:00
select ARCH_HAS_CACHE_LINE_SIZE
mm/debug: add tests validating architecture page table helpers This adds tests which will validate architecture page table helpers and other accessors in their compliance with expected generic MM semantics. This will help various architectures in validating changes to existing page table helpers or addition of new ones. This test covers basic page table entry transformations including but not limited to old, young, dirty, clean, write, write protect etc at various level along with populating intermediate entries with next page table page and validating them. Test page table pages are allocated from system memory with required size and alignments. The mapped pfns at page table levels are derived from a real pfn representing a valid kernel text symbol. This test gets called via late_initcall(). This test gets built and run when CONFIG_DEBUG_VM_PGTABLE is selected. Any architecture, which is willing to subscribe this test will need to select ARCH_HAS_DEBUG_VM_PGTABLE. For now this is limited to arc, arm64, x86, s390 and powerpc platforms where the test is known to build and run successfully Going forward, other architectures too can subscribe the test after fixing any build or runtime problems with their page table helpers. Folks interested in making sure that a given platform's page table helpers conform to expected generic MM semantics should enable the above config which will just trigger this test during boot. Any non conformity here will be reported as an warning which would need to be fixed. This test will help catch any changes to the agreed upon semantics expected from generic MM and enable platforms to accommodate it thereafter. [anshuman.khandual@arm.com: v17] Link: http://lkml.kernel.org/r/1587436495-22033-3-git-send-email-anshuman.khandual@arm.com [anshuman.khandual@arm.com: v18] Link: http://lkml.kernel.org/r/1588564865-31160-3-git-send-email-anshuman.khandual@arm.com Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390] Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> [ppc32] Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Link: http://lkml.kernel.org/r/1583919272-24178-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 23:47:15 +00:00
select ARCH_HAS_DEBUG_VM_PGTABLE
select ARCH_HAS_DMA_PREP_COHERENT
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_SETUP_DMA_OPS
select ARCH_HAS_SYNC_DMA_FOR_CPU
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select ARCH_NEED_CMPXCHG_1_EMU
select ARCH_SUPPORTS_ATOMIC_RMW if ARC_HAS_LLSC
select ARCH_32BIT_OFF_T
select BUILDTIME_TABLE_SORT
select CLONE_BACKWARDS
select COMMON_CLK
select DMA_DIRECT_REMAP
select GENERIC_ATOMIC64 if !ISA_ARCV2 || !(ARC_HAS_LL64 && ARC_HAS_LLSC)
# for now, we don't need GENERIC_IRQ_PROBE, CONFIG_GENERIC_IRQ_CHIP
select GENERIC_IRQ_SHOW
select GENERIC_PCI_IOMAP
select GENERIC_PENDING_IRQ if SMP
clocksource/drivers/arc_timer: Utilize generic sched_clock It turned out we used to use default implementation of sched_clock() from kernel/sched/clock.c which was as precise as 1/HZ, i.e. by default we had 10 msec granularity of time measurement. Now given ARC built-in timers are clocked with the same frequency as CPU cores we may get much higher precision of time tracking. Thus we switch to generic sched_clock which really reads ARC hardware counters. This is especially helpful for measuring short events. That's what we used to have: ------------------------------>8------------------------ $ perf stat /bin/sh -c /root/lmbench-master/bin/arc/hello > /dev/null Performance counter stats for '/bin/sh -c /root/lmbench-master/bin/arc/hello': 10.000000 task-clock (msec) # 2.832 CPUs utilized 1 context-switches # 0.100 K/sec 1 cpu-migrations # 0.100 K/sec 63 page-faults # 0.006 M/sec 3049480 cycles # 0.305 GHz 1091259 instructions # 0.36 insn per cycle 256828 branches # 25.683 M/sec 27026 branch-misses # 10.52% of all branches 0.003530687 seconds time elapsed 0.000000000 seconds user 0.010000000 seconds sys ------------------------------>8------------------------ And now we'll see: ------------------------------>8------------------------ $ perf stat /bin/sh -c /root/lmbench-master/bin/arc/hello > /dev/null Performance counter stats for '/bin/sh -c /root/lmbench-master/bin/arc/hello': 3.004322 task-clock (msec) # 0.865 CPUs utilized 1 context-switches # 0.333 K/sec 1 cpu-migrations # 0.333 K/sec 63 page-faults # 0.021 M/sec 2986734 cycles # 0.994 GHz 1087466 instructions # 0.36 insn per cycle 255209 branches # 84.947 M/sec 26002 branch-misses # 10.19% of all branches 0.003474829 seconds time elapsed 0.003519000 seconds user 0.000000000 seconds sys ------------------------------>8------------------------ Note how much more meaningful is the second output - time spent for execution pretty much matches number of cycles spent (we're runnign @ 1GHz here). Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Acked-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2018-11-19 11:29:17 +00:00
select GENERIC_SCHED_CLOCK
select GENERIC_SMP_IDLE_THREAD
arc: mm: convert to GENERIC_IOREMAP By taking GENERIC_IOREMAP method, the generic generic_ioremap_prot(), generic_iounmap(), and their generic wrapper ioremap_prot(), ioremap() and iounmap() are all visible and available to arch. Arch needs to provide wrapper functions to override the generic versions if there's arch specific handling in its ioremap_prot(), ioremap() or iounmap(). This change will simplify implementation by removing duplicated code with generic_ioremap_prot() and generic_iounmap(), and has the equivalent functioality as before. Here, add wrapper functions ioremap_prot() and iounmap() for arc's special operation when ioremap_prot() and iounmap(). Link: https://lkml.kernel.org/r/20230706154520.11257-8-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Chris Zankel <chris@zankel.net> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonas Bonn <jonas@southpole.se> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Niklas Schnelle <schnelle@linux.ibm.com> Cc: Rich Felker <dalias@libc.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-06 15:45:08 +00:00
select GENERIC_IOREMAP
select GENERIC_STRNCPY_FROM_USER if MMU
select GENERIC_STRNLEN_USER if MMU
select HAVE_ARCH_KGDB
select HAVE_ARCH_TRACEHOOK
mm: drop redundant HAVE_ARCH_TRANSPARENT_HUGEPAGE HAVE_ARCH_TRANSPARENT_HUGEPAGE has duplicate definitions on platforms that subscribe it. Drop these reduntant definitions and instead just select it on applicable platforms. Link: https://lkml.kernel.org/r/1617259448-22529-7-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc] Cc: Russell King <linux@armlinux.org.uk> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmerdabbelt@google.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-05 01:38:29 +00:00
select HAVE_ARCH_TRANSPARENT_HUGEPAGE if ARC_MMU_V4
select HAVE_DEBUG_STACKOVERFLOW
select HAVE_DEBUG_KMEMLEAK
select HAVE_IOREMAP_PROT
select HAVE_KERNEL_GZIP
select HAVE_KERNEL_LZMA
select HAVE_KPROBES
select HAVE_KRETPROBES
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_MOD_ARCH_SPECIFIC
select HAVE_PERF_EVENTS
select HAVE_SYSCALL_TRACEPOINTS
select IRQ_DOMAIN
select LOCK_MM_AND_FIND_VMA
select MODULES_USE_ELF_RELA
select OF
select OF_EARLY_FLATTREE
select PCI_SYSCALL if PCI
select HAVE_ARCH_JUMP_LABEL if ISA_ARCV2 && !CPU_ENDIAN_BE32
select TRACE_IRQFLAGS_SUPPORT
ARC: Add eBPF JIT support This will add eBPF JIT support to the 32-bit ARCv2 processors. The implementation is qualified by running the BPF tests on a Synopsys HSDK board with "ARC HS38 v2.1c at 500 MHz" as the 4-core CPU. The test_bpf.ko reports 2-10 fold improvements in execution time of its tests. For instance: test_bpf: #33 tcpdump port 22 jited:0 704 1766 2104 PASS test_bpf: #33 tcpdump port 22 jited:1 120 224 260 PASS test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:0 238 PASS test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:1 23 PASS test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:0 2034681 PASS test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:1 1020022 PASS Deployment and structure ------------------------ The related codes are added to "arch/arc/net": - bpf_jit.h -- The interface that a back-end translator must provide - bpf_jit_core.c -- Knows how to handle the input eBPF byte stream - bpf_jit_arcv2.c -- The back-end code that knows the translation logic The bpf_int_jit_compile() at the end of bpf_jit_core.c is the entrance to the whole process. Normally, the translation is done in one pass, namely the "normal pass". In case some relocations are not known during this pass, some data (arc_jit_data) is allocated for the next pass to come. This possible next (and last) pass is called the "extra pass". 1. Normal pass # The necessary pass 1a. Dry run # Get the whole JIT length, epilogue offset, etc. 1b. Emit phase # Allocate memory and start emitting instructions 2. Extra pass # Only needed if there are relocations to be fixed 2a. Patch relocations Support status -------------- The JIT compiler supports BPF instructions up to "cpu=v4". However, it does not yet provide support for: - Tail calls - Atomic operations - 64-bit division/remainder - BPF_PROBE_MEM* (exception table) The result of "test_bpf" test suite on an HSDK board is: hsdk-lnx# insmod test_bpf.ko test_suite=test_bpf test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed] All the failing test cases are due to the ones that were not JIT'ed. Categorically, they can be represented as: .-----------.------------.-------------. | test type | opcodes | # of cases | |-----------+------------+-------------| | atomic | 0xC3, 0xDB | 149 | | div64 | 0x37, 0x3F | 22 | | mod64 | 0x97, 0x9F | 15 | `-----------^------------+-------------| | (total) 186 | `-------------' Setup: build config ------------------- The following configs must be set to have a working JIT test: CONFIG_BPF_JIT=y CONFIG_BPF_JIT_ALWAYS_ON=y CONFIG_TEST_BPF=m The following options are not necessary for the tests module, but are good to have: CONFIG_DEBUG_INFO=y # prerequisite for below CONFIG_DEBUG_INFO_BTF=y # so bpftool can generate vmlinux.h CONFIG_FTRACE=y # CONFIG_BPF_SYSCALL=y # all these options lead to CONFIG_KPROBE_EVENTS=y # having CONFIG_BPF_EVENTS=y CONFIG_PERF_EVENTS=y # Some BPF programs provide data through /sys/kernel/debug: CONFIG_DEBUG_FS=y arc# mount -t debugfs debugfs /sys/kernel/debug Setup: elfutils --------------- The libdw.{so,a} library that is used by pahole for processing the final binary must come from elfutils 0.189 or newer. The support for ARCv2 [1] has been added since that version. [1] https://sourceware.org/git/?p=elfutils.git;a=commit;h=de3d46b3e7 Setup: pahole ------------- The line below in linux/scripts/Makefile.btf must be commented out: pahole-flags-$(call test-ge, $(pahole-ver), 121) += --btf_gen_floats Or else, the build will fail: $ make V=1 ... BTF .btf.vmlinux.bin.o pahole -J --btf_gen_floats \ -j --lang_exclude=rust \ --skip_encoding_btf_inconsistent_proto \ --btf_gen_optimized .tmp_vmlinux.btf Complex, interval and imaginary float types are not supported Encountered error while encoding BTF. ... BTFIDS vmlinux ./tools/bpf/resolve_btfids/resolve_btfids vmlinux libbpf: failed to find '.BTF' ELF section in vmlinux FAILED: load BTF from vmlinux: No data available This is due to the fact that the ARC toolchains generate "complex float" DIE entries in libgcc and at the moment, pahole can't handle such entries. Running the tests ----------------- host$ scp /bld/linux/lib/test_bpf.ko arc: arc # sysctl net.core.bpf_jit_enable=1 arc # insmod test_bpf.ko test_suite=test_bpf ... test_bpf: #1048 Staggered jumps: JMP32_JSLE_X jited:1 697811 PASS test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed] Acknowledgments --------------- - Claudiu Zissulescu for his unwavering support - Yuriy Kolerov for testing and troubleshooting - Vladimir Isaev for the pahole workaround - Sergey Matyukevich for paving the road by adding the interpreter support Signed-off-by: Shahab Vahedi <shahab@synopsys.com> Link: https://lore.kernel.org/r/20240430145604.38592-1-list+bpf@vahedi.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-30 14:56:04 +00:00
select HAVE_EBPF_JIT if ISA_ARCV2
config LOCKDEP_SUPPORT
def_bool y
config SCHED_OMIT_FRAME_POINTER
def_bool y
config GENERIC_CSUM
def_bool y
config ARCH_FLATMEM_ENABLE
def_bool y
config MMU
def_bool y
config NO_IOPORT_MAP
def_bool y
config GENERIC_CALIBRATE_DELAY
def_bool y
config GENERIC_HWEIGHT
def_bool y
config STACKTRACE_SUPPORT
def_bool y
select STACKTRACE
menu "ARC Architecture Configuration"
menu "ARC Platform/SoC/Board"
source "arch/arc/plat-tb10x/Kconfig"
source "arch/arc/plat-axs10x/Kconfig"
source "arch/arc/plat-hsdk/Kconfig"
endmenu
choice
prompt "ARC Instruction Set"
default ISA_ARCV2
config ISA_ARCOMPACT
bool "ARCompact ISA"
lib/GCD.c: use binary GCD algorithm instead of Euclidean The binary GCD algorithm is based on the following facts: 1. If a and b are all evens, then gcd(a,b) = 2 * gcd(a/2, b/2) 2. If a is even and b is odd, then gcd(a,b) = gcd(a/2, b) 3. If a and b are all odds, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b) Even on x86 machines with reasonable division hardware, the binary algorithm runs about 25% faster (80% the execution time) than the division-based Euclidian algorithm. On platforms like Alpha and ARMv6 where division is a function call to emulation code, it's even more significant. There are two variants of the code here, depending on whether a fast __ffs (find least significant set bit) instruction is available. This allows the unpredictable branches in the bit-at-a-time shifting loop to be eliminated. If fast __ffs is not available, the "even/odd" GCD variant is used. I use the following code to benchmark: #include <stdio.h> #include <stdlib.h> #include <stdint.h> #include <string.h> #include <time.h> #include <unistd.h> #define swap(a, b) \ do { \ a ^= b; \ b ^= a; \ a ^= b; \ } while (0) unsigned long gcd0(unsigned long a, unsigned long b) { unsigned long r; if (a < b) { swap(a, b); } if (b == 0) return a; while ((r = a % b) != 0) { a = b; b = r; } return b; } unsigned long gcd1(unsigned long a, unsigned long b) { unsigned long r = a | b; if (!a || !b) return r; b >>= __builtin_ctzl(b); for (;;) { a >>= __builtin_ctzl(a); if (a == b) return a << __builtin_ctzl(r); if (a < b) swap(a, b); a -= b; } } unsigned long gcd2(unsigned long a, unsigned long b) { unsigned long r = a | b; if (!a || !b) return r; r &= -r; while (!(b & r)) b >>= 1; for (;;) { while (!(a & r)) a >>= 1; if (a == b) return a; if (a < b) swap(a, b); a -= b; a >>= 1; if (a & r) a += b; a >>= 1; } } unsigned long gcd3(unsigned long a, unsigned long b) { unsigned long r = a | b; if (!a || !b) return r; b >>= __builtin_ctzl(b); if (b == 1) return r & -r; for (;;) { a >>= __builtin_ctzl(a); if (a == 1) return r & -r; if (a == b) return a << __builtin_ctzl(r); if (a < b) swap(a, b); a -= b; } } unsigned long gcd4(unsigned long a, unsigned long b) { unsigned long r = a | b; if (!a || !b) return r; r &= -r; while (!(b & r)) b >>= 1; if (b == r) return r; for (;;) { while (!(a & r)) a >>= 1; if (a == r) return r; if (a == b) return a; if (a < b) swap(a, b); a -= b; a >>= 1; if (a & r) a += b; a >>= 1; } } static unsigned long (*gcd_func[])(unsigned long a, unsigned long b) = { gcd0, gcd1, gcd2, gcd3, gcd4, }; #define TEST_ENTRIES (sizeof(gcd_func) / sizeof(gcd_func[0])) #if defined(__x86_64__) #define rdtscll(val) do { \ unsigned long __a,__d; \ __asm__ __volatile__("rdtsc" : "=a" (__a), "=d" (__d)); \ (val) = ((unsigned long long)__a) | (((unsigned long long)__d)<<32); \ } while(0) static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long), unsigned long a, unsigned long b, unsigned long *res) { unsigned long long start, end; unsigned long long ret; unsigned long gcd_res; rdtscll(start); gcd_res = gcd(a, b); rdtscll(end); if (end >= start) ret = end - start; else ret = ~0ULL - start + 1 + end; *res = gcd_res; return ret; } #else static inline struct timespec read_time(void) { struct timespec time; clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time); return time; } static inline unsigned long long diff_time(struct timespec start, struct timespec end) { struct timespec temp; if ((end.tv_nsec - start.tv_nsec) < 0) { temp.tv_sec = end.tv_sec - start.tv_sec - 1; temp.tv_nsec = 1000000000ULL + end.tv_nsec - start.tv_nsec; } else { temp.tv_sec = end.tv_sec - start.tv_sec; temp.tv_nsec = end.tv_nsec - start.tv_nsec; } return temp.tv_sec * 1000000000ULL + temp.tv_nsec; } static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long), unsigned long a, unsigned long b, unsigned long *res) { struct timespec start, end; unsigned long gcd_res; start = read_time(); gcd_res = gcd(a, b); end = read_time(); *res = gcd_res; return diff_time(start, end); } #endif static inline unsigned long get_rand() { if (sizeof(long) == 8) return (unsigned long)rand() << 32 | rand(); else return rand(); } int main(int argc, char **argv) { unsigned int seed = time(0); int loops = 100; int repeats = 1000; unsigned long (*res)[TEST_ENTRIES]; unsigned long long elapsed[TEST_ENTRIES]; int i, j, k; for (;;) { int opt = getopt(argc, argv, "n:r:s:"); /* End condition always first */ if (opt == -1) break; switch (opt) { case 'n': loops = atoi(optarg); break; case 'r': repeats = atoi(optarg); break; case 's': seed = strtoul(optarg, NULL, 10); break; default: /* You won't actually get here. */ break; } } res = malloc(sizeof(unsigned long) * TEST_ENTRIES * loops); memset(elapsed, 0, sizeof(elapsed)); srand(seed); for (j = 0; j < loops; j++) { unsigned long a = get_rand(); /* Do we have args? */ unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand(); unsigned long long min_elapsed[TEST_ENTRIES]; for (k = 0; k < repeats; k++) { for (i = 0; i < TEST_ENTRIES; i++) { unsigned long long tmp = benchmark_gcd_func(gcd_func[i], a, b, &res[j][i]); if (k == 0 || min_elapsed[i] > tmp) min_elapsed[i] = tmp; } } for (i = 0; i < TEST_ENTRIES; i++) elapsed[i] += min_elapsed[i]; } for (i = 0; i < TEST_ENTRIES; i++) printf("gcd%d: elapsed %llu\n", i, elapsed[i]); k = 0; srand(seed); for (j = 0; j < loops; j++) { unsigned long a = get_rand(); unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand(); for (i = 1; i < TEST_ENTRIES; i++) { if (res[j][i] != res[j][0]) break; } if (i < TEST_ENTRIES) { if (k == 0) { k = 1; fprintf(stderr, "Error:\n"); } fprintf(stderr, "gcd(%lu, %lu): ", a, b); for (i = 0; i < TEST_ENTRIES; i++) fprintf(stderr, "%ld%s", res[j][i], i < TEST_ENTRIES - 1 ? ", " : "\n"); } } if (k == 0) fprintf(stderr, "PASS\n"); free(res); return 0; } Compiled with "-O2", on "VirtualBox 4.4.0-22-generic #38-Ubuntu x86_64" got: zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10 gcd0: elapsed 10174 gcd1: elapsed 2120 gcd2: elapsed 2902 gcd3: elapsed 2039 gcd4: elapsed 2812 PASS zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10 gcd0: elapsed 9309 gcd1: elapsed 2280 gcd2: elapsed 2822 gcd3: elapsed 2217 gcd4: elapsed 2710 PASS zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10 gcd0: elapsed 9589 gcd1: elapsed 2098 gcd2: elapsed 2815 gcd3: elapsed 2030 gcd4: elapsed 2718 PASS zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10 gcd0: elapsed 9914 gcd1: elapsed 2309 gcd2: elapsed 2779 gcd3: elapsed 2228 gcd4: elapsed 2709 PASS [akpm@linux-foundation.org: avoid #defining a CONFIG_ variable] Signed-off-by: Zhaoxiu Zeng <zhaoxiu.zeng@gmail.com> Signed-off-by: George Spelvin <linux@horizon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-21 00:03:57 +00:00
select CPU_NO_EFFICIENT_FFS
help
The original ARC ISA of ARC600/700 cores
config ISA_ARCV2
bool "ARC ISA v2"
select ARC_TIMERS_64BIT
help
ISA for the Next Generation ARC-HS cores
endchoice
menu "ARC CPU Configuration"
choice
prompt "ARC Core"
default ARC_CPU_770 if ISA_ARCOMPACT
default ARC_CPU_HS if ISA_ARCV2
config ARC_CPU_770
bool "ARC770"
depends on ISA_ARCOMPACT
select ARC_HAS_SWAPE
help
Support for ARC770 core introduced with Rel 4.10 (Summer 2011)
This core has a bunch of cool new features:
-MMU-v3: Variable Page Sz (4k, 8k, 16k), bigger J-TLB (128x4)
Shared Address Spaces (for sharing TLB entries in MMU)
-Caches: New Prog Model, Region Flush
-Insns: endian swap, load-locked/store-conditional, time-stamp-ctr
config ARC_CPU_HS
bool "ARC-HS"
depends on ISA_ARCV2
help
Support for ARC HS38x Cores based on ARCv2 ISA
The notable features are:
- SMP configurations of up to 4 cores with coherency
- Optional L2 Cache and IO-Coherency
- Revised Interrupt Architecture (multiple priorites, reg banks,
auto stack switch, auto regfile save/restore)
- MMUv4 (PIPT dcache, Huge Pages)
- Instructions for
* 64bit load/store: LDD, STD
* Hardware assisted divide/remainder: DIV, REM
* Function prologue/epilogue: ENTER_S, LEAVE_S
* IRQ enable/disable: CLRI, SETI
* pop count: FFS, FLS
* SETcc, BMSKN, XBFU...
endchoice
config ARC_TUNE_MCPU
string "Override default -mcpu compiler flag"
default ""
help
Override default -mcpu=xxx compiler flag (which is set depending on
the ISA version) with the specified value.
NOTE: If specified flag isn't supported by current compiler the
ISA default value will be used as a fallback.
config CPU_BIG_ENDIAN
bool "Enable Big Endian Mode"
help
Build kernel for Big Endian Mode of ARC CPU
config SMP
bool "Symmetric Multi-Processing"
select ARC_MCIP if ISA_ARCV2
help
This enables support for systems with more than one CPU.
if SMP
config NR_CPUS
int "Maximum number of CPUs (2-4096)"
range 2 4096
default "4"
config ARC_SMP_HALT_ON_RESET
bool "Enable Halt-on-reset boot mode"
help
In SMP configuration cores can be configured as Halt-on-reset
or they could all start at same time. For Halt-on-reset, non
masters are parked until Master kicks them so they can start off
at designated entry point. For other case, all jump to common
entry point and spin wait for Master's signal.
endif #SMP
config ARC_MCIP
bool "ARConnect Multicore IP (MCIP) Support "
depends on ISA_ARCV2
default y if SMP
help
This IP block enables SMP in ARC-HS38 cores.
It provides for cross-core interrupts, multi-core debug
hardware semaphores, shared memory,....
menuconfig ARC_CACHE
bool "Enable Cache Support"
default y
if ARC_CACHE
config ARC_CACHE_LINE_SHIFT
int "Cache Line Length (as power of 2)"
range 5 7
default "6"
help
Starting with ARC700 4.9, Cache line length is configurable,
This option specifies "N", with Line-len = 2 power N
So line lengths of 32, 64, 128 are specified by 5,6,7, respectively
Linux only supports same line lengths for I and D caches.
config ARC_HAS_ICACHE
bool "Use Instruction Cache"
default y
config ARC_HAS_DCACHE
bool "Use Data Cache"
default y
config ARC_CACHE_PAGES
bool "Per Page Cache Control"
default y
depends on ARC_HAS_ICACHE || ARC_HAS_DCACHE
help
This can be used to over-ride the global I/D Cache Enable on a
per-page basis (but only for pages accessed via MMU such as
Kernel Virtual address or User Virtual Address)
TLB entries have a per-page Cache Enable Bit.
Note that Global I/D ENABLE + Per Page DISABLE works but corollary
Global DISABLE + Per Page ENABLE won't work
endif #ARC_CACHE
config ARC_HAS_ICCM
bool "Use ICCM"
help
Single Cycle RAMS to store Fast Path Code
config ARC_ICCM_SZ
int "ICCM Size in KB"
default "64"
depends on ARC_HAS_ICCM
config ARC_HAS_DCCM
bool "Use DCCM"
help
Single Cycle RAMS to store Fast Path Data
config ARC_DCCM_SZ
int "DCCM Size in KB"
default "64"
depends on ARC_HAS_DCCM
config ARC_DCCM_BASE
hex "DCCM map address"
default "0xA0000000"
depends on ARC_HAS_DCCM
choice
prompt "MMU Version"
default ARC_MMU_V3 if ISA_ARCOMPACT
default ARC_MMU_V4 if ISA_ARCV2
config ARC_MMU_V3
bool "MMU v3"
depends on ISA_ARCOMPACT
help
Introduced with ARC700 4.10: New Features
Variable Page size (1k-16k), var JTLB size 128 x (2 or 4)
Shared Address Spaces (SASID)
config ARC_MMU_V4
bool "MMU v4"
depends on ISA_ARCV2
endchoice
choice
prompt "MMU Page Size"
default ARC_PAGE_SIZE_8K
config ARC_PAGE_SIZE_8K
bool "8KB"
select HAVE_PAGE_SIZE_8KB
help
Choose between 8k vs 16k
config ARC_PAGE_SIZE_16K
select HAVE_PAGE_SIZE_16KB
bool "16KB"
config ARC_PAGE_SIZE_4K
bool "4KB"
select HAVE_PAGE_SIZE_4KB
depends on ARC_MMU_V3 || ARC_MMU_V4
endchoice
choice
prompt "MMU Super Page Size"
depends on ISA_ARCV2 && TRANSPARENT_HUGEPAGE
default ARC_HUGEPAGE_2M
config ARC_HUGEPAGE_2M
bool "2MB"
config ARC_HUGEPAGE_16M
bool "16MB"
endchoice
config PGTABLE_LEVELS
int "Number of Page table levels"
default 2
config ARC_COMPACT_IRQ_LEVELS
depends on ISA_ARCOMPACT
bool "Setup Timer IRQ as high Priority"
# if SMP, LV2 enabled ONLY if ARC implementation has LV2 re-entrancy
depends on !SMP
config ARC_FPU_SAVE_RESTORE
bool "Enable FPU state persistence across context switch"
help
ARCompact FPU has internal registers to assist with Double precision
Floating Point operations. There are control and stauts registers
for floating point exceptions and rounding modes. These are
preserved across task context switch when enabled.
config ARC_CANT_LLSC
def_bool n
config ARC_HAS_LLSC
bool "Insn: LLOCK/SCOND (efficient atomic ops)"
default y
depends on !ARC_CANT_LLSC
config ARC_HAS_SWAPE
bool "Insn: SWAPE (endian-swap)"
default y
if ISA_ARCV2
config ARC_USE_UNALIGNED_MEM_ACCESS
bool "Enable unaligned access in HW"
default y
select HAVE_EFFICIENT_UNALIGNED_ACCESS
help
The ARC HS architecture supports unaligned memory access
which is disabled by default. Enable unaligned access in
hardware and use software to use it
config ARC_HAS_LL64
bool "Insn: 64bit LDD/STD"
help
Enable gcc to generate 64-bit load/store instructions
ISA mandates even/odd registers to allow encoding of two
dest operands with 2 possible source operands.
default y
config ARC_HAS_DIV_REM
bool "Insn: div, divu, rem, remu"
default y
config ARC_HAS_ACCL_REGS
bool "Reg Pair ACCL:ACCH (FPU and/or MPY > 6 and/or DSP)"
default y
help
Depending on the configuration, CPU can contain accumulator reg-pair
(also referred to as r58:r59). These can also be used by gcc as GPR so
kernel needs to save/restore per process
config ARC_DSP_HANDLED
def_bool n
config ARC_DSP_SAVE_RESTORE_REGS
def_bool n
choice
prompt "DSP support"
default ARC_DSP_NONE
help
Depending on the configuration, CPU can contain DSP registers
(ACC0_GLO, ACC0_GHI, DSP_BFLY0, DSP_CTRL, DSP_FFT_CTRL).
Below are options describing how to handle these registers in
interrupt entry / exit and in context switch.
config ARC_DSP_NONE
bool "No DSP extension presence in HW"
help
No DSP extension presence in HW
config ARC_DSP_KERNEL
bool "DSP extension in HW, no support for userspace"
select ARC_HAS_ACCL_REGS
select ARC_DSP_HANDLED
help
DSP extension presence in HW, no support for DSP-enabled userspace
applications. We don't save / restore DSP registers and only do
some minimal preparations so userspace won't be able to break kernel
config ARC_DSP_USERSPACE
bool "Support DSP for userspace apps"
select ARC_HAS_ACCL_REGS
select ARC_DSP_HANDLED
select ARC_DSP_SAVE_RESTORE_REGS
help
DSP extension presence in HW, support save / restore DSP registers to
run DSP-enabled userspace applications
config ARC_DSP_AGU_USERSPACE
bool "Support DSP with AGU for userspace apps"
select ARC_HAS_ACCL_REGS
select ARC_DSP_HANDLED
select ARC_DSP_SAVE_RESTORE_REGS
help
DSP and AGU extensions presence in HW, support save / restore DSP
and AGU registers to run DSP-enabled userspace applications
endchoice
config ARC_IRQ_NO_AUTOSAVE
bool "Disable hardware autosave regfile on interrupts"
default n
help
On HS cores, taken interrupt auto saves the regfile on stack.
This is programmable and can be optionally disabled in which case
software INTERRUPT_PROLOGUE/EPILGUE do the needed work
config ARC_LPB_DISABLE
bool "Disable loop buffer (LPB)"
help
On HS cores, loop buffer (LPB) is programmable in runtime and can
be optionally disabled.
endif # ISA_ARCV2
endmenu # "ARC CPU Configuration"
config LINUX_LINK_BASE
hex "Kernel link address"
default "0x80000000"
help
ARC700 divides the 32 bit phy address space into two equal halves
-Lower 2G (0 - 0x7FFF_FFFF ) is user virtual, translated by MMU
-Upper 2G (0x8000_0000 onwards) is untranslated, for kernel
Typically Linux kernel is linked at the start of untransalted addr,
hence the default value of 0x8zs.
However some customers have peripherals mapped at this addr, so
Linux needs to be scooted a bit.
If you don't know what the above means, leave this setting alone.
This needs to match memory start address specified in Device Tree
config LINUX_RAM_BASE
hex "RAM base address"
default LINUX_LINK_BASE
help
By default Linux is linked at base of RAM. However in some special
cases (such as HSDK), Linux can't be linked at start of DDR, hence
this option.
config HIGHMEM
bool "High Memory Support"
arc: use FLATMEM with freeing of unused memory map instead of DISCONTIGMEM Currently ARC uses DISCONTIGMEM to cope with sparse physical memory address space on systems with 2 memory banks. While DISCONTIGMEM avoids wasting memory on unpopulated memory map, it adds both memory and CPU overhead relatively to FLATMEM. Moreover, DISCONTINGMEM is generally considered deprecated. The obvious replacement for DISCONTIGMEM would be SPARSEMEM, but it is also less efficient than FLATMEM in pfn_to_page() and page_to_pfn() conversions. Besides it requires tuning of SECTION_SIZE which is not trivial for possible ARC memory configuration. Since the memory map for both banks is always allocated from the "lowmem" bank, it is possible to use FLATMEM for two-bank configuration and simply free the unused hole in the memory map. All is required for that is to provide ARC-specific pfn_valid() that will take into account actual physical memory configuration and define HAVE_ARCH_PFN_VALID. The resulting kernel image configured with defconfig + HIGHMEM=y is smaller: $ size a/vmlinux b/vmlinux text data bss dec hex filename 4673503 1245456 279756 6198715 5e95bb a/vmlinux 4658706 1246864 279756 6185326 5e616e b/vmlinux $ ./scripts/bloat-o-meter a/vmlinux b/vmlinux add/remove: 28/30 grow/shrink: 42/399 up/down: 10986/-29025 (-18039) ... Total: Before=4709315, After = 4691276, chg -0.38% Booting nSIM with haps_ns.dts results in the following memory usage reports: a: Memory: 1559104K/1572864K available (3531K kernel code, 595K rwdata, 752K rodata, 136K init, 275K bss, 13760K reserved, 0K cma-reserved, 1048576K highmem) b: Memory: 1559112K/1572864K available (3519K kernel code, 594K rwdata, 752K rodata, 136K init, 280K bss, 13752K reserved, 0K cma-reserved, 1048576K highmem) Link: https://lkml.kernel.org/r/20201101170454.9567-11-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matt Turner <mattst88@gmail.com> Cc: Meelis Roos <mroos@linux.ee> Cc: Michael Schmitz <schmitzmic@gmail.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 03:10:04 +00:00
select HAVE_ARCH_PFN_VALID
2020-11-03 09:27:21 +00:00
select KMAP_LOCAL
help
With ARC 2G:2G address split, only upper 2G is directly addressable by
kernel. Enable this to potentially allow access to rest of 2G and PAE
in future
config ARC_HAS_PAE40
bool "Support for the 40-bit Physical Address Extension"
depends on ISA_ARCV2
select HIGHMEM
select PHYS_ADDR_T_64BIT
help
Enable access to physical memory beyond 4G, only supported on
ARC cores with 40 bit Physical Addressing support
config ARC_KVADDR_SIZE
int "Kernel Virtual Address Space size (MB)"
range 0 512
default "256"
help
The kernel address space is carved out of 256MB of translated address
space for catering to vmalloc, modules, pkmap, fixmap. This however may
not suffice vmalloc requirements of a 4K CPU EZChip system. So allow
this to be stretched to 512 MB (by extending into the reserved
kernel-user gutter)
config ARC_CURR_IN_REG
bool "cache current task pointer in gp"
default y
help
This reserves gp register to point to Current Task in
kernel mode eliding memory access for each access
config ARC_EMUL_UNALIGNED
bool "Emulate unaligned memory access (userspace only)"
select SYSCTL_ARCH_UNALIGN_NO_WARN
select SYSCTL_ARCH_UNALIGN_ALLOW
depends on ISA_ARCOMPACT
help
This enables misaligned 16 & 32 bit memory access from user space.
Use ONLY-IF-ABS-NECESSARY as it will be very slow and also can hide
potential bugs in code
config HZ
int "Timer Frequency"
default 100
config ARC_METAWARE_HLINK
bool "Support for Metaware debugger assisted Host access"
help
This options allows a Linux userland apps to directly access
host file system (open/creat/read/write etc) with help from
Metaware Debugger. This can come in handy for Linux-host communication
when there is no real usable peripheral such as EMAC.
menuconfig ARC_DBG
bool "ARC debugging"
default y
if ARC_DBG
config ARC_DW2_UNWIND
bool "Enable DWARF specific kernel stack unwind"
default y
select KALLSYMS
help
Compiles the kernel with DWARF unwind information and can be used
to get stack backtraces.
If you say Y here the resulting kernel image will be slightly larger
but not slower, and it will give very useful debugging information.
If you don't debug the kernel, you can say N, but we may not be able
to solve problems without frame unwind information
config ARC_DBG_JUMP_LABEL
bool "Paranoid checks in Static Keys (jump labels) code"
depends on JUMP_LABEL
default y if STATIC_KEYS_SELFTEST
help
Enable paranoid checks and self-test of both ARC-specific and generic
part of static keys (jump labels) related code.
endif
config ARC_BUILTIN_DTB_NAME
string "Built in DTB"
help
Set the name of the DTB to embed in the vmlinux binary
Leaving it blank selects the "nsim_700" dtb.
endmenu # "ARC Architecture Configuration"
config ARCH_FORCE_MAX_ORDER
int "Maximum zone order"
default "11" if ARC_HUGEPAGE_16M
default "10"
source "kernel/power/Kconfig"