Christoph Lameter 2244b95a7b [PATCH] zoned vm counters: basic ZVC (zoned vm counter) implementation
Per zone counter infrastructure

The counters that we currently have for the VM are split per processor.  The
processor however has not much to do with the zone these pages belong to.  We
cannot tell f.e.  how many ZONE_DMA pages are dirty.

So we are blind to potentially inbalances in the usage of memory in various
zones.  F.e.  in a NUMA system we cannot tell how many pages are dirty on a
particular node.  If we knew then we could put measures into the VM to balance
the use of memory between different zones and different nodes in a NUMA
system.  For example it would be possible to limit the dirty pages per node so
that fast local memory is kept available even if a process is dirtying huge
amounts of pages.

Another example is zone reclaim.  We do not know how many unmapped pages exist
per zone.  So we just have to try to reclaim.  If it is not working then we
pause and try again later.  It would be better if we knew when it makes sense
to reclaim unmapped pages from a zone.  This patchset allows the determination
of the number of unmapped pages per zone.  We can remove the zone reclaim
interval with the counters introduced here.

Futhermore the ability to have various usage statistics available will allow
the development of new NUMA balancing algorithms that may be able to improve
the decision making in the scheduler of when to move a process to another node
and hopefully will also enable automatic page migration through a user space
program that can analyse the memory load distribution and then rebalance
memory use in order to increase performance.

The counter framework here implements differential counters for each processor
in struct zone.  The differential counters are consolidated when a threshold
is exceeded (like done in the current implementation for nr_pageache), when
slab reaping occurs or when a consolidation function is called.

Consolidation uses atomic operations and accumulates counters per zone in the
zone structure and also globally in the vm_stat array.  VM functions can
access the counts by simply indexing a global or zone specific array.

The arrangement of counters in an array also simplifies processing when output
has to be generated for /proc/*.

Counters can be updated by calling inc/dec_zone_page_state or
_inc/dec_zone_page_state analogous to *_page_state.  The second group of
functions can be called if it is known that interrupts are disabled.

Special optimized increment and decrement functions are provided.  These can
avoid certain checks and use increment or decrement instructions that an
architecture may provide.

We also add a new CONFIG_DMA_IS_NORMAL that signifies that an architecture can
do DMA to all memory and therefore ZONE_NORMAL will not be populated.  This is
only currently set for IA64 SGI SN2 and currently only affects
node_page_state().  In the best case node_page_state can be reduced to
retrieving a single counter for the one zone on the node.

[akpm@osdl.org: cleanups]
[akpm@osdl.org: export vm_stat[] for filesystems]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:34 -07:00

527 lines
13 KiB
Plaintext

#
# For a description of the syntax of this configuration file,
# see Documentation/kbuild/kconfig-language.txt.
#
mainmenu "IA-64 Linux Kernel Configuration"
source "init/Kconfig"
menu "Processor type and features"
config IA64
bool
default y
help
The Itanium Processor Family is Intel's 64-bit successor to
the 32-bit X86 line. The IA-64 Linux project has a home
page at <http://www.linuxia64.org/> and a mailing list at
<linux-ia64@vger.kernel.org>.
config 64BIT
bool
default y
config MMU
bool
default y
config SWIOTLB
bool
default y
config RWSEM_XCHGADD_ALGORITHM
bool
default y
config GENERIC_FIND_NEXT_BIT
bool
default y
config GENERIC_CALIBRATE_DELAY
bool
default y
config TIME_INTERPOLATION
bool
default y
config DMI
bool
default y
config EFI
bool
default y
config GENERIC_IOMAP
bool
default y
config SCHED_NO_NO_OMIT_FRAME_POINTER
bool
default y
config IA64_UNCACHED_ALLOCATOR
bool
select GENERIC_ALLOCATOR
config DMA_IS_DMA32
bool
default y
config DMA_IS_NORMAL
bool
depends on IA64_SGI_SN2
default y
choice
prompt "System type"
default IA64_GENERIC
config IA64_GENERIC
bool "generic"
select ACPI
select PCI
select NUMA
select ACPI_NUMA
help
This selects the system type of your hardware. A "generic" kernel
will run on any supported IA-64 system. However, if you configure
a kernel for your specific system, it will be faster and smaller.
generic For any supported IA-64 system
DIG-compliant For DIG ("Developer's Interface Guide") compliant systems
HP-zx1/sx1000 For HP systems
HP-zx1/sx1000+swiotlb For HP systems with (broken) DMA-constrained devices.
SGI-SN2 For SGI Altix systems
Ski-simulator For the HP simulator <http://www.hpl.hp.com/research/linux/ski/>
If you don't know what to do, choose "generic".
config IA64_DIG
bool "DIG-compliant"
config IA64_HP_ZX1
bool "HP-zx1/sx1000"
help
Build a kernel that runs on HP zx1 and sx1000 systems. This adds
support for the HP I/O MMU.
config IA64_HP_ZX1_SWIOTLB
bool "HP-zx1/sx1000 with software I/O TLB"
help
Build a kernel that runs on HP zx1 and sx1000 systems even when they
have broken PCI devices which cannot DMA to full 32 bits. Apart
from support for the HP I/O MMU, this includes support for the software
I/O TLB, which allows supporting the broken devices at the expense of
wasting some kernel memory (about 2MB by default).
config IA64_SGI_SN2
bool "SGI-SN2"
help
Selecting this option will optimize the kernel for use on sn2 based
systems, but the resulting kernel binary will not run on other
types of ia64 systems. If you have an SGI Altix system, it's safe
to select this option. If in doubt, select ia64 generic support
instead.
config IA64_HP_SIM
bool "Ski-simulator"
endchoice
choice
prompt "Processor type"
default ITANIUM
config ITANIUM
bool "Itanium"
help
Select your IA-64 processor type. The default is Itanium.
This choice is safe for all IA-64 systems, but may not perform
optimally on systems with, say, Itanium 2 or newer processors.
config MCKINLEY
bool "Itanium 2"
help
Select this to configure for an Itanium 2 (McKinley) processor.
endchoice
choice
prompt "Kernel page size"
default IA64_PAGE_SIZE_16KB
config IA64_PAGE_SIZE_4KB
bool "4KB"
help
This lets you select the page size of the kernel. For best IA-64
performance, a page size of 8KB or 16KB is recommended. For best
IA-32 compatibility, a page size of 4KB should be selected (the vast
majority of IA-32 binaries work perfectly fine with a larger page
size). For Itanium 2 or newer systems, a page size of 64KB can also
be selected.
4KB For best IA-32 compatibility
8KB For best IA-64 performance
16KB For best IA-64 performance
64KB Requires Itanium 2 or newer processor.
If you don't know what to do, choose 16KB.
config IA64_PAGE_SIZE_8KB
bool "8KB"
config IA64_PAGE_SIZE_16KB
bool "16KB"
config IA64_PAGE_SIZE_64KB
depends on !ITANIUM
bool "64KB"
endchoice
choice
prompt "Page Table Levels"
default PGTABLE_3
config PGTABLE_3
bool "3 Levels"
config PGTABLE_4
depends on !IA64_PAGE_SIZE_64KB
bool "4 Levels"
endchoice
source kernel/Kconfig.hz
config IA64_BRL_EMU
bool
depends on ITANIUM
default y
# align cache-sensitive data to 128 bytes
config IA64_L1_CACHE_SHIFT
int
default "7" if MCKINLEY
default "6" if ITANIUM
config IA64_CYCLONE
bool "Cyclone (EXA) Time Source support"
help
Say Y here to enable support for IBM EXA Cyclone time source.
If you're unsure, answer N.
config IOSAPIC
bool
depends on !IA64_HP_SIM
default y
config IA64_SGI_SN_XP
tristate "Support communication between SGI SSIs"
depends on IA64_GENERIC || IA64_SGI_SN2
select IA64_UNCACHED_ALLOCATOR
help
An SGI machine can be divided into multiple Single System
Images which act independently of each other and have
hardware based memory protection from the others. Enabling
this feature will allow for direct communication between SSIs
based on a network adapter and DMA messaging.
config FORCE_MAX_ZONEORDER
int "MAX_ORDER (11 - 17)" if !HUGETLB_PAGE
range 11 17 if !HUGETLB_PAGE
default "17" if HUGETLB_PAGE
default "11"
config SMP
bool "Symmetric multi-processing support"
help
This enables support for systems with more than one CPU. If you have
a system with only one CPU, say N. If you have a system with more
than one CPU, say Y.
If you say N here, the kernel will run on single and multiprocessor
systems, but will use only one CPU of a multiprocessor system. If
you say Y here, the kernel will run on many, but not all,
single processor systems. On a single processor system, the kernel
will run faster if you say N here.
See also the <file:Documentation/smp.txt> and the SMP-HOWTO
available at <http://www.tldp.org/docs.html#howto>.
If you don't know what to do here, say N.
config NR_CPUS
int "Maximum number of CPUs (2-1024)"
range 2 1024
depends on SMP
default "64"
help
You should set this to the number of CPUs in your system, but
keep in mind that a kernel compiled for, e.g., 2 CPUs will boot but
only use 2 CPUs on a >2 CPU system. Setting this to a value larger
than 64 will cause the use of a CPU mask array, causing a small
performance hit.
config HOTPLUG_CPU
bool "Support for hot-pluggable CPUs (EXPERIMENTAL)"
depends on SMP && EXPERIMENTAL
select HOTPLUG
default n
---help---
Say Y here to experiment with turning CPUs off and on. CPUs
can be controlled through /sys/devices/system/cpu/cpu#.
Say N if you want to disable CPU hotplug.
config ARCH_ENABLE_MEMORY_HOTPLUG
def_bool y
config SCHED_SMT
bool "SMT scheduler support"
depends on SMP
help
Improves the CPU scheduler's decision making when dealing with
Intel IA64 chips with MultiThreading at a cost of slightly increased
overhead in some places. If unsure say N here.
config PERMIT_BSP_REMOVE
bool "Support removal of Bootstrap Processor"
depends on HOTPLUG_CPU
default n
---help---
Say Y here if your platform SAL will support removal of BSP with HOTPLUG_CPU
support.
config FORCE_CPEI_RETARGET
bool "Force assumption that CPEI can be re-targetted"
depends on PERMIT_BSP_REMOVE
default n
---help---
Say Y if you need to force the assumption that CPEI can be re-targetted to
any cpu in the system. This hint is available via ACPI 3.0 specifications.
Tiger4 systems are capable of re-directing CPEI to any CPU other than BSP.
This option it useful to enable this feature on older BIOS's as well.
You can also enable this by using boot command line option force_cpei=1.
config PREEMPT
bool "Preemptible Kernel"
help
This option reduces the latency of the kernel when reacting to
real-time or interactive events by allowing a low priority process to
be preempted even if it is in kernel mode executing a system call.
This allows applications to run more reliably even when the system is
under load.
Say Y here if you are building a kernel for a desktop, embedded
or real-time system. Say N if you are unsure.
source "mm/Kconfig"
config ARCH_SELECT_MEMORY_MODEL
def_bool y
config ARCH_DISCONTIGMEM_ENABLE
def_bool y
help
Say Y to support efficient handling of discontiguous physical memory,
for architectures which are either NUMA (Non-Uniform Memory Access)
or have huge holes in the physical address space for other reasons.
See <file:Documentation/vm/numa> for more.
config ARCH_FLATMEM_ENABLE
def_bool y
config ARCH_SPARSEMEM_ENABLE
def_bool y
depends on ARCH_DISCONTIGMEM_ENABLE
config ARCH_DISCONTIGMEM_DEFAULT
def_bool y if (IA64_SGI_SN2 || IA64_GENERIC || IA64_HP_ZX1 || IA64_HP_ZX1_SWIOTLB)
depends on ARCH_DISCONTIGMEM_ENABLE
config NUMA
bool "NUMA support"
depends on !IA64_HP_SIM && !FLATMEM
default y if IA64_SGI_SN2
help
Say Y to compile the kernel to support NUMA (Non-Uniform Memory
Access). This option is for configuring high-end multiprocessor
server systems. If in doubt, say N.
config NODES_SHIFT
int "Max num nodes shift(3-10)"
range 3 10
default "8"
depends on NEED_MULTIPLE_NODES
help
This option specifies the maximum number of nodes in your SSI system.
MAX_NUMNODES will be 2^(This value).
If in doubt, use the default.
# VIRTUAL_MEM_MAP and FLAT_NODE_MEM_MAP are functionally equivalent.
# VIRTUAL_MEM_MAP has been retained for historical reasons.
config VIRTUAL_MEM_MAP
bool "Virtual mem map"
depends on !SPARSEMEM
default y if !IA64_HP_SIM
help
Say Y to compile the kernel with support for a virtual mem map.
This code also only takes effect if a memory hole of greater than
1 Gb is found during boot. You must turn this option on if you
require the DISCONTIGMEM option for your machine. If you are
unsure, say Y.
config HOLES_IN_ZONE
bool
default y if VIRTUAL_MEM_MAP
config HAVE_ARCH_EARLY_PFN_TO_NID
def_bool y
depends on NEED_MULTIPLE_NODES
config HAVE_ARCH_NODEDATA_EXTENSION
def_bool y
depends on NUMA
config IA32_SUPPORT
bool "Support for Linux/x86 binaries"
help
IA-64 processors can execute IA-32 (X86) instructions. By
saying Y here, the kernel will include IA-32 system call
emulation support which makes it possible to transparently
run IA-32 Linux binaries on an IA-64 Linux system.
If in doubt, say Y.
config COMPAT
bool
depends on IA32_SUPPORT
default y
config IA64_MCA_RECOVERY
tristate "MCA recovery from errors other than TLB."
config PERFMON
bool "Performance monitor support"
help
Selects whether support for the IA-64 performance monitor hardware
is included in the kernel. This makes some kernel data-structures a
little bigger and slows down execution a bit, but it is generally
a good idea to turn this on. If you're unsure, say Y.
config IA64_PALINFO
tristate "/proc/pal support"
help
If you say Y here, you are able to get PAL (Processor Abstraction
Layer) information in /proc/pal. This contains useful information
about the processors in your systems, such as cache and TLB sizes
and the PAL firmware version in use.
To use this option, you have to ensure that the "/proc file system
support" (CONFIG_PROC_FS) is enabled, too.
config SGI_SN
def_bool y if (IA64_SGI_SN2 || IA64_GENERIC)
source "drivers/sn/Kconfig"
source "drivers/firmware/Kconfig"
source "fs/Kconfig.binfmt"
endmenu
menu "Power management and ACPI"
source "kernel/power/Kconfig"
source "drivers/acpi/Kconfig"
if PM
source "arch/ia64/kernel/cpufreq/Kconfig"
endif
endmenu
if !IA64_HP_SIM
menu "Bus options (PCI, PCMCIA)"
config PCI
bool "PCI support"
help
Real IA-64 machines all have PCI/PCI-X/PCI Express busses. Say Y
here unless you are using a simulator without PCI support.
config PCI_DOMAINS
bool
default PCI
source "drivers/pci/pcie/Kconfig"
source "drivers/pci/Kconfig"
source "drivers/pci/hotplug/Kconfig"
source "drivers/pcmcia/Kconfig"
endmenu
endif
source "net/Kconfig"
source "drivers/Kconfig"
source "fs/Kconfig"
source "lib/Kconfig"
#
# Use the generic interrupt handling code in kernel/irq/:
#
config GENERIC_HARDIRQS
bool
default y
config GENERIC_IRQ_PROBE
bool
default y
config GENERIC_PENDING_IRQ
bool
depends on GENERIC_HARDIRQS && SMP
default y
config IRQ_PER_CPU
bool
default y
source "arch/ia64/hp/sim/Kconfig"
menu "Instrumentation Support"
depends on EXPERIMENTAL
source "arch/ia64/oprofile/Kconfig"
config KPROBES
bool "Kprobes (EXPERIMENTAL)"
depends on EXPERIMENTAL && MODULES
help
Kprobes allows you to trap at almost any kernel address and
execute a callback function. register_kprobe() establishes
a probepoint and specifies the callback. Kprobes is useful
for kernel debugging, non-intrusive instrumentation and testing.
If in doubt, say "N".
endmenu
source "arch/ia64/Kconfig.debug"
source "security/Kconfig"
source "crypto/Kconfig"