diff options
Diffstat (limited to 'Documentation/x86/x86_64')
| -rw-r--r-- | Documentation/x86/x86_64/boot-options.txt | 143 | ||||
| -rw-r--r-- | Documentation/x86/x86_64/fake-numa-for-cpusets | 5 | ||||
| -rw-r--r-- | Documentation/x86/x86_64/kernel-stacks | 6 | ||||
| -rw-r--r-- | Documentation/x86/x86_64/machinecheck | 8 | ||||
| -rw-r--r-- | Documentation/x86/x86_64/mm.txt | 22 |
5 files changed, 94 insertions, 90 deletions
diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt index 34c13040a71..5223479291a 100644 --- a/Documentation/x86/x86_64/boot-options.txt +++ b/Documentation/x86/x86_64/boot-options.txt @@ -5,21 +5,58 @@ only the AMD64 specific ones are listed here. Machine check - mce=off disable machine check - mce=bootlog Enable logging of machine checks left over from booting. - Disabled by default on AMD because some BIOS leave bogus ones. - If your BIOS doesn't do that it's a good idea to enable though - to make sure you log even machine check events that result - in a reboot. On Intel systems it is enabled by default. + Please see Documentation/x86/x86_64/machinecheck for sysfs runtime tunables. + + mce=off + Disable machine check + mce=no_cmci + Disable CMCI(Corrected Machine Check Interrupt) that + Intel processor supports. Usually this disablement is + not recommended, but it might be handy if your hardware + is misbehaving. + Note that you'll get more problems without CMCI than with + due to the shared banks, i.e. you might get duplicated + error logs. + mce=dont_log_ce + Don't make logs for corrected errors. All events reported + as corrected are silently cleared by OS. + This option will be useful if you have no interest in any + of corrected errors. + mce=ignore_ce + Disable features for corrected errors, e.g. polling timer + and CMCI. All events reported as corrected are not cleared + by OS and remained in its error banks. + Usually this disablement is not recommended, however if + there is an agent checking/clearing corrected errors + (e.g. BIOS or hardware monitoring applications), conflicting + with OS's error handling, and you cannot deactivate the agent, + then this option will be a help. + mce=bootlog + Enable logging of machine checks left over from booting. + Disabled by default on AMD because some BIOS leave bogus ones. + If your BIOS doesn't do that it's a good idea to enable though + to make sure you log even machine check events that result + in a reboot. On Intel systems it is enabled by default. mce=nobootlog Disable boot machine check logging. - mce=tolerancelevel (number) + mce=tolerancelevel[,monarchtimeout] (number,number) + tolerance levels: 0: always panic on uncorrected errors, log corrected errors 1: panic or SIGBUS on uncorrected errors, log corrected errors 2: SIGBUS or log uncorrected errors, log corrected errors 3: never panic or SIGBUS, log all errors (for testing only) Default is 1 Can be also set using sysfs which is preferable. + monarchtimeout: + Sets the time in us to wait for other CPUs on machine checks. 0 + to disable. + mce=bios_cmci_threshold + Don't overwrite the bios-set CMCI threshold. This boot option + prevents Linux from overwriting the CMCI threshold set by the + bios. Without this option, Linux always sets the CMCI + threshold to 1. Enabling this may make memory predictive failure + analysis less effective if the bios sets thresholds for memory + errors since we will not see details for all errors. nomce (for compatibility with i386): same as mce=off @@ -41,33 +78,11 @@ APICs no_timer_check Don't check the IO-APIC timer. This can work around problems with incorrect timer initialization on some boards. - - apicmaintimer Run time keeping from the local APIC timer instead - of using the PIT/HPET interrupt for this. This is useful - when the PIT/HPET interrupts are unreliable. - - noapicmaintimer Don't do time keeping using the APIC timer. - Useful when this option was auto selected, but doesn't work. - apicpmtimer Do APIC timer calibration using the pmtimer. Implies apicmaintimer. Useful when your PIT timer is totally broken. -Early Console - - syntax: earlyprintk=vga - earlyprintk=serial[,ttySn[,baudrate]] - - The early console is useful when the kernel crashes before the - normal console is initialized. It is not enabled by - default because it has some cosmetic problems. - Append ,keep to not disable it when the real console takes over. - Only vga or serial at a time, not both. - Currently only ttyS0 and ttyS1 are supported. - Interaction with the standard serial driver is not very good. - The VGA output is eventually overwritten by the real console. - Timing notsc @@ -75,10 +90,6 @@ Timing This can be used to work around timing problems on multiprocessor systems with not properly synchronized CPUs. - report_lost_ticks - Report when timer interrupts are lost because some code turned off - interrupts for too long. - nohpet Don't use the HPET timer. @@ -125,35 +136,19 @@ Non Executable Mappings on Enable(default) off Disable -SMP - - additional_cpus=NUM Allow NUM more CPUs for hotplug - (defaults are specified by the BIOS, see Documentation/x86/x86_64/cpu-hotplug-spec) - NUMA numa=off Only set up a single NUMA node spanning all memory. numa=noacpi Don't parse the SRAT table for NUMA setup - numa=fake=CMDLINE - If a number, fakes CMDLINE nodes and ignores NUMA setup of the - actual machine. Otherwise, system memory is configured - depending on the sizes and coefficients listed. For example: - numa=fake=2*512,1024,4*256,*128 - gives two 512M nodes, a 1024M node, four 256M nodes, and the - rest split into 128M chunks. If the last character of CMDLINE - is a *, the remaining memory is divided up equally among its - coefficient: - numa=fake=2*512,2* - gives two 512M nodes and the rest split into two nodes. - Otherwise, the remaining system RAM is allocated to an - additional node. - - numa=hotadd=percent - Only allow hotadd memory to preallocate page structures upto - percent of already available memory. - numa=hotadd=0 will disable hotadd memory. + numa=fake=<size>[MG] + If given as a memory unit, fills all system RAM with nodes of + size interleaved over physical nodes. + + numa=fake=<N> + If given as an integer, fills all system RAM with N fake nodes + interleaved over physical nodes. ACPI @@ -168,15 +163,20 @@ ACPI acpi=noirq Don't route interrupts + acpi=nocmcff Disable firmware first mode for corrected errors. This + disables parsing the HEST CMC error source to check if + firmware has set the FF flag. This may result in + duplicate corrected error reports. + PCI - pci=off Don't use PCI - pci=conf1 Use conf1 access. - pci=conf2 Use conf2 access. - pci=rom Assign ROMs. - pci=assign-busses Assign busses - pci=irqmask=MASK Set PCI interrupt mask to MASK - pci=lastbus=NUMBER Scan upto NUMBER busses, no matter what the mptable says. + pci=off Don't use PCI + pci=conf1 Use conf1 access. + pci=conf2 Use conf2 access. + pci=rom Assign ROMs. + pci=assign-busses Assign busses + pci=irqmask=MASK Set PCI interrupt mask to MASK + pci=lastbus=NUMBER Scan up to NUMBER busses, no matter what the mptable says. pci=noacpi Don't use ACPI to set up PCI interrupt routing. IOMMU (input/output memory management unit) @@ -187,7 +187,7 @@ IOMMU (input/output memory management unit) (e.g. because you have < 3 GB memory). Kernel boot message: "PCI-DMA: Disabling IOMMU" - 2. <arch/x86_64/kernel/pci-gart.c>: AMD GART based hardware IOMMU. + 2. <arch/x86/kernel/amd_gart_64.c>: AMD GART based hardware IOMMU. Kernel boot message: "PCI-DMA: using GART IOMMU" 3. <arch/x86_64/kernel/pci-swiotlb.c> : Software IOMMU implementation. Used @@ -274,23 +274,8 @@ IOMMU (input/output memory management unit) Debugging - oops=panic Always panic on oopses. Default is to just kill the process, - but there is a small probability of deadlocking the machine. - This will also cause panics on machine check exceptions. - Useful together with panic=30 to trigger a reboot. - kstack=N Print N words from the kernel stack in oops dumps. - pagefaulttrace Dump all page faults. Only useful for extreme debugging - and will create a lot of output. - - call_trace=[old|both|newfallback|new] - old: use old inexact backtracer - new: use new exact dwarf2 unwinder - both: print entries from both - newfallback: use new unwinder but fall back to old if it gets - stuck (default) - Miscellaneous nogbpages diff --git a/Documentation/x86/x86_64/fake-numa-for-cpusets b/Documentation/x86/x86_64/fake-numa-for-cpusets index 33bb5665599..0f11d9becb0 100644 --- a/Documentation/x86/x86_64/fake-numa-for-cpusets +++ b/Documentation/x86/x86_64/fake-numa-for-cpusets @@ -7,7 +7,8 @@ you can create fake NUMA nodes that represent contiguous chunks of memory and assign them to cpusets and their attached tasks. This is a way of limiting the amount of system memory that are available to a certain class of tasks. -For more information on the features of cpusets, see Documentation/cpusets.txt. +For more information on the features of cpusets, see +Documentation/cgroups/cpusets.txt. There are a number of different configurations you can use for your needs. For more information on the numa=fake command line option and its various ways of configuring fake nodes, see Documentation/x86/x86_64/boot-options.txt. @@ -32,7 +33,7 @@ A machine may be split as follows with "numa=fake=4*512," as reported by dmesg: On node 3 totalpages: 131072 Now following the instructions for mounting the cpusets filesystem from -Documentation/cpusets.txt, you can assign fake nodes (i.e. contiguous memory +Documentation/cgroups/cpusets.txt, you can assign fake nodes (i.e. contiguous memory address spaces) to individual cpusets: [root@xroads /]# mkdir exampleset diff --git a/Documentation/x86/x86_64/kernel-stacks b/Documentation/x86/x86_64/kernel-stacks index 5ad65d51fb9..a01eec5d1d0 100644 --- a/Documentation/x86/x86_64/kernel-stacks +++ b/Documentation/x86/x86_64/kernel-stacks @@ -18,9 +18,9 @@ specialized stacks contain no useful data. The main CPU stacks are: Used for external hardware interrupts. If this is the first external hardware interrupt (i.e. not a nested hardware interrupt) then the kernel switches from the current task to the interrupt stack. Like - the split thread and interrupt stacks on i386 (with CONFIG_4KSTACKS), - this gives more room for kernel interrupt processing without having - to increase the size of every per thread stack. + the split thread and interrupt stacks on i386, this gives more room + for kernel interrupt processing without having to increase the size + of every per thread stack. The interrupt stack is also used when processing a softirq. diff --git a/Documentation/x86/x86_64/machinecheck b/Documentation/x86/x86_64/machinecheck index a05e58e7b15..b1fb3027328 100644 --- a/Documentation/x86/x86_64/machinecheck +++ b/Documentation/x86/x86_64/machinecheck @@ -41,7 +41,9 @@ check_interval the polling interval. When the poller stops finding MCEs, it triggers an exponential backoff (poll less often) on the polling interval. The check_interval variable is both the initial and - maximum polling interval. + maximum polling interval. 0 means no polling for corrected machine + check errors (but some corrected errors might be still reported + in other ways) tolerant Tolerance level. When a machine check exception occurs for a non @@ -67,6 +69,10 @@ trigger Program to run when a machine check event is detected. This is an alternative to running mcelog regularly from cron and allows to detect events faster. +monarch_timeout + How long to wait for the other CPUs to machine check too on a + exception. 0 to disable waiting for other CPUs. + Unit: us TBD document entries for AMD threshold interrupt configuration diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt index 29b52b14d0b..afe68ddbe6a 100644 --- a/Documentation/x86/x86_64/mm.txt +++ b/Documentation/x86/x86_64/mm.txt @@ -6,13 +6,18 @@ Virtual memory map with 4 level page tables: 0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm hole caused by [48:63] sign extension ffff800000000000 - ffff80ffffffffff (=40 bits) guard hole -ffff880000000000 - ffffc0ffffffffff (=57 TB) direct mapping of all phys. memory -ffffc10000000000 - ffffc1ffffffffff (=40 bits) hole -ffffc20000000000 - ffffe1ffffffffff (=45 bits) vmalloc/ioremap space -ffffe20000000000 - ffffe2ffffffffff (=40 bits) virtual memory map (1TB) +ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory +ffffc80000000000 - ffffc8ffffffffff (=40 bits) hole +ffffc90000000000 - ffffe8ffffffffff (=45 bits) vmalloc/ioremap space +ffffe90000000000 - ffffe9ffffffffff (=40 bits) hole +ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB) +... unused hole ... +ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks ... unused hole ... ffffffff80000000 - ffffffffa0000000 (=512 MB) kernel text mapping, from phys 0 -ffffffffa0000000 - fffffffffff00000 (=1536 MB) module mapping space +ffffffffa0000000 - ffffffffff5fffff (=1525 MB) module mapping space +ffffffffff600000 - ffffffffffdfffff (=8 MB) vsyscalls +ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole The direct mapping covers all memory in the system up to the highest memory address (this means in some cases it can also include PCI memory @@ -25,4 +30,11 @@ reference. Current X86-64 implementations only support 40 bits of address space, but we support up to 46 bits. This expands into MBZ space in the page tables. +->trampoline_pgd: + +We map EFI runtime services in the aforementioned PGD in the virtual +range of 64Gb (arbitrarily set, can be raised if needed) + +0xffffffef00000000 - 0xffffffff00000000 + -Andi Kleen, Jul 2004 |
