diff options
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/DocBook/kernel-locking.tmpl | 14 | ||||
-rw-r--r-- | Documentation/RCU/checklist.txt | 46 | ||||
-rw-r--r-- | Documentation/RCU/stallwarn.txt | 18 | ||||
-rw-r--r-- | Documentation/RCU/trace.txt | 13 | ||||
-rw-r--r-- | Documentation/cputopology.txt | 23 | ||||
-rw-r--r-- | Documentation/feature-removal-schedule.txt | 28 | ||||
-rw-r--r-- | Documentation/kernel-parameters.txt | 11 | ||||
-rw-r--r-- | Documentation/kprobes.txt | 8 |
8 files changed, 108 insertions, 53 deletions
diff --git a/Documentation/DocBook/kernel-locking.tmpl b/Documentation/DocBook/kernel-locking.tmpl index a0d479d1e1d..f66f4df1869 100644 --- a/Documentation/DocBook/kernel-locking.tmpl +++ b/Documentation/DocBook/kernel-locking.tmpl @@ -1645,7 +1645,9 @@ the amount of locking which needs to be done. all the readers who were traversing the list when we deleted the element are finished. We use <function>call_rcu()</function> to register a callback which will actually destroy the object once - the readers are finished. + all pre-existing readers are finished. Alternatively, + <function>synchronize_rcu()</function> may be used to block until + all pre-existing are finished. </para> <para> But how does Read Copy Update know when the readers are @@ -1714,7 +1716,7 @@ the amount of locking which needs to be done. - object_put(obj); + list_del_rcu(&obj->list); cache_num--; -+ call_rcu(&obj->rcu, cache_delete_rcu, obj); ++ call_rcu(&obj->rcu, cache_delete_rcu); } /* Must be holding cache_lock */ @@ -1725,14 +1727,6 @@ the amount of locking which needs to be done. if (++cache_num > MAX_CACHE_SIZE) { struct object *i, *outcast = NULL; list_for_each_entry(i, &cache, list) { -@@ -85,6 +94,7 @@ - obj->popularity = 0; - atomic_set(&obj->refcnt, 1); /* The cache holds a reference */ - spin_lock_init(&obj->lock); -+ INIT_RCU_HEAD(&obj->rcu); - - spin_lock_irqsave(&cache_lock, flags); - __cache_add(obj); @@ -104,12 +114,11 @@ struct object *cache_find(int id) { diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt index 790d1a81237..0c134f8afc6 100644 --- a/Documentation/RCU/checklist.txt +++ b/Documentation/RCU/checklist.txt @@ -218,13 +218,22 @@ over a rather long period of time, but improvements are always welcome! include: a. Keeping a count of the number of data-structure elements - used by the RCU-protected data structure, including those - waiting for a grace period to elapse. Enforce a limit - on this number, stalling updates as needed to allow - previously deferred frees to complete. - - Alternatively, limit only the number awaiting deferred - free rather than the total number of elements. + used by the RCU-protected data structure, including + those waiting for a grace period to elapse. Enforce a + limit on this number, stalling updates as needed to allow + previously deferred frees to complete. Alternatively, + limit only the number awaiting deferred free rather than + the total number of elements. + + One way to stall the updates is to acquire the update-side + mutex. (Don't try this with a spinlock -- other CPUs + spinning on the lock could prevent the grace period + from ever ending.) Another way to stall the updates + is for the updates to use a wrapper function around + the memory allocator, so that this wrapper function + simulates OOM when there is too much memory awaiting an + RCU grace period. There are of course many other + variations on this theme. b. Limiting update rate. For example, if updates occur only once per hour, then no explicit rate limiting is required, @@ -365,3 +374,26 @@ over a rather long period of time, but improvements are always welcome! and the compiler to freely reorder code into and out of RCU read-side critical sections. It is the responsibility of the RCU update-side primitives to deal with this. + +17. Use CONFIG_PROVE_RCU, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and + the __rcu sparse checks to validate your RCU code. These + can help find problems as follows: + + CONFIG_PROVE_RCU: check that accesses to RCU-protected data + structures are carried out under the proper RCU + read-side critical section, while holding the right + combination of locks, or whatever other conditions + are appropriate. + + CONFIG_DEBUG_OBJECTS_RCU_HEAD: check that you don't pass the + same object to call_rcu() (or friends) before an RCU + grace period has elapsed since the last time that you + passed that same object to call_rcu() (or friends). + + __rcu sparse checks: tag the pointer to the RCU-protected data + structure with __rcu, and sparse will warn you if you + access that pointer without the services of one of the + variants of rcu_dereference(). + + These debugging aids can help you find problems that are + otherwise extremely difficult to spot. diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt index 44c6dcc93d6..862c08ef1fd 100644 --- a/Documentation/RCU/stallwarn.txt +++ b/Documentation/RCU/stallwarn.txt @@ -80,6 +80,24 @@ o A CPU looping with bottom halves disabled. This condition can o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel without invoking schedule(). +o A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might + happen to preempt a low-priority task in the middle of an RCU + read-side critical section. This is especially damaging if + that low-priority task is not permitted to run on any other CPU, + in which case the next RCU grace period can never complete, which + will eventually cause the system to run out of memory and hang. + While the system is in the process of running itself out of + memory, you might see stall-warning messages. + +o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that + is running at a higher priority than the RCU softirq threads. + This will prevent RCU callbacks from ever being invoked, + and in a CONFIG_TREE_PREEMPT_RCU kernel will further prevent + RCU grace periods from ever completing. Either way, the + system will eventually run out of memory and hang. In the + CONFIG_TREE_PREEMPT_RCU case, you might see stall-warning + messages. + o A bug in the RCU implementation. o A hardware failure. This is quite unlikely, but has occurred diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt index efd8cc95c06..a851118775d 100644 --- a/Documentation/RCU/trace.txt +++ b/Documentation/RCU/trace.txt @@ -125,6 +125,17 @@ o "b" is the batch limit for this CPU. If more than this number of RCU callbacks is ready to invoke, then the remainder will be deferred. +o "ci" is the number of RCU callbacks that have been invoked for + this CPU. Note that ci+ql is the number of callbacks that have + been registered in absence of CPU-hotplug activity. + +o "co" is the number of RCU callbacks that have been orphaned due to + this CPU going offline. + +o "ca" is the number of RCU callbacks that have been adopted due to + other CPUs going offline. Note that ci+co-ca+ql is the number of + RCU callbacks registered on this CPU. + There is also an rcu/rcudata.csv file with the same information in comma-separated-variable spreadsheet format. @@ -180,7 +191,7 @@ o "s" is the "signaled" state that drives force_quiescent_state()'s o "jfq" is the number of jiffies remaining for this grace period before force_quiescent_state() is invoked to help push things - along. Note that CPUs in dyntick-idle mode thoughout the grace + along. Note that CPUs in dyntick-idle mode throughout the grace period will not report on their own, but rather must be check by some other CPU via force_quiescent_state(). diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt index f1c5c4bccd3..902d3151f52 100644 --- a/Documentation/cputopology.txt +++ b/Documentation/cputopology.txt @@ -14,25 +14,39 @@ to /proc/cpuinfo. identifier (rather than the kernel's). The actual value is architecture and platform dependent. -3) /sys/devices/system/cpu/cpuX/topology/thread_siblings: +3) /sys/devices/system/cpu/cpuX/topology/book_id: + + the book ID of cpuX. Typically it is the hardware platform's + identifier (rather than the kernel's). The actual value is + architecture and platform dependent. + +4) /sys/devices/system/cpu/cpuX/topology/thread_siblings: internel kernel map of cpuX's hardware threads within the same core as cpuX -4) /sys/devices/system/cpu/cpuX/topology/core_siblings: +5) /sys/devices/system/cpu/cpuX/topology/core_siblings: internal kernel map of cpuX's hardware threads within the same physical_package_id. +6) /sys/devices/system/cpu/cpuX/topology/book_siblings: + + internal kernel map of cpuX's hardware threads within the same + book_id. + To implement it in an architecture-neutral way, a new source file, -drivers/base/topology.c, is to export the 4 attributes. +drivers/base/topology.c, is to export the 4 or 6 attributes. The two book +related sysfs files will only be created if CONFIG_SCHED_BOOK is selected. For an architecture to support this feature, it must define some of these macros in include/asm-XXX/topology.h: #define topology_physical_package_id(cpu) #define topology_core_id(cpu) +#define topology_book_id(cpu) #define topology_thread_cpumask(cpu) #define topology_core_cpumask(cpu) +#define topology_book_cpumask(cpu) The type of **_id is int. The type of siblings is (const) struct cpumask *. @@ -45,6 +59,9 @@ not defined by include/asm-XXX/topology.h: 3) thread_siblings: just the given CPU 4) core_siblings: just the given CPU +For architectures that don't support books (CONFIG_SCHED_BOOK) there are no +default definitions for topology_book_id() and topology_book_cpumask(). + Additionally, CPU topology information is provided under /sys/devices/system/cpu and includes these files. The internal source for the output is in brackets ("[]"). diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 842aa9de84a..5e2bc4ab897 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -386,34 +386,6 @@ Who: Tejun Heo <tj@kernel.org> ---------------------------- -What: Support for VMware's guest paravirtuliazation technique [VMI] will be - dropped. -When: 2.6.37 or earlier. -Why: With the recent innovations in CPU hardware acceleration technologies - from Intel and AMD, VMware ran a few experiments to compare these - techniques to guest paravirtualization technique on VMware's platform. - These hardware assisted virtualization techniques have outperformed the - performance benefits provided by VMI in most of the workloads. VMware - expects that these hardware features will be ubiquitous in a couple of - years, as a result, VMware has started a phased retirement of this - feature from the hypervisor. We will be removing this feature from the - Kernel too. Right now we are targeting 2.6.37 but can retire earlier if - technical reasons (read opportunity to remove major chunk of pvops) - arise. - - Please note that VMI has always been an optimization and non-VMI kernels - still work fine on VMware's platform. - Latest versions of VMware's product which support VMI are, - Workstation 7.0 and VSphere 4.0 on ESX side, future maintainence - releases for these products will continue supporting VMI. - - For more details about VMI retirement take a look at this, - http://blogs.vmware.com/guestosguide/2009/09/vmi-retirement.html - -Who: Alok N Kataria <akataria@vmware.com> - ----------------------------- - What: Support for lcd_switch and display_get in asus-laptop driver When: March 2010 Why: These two features use non-standard interfaces. There are the diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 8dd7248508a..3a0009e03d1 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -455,7 +455,7 @@ and is between 256 and 4096 characters. It is defined in the file [ARM] imx_timer1,OSTS,netx_timer,mpu_timer2, pxa_timer,timer3,32k_counter,timer0_1 [AVR32] avr32 - [X86-32] pit,hpet,tsc,vmi-timer; + [X86-32] pit,hpet,tsc; scx200_hrt on Geode; cyclone on IBM x440 [MIPS] MIPS [PARISC] cr16 @@ -2153,6 +2153,11 @@ and is between 256 and 4096 characters. It is defined in the file Reserves a hole at the top of the kernel virtual address space. + reservelow= [X86] + Format: nn[K] + Set the amount of memory to reserve for BIOS at + the bottom of the address space. + reset_devices [KNL] Force drivers to reset the underlying device during initialization. @@ -2435,6 +2440,10 @@ and is between 256 and 4096 characters. It is defined in the file disables clocksource verification at runtime. Used to enable high-resolution timer mode on older hardware, and in virtualized environment. + [x86] noirqtime: Do not use TSC to do irq accounting. + Used to run time disable IRQ_TIME_ACCOUNTING on any + platforms where RDTSC is slow and this accounting + can add overhead. turbografx.map[2|3]= [HW,JOY] TurboGraFX parallel port interface diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt index 1762b81fcdf..741fe66d6ec 100644 --- a/Documentation/kprobes.txt +++ b/Documentation/kprobes.txt @@ -542,9 +542,11 @@ Kprobes does not use mutexes or allocate memory except during registration and unregistration. Probe handlers are run with preemption disabled. Depending on the -architecture, handlers may also run with interrupts disabled. In any -case, your handler should not yield the CPU (e.g., by attempting to -acquire a semaphore). +architecture and optimization state, handlers may also run with +interrupts disabled (e.g., kretprobe handlers and optimized kprobe +handlers run without interrupt disabled on x86/x86-64). In any case, +your handler should not yield the CPU (e.g., by attempting to acquire +a semaphore). Since a return probe is implemented by replacing the return address with the trampoline's address, stack backtraces and calls |