Diffstat (limited to 'Documentation')
41 files changed, 2968 insertions, 282 deletions
diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index 1af0f2d5022..2ffb0d62f0f 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -33,7 +33,9 @@ pci_alloc_consistent(struct pci_dev *dev, size_t size,

 Consistent memory is memory for which a write by either the device or
 the processor can immediately be read by the processor or device
-without having to worry about caching effects.
+without having to worry about caching effects. (You may however need
+to make sure to flush the processor's write buffers before telling
+devices to read that memory.)

 This routine allocates a region of <size> bytes of consistent memory.
 it also returns a <dma_handle> which may be cast to an unsigned
@@ -304,12 +306,12 @@ dma address with dma_mapping_error(). A non zero return value means the mapping
 could not be created and the driver should take appropriate action (eg
 reduce current DMA mapping usage or delay and try again later).

-int
-dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
-        enum dma_data_direction direction)
-int
-pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg,
-        int nents, int direction)
+        int
+        dma_map_sg(struct device *dev, struct scatterlist *sg,
+                int nents, enum dma_data_direction direction)
+        int
+        pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg,
+                int nents, int direction)

 Maps a scatter gather list from the block layer.

@@ -327,12 +329,33 @@ critical that the driver do something, in the case of a block driver
 aborting the request or even oopsing is better than doing nothing and
 corrupting the filesystem.

-void
-dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nhwentries,
-        enum dma_data_direction direction)
-void
-pci_unmap_sg(struct pci_dev *hwdev, struct scatterlist *sg,
-        int nents, int direction)
+With scatterlists, you use the resulting mapping like this:
+
+        int i, count = dma_map_sg(dev, sglist, nents, direction);
+        struct scatterlist *sg;
+
+        for (i = 0, sg = sglist; i < count; i++, sg++) {
+                hw_address[i] = sg_dma_address(sg);
+                hw_len[i] = sg_dma_len(sg);
+        }
+
+where nents is the number of entries in the sglist.
+
+The implementation is free to merge several consecutive sglist entries
+into one (e.g. with an IOMMU, or if several pages just happen to be
+physically contiguous) and returns the actual number of sg entries it
+mapped them to. On failure 0, is returned.
+
+Then you should loop count times (note: this can be less than nents times)
+and use sg_dma_address() and sg_dma_len() macros where you previously
+accessed sg->address and sg->length as shown above.
+
+        void
+        dma_unmap_sg(struct device *dev, struct scatterlist *sg,
+                int nhwentries, enum dma_data_direction direction)
+        void
+        pci_unmap_sg(struct pci_dev *hwdev, struct scatterlist *sg,
+                int nents, int direction)

 unmap the previously mapped scatter/gather list. All the parameters
 must be the same as those and passed in to the scatter/gather mapping
diff --git a/Documentation/DMA-mapping.txt b/Documentation/DMA-mapping.txt
index ee4bb73683c..7c717699032 100644
--- a/Documentation/DMA-mapping.txt
+++ b/Documentation/DMA-mapping.txt
@@ -58,11 +58,15 @@ translating each of those pages back to a kernel address using
 something like __va(). [ EDIT: Update this when we integrate
 Gerd Knorr's generic code which does this. ]

-This rule also means that you may not use kernel image addresses
-(ie. items in the kernel's data/text/bss segment, or your driver's)
-nor may you use kernel stack addresses for DMA. Both of these items
-might be mapped somewhere entirely different than the rest of physical
-memory.
+This rule also means that you may use neither kernel image addresses
+(items in data/text/bss segments), nor module image addresses, nor
+stack addresses for DMA. These could all be mapped somewhere entirely
+different than the rest of physical memory. Even if those classes of
+memory could physically work with DMA, you'd need to ensure the I/O
+buffers were cacheline-aligned. Without that, you'd see cacheline
+sharing problems (data corruption) on CPUs with DMA-incoherent caches.
+(The CPU could write to one word, DMA would write to a different one
+in the same cache line, and one of them could be overwritten.)

 Also, this means that you cannot take the return of a kmap()
 call and DMA to/from that. This is similar to vmalloc().
@@ -194,7 +198,7 @@ document for how to handle this case.
 Finally, if your device can only drive the low 24-bits of
 address during PCI bus mastering you might do something like:

-        if (pci_set_dma_mask(pdev, 0x00ffffff)) {
+        if (pci_set_dma_mask(pdev, DMA_24BIT_MASK)) {
                 printk(KERN_WARNING
                        "mydev: 24-bit DMA addressing not available.\n");
                 goto ignore_this_device;
@@ -212,7 +216,7 @@ functions (for example a sound card provides playback and record
 functions) and the various different functions have _different_
 DMA addressing limitations, you may wish to probe each mask and
 only provide the functionality which the machine can handle. It
-is important that the last call to pci_set_dma_mask() be for the
+is important that the last call to pci_set_dma_mask() be for the
 most specific mask.

 Here is pseudo-code showing how this might be done:
@@ -284,6 +288,11 @@ There are two types of DMA mappings:

         in order to get correct behavior on all platforms.

+        Also, on some platforms your driver may need to flush CPU write
+        buffers in much the same way as it needs to flush write buffers
+        found in PCI bridges (such as by reading a register's value
+        after writing it).
+
 - Streaming DMA mappings which are usually mapped for one DMA transfer,
   unmapped right after it (unless you use pci_dma_sync_* below) and for which
   hardware can optimize for sequential accesses.
@@ -303,6 +312,9 @@ There are two types of DMA mappings:
 Neither type of DMA mapping has alignment restrictions that come from PCI,
 although some devices may have such restrictions.

+Also, systems with caches that aren't DMA-coherent will work better
+when the underlying buffers don't share cache lines with other data.
+

         Using Consistent DMA mappings.

diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile
index 7d87dd73cbe..5a2882d275b 100644
--- a/Documentation/DocBook/Makefile
+++ b/Documentation/DocBook/Makefile
@@ -2,7 +2,7 @@
 # This makefile is used to generate the kernel documentation,
 # primarily based on in-line comments in various source files.
 # See Documentation/kernel-doc-nano-HOWTO.txt for instruction in how
-# to ducument the SRC - and how to read it.
+# to document the SRC - and how to read it.
 # To add a new book the only step required is to add the book to the
 # list of DOCBOOKS.

diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl
index 8c9c6704e85..ca02e04a906 100644
--- a/Documentation/DocBook/kernel-api.tmpl
+++ b/Documentation/DocBook/kernel-api.tmpl
@@ -322,7 +322,6 @@ X!Earch/i386/kernel/mca.c
   <chapter id="sysfs">
      <title>The Filesystem for Exporting Kernel Objects</title>
 !Efs/sysfs/file.c
-!Efs/sysfs/dir.c
 !Efs/sysfs/symlink.c
 !Efs/sysfs/bin.c
   </chapter>
diff --git a/Documentation/DocBook/libata.tmpl b/Documentation/DocBook/libata.tmpl
index d260d92089a..f869b03929d 100644
--- a/Documentation/DocBook/libata.tmpl
+++ b/Documentation/DocBook/libata.tmpl
@@ -120,14 +120,27 @@ void (*dev_config) (struct ata_port *, struct ata_device *);
         <programlisting>
 void (*set_piomode) (struct ata_port *, struct ata_device *);
 void (*set_dmamode) (struct ata_port *, struct ata_device *);
-void (*post_set_mode) (struct ata_port *ap);
+void (*post_set_mode) (struct ata_port *);
+unsigned int (*mode_filter) (struct ata_port *, struct ata_device *, unsigned int);
         </programlisting>

         <para>
         Hooks called prior to the issue of SET FEATURES - XFER MODE
-        command. dev->pio_mode is guaranteed to be valid when
-        ->set_piomode() is called, and dev->dma_mode is guaranteed to be
-        valid when ->set_dmamode() is called. ->post_set_mode() is
+        command. The optional ->mode_filter() hook is called when libata
+        has built a mask of the possible modes. This is passed to the
+        ->mode_filter() function which should return a mask of valid modes
+        after filtering those unsuitable due to hardware limits. It is not
+        valid to use this interface to add modes.
+        </para>
+        <para>
+        dev->pio_mode and dev->dma_mode are guaranteed to be valid when
+        ->set_piomode() and when ->set_dmamode() is called. The timings for
+        any other drive sharing the cable will also be valid at this point.
+        That is the library records the decisions for the modes of each
+        drive on a channel before it attempts to set any of them.
+        </para>
+        <para>
+        ->post_set_mode() is
         called unconditionally, after the SET FEATURES - XFER MODE
         command completes successfully.
         </para>
@@ -230,6 +243,32 @@ void (*dev_select)(struct ata_port *ap, unsigned int device);

         </sect2>

+        <sect2><title>Private tuning method</title>
+        <programlisting>
+void (*set_mode) (struct ata_port *ap);
+        </programlisting>
+
+        <para>
+        By default libata performs drive and controller tuning in
+        accordance with the ATA timing rules and also applies blacklists
+        and cable limits. Some controllers need special handling and have
+        custom tuning rules, typically raid controllers that use ATA
+        commands but do not actually do drive timing.
+        </para>
+
+        <warning>
+        <para>
+        This hook should not be used to replace the standard controller
+        tuning logic when a controller has quirks. Replacing the default
+        tuning logic in that case would bypass handling for drive and
+        bridge quirks that may be important to data reliability. If a
+        controller needs to filter the mode selection it should use the
+        mode_filter hook instead.
+        </para>
+        </warning>
+
+        </sect2>
+
         <sect2><title>Reset ATA bus</title>
         <programlisting>
 void (*phy_reset) (struct ata_port *ap);
@@ -666,7 +705,7 @@ and other resources, etc.

         <sect1><title>ata_scsi_error()</title>
         <para>
-        ata_scsi_error() is the current hostt->eh_strategy_handler()
+        ata_scsi_error() is the current transportt->eh_strategy_handler()
         for libata. As discussed above, this will be entered in two
         cases - timeout and ATAPI error completion. This function
         calls low level libata driver's eng_timeout() callback, the
diff --git a/Documentation/acpi-hotkey.txt b/Documentation/acpi-hotkey.txt
index 744f1aec655..38040fa3764 100644
--- a/Documentation/acpi-hotkey.txt
+++ b/Documentation/acpi-hotkey.txt
@@ -30,7 +30,7 @@ specific hotkey(event))
 echo "event_num:event_type:event_argument" >
         /proc/acpi/hotkey/action.
 The result of the execution of this aml method is
-attached to /proc/acpi/hotkey/poll_method, which is dnyamically
+attached to /proc/acpi/hotkey/poll_method, which is dynamically
 created. Please use command "cat /proc/acpi/hotkey/polling_method"
 to retrieve it.

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 495858b236b..293fed113df 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -71,14 +71,6 @@ Who: Mauro Carvalho Chehab <mchehab@brturbo.com.br>

 ---------------------------

-What: remove EXPORT_SYMBOL(panic_timeout)
-When: April 2006
-Files: kernel/panic.c
-Why: No modular usage in the kernel.
-Who: Adrian Bunk <bunk@stusta.de>
-
----------------------------
-
 What: remove EXPORT_SYMBOL(insert_resource)
 When: April 2006
 Files: kernel/resource.c
@@ -127,13 +119,6 @@ Who: Christoph Hellwig <hch@lst.de>

 ---------------------------

-What: EXPORT_SYMBOL(lookup_hash)
-When: January 2006
-Why: Too low-level interface. Use lookup_one_len or lookup_create instead.
-Who: Christoph Hellwig <hch@lst.de>
-
----------------------------
-
 What: CONFIG_FORCED_INLINING
 When: June 2006
 Why: Config option is there to see if gcc is good enough. (in january
@@ -241,3 +226,15 @@ Why: The USB subsystem has changed a lot over time, and it has been
 Who: Greg Kroah-Hartman <gregkh@suse.de>

 ---------------------------
+
+What: find_trylock_page
+When: January 2007
+Why: The interface no longer has any callers left in the kernel. It
+ is an odd interface (compared with other find_*_page functions), in
+ that it does not take a refcount to the page, only the page lock.
+ It should be replaced with find_get_page or find_lock_page if possible.
+ This feature removal can be reevaluated if users of the interface
+ cannot cleanly use something else.
+Who: Nick Piggin <npiggin@suse.de>
+
+---------------------------
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index adaa899e5c9..3a2e5520c1e 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -694,7 +694,7 @@ struct file_operations
 ----------------------

 This describes how the VFS can manipulate an open file. As of kernel
-2.6.13, the following members are defined:
+2.6.17, the following members are defined:

 struct file_operations {
         loff_t (*llseek) (struct file *, loff_t, int);
@@ -723,6 +723,10 @@ struct file_operations {
         int (*check_flags)(int);
         int (*dir_notify)(struct file *filp, unsigned long arg);
         int (*flock) (struct file *, int, struct file_lock *);
+        ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, size_t, unsigned
+int);
+        ssize_t (*splice_read)(struct file *, struct pipe_inode_info *, size_t, unsigned
+int);
 };

 Again, all methods are called without any locks being held, unless
@@ -790,6 +794,12 @@ otherwise noted.

   flock: called by the flock(2) system call

+  splice_write: called by the VFS to splice data from a pipe to a file. This
+                method is used by the splice(2) system call
+
+  splice_read: called by the VFS to splice data from file to a pipe. This
+               method is used by the splice(2) system call
+
 Note that the file operations are implemented by the specific
 filesystem in which the inode resides. When opening a device node
 (character or block special) most filesystems will call special
diff --git a/Documentation/fujitsu/frv/kernel-ABI.txt b/Documentation/fujitsu/frv/kernel-ABI.txt
index 0ed9b0a779b..8b0a5fc8bfd 100644
--- a/Documentation/fujitsu/frv/kernel-ABI.txt
+++ b/Documentation/fujitsu/frv/kernel-ABI.txt
@@ -1,17 +1,19 @@
-        =================================
-        INTERNAL KERNEL ABI FOR FR-V ARCH
-        =================================
-
-The internal FRV kernel ABI is not quite the same as the userspace ABI. A number of the registers
-are used for special purposed, and the ABI is not consistent between modules vs core, and MMU vs
-no-MMU.
-
-This partly stems from the fact that FRV CPUs do not have a separate supervisor stack pointer, and
-most of them do not have any scratch registers, thus requiring at least one general purpose
-register to be clobbered in such an event. Also, within the kernel core, it is possible to simply
-jump or call directly between functions using a relative offset. This cannot be extended to modules
-for the displacement is likely to be too far. Thus in modules the address of a function to call
-must be calculated in a register and then used, requiring two extra instructions.
+        =================================
+        INTERNAL KERNEL ABI FOR FR-V ARCH
+        =================================
+
+The internal FRV kernel ABI is not quite the same as the userspace ABI. A
+number of the registers are used for special purposed, and the ABI is not
+consistent between modules vs core, and MMU vs no-MMU.
+
+This partly stems from the fact that FRV CPUs do not have a separate
+supervisor stack pointer, and most of them do not have any scratch
+registers, thus requiring at least one general purpose register to be
+clobbered in such an event. Also, within the kernel core, it is possible to
+simply jump or call directly between functions using a relative offset.
+This cannot be extended to modules for the displacement is likely to be too
+far. Thus in modules the address of a function to call must be calculated
+in a register and then used, requiring two extra instructions.

 This document has the following sections:

@@ -39,7 +41,8 @@ When a system call is made, the following registers are effective:
 CPU OPERATING MODES
 ===================

-The FR-V CPU has three basic operating modes. In order of increasing capability:
+The FR-V CPU has three basic operating modes. In order of increasing
+capability:

  (1) User mode.

@@ -47,42 +50,46 @@ The FR-V CPU has three basic operating modes. In order of increasing capability:

  (2) Kernel mode.

-     Normal kernel mode. There are many additional control registers available that may be
-     accessed in this mode, in addition to all the stuff available to user mode. This has two
-     submodes:
+     Normal kernel mode. There are many additional control registers
+     available that may be accessed in this mode, in addition to all the
+     stuff available to user mode. This has two submodes:

      (a) Exceptions enabled (PSR.T == 1).

-         Exceptions will invoke the appropriate normal kernel mode handler. On entry to the
-         handler, the PSR.T bit will be cleared.
+         Exceptions will invoke the appropriate normal kernel mode
+         handler. On entry to the handler, the PSR.T bit will be cleared.

      (b) Exceptions disabled (PSR.T == 0).

-         No exceptions or interrupts may happen. Any mandatory exceptions will cause the CPU to
-         halt unless the CPU is told to jump into debug mode instead.
+         No exceptions or interrupts may happen. Any mandatory exceptions
+         will cause the CPU to halt unless the CPU is told to jump into
+         debug mode instead.

  (3) Debug mode.

-     No exceptions may happen in this mode. Memory protection and management exceptions will be
-     flagged for later consideration, but the exception handler won't be invoked. Debugging traps
-     such as hardware breakpoints and watchpoints will be ignored. This mode is entered only by
-     debugging events obtained from the other two modes.
+     No exceptions may happen in this mode. Memory protection and
+     management exceptions will be flagged for later consideration, but
+     the exception handler won't be invoked. Debugging traps such as
+     hardware breakpoints and watchpoints will be ignored. This mode is
+     entered only by debugging events obtained from the other two modes.

-     All kernel mode registers may be accessed, plus a few extra debugging specific registers.
+     All kernel mode registers may be accessed, plus a few extra debugging
+     specific registers.



 =================================
 INTERNAL KERNEL-MODE REGISTER ABI
 =================================

-There are a number of permanent register assignments that are set up by entry.S in the exception
-prologue. Note that there is a complete set of exception prologues for each of user->kernel
-transition and kernel->kernel transition. There are also user->debug and kernel->debug mode
-transition prologues.
+There are a number of permanent register assignments that are set up by
+entry.S in the exception prologue. Note that there is a complete set of
+exception prologues for each of user->kernel transition and kernel->kernel
+transition. There are also user->debug and kernel->debug mode transition
+prologues.


         REGISTER        FLAVOUR USE
-        =============== ======= ====================================================
+        =============== ======= ==============================================
         GR1                     Supervisor stack pointer
         GR15                    Current thread info pointer
         GR16                    GP-Rel base register for small data
@@ -92,10 +99,12 @@ transition prologues.
         GR31            NOMMU   Destroyed by debug mode entry
         GR31            MMU     Destroyed by TLB miss kernel mode entry
         CCR.ICC2                Virtual interrupt disablement tracking
-        CCCR.CC3                Cleared by exception prologue (atomic op emulation)
+        CCCR.CC3                Cleared by exception prologue
+                                (atomic op emulation)
         SCR0            MMU     See mmu-layout.txt.
         SCR1            MMU     See mmu-layout.txt.
-        SCR2            MMU     Save for EAR0 (destroyed by icache insns in debug mode)
+        SCR2            MMU     Save for EAR0 (destroyed by icache insns
+                                in debug mode)
         SCR3            MMU     Save for GR31 during debug exceptions
         DAMR/IAMR       NOMMU   Fixed memory protection layout.
         DAMR/IAMR       MMU     See mmu-layout.txt.
@@ -104,18 +113,21 @@ transition prologues.
 Certain registers are also used or modified across function calls:

         REGISTER        CALL                            RETURN
-        =============== =============================== ===============================
+        =============== =============================== ======================
         GR0             Fixed Zero                      -
         GR2             Function call frame pointer
         GR3             Special                         Preserved
         GR3-GR7         -                               Clobbered
-        GR8             Function call arg #1            Return value (or clobbered)
-        GR9             Function call arg #2            Return value MSW (or clobbered)
+        GR8             Function call arg #1            Return value
+                                                        (or clobbered)
+        GR9             Function call arg #2            Return value MSW
+                                                        (or clobbered)
         GR10-GR13       Function call arg #3-#6         Clobbered
         GR14            -                               Clobbered
         GR15-GR16       Special                         Preserved
         GR17-GR27       -                               Preserved
-        GR28-GR31       Special                         Only accessed explicitly
+        GR28-GR31       Special                         Only accessed
+                                                        explicitly
         LR              Return address after CALL       Clobbered
         CCR/CCCR        -                               Mostly Clobbered

@@ -124,46 +136,53 @@ Certain registers are also used or modified across function calls:
 INTERNAL DEBUG-MODE REGISTER ABI
 ================================

-This is the same as the kernel-mode register ABI for functions calls. The difference is that in
-debug-mode there's a different stack and a different exception frame. Almost all the global
-registers from kernel-mode (including the stack pointer) may be changed.
+This is the same as the kernel-mode register ABI for functions calls. The
+difference is that in debug-mode there's a different stack and a different
+exception frame. Almost all the global registers from kernel-mode
+(including the stack pointer) may be changed.

         REGISTER        FLAVOUR USE
-        =============== ======= ====================================================
+        =============== ======= ==============================================
         GR1                     Debug stack pointer
         GR16                    GP-Rel base register for small data
-        GR31                    Current debug exception frame pointer (__debug_frame)
+        GR31                    Current debug exception frame pointer
+                                (__debug_frame)
         SCR3            MMU     Saved value of GR31

-Note that debug mode is able to interfere with the kernel's emulated atomic ops, so it must be
-exceedingly careful not to do any that would interact with the main kernel in this regard. Hence
-the debug mode code (gdbstub) is almost completely self-contained. The only external code used is
-the sprintf family of functions.
+Note that debug mode is able to interfere with the kernel's emulated atomic
+ops, so it must be exceedingly careful not to do any that would interact
+with the main kernel in this regard. Hence the debug mode code (gdbstub) is
+almost completely self-contained. The only external code used is the
+sprintf family of functions.

-Futhermore, break.S is so complicated because single-step mode does not switch off on entry to an
-exception. That means unless manually disabled, single-stepping will blithely go on stepping into
-things like interrupts. See gdbstub.txt for more information.
+Futhermore, break.S is so complicated because single-step mode does not
+switch off on entry to an exception. That means unless manually disabled,
+single-stepping will blithely go on stepping into things like interrupts.
+See gdbstub.txt for more information.



 ==========================
 VIRTUAL INTERRUPT HANDLING
 ==========================

-Because accesses to the PSR is so slow, and to disable interrupts we have to access it twice (once
-to read and once to write), we don't actually disable interrupts at all if we don't have to. What
-we do instead is use the ICC2 condition code flags to note virtual disablement, such that if we
-then do take an interrupt, we note the flag, really disable interrupts, set another flag and resume
-execution at the point the interrupt happened. Setting condition flags as a side effect of an
-arithmetic or logical instruction is really fast. This use of the ICC2 only occurs within the
+Because accesses to the PSR is so slow, and to disable interrupts we have
+to access it twice (once to read and once to write), we don't actually
+disable interrupts at all if we don't have to. What we do instead is use
+the ICC2 condition code flags to note virtual disablement, such that if we
+then do take an interrupt, we note the flag, really disable interrupts, set
+another flag and resume execution at the point the interrupt happened.
+Setting condition flags as a side effect of an arithmetic or logical
+instruction is really fast. This use of the ICC2 only occurs within the
 kernel - it does not affect userspace.

 The flags we use are:

  (*) CCR.ICC2.Z [Zero flag]

-     Set to virtually disable interrupts, clear when interrupts are virtually enabled. Can be
-     modified by logical instructions without affecting the Carry flag.
+     Set to virtually disable interrupts, clear when interrupts are
+     virtually enabled. Can be modified by logical instructions without
+     affecting the Carry flag.

  (*) CCR.ICC2.C [Carry flag]

@@ -176,8 +195,9 @@ What happens is this:

         ICC2.Z is 0, ICC2.C is 1.

- (2) An interrupt occurs. The exception prologue examines ICC2.Z and determines that nothing needs
-     doing. This is done simply with an unlikely BEQ instruction.
+ (2) An interrupt occurs. The exception prologue examines ICC2.Z and
+     determines that nothing needs doing. This is done simply with an
+     unlikely BEQ instruction.

  (3) The interrupts are disabled (local_irq_disable)

@@ -187,48 +207,56 @@ What happens is this:

         ICC2.Z would be set to 0.

-     A TIHI #2 instruction (trap #2 if condition HI - Z==0 && C==0) would be used to trap if
-     interrupts were now virtually enabled, but physically disabled - which they're not, so the
-     trap isn't taken. The kernel would then be back to state (1).
+     A TIHI #2 instruction (trap #2 if condition HI - Z==0 && C==0) would
+     be used to trap if interrupts were now virtually enabled, but
+     physically disabled - which they're not, so the trap isn't taken. The
+     kernel would then be back to state (1).

- (5) An interrupt occurs. The exception prologue examines ICC2.Z and determines that the interrupt
-     shouldn't actually have happened. It jumps aside, and there disabled interrupts by setting
-     PSR.PIL to 14 and then it clears ICC2.C.
+ (5) An interrupt occurs. The exception prologue examines ICC2.Z and
+     determines that the interrupt shouldn't actually have happened. It
+     jumps aside, and there disabled interrupts by setting PSR.PIL to 14
+     and then it clears ICC2.C.

  (6) If interrupts were then saved and disabled again (local_irq_save):

-        ICC2.Z would be shifted into the save variable and masked off (giving a 1).
+        ICC2.Z would be |