aboutsummaryrefslogtreecommitdiff
path: root/Documentation
AgeCommit message (Collapse)Author
2009-06-16console: make blank timeout value a boot optionDaniel Mack
The console blank timer is currently hardcoded to 10*60 seconds which might be annoying on systems with no input devices attached to wake up the console again. Especially during development, disabling the screen saver can be handy - for example when debugging the root fs mount mechanism or other scenarios where no userspace program could be started to do that at runtime from userspace. This patch defines a core_param for the variable in charge which allows users to entirely disable the blank feature at boot time by setting it 0. The value can still be overwritten at runtime using the standard ioctl call - this just allows to conditionally change the default. Signed-off-by: Daniel Mack <daniel@caiaq.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16Documentation/atomic_ops.txt: fix sample codeFigo.zhang
list_add() lost a parameter in sample code. Signed-off-by: Figo.zhang <figo1802@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16Documentation/accounting/getdelays.c intialize the variable before using itJaswinder Singh Rajput
Fix compilation warning: Documentation/accounting/getdelays.c: In function `main': Documentation/accounting/getdelays.c:249: warning: `cmd_type' may be used uninitialized in this function This is in fact a false positive. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Acked-by: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16vmscan: properly account for the number of page cache pages zone_reclaim() ↵Mel Gorman
can reclaim A bug was brought to my attention against a distro kernel but it affects mainline and I believe problems like this have been reported in various guises on the mailing lists although I don't have specific examples at the moment. The reported problem was that malloc() stalled for a long time (minutes in some cases) if a large tmpfs mount was occupying a large percentage of memory overall. The pages did not get cleaned or reclaimed by zone_reclaim() because the zone_reclaim_mode was unsuitable, but the lists are uselessly scanned frequencly making the CPU spin at near 100%. This patchset intends to address that bug and bring the behaviour of zone_reclaim() more in line with expectations which were noticed during investigation. It is based on top of mmotm and takes advantage of Kosaki's work with respect to zone_reclaim(). Patch 1 fixes the heuristics that zone_reclaim() uses to determine if the scan should go ahead. The broken heuristic is what was causing the malloc() stall as it uselessly scanned the LRU constantly. Currently, zone_reclaim is assuming zone_reclaim_mode is 1 and historically it could not deal with tmpfs pages at all. This fixes up the heuristic so that an unnecessary scan is more likely to be correctly avoided. Patch 2 notes that zone_reclaim() returning a failure automatically means the zone is marked full. This is not always true. It could have failed because the GFP mask or zone_reclaim_mode were unsuitable. Patch 3 introduces a counter zreclaim_failed that will increment each time the zone_reclaim scan-avoidance heuristics fail. If that counter is rapidly increasing, then zone_reclaim_mode should be set to 0 as a temporarily resolution and a bug reported because the scan-avoidance heuristic is still broken. This patch: On NUMA machines, the administrator can configure zone_reclaim_mode that is a more targetted form of direct reclaim. On machines with large NUMA distances for example, a zone_reclaim_mode defaults to 1 meaning that clean unmapped pages will be reclaimed if the zone watermarks are not being met. There is a heuristic that determines if the scan is worthwhile but the problem is that the heuristic is not being properly applied and is basically assuming zone_reclaim_mode is 1 if it is enabled. The lack of proper detection can manfiest as high CPU usage as the LRU list is scanned uselessly. Historically, once enabled it was depending on NR_FILE_PAGES which may include swapcache pages that the reclaim_mode cannot deal with. Patch vmscan-change-the-number-of-the-unmapped-files-in-zone-reclaim.patch by Kosaki Motohiro noted that zone_page_state(zone, NR_FILE_PAGES) included pages that were not file-backed such as swapcache and made a calculation based on the inactive, active and mapped files. This is far superior when zone_reclaim==1 but if RECLAIM_SWAP is set, then NR_FILE_PAGES is a reasonable starting figure. This patch alters how zone_reclaim() works out how many pages it might be able to reclaim given the current reclaim_mode. If RECLAIM_SWAP is set in the reclaim_mode it will either consider NR_FILE_PAGES as potential candidates or else use NR_{IN}ACTIVE}_PAGES-NR_FILE_MAPPED to discount swapcache and other non-file-backed pages. If RECLAIM_WRITE is not set, then NR_FILE_DIRTY number of pages are not candidates. If RECLAIM_SWAP is not set, then NR_FILE_MAPPED are not. [kosaki.motohiro@jp.fujitsu.com: Estimate unmapped pages minus tmpfs pages] [fengguang.wu@intel.com: Fix underflow problem in Kosaki's estimate] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16oom: move oom_adj value from task_struct to mm_structDavid Rientjes
The per-task oom_adj value is a characteristic of its mm more than the task itself since it's not possible to oom kill any thread that shares the mm. If a task were to be killed while attached to an mm that could not be freed because another thread were set to OOM_DISABLE, it would have needlessly been terminated since there is no potential for future memory freeing. This patch moves oomkilladj (now more appropriately named oom_adj) from struct task_struct to struct mm_struct. This requires task_lock() on a task to check its oom_adj value to protect against exec, but it's already necessary to take the lock when dereferencing the mm to find the total VM size for the badness heuristic. This fixes a livelock if the oom killer chooses a task and another thread sharing the same memory has an oom_adj value of OOM_DISABLE. This occurs because oom_kill_task() repeatedly returns 1 and refuses to kill the chosen task while select_bad_process() will repeatedly choose the same task during the next retry. Taking task_lock() in select_bad_process() to check for OOM_DISABLE and in oom_kill_task() to check for threads sharing the same memory will be removed in the next patch in this series where it will no longer be necessary. Writing to /proc/pid/oom_adj for a kthread will now return -EINVAL since these threads are immune from oom killing already. They simply report an oom_adj value of OOM_DISABLE. Cc: Nick Piggin <npiggin@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16pagemap: add page-types toolWu Fengguang
Add page-types, a handy tool for querying page flags. It will expand some of the overloaded flags: PG_slob_free = PG_private PG_slub_frozen = PG_active PG_slub_debug = PG_error PG_readahead = PG_reclaim and mask out obscure flags except in -raw mode: PG_reserved PG_mlocked PG_mappedtodisk PG_private PG_private_2 PG_owner_priv_1 PG_arch_1 PG_uncached PG_compound* for non hugeTLB pages [akpm@linux-foundation.org: fix warning] Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16pagemap: document 9 more exported page flagsWu Fengguang
Also add short descriptions for all of the 20 exported page flags. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16pagemap: document clarificationsWu Fengguang
Some bit ranges were inclusive and some not. Fix them to be consistently inclusive. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16page allocator: use allocation flags as an index to the zone watermarkMel Gorman
ALLOC_WMARK_MIN, ALLOC_WMARK_LOW and ALLOC_WMARK_HIGH determin whether pages_min, pages_low or pages_high is used as the zone watermark when allocating the pages. Two branches in the allocator hotpath determine which watermark to use. This patch uses the flags as an array index into a watermark array that is indexed with WMARK_* defines accessed via helpers. All call sites that use zone->pages_* are updated to use the helpers for accessing the values and the array offsets for setting. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1244 commits) pkt_sched: Rename PSCHED_US2NS and PSCHED_NS2US ipv4: Fix fib_trie rebalancing Bluetooth: Fix issue with uninitialized nsh.type in DTL-1 driver Bluetooth: Fix Kconfig issue with RFKILL integration PIM-SM: namespace changes ipv4: update ARPD help text net: use a deferred timer in rt_check_expire ieee802154: fix kconfig bool/tristate muckup bonding: initialization rework bonding: use is_zero_ether_addr bonding: network device names are case sensative bonding: elminate bad refcount code bonding: fix style issues bonding: fix destructor bonding: remove bonding read/write semaphore bonding: initialize before registration bonding: bond_create always called with default parameters x_tables: Convert printk to pr_err netfilter: conntrack: optional reliable conntrack event delivery list_nulls: add hlist_nulls_add_head and hlist_nulls_del ...
2009-06-15Merge branch 'merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (103 commits) powerpc: Fix bug in move of altivec code to vector.S powerpc: Add support for swiotlb on 32-bit powerpc/spufs: Remove unused error path powerpc: Fix warning when printing a resource_size_t powerpc/xmon: Remove unused variable in xmon.c powerpc/pseries: Fix warnings when printing resource_size_t powerpc: Shield code specific to 64-bit server processors powerpc: Separate PACA fields for server CPUs powerpc: Split exception handling out of head_64.S powerpc: Introduce CONFIG_PPC_BOOK3S powerpc: Move VMX and VSX asm code to vector.S powerpc: Set init_bootmem_done on NUMA platforms as well powerpc/mm: Fix a AB->BA deadlock scenario with nohash MMU context lock powerpc/mm: Fix some SMP issues with MMU context handling powerpc: Add PTRACE_SINGLEBLOCK support fbdev: Add PLB support and cleanup DCR in xilinxfb driver. powerpc/virtex: Add ml510 reference design device tree powerpc/virtex: Add Xilinx ML510 reference design support powerpc/virtex: refactor intc driver and add support for i8259 cascading powerpc/virtex: Add support for Xilinx PCI host bridge ...
2009-06-15Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (22 commits) nilfs2: support contiguous lookup of blocks nilfs2: add sync_page method to page caches of meta data nilfs2: use device's backing_dev_info for btree node caches nilfs2: return EBUSY against delete request on snapshot nilfs2: modify list of unsupported features in caveats nilfs2: enable sync_page method nilfs2: set bio unplug flag for the last bio in segment nilfs2: allow future expansion of metadata read out via get info ioctl NILFS2: Pagecache usage optimization on NILFS2 nilfs2: remove nilfs_btree_operations from btree mapping nilfs2: remove nilfs_direct_operations from direct mapping nilfs2: remove bmap pointer operations nilfs2: remove useless b_low and b_high fields from nilfs_bmap struct nilfs2: remove pointless NULL check of bpop_commit_alloc_ptr function nilfs2: move get block functions in bmap.c into btree codes nilfs2: remove nilfs_bmap_delete_block nilfs2: remove nilfs_bmap_put_block nilfs2: remove header file for segment list operations nilfs2: eliminate removal list of segments nilfs2: add sufile function that can modify multiple segment usages ...
2009-06-15Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: Documentation/feature-removal-schedule.txt drivers/scsi/fcoe/fcoe.c net/core/drop_monitor.c net/core/net-traces.c
2009-06-14Merge branch 'master' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (53 commits) .gitignore: ignore *.lzma files kbuild: add generic --set-str option to scripts/config kbuild: simplify argument loop in scripts/config kbuild: handle non-existing options in scripts/config kallsyms: generalize text region handling kallsyms: support kernel symbols in Blackfin on-chip memory documentation: make version fix kbuild: fix a compile warning gitignore: Add GNU GLOBAL files to top .gitignore kbuild: fix delay in setlocalversion on readonly source README: fix misleading pointer to the defconf directory vmlinux.lds.h update kernel-doc: cleanup perl script Improve vmlinux.lds.h support for arch specific linker scripts kbuild: fix headers_exports with boolean expression kbuild/headers_check: refine extern check kbuild: fix "Argument list too long" error for "make headers_check", ignore *.patch files Remove bashisms from scripts menu: fix embedded menu presentation ...
2009-06-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (31 commits) trivial: remove the trivial patch monkey's name from SubmittingPatches trivial: Fix a typo in comment of addrconf_dad_start() trivial: usb: fix missing space typo in doc trivial: pci hotplug: adding __init/__exit macros to sgi_hotplug trivial: Remove the hyphen from git commands trivial: fix ETIMEOUT -> ETIMEDOUT typos trivial: Kconfig: .ko is normally not included in module names trivial: SubmittingPatches: fix typo trivial: Documentation/dell_rbu.txt: fix typos trivial: Fix Pavel's address in MAINTAINERS trivial: ftrace:fix description of trace directory trivial: unnecessary (void*) cast removal in sound/oss/msnd.c trivial: input/misc: Fix typo in Kconfig trivial: fix grammo in bus_for_each_dev() kerneldoc trivial: rbtree.txt: fix rb_entry() parameters in sample code trivial: spelling fix in ppc code comments trivial: fix typo in bio_alloc kernel doc trivial: Documentation/rbtree.txt: cleanup kerneldoc of rbtree.txt trivial: Miscellaneous documentation typo fixes trivial: fix typo milisecond/millisecond for documentation and source comments. ...
2009-06-14Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-armLinus Torvalds
* 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (417 commits) MAINTAINERS: EB110ATX is not ebsa110 MAINTAINERS: update Eric Miao's email address and status fb: add support of LCD display controller on pxa168/910 (base layer) [ARM] 5552/1: ep93xx get_uart_rate(): use EP93XX_SYSCON_PWRCNT and EP93XX_SYSCON_PWRCN [ARM] pxa/sharpsl_pm: zaurus needs generic pxa suspend/resume routines [ARM] 5544/1: Trust PrimeCell resource sizes [ARM] pxa/sharpsl_pm: cleanup of gpio-related code. [ARM] pxa/sharpsl_pm: drop set_irq_type calls [ARM] pxa/sharpsl_pm: merge pxa-specific code into generic one [ARM] pxa/sharpsl_pm: merge the two sharpsl_pm.c since it's now pxa specific [ARM] sa1100: remove unused collie_pm.c [ARM] pxa: fix the conflicting non-static declarations of global_gpios[] [ARM] 5550/1: Add default configure file for w90p910 platform [ARM] 5549/1: Add clock api for w90p910 platform. [ARM] 5548/1: Add gpio api for w90p910 platform [ARM] 5551/1: Add multi-function pin api for w90p910 platform. [ARM] Make ARM_VIC_NR depend on ARM_VIC [ARM] 5546/1: ARM PL022 SSP/SPI driver v3 ARM: OMAP4: SMP: Update defconfig for OMAP4430 ARM: OMAP4: SMP: Enable SMP support for OMAP4430 ...
2009-06-14documentation: make version fixAdam Lackorzynski
The Makefiles in the build directories use the internal make variable MAKEFILE_LIST which is available from make 3.80 only. (The patch would be valid back to 2.6.25) Signed-off-by: Adam Lackorzynski <adam@os.inf.tu-dresden.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2009-06-13Merge branch 'next-i2c' of git://aeryn.fluff.org.uk/bjdooks/linuxLinus Torvalds
* 'next-i2c' of git://aeryn.fluff.org.uk/bjdooks/linux: i2c-ocores: Can add I2C devices to the bus i2c-s3c2410: move to using platform idtable to match devices i2c: OMAP3: Better noise suppression for fast/standard modes i2c: OMAP2/3: Fix scll/sclh calculations i2c: Blackfin TWI: implement I2C_FUNC_SMBUS_I2C_BLOCK functionality i2c: Blackfin TWI: fix transfer errors with repeat start i2c: Blackfin TWI: fix REPEAT START mode doesn't repeat i2c: Blackfin TWI: make sure we don't end up with a CLKDIV=0
2009-06-13Merge branch 'x86-mce-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (80 commits) x86, mce: Add boot options for corrected errors x86, mce: Fix mce printing x86, mce: fix for mce counters x86, mce: support action-optional machine checks x86, mce: define MCE_VECTOR x86, mce: rename mce_notify_user to mce_notify_irq x86: fix panic with interrupts off (needed for MCE) x86, mce: export MCE severities coverage via debugfs x86, mce: implement new status bits x86, mce: print header/footer only once for multiple MCEs x86, mce: default to panic timeout for machine checks x86, mce: improve mce_get_rip x86, mce: make non Monarch panic message "Fatal machine check" too x86, mce: switch x86 machine check handler to Monarch election. x86, mce: implement panic synchronization x86, mce: implement bootstrapping for machine check wakeups x86, mce: check early in exception handler if panic is needed x86, mce: add table driven machine check grading x86, mce: remove TSC print heuristic x86, mce: log corrected errors when panicing ...
2009-06-13Merge branch 'docs-next' of git://git.lwn.net/linux-2.6Linus Torvalds
* 'docs-next' of git://git.lwn.net/linux-2.6: Document the debugfs API Documentation: Add "how to write a good patch summary" to SubmittingPatches SubmittingPatches: fix typo docs: Encourage better changelogs in the development process document Document Reported-by in SubmittingPatches
2009-06-13i2c-ocores: Can add I2C devices to the busRichard Röjfors
There is sometimes a need for the ocores driver to add devices to the bus when installed. i2c_register_board_info can not always be used, because the I2C devices are not known at an early state, they could for instance be connected on a I2C bus on a PCI device which has the Open Cores IP. i2c_new_device can not be used in all cases either since the resulting bus nummer might be unknown. The solution is the pass a list of I2C devices in the platform data to the Open Cores driver. This is useful for MFD drivers. Signed-off-by: Richard Röjfors <richard.rojfors.ext@mocean-labs.com> Signed-off-by: Ben Dooks <ben-linux@fluff.org>
2009-06-12PM: Remove bus_type suspend_late()/resume_early() V2Magnus Damm
Remove the ->suspend_late() and ->resume_early() callbacks from struct bus_type V2. These callbacks are legacy stuff at this point and since there seem to be no in-tree users we may as well remove them. New users should use dev_pm_ops. Signed-off-by: Magnus Damm <damm@igel.co.jp> Acked-by: Pavel Machek <pavel@ucw.cz> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2009-06-12Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: (290 commits) ALSA: pcm - Update document about xrun_debug proc file ALSA: lx6464es - support standard alsa module parameters ALSA: snd_usb_caiaq: set mixername ALSA: hda - add quirk for STAC92xx (SigmaTel STAC9205) ALSA: use card device as parent for jack input-devices ALSA: sound/ps3: Correct existing and add missing annotations ALSA: sound/ps3: Restructure driver source ALSA: sound/ps3: Fix checkpatch issues ASoC: Fix lm4857 control ALSA: ctxfi - Clear PCM resources at hw_params and hw_free ALSA: ctxfi - Check the presence of SRC instance in PCM pointer callbacks ALSA: ctxfi - Add missing start check in atc_pcm_playback_start() ALSA: ctxfi - Add use_system_timer module option ALSA: usb - Add boot quirk for C-Media 6206 USB Audio ALSA: ctxfi - Fix wrong model id for UAA ALSA: ctxfi - Clean up probe routines ALSA: hda - Fix the previous tagra-8ch patch ALSA: hda - Add 7.1 support for MSI GX620 ALSA: pcm - A helper function to compose PCM stream name for debug prints ALSA: emu10k1 - Fix minimum periods for efx playback ...
2009-06-12Merge branch 'topic/pcm-jiffies-check' into for-linusTakashi Iwai
* topic/pcm-jiffies-check: ALSA: pcm - Update document about xrun_debug proc file
2009-06-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-lguestLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-lguest: (31 commits) lguest: add support for indirect ring entries lguest: suppress notifications in example Launcher lguest: try to batch interrupts on network receive lguest: avoid sending interrupts to Guest when no activity occurs. lguest: implement deferred interrupts in example Launcher lguest: remove obsolete LHREQ_BREAK call lguest: have example Launcher service all devices in separate threads lguest: use eventfds for device notification eventfd: export eventfd_signal and eventfd_fget for lguest lguest: allow any process to send interrupts lguest: PAE fixes lguest: PAE support lguest: Add support for kvm_hypercall4() lguest: replace hypercall name LHCALL_SET_PMD with LHCALL_SET_PGD lguest: use native_set_* macros, which properly handle 64-bit entries when PAE is activated lguest: map switcher with executable page table entries lguest: fix writev returning short on console output lguest: clean up length-used value in example launcher lguest: Segment selectors are 16-bit long. Fix lg_cpu.ss1 definition. lguest: beyond ARRAY_SIZE of cpu->arch.gdt ...
2009-06-12Merge branch 'for-2.6.31' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 * 'for-2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (29 commits) ide: re-implement ide_pci_init_one() on top of ide_pci_init_two() ide: unexport ide_find_dma_mode() ide: fix PowerMac bootup oops ide: skip probe if there are no devices on the port (v2) sl82c105: add printk() logging facility ide-tape: fix proc warning ide: add IDE_DFLAG_NIEN_QUIRK device flag ide: respect quirk_drives[] list on all controllers hpt366: enable all quirks for devices on quirk_drives[] list hpt366: sync quirk_drives[] list with pdc202xx_{new,old}.c ide: remove superfluous SELECT_MASK() call from do_rw_taskfile() ide: remove superfluous SELECT_MASK() call from ide_driveid_update() icside: remove superfluous ->maskproc method ide-tape: fix IDE_AFLAG_* atomic accesses ide-tape: change IDE_AFLAG_IGNORE_DSC non-atomically pdc202xx_old: kill resetproc() method pdc202xx_old: don't call pdc202xx_reset() on IRQ timeout pdc202xx_old: use ide_dma_test_irq() ide: preserve Host Protected Area by default (v2) ide-gd: implement block device ->set_capacity method (v2) ...
2009-06-12trivial: remove the trivial patch monkey's name from SubmittingPatchesMarkus Heidelberg
It is outdated here and can be found in the MAINTAINERS file. Also remove the URL of the previous maintainer, similar content can be found in the SubmittingPatches file. Signed-off-by: Markus Heidelberg <markus.heidelberg@web.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-12trivial: usb: fix missing space typo in docNémeth Márton
Signed-off-by: Márton Németh <nm127@freemail.hu> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-12trivial: Remove the hyphen from git commandsMatt Kraai
Signed-off-by: Matt Kraai <kraai@ftbfs.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-12trivial: SubmittingPatches: fix typoPavel Machek
Fix typo. Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-12trivial: Documentation/dell_rbu.txt: fix typosMasanori Kobayasi
Remove a period from end of command-line and fix misplaced comma. Signed-off-by: Masanori Kobayasi <zap03216@nifty.ne.jp> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-12trivial: ftrace:fix description of trace directoryGeunSik Lim
Fix trace source directory from kernel/tracing/ to kernel/trace/. Signed-off-by: GeunSik Lim <geunsik.lim@samsung.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-12trivial: rbtree.txt: fix rb_entry() parameters in sample codeWang Tinggong
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-12trivial: Documentation/rbtree.txt: cleanup kerneldoc of rbtree.txtfigo.zhang
The first formal parameter of the rb_link_node() is a pointer, and the "node" is define a data struct (pls see line 67 and line 73 in the doc), so the actual parameter should use "&data->node". Signed-off-by: Figo.zhang <figo.zhang@kolorific.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-12trivial: Miscellaneous documentation typo fixesMatt LaPlante
Fix various typos in documentation txts. Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-12trivial: fix typo milisecond/millisecond for documentation and source comments.Martin Olsson
Signed-off-by: Martin Olsson <martin@minimum.se> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-12trivial: fix typos s/paramter/parameter/ and s/excute/execute/ in ↵Martin Olsson
documentation and source comments. Signed-off-by: Martin Olsson <martin@minimum.se> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-12lguest: add support for indirect ring entriesMark McLoughlin
Support the VIRTIO_RING_F_INDIRECT_DESC feature. This is a simple matter of changing the descriptor walking code to operate on a struct vring_desc* and supplying it with an indirect table if detected. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: suppress notifications in example LauncherRusty Russell
The Guest only really needs to tell us about activity when we're going to listen to the eventfd: normally, we don't want to know. So if there are no available buffers, turn on notifications, re-check, then wait for the Guest to notify us via the eventfd, then turn notifications off again. There's enough else going on that the differences are in the noise. Before: Secs RxKicks TxKicks 1G TCP Guest->Host: 3.94 4686 32815 1M normal pings: 104 142862 1000010 1M 1k pings (-l 120): 57 142026 1000007 After: 1G TCP Guest->Host: 3.76 4691 32811 1M normal pings: 111 142859 997467 1M 1k pings (-l 120): 55 19648 501549 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: try to batch interrupts on network receiveRusty Russell
Rather than triggering an interrupt every time, we only trigger an interrupt when there are no more incoming packets (or the recv queue is full). However, the overhead of doing the select to figure this out is measurable: 1M pings goes from 98 to 104 seconds, and 1G Guest->Host TCP goes from 3.69 to 3.94 seconds. It's close to the noise though. I tested various timeouts, including reducing it as the number of pending packets increased, timing a 1 gigabyte TCP send from Guest -> Host and Host -> Guest (GSO disabled, to increase packet rate). // time tcpblast -o -s 65536 -c 16k 192.168.2.1:9999 > /dev/null Timeout Guest->Host Pkts/irq Host->Guest Pkts/irq Before 11.3s 1.0 6.3s 1.0 0 11.7s 1.0 6.6s 23.5 1 17.1s 8.8 8.6s 26.0 1/pending 13.4s 1.9 6.6s 23.8 2/pending 13.6s 2.8 6.6s 24.1 5/pending 14.1s 5.0 6.6s 24.4 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: avoid sending interrupts to Guest when no activity occurs.Rusty Russell
If we track how many buffers we've used, we can tell whether we really need to interrupt the Guest. This happens as a side effect of spurious notifications. Spurious notifications happen because it can take a while before the Host thread wakes up and sets the VRING_USED_F_NO_NOTIFY flag, and meanwhile the Guest can more notifications. A real fix would be to use wake counts, rather than a suppression flag, but the practical difference is generally in the noise: the interrupt is usually coalesced into a pending one anyway so we just save a system call which isn't clearly measurable. Secs Spurious IRQS 1G TCP Guest->Host: 3.93 58 1M normal pings: 100 72 1M 1k pings (-l 120): 57 492904 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: implement deferred interrupts in example LauncherRusty Russell
Rather than sending an interrupt on every buffer, we only send an interrupt when we're about to wait for the Guest to send us a new one. The console input and network input still send interrupts manually, but the block device, network and console output queues can simply rely on this logic to send interrupts to the Guest at the right time. The patch is cluttered by moving trigger_irq() higher in the code. In practice, two factors make this optimization less interesting: (1) we often only get one input at a time, even for networking, (2) triggering an interrupt rapidly tends to get coalesced anyway. Before: Secs RxIRQS TxIRQs 1G TCP Guest->Host: 3.72 32784 32771 1M normal pings: 99 1000004 995541 100,000 1k pings (-l 120): 5 49510 49058 After: 1G TCP Guest->Host: 3.69 32809 32769 1M normal pings: 99 1000004 996196 100,000 1k pings (-l 120): 5 52435 52361 (Note the interrupt count on 100k pings goes *up*: see next patch). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: have example Launcher service all devices in separate threadsRusty Russell
Currently lguest has three threads: the main Launcher thread, a Waker thread, and a thread for the block device (because synchronous block was simply too painful to bear). The Waker selects() on all the input file descriptors (eg. stdin, net devices, pipe to the block thread) and when one becomes readable it calls into the kernel to kick the Launcher thread out into userspace, which repeats the poll, services the device(s), and then tells the kernel to release the Waker before re-entering the kernel to run the Guest. Also, to make a slightly-decent network transmit routine, the Launcher would suppress further network interrupts while it set a timer: that signal handler would write to a pipe, which would rouse the Waker which would prod the Launcher out of the kernel to check the network device again. Now we can convert all our virtqueues to separate threads: each one has a separate eventfd for when the Guest pokes the device, and can trigger interrupts in the Guest directly. The linecount shows how much this simplifies, but to really bring it home, here's an strace analysis of single Guest->Host ping before: * Guest sends packet, notifies xmit vq, return control to Launcher * Launcher clears notification flag on xmit ring * Launcher writes packet to TUN device writev(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"\366\r\224`\2058\272m\224vf\274\10\0E\0\0T\0\0@\0@\1\265"..., 98}], 2) = 108 * Launcher sets up interrupt for Guest (xmit ring is empty) write(10, "\2\0\0\0\3\0\0\0", 8) = 0 * Launcher sets up timer for interrupt mitigation setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 505}}, NULL) = 0 * Launcher re-runs guest pread64(10, 0xbfa5f4d4, 4, 0) ... * Waker notices reply packet in tun device (it was in select) select(12, [0 3 4 6 11], NULL, NULL, NULL) = 1 (in [4]) * Waker kicks Launcher out of guest: pwrite64(10, "\3\0\0\0\1\0\0\0", 8, 0) = 0 * Launcher returns from running guest: ... = -1 EAGAIN (Resource temporarily unavailable) * Launcher looks at input fds: select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 1 (in [4], left {0, 0}) * Launcher reads pong from tun device: readv(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"\272m\224vf\274\366\r\224`\2058\10\0E\0\0T\364\26\0\0@"..., 1518}], 2) = 108 * Launcher injects guest notification: write(10, "\2\0\0\0\2\0\0\0", 8) = 0 * Launcher rechecks fds: select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 0 (Timeout) * Launcher clears Waker: pwrite64(10, "\3\0\0\0\0\0\0\0", 8, 0) = 0 * Launcher reruns Guest: pread64(10, 0xbfa5f4d4, 4, 0) = ? ERESTARTSYS (To be restarted) * Signal comes in, uses pipe to wake up Launcher: --- SIGALRM (Alarm clock) @ 0 (0) --- write(8, "\0", 1) = 1 sigreturn() = ? (mask now []) * Waker sees write on pipe: select(12, [0 3 4 6 11], NULL, NULL, NULL) = 1 (in [6]) * Waker kicks Launcher out of Guest: pwrite64(10, "\3\0\0\0\1\0\0\0", 8, 0) = 0 * Launcher exits from kernel: pread64(10, 0xbfa5f4d4, 4, 0) = -1 EAGAIN (Resource temporarily unavailable) * Launcher looks to see what fd woke it: select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 1 (in [6], left {0, 0}) * Launcher reads timeout fd, sets notification flag on xmit ring read(6, "\0", 32) = 1 * Launcher rechecks fds: select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 0 (Timeout) * Launcher clears Waker: pwrite64(10, "\3\0\0\0\0\0\0\0", 8, 0) = 0 * Launcher resumes Guest: pread64(10, "\0p\0\4", 4, 0) .... strace analysis of single Guest->Host ping after: * Guest sends packet, notifies xmit vq, creates event on eventfd. * Network xmit thread wakes from read on eventfd: read(7, "\1\0\0\0\0\0\0\0", 8) = 8 * Network xmit thread writes packet to TUN device writev(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"J\217\232FI\37j\27\375\276\0\304\10\0E\0\0T\0\0@\0@\1\265"..., 98}], 2) = 108 * Network recv thread wakes up from read on tunfd: readv(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"j\27\375\276\0\304J\217\232FI\37\10\0E\0\0TiO\0\0@\1\214"..., 1518}], 2) = 108 * Network recv thread sets up interrupt for the Guest write(6, "\2\0\0\0\2\0\0\0", 8) = 0 * Network recv thread goes back to reading tunfd 13:39:42.460285 readv(4, <unfinished ...> * Network xmit thread sets up interrupt for Guest (xmit ring is empty) write(6, "\2\0\0\0\3\0\0\0", 8) = 0 * Network xmit thread goes back to reading from eventfd read(7, <unfinished ...> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: PAE supportMatias Zabaljauregui
This version requires that host and guest have the same PAE status. NX cap is not offered to the guest, yet. Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: fix writev returning short on console outputRusty Russell
I've never seen it here, but I can't find anywhere that says writev will write everything. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: clean up length-used value in example launcherRusty Russell
The "len" field in the used ring for virtio indicates the number of bytes *written* to the buffer. This means the guest doesn't have to zero the buffers in advance as it always knows the used length. Erroneously, the console and network example code puts the length *read* into that field. The guest ignores it, but it's wrong. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: clean up example launcher compile flags.Rusty Russell
18 months ago 5bbf89fc260830f3f58b331d946a16b39ad1ca2d changed to loading bzImages directly, and no longer manually ungzipping them, so we no longer need libz. Also, -m32 is useful for those on 64-bit platforms (and harmless on 32-bit). Reported-by: Ron Minnich <rminnich@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: remove invalid interrupt forcing logic.Rusty Russell
20887611523e749d99cc7d64ff6c97d27529fbae (lguest: notify on empty) introduced lguest support for the VIRTIO_F_NOTIFY_ON_EMPTY flag, but in fact it turned on interrupts all the time. Because we always process one buffer at a time, the inflight count is always 0 when call trigger_irq and so we always ignore VRING_AVAIL_F_NO_INTERRUPT from the Guest. It should be looking to see if there are more buffers in the Guest's queue: if it's empty, then we force an interrupt. This makes little difference, since we usually have an empty queue; but that's the subject of another patch. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: get more serious about wmb() in example Launcher codeRusty Russell
Since the Launcher process runs the Guest, it doesn't have to be very serious about its barriers: the Guest isn't running while we are (Guest is UP). Before we change to use threads to service devices, we need to fix this. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: cleanup passing of /dev/lguest fd around example launcher.Rusty Russell
We hand the /dev/lguest fd everywhere; it's far neater to just make it a global (it already is, in fact, hidden in the waker_fds struct). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>