aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2006-03-11[PATCH] ext3: fix nobh mode for chattr +j inodesBadari Pulavarty
One can do "chattr +j" on a file to change its journalling mode. Fix writeback mode with "nobh" handling for it. Even though, we mount ext3 filesystem in writeback mode with "nobh" option, some one can do "chattr +j" on a single file to force it to do journalled mode. In order to do journaling, ext3_block_truncate_page() need to fallback to default case of creating buffers and adding them to transaction etc. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-11[PATCH] ext3: ext3_symlink should use GFP_NOFS allocations insideKirill Korotaev
This patch fixes illegal __GFP_FS allocation inside ext3 transaction in ext3_symlink(). Such allocation may re-enter ext3 code from try_to_free_pages. But JBD/ext3 code keeps a pointer to current journal handle in task_struct and, hence, is not reentrable. This bug led to "Assertion failure in journal_dirty_metadata()" messages. http://bugzilla.openvz.org/show_bug.cgi?id=115 Signed-off-by: Andrey Savochkin <saw@saw.sw.com.sg> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-10[PATCH] Input: psmouse - disable autoresyncDmitry Torokhov
Automatic resynchronization in psmouse driver causes problems on some hardware so disable it by default for now. People with KVM switches that require resync can still enable it via module parameter or sysfs attribute. Signed-off-by: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-10Merge branch 'release' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: [IA64] Fix race in the accessed/dirty bit handlers
2006-03-10[PATCH] kbuild: version.h should depend on .kernelreleaseJan Beulich
Rebuilding a previously built tree while using make's -j option from time to time results in the version.h check running at the same time as the updating of .kernelrelease, resulting in UTS_RELEASE remaining an empty string (and as a side effect causing the entire kernel to be rebuilt). Signed-Off-By: Jan Beulich <jbeulich@novell.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-10[PATCH] fix pcmcia_device_probe oopsHugh Dickins
Fix pcmcia_device_probe NULL pointer dereference at startup. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09[PATCH] slab: Node rotor for freeing alien caches and remote per cpu pages.Christoph Lameter
The cache reaper currently tries to free all alien caches and all remote per cpu pages in each pass of cache_reap. For a machines with large number of nodes (such as Altix) this may lead to sporadic delays of around ~10ms. Interrupts are disabled while reclaiming creating unacceptable delays. This patch changes that behavior by adding a per cpu reap_node variable. Instead of attempting to free all caches, we free only one alien cache and the per cpu pages from one remote node. That reduces the time spend in cache_reap. However, doing so will lengthen the time it takes to completely drain all remote per cpu pagesets and all alien caches. The time needed will grow with the number of nodes in the system. All caches are drained when they overflow their respective capacity. So the drawback here is only that a bit of memory may be wasted for awhile longer. Details: 1. Rename drain_remote_pages to drain_node_pages to allow the specification of the node to drain of pcp pages. 2. Add additional functions init_reap_node, next_reap_node for NUMA that manage a per cpu reap_node counter. 3. Add a reap_alien function that reaps only from the current reap_node. For us this seems to be a critical issue. Holdoffs of an average of ~7ms cause some HPC benchmarks to slow down significantly. F.e. NAS parallel slows down dramatically. NAS parallel has a 12-16 seconds runtime w/o rotor compared to 5.8 secs with the rotor patches. It gets down to 5.05 secs with the additional interrupt holdoff reductions. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09[PATCH] m68k: fix cmpxchg compile errors if CONFIG_RMW_INSNS=nRoman Zippel
We require that all archs implement atomic_cmpxchg(), for the generic version of atomic_add_unless(). Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09[PATCH] s390: dasd proc interface typoHorst Hummel
This fixes a typo introduced with 90f0094dc607abe384a412bfb7199fb667ab0735. Signed-off-by: Horst Hummel <horst.hummel@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09[PATCH] memory hotadd: pgdat->node_present_pages fixYasunori Goto
When pages are onlined, not only zone->present_pages but also pgdat->node_present_pages should be refreshed. This parameter is used to show information at /sys/device/system/node/nodeX/meminfo via si_meminfo_node(). So, it shows strange value for MemUsed which is calculated (node_present_pages - all zones free pages). Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09[PATCH] edac: mark as experimentalTim Small
EDAC is still causing a few problems and the code is relatively green. Mark it as experimental until thing settle down. Also, provide some documentation pointers in Kconfig help. Signed-off-by: Tim Small <tim@buttersideup.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09[PATCH] s390: Increase spinlock retry code performanceChristian Ehrhardt
Currently the code tries up to spin_retry times to grab a lock using the cs instruction. The cs instruction has exclusive access to a memory region and therefore invalidates the appropiate cache line of all other cpus. If there is contention on a lock this leads to cache line trashing. This can be avoided if we first check wether a cs instruction is likely to succeed before the instruction gets actually executed. Signed-off-by: Christian Ehrhardt <ehrhardt@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09[PATCH] ibmasm: use after free fixMax Asbock
The kobject_put() can free the memory at *cmd, but cmd->lock points to a persistent lock that is not freed with cmd. Signed-off-by: Max Asbock <masbock@us.ibm.com> Cc: Vernon Mauery <vernux@us.ibm.com> Cc: Srihari Vijayaraghavan <sriharivijayaraghavan@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09[PATCH] vmscan: no zone_reclaim if PF_MALLOC is setChristoph Lameter
If the process has already set PF_MALLOC and is already using current->reclaim_state then do not try to reclaim memory from the zone. This is set by kswapd and/or synchrononous global reclaim which will not take it lightly if we zap the reclaim_state. Signed-off-by: Christoph Lameter <clameter@sig.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09[PATCH] md: Fix several raid1 bugs which cause a memory leakNeilBrown
- wrong test for 'is this a BARRIER bio' - not freeing on all possible paths. - using r1_bio after freeing it. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09[PATCH] xtensa must set RWSEM_GENERIC_SPINLOCK=yAdrian Bunk
/usr/src/ctest/git/kernel/mm/rmap.c: In function `page_referenced_one': /usr/src/ctest/git/kernel/mm/rmap.c:354: warning: implicit declaration of function `rwsem_is_locked' Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09[PATCH] arch/sh/Kconfig: don't source non-existing Kconfig filesAdrian Bunk
arch/sh/Kconfig shouldn't source non-existing Kconfig files. Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09[PATCH] mtd: 64 bit fixesAtsushi Nemoto
Fix some bugs in mtd/jffs2 on 64bit platform. The MEMGETBADBLOCK/MEMSETBADBLOCK ioctl are not listed in compat_ioctl.h. And some variables in jffs2 are declared as uint32_t but used to hold size_t values. Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09[PATCH] alpha: fix IRQ handling lockupIvan Kokshaysky
Fix a lockup which was introduced during the conversion to the generic IRQ framework. Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09[PATCH] Fix error handling in backlight driversJean Delvare
ERR_PTR() is supposed to be passed a negative value. Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09[PATCH] dcdbas: dcdbas_pdev referenced after platform_device_unregister on exitDoug Warzecha
smi_data_buf_free() references dcdbas_pdev when calling dma_free_coherent(). In dcdbas_exit(), smi_data_buf_free() is called after platform_device_unregister(dcdbas_pdev). This patch moves platform_device_unregister(dcdbas_pdev) after smi_data_buf_free() in dcdbas_exit(). Signed-off-by: Doug Warzecha <Douglas_Warzecha@dell.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09[PATCH] page_add_file_rmap(): remove BUG_ON()sHugh Dickins
Remove two early-development BUG_ONs from page_add_file_rmap. The pfn_valid test (originally useful for checking that nobody passed an artificial struct page) comes too late, since we already have the struct page. The PageAnon test (useful when anon was first distinguished from file rmap) prevents ->nopage implementations from reusing ->mapping, which would otherwise be available. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09[MIPS] Always pass -msoft-float.Ralf Baechle
Some people still haven't heared that fp in the kernel is forbidden. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-03-09[MIPS] Undefine scr_writew and scr_readw in <asm/vga.h>.Ralf Baechle
This is gluing the build of cirrusfb but really the mess that would need cleaning and fixing is <video/vga.h> and <linux/vt_buffer.h> ... Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-03-09[MIPS] Scatter a bunch of __init over tlbex.c.Ralf Baechle
Found by make buildcheck. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-03-09[MIPS] Momentum: Resurrect after things were moved around a while ago.Ralf Baechle
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-03-09[MIPS] Discard .exit.text at runtime.Ralf Baechle
At times gcc will place bits of __exit functions into .rodata. If compiled into the kernle itself we used to discard .exit.text - but not the bits left in .rodata. While harmless this did at times result in a large number of warnings. So until gcc fixes this, discard .exit.text at runtime. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-03-09[MIPS] Enable highmem for all MIPS32 and MIPS64 processors.Ralf Baechle
In case a particular system doesn't support highmem the runtime checks will ensure nothing bad is going to happen. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-03-09[MIPS] A struct console.setup function may not be __init.Ralf Baechle
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-03-09[MIPS] Threaten removal of code for NEC DDB5074 and DDB5476 evaluation boards.Ralf Baechle
What: Support for NEC DDB5074 and DDB5476 evaluation boards. When: June 2006 Why: Board specific code doesn't build anymore since ~2.6.0 and no users have complained indicating there is no more need for these boards. This should really be considered a last call. Who: Ralf Baechle <ralf@linux-mips.org>
2006-03-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-mergeLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge: powerpc: Fix various syscall/signal/swapcontext bugs [PATCH] powerpc: incorrect rmo_top handling in prom_init [PATCH] powerpc: Fix incorrect pud_ERROR() message [PATCH] powerpc: Expose SMT and L1 icache snoop userland features [PATCH] powerpc: Fix windfarm_pm112 not starting all control loops [PATCH] powerpc: Fix old g5 issues with windfarm powerpc32: Fix timebase synchronization on 32-bit powermacs powerpc: Turn off verbose debug output in powermac platform functions powerpc: Fix might-sleep warning in program check exception handler
2006-03-08[PATCH] block: disable block layer bouncing for most memory on 64bit systemsAndi Kleen
The low level PCI DMA mapping functions should handle it in most cases. This should fix problems with depleting the DMA zone early. The old code used precious GFP_DMA memory in many cases where it was not needed. Signed-off-by: Andi Kleen <ak@suse.de> Cc: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08[PATCH] i386: port ATI timer fix from x86_64 to i386 IIAndi Kleen
ATI chipsets tend to generate double timer interrupts for the local APIC timer when both the 8254 and the IO-APIC timer pins are enabled. This is because they route it to both and the result is anded together and the CPU ends up processing it twice. This patch changes check_timer to disable the 8254 routing for interrupt 0. I think it would be safe on all chipsets actually (i tested it on a couple and it worked everywhere) and Windows seems to do it in a similar way, but to be conservative this patch only enables this mode on ATI (and adds options to enable/disable too) Ported over from a similar x86-64 change. I reused the ACPI earlyquirk infrastructure for the ATI bridge check, but tweaked it a bit to work even without ACPI. Inspired by a patch from Chuck Ebbert, but redone. Cc: Chuck Ebbert <76306.1226@compuserve.com> Cc: "Brown, Len" <len.brown@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [NET] compat ifconf: fix limits
2006-03-08Merge master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvbLinus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb: Merge branch 'work-fixes' V4L/DVB (3413): Typos grab bag of the month V4L/DVB (3403): Workaround to fix initialization for Nexus CA Merge branch 'work-fixes' V4L/DVB (3395): Fixed Pinnacle 300i DVB-T support V4L/DVB (3399): ELSA EX-VISION 500TV: fix incorrect PCI subsystem ID V4L/DVB (3382): Fix stv0297 for qam128 on tt c1500 (saa7146) V4L/DVB (3300a): Removing personal email from DVB maintainers V4L/DVB (3385): Dvb: fix __init/__exit section references in av7110 driver V4L/DVB (3378): Restore power on defaults of tda9887 after tda8290 probe V4L/DVB (3354): Fix maximum for the saturation and contrast controls. V4L/DVB (3352): Cxusb: fix lgdt3303 naming V4L/DVB (3348): Fixed saa7134 ALSA initialization with multiple cards V4L/DVB (3347): Pinnacle PCTV 40i: add filtered Composite2 input V4L/DVB (3341): Upstream sync - make 2 structs static V4L/DVB (3340): Make a struct static V4L/DVB (3337): Drivers/media/dvb/frontends/mt312.c: cleanups V4L/DVB (3336): Bt8xx documentation authors fix
2006-03-08[NET] compat ifconf: fix limitsRandy Dunlap
A recent change to compat. dev_ifconf() in fs/compat_ioctl.c causes ifconf data to be truncated 1 entry too early when copying it to userspace. The correct amount of data (length) is returned, but the final entry is empty (zero, not filled in). The for-loop 'i' check should use <= to allow the final struct ifreq32 to be copied. I also used the ifconf-corruption program in kernel bugzilla #4746 to make sure that this change does not re-introduce the corruption. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-08[IA64] Fix race in the accessed/dirty bit handlersChristoph Lameter
A pte may be zapped by the swapper, exiting process, unmapping or page migration while the accessed or dirty bit handers are about to run. In that case the accessed bit or dirty is set on an zeroed pte which leads the VM to conclude that this is a swap pte. This may lead to - Messages from the vm like swap_free: Bad swap file entry 4000000000000000 - Processes being aborted swap_dup: Bad swap file entry 4000000000000000 VM: killing process .... Page migration is particular suitable for the creation of this race since it needs to remove and restore page table entries. The fix here is to check for the present bit and simply not update the pte if the page is not present anymore. If the page is not present then the fault handler should run next which will take care of the problem by bringing the page back and then mark the page dirty or move it onto the active list. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-03-08[PATCH] fix kexec asmMichael Matz
While testing kexec and kdump we hit problems where the new kernel would freeze or instantly reboot. The easiest way to trigger it was to kexec a kernel compiled for CONFIG_M586 on an athlon cpu. Compiling for CONFIG_MK7 instead would work fine. The patch fixes a few problems with the kexec inline asm. Signed-off-by: Chris Mason <mason@suse.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08[PATCH] dac960: add disk entropy in request completionsMatt Mackall
Signed-off-by: Matt Mackall <mpm@selenic.com> Tested-by: Anders K. Pedersen <akp@cohaesio.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08[PATCH] slab: allocate larger cache_cache if order 0 failsJack Steiner
kmem_cache_init() incorrectly assumes that the cache_cache object will fit in an order 0 allocation. On very large systems, this is not true. Change the code to try larger order allocations if order 0 fails. Signed-off-by: Jack Steiner <steiner@sgi.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08[PATCH] v9fs: fix for access to unitialized variables or freed memoryLatchesar Ionkov
Miscellaneous fixes related to accessing uninitialized variables or memory that was already freed. Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08[PATCH] x86: cpu model calculation for family 6 cpuShaohua Li
The x86_model calculation also applies for family 6. early_cpu_detect does the right thing, but generic_identify misses. Signed-off-by: Shaohua Li<shaohua.li@intel.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: "Seth, Rohit" <rohit.seth@intel.com> Acked-by: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08[PATCH] s390: dasd partition detectionHorst Hummel
DASD allows to open a device as soon as gendisk is registered, which means the device is a fake device (capacity=0) and we do know nothing about blocksize and partitions at that point of time. In case the device is opened by someone, the bdev and inode creation is done with the fake device info and the following partition detection code is just using the wrong data. To avoid this modify the DASD state machine to make sure that the open is rejected until the device analysis is either finished or an unformatted device was detected. Signed-off-by: Horst Hummel <horst.hummel@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08[PATCH] s390: iucv message limit for smsgMartin Schwidefsky
The message limit on the iucv connect call for the smsg module is too low. Therefore increase the smsg message limit to 255. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08[PATCH] s390: fix strnlen_user return valueGerald Schaefer
strnlen_user is supposed to return then length count + 1 if no terminating \0 is found, and it should return 0 on exception. Found by David Howells <dhowells@redhat.com>. Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-By: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08[PATCH] jffs2: avoid divide-by-zeroDavid Woodhouse
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08[PATCH] fix file countingDipankar Sarma
I have benchmarked this on an x86_64 NUMA system and see no significant performance difference on kernbench. Tested on both x86_64 and powerpc. The way we do file struct accounting is not very suitable for batched freeing. For scalability reasons, file accounting was constructor/destructor based. This meant that nr_files was decremented only when the object was removed from the slab cache. This is susceptible to slab fragmentation. With RCU based file structure, consequent batched freeing and a test program like Serge's, we just speed this up and end up with a very fragmented slab - llm22:~ # cat /proc/sys/fs/file-nr 587730 0 758844 At the same time, I see only a 2000+ objects in filp cache. The following patch I fixes this problem. This patch changes the file counting by removing the filp_count_lock. Instead we use a separate percpu counter, nr_files, for now and all accesses to it are through get_nr_files() api. In the sysctl handler for nr_files, we populate files_stat.nr_files before returning to user. Counting files as an when they are created and destroyed (as opposed to inside slab) allows us to correctly count open files with RCU. Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08[PATCH] rcu batch tuningDipankar Sarma
This patch adds new tunables for RCU queue and finished batches. There are two types of controls - number of completed RCU updates invoked in a batch (blimit) and monitoring for high rate of incoming RCUs on a cpu (qhimark, qlowmark). By default, the per-cpu batch limit is set to a small value. If the input RCU rate exceeds the high watermark, we do two things - force quiescent state on all cpus and set the batch limit of the CPU to INTMAX. Setting batch limit to INTMAX forces all finished RCUs to be processed in one shot. If we have more than INTMAX RCUs queued up, then we have bigger problems anyway. Once the incoming queued RCUs fall below the low watermark, the batch limit is set to the default. Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08[PATCH] percpu_counter_sum()Andrew Morton
Implement percpu_counter_sum(). This is a more accurate but slower version of percpu_counter_read_positive(). We need this for Alex's speedup-ext3_statfs patch and for the nr_file accounting fix. Otherwise these things would be too inaccurate on large CPU counts. Cc: Ravikiran G Thirumalai <kiran@scalex86.org> Cc: Alex Tomas <alex@clusterfs.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08[PATCH] x86: Fix i386 nmi_watchdog that does not trigger die_nmiGOTO Masanori
Fix i386 nmi_watchdog that does not meet watchdog timeout condition. It does not hit die_nmi when it should be triggered, because the current nmi_watchdog_tick in arch/i386/kernel/nmi.c never count up alert_counter like this: void nmi_watchdog_tick (struct pt_regs * regs) { if (last_irq_sums[cpu] == sum) { alert_counter[cpu]++; <- count up alert_counter, but if (alert_counter[cpu] == 5*nmi_hz) die_nmi(regs, "NMI Watchdog detected LOCKUP"); alert_counter[cpu] = 0; <- reset alert_counter This patch changes it back to the previous and working version. This was found and originally written by Kohta NAKASHIMA. (akpm: also uninline write_watchdog_counter(), saving 184 byets) Signed-off-by: GOTO Masanori <gotom@sanori.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>