aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2006-01-18[PATCH] mm: optimize numa policy handling in slab allocatorChristoph Lameter
Move the interrupt check from slab_node into ___cache_alloc and adds an "unlikely()" to avoid pipeline stalls on some architectures. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18[PATCH] NUMA policies in the slab allocator V2Christoph Lameter
This patch fixes a regression in 2.6.14 against 2.6.13 that causes an imbalance in memory allocation during bootup. The slab allocator in 2.6.13 is not numa aware and simply calls alloc_pages(). This means that memory policies may control the behavior of alloc_pages(). During bootup the memory policy is set to MPOL_INTERLEAVE resulting in the spreading out of allocations during bootup over all available nodes. The slab allocator in 2.6.13 has only a single list of slab pages. As a result the per cpu slab cache and the spinlock controlled page lists may contain slab entries from off node memory. The slab allocator in 2.6.13 makes no effort to discern the locality of an entry on its lists. The NUMA aware slab allocator in 2.6.14 controls locality of the slab pages explicitly by calling alloc_pages_node(). The NUMA slab allocator manages slab entries by having lists of available slab pages for each node. The per cpu slab cache can only contain slab entries associated with the node local to the processor. This guarantees that the default allocation mode of the slab allocator always assigns local memory if available. Setting MPOL_INTERLEAVE as a default policy during bootup has no effect anymore. In 2.6.14 all node unspecific slab allocations are performed on the boot processor. This means that most of key data structures are allocated on one node. Most processors will have to refer to these structures making the boot node a potential bottleneck. This may reduce performance and cause unnecessary memory pressure on the boot node. This patch implements NUMA policies in the slab layer. There is the need of explicit application of NUMA memory policies by the slab allcator itself since the NUMA slab allocator does no longer let the page_allocator control locality. The check for policies is made directly at the beginning of __cache_alloc using current->mempolicy. The memory policy is already frequently checked by the page allocator (alloc_page_vma() and alloc_page_current()). So it is highly likely that the cacheline is present. For MPOL_INTERLEAVE kmalloc() will spread out each request to one node after another so that an equal distribution of allocations can be obtained during bootup. It is not possible to push the policy check to lower layers of the NUMA slab allocator since the per cpu caches are now only containing slab entries from the current node. If the policy says that the local node is not to be preferred or forbidden then there is no point in checking the slab cache or local list of slab pages. The allocation better be directed immediately to the lists containing slab entries for the allowed set of nodes. This way of applying policy also fixes another strange behavior in 2.6.13. alloc_pages() is controlled by the memory allocation policy of the current process. It could therefore be that one process is running with MPOL_INTERLEAVE and would f.e. obtain a new page following that policy since no slab entries are in the lists anymore. A page can typically be used for multiple slab entries but lets say that the current process is only using one. The other entries are then added to the slab lists. These are now non local entries in the slab lists despite of the possible availability of local pages that would provide faster access and increase the performance of the application. Another process without MPOL_INTERLEAVE may now run and expect a local slab entry from kmalloc(). However, there are still these free slab entries from the off node page obtained from the other process via MPOL_INTERLEAVE in the cache. The process will then get an off node slab entry although other slab entries may be available that are local to that process. This means that the policy if one process may contaminate the locality of the slab caches for other processes. This patch in effect insures that a per process policy is followed for the allocation of slab entries and that there cannot be a memory policy influence from one process to another. A process with default policy will always get a local slab entry if one is available. And the process using memory policies will get its memory arranged as requested. Off-node slab allocation will require the use of spinlocks and will make the use of per cpu caches not possible. A process using memory policies to redirect allocations offnode will have to cope with additional lock overhead in addition to the latency added by the need to access a remote slab entry. Changes V1->V2 - Remove #ifdef CONFIG_NUMA by moving forward declaration into prior #ifdef CONFIG_NUMA section. - Give the function determining the node number to use a saner name. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18[PATCH] sem2mutex: mm/slab.cIngo Molnar
Convert mm/swapfile.c's swapon_sem to swapon_mutex. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18[PATCH] Zone reclaim: proc overrideChristoph Lameter
proc support for zone reclaim This patch creates a proc entry /proc/sys/vm/zone_reclaim_mode that may be used to override the automatic determination of the zone reclaim made on bootup. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18[PATCH] Zone reclaim: Reclaim logicChristoph Lameter
Some bits for zone reclaim exists in 2.6.15 but they are not usable. This patch fixes them up, removes unused code and makes zone reclaim usable. Zone reclaim allows the reclaiming of pages from a zone if the number of free pages falls below the watermarks even if other zones still have enough pages available. Zone reclaim is of particular importance for NUMA machines. It can be more beneficial to reclaim a page than taking the performance penalties that come with allocating a page on a remote zone. Zone reclaim is enabled if the maximum distance to another node is higher than RECLAIM_DISTANCE, which may be defined by an arch. By default RECLAIM_DISTANCE is 20. 20 is the distance to another node in the same component (enclosure or motherboard) on IA64. The meaning of the NUMA distance information seems to vary by arch. If zone reclaim is not successful then no further reclaim attempts will occur for a certain time period (ZONE_RECLAIM_INTERVAL). This patch was discussed before. See http://marc.theaimsgroup.com/?l=linux-kernel&m=113519961504207&w=2 http://marc.theaimsgroup.com/?l=linux-kernel&m=113408418232531&w=2 http://marc.theaimsgroup.com/?l=linux-kernel&m=113389027420032&w=2 http://marc.theaimsgroup.com/?l=linux-kernel&m=113380938612205&w=2 Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18[PATCH] Zone reclaim: resurrect may_swapChristoph Lameter
Zone reclaim has a huge impact on NUMA performance (f.e. our maximum throughput with XFS is raised from 4GB to 6GB/sec / page cache contamination of numa nodes destroys locality if one just does a large copy operation which results in performance dropping for good until reboot). This patch: Resurrect may_swap in struct scan_control Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18[PATCH] Simplify migrate_page_addChristoph Lameter
Simplify migrate_page_add after feedback from Hugh. This also allows us to drop one parameter from migrate_page_add. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18[PATCH] mm: migration page refcounting fixNick Piggin
Migration code currently does not take a reference to target page properly, so between unlocking the pte and trying to take a new reference to the page with isolate_lru_page, anything could happen to it. Fix this by holding the pte lock until we get a chance to elevate the refcount. Other small cleanups while we're here. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18[PATCH] mm: dirty_exceeded speedupAndrew Morton
Ravikiran reports that this variable is bouncing all around nodes on NUMA machines, causing measurable performance problems. Fix that up by only writing to it when it actually changed. And put it in a new cacheline to prevent it sharing with other things (this happened). Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18[PATCH] Prevent trident driver from grabbing pcnet32 hardwareJon Mason
Some pcnet32 hardware erroneously has the Vendor ID for Trident. The pcnet32 driver looks for the PCI ethernet class before grabbing the hardware, but the current trident driver does not check against the PCI audio class. This allows the trident driver to claim the pcnet32 hardware. This patch prevents that. This revised version of the OSS Trident patch includes PCI_DEVICE Macro usage. Signed-off-by: Jon Mason <jdmason@us.ibm.com> Signed-off-by: Muli Ben-Yehuda <mulix@mulix.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18[PATCH] synclink_gt fix size of register value storagePaul Fulghum
Fix incorrect variable size used to hold register value. This bug might wipe out a portion of the TCR value when setting the interface options. Signed-off-by: Paul Fulghum <paulkf@microgate.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18[PATCH] scsi_transport_spi build fixAndrew Morton
On alpha: In file included from drivers/scsi/sym53c8xx_2/sym_glue.h:59, from drivers/scsi/sym53c8xx_2/sym_fw.c:40: include/scsi/scsi_transport_spi.h:57: error: field `dv_mutex' has incomplete type Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18[PATCH] x86_64: Fix MCE exception stack for boot CPUJan Beulich
Fix a typo/mis-merge in one of the previous patches. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18[PATCH] jbd: remove_transaction fixJan Kara
We have to check that also the second checkpoint list is non-empty before dropping the transaction. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18[PATCH] jbd: log_do_checkpoint fixJan Kara
While checkpointing we have to check that our transaction still is in the checkpoint list *and* (not or) that it's not just a different transaction with the same address. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18Merge master.kernel.org:/home/rmk/linux-2.6-serialLinus Torvalds
2006-01-18Merge master.kernel.org:/home/rmk/linux-2.6-armLinus Torvalds
2006-01-18Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
2006-01-18Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6Linus Torvalds
2006-01-18[SPARC64]: Fix build with CONFIG_COMPAT disabled.David S. Miller
Based upon a report and preliminary patch from Jim Gifford. Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-18Merge master.kernel.org:/pub/scm/linux/kernel/git/tmlind/linux-omap-upstreamRussell King
2006-01-18[SPARC64]: Serial Console for E250 PatchEddie C. Dost
From: Eddie C. Dost <ecd@brainaid.de> I have the following patch for serial console over the RSC (remote system controller) on my E250 machine. It basically adds support for input-device=rsc and output-device=rsc from OBP, and allows 115200,8,n,1,- serial mode setting. Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-18[MAINTAINERS]: add entry for wireless networkingJohn W. Linville
Add an entry to MAINTAINERS for wireless networking, just so people know whom to bless with patches. Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-18[MAINTAINERS]: correct location for net-2.6.gitJohn W. Linville
Correct location info for net-2.6 git tree. Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-18[ARM] 3281/1: ixp4xx: export ixp4xx_exp_bus_size for modulesDavid Vrabel
Patch from David Vrabel Export ixp4xx_exp_bus_size so modules can use the IXP4XX_EXP_BUS_BASE(n) macro. Also, fix a printk format warning. Signed-off-by: David Vrabel <dvrabel@arcom.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-18[ARM] 3272/1: fix kernel decompressor crashNicolas Pitre
Patch from Nicolas Pitre Commit f4619025a51747a3788fd1bb6bdc46e368a889a7 broke the kernel decompressor (at least on PXA). Here's the fix. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-18[ARM] 3271/1: ARM EABI: fix calling of cmpxchg syscall emulationNicolas Pitre
Patch from Nicolas Pitre This is kernel provided user space code. Since a syscall is used, it has to be updated to work with EABI. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-18[ARM] 3270/1: ARM EABI: fix sigreturn and rt_sigreturnNicolas Pitre
Patch from Nicolas Pitre The signal return path consists of user code provided by the kernel. Since a syscall is used, it has to be updated to work with EABI. Noticed by Daniel Jacobowitz. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-18[ARM] 3268/1: AT91RM9200 serial update for 2.6.15-git12Andrew Victor
Patch from Andrew Victor This patch fixes two small issues with 2.6.15-git12. 1) Corrected major/minor numbers for ttyAT devices in the KConfig help. (Patch from Karl Olsen) 2) tty->flip.count has been removed. Signed-off-by: Andrew Victor <andrew@sanpeople.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-18[ARM] 3267/1: PXA27x SSP controller register definesDavid Vrabel
Patch from David Vrabel PXA27x SSP controller has a few different registers, including SCR (serial clock rate) in SSCR0. Signed-off-by: David Vrabel <dvrabel@arcom.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-18Merge git://tipc.cslab.ericsson.net/pub/git/tipcDavid S. Miller
2006-01-18[IPV4]: Fix multiple bugs in IGMPv3David L Stevens
1) fix "mld_marksources()" to a) send nothing when all queried sources are excluded b) send full exclude report when source queried sources are not excluded c) don't schedule a timer when there's nothing to report 2) fix "add_grec()" to send empty-source records when it should The original check doesn't account for a non-empty source list with all sources inactive; the new code keeps that short-circuit case, and also generates the group header with an empty list if needed. 3) fix mca_crcount decrement to be after add_grec(), which needs its original value 4) add/remove delete records and prevent current advertisements when an exclude-mode filter moves from "active" to "inactive" or vice versa based on new filter additions. Items 1-3 are just IPv4 versions of the IPv6 bugs found by Yan Zheng and fixed earlier. Item #4 is a related bug that affects exclude-mode change records only (but not queries) and also occurs in IPv6 (IPv6 version coming soon). Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-18[PKTGEN]: Respect hard_header_len of device.David S. Miller
Don't assume 16. Found by Ben Greear. Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-18[IRDA]: maintainer statusStephen Hemminger
Jean says he really doesn't have time to much IRDA any more. The following would help motivate someone who has more time. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-18[CASSINI]: dont touch page_countNick Piggin
Remove page refcount manipulations from cassini driver by using another field in struct page. Needed for lockless pagecache. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-18[SPARC64]: Update defconfig.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-18[PATCH] e1000: fix compile warningJesse Brandeburg
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: John Ronciak <john.ronciak@intel.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2006-01-18[PATCH] e1000: fix receive breakageJesse Brandeburg
in attempting to not send the "prefetch" patch, we broke the receive code, this patch fixes that issue. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: John Ronciak <john.ronciak@intel.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2006-01-18[PATCH] e1000: Added driver commentsJesse Brandeburg
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: John Ronciak <john.ronciak@intel.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2006-01-18[PATCH] e1000: Fix whitespaceJesse Brandeburg
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: John Ronciak <john.ronciak@intel.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2006-01-18[PATCH] e1000: Added functions declarationsJesse Brandeburg
Added e1000_mc_addr_list_update Added e1000_read_reg_io Added e1000_enable_pciex_master These are not static functions, that is why we have them declared in the header. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: John Ronciak <john.ronciak@intel.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2006-01-18[PATCH] e1000: Added functions to save and restore configJesse Brandeburg
These functions help restore the driver to active configuration when coming out of resume for power management. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: John Ronciak <john.ronciak@intel.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2006-01-18[PATCH] e1000: Added RX buffer enhancementsJesse Brandeburg
Align the prefetches to a dword to help speed them up. Recycle skb's and early replenish. Force memory writes to complete before fetching more descriptors. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: John Ronciak <john.ronciak@intel.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2006-01-18[PATCH] e1000: Added disable packet split capabilityJesse Brandeburg
Adds the ability to disability packet split at compile time and use the legacy receive path on PCI express hardware. Made this a CONFIG option and modified the Kconfig, to reflect the new option. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: John Ronciak <john.ronciak@intel.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2006-01-18[SERIAL] Add 8250 support for Decision Computer International Co. PCCOM2Alon Bar-Lev
There is a new device which is look like: Serial controller: Decision Computer International Co. PCCOM2 (rev 02) (prog-if 02 [16550]) 0700: 6666:0004 (rev 02) (prog-if 02) Flags: medium devsel, IRQ 177 Memory at fe000000 (32-bit, non-prefetchable) [size=128] I/O ports at e880 [size=128] I/O ports at e400 [size=256] It has two 16550A, and is not listed in kernel, although the manufacturer clams that it is supported... I've created the following patch, it only add the new PCI id and the card to the repository, it seems to work. Signed-off-by: Alon Bar-Lev <alon.barlev@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-18[SERIAL] Fix serial8250 driver initialisation orderingRussell King
Commit 7493a314cb83797ce612a577475aacaedc553fed changed the ordering of the registration of the platform device driver vs the 8250 drivers internal initialisation. This led to the probe function being called before the driver had finished its internal initialisation, causing mayhem. Revert the ordering change. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-17[IPV4]: RT_CACHE_STAT_INC() warning fixAndrew Morton
BUG: using smp_processor_id() in preemptible [00000001] code: rpc.statd/2408 And it _is_ a bug, but I guess we don't care enough to add preempt_disable(). Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-17Merge git://oss.sgi.com:8090/oss/git/xfs-2.6Linus Torvalds
2006-01-17Merge branch 'upstream-linus' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev
2006-01-17Merge branch 'upstream-linus' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6