Age | Commit message (Collapse) | Author |
|
|
|
|
|
The scheduler will stop load balancing if the most busy processor contains
processes pinned via processor affinity.
The scheduler currently only does one search for busiest cpu. If it cannot
pull any tasks away from the busiest cpu because they were pinned then the
scheduler goes into a corner and sulks leaving the idle processors idle.
F.e. If you have processor 0 busy running four tasks pinned via taskset,
there are none on processor 1 and one just started two processes on
processor 2 then the scheduler will not move one of the two processes away
from processor 2.
This patch fixes that issue by forcing the scheduler to come out of its
corner and retrying the load balancing by considering other processors for
load balancing.
This patch was originally developed by John Hawkes and discussed at
http://marc.theaimsgroup.com/?l=linux-kernel&m=113901368523205&w=2.
I have removed extraneous material and gone back to equipping struct rq
with the cpu the queue is associated with since this makes the patch much
easier and it is likely that others in the future will have the same
difficulty of figuring out which processor owns which runqueue.
The overhead added through these patches is a single word on the stack if
the kernel is configured to support 32 cpus or less (32 bit). For 32 bit
environments the maximum number of cpus that can be configued is 255 which
would result in the use of 32 bytes additional on the stack. On IA64 up to
1k cpus can be configured which will result in the use of 128 additional
bytes on the stack. The maximum additional cache footprint is one
cacheline. Typically memory use will be much less than a cacheline and the
additional cpumask will be placed on the stack in a cacheline that already
contains other local variable.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Add a new PCI ID for SiI 3124. Reported by Silicon Image.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
This patch prevents cross-region mappings
on IA64 and SPARC which could lead to system crash.
Adrian Bunk:
Adapted to 2.6.16.
Signed-Off-By: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
xmon writes garbage on the screen because the nvidia console driver has
changed the line pitch from what the firmware set it to. Fix it by making
the nvidia driver inform the btext engine (which xmon uses if the screen is
its output device) about changes to display resolution.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
As pointed out by Herbert Xu <herbert@gondor.apana.org.au>, our
memcpy implementation didn't return the destination pointer as its
return value, and there is code in the kernel that expects that.
This fixes it.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
When pskb_trim has to defer to ___pksb_trim to trim the frag_list part of
the packet, the frag_list is not updated to reflect the trimming. This
will usually work fine until you hit something that uses the packet length
or tail from the frag_list.
Examples include esp_output and ip_fragment.
Another problem caused by this is that you can end up with a linear packet
with a frag_list attached.
It is possible to get away with this if we audit everything to make sure
that they always consult skb->len before going down onto frag_list. In
fact we can do the samething for the paged part as well to avoid copying
the data area of the skb. For now though, let's do the conservative fix
and update frag_list.
Many thanks to Marco Berizzi for helping me to track down this bug.
This 4-year old bug took 3 months to track down. Marco was very patient
indeed :)
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
The scx200_acb i2c bus driver pretends to support SMBus block
transactions, but in fact it implements the more simple I2C block
transactions. Additionally, it lacks sanity checks on the length
of the block transactions, which could lead to a buffer overrun.
This fixes an oops reported by Alexander Atanasov:
http://marc.theaimsgroup.com/?l=linux-kernel&m=114970382125094
Thanks to Ben Gardner for fixing my bugs :)
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
* Nack was sent one byte too late on reads >= 2 bytes.
* Stop bit was set one byte too late on reads.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Quadro NVS280 is a dual-head PCIe card with PCI ID 10de:00fd and subsystem I
10de:0215.
Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
The current sun disklabel code uses a signed int for the sector count.
When partitions larger than 1 TB are used, the cast to a sector_t causes
the partition sizes to be invalid:
# cat /proc/paritions | grep sdan
66 112 2146435072 sdan
66 115 9223372036853660736 sdan3
66 120 9223372036853660736 sdan8
This patch switches the sector count to an unsigned int to fix this.
Eric Sandeen also submitted the same patch.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
We have seen a couple of __alloc_pages() failures due to
fragmentation, there is plenty of free memory but no large order pages
available. I think the problem is in sock_alloc_send_pskb(), the
gfp_mask includes __GFP_REPEAT but its never used/passed to the page
allocator. Shouldnt the gfp_mask be passed to alloc_skb() ?
Signed-off-by: Larry Woodman <lwoodman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Since pskb_copy tacks on the non-linear bits from the original
skb, it needs to count them in the truesize field of the new skb.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
This patch removes consideration of high memory when determining TCP
hash table sizes. Taking into account high memory results in tcp_mem
values that are too large.
Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
This patch removes an unused variable.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Dave Jones <davej@redhat.com>
|
|
Based on a patch from the Ubuntu kernel.
Signed-off-by: Ben Collins <bcollins@ubuntu.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
The Coverity checker noted that in
drivers/telephony/ixj.c:ixj_build_filter_cadence(), filter_en[4] or
filter_en[5] could be written to.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Add support for Geforce 6100 and related chipsets (PCI device id 0x024x)
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
START_ARRAY will not be removed in 2.6.16, therefore replace the date
reference with a kernel version reference.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
The information in Documentation/feature-removal-schedule.txt
doesn't apply to the 2.6.16 branch (and most dates were already
in the past.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
During OpenVZ stress testing we found that UDP traffic with random src
can generate too much excessive rt hash growing leading finally to OOM
and kernel panics.
It was found that for 4GB i686 system (having 1048576 total pages and
225280 normal zone pages) kernel allocates the following route hash:
syslog: IP route cache hash table entries: 262144 (order: 8, 1048576
bytes) => ip_rt_max_size = 4194304 entries, i.e. max rt size is
4194304 * 256b = 1Gb of RAM > normal_zone
Attached the patch which removes HASH_HIGHMEM flag from
alloc_large_system_hash() call.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
This just turns off chmod() on the /proc/<pid>/ files, since there is no
good reason to allow it, and had we disallowed it originally, the nasty
/proc race exploit wouldn't have been possible.
The other patches already fixed the problem chmod() could cause, so this
is really just some final mop-up..
This particular version is based off a patch by Eugene and Marcel which
had much better naming than my original equivalent one.
Signed-off-by: Eugene Teo <eteo@redhat.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Address http://bugzilla.kernel.org/show_bug.cgi?id=7189
It should check `clen', not `len'.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
This bug was unknowingly fixed the GSO patches (or rather, its effect was
unknown at the time).
Thanks to Marco Berizzi's persistence which is documented in the thread
"ipsec tunnel asymmetrical mtu", we now know that it can have highly
non-obvious symptoms.
What happens is that uninitialised uso_size fields can cause packets to
be incorrectly identified as UFO, which means that it does not get
fragmented even if it's over the MTU.
The fix is simple enough.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
1434 static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va)
1435 {
1436 /*
1437 * If the source page was a PFN mapping, we don't have
1438 * a "struct page" for it. We do a best-effort copy by
1439 * just copying from the original user address. If that
1440 * fails, we just zero-fill it. Live with it.
1441 */
1442 if (unlikely(!src)) {
1443 void *kaddr = kmap_atomic(dst, KM_USER0);
1444 void __user *uaddr = (void __user *)(va & PAGE_MASK);
1445
1446 /*
1447 * This really shouldn't fail, because the page is there
1448 * in the page tables. But it might just be unreadable,
1449 * in which case we just give up and fill the result with
1450 * zeroes.
1451 */
1452 if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE))
1453 memset(kaddr, 0, PAGE_SIZE);
1454 kunmap_atomic(kaddr, KM_USER0);
#### D-cache have to be flushed here.
#### It seems it is just forgotten.
1455 return;
1456
1457 }
1458 copy_user_highpage(dst, src, va);
#### Ok here. flush_dcache_page() called from this func if arch need it
1459 }
Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Module reference needs to be given back if message header
construction fails.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Return ENOENT if action module is unavailable
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
The TCA_ACT_KIND attribute is used without checking its
availability when dumping actions therefore leading to a
value of 0x4 being dereferenced.
The use of strcmp() in tc_lookup_action_n() isn't safe
when fed with string from an attribute without enforcing
proper NUL termination.
Both bugs can be triggered with malformed netlink message
and don't require any privileges.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
"return -err" and blindly inheriting the error code in the netlink
failure exception handler causes errors codes to be returned as
positive value therefore making them being ignored by the caller.
May lead to sending out incomplete netlink messages.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Else a subsequent bio_clone might make a mess.
Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Fix powernow-k8 doesn't load bug.
Reference:
https://launchpad.net/distros/ubuntu/+source/linux-source-2.6.15/+bug/35145
Signed-off-by: Ben Collins <bcollins@ubuntu.com>
Signed-off-by: Dave Jones <davej@redhat.com>
|
|
Even though powernow-k7 doesn't work in SMP environments,
it can work on an SMP configured kernel if there's only
one CPU present, however recalibrate_cpu_khz was returning
-EINVAL on such kernels, so we failed to init the cpufreq driver.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
|
|
|
|
As reported by Mark Dowd <Mark_Dowd@McAfee.com>, ip6_tables is susceptible
to a fragmentation attack causing false negatives on extension header
matches.
When extension headers occur in the non-first fragment after the fragment
header (possibly with an incorrect nexthdr value in the fragment header)
a rule looking for this extension header will never match.
Drop fragments that are at offset 0 and don't contain the final protocol
header regardless of the ruleset, since this should not happen normally.
Since all extension headers are before the protocol header this makes sure
an extension header is either not present or in the first fragment, where
we can properly parse it.
With help from Yasuyuki KOZAKAI <yasuyuki.kozakai@toshiba.co.jp>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
As reported by Mark Dowd <Mark_Dowd@McAfee.com>, ip6_tables is susceptible
to a fragmentation attack causing false negatives on protocol matches.
When the protocol header doesn't follow the fragment header immediately,
the fragment header contains the protocol number of the next extension
header. When the extension header and the protocol header are sent in
a second fragment a rule like "ip6tables .. -p udp -j DROP" will never
match.
Drop fragments that are at offset 0 and don't contain the final protocol
header regardless of the ruleset, since this should not happen normally.
With help from Yasuyuki KOZAKAI <yasuyuki.kozakai@toshiba.co.jp>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
This is a long standing bug that seems to have only recently become
apparent, presumably due to increasing use of NFS over TCP - many
distros seem to be making it the default.
The SK_CONN bit gets set when a listening socket may be ready
for an accept, just as SK_DATA is set when data may be available.
It is entirely possible for svc_tcp_accept to be called with neither
of these set. It doesn't happen often but there is a small race in
svc_sock_enqueue as SK_CONN and SK_DATA are tested outside the
spin_lock. They could be cleared immediately after the test and
before the lock is gained.
This normally shouldn't be a problem. The sockets are non-blocking so
trying to read() or accept() when ther is nothing to do is not a problem.
However: svc_tcp_recvfrom makes the decision "Should I accept() or
should I read()" based on whether SK_CONN is set or not. This usually
works but is not safe. The decision should be based on whether it is
a TCP_LISTEN socket or a TCP_CONNECTED socket.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
The integer divisions in the timer accounting code can round the result
down to 0. Adding 0 is without effect and the signal delivery stops.
Clamp the division result to minimum 1 to avoid this.
Problem was reported by Seongbae Park <spark@google.com>, who provided
also an inital patch.
Roland sayeth:
I have had some more time to think about the problem, and to reproduce it
using Toyo's test case. For the record, if my understanding of the problem
is correct, this happens only in one very particular case. First, the
expiry time has to be so soon that in cputime_t units (usually 1s/HZ ticks)
it's < nthreads so the division yields zero. Second, it only affects each
thread that is so new that its CPU time accumulation is zero so now+0 is
still zero and ->it_*_expires winds up staying zero. For the VIRT and PROF
clocks when cputime_t is tick granularity (or the SCHED clock on
configurations where sched_clock's value only advances on clock ticks), this
is not hard to arrange with new threads starting up and blocking before they
accumulate a whole tick of CPU time. That's what happens in Toyo's test
case.
Note that in general it is fine for that division to round down to zero,
and set each thread's expiry time to its "now" time. The problem only
arises with thread's whose "now" value is still zero, so that now+0 winds up
0 and is interpreted as "not set" instead of ">= now". So it would be a
sufficient and more precise fix to just use max(ticks, 1) inside the loop
when setting each it_*_expires value.
But, it does no harm to round the division up to one and always advance
every thread's expiry time. If the thread didn't already fire timers for
the expiry time of "now", there is no expectation that it will do so before
the next tick anyway. So I followed Thomas's patch in lifting the max out
of the loops.
This patch also covers the reload cases, which are harder to write a test
for (and I didn't try). I've tested it with Toyo's case and it fixes that.
[toyoa@mvista.com: fix: min_t -> max_t]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
There's a bug in the seqfile handling for /proc/net/ip6_flowlabel, where,
after finding a flowlabel, the code will loop forever not finding any
further flowlabels, first traversing the rest of the hash bucket then just
looping.
This patch fixes the problem by breaking after the hash bucket has been
traversed.
Note that this bug can cause lockups and oopses, and is trivially invoked
by an unpriveleged user.
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
http://bugzilla.kernel.org/show_bug.cgi?id=5653
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
memcpy 4 bytes to address of auto unsigned long variable followed
by comparison with u32 is a bloody bad idea.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
The previous patch to correct the copy_from_user padding is quite
broken. The execute instruction needs to be done via the register %r4,
not via %r2 and 31 bit doesn't know the instructions lgr and ahji.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
A user space program can read uninitialised kernel memory
by appending to a file from a bad address and then reading
the result back. The cause is the copy_from_user function
that does not clear the remaining bytes of the kernel
buffer after it got a fault on the user space address.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
|
|
|
|
Fix a bug in sys_perfmonctl() whereby it was not correctly
decrementing the file descriptor reference count.
Signed-off-by: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
PPPoE must advertise the underlying device's MTU via the ppp channel
descriptor structure, as multilink functionality depends on it.
Signed-off-by: Michal Ostrowski <mostrows@earthlink.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Prevents filters from being added if the first generated
handle already exists.
Signed-off-by: Kim Nordlund <kim.nordlund@nokia.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|