Age | Commit message | Author
2007-05-11 | signal/timer/event: timerfd core | Davide Libenzi
This patch introduces a new system call for timer events delivered through file descriptors. This allows timer events to be used with standard POSIX poll(2), select(2) and read(2). As a consequence of supporting the Linux f_op->poll subsystem, they can be used with epoll(2) too.

The system call is defined as:

    int timerfd(int ufd, int clockid, int flags, const struct itimerspec *utmr);

The "ufd" parameter allows for re-use (re-programming) of an existing timerfd without going through the close/open cycle (same as signalfd). If "ufd" is -1, a new file descriptor will be created, otherwise the existing "ufd" will be re-programmed.

The "clockid" parameter is either CLOCK_MONOTONIC or CLOCK_REALTIME. The time specified in the "utmr->it_value" parameter is the expiry time for the timer. If the TFD_TIMER_ABSTIME flag is set in "flags", this is an absolute time, otherwise it's a relative time.

If the time specified in "utmr->it_interval" is not zero (.tv_sec != 0 or .tv_nsec != 0), this is the period at which the following ticks should be generated. The "utmr->it_interval" should be set to zero if only one tick is requested. Setting "utmr->it_value" to zero will disable the timer, or will create a timerfd without the timer enabled.

The function returns the new (or same, in case "ufd" is a valid timerfd descriptor) file descriptor, or -1 in case of error.

As stated before, the timerfd file descriptor supports poll(2), select(2) and epoll(2). When a timer event has occurred on the timerfd, a POLLIN mask will be returned. The read(2) call can be used, and it will return a u32 variable holding the number of "ticks" that happened on the interface since the last call to read(2). The read(2) call supports the O_NONBLOCK flag too, and EAGAIN will be returned if no ticks happened.

A quick test program shows timerfd working correctly on my amd64 box: http://www.xmailserver.org/timerfd-test.c

[akpm@linux-foundation.org: add sys_timerfd to sys_ni.c]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
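
A minimal userspace sketch of the same idea follows. It does not use the single timerfd() call described in this commit: present-day Linux and glibc expose the interface as timerfd_create(2) and timerfd_settime(2), and read(2) returns an 8-byte counter rather than a u32. The helper name wait_for_ticks() is hypothetical, and this is only an illustration under those assumptions, not the test program linked above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/*
 * Arm a relative, periodic timer on a new timerfd and block until at least
 * one tick has been delivered. Returns the tick count reported by read(2),
 * or 0 on failure. first_ms must be greater than zero, because an all-zero
 * it_value leaves the timer disarmed. (Hypothetical helper.)
 */
static uint64_t wait_for_ticks(long first_ms, long period_ms)
{
    struct itimerspec its;
    uint64_t ticks = 0;
    int fd;

    fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd < 0)
        return 0;

    memset(&its, 0, sizeof(its));
    its.it_value.tv_sec = first_ms / 1000;
    its.it_value.tv_nsec = (first_ms % 1000) * 1000000L;
    its.it_interval.tv_sec = period_ms / 1000;
    its.it_interval.tv_nsec = (period_ms % 1000) * 1000000L;

    /* flags == 0: it_value is relative; TFD_TIMER_ABSTIME would make it absolute */
    if (timerfd_settime(fd, 0, &its, NULL) < 0) {
        close(fd);
        return 0;
    }

    /* Blocking read: returns once the timer has expired at least once */
    if (read(fd, &ticks, sizeof(ticks)) != (ssize_t)sizeof(ticks))
        ticks = 0;

    close(fd);
    return ticks;
}
```

Calling wait_for_ticks(10, 10) from main() should block for roughly 10 ms and return a count of at least 1. The same descriptor could instead be handed to poll(2) or epoll(2) and read only when POLLIN is reported.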
2007-05-11 | signal/timer/event: signalfd compat code | Davide Libenzi
This patch implements the necessary compat code for the signalfd system call. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | signal/timer/event: signalfd wire up x86 arches | Davide Libenzi
This patch wires the signalfd system call to the x86 architectures. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | signal/timer/event: signalfd core | Davide Libenzi
This patch series implements the new signalfd() system call.

I took part of the original Linus code (and you know how badly it can be broken :), and I added even more breakage ;)

Signals are fetched from the same signal queue used by the process, so signalfd will compete with standard kernel delivery in dequeue_signal(). If you want to reliably fetch signals on the signalfd file, you need to block them with sigprocmask(SIG_BLOCK).

This seems to be working fine on my Dual Opteron machine. I made a quick test program for it: http://www.xmailserver.org/signafd-test.c

The signalfd() system call implements signal delivery into a file descriptor receiver. The signalfd file descriptor is created with the following API:

    int signalfd(int ufd, const sigset_t *mask, size_t masksize);

The "ufd" parameter allows changing an existing signalfd sigmask without going through the close/create cycle (Linus' idea). Use "ufd" == -1 if you want a brand new signalfd file.

The "mask" parameter specifies the set of signals that we are interested in. The "masksize" parameter is the size of "mask".

The signalfd fd supports the poll(2) and read(2) system calls. The poll(2) call will return POLLIN when signals are available to be dequeued. As a direct consequence of supporting the Linux poll subsystem, the signalfd fd can be used together with epoll(2) too.

The read(2) system call will return a "struct signalfd_siginfo" structure in the userspace supplied buffer. The return value is the number of bytes copied into the supplied buffer, or -1 in case of error. The read(2) call can also return 0, in case the sighand structure to which the signalfd was attached has been orphaned. The O_NONBLOCK flag is also supported, and read(2) will return -EAGAIN in case no signal is available.

If the size of the buffer passed to read(2) is lower than sizeof(struct signalfd_siginfo), -EINVAL is returned. A read from the signalfd can also return -ERESTARTSYS in case a signal hits the process.

The format of struct signalfd_siginfo is shown below. Which fields are valid depends on the (->code & __SI_MASK) value, in the same way as for a struct siginfo:

    struct signalfd_siginfo {
        __u32 signo;    /* si_signo */
        __s32 err;      /* si_errno */
        __s32 code;     /* si_code */
        __u32 pid;      /* si_pid */
        __u32 uid;      /* si_uid */
        __s32 fd;       /* si_fd */
        __u32 tid;      /* si_tid */
        __u32 band;     /* si_band */
        __u32 overrun;  /* si_overrun */
        __u32 trapno;   /* si_trapno */
        __s32 status;   /* si_status */
        __s32 svint;    /* si_int */
        __u64 svptr;    /* si_ptr */
        __u64 utime;    /* si_utime */
        __u64 stime;    /* si_stime */
        __u64 addr;     /* si_addr */
    };

[akpm@linux-foundation.org: fix signalfd_copyinfo() on i386]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
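
The block-then-read pattern described above can be sketched in userspace as follows. This assumes the present-day glibc wrapper, whose third argument is a flags word rather than the "masksize" of this commit, and whose structure fields carry an ssi_ prefix (ssi_signo and so on). The helper name fetch_signal_via_fd() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/*
 * Block `signo`, send it to ourselves, and dequeue it through a signalfd.
 * Returns the signal number read from the descriptor, or -1 on error.
 * The signal is left blocked in the caller's mask. (Hypothetical helper.)
 */
static int fetch_signal_via_fd(int signo)
{
    struct signalfd_siginfo info;
    sigset_t mask;
    int fd;
    int result = -1;

    sigemptyset(&mask);
    sigaddset(&mask, signo);

    /* Block normal delivery so the signal stays queued for the signalfd */
    if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0)
        return -1;

    fd = signalfd(-1, &mask, 0);    /* -1 asks for a brand new descriptor */
    if (fd < 0)
        return -1;

    /* Queue the signal for this process, then read it back from the fd */
    if (kill(getpid(), signo) == 0 &&
        read(fd, &info, sizeof(info)) == (ssize_t)sizeof(info))
        result = (int)info.ssi_signo;

    close(fd);
    return result;
}
```

In a single-threaded program, fetch_signal_via_fd(SIGUSR1) should return SIGUSR1. Without the sigprocmask() call the default action for SIGUSR1 would terminate the process before read(2) ever saw the signal, which is the competition with normal delivery that the commit message warns about.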
2007-05-11 | signal/timer/event fds: anonymous inode source | Davide Libenzi
This patch adds an anonymous inode source, to be used for files that need an inode only in order to create a file*. We do not care about having an inode for each file, and we do not even care about having different names in the associated dentries (dentry names will be the same for classes of file*). This allows code reuse, and will be used by epoll, signalfd and timerfd (and whatever else there'll be). Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | Don't init pgrp and __session in INIT_SIGNALS | Sukadev Bhattiprolu
Remove initialization of pgrp and __session in INIT_SIGNALS, as these are later set by the call to __set_special_pids() in init/main.c by the patch: explicitly-set-pgid-and-sid-of-init-process.patch Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | Replace pid_t in autofs with struct pid reference | Sukadev Bhattiprolu
Make autofs container-friendly by caching a struct pid reference rather than a pid_t and using pid_nr() to retrieve a task's pid_t.

ChangeLog:
- Fix Eric Biederman's comments: use find_get_pid() to hold a reference to oz_pgrp and release it while unmounting; separate out changes to autofs and autofs4.
- Fix Cedric's comments: retain the old prototype of parse_options() and move the necessary change to its caller.

Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: containers@lists.osdl.org Acked-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | Fix some coding-style errors in autofs | Sukadev Bhattiprolu
Fix coding style errors (extra spaces, long lines) in autofs and autofs4 files being modified for container/pidspace issues. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: <containers@lists.osdl.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | Kill unused sesssion and group values in rocket driver | Sukadev Bhattiprolu
The process_session() and process_group() values are not really used by the driver. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: <containers@lists.osdl.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | Use task_pgrp() task_session() in copy_process() | Sukadev Bhattiprolu
Use task_pgrp() and task_session() in copy_process(), and avoid find_pid() call when attaching the task to its process group and session. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: <containers@lists.osdl.org> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | Use struct pid parameter in copy_process() | Sukadev Bhattiprolu
Modify copy_process() to take a struct pid * parameter instead of a pid_t. This simplifies the code a bit and also avoids having to call find_pid() to convert the pid_t to a struct pid.

Changelog:
- Fixed Badari Pulavarty's comments and passed in &init_struct_pid from fork_idle().
- Fixed Eric Biederman's comments and simplified this patch and used a new patch to remove the likely(pid) check.

Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: <containers@lists.osdl.org> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | Explicitly set pgid and sid of init process | Sukadev Bhattiprolu
Explicitly set pgid and sid of init process to 1. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: <containers@lists.osdl.org> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | statically initialize struct pid for swapper | Sukadev Bhattiprolu
Statically initialize a struct pid for the swapper process (pid_t == 0) and attach it to init_task. This is needed so task_pid(), task_pgrp() and task_session() interfaces work on the swapper process also. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: <containers@lists.osdl.org> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | attach_pid() with struct pid parameter | Sukadev Bhattiprolu
attach_pid() currently takes a pid_t and then uses find_pid() to find the corresponding struct pid. Sometimes we already have the struct pid. We can then skip find_pid() if attach_pid() were to take a struct pid parameter. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: <containers@lists.osdl.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | rtc-rs5c313.c: add error handling to avoid hardware hangup | kogiidena
Add error handling so that the driver no longer hangs in an infinite loop. Signed-off-by: kogiidena <kogiidena@eggplant.ddo.jp> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | rtc-rs5c313.c: rtc_time value are fixed | kogiidena
Correct an initial value of struct rtc_time. Signed-off-by: kogiidena <kogiidena@eggplant.ddo.jp> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | rtc-rs5c313.c: error and warning are fixed | kogiidena
Correct a compile error and warning. Signed-off-by: kogiidena <kogiidena@eggplant.ddo.jp> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | use defines in sys_getpriority/sys_setpriority | Daniel Walker
Switch to the defines for these two checks, instead of hard coding the values. [akpm@linux-foundation.org: add missing include] Signed-off-by: Daniel Walker <dwalker@mvista.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | MPC52xx PSC SPI master driver | Dragos Carp
SPI master driver for MPC52xx using its Programmable Serial Controller. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Dragos Carp <dragos.carp@toptica.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | consolidate generic_writepages and mpage_writepages | Miklos Szeredi
Clean up massive code duplication between mpage_writepages() and generic_writepages(). The new generic function, write_cache_pages() takes a function pointer argument, which will be called for each page to be written. Maybe cifs_writepages() too can use this infrastructure, but I'm not touching that with a ten-foot pole. The upcoming page writeback support in fuse will also want this. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
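
The consolidation pattern (one generic page walker, with the per-page work supplied through a function pointer) can be sketched in plain C as below. This is a userspace illustration only: struct page_stub, write_pages_sketch() and count_page() are hypothetical names, and the signature is not that of the kernel's write_cache_pages().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a page cache page */
struct page_stub {
    int index;
    int dirty;
};

/* Per-page callback: returns 0 on success, non-zero to abort the walk */
typedef int (*writepage_fn)(struct page_stub *page, void *data);

/*
 * Generic walker: visit every dirty page once and delegate the actual
 * write to the caller-supplied callback, so that different writeback
 * flavours can share the same loop.
 */
static int write_pages_sketch(struct page_stub *pages, size_t n,
                              writepage_fn writepage, void *data)
{
    for (size_t i = 0; i < n; i++) {
        int ret;

        if (!pages[i].dirty)
            continue;
        ret = writepage(&pages[i], data);
        if (ret)
            return ret;         /* propagate the first error */
        pages[i].dirty = 0;
    }
    return 0;
}

/* Example callback: only counts how many pages it was asked to write */
static int count_page(struct page_stub *page, void *data)
{
    (void)page;
    ++*(int *)data;
    return 0;
}
```

Two callers that previously duplicated the loop would now differ only in the callback they pass, which is the kind of reuse the commit message has in mind for fuse and, possibly, cifs.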
2007-05-11 | tty: add compat_ioctl | Paul Fulghum
Add compat_ioctl method for tty code to allow processing of 32 bit ioctl calls on 64 bit systems by tty core, tty drivers, and line disciplines. Based on patch by Arnd Bergmann: http://www.uwsg.iu.edu/hypermail/linux/kernel/0511.0/1732.html [akpm@linux-foundation.org: make things static] Signed-off-by: Paul Fulghum <paulkf@microgate.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | module_author: don't advise putting in an email address | Rene Herman
It's information that's easily outdated and easily mistaken for a driver contact, which is a problem especially for modules with multiple current and non-current authors as well as for modules with a maintainer who may not even be a module author. Signed-off-by: Rene Herman <rene.herman@gmail.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | SubmitChecklist: add -W help | Andrew Morton
Help people to work out how to use `gcc -W'. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | Overrun in drivers/char/rio/riocmd.c | Eric Sesterhenn / Snakebyte
This got somehow lost in the noise. This fixes Coverity bug id #1025: if Rup is greater than or equal to MAX_RUP, we run past the end of the Mapping array. Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Cc: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
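
The class of bug being fixed is an index that is used before it has been checked against the array size. A minimal sketch of the guard, with hypothetical names (MAX_RUP_SKETCH, mapping_sketch, lookup_mapping) and an arbitrary array size that is not the driver's:

```c
#include <assert.h>

#define MAX_RUP_SKETCH 4    /* hypothetical size, not the driver's MAX_RUP */

static const int mapping_sketch[MAX_RUP_SKETCH] = { 10, 11, 12, 13 };

/*
 * Return the mapped value for `rup`, or -1 when the index is out of range.
 * The comparison must be >=, since the last valid index is size - 1.
 */
static int lookup_mapping(unsigned int rup)
{
    if (rup >= MAX_RUP_SKETCH)
        return -1;
    return mapping_sketch[rup];
}
```

With this guard, lookup_mapping(3) returns the last element and lookup_mapping(4) is rejected instead of reading one slot past the array.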
2007-05-11 | stop_machine() now uses hard_irq_disable | Benjamin Herrenschmidt
Add a call to hard_irq_disable() to stop_machine so that we make sure IRQs are really disabled and not only lazy-disabled on archs like powerpc as some users of stop_machine() may rely on that. [akpm@linux-foundation.org: build fix] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | Add hard_irq_disable() | Benjamin Herrenschmidt
Some architectures, like powerpc, implement lazy disabling of interrupts. That means that on those, local_irq_disable() doesn't actually disable interrupts on the CPU, but only sets a per-CPU flag which causes them to be disabled only if an interrupt actually occurs.

However, in some cases, such as stop_machine, we really want interrupts to be fully disabled. For example, I have code using stop_machine to do ECC error injection, used to verify operations of the ECC hardware, that sort of thing. It really needs to make sure that nothing is actually writing to memory while the injection happens. Similar examples can be found in other low-level bits and pieces.

This patch implements a generic hard_irq_disable() function which is meant to be called -after- local_irq_disable() and ensures that interrupts are fully disabled on that CPU. The default implementation is a nop, though powerpc does already provide an appropriate one.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
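
The difference between lazy and hard disabling can be illustrated with a toy model. Everything here is hypothetical (struct cpu_sketch and the *_sketch functions are not kernel code); it only mirrors the behaviour described in the message: a soft flag that defers the real masking until an interrupt actually arrives, and a hard disable that masks immediately.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one CPU's interrupt state */
struct cpu_sketch {
    bool soft_enabled;  /* per-CPU flag flipped by the lazy disable */
    bool hard_enabled;  /* whether the CPU itself still takes interrupts */
    int handled;        /* interrupts that actually ran a handler */
};

/* Lazy disable: only the software flag changes */
static void local_irq_disable_sketch(struct cpu_sketch *c)
{
    c->soft_enabled = false;
}

/* Hard disable: the CPU stops taking interrupts right away */
static void hard_irq_disable_sketch(struct cpu_sketch *c)
{
    c->hard_enabled = false;
}

/* Returns true if the interrupt reached the CPU at all */
static bool interrupt_arrives(struct cpu_sketch *c)
{
    if (!c->hard_enabled)
        return false;               /* fully masked: nothing happens */
    if (!c->soft_enabled) {
        c->hard_enabled = false;    /* lazily disabled: mask for real now */
        return true;                /* ...but the CPU was still disturbed */
    }
    c->handled++;
    return true;
}
```

In this model, calling only local_irq_disable_sketch() still lets one interrupt reach the CPU before masking takes effect, while following it with hard_irq_disable_sketch() does not, which is the guarantee stop_machine-style code is after.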
2007-05-11 | powerpc: fixup hard_irq_disable semantics | Benjamin Herrenschmidt
This patch renames the raw hard_irq_{enable,disable} into __hard_irq_{enable,disable} and introduces a higher level hard_irq_disable() function that can be used by any code to enforce that IRQs are fully disabled, not only lazy disabled. The difference from the __ versions is that it updates some per-processor fields so that the kernel keeps track of the state and properly re-enables interrupts at the next local_irq_enable(). This prepares powerpc for my next patch that introduces hard_irq_disable() generically. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | synclink_gt: add compat_ioctl | Paul Fulghum
Add support for 32 bit ioctl on 64 bit systems for synclink_gt Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Paul Fulghum <paulkf@microgate.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | small cleanup in gpt partition handling | Olaf Hering
Remove unused argument in is_pmbr_valid(). Remove unneeded initialization of local variable legacy_mbr. Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | Consolidate asm/poll.h | Stephen Rothwell
These files are almost all the same. This patch could be made even simpler if we don't mind POLLREMOVE turning up in a few architectures that didn't have it previously (which should be OK as POLLREMOVE is not used anywhere in the current tree). Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | lib/hexdump | Randy Dunlap
Based on ace_dump_mem() from Grant Likely for the Xilinx SystemACE CompactFlash interface.

Add print_hex_dump() & hex_dumper() to lib/hexdump.c and linux/kernel.h.

This patch adds the functions print_hex_dump() & hex_dumper(). print_hex_dump() can be used to perform a hex + ASCII dump of data to syslog, in an easily viewable format, thus providing a common text hex dump format.

hex_dumper() provides a dump-to-memory function. It converts one "line" of output (16 bytes of input) at a time.

Example usages:

    print_hex_dump(KERN_DEBUG, DUMP_PREFIX_ADDRESS, frame->data, frame->len);
    hex_dumper(frame->data, frame->len, linebuf, sizeof(linebuf));

Example output using %DUMP_PREFIX_OFFSET:

    0009ab42: 40414243 44454647 48494a4b 4c4d4e4f-@ABCDEFG HIJKLMNO

Example output using %DUMP_PREFIX_ADDRESS:

    ffffffff88089af0: 70717273 74757677 78797a7b 7c7d7e7f-pqrstuvw xyz{|}~.

[akpm@linux-foundation.org: cleanups, add export]
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
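
To make the line format above concrete, here is a small userspace sketch that renders exactly 16 bytes in the same layout as the %DUMP_PREFIX_OFFSET example (offset, four groups of four hex bytes, a dash, then the ASCII view split in two halves). format_hex_line() and HEX_LINE_LEN are hypothetical names; this is not the kernel's hex_dumper() and makes no claim about its implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* 9 (offset and colon) + 36 (hex groups) + 1 (dash) + 17 (ASCII) + NUL */
#define HEX_LINE_LEN 64

/*
 * Render 16 bytes as one dump line:
 *   "<offset>: xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx-AAAAAAAA AAAAAAAA"
 * Non-printable bytes are shown as '.' in the ASCII column.
 */
static void format_hex_line(unsigned int offset, const unsigned char data[16],
                            char out[HEX_LINE_LEN])
{
    char *p = out;

    p += sprintf(p, "%08x:", offset);

    /* Hex column: a space before every group of four bytes */
    for (int i = 0; i < 16; i++)
        p += sprintf(p, "%s%02x", (i % 4 == 0) ? " " : "", data[i]);

    *p++ = '-';

    /* ASCII column: two halves of eight characters */
    for (int i = 0; i < 16; i++) {
        if (i == 8)
            *p++ = ' ';
        *p++ = isprint(data[i]) ? (char)data[i] : '.';
    }
    *p = '\0';
}
```

Feeding it the bytes 0x40 through 0x4f with offset 0x0009ab42 reproduces the first example line of the commit message.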
2007-05-11 | getrusage(): fill ru_inblock and ru_oublock fields if possible | Eric Dumazet
If CONFIG_TASK_IO_ACCOUNTING is defined, we update I/O accounting counters for each task.

This patch permits reporting of these values through the well-known getrusage() syscall, filling ru_inblock and ru_oublock instead of null values.

As TASK_IO_ACCOUNTING currently counts bytes, we approximate the block count as:

    nr_blocks = nr_bytes / 512

Example of use:

After the patch is applied, the /usr/bin/time command can give a good approximation of the I/O that the process had to do.

    $ /usr/bin/time grep tototo /usr/include/*
    Command exited with non-zero status 1
    0.00user 0.02system 0:02.11elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
    24288inputs+0outputs (0major+259minor)pagefaults 0swaps

    $ /usr/bin/time dd if=/dev/zero of=/tmp/testfile count=1000
    1000+0 records in
    1000+0 records out
    512000 bytes (512 kB) copied, 0.00326601 seconds, 157 MB/s
    0.00user 0.00system 0:00.00elapsed 80%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+3000outputs (0major+299minor)pagefaults 0swaps

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
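
From userspace, the two fields are read with an ordinary getrusage(2) call. The sketch below also spells out the byte-to-block approximation quoted above; bytes_to_blocks() and io_block_counts() are hypothetical helper names, and the values returned depend on whether the running kernel has I/O accounting enabled.

```c
#include <assert.h>
#include <sys/resource.h>

/* The approximation from the commit message: one block per 512 bytes */
static long bytes_to_blocks(unsigned long long nr_bytes)
{
    return (long)(nr_bytes / 512);
}

/*
 * Fetch the block input/output counters of the calling process.
 * Returns 0 on success and -1 if getrusage(2) fails.
 */
static int io_block_counts(long *in_blocks, long *out_blocks)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;

    *in_blocks = ru.ru_inblock;
    *out_blocks = ru.ru_oublock;
    return 0;
}
```

For instance, bytes_to_blocks(512000) gives 1000, the number of 512-byte blocks in the amount of data that the dd run above reports copying.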
2007-05-11 | uml: shrink kernel stacks | Jeff Dike
Make kernel stacks be 1 page on i386 and 2 pages on x86_64. These match the host values. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | uml: iRQ stacks | Jeff Dike
Add a separate IRQ stack. This differs from i386 in having the entire interrupt run on a separate stack rather than starting on the normal kernel stack and switching over once some preparation has been done. The underlying mechanism is, of course, sigaltstack.

Another difference is that interrupts that happen in userspace are handled on the normal kernel stack. These cause a wait wakeup instead of a signal delivery so there is no point in trying to switch stacks for these. There's no other stuff on the stack, so there is no extra stack consumption.

This quirk makes it possible to have the entire interrupt run on a separate stack - process preemption (and calls to schedule()) happens on a normal kernel stack. If we enable CONFIG_PREEMPT, this will need to be rethought.

The IRQ stack for CPU 0 is declared in the same way as the initial kernel stack. IRQ stacks for other CPUs will be allocated dynamically.

An extra field was added to the thread_info structure. When the active thread_info is copied to the IRQ stack, the real_thread field points back to the original stack. This makes it easy to tell where to copy the thread_info struct back to when the interrupt is finished. It also serves as a marker of a nested interrupt. It is NULL for the first interrupt on the stack, and non-NULL for any nested interrupts.

Care is taken to behave correctly if a second interrupt comes in when the thread_info structure is being set up or taken down. I could just disable interrupts here, but I don't feel like giving up any of the performance gained by not flipping signals on and off.

If an interrupt comes in during these critical periods, the handler can't run because it has no idea what shape the stack is in. So, it sets a bit for its signal in a global mask and returns. The outer handler will deal with this signal itself.

Atomicity is had with xchg. A nested interrupt that needs to bail out will xchg its signal mask into pending_mask and repeat in case yet another interrupt hit at the same time, until the mask stabilizes.

The outermost interrupt will set up the thread_info and xchg a zero into pending_mask when it is done. At this point, nested interrupts will look at ->real_thread and see that no setup needs to be done. They can just continue normally.

Similar care needs to be taken when exiting the outer handler. If another interrupt comes in while it is copying the thread_info, it will drop a bit into pending_mask. The outer handler will check this and if it is non-zero, will loop, set up the stack again, and handle the interrupt.

Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
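
The "xchg until the mask stabilizes" idea can be sketched with C11 atomics. This is a simplified model under stated assumptions, not the UML code: it leaves out the stack switching and the ->real_thread check entirely, the names enter_or_defer() and collect_pending() are hypothetical, and atomic_exchange() stands in for the kernel's xchg.

```c
#include <assert.h>
#include <stdatomic.h>

/* Bits of interrupts seen while a handler is busy; 0 means idle */
static _Atomic unsigned long pending_mask;

/*
 * Called on interrupt entry with this interrupt's bit.
 * Returns 0 if the caller is the outermost handler and may proceed, or 1
 * if another handler is busy and the caller must bail out, having left
 * its bit behind for the outer handler.
 */
static int enter_or_defer(unsigned long my_bit)
{
    unsigned long mine = my_bit;
    unsigned long prev = atomic_exchange(&pending_mask, mine);

    if (prev == 0)
        return 0;       /* mask was idle: we are the outermost handler */

    /*
     * Each exchange hands back whatever it displaced, so merging that
     * into our value and exchanging again never loses a bit. Stop once
     * the value read back equals the value written, meaning nobody
     * else slipped in between.
     */
    do {
        mine |= prev;
        prev = atomic_exchange(&pending_mask, mine);
    } while (prev != mine);

    return 1;
}

/* Outer handler: take over the deferred bits and mark the mask idle */
static unsigned long collect_pending(unsigned long my_bit)
{
    return atomic_exchange(&pending_mask, 0) & ~my_bit;
}
```

In a single-threaded walk-through, an outer handler entering with bit 0x2 proceeds, two later arrivals with bits 0x4 and 0x8 both bail out, and collect_pending(0x2) then reports 0xc as the work the outer handler still has to do. The design choice is that plain exchange is enough; no compare-and-swap or interrupt masking is needed to avoid losing a bit.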
2007-05-11 | uml: tidy IRQ code | Jeff Dike
Some tidying of the irq code before introducing irq stacks. Mostly style fixes, but the timer handler calls the timer code directly rather than going through the generic sig_handler_common_skas. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | uml: use UM_THREAD_SIZE in userspace code | Jeff Dike
Now that we have UM_THREAD_SIZE, we can replace the calculations in user-space code (an earlier patch took care of the kernel side of the house). Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | uml: remove task_protections | Jeff Dike
Replaced task_protections with stack_protections since they do the same thing, and task_protections was misnamed anyway. This needs THREAD_SIZE, so that's imported via common-offsets.h. Also tidied up the code in the vicinity. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | Let SYSV68_PARTITION default to yes on VME only | Geert Uytterhoeven
Don't enable SYSV68 partition table support on all m68k boxes by default, only on Motorola VME boards. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Philippe De Muyter <phdm@macqel.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | m32r: fix pte_to_pgoff(), pgoff_to_pte() and __swp_type() macros | Hirokazu Takata
This patch is required to handle file-mapped or swapped-out pages correctly.

- Fix pte_to_pgoff() and pgoff_to_pte() macros not to include _PAGE_PROTNONE bit of PTE. Mask value for { ACCESSED, N, (R, W, X), L, G } is not 0xef but 0x7f.
- Fix __swp_type() macro for MAX_SWAPFILES_SHIFT(=5), which is defined in include/linux/swap.h.

  * M32R TLB format
  [0]       [1:19]        [20:23]  [24:31]
  +-----------------------+----+-------------+
  |          VPN          |0000|    ASID     |
  +-----------------------+----+-------------+
  +-+---------------------+----+-+---+-+-+-+-+
  |0         PPN          |0000|N|AC |L|G|V| |
  +-+---------------------+----+-+---+-+-+-+-+
                             ||   RWX     | |
  * software bits in PTE     ||           | +-- _PAGE_FILE | _PAGE_DIRTY
                             ||           +---- _PAGE_PRESENT
                             |+---------------- _PAGE_ACCESSED
                             +----------------- _PAGE_PROTNONE

Signed-off-by: Hitoshi Yamamoto <hitoshiy@linux-m32r.org> Signed-off-by: Hirokazu Takata <takata@linux-m32r.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | m32r: fix tme_handler to check _PAGE_PRESENT bit | Hirokazu Takata
Fix the tlb-miss handler (tme_handler) to check _PAGE_PRESENT bit in order to handle file-mapped or swapped-out pages correctly. This patch is required to fix unexpected page errors for m32r. Signed-off-by: Hitoshi Yamamoto <hitoshiy@linux-m32r.org> Signed-off-by: Hirokazu Takata <takata@linux-m32r.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | m32r: fix switch_to macro to push/pop frame pointer if needed | Hirokazu Takata
This patch fixes a rare but severe scheduling problem in recent m32r kernels (2.6.17-rc3 or later).

An earlier m32r patch modified the switch_to macro to avoid unnecessary push/pop operations, as a tuning measure:

> [PATCH] m32r: update switch_to macro for tuning
> 4127272c38619c56f0c1aa01d01c7bd757db70a1

With that modification, only the 'lr' and 'sp' registers are pushed and popped, on the assumption that the m32r kernel is always compiled with the -fomit-frame-pointer option.

However, in the 2.6 kernel, kernel/sched.c is compiled as a special case with -fno-omit-frame-pointer if CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER is not defined.

    -- kernel/Makefile --
      :
    ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
    # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
    # needed for x86 only. Why this used to be enabled for all architectures is beyond
    # me. I suspect most platforms don't need this, but until we know that for sure
    # I turn this off for IA-64 only. Andreas Schwab says it's also needed on m68k
    # to get a correct value for the wait-channel (WCHAN in ps). --davidm
    CFLAGS_sched.o := $(PROFILING) -fno-omit-frame-pointer
    endif
      :
    ---

Therefore, for the recent m32r kernel, we have to push/pop 'fp' (frame pointer) if CONFIG_FRAME_POINTER is defined or CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER is not defined.

Signed-off-by: Hitoshi Yamamoto <hitoshiy@linux-m32r.org> Signed-off-by: Hirokazu Takata <takata@linux-m32r.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | h8300 syscall update | Yoshinori Sato
h8300 systemcall entry table update. Signed-off-by: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 | frv: gdb: use __maybe_unused | David Rientjes
Replace function instances of __attribute__((unused)) with __maybe_unused to suppress warnings. Cc: David Howells <dhowells@redhat.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11VM statistics: Make timer deferrableChristoph Lameter
VM statistics updates do not matter while the kernel is in idle powersaving mode, so allow the timer to be deferred. It would be better, though, if we could switch the timer between deferrable and nondeferrable based on whether differentials are present. The timer would start out nondeferrable, and if we find that there were no updates in the last statistics interval, we would switch the timer to deferrable. If the timer later finds differentials again, it would go back to nondeferrable. Yet another option might be to run the timer shortly before going idle? The solution here means that the VM counters may be slightly off during idle, since differentials may still be pending while the timer is deferred. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11AFS: implement statfsDavid Howells
Implement the statfs() op for AFS. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11AFS: fix a couple of problems with unlinking AFS filesDavid Howells
Fix a couple of problems with unlinking AFS files. (1) The parent directory wasn't being updated properly between unlink() and the following lookup(). It seems that, for some reason, invalidate_remote_inode() wasn't discarding the directory contents correctly, so this patch calls invalidate_inode_pages2() instead on non-regular files. (2) afs_vnode_deleted_remotely() should handle vnodes that don't have a source server recorded without oopsing. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11AFS: fix interminable loop in afs_write_back_from_locked_page()David Howells
The following bug was uncovered by compiling with the '-W' flag: CC [M] fs/afs/write.o fs/afs/write.c: In function ‘afs_write_back_from_locked_page’: fs/afs/write.c:398: warning: comparison of unsigned expression >= 0 is always true Loop variable 'n' is unsigned, so, as far as I can see, it wraps around and the loop never terminates. Trivial fix attached (compile tested only). Signed-off-by: Mika Kukkonen <mikukkon@iki.fi> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
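The class of bug behind this warning can be sketched in plain user-space C. This is only an illustration under stated assumptions: the helper names unsigned_zero_wraps() and sum_indices_down() are hypothetical and nothing here is taken from fs/afs/write.c. The sketch shows why an unsigned loop variable can never fail an ">= 0" test, and one way a countdown loop can be written so that it terminates.

```c
#include <limits.h>

/*
 * Illustrative only: these helpers are hypothetical and are not taken
 * from fs/afs/write.c.
 */

/* Returns 1 if decrementing an unsigned zero wraps to UINT_MAX. */
int unsigned_zero_wraps(void)
{
	unsigned int n = 0;

	n--;	/* well defined for unsigned types: wraps to UINT_MAX */
	return n == UINT_MAX;
}

/*
 * Sum the indices count-1 .. 0 using an unsigned loop variable.
 *
 * The tempting form "for (n = count - 1; n >= 0; n--)" never ends,
 * because "n >= 0" is always true for an unsigned n. Testing before
 * the decrement, as below, terminates for every count, including 0.
 */
unsigned int sum_indices_down(unsigned int count)
{
	unsigned int sum = 0;
	unsigned int n;

	for (n = count; n-- > 0; )
		sum += n;
	return sum;
}
```

Switching the loop variable to a signed type is another possible fix; which one suits a given loop depends on the range the variable has to cover.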
2007-05-11x86_64: new syscallAndi Kleen
Add epoll_pwait() (akpm: stolen from Andi's queue, because I want to send the signalfd patches which also add syscalls. Not sure what the __IGNORE_getcpu is for). Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11Documentation/gpio.txt mentions GENERIC_GPIODavid Brownell
Documentation/gpio.txt should mention the Kconfig GENERIC_GPIO flag, for platforms to declare when relevant. This should help minimize goofs like omitting it, or not depending on it when needed. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11Bug in mm/thrash.c function grab_swap_token()Mika Kukkonen
The following bug was uncovered by compiling with the '-W' flag: CC mm/thrash.o mm/thrash.c: In function ‘grab_swap_token’: mm/thrash.c:52: warning: comparison of unsigned expression < 0 is always false Variable token_priority is unsigned, so decrementing first and then checking the result does not work; fixed by reversing the test, patch attached (compile tested only). I am not sure if likely() makes much sense in this new situation, but I'll let somebody else make a decision on that. Signed-off-by: Mika Kukkonen <mikukkon@iki.fi> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
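The "decrement first, then check" problem described above can be sketched in user-space C as follows. This is a hedged illustration, not the code from mm/thrash.c: the function names lower_priority_buggy() and lower_priority_fixed() are hypothetical, and the sketch only assumes an unsigned priority value that should never drop below zero.

```c
#include <limits.h>

/*
 * Illustrative only: hypothetical helpers, not the grab_swap_token()
 * code from mm/thrash.c.
 */

/*
 * Decrement first. A following "if (prio < 0) prio = 0;" would be dead
 * code, since that test is always false for an unsigned value, so it is
 * left out here. A priority of 0 therefore wraps to UINT_MAX.
 */
unsigned int lower_priority_buggy(unsigned int prio)
{
	prio--;
	return prio;
}

/* Reversed test: check before decrementing, so 0 stays at 0. */
unsigned int lower_priority_fixed(unsigned int prio)
{
	if (prio > 0)
		prio--;
	return prio;
}
```

For nonzero inputs both versions agree; they differ only at 0, which is exactly the case the compiler warning points at.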