From 4fd2c20d964a8fb9861045f1022475c9d200d684 Mon Sep 17 00:00:00 2001
From: Dan Carpenter
Date: Tue, 23 Mar 2010 13:35:42 -0700
Subject: kcore: fix test for end of list

"m" is never NULL here.  A list_for_each_entry() loop that completes
without breaking out leaves its cursor pointing at a bogus entry
computed from the list head itself, not at NULL, so we need a
different test for the end-of-list condition: compare the embedded
list_head against the head of the list.

Signed-off-by: Dan Carpenter
Acked-by: KAMEZAWA Hiroyuki
Acked-by: WANG Cong
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 fs/proc/kcore.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'fs/proc')

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index a44a7897fd4..b442dac8f5f 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -490,7 +490,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 	}
 	read_unlock(&kclist_lock);
 
-	if (m == NULL) {
+	if (&m->list == &kclist_head) {
 		if (clear_user(buffer, tsz))
 			return -EFAULT;
 	} else if (is_vmalloc_or_module_addr((void *)start)) {
--
cgit v1.2.3-18-g5258
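
The pattern behind the fix is generic: list_for_each_entry() derives
its cursor with container_of() on every step, so the cursor can never
be NULL after the loop.  Below is a minimal userspace sketch of the
same iterator semantics (struct item and demo_head are hypothetical
names invented for the demo, not the kernel's kclist types):

#include <stddef.h>
#include <stdio.h>

/* Minimal intrusive list, same shape as the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Simplified list_for_each_entry(): the cursor is computed with
 * container_of() on every step, so it is never NULL -- when the loop
 * runs off the end, &pos->member simply equals the head again.
 */
#define list_for_each_entry(pos, head, member)				\
	for (pos = container_of((head)->next, typeof(*pos), member);	\
	     &pos->member != (head);					\
	     pos = container_of(pos->member.next, typeof(*pos), member))

struct item {
	int value;
	struct list_head list;
};

int main(void)
{
	struct list_head demo_head = { &demo_head, &demo_head }; /* empty */
	struct item *m;

	list_for_each_entry(m, &demo_head, list)
		;	/* never entered: the list is empty */

	/* Broken test: m is a bogus but non-NULL pointer near the head. */
	printf("m == NULL: %s\n", m == NULL ? "yes" : "no");

	/* Correct test, as in the fix above. */
	printf("&m->list == &demo_head: %s\n",
	       &m->list == &demo_head ? "yes" : "no");
	return 0;
}

Run, this prints "no" for the NULL test and "yes" for the head
comparison, which is exactly why the one-line change above is needed.
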
From 5a0e3ad6af8660be21ca98a971cd00f331318c05 Mon Sep 17 00:00:00 2001
From: Tejun Heo
Date: Wed, 24 Mar 2010 17:04:11 +0900
Subject: include cleanup: Update gfp.h and slab.h includes to prepare for
 breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h, making everything defined by the two files
universally available and complicating inclusion dependencies.

The percpu.h -> slab.h dependency is about to be removed.  Prepare
for this change by updating users of gfp and slab facilities to
include those headers directly instead of assuming availability.  As
this conversion needs to touch a large number of source files, the
following script was used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and tries to put the new include such that its order
  conforms to its surroundings.  It's put in the include block which
  contains core kernel includes, in the same order that the rest are
  ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end
  if there doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have a fitting include block), it prints
  out an error message indicating which .h file needs to be added to
  the file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480
   gfp.h and ~3000 slab.h inclusions.  The script emitted errors for
   ~400 files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h
   or embedding .c file was more appropriate for others.  This step
   added inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them, as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   widely available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as
   my distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make
   things build (like ipr on powerpc/64 which failed due to missing
   writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied
   as a separate patch and serve as a bisection point.

Given the fact that I had only a couple of failures from the build
tests in step 7, I'm fairly confident about the coverage of this
conversion patch.  If there is a breakage, it's likely to be
something in one of the arch headers, which should be easily
discoverable on most builds of the specific arch.

Signed-off-by: Tejun Heo
Guess-its-ok-by: Christoph Lameter
Cc: Ingo Molnar
Cc: Lee Schermerhorn
---
 fs/proc/array.c        | 1 -
 fs/proc/base.c         | 1 +
 fs/proc/generic.c      | 1 +
 fs/proc/inode.c        | 1 +
 fs/proc/kcore.c        | 1 +
 fs/proc/nommu.c        | 1 -
 fs/proc/proc_devtree.c | 1 +
 fs/proc/proc_net.c     | 1 +
 fs/proc/stat.c         | 1 -
 fs/proc/task_mmu.c     | 1 +
 fs/proc/task_nommu.c   | 1 +
 fs/proc/vmcore.c       | 1 +
 12 files changed, 9 insertions(+), 3 deletions(-)

(limited to 'fs/proc')

diff --git a/fs/proc/array.c b/fs/proc/array.c
index aa8637b8102..e51f2ec2c5e 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -68,7 +68,6 @@
 #include
 #include
 #include
-#include <linux/slab.h>
 #include
 #include
 #include

diff --git a/fs/proc/base.c b/fs/proc/base.c
index a7310841c83..9e82adc37b0 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -81,6 +81,7 @@
 #include
 #include
 #include
+#include <linux/slab.h>
 #include "internal.h"
 
 /* NOTE:

diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index 08f4d71dacd..43c12749060 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include <linux/slab.h>
 #include
 #include
 #include

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 445a02bcaab..d35b23238fb 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include <linux/slab.h>
 #include
 #include

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index b442dac8f5f..19979a2ce27 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include <linux/slab.h>
 #include
 #include
 #include

diff --git a/fs/proc/nommu.c b/fs/proc/nommu.c
index 9fe7d7ebe11..b1822dde55c 100644
--- a/fs/proc/nommu.c
+++ b/fs/proc/nommu.c
@@ -21,7 +21,6 @@
 #include
 #include
 #include
-#include <linux/slab.h>
 #include
 #include
 #include

diff --git a/fs/proc/proc_devtree.c b/fs/proc/proc_devtree.c
index f8650dce74f..ce94801f48c 100644
--- a/fs/proc/proc_devtree.c
+++ b/fs/proc/proc_devtree.c
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include <linux/slab.h>
 #include
 #include
 #include "internal.h"

diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index 04d1270f1c3..9020ac15baa 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include <linux/slab.h>
 #include
 #include
 #include

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index b9b7aad2003..bf31b03fc27 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -1,6 +1,5 @@
 #include
 #include
-#include <linux/slab.h>
 #include
 #include
 #include

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 183f8ff5f40..2d45889931f 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -4,6 +4,7 @@
 #include
 #include
 #include
+#include <linux/slab.h>
 #include
 #include
 #include

diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 5d9fd64ef81..46d4b5d72bd 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -5,6 +5,7 @@
 #include
 #include
 #include
+#include <linux/slab.h>
 #include
 #include "internal.h"

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 0872afa58d3..9fbc99ec799 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include <linux/slab.h>
 #include
 #include
 #include
--
cgit v1.2.3-18-g5258
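
The failure mode this sweep guards against is generic to C: if a
translation unit only compiles because an unrelated header happens to
drag in the declarations it uses, removing that indirect inclusion
breaks the build.  A small userspace analogue of the rule the script
enforces (stdlib.h standing in for slab.h here; delete the direct
include and the file no longer builds cleanly under
-Werror=implicit-function-declaration):

#include <stdio.h>	/* printf */
#include <stdlib.h>	/* malloc, free: include directly, never rely
			 * on another header pulling these in
			 * transitively */
#include <string.h>	/* memcpy */

int main(void)
{
	char *buf = malloc(16);

	if (!buf)
		return 1;
	memcpy(buf, "hello", 6);
	printf("%s\n", buf);
	free(buf);
	return 0;
}
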
From b95c35e76b29ba812e5dabdd91592e25ec640e93 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov
Date: Thu, 1 Apr 2010 15:13:57 +0200
Subject: oom: fix the unsafe usage of badness() in proc_oom_score()

proc_oom_score(task) has a reference to task_struct, but that is all.
If this task was already released before we take tasklist_lock:

  - we can't use task->group_leader, it points to nowhere

  - it is not safe to call badness() even if this task is
    ->group_leader, has_intersects_mems_allowed() assumes it is safe
    to iterate over the ->thread_group list

  - even worse, badness() can hit ->signal == NULL

Add the pid_alive() check to ensure __unhash_process() was not
called.

Also, use "task" instead of task->group_leader.  badness() should
return the same result for any sub-thread.  Currently this is not
true, but this should be changed anyway.

Signed-off-by: Oleg Nesterov
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds
---
 fs/proc/base.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

(limited to 'fs/proc')

diff --git a/fs/proc/base.c b/fs/proc/base.c
index a7310841c83..b1f6e62773d 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -442,12 +442,13 @@ static const struct file_operations proc_lstats_operations = {
 unsigned long badness(struct task_struct *p, unsigned long uptime);
 static int proc_oom_score(struct task_struct *task, char *buffer)
 {
-	unsigned long points;
+	unsigned long points = 0;
 	struct timespec uptime;
 
 	do_posix_clock_monotonic_gettime(&uptime);
 	read_lock(&tasklist_lock);
-	points = badness(task->group_leader, uptime.tv_sec);
+	if (pid_alive(task))
+		points = badness(task, uptime.tv_sec);
 	read_unlock(&tasklist_lock);
 	return sprintf(buffer, "%lu\n", points);
 }
--
cgit v1.2.3-18-g5258
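
Annotated, the fixed function reads as follows.  This is the same
code as the hunk above; the comments are editorial, and the
surrounding declarations come from the 2010-era tree:

static int proc_oom_score(struct task_struct *task, char *buffer)
{
	unsigned long points = 0;	/* reported for a released task */
	struct timespec uptime;

	do_posix_clock_monotonic_gettime(&uptime);
	read_lock(&tasklist_lock);
	/*
	 * pid_alive() is false once __unhash_process() has run, i.e.
	 * once the task has been removed from the pid hashes.  Past
	 * that point ->group_leader, ->thread_group and ->signal can
	 * no longer be trusted, so badness() must not run at all.
	 */
	if (pid_alive(task))
		points = badness(task, uptime.tv_sec);
	read_unlock(&tasklist_lock);
	return sprintf(buffer, "%lu\n", points);
}
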
From d82ef020cf31504c816803b1def94eb5ff173363 Mon Sep 17 00:00:00 2001
From: KAMEZAWA Hiroyuki
Date: Fri, 2 Apr 2010 09:11:29 +0900
Subject: proc: pagemap: Hold mmap_sem during page walk

In the initial design, walk_page_range() was designed just for
walking page tables and it didn't require mmap_sem.  Now find_vma()
etc. are used in walk_page_range() and we need mmap_sem around it.

This patch adds mmap_sem around walk_page_range().

Because /proc/<pid>/pagemap's callback routine uses put_user(), we
have to get rid of it to do a sane fix: put_user() can fault on the
user buffer, and faulting while mmap_sem is held can deadlock when
the buffer lives in the mm being walked.  The data is therefore
staged in a kernel buffer under the lock and copied out after the
lock is dropped.

Changelog: 2010/Apr/2
 - fixed start_vaddr and end overflow

Changelog: 2010/Apr/1
 - fixed start_vaddr calculation
 - removed unnecessary cast.
 - removed unnecessary change in smaps.
 - use GFP_TEMPORARY instead of GFP_KERNEL

Signed-off-by: KAMEZAWA Hiroyuki
Cc: Matt Mackall
Cc: KOSAKI Motohiro
Cc: San Mehat
Cc: Brian Swetland
Cc: Dave Hansen
Cc: Andrew Morton
[ Fixed kmalloc failure return code as per Matt ]
Signed-off-by: Linus Torvalds
---
 fs/proc/task_mmu.c | 87 ++++++++++++++++++++++++------------------------------
 1 file changed, 38 insertions(+), 49 deletions(-)

(limited to 'fs/proc')

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 183f8ff5f40..096273984c3 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -406,6 +406,7 @@ static int show_smap(struct seq_file *m, void *v)
 
 	memset(&mss, 0, sizeof mss);
 	mss.vma = vma;
+	/* mmap_sem is held in m_start */
 	if (vma->vm_mm && !is_vm_hugetlb_page(vma))
 		walk_page_range(vma->vm_start, vma->vm_end, &smaps_walk);
 
@@ -552,7 +553,8 @@ const struct file_operations proc_clear_refs_operations = {
 };
 
 struct pagemapread {
-	u64 __user *out, *end;
+	int pos, len;
+	u64 *buffer;
 };
 
 #define PM_ENTRY_BYTES sizeof(u64)
@@ -575,10 +577,8 @@ struct pagemapread {
 static int add_to_pagemap(unsigned long addr, u64 pfn,
 			  struct pagemapread *pm)
 {
-	if (put_user(pfn, pm->out))
-		return -EFAULT;
-	pm->out++;
-	if (pm->out >= pm->end)
+	pm->buffer[pm->pos++] = pfn;
+	if (pm->pos >= pm->len)
 		return PM_END_OF_BUFFER;
 	return 0;
 }
@@ -720,21 +720,20 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long addr,
  * determine which areas of memory are actually mapped and llseek to
  * skip over unmapped regions.
  */
+#define PAGEMAP_WALK_SIZE	(PMD_SIZE)
 static ssize_t pagemap_read(struct file *file, char __user *buf,
 			    size_t count, loff_t *ppos)
 {
 	struct task_struct *task = get_proc_task(file->f_path.dentry->d_inode);
-	struct page **pages, *page;
-	unsigned long uaddr, uend;
 	struct mm_struct *mm;
 	struct pagemapread pm;
-	int pagecount;
 	int ret = -ESRCH;
 	struct mm_walk pagemap_walk = {};
 	unsigned long src;
 	unsigned long svpfn;
 	unsigned long start_vaddr;
 	unsigned long end_vaddr;
+	int copied = 0;
 
 	if (!task)
 		goto out;
@@ -757,35 +756,12 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
 	if (!mm)
 		goto out_task;
-
-	uaddr = (unsigned long)buf & PAGE_MASK;
-	uend = (unsigned long)(buf + count);
-	pagecount = (PAGE_ALIGN(uend) - uaddr) / PAGE_SIZE;
-	ret = 0;
-	if (pagecount == 0)
-		goto out_mm;
-	pages = kcalloc(pagecount, sizeof(struct page *), GFP_KERNEL);
+	pm.len = PM_ENTRY_BYTES * (PAGEMAP_WALK_SIZE >> PAGE_SHIFT);
+	pm.buffer = kmalloc(pm.len, GFP_TEMPORARY);
 	ret = -ENOMEM;
-	if (!pages)
+	if (!pm.buffer)
 		goto out_mm;
 
-	down_read(&current->mm->mmap_sem);
-	ret = get_user_pages(current, current->mm, uaddr, pagecount,
-			     1, 0, pages, NULL);
-	up_read(&current->mm->mmap_sem);
-
-	if (ret < 0)
-		goto out_free;
-
-	if (ret != pagecount) {
-		pagecount = ret;
-		ret = -EFAULT;
-		goto out_pages;
-	}
-
-	pm.out = (u64 __user *)buf;
-	pm.end = (u64 __user *)(buf + count);
-
 	pagemap_walk.pmd_entry = pagemap_pte_range;
 	pagemap_walk.pte_hole = pagemap_pte_hole;
 	pagemap_walk.hugetlb_entry = pagemap_hugetlb_range;
@@ -807,23 +783,36 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
 	 * user buffer is tracked in "pm", and the walk
 	 * will stop when we hit the end of the buffer.
 	 */
-	ret = walk_page_range(start_vaddr, end_vaddr, &pagemap_walk);
-	if (ret == PM_END_OF_BUFFER)
-		ret = 0;
-	/* don't need mmap_sem for these, but this looks cleaner */
-	*ppos += (char __user *)pm.out - buf;
-	if (!ret)
-		ret = (char __user *)pm.out - buf;
-
-out_pages:
-	for (; pagecount; pagecount--) {
-		page = pages[pagecount-1];
-		if (!PageReserved(page))
-			SetPageDirty(page);
-		page_cache_release(page);
+	ret = 0;
+	while (count && (start_vaddr < end_vaddr)) {
+		int len;
+		unsigned long end;
+
+		pm.pos = 0;
+		end = start_vaddr + PAGEMAP_WALK_SIZE;
+		/* overflow ? */
+		if (end < start_vaddr || end > end_vaddr)
+			end = end_vaddr;
+		down_read(&mm->mmap_sem);
+		ret = walk_page_range(start_vaddr, end, &pagemap_walk);
+		up_read(&mm->mmap_sem);
+		start_vaddr = end;
+
+		len = min(count, PM_ENTRY_BYTES * pm.pos);
+		if (copy_to_user(buf, pm.buffer, len) < 0) {
+			ret = -EFAULT;
+			goto out_free;
+		}
+		copied += len;
+		buf += len;
+		count -= len;
 	}
+	*ppos += copied;
+	if (!ret || ret == PM_END_OF_BUFFER)
+		ret = copied;
 
out_free:
-	kfree(pages);
+	kfree(pm.buffer);
 out_mm:
 	mmput(mm);
 out_task:
--
cgit v1.2.3-18-g5258
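
The core of the new loop is worth isolating: walk an address range in
fixed-size chunks while guarding against start + PAGEMAP_WALK_SIZE
wrapping past the top of the address space.  A standalone userspace
sketch of just that loop shape (walk_range() and CHUNK are made-up
names for the demo; the lock/fill/copy steps are reduced to a
comment):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define CHUNK 0x1000u	/* stand-in for PAGEMAP_WALK_SIZE */

/* Walk [start, limit) in CHUNK-sized pieces, the way the patched
 * pagemap_read() walks [start_vaddr, end_vaddr). */
static void walk_range(uintptr_t start, uintptr_t limit)
{
	while (start < limit) {
		uintptr_t end = start + CHUNK;

		/* overflow? end wrapped around, or overshot the limit */
		if (end < start || end > limit)
			end = limit;

		/*
		 * ... take the lock, fill a kernel-side buffer for
		 * [start, end), drop the lock, then copy to the user
		 * buffer ...
		 */
		printf("chunk [0x%" PRIxPTR ", 0x%" PRIxPTR ")\n",
		       start, end);
		start = end;
	}
}

int main(void)
{
	/* The last start + CHUNK here wraps past UINTPTR_MAX, so the
	 * guard clamps the final chunk to the limit. */
	walk_range(UINTPTR_MAX - 0x1800, UINTPTR_MAX);
	return 0;
}

Holding mmap_sem only across each chunk's walk_page_range(), and
doing the copy-out with the lock dropped, is what removes both the
put_user()-under-lock problem and the long lock hold times.
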
From 309361e09ca9e9670dc8664e5d14125bf82078af Mon Sep 17 00:00:00 2001
From: Dan Carpenter
Date: Tue, 6 Apr 2010 13:45:39 +0300
Subject: proc: copy_to_user() returns unsigned

copy_to_user() returns the number of bytes left to be copied, as an
unsigned value, so testing its return with "< 0" can never trigger.
This was a typo from:
d82ef020cf31 "proc: pagemap: Hold mmap_sem during page walk".

Signed-off-by: Dan Carpenter
Acked-by: Matt Mackall
Signed-off-by: Linus Torvalds
---
 fs/proc/task_mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'fs/proc')

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index caf0337dff7..a05a669510a 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -800,7 +800,7 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
 		start_vaddr = end;
 
 		len = min(count, PM_ENTRY_BYTES * pm.pos);
-		if (copy_to_user(buf, pm.buffer, len) < 0) {
+		if (copy_to_user(buf, pm.buffer, len)) {
 			ret = -EFAULT;
 			goto out_free;
 		}
--
cgit v1.2.3-18-g5258
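
This bug class is easy to reproduce in userspace: an unsigned return
value compared with "< 0" is dead code, so a short copy goes
unnoticed.  A sketch with a stand-in helper (fake_copy_to_user() is
invented for the demo; only the return convention matches the
kernel's):

#include <stdio.h>
#include <string.h>

/* Userspace stand-in for copy_to_user(): returns the number of bytes
 * it could NOT copy, as an unsigned value -- 0 means full success. */
static unsigned long fake_copy_to_user(char *dst, const char *src,
				       unsigned long n, unsigned long room)
{
	unsigned long done = n < room ? n : room;

	memcpy(dst, src, done);
	return n - done;	/* bytes left uncopied */
}

int main(void)
{
	char dst[4];
	unsigned long left = fake_copy_to_user(dst, "hello!", 6,
					       sizeof(dst));

	/* Buggy test: an unsigned value is never < 0, so this branch
	 * is dead code (GCC's -Wtype-limits flags it). */
	if (left < 0)
		puts("never printed");

	/* Correct test, as in the fix: any nonzero return is a fault. */
	if (left)
		puts("short copy detected -> would return -EFAULT");
	return 0;
}
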
From 116354d177ba2da37e91cf884e3d11e67f825efd Mon Sep 17 00:00:00 2001
From: Naoya Horiguchi
Date: Tue, 6 Apr 2010 14:35:04 -0700
Subject: pagemap: fix pfn calculation for hugepage

When we look into pagemap using page-types with option -p, the pfn
values for hugepages look wrong (see below).  This is because the pte
was evaluated only once per vma, although it should be updated for
each hugepage.  This patch fixes it.

  $ page-types -p 3277 -Nl -b huge
  voffset   offset  len  flags
  7f21e8a00 11e400  1    ___U___________H_G________________
  7f21e8a01 11e401  1ff  ________________TG________________
            ^^^
  7f21e8c00 11e400  1    ___U___________H_G________________
  7f21e8c01 11e401  1ff  ________________TG________________
            ^^^

One hugepage contains 1 head page and 511 tail pages on x86_64, and
each pair of lines represents one hugepage.  voffset and offset are
the virtual and physical addresses in page units, respectively.
Different hugepages should not have the same offset value.

With this patch applied:

  $ page-types -p 3386 -Nl -b huge
  voffset   offset  len  flags
  7fec7a600 112c00  1    ___UD__________H_G________________
  7fec7a601 112c01  1ff  ________________TG________________
            ^^^
  7fec7a800 113200  1    ___UD__________H_G________________
  7fec7a801 113201  1ff  ________________TG________________
            ^^^
  OK

More info:

- This patch modifies walk_page_range()'s hugepage walker, but the
  change only affects pagemap_read(), which is the only caller of the
  hugepage callback.

- Without this patch, the hugetlb_entry() callback is called per vma,
  which doesn't match the natural expectation from its name.

- With this patch, hugetlb_entry() is called per hugepte entry and
  the callback can become much simpler.

Signed-off-by: Naoya Horiguchi
Signed-off-by: KAMEZAWA Hiroyuki
Acked-by: Matt Mackall
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 fs/proc/task_mmu.c | 27 +++++++--------------------
 1 file changed, 7 insertions(+), 20 deletions(-)

(limited to 'fs/proc')

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index a05a669510a..070553427dd 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -662,31 +662,18 @@ static u64 huge_pte_to_pagemap_entry(pte_t pte, int offset)
 	return pme;
 }
 
-static int pagemap_hugetlb_range(pte_t *pte, unsigned long addr,
-				 unsigned long end, struct mm_walk *walk)
+/* This function walks within one hugetlb entry in the single call */
+static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
+				 unsigned long addr, unsigned long end,
+				 struct mm_walk *walk)
 {
-	struct vm_area_struct *vma;
 	struct pagemapread *pm = walk->private;
-	struct hstate *hs = NULL;
 	int err = 0;
+	u64 pfn;
 
-	vma = find_vma(walk->mm, addr);
-	if (vma)
-		hs = hstate_vma(vma);
 	for (; addr != end; addr += PAGE_SIZE) {
-		u64 pfn = PM_NOT_PRESENT;
-
-		if (vma && (addr >= vma->vm_end)) {
-			vma = find_vma(walk->mm, addr);
-			if (vma)
-				hs = hstate_vma(vma);
-		}
-
-		if (vma && (vma->vm_start <= addr) && is_vm_hugetlb_page(vma)) {
-			/* calculate pfn of the "raw" page in the hugepage. */
-			int offset = (addr & ~huge_page_mask(hs)) >> PAGE_SHIFT;
-			pfn = huge_pte_to_pagemap_entry(*pte, offset);
-		}
+		int offset = (addr & ~hmask) >> PAGE_SHIFT;
+		pfn = huge_pte_to_pagemap_entry(*pte, offset);
 		err = add_to_pagemap(addr, pfn, pm);
 		if (err)
 			return err;
--
cgit v1.2.3-18-g5258
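
The arithmetic of the fix can be checked in isolation: within one
hugetlb entry, the raw-page index is the low bits of the virtual
address above PAGE_SHIFT, i.e. (addr & ~hmask) >> PAGE_SHIFT, and it
must advance with addr rather than being computed once per vma.  A
small userspace check (2 MB hugepages and the voffset from the report
above are assumptions made for the demo):

#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define HPAGE_SHIFT	21	/* 2 MB hugepages, as on x86_64 */

int main(void)
{
	unsigned long hmask = ~((1UL << HPAGE_SHIFT) - 1);
	unsigned long base = 0x7f21e8a00000UL;	/* voffset 7f21e8a00 */
	unsigned long addr;

	/*
	 * The offset now advances with addr: the head page gets 0 and
	 * the tail pages get 1..511, so consecutive 4 KB slots map to
	 * distinct pfns instead of all repeating the head page's pfn.
	 */
	for (addr = base; addr < base + 4 * PAGE_SIZE; addr += PAGE_SIZE) {
		int offset = (addr & ~hmask) >> PAGE_SHIFT;
		printf("addr %#lx -> offset %d\n", addr, offset);
	}
	return 0;
}
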