Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
Pull SLAB changes from Pekka Enberg:
"The patches from Joonsoo Kim switch mm/slab.c to use 'struct page' for
slab internals similar to mm/slub.c. This reduces memory usage and
improves performance:
https://lkml.org/lkml/2013/10/16/155
Rest of the changes are bug fixes from various people"
* 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: (21 commits)
mm, slub: fix the typo in mm/slub.c
mm, slub: fix the typo in include/linux/slub_def.h
slub: Handle NULL parameter in kmem_cache_flags
slab: replace non-existing 'struct freelist *' with 'void *'
slab: fix to calm down kmemleak warning
slub: proper kmemleak tracking if CONFIG_SLUB_DEBUG disabled
slab: rename slab_bufctl to slab_freelist
slab: remove useless statement for checking pfmemalloc
slab: use struct page for slab management
slab: replace free and inuse in struct slab with newly introduced active
slab: remove SLAB_LIMIT
slab: remove kmem_bufctl_t
slab: change the management method of free objects of the slab
slab: use __GFP_COMP flag for allocating slab pages
slab: use well-defined macro, virt_to_slab()
slab: overloading the RCU head over the LRU for RCU free
slab: remove cachep in struct slab_rcu
slab: remove nodeid in struct slab
slab: remove colouroff in struct slab
slab: change return type of kmem_getpages() to struct page
...
|
|
Fengguang Wu reports that compiling mm/mempolicy.c results in a warning:
mm/mempolicy.c: In function 'mpol_to_str':
mm/mempolicy.c:2878:2: error: format not a string literal and no format arguments
Kees says this is because he is using -Wformat-security.
Silence the warning.
Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Suggested-by: Kees Cook <keescook@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 7cb2ef56e6a8 ("mm: fix aio performance regression for database
caused by THP") can cause dereference of a dangling pointer if
split_huge_page runs during PageHuge() if there are updates to the
tail_page->private field.
Also it is repeating compound_head twice for hugetlbfs and it is running
compound_head+compound_trans_head for THP when a single one is needed in
both cases.
The new code within the PageSlab() check doesn't need to verify that the
THP page size is never bigger than the smallest hugetlbfs page size, to
avoid memory corruption.
A longstanding theoretical race condition was found while fixing the
above (see the change right after the skip_unlock label, that is
relevant for the compound_lock path too).
By re-establishing the _mapcount tail refcounting for all compound
pages, this also fixes the below problem:
echo 0 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
BUG: Bad page state in process bash pfn:59a01
page:ffffea000139b038 count:0 mapcount:10 mapping: (null) index:0x0
page flags: 0x1c00000000008000(tail)
Modules linked in:
CPU: 6 PID: 2018 Comm: bash Not tainted 3.12.0+ #25
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
dump_stack+0x55/0x76
bad_page+0xd5/0x130
free_pages_prepare+0x213/0x280
__free_pages+0x36/0x80
update_and_free_page+0xc1/0xd0
free_pool_huge_page+0xc2/0xe0
set_max_huge_pages.part.58+0x14c/0x220
nr_hugepages_store_common.isra.60+0xd0/0xf0
nr_hugepages_store+0x13/0x20
kobj_attr_store+0xf/0x20
sysfs_write_file+0x189/0x1e0
vfs_write+0xc5/0x1f0
SyS_write+0x55/0xb0
system_call_fastpath+0x16/0x1b
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Tested-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Pravin Shelar <pshelar@nicira.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Right now, the migration code in migrate_page_copy() uses copy_huge_page()
for hugetlbfs and thp pages:
if (PageHuge(page) || PageTransHuge(page))
copy_huge_page(newpage, page);
So, yay for code reuse. But:
void copy_huge_page(struct page *dst, struct page *src)
{
struct hstate *h = page_hstate(src);
and a non-hugetlbfs page has no page_hstate(). This works 99% of the
time because page_hstate() determines the hstate from the page order
alone. Since the page order of a THP page matches the default hugetlbfs
page order, it works.
But, if you change the default huge page size on the boot command-line
(say default_hugepagesz=1G), then we might not even *have* a 2MB hstate
so page_hstate() returns null and copy_huge_page() oopses pretty fast
since copy_huge_page() dereferences the hstate:
void copy_huge_page(struct page *dst, struct page *src)
{
struct hstate *h = page_hstate(src);
if (unlikely(pages_per_huge_page(h) > MAX_ORDER_NR_PAGES)) {
...
Mel noticed that the migration code is really the only user of these
functions. This moves all the copy code over to migrate.c and makes
copy_huge_page() work for THP by checking for it explicitly.
I believe the bug was introduced in commit b32967ff101a ("mm: numa: Add
THP migration for the NUMA working set scanning fault case")
[akpm@linux-foundation.org: fix coding-style and comment text, per Naoya Horiguchi]
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Tested-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This reverts commit ea1e7ed33708c7a760419ff9ded0a6cb90586a50.
Al points out that while the commit *does* actually create a separate
slab for the page->ptl allocation, that slab is never actually used, and
the code continues to use kmalloc/kfree.
Damien Wyart points out that the original patch did have the conversion
to use kmem_cache_alloc/free, so it got lost somewhere on its way to me.
Revert the half-arsed attempt that didn't do anything. If we really do
want the special slab (remember: this is all relevant just for debug
builds, so it's not necessarily all that critical) we might as well redo
the patch fully.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill A Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
"Usual earth-shaking, news-breaking, rocket science pile from
trivial.git"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
doc: usb: Fix typo in Documentation/usb/gadget_configs.txt
doc: add missing files to timers/00-INDEX
timekeeping: Fix some trivial typos in comments
mm: Fix some trivial typos in comments
irq: Fix some trivial typos in comments
NUMA: fix typos in Kconfig help text
mm: update 00-INDEX
doc: Documentation/DMA-attributes.txt fix typo
DRM: comment: `halve' -> `half'
Docs: Kconfig: `devlopers' -> `developers'
doc: typo on word accounting in kprobes.c in mutliple architectures
treewide: fix "usefull" typo
treewide: fix "distingush" typo
mm/Kconfig: Grammar s/an/a/
kexec: Typo s/the/then/
Documentation/kvm: Update cpuid documentation for steal time and pv eoi
treewide: Fix common typo in "identify"
__page_to_pfn: Fix typo in comment
Correct some typos for word frequency
clk: fixed-factor: Fix a trivial typo
...
|
|
This patch enhances the type safety for the kfifo API. It is now safe
to put const data into a non const FIFO and the API will now generate a
compiler warning when reading from the fifo where the destination
address is pointing to a const variable.
As a side effect the kfifo_put() does now expect the value of an element
instead a pointer to the element. This was suggested Russell King. It
make the handling of the kfifo_put easier since there is no need to
create a helper variable for getting the address of a pointer or to pass
integers of different sizes.
IMHO the API break is okay, since there are currently only six users of
kfifo_put().
The code is also cleaner by kicking out the "if (0)" expressions.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC are enabled spinlock_t on x86_64
is 72 bytes. For page->ptl they will be allocated from kmalloc-96 slab,
so we loose 24 on each. An average system can easily allocate few tens
thousands of page->ptl and overhead is significant.
Let's create a separate slab for page->ptl allocation to solve this.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Use kernel/bounds.c to convert build-time spinlock_t size check into a
preprocessor symbol and apply that to properly separate the page::ptl
situation.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If split page table lock is in use, we embed the lock into struct page
of table's page. We have to disable split lock, if spinlock_t is too
big be to be embedded, like when DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC
enabled.
This patch add support for dynamic allocation of split page table lock
if we can't embed it to struct page.
page->ptl is unsigned long now and we use it as spinlock_t if
sizeof(spinlock_t) <= sizeof(long), otherwise it's pointer to spinlock_t.
The spinlock_t allocated in pgtable_page_ctor() for PTE table and in
pgtable_pmd_page_ctor() for PMD table. All other helpers converted to
support dynamically allocated page->ptl.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The basic idea is the same as with PTE level: the lock is embedded into
struct page of table's page.
We can't use mm->pmd_huge_pte to store pgtables for THP, since we don't
take mm->page_table_lock anymore. Let's reuse page->lru of table's page
for that.
pgtable_pmd_page_ctor() returns true, if initialization is successful
and false otherwise. Current implementation never fails, but assumption
that constructor can fail will help to port it to -rt where spinlock_t
is rather huge and cannot be embedded into struct page -- dynamic
allocation is required.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Alex Thorlton <athorlton@sgi.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Only trivial cases left. Let's convert them altogether.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Alex Thorlton <athorlton@sgi.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Hugetlb supports multiple page sizes. We use split lock only for PMD
level, but not for PUD.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Alex Thorlton <athorlton@sgi.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Currently mm->pmd_huge_pte protected by page table lock. It will not
work with split lock. We have to have per-pmd pmd_huge_pte for proper
access serialization.
For now, let's just introduce wrapper to access mm->pmd_huge_pte.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Alex Thorlton <athorlton@sgi.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
With split page table lock we can't know which lock we need to take
before we find the relevant pmd.
Let's move lock taking inside the function.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Alex Thorlton <athorlton@sgi.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
With split ptlock it's important to know which lock
pmd_trans_huge_lock() took. This patch adds one more parameter to the
function to return the lock.
In most places migration to new api is trivial. Exception is
move_huge_pmd(): we need to take two locks if pmd tables are different.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Alex Thorlton <athorlton@sgi.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
With split page table lock for PMD level we can't hold mm->page_table_lock
while updating nr_ptes.
Let's convert it to atomic_long_t to avoid races.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Alex Thorlton <athorlton@sgi.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Alex Thorlton noticed that some massively threaded workloads work poorly,
if THP enabled. This patchset fixes this by introducing split page table
lock for PMD tables. hugetlbfs is not covered yet.
This patchset is based on work by Naoya Horiguchi.
: akpm result summary:
:
: THP off, v3.12-rc2: 18.059261877 seconds time elapsed
: THP off, patched: 16.768027318 seconds time elapsed
:
: THP on, v3.12-rc2: 42.162306788 seconds time elapsed
: THP on, patched: 8.397885779 seconds time elapsed
:
: HUGETLB, v3.12-rc2: 47.574936948 seconds time elapsed
: HUGETLB, patched: 19.447481153 seconds time elapsed
THP off, v3.12-rc2:
-------------------
Performance counter stats for './thp_memscale -c 80 -b 512m' (5 runs):
1037072.835207 task-clock # 57.426 CPUs utilized ( +- 3.59% )
95,093 context-switches # 0.092 K/sec ( +- 3.93% )
140 cpu-migrations # 0.000 K/sec ( +- 5.28% )
10,000,550 page-faults # 0.010 M/sec ( +- 0.00% )
2,455,210,400,261 cycles # 2.367 GHz ( +- 3.62% ) [83.33%]
2,429,281,882,056 stalled-cycles-frontend # 98.94% frontend cycles idle ( +- 3.67% ) [83.33%]
1,975,960,019,659 stalled-cycles-backend # 80.48% backend cycles idle ( +- 3.88% ) [66.68%]
46,503,296,013 instructions # 0.02 insns per cycle
# 52.24 stalled cycles per insn ( +- 3.21% ) [83.34%]
9,278,997,542 branches # 8.947 M/sec ( +- 4.00% ) [83.34%]
89,881,640 branch-misses # 0.97% of all branches ( +- 1.17% ) [83.33%]
18.059261877 seconds time elapsed ( +- 2.65% )
THP on, v3.12-rc2:
------------------
Performance counter stats for './thp_memscale -c 80 -b 512m' (5 runs):
3114745.395974 task-clock # 73.875 CPUs utilized ( +- 1.84% )
267,356 context-switches # 0.086 K/sec ( +- 1.84% )
99 cpu-migrations # 0.000 K/sec ( +- 1.40% )
58,313 page-faults # 0.019 K/sec ( +- 0.28% )
7,416,635,817,510 cycles # 2.381 GHz ( +- 1.83% ) [83.33%]
7,342,619,196,993 stalled-cycles-frontend # 99.00% frontend cycles idle ( +- 1.88% ) [83.33%]
6,267,671,641,967 stalled-cycles-backend # 84.51% backend cycles idle ( +- 2.03% ) [66.67%]
117,819,935,165 instructions # 0.02 insns per cycle
# 62.32 stalled cycles per insn ( +- 4.39% ) [83.34%]
28,899,314,777 branches # 9.278 M/sec ( +- 4.48% ) [83.34%]
71,787,032 branch-misses # 0.25% of all branches ( +- 1.03% ) [83.33%]
42.162306788 seconds time elapsed ( +- 1.73% )
HUGETLB, v3.12-rc2:
-------------------
Performance counter stats for './thp_memscale_hugetlbfs -c 80 -b 512M' (5 runs):
2588052.787264 task-clock # 54.400 CPUs utilized ( +- 3.69% )
246,831 context-switches # 0.095 K/sec ( +- 4.15% )
138 cpu-migrations # 0.000 K/sec ( +- 5.30% )
21,027 page-faults # 0.008 K/sec ( +- 0.01% )
6,166,666,307,263 cycles # 2.383 GHz ( +- 3.68% ) [83.33%]
6,086,008,929,407 stalled-cycles-frontend # 98.69% frontend cycles idle ( +- 3.77% ) [83.33%]
5,087,874,435,481 stalled-cycles-backend # 82.51% backend cycles idle ( +- 4.41% ) [66.67%]
133,782,831,249 instructions # 0.02 insns per cycle
# 45.49 stalled cycles per insn ( +- 4.30% ) [83.34%]
34,026,870,541 branches # 13.148 M/sec ( +- 4.24% ) [83.34%]
68,670,942 branch-misses # 0.20% of all branches ( +- 3.26% ) [83.33%]
47.574936948 seconds time elapsed ( +- 2.09% )
THP off, patched:
-----------------
Performance counter stats for './thp_memscale -c 80 -b 512m' (5 runs):
943301.957892 task-clock # 56.256 CPUs utilized ( +- 3.01% )
86,218 context-switches # 0.091 K/sec ( +- 3.17% )
121 cpu-migrations # 0.000 K/sec ( +- 6.64% )
10,000,551 page-faults # 0.011 M/sec ( +- 0.00% )
2,230,462,457,654 cycles # 2.365 GHz ( +- 3.04% ) [83.32%]
2,204,616,385,805 stalled-cycles-frontend # 98.84% frontend cycles idle ( +- 3.09% ) [83.32%]
1,778,640,046,926 stalled-cycles-backend # 79.74% backend cycles idle ( +- 3.47% ) [66.69%]
45,995,472,617 instructions # 0.02 insns per cycle
# 47.93 stalled cycles per insn ( +- 2.51% ) [83.34%]
9,179,700,174 branches # 9.731 M/sec ( +- 3.04% ) [83.35%]
89,166,529 branch-misses # 0.97% of all branches ( +- 1.45% ) [83.33%]
16.768027318 seconds time elapsed ( +- 2.47% )
THP on, patched:
----------------
Performance counter stats for './thp_memscale -c 80 -b 512m' (5 runs):
458793.837905 task-clock # 54.632 CPUs utilized ( +- 0.79% )
41,831 context-switches # 0.091 K/sec ( +- 0.97% )
98 cpu-migrations # 0.000 K/sec ( +- 1.66% )
57,829 page-faults # 0.126 K/sec ( +- 0.62% )
1,077,543,336,716 cycles # 2.349 GHz ( +- 0.81% ) [83.33%]
1,067,403,802,964 stalled-cycles-frontend # 99.06% frontend cycles idle ( +- 0.87% ) [83.33%]
864,764,616,143 stalled-cycles-backend # 80.25% backend cycles idle ( +- 0.73% ) [66.68%]
16,129,177,440 instructions # 0.01 insns per cycle
# 66.18 stalled cycles per insn ( +- 7.94% ) [83.35%]
3,618,938,569 branches # 7.888 M/sec ( +- 8.46% ) [83.36%]
33,242,032 branch-misses # 0.92% of all branches ( +- 2.02% ) [83.32%]
8.397885779 seconds time elapsed ( +- 0.18% )
HUGETLB, patched:
-----------------
Performance counter stats for './thp_memscale_hugetlbfs -c 80 -b 512M' (5 runs):
395353.076837 task-clock # 20.329 CPUs utilized ( +- 8.16% )
55,730 context-switches # 0.141 K/sec ( +- 5.31% )
138 cpu-migrations # 0.000 K/sec ( +- 4.24% )
21,027 page-faults # 0.053 K/sec ( +- 0.00% )
930,219,717,244 cycles # 2.353 GHz ( +- 8.21% ) [83.32%]
914,295,694,103 stalled-cycles-frontend # 98.29% frontend cycles idle ( +- 8.35% ) [83.33%]
704,137,950,187 stalled-cycles-backend # 75.70% backend cycles idle ( +- 9.16% ) [66.69%]
30,541,538,385 instructions # 0.03 insns per cycle
# 29.94 stalled cycles per insn ( +- 3.98% ) [83.35%]
8,415,376,631 branches # 21.286 M/sec ( +- 3.61% ) [83.36%]
32,645,478 branch-misses # 0.39% of all branches ( +- 3.41% ) [83.32%]
19.447481153 seconds time elapsed ( +- 2.00% )
This patch (of 11):
CONFIG_GENERIC_LOCKBREAK increases sizeof(spinlock_t) to 8 bytes. It
leads to increase sizeof(struct page) by 4 bytes on 32-bit system if split
page table lock is in use, since page->ptl shares space in union with
longs and pointers.
Let's disable split page table lock on 32-bit systems with
GENERIC_LOCKBREAK enabled.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There's only one caller of do_generic_file_read() and the only actor is
file_read_actor(). No reason to have a callback parameter.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core locking changes from Ingo Molnar:
"The biggest changes:
- add lockdep support for seqcount/seqlocks structures, this
unearthed both bugs and required extra annotation.
- move the various kernel locking primitives to the new
kernel/locking/ directory"
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
block: Use u64_stats_init() to initialize seqcounts
locking/lockdep: Mark __lockdep_count_forward_deps() as static
lockdep/proc: Fix lock-time avg computation
locking/doc: Update references to kernel/mutex.c
ipv6: Fix possible ipv6 seqlock deadlock
cpuset: Fix potential deadlock w/ set_mems_allowed
seqcount: Add lockdep functionality to seqcount/seqlock structures
net: Explicitly initialize u64_stats_sync structures for lockdep
locking: Move the percpu-rwsem code to kernel/locking/
locking: Move the lglocks code to kernel/locking/
locking: Move the rwsem code to kernel/locking/
locking: Move the rtmutex code to kernel/locking/
locking: Move the semaphore core to kernel/locking/
locking: Move the spinlock code to kernel/locking/
locking: Move the lockdep code to kernel/locking/
locking: Move the mutex code to kernel/locking/
hung_task debugging: Add tracepoint to report the hang
x86/locking/kconfig: Update paravirt spinlock Kconfig description
lockstat: Report avg wait and hold times
lockdep, x86/alternatives: Drop ancient lockdep fixup message
...
|
|
Pull block IO core updates from Jens Axboe:
"This is the pull request for the core changes in the block layer for
3.13. It contains:
- The new blk-mq request interface.
This is a new and more scalable queueing model that marries the
best part of the request based interface we currently have (which
is fully featured, but scales poorly) and the bio based "interface"
which the new drivers for high IOPS devices end up using because
it's much faster than the request based one.
The bio interface has no block layer support, since it taps into
the stack much earlier. This means that drivers end up having to
implement a lot of functionality on their own, like tagging,
timeout handling, requeue, etc. The blk-mq interface provides all
these. Some drivers even provide a switch to select bio or rq and
has code to handle both, since things like merging only works in
the rq model and hence is faster for some workloads. This is a
huge mess. Conversion of these drivers nets us a substantial code
reduction. Initial results on converting SCSI to this model even
shows an 8x improvement on single queue devices. So while the
model was intended to work on the newer multiqueue devices, it has
substantial improvements for "classic" hardware as well. This code
has gone through extensive testing and development, it's now ready
to go. A pull request is coming to convert virtio-blk to this
model will be will be coming as well, with more drivers scheduled
for 3.14 conversion.
- Two blktrace fixes from Jan and Chen Gang.
- A plug merge fix from Alireza Haghdoost.
- Conversion of __get_cpu_var() from Christoph Lameter.
- Fix for sector_div() with 64-bit divider from Geert Uytterhoeven.
- A fix for a race between request completion and the timeout
handling from Jeff Moyer. This is what caused the merge conflict
with blk-mq/core, in case you are looking at that.
- A dm stacking fix from Mike Snitzer.
- A code consolidation fix and duplicated code removal from Kent
Overstreet.
- A handful of block bug fixes from Mikulas Patocka, fixing a loop
crash and memory corruption on blk cg.
- Elevator switch bug fix from Tomoki Sekiyama.
A heads-up that I had to rebase this branch. Initially the immutable
bio_vecs had been queued up for inclusion, but a week later, it became
clear that it wasn't fully cooked yet. So the decision was made to
pull this out and postpone it until 3.14. It was a straight forward
rebase, just pruning out the immutable series and the later fixes of
problems with it. The rest of the patches applied directly and no
further changes were made"
* 'for-3.13/core' of git://git.kernel.dk/linux-block: (31 commits)
block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
block: Do not call sector_div() with a 64-bit divisor
kernel: trace: blktrace: remove redundent memcpy() in compat_blk_trace_setup()
block: Consolidate duplicated bio_trim() implementations
block: Use rw_copy_check_uvector()
block: Enable sysfs nomerge control for I/O requests in the plug list
block: properly stack underlying max_segment_size to DM device
elevator: acquire q->sysfs_lock in elevator_change()
elevator: Fix a race in elevator switching and md device initialization
block: Replace __get_cpu_var uses
bdi: test bdi_init failure
block: fix a probe argument to blk_register_region
loop: fix crash if blk_alloc_queue fails
blk-core: Fix memory corruption if blkcg_init_queue fails
block: fix race between request completion and timeout handling
blktrace: Send BLK_TN_PROCESS events to all running traces
blk-mq: don't disallow request merges for req->special being set
blk-mq: mq plug list breakage
blk-mq: fix for flush deadlock
...
|
|
Pull networking updates from David Miller:
1) The addition of nftables. No longer will we need protocol aware
firewall filtering modules, it can all live in userspace.
At the core of nftables is a, for lack of a better term, virtual
machine that executes byte codes to inspect packet or metadata
(arriving interface index, etc.) and make verdict decisions.
Besides support for loading packet contents and comparing them, the
interpreter supports lookups in various datastructures as
fundamental operations. For example sets are supports, and
therefore one could create a set of whitelist IP address entries
which have ACCEPT verdicts attached to them, and use the appropriate
byte codes to do such lookups.
Since the interpreted code is composed in userspace, userspace can
do things like optimize things before giving it to the kernel.
Another major improvement is the capability of atomically updating
portions of the ruleset. In the existing netfilter implementation,
one has to update the entire rule set in order to make a change and
this is very expensive.
Userspace tools exist to create nftables rules using existing
netfilter rule sets, but both kernel implementations will need to
co-exist for quite some time as we transition from the old to the
new stuff.
Kudos to Patrick McHardy, Pablo Neira Ayuso, and others who have
worked so hard on this.
2) Daniel Borkmann and Hannes Frederic Sowa made several improvements
to our pseudo-random number generator, mostly used for things like
UDP port randomization and netfitler, amongst other things.
In particular the taus88 generater is updated to taus113, and test
cases are added.
3) Support 64-bit rates in HTB and TBF schedulers, from Eric Dumazet
and Yang Yingliang.
4) Add support for new 577xx tigon3 chips to tg3 driver, from Nithin
Sujir.
5) Fix two fatal flaws in TCP dynamic right sizing, from Eric Dumazet,
Neal Cardwell, and Yuchung Cheng.
6) Allow IP_TOS and IP_TTL to be specified in sendmsg() ancillary
control message data, much like other socket option attributes.
From Francesco Fusco.
7) Allow applications to specify a cap on the rate computed
automatically by the kernel for pacing flows, via a new
SO_MAX_PACING_RATE socket option. From Eric Dumazet.
8) Make the initial autotuned send buffer sizing in TCP more closely
reflect actual needs, from Eric Dumazet.
9) Currently early socket demux only happens for TCP sockets, but we
can do it for connected UDP sockets too. Implementation from Shawn
Bohrer.
10) Refactor inet socket demux with the goal of improving hash demux
performance for listening sockets. With the main goals being able
to use RCU lookups on even request sockets, and eliminating the
listening lock contention. From Eric Dumazet.
11) The bonding layer has many demuxes in it's fast path, and an RCU
conversion was started back in 3.11, several changes here extend the
RCU usage to even more locations. From Ding Tianhong and Wang
Yufen, based upon suggestions by Nikolay Aleksandrov and Veaceslav
Falico.
12) Allow stackability of segmentation offloads to, in particular, allow
segmentation offloading over tunnels. From Eric Dumazet.
13) Significantly improve the handling of secret keys we input into the
various hash functions in the inet hashtables, TCP fast open, as
well as syncookies. From Hannes Frederic Sowa. The key fundamental
operation is "net_get_random_once()" which uses static keys.
Hannes even extended this to ipv4/ipv6 fragmentation handling and
our generic flow dissector.
14) The generic driver layer takes care now to set the driver data to
NULL on device removal, so it's no longer necessary for drivers to
explicitly set it to NULL any more. Many drivers have been cleaned
up in this way, from Jingoo Han.
15) Add a BPF based packet scheduler classifier, from Daniel Borkmann.
16) Improve CRC32 interfaces and generic SKB checksum iterators so that
SCTP's checksumming can more cleanly be handled. Also from Daniel
Borkmann.
17) Add a new PMTU discovery mode, IP_PMTUDISC_INTERFACE, which forces
using the interface MTU value. This helps avoid PMTU attacks,
particularly on DNS servers. From Hannes Frederic Sowa.
18) Use generic XPS for transmit queue steering rather than internal
(re-)implementation in virtio-net. From Jason Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
random32: add test cases for taus113 implementation
random32: upgrade taus88 generator to taus113 from errata paper
random32: move rnd_state to linux/random.h
random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized
random32: add periodic reseeding
random32: fix off-by-one in seeding requirement
PHY: Add RTL8201CP phy_driver to realtek
xtsonic: add missing platform_set_drvdata() in xtsonic_probe()
macmace: add missing platform_set_drvdata() in mace_probe()
ethernet/arc/arc_emac: add missing platform_set_drvdata() in arc_emac_probe()
ipv6: protect for_each_sk_fl_rcu in mem_check with rcu_read_lock_bh
vlan: Implement vlan_dev_get_egress_qos_mask as an inline.
ixgbe: add warning when max_vfs is out of range.
igb: Update link modes display in ethtool
netfilter: push reasm skb through instead of original frag skbs
ip6_output: fragment outgoing reassembled skb properly
MAINTAINERS: mv643xx_eth: take over maintainership from Lennart
net_sched: tbf: support of 64bit rates
ixgbe: deleting dfwd stations out of order can cause null ptr deref
ixgbe: fix build err, num_rx_queues is only available with CONFIG_RPS
...
|
|
Merge first patch-bomb from Andrew Morton:
"Quite a lot of other stuff is banked up awaiting further
next->mainline merging, but this batch contains:
- Lots of random misc patches
- OCFS2
- Most of MM
- backlight updates
- lib/ updates
- printk updates
- checkpatch updates
- epoll tweaking
- rtc updates
- hfs
- hfsplus
- documentation
- procfs
- update gcov to gcc-4.7 format
- IPC"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (269 commits)
ipc, msg: fix message length check for negative values
ipc/util.c: remove unnecessary work pending test
devpts: plug the memory leak in kill_sb
./Makefile: export initial ramdisk compression config option
init/Kconfig: add option to disable kernel compression
drivers: w1: make w1_slave::flags long to avoid memory corruption
drivers/w1/masters/ds1wm.cuse dev_get_platdata()
drivers/memstick/core/ms_block.c: fix unreachable state in h_msb_read_page()
drivers/memstick/core/mspro_block.c: fix attributes array allocation
drivers/pps/clients/pps-gpio.c: remove redundant of_match_ptr
kernel/panic.c: reduce 1 byte usage for print tainted buffer
gcov: reuse kbasename helper
kernel/gcov/fs.c: use pr_warn()
kernel/module.c: use pr_foo()
gcov: compile specific gcov implementation based on gcc version
gcov: add support for gcc 4.7 gcov format
gcov: move gcov structs definitions to a gcc version specific file
kernel/taskstats.c: return -ENOMEM when alloc memory fails in add_del_listener()
kernel/taskstats.c: add nla_nest_cancel() for failure processing between nla_nest_start() and nla_nest_end()
kernel/sysctl_binary.c: use scnprintf() instead of snprintf()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
"All kinds of stuff this time around; some more notable parts:
- RCU'd vfsmounts handling
- new primitives for coredump handling
- files_lock is gone
- Bruce's delegations handling series
- exportfs fixes
plus misc stuff all over the place"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
ecryptfs: ->f_op is never NULL
locks: break delegations on any attribute modification
locks: break delegations on link
locks: break delegations on rename
locks: helper functions for delegation breaking
locks: break delegations on unlink
namei: minor vfs_unlink cleanup
locks: implement delegations
locks: introduce new FL_DELEG lock flag
vfs: take i_mutex on renamed file
vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
vfs: don't use PARENT/CHILD lock classes for non-directories
vfs: pull ext4's double-i_mutex-locking into common code
exportfs: fix quadratic behavior in filehandle lookup
exportfs: better variable name
exportfs: move most of reconnect_path to helper function
exportfs: eliminate unused "noprogress" counter
exportfs: stop retrying once we race with rename/remove
exportfs: clear DISCONNECTED on all parents sooner
exportfs: more detailed comment for path_reconnect
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup changes from Tejun Heo:
"Not too much activity this time around. css_id is finally killed and
a minor update to device_cgroup"
* 'for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
device_cgroup: remove can_attach
cgroup: kill css_id
memcg: stop using css id
memcg: fail to create cgroup if the cgroup id is too big
memcg: convert to use cgroup id
memcg: convert to use cgroup_is_descendant()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu changes from Tejun Heo:
"Two smallish changes for percpu. Two patches to remove unused
this_cpu_xor() and one to fix a bug in percpu init failure path so
that it can reach the proper BUG() instead of oopsing earlier"
* 'for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
x86: remove this_cpu_xor() implementation
percpu: remove this_cpu_xor() implementation
percpu: fix bootmem error handling in pcpu_page_first_chunk()
|
|
Commit 0255d4918480 ("mm: Account for a THP NUMA hinting update as one
PTE update") was added to account for the number of PTE updates when
marking pages prot_numa. task_numa_work was using the old return value
to track how much address space had been updated. Altering the return
value causes the scanner to do more work than it is configured or
documented to in a single unit of work.
This patch reverts that commit and accounts for the number of THP
updates separately in vmstat. It is up to the administrator to
interpret the pair of values correctly. This is a straight-forward
operation and likely to only be of interest when actively debugging NUMA
balancing problems.
The impact of this patch is that the NUMA PTE scanner will scan slower
when THP is enabled and workloads may converge slower as a result. On
the flip size system CPU usage should be lower than recent tests
reported. This is an illustrative example of a short single JVM specjbb
test
specjbb
3.12.0 3.12.0
vanilla acctupdates
TPut 1 26143.00 ( 0.00%) 25747.00 ( -1.51%)
TPut 7 185257.00 ( 0.00%) 183202.00 ( -1.11%)
TPut 13 329760.00 ( 0.00%) 346577.00 ( 5.10%)
TPut 19 442502.00 ( 0.00%) 460146.00 ( 3.99%)
TPut 25 540634.00 ( 0.00%) 549053.00 ( 1.56%)
TPut 31 512098.00 ( 0.00%) 519611.00 ( 1.47%)
TPut 37 461276.00 ( 0.00%) 474973.00 ( 2.97%)
TPut 43 403089.00 ( 0.00%) 414172.00 ( 2.75%)
3.12.0 3.12.0
vanillaacctupdates
User 5169.64 5184.14
System 100.45 80.02
Elapsed 252.75 251.85
Performance is similar but note the reduction in system CPU time. While
this showed a performance gain, it will not be universal but at least
it'll be behaving as documented. The vmstats are obviously different but
here is an obvious interpretation of them from mmtests.
3.12.0 3.12.0
vanillaacctupdates
NUMA page range updates 1408326 11043064
NUMA huge PMD updates 0 21040
NUMA PTE updates 1408326 291624
"NUMA page range updates" == nr_pte_updates and is the value returned to
the NUMA pte scanner. NUMA huge PMD updates were the number of THP
updates which in combination can be used to calculate how many ptes were
updated from userspace.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Alex Thorlton <athorlton@sgi.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The same calculation is currently done in three differents places.
Factor that code so future changes has to be made at only one place.
[akpm@linux-foundation.org: uninline vm_commit_limit()]
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|