linux/Documentation/cgroups/memory.txt, branch v3.2.19

memcg: skip scanning active lists based on individual size

2011-11-02T23:07:00Z

Reclaim decides to skip scanning an active list when the corresponding inactive list is above a certain size in comparison to leave the assumed working set alone while there are still enough reclaim candidates around. The memcg implementation of comparing those lists instead reports whether the whole memcg is low on the requested type of inactive pages, considering all nodes and zones. This can lead to an oversized active list not being scanned because of the state of the other lists in the memcg, as well as an active list being scanned while its corresponding inactive list has enough pages. Not only is this wrong, it's also a scalability hazard, because the global memory state over all nodes and zones has to be gathered for each memcg and zone scanned. Make these calculations purely based on the size of the two LRU lists that are actually affected by the outcome of the decision. Signed-off-by: Johannes Weiner Reviewed-by: Rik van Riel Cc: KOSAKI Motohiro Acked-by: KAMEZAWA Hiroyuki Cc: Daisuke Nishimura Cc: Balbir Singh Reviewed-by: Minchan Kim Reviewed-by: Ying Han Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

memcg: Revert "memcg: add memory.vmscan_stat"

2011-09-15T01:09:38Z

Revert the post-3.0 commit 82f9d486e59f5 ("memcg: add memory.vmscan_stat"). The implementation of per-memcg reclaim statistics violates how memcg hierarchies usually behave: hierarchically. The reclaim statistics are accounted to child memcgs and the parent hitting the limit, but not to hierarchy levels in between. Usually, hierarchical statistics are perfectly recursive, with each level representing the sum of itself and all its children. Since this exports statistics to userspace, this may lead to confusion and problems with changing things after the release, so revert it now, we can try again later. Signed-off-by: Johannes Weiner Acked-by: KAMEZAWA Hiroyuki Cc: Daisuke Nishimura Cc: Michal Hocko Cc: Ying Han Cc: Balbir Singh Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

memcg: add memory.vmscan_stat

2011-07-26T23:49:42Z

The commit log of 0ae5e89c60c9 ("memcg: count the soft_limit reclaim in...") says it adds scanning stats to memory.stat file. But it doesn't because we considered we needed to make a concensus for such new APIs. This patch is a trial to add memory.scan_stat. This shows - the number of scanned pages(total, anon, file) - the number of rotated pages(total, anon, file) - the number of freed pages(total, anon, file) - the number of elaplsed time (including sleep/pause time) for both of direct/soft reclaim. The biggest difference with oringinal Ying's one is that this file can be reset by some write, as # echo 0 ...../memory.scan_stat Example of output is here. This is a result after make -j 6 kernel under 300M limit. [kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.scan_stat [kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.vmscan_stat scanned_pages_by_limit 9471864 scanned_anon_pages_by_limit 6640629 scanned_file_pages_by_limit 2831235 rotated_pages_by_limit 4243974 rotated_anon_pages_by_limit 3971968 rotated_file_pages_by_limit 272006 freed_pages_by_limit 2318492 freed_anon_pages_by_limit 962052 freed_file_pages_by_limit 1356440 elapsed_ns_by_limit 351386416101 scanned_pages_by_system 0 scanned_anon_pages_by_system 0 scanned_file_pages_by_system 0 rotated_pages_by_system 0 rotated_anon_pages_by_system 0 rotated_file_pages_by_system 0 freed_pages_by_system 0 freed_anon_pages_by_system 0 freed_file_pages_by_system 0 elapsed_ns_by_system 0 scanned_pages_by_limit_under_hierarchy 9471864 scanned_anon_pages_by_limit_under_hierarchy 6640629 scanned_file_pages_by_limit_under_hierarchy 2831235 rotated_pages_by_limit_under_hierarchy 4243974 rotated_anon_pages_by_limit_under_hierarchy 3971968 rotated_file_pages_by_limit_under_hierarchy 272006 freed_pages_by_limit_under_hierarchy 2318492 freed_anon_pages_by_limit_under_hierarchy 962052 freed_file_pages_by_limit_under_hierarchy 1356440 elapsed_ns_by_limit_under_hierarchy 351386416101 scanned_pages_by_system_under_hierarchy 0 scanned_anon_pages_by_system_under_hierarchy 0 scanned_file_pages_by_system_under_hierarchy 0 rotated_pages_by_system_under_hierarchy 0 rotated_anon_pages_by_system_under_hierarchy 0 rotated_file_pages_by_system_under_hierarchy 0 freed_pages_by_system_under_hierarchy 0 freed_anon_pages_by_system_under_hierarchy 0 freed_file_pages_by_system_under_hierarchy 0 elapsed_ns_by_system_under_hierarchy 0 total_xxxx is for hierarchy management. This will be useful for further memcg developments and need to be developped before we do some complicated rework on LRU/softlimit management. This patch adds a new struct memcg_scanrecord into scan_control struct. sc->nr_scanned at el is not designed for exporting information. For example, nr_scanned is reset frequentrly and incremented +2 at scanning mapped pages. To avoid complexity, I added a new param in scan_control which is for exporting scanning score. Signed-off-by: KAMEZAWA Hiroyuki Cc: Daisuke Nishimura Cc: Michal Hocko Cc: Ying Han Cc: Andrew Bresticker Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

Documentation: fix cgroup typos and formatting

2011-06-16T04:52:50Z

Fix format and spelling. Signed-off-by: Jörg Sommer Acked-by: Paul Menage Signed-off-by: Randy Dunlap Signed-off-by: Linus Torvalds

Documentation: update cgroupfs mount point

2011-06-16T04:52:50Z

According to commit 676db4af0430 ("cgroupfs: create /sys/fs/cgroup to mount cgroupfs on") the canonical mountpoint for the cgroup filesystem is /sys/fs/cgroup. Hence, this should be used in the documentation. Signed-off-by: Jörg Sommer Acked-by: Paul Menage Signed-off-by: Randy Dunlap Signed-off-by: Linus Torvalds

memcg: add documentation for the memory.numastat API

2011-06-16T03:03:59Z

[akpm@linux-foundation.org: rework text, fit it into 80-cols] Signed-off-by: Ying Han Reviewed-by: KOSAKI Motohiro Acked-by: Balbir Singh Acked-by: KAMEZAWA Hiroyuki Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

memcg: update documentation to describe usage_in_bytes

2011-04-28T18:28:21Z

Since 569b846d ("memcg: coalesce uncharge during unmap/truncate"), we do batched (delayed) uncharge at truncation/unmap. And since cdec2e42(memcg: coalesce charging via percpu storage), we have percpu cache for res_counter. These changes improved performance of memory cgroup very much, but made res_counter->usage usually have a bigger value than the actual value of memory usage. So, *.usage_in_bytes, which show res_counter->usage, are not desirable for precise values of memory(and swap) usage anymore. Instead of removing these files completely(because we cannot know res_counter->usage without them), this patch updates the meaning of those files. Signed-off-by: Daisuke Nishimura Acked-by: KAMEZAWA Hiroyuki Acked-by: Michal Hocko Cc: Balbir Singh Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

memcg: clarify use_hierarchy documentation

2011-02-17T08:17:21Z

The memcg code does not allow changing memory.use_hierarchy if the parent cgroup has enabled use_hierarchy. Update documentation to match the code. Signed-off-by: Greg Thelen Acked-by: KAMEZAWA Hiroyuki Signed-off-by: Jiri Kosina

memcg: update documentation

2010-05-27T16:12:43Z

Some information are old, and I think current document doesn't work as "a guide for users". We need summary of all of our controls, at least. Signed-off-by: KAMEZAWA Hiroyuki Reviewed-by: Randy Dunlap Cc: Balbir Singh Cc: Daisuke Nishimura Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

memcg: move charge of file pages

2010-05-27T16:12:43Z

This patch adds support for moving charge of file pages, which include normal file, tmpfs file and swaps of tmpfs file. It's enabled by setting bit 1 of /memory.move_charge_at_immigrate. Unlike the case of anonymous pages, file pages(and swaps) in the range mmapped by the task will be moved even if the task hasn't done page fault, i.e. they might not be the task's "RSS", but other task's "RSS" that maps the same file. And mapcount of the page is ignored(the page can be moved even if page_mapcount(page) > 1). So, conditions that the page/swap should be met to be moved is that it must be in the range mmapped by the target task and it must be charged to the old cgroup. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix warning] Signed-off-by: Daisuke Nishimura Acked-by: KAMEZAWA Hiroyuki Cc: Balbir Singh Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds