path: root/include/net/ip6_fib.h
Age | Commit message | Author
2010-04-01 | ipv6 fib: Make rt6_info{} more cache-line aware. | YOSHIFUJI Hideaki / 吉藤英明
The head element of rt6_info{} is dst_entry{}, and IPv6 specific elements follow. Because elements at the end of dst_entry{} are frequently updated, it is not good to put frequently-used static elements, such as rt6i_idev, rt6i_dst or rt6i_flags in the same cache line. On the other hand, fib6_table, rt6i_node or rt6i_gateway are rarely used, so it is okay to stay in the same cache line. Let's rearrange rt6_info{}. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
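The layout idea can be checked with `offsetof`. The sketch below is only an illustration: the struct names, the field sizes, and the 64-byte line size are assumptions and do not reproduce the real `dst_entry{}` or `rt6_info{}`.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_CACHE_LINE 64  /* assumed cache-line size in bytes */

/* Hypothetical stand-in for dst_entry{}: its tail is written often. */
struct layout_dst {
    char read_mostly[DEMO_CACHE_LINE];
    unsigned long lastuse;   /* frequently updated */
    int refcnt;              /* frequently updated */
};

/* Hypothetical stand-in for rt6_info{} after the rearrangement. */
struct layout_rt6 {
    struct layout_dst dst;
    /* Rarely used fields may share the line with the tail of dst. */
    void *table;
    void *node;
    unsigned char gateway[16];
    /* Frequently read, rarely written fields start on a fresh line. */
    _Alignas(DEMO_CACHE_LINE) void *idev;
    unsigned int flags;
};

static_assert(sizeof(struct layout_dst) > DEMO_CACHE_LINE,
              "the demo dst must spill into a second cache line");

/* Index of the cache line that holds a given byte offset. */
static size_t cache_line_of(size_t offset)
{
    return offset / DEMO_CACHE_LINE;
}

/* Returns 1 if the read-mostly fields avoid the frequently written line. */
static int layout_is_cache_friendly(void)
{
    size_t hot = cache_line_of(offsetof(struct layout_dst, refcnt));

    return cache_line_of(offsetof(struct layout_rt6, idev)) != hot &&
           cache_line_of(offsetof(struct layout_rt6, flags)) != hot;
}
```

Forcing the alignment with `_Alignas` is a choice made for the sketch; the commit message only says the fields were rearranged.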
2010-02-18 | ipv6: use standard lists for FIB walks | Alexey Dobriyan
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-12 | ipv6: fib: fix crash when changing large fib while dumping it | Patrick McHardy
When the fib size exceeds what can be dumped in a single skb, the dump is suspended and resumed once the last skb has been received by userspace. When the fib is changed while the dump is suspended, the walker might contain stale pointers, causing a crash when the dump is resumed.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: [<ffffffffa01bce04>] fib6_walk_continue+0xbb/0x124 [ipv6]
PGD 5347a067 PUD 65c7067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
...
RIP: 0010:[<ffffffffa01bce04>] [<ffffffffa01bce04>] fib6_walk_continue+0xbb/0x124 [ipv6]
...
Call Trace:
 [<ffffffff8104aca3>] ? mutex_spin_on_owner+0x59/0x71
 [<ffffffffa01bd105>] inet6_dump_fib+0x11b/0x1b9 [ipv6]
 [<ffffffff81371af4>] netlink_dump+0x5b/0x19e
 [<ffffffff8134f288>] ? consume_skb+0x28/0x2a
 [<ffffffff81373b69>] netlink_recvmsg+0x1ab/0x2c6
 [<ffffffff81372781>] ? netlink_unicast+0xfa/0x151
 [<ffffffff813483e0>] __sock_recvmsg+0x6d/0x79
 [<ffffffff81348a53>] sock_recvmsg+0xca/0xe3
 [<ffffffff81066d4b>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff811ed1f8>] ? radix_tree_lookup_slot+0xe/0x10
 [<ffffffff810b3ed7>] ? find_get_page+0x90/0xa5
 [<ffffffff810b5dc5>] ? filemap_fault+0x201/0x34f
 [<ffffffff810ef152>] ? fget_light+0x2f/0xac
 [<ffffffff813519e7>] ? verify_iovec+0x4f/0x94
 [<ffffffff81349a65>] sys_recvmsg+0x14d/0x223

Store the serial number when beginning to walk the fib and reload pointers when continuing to walk after a change occurred. Similar to other dumping functions, this might cause unrelated entries to be missed when entries are deleted.

Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
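The serial-number technique can be sketched on a much simpler structure than the fib tree. Everything below is hypothetical: a sorted singly linked list with unique keys stands in for the tree, and the `demo_*` names are not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

struct demo_entry {
    int key;
    struct demo_entry *next;
};

struct demo_table {
    struct demo_entry *head;   /* sorted by key, keys are unique */
    unsigned int sernum;       /* bumped on every modification */
};

/* Saved state of a suspended dump. */
struct demo_walker {
    struct demo_entry *pos;    /* next entry to dump; may go stale */
    int last_key;              /* key of the last entry already dumped */
    int have_last;             /* 0 until at least one entry was dumped */
    unsigned int sernum;       /* table serial number when pos was saved */
};

static void demo_walker_init(struct demo_walker *w, const struct demo_table *t)
{
    w->pos = t->head;
    w->last_key = 0;
    w->have_last = 0;
    w->sernum = t->sernum;
}

/* Dump up to max keys into out; returns the number of keys written. */
static size_t demo_dump(struct demo_table *t, struct demo_walker *w,
                        int *out, size_t max)
{
    size_t n = 0;

    assert(t && w && out);
    if (w->sernum != t->sernum) {
        /* Table changed while suspended: pos may be stale, so re-find it. */
        w->pos = t->head;
        while (w->have_last && w->pos && w->pos->key <= w->last_key)
            w->pos = w->pos->next;
        w->sernum = t->sernum;
    }
    while (w->pos && n < max) {
        out[n++] = w->pos->key;
        w->last_key = w->pos->key;
        w->have_last = 1;
        w->pos = w->pos->next;
    }
    return n;
}

/* Unlink the entry with the given key and bump the serial number. */
static void demo_remove(struct demo_table *t, int key)
{
    struct demo_entry **pp = &t->head;

    while (*pp && (*pp)->key != key)
        pp = &(*pp)->next;
    if (*pp) {
        *pp = (*pp)->next;
        t->sernum++;
    }
}
```

If an entry is removed between two `demo_dump()` calls, the second call notices the changed serial number and looks its position up again instead of following the saved pointer.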
2009-11-04 | net: cleanup include/net | Eric Dumazet
This cleanup patch puts struct/union/enum opening braces on the first line to ease grep games.

struct something
{

becomes:

struct something {

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-30 | xfrm: select sane defaults for xfrm[4|6] gc_thresh | Neil Horman
Choose saner defaults for xfrm[4|6] gc_thresh values on init. Currently, the xfrm[4|6] code has hard-coded initial gc_thresh values (set to 1024). Given that the ipv4 and ipv6 routing caches are sized dynamically at boot time, the static selections can be nonsensical. This patch dynamically selects an appropriate gc threshold based on the corresponding main routing table size, using the assumption that we should in the worst case be able to handle as many connections as the routing table can. For ipv4, the maximum route cache size is 16 * the number of hash buckets in the route cache. Given that xfrm4 starts garbage collection at the gc_thresh and prevents new allocations at 2 * gc_thresh, we set gc_thresh to half the maximum route cache size. For ipv6, it's a bit trickier: there is no maximum route cache size, but the ipv6 dst_ops gc_thresh is statically set to 1024. It seems sane to select a similar gc_thresh for the xfrm6 code that is half the number of hash buckets in the v6 route cache times 16 (like the v4 code does). Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
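The arithmetic described for ipv4 fits in two small functions. This is a sketch of the stated formula only; the function names are hypothetical and the overflow guard is an addition for the example.

```c
#include <assert.h>
#include <limits.h>

/* Maximum route cache size: 16 entries per hash bucket (from the text). */
static unsigned int demo_max_rt_cache(unsigned int hash_buckets)
{
    assert(hash_buckets <= UINT_MAX / 16);   /* guard against overflow */
    return 16 * hash_buckets;
}

/*
 * GC starts at gc_thresh and new allocations are refused at
 * 2 * gc_thresh, so gc_thresh is half the maximum cache size.
 */
static unsigned int demo_xfrm_gc_thresh(unsigned int hash_buckets)
{
    return demo_max_rt_cache(hash_buckets) / 2;
}
```

With 1024 hash buckets, for example, the maximum cache size is 16384 entries and the threshold is 8192, so allocations stop exactly at the maximum cache size.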
2008-03-04 | [NETNS][IPV6] rt6_info - move rt6_info structure inside the namespace | Daniel Lezcano
The rt6_info structures are moved inside the network namespace structure. All references to these structures are now relative to the initial network namespace. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-03 | [NETNS][IPV6] ip6_fib - add net to gc timer parameter | Daniel Lezcano
The fib tables are now relative to the network namespace. When the garbage collector timer expires, we must have a network namespace parameter in order to retrieve the tables. For now this is the init_net, but we should be able to have a timer per namespace and use the timer callback parameter to pass the network namespace from the expired timer. The timer callback, fib6_run_gc, is called synchronously by some functions and asynchronously when the timer expires. When the timer expires, the delay passed as the fib6_run_gc parameter is always zero. So I changed fib6_run_gc so that it is no longer a timer callback but a function called by the timer callback, and I added a timer callback whose only job is to retrieve the network namespace from the data argument of the timer and call fib6_run_gc with a zero expiry time and the network namespace as parameters. That makes the code cleaner for the fib6_run_gc callers. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
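The split between a plain GC function and a thin timer callback might look like the following user-space sketch. The `demo_*` names and the `demo_net` fields are hypothetical, and `uintptr_t` stands in for the opaque data argument of the timer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a network namespace. */
struct demo_net {
    unsigned long last_gc_expires;  /* expiry value of the last GC run */
    int gc_runs;                    /* number of GC runs so far */
};

/* Plain function: synchronous callers pass any expiry value they like. */
static void demo_run_gc(unsigned long expires, struct demo_net *net)
{
    assert(net);
    net->last_gc_expires = expires;
    net->gc_runs++;
}

/*
 * Timer callback: recovers the namespace from the opaque data argument
 * and always runs the GC with a zero expiry time.
 */
static void demo_gc_timer_cb(uintptr_t data)
{
    demo_run_gc(0, (struct demo_net *)data);
}
```

Synchronous callers use `demo_run_gc()` directly with both arguments, so only the timer path needs the cast from the data argument.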
2008-03-03 | [NETNS][IPV6] ip6_fib - fib6_clean_all handle several network namespaces | Daniel Lezcano
The function fib6_clean_all takes the network namespace as a parameter. That allows flushing the routes related to a specific network namespace. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-03 | [NETNS][IPV6] ip6_fib - make it per network namespace | Daniel Lezcano
The fib tables for ipv6 are moved to the network namespace structure. All references to them are made relative to the network namespace. All external calls to the ip6_fib functions taking the network namespace parameter are made using the init_net variable, so the ip6_fib engine is ready for the namespaces but the callers are not yet. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-07 | [IPV6] Minor cleanup: remove unused definitions in net/ip6_fib.h | Rami Rosen
This patch removes some unused definitions and one method typedef declaration (f_pnode) in include/net/ip6_fib.h, as they are not used in the kernel. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 | [XFRM] IPv6: Fix dst/routing check at transformation. | Masahide NAKAMURA
An IPv6-specific thing was wrongly removed from transformation in net-2.6.25. This patch recovers it with the current design.

o Update "path" of xfrm_dst since IPv6 transformation should care about routing changes. It is required by MIPv6 and off-link destined IPsec.
o Rename nfheader_len, which is for non-fragment transformation used by MIPv6, to rt6i_nfheader_len as IPv6 name space.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 | [IPV6]: route6 remove ifdef for fib_rules | Daniel Lezcano
The patch defines the usual static inline functions when the code is disabled for fib6_rules. That allows removing some ifdefs in route.c and makes the code a little clearer. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
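The stub pattern mentioned here usually looks like the sketch below. `DEMO_MULTIPLE_TABLES` and the `demo_*` functions are hypothetical placeholders, not the real config symbol or the real fib6_rules API.

```c
#include <assert.h>

#ifdef DEMO_MULTIPLE_TABLES
/* Feature enabled: the real implementations live in another file. */
int demo_rules_init(void);
void demo_rules_cleanup(void);
#else
/* Feature compiled out: no-op stubs keep the callers free of #ifdef. */
static inline int demo_rules_init(void) { return 0; }
static inline void demo_rules_cleanup(void) { }
#endif

static int demo_routes_ready;

/* The caller is written once and works with or without the feature. */
static int demo_route_init(void)
{
    int err = demo_rules_init();

    if (err)
        return err;
    demo_routes_ready = 1;
    return 0;
}

static void demo_route_cleanup(void)
{
    assert(demo_routes_ready);   /* cleanup must follow a successful init */
    demo_routes_ready = 0;
    demo_rules_cleanup();
}
```

Because the stubs are `static inline` and empty, the compiler can drop the calls entirely when the feature is disabled.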
2008-01-28 | [IPV6]: Make fib6_rules_init to return an error code. | Daniel Lezcano
When the fib_rules initialization finishes, no return code is provided, so the caller has no way to know whether the initialization succeeded or failed. This patch fixes that. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 | [IPV6]: Make fib6_init to return an error code. | Daniel Lezcano
If there is an error in the initialization function, nothing is reported back to the caller. So I add a return value to be set by the init function. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
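A minimal sketch of an init function that reports failure, under the assumption that the failure of interest is an allocation error. The `demo_*` names and the injectable allocator are inventions for the example so that the error path can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocator type; it must return memory that free() can release. */
typedef void *(*demo_alloc_fn)(size_t);

static void *demo_node_cache;

/* Allocator that always fails, used to exercise the error path. */
static void *demo_failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}

/* Returns 0 on success or a negative errno value on failure. */
static int demo_fib_init(demo_alloc_fn alloc)
{
    assert(alloc);
    demo_node_cache = alloc(128);
    return demo_node_cache ? 0 : -ENOMEM;
}

static void demo_fib_cleanup(void)
{
    free(demo_node_cache);
    demo_node_cache = NULL;
}

/* The caller can now stop instead of continuing after a failed init. */
static int demo_subsys_init(demo_alloc_fn alloc)
{
    int err = demo_fib_init(alloc);

    if (err)
        return err;
    /* Further initialization steps would go here. */
    return 0;
}
```

Calling `demo_subsys_init(demo_failing_alloc)` yields `-ENOMEM`, while `demo_subsys_init(malloc)` is expected to return 0 on a system with free memory.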
2008-01-28 | [IPV6]: Move nfheader_len into rt6_info | Herbert Xu
The dst member nfheader_len is only used by IPv6. It's also currently creating a rather ugly alignment hole in struct dst. Therefore this patch moves it from there into struct rt6_info. It also reorders the fields in rt6_info to minimize holes. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
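How field order creates or removes alignment holes can be seen with `sizeof`. The two structs below are hypothetical and unrelated to the real `struct dst` layout; exact sizes depend on the ABI.

```c
#include <assert.h>
#include <stddef.h>

/* A small field in front of a wider one leaves padding after it. */
struct demo_holey {
    char a;      /* followed by padding up to the alignment of long */
    long b;
    char c;      /* followed by tail padding */
};

/* Same fields, widest first: the small fields share one padded tail. */
struct demo_reordered {
    long b;
    char a;
    char c;
};

static_assert(sizeof(struct demo_reordered) <= sizeof(struct demo_holey),
              "reordering must never make the struct larger here");

/* Bytes of padding in a struct made of one long and two chars. */
static size_t demo_padding(size_t struct_size)
{
    return struct_size - (sizeof(long) + 2 * sizeof(char));
}
```

On a typical 64-bit ABI the first struct takes 24 bytes and the second 16.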
2007-10-10 | [IPV6] XFRM: Fix connected socket to use transformation. | Noriaki TAKAMIYA
When XFRM policy and state become ready after a TCP connection has started, the traffic should be transformed immediately; however, this does not happen for IPv6 TCP. It depends on the dst cache replacement policy of the connected socket. It seems that the replacement is always done for IPv4, whereas for IPv6 it is done only when the routing cookie changes. This patch fixes that so a non-transformation dst can be replaced by a transformation one. This behavior is required by MIPv6 and improves IPv6 IPsec. Fixes by Masahide NAKAMURA. Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp> Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 | [IPv6]: Use rtnl registration interface | Thomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-25 | [IPV6]: Fix routing round-robin locking. | David S. Miller
As per RFC2461, section 6.3.6, item #2, when no routers on the matching list are known to be reachable or probably reachable we do round robin on those available routes so that we make sure to probe as many of them as possible to detect when one becomes reachable faster.

Each routing table has a rwlock protecting the tree and the linked list of routes at each leaf. The round robin code executes during lookup and thus with the rwlock taken as a reader. A small local spinlock tries to provide protection but this does not work at all for two reasons:

1) The round-robin list manipulation, as coded, goes like this (with read lock held):

	walk routes finding head and tail

	spin_lock();
	rotate list using head and tail
	spin_unlock();

   While one thread is rotating the list, another thread can end up with stale values of head and tail and then proceed to corrupt the list when it gets the lock. This ends up causing the OOPS in fib6_add() later on that many people have been hitting.

2) All the other code paths that run with the rwlock held as a reader do not expect the list to change on them, they expect it to remain completely fixed while they hold the lock in that way.

So, simply stated, it is impossible to implement this correctly using a manipulation of the list without violating the rwlock locking semantics.

Reimplement using a per-fib6_node round-robin pointer. This way we don't need to manipulate the list at all, and since the round-robin pointer can only ever point to real existing entries we don't need to perform any locking on the changing of the round-robin pointer itself. We only need to reset the round-robin pointer to NULL when the entry it is pointing to is removed.

The idea is from Thomas Graf and it is very similar to how this was implemented before the advanced router selection code went in.

Signed-off-by: David S. Miller <davem@davemloft.net>
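The per-node pointer approach can be sketched as follows. This is a single-threaded illustration with hypothetical `demo_*` types; it ignores reachability checks, metrics, and the kernel locking described in the message.

```c
#include <assert.h>
#include <stddef.h>

struct demo_route {
    int id;
    struct demo_route *next;
};

struct demo_node {
    struct demo_route *leaf;     /* route list; lookups never reorder it */
    struct demo_route *rr_ptr;   /* next route to try, or NULL */
};

/* Pick the next route in round-robin order without touching the list. */
static struct demo_route *demo_rr_select(struct demo_node *fn)
{
    struct demo_route *rt;

    assert(fn);
    rt = fn->rr_ptr ? fn->rr_ptr : fn->leaf;
    if (rt)
        fn->rr_ptr = rt->next ? rt->next : fn->leaf;
    return rt;
}

/* Unlink a route; reset rr_ptr if it pointed at the removed entry. */
static void demo_del_route(struct demo_node *fn, struct demo_route *rt)
{
    struct demo_route **pp = &fn->leaf;

    while (*pp && *pp != rt)
        pp = &(*pp)->next;
    if (*pp) {
        *pp = rt->next;
        if (fn->rr_ptr == rt)
            fn->rr_ptr = NULL;
    }
}
```

Selection only reads the list and advances one pointer, so the list itself stays fixed for every other reader.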
2007-02-10 | [IPV6]: Convert ipv6 route to use the new dst_entry 'next' pointer | Eric Dumazet
This patch removes the next pointer from 'struct rt6_info.u' union, and renames u.next to u.dst.rt6_next. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-13 | [IPV6]: Make fib6_node subtree depend on IPV6_SUBTREES | Kim Nordlund
Make fib6_node 'subtree' depend on IPV6_SUBTREES. Signed-off-by: Kim Nordlund <kim.nordlund@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 | [IPV6]: Introduce ip6_dst_idev() to get inet6_dev{} stored in dst_entry{}. | YOSHIFUJI Hideaki
Otherwise, we will see a lot of casts... Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
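An accessor of this kind hides the cast in one place. The sketch uses hypothetical `acc_*` types in place of `dst_entry{}`, `rt6_info{}` and `inet6_dev{}`, and it relies on the assumption that the generic entry is the first member of the IPv6 route struct.

```c
#include <assert.h>
#include <stddef.h>

struct acc_idev { int ifindex; };   /* stand-in for inet6_dev{} */
struct acc_dst { int refcnt; };     /* stand-in for dst_entry{} */

struct acc_rt6 {                    /* stand-in for rt6_info{} */
    struct acc_dst dst;             /* must stay the first member */
    struct acc_idev *idev;
};

static_assert(offsetof(struct acc_rt6, dst) == 0,
              "the cast below requires dst to be the first member");

/* One helper instead of a cast at every call site. */
static inline struct acc_idev *acc_dst_idev(struct acc_dst *dst)
{
    assert(dst);
    return ((struct acc_rt6 *)dst)->idev;
}
```

A caller holding only the generic pointer can then write `acc_dst_idev(dst)->ifindex` without spelling out the cast.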
2006-09-22 | [IPV6] ROUTE: Unify RT6_F_xxx and RT6_SELECT_F_xxx flags | YOSHIFUJI Hideaki
Unify RT6_F_xxx and RT6_SELECT_F_xxx flags into RT6_LOOKUP_F_xxx flags, and put them into ip6_route.h Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 | [IPV6] ROUTE: Put SUBTREE() as FIB6_SUBTREE() into ip6_fib.h for future use. | YOSHIFUJI Hideaki
Based on MIPL2 kernel patch. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 | [IPv6] route: FIB6 configuration using struct fib6_config | Thomas Graf
Replaces the struct in6_rtmsg based interface originating from the ioctl interface with a struct fib6_config based one. Allows changing the interface without breaking the ioctl interface and avoids passing on tons of parameters. The recently introduced struct nl_info is used to pass on netlink authorship information for notifications. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
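Passing one configuration struct instead of a long parameter list might look like this. The fields and the validation rule are assumptions chosen for the sketch and do not mirror the real `struct fib6_config`.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical route configuration; unset fields default to zero. */
struct demo_fib_config {
    unsigned int table;       /* routing table id */
    unsigned int metric;
    int ifindex;              /* output interface */
    unsigned char dst_len;    /* destination prefix length in bits */
    unsigned char src_len;    /* source prefix length in bits */
    unsigned int flags;
};

/* A single pointer replaces one argument per attribute. */
static int demo_route_add(const struct demo_fib_config *cfg)
{
    assert(cfg);
    if (cfg->dst_len > 128 || cfg->src_len > 128)
        return -EINVAL;       /* an IPv6 prefix has at most 128 bits */
    /* A real implementation would insert the route here. */
    return 0;
}
```

Callers can fill in only the fields they need with designated initializers, for example `struct demo_fib_config cfg = { .table = 254, .dst_len = 64 };`, and new fields can be added later without touching existing call sites.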
2006-09-22 | [IPV6] ip6_fib.c: make code static | Adrian Bunk
Make the following needlessly global code static:
- fib6_walker_lock
- struct fib6_walker_list
- fib6_walk_continue()
- fib6_walk()

Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 | [NET]: Make code static. | Adrian Bunk
This patch makes needlessly global code static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 | [IPV6]: Policy Routing Rules | Thomas Graf
Adds support for policy routing rules including a new local table for routes with a local destination. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 | [IPV6]: Multiple Routing Tables | Thomas Graf
Adds the framework to support multiple IPv6 routing tables. Currently all automatically generated routes are put into the same table. This could be changed at a later point after considering the produced locking overhead. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21 | [IPV6]: V6 route events reported with wrong netlink PID and seq number | Jamal Hadi Salim
Essentially netlink at the moment always reports a pid and sequence of 0 for v6 route activities. To understand the repercussions of this look at: http://lists.quagga.net/pipermail/quagga-dev/2005-June/003507.html While fixing this, I took the liberty to resolve the outstanding issue of IPV6 routes inserted via ioctls to have the correct pids as well. This patch tries to behave as close as possible to the v4 routes, i.e. it maintains whatever PID the socket issuing the command owns as opposed to the process. That made the patch a little bulky. I have tested against both a netlink derived utility to add/del routes as well as an ioctl derived one. The Quagga folks have tested against quagga. This fixes the problem and so far hasn't been detected to introduce any new issues. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-16 | Linux-2.6.12-rc2 [v2.6.12-rc2] | Linus Torvalds
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!