From 8713dbf05754aa777f31bf491cb60a111f7ad828 Mon Sep 17 00:00:00 2001
From: Yan Zheng
Date: Fri, 28 Oct 2005 08:02:08 +0800
Subject: [MCAST]: ip[6]_mc_add_src should be called when number of sources is zero

And filter mode is exclude.

Further explanation by David Stevens:

Multicast source filters aren't widely used yet, and that's really the
only feature that's affected if an application actually exercises this
bug, as far as I can tell. An ordinary filter-less multicast join should
still work, and only forwarded multicast traffic making use of filters
and doing empty-source filters with the MSFILTER ioctl would be at risk
of not getting multicast traffic forwarded to them because the reports
generated would not be based on the correct counts.

Signed-off-by: Yan Zheng
Signed-off-by: Arnaldo Carvalho de Melo
---
 net/ipv4/igmp.c  | 5 ++++-
 net/ipv6/mcast.c | 4 +++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 8b6d3939e1e..c6247fc8406 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -1908,8 +1908,11 @@ int ip_mc_msfilter(struct sock *sk, struct ip_msfilter *msf, int ifindex)
 			sock_kfree_s(sk, newpsl, IP_SFLSIZE(newpsl->sl_max));
 			goto done;
 		}
-	} else
+	} else {
 		newpsl = NULL;
+		(void) ip_mc_add_src(in_dev, &msf->imsf_multiaddr,
+				     msf->imsf_fmode, 0, NULL, 0);
+	}
 	psl = pmc->sflist;
 	if (psl) {
 		(void) ip_mc_del_src(in_dev, &msf->imsf_multiaddr, pmc->sfmode,
diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index 966b2372aaa..f15e04ad026 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -545,8 +545,10 @@ int ip6_mc_msfilter(struct sock *sk, struct group_filter *gsf)
 			sock_kfree_s(sk, newpsl, IP6_SFLSIZE(newpsl->sl_max));
 			goto done;
 		}
-	} else
+	} else {
 		newpsl = NULL;
+		(void) ip6_mc_add_src(idev, group, gsf->gf_fmode, 0, NULL, 0);
+	}
 	psl = pmc->sflist;
 	if (psl) {
 		(void) ip6_mc_del_src(idev, group, pmc->sfmode,
-- 
cgit v1.2.3-18-g5258

From 450b5b18983cc15f4d27bd3f62901e02281e818b Mon Sep 17 00:00:00 2001
From: Stephen Hemminger
Date: Tue, 1 Nov 2005 15:26:45 -0800
Subject: [TCP]: BIC max increment too large

The max growth of BIC TCP is too large. Original code was based on
BIC 1.0 and the default there was 32. Later code (2.6.13) included
compensation for delayed acks, and should have reduced the default
value to 16; since normally TCP gets one ack for every two packets sent.

The current value of 32 makes BIC too aggressive and unfair to other
flows.

Submitted-by: Injong Rhee
Signed-off-by: Stephen Hemminger
Acked-by: Ian McDonald
Signed-off-by: Arnaldo Carvalho de Melo
---
 net/ipv4/tcp_bic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_bic.c b/net/ipv4/tcp_bic.c
index 6d80e063c18..ae35e060904 100644
--- a/net/ipv4/tcp_bic.c
+++ b/net/ipv4/tcp_bic.c
@@ -27,7 +27,7 @@
  */
 
 static int fast_convergence = 1;
-static int max_increment = 32;
+static int max_increment = 16;
 static int low_window = 14;
 static int beta = 819;		/* = 819/1024 (BICTCP_BETA_SCALE) */
 static int low_utilization_threshold = 153;
-- 
cgit v1.2.3-18-g5258

From c75d721c761ad0f2d8725c40af9e4f376efefd24 Mon Sep 17 00:00:00 2001
From: Herbert Xu
Date: Wed, 2 Nov 2005 18:55:00 +1100
Subject: [NET]: Fix zero-size datagram reception

The recent rewrite of skb_copy_datagram_iovec broke the reception of
zero-size datagrams. This patch fixes it.
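
The failure mode can be reproduced in a simplified user-space model. This is a hypothetical sketch, not the kernel function: `struct seg`, `seg_copy()` and the `guard_zero_len` switch are illustrative names, and the loop only mimics the assumed shape of the rewritten walk, in which bytes are copied (and success is reported) only once a segment's cumulative end passes `offset`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one linear piece of a packet (head or fragment). */
struct seg {
	const char *data;
	size_t len;
};

/*
 * Simplified offset/length copy walk over a list of segments.
 * Returns 0 on success, -1 if the segments ran out before `len` bytes
 * were copied. When guard_zero_len is non-zero, a zero-length request
 * succeeds immediately, mirroring the early return added by the patch.
 */
static int seg_copy(const struct seg *segs, size_t nsegs, size_t offset,
		    char *to, size_t len, int guard_zero_len)
{
	size_t end = 0;

	if (guard_zero_len && len == 0)
		return 0;

	for (size_t i = 0; i < nsegs; i++) {
		size_t start = end;

		end += segs[i].len;
		if (end > offset) {
			size_t copy = end - offset;

			if (copy > len)
				copy = len;
			memcpy(to, segs[i].data + (offset - start), copy);
			to += copy;
			offset += copy;
			len -= copy;
			if (len == 0)
				return 0;	/* the only success exit */
		}
	}
	return -1;	/* ran out of data: an error return in the kernel */
}
```

For an empty datagram no segment end ever exceeds `offset`, so without the guard the walk falls off the end and reports an error even though nothing needed copying; checking `len` up front, as the patch does with `if (!len) return 0;`, short-circuits that case.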

Signed-off-by: Herbert Xu
Signed-off-by: Arnaldo Carvalho de Melo
---
 net/core/datagram.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/core/datagram.c b/net/core/datagram.c
index 81987df536e..d219435d086 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -213,6 +213,10 @@ int skb_copy_datagram_iovec(const struct sk_buff *skb, int offset,
 {
 	int i, err, fraglen, end = 0;
 	struct sk_buff *next = skb_shinfo(skb)->frag_list;
+
+	if (!len)
+		return 0;
+
 next_skb:
 	fraglen = skb_headlen(skb);
 	i = -1;
-- 
cgit v1.2.3-18-g5258

From 979ad663125af4be120697263038bb06ddbb83b4 Mon Sep 17 00:00:00 2001
From: Yan Zheng
Date: Fri, 14 Oct 2005 18:31:15 +0800
Subject: [IPV6]: inet6_ifinfo_notify should use RTM_DELLINK in addrconf_ifdown

Signed-off-by: Yan Zheng
Acked-by: YOSHIFUJI Hideaki
Signed-off-by: Arnaldo Carvalho de Melo
---
 net/ipv6/addrconf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 41edc14851e..2c5f57299d6 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2163,7 +2163,7 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 
 	/* Step 5: netlink notification of this interface */
 	idev->tstamp = jiffies;
-	inet6_ifinfo_notify(RTM_NEWLINK, idev);
+	inet6_ifinfo_notify(RTM_DELLINK, idev);
 
 	/* Shot the device (if unregistered) */
 
-- 
cgit v1.2.3-18-g5258

From 52ab4ac258ff10a362d78a3f8160a7c4d0721b51 Mon Sep 17 00:00:00 2001
From: Thomas Graf
Date: Tue, 1 Nov 2005 15:13:02 +0100
Subject: [PKT_SCHED]: Rework QoS and/or fair queueing configuration

Make "QoS and/or fair queueing" have its own menu, it's too big to
be inlined into "Network options".

Remove the obsolete NET_QOS option.

Automatically select NET_CLS if needed.

Do the same for NET_ESTIMATOR but allow it to be selected manually
for statistical purposes.

Add comments to separate queueing from classification.

Fix dependencies and ordering of classifiers.

Improve descriptions/help texts and remove outdated pieces.

Signed-off-by: Thomas Graf
Signed-off-by: Arnaldo Carvalho de Melo
---
 net/sched/Kconfig | 394 +++++++++++++++++++++++++++---------------------------
 1 file changed, 195 insertions(+), 199 deletions(-)

diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 81510da3179..7f34e7fd767 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -2,13 +2,15 @@
 # Traffic control configuration.
 # 
 
-menuconfig NET_SCHED
+menu "QoS and/or fair queueing"
+
+config NET_SCHED
 	bool "QoS and/or fair queueing"
 	---help---
 	  When the kernel has several packets to send out over a network
 	  device, it has to decide which ones to send first, which ones to
-	  delay, and which ones to drop. This is the job of the packet
-	  scheduler, and several different algorithms for how to do this
+	  delay, and which ones to drop. This is the job of the queueing
+	  disciplines, several different algorithms for how to do this
 	  "fairly" have been proposed.
 
 	  If you say N here, you will get the standard packet scheduler, which
@@ -23,13 +25,13 @@ menuconfig NET_SCHED
 	  To administer these schedulers, you'll need the user-level utilities
 	  from the package iproute2+tc at . That package also contains some
 	  documentation; for more, check out
-	  .
+	  .
 
 	  This Quality of Service (QoS) support will enable you to use
 	  Differentiated Services (diffserv) and Resource Reservation Protocol
-	  (RSVP) on your Linux router if you also say Y to "QoS support",
-	  "Packet classifier API" and to some classifiers below. Documentation
-	  and software is at .
+	  (RSVP) on your Linux router if you also say Y to the corresponding
+	  classifiers below. Documentation and software is at
+	  .
 
 	  If you say Y here and to "/proc file system" below, you will be able
 	  to read status information about packet schedulers from the file
@@ -42,7 +44,7 @@ choice
 	prompt "Packet scheduler clock source"
 	depends on NET_SCHED
 	default NET_SCH_CLK_JIFFIES
-	help
+	---help---
 	  Packet schedulers need a monotonic clock that increments at a static
 	  rate. The kernel provides several suitable interfaces, each with
 	  different properties:
@@ -56,7 +58,7 @@ choice
 
 config NET_SCH_CLK_JIFFIES
 	bool "Timer interrupt"
-	help
+	---help---
 	  Say Y here if you want to use the timer interrupt (jiffies) as clock
 	  source. This clock source is fast, synchronized on all processors and
 	  handles cpu clock frequency changes, but its resolution is too low
@@ -64,7 +66,7 @@ config NET_SCH_CLK_JIFFIES
 
 config NET_SCH_CLK_GETTIMEOFDAY
 	bool "gettimeofday"
-	help
+	---help---
 	  Say Y here if you want to use gettimeofday as clock source. This clock
 	  source has high resolution, is synchronized on all processors and
 	  handles cpu clock frequency changes, but it is slow.
@@ -77,7 +79,7 @@ config NET_SCH_CLK_GETTIMEOFDAY
 config NET_SCH_CLK_CPU
 	bool "CPU cycle counter"
 	depends on ((X86_TSC || X86_64) && !SMP) || ALPHA || SPARC64 || PPC64 || IA64
-	help
+	---help---
 	  Say Y here if you want to use the CPU's cycle counter as clock source.
 	  This is a cheap and high resolution clock source, but on some
 	  architectures it is not synchronized on all processors and doesn't
@@ -95,134 +97,129 @@ config NET_SCH_CLK_CPU
 
 endchoice
 
+comment "Queueing/Scheduling"
+	depends on NET_SCHED
+
 config NET_SCH_CBQ
-	tristate "CBQ packet scheduler"
+	tristate "Class Based Queueing (CBQ)"
 	depends on NET_SCHED
 	---help---
 	  Say Y here if you want to use the Class-Based Queueing (CBQ) packet
-	  scheduling algorithm for some of your network devices. This
-	  algorithm classifies the waiting packets into a tree-like hierarchy
-	  of classes; the leaves of this tree are in turn scheduled by
-	  separate algorithms (called "disciplines" in this context).
+	  scheduling algorithm. This algorithm classifies the waiting packets
+	  into a tree-like hierarchy of classes; the leaves of this tree are
+	  in turn scheduled by separate algorithms.
 
-	  See the top of for references about the
-	  CBQ algorithm.
+	  See the top of for more details.
 
 	  CBQ is a commonly used scheduler, so if you're unsure, you should say
 	  Y here. Then say Y to all the queueing algorithms below that you
-	  want to use as CBQ disciplines. Then say Y to "Packet classifier
-	  API" and say Y to all the classifiers you want to use; a classifier
-	  is a routine that allows you to sort your outgoing traffic into
-	  classes based on a certain criterion.
+	  want to use as leaf disciplines.
 
 	  To compile this code as a module, choose M here: the
 	  module will be called sch_cbq.
 
 config NET_SCH_HTB
-	tristate "HTB packet scheduler"
+	tristate "Hierarchical Token Bucket (HTB)"
 	depends on NET_SCHED
 	---help---
 	  Say Y here if you want to use the Hierarchical Token Buckets (HTB)
-	  packet scheduling algorithm for some of your network devices. See
+	  packet scheduling algorithm. See
 	  for complete manual and in-depth articles.
 
-	  HTB is very similar to the CBQ regarding its goals however is has
+	  HTB is very similar to CBQ regarding its goals however is has
 	  different properties and different algorithm.
 
 	  To compile this code as a module, choose M here: the
 	  module will be called sch_htb.
 
 config NET_SCH_HFSC
-	tristate "HFSC packet scheduler"
+	tristate "Hierarchical Fair Service Curve (HFSC)"
 	depends on NET_SCHED
 	---help---
 	  Say Y here if you want to use the Hierarchical Fair Service Curve
-	  (HFSC) packet scheduling algorithm for some of your network devices.
+	  (HFSC) packet scheduling algorithm.
 
 	  To compile this code as a module, choose M here: the
 	  module will be called sch_hfsc.
 
-#tristate '  H-PFQ packet scheduler' CONFIG_NET_SCH_HPFQ
 config NET_SCH_ATM
-	tristate "ATM pseudo-scheduler"
+	tristate "ATM Virtual Circuits (ATM)"
 	depends on NET_SCHED && ATM
 	---help---
 	  Say Y here if you want to use the ATM pseudo-scheduler. This
-	  provides a framework for invoking classifiers (aka "filters"), which
-	  in turn select classes of this queuing discipline. Each class maps
-	  the flow(s) it is handling to a given virtual circuit (see the top of
-	  ).
+	  provides a framework for invoking classifiers, which in turn
+	  select classes of this queuing discipline. Each class maps
+	  the flow(s) it is handling to a given virtual circuit.
+
+	  See the top of ) for more details.
 
 	  To compile this code as a module, choose M here: the
 	  module will be called sch_atm.
 
 config NET_SCH_PRIO
-	tristate "The simplest PRIO pseudoscheduler"
+	tristate "Multi Band Priority Queueing (PRIO)"
 	depends on NET_SCHED
-	help
+	---help---
 	  Say Y here if you want to use an n-band priority queue packet
-	  "scheduler" for some of your network devices or as a leaf discipline
-	  for the CBQ scheduling algorithm. If unsure, say Y.
+	  scheduler.
 
 	  To compile this code as a module, choose M here: the
 	  module will be called sch_prio.
 
 config NET_SCH_RED
-	tristate "RED queue"
+	tristate "Random Early Detection (RED)"
 	depends on NET_SCHED
-	help
+	---help---
 	  Say Y here if you want to use the Random Early Detection (RED)
-	  packet scheduling algorithm for some of your network devices (see
-	  the top of for details and references
-	  about the algorithm).
+	  packet scheduling algorithm.
+
+	  See the top of for more details.
 
 	  To compile this code as a module, choose M here: the
 	  module will be called sch_red.
 
 config NET_SCH_SFQ
-	tristate "SFQ queue"
+	tristate "Stochastic Fairness Queueing (SFQ)"
 	depends on NET_SCHED
 	---help---
 	  Say Y here if you want to use the Stochastic Fairness Queueing (SFQ)
-	  packet scheduling algorithm for some of your network devices or as a
-	  leaf discipline for the CBQ scheduling algorithm (see the top of
-	  for details and references about the SFQ
-	  algorithm).
+	  packet scheduling algorithm .
+
+	  See the top of for more details.
 
 	  To compile this code as a module, choose M here: the
 	  module will be called sch_sfq.
 
 config NET_SCH_TEQL
-	tristate "TEQL queue"
+	tristate "True Link Equalizer (TEQL)"
 	depends on NET_SCHED
 	---help---
 	  Say Y here if you want to use the True Link Equalizer (TLE) packet
-	  scheduling algorithm for some of your network devices or as a leaf
-	  discipline for the CBQ scheduling algorithm. This queueing
-	  discipline allows the combination of several physical devices into
-	  one virtual device. (see the top of for
-	  details).
+	  scheduling algorithm. This queueing discipline allows the combination
+	  of several physical devices into one virtual device.
+
+	  See the top of for more details.
 
 	  To compile this code as a module, choose M here: the
 	  module will be called sch_teql.
 
 config NET_SCH_TBF
-	tristate "TBF queue"
+	tristate "Token Bucket Filter (TBF)"
 	depends on NET_SCHED
-	help
-	  Say Y here if you want to use the Simple Token Bucket Filter (TBF)
-	  packet scheduling algorithm for some of your network devices or as a
-	  leaf discipline for the CBQ scheduling algorithm (see the top of
-	  for a description of the TBF algorithm).
+	---help---
+	  Say Y here if you want to use the Token Bucket Filter (TBF) packet
+	  scheduling algorithm.
+
+	  See the top of for more details.
 
 	  To compile this code as a module, choose M here: the
 	  module will be called sch_tbf.
 
 config NET_SCH_GRED
-	tristate "GRED queue"
+	tristate "Generic Random Early Detection (GRED)"
 	depends on NET_SCHED
-	help
+	---help---
 	  Say Y here if you want to use the Generic Random Early Detection
 	  (GRED) packet scheduling algorithm for some of your network devices
 	  (see the top of for details and
@@ -232,9 +229,9 @@ config NET_SCH_GRED
 	  module will be called sch_gred.
 
 config NET_SCH_DSMARK
-	tristate "Diffserv field marker"
+	tristate "Differentiated Services marker (DSMARK)"
 	depends on NET_SCHED
-	help
+	---help---
 	  Say Y if you want to schedule packets according to the
 	  Differentiated Services architecture proposed in RFC 2475.
 	  Technical information on this method, with pointers to associated
@@ -244,9 +241,9 @@ config NET_SCH_DSMARK
 	  module will be called sch_dsmark.
 
 config NET_SCH_NETEM
-	tristate "Network emulator"
+	tristate "Network emulator (NETEM)"
 	depends on NET_SCHED
-	help
+	---help---
 	  Say Y if you want to emulate network delay, loss, and packet
 	  re-ordering. This is often useful to simulate networks when
 	  testing applications or protocols.
@@ -259,58 +256,23 @@ config NET_SCH_NETEM
 config NET_SCH_INGRESS
 	tristate "Ingress Qdisc"
 	depends on NET_SCHED
-	help
-	  If you say Y here, you will be able to police incoming bandwidth
-	  and drop packets when this bandwidth exceeds your desired rate.
+	---help---
+	  Say Y here if you want to use classifiers for incoming packets.
 	  If unsure, say Y.
 
 	  To compile this code as a module, choose M here: the
 	  module will be called sch_ingress.
 
-config NET_QOS
-	bool "QoS support"
+comment "Classification"
 	depends on NET_SCHED
-	---help---
-	  Say Y here if you want to include Quality Of Service scheduling
-	  features, which means that you will be able to request certain
-	  rate-of-flow limits for your network devices.
-
-	  This Quality of Service (QoS) support will enable you to use
-	  Differentiated Services (diffserv) and Resource Reservation Protocol
-	  (RSVP) on your Linux router if you also say Y to "Packet classifier
-	  API" and to some classifiers below. Documentation and software is at
-	  .
-
-	  Note that the answer to this question won't directly affect the
-	  kernel: saying N will just cause the configurator to skip all
-	  the questions about QoS support.
-
-config NET_ESTIMATOR
-	bool "Rate estimator"
-	depends on NET_QOS
-	help
-	  In order for Quality of Service scheduling to work, the current
-	  rate-of-flow for a network device has to be estimated; if you say Y
-	  here, the kernel will do just that.
 
 config NET_CLS
-	bool "Packet classifier API"
-	depends on NET_SCHED
-	---help---
-	  The CBQ scheduling algorithm requires that network packets which are
-	  scheduled to be sent out over a network device be classified
-	  according to some criterion. If you say Y here, you will get a
-	  choice of several different packet classifiers with the following
-	  questions.
-
-	  This will enable you to use Differentiated Services (diffserv) and
-	  Resource Reservation Protocol (RSVP) on your Linux router.
-	  Documentation and software is at
-	  .
+	boolean
 
 config NET_CLS_BASIC
-	tristate "Basic classifier"
-	depends on NET_CLS
+	tristate "Elementary classification (BASIC)"
+	depends NET_SCHED
+	select NET_CLS
 	---help---
 	  Say Y here if you want to be able to classify packets using
 	  only extended matches and actions.
@@ -319,24 +281,25 @@ config NET_CLS_BASIC
 	  module will be called cls_basic.
 
 config NET_CLS_TCINDEX
-	tristate "TC index classifier"
-	depends on NET_CLS
-	help
-	  If you say Y here, you will be able to classify outgoing packets
-	  according to the tc_index field of the skb. You will want this
-	  feature if you want to implement Differentiated Services using
-	  sch_dsmark. If unsure, say Y.
+	tristate "Traffic-Control Index (TCINDEX)"
+	depends NET_SCHED
+	select NET_CLS
+	---help---
+	  Say Y here if you want to be able to classify packets based on
+	  traffic control indices. You will want this feature if you want
+	  to implement Differentiated Services together with DSMARK.
 
 	  To compile this code as a module, choose M here: the
 	  module will be called cls_tcindex.
 
 config NET_CLS_ROUTE4
-	tristate "Routing table based classifier"
-	depends on NET_CLS
+	tristate "Routing decision (ROUTE)"
+	depends NET_SCHED
 	select NET_CLS_ROUTE
-	help
-	  If you say Y here, you will be able to classify outgoing packets
-	  according to the route table entry they matched. If unsure, say Y.
+	select NET_CLS
+	---help---
+	  If you say Y here, you will be able to classify packets
+	  according to the route table entry they matched.
 
 	  To compile this code as a module, choose M here: the
 	  module will be called cls_route.
@@ -346,58 +309,45 @@ config NET_CLS_ROUTE
 	default n
 
 config NET_CLS_FW
-	tristate "Firewall based classifier"
-	depends on NET_CLS
-	help
-	  If you say Y here, you will be able to classify outgoing packets
-	  according to firewall criteria you specified.
+	tristate "Netfilter mark (FW)"
+	depends NET_SCHED
+	select NET_CLS
+	---help---
+	  If you say Y here, you will be able to classify packets
+	  according to netfilter/firewall marks.
 
 	  To compile this code as a module, choose M here: the
 	  module will be called cls_fw.
 
 config NET_CLS_U32
-	tristate "U32 classifier"
-	depends on NET_CLS
-	help
-	  If you say Y here, you will be able to classify outgoing packets
-	  according to their destination address. If unsure, say Y.
+	tristate "Universal 32bit comparisons w/ hashing (U32)"
+	depends NET_SCHED
+	select NET_CLS
+	---help---
+	  Say Y here to be able to classify packetes using a universal
+	  32bit pieces based comparison scheme.
 
 	  To compile this code as a module, choose M here: the
 	  module will be called cls_u32.
 
 config CLS_U32_PERF
-	bool "U32 classifier performance counters"
+	bool "Performance counters support"
 	depends on NET_CLS_U32
-	help
-	  gathers stats that could be used to tune u32 classifier performance.
-	  Requires a new iproute2
-	  You MUST NOT turn this on if you dont have an update iproute2.
-
-config NET_CLS_IND
-	bool "classify input device (slows things u32/fw) "
-	depends on NET_CLS_U32 || NET_CLS_FW
-	help
-	  This option will be killed eventually when a
-	  metadata action appears because it slows things a little
-	  Available only for u32 and fw classifiers.
-	  Requires a new iproute2
-	  You MUST NOT turn this on if you dont have an update iproute2.
+	---help---
+	  Say Y here to make u32 gather additional statistics useful for
+	  fine tuning u32 classifiers.
 
 config CLS_U32_MARK
-	bool "Use nfmark as a key in U32 classifier"
+	bool "Netfilter marks support"
 	depends on NET_CLS_U32 && NETFILTER
-	help
-	  This allows you to match mark in a u32 filter.
-	  Example:
-		tc filter add dev eth0 protocol ip parent 1:0 prio 5 u32 \
-			match mark 0x0090 0xffff \
-			match ip dst 4.4.4.4 \
-			flowid 1:90
-	  You must use a new iproute2 to use this feature.
+	---help---
+	  Say Y here to be able to use netfilter marks as u32 key.
 
 config NET_CLS_RSVP
-	tristate "Special RSVP classifier"
-	depends on NET_CLS && NET_QOS
+	tristate "IPv4 Resource Reservation Protocol (RSVP)"
+	depends on NET_SCHED
+	select NET_CLS
+	select NET_ESTIMATOR
 	---help---
 	  The Resource Reservation Protocol (RSVP) permits end systems to
 	  request a minimum and maximum data flow rate for a connection; this
@@ -410,31 +360,33 @@ config NET_CLS_RSVP
 	  module will be called cls_rsvp.
 
 config NET_CLS_RSVP6
-	tristate "Special RSVP classifier for IPv6"
-	depends on NET_CLS && NET_QOS
+	tristate "IPv6 Resource Reservation Protocol (RSVP6)"
+	depends on NET_SCHED
+	select NET_CLS
+	select NET_ESTIMATOR
 	---help---
 	  The Resource Reservation Protocol (RSVP) permits end systems to
 	  request a minimum and maximum data flow rate for a connection; this
 	  is important for real time data such as streaming sound or video.
 
 	  Say Y here if you want to be able to classify outgoing packets based
-	  on their RSVP requests and you are using the new Internet Protocol
-	  IPv6 as opposed to the older and more common IPv4.
+	  on their RSVP requests and you are using the IPv6.
 
 	  To compile this code as a module, choose M here: the
 	  module will be called cls_rsvp6.
 
 config NET_EMATCH
 	bool "Extended Matches"
-	depends on NET_CLS
+	depends NET_SCHED
+	select NET_CLS
 	---help---
 	  Say Y here if you want to use extended matches on top of classifiers
 	  and select the extended matches below.
 
 	  Extended matches are small classification helpers not worth writing
-	  a separate classifier.
+	  a separate classifier for.
 
-	  You must have a recent version of the iproute2 tools in order to use
+	  A recent version of the iproute2 package is required to use
 	  extended matches.
 
 config NET_EMATCH_STACK
@@ -468,7 +420,7 @@ config NET_EMATCH_NBYTE
 	  module will be called em_nbyte.
 
 config NET_EMATCH_U32
-	tristate "U32 hashing key"
+	tristate "U32 key"
 	depends on NET_EMATCH
 	---help---
 	  Say Y here if you want to be able to classify packets using
@@ -496,76 +448,120 @@ config NET_EMATCH_TEXT
 	select TEXTSEARCH_BM
 	select TEXTSEARCH_FSM
 	---help---
-	  Say Y here if you want to be ablt to classify packets based on
+	  Say Y here if you want to be able to classify packets based on
 	  textsearch comparisons.
 
 	  To compile this code as a module, choose M here: the
 	  module will be called em_text.
 
 config NET_CLS_ACT
-	bool "Packet ACTION"
-	depends on EXPERIMENTAL && NET_CLS && NET_QOS
+	bool "Actions"
+	depends on EXPERIMENTAL && NET_SCHED
+	select NET_ESTIMATOR
 	---help---
-	  This option requires you have a new iproute2. It enables
-	  tc extensions which can be used with tc classifiers.
-	  You MUST NOT turn this on if you dont have an update iproute2.
+	  Say Y here if you want to use traffic control actions. Actions
+	  get attached to classifiers and are invoked after a successful
+	  classification. They are used to overwrite the classification
+	  result, instantly drop or redirect packets, etc.
+
+	  A recent version of the iproute2 package is required to use
+	  extended matches.
 
 config NET_ACT_POLICE
-	tristate "Policing Actions"
+	tristate "Traffic Policing"
 	depends on NET_CLS_ACT
 	---help---
-	  If you are using a newer iproute2 select this one, otherwise use one
-	  below to select a policer.
-	  You MUST NOT turn this on if you dont have an update iproute2.
+	  Say Y here if you want to do traffic policing, i.e. strict
+	  bandwidth limiting. This action replaces the existing policing
+	  module.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called police.
 
 config NET_ACT_GACT
-	tristate "generic Actions"
+	tristate "Generic actions"
 	depends on NET_CLS_ACT
 	---help---
-	  You must have new iproute2 to use this feature.
-	  This adds simple filtering actions like drop, accept etc.
+	  Say Y here to take generic actions such as dropping and
+	  accepting packets.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called gact.
 
 config GACT_PROB
-	bool "generic Actions probability"
+	bool "Probability support"
 	depends on NET_ACT_GACT
 	---help---
-	  Allows generic actions to be randomly or deterministically used.
+	  Say Y here to use the generic action randomly or deterministically.
 
 config NET_ACT_MIRRED
-	tristate "Packet In/Egress redirecton/mirror Actions"
+	tristate "Redirecting and Mirroring"
 	depends on NET_CLS_ACT
 	---help---
-	  requires new iproute2
-	  This allows packets to be mirrored or redirected to netdevices
+	  Say Y here to allow packets to be mirrored or redirected to
+	  other devices.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called mirred.
 
 config NET_ACT_IPT
-	tristate "iptables Actions"
+	tristate "IPtables targets"
 	depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
 	---help---
-	  requires new iproute2
-	  This allows iptables targets to be used by tc filters
+	  Say Y here to be able to invoke iptables targets after succesful
+	  classification.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called ipt.
 
 config NET_ACT_PEDIT
-	tristate "Generic Packet Editor Actions"
+	tristate "Packet Editing"
 	depends on NET_CLS_ACT
 	---help---
-	  requires new iproute2
-	  This allows for packets to be generically edited
+	  Say Y here if you want to mangle the content of packets.
 
-config NET_CLS_POLICE
-	bool "Traffic policing (needed for in/egress)"
-	depends on NET_CLS && NET_QOS && NET_CLS_ACT!=y
-	help
-	  Say Y to support traffic policing (bandwidth limits). Needed for
-	  ingress and egress rate limiting.
+	  To compile this code as a module, choose M here: the
+	  module will be called pedit.
 
 config NET_ACT_SIMP
-	tristate "Simple action"
+	tristate "Simple Example (Debug)"
 	depends on NET_CLS_ACT
 	---help---
-	  You must have new iproute2 to use this feature.
-	  This adds a very simple action for demonstration purposes
-	  The idea is to give action authors a basic example to look at.
-	  All this action will do is print on the console the configured
-	  policy string followed by _ then packet count.
+	  Say Y here to add a simple action for demonstration purposes.
+	  It is meant as an example and for debugging purposes. It will
+	  print a configured policy string followed by the packet count
+	  to the console for every packet that passes by.
+
+	  If unsure, say N.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called simple.
+
+config NET_CLS_POLICE
+	bool "Traffic Policing (obsolete)"
+	depends on NET_SCHED && NET_CLS_ACT!=y
+	select NET_ESTIMATOR
+	---help---
+	  Say Y here if you want to do traffic policing, i.e. strict
+	  bandwidth limiting. This option is obsoleted by the traffic
+	  policer implemented as action, it stays here for compatibility
+	  reasons.
+
+config NET_CLS_IND
+	bool "Incoming device classification"
+	depends on NET_SCHED && (NET_CLS_U32 || NET_CLS_FW)
+	---help---
+	  Say Y here to extend the u32 and fw classifier to support
+	  classification based on the incoming device. This option is
+	  likely to disappear in favour of the metadata ematch.
+
+config NET_ESTIMATOR
+	bool "Rate estimator"
+	depends on NET_SCHED
+	---help---
+	  Say Y here to allow using rate estimators to estimate the current
+	  rate-of-flow for network devices, queues, etc. This module is
+	  automaticaly selected if needed but can be selected manually for
+	  statstical purposes.
+endmenu
-- 
cgit v1.2.3-18-g5258

From c556b754967afd0878d65de2cfe0675577b0f62f Mon Sep 17 00:00:00 2001
From: Chuck Lever
Date: Tue, 1 Nov 2005 12:24:48 -0500
Subject: SUNRPC: allow sunrpc.o to link when CONFIG_SYSCTL is disabled

The sunrpc module should build properly even when CONFIG_SYSCTL is
disabled.

Reported by Jan-Benedict Glaw.

Test plan:
Compile kernel with CONFIG_NFS as a module and built-in, and
CONFIG_SYSCTL enabled and disabled.
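
The general rule behind this fix is that an object file compiled only when an option is enabled cannot own symbols that always-built code needs. The sketch below is hypothetical, not the sunrpc sources: the `model_*` names, the `MODEL_CONFIG_SYSCTL` macro and the default of 16 are illustrative assumptions. It shows the arrangement the patch moves to, where the tunable is defined unconditionally next to its users and only the sysctl glue stays conditional.

```c
/*
 * Hypothetical illustration (not the real sunrpc code).
 * Assumed default; the real code uses RPC_DEF_SLOT_TABLE.
 */
#define MODEL_DEF_SLOT_TABLE 16U

/*
 * After the patch: the tunable lives with the transport code that
 * uses it, so the symbol exists whether or not sysctl support is built.
 */
unsigned int model_udp_slot_table_entries = MODEL_DEF_SLOT_TABLE;

#ifdef MODEL_CONFIG_SYSCTL
/*
 * Only the /proc/sys handler is conditional; it merely writes the
 * variable defined above instead of defining it.
 */
void model_sysctl_set_slots(unsigned int v)
{
	model_udp_slot_table_entries = v;
}
#endif

/* Transport code reads the tunable unconditionally. */
unsigned int model_slot_table_size(void)
{
	return model_udp_slot_table_entries;
}
```

If the definition instead sat inside the conditionally built file, disabling the option would leave `model_slot_table_size()` referring to an undefined symbol at link time, which is the kind of failure the commit describes.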

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust
---
 net/sunrpc/sunrpc_syms.c | 2 --
 net/sunrpc/sysctl.c      | 7 -------
 net/sunrpc/xprtsock.c    | 9 +++++++++
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/net/sunrpc/sunrpc_syms.c b/net/sunrpc/sunrpc_syms.c
index 2387e7b823f..a03d4b600c9 100644
--- a/net/sunrpc/sunrpc_syms.c
+++ b/net/sunrpc/sunrpc_syms.c
@@ -63,8 +63,6 @@ EXPORT_SYMBOL(rpc_mkpipe);
 /* Client transport */
 EXPORT_SYMBOL(xprt_create_proto);
 EXPORT_SYMBOL(xprt_set_timeout);
-EXPORT_SYMBOL(xprt_udp_slot_table_entries);
-EXPORT_SYMBOL(xprt_tcp_slot_table_entries);
 
 /* Client credential cache */
 EXPORT_SYMBOL(rpcauth_register);
diff --git a/net/sunrpc/sysctl.c b/net/sunrpc/sysctl.c
index d0c9f460e41..1065904841f 100644
--- a/net/sunrpc/sysctl.c
+++ b/net/sunrpc/sysctl.c
@@ -119,13 +119,6 @@ done:
 	return 0;
 }
 
-unsigned int xprt_udp_slot_table_entries = RPC_DEF_SLOT_TABLE;
-unsigned int xprt_tcp_slot_table_entries = RPC_DEF_SLOT_TABLE;
-unsigned int xprt_min_resvport = RPC_DEF_MIN_RESVPORT;
-EXPORT_SYMBOL(xprt_min_resvport);
-unsigned int xprt_max_resvport = RPC_DEF_MAX_RESVPORT;
-EXPORT_SYMBOL(xprt_max_resvport);
-
 static unsigned int min_slot_table_size = RPC_MIN_SLOT_TABLE;
 static unsigned int max_slot_table_size = RPC_MAX_SLOT_TABLE;
 
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 2e1529217e6..0a51fd46a84 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -35,6 +35,15 @@
 #include 
 #include 
 
+/*
+ * xprtsock tunables
+ */
+unsigned int xprt_udp_slot_table_entries = RPC_DEF_SLOT_TABLE;
+unsigned int xprt_tcp_slot_table_entries = RPC_DEF_SLOT_TABLE;
+
+unsigned int xprt_min_resvport = RPC_DEF_MIN_RESVPORT;
+unsigned int xprt_max_resvport = RPC_DEF_MAX_RESVPORT;
+
 /*
  * How many times to try sending a request on a socket before waiting
  * for the socket buffer to clear.
-- cgit v1.2.3-18-g5258 From 0bbacc402e67abca8794a8401c1621dc0c0202e9 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 1 Nov 2005 16:53:32 -0500 Subject: NFS,SUNRPC,NLM: fix unused variable warnings when CONFIG_SYSCTL is disabled Fix some dprintk's so that NLM, NFS client, and RPC client compile cleanly if CONFIG_SYSCTL is disabled. Test plan: Compile kernel with CONFIG_NFS enabled and CONFIG_SYSCTL disabled. Signed-off-by: Chuck Lever Signed-off-by: Trond Myklebust --- net/sunrpc/auth.c | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) (limited to 'net') diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c index a415d99c394..8c7756036e9 100644 --- a/net/sunrpc/auth.c +++ b/net/sunrpc/auth.c @@ -299,11 +299,10 @@ put_rpccred(struct rpc_cred *cred) void rpcauth_unbindcred(struct rpc_task *task) { - struct rpc_auth *auth = task->tk_auth; struct rpc_cred *cred = task->tk_msg.rpc_cred; dprintk("RPC: %4d releasing %s cred %p\n", - task->tk_pid, auth->au_ops->au_name, cred); + task->tk_pid, task->tk_auth->au_ops->au_name, cred); put_rpccred(cred); task->tk_msg.rpc_cred = NULL; @@ -312,22 +311,22 @@ rpcauth_unbindcred(struct rpc_task *task) u32 * rpcauth_marshcred(struct rpc_task *task, u32 *p) { - struct rpc_auth *auth = task->tk_auth; struct rpc_cred *cred = task->tk_msg.rpc_cred; dprintk("RPC: %4d marshaling %s cred %p\n", - task->tk_pid, auth->au_ops->au_name, cred); + task->tk_pid, task->tk_auth->au_ops->au_name, cred); + return cred->cr_ops->crmarshal(task, p); } u32 * rpcauth_checkverf(struct rpc_task *task, u32 *p) { - struct rpc_auth *auth = task->tk_auth; struct rpc_cred *cred = task->tk_msg.rpc_cred; dprintk("RPC: %4d validating %s cred %p\n", - task->tk_pid, auth->au_ops->au_name, cred); + task->tk_pid, task->tk_auth->au_ops->au_name, cred); + return cred->cr_ops->crvalidate(task, p); } @@ -363,12 +362,12 @@ rpcauth_unwrap_resp(struct rpc_task *task, kxdrproc_t decode, void *rqstp, int rpcauth_refreshcred(struct rpc_task *task) { - struct 
rpc_auth *auth = task->tk_auth; struct rpc_cred *cred = task->tk_msg.rpc_cred; int err; dprintk("RPC: %4d refreshing %s cred %p\n", - task->tk_pid, auth->au_ops->au_name, cred); + task->tk_pid, task->tk_auth->au_ops->au_name, cred); + err = cred->cr_ops->crrefresh(task); if (err < 0) task->tk_status = err; -- cgit v1.2.3-18-g5258 From 3428c209c6820bbbb7dfb323caef8d402b3deb4c Mon Sep 17 00:00:00 2001 From: Harald Welte Date: Thu, 3 Nov 2005 14:27:07 +0100 Subject: [NETFILTER] PPTP helper: Fix compilation of conntrack helper without NAT This patch fixes compilation of the PPTP conntrack helper when NAT is configured off. Signed-off-by: Yasuyuki Kozakai Signed-off-by: Harald Welte Signed-off-by: Arnaldo Carvalho de Melo --- net/ipv4/netfilter/ip_conntrack_helper_pptp.c | 4 ---- net/ipv4/netfilter/ip_nat_helper_pptp.c | 2 ++ 2 files changed, 2 insertions(+), 4 deletions(-) (limited to 'net') diff --git a/net/ipv4/netfilter/ip_conntrack_helper_pptp.c b/net/ipv4/netfilter/ip_conntrack_helper_pptp.c index 926a6684643..4108a5e12b3 100644 --- a/net/ipv4/netfilter/ip_conntrack_helper_pptp.c +++ b/net/ipv4/netfilter/ip_conntrack_helper_pptp.c @@ -270,14 +270,10 @@ exp_gre(struct ip_conntrack *master, exp_orig->expectfn = pptp_expectfn; exp_orig->flags = 0; - exp_orig->dir = IP_CT_DIR_ORIGINAL; - /* both expectations are identical apart from tuple */ memcpy(exp_reply, exp_orig, sizeof(*exp_reply)); memcpy(&exp_reply->tuple, &exp_tuples[1], sizeof(exp_reply->tuple)); - exp_reply->dir = !exp_orig->dir; - if (ip_nat_pptp_hook_exp_gre) ret = ip_nat_pptp_hook_exp_gre(exp_orig, exp_reply); else { diff --git a/net/ipv4/netfilter/ip_nat_helper_pptp.c b/net/ipv4/netfilter/ip_nat_helper_pptp.c index 3cdd0684d30..ee6ab74ad3a 100644 --- a/net/ipv4/netfilter/ip_nat_helper_pptp.c +++ b/net/ipv4/netfilter/ip_nat_helper_pptp.c @@ -216,6 +216,7 @@ pptp_exp_gre(struct ip_conntrack_expect *expect_orig, expect_orig->saved_proto.gre.key = htons(nat_pptp_info->pac_call_id); 
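The diff that follows moves the `dir` assignments out of the conntrack helper and into the NAT helper. This suggests (an assumption; the message does not say so) that the expectation's `dir` member is only declared when NAT support is configured, so any code outside the NAT-only file that touches it breaks a NAT-less build. Here is a minimal sketch of that pattern. It uses hypothetical names (`CONFIG_SKETCH_NAT`, `struct expect_sketch`, `core_init`, `nat_init`) rather than the real netfilter types:

```c
/* Hypothetical stand-in for the kernel option that enables NAT.
 * Comment this out to model a NAT-less build: core_init() still
 * compiles because it never mentions ->dir. */
#define CONFIG_SKETCH_NAT 1

enum ct_dir { CT_DIR_ORIGINAL, CT_DIR_REPLY };

struct expect_sketch {
    unsigned int flags;
#ifdef CONFIG_SKETCH_NAT
    enum ct_dir dir;   /* only declared when NAT support is built */
#endif
};

/* Core helper code: must build with or without NAT, so it leaves
 * ->dir alone. The compound literal zeroes every member. */
static void core_init(struct expect_sketch *orig, struct expect_sketch *reply)
{
    *orig = (struct expect_sketch){ .flags = 0 };
    *reply = *orig;    /* both expectations start out identical */
}

#ifdef CONFIG_SKETCH_NAT
/* NAT-only code is the single place that sets the direction. */
static void nat_init(struct expect_sketch *orig, struct expect_sketch *reply)
{
    orig->dir = CT_DIR_ORIGINAL;
    reply->dir = CT_DIR_REPLY;
}
#endif
```

Keeping every reference to a conditionally declared member inside code that is compiled under the same condition is what lets both configurations build.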
expect_orig->tuple.src.u.gre.key = htons(nat_pptp_info->pns_call_id); expect_orig->tuple.dst.u.gre.key = htons(ct_pptp_info->pac_call_id); + expect_orig->dir = IP_CT_DIR_ORIGINAL; inv_t.src.ip = reply_t->src.ip; inv_t.dst.ip = reply_t->dst.ip; inv_t.src.u.gre.key = htons(nat_pptp_info->pac_call_id); @@ -233,6 +234,7 @@ pptp_exp_gre(struct ip_conntrack_expect *expect_orig, expect_reply->saved_proto.gre.key = htons(nat_pptp_info->pns_call_id); expect_reply->tuple.src.u.gre.key = htons(nat_pptp_info->pac_call_id); expect_reply->tuple.dst.u.gre.key = htons(ct_pptp_info->pns_call_id); + expect_reply->dir = IP_CT_DIR_REPLY; inv_t.src.ip = orig_t->src.ip; inv_t.dst.ip = orig_t->dst.ip; inv_t.src.u.gre.key = htons(nat_pptp_info->pns_call_id); -- cgit v1.2.3-18-g5258 From d811552eda2476215d69d485e437d2dcae1ab0b4 Mon Sep 17 00:00:00 2001 From: Harald Welte Date: Thu, 3 Nov 2005 13:05:20 +0100 Subject: [NETFILTER] PPTP helper: Fix endianness bug in GRE key / CallID NAT This endianness bug slipped through while changing the 'gre.key' field in the conntrack tuple from 32bit to 16bit. None of my tests caught the problem, since the linux pptp client always has '0' as call id / gre key. Only windows clients actually trigger the bug. 
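You can work out by hand why only Windows clients triggered the bug. `gre.key` is now a 16-bit value already in network byte order. On a little-endian host, the old `htons(ntohl(key))` widens it to 32 bits, and `ntohl` swaps it into the upper half. The truncating `htons` then yields 0 for every key. A call ID of 0 passes through unchanged, which is why the Linux client (always 0) never exposed the problem. On a big-endian host, both conversions are no-ops. The sketch below models a little-endian host with explicit byte swaps. The names `swap16`, `swap32`, `old_call_id_le` and `new_call_id` are hypothetical:

```c
#include <stdint.h>

/* On a little-endian host htons/ntohs and htonl/ntohl are exactly
 * these byte swaps; on a big-endian host they do nothing. */
static uint16_t swap16(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}

static uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Old code on a little-endian host: htons(ntohl(key)) applied to a
 * 16-bit key that is already in network byte order. ntohl moves the
 * two bytes into the upper half, and the 16-bit truncation drops them. */
static uint16_t old_call_id_le(uint16_t key_net)
{
    uint32_t widened = swap32((uint32_t)key_net);  /* ntohl */
    return swap16((uint16_t)widened);              /* htons, truncated */
}

/* Fixed code: the key is already in network order, so copy it as-is. */
static uint16_t new_call_id(uint16_t key_net)
{
    return key_net;
}
```

For call ID 0x1234, the value held in memory on a little-endian host reads as 0x3412. The old path turns it into 0, while the fixed path preserves it.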
Signed-off-by: Harald Welte Signed-off-by: Arnaldo Carvalho de Melo --- net/ipv4/netfilter/ip_nat_proto_gre.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/ipv4/netfilter/ip_nat_proto_gre.c b/net/ipv4/netfilter/ip_nat_proto_gre.c index 7c128540167..f7cad7cf1ae 100644 --- a/net/ipv4/netfilter/ip_nat_proto_gre.c +++ b/net/ipv4/netfilter/ip_nat_proto_gre.c @@ -139,8 +139,8 @@ gre_manip_pkt(struct sk_buff **pskb, break; case GRE_VERSION_PPTP: DEBUGP("call_id -> 0x%04x\n", - ntohl(tuple->dst.u.gre.key)); - pgreh->call_id = htons(ntohl(tuple->dst.u.gre.key)); + ntohs(tuple->dst.u.gre.key)); + pgreh->call_id = tuple->dst.u.gre.key; break; default: DEBUGP("can't nat unknown GRE version\n"); -- cgit v1.2.3-18-g5258 From d2a7bb7141a1fac7b11523538b2d2407e928baeb Mon Sep 17 00:00:00 2001 From: Harald Welte Date: Thu, 3 Nov 2005 20:17:51 +0100 Subject: [NETFILTER] NAT: Fix module refcount dropping too far The unknown protocol is used as a fallback when a protocol isn't known. Hence we cannot handle it failing, so don't set ".me". It's OK, since we only grab a reference from within the same module (iptable_nat.ko), so we never take the module refcount from 0 to 1. Also, remove the "protocol is NULL" test: it's never NULL. 
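This reasoning relies on `try_module_get(NULL)` always succeeding without counting anything, which is how the kernel treats an owner that is not a separately loadable module. Below is a simplified userspace model of the lookup-with-fallback. The types and functions (`struct module_sketch`, `try_get`, `struct proto_sketch`, `proto_find_get`) are hypothetical stand-ins, not the kernel API:

```c
#include <stddef.h>

/* Hypothetical, simplified model of a loadable module. */
struct module_sketch {
    int live;      /* 0 once the module has started unloading */
    int refcnt;
};

/* Models try_module_get(): a NULL owner always succeeds and is not
 * counted; a module that is going away refuses new references. */
static int try_get(struct module_sketch *m)
{
    if (m == NULL)
        return 1;
    if (!m->live)
        return 0;
    m->refcnt++;
    return 1;
}

struct proto_sketch {
    const char *name;
    struct module_sketch *me;   /* owning module, or NULL */
};

/* Fallback protocol: .me is deliberately left NULL, so taking a
 * reference to it can never fail. */
static struct proto_sketch unknown_proto = { "unknown", NULL };

/* Models the fixed ip_nat_proto_find_get(): the lookup never yields
 * NULL, and a refused reference falls back to unknown_proto. */
static struct proto_sketch *proto_find_get(struct proto_sketch *p)
{
    if (!try_get(p->me))
        p = &unknown_proto;
    return p;
}
```

Because the fallback has no owner, choosing it never touches any refcount. The design choice is that the fallback path has no failure mode of its own.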
Signed-off-by: Rusty Rusty Signed-off-by: Harald Welte Signed-off-by: Arnaldo Carvalho de Melo --- net/ipv4/netfilter/ip_nat_core.c | 6 ++---- net/ipv4/netfilter/ip_nat_proto_unknown.c | 2 +- 2 files changed, 3 insertions(+), 5 deletions(-) (limited to 'net') diff --git a/net/ipv4/netfilter/ip_nat_core.c b/net/ipv4/netfilter/ip_nat_core.c index c5e3abd2467..762f4d93936 100644 --- a/net/ipv4/netfilter/ip_nat_core.c +++ b/net/ipv4/netfilter/ip_nat_core.c @@ -66,10 +66,8 @@ ip_nat_proto_find_get(u_int8_t protonum) * removed until we've grabbed the reference */ preempt_disable(); p = __ip_nat_proto_find(protonum); - if (p) { - if (!try_module_get(p->me)) - p = &ip_nat_unknown_protocol; - } + if (!try_module_get(p->me)) + p = &ip_nat_unknown_protocol; preempt_enable(); return p; diff --git a/net/ipv4/netfilter/ip_nat_proto_unknown.c b/net/ipv4/netfilter/ip_nat_proto_unknown.c index 99bbef56f84..f0099a646a0 100644 --- a/net/ipv4/netfilter/ip_nat_proto_unknown.c +++ b/net/ipv4/netfilter/ip_nat_proto_unknown.c @@ -62,7 +62,7 @@ unknown_print_range(char *buffer, const struct ip_nat_range *range) struct ip_nat_protocol ip_nat_unknown_protocol = { .name = "unknown", - .me = THIS_MODULE, + /* .me isn't set: getting a ref to this cannot fail. */ .manip_pkt = unknown_manip_pkt, .in_range = unknown_in_range, .unique_tuple = unknown_unique_tuple, -- cgit v1.2.3-18-g5258 From 0f81eb4db4f1cc560318b6e7762a7a1d7d8c7095 Mon Sep 17 00:00:00 2001 From: Harald Welte Date: Thu, 3 Nov 2005 19:05:37 +0100 Subject: [NETFILTER]: Fix double free after netlink_unicast() in ctnetlink It's not necessary to free skb if netlink_unicast() failed. 
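The fix rests on an ownership rule: `netlink_unicast()` consumes the skb whether or not delivery succeeds. The caller may therefore free it only on paths that fail before the hand-off. Below is a minimal model of that rule. It uses hypothetical names (`struct buf`, `buf_free`, `send_consume`, `reply`) and a free counter, so a double free would show up as an extra count:

```c
#include <stdlib.h>

static int frees;   /* counts releases so a double free would show up */

struct buf { int len; };

static void buf_free(struct buf *b)
{
    frees++;
    free(b);
}

/* Models the netlink_unicast() ownership rule: the buffer is consumed
 * on success and on failure alike. */
static int send_consume(struct buf *b, int fail)
{
    buf_free(b);
    return fail ? -1 : 0;
}

/* Mirrors the fixed control flow: free only when we fail before the
 * hand-off; after send_consume() the buffer is no longer ours. */
static int reply(int fill_fails, int send_fails)
{
    struct buf *b = malloc(sizeof(*b));

    if (b == NULL)
        return -1;
    b->len = 0;
    if (fill_fails)
        goto out_free;
    return send_consume(b, send_fails);
out_free:
    buf_free(b);
    return -1;
}
```

Every call releases the buffer exactly once, whichever path it takes.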
Signed-off-by: Yasuyuki Kozakai Signed-off-by: Harald Welte Signed-off-by: Arnaldo Carvalho de Melo --- net/ipv4/netfilter/ip_conntrack_netlink.c | 19 +++++++------------ 1 file changed, 7 insertions(+), 12 deletions(-) (limited to 'net') diff --git a/net/ipv4/netfilter/ip_conntrack_netlink.c b/net/ipv4/netfilter/ip_conntrack_netlink.c index 166e6069f12..82a65043a8e 100644 --- a/net/ipv4/netfilter/ip_conntrack_netlink.c +++ b/net/ipv4/netfilter/ip_conntrack_netlink.c @@ -815,7 +815,7 @@ ctnetlink_get_conntrack(struct sock *ctnl, struct sk_buff *skb, IPCTNL_MSG_CT_NEW, 1, ct); ip_conntrack_put(ct); if (err <= 0) - goto out; + goto free; err = netlink_unicast(ctnl, skb2, NETLINK_CB(skb).pid, MSG_DONTWAIT); if (err < 0) @@ -824,9 +824,9 @@ ctnetlink_get_conntrack(struct sock *ctnl, struct sk_buff *skb, DEBUGP("leaving\n"); return 0; +free: + kfree_skb(skb2); out: - if (skb2) - kfree_skb(skb2); return -1; } @@ -1322,21 +1322,16 @@ ctnetlink_get_expect(struct sock *ctnl, struct sk_buff *skb, nlh->nlmsg_seq, IPCTNL_MSG_EXP_NEW, 1, exp); if (err <= 0) - goto out; + goto free; ip_conntrack_expect_put(exp); - err = netlink_unicast(ctnl, skb2, NETLINK_CB(skb).pid, MSG_DONTWAIT); - if (err < 0) - goto free; - - return err; + return netlink_unicast(ctnl, skb2, NETLINK_CB(skb).pid, MSG_DONTWAIT); +free: + kfree_skb(skb2); out: ip_conntrack_expect_put(exp); -free: - if (skb2) - kfree_skb(skb2); return err; } -- cgit v1.2.3-18-g5258 From 10dfdc69ea07d5a6c24406755f5e8de95a1b8901 Mon Sep 17 00:00:00 2001 From: Harald Welte Date: Thu, 3 Nov 2005 19:20:07 +0100 Subject: [NETFILTER] nfnetlink: Use kzalloc This is a cleanup patch: kzalloc can be used in a couple of cases. 
Signed-off-by: Samir Bellabes Signed-off-by: Harald Welte Signed-off-by: Arnaldo Carvalho de Melo --- net/netfilter/nfnetlink_log.c | 6 ++---- net/netfilter/nfnetlink_queue.c | 6 ++---- 2 files changed, 4 insertions(+), 8 deletions(-) (limited to 'net') diff --git a/net/netfilter/nfnetlink_log.c
b/net/netfilter/nfnetlink_log.c index efcd10f996b..d194676f365 100644 --- a/net/netfilter/nfnetlink_log.c +++ b/net/netfilter/nfnetlink_log.c @@ -146,11 +146,10 @@ instance_create(u_int16_t group_num, int pid) goto out_unlock; } - inst = kmalloc(sizeof(*inst), GFP_ATOMIC); + inst = kzalloc(sizeof(*inst), GFP_ATOMIC); if (!inst) goto out_unlock; - memset(inst, 0, sizeof(*inst)); INIT_HLIST_NODE(&inst->hlist); inst->lock = SPIN_LOCK_UNLOCKED; /* needs to be two, since we _put() after creation */ @@ -962,10 +961,9 @@ static int nful_open(struct inode *inode, struct file *file) struct iter_state *is; int ret; - is = kmalloc(sizeof(*is), GFP_KERNEL); + is = kzalloc(sizeof(*is), GFP_KERNEL); if (!is) return -ENOMEM; - memset(is, 0, sizeof(*is)); ret = seq_open(file, &nful_seq_ops); if (ret < 0) goto out_free; diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c index eaa44c49567..f065a6c9495 100644 --- a/net/netfilter/nfnetlink_queue.c +++ b/net/netfilter/nfnetlink_queue.c @@ -136,11 +136,10 @@ instance_create(u_int16_t queue_num, int pid) goto out_unlock; } - inst = kmalloc(sizeof(*inst), GFP_ATOMIC); + inst = kzalloc(sizeof(*inst), GFP_ATOMIC); if (!inst) goto out_unlock; - memset(inst, 0, sizeof(*inst)); inst->queue_num = queue_num; inst->peer_pid = pid; inst->queue_maxlen = NFQNL_QMAX_DEFAULT; @@ -1036,10 +1035,9 @@ static int nfqnl_open(struct inode *inode, struct file *file) struct iter_state *is; int ret; - is = kmalloc(sizeof(*is), GFP_KERNEL); + is = kzalloc(sizeof(*is), GFP_KERNEL); if (!is) return -ENOMEM; - memset(is, 0, sizeof(*is)); ret = seq_open(file, &nfqnl_seq_ops); if (ret < 0) goto out_free; -- cgit v1.2.3-18-g5258 From 433a4d3b5456dec5bcca5a0f236bf622da1267b3 Mon Sep 17 00:00:00 2001 From: Harald Welte Date: Thu, 3 Nov 2005 19:25:56 +0100 Subject: [NETFILTER]: CONNMARK target needs ip_conntrack There's a missing dependency from the CONNMARK target to ip_conntrack. 
Signed-off-by: Pablo Neira Ayuso Signed-off-by: Harald Welte Signed-off-by: Arnaldo Carvalho de Melo --- net/ipv4/netfilter/ipt_CONNMARK.c | 1 + 1 file changed, 1 insertion(+) (limited to 'net') diff --git a/net/ipv4/netfilter/ipt_CONNMARK.c b/net/ipv4/netfilter/ipt_CONNMARK.c index 13463802133..05d66ab5942 100644 --- a/net/ipv4/netfilter/ipt_CONNMARK.c +++ b/net/ipv4/netfilter/ipt_CONNMARK.c @@ -109,6 +109,7 @@ static struct ipt_target ipt_connmark_reg = { static int __init init(void) { + need_ip_conntrack(); return ipt_register_target(&ipt_connmark_reg); } -- cgit v1.2.3-18-g5258 From 1758ee0ea26561943813c5f5a7b27272f2cbc4cf Mon Sep 17 00:00:00 2001 From: Harald Welte Date: Thu, 3 Nov 2005 20:03:24 +0100 Subject: [NETFILTER] nf_queue: Fix Ooops when no queue handler registered With the new nf_queue generalization in 2.6.14, we've introduced a bug that causes an oops as soon as a packet is queued but no queue handler is registered. This patch fixes it. Signed-off-by: Harald Welte Signed-off-by: Arnaldo Carvalho de Melo --- net/netfilter/nf_queue.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c index d10d552d9c4..d3a4f30a7f2 100644 --- a/net/netfilter/nf_queue.c +++ b/net/netfilter/nf_queue.c @@ -117,7 +117,7 @@ int nf_queue(struct sk_buff **skb, /* QUEUE == DROP if noone is waiting, to be safe. */ read_lock(&queue_handler_lock); - if (!queue_handler[pf]->outfn) { + if (!queue_handler[pf] || !queue_handler[pf]->outfn) { read_unlock(&queue_handler_lock); kfree_skb(*skb); return 1; -- cgit v1.2.3-18-g5258 From 07aaa11540828f4482c09e1a936a1f63cdb9fc9d Mon Sep 17 00:00:00 2001 From: Stephen Hemminger Date: Thu, 3 Nov 2005 13:43:07 -0800 Subject: [NETEM]: use PSCHED_LESS Convert netem to use PSCHED_LESS and warn if requeue fails. 
With some of the psched clock sources, the subtraction doesn't always work right without wrapping.
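The message does not say which clock sources misbehave. The following is therefore only a generic illustration, with hypothetical names (`usec_t`, `due_by_diff`, `due_by_compare`), not the psched macros. Deciding whether a packet is due from a time difference forced into a 32-bit signed quantity gives the wrong answer once the gap reaches 2^31 microseconds. Comparing the two timestamps directly does not have this problem:

```c
#include <stdint.h>

typedef uint64_t usec_t;   /* hypothetical monotonically increasing clock */

/* Difference-based check: the gap is truncated to 32 bits and read as
 * a signed delay, where "delay <= 0" means the packet is due. */
static int due_by_diff(usec_t time_to_send, usec_t now)
{
    uint32_t d = (uint32_t)(time_to_send - now);  /* wraps modulo 2^32 */

    return d == 0 || d >= 0x80000000u;  /* zero, or "negative" top bit */
}

/* Comparison-based check, in the spirit of PSCHED_TLESS: order the
 * timestamps directly and compute a delay only for future packets. */
static int due_by_compare(usec_t time_to_send, usec_t now)
{
    return time_to_send < now;
}
```

In this sketch, the difference-based check reports a packet scheduled about 36 minutes ahead as already due. The comparison-based check correctly holds it back.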
Signed-off-by: Stephen Hemminger Signed-off-by: Arnaldo Carvalho de Melo --- net/sched/sch_netem.c | 34 ++++++++++++++++++++++------------ 1 file changed, 22 insertions(+), 12 deletions(-) (limited to 'net') diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c index bb9bf8d5003..d871fe7f81a 100644 --- a/net/sched/sch_netem.c +++ b/net/sched/sch_netem.c @@ -185,10 +185,13 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch) || q->counter < q->gap /* inside last reordering gap */ || q->reorder < get_crandom(&q->reorder_cor)) { psched_time_t now; + psched_tdiff_t delay; + + delay = tabledist(q->latency, q->jitter, + &q->delay_cor, q->delay_dist); + PSCHED_GET_TIME(now); - PSCHED_TADD2(now, tabledist(q->latency, q->jitter, - &q->delay_cor, q->delay_dist), - cb->time_to_send); + PSCHED_TADD2(now, delay, cb->time_to_send); ++q->counter; ret = q->qdisc->enqueue(skb, q->qdisc); } else { @@ -248,24 +251,31 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch) const struct netem_skb_cb *cb = (const struct netem_skb_cb *)skb->cb; psched_time_t now; - long delay; /* if more time remaining? 
*/ PSCHED_GET_TIME(now); - delay = PSCHED_US2JIFFIE(PSCHED_TDIFF(cb->time_to_send, now)); - pr_debug("netem_run: skb=%p delay=%ld\n", skb, delay); - if (delay <= 0) { + + if (PSCHED_TLESS(cb->time_to_send, now)) { pr_debug("netem_dequeue: return skb=%p\n", skb); sch->q.qlen--; sch->flags &= ~TCQ_F_THROTTLED; return skb; - } + } else { + psched_tdiff_t delay = PSCHED_TDIFF(cb->time_to_send, now); - mod_timer(&q->timer, jiffies + delay); - sch->flags |= TCQ_F_THROTTLED; + if (q->qdisc->ops->requeue(skb, q->qdisc) != NET_XMIT_SUCCESS) { + sch->qstats.drops++; - if (q->qdisc->ops->requeue(skb, q->qdisc) != 0) - sch->qstats.drops++; + /* After this qlen is confused */ + printk(KERN_ERR "netem: queue discpline %s could not requeue\n", + q->qdisc->ops->id); + + sch->q.qlen--; + } + + mod_timer(&q->timer, jiffies + PSCHED_US2JIFFIE(delay)); + sch->flags |= TCQ_F_THROTTLED; + } } return NULL; -- cgit v1.2.3-18-g5258 From 6b31b28a441c9ba33889f88ac1d9451ed9532ada Mon Sep 17 00:00:00 2001 From: Thomas Graf Date: Sat, 5 Nov 2005 21:14:05 +0100 Subject: [PKT_SCHED]: RED: Use new generic red interface Simplifies code a lot by separating the red algorithm and the queueing logic. We now differentiate between probability marks and forced marks but sum them together again to not break backwards compatibility. Signed-off-by: Thomas Graf Signed-off-by: Arnaldo Carvalho de Melo --- net/sched/sch_red.c | 321 ++++++++++++---------------------------------------- 1 file changed, 74 insertions(+), 247 deletions(-) (limited to 'net') diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c index 7845d045eec..0dabcc9091b 100644 --- a/net/sched/sch_red.c +++ b/net/sched/sch_red.c @@ -41,44 +41,10 @@ #include #include #include +#include -/* Random Early Detection (RED) algorithm. - ======================================= - - Source: Sally Floyd and Van Jacobson, "Random Early Detection Gateways - for Congestion Avoidance", 1993, IEEE/ACM Transactions on Networking. 
- - This file codes a "divisionless" version of RED algorithm - as written down in Fig.17 of the paper. - -Short description. ------------------- - - When a new packet arrives we calculate the average queue length: - - avg = (1-W)*avg + W*current_queue_len, - - W is the filter time constant (chosen as 2^(-Wlog)), it controls - the inertia of the algorithm. To allow larger bursts, W should be - decreased. - - if (avg > th_max) -> packet marked (dropped). - if (avg < th_min) -> packet passes. - if (th_min < avg < th_max) we calculate probability: - - Pb = max_P * (avg - th_min)/(th_max-th_min) - - and mark (drop) packet with this probability. - Pb changes from 0 (at avg==th_min) to max_P (avg==th_max). - max_P should be small (not 1), usually 0.01..0.02 is good value. - - max_P is chosen as a number, so that max_P/(th_max-th_min) - is a negative power of two in order arithmetics to contain - only shifts. - - - Parameters, settable by user: +/* Parameters, settable by user: ----------------------------- limit - bytes (must be > qth_max + burst) @@ -89,92 +55,19 @@ Short description. arbitrarily high (well, less than ram size) Really, this limit will never be reached if RED works correctly. - - qth_min - bytes (should be < qth_max/2) - qth_max - bytes (should be at least 2*qth_min and less limit) - Wlog - bits (<32) log(1/W). - Plog - bits (<32) - - Plog is related to max_P by formula: - - max_P = (qth_max-qth_min)/2^Plog; - - F.e. if qth_max=128K and qth_min=32K, then Plog=22 - corresponds to max_P=0.02 - - Scell_log - Stab - - Lookup table for log((1-W)^(t/t_ave). - - -NOTES: - -Upper bound on W. ------------------ - - If you want to allow bursts of L packets of size S, - you should choose W: - - L + 1 - th_min/S < (1-(1-W)^L)/W - - th_min/S = 32 th_min/S = 4 - - log(W) L - -1 33 - -2 35 - -3 39 - -4 46 - -5 57 - -6 75 - -7 101 - -8 135 - -9 190 - etc. 
*/ struct red_sched_data { -/* Parameters */ - u32 limit; /* HARD maximal queue length */ - u32 qth_min; /* Min average length threshold: A scaled */ - u32 qth_max; /* Max average length threshold: A scaled */ - u32 Rmask; - u32 Scell_max; - unsigned char flags; - char Wlog; /* log(W) */ - char Plog; /* random number bits */ - char Scell_log; - u8 Stab[256]; - -/* Variables */ - unsigned long qave; /* Average queue length: A scaled */ - int qcount; /* Packets since last random number generation */ - u32 qR; /* Cached random number */ - - psched_time_t qidlestart; /* Start of idle period */ - struct tc_red_xstats st; + u32 limit; /* HARD maximal queue length */ + unsigned char flags; + struct red_parms parms; + struct red_stats stats; }; -static int red_ecn_mark(struct sk_buff *skb) +static inline int red_use_ecn(struct red_sched_data *q) { - if (skb->nh.raw + 20 > skb->tail) - return 0; - - switch (skb->protocol) { - case __constant_htons(ETH_P_IP): - if (INET_ECN_is_not_ect(skb->nh.iph->tos)) - return 0; - IP_ECN_set_ce(skb->nh.iph); - return 1; - case __constant_htons(ETH_P_IPV6): - if (INET_ECN_is_not_ect(ipv6_get_dsfield(skb->nh.ipv6h))) - return 0; - IP6_ECN_set_ce(skb->nh.ipv6h); - return 1; - default: - return 0; - } + return q->flags & TC_RED_ECN; } static int @@ -182,119 +75,50 @@ red_enqueue(struct sk_buff *skb, struct Qdisc* sch) { struct red_sched_data *q = qdisc_priv(sch); - psched_time_t now; + q->parms.qavg = red_calc_qavg(&q->parms, sch->qstats.backlog); - if (!PSCHED_IS_PASTPERFECT(q->qidlestart)) { - long us_idle; - int shift; + if (red_is_idling(&q->parms)) + red_end_of_idle_period(&q->parms); - PSCHED_GET_TIME(now); - us_idle = PSCHED_TDIFF_SAFE(now, q->qidlestart, q->Scell_max); - PSCHED_SET_PASTPERFECT(q->qidlestart); + switch (red_action(&q->parms, q->parms.qavg)) { + case RED_DONT_MARK: + break; -/* - The problem: ideally, average length queue recalcultion should - be done over constant clock intervals. 
This is too expensive, so that - the calculation is driven by outgoing packets. - When the queue is idle we have to model this clock by hand. - - SF+VJ proposed to "generate" m = idletime/(average_pkt_size/bandwidth) - dummy packets as a burst after idle time, i.e. - - q->qave *= (1-W)^m - - This is an apparently overcomplicated solution (f.e. we have to precompute - a table to make this calculation in reasonable time) - I believe that a simpler model may be used here, - but it is field for experiments. -*/ - shift = q->Stab[us_idle>>q->Scell_log]; - - if (shift) { - q->qave >>= shift; - } else { - /* Approximate initial part of exponent - with linear function: - (1-W)^m ~= 1-mW + ... - - Seems, it is the best solution to - problem of too coarce exponent tabulation. - */ - - us_idle = (q->qave * us_idle)>>q->Scell_log; - if (us_idle < q->qave/2) - q->qave -= us_idle; - else - q->qave >>= 1; - } - } else { - q->qave += sch->qstats.backlog - (q->qave >> q->Wlog); - /* NOTE: - q->qave is fixed point number with point at Wlog. 
- The formulae above is equvalent to floating point - version: - - qave = qave*(1-W) + sch->qstats.backlog*W; - --ANK (980924) - */ - } + case RED_PROB_MARK: + sch->qstats.overlimits++; + if (!red_use_ecn(q) || !INET_ECN_set_ce(skb)) { + q->stats.prob_drop++; + goto congestion_drop; + } - if (q->qave < q->qth_min) { - q->qcount = -1; -enqueue: - if (sch->qstats.backlog + skb->len <= q->limit) { - __skb_queue_tail(&sch->q, skb); - sch->qstats.backlog += skb->len; - sch->bstats.bytes += skb->len; - sch->bstats.packets++; - return NET_XMIT_SUCCESS; - } else { - q->st.pdrop++; - } - kfree_skb(skb); - sch->qstats.drops++; - return NET_XMIT_DROP; - } - if (q->qave >= q->qth_max) { - q->qcount = -1; - sch->qstats.overlimits++; -mark: - if (!(q->flags&TC_RED_ECN) || !red_ecn_mark(skb)) { - q->st.early++; - goto drop; - } - q->st.marked++; - goto enqueue; + q->stats.prob_mark++; + break; + + case RED_HARD_MARK: + sch->qstats.overlimits++; + if (!red_use_ecn(q) || !INET_ECN_set_ce(skb)) { + q->stats.forced_drop++; + goto congestion_drop; + } + + q->stats.forced_mark++; + break; } - if (++q->qcount) { - /* The formula used below causes questions. - - OK. qR is random number in the interval 0..Rmask - i.e. 0..(2^Plog). If we used floating point - arithmetics, it would be: (2^Plog)*rnd_num, - where rnd_num is less 1. - - Taking into account, that qave have fixed - point at Wlog, and Plog is related to max_P by - max_P = (qth_max-qth_min)/2^Plog; two lines - below have the following floating point equivalent: - - max_P*(qave - qth_min)/(qth_max-qth_min) < rnd/qcount - - Any questions? 
--ANK (980924) - */ - if (((q->qave - q->qth_min)>>q->Wlog)*q->qcount < q->qR) - goto enqueue; - q->qcount = 0; - q->qR = net_random()&q->Rmask; - sch->qstats.overlimits++; - goto mark; + if (sch->qstats.backlog + skb->len <= q->limit) { + __skb_queue_tail(&sch->q, skb); + sch->qstats.backlog += skb->len; + sch->bstats.bytes += skb->len; + sch->bstats.packets++; + return NET_XMIT_SUCCESS; } - q->qR = net_random()&q->Rmask; - goto enqueue; -drop: + q->stats.pdrop++; + kfree_skb(skb); + sch->qstats.drops++; + return NET_XMIT_DROP; + +congestion_drop: kfree_skb(skb); sch->qstats.drops++; return NET_XMIT_CN; @@ -305,7 +129,8 @@ red_requeue(struct sk_buff *skb, struct Qdisc* sch) { struct red_sched_data *q = qdisc_priv(sch); - PSCHED_SET_PASTPERFECT(q->qidlestart); + if (red_is_idling(&q->parms)) + red_end_of_idle_period(&q->parms); __skb_queue_head(&sch->q, skb); sch->qstats.backlog += skb->len; @@ -324,7 +149,8 @@ red_dequeue(struct Qdisc* sch) sch->qstats.backlog -= skb->len; return skb; } - PSCHED_GET_TIME(q->qidlestart); + + red_start_of_idle_period(&q->parms); return NULL; } @@ -338,11 +164,12 @@ static unsigned int red_drop(struct Qdisc* sch) unsigned int len = skb->len; sch->qstats.backlog -= len; sch->qstats.drops++; - q->st.other++; + q->stats.other++; kfree_skb(skb); return len; } - PSCHED_GET_TIME(q->qidlestart); + + red_start_of_idle_period(&q->parms); return 0; } @@ -352,9 +179,7 @@ static void red_reset(struct Qdisc* sch) __skb_queue_purge(&sch->q); sch->qstats.backlog = 0; - PSCHED_SET_PASTPERFECT(q->qidlestart); - q->qave = 0; - q->qcount = -1; + red_restart(&q->parms); } static int red_change(struct Qdisc *sch, struct rtattr *opt) @@ -374,19 +199,14 @@ static int red_change(struct Qdisc *sch, struct rtattr *opt) sch_tree_lock(sch); q->flags = ctl->flags; - q->Wlog = ctl->Wlog; - q->Plog = ctl->Plog; - q->Rmask = ctl->Plog < 32 ? 
((1<Plog) - 1) : ~0UL; - q->Scell_log = ctl->Scell_log; - q->Scell_max = (255<Scell_log); - q->qth_min = ctl->qth_min<Wlog; - q->qth_max = ctl->qth_max<Wlog; q->limit = ctl->limit; - memcpy(q->Stab, RTA_DATA(tb[TCA_RED_STAB-1]), 256); - q->qcount = -1; + red_set_parms(&q->parms, ctl->qth_min, ctl->qth_max, ctl->Wlog, + ctl->Plog, ctl->Scell_log, + RTA_DATA(tb[TCA_RED_STAB-1])); + if (skb_queue_empty(&sch->q)) - PSCHED_SET_PASTPERFECT(q->qidlestart); + red_end_of_idle_period(&q->parms); sch_tree_unlock(sch); return 0; } @@ -401,17 +221,18 @@ static int red_dump(struct Qdisc *sch, struct sk_buff *skb) struct red_sched_data *q = qdisc_priv(sch); unsigned char *b = skb->tail; struct rtattr *rta; - struct tc_red_qopt opt; + struct tc_red_qopt opt = { + .limit = q->limit, + .flags = q->flags, + .qth_min = q->parms.qth_min >> q->parms.Wlog, + .qth_max = q->parms.qth_max >> q->parms.Wlog, + .Wlog = q->parms.Wlog, + .Plog = q->parms.Plog, + .Scell_log = q->parms.Scell_log, + }; rta = (struct rtattr*)b; RTA_PUT(skb, TCA_OPTIONS, 0, NULL); - opt.limit = q->limit; - opt.qth_min = q->qth_min>>q->Wlog; - opt.qth_max = q->qth_max>>q->Wlog; - opt.Wlog = q->Wlog; - opt.Plog = q->Plog; - opt.Scell_log = q->Scell_log; - opt.flags = q->flags; RTA_PUT(skb, TCA_RED_PARMS, sizeof(opt), &opt); rta->rta_len = skb->tail - b; @@ -425,8 +246,14 @@ rtattr_failure: static int red_dump_stats(struct Qdisc *sch, struct gnet_dump *d) { struct red_sched_data *q = qdisc_priv(sch); - - return gnet_stats_copy_app(d, &q->st, sizeof(q->st)); + struct tc_red_xstats st = { + .early = q->stats.prob_drop + q->stats.forced_drop, + .pdrop = q->stats.pdrop, + .other = q->stats.other, + .marked = q->stats.prob_mark + q->stats.forced_mark, + }; + + return gnet_stats_copy_app(d, &st, sizeof(st)); } static struct Qdisc_ops red_qdisc_ops = { -- cgit v1.2.3-18-g5258 From 9e178ff27cd9187babe86dc80ef766b722c88da6 Mon Sep 17 00:00:00 2001 From: Thomas Graf Date: Sat, 5 Nov 2005 21:14:06 +0100 Subject: [PKT_SCHED]: RED: 
Use generic queue management interface Signed-off-by: Thomas Graf Signed-off-by: Arnaldo Carvalho de Melo --- net/sched/sch_red.c | 42 +++++++++++++----------------------------- 1 file changed, 13 insertions(+), 29 deletions(-) (limited to 'net') diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c index 0dabcc9091b..d5e934c33f9 100644 --- a/net/sched/sch_red.c +++ b/net/sched/sch_red.c @@ -105,22 +105,14 @@ red_enqueue(struct sk_buff *skb, struct Qdisc* sch) break; } - if (sch->qstats.backlog + skb->len <= q->limit) { - __skb_queue_tail(&sch->q, skb); - sch->qstats.backlog += skb->len; - sch->bstats.bytes += skb->len; - sch->bstats.packets++; - return NET_XMIT_SUCCESS; - } + if (sch->qstats.backlog + skb->len <= q->limit) + return qdisc_enqueue_tail(skb, sch); q->stats.pdrop++; - kfree_skb(skb); - sch->qstats.drops++; - return NET_XMIT_DROP; + return qdisc_drop(skb, sch); congestion_drop: - kfree_skb(skb); - sch->qstats.drops++; + qdisc_drop(skb, sch); return NET_XMIT_CN; } @@ -132,10 +124,7 @@ red_requeue(struct sk_buff *skb, struct Qdisc* sch) if (red_is_idling(&q->parms)) red_end_of_idle_period(&q->parms); - __skb_queue_head(&sch->q, skb); - sch->qstats.backlog += skb->len; - sch->qstats.requeues++; - return 0; + return qdisc_requeue(skb, sch); } static struct sk_buff * @@ -144,14 +133,12 @@ red_dequeue(struct Qdisc* sch) struct sk_buff *skb; struct red_sched_data *q = qdisc_priv(sch); - skb = __skb_dequeue(&sch->q); - if (skb) { - sch->qstats.backlog -= skb->len; - return skb; - } + skb = qdisc_dequeue_head(sch); - red_start_of_idle_period(&q->parms); - return NULL; + if (skb == NULL) + red_start_of_idle_period(&q->parms); + + return skb; } static unsigned int red_drop(struct Qdisc* sch) @@ -159,13 +146,11 @@ static unsigned int red_drop(struct Qdisc* sch) struct sk_buff *skb; struct red_sched_data *q = qdisc_priv(sch); - skb = __skb_dequeue_tail(&sch->q); + skb = qdisc_dequeue_tail(sch); if (skb) { unsigned int len = skb->len; - sch->qstats.backlog -= len; 
- sch->qstats.drops++; q->stats.other++; - kfree_skb(skb); + qdisc_drop(skb, sch); return len; } @@ -177,8 +162,7 @@ static void red_reset(struct Qdisc* sch) { struct red_sched_data *q = qdisc_priv(sch); - __skb_queue_purge(&sch->q); - sch->qstats.backlog = 0; + qdisc_reset_queue(sch); red_restart(&q->parms); } -- cgit v1.2.3-18-g5258 From 6a1b63d467281eb6bd64aafbbf6130a1b42c8c2e Mon Sep 17 00:00:00 2001 From: Thomas Graf Date: Sat, 5 Nov 2005 21:14:07 +0100 Subject: [PKT_SCHED]: RED: Dont start idle periods while already idling We should not interrupt and restart an idle period while idling already. Signed-off-by: Thomas Graf Signed-off-by: Arnaldo Carvalho de Melo --- net/sched/sch_red.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c index d5e934c33f9..76e8df8447d 100644 --- a/net/sched/sch_red.c +++ b/net/sched/sch_red.c @@ -135,7 +135,7 @@ red_dequeue(struct Qdisc* sch) skb = qdisc_dequeue_head(sch); - if (skb == NULL) + if (skb == NULL && !red_is_idling(&q->parms)) red_start_of_idle_period(&q->parms); return skb; @@ -154,7 +154,9 @@ static unsigned int red_drop(struct Qdisc* sch) return len; } - red_start_of_idle_period(&q->parms); + if (!red_is_idling(&q->parms)) + red_start_of_idle_period(&q->parms); + return 0; } -- cgit v1.2.3-18-g5258 From dba051f36a47989b20b248248ffef7984a2f6013 Mon Sep 17 00:00:00 2001 From: Thomas Graf Date: Sat, 5 Nov 2005 21:14:08 +0100 Subject: [PKT_SCHED]: RED: Cleanup and remove unnecessary code Removes the skb trimming code which is not needed since we never touch the skb upon failure. Removes unnecessary includes, initializers, and simplifies the code a bit. Removes Jamal's obsolete email addresses upon his own request. 
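The RED decision that these patches move behind the generic red interface is described in the comment block that the first RED patch deletes. The average is `avg = (1-W)*avg + W*backlog`, kept in fixed point at `Wlog`. Below `qth_min` nothing is marked, and at or above `qth_max` every packet is marked. In between, the marking probability rises linearly from 0 to `max_P`. Below is a minimal sketch of that decision under those formulas. The names are hypothetical, not the kernel's `red_calc_qavg()`/`red_action()` helpers. The random draw is passed in to keep the sketch deterministic, and the count-based spacing of marks (`qcount`) in the original code is omitted:

```c
/* Hypothetical names; a sketch of the formulas, not the kernel header. */
enum red_act { DONT_MARK, PROB_MARK, HARD_MARK };

struct red_sketch {
    unsigned int wlog;       /* W = 2^-wlog */
    unsigned long qave;      /* average queue length, fixed point at wlog */
    unsigned long qth_min;   /* thresholds, also scaled by 2^wlog */
    unsigned long qth_max;
    double max_p;            /* marking probability reached at qth_max */
};

/* qave += backlog - qave/2^wlog, i.e. avg = (1-W)*avg + W*backlog with
 * avg = qave/2^wlog. Unsigned wraparound in the intermediate result is
 * harmless because the final value is never negative. */
static void red_update_avg(struct red_sketch *r, unsigned long backlog)
{
    r->qave += backlog - (r->qave >> r->wlog);
}

/* rnd is a uniform draw in [0, 1). */
static enum red_act red_decide(const struct red_sketch *r, double rnd)
{
    if (r->qave < r->qth_min)
        return DONT_MARK;
    if (r->qave >= r->qth_max)
        return HARD_MARK;

    /* Pb = max_P * (avg - th_min) / (th_max - th_min) */
    double pb = r->max_p * (double)(r->qave - r->qth_min)
                         / (double)(r->qth_max - r->qth_min);
    return rnd < pb ? PROB_MARK : DONT_MARK;
}
```

With `wlog = 2` and two arrivals at a backlog of 400, the average goes from 0 to 100 and then to 0.75 * 100 + 0.25 * 400 = 175. In the patched sch_red.c, a `PROB_MARK` or `HARD_MARK` result sets ECN CE when `TC_RED_ECN` is enabled and the packet is ECN-capable, and drops the packet otherwise.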
Signed-off-by: Thomas Graf Signed-off-by: Arnaldo Carvalho de Melo --- net/sched/sch_red.c | 65 +++++++++++++++++------------------------------------ 1 file changed, 21 insertions(+), 44 deletions(-) (limited to 'net') diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c index 76e8df8447d..0d89dee751a 100644 --- a/net/sched/sch_red.c +++ b/net/sched/sch_red.c @@ -9,38 +9,19 @@ * Authors: Alexey Kuznetsov, * * Changes: - * J Hadi Salim 980914: computation fixes + * J Hadi Salim 980914: computation fixes * Alexey Makarenko 990814: qave on idle link was calculated incorrectly. - * J Hadi Salim 980816: ECN support + * J Hadi Salim 980816: ECN support */ #include #include -#include -#include -#include #include #include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include #include -#include -#include -#include -#include #include -#include #include #include -#include #include @@ -70,8 +51,7 @@ static inline int red_use_ecn(struct red_sched_data *q) return q->flags & TC_RED_ECN; } -static int -red_enqueue(struct sk_buff *skb, struct Qdisc* sch) +static int red_enqueue(struct sk_buff *skb, struct Qdisc* sch) { struct red_sched_data *q = qdisc_priv(sch); @@ -116,8 +96,7 @@ congestion_drop: return NET_XMIT_CN; } -static int -red_requeue(struct sk_buff *skb, struct Qdisc* sch) +static int red_requeue(struct sk_buff *skb, struct Qdisc* sch) { struct red_sched_data *q = qdisc_priv(sch); @@ -127,8 +106,7 @@ red_requeue(struct sk_buff *skb, struct Qdisc* sch) return qdisc_requeue(skb, sch); } -static struct sk_buff * -red_dequeue(struct Qdisc* sch) +static struct sk_buff * red_dequeue(struct Qdisc* sch) { struct sk_buff *skb; struct red_sched_data *q = qdisc_priv(sch); @@ -171,14 +149,16 @@ static void red_reset(struct Qdisc* sch) static int red_change(struct Qdisc *sch, struct rtattr *opt) { struct red_sched_data *q = qdisc_priv(sch); - struct rtattr *tb[TCA_RED_STAB]; + struct rtattr *tb[TCA_RED_MAX]; struct tc_red_qopt *ctl; - if 
(opt == NULL || - rtattr_parse_nested(tb, TCA_RED_STAB, opt) || - tb[TCA_RED_PARMS-1] == 0 || tb[TCA_RED_STAB-1] == 0 || + if (opt == NULL || rtattr_parse_nested(tb, TCA_RED_MAX, opt)) + return -EINVAL; + + if (tb[TCA_RED_PARMS-1] == NULL || RTA_PAYLOAD(tb[TCA_RED_PARMS-1]) < sizeof(*ctl) || - RTA_PAYLOAD(tb[TCA_RED_STAB-1]) < 256) + tb[TCA_RED_STAB-1] == NULL || + RTA_PAYLOAD(tb[TCA_RED_STAB-1]) < RED_STAB_SIZE) return -EINVAL; ctl = RTA_DATA(tb[TCA_RED_PARMS-1]); @@ -193,6 +173,7 @@ static int red_change(struct Qdisc *sch, struct rtattr *opt) if (skb_queue_empty(&sch->q)) red_end_of_idle_period(&q->parms); + sch_tree_unlock(sch); return 0; } @@ -205,8 +186,7 @@ static int red_init(struct Qdisc* sch, struct rtattr *opt) static int red_dump(struct Qdisc *sch, struct sk_buff *skb) { struct red_sched_data *q = qdisc_priv(sch); - unsigned char *b = skb->tail; - struct rtattr *rta; + struct rtattr *opts = NULL; struct tc_red_qopt opt = { .limit = q->limit, .flags = q->flags, @@ -217,16 +197,12 @@ static int red_dump(struct Qdisc *sch, struct sk_buff *skb) .Scell_log = q->parms.Scell_log, }; - rta = (struct rtattr*)b; - RTA_PUT(skb, TCA_OPTIONS, 0, NULL); + opts = RTA_NEST(skb, TCA_OPTIONS); RTA_PUT(skb, TCA_RED_PARMS, sizeof(opt), &opt); - rta->rta_len = skb->tail - b; - - return skb->len; + return RTA_NEST_END(skb, opts); rtattr_failure: - skb_trim(skb, b - skb->data); - return -1; + return RTA_NEST_CANCEL(skb, opts); } static int red_dump_stats(struct Qdisc *sch, struct gnet_dump *d) @@ -243,8 +219,6 @@ static int red_dump_stats(struct Qdisc *sch, struct gnet_dump *d) } static struct Qdisc_ops red_qdisc_ops = { - .next = NULL, - .cl_ops = NULL, .id = "red", .priv_size = sizeof(struct red_sched_data), .enqueue = red_enqueue, @@ -263,10 +237,13 @@ static int __init red_module_init(void) { return register_qdisc(&red_qdisc_ops); } -static void __exit red_module_exit(void) + +static void __exit red_module_exit(void) { unregister_qdisc(&red_qdisc_ops); } + 
module_init(red_module_init) module_exit(red_module_exit) + MODULE_LICENSE("GPL"); -- cgit v1.2.3-18-g5258 From dea3f62852f98670b72ad355c67bd55c9af58530 Mon Sep 17 00:00:00 2001 From: Thomas Graf Date: Sat, 5 Nov 2005 21:14:09 +0100 Subject: [PKT_SCHED]: GRED: Cleanup equalize flag and add new WRED mode detection Introduces a flags variable using bitops and transforms eqp to use it. Converts the conditions of the form (wred && rio) to (wred) since wred can only be enabled in rio mode anyway. The patch also improves WRED mode detection. The current behaviour does not allow WRED mode to be turned off again without removing the whole qdisc first. The new algorithm checks each VQ against each other looking for equal priorities every time a VQ is changed or added. The performance is poor, O(n**2), but it's used only during administrative tasks and the number of VQs is strictly limited. Signed-off-by: Thomas Graf Signed-off-by: Arnaldo Carvalho de Melo --- net/sched/sch_gred.c | 87 +++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 65 insertions(+), 22 deletions(-) (limited to 'net') diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c index 25c171c3271..4ced47bf608 100644 --- a/net/sched/sch_gred.c +++ b/net/sched/sch_gred.c @@ -91,16 +91,57 @@ struct gred_sched_data psched_time_t qidlestart; /* Start of idle period */ }; +enum { + GRED_WRED_MODE = 1, +}; + struct gred_sched { struct gred_sched_data *tab[MAX_DPs]; + unsigned long flags; u32 DPs; u32 def; u8 initd; u8 grio; - u8 eqp; }; +static inline int gred_wred_mode(struct gred_sched *table) +{ + return test_bit(GRED_WRED_MODE, &table->flags); +} + +static inline void gred_enable_wred_mode(struct gred_sched *table) +{ + __set_bit(GRED_WRED_MODE, &table->flags); +} + +static inline void gred_disable_wred_mode(struct gred_sched *table) +{ + __clear_bit(GRED_WRED_MODE, &table->flags); +} + +static inline int gred_wred_mode_check(struct Qdisc *sch) +{ + struct gred_sched *table = qdisc_priv(sch); + 
int i; + + /* Really ugly O(n^2) but shouldn't be necessary too frequent. */ + for (i = 0; i < table->DPs; i++) { + struct gred_sched_data *q = table->tab[i]; + int n; + + if (q == NULL) + continue; + + for (n = 0; n < table->DPs; n++) + if (table->tab[n] && table->tab[n] != q && + table->tab[n]->prio == q->prio) + return 1; + } + + return 0; +} + static int gred_enqueue(struct sk_buff *skb, struct Qdisc* sch) { @@ -132,7 +173,7 @@ gred_enqueue(struct sk_buff *skb, struct Qdisc* sch) "general backlog %d\n",skb->tc_index&0xf,sch->handle,q->backlog, sch->qstats.backlog); /* sum up all the qaves of prios <= to ours to get the new qave*/ - if (!t->eqp && t->grio) { + if (!gred_wred_mode(t) && t->grio) { for (i=0;iDPs;i++) { if ((!t->tab[i]) || (i==q->DP)) continue; @@ -146,7 +187,7 @@ gred_enqueue(struct sk_buff *skb, struct Qdisc* sch) q->packetsin++; q->bytesin+=skb->len; - if (t->eqp && t->grio) { + if (gred_wred_mode(t)) { qave=0; q->qave=t->tab[t->def]->qave; q->qidlestart=t->tab[t->def]->qidlestart; @@ -160,7 +201,7 @@ gred_enqueue(struct sk_buff *skb, struct Qdisc* sch) q->qave >>= q->Stab[(us_idle>>q->Scell_log)&0xFF]; } else { - if (t->eqp) { + if (gred_wred_mode(t)) { q->qave += sch->qstats.backlog - (q->qave >> q->Wlog); } else { q->qave += q->backlog - (q->qave >> q->Wlog); @@ -169,7 +210,7 @@ gred_enqueue(struct sk_buff *skb, struct Qdisc* sch) } - if (t->eqp && t->grio) + if (gred_wred_mode(t)) t->tab[t->def]->qave=q->qave; if ((q->qave+qave) < q->qth_min) { @@ -240,7 +281,7 @@ gred_dequeue(struct Qdisc* sch) q= t->tab[(skb->tc_index&0xf)]; if (q) { q->backlog -= skb->len; - if (!q->backlog && !t->eqp) + if (!q->backlog && !gred_wred_mode(t)) PSCHED_GET_TIME(q->qidlestart); } else { D2PRINTK("gred_dequeue: skb has bad tcindex %x\n",skb->tc_index&0xf); @@ -248,7 +289,7 @@ gred_dequeue(struct Qdisc* sch) return skb; } - if (t->eqp) { + if (gred_wred_mode(t)) { q= t->tab[t->def]; if (!q) D2PRINTK("no default VQ set: Results will be " @@ -276,7 +317,7 @@ 
static unsigned int gred_drop(struct Qdisc* sch) if (q) { q->backlog -= len; q->other++; - if (!q->backlog && !t->eqp) + if (!q->backlog && !gred_wred_mode(t)) PSCHED_GET_TIME(q->qidlestart); } else { D2PRINTK("gred_dequeue: skb has bad tcindex %x\n",skb->tc_index&0xf); @@ -330,7 +371,6 @@ static int gred_change(struct Qdisc *sch, struct rtattr *opt) struct tc_gred_sopt *sopt; struct rtattr *tb[TCA_GRED_STAB]; struct rtattr *tb2[TCA_GRED_DPS]; - int i; if (opt == NULL || rtattr_parse_nested(tb, TCA_GRED_STAB, opt)) return -EINVAL; @@ -344,7 +384,17 @@ static int gred_change(struct Qdisc *sch, struct rtattr *opt) sopt = RTA_DATA(tb2[TCA_GRED_DPS-1]); table->DPs=sopt->DPs; table->def=sopt->def_DP; - table->grio=sopt->grio; + + if (sopt->grio) { + table->grio = 1; + gred_disable_wred_mode(table); + if (gred_wred_mode_check(sch)) + gred_enable_wred_mode(table); + } else { + table->grio = 0; + gred_disable_wred_mode(table); + } + table->initd=0; /* probably need to clear all the table DP entries as well */ return 0; @@ -413,17 +463,10 @@ static int gred_change(struct Qdisc *sch, struct rtattr *opt) PSCHED_SET_PASTPERFECT(q->qidlestart); memcpy(q->Stab, RTA_DATA(tb[TCA_GRED_STAB-1]), 256); - if ( table->initd && table->grio) { - /* this looks ugly but it's not in the fast path */ - for (i=0;iDPs;i++) { - if ((!table->tab[i]) || (i==q->DP) ) - continue; - if (table->tab[i]->prio == q->prio ){ - /* WRED mode detected */ - table->eqp=1; - break; - } - } + if (table->grio) { + gred_disable_wred_mode(table); + if (gred_wred_mode_check(sch)) + gred_enable_wred_mode(table); } if (!table->initd) { @@ -541,7 +584,7 @@ static int gred_dump(struct Qdisc *sch, struct sk_buff *skb) dst->DP=q->DP; dst->backlog=q->backlog; if (q->qave) { - if (table->eqp && table->grio) { + if (gred_wred_mode(table)) { q->qidlestart=table->tab[table->def]->qidlestart; q->qave=table->tab[table->def]->qave; } -- cgit v1.2.3-18-g5258 From d6fd4e9667bf5e00b92e62f02d75bd6c97a7007a Mon Sep 17 00:00:00 2001 
From: Thomas Graf Date: Sat, 5 Nov 2005 21:14:10 +0100 Subject: [PKT_SCHED]: GRED: Transform grio to GRED_RIO_MODE Signed-off-by: Thomas Graf Signed-off-by: Arnaldo Carvalho de Melo --- net/sched/sch_gred.c | 36 ++++++++++++++++++++++++++++-------- 1 file changed, 28 insertions(+), 8 deletions(-) (limited to 'net') diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c index 4ced47bf608..db594b46a52 100644 --- a/net/sched/sch_gred.c +++ b/net/sched/sch_gred.c @@ -93,6 +93,7 @@ struct gred_sched_data enum { GRED_WRED_MODE = 1, + GRED_RIO_MODE, }; struct gred_sched @@ -102,7 +103,6 @@ struct gred_sched u32 DPs; u32 def; u8 initd; - u8 grio; }; static inline int gred_wred_mode(struct gred_sched *table) @@ -120,6 +120,21 @@ static inline void gred_disable_wred_mode(struct gred_sched *table) __clear_bit(GRED_WRED_MODE, &table->flags); } +static inline int gred_rio_mode(struct gred_sched *table) +{ + return test_bit(GRED_RIO_MODE, &table->flags); +} + +static inline void gred_enable_rio_mode(struct gred_sched *table) +{ + __set_bit(GRED_RIO_MODE, &table->flags); +} + +static inline void gred_disable_rio_mode(struct gred_sched *table) +{ + __clear_bit(GRED_RIO_MODE, &table->flags); +} + static inline int gred_wred_mode_check(struct Qdisc *sch) { struct gred_sched *table = qdisc_priv(sch); @@ -173,7 +188,7 @@ gred_enqueue(struct sk_buff *skb, struct Qdisc* sch) "general backlog %d\n",skb->tc_index&0xf,sch->handle,q->backlog, sch->qstats.backlog); /* sum up all the qaves of prios <= to ours to get the new qave*/ - if (!gred_wred_mode(t) && t->grio) { + if (!gred_wred_mode(t) && gred_rio_mode(t)) { for (i=0;iDPs;i++) { if ((!t->tab[i]) || (i==q->DP)) continue; @@ -386,12 +401,12 @@ static int gred_change(struct Qdisc *sch, struct rtattr *opt) table->def=sopt->def_DP; if (sopt->grio) { - table->grio = 1; + gred_enable_rio_mode(table); gred_disable_wred_mode(table); if (gred_wred_mode_check(sch)) gred_enable_wred_mode(table); } else { - table->grio = 0; + 
gred_disable_rio_mode(table); gred_disable_wred_mode(table); } @@ -423,7 +438,7 @@ static int gred_change(struct Qdisc *sch, struct rtattr *opt) } q= table->tab[ctl->DP]; - if (table->grio) { + if (gred_rio_mode(table)) { if (ctl->prio <=0) { if (table->def && table->tab[table->def]) { DPRINTK("\nGRED: DP %u does not have a prio" @@ -463,7 +478,7 @@ static int gred_change(struct Qdisc *sch, struct rtattr *opt) PSCHED_SET_PASTPERFECT(q->qidlestart); memcpy(q->Stab, RTA_DATA(tb[TCA_GRED_STAB-1]), 256); - if (table->grio) { + if (gred_rio_mode(table)) { gred_disable_wred_mode(table); if (gred_wred_mode_check(sch)) gred_enable_wred_mode(table); @@ -496,7 +511,7 @@ static int gred_change(struct Qdisc *sch, struct rtattr *opt) q->qth_min = ctl->qth_min<Wlog; q->qth_max = ctl->qth_max<Wlog; - if (table->grio) + if (gred_rio_mode(table)) q->prio=table->tab[ctl->DP]->prio; else q->prio=8; @@ -528,7 +543,12 @@ static int gred_init(struct Qdisc *sch, struct rtattr *opt) sopt = RTA_DATA(tb2[TCA_GRED_DPS-1]); table->DPs=sopt->DPs; table->def=sopt->def_DP; - table->grio=sopt->grio; + + if (sopt->grio) + gred_enable_rio_mode(table); + else + gred_disable_rio_mode(table); + table->initd=0; return 0; } -- cgit v1.2.3-18-g5258 From 05f1cc01b4d24bc5432ae7044f8209d464f2b8ec Mon Sep 17 00:00:00 2001 From: Thomas Graf Date: Sat, 5 Nov 2005 21:14:11 +0100 Subject: [PKT_SCHED]: GRED: Cleanup dumping Avoids the allocation of a buffer by appending the VQs directly to the skb and simplifies the code by using the appropriate message construction macros. 
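The dump pattern this patch switches to works like this. RTA_NEST opens a container attribute, RTA_PUT/RTA_APPEND add entries, RTA_NEST_END patches the container's length, and RTA_NEST_CANCEL trims everything back on failure. The sketch below illustrates that pattern in plain userspace C. It is not the kernel macros: struct tlv_hdr, struct msgbuf and all helper names are hypothetical. A fixed 256-byte array stands in for the skb. Entries are padded to 4 bytes on the assumption that this mirrors RTA_ALIGN().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical TLV header, loosely modelled on struct rtattr. */
struct tlv_hdr {
	uint16_t len;	/* header + payload; a nest's payload is everything inside it */
	uint16_t type;
};

/* Assumed 4-byte alignment, in the spirit of RTA_ALIGN(). */
#define TLV_ALIGN(n) (((n) + 3u) & ~(size_t)3u)

/* Fixed-size stand-in for an skb: data plus a tail offset. */
struct msgbuf {
	unsigned char data[256];
	size_t len;
};

/* Append one TLV; returns 0, or -1 if it does not fit (nothing written). */
static int tlv_put(struct msgbuf *m, uint16_t type, const void *p, uint16_t plen)
{
	struct tlv_hdr h = { (uint16_t)(sizeof(h) + plen), type };
	size_t total = TLV_ALIGN(sizeof(h) + plen);

	if (total > sizeof(m->data) - m->len)
		return -1;
	memset(m->data + m->len, 0, total);	/* zero the padding */
	memcpy(m->data + m->len, &h, sizeof(h));
	if (plen)
		memcpy(m->data + m->len + sizeof(h), p, plen);
	m->len += total;
	return 0;
}

/* Open a nest: write an empty header now, return its offset (or -1). */
static long nest_start(struct msgbuf *m, uint16_t type)
{
	long off = (long)m->len;

	return tlv_put(m, type, NULL, 0) ? -1 : off;
}

/* Close a nest: patch its length to cover everything appended since. */
static int nest_end(struct msgbuf *m, long off)
{
	struct tlv_hdr h;

	assert(off >= 0 && (size_t)off < m->len);
	memcpy(&h, m->data + off, sizeof(h));
	h.len = (uint16_t)(m->len - (size_t)off);
	memcpy(m->data + off, &h, sizeof(h));
	return (int)m->len;
}

/* Abort a nest: trim the buffer back to where the nest began. */
static int nest_cancel(struct msgbuf *m, long off)
{
	m->len = (size_t)off;
	return -1;
}

/* Shape of the new dump code: open a nest, append one entry per VQ,
 * close it, or cancel the whole nest if any entry does not fit. */
static int dump_counters(struct msgbuf *m, const uint32_t *vals, int n)
{
	long opts = nest_start(m, 1 /* hypothetical OPTIONS type */);
	int i;

	if (opts < 0)
		return -1;
	for (i = 0; i < n; i++)
		if (tlv_put(m, 2, &vals[i], sizeof(vals[i])))
			return nest_cancel(m, opts);
	return nest_end(m, opts);
}
```

On failure the whole nest disappears. This is why the new failure paths in red_dump() and gred_dump() can simply return RTA_NEST_CANCEL(). They no longer need to save the old tail pointer and call skb_trim() by hand.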
Signed-off-by: Thomas Graf Signed-off-by: Arnaldo Carvalho de Melo --- net/sched/sch_gred.c | 92 +++++++++++++++++++--------------------------------- 1 file changed, 34 insertions(+), 58 deletions(-) (limited to 'net') diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c index db594b46a52..b3f5ad73fd8 100644 --- a/net/sched/sch_gred.c +++ b/net/sched/sch_gred.c @@ -559,50 +559,44 @@ static int gred_init(struct Qdisc *sch, struct rtattr *opt) static int gred_dump(struct Qdisc *sch, struct sk_buff *skb) { - unsigned long qave; - struct rtattr *rta; - struct tc_gred_qopt *opt = NULL ; - struct tc_gred_qopt *dst; struct gred_sched *table = qdisc_priv(sch); - struct gred_sched_data *q; + struct rtattr *parms, *opts = NULL; int i; - unsigned char *b = skb->tail; - - rta = (struct rtattr*)b; - RTA_PUT(skb, TCA_OPTIONS, 0, NULL); - opt=kmalloc(sizeof(struct tc_gred_qopt)*MAX_DPs, GFP_KERNEL); - - if (opt == NULL) { - DPRINTK("gred_dump:failed to malloc for %Zd\n", - sizeof(struct tc_gred_qopt)*MAX_DPs); - goto rtattr_failure; - } + opts = RTA_NEST(skb, TCA_OPTIONS); + parms = RTA_NEST(skb, TCA_GRED_PARMS); - memset(opt, 0, (sizeof(struct tc_gred_qopt))*table->DPs); - - if (!table->initd) { - DPRINTK("NO GRED Queues setup!\n"); - } + for (i = 0; i < MAX_DPs; i++) { + struct gred_sched_data *q = table->tab[i]; + struct tc_gred_qopt opt; - for (i=0;itab[i]; + memset(&opt, 0, sizeof(opt)); if (!q) { /* hack -- fix at some point with proper message This is how we indicate to tc that there is no VQ at this DP */ - dst->DP=MAX_DPs+i; - continue; + opt.DP = MAX_DPs + i; + goto append_opt; } - dst->limit=q->limit; - dst->qth_min=q->qth_min>>q->Wlog; - dst->qth_max=q->qth_max>>q->Wlog; - dst->DP=q->DP; - dst->backlog=q->backlog; + opt.limit = q->limit; + opt.DP = q->DP; + opt.backlog = q->backlog; + opt.prio = q->prio; + opt.qth_min = q->qth_min >> q->Wlog; + opt.qth_max = q->qth_max >> q->Wlog; + opt.Wlog = q->Wlog; + opt.Plog = q->Plog; + opt.Scell_log = q->Scell_log; + 
opt.other = q->other; + opt.early = q->early; + opt.forced = q->forced; + opt.pdrop = q->pdrop; + opt.packets = q->packetsin; + opt.bytesin = q->bytesin; + if (q->qave) { if (gred_wred_mode(table)) { q->qidlestart=table->tab[table->def]->qidlestart; @@ -610,46 +604,28 @@ static int gred_dump(struct Qdisc *sch, struct sk_buff *skb) } if (!PSCHED_IS_PASTPERFECT(q->qidlestart)) { long idle; + unsigned long qave; psched_time_t now; PSCHED_GET_TIME(now); idle = PSCHED_TDIFF_SAFE(now, q->qidlestart, q->Scell_max); qave = q->qave >> q->Stab[(idle>>q->Scell_log)&0xFF]; - dst->qave = qave >> q->Wlog; + opt.qave = qave >> q->Wlog; } else { - dst->qave = q->qave >> q->Wlog; + opt.qave = q->qave >> q->Wlog; } - } else { - dst->qave = 0; } - - - dst->Wlog = q->Wlog; - dst->Plog = q->Plog; - dst->Scell_log = q->Scell_log; - dst->other = q->other; - dst->forced = q->forced; - dst->early = q->early; - dst->pdrop = q->pdrop; - dst->prio = q->prio; - dst->packets=q->packetsin; - dst->bytesin=q->bytesin; + +append_opt: + RTA_APPEND(skb, sizeof(opt), &opt); } - RTA_PUT(skb, TCA_GRED_PARMS, sizeof(struct tc_gred_qopt)*MAX_DPs, opt); - rta->rta_len = skb->tail - b; + RTA_NEST_END(skb, parms); - kfree(opt); - return skb->len; + return RTA_NEST_END(skb, opts); rtattr_failure: - if (opt) - kfree(opt); - DPRINTK("gred_dump: FAILURE!!!!\n"); - -/* also free the opt struct here */ - skb_trim(skb, b - skb->data); - return -1; + return RTA_NEST_CANCEL(skb, opts); } static void gred_destroy(struct Qdisc *sch) -- cgit v1.2.3-18-g5258 From e06368221c204d7b5f1ba37d047170f9a0dd359d Mon Sep 17 00:00:00 2001 From: Thomas Graf Date: Sat, 5 Nov 2005 21:14:12 +0100 Subject: [PKT_SCHED]: GRED: Dump table definition Signed-off-by: Thomas Graf Signed-off-by: Arnaldo Carvalho de Melo --- net/sched/sch_gred.c | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'net') diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c index b3f5ad73fd8..a1369550ce7 100644 --- a/net/sched/sch_gred.c +++ 
b/net/sched/sch_gred.c @@ -562,8 +562,14 @@ static int gred_dump(struct Qdisc *sch, struct sk_buff *skb) struct gred_sched *table = qdisc_priv(sch); struct rtattr *parms, *opts = NULL; int i; + struct tc_gred_sopt sopt = { + .DPs = table->DPs, + .def_DP = table->def, + .grio = gred_rio_mode(table), + }; opts = RTA_NEST(skb, TCA_OPTIONS); + RTA_PUT(skb, TCA_GRED_DPS, sizeof(sopt), &sopt); parms = RTA_NEST(skb, TCA_GRED_PARMS); for (i = 0; i < MAX_DPs; i++) { -- cgit v1.2.3-18-g5258 From 6639607ed9deaed9ab3a1cc588f0288891ece2ac Mon Sep 17 00:00:00 2001 From: Thomas Graf Date: Sat, 5 Nov 2005 21:14:13 +0100 Subject: [PKT_SCHED]: GRED: Use a central table definition change procedure Introduces a function gred_change_table_def() acting as a central point to change the table definition. Adds missing validations for table definition: MAX_DPs > DPs > 0 and def_DP < DPs thus fixing possible invalid memory reference oopses. Only root could do it but having a typo crashing the machine is a bit hard. Adds missing locking while changing the table definition, the operation of changing the number of DPs and removing shadowed VQs may not be interrupted by a dequeue. 
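The sketch below shows the two checks described above in userspace C: validating the table definition, and removing VQs that end up beyond the new DPs. It assumes MAX_DPS is 16, the value of MAX_DPs at the time; treat that as an assumption. struct table_def is a hypothetical stand-in for struct tc_gred_sopt, and locking (sch_tree_lock) is left out. One detail differs from the message: gred_change_table_def() rejects only DPs > MAX_DPs, so DPs == MAX_DPs is accepted. The sketch follows the code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_DPS 16	/* assumption: value of MAX_DPs */

/* Hypothetical stand-in for struct tc_gred_sopt. */
struct table_def {
	uint32_t dps;		/* number of virtual queues */
	uint32_t def_dp;	/* index of the default VQ */
};

/* Same test as gred_change_table_def(): at least one VQ, no more than
 * MAX_DPS, and a default DP that indexes inside the table. */
static int validate_table_def(const struct table_def *d)
{
	if (d->dps > MAX_DPS || d->dps == 0 || d->def_dp >= d->dps)
		return -EINVAL;
	return 0;
}

/* Free VQs at index >= dps ("shadowed" VQs); returns how many were freed. */
static int destroy_shadowed(void *tab[MAX_DPS], uint32_t dps)
{
	uint32_t i;
	int freed = 0;

	assert(dps <= MAX_DPS);
	for (i = dps; i < MAX_DPS; i++) {
		if (tab[i]) {
			free(tab[i]);
			tab[i] = NULL;
			freed++;
		}
	}
	return freed;
}
```

Without the def_dp check, a def_DP of 20 would index table->tab[] past its 16 entries. That is the kind of invalid memory reference the message refers to.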
Signed-off-by: Thomas Graf Signed-off-by: Arnaldo Carvalho de Melo --- net/sched/sch_gred.c | 113 +++++++++++++++++++++++++++------------------------ 1 file changed, 61 insertions(+), 52 deletions(-) (limited to 'net') diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c index a1369550ce7..fdc20ced52e 100644 --- a/net/sched/sch_gred.c +++ b/net/sched/sch_gred.c @@ -378,43 +378,72 @@ static void gred_reset(struct Qdisc* sch) } } -static int gred_change(struct Qdisc *sch, struct rtattr *opt) +static inline void gred_destroy_vq(struct gred_sched_data *q) +{ + kfree(q); +} + +static inline int gred_change_table_def(struct Qdisc *sch, struct rtattr *dps) { struct gred_sched *table = qdisc_priv(sch); - struct gred_sched_data *q; - struct tc_gred_qopt *ctl; struct tc_gred_sopt *sopt; - struct rtattr *tb[TCA_GRED_STAB]; - struct rtattr *tb2[TCA_GRED_DPS]; + int i; - if (opt == NULL || rtattr_parse_nested(tb, TCA_GRED_STAB, opt)) + if (dps == NULL || RTA_PAYLOAD(dps) < sizeof(*sopt)) return -EINVAL; - if (tb[TCA_GRED_PARMS-1] == 0 && tb[TCA_GRED_STAB-1] == 0) { - rtattr_parse_nested(tb2, TCA_GRED_DPS, opt); + sopt = RTA_DATA(dps); - if (tb2[TCA_GRED_DPS-1] == 0) - return -EINVAL; + if (sopt->DPs > MAX_DPs || sopt->DPs == 0 || sopt->def_DP >= sopt->DPs) + return -EINVAL; - sopt = RTA_DATA(tb2[TCA_GRED_DPS-1]); - table->DPs=sopt->DPs; - table->def=sopt->def_DP; + sch_tree_lock(sch); + table->DPs = sopt->DPs; + table->def = sopt->def_DP; - if (sopt->grio) { - gred_enable_rio_mode(table); - gred_disable_wred_mode(table); - if (gred_wred_mode_check(sch)) - gred_enable_wred_mode(table); - } else { - gred_disable_rio_mode(table); - gred_disable_wred_mode(table); - } + /* + * Every entry point to GRED is synchronized with the above code + * and the DP is checked against DPs, i.e. shadowed VQs can no + * longer be found so we can unlock right here. 
+ */ + sch_tree_unlock(sch); - table->initd=0; - /* probably need to clear all the table DP entries as well */ - return 0; - } + if (sopt->grio) { + gred_enable_rio_mode(table); + gred_disable_wred_mode(table); + if (gred_wred_mode_check(sch)) + gred_enable_wred_mode(table); + } else { + gred_disable_rio_mode(table); + gred_disable_wred_mode(table); + } + + for (i = table->DPs; i < MAX_DPs; i++) { + if (table->tab[i]) { + printk(KERN_WARNING "GRED: Warning: Destroying " + "shadowed VQ 0x%x\n", i); + gred_destroy_vq(table->tab[i]); + table->tab[i] = NULL; + } + } + table->initd = 0; + + return 0; +} + +static int gred_change(struct Qdisc *sch, struct rtattr *opt) +{ + struct gred_sched *table = qdisc_priv(sch); + struct gred_sched_data *q; + struct tc_gred_qopt *ctl; + struct rtattr *tb[TCA_GRED_STAB]; + + if (opt == NULL || rtattr_parse_nested(tb, TCA_GRED_STAB, opt)) + return -EINVAL; + + if (tb[TCA_GRED_PARMS-1] == NULL && tb[TCA_GRED_STAB-1] == NULL) + return gred_change_table_def(sch, tb[TCA_GRED_DPS-1]); if (!table->DPs || tb[TCA_GRED_PARMS-1] == 0 || tb[TCA_GRED_STAB-1] == 0 || RTA_PAYLOAD(tb[TCA_GRED_PARMS-1]) < sizeof(*ctl) || @@ -526,35 +555,15 @@ static int gred_change(struct Qdisc *sch, struct rtattr *opt) static int gred_init(struct Qdisc *sch, struct rtattr *opt) { - struct gred_sched *table = qdisc_priv(sch); - struct tc_gred_sopt *sopt; - struct rtattr *tb[TCA_GRED_STAB]; - struct rtattr *tb2[TCA_GRED_DPS]; + struct rtattr *tb[TCA_GRED_MAX]; - if (opt == NULL || rtattr_parse_nested(tb, TCA_GRED_STAB, opt)) + if (opt == NULL || rtattr_parse_nested(tb, TCA_GRED_MAX, opt)) return -EINVAL; - if (tb[TCA_GRED_PARMS-1] == 0 && tb[TCA_GRED_STAB-1] == 0) { - rtattr_parse_nested(tb2, TCA_GRED_DPS, opt); - - if (tb2[TCA_GRED_DPS-1] == 0) - return -EINVAL; - - sopt = RTA_DATA(tb2[TCA_GRED_DPS-1]); - table->DPs=sopt->DPs; - table->def=sopt->def_DP; - - if (sopt->grio) - gred_enable_rio_mode(table); - else - gred_disable_rio_mode(table); - - table->initd=0; - 
return 0; - } + if (tb[TCA_GRED_PARMS-1] || tb[TCA_GRED_STAB-1]) + return -EINVAL; - DPRINTK("\n GRED_INIT error!\n"); - return -EINVAL; + return gred_change_table_def(sch, tb[TCA_GRED_DPS-1]); } static int gred_dump(struct Qdisc *sch, struct sk_buff *skb) @@ -641,7 +650,7 @@ static void gred_destroy(struct Qdisc *sch) for (i = 0;i < table->DPs; i++) { if (table->tab[i]) - kfree(table->tab[i]); + gred_destroy_vq(table->tab[i]); } } -- cgit v1.2.3-18-g5258 From a8aaa9958eea2420e13d5a00c3fae934e0a3889e Mon Sep 17 00:00:00 2001 From: Thomas Graf Date: Sat, 5 Nov 2005 21:14:14 +0100 Subject: [PKT_SCHED]: GRED: Report out-of-bound DPs as illegal Signed-off-by: Thomas Graf Signed-off-by: Arnaldo Carvalho de Melo --- net/sched/sch_gred.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) (limited to 'net') diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c index fdc20ced52e..b04b07fcc2c 100644 --- a/net/sched/sch_gred.c +++ b/net/sched/sch_gred.c @@ -451,12 +451,9 @@ static int gred_change(struct Qdisc *sch, struct rtattr *opt) return -EINVAL; ctl = RTA_DATA(tb[TCA_GRED_PARMS-1]); - if (ctl->DP > MAX_DPs-1 ) { - /* misbehaving is punished! Put in the default drop probability */ - DPRINTK("\nGRED: DP %u not in the proper range fixed. New DP " - "set to default at %d\n",ctl->DP,table->def); - ctl->DP=table->def; - } + + if (ctl->DP >= table->DPs) + return -EINVAL; if (table->tab[ctl->DP] == NULL) { table->tab[ctl->DP]=kmalloc(sizeof(struct gred_sched_data), -- cgit v1.2.3-18-g5258 From f62d6b936df500247474c13360eb23e1b602bad0 Mon Sep 17 00:00:00 2001 From: Thomas Graf Date: Sat, 5 Nov 2005 21:14:15 +0100 Subject: [PKT_SCHED]: GRED: Use central VQ change procedure Introduces a function gred_change_vq() acting as a central point to change VQ parameters. Fixes priority inheritance in rio mode when the default DP equals 0. Adds proper locking during changes. 
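The priority fix can be shown in isolation. The minimal sketch below assumes a trimmed-down hypothetical struct vq and a MAX_DPS of 16. In RIO mode, a VQ configured with prio 0 inherits the default VQ's priority whenever that VQ exists. The old code also required table->def to be non-zero, so a default DP of 0 never passed its priority on. The second helper mirrors gred_wred_mode_check(), which gred_change() re-runs in RIO mode after every VQ change. WRED mode is on when any two configured VQs share a priority.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DPS  16			/* assumption: value of MAX_DPs */
#define DEF_PRIO (MAX_DPS / 2)		/* as GRED_DEF_PRIO in the patch */

struct vq {				/* hypothetical, trimmed-down VQ */
	uint8_t prio;
};

/* Priority for a VQ that is being (re)configured. */
static int choose_prio(int rio_mode, int requested,
		       struct vq *tab[], uint32_t def_dp)
{
	int prio = DEF_PRIO;

	assert(def_dp < MAX_DPS);
	if (rio_mode) {
		if (requested == 0) {
			/* No "def_dp != 0" test: DP 0 may be the default. */
			if (tab[def_dp] != NULL)
				prio = tab[def_dp]->prio;
		} else {
			prio = requested;
		}
	}
	return prio;
}

/* O(n^2) scan: 1 if two distinct configured VQs share a priority. */
static int wred_mode_check(struct vq *tab[], uint32_t dps)
{
	uint32_t i, n;

	for (i = 0; i < dps; i++) {
		if (tab[i] == NULL)
			continue;
		for (n = 0; n < dps; n++)
			if (tab[n] && tab[n] != tab[i] &&
			    tab[n]->prio == tab[i]->prio)
				return 1;
	}
	return 0;
}
```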
Signed-off-by: Thomas Graf Signed-off-by: Arnaldo Carvalho de Melo --- net/sched/sch_gred.c | 173 +++++++++++++++++++++++++-------------------------- 1 file changed, 84 insertions(+), 89 deletions(-) (limited to 'net') diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c index b04b07fcc2c..ca6cb271493 100644 --- a/net/sched/sch_gred.c +++ b/net/sched/sch_gred.c @@ -58,6 +58,8 @@ #define D2PRINTK(format,args...) #endif +#define GRED_DEF_PRIO (MAX_DPs / 2) + struct gred_sched_data; struct gred_sched; @@ -432,77 +434,102 @@ static inline int gred_change_table_def(struct Qdisc *sch, struct rtattr *dps) return 0; } -static int gred_change(struct Qdisc *sch, struct rtattr *opt) +static inline int gred_change_vq(struct Qdisc *sch, int dp, + struct tc_gred_qopt *ctl, int prio, u8 *stab) { struct gred_sched *table = qdisc_priv(sch); struct gred_sched_data *q; + + if (table->tab[dp] == NULL) { + table->tab[dp] = kmalloc(sizeof(*q), GFP_KERNEL); + if (table->tab[dp] == NULL) + return -ENOMEM; + memset(table->tab[dp], 0, sizeof(*q)); + } + + q = table->tab[dp]; + q->DP = dp; + q->prio = prio; + + q->Wlog = ctl->Wlog; + q->Plog = ctl->Plog; + q->limit = ctl->limit; + q->Scell_log = ctl->Scell_log; + q->Rmask = ctl->Plog < 32 ? 
((1<Plog) - 1) : ~0UL; + q->Scell_max = (255<Scell_log); + q->qth_min = ctl->qth_min<Wlog; + q->qth_max = ctl->qth_max<Wlog; + q->qave=0; + q->backlog=0; + q->qcount = -1; + q->other=0; + q->forced=0; + q->pdrop=0; + q->early=0; + + PSCHED_SET_PASTPERFECT(q->qidlestart); + memcpy(q->Stab, stab, 256); + + return 0; +} + +static int gred_change(struct Qdisc *sch, struct rtattr *opt) +{ + struct gred_sched *table = qdisc_priv(sch); struct tc_gred_qopt *ctl; - struct rtattr *tb[TCA_GRED_STAB]; + struct rtattr *tb[TCA_GRED_MAX]; + int err = -EINVAL, prio = GRED_DEF_PRIO; + u8 *stab; - if (opt == NULL || rtattr_parse_nested(tb, TCA_GRED_STAB, opt)) + if (opt == NULL || rtattr_parse_nested(tb, TCA_GRED_MAX, opt)) return -EINVAL; if (tb[TCA_GRED_PARMS-1] == NULL && tb[TCA_GRED_STAB-1] == NULL) - return gred_change_table_def(sch, tb[TCA_GRED_DPS-1]); + return gred_change_table_def(sch, opt); - if (!table->DPs || tb[TCA_GRED_PARMS-1] == 0 || tb[TCA_GRED_STAB-1] == 0 || - RTA_PAYLOAD(tb[TCA_GRED_PARMS-1]) < sizeof(*ctl) || - RTA_PAYLOAD(tb[TCA_GRED_STAB-1]) < 256) - return -EINVAL; + if (tb[TCA_GRED_PARMS-1] == NULL || + RTA_PAYLOAD(tb[TCA_GRED_PARMS-1]) < sizeof(*ctl) || + tb[TCA_GRED_STAB-1] == NULL || + RTA_PAYLOAD(tb[TCA_GRED_STAB-1]) < 256) + return -EINVAL; ctl = RTA_DATA(tb[TCA_GRED_PARMS-1]); + stab = RTA_DATA(tb[TCA_GRED_STAB-1]); if (ctl->DP >= table->DPs) - return -EINVAL; - - if (table->tab[ctl->DP] == NULL) { - table->tab[ctl->DP]=kmalloc(sizeof(struct gred_sched_data), - GFP_KERNEL); - if (NULL == table->tab[ctl->DP]) - return -ENOMEM; - memset(table->tab[ctl->DP], 0, (sizeof(struct gred_sched_data))); - } - q= table->tab[ctl->DP]; + goto errout; if (gred_rio_mode(table)) { - if (ctl->prio <=0) { - if (table->def && table->tab[table->def]) { - DPRINTK("\nGRED: DP %u does not have a prio" - "setting default to %d\n",ctl->DP, - table->tab[table->def]->prio); - q->prio=table->tab[table->def]->prio; - } else { - DPRINTK("\nGRED: DP %u does not have a prio" - " 
setting default to 8\n",ctl->DP); - q->prio=8; - } - } else { - q->prio=ctl->prio; - } - } else { - q->prio=8; + if (ctl->prio == 0) { + int def_prio = GRED_DEF_PRIO; + + if (table->tab[table->def]) + def_prio = table->tab[table->def]->prio; + + printk(KERN_DEBUG "GRED: DP %u does not have a prio " + "setting default to %d\n", ctl->DP, def_prio); + + prio = def_prio; + } else + prio = ctl->prio; } + sch_tree_lock(sch); - q->DP=ctl->DP; - q->Wlog = ctl->Wlog; - q->Plog = ctl->Plog; - q->limit = ctl->limit; - q->Scell_log = ctl->Scell_log; - q->Rmask = ctl->Plog < 32 ? ((1<Plog) - 1) : ~0UL; - q->Scell_max = (255<Scell_log); - q->qth_min = ctl->qth_min<Wlog; - q->qth_max = ctl->qth_max<Wlog; - q->qave=0; - q->backlog=0; - q->qcount = -1; - q->other=0; - q->forced=0; - q->pdrop=0; - q->early=0; + err = gred_change_vq(sch, ctl->DP, ctl, prio, stab); + if (err < 0) + goto errout_locked; - PSCHED_SET_PASTPERFECT(q->qidlestart); - memcpy(q->Stab, RTA_DATA(tb[TCA_GRED_STAB-1]), 256); + if (table->tab[table->def] == NULL) { + if (gred_rio_mode(table)) + prio = table->tab[ctl->DP]->prio; + + err = gred_change_vq(sch, table->def, ctl, prio, stab); + if (err < 0) + goto errout_locked; + } + + table->initd = 1; if (gred_rio_mode(table)) { gred_disable_wred_mode(table); @@ -510,44 +537,12 @@ static int gred_change(struct Qdisc *sch, struct rtattr *opt) gred_enable_wred_mode(table); } - if (!table->initd) { - table->initd=1; - /* - the first entry also goes into the default until - over-written - */ - - if (table->tab[table->def] == NULL) { - table->tab[table->def]= - kmalloc(sizeof(struct gred_sched_data), GFP_KERNEL); - if (NULL == table->tab[table->def]) - return -ENOMEM; - - memset(table->tab[table->def], 0, - (sizeof(struct gred_sched_data))); - } - q= table->tab[table->def]; - q->DP=table->def; - q->Wlog = ctl->Wlog; - q->Plog = ctl->Plog; - q->limit = ctl->limit; - q->Scell_log = ctl->Scell_log; - q->Rmask = ctl->Plog < 32 ? 
((1<Plog) - 1) : ~0UL; - q->Scell_max = (255<Scell_log); - q->qth_min = ctl->qth_min<Wlog; - q->qth_max = ctl->qth_max<Wlog; - - if (gred_rio_mode(table)) - q->prio=table->tab[ctl->DP]->prio; - else - q->prio=8; - - q->qcount = -1; - PSCHED_SET_PASTPERFECT(q->qidlestart); - memcpy(q->Stab, RTA_DATA(tb[TCA_GRED_STAB-1]), 256); - } - return 0; + err = 0; +errout_locked: + sch_tree_unlock(sch); +errout: + return err; } static int gred_init(struct Qdisc *sch, struct rtattr *opt) -- cgit v1.2.3-18-g5258 From 22b33429ab93155895854e9518a253680a920493 Mon Sep 17 00:00:00 2001 From: Thomas Graf Date: Sat, 5 Nov 2005 21:14:16 +0100 Subject: [PKT_SCHED]: GRED: Use new generic red interface Simplifies code a lot by separating the red algorithm and the queueing logic. We now differentiate between probability marks and forced marks but sum them together again to not break backwards compatibility. This brings GRED back to the level of RED and improves the accuracy of the averge queue length calculations when stab suggests a zero shift. Signed-off-by: Thomas Graf Signed-off-by: Arnaldo Carvalho de Melo --- net/sched/sch_gred.c | 224 +++++++++++++++++++++------------------------------ 1 file changed, 91 insertions(+), 133 deletions(-) (limited to 'net') diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c index ca6cb271493..a52490c7af3 100644 --- a/net/sched/sch_gred.c +++ b/net/sched/sch_gred.c @@ -45,6 +45,7 @@ #include #include #include +#include #if 1 /* control */ #define DPRINTK(format,args...) 
printk(KERN_DEBUG format,##args) @@ -65,32 +66,15 @@ struct gred_sched; struct gred_sched_data { -/* Parameters */ u32 limit; /* HARD maximal queue length */ - u32 qth_min; /* Min average length threshold: A scaled */ - u32 qth_max; /* Max average length threshold: A scaled */ u32 DP; /* the drop pramaters */ - char Wlog; /* log(W) */ - char Plog; /* random number bits */ - u32 Scell_max; - u32 Rmask; u32 bytesin; /* bytes seen on virtualQ so far*/ u32 packetsin; /* packets seen on virtualQ so far*/ u32 backlog; /* bytes on the virtualQ */ - u32 forced; /* packets dropped for exceeding limits */ - u32 early; /* packets dropped as a warning */ - u32 other; /* packets dropped by invoking drop() */ - u32 pdrop; /* packets dropped because we exceeded physical queue limits */ - char Scell_log; - u8 Stab[256]; u8 prio; /* the prio of this vq */ -/* Variables */ - unsigned long qave; /* Average queue length: A scaled */ - int qcount; /* Packets since last random number generation */ - u32 qR; /* Cached random number */ - - psched_time_t qidlestart; /* Start of idle period */ + struct red_parms parms; + struct red_stats stats; }; enum { @@ -159,13 +143,22 @@ static inline int gred_wred_mode_check(struct Qdisc *sch) return 0; } +static inline unsigned int gred_backlog(struct gred_sched *table, + struct gred_sched_data *q, + struct Qdisc *sch) +{ + if (gred_wred_mode(table)) + return sch->qstats.backlog; + else + return q->backlog; +} + static int gred_enqueue(struct sk_buff *skb, struct Qdisc* sch) { - psched_time_t now; struct gred_sched_data *q=NULL; struct gred_sched *t= qdisc_priv(sch); - unsigned long qave=0; + unsigned long qavg = 0; int i=0; if (!t->initd && skb_queue_len(&sch->q) < (sch->dev->tx_queue_len ? 
: 1)) { @@ -195,8 +188,9 @@ gred_enqueue(struct sk_buff *skb, struct Qdisc* sch) if ((!t->tab[i]) || (i==q->DP)) continue; - if ((t->tab[i]->prio < q->prio) && (PSCHED_IS_PASTPERFECT(t->tab[i]->qidlestart))) - qave +=t->tab[i]->qave; + if (t->tab[i]->prio < q->prio && + !red_is_idling(&t->tab[i]->parms)) + qavg +=t->tab[i]->parms.qavg; } } @@ -205,68 +199,49 @@ gred_enqueue(struct sk_buff *skb, struct Qdisc* sch) q->bytesin+=skb->len; if (gred_wred_mode(t)) { - qave=0; - q->qave=t->tab[t->def]->qave; - q->qidlestart=t->tab[t->def]->qidlestart; + qavg = 0; + q->parms.qavg = t->tab[t->def]->parms.qavg; + q->parms.qidlestart = t->tab[t->def]->parms.qidlestart; } - if (!PSCHED_IS_PASTPERFECT(q->qidlestart)) { - long us_idle; - PSCHED_GET_TIME(now); - us_idle = PSCHED_TDIFF_SAFE(now, q->qidlestart, q->Scell_max); - PSCHED_SET_PASTPERFECT(q->qidlestart); + q->parms.qavg = red_calc_qavg(&q->parms, gred_backlog(t, q, sch)); - q->qave >>= q->Stab[(us_idle>>q->Scell_log)&0xFF]; - } else { - if (gred_wred_mode(t)) { - q->qave += sch->qstats.backlog - (q->qave >> q->Wlog); - } else { - q->qave += q->backlog - (q->qave >> q->Wlog); - } - - } - + if (red_is_idling(&q->parms)) + red_end_of_idle_period(&q->parms); if (gred_wred_mode(t)) - t->tab[t->def]->qave=q->qave; + t->tab[t->def]->parms.qavg = q->parms.qavg; - if ((q->qave+qave) < q->qth_min) { - q->qcount = -1; -enqueue: - if (q->backlog + skb->len <= q->limit) { - q->backlog += skb->len; -do_enqueue: - __skb_queue_tail(&sch->q, skb); - sch->qstats.backlog += skb->len; - sch->bstats.bytes += skb->len; - sch->bstats.packets++; - return 0; - } else { - q->pdrop++; - } + switch (red_action(&q->parms, q->parms.qavg + qavg)) { + case RED_DONT_MARK: + break; -drop: - kfree_skb(skb); - sch->qstats.drops++; - return NET_XMIT_DROP; - } - if ((q->qave+qave) >= q->qth_max) { - q->qcount = -1; - sch->qstats.overlimits++; - q->forced++; - goto drop; + case RED_PROB_MARK: + sch->qstats.overlimits++; + q->stats.prob_drop++; + goto drop; + 
+ case RED_HARD_MARK: + sch->qstats.overlimits++; + q->stats.forced_drop++; + goto drop; } - if (++q->qcount) { - if ((((qave+q->qave) - q->qth_min)>>q->Wlog)*q->qcount < q->qR) - goto enqueue; - q->qcount = 0; - q->qR = net_random()&q->Rmask; - sch->qstats.overlimits++; - q->early++; - goto drop; + + if (q->backlog + skb->len <= q->limit) { + q->backlog += skb->len; +do_enqueue: + __skb_queue_tail(&sch->q, skb); + sch->qstats.backlog += skb->len; + sch->bstats.bytes += skb->len; + sch->bstats.packets++; + return 0; } - q->qR = net_random()&q->Rmask; - goto enqueue; + + q->stats.pdrop++; +drop: + kfree_skb(skb); + sch->qstats.drops++; + return NET_XMIT_DROP; } static int @@ -276,7 +251,9 @@ gred_requeue(struct sk_buff *skb, struct Qdisc* sch) struct gred_sched *t= qdisc_priv(sch); q= t->tab[(skb->tc_index&0xf)]; /* error checking here -- probably unnecessary */ - PSCHED_SET_PASTPERFECT(q->qidlestart); + + if (red_is_idling(&q->parms)) + red_end_of_idle_period(&q->parms); __skb_queue_head(&sch->q, skb); sch->qstats.backlog += skb->len; @@ -299,7 +276,7 @@ gred_dequeue(struct Qdisc* sch) if (q) { q->backlog -= skb->len; if (!q->backlog && !gred_wred_mode(t)) - PSCHED_GET_TIME(q->qidlestart); + red_start_of_idle_period(&q->parms); } else { D2PRINTK("gred_dequeue: skb has bad tcindex %x\n",skb->tc_index&0xf); } @@ -312,7 +289,7 @@ gred_dequeue(struct Qdisc* sch) D2PRINTK("no default VQ set: Results will be " "screwed up\n"); else - PSCHED_GET_TIME(q->qidlestart); + red_start_of_idle_period(&q->parms); } return NULL; @@ -333,9 +310,9 @@ static unsigned int gred_drop(struct Qdisc* sch) q= t->tab[(skb->tc_index&0xf)]; if (q) { q->backlog -= len; - q->other++; + q->stats.other++; if (!q->backlog && !gred_wred_mode(t)) - PSCHED_GET_TIME(q->qidlestart); + red_start_of_idle_period(&q->parms); } else { D2PRINTK("gred_dequeue: skb has bad tcindex %x\n",skb->tc_index&0xf); } @@ -350,7 +327,7 @@ static unsigned int gred_drop(struct Qdisc* sch) return 0; } - 
PSCHED_GET_TIME(q->qidlestart); + red_start_of_idle_period(&q->parms); return 0; } @@ -369,14 +346,12 @@ static void gred_reset(struct Qdisc* sch) q= t->tab[i]; if (!q) continue; - PSCHED_SET_PASTPERFECT(q->qidlestart); - q->qave = 0; - q->qcount = -1; + red_restart(&q->parms); q->backlog = 0; - q->other=0; - q->forced=0; - q->pdrop=0; - q->early=0; + q->stats.other = 0; + q->stats.forced_drop = 0; + q->stats.prob_drop = 0; + q->stats.pdrop = 0; } } @@ -450,25 +425,19 @@ static inline int gred_change_vq(struct Qdisc *sch, int dp, q = table->tab[dp]; q->DP = dp; q->prio = prio; - - q->Wlog = ctl->Wlog; - q->Plog = ctl->Plog; q->limit = ctl->limit; - q->Scell_log = ctl->Scell_log; - q->Rmask = ctl->Plog < 32 ? ((1<Plog) - 1) : ~0UL; - q->Scell_max = (255<Scell_log); - q->qth_min = ctl->qth_min<Wlog; - q->qth_max = ctl->qth_max<Wlog; - q->qave=0; - q->backlog=0; - q->qcount = -1; - q->other=0; - q->forced=0; - q->pdrop=0; - q->early=0; - - PSCHED_SET_PASTPERFECT(q->qidlestart); - memcpy(q->Stab, stab, 256); + + if (q->backlog == 0) + red_end_of_idle_period(&q->parms); + + red_set_parms(&q->parms, + ctl->qth_min, ctl->qth_max, ctl->Wlog, ctl->Plog, + ctl->Scell_log, stab); + + q->stats.other = 0; + q->stats.forced_drop = 0; + q->stats.prob_drop = 0; + q->stats.pdrop = 0; return 0; } @@ -592,37 +561,26 @@ static int gred_dump(struct Qdisc *sch, struct sk_buff *skb) opt.DP = q->DP; opt.backlog = q->backlog; opt.prio = q->prio; - opt.qth_min = q->qth_min >> q->Wlog; - opt.qth_max = q->qth_max >> q->Wlog; - opt.Wlog = q->Wlog; - opt.Plog = q->Plog; - opt.Scell_log = q->Scell_log; - opt.other = q->other; - opt.early = q->early; - opt.forced = q->forced; - opt.pdrop = q->pdrop; + opt.qth_min = q->parms.qth_min >> q->parms.Wlog; + opt.qth_max = q->parms.qth_max >> q->parms.Wlog; + opt.Wlog = q->parms.Wlog; + opt.Plog = q->parms.Plog; + opt.Scell_log = q->parms.Scell_log; + opt.other = q->stats.other; + opt.early = q->stats.prob_drop; + opt.forced = q->stats.forced_drop; + 
opt.pdrop = q->stats.pdrop; opt.packets = q->packetsin; opt.bytesin = q->bytesin; - if (q->qave) { - if (gred_wred_mode(table)) { - q->qidlestart=table->tab[table->def]->qidlestart; - q->qave=table->tab[table->def]->qave; - } - if (!PSCHED_IS_PASTPERFECT(q->qidlestart)) { - long idle; - unsigned long qave; - psched_time_t now; - PSCHED_GET_TIME(now); - idle = PSCHED_TDIFF_SAFE(now, q->qidlestart, q->Scell_max); - qave = q->qave >> q->Stab[(idle>>q->Scell_log)&0xFF]; - opt.qave = qave >> q->Wlog; - - } else { - opt.qave = q->qave >> q->Wlog; - } + if (gred_wred_mode(table)) { + q->parms.qidlestart = + table->tab[table->def]->parms.qidlestart; + q->parms.qavg = table->tab[table->def]->parms.qavg; } + opt.qave = red_calc_qavg(&q->parms, q->parms.qavg); + append_opt: RTA_APPEND(skb, sizeof(opt), &opt); } -- cgit v1.2.3-18-g5258 From 301d063c2915e8307e3d128245d8a393ad776539 Mon Sep 17 00:00:00 2001 From: Thomas Graf Date: Sat, 5 Nov 2005 21:14:17 +0100 Subject: [PKT_SCHED]: GRED: Do not reset statistics in gred_reset/gred_change Qdiscs are not supposed to reset statistics in reset() and while changing parameters. My argumentation is that if the user wants the counters to be reset he can simply remove and readd the qdiscs, that's what most users do anyway. 
Signed-off-by: Thomas Graf Signed-off-by: Arnaldo Carvalho de Melo --- net/sched/sch_gred.c | 9 --------- 1 file changed, 9 deletions(-) (limited to 'net') diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c index a52490c7af3..50f184cd7f1 100644 --- a/net/sched/sch_gred.c +++ b/net/sched/sch_gred.c @@ -348,10 +348,6 @@ static void gred_reset(struct Qdisc* sch) continue; red_restart(&q->parms); q->backlog = 0; - q->stats.other = 0; - q->stats.forced_drop = 0; - q->stats.prob_drop = 0; - q->stats.pdrop = 0; } } @@ -434,11 +430,6 @@ static inline int gred_change_vq(struct Qdisc *sch, int dp, ctl->qth_min, ctl->qth_max, ctl->Wlog, ctl->Plog, ctl->Scell_log, stab); - q->stats.other = 0; - q->stats.forced_drop = 0; - q->stats.prob_drop = 0; - q->stats.pdrop = 0; - return 0; } -- cgit v1.2.3-18-g5258 From c3b553cdaf50ce915bcd995fa8ec2905f227de64 Mon Sep 17 00:00:00 2001 From: Thomas Graf Date: Sat, 5 Nov 2005 21:14:18 +0100 Subject: [PKT_SCHED]: GRED: Report congestion related drops as NET_XMIT_CN Signed-off-by: Thomas Graf Signed-off-by: Arnaldo Carvalho de Melo --- net/sched/sch_gred.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c index 50f184cd7f1..f7c6c0359ce 100644 --- a/net/sched/sch_gred.c +++ b/net/sched/sch_gred.c @@ -219,12 +219,12 @@ gred_enqueue(struct sk_buff *skb, struct Qdisc* sch) case RED_PROB_MARK: sch->qstats.overlimits++; q->stats.prob_drop++; - goto drop; + goto congestion_drop; case RED_HARD_MARK: sch->qstats.overlimits++; q->stats.forced_drop++; - goto drop; + goto congestion_drop; } if (q->backlog + skb->len <= q->limit) { @@ -242,6 +242,11 @@ drop: kfree_skb(skb); sch->qstats.drops++; return NET_XMIT_DROP; + +congestion_drop: + kfree_skb(skb); + sch->qstats.drops++; + return NET_XMIT_CN; } static int -- cgit v1.2.3-18-g5258 From edf7a7b1f0bd31d96096e38cbf35b02a3a1352b4 Mon Sep 17 00:00:00 2001 From: Thomas Graf Date: Sat, 5 Nov 2005 21:14:19 +0100 
Subject: [PKT_SCHED]: GRED: Use generic queue management interface Signed-off-by: Thomas Graf Signed-off-by: Arnaldo Carvalho de Melo --- net/sched/sch_gred.c | 32 +++++++++----------------------- 1 file changed, 9 insertions(+), 23 deletions(-) (limited to 'net') diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c index f7c6c0359ce..95c5f2cf3fd 100644 --- a/net/sched/sch_gred.c +++ b/net/sched/sch_gred.c @@ -230,22 +230,15 @@ gred_enqueue(struct sk_buff *skb, struct Qdisc* sch) if (q->backlog + skb->len <= q->limit) { q->backlog += skb->len; do_enqueue: - __skb_queue_tail(&sch->q, skb); - sch->qstats.backlog += skb->len; - sch->bstats.bytes += skb->len; - sch->bstats.packets++; - return 0; + return qdisc_enqueue_tail(skb, sch); } q->stats.pdrop++; drop: - kfree_skb(skb); - sch->qstats.drops++; - return NET_XMIT_DROP; + return qdisc_drop(skb, sch); congestion_drop: - kfree_skb(skb); - sch->qstats.drops++; + qdisc_drop(skb, sch); return NET_XMIT_CN; } @@ -260,11 +253,8 @@ gred_requeue(struct sk_buff *skb, struct Qdisc* sch) if (red_is_idling(&q->parms)) red_end_of_idle_period(&q->parms); - __skb_queue_head(&sch->q, skb); - sch->qstats.backlog += skb->len; - sch->qstats.requeues++; q->backlog += skb->len; - return 0; + return qdisc_requeue(skb, sch); } static struct sk_buff * @@ -274,9 +264,9 @@ gred_dequeue(struct Qdisc* sch) struct gred_sched_data *q; struct gred_sched *t= qdisc_priv(sch); - skb = __skb_dequeue(&sch->q); + skb = qdisc_dequeue_head(sch); + if (skb) { - sch->qstats.backlog -= skb->len; q= t->tab[(skb->tc_index&0xf)]; if (q) { q->backlog -= skb->len; @@ -307,11 +297,9 @@ static unsigned int gred_drop(struct Qdisc* sch) struct gred_sched_data *q; struct gred_sched *t= qdisc_priv(sch); - skb = __skb_dequeue_tail(&sch->q); + skb = qdisc_dequeue_tail(sch); if (skb) { unsigned int len = skb->len; - sch->qstats.backlog -= len; - sch->qstats.drops++; q= t->tab[(skb->tc_index&0xf)]; if (q) { q->backlog -= len; @@ -322,7 +310,7 @@ static unsigned int 
gred_drop(struct Qdisc* sch) D2PRINTK("gred_dequeue: skb has bad tcindex %x\n",skb->tc_index&0xf); } - kfree_skb(skb); + qdisc_drop(skb, sch); return len; } @@ -343,9 +331,7 @@ static void gred_reset(struct Qdisc* sch) struct gred_sched_data *q; struct gred_sched *t= qdisc_priv(sch); - __skb_queue_purge(&sch->q); - - sch->qstats.backlog = 0; + qdisc_reset_queue(sch); for (i=0;i<t->DPs;i++) { q= t->tab[i]; -- cgit v1.2.3-18-g5258 From 716a1b40b0ed630570edd4e2bf9053c421e9770b Mon Sep 17 00:00:00 2001 From: Thomas Graf Date: Sat, 5 Nov 2005 21:14:20 +0100 Subject: [PKT_SCHED]: GRED: Introduce tc_index_to_dp() Adds a transformation function returning the DP index for a given skb according to its tc_index. Signed-off-by: Thomas Graf Signed-off-by: Arnaldo Carvalho de Melo --- net/sched/sch_gred.c | 27 ++++++++++++++++++--------- 1 file changed, 18 insertions(+), 9 deletions(-) (limited to 'net') diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c index 95c5f2cf3fd..38dab959fee 100644 --- a/net/sched/sch_gred.c +++ b/net/sched/sch_gred.c @@ -60,6 +60,7 @@ #endif #define GRED_DEF_PRIO (MAX_DPs / 2) +#define GRED_VQ_MASK (MAX_DPs - 1) struct gred_sched_data; struct gred_sched; @@ -153,6 +154,11 @@ static inline unsigned int gred_backlog(struct gred_sched *table, return q->backlog; } +static inline u16 tc_index_to_dp(struct sk_buff *skb) +{ + return skb->tc_index & GRED_VQ_MASK; +} + static int gred_enqueue(struct sk_buff *skb, struct Qdisc* sch) { @@ -160,14 +166,16 @@ gred_enqueue(struct sk_buff *skb, struct Qdisc* sch) struct gred_sched *t= qdisc_priv(sch); unsigned long qavg = 0; int i=0; + u16 dp; if (!t->initd && skb_queue_len(&sch->q) < (sch->dev->tx_queue_len ? : 1)) { D2PRINTK("NO GRED Queues setup yet!
Enqueued anyway\n"); goto do_enqueue; } + dp = tc_index_to_dp(skb); - if ( ((skb->tc_index&0xf) > (t->DPs -1)) || !(q=t->tab[skb->tc_index&0xf])) { + if (dp >= t->DPs || (q = t->tab[dp]) == NULL) { printk("GRED: setting to default (%d)\n ",t->def); if (!(q=t->tab[t->def])) { DPRINTK("GRED: setting to default FAILED! dropping!! " @@ -176,7 +184,7 @@ gred_enqueue(struct sk_buff *skb, struct Qdisc* sch) } /* fix tc_index? --could be controvesial but needed for requeueing */ - skb->tc_index=(skb->tc_index&0xfffffff0) | t->def; + skb->tc_index=(skb->tc_index & ~GRED_VQ_MASK) | t->def; } D2PRINTK("gred_enqueue virtualQ 0x%x classid %x backlog %d " @@ -245,9 +253,8 @@ congestion_drop: static int gred_requeue(struct sk_buff *skb, struct Qdisc* sch) { - struct gred_sched_data *q; - struct gred_sched *t= qdisc_priv(sch); - q= t->tab[(skb->tc_index&0xf)]; + struct gred_sched *t = qdisc_priv(sch); + struct gred_sched_data *q = t->tab[tc_index_to_dp(skb)]; /* error checking here -- probably unnecessary */ if (red_is_idling(&q->parms)) @@ -267,13 +274,14 @@ gred_dequeue(struct Qdisc* sch) skb = qdisc_dequeue_head(sch); if (skb) { - q= t->tab[(skb->tc_index&0xf)]; + q = t->tab[tc_index_to_dp(skb)]; if (q) { q->backlog -= skb->len; if (!q->backlog && !gred_wred_mode(t)) red_start_of_idle_period(&q->parms); } else { - D2PRINTK("gred_dequeue: skb has bad tcindex %x\n",skb->tc_index&0xf); + D2PRINTK("gred_dequeue: skb has bad tcindex %x\n", + tc_index_to_dp(skb)); } return skb; } @@ -300,14 +308,15 @@ static unsigned int gred_drop(struct Qdisc* sch) skb = qdisc_dequeue_tail(sch); if (skb) { unsigned int len = skb->len; - q= t->tab[(skb->tc_index&0xf)]; + q = t->tab[tc_index_to_dp(skb)]; if (q) { q->backlog -= len; q->stats.other++; if (!q->backlog && !gred_wred_mode(t)) red_start_of_idle_period(&q->parms); } else { - D2PRINTK("gred_dequeue: skb has bad tcindex %x\n",skb->tc_index&0xf); + D2PRINTK("gred_dequeue: skb has bad tcindex %x\n", + tc_index_to_dp(skb)); } qdisc_drop(skb, 
sch); -- cgit v1.2.3-18-g5258 From 18e3fb84e698dcab1c5fa7b7c89921b826bb5620 Mon Sep 17 00:00:00 2001 From: Thomas Graf Date: Sat, 5 Nov 2005 21:14:21 +0100 Subject: [PKT_SCHED]: GRED: Improve error handling and messages Try to enqueue packets if we cannot associate them with a VQ; this basically means that the default VQ has not been set up yet. We must check whether the VQ still exists while requeueing, since the VQ might have been changed between the dequeue and the requeue of the underlying qdisc. Signed-off-by: Thomas Graf Signed-off-by: Arnaldo Carvalho de Melo --- net/sched/sch_gred.c | 68 +++++++++++++++++++++++++++++++++------------------- 1 file changed, 44 insertions(+), 24 deletions(-) (limited to 'net') diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c index 38dab959fee..646dbdc4ef2 100644 --- a/net/sched/sch_gred.c +++ b/net/sched/sch_gred.c @@ -176,20 +176,24 @@ gred_enqueue(struct sk_buff *skb, struct Qdisc* sch) dp = tc_index_to_dp(skb); if (dp >= t->DPs || (q = t->tab[dp]) == NULL) { - printk("GRED: setting to default (%d)\n ",t->def); - if (!(q=t->tab[t->def])) { - DPRINTK("GRED: setting to default FAILED! dropping!! " - "(%d)\n ", t->def); - goto drop; + dp = t->def; + + if ((q = t->tab[dp]) == NULL) { + /* Pass through packets not assigned to a DP + * if no default DP has been configured. This + * allows for DP flows to be left untouched. + */ + if (skb_queue_len(&sch->q) < sch->dev->tx_queue_len) + return qdisc_enqueue_tail(skb, sch); + else + goto drop; } + /* fix tc_index?
--could be controvesial but needed for requeueing */ - skb->tc_index=(skb->tc_index & ~GRED_VQ_MASK) | t->def; + skb->tc_index = (skb->tc_index & ~GRED_VQ_MASK) | dp; } - D2PRINTK("gred_enqueue virtualQ 0x%x classid %x backlog %d " - "general backlog %d\n",skb->tc_index&0xf,sch->handle,q->backlog, - sch->qstats.backlog); /* sum up all the qaves of prios <= to ours to get the new qave*/ if (!gred_wred_mode(t) && gred_rio_mode(t)) { for (i=0;i<t->DPs;i++) { @@ -254,13 +258,20 @@ static int gred_requeue(struct sk_buff *skb, struct Qdisc* sch) { struct gred_sched *t = qdisc_priv(sch); - struct gred_sched_data *q = t->tab[tc_index_to_dp(skb)]; -/* error checking here -- probably unnecessary */ + struct gred_sched_data *q; + u16 dp = tc_index_to_dp(skb); - if (red_is_idling(&q->parms)) - red_end_of_idle_period(&q->parms); + if (dp >= t->DPs || (q = t->tab[dp]) == NULL) { + if (net_ratelimit()) + printk(KERN_WARNING "GRED: Unable to relocate VQ 0x%x " + "for requeue, screwing up backlog.\n", + tc_index_to_dp(skb)); + } else { + if (red_is_idling(&q->parms)) + red_end_of_idle_period(&q->parms); + q->backlog += skb->len; + } - q->backlog += skb->len; return qdisc_requeue(skb, sch); } @@ -274,15 +285,20 @@ gred_dequeue(struct Qdisc* sch) skb = qdisc_dequeue_head(sch); if (skb) { - q = t->tab[tc_index_to_dp(skb)]; - if (q) { + u16 dp = tc_index_to_dp(skb); + + if (dp >= t->DPs || (q = t->tab[dp]) == NULL) { + if (net_ratelimit()) + printk(KERN_WARNING "GRED: Unable to relocate " + "VQ 0x%x after dequeue, screwing up " + "backlog.\n", tc_index_to_dp(skb)); + } else { q->backlog -= skb->len; + if (!q->backlog && !gred_wred_mode(t)) red_start_of_idle_period(&q->parms); - } else { - D2PRINTK("gred_dequeue: skb has bad tcindex %x\n", - tc_index_to_dp(skb)); } + return skb; } @@ -308,15 +324,19 @@ static unsigned int gred_drop(struct Qdisc* sch) skb = qdisc_dequeue_tail(sch); if (skb) { unsigned int len = skb->len; - q = t->tab[tc_index_to_dp(skb)]; - if (q) { + u16 dp =
tc_index_to_dp(skb); + + if (dp >= t->DPs || (q = t->tab[dp]) == NULL) { + if (net_ratelimit()) + printk(KERN_WARNING "GRED: Unable to relocate " + "VQ 0x%x while dropping, screwing up " + "backlog.\n", tc_index_to_dp(skb)); + } else { q->backlog -= len; q->stats.other++; + if (!q->backlog && !gred_wred_mode(t)) red_start_of_idle_period(&q->parms); - } else { - D2PRINTK("gred_dequeue: skb has bad tcindex %x\n", - tc_index_to_dp(skb)); } qdisc_drop(skb, sch); -- cgit v1.2.3-18-g5258 From 4a591834cfc79b2ff74457e976420361f8ae28b4 Mon Sep 17 00:00:00 2001 From: Thomas Graf Date: Sat, 5 Nov 2005 21:14:22 +0100 Subject: [PKT_SCHED]: GRED: Remove initd flag The case when the default VQ is not set up yet is already handled in a less error prone way. Signed-off-by: Thomas Graf Signed-off-by: Arnaldo Carvalho de Melo --- net/sched/sch_gred.c | 15 +-------------- 1 file changed, 1 insertion(+), 14 deletions(-) (limited to 'net') diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c index 646dbdc4ef2..29869a07748 100644 --- a/net/sched/sch_gred.c +++ b/net/sched/sch_gred.c @@ -89,7 +89,6 @@ struct gred_sched unsigned long flags; u32 DPs; u32 def; - u8 initd; }; static inline int gred_wred_mode(struct gred_sched *table) @@ -166,14 +165,7 @@ gred_enqueue(struct sk_buff *skb, struct Qdisc* sch) struct gred_sched *t= qdisc_priv(sch); unsigned long qavg = 0; int i=0; - u16 dp; - - if (!t->initd && skb_queue_len(&sch->q) < (sch->dev->tx_queue_len ? : 1)) { - D2PRINTK("NO GRED Queues setup yet! 
Enqueued anyway\n"); - goto do_enqueue; - } - - dp = tc_index_to_dp(skb); + u16 dp = tc_index_to_dp(skb); if (dp >= t->DPs || (q = t->tab[dp]) == NULL) { dp = t->def; @@ -241,7 +233,6 @@ gred_enqueue(struct sk_buff *skb, struct Qdisc* sch) if (q->backlog + skb->len <= q->limit) { q->backlog += skb->len; -do_enqueue: return qdisc_enqueue_tail(skb, sch); } @@ -420,8 +411,6 @@ static inline int gred_change_table_def(struct Qdisc *sch, struct rtattr *dps) } } - table->initd = 0; - return 0; } @@ -509,8 +498,6 @@ static int gred_change(struct Qdisc *sch, struct rtattr *opt) goto errout_locked; } - table->initd = 1; - if (gred_rio_mode(table)) { gred_disable_wred_mode(table); if (gred_wred_mode_check(sch)) -- cgit v1.2.3-18-g5258 From 7051703b990ec40bdf192ec7c87ffafd7011c640 Mon Sep 17 00:00:00 2001 From: Thomas Graf Date: Sat, 5 Nov 2005 21:14:23 +0100 Subject: [PKT_SCHED]: GRED: Dont abuse default VQ for equalizing Introduces a new RED parameter set for use in equalize mode. Although only the qavg variable and the idle period marker are used for now, this makes it possible to use a separate parameter set for equalize mode later on. Using this separate parameter set also fixes a bogus start of an idle period in gred_drop(), which started an idle period on the default VQ even if equalize mode was disabled.
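As a rough illustration of what the separate `wred_set` provides, the sketch below models equalize mode in user space: each VQ loads the shared average before updating it and stores the result back, so all VQs share one average without borrowing the default VQ's own state. This is a simplified model, not kernel code. `model_parms`, `model_vq`, `model_table` and the helper functions are hypothetical names, only `qavg` is modelled (the patch also carries `qidlestart`), and the update assumes the fixed-point EWMA form used by the RED helpers, `qavg += backlog - (qavg >> Wlog)`.

```c
#include <assert.h>

/* Hypothetical stand-in for struct red_parms; only the average is modelled. */
struct model_parms {
	unsigned long qavg;	/* average backlog, scaled by 2^wlog */
	unsigned int wlog;	/* EWMA weight is 2^-wlog */
};

struct model_vq {
	struct model_parms parms;	/* per-VQ RED state */
};

struct model_table {
	struct model_parms wred_set;	/* shared state for equalize (WRED) mode */
	int wred_mode;
};

/* Fixed-point EWMA step: qavg += backlog - qavg / 2^wlog */
static unsigned long model_calc_qavg(const struct model_parms *p,
				     unsigned long backlog)
{
	assert(p->wlog < 32);	/* keep the shift well defined */
	return p->qavg + (backlog - (p->qavg >> p->wlog));
}

/* Load / update / store sequence, modelled on the patched gred_enqueue(). */
static void model_update_avg(struct model_table *t, struct model_vq *q,
			     unsigned long backlog)
{
	if (t->wred_mode)
		q->parms.qavg = t->wred_set.qavg;	/* like gred_load_wred_set() */

	q->parms.qavg = model_calc_qavg(&q->parms, backlog);

	if (t->wred_mode)
		t->wred_set.qavg = q->parms.qavg;	/* like gred_store_wred_set() */
}
```

With `wred_mode` set, two VQs with `wlog = 1` updated with backlogs 10 and 20 end up with a shared average of 25 (first 10, then 10 + 20 - 5); without it, the second VQ would start from its own zeroed state and compute 20.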
Signed-off-by: Thomas Graf Signed-off-by: Arnaldo Carvalho de Melo --- net/sched/sch_gred.c | 37 ++++++++++++++++++++----------------- 1 file changed, 20 insertions(+), 17 deletions(-) (limited to 'net') diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c index 29869a07748..a545532be2c 100644 --- a/net/sched/sch_gred.c +++ b/net/sched/sch_gred.c @@ -89,6 +89,7 @@ struct gred_sched unsigned long flags; u32 DPs; u32 def; + struct red_parms wred_set; }; static inline int gred_wred_mode(struct gred_sched *table) @@ -158,6 +159,19 @@ static inline u16 tc_index_to_dp(struct sk_buff *skb) return skb->tc_index & GRED_VQ_MASK; } +static inline void gred_load_wred_set(struct gred_sched *table, + struct gred_sched_data *q) +{ + q->parms.qavg = table->wred_set.qavg; + q->parms.qidlestart = table->wred_set.qidlestart; +} + +static inline void gred_store_wred_set(struct gred_sched *table, + struct gred_sched_data *q) +{ + table->wred_set.qavg = q->parms.qavg; +} + static int gred_enqueue(struct sk_buff *skb, struct Qdisc* sch) { @@ -204,8 +218,7 @@ gred_enqueue(struct sk_buff *skb, struct Qdisc* sch) if (gred_wred_mode(t)) { qavg = 0; - q->parms.qavg = t->tab[t->def]->parms.qavg; - q->parms.qidlestart = t->tab[t->def]->parms.qidlestart; + gred_load_wred_set(t, q); } q->parms.qavg = red_calc_qavg(&q->parms, gred_backlog(t, q, sch)); @@ -214,7 +227,7 @@ gred_enqueue(struct sk_buff *skb, struct Qdisc* sch) red_end_of_idle_period(&q->parms); if (gred_wred_mode(t)) - t->tab[t->def]->parms.qavg = q->parms.qavg; + gred_store_wred_set(t, q); switch (red_action(&q->parms, q->parms.qavg + qavg)) { case RED_DONT_MARK: @@ -293,14 +306,8 @@ gred_dequeue(struct Qdisc* sch) return skb; } - if (gred_wred_mode(t)) { - q= t->tab[t->def]; - if (!q) - D2PRINTK("no default VQ set: Results will be " - "screwed up\n"); - else - red_start_of_idle_period(&q->parms); - } + if (gred_wred_mode(t)) + red_start_of_idle_period(&t->wred_set); return NULL; } @@ -334,13 +341,9 @@ static unsigned int 
gred_drop(struct Qdisc* sch) return len; } - q=t->tab[t->def]; - if (!q) { - D2PRINTK("no default VQ set: Results might be screwed up\n"); - return 0; - } + if (gred_wred_mode(t)) + red_start_of_idle_period(&t->wred_set); - red_start_of_idle_period(&q->parms); return 0; } -- cgit v1.2.3-18-g5258 From 6214e653cc578947bf83d6766339a18a41c5b923 Mon Sep 17 00:00:00 2001 From: Thomas Graf Date: Sat, 5 Nov 2005 21:14:24 +0100 Subject: [PKT_SCHED]: GRED: Remove auto-creation of default VQ Since we are no longer depending on the default VQ to be always allocated we can leave it up to the user to actually create it. This gives the user the ability to leave it out on purpose and enqueue packets directly to the device without applying the RED algorithm. Signed-off-by: Thomas Graf Signed-off-by: Arnaldo Carvalho de Melo --- net/sched/sch_gred.c | 9 --------- 1 file changed, 9 deletions(-) (limited to 'net') diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c index a545532be2c..897e6df81b1 100644 --- a/net/sched/sch_gred.c +++ b/net/sched/sch_gred.c @@ -492,15 +492,6 @@ static int gred_change(struct Qdisc *sch, struct rtattr *opt) if (err < 0) goto errout_locked; - if (table->tab[table->def] == NULL) { - if (gred_rio_mode(table)) - prio = table->tab[ctl->DP]->prio; - - err = gred_change_vq(sch, table->def, ctl, prio, stab); - if (err < 0) - goto errout_locked; - } - if (gred_rio_mode(table)) { gred_disable_wred_mode(table); if (gred_wred_mode_check(sch)) -- cgit v1.2.3-18-g5258 From 1e4dfaf9b99a8b652e8421936fd5fe2459da8265 Mon Sep 17 00:00:00 2001 From: Thomas Graf Date: Sat, 5 Nov 2005 21:14:25 +0100 Subject: [PKT_SCHED]: GRED: Cleanup and remove unnecessary code Removes unnecessary includes, initializers, and simplifies the code a bit. 
Signed-off-by: Thomas Graf Signed-off-by: Arnaldo Carvalho de Melo --- net/sched/sch_gred.c | 100 ++++++++++++++++----------------------------------- 1 file changed, 31 insertions(+), 69 deletions(-) (limited to 'net') diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c index 897e6df81b1..1fb34be32f7 100644 --- a/net/sched/sch_gred.c +++ b/net/sched/sch_gred.c @@ -15,50 +15,18 @@ * from Ren Liu * - More error checks * - * - * - * For all the glorious comments look at Alexey's sch_red.c + * For all the glorious comments look at include/net/red.h */ #include #include -#include -#include -#include #include #include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include #include -#include -#include -#include -#include #include -#include #include #include -#if 1 /* control */ -#define DPRINTK(format,args...) printk(KERN_DEBUG format,##args) -#else -#define DPRINTK(format,args...) -#endif - -#if 0 /* data */ -#define D2PRINTK(format,args...) printk(KERN_DEBUG format,##args) -#else -#define D2PRINTK(format,args...) 
-#endif - #define GRED_DEF_PRIO (MAX_DPs / 2) #define GRED_VQ_MASK (MAX_DPs - 1) @@ -72,7 +40,7 @@ struct gred_sched_data u32 bytesin; /* bytes seen on virtualQ so far*/ u32 packetsin; /* packets seen on virtualQ so far*/ u32 backlog; /* bytes on the virtualQ */ - u8 prio; /* the prio of this vq */ + u8 prio; /* the prio of this vq */ struct red_parms parms; struct red_stats stats; @@ -87,8 +55,8 @@ struct gred_sched { struct gred_sched_data *tab[MAX_DPs]; unsigned long flags; - u32 DPs; - u32 def; + u32 DPs; + u32 def; struct red_parms wred_set; }; @@ -172,13 +140,11 @@ static inline void gred_store_wred_set(struct gred_sched *table, table->wred_set.qavg = q->parms.qavg; } -static int -gred_enqueue(struct sk_buff *skb, struct Qdisc* sch) +static int gred_enqueue(struct sk_buff *skb, struct Qdisc* sch) { struct gred_sched_data *q=NULL; struct gred_sched *t= qdisc_priv(sch); unsigned long qavg = 0; - int i=0; u16 dp = tc_index_to_dp(skb); if (dp >= t->DPs || (q = t->tab[dp]) == NULL) { @@ -200,26 +166,23 @@ gred_enqueue(struct sk_buff *skb, struct Qdisc* sch) skb->tc_index = (skb->tc_index & ~GRED_VQ_MASK) | dp; } - /* sum up all the qaves of prios <= to ours to get the new qave*/ + /* sum up all the qaves of prios <= to ours to get the new qave */ if (!gred_wred_mode(t) && gred_rio_mode(t)) { - for (i=0;i<t->DPs;i++) { - if ((!t->tab[i]) || (i==q->DP)) - continue; - - if (t->tab[i]->prio < q->prio && + int i; + + for (i = 0; i < t->DPs; i++) { + if (t->tab[i] && t->tab[i]->prio < q->prio && !red_is_idling(&t->tab[i]->parms)) qavg +=t->tab[i]->parms.qavg; } - + } q->packetsin++; - q->bytesin+=skb->len; + q->bytesin += skb->len; - if (gred_wred_mode(t)) { - qavg = 0; + if (gred_wred_mode(t)) gred_load_wred_set(t, q); - } q->parms.qavg = red_calc_qavg(&q->parms, gred_backlog(t, q, sch)); @@ -258,8 +221,7 @@ congestion_drop: return NET_XMIT_CN; } -static int -gred_requeue(struct sk_buff *skb, struct Qdisc* sch) +static int gred_requeue(struct sk_buff *skb, struct Qdisc*
sch) { struct gred_sched *t = qdisc_priv(sch); struct gred_sched_data *q; @@ -279,16 +241,15 @@ gred_requeue(struct sk_buff *skb, struct Qdisc* sch) return qdisc_requeue(skb, sch); } -static struct sk_buff * -gred_dequeue(struct Qdisc* sch) +static struct sk_buff *gred_dequeue(struct Qdisc* sch) { struct sk_buff *skb; - struct gred_sched_data *q; - struct gred_sched *t= qdisc_priv(sch); + struct gred_sched *t = qdisc_priv(sch); skb = qdisc_dequeue_head(sch); if (skb) { + struct gred_sched_data *q; u16 dp = tc_index_to_dp(skb); if (dp >= t->DPs || (q = t->tab[dp]) == NULL) { @@ -315,13 +276,12 @@ gred_dequeue(struct Qdisc* sch) static unsigned int gred_drop(struct Qdisc* sch) { struct sk_buff *skb; - - struct gred_sched_data *q; - struct gred_sched *t= qdisc_priv(sch); + struct gred_sched *t = qdisc_priv(sch); skb = qdisc_dequeue_tail(sch); if (skb) { unsigned int len = skb->len; + struct gred_sched_data *q; u16 dp = tc_index_to_dp(skb); if (dp >= t->DPs || (q = t->tab[dp]) == NULL) { @@ -351,15 +311,16 @@ static unsigned int gred_drop(struct Qdisc* sch) static void gred_reset(struct Qdisc* sch) { int i; - struct gred_sched_data *q; - struct gred_sched *t= qdisc_priv(sch); + struct gred_sched *t = qdisc_priv(sch); qdisc_reset_queue(sch); - for (i=0;i<t->DPs;i++) { - q= t->tab[i]; - if (!q) - continue; + for (i = 0; i < t->DPs; i++) { + struct gred_sched_data *q = t->tab[i]; + + if (!q) + continue; + red_restart(&q->parms); q->backlog = 0; } @@ -590,15 +551,13 @@ static void gred_destroy(struct Qdisc *sch) struct gred_sched *table = qdisc_priv(sch); int i; - for (i = 0;i < table->DPs; i++) { + for (i = 0; i < table->DPs; i++) { if (table->tab[i]) gred_destroy_vq(table->tab[i]); } } static struct Qdisc_ops gred_qdisc_ops = { - .next = NULL, - .cl_ops = NULL, .id = "gred", .priv_size = sizeof(struct gred_sched), .enqueue = gred_enqueue, @@ -617,10 +576,13 @@ static int __init gred_module_init(void) { return register_qdisc(&gred_qdisc_ops); } -static void __exit
gred_module_exit(void) + +static void __exit gred_module_exit(void) { unregister_qdisc(&gred_qdisc_ops); } + module_init(gred_module_init) module_exit(gred_module_exit) + MODULE_LICENSE("GPL"); -- cgit v1.2.3-18-g5258 From d8f64e19605d6ce40bc560e7bc919e2e02a79c1b Mon Sep 17 00:00:00 2001 From: Thomas Graf Date: Sat, 5 Nov 2005 21:14:26 +0100 Subject: [PKT_SCHED]: GRED: Fix restart of idle period in WRED mode upon dequeue and drop Signed-off-by: Thomas Graf Signed-off-by: Arnaldo Carvalho de Melo --- net/sched/sch_gred.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c index 1fb34be32f7..69f0fd45d4c 100644 --- a/net/sched/sch_gred.c +++ b/net/sched/sch_gred.c @@ -267,7 +267,7 @@ static struct sk_buff *gred_dequeue(struct Qdisc* sch) return skb; } - if (gred_wred_mode(t)) + if (gred_wred_mode(t) && !red_is_idling(&t->wred_set)) red_start_of_idle_period(&t->wred_set); return NULL; @@ -301,7 +301,7 @@ static unsigned int gred_drop(struct Qdisc* sch) return len; } - if (gred_wred_mode(t)) + if (gred_wred_mode(t) && !red_is_idling(&t->wred_set)) red_start_of_idle_period(&t->wred_set); return 0; -- cgit v1.2.3-18-g5258 From b38c7eef7e536d12051cc3d5864032f2f907cdfe Mon Sep 17 00:00:00 2001 From: Thomas Graf Date: Sat, 5 Nov 2005 21:14:27 +0100 Subject: [PKT_SCHED]: GRED: Support ECN marking Adds a new u8 flags field in an unused padding area of the netlink message. Adds ECN marking support, to be used instead of dropping packets immediately.
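The new marking logic in `gred_enqueue()` boils down to a small decision: on a probabilistic or forced mark, set CE if ECN is enabled and the packet is ECN-capable, otherwise drop. The sketch below is a simplified user-space model of that decision, not the kernel code. `model_set_ce()` is a hypothetical stand-in for `INET_ECN_set_ce()` that works on a bare 2-bit ECN field using the RFC 3168 codepoints, and all enum and function names are assumptions. In the patch, a marked packet then continues to the normal `q->limit` check rather than being queued unconditionally.

```c
#include <assert.h>

/* RFC 3168 ECN codepoints in the low two bits of the TOS / traffic class. */
enum { ECN_NOT_ECT = 0, ECN_ECT_1 = 1, ECN_ECT_0 = 2, ECN_CE = 3 };

/* Hypothetical mirror of RED_DONT_MARK / RED_PROB_MARK / RED_HARD_MARK. */
enum model_action { MODEL_DONT_MARK, MODEL_PROB_MARK, MODEL_HARD_MARK };

enum model_verdict { VERDICT_ENQUEUE, VERDICT_MARKED, VERDICT_DROP };

/* Set CE on an ECN-capable packet. Returns 0 for Not-ECT packets, which
 * cannot be marked; in this model an already-CE packet counts as marked. */
static int model_set_ce(unsigned char *ecn)
{
	assert(*ecn <= ECN_CE);
	if (*ecn == ECN_NOT_ECT)
		return 0;
	*ecn = ECN_CE;
	return 1;
}

static enum model_verdict model_ecn_verdict(enum model_action action,
					    int use_ecn, unsigned char *ecn)
{
	switch (action) {
	case MODEL_DONT_MARK:
		return VERDICT_ENQUEUE;
	case MODEL_PROB_MARK:	/* patch counts prob_mark / prob_drop */
	case MODEL_HARD_MARK:	/* patch counts forced_mark / forced_drop */
		/* Mark instead of dropping only if ECN is enabled and the
		 * packet is ECN-capable; otherwise fall back to a drop. */
		if (!use_ecn || !model_set_ce(ecn))
			return VERDICT_DROP;
		return VERDICT_MARKED;
	}
	return VERDICT_DROP;
}
```

Because `use_ecn` is tested first, a packet on a qdisc without ECN enabled is never modified, matching the short-circuit order of `!gred_use_ecn(t) || !INET_ECN_set_ce(skb)` in the diff.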
Signed-off-by: Thomas Graf Signed-off-by: Arnaldo Carvalho de Melo --- net/sched/sch_gred.c | 25 +++++++++++++++++++++---- 1 file changed, 21 insertions(+), 4 deletions(-) (limited to 'net') diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c index 69f0fd45d4c..079b0a4ea1c 100644 --- a/net/sched/sch_gred.c +++ b/net/sched/sch_gred.c @@ -55,6 +55,7 @@ struct gred_sched { struct gred_sched_data *tab[MAX_DPs]; unsigned long flags; + u32 red_flags; u32 DPs; u32 def; struct red_parms wred_set; @@ -140,6 +141,11 @@ static inline void gred_store_wred_set(struct gred_sched *table, table->wred_set.qavg = q->parms.qavg; } +static inline int gred_use_ecn(struct gred_sched *t) +{ + return t->red_flags & TC_RED_ECN; +} + static int gred_enqueue(struct sk_buff *skb, struct Qdisc* sch) { struct gred_sched_data *q=NULL; @@ -198,13 +204,22 @@ static int gred_enqueue(struct sk_buff *skb, struct Qdisc* sch) case RED_PROB_MARK: sch->qstats.overlimits++; - q->stats.prob_drop++; - goto congestion_drop; + if (!gred_use_ecn(t) || !INET_ECN_set_ce(skb)) { + q->stats.prob_drop++; + goto congestion_drop; + } + + q->stats.prob_mark++; + break; case RED_HARD_MARK: sch->qstats.overlimits++; - q->stats.forced_drop++; - goto congestion_drop; + if (!gred_use_ecn(t) || !INET_ECN_set_ce(skb)) { + q->stats.forced_drop++; + goto congestion_drop; + } + q->stats.forced_mark++; + break; } if (q->backlog + skb->len <= q->limit) { @@ -348,6 +363,7 @@ static inline int gred_change_table_def(struct Qdisc *sch, struct rtattr *dps) sch_tree_lock(sch); table->DPs = sopt->DPs; table->def = sopt->def_DP; + table->red_flags = sopt->flags; /* * Every entry point to GRED is synchronized with the above code @@ -489,6 +505,7 @@ static int gred_dump(struct Qdisc *sch, struct sk_buff *skb) .DPs = table->DPs, .def_DP = table->def, .grio = gred_rio_mode(table), + .flags = table->red_flags, }; opts = RTA_NEST(skb, TCA_OPTIONS); -- cgit v1.2.3-18-g5258 From bdc450a0bb1d48144ced1f899cc8366ec8e85024 Mon Sep 17 00:00:00 
2001 From: Thomas Graf Date: Sat, 5 Nov 2005 21:14:28 +0100 Subject: [PKT_SCHED]: (G)RED: Introduce hard dropping Introduces a new flag TC_RED_HARDDROP which specifies that, if ECN marking is enabled, packets should still be dropped once the average queue length exceeds the maximum threshold. This _may_ help to avoid global synchronisation during small bursts of peers advertising ECN but not caring about it. Use this option very carefully: it does more harm than good if (qth_max - qth_min) does not cover at least two average burst cycles. The difference from the current behaviour, in which we'd run into the hard queue limit, is that due to the low-pass filter of RED, short bursts are less likely to cause global synchronisation. Signed-off-by: Thomas Graf Signed-off-by: Arnaldo Carvalho de Melo --- net/sched/sch_gred.c | 8 +++++++- net/sched/sch_red.c | 8 +++++++- 2 files changed, 14 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c index 079b0a4ea1c..29a2dd9f302 100644 --- a/net/sched/sch_gred.c +++ b/net/sched/sch_gred.c @@ -146,6 +146,11 @@ static inline int gred_use_ecn(struct gred_sched *t) return t->red_flags & TC_RED_ECN; } +static inline int gred_use_harddrop(struct gred_sched *t) +{ + return t->red_flags & TC_RED_HARDDROP; +} + static int gred_enqueue(struct sk_buff *skb, struct Qdisc* sch) { struct gred_sched_data *q=NULL; @@ -214,7 +219,8 @@ static int gred_enqueue(struct sk_buff *skb, struct Qdisc* sch) case RED_HARD_MARK: sch->qstats.overlimits++; - if (!gred_use_ecn(t) || !INET_ECN_set_ce(skb)) { + if (gred_use_harddrop(t) || !gred_use_ecn(t) || + !INET_ECN_set_ce(skb)) { q->stats.forced_drop++; goto congestion_drop; } diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c index 0d89dee751a..dccfa44c2d7 100644 --- a/net/sched/sch_red.c +++ b/net/sched/sch_red.c @@ -51,6 +51,11 @@ static inline int red_use_ecn(struct red_sched_data *q) return q->flags & TC_RED_ECN; } +static inline int
red_use_harddrop(struct red_sched_data *q) +{ + return q->flags & TC_RED_HARDDROP; +} + static int red_enqueue(struct sk_buff *skb, struct Qdisc* sch) { struct red_sched_data *q = qdisc_priv(sch); @@ -76,7 +81,8 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc* sch) case RED_HARD_MARK: sch->qstats.overlimits++; - if (!red_use_ecn(q) || !INET_ECN_set_ce(skb)) { + if (red_use_harddrop(q) || !red_use_ecn(q) || + !INET_ECN_set_ce(skb)) { q->stats.forced_drop++; goto congestion_drop; } -- cgit v1.2.3-18-g5258 From 300ce174ebc2fcf2b5111a50fa42f79d891927dd Mon Sep 17 00:00:00 2001 From: Stephen Hemminger Date: Sun, 30 Oct 2005 13:47:34 -0800 Subject: [NETEM]: Support time based reordering Change netem to support packets getting reordered because of variations in delay. Introduce a special-case version of FIFO that queues packets in order based on the netem delay. Since netem is classful, users who don't want jitter-based reordering can just insert a pfifo instead of the default. This required changes to the generic skbuff code to allow finer-grained manipulation of sk_buff_head: insertion into the middle and reverse walk.
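The ordering rule used by `tfifo_enqueue()` can be modelled on a plain array. The sketch below is a simplified user-space model rather than the kernel code: `model_pkt`, `model_tfifo_enqueue()` and the integer timestamps are hypothetical stand-ins for the skb list and `netem_skb_cb->time_to_send`. Like the patch, it walks from the tail, stops at the first entry that is strictly earlier than the new packet (the role `PSCHED_TLESS()` plays), and inserts the new packet right after it; if no such entry exists, the packet goes to the head. A full queue is reported back so the caller can drop, as `tfifo_enqueue()` does via `qdisc_drop()`.

```c
#include <assert.h>
#include <stddef.h>

struct model_pkt {
	long time_to_send;	/* hypothetical timestamp, same role as in netem_skb_cb */
	int id;
};

/* Insert pkt into q[0..*n), kept sorted by time_to_send. The caller's array
 * must have room for at least `limit` entries. Returns 1 on success and 0 if
 * the queue is full (the caller would drop the packet). */
static int model_tfifo_enqueue(struct model_pkt *q, size_t *n, size_t limit,
			       struct model_pkt pkt)
{
	size_t pos;

	if (*n >= limit)
		return 0;

	/* Reverse walk: stop at the first entry strictly earlier than pkt. */
	pos = *n;
	while (pos > 0 && !(q[pos - 1].time_to_send < pkt.time_to_send))
		pos--;

	/* Shift the later entries up by one and insert after that entry. */
	for (size_t i = *n; i > pos; i--)
		q[i] = q[i - 1];
	q[pos] = pkt;
	(*n)++;

	assert(*n <= limit);
	return 1;
}
```

When timestamps arrive in increasing order the walk stops at the tail immediately, so the common case stays cheap. One consequence of the strict comparison, at least in this model, is that a packet whose timestamp equals that of packets already queued is placed ahead of them.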
Signed-off-by: Stephen Hemminger Signed-off-by: Arnaldo Carvalho de Melo --- net/sched/sch_netem.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 84 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c index d871fe7f81a..7c10ef3457d 100644 --- a/net/sched/sch_netem.c +++ b/net/sched/sch_netem.c @@ -300,11 +300,16 @@ static void netem_reset(struct Qdisc *sch) del_timer_sync(&q->timer); } +/* Pass size change message down to embedded FIFO */ static int set_fifo_limit(struct Qdisc *q, int limit) { struct rtattr *rta; int ret = -ENOMEM; + /* Hack to avoid sending change message to non-FIFO */ + if (strncmp(q->ops->id + 1, "fifo", 4) != 0) + return 0; + rta = kmalloc(RTA_LENGTH(sizeof(struct tc_fifo_qopt)), GFP_KERNEL); if (rta) { rta->rta_type = RTM_NEWQDISC; @@ -436,6 +441,84 @@ static int netem_change(struct Qdisc *sch, struct rtattr *opt) return 0; } +/* + * Special case version of FIFO queue for use by netem. 
+ * It queues in order based on timestamps in skb's + */ +struct fifo_sched_data { + u32 limit; +}; + +static int tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch) +{ + struct fifo_sched_data *q = qdisc_priv(sch); + struct sk_buff_head *list = &sch->q; + const struct netem_skb_cb *ncb + = (const struct netem_skb_cb *)nskb->cb; + struct sk_buff *skb; + + if (likely(skb_queue_len(list) < q->limit)) { + skb_queue_reverse_walk(list, skb) { + const struct netem_skb_cb *cb + = (const struct netem_skb_cb *)skb->cb; + + if (PSCHED_TLESS(cb->time_to_send, ncb->time_to_send)) + break; + } + + __skb_queue_after(list, skb, nskb); + + sch->qstats.backlog += nskb->len; + sch->bstats.bytes += nskb->len; + sch->bstats.packets++; + + return NET_XMIT_SUCCESS; + } + + return qdisc_drop(nskb, sch); +} + +static int tfifo_init(struct Qdisc *sch, struct rtattr *opt) +{ + struct fifo_sched_data *q = qdisc_priv(sch); + + if (opt) { + struct tc_fifo_qopt *ctl = RTA_DATA(opt); + if (RTA_PAYLOAD(opt) < sizeof(*ctl)) + return -EINVAL; + + q->limit = ctl->limit; + } else + q->limit = max_t(u32, sch->dev->tx_queue_len, 1); + + return 0; +} + +static int tfifo_dump(struct Qdisc *sch, struct sk_buff *skb) +{ + struct fifo_sched_data *q = qdisc_priv(sch); + struct tc_fifo_qopt opt = { .limit = q->limit }; + + RTA_PUT(skb, TCA_OPTIONS, sizeof(opt), &opt); + return skb->len; + +rtattr_failure: + return -1; +} + +static struct Qdisc_ops tfifo_qdisc_ops = { + .id = "tfifo", + .priv_size = sizeof(struct fifo_sched_data), + .enqueue = tfifo_enqueue, + .dequeue = qdisc_dequeue_head, + .requeue = qdisc_requeue, + .drop = qdisc_queue_drop, + .init = tfifo_init, + .reset = qdisc_reset_queue, + .change = tfifo_init, + .dump = tfifo_dump, +}; + static int netem_init(struct Qdisc *sch, struct rtattr *opt) { struct netem_sched_data *q = qdisc_priv(sch); @@ -448,7 +531,7 @@ static int netem_init(struct Qdisc *sch, struct rtattr *opt) q->timer.function = netem_watchdog; q->timer.data = (unsigned long) sch; - 
q->qdisc = qdisc_create_dflt(sch->dev, &pfifo_qdisc_ops); + q->qdisc = qdisc_create_dflt(sch->dev, &tfifo_qdisc_ops); if (!q->qdisc) { pr_debug("netem: qdisc create failed\n"); return -ENOMEM; -- cgit v1.2.3-18-g5258 From eb229c4cdc3389682cda20adb015ba767950a220 Mon Sep 17 00:00:00 2001 From: Stephen Hemminger Date: Thu, 3 Nov 2005 13:49:01 -0800 Subject: [NETEM]: Add version string Add a version string to help support issues. Signed-off-by: Stephen Hemminger Signed-off-by: Arnaldo Carvalho de Melo --- net/sched/sch_netem.c | 3 +++ 1 file changed, 3 insertions(+) (limited to 'net') diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c index 7c10ef3457d..cdc8d283791 100644 --- a/net/sched/sch_netem.c +++ b/net/sched/sch_netem.c @@ -25,6 +25,8 @@ #include +#define VERSION "1.1" + /* Network Emulation Queuing algorithm. ==================================== @@ -694,6 +696,7 @@ static struct Qdisc_ops netem_qdisc_ops = { static int __init netem_module_init(void) { + pr_info("netem: version " VERSION "\n"); return register_qdisc(&netem_qdisc_ops); } static void __exit netem_module_exit(void) -- cgit v1.2.3-18-g5258 From 6151b31c9616d71f714fc7ef8e2306f67f3b94c3 Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Fri, 4 Nov 2005 09:56:56 +1100 Subject: [NET]: Fix race condition in sk_stream_wait_connect When sk_stream_wait_connect detects a state transition to ESTABLISHED or CLOSE_WAIT prior to it going to sleep, it will return without calling finish_wait and decrementing sk_write_pending. This may result in crashes and other unintended behaviour. The fix is to always call finish_wait and update sk_write_pending since it is safe to do so even if the wait entry is no longer on the queue. This bug was tracked down with the help of Alex Sidorenko and the fix is also based on his suggestion. 
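The shape of the fix can be shown with a small userspace simulation. Everything below is hypothetical: struct fake_sock, struct wait_entry and the *_sim helpers stand in for struct sock, the wait-queue entry, prepare_to_wait(), finish_wait() and sk_wait_event(), and the socket is assumed to reach the awaited state on its third poll. The old loop leaves through break before the cleanup, so sk_write_pending stays incremented and the entry stays queued; the do/while version runs the cleanup on every pass.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the struct sock fields involved. */
struct fake_sock {
	int write_pending;  /* like sk->sk_write_pending */
	int polls_left;     /* polls until the connection is "established" */
};

struct wait_entry {
	bool queued;        /* is the entry on the wait queue? */
};

static void prepare_to_wait_sim(struct wait_entry *w) { w->queued = true; }

/* Like finish_wait(): safe whether or not the entry is still queued. */
static void finish_wait_sim(struct wait_entry *w) { w->queued = false; }

/* Like sk_wait_event(): true once the awaited state has been reached. */
static bool wait_event_sim(struct fake_sock *sk)
{
	if (sk->polls_left > 0)
		sk->polls_left--;
	return sk->polls_left == 0;
}

/* Old shape: the successful exit skips the cleanup. */
static void wait_connect_old(struct fake_sock *sk, struct wait_entry *w)
{
	while (1) {
		prepare_to_wait_sim(w);
		sk->write_pending++;
		if (wait_event_sim(sk))
			break;          /* leaks write_pending++ and the queued entry */
		finish_wait_sim(w);
		sk->write_pending--;
	}
}

/* Fixed shape: cleanup runs on every pass, including the last one. */
static void wait_connect_new(struct fake_sock *sk, struct wait_entry *w)
{
	bool done;

	do {
		prepare_to_wait_sim(w);
		sk->write_pending++;
		done = wait_event_sim(sk);
		finish_wait_sim(w);
		sk->write_pending--;
	} while (!done);

	assert(!w->queued);
}
```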
Signed-off-by: Herbert Xu Signed-off-by: Arnaldo Carvalho de Melo --- net/core/stream.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'net') diff --git a/net/core/stream.c b/net/core/stream.c index ac9edfdf874..15bfd03e802 100644 --- a/net/core/stream.c +++ b/net/core/stream.c @@ -52,8 +52,9 @@ int sk_stream_wait_connect(struct sock *sk, long *timeo_p) { struct task_struct *tsk = current; DEFINE_WAIT(wait); + int done; - while (1) { + do { if (sk->sk_err) return sock_error(sk); if ((1 << sk->sk_state) & ~(TCPF_SYN_SENT | TCPF_SYN_RECV)) @@ -65,13 +66,12 @@ int sk_stream_wait_connect(struct sock *sk, long *timeo_p) prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE); sk->sk_write_pending++; - if (sk_wait_event(sk, timeo_p, - !((1 << sk->sk_state) & - ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)))) - break; + done = sk_wait_event(sk, timeo_p, + !((1 << sk->sk_state) & + ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT))); finish_wait(sk->sk_sleep, &wait); sk->sk_write_pending--; - } + } while (!done); return 0; } -- cgit v1.2.3-18-g5258 From 6df716340da3a6fdd33d73d7ed4c6f7590ca1c42 Mon Sep 17 00:00:00 2001 From: Stephen Hemminger Date: Thu, 3 Nov 2005 16:33:23 -0800 Subject: [TCP/DCCP]: Randomize port selection This patch randomizes the port selected on bind() for connections to help with possible security attacks. It should also be faster in most cases because there is no need for a global lock. 
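A sketch of the new search order, assuming a caller-supplied predicate in place of the bind-hash bucket lookup and C's rand() in place of net_random(); pick_local_port, port_in_use_fn, struct used_ports and port_in_table are hypothetical names. Like the patch, it picks a random start with rand() % (high - low) + low, then walks linearly with wraparound, so each of the high - low + 1 ports is tried at most once. This start formula never yields high itself and would divide by zero if low == high, so the sketch asserts low < high.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical predicate standing in for the bind-hash bucket lookup. */
typedef bool (*port_in_use_fn)(int port, void *ctx);

/* Example context: a small table of ports that are already bound. */
struct used_ports {
	const int *ports;
	size_t n;
};

static bool port_in_table(int port, void *ctx)
{
	const struct used_ports *u = ctx;

	for (size_t i = 0; i < u->n; i++)
		if (u->ports[i] == port)
			return true;
	return false;
}

/* Returns a free port in [low, high], or -1 if the whole range is taken. */
static int pick_local_port(int low, int high, port_in_use_fn in_use, void *ctx)
{
	int remaining = (high - low) + 1;
	int rover;

	assert(low < high);   /* the start formula divides by (high - low) */
	rover = rand() % (high - low) + low;   /* random start, as in the patch */

	do {
		if (!in_use(rover, ctx))
			return rover;
		if (++rover > high)   /* wrap around to the bottom of the range */
			rover = low;
	} while (--remaining > 0);

	return -1;
}
```

No shared rover or global lock is needed: concurrent callers simply start at independent random points.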
Signed-off-by: Stephen Hemminger Signed-off-by: Arnaldo Carvalho de Melo --- net/dccp/ipv4.c | 32 +++----------------------------- net/ipv4/inet_connection_sock.c | 14 +++----------- net/ipv4/tcp.c | 1 - net/ipv4/tcp_ipv4.c | 2 -- net/ipv6/tcp_ipv6.c | 15 ++++----------- 5 files changed, 10 insertions(+), 54 deletions(-) (limited to 'net') diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c index 6298cf58ff9..4b9bc81ae1a 100644 --- a/net/dccp/ipv4.c +++ b/net/dccp/ipv4.c @@ -31,8 +31,6 @@ struct inet_hashinfo __cacheline_aligned dccp_hashinfo = { .lhash_lock = RW_LOCK_UNLOCKED, .lhash_users = ATOMIC_INIT(0), .lhash_wait = __WAIT_QUEUE_HEAD_INITIALIZER(dccp_hashinfo.lhash_wait), - .portalloc_lock = SPIN_LOCK_UNLOCKED, - .port_rover = 1024 - 1, }; EXPORT_SYMBOL_GPL(dccp_hashinfo); @@ -125,36 +123,15 @@ static int dccp_v4_hash_connect(struct sock *sk) int ret; if (snum == 0) { - int rover; int low = sysctl_local_port_range[0]; int high = sysctl_local_port_range[1]; int remaining = (high - low) + 1; + int rover = net_random() % (high - low) + low; struct hlist_node *node; struct inet_timewait_sock *tw = NULL; local_bh_disable(); - - /* TODO. Actually it is not so bad idea to remove - * dccp_hashinfo.portalloc_lock before next submission to - * Linus. - * As soon as we touch this place at all it is time to think. - * - * Now it protects single _advisory_ variable - * dccp_hashinfo.port_rover, hence it is mostly useless. - * Code will work nicely if we just delete it, but - * I am afraid in contented case it will work not better or - * even worse: another cpu just will hit the same bucket - * and spin there. - * So some cpu salt could remove both contention and - * memory pingpong. Any ideas how to do this in a nice way? 
- */ - spin_lock(&dccp_hashinfo.portalloc_lock); - rover = dccp_hashinfo.port_rover; - do { - rover++; - if ((rover < low) || (rover > high)) - rover = low; head = &dccp_hashinfo.bhash[inet_bhashfn(rover, dccp_hashinfo.bhash_size)]; spin_lock(&head->lock); @@ -187,9 +164,9 @@ static int dccp_v4_hash_connect(struct sock *sk) next_port: spin_unlock(&head->lock); + if (++rover > high) + rover = low; } while (--remaining > 0); - dccp_hashinfo.port_rover = rover; - spin_unlock(&dccp_hashinfo.portalloc_lock); local_bh_enable(); @@ -197,9 +174,6 @@ static int dccp_v4_hash_connect(struct sock *sk) ok: /* All locks still held and bhs disabled */ - dccp_hashinfo.port_rover = rover; - spin_unlock(&dccp_hashinfo.portalloc_lock); - inet_bind_hash(sk, tb, rover); if (sk_unhashed(sk)) { inet_sk(sk)->sport = htons(rover); diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 94468a76c5b..3fe021f1a56 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -78,17 +78,9 @@ int inet_csk_get_port(struct inet_hashinfo *hashinfo, int low = sysctl_local_port_range[0]; int high = sysctl_local_port_range[1]; int remaining = (high - low) + 1; - int rover; + int rover = net_random() % (high - low) + low; - spin_lock(&hashinfo->portalloc_lock); - if (hashinfo->port_rover < low) - rover = low; - else - rover = hashinfo->port_rover; do { - rover++; - if (rover > high) - rover = low; head = &hashinfo->bhash[inet_bhashfn(rover, hashinfo->bhash_size)]; spin_lock(&head->lock); inet_bind_bucket_for_each(tb, node, &head->chain) @@ -97,9 +89,9 @@ int inet_csk_get_port(struct inet_hashinfo *hashinfo, break; next: spin_unlock(&head->lock); + if (++rover > high) + rover = low; } while (--remaining > 0); - hashinfo->port_rover = rover; - spin_unlock(&hashinfo->portalloc_lock); /* Exhausted local port range during search? 
It is not * possible for us to be holding one of the bind hash diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index f3f0013a958..72b7c22e1ea 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2112,7 +2112,6 @@ void __init tcp_init(void) sysctl_tcp_max_orphans >>= (3 - order); sysctl_max_syn_backlog = 128; } - tcp_hashinfo.port_rover = sysctl_local_port_range[0] - 1; sysctl_tcp_mem[0] = 768 << order; sysctl_tcp_mem[1] = 1024 << order; diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index c85819d8474..49d67cd75ed 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -93,8 +93,6 @@ struct inet_hashinfo __cacheline_aligned tcp_hashinfo = { .lhash_lock = RW_LOCK_UNLOCKED, .lhash_users = ATOMIC_INIT(0), .lhash_wait = __WAIT_QUEUE_HEAD_INITIALIZER(tcp_hashinfo.lhash_wait), - .portalloc_lock = SPIN_LOCK_UNLOCKED, - .port_rover = 1024 - 1, }; static int tcp_v4_get_port(struct sock *sk, unsigned short snum) diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index d693cb988b7..d746d3b27ef 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -114,16 +114,9 @@ static int tcp_v6_get_port(struct sock *sk, unsigned short snum) int low = sysctl_local_port_range[0]; int high = sysctl_local_port_range[1]; int remaining = (high - low) + 1; - int rover; + int rover = net_random() % (high - low) + low; - spin_lock(&tcp_hashinfo.portalloc_lock); - if (tcp_hashinfo.port_rover < low) - rover = low; - else - rover = tcp_hashinfo.port_rover; - do { rover++; - if (rover > high) - rover = low; + do { head = &tcp_hashinfo.bhash[inet_bhashfn(rover, tcp_hashinfo.bhash_size)]; spin_lock(&head->lock); inet_bind_bucket_for_each(tb, node, &head->chain) @@ -132,9 +125,9 @@ static int tcp_v6_get_port(struct sock *sk, unsigned short snum) break; next: spin_unlock(&head->lock); + if (++rover > high) + rover = low; } while (--remaining > 0); - tcp_hashinfo.port_rover = rover; - spin_unlock(&tcp_hashinfo.portalloc_lock); /* Exhausted local port range during search? 
It is not * possible for us to be holding one of the bind hash -- cgit v1.2.3-18-g5258 From a10b5aacea01d59152b9d003a14476ee99d394d8 Mon Sep 17 00:00:00 2001 From: Jeff Garzik Date: Sat, 5 Nov 2005 23:39:54 -0500 Subject: Remove linux/version.h include from drivers/net/phy/* and net/ieee80211/*. Unused, and causes the files to be needlessly rebuilt in some cases. --- net/ieee80211/ieee80211_crypt.c | 1 - net/ieee80211/ieee80211_crypt_ccmp.c | 1 - net/ieee80211/ieee80211_crypt_tkip.c | 1 - net/ieee80211/ieee80211_crypt_wep.c | 1 - net/ieee80211/ieee80211_geo.c | 1 - net/ieee80211/ieee80211_module.c | 1 - net/ieee80211/ieee80211_rx.c | 1 - net/ieee80211/ieee80211_tx.c | 1 - 8 files changed, 8 deletions(-) (limited to 'net') diff --git a/net/ieee80211/ieee80211_crypt.c b/net/ieee80211/ieee80211_crypt.c index f3b6aa3be63..20cc580a07e 100644 --- a/net/ieee80211/ieee80211_crypt.c +++ b/net/ieee80211/ieee80211_crypt.c @@ -12,7 +12,6 @@ */ #include -#include <linux/version.h> #include #include #include diff --git a/net/ieee80211/ieee80211_crypt_ccmp.c b/net/ieee80211/ieee80211_crypt_ccmp.c index 05a853c1301..47022172850 100644 --- a/net/ieee80211/ieee80211_crypt_ccmp.c +++ b/net/ieee80211/ieee80211_crypt_ccmp.c @@ -10,7 +10,6 @@ */ #include -#include <linux/version.h> #include #include #include diff --git a/net/ieee80211/ieee80211_crypt_tkip.c b/net/ieee80211/ieee80211_crypt_tkip.c index 2e34f29b795..e0988320efb 100644 --- a/net/ieee80211/ieee80211_crypt_tkip.c +++ b/net/ieee80211/ieee80211_crypt_tkip.c @@ -10,7 +10,6 @@ */ #include -#include <linux/version.h> #include #include #include diff --git a/net/ieee80211/ieee80211_crypt_wep.c b/net/ieee80211/ieee80211_crypt_wep.c index 7c08ed2f262..073aebdf0f6 100644 --- a/net/ieee80211/ieee80211_crypt_wep.c +++ b/net/ieee80211/ieee80211_crypt_wep.c @@ -10,7 +10,6 @@ */ #include -#include <linux/version.h> #include #include #include diff --git a/net/ieee80211/ieee80211_geo.c b/net/ieee80211/ieee80211_geo.c index 
c4b54ef8f6d..610cc5cbc25 100644 --- a/net/ieee80211/ieee80211_geo.c +++
b/net/ieee80211/ieee80211_geo.c @@ -38,7 +38,6 @@ #include #include #include -#include <linux/version.h> #include #include #include diff --git a/net/ieee80211/ieee80211_module.c b/net/ieee80211/ieee80211_module.c index f66d792cd20..321287bc887 100644 --- a/net/ieee80211/ieee80211_module.c +++ b/net/ieee80211/ieee80211_module.c @@ -45,7 +45,6 @@ #include #include #include -#include <linux/version.h> #include #include #include diff --git a/net/ieee80211/ieee80211_rx.c b/net/ieee80211/ieee80211_rx.c index ce694cf5c16..6ad88218f57 100644 --- a/net/ieee80211/ieee80211_rx.c +++ b/net/ieee80211/ieee80211_rx.c @@ -28,7 +28,6 @@ #include #include #include -#include <linux/version.h> #include #include #include diff --git a/net/ieee80211/ieee80211_tx.c b/net/ieee80211/ieee80211_tx.c index 95ccbadbf55..445f206e65e 100644 --- a/net/ieee80211/ieee80211_tx.c +++ b/net/ieee80211/ieee80211_tx.c @@ -38,7 +38,6 @@ #include #include #include -#include <linux/version.h> #include #include #include -- cgit v1.2.3-18-g5258 From 80d188a643b0f550a2aaedf7bf4dd1abd86cfc45 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 7 Nov 2005 01:00:27 -0800 Subject: [PATCH] knfsd: make sure svc_process call the correct pg_authenticate for multi-service port If an RPC socket is serving multiple programs, then the pg_authenticate of the first program in the list is called, instead of pg_authenticate for the program to be run. This does not cause a problem with any programs in the current kernel, but could confuse future code. Also set pg_authenticate for nfsd_acl_program in case it ever gets used.
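The ordering the patch establishes (resolve the program first, then call that program's own authentication hook) can be sketched as follows. struct program, find_program() and dispatch() are simplified, hypothetical stand-ins for struct svc_program and the relevant part of svc_process(); the generic svc_authenticate() step that runs before the per-program hook is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct svc_program. */
struct program {
	unsigned int prog;               /* RPC program number */
	int (*authenticate)(void *req);  /* per-program hook; 0 means accept */
	struct program *next;
};

enum { DISPATCH_OK, DISPATCH_DENIED, DISPATCH_BAD_PROG };

static struct program *find_program(struct program *list, unsigned int prog)
{
	struct program *p;

	for (p = list; p; p = p->next)
		if (p->prog == prog)
			break;
	return p;   /* NULL if no program on this service matches */
}

/* Resolve the target program first, so that its hook is the one called
 * rather than the hook of whichever program heads the list. */
static int dispatch(struct program *list, unsigned int prog, void *req)
{
	struct program *p = find_program(list, prog);

	if (p == NULL)
		return DISPATCH_BAD_PROG;
	assert(p->authenticate != NULL);   /* every program needs a hook */
	return p->authenticate(req) == 0 ? DISPATCH_OK : DISPATCH_DENIED;
}

/* Two example hooks. */
static int accept_all(void *req) { (void)req; return 0; }
static int reject_all(void *req) { (void)req; return -1; }
```

With the pre-patch ordering, a call for the second program in the list would have been checked by the first program's hook instead of its own.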
Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- net/sunrpc/svc.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) (limited to 'net') diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index e9bd91265f7..5a220b2bb37 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -313,6 +313,11 @@ svc_process(struct svc_serv *serv, struct svc_rqst *rqstp) rqstp->rq_proc = proc = ntohl(svc_getu32(argv)); /* procedure number */ progp = serv->sv_program; + + for (progp = serv->sv_program; progp; progp = progp->pg_next) + if (prog == progp->pg_prog) + break; + /* * Decode auth data, and add verifier to reply buffer. * We do this before anything else in order to get a decent @@ -320,7 +325,7 @@ svc_process(struct svc_serv *serv, struct svc_rqst *rqstp) */ auth_res = svc_authenticate(rqstp, &auth_stat); /* Also give the program a chance to reject this call: */ - if (auth_res == SVC_OK) { + if (auth_res == SVC_OK && progp) { auth_stat = rpc_autherr_badcred; auth_res = progp->pg_authenticate(rqstp); } @@ -340,10 +345,7 @@ svc_process(struct svc_serv *serv, struct svc_rqst *rqstp) case SVC_COMPLETE: goto sendit; } - - for (progp = serv->sv_program; progp; progp = progp->pg_next) - if (prog == progp->pg_prog) - break; + if (progp == NULL) goto err_bad_prog; -- cgit v1.2.3-18-g5258 From 81f875208e7f46d003bedb82d5cfe54458a3ab60 Mon Sep 17 00:00:00 2001 From: James Ketrenos Date: Mon, 24 Oct 2005 10:20:53 -0500 Subject: scripts/Lindent on ieee80211 subsystem. 
Signed-off-by: James Ketrenos --- net/ieee80211/ieee80211_wx.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) (limited to 'net') diff --git a/net/ieee80211/ieee80211_wx.c b/net/ieee80211/ieee80211_wx.c index 1ce7af9bec3..f5b80535500 100644 --- a/net/ieee80211/ieee80211_wx.c +++ b/net/ieee80211/ieee80211_wx.c @@ -161,9 +161,11 @@ static inline char *ipw2100_translate_scan(struct ieee80211_device *ieee, (ieee->perfect_rssi - ieee->worst_rssi) - (ieee->perfect_rssi - network->stats.rssi) * (15 * (ieee->perfect_rssi - ieee->worst_rssi) + - 62 * (ieee->perfect_rssi - network->stats.rssi))) / - ((ieee->perfect_rssi - ieee->worst_rssi) * - (ieee->perfect_rssi - ieee->worst_rssi)); + 62 * (ieee->perfect_rssi - + network->stats.rssi))) / + ((ieee->perfect_rssi - + ieee->worst_rssi) * (ieee->perfect_rssi - + ieee->worst_rssi)); if (iwe.u.qual.qual > 100) iwe.u.qual.qual = 100; else if (iwe.u.qual.qual < 1) -- cgit v1.2.3-18-g5258 From e189277a3f1cbb0f1282e0f4b8fa8c91e004c286 Mon Sep 17 00:00:00 2001 From: Volker Braun Date: Mon, 24 Oct 2005 10:15:36 -0500 Subject: Fix problem with WEP unicast key > index 0 The functions ieee80211_wx_{get,set}_encodeext fail if one tries to set unicast (IW_ENCODE_EXT_GROUP_KEY not set) keys at key indices>0. But at least some Cisco APs dish out dynamic WEP unicast keys at index !=0. 
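A simplified sketch of the unicast key-index rule after this patch, assuming four key slots numbered 0 to 3 and ignoring the infrastructure-mode and crypt-table details of the real functions; check_key_index, KEY_FLAG_GROUP, NUM_KEY_SLOTS and the ALG_* values are hypothetical stand-ins for the wireless-extensions constants. One detail worth noting when reading the get path: !ext->ext_flags & IW_ENCODE_EXT_GROUP_KEY is parsed by C as (!ext->ext_flags) & IW_ENCODE_EXT_GROUP_KEY, because ! binds more tightly than &. The sketch spells out the intended test with parentheses.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for IW_ENCODE_ALG_* and IW_ENCODE_EXT_GROUP_KEY. */
enum key_alg { ALG_NONE, ALG_WEP, ALG_TKIP, ALG_CCMP };
#define KEY_FLAG_GROUP 0x4u
#define NUM_KEY_SLOTS  4   /* assumed to match WEP_KEYS */

/* Returns 0 if a key may be installed at slot idx (0-based), else -EINVAL. */
static int check_key_index(int idx, unsigned int ext_flags, enum key_alg alg)
{
	if (idx < 0 || idx >= NUM_KEY_SLOTS)
		return -EINVAL;

	/* Unicast key: note the parentheses. Written without them, as
	 * "!ext_flags & KEY_FLAG_GROUP", C parses it as
	 * (!ext_flags) & KEY_FLAG_GROUP instead. */
	if (!(ext_flags & KEY_FLAG_GROUP)) {
		/* Some Cisco APs hand out dynamic WEP unicast keys at
		 * idx > 0, so WEP is exempt from the slot-0 rule. */
		if (idx != 0 && alg != ALG_WEP)
			return -EINVAL;
	}
	return 0;
}
```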
Signed-off-by: Volker Braun Signed-off-by: James Ketrenos --- net/ieee80211/ieee80211_wx.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/ieee80211/ieee80211_wx.c b/net/ieee80211/ieee80211_wx.c index f5b80535500..181755f2aa8 100644 --- a/net/ieee80211/ieee80211_wx.c +++ b/net/ieee80211/ieee80211_wx.c @@ -522,7 +522,8 @@ int ieee80211_wx_set_encodeext(struct ieee80211_device *ieee, crypt = &ieee->crypt[idx]; group_key = 1; } else { - if (idx != 0) + /* some Cisco APs use idx>0 for unicast in dynamic WEP */ + if (idx != 0 && ext->alg != IW_ENCODE_ALG_WEP) return -EINVAL; if (ieee->iw_mode == IW_MODE_INFRA) crypt = &ieee->crypt[idx]; @@ -690,7 +691,8 @@ int ieee80211_wx_get_encodeext(struct ieee80211_device *ieee, } else idx = ieee->tx_keyidx; - if (!ext->ext_flags & IW_ENCODE_EXT_GROUP_KEY) + if (!ext->ext_flags & IW_ENCODE_EXT_GROUP_KEY && + ext->alg != IW_ENCODE_ALG_WEP) if (idx != 0 || ieee->iw_mode != IW_MODE_INFRA) return -EINVAL; -- cgit v1.2.3-18-g5258 From fd7a516efbcdabf5d7b9307ca9fe48b511b7d123 Mon Sep 17 00:00:00 2001 From: Adrian Bunk Date: Wed, 2 Nov 2005 01:53:16 +0100 Subject: [PATCH] fix NET_RADIO=n, IEEE80211=y compile This patch fixes the following compile error with CONFIG_NET_RADIO=n and CONFIG_IEEE80211=y: LD .tmp_vmlinux1 net/built-in.o: In function `ieee80211_rx': : undefined reference to `wireless_spy_update' make: *** [.tmp_vmlinux1] Error 1 Signed-off-by: Adrian Bunk Signed-off-by: John W. Linville --- net/ieee80211/ieee80211_rx.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'net') diff --git a/net/ieee80211/ieee80211_rx.c b/net/ieee80211/ieee80211_rx.c index ce694cf5c16..00eb780836d 100644 --- a/net/ieee80211/ieee80211_rx.c +++ b/net/ieee80211/ieee80211_rx.c @@ -370,6 +370,7 @@ int ieee80211_rx(struct ieee80211_device *ieee, struct sk_buff *skb, /* Put this code here so that we avoid duplicating it in all * Rx paths. 
- Jean II */ #ifdef IW_WIRELESS_SPY /* defined in iw_handler.h */ +#ifdef CONFIG_NET_RADIO /* If spy monitoring on */ if (ieee->spy_data.spy_number > 0) { struct iw_quality wstats; @@ -396,6 +397,7 @@ int ieee80211_rx(struct ieee80211_device *ieee, struct sk_buff *skb, /* Update spy records */ wireless_spy_update(ieee->dev, hdr->addr2, &wstats); } +#endif /* CONFIG_NET_RADIO */ #endif /* IW_WIRELESS_SPY */ #ifdef NOT_YET -- cgit v1.2.3-18-g5258 From 971f359ddcb2e7a0d577479c7561bda407febe1b Mon Sep 17 00:00:00 2001 From: YOSHIFUJI Hideaki Date: Tue, 8 Nov 2005 09:37:56 -0800 Subject: [IPV6]: Put addr_diff() into common header for future use. Signed-off-by: YOSHIFUJI Hideaki Signed-off-by: David S. Miller --- net/ipv6/ip6_fib.c | 54 ++---------------------------------------------------- 1 file changed, 2 insertions(+), 52 deletions(-) (limited to 'net') diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 4fcc5a7acf6..1bf6d9a769e 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -127,56 +127,6 @@ static __inline__ int addr_bit_set(void *token, int fn_bit) return htonl(1 << ((~fn_bit)&0x1F)) & addr[fn_bit>>5]; } -/* - * find the first different bit between two addresses - * length of address must be a multiple of 32bits - */ - -static __inline__ int addr_diff(void *token1, void *token2, int addrlen) -{ - __u32 *a1 = token1; - __u32 *a2 = token2; - int i; - - addrlen >>= 2; - - for (i = 0; i < addrlen; i++) { - __u32 xb; - - xb = a1[i] ^ a2[i]; - - if (xb) { - int j = 31; - - xb = ntohl(xb); - - while ((xb & (1 << j)) == 0) - j--; - - return (i * 32 + 31 - j); - } - } - - /* - * we should *never* get to this point since that - * would mean the addrs are equal - * - * However, we do get to it 8) And exacly, when - * addresses are equal 8) - * - * ip route add 1111::/128 via ... - * ip route add 1111::/64 via ... - * and we are here. - * - * Ideally, this function should stop comparison - * at prefix length. 
It does not, but it is still OK, - * if returned value is greater than prefix length. - * --ANK (980803) - */ - - return addrlen<<5; -} - static __inline__ struct fib6_node * node_alloc(void) { struct fib6_node *fn; @@ -296,11 +246,11 @@ insert_above: /* find 1st bit in difference between the 2 addrs. - See comment in addr_diff: bit may be an invalid value, + See comment in __ipv6_addr_diff: bit may be an invalid value, but if it is >= plen, the value is ignored in any case. */ - bit = addr_diff(addr, &key->addr, addrlen); + bit = __ipv6_addr_diff(addr, &key->addr, addrlen); /* * (intermediate)[in] -- cgit v1.2.3-18-g5258 From b1cacb6820e0afc4aeeea67bcb5296a316862cad Mon Sep 17 00:00:00 2001 From: YOSHIFUJI Hideaki Date: Tue, 8 Nov 2005 09:38:12 -0800 Subject: [IPV6]: Make ipv6_addr_type() more generic so that we can use it for source address selection. Signed-off-by: YOSHIFUJI Hideaki Signed-off-by: David S. Miller --- net/ipv6/addrconf.c | 90 +++++++++++++++++++++++++++------------------------- net/ipv6/ipv6_syms.c | 2 +- 2 files changed, 48 insertions(+), 44 deletions(-) (limited to 'net') diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 2c5f57299d6..ff895da6395 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -35,6 +35,9 @@ * YOSHIFUJI Hideaki @USAGI : ARCnet support * YOSHIFUJI Hideaki @USAGI : convert /proc/net/if_inet6 to * seq_file. + * YOSHIFUJI Hideaki @USAGI : improved source address + * selection; consider scope, + * status etc. 
*/ #include @@ -193,46 +196,51 @@ const struct in6_addr in6addr_any = IN6ADDR_ANY_INIT; #endif const struct in6_addr in6addr_loopback = IN6ADDR_LOOPBACK_INIT; -int ipv6_addr_type(const struct in6_addr *addr) +#define IPV6_ADDR_SCOPE_TYPE(scope) ((scope) << 16) + +static inline unsigned ipv6_addr_scope2type(unsigned scope) +{ + switch(scope) { + case IPV6_ADDR_SCOPE_NODELOCAL: + return (IPV6_ADDR_SCOPE_TYPE(IPV6_ADDR_SCOPE_NODELOCAL) | + IPV6_ADDR_LOOPBACK); + case IPV6_ADDR_SCOPE_LINKLOCAL: + return (IPV6_ADDR_SCOPE_TYPE(IPV6_ADDR_SCOPE_LINKLOCAL) | + IPV6_ADDR_LINKLOCAL); + case IPV6_ADDR_SCOPE_SITELOCAL: + return (IPV6_ADDR_SCOPE_TYPE(IPV6_ADDR_SCOPE_SITELOCAL) | + IPV6_ADDR_SITELOCAL); + } + return IPV6_ADDR_SCOPE_TYPE(scope); +} + +int __ipv6_addr_type(const struct in6_addr *addr) { - int type; u32 st; st = addr->s6_addr32[0]; - if ((st & htonl(0xFF000000)) == htonl(0xFF000000)) { - type = IPV6_ADDR_MULTICAST; - - switch((st & htonl(0x00FF0000))) { - case __constant_htonl(0x00010000): - type |= IPV6_ADDR_LOOPBACK; - break; - - case __constant_htonl(0x00020000): - type |= IPV6_ADDR_LINKLOCAL; - break; - - case __constant_htonl(0x00050000): - type |= IPV6_ADDR_SITELOCAL; - break; - }; - return type; - } - - type = IPV6_ADDR_UNICAST; - /* Consider all addresses with the first three bits different of - 000 and 111 as finished. + 000 and 111 as unicasts. 
*/ if ((st & htonl(0xE0000000)) != htonl(0x00000000) && (st & htonl(0xE0000000)) != htonl(0xE0000000)) - return type; - - if ((st & htonl(0xFFC00000)) == htonl(0xFE800000)) - return (IPV6_ADDR_LINKLOCAL | type); + return (IPV6_ADDR_UNICAST | + IPV6_ADDR_SCOPE_TYPE(IPV6_ADDR_SCOPE_GLOBAL)); + if ((st & htonl(0xFF000000)) == htonl(0xFF000000)) { + /* multicast */ + /* addr-select 3.1 */ + return (IPV6_ADDR_MULTICAST | + ipv6_addr_scope2type(IPV6_ADDR_MC_SCOPE(addr))); + } + + if ((st & htonl(0xFFC00000)) == htonl(0xFE800000)) + return (IPV6_ADDR_LINKLOCAL | IPV6_ADDR_UNICAST | + IPV6_ADDR_SCOPE_TYPE(IPV6_ADDR_SCOPE_LINKLOCAL)); /* addr-select 3.1 */ if ((st & htonl(0xFFC00000)) == htonl(0xFEC00000)) - return (IPV6_ADDR_SITELOCAL | type); + return (IPV6_ADDR_SITELOCAL | IPV6_ADDR_UNICAST | + IPV6_ADDR_SCOPE_TYPE(IPV6_ADDR_SCOPE_SITELOCAL)); /* addr-select 3.1 */ if ((addr->s6_addr32[0] | addr->s6_addr32[1]) == 0) { if (addr->s6_addr32[2] == 0) { @@ -240,24 +248,20 @@ int ipv6_addr_type(const struct in6_addr *addr) return IPV6_ADDR_ANY; if (addr->s6_addr32[3] == htonl(0x00000001)) - return (IPV6_ADDR_LOOPBACK | type); + return (IPV6_ADDR_LOOPBACK | IPV6_ADDR_UNICAST | + IPV6_ADDR_SCOPE_TYPE(IPV6_ADDR_SCOPE_LINKLOCAL)); /* addr-select 3.4 */ - return (IPV6_ADDR_COMPATv4 | type); + return (IPV6_ADDR_COMPATv4 | IPV6_ADDR_UNICAST | + IPV6_ADDR_SCOPE_TYPE(IPV6_ADDR_SCOPE_GLOBAL)); /* addr-select 3.3 */ } if (addr->s6_addr32[2] == htonl(0x0000ffff)) - return IPV6_ADDR_MAPPED; - } - - st &= htonl(0xFF000000); - if (st == 0) - return IPV6_ADDR_RESERVED; - st &= htonl(0xFE000000); - if (st == htonl(0x02000000)) - return IPV6_ADDR_RESERVED; /* for NSAP */ - if (st == htonl(0x04000000)) - return IPV6_ADDR_RESERVED; /* for IPX */ - return type; + return (IPV6_ADDR_MAPPED | + IPV6_ADDR_SCOPE_TYPE(IPV6_ADDR_SCOPE_GLOBAL)); /* addr-select 3.3 */ + } + + return (IPV6_ADDR_RESERVED | + IPV6_ADDR_SCOPE_TYPE(IPV6_ADDR_SCOPE_GLOBAL)); /* addr-select 3.4 */ } static void 
addrconf_del_timer(struct inet6_ifaddr *ifp) diff --git a/net/ipv6/ipv6_syms.c b/net/ipv6/ipv6_syms.c index 37a4a99c9fe..16482785bdf 100644 --- a/net/ipv6/ipv6_syms.c +++ b/net/ipv6/ipv6_syms.c @@ -7,7 +7,7 @@ #include #include -EXPORT_SYMBOL(ipv6_addr_type); +EXPORT_SYMBOL(__ipv6_addr_type); EXPORT_SYMBOL(icmpv6_send); EXPORT_SYMBOL(icmpv6_statistics); EXPORT_SYMBOL(icmpv6_err_convert); -- cgit v1.2.3-18-g5258 From 072047e4de3800905e09d0f8ef0e1cc4e91a601e Mon Sep 17 00:00:00 2001 From: YOSHIFUJI Hideaki Date: Tue, 8 Nov 2005 09:38:30 -0800 Subject: [IPV6]: RFC3484 compliant source address selection Choose more appropriate source address; e.g. - outgoing interface - non-deprecated - scope - matching label Signed-off-by: YOSHIFUJI Hideaki Signed-off-by: David S. Miller --- net/ipv6/addrconf.c | 344 ++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 240 insertions(+), 104 deletions(-) (limited to 'net') diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index ff895da6395..a34d1504deb 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -809,138 +809,274 @@ out: #endif /* - * Choose an appropriate source address - * should do: - * i) get an address with an appropriate scope - * ii) see if there is a specific route for the destination and use - * an address of the attached interface - * iii) don't use deprecated addresses + * Choose an appropriate source address (RFC3484) */ -static int inline ipv6_saddr_pref(const struct inet6_ifaddr *ifp, u8 invpref) +struct ipv6_saddr_score { + int addr_type; + unsigned int attrs; + int matchlen; + unsigned int scope; + unsigned int rule; +}; + +#define IPV6_SADDR_SCORE_LOCAL 0x0001 +#define IPV6_SADDR_SCORE_PREFERRED 0x0004 +#define IPV6_SADDR_SCORE_HOA 0x0008 +#define IPV6_SADDR_SCORE_OIF 0x0010 +#define IPV6_SADDR_SCORE_LABEL 0x0020 +#define IPV6_SADDR_SCORE_PRIVACY 0x0040 + +static int inline ipv6_saddr_preferred(int type) { - int pref; - pref = ifp->flags&IFA_F_DEPRECATED ? 
0 : 2; -#ifdef CONFIG_IPV6_PRIVACY - pref |= (ifp->flags^invpref)&IFA_F_TEMPORARY ? 0 : 1; -#endif - return pref; + if (type & (IPV6_ADDR_MAPPED|IPV6_ADDR_COMPATv4| + IPV6_ADDR_LOOPBACK|IPV6_ADDR_RESERVED)) + return 1; + return 0; } -#ifdef CONFIG_IPV6_PRIVACY -#define IPV6_GET_SADDR_MAXSCORE(score) ((score) == 3) -#else -#define IPV6_GET_SADDR_MAXSCORE(score) (score) -#endif +/* static matching label */ +static int inline ipv6_saddr_label(const struct in6_addr *addr, int type) +{ + /* + * prefix (longest match) label + * ----------------------------- + * ::1/128 0 + * ::/0 1 + * 2002::/16 2 + * ::/96 3 + * ::ffff:0:0/96 4 + */ + if (type & IPV6_ADDR_LOOPBACK) + return 0; + else if (type & IPV6_ADDR_COMPATv4) + return 3; + else if (type & IPV6_ADDR_MAPPED) + return 4; + else if (addr->s6_addr16[0] == htons(0x2002)) + return 2; + return 1; +} -int ipv6_dev_get_saddr(struct net_device *dev, +int ipv6_dev_get_saddr(struct net_device *daddr_dev, struct in6_addr *daddr, struct in6_addr *saddr) { - struct inet6_ifaddr *ifp = NULL; - struct inet6_ifaddr *match = NULL; - struct inet6_dev *idev; - int scope; - int err; - int hiscore = -1, score; + struct ipv6_saddr_score hiscore; + struct inet6_ifaddr *ifa_result = NULL; + int daddr_type = __ipv6_addr_type(daddr); + int daddr_scope = __ipv6_addr_src_scope(daddr_type); + u32 daddr_label = ipv6_saddr_label(daddr, daddr_type); + struct net_device *dev; - scope = ipv6_addr_scope(daddr); + memset(&hiscore, 0, sizeof(hiscore)); - /* - * known dev - * search dev and walk through dev addresses - */ + read_lock(&dev_base_lock); + read_lock(&addrconf_lock); - if (dev) { - if (dev->flags & IFF_LOOPBACK) - scope = IFA_HOST; + for (dev = dev_base; dev; dev=dev->next) { + struct inet6_dev *idev; + struct inet6_ifaddr *ifa; + + /* Rule 0: Candidate Source Address (section 4) + * - multicast and link-local destination address, + * the set of candidate source address MUST only + * include addresses assigned to interfaces + * belonging to 
the same link as the outgoing + * interface. + * (- For site-local destination addresses, the + * set of candidate source addresses MUST only + * include addresses assigned to interfaces + * belonging to the same site as the outgoing + * interface.) + */ + if ((daddr_type & IPV6_ADDR_MULTICAST || + daddr_scope <= IPV6_ADDR_SCOPE_LINKLOCAL) && + daddr_dev && dev != daddr_dev) + continue; - read_lock(&addrconf_lock); idev = __in6_dev_get(dev); - if (idev) { - read_lock_bh(&idev->lock); - for (ifp=idev->addr_list; ifp; ifp=ifp->if_next) { - if (ifp->scope == scope) { - if (ifp->flags&IFA_F_TENTATIVE) - continue; -#ifdef CONFIG_IPV6_PRIVACY - score = ipv6_saddr_pref(ifp, idev->cnf.use_tempaddr > 1 ? IFA_F_TEMPORARY : 0); -#else - score = ipv6_saddr_pref(ifp, 0); -#endif - if (score <= hiscore) - continue; + if (!idev) + continue; - if (match) - in6_ifa_put(match); - match = ifp; - hiscore = score; - in6_ifa_hold(ifp); + read_lock_bh(&idev->lock); + for (ifa = idev->addr_list; ifa; ifa = ifa->if_next) { + struct ipv6_saddr_score score; - if (IPV6_GET_SADDR_MAXSCORE(score)) { - read_unlock_bh(&idev->lock); - read_unlock(&addrconf_lock); - goto out; - } + score.addr_type = __ipv6_addr_type(&ifa->addr); + + /* Rule 0: Candidate Source Address (section 4) + * - In any case, anycast addresses, multicast + * addresses, and the unspecified address MUST + * NOT be included in a candidate set. 
+ */ + if (unlikely(score.addr_type == IPV6_ADDR_ANY || + score.addr_type & IPV6_ADDR_MULTICAST)) { + LIMIT_NETDEBUG(KERN_DEBUG + "ADDRCONF: unspecified / multicast address" + "assigned as unicast address on %s", + dev->name); + continue; + } + + score.attrs = 0; + score.matchlen = 0; + score.scope = 0; + score.rule = 0; + + if (ifa_result == NULL) { + /* record it if the first available entry */ + goto record_it; + } + + /* Rule 1: Prefer same address */ + if (hiscore.rule < 1) { + if (ipv6_addr_equal(&ifa_result->addr, daddr)) + hiscore.attrs |= IPV6_SADDR_SCORE_LOCAL; + hiscore.rule++; + } + if (ipv6_addr_equal(&ifa->addr, daddr)) { + score.attrs |= IPV6_SADDR_SCORE_LOCAL; + if (!(hiscore.attrs & IPV6_SADDR_SCORE_LOCAL)) { + score.rule = 1; + goto record_it; } + } else { + if (hiscore.attrs & IPV6_SADDR_SCORE_LOCAL) + continue; } - read_unlock_bh(&idev->lock); - } - read_unlock(&addrconf_lock); - } - if (scope == IFA_LINK) - goto out; + /* Rule 2: Prefer appropriate scope */ + if (hiscore.rule < 2) { + hiscore.scope = __ipv6_addr_src_scope(hiscore.addr_type); + hiscore.rule++; + } + score.scope = __ipv6_addr_src_scope(score.addr_type); + if (hiscore.scope < score.scope) { + if (hiscore.scope < daddr_scope) { + score.rule = 2; + goto record_it; + } else + continue; + } else if (score.scope < hiscore.scope) { + if (score.scope < daddr_scope) + continue; + else { + score.rule = 2; + goto record_it; + } + } - /* - * dev == NULL or search failed for specified dev - */ + /* Rule 3: Avoid deprecated address */ + if (hiscore.rule < 3) { + if (ipv6_saddr_preferred(hiscore.addr_type) || + !(ifa_result->flags & IFA_F_DEPRECATED)) + hiscore.attrs |= IPV6_SADDR_SCORE_PREFERRED; + hiscore.rule++; + } + if (ipv6_saddr_preferred(score.addr_type) || + !(ifa->flags & IFA_F_DEPRECATED)) { + score.attrs |= IPV6_SADDR_SCORE_PREFERRED; + if (!(hiscore.attrs & IPV6_SADDR_SCORE_PREFERRED)) { + score.rule = 3; + goto record_it; + } + } else { + if (hiscore.attrs & 
IPV6_SADDR_SCORE_PREFERRED) + continue; + } - read_lock(&dev_base_lock); - read_lock(&addrconf_lock); - for (dev = dev_base; dev; dev=dev->next) { - idev = __in6_dev_get(dev); - if (idev) { - read_lock_bh(&idev->lock); - for (ifp=idev->addr_list; ifp; ifp=ifp->if_next) { - if (ifp->scope == scope) { - if (ifp->flags&IFA_F_TENTATIVE) - continue; -#ifdef CONFIG_IPV6_PRIVACY - score = ipv6_saddr_pref(ifp, idev->cnf.use_tempaddr > 1 ? IFA_F_TEMPORARY : 0); -#else - score = ipv6_saddr_pref(ifp, 0); -#endif - if (score <= hiscore) - continue; + /* Rule 4: Prefer home address -- not implemented yet */ - if (match) - in6_ifa_put(match); - match = ifp; - hiscore = score; - in6_ifa_hold(ifp); + /* Rule 5: Prefer outgoing interface */ + if (hiscore.rule < 5) { + if (daddr_dev == NULL || + daddr_dev == ifa_result->idev->dev) + hiscore.attrs |= IPV6_SADDR_SCORE_OIF; + hiscore.rule++; + } + if (daddr_dev == NULL || + daddr_dev == ifa->idev->dev) { + score.attrs |= IPV6_SADDR_SCORE_OIF; + if (!(hiscore.attrs & IPV6_SADDR_SCORE_OIF)) { + score.rule = 5; + goto record_it; + } + } else { + if (hiscore.attrs & IPV6_SADDR_SCORE_OIF) + continue; + } - if (IPV6_GET_SADDR_MAXSCORE(score)) { - read_unlock_bh(&idev->lock); - goto out_unlock_base; - } + /* Rule 6: Prefer matching label */ + if (hiscore.rule < 6) { + if (ipv6_saddr_label(&ifa_result->addr, hiscore.addr_type) == daddr_label) + hiscore.attrs |= IPV6_SADDR_SCORE_LABEL; + hiscore.rule++; + } + if (ipv6_saddr_label(&ifa->addr, score.addr_type) == daddr_label) { + score.attrs |= IPV6_SADDR_SCORE_LABEL; + if (!(hiscore.attrs & IPV6_SADDR_SCORE_LABEL)) { + score.rule = 6; + goto record_it; } + } else { + if (hiscore.attrs & IPV6_SADDR_SCORE_LABEL) + continue; } - read_unlock_bh(&idev->lock); + + /* Rule 7: Prefer public address + * Note: prefer temprary address if use_tempaddr >= 2 + */ + if (hiscore.rule < 7) { + if ((!(ifa_result->flags & IFA_F_TEMPORARY)) ^ + (ifa_result->idev->cnf.use_tempaddr >= 2)) + hiscore.attrs |= 
IPV6_SADDR_SCORE_PRIVACY; + hiscore.rule++; + } + if ((!(ifa->flags & IFA_F_TEMPORARY)) ^ + (ifa->idev->cnf.use_tempaddr >= 2)) { + score.attrs |= IPV6_SADDR_SCORE_PRIVACY; + if (!(hiscore.attrs & IPV6_SADDR_SCORE_PRIVACY)) { + score.rule = 7; + goto record_it; + } + } else { + if (hiscore.attrs & IPV6_SADDR_SCORE_PRIVACY) + continue; + } + + /* Rule 8: Use longest matching prefix */ + if (hiscore.rule < 8) + hiscore.matchlen = ipv6_addr_diff(&ifa_result->addr, daddr); + score.rule++; + score.matchlen = ipv6_addr_diff(&ifa->addr, daddr); + if (score.matchlen > hiscore.matchlen) { + score.rule = 8; + goto record_it; + } +#if 0 + else if (score.matchlen < hiscore.matchlen) + continue; +#endif + + /* Final Rule: choose first available one */ + continue; +record_it: + if (ifa_result) + in6_ifa_put(ifa_result); + in6_ifa_hold(ifa); + ifa_result = ifa; + hiscore = score; } + read_unlock_bh(&idev->lock); } - -out_unlock_base: read_unlock(&addrconf_lock); read_unlock(&dev_base_lock); -out: - err = -EADDRNOTAVAIL; - if (match) { - ipv6_addr_copy(saddr, &match->addr); - err = 0; - in6_ifa_put(match); - } - - return err; + if (!ifa_result) + return -EADDRNOTAVAIL; + + ipv6_addr_copy(saddr, &ifa_result->addr); + in6_ifa_put(ifa_result); + return 0; } -- cgit v1.2.3-18-g5258 From b541ca2c5a3f3f399d6f2ec9da33c1be5a8d8c19 Mon Sep 17 00:00:00 2001 From: Thomas Graf Date: Tue, 8 Nov 2005 09:39:17 -0800 Subject: [PKT_SCHED]: Correctly handle empty ematch trees Fixes an invalid memory reference when the basic classifier is used without any ematches but just actions. Signed-off-by: Thomas Graf Signed-off-by: David S. 
Miller --- net/sched/ematch.c | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'net') diff --git a/net/sched/ematch.c b/net/sched/ematch.c index ebfe2e7d21b..64b047c6556 100644 --- a/net/sched/ematch.c +++ b/net/sched/ematch.c @@ -298,6 +298,11 @@ int tcf_em_tree_validate(struct tcf_proto *tp, struct rtattr *rta, struct tcf_ematch_tree_hdr *tree_hdr; struct tcf_ematch *em; + if (!rta) { + memset(tree, 0, sizeof(*tree)); + return 0; + } + if (rtattr_parse_nested(tb, TCA_EMATCH_TREE_MAX, rta) < 0) goto errout; -- cgit v1.2.3-18-g5258 From dc8103f25fd7cfac2c2b295f33edc10f255b4c80 Mon Sep 17 00:00:00 2001 From: Julian Anastasov Date: Tue, 8 Nov 2005 09:40:05 -0800 Subject: [IPVS]: fix connection leak if expire_nodest_conn=1 There was a fix in 2.6.13 that changed the behaviour of ip_vs_conn_expire_now function not to put reference to connection, its callers should hold write lock or connection refcnt. But we forgot to convert one caller, when the real server for connection is unavailable caller should put the connection reference. It happens only when sysctl var expire_nodest_conn is set to 1 and such connections never expire. Thanks to Roberto Nibali who found the problem and tested a 2.4.32-rc2 patch, which is equal to this 2.6 version. Patch for 2.4 is already sent to Marcelo. Signed-off-by: Julian Anastasov Signed-off-by: Roberto Nibali Signed-off-by: David S. Miller --- net/ipv4/ipvs/ip_vs_core.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) (limited to 'net') diff --git a/net/ipv4/ipvs/ip_vs_core.c b/net/ipv4/ipvs/ip_vs_core.c index 981cc3244ef..1a0843cd58a 100644 --- a/net/ipv4/ipvs/ip_vs_core.c +++ b/net/ipv4/ipvs/ip_vs_core.c @@ -1009,11 +1009,10 @@ ip_vs_in(unsigned int hooknum, struct sk_buff **pskb, if (sysctl_ip_vs_expire_nodest_conn) { /* try to expire the connection immediately */ ip_vs_conn_expire_now(cp); - } else { - /* don't restart its timer, and silently - drop the packet. 
*/ - __ip_vs_conn_put(cp); } + /* don't restart its timer, and silently + drop the packet. */ + __ip_vs_conn_put(cp); return NF_DROP; } -- cgit v1.2.3-18-g5258 From a51482bde22f99c63fbbb57d5d46cc666384e379 Mon Sep 17 00:00:00 2001 From: Jesper Juhl Date: Tue, 8 Nov 2005 09:41:34 -0800 Subject: [NET]: kfree cleanup From: Jesper Juhl This is the net/ part of the big kfree cleanup patch. Remove pointless checks for NULL prior to calling kfree() in net/. Signed-off-by: Jesper Juhl Cc: "David S. Miller" Cc: Arnaldo Carvalho de Melo Acked-by: Marcel Holtmann Acked-by: YOSHIFUJI Hideaki Signed-off-by: Andrew Morton --- net/802/p8023.c | 3 +-- net/ax25/af_ax25.c | 6 ++---- net/ax25/ax25_in.c | 6 ++---- net/ax25/ax25_route.c | 19 ++++++------------- net/bluetooth/hidp/core.c | 4 +--- net/core/dev_mcast.c | 3 +-- net/core/sock.c | 3 +-- net/dccp/ipv4.c | 6 ++---- net/dccp/proto.c | 3 +-- net/decnet/dn_table.c | 14 ++++++-------- net/ethernet/pe2.c | 3 +-- net/ipv4/af_inet.c | 3 +-- net/ipv4/fib_frontend.c | 3 +-- net/ipv4/ip_options.c | 3 +-- net/ipv4/ip_output.c | 12 ++++-------- net/ipv4/ip_sockglue.c | 12 ++++-------- net/ipv4/ipvs/ip_vs_app.c | 6 ++---- net/ipv4/multipath_wrandom.c | 10 +++------- net/ipv4/netfilter/ip_nat_snmp_basic.c | 3 +-- net/ipv4/tcp_ipv4.c | 3 +-- net/ipv6/addrconf.c | 3 +-- net/ipv6/ip6_output.c | 15 +++++---------- net/ipv6/ip6_tunnel.c | 6 ++---- net/ipv6/ipcomp6.c | 3 +-- net/ipv6/ipv6_sockglue.c | 3 +-- net/irda/discovery.c | 3 +-- net/irda/irias_object.c | 16 ++++++---------- net/rose/rose_route.c | 6 ++---- net/sched/cls_fw.c | 3 +-- net/sched/cls_route.c | 3 +-- net/sched/cls_rsvp.h | 3 +-- net/sched/cls_tcindex.c | 9 +++------ net/sched/cls_u32.c | 4 ++-- net/sched/em_meta.c | 3 +-- net/sctp/associola.c | 4 +--- net/sctp/sm_make_chunk.c | 6 ++---- net/sunrpc/auth_gss/gss_krb5_seal.c | 2 +- net/sunrpc/auth_gss/gss_krb5_unseal.c | 2 +- net/sunrpc/auth_gss/gss_mech_switch.c | 3 +-- net/sunrpc/auth_gss/gss_spkm3_seal.c | 3 +-- 
net/sunrpc/auth_gss/gss_spkm3_token.c | 3 +-- net/sunrpc/auth_gss/gss_spkm3_unseal.c | 6 ++---- net/sunrpc/svc.c | 9 +++------ net/sunrpc/xdr.c | 3 +-- net/wanrouter/af_wanpipe.c | 20 +++++++------------- net/wanrouter/wanmain.c | 12 ++++-------- net/xfrm/xfrm_state.c | 12 ++++-------- 47 files changed, 99 insertions(+), 191 deletions(-) (limited to 'net') diff --git a/net/802/p8023.c b/net/802/p8023.c index 6368d3dce44..d23e906456e 100644 --- a/net/802/p8023.c +++ b/net/802/p8023.c @@ -54,8 +54,7 @@ struct datalink_proto *make_8023_client(void) */ void destroy_8023_client(struct datalink_proto *dl) { - if (dl) - kfree(dl); + kfree(dl); } EXPORT_SYMBOL(destroy_8023_client); diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c index 8e37e71e34f..1b683f30265 100644 --- a/net/ax25/af_ax25.c +++ b/net/ax25/af_ax25.c @@ -1138,10 +1138,8 @@ static int ax25_connect(struct socket *sock, struct sockaddr *uaddr, sk->sk_state = TCP_CLOSE; sock->state = SS_UNCONNECTED; - if (ax25->digipeat != NULL) { - kfree(ax25->digipeat); - ax25->digipeat = NULL; - } + kfree(ax25->digipeat); + ax25->digipeat = NULL; /* * Handle digi-peaters to be used. 
diff --git a/net/ax25/ax25_in.c b/net/ax25/ax25_in.c index 73cfc3411c4..4cf87540fb3 100644 --- a/net/ax25/ax25_in.c +++ b/net/ax25/ax25_in.c @@ -401,10 +401,8 @@ static int ax25_rcv(struct sk_buff *skb, struct net_device *dev, } if (dp.ndigi == 0) { - if (ax25->digipeat != NULL) { - kfree(ax25->digipeat); - ax25->digipeat = NULL; - } + kfree(ax25->digipeat); + ax25->digipeat = NULL; } else { /* Reverse the source SABM's path */ memcpy(ax25->digipeat, &reverse_dp, sizeof(ax25_digi)); diff --git a/net/ax25/ax25_route.c b/net/ax25/ax25_route.c index 26b77d97222..b1e945bd6ed 100644 --- a/net/ax25/ax25_route.c +++ b/net/ax25/ax25_route.c @@ -54,15 +54,13 @@ void ax25_rt_device_down(struct net_device *dev) if (s->dev == dev) { if (ax25_route_list == s) { ax25_route_list = s->next; - if (s->digipeat != NULL) - kfree(s->digipeat); + kfree(s->digipeat); kfree(s); } else { for (t = ax25_route_list; t != NULL; t = t->next) { if (t->next == s) { t->next = s->next; - if (s->digipeat != NULL) - kfree(s->digipeat); + kfree(s->digipeat); kfree(s); break; } @@ -90,10 +88,8 @@ static int ax25_rt_add(struct ax25_routes_struct *route) while (ax25_rt != NULL) { if (ax25cmp(&ax25_rt->callsign, &route->dest_addr) == 0 && ax25_rt->dev == ax25_dev->dev) { - if (ax25_rt->digipeat != NULL) { - kfree(ax25_rt->digipeat); - ax25_rt->digipeat = NULL; - } + kfree(ax25_rt->digipeat); + ax25_rt->digipeat = NULL; if (route->digi_count != 0) { if ((ax25_rt->digipeat = kmalloc(sizeof(ax25_digi), GFP_ATOMIC)) == NULL) { write_unlock(&ax25_route_lock); @@ -145,8 +141,7 @@ static int ax25_rt_add(struct ax25_routes_struct *route) static void ax25_rt_destroy(ax25_route *ax25_rt) { if (atomic_read(&ax25_rt->ref) == 0) { - if (ax25_rt->digipeat != NULL) - kfree(ax25_rt->digipeat); + kfree(ax25_rt->digipeat); kfree(ax25_rt); return; } @@ -530,9 +525,7 @@ void __exit ax25_rt_free(void) s = ax25_rt; ax25_rt = ax25_rt->next; - if (s->digipeat != NULL) - kfree(s->digipeat); - + kfree(s->digipeat); kfree(s); } 
write_unlock(&ax25_route_lock); diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c index 860444a7fc0..cdb9cfafd96 100644 --- a/net/bluetooth/hidp/core.c +++ b/net/bluetooth/hidp/core.c @@ -660,9 +660,7 @@ unlink: failed: up_write(&hidp_session_sem); - if (session->input) - kfree(session->input); - + kfree(session->input); kfree(session); return err; } diff --git a/net/core/dev_mcast.c b/net/core/dev_mcast.c index db098ff3cd6..cb530eef0e3 100644 --- a/net/core/dev_mcast.c +++ b/net/core/dev_mcast.c @@ -194,8 +194,7 @@ int dev_mc_add(struct net_device *dev, void *addr, int alen, int glbl) done: spin_unlock_bh(&dev->xmit_lock); - if (dmi1) - kfree(dmi1); + kfree(dmi1); return err; } diff --git a/net/core/sock.c b/net/core/sock.c index 9602ceb3bac..13cc3be4f05 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1242,8 +1242,7 @@ static void sock_def_write_space(struct sock *sk) static void sock_def_destruct(struct sock *sk) { - if (sk->sk_protinfo) - kfree(sk->sk_protinfo); + kfree(sk->sk_protinfo); } void sk_send_sigurg(struct sock *sk) diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c index 4b9bc81ae1a..ca03521112c 100644 --- a/net/dccp/ipv4.c +++ b/net/dccp/ipv4.c @@ -1263,10 +1263,8 @@ static int dccp_v4_destroy_sock(struct sock *sk) if (inet_csk(sk)->icsk_bind_hash != NULL) inet_put_port(&dccp_hashinfo, sk); - if (dp->dccps_service_list != NULL) { - kfree(dp->dccps_service_list); - dp->dccps_service_list = NULL; - } + kfree(dp->dccps_service_list); + dp->dccps_service_list = NULL; ccid_hc_rx_exit(dp->dccps_hc_rx_ccid, sk); ccid_hc_tx_exit(dp->dccps_hc_tx_ccid, sk); diff --git a/net/dccp/proto.c b/net/dccp/proto.c index a021c3422f6..e0ace7cbb99 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -238,8 +238,7 @@ static int dccp_setsockopt_service(struct sock *sk, const u32 service, lock_sock(sk); dp->dccps_service = service; - if (dp->dccps_service_list != NULL) - kfree(dp->dccps_service_list); + kfree(dp->dccps_service_list); 
dp->dccps_service_list = sl; release_sock(sk); diff --git a/net/decnet/dn_table.c b/net/decnet/dn_table.c index eeba56f9932..6f8b5658cb4 100644 --- a/net/decnet/dn_table.c +++ b/net/decnet/dn_table.c @@ -784,16 +784,14 @@ struct dn_fib_table *dn_fib_get_table(int n, int create) static void dn_fib_del_tree(int n) { - struct dn_fib_table *t; + struct dn_fib_table *t; - write_lock(&dn_fib_tables_lock); - t = dn_fib_tables[n]; - dn_fib_tables[n] = NULL; - write_unlock(&dn_fib_tables_lock); + write_lock(&dn_fib_tables_lock); + t = dn_fib_tables[n]; + dn_fib_tables[n] = NULL; + write_unlock(&dn_fib_tables_lock); - if (t) { - kfree(t); - } + kfree(t); } struct dn_fib_table *dn_fib_empty_table(void) diff --git a/net/ethernet/pe2.c b/net/ethernet/pe2.c index 98a494be603..9d57b4fb644 100644 --- a/net/ethernet/pe2.c +++ b/net/ethernet/pe2.c @@ -32,8 +32,7 @@ struct datalink_proto *make_EII_client(void) void destroy_EII_client(struct datalink_proto *dl) { - if (dl) - kfree(dl); + kfree(dl); } EXPORT_SYMBOL(destroy_EII_client); diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index a9d84f93442..eaa150c33b0 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -147,8 +147,7 @@ void inet_sock_destruct(struct sock *sk) BUG_TRAP(!sk->sk_wmem_queued); BUG_TRAP(!sk->sk_forward_alloc); - if (inet->opt) - kfree(inet->opt); + kfree(inet->opt); dst_release(sk->sk_dst_cache); sk_refcnt_debug_dec(sk); } diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index 990633c09df..2267c1fad87 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -266,8 +266,7 @@ int ip_rt_ioctl(unsigned int cmd, void __user *arg) if (tb) err = tb->tb_insert(tb, &req.rtm, &rta, &req.nlh, NULL); } - if (rta.rta_mx) - kfree(rta.rta_mx); + kfree(rta.rta_mx); } rtnl_unlock(); return err; diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c index bce4e875193..dbe12da8d8b 100644 --- a/net/ipv4/ip_options.c +++ b/net/ipv4/ip_options.c @@ -510,8 +510,7 @@ static int 
ip_options_get_finish(struct ip_options **optp, kfree(opt); return -EINVAL; } - if (*optp) - kfree(*optp); + kfree(*optp); *optp = opt; return 0; } diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 17758234a3e..df7f20da422 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1262,10 +1262,8 @@ int ip_push_pending_frames(struct sock *sk) out: inet->cork.flags &= ~IPCORK_OPT; - if (inet->cork.opt) { - kfree(inet->cork.opt); - inet->cork.opt = NULL; - } + kfree(inet->cork.opt); + inet->cork.opt = NULL; if (inet->cork.rt) { ip_rt_put(inet->cork.rt); inet->cork.rt = NULL; @@ -1289,10 +1287,8 @@ void ip_flush_pending_frames(struct sock *sk) kfree_skb(skb); inet->cork.flags &= ~IPCORK_OPT; - if (inet->cork.opt) { - kfree(inet->cork.opt); - inet->cork.opt = NULL; - } + kfree(inet->cork.opt); + inet->cork.opt = NULL; if (inet->cork.rt) { ip_rt_put(inet->cork.rt); inet->cork.rt = NULL; diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index 2f0b47da5b3..4f2d8725730 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -202,8 +202,7 @@ int ip_ra_control(struct sock *sk, unsigned char on, void (*destructor)(struct s if (ra->sk == sk) { if (on) { write_unlock_bh(&ip_ra_lock); - if (new_ra) - kfree(new_ra); + kfree(new_ra); return -EADDRINUSE; } *rap = ra->next; @@ -446,8 +445,7 @@ int ip_setsockopt(struct sock *sk, int level, int optname, char __user *optval, #endif } opt = xchg(&inet->opt, opt); - if (opt) - kfree(opt); + kfree(opt); break; } case IP_PKTINFO: @@ -828,10 +826,8 @@ int ip_setsockopt(struct sock *sk, int level, int optname, char __user *optval, err = ip_mc_msfilter(sk, msf, ifindex); mc_msf_out: - if (msf) - kfree(msf); - if (gsf) - kfree(gsf); + kfree(msf); + kfree(gsf); break; } case IP_ROUTER_ALERT: diff --git a/net/ipv4/ipvs/ip_vs_app.c b/net/ipv4/ipvs/ip_vs_app.c index fc6f95aaa96..d7eb680101c 100644 --- a/net/ipv4/ipvs/ip_vs_app.c +++ b/net/ipv4/ipvs/ip_vs_app.c @@ -110,8 +110,7 @@ ip_vs_app_inc_new(struct 
ip_vs_app *app, __u16 proto, __u16 port) return 0; out: - if (inc->timeout_table) - kfree(inc->timeout_table); + kfree(inc->timeout_table); kfree(inc); return ret; } @@ -136,8 +135,7 @@ ip_vs_app_inc_release(struct ip_vs_app *inc) list_del(&inc->a_list); - if (inc->timeout_table != NULL) - kfree(inc->timeout_table); + kfree(inc->timeout_table); kfree(inc); } diff --git a/net/ipv4/multipath_wrandom.c b/net/ipv4/multipath_wrandom.c index bd7d75b6abe..d34a9fa608e 100644 --- a/net/ipv4/multipath_wrandom.c +++ b/net/ipv4/multipath_wrandom.c @@ -207,16 +207,12 @@ static void wrandom_select_route(const struct flowi *flp, decision = mpc->rt; last_power = mpc->power; - if (last_mpc) - kfree(last_mpc); - + kfree(last_mpc); last_mpc = mpc; } - if (last_mpc) { - /* concurrent __multipath_flush may lead to !last_mpc */ - kfree(last_mpc); - } + /* concurrent __multipath_flush may lead to !last_mpc */ + kfree(last_mpc); decision->u.dst.__use++; *rp = decision; diff --git a/net/ipv4/netfilter/ip_nat_snmp_basic.c b/net/ipv4/netfilter/ip_nat_snmp_basic.c index 93b2c5111bb..8acb7ed40b4 100644 --- a/net/ipv4/netfilter/ip_nat_snmp_basic.c +++ b/net/ipv4/netfilter/ip_nat_snmp_basic.c @@ -1161,8 +1161,7 @@ static int snmp_parse_mangle(unsigned char *msg, if (!snmp_object_decode(&ctx, obj)) { if (*obj) { - if ((*obj)->id) - kfree((*obj)->id); + kfree((*obj)->id); kfree(*obj); } kfree(obj); diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 49d67cd75ed..634dabb558f 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -823,8 +823,7 @@ out: */ static void tcp_v4_reqsk_destructor(struct request_sock *req) { - if (inet_rsk(req)->opt) - kfree(inet_rsk(req)->opt); + kfree(inet_rsk(req)->opt); } static inline void syn_flood_warning(struct sk_buff *skb) diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index a34d1504deb..b7a5f51238b 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -3090,8 +3090,7 @@ static int inet6_fill_ifinfo(struct sk_buff *skb, struct 
inet6_dev *idev, nlmsg_failure: rtattr_failure: - if (array) - kfree(array); + kfree(array); skb_trim(skb, b - skb->data); return -1; } diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 614296a920c..dbd9767b32e 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -587,8 +587,7 @@ static int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *)) skb->next = NULL; } - if (tmp_hdr) - kfree(tmp_hdr); + kfree(tmp_hdr); if (err == 0) { IP6_INC_STATS(IPSTATS_MIB_FRAGOKS); @@ -1186,10 +1185,8 @@ int ip6_push_pending_frames(struct sock *sk) out: inet->cork.flags &= ~IPCORK_OPT; - if (np->cork.opt) { - kfree(np->cork.opt); - np->cork.opt = NULL; - } + kfree(np->cork.opt); + np->cork.opt = NULL; if (np->cork.rt) { dst_release(&np->cork.rt->u.dst); np->cork.rt = NULL; @@ -1214,10 +1211,8 @@ void ip6_flush_pending_frames(struct sock *sk) inet->cork.flags &= ~IPCORK_OPT; - if (np->cork.opt) { - kfree(np->cork.opt); - np->cork.opt = NULL; - } + kfree(np->cork.opt); + np->cork.opt = NULL; if (np->cork.rt) { dst_release(&np->cork.rt->u.dst); np->cork.rt = NULL; diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index cf94372d1af..e6b0e3954c0 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -756,8 +756,7 @@ ip6ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev) } ip6_tnl_dst_store(t, dst); - if (opt) - kfree(opt); + kfree(opt); t->recursion--; return 0; @@ -766,8 +765,7 @@ tx_err_link_failure: dst_link_failure(skb); tx_err_dst_release: dst_release(dst); - if (opt) - kfree(opt); + kfree(opt); tx_err: stats->tx_errors++; stats->tx_dropped++; diff --git a/net/ipv6/ipcomp6.c b/net/ipv6/ipcomp6.c index 85bfbc69b2c..55917fb1709 100644 --- a/net/ipv6/ipcomp6.c +++ b/net/ipv6/ipcomp6.c @@ -130,8 +130,7 @@ static int ipcomp6_input(struct xfrm_state *x, struct xfrm_decap_state *decap, s out_put_cpu: put_cpu(); out: - if (tmp_hdr) - kfree(tmp_hdr); + kfree(tmp_hdr); if (err) goto error_out; return nexthdr; diff --git 
a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c index 8567873d0dd..003fd99ff59 100644 --- a/net/ipv6/ipv6_sockglue.c +++ b/net/ipv6/ipv6_sockglue.c @@ -80,8 +80,7 @@ int ip6_ra_control(struct sock *sk, int sel, void (*destructor)(struct sock *)) if (ra->sk == sk) { if (sel>=0) { write_unlock_bh(&ip6_ra_lock); - if (new_ra) - kfree(new_ra); + kfree(new_ra); return -EADDRINUSE; } diff --git a/net/irda/discovery.c b/net/irda/discovery.c index c4ba5fa1446..3fefc822c1c 100644 --- a/net/irda/discovery.c +++ b/net/irda/discovery.c @@ -194,8 +194,7 @@ void irlmp_expire_discoveries(hashbin_t *log, __u32 saddr, int force) /* Remove it from the log */ curr = hashbin_remove_this(log, (irda_queue_t *) curr); - if (curr) - kfree(curr); + kfree(curr); } } diff --git a/net/irda/irias_object.c b/net/irda/irias_object.c index 6fec428b451..75f2666e863 100644 --- a/net/irda/irias_object.c +++ b/net/irda/irias_object.c @@ -122,8 +122,7 @@ static void __irias_delete_attrib(struct ias_attrib *attrib) IRDA_ASSERT(attrib != NULL, return;); IRDA_ASSERT(attrib->magic == IAS_ATTRIB_MAGIC, return;); - if (attrib->name) - kfree(attrib->name); + kfree(attrib->name); irias_delete_value(attrib->value); attrib->magic = ~IAS_ATTRIB_MAGIC; @@ -136,8 +135,7 @@ void __irias_delete_object(struct ias_object *obj) IRDA_ASSERT(obj != NULL, return;); IRDA_ASSERT(obj->magic == IAS_OBJECT_MAGIC, return;); - if (obj->name) - kfree(obj->name); + kfree(obj->name); hashbin_delete(obj->attribs, (FREE_FUNC) __irias_delete_attrib); @@ -562,14 +560,12 @@ void irias_delete_value(struct ias_value *value) /* No need to deallocate */ break; case IAS_STRING: - /* If string, deallocate string */ - if (value->t.string != NULL) - kfree(value->t.string); + /* Deallocate string */ + kfree(value->t.string); break; case IAS_OCT_SEQ: - /* If byte stream, deallocate byte stream */ - if (value->t.oct_seq != NULL) - kfree(value->t.oct_seq); + /* Deallocate byte stream */ + kfree(value->t.oct_seq); break; default: IRDA_DEBUG(0, 
"%s(), Unknown value type!\n", __FUNCTION__); diff --git a/net/rose/rose_route.c b/net/rose/rose_route.c index b18fe504301..8631b65a731 100644 --- a/net/rose/rose_route.c +++ b/net/rose/rose_route.c @@ -240,8 +240,7 @@ static void rose_remove_neigh(struct rose_neigh *rose_neigh) if ((s = rose_neigh_list) == rose_neigh) { rose_neigh_list = rose_neigh->next; spin_unlock_bh(&rose_neigh_list_lock); - if (rose_neigh->digipeat != NULL) - kfree(rose_neigh->digipeat); + kfree(rose_neigh->digipeat); kfree(rose_neigh); return; } @@ -250,8 +249,7 @@ static void rose_remove_neigh(struct rose_neigh *rose_neigh) if (s->next == rose_neigh) { s->next = rose_neigh->next; spin_unlock_bh(&rose_neigh_list_lock); - if (rose_neigh->digipeat != NULL) - kfree(rose_neigh->digipeat); + kfree(rose_neigh->digipeat); kfree(rose_neigh); return; } diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c index 29d8b9a4d16..75470486e40 100644 --- a/net/sched/cls_fw.c +++ b/net/sched/cls_fw.c @@ -298,8 +298,7 @@ static int fw_change(struct tcf_proto *tp, unsigned long base, return 0; errout: - if (f) - kfree(f); + kfree(f); return err; } diff --git a/net/sched/cls_route.c b/net/sched/cls_route.c index 02996ac05c7..520ff716dab 100644 --- a/net/sched/cls_route.c +++ b/net/sched/cls_route.c @@ -525,8 +525,7 @@ reinsert: return 0; errout: - if (f) - kfree(f); + kfree(f); return err; } diff --git a/net/sched/cls_rsvp.h b/net/sched/cls_rsvp.h index 006168d6937..572f06be3b0 100644 --- a/net/sched/cls_rsvp.h +++ b/net/sched/cls_rsvp.h @@ -555,8 +555,7 @@ insert: goto insert; errout: - if (f) - kfree(f); + kfree(f); errout2: tcf_exts_destroy(tp, &e); return err; diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c index 404d9d83a7f..9f921174c8a 100644 --- a/net/sched/cls_tcindex.c +++ b/net/sched/cls_tcindex.c @@ -194,8 +194,7 @@ found: } tcf_unbind_filter(tp, &r->res); tcf_exts_destroy(tp, &r->exts); - if (f) - kfree(f); + kfree(f); return 0; } @@ -442,10 +441,8 @@ static void tcindex_destroy(struct 
tcf_proto *tp) walker.skip = 0; walker.fn = &tcindex_destroy_element; tcindex_walk(tp,&walker); - if (p->perfect) - kfree(p->perfect); - if (p->h) - kfree(p->h); + kfree(p->perfect); + kfree(p->h); kfree(p); tp->root = NULL; } diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index 364b87d8645..2b670479dde 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -347,7 +347,7 @@ static int u32_destroy_key(struct tcf_proto *tp, struct tc_u_knode *n) if (n->ht_down) n->ht_down->refcnt--; #ifdef CONFIG_CLS_U32_PERF - if (n && (NULL != n->pf)) + if (n) kfree(n->pf); #endif kfree(n); @@ -680,7 +680,7 @@ static int u32_change(struct tcf_proto *tp, unsigned long base, u32 handle, return 0; } #ifdef CONFIG_CLS_U32_PERF - if (n && (NULL != n->pf)) + if (n) kfree(n->pf); #endif kfree(n); diff --git a/net/sched/em_meta.c b/net/sched/em_meta.c index cf68a59fdc5..700844d49d7 100644 --- a/net/sched/em_meta.c +++ b/net/sched/em_meta.c @@ -561,8 +561,7 @@ static int meta_var_change(struct meta_value *dst, struct rtattr *rta) static void meta_var_destroy(struct meta_value *v) { - if (v->val) - kfree((void *) v->val); + kfree((void *) v->val); } static void meta_var_apply_extras(struct meta_value *v, diff --git a/net/sctp/associola.c b/net/sctp/associola.c index 12b0f582a66..8c8ddf7f9b6 100644 --- a/net/sctp/associola.c +++ b/net/sctp/associola.c @@ -344,9 +344,7 @@ void sctp_association_free(struct sctp_association *asoc) } /* Free peer's cached cookie. */ - if (asoc->peer.cookie) { - kfree(asoc->peer.cookie); - } + kfree(asoc->peer.cookie); /* Release the transport structures. 
*/ list_for_each_safe(pos, temp, &asoc->peer.transport_addr_list) { diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c index 660c61bdf16..f9573eba5c7 100644 --- a/net/sctp/sm_make_chunk.c +++ b/net/sctp/sm_make_chunk.c @@ -254,8 +254,7 @@ struct sctp_chunk *sctp_make_init(const struct sctp_association *asoc, aiparam.adaption_ind = htonl(sp->adaption_ind); sctp_addto_chunk(retval, sizeof(aiparam), &aiparam); nodata: - if (addrs.v) - kfree(addrs.v); + kfree(addrs.v); return retval; } @@ -347,8 +346,7 @@ struct sctp_chunk *sctp_make_init_ack(const struct sctp_association *asoc, nomem_chunk: kfree(cookie); nomem_cookie: - if (addrs.v) - kfree(addrs.v); + kfree(addrs.v); return retval; } diff --git a/net/sunrpc/auth_gss/gss_krb5_seal.c b/net/sunrpc/auth_gss/gss_krb5_seal.c index 13f8ae97945..d0dfdfd5e79 100644 --- a/net/sunrpc/auth_gss/gss_krb5_seal.c +++ b/net/sunrpc/auth_gss/gss_krb5_seal.c @@ -143,6 +143,6 @@ gss_get_mic_kerberos(struct gss_ctx *gss_ctx, struct xdr_buf *text, return ((ctx->endtime < now) ? 
GSS_S_CONTEXT_EXPIRED : GSS_S_COMPLETE); out_err: - if (md5cksum.data) kfree(md5cksum.data); + kfree(md5cksum.data); return GSS_S_FAILURE; } diff --git a/net/sunrpc/auth_gss/gss_krb5_unseal.c b/net/sunrpc/auth_gss/gss_krb5_unseal.c index 2030475d98e..db055fd7d77 100644 --- a/net/sunrpc/auth_gss/gss_krb5_unseal.c +++ b/net/sunrpc/auth_gss/gss_krb5_unseal.c @@ -176,6 +176,6 @@ gss_verify_mic_kerberos(struct gss_ctx *gss_ctx, ret = GSS_S_COMPLETE; out: - if (md5cksum.data) kfree(md5cksum.data); + kfree(md5cksum.data); return ret; } diff --git a/net/sunrpc/auth_gss/gss_mech_switch.c b/net/sunrpc/auth_gss/gss_mech_switch.c index b048bf672da..f8bac6ccd52 100644 --- a/net/sunrpc/auth_gss/gss_mech_switch.c +++ b/net/sunrpc/auth_gss/gss_mech_switch.c @@ -60,8 +60,7 @@ gss_mech_free(struct gss_api_mech *gm) for (i = 0; i < gm->gm_pf_num; i++) { pf = &gm->gm_pfs[i]; - if (pf->auth_domain_name) - kfree(pf->auth_domain_name); + kfree(pf->auth_domain_name); pf->auth_domain_name = NULL; } } diff --git a/net/sunrpc/auth_gss/gss_spkm3_seal.c b/net/sunrpc/auth_gss/gss_spkm3_seal.c index 148201e929d..d1e12b25d6e 100644 --- a/net/sunrpc/auth_gss/gss_spkm3_seal.c +++ b/net/sunrpc/auth_gss/gss_spkm3_seal.c @@ -122,8 +122,7 @@ spkm3_make_token(struct spkm3_ctx *ctx, return GSS_S_COMPLETE; out_err: - if (md5cksum.data) - kfree(md5cksum.data); + kfree(md5cksum.data); token->data = NULL; token->len = 0; return GSS_S_FAILURE; diff --git a/net/sunrpc/auth_gss/gss_spkm3_token.c b/net/sunrpc/auth_gss/gss_spkm3_token.c index 46c08a0710f..1f824578d77 100644 --- a/net/sunrpc/auth_gss/gss_spkm3_token.c +++ b/net/sunrpc/auth_gss/gss_spkm3_token.c @@ -259,8 +259,7 @@ spkm3_verify_mic_token(unsigned char **tokp, int *mic_hdrlen, unsigned char **ck ret = GSS_S_COMPLETE; out: - if (spkm3_ctx_id.data) - kfree(spkm3_ctx_id.data); + kfree(spkm3_ctx_id.data); return ret; } diff --git a/net/sunrpc/auth_gss/gss_spkm3_unseal.c b/net/sunrpc/auth_gss/gss_spkm3_unseal.c index c3c0d958610..241d5b30dfc 100644 --- 
a/net/sunrpc/auth_gss/gss_spkm3_unseal.c +++ b/net/sunrpc/auth_gss/gss_spkm3_unseal.c @@ -120,9 +120,7 @@ spkm3_read_token(struct spkm3_ctx *ctx, /* XXX: need to add expiration and sequencing */ ret = GSS_S_COMPLETE; out: - if (md5cksum.data) - kfree(md5cksum.data); - if (wire_cksum.data) - kfree(wire_cksum.data); + kfree(md5cksum.data); + kfree(wire_cksum.data); return ret; } diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 5a220b2bb37..e4296c8b861 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -196,12 +196,9 @@ svc_exit_thread(struct svc_rqst *rqstp) struct svc_serv *serv = rqstp->rq_server; svc_release_buffer(rqstp); - if (rqstp->rq_resp) - kfree(rqstp->rq_resp); - if (rqstp->rq_argp) - kfree(rqstp->rq_argp); - if (rqstp->rq_auth_data) - kfree(rqstp->rq_auth_data); + kfree(rqstp->rq_resp); + kfree(rqstp->rq_argp); + kfree(rqstp->rq_auth_data); kfree(rqstp); /* Release the server */ diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c index 32df43372ee..aaf08cdd19f 100644 --- a/net/sunrpc/xdr.c +++ b/net/sunrpc/xdr.c @@ -992,8 +992,7 @@ xdr_xcode_array2(struct xdr_buf *buf, unsigned int base, err = 0; out: - if (elem) - kfree(elem); + kfree(elem); if (ppages) kunmap(*ppages); return err; diff --git a/net/wanrouter/af_wanpipe.c b/net/wanrouter/af_wanpipe.c index 596cb96e5f4..59fec59b213 100644 --- a/net/wanrouter/af_wanpipe.c +++ b/net/wanrouter/af_wanpipe.c @@ -1099,7 +1099,7 @@ static void release_driver(struct sock *sk) sock_reset_flag(sk, SOCK_ZAPPED); wp = wp_sk(sk); - if (wp && wp->mbox) { + if (wp) { kfree(wp->mbox); wp->mbox = NULL; } @@ -1186,10 +1186,8 @@ static void wanpipe_kill_sock_timer (unsigned long data) return; } - if (wp_sk(sk)) { - kfree(wp_sk(sk)); - wp_sk(sk) = NULL; - } + kfree(wp_sk(sk)); + wp_sk(sk) = NULL; if (atomic_read(&sk->sk_refcnt) != 1) { atomic_set(&sk->sk_refcnt, 1); @@ -1219,10 +1217,8 @@ static void wanpipe_kill_sock_accept (struct sock *sk) sk->sk_socket = NULL; - if (wp_sk(sk)) { - kfree(wp_sk(sk)); - wp_sk(sk) = 
NULL; - } + kfree(wp_sk(sk)); + wp_sk(sk) = NULL; if (atomic_read(&sk->sk_refcnt) != 1) { atomic_set(&sk->sk_refcnt, 1); @@ -1243,10 +1239,8 @@ static void wanpipe_kill_sock_irq (struct sock *sk) sk->sk_socket = NULL; - if (wp_sk(sk)) { - kfree(wp_sk(sk)); - wp_sk(sk) = NULL; - } + kfree(wp_sk(sk)); + wp_sk(sk) = NULL; if (atomic_read(&sk->sk_refcnt) != 1) { atomic_set(&sk->sk_refcnt, 1); diff --git a/net/wanrouter/wanmain.c b/net/wanrouter/wanmain.c index 13b650ad22e..bcf7b3faa76 100644 --- a/net/wanrouter/wanmain.c +++ b/net/wanrouter/wanmain.c @@ -714,10 +714,8 @@ static int wanrouter_device_new_if(struct wan_device *wandev, } /* This code has moved from del_if() function */ - if (dev->priv) { - kfree(dev->priv); - dev->priv = NULL; - } + kfree(dev->priv); + dev->priv = NULL; #ifdef CONFIG_WANPIPE_MULTPPP if (cnf->config_id == WANCONFIG_MPPP) @@ -851,10 +849,8 @@ static int wanrouter_delete_interface(struct wan_device *wandev, char *name) /* Due to new interface linking method using dev->priv, * this code has moved from del_if() function.*/ - if (dev->priv){ - kfree(dev->priv); - dev->priv=NULL; - } + kfree(dev->priv); + dev->priv=NULL; unregister_netdev(dev); diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index 8b9a4747417..7cf48aa6c95 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -62,14 +62,10 @@ static void xfrm_state_gc_destroy(struct xfrm_state *x) { if (del_timer(&x->timer)) BUG(); - if (x->aalg) - kfree(x->aalg); - if (x->ealg) - kfree(x->ealg); - if (x->calg) - kfree(x->calg); - if (x->encap) - kfree(x->encap); + kfree(x->aalg); + kfree(x->ealg); + kfree(x->calg); + kfree(x->encap); if (x->type) { x->type->destructor(x); xfrm_put_type(x->type); -- cgit v1.2.3-18-g5258 From 89f5f0aeed14ac7245f760b0b96c9269c87bcbbe Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Tue, 8 Nov 2005 09:41:56 -0800 Subject: [IPV4]: Fix ip_queue_xmit identity increment for TSO packets When ip_queue_xmit calls ip_select_ident_more for IP identity 
selection it gives it the wrong packet count for TSO packets. The ip_select_* functions expect one less than the number of packets, so we need to subtract one for TSO packets. This bug was diagnosed and fixed by Tom Young. Signed-off-by: Herbert Xu Signed-off-by: David S. Miller --- net/ipv4/ip_output.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index df7f20da422..11c2f68254f 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -353,7 +353,8 @@ packet_routed: ip_options_build(skb, opt, inet->daddr, rt, 0); } - ip_select_ident_more(iph, &rt->u.dst, sk, skb_shinfo(skb)->tso_segs); + ip_select_ident_more(iph, &rt->u.dst, sk, + (skb_shinfo(skb)->tso_segs ?: 1) - 1); /* Add an IP checksum. */ ip_send_check(iph); -- cgit v1.2.3-18-g5258 From 1ebb92521d0bc2d4ef772730d29333c06b807191 Mon Sep 17 00:00:00 2001 From: Marcel Holtmann Date: Tue, 8 Nov 2005 09:57:21 -0800 Subject: [Bluetooth]: Add endian annotations to the core This patch adds the endian annotations to the Bluetooth core. Signed-off-by: Marcel Holtmann Signed-off-by: David S. 
Miller --- net/bluetooth/hci_core.c | 2 +- net/bluetooth/hci_event.c | 6 +++--- net/bluetooth/hci_sock.c | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) (limited to 'net') diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index cf0df1c8c93..9106354c781 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -183,7 +183,7 @@ static void hci_reset_req(struct hci_dev *hdev, unsigned long opt) static void hci_init_req(struct hci_dev *hdev, unsigned long opt) { struct sk_buff *skb; - __u16 param; + __le16 param; BT_DBG("%s %ld", hdev->name, opt); diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c index b61b4e8e36f..eb64555d1fb 100644 --- a/net/bluetooth/hci_event.c +++ b/net/bluetooth/hci_event.c @@ -242,7 +242,7 @@ static void hci_cc_host_ctl(struct hci_dev *hdev, __u16 ocf, struct sk_buff *skb break; status = *((__u8 *) skb->data); - setting = __le16_to_cpu(get_unaligned((__u16 *) sent)); + setting = __le16_to_cpu(get_unaligned((__le16 *) sent)); if (!status && hdev->voice_setting != setting) { hdev->voice_setting = setting; @@ -728,7 +728,7 @@ static inline void hci_disconn_complete_evt(struct hci_dev *hdev, struct sk_buff static inline void hci_num_comp_pkts_evt(struct hci_dev *hdev, struct sk_buff *skb) { struct hci_ev_num_comp_pkts *ev = (struct hci_ev_num_comp_pkts *) skb->data; - __u16 *ptr; + __le16 *ptr; int i; skb_pull(skb, sizeof(*ev)); @@ -742,7 +742,7 @@ static inline void hci_num_comp_pkts_evt(struct hci_dev *hdev, struct sk_buff *s tasklet_disable(&hdev->tx_task); - for (i = 0, ptr = (__u16 *) skb->data; i < ev->num_hndl; i++) { + for (i = 0, ptr = (__le16 *) skb->data; i < ev->num_hndl; i++) { struct hci_conn *conn; __u16 handle, count; diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c index 799e448750a..1d6d0a15c09 100644 --- a/net/bluetooth/hci_sock.c +++ b/net/bluetooth/hci_sock.c @@ -416,7 +416,7 @@ static int hci_sock_sendmsg(struct kiocb *iocb, struct socket *sock, skb->dev = 
(void *) hdev; if (bt_cb(skb)->pkt_type == HCI_COMMAND_PKT) { - u16 opcode = __le16_to_cpu(get_unaligned((u16 *)skb->data)); + u16 opcode = __le16_to_cpu(get_unaligned((__le16 *) skb->data)); u16 ogf = hci_opcode_ogf(opcode); u16 ocf = hci_opcode_ocf(opcode); -- cgit v1.2.3-18-g5258 From be9d122730c878baafe11e70d1436faac229f2fc Mon Sep 17 00:00:00 2001 From: Marcel Holtmann Date: Tue, 8 Nov 2005 09:57:38 -0800 Subject: [Bluetooth]: Remove the usage of /proc completely This patch removes all relics of the /proc usage from the Bluetooth subsystem core and its upper layers. All the previous information is now available via /sys/class/bluetooth through appropriate functions. Signed-off-by: Marcel Holtmann Signed-off-by: David S. Miller --- net/bluetooth/af_bluetooth.c | 12 +---- net/bluetooth/hci_sysfs.c | 4 +- net/bluetooth/l2cap.c | 98 ++++++---------------------------- net/bluetooth/rfcomm/core.c | 124 ++++++------------------------------------- net/bluetooth/rfcomm/sock.c | 90 +++++-------------------------- net/bluetooth/sco.c | 92 +++++--------------------------- 6 files changed, 63 insertions(+), 357 deletions(-) (limited to 'net') diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c index 03532062a46..ea616e3fc98 100644 --- a/net/bluetooth/af_bluetooth.c +++ b/net/bluetooth/af_bluetooth.c @@ -36,7 +36,6 @@ #include #include #include -#include #include #if defined(CONFIG_KMOD) @@ -50,10 +49,7 @@ #define BT_DBG(D...)
#endif -#define VERSION "2.7" - -struct proc_dir_entry *proc_bt; -EXPORT_SYMBOL(proc_bt); +#define VERSION "2.8" /* Bluetooth sockets */ #define BT_MAX_PROTO 8 @@ -312,10 +308,6 @@ static int __init bt_init(void) { BT_INFO("Core ver %s", VERSION); - proc_bt = proc_mkdir("bluetooth", NULL); - if (proc_bt) - proc_bt->owner = THIS_MODULE; - sock_register(&bt_sock_family_ops); BT_INFO("HCI device and connection manager initialized"); @@ -334,8 +326,6 @@ static void __exit bt_exit(void) bt_sysfs_cleanup(); sock_unregister(PF_BLUETOOTH); - - remove_proc_entry("bluetooth", NULL); } subsys_initcall(bt_init); diff --git a/net/bluetooth/hci_sysfs.c b/net/bluetooth/hci_sysfs.c index 7856bc26acc..bd7568ac87f 100644 --- a/net/bluetooth/hci_sysfs.c +++ b/net/bluetooth/hci_sysfs.c @@ -103,7 +103,7 @@ static void bt_release(struct class_device *cdev) kfree(hdev); } -static struct class bt_class = { +struct class bt_class = { .name = "bluetooth", .release = bt_release, #ifdef CONFIG_HOTPLUG @@ -111,6 +111,8 @@ static struct class bt_class = { #endif }; +EXPORT_SYMBOL_GPL(bt_class); + int hci_register_sysfs(struct hci_dev *hdev) { struct class_device *cdev = &hdev->class_dev; diff --git a/net/bluetooth/l2cap.c b/net/bluetooth/l2cap.c index 59b2dd36baa..e3bb11ca423 100644 --- a/net/bluetooth/l2cap.c +++ b/net/bluetooth/l2cap.c @@ -38,9 +38,8 @@ #include #include #include -#include -#include #include +#include #include #include @@ -56,7 +55,7 @@ #define BT_DBG(D...) 
#endif -#define VERSION "2.7" +#define VERSION "2.8" static struct proto_ops l2cap_sock_ops; @@ -2137,94 +2136,29 @@ drop: return 0; } -/* ---- Proc fs support ---- */ -#ifdef CONFIG_PROC_FS -static void *l2cap_seq_start(struct seq_file *seq, loff_t *pos) +static ssize_t l2cap_sysfs_show(struct class *dev, char *buf) { struct sock *sk; struct hlist_node *node; - loff_t l = *pos; + char *str = buf; read_lock_bh(&l2cap_sk_list.lock); - sk_for_each(sk, node, &l2cap_sk_list.head) - if (!l--) - goto found; - sk = NULL; -found: - return sk; -} + sk_for_each(sk, node, &l2cap_sk_list.head) { + struct l2cap_pinfo *pi = l2cap_pi(sk); -static void *l2cap_seq_next(struct seq_file *seq, void *e, loff_t *pos) -{ - (*pos)++; - return sk_next(e); -} + str += sprintf(str, "%s %s %d %d 0x%4.4x 0x%4.4x %d %d 0x%x\n", + batostr(&bt_sk(sk)->src), batostr(&bt_sk(sk)->dst), + sk->sk_state, pi->psm, pi->scid, pi->dcid, pi->imtu, + pi->omtu, pi->link_mode); + } -static void l2cap_seq_stop(struct seq_file *seq, void *e) -{ read_unlock_bh(&l2cap_sk_list.lock); -} -static int l2cap_seq_show(struct seq_file *seq, void *e) -{ - struct sock *sk = e; - struct l2cap_pinfo *pi = l2cap_pi(sk); - - seq_printf(seq, "%s %s %d %d 0x%4.4x 0x%4.4x %d %d 0x%x\n", - batostr(&bt_sk(sk)->src), batostr(&bt_sk(sk)->dst), - sk->sk_state, pi->psm, pi->scid, pi->dcid, pi->imtu, - pi->omtu, pi->link_mode); - return 0; + return (str - buf); } -static struct seq_operations l2cap_seq_ops = { - .start = l2cap_seq_start, - .next = l2cap_seq_next, - .stop = l2cap_seq_stop, - .show = l2cap_seq_show -}; - -static int l2cap_seq_open(struct inode *inode, struct file *file) -{ - return seq_open(file, &l2cap_seq_ops); -} - -static struct file_operations l2cap_seq_fops = { - .owner = THIS_MODULE, - .open = l2cap_seq_open, - .read = seq_read, - .llseek = seq_lseek, - .release = seq_release, -}; - -static int __init l2cap_proc_init(void) -{ - struct proc_dir_entry *p = create_proc_entry("l2cap", S_IRUGO, proc_bt); - if (!p) - 
return -ENOMEM; - p->owner = THIS_MODULE; - p->proc_fops = &l2cap_seq_fops; - return 0; -} - -static void __exit l2cap_proc_cleanup(void) -{ - remove_proc_entry("l2cap", proc_bt); -} - -#else /* CONFIG_PROC_FS */ - -static int __init l2cap_proc_init(void) -{ - return 0; -} - -static void __exit l2cap_proc_cleanup(void) -{ - return; -} -#endif /* CONFIG_PROC_FS */ +static CLASS_ATTR(l2cap, S_IRUGO, l2cap_sysfs_show, NULL); static struct proto_ops l2cap_sock_ops = { .family = PF_BLUETOOTH, @@ -2266,7 +2200,7 @@ static struct hci_proto l2cap_hci_proto = { static int __init l2cap_init(void) { int err; - + err = proto_register(&l2cap_proto, 0); if (err < 0) return err; @@ -2284,7 +2218,7 @@ static int __init l2cap_init(void) goto error; } - l2cap_proc_init(); + class_create_file(&bt_class, &class_attr_l2cap); BT_INFO("L2CAP ver %s", VERSION); BT_INFO("L2CAP socket layer initialized"); @@ -2298,7 +2232,7 @@ error: static void __exit l2cap_exit(void) { - l2cap_proc_cleanup(); + class_remove_file(&bt_class, &class_attr_l2cap); if (bt_sock_unregister(BTPROTO_L2CAP) < 0) BT_ERR("L2CAP socket unregistration failed"); diff --git a/net/bluetooth/rfcomm/core.c b/net/bluetooth/rfcomm/core.c index c3d56ead840..0d89d643413 100644 --- a/net/bluetooth/rfcomm/core.c +++ b/net/bluetooth/rfcomm/core.c @@ -35,9 +35,8 @@ #include #include #include +#include #include -#include -#include #include #include #include @@ -47,17 +46,13 @@ #include #include -#define VERSION "1.5" +#define VERSION "1.6" #ifndef CONFIG_BT_RFCOMM_DEBUG #undef BT_DBG #define BT_DBG(D...) 
#endif -#ifdef CONFIG_PROC_FS -struct proc_dir_entry *proc_bt_rfcomm; -#endif - static struct task_struct *rfcomm_thread; static DECLARE_MUTEX(rfcomm_sem); @@ -2001,117 +1996,32 @@ static struct hci_cb rfcomm_cb = { .encrypt_cfm = rfcomm_encrypt_cfm }; -/* ---- Proc fs support ---- */ -#ifdef CONFIG_PROC_FS -static void *rfcomm_seq_start(struct seq_file *seq, loff_t *pos) +static ssize_t rfcomm_dlc_sysfs_show(struct class *dev, char *buf) { struct rfcomm_session *s; struct list_head *pp, *p; - loff_t l = *pos; + char *str = buf; rfcomm_lock(); list_for_each(p, &session_list) { s = list_entry(p, struct rfcomm_session, list); - list_for_each(pp, &s->dlcs) - if (!l--) { - seq->private = s; - return pp; - } - } - return NULL; -} + list_for_each(pp, &s->dlcs) { + struct sock *sk = s->sock->sk; + struct rfcomm_dlc *d = list_entry(pp, struct rfcomm_dlc, list); -static void *rfcomm_seq_next(struct seq_file *seq, void *e, loff_t *pos) -{ - struct rfcomm_session *s = seq->private; - struct list_head *pp, *p = e; - (*pos)++; - - if (p->next != &s->dlcs) - return p->next; - - list_for_each(p, &session_list) { - s = list_entry(p, struct rfcomm_session, list); - __list_for_each(pp, &s->dlcs) { - seq->private = s; - return pp; + str += sprintf(str, "%s %s %ld %d %d %d %d\n", + batostr(&bt_sk(sk)->src), batostr(&bt_sk(sk)->dst), + d->state, d->dlci, d->mtu, d->rx_credits, d->tx_credits); } } - return NULL; -} -static void rfcomm_seq_stop(struct seq_file *seq, void *e) -{ rfcomm_unlock(); -} - -static int rfcomm_seq_show(struct seq_file *seq, void *e) -{ - struct rfcomm_session *s = seq->private; - struct sock *sk = s->sock->sk; - struct rfcomm_dlc *d = list_entry(e, struct rfcomm_dlc, list); - - seq_printf(seq, "%s %s %ld %d %d %d %d\n", - batostr(&bt_sk(sk)->src), batostr(&bt_sk(sk)->dst), - d->state, d->dlci, d->mtu, d->rx_credits, d->tx_credits); - return 0; -} - -static struct seq_operations rfcomm_seq_ops = { - .start = rfcomm_seq_start, - .next = rfcomm_seq_next, - .stop = 
rfcomm_seq_stop, - .show = rfcomm_seq_show -}; - -static int rfcomm_seq_open(struct inode *inode, struct file *file) -{ - return seq_open(file, &rfcomm_seq_ops); -} - -static struct file_operations rfcomm_seq_fops = { - .owner = THIS_MODULE, - .open = rfcomm_seq_open, - .read = seq_read, - .llseek = seq_lseek, - .release = seq_release, -}; - -static int __init rfcomm_proc_init(void) -{ - struct proc_dir_entry *p; - - proc_bt_rfcomm = proc_mkdir("rfcomm", proc_bt); - if (proc_bt_rfcomm) { - proc_bt_rfcomm->owner = THIS_MODULE; - - p = create_proc_entry("dlc", S_IRUGO, proc_bt_rfcomm); - if (p) - p->proc_fops = &rfcomm_seq_fops; - } - return 0; -} - -static void __exit rfcomm_proc_cleanup(void) -{ - remove_proc_entry("dlc", proc_bt_rfcomm); - remove_proc_entry("rfcomm", proc_bt); + return (str - buf); } -#else /* CONFIG_PROC_FS */ - -static int __init rfcomm_proc_init(void) -{ - return 0; -} - -static void __exit rfcomm_proc_cleanup(void) -{ - return; -} -#endif /* CONFIG_PROC_FS */ +static CLASS_ATTR(rfcomm_dlc, S_IRUGO, rfcomm_dlc_sysfs_show, NULL); /* ---- Initialization ---- */ static int __init rfcomm_init(void) @@ -2122,9 +2032,7 @@ static int __init rfcomm_init(void) kernel_thread(rfcomm_run, NULL, CLONE_KERNEL); - BT_INFO("RFCOMM ver %s", VERSION); - - rfcomm_proc_init(); + class_create_file(&bt_class, &class_attr_rfcomm_dlc); rfcomm_init_sockets(); @@ -2132,11 +2040,15 @@ static int __init rfcomm_init(void) rfcomm_init_ttys(); #endif + BT_INFO("RFCOMM ver %s", VERSION); + return 0; } static void __exit rfcomm_exit(void) { + class_remove_file(&bt_class, &class_attr_rfcomm_dlc); + hci_unregister_cb(&rfcomm_cb); /* Terminate working thread. 
@@ -2153,8 +2065,6 @@ static void __exit rfcomm_exit(void) #endif rfcomm_cleanup_sockets(); - - rfcomm_proc_cleanup(); } module_init(rfcomm_init); diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c index a2b30f0aedb..6c34261b232 100644 --- a/net/bluetooth/rfcomm/sock.c +++ b/net/bluetooth/rfcomm/sock.c @@ -42,8 +42,7 @@ #include #include #include -#include -#include +#include #include #include @@ -887,89 +886,26 @@ done: return result; } -/* ---- Proc fs support ---- */ -#ifdef CONFIG_PROC_FS -static void *rfcomm_seq_start(struct seq_file *seq, loff_t *pos) +static ssize_t rfcomm_sock_sysfs_show(struct class *dev, char *buf) { struct sock *sk; struct hlist_node *node; - loff_t l = *pos; + char *str = buf; read_lock_bh(&rfcomm_sk_list.lock); - sk_for_each(sk, node, &rfcomm_sk_list.head) - if (!l--) - return sk; - return NULL; -} - -static void *rfcomm_seq_next(struct seq_file *seq, void *e, loff_t *pos) -{ - struct sock *sk = e; - (*pos)++; - return sk_next(sk); -} + sk_for_each(sk, node, &rfcomm_sk_list.head) { + str += sprintf(str, "%s %s %d %d\n", + batostr(&bt_sk(sk)->src), batostr(&bt_sk(sk)->dst), + sk->sk_state, rfcomm_pi(sk)->channel); + } -static void rfcomm_seq_stop(struct seq_file *seq, void *e) -{ read_unlock_bh(&rfcomm_sk_list.lock); -} -static int rfcomm_seq_show(struct seq_file *seq, void *e) -{ - struct sock *sk = e; - seq_printf(seq, "%s %s %d %d\n", - batostr(&bt_sk(sk)->src), batostr(&bt_sk(sk)->dst), - sk->sk_state, rfcomm_pi(sk)->channel); - return 0; -} - -static struct seq_operations rfcomm_seq_ops = { - .start = rfcomm_seq_start, - .next = rfcomm_seq_next, - .stop = rfcomm_seq_stop, - .show = rfcomm_seq_show -}; - -static int rfcomm_seq_open(struct inode *inode, struct file *file) -{ - return seq_open(file, &rfcomm_seq_ops); + return (str - buf); } -static struct file_operations rfcomm_seq_fops = { - .owner = THIS_MODULE, - .open = rfcomm_seq_open, - .read = seq_read, - .llseek = seq_lseek, - .release = seq_release, -}; - 
-static int __init rfcomm_sock_proc_init(void) -{ - struct proc_dir_entry *p = create_proc_entry("sock", S_IRUGO, proc_bt_rfcomm); - if (!p) - return -ENOMEM; - p->proc_fops = &rfcomm_seq_fops; - return 0; -} - -static void __exit rfcomm_sock_proc_cleanup(void) -{ - remove_proc_entry("sock", proc_bt_rfcomm); -} - -#else /* CONFIG_PROC_FS */ - -static int __init rfcomm_sock_proc_init(void) -{ - return 0; -} - -static void __exit rfcomm_sock_proc_cleanup(void) -{ - return; -} -#endif /* CONFIG_PROC_FS */ +static CLASS_ATTR(rfcomm, S_IRUGO, rfcomm_sock_sysfs_show, NULL); static struct proto_ops rfcomm_sock_ops = { .family = PF_BLUETOOTH, @@ -997,7 +933,7 @@ static struct net_proto_family rfcomm_sock_family_ops = { .create = rfcomm_sock_create }; -int __init rfcomm_init_sockets(void) +int __init rfcomm_init_sockets(void) { int err; @@ -1009,7 +945,7 @@ int __init rfcomm_init_sockets(void) if (err < 0) goto error; - rfcomm_sock_proc_init(); + class_create_file(&bt_class, &class_attr_rfcomm); BT_INFO("RFCOMM socket layer initialized"); @@ -1023,7 +959,7 @@ error: void __exit rfcomm_cleanup_sockets(void) { - rfcomm_sock_proc_cleanup(); + class_remove_file(&bt_class, &class_attr_rfcomm); if (bt_sock_unregister(BTPROTO_RFCOMM) < 0) BT_ERR("RFCOMM socket layer unregistration failed"); diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c index 997e42df115..9cb00dc6c08 100644 --- a/net/bluetooth/sco.c +++ b/net/bluetooth/sco.c @@ -38,8 +38,7 @@ #include #include #include -#include -#include +#include #include #include @@ -55,7 +54,7 @@ #define BT_DBG(D...) 
#endif -#define VERSION "0.4" +#define VERSION "0.5" static struct proto_ops sco_sock_ops; @@ -893,91 +892,26 @@ drop: return 0; } -/* ---- Proc fs support ---- */ -#ifdef CONFIG_PROC_FS -static void *sco_seq_start(struct seq_file *seq, loff_t *pos) +static ssize_t sco_sysfs_show(struct class *dev, char *buf) { struct sock *sk; struct hlist_node *node; - loff_t l = *pos; + char *str = buf; read_lock_bh(&sco_sk_list.lock); - sk_for_each(sk, node, &sco_sk_list.head) - if (!l--) - goto found; - sk = NULL; -found: - return sk; -} - -static void *sco_seq_next(struct seq_file *seq, void *e, loff_t *pos) -{ - struct sock *sk = e; - (*pos)++; - return sk_next(sk); -} + sk_for_each(sk, node, &sco_sk_list.head) { + str += sprintf(str, "%s %s %d\n", + batostr(&bt_sk(sk)->src), batostr(&bt_sk(sk)->dst), + sk->sk_state); + } -static void sco_seq_stop(struct seq_file *seq, void *e) -{ read_unlock_bh(&sco_sk_list.lock); -} - -static int sco_seq_show(struct seq_file *seq, void *e) -{ - struct sock *sk = e; - seq_printf(seq, "%s %s %d\n", - batostr(&bt_sk(sk)->src), batostr(&bt_sk(sk)->dst), sk->sk_state); - return 0; -} -static struct seq_operations sco_seq_ops = { - .start = sco_seq_start, - .next = sco_seq_next, - .stop = sco_seq_stop, - .show = sco_seq_show -}; - -static int sco_seq_open(struct inode *inode, struct file *file) -{ - return seq_open(file, &sco_seq_ops); + return (str - buf); } -static struct file_operations sco_seq_fops = { - .owner = THIS_MODULE, - .open = sco_seq_open, - .read = seq_read, - .llseek = seq_lseek, - .release = seq_release, -}; - -static int __init sco_proc_init(void) -{ - struct proc_dir_entry *p = create_proc_entry("sco", S_IRUGO, proc_bt); - if (!p) - return -ENOMEM; - p->owner = THIS_MODULE; - p->proc_fops = &sco_seq_fops; - return 0; -} - -static void __exit sco_proc_cleanup(void) -{ - remove_proc_entry("sco", proc_bt); -} - -#else /* CONFIG_PROC_FS */ - -static int __init sco_proc_init(void) -{ - return 0; -} - -static void __exit 
sco_proc_cleanup(void) -{ - return; -} -#endif /* CONFIG_PROC_FS */ +static CLASS_ATTR(sco, S_IRUGO, sco_sysfs_show, NULL); static struct proto_ops sco_sock_ops = { .family = PF_BLUETOOTH, @@ -1035,7 +969,7 @@ static int __init sco_init(void) goto error; } - sco_proc_init(); + class_create_file(&bt_class, &class_attr_sco); BT_INFO("SCO (Voice Link) ver %s", VERSION); BT_INFO("SCO socket layer initialized"); @@ -1049,7 +983,7 @@ error: static void __exit sco_exit(void) { - sco_proc_cleanup(); + class_remove_file(&bt_class, &class_attr_sco); if (bt_sock_unregister(BTPROTO_SCO) < 0) BT_ERR("SCO socket unregistration failed"); -- cgit v1.2.3-18-g5258 From e3305626e0985faa8796f1f4e5a99c1f40bfa70e Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 9 Nov 2005 01:01:04 -0500 Subject: ieee80211: cleanup crypto list handling, other minor cleanups. --- net/ieee80211/ieee80211_crypt.c | 152 +++++++++++----------------------------- 1 file changed, 41 insertions(+), 111 deletions(-) (limited to 'net') diff --git a/net/ieee80211/ieee80211_crypt.c b/net/ieee80211/ieee80211_crypt.c index 20cc580a07e..ecc9bb196ab 100644 --- a/net/ieee80211/ieee80211_crypt.c +++ b/net/ieee80211/ieee80211_crypt.c @@ -11,15 +11,14 @@ * */ -#include +#include #include #include #include -#include -#include - +#include #include + MODULE_AUTHOR("Jouni Malinen"); MODULE_DESCRIPTION("HostAP crypto"); MODULE_LICENSE("GPL"); @@ -29,32 +28,20 @@ struct ieee80211_crypto_alg { struct ieee80211_crypto_ops *ops; }; -struct ieee80211_crypto { - struct list_head algs; - spinlock_t lock; -}; - -static struct ieee80211_crypto *hcrypt; +static LIST_HEAD(ieee80211_crypto_algs); +static DEFINE_SPINLOCK(ieee80211_crypto_lock); void ieee80211_crypt_deinit_entries(struct ieee80211_device *ieee, int force) { - struct list_head *ptr, *n; - struct ieee80211_crypt_data *entry; + struct ieee80211_crypt_data *entry, *next; unsigned long flags; spin_lock_irqsave(&ieee->lock, flags); - - if 
(list_empty(&ieee->crypt_deinit_list)) - goto unlock; - - for (ptr = ieee->crypt_deinit_list.next, n = ptr->next; - ptr != &ieee->crypt_deinit_list; ptr = n, n = ptr->next) { - entry = list_entry(ptr, struct ieee80211_crypt_data, list); - + list_for_each_entry_safe(entry, next, &ieee->crypt_deinit_list, list) { if (atomic_read(&entry->refcnt) != 0 && !force) continue; - list_del(ptr); + list_del(&entry->list); if (entry->ops) { entry->ops->deinit(entry->priv); @@ -62,7 +49,6 @@ void ieee80211_crypt_deinit_entries(struct ieee80211_device *ieee, int force) } kfree(entry); } - unlock: spin_unlock_irqrestore(&ieee->lock, flags); } @@ -125,9 +111,6 @@ int ieee80211_register_crypto_ops(struct ieee80211_crypto_ops *ops) unsigned long flags; struct ieee80211_crypto_alg *alg; - if (hcrypt == NULL) - return -1; - alg = kmalloc(sizeof(*alg), GFP_KERNEL); if (alg == NULL) return -ENOMEM; @@ -135,9 +118,9 @@ int ieee80211_register_crypto_ops(struct ieee80211_crypto_ops *ops) memset(alg, 0, sizeof(*alg)); alg->ops = ops; - spin_lock_irqsave(&hcrypt->lock, flags); - list_add(&alg->list, &hcrypt->algs); - spin_unlock_irqrestore(&hcrypt->lock, flags); + spin_lock_irqsave(&ieee80211_crypto_lock, flags); + list_add(&alg->list, &ieee80211_crypto_algs); + spin_unlock_irqrestore(&ieee80211_crypto_lock, flags); printk(KERN_DEBUG "ieee80211_crypt: registered algorithm '%s'\n", ops->name); @@ -147,64 +130,49 @@ int ieee80211_register_crypto_ops(struct ieee80211_crypto_ops *ops) int ieee80211_unregister_crypto_ops(struct ieee80211_crypto_ops *ops) { + struct ieee80211_crypto_alg *alg; unsigned long flags; - struct list_head *ptr; - struct ieee80211_crypto_alg *del_alg = NULL; - - if (hcrypt == NULL) - return -1; - - spin_lock_irqsave(&hcrypt->lock, flags); - for (ptr = hcrypt->algs.next; ptr != &hcrypt->algs; ptr = ptr->next) { - struct ieee80211_crypto_alg *alg = - (struct ieee80211_crypto_alg *)ptr; - if (alg->ops == ops) { - list_del(&alg->list); - del_alg = alg; - break; - } - } - 
spin_unlock_irqrestore(&hcrypt->lock, flags); - if (del_alg) { - printk(KERN_DEBUG "ieee80211_crypt: unregistered algorithm " - "'%s'\n", ops->name); - kfree(del_alg); + spin_lock_irqsave(&ieee80211_crypto_lock, flags); + list_for_each_entry(alg, &ieee80211_crypto_algs, list) { + if (alg->ops == ops) + goto found; } - - return del_alg ? 0 : -1; + spin_unlock_irqrestore(&ieee80211_crypto_lock, flags); + return -EINVAL; + + found: + printk(KERN_DEBUG "ieee80211_crypt: unregistered algorithm " + "'%s'\n", ops->name); + list_del(&alg->list); + spin_unlock_irqrestore(&ieee80211_crypto_lock, flags); + kfree(alg); + return 0; } struct ieee80211_crypto_ops *ieee80211_get_crypto_ops(const char *name) { + struct ieee80211_crypto_alg *alg; unsigned long flags; - struct list_head *ptr; - struct ieee80211_crypto_alg *found_alg = NULL; - - if (hcrypt == NULL) - return NULL; - - spin_lock_irqsave(&hcrypt->lock, flags); - for (ptr = hcrypt->algs.next; ptr != &hcrypt->algs; ptr = ptr->next) { - struct ieee80211_crypto_alg *alg = - (struct ieee80211_crypto_alg *)ptr; - if (strcmp(alg->ops->name, name) == 0) { - found_alg = alg; - break; - } + + spin_lock_irqsave(&ieee80211_crypto_lock, flags); + list_for_each_entry(alg, &ieee80211_crypto_algs, list) { + if (strcmp(alg->ops->name, name) == 0) + goto found; } - spin_unlock_irqrestore(&hcrypt->lock, flags); + spin_unlock_irqrestore(&ieee80211_crypto_lock, flags); + return NULL; - if (found_alg) - return found_alg->ops; - else - return NULL; + found: + spin_unlock_irqrestore(&ieee80211_crypto_lock, flags); + return alg->ops; } static void *ieee80211_crypt_null_init(int keyidx) { return (void *)1; } + static void ieee80211_crypt_null_deinit(void *priv) { } @@ -213,56 +181,18 @@ static struct ieee80211_crypto_ops ieee80211_crypt_null = { .name = "NULL", .init = ieee80211_crypt_null_init, .deinit = ieee80211_crypt_null_deinit, - .encrypt_mpdu = NULL, - .decrypt_mpdu = NULL, - .encrypt_msdu = NULL, - .decrypt_msdu = NULL, - .set_key = NULL, 
- .get_key = NULL, - .extra_mpdu_prefix_len = 0, - .extra_mpdu_postfix_len = 0, .owner = THIS_MODULE, }; static int __init ieee80211_crypto_init(void) { - int ret = -ENOMEM; - - hcrypt = kmalloc(sizeof(*hcrypt), GFP_KERNEL); - if (!hcrypt) - goto out; - - memset(hcrypt, 0, sizeof(*hcrypt)); - INIT_LIST_HEAD(&hcrypt->algs); - spin_lock_init(&hcrypt->lock); - - ret = ieee80211_register_crypto_ops(&ieee80211_crypt_null); - if (ret < 0) { - kfree(hcrypt); - hcrypt = NULL; - } - out: - return ret; + return ieee80211_register_crypto_ops(&ieee80211_crypt_null); } static void __exit ieee80211_crypto_deinit(void) { - struct list_head *ptr, *n; - - if (hcrypt == NULL) - return; - - for (ptr = hcrypt->algs.next, n = ptr->next; ptr != &hcrypt->algs; - ptr = n, n = ptr->next) { - struct ieee80211_crypto_alg *alg = - (struct ieee80211_crypto_alg *)ptr; - list_del(ptr); - printk(KERN_DEBUG "ieee80211_crypt: unregistered algorithm " - "'%s' (deinit)\n", alg->ops->name); - kfree(alg); - } - - kfree(hcrypt); + ieee80211_unregister_crypto_ops(&ieee80211_crypt_null); + BUG_ON(!list_empty(&ieee80211_crypto_algs)); } EXPORT_SYMBOL(ieee80211_crypt_deinit_entries); -- cgit v1.2.3-18-g5258 From e4543eddfd3bf3e0d625841377fa695a519edfd4 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 8 Nov 2005 21:35:04 -0800 Subject: [PATCH] add a vfs_permission helper Most permission() calls have a struct nameidata * available. This helper takes that as an argument and thus makes sure we pass it down for lookup intents and prepares for per-mount read-only support where we need a struct vfsmount for checking whether a file is writeable. 
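The helper pattern described in this commit can be illustrated outside the kernel. What follows is a minimal userspace sketch, assuming heavily simplified mock types: the struct layouts and the mode-bit check are hypothetical stand-ins, and only the MAY_READ/MAY_WRITE values and the argument order of permission() and vfs_permission() mirror the af_unix.c hunk. The kernel's real permission() path does considerably more than compare mode bits.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel types. */
#define MAY_WRITE 2
#define MAY_READ  4

struct inode     { int mode; };              /* MAY_* bits this inode grants */
struct dentry    { struct inode *d_inode; };
struct nameidata { struct dentry *dentry; };

/* Old calling convention: the caller digs out the inode itself and
 * passes nd alongside it (or forgets to and passes NULL). */
int permission(struct inode *inode, int mask, struct nameidata *nd)
{
	(void)nd; /* the kernel forwards nd for lookup intents; unused here */
	return (inode->mode & mask) == mask ? 0 : -EACCES;
}

/* Helper in the style of vfs_permission(): everything is derived from
 * nd, so callers cannot forget to pass it down. */
int vfs_permission(struct nameidata *nd, int mask)
{
	assert(nd != NULL && nd->dentry != NULL && nd->dentry->d_inode != NULL);
	return permission(nd->dentry->d_inode, mask, nd);
}
```

A caller such as unix_find_other() then writes `vfs_permission(&nd, MAY_WRITE)` instead of repeating `nd.dentry->d_inode` and `&nd`. Because the helper receives the whole nameidata, a later per-mount check (the vfsmount use the message anticipates) could be added inside it without touching callers.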
Signed-off-by: Christoph Hellwig Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- net/unix/af_unix.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 41feca3bef8..acc73ba8bad 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -676,7 +676,7 @@ static struct sock *unix_find_other(struct sockaddr_un *sunname, int len, err = path_lookup(sunname->sun_path, LOOKUP_FOLLOW, &nd); if (err) goto fail; - err = permission(nd.dentry->d_inode,MAY_WRITE, &nd); + err = vfs_permission(&nd, MAY_WRITE); if (err) goto put_fail; -- cgit v1.2.3-18-g5258 From 49705b7743fd8f5632a95ec4c6547d169d27ac1f Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 8 Nov 2005 21:35:06 -0800 Subject: [PATCH] sanitize lookup_hash prototype ->permission and ->lookup have a struct nameidata * argument these days to pass down lookup intents. Unfortunately some callers of lookup_hash don't actually pass this one down. For lookup_one_len() we don't have a struct nameidata to pass down, but as this function is a library function only used by filesystem code this is an acceptable limitation. All other callers should pass down the nameidata, so this patch changes the lookup_hash interface to only take a struct nameidata argument and derives the other two arguments to __lookup_hash from it. All callers already have the nameidata argument available so this is not a problem. At the same time I'd like to deprecate the lookup_hash interface as there are better exported interfaces for filesystem usage. Before it can actually be removed I need to fix up rpc_pipefs. 
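The interface change amounts to passing a single context struct and deriving the remaining arguments from it. What follows is a minimal userspace sketch, assuming simplified mock types: the struct layouts, the fixed four-entry child table, and the NULL-on-miss return are hypothetical. The kernel's __lookup_hash consults the dcache, calls the directory's ->lookup, and returns a negative dentry or an ERR_PTR value rather than NULL. Only the derivation `__lookup_hash(&nd->last, nd->dentry, nd)` follows from the description and the rpc_pipe.c hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for qstr, dentry and nameidata. */
struct qstr { const char *name; size_t len; };

struct dentry {
	const char *d_name;
	struct dentry *children[4];   /* toy child table, not a dcache */
};

struct nameidata {
	struct qstr last;             /* final path component */
	struct dentry *dentry;        /* directory to look it up in */
};

/* Worker that takes all three arguments explicitly. Returns NULL on a
 * miss (unlike the kernel version, see the lead-in). */
static struct dentry *__lookup_hash(struct qstr *name, struct dentry *base,
				    struct nameidata *nd)
{
	size_t i;

	(void)nd; /* the kernel forwards nd to ->lookup for intents */
	for (i = 0; i < sizeof(base->children) / sizeof(base->children[0]); i++) {
		struct dentry *child = base->children[i];

		if (child && strlen(child->d_name) == name->len &&
		    memcmp(child->d_name, name->name, name->len) == 0)
			return child;
	}
	return NULL;
}

/* New prototype: only nd is passed. The name and parent directory are
 * derived from it, so nd always reaches the worker. */
struct dentry *lookup_hash(struct nameidata *nd)
{
	assert(nd != NULL && nd->dentry != NULL);
	return __lookup_hash(&nd->last, nd->dentry, nd);
}
```

This matches the call-site change in rpc_lookup_negative(), rpc_rmdir() and rpc_unlink(). Each caller previously spelled out `&nd.last, nd.dentry`. It now hands over `&nd` and can no longer drop the nameidata by accident.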
Signed-off-by: Christoph Hellwig Cc: Ram Pai Cc: Jeff Mahoney Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- net/sunrpc/rpc_pipe.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'net') diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c index 4f188d0a5d1..81e00a6c19d 100644 --- a/net/sunrpc/rpc_pipe.c +++ b/net/sunrpc/rpc_pipe.c @@ -603,7 +603,7 @@ rpc_lookup_negative(char *path, struct nameidata *nd) return ERR_PTR(error); dir = nd->dentry->d_inode; down(&dir->i_sem); - dentry = lookup_hash(&nd->last, nd->dentry); + dentry = lookup_hash(nd); if (IS_ERR(dentry)) goto out_err; if (dentry->d_inode) { @@ -665,7 +665,7 @@ rpc_rmdir(char *path) return error; dir = nd.dentry->d_inode; down(&dir->i_sem); - dentry = lookup_hash(&nd.last, nd.dentry); + dentry = lookup_hash(&nd); if (IS_ERR(dentry)) { error = PTR_ERR(dentry); goto out_release; @@ -726,7 +726,7 @@ rpc_unlink(char *path) return error; dir = nd.dentry->d_inode; down(&dir->i_sem); - dentry = lookup_hash(&nd.last, nd.dentry); + dentry = lookup_hash(&nd); if (IS_ERR(dentry)) { error = PTR_ERR(dentry); goto out_release; -- cgit v1.2.3-18-g5258 From 46998f59c03ecbd7c2250810f35af6fe24868845 Mon Sep 17 00:00:00 2001 From: Yasuyuki Kozakai Date: Wed, 9 Nov 2005 12:58:05 -0800 Subject: [NETFILTER]: packet counter of conntrack is 32bits The packet counter variable of conntrack was changed to 32bits from 64bits. This follows that change. Signed-off-by: Yasuyuki Kozakai Signed-off-by: Harald Welte Signed-off-by: David S. 
Miller --- net/ipv4/netfilter/ip_conntrack_netlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/ipv4/netfilter/ip_conntrack_netlink.c b/net/ipv4/netfilter/ip_conntrack_netlink.c index 82a65043a8e..431a446994f 100644 --- a/net/ipv4/netfilter/ip_conntrack_netlink.c +++ b/net/ipv4/netfilter/ip_conntrack_netlink.c @@ -175,7 +175,7 @@ ctnetlink_dump_counters(struct sk_buff *skb, const struct ip_conntrack *ct, { enum ctattr_type type = dir ? CTA_COUNTERS_REPLY: CTA_COUNTERS_ORIG; struct nfattr *nest_count = NFA_NEST(skb, type); - u_int64_t tmp; + u_int32_t tmp; tmp = htonl(ct->counters[dir].packets); NFA_PUT(skb, CTA_COUNTERS32_PACKETS, sizeof(u_int32_t), &tmp); -- cgit v1.2.3-18-g5258 From eaae4fa45e0f4cd1da0f00ae93551edb1002b2b9 Mon Sep 17 00:00:00 2001 From: Yasuyuki Kozakai Date: Wed, 9 Nov 2005 12:58:46 -0800 Subject: [NETFILTER]: refcount leak of proto when ctnetlink dumping tuple Signed-off-by: Yasuyuki Kozakai Signed-off-by: Harald Welte Signed-off-by: David S. 
Miller --- net/ipv4/netfilter/ip_conntrack_netlink.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) (limited to 'net') diff --git a/net/ipv4/netfilter/ip_conntrack_netlink.c b/net/ipv4/netfilter/ip_conntrack_netlink.c index 431a446994f..02f303cf201 100644 --- a/net/ipv4/netfilter/ip_conntrack_netlink.c +++ b/net/ipv4/netfilter/ip_conntrack_netlink.c @@ -58,14 +58,17 @@ ctnetlink_dump_tuples_proto(struct sk_buff *skb, const struct ip_conntrack_tuple *tuple) { struct ip_conntrack_protocol *proto; + int ret = 0; NFA_PUT(skb, CTA_PROTO_NUM, sizeof(u_int8_t), &tuple->dst.protonum); proto = ip_conntrack_proto_find_get(tuple->dst.protonum); - if (proto && proto->tuple_to_nfattr) - return proto->tuple_to_nfattr(skb, tuple); + if (likely(proto && proto->tuple_to_nfattr)) { + ret = proto->tuple_to_nfattr(skb, tuple); + ip_conntrack_proto_put(proto); + } - return 0; + return ret; nfattr_failure: return -1; -- cgit v1.2.3-18-g5258 From a2506c04322ca266fe2f9bd7d02a67b1972da611 Mon Sep 17 00:00:00 2001 From: Harald Welte Date: Wed, 9 Nov 2005 12:59:13 -0800 Subject: [NETFILTER] nfnetlink: nfattr_parse() can never fail, make it void nfattr_parse (and thus nfattr_parse_nested) always returns success. So we can make them 'void' and remove all the checking at the caller side. Based on original patch by Pablo Neira Ayuso Signed-off-by: Harald Welte Signed-off-by: David S. 
Miller --- net/ipv4/netfilter/ip_conntrack_netlink.c | 45 +++++------------------------ net/ipv4/netfilter/ip_conntrack_proto_tcp.c | 6 +--- net/netfilter/nfnetlink.c | 4 +-- 3 files changed, 10 insertions(+), 45 deletions(-) (limited to 'net') diff --git a/net/ipv4/netfilter/ip_conntrack_netlink.c b/net/ipv4/netfilter/ip_conntrack_netlink.c index 02f303cf201..838262e1737 100644 --- a/net/ipv4/netfilter/ip_conntrack_netlink.c +++ b/net/ipv4/netfilter/ip_conntrack_netlink.c @@ -482,9 +482,7 @@ ctnetlink_parse_tuple_ip(struct nfattr *attr, struct ip_conntrack_tuple *tuple) DEBUGP("entered %s\n", __FUNCTION__); - - if (nfattr_parse_nested(tb, CTA_IP_MAX, attr) < 0) - goto nfattr_failure; + nfattr_parse_nested(tb, CTA_IP_MAX, attr); if (nfattr_bad_size(tb, CTA_IP_MAX, cta_min_ip)) return -EINVAL; @@ -500,9 +498,6 @@ ctnetlink_parse_tuple_ip(struct nfattr *attr, struct ip_conntrack_tuple *tuple) DEBUGP("leaving\n"); return 0; - -nfattr_failure: - return -1; } static const int cta_min_proto[CTA_PROTO_MAX] = { @@ -524,8 +519,7 @@ ctnetlink_parse_tuple_proto(struct nfattr *attr, DEBUGP("entered %s\n", __FUNCTION__); - if (nfattr_parse_nested(tb, CTA_PROTO_MAX, attr) < 0) - goto nfattr_failure; + nfattr_parse_nested(tb, CTA_PROTO_MAX, attr); if (nfattr_bad_size(tb, CTA_PROTO_MAX, cta_min_proto)) return -EINVAL; @@ -542,9 +536,6 @@ ctnetlink_parse_tuple_proto(struct nfattr *attr, } return ret; - -nfattr_failure: - return -1; } static inline int @@ -558,8 +549,7 @@ ctnetlink_parse_tuple(struct nfattr *cda[], struct ip_conntrack_tuple *tuple, memset(tuple, 0, sizeof(*tuple)); - if (nfattr_parse_nested(tb, CTA_TUPLE_MAX, cda[type-1]) < 0) - goto nfattr_failure; + nfattr_parse_nested(tb, CTA_TUPLE_MAX, cda[type-1]); if (!tb[CTA_TUPLE_IP-1]) return -EINVAL; @@ -586,9 +576,6 @@ ctnetlink_parse_tuple(struct nfattr *cda[], struct ip_conntrack_tuple *tuple, DEBUGP("leaving\n"); return 0; - -nfattr_failure: - return -1; } #ifdef CONFIG_IP_NF_NAT_NEEDED @@ -606,11 +593,10 @@ static int 
ctnetlink_parse_nat_proto(struct nfattr *attr, DEBUGP("entered %s\n", __FUNCTION__); - if (nfattr_parse_nested(tb, CTA_PROTONAT_MAX, attr) < 0) - goto nfattr_failure; + nfattr_parse_nested(tb, CTA_PROTONAT_MAX, attr); if (nfattr_bad_size(tb, CTA_PROTONAT_MAX, cta_min_protonat)) - goto nfattr_failure; + return -1; npt = ip_nat_proto_find_get(ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.protonum); if (!npt) @@ -629,9 +615,6 @@ static int ctnetlink_parse_nat_proto(struct nfattr *attr, DEBUGP("leaving\n"); return 0; - -nfattr_failure: - return -1; } static inline int @@ -645,8 +628,7 @@ ctnetlink_parse_nat(struct nfattr *cda[], memset(range, 0, sizeof(*range)); - if (nfattr_parse_nested(tb, CTA_NAT_MAX, cda[CTA_NAT-1]) < 0) - goto nfattr_failure; + nfattr_parse_nested(tb, CTA_NAT_MAX, cda[CTA_NAT-1]); if (tb[CTA_NAT_MINIP-1]) range->min_ip = *(u_int32_t *)NFA_DATA(tb[CTA_NAT_MINIP-1]); @@ -668,9 +650,6 @@ ctnetlink_parse_nat(struct nfattr *cda[], DEBUGP("leaving\n"); return 0; - -nfattr_failure: - return -1; } #endif @@ -681,8 +660,7 @@ ctnetlink_parse_help(struct nfattr *attr, char **helper_name) DEBUGP("entered %s\n", __FUNCTION__); - if (nfattr_parse_nested(tb, CTA_HELP_MAX, attr) < 0) - goto nfattr_failure; + nfattr_parse_nested(tb, CTA_HELP_MAX, attr); if (!tb[CTA_HELP_NAME-1]) return -EINVAL; @@ -690,9 +668,6 @@ ctnetlink_parse_help(struct nfattr *attr, char **helper_name) *helper_name = NFA_DATA(tb[CTA_HELP_NAME-1]); return 0; - -nfattr_failure: - return -1; } static int @@ -960,8 +935,7 @@ ctnetlink_change_protoinfo(struct ip_conntrack *ct, struct nfattr *cda[]) u_int16_t npt = ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.protonum; int err = 0; - if (nfattr_parse_nested(tb, CTA_PROTOINFO_MAX, attr) < 0) - goto nfattr_failure; + nfattr_parse_nested(tb, CTA_PROTOINFO_MAX, attr); proto = ip_conntrack_proto_find_get(npt); if (!proto) @@ -972,9 +946,6 @@ ctnetlink_change_protoinfo(struct ip_conntrack *ct, struct nfattr *cda[]) ip_conntrack_proto_put(proto); return 
err; - -nfattr_failure: - return -ENOMEM; } static int diff --git a/net/ipv4/netfilter/ip_conntrack_proto_tcp.c b/net/ipv4/netfilter/ip_conntrack_proto_tcp.c index d6701cafbcc..6ea4b22ff28 100644 --- a/net/ipv4/netfilter/ip_conntrack_proto_tcp.c +++ b/net/ipv4/netfilter/ip_conntrack_proto_tcp.c @@ -362,8 +362,7 @@ static int nfattr_to_tcp(struct nfattr *cda[], struct ip_conntrack *ct) struct nfattr *attr = cda[CTA_PROTOINFO_TCP-1]; struct nfattr *tb[CTA_PROTOINFO_TCP_MAX]; - if (nfattr_parse_nested(tb, CTA_PROTOINFO_TCP_MAX, attr) < 0) - goto nfattr_failure; + nfattr_parse_nested(tb, CTA_PROTOINFO_TCP_MAX, attr); if (!tb[CTA_PROTOINFO_TCP_STATE-1]) return -EINVAL; @@ -374,9 +373,6 @@ static int nfattr_to_tcp(struct nfattr *cda[], struct ip_conntrack *ct) write_unlock_bh(&tcp_lock); return 0; - -nfattr_failure: - return -1; } #endif diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c index 4bc27a6334c..f8bd7c7e792 100644 --- a/net/netfilter/nfnetlink.c +++ b/net/netfilter/nfnetlink.c @@ -128,7 +128,7 @@ void __nfa_fill(struct sk_buff *skb, int attrtype, int attrlen, memset(NFA_DATA(nfa) + attrlen, 0, NFA_ALIGN(size) - size); } -int nfattr_parse(struct nfattr *tb[], int maxattr, struct nfattr *nfa, int len) +void nfattr_parse(struct nfattr *tb[], int maxattr, struct nfattr *nfa, int len) { memset(tb, 0, sizeof(struct nfattr *) * maxattr); @@ -138,8 +138,6 @@ int nfattr_parse(struct nfattr *tb[], int maxattr, struct nfattr *nfa, int len) tb[flavor-1] = nfa; nfa = NFA_NEXT(nfa, len); } - - return 0; } /** -- cgit v1.2.3-18-g5258 From 51df784ed739246a3774b300e5f536e17bec36ed Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Wed, 9 Nov 2005 12:59:41 -0800 Subject: [NETFILTER] ctnetlink: check if protoinfo is present This fixes an oops triggered from userspace. If we don't pass information about the private protocol info, the reference to attr will be NULL. This is likely to happen in update messages. 
Signed-off-by: Pablo Neira Ayuso Signed-off-by: Harald Welte Signed-off-by: David S. Miller --- net/ipv4/netfilter/ip_conntrack_proto_tcp.c | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'net') diff --git a/net/ipv4/netfilter/ip_conntrack_proto_tcp.c b/net/ipv4/netfilter/ip_conntrack_proto_tcp.c index 6ea4b22ff28..468c6003b4c 100644 --- a/net/ipv4/netfilter/ip_conntrack_proto_tcp.c +++ b/net/ipv4/netfilter/ip_conntrack_proto_tcp.c @@ -362,6 +362,11 @@ static int nfattr_to_tcp(struct nfattr *cda[], struct ip_conntrack *ct) struct nfattr *attr = cda[CTA_PROTOINFO_TCP-1]; struct nfattr *tb[CTA_PROTOINFO_TCP_MAX]; + /* updates could not contain anything about the private + * protocol info, in that case skip the parsing */ + if (!attr) + return 0; + nfattr_parse_nested(tb, CTA_PROTOINFO_TCP_MAX, attr); if (!tb[CTA_PROTOINFO_TCP_STATE-1]) -- cgit v1.2.3-18-g5258 From 02a78cdf425156b86abdb6883f837a70fb7106da Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Wed, 9 Nov 2005 13:00:04 -0800 Subject: [NETFILTER] ctnetlink: add marking support from userspace This patch adds support for conntrack marking from user space. Signed-off-by: Pablo Neira Ayuso Signed-off-by: Harald Welte Signed-off-by: David S. 
Miller --- net/ipv4/netfilter/ip_conntrack_netlink.c | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'net') diff --git a/net/ipv4/netfilter/ip_conntrack_netlink.c b/net/ipv4/netfilter/ip_conntrack_netlink.c index 838262e1737..09957f9be97 100644 --- a/net/ipv4/netfilter/ip_conntrack_netlink.c +++ b/net/ipv4/netfilter/ip_conntrack_netlink.c @@ -979,6 +979,11 @@ ctnetlink_change_conntrack(struct ip_conntrack *ct, struct nfattr *cda[]) return err; } +#if defined(CONFIG_IP_NF_CONNTRACK_MARK) + if (cda[CTA_MARK-1]) + ct->mark = ntohl(*(u_int32_t *)NFA_DATA(cda[CTA_MARK-1])); +#endif + DEBUGP("all done\n"); return 0; } @@ -1022,6 +1027,11 @@ ctnetlink_create_conntrack(struct nfattr *cda[], if (ct->helper) ip_conntrack_helper_put(ct->helper); +#if defined(CONFIG_IP_NF_CONNTRACK_MARK) + if (cda[CTA_MARK-1]) + ct->mark = ntohl(*(u_int32_t *)NFA_DATA(cda[CTA_MARK-1])); +#endif + DEBUGP("conntrack with id %u inserted\n", ct->id); return 0; -- cgit v1.2.3-18-g5258 From 119a31849442215fa66e4d18a33443a55c45e631 Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Wed, 9 Nov 2005 13:00:29 -0800 Subject: [NETFILTER] ctnetlink: add module alias to fix autoloading Add missing module alias. This is a must to load ctnetlink on demand. For example, the conntrack tool will fail if the module isn't loaded. Signed-off-by: Pablo Neira Ayuso Signed-off-by: Harald Welte Signed-off-by: David S. 
Miller --- net/ipv4/netfilter/ip_conntrack_netlink.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'net') diff --git a/net/ipv4/netfilter/ip_conntrack_netlink.c b/net/ipv4/netfilter/ip_conntrack_netlink.c index 09957f9be97..842fe2c081c 100644 --- a/net/ipv4/netfilter/ip_conntrack_netlink.c +++ b/net/ipv4/netfilter/ip_conntrack_netlink.c @@ -1538,6 +1538,8 @@ static struct nfnetlink_subsystem ctnl_exp_subsys = { .cb = ctnl_exp_cb, }; +MODULE_ALIAS_NFNL_SUBSYS(NFNL_SUBSYS_CTNETLINK); + static int __init ctnetlink_init(void) { int ret; -- cgit v1.2.3-18-g5258 From 7a4fe3664b3cfecd2a40a46f54c71333639e28b7 Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Wed, 9 Nov 2005 13:00:47 -0800 Subject: [NETFILTER] ctnetlink: kill unused includes Kill some useless headers included in ctnetlink. They aren't used in any way. Signed-off-by: Pablo Neira Ayuso Signed-off-by: Harald Welte Signed-off-by: David S. Miller --- net/ipv4/netfilter/ip_conntrack_netlink.c | 3 --- 1 file changed, 3 deletions(-) (limited to 'net') diff --git a/net/ipv4/netfilter/ip_conntrack_netlink.c b/net/ipv4/netfilter/ip_conntrack_netlink.c index 842fe2c081c..bf584249571 100644 --- a/net/ipv4/netfilter/ip_conntrack_netlink.c +++ b/net/ipv4/netfilter/ip_conntrack_netlink.c @@ -28,11 +28,8 @@ #include #include #include -#include #include -#include -#include #include #include #include -- cgit v1.2.3-18-g5258 From 81e5c27d08bb39e646fe822ea80ab8feba62b94d Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Wed, 9 Nov 2005 13:01:19 -0800 Subject: [NETFILTER] ctnetlink: get_conntrack can use GFP_KERNEL ctnetlink_get_conntrack is always called from user context, so GFP_KERNEL is enough. Signed-off-by: Pablo Neira Ayuso Signed-off-by: Harald Welte Signed-off-by: David S. 
Miller --- net/ipv4/netfilter/ip_conntrack_netlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/ipv4/netfilter/ip_conntrack_netlink.c b/net/ipv4/netfilter/ip_conntrack_netlink.c index bf584249571..d1dde1417ef 100644 --- a/net/ipv4/netfilter/ip_conntrack_netlink.c +++ b/net/ipv4/netfilter/ip_conntrack_netlink.c @@ -779,7 +779,7 @@ ctnetlink_get_conntrack(struct sock *ctnl, struct sk_buff *skb, ct = tuplehash_to_ctrack(h); err = -ENOMEM; - skb2 = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC); + skb2 = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL); if (!skb2) { ip_conntrack_put(ct); return -ENOMEM; -- cgit v1.2.3-18-g5258 From 5978a9b82c55b82a1087bd86e0ae8b00f94d0d0b Mon Sep 17 00:00:00 2001 From: Philip Craig Date: Wed, 9 Nov 2005 13:01:53 -0800 Subject: [NETFILTER] PPTP helper: fix PNS-PAC expectation call id The reply tuple of the PNS->PAC expectation was using the wrong call id. So we had the following situation: - PNS behind NAT firewall - PNS call id requires NATing - PNS->PAC gre packet arrives first then the PNS->PAC expectation is matched, and the other expectation is deleted, but the PAC->PNS gre packets do not match the gre conntrack because the call id is wrong. We also cannot use ip_nat_follow_master(). Signed-off-by: Philip Craig Signed-off-by: Harald Welte Signed-off-by: David S. 
Miller --- net/ipv4/netfilter/ip_nat_helper_pptp.c | 28 ++++++++++++++++++++++++++-- 1 file changed, 26 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/ipv4/netfilter/ip_nat_helper_pptp.c b/net/ipv4/netfilter/ip_nat_helper_pptp.c index ee6ab74ad3a..e546203f566 100644 --- a/net/ipv4/netfilter/ip_nat_helper_pptp.c +++ b/net/ipv4/netfilter/ip_nat_helper_pptp.c @@ -73,6 +73,7 @@ static void pptp_nat_expected(struct ip_conntrack *ct, struct ip_conntrack_tuple t; struct ip_ct_pptp_master *ct_pptp_info; struct ip_nat_pptp *nat_pptp_info; + struct ip_nat_range range; ct_pptp_info = &master->help.ct_pptp_info; nat_pptp_info = &master->nat.help.nat_pptp_info; @@ -110,7 +111,30 @@ static void pptp_nat_expected(struct ip_conntrack *ct, DEBUGP("not found!\n"); } - ip_nat_follow_master(ct, exp); + /* This must be a fresh one. */ + BUG_ON(ct->status & IPS_NAT_DONE_MASK); + + /* Change src to where master sends to */ + range.flags = IP_NAT_RANGE_MAP_IPS; + range.min_ip = range.max_ip + = ct->master->tuplehash[!exp->dir].tuple.dst.ip; + if (exp->dir == IP_CT_DIR_ORIGINAL) { + range.flags |= IP_NAT_RANGE_PROTO_SPECIFIED; + range.min = range.max = exp->saved_proto; + } + /* hook doesn't matter, but it has to do source manip */ + ip_nat_setup_info(ct, &range, NF_IP_POST_ROUTING); + + /* For DST manip, map port here to where it's expected. 
*/ + range.flags = IP_NAT_RANGE_MAP_IPS; + range.min_ip = range.max_ip + = ct->master->tuplehash[!exp->dir].tuple.src.ip; + if (exp->dir == IP_CT_DIR_REPLY) { + range.flags |= IP_NAT_RANGE_PROTO_SPECIFIED; + range.min = range.max = exp->saved_proto; + } + /* hook doesn't matter, but it has to do destination manip */ + ip_nat_setup_info(ct, &range, NF_IP_PRE_ROUTING); } /* outbound packets == from PNS to PAC */ @@ -213,7 +237,7 @@ pptp_exp_gre(struct ip_conntrack_expect *expect_orig, /* alter expectation for PNS->PAC direction */ invert_tuplepr(&inv_t, &expect_orig->tuple); - expect_orig->saved_proto.gre.key = htons(nat_pptp_info->pac_call_id); + expect_orig->saved_proto.gre.key = htons(ct_pptp_info->pns_call_id); expect_orig->tuple.src.u.gre.key = htons(nat_pptp_info->pns_call_id); expect_orig->tuple.dst.u.gre.key = htons(ct_pptp_info->pac_call_id); expect_orig->dir = IP_CT_DIR_ORIGINAL; -- cgit v1.2.3-18-g5258 From ed77de9fc69076e6e7c85edf7c1b70650f53121a Mon Sep 17 00:00:00 2001 From: Harald Welte Date: Wed, 9 Nov 2005 13:02:16 -0800 Subject: [NETFILTER] nfnetlink: only load subsystems if CAP_NET_ADMIN is set Without this patch, any user can cause nfnetlink subsystems to be autoloaded. Those subsystems however could add significant processing overhead to packet processing, and would refuse any configuration messages from non-CAP_NET_ADMIN processes anyway. This patch follows a suggestion from Patrick McHardy. Signed-off-by: Harald Welte Signed-off-by: David S. 
Miller --- net/netfilter/nfnetlink.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) (limited to 'net') diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c index f8bd7c7e792..83f4c53030f 100644 --- a/net/netfilter/nfnetlink.c +++ b/net/netfilter/nfnetlink.c @@ -240,15 +240,18 @@ static inline int nfnetlink_rcv_msg(struct sk_buff *skb, ss = nfnetlink_get_subsys(type); if (!ss) { #ifdef CONFIG_KMOD - /* don't call nfnl_shunlock, since it would reenter - * with further packet processing */ - up(&nfnl_sem); - request_module("nfnetlink-subsys-%d", NFNL_SUBSYS_ID(type)); - nfnl_shlock(); - ss = nfnetlink_get_subsys(type); + if (cap_raised(NETLINK_CB(skb).eff_cap, CAP_NET_ADMIN)) { + /* don't call nfnl_shunlock, since it would reenter + * with further packet processing */ + up(&nfnl_sem); + request_module("nfnetlink-subsys-%d", + NFNL_SUBSYS_ID(type)); + nfnl_shlock(); + ss = nfnetlink_get_subsys(type); + } if (!ss) #endif - goto err_inval; + goto err_inval; } nc = nfnetlink_find_client(type, ss); -- cgit v1.2.3-18-g5258 From d63a92810807e8da298895236f2b99697e884014 Mon Sep 17 00:00:00 2001 From: Yasuyuki Kozakai Date: Wed, 9 Nov 2005 13:02:45 -0800 Subject: [NETFILTER]: stop tracking ICMP error at early point Currently, connection tracking handles an ICMP error like a normal packet if it fails to find the related connection, but tracking it then fails anyway. This makes connection tracking stop tracking such ICMP errors at an early point. Signed-off-by: Yasuyuki Kozakai Signed-off-by: Harald Welte Signed-off-by: David S.
Miller --- net/ipv4/netfilter/ip_conntrack_proto_icmp.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'net') diff --git a/net/ipv4/netfilter/ip_conntrack_proto_icmp.c b/net/ipv4/netfilter/ip_conntrack_proto_icmp.c index 98f0015dd25..9481d159acb 100644 --- a/net/ipv4/netfilter/ip_conntrack_proto_icmp.c +++ b/net/ipv4/netfilter/ip_conntrack_proto_icmp.c @@ -151,13 +151,13 @@ icmp_error_message(struct sk_buff *skb, /* Not enough header? */ inside = skb_header_pointer(skb, skb->nh.iph->ihl*4, sizeof(_in), &_in); if (inside == NULL) - return NF_ACCEPT; + return -NF_ACCEPT; /* Ignore ICMP's containing fragments (shouldn't happen) */ if (inside->ip.frag_off & htons(IP_OFFSET)) { DEBUGP("icmp_error_track: fragment of proto %u\n", inside->ip.protocol); - return NF_ACCEPT; + return -NF_ACCEPT; } innerproto = ip_conntrack_proto_find_get(inside->ip.protocol); @@ -166,7 +166,7 @@ icmp_error_message(struct sk_buff *skb, if (!ip_ct_get_tuple(&inside->ip, skb, dataoff, &origtuple, innerproto)) { DEBUGP("icmp_error: ! 
get_tuple p=%u", inside->ip.protocol); ip_conntrack_proto_put(innerproto); - return NF_ACCEPT; + return -NF_ACCEPT; } /* Ordinarily, we'd expect the inverted tupleproto, but it's @@ -174,7 +174,7 @@ icmp_error_message(struct sk_buff *skb, if (!ip_ct_invert_tuple(&innertuple, &origtuple, innerproto)) { DEBUGP("icmp_error_track: Can't invert tuple\n"); ip_conntrack_proto_put(innerproto); - return NF_ACCEPT; + return -NF_ACCEPT; } ip_conntrack_proto_put(innerproto); @@ -190,7 +190,7 @@ icmp_error_message(struct sk_buff *skb, if (!h) { DEBUGP("icmp_error_track: no match\n"); - return NF_ACCEPT; + return -NF_ACCEPT; } /* Reverse direction from that found */ if (DIRECTION(h) != IP_CT_DIR_REPLY) -- cgit v1.2.3-18-g5258 From fe902a91ff427af7dbf20e7c196623b2a4eade13 Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Wed, 9 Nov 2005 13:03:09 -0800 Subject: [NETFILTER] ctnetlink: return -EINVAL if size is wrong Return -EINVAL if the size isn't OK instead of -EPERM. Signed-off-by: Pablo Neira Ayuso Signed-off-by: Harald Welte Signed-off-by: David S. 
Miller --- net/ipv4/netfilter/ip_conntrack_netlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/ipv4/netfilter/ip_conntrack_netlink.c b/net/ipv4/netfilter/ip_conntrack_netlink.c index d1dde1417ef..cfc5487e627 100644 --- a/net/ipv4/netfilter/ip_conntrack_netlink.c +++ b/net/ipv4/netfilter/ip_conntrack_netlink.c @@ -593,7 +593,7 @@ static int ctnetlink_parse_nat_proto(struct nfattr *attr, nfattr_parse_nested(tb, CTA_PROTONAT_MAX, attr); if (nfattr_bad_size(tb, CTA_PROTONAT_MAX, cta_min_protonat)) - return -1; + return -EINVAL; npt = ip_nat_proto_find_get(ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.protonum); if (!npt) -- cgit v1.2.3-18-g5258 From fcda46128d5cb50075339b79ce585ab767337e9e Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Wed, 9 Nov 2005 13:03:26 -0800 Subject: [NETFILTER] ctnetlink: propagate error instead of returning -EPERM Propagate the error to userspace instead of returning -EPERM if the get conntrack operation fails. Signed-off-by: Pablo Neira Ayuso Signed-off-by: Harald Welte Signed-off-by: David S. Miller --- net/ipv4/netfilter/ip_conntrack_netlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/ipv4/netfilter/ip_conntrack_netlink.c b/net/ipv4/netfilter/ip_conntrack_netlink.c index cfc5487e627..7fe74565964 100644 --- a/net/ipv4/netfilter/ip_conntrack_netlink.c +++ b/net/ipv4/netfilter/ip_conntrack_netlink.c @@ -802,7 +802,7 @@ ctnetlink_get_conntrack(struct sock *ctnl, struct sk_buff *skb, free: kfree_skb(skb2); out: - return -1; + return err; } static inline int -- cgit v1.2.3-18-g5258 From a856a19a9f3ee14fc0d555470f3af138aeb0245c Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Wed, 9 Nov 2005 13:03:42 -0800 Subject: [NETFILTER] ctnetlink: Add support to identify expectations by IDs Signed-off-by: Pablo Neira Ayuso Signed-off-by: Harald Welte Signed-off-by: David S.
Miller --- net/ipv4/netfilter/ip_conntrack_netlink.c | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'net') diff --git a/net/ipv4/netfilter/ip_conntrack_netlink.c b/net/ipv4/netfilter/ip_conntrack_netlink.c index 7fe74565964..5c1c0a3d1c4 100644 --- a/net/ipv4/netfilter/ip_conntrack_netlink.c +++ b/net/ipv4/netfilter/ip_conntrack_netlink.c @@ -1293,6 +1293,14 @@ ctnetlink_get_expect(struct sock *ctnl, struct sk_buff *skb, if (!exp) return -ENOENT; + if (cda[CTA_EXPECT_ID-1]) { + u_int32_t id = *(u_int32_t *)NFA_DATA(cda[CTA_EXPECT_ID-1]); + if (exp->id != ntohl(id)) { + ip_conntrack_expect_put(exp); + return -ENOENT; + } + } + err = -ENOMEM; skb2 = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL); if (!skb2) -- cgit v1.2.3-18-g5258 From 439a9994bb6ae3c7cab1f0b776bca6bc7aa58a11 Mon Sep 17 00:00:00 2001 From: Krzysztof Piotr Oledzki Date: Wed, 9 Nov 2005 13:04:08 -0800 Subject: [NETFILTER] ctnetlink: Fix oops when no ICMP ID info in message This patch fixes a userspace-triggered oops. If there is no ICMP_ID info, the reference to attr will be NULL. Signed-off-by: Krzysztof Piotr Oledzki Signed-off-by: Pablo Neira Ayuso Signed-off-by: Harald Welte Signed-off-by: David S.
Miller --- net/ipv4/netfilter/ip_conntrack_proto_icmp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/net/ipv4/netfilter/ip_conntrack_proto_icmp.c b/net/ipv4/netfilter/ip_conntrack_proto_icmp.c index 9481d159acb..083951e2069 100644 --- a/net/ipv4/netfilter/ip_conntrack_proto_icmp.c +++ b/net/ipv4/netfilter/ip_conntrack_proto_icmp.c @@ -296,7 +296,8 @@ static int icmp_nfattr_to_tuple(struct nfattr *tb[], struct ip_conntrack_tuple *tuple) { if (!tb[CTA_PROTO_ICMP_TYPE-1] - || !tb[CTA_PROTO_ICMP_CODE-1]) + || !tb[CTA_PROTO_ICMP_CODE-1] + || !tb[CTA_PROTO_ICMP_ID-1]) return -1; tuple->dst.u.icmp.type = -- cgit v1.2.3-18-g5258 From 5fd52fe0989f8c84abd8d4a40ded79d4da911744 Mon Sep 17 00:00:00 2001 From: Krzysztof Piotr Oledzki Date: Wed, 9 Nov 2005 13:04:32 -0800 Subject: [NETFILTER] ctnetlink: ICMP_ID is u_int16_t not u_int8_t. Signed-off-by: Krzysztof Piotr Oledzki Signed-off-by: Pablo Neira Ayuso Signed-off-by: Harald Welte Signed-off-by: David S. Miller --- net/ipv4/netfilter/ip_conntrack_proto_icmp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/ipv4/netfilter/ip_conntrack_proto_icmp.c b/net/ipv4/netfilter/ip_conntrack_proto_icmp.c index 083951e2069..5198f3a1e2c 100644 --- a/net/ipv4/netfilter/ip_conntrack_proto_icmp.c +++ b/net/ipv4/netfilter/ip_conntrack_proto_icmp.c @@ -305,7 +305,7 @@ static int icmp_nfattr_to_tuple(struct nfattr *tb[], tuple->dst.u.icmp.code = *(u_int8_t *)NFA_DATA(tb[CTA_PROTO_ICMP_CODE-1]); tuple->src.u.icmp.id = - *(u_int8_t *)NFA_DATA(tb[CTA_PROTO_ICMP_ID-1]); + *(u_int16_t *)NFA_DATA(tb[CTA_PROTO_ICMP_ID-1]); return 0; } -- cgit v1.2.3-18-g5258 From 44fd0261d3509b0b4303fd9ba792058d230186ab Mon Sep 17 00:00:00 2001 From: Peter Chubb Date: Wed, 9 Nov 2005 13:05:47 -0800 Subject: [IPV6]: Fix fallout from CONFIG_IPV6_PRIVACY Trying to build today's 2.6.14+git snapshot gives undefined references to use_tempaddr. Looks like an ifdef got left out.
Signed-off-by: Peter Chubb Signed-off-by: David S. Miller --- net/ipv6/addrconf.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index b7a5f51238b..ddcf7754eec 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -1022,6 +1022,7 @@ int ipv6_dev_get_saddr(struct net_device *daddr_dev, continue; } +#ifdef CONFIG_IPV6_PRIVACY /* Rule 7: Prefer public address * Note: prefer temprary address if use_tempaddr >= 2 */ @@ -1042,7 +1043,7 @@ int ipv6_dev_get_saddr(struct net_device *daddr_dev, if (hiscore.attrs & IPV6_SADDR_SCORE_PRIVACY) continue; } - +#endif /* Rule 8: Use longest matching prefix */ if (hiscore.rule < 8) hiscore.matchlen = ipv6_addr_diff(&ifa_result->addr, daddr); -- cgit v1.2.3-18-g5258 From 9f0ede52a0ebfe1fe99ee5bfd99d17e6ac0c503d Mon Sep 17 00:00:00 2001 From: Ken-ichirou MATSUZAWA Date: Wed, 9 Nov 2005 13:08:29 -0800 Subject: [IPV6]: ip6ip6_lock is not unlocked in error path. From: Ken-ichirou MATSUZAWA Signed-off-by: David S. Miller --- net/ipv6/ip6_tunnel.c | 1 + 1 file changed, 1 insertion(+) (limited to 'net') diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index e6b0e3954c0..e315d0f80af 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -525,6 +525,7 @@ ip6ip6_rcv(struct sk_buff **pskb, unsigned int *nhoffp) if ((t = ip6ip6_tnl_lookup(&ipv6h->saddr, &ipv6h->daddr)) != NULL) { if (!xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) { + read_unlock(&ip6ip6_lock); kfree_skb(skb); return 0; } -- cgit v1.2.3-18-g5258
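The ip6ip6_lock fix adds the missing read_unlock() before the early return on the xfrm6_policy_check() failure path. A common way to make this class of leak structurally harder is a single unlock label that every post-lock exit passes through. The user-space sketch below models that shape with a hypothetical counting lock in place of the kernel rwlock; the names and return values belong to the sketch, not to net/ipv6/ip6_tunnel.c, which (per the diff) instead adds an explicit unlock in the failing branch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for read_lock()/read_unlock(): a holder count
 * lets a test detect a lock that was never released. */
static int readers_held;

static void fake_read_lock(void)
{
	readers_held++;
}

static void fake_read_unlock(void)
{
	assert(readers_held > 0);   /* unlocking an unheld lock is a bug */
	readers_held--;
}

/* Every exit taken after the lock is acquired goes through out_unlock,
 * so the policy-failure branch cannot leak the lock.
 * Sketch-only return convention: 0 = packet consumed, 1 = no tunnel. */
static int tnl_rcv(bool tunnel_found, bool policy_ok)
{
	int ret = 1;

	fake_read_lock();
	if (tunnel_found) {
		ret = 0;
		if (!policy_ok)
			goto out_unlock;    /* drop path (kfree_skb() in the kernel) */
		/* ... decapsulate and deliver the inner packet ... */
	}
out_unlock:
	fake_read_unlock();
	return ret;
}
```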
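The conntrack packet-counter commit narrows the temporary in ctnetlink_dump_counters() to match the 32-bit attribute written by NFA_PUT(). One concrete consequence of the old mismatch, reasoned from the diff rather than stated in the commit message, is that copying four bytes out of a 64-bit variable picks up a different half depending on host byte order. The sketch below reproduces this in user space, with memcpy() standing in for NFA_PUT() and hypothetical function names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Stand-in for NFA_PUT(skb, type, sizeof(u_int32_t), &tmp):
 * copy exactly four bytes starting at src. */
static void put_u32_attr(unsigned char out[4], const void *src)
{
	memcpy(out, src, sizeof(uint32_t));
}

/* After the patch: the temporary is as wide as the attribute. */
static void dump_packets_u32(unsigned char out[4], uint32_t packets)
{
	uint32_t tmp = htonl(packets);

	put_u32_attr(out, &tmp);
}

/* Before the patch: a 64-bit temporary. Its first four bytes are the low
 * half on a little-endian host but the (zero) high half on a big-endian
 * host, so the emitted value depends on the architecture. */
static void dump_packets_u64(unsigned char out[4], uint32_t packets)
{
	uint64_t tmp = htonl(packets);

	put_u32_attr(out, &tmp);
}
```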
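Before the refcount-leak fix, ctnetlink_dump_tuples_proto() returned the callback's result directly, so the reference taken by ip_conntrack_proto_find_get() was never dropped on that path. The user-space sketch below models the get/put pairing with a hypothetical struct proto whose counter stands in for the module reference; it releases the reference on every path where one was taken.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor standing in for struct ip_conntrack_protocol;
 * refcnt models the reference taken by the find_get() lookup. */
struct proto {
	int refcnt;
	int (*to_attr)(int value);   /* optional dump callback, may be NULL */
};

static struct proto *proto_find_get(struct proto *p)
{
	if (p)
		p->refcnt++;
	return p;
}

static void proto_put(struct proto *p)
{
	assert(p->refcnt > 0);
	p->refcnt--;
}

/* Sample callback for demonstration only. */
static int echo_attr(int value)
{
	return value;
}

/* Each successful get is balanced by a put before returning, whether or
 * not the protocol supplies a callback. */
static int dump_proto(struct proto *registered, int value)
{
	struct proto *p = proto_find_get(registered);
	int ret = 0;

	if (p && p->to_attr)
		ret = p->to_attr(value);
	if (p)
		proto_put(p);
	return ret;
}
```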
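The nfattr_parse() change rests on one observation: the parser only records where each known attribute starts, so it has no failure path, and checks such as nfattr_bad_size() or missing-attribute tests stay with the callers. A simplified user-space version of that loop is sketched below; struct tlv is a hypothetical header loosely modelled on struct nfattr, and NFA_ALIGN padding is deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute header: len covers header plus payload,
 * type is 1-based. No alignment padding in this sketch. */
struct tlv {
	uint16_t len;
	uint16_t type;
};

/* Fill tb[type - 1] with a pointer to each attribute whose type is in
 * 1..maxattr; unknown types are skipped and a malformed length ends the
 * walk. Nothing here can fail, so it returns void and validation is the
 * caller's job. */
static void tlv_parse(const unsigned char *tb[], int maxattr,
		      const unsigned char *buf, size_t len)
{
	for (int i = 0; i < maxattr; i++)
		tb[i] = NULL;

	while (len >= sizeof(struct tlv)) {
		struct tlv hdr;

		memcpy(&hdr, buf, sizeof(hdr));   /* avoids unaligned access */
		if (hdr.len < sizeof(hdr) || hdr.len > len)
			break;
		if (hdr.type >= 1 && hdr.type <= maxattr)
			tb[hdr.type - 1] = buf;
		buf += hdr.len;
		len -= hdr.len;
	}
}
```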
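The conntrack marking commit, like the protoinfo NULL check before it, treats an attribute as optional: it is applied only if present, and the wire value is converted with ntohl(). A user-space sketch of that step follows, with a hypothetical struct conn and the attribute payload passed as a raw pointer instead of an nfattr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

struct conn {            /* hypothetical stand-in for struct ip_conntrack */
	uint32_t mark;
};

/* mark_attr points at a 4-byte network-order payload, or is NULL when the
 * message carried no mark attribute. An absent attribute leaves the
 * existing mark untouched, mirroring the cda[CTA_MARK-1] test. */
static void apply_mark(struct conn *ct, const void *mark_attr)
{
	uint32_t wire;

	if (mark_attr == NULL)
		return;
	memcpy(&wire, mark_attr, sizeof(wire));
	ct->mark = ntohl(wire);
}
```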
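The nfnetlink autoload commit puts request_module() behind a CAP_NET_ADMIN check on the sender. The decision logic reduces to the sketch below. The 32-bit capability mask, the counter, and the assumption that a module load always succeeds are simplifications for illustration; has_cap() is a hypothetical helper in the spirit of cap_raised().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CAP_NET_ADMIN 12   /* capability number from linux/capability.h */

static bool subsys_registered;
static int module_requests;   /* counts stand-in request_module() calls */

/* Bit test on a 32-bit effective-capability mask. */
static bool has_cap(uint32_t eff_cap, int cap)
{
	return (eff_cap >> cap) & 1u;
}

/* Returns true if the subsystem is available for this message. Autoloading
 * is attempted only for CAP_NET_ADMIN senders; other senders see the
 * current state, and the caller rejects the message if it is missing. */
static bool find_or_autoload_subsys(uint32_t sender_eff_cap)
{
	if (!subsys_registered && has_cap(sender_eff_cap, CAP_NET_ADMIN)) {
		module_requests++;        /* request_module("nfnetlink-subsys-%d", ...) */
		subsys_registered = true; /* assume the load succeeded */
	}
	return subsys_registered;
}
```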
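The ICMP-error commit only flips the sign of the return values in icmp_error_message(). This works because of how the caller interprets the protocol's error hook. Assuming the convention in ip_conntrack_in() is that a non-positive return is negated and used directly as the verdict, -NF_ACCEPT means "accept the packet but stop tracking it here". A user-space model of that convention, with hypothetical function names, follows.

```c
#include <assert.h>
#include <stdbool.h>

#define NF_ACCEPT 1   /* verdict value from linux/netfilter.h */

static int normally_tracked;   /* packets that went on to normal tracking */

/* Hypothetical error hook following the assumed convention: a positive
 * return means "continue with normal tracking"; zero or a negative value
 * means "stop now and use -ret as the verdict". */
static int icmp_error_hook(bool unresolvable_icmp_error)
{
	if (!unresolvable_icmp_error)
		return NF_ACCEPT;
	/* Before the patch this path returned NF_ACCEPT and the packet was
	 * tracked like any other; now it returns -NF_ACCEPT. */
	return -NF_ACCEPT;
}

static int conntrack_in(bool unresolvable_icmp_error)
{
	int ret = icmp_error_hook(unresolvable_icmp_error);

	if (ret <= 0)
		return -ret;            /* -(-NF_ACCEPT) == NF_ACCEPT, untracked */
	normally_tracked++;
	return NF_ACCEPT;
}
```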
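Both ctnetlink -EPERM commits turn on the same detail. On Linux, EPERM is 1, so a bare `return -1` reaches userspace as "Operation not permitted" regardless of the real cause. The sketch below shows the alternative adopted by the get_conntrack change: each failure path records its specific error in err and a common exit returns it. The function itself is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical get-style handler: each failure path stores its own errno
 * value in err, and the common exit returns it instead of a hard-coded -1. */
static int get_object(bool found, bool alloc_ok)
{
	int err = -ENOENT;

	if (!found)
		goto out;
	err = -ENOMEM;
	if (!alloc_ok)
		goto out;
	return 0;
out:
	return err;
}
```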
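The ICMP_ID commit widens a one-byte read to two bytes. The difference is easy to see in user space: when the identifier is stored in network byte order, an 8-bit read returns only its high-order byte. The helper names below are illustrative, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Copy a network-order 16-bit ICMP identifier out of an attribute payload.
 * memcpy sidesteps alignment concerns; the value stays in network order. */
static uint16_t read_icmp_id(const void *payload)
{
	uint16_t id;

	memcpy(&id, payload, sizeof(id));
	return id;
}

/* Host-order view of the same identifier. */
static uint16_t icmp_id_host(const void *payload)
{
	return ntohs(read_icmp_id(payload));
}

/* The pre-fix shape: reading the payload as an 8-bit type yields only its
 * first byte, losing the other half of the identifier. */
static uint16_t read_icmp_id_truncated(const void *payload)
{
	uint8_t first;

	memcpy(&first, payload, sizeof(first));
	return first;
}
```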