<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/rds, branch v3.8.7</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/net/rds?h=v3.8.7</id>
<link rel='self' href='https://git.amat.us/linux/atom/net/rds?h=v3.8.7'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2013-03-20T20:10:57Z</updated>
<entry>
<title>rds: limit the size allocated by rds_message_alloc()</title>
<updated>2013-03-20T20:10:57Z</updated>
<author>
<name>Cong Wang</name>
<email>amwang@redhat.com</email>
</author>
<published>2013-03-03T16:18:11Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=273402d99210dc44926a53c4ccdbfa6b5058753c'/>
<id>urn:sha1:273402d99210dc44926a53c4ccdbfa6b5058753c</id>
<content type='text'>
[ Upstream commit ece6b0a2b25652d684a7ced4ae680a863af041e0 ]

Dave Jones reported the following bug:

"When fed mangled socket data, rds will trust what userspace gives it,
and tries to allocate enormous amounts of memory larger than what
kmalloc can satisfy."

WARNING: at mm/page_alloc.c:2393 __alloc_pages_nodemask+0xa0d/0xbe0()
Hardware name: GA-MA78GM-S2H
Modules linked in: vmw_vsock_vmci_transport vmw_vmci vsock fuse bnep dlci bridge 8021q garp stp mrp binfmt_misc l2tp_ppp l2tp_core rfcomm s
Pid: 24652, comm: trinity-child2 Not tainted 3.8.0+ #65
Call Trace:
 [&lt;ffffffff81044155&gt;] warn_slowpath_common+0x75/0xa0
 [&lt;ffffffff8104419a&gt;] warn_slowpath_null+0x1a/0x20
 [&lt;ffffffff811444ad&gt;] __alloc_pages_nodemask+0xa0d/0xbe0
 [&lt;ffffffff8100a196&gt;] ? native_sched_clock+0x26/0x90
 [&lt;ffffffff810b2128&gt;] ? trace_hardirqs_off_caller+0x28/0xc0
 [&lt;ffffffff810b21cd&gt;] ? trace_hardirqs_off+0xd/0x10
 [&lt;ffffffff811861f8&gt;] alloc_pages_current+0xb8/0x180
 [&lt;ffffffff8113eaaa&gt;] __get_free_pages+0x2a/0x80
 [&lt;ffffffff811934fe&gt;] kmalloc_order_trace+0x3e/0x1a0
 [&lt;ffffffff81193955&gt;] __kmalloc+0x2f5/0x3a0
 [&lt;ffffffff8104df0c&gt;] ? local_bh_enable_ip+0x7c/0xf0
 [&lt;ffffffffa0401ab3&gt;] rds_message_alloc+0x23/0xb0 [rds]
 [&lt;ffffffffa04043a1&gt;] rds_sendmsg+0x2b1/0x990 [rds]
 [&lt;ffffffff810b21cd&gt;] ? trace_hardirqs_off+0xd/0x10
 [&lt;ffffffff81564620&gt;] sock_sendmsg+0xb0/0xe0
 [&lt;ffffffff810b2052&gt;] ? get_lock_stats+0x22/0x70
 [&lt;ffffffff810b24be&gt;] ? put_lock_stats.isra.23+0xe/0x40
 [&lt;ffffffff81567f30&gt;] sys_sendto+0x130/0x180
 [&lt;ffffffff810b872d&gt;] ? trace_hardirqs_on+0xd/0x10
 [&lt;ffffffff816c547b&gt;] ? _raw_spin_unlock_irq+0x3b/0x60
 [&lt;ffffffff816cd767&gt;] ? sysret_check+0x1b/0x56
 [&lt;ffffffff810b8695&gt;] ? trace_hardirqs_on_caller+0x115/0x1a0
 [&lt;ffffffff81341d8e&gt;] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [&lt;ffffffff816cd742&gt;] system_call_fastpath+0x16/0x1b
---[ end trace eed6ae990d018c8b ]---

Reported-by: Dave Jones &lt;davej@redhat.com&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Venkat Venkatsubra &lt;venkat.x.venkatsubra@oracle.com&gt;
Signed-off-by: Cong Wang &lt;amwang@redhat.com&gt;
Acked-by: Venkat Venkatsubra &lt;venkat.x.venkatsubra@oracle.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>IB/rds: suppress incompatible protocol when version is known</title>
<updated>2012-12-26T23:17:37Z</updated>
<author>
<name>Marciniszyn, Mike</name>
<email>mike.marciniszyn@intel.com</email>
</author>
<published>2012-12-21T08:01:54Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=a49675988c127b5b5876c252e5db2ee0410a10c2'/>
<id>urn:sha1:a49675988c127b5b5876c252e5db2ee0410a10c2</id>
<content type='text'>
Add an else to only print the incompatible protocol message
when version hasn't been established.

Signed-off-by: Mike Marciniszyn &lt;mike.marciniszyn@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>IB/rds: Correct ib_api use with gs_dma_address/sg_dma_len</title>
<updated>2012-12-26T23:17:37Z</updated>
<author>
<name>Marciniszyn, Mike</name>
<email>mike.marciniszyn@intel.com</email>
</author>
<published>2012-12-21T08:01:49Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=f2e9bd70327d788011cf787a51ceba5925bbc63a'/>
<id>urn:sha1:f2e9bd70327d788011cf787a51ceba5925bbc63a</id>
<content type='text'>
0b088e00 ("RDS: Use page_remainder_alloc() for recv bufs")
added uses of sg_dma_len() and sg_dma_address(). This makes
RDS DOA with the qib driver.

IB ulps should use ib_sg_dma_len() and ib_sg_dma_address
respectively since some HCAs overload ib_sg_dma* operations.

Signed-off-by: Mike Marciniszyn &lt;mike.marciniszyn@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: rds: use this_cpu_* per-cpu helper</title>
<updated>2012-11-19T23:59:44Z</updated>
<author>
<name>Shan Wei</name>
<email>davidshan@tencent.com</email>
</author>
<published>2012-11-12T15:52:01Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=ae4b46e9d7128d2d76e6857fe0b9fc240e8ac695'/>
<id>urn:sha1:ae4b46e9d7128d2d76e6857fe0b9fc240e8ac695</id>
<content type='text'>
Signed-off-by: Shan Wei &lt;davidshan@tencent.com&gt;
Reviewed-by: Christoph Lameter &lt;cl@linux.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>RDS: fix rds-ping spinlock recursion</title>
<updated>2012-10-09T17:57:23Z</updated>
<author>
<name>jeff.liu</name>
<email>jeff.liu@oracle.com</email>
</author>
<published>2012-10-08T18:57:27Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=5175a5e76bbdf20a614fb47ce7a38f0f39e70226'/>
<id>urn:sha1:5175a5e76bbdf20a614fb47ce7a38f0f39e70226</id>
<content type='text'>
This is the revised patch for fixing rds-ping spinlock recursion
according to Venkat's suggestions.

RDS ping/pong over TCP feature has been broken for years(2.6.39 to
3.6.0) since we have to set TCP cork and call kernel_sendmsg() between
ping/pong which both need to lock "struct sock *sk". However, this
lock has already been hold before rds_tcp_data_ready() callback is
triggerred. As a result, we always facing spinlock resursion which
would resulting in system panic.

Given that RDS ping is only used to test the connectivity and not for
serious performance measurements, we can queue the pong transmit to
rds_wq as a delayed response.

Reported-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
CC: Venkat Venkatsubra &lt;venkat.x.venkatsubra@oracle.com&gt;
CC: David S. Miller &lt;davem@davemloft.net&gt;
CC: James Morris &lt;james.l.morris@oracle.com&gt;
Signed-off-by: Jie Liu &lt;jeff.liu@oracle.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>rds: Don't disable BH on BH context</title>
<updated>2012-08-23T05:52:04Z</updated>
<author>
<name>Ying Xue</name>
<email>ying.xue@windriver.com</email>
</author>
<published>2012-08-19T21:44:08Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=bfdc587c5af4ff155cf702b972e8fcd66d77d5f2'/>
<id>urn:sha1:bfdc587c5af4ff155cf702b972e8fcd66d77d5f2</id>
<content type='text'>
Since we have already in BH context when *_write_space(),
*_data_ready() as well as *_state_change() are called, it's
unnecessary to disable BH.

Signed-off-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>rds: set correct msg_namelen</title>
<updated>2012-07-23T08:01:44Z</updated>
<author>
<name>Weiping Pan</name>
<email>wpan@redhat.com</email>
</author>
<published>2012-07-23T02:37:48Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=06b6a1cf6e776426766298d055bb3991957d90a7'/>
<id>urn:sha1:06b6a1cf6e776426766298d055bb3991957d90a7</id>
<content type='text'>
Jay Fenlason (fenlason@redhat.com) found a bug,
that recvfrom() on an RDS socket can return the contents of random kernel
memory to userspace if it was called with a address length larger than
sizeof(struct sockaddr_in).
rds_recvmsg() also fails to set the addr_len paramater properly before
returning, but that's just a bug.
There are also a number of cases wher recvfrom() can return an entirely bogus
address. Anything in rds_recvmsg() that returns a non-negative value but does
not go through the "sin = (struct sockaddr_in *)msg-&gt;msg_name;" code path
at the end of the while(1) loop will return up to 128 bytes of kernel memory
to userspace.

And I write two test programs to reproduce this bug, you will see that in
rds_server, fromAddr will be overwritten and the following sock_fd will be
destroyed.
Yes, it is the programmer's fault to set msg_namelen incorrectly, but it is
better to make the kernel copy the real length of address to user space in
such case.

How to run the test programs ?
I test them on 32bit x86 system, 3.5.0-rc7.

1 compile
gcc -o rds_client rds_client.c
gcc -o rds_server rds_server.c

2 run ./rds_server on one console

3 run ./rds_client on another console

4 you will see something like:
server is waiting to receive data...
old socket fd=3
server received data from client:data from client
msg.msg_namelen=32
new socket fd=-1067277685
sendmsg()
: Bad file descriptor

/***************** rds_client.c ********************/

int main(void)
{
	int sock_fd;
	struct sockaddr_in serverAddr;
	struct sockaddr_in toAddr;
	char recvBuffer[128] = "data from client";
	struct msghdr msg;
	struct iovec iov;

	sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
	if (sock_fd &lt; 0) {
		perror("create socket error\n");
		exit(1);
	}

	memset(&amp;serverAddr, 0, sizeof(serverAddr));
	serverAddr.sin_family = AF_INET;
	serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
	serverAddr.sin_port = htons(4001);

	if (bind(sock_fd, (struct sockaddr*)&amp;serverAddr, sizeof(serverAddr)) &lt; 0) {
		perror("bind() error\n");
		close(sock_fd);
		exit(1);
	}

	memset(&amp;toAddr, 0, sizeof(toAddr));
	toAddr.sin_family = AF_INET;
	toAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
	toAddr.sin_port = htons(4000);
	msg.msg_name = &amp;toAddr;
	msg.msg_namelen = sizeof(toAddr);
	msg.msg_iov = &amp;iov;
	msg.msg_iovlen = 1;
	msg.msg_iov-&gt;iov_base = recvBuffer;
	msg.msg_iov-&gt;iov_len = strlen(recvBuffer) + 1;
	msg.msg_control = 0;
	msg.msg_controllen = 0;
	msg.msg_flags = 0;

	if (sendmsg(sock_fd, &amp;msg, 0) == -1) {
		perror("sendto() error\n");
		close(sock_fd);
		exit(1);
	}

	printf("client send data:%s\n", recvBuffer);

	memset(recvBuffer, '\0', 128);

	msg.msg_name = &amp;toAddr;
	msg.msg_namelen = sizeof(toAddr);
	msg.msg_iov = &amp;iov;
	msg.msg_iovlen = 1;
	msg.msg_iov-&gt;iov_base = recvBuffer;
	msg.msg_iov-&gt;iov_len = 128;
	msg.msg_control = 0;
	msg.msg_controllen = 0;
	msg.msg_flags = 0;
	if (recvmsg(sock_fd, &amp;msg, 0) == -1) {
		perror("recvmsg() error\n");
		close(sock_fd);
		exit(1);
	}

	printf("receive data from server:%s\n", recvBuffer);

	close(sock_fd);

	return 0;
}

/***************** rds_server.c ********************/

int main(void)
{
	struct sockaddr_in fromAddr;
	int sock_fd;
	struct sockaddr_in serverAddr;
	unsigned int addrLen;
	char recvBuffer[128];
	struct msghdr msg;
	struct iovec iov;

	sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
	if(sock_fd &lt; 0) {
		perror("create socket error\n");
		exit(0);
	}

	memset(&amp;serverAddr, 0, sizeof(serverAddr));
	serverAddr.sin_family = AF_INET;
	serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
	serverAddr.sin_port = htons(4000);
	if (bind(sock_fd, (struct sockaddr*)&amp;serverAddr, sizeof(serverAddr)) &lt; 0) {
		perror("bind error\n");
		close(sock_fd);
		exit(1);
	}

	printf("server is waiting to receive data...\n");
	msg.msg_name = &amp;fromAddr;

	/*
	 * I add 16 to sizeof(fromAddr), ie 32,
	 * and pay attention to the definition of fromAddr,
	 * recvmsg() will overwrite sock_fd,
	 * since kernel will copy 32 bytes to userspace.
	 *
	 * If you just use sizeof(fromAddr), it works fine.
	 * */
	msg.msg_namelen = sizeof(fromAddr) + 16;
	/* msg.msg_namelen = sizeof(fromAddr); */
	msg.msg_iov = &amp;iov;
	msg.msg_iovlen = 1;
	msg.msg_iov-&gt;iov_base = recvBuffer;
	msg.msg_iov-&gt;iov_len = 128;
	msg.msg_control = 0;
	msg.msg_controllen = 0;
	msg.msg_flags = 0;

	while (1) {
		printf("old socket fd=%d\n", sock_fd);
		if (recvmsg(sock_fd, &amp;msg, 0) == -1) {
			perror("recvmsg() error\n");
			close(sock_fd);
			exit(1);
		}
		printf("server received data from client:%s\n", recvBuffer);
		printf("msg.msg_namelen=%d\n", msg.msg_namelen);
		printf("new socket fd=%d\n", sock_fd);
		strcat(recvBuffer, "--data from server");
		if (sendmsg(sock_fd, &amp;msg, 0) == -1) {
			perror("sendmsg()\n");
			close(sock_fd);
			exit(1);
		}
	}

	close(sock_fd);
	return 0;
}

Signed-off-by: Weiping Pan &lt;wpan@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: Fix (nearly-)kernel-doc comments for various functions</title>
<updated>2012-07-11T06:13:45Z</updated>
<author>
<name>Ben Hutchings</name>
<email>bhutchings@solarflare.com</email>
</author>
<published>2012-07-10T10:55:09Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=2c53040f018b6c36a46eec75b9b937aaa5f78e6d'/>
<id>urn:sha1:2c53040f018b6c36a46eec75b9b937aaa5f78e6d</id>
<content type='text'>
Fix incorrect start markers, wrapped summary lines, missing section
breaks, incorrect separators, and some name mismatches.

Signed-off-by: Ben Hutchings &lt;bhutchings@solarflare.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>rds_rdma: don't assume infiniband device is PCI</title>
<updated>2012-05-29T21:30:07Z</updated>
<author>
<name>Thadeu Lima de Souza Cascardo</name>
<email>cascardo@linux.vnet.ibm.com</email>
</author>
<published>2012-05-28T08:52:05Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=a0c6ffbcfe600606b2d913dded4dc6b37b3bbbfd'/>
<id>urn:sha1:a0c6ffbcfe600606b2d913dded4dc6b37b3bbbfd</id>
<content type='text'>
RDS code assumes that the struct ib_device dma_device member, which is a
pointer, points to a struct device embedded in a struct pci_dev.

This is not the case for ehca, for example, which is a OF driver, and
makes dma_device point to a struct device embedded in a struct
platform_device.

This will make the system crash when rds_rdma is loaded in a system
with ehca, since it will try to access the bus member of a non-existent
struct pci_dev.

The only reason rds_rdma uses the struct pci_dev is to get the NUMA node
the device is attached to. Using dev_to_node for that is much better,
since it won't assume which bus the infiniband is attached to.

Signed-off-by: Thadeu Lima de Souza Cascardo &lt;cascardo@linux.vnet.ibm.com&gt;
Cc: dledford@redhat.com
Cc: Jes.Sorensen@redhat.com
Cc: Venkat Venkatsubra &lt;venkat.x.venkatsubra@oracle.com&gt;
Acked-by: Venkat Venkatsubra &lt;venkat.x.venkatsubra@oracle.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>sock: Introduce named constants for sk_reuse</title>
<updated>2012-04-21T19:52:25Z</updated>
<author>
<name>Pavel Emelyanov</name>
<email>xemul@parallels.com</email>
</author>
<published>2012-04-19T03:39:36Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=4a17fd5229c1b6066aa478f6b690f8293ce811a1'/>
<id>urn:sha1:4a17fd5229c1b6066aa478f6b690f8293ce811a1</id>
<content type='text'>
Name them in a "backward compatible" manner, i.e. reuse or not
are still 1 and 0 respectively. The reuse value of 2 means that
the socket with it will forcibly reuse everyone else's port.

Signed-off-by: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
