aboutsummaryrefslogtreecommitdiffstats
path: root/net/sctp
Commit message (Collapse)AuthorAgeFilesLines
* sctp: implement sender-side procedures for SSN Reset Request ParameterXin Long2017-01-183-10/+131
| | | | | | | | | | | | | | | | This patch is to implement sender-side procedures for the Outgoing and Incoming SSN Reset Request Parameter described in rfc6525 section 5.1.2 and 5.1.3. It is also add sockopt SCTP_RESET_STREAMS in rfc6525 section 6.3.2 for users. Note that the new asoc member strreset_outstanding is to make sure only one reconf request chunk on the fly as rfc6525 section 5.1.1 demands. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: add sockopt SCTP_ENABLE_STREAM_RESETXin Long2017-01-182-0/+85
| | | | | | | | | This patch is to add sockopt SCTP_ENABLE_STREAM_RESET to get/set strreset_enable to indicate which reconf request type it supports, which is described in rfc6525 section 6.3.1. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: add reconf_enable in asoc ep and netnsXin Long2017-01-184-0/+20
| | | | | | | | | | | | | | | | This patch is to add reconf_enable field in all of asoc ep and netns to indicate if they support stream reset. When initializing, asoc reconf_enable get the default value from ep reconf_enable which is from netns netns reconf_enable by default. It is also to add reconf_capable in asoc peer part to know if peer supports reconf_enable, the value is set if ext params have reconf chunk support when processing init chunk, just as rfc6525 section 5.1.1 demands. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: add stream reconf primitiveXin Long2017-01-183-0/+36
| | | | | | | | | | | | This patch is to add a primitive based on sctp primitive frame for sending stream reconf request. It works as the other primitives, and create a SCTP_CMD_REPLY command to send the request chunk out. sctp_primitive_RECONF would be the api to send a reconf request chunk. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: add stream reconf timerXin Long2017-01-185-2/+104
| | | | | | | | | | | | | | | | | | | | This patch is to add a per transport timer based on sctp timer frame for stream reconf chunk retransmission. It would start after sending a reconf request chunk, and stop after receiving the response chunk. If the timer expires, besides retransmitting the reconf request chunk, it would also do the same thing with data RTO timer. like to increase the appropriate error counts, and perform threshold management, possibly destroying the asoc if sctp retransmission thresholds are exceeded, just as section 5.1.1 describes. This patch is also to add asoc strreset_chunk, it is used to save the reconf request chunk, so that it can be retransmitted, and to check if the response is really for this request by comparing the information inside with the response chunk as well. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: add support for generating stream reconf ssn reset request chunkXin Long2017-01-182-0/+122
| | | | | | | | | | | | This patch is to add asoc strreset_outseq and strreset_inseq for saving the reconf request sequence, initialize them when create assoc and process init, and also to define Incoming and Outgoing SSN Reset Request Parameter described in rfc6525 section 4.1 and 4.2, As they can be in one same chunk as section rfc6525 3.1-3 describes, it makes them in one function. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: remove useless code from sctp_apply_peer_addr_paramsMarcelo Ricardo Leitner2017-01-161-1/+0
| | | | | | | | | | | | | | | | | | sctp_frag_point() doesn't store anything, and thus just calling it cannot do anything useful. sctp_apply_peer_addr_params is only called by sctp_setsockopt_peer_addr_params. When operating on an asoc, sctp_setsockopt_peer_addr_params will call sctp_apply_peer_addr_params once for the asoc, and then once for each transport this asoc has, meaning that the frag_point will be recomputed when updating the transports and calling it when updating the asoc is not necessary. IOW, no action is needed here and we can remove this call. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: remove unused var from sctp_process_asconfMarcelo Ricardo Leitner2017-01-161-2/+0
| | | | | | | | | Assigned but not used. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2017-01-111-1/+1
|\ | | | | | | | | | | | | Two AF_* families adding entries to the lockdep tables at the same time. Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: Fix spelling mistake: "Atempt" -> "Attempt"Colin Ian King2017-01-111-1/+1
| | | | | | | | | | | | | | | | Trivial fix to spelling mistake in WARN_ONCE message Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sctp: prepare asoc stream for stream reconfXin Long2017-01-068-160/+116
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sctp stream reconf, described in RFC 6525, needs a structure to save per stream information in assoc, like stream state. In the future, sctp stream scheduler also needs it to save some stream scheduler params and queues. This patchset is to prepare the stream array in assoc for stream reconf. It defines sctp_stream that includes stream arrays inside to replace ssnmap. Note that we use different structures for IN and OUT streams, as the members in per OUT stream will get more and more different from per IN stream. v1->v2: - put these patches into a smaller group. v2->v3: - define sctp_stream to contain stream arrays, and create stream.c to put stream-related functions. - merge 3 patches into 1, as new sctp_stream has the same name with before. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sctp: refactor sctp_datamsg_from_userMarcelo Ricardo Leitner2016-12-291-75/+32
| | | | | | | | | | | | | | | | | | | | | | | | This patch refactors sctp_datamsg_from_user() in an attempt to make it better to read and avoid code duplication for handling the last fragment. It also avoids doing division and remaining operations. Even though, it should still operate similarly as before this patch. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sctp: add pr_debug for tracking asocs not foundMarcelo Ricardo Leitner2016-12-281-2/+15
| | | | | | | | | | | | | | | | This pr_debug may help identify why the system is generating some Aborts. It's not something a sysadmin would be expected to use. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sctp: sctp_chunk_length_valid should return boolMarcelo Ricardo Leitner2016-12-281-8/+7
| | | | | | | | | | Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sctp: remove return value from sctp_packet_init/configMarcelo Ricardo Leitner2016-12-282-11/+8
| | | | | | | | | | | | | | | | There is no reason to use this cascading. It doesn't add anything. Let's remove it and simplify. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sctp: simplify addr copyMarcelo Ricardo Leitner2016-12-282-20/+14
| | | | | | | | | | | | | | Make it a bit easier to read. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sctp: reduce indent level in sctp_sf_shut_8_4_5Marcelo Ricardo Leitner2016-12-281-30/+28
| | | | | | | | | | Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sctp: reduce indent level at sctp_sf_tabort_8_4_8Marcelo Ricardo Leitner2016-12-281-23/+21
|/ | | | | Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ktime: Cleanup ktime_set() usageThomas Gleixner2016-12-251-1/+1
| | | | | | | | | | ktime_set(S,N) was required for the timespec storage type and is still useful for situations where a Seconds and Nanoseconds part of a time value needs to be converted. For anything where the Seconds argument is 0, this is pointless and can be replaced with a simple assignment. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
* Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds2016-12-241-1/+1
| | | | | | | | | | | | | This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* sctp: fix recovering from 0 win with small data chunksMarcelo Ricardo Leitner2016-12-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently if SCTP closes the receive window with window pressure, mostly caused by excessive skb overhead on payload/overheads ratio, SCTP will close the window abruptly while saving the delta on rwnd_press. It will start recovering rwnd as the chunks are consumed by the application and the rwnd_press will be only recovered after rwnd reach the same value as of rwnd_press, mostly to prevent silly window syndrome. Thing is, this is very inefficient with small data chunks, as with those it will never reach back that value, and thus it will never recover from such pressure. This means that we will not issue window updates when recovering from 0 window and will rely on a sender retransmit to notice it. The fix here is to remove such threshold, as no value is good enough: it depends on the (avg) chunk sizes being used. Test with netperf -t SCTP_STREAM -- -m 1, and trigger 0 window by sending SIGSTOP to netserver, sleep 1.2, and SIGCONT. Rate limited to 845kbps, for visibility. Capture done at netserver side. Previously: 01.500751 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632372996] [a_rwnd 99153] [ 01.500752 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632372997] [SID: 0] [SS 01.517471 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS 01.517483 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap 01.517485 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373083] [SID: 0] [SS 01.517488 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap 01.534168 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373096] [SID: 0] [SS 01.534180 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap 01.534181 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373169] [SID: 0] [SS 01.534185 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap 02.525978 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS 02.526021 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap (window update missed) 04.573807 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS 04.779370 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373082] [a_rwnd 859] [#g 04.789162 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373083] [SID: 0] [SS 04.789323 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373156] [SID: 0] [SS 04.789372 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373228] [a_rwnd 786] [#g After: 02.568957 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098728] [a_rwnd 99153] 02.568961 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098729] [SID: 0] [S 02.585631 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S 02.585666 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga 02.585671 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098815] [SID: 0] [S 02.585683 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga 02.602330 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098828] [SID: 0] [S 02.602359 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga 02.602363 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098901] [SID: 0] [S 02.602372 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga 03.600788 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S 03.600830 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga 03.619455 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 13508] 03.619479 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 27017] 03.619497 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 40526] 03.619516 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 54035] 03.619533 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 67544] 03.619552 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 81053] 03.619570 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 94562] (following data transmission triggered by window updates above) 03.633504 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S 03.836445 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098814] [a_rwnd 100000] 03.843125 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098815] [SID: 0] [S 03.843285 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098888] [SID: 0] [S 03.843345 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098960] [a_rwnd 99894] 03.856546 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098961] [SID: 0] [S 03.866450 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490099011] [SID: 0] [S Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: do not loose window information if in rwnd_overMarcelo Ricardo Leitner2016-12-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's possible that we receive a packet that is larger than current window. If it's the first packet in this way, it will cause it to increase rwnd_over. Then, if we receive another data chunk (specially as SCTP allows you to have one data chunk in flight even during 0 window), rwnd_over will be overwritten instead of added to. In the long run, this could cause the window to grow bigger than its initial size, as rwnd_over would be charged only for the last received data chunk while the code will try open the window for all packets that were received and had its value in rwnd_over overwritten. This, then, can lead to the worsening of payload/buffer ratio and cause rwnd_press to kick in more often. The fix is to sum it too, same as is done for rwnd_press, so that if we receive 3 chunks after closing the window, we still have to release that same amount before re-opening it. Log snippet from sctp_test exhibiting the issue: [ 146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000 rwnd decreased by 1 to (0, 1, 114221) [ 146.209232] sctp: sctp_assoc_rwnd_decrease: association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1! [ 146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000 rwnd decreased by 1 to (0, 1, 114221) [ 146.209232] sctp: sctp_assoc_rwnd_decrease: association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1! [ 146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000 rwnd decreased by 1 to (0, 1, 114221) [ 146.209232] sctp: sctp_assoc_rwnd_decrease: association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1! [ 146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000 rwnd decreased by 1 to (0, 1, 114221) Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: not copying duplicate addrs to the assoc's bind address listXin Long2016-12-202-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | sctp.local_addr_list is a global address list that is supposed to include all the local addresses. sctp updates this list according to NETDEV_UP/ NETDEV_DOWN notifications. However, if multiple NICs have the same address, the global list would have duplicate addresses. Even if for one NIC, promote secondaries in __inet_del_ifa can also lead to accumulating duplicate addresses. When sctp binds address 'ANY' and creates a connection, it copies all the addresses from global list into asoc's bind addr list, which makes sctp pack the duplicate addresses into INIT/INIT_ACK packets. This patch is to filter the duplicate addresses when copying the addrs from global list in sctp_copy_local_addr_list and unpacking addr_param from cookie in sctp_raw_to_bind_addrs to asoc's bind addr list. Note that we can't filter the duplicate addrs when global address list gets updated, As NETDEV_DOWN event may remove an addr that still exists in another NIC. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: reduce indent level in sctp_copy_local_addr_listXin Long2016-12-201-18/+19
| | | | | | | | | This patch is to reduce indent level by using continue when the addr is not allowed, and also drop end_copy by using break. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: sctp_transport_lookup_process should rcu_read_unlock when transport is ↵Xin Long2016-12-171-4/+3
| | | | | | | | | | | | | | | null Prior to this patch, sctp_transport_lookup_process didn't rcu_read_unlock when it failed to find a transport by sctp_addrs_lookup_transport. This patch is to fix it by moving up rcu_read_unlock right before checking transport and also to remove the out path. Fixes: 1cceda784980 ("sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: sctp_epaddr_lookup_transport should be protected by rcu_read_lockXin Long2016-12-171-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | Since commit 7fda702f9315 ("sctp: use new rhlist interface on sctp transport rhashtable"), sctp has changed to use rhlist_lookup to look up transport, but rhlist_lookup doesn't call rcu_read_lock inside, unlike rhashtable_lookup_fast. It is called in sctp_epaddr_lookup_transport and sctp_addrs_lookup_transport. sctp_addrs_lookup_transport is always in the protection of rcu_read_lock(), as __sctp_lookup_association is called in rx path or sctp_lookup_association which are in the protection of rcu_read_lock() already. But sctp_epaddr_lookup_transport is called by sctp_endpoint_lookup_assoc, it doesn't call rcu_read_lock, which may cause "suspicious rcu_dereference_check usage' in __rhashtable_lookup. This patch is to fix it by adding rcu_read_lock in sctp_endpoint_lookup_assoc before calling sctp_epaddr_lookup_transport. Fixes: 7fda702f9315 ("sctp: use new rhlist interface on sctp transport rhashtable") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: use new rhlist interface on sctp transport rhashtableXin Long2016-11-163-47/+61
| | | | | | | | | | | | | | | | | | | | | | | | | Now sctp transport rhashtable uses hash(lport, dport, daddr) as the key to hash a node to one chain. If in one host thousands of assocs connect to one server with the same lport and different laddrs (although it's not a normal case), all the transports would be hashed into the same chain. It may cause to keep returning -EBUSY when inserting a new node, as the chain is too long and sctp inserts a transport node in a loop, which could even lead to system hangs there. The new rhlist interface works for this case that there are many nodes with the same key in one chain. It puts them into a list then makes this list be as a node of the chain. This patch is to replace rhashtable_ interface with rhltable_ interface. Since a chain would not be too long and it would not return -EBUSY with this fix when inserting a node, the reinsert loop is also removed here. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2016-11-153-33/+31
|\ | | | | | | | | | | | | Several cases of bug fixes in 'net' overlapping other changes in 'net-next-. Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: change sk state only when it has assocs in sctp_shutdownXin Long2016-11-141-8/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now when users shutdown a sock with SEND_SHUTDOWN in sctp, even if this sock has no connection (assoc), sk state would be changed to SCTP_SS_CLOSING, which is not as we expect. Besides, after that if users try to listen on this sock, kernel could even panic when it dereference sctp_sk(sk)->bind_hash in sctp_inet_listen, as bind_hash is null when sock has no assoc. This patch is to move sk state change after checking sk assocs is not empty, and also merge these two if() conditions and reduce indent level. Fixes: d46e416c11c8 ("sctp: sctp should change socket state when shutdown is received") Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: assign assoc_id earlier in __sctp_connectMarcelo Ricardo Leitner2016-11-071-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sctp_wait_for_connect() currently already holds the asoc to keep it alive during the sleep, in case another thread release it. But Andrey Konovalov and Dmitry Vyukov reported an use-after-free in such situation. Problem is that __sctp_connect() doesn't get a ref on the asoc and will do a read on the asoc after calling sctp_wait_for_connect(), but by then another thread may have closed it and the _put on sctp_wait_for_connect will actually release it, causing the use-after-free. Fix is, instead of doing the read after waiting for the connect, do it before so, and avoid this issue as the socket is still locked by then. There should be no issue on returning the asoc id in case of failure as the application shouldn't trust on that number in such situations anyway. This issue doesn't exist in sctp_sendmsg() path. Reported-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: hold transport instead of assoc when lookup assoc in rx pathXin Long2016-10-312-17/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to this patch, in rx path, before calling lock_sock, it needed to hold assoc when got it by __sctp_lookup_association, in case other place would free/put assoc. But in __sctp_lookup_association, it lookup and hold transport, then got assoc by transport->assoc, then hold assoc and put transport. It means it didn't hold transport, yet it was returned and later on directly assigned to chunk->transport. Without the protection of sock lock, the transport may be freed/put by other places, which would cause a use-after-free issue. This patch is to fix this issue by holding transport instead of assoc. As holding transport can make sure to access assoc is also safe, and actually it looks up assoc by searching transport rhashtable, to hold transport here makes more sense. Note that the function will be renamed later on on another patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: return back transport in __sctp_rcv_init_lookupXin Long2016-10-311-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to this patch, it used a local variable to save the transport that is looked up by __sctp_lookup_association(), and didn't return it back. But in sctp_rcv, it is used to initialize chunk->transport. So when hitting this, even if it found the transport, it was still initializing chunk->transport with null instead. This patch is to return the transport back through transport pointer that is from __sctp_rcv_lookup_harder(). Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: hold transport instead of assoc in sctp_diagXin Long2016-10-311-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In sctp_transport_lookup_process(), Commit 1cceda784980 ("sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock") moved cb() out of rcu lock, but it put transport and hold assoc instead, and ignore that cb() still uses transport. It may cause a use-after-free issue. This patch is to hold transport instead of assoc there. Fixes: 1cceda784980 ("sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sctp: clean up sctp_packet_transmitXin Long2016-11-021-277/+158
| | | | | | | | | | | | | | | | | | | | | | | | | | | | After adding sctp gso, sctp_packet_transmit is a quite big function now. This patch is to extract the codes for packing packet to sctp_packet_pack from sctp_packet_transmit, and add some comments, simplify the err path by freeing auth chunk when freeing packet chunk_list in out path and freeing head skb early if it fails to pack packet. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2016-10-303-8/+17
|\| | | | | | | | | | | | | | | | | Mostly simple overlapping changes. For example, David Ahern's adjacency list revamp in 'net-next' conflicted with an adjacency list traversal bug fix in 'net'. Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: validate chunk len before actually using itMarcelo Ricardo Leitner2016-10-291-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Andrey Konovalov reported that KASAN detected that SCTP was using a slab beyond the boundaries. It was caused because when handling out of the blue packets in function sctp_sf_ootb() it was checking the chunk len only after already processing the first chunk, validating only for the 2nd and subsequent ones. The fix is to just move the check upwards so it's also validated for the 1st chunk. Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: fix the panic caused by route updateXin Long2016-10-261-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 7303a1475008 ("sctp: identify chunks that need to be fragmented at IP level") made the chunk be fragmented at IP level in the next round if it's size exceed PMTU. But there still is another case, PMTU can be updated if transport's dst expires and transport's pmtu_pending is set in sctp_packet_transmit. If the new PMTU is less than the chunk, the same issue with that commit can be triggered. So we should drop this packet and let it retransmit in another round where it would be fragmented at IP level. This patch is to fix it by checking the chunk size after PMTU may be updated and dropping this packet if it's size exceed PMTU. Fixes: 90017accff61 ("sctp: Add GSO support") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@txudriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: sctp, forbid negative lengthJiri Slaby2016-10-231-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most of getsockopt handlers in net/sctp/socket.c check len against sizeof some structure like: if (len < sizeof(int)) return -EINVAL; On the first look, the check seems to be correct. But since len is int and sizeof returns size_t, int gets promoted to unsigned size_t too. So the test returns false for negative lengths. Yes, (-1 < sizeof(long)) is false. Fix this in sctp by explicitly checking len < 0 before any getsockopt handler is called. Note that sctp_getsockopt_events already handled the negative case. Since we added the < 0 check elsewhere, this one can be removed. If not checked, this is the result: UBSAN: Undefined behaviour in ../mm/page_alloc.c:2722:19 shift exponent 52 is too large for 32-bit type 'int' CPU: 1 PID: 24535 Comm: syz-executor Not tainted 4.8.1-0-syzkaller #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 0000000000000000 ffff88006d99f2a8 ffffffffb2f7bdea 0000000041b58ab3 ffffffffb4363c14 ffffffffb2f7bcde ffff88006d99f2d0 ffff88006d99f270 0000000000000000 0000000000000000 0000000000000034 ffffffffb5096422 Call Trace: [<ffffffffb3051498>] ? __ubsan_handle_shift_out_of_bounds+0x29c/0x300 ... [<ffffffffb273f0e4>] ? kmalloc_order+0x24/0x90 [<ffffffffb27416a4>] ? kmalloc_order_trace+0x24/0x220 [<ffffffffb2819a30>] ? __kmalloc+0x330/0x540 [<ffffffffc18c25f4>] ? sctp_getsockopt_local_addrs+0x174/0xca0 [sctp] [<ffffffffc18d2bcd>] ? sctp_getsockopt+0x10d/0x1b0 [sctp] [<ffffffffb37c1219>] ? sock_common_getsockopt+0xb9/0x150 [<ffffffffb37be2f5>] ? SyS_getsockopt+0x1a5/0x270 Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-sctp@vger.kernel.org Cc: netdev@vger.kernel.org Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sctp: remove the old ttl expires policyXin Long2016-10-132-27/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The prsctp polices include ttl expires policy already, we should remove the old ttl expires codes, and just adjust the new polices' codes to be compatible with the old one for users. This patch is to remove all the old expires codes, and if prsctp polices are not set, it will still set msg's expires_at and check the expires in sctp_check_abandoned. Note that asoc->prsctp_enable is set by default, so users can't feel any difference even if they use the old expires api in userspace. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sctp: reuse sent_count to avoid retransmitted chunks for RTT measurementsXin Long2016-10-132-4/+3
|/ | | | | | | | | | | | | | | Now sctp uses chunk->resent to record if a chunk is retransmitted, for RTT measurements with retransmitted DATA chunks. chunk->sent_count was introduced to record how many times one chunk has been sent for prsctp RTX policy before. We actually can know if one chunk is retransmitted by checking chunk->sent_count is greater than 1. This patch is to remove resent from sctp_chunk and reuse sent_count to avoid retransmitted chunks for RTT measurements. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2016-10-025-45/+61
|\ | | | | | | | | | | Three sets of overlapping changes. Nothing serious. Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lockXin Long2016-09-302-21/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When sctp dumps all the ep->assocs, it needs to lock_sock first, but now it locks sock in rcu_read_lock, and lock_sock may sleep, which would break rcu_read_lock. This patch is to get and hold one sock when traversing the list. After that and get out of rcu_read_lock, lock and dump it. Then it will traverse the list again to get the next one until all sctp socks are dumped. For sctp_diag_dump_one, it fixes this issue by holding asoc and moving cb() out of rcu_read_lock in sctp_transport_lookup_process. Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: change to check peer prsctp_capable when using prsctp policesXin Long2016-09-302-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now before using prsctp polices, sctp uses asoc->prsctp_enable to check if prsctp is enabled. However asoc->prsctp_enable is set only means local host support prsctp, sctp should not abandon packet if peer host doesn't enable prsctp. So this patch is to use asoc->peer.prsctp_capable to check if prsctp is enabled on both side, instead of asoc->prsctp_enable, as asoc's peer.prsctp_capable is set only when local and peer both enable prsctp. Fixes: a6c2f792873a ("sctp: implement prsctp TTL policy") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: remove prsctp_param from sctp_chunkXin Long2016-09-303-19/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now sctp uses chunk->prsctp_param to save the prsctp param for all the prsctp polices, we didn't need to introduce prsctp_param to sctp_chunk. We can just use chunk->sinfo.sinfo_timetolive for RTX and BUF polices, and reuse msg->expires_at for TTL policy, as the prsctp polices and old expires policy are mutual exclusive. This patch is to remove prsctp_param from sctp_chunk, and reuse msg's expires_at for TTL and chunk's sinfo.sinfo_timetolive for RTX and BUF polices. Note that sctp can't use chunk's sinfo.sinfo_timetolive for TTL policy, as it needs a u64 variables to save the expires_at time. This one also fixes the "netperf-Throughput_Mbps -37.2% regression" issue. Fixes: a6c2f792873a ("sctp: implement prsctp TTL policy") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Suppress the "Comparison to NULL could be written" warningsJia He2016-09-301-1/+1
| | | | | | | | | | | | | | | | This is to suppress the checkpatch.pl warning "Comparison to NULL could be written". No functional changes here. Signed-off-by: Jia He <hejianet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | proc: Reduce cache miss in sctp_snmp_seq_showJia He2016-09-301-2/+6
| | | | | | | | | | | | | | | | This is to use the generic interfaces snmp_get_cpu_field{,64}_batch to aggregate the data by going through all the items of each cpu sequentially. Signed-off-by: Jia He <hejianet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sctp: fix the handling of SACK Gap Ack blocksMarcelo Ricardo Leitner2016-09-231-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sctp_acked() is using 32bit arithmetics on 16bits vars, via TSN_lte() macros, which is weird and confusing. Once the offset to ctsn is calculated, all wrapping is already handled and thus to verify the Gap Ack blocks we can just use pure less/big-or-equal than checks. Also, rename gap variable to tsn_offset, so it's more meaningful, as it doesn't point to any gap at all. Even so, I don't think this discrepancy resulted in any practical bug. This patch is a preparation for the next one, which will introduce typecheck() for TSN_lte() macros and would cause a compile error here. Suggested-by: David Laight <David.Laight@ACULAB.COM> Reported-by: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2016-09-231-10/+17
|\|
| * sctp: hold the transport before using it in sctp_hash_cmpXin Long2016-09-131-10/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 4f0087812648 ("sctp: apply rhashtable api to send/recv path"), sctp uses transport rhashtable with .obj_cmpfn sctp_hash_cmp, in which it compares the members of the transport with the rhashtable args to check if it's the right transport. But sctp uses the transport without holding it in sctp_hash_cmp, it can cause a use-after-free panic. As after it gets transport from hashtable, another CPU may close the sk and free the asoc. In sctp_association_free, it frees all the transports, meanwhile, the assoc's refcnt may be reduced to 0, assoc can be destroyed by sctp_association_destroy. So after that, transport->assoc is actually an unavailable memory address in sctp_hash_cmp. Although sctp_hash_cmp is under rcu_read_lock, it still can not avoid this, as assoc is not freed by RCU. This patch is to hold the transport before checking it's members with sctp_transport_hold, in which it checks the refcnt first, holds it if it's not 0. Fixes: 4f0087812648 ("sctp: apply rhashtable api to send/recv path") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sctp: make use of SCTP_TRUNC4 macroMarcelo Ricardo Leitner2016-09-221-3/+4
| | | | | | | | | | | | | | | | | | And avoid the usage of '&~3'. This is the last place still not using the macro. Also break the line to make it easier to read. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>