aboutsummaryrefslogtreecommitdiffstats
path: root/src/net
Commit message (Collapse)AuthorAgeFilesLines
* [tcp] Gracefully close connections during shutdownMichael Brown2015-07-041-1/+56
| | | | | | | | | | | | | | | We currently do not wait for a received FIN before exiting to boot a loaded OS. In the common case of booting from an HTTP server, this means that the TCP connection is left consuming resources on the server side: the server will retransmit the FIN several times before giving up. Fix by initiating a graceful close of all TCP connections and waiting (for up to one second) for all connections to finish closing gracefully (i.e. for the outgoing FIN to have been sent and ACKed, and for the incoming FIN to have been received and ACKed at least once). Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [ipoib] Attempt to generate ARPs as needed to repopulate REMAC cacheMichael Brown2015-06-291-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The only way to map an eIPoIB MAC address (REMAC) to an IPoIB MAC address is to intercept an incoming ARP request or reply. If we do not have an REMAC cache entry for a particular destination MAC address, then we cannot transmit the packet. This can arise in at least two situations: - An external program (e.g. a PXE NBP using the UNDI API) may attempt to transmit to a destination MAC address that has been obtained by some method other than ARP. - Memory pressure may have caused REMAC cache entries to be discarded. This is fairly likely on a busy network, since REMAC cache entries are created for all received (broadcast) ARP requests. (We can't sensibly avoid creating these cache entries, since they are required in order to send an ARP reply, and when we are being used via the UNDI API we may have no knowledge of which IP addresses are "ours".) Attempt to ameliorate the situation by generating a semi-spurious ARP request whenever we find a missing REMAC cache entry. This will hopefully trigger an ARP reply, which would then provide us with the information required to populate the REMAC cache. Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [dhcp] Defer discovery if link is blockedMichael Brown2015-06-251-0/+9
| | | | | | | | If the link is blocked (e.g. due to a Spanning Tree Protocol port not yet forwarding packets) then defer DHCP discovery until the link becomes unblocked. Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [stp] Fix interpretaton of hello timeMichael Brown2015-06-251-3/+3
| | | | | | Times in STP packets are expressed in units of 1/256 of a second. Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [stp] Add support for detecting Spanning Tree Protocol non-forwarding portsMichael Brown2015-06-251-0/+152
| | | | | | | | | | | | | | | | | | A fairly common end-user problem is that the default configuration of a switch may leave the port in a non-forwarding state for a substantial length of time (tens of seconds) after link up. This can cause iPXE to time out and give up attempting to boot. We cannot force the switch to start forwarding packets sooner, since any attempt to send a Spanning Tree Protocol bridge PDU may cause the switch to disable our port (if the switch happens to have the Bridge PDU Guard feature enabled for the port). For non-ancient versions of the Spanning Tree Protocol, we can detect whether or not the port is currently forwarding and use this to inform the network device core that the link is currently blocked. Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [netdevice] Add a generic concept of a "blocked link"Michael Brown2015-06-251-1/+51
| | | | | | | | | | | | When Spanning Tree Protocol (STP) is used, there may be a substantial delay (tens of seconds) from the time that the link goes up to the time that the port starts forwarding packets. Add a generic concept of a "blocked link" (i.e. a link which is up but which is not expected to communicate successfully), and allow "ifstat" to indicate when a link is blocked. Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [ethernet] Add minimal support for receiving LLC framesMichael Brown2015-06-251-2/+36
| | | | | | | | | | | | | | | In some Ethernet framing variants the two-byte protocol field is used as a length, with the Ethernet header being followed by an IEEE 802.2 LLC header. The first two bytes of the LLC header are the DSAP and SSAP. If the received Ethernet packet appears to use this framing, then interpret the two-byte DSAP and SSAP as being the network-layer protocol. This allows support for receiving Spanning Tree Protocol frames (which use an LLC header with {DSAP,SSAP}=0x4242) to be added without requiring a full LLC protocol layer. Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [tcp] Do not shrink window when discarding received packetsMichael Brown2015-06-251-20/+3
| | | | | | | | | | | | | | | | | | | | | We currently shrink the TCP window permanently if we are ever forced (by a low-memory condition) to discard a previously received TCP packet. This behaviour was intended to reduce the number of retransmissions in a lossy network, since lost packets might potentially result in the entire window contents being retransmitted. Since commit e0fc8fe ("[tcp] Implement support for TCP Selective Acknowledgements (SACK)") the cost of lost packets has been reduced by around one order of magnitude, and the reduction in the window size (which affects the maximum throughput) is now the more significant cost. Remove the code which reduces the TCP maximum window size when a received packet is discarded. Reported-by: Wissam Shoukair <wissams@mellanox.com> Tested-by: Wissam Shoukair <wissams@mellanox.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [neighbour] Return success when deferring a packetMichael Brown2015-05-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | Deferral of a packet for neighbour discovery is not really an error. If we fail to discover a neighbour then the failure will eventually be reported by the call to neighbour_destroy() when any outstanding I/O buffers are discarded. The current behaviour breaks PXE booting on FreeBSD, which seems to treat the error return from PXENV_UDP_WRITE as a fatal error and so never proceeds to poll PXENV_UDP_READ (and hence never allows iPXE to receive the ARP reply and send the deferred UDP packet). Change neighbour_tx() to return success when deferring a packet. This fixes interoperability with FreeBSD and removes transient neighbour cache misses from the "ifstat" error output, while leaving genuine neighbour discovery failures visible via "ifstat" (once neighbour discovery times out, or the interface is closed). Debugged-by: Wissam Shoukair <wissams@mellanox.com> Tested-by: Wissam Shoukair <wissams@mellanox.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [ipv6] Disambiguate received ICMPv6 errorsMichael Brown2015-05-111-2/+78
| | | | | Originally-implemented-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [base64] Add buffer size parameter to base64_encode() and base64_decode()Michael Brown2015-04-243-3/+5
| | | | Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [base16] Add buffer size parameter to base16_encode() and base16_decode()Michael Brown2015-04-244-14/+18
| | | | | | | | | | | | | | The current API for Base16 (and Base64) encoding requires the caller to always provide sufficient buffer space. This prevents the use of the generic encoding/decoding functionality in some situations, such as in formatting the hex setting types. Implement a generic hex_encode() (based on the existing format_hex_setting()), implement base16_encode() and base16_decode() in terms of the more generic hex_encode() and hex_decode(), and update all callers to provide the additional buffer length parameter. Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [build] Add missing "const" qualifiersChristian Hesse2015-04-241-2/+2
| | | | | | | | | This fixes "initialization discards 'const' qualifier from pointer target type" warnings with GCC 5.1.0. Signed-off-by: Christian Hesse <mail@eworm.de> Modified-by: Michael Brown <mcb30@ipxe.org> Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [peerdist] Add support for decoding PeerDist Content InformationMichael Brown2015-04-131-0/+803
| | | | Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [netdevice] Add missing bus types to netdev_fetch_bustype()Michael Brown2015-03-181-0/+4
| | | | | Reported-by: Robin Smidsrød <robin@smidsrod.no> Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [tcpip] Fix dubious calculation of min_portMichael Brown2015-03-131-1/+1
| | | | | | Detected using sparse. Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [tcp] Implement support for TCP Selective Acknowledgements (SACK)Michael Brown2015-03-111-4/+158
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The TCP Selective Acknowledgement option (specified in RFC2018) provides a mechanism for the receiver to indicate packets that have been received out of order (e.g. due to earlier dropped packets). iPXE often operates in environments in which there is a high probability of packet loss. For example, the legacy USB keyboard emulation in some BIOSes involves polling the USB bus from within a system management interrupt: this introduces an invisible delay of around 500us which is long enough for around 40 full-length packets to be dropped. Similarly, almost all 1Gbps USB2 devices will eventually end up dropping packets because the USB2 bus does not provide enough bandwidth to sustain a 1Gbps stream, and most devices will not provide enough internal buffering to hold a full TCP window's worth of received packets. Add support for sending TCP Selective Acknowledgements. This provides the sender with more detailed information about which packets have been lost, and so allows for a more efficient retransmission strategy. We include a SACK-permitted option in our SYN packet, since experimentation shows that at least Linux peers will not include a SACK-permitted option in the SYN-ACK packet if one was not present in the initial SYN. (RFC2018 does not seem to mandate this behaviour, but it is consistent with the approach taken in RFC1323.) We ignore any received SACK options; this is safe to do since SACK is only ever advisory and we never have to send non-trivial amounts of data. Since our TCP receive queue is a candidate for cache discarding under low memory conditions, we may end up discarding data that has been reported as received via a SACK option. This is permitted by RFC2018. We follow the stricture that SACK blocks must not report data which is no longer held by the receiver: previously-reported blocks are validated against the current receive queue before being included within the current SACK block list. Experiments in a qemu VM using forced packet drops (by setting NETDEV_DISCARD_RATE to 32) show that implementing SACK improves throughput by around 400%. Experiments with a USB2 NIC (an SMSC7500) show that implementing SACK improves throughput by around 700%, increasing the download rate from 35Mbps up to 250Mbps (which is approximately the usable bandwidth limit for USB2). Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [http] Support MD5-sess Digest authenticationMichael Brown2015-03-091-2/+42
| | | | | | | Microsoft IIS supports only MD5-sess for Digest authentication. Requested-by: Andreas Hammarskjöld <junior@2PintSoftware.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [http] Abstract out HTTP Digest hash algorithm operationsMichael Brown2015-03-091-28/+56
| | | | Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [legal] Relicense files under GPL2_OR_LATER_OR_UBDLMichael Brown2015-03-051-1/+5
| | | | | | | | | | | Relicense files with kind permission from Stefan Hajnoczi <stefanha@redhat.com> alongside the contributors who have already granted such relicensing permission. Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [retry] Colourise debug outputMichael Brown2015-03-051-10/+10
| | | | Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [retry] Rewrite unrelicensable portions of retry.cMichael Brown2015-03-053-31/+46
| | | | Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [build] Fix the REQUIRE_SYMBOL mechanismMichael Brown2015-03-057-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | At some point in the past few years, binutils became more aggressive at removing unused symbols. To function as a symbol requirement, a relocation record must now be in a section marked with @progbits and must not be in a section which gets discarded during the link (either via --gc-sections or via /DISCARD/). Update REQUIRE_SYMBOL() to generate relocation records meeting these criteria. To minimise the impact upon the final binary size, we use existing symbols (specified via the REQUIRING_SYMBOL() macro) as the relocation targets where possible. We use R_386_NONE or R_X86_64_NONE relocation types to prevent any actual unwanted relocation taking place. Where no suitable symbol exists for REQUIRING_SYMBOL() (such as in config.c), the macro PROVIDE_REQUIRING_SYMBOL() can be used to generate a one-byte-long symbol to act as the relocation target. If there are versions of binutils for which this approach fails, then the fallback will probably involve killing off REQUEST_SYMBOL(), redefining REQUIRE_SYMBOL() to use the current definition of REQUEST_SYMBOL(), and postprocessing the linked ELF file with something along the lines of "nm -u | wc -l" to check that there are no undefined symbols remaining. Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [build] Use REQUIRE_OBJECT() to drag in per-object configurationMichael Brown2015-03-054-0/+12
| | | | Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [iscsi] Rewrite unrelicensable portions of iscsi.cMichael Brown2015-03-021-36/+28
| | | | Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [libc] Rewrite byte-swapping codeMichael Brown2015-03-021-1/+1
| | | | Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [legal] Relicense files under GPL2_OR_LATER_OR_UBDLMichael Brown2015-03-0213-13/+53
| | | | | | | | | | These files cannot be automatically relicensed by util/relicense.pl since they either contain unusual but trivial contributions (such as the addition of __nonnull function attributes), or contain lines dating back to the initial git revision (and so require manual knowledge of the code's origin). Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [legal] Relicense files under GPL2_OR_LATER_OR_UBDLMichael Brown2015-03-023-3/+15
| | | | | | | | | | | | | | | Relicence files with kind permission from the following contributors: Alex Williamson <alex.williamson@redhat.com> Eduardo Habkost <ehabkost@redhat.com> Greg Jednaszewski <jednaszewski@gmail.com> H. Peter Anvin <hpa@zytor.com> Marin Hannache <git@mareo.fr> Robin Smidsrød <robin@smidsrod.no> Shao Miller <sha0.miller@gmail.com> Thomas Horsten <thomas@horsten.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [legal] Relicense files under GPL2_OR_LATER_OR_UBDLMichael Brown2015-03-0232-32/+160
| | | | | | | Relicense files for which I am the sole author (as identified by util/relicense.pl). Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [dhcp] Extract timing parameters out to config/dhcp.hAlex Williamson2015-02-251-13/+19
| | | | | | | | | | | | | | iPXE uses DHCP timeouts loosely based on values recommended by the specification, but often abbreviated to reduce timeouts for reliable and/or simple network topologies. Extract the DHCP timing parameters to config/dhcp.h and document them. The resulting default iPXE behavior is exactly the same, but downstreams are now afforded the opportunity to implement spec-compliant behavior via config file overrides. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Modified-by: Michael Brown <mcb30@ipxe.org> Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [ipv4] Rewrite inet_aton()Michael Brown2015-02-191-5/+37
| | | | | | | | | | The implementation of inet_aton() has an unknown provenance. Rewrite this code to avoid potential licensing uncertainty. Also move the code from core/misc.c to its logical home in net/ipv4.c, and add a few extra test cases. Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [legal] Add missing copyright header to net/ipv4.cMichael Brown2015-02-181-0/+20
| | | | Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [rndis] Add rndis_rx_err()Michael Brown2015-02-111-0/+15
| | | | Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [tftp] Explicitly abort connection whenever parent interface is closedMichael Brown2015-02-061-38/+16
| | | | | | | | | Fetching the TFTP file size is currently implemented via a custom "tftpsize://" protocol hack. Generalise this approach to instead close the TFTP connection whenever the parent data-transfer interface is closed. Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [rndis] Ignore start-of-day RNDIS_INDICATE_STATUS_MSG with status 0x40020006Michael Brown2014-12-201-0/+4
| | | | | | | | | | | Windows Server 2012 R2 generates an RNDIS_INDICATE_STATUS_MSG with a status code of 0x4002006. This status code does not appear to be documented anywhere within the sphere of human knowledge. Explicitly ignore this status code in order to avoid unnecessarily cluttering the display when RNDIS debugging is enabled. Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [hyperv] Assume that VMBus xfer page ranges correspond to RNDIS messagesMichael Brown2014-12-201-50/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The (undocumented) VMBus protocol seems to allow for transfer page-based packets where the data payload is split into an arbitrary set of ranges within the transfer page set. The RNDIS protocol includes a length field within the header of each message, and it is known from observation that multiple RNDIS messages can be concatenated into a single VMBus message. iPXE currently assumes that the transfer page range boundaries are entirely arbitrary, and uses the RNDIS header length to determine the RNDIS message boundaries. Windows Server 2012 R2 generates an RNDIS_INDICATE_STATUS_MSG for an undocumented and unknown status code (0x40020006) with a malformed RNDIS header length: the length does not cover the StatusBuffer portion of the message. This causes iPXE to report a malformed RNDIS message and to discard any further RNDIS messages within the same VMBus message. The Linux Hyper-V driver assumes that the transfer page range boundaries correspond to RNDIS message boundaries, and so does not notice the malformed length field in the RNDIS header. Match the behaviour of the Linux Hyper-V driver: assume that the transfer page range boundaries correspond to the RNDIS message boundaries and ignore the RNDIS header length. This avoids triggering the "malformed packet" error and also avoids unnecessary data copying: since we now have one I/O buffer per RNDIS message, there is no longer any need to use iob_split(). Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [rndis] Clear receive filter when closing the deviceMichael Brown2014-12-201-11/+30
| | | | | | | | | On Windows Server 2012 R2, closing and reopening the device will sometimes result in a non-functional RX datapath. The root cause is unknown. Clearing the receive filter before closing the device seems to fix the problem. Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [rndis] Send RNDIS_HALT_MSGMichael Brown2014-12-191-0/+58
| | | | | | The RNDIS specification requires that we send RNDIS_HALT_MSG. Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [rndis] Send RNDIS_INITIALISE_MSGMichael Brown2014-12-191-22/+161
| | | | | | | | The Hyper-V RNDIS implementation on Windows Server 2012 R2 requires that we send an explicit RNDIS initialisation message in order to get a working RX datapath. Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [rndis] Add generic RNDIS device abstractionMichael Brown2014-12-181-0/+850
| | | | | | | RNDIS provides an abstraction of a network device on top of a generic packet transmission mechanism. Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [netdevice] Fix erroneous use of free(iobuf) instead of free_iob(iobuf)Michael Brown2014-12-121-1/+1
| | | | Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [dhcp] Remove obsolete dhcp_chaddr() functionMichael Brown2014-09-221-75/+5
| | | | | | | | | | | | | | | As of commit 03f0c23 ("[ipoib] Expose Ethernet-compatible eIPoIB link-layer addresses and headers"), all link layers have used addresses which fit within the DHCP chaddr field. The dhcp_chaddr() function was therefore made obsolete by this commit, but was accidentally left present (though unused) in the source code. Remove the dhcp_chaddr() function and the only remaining use of it, unnecessarily introduced in commit 08bcc0f ("[dhcp] Check for matching chaddr in received DHCP packets"). Reported-by: Wissam Shoukair <wissams@mellanox.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [dhcp] Check for matching chaddr in received DHCP packetsMichael Brown2014-09-221-0/+37
| | | | | | | | On large networks a DHCP XID collision is possible. Fix by explicitly checking the chaddr in received DHCP packets. Originally-fixed-by: Wissam Shoukair <wissams@mellanox.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [netdevice] Avoid registering duplicate network devicesMichael Brown2014-07-301-5/+40
| | | | | | | | | | | | | | | | | Reject network devices which appear to be duplicates of those already available via a different underlying hardware device. On a Xen PV-HVM system, this allows us to filter out the emulated PCI NICs (which would otherwise appear alongside the netfront NICs). Note that we cannot use the Xen facility to "unplug" the emulated PCI NICs, since there is no guarantee that the OS we subsequently load will have a native netfront driver. We permit devices with the same MAC address if they are attached to the same underlying hardware device (e.g. VLAN devices). Inspired-by: Marin Hannache <git@mareo.fr> Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [lacp] Set "aggregatable" flag in response LACPDUSven Ulland2014-07-231-1/+2
| | | | | | | | | | | | | Some switches do not allow an individual link (as defined in IEEE Std 802.3ad-2000 section 43.3.5) to work alone in a link aggregation group as described in section 43.3.6. This is verified on Dell's PowerConnect M6220, based on the Broadcom Strata XGS-IV chipset. Set the LACP_STATE_AGGREGATABLE flag in the actor.state field to announce link aggregation in the response LACPDU, which will have the switch enable the link aggregation group and allow frames to pass. Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [netdevice] Reset network device index when last device is unregisteredMichael Brown2014-07-141-2/+8
| | | | | | | | | | | | | | When functioning as an EFI driver, drivers can be disconnected and reconnected multiple times (e.g. via the EFI shell "connect" command, or by running an executable such as ipxe.efi which will temporarily disconnect existing drivers). Minimise surprise by resetting the network device index to zero whenever the last device is unregistered. This is not foolproof, but it does handle the common case of having all devices unregistered and then reregistered in the original order. Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [build] Expose build timestamp, build name, and product namesMichael Brown2014-06-241-2/+2
| | | | | | | | Expose the build timestamp (measured in seconds since the Epoch) and the build name (e.g. "rtl8139.rom" or "ipxe.efi"), and provide the product name and product short name in a single centralised location. Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [scsi] Improve sense code parsingMichael Brown2014-06-032-7/+9
| | | | | | | | Parse the sense data to extract the reponse code, the sense key, the additional sense code, and the additional sense code qualifier. Originally-implemented-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [ethernet] Provide eth_random_addr() to generate random Ethernet addressesHannes Reinecke2014-06-011-0/+16
| | | | | Modified-by: Michael Brown <mcb30@ipxe.org> Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [ipv6] Avoid potentially copying from a NULL pointer in ipv6_tx()Michael Brown2014-05-231-1/+2
| | | | | | | | | | | | | If ipv6_tx() is called with a non-NULL network device, a NULL or unspecified source address, and a destination address which does not match any routing table entry, then it will attempt to copy the source address from a NULL pointer. I don't think that there is currently any code path which could trigger this behaviour, but we should probably ensure that it can never happen. Signed-off-by: Michael Brown <mcb30@ipxe.org>