path: root/src/net
Age  Commit message  Author  Files  Lines
2015-09-10[tcpip] Avoid generating positive zero for transmitted UDP checksumsMichael Brown3-1/+7
TCP/IP checksum fields are one's complement values and therefore have two possible representations of zero: positive zero (0x0000) and negative zero (0xffff).

In RFC768, UDP over IPv4 exploits this redundancy to repurpose the positive representation of zero (0x0000) to mean "no checksum calculated"; checksums are optional for UDP over IPv4.

In RFC2460, checksums are made mandatory for UDP over IPv6. The wording of the RFC is such that the UDP header is mandated to use only the negative representation of zero (0xffff), rather than simply requiring the checksum to be correct but allowing for either representation of zero to be used.

In RFC1071, an example algorithm is given for calculating the TCP/IP checksum. This algorithm happens to produce only the positive representation of zero (0x0000); this is an artifact of the way that unsigned arithmetic is used to calculate a signed one's complement sum (and its final negation).

A common misconception has developed (exemplified in RFC1624) that this artifact is part of the specification. Many people have assumed that the checksum field should never contain the negative representation of zero (0xffff).

A sensible receiver will calculate the checksum over the whole packet and verify that the result is zero (in whichever representation of zero happens to be generated by the receiver's algorithm). Such a receiver will not care which representation of zero happens to be used in the checksum field.

However, there are receivers in existence which will verify the received checksum the hard way: by calculating the checksum over the remainder of the packet and comparing the result against the checksum field. If the representation of zero used by the receiver's algorithm does not match the representation of zero used by the transmitter (and so placed in the checksum field), and if the receiver does not explicitly allow for both representations to compare as equal, then the receiver may reject packets with a valid checksum.

For UDP, the combined RFCs effectively mandate that we should generate only the negative representation of zero in the checksum field.

For IP, TCP and ICMP, the RFCs do not mandate which representation of zero should be used, but the misconceptions which have grown up around RFC1071 and RFC1624 suggest that it would be least surprising to generate only the positive representation of zero in the checksum field.

Fix by ensuring that all of our checksum algorithms generate only the positive representation of zero, and explicitly inverting this in the case of transmitted UDP packets.

Reported-by: Wissam Shoukair <wissams@mellanox.com>
Tested-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
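The two representations of zero can be illustrated with a minimal C sketch. This is not the iPXE implementation: the function names (`example_sum`, `example_checksum`, `example_udp_tx_checksum`, `example_verify`) are hypothetical, and the buffer passed in is assumed to already contain whatever pseudo-header the protocol requires.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum a buffer as big-endian 16-bit words, folding carries back in.
 * The result is the one's complement sum, not yet negated.
 */
uint16_t example_sum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8; /* pad odd byte with zero */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Checksum field value for IP, TCP and ICMP in this sketch:
 * negative zero (0xffff) is normalised to positive zero (0x0000).
 */
uint16_t example_checksum(const uint8_t *data, size_t len)
{
    uint16_t csum = (uint16_t)~example_sum(data, len);

    return (csum == 0xffff) ? 0x0000 : csum;
}

/* Checksum field value for a transmitted UDP packet: positive zero
 * would mean "no checksum calculated", so send negative zero instead.
 */
uint16_t example_udp_tx_checksum(const uint8_t *data, size_t len)
{
    uint16_t csum = example_checksum(data, len);

    return (csum == 0x0000) ? 0xffff : csum;
}

/* "Sensible receiver" check: sum the whole packet, checksum field
 * included, and accept either representation of zero.
 */
int example_verify(const uint8_t *packet, size_t len)
{
    uint16_t sum = example_sum(packet, len);

    return (sum == 0xffff) || (sum == 0x0000);
}
```

With this verification style, a payload of `ff ff` is accepted whether the checksum field holds `00 00` or `ff ff`; a receiver that recomputes and compares the field directly would accept only one of the two.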
2015-09-01[pxe] Populate ciaddr in fake PXE Boot Server ACK packetMichael Brown1-0/+4
We currently do not populate the ciaddr field in the constructed PXE Boot Server ACK packet. This causes a WDS server to respond with a broadcast packet, which is then ignored by wdsmgfw.efi since it does not match the specified IP address filter. Fix by populating ciaddr within the constructed PXE Boot Server ACK packet. Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-09-01[tcpip] Allow supported address families to be detected at runtimeMichael Brown3-8/+9
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-08-26[dhcp] Do not skip ProxyDHCPREQUEST if next-server is emptyMichael Brown1-2/+3
We attempt to mimic the behaviour of Intel's PXE ROM by skipping the separate ProxyDHCPREQUEST if the ProxyDHCPOFFER already contains a boot filename or a PXE boot menu. Experimentation reveals that Intel's PXE ROM will also check for a non-empty next-server address alongside the boot filename. Update our test to match this behaviour. Reported-by: Wissam Shoukair <wissams@mellanox.com> Tested-by: Wissam Shoukair <wissams@mellanox.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>
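One reading of the updated test, as a hedged C sketch: the separate ProxyDHCPREQUEST is skipped only if the offer carries both a boot filename and a non-empty next-server address, or carries a PXE boot menu. The function name and parameters are hypothetical and do not reflect iPXE's internal packet structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Decide whether the separate ProxyDHCPREQUEST may be skipped.
 *
 * filename     boot filename from the ProxyDHCPOFFER (NULL or "" if absent)
 * next_server  next-server address from the offer (0 if empty)
 * has_pxe_menu non-zero if the offer contains a PXE boot menu
 */
int example_skip_proxy_request(const char *filename, uint32_t next_server,
                               int has_pxe_menu)
{
    int has_filename = (filename != NULL) && (filename[0] != '\0');

    /* A filename counts only when accompanied by a next-server address */
    if (has_filename && (next_server != 0))
        return 1;
    return has_pxe_menu ? 1 : 0;
}
```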
2015-08-25[settings] Re-add "uristring" setting typeMichael Brown1-2/+2
Commit 09b057c ("[settings] Remove "uristring" setting type") removed support for URI-encoded settings via the "uristring" setting type, on the basis that such encoding was no longer necessary to avoid problems with the command line parser. Other valid use cases for the "uristring" setting type do exist: for example, a password containing a '/' character expanded via chain http://username:${password:uristring}@server.name/boot.php Restore the existence of the "uristring" setting, avoiding the potentially large stack allocations that were used in the old code prior to commit 09b057c ("[settings] Remove "uristring" setting type"). Requested-by: Robin Smidsrød <robin@smidsrod.no> Signed-off-by: Michael Brown <mcb30@ipxe.org>
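To see why the encoding matters, consider a password such as `pa/ss`: expanded verbatim into the URI above, the '/' would be parsed as a path separator. A minimal percent-encoding sketch in C follows. It is illustrative only; `example_uri_encode` is a hypothetical name, the set of characters left unencoded (the RFC 3986 unreserved set) is an assumption, and the snprintf-style return value is a design choice of this sketch rather than a description of iPXE's code.

```c
#include <stddef.h>
#include <string.h>

/* Percent-encode a NUL-terminated string into buf (of size len).
 * Returns the length the encoded string needs, excluding the NUL,
 * even if it did not fit; output is truncated but always terminated
 * when len is non-zero.
 */
size_t example_uri_encode(const char *raw, char *buf, size_t len)
{
    static const char hex[] = "0123456789ABCDEF";
    static const char unreserved[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
        "0123456789-._~";
    size_t used = 0;
    size_t n, i;
    char tmp[3];

    for (; *raw; raw++) {
        unsigned char c = (unsigned char)*raw;

        if (strchr(unreserved, c)) {
            tmp[0] = (char)c;
            n = 1;
        } else {
            tmp[0] = '%';
            tmp[1] = hex[c >> 4];
            tmp[2] = hex[c & 0x0f];
            n = 3;
        }
        /* Copy only what fits, but keep counting the full length */
        for (i = 0; i < n; i++, used++) {
            if (used + 1 < len)
                buf[used] = tmp[i];
        }
    }
    if (len)
        buf[(used < len) ? used : (len - 1)] = '\0';
    return used;
}
```

Encoding `pa/ss` this way yields `pa%2Fss`, which can be embedded in the userinfo part of a URI without being mistaken for a path component.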
2015-08-18[dhcp] Ignore ProxyDHCPACKs without PXE optionsMichael Brown1-0/+4
Suggested-by: Wissam Shoukair <wissams@mellanox.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-08-18[dhcp] Allow pseudo-DHCP servers to use pseudo-identifiersMichael Brown1-27/+54
Some ProxyDHCP servers and PXE boot servers do not specify a DHCP server identifier via option 54. We currently work around this in a variety of ad-hoc ways:

- if a ProxyDHCPACK has no server identifier then we treat it as having the correct server identifier,
- if a boot server ACK has no server identifier then we use the packet's source IP address as the server identifier.

Introduce the concept of a DHCP server pseudo-identifier, defined as being:

- the server identifier (option 54), or
- if there is no server identifier, then the next-server address (siaddr),
- if there is no server identifier or next-server address, then the DHCP packet's source IP address.

Use the pseudo-identifier in place of the server identifier when handling ProxyDHCP and PXE boot server responses.

Originally-fixed-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
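The fallback order for the pseudo-identifier can be sketched in C as follows. The structure and function names are hypothetical, addresses are held as plain 32-bit values, and an absent field is assumed to be represented as zero; none of this is taken from iPXE's own data structures.

```c
#include <stdint.h>

/* Addresses extracted from a received DHCP packet (0 means absent) */
struct example_dhcp_addrs {
    uint32_t server_id; /* DHCP server identifier (option 54) */
    uint32_t siaddr;    /* next-server address */
    uint32_t src;       /* source IP address of the packet */
};

/* Pick the pseudo-identifier: option 54, else siaddr, else source IP */
uint32_t example_pseudo_id(const struct example_dhcp_addrs *addrs)
{
    if (addrs->server_id)
        return addrs->server_id;
    if (addrs->siaddr)
        return addrs->siaddr;
    return addrs->src;
}
```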
2015-08-17[ipoib] Fix a race when chain-loading undionly.kpxe in IPoIBWissam Shoukair2-0/+12
The Infiniband link status change callback ipoib_link_state_changed() may be called while the IPoIB device is closed, in which case there will not be an IPoIB queue pair to be joined to the IPv4 broadcast group. This leads to NULL pointer dereferences in ib_mcast_attach() and ib_mcast_detach(). Fix by not attempting to join (or leave) the broadcast group unless we actually have an IPoIB queue pair. Signed-off-by: Wissam Shoukair <wissams@mellanox.com> Modified-by: Michael Brown <mcb30@ipxe.org> Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-08-17[peerdist] Add support for PeerDist (aka BranchCache) HTTP content encodingMichael Brown1-0/+145
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-08-17[peerdist] Add block download multiplexerMichael Brown1-0/+387
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-08-17[peerdist] Add individual block download mechanismMichael Brown1-0/+1366
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-08-17[peerdist] Add segment discovery mechanismMichael Brown1-0/+551
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-08-17[http] Rewrite HTTP core to support content encodingsMichael Brown8-1241/+2480
Rewrite the HTTP core to allow for the addition of arbitrary content encoding mechanisms, such as PeerDist and gzip. The core now exposes http_open() which can be used to create requests with an explicitly selected HTTP method, an optional requested content range, and an optional request body. A simple wrapper provides the preexisting behaviour of creating either a GET request or an application/x-www-form-urlencoded POST request (if the URI includes parameters). The HTTP SAN interface is now implemented using the generic block device translator. Individual blocks are requested using http_open() to create a range request. Server connections are now managed via a connection pool; this allows for multiple requests to the same server (e.g. for SAN blocks) to be completely unaware of each other. Repeated HTTPS connections to the same server can reuse a pooled connection, avoiding the per-connection overhead of establishing a TLS session (which can take several seconds if using a client certificate). Support for HTTP SAN booting and for the Basic and Digest authentication schemes is now optional and can be controlled via the SANBOOT_PROTO_HTTP, HTTP_AUTH_BASIC, and HTTP_AUTH_DIGEST build configuration options in config/general.h. Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-08-02[crypto] Support SHA-{224,384,512} in X.509 certificatesMichael Brown1-73/+17
Add support for SHA-224, SHA-384, and SHA-512 as digest algorithms in X.509 certificates, and allow the choice of public-key, cipher, and digest algorithms to be configured at build time via config/crypto.h. Originally-implemented-by: Tufan Karadere <tufank@gmail.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-08-02[tls] Report supported signature algorithms in ClientHelloMichael Brown1-0/+25
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-08-01[tls] Do not access beyond the end of a 24-bit integerMichael Brown1-22/+29
The current implementation handles big-endian 24-bit integers (which occur in several TLS record types) by treating them as big-endian 32-bit integers which are shifted by 8 bits. This can result in "Invalid read" errors when running under valgrind, if the 24-bit field happens to be exactly at the end of an I/O buffer. Fix by ensuring that we touch only the three bytes which comprise the 24-bit integer. Signed-off-by: Michael Brown <mcb30@ipxe.org>
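A byte-wise accessor avoids the over-read described above, since it never touches a fourth byte. The following C sketch uses hypothetical helper names and is not the iPXE code itself.

```c
#include <stdint.h>

/* Read a big-endian 24-bit integer, touching exactly three bytes */
uint32_t example_get_uint24(const uint8_t *field)
{
    return ((uint32_t)field[0] << 16) |
           ((uint32_t)field[1] << 8) |
           (uint32_t)field[2];
}

/* Write a big-endian 24-bit integer, touching exactly three bytes */
void example_set_uint24(uint8_t *field, uint32_t value)
{
    field[0] = (uint8_t)(value >> 16);
    field[1] = (uint8_t)(value >> 8);
    field[2] = (uint8_t)value;
}
```

By contrast, loading the field as a 32-bit big-endian value and shifting right by 8 bits reads one byte past the end of the field, which is what valgrind reports when the field ends exactly at the end of the buffer.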
2015-07-28[peerdist] Add support for constructing and decoding discovery messagesMichael Brown1-0/+286
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-07-28[peerdist] Include trimmed range within content information blockMichael Brown1-4/+19
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-07-28[netdevice] Allow network devices to disclaim IRQ support at runtimeMichael Brown2-0/+8
VLAN and 802.11 devices use a network device operations structure that wraps an underlying structure. For example, the vlan_operations structure wraps the network device operations structure of the underlying trunk device. This can cause false positives from the current implementation of netdev_irq_supported(), which will always report that VLAN devices support interrupts since it has no visibility into the support provided by the underlying trunk device. Fix by allowing network devices to explicitly flag that interrupts are not supported, despite the presence of an irq() method. Originally-fixed-by: Wissam Shoukair <wissams@mellanox.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>
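The idea can be sketched in C with a per-device flag that overrides the mere presence of an irq() method. All names here (`example_netdev`, `EXAMPLE_IRQ_UNSUPPORTED`, and so on) are hypothetical stand-ins, not the identifiers used in iPXE.

```c
#include <stddef.h>

struct example_netdev;

/* Network device operations; a wrapper driver always provides irq() */
struct example_netdev_ops {
    void (*irq)(struct example_netdev *netdev, int enable);
};

/* Flag set at runtime by a device that cannot actually deliver IRQs */
#define EXAMPLE_IRQ_UNSUPPORTED 0x0001

struct example_netdev {
    const struct example_netdev_ops *op;
    unsigned int flags;
};

/* Wrapper irq() method, as a VLAN-style driver might provide */
void example_wrapper_irq(struct example_netdev *netdev, int enable)
{
    (void)netdev;
    (void)enable; /* would forward to the underlying trunk device */
}

/* Interrupts are supported only if irq() exists and is not disclaimed */
int example_irq_supported(const struct example_netdev *netdev)
{
    return (netdev->op->irq != NULL) &&
           !(netdev->flags & EXAMPLE_IRQ_UNSUPPORTED);
}
```

A wrapping driver would set the flag when it discovers that its underlying device lacks interrupt support, so the check no longer reports a false positive.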
2015-07-28[iscsi] Add missing "break" statementsMichael Brown1-0/+2
iscsi_tx_done() is missing "break" statements at the end of each case. (Fortunately, this happens not to cause a bug in practice, since iscsi_login_request_done() is effectively a no-op when completing a data-out PDU.) Reported-by: Wissam Shoukair <wissams@mellanox.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-07-28[ipv4] Allow IPv4 socket addresses to include a scope IDMichael Brown1-12/+33
Extend the IPv6 concept of "scope ID" (indicating the network device index) to IPv4 socket addresses, so that IPv4 multicast transmissions may specify the transmitting network device. The scope ID is not (currently) exposed via the string representation of the socket address, since IPv4 does not use the IPv6 concept of link-local addresses (which could legitimately be specified in a URI). Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-07-28[ipv4] Redefine IP address constants to avoid unnecessary byte swappingMichael Brown1-8/+8
Redefine various IPv4 address constants and testing macros to avoid unnecessary byte swapping at runtime, and slightly rename the macros to prevent code from accidentally using the old definitions. Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-07-28[netdevice] Avoid using zero as a network device indexMichael Brown1-2/+2
Avoid using zero as a network device index, so that a zero sin6_scope_id can be used to mean "unspecified" (rather than unintentionally meaning "net0"). Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-07-28[ipv6] Treat a missing network device name as "netX"Michael Brown1-4/+15
When an IPv6 socket address string specifies a link-local or multicast address but does not specify the requisite network device name (e.g. "fe80::69ff:fe50:5845" rather than "fe80::69ff:fe50:5845%net0"), assume the use of "netX". Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-07-27[802.11] Use correct SHA1_DIGEST_SIZE constant nameMichael Brown1-1/+1
The constant SHA1_SIZE is defined only as part of the imported AXTLS code. Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-07-22[xferbuf] Generalise to handle umalloc()-based buffersMichael Brown1-2/+3
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-07-22[fault] Generalise NETDEV_DISCARD_RATE fault injection mechanismMichael Brown1-7/+5
Provide a generic inject_fault() function that can be used to inject random faults with configurable probabilities. Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-07-22[tcp] Ensure FIN is actually sent if connection is closed while idleMichael Brown1-0/+1
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-07-04[tcp] Gracefully close connections during shutdownMichael Brown1-1/+56
We currently do not wait for a received FIN before exiting to boot a loaded OS. In the common case of booting from an HTTP server, this means that the TCP connection is left consuming resources on the server side: the server will retransmit the FIN several times before giving up. Fix by initiating a graceful close of all TCP connections and waiting (for up to one second) for all connections to finish closing gracefully (i.e. for the outgoing FIN to have been sent and ACKed, and for the incoming FIN to have been received and ACKed at least once). Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-06-29[ipoib] Attempt to generate ARPs as needed to repopulate REMAC cacheMichael Brown1-3/+3
The only way to map an eIPoIB MAC address (REMAC) to an IPoIB MAC address is to intercept an incoming ARP request or reply. If we do not have an REMAC cache entry for a particular destination MAC address, then we cannot transmit the packet. This can arise in at least two situations:

- An external program (e.g. a PXE NBP using the UNDI API) may attempt to transmit to a destination MAC address that has been obtained by some method other than ARP.
- Memory pressure may have caused REMAC cache entries to be discarded. This is fairly likely on a busy network, since REMAC cache entries are created for all received (broadcast) ARP requests. (We can't sensibly avoid creating these cache entries, since they are required in order to send an ARP reply, and when we are being used via the UNDI API we may have no knowledge of which IP addresses are "ours".)

Attempt to ameliorate the situation by generating a semi-spurious ARP request whenever we find a missing REMAC cache entry. This will hopefully trigger an ARP reply, which would then provide us with the information required to populate the REMAC cache.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-06-25[dhcp] Defer discovery if link is blockedMichael Brown1-0/+9
If the link is blocked (e.g. due to a Spanning Tree Protocol port not yet forwarding packets) then defer DHCP discovery until the link becomes unblocked. Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-06-25[stp] Fix interpretation of hello timeMichael Brown1-3/+3
Times in STP packets are expressed in units of 1/256 of a second. Signed-off-by: Michael Brown <mcb30@ipxe.org>
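As a worked example of the unit, a raw hello time of 512 corresponds to 512/256 = 2 seconds. A small C sketch of the conversion follows; the function name is hypothetical and the raw value is assumed to have already been converted from network to host byte order.

```c
#include <stdint.h>

/* Convert an STP time value (units of 1/256 second) to milliseconds */
uint32_t example_stp_time_ms(uint16_t raw)
{
    return ((uint32_t)raw * 1000) / 256;
}
```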
2015-06-25[stp] Add support for detecting Spanning Tree Protocol non-forwarding portsMichael Brown1-0/+152
A fairly common end-user problem is that the default configuration of a switch may leave the port in a non-forwarding state for a substantial length of time (tens of seconds) after link up. This can cause iPXE to time out and give up attempting to boot. We cannot force the switch to start forwarding packets sooner, since any attempt to send a Spanning Tree Protocol bridge PDU may cause the switch to disable our port (if the switch happens to have the Bridge PDU Guard feature enabled for the port). For non-ancient versions of the Spanning Tree Protocol, we can detect whether or not the port is currently forwarding and use this to inform the network device core that the link is currently blocked. Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-06-25[netdevice] Add a generic concept of a "blocked link"Michael Brown1-1/+51
When Spanning Tree Protocol (STP) is used, there may be a substantial delay (tens of seconds) from the time that the link goes up to the time that the port starts forwarding packets. Add a generic concept of a "blocked link" (i.e. a link which is up but which is not expected to communicate successfully), and allow "ifstat" to indicate when a link is blocked. Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-06-25[ethernet] Add minimal support for receiving LLC framesMichael Brown1-2/+36
In some Ethernet framing variants the two-byte protocol field is used as a length, with the Ethernet header being followed by an IEEE 802.2 LLC header. The first two bytes of the LLC header are the DSAP and SSAP. If the received Ethernet packet appears to use this framing, then interpret the two-byte DSAP and SSAP as being the network-layer protocol. This allows support for receiving Spanning Tree Protocol frames (which use an LLC header with {DSAP,SSAP}=0x4242) to be added without requiring a full LLC protocol layer. Signed-off-by: Michael Brown <mcb30@ipxe.org>
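The framing check can be sketched in C as below. The function name is hypothetical, the 0x0600 threshold is the conventional boundary between length values and EtherTypes rather than something stated in the commit message, and returning 0 for a frame that is too short is an arbitrary choice of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_ETH_HLEN 14       /* dest MAC + source MAC + type/length */
#define EXAMPLE_ETH_TYPE_MIN 0x0600 /* smaller values are lengths */

/* Return the network-layer protocol of a received Ethernet frame in
 * host byte order, or 0 if the frame is too short to tell.
 */
uint16_t example_eth_net_proto(const uint8_t *frame, size_t len)
{
    uint16_t type_or_len;

    if (len < EXAMPLE_ETH_HLEN)
        return 0;
    type_or_len = (uint16_t)((frame[12] << 8) | frame[13]);
    if (type_or_len >= EXAMPLE_ETH_TYPE_MIN)
        return type_or_len; /* ordinary EtherType framing */

    /* Length field: an LLC header follows; use {DSAP,SSAP} as protocol */
    if (len < EXAMPLE_ETH_HLEN + 2)
        return 0;
    return (uint16_t)((frame[14] << 8) | frame[15]);
}
```

A Spanning Tree Protocol frame, whose LLC header begins with DSAP 0x42 and SSAP 0x42, would therefore be reported as protocol 0x4242.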
2015-06-25[tcp] Do not shrink window when discarding received packetsMichael Brown1-20/+3
We currently shrink the TCP window permanently if we are ever forced (by a low-memory condition) to discard a previously received TCP packet. This behaviour was intended to reduce the number of retransmissions in a lossy network, since lost packets might potentially result in the entire window contents being retransmitted. Since commit e0fc8fe ("[tcp] Implement support for TCP Selective Acknowledgements (SACK)") the cost of lost packets has been reduced by around one order of magnitude, and the reduction in the window size (which affects the maximum throughput) is now the more significant cost. Remove the code which reduces the TCP maximum window size when a received packet is discarded. Reported-by: Wissam Shoukair <wissams@mellanox.com> Tested-by: Wissam Shoukair <wissams@mellanox.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-05-20[neighbour] Return success when deferring a packetMichael Brown1-1/+1
Deferral of a packet for neighbour discovery is not really an error. If we fail to discover a neighbour then the failure will eventually be reported by the call to neighbour_destroy() when any outstanding I/O buffers are discarded. The current behaviour breaks PXE booting on FreeBSD, which seems to treat the error return from PXENV_UDP_WRITE as a fatal error and so never proceeds to poll PXENV_UDP_READ (and hence never allows iPXE to receive the ARP reply and send the deferred UDP packet). Change neighbour_tx() to return success when deferring a packet. This fixes interoperability with FreeBSD and removes transient neighbour cache misses from the "ifstat" error output, while leaving genuine neighbour discovery failures visible via "ifstat" (once neighbour discovery times out, or the interface is closed). Debugged-by: Wissam Shoukair <wissams@mellanox.com> Tested-by: Wissam Shoukair <wissams@mellanox.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-05-11[ipv6] Disambiguate received ICMPv6 errorsMichael Brown1-2/+78
Originally-implemented-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-04-24[base64] Add buffer size parameter to base64_encode() and base64_decode()Michael Brown3-3/+5
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-04-24[base16] Add buffer size parameter to base16_encode() and base16_decode()Michael Brown4-14/+18
The current API for Base16 (and Base64) encoding requires the caller to always provide sufficient buffer space. This prevents the use of the generic encoding/decoding functionality in some situations, such as in formatting the hex setting types. Implement a generic hex_encode() (based on the existing format_hex_setting()), implement base16_encode() and base16_decode() in terms of the more generic hex_encode() and hex_decode(), and update all callers to provide the additional buffer length parameter. Signed-off-by: Michael Brown <mcb30@ipxe.org>
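One way a length-aware encoder can behave is sketched below in C. The name `example_hex_encode`, its parameter order, and the snprintf-style return value (the length that would have been written, regardless of truncation) are assumptions made for illustration and may differ from the signature actually adopted in iPXE.

```c
#include <stddef.h>
#include <stdint.h>

/* Hex-encode raw_len bytes into buf (of size len).
 * Returns the encoded length excluding the NUL, even if truncated;
 * output is always NUL-terminated when len is non-zero.
 */
size_t example_hex_encode(const void *raw, size_t raw_len,
                          char *buf, size_t len)
{
    static const char digits[] = "0123456789abcdef";
    const uint8_t *bytes = raw;
    size_t used = 0;
    size_t i;

    for (i = 0; i < raw_len; i++) {
        if (used + 1 < len)
            buf[used] = digits[bytes[i] >> 4];
        used++;
        if (used + 1 < len)
            buf[used] = digits[bytes[i] & 0x0f];
        used++;
    }
    if (len)
        buf[(used < len) ? used : (len - 1)] = '\0';
    return used;
}
```

With this shape, a caller can pass a zero-length buffer to learn the required size first, or pass a fixed-size buffer and detect truncation by comparing the return value against the buffer size.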
2015-04-24[build] Add missing "const" qualifiersChristian Hesse1-2/+2
This fixes "initialization discards 'const' qualifier from pointer target type" warnings with GCC 5.1.0. Signed-off-by: Christian Hesse <mail@eworm.de> Modified-by: Michael Brown <mcb30@ipxe.org> Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-04-13[peerdist] Add support for decoding PeerDist Content InformationMichael Brown1-0/+803
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-03-18[netdevice] Add missing bus types to netdev_fetch_bustype()Michael Brown1-0/+4
Reported-by: Robin Smidsrød <robin@smidsrod.no> Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-03-13[tcpip] Fix dubious calculation of min_portMichael Brown1-1/+1
Detected using sparse. Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-03-11[tcp] Implement support for TCP Selective Acknowledgements (SACK)Michael Brown1-4/+158
The TCP Selective Acknowledgement option (specified in RFC2018) provides a mechanism for the receiver to indicate packets that have been received out of order (e.g. due to earlier dropped packets). iPXE often operates in environments in which there is a high probability of packet loss. For example, the legacy USB keyboard emulation in some BIOSes involves polling the USB bus from within a system management interrupt: this introduces an invisible delay of around 500us which is long enough for around 40 full-length packets to be dropped. Similarly, almost all 1Gbps USB2 devices will eventually end up dropping packets because the USB2 bus does not provide enough bandwidth to sustain a 1Gbps stream, and most devices will not provide enough internal buffering to hold a full TCP window's worth of received packets. Add support for sending TCP Selective Acknowledgements. This provides the sender with more detailed information about which packets have been lost, and so allows for a more efficient retransmission strategy. We include a SACK-permitted option in our SYN packet, since experimentation shows that at least Linux peers will not include a SACK-permitted option in the SYN-ACK packet if one was not present in the initial SYN. (RFC2018 does not seem to mandate this behaviour, but it is consistent with the approach taken in RFC1323.) We ignore any received SACK options; this is safe to do since SACK is only ever advisory and we never have to send non-trivial amounts of data. Since our TCP receive queue is a candidate for cache discarding under low memory conditions, we may end up discarding data that has been reported as received via a SACK option. This is permitted by RFC2018. We follow the stricture that SACK blocks must not report data which is no longer held by the receiver: previously-reported blocks are validated against the current receive queue before being included within the current SACK block list. 
Experiments in a qemu VM using forced packet drops (by setting NETDEV_DISCARD_RATE to 32) show that implementing SACK improves throughput by around 400%. Experiments with a USB2 NIC (an SMSC7500) show that implementing SACK improves throughput by around 700%, increasing the download rate from 35Mbps up to 250Mbps (which is approximately the usable bandwidth limit for USB2). Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-03-09[http] Support MD5-sess Digest authenticationMichael Brown1-2/+42
Microsoft IIS supports only MD5-sess for Digest authentication. Requested-by: Andreas Hammarskjöld <junior@2PintSoftware.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-03-09[http] Abstract out HTTP Digest hash algorithm operationsMichael Brown1-28/+56
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-03-05[legal] Relicense files under GPL2_OR_LATER_OR_UBDLMichael Brown1-1/+5
Relicense files with kind permission from Stefan Hajnoczi <stefanha@redhat.com> alongside the contributors who have already granted such relicensing permission. Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-03-05[retry] Colourise debug outputMichael Brown1-10/+10
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-03-05[retry] Rewrite unrelicensable portions of retry.cMichael Brown3-31/+46
Signed-off-by: Michael Brown <mcb30@ipxe.org>