The Embedded New Testament

The "Holy Bible" for embedded engineers



Network Protocols for Embedded Systems

Designing networked embedded devices requires balancing deterministic behavior, small memory footprints, and constrained CPU cycles while speaking interoperable protocols. This guide focuses on practical, production-oriented aspects for IPv4/IPv6, ICMP/ARP/ND, UDP/TCP, and application-layer IoT protocols.


Concept → Why it matters → Minimal example → Try it → Takeaways

Concept: Network protocols in embedded systems are about managing limited resources while maintaining reliable communication. Unlike desktop systems with abundant memory and processing power, embedded devices must carefully balance functionality, performance, and resource constraints.

Why it matters: Network connectivity is essential for modern embedded systems, but traditional network stacks can overwhelm constrained devices. Understanding how to configure lightweight stacks, manage memory pools, and implement efficient protocols is crucial for building reliable networked devices.

Minimal example: A simple UDP echo server that demonstrates memory pool management and zero-copy buffer handling.
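A sketch of that idea with lwIP's raw UDP API (the port number and function names are illustrative): the received pbuf comes from the stack's pre-allocated pool and is echoed back without copying it into an application buffer.

// Minimal UDP echo server on the lwIP raw API
#include "lwip/udp.h"
#include "lwip/pbuf.h"

// Receive callback: echo the payload straight back to the sender
static void udp_echo_recv(void *arg, struct udp_pcb *pcb, struct pbuf *p,
                          const ip_addr_t *addr, u16_t port) {
    LWIP_UNUSED_ARG(arg);
    if (p != NULL) {
        // Reuse the received pool buffer as-is: no copy into a user buffer
        udp_sendto(pcb, p, addr, port);
        pbuf_free(p);
    }
}

// Bind a UDP PCB to the echo port and install the callback
void udp_echo_init(void) {
    struct udp_pcb *pcb = udp_new();
    if (pcb != NULL && udp_bind(pcb, IP_ADDR_ANY, 7) == ERR_OK) {
        udp_recv(pcb, udp_echo_recv, NULL);
    }
}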

Try it: Implement a basic TCP client with connection pooling and observe memory usage patterns under different load conditions.

Takeaways: Network protocols in embedded systems require careful resource management, thoughtful configuration, and understanding of the trade-offs between functionality and resource constraints.


Goals


TCP/IP Stack Overview

Network Stack Models and Embedded Considerations

The seven-layer OSI model provides a theoretical framework, but embedded systems typically implement the four-layer TCP/IP model for practical reasons. This simplification reduces memory overhead and processing complexity while maintaining full network functionality.

Why TCP/IP for Embedded?

Link Layer Choices and Trade-offs

Internet Layer Design Decisions

IPv4 vs IPv6 is a critical choice for embedded systems.

Transport Layer Selection Criteria

Memory Layout Considerations

Understanding memory allocation is crucial for embedded systems where every byte counts. The lwIP stack provides extensive configuration options to balance functionality with memory constraints.

Memory Pool Philosophy

Instead of dynamic allocation, embedded systems use pre-allocated pools of fixed-size blocks, which keeps allocation deterministic and avoids heap fragmentation.

Key Configuration Parameters Explained

// Typical lwIP memory pool configuration
#define MEM_SIZE                    (20*1024)        // 20KB for general memory
#define MEMP_NUM_TCP_PCB          20                 // Max TCP connections
#define MEMP_NUM_TCP_PCB_LISTEN   10                 // Max listening sockets
#define MEMP_NUM_TCP_SEG          32                 // Max TCP segments in flight
#define TCP_MSS                   1460               // Max segment size (Ethernet MTU 1500 - IP 20 - TCP 20)
#define TCP_SND_BUF               (4*TCP_MSS)       // 4 segments send buffer
#define TCP_WND                   (4*TCP_MSS)       // 4 segments receive window
#define PBUF_POOL_SIZE            24                 // Packet buffer pool
#define PBUF_POOL_BUFSIZE         1520               // Fits a full Ethernet frame (1500 MTU + 14B header, rounded for alignment)

Addressing, Name Resolution, and Configuration

Address Assignment Strategies and Their Implications

Static IP Configuration

DHCPv4 Dynamic Assignment

IPv6 SLAAC (Stateless Address Autoconfiguration)
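A minimal sketch of how these three assignment modes are typically enabled with lwIP, assuming a netif that has already been added and brought up. The addresses are illustrative, and in practice you would use either the static or the DHCP path, not both.

// Address assignment with lwIP: static IPv4, DHCPv4, and IPv6 SLAAC
#include "lwip/netif.h"
#include "lwip/dhcp.h"

void configure_addressing(struct netif *netif) {
    // 1. Static IPv4: a fixed address, netmask, and gateway
    ip4_addr_t ip, mask, gw;
    IP4_ADDR(&ip,   192, 168, 1, 10);
    IP4_ADDR(&mask, 255, 255, 255, 0);
    IP4_ADDR(&gw,   192, 168, 1, 1);
    netif_set_addr(netif, &ip, &mask, &gw);

    // 2. DHCPv4: start the client; the leased address replaces the static one
    dhcp_start(netif);

#if LWIP_IPV6 && LWIP_IPV6_AUTOCONFIG
    // 3. IPv6 SLAAC: derive the link-local address from the MAC, then let
    //    router advertisements supply the global prefix
    netif_create_ip6_linklocal_address(netif, 1);
    netif_set_ip6_autoconfig_enabled(netif, 1);
#endif
}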

NAT Traversal Considerations

NAT creates challenges for embedded devices.

ARP and Neighbor Discovery Implementation

Address Resolution Protocol (ARP)

ARP maps IP addresses to MAC addresses in IPv4 networks. Understanding ARP behavior is crucial for diagnosing connectivity problems on the local link.

Neighbor Discovery (ND) in IPv6

IPv6 replaces ARP with ND, which provides address resolution, router discovery, and duplicate address detection over ICMPv6.

ARP Table Management Strategies

Embedded systems need efficient ARP table management:

// Custom ARP table management for embedded systems
#include "lwip/etharp.h"
#include "lwip/timeouts.h"

typedef struct {
    uint32_t ip_addr;
    uint8_t mac_addr[6];
    uint32_t timestamp;
    uint8_t state;  // application-defined: ARP_STATE_EMPTY, ARP_STATE_PENDING, ARP_STATE_STABLE
} arp_entry_t;

#define ARP_TABLE_SIZE 16
static arp_entry_t arp_table[ARP_TABLE_SIZE];

// ARP request with timeout and retry (lwIP's request call is etharp_request();
// ARP_TIMEOUT_MS and arp_retry_timeout are application-defined)
err_t arp_request_with_retry(struct netif *netif, const ip4_addr_t *ipaddr) {
    err_t err = etharp_request(netif, ipaddr);
    if (err == ERR_OK) {
        // Re-check and resend if no reply arrives within the timeout
        sys_timeout(ARP_TIMEOUT_MS, arp_retry_timeout, netif);
    }
    return err;
}

Embedded Network Stacks

lwIP Configuration Deep Dive

Configuration Philosophy

lwIP provides extensive configuration options that must be carefully tuned for embedded systems. The goal is to enable only the features you need while maintaining system stability.

Feature Selection Criteria

Memory Pool Tuning Strategy

Memory pools must be sized to the application's expected connection count, packet rate, and buffer sizes:

// lwIP configuration header (lwipopts.h)
#define LWIP_IPV4                  1
#define LWIP_IPV6                  0  // Disable if not needed
#define LWIP_DNS                   1
#define LWIP_DHCP                  1
#define LWIP_AUTOIP                0  // Disable for production
#define LWIP_NETIF_HOSTNAME        1
#define LWIP_NETCONN               0  // Use raw API for better control
#define LWIP_SOCKET                0  // Disable socket API if using raw
#define LWIP_STATS                 1
#define LWIP_DEBUG                 0  // Enable for development

// Memory pool tuning for specific use case
#define MEMP_NUM_UDP_PCB           8
#define MEMP_NUM_TCP_PCB           4
#define MEMP_NUM_TCP_SEG           16
#define MEMP_NUM_NETBUF            8
#define MEMP_NUM_NETCONN           0
#define MEMP_NUM_TCPIP_MSG_API     8
#define MEMP_NUM_TCPIP_MSG_INPKT   8

// TCP tuning for embedded
#define TCP_TMR_INTERVAL           250  // 250ms timer interval
#define TCP_MSL                    60000UL            // Max segment lifetime in ms (lwIP expects milliseconds)
#define TCP_FIN_WAIT_TIMEOUT      (2*TCP_MSL)
#define TCP_SYNMAXRTX              6
#define TCP_DEFAULT_LISTEN_BACKLOG 1

Threading Model Considerations

lwIP can operate in different threading models: in NO_SYS (mainloop) mode the application polls the driver and the stack timers from a single context, while in OS mode a dedicated tcpip_thread owns the stack and other threads reach it through the netconn/socket APIs or tcpip_callback(). A minimal sketch of both follows.
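Sketch assuming a bare-metal lwIP port; ethernetif_poll() stands in for the port-specific receive handler.

// NO_SYS=1: everything runs from one context, so no locking is needed
#include "lwip/init.h"
#include "lwip/netif.h"
#include "lwip/timeouts.h"

void net_main_loop(struct netif *netif) {
    for (;;) {
        ethernetif_poll(netif);   // port-specific: pass received frames to netif->input
        sys_check_timeouts();     // drive TCP/ARP/DHCP timers
    }
}

// OS mode: tcpip_init() (lwip/tcpip.h) creates the tcpip_thread; application
// threads then use the netconn/socket APIs or tcpip_callback() instead of
// calling the raw API directly.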

Performance Implications

Zero-Copy Buffer Management

Why Zero-Copy Matters

Traditional network stacks copy data multiple times:

  1. Driver receives packet → kernel buffer
  2. Kernel buffer → user buffer
  3. User buffer → application

Each copy consumes CPU cycles and memory bandwidth. Zero-copy eliminates these copies.
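One common zero-copy pattern in lwIP is a PBUF_REF pbuf, which points at application memory instead of copying it into the buffer pool. A sketch follows; the referenced buffer must remain valid until the driver has finished transmitting it.

// Zero-copy transmit: the pbuf references the caller's buffer, no memcpy
#include "lwip/pbuf.h"
#include "lwip/udp.h"

err_t send_record_zero_copy(struct udp_pcb *pcb, const ip_addr_t *dst,
                            u16_t port, uint8_t *record, u16_t len) {
    struct pbuf *p = pbuf_alloc(PBUF_TRANSPORT, len, PBUF_REF);
    if (p == NULL) {
        return ERR_MEM;
    }
    p->payload = record;           // reference application memory directly
    err_t err = udp_sendto(pcb, p, dst, port);
    pbuf_free(p);                  // frees the pbuf header only, not the data
    return err;
}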

DMA and Cache Considerations

Modern MCUs with DMA and cache require careful buffer management:

// DMA-safe buffer allocation with cache maintenance
// (DMA_BUFFER_FLAG_CACHEABLE is an application-defined flag;
//  SCB_CleanInvalidateDCache_by_Addr() is the CMSIS cache call on Cortex-M7)
#include <stdlib.h>
#include <stdint.h>

typedef struct {
    uint8_t *buffer;
    uint32_t size;
    uint32_t flags;
} dma_buffer_t;

dma_buffer_t* allocate_dma_buffer(uint32_t size) {
    dma_buffer_t *buf = malloc(sizeof(dma_buffer_t));
    if (buf) {
        // Align to the cache line size (32 bytes on Cortex-M7);
        // aligned_alloc() requires the size to be a multiple of the alignment
        uint32_t aligned_size = (size + 31u) & ~31u;
        buf->buffer = aligned_alloc(32, aligned_size);
        if (buf->buffer == NULL) {
            free(buf);
            return NULL;
        }
        buf->size = size;
        buf->flags = DMA_BUFFER_FLAG_CACHEABLE;

        // Ensure the CPU and the DMA engine see the same contents
        SCB_CleanInvalidateDCache_by_Addr((uint32_t*)buf->buffer, (int32_t)aligned_size);
    }
    return buf;
}

UDP in Embedded Systems

When to Use UDP

UDP Advantages for Embedded Systems

UDP Challenges and Mitigations

Design Patterns for Reliable UDP

Sequence Numbers and Acknowledgments

Every UDP message should include a sequence number, and messages that need confirmation carry an acknowledgment, so the receiver can detect loss, duplication, and reordering.

Retry Strategies

// Reliable UDP with sequence numbers and ACKs (lwIP raw API)
#include "lwip/udp.h"
#include "lwip/sys.h"
#include <stdlib.h>
#include <string.h>
typedef struct {
    uint32_t seq_num;
    uint32_t ack_num;
    uint16_t length;
    uint16_t checksum;
    uint8_t flags;
    uint8_t data[];
} reliable_udp_header_t;

#define UDP_FLAG_ACK    0x01
#define UDP_FLAG_NACK   0x02
#define UDP_FLAG_RETRY  0x04

typedef struct {
    uint32_t seq_num;
    uint32_t timestamp;
    uint8_t retry_count;
    uint8_t data[UDP_MAX_PAYLOAD];
} udp_packet_t;

// UDP reliability layer: keep a copy for retransmission, then send the
// header plus payload through a pbuf with udp_sendto()
// (UDP_MAX_PAYLOAD and add_to_retry_queue() are application-defined)
err_t udp_send_reliable(struct udp_pcb *pcb, const void *data, u16_t len,
                        const ip_addr_t *addr, u16_t port) {
    static uint32_t seq_counter = 0;

    if (len > UDP_MAX_PAYLOAD) return ERR_ARG;

    // Track the packet until it is acknowledged
    udp_packet_t *packet = malloc(sizeof(udp_packet_t));
    if (packet == NULL) return ERR_MEM;
    packet->seq_num = seq_counter++;
    packet->timestamp = sys_now();
    packet->retry_count = 0;
    memcpy(packet->data, data, len);
    add_to_retry_queue(packet);

    // Build the on-wire message: reliability header followed by the payload
    struct pbuf *p = pbuf_alloc(PBUF_TRANSPORT,
                                (u16_t)(sizeof(reliable_udp_header_t) + len),
                                PBUF_RAM);
    if (p == NULL) return ERR_MEM;

    reliable_udp_header_t *hdr = (reliable_udp_header_t *)p->payload;
    hdr->seq_num = packet->seq_num;
    hdr->ack_num = 0;
    hdr->length = len;
    hdr->checksum = 0;   // the UDP layer already checksums the datagram
    hdr->flags = 0;
    memcpy(hdr->data, data, len);

    err_t err = udp_sendto(pcb, p, addr, port);
    pbuf_free(p);
    return err;
}

UDP Multicast Implementation

Multicast vs Broadcast

IGMP (Internet Group Management Protocol)

IGMP allows hosts to join and leave multicast groups on the local network.

Multicast Address Ranges

// Join a multicast group with IGMP (lwIP: igmp_joingroup() from lwip/igmp.h)
err_t udp_join_multicast_group(const ip4_addr_t *multicast_addr,
                               const ip4_addr_t *netif_addr) {
    // The interface must advertise IGMP support before the group is joined
    struct netif *netif = ip4_route(multicast_addr);
    if (netif != NULL) {
        netif->flags |= NETIF_FLAG_IGMP;
    }

    // Send an IGMP membership report for the group on this interface
    return igmp_joingroup(netif_addr, multicast_addr);
}

TCP in Embedded Systems

TCP Connection Management

Connection Pool Design

Embedded systems often need to manage multiple TCP connections efficiently.
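A common approach is a small, statically allocated pool of connection slots. The sketch below defines the tcp_connection_t helpers that the keepalive example later in this section assumes; the pool size and fields are illustrative.

// Fixed connection pool: avoids malloc/free per connection
#include "lwip/tcp.h"
#include <stdint.h>

#define TCP_POOL_SIZE 4

typedef struct {
    struct tcp_pcb *pcb;
    uint8_t in_use;
} tcp_connection_t;

static tcp_connection_t tcp_pool[TCP_POOL_SIZE];

tcp_connection_t *tcp_connection_acquire(void) {
    for (int i = 0; i < TCP_POOL_SIZE; i++) {
        if (!tcp_pool[i].in_use) {
            tcp_pool[i].in_use = 1;
            return &tcp_pool[i];
        }
    }
    return NULL;   // pool exhausted: caller must back off or reuse a slot
}

void tcp_connection_release(tcp_connection_t *conn) {
    conn->pcb = NULL;
    conn->in_use = 0;
}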

Connection Lifecycle Management

Every TCP connection goes through several states:

  1. CLOSED: No connection exists
  2. LISTEN: Server waiting for connections
  3. SYN_SENT: Client sent connection request
  4. SYN_RECEIVED: Server received connection request
  5. ESTABLISHED: Connection is active
  6. FIN_WAIT_1: Application closed connection
  7. FIN_WAIT_2: Waiting for remote close
  8. CLOSE_WAIT: Remote closed, waiting for application
  9. LAST_ACK: Waiting for final acknowledgment
  10. TIME_WAIT: Connection closed, waiting for cleanup
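For logging and debugging it helps to print these states by name. The sketch below maps lwIP's enum tcp_state (which additionally includes CLOSING for a simultaneous close) to strings; the helper name is illustrative.

// Debug helper: map lwIP's enum tcp_state (tcpbase.h) to readable names
#include "lwip/tcp.h"

static const char *tcp_state_name(enum tcp_state s) {
    static const char *const names[] = {
        "CLOSED", "LISTEN", "SYN_SENT", "SYN_RCVD", "ESTABLISHED",
        "FIN_WAIT_1", "FIN_WAIT_2", "CLOSE_WAIT", "CLOSING",
        "LAST_ACK", "TIME_WAIT"
    };
    return (s <= TIME_WAIT) ? names[s] : "UNKNOWN";
}

// Usage: printf("conn state: %s\n", tcp_state_name(pcb->state));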

Keepalive Configuration

TCP keepalive detects dead connections:

// TCP connection with keepalive and timeout
err_t tcp_connect_with_keepalive(const ip_addr_t *ipaddr, u16_t port) {
    tcp_connection_t *conn = tcp_connection_acquire();
    if (!conn) return ERR_MEM;
    
    conn->pcb = tcp_new();
    if (!conn->pcb) {
        conn->in_use = 0;
        return ERR_MEM;
    }
    
    // Enable keepalive (keep_intvl/keep_cnt require LWIP_TCP_KEEPALIVE)
    ip_set_option(conn->pcb, SOF_KEEPALIVE);
    conn->pcb->keep_idle  = 60000;   // start probing after 60 s of idle
    conn->pcb->keep_intvl = 10000;   // 10 s between probes
    conn->pcb->keep_cnt   = 3;       // drop the connection after 3 unanswered probes
    
    // Set callback functions
    tcp_arg(conn->pcb, conn);
    tcp_recv(conn->pcb, tcp_recv_callback);
    tcp_sent(conn->pcb, tcp_sent_callback);
    tcp_err(conn->pcb, tcp_err_callback);
    
    // Connect
    return tcp_connect(conn->pcb, ipaddr, port, tcp_connected_callback);
}

TCP Flow Control and Window Management

Flow Control Fundamentals

TCP uses a sliding window mechanism for flow control: the receiver advertises how much buffer space it has, and the sender may keep at most that many unacknowledged bytes in flight.

Window Management Strategies

Nagle’s Algorithm and Latency

Nagle’s algorithm reduces network overhead by coalescing small packets into larger segments, at the cost of added latency; latency-sensitive applications usually disable it per connection, as sketched below.
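A minimal lwIP example of turning Nagle off for a latency-critical connection, using the per-PCB macro lwIP provides.

// Disable Nagle for a latency-critical control connection
#include "lwip/tcp.h"

void make_low_latency(struct tcp_pcb *pcb) {
    tcp_nagle_disable(pcb);   // sets TF_NODELAY; small writes are sent immediately
}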

// Custom TCP window management
typedef struct {
    uint16_t advertised_window;
    uint16_t effective_window;
    uint16_t congestion_window;
    uint16_t slow_start_threshold;
    uint8_t dup_ack_count;
} tcp_window_state_t;

void tcp_window_update(struct tcp_pcb *pcb, tcp_window_state_t *state) {
    // tcp_sndbuf() reports free space in the send buffer; the advertised
    // (receive) window is governed by TCP_WND and how quickly the
    // application consumes received data
    uint16_t send_space = (uint16_t)tcp_sndbuf(pcb);
    state->effective_window = send_space;

    // Apply back-pressure: when the send buffer is nearly full, flush what is
    // queued and let the application pause until the sent callback frees space
    if (send_space < TCP_MIN_WINDOW) {   // TCP_MIN_WINDOW: application-defined threshold
        tcp_output(pcb);
    }
}

IoT Application Protocols

MQTT (Message Queuing Telemetry Transport)

MQTT Architecture and Concepts

MQTT is a publish/subscribe messaging protocol designed for constrained devices: clients publish messages to topics on a central broker, and subscribers receive them without publishers and subscribers knowing about each other.

MQTT for Embedded Systems

Advantages

Challenges

Implementation Considerations

// MQTT client state machine
typedef enum {
    MQTT_STATE_DISCONNECTED,
    MQTT_STATE_CONNECTING,
    MQTT_STATE_CONNECTED,
    MQTT_STATE_PUBLISHING,
    MQTT_STATE_SUBSCRIBING
} mqtt_state_t;

typedef struct {
    mqtt_state_t state;
    uint16_t packet_id;
    uint32_t keepalive_interval;
    uint32_t last_activity;
    struct tcp_pcb *tcp_pcb;
    mqtt_message_callback_t message_callback;
} mqtt_client_t;
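A keepalive tick built on the struct above might look like the following sketch; mqtt_send_pingreq() is an assumed helper that writes the two-byte MQTT PINGREQ packet (0xC0 0x00) over the TCP connection.

// Keepalive handling: send PINGREQ when the connection has been idle too long
void mqtt_client_tick(mqtt_client_t *client) {
    if (client->state != MQTT_STATE_CONNECTED) {
        return;
    }
    uint32_t now = sys_now();
    if ((now - client->last_activity) >= client->keepalive_interval) {
        mqtt_send_pingreq(client);      // hypothetical helper, see note above
        client->last_activity = now;
    }
}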

CoAP (Constrained Application Protocol)

CoAP Design Philosophy

CoAP brings HTTP-like semantics to constrained networks: it maps GET/POST/PUT/DELETE methods onto UDP datagrams, with optional reliability through confirmable (CON) messages that are retransmitted until acknowledged.
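Even without a full CoAP stack, the fixed 4-byte header from RFC 7252 is simple enough to serialize directly. A sketch follows; the constants and helper name are illustrative.

// Minimal CoAP fixed header (RFC 7252): version, type, token length,
// code, and a 16-bit message ID, followed by token/options/payload
#include <stdint.h>

#define COAP_VERSION   1
#define COAP_TYPE_CON  0   // confirmable: retransmitted until ACKed
#define COAP_TYPE_NON  1   // non-confirmable
#define COAP_TYPE_ACK  2
#define COAP_TYPE_RST  3
#define COAP_CODE_GET  0x01   // class 0, detail 01

// Serialize the 4-byte fixed header into buf; returns bytes written
static int coap_write_header(uint8_t *buf, uint8_t type, uint8_t token_len,
                             uint8_t code, uint16_t message_id) {
    buf[0] = (uint8_t)((COAP_VERSION << 6) | (type << 4) | (token_len & 0x0F));
    buf[1] = code;
    buf[2] = (uint8_t)(message_id >> 8);   // message ID, network byte order
    buf[3] = (uint8_t)(message_id & 0xFF);
    return 4;
}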

CoAP Features for Embedded

CoAP vs HTTP Trade-offs


Performance Tuning and Optimization

Memory Pool Optimization

Pool Sizing Strategy

Memory pools must be sized to the worst-case number of buffers the application keeps in flight at once.

Fragmentation Prevention

// Custom memory pool for network buffers
#include <stdlib.h>
#include <stdint.h>
typedef struct {
    uint8_t *pool_start;
    uint8_t *pool_end;
    uint32_t pool_size;
    uint32_t used_blocks;
    uint32_t total_blocks;
    uint32_t block_size;
    uint8_t *free_list;
} network_pool_t;

network_pool_t* create_network_pool(uint32_t block_size, uint32_t num_blocks) {
    network_pool_t *pool = malloc(sizeof(network_pool_t));
    if (pool) {
        pool->block_size = block_size;
        pool->total_blocks = num_blocks;
        pool->pool_size = block_size * num_blocks;

        // Allocate cache-line aligned memory (size must be a multiple of 32)
        pool->pool_size = (pool->pool_size + 31u) & ~31u;
        pool->pool_start = aligned_alloc(32, pool->pool_size);
        if (pool->pool_start == NULL) {
            free(pool);
            return NULL;
        }
        pool->pool_end = pool->pool_start + pool->pool_size;

        // Initialize the free list: the first word of each free block stores a
        // pointer to the next free block (block_size must be >= sizeof(uintptr_t))
        pool->free_list = pool->pool_start;
        for (uint32_t i = 0; i < num_blocks - 1; i++) {
            *(uintptr_t*)(pool->pool_start + i * block_size) =
                (uintptr_t)(pool->pool_start + (i + 1) * block_size);
        }
        *(uintptr_t*)(pool->pool_start + (num_blocks - 1) * block_size) = 0;

        pool->used_blocks = 0;
    }
    return pool;
}

Interrupt Coalescing Configuration

Interrupt Coalescing Theory

Interrupt coalescing reduces CPU overhead by batching interrupts: instead of raising one interrupt per received frame, the MAC interrupts after N frames have arrived or after a timeout, whichever comes first.

Configuration Guidelines

// Ethernet interrupt coalescing setup. The register macros below are
// device-specific placeholders: consult the MAC's reference manual for the
// actual RX interrupt watchdog/coalescing registers.
typedef struct {
    uint32_t rx_coal_pkt;    // frames received before an RX interrupt is raised
    uint32_t rx_coal_time;   // maximum delay (ms) before an RX interrupt is raised
    uint32_t tx_coal_pkt;
    uint32_t tx_coal_time;
} eth_coal_config_t;

err_t eth_set_interrupt_coalescing(const eth_coal_config_t *config) {
    // Convert the time threshold to the MAC's tick units
    // (here 64 ns ticks: 1 ms = 1,000,000 ns / 64 = 15625 ticks)
    uint32_t time_ticks = config->rx_coal_time * 15625u;

    // Program the packet-count and timer thresholds (placeholder macros
    // standing in for the vendor's register interface)
    ETH_SET_RX_COALESCE_FRAMES(config->rx_coal_pkt);
    ETH_SET_RX_COALESCE_TIMER(time_ticks);

    // TX coalescing, where supported, is configured the same way
    (void)config->tx_coal_pkt;
    (void)config->tx_coal_time;
    return ERR_OK;
}

Diagnostics and Troubleshooting

Network Statistics Collection

What to Measure

Statistics Collection Strategy

// Comprehensive network statistics
typedef struct {
    // Interface statistics
    uint32_t rx_packets;
    uint32_t tx_packets;
    uint32_t rx_bytes;
    uint32_t tx_bytes;
    uint32_t rx_errors;
    uint32_t tx_errors;
    uint32_t rx_dropped;
    uint32_t tx_dropped;
    
    // TCP statistics
    uint32_t tcp_connections;
    uint32_t tcp_retransmissions;
    uint32_t tcp_timeouts;
    uint32_t tcp_keepalive_probes;
    
    // UDP statistics
    uint32_t udp_packets_sent;
    uint32_t udp_packets_received;
    uint32_t udp_checksum_errors;
    
    // Memory statistics
    uint32_t mem_allocated;
    uint32_t mem_peak;
    uint32_t mem_fragments;
} network_stats_t;
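When LWIP_STATS is enabled, lwIP maintains its own counters in the global lwip_stats structure; a snapshot routine can map a subset of them into the structure above. Byte counts and TCP retransmissions are not part of lwIP's protocol stats and are typically tracked by the driver and application callbacks instead.

// Snapshot a subset of lwIP's built-in counters into network_stats_t
// (assumes LWIP_STATS with the default LINK/UDP/MEM sub-options enabled)
#include "lwip/stats.h"

void network_stats_snapshot(network_stats_t *s) {
#if LWIP_STATS
    s->rx_packets           = lwip_stats.link.recv;
    s->tx_packets           = lwip_stats.link.xmit;
    s->rx_dropped           = lwip_stats.link.drop;
    s->rx_errors            = lwip_stats.link.err;

    s->udp_packets_sent     = lwip_stats.udp.xmit;
    s->udp_packets_received = lwip_stats.udp.recv;
    s->udp_checksum_errors  = lwip_stats.udp.chkerr;

    s->mem_allocated        = lwip_stats.mem.used;
    s->mem_peak             = lwip_stats.mem.max;
#endif
    // Byte counters and TCP retransmissions: update from the driver and
    // from the tcp_sent()/tcp_err() callbacks.
}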

Advanced Packet Capture

Capture Strategy

Analysis Techniques


Production Readiness and Deployment

Health Monitoring and Watchdog

Monitoring Strategy

Recovery Mechanisms

// Network health monitoring
typedef struct {
    uint32_t last_heartbeat;
    uint32_t heartbeat_interval;
    uint32_t missed_heartbeats;
    uint8_t healthy;
} network_health_t;

void network_health_check(network_health_t *health) {
    uint32_t current_time = sys_now();
    
    if (current_time - health->last_heartbeat > health->heartbeat_interval) {
        health->missed_heartbeats++;
        health->last_heartbeat = current_time;   // count each missed interval once
        
        if (health->missed_heartbeats > MAX_MISSED_HEARTBEATS) {
            health->healthy = 0;
            // Trigger network recovery and start counting afresh
            network_recovery_procedure();
            health->missed_heartbeats = 0;
        }
    }
}

Configuration Management

Configuration Philosophy

Configuration Validation



Guided Labs

Lab 1: Memory Pool Analysis

Objective: Understand how memory pools affect network performance and stability.

Setup: Configure lwIP with different memory pool sizes and observe behavior under load.

Steps:

  1. Start with minimal memory pools (MEMP_NUM_TCP_PCB = 2, PBUF_POOL_SIZE = 8)
  2. Run a TCP stress test with 10 concurrent connections
  3. Monitor memory usage and connection failures
  4. Gradually increase pool sizes until stable operation
  5. Document the minimum viable configuration

Expected Outcome: Understanding of the relationship between memory allocation and network stability.

Lab 2: TCP Connection Pooling

Objective: Implement and test connection pooling for improved performance.

Setup: Create a TCP client that maintains a pool of pre-allocated connections.

Steps:

  1. Implement a connection pool with configurable size
  2. Add connection health checking and automatic reconnection
  3. Test with varying connection counts and failure scenarios
  4. Measure connection establishment time with and without pooling
  5. Analyze memory usage patterns

Expected Outcome: Reduced connection overhead and improved reliability.

Lab 3: Network Performance Profiling

Objective: Profile network performance and identify bottlenecks.

Setup: Implement comprehensive network statistics collection and analysis.

Steps:

  1. Add statistics collection to key network operations
  2. Implement performance monitoring with configurable thresholds
  3. Create a dashboard for real-time network health monitoring
  4. Test under various network conditions (high latency, packet loss)
  5. Analyze performance patterns and optimize accordingly

Expected Outcome: Data-driven network optimization and proactive problem detection.


Check Yourself

Understanding Check

Application Check

Analysis Check


Further Reading

Industry Standards