The "Holy Bible" for embedded engineers
Understanding memory layout, segmentation, and access patterns for efficient embedded programming
Know which data ends up in Flash vs RAM and what the startup code must zero or copy. Use the map file to make footprint visible and deliberate.
.data increases Flash (init image) and RAM (runtime) and costs boot copy time.
.bss increases RAM and costs boot zeroing time.
const moves data to ROM (.rodata), reducing RAM.
Build with a map file and inspect .data and .bss.
Mark a table static const and observe .rodata vs .data changes.
Use static const for lookup tables.
Audit .data and .bss and reduce them.
What causes .data to consume both Flash and RAM?
How does const placement differ between hosted vs freestanding targets?
See Embedded_C/C_Language_Fundamentals.md for storage duration.
See Embedded_C/Structure_Alignment.md for layout.
Understanding memory models is crucial for embedded systems programming. Memory layout, segmentation, and access patterns directly impact performance, reliability, and security of embedded applications.
Memory models define how memory is organized, accessed, and managed in embedded systems. They specify the layout of different memory regions, how data is stored and retrieved, and how memory operations are synchronized across different components of the system.
Memory Organization:
Memory Access Patterns:
Memory Management:
Flat Memory Model:
Segmented Memory Model:
Paged Memory Model:
Performance Optimization:
Reliability and Safety:
Resource Constraints:
Performance Impact:
// Poor memory access pattern - cache misses
void poor_memory_access(uint32_t* data, size_t size) {
for (size_t i = 0; i < size; i += 16) {
// Accessing every 16th element - poor cache utilization
data[i] = process_value(data[i]);
}
}
// Good memory access pattern - cache-friendly
void good_memory_access(uint32_t* data, size_t size) {
for (size_t i = 0; i < size; i++) {
// Sequential access - good cache utilization
data[i] = process_value(data[i]);
}
}
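The same effect shows up when a two-dimensional table is stored as a flat, row-major array. The sketch below is only an illustration: the function names and the flat-array layout are assumptions, not part of the original material. Both functions compute the same sum, but only the first walks memory sequentially. On microcontrollers without a data cache the difference may be small.

```c
#include <stddef.h>
#include <stdint.h>

// Row-major traversal: the inner loop touches adjacent addresses.
uint32_t sum_row_major(const uint32_t* m, size_t rows, size_t cols) {
    uint32_t sum = 0;
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            sum += m[r * cols + c];
        }
    }
    return sum;
}

// Column-major traversal of the same data: each inner-loop step
// jumps `cols` elements ahead, which is a strided access pattern.
uint32_t sum_column_major(const uint32_t* m, size_t rows, size_t cols) {
    uint32_t sum = 0;
    for (size_t c = 0; c < cols; c++) {
        for (size_t r = 0; r < rows; r++) {
            sum += m[r * cols + c];
        }
    }
    return sum;
}
```

Measuring both variants on the actual target is the only reliable way to tell whether the reordering pays off.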
Memory Layout Impact:
// Poor memory layout - padding wasted between fields
typedef struct {
    uint8_t small_field;    // 1 byte, followed by 3 bytes of padding
    uint32_t large_field;   // 4 bytes
    uint8_t another_small;  // 1 byte, followed by 3 bytes of tail padding
} poor_layout_t;            // 12 bytes total
// Good memory layout - largest field first
typedef struct {
    uint32_t large_field;   // 4 bytes
    uint8_t small_field;    // 1 byte
    uint8_t another_small;  // 1 byte, followed by 2 bytes of tail padding
} good_layout_t;            // 8 bytes total
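The sizes in the comments above can be checked by the compiler instead of by hand. A minimal sketch, assuming an ABI where uint32_t is 4-byte aligned (true for typical ARM and x86 targets): the two structs are repeated so the snippet compiles on its own, and the helper `layout_savings` is a hypothetical name added for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t small_field;
    uint32_t large_field;
    uint8_t another_small;
} poor_layout_t;

typedef struct {
    uint32_t large_field;
    uint8_t small_field;
    uint8_t another_small;
} good_layout_t;

// Compile-time checks: the build fails if the layout assumptions are wrong.
_Static_assert(offsetof(poor_layout_t, large_field) == 4,
               "expected 3 padding bytes before large_field");
_Static_assert(sizeof(poor_layout_t) == 12, "expected 12-byte poor layout");
_Static_assert(sizeof(good_layout_t) == 8, "expected 8-byte good layout");

// Bytes saved per element by reordering the fields.
size_t layout_savings(void) {
    return sizeof(poor_layout_t) - sizeof(good_layout_t);
}
```

Keeping such assertions next to structs that are stored in large arrays or sent over a wire makes accidental layout changes visible at build time.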
Stack Management Impact:
// Poor stack usage - potential overflow
void poor_stack_usage(void) {
uint8_t large_buffer[8192]; // 8KB on stack
// Process large buffer...
// May cause stack overflow
}
// Good stack usage - heap for large data
void good_stack_usage(void) {
uint8_t* large_buffer = malloc(8192); // Heap allocation
if (large_buffer != NULL) {
// Process large buffer...
free(large_buffer);
}
}
High Impact Scenarios:
Low Impact Scenarios:
Address Space Organization:
Memory Segmentation:
Memory Protection:
Sequential Access:
Random Access:
Strided Access:
Cache Levels:
Memory Types:
Memory layout refers to how different memory regions are organized in the address space. It defines where code, data, stack, and heap are located and how they relate to each other.
Address Space Organization:
Memory Region Types:
/*
High Address
┌─────────────────┐
│ Stack │ ← Grows downward
│ │
├─────────────────┤
│ Heap │ ← Grows upward
│ │
├─────────────────┤
│ .bss │ ← Uninitialized data
├─────────────────┤
│ .data │ ← Initialized data
├─────────────────┤
│ .text │ ← Code
└─────────────────┘
Low Address
*/
// Typical memory address ranges for an STM32-class ARM Cortex-M device
#define FLASH_BASE 0x08000000 // Code memory
#define SRAM_BASE 0x20000000 // Data memory
#define PERIPH_BASE 0x40000000 // Peripheral registers
// Memory sizes
#define FLASH_SIZE (512 * 1024) // 512KB
#define SRAM_SIZE (64 * 1024) // 64KB
#define STACK_SIZE (8 * 1024) // 8KB stack
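Address ranges like these can be used to sanity-check a pointer before it is handed to a driver or DMA engine. The following is a sketch only: it assumes the 512KB Flash and 64KB SRAM sizes shown above, and `region_t`, `in_range`, and `classify_address` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLASH_BASE 0x08000000u
#define FLASH_SIZE (512u * 1024u)
#define SRAM_BASE  0x20000000u
#define SRAM_SIZE  (64u * 1024u)

typedef enum { REGION_FLASH, REGION_SRAM, REGION_OTHER } region_t;

// True if addr lies in [base, base + size). The subtraction form
// avoids overflow when base + size would wrap around.
static bool in_range(uint32_t addr, uint32_t base, uint32_t size) {
    return addr >= base && (addr - base) < size;
}

// Classify a raw address against the assumed memory map.
region_t classify_address(uint32_t addr) {
    if (in_range(addr, FLASH_BASE, FLASH_SIZE)) {
        return REGION_FLASH;
    }
    if (in_range(addr, SRAM_BASE, SRAM_SIZE)) {
        return REGION_SRAM;
    }
    return REGION_OTHER;
}
```

A DMA setup routine could, for example, reject any buffer for which `classify_address` does not return `REGION_SRAM`.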
Memory segments are logical divisions of memory that serve different purposes. They help organize memory efficiently and provide different access patterns and protection levels.
Segment Organization:
Segment Characteristics:
// Code segment - contains executable instructions
void function_in_text(void) {
// This function is stored in .text segment
uint32_t local_var = 42;
// Function code here...
}
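// Constants are placed in .rodata, which is usually kept in Flash alongside .text
const uint8_t lookup_table[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
// Function pointers
typedef void (*callback_t)(void);
void function1(void); // Handlers assumed to be defined elsewhere
void function2(void);
void function3(void);
const callback_t callbacks[] = {function1, function2, function3};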
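// Initialized global variables (.data: initial values stored in Flash, copied to RAM at boot)
uint32_t global_counter = 0; // Explicit zero initializers are typically placed in .bss instead
uint8_t sensor_data[64] = {0xAA, 0xBB, 0xCC, 0xDD};
const char* const_string = "Hello World"; // The pointer lives in .data; the string literal lives in .rodata
// Initialized static variables
static uint16_t static_var = 0x1234;
// Initialized arrays
uint8_t buffer[1024] = {0}; // All-zero initializer: typically placed in .bss, not .data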
// Uninitialized global variables (zeroed by startup code)
uint32_t uninitialized_var;
uint8_t large_buffer[8192];
static uint16_t static_uninit;
// These variables are automatically zeroed
// No space is used in the binary file
// Stack variables
void stack_example(void) {
int local_var = 42; // Stack allocated
uint8_t buffer[256]; // Stack array
struct sensor_data data; // Stack structure
// Stack grows downward
// Variables are automatically freed when function returns
}
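// Stack overflow detection
// STACK_TOP is assumed to be the initial stack pointer (highest stack address),
// here the end of SRAM. Adjust it to match the linker script.
#define STACK_TOP (SRAM_BASE + SRAM_SIZE)
void check_stack_usage(void) {
    uint8_t* stack_ptr;
    asm volatile ("mov %0, sp" : "=r" (stack_ptr));
    // The stack grows downward, so usage is measured from the top
    uint32_t stack_used = STACK_TOP - (uint32_t)stack_ptr;
    if (stack_used > STACK_SIZE - 1024) {
        // Stack nearly full - take action
    }
}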
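Reading the stack pointer only shows usage at the moment of the check. A common complement is stack painting: fill the stack region with a known pattern at boot and later count how many words were never overwritten. The sketch below simulates this on an ordinary array so it can run on a host. The pattern value and the function names are assumptions, and on a real target only the unused part of the stack below the current stack pointer may be painted.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xDEADBEEFu  // Arbitrary fill pattern (assumption)

// Fill the stack region with the paint pattern.
// stack_bottom is the lowest address of the region.
void stack_paint(uint32_t* stack_bottom, size_t words) {
    for (size_t i = 0; i < words; i++) {
        stack_bottom[i] = STACK_PAINT;
    }
}

// Count untouched words starting from the bottom. Because the stack
// grows downward, the first overwritten word marks the high-water mark.
size_t stack_unused_words(const uint32_t* stack_bottom, size_t words) {
    size_t unused = 0;
    while (unused < words && stack_bottom[unused] == STACK_PAINT) {
        unused++;
    }
    return unused;
}
```

The result is a worst-case figure since boot, which makes it useful for sizing the stack during testing.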
// Dynamic memory allocation
void heap_example(void) {
uint8_t* buffer = malloc(1024);
if (buffer != NULL) {
// Use buffer...
free(buffer);
}
}
// Heap fragmentation monitoring
typedef struct {
size_t total_blocks;
size_t free_blocks;
size_t largest_free_block;
} heap_stats_t;
heap_stats_t get_heap_stats(void) {
heap_stats_t stats = {0};
// Implementation depends on malloc implementation
return stats;
}
Linker scripts define how the linker organizes memory and creates the final executable. They specify memory layout, section placement, and symbol definitions.
Memory Definition:
Section Placement:
/* STM32F4 Linker Script */
MEMORY
{
FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 512K
SRAM (rwx) : ORIGIN = 0x20000000, LENGTH = 64K
}
SECTIONS
{
/* Code section */
.text : {
*(.text)
*(.text*)
*(.rodata)
*(.rodata*)
} > FLASH
/* Initialized data */
.data : {
_sdata = .;
*(.data)
*(.data*)
_edata = .;
} > SRAM AT> FLASH
/* Uninitialized data */
.bss : {
_sbss = .;
*(.bss)
*(.bss*)
*(COMMON)
_ebss = .;
} > SRAM
}
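The symbols defined in the script (_sdata, _edata, _sbss, _ebss) exist so that startup code can copy the .data image from Flash to SRAM and zero .bss before main runs. A minimal sketch of that step, written as a plain function so it can be exercised on a host: the function name is hypothetical, the sections are assumed to be word-aligned, and the load-address symbol `_sidata` mentioned in the comment is an assumption (the script above does not define it; GNU ld can provide it with `LOADADDR(.data)`).

```c
#include <stdint.h>

// Copy initialized data from its load address and clear .bss.
// On a target this would typically be called from the reset handler as:
//   init_data_and_bss(&_sidata, &_sdata, &_edata, &_sbss, &_ebss);
// where _sidata is an assumed symbol for the Flash copy of .data.
void init_data_and_bss(const uint32_t* load_src,
                       uint32_t* data_start, uint32_t* data_end,
                       uint32_t* bss_start, uint32_t* bss_end) {
    // .data: copy initial values from Flash image to RAM
    uint32_t* dst = data_start;
    while (dst < data_end) {
        *dst++ = *load_src++;
    }
    // .bss: zero-fill
    dst = bss_start;
    while (dst < bss_end) {
        *dst++ = 0u;
    }
}
```

The copy loop is why every byte of .data costs both Flash and RAM, while .bss costs only RAM plus zeroing time.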
// Custom section for critical data
__attribute__((section(".critical_data")))
uint8_t critical_buffer[256];
// Custom section for fast code
__attribute__((section(".fast_code")))
void fast_function(void) {
// Fast code implementation
}
Memory protection prevents unauthorized access to memory regions. It ensures that programs can only access memory they are supposed to access.
Protection Mechanisms:
Protection Levels:
// Memory Protection Unit configuration
typedef struct {
uint32_t region_number;
uint32_t base_address;
uint32_t size;
uint32_t access_permissions;
uint32_t attributes;
} mpu_region_t;
void configure_mpu(void) {
    // Configure MPU regions
    mpu_region_t regions[] = {
        {0, 0x20000000, 0x1000, 0x03, 0x00},  // SRAM region
        {1, 0x08000000, 0x80000, 0x05, 0x00}, // Flash region
    };
    // Apply MPU configuration
    for (size_t i = 0; i < sizeof(regions)/sizeof(regions[0]); i++) {
        configure_mpu_region(&regions[i]);
    }
}
// Memory access control functions
void protect_memory_region(uint32_t start, uint32_t end, uint32_t permissions) {
    // Configure memory protection for region
    mpu_region_t region = {
        .base_address = start,
        .size = end - start,
        .access_permissions = permissions
    };
    configure_mpu_region(&region);
}
// Usage
protect_memory_region(0x20000000, 0x20001000, MPU_READ_WRITE);
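Not every start and size pair is a legal MPU region. On ARMv7-M cores (Cortex-M3/M4/M7) a region size must be a power of two of at least 32 bytes, the base must be aligned to the size, and the size is encoded as log2(size) - 1. The helpers below sketch that check under those assumptions. The function names are hypothetical, and ARMv8-M parts use a different base/limit scheme.

```c
#include <stdbool.h>
#include <stdint.h>

// Return the ARMv7-M size field (log2(size) - 1) for a region of
// `size` bytes, or -1 if the size is not a power of two >= 32.
// Sizes up to 2 GB are representable in this sketch.
int mpu_size_field(uint32_t size) {
    if (size < 32u || (size & (size - 1u)) != 0u) {
        return -1;
    }
    int exponent = 0;
    while ((size >> exponent) != 1u) {
        exponent++;
    }
    return exponent - 1;
}

// A region is valid if its size is encodable and its base is size-aligned.
bool mpu_region_is_valid(uint32_t base, uint32_t size) {
    return mpu_size_field(size) >= 0 && (base & (size - 1u)) == 0u;
}
```

Checking regions this way before programming the MPU catches configuration mistakes that would otherwise protect the wrong range.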
Cache behavior refers to how the CPU cache interacts with memory. Understanding cache behavior is crucial for optimizing memory access patterns.
Cache Organization:
Cache Operations:
// Cache-friendly array access
void cache_friendly_access(uint32_t* data, size_t size) {
// Sequential access - good for cache
for (size_t i = 0; i < size; i++) {
data[i] = process_value(data[i]);
}
}
// Cache-unfriendly access pattern
void cache_unfriendly_access(uint32_t* data, size_t size) {
// Strided access - poor for cache
for (size_t i = 0; i < size; i += 16) {
data[i] = process_value(data[i]);
}
}
// Cache line aligned data structure
typedef struct {
uint32_t data[16]; // 64 bytes - cache line size
} __attribute__((aligned(64))) cache_aligned_t;
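// Cache line aligned allocation
// Note: posix_memalign is a POSIX call; bare-metal C libraries may not provide it
void* allocate_cache_aligned(size_t size) {
    void* ptr = NULL;
    if (posix_memalign(&ptr, 64, size) != 0) { // 64-byte alignment
        return NULL; // Allocation failed
    }
    return ptr;
}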
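Where no heap or no POSIX allocator is available, a statically allocated buffer can be aligned at compile time instead. A small sketch using C11 alignas: the 64-byte line size follows the example above but is device-specific, and `dma_buffer`, `get_dma_buffer`, and `is_cache_line_aligned` are hypothetical names.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u  // Assumption: check the core's documentation

// Statically allocated buffer that starts on a cache-line boundary and
// spans a whole number of lines, so it never shares a line with other data.
static alignas(CACHE_LINE_SIZE) uint8_t dma_buffer[4 * CACHE_LINE_SIZE];

uint8_t* get_dma_buffer(void) {
    return dma_buffer;
}

// True if the pointer is aligned to the assumed cache line size.
bool is_cache_line_aligned(const void* p) {
    return ((uintptr_t)p % CACHE_LINE_SIZE) == 0u;
}
```

Sizing the buffer in whole cache lines matters as much as aligning it, because cache maintenance operations act on entire lines.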
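Memory ordering refers to the order in which memory operations become visible to other observers. It matters on multi-core systems and whenever interrupt handlers, DMA, or concurrent tasks share data, because both the compiler and the CPU may reorder accesses.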
Memory Ordering Types:
Memory Barrier Types:
// Memory barrier functions
void full_memory_barrier(void) {
    __asm volatile (
        "dmb 0xF\n" // Option 0xF = SY: full-system barrier for loads and stores
        : : : "memory"
    );
}
void data_memory_barrier(void) {
    __asm volatile (
        "dmb 0xE\n" // Option 0xE = ST: store-only barrier on cores that support it
        : : : "memory"
    );
}
void instruction_barrier(void) {
    __asm volatile (
        "isb 0xF\n" // Instruction synchronization barrier (option 0xF = SY)
        : : : "memory"
    );
}
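// Atomic operations with memory ordering
// Returns the new value stored at *ptr
uint32_t atomic_add(uint32_t* ptr, uint32_t value) {
    uint32_t result;
    uint32_t status;
    __asm volatile (
        "1: ldrex %0, [%2]\n"     // Load exclusive
        "   add   %0, %0, %3\n"
        "   strex %1, %0, [%2]\n" // status is 0 if the store succeeded
        "   cmp   %1, #0\n"
        "   bne   1b\n"           // Retry if exclusivity was lost
        : "=&r" (result), "=&r" (status)
        : "r" (ptr), "r" (value)
        : "cc", "memory"
    );
    return result;
}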
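Where the toolchain supports it, C11 <stdatomic.h> expresses the same ideas portably and lets the compiler pick the instructions. The sketch below shows a relaxed event counter and a release/acquire handoff of a payload. All names are hypothetical, and whether these operations are lock-free depends on the core and toolchain.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static atomic_uint_fast32_t event_count;  // Static storage: starts at zero
static uint32_t shared_payload;
static atomic_bool payload_ready;

// Atomically increment the counter and return the new value.
// Relaxed ordering is enough because no other data depends on it.
uint32_t count_event(void) {
    return (uint32_t)atomic_fetch_add_explicit(&event_count, 1u,
                                               memory_order_relaxed) + 1u;
}

// Producer: write the payload, then publish it with release ordering so
// the payload write cannot be reordered after the flag store.
void publish(uint32_t value) {
    shared_payload = value;
    atomic_store_explicit(&payload_ready, true, memory_order_release);
}

// Consumer: the acquire load pairs with the release store above, so a
// true flag guarantees the payload write is visible.
bool try_consume(uint32_t* out) {
    if (!atomic_load_explicit(&payload_ready, memory_order_acquire)) {
        return false;
    }
    *out = shared_payload;
    return true;
}
```

A typical use would have an interrupt handler call `publish` and the main loop poll `try_consume`.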
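#include <stdint.h>
#include <stdbool.h>
#include <stddef.h> // size_t
#include <stdlib.h> // malloc, free, posix_memalign (POSIX targets only)
// Platform-specific hooks, assumed to be implemented elsewhere
void configure_mpu_region(uint32_t base_address, uint32_t size, uint32_t permissions);
void handle_stack_overflow(void);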
// Memory layout definitions
#define FLASH_BASE 0x08000000
#define SRAM_BASE 0x20000000
#define STACK_SIZE (8 * 1024)
#define HEAP_SIZE (16 * 1024)
// Memory protection definitions
#define MPU_READ_WRITE 0x03
#define MPU_READ_ONLY 0x05
#define MPU_NO_ACCESS 0x00
// Memory region structure
typedef struct {
uint32_t start_address;
uint32_t end_address;
uint32_t permissions;
const char* name;
} memory_region_t;
// Memory regions
static const memory_region_t memory_regions[] = {
{FLASH_BASE, FLASH_BASE + 512*1024, MPU_READ_ONLY, "Flash"},
{SRAM_BASE, SRAM_BASE + 64*1024, MPU_READ_WRITE, "SRAM"},
{0x40000000, 0x40000000 + 1024*1024, MPU_READ_WRITE, "Peripherals"},
};
// Memory protection functions
void configure_memory_protection(void) {
// Configure MPU for memory regions
for (size_t i = 0; i < sizeof(memory_regions)/sizeof(memory_regions[0]); i++) {
const memory_region_t* region = &memory_regions[i];
configure_mpu_region(region->start_address,
region->end_address - region->start_address,
region->permissions);
}
}
// Stack monitoring
typedef struct {
uint32_t stack_base;
uint32_t stack_size;
uint32_t current_usage;
} stack_monitor_t;
static stack_monitor_t stack_monitor = {
.stack_base = SRAM_BASE + 64*1024 - STACK_SIZE,
.stack_size = STACK_SIZE
};
void update_stack_usage(void) {
uint32_t current_sp;
__asm volatile ("mov %0, sp" : "=r" (current_sp));
stack_monitor.current_usage =
stack_monitor.stack_base + stack_monitor.stack_size - current_sp;
// Check for stack overflow
if (stack_monitor.current_usage > stack_monitor.stack_size - 1024) {
// Stack nearly full - take action
handle_stack_overflow();
}
}
// Heap monitoring
typedef struct {
size_t total_allocated;
size_t total_freed;
size_t current_usage;
size_t peak_usage;
} heap_monitor_t;
static heap_monitor_t heap_monitor = {0};
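// Header stored in front of each block so monitored_free() knows its size.
// The extra union members keep the user pointer suitably aligned.
typedef union {
    size_t size;
    long long align_ll;
    double align_d;
    void* align_p;
} alloc_header_t;
void* monitored_malloc(size_t size) {
    if (size > SIZE_MAX - sizeof(alloc_header_t)) {
        return NULL; // Request too large to add a header
    }
    alloc_header_t* header = malloc(sizeof(alloc_header_t) + size);
    if (header == NULL) {
        return NULL;
    }
    header->size = size;
    heap_monitor.total_allocated += size;
    heap_monitor.current_usage += size;
    if (heap_monitor.current_usage > heap_monitor.peak_usage) {
        heap_monitor.peak_usage = heap_monitor.current_usage;
    }
    return header + 1; // User data starts right after the header
}
void monitored_free(void* ptr) {
    if (ptr != NULL) {
        alloc_header_t* header = (alloc_header_t*)ptr - 1;
        heap_monitor.total_freed += header->size;
        heap_monitor.current_usage -= header->size;
        free(header);
    }
}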
// Cache optimization functions
void* allocate_cache_aligned(size_t size) {
void* ptr;
if (posix_memalign(&ptr, 64, size) != 0) {
return NULL;
}
return ptr;
}
void cache_friendly_copy(uint8_t* dest, const uint8_t* src, size_t size) {
// Copy data in cache-friendly manner
for (size_t i = 0; i < size; i++) {
dest[i] = src[i];
}
}
// Memory barrier functions
void full_memory_barrier(void) {
__asm volatile (
"dmb 0xF\n"
: : : "memory"
);
}
void data_memory_barrier(void) {
__asm volatile (
"dmb 0xE\n"
: : : "memory"
);
}
// Main function
int main(void) {
// Configure memory protection
configure_memory_protection();
// Monitor stack usage
update_stack_usage();
// Use monitored memory allocation
uint8_t* buffer = monitored_malloc(1024);
if (buffer != NULL) {
// Use buffer
monitored_free(buffer);
}
// Use cache-aligned allocation
uint8_t* cache_buffer = allocate_cache_aligned(1024);
if (cache_buffer != NULL) {
// Use cache-aligned buffer
free(cache_buffer);
}
return 0;
}
Problem: Stack grows beyond allocated space
Solution: Monitor stack usage and allocate sufficient stack space
// ❌ Bad: Large stack allocation
void bad_stack_usage(void) {
uint8_t large_buffer[8192]; // 8KB on stack
// May cause stack overflow
}
// ✅ Good: Heap allocation for large data
void good_stack_usage(void) {
uint8_t* large_buffer = malloc(8192);
if (large_buffer != NULL) {
// Use buffer
free(large_buffer);
}
}
Problem: Memory becomes fragmented over time
Solution: Use memory pools and avoid frequent allocation/deallocation
// ❌ Bad: Frequent allocation/deallocation
void bad_memory_usage(void) {
for (int i = 0; i < 1000; i++) {
void* ptr = malloc(100);
// Use ptr
free(ptr);
}
}
// ✅ Good: Reuse allocated memory
void good_memory_usage(void) {
    void* ptr = malloc(100);
    if (ptr == NULL) {
        return; // Allocation failed
    }
    for (int i = 0; i < 1000; i++) {
        // Reuse ptr
    }
    free(ptr);
}
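A fixed-block memory pool is one way to get the "memory pools" part of the solution: all blocks have the same size, so the pool cannot fragment and allocation takes constant time. This is a minimal sketch rather than a production allocator. The block size, block count, and all names are assumptions, and it is not safe to call from interrupts without additional locking.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCK_SIZE  100u
#define POOL_BLOCK_COUNT 8u

// A free block stores the link to the next free block in its own memory.
typedef union pool_block {
    union pool_block* next;            // Valid only while the block is free
    uint8_t payload[POOL_BLOCK_SIZE];  // User data while allocated
} pool_block_t;

static pool_block_t pool_storage[POOL_BLOCK_COUNT];
static pool_block_t* pool_free_list;

// Thread every block onto the free list. Call once before use.
void pool_init(void) {
    pool_free_list = NULL;
    for (size_t i = 0; i < POOL_BLOCK_COUNT; i++) {
        pool_storage[i].next = pool_free_list;
        pool_free_list = &pool_storage[i];
    }
}

// Pop a block from the free list, or return NULL if the pool is empty.
void* pool_alloc(void) {
    pool_block_t* block = pool_free_list;
    if (block != NULL) {
        pool_free_list = block->next;
    }
    return block;
}

// Push a block back onto the free list.
void pool_free(void* ptr) {
    if (ptr != NULL) {
        pool_block_t* block = ptr;
        block->next = pool_free_list;
        pool_free_list = block;
    }
}
```

Because the storage is a static array, the pool's full RAM cost shows up in .bss in the map file instead of being hidden in the heap.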
Problem: Poor cache utilization due to access patterns
Solution: Use cache-friendly access patterns
// ❌ Bad: Cache-unfriendly access
void cache_unfriendly(uint32_t* data, size_t size) {
for (size_t i = 0; i < size; i += 16) {
data[i] = process_value(data[i]);
}
}
// ✅ Good: Cache-friendly access
void cache_friendly(uint32_t* data, size_t size) {
for (size_t i = 0; i < size; i++) {
data[i] = process_value(data[i]);
}
}
Problem: Careless field ordering wastes memory on alignment padding, and packing the struct instead risks misaligned access penalties
Solution: Order fields so they stay naturally aligned with minimal padding
// ❌ Bad: Small fields around a large one force padding
typedef struct {
    uint8_t a;   // 1 byte, followed by 3 bytes of padding
    uint32_t b;  // 4 bytes
    uint8_t c;   // 1 byte, followed by 3 bytes of tail padding
} misaligned_t;  // 12 bytes
// ✅ Good: Largest field first
typedef struct {
    uint32_t b;  // 4 bytes
    uint8_t a;   // 1 byte
    uint8_t c;   // 1 byte, followed by 2 bytes of tail padding
} aligned_t;     // 8 bytes
Next Steps: Explore Advanced Memory Management to understand efficient memory management techniques, or dive into Hardware Fundamentals for hardware-specific programming.