Linux Performance Improvement: Kernel Settings, TCP/IP and NUMA Optimization

🧠 What Will You Learn in This Guide?

Linux is the backbone of modern cloud infrastructure.
However, default kernel configurations are not always optimized for high performance.
In this guide, you will learn how to maximize CPU, memory, networking and I/O performance by optimizing the Linux kernel.
You'll also see how to detect bottlenecks with analysis tools such as perf, ftrace, and bpftrace, and how to tune interrupt placement on NUMA systems.

🚀 1. Performance Profiling and Monitoring Tools

The first step to improving Linux performance is to correctly identify where the bottlenecks are.

🔧 1.1 perf: CPU, Memory and I/O Analysis

perf tracks CPU cycles, cache misses, and instruction execution statistics.

Purpose: To find the functions that consume the most CPU
Steps:

sudo perf record -F 99 -a --call-graph dwarf sleep 30
sudo perf report --stdio

🔹 The first command samples call stacks on all CPUs at 99 Hz for 30 seconds and writes them to perf.data.
🔹 The second command reads perf.data and lists the functions that consumed the most CPU time, so you can focus optimization on these hot paths.
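
As a rough sanity check on a recording like the one above, the number of samples is bounded by frequency × duration × CPU count. The sketch below assumes every CPU is busy for the whole window (idle CPUs may produce fewer samples); expected_samples is a hypothetical helper name, not part of perf.

```shell
# Upper bound on the samples collected by `perf record -F <hz> -a sleep <seconds>`.
# Hypothetical helper; assumes all CPUs are busy for the full window.
expected_samples() {
  hz=$1 seconds=$2 cpus=$3
  echo $(( hz * seconds * cpus ))
}

# Example: 99 Hz for 30 seconds on an 8-CPU machine.
expected_samples 99 30 8   # prints 23760
```

On the machine being profiled, "$(nproc)" can supply the CPU count. A report with far fewer samples than this bound usually just means the system was mostly idle during the recording.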

🧩 1.2 ftrace: Monitoring Kernel Functions

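ftrace traces kernel functions from inside the kernel. For example, to watch every write that passes through the VFS layer:

echo vfs_write | sudo tee /sys/kernel/debug/tracing/set_ftrace_filter
echo function | sudo tee /sys/kernel/debug/tracing/current_tracer
sudo cat /sys/kernel/debug/tracing/trace_pipe
echo nop | sudo tee /sys/kernel/debug/tracing/current_tracer

💡 These commands log each call to vfs_write with a timestamp. Press Ctrl+C to stop reading, then run the last command to switch tracing off. set_ftrace_filter accepts only kernel function names (listed in available_filter_functions); syscall tracepoints such as sys_enter_write are enabled under events/syscalls/ instead. The function tracer shows when calls happen, not how long they take, so use the function_graph tracer when you need per-call durations to diagnose disk I/O latency.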
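
Before writing a name to set_ftrace_filter, it can help to confirm that the kernel exposes it as a traceable function. A minimal sketch, assuming the list has been copied from available_filter_functions in the same tracing directory; is_traceable and the /tmp/funcs.txt path are hypothetical names chosen for illustration.

```shell
# Succeeds if the function name appears in the given listing.
# Each line of available_filter_functions starts with a function name,
# optionally followed by a module name in brackets.
is_traceable() {
  # $1: function name, $2: path to the listing
  awk -v fn="$1" '$1 == fn { found = 1 } END { exit !found }' "$2"
}

# Usage on a live system (requires root):
#   sudo cat /sys/kernel/debug/tracing/available_filter_functions > /tmp/funcs.txt
#   is_traceable vfs_write /tmp/funcs.txt && echo "traceable"
```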

🧠 1.3 bpftrace: Customizable, Real-Time Tracking

bpftrace is a flexible tool for real-time tracing of kernel events such as tracepoints.

sudo bpftrace -e 'tracepoint:kmem:kmalloc { printf("Allocated %d bytes\n", args->bytes_alloc); }'

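📊 This script prints the size of every kernel memory allocation made through kmalloc. It helps track down kernel-side memory growth or unusual allocation patterns; for leaks in user-space applications, use a tool such as Valgrind instead.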
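
Printing every allocation is very noisy on a busy system. A minimal sketch of an aggregated alternative, assuming the same tracepoint and field as above: it builds a bpftrace program that collects a per-process histogram of allocation sizes and exits after a fixed window. kmalloc_hist_program is a hypothetical helper name.

```shell
# Build a bpftrace program that histograms kmalloc sizes per process name
# and stops after the given number of seconds.
kmalloc_hist_program() {
  # $1: sampling window in seconds
  printf 'tracepoint:kmem:kmalloc { @bytes[comm] = hist(args->bytes_alloc); } interval:s:%d { exit(); }\n' "$1"
}

# Usage (requires root and bpftrace):
#   sudo bpftrace -e "$(kmalloc_hist_program 10)"
```

bpftrace prints the contents of the @bytes map when the program exits, so the output is one histogram per process instead of one line per allocation.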

💾 2. Memory and I/O Optimization

🧮 2.1 Memory Profiling (Valgrind/Massif)

To monitor memory usage over time:

valgrind --tool=massif --time-unit=B ./memory_heavy_app
ms_print massif.out.<pid>

💡 This analysis shows which functions account for most of the heap usage, so you can restructure the allocations of memory-heavy applications.
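
Massif names its output file after the process ID, so the file name changes on every run. The sketch below, using a hypothetical helper name, picks the most recently written massif.out.* file in a directory so it can be passed to ms_print.

```shell
# Print the newest massif.out.* file in a directory (default: current directory).
latest_massif_out() {
  dir=${1:-.}
  ls -t "$dir"/massif.out.* 2>/dev/null | head -n 1
}

# Usage after a Massif run:
#   ms_print "$(latest_massif_out)"
```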

💿 2.2 Detecting Disk I/O Congestion (iostat)

To measure disk access time:

iostat -dx 5
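Metric	Description
await	Average time a request takes, including time spent waiting in the queue (ms)
svctm	Average I/O service time (ms); deprecated and missing from recent sysstat releases

If await is much higher than svctm, requests are queuing and the disk is a bottleneck. Recent iostat versions report r_await and w_await (reads and writes) instead of a single await; where svctm is not shown, a steadily high await together with a growing queue size (aqu-sz) points to the same problem.

Possible fixes: moving to SSDs, changing the I/O scheduler, or tuning the RAID layout.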
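
Column positions in iostat -dx output differ between sysstat versions, so the following sketch looks a column up by its header name and lists devices whose value exceeds a threshold. The helper name flag_slow_devices, the w_await column, and the 10 ms limit are assumptions for illustration; check the header printed by your iostat version.

```shell
# Read iostat -dx output on stdin and print "<device> <value>" for every
# device whose value in the named column is above the limit.
flag_slow_devices() {
  # $1: column name from the iostat header, $2: threshold in milliseconds
  awk -v col="$1" -v limit="$2" '
    # Header line: remember which field holds the requested column.
    $1 ~ /^Device/ { idx = 0; for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
    # Data line: compare numerically against the limit.
    idx && NF >= idx && $idx + 0 > limit + 0 { print $1, $idx }
  '
}

# Usage on a live system (two reports, 5 seconds apart):
#   LC_ALL=C iostat -dx 5 2 | flag_slow_devices w_await 10
```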

🌐 3. Networking Stack Optimization

📡 3.1 TCP Queue Tuning for High Concurrency

sudo sysctl -w net.ipv4.tcp_max_syn_backlog=4096
echo "net.ipv4.tcp_max_syn_backlog=4096" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

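This setting raises the number of half-open connections (SYN received, handshake not yet complete) that the kernel will queue, so fewer connection attempts are dropped under heavy traffic. It is most useful on API gateways and high-traffic websites.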
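
The accept queue of a listening socket is capped by net.core.somaxconn regardless of the backlog the application requests, so a large SYN backlog alone may not be enough. A minimal sketch of that cap, with effective_backlog as a hypothetical helper name:

```shell
# Effective accept-queue length: the smaller of the backlog passed to
# listen() and the net.core.somaxconn limit.
effective_backlog() {
  requested=$1 somaxconn=$2
  if [ "$requested" -lt "$somaxconn" ]; then
    echo "$requested"
  else
    echo "$somaxconn"
  fi
}

# Usage on a live system:
#   effective_backlog 4096 "$(sysctl -n net.core.somaxconn)"
```

If the result is lower than the backlog your server requests, consider raising net.core.somaxconn along with tcp_max_syn_backlog.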

📦 3.2 TCP Buffer Sizes for Bandwidth

sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 6291456"
sudo sysctl -w net.ipv4.tcp_wmem="4096 65536 6291456"

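The three values are the minimum, default, and maximum buffer size in bytes for each socket. A larger maximum lets a single connection keep more data in flight, which raises throughput on high-bandwidth, high-latency links.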
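
A common way to size the maximum buffer is the bandwidth-delay product (BDP): link bandwidth multiplied by round-trip time. The sketch below uses a hypothetical helper, bdp_bytes, with bandwidth in Mbit/s and RTT in milliseconds; the 1 Gbit/s and 50 ms figures are illustrative assumptions, not measurements.

```shell
# Bandwidth-delay product in bytes.
# bytes = (Mbit/s * 1,000,000 / 8) * (ms / 1000) = Mbit/s * ms * 125
bdp_bytes() {
  mbit_per_s=$1 rtt_ms=$2
  echo $(( mbit_per_s * rtt_ms * 125 ))
}

# Example: a 1 Gbit/s path with 50 ms RTT.
bdp_bytes 1000 50   # prints 6250000
```

That result is close to the 6291456-byte (6 MiB) maximum used in the commands above, which suggests those values suit roughly this kind of path. Measure your own bandwidth and RTT before choosing a value.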

⚙️ 3.3 BBR Algorithm for Low Latency

sudo sysctl -w net.ipv4.tcp_congestion_control=bbr

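BBR is a congestion control algorithm that models bottleneck bandwidth and round-trip time (RTT) instead of reacting to packet loss. It is available in kernel 4.9 and later. Typical result: lower latency and steadier throughput, especially on lossy or long-distance links.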
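
Switching fails if the running kernel does not offer BBR, so it is worth checking first. A minimal sketch, assuming the list comes from net.ipv4.tcp_available_congestion_control and that BBR, when built as a module, is named tcp_bbr; cc_available is a hypothetical helper name.

```shell
# Succeeds if the algorithm name appears in a space-separated list.
cc_available() {
  # $1: algorithm name, $2: list of available algorithms
  case " $2 " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage on a live system:
#   available=$(sysctl -n net.ipv4.tcp_available_congestion_control)
#   cc_available bbr "$available" || sudo modprobe tcp_bbr
#   sudo sysctl -w net.ipv4.tcp_congestion_control=bbr
```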

🧩 4. NUMA (Non-Uniform Memory Access) Configuration

NUMA awareness on multi-socket servers is critical, especially in data centers.

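echo 1 | sudo tee /proc/irq/<IRQ_NUMBER>/smp_affinity

The value is a hexadecimal CPU bitmask, so 1 pins the interrupt to CPU 0. The write goes through sudo tee because with sudo echo 1 > ..., the redirection is performed by the unprivileged shell and fails. Choose a CPU on the same NUMA node as the network card (see /sys/class/net/<INTERFACE>/device/numa_node) so that interrupt handling uses local memory; this lowers latency and speeds up data flow.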

💡 irqbalance may overwrite manual affinity settings, so disable the irqbalance service if you want to distribute IRQs across CPUs by hand.
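
Because smp_affinity takes a hexadecimal bitmask rather than a CPU number, a small conversion helps avoid mistakes. A minimal sketch with a hypothetical helper name, cpu_mask; <INTERFACE>, <NODE>, and <CPU> are placeholders to fill in for your system.

```shell
# Hexadecimal smp_affinity mask that selects a single CPU.
# Valid for CPU numbers 0-31; on larger systems, write the CPU number to
# /proc/irq/<IRQ_NUMBER>/smp_affinity_list instead.
cpu_mask() {
  printf '%x\n' $(( 1 << $1 ))
}

cpu_mask 4   # prints 10

# Usage on a live system:
#   cat /sys/class/net/<INTERFACE>/device/numa_node          # NUMA node of the NIC
#   cat /sys/devices/system/node/node<NODE>/cpulist          # CPUs on that node
#   cpu_mask <CPU> | sudo tee /proc/irq/<IRQ_NUMBER>/smp_affinity
```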

⚡ 5. Making Kernel Settings Persistent

echo "net.ipv4.tcp_max_syn_backlog=4096" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

🧠 Settings applied with sysctl -w are temporary. To make them permanent, add them to /etc/sysctl.conf (or to a file under /etc/sysctl.d/).
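
Appending with tee -a adds a duplicate line each time the command is rerun. A minimal sketch of an idempotent alternative that replaces an existing assignment or appends a new one; the helper name set_sysctl_line and the file name 99-tuning.conf are assumptions, and writing under /etc requires root.

```shell
# Set "key = value" in a sysctl configuration file, replacing any
# earlier assignment of the same key.
set_sysctl_line() {
  key=$1 value=$2 file=$3
  touch "$file"
  # Keep every line except an existing assignment of this key.
  # (Dots in the key act as regex wildcards, which is close enough here.)
  grep -v "^[[:space:]]*${key}[[:space:]]*=" "$file" > "$file.tmp" || true
  printf '%s = %s\n' "$key" "$value" >> "$file.tmp"
  mv "$file.tmp" "$file"
}

# Usage on a live system (as root):
#   set_sysctl_line net.ipv4.tcp_max_syn_backlog 4096 /etc/sysctl.d/99-tuning.conf
#   sysctl --system
```

sysctl --system reloads /etc/sysctl.conf together with the files under /etc/sysctl.d/, so a separate drop-in file keeps your tuning apart from distribution defaults.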

❓ Frequently Asked Questions (FAQ)

1. Will all these settings be lost after a reboot?

Yes, settings made with sysctl -w are lost on reboot. To make them permanent, add them to /etc/sysctl.conf and run sudo sysctl -p.

2. What settings should I prioritize for low latency?

Switching to the bbr algorithm and keeping buffer sizes (rmem/wmem) bounded both help reduce latency and jitter.

3. How much should tcp_max_syn_backlog be?

Values between 4096 and 8192 are usually sufficient, but verify them with load tests.

4. Why is NUMA important?

Because keeping a workload's CPU, memory, and device interrupts on the same node avoids slower cross-node memory access.

5. What can wrong kernel tuning lead to?

Wasted memory, dropped connections, or network bottlenecks may occur. You should test all changes in a pre-production environment.

🏁 Conclusion

A properly configured Linux system both reduces latency and uses resources more efficiently. With these settings, you can achieve noticeable performance increases, especially on high-traffic APIs and database servers.

⚙️ By applying these optimization steps on your GenixNode servers, you can build a faster, more stable, and more scalable infrastructure.
