Performance & Optimization #
Resources #
- Explanation of what load averages really mean
- Commands you should run in the first 60 seconds on a misbehaving server
- In-depth analysis of Linux Performance tools
- 30 Linux monitoring tools every sysadmin should know
Notes #
It’s important to have a methodology. With one, you can say with confidence: “I’ve run these tools in this manner and I didn’t find anything that indicates a performance issue on the server. I think you should check your code.”
Anti-Methods #
Streetlight anti-method: a drunk man looking for his keys under a streetlamp because that’s where the light is.
Blame someone else anti-method.
Actual Methodologies #
- Problem statement
- Workload characterization
- USE
- Off-CPU Analysis
- CPU profile
- RTFM Method
- Active Benchmarking
- Static Performance Tuning
Problem Statement Method #
- What makes you think there is a performance problem?
- Has this system ever performed well?
- What has changed recently? Software? Hardware? Load?
- Can the performance degradation be expressed in terms of latency or run time?
- Does the problem affect other people or applications (or is it just you)?
- What is the environment? Software, hardware, instance types? Versions? Configuration?
Workload Characterization Method #
- Who is causing the load? PID, UID, IP addr, …
- Why is the load called? code path, stack trace
- What is the load? IOPS, throughput, type, read/write
- How is the load changing over time?
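To make the "who is causing the load?" question concrete, here is a minimal Python sketch that sums CPU usage per user from text in the shape produced by `ps -eo user,pid,pcpu,comm`. The helper name `cpu_by_user` and the sample process table are made up for illustration; they are not from the talk or any tool.

```python
from collections import defaultdict

def cpu_by_user(ps_output: str) -> dict:
    """Sum the %CPU column per user from `ps -eo user,pid,pcpu,comm` style text."""
    totals = defaultdict(float)
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split(None, 3)  # user, pid, %cpu, command
        if len(fields) < 3:
            continue  # ignore malformed rows
        user, pcpu = fields[0], fields[2]
        totals[user] += float(pcpu)
    return dict(totals)

# Hypothetical sample data, not captured from a real server.
SAMPLE = """\
USER       PID %CPU COMMAND
mysql     1201 62.5 mysqld
www-data  2210 12.0 nginx
www-data  2211  8.5 nginx
root         1  0.0 systemd
"""

if __name__ == "__main__":
    ranked = sorted(cpu_by_user(SAMPLE).items(), key=lambda kv: -kv[1])
    for user, pct in ranked:
        print(f"{user:10s} {pct:5.1f}")
```

Running the same aggregation at intervals would also give a rough answer to "how is the load changing over time?".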
The USE Method #
For every resource, check just these three things:
- Utilization
- Saturation
- Errors
Definitions:
- Utilization: busy time
- Saturation: queue length or queued time
- Errors: easy to interpret (objective)
It helps to have a functional (block) diagram of your system, software, and environment that shows all resources.
Start with questions, then find the tools.
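As one illustration of the utilization and saturation definitions above, the following Python sketch applies them to the CPU resource. It assumes two samples of the aggregate `cpu` line from `/proc/stat` and uses the 1-minute load average per CPU as a rough saturation signal. The function names and sample values are hypothetical, and counting iowait as idle time is a choice made here, not something these notes prescribe.

```python
def parse_cpu_line(line: str) -> tuple:
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    values = [int(v) for v in line.split()[1:]]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice
    # Assumption: iowait is treated as idle time in this sketch.
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    # guest and guest_nice are already included in user and nice, so only
    # the first eight fields are summed to avoid double counting.
    total = sum(values[:8])
    return total - idle, total

def cpu_utilization(before: str, after: str) -> float:
    """Utilization: percentage of busy time between two samples."""
    busy0, total0 = parse_cpu_line(before)
    busy1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (busy1 - busy0) / elapsed

def cpu_saturation(load_1min: float, cpu_count: int) -> float:
    """Rough saturation signal: 1-minute load average per CPU.

    On Linux the load average also counts tasks in uninterruptible sleep,
    so a value above 1.0 is a hint to look closer, not proof of CPU queueing.
    """
    return load_1min / cpu_count

if __name__ == "__main__":
    # Hypothetical samples; on a live system, read the first line of
    # /proc/stat twice with a pause in between.
    first = "cpu  100 0 50 800 50 0 0 0 0 0"
    second = "cpu  200 0 100 1600 100 0 0 0 0 0"
    print(f"utilization: {cpu_utilization(first, second):.1f}%")
    print(f"saturation:  {cpu_saturation(8.0, 4):.2f} runnable per CPU")
```

The same three questions would then be repeated for memory, disks, network interfaces, and any other resource on the diagram, each with its own data source.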
USE Method: Linux Performance Checklist #
Off-CPU Analysis #
See slides
I’m not sure I really understand this yet. Worth more research.
CPU Profile Method #
- Take a CPU profile
- Understand all software in profile > 1%
- Discovers a wide range of performance issues by their CPU usage.
- Narrows software study.
Profiling what is on-CPU narrows down which parts of the software (e.g. MySQL) are actually running and therefore need to be looked at.
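The "understand everything above 1%" step can be sketched in Python as follows. The sketch assumes the profile is already in the folded-stack text format used as flame graph input (semicolon-separated frames, then a space and a sample count) and ranks functions by the samples in which they were the leaf frame. The function name `hot_frames` and the sample stacks are hypothetical.

```python
from collections import Counter

def hot_frames(folded: str, threshold_pct: float = 1.0) -> list:
    """Return (function, percent) pairs whose on-CPU share exceeds the threshold.

    Each input line looks like 'frameA;frameB;frameC 42'. Only the leaf
    frame (the function actually on-CPU) is credited with the samples.
    """
    counts = Counter()
    for line in folded.strip().splitlines():
        stack, _, count = line.rpartition(" ")
        if not stack or not count.isdigit():
            continue  # skip blank or malformed lines
        leaf = stack.split(";")[-1]
        counts[leaf] += int(count)
    total = sum(counts.values())
    if total == 0:
        return []
    shares = [(name, 100.0 * n / total) for name, n in counts.most_common()]
    return [(name, pct) for name, pct in shares if pct > threshold_pct]

# Hypothetical folded stacks, not taken from a real MySQL profile.
SAMPLE = """\
mysqld;handle_query;JOIN::exec;row_search 700
mysqld;handle_query;JOIN::exec;memcpy 250
mysqld;handle_query;parse_sql 45
mysqld;log_write 5
"""

if __name__ == "__main__":
    for name, pct in hot_frames(SAMPLE):
        print(f"{pct:5.1f}%  {name}")
```

Crediting only the leaf frame answers "where is CPU time spent?"; crediting every frame in the stack instead would answer "which code paths lead there?", which is closer to what a flame graph shows.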
RTFM #
How to understand performance tools or metrics?
- Man pages
- Books
- Web search
- Co-workers
- Talks, slides, videos
- Support services
- Source code
- Experimentation
- Social
For example: read through the source code, or write a bit of code that taxes the resource in the way we’re interested in.
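As an example of the experimentation approach, here is a minimal Python sketch of "a bit of code that taxes the resource": it spins on one CPU for a fixed wall-clock time so that a tool such as `top` or `mpstat` can be watched while a known load is running. The function name `burn_cpu` and the default duration are arbitrary choices for illustration.

```python
import sys
import time

def burn_cpu(seconds: float) -> int:
    """Busy-loop on one CPU for roughly `seconds` of wall time.

    Returns the number of loop iterations completed.
    """
    deadline = time.perf_counter() + seconds
    iterations = 0
    while time.perf_counter() < deadline:
        iterations += 1
    return iterations

if __name__ == "__main__":
    # Optional first argument: duration in seconds (default is arbitrary).
    duration = float(sys.argv[1]) if len(sys.argv) > 1 else 1.0
    cpu_before = time.process_time()
    loops = burn_cpu(duration)
    cpu_used = time.process_time() - cpu_before
    print(f"ran {loops} iterations, used {cpu_used:.2f}s of CPU time")
```

With a known single-threaded load in place, it becomes easier to check what a given metric actually reports, for instance whether one fully busy core shows up as 100% or as 100% divided by the number of CPUs.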
Tools #
Objectives:
- Perform the USE Method for resource utilization
- Perform workload characterization for disks, network
- Perform CPU Profile Method using flame graphs
- Have exposure to various observability tools:
  - Basic: vmstat, iostat, mpstat, ps, top
  - Intermediate: tcpdump, netstat, nicstat, pidstat, sar
  - Advanced: ss, slabtop, perf_events
- Perform Active Benchmarking
- Understand tuning risks
- Perform Static Performance Tuning
Tool Types #
Type | Characteristics |
---|---|
Observability | Watch activity. Safe: usually, depending on resource overhead. |
Benchmarking | Load test. Caution: production tests can cause issues due to contention. |
Tuning | Change. Danger: changes could hurt performance, now or later with load. |
Static | Check configuration. Should be safe. |
Basic Observability Tools #
- uptime
- top or htop
- ps
- vmstat
- iostat
- mpstat
- free
Intermediate Observability Tools #
- strace
- tcpdump
- netstat
- nicstat
- pidstat
- swapon
- lsof
- sar - System Activity Reporter