Analyze Simulation Results
This page explains how to analyze simulation results.
Analyze SimpleSSD Log and Statistics
SimpleSSD prints a debug log that contains information about all I/O requests, along with statistics for all components.
SimpleSSD debug log
For SimpleSSD-FullSystem, you can find the debug log on the stdout of gem5.
For SimpleSSD-Standalone, you can choose where the debug log is saved with the DebugLogFile option in the [global] section of the simulation configuration, as sketched below.
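A minimal sketch of the relevant configuration (the file path here is an arbitrary choice):

[global]
# Write the SimpleSSD debug log to this (hypothetical) path
DebugLogFile = log/debug.txt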
Here is an example of the debug log when read-ahead is triggered.
1 861004146876: HIL::NVMe: SQ 3 | Submission Queue Tail Doorbell | Item count in queue 0 -> 1 | head 432 | tail 432 -> 433
2 861004362088: HIL::NVMe: NVM | READ | NSID 1 | 1B0 + 1
3 861004362088: HIL: READ | REQ 477 | LCA 432 + 1 | BYTE 0 + 4096
4 861004362088: ICL::GenericCache: READ | REQ 477-1 | LCA 432 | SIZE 4096
5 861004362088: ICL::GenericCache: READ | Cache hit at (0, 71) | 861004362088 - 861005056598 (694510)
6 861004362088: ICL::GenericCache: READ | Prefetch triggered
7 861004362088: ICL::GenericCache: ----- | Begin eviction
8 861004362088: ICL::GenericCache: ----- | End eviction | 861008457768 - 861008457768 (0)
9 861004362088: FTL: READ | LPN 5
10 861004362088: PAL::PALOLD: READ | Block 2555 | Page 1
11 861004362088: PAL::PALOLD: READ | C 0 | W 0 | D 0 | P 0 | B 511 | P 1
12 861004362088: FTL::PageMapping: READ | LPN 5 | 861008457768 - 861126478092 (118020324)
13 861004362088: ICL::GenericCache: READ | Cache miss at (0, 77) | 861008457768 - 861126775592 (118317824)
14
15 ...
16
17 861004362088: FTL: READ | LPN 5
18 861004362088: PAL::PALOLD: READ | Block 2555 | Page 1
19 861004362088: PAL::PALOLD: READ | C 11 | W 0 | D 0 | P 0 | B 511 | P 1
20 861004362088: FTL::PageMapping: READ | LPN 5 | 861008457768 - 861126478092 (118020324)
21 861004362088: ICL::GenericCache: READ | Cache miss at (3, 77) | 861008457768 - 861126775592 (118317824)
22 861004362088: FTL: READ | LPN 5
23 861004362088: PAL::PALOLD: READ | Block 2555 | Page 1
24 861004362088: PAL::PALOLD: READ | C 0 | W 0 | D 1 | P 0 | B 511 | P 1
25 861004362088: FTL::PageMapping: READ | LPN 5 | 861008457768 - 861138228620 (129770852)
26 861004362088: ICL::GenericCache: READ | Cache miss at (4, 77) | 861008457768 - 861138526120 (130068352)
27
28 ...
29
30 861004362088: FTL: READ | LPN 5
31 861004362088: PAL::PALOLD: READ | Block 2555 | Page 1
32 861004362088: PAL::PALOLD: READ | C 11 | W 0 | D 7 | P 0 | B 511 | P 1
33 861004362088: FTL::PageMapping: READ | LPN 5 | 861008457768 - 861208731788 (200274020)
34 861004362088: ICL::GenericCache: READ | Cache miss at (7, 87) | 861008457768 - 861209029288 (200571520)
35 861004362088: ICL::GenericCache: READ | Prefetch done
36 861004362088: ICL: READ | LCA 432 + 1 | 861004362088 - 861006961598 (2599510)
37 861007304098: HIL::NVMe: NVM | READ | NSID 1 | 1B0 + 1 | 861004362088 - 861007304098 (2942010)
38 861007444114: HIL::NVMe: INTR | MSI-X sent | vector 2
39 861008760480: HIL::NVMe: CQ 3 | Completion Queue Head Doorbell | Item count in queue 1 -> 0 | head 432 -> 433 | tail 433
As shown at line 6, a prefetch is triggered because the prefetch condition has been satisfied.
The GenericCache module of SimpleSSD will read either one super page or all pages from all planes (this is configurable).
At lines 13 and 21, you can see that the latencies calculated by PAL have the same value (118,317,824 ps).
These two requests target different channels (channel 0 and channel 11, respectively), so they can operate simultaneously.
At line 26, the latency changes (130,068,352 ps) because channel 0 already has an outstanding request, so PAL determines there is a channel conflict.
At line 34, the request targeting die 7 of channel 11 is delayed by about 82 us (200,274,020 - 118,020,324 = 82,253,696 ps) because of the previous I/Os targeting the seven other dies (die 0 to die 6).
After the prefetch, GenericCache returns the request, as shown at line 36.
The latency is quite small because the data for this request already exists in the cache (see line 5).
By analyzing the debug log, you can find the characteristics of each I/O and determine which component consumes a large portion of the I/O latency.
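Since the debug log can be large, it is convenient to filter it by component tag. For example (assuming the log was saved to debug.txt):

grep 'PAL::PALOLD' debug.txt        # NAND-level (PAL) entries only
grep 'ICL::GenericCache' debug.txt  # cache-level (ICL) entries only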
SimpleSSD Statistics
For SimpleSSD-FullSystem, you can find the statistics in m5out/stats.txt.
For SimpleSSD-Standalone, this is not yet supported (as of v2.0-rc3).
Here is an example of the statistics.
This file was generated after performing I/O using fio with --direct=1 --ioengine=libaio --bs=4K --iodepth=1 --rw=read --time_based --runtime=5s.
1 system.pci_nvme.command_count 413480 # Total number of NVMe command handled
2 system.pci_nvme.read.request_count 413480 # Read request count
3 system.pci_nvme.read.bytes 1693614080 # Read data size in byte
4 system.pci_nvme.read.busy 1653776508704 # Device busy time when read
5 system.pci_nvme.write.request_count 0 # Write request count
6 system.pci_nvme.write.bytes 0 # Write data size in byte
7 system.pci_nvme.write.busy 0 # Device busy time when write
8 system.pci_nvme.request_count 413480 # Total request count
9 system.pci_nvme.bytes 1693614080 # Total data size in byte
10 system.pci_nvme.busy 1653776508704 # Total device busy time
11 system.pci_nvme.icl.generic_cache.read.request_count 413480 # Read request count
12 system.pci_nvme.icl.generic_cache.read.from_cache 413224 # Read requests that served from cache
13 system.pci_nvme.icl.generic_cache.write.request_count 0 # Write request count
14 system.pci_nvme.icl.generic_cache.write.to_cache 0 # Write requests that served to cache
15 system.pci_nvme.dram.energy.act 351568356.208920 # ACT command energy (pJ)
16 system.pci_nvme.dram.energy.pre 437367776.459918 # PRE command energy (pJ)
17 system.pci_nvme.dram.energy.rd 368277927.504022 # RD command energy (pJ)
18 system.pci_nvme.dram.energy.wr 198140164.999281 # WR command energy (pJ)
19 system.pci_nvme.dram.energy.act.standby 1992764428.068482 # ACT standby energy (pJ)
20 system.pci_nvme.dram.energy.pre.standby 146846141074.948761 # PRE standby energy (pJ)
21 system.pci_nvme.dram.energy.ref 2589408.045731 # Refresh energy (pJ)
22 system.pci_nvme.dram.read.request_count 826662 # Read request count
23 system.pci_nvme.dram.read.bytes 1695873008 # Read data size in byte
24 system.pci_nvme.dram.write.request_count 413438 # Write request count
25 system.pci_nvme.dram.write.bytes 1693442048 # Write data size in byte
26 system.pci_nvme.dram.request_count 1240100 # Total request count
27 system.pci_nvme.dram.bytes 3389315056 # Total data size in byte
28 system.pci_nvme.ftl.page_mapping.gc_count 0 # Total GC count
29 system.pci_nvme.ftl.page_mapping.reclaimed_blocks 0 # Total reclaimed blocks in GC
30 system.pci_nvme.pal.energy.read 2892824.972693 # NAND power used on read transaction (uJ)
31 system.pci_nvme.pal.energy.program 0 # NAND power used on program transaction (uJ)
32 system.pci_nvme.pal.energy.erase 0 # NAND power used on erase transaction (uJ)
33 system.pci_nvme.pal.energy.total 2892824.972693 # NAND power used on all transaction (uJ)
34 system.pci_nvme.cpu.icl0.busy 929296300000 # CPU for ICL core 0 busy ticks
35 system.pci_nvme.cpu.icl0.insts.branch 31837960 # CPU for ICL core 0 executed branch instructions
36 system.pci_nvme.cpu.icl0.insts.load 193508640 # CPU for ICL core 0 executed load instructions
37 system.pci_nvme.cpu.icl0.insts.store 26049240 # CPU for ICL core 0 executed store instructions
38 system.pci_nvme.cpu.icl0.insts.arithmetic 118668760 # CPU for ICL core 0 executed arithmetic instructions
39 system.pci_nvme.cpu.icl0.insts.fp 0 # CPU for ICL core 0 executed floating point instructions
40 system.pci_nvme.cpu.icl0.insts.others 1653920 # CPU for ICL core 0 executed other instructions
41 system.pci_nvme.cpu.ftl0.busy 373127795000 # CPU for FTL core 0 busy ticks
42 system.pci_nvme.cpu.ftl0.insts.branch 21085338 # CPU for FTL core 0 executed branch instructions
43 system.pci_nvme.cpu.ftl0.insts.load 74418840 # CPU for FTL core 0 executed load instructions
44 system.pci_nvme.cpu.ftl0.insts.store 12403140 # CPU for FTL core 0 executed store instructions
45 system.pci_nvme.cpu.ftl0.insts.arithmetic 39276610 # CPU for FTL core 0 executed arithmetic instructions
46 system.pci_nvme.cpu.ftl0.insts.fp 0 # CPU for FTL core 0 executed floating point instructions
47 system.pci_nvme.cpu.ftl0.insts.others 2067190 # CPU for FTL core 0 executed other instructions
Each line contains one statistic of a component. You can find the DRAMPower results at lines 15-21. This SSD has three cores, each dedicated to one firmware component; you can find the CPU statistics at lines 34-47.
This stats.txt file is generated by gem5 and can be controlled from user space through the m5 utility.
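For example, inside the guest OS you can bracket a measurement interval with the m5 utility (a minimal sketch; the workload name is hypothetical, and a full script appears in the user-level section below):

m5 resetstats    # zero all statistics before the workload
./your_workload  # hypothetical benchmark binary
m5 dumpstats     # append a statistics snapshot to m5out/stats.txt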
Analyze Log from Linux Kernel
To analyze kernel behavior, you need to modify the kernel source code.
In SimpleSSD-FullSystem, we added a new M5 pseudo-instruction (M5OP) called m5_print.
This instruction prints a string to the debug log of gem5 (specified by the --debug-flags and --debug-file options).
Prepare Linux Kernel
In this section, we explain how to add the M5OP to the Linux kernel.
We assume the ARMv8 (arm64) architecture.
Copy the m5ops.h and m5op_flags.h files from the include/gem5/asm/generic directory of SimpleSSD-FullSystem to arch/arm64/include/asm of the Linux kernel.
Modify the m5ops.h file to change the include path of m5op_flags.h:
// From
#include <gem5/asm/generic/m5op_flags.h>
// To
#include <asm/m5op_flags.h>
Copy the m5op_arm_A64.S file from the util/m5 directory of SimpleSSD-FullSystem to arch/arm64/kernel of the Linux kernel.
Modify the m5op_arm_A64.S file to change the include path of m5ops.h:
// From
#include <gem5/asm/generic/m5ops.h>
// To
#include <asm/m5ops.h>
Add m5op_arm_A64.o to the object file list in the Makefile in arch/arm64/kernel, as shown below (line numbers may differ).
16 # Object file lists.
17 arm64-obj-y := debug-monitors.o entry.o irq.o fpsimd.o \
18 entry-fpsimd.o process.o ptrace.o setup.o signal.o \
19 sys.o stacktrace.o time.o traps.o io.o vdso.o \
20 hyp-stub.o psci.o cpu_ops.o insn.o \
21 return_address.o cpuinfo.o cpu_errata.o \
22 cpufeature.o alternative.o cacheinfo.o \
23 smp.o smp_spin_table.o topology.o smccc-call.o \
24 m5op_arm_A64.o
25
26 extra-$(CONFIG_EFI) := efi-entry.o
Copy the m5ops.h file from the include/gem5 directory of SimpleSSD-FullSystem to include/linux of the Linux kernel.
Do not confuse this file with the m5ops.h that was copied to the arch/arm64/include/asm directory.
Modify include/linux/m5ops.h:
// From
#include <stdint.h>
#include <time.h>
#include <gem5/asm/generic/m5op_flags.h>
// To
#include <linux/types.h>
#include <linux/time.h>
#include <asm/m5op_flags.h>
You also need to modify the following lines:
// From line 49 and 66
uint64_t m5_rpns();
void m5_load_symbol();
// To
uint64_t m5_rpns(void);
void m5_load_symbol(void);
Now you are ready to use the M5OP inside the Linux kernel.
Modify Kernel
Now, let's add the m5_print function wherever you want to analyze.
Objective: measure the latency of an MSI-X interrupt.
Since SimpleSSD sends an MSI-X interrupt (see the section above for the debug log), you can measure the latency until the IRQ handler of the NVMe driver is called.
Modify drivers/nvme/host/pci.c (line numbers may differ).
854 static irqreturn_t nvme_irq(int irq, void *data)
855 {
856 irqreturn_t result;
857 struct nvme_queue *nvmeq = data;
858 m5_print("nvme_irq", 8);
859 spin_lock(&nvmeq->q_lock);
860 nvme_process_cq(nvmeq);
861 result = nvmeq->cqe_seen ? IRQ_HANDLED : IRQ_NONE;
862 nvmeq->cqe_seen = 0;
863 spin_unlock(&nvmeq->q_lock);
864 return result;
865 }
Don't forget to add #include <linux/m5ops.h> to the beginning of the file. Note that the second argument of m5_print is the length of the string being printed (here, strlen("nvme_irq") is 8).
Build the Linux kernel and copy it to $M5_PATH/binaries.
For kernel build instructions, see Build Linux Kernel for gem5.
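A minimal build sketch (the cross-compiler prefix is an assumption, and we assume gem5 loads the uncompressed ELF vmlinux from $M5_PATH/binaries):

# Cross-compile the arm64 kernel
make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- -j$(nproc)
# Copy the uncompressed ELF image for gem5
cp vmlinux $M5_PATH/binaries/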
Analyze result
Boot SimpleSSD-FullSystem with the newly built kernel image.
Since we only want to check MSI-X, terminate the simulation with m5 exit right after the kernel boots.
Here is an example of m5out/debug.txt and m5out/simout.
1 5361549000: system.terminal: attach terminal 0
2 436771014500: global: Log from guest: nvme_irq
3 436777643000: global: Log from guest: nvme_irq
4 437013694000: global: Log from guest: nvme_irq
5 437021837500: global: Log from guest: nvme_irq
91 436732767000: HIL::NVMe: BAR0 | WRITE | Controller Configuration
92 436732767000: HIL::NVMe: DMAPORT | WRITE | DATA 00460001
93 436732786500: HIL::NVMe: BAR0 | READ | Controller Status
94 436732786500: HIL::NVMe: DMAPORT | READ | DATA 00000001
95 436763277000: HIL::NVMe: BAR0 | READ | Version
96 436763277000: HIL::NVMe: DMAPORT | READ | DATA 00010201
97 436763284000: HIL::NVMe: BAR0 | READ | Controller Capabilities
98 436763284000: HIL::NVMe: DMAPORT | READ | DATA 0020002028010FFF
99 436765154500: HIL::NVMe: SQ 0 | Submission Queue Tail Doorbell | Item count in queue 0 -> 1 | head 0 | tail 0 -> 1
100 436768910422: HIL::NVMe: ADMIN | Identify | CNS 1 | CNTID 0 | NSID 0
101 436770652297: HIL::NVMe: INTR | MSI-X sent | vector 0
102 436771375000: HIL::NVMe: CQ 0 | Completion Queue Head Doorbell | Item count in queue 1 -> 0 | head 0 -> 1 | tail 1
103 436776059000: HIL::NVMe: SQ 0 | Submission Queue Tail Doorbell | Item count in queue 0 -> 1 | head 1 | tail 1 -> 2
104 436777053844: HIL::NVMe: ADMIN | Set Features | Feature 7 | NSID 0
105 436777179719: HIL::NVMe: INTR | MSI-X sent | vector 0
106 436778003500: HIL::NVMe: CQ 0 | Completion Queue Head Doorbell | Item count in queue 1 -> 0 | head 1 -> 2 | tail 2
In simout (the lower code block), the MSI-X interrupt is sent at 436,770,652,297 ps (about 436.8 ms into the simulation).
In debug.txt (the upper code block), the nvme_irq function is called at 436,771,014,500 ps.
The difference is 362,203 ps (about 362 ns).
This analysis method using the M5OP has very low overhead (a few CPU cycles) because it is based on a pseudo-instruction.
You can add any number of m5_print calls to the Linux kernel, or even to user-level programs.
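For user space, a minimal sketch in C (this assumes you link the program against m5op_arm_A64.S from util/m5 so that the m5_print symbol resolves; the declaration below is inferred from the kernel example above):

#include <stdint.h>

/* Provided by m5op_arm_A64.S; declaration inferred, not part of libc. */
void m5_print(const char *str, uint64_t len);

int main(void)
{
    /* Appears in gem5's debug log as "Log from guest: begin region". */
    m5_print("begin region", 12);
    return 0;
}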
To generate the debug.txt file, you should specify --debug-flags=M5Print and --debug-file=debug.txt when executing SimpleSSD-FullSystem.
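For example (a minimal sketch; substitute your usual SimpleSSD-FullSystem configuration script and options):

./build/ARM/gem5.opt --debug-flags=M5Print --debug-file=debug.txt \
    configs/example/fs.py <your usual options>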
Analyze Debug Log and Statistics from Simulator
You can also analyze the debug log and statistics of the simulator itself.
SimpleSSD-FullSystem
Similar to Analyze SimpleSSD Log and Statistics above, gem5 uses --debug-file and stats.txt to print a detailed debug log of hardware components and statistics.
The stats.txt file can be analyzed with the same method as for SimpleSSD.
Here we explain how to use the --debug-flags and --debug-file options of gem5.
--debug-file specifies where to save the hardware debug log (see Execution of SimpleSSD-FullSystem).
--debug-flags specifies which hardware modules have their debug log printout enabled.
The M5Print flag used in the section above selects the debug log from the guest OS running on gem5.
You can list all usable debug flags with the ./build/ARM/gem5.opt --debug-help command.
SimpleSSD-Standalone
For the standalone version, the log printout is quite limited (as of v2.0-rc3).
SimpleSSD Standalone v2.0
********** Begin of simulation **********
*** Statistics of Request Generator ***
Tick: 5000439428548
Time (ps): 79297718 - 5000439428548 (5000360130830)
I/O (bytes): 1393033216 (278586577.676911 B/s)
I/O (counts): 340096 (Read: 170048, Write: 170048)
*** End of statistics ***
*** Statistics of Event Engine ***
Simulation Tick (ps): 5000439428548
Host time duration (sec): 27.220903
Event handled: 10758702 (395236.771106 ops)
*** End of statistics ***
End of simulation @ tick 5000439428548
As shown in the code block above, the statistics of the request generator contain the total I/O count and bandwidth information.
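For example, the reported bandwidth follows directly from the totals: 1,393,033,216 bytes / 5.000360 s ≈ 278,586,578 B/s, which matches the printed value.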
Analyze on user-level
If you are using SimpleSSD-FullSystem, you can analyze SimpleSSD using user-level benchmark programs. Compile the benchmark program (see Build User-Level Program for gem5) and run it inside gem5.
Here is an example script (used with the --script option).
1 #!/bin/bash
2 cd /home/root
3 ./stat 100 &
4 m5 resetstats
5 ./fio --direct=1 --ioengine=libaio --iodepth=1 --bs=4096 \
6 --rw=read --filename=/dev/nvme0n1 --numjobs=1 --name=test --write_lat_log=test \
7 --write_iops_log=test --write_bw_log=test \
8 --log_avg_msec=10 --time_based --runtime=5s \
9 --randseed=13425
10 m5 dumpstats
11 m5 writefile test_clat.1.log
12 m5 writefile test_slat.1.log
13 m5 writefile test_lat.1.log
14 m5 writefile test_iops.1.log
15 m5 writefile test_bw.1.log
16 m5 exit
At line 3, the stat program collects kernel/user CPU and DRAM utilization (through the /proc filesystem).
At line 4, all gem5 statistics are reset to zero.
Then fio is executed, and the final statistics are dumped (line 10).
After the result files of fio are written to the host via m5 writefile (lines 11 to 15), the simulation is terminated (line 16).
After the simulation, you will get six files (one fio output file and five fio log files).