Analyze Simulation Results

On this page, we will explain how to analyze simulation results.

Analyze SimpleSSD Log and Statistics

SimpleSSD prints a debug log that contains information about every I/O request, together with statistics of all components.

SimpleSSD debug log

For SimpleSSD-FullSystem, the debug log appears on the stdout of gem5. For SimpleSSD-Standalone, you can choose where to save the debug log with the [global] DebugLogFile simulation option.

Here is an example of the debug log when read-ahead is triggered.

 1 861004146876: HIL::NVMe: SQ 3    | Submission Queue Tail Doorbell | Item count in queue 0 -> 1 | head 432 | tail 432 -> 433
 2 861004362088: HIL::NVMe: NVM     | READ  | NSID 1     | 1B0 + 1
 3 861004362088: HIL: READ  | REQ     477 | LCA 432 + 1 | BYTE 0 + 4096
 4 861004362088: ICL::GenericCache: READ  | REQ     477-1    | LCA 432 | SIZE 4096
 5 861004362088: ICL::GenericCache: READ  | Cache hit at (0, 71) | 861004362088 - 861005056598 (694510)
 6 861004362088: ICL::GenericCache: READ  | Prefetch triggered
 7 861004362088: ICL::GenericCache: ----- | Begin eviction
 8 861004362088: ICL::GenericCache: ----- | End eviction | 861008457768 - 861008457768 (0)
 9 861004362088: FTL: READ  | LPN 5
10 861004362088: PAL::PALOLD: READ  | Block 2555 | Page 1
11 861004362088: PAL::PALOLD: READ  | C     0 | W     0 | D     0 | P     0 | B   511 | P     1
12 861004362088: FTL::PageMapping: READ  | LPN 5 | 861008457768 - 861126478092 (118020324)
13 861004362088: ICL::GenericCache: READ  | Cache miss at (0, 77) | 861008457768 - 861126775592 (118317824)
14 
15 ...
16 
17 861004362088: FTL: READ  | LPN 5
18 861004362088: PAL::PALOLD: READ  | Block 2555 | Page 1
19 861004362088: PAL::PALOLD: READ  | C    11 | W     0 | D     0 | P     0 | B   511 | P     1
20 861004362088: FTL::PageMapping: READ  | LPN 5 | 861008457768 - 861126478092 (118020324)
21 861004362088: ICL::GenericCache: READ  | Cache miss at (3, 77) | 861008457768 - 861126775592 (118317824)
22 861004362088: FTL: READ  | LPN 5
23 861004362088: PAL::PALOLD: READ  | Block 2555 | Page 1
24 861004362088: PAL::PALOLD: READ  | C     0 | W     0 | D     1 | P     0 | B   511 | P     1
25 861004362088: FTL::PageMapping: READ  | LPN 5 | 861008457768 - 861138228620 (129770852)
26 861004362088: ICL::GenericCache: READ  | Cache miss at (4, 77) | 861008457768 - 861138526120 (130068352)
27 
28 ...
29 
30 861004362088: FTL: READ  | LPN 5
31 861004362088: PAL::PALOLD: READ  | Block 2555 | Page 1
32 861004362088: PAL::PALOLD: READ  | C    11 | W     0 | D     7 | P     0 | B   511 | P     1
33 861004362088: FTL::PageMapping: READ  | LPN 5 | 861008457768 - 861208731788 (200274020)
34 861004362088: ICL::GenericCache: READ  | Cache miss at (7, 87) | 861008457768 - 861209029288 (200571520)
35 861004362088: ICL::GenericCache: READ  | Prefetch done
36 861004362088: ICL: READ  | LCA 432 + 1 | 861004362088 - 861006961598 (2599510)
37 861007304098: HIL::NVMe: NVM     | READ  | NSID 1     | 1B0 + 1 | 861004362088 - 861007304098 (2942010)
38 861007444114: HIL::NVMe: INTR    | MSI-X sent | vector 2
39 861008760480: HIL::NVMe: CQ 3    | Completion Queue Head Doorbell | Item count in queue 1 -> 0 | head 432 -> 433 | tail 433

As shown at line 6, prefetch is triggered because the prefetch condition has been satisfied. The GenericCache module of SimpleSSD will then read either one superpage or all pages from all planes (this is configurable).

At lines 13 and 21, you can see that the latency calculated by PAL has the same value (118,317,824 ps). These two requests target different channels, so they can operate simultaneously (they access channel 0 and channel 11, respectively). At line 26, the latency value changes (130,068,352 ps) because channel 0 already has an outstanding request, so PAL detects a channel conflict. At line 34, the request targeting die 7 of channel 11 is delayed by about 90 us because of previous I/Os targeting the other seven dies (die 0 to die 6).

After the prefetch, GenericCache returns the request as shown at line 36. The latency is quite small because the data for the request already exists in the cache (see line 5).

By analyzing the debug log, you can find the characteristics of each I/O and identify which component consumes the largest portion of the I/O latency.
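
If the log grows large, a small shell pipeline can summarize it. The sketch below assumes the debug log was saved to debug.log (for example via DebugLogFile); it averages the latency reported on FTL::PageMapping read lines, where the value in parentheses at the end of each line is the latency in picoseconds.

# Average NAND read latency (ps) reported by FTL::PageMapping.
# Each line ends with "start - end (latency)"; split on parentheses to grab the latency.
grep 'FTL::PageMapping: READ' debug.log | \
  awk -F'[()]' '{ sum += $2; n++ } END { if (n) printf "%d reads, avg %.0f ps\n", n, sum / n }'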

SimpleSSD Statistics

For SimpleSSD-FullSystem, you can find the statistics in m5out/stats.txt. For SimpleSSD-Standalone, statistics output is not yet supported (as of v2.0-rc3).

Here is an example of the statistics. This file was generated after performing I/O with fio using --direct=1 --ioengine=libaio --bs=4K --iodepth=1 --rw=read --time_based --runtime=5s.

 1 system.pci_nvme.command_count                  413480                       # Total number of NVMe command handled
 2 system.pci_nvme.read.request_count             413480                       # Read request count
 3 system.pci_nvme.read.bytes                 1693614080                       # Read data size in byte
 4 system.pci_nvme.read.busy                1653776508704                       # Device busy time when read
 5 system.pci_nvme.write.request_count                 0                       # Write request count
 6 system.pci_nvme.write.bytes                         0                       # Write data size in byte
 7 system.pci_nvme.write.busy                          0                       # Device busy time when write
 8 system.pci_nvme.request_count                  413480                       # Total request count
 9 system.pci_nvme.bytes                      1693614080                       # Total data size in byte
10 system.pci_nvme.busy                     1653776508704                       # Total device busy time
11 system.pci_nvme.icl.generic_cache.read.request_count       413480                       # Read request count
12 system.pci_nvme.icl.generic_cache.read.from_cache       413224                       # Read requests that served from cache
13 system.pci_nvme.icl.generic_cache.write.request_count            0                       # Write request count
14 system.pci_nvme.icl.generic_cache.write.to_cache            0                       # Write requests that served to cache
15 system.pci_nvme.dram.energy.act          351568356.208920                       # ACT command energy (pJ)
16 system.pci_nvme.dram.energy.pre          437367776.459918                       # PRE command energy (pJ)
17 system.pci_nvme.dram.energy.rd           368277927.504022                       # RD command energy (pJ)
18 system.pci_nvme.dram.energy.wr           198140164.999281                       # WR command energy (pJ)
19 system.pci_nvme.dram.energy.act.standby  1992764428.068482                       # ACT standby energy (pJ)
20 system.pci_nvme.dram.energy.pre.standby  146846141074.948761                       # PRE standby energy (pJ)
21 system.pci_nvme.dram.energy.ref          2589408.045731                       # Refresh energy (pJ)
22 system.pci_nvme.dram.read.request_count        826662                       # Read request count
23 system.pci_nvme.dram.read.bytes            1695873008                       # Read data size in byte
24 system.pci_nvme.dram.write.request_count       413438                       # Write request count
25 system.pci_nvme.dram.write.bytes           1693442048                       # Write data size in byte
26 system.pci_nvme.dram.request_count            1240100                       # Total request count
27 system.pci_nvme.dram.bytes                 3389315056                       # Total data size in byte
28 system.pci_nvme.ftl.page_mapping.gc_count            0                       # Total GC count
29 system.pci_nvme.ftl.page_mapping.reclaimed_blocks            0                       # Total reclaimed blocks in GC
30 system.pci_nvme.pal.energy.read          2892824.972693                       # NAND power used on read transaction (uJ)
31 system.pci_nvme.pal.energy.program                  0                       # NAND power used on program transaction (uJ)
32 system.pci_nvme.pal.energy.erase                    0                       # NAND power used on erase transaction (uJ)
33 system.pci_nvme.pal.energy.total         2892824.972693                       # NAND power used on all transaction (uJ)
34 system.pci_nvme.cpu.icl0.busy            929296300000                       # CPU for ICL core 0 busy ticks
35 system.pci_nvme.cpu.icl0.insts.branch        31837960                       # CPU for ICL core 0 executed branch instructions
36 system.pci_nvme.cpu.icl0.insts.load         193508640                       # CPU for ICL core 0 executed load instructions
37 system.pci_nvme.cpu.icl0.insts.store         26049240                       # CPU for ICL core 0 executed store instructions
38 system.pci_nvme.cpu.icl0.insts.arithmetic    118668760                       # CPU for ICL core 0 executed arithmetic instructions
39 system.pci_nvme.cpu.icl0.insts.fp                   0                       # CPU for ICL core 0 executed floating point instructions
40 system.pci_nvme.cpu.icl0.insts.others         1653920                       # CPU for ICL core 0 executed other instructions
41 system.pci_nvme.cpu.ftl0.busy            373127795000                       # CPU for FTL core 0 busy ticks
42 system.pci_nvme.cpu.ftl0.insts.branch        21085338                       # CPU for FTL core 0 executed branch instructions
43 system.pci_nvme.cpu.ftl0.insts.load          74418840                       # CPU for FTL core 0 executed load instructions
44 system.pci_nvme.cpu.ftl0.insts.store         12403140                       # CPU for FTL core 0 executed store instructions
45 system.pci_nvme.cpu.ftl0.insts.arithmetic     39276610                       # CPU for FTL core 0 executed arithmetic instructions
46 system.pci_nvme.cpu.ftl0.insts.fp                   0                       # CPU for FTL core 0 executed floating point instructions
47 system.pci_nvme.cpu.ftl0.insts.others         2067190                       # CPU for FTL core 0 executed other instructions

Each line contains one statistic of a component. You can find the DRAMPower results at lines 15-21. This SSD has three cores, and each core is dedicated to one firmware component. You can find the CPU statistics at lines 34-47.

This stats.txt file is generated by gem5 and can be controlled from user space through the m5 utility.
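
For example, inside the simulated guest you can bracket the region of interest so that the dumped statistics cover only that interval (a minimal sketch using the same m5 commands as the script shown later):

m5 resetstats    # reset all statistics to zero
# ... run the workload of interest ...
m5 dumpstats     # append a statistics snapshot to m5out/stats.txt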

Analyze Log from Linux Kernel

To analyze kernel behavior, you need to modify the kernel source code. In SimpleSSD-FullSystem, we added a new M5 pseudo-instruction (M5OP) called m5print. This instruction prints a string to the debug log of gem5 (specified by the --debug-flags and --debug-file options).

Prepare Linux Kernel

In this section, we will explain how to add the M5OP to the Linux Kernel. We assume the ARMv8 (arm64) architecture.

Copy the m5ops.h and m5op_flags.h files from the include/gem5/asm/generic directory of SimpleSSD-FullSystem to arch/arm64/include/asm of the Linux Kernel. Then modify the m5ops.h file to change the include path of m5op_flags.h.

// From
#include <gem5/asm/generic/m5op_flags.h>

// To
#include <asm/m5op_flags.h>
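
For reference, the copy step above might look like this ($SIMPLESSD_FS and $KERNEL are placeholders for the SimpleSSD-FullSystem and Linux Kernel source trees):

# Copy the M5OP headers into the kernel tree
cp $SIMPLESSD_FS/include/gem5/asm/generic/m5ops.h      $KERNEL/arch/arm64/include/asm/
cp $SIMPLESSD_FS/include/gem5/asm/generic/m5op_flags.h $KERNEL/arch/arm64/include/asm/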

Copy the m5op_arm_A64.S file from the util/m5 directory of SimpleSSD-FullSystem to arch/arm64/kernel of the Linux Kernel. Then modify the m5op_arm_A64.S file to change the include path of m5ops.h.

// From
#include <gem5/asm/generic/m5ops.h>

// To
#include <asm/m5ops.h>

Add the following line (m5op_arm_A64.o) to the Makefile in arch/arm64/kernel (line numbers may differ).

16 # Object file lists.
17 arm64-obj-y             := debug-monitors.o entry.o irq.o fpsimd.o              \
18                            entry-fpsimd.o process.o ptrace.o setup.o signal.o   \
19                            sys.o stacktrace.o time.o traps.o io.o vdso.o        \
20                            hyp-stub.o psci.o cpu_ops.o insn.o   \
21                            return_address.o cpuinfo.o cpu_errata.o              \
22                            cpufeature.o alternative.o cacheinfo.o               \
23                            smp.o smp_spin_table.o topology.o smccc-call.o       \
24                            m5op_arm_A64.o
25 
26 extra-$(CONFIG_EFI)                     := efi-entry.o

Copy the m5ops.h file from the include/gem5 directory of SimpleSSD to include/linux of the Linux Kernel. Do not confuse it with the m5ops.h copied to the arch/arm64/include/asm directory earlier.

Modify include/linux/m5ops.h.

// From
#include <stdint.h>
#include <time.h>

#include <gem5/asm/generic/m5op_flags.h>

// To
#include <linux/types.h>
#include <linux/time.h>

#include <asm/m5op_flags.h>

You also need to modify the following lines.

// From line 49 and 66
uint64_t m5_rpns();
void m5_load_symbol();

// To
uint64_t m5_rpns(void);
void m5_load_symbol(void);

Now you are ready to use M5OP inside the Linux Kernel.

Modify Kernel

Now, let's add an m5_print call at the point you want to analyze.

Objective: measure the latency of an MSI-X interrupt.

Since SimpleSSD sends an MSI-X interrupt (see the debug log in the section above), you can measure the latency until the IRQ handler of the NVMe driver is called. Modify drivers/nvme/host/pci.c (line numbers may differ).

854 static irqreturn_t nvme_irq(int irq, void *data)
855 {
856         irqreturn_t result;
857         struct nvme_queue *nvmeq = data;
858         m5_print("nvme_irq", 8);
859         spin_lock(&nvmeq->q_lock);
860         nvme_process_cq(nvmeq);
861         result = nvmeq->cqe_seen ? IRQ_HANDLED : IRQ_NONE;
862         nvmeq->cqe_seen = 0;
863         spin_unlock(&nvmeq->q_lock);
864         return result;
865 }

Don't forget to add #include <linux/m5ops.h> at the beginning of the file.

Build the Linux Kernel and copy it to $M5_PATH/binaries. For kernel build instructions, see Build Linux Kernel for gem5.
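
A typical arm64 cross-build might look like the following (a sketch only; the exact configuration and toolchain are covered in Build Linux Kernel for gem5):

# Cross-compile the kernel and copy the uncompressed image used by gem5
make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- -j$(nproc)
cp vmlinux $M5_PATH/binaries/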

Analyze result

Boot SimpleSSD-FullSystem with the newly built kernel image. Since we only want to check MSI-X, terminate the simulation with m5 exit after the kernel boots.

Here is an example of m5out/debug.txt (upper code block) and m5out/simout (lower code block).

1 5361549000: system.terminal: attach terminal 0
2 436771014500: global: Log from guest: nvme_irq
3 436777643000: global: Log from guest: nvme_irq
4 437013694000: global: Log from guest: nvme_irq
5 437021837500: global: Log from guest: nvme_irq
 91 436732767000: HIL::NVMe: BAR0    | WRITE | Controller Configuration
 92 436732767000: HIL::NVMe: DMAPORT | WRITE | DATA 00460001
 93 436732786500: HIL::NVMe: BAR0    | READ  | Controller Status
 94 436732786500: HIL::NVMe: DMAPORT | READ  | DATA 00000001
 95 436763277000: HIL::NVMe: BAR0    | READ  | Version
 96 436763277000: HIL::NVMe: DMAPORT | READ  | DATA 00010201
 97 436763284000: HIL::NVMe: BAR0    | READ  | Controller Capabilities
 98 436763284000: HIL::NVMe: DMAPORT | READ  | DATA 0020002028010FFF
 99 436765154500: HIL::NVMe: SQ 0    | Submission Queue Tail Doorbell | Item count in queue 0 -> 1 | head 0 | tail 0 -> 1
100 436768910422: HIL::NVMe: ADMIN   | Identify | CNS 1 | CNTID 0 | NSID 0
101 436770652297: HIL::NVMe: INTR    | MSI-X sent | vector 0
102 436771375000: HIL::NVMe: CQ 0    | Completion Queue Head Doorbell | Item count in queue 1 -> 0 | head 0 -> 1 | tail 1
103 436776059000: HIL::NVMe: SQ 0    | Submission Queue Tail Doorbell | Item count in queue 0 -> 1 | head 1 | tail 1 -> 2
104 436777053844: HIL::NVMe: ADMIN   | Set Features | Feature 7 | NSID 0
105 436777179719: HIL::NVMe: INTR    | MSI-X sent | vector 0
106 436778003500: HIL::NVMe: CQ 0    | Completion Queue Head Doorbell | Item count in queue 1 -> 0 | head 1 -> 2 | tail 2

In simout (the lower code block), the MSI-X interrupt was sent at 436,770,652,297 ps (about 436 ms). In debug.txt (the upper code block), the nvme_irq function was called at 436,771,014,500 ps. The difference is 362,203 ps (about 362 ns).

This analysis method using the M5OP has very low overhead (a few CPU cycles) because it is based on a pseudo-instruction. You can add any number of m5_print calls to the Linux Kernel, or even to user-level programs.
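
For a user-level program, one possible approach (a sketch under assumptions; the hello.c file and $SIMPLESSD_FS placeholder are hypothetical) is to write a small C program that declares m5_print to match the prototype in m5ops.h, calls it like the kernel example does, and links against the M5OP stub:

# hello.c (hypothetical) declares and calls m5_print("hello", 5)
aarch64-linux-gnu-gcc -static -I$SIMPLESSD_FS/include -o hello hello.c \
    $SIMPLESSD_FS/util/m5/m5op_arm_A64.S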

To generate the debug.txt file, specify --debug-flags=M5Print and --debug-file=debug.txt when executing SimpleSSD-FullSystem.
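
The invocation might look like this (a sketch; replace the trailing arguments with the options you normally use, as described in Execution of SimpleSSD-FullSystem):

# Enable the M5Print debug flag and redirect its output to m5out/debug.txt
./build/ARM/gem5.opt --debug-flags=M5Print --debug-file=debug.txt \
    configs/example/fs.py <your usual full-system options>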

Analyze Debug Log and Statistics from Simulator

You can also analyze the debug log and statistics of the simulator itself.

SimpleSSD-FullSystem

Similar to Analyze SimpleSSD Log and Statistics, gem5 uses --debug-file and stats.txt to print a detailed debug log of hardware components and their statistics. The stats.txt file can be analyzed in the same way as the SimpleSSD statistics. Below we explain how to use the --debug-flags and --debug-file options of gem5.

--debug-file specifies where to save the hardware debug log (see Execution of SimpleSSD-FullSystem). --debug-flags specifies which hardware modules print debug log output.

The M5Print flag used in the section above selects the debug log printed from the guest OS running on gem5. You can list all available debug flags with the ./build/ARM/gem5.opt --debug-help command.

SimpleSSD-Standalone

For the standalone version, the log printout is quite limited (as of v2.0-rc3).

SimpleSSD Standalone v2.0
********** Begin of simulation **********
*** Statistics of Request Generator ***
Tick: 5000439428548
Time (ps): 79297718 - 5000439428548 (5000360130830)
I/O (bytes): 1393033216 (278586577.676911 B/s)
I/O (counts): 340096 (Read: 170048, Write: 170048)
*** End of statistics ***
*** Statistics of Event Engine ***
Simulation Tick (ps): 5000439428548
Host time duration (sec): 27.220903
Event handled: 10758702 (395236.771106 ops)
*** End of statistics ***
End of simulation @ tick 5000439428548

As shown in the code block above, the statistics of the request generator contain the total I/O count and bandwidth information. The bandwidth is simply the total I/O bytes divided by the simulated time span: 1,393,033,216 B / 5.00036 s ≈ 278.6 MB/s.

Analyze on user-level

If you are using SimpleSSD-FullSystem, you can also analyze SimpleSSD using user-level benchmark programs. Compile the benchmark program (see Build User-Level Program for gem5) and run it inside gem5.

Here is an example script (used with the --script option).

 1 #!/bin/bash
 2 cd /home/root
 3 ./stat 100 &
 4 m5 resetstats
 5 ./fio --direct=1 --ioengine=libaio --iodepth=1 --bs=4096 \
 6 --rw=read --filename=/dev/nvme0n1 --numjobs=1 --name=test --write_lat_log=test \
 7 --write_iops_log=test --write_bw_log=test \
 8 --log_avg_msec=10 --time_based --runtime=5s \
 9 --randseed=13425
10 m5 dumpstats
11 m5 writefile test_clat.1.log
12 m5 writefile test_slat.1.log
13 m5 writefile test_lat.1.log
14 m5 writefile test_iops.1.log
15 m5 writefile test_bw.1.log
16 m5 exit

At line 3, the stat program collects kernel/user CPU and DRAM utilization (through the /proc filesystem). At line 4, all gem5 statistics are reset to zero. Then fio is executed and the final statistics are dumped (line 10). After writing the fio result files back to the host (lines 11-15), the simulation is terminated (line 16).

After the simulation, you can get six files (one for the fio output and five fio log files).
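
The fio log files are plain CSV (time in ms, value, data direction, block size), so they can be summarized directly on the host. The sketch below, assuming m5 writefile placed the files in m5out/, computes a rough average of the logged completion latency; the unit of the value column depends on your fio version (typically usec or nsec).

# Rough average of the logged completion latency values
awk -F', ' '{ sum += $2; n++ } END { if (n) printf "avg clat: %.1f (unit per fio version)\n", sum / n }' \
    m5out/test_clat.1.log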