Temporal Breakdown

To make the best use of GPUs, it is vital to understand where the GPU spends its time for a given job. Is the GPU spending time on computation, communication, or memory events, or is it idle? The temporal breakdown feature splits the time spent into three categories:

Idle time - GPU is idle.
Compute time - GPU is being used for matrix multiplications or vector operations.
Non-compute time - GPU is being used for communication or memory events.

To achieve high training efficiency, the code should maximize compute time and minimize both idle time and non-compute time. This is accomplished by overlapping computation kernels with communication or memory kernels, that is, by executing them concurrently.

Note

During concurrent execution of computation kernels with communication/memory kernels, the time spent by the communication/memory kernels is accounted for under compute time.
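
The accounting rule in the note can be illustrated with a small, self-contained sketch. This is not the library's implementation; the helper names (merge, total, temporal_breakdown), the interval representation, and the sample timestamps are all assumptions made for illustration. Kernels are modeled as (start, end) intervals in microseconds that lie inside a single observation window.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # hypothetical (start_us, end_us) pair, end exclusive


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total(intervals: List[Interval]) -> int:
    """Sum the lengths of a list of disjoint intervals."""
    return sum(end - start for start, end in intervals)


def temporal_breakdown(
    compute: List[Interval], non_compute: List[Interval], window: Interval
) -> Dict[str, int]:
    """Split a window into idle, compute, and non-compute time.

    Assumes every interval lies inside the window. Time during which a
    compute kernel runs always counts as compute, even if a communication
    or memory kernel runs at the same moment.
    """
    compute_time = total(merge(compute))
    busy_time = total(merge(compute + non_compute))
    return {
        "idle": (window[1] - window[0]) - busy_time,
        "compute": compute_time,
        # Only the part of non-compute kernels NOT hidden behind compute.
        "non_compute": busy_time - compute_time,
    }


# Made-up timestamps: the (30, 55) communication kernel overlaps both
# compute kernels, so only its uncovered part (40..50) is non-compute.
breakdown = temporal_breakdown(
    compute=[(10, 40), (50, 70)],
    non_compute=[(30, 55), (80, 90)],
    window=(0, 100),
)
print(breakdown)
```

In this sketch, non-compute time is derived as total busy time minus compute time, which is one simple way to ensure that overlapped communication or memory time is never counted twice.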

The temporal breakdown can be calculated as follows:

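from hta.trace_analysis import TraceAnalysis

# Point the analyzer at the folder that contains the collected traces.
analyzer = TraceAnalysis(trace_dir="/path/to/trace/folder")
time_spent_df = analyzer.get_temporal_breakdown()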

The function returns a dataframe containing the temporal breakdown for each rank. See figure below.

When the visualize argument is set to True, the get_temporal_breakdown function also generates a bar graph representing the breakdown by rank.
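
Once a per-rank breakdown is available, a common next step is to express each category as a share of the total and look for ranks with unusually high idle time. The sketch below shows that arithmetic under stated assumptions: rows are modeled as plain dictionaries rather than a dataframe to keep the example dependency-free, and the key names (rank, idle_time, compute_time, non_compute_time) and all values are hypothetical, not the columns or output of get_temporal_breakdown.

```python
from typing import Dict, List

# Hypothetical per-rank times in microseconds (illustrative values only).
rows: List[Dict[str, float]] = [
    {"rank": 0, "idle_time": 200, "compute_time": 700, "non_compute_time": 100},
    {"rank": 1, "idle_time": 400, "compute_time": 500, "non_compute_time": 100},
]

CATEGORIES = ("idle_time", "compute_time", "non_compute_time")


def with_percentages(row: Dict[str, float]) -> Dict[str, float]:
    """Return a copy of the row with a '<category>_pct' entry per category."""
    total = sum(row[name] for name in CATEGORIES)
    result = dict(row)
    for name in CATEGORIES:
        # Guard against an empty trace where all three times are zero.
        result[name + "_pct"] = 100.0 * row[name] / total if total else 0.0
    return result


percent_rows = [with_percentages(row) for row in rows]

# The rank with the largest idle share is a natural place to start looking.
most_idle = max(percent_rows, key=lambda row: row["idle_time_pct"])
print("rank with highest idle share:", most_idle["rank"])
```

Comparing shares rather than absolute times makes ranks with different total durations directly comparable.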