Temporal Breakdown
To best utilize the GPUs it is vital to understand where the GPU is spending time for a given job. Is the GPU spending time on computation, communication, memory events, or is it idle? The temporal breakdown feature breaks down the time spent in three categories
Idle time - GPU is idle.
Compute time - GPU is being used for matrix multiplications or vector operations.
Non-compute time - GPU is being used for communication or memory events.
To achieve high training efficiency the code should maximize compute time and minimize idle time and non-compute time. This is accomplished by implementing concurrent execution of computation kernels with communication or memory kernels.
Note
During concurrent execution of computation kernels with communication/memory kernels the time spent by communication/memory kernels is accounted for under compute time.
The temporal breakdown can be calculated as follows:
analyzer = TraceAnalysis(trace_dir = "/path/to/trace/folder")
time_spent_df = analyzer.get_temporal_breakdown()
The function returns a dataframe containing the temporal breakdown for each rank. See figure below.
When the visualize
argument is set to True, the get_temporal_breakdown
function also generates a bar graph representing the breakdown by rank.