Temporal Breakdown

To use GPUs efficiently, it is vital to understand where the GPU spends its time during a given job: on computation, on communication, on memory events, or sitting idle. The temporal breakdown feature splits the time spent into three categories:

  1. Idle time - GPU is idle.

  2. Compute time - GPU is being used for matrix multiplications or vector operations.

  3. Non-compute time - GPU is being used for communication or memory events.
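To place a GPU kernel in one of these categories, a tool has to decide whether the kernel is computation, communication, or a memory event. The sketch below shows one possible name-based heuristic. It is an assumption for illustration and not necessarily how this feature classifies kernels: it treats names containing "nccl" as communication and names containing "Memcpy" or "Memset" as memory events. The kernel names and the `classify_kernel` helper are hypothetical.

```python
from enum import Enum


class KernelCategory(Enum):
    COMPUTE = "compute"
    COMMUNICATION = "communication"
    MEMORY = "memory"


def classify_kernel(name: str) -> KernelCategory:
    """Assign a GPU kernel to a category using a name-based heuristic.

    Assumption (hypothetical rule): NCCL kernels are communication,
    Memcpy/Memset kernels are memory events, everything else is compute.
    """
    lowered = name.lower()
    if "nccl" in lowered:
        return KernelCategory.COMMUNICATION
    if "memcpy" in lowered or "memset" in lowered:
        return KernelCategory.MEMORY
    return KernelCategory.COMPUTE


# Communication and memory kernels both count toward non-compute time.
NON_COMPUTE = {KernelCategory.COMMUNICATION, KernelCategory.MEMORY}

# Illustrative kernel names only.
kernels = [
    "ampere_sgemm_128x64_tn",
    "ncclKernel_AllReduce_RING_LL_Sum_float",
    "Memcpy HtoD (Pageable -> Device)",
    "vectorized_elementwise_kernel",
]
for kernel in kernels:
    category = classify_kernel(kernel)
    label = "non-compute" if category in NON_COMPUTE else "compute"
    print(f"{kernel}: {label}")
```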

To achieve high training efficiency, the code should maximize compute time and minimize both idle time and non-compute time. This is accomplished by overlapping computation kernels with communication or memory kernels so that they execute concurrently.
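Why overlap helps can be seen with a toy timing model. The sketch below is a hypothetical illustration, not a measurement: it assumes each layer's communication (for example, a gradient all-reduce) depends only on that layer's compute, runs on a separate stream, and can start as soon as that compute finishes. The per-layer durations are made up.

```python
def serial_step_time(compute: list[float], comm: list[float]) -> float:
    """No overlap: every compute and communication kernel runs one after another."""
    return sum(compute) + sum(comm)


def overlapped_step_time(compute: list[float], comm: list[float]) -> float:
    """Compute runs back-to-back on one stream; layer i's communication runs on a
    second stream once layer i's compute has finished and the previous
    communication kernel has completed."""
    compute_end = 0.0
    comm_end = 0.0
    for c, m in zip(compute, comm):
        compute_end += c  # this layer's compute finishes here
        # Communication waits for its input data and for the previous comm kernel.
        comm_end = max(compute_end, comm_end) + m
    return max(compute_end, comm_end)


compute_ms = [10.0, 10.0, 10.0, 10.0]  # hypothetical per-layer compute durations
comm_ms = [4.0, 4.0, 4.0, 4.0]  # hypothetical per-layer communication durations
print(serial_step_time(compute_ms, comm_ms))  # 56.0
print(overlapped_step_time(compute_ms, comm_ms))  # 44.0
```

In the overlapped case, 12 of the 16 ms of communication run underneath compute kernels, and only the final 4 ms remain exposed.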

Note

When computation kernels execute concurrently with communication/memory kernels, the time spent by the communication/memory kernels is counted as compute time.
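The accounting rule in the note can be expressed with interval arithmetic. The sketch below is a minimal illustration under stated assumptions, not this feature's implementation. It assumes that compute time is the union of compute-kernel intervals, that non-compute time is only the part of communication/memory intervals not covered by compute, and that idle time is whatever remains of an analysis window, which is passed in explicitly. The function names and kernel timings are hypothetical.

```python
Interval = tuple[float, float]  # (start, end), e.g. in microseconds


def merge(intervals: list[Interval]) -> list[Interval]:
    """Return the union of possibly overlapping intervals, sorted and disjoint."""
    merged: list[list[float]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current run
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def total_length(intervals: list[Interval]) -> float:
    return sum(e - s for s, e in merge(intervals))


def temporal_breakdown(compute: list[Interval], non_compute: list[Interval],
                       window: Interval) -> dict:
    """Split a window into idle, compute, and non-compute time.

    Communication/memory time that overlaps compute is counted as compute."""
    compute_time = total_length(compute)
    busy_time = total_length(compute + non_compute)
    non_compute_time = busy_time - compute_time  # only the non-overlapped part
    idle_time = (window[1] - window[0]) - busy_time
    return {"idle": idle_time, "compute": compute_time, "non_compute": non_compute_time}


compute_kernels = [(0, 40), (50, 90)]
comm_kernels = [(30, 60)]  # 30 us long, but 20 us of it overlaps compute
print(temporal_breakdown(compute_kernels, comm_kernels, (0, 100)))
# {'idle': 10, 'compute': 80, 'non_compute': 10}
```

The communication kernel runs for 30 us, yet only the 10 us gap between 40 and 50 is reported as non-compute time.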

The temporal breakdown can be calculated as follows:

from hta.trace_analysis import TraceAnalysis

analyzer = TraceAnalysis(trace_dir="/path/to/trace/folder")
time_spent_df = analyzer.get_temporal_breakdown()

The function returns a dataframe containing the temporal breakdown for each rank. See figure below.

[Figure: temporal breakdown dataframe, one row per rank]

When the visualize argument is set to True, the get_temporal_breakdown function also generates a bar graph representing the breakdown by rank.

[Figure: bar graph of the temporal breakdown by rank]