Kernel Breakdown
The kernel breakdown feature breaks down the time spent for each kernel type i.e. communication (COMM), computation (COMP), and memory (MEM) across all ranks and presents the proportion of time spent in each category. The percentage of time spent in each category as a pie chart.
The kernel breakdown can be calculated as follows:
analyzer = TraceAnalysis(trace_dir = "/path/to/trace/folder")
kernel_type_metrics_df, kernel_metrics_df = analyzer.get_gpu_kernel_breakdown()
The first dataframe returned by the function contains the raw values used to generate the Pie chart.
Kernel Duration Distribution
The second dataframe returned by get_gpu_kernel_breakdown contains duration summary statistics for each kernel. In particular, this includes the count, min, max, average, standard deviation, sum and kernel type for each kernel on each rank.
Using this data HTA creates many visualizations to identify performance bottlenecks.
Pie charts of the top kernels for each kernel type for each rank.
Bar graphs of the average duration across all ranks for each of the top kernels and for each kernel type.
Tip
All images are generated using plotly. Hovering on the graph shows the mode bar on the top right which allows the user to zoom, pan, select and download the graph.
The pie charts above shows the top 5 computation, communication and memory
kernels. Similar pie charts are generated for each rank. The pie charts can be
configured to show the top k kernels using the num_kernels
argument passed to
the get_gpu_kernel_breakdown
function. Additionally, the duration_ratio
argument can be used to tune the
percentage of time that needs to be analyzed. If both num_kernels
and
duration_ratio
are specified, then num_kernels
takes precedence.
The bar graph above shows the average duration of the NCCL AllReduce kernel across all the ranks. The black lines indicate the minimum and maximum time taken on each rank.
Warning
When using jupyter-lab set the “image_renderer” argument value to “jupyterlab” otherwise the graphs will not render in the notebook.
For a detailed walkthrough of this feature see the gpu_kernel_breakdown notebook in the examples folder of the repo.