Trace Collection
Trace collection in PyTorch is enabled by wrapping the training/inference loop in a profile context. Two useful options to know about are the tracing schedule and the trace handler. The tracing schedule lets the user specify how many steps to skip at the start, how many to wait, how many to use for warming up the profiler, how many to actively record, and finally how many times to repeat this cycle. During warmup the profiler is running but events are discarded, so the recorded portion of the trace is not skewed by the profiler's start-up overhead. The trace handler allows the user to specify the output folder, along with an option to gzip the trace file. Given that trace files can easily run into hundreds of MBs, this is useful to have.
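The phase cycling that the schedule performs can be sketched in plain Python. The helper below, phase_for_step, is hypothetical and not part of the torch.profiler API; it only mimics how a schedule maps a global step index to a profiler phase.

```python
def phase_for_step(step, skip_first, wait, warmup, active, repeat):
    # Hypothetical helper mimicking torch.profiler.schedule:
    # maps a global step index to the phase the profiler is in.
    if step < skip_first:
        return "skip"
    step -= skip_first
    cycle = wait + warmup + active
    if repeat > 0 and step >= cycle * repeat:
        return "done"          # all requested cycles finished
    pos = step % cycle
    if pos < wait:
        return "wait"          # profiler idle
    if pos < wait + warmup:
        return "warmup"        # profiler running, events discarded
    return "active"            # events recorded into the trace

# With skip_first=5, wait=5, warmup=2, active=2, repeat=1
# (the values used in the snippet below):
phases = [phase_for_step(s, 5, 5, 2, 2, 1) for s in range(15)]
print(phases)
```

Steps 0-4 are skipped, 5-9 wait, 10-11 warm up, 12-13 are actively recorded, and from step 14 onward the profiler is done.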
The profile context also offers options to record CPU events, GPU events, or both via the activities argument. Users can also record the shapes of the tensors with the record_shapes argument and collect the Python call stack with the with_stack argument. The with_stack argument is especially helpful for connecting a trace event back to the source code, which enables faster debugging. The profile_memory option allows tracking tensor memory allocations and deallocations.
To profile, wrap the code in the profile
context manager as shown below.
1from torch.profiler import profile, schedule, tensorboard_trace_handler, ProfilerActivity
2
3tracing_schedule = schedule(skip_first=5, wait=5, warmup=2, active=2, repeat=1)
4trace_handler = tensorboard_trace_handler(dir_name="/output/folder", use_gzip=True)
5
6with profile(
7    activities = [ProfilerActivity.CPU, ProfilerActivity.CUDA],
8    schedule = tracing_schedule,
9    on_trace_ready = trace_handler,
10    profile_memory = True,
11    record_shapes = True,
12    with_stack = True
13) as prof:
14
15    for step, batch_data in enumerate(data_loader):
16        train(batch_data)
17        prof.step()
Line 17 in the code snippet above signals to the profiler that a training iteration has completed.
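Besides the trace file written by the trace handler, the profiler object can also print an aggregated summary of operator statistics after the context exits. A minimal CPU-only sketch (no schedule or trace handler, so it runs standalone):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Profile a trivial CPU workload.
with profile(activities=[ProfilerActivity.CPU]) as prof:
    x = torch.randn(256, 256)
    y = x @ x  # the matmul shows up as an aten:: operator event

# key_averages() aggregates events by operator name;
# table() renders them sorted by total CPU time.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```

This is handy for a quick look at the most expensive operators before opening the full trace in a viewer.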