Trace Collection ================ Trace collection in PyTorch is enabled by wrapping the training/inference loop in a ``profile`` context. A couple of useful options to know about are ``tracing schedule`` and ``trace handler``. The `tracing schedule` allows the user to specify how many steps we can skip, wait, warmup the profiler, record the activity and finally how many times to repeat the process. During the warmup, the profiler is running but no events are being recorded hence there is no profiling overhead. The `trace handler` allows to specify the output folder along with the option to gzip the trace file. Given that trace files can easily run into hundreds of MBs this is useful to have. The ``profile`` context also gives options to record either or both CPU and GPU events using the activities argument. Users can also record the shapes of the tensors with ``record_shapes`` argument and collect the python call stack with the ``with_stack`` argument. The ``with_stack`` argument is especially helpful in connecting the trace event to the source code, which enables faster debugging. The ``profile_memory`` option allows tracking tensor memory allocations and deallocations. To profile, wrap the code in the ``profile`` context manager as shown below. .. code-block:: python :linenos: :emphasize-lines: 17 from torch.profiler import profile, schedule, tensorboard_trace_handler tracing_schedule = schedule(skip_first=5, wait=5, warmup=2, active=2, repeat=1) trace_handler = tensorboard_trace_handler(dir_name=/output/folder, use_gzip=True) with profile( activities = [ProfilerActivity.CPU, ProfilerActivity.CUDA], schedule = tracing_schedule, on_trace_ready = trace_handler, profile_memory = True, record_shapes = True, with_stack = True ) as prof: for step, batch_data in enumerate(data_loader): train(batch_data) prof.step() Line 17 in the code snippet above signals to the profiler that a training iteration has completed.