We include an inefficient reference PyTorch implementation in gpt_oss/torch/model.py. This code uses basic PyTorch operators to show the exact model architecture, with a small addition of tensor parallelism support in MoE so that the larger model can run with this code (e.g., ...). The terminal