Using Chakra execution traces for benchmarking and network performance optimization

  • Meta presents Chakra execution traces, an open graph-based representation of AI/ML workload execution, laying the foundation for benchmarking and network performance optimization.
  • Chakra execution traces capture key operations, such as compute, memory, and communication, along with data and control dependencies, timing, and resource constraints.
  • In collaboration with MLCommons, we are seeking industry-wide adoption for benchmarking.
  • Meta has open sourced a set of tools to enable the collection, analysis, generation, and adoption of Chakra execution traces by a broad range of simulators, emulators, and replay tools.

At Meta, our efforts are geared not only toward pushing the boundaries of AI/ML but also toward optimizing the vast networks that enable these computations. An agile, reproducible, and standardized benchmarking system plays an important role in this. Through our collaboration with MLCommons, and drawing on our deep insight into the constraints of traditional benchmarking, we have introduced Chakra execution traces: a graph-based representation of AI/ML workloads. This approach aims to unify diverse execution trace schemas, seeking industry-wide adoption for enhanced AI efficiency analysis tools and holistic performance benchmarking.

The limitations of traditional AI benchmarking methodology

Traditionally, benchmarking AI systems has largely relied on running full ML workloads. Established benchmarking approaches, such as MLPerf, have provided invaluable insights into the behavior and performance of AI workloads and systems. However, traditional full-workload benchmarking presents several challenges:

  1. Difficulty in forecasting future system performance: When designing an AI system, engineers frequently face the challenge of predicting the performance of future systems. Such predictions become even more complex when the compute engines are not ready or when changes in network topology and bandwidth become necessary. Relying on full workloads to evaluate the performance of these not-yet-realized systems is not feasible.
  2. High compute cost: Executing full-workload benchmarks comes at a substantial compute cost. Given that training contemporary ML models often requires thousands of graphics processing units (GPUs), these benchmarks should ideally be executed on a similarly vast number of GPUs. Moreover, gauging the performance of a system using this method can be time-consuming.
  3. Inability to adapt to evolving workloads: The landscape of ML workloads and their requirements is rapidly evolving. Traditional full-workload benchmarks fall short when it comes to addressing these changing needs, primarily because they require significant effort to standardize workloads as benchmarks.

An overview of Chakra

Building on our insights into the limitations of traditional benchmarking, we present Chakra execution traces. This new approach provides an open, interoperable graph-based depiction of AI/ML workload execution. A Chakra execution trace captures core operations, including compute, memory, and communication, together with their dependencies, timing, and metadata.

Although execution traces are a priceless illustration of an ML activity, the construction and metadata of the ensuing traces can differ primarily based on the ML framework utilized. Recognizing this, Chakra introduces a standardized schema for efficiency modeling, termed the Chakra execution hint. The beneath determine outlines the Chakra ecosystem, with execution traces as its central element. As depicted within the determine, Chakra additionally affords a spread of instruments to transform, visualize, generate, and simulate these execution traces.

How Meta leverages Chakra execution traces

At Meta, we collect execution traces from our production servers every day. These execution traces serve multiple purposes: benchmarking, visualization, and performance optimization.


Benchmarking

Benchmarking is essential for enhancing current AI systems and planning future networks, and we rely on Chakra execution traces for this task. We have developed several benchmarking tools, including Mystique and PARAM. Mystique allows us to replicate the performance of an ML workload by replaying both the compute and communication operators found in execution traces. It leverages the Chakra execution trace to record runtime details of a model at the operator level and then replays them to reproduce the original performance. In line with our vision, the MLCommons Chakra working group is curating the 'Chakra trace benchmark suite' by gathering execution traces from various industry players.
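The core of a trace replayer can be sketched as follows. This is not Mystique's actual implementation, only a minimal illustration of the idea, assuming each trace node carries hypothetical 'id', 'deps', and 'op' fields, where 'op' stands in for the recorded compute or communication kernel:

```python
from collections import deque

def replay(trace):
    """Replay trace nodes in dependency order (Kahn's topological sort)."""
    indegree = {n["id"]: len(n["deps"]) for n in trace}
    dependents = {n["id"]: [] for n in trace}
    by_id = {n["id"]: n for n in trace}
    for n in trace:
        for d in n["deps"]:
            dependents[d].append(n["id"])
    ready = deque(i for i, deg in indegree.items() if deg == 0)
    executed = []
    while ready:
        nid = ready.popleft()
        by_id[nid]["op"]()          # re-issue the recorded operator
        executed.append(nid)
        for succ in dependents[nid]:
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)
    return executed

# Tiny example: a communication op that depends on two compute ops.
log = []
trace = [
    {"id": 0, "deps": [], "op": lambda: log.append("matmul")},
    {"id": 1, "deps": [], "op": lambda: log.append("embedding")},
    {"id": 2, "deps": [0, 1], "op": lambda: log.append("all_reduce")},
]
order = replay(trace)
# The dependent all_reduce always runs after both compute ops.
```

Because the replayer re-issues real kernels rather than running the full model, it can reproduce a workload's performance profile at a fraction of the cost.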

Visualization and performance optimization

One example of visualization and performance optimization is the analysis of collective message sizes. We analyze production execution traces using an automated system. The visual data generated helps us identify any balance or imbalance in collective message sizes across different ranks. Our visualization tool can precisely highlight these imbalances, as shown in the figure below.

With this information in hand, Meta engineers are equipped to craft appropriate solutions, ensuring a balanced message size, as demonstrated in the figure below.
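A balance check of this kind can be sketched in a few lines. This is an illustrative stand-in for the automated system, which the post does not detail:

```python
def message_size_imbalance(sizes_by_rank):
    """Return the max/mean ratio of total collective message bytes per rank.

    `sizes_by_rank` maps rank -> list of message sizes (bytes) pulled from
    that rank's execution trace. A ratio near 1.0 means balanced; larger
    values flag ranks doing a disproportionate share of the communication.
    """
    totals = [sum(sizes) for sizes in sizes_by_rank.values()]
    return max(totals) / (sum(totals) / len(totals))

balanced = {0: [1024, 1024], 1: [2048], 2: [1024, 512, 512]}
skewed = {0: [1024], 1: [8192], 2: [1024]}
# Every rank in `balanced` moves 2048 bytes in total, so the ratio is 1.0;
# rank 1 in `skewed` dominates, so the ratio works out to 2.4.
```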

Future plans

Enhancing the benchmarking capabilities of Chakra execution traces

While the execution trace replayer enables replay of execution traces, it brings challenges of its own. A primary challenge is the intrinsic linkage of collected execution traces to specific systems. Because traces are gathered from actual machine runs, the kernels executed are optimized for the specific system at hand. As a result, traces sourced from one system may not simulate accurately on another with a different GPU, network topology, and bandwidth.

We are addressing this constraint in collaboration with the MLCommons Chakra working group. We aim to gather execution traces prior to the operator-optimization phase for any target system, as shown in the figure. These are termed pre-execution traces. In parallel, to enable benchmarking of next-gen AI systems, we are streamlining the process from trace collection to simulation on a simulator.
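To make the portability goal concrete, here is a toy sketch of how a simulator might consume such a trace: estimate end-to-end runtime from the dependency graph, using per-node costs from a cost model of the target system instead of measured timings. This is an assumption about how such a pipeline could look, not the actual MLCommons tooling:

```python
def critical_path_us(nodes):
    """Estimate end-to-end runtime as the longest dependency path.

    `nodes` maps a node id to (cost_us, [dependency ids]). Costs would come
    from a cost model of the *target* system rather than measured timings,
    which is what would make pre-execution traces portable across hardware.
    """
    finish = {}

    def finish_time(nid):
        if nid not in finish:
            cost, deps = nodes[nid]
            finish[nid] = cost + max((finish_time(d) for d in deps), default=0.0)
        return finish[nid]

    return max(finish_time(n) for n in nodes)

# Two parallel compute ops feeding one communication op.
nodes = {0: (120.0, []), 1: (80.0, []), 2: (300.0, [0, 1])}
runtime = critical_path_us(nodes)  # 420.0 = 300 + max(120, 80)
```

Swapping in a different cost model then predicts the same workload's runtime on hardware that does not exist yet.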

Using AI to generate representative execution traces

Chakra execution traces are capable of identifying network bottlenecks in ML workload execution. However, optimizing SW/HW stacks with production execution traces presents a practical challenge. The first difficulty arises when attempting to globally optimize our production systems. Given the sheer volume of production traces, exhaustively running them for system optimization is neither feasible nor efficient; doing so would be both time-consuming and computationally expensive. Thus, selecting a representative subset of production execution traces becomes critical.

However, there is a risk: the selected traces might not holistically represent the global characteristics, potentially skewing optimization efforts toward only specific ML workloads. We envision a generative AI model that can identify and generate execution traces that are representative of the primary characteristics observed. We also plan to incorporate an obfuscation mechanism within the AI model. This will facilitate trace sharing without jeopardizing intellectual property, fostering SW/HW co-design between different companies.
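As a hypothetical baseline for what "selecting a representative subset" could mean (the generative AI model envisioned in the text goes well beyond this sketch), one could cluster traces on simple summary features, e.g., total compute time and total bytes communicated, and keep one trace per cluster:

```python
import random

def pick_representatives(features, k, iters=20, seed=0):
    """Cluster trace feature vectors with naive k-means and return one
    medoid index per cluster as the representative traces."""
    rng = random.Random(seed)
    centers = [list(f) for f in rng.sample(features, k)]

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for i, f in enumerate(features):
            clusters[min(range(k), key=lambda c: dist(f, centers[c]))].append(i)
        for c, members in enumerate(clusters):
            if members:  # keep the old center if a cluster goes empty
                centers[c] = [sum(features[i][d] for i in members) / len(members)
                              for d in range(len(features[0]))]

    # The medoid is the member closest to its cluster center.
    return sorted(min(members, key=lambda i: dist(features[i], centers[c]))
                  for c, members in enumerate(clusters) if members)

# Hypothetical features per trace: (total compute ms, total GB communicated).
features = [(1.0, 1.0), (1.1, 0.9), (10.0, 10.0), (9.8, 10.2)]
reps = pick_representatives(features, k=2)
# One representative is drawn from each of the two obvious groups.
```

A generative model would go further, synthesizing new (and obfuscated) traces that match the observed distribution rather than picking existing ones.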


Taking the leap with industry collaboration

For such an ecosystem to flourish, industry consensus is paramount. Our collaboration with the MLCommons consortium, an open engineering community of over 50 leading companies, is a testament to our commitment. This collaboration aims to establish Chakra within its fold, providing a framework for broad adoption.

Chakra's working group under MLCommons will spearhead efforts to create and develop:

  • A standardized schema that can capture and convert execution traces from diverse frameworks.
  • ML models for creating representative Chakra execution traces, protecting proprietary information while also projecting future AI workloads.
  • An open ecosystem of tools for benchmarks, simulations, and emulations.
  • Comprehensive benchmarks based on Chakra execution traces, following MLCommons/MLPerf guidelines.

Join us on this journey

Our vision is to forge an agile, reproducible benchmarking and co-design system for AI. Collaboration with peers, academic institutions, and consortiums will be pivotal. We invite individuals and companies to become a part of the Chakra working group and help contribute to this paradigm shift in benchmarking and network performance optimization.

Read the research paper

Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces


We would like to thank all contributors to the Chakra project within Meta: Taekyung Heo, Srinivas Sridharan, Brian Coutinho, Hiwot Kassa, Matt Bergeron, Parth Malani, Shashi Gandham, Omar Baldonado, our external partners at Georgia Tech and MLCommons, as well as external collaborators at AMD, CMU, Cornell, Enfabrica, Google, Harvard, HP Labs, Intel, Keysight Technologies, Microsoft, NVIDIA, OCP, and Stanford.