AI benchmarking tool
We are alpha-testing our new profiler, a specialized tool that runs AI workloads in order to assess how well a given hardware platform handles them. It provides a framework for training and inference across various model archetypes (e.g. large language models, text classification, object recognition, voice synthesis) while profiling the underlying hardware. Monitored metrics include throughput, disk I/O operations per second (IOPS), and memory usage, alongside model-specific statistics such as reported loss.
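As a rough illustration of the profiling approach, the sketch below shows a background sampling loop built only on the Python standard library. The `MetricSampler` name and its probe-callback interface are purely illustrative, not the tool's actual API; in practice the probes would read GPU counters, IOPS, and memory statistics rather than toy values.

```python
import threading
import time

class MetricSampler:
    """Illustrative sketch: periodically invokes registered metric
    callbacks on a background thread and stores timestamped samples."""

    def __init__(self, interval_s=0.01):
        self.interval_s = interval_s
        self.samples = []          # list of (timestamp, {name: value}) tuples
        self._probes = {}
        self._stop = threading.Event()
        self._thread = None

    def register(self, name, probe_fn):
        """Register a zero-argument callable that returns one metric value."""
        self._probes[name] = probe_fn

    def _run(self):
        while not self._stop.is_set():
            stamp = time.monotonic()
            reading = {name: fn() for name, fn in self._probes.items()}
            self.samples.append((stamp, reading))
            self._stop.wait(self.interval_s)

    def __enter__(self):
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        return False
```

A workload would then run inside the `with` block while samples accumulate; afterwards `samples` can be dumped for analysis.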
Features:
Model Training and Inference: Supports training, pretraining, and inference with popular models such as llama-2.
Custom Tokenizer Support: Loads pre-trained tokenizers and extends them with task-specific tokens, such as padding and sequence-delimiter tokens.
Performance Profiling: Integrates with system hardware to log performance metrics during model operations, helping identify bottlenecks and inefficiencies and characterize overall system performance.
Data Handling: Implements custom collate functions for data loaders to handle batches of text data, ensuring efficient processing for LLMs and for image segmentation and classification models.
Image Segmentation Training Workflow: Lets users extend chosen models through additional training on designated datasets. A versatile data loader preprocesses various Hugging Face image segmentation datasets for seamless integration into the training pipeline.
Image Classification Training Workflow: Enables full customization of training configurations for image classification models, letting users tailor every aspect of the training process to their requirements, while also providing default configurations for streamlined training of basic models.
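The custom collate behavior mentioned above can be sketched in plain Python. This is a minimal, illustrative version for text batches only; the function name, the `pad_id` default, and the returned dictionary keys are assumptions modeled on common LLM data-loader conventions, not the tool's actual interface.

```python
def collate_text_batch(batch, pad_id=0):
    """Pad variable-length token-id sequences to the longest sequence in
    the batch, returning padded ids plus an attention mask (1 = real
    token, 0 = padding)."""
    max_len = max(len(seq) for seq in batch)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in batch]
    attention_mask = [
        [1] * len(seq) + [0] * (max_len - len(seq)) for seq in batch
    ]
    return {"input_ids": input_ids, "attention_mask": attention_mask}
```

A real data loader would produce tensors rather than lists, but the padding-and-mask logic is the same.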
Data collected during the benchmarking process
Performance metrics
- CPU/GPU utilization percentage over time
- CPU/GPU memory utilization over time
- Disk I/O: data transfer between CPU, GPU, and disk
- Loss values over time for ML tasks
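The loss values in the list above would typically be recorded per training step alongside the hardware samples. A minimal sketch of such a logger is shown below; `LossLogger` and its methods are illustrative names, not the tool's actual API.

```python
from collections import deque

class LossLogger:
    """Illustrative sketch: records (step, loss) pairs and exposes a
    moving average over the most recent `window` steps."""

    def __init__(self, window=100):
        self.history = []                  # full (step, loss) trace
        self._recent = deque(maxlen=window)

    def log(self, step, loss):
        self.history.append((step, loss))
        self._recent.append(loss)

    def moving_average(self):
        """Mean loss over the most recent window of logged steps."""
        return sum(self._recent) / len(self._recent)
```

Pairing the timestamped loss trace with the hardware metric timeline makes it possible to correlate, say, a throughput dip with a particular phase of training.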
Hardware parameters
- Clock speed: This is the speed at which the processor operates, measured in GHz. Higher clock speeds generally result in better performance, but there are other factors that can impact performance as well.
- Core count: number of processing cores in the processor. More cores generally result in better performance for multi-threaded workloads.
- Memory bandwidth: rate at which data can be read from or written to memory, measured in GB/s. Higher memory bandwidth can result in better performance for memory-intensive workloads.
- Floating-point operations per second (FLOPS): measure of the number of floating-point operations that can be performed per second, measured in GFLOPS or TFLOPS. This metric is particularly important for scientific computing and other workloads that involve a lot of numerical calculations.
- Power consumption: amount of power consumed by the processor, measured in watts. Lower power consumption can be important for applications that need to run on battery-powered devices or in environments with limited power.
- Thermal design power (TDP): maximum amount of power that the processor is designed to consume, measured in watts. This metric can be useful for evaluating the cooling requirements for the processor.
- Benchmark scores: measure of the performance of the processor on specific benchmarks or workloads. Benchmark scores can be useful for comparing the performance of different processors under similar workloads.
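The clock speed, core count, and FLOPS figures above are related: a processor's theoretical peak is roughly cores × clock × FLOPs issued per cycle per core. The sketch below shows that arithmetic; the example figure of 16 FP32 FLOPs per cycle (one 8-wide FMA unit, counting multiply and add separately) is an illustrative assumption, and real chips vary.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak throughput in GFLOPS:
    cores x clock (GHz) x FLOPs issued per cycle per core.
    Real workloads reach only a fraction of this ceiling."""
    return cores * clock_ghz * flops_per_cycle

# Example: 8 cores at 3.5 GHz, assuming 16 FP32 FLOPs/cycle per core
# (one 8-wide fused multiply-add unit):
# 8 * 3.5 * 16 -> 448.0 GFLOPS theoretical peak
```

Comparing measured throughput against this ceiling is one way to judge whether a workload is compute-bound or limited by something else, such as memory bandwidth.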