This document describes the high-performance radial integration (often called "caking") component of the MIDAS toolkit developed at Argonne National Laboratory. The component uses a multi-process, GPU-accelerated architecture to process 2D X-ray diffraction (XRD), SAXS, and WAXS data quickly and accurately, which makes it well suited to high-speed streaming analysis workflows at facilities such as the Advanced Photon Source (APS).

Key Features of the Radial Integration Component
Exceptional Speed via Multi-Process GPU Acceleration: Reaches high processing speeds by running multiple processes concurrently on the GPU.
Demonstrated throughput of up to 200 GPx/s (gigapixels per second) on NVIDIA H100 GPUs (10 processes) and 50 GPx/s on NVIDIA RTX 3090 GPUs (15 processes).
In benchmarks on similar hardware classes, this multi-process approach is substantially faster than the documented GPU performance of pyFAI, which is 1.47 GPx/s with 2 processes on an RTX 3090, or 2.5 GPx/s as cited on the pyFAI website. The reported speedup is ~34x, consistent with 50 GPx/s versus 1.47 GPx/s on the RTX 3090.
Also offers efficient multi-CPU utilization (e.g., ~3 GPx/s across 20+ jobs).
High Accuracy Integration:
Performs accurate sub-pixel integration using full pixel splitting, ensuring high fidelity conversion from 2D detector space to 1D or 2D integrated patterns.
Corrects detector distortion with up to 10x better accuracy than some standard libraries (e.g., 1 × 10⁻⁵ strain for GE detectors).
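
The idea behind pixel splitting can be illustrated with a deliberately simplified sketch. The code below is not the MIDAS implementation: it assumes a pixel is described only by the radial interval it covers and shares its intensity among the radial bins in proportion to the overlap length (a one-dimensional, bounding-box style of splitting). Full pixel splitting instead uses the overlap area of the actual pixel polygon with each bin. The function name `split_pixel_radially` and its parameters are hypothetical.

```python
import numpy as np

def split_pixel_radially(r_lo, r_hi, value, bin_edges):
    """Share one pixel's intensity among radial bins by overlap length.

    Simplified 1D illustration (assumption): the pixel is treated as the
    radial interval [r_lo, r_hi], not as a 2D polygon.
    """
    width = r_hi - r_lo
    if width <= 0:
        raise ValueError("r_hi must be greater than r_lo")
    # Overlap of the pixel interval with every bin [edge_i, edge_{i+1}].
    lo = np.maximum(bin_edges[:-1], r_lo)
    hi = np.minimum(bin_edges[1:], r_hi)
    overlap = np.clip(hi - lo, 0.0, None)
    # Each bin receives the fraction of the pixel that falls inside it.
    return value * overlap / width

edges = np.array([0.0, 1.0, 2.0, 3.0])
# A pixel spanning r = 0.5 .. 1.5 straddles the first two bins equally.
print(split_pixel_radially(0.5, 1.5, 10.0, edges))  # → [5. 5. 0.]
```

When the pixel lies entirely inside the binned range, the contributions sum to the original pixel value, so splitting redistributes intensity without creating or losing any.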
Efficient Algorithm:
Generates a detector map (geometry look-up table) once per experimental configuration.
Re-uses this map for each subsequent frame, significantly speeding up processing for large datasets or streaming data.
Supports integration into 1D or 2D outputs and can efficiently process chunks of multiple frames.
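
The map-once, reuse-per-frame pattern described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the MIDAS code or its API: it assumes an undistorted flat detector, measures radius in pixels from a given beam center, and assigns each pixel to its nearest radial bin without any pixel splitting. The names `build_radial_map` and `integrate_frame` are hypothetical.

```python
import numpy as np

def build_radial_map(shape, center, n_bins, r_max):
    """Compute once per geometry: the radial bin index of every usable pixel."""
    rows, cols = np.indices(shape)
    r = np.hypot(rows - center[0], cols - center[1])
    bin_idx = np.floor(r / r_max * n_bins).astype(np.int64)
    valid = bin_idx < n_bins          # drop pixels beyond r_max
    bin_idx = bin_idx[valid]          # flattened, same order as frame[valid]
    counts = np.bincount(bin_idx, minlength=n_bins)
    return valid, bin_idx, counts

def integrate_frame(frame, radial_map):
    """Per frame: reuse the precomputed map to get the mean intensity per bin."""
    valid, bin_idx, counts = radial_map
    sums = np.bincount(bin_idx, weights=frame[valid], minlength=counts.size)
    profile = np.zeros(counts.size)
    np.divide(sums, counts, out=profile, where=counts > 0)
    return profile

# The map is built a single time, then applied to a chunk of frames.
radial_map = build_radial_map((64, 64), (31.5, 31.5), n_bins=20, r_max=40.0)
frames = np.ones((3, 64, 64))
profiles = np.array([integrate_frame(f, radial_map) for f in frames])
print(profiles.shape)
```

All geometry work (radius calculation and binning) happens in `build_radial_map`, so the per-frame cost reduces to a weighted histogram, which is the part that benefits from being repeated across many frames.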
Workflow Integration & Automation:
Designed to work seamlessly within automated analysis pipelines.
When combined with the automated calibration tools available in the broader MIDAS toolkit, it enables fully automated geometry determination.
Direct export to GSAS-II format and built-in peak fitting make it a capable engine for automated, high-speed streaming analysis of SAXS/WAXS data, including Rietveld refinement.
Architecture & Usability:
Provides a Python interface that controls a highly optimized CUDA backend for GPU execution.
Uses a server-client architecture (e.g., based on pvAccess/PvaPy), so that computationally intensive integration runs on dedicated GPU servers (such as 'Califone') that various beamline clients can access.
Real-time Capability:
The high throughput enables real-time analysis during experiments. In a demonstration, the component tracked and fit peak characteristics (position, FWHM, area) from streaming detector data (e.g., 10 MB per image at 5 Hz, or about 50 MB/s) with live plot updates (e.g., at a 1 Hz update rate), even when limited by network bandwidth.

This specialized radial integration component within the MIDAS framework provides the speed and accuracy needed for modern, data-intensive synchrotron experiments and forms a crucial part of automated data processing and analysis workflows. Its performance advantage stems largely from its effective multi-process use of GPU resources.

Project Members (Primary for this component)
Hemant Sharma
Sinisa Veseli