
lightrft.strategy.sglang_utils.sgl_model_saver

This module provides memory management utilities for SGLang model execution.

The module implements memory saving functionality by allowing temporary release and restoration of GPU memory occupied by model weights and states. This is particularly useful in scenarios where multiple models or processes need to share limited GPU memory resources efficiently.

This module is designed to be compatible with different versions of SGLang:

  - For SGLang v0.5.6.post2+: uses the built-in methods from SchedulerUpdateWeightsMixin.

  - For older versions: provides backward-compatible monkey patching.

The module automatically detects which approach to use based on the SGLang version.
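The version check can be sketched as follows. This is a minimal illustration, assuming the module compares a version string against the v0.5.6.post2 threshold and only monkey-patches a method the class does not already define. The helpers `parse_version`, `needs_patch`, and `install_patch` are hypothetical, and the real module may rely on feature detection (for example, `hasattr`) rather than version parsing.

```python
import re

# Threshold from the module docs: v0.5.6.post2 and later ship built-in methods.
BUILTIN_SINCE = (0, 5, 6, 2)


def parse_version(version: str) -> tuple:
    """Parse 'X.Y.Z' or 'X.Y.Z.postN' into (X, Y, Z, N), with N = 0 if absent.

    Hypothetical helper: other suffixes (rc, dev) are rejected, not guessed.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:\.post(\d+))?", version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch, post = match.groups()
    return (int(major), int(minor), int(patch), int(post or 0))


def needs_patch(version: str) -> bool:
    """True for SGLang versions older than the built-in threshold."""
    return parse_version(version) < BUILTIN_SINCE


def install_patch(cls, name: str, func) -> bool:
    """Attach `func` as `cls.<name>` only if the class does not define it yet."""
    if hasattr(cls, name):
        return False  # built-in method present: leave it alone
    setattr(cls, name, func)
    return True
```

Checking `hasattr` inside `install_patch` means a newer SGLang that already provides the method is never overwritten, even if the version string is misreported.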

lightrft.strategy.sglang_utils.sgl_model_saver.release_memory_occupation(self, recv_req: sglang.srt.managers.io_struct.ReleaseMemoryOccupationReqInput)

Release memory occupation by stashing model weights and states to CPU memory.

This method temporarily frees GPU memory by copying model parameters and static states to CPU memory and then releasing their GPU storage. It is intended for periods when the model is idle, so that other processes or models can use the freed GPU memory.

It stays compatible with both old and new SGLang versions by detecting where the model runner is located.
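Detecting the model runner location can be sketched as trying candidate attribute paths in order. The dotted paths below (`tp_worker.worker.model_runner` and `tp_worker.model_runner`) are assumptions for illustration, not confirmed SGLang internals, and `resolve_path` and `find_model_runner` are hypothetical helpers.

```python
from types import SimpleNamespace

# Hypothetical candidate locations, tried in order; adjust to the SGLang
# versions you actually target.
CANDIDATE_PATHS = ("tp_worker.worker.model_runner", "tp_worker.model_runner")


def resolve_path(obj, dotted: str):
    """Follow a dotted attribute path, returning None if any hop is missing."""
    for name in dotted.split("."):
        obj = getattr(obj, name, None)
        if obj is None:
            return None
    return obj


def find_model_runner(scheduler, paths=CANDIDATE_PATHS):
    """Return the first model runner found along `paths`, else raise."""
    for path in paths:
        runner = resolve_path(scheduler, path)
        if runner is not None:
            return runner
    raise AttributeError(f"no model runner found at any of {paths}")


# Usage with stand-in objects shaped like the two assumed layouts.
old_style = SimpleNamespace(tp_worker=SimpleNamespace(worker=SimpleNamespace(model_runner="old runner")))
new_style = SimpleNamespace(tp_worker=SimpleNamespace(model_runner="new runner"))
print(find_model_runner(old_style))  # old runner
print(find_model_runner(new_style))  # new runner
```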

The method performs the following operations:
  1. Validates the memory saver adapter

  2. Exports and stashes the model’s static state

  3. Clones model parameters to CPU memory if not already done

  4. Pauses the memory saver adapter

  5. Flushes the model cache
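The five release steps can be mirrored in a self-contained toy, assuming a simplified memory saver adapter whose `pause()` drops the storage of registered parameters. `ToyParam`, `ToyAdapter`, and `release_memory` are hypothetical stand-ins for tensors, the memory saver adapter, and the patched method; plain Python lists replace GPU tensors so the sketch runs without a GPU.

```python
class ToyParam:
    """Stand-in for a GPU tensor; `data` becomes None once its memory is freed."""

    def __init__(self, data):
        self.data = list(data)


class ToyAdapter:
    """Hypothetical memory saver adapter that frees registered params on pause()."""

    def __init__(self, params, enabled=True):
        self.params = params
        self.enabled = enabled
        self.paused = False

    def pause(self):
        for p in self.params.values():
            p.data = None  # simulate returning the GPU memory
        self.paused = True


def release_memory(holder, adapter, params, static_state, cache):
    """Mirror the five release steps; `holder` plays the role of `self`."""
    # 1. Validate the adapter before touching any state.
    if not adapter.enabled:
        raise RuntimeError("memory saver adapter is not enabled")
    # 2. Export and stash the static state (e.g. non-parameter buffers).
    holder["stashed_static_state"] = dict(static_state)
    # 3. Clone parameters to "CPU" only if no copy exists yet.
    if "cpu_params" not in holder:
        holder["cpu_params"] = {name: list(p.data) for name, p in params.items()}
    # 4. Pause the adapter, which frees the parameter storage.
    adapter.pause()
    # 5. Flush the cache so no stale entries outlive the release.
    cache.clear()
    return {"success": True}


params = {"w": ToyParam([1.0, 2.0])}
adapter = ToyAdapter(params)
holder, cache = {}, {"prefix": "cached"}
print(release_memory(holder, adapter, params, {"rope": [0.5]}, cache))  # {'success': True}
print(params["w"].data, holder["cpu_params"])  # None {'w': [1.0, 2.0]}
```

The ordering matters in this sketch: the weights must be copied out in step 3 before `pause()` discards them in step 4.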

Parameters:

recv_req (ReleaseMemoryOccupationReqInput) – Request input for releasing memory occupation

Returns:

Response indicating successful memory release

Return type:

ReleaseMemoryOccupationReqOutput

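Example:
>>> from sglang.srt.managers.io_struct import ReleaseMemoryOccupationReqInput
>>> from sglang.srt.managers.scheduler import Scheduler
>>> scheduler = Scheduler(...)
>>> req = ReleaseMemoryOccupationReqInput()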
>>> response = scheduler.release_memory_occupation(req)
>>> # GPU memory is now freed for other uses

lightrft.strategy.sglang_utils.sgl_model_saver.resume_memory_occupation(self, recv_req: sglang.srt.managers.io_struct.ResumeMemoryOccupationReqInput)

Resume memory occupation by restoring model weights and states from CPU memory.

This method restores the model to its fully operational state by loading back the previously stashed model parameters and static states from CPU memory to GPU. It should be called after release_memory_occupation() when the model needs to be used again.

It stays compatible with both old and new SGLang versions by detecting where the model runner is located.

The method performs the following operations:
  1. Validates the memory saver adapter

  2. Resumes the memory saver adapter

  3. Imports the previously stashed static state

  4. Restores model parameters from CPU to GPU

  5. Cleans up temporary static state storage
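The restore path can be sketched the same way. This toy assumes, as a simplification, that resuming the adapter hands back storage with undefined contents (modeled here as NaNs), which is why the stashed static state and the CPU copy of the weights must be copied back afterwards. `ToyParam`, `ToyAdapter`, and `resume_memory` are hypothetical, and the `holder` dict stands in for the attributes stashed on the scheduler.

```python
import math


class ToyParam:
    """Stand-in for a GPU tensor; `data` is None while its memory is released."""

    def __init__(self, size):
        self.size = size
        self.data = None


class ToyAdapter:
    """Hypothetical adapter whose resume() reallocates storage with junk contents."""

    def __init__(self, params, enabled=True):
        self.params = params
        self.enabled = enabled
        self.paused = True

    def resume(self):
        for p in self.params.values():
            p.data = [math.nan] * p.size  # memory is back, old values are not
        self.paused = False


def resume_memory(holder, adapter, params, static_state):
    """Mirror the five resume steps; `holder` plays the role of `self`."""
    # 1. Validate the adapter before touching any state.
    if not adapter.enabled:
        raise RuntimeError("memory saver adapter is not enabled")
    # 2. Resume the adapter so parameter storage exists again.
    adapter.resume()
    # 3. Import the previously stashed static state.
    static_state.update(holder["stashed_static_state"])
    # 4. Copy the CPU weights into the reallocated storage in place.
    for name, p in params.items():
        p.data[:] = holder["cpu_params"][name]
    # 5. Drop the temporary static-state stash; the CPU weights stay cached.
    del holder["stashed_static_state"]
    return {"success": True}


params = {"w": ToyParam(2)}
adapter = ToyAdapter(params)
holder = {"stashed_static_state": {"rope": [0.5]}, "cpu_params": {"w": [1.0, 2.0]}}
static_state = {}
print(resume_memory(holder, adapter, params, static_state))  # {'success': True}
print(params["w"].data, static_state)  # [1.0, 2.0] {'rope': [0.5]}
```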

Parameters:

recv_req (ResumeMemoryOccupationReqInput) – Request input for resuming memory occupation

Returns:

Response indicating successful memory restoration

Return type:

ResumeMemoryOccupationReqOutput

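Example:
>>> from sglang.srt.managers.io_struct import ResumeMemoryOccupationReqInput
>>> from sglang.srt.managers.scheduler import Scheduler
>>> scheduler = Scheduler(...)
>>> # After previously calling release_memory_occupation()
>>> req = ResumeMemoryOccupationReqInput()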
>>> response = scheduler.resume_memory_occupation(req)
>>> # Model is now ready for inference again
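Because resume_memory_occupation() must follow a matching release_memory_occupation(), one way to guarantee the pairing is a context manager that resumes even if the work in between raises. This is a sketch, not part of the module: `memory_released` and `RecordingScheduler` are hypothetical, and the request factories are passed in so the snippet does not import SGLang (with SGLang installed, they could be the `ReleaseMemoryOccupationReqInput` and `ResumeMemoryOccupationReqInput` classes).

```python
from contextlib import contextmanager


@contextmanager
def memory_released(scheduler, make_release_req, make_resume_req):
    """Release GPU memory on entry; always resume it on exit."""
    scheduler.release_memory_occupation(make_release_req())
    try:
        yield scheduler
    finally:
        scheduler.resume_memory_occupation(make_resume_req())


class RecordingScheduler:
    """Minimal stand-in that records which methods were called."""

    def __init__(self):
        self.calls = []

    def release_memory_occupation(self, req):
        self.calls.append(("release", req))

    def resume_memory_occupation(self, req):
        self.calls.append(("resume", req))


sched = RecordingScheduler()
try:
    with memory_released(sched, dict, list):
        raise RuntimeError("training step failed")
except RuntimeError:
    pass
print([name for name, _ in sched.calls])  # ['release', 'resume']
```

The release call sits outside the `try` block, so a failed release does not trigger a spurious resume.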