GPU Tensor Server

A REST API server for managing GPU tensors across processes. It uses CUDA IPC (Inter-Process Communication) when available and gracefully falls back to CPU tensors when no GPU is present.

Key Features

🚀 Zero-Copy Sharing: When CUDA is available, tensors stay on the GPU and are shared by handle
🧠 CPU Fallback: Runs even on CPU-only hosts by serializing tensors as base64 payloads
🎯 CUDA IPC: Uses PyTorch's CUDA IPC for true cross-process GPU memory sharing
🔒 Device Control: Force explicit CUDA device allocation
🆔 Unique References: UUID-based tensor IDs for reliable access
🌐 REST API: Simple HTTP interface for any language/process

Installation

# Install dependencies
./install-stuff.sh

# Activate the virtual environment
source .venv/bin/activate

Usage

Start the Server

python -m tensor_manager.tensor_server

The server will run on http://localhost:8000

API Endpoints

1. Load WAV file as tensor

curl -X POST \
  -F "wav_file=@your_audio.wav" \
  -F "cuda_device=0" \
  http://localhost:8000/tensors

Returns:

{
  "tensor_id": "550e8400-e29b-41d4-a716-446655440000",
  "shape": [32000],
  "dtype": "torch.float32", 
  "device": "cuda:0",
  "sample_rate": 16000,
  "message": "Tensor loaded successfully on cuda:0"
}
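The same upload from Python, using the requests library (a minimal sketch; the field names match the curl call above):

import requests

# Upload a WAV file and pin it to CUDA device 0
with open("your_audio.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/tensors",
        files={"wav_file": f},
        data={"cuda_device": "0"},
    )
resp.raise_for_status()
tensor_id = resp.json()["tensor_id"]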

2. Get tensor information

curl http://localhost:8000/tensors/{tensor_id}

3. Get tensor handle (IPC or base64 payload)

curl http://localhost:8000/tensors/{tensor_id}/handle

If CUDA is available, you'll receive an IPC handle and metadata for zero-copy access:

{
  "tensor_id": "550e8400-e29b-41d4-a716-446655440000",
  "ipc_handle": "base64-encoded-cuda-ipc-handle",
  "shape": [32000],
  "dtype": "torch.float32",
  "device": "cuda:0",
  "data_ptr": 140000000000000,
  "element_size": 4,
  "numel": 32000,
  "sample_rate": 16000
}
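For illustration, here is one way a CUDA-capable client could map such a handle zero-copy, using CuPy's runtime bindings and DLPack. This is a hypothetical sketch, not the bundled TensorClient implementation, and it assumes the float32 payload shown above:

import base64
import cupy as cp
import torch

def map_ipc_handle(payload: dict) -> torch.Tensor:
    # Open the exported CUDA allocation in this process (must be a
    # different process from the one that created the handle)
    ptr = cp.cuda.runtime.ipcOpenMemHandle(base64.b64decode(payload["ipc_handle"]))
    nbytes = payload["numel"] * payload["element_size"]
    mem = cp.cuda.UnownedMemory(ptr, nbytes, owner=None)
    arr = cp.ndarray(tuple(payload["shape"]), dtype=cp.float32,
                     memptr=cp.cuda.MemoryPointer(mem, 0))
    # Hand the same memory to PyTorch via DLPack -- still zero-copy
    return torch.from_dlpack(arr)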

On CPU-only systems you'll instead receive a base64 payload from which the client can rebuild the tensor:

{
  "tensor_id": "550e8400-e29b-41d4-a716-446655440000",
  "data_b64": "base64-encoded-tensor",
  "shape": [32000],
  "dtype": "torch.float32",
  "device": "cpu",
  "element_size": 4,
  "numel": 32000,
  "sample_rate": 16000
}
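Rebuilding the tensor from this payload is straightforward; a minimal sketch, assuming data_b64 encodes the tensor's raw buffer:

import base64
import torch

def rebuild_cpu_tensor(payload: dict) -> torch.Tensor:
    raw = bytearray(base64.b64decode(payload["data_b64"]))  # writable copy
    dtype = getattr(torch, payload["dtype"].split(".")[-1])  # "torch.float32" -> torch.float32
    return torch.frombuffer(raw, dtype=dtype).reshape(payload["shape"])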

4. Delete tensor

curl -X DELETE http://localhost:8000/tensors/{tensor_id}

5. List all tensors

curl http://localhost:8000/tensors

6. Get CUDA device info

curl http://localhost:8000/cuda/info
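From Python, these housekeeping endpoints are plain HTTP calls (a sketch using requests; tensor_id comes from the upload step):

import requests

BASE = "http://localhost:8000"

info = requests.get(f"{BASE}/tensors/{tensor_id}").json()  # single tensor
listing = requests.get(f"{BASE}/tensors").json()           # all tensors
devices = requests.get(f"{BASE}/cuda/info").json()         # CUDA device info
requests.delete(f"{BASE}/tensors/{tensor_id}")             # free the tensor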

Client Usage

Use the TensorClient class for easy integration:

from tensor_manager.tensor_client import TensorClient

client = TensorClient()

# Upload WAV file
tensor_id = client.upload_wav_file("audio.wav", cuda_device=0)

# Access the tensor (zero-copy on CUDA, decoded otherwise)
shared_tensor = client.access_shared_tensor(tensor_id)

# Tensor is moved to CUDA automatically if available
shared_tensor.mul_(2.0)

# Cleanup
client.delete_tensor(tensor_id)

Testing

pytest test_tensor_server.py -v

Examples

# Basic server usage
python usage_example.py

# Client library demonstration  
python -m tensor_manager.tensor_client

How It Works

  1. Server loads the WAV → the tensor lives on the GPU when CUDA and CuPy are available, otherwise on the CPU.
  2. Client requests a handle → the server returns a CUDA IPC handle or a base64 payload, depending on the backend (sketched below).
  3. Client consumes the handle → the client performs a zero-copy mapping on CUDA, or reconstructs the CPU tensor from the decoded bytes.
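Condensed into code, the backend switch in step 2 might look like the following. This is a hypothetical sketch rather than the server's actual code; the CUDA branch is elided because PyTorch exports IPC handles through private storage APIs:

import base64
import torch

def export_tensor(tensor: torch.Tensor) -> dict:
    if tensor.is_cuda:
        # CUDA path: export an IPC handle so peer processes can map
        # the same allocation (elided; uses private PyTorch APIs)
        raise NotImplementedError
    # CPU fallback: ship the raw buffer as base64
    raw = tensor.contiguous().numpy().tobytes()
    return {
        "data_b64": base64.b64encode(raw).decode("ascii"),
        "shape": list(tensor.shape),
        "dtype": str(tensor.dtype),
        "device": str(tensor.device),
        "element_size": tensor.element_size(),
        "numel": tensor.numel(),
    }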

Features

  • True zero-copy GPU tensor sharing
  • CUDA IPC memory handles
  • Explicit CUDA device allocation
  • Unique tensor IDs for reference
  • RESTful API interface
  • Client library for easy integration
  • Comprehensive test suite
  • Manual memory management (no refcounting)
  • Cross-process tensor access
