Skip to content

Update version to 0.19.0 and enhance file upload functionality#69

Merged
parashardhapola merged 4 commits intomasterfrom
r2_upload
Mar 7, 2026
Merged

Update version to 0.19.0 and enhance file upload functionality#69
parashardhapola merged 4 commits intomasterfrom
r2_upload

Conversation

@parashardhapola
Copy link
Member

  • Bump package version to 0.19.0.
  • Increase maximum workers for file uploads from 4 to 6.
  • Introduce support for uploading chunks to presigned URLs, including progress reporting and error handling.

- Bump package version to 0.19.0.
- Increase maximum workers for file uploads from 4 to 6.
- Introduce support for uploading chunks to presigned URLs, including progress reporting and error handling.
- Refactor upload logic to accommodate both presigned URL and server uploads, improving overall upload reliability.
@qodo-code-review
Copy link

Review Summary by Qodo

Add R2 presigned URL upload support and increase worker threads

✨ Enhancement

Grey Divider

Walkthroughs

Description
• Bump package version to 0.19.0
• Increase max workers for uploads from 4 to 6
• Add support for uploading chunks to presigned URLs
• Refactor upload logic to support both R2 and server uploads
• Implement progress reporting and error handling for R2 uploads
Diagram
flowchart LR
  A["Upload Request"] --> B{"Use R2?"}
  B -->|Yes| C["Get Presigned URLs"]
  B -->|No| D["Server Upload"]
  C --> E["Upload Chunks to R2"]
  D --> F["Upload Chunks to Server"]
  E --> G["Collect ETags"]
  F --> H["Complete Upload"]
  G --> I["Complete with Parts"]
  I --> J["Return Response"]
  H --> J
Loading

Grey Divider

File Changes

1. cytetype/__init__.py ⚙️ Configuration changes +1/-1

Bump version to 0.19.0

• Update package version from 0.18.1 to 0.19.0

cytetype/init.py


2. cytetype/api/client.py ✨ Enhancement +78/-23

Refactor upload logic for R2 presigned URLs

• Increase default max_workers parameter from 4 to 6 in _upload_file, upload_obs_duckdb, and
 upload_vars_h5 functions
• Add detection of presigned URLs and R2 upload ID from server response
• Extract progress update logic into separate _update_progress function
• Create two upload functions: _upload_chunk_r2 for presigned URL uploads and
 _upload_chunk_server for server uploads
• Implement ETag collection for R2 uploads with thread-safe dictionary
• Add retry logic with exponential backoff for R2 chunk uploads
• Modify completion step to send ETag parts for R2 uploads or empty POST for server uploads
• Route to appropriate upload function based on R2 availability

cytetype/api/client.py


3. cytetype/api/transport.py ✨ Enhancement +42/-0

Add presigned URL and JSON POST methods

• Add put_to_presigned_url method to upload raw bytes to presigned URLs and return ETag header
• Add post_json method to make POST requests with JSON body and proper headers

cytetype/api/transport.py


Grey Divider

Qodo Logo

@qodo-code-review
Copy link

qodo-code-review bot commented Mar 7, 2026

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (0) 📎 Requirement gaps (0)

Grey Divider


Remediation recommended

1. Presigned URL count mismatch🐞 Bug ✓ Correctness
Description
R2 upload indexes presigned_urls[chunk_idx] for every chunk in range(n_chunks) without validating
presigned_urls length. If the server returns fewer URLs than the client-computed chunk count,
uploads fail with an IndexError from a worker thread (hard to interpret for users).
Code

cytetype/api/client.py[116]

+        url = presigned_urls[chunk_idx]  # type: ignore[index]
Evidence
The client computes n_chunks from file size and chunk_size, then spawns workers for every chunk
index 0..n_chunks-1. In R2 mode each worker blindly indexes presigned_urls by chunk_idx, so a
shorter presigned_urls list will raise IndexError.

cytetype/api/client.py[70-75]
cytetype/api/client.py[107-118]
cytetype/api/client.py[180-187]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
R2 uploads assume `presigned_urls` has at least `n_chunks` entries, but `n_chunks` is computed client-side and there is no validation that `presigned_urls` matches. This can cause `IndexError` inside the thread pool.
### Issue Context
The client computes `n_chunks` from file size + `chunk_size_bytes`, then uses `range(n_chunks)` for work dispatch.
### Fix Focus Areas
- cytetype/api/client.py[70-75]
- cytetype/api/client.py[107-118]
- cytetype/api/client.py[180-187]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. Missing ETag validation🐞 Bug ⛯ Reliability
Description
put_to_presigned_url returns an empty string when the ETag header is missing, and the completion
payload uses those values without validation. Multipart completion typically requires non-empty
ETags per part; missing ETags will likely cause completion failures.
Code

cytetype/api/transport.py[R135-142]

+            response = self.session.put(
+                url,
+                data=data,
+                headers={"Content-Type": "application/octet-stream"},
+                timeout=timeout,
+            )
+            response.raise_for_status()
+            return response.headers.get("ETag", "")
Evidence
The transport explicitly falls back to an empty ETag string, and the client includes whatever ETag
string it recorded when building the multipart completion 'parts' list. This permits sending empty
ETags to the server.

cytetype/api/transport.py[127-145]
cytetype/api/client.py[201-207]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Multipart completion depends on part ETags, but `put_to_presigned_url()` returns `""` when the header is missing and the client uses it unconditionally.
### Issue Context
If ETag is absent, the upload appears successful but completion will likely fail (or fail unpredictably) due to invalid parts metadata.
### Fix Focus Areas
- cytetype/api/transport.py[127-145]
- cytetype/api/client.py[119-126]
- cytetype/api/client.py[201-207]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Retries on 4xx errors 🐞 Bug ⛯ Reliability
Description
Presigned URL failures with non-JSON bodies are converted into NetworkError(error_code='HTTP_ERROR')
and retried automatically in the R2 chunk loop. This can introduce long delays for non-retryable
errors like signature/permission 4xx responses.
Code

cytetype/api/client.py[R127-142]

+            except (NetworkError, TimeoutError) as exc:
+                last_exc = exc
+            except APIError as exc:
+                if exc.error_code in _RETRYABLE_API_ERROR_CODES:
+                    last_exc = exc
+                else:
+                    raise
+            if attempt < len(_CHUNK_RETRY_DELAYS):
+                delay = _CHUNK_RETRY_DELAYS[attempt]
+                logger.warning(
+                    f"Chunk {chunk_idx + 1}/{n_chunks} upload failed "
+                    f"(attempt {attempt + 1}/{1 + len(_CHUNK_RETRY_DELAYS)}), "
+                    f"retrying in {delay}s: {last_exc}"
+                )
+                time.sleep(delay)
+        raise last_exc  # type: ignore[misc]
Evidence
Non-JSON error responses are coerced into NetworkError with error_code 'HTTP_ERROR'. The R2 upload
loop retries any NetworkError regardless of HTTP status, so 4xx presigned URL errors (commonly
non-JSON) will be retried even when they cannot succeed.

cytetype/api/transport.py[25-42]
cytetype/api/transport.py[127-145]
cytetype/api/client.py[118-142]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The R2 chunk retry loop retries all `NetworkError`s. For presigned URLs, many 4xx errors return non-JSON bodies and become `NetworkError(error_code=&amp;#x27;HTTP_ERROR&amp;#x27;)`, causing futile retries and long delays.
### Issue Context
`_parse_error()` maps non-JSON HTTP errors to `NetworkError` and includes the HTTP status code in the message.
### Fix Focus Areas
- cytetype/api/transport.py[25-42]
- cytetype/api/transport.py[127-145]
- cytetype/api/client.py[118-142]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

ⓘ The new review experience is currently in Beta. Learn more

Grey Divider

Qodo Logo

- Added a check to ensure the number of presigned URLs matches the expected chunk count, raising a ValueError if there is a discrepancy. This enhancement improves error handling and ensures consistency in the file upload process.
- Updated the response handling in the HTTPTransport class to raise a NetworkError if the ETag header is missing after a successful presigned URL PUT request. This change improves error reporting and ensures that clients are informed of potential issues with the upload response.
- Added APIError exception handling for client-side errors (HTTP 400-499) during presigned URL uploads. This improvement provides clearer feedback when uploads are rejected, enhancing the robustness of the upload process.
@parashardhapola parashardhapola merged commit bd0a977 into master Mar 7, 2026
1 check passed
@parashardhapola parashardhapola deleted the r2_upload branch March 7, 2026 12:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant