
Fix bugs in table upload #217

Open
fcollman wants to merge 29 commits into master from fix_upload

Conversation


@fcollman fcollman commented Mar 4, 2026

Trying to find and fix bugs in the CSV upload feature.

fcollman and others added 29 commits March 4, 2026 06:38
The inline vectorized spatial processing in process_chunk called
.astype(float) on coordinate columns, which fails when coordinates
are provided in combined "[x,y,z]" string format. Replace with calls
to the existing process_spatial_point method, which already handles
both string and separate-column formats. Also fixes missing-coordinate
fallback to use None instead of empty string.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
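
A minimal sketch of the parsing this commit describes (the name `process_spatial_point` comes from the commit message; the column-naming convention and return shape here are assumptions for illustration):

```python
import ast

def process_spatial_point(row, col):
    """Parse a coordinate given either as a combined "[x,y,z]" string in
    column `col`, or as separate `col`_x / `col`_y / `col`_z columns.
    Returns a list of floats, or None when the coordinate is missing."""
    combined = row.get(col)
    if combined not in (None, ""):
        if isinstance(combined, str):
            # "[x,y,z]" string form: a bare .astype(float) would fail
            # here, so parse the bracketed literal instead.
            return [float(v) for v in ast.literal_eval(combined)]
        return [float(v) for v in combined]
    parts = [row.get(f"{col}_{axis}") for axis in ("x", "y", "z")]
    if any(p in (None, "") for p in parts):
        return None  # missing coordinate falls back to None, not ""
    return [float(p) for p in parts]
```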
Columns present in the DB model but absent from the CSV were
silently filled with empty string "". This caused silent data
corruption for nullable columns (which expect NULL) and hid
constraint violations for non-nullable columns. Use None so
nullable columns get a proper NULL and non-nullable columns
surface a clear DB error.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
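
The fix amounts to defaulting absent columns to None rather than "". A hypothetical sketch (function name invented for illustration):

```python
def fill_missing_columns(row, model_columns):
    # Columns present in the DB model but absent from the CSV row get
    # None (SQL NULL) instead of "": nullable columns store a real NULL,
    # and NOT NULL columns surface a clear constraint error.
    return {col: row.get(col) for col in model_columns}  # .get defaults to None
```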
…/empty

fillna(0), fillna(0.0), and fillna(False) silently substituted 0 or
False for missing CSV values, corrupting data for fields where those
values have semantic meaning. Use pandas nullable dtypes (Int64,
boolean) and remove fillna so missing values remain NA and serialize
to empty CSV cells, which PostgreSQL imports as NULL.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
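
A small demonstration of the nullable-dtype behavior this commit relies on (the column names are illustrative, not from the codebase):

```python
import io

import pandas as pd

# With nullable dtypes and no fillna, a missing CSV cell stays <NA>
# instead of being coerced to 0 or False.
csv = "id,count,flag\n1,5,True\n2,,\n"
df = pd.read_csv(io.StringIO(csv), dtype={"count": "Int64", "flag": "boolean"})

# <NA> serializes back to an empty cell, which PostgreSQL imports as NULL.
round_trip = df.to_csv(index=False)
```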
…ries

_id_counter was always initialized to 0, so a retry (processor
re-instantiation) would regenerate IDs starting from 1 and collide
with rows already written in the staging table. The new
id_counter_start parameter lets callers pass the current max ID
from the staging table when resuming, making restarts safe.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
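
The resumable counter can be sketched as follows (a minimal stand-in for the real `SchemaProcessor`, whose other responsibilities are omitted):

```python
class SchemaProcessor:
    def __init__(self, id_counter_start=0):
        # On a retry, callers pass the max ID already written to the
        # staging table, so generated IDs continue from there instead
        # of restarting at 1 and colliding with existing rows.
        self._id_counter = id_counter_start

    def next_id(self):
        self._id_counter += 1
        return self._id_counter
```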
Add id_counter_start parameter to the process_csv Celery task and
pass it through to SchemaProcessor. After processing completes,
record last_assigned_id in both the return dict and Redis job status.
This lets callers (e.g. continuation batch uploads) pass the previous
run's final ID so generated IDs don't overlap with existing rows.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
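
A simplified sketch of the pass-through, with a plain dict standing in for the Redis job status and a row count standing in for the real per-row ID generation:

```python
def process_csv(rows, id_counter_start=0, job_status=None):
    last_id = id_counter_start
    for _ in rows:
        last_id += 1  # stand-in for SchemaProcessor ID assignment
    result = {"rows_processed": len(rows), "last_assigned_id": last_id}
    if job_status is not None:
        # Recorded so a continuation batch upload can resume from here.
        job_status["last_assigned_id"] = last_id
    return result
```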
When psql times out or fails, Popen.__exit__ closes dump_proc.stdout
and calls wait(). If pg_dump has suppressed SIGPIPE (libpq does this),
pg_dump stays alive and wait() blocks forever. Fix by catching any
exception inside the with-block, explicitly closing stdout and sending
SIGKILL to pg_dump before the exception reaches __exit__, so wait()
returns immediately.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
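
The shape of the fix, sketched with generic commands in place of the real pg_dump/psql invocations:

```python
import signal
import subprocess

def run_pipeline(dump_cmd, load_cmd, timeout):
    """Pipe dump_cmd into load_cmd without deadlocking Popen.__exit__
    when the loader fails while the dumper ignores SIGPIPE."""
    with subprocess.Popen(dump_cmd, stdout=subprocess.PIPE) as dump_proc:
        try:
            subprocess.run(load_cmd, stdin=dump_proc.stdout,
                           timeout=timeout, check=True)
        except Exception:
            # Close our end of the pipe and kill the dumper before
            # __exit__ calls wait(), so wait() returns immediately.
            dump_proc.stdout.close()
            dump_proc.send_signal(signal.SIGKILL)
            raise
```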
raise ValueError(...) was discarding the original exception context,
making it harder to diagnose the root cause of GCS upload failures.
Add 'from e' so the full traceback chain is visible in logs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
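
The pattern, with a dummy blob standing in for the GCS client object:

```python
def upload_to_gcs(blob, payload):
    try:
        blob.upload_from_string(payload)
    except Exception as e:
        # 'from e' chains the original exception, so logs show the real
        # root cause (credentials, network, ...) rather than only the
        # wrapper ValueError.
        raise ValueError(f"GCS upload failed: {e}") from e
```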
Request parsing and storage.Client() initialization were outside the
try/except block, causing unhandled exceptions and opaque 500s when:
- request body is missing or not JSON (TypeError on data["filename"])
- "filename"/"contentType" keys are absent (KeyError)
- GCS credentials are not configured (DefaultCredentialsError)
- MATERIALIZATION_UPLOAD_BUCKET_PATH is not set

Move all logic inside the try, add an explicit null check on the
request body and the bucket config, and add a KeyError handler to
return a 400 for missing fields instead of a 500.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
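
A framework-agnostic sketch of the hardened handler: each failure mode listed above maps to an explicit (body, status) response. The function name and response shape are illustrative; the actual GCS signed-URL generation is elided:

```python
def generate_upload_url(data, bucket_path):
    try:
        if data is None:
            return {"error": "request body must be JSON"}, 400
        filename = data["filename"]        # KeyError -> 400 below
        content_type = data["contentType"]
        if not bucket_path:
            return {"error": "MATERIALIZATION_UPLOAD_BUCKET_PATH not set"}, 500
        # ... storage.Client() / signed-URL generation would go here ...
        return {"filename": filename, "contentType": content_type}, 200
    except KeyError as e:
        return {"error": f"missing required field: {e.args[0]}"}, 400
    except Exception as e:
        return {"error": str(e)}, 500
```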
The JS read the error response body and logged it to the console but
then threw a new Error containing only the HTTP status code, so the
actual server message never reached the UI. Parse the JSON response
and include the message field in the thrown error so users see a
meaningful description instead of just "Failed to get upload URL: 500".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When a Celery chord callback (finalize_chunk_outcome) is lost due to a
worker crash or warm shutdown during deployment, the affected chunk
remains in PROCESSING_SUBTASKS indefinitely. get_chunks_to_process
ignores that status, so the dispatcher polls forever with no recovery.

Track entry into PROCESSING_SUBTASKS with a per-chunk timestamp in a
new Redis hash (processing_subtasks_timestamps). Add
recover_stale_processing_subtasks() to CheckpointManager, which scans
for chunks stuck beyond a threshold (default 10 min) and moves them to
FAILED_RETRYABLE so the dispatcher can re-dispatch them. The dispatcher
calls this method each time it finds no chunks to dispatch, retrying
immediately if any are recovered.
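
The recovery logic can be sketched with plain dicts standing in for the Redis hashes (the real `CheckpointManager` has many more responsibilities; only the stale-chunk path is shown):

```python
import time

class CheckpointManager:
    def __init__(self, stale_after=600):  # default threshold: 10 min
        self.stale_after = stale_after
        self.status = {}                   # stand-in for chunk-status hash
        self.processing_timestamps = {}    # processing_subtasks_timestamps

    def mark_processing(self, chunk_id, now=None):
        self.status[chunk_id] = "PROCESSING_SUBTASKS"
        self.processing_timestamps[chunk_id] = now if now is not None else time.time()

    def recover_stale_processing_subtasks(self, now=None):
        """Move chunks stuck in PROCESSING_SUBTASKS past the threshold
        to FAILED_RETRYABLE so the dispatcher re-dispatches them."""
        now = now if now is not None else time.time()
        recovered = []
        for chunk_id, started in list(self.processing_timestamps.items()):
            if now - started > self.stale_after:
                self.status[chunk_id] = "FAILED_RETRYABLE"
                del self.processing_timestamps[chunk_id]
                recovered.append(chunk_id)
        return recovered
```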
