Skip to content

feat(tracing): add ClickHouse tracing backend and benchmark suite#3872

Draft
mmabrouk wants to merge 1 commit intomainfrom
feat/clickhouse-tracing-backend
Draft

feat(tracing): add ClickHouse tracing backend and benchmark suite#3872
mmabrouk wants to merge 1 commit intomainfrom
feat/clickhouse-tracing-backend

Conversation

@mmabrouk
Copy link
Member

@mmabrouk mmabrouk commented Mar 1, 2026

Summary

  • add a new ClickHouse tracing DAO and wire TRACING_BACKEND=clickhouse into API + tracing/evaluations workers
  • implement ClickHouse read/write paths for traces (query, fetch, analytics, legacy analytics, sessions/users) with compatibility fixes for filtering and timestamps
  • add ClickHouse tracing design docs, schemas, and benchmark tooling/results notes under docs/design/clickhouse-tracing/ plus helper benchmark scripts

Notes

  • this PR intentionally includes only ClickHouse-related changes from the current worktree
  • unrelated local edits (EE throttling/auth/postgres tuning and raw benchmark result artifacts) were left out

Open with Devin

@vercel
Copy link

vercel bot commented Mar 1, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agenta-documentation Building Building Preview, Comment Mar 1, 2026 8:48pm

Request Review

@mmabrouk
Copy link
Member Author

mmabrouk commented Mar 1, 2026

@jp-agenta burn after reading

Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 4 potential issues.

View 7 additional findings in Devin Review.

Open in Devin Review

Comment on lines +301 to +302
if isinstance(operator, ExistenceOperator):
return f"{expr} IS {'NOT ' if operator == ExistenceOperator.NOT_EXISTS else ''}NULL"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 ExistenceOperator logic is inverted — EXISTS produces IS NULL, NOT_EXISTS produces IS NOT NULL

The ExistenceOperator handling in _build_uuid_condition, _build_string_condition, and _build_list_json_condition is inverted compared to the Postgres DAO reference implementation.

Root Cause and Impact

The ternary expression uses ExistenceOperator.NOT_EXISTS to decide whether to insert NOT, but the logic is backwards:

f"{expr} IS {'NOT ' if operator == ExistenceOperator.NOT_EXISTS else ''}NULL"
  • NOT_EXISTS → produces IS NOT NULL (should be IS NULL)
  • EXISTS → produces IS NULL (should be IS NOT NULL)

The Postgres DAO (oss/src/dbs/postgres/tracing/utils.py:336-348) correctly implements:

  • ExistenceOperator.EXISTSisnot(None) (IS NOT NULL)
  • ExistenceOperator.NOT_EXISTSis_(None) (IS NULL)

This bug is present in three locations:

  • _build_uuid_condition at line 302
  • _build_string_condition at line 387
  • _build_list_json_condition at line 428

Impact: Any filter using exists or not_exists operators will return the opposite result set — spans where the field is missing will be returned when present ones are requested, and vice versa.

Prompt for agents
Fix the inverted ExistenceOperator logic in three locations in api/oss/src/dbs/clickhouse/tracing/dao.py:

1. Line 302 in _build_uuid_condition: Change ExistenceOperator.NOT_EXISTS to ExistenceOperator.EXISTS
2. Line 387 in _build_string_condition: Change ExistenceOperator.NOT_EXISTS to ExistenceOperator.EXISTS
3. Line 428 in _build_list_json_condition: Change ExistenceOperator.NOT_EXISTS to ExistenceOperator.EXISTS

In all three cases the pattern should be:
  f"{expr} IS {'NOT ' if operator == ExistenceOperator.EXISTS else ''}NULL"

This makes EXISTS produce IS NOT NULL and NOT_EXISTS produce IS NULL, matching the Postgres DAO behavior.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +98 to +99
def _escape_sql(self, value: str) -> str:
return value.replace("'", "''")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 SQL escape insufficient for ClickHouse — backslashes not escaped, enabling SQL injection

The _escape_sql method only escapes single quotes (''') but does not escape backslashes. ClickHouse interprets \' as an escaped single quote in string literals by default (setting enable_backslash_quoting=1).

Detailed Explanation

The method at api/oss/src/dbs/clickhouse/tracing/dao.py:98-99:

def _escape_sql(self, value: str) -> str:
    return value.replace("'", "''")

Consider a user-controlled filter value of \. After _escape_sql, it remains \. When interpolated into SQL as 'test\', ClickHouse parses \' as an escaped single quote, leaving the string literal unterminated.

More critically, a crafted value like \' OR 1=1 -- becomes \'' OR 1=1 -- after escaping. ClickHouse parses \' as ', then ' terminates the string, and OR 1=1 -- becomes executable SQL.

This affects every method that uses _escape_sql to build SQL strings, including:

  • _format_literal (line 112) — used throughout condition building
  • _condition_to_sql for content field (line 239) — uses match() with user input
  • _string_match_expr (lines 188-216) — all LIKE/= pattern building
  • _json_extract_expr (line 156) — JSON path keys

Impact: An authenticated user with API access can inject arbitrary SQL into ClickHouse queries via filter condition values, potentially accessing data from other projects or causing denial of service.

Suggested change
def _escape_sql(self, value: str) -> str:
return value.replace("'", "''")
def _escape_sql(self, value: str) -> str:
return value.replace("\\", "\\\\").replace("'", "''")
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +210 to +216
if case_sensitive:
return clause
if "LIKE" in clause:
rhs = clause.split("LIKE")[-1].strip()
return f"lower({expr}) LIKE lower({rhs})"
rhs = clause.split("=")[-1].strip()
return f"lower({expr}) = lower({rhs})"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Case-insensitive string matching breaks when value contains 'LIKE' or '=' substring

The _string_match_expr method uses naive string splitting on "LIKE" and "=" to extract the RHS of the clause for wrapping in lower(). When the user-provided search value contains those substrings, the split produces incorrect SQL.

Root Cause

At api/oss/src/dbs/clickhouse/tracing/dao.py:210-216, when case_sensitive=False (the default per TextOptions):

if "LIKE" in clause:
    rhs = clause.split("LIKE")[-1].strip()
    return f"lower({expr}) LIKE lower({rhs})"
rhs = clause.split("=")[-1].strip()
return f"lower({expr}) = lower({rhs})"

For example, with operator=CONTAINS and value="I LIKE cats", the clause is:
span_name LIKE '%I LIKE cats%'

Splitting by "LIKE" produces ["span_name ", " '%I ", " cats%'"]. Taking [-1] gives " cats%'". The resulting SQL is:
lower(span_name) LIKE lower( cats%') — syntactically broken SQL that will cause a ClickHouse query error.

Impact: Any case-insensitive string filter (the default) where the search value contains "LIKE" or "=" will produce malformed SQL, causing the query to fail.

Prompt for agents
In api/oss/src/dbs/clickhouse/tracing/dao.py, refactor _string_match_expr (lines 173-216) to avoid parsing the clause string. Instead of building the clause first and then trying to split it apart, build the case-insensitive version directly. For example:

Replace lines 210-216 with logic that wraps expr and the value in lower() calls during clause construction rather than post-hoc string splitting. Something like:

  ci_expr = f"lower({expr})"
  ci_wrap = lambda text: f"lower('{self._escape_sql(text)}')"

Then use ci_expr and ci_wrap when building LIKE/= clauses for the case-insensitive branch.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +70 to +73
self.url = os.getenv(
"CLICKHOUSE_TRACING_URL",
"https://er3fulih5c.eu-central-1.aws.clickhouse.cloud:8443",
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Hardcoded production ClickHouse Cloud URL as default credential

The ClickHouseTracingDAO.__init__ defaults the ClickHouse URL to a production ClickHouse Cloud instance when CLICKHOUSE_TRACING_URL is not set.

Details

At api/oss/src/dbs/clickhouse/tracing/dao.py:70-72:

self.url = os.getenv(
    "CLICKHOUSE_TRACING_URL",
    "https://er3fulih5c.eu-central-1.aws.clickhouse.cloud:8443",
)

This hardcodes a production ClickHouse Cloud endpoint as the default. If a developer or staging environment enables TRACING_BACKEND=clickhouse without setting CLICKHOUSE_TRACING_URL, queries will silently hit the production ClickHouse Cloud instance. Combined with the default CLICKHOUSE_TRACING_PASSWORD being empty string (line 75), this could cause confusing connection failures or — if the password happens to be set in the environment for other purposes — unintended writes to production.

Impact: Risk of accidental production data access from non-production environments.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

@jp-agenta jp-agenta added olap/clickhouse Managed ClickHouse database maintenance/community Primarily under community maintenance support/experimental Experimental or best-effort repository surface and removed olap/clickhouse Managed ClickHouse database labels Mar 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

maintenance/community Primarily under community maintenance olap/clickhouse Managed ClickHouse database support/experimental Experimental or best-effort repository surface

Projects

Development

Successfully merging this pull request may close these issues.

2 participants