docs(rfc): add static CSV provider specification #1701
Define the full design for amp-providers-static, covering provider config schema, CSV schema inference with column name sanitization, small-file in-memory caching, and lazy catalog integration into the providers registry.

- Specify three-phase implementation plan with dependency ordering
- Document provider TOML config with grouped tables and column mapping
- Define schema inference rules, header auto-detection, and sanitization
- Outline in-memory cache strategy with configurable byte threshold
- Record all resolved design decisions in verification log

Signed-off-by: Lorenzo Delgado <lorenzo@edgeandnode.com>
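To make the column name sanitization mentioned in the description more concrete, here is a minimal Python sketch. The RFC's actual rules are not quoted in this conversation, so everything below is an assumption for illustration only: the function name `sanitize_column_names`, the lowercase-and-underscore policy, the `column_<index>` fallback for empty headers, and the numeric suffix used for duplicates are all hypothetical.

```python
import re


def sanitize_column_names(headers):
    """Turn raw CSV header cells into SQL-friendly column names.

    Hypothetical rules (not taken from the RFC):
      1. Trim and lowercase the header.
      2. Collapse each run of characters outside [a-z0-9_] into one "_".
      3. Strip leading and trailing underscores.
      4. Use "column_<index>" when nothing is left.
      5. Prefix "_" when the name starts with a digit.
      6. Append "_<n>" to the n-th repeat of an already seen name.
    """
    seen = {}
    result = []
    for index, raw in enumerate(headers):
        name = re.sub(r"[^a-z0-9_]+", "_", raw.strip().lower()).strip("_")
        if not name:
            name = f"column_{index}"
        elif name[0].isdigit():
            name = f"_{name}"
        # Simple de-duplication; a suffixed name could still collide with a
        # header that already has that exact spelling, which a real
        # implementation would need to handle.
        count = seen.get(name, 0)
        seen[name] = count + 1
        result.append(name if count == 0 else f"{name}_{count}")
    return result


if __name__ == "__main__":
    print(sanitize_column_names(["Block Number", "tx-hash", "1st", "", "block number"]))
```

Whatever rules the specification settles on, keeping the function deterministic and order-preserving means the inferred schema stays stable across reloads of the same file.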
Did you consider using datasets instead of providers for this? Then this would benefit from the tooling for dataset discoverability.
That's a very good point. This is something that we should consider after the POC. Yes. I see, at this moment, two main issues:
In the end, for me, the provider's concept (external services that act as a data source) fits naturally in the mental model. I am advocating for a POC to enable some use cases in the short term, and that can evolve alongside the dataset authoring work happening in parallel.
Note that the schema description here is a proposal that could be included or replaced completely by the dataset authoring design (e.g., by introducing a new dataset kind).
Alright, we can try out this design then.
Just to comment on this aspect: there are tradeoffs, but it wouldn't be unreasonable to design this such that the CSV data is copied over into Amp table format.
Are you suggesting that we materialize the CSV files into Amp Parquet files?