Parquet streaming [WIP] #538
base: dev
Commits on Nov 11, 2023
-
Commit a51288f
Commits on Nov 18, 2023
-
Move examples out, merge base/ upward (#494)
* scripts/ -> benchmarks/.
* examples/ -> notebooks/.
* streaming/multimodal/ -> examples/multimodal/ (reorganized).
* streaming/text/ -> examples/text (reorganized).
* streaming/vision/base.py -> streaming/base/vision.py.
* Switch streaming/base/vision.py to kwargs.
* streaming/vision/ -> examples/vision/.
* Update pyproject.toml.
* And .pre-commit-config.yaml.
* Fix headers.
* Collapse "base/": streaming/base/ -> streaming/.
* Fix imports re: collapsing the `base/` dirs upward.
* Fixes (imports and indentation).
* Update test_streaming_remote.py to not rely on any specific SD example subclasses
* Fix pyproject config.
* Update paths.
* Fix.
* More examples/ moves.
* Comma-tailing args.
* Fix links.
* More fixes.
* Fix missing license.
* How about this for import redirects...
* Or this...
* Improve redirect deprecation warning.
* examples/ tree: __init__ imports and __all__'s.
* benchmarks/ tree: __init__ imports and __all__'s
* notebooks/ tree: __init__ imports and __all__'s.
* Add notebooks/ symlink to docs/source.
* Add benchmarks, examples, and notebooks trees to document_modules.
* Also add benchmarks symlink. Or should we only symlink to notebooks/?
Commit 7f5d160
Commits on Dec 7, 2023
-
Dataset kwargs switchover. (#523)
* Dataset kwargs switchover.
* Docstrings: **kwargs not kwargs.
* Docstrings: Callable not callable.
* Add dev to workflows.
Commit 3cd8a22
Commits on Dec 12, 2023
-
* Break up util.py.
* Update streaming/util/importing.py
  Co-authored-by: Karan Jariwala <[email protected]>
* Update streaming/util/importing.py
  Co-authored-by: Karan Jariwala <[email protected]>
* Add basic import redirect test.
---------
Co-authored-by: Karan Jariwala <[email protected]>
Commit bf81f6b
-
Commit 0ecf06f
Commits on Dec 14, 2023
-
Redo/generalize/tighten args shorthand (#530)
* Redo/generalize/tighten args shorthand, clean up usage, update tests.
* Fix (cruft).
* Fix (typo).
* Fix (reference to member).
* Tweak.
* Divide tests/test_util.py into tests/util/....py.
* Fix.
* Error messages.
* Lowercase, no space.
Commit 7c3fa05
Commits on Dec 15, 2023
-
Add benchmarking suite for all backends and formats (#533)
* Benchmarking all backends and formats.
* Fix (missing docstrings).
Commit d969cd6
-
Commit 78c150e
-
* New storage APIs.
* Potentially fix import issue.
* Fix (path).
* Fix (paths).
* Fix (paths).
Commit 02bd910
-
Improve naming: JSON shards are actually JSONL, etc. (#537)
* Stdize docstrings, also fix ordering of get_sample_data, decode_sample.
* Terminology: "joint" -> "mono".
* "split" -> "dual" to stop confusing people (SplitWriter != dataset splits)
* "Reader" -> "Shard". They manage shards. They do more than read.
* Fix filenames accordingly.
* Finally, JSON -> JSONL.
* Switch order of decorators...
* Fix markdown code.
Commit 3972c9d
-
Commit 3dcf500
-
Commit 6d4bb55
-
Commit 2a383b6