Parquet streaming [WIP] #538

Open
wants to merge 13 commits into base: dev

Commits on Nov 11, 2023

  1. a51288f

Commits on Nov 18, 2023

  1. Move examples out, merge base/ upward (#494)

    * scripts/ -> benchmarks/.
    
    * examples/ -> notebooks/.
    
    * streaming/multimodal/ -> examples/multimodal/ (reorganized).
    
    * streaming/text/ -> examples/text (reorganized).
    
    * streaming/vision/base.py -> streaming/base/vision.py.
    
    * Switch streaming/base/vision.py to kwargs.
    
    * streaming/vision/ -> examples/vision/.
    
    * Update pyproject.toml.
    
    * And .pre-commit-config.yaml.
    
    * Fix headers.
    
    * Collapse "base/": streaming/base/ -> streaming/.
    
    * Fix imports re: collapsing the `base/` dirs upward.
    
    * Fixes (imports and indentation).
    
    * Update test_streaming_remote.py to not rely on any specific SD example subclasses
    
    * Fix pyproject config.
    
    * Update paths.
    
    * Fix.
    
    * More examples/ moves.
    
    * Comma-trailing args.
    
    * Fix links.
    
    * More fixes.
    
    * Fix missing license.
    
    * How about this for import redirects...
    
    * Or this...
    
    * Improve redirect deprecation warning.
    
    * examples/ tree: __init__ imports and __all__'s.
    
    * benchmarks/ tree: __init__ imports and __all__'s
    
    * notebooks/ tree: __init__ imports and __all__'s.
    
    * Add notebooks/ symlink to docs/source.
    
    * Add benchmarks, examples, and notebooks trees to document_modules.
    
    * Also add benchmarks symlink. Or should we only symlink to notebooks/?
    knighton authored Nov 18, 2023
    7f5d160
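
The "import redirects" and "redirect deprecation warning" items above refer to keeping old import paths working after the `streaming/base/` -> `streaming/` collapse. The PR's actual mechanism is not shown on this page; the following is only a minimal sketch of one common approach, assuming a stub module whose module-level `__getattr__` forwards to the new location and warns. The helper `make_redirect` and the module names `hypo_old` / `hypo_new` are hypothetical.

```python
import sys
import types
import warnings


def make_redirect(old_name: str, new_module: types.ModuleType) -> types.ModuleType:
    """Register a stub at ``old_name`` that forwards attribute access to ``new_module``.

    Hypothetical helper for illustration; not the library's API.
    """
    stub = types.ModuleType(old_name)

    def __getattr__(attr: str):
        # Leave dunder lookups (e.g. __path__ probes by the import system) alone.
        if attr.startswith('__'):
            raise AttributeError(attr)
        warnings.warn(
            f'{old_name} is deprecated; import from {new_module.__name__} instead.',
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(new_module, attr)

    # A function stored in the module's namespace under this name is used as a
    # fallback for missing attributes (PEP 562).
    stub.__getattr__ = __getattr__
    sys.modules[old_name] = stub
    return stub


# Usage: the "new" module holds the real object; the "old" name only redirects.
hypo_new = types.ModuleType('hypo_new')
hypo_new.VisionDataset = type('VisionDataset', (), {})
make_redirect('hypo_old', hypo_new)

import hypo_old  # resolved from sys.modules

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    redirected = hypo_old.VisionDataset  # forwarded, with a DeprecationWarning
```

The warning fires on attribute access rather than on `import`, so code that merely imports the old path stays quiet until it uses something from it.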

Commits on Dec 7, 2023

  1. Dataset kwargs switchover. (#523)

    * Dataset kwargs switchover.
    
    * Docstrings: **kwargs not kwargs.
    
    * Docstrings: Callable not callable.
    
    * Add dev to workflows.
    knighton authored Dec 7, 2023
    3cd8a22

Commits on Dec 12, 2023

  1. Organize utils. (#524)

    * Break up util.py.
    
    * Update streaming/util/importing.py
    
    Co-authored-by: Karan Jariwala
    
    * Update streaming/util/importing.py
    
    Co-authored-by: Karan Jariwala
    
    * Add basic import redirect test.
    
    ---------
    
    Co-authored-by: Karan Jariwala
    knighton and karan6181 authored Dec 12, 2023
    bf81f6b
  2. 0ecf06f

Commits on Dec 14, 2023

  1. Redo/generalize/tighten args shorthand (#530)

    * Redo/generalize/tighten args shorthand, clean up usage, update tests.
    
    * Fix (cruft).
    
    * Fix (typo).
    
    * Fix (reference to member).
    
    * Tweak.
    
    * Divide tests/test_util.py into tests/util/....py.
    
    * Fix.
    
    * Error messages.
    
    * Lowercase, no space.
    knighton authored Dec 14, 2023
    7c3fa05

Commits on Dec 15, 2023

  1. Add benchmarking suite for all backends and formats (#533)

    * Benchmarking all backends and formats.
    
    * Fix (missing docstrings).
    knighton authored Dec 15, 2023
    d969cd6
  2. 78c150e
  3. New storage APIs (#536)

    * New storage APIs.
    
    * Potentially fix import issue.
    
    * Fix (path).
    
    * Fix (paths).
    
    * Fix (paths).
    knighton authored Dec 15, 2023
    02bd910
  4. Improve naming: JSON shards are actually JSONL, etc. (#537)

    * Standardize docstrings, also fix ordering of get_sample_data, decode_sample.
    
    * Terminology: "joint" -> "mono".
    
    * "split" -> "dual" to stop confusing people (SplitWriter != dataset splits).
    
    * "Reader" -> "Shard". They manage shards. They do more than read.
    
    * Fix filenames accordingly.
    
    * Finally, JSON -> JSONL.
    
    * Switch order of decorators...
    
    * Fix markdown code.
    knighton authored Dec 15, 2023
    3972c9d
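
The JSON -> JSONL rename above rests on a real format difference: JSONL stores one JSON document per line, so a multi-sample file is not itself a single valid JSON document. The sketch below illustrates only that distinction, using the standard `json` module; the helpers `dumps_jsonl` and `loads_jsonl` are hypothetical and do not reflect the library's actual shard layout.

```python
import json


def dumps_jsonl(samples: list) -> str:
    """Serialize each sample as one JSON document per line."""
    return ''.join(json.dumps(sample) + '\n' for sample in samples)


def loads_jsonl(text: str) -> list:
    """Parse one JSON document per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


samples = [{'id': 0, 'text': 'hello'}, {'id': 1, 'text': 'world'}]
payload = dumps_jsonl(samples)
restored = loads_jsonl(payload)

# Parsing the whole payload as a single JSON document fails once there is
# more than one line, which is why calling such shards "JSON" is misleading.
try:
    json.loads(payload)
    parses_as_single_json = True
except json.JSONDecodeError:
    parses_as_single_json = False
```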
  5. 3dcf500
  6. Fix (docstrings).

    knighton committed Dec 15, 2023
    6d4bb55
  7. Fix (import).

    knighton committed Dec 15, 2023
    2a383b6