-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add download_decoder + download_extractor #50
Conversation
Signed-off-by: Artem Inzhyyants <[email protected]>
📝 Walkthrough📝 WalkthroughWalkthroughThe changes in this pull request enhance the Changes
Would you like to explore any specific part of these changes further? Sequence Diagram(s)sequenceDiagram
participant User
participant AsyncRetriever
participant ModelToComponentFactory
participant ResponseToFileExtractor
User->>ModelToComponentFactory: create_async_retriever(model)
ModelToComponentFactory->>AsyncRetriever: initialize(download_extractor, download_decoder)
AsyncRetriever->>ResponseToFileExtractor: create_response_to_file_extractor(parameters)
ResponseToFileExtractor->>AsyncRetriever: return extractor instance
AsyncRetriever->>User: return initialized AsyncRetriever
Possibly related PRs
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (5)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)
2031-2042
: Consider adding validation for the field_path parameter in the default DpathExtractor.The code handles both custom and default extractors well, but when creating the default DpathExtractor, we're passing an empty list for field_path. Should we validate or document why an empty field_path is acceptable here? wdyt?
2052-2053
: Nice cleanup removing the hardcoded ResponseToFileExtractor!The replacement of the hardcoded
ResponseToFileExtractor
with a configurabledownload_extractor
improves flexibility. The commented-out old code could be removed for cleanliness, wdyt?- # extractor=ResponseToFileExtractor(),# old one
2043-2055
: Consider adding error handling for download_retriever configuration.The download_retriever setup looks good, but it might be worth adding some validation to ensure the download_requester and record_selector are properly configured for the specific download use case. For instance, we could validate that the extractor's configuration is compatible with the download operation. What are your thoughts on this?
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
2575-2584
: LGTM! The download_decoder property looks good.The property supports a comprehensive set of decoder types. What do you think about enhancing the description to mention the supported formats explicitly? Something like:
"Component decoding the download response so records can be extracted. Supports JSON, JSONL, Iterable, XML, and Gzip compressed formats through their respective decoders."
wdyt? 🤔
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
1679-1681
: Consider adding atitle
to thedownload_extractor
field for consistency.I noticed that the
download_decoder
field includes atitle
parameter in itsField()
definition. Perhaps adding atitle
to thedownload_extractor
field would enhance readability and maintain consistency across your model definitions. What do you think?Here's how you might update it:
download_extractor: Optional[Union[CustomRecordExtractor, DpathExtractor]] = Field( None, description="Responsible for fetching the records from provided urls.", + title="Download Extractor", )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(2 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(2 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(1 hunks)
🔇 Additional comments (3)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
2026-2030
: LGTM! The download_decoder initialization looks good.
The code properly handles both cases where a custom decoder is provided or falls back to a default JsonDecoder. The parameters are correctly passed through.
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
2516-2520
: LGTM! The download_extractor property is well-defined.
The property is properly structured and provides good flexibility by supporting both CustomRecordExtractor and DpathExtractor implementations.
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
1732-1745
: The addition of download_decoder
looks great!
You've correctly defined the download_decoder
field with appropriate types, description, and title. This addition enhances the functionality of the AsyncRetriever
. Well done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a test to validate that custom decoders are supported?
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
Outdated
Show resolved
Hide resolved
…ver-split-decoders
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (4)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
2053-2054
: Remove commented codeThe old implementation reference can be safely removed since it's tracked in version control:
- # extractor=ResponseToFileExtractor(),# old one extractor=download_extractor,
2034-2038
: Fix indentation for better readabilityThe indentation seems a bit off. Would you like to adjust it to match the surrounding code? wdyt?
- model=model.download_extractor, - config=config, - decoder=download_decoder, - parameters=model.parameters, + model=model.download_extractor, + config=config, + decoder=download_decoder, + parameters=model.parameters,airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)
2516-2520
: The download_extractor implementation looks good! Would you consider enhancing the documentation?The implementation correctly allows for both CustomRecordExtractor and DpathExtractor. Perhaps we could make the description more detailed by adding examples or common use cases? For instance, explaining when to use CustomRecordExtractor vs DpathExtractor for URL-based record extraction, wdyt?
download_extractor: - description: Responsible for fetching the records from provided urls. + description: | + Responsible for fetching the records from provided urls. Use CustomRecordExtractor for complex extraction logic + or DpathExtractor for simple path-based extraction. + + Example using DpathExtractor: + download_extractor: + type: DpathExtractor + field_path: ["data", "records"]
2575-2584
: The download_decoder implementation looks good! Would you consider enhancing the documentation?The implementation correctly supports all necessary decoder types. Perhaps we could make the description more detailed by explaining when to use each decoder type for downloaded content? For instance, when to use GzipJsonDecoder vs JsonDecoder, wdyt?
download_decoder: title: Download Decoder - description: Component decoding the download response so records can be extracted. + description: | + Component decoding the download response so records can be extracted. Choose the decoder based on the response format: + - JsonDecoder: For standard JSON responses + - JsonlDecoder: For newline-delimited JSON + - GzipJsonDecoder: For gzip-compressed JSON + - XmlDecoder: For XML responses + - IterableDecoder: For line-by-line processing + - CustomDecoder: For custom decoding logic
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(2 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(1 hunks)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
2027-2031
: LGTM: Clean initialization of download_decoder with fallback
The code properly handles both cases where download_decoder is provided or needs to fall back to JsonDecoder.
2032-2043
: Consider validating empty field path
The fallback case creates a DpathExtractor with an empty field path list []
. Should we validate this or provide a default path? wdyt?
✅ Verification successful
Let me check the implementation of DpathExtractor to understand if empty paths are handled properly.
Let me try to find the DpathExtractor implementation using ripgrep instead.
Let me check the actual implementation details of the DpathExtractor to see how it handles empty paths.
Let me check the extract method to see how it handles the empty path.
Empty field path is handled correctly, no validation needed
Looking at the implementation, empty field path is a valid use case and is explicitly handled in the extract_records
method. When field_path
is empty, it returns the entire response body as-is. This behavior is also documented in the class docstring with an example showing empty field path usage. The test cases we found earlier confirm this is an intended behavior.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if DpathExtractor is used with empty paths elsewhere
ast-grep --pattern 'DpathExtractor($$$, field_path=[], $$$)'
Length of output: 1531
Script:
#!/bin/bash
# Check DpathExtractor implementation
ast-grep --pattern 'class DpathExtractor'
Length of output: 43
Script:
#!/bin/bash
# Search for DpathExtractor class definition and its usage
rg "class DpathExtractor" -A 10
Length of output: 2583
Script:
#!/bin/bash
# Get the full implementation of DpathExtractor
rg "class DpathExtractor" -A 50 airbyte_cdk/sources/declarative/extractors/dpath_extractor.py
Length of output: 1909
Script:
#!/bin/bash
# Get the extract method implementation
rg "def extract_records" -A 20 airbyte_cdk/sources/declarative/extractors/dpath_extractor.py
Length of output: 861
Signed-off-by: Artem Inzhyyants <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)
1334-1350
: Documentation needs improvement, wdyt?The
ResponseToFileExtractor
component's documentation could be enhanced:
- The description is incomplete: "Record extractor that downloads a headless"
- The title mentions "CSV" but the description doesn't specify file formats
- Consider adding:
- Supported file formats
- When to use this extractor vs others
- Example usage
- Purpose of
needs_decompression
Line range hint
2533-2602
: Consider enhancing the AsyncRetriever documentation, what do you think?The new properties look good, but their documentation could be more helpful:
For
download_extractor
, consider explaining:
- The relationship with
urls_extractor
- When to use each extractor type
- Example configurations
For
download_decoder
, consider clarifying:
- How it differs from the main
decoder
property- When a separate download decoder is needed
- Typical use cases for each decoder type
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(3 hunks)airbyte_cdk/sources/declarative/extractors/response_to_file_extractor.py
(4 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(3 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(4 hunks)
👮 Files not reviewed due to content moderation or server errors (3)
- airbyte_cdk/sources/declarative/extractors/response_to_file_extractor.py
- airbyte_cdk/sources/declarative/models/declarative_component_schema.py
- airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
Signed-off-by: Artem Inzhyyants <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2)
3280-3286
: Consider adding error handling for JSON decoding, wdyt?The
extract_records
method directly yields fromresponse.json()
which could raise a JSONDecodeError if the response contains invalid JSON. Consider adding try-except to handle potential errors gracefully.def extract_records( self, response: requests.Response, ) -> Iterable[Mapping[str, Any]]: - yield from response.json() + try: + yield from response.json() + except requests.exceptions.JSONDecodeError as e: + raise ValueError(f"Failed to decode response as JSON: {str(e)}")
3288-3294
: Consider adding more test cases, wdyt?The test verifies the basic instantiation but could be enhanced to test the actual functionality:
- Test with valid JSON response
- Test with invalid JSON response
- Test with empty response
def test_create_custom_record_extractor(): # Test instantiation definition = { "type": "CustomRecordExtractor", "class_name": "unit_tests.sources.declarative.parsers.test_model_to_component_factory.CustomRecordExtractor", } component = factory.create_component(CustomRecordExtractorModel, definition, {}) assert isinstance(component, CustomRecordExtractor) # Test with valid JSON mock_response = MagicMock(spec=requests.Response) mock_response.json.return_value = [{"id": 1}, {"id": 2}] records = list(component.extract_records(mock_response)) assert records == [{"id": 1}, {"id": 2}] # Test with invalid JSON mock_response.json.side_effect = requests.exceptions.JSONDecodeError("msg", "doc", 0) with pytest.raises(ValueError): list(component.extract_records(mock_response))
Signed-off-by: Artem Inzhyyants <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
airbyte_cdk/sources/declarative/requesters/http_job_repository.py (1)
45-45
: Consider adding a docstring for the config parameter?Since we're introducing a config parameter to
ResponseToFileExtractor
, would it be helpful to add documentation about its purpose and expected values? This could help future developers understand the configuration options, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
airbyte_cdk/sources/declarative/requesters/http_job_repository.py
(1 hunks)unit_tests/sources/declarative/extractors/test_response_to_file_extractor.py
(1 hunks)unit_tests/sources/declarative/requesters/test_http_job_repository.py
(1 hunks)
🔇 Additional comments (4)
unit_tests/sources/declarative/extractors/test_response_to_file_extractor.py (2)
17-17
: LGTM! Constructor update is consistent.
The change aligns with the updated ResponseToFileExtractor
implementation.
Line range hint 98-98
: LGTM! Constructor update is consistent.
The change maintains consistency with the updated implementation.
airbyte_cdk/sources/declarative/requesters/http_job_repository.py (1)
45-45
: LGTM! Constructor update is consistent.
The change maintains consistency with the updated ResponseToFileExtractor
implementation across the codebase.
unit_tests/sources/declarative/requesters/test_http_job_repository.py (1)
98-98
: LGTM! Constructor update is consistent.
The change maintains consistency with the updated implementation.
Signed-off-by: Artem Inzhyyants <[email protected]>
Signed-off-by: Artem Inzhyyants <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main concern is regarding backward compatibility with source-sendgrid. Once we figure this out, I'll approve
airbyte_cdk/sources/declarative/extractors/response_to_file_extractor.py
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
Outdated
Show resolved
Hide resolved
…ver-split-decoders
Signed-off-by: Artem Inzhyyants <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3)
1334-1346
: Consider enhancing the component description for better clarity.The description could be more detailed about:
- When to use this extractor (e.g., size thresholds)
- File lifecycle management (creation, cleanup)
- Error handling for disk operations
- Performance implications
What do you think about expanding the description like this?
- description: A record extractor designed for handling large responses that may exceed memory limits (to prevent OOM issues). It downloads a CSV file to disk, reads the data from disk, and deletes the file once it has been fully processed. + description: | + A record extractor designed for handling large responses that may exceed memory limits (to prevent OOM issues). + + Key features: + - Downloads response content to a temporary file on disk + - Streams records from disk to minimize memory usage + - Automatically cleans up temporary files after processing + - Suitable for responses larger than available memory + + Usage considerations: + - Recommended for responses > 500MB + - Requires sufficient disk space + - May have slower processing due to disk I/O
2529-2534
: The download_extractor property looks good but could use an example.The property is well-defined with proper references to available extractors. Consider adding an example to illustrate common usage patterns?
What do you think about adding usage examples like this?
download_extractor: description: Responsible for fetching the records from provided urls. anyOf: - "$ref": "#/definitions/CustomRecordExtractor" - "$ref": "#/definitions/DpathExtractor" - "$ref": "#/definitions/ResponseToFileExtractor" + examples: + - type: ResponseToFileExtractor # For large file downloads + - type: DpathExtractor # For JSON responses + field_path: ["data", "records"]
2589-2598
: The download_decoder property looks good but could benefit from usage guidance.The property supports all standard decoders but could use guidance on when to use each type.
What do you think about adding usage guidance like this?
download_decoder: title: Download Decoder - description: Component decoding the download response so records can be extracted. + description: | + Component decoding the download response so records can be extracted. + + Common decoder choices: + - JsonDecoder: For JSON responses + - JsonlDecoder: For newline-delimited JSON files + - GzipJsonDecoder: For compressed JSON responses + - XmlDecoder: For XML responses + - IterableDecoder: For line-by-line processing anyOf: - "$ref": "#/definitions/CustomDecoder" - "$ref": "#/definitions/JsonDecoder" - "$ref": "#/definitions/JsonlDecoder" - "$ref": "#/definitions/IterableDecoder" - "$ref": "#/definitions/XmlDecoder" - "$ref": "#/definitions/GzipJsonDecoder"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(3 hunks)airbyte_cdk/sources/declarative/extractors/response_to_file_extractor.py
(3 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(3 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(4 hunks)
👮 Files not reviewed due to content moderation or server errors (3)
- airbyte_cdk/sources/declarative/extractors/response_to_file_extractor.py
- airbyte_cdk/sources/declarative/models/declarative_component_schema.py
- airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
…ver-split-decoders # Conflicts: # airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
1454-1460
: Consider adding type hints for better code clarity?The method looks good functionally, but would adding type hints for the return type make it more consistent with other factory methods, wdyt?
def create_response_to_file_extractor( self, model: ResponseToFileExtractorModel, **kwargs: Any, - ) -> ResponseToFileExtractor: + ) -> ResponseToFileExtractor[Any]: return ResponseToFileExtractor(parameters=model.parameters or {})unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1)
3281-3287
: Consider adding docstring and type hints for better code clarity?The implementation looks good, but would adding a docstring and return type hint make it more maintainable, wdyt?
class CustomRecordExtractor(RecordExtractor): + """Test implementation of RecordExtractor that yields JSON response directly.""" def extract_records( self, response: requests.Response, - ) -> Iterable[Mapping[str, Any]]: + ) -> Iterable[Mapping[str, Any]]: yield from response.json()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(4 hunks)unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
(4 hunks)
🔇 Additional comments (3)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
273-275
: LGTM!
Clean import addition for the new ResponseToFileExtractorModel
.
433-433
: LGTM!
Clean addition of the new model-to-constructor mapping.
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1)
3289-3295
: LGTM!
Clean test implementation that verifies the factory's ability to create custom record extractors.
Signed-off-by: Artem Inzhyyants <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
1454-1460
: Should we enhance the ResponseToFileExtractor factory method?The current implementation is simpler than other factory methods. Consider adding:
- Config parameter usage (it's passed to most other extractors)
- Error handling for invalid parameters
- Decoder handling (similar to other extractors)
What do you think about making it consistent with other extractor factory methods?
def create_response_to_file_extractor( self, model: ResponseToFileExtractorModel, + config: Config, + decoder: Optional[Decoder] = None, **kwargs: Any, ) -> ResponseToFileExtractor: - return ResponseToFileExtractor(parameters=model.parameters or {}) + return ResponseToFileExtractor( + config=config, + parameters=model.parameters or {}, + decoder=decoder or JsonDecoder(parameters={}) + )
2043-2059
: Should we improve parameter consistency and naming?A few suggestions to enhance readability and consistency:
- The
parameters
are passed to download_extractor but not to download_decoder - should we make this consistent?- Consider renaming
download_decoder
todownload_response_decoder
to be more descriptive about its purpose- The empty list
[]
in the DpathExtractor fallback could be more explicit about what it represents (field_path)WDYT about these improvements?
download_decoder = ( self._create_component_from_model(model=model.download_decoder, config=config) if model.download_decoder - else JsonDecoder(parameters={}) + else JsonDecoder(parameters=model.parameters or {}) ) download_extractor = ( self._create_component_from_model( model=model.download_extractor, config=config, decoder=download_decoder, parameters=model.parameters, ) if model.download_extractor else DpathExtractor( - [], + field_path=[], config=config, decoder=download_decoder, parameters=model.parameters ) )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(5 hunks)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
273-275
: LGTM! Clean addition of ResponseToFileExtractor to the component mappings
The new component is properly integrated into the factory's type system.
Also applies to: 433-433
@maxi297 Index: airbyte-integrations/connectors/source-sendgrid/manifest.yaml
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/airbyte-integrations/connectors/source-sendgrid/manifest.yaml b/airbyte-integrations/connectors/source-sendgrid/manifest.yaml
--- a/airbyte-integrations/connectors/source-sendgrid/manifest.yaml (revision e47ad300dee26c2ff964e856d90a751cbc76969d)
+++ b/airbyte-integrations/connectors/source-sendgrid/manifest.yaml (date 1733159536552)
@@ -640,6 +640,8 @@
path: "{{stream_slice['url']}}"
url_base: ""
http_method: GET
+ download_extractor:
+ type: ResponseToFileExtractor
schema_loader:
type: InlineSchemaLoader
schema:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks Artem. I'm eager to see this applied to amazon-ads
Signed-off-by: Artem Inzhyyants <[email protected]>
Signed-off-by: Artem Inzhyyants <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
1454-1460
: Consider adding a docstring for consistency?The implementation looks good, but other create methods in this class have docstrings. Would you consider adding one here as well? wdyt?
def create_response_to_file_extractor( self, model: ResponseToFileExtractorModel, **kwargs: Any, ) -> ResponseToFileExtractor: + """Creates a ResponseToFileExtractor component. + + Args: + model: The model containing the component configuration + **kwargs: Additional keyword arguments + + Returns: + ResponseToFileExtractor: The created component + """ return ResponseToFileExtractor(parameters=model.parameters or {})
2043-2062
: LGTM! Consider extracting decoder/extractor creation?The implementation looks good with proper null checks and fallbacks. However, the download_decoder and download_extractor creation logic is quite similar to other parts of the code. Would you consider extracting these into helper methods to improve readability? wdyt?
Something like:
+ def _create_download_decoder( + self, + model: AsyncRetrieverModel, + config: Config, + ) -> Decoder: + return ( + self._create_component_from_model(model=model.download_decoder, config=config) + if model.download_decoder + else JsonDecoder(parameters={}) + ) + + def _create_download_extractor( + self, + model: AsyncRetrieverModel, + config: Config, + decoder: Decoder, + parameters: Dict[str, Any], + ) -> Union[ResponseToFileExtractor, DpathExtractor]: + return ( + self._create_component_from_model( + model=model.download_extractor, + config=config, + decoder=decoder, + parameters=parameters, + ) + if model.download_extractor + else DpathExtractor( + [], + config=config, + decoder=decoder, + parameters=parameters or {}, + ) + ) def create_async_retriever( # ... existing parameters ... ) -> AsyncRetriever: # ... existing code ... - download_decoder = ( - self._create_component_from_model(model=model.download_decoder, config=config) - if model.download_decoder - else JsonDecoder(parameters={}) - ) - download_extractor = ( - self._create_component_from_model( - model=model.download_extractor, - config=config, - decoder=download_decoder, - parameters=model.parameters, - ) - if model.download_extractor - else DpathExtractor( - [], - config=config, - decoder=download_decoder, - parameters=model.parameters or {}, - ) - ) + download_decoder = self._create_download_decoder(model, config) + download_extractor = self._create_download_extractor( + model, config, download_decoder, model.parameters + ) # ... rest of the code ...
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(5 hunks)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
273-275
: LGTM!
The import follows the existing pattern and is correctly grouped with other model imports.
433-433
: LGTM!
The mapping is correctly added and maintains alphabetical ordering.
What
resolving https://github.com/airbytehq/airbyte-internal-issues/issues/10429
Summary by CodeRabbit
Summary by CodeRabbit
New Features
download_extractor
anddownload_decoder
properties in the asynchronous data retrieval process, enhancing flexibility for record extraction and decoding.ResponseToFileExtractor
class for handling responses that require extraction and decompression.CustomRecordExtractor
class for improved record extraction from JSON responses.ResponseToFileExtractor
tests.Bug Fixes
download_extractor
instantiation to ensure default functionality when not explicitly defined.