Simplify Teams connector design to not run a queue inside of the connector #2933

Open
artem-shelkovnikov opened this issue Nov 1, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@artem-shelkovnikov
Member

Problem Description

The Teams connector syncs data using the Graph API: https://github.com/elastic/connectors/blob/main/connectors/sources/microsoft_teams.py#L249.

The Graph API has strict throttling policies, so in most cases there is no point in implementing high concurrency when loading data from it. Even so, the Teams connector is set up to make concurrent requests to the Graph API through an internal queue:

class MicrosoftTeamsDataSource(BaseDataSource):
    def __init__(self, configuration):
        self.queue = MemQueue(maxmemsize=QUEUE_MEM_SIZE, refresh_timeout=120)
        ...

    async def get_docs(self):
        ...
        async for tabs in self.client.get_user_chat_tabs(chat_id=chat["id"]):
            for tab in tabs:
                await self.queue.put(
                    (
                        self.formatter.format_doc(
                            item=tab,
                            document_type=self.schema.chat_tabs,
                            document={
                                "type": UserEndpointName.TABS.value,
                                "url": tab.get("configuration", {}).get("websiteUrl"),
                                "_timestamp": chat["lastUpdatedDateTime"],
                                "members": members,
                            },
                        ),
                        None,
                    )
                )
        ...

which:

  1. Makes the code more complex
  2. Makes errors much harder to read, because they are handled in the async queue rather than in regular async methods
  3. Does not even add any parallelism: as the code snippet shows, the queue only receives already-formatted documents, so the data is still loaded in the regular, sequential way
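
Point 3 can be checked with a small sketch. Everything in it is a hypothetical stand-in: `FakeClient` replaces the Graph API client, `asyncio.Queue` replaces `MemQueue`, and the `None` sentinel is only there to end the demo. The producer has the same shape as the snippet above (await a page, then enqueue the formatted items), and the fake client counts how many requests overlap.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for the Graph API client; tracks overlapping requests."""

    def __init__(self):
        self.in_flight = 0
        self.max_in_flight = 0

    async def fetch_page(self, page):
        self.in_flight += 1
        self.max_in_flight = max(self.max_in_flight, self.in_flight)
        await asyncio.sleep(0)  # yield control, as a real network call would
        self.in_flight -= 1
        return [f"tab-{page}-{i}" for i in range(2)]

    async def get_tabs(self, pages):
        # Pages are requested one after another, like a paginated API call.
        for page in range(pages):
            yield await self.fetch_page(page)


async def producer(client, queue, pages):
    # Same shape as the snippet: each page is awaited before anything is enqueued.
    async for tabs in client.get_tabs(pages):
        for tab in tabs:
            await queue.put({"_id": tab})
    await queue.put(None)  # hypothetical sentinel: no more documents


async def sync_with_queue(pages=3):
    client = FakeClient()
    queue = asyncio.Queue()  # stand-in for MemQueue
    task = asyncio.create_task(producer(client, queue, pages))
    docs = []
    while (doc := await queue.get()) is not None:
        docs.append(doc)
    await task
    return docs, client.max_in_flight
```

In this sketch `asyncio.run(sync_with_queue())` returns all six documents with a peak of one request in flight, so the queue only relays results that were already loaded sequentially.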

Proposed Solution

Remove the use of MemQueue from the Microsoft Teams connector.
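
A minimal sketch of what this could look like, assuming `get_docs` stays an async generator and yields the same `(document, None)` pair that is currently put into the queue. `fake_get_user_chat_tabs` and `fake_format_doc` are hypothetical stand-ins for the real client and formatter, not the connector's actual code.

```python
import asyncio


async def fake_get_user_chat_tabs(chat_id):
    """Hypothetical stand-in for the client's paginated tabs call."""
    yield [{"id": f"{chat_id}-tab-1"}, {"id": f"{chat_id}-tab-2"}]
    yield [{"id": f"{chat_id}-tab-3"}]


def fake_format_doc(tab, chat):
    """Hypothetical stand-in for the connector's formatter."""
    return {"_id": tab["id"], "_timestamp": chat["lastUpdatedDateTime"]}


async def get_docs(chats):
    """Yield (document, None) pairs directly, with no queue in between."""
    for chat in chats:
        async for tabs in fake_get_user_chat_tabs(chat_id=chat["id"]):
            for tab in tabs:
                # An exception raised here propagates straight to the caller,
                # with a traceback that points at this line.
                yield fake_format_doc(tab, chat), None


async def collect_docs(chats):
    return [doc async for doc, _ in get_docs(chats)]


def run_example():
    chats = [{"id": "chat-a", "lastUpdatedDateTime": "2024-11-01T00:00:00Z"}]
    return asyncio.run(collect_docs(chats))
```

The documents come out in the same order as before; the only difference is that nothing sits between the code that fails and the code that consumes the generator.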

Alternatives

None

Additional Context

This issue was created after a user reported problems and the log statements from their connector turned out to be practically unreadable:

[FMWK][15:53:04][ERROR] Exception found for task Task-471: 'NoneType' object has no attribute 'get'
NoneType: None

Investigation suggests that this happens because the queue is potentially masking the exceptions.
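
One way such output can come about is sketched below. The framework's actual error-reporting code is not shown in this issue, so the done-callback here is an assumption used only for illustration. In Python, calling a logger with `exc_info=True` outside an `except` block finds no active exception, so the record ends with `NoneType: None` instead of a traceback. The same failure awaited directly and logged inside `except` keeps the full traceback.

```python
import asyncio
import io
import logging


def make_logger(stream):
    """Build an isolated logger that writes to the given stream."""
    logger = logging.getLogger("teams_queue_sketch")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.ERROR)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))
    logger.addHandler(handler)
    return logger


async def format_tab(tab):
    # Fails the same way as in the report when `tab` is None.
    return tab.get("configuration", {}).get("websiteUrl")


async def run_in_background(logger):
    def report(task):
        exc = task.exception()
        if exc is not None:
            # Hypothetical callback: there is no active exception here,
            # so exc_info=True has no traceback to attach.
            logger.error(
                "Exception found for task %s: %s",
                task.get_name(),
                exc,
                exc_info=True,
            )

    task = asyncio.create_task(format_tab(None), name="Task-demo")
    task.add_done_callback(report)
    await asyncio.wait([task])


async def run_directly(logger):
    try:
        await format_tab(None)
    except AttributeError:
        # Inside `except`, logger.exception records the full traceback.
        logger.exception("Failed to format tab")


def capture(coro_fn):
    """Run one of the coroutines above and return everything it logged."""
    stream = io.StringIO()
    asyncio.run(coro_fn(make_logger(stream)))
    return stream.getvalue()
```

`capture(run_in_background)` produces a message followed by `NoneType: None`, much like the log above, while `capture(run_directly)` includes a traceback that names `format_tab`.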

@artem-shelkovnikov artem-shelkovnikov added the enhancement New feature or request label Nov 1, 2024