This repository has been archived by the owner on Mar 1, 2021. It is now read-only.

insta_posts with user_id=0 in elasticsearch #224

Open
Urhengulas opened this issue Dec 4, 2019 · 0 comments

Description

At the moment we have 583652 insta_posts indexed in our Elasticsearch that have a user_id of 0.

You can see this in aggregations.user.buckets in the response to the following request in Kibana:

GET /insta_posts/_search
{"aggregations":{"user":{"terms":{"field":"user_id"}}}}

Some statistics from our Postgres:

instascraper=> SELECT COUNT(*) FROM posts WHERE user_id is NULL;
 count  
--------
 724473
(1 row)

instascraper=> SELECT max(id) FROM posts WHERE user_id is NULL;
   max    
----------
 49919470
(1 row)

instascraper=> SELECT min(id) FROM posts WHERE user_id is NULL;
 min  
------
 9641
(1 row)

Our guess is that some messages in kafka/postgres.public.posts have no user_id, so the indexer falls back to the zero value for that field, which is 0.
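
To illustrate the guess above, here is a minimal sketch of how a null user_id can end up as 0. It assumes the indexer is written in Go and decodes JSON messages with encoding/json, which this issue does not confirm. The message shape and the names postValue, postPointer, decodeValue and decodePointer are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// postValue is a hypothetical document shape with a plain int user ID.
// A null or missing user_id leaves UserID at its zero value, 0.
type postValue struct {
	ID     int `json:"id"`
	UserID int `json:"user_id"`
}

// postPointer is a hypothetical alternative: with *int, a null or missing
// user_id stays nil and can be told apart from a real user ID of 0.
type postPointer struct {
	ID     int  `json:"id"`
	UserID *int `json:"user_id"`
}

// decodeValue decodes a message into the plain-int shape.
func decodeValue(msg []byte) (postValue, error) {
	var p postValue
	err := json.Unmarshal(msg, &p)
	return p, err
}

// decodePointer decodes a message into the pointer shape.
func decodePointer(msg []byte) (postPointer, error) {
	var p postPointer
	err := json.Unmarshal(msg, &p)
	return p, err
}

func main() {
	// Assumed message payload for a post whose user_id is NULL in Postgres.
	msg := []byte(`{"id": 9641, "user_id": null}`)

	v, err := decodeValue(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println("plain int user_id:", v.UserID)

	p, err := decodePointer(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println("pointer user_id is nil:", p.UserID == nil)
}
```

If the indexer behaves like this, switching the field to a pointer (or another nullable type) and skipping or omitting nil values at index time would keep these posts from landing in the user_id=0 bucket.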
