Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add streets table from data.gov #2478

Closed
atalyaalon opened this issue Sep 27, 2023 · 10 comments
Closed

Add streets table from data.gov #2478

atalyaalon opened this issue Sep 27, 2023 · 10 comments
Assignees
Milestone

Comments

@atalyaalon
Copy link
Collaborator

The list in data.gov is here

  1. Add it to our DB
  2. Create a flow in Airflow that updates it once a week
  3. Decide where to use it and how in our streets cache, discuss with @tkalir and me, and let's implement
@atalyaalon atalyaalon added this to the v.0.17.0 milestone Sep 27, 2023
@ziv17
Copy link
Collaborator

ziv17 commented Feb 2, 2024

Hi @atalyaalon,

  • this is still relevant, right?
  • do you mean to download manually, or automatically as part of the insert?
  • If automatically, is there an endpoint for API, or do I need to parse the HTML?
  • If manually:
  • Encoding: originally it is Hebrew(1255), and size 1.7M Converting to UTF-8, it takes 2.5M. So I assume we will store it in S3? Hebrew(1255) or UTF-8?

@atalyaalon
Copy link
Collaborator Author

@ziv17 I suggest you take this next task , since this is relevant for Tamar's work
(I know there is Streets table, not sure when and where it's used, but you can find this mistery and perhaps use this table for this load)

I suggest

@atalyaalon
Copy link
Collaborator Author

Following this issue
I think we need to add to the flow the removal of chars like space, tab, etc
Ziv, What do you think?

@ziv17
Copy link
Collaborator

ziv17 commented Dec 20, 2024

The added tab at the end of the street is not in the streets table, it is in the news_flash.
Do we have issues of added white space, or erroneous names in the streets table?

@atalyaalon
Copy link
Collaborator Author

atalyaalon commented Dec 25, 2024

The added tab at the end of the street is not in the streets table, it is in the news_flash.

Got it.
I guess we see that since the CBS location added to the newsflash table comes from the CBS data and perhaps they had the tab there..
The hebrew names fo the streets of CBS data in parsers/cbs/executor.py are filled from the CBS dictionary.

Perhaps we should either:

  • Fill the CBS streets from the gov.il streets table. Perhaps it will be better to do so for data consistency?
  • Do nothing and just wait for task 2 here to be completed. That way querying will be only based on numeric fields and not text fields.

What do you think?

Do we have issues of added white space, or erroneous names in the streets table?

Not that I know of

@ziv17
Copy link
Collaborator

ziv17 commented Dec 26, 2024

Hi @atalyaalon ,
I think streets table is already filled from gov.il. I think it was entered in PR-2668. I think this issue (#2668 ) can be closed.

@ziv17
Copy link
Collaborator

ziv17 commented Dec 26, 2024

I guess we see that since the CBS location added to the newsflash table comes from the CBS data and perhaps they had the tab there..

by "CBS data" you mean cbs_locations table? The street that had the extra \t in the news_flash is correct there.
Another point. I think I added a change to the news_flash creation that sets also yishuv_symbol and street numeric values. The news_flash from the issue does not have these numeric values.

@atalyaalon
Copy link
Collaborator Author

I guess we see that since the CBS location added to the newsflash table comes from the CBS data and perhaps they had the tab there..

by "CBS data" you mean cbs_locations table? The street that had the extra \t in the news_flash is correct there. Another point. I think I added a change to the news_flash creation that sets also yishuv_symbol and street numeric values. The news_flash from the issue does not have these numeric values.

I mean both cbs_locations table and our own CBS tables (markers, markers_hebrew, etc)
In addition, I checked, and the \t appears in cbs_locations - download csv [from here](https://redash.dataforchange.org.il/queries/167/source and open with excel - the tab is there).
For some reason Redash UI does not display this tab but it exist. However we will take care of this in another issue. Great work!

Regarding your second point, you are correct, the street and yishuv_symbol numeric values are still not saved in the newsflashes see here for example. Road segment id btw is saved properly. I opened another issue for it and we will take care of it in the coming future.

@atalyaalon
Copy link
Collaborator Author

Hi @atalyaalon , I think streets table is already filled from gov.il. I think it was entered in PR-2668. I think this issue (#2668 ) can be closed.

I agree, closing this one

@atalyaalon
Copy link
Collaborator Author

Great work @ziv17 !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants