Need for list format #35

Open
harisbal opened this issue Apr 2, 2019 · 2 comments

harisbal commented Apr 2, 2019

Hello everyone, and congratulations on this great project.
Standardising formats will definitely help everyone in the transport industry.

I just wanted to raise a concern about the chosen approach. Although the transport industry has traditionally used and stored OD matrices in "matrix format" (i.e. rows and columns), I believe this is not the most efficient approach. In my view, and in that of quite a few other data analysts and programmers (e.g. https://vita.had.co.nz/papers/tidy-data.pdf ), storing data in a "list" or "database" format is more efficient. In this format, all ODs could be stored in a single file, and users could select data with simple, standardised queries.
For instance:

Origin_Zone, Destination_Zone, Trip_Purpose, Time_Period, Trips
A, B, HBW, AM, 10
Z, X, HBO, IP, 12
...
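To make the querying idea concrete, here is a minimal Python sketch that holds the list format as plain records and selects trips by purpose and time period. The rows reuse the two example lines above. `TripRow` and `select` are hypothetical names chosen for illustration; they are not part of OMX or any library.

```python
from typing import List, NamedTuple, Optional


class TripRow(NamedTuple):
    # One record per origin/destination/purpose/period combination
    # (hypothetical schema mirroring the columns in the example above).
    origin_zone: str
    destination_zone: str
    trip_purpose: str
    time_period: str
    trips: float


def select(rows: List[TripRow], purpose: Optional[str] = None,
           period: Optional[str] = None) -> List[TripRow]:
    """Return rows matching the given purpose and/or time period (None = any)."""
    return [
        r for r in rows
        if (purpose is None or r.trip_purpose == purpose)
        and (period is None or r.time_period == period)
    ]


rows = [
    TripRow("A", "B", "HBW", "AM", 10),
    TripRow("Z", "X", "HBO", "IP", 12),
]

# "All home-based work trips in the AM period" becomes a one-line filter.
am_work = select(rows, purpose="HBW", period="AM")
total_am_work_trips = sum(r.trips for r in am_work)
print(total_am_work_trips)  # 10
```

In a real workflow the same filter would more likely be a SQL `WHERE` clause or a dataframe query; the point is only that every dimension is an ordinary column.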

I would really like to hear the development team's views on this.

Kind Regards

-Haris

@pedrocamargo

Hey Haris,
I'd beg to differ. It is true that in a world of agent-based modelling, our demand matrices are incredibly sparse and what we actually have are trip tables (as you have suggested).
However, many of the inputs and outputs of our models are still dense matrices (e.g. skim matrices), and storing them in anything other than a proper matrix would not be efficient.
I would also point to the computational efficiency of storing arrays (of known size) contiguously in memory (or on disk), which I guess is part of the appeal of this format.
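The contiguous-storage argument can be illustrated with a rough sketch. In a dense n×n matrix stored row-major, no origin/destination columns are needed because a value's position encodes its zone pair: cell (i, j) sits at offset i·n + j. The example below uses Python's standard `array` module and is not meant to describe OMX's actual on-disk layout. The byte counts assume a hypothetical encoding of 8-byte values and 4-byte zone indices.

```python
from array import array


def dense_offset(i: int, j: int, n: int) -> int:
    """Row-major offset of cell (i, j) in an n x n matrix stored as one flat block."""
    return i * n + j


n = 3
# A small dense skim (e.g. travel times), flattened row by row into contiguous doubles.
skim = array("d", [0.0, 5.0, 9.0,
                   5.0, 0.0, 4.0,
                   9.0, 4.0, 0.0])

# Constant-time lookup with no search over origin/destination columns.
value = skim[dense_offset(1, 2, n)]  # zero-based row 1, column 2 -> 4.0


def dense_bytes(n: int, value_bytes: int = 8) -> int:
    # Only values are stored; zone pairs are implied by position.
    return n * n * value_bytes


def list_bytes(cells: int, value_bytes: int = 8, zone_bytes: int = 4) -> int:
    # Each record stores origin, destination and value explicitly (hypothetical encoding).
    return cells * (2 * zone_bytes + value_bytes)


zones = 5000
# Dense skim: every cell is populated, so the list format pays for two extra columns.
print(dense_bytes(zones), list_bytes(zones * zones))  # 200000000 400000000
# Sparse demand: if only 1% of cells are non-zero, the list format is far smaller.
print(dense_bytes(zones), list_bytes(zones * zones // 100))  # 200000000 4000000
```

Under these assumptions the trade-off both comments describe falls out directly: the list format doubles the size of a fully populated skim but shrinks a very sparse demand matrix by orders of magnitude.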
Maybe what we need are examples of smart ways to convert OMX matrices to data tables and vice versa (I could help with the Python version)?
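As a possible starting point for such conversion examples, the pure-Python sketch below flattens a square matrix (a list of rows indexed by a list of zone labels) into origin/destination/value records and rebuilds it. It assumes the matrix has already been read into memory; reading and writing the OMX file itself is out of scope here, and `matrix_to_table` / `table_to_matrix` are hypothetical names, not part of any OMX API.

```python
from typing import Dict, List, Sequence, Tuple

Record = Tuple[str, str, float]


def matrix_to_table(matrix: Sequence[Sequence[float]], zones: Sequence[str],
                    skip_zeros: bool = True) -> List[Record]:
    """Flatten a square matrix into (origin, destination, value) records.

    With skip_zeros=True only non-zero cells are kept, which suits sparse demand.
    """
    records = []
    for i, origin in enumerate(zones):
        for j, destination in enumerate(zones):
            value = matrix[i][j]
            if skip_zeros and value == 0:
                continue
            records.append((origin, destination, value))
    return records


def table_to_matrix(records: Sequence[Record], zones: Sequence[str]) -> List[List[float]]:
    """Rebuild a dense matrix; OD pairs missing from the table become 0.0."""
    index: Dict[str, int] = {zone: k for k, zone in enumerate(zones)}
    n = len(zones)
    matrix = [[0.0] * n for _ in range(n)]
    for origin, destination, value in records:
        # Accumulate in case the table holds several rows for the same OD pair.
        matrix[index[origin]][index[destination]] += value
    return matrix


zones = ["A", "B", "C"]
demand = [[0.0, 10.0, 0.0],
          [3.0, 0.0, 0.0],
          [0.0, 0.0, 7.0]]

table = matrix_to_table(demand, zones)
print(table)  # [('A', 'B', 10.0), ('B', 'A', 3.0), ('C', 'C', 7.0)]
print(table_to_matrix(table, zones) == demand)  # True
```

Summing rather than overwriting in `table_to_matrix` is a deliberate choice: if several rows share an OD pair (for example after a purpose column has been dropped), their values are added into a single total.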


harisbal commented Apr 2, 2019

Hi Pedro!
Allowing for both worlds would probably be the best approach since, as you already pointed out, different formats suit different scenarios. It's true, though, that converting matrices to lists is a rather simple task.
The dev team can count me in to contribute to this approach.
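The "all ODs in a single file" idea can be sketched by stacking several matrices, keyed by trip purpose and time period, into one long table with the columns from the original example. The dictionary of matrices is an assumed in-memory stand-in for a file holding one matrix per purpose/period, and `stack_matrices` is a hypothetical helper; the output is written as CSV with Python's standard `csv` module.

```python
import csv
import io
from typing import Dict, List, Sequence, Tuple

HEADER = ["Origin_Zone", "Destination_Zone", "Trip_Purpose", "Time_Period", "Trips"]


def stack_matrices(matrices: Dict[Tuple[str, str], Sequence[Sequence[float]]],
                   zones: Sequence[str]) -> List[list]:
    """Turn {(purpose, period): matrix} into long-format rows, skipping zero cells."""
    rows = []
    # Sort by (purpose, period) so the output order is deterministic.
    for (purpose, period), matrix in sorted(matrices.items()):
        for i, origin in enumerate(zones):
            for j, destination in enumerate(zones):
                trips = matrix[i][j]
                if trips:
                    rows.append([origin, destination, purpose, period, trips])
    return rows


zones = ["A", "B"]
matrices = {
    ("HBW", "AM"): [[0, 10], [0, 0]],
    ("HBO", "IP"): [[0, 0], [12, 0]],
}

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(HEADER)
writer.writerows(stack_matrices(matrices, zones))
print(buffer.getvalue())
```

Each extra dimension (purpose, period, mode, and so on) simply becomes another key column, which is what allows a single file to hold every matrix.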
Cheers!
