[FR] Replace ciso8601 with dateutil #10

Open
berggren opened this issue May 26, 2020 · 4 comments
Labels
enhancement New feature or request

Comments

@berggren

Is your feature request related to a problem? Please describe.
One of the dependencies for this project is ciso8601, a C library that parses datetime strings into datetime objects. As a side effect, we need to install build tools when building Docker containers.

Describe the solution you'd like
I'd like to investigate whether this library is essential, or whether it could be replaced with something like the dateutil library. WDYT?

Additional context
This affects the Timesketch project and has caused an issue when building our Docker containers. I have disabled the Mans importer for the moment, but would like to re-enable it as soon as possible.

@berggren berggren added the enhancement New feature or request label May 26, 2020
@berggren berggren changed the title "[FR]" Replace ciso8601 with dateutil [FR] Replace ciso8601 with dateutil May 26, 2020
@dadokkio
Collaborator

If we send data to Elastic with a single wrong date, it will be discarded, so we need to parse and check all the dates we use as timeline info.
Parsing millions of rows in pandas dataframes takes a huge amount of time, so using ciso8601 was a quick win.
We can make this dependency optional, but that will make all the processing slower.
I'll try to run some tests with both for comparison.

@berggren
Author

berggren commented May 27, 2020

Thanks for your quick response!

OK, I see. I'm curious why the timestamp would be wrong in the first place, and also wondering whether you really want to parse it then? :) Anyhow, if we can move along with #11 instead, then this issue can be closed.

Let me know.

@dadokkio
Collaborator

I don't know exactly why.
For example, it happened recently that a mans file had FileItem.Changed: '16339-02-03T00:12:01Z' as a value.

In this case we remove the item from the records, but in any case we need to parse all date values to change the date format and generate timestamps.
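
As a rough illustration of the step described above, the sketch below drops records whose date cannot be parsed and adds a timestamp to the rest. It uses only the standard library. The record layout, the `timestamp` key, the microsecond unit and the function names are assumptions for the example, not taken from the importer.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_micros(value):
    """Return microseconds since the Unix epoch, or None for an unparsable date."""
    try:
        # Python < 3.11 does not accept a trailing "Z" in fromisoformat().
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except (AttributeError, ValueError):
        return None
    if parsed.tzinfo is None:
        # Assumption: values without an offset are already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids float rounding.
    return (parsed - EPOCH) // timedelta(microseconds=1)


def clean_records(records, field="FileItem.Changed"):
    """Keep only records with a valid date and attach a timestamp to each."""
    kept = []
    for record in records:
        micros = to_epoch_micros(record.get(field))
        if micros is None:
            continue  # e.g. '16339-02-03T00:12:01Z' is rejected here
        kept.append({**record, "timestamp": micros})
    return kept


records = [
    {"FileItem.Changed": "2020-05-26T10:00:00Z", "path": "a.txt"},
    {"FileItem.Changed": "16339-02-03T00:12:01Z", "path": "b.txt"},
]
cleaned = clean_records(records)
```

Only the first of the two sample records survives, since a five-digit year is not a valid ISO 8601 date for the parser.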

@garanews
Member

@berggren
This is the output of the tool:
image
So we don't know exactly why the date is wrong, and this XML file contains roughly 8M dates that we should check, so ciso8601 outperforms dateutil.
