log2timeline.py fails to process large gzip input file #2746

Open
meeehow opened this issue Sep 12, 2019 · 2 comments
Labels: blocked (Work cannot progress until another issue is resolved), core (Issues relating to Plaso's core - processing, file access etc.)

meeehow commented Sep 12, 2019

Description of problem:

log2timeline fails to process a large gzip file provided as input. It works as expected with smaller archives.

Command line and arguments:

log2timeline.py --process_archives --debug x.plaso x.tar.gz

Source data:

A ~3 GB gzip file containing a Linux file system, created with: tar zcvf x.tar.gz /*

Plaso version:

20190708

Operating system Plaso is running on:

Not specified.

Debug output/tracebacks:

2019-09-12 15:40:52,066 [DEBUG] (MainProcess) PID:88739 <log2timeline_tool> Starting extraction in single process mode.
2019-09-12 15:40:52,460 [DEBUG] (MainProcess) PID:88739 <extractors> Active parsers: amcache, android_app_usage, apache_access, asl_log, bash, bencode, binary_cookies, bsm_log, chrome_cache, chrome_preferences, cups_ipp, custom_destinations, czip, dockerjson, dpkg, esedb, filestat, firefox_cache, firefox_cache2, fsevents, gdrive_synclog, java_idx, lnk, mac_appfirewall_log, mac_keychain, mac_securityd, mactime, macwifi, mcafee_protection, mft, msiecf, olecf, opera_global, opera_typed_history, pe, plist, pls_recall, popularity_contest, prefetch, recycle_bin, recycle_bin_info2, rplog, santa, sccm, selinux, skydrive_log, skydrive_log_old, sophos_av, sqlite, symantec_scanlog, syslog, systemd_journal, trendmicro_url, trendmicro_vd, usnjrnl, utmp, utmpx, winevt, winevtx, winfirewall, winiis, winjob, winreg, xchatlog, xchatscrollback, zsh_extended_history
2019-09-12 15:40:52,460 [DEBUG] (MainProcess) PID:88739 <hashing_analyzer> Got hasher names: sha256
2019-09-12 15:40:52,460 [DEBUG] (MainProcess) PID:88739 <single_process> Processing started.
2019-09-12 15:40:52,463 [DEBUG] (MainProcess) PID:88739 <worker> [ProcessFileEntry] processing file entry: OS:/[redacted].tar.gz
2019-09-12 15:40:52,463 [DEBUG] (MainProcess) PID:88739 <worker> [ProcessFileEntryDataStream] processing data stream: "" of file entry: OS:/[redacted].tar.gz
2019-09-12 15:40:52,463 [DEBUG] (MainProcess) PID:88739 <worker> [AnalyzeDataStream] analyzing file: OS:/[redacted].tar.gz
2019-09-12 15:40:58,506 [DEBUG] (MainProcess) PID:88739 <hashing_analyzer> Processing results for hasher sha256
2019-09-12 15:40:58,506 [DEBUG] (MainProcess) PID:88739 <worker> [AnalyzeFileObject] attribute sha256_hash:9c3a16aac4287bd39fb31123f3d27d786733b456c320cd9a23ab5722c8a373b2 calculated for file: OS:/[redacted].tar.gz.
2019-09-12 15:40:58,506 [DEBUG] (MainProcess) PID:88739 <worker> [AnalyzeDataStream] completed analyzing file: OS:/[redacted].tar.gz
2019-09-12 15:40:58,506 [DEBUG] (MainProcess) PID:88739 <worker> [ExtractMetadataFromFileEntry] processing file entry: OS:/[redacted].tar.gz
2019-09-12 15:40:58,507 [DEBUG] (MainProcess) PID:88739 <worker> [ProcessFileEntry] done processing file entry: OS:/[redacted].tar.gz
2019-09-12 15:44:18,380 [WARNING] (MainProcess) PID:88739 <single_process> Unhandled exception while processing path spec: GZIP:/[redacted].tar.gz.
2019-09-12 15:44:18,380 [ERROR] (MainProcess) PID:88739 <single_process> 
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/plaso/engine/single_process.py", line 54, in _ProcessPathSpec
    parser_mediator, path_spec, excluded_find_specs=excluded_find_specs)
  File "/usr/lib/python3/dist-packages/plaso/engine/worker.py", line 803, in ProcessPathSpec
    path_spec, resolver_context=mediator.resolver_context)
  File "/usr/lib/python3/dist-packages/dfvfs/resolver/resolver.py", line 60, in OpenFileEntry
    file_entry = file_system.GetFileEntryByPathSpec(path_spec_object)
  File "/usr/lib/python3/dist-packages/dfvfs/vfs/gzip_file_system.py", line 27, in GetFileEntryByPathSpec
    self._resolver_context, self, path_spec, is_root=True, is_virtual=True)
  File "/usr/lib/python3/dist-packages/dfvfs/vfs/gzip_file_entry.py", line 36, in __init__
    path_spec, resolver_context=resolver_context)
  File "/usr/lib/python3/dist-packages/dfvfs/resolver/resolver.py", line 112, in OpenFileObject
    file_object.open(path_spec=path_spec_object)
  File "/usr/lib/python3/dist-packages/dfvfs/file_io/file_io.py", line 76, in open
    self._Open(path_spec=path_spec, mode=mode)
  File "/usr/lib/python3/dist-packages/dfvfs/file_io/gzip_file_io.py", line 212, in _Open
    self._gzip_file_object, next_member_offset, uncompressed_data_offset)
  File "/usr/lib/python3/dist-packages/dfvfs/lib/gzipfile.py", line 183, in __init__
    self._LoadDataIntoCache(file_object, 0, read_all_data=True)
  File "/usr/lib/python3/dist-packages/dfvfs/lib/gzipfile.py", line 380, in _LoadDataIntoCache
    self._cache = b''.join([self._cache, data_to_add])
MemoryError
2019-09-12 15:44:30,451 [DEBUG] (MainProcess) PID:88739 <single_process> Processing completed.
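In the traceback, dfvfs's gzipfile module calls _LoadDataIntoCache(file_object, 0, read_all_data=True) and grows self._cache with b''.join([self._cache, data_to_add]). The whole decompressed member therefore has to fit in memory, and each join briefly needs room for both the old cache and the new, larger copy. A minimal sketch of a bounded-memory alternative, using Python's zlib module rather than dfvfs or libgzipf code (CHUNK_SIZE, GZIP_WBITS and iter_gzip_chunks are hypothetical names):

```python
import gzip
import io
import zlib
from typing import BinaryIO, Iterator

# Hypothetical chunk size: bounds both compressed reads and decompressed output.
CHUNK_SIZE = 1024 * 1024

# wbits = 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
GZIP_WBITS = 16 + zlib.MAX_WBITS


def iter_gzip_chunks(file_object: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield the decompressed contents of a (possibly multi-member) gzip stream.

    Only roughly chunk_size bytes of input and of output are held at a time,
    instead of one cache for the whole uncompressed stream.
    """
    decompressor = zlib.decompressobj(GZIP_WBITS)
    pending = b""
    while True:
        if not pending:
            pending = file_object.read(chunk_size)
            if not pending:
                break
        # max_length caps the output of this call; input that did not fit is
        # kept in unconsumed_tail and fed back on the next iteration.
        data = decompressor.decompress(pending, chunk_size)
        if data:
            yield data
        if decompressor.eof:
            # End of one gzip member: any leftover bytes start the next member.
            pending = decompressor.unused_data
            decompressor = zlib.decompressobj(GZIP_WBITS)
        else:
            pending = decompressor.unconsumed_tail
    # Emit any output zlib still holds internally once the input is exhausted.
    tail = decompressor.flush()
    if tail:
        yield tail


if __name__ == "__main__":
    sample = gzip.compress(b"example data " * 1000)
    total = sum(len(chunk) for chunk in iter_gzip_chunks(io.BytesIO(sample), 4096))
    print(total)  # 13000
```

Because only one chunk is live at a time, the same loop could feed hashlib.sha256().update() or a parser without holding the full uncompressed stream. It does not by itself give random access to arbitrary offsets, which a seekable file-like object also has to provide.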
joachimmetz commented:

I have been able to reproduce this with acserver.tar.gz.

joachimmetz self-assigned this Dec 30, 2020
joachimmetz added the core label Dec 30, 2020
joachimmetz added this to the 2021 January release milestone Dec 30, 2020
joachimmetz commented:

Initial tests with libgzipf indicate that it helps to limit memory usage:
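The comment does not describe how libgzipf limits memory. For comparison only, Python's standard gzip module already offers seekable reads over gzip data without caching the whole stream: in read mode, GzipFile.seek() decompresses forward and discards the output. A minimal sketch (read_uncompressed_range is a hypothetical helper, not part of dfvfs or libgzipf):

```python
import gzip
import io
from typing import BinaryIO


def read_uncompressed_range(file_object: BinaryIO, offset: int, size: int) -> bytes:
    """Return up to size bytes starting at an uncompressed offset of gzip data."""
    # Closing a GzipFile does not close the file_object passed as fileobj.
    with gzip.GzipFile(fileobj=file_object, mode="rb") as gzip_file:
        # In read mode, seek() is emulated: it decompresses forward and
        # discards the output in small buffers, so memory stays bounded.
        # The price is CPU time proportional to offset, and a seek backwards
        # restarts decompression from the beginning of the data.
        gzip_file.seek(offset)
        return gzip_file.read(size)


if __name__ == "__main__":
    data = bytes(range(256)) * 4096  # 1 MiB of sample data
    blob = gzip.compress(data)
    print(read_uncompressed_range(io.BytesIO(blob), 1000, 4))  # b'\xe8\xe9\xea\xeb'
```

The trade-off is CPU for memory: every read at a large offset, and every backward seek, re-decompresses from the start of the data.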
