Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exception trying to read large 3.9 GB grib file #41

Open
SpechtacularDave opened this issue Feb 4, 2022 · 6 comments
Open

Exception trying to read large 3.9 GB grib file #41

SpechtacularDave opened this issue Feb 4, 2022 · 6 comments

Comments

@SpechtacularDave
Copy link

SpechtacularDave commented Feb 4, 2022

Hello,

Please let me know if there is a better place to ask a question like this one. I wasn't able to find a forum or user group for metview, but if there is a resource like that which would be better suited, please direct me and feel free to close this issue.

Using the metview python api, I am trying to read a 3.9 GB grib file that I retrieved from the the CDS API. But when I run the read() command on the file, after a couple minutes I receive the following:
Exception: Metview error: Service mars@8600 died before completing the request

This happens when trying to read the grib file with:
fs = mv.read(source="1980.grib")

And it also happens even if I just try to extract a small area:
fs = mv.read(source="1980.grib", area=[30, -100, 29.75, -99.75])

The file contains data for 9 parameters from ERA5, ["10u", "10v", "2d", "2t", "i10fg", "msl", "ptype", "tcc", "tp"] that all have hourly values for a whole year, for the area of [50, -125, 24, -66], which covers the entire USA country. I believe the two precipitation parameters are stored in grib2 messages, while the rest are grib1.

Is there a file size limit for Metview? If so, can you recommend a way to split my files containing grib1 and grib2 messages into smaller files? Or is it necessary to compute some kind of index on the file before reading it?

If it matters, I don't need to read the whole file at once, I just need to be able to extract a small area at a time, so that I can combine it with data from another grib file for the next year, so that I can ultimately compute some statistics on the combined dataset for each point.

Thank you, - David

@iainrussell
Copy link
Member

Hi David,

This is a perfectly good place to ask Metview questions!

You can set this environment variable before starting the Python process, then you should get more output.

export METVIEW_PYTHON_DEBUG=1

I suspect you're encountering an internal buffer limit - you might see a message like this:
"wmo_read_any_from_file: error -3 (Passed buffer is too small) l=140726296292856, len=3732480187"

in which case you can set this environment variable to the size of the GRIB file before importing Metview, e.g.
export MARS_READANY_BUFFER_SIZE=3732480200 # use the size of the GRIB file here in bytes

Let us know if this solves your problem.

If not, then it is possible to split the file using ecCodes command-line tools first (grib_filter).

Cheers,
Iain

@SpechtacularDave
Copy link
Author

Hi Iain!

I'm glad, thanks for the help. I turned on debug messages, and while I tried setting a large buffer size, I think it's actually an issue with the mars service timing out after about 5 minutes and 30 seconds. Here are some lines I think are relevant from the debug messages (full output at bottom of this post):

event - INFO - 20220206.222550 - mars : Timeout = 300 Alive = 0, Max = 1
event - INFO - 20220206.222550 - pool : Timeout = 86400 Alive = 0, Max = 0
event - INFO - 20220206.223115 - Service mars has timed out
Mars - FATAL - 20220206.223115 - Server localhost port 36571 is dead
Python - FATAL - 20220206.223231 - Server localhost port 36571 is dead

I took a guess at what the mars timeout variable might be with:
export MARS_TIMEOUT=1800
But it's still timing out after 5 minutes 30 seconds, so that's not it. Is there a different name for it? Or is there anything else I should try instead?

Thanks,
David

MetviewInvoker: Invoking Metview
Starting Metview using these command args:
['metview', '-nocreatehome', '-slog', '-python-serve', '/tmp/tmpcc4phicp', '37129']
Metview 5.14.1 (2022.01.20) @ david-OptiPlex-7020
Installed in /home/david/anaconda3/envs/metview/lib/metview-bundle
event - INFO - 20220206.222550 - Starting server: port is 36571
event - INFO - 20220206.222550 - NOTE: $EVENT_HOST is now 'localhost'!
event - INFO - 20220206.222550 - Got connection
event - INFO - 20220206.222550 - incoming host is localhost (127.0.0.1)
event - INFO - 20220206.222550 - Register: PythonServe david david-OptiPlex-7020 37149 as ref 1
event - INFO - 20220206.222550 - Maximum value for PythonServe : 1
event - INFO - 20220206.222550 - Got connection
event - INFO - 20220206.222550 - incoming host is localhost (127.0.0.1)
event - INFO - 20220206.222550 - Register: /Process@37129/PythonScript, line 0:1 david david-OptiPlex-7020 37129 as ref 3
event - INFO - 20220206.222550 - Alive value for /Process@37129/PythonScript, line 0:1 : on
event - INFO - 20220206.222550 - Starting service: mars
event - INFO - 20220206.222550 - With command : " /home/david/anaconda3/envs/metview/lib/metview-bundle/bin/Mars"
event - INFO - 20220206.222550 - Got connection
event - INFO - 20220206.222550 - incoming host is localhost (127.0.0.1)
event - INFO - 20220206.222550 - Register: mars david david-OptiPlex-7020 37154 as ref 5
event - INFO - 20220206.222550 - mars : Timeout = 300 Alive = 0, Max = 1
event - INFO - 20220206.222550 - Maximum value for mars : 1
event - INFO - 20220206.222550 - Maximum value for mars : 1
event - INFO - 20220206.222550 - Maximum value for mars : 1
event - INFO - 20220206.222550 - Maximum value for mars : 1
event - INFO - 20220206.222550 - Maximum value for mars : 1
event - INFO - 20220206.222550 - Maximum value for mars : 1
event - INFO - 20220206.222550 - Maximum value for mars : 1
event - INFO - 20220206.222550 - Maximum value for mars : 1
event - INFO - 20220206.222550 - Maximum value for mars : 1
event - INFO - 20220206.222550 - Maximum value for mars : 1
event - INFO - 20220206.222550 - Maximum value for mars : 1
Mars - INFO - 20220206.222550 - package mir version: 1.10.2
event - INFO - 20220206.222550 - Got connection
event - INFO - 20220206.222550 - incoming host is localhost (127.0.0.1)
event - INFO - 20220206.222550 - Register: mars@37158 david david-OptiPlex-7020 37158 as ref 6
event - INFO - 20220206.222550 - Starting service: pool
event - INFO - 20220206.222550 - With command : " /home/david/anaconda3/envs/metview/lib/metview-bundle/bin/pool"
event - INFO - 20220206.222550 - Got connection
event - INFO - 20220206.222550 - incoming host is localhost (127.0.0.1)
event - INFO - 20220206.222550 - Register: pool david david-OptiPlex-7020 37160 as ref 8

pool - INFO - 20220206.222550 - [mars@37158] - Object /Process@37129/PythonScript, line 0:0 [READ] not cached
--> Retrieve::serve in -->
MARS HOME:
MARS LANGUAGE FILE: /home/david/anaconda3/envs/metview/lib/metview-bundle/share/metview/etc/MarsDef
MARS RULES FILE: /home/david/anaconda3/envs/metview/lib/metview-bundle/share/metview/etc/MarsRules

Mars - INFO - 20220206.222550 - GRIB conversion needed...
Mars - WARN - 20220206.222614 - CACHE-MANAGER mir/cropping, /tmp/cache does not exist
Mars - WARN - 20220206.222614 - CACHE-MANAGER mir/cropping, /tmp/cache created
Mars - INFO - 20220206.222614 - Cache file /tmp/cache/mir/cropping/9/94bd7bbfe016fc215834d1aeef8fb472.area does not exist
Mars - INFO - 20220206.222614 - Creating cache file /tmp/cache/mir/cropping/9/94bd7bbfe016fc215834d1aeef8fb472.area
Mars - INFO - 20220206.222614 - CacheManager creating file /tmp/cache/mir/cropping/9/94bd7bbfe016fc215834d1aeef8fb472.area
Mars - INFO - 20220206.222614 - Saving cropping to cache: 0.001539 second elapsed, 0.000109 second cpu
Mars - INFO - 20220206.222614 - Loading cropping from cache: 1.8e-05 second elapsed, 1.7e-05 second cpu
Mars - INFO - 20220206.222615 - Cache file /tmp/cache/mir/cropping/9/0a542d5e9296c605f34ace3e8bc26430.area does not exist
Mars - INFO - 20220206.222615 - Creating cache file /tmp/cache/mir/cropping/9/0a542d5e9296c605f34ace3e8bc26430.area
Mars - INFO - 20220206.222615 - CacheManager creating file /tmp/cache/mir/cropping/9/0a542d5e9296c605f34ace3e8bc26430.area
Mars - INFO - 20220206.222615 - Saving cropping to cache: 0.001404 second elapsed, 0.000223 second cpu
Mars - INFO - 20220206.222615 - Loading cropping from cache: 2.3e-05 second elapsed, 2.3e-05 second cpu
event - INFO - 20220206.223115 - Service mars has timed out
event - INFO - 20220206.223115 - Closing service mars
event - INFO - 20220206.223115 - 0 outstanding replies
Mars - FATAL - 20220206.223115 - Server localhost port 36571 is dead
Mars - INFO - 20220206.223212 - 79056 field(s) read from file 1980.grib into '/Process@37129/PythonScript, line 0:0'
Mars - INFO - 20220206.223212 - Request time: wall: 6 min 22 sec cpu: 6 min 21 sec
Mars - INFO - 20220206.223212 - Post-processing: wall: 4 min 12 sec cpu: 4 min 11 sec
Mars - INFO - 20220206.223212 - Reading file 1980.grib: wall: 24 sec
^CMars - FATAL - 20220206.223231 - Signal 2 received
pool - FATAL - 20220206.223231 - Signal 2 received
event - FATAL - 20220206.223231 - Signal 2 received
PythonServe - FATAL - 20220206.223231 - Signal 2 received
Python - FATAL - 20220206.223231 - Server localhost port 36571 is dead

@iainrussell
Copy link
Member

Hi @SpechtacularDave ,

Try setting METVIEW_TIMEOUT=10 - the value is in minutes. It's strange, as I cannot reproduce the problem here, but you're not the first to report it, so something is going on! That said, from your output log, it does look like you pressed CTRL-C at some point...! Be aware that Metview essentially runs microservices, and when a program is finished, it kills any running services and you get various warning messages about services being killed or not being able to communicate with the server, so don't give up on the program when you see these messages, they might just be part of the normal termination process! I hope this helps you to get further.

Best regards,
Iain

@SpechtacularDave
Copy link
Author

SpechtacularDave commented Feb 11, 2022

Hi Iain,

Ah, you're correct, I did hit ctrl-c because I thought metview had died because of the debug messages. However, when I let it run the entire time, I'm still getting an exception after about 7-8 minutes (timeout was set to 100), even with all the recommended environment variables set,. Below is the full debug output. Does this shine any light on what's going wrong? If it adds anything, I am able to open the file with ecCodes.

(metview) david@david-OptiPlex-7020:~/anaconda3/envs/metview/bulkdownload$ python3 bulktest.py
MetviewInvoker: Invoking Metview
Starting Metview using these command args:
['metview', '-nocreatehome', '-slog', '-python-serve', '/tmp/tmpkp0ltlml', '30891']
Metview 5.14.1 (2022.01.20) @ david-OptiPlex-7020
Installed in /home/david/anaconda3/envs/metview/lib/metview-bundle
event - INFO - 20220211.175848 - Starting server: port is 48831
event - INFO - 20220211.175848 - NOTE: $EVENT_HOST is now 'localhost'!
event - INFO - 20220211.175848 - Got connection
event - INFO - 20220211.175848 - incoming host is localhost (127.0.0.1)
event - INFO - 20220211.175848 - Register: PythonServe david david-OptiPlex-7020 30911 as ref 1
event - INFO - 20220211.175848 - Maximum value for PythonServe : 1
event - INFO - 20220211.175849 - Got connection
event - INFO - 20220211.175849 - incoming host is localhost (127.0.0.1)
event - INFO - 20220211.175849 - Register: /Process@30891/PythonScript, line 0:1 david david-OptiPlex-7020 30891 as ref 3
event - INFO - 20220211.175849 - Alive value for /Process@30891/PythonScript, line 0:1 : on
event - INFO - 20220211.175849 - Starting service: mars
event - INFO - 20220211.175849 - With command : " /home/david/anaconda3/envs/metview/lib/metview-bundle/bin/Mars"
Mars - INFO - 20220211.175849 - package mir version: 1.10.2
event - INFO - 20220211.175849 - Got connection
event - INFO - 20220211.175849 - incoming host is localhost (127.0.0.1)
event - INFO - 20220211.175849 - Register: mars david david-OptiPlex-7020 30916 as ref 5
event - INFO - 20220211.175849 - mars : Timeout = 300 Alive = 0, Max = 1
event - INFO - 20220211.175849 - Maximum value for mars : 1
event - INFO - 20220211.175849 - Maximum value for mars : 1
event - INFO - 20220211.175849 - Maximum value for mars : 1
event - INFO - 20220211.175849 - Maximum value for mars : 1
event - INFO - 20220211.175849 - Maximum value for mars : 1
event - INFO - 20220211.175849 - Maximum value for mars : 1
event - INFO - 20220211.175849 - Maximum value for mars : 1
event - INFO - 20220211.175849 - Maximum value for mars : 1
event - INFO - 20220211.175849 - Maximum value for mars : 1
event - INFO - 20220211.175849 - Maximum value for mars : 1
event - INFO - 20220211.175849 - Maximum value for mars : 1
event - INFO - 20220211.175849 - Got connection
event - INFO - 20220211.175849 - incoming host is localhost (127.0.0.1)
event - INFO - 20220211.175849 - Register: mars@30920 david david-OptiPlex-7020 30920 as ref 6
event - INFO - 20220211.175849 - Starting service: pool
event - INFO - 20220211.175849 - With command : " /home/david/anaconda3/envs/metview/lib/metview-bundle/bin/pool"
event - INFO - 20220211.175849 - Got connection
event - INFO - 20220211.175849 - incoming host is localhost (127.0.0.1)
event - INFO - 20220211.175849 - Register: pool david david-OptiPlex-7020 30922 as ref 8
event - INFO - 20220211.175849 - pool : Timeout = 86400 Alive = 0, Max = 0
pool - INFO - 20220211.175849 - [mars@30920] - Object /Process@30891/PythonScript, line 0:0 [READ] not cached
--> Retrieve::serve in -->
MARS HOME:
MARS LANGUAGE FILE: /home/david/anaconda3/envs/metview/lib/metview-bundle/share/metview/etc/MarsDef
MARS RULES FILE: /home/david/anaconda3/envs/metview/lib/metview-bundle/share/metview/etc/MarsRules

Mars - INFO - 20220211.175849 - GRIB conversion needed...
Mars - WARN - 20220211.175914 - CACHE-MANAGER mir/cropping, /tmp/cache does not exist
Mars - WARN - 20220211.175914 - CACHE-MANAGER mir/cropping, /tmp/cache created
Mars - INFO - 20220211.175914 - Cache file /tmp/cache/mir/cropping/9/94bd7bbfe016fc215834d1aeef8fb472.area does not exist
Mars - INFO - 20220211.175914 - Creating cache file /tmp/cache/mir/cropping/9/94bd7bbfe016fc215834d1aeef8fb472.area
Mars - INFO - 20220211.175914 - CacheManager creating file /tmp/cache/mir/cropping/9/94bd7bbfe016fc215834d1aeef8fb472.area
Mars - INFO - 20220211.175914 - Saving cropping to cache: 0.001532 second elapsed, 0.000104 second cpu
Mars - INFO - 20220211.175914 - Loading cropping from cache: 1.8e-05 second elapsed, 1.7e-05 second cpu
Mars - INFO - 20220211.175914 - Cache file /tmp/cache/mir/cropping/9/0a542d5e9296c605f34ace3e8bc26430.area does not exist
Mars - INFO - 20220211.175914 - Creating cache file /tmp/cache/mir/cropping/9/0a542d5e9296c605f34ace3e8bc26430.area
Mars - INFO - 20220211.175914 - CacheManager creating file /tmp/cache/mir/cropping/9/0a542d5e9296c605f34ace3e8bc26430.area
Mars - INFO - 20220211.175914 - Saving cropping to cache: 0.001388 second elapsed, 0.00012 second cpu
Mars - INFO - 20220211.175914 - Loading cropping from cache: 1.9e-05 second elapsed, 1.8e-05 second cpu
event - INFO - 20220211.180415 - Service mars has timed out
event - INFO - 20220211.180415 - Closing service mars
event - INFO - 20220211.180415 - 0 outstanding replies
Mars - FATAL - 20220211.180415 - Server localhost port 48831 is dead
Mars - INFO - 20220211.180512 - 79056 field(s) read from file 1980.grib into '/Process@30891/PythonScript, line 0:0'
Mars - INFO - 20220211.180512 - Request time: wall: 6 min 23 sec cpu: 6 min 22 sec
Mars - INFO - 20220211.180512 - Post-processing: wall: 4 min 12 sec cpu: 4 min 12 sec
Mars - INFO - 20220211.180512 - Reading file 1980.grib: wall: 26 sec
Mars - INFO - 20220211.180703 - READ: No sorting done
--> Retrieve::serve out -->
event - INFO - 20220211.180703 - Closing service mars@30920
event - INFO - 20220211.180703 - 1 outstanding replies
event - INFO - 20220211.180703 - Service mars@30920 died before completing the request
event - INFO - 20220211.180703 - AbortIfBatch: not batch, no abort!
Python - ERROR - 20220211.180703 - Line 0 in 'PythonScript': Service mars@30920 died before completing the request
Traceback (most recent call last):
File "/home/david/anaconda3/envs/metview/bulkdownload/bulktest.py", line 3, in
fs = mv.read(source="1980.grib", area=[30, -100, 29.75, -99.75])
File "/home/david/anaconda3/envs/metview/lib/python3.10/site-packages/metview/bindings.py", line 1190, in wrapped
return value_from_metview(val)
File "/home/david/anaconda3/envs/metview/lib/python3.10/site-packages/metview/bindings.py", line 1144, in value_from_metview
raise retval
Exception: Metview error: Service mars@30920 died before completing the request
MetviewInvoker: Closing Metview
pool - FATAL - 20220211.180703 - Server localhost port 48831 is dead

@iainrussell
Copy link
Member

Hi David,
I've tried to replicate this with a local 6GB GRIB file, but it works for me. Could you give me the CDS API retrieval query so that I can see the particular data, in case there's something specific about this that's causing the problem?
It is possible to split the file using the grib_filter command that comes with ecCodes (if you have Metview installed, you will also have ecCodes installed). See here: https://confluence.ecmwf.int/display/ECC/grib_filter

But I'd also be very interested to see the CDS request so I can replicate the scenario.
Cheers,
Iain

@SpechtacularDave
Copy link
Author

SpechtacularDave commented Feb 14, 2022

Iain,

Here is my query:

import cdsapi

c = cdsapi.Client()

c.retrieve(
'reanalysis-era5-single-levels',
{
'product_type': 'reanalysis',
'variable': [
'10m_u_component_of_wind', '10m_v_component_of_wind', '2m_dewpoint_temperature',
'2m_temperature', 'instantaneous_10m_wind_gust', 'mean_sea_level_pressure',
'precipitation_type', 'total_cloud_cover', 'total_precipitation',
],
'year': 1979,
'month': [
'01', '02', '03',
'04', '05', '06',
'07', '08', '09',
'10', '11', '12',
],
'day': [
'01', '02', '03',
'04', '05', '06',
'07', '08', '09',
'10', '11', '12',
'13', '14', '15',
'16', '17', '18',
'19', '20', '21',
'22', '23', '24',
'25', '26', '27',
'28', '29', '30',
'31',
],
'time': [
'00:00', '01:00', '02:00',
'03:00', '04:00', '05:00',
'06:00', '07:00', '08:00',
'09:00', '10:00', '11:00',
'12:00', '13:00', '14:00',
'15:00', '16:00', '17:00',
'18:00', '19:00', '20:00',
'21:00', '22:00', '23:00',
],
'area': [
50, -125, 24,
-66,
],
'format': 'grib',
},
'download.grib')

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants