Hi,
I was trying to run this experiment on Orion based on the SkyLab 4.0 release:
create_experiment.py jedi_bundle/ewok/experiments/gfs-3dvar-c12.yaml
It crashed at the an/variational step.
The error message is as follows:
terminate called after throwing an instance of ‘ioda::Exception’
terminate called after throwing an instance of ‘ioda::Exception’
terminate called after throwing an instance of ‘ioda::Exception’
terminate called after throwing an instance of ‘ioda::Exception’
terminate called after throwing an instance of ‘ioda::Exception’
terminate called after throwing an instance of ‘ioda::Exception’
what(): Reason: H5Fcreate failed
compat: HDF5_Version_Range: [V18, V110]
filename: /work2/noaa/wrfruc/gge/skylab/workdir/58c3b7/2020-12-15T00:00:00Z/obs/fb.Aircraft.2020-12-15T00:00:00Z.nc4
mode: ioda::Engines::BackendCreateModes::Truncate_If_Exists
source_column: 0
source_filename: /work2/noaa/wrfruc/gge/skylab/jedi-bundle/ioda/src/engines/ioda/src/ioda/Engines/HH/HH.cpp
source_function: ioda::Group ioda::Engines::HH::createFileImpl(const std::__cxx11::basic_string<char, std::char_traits, std::allocator> &, ioda::Engines::BackendCreateModes, std::pair<ioda::Engines::HH::HDF5_Version, ioda::Engines::HH::HDF5_Version>, int, bool)
source_line: 161
Any thoughts on this? Thanks!
Hi @guoqing,
It looks like IODA attempted to write out a file and the creation of that file failed. This could be a permissions problem.
Does this error repeat on exactly the same file at the same time, or does it seem intermittent?
We discovered an issue on particular HPCs (S4, for example) with “:” in file names: writing a single file with the HDF5 parallel I/O feature fails like this. This issue is fixed for the upcoming SkyLab 5.0 release.
In the meantime, we are working around this by using the IODA writer's option to write out multiple files. This is controlled in the YAML by “write multiple files: true”. It should be enabled by default in the generated YAML files, but it is something to check.
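To see whether an output path contains “:” at all, here is a minimal Python sketch (not part of SkyLab or IODA; `colon_components` is a hypothetical helper). It uses the output file from the error message above and only inspects the path string. It does not reproduce the HDF5 failure, and whether a colon actually causes trouble depends on the platform, as described above.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def colon_components(path: str) -> list[str]:
    """Return the components of `path` (directories or file name) containing ':'."""
    return [part for part in PurePosixPath(path).parts if ":" in part]


# Output file reported in the H5Fcreate error above
fb_file = ("/work2/noaa/wrfruc/gge/skylab/workdir/58c3b7/"
           "2020-12-15T00:00:00Z/obs/fb.Aircraft.2020-12-15T00:00:00Z.nc4")
print(colon_components(fb_file))
# ['2020-12-15T00:00:00Z', 'fb.Aircraft.2020-12-15T00:00:00Z.nc4']
```

In this workdir layout, both the cycle directory and the file name carry the timestamp, so both show up.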
Hope these help, and I can help further if these don’t pan out.
Stephen
Hi @stephenh,
Thanks for the information. This case is on Orion and is provided by SkyLab 4.0. I did not make any modifications. Has anyone else run into a similar problem on Orion?
This directory has write permission:
/work2/noaa/wrfruc/gge/skylab/workdir/58c3b7/2020-12-15T00:00:00Z/obs
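One way to confirm that files can actually be created in that directory, rather than only checking the permission bits, is a sketch like the following. `can_create_file` is a hypothetical helper, not part of SkyLab.

```python
import os
import tempfile


def can_create_file(directory: str) -> bool:
    """Return True if a scratch file can actually be created in `directory`.

    os.access() only looks at permission bits; creating (and immediately
    deleting) a temporary file also catches things like a read-only mount.
    """
    if not os.path.isdir(directory) or not os.access(directory, os.W_OK | os.X_OK):
        return False
    try:
        # NamedTemporaryFile deletes the file when the context exits.
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
    except OSError:
        return False
    return True
```

Run it as the same user as the failing job. A True result rules out basic permission problems, but it says nothing about HDF5's parallel file creation, which is where the error above was raised.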
I did not find “write multiple files: true” in the variational.146451.yaml file:
observers:
- obs space:
    name: Aircraft
    obsdatain:
      engine:
        type: H5File
        obsfile: /work2/noaa/wrfruc/gge/skylab/workdir/58c3b7/2020-12-15T00:00:00Z/obs/obs.Aircraft.2020-12-15T00:00:00Z.nc4
    obsdataout:
      engine:
        type: H5File
        allow overwrite: true
        obsfile: /work2/noaa/wrfruc/gge/skylab/workdir/58c3b7/2020-12-15T00:00:00Z/obs/fb.Aircraft.2020-12-15T00:00:00Z.nc4
    _source: ncdiag
    simulated variables:
    - windEastward
    - windNorthward
    - airTemperature
Could you advise how to add that to the YAML file?
Thanks,
Guoqing
Hi Guoqing,
Under the obs space spec, at the same level of indentation as obsdataout, add:
obsdataout:
  ...
io pool:
  write multiple files: true
_source: ncdiag
...
It seems odd that you need to do this, but I think this is the issue and the way to fix it.
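If you would rather script this edit than make it by hand, here is a minimal sketch that applies the same change to a configuration already parsed into Python dicts and lists (for example, with a YAML library). `enable_multiple_files` is a hypothetical helper. It assumes the `observers` → `obs space` layout shown in the Aircraft YAML above and sets only the key described here.

```python
from __future__ import annotations


def enable_multiple_files(config: dict) -> list[str]:
    """Set 'io pool: {write multiple files: true}' on every obs space.

    `config` is the task YAML already parsed into plain dicts/lists.
    The key goes inside each 'obs space' mapping, next to 'obsdataout'.
    Returns the names of the obs spaces that were changed.
    """
    changed = []
    for observer in config.get("observers", []):
        obs_space = observer["obs space"]
        # An empty "io pool:" entry parses as None, so normalise it to a dict.
        io_pool = obs_space.get("io pool") or {}
        obs_space["io pool"] = io_pool
        if io_pool.get("write multiple files") is not True:
            io_pool["write multiple files"] = True
            changed.append(obs_space.get("name", "<unnamed>"))
    return changed


# Trimmed-down version of the Aircraft observer from variational.146451.yaml
config = {
    "observers": [
        {
            "obs space": {
                "name": "Aircraft",
                "obsdataout": {
                    "engine": {
                        "type": "H5File",
                        "allow overwrite": True,
                        "obsfile": "fb.Aircraft.2020-12-15T00:00:00Z.nc4",
                    }
                },
                "_source": "ncdiag",
            }
        }
    ]
}

print(enable_multiple_files(config))  # ['Aircraft']
print(config["observers"][0]["obs space"]["io pool"])  # {'write multiple files': True}
```

The sketch only changes the in-memory structure. Writing it back to the YAML file needs a YAML library, which is not shown here. Running it a second time changes nothing.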
The “:” issue has actually been fixed on the develop branches in the jcsda-internal repos, so you might have a mix of SkyLab 4.0 tags and develop branches across your repos. If so, you might run into more issues downstream, and you should probably sync up so that you are using either the develop branches for all repos or the SkyLab 4.0 tags for all repos.
Stephen
Thanks, @stephenh !
The problem is that this YAML file is generated automatically with a PID suffix. Which original file should I modify?
For SkyLab 4.0, I cloned as follows:
git clone -b 4.0.0 https://github.com/jcsda/jedi-bundle
For ewok/solo/r2d2/simobs, here are the latest commits in my clones:
ewok:
commit ea5589b596bbdeb3130c8449975a4c1647e78b4c (HEAD, tag: 0.4.0, origin/release/skylab-v4)
solo:
commit 9cdfa5ad464cadd222547e6d4e886decd17d3d6a (HEAD, tag: 1.2.0, origin/release/skylab-v4)
r2d2:
commit 6373b9c0d134cad6aea2e6154d2504e80535a342 (HEAD -> skylab-v4, tag: 2.0.0, origin/release/skylab-v4)
simobs:
commit ecc29abb0b65ce8bbcad443768458b2eda25b96c (HEAD, tag: 1.2.0, origin/release/skylab-v4)
Are these correct, or does any of them need to be updated?
Thanks, Guoqing
Hi Guoqing,
The git clone and tags for ewok/solo/r2d2/simobs look good to me. It’s not clear to me why you are not already getting the “write multiple files: true” configuration in the YAML; it should be generated automatically.
This is getting out of my area of expertise, so we need someone else to help. @fabiolrdiniz or @cgas, can you help figure out why Guoqing’s YAML does not have the “write multiple files: true” configuration? Does he have all the correct tags for SkyLab 4.0? Thanks!
Stephen
Hi @stephenh
I am happy to report that after I manually added “write multiple files: true” to the YAML file, the an/variational step ran successfully! Thanks a lot for your help!
Now I would like to know how to include this line automatically in the workflow, without editing the file manually.
Thanks,
Guoqing
Hi Guoqing,
Great news! I’m hoping that @cgas or @fabiolrdiniz may be able to show you how to get “write multiple files: true” added automatically.
Steve
Thanks for adding us here, @stephenh.
@guoqing, if you open your experiment YAML (gfs-3dvar-c12.yaml), you will see in the OBSERVATIONS block which YAMLs are used to define each observation space (see these lines: https://github.com/JCSDA-internal/ewok/blob/ea5589b596bbdeb3130c8449975a4c1647e78b4c/experiments/gfs-3dvar-c12.yaml#L40-L42).
You can open each of those YAMLs (sondes, amsua_n19, aircraft) and add “write multiple files: true”, following @stephenh’s suggestion above. Please remember that these changes only take effect when creating new experiments (your previous experiments won’t inherit them).
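After creating a new experiment from the edited observation YAMLs, you can confirm that the setting reached the generated task YAMLs with a plain text search like this sketch. `yamls_missing_setting` is a hypothetical helper. The default file-name pattern `variational.*.yaml` is inferred from the variational.146451.yaml name earlier in the thread, so it is an assumption; adjust it to the files ewok actually writes in your workdir.

```python
from __future__ import annotations

from pathlib import Path


def yamls_missing_setting(root: str,
                          pattern: str = "variational.*.yaml",
                          setting: str = "write multiple files: true") -> list[Path]:
    """Return generated YAML files under `root` whose text lacks `setting`.

    This is a plain substring search, so it only recognises the exact
    spelling 'write multiple files: true' (not 'True' or flow-style YAML).
    """
    return sorted(p for p in Path(root).rglob(pattern)
                  if p.is_file() and setting not in p.read_text())
```

Call it with the workdir of the new experiment. An empty list means every matched file contains the setting. Because it does not parse YAML, it cannot tell whether the key sits at the right indentation level.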
Got it. Thanks! @fabiolrdiniz