SkyLab 4.0 ewok experiment, "an/variational" crashed


I was trying to run this experiment on Orion based on the SkyLab 4.0 release: jedi_bundle/ewok/experiments/gfs-3dvar-c12.yaml

It crashed at the an/variational step.

The error message goes as follows:

terminate called after throwing an instance of ‘ioda::Exception’
terminate called after throwing an instance of ‘ioda::Exception’
terminate called after throwing an instance of ‘ioda::Exception’
terminate called after throwing an instance of ‘ioda::Exception’
terminate called after throwing an instance of ‘ioda::Exception’
terminate called after throwing an instance of ‘ioda::Exception’
what(): Reason: H5Fcreate failed
compat: HDF5_Version_Range: [V18, V110]
filename: /work2/noaa/wrfruc/gge/skylab/workdir/58c3b7/2020-12-15T00:00:00Z/obs/fb.Aircraft.2020-12-15T00:00:00Z.nc4
mode: ioda::Engines::BackendCreateModes::Truncate_If_Exists
source_column: 0
source_filename: /work2/noaa/wrfruc/gge/skylab/jedi-bundle/ioda/src/engines/ioda/src/ioda/Engines/HH/HH.cpp
source_function: ioda::Group ioda::Engines::HH::createFileImpl(const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> &, ioda::Engines::BackendCreateModes, std::pair<ioda::Engines::HH::HDF5_Version, ioda::Engines::HH::HDF5_Version>, int, bool)
source_line: 161

Any thoughts on this? Thanks!

Hi @guoqing,

It looks like IODA attempted to write out a file and the creation of that file failed. This could be a permissions problem.

Does this error repeat on exactly the same file at the same time, or does it seem intermittent?
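
To rule out a permissions problem quickly, one option is to try creating a throwaway file in the output directory. The following is a minimal Python sketch, not part of IODA or ewok; the function name `can_create_file` is made up for illustration.

```python
import os
import tempfile


def can_create_file(directory: str) -> bool:
    """Return True if a new file can actually be created in `directory`."""
    if not os.path.isdir(directory):
        return False
    try:
        # Creating (and automatically deleting) a real file is a stronger
        # test than os.access(), whose answer is not always reliable on
        # network file systems.
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError:
        return False
```

Calling it with the `obs` directory of the failing cycle from the same kind of node that runs the job should show whether file creation itself is the problem.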

We discovered an issue with “:” in file names on particular HPCs (S4, for example), where writing a single file using the HDF5 parallel I/O feature fails like this. This issue is fixed in the upcoming SkyLab 5.0 release.
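
As a quick way to see whether a generated YAML is exposed to this, the sketch below lists `obsfile` entries whose file name contains a colon. It is a plain-text scan using only the Python standard library; the function name `obsfiles_with_colon` is hypothetical, and it assumes each `obsfile:` path sits on a single line.

```python
import os
import re
from typing import List

# Matches lines such as "    obsfile: /some/path/file.nc4".
OBSFILE_RE = re.compile(r"^\s*obsfile:\s*(\S+)\s*$", re.MULTILINE)


def obsfiles_with_colon(yaml_text: str) -> List[str]:
    """Return obsfile paths whose file name (not directory) contains ':'."""
    paths = OBSFILE_RE.findall(yaml_text)
    return [p for p in paths if ":" in os.path.basename(p)]
```

A non-empty result does not prove this is the cause of a crash; it only shows that the file names match the pattern described above.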

In the meantime we are working around this by using the feature in the IODA writer to write out multiple files. This is a control in the YAML: write multiple files: true. This should be enabled by default in the generated YAML files, but is something to check.

Hope these help, and I can help further if these don’t pan out.


Hi @stephenh,

Thanks for the information. This case is on Orion and is provided by SkyLab 4.0. I did not make any modifications. Have others run into a similar problem on Orion?

This directory:
grants the write permission.

I did not find write multiple files: true in the variational.146451.yaml file:

    - obs space:
        name: Aircraft
        obsdatain:
          engine:
            type: H5File
            obsfile: /work2/noaa/wrfruc/gge/skylab/workdir/58c3b7/2020-12-15T00:00:00Z/obs/obs.Aircraft.2020-12-15T00:00:00Z.nc4
        obsdataout:
          engine:
            type: H5File
            allow overwrite: true
            obsfile: /work2/noaa/wrfruc/gge/skylab/workdir/58c3b7/2020-12-15T00:00:00Z/obs/fb.Aircraft.2020-12-15T00:00:00Z.nc4
        _source: ncdiag
        simulated variables:
        - windEastward
        - windNorthward
        - airTemperature

Could you advise how to add that to the YAML file?


Hi Guoqing,

Under the obs space spec, at the same level of indentation as obsdataout, add:

        io pool:
          write multiple files: true
        _source: ncdiag

It seems odd that you need to do this, but I think this is the issue and the way to fix it.
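
If there are many obs spaces to edit, the change can be scripted. Below is a minimal Python sketch that works on the YAML as plain text and uses only the standard library; `add_io_pool` is a hypothetical helper, not an ewok or IODA function. It assumes every obs space has an `obsdataout:` key on its own line, and it puts the `io pool` block just before that key at the same indentation (key order inside a YAML mapping does not matter).

```python
import re

OBSDATAOUT_RE = re.compile(r"^(\s*)obsdataout:\s*$")


def add_io_pool(yaml_text: str) -> str:
    """Insert an 'io pool' block before each 'obsdataout:' key."""
    out = []
    for line in yaml_text.splitlines():
        match = OBSDATAOUT_RE.match(line)
        # Skip the insertion when the previous line is already the setting,
        # so running the function twice does not duplicate the block.
        already_there = bool(out) and out[-1].strip() == "write multiple files: true"
        if match and not already_there:
            indent = match.group(1)
            out.append(f"{indent}io pool:")
            out.append(f"{indent}  write multiple files: true")
        out.append(line)
    return "\n".join(out) + "\n"
```

The function returns the patched text rather than editing a file in place, so the result can be inspected or diffed before it is written back.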

The “:” issue has actually been fixed on the develop branches in the jcsda-internal repos, so you might have a mix of skylab 4.0 tags and develop branches in all of your repos. If this is the case you might run into more issues downstream and perhaps should sync up so that you are using develop branches for all repos, or the skylab 4.0 tags for all repos.


Thanks, @stephenh !
The problem is that this YAML file is automatically generated with a PID suffix. Which original file should I modify?

For SkyLab 4.0, I cloned as follows:
git clone -b 4.0.0

And for the ewok/solo/r2d2/simobs, here are the latest commits under my directories:

commit ea5589b596bbdeb3130c8449975a4c1647e78b4c (HEAD, tag: 0.4.0, origin/release/skylab-v4)

commit 9cdfa5ad464cadd222547e6d4e886decd17d3d6a (HEAD, tag: 1.2.0, origin/release/skylab-v4)

commit 6373b9c0d134cad6aea2e6154d2504e80535a342 (HEAD -> skylab-v4, tag: 2.0.0, origin/release/skylab-v4)

commit ecc29abb0b65ce8bbcad443768458b2eda25b96c (HEAD, tag: 1.2.0, origin/release/skylab-v4)

Are they correct? Or which one needs to be updated?

Thanks, Guoqing

Hi Guoqing,

The git clone and tags for ewok/solo/r2d2/simobs look good to me. It’s not clear to me why you are not already getting the write multiple files: true configuration in the YAML. It should be automatically generated.
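
For anyone who wants to repeat this check across several repos, the tag can be pulled out of the decorated commit line that git prints. This is a small Python sketch; `tags_at_head` is a hypothetical helper, and it assumes the line has the same `commit <hash> (<refs>)` shape as the ones pasted above.

```python
import re
from typing import List

# "commit <hash> (<comma-separated refs>)"
DECORATION_RE = re.compile(r"^commit\s+([0-9a-f]{7,40})\s+\((.*)\)\s*$")


def tags_at_head(log_line: str) -> List[str]:
    """Extract tag names from one decorated `git log` commit line."""
    match = DECORATION_RE.match(log_line.strip())
    if not match:
        return []
    refs = [ref.strip() for ref in match.group(2).split(",")]
    # Tags are printed as "tag: <name>"; branches and HEAD are ignored.
    return [ref[len("tag: "):] for ref in refs if ref.startswith("tag: ")]
```

An empty list for a repo means its HEAD is not sitting on a tag, which is the kind of tag/develop mix worth looking at more closely.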

This is getting out of my area of expertise and we need to have someone else help. @fabiolrdiniz or @cgas can you help figure out why Guoqing’s YAML does not have the write multiple files: true configuration? Does he have all the correct tags for skylab 4.0? Thanks!


Hi @stephenh

I am happy to report that after I manually added “write multiple files: true” to the YAML file, the an/variational step ran successfully! Thanks a lot for your help!

Now I would like to know how to include this line automatically in the workflow, without manual editing.


Hi Guoqing,

Great news! I’m hoping that @cgas or @fabiolrdiniz may be able to show you how to get the write multiple files: true to be added automatically.


Thanks for adding us here, @stephenh.

@guoqing, if you open your experiment YAML (gfs-3dvar-c12.yaml), you will notice inside the OBSERVATIONS block which YAMLs are being used to define each observation space (please see these lines).

You can open each of those YAMLs (sondes, amsua_n19, aircraft) and add write multiple files: true, following @stephenh’s suggestion above. Please remember that these changes will only take effect when creating new experiments (your previous experiments won’t inherit them).
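
To confirm that none of the observation YAMLs was missed, a short check like the one below can help. It is only a sketch using the Python standard library; `files_missing_setting` is a hypothetical name, and the paths you pass in are whatever your own checkout uses for those observation YAMLs.

```python
from pathlib import Path
from typing import Iterable, List


def files_missing_setting(
    paths: Iterable[str], setting: str = "write multiple files: true"
) -> List[str]:
    """Return the paths whose text does not contain `setting`."""
    # Plain substring check: a commented-out line would also count as present.
    return [p for p in paths if setting not in Path(p).read_text()]
```

An empty result means every file you listed already contains the line.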

Got it. Thanks! @fabiolrdiniz