Failure of bumpparameters_nicas_gfs_aero at C96

Hi @hbo9955, I am copying here the email exchange we had with @danholdaway last month, since it could be useful for other users. I reordered the messages to make the discussion easier to read.


From @hbo9955:

Hi Benjamin,

I am Bo Huang. Thanks for your comments on my question posted on the JEDI forum. Since Mariusz is also testing this function in JEDI, I have included him in this email.

Attached are four yaml files and an sbatch job script:
(1) bumpparameters_nicas_gfs_c12.yaml for [T, ps] at C12
(2) bumpparameters_nicas_gfs_c96.yaml for [T, ps] at C96
(3) bumpparameters_nicas_gfs_aero_c12.yaml for [sulf, …, seas5] at C12
(4) bumpparameters_nicas_gfs_aero_c96.yaml for [sulf, …, seas5] at C96
(5) sbatch_bump_gfs_aero.sh

I only get errors with (4), which shows “!!! ABORT in nicas_blk_write on task #0005: dimension nc1b has a different size in file”. The other three work fine. If you need more information, please let me know.

In addition, I wonder whether there is an option to set the vertical localization scale in “logpres” units in the BUMP yaml files. By default, this is controlled by “rv=0.3”, which I assume is in “sigma-level” units. The “lgetkf.yaml” file uses “logpres” units for the vertical localization length scale, like this:
```yaml
local ensemble DA:
  solver: GETKF
  vertical localization:
    fraction of retained variance: .5
    lengthscale: 1.5
    lengthscale units: logp
```

Thanks.
Bo


From @benjaminmenetrier:

Hi Bo, [cc. Mariusz and Dan]

Thanks for the yaml files.

The important point for your issue is the key “prefix” in the “bump” section. For the “bumpparameters_nicas_gfs_aero_c96” run, it is set to “…/bump_aero/fv3jedi_bumpparameters_nicas_gfs_aero”. This means that all the files produced by BUMP will be written as “…/bump_aero/fv3jedi_bumpparameters_nicas_gfs_aero_XXX.nc” where XXX is a suffix depending on the data that are written. So before running this test, you have to make sure that the directory “…/bump_aero” is empty. Can you try again? Unfortunately, I don’t have access to the NOAA machine you are probably using. If it doesn’t work, please send me the output log file, I’ll check it too.
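Because every output file is named after the prefix, a quick pre-run check can catch leftovers from an earlier run before the job is submitted. This is a minimal sketch, assuming only the “<prefix>_XXX.nc” naming described above; the helper names are hypothetical and not part of BUMP or FV3-JEDI.

```python
import tempfile
from pathlib import Path


def stale_bump_files(prefix: str) -> list:
    """List files matching "<prefix>_*.nc", i.e. outputs of an earlier run.

    Assumes BUMP names its outputs "<prefix>_XXX.nc", as described above.
    """
    p = Path(prefix)
    # Search only the prefix's directory for "<basename>_<anything>.nc".
    return sorted(p.parent.glob(p.name + "_*.nc"))


def check_output_dir(prefix: str) -> None:
    """Raise before submitting the job if old BUMP files are still there."""
    stale = stale_bump_files(prefix)
    if stale:
        names = ", ".join(f.name for f in stale)
        raise RuntimeError(f"empty the output directory first, found: {names}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        prefix = str(Path(d) / "fv3jedi_bumpparameters_nicas_gfs_aero")
        check_output_dir(prefix)  # empty directory: no error
        (Path(d) / "fv3jedi_bumpparameters_nicas_gfs_aero_XXX.nc").touch()
        print(stale_bump_files(prefix))  # now lists the leftover file
```

A check like this only confirms the directory is clean; as the rest of this thread shows, an empty directory does not rule out every writing error.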

Regarding the vertical coordinate, BUMP is not really aware of it, since it is provided by the model interface. For instance, FV3-JEDI passes a “fake” sigma coordinate; see fv3-jedi/fv3jedi_geom_mod.f90 (JCSDA/fv3-jedi on GitHub, commit 6b0b1806c9ac3d9262301465cb4483972e83a33f). Thus, in FV3-JEDI the unit of “rv” in the yaml file is the “sigma” unit.
If the vertical coordinate were pressure, the unit of “rv” would be pascals, and so on. So if you want to use the logarithm of pressure, you have to change the vertical coordinate in the “fill_atlas_fieldset” subroutine of fv3jedi_geom_mod.f90. You could look at the “getVerticalCoordLogP” subroutine as an example. Let me know if you need more help with this issue.
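To see why the same “rv” value means different things under different coordinates, the sketch below measures the distance between the same two levels in a sigma-like coordinate and in log-pressure. It assumes sigma = p / ps and a plain ln(p) coordinate purely for illustration; it does not reproduce FV3-JEDI's “fake” sigma or what “getVerticalCoordLogP” actually computes.

```python
import math


def sigma_coord(p: float, ps: float) -> float:
    """Illustrative sigma coordinate: pressure divided by surface pressure."""
    return p / ps


def logp_coord(p: float) -> float:
    """Log-pressure coordinate ln(p); differences do not depend on the units of p."""
    return math.log(p)


def separation(z1: float, z2: float) -> float:
    """Vertical distance between two levels in whatever coordinate z is."""
    return abs(z1 - z2)


if __name__ == "__main__":
    ps = 100000.0  # surface pressure in Pa (illustrative)
    p_low, p_high = 85000.0, 50000.0
    # Same pair of levels, two different distances:
    print(separation(sigma_coord(p_low, ps), sigma_coord(p_high, ps)))  # about 0.35
    print(separation(logp_coord(p_low), logp_coord(p_high)))  # ln(1.7), about 0.53
    # A length scale of 1.5 in log-pressure spans a pressure ratio of e**1.5:
    print(math.exp(1.5))  # about 4.48
```

Under these assumptions, a log-pressure length scale is tied to pressure ratios rather than differences, which is why it cannot be reused directly as a sigma-unit “rv”.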

Have a good day,
Benjamin


From @hbo9955:

Hi Benjamin,

Thanks for your prompt response. I attached the log files from the four runs to my last email. I emptied the directory where the BUMP files are written before each run. Only “bump_gfs_aero_c96.out” has the writing error message at the end of the file.

I also tried including only two variables, [sulf, bc1], in bump_gfs_aero_c96.yaml, as in bump_gfs_c96.yaml. It shows similar errors.

Thanks for your response to the vertical coordinate question. We will look into it and will let you know if we need help from you.

Best,
Bo


From @danholdaway:

Hi Bo,

Can you send your bump_gfs_aero_c96.yaml?

Thanks,
Dan


From @hbo9955:

Hi Dan,

Yaml file is attached,

Thanks,
Bo


From @danholdaway:

new_nicas: 1 is likely wrong in this yaml file. It tells BUMP to create a new operator, when you’ve already created it in the prior step and are trying to read it using the prefix line. That means it will try to write to the existing files.


From @hbo9955:

That makes sense.

I also see new_nicas: 1 in bumpparameters_nicas_gfs.yaml for the [T, ps] variables. Could this error be caused by the code related to aerosol BUMP? If so, I think I will need to talk to Andrew, who developed the aerosol BUMP capability.

Bo


From @danholdaway:

Bo,

It is correct to have it in that file. There is a two step process.

  1. Run BUMP parameters to generate the localization and/or covariance model (new_nicas:1 in bumpparameters_nicas_gfs.yaml).

  2. Run the assimilation, which reads the precomputed BUMP models (new_nicas:0 in bump_gfs_aero_c96.yaml).
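The two steps can be summarized as a sanity check on the BUMP section of a yaml file. This is a hypothetical sketch: only the keys new_nicas and load_nicas come from this thread, and whether NICAS files already exist is passed in by hand rather than read from disk.

```python
def check_nicas_flags(bump: dict, nicas_files_exist: bool) -> str:
    """Classify a BUMP section as step 1 or step 2 and flag the conflicts discussed here.

    Hypothetical helper: it mirrors the two-step process, not BUMP's own logic.
    """
    new = bump.get("new_nicas", 0) == 1
    load = bump.get("load_nicas", 0) == 1
    if new:
        if nicas_files_exist:
            # Step 1 writes NICAS files, so it would try to write into existing ones.
            raise ValueError("new_nicas: 1 would write over existing NICAS files")
        return "step 1: generate"
    if load:
        if not nicas_files_exist:
            # Step 2 only reads, so the files must come from a prior step 1 run.
            raise ValueError("load_nicas: 1 needs files from a prior BUMP parameters run")
        return "step 2: read precomputed"
    return "no NICAS operator requested"


if __name__ == "__main__":
    print(check_nicas_flags({"new_nicas": 1}, nicas_files_exist=False))
    print(check_nicas_flags({"new_nicas": 0, "load_nicas": 1}, nicas_files_exist=True))
```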

Thanks,
Dan.


From @hbo9955:

Hi Dan,

Thanks for the clarification. I think I did not explain my problem clearly.

My goal is to run Step 1 to generate localization and/or covariance model for aerosols using bumpparameters_nicas_gfs_aero.yaml at C96. So new_nicas: 1 should be fine in this yaml file. But it caused writing errors.

The four log files I uploaded are from running bumpparameters_nicas_gfs.yaml and bumpparameters_nicas_gfs_aero.yaml at C12 and C96. I only got the writing errors when running bumpparameters_nicas_gfs_aero.yaml at C96. The other three worked fine.

Thanks,
Bo


From @benjaminmenetrier:

Hi Bo,

I think I understand your problem now, sorry I didn’t notice the issue earlier.

When running bumpparameters_nicas_gfs_aero.yaml at C12 or C96, you are generating the NICAS operator with fixed length-scales several times (once for each variable) and overwriting the same NetCDF group each time, because of how the “io_keys” and “io_values” keys are set. Even if the NICAS subgrid size (nc1) is the same each time, the random subsampling can lead to different local subgrid sizes on each MPI task when the subsampling is repeated for each variable. Thus, nc1b differs between variables (as you can see in the C96 log file), which explains the crash. It does not crash at C12 because the grid is so coarse that all points are kept in the NICAS subsampling (nc1b = 144), so the NICAS variables are overwritten without any problem.
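The mechanism can be reproduced with a toy model: draw the same number of subgrid points several times, count how many land on each task, and write those counts into one shared group. This is a simplified sketch, not BUMP's algorithm. It assumes a uniform random sample, six tasks with one cube face each (consistent with nc1b = 144 for the 6 × 12² = 864 points at C12), and a hypothetical nc1 = 3000 at C96.

```python
import random


def local_sizes(n_res: int, nc1: int, seed: int, ntasks: int = 6) -> tuple:
    """Draw nc1 subgrid points at random and count how many land on each task.

    Toy model: the cubed sphere has 6 * n_res**2 points split evenly over ntasks,
    and the subgrid is a uniform random sample (BUMP's actual sampling differs).
    """
    rng = random.Random(seed)
    npts = 6 * n_res * n_res
    per_task = npts // ntasks
    counts = [0] * ntasks
    for i in rng.sample(range(npts), nc1):
        counts[min(i // per_task, ntasks - 1)] += 1
    return tuple(counts)


class NicasGroup:
    """Toy stand-in for one NetCDF group that is overwritten once per variable."""

    def __init__(self) -> None:
        self.nc1b = {}  # task index -> nc1b dimension already in the file

    def write(self, task: int, nc1b: int) -> None:
        old = self.nc1b.setdefault(task, nc1b)
        if old != nc1b:
            raise RuntimeError(
                f"task #{task:04d}: dimension nc1b has a different size in file")


def write_all_variables(n_res: int, nc1: int, nvars: int) -> None:
    """Redo the subsampling once per variable and write into the same group."""
    group = NicasGroup()
    for var in range(nvars):
        for task, nc1b in enumerate(local_sizes(n_res, nc1, seed=var)):
            group.write(task, nc1b)


if __name__ == "__main__":
    # C12: keep all 864 points, so every task gets nc1b = 144 on every draw.
    write_all_variables(12, 864, nvars=5)
    print("C12: no error")
    # C96: 3000 of 55296 points; the per-task counts change from draw to draw.
    try:
        write_all_variables(96, 3000, nvars=5)
    except RuntimeError as err:
        print("C96:", err)
```

The global size nc1 is identical in every draw; only its split across tasks changes, which is enough to trigger the dimension mismatch once the same group is rewritten.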

More generally, I think that with the current setup, where only two NICAS operators with fixed length-scales are generated in bumpparameters_nicas_gfs_c12.yaml and bumpparameters_nicas_gfs_c96.yaml (one 3D and one 2D), it is useless to regenerate specific NICAS operators for aerosols. There is no need to run bumpparameters_nicas_gfs_aero_c12.yaml and bumpparameters_nicas_gfs_aero_c96.yaml; you can simply use the NICAS files produced by bumpparameters_nicas_gfs_c12.yaml and bumpparameters_nicas_gfs_c96.yaml when running your variational applications.

So to extend Dan’s answer:

  1. Run BUMP parameters to generate the localization and/or covariance model (bumpparameters_nicas_gfs_c12.yaml and bumpparameters_nicas_gfs_c96.yaml) with “new_nicas: 1”.
  2. Run the assimilation, which reads the precomputed BUMP models with “load_nicas: 1”, at the correct resolution (use the correct prefix from bumpparameters_nicas_gfs_c12.yaml or bumpparameters_nicas_gfs_c96.yaml), and specify your own “io_keys” / “io_values” set (the one from your bumpparameters_nicas_gfs_aero_c12.yaml was correct).
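For step 2, the “io_keys” / “io_values” pair maps each analysis variable onto one of the two NICAS operators already on disk. The sketch below builds such parallel lists. The key format “<var>-<var>” and the group names “fixed_3D” / “fixed_2D” are assumptions for illustration, so check both against your BUMP version and the files written in step 1; ps stands in for any 2D field.

```python
def make_io_lists(vars_3d, vars_2d, group_3d, group_2d):
    """Map every variable onto one of the two shared NICAS operators.

    Assumed format: each key is "<var>-<var>" and each value is the name of the
    NICAS group written by the parameters run. Check both against your BUMP version.
    """
    keys, values = [], []
    for names, group in ((vars_3d, group_3d), (vars_2d, group_2d)):
        for v in names:
            keys.append(f"{v}-{v}")
            values.append(group)
    if len(set(keys)) != len(keys):
        raise ValueError("a variable is listed twice")
    return keys, values


if __name__ == "__main__":
    # Variable names from this thread; group names are hypothetical placeholders.
    keys, values = make_io_lists(["sulf", "bc1"], ["ps"], "fixed_3D", "fixed_2D")
    print("io_keys:", keys)
    print("io_values:", values)
```

Reusing two groups for every variable is what makes a separate aerosol parameters run unnecessary: only the mapping changes, not the operators.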

Dan: what is done in the current bumpparameters_nicas_gfs_aero.yaml is not really useful and can be misleading, as Bo noticed (sorry Bo!); we should update it or remove it.

Benjamin


From @hbo9955:

Hi Benjamin,

Thanks for your detailed reply.

I will try what you suggested to use NICAS files produced by bumpparameters_nicas_gfs_c96.yaml and adjust “io_keys” / “io_values” sets in our aerosol variational update.

Thanks,
Bo