Getting back to JEDI, I’m attempting to build the develop branch of jedi-bundle and run it, along with SkyLab, on Orion. I’m getting a small number of ctest failures and am wondering whether these are expected for develop or whether I have a problem. I’ve been following the instructions on readthedocs and ran ctest -R get_ on the front end before running ctest -E get_ in an interactive session on a compute node. I haven’t looked at the full output of every failure, but some tests segfault and others show what looks like an ioda marshalling problem. The following was run, both at compile time and at ctest time, to set up the environment:
Your modules and setup look correct. I haven’t run ctests on Orion in the last three weeks since I was on vacation, but I know that not too long ago all of them passed. We’ll try to reproduce the problem on our end!
More confusion… Now, today when I try to run ctest -R get_ from an Orion front-end when using the develop branch, I get:
(base) charrop@Orion-login-4:/work/noaa/gsd-hpcs/charrop/SENA/JEDI/develop/build> ctest -R get_
Test project /work/noaa/gsd-hpcs/charrop/SENA/JEDI/develop/build
Start 507: get_crtm_coeffs
1/7 Test #507: get_crtm_coeffs ........................ Passed 42.42 sec
Start 778: get_ioda_test_data
2/7 Test #778: get_ioda_test_data .....................***Failed 0.25 sec
Start 951: ufo_get_ufo_test_data
3/7 Test #951: ufo_get_ufo_test_data ..................***Failed 0.11 sec
Start 952: ufo_get_crtm_test_data
4/7 Test #952: ufo_get_crtm_test_data ................. Passed 0.52 sec
Start 981: test_ufo_geovals_get_nonexistent_var
5/7 Test #981: test_ufo_geovals_get_nonexistent_var ... Passed 1.94 sec
Start 1423: fv3jedi_get_fv3-jedi_test_data
6/7 Test #1423: fv3jedi_get_fv3-jedi_test_data .........***Failed 0.11 sec
Start 1424: fv3jedi_get_crtm_test_data
7/7 Test #1424: fv3jedi_get_crtm_test_data ............. Passed 0.21 sec
57% tests passed, 3 tests failed out of 7
And when I use verbose mode, it tells me a directory is missing:
(base) charrop@Orion-login-4:/work/noaa/gsd-hpcs/charrop/SENA/JEDI/develop/build> ctest -VV -R get_ioda_test_data
UpdateCTestConfiguration from :/work/noaa/gsd-hpcs/charrop/SENA/JEDI/develop/build/DartConfiguration.tcl
Parse Config file:/work/noaa/gsd-hpcs/charrop/SENA/JEDI/develop/build/DartConfiguration.tcl
UpdateCTestConfiguration from :/work/noaa/gsd-hpcs/charrop/SENA/JEDI/develop/build/DartConfiguration.tcl
Parse Config file:/work/noaa/gsd-hpcs/charrop/SENA/JEDI/develop/build/DartConfiguration.tcl
Test project /work/noaa/gsd-hpcs/charrop/SENA/JEDI/develop/build
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 778
Start 778: get_ioda_test_data
778: Test command: /work/noaa/gsd-hpcs/charrop/SENA/JEDI/develop/build/bin/ioda_data_checker.py "/work/noaa/gsd-hpcs/charrop/SENA/JEDI/develop/jedi-bundle/ioda-data"
778: Environment variables:
778: OMP_NUM_THREADS=1
778: Test timeout computed to be: 1500
778: /work/noaa/gsd-hpcs/charrop/SENA/JEDI/develop/jedi-bundle/ioda-data does not exist
1/1 Test #778: get_ioda_test_data ...............***Failed 0.27 sec
0% tests passed, 1 tests failed out of 1
This is different behavior from what I was seeing before, and it happens with develop only. I’m using the exact same commands/scripts for 5.0.0 and that works fine.
I don’t know what to make of this, to be honest. But I ran the ctests for “develop” with a slightly newer spack-stack version (1.4.1) and all tests passed on Orion. The 1.4.1 version was created specifically for UFS and shouldn’t give different results for jedi-bundle than 1.4.0, though. Here is the job_card that I used to run the tests; I load the same modules to build the code. Make sure you have nothing in your ~/.bashrc, ~/.profile, etc. that modifies the environment. Dirty user environments have led to many problems in the past.
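One rough way to audit the init files mentioned above is to list the lines that typically change the environment. This is only a heuristic sketch: the function name `scan_init_files` and the choice of patterns (`export`, `module`, `conda`, `source`, `.`) are assumptions made for illustration, not part of any JEDI or spack-stack tooling.

```shell
# Hypothetical helper: print "file:line:text" for every line in the given
# init files that starts with a command likely to modify the environment.
scan_init_files() {
  for f in "$@"; do
    # Skip init files that do not exist for this user.
    [ -f "$f" ] || continue
    grep -nHE '^[[:space:]]*(export|module|conda|source|\.)[[:space:]]' "$f" || true
  done
}
```

It could be called as `scan_init_files ~/.bashrc ~/.bash_profile ~/.profile`. Any line it reports is only a candidate for review; whether it actually interferes with the build still has to be judged by hand.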
I’m trying again after removing anything suspicious from my bash init files. But now, for develop, I’m getting failures when running ctest -R get_ and ctest -R bumpparameters on an Orion front-end. It was my impression that this is needed before doing the ctest -E get_ on the compute node. It worked earlier, but now it doesn’t.
charrop@Orion-login-3:/work/noaa/gsd-hpcs/charrop/SENA/JEDI/develop/build> ctest -R get_
Test project /work/noaa/gsd-hpcs/charrop/SENA/JEDI/develop/build
Start 507: get_crtm_coeffs
1/7 Test #507: get_crtm_coeffs ........................ Passed 56.91 sec
Start 778: get_ioda_test_data
2/7 Test #778: get_ioda_test_data .....................***Failed 0.17 sec
Start 952: ufo_get_ufo_test_data
3/7 Test #952: ufo_get_ufo_test_data ..................***Failed 0.15 sec
Start 953: ufo_get_crtm_test_data
4/7 Test #953: ufo_get_crtm_test_data ................. Passed 0.29 sec
Start 982: test_ufo_geovals_get_nonexistent_var
5/7 Test #982: test_ufo_geovals_get_nonexistent_var ... Passed 0.28 sec
Start 1424: fv3jedi_get_fv3-jedi_test_data
6/7 Test #1424: fv3jedi_get_fv3-jedi_test_data .........***Failed 0.15 sec
Start 1425: fv3jedi_get_crtm_test_data
7/7 Test #1425: fv3jedi_get_crtm_test_data ............. Passed 0.27 sec
57% tests passed, 3 tests failed out of 7
Do you know why I would be getting errors about directories not existing?
charrop@Orion-login-3:/work/noaa/gsd-hpcs/charrop/SENA/JEDI/develop/build> ctest -VV -R get_ioda_test_data
UpdateCTestConfiguration from :/work/noaa/gsd-hpcs/charrop/SENA/JEDI/develop/build/DartConfiguration.tcl
Parse Config file:/work/noaa/gsd-hpcs/charrop/SENA/JEDI/develop/build/DartConfiguration.tcl
UpdateCTestConfiguration from :/work/noaa/gsd-hpcs/charrop/SENA/JEDI/develop/build/DartConfiguration.tcl
Parse Config file:/work/noaa/gsd-hpcs/charrop/SENA/JEDI/develop/build/DartConfiguration.tcl
Test project /work/noaa/gsd-hpcs/charrop/SENA/JEDI/develop/build
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 778
Start 778: get_ioda_test_data
778: Test command: /work/noaa/gsd-hpcs/charrop/SENA/JEDI/develop/build/bin/ioda_data_checker.py "/work/noaa/gsd-hpcs/charrop/SENA/JEDI/develop/jedi-bundle/ioda-data"
778: Environment variables:
778: OMP_NUM_THREADS=1
778: Test timeout computed to be: 1500
778: /work/noaa/gsd-hpcs/charrop/SENA/JEDI/develop/jedi-bundle/ioda-data does not exist
1/1 Test #778: get_ioda_test_data ...............***Failed 0.15 sec
0% tests passed, 1 tests failed out of 1
Label Time Summary:
ioda = 0.15 sec*proc (1 test)
script = 0.15 sec*proc (1 test)
Total Test time (real) = 0.62 sec
The following tests FAILED:
778 - get_ioda_test_data (Failed)
I don’t see any instructions about creating these, and I didn’t need to do anything else in my previous attempt on Monday.
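Since the verbose output reports that the jedi-bundle/ioda-data directory does not exist, it may help to check all of the expected data directories in one pass before rerunning the tests. The sketch below is an illustration only: `check_data_dirs` is a hypothetical helper, and the directory names passed to it are whatever the failing tests report (only ioda-data is confirmed in this thread).

```shell
# Hypothetical helper: report which data directories under a bundle
# checkout are missing or empty.
# Usage: check_data_dirs BUNDLE_DIR NAME [NAME...]
check_data_dirs() {
  cdd_bundle=$1
  shift
  cdd_status=0
  for cdd_name in "$@"; do
    cdd_dir="$cdd_bundle/$cdd_name"
    if [ ! -d "$cdd_dir" ]; then
      echo "missing: $cdd_dir"
      cdd_status=1
    elif [ -z "$(ls -A "$cdd_dir")" ]; then
      # The directory exists but holds no files.
      echo "empty: $cdd_dir"
      cdd_status=1
    fi
  done
  return $cdd_status
}
```

For example, `check_data_dirs /path/to/jedi-bundle ioda-data` prints nothing and returns 0 only when the directory exists and is populated. Distinguishing "missing" from "empty" matters here because an empty directory can satisfy an existence check without providing any test data.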
@DWu - If you don’t first run ctest -R get_ and possibly also ctest -R bumpparameters on a front-end to download the data, you will get a LOT of test failures. Make sure you’ve run those tests on the front-end first, and then do the ctest -E get_ on a compute node to run the full suite.
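The two-stage sequence described above can be sketched as a pair of shell functions. The function names and the `DRY_RUN` switch are hypothetical conveniences added for illustration; the ctest invocations themselves are the ones quoted in this thread, and the functions are assumed to be run from the build directory.

```shell
# Hypothetical wrapper: with DRY_RUN=1, print the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Stage 1: on a front-end (login) node, download the test data.
frontend_stage() {
  run ctest -R get_ && run ctest -R bumpparameters
}

# Stage 2: on a compute node, run everything except the download tests.
compute_stage() {
  run ctest -E get_
}
```

Setting `DRY_RUN=1` before calling `frontend_stage` shows the commands without executing them, which is a cheap way to confirm the order before using up an interactive allocation.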
@cwharrop Thanks for your reply.
I ran ctest -R get_ as suggested and got the same error as you: the directories do not exist. I manually created those directories and got the tests to pass.
Then I ran ctest -R bumpparameters on the front-end; however, all of the tests failed.
0% tests passed, 5 tests failed out of 5
@DWu - To clarify, you only have to run git lfs install once. The reason I mentioned doing it after loading the modules is that it’s probably better to make sure you have the proper versions of git and git-lfs loaded in your environment when you run that command. You have to do it BEFORE you clone because it is what enables downloading of LFS files during the cloning process. If you do it after, there is a git lfs command you can run to grab the files, but I forget what it is right now (maybe git lfs fetch or something, I don’t remember).
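For reference, a minimal sketch of both cases follows. It assumes that `git lfs pull` is the command being recalled above: in Git LFS, `git lfs fetch` only downloads the objects, while `git lfs pull` downloads them and also checks them out into the working tree. The function names and the repository path argument are hypothetical.

```shell
# One-time, per-user setup; run it after loading the modules that
# provide git and git-lfs, and before cloning.
enable_lfs() {
  git lfs install
}

# Hypothetical helper for a repository that was cloned before LFS was
# enabled: download the LFS objects and check them out.
# Usage: fetch_lfs_files REPO_DIR
fetch_lfs_files() {
  if [ ! -e "$1/.git" ]; then
    echo "not a git clone: $1" >&2
    return 1
  fi
  git -C "$1" lfs pull
}
```

Something like `fetch_lfs_files /path/to/jedi-bundle/ioda-data` would then be run once per data repository. Whether this resolves the failures reported in this thread is not established here.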
@cwharrop I tried git lfs install before cloning the repositories. However, I still cannot pass ctest -R bumpparameters. And for ctest -E get_, I got only 57% tests passed, 701 tests failed out of 1615.
Below is the error message. It seems I don’t have permission to access the needed files.
Running case 0: distribution/Distribution/testConstructor …
Completed case 0: distribution/Distribution/testConstructor
Running case 1: distribution/Distribution/testDistributionConstructedManually …
Completed case 1: distribution/Distribution/testDistributionConstructedManually
Running case 2: distribution/Distribution/testDistributionConstructedByObsSpace …
HDF5-DIAG: Error detected in HDF5 (1.14.0) MPI-process 1:
  #000: /work/noaa/epic-ps/role-epic-ps/spack-stack/spack-stack-1.4.0/cache/build_stage/spack-stage-hdf5-1.14.0-aq2zjiuzjn3c7ocaky4vzc7wispcxs6s/spack-src/src/H5F.c line 836 in H5Fopen(): unable to synchronously open file
    major: File accessibility
    minor: Unable to open file
  #001: /work/noaa/epic-ps/role-epic-ps/spack-stack/spack-stack-1.4.0/cache/build_stage/spack-stage-hdf5-1.14.0-aq2zjiuzjn3c7ocaky4vzc7wispcxs6s/spack-src/src/H5F.c line 796 in H5F__open_api_common(): unable to open file
    major: File accessibility
    minor: Unable to open file
@DWu - I am also having trouble with git lfs not downloading the data for the tests. I don’t know what is going on. I’ve tried a few different things, but the data directories remain unpopulated and git lfs thinks there’s nothing to download. I didn’t have any of these issues with 5.0.0. Not only that, but it had worked for me a couple of times earlier, and now it doesn’t.
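One thing that can be checked when LFS data appears not to have been downloaded is whether files in a clone are still LFS pointer stubs. A pointer stub is a small text file whose first line is `version https://git-lfs.github.com/spec/v1`, and it cannot be opened as HDF5. The sketch below is a diagnostic aid under that assumption; the helper names are hypothetical, and nothing in this thread confirms that pointer stubs are the cause of the failures.

```shell
# Hypothetical helper: succeed if the file looks like a Git LFS pointer stub.
is_lfs_pointer() {
  [ -f "$1" ] && head -c 64 "$1" | grep -q '^version https://git-lfs\.github\.com/spec/v1'
}

# Hypothetical helper: list every pointer stub under a directory,
# skipping the .git metadata directory.
list_lfs_pointers() {
  find "$1" -name .git -prune -o -type f -print | while IFS= read -r lfs_file; do
    if is_lfs_pointer "$lfs_file"; then
      echo "$lfs_file"
    fi
  done
}
```

Running `list_lfs_pointers /path/to/jedi-bundle/ioda-data` and getting any output would suggest that the real file contents were never fetched. No output on a populated directory suggests the data is present and the problem lies elsewhere.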