I have tried various ways to solve this issue myself, but have not been able to resolve 24 test failures. I have provided some relevant information below that will hopefully help diagnose my mistake quickly.
I am receiving the following failures from ctest:
The following tests FAILED:
158 - atlas_test_metadata (Failed)
162 - atlas_test_haloexchange_adjoint (Failed)
163 - atlas_test_setcomm (Failed)
164 - atlas_test_haloexchange (Failed)
165 - atlas_test_gather (Failed)
178 - atlas_fctest_grids (Failed)
193 - atlas_test_distribution_regular_bands (Failed)
234 - atlas_fctest_elements (Failed)
235 - atlas_test_parfields (Failed)
236 - atlas_test_halo (Failed)
237 - atlas_test_distmesh (Failed)
239 - atlas_test_mesh_node2cell (Failed)
249 - atlas_fctest_functionspace (Failed)
252 - test_structuredcolumns_biperiodic (Failed)
253 - atlas_test_structuredcolumns (Failed)
257 - atlas_test_stencil_parallel_mpi4 (Failed)
258 - atlas_test_stencil_parallel_mpi16 (Failed)
259 - atlas_test_polygons_structuredcolumns (Failed)
260 - atlas_test_polygons_nodecolumns (Failed)
261 - atlas_test_polygons_projections_structuredcolumns (Failed)
262 - atlas_test_polygons_projections_nodecolumns (Failed)
263 - atlas_test_structuredcolumns_haloexchange (Failed)
291 - atlas_test_interpolation_structured2D_to_unstructured (Failed)
1261 - test_femps_csgrid (Failed)
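With this many failures, it can help to look at the logs of only the failed tests instead of the whole suite. This is a minimal sketch, assuming a standard CTest build directory. The function names `rerun_failed_tests` and `run_one_test_verbose` are hypothetical wrappers; the options they pass (`--rerun-failed`, `--output-on-failure`, `-R`, `-VV`) are standard ctest options.

```shell
# Hypothetical wrappers around standard ctest options.
# Run these from the bundle's build directory after a full ctest pass.

# Re-run only the tests that failed last time (ctest records them under
# Testing/Temporary) and print each failing test's captured output.
rerun_failed_tests() {
    ctest --rerun-failed --output-on-failure "$@"
}

# Run a single test, matched exactly by name, with extra-verbose output.
# The ^...$ anchors keep e.g. atlas_test_halo from also matching
# atlas_test_haloexchange.
run_one_test_verbose() {
    ctest -R "^$1\$" -VV
}
```

For example, `rerun_failed_tests 2>&1 | tee failed_tests.log` collects all failure logs in one file, and `run_one_test_verbose atlas_test_setcomm` shows the full command line and output for one test.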
I am running on Discover using the following modules:
@discover32[jedi_spack-stack]$ JEDI_OPT=/discover/swdev/jcsda/modules
@discover32[jedi_spack-stack]$ export JEDI_OPT
@discover32[jedi_spack-stack]$ module use $JEDI_OPT/modulefiles/core
@discover32[jedi_spack-stack]$ module use $JEDI_OPT/modulefiles/apps
@discover32[jedi_spack-stack]$ module load jedi/intel-impi
@discover32[jedi_spack-stack]$ module list
Currently Loaded Modules:
- git/2.38.1
- git-lfs/2.10.0
- jedi-python/3.8.3
- comp/gcc/9.2.0
- comp/intel/19.1.0.166
- jedi-intel/19.1.0.166
- szip/2.1.1
- zlib/1.2.11
- udunits/2.2.26
- mpi/impi/19.1.0.166
- jedi-impi/19.1.0.166
- hdf5/1.12.0
- pnetcdf/1.12.1
- netcdf/4.7.4
- nccmp/1.8.7.0
- boost-headers/1.68.0
- eigen/3.3.7
- bufr/noaa-emc-11.5.0
- cmake/3.23.1
- ecbuild/ecmwf-3.6.1
- eckit/ecmwf-1.16.0
- fckit/ecmwf-0.9.2
- atlas/ecmwf-0.24.1
- nco/4.9.9
- pio/2.5.1-debug
- gsl_lite/0.37.0
- gptl/8.0.3
- pybind11/2.7.0
- jedi/intel-impi/ecbuild35
I cloned the fv3-bundle and am using the CMakeLists.txt shown in the attached screenshot.
Any help identifying my error is much appreciated, THANK YOU!
Hi,
Please try using the latest JEDI code. I recommend using jedi-bundle’s latest tag (skylab-v2).
Also, please follow the instructions here to use the latest modules on Discover. Atlas is already available in the latest modules, so you don’t need to build it again.
Please let me know if you have any further questions.
Maryam
Thank you so much for your quick response and guidance.
I am now using the jedi-bundle instead and am currently running “make -j4”. This step always takes quite a while, but after almost 1.5 hours it is only 50% complete.
I have previously tried running this step differently to speed it up (similar to using “salloc --nodes=1 --time=30” to run the full ctest suite), but I began to worry that this sometimes led to make failures, so I went back to the basic approach.
Do you have any suggestions, or is it best to just wait it out?
Thank you again for your help!
Unfortunately, it takes more than an hour for me to build it on Discover too. One way to save a bit of time is to comment out mom6 and soca in CMakeLists.txt (lines 51 and 52) if you don’t have plans to use those.
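A minimal sketch of that edit, assuming the mom6 and soca entries really are on lines 51 and 52 of the jedi-bundle CMakeLists.txt you cloned (line numbers can shift between releases, so the helper prints the two lines first). `comment_out_mom6_soca` is a hypothetical helper; it writes a `.bak` backup instead of relying on `sed -i`, whose syntax differs between GNU and BSD sed.

```shell
# Hypothetical helper: comment out lines 51-52 of a bundle CMakeLists.txt
# (the mom6 and soca entries in the jedi-bundle version discussed here).
# It first prints the two lines and keeps a backup copy (<file>.bak).
comment_out_mom6_soca() {
    file="${1:-CMakeLists.txt}"
    sed -n '51,52p' "$file"                     # show the lines being commented out
    cp "$file" "$file.bak"                      # backup of the original
    sed '51,52s/^/# /' "$file.bak" > "$file"    # prefix each line with a CMake comment
}
```

Run it from the jedi-bundle source directory before configuring with ecbuild; if the printed lines are not the mom6 and soca entries, restore the file from `CMakeLists.txt.bak`.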
I expected that was the case; thank you for the suggestion.
After running the full ctest suite following the instructions you provided above, I am left with only a single test failure:
1143 - test_femps_csgrid (Failed)
forrtl: severe (174): SIGSEGV, segmentation fault occurred
(see also the attached screenshot)
I have not been able to find any relevant notes in the online JCSDA forums or documentation, but I will continue to explore…
Please let me know if you have any suggestions or know the solution to this ctest failure. Otherwise THANK YOU again for your help!
We’ve seen this before with Intel: it is an intermittent failure that usually goes away when you run the test again.
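For an intermittent failure like this, CTest 3.17 and newer (the module list above includes cmake/3.23.1) can retry a single test automatically instead of rerunning the whole suite. This is a minimal sketch; `rerun_until_pass` is a hypothetical wrapper, and the default of 3 attempts is an arbitrary choice.

```shell
# Hypothetical wrapper: retry the tests matching regex $1 up to $2 times
# (default 3), stopping as soon as a run passes.
# --repeat until-pass:<n> requires CTest 3.17 or newer.
rerun_until_pass() {
    ctest -R "$1" --repeat "until-pass:${2:-3}" --output-on-failure
}
```

For example, `rerun_until_pass test_femps_csgrid 5` from the build directory retries only that test, up to five times.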
Thank you for your response…
I have rebuilt and run through the entire process two more times, with the same failure each time.
Should I pivot and use GNU instead?
I understand that these types of intermittent failures occur and are often out of our control, so I have no problem if the answer is “it is what it is” and the reality is to keep trying until it passes. I just want to make sure I am not making things harder on myself or causing the issue myself.
I truly appreciate your assistance. I have been trying to build and run JEDI successfully on Discover through trial and error, and the feedback provided on this forum has been extremely helpful to search through.
If your workflow allows it, you may also simply ignore this one failing test or exclude it when running ctest (ctest -E femps_csgrid).
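A minimal sketch of that suggestion. `-E` takes a regular expression of test names to exclude, and `-N` lists the selected tests without running them, which is a quick way to confirm the exclusion before a long run. The function names are hypothetical.

```shell
# Hypothetical wrappers around ctest's -E (exclude by regex) option.

# Dry run: list the tests that would run with femps_csgrid excluded.
list_tests_without_femps() {
    ctest -N -E femps_csgrid
}

# Run the full suite, skipping femps_csgrid and printing output for any failures.
run_tests_without_femps() {
    ctest -E femps_csgrid --output-on-failure
}
```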
I was able to build jedi-bundle and complete ctest without any failures using the instructions in Maryam’s initial response (thank you!).
Now I have hit another block…
When I first posted my ctest question to this forum, I was trying to build the fv3-bundle. I need to be able to make changes and rerun JEDI in various configurations so that I can compare T/q soundings from:
a) different satellite instruments,
b) using both forward models (crtm and rttov), and
c) processed with specific modifications to the retrieval algorithm and/or channel selection coefficients.
For this reason, I thought it made more sense to build and run JEDI using the fv3-bundle instead of the jedi-bundle. I may or may not be right about this; I feel like I am still very much in the transition stage between “Youngling and Padawan” hahah.
I have tried to build and run JEDI with the fv3-bundle on Discover in multiple ways (environment modules, spack-stack, Singularity, …) and by following the tutorials, but I can’t seem to get any of them to work, despite the excellent and seemingly straightforward documentation provided online.
Right now I think I am my own biggest issue, bouncing between options multiple times a day and crossing my own wires…
Regardless, I am still unsure which way of building/running JEDI will be best for my specific goals. I also read somewhere that RTTOV must be obtained separately from EUMETSAT after signing a license agreement (I am not sure if that is still the case), but I assume this would also influence which option I need.
Please also feel free to contact me via email: ashley.a.wheeler@nasa.gov
I suggest you use the Spack modules on Discover. Switching to GNU may also help reduce the build time.
If I understand your research goals correctly, you only need the obs operators? If so, you can comment out more repositories in jedi-bundle/CMakeLists.txt. For example, you can remove the fv3-jedi repos (fms, fv3, femps, fv3-jedi-lm, fv3-jedi, fv3-jedi-data) as well as mom6 and soca. This will reduce the build and test time.
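One way to comment out several repositories at once is sketched below. It assumes each repository is declared on a single line of jedi-bundle/CMakeLists.txt that contains `PROJECT <name>` (for example an `ecbuild_bundle( PROJECT fv3-jedi ... )` line); `comment_out_projects` is a hypothetical helper, so compare the result against the original file before building.

```shell
# Hypothetical helper: comment out the lines declaring the named projects in a
# bundle CMakeLists.txt. Assumes one "PROJECT <name>" declaration per line;
# lines that are already commented are left alone. Keeps <file>.bak as a backup.
comment_out_projects() {
    file="$1"; shift
    cp "$file" "$file.bak"
    awk -v names="$*" '
        BEGIN { n = split(names, list, " ") }
        {
            for (i = 1; i <= n; i++) {
                # Match the exact project name, so "fv3" does not also hit "fv3-jedi".
                if ($0 ~ ("PROJECT[ \t]+" list[i] "([ \t)]|$)") && $0 !~ /^[ \t]*#/) {
                    $0 = "# " $0
                    break
                }
            }
            print
        }' "$file.bak" > "$file"
}
```

For the repositories listed above, that would be `comment_out_projects CMakeLists.txt fms fv3 femps fv3-jedi-lm fv3-jedi fv3-jedi-data mom6 soca`, run from the jedi-bundle source directory. Matching on the exact name is what keeps `fv3` from also commenting out `fv3-jedi` and `fv3-jedi-lm`.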