Some Issues Related to the Ensemble-Variational DA Application in JEDI

Hello everyone,

I have been actively working with JEDI and wanted to raise some issues I’ve encountered with the ensemble-variational (EnVar) application in JEDI over the past two years.

1. Potential Incompatibility with 3 Outer Loops (OLs) When Using ‘Derber-Rosati’ Minimizers (e.g., DRIPCG)

When I run 3 OLs with DRIPCG in the JEDI EnVar application, it consistently crashes after the inner loop of the third OL finishes. This happens even with the ctest data (the C12 global grid example in jedi-bundle, paired with FV3), not only with my own case.

I did not observe this issue in my tests up to the June 2022 release (from fv3-bundle). Since the JEDI-Skylab v4 release, however, it has been persistent across the public releases I have tested, including JEDI-Skylab v5, v6, and JEDI-MPAS v2.0.0. All of them fail with a segmentation fault at the same stage. Below is an example using JEDI-MPAS v2.

Although I tested on the TACC Stampede2 machine with the intel/19 compiler and intel-mpi, a colleague has confirmed similar behavior on the NOAA Hera machine. This suggests a broader problem rather than a local setup issue.
Interestingly, the ‘PCG’ minimizer appears to work fine with 3 OLs. My hypothesis is that certain allocations in the ‘DRXXX’ minimizers, which retain some vectors from previous OLs, might be contributing to the segmentation fault. If this does not occur in JEDI with the QG or L95 models, I think we may need to revisit the model-to-JEDI interface.
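
For anyone who wants to reproduce the DRIPCG versus PCG comparison above, here is a minimal Python sketch that generates the four configuration variants (two minimizers, 2 and 3 OLs). The key names (`minimizer`, `algorithm`, `iterations`, `ninner`, `gradient norm reduction`) are assumed to mirror the `variational` section of the YAML file used by the test case, and the numeric values are placeholders. The helper `make_variant` is hypothetical and not part of JEDI, so please check everything against the actual ctest YAML before use.

```python
import copy
import json

# Hypothetical base settings. Key names are ASSUMED to mirror the
# "variational" section of the test YAML; the values are placeholders.
BASE_VARIATIONAL = {
    "minimizer": {"algorithm": "DRIPCG"},
    "iterations": [
        {"ninner": 5, "gradient norm reduction": 1e-10},
    ],
}


def make_variant(base, algorithm, n_outer):
    """Return a copy of `base` with the given minimizer and number of outer loops."""
    if n_outer < 1:
        raise ValueError("n_outer must be at least 1")
    variant = copy.deepcopy(base)
    variant["minimizer"]["algorithm"] = algorithm
    template = variant["iterations"][0]
    # One entry in "iterations" per outer loop, each an independent copy.
    variant["iterations"] = [copy.deepcopy(template) for _ in range(n_outer)]
    return variant


if __name__ == "__main__":
    for algorithm in ("DRIPCG", "PCG"):
        for n_outer in (2, 3):
            cfg = make_variant(BASE_VARIATIONAL, algorithm, n_outer)
            print(f"# {algorithm}, {n_outer} outer loops")
            print(json.dumps({"variational": cfg}, indent=2))
```

The printed settings can be copied by hand into the corresponding section of the test YAML. Running all four variants should show whether the crash is specific to the combination of a DR minimizer and a third OL.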

2. Memory Consumption in the Ensemble-Variational DA Application

As I flagged a year or two ago, the JEDI-based EnVar application seems more memory-intensive than the GSI application, with the difference depending on the number of OLs and inner loops. I’ve observed that memory spikes at specific stages during the inner loop. For instance, with the ‘DRIPCG’ minimizer, a surge happens at the ‘B multiply’ step, which seems potentially linked to an MPI broadcast step.
I can understand memory increasing during the first OL, but it is perplexing that it keeps surging in subsequent OLs when, in theory, the memory allocated in the first OL could be reused.

Another observation is that the GMRESR minimizer, which is run after each OL to evaluate the minimization, doesn’t seem to release its memory. By tweaking the GMRESR evaluation (raising its thresholds so that it does not run; as I recall, it is there for diagnostic purposes, and I am using DRIPCG for the minimization itself), I managed to reduce memory usage without affecting the DA result. It would be worth investigating whether the memory consumed during this step can be reclaimed.
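
One way to pin down where the spikes described above occur is to sample the resident memory of a running rank and line the timestamps up with the log output. The sketch below assumes a Linux system where `/proc/<pid>/status` is readable. The function names (`parse_vmrss_kib`, `sample_rss`, `largest_jump`) are hypothetical helpers, not part of JEDI, and this is not the tool used for the plots in this post.

```python
import re
import time
from pathlib import Path

# Matches the "VmRSS" line of /proc/<pid>/status (Linux only).
_VMRSS = re.compile(r"^VmRSS:\s+(\d+)\s+kB$", re.MULTILINE)


def parse_vmrss_kib(status_text):
    """Extract the resident set size in KiB from status text, or None if absent."""
    match = _VMRSS.search(status_text)
    return int(match.group(1)) if match else None


def sample_rss(pid, interval_s=1.0, max_samples=None):
    """Poll the RSS of process `pid` until it exits; return (elapsed_s, rss_kib) pairs."""
    status_path = Path(f"/proc/{pid}/status")
    samples = []
    start = time.monotonic()
    while max_samples is None or len(samples) < max_samples:
        try:
            text = status_path.read_text()
        except (FileNotFoundError, ProcessLookupError):
            break  # the process has exited
        rss = parse_vmrss_kib(text)
        if rss is not None:
            samples.append((time.monotonic() - start, rss))
        time.sleep(interval_s)
    return samples


def largest_jump(samples):
    """Return (elapsed_s, increase_kib) for the largest rise between consecutive samples."""
    best = None
    for (_, prev), (t, cur) in zip(samples, samples[1:]):
        delta = cur - prev
        if best is None or delta > best[1]:
            best = (t, delta)
    return best
```

Each MPI rank is a separate process, so `sample_rss` would need to be pointed at the PID of one rank at a time. Comparing `largest_jump` across OLs should show whether the surge recurs at the same step in every outer loop.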

In summary, these issues could hinder my efforts to scale JEDI up to larger applications. If I have overlooked anything, I would appreciate corrections so that things are clear for everyone.
If anyone has insights, questions, or needs further details, please reach out. I will respond as promptly as I can.

Thank you for your attention to these matters.

Thank you very much for the detailed analysis of these issues and for attaching the supporting plots – it is very informative and helpful!
I would like to answer your first point here, and I will get back to you on the second one in a few days (I need to look into it more).
You are correct that the issue with running more than 2 outer loops shows (showed) up only for DR* minimizers. The good news is that it is now fixed on develop, and 2+ outer loops should work with DR* minimizers again. The bugfix was merged about 2 weeks ago, so it hasn’t made it into any of the releases yet. You can either run with the develop branches to use this feature, or wait for the next release (which will be some time around New Year; the exact date is not yet set).