I am testing a pure 3D EnVAR application using JEDI-FV3 v1.1.2 on a CONUS configuration with FV3-LAM.
Comparing the same EnVAR application between two JEDI-FV3 releases, v1.0.0 and v1.1.2,
I have noticed that memory usage has increased noticeably in JEDI-FV3 v1.1.2.
I recall that I could run the application with JEDI-FV3 v1.0.0 successfully on a single compute node with 192 GB of memory. However, this is not viable with JEDI-FV3 v1.1.2, even for the same configuration (BUMP ensemble covariance, 3 outer loops with a maximum of 30 inner loops, same observation data, 40 ensemble members). As you can see below, it takes about 1 TB of memory, so this application requires at least 6 compute nodes with JEDI-FV3 v1.1.2. I want to emphasize that these runs use a very small domain (300x300).
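As a rough check of the node count, here is a minimal sketch. It assumes memory is spread evenly across nodes and takes 1 TB as 1000 GB; both are simplifying assumptions, not measured values, and the function name is only illustrative.

```python
import math


def nodes_needed(total_mem_gb: float, mem_per_node_gb: float) -> int:
    """Smallest number of nodes whose combined memory covers total_mem_gb."""
    return math.ceil(total_mem_gb / mem_per_node_gb)


# Figures from this post: about 1 TB in total, 192 GB per node.
print(nodes_needed(1000, 192))  # 6
```

In practice the per-node limit is lower than the nominal 192 GB, since the operating system and MPI buffers also need memory, so 6 nodes is a lower bound.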
As you can see in the screenshot below, memory usage keeps increasing significantly during the inner-loop step.
Is there any way in JEDI-FV3 v1.1.2 to reduce memory usage during the inner-loop step? With my current configuration, I don’t think the application on CONUS is viable because of memory.
Or are there any updates in the ‘develop’ branch that deal with this? I believe I saw at AMS that the JEDI team was refactoring the H(x) code.
Jun, thank you for reporting this, and providing the useful information on the memory usage at each iteration.
I have not heard of any specific efforts aimed at addressing this issue, but it is possible that the develop version has different memory usage (there have been many changes since the last release). Would you mind trying the develop version of fv3-bundle with your application to see whether you get similar statistics?
Thanks for the reply,
Sure! I’d like to try the ‘develop’ branch of JEDI-FV3 with the variational configurations.
FYI, I am also sharing results from testing on the ‘CONUS’ domain (1820 x 1092) with two variational DA applications using JEDI-FV3 v1.1.2.
This shows the memory usage of 3DVAR (~0.6 TB) with the ‘FV3JEDI-ID BEC’ using JEDI-FV3 v1.1.2, assimilating about 3.3 million observations.
This shows the memory usage of P3DEnVAR (~3.4 TB) with the ‘BUMP ensemble BEC’ using JEDI-FV3 v1.1.2 with 5 members.
I will see if the memory usage keeps increasing in the ‘develop’ branch.
It will take some time since I need to update the base libraries for the ‘develop’ branch.
I will update you later.
Thank you Jun! Could you please also share here the yaml configuration file that you are using for your experiments?
I captured the main components of my yaml configuration of P3DEnVAR on CONUS.
Cost Function - Geometry, Background, Background Error Cov.
Although you may see additional variables (e.g., use_cvpq, …), I don’t think they affect memory usage much (I have not seen such effects in EnKF applications since JEDI-FV3 v1.1.2 started supporting the ‘halo’ distribution).
It seems to me that the ‘develop’ branch (cloned May 26 2022) also shows a similar increase in memory usage during the inner-loop step.
I have attached the memory usage information here. I ran 3DVAR using the ‘FV3-JEDI ID’ BEC with 15 inner loops.
‘Develop’ branch (cloned May 26 2022): ~573 GB
JEDI-FV3 v1.1.2: ~549 GB
FYI, I cloned the ‘develop’ branch around June 2022 and found that the EnVAR application using the ‘develop’ branch took much more memory (3.1 TB in total) than the same application with JEDI-FV3 v1.1.2 (1 TB).
(I don’t know whether there have been updates since June, or whether there are additional yaml variables to control memory usage.)
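For reference, the relative differences between the figures quoted in this thread can be worked out with a few lines of Python. This is only a back-of-the-envelope sketch: the function name is illustrative, the inputs are the totals reported above, and 1 TB is taken as 1000 GB.

```python
def relative_increase(new: float, old: float) -> float:
    """Percentage increase of `new` over `old`."""
    return (new - old) / old * 100.0


# 3DVAR with 15 inner loops: 'develop' vs. v1.1.2 (values in GB).
print(f"3DVAR: {relative_increase(573.0, 549.0):.1f}% more memory")  # 4.4%

# EnVAR: 'develop' vs. v1.1.2 (values in GB, assuming 1 TB = 1000 GB).
print(f"EnVAR: {3100.0 / 1000.0:.1f}x the memory")  # 3.1x
```

The 3DVAR difference is small, while the EnVAR run roughly triples, which suggests the growth is tied to the ensemble part of the configuration rather than to the variational solver as a whole.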