Make fv3-jedi faster with more processors

I am trying to run LETKF with FV3-JEDI. With the default “layout: [1,1]”, i.e. 6 cores, it took about 30 minutes. I assumed that more processors would shorten the run. However, when I changed only this setting to “layout: [2,2]” and ran with 24 cores, the run took much longer than with 6 cores (more than double the time).
Please let me know if I did anything incorrectly or if anything else needs to be changed.
Thanks in advance!
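
For reference, the core counts quoted above (6 for “layout: [1,1]”, 24 for “layout: [2,2]”) are consistent with one MPI task per layout cell on each of the six cubed-sphere tiles. A minimal sketch of that arithmetic, where the function name `mpi_tasks_for_layout` is hypothetical and not part of FV3-JEDI:

```python
# Assumption: the cubed-sphere grid has 6 tiles and each tile is split
# into layout[0] x layout[1] subdomains, one MPI task per subdomain.
CUBED_SPHERE_TILES = 6


def mpi_tasks_for_layout(layout):
    """Return the number of MPI tasks implied by a [nx, ny] layout."""
    nx, ny = layout
    if nx < 1 or ny < 1:
        raise ValueError("layout entries must be positive integers")
    return CUBED_SPHERE_TILES * nx * ny


if __name__ == "__main__":
    for layout in ([1, 1], [2, 2]):
        print(layout, "->", mpi_tasks_for_layout(layout), "tasks")
```

Under this assumption, the number of tasks requested from the job launcher has to match the value computed from the layout.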

@xtian15, can you send me the log files for the original and the modified runs? The time should not have doubled. This is curious.

Overall, h(X) is the dominant task for LETKF at the moment. With the current implementation, h(X) doesn’t scale very well. It scales a bit better if you use obs distribution: halo. After h(X) is refactored at the end of this month, you should see a significant speedup and improved scalability for LETKF.

@frolovsa Thanks for your kind response! Both LETKF runs used the Halo distribution, following the provided letkf.yaml examples. In fact, the 24-core attempt was terminated by the system time limit of 48 hours, so the execution time was far more than double.
In addition, could some example fv3-jedi YAML files be shared for the following scenarios:

  1. when LETKF is run with more than 6 cores;
  2. when LETKF is run with H(X) also written to disk; and
  3. similar cases where LGTKF is used.

Thanks again in advance!

@xtian15 Your test with 24 cores was using lat-lon output, and that was the problem (I think you used it for the output increment). Please switch to gfs; it will go much faster. You might be running out of memory, which would explain why the job takes so long to finish with 24 tasks instead of 6.
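
One way to check where a config still requests lat-lon output is to search the loaded configuration for the value “latlon”. The sketch below assumes the YAML has already been parsed into nested Python dicts and lists; the helper name `find_value_paths` and the key names in the example fragment are illustrative assumptions, not the actual fv3-jedi schema.

```python
def find_value_paths(node, target, path=""):
    """Return the paths of all entries in a nested dict/list equal to `target`."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            hits.extend(find_value_paths(value, target, f"{path}/{key}"))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits.extend(find_value_paths(value, target, f"{path}[{index}]"))
    elif node == target:
        hits.append(path)
    return hits


# Hypothetical config fragment; the key names are assumptions for illustration.
example_config = {
    "geometry": {"layout": [2, 2]},
    "output increment": {"filetype": "latlon"},
}

if __name__ == "__main__":
    for hit in find_value_paths(example_config, "latlon"):
        print("lat-lon output requested at:", hit)
```

Each reported path points at an entry that would need to be switched to “gfs”.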

@frolovsa Thanks for spotting this! You are right. After replacing “latlon” with “gfs”, the execution time dropped greatly for both 6 and 24 cores, and the 24-core run is now also much faster than the 6-core run.
Looking forward to the refactoring of h(X) and further improvements!
Thanks again!