#1 Dramatic slow-down after inclusion of SOC by AdrianoDiPietro 11.10.2021 16:59


Dear Fleur community,

After a very fast and problem-free structural relaxation in a MgO/CoFe multilayer setup, I am experiencing a dramatic slowdown as soon as I include SOC in the calculation. I am aware that SOC reduces the speed of the SCF loop; however, I am currently working on a cluster and run the simulation on 40 cores in parallel.
(Since I am interested in calculating magnetic properties such as magnetic anisotropy and possibly DMI, I explicitly broke as many symmetries as possible in the inp file by specifying an arbitrary quantization axis via the line /soc 0.37 0.10.)
I am aware of the optimized parallelization schemes offered by OpenMP, but in this particular instance they do not seem to provide the usual dramatic speedup.

Other simulations with similarly sized unit cells and structures displayed a much more benign convergence behavior after the inclusion of SOC. My questions:

1) Is the reduction in speed of SOC calculations caused by atom-specific parameters, or does it depend only on the symmetry of the unit cell?

2) Could you show me a typical parallelization scheme you would run on your cluster?

As a ballpark value: one iteration currently takes about 15 minutes.

I will include the inp.xml and out.xml for your consideration, as well as the sbatch file, so you can compare and perhaps suggest a better parallelization scheme.

Thanks in advance,

Adriano Di Pietro

#2 RE: Dramatic slow-down after inclusion of SOC by Gregor 11.10.2021 18:26

It is always good to have a mixture of OMP and MPI parallelization. Right now you use 40 MPI processes and a single OMP thread per MPI process. You have 49 k points. It is always good to have a number of MPI processes that divides the number of k points. In this case I suggest using 7 MPI processes and 5 OpenMP threads per MPI process:

export OMP_NUM_THREADS=5

...or however this is specified on your cluster.
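For reference, here is a minimal sbatch sketch of this 7x5 split. It assumes a Slurm cluster where srun launches the MPI processes and that the executable is called fleur_MPI; adjust both to your setup:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=7            # 7 MPI processes (divides the 49 k points)
#SBATCH --cpus-per-task=5     # 5 OpenMP threads per MPI process
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun fleur_MPI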

Of course, you have a performance drawback from including SOC and from reducing the symmetry. But it may be that the parallelization scheme chosen so far is also a factor here.

With the suggested parallelization you make use of 35 of the 40 cores. I think you should not worry about not using 5 of the cores. But I will also make Uliana aware of this question. Maybe she has something to add or has to correct me. :)

#3 RE: Dramatic slow-down after inclusion of SOC by Gregor 11.10.2021 18:40

On what kind of processor is this running? Is this an Intel or an AMD processor? For AMD processors you have to be careful in the compilation process. If you don't use the optimal compiler options, you will get bad performance on AMD processors.
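To illustrate the kind of thing to check, here is a generic sketch, assuming a GCC/gfortran toolchain on an AMD Zen 2 node; the exact flags, and how your Fleur build actually picks up Fortran flags, are assumptions that depend on your compiler and configure setup:

# illustrative architecture flags for gfortran on an AMD Zen 2 CPU;
# use -march=znver1/znver3 etc. for other CPU generations
export FFLAGS="-O2 -march=znver2 -fopenmp"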

#4 RE: Dramatic slow-down after inclusion of SOC by Uliana 12.10.2021 16:16

Hi guys!
No corrections from my side, I agree with Gregor: one should pay attention to the MPI/k-point parallelisation first.
Besides, how dramatic is the dramatic slow-down in numbers?
You also have timings at the end of the out file, so you can compare whether the differences come from the SOC or whether the whole computation became slower.
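For a quick look, a small shell sketch; the file names out_noSOC and out_SOC are placeholders for your saved copies of the two out files:

# compare the timing summaries printed at the end of the two runs
diff <(tail -n 40 out_noSOC) <(tail -n 40 out_SOC)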
Cheers, Uliana

#5 RE: Dramatic slow-down after inclusion of SOC by AdrianoDiPietro 15.10.2021 14:36


Dear Gregor and Uliana,

It was indeed a matter of correctly communicating OMP_NUM_THREADS to the cluster - I was used to a different way of using sbatch and was setting it the wrong way.

In calculations without SOC the convergence was rather quick, which raised my hopes. With the optimized setup the SOC calculation does one SCF iteration every 3 minutes, which is good!

As always, thanks for the fast advice.

Best,

Adriano
