Optimizing Fortran code with the Intel VTune analyzer
I am working on a Fortran project that simulates vegetation dynamics. The code is slow, so I am always looking for ways to optimize it. I have read the rule of thumb that typically 90% of the run time is spent in 10% of the code. To find these bottlenecks, I started using the Intel VTune Performance Analyzer. The analysis of a simulation shows that a large share of the time is spent in a few parts of the code, as shown in the VTune screenshots. The most expensive part is leaftw_derivs.
Below is the code mentioned in the analysis.
!---- Update soil moisture and energy from transpiration/root uptake. ------------------!
if (rk4aux(ibuff)%any_resolvable) then
   do k1 = klsl, mzg ! loop over extracted water
      do k2=k1,mzg
         if (rk4site%ntext_soil(k2) /= 13) then
            !---------------------------------------------------------------------------!
            ! Transpiration happens only when there is some water left down to this !
            ! layer. !
            !---------------------------------------------------------------------------!
            if (rk4aux(ibuff)%avail_h2o_int(k1) > 0.d0) then
               !------------------------------------------------------------------------!
               ! Find the contribution of layer k2 for the transpiration from !
               ! cohorts that reach layer k1. !
               !------------------------------------------------------------------------!
               ext_weight = rk4aux(ibuff)%avail_h2o_lyr(k2) / rk4aux(ibuff)%avail_h2o_int(k1)
               !------------------------------------------------------------------------!
               wloss_tot = 0.d0
               qloss_tot = 0.d0
               wvlmeloss_tot = 0.d0
               qvlmeloss_tot = 0.d0
               do ico=1,cpatch%ncohorts
                  !----- Find the loss from this cohort. -------------------------------!
                  wloss = rk4aux(ibuff)%extracted_water(ico,k1) * ext_weight
                  qloss = wloss * tl2uint8(initp%soil_tempk(k2),1.d0)
                  wvlmeloss = wloss * wdnsi8 * dslzi8(k2)
                  qvlmeloss = qloss * dslzi8(k2)
                  !---------------------------------------------------------------------!
                  !---------------------------------------------------------------------!
                  ! Add the internal energy to the cohort. This energy will be !
                  ! eventually lost to the canopy air space because of transpiration, !
                  ! but we will do it in two steps so we ensure energy is conserved. !
                  !---------------------------------------------------------------------!
                  dinitp%leaf_energy(ico) = dinitp%leaf_energy(ico) + qloss
                  dinitp%veg_energy(ico) = dinitp%veg_energy(ico) + qloss
                  initp%hflx_lrsti(ico) = initp%hflx_lrsti(ico) + qloss
                  !---------------------------------------------------------------------!
                  !----- Integrate the total to be removed from this layer. ------------!
                  wloss_tot = wloss_tot + wloss
                  qloss_tot = qloss_tot + qloss
                  wvlmeloss_tot = wvlmeloss_tot + wvlmeloss
                  qvlmeloss_tot = qvlmeloss_tot + qvlmeloss
                  !---------------------------------------------------------------------!
               end do
               !------------------------------------------------------------------------!
               !----- Update derivatives of water, energy, and transpiration. ----------!
               dinitp%soil_water (k2) = dinitp%soil_water(k2) - wvlmeloss_tot
               dinitp%soil_energy (k2) = dinitp%soil_energy(k2) - qvlmeloss_tot
               dinitp%avg_transloss(k2) = dinitp%avg_transloss(k2) - wloss_tot
               !------------------------------------------------------------------------!
            end if
            !---------------------------------------------------------------------------!
         end if
         !------------------------------------------------------------------------------!
      end do
      !---------------------------------------------------------------------------------!
   end do
   !------------------------------------------------------------------------------------!
end if
!---------------------------------------------------------------------------------------!
I have only a very basic understanding of optimization, and I don't see what can be done here to improve the code. In particular, I do not understand what the "Instructions Retired" metric means and how it is measured. Is there a way to speed up these calculations?
EDIT
After thinking about it a little, I realized that there are some simple optimizations here. For example, the conditional
if (rk4aux(ibuff)%avail_h2o_int(k1) > 0.d0) then
depends only on k1, so it can be moved outside the k2 loop, and the call
tl2uint8(initp%soil_tempk(k2),1.d0)
depends only on k2, so it can be moved outside the innermost (cohort) loop.
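As a minimal, self-contained sketch of this kind of hoisting (not the model code itself): toy_uint is a hypothetical stand-in for tl2uint8, and the array sizes and values are made up for illustration. The program evaluates the function once per layer instead of once per cohort and checks that both versions give the same result.

```fortran
program hoist_invariant
   implicit none
   integer, parameter :: nlayers = 4, ncohorts = 5
   real(kind=8) :: soil_tempk(nlayers), extracted(ncohorts)
   real(kind=8) :: q_naive(nlayers), q_hoisted(nlayers), uint_here
   integer :: k2, ico

   ! Made-up test data, for illustration only.
   do k2 = 1, nlayers
      soil_tempk(k2) = 280.d0 + dble(k2)
   end do
   do ico = 1, ncohorts
      extracted(ico) = 1.d-3 * dble(ico)
   end do

   ! Naive version: the function is re-evaluated for every cohort.
   q_naive = 0.d0
   do k2 = 1, nlayers
      do ico = 1, ncohorts
         q_naive(k2) = q_naive(k2) + extracted(ico) * toy_uint(soil_tempk(k2))
      end do
   end do

   ! Hoisted version: evaluate once per layer, reuse inside the cohort loop.
   q_hoisted = 0.d0
   do k2 = 1, nlayers
      uint_here = toy_uint(soil_tempk(k2))
      do ico = 1, ncohorts
         q_hoisted(k2) = q_hoisted(k2) + extracted(ico) * uint_here
      end do
   end do

   ! Compare with a relative tolerance rather than bit-for-bit equality.
   if (maxval(abs(q_naive - q_hoisted)) > 1.d-12 * maxval(abs(q_naive))) then
      error stop 'hoisted result differs from naive result'
   end if
   print '(a)', 'hoisting OK'

contains

   ! Hypothetical stand-in for tl2uint8: any pure function of temperature.
   pure function toy_uint(tempk) result(uint)
      real(kind=8), intent(in) :: tempk
      real(kind=8) :: uint
      uint = 4186.d0 * (tempk - 273.15d0)
   end function toy_uint

end program hoist_invariant
```

An optimizing compiler may already perform this hoisting when it can see that the function has no side effects, so the gain in the real code is something to measure rather than assume.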
However, I cannot figure out why VTune reports such a long time for these three lines:
dinitp%leaf_energy(ico) = dinitp%leaf_energy(ico) + qloss
dinitp%veg_energy(ico) = dinitp%veg_energy(ico) + qloss
initp%hflx_lrsti(ico) = initp%hflx_lrsti(ico) + qloss
They just perform additions, which should be very fast, but the analyzer says a lot of time is spent there. Why is this so?
EDIT2
I rewrote the whole loop, trying to optimize it as much as I could. This is the code I came up with:
!---- Update soil moisture and energy from transpiration/root uptake. ------------------!
if (rk4aux(ibuff)%any_resolvable) then
   do k1 = klsl, mzg ! loop over extracted water
      !---------------------------------------------------------------------------!
      ! Transpiration happens only when there is some water left down to this !
      ! layer. !
      !---------------------------------------------------------------------------!
      if (rk4aux(ibuff)%avail_h2o_int(k1) > 0.d0) then
         wloss_tot_k1 = 0.d0
         do ico=1,cpatch%ncohorts
            !----- Integrate the total to be removed from this layer. ------------!
            wloss_tot_k1 = wloss_tot_k1 + rk4aux(ibuff)%extracted_water(ico,k1)
            !---------------------------------------------------------------------!
         end do
         !------------------------------------------------------------------------!
         do k2=k1,mzg
            if (rk4site%ntext_soil(k2) /= 13) then
               !----- These depend only on k1 and k2, so compute them once per layer. --!
               ext_weight = rk4aux(ibuff)%avail_h2o_lyr(k2) / rk4aux(ibuff)%avail_h2o_int(k1)
               uint_here = tl2uint8(initp%soil_tempk(k2),1.d0)
               do ico=1,cpatch%ncohorts
                  wloss = rk4aux(ibuff)%extracted_water(ico,k1) * ext_weight
                  uint_here1 = wloss * uint_here
                  dinitp%leaf_energy(ico) = dinitp%leaf_energy(ico) + uint_here1
                  dinitp%veg_energy(ico) = dinitp%veg_energy(ico) + uint_here1
                  initp%hflx_lrsti(ico) = initp%hflx_lrsti(ico) + uint_here1
               end do
               !------------------------------------------------------------------------!
               wloss_tot = wloss_tot_k1 * ext_weight
               wvlmeloss_tot = wloss_tot * dslzi8(k2) * wdnsi8
               qvlmeloss_tot = wloss_tot * dslzi8(k2) * uint_here
               !----- Update derivatives of water, energy, and transpiration. ----------!
               dinitp%soil_water (k2) = dinitp%soil_water(k2) - wvlmeloss_tot
               dinitp%soil_energy (k2) = dinitp%soil_energy(k2) - qvlmeloss_tot
               dinitp%avg_transloss(k2) = dinitp%avg_transloss(k2) - wloss_tot
               !------------------------------------------------------------------------!
            end if
            !---------------------------------------------------------------------------!
         end do
         !------------------------------------------------------------------------------!
      end if
      !---------------------------------------------------------------------------------!
   end do
   !------------------------------------------------------------------------------------!
end if
!---------------------------------------------------------------------------------------!
It's a bit long, so I don't expect people to go through all of it. When I run the analyzer, I now get significantly reduced times (from 290 to 185), although the speedup appears to be somewhat smaller in real simulations.
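The main algebraic step in the rewritten loop is that ext_weight does not depend on ico, so the sum over cohorts can be computed once per k1 and then scaled for each k2. A minimal sketch of that step, with made-up array sizes and values (the names below only mirror the ones in the post and are not the model's data structures):

```fortran
program presum_cohorts
   implicit none
   integer, parameter :: ncohorts = 6, nlayers = 3
   real(kind=8) :: extracted(ncohorts), ext_weight(nlayers)
   real(kind=8) :: tot_naive(nlayers), tot_presum(nlayers), wloss_sum
   integer :: ico, k2

   ! Made-up inputs, for illustration only.
   do ico = 1, ncohorts
      extracted(ico) = 1.d-4 * dble(ico)
   end do
   ext_weight = (/ 0.5d0, 0.3d0, 0.2d0 /)

   ! Naive: accumulate extracted * ext_weight over cohorts for every layer.
   do k2 = 1, nlayers
      tot_naive(k2) = 0.d0
      do ico = 1, ncohorts
         tot_naive(k2) = tot_naive(k2) + extracted(ico) * ext_weight(k2)
      end do
   end do

   ! Pre-summed: ext_weight does not depend on ico, so sum the cohorts once.
   wloss_sum = sum(extracted)
   do k2 = 1, nlayers
      tot_presum(k2) = wloss_sum * ext_weight(k2)
   end do

   ! The two agree up to floating-point rounding, not necessarily bit for bit.
   if (maxval(abs(tot_naive - tot_presum)) > 1.d-12 * maxval(abs(tot_naive))) then
      error stop 'pre-summed totals differ from naive totals'
   end if
   print '(a)', 'pre-sum OK'
end program presum_cohorts
```

Because the order of the floating-point operations changes, the rewritten model may differ from the original in the last digits, which is worth keeping in mind when comparing simulation output before and after the change.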
When looking at the samples, though, there is still a significant amount of time spent on operations that I would not expect to be "expensive". I still don't understand what "Instructions Retired" means and how it is measured. For now, I think this is enough, and I'm guessing that the right way to speed things up further would be to use OpenMP, as Holmes suggests.
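A rough sketch of what an OpenMP version of the cohort update could look like, assuming each iteration only touches its own element ico (as in the three update lines above). The program below is a toy with made-up sizes and values, not the model code, and it only illustrates the directive placement:

```fortran
program omp_cohort_sketch
   implicit none
   integer, parameter :: ncohorts = 1000
   real(kind=8) :: extracted(ncohorts), leaf_energy(ncohorts)
   real(kind=8) :: ext_weight, uint_here
   integer :: ico

   ! Made-up inputs, for illustration only.
   ext_weight = 0.25d0
   uint_here  = 5.d4
   do ico = 1, ncohorts
      extracted(ico) = 1.d-6 * dble(ico)
   end do
   leaf_energy = 0.d0

   ! Each iteration writes only element ico, so iterations are independent.
   !$omp parallel do default(shared) private(ico)
   do ico = 1, ncohorts
      leaf_energy(ico) = leaf_energy(ico) + extracted(ico) * ext_weight * uint_here
   end do
   !$omp end parallel do

   ! Check against the same expression evaluated serially.
   do ico = 1, ncohorts
      if (abs(leaf_energy(ico) - extracted(ico)*ext_weight*uint_here) > 1.d-12) error stop 'mismatch'
   end do
   print '(a)', 'cohort loop OK'
end program omp_cohort_sketch
```

The directives take effect only when OpenMP is enabled at compile time (for example -fopenmp with gfortran or -qopenmp with the Intel compiler); otherwise they are treated as comments and the loop runs serially. With only a few cohorts per patch, the cost of starting threads may outweigh the gain, so parallelizing a loop further out may be the better place to try.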