Plumed and GROMACS openmpi #492

Closed · ghost opened this issue Jun 19, 2019 · 8 comments

@ghost commented Jun 19, 2019

Hello,

I've built GROMACS 2018.6 patched with PLUMED 2.5.1 using EasyBuild. I'm trying to run exercise 4 of the CINECA tutorial (https://www.plumed.org/doc-v2.5/user-doc/html/cineca.html).

gmx_mpi mdrun -s topol.tpr -plumed plumed.dat -nsteps 500000 runs fine, but

mpirun -np 2 gmx_mpi mdrun -s topol.tpr -plumed plumed.dat -multi 2 -nsteps 500000

fails with the following errors:

[kfhl160@seskscpg009 SCRIPTS]$ mpirun np 2 gmx_mpi mdrun -s ../SETUP/topol.tpr -plumed plumed.dat -multi 2 -nsteps 500000

WARNING: Open MPI will create a shared memory backing file in a
directory that appears to be mounted on a network filesystem.
Creating the shared memory backup file on a network file system, such
as NFS or Lustre is not recommended -- it may cause excessive network
traffic to your file servers and/or cause shared memory traffic in
Open MPI to be much slower than expected.

You may want to check what the typical temporary directory is on your
node. Possible sources of the location of this temporary directory
include the $TEMPDIR, $TEMP, and $TMP environment variables.

Note, too, that system administrators can set a list of filesystems
where Open MPI is disallowed from creating temporary files by setting
the MCA parameter "orte_no_session_dir".

Local host: seskscpg009.prim.scp
Filename: /scratch/kfhl160/openmpi-sessions-486452227@seskscpg009_0/5849/1/shared_mem_pool.seskscpg009
You can set the MCA paramter shmem_mmap_enable_nfs_warning to 0 to
disable this message.

:-) GROMACS - gmx mdrun, 2018.6 (-:

GROMACS is written by:
Emile Apol Rossen Apostolov Paul Bauer Herman J.C. Berendsen
Par Bjelkmar Aldert van Buuren Rudi van Drunen Anton Feenstra
Gerrit Groenhof Aleksei Iupinov Christoph Junghans Anca Hamuraru
Vincent Hindriksen Dimitrios Karkoulis Peter Kasson Jiri Kraus
Carsten Kutzner Per Larsson Justin A. Lemkul Viveca Lindahl
Magnus Lundborg Pieter Meulenhoff Erik Marklund Teemu Murtola
Szilard Pall Sander Pronk Roland Schulz Alexey Shvetsov
Michael Shirts Alfons Sijbers Peter Tieleman Teemu Virolainen
Christian Wennberg Maarten Wolf
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2017, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

GROMACS: gmx mdrun, version 2018.6
Executable: /opt/scp/software/GROMACS/2018.6_GPU-foss-2017a-mpi/bin/gmx_mpi
Data prefix: /opt/scp/software/GROMACS/2018.6_GPU-foss-2017a-mpi
Working dir: /home/kfhl160/gromacstest1/cineca/SCRIPTS
Command line:
gmx_mpi mdrun -s ../SETUP/topol.tpr -plumed plumed.dat -multi 2 -nsteps 500000

Back Off! I just backed up md0.log to ./#md0.log.8#

Back Off! I just backed up md1.log to ./#md1.log.8#
++ Loading the PLUMED kernel runtime ++
++ PLUMED_KERNEL="/opt/scp/software/PLUMED/2.5.1-foss-2017a/lib/libplumedKernel.so" ++
++ Loading the PLUMED kernel runtime ++
++ PLUMED_KERNEL="/opt/scp/software/PLUMED/2.5.1-foss-2017a/lib/libplumedKernel.so" ++
++ Loading the PLUMED kernel runtime ++
++ PLUMED_KERNEL="/opt/scp/software/PLUMED/2.5.1-foss-2017a/lib/libplumedKernel.so" ++
++ Loading the PLUMED kernel runtime ++
++ PLUMED_KERNEL="/opt/scp/software/PLUMED/2.5.1-foss-2017a/lib/libplumedKernel.so" ++
Reading file ../SETUP/topol1.tpr, VERSION 4.6.7 (single precision)
Note: file tpx version 83, software tpx version 112
NOTE: GPU found, but the current simulation can not use GPUs
To use a GPU, set the mdp option: cutoff-scheme = Verlet

Overriding nsteps with value passed on the command line: 500000 steps, 1e+03 ps
Reading file ../SETUP/topol0.tpr, VERSION 4.6.7 (single precision)
Note: file tpx version 83, software tpx version 112
NOTE: GPU found, but the current simulation can not use GPUs
To use a GPU, set the mdp option: cutoff-scheme = Verlet

Overriding nsteps with value passed on the command line: 500000 steps, 1e+03 ps

This is simulation 1 out of 2 running as a composite GROMACS
multi-simulation job. Setup for this simulation:

Using 1 MPI process

This is simulation 0 out of 2 running as a composite GROMACS
multi-simulation job. Setup for this simulation:

Using 1 MPI process

Non-default thread affinity set probably by the OpenMP library,
disabling internal thread affinity

Non-default thread affinity set probably by the OpenMP library,
disabling internal thread affinity

NOTE: This file uses the deprecated 'group' cutoff_scheme. This will be
removed in a future release when 'verlet' supports all interaction forms.

NOTE: This file uses the deprecated 'group' cutoff_scheme. This will be
removed in a future release when 'verlet' supports all interaction forms.

Back Off! I just backed up traj_comp1.xtc to ./#traj_comp1.xtc.6#

Back Off! I just backed up traj_comp0.xtc to ./#traj_comp0.xtc.6#

Back Off! I just backed up ener1.edr to ./#ener1.edr.6#

Back Off! I just backed up ener0.edr to ./#ener0.edr.6#
starting mdrun 'alanine dipeptide in vacuum'
500000 steps, 1000.0 ps.
starting mdrun 'alanine dipeptide in vacuum'
500000 steps, 1000.0 ps.
[seskscpg009:90377] *** Process received signal ***
[seskscpg009:90376] *** Process received signal ***
[seskscpg009:90376] Signal: Segmentation fault (11)
[seskscpg009:90376] Signal code: Address not mapped (1)
[seskscpg009:90376] Failing at address: 0x30
[seskscpg009:90377] Signal: Segmentation fault (11)
[seskscpg009:90377] Signal code: Address not mapped (1)
[seskscpg009:90377] Failing at address: 0x30
[seskscpg009:90376] [ 0] [seskscpg009:90377] [ 0] /lib64/libpthread.so.0(+0xf6d0)[0x7f65825396d0]
[seskscpg009:90376] [ 1] /lib64/libpthread.so.0(+0xf6d0)[0x7f8897b1e6d0]
[seskscpg009:90377] [ 1] /opt/scp/software/OpenMPI/2.0.2-GCC-6.3.0-2.27/lib/libmpi.so.20(MPI_Allreduce+0x1a4)[0x7f657aa95f24]
[seskscpg009:90376] [ 2] /opt/scp/software/OpenMPI/2.0.2-GCC-6.3.0-2.27/lib/libmpi.so.20(MPI_Allreduce+0x1a4)[0x7f889007af24]
[seskscpg009:90377] [ 2] /opt/scp/software/PLUMED/2.5.1-foss-2017a/lib/libplumedKernel.so(ZN4PLMD4GREX3cmdERKNSt7_cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPv+0xb78)[0x7f65639d0198]
[seskscpg009:90376] [ 3] /opt/scp/software/PLUMED/2.5.1-foss-2017a/lib/libplumedKernel.so(ZN4PLMD4GREX3cmdERKNSt7_cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPv+0xb78)[0x7f8885258198]
[seskscpg009:90377] [ 3] /opt/scp/software/PLUMED/2.5.1-foss-2017a/lib/libplumedKernel.so(ZN4PLMD10PlumedMain3cmdERKNSt7_cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPv+0x1c66)[0x7f65639df1e6]
[seskscpg009:90376] [ 4] /opt/scp/software/PLUMED/2.5.1-foss-2017a/lib/libplumedKernel.so(ZN4PLMD10PlumedMain3cmdERKNSt7_cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPv+0x1c66)[0x7f88852671e6]
[seskscpg009:90377] [ 4] /opt/scp/software/PLUMED/2.5.1-foss-2017a/lib/libplumedKernel.so(plumed_plumedmain_cmd+0x5d)[0x7f65639ef37d]
[seskscpg009:90376] [ 5] gmx_mpi[0x4159af]
[seskscpg009:90376] [ 6] gmx_mpi[0x438095]
[seskscpg009:90376] [ 7] gmx_mpi[0x41faae]
[seskscpg009:90376] [ 8] /opt/scp/software/PLUMED/2.5.1-foss-2017a/lib/libplumedKernel.so(plumed_plumedmain_cmd+0x5d)[0x7f888527737d]
[seskscpg009:90377] [ 5] gmx_mpi[0x4159af]
[seskscpg009:90377] [ 6] gmx_mpi[0x438095]
[seskscpg009:90377] [ 7] gmx_mpi[0x4204e2]
[seskscpg009:90376] [ 9] gmx_mpi[0x444cdb]
[seskscpg009:90376] [10] gmx_mpi[0x40fbcc]
[seskscpg009:90376] [11] gmx_mpi[0x41faae]
[seskscpg009:90377] [ 8] gmx_mpi[0x4204e2]
[seskscpg009:90377] [ 9] gmx_mpi[0x444cdb]
[seskscpg009:90377] [10] gmx_mpi[0x40fbcc]
[seskscpg009:90377] [11] /lib64/libc.so.6(_libc_start_main+0xf5)[0x7f6579baa445]
[seskscpg009:90376] [12] gmx_mpi[0x412dee]
/lib64/libc.so.6(_libc_start_main+0xf5)[0x7f888f18f445]
[seskscpg009:90377] [12] gmx_mpi[0x412dee]
[seskscpg009:90377] *** End of error message ***
[seskscpg009:90376] *** End of error message ***

mpirun noticed that process rank 1 with PID 0 on node seskscpg009 exited on signal 11 (Segmentation fault).

[seskscpg009.prim.scp:90335] 3 more processes have sent help message help-opal-shmem-mmap.txt / mmap on nfs
[seskscpg009.prim.scp:90335] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Could you advise what could be wrong in the installation? The PLUMED version is 2.5.1.

@GiovanniBussi (Member)

Can you try to:

  1. Run the same executable without PLUMED (e.g. mpirun -np 2 gmx_mpi mdrun -s topol.tpr -multi 2 -nsteps 500000).
  2. In case the latter still fails, recompile GROMACS alone and check whether it works.

I suspect there is a problem with the MPI library that is not related directly to PLUMED.
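
One quick sanity check, assuming Open MPI's standard command-line tools, is to confirm which MPI stack mpirun itself comes from; it should be the same OpenMPI installation (2.0.2-GCC-6.3.0-2.27 in the trace above) that gmx_mpi links against:

which mpirun
mpirun --version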

@ghost (Author) commented Jun 19, 2019

Thank you for the response.

The run without PLUMED works. Could you suggest a way to debug the issue?

@GiovanniBussi (Member)

Do you mean case 1 above or case 2?

If it is case 1, can you check whether GROMACS was built with real MPI or with thread-MPI?
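
One way to check, assuming the standard version banner: the gmx_mpi --version output includes an "MPI library" field, which should report MPI for a real-MPI build rather than thread_mpi.

gmx_mpi --version | grep -i "MPI library"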

@ghost (Author) commented Jun 19, 2019

I've run case 1, i.e. mpirun -np 2 gmx_mpi mdrun -s topol.tpr -multi 2 -nsteps 500000, on the same data.

@GiovanniBussi (Member)

Can you point to the script used for building GROMACS?

In addition, please also check what happens if you call mpirun -np 1 (that is, with a single process).

Thanks

@ghost (Author) commented Jun 19, 2019

The run with -np 1 runs well if you remove the -multi option.

Here is the EasyBuild script we use. The build happens inside a Singularity CentOS 7.5 container.

bootstrap_cmds = [
'yum --nogpgcheck -y install cuda-repo-rhel7-9-1-local.x86_64 && yum update && yum install -y cuda cuda-toolkit'
]

easyblock = "CMakeMake"

name = 'GROMACS'
version = '2018.6_GPU'
mainversion = '2018.6'
versionsuffix = '-mpi'

homepage = 'http://www.gromacs.org'
description = '''GROMACS is a versatile package to perform molecular dynamics,
i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles.'''

toolchain = {'name': 'foss', 'version': '2017a'}
toolchainopts = {'openmp': True, 'usempi': True}

source_urls = [
'file:///GROMACS/',
]

sources = [
'gromacs-%s.tar.gz' % (mainversion),
]

dependencies = [
('PLUMED', '2.5.1'),
('FFTW', '3.3.6', '', ('gompi', '2017a')),
('CUDA', '.9.1.85', '', ('dummy','')),
]

hiddendependencies = [
('CUDA', '9.1.85', '', ('dummy', '')),
]

builddependencies = [
('CMake', '3.7.2'),
('Boost', '1.63.0'),
('Python', '2.7.13')
]

preconfigopts = 'pwd && cd %(builddir)s/gromacs-2018.6 && plumed patch -p --runtime -e gromacs-2018.6 && cd /opt/easybuild/build/GROMACS/%(version)s/foss-2017a-mpi/easybuild_obj && '

separate_build_dir = True

config_list = [
'-DCMAKE_C_FLAGS="-m64 -O3 -march=broadwell -mtune=broadwell"',
'-DCMAKE_CXX_FLAGS="-m64 -O3 -march=broadwell -mtune=broadwell"',
'-DGMX_SIMD=AVX2_256',
'-DGMX_GPU=ON',
'-DGMX_MPI=ON',
'-DGMX_THREAD_MPI=OFF',
'-DGMX_USE_RDTSCP=ON',
'-DGMX_BUILD_OWN_FFTW=OFF',
'-DREGRESSIONTEST_DOWNLOAD=OFF',
'-DCMAKE_INSTALL_PREFIX=%(installdir)s',
'-DBUILD_SHARED_LIBS=OFF',
'-DGMX_PREFER_STATIC_LIBS=ON'
]

configopts = ' '.join(config_list)

buildopts_list = [
'CFLAGS="-m64 -O3 -mavx2 -march=broadwell -mtune=broadwell" CXXFLAGS="-m64 -O3 -march=broadwell -mtune=broadwell"',
'CPPFLAGS="$EBROOTFFTW/include" LDFLAGS="$LDFLAGS -L$EBROOTFFTW/lib -L$EBROOTOPENMPI/lib" CC=gcc',
'CMAKE_INSTALL_PREFIX=%(installdir)s'
]

buildopts = ' '.join(buildopts_list)

postinstallcmds = [
'rm -rf %(installdir)s/easybuild'
]

modextrapaths = {
'GMXBIN': 'bin',
'GMXLDLIB': 'lib64',
'GMXMAN': 'share/man',
'GMXDATA': 'share/gromacs'
}

modextravars = {
'GMXFONT': '10x20'
}

moduleclass = 'bio'
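
Since the patch is applied with plumed patch --runtime, the PLUMED kernel that GROMACS loads is selected at launch time through the PLUMED_KERNEL environment variable (set to the path shown in the log above). A minimal launch sketch with the same install paths:

# PLUMED_KERNEL is normally exported by the PLUMED module; path taken from the log above
export PLUMED_KERNEL=/opt/scp/software/PLUMED/2.5.1-foss-2017a/lib/libplumedKernel.so
mpirun -np 2 gmx_mpi mdrun -s topol.tpr -plumed plumed.dat -multi 2 -nsteps 500000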

@GiovanniBussi (Member)

Sorry, I am afraid I do not know how to help...

Did you manage to run other PLUMED or GROMACS versions on the same cluster?

Another thing you can check is whether PLUMED and GROMACS are linked against the same MPI library.

These two commands should report exactly the same file:

ldd $(which plumed)  | grep libmpi
ldd $(which gmx_mpi)  | grep libmpi
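
Since the kernel is loaded at runtime here, the same check can be applied to the kernel library itself, using the PLUMED_KERNEL path reported in the log:

ldd "$PLUMED_KERNEL" | grep libmpi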

@carlocamilloni (Member)

Closing because of no clear documentation.
