Discussion:
[Libmesh-users] parmetis troubles
Manav Bhatia
2015-03-13 04:52:03 UTC
Hi,

I am now running my code linked to libMesh that uses parmetis from petsc version 3.5.3. The specific version of parmetis is 4.0.2. My compiler is gcc 4.4.7 with openmpi-1.8.3.

Parmetis is quitting on me with segmentation faults. The code works just fine on other machines (both mac and some big intel based clusters). Before I spend more time debugging it, I wanted to get a quick check from others on whether they have had issues with parmetis.

Thanks,
Manav
Paul T. Bauman
2015-03-13 11:24:33 UTC
Post by Manav Bhatia
Parmetis is quitting on me with segmentation faults. The code works
just fine on other machines (both mac and some big intel based clusters).
Before I spend more time debugging it, I wanted to get a quick check from
others on whether they have had issues with parmetis.
Make sure you use the configure option --with-metis=PETSc

The libMesh and PETSc ParMetis installations can (and often do) collide and
that option will tell libMesh to use PETSc's.
Manav Bhatia
2015-03-13 16:52:23 UTC
Yes, I have been doing that.

I am getting an error from Parmetis saying:

The sum of tpwgts for constraint #0 is not 1.0

I searched Parmetis' website and did not find anything conclusive to point me in a direction. There was someone who did a reinstall of mpi to get around this, but that is hardly a solution in my case.

-Manav
Manav Bhatia
2015-03-13 17:19:38 UTC
Found it!!!

So, I did specify "PETSc" as the source of metis during compilation of
libmesh (as I have been doing successfully for a while on my mac). However,
for some reason, the compile options included the libmesh/contrib/metis and
libmesh/contrib/parmetis as search paths.

As a result, the compiler was using metis.h from libmesh/contrib, but linking
against the libmetis.so and libparmetis.so libraries from petsc. The metis.h in
libmesh/contrib has a real width of 32, and parmetis_partitioner uses "float" as
its scalar, while the petsc metis library was built with a real width of 64.

I do not know why contrib stuff is still being used, but this mismatch was
causing the error.
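
A minimal sketch of how such a width mismatch can produce the "sum of tpwgts is not 1.0" message, assuming 32-bit float and 64-bit double in IEEE 754 format. The function names make_tpwgts, sum_as_double and doubles_that_fit are hypothetical stand-ins for the caller and library sides; they are not libMesh or ParMetis functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == 4 && sizeof(double) == 8,
              "sketch assumes 32-bit float and 64-bit double");

// Caller side: compiled against a metis.h whose real type is a 32-bit float.
// Uniform target partition weights; as floats they sum to 1 for nparts = 4.
std::vector<float> make_tpwgts(std::size_t nparts)
{
  assert(nparts > 0);
  return std::vector<float>(nparts, 1.0f / static_cast<float>(nparts));
}

// Library side (hypothetical stand-in): compiled with a 64-bit real type, so
// it interprets the caller's bytes as doubles. memcpy keeps the
// reinterpretation well defined in this demo.
double sum_as_double(const void * bytes, std::size_t n_doubles)
{
  const char * p = static_cast<const char *>(bytes);
  double sum = 0.0;
  for (std::size_t i = 0; i < n_doubles; ++i)
    {
      double w;
      std::memcpy(&w, p + i * sizeof(double), sizeof(double));
      sum += w;
    }
  return sum;
}

// Number of whole doubles that fit in a buffer of n floats. A real library
// would try to read n doubles, which runs past the end of the caller's buffer.
std::size_t doubles_that_fit(std::size_t n_floats)
{
  return n_floats * sizeof(float) / sizeof(double);
}
```

Calling sum_as_double(w.data(), doubles_that_fit(w.size())) on w = make_tpwgts(4) gives a value far from 1.0, even though the four floats themselves sum to exactly 1. The sketch stays inside the buffer on purpose; a library expecting four doubles would read twice as many bytes as the caller allocated, which is consistent with the segmentation faults reported at the start of the thread.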
Paul T. Bauman
2015-03-13 17:24:24 UTC
Thanks for the report. We'll get it sorted soon. I'm about to leave for
SIAM CSE, as I anticipate other developers are, so it may be a few days.
Manav Bhatia
2015-03-13 17:27:35 UTC
not a problem...
enjoy your conference and have a safe trip!

-Manav
Manav Bhatia
2015-03-13 17:44:32 UTC
Is there a motivation to keep "scalar" as the real type in
parmetis_partitioner.h?

Why not change this to the real type in metis.h?

-Manav
Manav Bhatia
2015-03-13 17:54:01 UTC
Correction: I had meant to write "real as the scalar type" (instead of
"scalar as the real type") in my previous message.

-Manav
John Peterson
2015-03-13 18:18:57 UTC
Post by Manav Bhatia
Correction: I had meant to write "real as the scalar type" (instead of
"scalar as the real type") in my previous message.
Hmm... I don't see a single instance of Real in either metis_partitioner.h
or parmetis_partitioner.h. Perhaps you are referring to:

std::vector<float> _tpwgts;
std::vector<float> _ubvec;

which use float? I agree that should probably be changed, along with using
Metis' idx_t for indexing instead of hard-coding int.
--
John
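
A rough sketch of what sourcing the types from Metis could look like. In a real build real_t, idx_t, REALTYPEWIDTH and IDXTYPEWIDTH come from the metis.h of the installation being linked; they are re-declared here as stand-ins only so the example compiles on its own, and the widths chosen (64-bit reals, 32-bit indices) are assumptions. PartitionerBuffers and uniform_tpwgts are hypothetical names, not libMesh code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-ins for what metis.h provides (assumed values, see above).
#ifndef REALTYPEWIDTH
#  define REALTYPEWIDTH 64
#endif
#ifndef IDXTYPEWIDTH
#  define IDXTYPEWIDTH 32
#endif

#if REALTYPEWIDTH == 32
typedef float real_t;
#else
typedef double real_t;
#endif

#if IDXTYPEWIDTH == 32
typedef std::int32_t idx_t;
#else
typedef std::int64_t idx_t;
#endif

// Fail at compile time if the aliases and the width macros ever disagree.
static_assert(sizeof(real_t) * 8 == REALTYPEWIDTH, "real_t width mismatch");
static_assert(sizeof(idx_t) * 8 == IDXTYPEWIDTH, "idx_t width mismatch");

// Hypothetical partitioner state, typed by the Metis aliases rather than by
// hard-coded float and int.
struct PartitionerBuffers
{
  std::vector<real_t> tpwgts; // target partition weights
  std::vector<real_t> ubvec;  // imbalance tolerances
  std::vector<idx_t>  part;   // partition assigned to each element
};

// Uniform target weights for nparts partitions.
std::vector<real_t> uniform_tpwgts(idx_t nparts)
{
  assert(nparts > 0);
  return std::vector<real_t>(static_cast<std::size_t>(nparts),
                             real_t(1) / static_cast<real_t>(nparts));
}
```

With the buffers declared this way, a header that disagrees with the width macros is caught by the static_assert lines at compile time. A header that disagrees with the linked library is still not detected, so the include path issue described earlier in the thread has to be fixed separately.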
Manav Bhatia
2015-03-13 18:28:37 UTC
Thanks for the correction, John. I had meant to say "float".

At my end, I have temporarily changed all "floats" to "double" in the
metis/parmetis files in libmesh, but this is a temporary fix.

It would be better to source it from the respective headers. But that
raises another point: as of now, it seems like the metis.h in contrib has
the real width hardcoded to 32. I understand that this saves memory
space, but would there be a motivation to keep this scalar type
consistent with the rest of the library?

-Manav
John Peterson
2015-03-13 18:35:09 UTC
Post by Manav Bhatia
Thanks for the correction, John. I had meant to say "float".
At my end, I have temporarily changed all "floats" to "double" in the
metis/parmetis files in libmesh, but this is a temporary fix.
It would be better to source it from the respective headers. But that
raises another point: as of now, it seems like the metis.h in contrib has
the real width hardcoded to 32. I understand that this saves memory
space, but would there be a motivation to keep this scalar type
consistent with the rest of the library?
The floating point stuff is bad... converting double to float certainly has
loss of precision issues, but I'm also worried about the case when PETSc
and libmesh are configured with --with-64-bit-indices and
--with-dof-id-bytes=8, respectively, and we continue using int when passing
data through the libmesh/Metis-Parmetis interfaces.
--
John
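
One way to guard against the index concern above is a checked conversion at the interface, sketched here under stated assumptions: dof_id_type is taken to be an unsigned 64-bit integer (as with --with-dof-id-bytes=8) and idx_t a signed 32-bit integer. Both typedefs are stand-ins declared locally for illustration, and to_idx and to_idx_vector are hypothetical helpers, not existing libMesh functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

typedef std::uint64_t dof_id_type; // assumed: 8-byte ids on the libMesh side
typedef std::int32_t  idx_t;       // assumed: 32-bit indices on the Metis side

// Convert one id, throwing instead of silently truncating.
idx_t to_idx(dof_id_type id)
{
  if (id > static_cast<dof_id_type>(std::numeric_limits<idx_t>::max()))
    throw std::overflow_error("id does not fit in idx_t");
  return static_cast<idx_t>(id);
}

// Convert a whole array of ids before handing it to the partitioner.
std::vector<idx_t> to_idx_vector(const std::vector<dof_id_type> & ids)
{
  std::vector<idx_t> out;
  out.reserve(ids.size());
  for (std::size_t i = 0; i < ids.size(); ++i)
    out.push_back(to_idx(ids[i]));
  assert(out.size() == ids.size());
  return out;
}
```

For example, to_idx(42) returns 42, while to_idx(dof_id_type(1) << 40) throws std::overflow_error rather than passing a truncated index along. When both sides are built with matching 64-bit index types the check never fires, so it mainly documents the assumption at the boundary.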