Discussion:
[Libmesh-users] AMR speed
Rossi, Simone
2017-04-27 03:09:37 UTC
Dear Roy, dear Paul, dear all,
I am testing AMR in libMesh using simple linear elements.
My test case is a propagating front described by a reaction-diffusion equation with a cubic bistable reaction term.
I followed the adaptivity examples to create this test case.

The run times for 100 timesteps using AMR can be more than 10 times slower than when using a fine uniform grid.
For example, with a 16 x 16 x 16 uniform grid, 100 iterations take about 18 seconds with a single processor.
With AMR, using a 2 x 2 x 2 grid and 3 levels of refinement, 100 iterations take about 800 seconds.
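For scale, the two setups quoted above can be compared with a little arithmetic. The sketch below assumes isotropic h-refinement, where each refined cell is split into 2^dim children; the function name is only illustrative. Under that assumption, a 2 x 2 x 2 grid refined three times everywhere reaches the same cell count as the 16 x 16 x 16 uniform grid, so the AMR run should never have more cells than the uniform one.

```python
def max_active_cells(coarse_per_side: int, levels: int, dim: int = 3) -> int:
    """Upper bound on active cells if every cell is refined `levels` times."""
    coarse_cells = coarse_per_side ** dim
    children_per_cell = 2 ** dim  # assumption: isotropic refinement, 8 children in 3D
    return coarse_cells * children_per_cell ** levels


uniform_cells = 16 ** 3
amr_cells_upper_bound = max_active_cells(coarse_per_side=2, levels=3)
slowdown = 800 / 18  # run times quoted in the message, in seconds

print(uniform_cells, amr_cells_upper_bound, round(slowdown, 1))
```

With the quoted timings, the 3-level AMR run is roughly 44 times slower than the uniform run despite having at most the same number of cells.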

I’m attaching the code I’m using.
Without AMR, I build the matrix ( mass + dt * stiffness ) once and I update the rhs at every timestep.
Conversely, with AMR I am building the matrix and the rhs at every timestep for all the refinement levels.
Do you have any suggestions?

Thanks a lot for your help,
All the best,
Simone
Vikram Garg
2017-04-27 15:02:36 UTC
Hello Rossi,
Two questions:

1) Which error estimator/indicator are you using to mark elements for refinement?

2) Can you send the PerfLog output from libMesh? You might need to recompile libMesh with the option --enable-perflog.

It looks something like this:

 -----------------------------------------------------------------------------------------------------------------
| libMesh Performance: Alive time=0.013423, Active time=0.007095                                                  |
 -----------------------------------------------------------------------------------------------------------------
| Event                             nCalls     Total Time  Avg Time    Total Time  Avg Time    % of Active Time   |
|                                              w/o Sub     w/o Sub     With Sub    With Sub    w/o S    With S    |
|-----------------------------------------------------------------------------------------------------------------|
|                                                                                                                 |
| DofMap                                                                                                          |
|   add_neighbors_to_send_list()    6          0.0001      0.000012    0.0001      0.000012    1.01     1.01      |
|   build_sparsity()                6          0.0002      0.000033    0.0011      0.000187    2.78     15.84     |
|   create_dof_constraints()        6          0.0000      0.000001    0.0000      0.000001    0.07     0.07      |
|   distribute_dofs()               6          0.0001      0.000025    0.0004      0.000066    2.09     5.57      |
|   dof_indices()                   688        0.0010      0.000001    0.0010      0.000001    14.36    14.36     |
|   old_dof_indices()               300        0.0001      0.000000    0.0001      0.000000    0.96     0.96      |
|   prepare_send_list()             7          0.0000      0.000000    0.0000      0.000000    0.01     0.01      |
|   reinit()                        6          0.0002      0.000041    0.0002      0.000041    3.48     3.48      |
|                                                                                                                 |
| EquationSystems                                                                                                 |
|   build_solution_vector()         1          0.0001      0.000056    0.0001      0.000064    0.79     0.90      |


Thanks.
_______________________________________________
Libmesh-users mailing list
https://lists.sourceforge.net/lists/listinfo/libmesh-users
--
Vikram Garg
Postdoctoral Associate
The University of Texas at Austin

http://vikramvgarg.wordpress.com/
http://www.runforindia.org/runners/vikramg
Rossi, Simone
2017-04-27 15:54:39 UTC
Dear Vikram,
as in the examples, I am using the libMesh::KellyErrorEstimator.

I’m compiling libMesh with the --enable-perflog option. Does it automatically give all the details you have listed in the example?

For the time being, I am attaching two perfLogs I had saved with only “coarse scale” data for 2 levels of refinement.
It looks like most of the time is spent in the AMR step, probably in the call to reinit().

Thanks,
Simone

NO AMR:

------------------------------------------------------------------------------------------------------------
| perf_log Performance: Alive time=18.0494, Active time=18.0426 |
------------------------------------------------------------------------------------------------------------
| Event nCalls Total Time Avg Time Total Time Avg Time % of Active Time |
| w/o Sub w/o Sub With Sub With Sub w/o S With S |
|------------------------------------------------------------------------------------------------------------|
| no amr matrix assembly 1 0.1545 0.154465 0.1545 0.154465 0.86 0.86 |
| no amr linear solve 101 4.8069 0.047593 4.8069 0.047593 26.64 26.64 |
| no amr rhs assembly 101 12.0348 0.119156 12.0348 0.119156 66.70 66.70 |
| time loop 1 1.0464 1.046422 17.8884 17.888405 5.80 99.15 |
------------------------------------------------------------------------------------------------------------
| Totals: 204 18.0426 100.00 |
------------------------------------------------------------------------------------------------------------


AMR:

------------------------------------------------------------------------------------------------------------
| perf_log Performance: Alive time=209.305, Active time=209.298 |
------------------------------------------------------------------------------------------------------------
| Event nCalls Total Time Avg Time Total Time Avg Time % of Active Time |
| w/o Sub w/o Sub With Sub With Sub w/o S With S |
|------------------------------------------------------------------------------------------------------------|
| |
| amr 303 195.1102 0.643928 195.1102 0.643928 93.22 93.22 |
| amr solve 303 13.9907 0.046174 13.9907 0.046174 6.68 6.68 |
| time loop 1 0.1974 0.197370 209.2990 209.299042 0.09 100.00 |
------------------------------------------------------------------------------------------------------------
| Totals: 607 209.2983 100.00 |
------------------------------------------------------------------------------------------------------------​
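The two coarse logs above can be compared directly. The following sketch only re-derives the percentage columns from the "Total Time w/o Sub" values and computes the overall slowdown; the dictionary keys are the event names from the logs, and the helper name `shares` is illustrative.

```python
# Exclusive ("w/o Sub") total times in seconds, copied from the two logs.
no_amr = {
    "no amr matrix assembly": 0.1545,
    "no amr linear solve": 4.8069,
    "no amr rhs assembly": 12.0348,
    "time loop": 1.0464,
}
amr = {
    "amr": 195.1102,
    "amr solve": 13.9907,
    "time loop": 0.1974,
}


def shares(events):
    """Percentage of the summed exclusive time spent in each event."""
    total = sum(events.values())
    return {name: round(100 * t / total, 2) for name, t in events.items()}


slowdown = sum(amr.values()) / sum(no_amr.values())

print(shares(no_amr))
print(shares(amr))
print(round(slowdown, 1))
```

The average time per solve is about the same in both runs (0.0476 s versus 0.0462 s per call), so the factor of roughly 11.6 comes almost entirely from the event labelled "amr" rather than from the linear solver.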


Vikram Garg
2017-04-27 16:14:05 UTC
Rossi, yes, compiling with perflog should give you all the details as in the example.
Rossi, Simone
2017-04-27 18:29:33 UTC
OK, I ran the tests again with different max_h_levels and with the perflog enabled.
Let me know if you see anything here.
Thanks,
Simone

NO AMR:
-----------------------------------------------------------------------------------------------------------------
| libMesh Performance: Alive time=77.5482, Active time=40.2976 |
-----------------------------------------------------------------------------------------------------------------
| Event nCalls Total Time Avg Time Total Time Avg Time % of Active Time |
| w/o Sub w/o Sub With Sub With Sub w/o S With S |
|-----------------------------------------------------------------------------------------------------------------|
| |
| |
| DefaultCoupling |
| operator() 98306 0.1609 0.000002 0.1609 0.000002 0.40 0.40 |
| |
| DofMap |
| add_neighbors_to_send_list() 1 0.0959 0.095930 0.3744 0.374369 0.24 0.93 |
| build_sparsity() 1 0.4701 0.470055 1.1433 1.143297 1.17 2.84 |
| create_dof_constraints() 1 0.0137 0.013673 0.0137 0.013673 0.03 0.03 |
| distribute_dofs() 1 0.0126 0.012578 0.4376 0.437647 0.03 1.09 |
| dof_indices() 11010048 9.9728 0.000001 9.9728 0.000001 24.75 24.75 |
| prepare_send_list() 2 0.0000 0.000002 0.0000 0.000002 0.00 0.00 |
| reinit() 1 0.0507 0.050692 0.0507 0.050692 0.13 0.13 |
| |
| EquationSystems |
| build_parallel_solution_vector() 5 1.4241 0.284811 2.4934 0.498673 3.53 6.19 |
| build_solution_vector() 5 0.0002 0.000050 2.4936 0.498724 0.00 6.19 |
| |
| ExodusII_IO |
| write_nodal_data() 3 0.0774 0.025816 0.0774 0.025816 0.19 0.19 |
| |
| FE |
| compute_shape_functions() 10027008 11.7027 0.000001 11.7027 0.000001 29.04 29.04 |
| init_shape_functions() 102 0.0007 0.000007 0.0007 0.000007 0.00 0.00 |
| |
| FEMap |
| compute_affine_map() 10027008 9.9328 0.000001 9.9328 0.000001 24.65 24.65 |
| init_reference_to_physical_map() 102 0.0008 0.000008 0.0008 0.000008 0.00 0.00 |
| |
| GMVIO |
| write_nodal_data() 2 0.2260 0.113020 0.2260 0.113020 0.56 0.56 |
| |
| GenericProjector |
| operator() 1 0.8425 0.842529 2.0842 2.084232 2.09 5.17 |
| project_edges 98304 0.0765 0.000001 0.0765 0.000001 0.19 0.19 |
| project_interior 98304 0.0765 0.000001 0.0765 0.000001 0.19 0.19 |
| project_nodes 98304 0.0865 0.000001 0.0865 0.000001 0.21 0.21 |
| project_sides 98304 0.0763 0.000001 0.0763 0.000001 0.19 0.19 |
| |
| Mesh |
| find_neighbors() 1 0.1105 0.110532 0.1105 0.110532 0.27 0.27 |
| renumber_nodes_and_elem() 2 0.0063 0.003125 0.0063 0.003125 0.02 0.02 |
| |
| MeshOutput |
| write_equation_systems() 5 0.0001 0.000021 2.7972 0.559445 0.00 6.94 |
| |
| MeshTools::Generation |
| build_cube() 1 0.0280 0.027995 0.0280 0.027995 0.07 0.07 |
| |
| Parallel |
| allgather() 1 0.0000 0.000003 0.0000 0.000003 0.00 0.00 |
| |
| Partitioner |
| single_partition() 1 0.0028 0.002767 0.0028 0.002767 0.01 0.01 |
| |
| PetscLinearSolver |
| solve() 101 4.8469 0.047989 4.8469 0.047989 12.03 12.03 |
| |
| System |
| project_fem_vector() 1 0.0034 0.003364 2.0876 2.087598 0.01 5.18 |
| project_vector(FunctionBase) 1 0.0000 0.000011 2.0876 2.087610 0.00 5.18 |
-----------------------------------------------------------------------------------------------------------------
| Totals: 3.156e+07 40.2976 100.00 |
-----------------------------------------------------------------------------------------------------------------
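Logs of this size are easier to read once the rows are ranked by exclusive time. The sketch below is a small, hypothetical helper (not part of libMesh) that parses rows in the layout shown above by splitting each line from the right, so event names containing spaces survive; the sample is a handful of rows copied from the NO AMR log.

```python
def parse_perflog(text):
    """Return (event, n_calls, exclusive_time, exclusive_percent) per timing row."""
    rows = []
    for line in text.splitlines():
        body = line.strip().strip("|").strip()
        parts = body.rsplit(None, 7)  # split from the right: names may contain spaces
        if len(parts) != 8:
            continue  # rulers, class headings, blank rows, the Totals row
        name, *fields = parts
        try:
            n_calls = int(fields[0])
            numbers = [float(f) for f in fields[1:]]
        except ValueError:
            continue  # the two column-header rows
        # numbers = [total w/o sub, avg w/o sub, total with sub, avg with sub, % w/o, % with]
        rows.append((name, n_calls, numbers[0], numbers[4]))
    return rows


sample = """\
| DofMap |
| build_sparsity() 1 0.4701 0.470055 1.1433 1.143297 1.17 2.84 |
| dof_indices() 11010048 9.9728 0.000001 9.9728 0.000001 24.75 24.75 |
| FE |
| compute_shape_functions() 10027008 11.7027 0.000001 11.7027 0.000001 29.04 29.04 |
| FEMap |
| compute_affine_map() 10027008 9.9328 0.000001 9.9328 0.000001 24.65 24.65 |
| PetscLinearSolver |
| solve() 101 4.8469 0.047989 4.8469 0.047989 12.03 12.03 |
| Totals: 3.156e+07 40.2976 100.00 |
"""

top = sorted(parse_perflog(sample), key=lambda row: row[2], reverse=True)[:3]
for name, n_calls, seconds, percent in top:
    print(f"{name:30s} {n_calls:>10d} {seconds:8.4f} s {percent:6.2f} %")
```

In the NO AMR run the three per-element routines compute_shape_functions(), dof_indices() and compute_affine_map() together account for about 78% of the active time, compared with about 12% for the linear solve.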






AMR: 1 refinement
-----------------------------------------------------------------------------------------------------------------
| libMesh Performance: Alive time=395.981, Active time=261.811 |
-----------------------------------------------------------------------------------------------------------------
| Event nCalls Total Time Avg Time Total Time Avg Time % of Active Time |
| w/o Sub w/o Sub With Sub With Sub w/o S With S |
|-----------------------------------------------------------------------------------------------------------------|
| |
| |
| DefaultCoupling |
| operator() 1336320 2.0806 0.000002 2.0806 0.000002 0.79 0.79 |
| |
| DofMap |
| add_neighbors_to_send_list() 102 1.2626 0.012378 4.8311 0.047363 0.48 1.85 |
| build_sparsity() 102 6.5962 0.064669 15.1863 0.148885 2.52 5.80 |
| create_dof_constraints() 102 0.1384 0.001356 0.2351 0.002305 0.05 0.09 |
| distribute_dofs() 102 0.1489 0.001459 5.6797 0.055684 0.06 2.17 |
| dof_indices() 22510266 19.3897 0.000001 19.3897 0.000001 7.41 7.41 |
| enforce_constraints_exactly() 303 0.1463 0.000483 0.1463 0.000483 0.06 0.06 |
| old_dof_indices() 11914452 11.0468 0.000001 11.0468 0.000001 4.22 4.22 |
| prepare_send_list() 103 0.0001 0.000001 0.0001 0.000001 0.00 0.00 |
| reinit() 102 0.6993 0.006856 0.6993 0.006856 0.27 0.27 |
| |
| EquationSystems |
| build_parallel_solution_vector() 5 0.1832 0.036644 0.3127 0.062538 0.07 0.12 |
| build_solution_vector() 5 0.0001 0.000018 0.3128 0.062557 0.00 0.12 |
| |
| ExodusII_IO |
| write_nodal_data() 3 0.0094 0.003131 0.0094 0.003131 0.00 0.00 |
| |
| FE |
| compute_shape_functions() 12975978 16.6602 0.000001 16.6602 0.000001 6.36 6.36 |
| init_shape_functions() 10329700 16.6365 0.000002 16.6365 0.000002 6.35 6.35 |
| inverse_map() 10386411 11.3644 0.000001 11.3644 0.000001 4.34 4.34 |
| |
| FEMap |
| compute_affine_map() 12975978 13.4041 0.000001 13.4041 0.000001 5.12 5.12 |
| compute_face_map() 7691859 8.9240 0.000001 8.9240 0.000001 3.41 3.41 |
| init_face_shape_functions() 101 0.0004 0.000004 0.0004 0.000004 0.00 0.00 |
| init_reference_to_physical_map() 10329700 11.4379 0.000001 11.4379 0.000001 4.37 4.37 |
| |
| GMVIO |
| write_nodal_data() 2 0.0979 0.048947 0.0979 0.048947 0.04 0.04 |
| |
| GenericProjector |
| copy_dofs 3917556 15.7713 0.000004 59.2081 0.000015 6.02 22.61 |
| operator() 304 11.6914 0.038458 95.5809 0.314411 4.47 36.51 |
| project_edges 66216 0.0489 0.000001 0.0489 0.000001 0.02 0.02 |
| project_interior 66216 0.0493 0.000001 0.0493 0.000001 0.02 0.02 |
| project_nodes 66216 0.2561 0.000004 3.4858 0.000053 0.10 1.33 |
| project_sides 66216 0.0498 0.000001 0.0498 0.000001 0.02 0.02 |
| |
| JumpErrorEstimator |
| estimate_error() 101 73.8216 0.730907 231.1510 2.288624 28.20 88.29 |
| |
| Mesh |
| contract() 101 0.0296 0.000293 0.0581 0.000575 0.01 0.02 |
| find_neighbors() 101 1.4534 0.014391 1.4534 0.014391 0.56 0.56 |
| renumber_nodes_and_elem() 303 0.0847 0.000280 0.0847 0.000280 0.03 0.03 |
| |
| MeshOutput |
| write_equation_systems() 5 0.0001 0.000017 0.4202 0.084033 0.00 0.16 |
| |
| MeshRefinement |
| _coarsen_elements() 202 0.0812 0.000402 0.0812 0.000402 0.03 0.03 |
| _refine_elements() 202 0.1485 0.000735 0.2795 0.001383 0.06 0.11 |
| add_node() 64512 0.0546 0.000001 0.0546 0.000001 0.02 0.02 |
| make_coarsening_compatible() 204 0.3018 0.001479 0.3018 0.001479 0.12 0.12 |
| make_flags_parallel_consistent() 303 0.2300 0.000759 0.2300 0.000759 0.09 0.09 |
| make_refinement_compatible() 204 0.0242 0.000119 0.0242 0.000119 0.01 0.01 |
| |
| MeshTools::Generation |
| build_cube() 1 0.0039 0.003937 0.0039 0.003937 0.00 0.00 |
| |
| OldSolutionValue |
| Number eval_at_node() 215712 0.2301 0.000001 2.9735 0.000014 0.09 1.14 |
| check_old_context(c) 3917556 10.9141 0.000003 27.5061 0.000007 4.17 10.51 |
| check_old_context(c,p) 68724 0.1726 0.000003 0.4012 0.000006 0.07 0.15 |
| eval_at_point() 68724 0.8513 0.000012 2.6627 0.000039 0.33 1.02 |
| eval_old_dofs() 3917556 6.6409 0.000002 38.7818 0.000010 2.54 14.81 |
| |
| Parallel |
| allgather() 102 0.0001 0.000001 0.0001 0.000001 0.00 0.00 |
| |
| Partitioner |
| single_partition() 101 0.0341 0.000338 0.0341 0.000338 0.01 0.01 |
| |
| PetscLinearSolver |
| solve() 202 1.6660 0.008248 1.6660 0.008248 0.64 0.64 |
| |
| StatisticsVector |
| maximum() 101 0.0018 0.000017 0.0018 0.000017 0.00 0.00 |
| |
| System |
| assemble() 202 11.5849 0.057351 28.7372 0.142263 4.42 10.98 |
| project_fem_vector() 1 0.0004 0.000417 0.2583 0.258341 0.00 0.10 |
| project_vector(FunctionBase) 1 0.0000 0.000008 0.2584 0.258351 0.00 0.10 |
| project_vector(old,new) 303 5.2799 0.017425 109.1696 0.360296 2.02 41.70 |
| |
| TopologyMap |
| init() 202 0.1071 0.000530 0.1071 0.000530 0.04 0.04 |
-----------------------------------------------------------------------------------------------------------------
| Totals: 1.129e+08 261.8108 100.00 |
-----------------------------------------------------------------------------------------------------------------





AMR: 2 refinements
-----------------------------------------------------------------------------------------------------------------
| libMesh Performance: Alive time=156.79, Active time=103.985 |
-----------------------------------------------------------------------------------------------------------------
| Event nCalls Total Time Avg Time Total Time Avg Time % of Active Time |
| w/o Sub w/o Sub With Sub With Sub w/o S With S |
|-----------------------------------------------------------------------------------------------------------------|
| |
| |
| DefaultCoupling |
| operator() 487585 0.7671 0.000002 0.7671 0.000002 0.74 0.74 |
| |
| DofMap |
| add_neighbors_to_send_list() 203 0.4861 0.002394 1.8338 0.009034 0.47 1.76 |
| build_sparsity() 203 2.8815 0.014194 6.2119 0.030601 2.77 5.97 |
| create_dof_constraints() 203 0.2105 0.001037 0.4801 0.002365 0.20 0.46 |
| distribute_dofs() 203 0.0596 0.000294 2.1454 0.010569 0.06 2.06 |
| dof_indices() 8055927 7.4875 0.000001 7.4875 0.000001 7.20 7.20 |
| enforce_constraints_exactly() 606 0.3674 0.000606 0.3674 0.000606 0.35 0.35 |
| old_dof_indices() 4358601 4.2132 0.000001 4.2132 0.000001 4.05 4.05 |
| prepare_send_list() 204 0.0002 0.000001 0.0002 0.000001 0.00 0.00 |
| reinit() 203 0.2510 0.001237 0.2510 0.001237 0.24 0.24 |
| |
| EquationSystems |
| build_parallel_solution_vector() 5 0.0316 0.006312 0.0543 0.010852 0.03 0.05 |
| build_solution_vector() 5 0.0001 0.000014 0.0543 0.010868 0.00 0.05 |
| |
| ExodusII_IO |
| write_nodal_data() 3 0.0024 0.000816 0.0024 0.000816 0.00 0.00 |
| |
| FE |
| compute_shape_functions() 4507581 6.1953 0.000001 6.1953 0.000001 5.96 5.96 |
| init_shape_functions() 3783756 6.6310 0.000002 6.6310 0.000002 6.38 6.38 |
| inverse_map() 3875385 4.5491 0.000001 4.5491 0.000001 4.37 4.37 |
| |
| FEMap |
| compute_affine_map() 4507581 5.2201 0.000001 5.2201 0.000001 5.02 5.02 |
| compute_face_map() 2763882 3.5520 0.000001 3.5520 0.000001 3.42 3.42 |
| init_face_shape_functions() 202 0.0007 0.000004 0.0007 0.000004 0.00 0.00 |
| init_reference_to_physical_map() 3783756 4.6286 0.000001 4.6286 0.000001 4.45 4.45 |
| |
| GMVIO |
| write_nodal_data() 2 0.1665 0.083237 0.1665 0.083237 0.16 0.16 |
| |
| GenericProjector |
| copy_dofs 1361385 5.6580 0.000004 21.6490 0.000016 5.44 20.82 |
| operator() 607 5.0012 0.008239 40.4516 0.066642 4.81 38.90 |
| project_edges 97080 0.0766 0.000001 0.0766 0.000001 0.07 0.07 |
| project_interior 97080 0.0751 0.000001 0.0751 0.000001 0.07 0.07 |
| project_nodes 97080 0.4693 0.000005 5.0553 0.000052 0.45 4.86 |
| project_sides 97080 0.0770 0.000001 0.0770 0.000001 0.07 0.07 |
| |
| JumpErrorEstimator |
| estimate_error() 202 28.7106 0.142132 89.7093 0.444106 27.61 86.27 |
| |
| Mesh |
| contract() 202 0.0160 0.000079 0.0280 0.000139 0.02 0.03 |
| find_neighbors() 203 0.5978 0.002945 0.5978 0.002945 0.57 0.57 |
| renumber_nodes_and_elem() 608 0.0350 0.000058 0.0350 0.000058 0.03 0.03 |
| |
| MeshOutput |
| write_equation_systems() 5 0.0001 0.000013 0.2233 0.044669 0.00 0.21 |
| |
| MeshRefinement |
| _coarsen_elements() 404 0.0378 0.000094 0.0378 0.000094 0.04 0.04 |
| _refine_elements() 404 0.1563 0.000387 0.4010 0.000993 0.15 0.39 |
| add_node() 113664 0.1007 0.000001 0.1007 0.000001 0.10 0.10 |
| make_coarsening_compatible() 407 0.1988 0.000489 0.1988 0.000489 0.19 0.19 |
| make_flags_parallel_consistent() 606 0.0937 0.000155 0.0937 0.000155 0.09 0.09 |
| make_refinement_compatible() 407 0.0102 0.000025 0.0102 0.000025 0.01 0.01 |
| |
| MeshTools::Generation |
| build_cube() 1 0.0007 0.000677 0.0007 0.000677 0.00 0.00 |
| |
| OldSolutionValue |
| Number eval_at_node() 382176 0.3948 0.000001 4.1027 0.000011 0.38 3.95 |
| check_old_context(c) 1361385 4.0214 0.000003 10.1149 0.000007 3.87 9.73 |
| check_old_context(c,p) 85266 0.2420 0.000003 0.5679 0.000007 0.23 0.55 |
| eval_at_point() 85266 1.1648 0.000014 3.5999 0.000042 1.12 3.46 |
| eval_old_dofs() 1361385 2.4347 0.000002 14.2695 0.000010 2.34 13.72 |
| |
| Parallel |
| allgather() 203 0.0002 0.000001 0.0002 0.000001 0.00 0.00 |
| |
| Partitioner |
| single_partition() 203 0.0140 0.000069 0.0140 0.000069 0.01 0.01 |
| |
| PetscLinearSolver |
| solve() 303 0.7612 0.002512 0.7612 0.002512 0.73 0.73 |
| |
| StatisticsVector |
| maximum() 202 0.0008 0.000004 0.0008 0.000004 0.00 0.00 |
| |
| System |
| assemble() 303 3.4738 0.011465 8.5615 0.028256 3.34 8.23 |
| project_fem_vector() 1 0.0001 0.000142 0.0331 0.033134 0.00 0.03 |
| project_vector(FunctionBase) 1 0.0000 0.000009 0.0331 0.033144 0.00 0.03 |
| project_vector(old,new) 606 2.3051 0.003804 46.4861 0.076710 2.22 44.70 |
| |
| TopologyMap |
| init() 404 0.1561 0.000386 0.1561 0.000386 0.15 0.15 |
-----------------------------------------------------------------------------------------------------------------
| Totals: 4.127e+07 103.9851 100.00 |
-----------------------------------------------------------------------------------------------------------------
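The nCalls column of the two AMR logs above shows how much extra work each time step carries. The sketch below divides selected call counts by the number of time steps; the value 101 is an assumption, taken from the 101 solves in the NO AMR log and the 101 estimate_error() calls in the 1-refinement log.

```python
N_STEPS = 101  # assumption: every run performs the same 101 time steps

# nCalls copied from the two AMR logs above.
calls = {
    "AMR: 1 refinement": {
        "estimate_error()": 101,
        "assemble()": 202,
        "solve()": 202,
        "project_vector(old,new)": 303,
    },
    "AMR: 2 refinements": {
        "estimate_error()": 202,
        "assemble()": 303,
        "solve()": 303,
        "project_vector(old,new)": 606,
    },
}


def calls_per_step(run):
    """Average number of calls of each event per time step."""
    return {event: n / N_STEPS for event, n in run.items()}


for label, run in calls.items():
    print(label, calls_per_step(run))
```

Under that assumption, each time step with r refinement passes performs r error estimates, r + 1 assemblies and solves, and 3r old-to-new projections, so the per-step cost grows with the number of passes even when the mesh itself stays small.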





AMR: 3 refinements
-----------------------------------------------------------------------------------------------------------------
| libMesh Performance: Alive time=455.466, Active time=308.123 |
-----------------------------------------------------------------------------------------------------------------
| Event nCalls Total Time Avg Time Total Time Avg Time % of Active Time |
| w/o Sub w/o Sub With Sub With Sub w/o S With S |
|-----------------------------------------------------------------------------------------------------------------|
| |
| |
| DefaultCoupling |
| operator() 1153034 1.9001 0.000002 1.9001 0.000002 0.62 0.62 |
| |
| DofMap |
| add_neighbors_to_send_list() 304 1.1509 0.003786 4.4251 0.014556 0.37 1.44 |
| build_sparsity() 304 8.8682 0.029172 16.6976 0.054926 2.88 5.42 |
| create_dof_constraints() 304 1.6469 0.005417 3.9951 0.013142 0.53 1.30 |
| distribute_dofs() 304 0.2173 0.000715 5.4146 0.017811 0.07 1.76 |
| dof_indices() 19916934 18.6126 0.000001 18.6126 0.000001 6.04 6.04 |
| enforce_constraints_exactly() 909 2.8034 0.003084 2.8034 0.003084 0.91 0.91 |
| old_dof_indices() 10268793 10.0124 0.000001 10.0124 0.000001 3.25 3.25 |
| prepare_send_list() 305 0.0003 0.000001 0.0003 0.000001 0.00 0.00 |
| reinit() 304 0.7707 0.002535 0.7707 0.002535 0.25 0.25 |
| |
| EquationSystems |
| build_parallel_solution_vector() 5 0.0498 0.009954 0.0849 0.016974 0.02 0.03 |
| build_solution_vector() 5 0.0001 0.000015 0.0850 0.016991 0.00 0.03 |
| |
| ExodusII_IO |
| write_nodal_data() 3 0.0016 0.000526 0.0016 0.000526 0.00 0.00 |
| |
| FE |
| compute_shape_functions() 12087258 16.7562 0.000001 16.7562 0.000001 5.44 5.44 |
| init_shape_functions() 10555340 23.3502 0.000002 23.3502 0.000002 7.58 7.58 |
| inverse_map() 11670851 13.6081 0.000001 13.6081 0.000001 4.42 4.42 |
| |
| FEMap |
| compute_affine_map() 12087258 15.1613 0.000001 15.1613 0.000001 4.92 4.92 |
| compute_face_map() 6822171 8.8288 0.000001 8.8288 0.000001 2.87 2.87 |
| init_face_shape_functions() 303 0.0011 0.000004 0.0011 0.000004 0.00 0.00 |
| init_reference_to_physical_map() 10555340 14.9343 0.000001 14.9343 0.000001 4.85 4.85 |
| |
| GMVIO |
| write_nodal_data() 2 0.0676 0.033816 0.0676 0.033816 0.02 0.02 |
| |
| GenericProjector |
| copy_dofs 2157561 8.8513 0.000004 33.9505 0.000016 2.87 11.02 |
| operator() 910 18.4194 0.020241 155.7304 0.171132 5.98 50.54 |
| project_edges 1299333 1.0235 0.000001 1.0235 0.000001 0.33 0.33 |
| project_interior 1299333 1.0026 0.000001 1.0026 0.000001 0.33 0.33 |
| project_nodes 1299333 6.3258 0.000005 76.0383 0.000059 2.05 24.68 |
| project_sides 1299333 1.0258 0.000001 1.0258 0.000001 0.33 0.33 |
| |
| JumpErrorEstimator |
| estimate_error() 303 71.4588 0.235838 222.8668 0.735534 23.19 72.33 |
| |
| Mesh |
| contract() 303 0.0998 0.000329 0.1462 0.000483 0.03 0.05 |
| find_neighbors() 304 2.2488 0.007397 2.2488 0.007397 0.73 0.73 |
| renumber_nodes_and_elem() 911 0.1348 0.000148 0.1348 0.000148 0.04 0.04 |
| |
| MeshOutput |
| write_equation_systems() 5 0.0001 0.000013 0.1542 0.030848 0.00 0.05 |
| |
| MeshRefinement |
| _coarsen_elements() 606 0.1621 0.000268 0.1621 0.000268 0.05 0.05 |
| _refine_elements() 606 1.6498 0.002722 4.9647 0.008193 0.54 1.61 |
| add_node() 1542432 1.3647 0.000001 1.3647 0.000001 0.44 0.44 |
| make_coarsening_compatible() 809 1.4420 0.001782 1.4420 0.001782 0.47 0.47 |
| make_flags_parallel_consistent() 909 0.2881 0.000317 0.2881 0.000317 0.09 0.09 |
| make_refinement_compatible() 809 0.0552 0.000068 0.0552 0.000068 0.02 0.02 |
| |
| MeshTools::Generation |
| build_cube() 1 0.0002 0.000230 0.0002 0.000230 0.00 0.00 |
| |
| OldSolutionValue |
| Number eval_at_node() 5196564 5.4931 0.000001 63.1554 0.000012 1.78 20.50 |
| check_old_context(c) 2157561 6.2716 0.000003 15.8724 0.000007 2.04 5.15 |
| check_old_context(c,p) 1343484 3.6784 0.000003 8.6255 0.000006 1.19 2.80 |
| eval_at_point() 1343484 18.1202 0.000013 55.9662 0.000042 5.88 18.16 |
| eval_old_dofs() 2157561 3.8284 0.000002 22.3994 0.000010 1.24 7.27 |
| |
| Parallel |
| allgather() 304 0.0003 0.000001 0.0003 0.000001 0.00 0.00 |
| |
| Partitioner |
| single_partition() 304 0.0450 0.000148 0.0450 0.000148 0.01 0.01 |
| |
| PetscLinearSolver |
| solve() 404 1.5022 0.003718 1.5022 0.003718 0.49 0.49 |
| |
| StatisticsVector |
| maximum() 303 0.0019 0.000006 0.0019 0.000006 0.00 0.00 |
| |
| System |
| assemble() 404 7.4765 0.018506 18.1484 0.044922 2.43 5.89 |
| project_fem_vector() 1 0.0001 0.000109 0.0045 0.004474 0.00 0.00 |
| project_vector(FunctionBase) 1 0.0000 0.000010 0.0045 0.004485 0.00 0.00 |
| project_vector(old,new) 909 6.4352 0.007079 174.8106 0.192311 2.09 56.73 |
| |
| TopologyMap |
| init() 606 0.9755 0.001610 0.9755 0.001610 0.32 0.32 |
-----------------------------------------------------------------------------------------------------------------
| Totals: 1.162e+08 308.1230 100.00 |
-----------------------------------------------------------------------------------------------------------------


On Apr 27, 2017, at 12:14, Vikram Garg <***@gmail.com<mailto:***@gmail.com>> wrote:

Rossi, yes, compiling with perflog should give you all the details, as in the example.





On Thu, Apr 27, 2017 at 10:54 AM, Rossi, Simone <***@email.unc.edu<mailto:***@email.unc.edu>> wrote:
Dear Vikram,
as in the examples, I am using the libMesh::KellyErrorEstimator.

I’m compiling libmesh with the --enable-perflog option. Does it automatically give all the details you have listed in the example?

For the time being, I am attaching two perfLogs I had saved with only “coarse scale” data for 2 levels of refinement.
It looks like most of the time is spent in the AMR step, probably in the call to reinit().

Thanks,
Simone

NO AMR:

------------------------------------------------------------------------------------------------------
| perf_log Performance: Alive time=18.0494, Active time=18.0426                                      |
------------------------------------------------------------------------------------------------------
| Event                     nCalls  Total Time  Avg Time    Total Time  Avg Time    % of Active Time |
|                                   w/o Sub     w/o Sub     With Sub    With Sub    w/o S   With S   |
|----------------------------------------------------------------------------------------------------|
| no amr matrix assembly    1       0.1545      0.154465    0.1545      0.154465    0.86    0.86     |
| no amr linear solve       101     4.8069      0.047593    4.8069      0.047593    26.64   26.64    |
| no amr rhs assembly       101     12.0348     0.119156    12.0348     0.119156    66.70   66.70    |
| time loop                 1       1.0464      1.046422    17.8884     17.888405   5.80    99.15    |
------------------------------------------------------------------------------------------------------
| Totals:                   204     18.0426                                         100.00           |
------------------------------------------------------------------------------------------------------


AMR:

------------------------------------------------------------------------------------------------------
| perf_log Performance: Alive time=209.305, Active time=209.298                                      |
------------------------------------------------------------------------------------------------------
| Event                     nCalls  Total Time  Avg Time    Total Time  Avg Time    % of Active Time |
|                                   w/o Sub     w/o Sub     With Sub    With Sub    w/o S   With S   |
|----------------------------------------------------------------------------------------------------|
|                                                                                                    |
| amr                       303     195.1102    0.643928    195.1102    0.643928    93.22   93.22    |
| amr solve                 303     13.9907     0.046174    13.9907     0.046174    6.68    6.68     |
| time loop                 1       0.1974      0.197370    209.2990    209.299042  0.09    100.00   |
------------------------------------------------------------------------------------------------------
| Totals:                   607     209.2983                                        100.00           |
------------------------------------------------------------------------------------------------------
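
As a rough check on the two coarse logs above, the Python sketch below recomputes the overall slowdown and the average cost per call from the reported totals. The variable names are illustrative only, and the figures are copied from the logs, not measured again. Treating the "NO AMR" log as the uniform-grid run is an assumption based on its roughly 18 second total.

```python
# Figures copied from the two coarse perf_log tables above.
# Assumption: the "NO AMR" log is the uniform-grid run (about 18 s).
no_amr_active = 18.0426   # seconds of active time without AMR
amr_active = 209.298      # seconds of active time with AMR

amr_total, amr_calls = 195.1102, 303      # "amr" event
solve_total, solve_calls = 13.9907, 303   # "amr solve" event

slowdown = amr_active / no_amr_active       # overall ratio of the two runs
amr_share = amr_total / amr_active          # fraction spent in the "amr" event
per_call_amr = amr_total / amr_calls        # seconds per "amr" call
per_call_solve = solve_total / solve_calls  # seconds per "amr solve" call

print(f"AMR run is {slowdown:.1f}x slower than the run without AMR")
print(f"'amr' event share of active time: {100 * amr_share:.1f}%")
print(f"average cost per 'amr' call: {per_call_amr:.3f} s")
print(f"average cost per 'amr solve' call: {per_call_solve:.3f} s")
```

The per-call figures match the "Avg Time" columns of the log, which suggests that the linear solve itself costs about the same with and without AMR and that the extra time sits in the adaptivity step.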


Vikram Garg
2017-04-27 20:37:09 UTC
Permalink
Hello Rossi,
It seems it is the projection functions that are
computationally expensive. Would it be possible for you to run with the
PatchRecovery estimator and see whether that gives similar performance?

Thanks.
--
Vikram Garg
Postdoctoral Associate
The University of Texas at Austin

http://vikramvgarg.wordpress.com/
http://www.runforindia.org/runners/vikramg
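
One way to see which events dominate a libMesh performance log like the ones in this thread is to rank the rows by their inclusive ("With Sub") share. The Python sketch below is an illustration only: `parse_perflog` and `is_number` are hypothetical helpers, not part of libMesh, and the sample rows are copied from the JumpErrorEstimator log earlier in the thread. It assumes each data row ends with the seven numeric columns shown in the log header.

```python
# A few rows copied from the JumpErrorEstimator log earlier in this thread.
SAMPLE = """\
| JumpErrorEstimator |
| estimate_error() 303 71.4588 0.235838 222.8668 0.735534 23.19 72.33 |
| OldSolutionValue |
| Number eval_at_node() 5196564 5.4931 0.000001 63.1554 0.000012 1.78 20.50 |
| PetscLinearSolver |
| solve() 404 1.5022 0.003718 1.5022 0.003718 0.49 0.49 |
| System |
| assemble() 404 7.4765 0.018506 18.1484 0.044922 2.43 5.89 |
| project_vector(old,new) 909 6.4352 0.007079 174.8106 0.192311 2.09 56.73 |
"""


def is_number(token):
    """Return True if the token parses as a float."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_perflog(text):
    """Return (section, event, ncalls, total_with_sub, pct_with_sub) tuples."""
    rows, section = [], None
    for line in text.splitlines():
        tokens = line.strip().strip("|").split()
        if not tokens:
            continue
        # Data rows: event name followed by seven numeric columns.
        if len(tokens) >= 8 and all(is_number(t) for t in tokens[-7:]):
            name = " ".join(tokens[:-7])
            ncalls = int(float(tokens[-7]))
            total_with_sub = float(tokens[-4])
            pct_with_sub = float(tokens[-1])
            rows.append((section, name, ncalls, total_with_sub, pct_with_sub))
        else:
            # Anything else is treated as a section header such as "System".
            section = " ".join(tokens)
    return rows


rows = parse_perflog(SAMPLE)
for section, name, ncalls, total, pct in sorted(rows, key=lambda r: r[4], reverse=True):
    print(f"{pct:6.2f}%  {total:9.4f} s  {section}::{name}")
```

The "With Sub" percentages are inclusive, so nested events overlap and the column does not sum to 100.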
Roy Stogner
2017-04-28 13:49:25 UTC
Permalink
Post by Vikram Garg
It seems it is the projection functions that are computationally
expensive.
I'm not sure if this is the entire issue, but Vikram's almost
certainly right that this is the main issue.

I have a couple ideas for possible optimizations here; I'll see if I
can get a PR together next week.
---
Roy
Rossi, Simone
2017-04-28 14:56:37 UTC
Permalink
Dear Vikram,
I switched to the PatchRecoveryErrorEstimator.
The AMR simulations are faster than before, but still much slower than the uniform mesh case.
Most of the time is still spent in the projections.
Let me know if you have any suggestions.
Thanks a lot for your help,
All the best,
Simone


AMR 1 refinement
-----------------------------------------------------------------------------------------------------------------
| libMesh Performance: Alive time=230.416, Active time=153.727 |
-----------------------------------------------------------------------------------------------------------------
| Event nCalls Total Time Avg Time Total Time Avg Time % of Active Time |
| w/o Sub w/o Sub With Sub With Sub w/o S With S |
|-----------------------------------------------------------------------------------------------------------------|
| |
| |
| DefaultCoupling |
| operator() 1308971 2.0650 0.000002 2.0650 0.000002 1.34 1.34 |
| |
| DofMap |
| add_neighbors_to_send_list() 102 1.2799 0.012548 4.8744 0.047788 0.83 3.17 |
| build_sparsity() 102 6.6201 0.064903 15.3341 0.150334 4.31 9.97 |
| create_dof_constraints() 102 0.1584 0.001553 0.2895 0.002838 0.10 0.19 |
| distribute_dofs() 102 0.1486 0.001457 5.6969 0.055852 0.10 3.71 |
| dof_indices() 19634123 17.7785 0.000001 17.7785 0.000001 11.56 11.56 |
| enforce_constraints_exactly() 303 0.1809 0.000597 0.1809 0.000597 0.12 0.12 |
| old_dof_indices() 11672412 11.0857 0.000001 11.0857 0.000001 7.21 7.21 |
| prepare_send_list() 103 0.0001 0.000001 0.0001 0.000001 0.00 0.00 |
| reinit() 102 0.6735 0.006603 0.6735 0.006603 0.44 0.44 |
| |
| EquationSystems |
| build_parallel_solution_vector() 5 0.1823 0.036460 0.3190 0.063797 0.12 0.21 |
| build_solution_vector() 5 0.0001 0.000020 0.3191 0.063818 0.00 0.21 |
| |
| ExodusII_IO |
| write_nodal_data() 3 0.0096 0.003193 0.0096 0.003193 0.01 0.01 |
| |
| FE |
| compute_shape_functions() 7892622 7.8140 0.000001 7.8140 0.000001 5.08 5.08 |
| init_shape_functions() 238324 1.0645 0.000004 1.0645 0.000004 0.69 0.69 |
| inverse_map() 168619 0.2099 0.000001 0.2099 0.000001 0.14 0.14 |
| |
| FEMap |
| compute_affine_map() 7892622 8.1976 0.000001 8.1976 0.000001 5.33 5.33 |
| init_reference_to_physical_map() 238324 0.6433 0.000003 0.6433 0.000003 0.42 0.42 |
| |
| GMVIO |
| write_nodal_data() 2 0.1544 0.077185 0.1544 0.077185 0.10 0.10 |
| |
| GenericProjector |
| copy_dofs 3813348 15.6784 0.000004 59.6722 0.000016 10.20 38.82 |
| operator() 304 11.9653 0.039359 98.8752 0.325247 7.78 64.32 |
| project_edges 88377 0.0687 0.000001 0.0687 0.000001 0.04 0.04 |
| project_interior 88377 0.0678 0.000001 0.0678 0.000001 0.04 0.04 |
| project_nodes 88377 0.3843 0.000004 5.5123 0.000062 0.25 3.59 |
| project_sides 88377 0.0691 0.000001 0.0691 0.000001 0.04 0.04 |
| |
| Mesh |
| contract() 101 0.0302 0.000299 0.0582 0.000576 0.02 0.04 |
| find_neighbors() 102 1.4291 0.014010 1.4291 0.014010 0.93 0.93 |
| renumber_nodes_and_elem() 305 0.0836 0.000274 0.0836 0.000274 0.05 0.05 |
| |
| MeshOutput |
| write_equation_systems() 5 0.0001 0.000015 0.4831 0.096625 0.00 0.31 |
| |
| MeshRefinement |
| _coarsen_elements() 202 0.0808 0.000400 0.0808 0.000400 0.05 0.05 |
| _refine_elements() 202 0.1724 0.000854 0.3641 0.001803 0.11 0.24 |
| add_node() 90496 0.0793 0.000001 0.0793 0.000001 0.05 0.05 |
| make_coarsening_compatible() 270 0.3739 0.001385 0.3739 0.001385 0.24 0.24 |
| make_flags_parallel_consistent() 303 0.2229 0.000736 0.2229 0.000736 0.14 0.14 |
| make_refinement_compatible() 270 0.0388 0.000144 0.0388 0.000144 0.03 0.03 |
| |
| MeshTools::Generation |
| build_cube() 1 0.0045 0.004484 0.0045 0.004484 0.00 0.00 |
| |
| OldSolutionValue |
| Number eval_at_node() 304356 0.3435 0.000001 4.7509 0.000016 0.22 3.09 |
| check_old_context(c) 3813348 11.2585 0.000003 27.9372 0.000007 7.32 18.17 |
| check_old_context(c,p) 103944 0.2723 0.000003 0.6305 0.000006 0.18 0.41 |
| eval_at_point() 103944 1.4186 0.000014 4.2789 0.000041 0.92 2.78 |
| eval_old_dofs() 3813348 6.6561 0.000002 39.2936 0.000010 4.33 25.56 |
| |
| Parallel |
| allgather() 102 0.0001 0.000001 0.0001 0.000001 0.00 0.00 |
| |
| Partitioner |
| single_partition() 102 0.0321 0.000314 0.0321 0.000314 0.02 0.02 |
| |
| PatchRecoveryErrorEstimator |
| estimate_error() 101 25.9830 0.257257 61.0740 0.604693 16.90 39.73 |
| |
| PetscLinearSolver |
| solve() 202 1.6808 0.008321 1.6808 0.008321 1.09 1.09 |
| |
| StatisticsVector |
| maximum() 101 0.0018 0.000017 0.0018 0.000017 0.00 0.00 |
| |
| System |
| assemble() 202 11.6496 0.057671 29.2803 0.144952 7.58 19.05 |
| project_fem_vector() 1 0.0004 0.000423 0.2601 0.260093 0.00 0.17 |
| project_vector(FunctionBase) 1 0.0000 0.000009 0.2601 0.260103 0.00 0.17 |
| project_vector(old,new) 303 5.3281 0.017585 112.7034 0.371958 3.47 73.31 |
| |
| TopologyMap |
| init() 202 0.0871 0.000431 0.0871 0.000431 0.06 0.06 |
-----------------------------------------------------------------------------------------------------------------
| Totals: 6.145e+07 153.7275 100.00 |
-----------------------------------------------------------------------------------------------------------------




AMR 2 refinements
-----------------------------------------------------------------------------------------------------------------
| libMesh Performance: Alive time=84.948, Active time=56.8324 |
-----------------------------------------------------------------------------------------------------------------
| Event nCalls Total Time Avg Time Total Time Avg Time % of Active Time |
| w/o Sub w/o Sub With Sub With Sub w/o S With S |
|-----------------------------------------------------------------------------------------------------------------|
| |
| |
| DefaultCoupling |
| operator() 476364 0.7400 0.000002 0.7400 0.000002 1.30 1.30 |
| |
| DofMap |
| add_neighbors_to_send_list() 203 0.4752 0.002341 1.7849 0.008793 0.84 3.14 |
| build_sparsity() 203 2.8012 0.013799 6.0101 0.029606 4.93 10.58 |
| create_dof_constraints() 203 0.2178 0.001073 0.4982 0.002454 0.38 0.88 |
| distribute_dofs() 203 0.0647 0.000319 2.1029 0.010359 0.11 3.70 |
| dof_indices() 6846785 6.2077 0.000001 6.2077 0.000001 10.92 10.92 |
| enforce_constraints_exactly() 606 0.3776 0.000623 0.3776 0.000623 0.66 0.66 |
| old_dof_indices() 4266276 4.0893 0.000001 4.0893 0.000001 7.20 7.20 |
| prepare_send_list() 204 0.0002 0.000001 0.0002 0.000001 0.00 0.00 |
| reinit() 203 0.2524 0.001243 0.2524 0.001243 0.44 0.44 |
| |
| EquationSystems |
| build_parallel_solution_vector() 5 0.0295 0.005894 0.0509 0.010187 0.05 0.09 |
| build_solution_vector() 5 0.0001 0.000014 0.0510 0.010202 0.00 0.09 |
| |
| ExodusII_IO |
| write_nodal_data() 3 0.0025 0.000837 0.0025 0.000837 0.00 0.00 |
| |
| FE |
| compute_shape_functions() 2537219 2.4867 0.000001 2.4867 0.000001 4.38 4.38 |
| init_shape_functions() 132687 0.6792 0.000005 0.6792 0.000005 1.20 1.20 |
| inverse_map() 149560 0.1817 0.000001 0.1817 0.000001 0.32 0.32 |
| |
| FEMap |
| compute_affine_map() 2537219 2.6492 0.000001 2.6492 0.000001 4.66 4.66 |
| init_reference_to_physical_map() 132687 0.3793 0.000003 0.3793 0.000003 0.67 0.67 |
| |
| GMVIO |
| write_nodal_data() 2 0.0492 0.024577 0.0492 0.024577 0.09 0.09 |
| |
| GenericProjector |
| copy_dofs 1341762 5.4628 0.000004 20.9821 0.000016 9.61 36.92 |
| operator() 607 4.6860 0.007720 38.8120 0.063941 8.25 68.29 |
| project_edges 83040 0.0645 0.000001 0.0645 0.000001 0.11 0.11 |
| project_interior 83040 0.0644 0.000001 0.0644 0.000001 0.11 0.11 |
| project_nodes 83040 0.3899 0.000005 4.8299 0.000058 0.69 8.50 |
| project_sides 83040 0.0655 0.000001 0.0655 0.000001 0.12 0.12 |
| |
| Mesh |
| contract() 202 0.0157 0.000078 0.0286 0.000142 0.03 0.05 |
| find_neighbors() 203 0.5877 0.002895 0.5877 0.002895 1.03 1.03 |
| renumber_nodes_and_elem() 608 0.0372 0.000061 0.0372 0.000061 0.07 0.07 |
| |
| MeshOutput |
| write_equation_systems() 5 0.0001 0.000014 0.1028 0.020552 0.00 0.18 |
| |
| MeshRefinement |
| _coarsen_elements() 404 0.0362 0.000090 0.0362 0.000090 0.06 0.06 |
| _refine_elements() 404 0.1353 0.000335 0.3436 0.000851 0.24 0.60 |
| add_node() 96928 0.0864 0.000001 0.0864 0.000001 0.15 0.15 |
| make_coarsening_compatible() 494 0.2403 0.000486 0.2403 0.000486 0.42 0.42 |
| make_flags_parallel_consistent() 606 0.0939 0.000155 0.0939 0.000155 0.17 0.17 |
| make_refinement_compatible() 494 0.0151 0.000031 0.0151 0.000031 0.03 0.03 |
| |
| MeshTools::Generation |
| build_cube() 1 0.0007 0.000688 0.0007 0.000688 0.00 0.00 |
| |
| OldSolutionValue |
| Number eval_at_node() 326016 0.3474 0.000001 4.0303 0.000012 0.61 7.09 |
| check_old_context(c) 1341762 3.8804 0.000003 9.8097 0.000007 6.83 17.26 |
| check_old_context(c,p) 88194 0.2335 0.000003 0.5516 0.000006 0.41 0.97 |
| eval_at_point() 88194 1.1315 0.000013 3.5721 0.000041 1.99 6.29 |
| eval_old_dofs() 1341762 2.3692 0.000002 13.8493 0.000010 4.17 24.37 |
| |
| Parallel |
| allgather() 203 0.0002 0.000001 0.0002 0.000001 0.00 0.00 |
| |
| Partitioner |
| single_partition() 203 0.0133 0.000066 0.0133 0.000066 0.02 0.02 |
| |
| PatchRecoveryErrorEstimator |
| estimate_error() 202 8.7239 0.043187 20.4969 0.101470 15.35 36.07 |
| |
| PetscLinearSolver |
| solve() 303 0.7635 0.002520 0.7635 0.002520 1.34 1.34 |
| |
| StatisticsVector |
| maximum() 202 0.0008 0.000004 0.0008 0.000004 0.00 0.00 |
| |
| System |
| assemble() 303 3.3086 0.010919 8.1195 0.026797 5.82 14.29 |
| project_fem_vector() 1 0.0002 0.000171 0.0334 0.033433 0.00 0.06 |
| project_vector(FunctionBase) 1 0.0000 0.000010 0.0334 0.033443 0.00 0.06 |
| project_vector(old,new) 606 2.2412 0.003698 44.6599 0.073696 3.94 78.58 |
| |
| TopologyMap |
| init() 404 0.1538 0.000381 0.1538 0.000381 0.27 0.27 |
-----------------------------------------------------------------------------------------------------------------
| Totals: 2.204e+07 56.8324 100.00 |
-----------------------------------------------------------------------------------------------------------------






AMR 3 refinements
-----------------------------------------------------------------------------------------------------------------
| libMesh Performance: Alive time=238.81, Active time=167.585 |
-----------------------------------------------------------------------------------------------------------------
| Event nCalls Total Time Avg Time Total Time Avg Time % of Active Time |
| w/o Sub w/o Sub With Sub With Sub w/o S With S |
|-----------------------------------------------------------------------------------------------------------------|
| |
| |
| DefaultCoupling |
| operator() 842150 1.3950 0.000002 1.3950 0.000002 0.83 0.83 |
| |
| DofMap |
| add_neighbors_to_send_list() 304 0.8206 0.002699 3.1651 0.010411 0.49 1.89 |
| build_sparsity() 304 6.9767 0.022950 12.4466 0.040943 4.16 7.43 |
| create_dof_constraints() 304 1.7488 0.005753 4.3237 0.014223 1.04 2.58 |
| distribute_dofs() 304 0.1829 0.000602 3.9777 0.013084 0.11 2.37 |
| dof_indices() 12860957 11.3931 0.000001 11.3931 0.000001 6.80 6.80 |
| enforce_constraints_exactly() 909 3.1564 0.003472 3.1564 0.003472 1.88 1.88 |
| old_dof_indices() 7578465 7.1611 0.000001 7.1611 0.000001 4.27 4.27 |
| prepare_send_list() 305 0.0003 0.000001 0.0003 0.000001 0.00 0.00 |
| reinit() 304 0.6284 0.002067 0.6284 0.002067 0.37 0.37 |
| |
| EquationSystems |
| build_parallel_solution_vector() 5 0.0252 0.005032 0.0418 0.008360 0.02 0.02 |
| build_solution_vector() 5 0.0001 0.000015 0.0419 0.008375 0.00 0.02 |
| |
| ExodusII_IO |
| write_nodal_data() 3 0.0015 0.000506 0.0015 0.000506 0.00 0.00 |
| |
| FE |
| compute_shape_functions() 5859019 5.9806 0.000001 5.9806 0.000001 3.57 3.57 |
| init_shape_functions() 1723435 10.2193 0.000006 10.2193 0.000006 6.10 6.10 |
| inverse_map() 2650340 3.1708 0.000001 3.1708 0.000001 1.89 1.89 |
| |
| FEMap |
| compute_affine_map() 5859019 7.7749 0.000001 7.7749 0.000001 4.64 4.64 |
| init_reference_to_physical_map() 1723435 5.3966 0.000003 5.3966 0.000003 3.22 3.22 |
| |
| GMVIO |
| write_nodal_data() 2 0.0801 0.040046 0.0801 0.040046 0.05 0.05 |
| |
| GenericProjector |
| copy_dofs 1136565 4.5570 0.000004 17.2482 0.000015 2.72 10.29 |
| operator() 910 15.3913 0.016914 141.3689 0.155350 9.18 84.36 |
| project_edges 1387677 1.0267 0.000001 1.0267 0.000001 0.61 0.61 |
| project_interior 1387677 1.0240 0.000001 1.0240 0.000001 0.61 0.61 |
| project_nodes 1387677 6.4217 0.000005 87.2413 0.000063 3.83 52.06 |
| project_sides 1387677 1.0431 0.000001 1.0431 0.000001 0.62 0.62 |
| |
| Mesh |
| contract() 303 0.1038 0.000343 0.1467 0.000484 0.06 0.09 |
| find_neighbors() 304 1.8576 0.006111 1.8576 0.006111 1.11 1.11 |
| renumber_nodes_and_elem() 911 0.1198 0.000131 0.1198 0.000131 0.07 0.07 |
| |
| MeshOutput |
| write_equation_systems() 5 0.0001 0.000016 0.1236 0.024716 0.00 0.07 |
| |
| MeshRefinement |
| _coarsen_elements() 606 0.1436 0.000237 0.1436 0.000237 0.09 0.09 |
| _refine_elements() 606 1.6756 0.002765 5.0523 0.008337 1.00 3.01 |
| add_node() 1645216 1.4108 0.000001 1.4108 0.000001 0.84 0.84 |
| make_coarsening_compatible() 890 1.2224 0.001374 1.2224 0.001374 0.73 0.73 |
| make_flags_parallel_consistent() 909 0.2219 0.000244 0.2219 0.000244 0.13 0.13 |
| make_refinement_compatible() 890 0.0720 0.000081 0.0720 0.000081 0.04 0.04 |
| |
| MeshTools::Generation |
| build_cube() 1 0.0002 0.000212 0.0002 0.000212 0.00 0.00 |
| |
| OldSolutionValue |
| Number eval_at_node() 5549940 5.7961 0.000001 74.1854 0.000013 3.46 44.27 |
| check_old_context(c) 1136565 3.1938 0.000003 8.0410 0.000007 1.91 4.80 |
| check_old_context(c,p) 1654752 4.2185 0.000003 9.8454 0.000006 2.52 5.87 |
| eval_at_point() 1654752 21.8614 0.000013 66.4162 0.000040 13.04 39.63 |
| eval_old_dofs() 1136565 1.9376 0.000002 11.3337 0.000010 1.16 6.76 |
| |
| Parallel |
| allgather() 304 0.0002 0.000001 0.0002 0.000001 0.00 0.00 |
| |
| Partitioner |
| single_partition() 304 0.0375 0.000123 0.0375 0.000123 0.02 0.02 |
| |
| PatchRecoveryErrorEstimator |
| estimate_error() 303 15.3540 0.050673 35.4803 0.117097 9.16 21.17 |
| |
| PetscLinearSolver |
| solve() 404 1.3966 0.003457 1.3966 0.003457 0.83 0.83 |
| |
| StatisticsVector |
| maximum() 303 0.0015 0.000005 0.0015 0.000005 0.00 0.00 |
| |
| System |
| assemble() 404 5.2823 0.013075 12.6676 0.031355 3.15 7.56 |
| project_fem_vector() 1 0.0001 0.000102 0.0044 0.004428 0.00 0.00 |
| project_vector(FunctionBase) 1 0.0000 0.000010 0.0044 0.004438 0.00 0.00 |
| project_vector(old,new) 909 5.3573 0.005894 157.5095 0.173278 3.20 93.99 |
| |
| TopologyMap |
| init() 606 0.7440 0.001228 0.7440 0.001228 0.44 0.44 |
-----------------------------------------------------------------------------------------------------------------
| Totals: 5.857e+07 167.5848 100.00 |
-----------------------------------------------------------------------------------------------------------------
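
To put the three PatchRecoveryErrorEstimator logs above side by side, the Python sketch below recomputes the share of active time spent in `project_vector(old,new)` at each refinement level. It also compares the 3-level run with the uniform-grid "NO AMR" libMesh log quoted from the earlier message. All figures are copied from the logs, and the dictionary layout is only an illustrative choice.

```python
# (active time [s], project_vector(old,new) "With Sub" total [s]) per refinement level,
# copied from the three PatchRecoveryErrorEstimator logs above.
patch_recovery = {
    1: (153.727, 112.7034),
    2: (56.8324, 44.6599),
    3: (167.585, 157.5095),
}

# Active time of the "NO AMR" libMesh log quoted from the earlier message.
no_amr_active = 40.2976

# Share of active time spent projecting the old solution onto the new mesh.
shares = {
    levels: 100 * projection / active
    for levels, (active, projection) in patch_recovery.items()
}

for levels, (active, projection) in patch_recovery.items():
    print(f"{levels} level(s): projection {projection:.1f} s "
          f"of {active:.1f} s ({shares[levels]:.1f}%)")

# Ratio of the 3-level AMR run to the uniform-grid run.
slowdown_3 = patch_recovery[3][0] / no_amr_active
print(f"3-level AMR vs. no AMR: {slowdown_3:.1f}x slower")
```

The recomputed shares agree with the "% With S" column for `project_vector(old,new)` in each log, so the projection step remains the dominant cost at every level.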


On Apr 27, 2017, at 14:29, Rossi, Simone <***@email.unc.edu<mailto:***@email.unc.edu>> wrote:

OK, I ran the tests again with different max_h_levels and with the perflog enabled.
Let me know if you see anything here.
Thanks,
Simone

NO AMR
-----------------------------------------------------------------------------------------------------------------
| libMesh Performance: Alive time=77.5482, Active time=40.2976 |
-----------------------------------------------------------------------------------------------------------------
| Event nCalls Total Time Avg Time Total Time Avg Time % of Active Time |
| w/o Sub w/o Sub With Sub With Sub w/o S With S |
|-----------------------------------------------------------------------------------------------------------------|
| |
| |
| DefaultCoupling |
| operator() 98306 0.1609 0.000002 0.1609 0.000002 0.40 0.40 |
| |
| DofMap |
| add_neighbors_to_send_list() 1 0.0959 0.095930 0.3744 0.374369 0.24 0.93 |
| build_sparsity() 1 0.4701 0.470055 1.1433 1.143297 1.17 2.84 |
| create_dof_constraints() 1 0.0137 0.013673 0.0137 0.013673 0.03 0.03 |
| distribute_dofs() 1 0.0126 0.012578 0.4376 0.437647 0.03 1.09 |
| dof_indices() 11010048 9.9728 0.000001 9.9728 0.000001 24.75 24.75 |
| prepare_send_list() 2 0.0000 0.000002 0.0000 0.000002 0.00 0.00 |
| reinit() 1 0.0507 0.050692 0.0507 0.050692 0.13 0.13 |
| |
| EquationSystems |
| build_parallel_solution_vector() 5 1.4241 0.284811 2.4934 0.498673 3.53 6.19 |
| build_solution_vector() 5 0.0002 0.000050 2.4936 0.498724 0.00 6.19 |
| |
| ExodusII_IO |
| write_nodal_data() 3 0.0774 0.025816 0.0774 0.025816 0.19 0.19 |
| |
| FE |
| compute_shape_functions() 10027008 11.7027 0.000001 11.7027 0.000001 29.04 29.04 |
| init_shape_functions() 102 0.0007 0.000007 0.0007 0.000007 0.00 0.00 |
| |
| FEMap |
| compute_affine_map() 10027008 9.9328 0.000001 9.9328 0.000001 24.65 24.65 |
| init_reference_to_physical_map() 102 0.0008 0.000008 0.0008 0.000008 0.00 0.00 |
| |
| GMVIO |
| write_nodal_data() 2 0.2260 0.113020 0.2260 0.113020 0.56 0.56 |
| |
| GenericProjector |
| operator() 1 0.8425 0.842529 2.0842 2.084232 2.09 5.17 |
| project_edges 98304 0.0765 0.000001 0.0765 0.000001 0.19 0.19 |
| project_interior 98304 0.0765 0.000001 0.0765 0.000001 0.19 0.19 |
| project_nodes 98304 0.0865 0.000001 0.0865 0.000001 0.21 0.21 |
| project_sides 98304 0.0763 0.000001 0.0763 0.000001 0.19 0.19 |
| |
| Mesh |
| find_neighbors() 1 0.1105 0.110532 0.1105 0.110532 0.27 0.27 |
| renumber_nodes_and_elem() 2 0.0063 0.003125 0.0063 0.003125 0.02 0.02 |
| |
| MeshOutput |
| write_equation_systems() 5 0.0001 0.000021 2.7972 0.559445 0.00 6.94 |
| |
| MeshTools::Generation |
| build_cube() 1 0.0280 0.027995 0.0280 0.027995 0.07 0.07 |
| |
| Parallel |
| allgather() 1 0.0000 0.000003 0.0000 0.000003 0.00 0.00 |
| |
| Partitioner |
| single_partition() 1 0.0028 0.002767 0.0028 0.002767 0.01 0.01 |
| |
| PetscLinearSolver |
| solve() 101 4.8469 0.047989 4.8469 0.047989 12.03 12.03 |
| |
| System |
| project_fem_vector() 1 0.0034 0.003364 2.0876 2.087598 0.01 5.18 |
| project_vector(FunctionBase) 1 0.0000 0.000011 2.0876 2.087610 0.00 5.18 |
-----------------------------------------------------------------------------------------------------------------
| Totals: 3.156e+07 40.2976 100.00 |
-----------------------------------------------------------------------------------------------------------------






AMR: 1 refinement
-----------------------------------------------------------------------------------------------------------------
| libMesh Performance: Alive time=395.981, Active time=261.811 |
-----------------------------------------------------------------------------------------------------------------
| Event nCalls Total Time Avg Time Total Time Avg Time % of Active Time |
| w/o Sub w/o Sub With Sub With Sub w/o S With S |
|-----------------------------------------------------------------------------------------------------------------|
| |
| |
| DefaultCoupling |
| operator() 1336320 2.0806 0.000002 2.0806 0.000002 0.79 0.79 |
| |
| DofMap |
| add_neighbors_to_send_list() 102 1.2626 0.012378 4.8311 0.047363 0.48 1.85 |
| build_sparsity() 102 6.5962 0.064669 15.1863 0.148885 2.52 5.80 |
| create_dof_constraints() 102 0.1384 0.001356 0.2351 0.002305 0.05 0.09 |
| distribute_dofs() 102 0.1489 0.001459 5.6797 0.055684 0.06 2.17 |
| dof_indices() 22510266 19.3897 0.000001 19.3897 0.000001 7.41 7.41 |
| enforce_constraints_exactly() 303 0.1463 0.000483 0.1463 0.000483 0.06 0.06 |
| old_dof_indices() 11914452 11.0468 0.000001 11.0468 0.000001 4.22 4.22 |
| prepare_send_list() 103 0.0001 0.000001 0.0001 0.000001 0.00 0.00 |
| reinit() 102 0.6993 0.006856 0.6993 0.006856 0.27 0.27 |
| |
| EquationSystems |
| build_parallel_solution_vector() 5 0.1832 0.036644 0.3127 0.062538 0.07 0.12 |
| build_solution_vector() 5 0.0001 0.000018 0.3128 0.062557 0.00 0.12 |
| |
| ExodusII_IO |
| write_nodal_data() 3 0.0094 0.003131 0.0094 0.003131 0.00 0.00 |
| |
| FE |
| compute_shape_functions() 12975978 16.6602 0.000001 16.6602 0.000001 6.36 6.36 |
| init_shape_functions() 10329700 16.6365 0.000002 16.6365 0.000002 6.35 6.35 |
| inverse_map() 10386411 11.3644 0.000001 11.3644 0.000001 4.34 4.34 |
| |
| FEMap |
| compute_affine_map() 12975978 13.4041 0.000001 13.4041 0.000001 5.12 5.12 |
| compute_face_map() 7691859 8.9240 0.000001 8.9240 0.000001 3.41 3.41 |
| init_face_shape_functions() 101 0.0004 0.000004 0.0004 0.000004 0.00 0.00 |
| init_reference_to_physical_map() 10329700 11.4379 0.000001 11.4379 0.000001 4.37 4.37 |
| |
| GMVIO |
| write_nodal_data() 2 0.0979 0.048947 0.0979 0.048947 0.04 0.04 |
| |
| GenericProjector |
| copy_dofs 3917556 15.7713 0.000004 59.2081 0.000015 6.02 22.61 |
| operator() 304 11.6914 0.038458 95.5809 0.314411 4.47 36.51 |
| project_edges 66216 0.0489 0.000001 0.0489 0.000001 0.02 0.02 |
| project_interior 66216 0.0493 0.000001 0.0493 0.000001 0.02 0.02 |
| project_nodes 66216 0.2561 0.000004 3.4858 0.000053 0.10 1.33 |
| project_sides 66216 0.0498 0.000001 0.0498 0.000001 0.02 0.02 |
| |
| JumpErrorEstimator |
| estimate_error() 101 73.8216 0.730907 231.1510 2.288624 28.20 88.29 |
| |
| Mesh |
| contract() 101 0.0296 0.000293 0.0581 0.000575 0.01 0.02 |
| find_neighbors() 101 1.4534 0.014391 1.4534 0.014391 0.56 0.56 |
| renumber_nodes_and_elem() 303 0.0847 0.000280 0.0847 0.000280 0.03 0.03 |
| |
| MeshOutput |
| write_equation_systems() 5 0.0001 0.000017 0.4202 0.084033 0.00 0.16 |
| |
| MeshRefinement |
| _coarsen_elements() 202 0.0812 0.000402 0.0812 0.000402 0.03 0.03 |
| _refine_elements() 202 0.1485 0.000735 0.2795 0.001383 0.06 0.11 |
| add_node() 64512 0.0546 0.000001 0.0546 0.000001 0.02 0.02 |
| make_coarsening_compatible() 204 0.3018 0.001479 0.3018 0.001479 0.12 0.12 |
| make_flags_parallel_consistent() 303 0.2300 0.000759 0.2300 0.000759 0.09 0.09 |
| make_refinement_compatible() 204 0.0242 0.000119 0.0242 0.000119 0.01 0.01 |
| |
| MeshTools::Generation |
| build_cube() 1 0.0039 0.003937 0.0039 0.003937 0.00 0.00 |
| |
| OldSolutionValue |
| Number eval_at_node() 215712 0.2301 0.000001 2.9735 0.000014 0.09 1.14 |
| check_old_context(c) 3917556 10.9141 0.000003 27.5061 0.000007 4.17 10.51 |
| check_old_context(c,p) 68724 0.1726 0.000003 0.4012 0.000006 0.07 0.15 |
| eval_at_point() 68724 0.8513 0.000012 2.6627 0.000039 0.33 1.02 |
| eval_old_dofs() 3917556 6.6409 0.000002 38.7818 0.000010 2.54 14.81 |
| |
| Parallel |
| allgather() 102 0.0001 0.000001 0.0001 0.000001 0.00 0.00 |
| |
| Partitioner |
| single_partition() 101 0.0341 0.000338 0.0341 0.000338 0.01 0.01 |
| |
| PetscLinearSolver |
| solve() 202 1.6660 0.008248 1.6660 0.008248 0.64 0.64 |
| |
| StatisticsVector |
| maximum() 101 0.0018 0.000017 0.0018 0.000017 0.00 0.00 |
| |
| System |
| assemble() 202 11.5849 0.057351 28.7372 0.142263 4.42 10.98 |
| project_fem_vector() 1 0.0004 0.000417 0.2583 0.258341 0.00 0.10 |
| project_vector(FunctionBase) 1 0.0000 0.000008 0.2584 0.258351 0.00 0.10 |
| project_vector(old,new) 303 5.2799 0.017425 109.1696 0.360296 2.02 41.70 |
| |
| TopologyMap |
| init() 202 0.1071 0.000530 0.1071 0.000530 0.04 0.04 |
-----------------------------------------------------------------------------------------------------------------
| Totals: 1.129e+08 261.8108 100.00 |
-----------------------------------------------------------------------------------------------------------------





AMR 2 refinements
-----------------------------------------------------------------------------------------------------------------
| libMesh Performance: Alive time=156.79, Active time=103.985 |
-----------------------------------------------------------------------------------------------------------------
| Event nCalls Total Time Avg Time Total Time Avg Time % of Active Time |
| w/o Sub w/o Sub With Sub With Sub w/o S With S |
|-----------------------------------------------------------------------------------------------------------------|
| |
| |
| DefaultCoupling |
| operator() 487585 0.7671 0.000002 0.7671 0.000002 0.74 0.74 |
| |
| DofMap |
| add_neighbors_to_send_list() 203 0.4861 0.002394 1.8338 0.009034 0.47 1.76 |
| build_sparsity() 203 2.8815 0.014194 6.2119 0.030601 2.77 5.97 |
| create_dof_constraints() 203 0.2105 0.001037 0.4801 0.002365 0.20 0.46 |
| distribute_dofs() 203 0.0596 0.000294 2.1454 0.010569 0.06 2.06 |
| dof_indices() 8055927 7.4875 0.000001 7.4875 0.000001 7.20 7.20 |
| enforce_constraints_exactly() 606 0.3674 0.000606 0.3674 0.000606 0.35 0.35 |
| old_dof_indices() 4358601 4.2132 0.000001 4.2132 0.000001 4.05 4.05 |
| prepare_send_list() 204 0.0002 0.000001 0.0002 0.000001 0.00 0.00 |
| reinit() 203 0.2510 0.001237 0.2510 0.001237 0.24 0.24 |
| |
| EquationSystems |
| build_parallel_solution_vector() 5 0.0316 0.006312 0.0543 0.010852 0.03 0.05 |
| build_solution_vector() 5 0.0001 0.000014 0.0543 0.010868 0.00 0.05 |
| |
| ExodusII_IO |
| write_nodal_data() 3 0.0024 0.000816 0.0024 0.000816 0.00 0.00 |
| |
| FE |
| compute_shape_functions() 4507581 6.1953 0.000001 6.1953 0.000001 5.96 5.96 |
| init_shape_functions() 3783756 6.6310 0.000002 6.6310 0.000002 6.38 6.38 |
| inverse_map() 3875385 4.5491 0.000001 4.5491 0.000001 4.37 4.37 |
| |
| FEMap |
| compute_affine_map() 4507581 5.2201 0.000001 5.2201 0.000001 5.02 5.02 |
| compute_face_map() 2763882 3.5520 0.000001 3.5520 0.000001 3.42 3.42 |
| init_face_shape_functions() 202 0.0007 0.000004 0.0007 0.000004 0.00 0.00 |
| init_reference_to_physical_map() 3783756 4.6286 0.000001 4.6286 0.000001 4.45 4.45 |
| |
| GMVIO |
| write_nodal_data() 2 0.1665 0.083237 0.1665 0.083237 0.16 0.16 |
| |
| GenericProjector |
| copy_dofs 1361385 5.6580 0.000004 21.6490 0.000016 5.44 20.82 |
| operator() 607 5.0012 0.008239 40.4516 0.066642 4.81 38.90 |
| project_edges 97080 0.0766 0.000001 0.0766 0.000001 0.07 0.07 |
| project_interior 97080 0.0751 0.000001 0.0751 0.000001 0.07 0.07 |
| project_nodes 97080 0.4693 0.000005 5.0553 0.000052 0.45 4.86 |
| project_sides 97080 0.0770 0.000001 0.0770 0.000001 0.07 0.07 |
| |
| JumpErrorEstimator |
| estimate_error() 202 28.7106 0.142132 89.7093 0.444106 27.61 86.27 |
| |
| Mesh |
| contract() 202 0.0160 0.000079 0.0280 0.000139 0.02 0.03 |
| find_neighbors() 203 0.5978 0.002945 0.5978 0.002945 0.57 0.57 |
| renumber_nodes_and_elem() 608 0.0350 0.000058 0.0350 0.000058 0.03 0.03 |
| |
| MeshOutput |
| write_equation_systems() 5 0.0001 0.000013 0.2233 0.044669 0.00 0.21 |
| |
| MeshRefinement |
| _coarsen_elements() 404 0.0378 0.000094 0.0378 0.000094 0.04 0.04 |
| _refine_elements() 404 0.1563 0.000387 0.4010 0.000993 0.15 0.39 |
| add_node() 113664 0.1007 0.000001 0.1007 0.000001 0.10 0.10 |
| make_coarsening_compatible() 407 0.1988 0.000489 0.1988 0.000489 0.19 0.19 |
| make_flags_parallel_consistent() 606 0.0937 0.000155 0.0937 0.000155 0.09 0.09 |
| make_refinement_compatible() 407 0.0102 0.000025 0.0102 0.000025 0.01 0.01 |
| |
| MeshTools::Generation |
| build_cube() 1 0.0007 0.000677 0.0007 0.000677 0.00 0.00 |
| |
| OldSolutionValue |
| Number eval_at_node() 382176 0.3948 0.000001 4.1027 0.000011 0.38 3.95 |
| check_old_context(c) 1361385 4.0214 0.000003 10.1149 0.000007 3.87 9.73 |
| check_old_context(c,p) 85266 0.2420 0.000003 0.5679 0.000007 0.23 0.55 |
| eval_at_point() 85266 1.1648 0.000014 3.5999 0.000042 1.12 3.46 |
| eval_old_dofs() 1361385 2.4347 0.000002 14.2695 0.000010 2.34 13.72 |
| |
| Parallel |
| allgather() 203 0.0002 0.000001 0.0002 0.000001 0.00 0.00 |
| |
| Partitioner |
| single_partition() 203 0.0140 0.000069 0.0140 0.000069 0.01 0.01 |
| |
| PetscLinearSolver |
| solve() 303 0.7612 0.002512 0.7612 0.002512 0.73 0.73 |
| |
| StatisticsVector |
| maximum() 202 0.0008 0.000004 0.0008 0.000004 0.00 0.00 |
| |
| System |
| assemble() 303 3.4738 0.011465 8.5615 0.028256 3.34 8.23 |
| project_fem_vector() 1 0.0001 0.000142 0.0331 0.033134 0.00 0.03 |
| project_vector(FunctionBase) 1 0.0000 0.000009 0.0331 0.033144 0.00 0.03 |
| project_vector(old,new) 606 2.3051 0.003804 46.4861 0.076710 2.22 44.70 |
| |
| TopologyMap |
| init() 404 0.1561 0.000386 0.1561 0.000386 0.15 0.15 |
-----------------------------------------------------------------------------------------------------------------
| Totals: 4.127e+07 103.9851 100.00 |
-----------------------------------------------------------------------------------------------------------------





AMR 3 refinements
-----------------------------------------------------------------------------------------------------------------
| libMesh Performance: Alive time=455.466, Active time=308.123 |
-----------------------------------------------------------------------------------------------------------------
| Event nCalls Total Time Avg Time Total Time Avg Time % of Active Time |
| w/o Sub w/o Sub With Sub With Sub w/o S With S |
|-----------------------------------------------------------------------------------------------------------------|
| |
| |
| DefaultCoupling |
| operator() 1153034 1.9001 0.000002 1.9001 0.000002 0.62 0.62 |
| |
| DofMap |
| add_neighbors_to_send_list() 304 1.1509 0.003786 4.4251 0.014556 0.37 1.44 |
| build_sparsity() 304 8.8682 0.029172 16.6976 0.054926 2.88 5.42 |
| create_dof_constraints() 304 1.6469 0.005417 3.9951 0.013142 0.53 1.30 |
| distribute_dofs() 304 0.2173 0.000715 5.4146 0.017811 0.07 1.76 |
| dof_indices() 19916934 18.6126 0.000001 18.6126 0.000001 6.04 6.04 |
| enforce_constraints_exactly() 909 2.8034 0.003084 2.8034 0.003084 0.91 0.91 |
| old_dof_indices() 10268793 10.0124 0.000001 10.0124 0.000001 3.25 3.25 |
| prepare_send_list() 305 0.0003 0.000001 0.0003 0.000001 0.00 0.00 |
| reinit() 304 0.7707 0.002535 0.7707 0.002535 0.25 0.25 |
| |
| EquationSystems |
| build_parallel_solution_vector() 5 0.0498 0.009954 0.0849 0.016974 0.02 0.03 |
| build_solution_vector() 5 0.0001 0.000015 0.0850 0.016991 0.00 0.03 |
| |
| ExodusII_IO |
| write_nodal_data() 3 0.0016 0.000526 0.0016 0.000526 0.00 0.00 |
| |
| FE |
| compute_shape_functions() 12087258 16.7562 0.000001 16.7562 0.000001 5.44 5.44 |
| init_shape_functions() 10555340 23.3502 0.000002 23.3502 0.000002 7.58 7.58 |
| inverse_map() 11670851 13.6081 0.000001 13.6081 0.000001 4.42 4.42 |
| |
| FEMap |
| compute_affine_map() 12087258 15.1613 0.000001 15.1613 0.000001 4.92 4.92 |
| compute_face_map() 6822171 8.8288 0.000001 8.8288 0.000001 2.87 2.87 |
| init_face_shape_functions() 303 0.0011 0.000004 0.0011 0.000004 0.00 0.00 |
| init_reference_to_physical_map() 10555340 14.9343 0.000001 14.9343 0.000001 4.85 4.85 |
| |
| GMVIO |
| write_nodal_data() 2 0.0676 0.033816 0.0676 0.033816 0.02 0.02 |
| |
| GenericProjector |
| copy_dofs 2157561 8.8513 0.000004 33.9505 0.000016 2.87 11.02 |
| operator() 910 18.4194 0.020241 155.7304 0.171132 5.98 50.54 |
| project_edges 1299333 1.0235 0.000001 1.0235 0.000001 0.33 0.33 |
| project_interior 1299333 1.0026 0.000001 1.0026 0.000001 0.33 0.33 |
| project_nodes 1299333 6.3258 0.000005 76.0383 0.000059 2.05 24.68 |
| project_sides 1299333 1.0258 0.000001 1.0258 0.000001 0.33 0.33 |
| |
| JumpErrorEstimator |
| estimate_error() 303 71.4588 0.235838 222.8668 0.735534 23.19 72.33 |
| |
| Mesh |
| contract() 303 0.0998 0.000329 0.1462 0.000483 0.03 0.05 |
| find_neighbors() 304 2.2488 0.007397 2.2488 0.007397 0.73 0.73 |
| renumber_nodes_and_elem() 911 0.1348 0.000148 0.1348 0.000148 0.04 0.04 |
| |
| MeshOutput |
| write_equation_systems() 5 0.0001 0.000013 0.1542 0.030848 0.00 0.05 |
| |
| MeshRefinement |
| _coarsen_elements() 606 0.1621 0.000268 0.1621 0.000268 0.05 0.05 |
| _refine_elements() 606 1.6498 0.002722 4.9647 0.008193 0.54 1.61 |
| add_node() 1542432 1.3647 0.000001 1.3647 0.000001 0.44 0.44 |
| make_coarsening_compatible() 809 1.4420 0.001782 1.4420 0.001782 0.47 0.47 |
| make_flags_parallel_consistent() 909 0.2881 0.000317 0.2881 0.000317 0.09 0.09 |
| make_refinement_compatible() 809 0.0552 0.000068 0.0552 0.000068 0.02 0.02 |
| |
| MeshTools::Generation |
| build_cube() 1 0.0002 0.000230 0.0002 0.000230 0.00 0.00 |
| |
| OldSolutionValue |
| Number eval_at_node() 5196564 5.4931 0.000001 63.1554 0.000012 1.78 20.50 |
| check_old_context(c) 2157561 6.2716 0.000003 15.8724 0.000007 2.04 5.15 |
| check_old_context(c,p) 1343484 3.6784 0.000003 8.6255 0.000006 1.19 2.80 |
| eval_at_point() 1343484 18.1202 0.000013 55.9662 0.000042 5.88 18.16 |
| eval_old_dofs() 2157561 3.8284 0.000002 22.3994 0.000010 1.24 7.27 |
| |
| Parallel |
| allgather() 304 0.0003 0.000001 0.0003 0.000001 0.00 0.00 |
| |
| Partitioner |
| single_partition() 304 0.0450 0.000148 0.0450 0.000148 0.01 0.01 |
| |
| PetscLinearSolver |
| solve() 404 1.5022 0.003718 1.5022 0.003718 0.49 0.49 |
| |
| StatisticsVector |
| maximum() 303 0.0019 0.000006 0.0019 0.000006 0.00 0.00 |
| |
| System |
| assemble() 404 7.4765 0.018506 18.1484 0.044922 2.43 5.89 |
| project_fem_vector() 1 0.0001 0.000109 0.0045 0.004474 0.00 0.00 |
| project_vector(FunctionBase) 1 0.0000 0.000010 0.0045 0.004485 0.00 0.00 |
| project_vector(old,new) 909 6.4352 0.007079 174.8106 0.192311 2.09 56.73 |
| |
| TopologyMap |
| init() 606 0.9755 0.001610 0.9755 0.001610 0.32 0.32 |
-----------------------------------------------------------------------------------------------------------------
| Totals: 1.162e+08 308.1230 100.00 |
-----------------------------------------------------------------------------------------------------------------


On Apr 27, 2017, at 12:14, Vikram Garg <***@gmail.com> wrote:

Rossi, yes compiling with perflog should give you all the details as in the example.





On Thu, Apr 27, 2017 at 10:54 AM, Rossi, Simone <***@email.unc.edu> wrote:
Dear Vikram,
as in the examples, I am using the libMesh::KellyErrorEstimator.

I’m compiling libmesh with the --enable-perflog option. Does it automatically give all the details you have listed in the example?

For the time being, I am attaching two perfLogs I had saved with only “coarse scale” data for 2 levels of refinement.
It looks like most of the time is spent in the AMR step, probably in the call to reinit().

Thanks,
Simone

NO AMR:

------------------------------------------------------------------------------------------------------------
| perf_log Performance: Alive time=18.0494, Active time=18.0426 |
------------------------------------------------------------------------------------------------------------
| Event nCalls Total Time Avg Time Total Time Avg Time % of Active Time |
| w/o Sub w/o Sub With Sub With Sub w/o S With S |
|------------------------------------------------------------------------------------------------------------|
| no amr matrix assembly 1 0.1545 0.154465 0.1545 0.154465 0.86 0.86 |
| no amr linear solve 101 4.8069 0.047593 4.8069 0.047593 26.64 26.64 |
| no amr rhs assembly 101 12.0348 0.119156 12.0348 0.119156 66.70 66.70 |
| time loop 1 1.0464 1.046422 17.8884 17.888405 5.80 99.15 |
------------------------------------------------------------------------------------------------------------
| Totals: 204 18.0426 100.00 |
------------------------------------------------------------------------------------------------------------


AMR:

------------------------------------------------------------------------------------------------------------
| perf_log Performance: Alive time=209.305, Active time=209.298 |
------------------------------------------------------------------------------------------------------------
| Event nCalls Total Time Avg Time Total Time Avg Time % of Active Time |
| w/o Sub w/o Sub With Sub With Sub w/o S With S |
|------------------------------------------------------------------------------------------------------------|
| |
| amr 303 195.1102 0.643928 195.1102 0.643928 93.22 93.22 |
| amr solve 303 13.9907 0.046174 13.9907 0.046174 6.68 6.68 |
| time loop 1 0.1974 0.197370 209.2990 209.299042 0.09 100.00 |
------------------------------------------------------------------------------------------------------------
| Totals: 607 209.2983 100.00 |
------------------------------------------------------------------------------------------------------------




------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org<http://slashdot.org/>! http://sdm.link/slashdot
_______________________________________________
Libmesh-users mailing list
Libmesh-***@lists.sourceforge.net<mailto:Libmesh-***@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/libmesh-users
--
Vikram Garg
Postdoctoral Associate
The University of Texas at Austin

http://vikramvgarg.wordpress.com/
http://www.runforindia.org/runners/vikramg
Roy Stogner
2017-04-28 15:40:01 UTC
Permalink
Post by Rossi, Simone
The run times for 100 timesteps using AMR can be more than 10 times slower than when using a fine uniform grid.
For example, with a 16 x 16 x 16 uniform grid, 100 iterations take about 18 seconds with a single processor.
With AMR, using a 2 x 2 x 2 grid and 3 levels of refinement, 100 iterations take about 800 seconds.
I didn't really understand this sentence until I started to run your
code to test possible libMesh optimizations - you're running 3 levels
of refinement *per timestep*!? That's pretty much guaranteed to be
inefficient; for nearly any transient PDE solve, the solution is never
going to change so much within a single time step that you'll want to
use more than one AMR step.

We probably violate this rule of thumb in the examples, which we
should fix to avoid misleading others, but in most cases you want to
think "time steps per adaptive step", not the other way around.

(there are exceptions, but in those cases you have to also be
exceptionally careful about how you do AMR; e.g. saving your previous
time step's error indicator so you don't accidentally coarsen too
soon)
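
The "time steps per adaptive step" schedule can be sketched as follows. This is a minimal, self-contained illustration, not libMesh code: `run_transient`, `solve_timestep`, and `adapt_mesh` are hypothetical names, and `adapt_mesh` stands in for the whole estimate/flag/refine/project sequence. For comparison, the "AMR 3 refinements" log earlier in the thread shows 303 calls to estimate_error() and 404 calls to solve() for a run of about 100 timesteps.

```cpp
#include <cassert>
#include <functional>

// Counts of how often each phase ran, so the schedule can be inspected.
struct RunCounts {
  int solves = 0;
  int adaptive_steps = 0;
};

// Hypothetical driver: advance n_timesteps and perform one adaptive step
// after every steps_per_adapt time steps. Both callbacks are placeholders
// for application code; they are assumptions, not libMesh functions.
RunCounts run_transient(int n_timesteps, int steps_per_adapt,
                        const std::function<void()>& solve_timestep,
                        const std::function<void()>& adapt_mesh)
{
  assert(n_timesteps >= 0 && steps_per_adapt >= 1);
  RunCounts counts;
  for (int step = 0; step < n_timesteps; ++step) {
    solve_timestep();  // assemble and solve on the current mesh
    ++counts.solves;
    if ((step + 1) % steps_per_adapt == 0) {
      adapt_mesh();    // a single adaptive step, never several per time step
      ++counts.adaptive_steps;
    }
  }
  return counts;
}
```

With `steps_per_adapt = 1` this performs one adaptive step per time step; larger values amortize the cost of error estimation and projection over several solves.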


I'm not complaining, though; your code really hammers the AMR code in
libMesh, which is exactly what we need for optimization purposes.
---
Roy
Boyce Griffith
2017-04-28 19:26:53 UTC
Permalink
Post by Roy Stogner
Post by Rossi, Simone
The run times for 100 timesteps using AMR can be more than 10 times slower than when using a fine uniform grid.
For example, with a 16 x 16 x 16 uniform grid, 100 iterations take about 18 seconds with a single processor.
With AMR, using a 2 x 2 x 2 grid and 3 levels of refinement, 100 iterations take about 800 seconds.
I didn't really understand this sentence until I started to run your
code to test possible libMesh optimizations - you're running 3 levels
of refinement *per timestep*!? That's pretty much guaranteed to be
inefficient; for nearly any transient PDE solve, the solution is never
going to change so much within a single time step that you'll want to
use more than one AMR step.
We probably violate this rule of thumb in the examples, which we
should fix to avoid misleading others, but in most cases you want to
think "time steps per adaptive step", not the other way around.
(there are exceptions, but in those cases you have to also be
exceptionally careful about how you do AMR; e.g. saving your previous
time step's error indicator so you don't accidentally coarsen too
soon)
I think what Simone wants is a "three level AMR grid", so that he is getting the same effective fine grid resolution with a 2x2x2 base grid as in the uniformly fine case.
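
The resolution argument can be checked with a few lines of arithmetic. The sketch below assumes isotropic h-refinement, where each level halves every element edge; the function name is hypothetical.

```cpp
#include <cassert>

// Elements per direction on the finest level, assuming each refinement
// level bisects every element edge (isotropic h-refinement).
int effective_elements_per_direction(int base_elements, int levels)
{
  assert(base_elements > 0 && levels >= 0 && levels < 24);
  return base_elements << levels;  // base_elements * 2^levels
}
```

Under that assumption, a 2 x 2 x 2 base grid with 3 levels gives 2 * 2^3 = 16 elements per direction where the mesh is fully refined, matching the 16 x 16 x 16 uniform grid.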

What is the correct way to initialize such a mesh and maintain it in a time-dependent model?

Thanks,

-- Boyce
Post by Roy Stogner
I'm not complaining, though; your code really hammers the AMR code in
libMesh, which is exactly what we need for optimization purposes.
---
Roy
Rossi, Simone
2017-04-29 02:16:58 UTC
Permalink
Dear Roy,
thanks for your answer.

If I understand you correctly, performing more than one AMR step at every timestep is “inefficient”.
The strategy should be to run with a fixed, locally refined mesh for N timesteps before running a new adaptive step.

So if I want to compare with a uniform grid, I guess that
(depending on how long the adaptive step takes)
I can increase the number of timesteps per adaptive step to make it more efficient.

I’ll update my code with your suggestions. I’ll keep you posted on the outcome.

Alternatively, could a possible strategy be to estimate the error at every time step,
and take the adaptive step only if the error is larger than a given tolerance?
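
A minimal sketch of that tolerance test, in self-contained C++. The function names and the choice of the l2 norm of the per-element indicators as the global error measure are assumptions made for illustration; nothing here is libMesh API.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Global error measure: l2 norm of the per-element error indicators.
double global_error(const std::vector<double>& per_element_error)
{
  double sum_sq = 0.0;
  for (double e : per_element_error)
    sum_sq += e * e;
  return std::sqrt(sum_sq);
}

// Take the adaptive step only when the estimated error exceeds the tolerance.
bool should_adapt(const std::vector<double>& per_element_error, double tolerance)
{
  assert(tolerance > 0.0);
  return global_error(per_element_error) > tolerance;
}
```

Note that this check still requires the error estimate at every time step. In the logs above, estimate_error() alone accounts for roughly 23% to 28% of active time without sub-calls, so the saving would come from skipping refinement and projection, not from skipping the estimator.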

Thanks again for the help,
Best,
Simone


Roy Stogner
2017-05-08 21:22:07 UTC
Permalink
If I understand you correctly, performing more than one AMR step at
every timestep is “inefficient”. The strategy should be to run with
a fixed locally refined mesh for N timesteps, before running a new
adaptive step.
Yes, although N=1 is often reasonable IMHO, if you're time stepping
aggressively enough.
Alternatively, could a possible strategy be to estimate the error at
every time step, and take the adaptive step only if the error is
larger than a given tolerance?
Hmm... I've never tried that, but it does sound like a good idea.

Another strategy I've seen in a paper was to use the same grid for
every time step of a transient calculation, but to determine *that*
grid adaptively, in a loop outside the transient loop. You then don't
need to compute the error indicator on your finest grid and you don't
need to compute projections ever. That was for a turbulent flow
problem, though; IIRC Boyce told me you had moving fronts, for which
that hack would be very suboptimal.
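
One reading of that outer-loop strategy, as a self-contained sketch. The paper is not named in the thread, so the structure below is an assumption: each pass reruns the whole transient on a fixed mesh, evaluates an indicator, and adapts; the final run uses the last mesh with no indicator. All names are hypothetical and none are libMesh API.

```cpp
#include <cassert>
#include <functional>

struct OuterLoopCounts {
  int transient_runs = 0;
  int indicator_evaluations = 0;
  int mesh_updates = 0;
};

// Adapt the mesh in a loop outside the time loop. Every transient run
// restarts from the initial condition on a fixed mesh, so no solution
// projection between meshes is needed during time stepping.
OuterLoopCounts adapt_outside_time_loop(
    int n_adaptive_passes,
    const std::function<void()>& run_transient_on_fixed_mesh,
    const std::function<void()>& estimate_error,
    const std::function<void()>& adapt_mesh)
{
  assert(n_adaptive_passes >= 0);
  OuterLoopCounts counts;
  for (int pass = 0; pass < n_adaptive_passes; ++pass) {
    run_transient_on_fixed_mesh();
    ++counts.transient_runs;
    estimate_error();  // indicator for the whole run on this mesh
    ++counts.indicator_evaluations;
    adapt_mesh();
    ++counts.mesh_updates;
  }
  run_transient_on_fixed_mesh();  // final run: no indicator on the last mesh
  ++counts.transient_runs;
  return counts;
}
```

The price is rerunning the full transient once per pass, and a single fixed mesh has to resolve everywhere the solution is ever sharp.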
---
Roy
Boyce Griffith
2017-05-09 00:32:40 UTC
Permalink
Post by Roy Stogner
If I understand you correctly, performing more than one AMR step at
every timestep is “inefficient”. The strategy should be to run with
a fixed locally refined mesh for N timesteps, before running a new
adaptive step.
Yes, although N=1 is often reasonable IMHO, if you're time stepping
aggressively enough.
Alternatively, could a possible strategy be to estimate the error at
every time step, and take the adaptive step only if the error is
larger than a given tolerance?
Hmm... I've never tried that, but it does sound like a good idea.
Another strategy I've seen in a paper was to use the same grid for
every time step of a transient calculation, but to determine *that*
grid adaptively, in a loop outside the transient loop. You then don't
need to compute the error indicator on your finest grid and you don't
need to compute projections ever. That was for a turbulent flow
problem, though; IIRC Boyce told me you had moving fronts, for which
that hack would be very suboptimal.
Indeed --- it would tell you just to use a uniform grid. :-)
Post by Roy Stogner
---
Roy