Closed
10 changes: 8 additions & 2 deletions doc/development/pytorch-profiler.md
@@ -14,7 +14,11 @@ export DP_PYTORCH_PROFILER_OUTPUT_DIR=./profiler_results

3. Check for profiler output in the specified directory:
```bash
+# For single-rank or non-MPI usage
 ls -la ./profiler_results/pytorch_profiler_trace.json
+
+# For MPI usage, each rank gets its own file
+ls -la ./profiler_results/pytorch_profiler_trace_rank*.json
```

For MPI applications, you can use different output directories per rank:
@@ -36,11 +40,13 @@ The profiler uses PyTorch's modern `torch::profiler` API and automatically:
- Creates the output directory if it doesn't exist
- Profiles all forward pass operations in DeepPotPT and DeepSpinPT
- Saves profiling results to a JSON file when the object is destroyed
+- Automatically includes MPI rank in filename when MPI is available and initialized

## Output Files

-- **All usage**: `pytorch_profiler_trace.json`
+- **Single-rank or non-MPI usage**: `pytorch_profiler_trace.json`
+- **MPI usage**: `pytorch_profiler_trace_rank{rank}.json` (e.g., `pytorch_profiler_trace_rank0.json`, `pytorch_profiler_trace_rank1.json`)

-For MPI applications, users can distinguish between ranks by setting different output directories per rank using the `DP_PYTORCH_PROFILER_OUTPUT_DIR` environment variable.
+This ensures that each MPI rank saves its profiling data to a separate file, preventing conflicts in multi-rank simulations.

This is intended for development and debugging purposes.
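
The naming scheme described in the documentation change above (a plain `pytorch_profiler_trace.json` when no rank is available, `pytorch_profiler_trace_rank{rank}.json` otherwise) can be sketched as follows. This is a minimal illustration, not the code from this PR: `make_trace_path` is a hypothetical helper, and it assumes the `-1` sentinel that `deepmd::get_mpi_rank()` returns when MPI is unavailable or not initialized.

```cpp
#include <string>

// Hypothetical helper (not part of the PR): builds the trace file path
// from an output directory and an MPI rank.
// Assumption: any negative rank means "no MPI rank available",
// matching the -1 sentinel returned by deepmd::get_mpi_rank().
std::string make_trace_path(const std::string& output_dir, int rank) {
  std::string name = "pytorch_profiler_trace";
  if (rank >= 0) {
    // MPI is available and initialized: one file per rank.
    name += "_rank" + std::to_string(rank);
  }
  name += ".json";

  if (output_dir.empty()) {
    return name;
  }
  // Avoid a doubled separator when the directory already ends in '/'.
  const bool has_sep = output_dir.back() == '/';
  return output_dir + (has_sep ? "" : "/") + name;
}
```

Under these assumptions, `make_trace_path("./profiler_results", -1)` yields `./profiler_results/pytorch_profiler_trace.json`, while rank 0 yields `./profiler_results/pytorch_profiler_trace_rank0.json`, so ranks writing to a shared directory do not overwrite each other.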
15 changes: 15 additions & 0 deletions source/api_cc/CMakeLists.txt
@@ -49,6 +49,21 @@ set_target_properties(
INSTALL_RPATH_USE_LINK_PATH TRUE
BUILD_RPATH "$ORIGIN/../op/tf;$ORIGIN/../op/pt;$ORIGIN/../op/pd")
target_compile_definitions(${libname} PRIVATE TF_PRIVATE)
+find_package(MPI)
+if(MPI_FOUND)
+  include(CheckCXXSymbolExists)
+  set(CMAKE_REQUIRED_INCLUDES ${MPI_CXX_INCLUDE_DIRS})
+  set(CMAKE_REQUIRED_LIBRARIES ${MPI_CXX_LIBRARIES})
+  check_cxx_symbol_exists(MPIX_Query_cuda_support "mpi.h" CUDA_AWARE)
+  if(NOT CUDA_AWARE)
+    check_cxx_symbol_exists(MPIX_Query_cuda_support "mpi.h;mpi-ext.h" OMP_CUDA)
+    if(NOT OMP_CUDA)
+      target_compile_definitions(${libname} PRIVATE NO_CUDA_AWARE)
+    endif()
+  endif()
Member

@copilot MPIX_Query_cuda_support is not used in api_cc, so I think it's not needed.

Contributor Author

You're absolutely right. Removed the unnecessary MPIX_Query_cuda_support check and related CUDA-aware MPI detection from the api_cc library since it's not used in the source code. The library now only includes the essential MPI functionality needed for rank detection. Commit: 284fc69

+  target_link_libraries(${libname} PRIVATE MPI::MPI_CXX)
+  target_compile_definitions(${libname} PRIVATE USE_MPI)
+endif()
if(CMAKE_TESTING_ENABLED)
target_link_libraries(${libname} PRIVATE coverage_config)
endif()
22 changes: 15 additions & 7 deletions source/api_cc/src/common.cc
@@ -11,10 +11,9 @@
#include <sys/stat.h>
#include <errno.h>

-// Note: MPI rank detection has been removed from api_cc library
-// to avoid MPI linking dependencies. The profiler will use a generic
-// filename. Users can still distinguish between ranks by using different
-// output directories per rank if needed.
+#ifdef USE_MPI
+#include <mpi.h>
+#endif

#include "AtomMap.h"
#include "device.h"
@@ -408,10 +407,19 @@ void deepmd::get_env_pytorch_profiler(bool& enable_profiler, std::string& output
}

int deepmd::get_mpi_rank() {
-  // MPI rank detection removed from api_cc to avoid MPI linking dependencies
-  // Always return -1 to indicate no MPI rank available
-  // Users can distinguish between ranks by using different output directories
+#ifdef USE_MPI
+  int rank = -1;  // Use -1 to indicate MPI not available/initialized
+  int initialized = 0;
+  if (MPI_Initialized(&initialized) == MPI_SUCCESS && initialized) {
+    if (MPI_Comm_rank(MPI_COMM_WORLD, &rank) != MPI_SUCCESS) {
+      rank = -1;  // fallback to -1 if MPI_Comm_rank fails
+    }
+  }
+  return rank;
+#else
+  // MPI not available at compile time
   return -1;
+#endif
}

bool deepmd::create_directories(const std::string& path) {