Fix CI: build mecab-ko as CMake ExternalProject and fix tokenizer bug#6
Fix CI: build mecab-ko as CMake ExternalProject and fix tokenizer bug#6ji-heo111 wants to merge 6 commits into
Conversation
|
Oh, you also try fixing the CI. I'll merge #5 so that you don't need to re-implement it. |
|
I've merged #5. Could you rebase on main? |
| key: test-ubuntu-ccache-${{ hashFiles('tokenizers/**') }} | ||
| restore-keys: cmake-ubuntu-ccache- | ||
| key: test-ubuntu-ccache-${{ hashFiles('CMakeLists.txt', 'tokenizers/**') }} | ||
| restore-keys: test-ubuntu-ccache- |
There was a problem hiding this comment.
Could you revert needless changes?
How about creating a separated job for bundled mecab-ko? We can use strategy for it.
https://docs.github.com/en/actions/reference/workflows-and-actions/workflow-syntax#jobsjob_idstrategy
There was a problem hiding this comment.
@kou
I've reverted the unnecessary test code. I'll set up a separate job for bundled mecab-ko using a matrix strategy to test across multiple OS environments in a follow-up PR.
| set(MECAB_KO_INCLUDE_DIRS "${MECAB_KO_PRIVATE_PREFIX}/include") | ||
| if(APPLE) | ||
| set(MECAB_KO_LIBRARY "${MECAB_KO_PRIVATE_PREFIX}/lib/libmecab.dylib") | ||
| elseif(WIN32) | ||
| set(MECAB_KO_LIBRARY "${MECAB_KO_PRIVATE_PREFIX}/lib/libmecab.dll") | ||
| else() | ||
| set(MECAB_KO_LIBRARY "${MECAB_KO_PRIVATE_PREFIX}/lib/libmecab.so") | ||
| endif() | ||
| set(CMAKE_INSTALL_RPATH "${MECAB_KO_PRIVATE_PREFIX}/lib") | ||
| set(CMAKE_INSTALL_RPATH_USE_LINK_PATH FALSE) |
There was a problem hiding this comment.
How about using static library instead of shared library? Then we don't need to install mecab-ko separately. TokenMecab does so and specify --rcfile explicitly: https://github.com/groonga/groonga/blob/cc4ae7e3ecc241428922a40edb104a2be7b20ea8/plugins/tokenizers/mecab.c#L526-L557
There was a problem hiding this comment.
@kou
Thanks for the suggestion. I've updated the implementation to build mecab-ko as a static library.
|
My suggestions: ji-heo111#3 |
| if(DEFINED MECAB_KO_PRIVATE_PREFIX) | ||
| add_dependencies(groonga_tokenizer_mecab_ko mecab_ko_dic_external) | ||
| target_compile_definitions( | ||
| groonga_tokenizer_mecab_ko | ||
| PRIVATE | ||
| GRN_WITH_BUNDLED_MECAB_KO | ||
| GRN_BUNDLED_MECAB_KO_RC_PATH="${MECAB_KO_RC_PATH}" | ||
| ) | ||
| find_package(Threads REQUIRED) | ||
| target_link_libraries(groonga_tokenizer_mecab_ko PRIVATE Threads::Threads) | ||
| endif() |
There was a problem hiding this comment.
ji-heo111#3 added libmecab_ko CMake target and it has all needed options. So we can use libmecab_ko for both system mecab-ko and bundled mecab-ko.
| argv[argc++] = "-F%m\n"; | ||
| argv[argc++] = "-E\n"; | ||
| } | ||
| #ifdef GRN_WITH_BUNDLED_MECAB_KO |
There was a problem hiding this comment.
How about using GRN_TOKENIZER_MECAB_KO prefix instead of GRN prefix?
| } | ||
| #ifdef GRN_WITH_BUNDLED_MECAB_KO | ||
| argv[argc++] = "--rcfile"; | ||
| argv[argc++] = GRN_BUNDLED_MECAB_KO_RC_PATH; |
| REQUIRED | ||
| NO_DEFAULT_PATH | ||
| ) | ||
| if(DEFINED MECAB_KO_PRIVATE_PREFIX) |
There was a problem hiding this comment.
How about falling back to bundled mecab-ko automatically when system mecab-ko isn't found?
We can use -DMECAB_KO_CONFIG=nonexistent to disable system mecab-ko explicitly.
ji-heo111#3 does so.
| https://bitbucket.org/eunjeon/mecab-ko/downloads/mecab-0.996-ko-0.9.2.tar.gz | ||
| PREFIX ${CMAKE_BINARY_DIR}/mecab-ko | ||
| CONFIGURE_COMMAND | ||
| <SOURCE_DIR>/configure --prefix=${MECAB_KO_PRIVATE_PREFIX} |
There was a problem hiding this comment.
Let's use --prefix=<INSTALL_DIR> instead of ${MECAB_KO_PRIVATE_PREFIX}. If we use static linking, we don't need to install mecab-ko.
ji-heo111#3 does so.
| INSTALL_COMMAND make install | ||
| BUILD_IN_SOURCE FALSE | ||
| BUILD_BYPRODUCTS ${MECAB_KO_LIBRARY} | ||
| ) |
There was a problem hiding this comment.
We need to install license files of mecab-ko when we use bundled mecab-ko.
ji-heo111#3 does so.
| CONFIGURE_COMMAND | ||
| sh -c | ||
| "cd <SOURCE_DIR> && ./autogen.sh && ./configure --prefix=${MECAB_KO_PRIVATE_PREFIX} --with-mecab-config=${MECAB_KO_PRIVATE_PREFIX}/bin/mecab-config --with-dicdir=${MECAB_KO_DIC_DIR}" |
There was a problem hiding this comment.
We can use PATCH_COMMAND for autogen.sh.
Let's install mecab-ko-dic files to CMAKE_INSTALL_PREFIX/... not MECAB_KO_PRIVATE_PREFIX/.... In general, users control where they install by CMAKE_INSTALL_PREFIX in CMake.
ji-heo111#3 does so.
| ${CMAKE_COMMAND} -E env | ||
| PATH=${MECAB_KO_PRIVATE_PREFIX}/bin:$ENV{PATH} make -j${NPROC} |
There was a problem hiding this comment.
We don't need to change PATH here because mecab-ko-dic doesn't use mecab in ${MECAB_KO_PRIVATE_PREFIX}/bin and mecab-dict-index exists in ${MECAB_KO_PRIVATE_PREFIX}/libexec/mecab/.
ji-heo111#3 removes this.
| COMMAND | ||
| sed -i "s|^dicdir.*|dicdir = ${MECAB_KO_DIC_DIR}|" | ||
| ${MECAB_KO_RC_PATH} |
There was a problem hiding this comment.
We can use configure_file() instead of sed.
ji-heo111#3 does so.
| sed -i "s|^dicdir.*|dicdir = ${MECAB_KO_DIC_DIR}|" | ||
| ${MECAB_KO_RC_PATH} | ||
| BUILD_IN_SOURCE TRUE | ||
| ) |
There was a problem hiding this comment.
We need to install license files of mecab-ko-dic when we use bundled mecab-ko-dic.
ji-heo111#3 does so.
Summary
ExternalProjectusing theMECAB_KO_PRIVATE_PREFIXCMake optionlibmecab.a) in bundled builds to eliminate external runtime dependencies--rcfileoption in bundled buildsChanges
CMakeLists.txtMECAB_KO_PRIVATE_PREFIXis defined, build mecab-ko and mecab-ko-dic from source usingExternalProject_Add--enable-static --disable-shared --with-picfor static linkingMECAB_KO_PRIVATE_PREFIXis not defined, retain the existingmecab-configbased detection logictokenizers/CMakeLists.txtmecab_ko_dic_externalfor bundled buildsGRN_WITH_BUNDLED_MECAB_KOandGRN_BUNDLED_MECAB_KO_RC_PATHThreadslibrary (required for static mecab linking)tokenizers/mecab_ko.cargvarray size from 3 to 5GRN_WITH_BUNDLED_MECAB_KOis defined, pass--rcfilewith the bundled mecabrc path