Hi, I found 5 distinct memory-safety bugs in pocketsphinx_lm_convert on x86-64 Ubuntu 22.04. They are still reproducible on commit 511126b.
Compiler
Ubuntu clang version 20.1.8
Build commands
git clone https://github.com/cmusphinx/pocketsphinx.git
cd pocketsphinx
mkdir build && cd build
cmake -G "Unix Makefiles" \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=OFF \
-DBUILD_GSTREAMER=OFF \
-DCMAKE_C_COMPILER=clang-20 \
-DCMAKE_CXX_COMPILER=clang++-20 \
-DCMAKE_C_FLAGS="-O1 -g -fno-omit-frame-pointer -fsanitize=address -static-libsan" \
-DCMAKE_CXX_FLAGS="-O1 -g -fno-omit-frame-pointer -fsanitize=address -static-libsan" \
..
make all -j12
Command line
./pocketsphinx_lm_convert -i <input> -o /tmp/out.lm.bin
Bug 1 — heap-buffer-overflow (read) / SEGV (src/lm/ngrams_raw.c:224)
heap-buffer-overflow(read) variant — ngrams_raw.c:224:58
==2845702==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7c1ff6fe0610 at pc 0x55555569e4f7 bp 0x7fffffffd570 sp 0x7fffffffd568
READ of size 4 at 0x7c1ff6fe0610 thread T0
#0 0x55555569e4f6 in read_dmp_weight_array src/lm/ngrams_raw.c:224:58
#1 0x55555569d0b1 in ngrams_raw_read_dmp src/lm/ngrams_raw.c:318:5
#2 0x55555569aecc in ngram_model_trie_read_dmp src/lm/ngram_model_trie.c:667:13
#3 0x5555556740c4 in ngram_model_read src/lm/ngram_model.c
#4 0x555555673b56 in main programs/pocketsphinx_lm_convert.c
SEGV variant (same function, index goes further out of bounds) — ngrams_raw.c
==2859064==ERROR: AddressSanitizer: SEGV on unknown address 0x7c1ff702022c (pc 0x55555569e18a bp 0x7fffffffd610 sp 0x7fffffffd580 T0)
The signal is caused by a READ memory access.
#0 0x55555569e18a in read_dmp_weight_array src/lm/ngrams_raw.c
#1 0x55555569d195 in ngrams_raw_read_dmp src/lm/ngrams_raw.c:327:9
#2 0x55555569aecc in ngram_model_trie_read_dmp src/lm/ngram_model_trie.c:667:13
#3 0x5555556740c4 in ngram_model_read src/lm/ngram_model.c
#4 0x555555673b56 in main programs/pocketsphinx_lm_convert.c
Files: pocketsphinx_bug1a.zip, pocketsphinx_bug1b.zip
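For Bug 1, a possible mitigation is to validate the index read from the file before using it to address the weight array. The sketch below is not the code in ngrams_raw.c; `lookup_weight`, `weights`, `n_weights`, and `idx` are hypothetical names, and I am assuming the out-of-bounds read is a 4-byte load indexed by an untrusted value from the DMP file.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of pocketsphinx): bounds-checked read
 * from a weight table using an index taken from untrusted input.
 * Returns 0 and stores the value in *out on success, -1 otherwise. */
int lookup_weight(const float *weights, size_t n_weights,
                  uint32_t idx, float *out)
{
    if (weights == NULL || out == NULL)
        return -1;
    if ((size_t)idx >= n_weights)   /* reject indices past the end of the table */
        return -1;
    *out = weights[idx];
    return 0;
}
```

A caller would treat a -1 return as a corrupt model file and abort loading instead of dereferencing the index.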
Bug 2 — NULL-pointer deref (read) (ngrams_raw.c:382)
Via DMP reader — ngrams_raw_read_dmp
==2839214==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x55555569cd1d bp 0x7fffffffd730 sp 0x7fffffffd620 T0)
The signal is caused by a READ memory access.
#0 0x55555569cd1d in ngrams_raw_free src/lm/ngrams_raw.c:382:48
#1 0x55555569cd1d in ngrams_raw_read_dmp src/lm/ngrams_raw.c:289:9
#2 0x55555569aecc in ngram_model_trie_read_dmp src/lm/ngram_model_trie.c:667:13
#3 0x5555556740c4 in ngram_model_read src/lm/ngram_model.c
#4 0x555555673b56 in main programs/pocketsphinx_lm_convert.c
Via ARPA reader — ngrams_raw_read_arpa
==2820049==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x55555569c31d bp 0x7fffffffd730 sp 0x7fffffffd5e0 T0)
The signal is caused by a READ memory access.
#0 0x55555569c31d in ngrams_raw_free src/lm/ngrams_raw.c:382:48
#1 0x55555569c31d in ngrams_raw_read_arpa src/lm/ngrams_raw.c:185:2
#2 0x5555556985e8 in ngram_model_trie_read_arpa src/lm/ngram_model_trie.c:219:13
#3 0x555555674087 in ngram_model_read src/lm/ngram_model.c:138:18
#4 0x555555673b56 in main programs/pocketsphinx_lm_convert.c
Files: pocketsphinx_bug2a.zip, pocketsphinx_bug2b.zip
Bug 3 — NULL-pointer deref (read) (ngrams_raw.c:189)
==2813206==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x55555569c1f9 bp 0x7fffffffd730 sp 0x7fffffffd5e0 T0)
The signal is caused by a READ memory access.
#0 0x55555569c1f9 in ngrams_raw_read_arpa src/lm/ngrams_raw.c:189:20
#1 0x5555556985e8 in ngram_model_trie_read_arpa src/lm/ngram_model_trie.c:219:13
#2 0x555555674087 in ngram_model_read src/lm/ngram_model.c:138:18
#3 0x555555673b56 in main programs/pocketsphinx_lm_convert.c
File: pocketsphinx_bug3.zip
Bug 4 — High Value SEGV (src/lm/ngram_model_trie.c:535)
The format string contains two %s conversions but only one argument (dmp_hdr) is passed, so the second %s reads an indeterminate pointer:
if (strncmp(file_header, dmp_hdr, k) != 0) {
E_ERROR("Wrong header %s: %s is not a dump file\n", dmp_hdr); /* ngram_model_trie.c:535 */
goto error_out;
}
==2814616==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x55555564a000 bp 0x7fffffffcd90 sp 0x7fffffffc508 T0)
The signal is caused by a READ memory access.
#0 0x55555564a000 in __sanitizer::internal_strlen(char const*) ...sanitizer_libc.cpp:176:10
#1 0x5555555b4c06 in printf_common(void*, char const*, __va_list_tag*) ...sanitizer_common_interceptors_format.inc:561:17
#2 0x5555555b5301 in vsnprintf ...sanitizer_common_interceptors.inc:1707:1
#3 0x5555556911bb in err_msg src/util/err.c:125:5
#4 0x55555569a693 in ngram_model_trie_read_dmp src/lm/ngram_model_trie.c:535:9
#5 0x5555556740c4 in ngram_model_read src/lm/ngram_model.c
#6 0x555555673b56 in main programs/pocketsphinx_lm_convert.c
File: pocketsphinx_bug4.zip
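For Bug 4, the fix is probably just to pass one argument per %s conversion. The sketch below shows the idea with plain snprintf rather than E_ERROR; `format_header_error`, `path`, and `expected_hdr` are hypothetical names, and the message wording is my guess at what the original intended.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not part of pocketsphinx): builds the
 * header-mismatch message with exactly one argument per %s.
 * NULL inputs are replaced with placeholders so %s never sees NULL. */
int format_header_error(char *buf, size_t buflen,
                        const char *path, const char *expected_hdr)
{
    return snprintf(buf, buflen,
                    "Wrong header in %s: expected \"%s\", not a dump file\n",
                    path ? path : "(unknown)",
                    expected_hdr ? expected_hdr : "(null)");
}
```

Building with -Wformat (enabled by -Wall in clang and gcc) should flag this kind of mismatch at compile time, provided the logging function is declared with the printf format attribute.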
Bug 5 — NULL-pointer deref (read) (ngram_model_trie.c:453)
==2837436==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7ffff7d9d8bd bp 0x7fffffffd7b0 sp 0x7fffffffcf68 T0)
The signal is caused by a READ memory access.
#1 0x5555555a65f3 in strlen ...sanitizer_common_interceptors.inc
#2 0x55555569a12b in write_word_str src/lm/ngram_model_trie.c:453:14
#3 0x55555569a12b in ngram_model_trie_write_bin src/lm/ngram_model_trie.c:483:5
#4 0x555555673cc6 in main programs/pocketsphinx_lm_convert.c:187:13
File: pocketsphinx_bug5.zip
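For Bug 5, the trace shows strlen being called on a NULL word string during writing. One way to guard against that, sketched below, is to validate every entry while computing the total size and refuse to write if any is missing. `word_table_size`, `words`, and `n_words` are hypothetical names; I have not checked how write_word_str actually stores its strings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of pocketsphinx): sums the bytes needed
 * to serialize a word table (each string plus its NUL terminator).
 * Returns 0 and stores the total in *out_bytes on success, or -1 if the
 * table or any entry is NULL. */
int word_table_size(const char *const *words, size_t n_words,
                    size_t *out_bytes)
{
    size_t total = 0;
    size_t i;

    if (words == NULL || out_bytes == NULL)
        return -1;
    for (i = 0; i < n_words; i++) {
        if (words[i] == NULL)       /* would crash in strlen otherwise */
            return -1;
        total += strlen(words[i]) + 1;
    }
    *out_bytes = total;
    return 0;
}
```

A more thorough fix would reject the malformed input at load time, so that a model with missing word strings never reaches the writer.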
Let me know if you'd like these split into separate issues instead, or need anything else to reproduce.