Add Python examples for KNI usage #1

Open
bruno-at-orange wants to merge 3 commits into main from add-python-src

Conversation

@bruno-at-orange
Member

Add Python implementations demonstrating single-table and multi-table recoding with the Khiops Native Interface (KNI).

New files:

  • python/KNI.py: Complete ctypes wrapper for KhiopsNativeInterface library with automatic library discovery (KNI_HOME, system paths)
  • python/KNIRecodeFile.py: Single-table recoding example
  • python/KNIRecodeMTFiles.py: Multi-table recoding example with support for secondary tables and external tables

Features:

  • Cross-platform support (Windows, Linux, macOS)
  • Flexible library loading with multiple search strategies
  • Complete API coverage including multi-table operations
  • Command-line interface with argparse for multi-table example
  • Error handling with descriptive messages

Requirements: Python 3.6+ and KNI shared library installed

@bruno-at-orange bruno-at-orange requested a review from popescu-v May 12, 2026 07:49
Comment thread python/KNI.py Outdated
return ctypes.CDLL(lib_name)
except OSError:
# Try to find in KNI_HOME environment variable
import os

Import os at the beginning of the module to be PEP8-compatible.

Member Author

done

Comment thread python/KNI.py Outdated
/ lib_name, # KNI_HOME directly (if set to lib/bin directory)
]

if system == "Windows":

Why do we need to maintain different "lib" / "bin" path priorities for Windows vs. UNIX?

Member Author

Keep only one path: KNI_HOME
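
A minimal sketch of what a lookup restricted to KNI_HOME could look like. The helper name `find_kni_library` and the per-platform file names are assumptions made for illustration; they are not taken from python/KNI.py.

```python
import os
import platform
from pathlib import Path

# Hypothetical per-platform file names; the real names depend on the KNI package.
_LIB_NAMES = {
    "Windows": "KhiopsNativeInterface.dll",
    "Linux": "libKhiopsNativeInterface.so",
    "Darwin": "libKhiopsNativeInterface.dylib",
}


def find_kni_library(system=None):
    """Return the library path under KNI_HOME, or raise if it cannot be found."""
    system = system or platform.system()
    if system not in _LIB_NAMES:
        raise OSError(f"Unsupported platform: {system}")
    kni_home = os.environ.get("KNI_HOME")
    if not kni_home:
        raise FileNotFoundError("KNI_HOME is not set")
    lib_path = Path(kni_home) / _LIB_NAMES[system]
    if not lib_path.is_file():
        raise FileNotFoundError(f"KNI library not found: {lib_path}")
    return lib_path
```

The returned path could then be handed to `ctypes.CDLL(str(lib_path))`, which still raises `OSError` if the file exists but cannot be loaded.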

Comment thread python/KNI.py Outdated
# Try standard installation paths
if system == "Linux":
standard_paths = [
f"/usr/lib/{lib_name}",

I'd use either os.path.join("usr", "lib", lib_name) or Path("usr") / "lib" / lib_name; idem for the other standard path.

Member Author

done

Comment thread python/KNI.py Outdated
Comment on lines +127 to +130
try:
    return ctypes.CDLL(path)
except OSError:
    continue

Why not proceed as above, viz.:

for path in standard_paths:
    if os.path.exists(path):
        return ctypes.CDLL(path)

?

Member Author

done
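
A small sketch combining the two suggestions on standard paths. One detail to watch: `Path("usr") / "lib" / lib_name` written literally is a relative path, so the sketch starts from `Path("/usr")` to keep the location absolute. The function names and the second candidate location are assumptions for illustration only.

```python
from pathlib import Path


def standard_paths(lib_name):
    """Candidate absolute install locations (the second entry is an assumed example)."""
    return [
        Path("/usr") / "lib" / lib_name,
        Path("/usr") / "local" / "lib" / lib_name,
    ]


def first_existing(paths):
    """Return the first path that exists, or None if none does."""
    for path in paths:
        if path.exists():
            return path
    return None
```

Checking for existence first avoids the try/except loop, but a file can exist and still fail to load, so the final `ctypes.CDLL` call may still raise `OSError`.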

Comment thread python/KNI.py Outdated
Comment on lines +253 to +254
if isinstance(field_separator, str)
else field_separator

If field_separator is of type bytes (what other legit type could it have, besides str?), we still need to get the first byte to have a C-language char, viz. field_separator[0], right? For other types (e.g. int or float), we should raise an exception IMHO.

Member Author

done
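
One way the separator handling discussed here could be sketched. The helper name `to_c_char` is hypothetical, and this variant rejects separators longer than one byte rather than silently keeping the first byte, which is a design choice of the sketch and not necessarily what the wrapper does.

```python
def to_c_char(field_separator):
    """Normalize a separator given as str or bytes to a single byte for a C char."""
    if isinstance(field_separator, str):
        field_separator = field_separator.encode("utf-8")
    elif not isinstance(field_separator, bytes):
        raise TypeError(
            "field_separator must be str or bytes, "
            f"not {type(field_separator).__name__}"
        )
    if len(field_separator) != 1:
        raise ValueError("field_separator must encode to exactly one byte")
    return field_separator
```

A one-byte `bytes` object is a valid initializer for `ctypes.c_char`, so the result can be passed to a function declared with a `c_char` parameter.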

Comment thread python/KNI.py
Recode an input record using the stream's dictionary.

Args:
stream_handle: Handle returned by open_stream

IMHO, we should check the types of stream_handle, input_record and max_output_length before attempting to use them. And this goes for all API functions and all arguments IMHO.

Member Author

done

Comment thread python/KNI.py Outdated
output_buffer = ctypes.create_string_buffer(max_output_length)
ret_code = self._lib.KNIRecodeStreamRecord(
stream_handle,
input_record.encode("utf-8"),

Why can't we accept an input_record of type bytes, in which case the encoding to UTF-8 would not apply anymore?

Member Author

accept both bytes and string

Comment thread python/KNI.py Outdated
if ret_code == self.KNI_OK:
    return ret_code, output_buffer.value.decode("utf-8")
else:
    return ret_code, ""

I'd be inclined to have the return string set to None rather than the empty string for the error case.

Member Author

done

Comment thread python/KNI.py Outdated
KNI_OK on success, negative error code on failure
"""
return self._lib.KNISetSecondaryHeaderLine(
stream_handle, data_path.encode("utf-8"), header_line.encode("utf-8")

Why couldn't we accept the data_path and / or (especially) header_line of type bytes?

Member Author

done for all arguments of type string
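
Accepting both str and bytes for every string argument could be factored into one helper, roughly as below. The name `as_bytes` is hypothetical; only the str-or-bytes behavior comes from the discussion above.

```python
def as_bytes(value, name):
    """Return value as bytes: str is UTF-8 encoded, bytes passes through unchanged."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode("utf-8")
    raise TypeError(f"{name} must be str or bytes, not {type(value).__name__}")
```

With such a helper, a call site would wrap each string argument, for example `as_bytes(header_line, "header_line")`, instead of calling `.encode("utf-8")` directly.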

Comment thread python/KNI.py Outdated
Returns:
KNI_OK on success, negative error code on failure
"""
return self._lib.KNIFinishOpeningStream(stream_handle)

Could / should we check, in the Python code, that the preconditions are met (secondary headers and external tables set) before calling the C-level function? The advantage of this would be that we make sure we call into the C-level functions only if all preconditions are met.

Member Author

I disagree: the check is done in the shared library. It would be duplicated work.

Comment thread python/KNI.py
Set the maximum amount of memory (in MB) for stream opening.

Args:
max_mb: Maximum memory in MB

Check the type of max_mb before calling into the C-level function.

Member Author

done
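
For integer arguments such as max_mb, a type check might look like the sketch below. The helper name `check_int` and the optional lower bound are assumptions; note that `bool` is a subclass of `int` in Python, so it is excluded explicitly.

```python
def check_int(value, name, minimum=None):
    """Validate that value is a plain int (bool excluded), optionally bounded below."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"{name} must be int, not {type(value).__name__}")
    if minimum is not None and value < minimum:
        raise ValueError(f"{name} must be >= {minimum}, got {value}")
    return value
```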

Comment thread python/KNIRecodeFile.py Outdated
# Set error log file
if error_file_name:
    ret_code = kni.set_log_file_name(error_file_name)
    if ret_code != KNI.KNI_OK:
@popescu-v popescu-v May 12, 2026


In Python we tend to use exceptions rather than error codes AFAIK. And the Python wrapper would IMHO be an opportunity to provide a least-surprise Python interface.

Member Author

OK
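
A sketch of how return codes could be turned into exceptions. It assumes, as the docstrings quoted in this review state, that failures are reported as negative codes; the helper name `check` and the message format are hypothetical.

```python
class KNIError(Exception):
    """Raised when a KNI call reports failure; keeps the numeric code."""

    def __init__(self, error_code, message=None):
        self.error_code = error_code
        super().__init__(message or f"KNI call failed with error code {error_code}")


def check(ret_code):
    """Return ret_code unchanged on success, raise KNIError on a negative code."""
    if ret_code < 0:  # assumption: negative values signal failure
        raise KNIError(ret_code)
    return ret_code
```

Callers can then write `handle = check(raw_call(...))` and catch `KNIError` once at the top level, reading `error_code` when the numeric value matters.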

Comment thread python/KNIRecodeFile.py Outdated
) as output_file:

# Read header line
header_line = input_file.readline().rstrip("\n\r")

.rstrip() without argument suffices, as it removes, by default, any trailing whitespace. And "whitespace" in Python includes the line feeds and carriage returns: https://docs.python.org/3/library/string.html#string.whitespace.

Member Author

done
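
One caveat that may be worth checking before relying on a bare `.rstrip()`: when the field separator is a tab, trailing tabs are whitespace too, so empty trailing fields are stripped along with the line terminator. The sample line below is made up for illustration.

```python
line = "id\tname\t\t\r\n"  # made-up record whose last two fields are empty

# Bare rstrip() also strips the trailing tabs, dropping the empty fields.
stripped_all = line.rstrip()

# Restricting the strip to line terminators keeps every separator.
stripped_eol = line.rstrip("\r\n")

print(len(stripped_all.split("\t")), len(stripped_eol.split("\t")))  # prints: 2 4
```

If records can end with empty fields, keeping the explicit `"\r\n"` argument preserves the field count.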

Comment thread python/KNIRecodeFile.py Outdated
record_number = 0
for line_number, line in enumerate(input_file, start=2):
    # Remove trailing newline/carriage return
    input_record = line.rstrip("\n\r")

.rstrip() alone suffices (see above).

Member Author

done

Comment thread python/KNIRecodeFile.py Outdated

if ret_code == KNI.KNI_OK:
# Write output record
output_file.write(output_record + "\n")

I'd use f"{output_record}\n".

Member Author

done

Comment thread python/KNIRecodeFile.py Outdated
def main():
    """Main entry point for command-line execution."""
    # Check command-line arguments
    if len(sys.argv) < 5:

I'd use argparse (https://docs.python.org/3/library/argparse.html#module-argparse) because it provides argument (type) checking and help and is more standard Python. I'd keep sys.argv for quick one-shot scripts with a rather simple argument list (like, a couple of arguments maximum).

Member Author

done

Comment thread python/KNIRecodeFile.py Outdated
"""Main entry point for command-line execution."""
# Check command-line arguments
if len(sys.argv) < 5:
print(

I'd distinguish two cases for usage / help information:

  1. fewer than required arguments are provided, in which case the message should indeed go to stderr;
  2. the user requests help via e.g. the --help option, in which case the message should go to stdout IMHO, so that the user can easily pipe that output to other commands, like e.g. less or grep.

argparse handles all this with minimum hassle AFAIK.

Member Author

done
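
A sketch of an argparse-based command line for the single-table example. The four positional arguments are inferred from the `len(sys.argv) < 5` check and their names, like the optional `--error-file` flag, are hypothetical. argparse itself prints `--help` output to stdout and exits with status 0, while usage errors go to stderr with exit status 2.

```python
import argparse


def build_parser():
    """Build the command-line parser; argument names are illustrative only."""
    parser = argparse.ArgumentParser(
        description="Recode a data file with a Khiops dictionary (sketch)."
    )
    parser.add_argument("dictionary_file", help="path to the dictionary file")
    parser.add_argument("dictionary_name", help="name of the dictionary to use")
    parser.add_argument("input_file", help="path to the input data file")
    parser.add_argument("output_file", help="path to the recoded output file")
    parser.add_argument(
        "-e", "--error-file", default=None, help="optional error log file"
    )
    return parser
```

In a script, `args = build_parser().parse_args()` would replace the manual `sys.argv` length check.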

Comment thread python/KNIRecodeMTFiles.py Outdated
Comment on lines +167 to +172
main_file.readline()

for line_number, line in enumerate(main_file, start=2):
    main_record = line.rstrip("\n\r")
    if not main_record:
        continue

I would have simplified header skip to:

for line_number, line in enumerate(main_file, start=1):
    # Skip header
    if line_number == 1:
        continue

    # Get main record
    main_record = line.rstrip()
    ...

Member Author

done
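
The header skip can be tried in isolation with an in-memory file. In this sketch the generator name `iter_records` and the sample data are made up, and stripping only line terminators (rather than all whitespace) is a choice of the sketch.

```python
import io


def iter_records(lines):
    """Yield (line_number, record) for each non-empty line after the header."""
    for line_number, line in enumerate(lines, start=1):
        if line_number == 1:
            continue  # skip header
        record = line.rstrip("\r\n")
        if record:
            yield line_number, record


main_file = io.StringIO("id\tvalue\n1\ta\n\n2\tb\n")
print(list(iter_records(main_file)))  # prints: [(2, '1\ta'), (4, '2\tb')]
```

Line numbers stay aligned with the file, so they remain usable in error messages even when blank lines are skipped.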

Comment thread README.md
The purpose of KNI is to allow a deeper integration of Khiops in information systems, by means of the C programming language, using a shared library (`.dll` in Windows, `.so` in Linux). This relates especially to the problem of model deployment, which otherwise requires the use of input and output data files when using directly the Khiops tool in batch mode. See Khiops Guide for an introduction to dictionary files, dictionaries, database files and deployment.

- The Khiops deployment API is thus made public through a shared library. Therefore, a Khiops model can be deployed directly from any programming language, such as C, C++, Java, Python, Matlab, etc. This enables real time model deployment without the overhead of temporary data files or launching executables. This is critical for certain applications, such as marketing or targeted advertising on the web..
+ The Khiops deployment API is thus made public through a shared library. Therefore, a Khiops model can be deployed directly from any programming language, such as C, C++, Java, Python, Matlab, etc. This enables real-time model deployment without the overhead of temporary data files or launching executables. This is critical for certain applications, such as marketing or targeted advertising on the web.

I'd state somewhere that the API is compatible with ISO C99 (or the applicable standard if different, e.g. ANSI / ISO C89).

Member Author

done

Comment thread README.md
@@ -14,15 +14,32 @@ See [KhiopsNativeInterface.h](include/KhiopsNativeInterface.h) for a detailed de
> [!CAUTION]
> The functions are not reentrant (thread-safe): the library can be used simultaneously by several executables, but not simultaneously by several threads in the same executable.

Could we suggest using multiprocessing e.g. via MPI if parallelization is needed?

Member Author

I think it's not necessary.

Comment thread README.md

## Requirements

- Python 3.6 or later

I'd start with the earliest still supported version of Python, viz. 3.10 currently.

Member Author

done

@popescu-v popescu-v left a comment

See the comments.
Generally, I would:

  1. type-check the input arguments of the Python API systematically, via isinstance, and raise TypeError in case of erroneous types.
  2. favor raising exceptions rather than returning non-zero error codes in the Python API, to achieve a "least-surprise" API for Python users.
  3. systematically use argparse for dealing with input arguments to the example scripts.

This commit modernizes the Python KNI wrapper to follow Pythonic conventions
by using exceptions for error handling instead of C-style return codes.

Changes to KNI.py:
- Add KNIError exception class with error_code attribute
- Refactor all methods to raise KNIError on failure instead of returning error codes
- open_stream() now returns stream_handle directly (raises on error)
- recode_stream_record() now returns output string directly (no tuple)
- Add comprehensive type checking for all method parameters
- Extend all string parameters to accept both str and bytes types
- Simplify library path resolution to use KNI_HOME/ only

Changes to KNIRecodeFile.py:
- Import KNIError from KNI module
- Remove all error code checking (if ret_code != KNI.KNI_OK)
- Simplify to direct method calls with exception handling
- Fix field separator bug: change "\\t" to "\t" for proper tab character
- Add comprehensive exception handling in main()

Changes to KNIRecodeMTFiles.py:
- Import KNIError from KNI module
- Remove all error code checking throughout
- Simplify stream setup and record processing loops
- Clean exception handling throughout

Benefits:
- More Pythonic and easier to read
- Cleaner code with less boilerplate
- Better integration with Python's try/except patterns
- Type safety with parameter validation
- Support for both str and bytes in string parameters