Add Python examples for KNI usage #1
Add Python implementations demonstrating single-table and multi-table recoding with the Khiops Native Interface (KNI).

New files:
- python/KNI.py: Complete ctypes wrapper for the KhiopsNativeInterface library with automatic library discovery (KNI_HOME, system paths)
- python/KNIRecodeFile.py: Single-table recoding example
- python/KNIRecodeMTFiles.py: Multi-table recoding example with support for secondary tables and external tables

Features:
- Cross-platform support (Windows, Linux, macOS)
- Flexible library loading with multiple search strategies
- Complete API coverage including multi-table operations
- Command-line interface with argparse for multi-table example
- Error handling with descriptive messages

Requirements: Python 3.6+ and KNI shared library installed
| return ctypes.CDLL(lib_name) | ||
| except OSError: | ||
| # Try to find in KNI_HOME environment variable | ||
| import os |
Import os at the beginning of the module to be PEP8-compatible.
| / lib_name, # KNI_HOME directly (if set to lib/bin directory) | ||
| ] | ||
|
|
||
| if system == "Windows": |
Why do we need to maintain different "lib" / "bin" path priorities for Windows vs. UNIX?
Keep only one path: KNI_HOME
| # Try standard installation paths | ||
| if system == "Linux": | ||
| standard_paths = [ | ||
| f"/usr/lib/{lib_name}", |
I'd use either os.path.join("/usr", "lib", lib_name) or Path("/usr") / "lib" / lib_name; likewise for the other standard path.
| try: | ||
| return ctypes.CDLL(path) | ||
| except OSError: | ||
| continue |
Why not proceed as above? For instance:
    for path in standard_paths:
        if os.path.exists(path):
            return ctypes.CDLL(path)
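The lookup discussed in the comments above could be sketched roughly as follows. This is only an illustration: `candidate_paths`, `load_first_existing`, their parameters and the default `standard_dirs` are hypothetical and are not code from the pull request. The loader is passed in as an argument so that the search logic can be exercised without the real shared library.

```python
import ctypes
import os
from pathlib import Path


def candidate_paths(lib_name, environ=None, standard_dirs=("/usr/lib",)):
    """Return the paths to try: KNI_HOME first, then standard directories.

    `standard_dirs` is an assumed default, not the list used in the PR.
    """
    environ = os.environ if environ is None else environ
    paths = []
    kni_home = environ.get("KNI_HOME")
    if kni_home:
        # Single KNI_HOME location, as agreed in the review
        paths.append(Path(kni_home) / lib_name)
    paths.extend(Path(directory) / lib_name for directory in standard_dirs)
    return paths


def load_first_existing(paths, loader=ctypes.CDLL):
    """Load the first path that exists; raise OSError if none does."""
    for path in paths:
        if os.path.exists(path):
            return loader(str(path))
    raise OSError(f"library not found in: {[str(p) for p in paths]}")
```

With this shape, a missing file is skipped by the existence check, while a file that exists but fails to load raises `OSError` from the loader instead of being silently ignored.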
| if isinstance(field_separator, str) | ||
| else field_separator |
If field_separator is of type bytes (what other legit type could it have, besides str?), we still need to get the first byte to have a C-language char, viz. field_separator[0], right? For other types (e.g. int or float), we should raise an exception IMHO.
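A minimal sketch of the normalization suggested here, assuming the C side expects a single `char`. The helper name `to_c_char` is hypothetical, and rejecting an empty separator with `ValueError` is an added assumption rather than something stated in the review.

```python
def to_c_char(field_separator):
    """Normalize a field separator to one byte, suitable for a C `char`."""
    if isinstance(field_separator, str):
        raw = field_separator.encode("utf-8")
    elif isinstance(field_separator, bytes):
        raw = field_separator
    else:
        raise TypeError(
            "field_separator must be str or bytes, "
            f"not {type(field_separator).__name__}"
        )
    if not raw:
        # Assumption: an empty separator is treated as an error
        raise ValueError("field_separator must not be empty")
    # Keep only the first byte; a multi-byte character would be truncated
    return raw[:1]
```

Slicing with `raw[:1]` returns a `bytes` object of length one, whereas indexing with `raw[0]` would return an `int`.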
| Recode an input record using the stream's dictionary. | ||
|
|
||
| Args: | ||
| stream_handle: Handle returned by open_stream |
IMHO, we should check the types of stream_handle, input_record and max_output_length before attempting to use them. And this goes for all API functions and all arguments IMHO.
| output_buffer = ctypes.create_string_buffer(max_output_length) | ||
| ret_code = self._lib.KNIRecodeStreamRecord( | ||
| stream_handle, | ||
| input_record.encode("utf-8"), |
Why can't we accept an input_record of type bytes, in which case the encoding to UTF-8 would not apply anymore?
accept both bytes and string
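Accepting both types could look like the sketch below. The helper name `as_bytes` and its `name` parameter are hypothetical; the UTF-8 encoding for `str` values follows the `input_record.encode("utf-8")` call in the diff.

```python
def as_bytes(value, name):
    """Return `value` as bytes: bytes pass through, str is UTF-8 encoded."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode("utf-8")
    raise TypeError(f"{name} must be str or bytes, not {type(value).__name__}")
```

A wrapper method could then call `as_bytes(input_record, "input_record")` before passing the result to the C function.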
| if ret_code == self.KNI_OK: | ||
| return ret_code, output_buffer.value.decode("utf-8") | ||
| else: | ||
| return ret_code, "" |
I'd be inclined to have the return string set to None rather than the empty string for the error case.
| KNI_OK on success, negative error code on failure | ||
| """ | ||
| return self._lib.KNISetSecondaryHeaderLine( | ||
| stream_handle, data_path.encode("utf-8"), header_line.encode("utf-8") |
Why couldn't we accept the data_path and / or (especially) header_line of type bytes?
done for all arguments of type string
| Returns: | ||
| KNI_OK on success, negative error code on failure | ||
| """ | ||
| return self._lib.KNIFinishOpeningStream(stream_handle) |
Could / should we check, in the Python code that the preconditions are met (secondary headers and external tables set) before calling the C-level function? The advantage of this would be that we make sure we call into the C-level functions only if all preconditions are met.
I disagree: the check is done in the shared library. It would be duplicated work.
| Set the maximum amount of memory (in MB) for stream opening. | ||
|
|
||
| Args: | ||
| max_mb: Maximum memory in MB |
Check the type of max_mb before calling into the C-level function.
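One way to perform such a check, as a sketch only: `check_int` and its `minimum` parameter are hypothetical names. Note that `bool` is a subclass of `int` in Python, so it is rejected explicitly here.

```python
def check_int(value, name, minimum=None):
    """Validate that `value` is a plain int, optionally bounded below."""
    # bool is a subclass of int, so isinstance(True, int) is True
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"{name} must be int, not {type(value).__name__}")
    if minimum is not None and value < minimum:
        raise ValueError(f"{name} must be >= {minimum}, got {value}")
    return value
```

For example, `check_int(max_mb, "max_mb", minimum=0)` would run before calling into the C-level function.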
| # Set error log file | ||
| if error_file_name: | ||
| ret_code = kni.set_log_file_name(error_file_name) | ||
| if ret_code != KNI.KNI_OK: |
In Python we tend to use exceptions rather than error codes AFAIK. And the Python wrapper would IMHO be an opportunity to provide a least-surprise Python interface.
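A sketch of what an exception-based interface might look like. The `KNIError` name and its `error_code` attribute match the follow-up commit message; the helper `check_ret_code` is hypothetical, and the rule "negative means failure" is taken from the docstrings quoted in the diff.

```python
class KNIError(Exception):
    """Raised when a KNI call reports a failure."""

    def __init__(self, error_code, message=None):
        self.error_code = error_code
        super().__init__(message or f"KNI call failed with error code {error_code}")


def check_ret_code(ret_code, action="KNI call"):
    """Return `ret_code` unchanged, or raise KNIError if it is negative."""
    if ret_code < 0:
        raise KNIError(ret_code, f"{action} failed with error code {ret_code}")
    return ret_code
```

Callers can then wrap a sequence of calls in a single `try`/`except KNIError` block instead of testing each return code.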
| ) as output_file: | ||
|
|
||
| # Read header line | ||
| header_line = input_file.readline().rstrip("\n\r") |
.rstrip() without argument suffices, as it removes, by default, any trailing whitespace. And "whitespace" in Python includes the line feeds and carriage returns: https://docs.python.org/3/library/string.html#string.whitespace.
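One caveat for delimited records: because `str.rstrip()` without an argument strips all trailing whitespace, it also removes trailing tabs and spaces. That may matter when the field separator is a tab and the last fields of a record are empty. A small illustration with a made-up record:

```python
line = "id1\tvalue\t\t\r\n"  # made-up record whose last two fields are empty

stripped_all = line.rstrip()        # also removes the trailing tabs
stripped_eol = line.rstrip("\r\n")  # removes only the line terminator

print(len(stripped_all.split("\t")))  # 2
print(len(stripped_eol.split("\t")))  # 4
```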
| record_number = 0 | ||
| for line_number, line in enumerate(input_file, start=2): | ||
| # Remove trailing newline/carriage return | ||
| input_record = line.rstrip("\n\r") |
.rstrip() alone suffices (see above).
|
|
||
| if ret_code == KNI.KNI_OK: | ||
| # Write output record | ||
| output_file.write(output_record + "\n") |
| def main(): | ||
| """Main entry point for command-line execution.""" | ||
| # Check command-line arguments | ||
| if len(sys.argv) < 5: |
I'd use argparse (https://docs.python.org/3/library/argparse.html#module-argparse) because it provides argument (type) checking and help and is more standard Python. I'd keep sys.argv for quick one-shot scripts with a rather simple argument list (like, a couple of arguments maximum).
| """Main entry point for command-line execution.""" | ||
| # Check command-line arguments | ||
| if len(sys.argv) < 5: | ||
| print( |
I'd distinguish two cases for usage / help information:
- fewer than required arguments are provided, in which case the message should indeed go to stderr;
- the user requests help via e.g. the --help option, in which case the message should go to stdout IMHO, so that the user can easily pipe that output to other commands, like e.g. less or grep.

argparse handles all this with minimum hassle AFAIK.
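A minimal sketch of the two cases with `argparse`. The argument names below are hypothetical, since the original script's positional arguments are not shown in this excerpt. `argparse` prints `--help` output to stdout and exits with status 0, while a usage error goes to stderr with exit status 2.

```python
import argparse


def build_parser():
    """Build a parser for a single-table recoding script (illustrative)."""
    parser = argparse.ArgumentParser(
        description="Recode a data file with a Khiops dictionary (sketch)."
    )
    # Hypothetical argument names, for illustration only
    parser.add_argument("dictionary_file", help="path to the dictionary file")
    parser.add_argument("dictionary_name", help="name of the dictionary to use")
    parser.add_argument("input_file", help="path to the input data file")
    parser.add_argument("output_file", help="path to the recoded output file")
    parser.add_argument("--error-file", default=None, help="optional error log file")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```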
| main_file.readline() | ||
|
|
||
| for line_number, line in enumerate(main_file, start=2): | ||
| main_record = line.rstrip("\n\r") | ||
| if not main_record: | ||
| continue |
I would have simplified the header skip to:
    for line_number, line in enumerate(main_file, start=1):
        # Skip header
        if line_number == 1:
            continue
        # Get main record
        main_record = line.rstrip()
        ...
| The purpose of KNI is to allow a deeper integration of Khiops in information systems, by means of the C programming language, using a shared library (`.dll` in Windows, `.so` in Linux). This relates especially to the problem of model deployment, which otherwise requires the use of input and output data files when using directly the Khiops tool in batch mode. See Khiops Guide for an introduction to dictionary files, dictionaries, database files and deployment. | ||
|
|
||
| The Khiops deployment API is thus made public through a shared library. Therefore, a Khiops model can be deployed directly from any programming language, such as C, C++, Java, Python, Matlab, etc. This enables real time model deployment without the overhead of temporary data files or launching executables. This is critical for certain applications, such as marketing or targeted advertising on the web.. | ||
| The Khiops deployment API is thus made public through a shared library. Therefore, a Khiops model can be deployed directly from any programming language, such as C, C++, Java, Python, Matlab, etc. This enables real-time model deployment without the overhead of temporary data files or launching executables. This is critical for certain applications, such as marketing or targeted advertising on the web. |
I'd state somewhere that the API is compatible with ISO C99 (or the applicable standard if different, e.g. ANSI / ISO C89).
| @@ -14,15 +14,32 @@ See [KhiopsNativeInterface.h](include/KhiopsNativeInterface.h) for a detailed de | |||
| > [!CAUTION] | |||
| > The functions are not reentrant (thread-safe): the library can be used simultaneously by several executables, but not simultaneously by several threads in the same executable. | |||
Could we suggest using multiprocessing e.g. via MPI if parallelization is needed?
I think it's not necessary.
|
|
||
| ## Requirements | ||
|
|
||
| - Python 3.6 or later |
I'd start with the earliest still supported version of Python, viz. 3.10 currently.
popescu-v left a comment
See the comments.
Generally, I would:
- type-check the input arguments of the Python API systematically, via isinstance, and raise TypeError in case of erroneous types.
- favor raising exceptions rather than returning non-zero error codes in the Python API, to achieve a "least-surprise" API for Python users.
- systematically use argparse for dealing with input arguments to the example scripts.
This commit modernizes the Python KNI wrapper to follow Pythonic conventions by using exceptions for error handling instead of C-style return codes.

Changes to KNI.py:
- Add KNIError exception class with error_code attribute
- Refactor all methods to raise KNIError on failure instead of returning error codes
- open_stream() now returns stream_handle directly (raises on error)
- recode_stream_record() now returns output string directly (no tuple)
- Add comprehensive type checking for all method parameters
- Extend all string parameters to accept both str and bytes types
- Simplify library path resolution to use KNI_HOME/ only

Changes to KNIRecodeFile.py:
- Import KNIError from KNI module
- Remove all error code checking (if ret_code != KNI.KNI_OK)
- Simplify to direct method calls with exception handling
- Fix field separator bug: change "\\t" to "\t" for proper tab character
- Add comprehensive exception handling in main()

Changes to KNIRecodeMTFiles.py:
- Import KNIError from KNI module
- Remove all error code checking throughout
- Simplify stream setup and record processing loops
- Clean exception handling throughout

Benefits:
- More Pythonic and easier to read
- Cleaner code with less boilerplate
- Better integration with Python's try/except patterns
- Type safety with parameter validation
- Support for both str and bytes in string parameters
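The field separator fix mentioned in this commit comes down to how Python reads the two literals. The variable names below are illustrative only, and the last line assumes the separator is later reduced to its first byte to form a C `char`, as discussed in the review.

```python
escaped = "\\t"  # two characters: a backslash followed by the letter t
tab = "\t"       # one character: the tab character

print(len(escaped), len(tab))  # 2 1
print(escaped.encode("utf-8")[:1], tab.encode("utf-8")[:1])  # b'\\' b'\t'
```

Under that assumption, the escaped form would make the backslash the separator rather than the tab.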