The input and output dataclasses define the columns that must be present in the dataframes passed into and out of each vectorstore method/API.
The `id` column is effectively the unique row identifier for several of the input dataframes: `embed`, `search` and `reverse_search`. Currently there are no constraints on the column's content. If the user supplies non-unique values in this column, it causes issues with the server rest_api processing. Having a unique ID value for each row is also more intuitive and clear. The pandera models should be updated to enforce uniqueness, or it should be enforced in another appropriate way.