Inputs
- document in Markdown format
- embedding configuration
Outputs
- document embedded as chunks stored in a vector DB
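As a rough illustration of the input-to-chunk step, the sketch below splits a Markdown document at its headings and caps each piece at a maximum length. Everything named here (`EmbeddingConfig`, its fields, `chunk_markdown`) is a hypothetical placeholder, not a decided design, and the actual embedding and vector DB write are left out because neither the model nor the store has been chosen yet.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class EmbeddingConfig:
    # Hypothetical fields for illustration only; real values are still TBD.
    model_name: str = "placeholder-model"
    max_chars: int = 1000  # upper bound on chunk length, in characters


def chunk_markdown(text: str, config: EmbeddingConfig) -> List[str]:
    """Split at Markdown headings, then cap each section at max_chars."""
    # Zero-width split just before any line that starts with 1-6 '#' and a space.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks: List[str] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        # Cut over-long sections into fixed-size windows.
        for start in range(0, len(section), config.max_chars):
            chunks.append(section[start:start + config.max_chars])
    return chunks


doc = "# A\nintro\n## B\nbody"
chunks = chunk_markdown(doc, EmbeddingConfig())
```

Splitting on headings keeps each chunk tied to one section of the document; whether that granularity suits the downstream task is one of the open questions.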
Success metrics (TBD)
Elements to help us define success:
- which downstream task needs the vectors: similar-document search, general search, RAG, something else?
- what storage footprint and vector size can we afford?
- review the MTEB embedding leaderboard: https://huggingface.co/spaces/mteb/leaderboard
- where do we want to store the vectors (ES?)
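To make the storage and vector-size question concrete, a back-of-the-envelope estimate can help. The sketch below assumes dense vectors stored as 4-byte floats and ignores index overhead, metadata, and replication, so real usage will be higher. The chunk count and dimension are made-up illustrative numbers, not measurements.

```python
def estimate_vector_storage_bytes(num_chunks: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw size of the vectors alone: one value per dimension per chunk."""
    return num_chunks * dim * bytes_per_value


# Illustrative assumption: 1,000,000 chunks embedded with 768-dimensional float32 vectors.
raw_bytes = estimate_vector_storage_bytes(num_chunks=1_000_000, dim=768)
raw_gb = raw_bytes / 1e9  # 3.072 GB of raw vector data
```

Halving the dimension or storing 2-byte values halves this figure, which is the main trade-off to weigh when comparing models on the leaderboard.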