feat: Add multi-format file upload support for RAG (PDF, TXT, JSON, TSV) #91

@yash12991

Overview

This PR adds support for multiple file formats (PDF, TXT, JSON, TSV) to the LocalMind RAG system, addressing the "Coming Soon" items mentioned in the README.

Changes Made

New Features

  • Support for PDF, TXT, JSON, and TSV file formats in addition to CSV
  • Automatic file format detection based on extension and MIME type
  • File upload functionality using multer middleware
  • File validation covering type, size, and format
  • New endpoint: GET /api/v1/dataset/formats to list supported formats
  • Automatic file cleanup after processing or on error
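
As a rough illustration of the detection and validation features listed above, the sketch below resolves a format from the file extension, falls back to the MIME type, and then applies a size check. It is not the code from DataSet.fileLoader.ts or DataSet.validator.ts: the names `SupportedFormat`, `detectFormat`, `validateUpload`, the extension-first precedence, and the 10 MB limit are all assumptions made for the example.

```typescript
// Hypothetical sketch; names, precedence, and limits are assumptions, not the PR's code.
type SupportedFormat = "csv" | "pdf" | "txt" | "json" | "tsv";

const EXTENSION_MAP: Record<string, SupportedFormat> = {
  ".csv": "csv",
  ".pdf": "pdf",
  ".txt": "txt",
  ".json": "json",
  ".tsv": "tsv",
};

const MIME_MAP: Record<string, SupportedFormat> = {
  "text/csv": "csv",
  "application/pdf": "pdf",
  "text/plain": "txt",
  "application/json": "json",
  "text/tab-separated-values": "tsv",
};

// Resolve the format from the extension first, then fall back to the MIME type.
// Extension-first is an assumption: browsers often report TSV files as text/plain.
function detectFormat(fileName: string, mimeType?: string): SupportedFormat | null {
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot).toLowerCase() : "";
  const byExtension = EXTENSION_MAP[ext];
  if (byExtension) return byExtension;

  // Drop parameters such as "; charset=utf-8" before the lookup.
  const mime = ((mimeType ?? "").split(";")[0] ?? "").trim().toLowerCase();
  return MIME_MAP[mime] ?? null;
}

interface UploadInfo {
  originalName: string;
  mimeType?: string;
  sizeBytes: number;
}

type ValidationResult =
  | { ok: true; format: SupportedFormat }
  | { ok: false; error: string };

// Assumed limit for illustration only; the PR does not state the real value.
const MAX_SIZE_BYTES = 10 * 1024 * 1024;

// Check type and size, returning either the detected format or an error message.
function validateUpload(file: UploadInfo, maxBytes: number = MAX_SIZE_BYTES): ValidationResult {
  const format = detectFormat(file.originalName, file.mimeType);
  if (format === null) {
    return { ok: false, error: `Unsupported file type: ${file.originalName}` };
  }
  if (file.sizeBytes <= 0) {
    return { ok: false, error: "File is empty" };
  }
  if (file.sizeBytes > maxBytes) {
    return { ok: false, error: `File exceeds ${maxBytes} bytes` };
  }
  return { ok: true, format };
}
```

In a multer-based route, the `originalname`, `mimetype`, and `size` fields of the uploaded file object would map onto `UploadInfo`.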

Files Added

  • DataSet.fileLoader.ts - Universal file loader supporting all formats
  • DataSet.multer.ts - Multer configuration for file uploads
  • DataSet.validator.ts - File validation middleware
  • DataSet.type.ts - TypeScript interfaces and enums
  • README.md - Complete documentation with API examples
  • test-samples/ - Sample files for each supported format

Files Modified

  • DataSet.controller.ts - Updated with multi-format support
  • DataSet.routes.ts - Changed from GET to POST with file upload
  • package.json - Added pdf-parse, multer dependencies

Dependencies Added

  • pdf-parse - For PDF document parsing
  • multer - For handling file uploads
  • @types/multer - TypeScript types for multer

Breaking Changes

⚠️ API endpoint changed: the upload endpoint moved from GET /api/v1/dataset/upload to POST /api/v1/dataset/upload and now expects a multipart/form-data body.

Migration:

# Before
GET /api/v1/dataset/upload

# After
POST /api/v1/dataset/upload
Content-Type: multipart/form-data
Body: file (form field)
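
For client code that needs to migrate, a minimal sketch of the new request could look like the following. The path and the `file` form field come from the migration notes above; the helper name `buildUploadRequest`, the base URL in the usage comment, and the use of the global `FormData`, `Blob`, and `fetch` (available in browsers and Node 18+) are assumptions for illustration.

```typescript
// Hypothetical helper; not part of the PR. Builds the POST request described above.
interface UploadRequest {
  url: string;
  init: { method: "POST"; body: FormData };
}

function buildUploadRequest(
  baseUrl: string,
  fileName: string,
  contents: string,
  mimeType: string,
): UploadRequest {
  const form = new FormData();
  // The form field must be named "file", per the migration notes.
  form.append("file", new Blob([contents], { type: mimeType }), fileName);

  // Content-Type is deliberately not set by hand: fetch adds the
  // multipart/form-data header together with the required boundary.
  return {
    url: `${baseUrl}/api/v1/dataset/upload`,
    init: { method: "POST", body: form },
  };
}

// Usage (the base URL is an assumed local development address):
//   const { url, init } = buildUploadRequest("http://localhost:3000", "notes.txt", "hello", "text/plain");
//   const response = await fetch(url, init);
```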
