
Synthetic Data Forge – RAG-Powered Custom Dataset Generator #14

@anikchand461

Hi Appwrite team/community! I'm Anik Chand, an aspiring ML engineer from Kolkata, India (B.Tech CSE at Haldia Institute of Technology, CGPA 8.59). I have hands-on experience with GANs (TensorFlow/Keras for synthetic handwritten digits on MNIST, including training visualization pipelines), RAG chatbots (FastAPI/LangChain/Gemini API, deployed on Render for portfolio Q&A), and ensemble classifiers (Scikit-learn/TF-IDF, 89% accuracy on sentiment segmentation), and I'm pumped to contribute to Hacktoberfest 2025. Check my GitHub for full projects like Fake Handwritten Digits Generation and AkBOT.

Project Overview

A web app where users describe the dataset they need (e.g., "Generate 1K synthetic medical records for privacy-safe ML training"). A RAG system pulls from public schemas/examples to guide a GAN in creating realistic, exportable data. The app also validates the output with quick ensemble classifiers and offers a dashboard for previews, which makes it a good fit for data-scarce ML projects.
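To make the flow concrete, here is a minimal sketch of prompt -> schema retrieval -> generation -> CSV export. It is only an illustration under loud assumptions: the `SCHEMAS` dictionary, the keyword-overlap retrieval, and the random sampler are hypothetical stand-ins for the real RAG retriever and the GAN, and none of the names come from an existing codebase.

```python
import csv
import io
import random

# Hypothetical schema library standing in for "public schemas/examples".
SCHEMAS = {
    "medical records": {
        "age": ("int", 18, 90),
        "systolic_bp": ("int", 90, 180),
        "diagnosis": ("choice", ["hypertension", "diabetes", "asthma"]),
    },
    "retail transactions": {
        "amount": ("float", 1.0, 500.0),
        "channel": ("choice", ["web", "store", "app"]),
    },
}


def retrieve_schema(prompt):
    """Toy retrieval: pick the schema whose name shares the most words with the prompt."""
    words = set(prompt.lower().split())

    def overlap(name):
        return len(words & set(name.split()))

    best = max(SCHEMAS, key=overlap)
    if overlap(best) == 0:
        raise ValueError("no schema matches the prompt")
    return best, SCHEMAS[best]


def generate_rows(schema, n, seed=0):
    """Stand-in for the GAN: sample each column independently from its spec."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {}
        for column, spec in schema.items():
            kind = spec[0]
            if kind == "int":
                row[column] = rng.randint(spec[1], spec[2])
            elif kind == "float":
                row[column] = round(rng.uniform(spec[1], spec[2]), 2)
            else:  # "choice"
                row[column] = rng.choice(spec[1])
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize the generated rows to CSV text, ready for export."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


name, schema = retrieve_schema(
    "Generate 1K synthetic medical records for privacy-safe ML training"
)
csv_text = to_csv(generate_rows(schema, 1000, seed=1))
```

In the real app the sampler would be replaced by a trained generator, and the retrieval step by an embedding-based lookup; the sketch only fixes the shape of the interfaces between the stages.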

Key Features & Appwrite Integration

  • Auth: User accounts to save/share generated datasets securely.
  • Databases: Store user prompts, generation params, and validation metrics (e.g., accuracy logs from Scikit-learn).
  • Storage: Upload/export CSVs/JSONs or GAN artifacts (images/models).
  • Functions: Serverless GAN training/inference (Python runtime with tf.GradientTape) and RAG queries (LangChain for prompt enhancement).
  • Bonus: Realtime progress updates via Messaging for long-running generation jobs.
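For the Functions item, one common serverless pattern is to load the generator once per process and reuse it across warm invocations. The sketch below assumes the Appwrite Python runtime's `main(context)` entry point with `context.log` and `context.res.json`, and assumes the runtime keeps the process alive between calls; `get_generator` and its placeholder payload are hypothetical, and a real version would load Keras weights bundled with the deployment.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=1)
def get_generator():
    """Load the generator once per process (placeholder loader, hypothetical)."""
    # Assumption: a real function would call something like
    # tf.keras.models.load_model(...) on a file shipped with the deployment.
    time.sleep(0.05)  # simulate a slow weight load
    return {"name": "toy-generator", "loaded_at": time.time()}


def main(context):
    """Function entry point: warm invocations hit the cached generator."""
    started = time.perf_counter()
    generator = get_generator()
    elapsed_ms = (time.perf_counter() - started) * 1000
    context.log(f"generator ready in {elapsed_ms:.1f} ms")
    return context.res.json(
        {"model": generator["name"], "load_ms": round(elapsed_ms, 1)}
    )
```

This does not remove the first cold start; it only keeps later calls in the same process from paying the load cost again, which is why the question about pre-loaded weights still matters.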

Tech: Python/FastAPI backend, Gradio for interactive UI (Matplotlib previews), TensorFlow/NumPy/Scikit-learn. Deployed on Appwrite Sites, with 4+ Appwrite services providing the core functionality.

Questions for Feedback

  • How to handle large GAN models in Appwrite Functions (e.g., avoiding cold starts with pre-loaded weights)?
  • Best way to integrate Appwrite Databases as a vector store for RAG (instead of ChromaDB)?
  • Ideas to boost creativity/impact, like Hugging Face model swaps or collaborative dataset sharing?
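On the vector-store question, one possible fallback is a brute-force search in application code, assuming no built-in similarity query is available in Databases. The sketch assumes each document has a `text` string and an `embedding` float-array attribute (hypothetical attribute names) and that the documents have already been fetched as plain dictionaries.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_embedding, documents, k=3):
    """Rank documents by similarity to the query and return (score, text) pairs."""
    scored = [
        (cosine_similarity(query_embedding, doc["embedding"]), doc)
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(round(score, 4), doc["text"]) for score, doc in scored[:k]]


# Toy documents shaped like rows fetched from a collection (hypothetical data).
documents = [
    {"text": "patient schema", "embedding": [1.0, 0.0, 0.0]},
    {"text": "retail schema", "embedding": [0.0, 1.0, 0.0]},
    {"text": "clinic visits", "embedding": [0.9, 0.1, 0.0]},
]
best_matches = top_k([1.0, 0.0, 0.0], documents, k=2)
```

This scans every document per query, so it likely only makes sense for a small schema library; a dedicated vector store such as ChromaDB would still be the better fit once the corpus grows.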

Would love your input to refine the idea before prototyping. A prototype repo is coming soon!

Thanks!
Anik Chand
mail : anikchand461@gmail.com
portfolio : https://portfolio-fawn-beta-28.vercel.app/
linkedin : https://www.linkedin.com/in/anik-chand-3b14b12b6/
github : https://github.com/anikchand461
