Feature Request: support 1 bit Bonsai models #21298

@laurentpayot

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

The new 1-bit Bonsai 8B model has just been released:

1-bit Bonsai 8B implements a proprietary 1-bit model design across the entire network: embeddings, attention layers, MLP layers, and the LM head are all 1-bit. There are no higher-precision escape hatches. It is a true 1-bit model, end to end, across 8.2 billion parameters.
Despite being 14x smaller than the 8B (16-bit) full-precision models in its parameter-count class, it performs competitively on standard benchmarks while operating at radically higher efficiency.
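For context on what "1-bit everywhere" means in practice, here is a minimal, generic sketch of sign-plus-scale weight binarization (in the spirit of BinaryConnect/BitNet-style work). Bonsai's actual scheme is proprietary and is not described here; the function names and the per-tensor mean-absolute-value scale are illustrative assumptions only.

```python
# Illustrative sketch of generic 1-bit weight quantization (sign + scale).
# NOT Bonsai's actual method (which is proprietary) -- this only shows the
# common idea: store one sign bit per weight plus one shared scale factor.

def quantize_1bit(weights):
    """Binarize weights to {-1, +1} with a per-tensor scale (mean |w|)."""
    scale = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, scale

def dequantize_1bit(signs, scale):
    """Reconstruct an approximation of the original weights."""
    return [s * scale for s in signs]

w = [0.4, -0.2, 0.1, -0.5]
signs, scale = quantize_1bit(w)   # signs = [1, -1, 1, -1], scale ≈ 0.3
approx = dequantize_1bit(signs, scale)  # ≈ [0.3, -0.3, 0.3, -0.3]
```

Storing one bit per weight plus a scale is what makes the ~14x size reduction over 16-bit weights plausible; supporting such a format in llama.cpp would presumably mean adding a matching ggml quantization type and dequantization kernel.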

Unfortunately, a fork of llama.cpp is currently needed to run it.

The related feature request #12598 was unfortunately closed. Could support for Bonsai models be added?

Motivation

The performance of this model is impressive, and community reactions have been very positive.

Possible Implementation

No response

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)
