NVIDIA
diff --git a/README.md b/README.md
Lines changed: 9 additions & 1 deletion
diff --git a/example_notebooks/transformers/cite_prompt_logits_processor.ipynb b/example_notebooks/transformers/cite_prompt_logits_processor.ipynb
Lines changed: 21 additions & 5 deletions
diff --git a/example_notebooks/vllm/force_last_phrase_logits_processor.ipynb b/example_notebooks/vllm/force_last_phrase_logits_processor.ipynb
Lines changed: 54 additions & 15 deletions
diff --git a/example_notebooks/vllm/trigger_phrase_logits_processor.ipynb b/example_notebooks/vllm/trigger_phrase_logits_processor.ipynb
Lines changed: 21 additions & 17 deletions
diff --git a/logits_processor_zoo/transformers/base.py b/logits_processor_zoo/transformers/base.py
Lines changed: 51 additions & 0 deletions
@@ -78,4 +78,12 @@ I am getting a lot of calls during the day. What is more important for me to con
 2. Operating System
 3. Battery
 ```
-The goal is to make LLM generate "3" as an answer.
+The goal is to make LLM generate "3" as an answer.
+
+### TriggerPhraseLogitsProcessor
+A logits processor that triggers a phrase when it encounters a given token.
+One common use case is forcing the model to start writing Python code right after it finishes thinking:
+```python
+trigger_python = TriggerPhraseLogitsProcessor(phrase="\n```python", trigger_token_phrase="</think>", 
+                                              tokenizer=tokenizer, trigger_count=1, trigger_after=True)
+```
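To make the trigger behaviour concrete, here is a minimal pure-Python sketch of the idea, not the library's implementation. It assumes scores are a plain list of logits indexed by token id and that the phrase and trigger are already tokenized (`phrase_ids`, `trigger_id`); the class `TriggerPhraseSketch` is hypothetical. It only models forcing the phrase after the trigger token has been generated, capped by a `trigger_count`-style limit.

```python
import math
from typing import List


class TriggerPhraseSketch:
    """Hypothetical, simplified trigger-phrase processor (not the library's code).

    After `trigger_id` is generated, force the tokens of `phrase_ids` one per
    step, at most `trigger_count` times over the lifetime of the object.
    """

    def __init__(self, phrase_ids: List[int], trigger_id: int, trigger_count: int = 1):
        self.phrase_ids = phrase_ids
        self.trigger_id = trigger_id
        self.trigger_count = trigger_count
        self.times_triggered = 0
        self.position = -1  # index of the next phrase token; -1 means "not forcing"

    def __call__(self, generated_ids: List[int], scores: List[float]) -> List[float]:
        # Start forcing if the last generated token is the trigger and the budget allows it.
        if (self.position == -1 and generated_ids
                and generated_ids[-1] == self.trigger_id
                and self.times_triggered < self.trigger_count):
            self.position = 0
            self.times_triggered += 1
        if self.position == -1:
            return scores
        # Mask every token except the next phrase token.
        forced = self.phrase_ids[self.position]
        new_scores = [-math.inf] * len(scores)
        new_scores[forced] = 0.0
        self.position += 1
        if self.position == len(self.phrase_ids):
            self.position = -1  # phrase finished, stop forcing
        return new_scores
```

Masking every other logit to `-inf` makes the forced token the only choice under both greedy decoding and sampling. The sketch never resets its state between prompts, which is the kind of bookkeeping `BaseLogitsProcessor` handles by detecting the start of a new generation.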
@@ -136,7 +136,7 @@
       "    \n",
       "\n",
       "LLM response:\n",
-      "The user seems to have mixed feelings about the price of the product. They find it expensive, but they also appreciate its softness, colorfulness, and style, which suggests that the product is well-made and worth the cost.\n",
+      "The user seems to have mixed feelings about the price of the product. They find it expensive, but they also appreciate its softness, colorfulness, and style.\n",
       "-----END-----\n",
       "\n",
       "Prompt: \n",
@@ -158,7 +158,7 @@
    "source": [
     "runner.generate_response(\n",
     "    example_prompts,\n",
-    "    [CiteFromPromptLogitsProcessor(runner.tokenizer, example_prompts, boost_factor=2.0)]\n",
+    "    [CiteFromPromptLogitsProcessor(runner.tokenizer, boost_factor=2.0, boost_eos=False)]\n",
     ")"
    ]
   },
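The updated call no longer passes `example_prompts`, so the prompt tokens presumably come from the processor's own input at generation time. The sketch below illustrates the boosting idea under that assumption: every token id that occurs in the prompt gets `boost_factor` added to its logit, and the EOS token is skipped unless `boost_eos` is set (an assumed reading of that flag). The function `cite_from_prompt` and its list-based scores are hypothetical simplifications.

```python
from typing import List, Sequence


def cite_from_prompt(scores: List[float], prompt_ids: Sequence[int], boost_factor: float,
                     eos_id: int, boost_eos: bool = False) -> List[float]:
    """Hypothetical sketch: add `boost_factor` to the logit of each token id that
    appears in the prompt. A negative factor discourages reusing prompt tokens."""
    boosted = list(scores)  # copy so the caller's scores are not modified
    for token_id in set(prompt_ids):  # boost each distinct token once
        if token_id == eos_id and not boost_eos:
            continue  # leave the EOS logit untouched
        boosted[token_id] += boost_factor
    return boosted
```

In this sketch a positive factor nudges the model toward reusing wording from the prompt and a negative one (as in the `boost_factor=-2.0` cell) pushes it away; each token is boosted once regardless of how often it appears in the prompt.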
@@ -187,9 +187,17 @@
       "    \n",
       "\n",
       "LLM response:\n",
-      "The reviewer seems to have mixed feelings towards the pricing of the product. They describe it as \"expensive\" and \"deserves its price,\" which suggests that they find it worth paying for its quality or unique features. The use of words like \"stylish\" further emphasizes their positive impression.\n",
+      "The reviewer seems to have mixed feelings towards the pricing of the product:\n",
       "\n",
-      "So in summary, while they appreciate the design and style of the product, they also acknowledge that it might be quite pricey. Therefore, their overall sentiment can be described as **mixed**, with appreciation for both aspects (quality and style).\n",
+      "- They describe it as \"very soft\" and \"colorful\", suggesting that they appreciate these qualities.\n",
+      "\n",
+      "- They also mention that it is \"expensive,\" which might be seen as negative if you're looking for an affordable option or if this was their first time buying something like this.\n",
+      "\n",
+      "- However, they state that it \"deserves its price,\" indicating that they believe the high cost reflects on quality or value.\n",
+      "\n",
+      "Overall, while they seem satisfied with the overall experience and don't mind paying more for what they perceive as good-quality materials and design, they may feel that the price point could be higher than expected for everyday use or budget-conscious shoppers.\n",
+      "\n",
+      "So in summary, they find the item to be well-made and aesthetically pleasing despite feeling that it might not be suitable for everyone due to being too pricey for some people's budgets. The reviewer seems generally positive toward the purchase decision itself rather than just the specific item.\n",
       "-----END-----\n",
       "\n",
       "Prompt: \n",
@@ -215,9 +223,17 @@
    "source": [
     "runner.generate_response(\n",
     "    example_prompts,\n",
-    "    [CiteFromPromptLogitsProcessor(runner.tokenizer, example_prompts, boost_factor=-2.0)]\n",
+    "    [CiteFromPromptLogitsProcessor(runner.tokenizer, boost_factor=-2.0, boost_eos=False)]\n",
     ")"
    ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c29fedb3",
+   "metadata": {},
+   "outputs": [],
+   "source": []
   }
  ],
  "metadata": {
 
@@ -28,17 +28,17 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "WARNING 02-12 13:42:36 cuda.py:22] You are using a deprecated `pynvml` package. Please install `nvidia-ml-py` instead. See https://pypi.org/project/pynvml for more information.\n",
-      "WARNING 02-12 13:42:39 config.py:1563] Casting torch.bfloat16 to torch.float16.\n",
-      "INFO 02-12 13:42:39 llm_engine.py:184] Initializing an LLM engine (v0.5.5) with config: model='google/gemma-1.1-2b-it', speculative_config=None, tokenizer='google/gemma-1.1-2b-it', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=8192, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=google/gemma-1.1-2b-it, use_v2_block_manager=False, enable_prefix_caching=False)\n",
-      "INFO 02-12 13:42:40 model_runner.py:879] Starting to load model google/gemma-1.1-2b-it...\n",
-      "INFO 02-12 13:42:40 weight_utils.py:236] Using model weights format ['*.safetensors']\n"
+      "WARNING 03-18 13:40:54 cuda.py:22] You are using a deprecated `pynvml` package. Please install `nvidia-ml-py` instead. See https://pypi.org/project/pynvml for more information.\n",
+      "WARNING 03-18 13:40:58 config.py:1563] Casting torch.bfloat16 to torch.float16.\n",
+      "INFO 03-18 13:40:58 llm_engine.py:184] Initializing an LLM engine (v0.5.5) with config: model='google/gemma-1.1-2b-it', speculative_config=None, tokenizer='google/gemma-1.1-2b-it', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=8192, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=google/gemma-1.1-2b-it, use_v2_block_manager=False, enable_prefix_caching=False)\n",
+      "INFO 03-18 13:40:59 model_runner.py:879] Starting to load model google/gemma-1.1-2b-it...\n",
+      "INFO 03-18 13:41:00 weight_utils.py:236] Using model weights format ['*.safetensors']\n"
      ]
     },
     {
      "data": {
       "application/vnd.jupyter.widget-view+json": {
-       "model_id": "",
+       "model_id": "ebad9294acfd4e15aa9272a1aac448df",
        "version_major": 2,
        "version_minor": 0
       },
@@ -53,8 +53,8 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "INFO 02-12 13:42:42 model_runner.py:890] Loading model weights took 4.6720 GB\n",
-      "INFO 02-12 13:42:44 gpu_executor.py:121] # GPU blocks: 49686, # CPU blocks: 14563\n"
+      "INFO 03-18 13:41:02 model_runner.py:890] Loading model weights took 4.6720 GB\n",
+      "INFO 03-18 13:41:05 gpu_executor.py:121] # GPU blocks: 49742, # CPU blocks: 14563\n"
      ]
     }
    ],
@@ -156,10 +156,10 @@
     }
    ],
    "source": [
-    "phrase = \"\\n\\nReferences:\"\n",
+    "reference = ForceLastPhraseLogitsProcessor(\"\\n\\nReferences:\", runner.tokenizer)\n",
     "\n",
     "runner.generate_response(example_prompts,\n",
-    "                         [ForceLastPhraseLogitsProcessor(phrase, runner.tokenizer)])"
+    "                         [reference])"
    ]
   },
   {
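Conceptually, a force-last-phrase processor intercepts the end of generation. The sketch below is a hypothetical, simplified version rather than the library's code: it treats EOS as "wanted" when it has the highest logit (a greedy simplification), then emits `phrase_ids` one token per step and only lets EOS through once the whole phrase has been written. `ForceLastPhraseSketch`, `phrase_ids`, and `eos_id` are illustrative names, not the library's API.

```python
import math
from typing import List


class ForceLastPhraseSketch:
    """Hypothetical sketch: when EOS becomes the top token, write `phrase_ids`
    first; after the phrase is complete, scores pass through unchanged."""

    def __init__(self, phrase_ids: List[int], eos_id: int):
        self.phrase_ids = phrase_ids
        self.eos_id = eos_id
        self.position = 0  # number of phrase tokens forced so far

    def __call__(self, scores: List[float]) -> List[float]:
        done = self.position == len(self.phrase_ids)
        wants_to_stop = scores.index(max(scores)) == self.eos_id
        # Pass through if the phrase is finished, or if it has not started
        # and the model is not trying to end yet.
        if done or (self.position == 0 and not wants_to_stop):
            return scores
        # Otherwise force the next phrase token by masking everything else.
        forced = self.phrase_ids[self.position]
        self.position += 1
        new_scores = [-math.inf] * len(scores)
        new_scores[forced] = 0.0
        return new_scores
```

Once the phrase is complete the model is free to keep generating, which is consistent with the notebook output where text such as the reference list follows the forced `References:` header.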
@@ -199,19 +199,58 @@
     }
    ],
    "source": [
-    "phrase = \"\\n\\nThanks for trying our RAG application! If you have more questions about\"\n",
+    "thank = ForceLastPhraseLogitsProcessor(\"\\n\\nThanks for trying our RAG application! If you have more questions about\",\n",
+    "                                       runner.tokenizer)\n",
     "\n",
     "runner.generate_response(example_prompts,\n",
-    "                         [ForceLastPhraseLogitsProcessor(phrase, runner.tokenizer)])"
+    "                         [thank])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "34735f41",
+   "metadata": {},
+   "source": [
+    "## Both"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 6,
    "id": "88032bf3",
    "metadata": {},
-   "outputs": [],
-   "source": []
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Prompt: \n",
+      "    Retrieved information from: https://en.wikipedia.org/wiki/Bulbasaur\n",
+      "    Bulbasaur is a fictional Pokémon species in Nintendo and Game Freak's Pokémon franchise. \n",
+      "    Designed by Atsuko Nishida, Bulbasaur is a Grass and Poison-type, first appearing in Pocket Monsters: Red and Green (Pokémon Red and Blue outside Japan) as a starter Pokémon. \n",
+      "    Since then, it has reappeared in sequels, spin-off games, related merchandise, and animated and printed adaptations of the franchise. \n",
+      "    It is a central character in the Pokémon anime, being one of Ash Ketchum's main Pokémon for the first season, with a different one later being obtained by supporting character May. \n",
+      "    It is featured in various manga and is owned by protagonist Red in Pokémon Adventures.\n",
+      "    \n",
+      "    What is Bulbasaur?\n",
+      "    \n",
+      "Bulbasaur is a fictional Pokémon species that is a Grass and Poison-type.\n",
+      "\n",
+      "Thanks for trying our RAG application! If you have more questions about Bulbasaur or anything else, please let me know.\n",
+      "\n",
+      "References:\n",
+      "\n",
+      "* Wikipedia: Bulbasaur\n",
+      "* Pokémon franchise website\n",
+      "-----END-----\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "runner.generate_response(example_prompts,\n",
+    "                         [thank, reference])"
+   ]
   }
  ],
  "metadata": {
 
@@ -28,20 +28,20 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "WARNING 02-13 10:32:45 cuda.py:22] You are using a deprecated `pynvml` package. Please install `nvidia-ml-py` instead. See https://pypi.org/project/pynvml for more information.\n",
-      "WARNING 02-13 10:32:49 config.py:1563] Casting torch.bfloat16 to torch.float16.\n",
-      "WARNING 02-13 10:32:49 arg_utils.py:839] Chunked prefill is enabled by default for models with max_model_len > 32K. Currently, chunked prefill might not work with some features or models. If you encounter any issues, please disable chunked prefill by setting --enable-chunked-prefill=False.\n",
-      "INFO 02-13 10:32:49 config.py:911] Chunked prefill is enabled with max_num_batched_tokens=512.\n",
-      "INFO 02-13 10:32:49 llm_engine.py:184] Initializing an LLM engine (v0.5.5) with config: model='deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, use_v2_block_manager=False, enable_prefix_caching=False)\n",
-      "INFO 02-13 10:32:50 model_runner.py:879] Starting to load model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B...\n",
-      "INFO 02-13 10:32:51 weight_utils.py:236] Using model weights format ['*.safetensors']\n",
-      "INFO 02-13 10:32:52 weight_utils.py:280] No model.safetensors.index.json found in remote.\n"
+      "WARNING 03-18 13:37:20 cuda.py:22] You are using a deprecated `pynvml` package. Please install `nvidia-ml-py` instead. See https://pypi.org/project/pynvml for more information.\n",
+      "WARNING 03-18 13:37:24 config.py:1563] Casting torch.bfloat16 to torch.float16.\n",
+      "WARNING 03-18 13:37:24 arg_utils.py:839] Chunked prefill is enabled by default for models with max_model_len > 32K. Currently, chunked prefill might not work with some features or models. If you encounter any issues, please disable chunked prefill by setting --enable-chunked-prefill=False.\n",
+      "INFO 03-18 13:37:24 config.py:911] Chunked prefill is enabled with max_num_batched_tokens=512.\n",
+      "INFO 03-18 13:37:24 llm_engine.py:184] Initializing an LLM engine (v0.5.5) with config: model='deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, use_v2_block_manager=False, enable_prefix_caching=False)\n",
+      "INFO 03-18 13:37:25 model_runner.py:879] Starting to load model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B...\n",
+      "INFO 03-18 13:37:26 weight_utils.py:236] Using model weights format ['*.safetensors']\n",
+      "INFO 03-18 13:37:27 weight_utils.py:280] No model.safetensors.index.json found in remote.\n"
      ]
     },
     {
      "data": {
       "application/vnd.jupyter.widget-view+json": {
-       "model_id": "35561044f6c848eb9c56be591d6c50c8",
+       "model_id": "e3ff26a7bf23415c9e29952e2a5f2d1a",
        "version_major": 2,
        "version_minor": 0
       },
@@ -56,8 +56,8 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "INFO 02-13 10:32:53 model_runner.py:890] Loading model weights took 3.3460 GB\n",
-      "INFO 02-13 10:32:53 gpu_executor.py:121] # GPU blocks: 37897, # CPU blocks: 9362\n"
+      "INFO 03-18 13:37:29 model_runner.py:890] Loading model weights took 3.3460 GB\n",
+      "INFO 03-18 13:37:29 gpu_executor.py:121] # GPU blocks: 37898, # CPU blocks: 9362\n"
      ]
     }
    ],
@@ -338,9 +338,11 @@
     }
    ],
    "source": [
+    "trigger_python = TriggerPhraseLogitsProcessor(\"\\n```python\", \"</think>\", runner.tokenizer, \n",
+    "                                              trigger_count=1, trigger_after=True)\n",
+    "\n",
     "runner.generate_response(example_prompts,\n",
-    "                         [TriggerPhraseLogitsProcessor(\"\\n```python\", \"</think>\", runner.tokenizer, \n",
-    "                                                       trigger_count=1, trigger_after=True)],\n",
+    "                         [trigger_python],\n",
     "                        max_tokens=4096)"
    ]
   },
@@ -387,12 +389,14 @@
     }
    ],
    "source": [
+    "keep_thinking_short = GenLengthLogitsProcessor(runner.tokenizer, boost_factor=0.1, complete_sentences=True,\n",
+    "                                               boost_token_str=\"</think>\")\n",
+    "\n",
     "runner.generate_response(example_prompts,\n",
     "                         [\n",
-    "                             GenLengthLogitsProcessor(runner.tokenizer, boost_factor=0.1, complete_sentences=True,\n",
-    "                                                      boost_token_str=\"</think>\"),\n",
-    "                             TriggerPhraseLogitsProcessor(\"\\n```python\", \"</think>\", runner.tokenizer, \n",
-    "                                                       trigger_count=1, trigger_after=True)],\n",
+    "                             keep_thinking_short,\n",
+    "                             trigger_python\n",
+    "                         ],\n",
     "                        max_tokens=4096)"
    ]
   },
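Here `GenLengthLogitsProcessor` is configured to favour `</think>` so the reasoning section stays short. One plausible way such a length-based boost can work, sketched below as an assumption rather than the library's exact formula, is to add `boost_factor` times the number of generated tokens to the boost token's logit, optionally only right after a sentence-ending token (the role `complete_sentences` seems to play). The function `gen_length_boost`, `boost_id`, and `sentence_end_ids` are hypothetical.

```python
from typing import List, Set


def gen_length_boost(scores: List[float], generated_ids: List[int], boost_id: int,
                     boost_factor: float, sentence_end_ids: Set[int],
                     complete_sentences: bool = True) -> List[float]:
    """Hypothetical sketch: raise the logit of `boost_id` (e.g. the id of "</think>")
    linearly with the number of generated tokens."""
    # With complete_sentences, only boost right after a sentence-ending token.
    if complete_sentences and (not generated_ids or generated_ids[-1] not in sentence_end_ids):
        return scores
    boosted = list(scores)  # copy so the caller's scores are not modified
    boosted[boost_id] += boost_factor * len(generated_ids)
    return boosted
```

In this sketch a small factor such as 0.1 barely matters early on and only gradually makes the boosted token competitive as the output grows.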
 
@@ -0,0 +1,51 @@
+#
+# SPDX-FileCopyrightText: Copyright (c) 1993-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import torch
+
+
+class BaseLogitsProcessor:
+    def __init__(self):
+        self.prompt_token_ids = None
+        self.prev_token_ids = None
+
+    def _reset(self):
+        pass
+
+    def _check_new_generation(self, input_ids: torch.LongTensor):
+        first_time = self.prompt_token_ids is None
+        if first_time:
+            self._reset()
+            self.prompt_token_ids = input_ids
+        else:
+            same_gen = False
+            if input_ids.shape[1] > 1:
+                same_gen = torch.equal(input_ids[:, :-1], self.prev_token_ids)
+
+            if not same_gen:
+                self._reset()
+                self.prompt_token_ids = input_ids
+
+        self.prev_token_ids = input_ids
+
+    def _process(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.Tensor:
+        return scores
+
+    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.Tensor:
+        self._check_new_generation(input_ids)
+        scores = self._process(input_ids, scores)
+        return scores
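`BaseLogitsProcessor` lets subclasses override `_reset` and `_process` while `__call__` decides whether a call continues the previous generation: it does if the new ids equal the previous ids plus one token. The sketch below mirrors that check with a single list of token ids instead of a batched `torch.LongTensor`, so it runs without PyTorch; `ListBaseProcessor` and the `StepCounter` subclass are hypothetical illustrations.

```python
from typing import List, Optional


class ListBaseProcessor:
    """Pure-Python mirror of BaseLogitsProcessor's new-generation check (one sequence)."""

    def __init__(self):
        self.prompt_token_ids: Optional[List[int]] = None
        self.prev_token_ids: Optional[List[int]] = None

    def _reset(self):
        pass

    def _check_new_generation(self, input_ids: List[int]):
        # Same generation iff this call adds exactly one token to the previous call's ids.
        same_gen = (self.prompt_token_ids is not None and len(input_ids) > 1
                    and input_ids[:-1] == self.prev_token_ids)
        if not same_gen:
            self._reset()
            self.prompt_token_ids = input_ids
        self.prev_token_ids = input_ids

    def _process(self, input_ids: List[int], scores: List[float]) -> List[float]:
        return scores

    def __call__(self, input_ids: List[int], scores: List[float]) -> List[float]:
        self._check_new_generation(input_ids)
        return self._process(input_ids, scores)


class StepCounter(ListBaseProcessor):
    """Hypothetical subclass: counts decoding steps, restarting at each new prompt."""

    def _reset(self):
        self.steps = 0

    def _process(self, input_ids: List[int], scores: List[float]) -> List[float]:
        self.steps += 1
        return scores
```

The check is a heuristic: a new prompt that happens to equal the previous sequence plus one extra token would be treated as a continuation rather than a fresh generation.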