
Commit 7f6cbe2

erwardenaar and claude committed
Move field table to vignette; add app screenshot to README
- Replaces the 25-row field table in README with a brief summary and link to the rendered vignette, which now carries the full table in the Python API section
- Adds streamlitapp.png screenshot to README
- Gitignores README.html, README_files/, and *.xlsx to prevent accidental commits of render artefacts and local files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent c788920 commit 7f6cbe2

4 files changed

Lines changed: 39 additions & 29 deletions


.gitignore

Lines changed: 3 additions & 0 deletions
```diff
@@ -11,3 +11,6 @@ docs/vignette_files/
 docs/code_walkthrough.md
 docs/code_walkthrough.html
 docs/code_walkthrough_files/
+README.html
+README_files/
+*.xlsx
```
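To check which local paths the three new patterns would cover, here is a rough Python approximation of gitignore matching. It is a sketch under simplifying assumptions: a pattern without an inner slash is matched against each path component (so it applies at any depth), a trailing slash restricts the pattern to directories, and anything under an ignored directory counts as ignored; negation (`!`) and anchored patterns are not modelled. `is_ignored` is a hypothetical helper, not part of the repository. For the authoritative answer, `git check-ignore -v <path>` reports which rule, if any, matches.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The three patterns appended to .gitignore in this commit.
NEW_PATTERNS = ["README.html", "README_files/", "*.xlsx"]


def is_ignored(path: str, is_dir: bool = False) -> bool:
    """Approximate whether `path` is covered by NEW_PATTERNS.

    Simplified gitignore semantics (an assumption, not Git's full algorithm):
    - a slash-free pattern is matched against every path component;
    - a trailing slash means the pattern only matches directories;
    - a match on a parent directory ignores everything beneath it.
    """
    parts = PurePosixPath(path).parts
    for i, part in enumerate(parts):
        # Every component except the last is a directory; the last one is a
        # directory only if the caller says so.
        part_is_dir = is_dir or i < len(parts) - 1
        for pattern in NEW_PATTERNS:
            dir_only = pattern.endswith("/")
            if dir_only and not part_is_dir:
                continue
            # fnmatchcase keeps matching case-sensitive on every platform.
            if fnmatchcase(part, pattern.rstrip("/")):
                return True
    return False


for p in ["README.html", "README_files/figure-1.png", "data/models.xlsx", "README.md"]:
    print(p, is_ignored(p))
```

With these rules, rendered output such as `README_files/figure-1.png` is ignored because its parent directory matches `README_files/`, while `README.md` itself stays tracked.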

README.md

Lines changed: 3 additions & 28 deletions
````diff
@@ -8,34 +8,7 @@ Choosing the right open LLM for research is hard given the rapidly growing lands
 
 ## Database fields
 
-Each model record contains 25 fields:
-
-| Field | Type | Description |
-|---|---|---|
-| `name`, `family`, `organization`, `country_of_origin` | str | Model identity |
-| `release_year` | int | Year of public release |
-| `size_b` | float | Model size in billions of parameters |
-| `training_tokens_b` | float \| None | Pre-training token count in billions; `None` when undisclosed |
-| `context_window` | int | Maximum context length in tokens |
-| `modality` | list[str] | Supported modalities (`"text"` and/or `"image"`) |
-| `architecture` | str | `decoder-only`, `encoder-decoder`, or `mixture-of-experts` |
-| `license` | str | License name |
-| `open_weights` | bool | Model weights are publicly available |
-| `open_training_data` | bool | Training data is publicly available |
-| `intermediate_checkpoints` | bool | Intermediate training checkpoints have been released |
-| `open_code` | bool | Training code is publicly available |
-| `multilingual` | bool | Officially supports more than one language |
-| `num_languages` | int | Number of officially supported languages |
-| `languages` | list[str] | Names of officially supported languages |
-| `has_instruct_version` | bool | An instruction-tuned variant exists (or the model is itself instruction-tuned) |
-| `model_type` | str | Model release type: `"base"`, `"instruct"`, or `"reasoning"` |
-| `has_think_version` | bool | A chain-of-thought / think variant exists (or the model is itself a reasoning model) |
-| `notes` | str *(optional)* | Additional context; present only for models where extra clarification is needed (e.g. post-trained models where `training_tokens_b` is null for structural reasons) |
-| `foundational_paper` | str | URL of the foundational paper (arXiv for most models; non-arXiv for GPT-J 6B, Grok-1, Mixtral 8x22B, and Sarvam 30B) |
-| `huggingface_id` | str | HuggingFace model identifier |
-| `openness_score` | int | Computed 0–5 score: sum of `open_weights` + `open_training_data` + `intermediate_checkpoints` + `open_code` + permissive license (Apache 2.0 or MIT) |
-
-Languages reflect officially supported languages as documented by the model creators, not partial or limited capabilities (e.g. Falcon supports German, Spanish and French officially, but has only limited capabilities in several other languages which are not included).
+Each model record contains 25 fields covering identity, size, training scale, context window, modality, architecture, license, openness flags, language support, and links to the foundational paper and HuggingFace page. Most records are base models; a small number are instruct or reasoning variants. See the [rendered vignette](https://htmlpreview.github.io/?https://github.com/Programming-The-Next-Step-2026/openllm-selector/blob/week-4/docs/vignette.html) for the full field reference.
 
 ## Installation
 
@@ -49,6 +22,8 @@ To run the interactive Streamlit app locally:
 streamlit run app/app.py
 ```
 
+![openllm-selector Streamlit app](streamlitapp.png)
+
 ## Python API
 
 ```python
````
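The README now summarises the record schema in a single sentence, so the field names only live in the table removed in this hunk. As a sketch of what that schema implies, the check below compares one record's keys against those field names, treating `notes` as optional as the table marks it. It assumes records are plain dicts (the vignette describes them as "model dicts"); `check_record` and the field-set constants are hypothetical names, not part of the package.

```python
# Field names from the README/vignette field table. `notes` is marked optional
# there, so it is kept out of the required set.
REQUIRED_FIELDS = {
    "name", "family", "organization", "country_of_origin",
    "release_year", "size_b", "training_tokens_b", "context_window",
    "modality", "architecture", "license",
    "open_weights", "open_training_data", "intermediate_checkpoints", "open_code",
    "multilingual", "num_languages", "languages",
    "has_instruct_version", "model_type", "has_think_version",
    "foundational_paper", "huggingface_id", "openness_score",
}
OPTIONAL_FIELDS = {"notes"}
# Allowed values of `model_type` according to the table.
MODEL_TYPES = {"base", "instruct", "reasoning"}


def check_record(record: dict) -> list[str]:
    """Return human-readable schema problems for one model record ([] if none)."""
    keys = set(record)
    problems = []
    missing = REQUIRED_FIELDS - keys
    unknown = keys - REQUIRED_FIELDS - OPTIONAL_FIELDS
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if unknown:
        problems.append(f"unknown fields: {sorted(unknown)}")
    if record.get("model_type") not in MODEL_TYPES:
        problems.append(f"unexpected model_type: {record.get('model_type')!r}")
    return problems
```

Assuming `load_models()` returns a list of such dicts, `[r["name"] for r in models if check_record(r)]` would list the records that deviate from the documented schema.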

docs/vignette.qmd

Lines changed: 33 additions & 1 deletion
````diff
@@ -22,9 +22,14 @@ An optional `notes` field is present for models that require additional context,
 pip install git+https://github.com/Programming-The-Next-Step-2026/openllm-selector.git@week-4
 ```
 
-This gives access to the Python API (`load_models`, `filter_models`, and related functions) without any further steps. To run the interactive Streamlit app, clone the repository and launch it from the project root:
+This gives access to the Python API (`load_models`, `filter_models`, and related functions) without any further steps.
+
+To run the interactive Streamlit app, clone the repository and launch it from the project root:
 
 ```bash
+git clone https://github.com/Programming-The-Next-Step-2026/openllm-selector.git
+cd openllm-selector
+git checkout week-4
 streamlit run app/app.py
 ```
 
@@ -214,6 +219,33 @@ The same five questions answered in code. All functions are importable directly
 import openllm_selector as o
 ```
 
+Each model dict contains the following fields:
+
+| Field | Type | Description |
+|---|---|---|
+| `name`, `family`, `organization`, `country_of_origin` | str | Model identity |
+| `release_year` | int | Year of public release |
+| `size_b` | float | Model size in billions of parameters |
+| `training_tokens_b` | float \| None | Pre-training token count in billions; `None` when undisclosed |
+| `context_window` | int | Maximum context length in tokens |
+| `modality` | list[str] | Supported modalities (`"text"` and/or `"image"`) |
+| `architecture` | str | `decoder-only`, `encoder-decoder`, or `mixture-of-experts` |
+| `license` | str | License name |
+| `open_weights` | bool | Model weights are publicly available |
+| `open_training_data` | bool | Training data is publicly available |
+| `intermediate_checkpoints` | bool | Intermediate training checkpoints have been released |
+| `open_code` | bool | Training code is publicly available |
+| `multilingual` | bool | Officially supports more than one language |
+| `num_languages` | int | Number of officially supported languages |
+| `languages` | list[str] | Names of officially supported languages |
+| `has_instruct_version` | bool | An instruction-tuned variant exists (or the model is itself instruction-tuned) |
+| `model_type` | str | Model release type: `"base"`, `"instruct"`, or `"reasoning"` |
+| `has_think_version` | bool | A chain-of-thought / think variant exists (or the model is itself a reasoning model) |
+| `notes` | str *(optional)* | Additional context; present only for models where extra clarification is needed (e.g. post-trained models where `training_tokens_b` is null for structural reasons) |
+| `foundational_paper` | str | URL of the foundational paper (arXiv for most models; non-arXiv for GPT-J 6B, Grok-1, Mixtral 8x22B, and Sarvam 30B) |
+| `huggingface_id` | str | HuggingFace model identifier |
+| `openness_score` | int | Computed 0–5 score: sum of `open_weights` + `open_training_data` + `intermediate_checkpoints` + `open_code` + permissive license (Apache 2.0 or MIT) |
+
 ---
 
 ### Scenario a — Training dynamics researcher
````
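The `openness_score` row in the table describes a plain sum of five binary criteria. A minimal sketch of that computation follows, assuming a record is a dict carrying the four boolean openness flags and a `license` string, and assuming the permissive licenses are spelled exactly `"Apache 2.0"` and `"MIT"` in the data (the dataset's real spellings may differ). The function and the example record are hypothetical; this is not necessarily how the package computes the stored value.

```python
# Licenses the table counts as permissive; the exact spellings are an assumption.
PERMISSIVE_LICENSES = {"Apache 2.0", "MIT"}

# The four boolean openness flags summed by the score.
OPENNESS_FLAGS = ("open_weights", "open_training_data", "intermediate_checkpoints", "open_code")


def openness_score(record: dict) -> int:
    """Sum of four openness flags plus one point for a permissive license (0-5)."""
    flags = sum(bool(record[flag]) for flag in OPENNESS_FLAGS)
    permissive = record["license"] in PERMISSIVE_LICENSES
    return flags + int(permissive)


# Hypothetical record: open weights and code under Apache 2.0, no data or checkpoints.
example = {
    "open_weights": True,
    "open_training_data": False,
    "intermediate_checkpoints": False,
    "open_code": True,
    "license": "Apache 2.0",
}
print(openness_score(example))  # 3
```

Because every criterion contributes at most one point, the score tops out at 5 only for a model with open weights, data, checkpoints, and code released under a permissive license.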

streamlitapp.png

211 KB
