Commit 140f754

Added AI instructions for new encodings
1 parent 2ee1ccf commit 140f754

4 files changed: +157 -0 lines changed

.github/copilot-instructions.md

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
# Copilot Instructions — Enhancements Only

## Scope
This repository focuses on **adding new encoding/decoding schemes only**.

Copilot MUST:
- Propose **new codecs only**
- Avoid refactoring unrelated code
- Avoid dependency changes unless strictly required for the codec
- Avoid stylistic or formatting changes

## Context
This project extends Python's codecs with many encoding/decoding schemes and CLI tools.
It already includes a wide variety of bases, ciphers, compression, and niche encodings.

## Enhancement Guidelines

When adding a new encoding:
1. Check if it already exists in the project
2. Follow the existing codec structure and naming conventions
3. Provide:
   - `encode()` implementation
   - `decode()` implementation
   - Registration into the codec registry
4. Ensure CLI compatibility (if applicable)
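
A minimal sketch of the three deliverables listed above (an `encode()` function, a `decode()` function, and registration), using only Python's standard `codecs` module. The codec name `reverse_demo` and the string-reversal logic are placeholders invented for this illustration. An actual contribution would rely on the project's own `add` / `add_map` helpers described in `docs/ADDING_CODECS.md`, whose parameters are not shown here.

```python
import codecs

CODEC_NAME = "reverse_demo"  # hypothetical name, used only in this sketch


def encode(text, errors="strict"):
    """Return (encoded_text, number_of_input_items_consumed)."""
    return text[::-1], len(text)


def decode(text, errors="strict"):
    """Return (decoded_text, number_of_input_items_consumed)."""
    return text[::-1], len(text)


def _search(name):
    """Codec search function: return a CodecInfo for our name, else None."""
    if name == CODEC_NAME:
        return codecs.CodecInfo(encode=encode, decode=decode, name=CODEC_NAME)
    return None


# Registration: after this call, the codec can be looked up by name.
codecs.register(_search)

encoded = codecs.encode("hello", CODEC_NAME)
decoded = codecs.decode(encoded, CODEC_NAME)
print(encoded, decoded)
```

The transformation is deterministic and reversible, which matches the constraints stated in these instructions.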

## Implementation Constraints

- Pure Python preferred
- No heavy dependencies
- Deterministic transformations only
- Reversible encoding required unless irreversibility is explicitly documented

## Testing

Every new codec MUST include:
- Unit tests (encode/decode roundtrip)
- Edge cases (empty input, binary data if applicable)
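
The testing requirements above could look roughly like the following sketch. It uses the standard `unittest` module, and `base64` stands in for the codec under test only because it ships with Python. The `encode` / `decode` wrappers and the sample values are assumptions made for the illustration, not part of this project.

```python
import base64
import unittest


# Stand-ins for the codec under test (assumption: bytes in, bytes out).
def encode(data: bytes) -> bytes:
    return base64.b64encode(data)


def decode(data: bytes) -> bytes:
    return base64.b64decode(data)


class RoundtripTest(unittest.TestCase):
    def test_roundtrip(self):
        # decode(encode(x)) must give back x for ordinary inputs
        for sample in (b"hello", b"spaces and punctuation!", "é".encode("utf-8")):
            with self.subTest(sample=sample):
                self.assertEqual(decode(encode(sample)), sample)

    def test_empty_input(self):
        # Edge case: empty input stays empty in both directions
        self.assertEqual(encode(b""), b"")
        self.assertEqual(decode(b""), b"")

    def test_binary_data(self):
        # Edge case: every possible byte value survives the roundtrip
        blob = bytes(range(256))
        self.assertEqual(decode(encode(blob)), blob)


# Run the suite programmatically so the result object can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(RoundtripTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```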

## Documentation

Each codec must include:
- Short description
- Reference (standard, RFC, or algorithm source)
- Example usage

## Output Format (IMPORTANT)

When asked to add a codec, Copilot should:
1. Briefly justify the encoding (1–2 lines)
2. Provide full implementation
3. Provide tests
4. Provide documentation snippet

## Explicit Non-Goals

- No refactoring
- No performance optimization passes
- No linting-only changes
- No CI/CD changes
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
Add a new encoding scheme to this repository.

Constraints:
- Follow `copilot-instructions.md` strictly
- Do not modify unrelated code
- Use existing codec patterns

Task:
Add encoding: {{ENCODING_NAME}}

Requirements:
- Implement according to the `ADDING_CODECS.md` guideline
- Add tests if needed (i.e. if `__examples__` cannot be consistently defined)
- Add minimal documentation (in the relevant category page under `docs/pages`)

Reference:
{{LINK_OR_DESCRIPTION}}

.github/pull_request_template.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
## Type
- [ ] New encoding (required)

## Checklist
- [ ] No unrelated changes
- [ ] Codec is new (not already implemented)
- [ ] Tests included (if they cannot be automated with `tests/test_generated`)
- [ ] Documentation included (in the right page under `docs/pages/enc`)

## Description
Explain the encoding and its source.

docs/ADDING_CODECS.md

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
# Adding a Codec

1. Categorize the codec: categories are the folder names in `src/codext` (further folder references are relative to this path). When a codec does not fit in any of these folders, it shall be put in `others` by default.

2. Add the `.py` file in the relevant category folder, named with the short name of the new codec.

3. Follow the typical structure of a codec's `.py` file, as given by the following template (double-braced placeholders `{{...}}` denote codec parameters; double-angle-bracketed placeholders `<<...>>` denote instructions that may refer to later steps of this guideline):

```python
# -*- coding: UTF-8 -*-
"""{{codec_long_name}} Codec - {{codec_short_name}} content encoding.

{{codec_description}}

This codec:
- en/decodes strings from str to str
- en/decodes strings from bytes to bytes
- decodes file content to str (read)
- encodes file content from str to bytes (write)

Reference: {{codec_source_hyperlink}}
"""
from ..__common__ import *


__examples__ = {<<dictionary of examples with, as keys, a special format detailed hereafter and, as values, a dictionary mapping source to destination values (see 7.)>>}
<<optional list of valid codec names to be used with the guessing mode (see 8.), in format "__guess__ = [...]">>


<<constants here, including ENCMAP if the codec is a simple mapping (see 6.)>>
<<functions here, if the codec requires some additional logic, i.e. when it is not a mapping (see 6.)>>


<<put the right add function (see 4.) here with its relevant parameters (see 5.)>>
```

4. Choose the right add function

If the codec is a simple mapping, use the `add_map` function.

Examples: `languages/braille`, `languages/morse`, `languages/southpark`

In some cases, an algorithm is equivalent to one or several mappings and can then be defined by generating `ENCMAP` dynamically.

Examples: `stegano/resistor`, `crypto/barbie`

When the codec is more complex than a mapping, use the `add` function.
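
To illustrate the case mentioned above where an algorithm boils down to a dynamically generated mapping, here is a small sketch in plain Python. The `make_shift_map` helper and the shift cipher it builds are hypothetical and chosen only for illustration. How such a generated mapping is handed to `add_map` is not covered here and should be checked against `__common__.py`.

```python
import string


def make_shift_map(shift: int) -> dict:
    """Generate a lowercase-letter substitution map for the given shift."""
    letters = string.ascii_lowercase
    return {c: letters[(i + shift) % 26] for i, c in enumerate(letters)}


# The mapping is computed from a parameter rather than written out by hand.
ENCMAP = make_shift_map(3)

# Apply the mapping with the standard str.translate machinery.
table = str.maketrans(ENCMAP)
shifted = "attack".translate(table)
print(shifted)  # dwwdfn
```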

5. Configure the add function

Refer to the relevant function signature in `__common__.py`.

6. Write the codec logic

If the codec is a mapping, at least `ENCMAP` should be defined and referenced in the parameters of the `add_map` function.

Examples: `stegano/rick`, `stegano/klopf`
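
As a rough picture of what a mapping-based codec amounts to, the sketch below defines a tiny `ENCMAP` (a four-letter excerpt of Morse code) and derives both directions from it. The `DECMAP`, `SEP`, `encode` and `decode` names are assumptions made for this illustration; it does not call the project's `add_map`, which is expected to take care of this logic once `ENCMAP` is provided.

```python
# Excerpt only: a real codec would cover its full alphabet.
ENCMAP = {"s": "...", "o": "---", "e": ".", "t": "-"}

# The decoding direction is simply the inverted mapping.
DECMAP = {token: char for char, token in ENCMAP.items()}

SEP = " "  # separator between encoded tokens (assumption for this sketch)


def encode(text: str) -> str:
    """Replace each character by its token, joined with the separator."""
    return SEP.join(ENCMAP[c] for c in text)


def decode(code: str) -> str:
    """Split on the separator and map each token back to its character."""
    if not code:
        return ""
    return "".join(DECMAP[token] for token in code.split(SEP))


print(encode("sos"))          # ... --- ...
print(decode("... --- ..."))  # sos
```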
58+
59+
If the codec is not a mapping, the logic can be written in the following order: the encoding function first, then the decoding function.
60+
61+
Examples: `stegano/whitespace`, `crypto/railfence`
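
For a codec that is not a mapping, the encoder-then-decoder layout can be sketched with the classic zigzag rail fence cipher. This is a standalone illustration, not the project's `crypto/railfence` implementation (whose options and behavior may differ), and the function names are hypothetical.

```python
def _rail_pattern(length: int, rails: int) -> list:
    """Return the rail index visited at each position of the zigzag."""
    if rails < 1:
        raise ValueError("rails must be at least 1")
    if rails == 1:
        return [0] * length
    # One full down-and-up cycle, e.g. [0, 1, 2, 1] for 3 rails.
    cycle = list(range(rails)) + list(range(rails - 2, 0, -1))
    return [cycle[i % len(cycle)] for i in range(length)]


def railfence_encode(text: str, rails: int = 3) -> str:
    """Read the zigzag rail by rail, top to bottom."""
    pattern = _rail_pattern(len(text), rails)
    return "".join(
        ch for rail in range(rails) for ch, r in zip(text, pattern) if r == rail
    )


def railfence_decode(code: str, rails: int = 3) -> str:
    """Put each character back at the position the encoder took it from."""
    pattern = _rail_pattern(len(code), rails)
    # Stable sort by rail gives the positions in the encoder's output order.
    order = sorted(range(len(code)), key=lambda i: pattern[i])
    plain = [""] * len(code)
    for ch, pos in zip(code, order):
        plain[pos] = ch
    return "".join(plain)


message = "WEAREDISCOVEREDFLEEATONCE"
secret = railfence_encode(message, 3)
print(secret)  # WECRLTEERDSOEEFEAOCAIVDEN
print(railfence_decode(secret, 3) == message)  # True
```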

7. Write some examples

Examples are used during the automated test generation. They should therefore be written carefully so that they also cover edge cases. A set of 3 to 8 examples is generally expected.

8. Specify the names to be used with the guessing mode

The `__guess__` list of codec names limits the possibilities explored by the tree search of the guessing mode. This matters especially when the codec is dynamic and may have a large (or even infinite) number of names: in that case, the list must be restricted to a limited number of names, generally at most 16 as a best practice. When relevant, this list shall be defined with due care.
