| 1 | +# KB Folder Manager - Copilot Instructions |
| 2 | + |
| 3 | +## Project Overview |
| 4 | + |
| 5 | +A Windows/Python tool for managing personal knowledge base folders. Splits Complete directories into Doc (documents) and Res (resources), merges them back, validates structure compliance, and generates hash-based indexes. |
| 6 | + |
| 7 | +**Version 3.0** includes a modern GUI built with ttkbootstrap, making the tool accessible to non-technical users. |
| 8 | + |
| 9 | +**Important**: The project uses relative imports and is not installed as a package. All commands should be run from the project root directory, or PYTHONPATH must be set to include the project root. |
| 10 | + |
| 11 | +## Build, Test, and Lint |
| 12 | + |
| 13 | +### Install Dependencies |
| 14 | +```bash |
| 15 | +pip install -r requirements.txt |
| 16 | +``` |
| 17 | + |
| 18 | +This installs: |
| 19 | +- PyYAML (config management) |
| 20 | +- ttkbootstrap (modern GUI framework) |
| 21 | +- pillow (image support for GUI) |
| 22 | + |
| 23 | +### Run GUI |
| 24 | +```bash |
| 25 | +# Launch graphical interface (from project root) |
| 26 | +python kb_folder_manager_gui.py |
| 27 | + |
| 28 | +# Or via module (from the project root; set PYTHONPATH when running from elsewhere) |
| 29 | +export PYTHONPATH=$(pwd) # Linux/Mac |
| 30 | +$env:PYTHONPATH = "$PWD" # Windows PowerShell |
| 31 | +python -m kb_folder_manager.gui |
| 32 | +``` |
| 33 | + |
| 34 | +### Dependencies |
| 35 | +All dependencies are in `requirements.txt`: |
| 36 | +- **PyYAML** (>= 6.0): Config file parsing |
| 37 | +- **ttkbootstrap** (>= 1.20.0): Modern GUI framework |
| 38 | +- **pillow** (>= 10.0.0): Image support for GUI |
| 39 | + |
| 40 | +No additional dependencies like tqdm, pandas, numpy, etc. are used. |
| 41 | + |
| 42 | +### Run Tests |
| 43 | +```bash |
| 44 | +# Run all tests |
| 45 | +python -m pytest tests/ |
| 46 | + |
| 47 | +# Run single test file |
| 48 | +python -m pytest tests/test_basic.py |
| 49 | + |
| 50 | +# Run specific test |
| 51 | +python -m pytest tests/test_basic.py::TestSplitMerge::test_split_merge_roundtrip |
| 52 | + |
| 53 | +# Run GUI tests (simulated interactions) |
| 54 | +python tests/test_gui.py |
| 55 | +``` |
| 56 | + |
| 57 | +### Manual Testing Commands |
| 58 | +```bash |
| 59 | +# Split operation |
| 60 | +python kb_folder_manager.py split --source "D:\Data\MyKB" --output-root "D:\Output\SplitRun" |
| 61 | + |
| 62 | +# Merge operation |
| 63 | +python kb_folder_manager.py merge --doc "D:\Output\doc\MyKB" --res "D:\Output\res\MyKB" --output-root "D:\Output\MergeRun" |
| 64 | + |
| 65 | +# Validate |
| 66 | +python kb_folder_manager.py validate --mode class1 --target "D:\Data\MyKB" --role complete --log-dir "D:\Output\logs" |
| 67 | + |
| 68 | +# Index generation |
| 69 | +python kb_folder_manager.py index --target "D:\Data\MyKB" --output "index.json" --log-dir "D:\Output\logs" |
| 70 | +``` |
| 71 | + |
| 72 | +## Architecture |
| 73 | + |
| 74 | +### User Interfaces |
| 75 | + |
| 76 | +**GUI (v3.0+)**: |
| 77 | +- Entry point: `kb_folder_manager_gui.py` → `kb_folder_manager/gui.py` |
| 78 | +- Framework: ttkbootstrap (modern tkinter) |
| 79 | +- Design: Multi-tab interface (Split/Merge/Validate/Index/Settings) |
| 80 | +- Threading: Operations run in background threads to prevent UI freezing |
| 81 | +- Features: Real-time progress bars, scrollable log output, file browsers |
| 82 | + |
| 83 | +**CLI**: |
| 84 | +- Entry point: `kb_folder_manager.py` → `kb_folder_manager/cli.py` |
| 85 | +- Arguments parsed with argparse |
| 86 | +- Direct calls to operations module |
| 87 | + |
| 88 | +Both interfaces are thin wrappers around the same backend operations. |
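The CLI surface implied by the manual testing commands above can be sketched with `argparse` subcommands. This is an illustrative layout only: the option names come from those commands, while `build_parser`, the `choices` lists, and which subcommands accept `--yes`/`--force` are assumptions. The real `cli.py` may differ; for example, `mutual` and `compare` presumably need extra path arguments that are not shown here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser inferred from the manual testing commands; not the real cli.py.
    parser = argparse.ArgumentParser(prog="kb_folder_manager")
    sub = parser.add_subparsers(dest="command", required=True)

    split = sub.add_parser("split", help="split Complete into Doc and Res")
    split.add_argument("--source", required=True)
    split.add_argument("--output-root", required=True)

    merge = sub.add_parser("merge", help="merge Doc and Res back into Complete")
    merge.add_argument("--doc", required=True)
    merge.add_argument("--res", required=True)
    merge.add_argument("--output-root", required=True)

    validate = sub.add_parser("validate", help="run one validation mode")
    validate.add_argument("--mode", required=True, choices=["class1", "class2", "mutual", "compare"])
    validate.add_argument("--target")
    validate.add_argument("--role", choices=["complete", "doc", "res"])
    validate.add_argument("--log-dir")

    index = sub.add_parser("index", help="write a hash-based index")
    index.add_argument("--target", required=True)
    index.add_argument("--output", required=True)
    index.add_argument("--log-dir")

    # Assumed: every subcommand can skip its confirmation prompt,
    # but only split/merge write to an output root, so only they get --force.
    for p in (split, merge, validate, index):
        p.add_argument("--yes", action="store_true", help="skip confirmation prompts")
    for p in (split, merge):
        p.add_argument("--force", action="store_true", help="allow a non-empty output directory")
    return parser
```

With a layout like this, `cli.py` would dispatch on `args.command` to the matching function in `operations.py`, which keeps the CLI as thin as the GUI.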
| 89 | + |
| 90 | +### Core Principles |
| 91 | + |
| 92 | +1. **Complete Directory is Read-Only** - Never modify Complete folders directly; treat as immutable source of truth |
| 93 | +2. **Placeholder Mechanism** - During split, each moved file leaves an empty folder named `<file name>` + `placeholder_suffix` (e.g., `"(在百度网盘)"`, i.e. "(on Baidu Netdisk)") on the side it was moved away from |
| 94 | +3. **Closed-Loop Operations** - All operations follow: Pre-check → User Confirmation → Execute → Post-check |
| 95 | +4. **Hash Verification** - All file operations are tracked with hashes (SHA256 by default, configurable via `hash_algorithm`) in `.kb_index.json` files |
| 96 | + |
| 97 | +### Data Flow |
| 98 | + |
| 99 | +**Split Operation:** |
| 100 | +``` |
| 101 | +Complete/ |
| 102 | +├── file.pdf     → doc/Complete/file.pdf   (placeholder: res/Complete/file.pdf(在百度网盘)/) |
| 103 | +├── file.bin     → res/Complete/file.bin   (placeholder: doc/Complete/file.bin(在百度网盘)/) |
| 104 | +└── nested/      → doc/Complete/nested/ and res/Complete/nested/ (structure preserved on both sides) |
| 105 | +    ├── doc.txt  → doc/Complete/nested/doc.txt |
| 106 | +    │              res/Complete/nested/doc.txt(在百度网盘)/   [placeholder] |
| 107 | +    └── data.bin → res/Complete/nested/data.bin |
| 108 | +                   doc/Complete/nested/data.bin(在百度网盘)/  [placeholder] |
| 109 | +``` |
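The placement rule in this diagram can be expressed as a pure function over relative file paths. `plan_split` is a hypothetical helper written for illustration: it assumes classification by the lowercased last extension and a placeholder name of `<file name>` + `placeholder_suffix`, and it does not touch the disk.

```python
from pathlib import PurePosixPath


def plan_split(rel_files, specified_types, placeholder_suffix):
    """Return (doc_entries, res_entries) as lists of (relative_path, kind) tuples.

    kind is "file" for a real file and "placeholder" for an empty placeholder folder.
    Hypothetical helper; the real split logic lives in operations.py.
    """
    doc, res = [], []
    for rel in rel_files:
        # Classification uses only the last extension, compared in lowercase.
        is_doc = PurePosixPath(rel).suffix.lower() in specified_types
        placeholder = rel + placeholder_suffix
        if is_doc:
            doc.append((rel, "file"))
            res.append((placeholder, "placeholder"))
        else:
            res.append((rel, "file"))
            doc.append((placeholder, "placeholder"))
    return doc, res
```

Every input file yields exactly one real entry on one side and one placeholder on the other, which is the invariant the **mutual** validation mode checks.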
| 110 | + |
| 111 | +**Merge Operation:** |
| 112 | +``` |
| 113 | +doc/Complete/ + res/Complete/ → Complete/ |
| 114 | +- Placeholders are removed |
| 115 | +- Files from both sides combined |
| 116 | +- Structure integrity validated |
| 117 | +``` |
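Once placeholder folders are filtered out, combining the two listings is a union that should never see the same path on both sides. The sketch below (hypothetical `plan_merge`, operating on relative paths only) treats such an overlap as a conflict rather than silently picking a winner; how the real tool reports this case is not documented here.

```python
def plan_merge(doc_files, res_files):
    """Return [(side, relative_path), ...] sorted by path for writing into Complete.

    Inputs are relative paths of real files only (placeholders already excluded).
    Hypothetical helper for illustration.
    """
    conflicts = sorted(set(doc_files) & set(res_files))
    if conflicts:
        # A file present on both sides means the Doc/Res pair is not a clean split.
        raise ValueError(f"files present in both Doc and Res: {conflicts}")
    merged = [("doc", p) for p in doc_files] + [("res", p) for p in res_files]
    return sorted(merged, key=lambda entry: entry[1])
```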
| 118 | + |
| 119 | +### Module Responsibilities |
| 120 | + |
| 121 | +- **gui.py** (NEW in v3.0) - Graphical user interface with ttkbootstrap |
| 122 | + - `KBFolderManagerGUI`: Main window class with tab management |
| 123 | + - `OperationThread`: Background thread for non-blocking operations |
| 124 | + - `LogCapture`: Captures operation logs for display in GUI |
| 125 | +- **cli.py** - Command-line interface, argument parsing, top-level orchestration |
| 126 | +- **operations.py** - Core split/merge/validate/index operations with pre-check → execute → post-check workflow |
| 127 | +- **validator.py** - Structure validation (class1/class2/mutual/compare modes) |
| 128 | +- **indexer.py** - File tree indexing with hash generation |
| 129 | +- **config.py** - YAML config loading and validation |
| 130 | +- **utils.py** - File I/O, logging, path handling, hash computation |
| 131 | + |
| 132 | +## Key Conventions |
| 133 | + |
| 134 | +### Configuration (config.yaml) |
| 135 | + |
| 136 | +- **specified_types** - File extensions that go to Doc side; must be lowercase with dot prefix (e.g., `'.pdf'`, `'.md'`) |
| 137 | +- **placeholder_suffix** - Reserved marker (default: `"(在百度网盘)"`). FATAL error if real directories end with this suffix |
| 138 | +- **hash_algorithm** - Default is `"sha256"`, also supports MD5, SHA1, etc. |
| 139 | +- **use_7zip** - Boolean for compression operations |
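These rules lend themselves to a small check run after the YAML is parsed. `check_config` is a hypothetical helper that works on the already-loaded mapping (so it does not depend on PyYAML); it assumes hash names are `hashlib` names such as `"sha256"` or `"md5"`, and it applies the defaults stated above when keys are missing.

```python
import hashlib


def check_config(cfg: dict) -> list:
    """Return human-readable problems found in a loaded config.yaml mapping (hypothetical)."""
    problems = []
    for ext in cfg.get("specified_types", []):
        # Extensions must look like ".pdf": a leading dot and no uppercase letters.
        if not (isinstance(ext, str) and ext.startswith(".") and ext == ext.lower()):
            problems.append(f"specified_types entry {ext!r} must be lowercase with a dot prefix")
    suffix = cfg.get("placeholder_suffix", "(在百度网盘)")
    if not isinstance(suffix, str) or not suffix:
        problems.append("placeholder_suffix must be a non-empty string")
    alg = cfg.get("hash_algorithm", "sha256")
    if not isinstance(alg, str) or alg.lower() not in hashlib.algorithms_guaranteed:
        problems.append(f"hash_algorithm {alg!r} is not a guaranteed hashlib algorithm")
    if not isinstance(cfg.get("use_7zip", False), bool):
        problems.append("use_7zip must be true or false")
    return problems
```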
| 140 | + |
| 141 | +### File Type Classification |
| 142 | + |
| 143 | +Files are classified by **last extension only** using `is_specified_type()`: |
| 144 | +```python |
| 145 | +# In Doc: .pdf, .doc, .docx, .txt, .md, .xmind, images, videos, audio, code files |
| 146 | +# In Res: Everything else (binary resources, unknown formats) |
| 147 | +``` |
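A minimal sketch of that rule, assuming the function receives a file name plus the configured `specified_types` collection and lowercases the extension before comparing (the project's actual signature may differ):

```python
from pathlib import PurePath


def is_specified_type(filename: str, specified_types) -> bool:
    # Sketch of the documented rule, not the project's implementation.
    # PurePath.suffix is only the last extension: "backup.tar.gz" -> ".gz".
    # Names without a dot, such as "README", have suffix "" and fall to Res.
    return PurePath(filename).suffix.lower() in specified_types
```

Because only the last extension counts, listing a compound extension such as `.tar.gz` in `specified_types` would never match anything.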
| 148 | + |
| 149 | +### Validation Modes |
| 150 | + |
| 151 | +1. **class1** - Basic environment checks (no UNC paths, no symlinks, no invalid characters, no names that differ only by case) |
| 152 | + - Used for Complete, Doc, and Res folders |
| 153 | + - `allow_placeholders=False` for Complete, `True` for Doc/Res |
| 154 | + |
| 155 | +2. **class2** - Type purity checks (only specified types in Doc, non-specified in Res) |
| 156 | + - Only applies to Doc and Res folders |
| 157 | + - Ensures split was done correctly |
| 158 | + |
| 159 | +3. **mutual** - Doc/Res consistency (structure mirrors, placeholders complement real files) |
| 160 | + - Validates Doc and Res are perfect complements |
| 161 | + |
| 162 | +4. **compare** - Hash/size verification between old and new folders |
| 163 | + - mtime differences are warnings, not errors |
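The **mutual** rule can be stated over four sets of relative paths: every real file on one side needs a placeholder named `<file>` + `placeholder_suffix` on the other, every placeholder needs the real file it stands for, and no path is a real file on both sides. `check_mutual` is a hypothetical sketch of that rule over precomputed sets; the real validator walks the folders itself and may word or group its findings differently.

```python
def check_mutual(doc_files, doc_placeholders, res_files, res_placeholders, suffix):
    """Return error messages for a Doc/Res pair that are not exact complements.

    All collection arguments are sets of relative POSIX paths (hypothetical inputs).
    """
    errors = []
    # 1. Each real file must be mirrored by a placeholder on the opposite side.
    for side, files, other_placeholders in (
        ("doc", doc_files, res_placeholders),
        ("res", res_files, doc_placeholders),
    ):
        for rel in sorted(files):
            if rel + suffix not in other_placeholders:
                errors.append(f"{side}: {rel} has no placeholder on the other side")
    # 2. Each placeholder must point at a real file on the opposite side.
    for side, placeholders, other_files in (
        ("doc", doc_placeholders, res_files),
        ("res", res_placeholders, doc_files),
    ):
        for ph in sorted(placeholders):
            if ph.removesuffix(suffix) not in other_files:
                errors.append(f"{side}: placeholder {ph} has no matching file on the other side")
    # 3. A path must not be a real file in both Doc and Res.
    for rel in sorted(set(doc_files) & set(res_files)):
        errors.append(f"both: {rel} is a real file in Doc and in Res")
    return errors
```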
| 164 | + |
| 165 | +### Logger Result Levels |
| 166 | + |
| 167 | +Operations use the `Logger` class, which records findings at three severity levels: |
| 168 | +- **fatal** - Operation must abort (e.g., placeholder suffix in Complete folder name) |
| 169 | +- **error** - Serious issue but operation may continue (e.g., non-empty placeholder directory) |
| 170 | +- **warning** - Informational (e.g., long paths >240 chars, mtime mismatch in compare) |
| 171 | + |
| 172 | +Call `abort_if_blockers(logger, operation_name)` after a check to halt the operation when `logger.result.fatals > 0`. |
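The severity rule reduces to counting findings and refusing to continue once any fatal has been recorded. `MiniLogger`, `Result`, and `OperationAborted` below are hypothetical stand-ins for the project's `Logger` and `logger.result`, and this `abort_if_blockers` only mirrors the documented condition; whether the real function raises, exits, or returns a flag is not stated here, so raising an exception is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    fatals: int = 0
    errors: int = 0
    warnings: int = 0


@dataclass
class MiniLogger:
    """Hypothetical stand-in for Logger: keeps severity counters and message lines."""
    result: Result = field(default_factory=Result)
    lines: list = field(default_factory=list)

    def fatal(self, msg: str) -> None:
        self.result.fatals += 1
        self.lines.append(f"FATAL: {msg}")

    def error(self, msg: str) -> None:
        self.result.errors += 1
        self.lines.append(f"ERROR: {msg}")

    def warning(self, msg: str) -> None:
        self.result.warnings += 1
        self.lines.append(f"WARNING: {msg}")


class OperationAborted(RuntimeError):
    pass


def abort_if_blockers(logger, operation_name: str) -> None:
    # Only fatals block; errors and warnings stay in the log for the user to review.
    if logger.result.fatals > 0:
        raise OperationAborted(f"{operation_name}: {logger.result.fatals} fatal issue(s)")
```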
| 173 | + |
| 174 | +### Path Handling |
| 175 | + |
| 176 | +- **Windows paths only** - Use backslashes, no UNC paths (`\\server\share`) |
| 177 | +- **Relative paths in indexes** - All paths in `.kb_index.json` use forward slashes and are relative to index root |
| 178 | +- **Normalization** - Use `Path.resolve()` for absolute normalization, but store/compare as relative |
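Turning an absolute Windows path into an index key combines these rules. This sketch uses `PureWindowsPath` so it behaves the same on any OS, and it skips the `Path.resolve()` step the real code is described as doing first; `to_index_key` is a hypothetical name.

```python
from pathlib import PureWindowsPath


def to_index_key(path: str, root: str) -> str:
    """Return the forward-slash path of `path` relative to `root` (hypothetical helper)."""
    p = PureWindowsPath(path)
    # UNC paths have a drive such as \\server\share; the tool does not support them.
    if p.drive.startswith("\\\\"):
        raise ValueError(f"UNC paths are not supported: {path}")
    # relative_to raises ValueError if `path` is not under `root`.
    return p.relative_to(PureWindowsPath(root)).as_posix()
```

For example, `D:\Data\MyKB\nested\doc.txt` under the root `D:\Data\MyKB` becomes the key `nested/doc.txt`.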
| 179 | + |
| 180 | +### Index Structure |
| 181 | + |
| 182 | +`.kb_index.json` contains: |
| 183 | +```json |
| 184 | +{ |
| 185 | + "files": { |
| 186 | + "path/to/file.txt": { |
| 187 | + "kind": "file", |
| 188 | + "size": 1234, |
| 189 | + "mtime": "2026-01-30T12:00:00", |
| 190 | + "hash": "abc123...", |
| 191 | + "hash_alg": "sha256" |
| 192 | + } |
| 193 | + }, |
| 194 | + "dirs": { |
| 195 | + "path/to/dir": {"kind": "dir"} |
| 196 | + }, |
| 197 | + "placeholders": { |
| 198 | + "path/to/file.txt(在百度网盘)": { |
| 199 | + "kind": "placeholder_dir", |
| 200 | + "placeholder_for_name": "file.txt", |
| 201 | + "placeholder_suffix": "(在百度网盘)" |
| 202 | + } |
| 203 | + }, |
| 204 | + "metadata": { |
| 205 | + "root_path": "D:\\Data\\MyKB", |
| 206 | + "generated_at": "2026-01-30T12:00:00" |
| 207 | + } |
| 208 | +} |
| 209 | +``` |
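A single `files` entry in that shape can be produced by streaming the file through `hashlib`. `file_entry` is a hypothetical helper; rendering `mtime` as local time with second precision is an assumption based on the example value, and the real indexer may format it differently.

```python
import hashlib
from datetime import datetime
from pathlib import Path


def file_entry(path: Path, hash_alg: str = "sha256", chunk_size: int = 1 << 20) -> dict:
    """Build one "files" entry in the shape shown above (hypothetical helper)."""
    digest = hashlib.new(hash_alg)
    with path.open("rb") as f:
        # Read in chunks (1 MiB by default) so large resources need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    st = path.stat()
    return {
        "kind": "file",
        "size": st.st_size,
        # Assumed format: local time, second precision, like "2026-01-30T12:00:00".
        "mtime": datetime.fromtimestamp(st.st_mtime).isoformat(timespec="seconds"),
        "hash": digest.hexdigest(),
        "hash_alg": hash_alg,
    }
```

The dictionary key for each entry would be the file's relative, forward-slash path, as described under Path Handling.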
| 210 | + |
| 211 | +### Common Patterns |
| 212 | + |
| 213 | +**Pre-check validation before operations:** |
| 214 | +```python |
| 215 | +pre_log = Logger(log_dir / 'Operation_pre_check.log') |
| 216 | +try: |
| 217 | + validate_class1(source, config, allow_placeholders=False, logger=pre_log) |
| 218 | + write_summary(pre_log) |
| 219 | + abort_if_blockers(pre_log, 'operation pre-check') |
| 220 | +finally: |
| 221 | + pre_log.close() |
| 222 | +``` |
| 223 | + |
| 224 | +**Iterating directory tree:** |
| 225 | +```python |
| 226 | +for rel_root, current_norm, dirs, files, placeholder_dirs in iter_walk(root, placeholder_suffix): |
| 227 | +    # rel_root: path relative to root; current_norm: normalized absolute path of the current directory |
| 228 | +    # dirs, files: regular entries |
| 229 | +    # placeholder_dirs: directories whose names end with placeholder_suffix |
| 230 | +    ...  # per-directory processing goes here |
| 231 | +``` |
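For reference, a generator with that contract can be built on `os.walk`. This is a hypothetical re-implementation, not the project's `iter_walk`: it assumes placeholder folders are reported but never descended into, and it sorts names for deterministic output.

```python
import os
from pathlib import Path


def iter_walk(root, placeholder_suffix):
    """Yield (rel_root, current_norm, dirs, files, placeholder_dirs) for each directory."""
    root = Path(root).resolve()
    for current, dirnames, filenames in os.walk(root):
        current_norm = Path(current)  # absolute, because root was resolved
        placeholder_dirs = sorted(d for d in dirnames if d.endswith(placeholder_suffix))
        # Prune in place so os.walk does not descend into placeholder folders.
        dirnames[:] = sorted(d for d in dirnames if not d.endswith(placeholder_suffix))
        yield current_norm.relative_to(root), current_norm, list(dirnames), sorted(filenames), placeholder_dirs
```

Pruning `dirnames` in place only works because `os.walk` defaults to top-down traversal.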
| 232 | + |
| 233 | +**Placeholder handling:** |
| 234 | +```python |
| 235 | +# Creating placeholder |
| 236 | +placeholder_name = original_name + config.placeholder_suffix |
| 237 | +placeholder_path = parent_dir / placeholder_name |
| 238 | +ensure_dir(placeholder_path) # Create empty directory |
| 239 | + |
| 240 | +# Deriving original name from placeholder |
| 241 | +original = derive_placeholder_original(placeholder_name, config.placeholder_suffix) |
| 242 | +``` |
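The inverse of the naming rule is a suffix strip. This hypothetical version returns `None` for names that are not placeholders (or that would leave an empty original name); the project's `derive_placeholder_original` may raise or log instead, so treat the error handling as an assumption.

```python
from typing import Optional


def derive_placeholder_original(placeholder_name: str, placeholder_suffix: str) -> Optional[str]:
    """Return the original file name a placeholder stands for, or None (hypothetical sketch)."""
    if not placeholder_suffix or not placeholder_name.endswith(placeholder_suffix):
        return None  # not a placeholder name
    original = placeholder_name[: -len(placeholder_suffix)]
    return original or None  # a bare suffix has no original name
```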
| 243 | + |
| 244 | +## Important Notes |
| 245 | + |
| 246 | +- Always use `--yes` flag for automated testing to skip confirmation prompts |
| 247 | +- Use `--force` with caution - allows operations on non-empty output directories |
| 248 | +- Log files are timestamped and stored in `logs/<timestamp>/` subdirectories |
| 249 | +- Entry point is `kb_folder_manager.py` which imports from `kb_folder_manager/cli.py` |
| 250 | +- GUI entry point is `kb_folder_manager_gui.py` which imports from `kb_folder_manager/gui.py` |
| 251 | +- Tests use `tempfile.TemporaryDirectory()` for isolation |
| 252 | +- GUI operations run in threads - never block the main (UI) thread |
| 253 | +- GUI tests are automated via `tests/test_gui.py` and simulate user interactions programmatically |
| 254 | + |
| 255 | +## GUI Development Guidelines |
| 256 | + |
| 257 | +- **Never modify backend operations** - GUI is a wrapper only |
| 258 | +- **Use threading for operations** - Keep UI responsive |
| 259 | +- **Update progress via callbacks** - Progress bar and status label |
| 260 | +- **Display logs in real-time** - ScrolledText widget with auto-scroll |
| 261 | +- **Handle errors gracefully** - Catch exceptions and show messagebox |
| 262 | +- **Test via simulation** - `tests/test_gui.py` provides automated testing without actual GUI interaction |
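A common way to meet the threading and real-time log guidelines in a tkinter-based app is to run the operation in a worker thread, have it push log lines into a `queue.Queue`, and let the UI thread drain that queue on a timer, since Tk widgets should only be touched from the main thread. The helper below is a hypothetical sketch of that pattern, not the project's `OperationThread`; it assumes the operation accepts a `log` callback and contains no Tk code, so it can be tested headlessly.

```python
import queue
import threading


class BackgroundOperation:
    """Run a blocking operation in a worker thread and queue its log lines for the UI thread.

    Hypothetical helper. `func` must accept a `log` keyword argument
    (a callable taking one string); that contract is assumed.
    """

    _DONE = object()  # sentinel marking the end of the operation

    def __init__(self, func, *args):
        self.messages = queue.Queue()
        self.error = None
        self._thread = threading.Thread(target=self._run, args=(func, args), daemon=True)

    def _run(self, func, args):
        try:
            func(*args, log=self.messages.put)
        except Exception as exc:  # kept for the UI thread to show in a messagebox
            self.error = exc
        finally:
            self.messages.put(self._DONE)

    def start(self):
        self._thread.start()

    def join(self, timeout=None):
        self._thread.join(timeout)

    def poll(self):
        """Drain queued lines without blocking; return (lines, finished)."""
        lines, finished = [], False
        while True:
            try:
                item = self.messages.get_nowait()
            except queue.Empty:
                return lines, finished
            if item is self._DONE:
                finished = True
            else:
                lines.append(item)
```

In the window, a method scheduled with `root.after(100, ...)` would call `poll()`, append the returned lines to the ScrolledText widget, and either reschedule itself or, once `finished` is true, show `messagebox.showerror` if `error` is set.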