In practice, each behavior becomes a small eval case:
- some context, like the Vue version
- a few rules that tell me whether the result is acceptable

To keep this article short, I put a concrete example on my GitHub. Have a look! <RichLink href="https://github.com/adrienZ/skill-development-demo" title="adrienZ/skill-development-demo"></RichLink>

To be honest, my knowledge of evals is still quite new and there are things to improve: A/B testing and tuning the model temperature could be good ways to make a skill more reliable. Current eval systems are also not as structured as traditional unit tests, though this may change in the future!
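Here is a minimal TypeScript sketch of what such an eval case and a crude A/B comparison could look like. It is not the code from the demo repository: the `EvalCase` shape, the `prompt` field, the two Vue rules and the stub "skills" are all hypothetical. Each rule is a predicate on the model output; since that output varies from run to run, every case is run several times and scored as a pass rate, so two versions of a skill can be compared on the same case.

```ts
// Minimal eval harness sketch. All names here are hypothetical:
// the article does not prescribe a format for eval cases.

type Context = Record<string, string>;

type Rule = {
  description: string;
  check: (output: string) => boolean; // true when the output is acceptable
};

type EvalCase = {
  prompt: string; // what the user asks (assumed field)
  context: Context; // e.g. the Vue version
  rules: Rule[]; // what makes the result acceptable
};

// Stand-in for "ask the model, with a given version of the skill loaded".
type Generate = (prompt: string, context: Context) => string;

// One run passes only if every rule accepts the output.
function passes(evalCase: EvalCase, output: string): boolean {
  return evalCase.rules.every((rule) => rule.check(output));
}

// LLM output varies, so run the same case several times and
// report the fraction of runs that passed.
function passRate(evalCase: EvalCase, generate: Generate, runs: number): number {
  let passed = 0;
  for (let i = 0; i < runs; i++) {
    if (passes(evalCase, generate(evalCase.prompt, evalCase.context))) passed++;
  }
  return passed / runs;
}

// Crude A/B test: same case, same number of runs, two skill versions.
function compareVariants(
  evalCase: EvalCase,
  variantA: Generate,
  variantB: Generate,
  runs: number,
): { a: number; b: number } {
  return {
    a: passRate(evalCase, variantA, runs),
    b: passRate(evalCase, variantB, runs),
  };
}

// Hypothetical eval case for a Vue skill.
const vueCase: EvalCase = {
  prompt: "Create a counter component",
  context: { vueVersion: "3" },
  rules: [
    { description: "uses <script setup>", check: (out) => out.includes("<script setup") },
    { description: "no Options API data()", check: (out) => !/\bdata\s*\(\)/.test(out) },
  ],
};

const good = `<script setup lang="ts">\nconst count = ref(0)\n</script>`;
const bad = `<script>\nexport default { data() { return { count: 0 } } }\n</script>`;

// Deterministic stubs standing in for real model calls.
const skillA: Generate = () => good;
let callsB = 0;
const skillB: Generate = () => (++callsB % 2 === 0 ? good : bad); // passes every other run

const result = compareVariants(vueCase, skillA, skillB, 10);
console.log(result); // { a: 1, b: 0.5 }
```

With a real model, `skillA` and `skillB` would call your LLM with each version of the skill loaded, ideally at the same temperature so the comparison stays fair; the stubs only exist so the sketch runs on its own.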

If you want to learn more about evals, here are some references from people who know what they are doing 😝

Your skill is now well written, validated and tested, but how can you easily share it with your colleagues?

Vercel created a skill registry called [skills.sh](https://www.skills.sh/). Like npm does for packages, it lets you publish, audit, browse and install skills, and it uses a `skills-lock.json` file to make setting up your projects easy.
If you work in a private codebase, you can use any GitHub URL or filesystem path instead.

## Conclusion

Creating a skill can make a big difference in your everyday life, but skills are "just" text and the output of your LLM can vary.

As developers, we have tools to make this better:
- standard: the open Agent Skills format
- "linters": skill-ref, skill-validator
- unit tests: evals
- registry: like npm, we have skills.sh