Skills Officially Comes to Codex

(developers.openai.com)

127 points | by rochansinha 7 hours ago

19 comments

  • btown 12 minutes ago
    Something that’s under-emphasized and vital to understand about Skills is that, by the spec, there’s no RAG on the content of Skill code or markdown - the names and descriptions in every skill’s front-matter are included verbatim in your prompt, and that’s all that’s used to choose a skill.

    So if you have subtle logic in a Skill that’s not mentioned in a description, or you use the skill body to describe use-cases not obvious from the front-matter, it may never be discovered or used.

    Additionally, skill descriptions are all essentially prompt injections, whether relevant/vector-adjacent to your current task or not; if they nudge towards a certain tone, that may apply to your general experience with the LLM. And, of course, they add to your input tokens on every agentic turn. (This feature was proudly brought to you by Big Token.) So be thoughtful about what you load in what context.

    See e.g. https://github.com/openai/codex/blob/a6974087e5c04fc711af68f...
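
A minimal sketch of the selection step described above, assuming each skill is a markdown file whose front-matter holds single-line `name` and `description` fields. The helper names and the example skill are hypothetical, and the hand-rolled parser stands in for a real YAML library; this is not Codex's actual implementation. It only illustrates that anything written solely in the body never reaches the catalog the model chooses from.

```python
def parse_front_matter(text):
    """Return the key/value pairs found between the leading '---' fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # no closing fence: treat the file as having no front-matter


def build_catalog(skill_files):
    """Build the text that would be placed in the prompt: name and description only."""
    entries = []
    for text in skill_files:
        meta = parse_front_matter(text)
        if "name" in meta and "description" in meta:
            entries.append(f"- {meta['name']}: {meta['description']}")
    return "\n".join(entries)


# Hypothetical skill file; the body contains a rule the description never mentions.
EXAMPLE_SKILL = """---
name: pg-test-data
description: Query the local test database for sample rows.
---
Only use read-only queries. Never touch the billing schema.
"""

catalog = build_catalog([EXAMPLE_SKILL])
print(catalog)
```

Under these assumptions the billing rule in the body is invisible at selection time, so a use-case that only the body describes cannot cause the skill to be picked.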

  • cube2222 4 hours ago
    It's so nice that skills are becoming a standard, they are imo a much bigger deal long-term than e.g. MCP.

    Easy to author (at its most basic, just a markdown file), context efficient by default (only preloads yaml front-matter, can lazy load more markdown files as needed), can piggyback on top of existing tooling (for instance, instead of the GitHub MCP, you just make a skill describing how to use the `gh` cli).

    Compared to purpose-tuned system prompts they don't require a purpose-specific agent, and they also compose (the agent can load multiple skills that make sense for a given task).

    Part of what makes this effective is that AI models are expensive enough to run that a sandbox VM on the side is likely negligible cost-wise, so the major chat UI providers now all give the model such a sandboxed environment. That means skills can also contain Python and/or JS scripts, which is again much simpler, more straightforward, and more flexible than e.g. requiring the target to expose remote MCPs.

    Finally, you can use a skill to tell your model how to properly approach using your MCP server - which previously often required either long prompting, or a purpose-specific system prompt, with the cons I've already described.

    • hu3 3 hours ago
      Perhaps you could help me.

      I'm having a hard time figuring out how I could leverage skills in a medium-size web application project.

      It's python, PostgreSQL, Django.

      Thanks in advance.

      I wonder if skills are more useful for non crud-like projects. Maybe data science and DevOps.

      • jonrosner 3 hours ago
        you could for example create a skill to access your database for testing purposes and pass in your tables specifications so that the agent can easily retrieve data for you on the fly.
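
A rough sketch of the kind of helper script such a skill could bundle. Everything here is an assumption for illustration: the `TABLE_SPECS` mapping, the `fetch_sample` function, and the `customer` table are hypothetical, and `sqlite3` stands in for PostgreSQL so the example runs without a server.

```python
import sqlite3

# Hypothetical table specification the skill would ship alongside its instructions.
TABLE_SPECS = {"customer": ["id", "email"]}


def fetch_sample(conn, table, limit=5):
    """Return up to `limit` rows from an allow-listed table, read-only."""
    if table not in TABLE_SPECS:
        raise ValueError(f"unknown table: {table}")
    # Table and column names come from the allow-list above, never from free text.
    columns = ", ".join(TABLE_SPECS[table])
    query = f"SELECT {columns} FROM {table} ORDER BY id LIMIT ?"
    return conn.execute(query, (limit,)).fetchall()


# In-memory stand-in for the project's test database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO customer (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",)],
)

rows = fetch_sample(conn, "customer", limit=2)
print(rows)
```

Restricting the script to an allow-list keeps the agent from querying tables the skill author did not intend to expose.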
        • derrida 2 hours ago
          Oooooo, woah, I didn't really "get it" thanks for spelling it out a bit, just thought of some crazy cool experiments I can run if that is true.
          • dkdcio 1 hour ago
            it’s also for (typically) longer context you don’t always want the agent to have in its context. if you always want it in context, use rules (memories)

            but if it’s something more involved or less frequently used (perhaps some debugging methodology, or designing new data schemas) skills are probably a good fit

      • freakynit 3 hours ago
        Skills are not useful for single-shot cases. They are for: cross-team standardization (for LLM generated code), and reliable reusability of existing code/learnings.
      • JamesSwift 2 hours ago
        Skills are the Matrix scene where Neo learns kung fu. Imagine they are a database of specialized knowledge that an agent can instantly tap into _on demand_.

        The key here is “on demand”. Not every agent or conversation needs to know kung fu. But when they do, a skill is waiting to be consumed. This basic idea is “progressive disclosure” and it composes nicely to keep context windows focused. E.g. I have a Metabase skill to query analytics. Within that I conditionally refer to how to generate authentication if they aren't authenticated. If they are authenticated, that information need not be consumed.

        Some practical “skills”: writing tests, fetching sentry info, using playwright (a lot of local mcps are just flat out replaced by skills), submitting a PR according to team conventions (eg run lint, review code for X, title matches format, etc)
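
The progressive-disclosure idea above can be sketched roughly as follows. The `Skill` class, the `auth.md` file name, and the `authenticated` flag are hypothetical; in practice the agent itself decides whether to open a referenced file, and the Python conditional here only stands in for that decision.

```python
import tempfile
from pathlib import Path


class Skill:
    """Tracks which files of a skill folder were actually pulled into context."""

    def __init__(self, root):
        self.root = root
        self.loaded = []

    def read(self, relative_name):
        self.loaded.append(relative_name)
        return (self.root / relative_name).read_text(encoding="utf-8")


def context_for_query(skill, authenticated):
    """Load the main instructions, and the auth notes only when they are needed."""
    parts = [skill.read("SKILL.md")]
    if not authenticated:
        parts.append(skill.read("auth.md"))
    return "\n\n".join(parts)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "SKILL.md").write_text(
        "How to query analytics. If not authenticated, read auth.md.",
        encoding="utf-8",
    )
    (root / "auth.md").write_text("Steps to generate credentials.", encoding="utf-8")

    logged_in = Skill(root)
    context_for_query(logged_in, authenticated=True)

    logged_out = Skill(root)
    context_for_query(logged_out, authenticated=False)

print(logged_in.loaded)
print(logged_out.loaded)
```

The authenticated run never reads the auth notes, which is the context saving the comment describes.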

        • aed 1 hour ago
          Could you explain more about your metabase skill and how you use it? We use metabase (and generally love it) and I’m interested to hear about how other people are using it!
  • astra90 1 hour ago
    I think Skills could turn into something like open source libraries: standardized solutions to common problems, often written by experts.

    Imagine having Skills available that implement authentication systems, multi-tenancy, etc. in your codebase without having to know all the details about how to do this securely and correctly. This would probably boost code quality a lot and prevent insecure/buggy vibe coded products.

    • JimDabell 1 hour ago
      And then you make a global index of those skills available to models, where they can search for an appropriate skill on demand, then download and use them automatically.

      A lot of the things we want continuous learning for can actually be provided by the ability to obtain skills on the fly.

  • freakynit 3 hours ago
    I already was doing something similar on a regular basis.

    I have many "folders"... each with a README.md, a scripts folder, and an optional GUIDE.md.

    Whenever I arrive at some code that I know can be reused easily (for example: clerk.dev integration that spans both frontend and backend), I used to create a "folder" of the same.

    When needed, I used to just copy-paste all the folder content using my https://www.npmjs.com/package/merge-to-md package.

    This has worked flawlessly for me up until now.

    Glad we are bringing such capability natively into these coding agents.

  • andybak 3 hours ago
    Skills, plugins, apps, connectors, MCPs, agents - anyone else getting a bit lost?
    • Frost1x 2 hours ago
      In my opinion it’s to some degree an artifact of immature and/or rapidly changing technology. Basically not many know what the best approach is, all the use cases aren’t well understood, and things are changing so rapidly they’re basically just creating interfaces around everything so you can change flow in and out of LLMs any way you may desire.

      Some paths are emerging as popular, but in a lot of cases we’re still not sure even these are the long term paths that will remain. It doesn’t help that there’s not a good taxonomy (that I’m aware of) to define and organize the different approaches out there. “Agent” for example is a highly overloaded term that means a lot of things and even in this space, agents mean different things to different groups.

      • nlawalker 4 minutes ago
        I liken the discovery/invention of LLMs to the discovery/invention of the electric motor - it's easy to take things like cars, drills, fans, pumps etc. for granted now, and all of the ergonomics and standards around them seem obvious in this era, but it took quite a while to go from "we can put power in this thing and it spins" to the state we're in today.

        For LLMs, we're just about at the stage where we've realized we can jam a sharp thing in the spinny part and use it to cut things. The race is on not only to improve the motors (models) themselves, but to invent ways of holding and manipulating and taking advantage of this fundamental thing that feel so natural that they seem obvious in hindsight.

    • not_a_toaster 2 hours ago
      They’re all bandaids
      • throwuxiytayq 2 hours ago
        Just like C++, JavaScript and every Microsoft product in existence
    • maddmann 2 hours ago
      It reminds me of LLM output at scale. LLMs tend to produce a lot of similar but slightly different ideas in a codebase when not properly guided.
    • iLoveOncall 2 hours ago
      All marketing names for APIs and prompts. IMO you don't need to even try to follow, because there's nothing inherently new or innovative about any of this.
    • ksdnjweusdnkl21 2 hours ago
      It's like JS frameworks. Just wait until a React emerges and get up to speed with that later.
      • andybak 1 hour ago
        That's funny. My reaction to react emerging was to run away from JS frameworks entirely.
      • riffraff 1 hour ago
        React itself took a few years to decide how it should work (hooks, not classes, etc.).
        • tartoran 1 hour ago
          The same will probably follow with LLMs. If you find something that works for you, sorry but that will change.
  • orliesaurus 1 hour ago
    If there were a marketplace or directory of skills.md files, ranked and with comments, it would help this tech spread.
    • relativeadv 1 hour ago
      it feels like people keep attempting this idea, largely because it's easy to build, but in practice people aren't interested in using others' prompts because the cost to create a customized skill/gpt/prompt/whatever is near zero
      • true2octave 1 hour ago
        People want inspiration rather than off-the-shelf prompts

        More like a gallery than a marketplace

    • nickdichev 18 minutes ago
      I created a skill to write skills (based on the Anthropic docs). I think the value is really in making the skills work for your workflows and code base
    • dkdcio 1 hour ago
      ask, receive! https://github.com/anthropics/skills

      not ranked with comments but I’d expect solid quality from these and they should “just work” in Codex etc.

  • arnabgho 7 minutes ago
    Anthropic: Chief Product Officer of OpenAI
  • ithkuil 1 hour ago
    I wonder if generated skills could be useful to codify the outcome of long sessions where the agent has tried a bunch of things and then finally settled on a solution based on a mixture of test failures and user feedback
    • dkdcio 1 hour ago
      yeah I have a “meta” skill and often use it after a session to instruct CC to update its own skills/rules. get the flywheel going
  • pupppet 48 minutes ago
    How are skills different than tool/function calling?
    • esafak 0 minutes ago
      It's the catalog for the tools. Especially useful if you have custom tools; they expect the basics like grep and jq to be there.
    • jinushaun 37 minutes ago
      I agree. I don’t see how this is different from tool calling. We just put the tool instructions in a folder of markdown files.
      • yousif_123123 21 minutes ago
        It doesn't need to be describing a function. It could be explaining the skill in any way; it's kind of just like more instructions and metadata to be loaded just in time vs given all at once to the model.
  • mikaelaast 4 hours ago
    Are we sure that unrestricted free-form Markdown content is the best configuration format for this kind of thing? I know there is a YAML frontmatter component to this, but doesn't the free-form nature of the "body" part of these configuration files lead to an inevitably unverifiable process? I would like my agents to be inherently evaluable, and free-text instructions do not lend themselves easily to systematic evaluation.
    • coldtea 3 hours ago
      >doesn't the free-form nature of the "body" part of these configuration files lead to an inevitably unverifiable process?

      The non-deterministic statistical nature of LLMs means it's inherently an "inevitably unverifiable process" to begin with, even if you pass it some type-checked, linted, skills file or prompt format.

      Besides, YAML or JSON or XML or free-form text, for the LLM it's just tokens.

      At best you could parse the more structured docs with external tools more easily, but that's about it, not much difference when it comes to their LLM consumption.

    • Etheryte 4 hours ago
      The modern state of the art is inherently not verifiable. Which way you give it input is really secondary to that fact. When you don't see weights or know anything else about the system, any idea of verifiability is an illusion.
      • mikaelaast 3 hours ago
        Sure. Verifiability is far-fetched. But say I want to produce a statistically significant evaluation result from this – essentially testing a piece of prose. How do I go about this, short of relying on a vague LLM-as-a-judge metric? What are the parameters?
        • coldtea 2 hours ago
          Would a structured skills file format help you evaluate the results more?
          • mikaelaast 2 hours ago
            Yes. It would make it much easier to evaluate results if the input contents were parameterized and normalized to some agreed-upon structure.

            Not to mention the advantages it would present for iteration and improvement.
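
One way to approach the evaluation question raised above without an LLM judge is to treat each run as a pass/fail trial on a deterministic check (for example, whether the agent loaded the expected skill) and report a confidence interval on the pass rate. This is only a sketch under that assumption: `wilson_interval` is a hypothetical helper implementing the standard Wilson score interval, and the outcome list is made-up placeholder data, not a measurement.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Placeholder outcomes: True means the deterministic check passed on that run.
outcomes = [True] * 50 + [False] * 50

low, high = wilson_interval(sum(outcomes), len(outcomes))
print(f"pass rate {sum(outcomes) / len(outcomes):.2f}, interval [{low:.3f}, {high:.3f}]")
```

Comparing two versions of a skill then reduces to checking whether their intervals overlap, which depends on the number of runs rather than on how the skill text is formatted.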

      • hu3 3 hours ago
        At least MCPs can be unit tested.

        With Skills however, you just selectively append more text to prompt and pray.

  • rdli 3 hours ago
    This is great. At my startup, we have a mix of Codex/CC users so having a common set of skills we can all use for building is exciting.

    It’s also interesting to see how instead of a plan mode like CC, Codex is implementing planning as a skill.

    • greymalik 2 hours ago
      I’m probably missing it, but I don’t see how you can share skills across agents, other than maybe symlinking .claude/skills and .codex/skills to the same place?
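
The symlink idea above could look roughly like this. The `.claude/skills` and `.codex/skills` paths come from the comment; the `team-skills` folder, the `link_skill_dirs` helper, and the example skill are hypothetical, and whether each agent follows symlinks or looks in these exact locations is an assumption that should be checked against its documentation.

```python
import tempfile
from pathlib import Path

AGENT_SKILL_DIRS = (".claude/skills", ".codex/skills")


def link_skill_dirs(project, shared):
    """Point each agent's skills directory at one shared folder."""
    links = []
    for relative in AGENT_SKILL_DIRS:
        link = project / relative
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink():
            link.unlink()  # replace a stale link; a real directory is left alone and raises below
        link.symlink_to(shared.resolve(), target_is_directory=True)
        links.append(link)
    return links


with tempfile.TemporaryDirectory() as tmp:
    shared = Path(tmp) / "team-skills"
    (shared / "pr-conventions").mkdir(parents=True)
    (shared / "pr-conventions" / "SKILL.md").write_text(
        "Run lint before opening a PR.", encoding="utf-8"
    )
    project = Path(tmp) / "project"
    project.mkdir()

    links = link_skill_dirs(project, shared)
    # Both agent-specific paths should now resolve to the same file.
    contents = [
        (link / "pr-conventions" / "SKILL.md").read_text(encoding="utf-8")
        for link in links
    ]

print(contents)
```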
      • rdli 2 hours ago
        Nothing super-fancy. We have a common GitHub repo in our org for skills, and everyone checks out the repo into their preferred setup locally.

        (To clarify, I meant that some engineers mostly use CC while others mostly use Codex, as opposed to engineers using both at the same time.)

      • hugh-avherald 1 hour ago
        Codex 5.2 automatically picked up my claude agents' skills. Didn't prompt for it, it just so happened that what I asked it for, one of claude's agents' prompts was useful, so Codex ran with it.
  • stared 4 hours ago
    Yes! I was raving about Claude Skills a few days ago (vide https://quesma.com/blog/claude-skills-not-antigravity/), and excited they come to Codex as well!
    • derrida 2 hours ago
      Thanks for that! You mentioned Antigravity seemed slow, I just started playing with it too (but not really given it a good go yet to really evaluate) but I had the model set to Gemini Flash, maybe you get a speed up if you do that?
      • stared 1 hour ago
        My motivation was to use the smartest model available (overall, not only from Google) - I wanted to squeeze more out of Gemini 3 Pro than in Cursor. New model releases usually come with outages. Things are ever changing.

        That said, for many tasks (summaries and data extraction) I do use Gemini 2.5 Flash, as it is cheap and fast. So excited to try Gemini 3 Flash as well.

  • jonrosner 3 hours ago
    one thing that I am missing from the specification is a way to inject specific variables into the skills. If I create let's say a postgres-skill, then I can either (1) provide the password on every skill execution or (2) hardcode the password into my script. To make this really useful there needs to be some kind of secret storage that the agent can read/write. This would also allow me as a programmer to sell the skills that I create more easily to customers.
    • j_bum 2 hours ago
      I have no clue how you’re running your agents or what you’re building, but giving the raw password string to the model seems dubious?

      Otherwise, why not just keep the password in an .env file, and state “grab the password from the .env file” in your Postgres skill?

      • jonrosner 2 hours ago
        I am thinking of distributing skills that I build to my clients. As my clients are mostly non-technical users I need this process of distribution to be as easy as possible. Even adding a .env file would probably be too much for most of them. With skills I can now finally distribute my logic easily: just send the raw files and tell them to put them into a folder, done. But there is no easy way for them to set up the credentials in those skills yet. The best UX in my opinion would be for Codex (or Claude, doesn't matter) to ask for those setup parameters once when first using the skill and process the inputs in a secure manner, i.e. some internal secret storage.
    • bavell 2 hours ago
      > there needs to be some kind of secret storage that the agent can read/write

      Why not the filesystem?

      I would create a local file (e.g. .env) in each project using postgres, then in my postgres skill, tell the agent to check that file for credentials.
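
A minimal sketch of the `.env` approach suggested above, assuming the skill bundles a script that reads credentials itself so the password never has to appear in the prompt. The `parse_env` helper and the sample values are hypothetical, and the parser handles only simple `KEY=value` lines; `PGUSER` and `PGPASSWORD` are the usual PostgreSQL client environment variable names.

```python
def parse_env(text):
    """Parse simple KEY=value lines, skipping blanks and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


# Placeholder file contents; a real script would read the project's .env from disk.
SAMPLE_ENV = '# local test database\nPGUSER=app\nPGPASSWORD="s3cret"\n'

credentials = parse_env(SAMPLE_ENV)
print(sorted(credentials))
```

The script would hand `credentials` to the database driver directly, so the skill's instructions only need to tell the agent to run the script, not what the password is.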

  • not_a_toaster 2 hours ago
    We’ve made a zero shot decision tree
  • rochansinha 7 hours ago
    Agent Skills let you extend Codex with task-specific capabilities. A skill packages instructions, resources, and optional scripts so Codex can perform a specific workflow reliably. You can share skills across teams or the community, and they build on the open Agent Skills standard.

    Skills are available in both the Codex CLI and IDE extensions.

    • dan_wood 5 hours ago
      Thanks to Anthropic.
  • karolcodes 4 hours ago
    anyone using this in agentic workflow already? how is it?
  • alexgotoi 2 hours ago
    At any HR conference you go to, there are two overused words: AI and Skills.

    As of this week, this also applies to Hacker News.

  • haffi112 5 hours ago
    What are your favourite skills?
    • frankc 1 hour ago
      The skills that matter most to me are the ones I create myself (with the skill creator skill) that are very specific and proprietary. For instance, a skill on how to write a service in my back-testing framework.

      I do also like to make skills on things that are more niche tools, like marimo (a very nice jupyter replacement). The model probably does know some stuff about it, but not enough, and the agent could find enough online or in context7, but it will waste a lot of time and context in figuring it out every time. So instead I will have a deep thinking agent do all that research up front and build a skill for it, and I might customize it to be more specific to my environment, but it's mostly the condensed research of the agent so that I don't need to redo that every time.

    • dmd 2 hours ago
      A very particular set of skills.
    • pylotlight 4 hours ago
      nunchuck skills