For the same reason any DSL exists: because the programming representation is a lot more verbose than the DSL, due to the computers not currently honoring the "you know what I meant" flag
#!/usr/bin/env python3
"""A made up example of the line noise"""
# NOTE: Ansible does not expose its modules as plain Python callables like this;
# `ansible.builtin.ssh/setup/copy` below are hypothetical, which is the point.
import ansible


def main():
    hosts = ["localhost"]
    for h in hosts:
        run_one_host(h)


def run_one_host(inventory_hostname: str):
    connection = ansible.builtin.ssh(inventory_hostname)
    print("sniff out the machine's os")
    host_vars = ansible.builtin.setup(connection, gather_subset="os")
    if host_vars["distribution"] == "ubuntu":
        dest = "/etc/apt"
    else:
        dest = "/etc/something else"
    print("do something awesome")
    copy_ok = ansible.builtin.copy(connection, src="./some_file", dest=dest)
    ...


if __name__ == "__main__":
    main()
That said, I can't readily imagine why you couldn't do exactly what you said because the AnsibleModule[1] contract is JSON over stdin/stdout (and one can write Ansible modules in any programming language[2] - they just default to Python)
Yaml is the language nobody needed. All we wanted was a better JSON format that supports comments and doesn’t crash on an extra comma at the end of a list, e.g. [1,2,3,]
JSON5 is exactly what you're looking for and has somewhat decent adoption. I use it for configuration on one of my projects and I really enjoy it.
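For comparison, Python's standard `json` module implements strict JSON, not JSON5. A quick check shows it rejects both things the parent comment asks for, a trailing comma and a comment:

```python
import json


def is_strict_json(text: str) -> bool:
    """Return True if Python's stdlib (strict JSON) parser accepts `text`."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_strict_json("[1,2,3]"))             # True: plain JSON
print(is_strict_json("[1,2,3,]"))            # False: trailing comma
print(is_strict_json('{"a": 1 // note\n}'))  # False: line comment
```

A JSON5 parser would accept the last two inputs; none is used here.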
Does it support float NaNs? Only asking because of Python's quirky non-standard implementation
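On NaN: strict JSON (RFC 8259) has no literal for it, while JSON5 allows `NaN` and `Infinity`. Python's stdlib `json` emits and accepts a bare `NaN` token by default, which is the quirk in question. Passing `allow_nan=False` turns that off:

```python
import json
import math

encoded = json.dumps(float("nan"))
print(encoded)  # NaN  (not valid strict JSON, but valid JSON5)

decoded = json.loads("NaN")
print(math.isnan(decoded))  # True

try:
    json.dumps(float("nan"), allow_nan=False)  # strict mode
except ValueError as exc:
    print("strict mode refused:", exc)
```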
I want multiline strings and references
Have you considered XML?
Or heck, PLists. They have an XML representation that is fairly similar to what JSON can express.
https://en.wikipedia.org/wiki/Property_list
As someone who has tried to parse this crazypants as a side effect of the old 1Password.opvault, please don't. The idea that one has to write //key[text()="firstname"]/following-sibling::string/text() because they're not nested, they're siblings. Insanity.
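For what it's worth, Python's standard library ships `plistlib`, which does the key/value sibling pairing for you. Hand-written XPath is only needed when no plist parser is available. A minimal sketch with a made-up document:

```python
import plistlib

# Made-up plist: each <key> is followed by its value element as a sibling,
# not nested inside it.
PLIST = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>firstname</key>
    <string>Ada</string>
    <key>age</key>
    <integer>36</integer>
</dict>
</plist>
"""

data = plistlib.loads(PLIST)  # format (XML vs binary) is auto-detected
print(data["firstname"])  # Ada
print(data["age"])        # 36
```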
Trailing commas for sure, but I buy the argument that comments would be used to sneak in arbitrary directives and break interoperability. In fact, I’d like to see a less featureful JSON where everything is strings. Trivial parsing, leave interpretation up to the receiver.
For a configuration language, comments are absolutely crucial. You want to be able to say "# This option is set because <so-and-so>" to explain why you are configuring it this way to the next person that reads the code (or you, in the future).
If the price to pay is some risk that some dummy might start parsing the comments as code, so be it. This is not really a problem in "regular" programming languages, and I don't see why it would be in a configuration language.
I will start by saying, I completely agree with you!
But, then, I have to behave like a typical computer nerd and say..
Well ackchuallyyyyyy:
Browsers do stupid: <!--[if IE 8]>
Linux does stupid: #!/bin/bash
C/C++ (preprocessor macros) do stupid: #ifdef
https://en.wikipedia.org/wiki/Conditional_comment
https://en.wikipedia.org/wiki/Comment_(computer_programming)
I don’t think the Linux one is that stupid, but it might be me.
It’s not a “magic comment” because it doesn’t depend on the runtime. It specifies an interpreter to use, regardless of the language of the file.
E.g. you can use #!/usr/bin/python for a Python script. I don’t find it worse than the existing alternative of making the file name magic and finding an interpreter based on that.
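To make the point concrete: on Linux the kernel, not the language runtime, reads the `#!` line. It runs the named interpreter with the script path appended, and passes everything after the interpreter path as at most one extra argument. A rough sketch of that split follows. The function name is made up, and real kernels also enforce a line-length limit that this ignores.

```python
from typing import Optional, Tuple


def parse_shebang(first_line: str) -> Optional[Tuple[str, Optional[str]]]:
    """Split a '#!' line into (interpreter, optional single argument).

    Follows the Linux convention that everything after the interpreter
    path is passed as ONE argument rather than being split on spaces.
    """
    if not first_line.startswith("#!"):
        return None  # an ordinary comment; nothing special happens
    rest = first_line[2:].strip()
    if not rest:
        return None
    parts = rest.split(None, 1)  # interpreter path, then the remainder
    interpreter = parts[0]
    arg = parts[1].strip() if len(parts) > 1 else None
    return interpreter, arg


print(parse_shebang("#!/usr/bin/python\n"))         # ('/usr/bin/python', None)
print(parse_shebang("#!/usr/bin/env python3 -u\n")) # ('/usr/bin/env', 'python3 -u')
print(parse_shebang("# just a comment\n"))          # None
```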
> This is not really a problem in "regular" programming languages
https://go.dev/wiki/Comments#directives :-D
They said "regular" programming languages
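Python, arguably a "regular" language, has one too. A PEP 263 coding declaration in the first two lines is a comment that changes how the file is decoded. The standard library exposes that detection logic as `tokenize.detect_encoding`:

```python
import io
import tokenize

source = b"# -*- coding: latin-1 -*-\nname = 'caf\xe9'\n"

# detect_encoding reads at most two lines and honours the magic comment.
encoding, _lines = tokenize.detect_encoding(io.BytesIO(source).readline)
print(encoding)  # iso-8859-1  (the comment's value, normalized)

# Without the magic comment, the default is UTF-8.
plain, _ = tokenize.detect_encoding(io.BytesIO(b"name = 'x'\n").readline)
print(plain)  # utf-8
```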
Comments by themselves provide enough value to justify supporting them.
Plus, "it's non-standard" is not a valid argument: many tools already support non-standard behaviour, precisely because useful features like comments are considered non-standard.
In most cases Yaml is a bizarre kind of DSL with a tricky way of interacting with the underlying API. For instance, I don't understand why the exact same Ansible API isn't just a Python library.
For the same reason any DSL exists: because the programming representation is a lot more verbose than the DSL, due to the computers not currently honoring the "you know what I meant" flag
That said, I can't readily imagine why you couldn't do exactly what you said because the AnsibleModule[1] contract is JSON over stdin/stdout (and one can write Ansible modules in any programming language[2] - they just default to Python)
1: https://github.com/ansible/ansible/blob/v2.18.4/lib/ansible/...
2: https://docs.ansible.com/ansible-core/2.18/dev_guide/develop... and https://github.com/ansible/ansible/blob/v2.18.4/test/integra...
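Below is a minimal sketch of the shape this comment describes, assuming the arguments arrive as one JSON object and the result goes back as a JSON object on stdout. The exact delivery mechanism Ansible uses is covered in the linked developer guide and is not reproduced here. The "ensure directory" module and its `path` argument are hypothetical.

```python
import json
import os
import sys


def run_module(args: dict) -> dict:
    """Hypothetical 'ensure directory exists' module: plain dict in, dict out."""
    path = args.get("path")
    if not path:
        return {"failed": True, "msg": "missing required argument: path"}
    if os.path.isdir(path):
        return {"changed": False, "msg": f"{path} already exists"}
    os.makedirs(path)
    return {"changed": True, "msg": f"created {path}"}


def main() -> None:
    # Assumption (from the comment above): one JSON object on stdin,
    # one JSON result on stdout. Call main() from the script entry point.
    args = json.load(sys.stdin)
    print(json.dumps(run_module(args)))
```

Because the logic lives in `run_module`, it is an ordinary Python function that a library-style API could call directly.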
Yes, this.
This format is unreadable on mobile, it keeps opening up my keyboard and scrolling up a bit when it does.
I understand and appreciate the "why" of the format, but this also could have been a non-editable "editor-like" presentation and achieved the same result.
This site's mobile experience mirrors my feelings when having to edit deeply nested and templated YAML.
Is editing deeply nested JSON and XML a better experience than YAML? I don't think so.
I’d argue yes, strictly due to the lack of significant whitespace.
That's a really odd reason to prefer something used as a configuration language where readability is important.
From the bottom of the article:
> # ps. By design, this website is as usable as YAML.
It's intentionally bad.
And from their own testimonials section:
> The good news (I realised) was that you can select all the text of the site, and then delete it. Problem solved.
This site is beautiful in it's own way.
Its. It's is a contraction of it+is
I just read it here: https://raw.githubusercontent.com/ghuntley/noyaml/refs/heads...
I will die on the hill that TOML should be used for the vast majority of what YAML's used for today. There are times a full language is needed, but I've seen so many YAML files that use none of the features YAML has with all of the footguns.
The thinking that would lead one to the conclusion of "yeah, fine, pick whatever characters you want for the string contents" is a "you and I are solving different problems" situation.
https://github.com/toml-lang/toml/blob/1.0.0/toml.md#user-co...
TOML is worse and even harder to read. I'd take YAML any day over TOML.
Yaml is a sad Icarus parable. The syntax is great but the type inference is too much. I don't see why we have to throw the baby out with the bathwater and settle for toml, though.
Here's how yaml's type inference should work:
- All object keys are strings (with or without quotes)
- Value atoms are parsed the exact same way as in JSON5
I'm kinda shocked this isn't a thing. StrictYAML is cool but a bit too cumbersome IMO.
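A minimal sketch of the proposed rule for scalar values, with one substitution: it uses Python's stdlib `json` parser (plain JSON, not JSON5, so no hex literals or single-quoted strings). That parser decides whether an unquoted value is a number, boolean, or null; anything else stays a string. The helper name is made up.

```python
import json
from typing import Union

Scalar = Union[str, int, float, bool, None]


def parse_atom(raw: str) -> Scalar:
    """Interpret a scalar as a JSON literal if possible, else keep the string."""
    text = raw.strip()
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return text  # not a JSON literal: it's just a string
    if isinstance(value, (list, dict)):
        return text  # structure comes from the YAML syntax, not from atoms
    return value


for raw in ["no", "NO", "true", "0755", "null", '"08"', "42"]:
    print(repr(raw), "->", repr(parse_atom(raw)))
# 'no' and 'NO' stay strings, 'true' -> True, '0755' stays a string
# (JSON forbids leading zeros), 'null' -> None, '"08"' -> '08', '42' -> 42
```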
But toml isn't any better than yaml. It's difficult to read
Yeah, there are some problems, but reading a multiline code block (like a GitHub Actions bash script) as an indent-escaped string is so much better than having to understand crazy triple-escaped characters like "sed \"\\\\\\"name1\\\\\\\"\""
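The escaping blow-up is easy to reproduce: each extra layer of quoting escapes every quote and every existing backslash again. A quick demonstration with Python's `json.dumps`, where the `sed` command is a made-up stand-in:

```python
import json

command = 'sed "s/name1/name2/"'

once = json.dumps(command)  # quoted for one layer
twice = json.dumps(once)    # quoted again, e.g. a string inside a string
print(once)
print(twice)
print(once.count("\\"), twice.count("\\"))  # 2 8
```

A YAML literal block (`run: |`) sidesteps this because its contents are taken as-is, with only the indentation removed.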
What's even better is just using an actual programming language. Not bash, not sed, not yaml. Just python or NodeJS.
This is the same debate folks have between Maven and Gradle: do you want CI code to be able to do *anything* that Python or Node can do, or do you want well defined knobs people can turn. If nothing else, it makes code reviews for CI way less drama than trying to use some bespoke dsl-in-python that re-implements {job: {steps: [{run: ...}]}} in a less legible way
Yeah. My experience has often been that you end up with a task that is very easy and familiar to write in say Bash, that you then have to solve a puzzle to write in Ansible/Puppet/whatever. Which feels exactly like what you're saying: a DSL re-implementing something else in a less legible way.
I guess it's like anything: for the right task, the right tool works well. But invariably, a tool will eventually be pushed into use for the wrong task.
You only think so until you have to extract some information from it. "Here is a dir full of CI job definitions; find all the ones which require extra permissions": trivial in YAML or JSON or TOML, could be hard to impossible for Python/Node.js.
Or: "I am building a CI job frontend; for each job I need a list of the inputs it takes and their descriptions, without doing a full code checkout." Good luck doing this if your jobs are in Python/Node.js.
(to be fair, there are programming languages that are also severely limited and can be evaluated mostly safely, like Starlark, but I don't think they'd match your definition)
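A sketch of both queries from the comment above, assuming the job definitions are plain data. The example uses JSON, but YAML or TOML loaded into dicts works the same way. The `jobs`/`permissions`/`inputs` field names are a made-up schema, not any particular CI system's.

```python
import json

# Made-up job definitions; in practice these would be files in a directory.
JOB_FILES = {
    "build.json": '{"jobs": {"build": {"steps": [{"run": "make"}]}}}',
    "release.json": '{"jobs": {"publish": {"permissions": {"contents": "write"},'
                    ' "inputs": {"tag": {"description": "Release tag"}}}}}',
}


def jobs_needing_permissions(files: dict) -> list:
    """Return (file, job) pairs whose definition requests extra permissions."""
    hits = []
    for filename, text in sorted(files.items()):
        for job_name, job in json.loads(text).get("jobs", {}).items():
            if job.get("permissions"):
                hits.append((filename, job_name))
    return hits


def job_inputs(files: dict) -> dict:
    """Map each job to {input name: description} without executing anything."""
    result = {}
    for text in files.values():
        for job_name, job in json.loads(text).get("jobs", {}).items():
            result[job_name] = {
                name: spec.get("description", "")
                for name, spec in job.get("inputs", {}).items()
            }
    return result


print(jobs_needing_permissions(JOB_FILES))  # [('release.json', 'publish')]
print(job_inputs(JOB_FILES))  # {'build': {}, 'publish': {'tag': 'Release tag'}}
```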
I'm a big fan of YAML after coming from JSON and, later, ProtoBuf's many definition formats while working at Google, but it's true that there are a lot of oddities in YAML's magical parsing. I'm grateful for the many ways it's possible to quickly and naturally define simple hierarchies of data (for example, in a docker-compose file).
This website does the rare thing of, after complaining, providing a long list of alternatives. It's really nice.
I find the configuration complexity clock always valuable in framing conversations like this: https://mikehadlow.blogspot.com/2012/05/configuration-comple...
I wrote a really long blog post about this once without the clock metaphor.
This is far superior in illustrating the slippery slope.
Aside from that slide toward an inevitable dinner with Turing completeness, there's often the problem of sourcing information from multiple overlaid files and backtracking where each value is sourced from.
Docker files are an example of this, as is the complete list of config values in the Spring Framework (it's like 30 different sources).
In addition, config starts getting into secured secrets, service invocations, database lookups, operating system commands, and who knows what else.
So not only is it really a Turing-complete problem, it veers into the hellscape that is system integration.
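The "where did this value come from?" problem can at least be made visible by keeping each layer separate and recording which one wins. Below is a sketch using `collections.ChainMap`. The layer names, their order, and the keys are made up for illustration; real frameworks such as Spring define their own, much longer precedence list.

```python
from collections import ChainMap

# Highest precedence first; each layer is a plain dict of settings.
layers = {
    "command line": {"port": "9090"},
    "environment": {"port": "8080", "db_url": "postgres://env-host/app"},
    "application.yml": {"port": "80", "db_url": "postgres://localhost/app",
                        "debug": "false"},
}

# ChainMap returns the value from the first mapping that contains the key.
config = ChainMap(*layers.values())


def explain(key: str) -> str:
    """Report the effective value of `key` and the layer that supplied it."""
    for name, layer in layers.items():
        if key in layer:
            return f"{key}={config[key]} (from {name})"
    raise KeyError(key)


for key in ("port", "db_url", "debug"):
    print(explain(key))
# port=9090 (from command line)
# db_url=postgres://env-host/app (from environment)
# debug=false (from application.yml)
```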
YAML's idea of human readability misses the mark. Especially anchors. They're the worst tools for abstraction.
JSON with functions as the high level format and JSON as the low level format is the way to go. Examples include Nix and Jsonnet. They're much nicer to deal with and less error prone.
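The idea can be sketched with Python standing in for Jsonnet or Nix. A plain function replaces a YAML anchor-and-merge (`<<: *defaults`), and the output is ordinary JSON. The service names and fields are invented.

```python
import json


def service(name: str, port: int, replicas: int = 2) -> dict:
    """Shared defaults live in one function instead of a YAML anchor."""
    return {
        "name": name,
        "image": f"registry.example.com/{name}:latest",
        "port": port,
        "replicas": replicas,
    }


manifest = {
    "services": [
        service("api", 8080),
        service("worker", 9000, replicas=4),  # override one default
    ]
}
print(json.dumps(manifest, indent=2))  # low-level format: plain JSON
```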
> # Anyone wondering why their first seven Kubernetes clusters deploy just fine, and the eighth fails?
Yay, octal numbers! But don't panic, lots of supposed C programmers fall for exactly the same trick when prefixing numbers with `0`.
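Below is a simplified imitation of the YAML 1.1 integer rule, as implemented by resolvers such as PyYAML. It covers only plain decimal and leading-zero octal, which is enough to show how `07` and `08` end up with different types. It is not a full resolver, and YAML 1.2's core schema replaced the bare leading-zero octal form with `0o`.

```python
import re


def yaml11_int_or_str(scalar: str):
    """Resolve a plain scalar the way YAML 1.1 treats simple integers.

    Simplified: ignores signs, underscores, hex, binary and sexagesimal forms.
    """
    if re.fullmatch(r"0[0-7]+", scalar):
        return int(scalar, 8)  # leading zero => octal
    if re.fullmatch(r"0|[1-9][0-9]*", scalar):
        return int(scalar)     # ordinary decimal
    return scalar              # anything else stays a string


clusters = [f"{n:02d}" for n in range(1, 9)]  # "01" .. "08"
print([yaml11_int_or_str(c) for c in clusters])
# [1, 2, 3, 4, 5, 6, 7, '08']
print(yaml11_int_or_str("010"))  # 8, just like the C literal 010
```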
I can take credit for one of these, the 63 ways to wrap a string, on line 156.
That's what https://yaml-multiline.info is for! I know the author's on HN, I hope they chime in here :)
Amusingly, the Stack Overflow answer you linked in your contribution is the second result in a Google search for "YAML multiline string" or the like, after yaml-multiline.info; the two combined appear to be the canonical resource on the web.
Heh, not sure if you realise, but that website was inspired by my SO answer. (I'm credited at the bottom).
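One corner of those variants that is easy to pin down in code is the block chomping indicator, which decides what happens to trailing newlines. Clip (the default) keeps exactly one, strip (`-`) keeps none, and keep (`+`) keeps them all. The sketch below implements just that rule, applied to block content that has already been de-indented; the function name is made up.

```python
def chomp(content: str, indicator: str = "") -> str:
    """Apply a YAML block chomping indicator to de-indented block content.

    Assumes each line of `content` ends in a newline, as block lines do
    when the block is not at the very end of the stream.
    """
    if indicator == "+":   # keep: final newline and trailing blank lines
        return content
    body = content.rstrip("\n")
    if indicator == "-":   # strip: no final newline at all
        return body
    if indicator == "":    # clip (default): exactly one final newline
        return body + "\n" if body else ""
    raise ValueError(f"unknown chomping indicator: {indicator!r}")


text = "line one\nline two\n\n\n"
print(repr(chomp(text)))       # 'line one\nline two\n'
print(repr(chomp(text, "-")))  # 'line one\nline two'
print(repr(chomp(text, "+")))  # 'line one\nline two\n\n\n'
```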
I like writing yaml
(context: I've never had to write 1000-line yaml files for kubernetes)
When it comes to AWS Cloudformation, I love YAML. Can't think of any other positive use though.
They should be beaten about the head and shoulders for how atrocious this is: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGui...
Also, due to it failing to work as advertised, I just use `cfn-lint --info -e` to grab the transformed template and use that for realz.
All I see is Lisp:
Operator:
  - Operand 1
  - Operand 2
Yawn. I'll keep using XML. While many waste their youth pointlessly reinventing XSDs in YAML/JSON/TOML/JSON5/HJSON, I'll still be here. Living. Content.