Topics

Dhall as RISC-V config language

Schaefer, Daniel Helmut (DualStudy)
 

Hi all,

following my introduction of Dhall on Monday, I'd like to give you again an
overview of Dhall in text-form.

TLDR; The reason for us to use Dhall would be to allow maximum flexibility
while also ensuring maximum validity of the config.

I invite you to look at some resources about Dhall to get a quick overview:
- Homepage: https://dhall-lang.org
- Cheatsheet: https://docs.dhall-lang.org/howtos/Cheatsheet.html
- Learn X in Y Minutes: https://learnxinyminutes.com/docs/dhall/
- Safety Guarantees: https://github.com/dhall-lang/dhall-lang/wiki/Safety-guarantees

Advantages
- Easy to write because there are many features for deduplication
- Import (e.g. U540 = 4xU54 + 1x E51)
- Override (e.g. of imported configs)
- Functions
- We can specify ranges and Dhall can deduplicate and merge them
- Easy to read
- Because deduplication is part of the language, thus standardized by us
- Compared to full programming languages: total (guaranteed termination in finite-time)
- Has formal spec
- Built-in super-strict typesystem/schema
- Designed to be trivially convertable to YAML and JSON
- Could even be used as a templating language (e.g. to Markdown)
- Has standardized binary format: CBOR

Drawbacks
- Relatively unknown in the wider community (-> needs to be learned)
- Can be written in a very unreadable manner (I have improved my example from Monday greatly, will send it soon)
- Not supported by *all* languages, like JSON/YAML. _Only_: Clojure, Go, Haskell, Ruby, Rust
- But it doesn't matter much because the only/most important consumer is the compiler to the binary format

I'll explain some of the points in follow-up mails.

Cheers,
Daniel

Schaefer, Daniel Helmut (DualStudy)
 

On 7/9/20 2:58 PM, Schaefer, Daniel Helmut (DualStudy) wrote:
Drawbacks
  - Relatively unknown in the wider community (-> needs to be learned)
  - Can be written in a very unreadable manner (I have improved my example from Monday greatly, will send it soon)
I have talked to colleagues and showed them Bill's as well as my example from Monday.
Like I expected, even though they appreciated the features of Dhall, they said that my
example wasn't very readable at all and even intimidating.

Now I've learnt to reduce the boiler-plate a bit and didn't use the automatic formatter.
Most people probably don't appreciate the Haskell-like formatting, so now I made it look
more like JSON and I'm pretty happy with the result.

The entire config is available here:
https://github.com/JohnAZoidberg/configuration-structure/blob/aae5c508a81da92becaba91b331362f60e630dd1/example.dhall

One part of the configuration can look like this:

-- Define Isa and Priv for even numbered harts
withHartRange (Multiple [ 0, 2, 4 ]) Categories::{
Isa = [ IsaEntry.RISCV_32, IsaEntry.RISCV_64 ],
Priv = Some {
modes = [ Mode.M , Mode.S , Mode.U ],
epmp = True,
satp = [ Satp.Sv39 , Satp.Sv48 , Satp.Sv57 , Satp.Sv64 ]
}
},

which defines the ISA and privilege modes config for the even numbered harts.

When running the Dhall interpreter on the config file, it automatically expands
config for individual harts, hart lists and hart ranges into configs for
the individual harts. Those are then merged together - so what's left is one listing
per hart, which has all of the hart's configuration.

By running `dhall-to-yaml --file example.dhall > dhall-to.yaml` the config can be
translated to YAML, which looks pretty nice now.
I haven't finished writing the merging logic for the harts but after doing that by
hand, the generated YAML for looks like this

harts:
- config:
Isa:
- RISCV_32
- RISCV_64
Priv:
epmp: true
modes:
- M
- S
- U
satp:
- Sv39
- Sv48
- Sv57
- Sv64
hartid: 0
uncore:
DebugModule:
abstractCommands:
- mask: 65280
value: 4660
connectedHarts:
end: 4
start: 0
Trace:
branchPredictorEntries: 0
contextBusWidth: 32
jumpTargetCacheEntries: 0

, which is probably what it would look like also, if we used YAML directly.

Full YAML output here:
https://github.com/JohnAZoidberg/configuration-structure/blob/aae5c508a81da92becaba91b331362f60e630dd1/dhall-to-merged.yaml

Please let me know what you think!

Thanks,
Daniel

Tim Newsome
 

Thanks for the detailed summary and example.

While the syntax still looks a little foreign to me, overall I like this approach.
Dhall supports just enough programming features that we can concisely express common configurations, but isn't really any more complex than necessary. It's a lot more concise than writing YAML directly.

How do you envision the translation to a binary format? Would we convert to JSON and go from there, or is it possible/sensible to work on a parsed Dhall file directly?

In the interest of keeping everything in one place, can you make a PR against https://github.com/riscv/configuration-structure that gets those examples into that repo?

For those following along, I just put the scala and yaml examples into https://github.com/riscv/configuration-structure/tree/master/human-readable.

Tim

On Thu, Jul 9, 2020 at 5:58 AM Schaefer, Daniel Helmut (DualStudy) <daniel.schaefer@...> wrote:
Hi all,

following my introduction of Dhall on Monday, I'd like to give you again an
overview of Dhall in text-form.

TLDR; The reason for us to use Dhall would be to allow maximum flexibility
while also ensuring maximum validity of the config.

I invite you to look at some resources about Dhall to get a quick overview:
- Homepage: https://dhall-lang.org
- Cheatsheet: https://docs.dhall-lang.org/howtos/Cheatsheet.html
- Learn X in Y Minutes: https://learnxinyminutes.com/docs/dhall/
- Safety Guarantees: https://github.com/dhall-lang/dhall-lang/wiki/Safety-guarantees

Advantages
   - Easy to write because there are many features for deduplication
     - Import (e.g. U540 = 4xU54 + 1x E51)
     - Override (e.g. of imported configs)
     - Functions
     - We can specify ranges and Dhall can deduplicate and merge them
   - Easy to read
     - Because deduplication is part of the language, thus standardized by us
   - Compared to full programming languages: total (guaranteed termination in finite-time)
   - Has formal spec
   - Built-in super-strict typesystem/schema
   - Designed to be trivially convertable to YAML and JSON
   - Could even be used as a templating language (e.g. to Markdown)
   - Has standardized binary format: CBOR

Drawbacks
   - Relatively unknown in the wider community (-> needs to be learned)
   - Can be written in a very unreadable manner (I have improved my example from Monday greatly, will send it soon)
   - Not supported by *all* languages, like JSON/YAML. _Only_: Clojure, Go, Haskell, Ruby, Rust
     - But it doesn't matter much because the only/most important consumer is the compiler to the binary format

I'll explain some of the points in follow-up mails.

Cheers,
Daniel



Schaefer, Daniel Helmut (DualStudy)
 

On 7/9/20 2:58 PM, Schaefer, Daniel Helmut (DualStudy) wrote:> Advantages
  - Built-in super-strict typesystem/schema
# Validation of values

Dhall *cannot* validate (or introspect at all) strings or double values.
Looking at Tim's example, there doesn't appear to be a need for that.
In fact, I would argue that in many cases using stringly-typed data is an
indication of a bad schema, where proper types would have been the better
choice.

The schema that the compliance group uses [1] employs regex, for example,
to validate the ISA string, which looks like e.g. RV64ABCDZfinx.
This is necessary if we use a simple stringly typed configuration language,
however in something more advanced, like Dhall, we would build the ISA type
using a enums. In pseudo-code:

xlen = 32 | 64 | 128
extensions = [A...X] | Zfinx | Zifenci
ISA = { xlen, extensions }

This could then, if necessary, be converted to a string.

If somebody sees a real use-case for strings, please *do* let me know.
I'd be happy to think about how to accomodate that.

Dhall allows a limited set of assertions, for example ensuring an integer is
within a certain range.

# Validation using types/schema

Just like JSON/YAML validation schemas, Dhall has many types:
- Integer/Natural
- Boolean
- Lists
- Records/Structs

Additionally there is an Option type, like in Rust - meaning that in order for
something to be optional/nullable it must explicitly be declared as such.
This further reduces the set of configurations that are valid by the schema
but don't make sense.

## Interaction with JSON/YAML

Dhall by default supports conversion to JSON, YAML and XML, as well as
normalization to a Dhall representation, without functions, imports, ....
This normalized form is practically JSON/YAML without the advances language
constructs, but still with schema.

This schema is expressed in a typesystem, similarly powerful as Haskell.
Since the call yesterday I also found that there's a tool that can convert
JSON/YAML back into Dhall, while checking its conformance to the schema.

Therefore we could specify the schema in Dhall and use the advanced features of
Dhall to write a simple deduplicated and modular configuration, while also
being able to convert it into JSON, that's readable by anyone.
If someone wished, they could also write JSON and verify compliance with the
schema by converting it to Dhall.

I'm not sure whether it would be a good idea to define the spec in Dhall but
let people write configs in JSON - at least it's an option to consider.
Or, like the Compliance Group, we can just use Dhall as a schema and never
write it.

[1] https://github.com/riscv/riscv-config/blob/e428724c3457a34977a189608acdbb81e1291b54/riscv_config/schemas/schema_isa.yaml#L57-L61

Josh Scheid
 

I like the idea of a "strong" configuration language like dhall (or scala), but how might other systems/languages consume the configuration?  From what I found, importing YAML/JSON back into dhall is an open project.

I think the conservative approach could be:

1) Use YAML or JSON5 as the universal format, along with a group-specified JSON Schema for validation.
2) Something like Dhall could be a semi-supported "input" language, but others could use their system of choice, from raw (1) to templating layers, to full Turing-complete programming languages.
3) Translation to/from the binary format is done to/from (1).

-Josh

Tim Newsome
 

On Thu, Jul 9, 2020 at 9:21 AM Josh Scheid <jscheid@...> wrote:
I like the idea of a "strong" configuration language like dhall (or scala), but how might other systems/languages consume the configuration?  From what I found, importing YAML/JSON back into dhall is an open project.

I think the conservative approach could be:

1) Use YAML or JSON5 as the universal format, along with a group-specified JSON Schema for validation.
2) Something like Dhall could be a semi-supported "input" language, but others could use their system of choice, from raw (1) to templating layers, to full Turing-complete programming languages.
3) Translation to/from the binary format is done to/from (1).

I agree that is the conservative approach. But I can't imagine anybody writing so much YAML/JSON5 directly, so now we're signing up for writing a schema to validate that format, in addition to a definition for the input language.

Can other systems/languages consume the machine-readable format instead of the human-readable one?

Tim

Josh Scheid
 

On Thu, Jul 9, 2020 at 10:44 AM Tim Newsome <tim@...> wrote:
On Thu, Jul 9, 2020 at 9:21 AM Josh Scheid <jscheid@...> wrote:
I like the idea of a "strong" configuration language like dhall (or scala), but how might other systems/languages consume the configuration?  From what I found, importing YAML/JSON back into dhall is an open project.

I think the conservative approach could be:

1) Use YAML or JSON5 as the universal format, along with a group-specified JSON Schema for validation.
2) Something like Dhall could be a semi-supported "input" language, but others could use their system of choice, from raw (1) to templating layers, to full Turing-complete programming languages.
3) Translation to/from the binary format is done to/from (1).

I agree that is the conservative approach. But I can't imagine anybody writing so much YAML/JSON5 directly, so now we're signing up for writing a schema to validate that format, in addition to a definition for the input language.


I don't think the YAML need be that verbose/sparse.

My concern specifying the input language is that it's generally difficult to have tooling directly input and output that input language.

If you have all of your generative design logic in srclang, do you want to write a srclang to inputlang converter, which then generates YAML/JSON anyway?  Chances are, srclang already has libraries to generate YAML/JSON. 

Can other systems/languages consume the machine-readable format instead of the human-readable one?


That's a good question.

Don't we also want a binary->human converter as well?

-Josh
,_._,_