Introduction
Purpose and Audience
Core Solidity is an update to the Solidity programming language. It both extends the language with new features (ADTs, Parametric polymorphism, Typeclasses), and diverges from Classic Solidity in incompatible ways (most notably by removing inheritance). This document targets compiler implementers, tooling developers, and language researchers. It is a reference specification, not a tutorial.
A prototype implementation exists at argotorg/solcore. The
implementation is a research prototype. It contains bugs, lacks optimizations, and will change.
Compilation Architecture
Compilation proceeds by succesive translation between the following representations:
-
Core Solidity: The user-facing language with all syntactic features. Supports higher-order functions, pattern matching, type classes, and modules. This is a superset of SAIL, with all additional language constructs being defined in terms of desugaring passes (syntactic transformations) into SAIL.
-
SAIL: A minimal first-order intermediate representation. All syntactic sugar has been removed. Type classes are resolved to concrete instances. Polymorphic functions are specialized to monomorphic versions. Pattern matching is compiled to simple case trees.
-
Hull: A low-level representation close to Yul. Retains algebraic data types (binary sums and products) but otherwise resembles Yul. Hull code is translated directly to Yul, which is then compiled to EVM bytecode.
-
Yul: The existing Yul assembly language currently used in solc. Core Solidity is designed in a way that allows the use of alternative or additional representations at this level in the future.
Each transformation step simplifies the language and moves closer to the EVM execution model.
Document Structure
- Compilation Pipeline: Describes the compilation pipeline and how representations relate
- SAIL: Specifies the desugared intermediate representation
- Core Solidity: Specifies the user-facing language features
- Hull: Specifies the low-level IR before Yul emission
Status
This is a research prototype. The language design is not stable. Features may be added, removed, or changed. The implementation has known bugs and missing functionality. Do not use this for production systems.
SAIL
Syntax
Builtins
Variable Declaration and Assignment
Functions
Assembly Blocks
Datatypes
Parametric Polymorphism
Typeclasses
Modules
Type Inference
Core Solidity
Lambda Functions
Numeric Types
Arithmetic
Data Locations
ABI Encoding / Decoding
Contracts
Hull
Introduction
The Hull language is an intermediate step before Yul generation. It is basically Yul with algebraic types (sums/products)
Naming
Hull was initially named Core, but once it was decided that the whole language should be called "Core Solidity", it was renamed to Hull.
Abstract syntax
See src/Language/Hull.hs
Types
data Type
= TWord
| TBool
| TPair Type Type -- binary product, e.g. (word * word)
| TSum Type Type -- binary sum, e.g. (unit + word)
| TSumN [Type] -- n-ary sum
| TFun [Type] Type
| TUnit
| TNamed String Type -- named type, e.g. Option{unit + word}
Expressions
data Expr
= EWord Integer
| EBool Bool
| EVar Name
| EPair Expr Expr
| EFst Expr
| ESnd Expr
| EInl Type Expr
| EInr Type Expr
| EInK Int Type Expr
| ECall Name [Expr]
| EUnit
Notes:
EInl/EInrare binary sum constructors (injections)EInK krepresents k-th injection for n-ary sum- all injections are annotated with target type, so we have for example
inr<Option{unit + word}>(42) - n-ary products not implemented yet, but planned (they are simpler than sums)
Statements
data Stmt
= SAssign Expr Expr
| SAlloc Name Type
| SExpr Expr
| SAssembly [YulStmt]
| SReturn Expr
| SComment String
| SBlock [Stmt]
| SMatch Type Expr [Alt]
| SFunction Name [Arg] Type [Stmt]
| SRevert String
data Arg = TArg Name Type
data Alt = Alt Con Name Stmt
data Con = CInl | CInr | CInK Int
Notes:
- there are no control constructs yet, except call/return/revert; this is mostly because the surface language does not have them either
- using comments can lead to problems, since Solidity seems to have no nested comments
- for
YulStmt, seesrc/Language/Yul.hs - as in Yul, there are no statement separators
Contracts
Right now a contract consists of a name and a list of statements. This is definitely a temporary solution.
data Contract = Contract { ccName :: Name, ccStmts :: [Stmt] }
Concrete Syntax
See src/Language/Hull/Parser.hs
Although Hull is not meant to be written manually, it has concrete syntax, mostly for diagnostic purposes.
Block = "{" hullStmt* "}"
Contract = "contract" identifier
Stmt = "let" identifier ":" Type
| "return" Expr
| match "<" Type ">" Expr "with" "{" Alt* "}"
| "function" identifier "(" Args? ")" "->" Type Block
| "assembly" YulBlock
| "revert" StringLiteral
| Expr ":=" Expr
| Expr
Args = identifier : Type ("," Args)?
Alt = Con identifier "=>" Stmt
Con = "inl" | "inr" | "in" "(" integer ")"
Expr = Con "<" Type ">" PrimaryExpr
| Project PrimaryExpr
| PrimaryExpr
PrimaryExpr
= integer
| "true"
| "false"
| Tuple
| identifier "(" ExprList ")"
| identifier
Tuple = "(" ExprList ")"
ExprList = (Expr (, Expr)*)?
Examples
Option
contract Option {
function main () -> word {
return maybe$Word(0, inr<Option{unit + word}>(42))
}
function maybe$Word (n : word, o : Option{unit + word}) -> word {
match<Option{unit + word}> o with {
inl $alt => { // None
return n
}
inr $alt => { // Some[var_1]
let var_2 : word
var_2 := $alt
return var_2
}
}
}
}
Enumeration type
contract RGB {
function fromEnum (c : Color{unit + (unit + unit)}) -> word {
// match c with
match<Color{unit + (unit + unit)}> c with {
inl $alt => { // R
return 4
}
inr $alt => match<unit+unit> $alt with {
inl $alt => { // G
return 2
}
inr $alt => { // B
return 42
}
}
}
}
function main () -> word {
return fromEnum(inr<Color{unit + (unit + unit)}>(inr<unit + unit>(())))
}
}
Problems
Type annotation are helpful for code generation and seem to be necessary for compression, but they lead to rather verbose syntax, e.g.
data Food = Curry | Beans | Other
data CFood = Red[Food] | Green[Food] | Nocolor
function fromEnum(x : Food) {
match x {
| Curry => return 1;
| Beans => return 2;
| Other => return 3;
};
}
contract Food {
function eat(x) {
match x {
| Red[f] => return f;
| Green[f] => return f;
| _ => Other;
};
}
function main() {
return fromEnum(eat(Green[Beans]));
}
}
yields the following Hull:
contract Food {
function eat (x : CFood{(Food{(unit + (unit + unit))} + (Food{(unit + (unit + unit))} + unit))}) -> Food{(unit + (unit + unit))} {
match<CFood{(Food{(unit + (unit + unit))} + (Food{(unit + (unit + unit))} + unit))}> x with {
inl $alt => { // Red[var_8]
let var_9 : Food{(unit + (unit + unit))}
var_9 := $alt
return var_9
}
inr $alt => match<(Food{(unit + (unit + unit))} + unit)> $alt with {
inl $alt => { // Green[var_5]
let var_6 : Food{(unit + (unit + unit))}
var_6 := $alt
return var_6
}
inr $alt => revert "no match for: Nocolor"
}
}
}
function fromEnum (x : Food{(unit + (unit + unit))}) -> word {
match<Food{(unit + (unit + unit))}> x with {
inl $alt => { // Curry
return 1
}
inr $alt => match<(unit + unit)> $alt with {
inl $alt => { // Beans
return 2
}
inr $alt => { // Other
return 3
}
}
}
}
function main () -> word {
return fromEnum(eat(inr<CFood{(Food{(unit + (unit + unit))} + (Food{(unit + (unit + unit))} + unit))}>(inl<(Food{(unit + (unit + unit))} + unit)>(inr<Food{(unit + (unit + unit))}>(inl<(unit + unit)>(()))))))
}
}
On the other programs are not really meant to be read by humans.