ChocoPy v2.2: Language Manual and Reference
Designed by Rohan Padhye and Koushik Sen; v2 changes by Paul Hilfnger
University of California, Berkeley
November 23, 2019
Contents
1 Introduction 3
2 A tour of ChocoPy 3
2.1 The top level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.3 Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.4 Type hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.5 Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.5.1 Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.5.2 Booleans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.5.3 Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.5.4 Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.5.5 Objects of user-defined classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.5.6 None . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.5.7 The empty list ([]) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.6 Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.6.1 Literals and identifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.6.2 List expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.6.3 Arithmetic expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.6.4 Logical expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.6.5 Relational expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.6.6 Conditional expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.6.7 Concatenation expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.6.8 Access expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.6.9 Call expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.7 Type annotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.8 Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.8.1 Expression statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.8.2 Compound statements: conditionals and loops . . . . . . . . . . . . . . . . . . . . . . 10
2.8.3 Assignment statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.8.4 Pass statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.8.5 Return statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.8.6 Predefined classes and functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1
3 Lexical structure 12
3.1 Line structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.1.1 Physical lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.1.2 Logical lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.1.3 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.1.4 Blank lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.1.5 Indentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.1.6 Whitespace between tokens . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2 Identifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.3 Keywords . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.4 Literals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.4.1 String literals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.4.2 Integer literals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.5 Operators and delimiters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4 Syntax 14
4.1 Precedence and Associativity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
5 Type rules 16
5.1 Type environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
5.2 Type checking rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
6 Operational semantics 23
6.1 Evaluation context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
6.2 Syntax for values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
6.2.1 Class instances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
6.2.2 List objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
6.2.3 None . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
6.2.4 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
6.3 Syntax for class definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
6.4 Operational rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
7 Acknowledgements 38
A Known incompatibilities with Python 38
2
1 Introduction
This manual describes the ChocoPy language, which is a statically typed dialect of Python 3.6. ChocoPy
is intended to be used in a classroom setting. It has been designed to be small enough for students to
implement a full ChocoPy compiler over one semester.
ChocoPy has been designed to be a subset of Python. Almost every valid ChocoPy program is also a
valid Python 3.6 program. An execution of a ChocoPy program that does not result in error usually has
the same observable semantics as the execution of that program in Python 3.6. Appendix A lists the small
number of exceptions to this rule.
A ChocoPy program is contained in a single source file. At the top level, a ChocoPy program consists
of a sequence of variable definitions, function definitions, and class definitions followed by a sequence of
statements. A class consists of a sequence of attribute definitions and method definitions. A class creates a
user-defined type. Function definitions can be nested inside other methods and functions. All class names and
functions defined at the top level are globally visible. Classes, functions, and methods cannot be redefined.
Program statements can contain expressions, assignments, and control-flow statements such as conditionals
and loops. Evaluation of an expression results in a value that can be an integer, a boolean, a string, an
object of user-defined class, a list, or the special value None. ChocoPy does not support dictionaries, first-
class functions, and reflective introspection. All expressions are statically typed. Variables (global and local)
and class attributes are statically typed, and have only one type throughout their lifetime. Both variables
and attributes are explicitly typed using annotations. In function and method definitions, type annotations
are used to explicitly specify return type and types of formal parameters.
For readers familiar with the Python language, Figure 1 contains a sample ChocoPy program illustrating
top-level functions, statements, global variables, local variables, and type annotations. The type annotations
are valid syntaxes in Python 3.6, though the Python interpreter simply ignores these annotations and leaves
them as hints for other tools. In contrast, ChocoPy enforces static type checking at compile time. In
Figure 1, the function is
zero is defined at the top level. Its formal parameters items and idx are explicitly
typed as a list of integers and an integer, respectively. The return type of the function is zero is bool.
The function defines a local variable, val, whose type is int. At the top level, the program defines a global
variable mylist, whose type is a list of integers. Function is zero is invoked in a top level statement and
its result is output using the predefined print function. Similarly, Figure 2 contains a ChocoPy program
that defines two classes: animal and cow. The class cow inherits from animal, which in turn inherits from
the predefined root class object. The Boolean attribute makes noise is defined in animal and is therefore
inherited by class cow. The class cow overrides the method sound. The constructor for class cow is invoked
at line 19.
Section 2 provides a detailed but informal overview of the various language constructs in ChocoPy.
Sections 3–6 provide formal descriptions of the lexical structure, grammatical syntax, typing rules, and
operation semantics of ChocoPy.
2 A tour of ChocoPy
Notation In the rest of this section, we use:
{expr} to denote an expression in the program.
{id} to denote an identifier such as the name of a variable or function.
{stmts} to denote a list of program statements, separated by newlines.
{declarations} to denote a list of (possibly interleaved) declarations of functions, variables, attributes,
and/or classes, where applicable.
{type} to denote a static type annotation.
{literal} to denote a constant literal such as an integer literal, a string literal, or the keywords True,
False or None.
3
1 def is_zero ( items : [ int ] , idx : int ) -> bool :
2 val :int = 0 # Type is explici tly decla red
3 val = items [ idx]
4 return val == 0
5
6 mylist : [ int ] = None
7 mylist = [1, 0 , 1]
8 print ( i s_zer o ( mylist , 1) ) # Prints True
Figure 1: ChocoPy program illustrating functions, variables, and static typing.
2.1 The top level
A ChocoPy program consists of zero or more definitions followed by zero or more statements, referred to
as top-level definitions and statements respectively. Top-level definitions include global variable definitions,
function definitions, and class definitions. These definitions create new mappings in a scope called the
global scope. Global variables are defined using the syntax {id}:{type} = {literal}, where the identifier
specifies the variable name, the type annotation specifies the static type of the variable, and the constant
literal zspecifies the initial value of the variable upon program execution. The names of global variables,
global functions, and classes must be distinct. Top-level statements execute in the global scope; that is,
expressions in top-level statements may reference entities defined in the global scope using identifiers. A
ChocoPy program’s execution begins with the first top-level statement and ends when the last top-level
statement is executed completely. Function and class definitions are described in Section 2.2 and Section 2.3
respectively. Program statements are described in Section 2.8.
2.2 Functions
In ChocoPy, a function definition can appear at the top level of a program, or it could be nested inside other
functions or methods. Functions cannot be redefined in the same scope. However, a function definition in
the current scope can shadow a function defined in a surrounding scope of the function. A function definition
has the following form:
def {id}({id}: {type}, ..., {id}: {type}) {return type}:
{declarations}
{stmts}
where {return type} is either empty or has the form -> {type}.
The first line defines the function’s name, a comma-separated list of zero or more formal parameters in
parentheses, and the function’s return type after the -> symbol. When the return type is empty, the function
may only return the value None. Every formal parameter has a name and a static type annotation. The
body of a function contains a sequence of zero or more declarations followed by a sequence of one or more
program statements. A function definition creates a new scope.
Declarations in a function body include local variable definitions, global and nonlocal variable declara-
tions, and definitions of nested functions. {id}:{type} = {literal}. Such a definition declares a local
variable with the name id, explicitly associates it with a static type, and specifies an initial value using a
literal. The global {id} statement is used to bind a name to a global variable. Similarly, nonlocal {id}
statement is used within a nested function to bind a name to a variable defined in a surrounding scope that
is not the global scope—specifically to the closest surrounding scope declaring that variable. It is illegal for
a global declaration to occur at the top level. Similarly, it is illegal for a nonlocal declaration to occur
outside a nested function, or to refer to a global variable.
If a variable is not explicitly declared in a function, but is bound to some entity—variable, function, or
class—in any surrounding scope, then its binding is implicitly inherited from the surrounding scope as a
read-only variable—such a variable cannot be assigned to in any of the function’s statements.
4
1 class animal ( object ) :
2 makes_noi s e : bool = False
3
4 def make _noise (self : " animal ") -> object :
5 if ( self . makes _ noise ):
6 print ( self . sound ())
7
8 def sound ( self: " animal " ) -> str:
9 retur n " ??? "
10
11 class cow ( animal ):
12 def _ _init__ ( self : " cow " ):
13 self . mak e s_noise = True
14
15 def sound ( self: " cow " ) -> str :
16 retur n " moo "
17
18 c: an imal = None
19 c = cow ()
20 c. make _ noise () # Prints " moo "
Figure 2: ChocoPy program illustrating classes, attributes, methods, and inheritance.
2.3 Classes
In ChocoPy, a class definition can appear at the top level of a program. Classes cannot be redefined. Class
names can never be shadowed; that is, a program may not define any variable or function with the same
name as a class name. A class definition has the following form:
class {id}({id}):
{declarations}
The first line specifies the name of the class followed by the name of its superclass in parentheses. The
class name must not be be bound to any other entity—class, function, or variable—in the program. The
superclass must refer to a class that has been previously defined in the program, or be the predefined class
object. The superclass may not be one of int, str, or bool.
The class body consists of a sequence of attribute definitions and method definitions. In ChocoPy,
attributes and methods are associated with object instances of a class, and not with the classes themselves;
that is, ChocoPy does not support the notion of static class members that some other languages support.
Similar to variable definitions, an attribute definition has the form {id}:{type} = {literal}. A method
definition has the same syntax as a function definitions (ref. Section 2.2), with two important restrictions:
(1) a method definition must have at least one formal parameter, and (2) the first formal parameter must
have the defining class as type.
A class defines attributes and methods. A class inherits attributes and methods of its superclass. At-
tributes, whether defined in the current class or inherited from the superclass, cannot be redefined. Methods
cannot be redefined in the same class. Inherited methods can be redefined as long as the return type and the
types of all formal parameters except the first parameter are exactly the same. Any reference to an attribute
or method must be prefixed with an expression and the dot operator (ref. Section 2.6).
If a class C is defined to have a superclass P, then class C is a subclass of P. If C is a subclass of P, then
P must either be a user-defined class that has been defined before C in the program, or be the predefined
class object. The object class does not have a superclass. Since every ChocoPy class (except object)
inherits attributes and methods from a single superclass, this scheme is called single inheritance. The
subclass/superclass relation on classes defines a graph. Since a class can only subclass another class defined
previously, this graph is a tree with object as the root.
In order to create an object o of type C, the expression C() is used. Upon execution, a new object is
first created with attributes initialized to their defined values. Then, the init method in the class C is
5
invoked. If init method is not defined in class C, then the inherited init method is called. The root
class object has a default init method whose body is empty.
2.4 Type hierarchy
In ChocoPy, every class name is also a type. The basic type rule in ChocoPy is that if a method or variable
expects a value of type P, then any value of type C may be used instead, provided that P is an ancestor of C
in the class hierarchy. In other words, if C inherits from P, either directly or indirectly, then a C can be used
wherever a P would suffice. When an object of type C may be used in place of an object of type P, we say
that C conforms to P or that C P (think: C is lower down in the inheritance tree). Conformance of class
types is defined in terms of the inheritance graph. Let A, C, and P be types. Then conformance (i.e. ) is
defined as follows:
A A for all types A
if C is a subclass of P, then C P
if A C and C P, then A P
The root of the class hierarchy is the predefined class object. The predefined types int, bool, and str are
subclasses of object. Additionally, for every type T in a ChocoPy program, there is a list type [T], which
represents a list whose elements are of type T. For example, the type [int] represents a list of integers. List
types are recursive: the type [[int]] represents a list whose elements are each a list of integers. List types
are not related to each other by the relation. Every list type conforms to object; that is, [T] object
for any type T.
In addition to types that one can denote in ChocoPy programs, there are two special types: the type of
None, which we denote <None>, and the type of [], which we denote as <Empty>. Because there is no way to
explicitly write these type names in ChocoPy, no variable, parameter, or function return value ever has either
of these types. In the type hierarchy, <None> object, <Empty> object, and as usual <None> <None>
and <Empty> <Empty>, but otherwise these types are unrelated to any other type.
To describe assignments and function invocation, we will need a slightly different relation, which we’ll
call assignment compatibility and denote by the symbol
a
. Roughly, the idea is that we may assign or pass
a quantity of type T
1
to something of type T
2
iff T
1
a
T
2
. More precisely, T
1
a
T
2
iff at least one of the
following is true:
T
1
T
2
(i.e., ordinary subtyping).
T
1
is <None> and T
2
is not int, bool, or str.
T
2
is a list type [T ] and T
1
is <Empty>.
T
2
is a the list type [T] and T
1
is [<None>], where <None>
a
T .
The last case bears mention: it is the only case in which two different list types are assignment compatible.
It is convenient for writing such things as
x: [A] = [None, None]
It is rather limited, admittedly. For example,
x: [[A]] = [[None]]
is still invalid, although it looks perfectly sensible. There is subtle danger lurking here, and it turns out
we must restrict the places where we may assign objects of type [<None>], as described under the multi-
assign-stmt rule in Section 5.
In some situations, we will also need to use the concept of a join of two or more types. The join of two
types A and B is the least type C (using the
a
ordering) such that A and B are assignment compatible
with C. The join operator t can be formally defined as follows: C = A t B if and only if:
(A
a
C) (B
a
C) (D : (A
a
D) (B
a
D) (C
a
D))
6
That is, C is the join of A and B if and only if both A and B are assignment compatible with C, and if
there exists type D such that A and B are assignment compatible with D, then C also is also assignment
compatible with D. The join of any two types always exists and is unique:
If A
a
B, then A t B = B t A = B.
Otherwise, A t B is simply the least common ancestor of A and B in the tree-like type hierarchy
defined by .
2.5 Values
In ChocoPy, we can have the following kinds of values.
2.5.1 Integers
Integers are signed and are represented using 32 bits. The range of integers is from 2
31
to (2
31
1).
Although integers are objects, they are immutable. Arithmetic operations that cause overflow lead to
undefined behavior in program execution.
2.5.2 Booleans
There are exactly two boolean values: True and False.
2.5.3 Strings
Strings are immutable sequences of characters. String literals are delimited with double quotes, e.g. "Hello
World". Strings support the following three operations: retrieving the length via the len function, indexing
via the s[i] syntax, and concatenation via the s1 + s2 syntax. As in Python, ChocoPy does not have a
character type. Indexing into a string returns a new string of length 1. Concatenation returns a new string
with length equal to the sum of the lengths of its operands.
2.5.4 Lists
Lists are mutable sequences with a fixed length. As such, lists in ChocoPy behave more like arrays in C. A
list can be constructed using the square-brackets notation, e.g. [1, 2, 3].
Like strings, lists of type [T] support three operations: len, indexing via the lst[i] syntax, and con-
catenation via the lst1 + lst2 syntax. Indexing a list of type [T] returns a value of type T. Concatenation
of two lists of type [T
1
] and [T
2
] respectively returns a new list of type [T
3
], where the element type of
the new list is T
3
= T
1
t T
2
. The concatenated list has length equal to the sum of the lengths of the two
operands. Additionally, lists of type [T] are mutable and support a fourth operation: element assignment
via the syntax lst[i] = {expr}, where the expression on the right-hand side must conform to the type T.
The rules of ChocoPy permit the construction of some rather odd (but harmless) lists with types such as
[<None>] and [<Empty>]. The only variables and parameters these may be assigned to or passed as have
declared type object, since it is not possible to write the specific type names [<None>] and [<Empty>] in
ChocoPy. It is even possible (if not particularly useful) to index such values.
2.5.5 Objects of user-defined classes
Objects are manipulated using references. That is, x = cow() implies that variable x references an object
of type cow. A subsequent assignment y = x implies that x and y reference the same cow object in memory.
The is operator can be used to determine if two expression reference the same object in memory. Objects
are destroyed when they are not reachable from any local, global, or temporary variable.
7
2.5.6 None
None is a special value that can be assigned to a variable or attribute of type object, any user-defined class
type, or any list type. The is operator can be used to determine if an expression evaluates to the None
value. For type-checking purposes, it has the type <None> (see section 2.4).
2.5.7 The empty list ([])
The expression [] creates a list that can be assigned to a variable having type object or any list type. For
type-checking purposes, it has type <Empty>.
2.6 Expressions
ChocoPy supports the following categories of expressions: literals, identifiers, arithmetic expressions, logical
expressions, relational expressions, concatenation expressions, access expressions, and call expressions.
2.6.1 Literals and identifiers
The basic expression is a constant literal or a variable. Literals of type str, bool, and int have been
described briefly in Section 2.5, and their lexical structure is described in Section 3.4. Variables evaluate to
the value contained in the variable. If an identifier is bound to a global function or class, then it is not a
valid expression by itself—it can appear only in specific expressions such as call expressions. This is because
ChocoPy does not support first-class functions and classes.
2.6.2 List expressions
Lists may be constructed using a comma-separated sequence of expressions delimited by square brackets:
[{expr}, ...]. The type of a list expression containing one or more elements is [T], where T is the least
common ancestor of the types of the list elements in the program’s type hierarchy. In other words, T is
the least type such that the type of each element expression conforms to T. Using the the join operator t
defined in Section 2.4, we can say that an expression of the form [{expr 1}, {expr 2}, ..., {expr n}],
where each expression {expr i} has the type T i, results in a list of type [T] where T = T 1 t T 2 t ...
t T n.
The empty list expression [] has the special type <Empty>, which allows it to be assigned to (passed as)
to a variable (parameter) of any list type. For example, if variable x has type [int], then the assignment x
= [] is legal; x will contain an empty list of integers after this assignment.
2.6.3 Arithmetic expressions
ChocoPy supports the following arithmetic expressions on two operands each of type int: {expr} + {expr},
{expr} - {expr}, {expr} * {expr}, {expr} // {expr}, and {expr} % {expr}. These operators per-
form integer addition, subtraction, multiplication, division quotient, and division remainder, respectively.
ChocoPy does not support the {expr} / {expr} expression, which in Python evaluates to a float value.
The unary expression -{expr} evaluates to the negative of the integer-valued operand. Arithmetic operations
return an int value.
2.6.4 Logical expressions
ChocoPy supports the following logical operations on operands of type bool: not {expr}, {expr} and {expr},
and {expr} or {expr}, which evaluate to the logical negation, conjunction, and disjunction of their operands,
respectively. Logical expressions return a bool value. The binary logical expressions are also short-circuiting.
If the left operand of an and expression evaluates to False, then a result of False is returned without eval-
uating the right operand at all. Similarly, if the left operand of an or expression evaluates to True, then a
result of True is returned without evaluating the right operand at all. These semantics are important when
the expressions in the right-hand side operands contain side-effects.
8
2.6.5 Relational expressions
ChocoPy supports the following relational expressions on operands of type int: {expr} < {expr}, {expr}
<= {expr}, {expr} > {expr}, {expr} >= {expr}. Additionally, the operands in the expressions of the form
{expr} == {expr} and {expr} != {expr} can be of types int, bool, or str, as long as both operands are
of the same type. In constrast, the operands in the expressions of the form {expr} is {expr} can be the
None literal or expressions of any static type other than int, bool, str. The == and != operators return
true if and only if their operands evaluate to respectively equal or unequal values of integers, booleans, or
strings. The is operator returns True if and only if both operands evaluate to the same object or if both
operands evaluate to None.
2.6.6 Conditional expressions
The expression {expr1} if {expr0} else {expr2} first evaluates {expr0}, which must have type bool. If
the result is True, then {expr1} is evaluated and its result is the value of the expression. Otherwise, {expr2}
is evaluated and its value is the value of the expression.
2.6.7 Concatenation expressions
The expression {expr} + {expr} can be used to concatenate two strings or two lists; the result is a new
string or list, respectively.
2.6.8 Access expressions
An attribute of an object can be accessed using the dot operator: {expr}.{id}. For example, x.y.z returns
the value stored in the attribute z of the object obtained by evaluating the expression x.y. An element
of a string or list can be accessed using the index operator: {expr}[{expr}]. For example, "Hello"[2+2]
returns the string "o". Accessing a string or list x with an index i such that i < 0 or i >= len(x) aborts
the program with an appropriate error message.
2.6.9 Call expressions
A call expression is of the form {id}({expr},...), where {expr},... is a comma-separated list of zero or
more expressions provided as arguments to the call. If the identifier is bound to a globally declared function,
the expression evaluates to the result of the function call. If the identifier is bound to a class, the expression
results in the construction of a new object of that class, whose init method is invoked with the provided
arguments.
An expression of the form {expr}.{id}({expr},...) invokes a method with name {id} on the object
returned by evaluating the expression to the left of the dot operator. The first argument is implicit and is
the object whose method is being invoked; the remaining arguments are explicitly provided in parentheses.
Methods are invoked using dynamic dispatch: if the dynamic type of the object, i.e. the type at the time of
execution, is T, then the method {id} defined in T or inherited by T is invoked.
2.7 Type annotations
In ChocoPy, static type annotations are used to explicitly provide types for variables, attributes, formal
parameters and return types of functions and methods. A type annotation can either refer to a class type
T, or a list type [T] such that T is the type annotation corresponding to the type of the elements of the list.
Class-type annotations can be provided in one of two forms: as identifiers or as string literals containing
the name of a class. In ChocoPy, one can use either of the two forms for annotations. However, in Python,
the former form cannot be used to refer to a class type that has not yet been defined, because Python
is interpreted line by line. Since we want ChocoPy to behave similarly as Python, we will use the latter
form of annotation in the above described scenario. In particular, string literals are always needed in type
annotations for the first formal parameter in method definitions, since the type of that parameter is always
the same as the enclosing class, which is not yet fully defined.
9
2.8 Statements
2.8.1 Expression statements
The simplest statement is a standalone expression. The expression is evaluated and its result is discarded.
These types of statements are useful when they have side-effects, e.g. print("Hello").
2.8.2 Compound statements: conditionals and loops
ChocoPy supports the Python-like if-elif-else syntax for conditional control-flow, with elifs and else
being optional:
The following code:
if {expr1}:
{body1}
elif {expr2}:
{body2}
elif {expr3}:
{body3}
is equivalent to:
if {expr1}:
{body1}
else:
if {expr2}:
{body2}
else:
if {expr3}:
{body3}
The expressions in the if and elif conditions must have type bool. The body immediately following an if
or elif condition is only evaluated if the expression evaluates to True. If the expression evaluates to False,
then subsequent elif or else blocks are considered. The body following the else arm is only evaluated if
all of the preceding condition expressions evaluate to False.
ChocoPy supports two types of loops: simple while loops and for loops over lists and strings. while
loops have the following structure:
while {expr}:
{body}
The expression must be of type bool. The body is repeatedly evaluated as long as the expression evaluates
to True between iterations.
for loops can be used to iterate over elements of a list or characters of a string. They take the following
form:
for x in {expr}:
{body}
for loops are syntactic sugar; the structure above is equivalent to the following de-sugaring:
itr = {expr}
idx = 0
while idx < len(itr):
x = itr[idx]
{body}
idx = idx + 1
where len is the predefined length function, and itr/idx are temporary variables that are not defined in the
original scope. A for loop does not create new declarations for the loop variable (x in the above example);
the loop variable must be declared before the for statement.
10
2.8.3 Assignment statements
An assignment statement can be one of the following three forms: (1) {id} = {expr} assigns a value to
the variable bound to the identifier {id}, (2) {expr}.{id} = {expr} assigns a value to an attribute of an
object, and (3) {expr}[{expr}] = {expr} assigns a value to an element of a list. When assigning a value
to index i of a list x, if i < 0 or i >= len(x) then the program aborts after printing an appropriate error
message.
A single assignment may assign the same value to several different destinations. For example, the code
x = y.f = z[0] = 1 assigns the integer value 1 to three memory locations: (1) the variable x, (2) the
attribute f of the object referenced by variable y, (3) and the first element of the list z, in that order. That
is, the final expression (the right-hand side) is evaluated first. The result is then assigned to the left-hand
sides (left of the = symbols, that is), evaluating these from left to right.
2.8.4 Pass statement
The pass statement is a no-op. The program state does not change and control flow simply continues on to
the next statement.
2.8.5 Return statement
The return statement terminates the execution of a function and optionally returns a value using the return
{expr} syntax. If a return value is not specified, then the None value is returned. It is illegal for a return
statement to occur at the top level outside a function or method body. During a function’s execution, if
control flow reaches the end of the function body without encountering a return statement, then the None
value is implicitly returned. Consider the following example:
def bar(x: int) -> object:
if x > 0:
return
elif x == 0:
return None
else:
pass
In function bar, the execution of the function can terminate either because (1) x > 0 and a return
statement with no return value is executed, or (2) x == 0 and an explicit return None is executed, or (3)
x < 0 and the control flow reaches the end of the function, implicitly returning None.
In functions or methods that declare a return type of int, str, or bool, all execution paths must contain
a return statement with an expression that is not a None literal. In class definitions, init methods must
have an empty return type (indicating that they always return None).
2.8.6 Predefined classes and functions
The functions print, input and len are provided by the runtime.
print takes an argument of type object, and outputs outputs its printed form to the standard output,
returning the value None. The valid arguments are restricted to str, int, or bool. Other arguments
cause the program to abort with an error message.
input takes no arguments and returns a value of type str by reading a line of input from the standard
input, including the final newline. It returns an empty string when the standard input is exhausted.
len takes an argument x of type object, returning its length if it is a str or a list. Other arguments cause
the program to abort with an error message.
The predefined classes object, int, bool, and str each define an init method. “Calling” these
classes yield an empty object, the value 0, the value False, and the value "" (empty string) respectively.
11
3 Lexical structure
This section describes the details required to implement a lexical analysis for ChocoPy. A lexical analysis
reads an input file and produces a sequences of tokens. Tokens are matched in the input string using lexical
rules that are expressed using regular expressions. Where ambiguity exists, a token comprises the longest
possible string that forms a legal token, when read from left to right.
The following categories of tokens exist: line structure, identifiers, keywords, literals, operators, and
delimiters.
3.1 Line structure
In ChocoPy, like in Python, whitespace may be significant both for terminating a statement and for reasoning
about the indentation level of a program statement. To accommodate this, ChocoPy defines three lexical
tokens that are derived from whitespace: NEWLINE, INDENT, and DEDENT. The rules for when such tokens are
generated are described next using the concepts of physical and logical lines.
3.1.1 Physical lines
A physical line is a sequence of characters terminated by an end-of-line sequence. In source files and strings,
the following line termination sequences can be used: the Unix form using ASCII LF (\n), the Windows form
using the sequence ASCII CR LF (\r\n), or the old Macintosh form using the ASCII CR (\r) character.
All of these forms can be used equally, regardless of platform. The end of input also serves as an implicit
terminator for the final physical line.
3.1.2 Logical lines
A logical line is a physical line that contains at least one token that is not whitespace or comments. The end
of a logical line is represented by the lexical token NEWLINE. Statements cannot cross logical line boundaries
except where NEWLINE is allowed by the syntax (e.g., between statements in control-flow structures such as
while loops).
3.1.3 Comments
A comment starts with a hash character (#) that is not part of a string literal, and ends at the end of the
physical line. Comments are ignored by the lexical analyzer; they are not emitted as tokens.
3.1.4 Blank lines
A physical line that contains only spaces, tabs, and possibly a comment, is ignored (i.e., no NEWLINE token
is generated).
3.1.5 Indentation
The description of indentation is borrowed from the Python 3 documentation
1
.
“Leading whitespace (spaces and tabs) at the beginning of a logical line is used to compute the indentation
level of the line, which in turn is used to determine the grouping of statements. Tabs are replaced (from left
to right) by one to eight spaces such that the total number of characters up to and including the replacement
is a multiple of eight (this is intended to be the same rule as used by Unix). The total number of spaces
preceding the first non-blank character then determines the line’s indentation.”
“The indentation levels of consecutive lines are used to generate INDENT and DEDENT tokens, using a
stack, as follows: Before the first line of the input program is read, a single zero is pushed on the stack; this
will never be popped off again. The numbers pushed on the stack will always be strictly increasing from
bottom to top. At the beginning of each logical line, the line’s indentation level is compared to the top of
the stack. If it is equal, nothing happens. If it is larger, it is pushed on the stack, and one INDENT token is
1
https://docs.python.org/3/reference/lexical_analysis.html
12
generated. If it is smaller, it must be one of the numbers occurring on the stack; all numbers on the stack
that are larger are popped off, and for each number popped off a DEDENT token is generated. At the end of
the input program, a DEDENT token is generated for each number remaining on the stack that is larger than
zero.”
3.1.6 Whitespace between tokens
Except at the beginning of a logical line or in string literals, the whitespace characters space and tab can be
used interchangeably to separate tokens. Whitespace is needed between two tokens only if their concatenation
could otherwise be interpreted as a different token (e.g., ab is one token, but a b is two tokens). Whitespace
characters are not tokens; they are simply ignored.
3.2 Identifiers
Identifiers are defined as a contiguous sequence of characters containing the uppercase and lowercase letters
A through Z, the underscore and, except for the first character, the digits 0 through 9.
3.3 Keywords
The following strings are not recognized as identifiers, and are instead recognized as distinct keyword tokens:
False, None, True, and, as, assert, async, await, break, class, continue, def, del, elif, else,
except, finally, for, from, global, if, import, in, is, lambda, nonlocal, not, or, pass, raise, return,
try, while, with, yield.
Not all keywords have special meaning in ChocoPy. For example, ChocoPy does not support async
or await. However, ChocoPy uses the same list of keywords as Python in order to avoid cases where an
identifier is legal in ChocoPy but not in Python. Consequently, some keywords (such as async) do not
appear anywhere in the grammar and will simply lead to a syntax error.
An identifier may contain a keyword as a substring; for example, classic is a valid identifier even though
it contains the substring class. This follows from the longest match rule.
3.4 Literals
String and integer literals are matched at the lexical analysis stage and are represented by string-valued and
integer-valued tokens, respectively. The structure of these literals is described below. Boolean literals True
and False are represented simply by their keyword tokens.
3.4.1 String literals
String literals in ChocoPy are greatly simplified from that in Python. In ChocoPy, string literals are simply
a sequence of ASCII characters delimited by (and including) double quotes: "...". The ASCII characters
must lie within the decimal range 32-126 inclusive—that is, higher than or equal to the space character and
up to tilde. The string itself may contain double quotes escaped by a preceding backslash, e.g. \".
Because string literals are used both for values and for type names, it is convenient to distinguish two
categories: string literals whose content has the syntax of an identifier (IDSTRING in the syntax), and other
string literals (denoted STRING).
The value of a string token is the sequence of characters between the delimiting double quotes, with any
escape sequences applied. The following escape sequences are recognized: \", \n, \t, \\, which correspond
to a literal double quote, a newline, a tab, and a literal backslash respectively. Any other escape sequence is
considered illegal. Some examples follow:
Literal Value
"Hello" Hello
"He\"ll\"o" He"ll"o
"He\\\"llo" He\"llo
"Hell\o" (error: "\o" not recognized)
13
3.4.2 Integer literals
Integer literals in ChocoPy are composed of a sequence of one or more digits 0-9, where the leftmost digit
may only be 0 if it is the only character in the sequence. That is, non-zero valued integer literals may not
have leading zeros. The integer value of such literals is interpreted in base 10. The maximum interpreted
value can be 2
31
1 for the literal 2147483647. A literal with a larger value than this limit results in a
lexical error.
3.5 Operators and delimiters
The following is a space-separated list of symbols that correspond to distinct ChocoPy tokens: + - * // %
< > <= >= == != = ( ) [ ] , : . ->
4 Syntax
Figure 3 lists the grammar of ChocoPy using an extended BNF notation. Keyword tokens are represented
in a boldfaced font. Literals and whitespace tokens are represented in UPPERCASE. Nonterminals are
formatted lowercase italics. Operators and delimiters are formatted as-is. The notation J. . .K is used to group
one or more symbols in a production rule and are not tokens in the input language. Symbols or groups may
be annotated as follows: ‘?’ denotes that the preceding symbol or group is optional, denotes zero or more
repeating occurrences and ‘+’ denotes one or more repeating occurrences.
The puzzling division of expressions into expr and cexpr captures the obscure point that ChocoPy (like
Python) only allows logical binary or unary expressions as operands of logical operators (and, or, not). As
a result, the expression True == not False is supposed to produce a syntax error (the correct expression
being True == (not False)).
4.1 Precedence and Associativity
Operators in ChocoPy have the same precedence as they do in Python. The following table summarizes
the precedence of operators in ChocoPy, from lowest precedence (least binding) to highest precedence (most
binding).
Precedence Operator(s) Associativity
1 · if · else · Right
2 or Left
3 and Left
4 not N/A
5 ==, !=, <, >, <=, >= , is None
6 +, - (binary) Left
7 *, // , % Left
8 - (unary) N/A
9 ., [] Left
Note that the comparison operators are nonassociative (so ChocoPy, unlike Python 3, does not allow ex-
pressions such as x < y < z).
14
program ::= Jvar def | func def | class def K
stmt
class def ::= class ID ( ID ) : NEWLINE INDENT class body DEDENT
class body ::= pass NEWLINE
| Jvar def | func def K
+
func def ::= def ID ( Jtyped var J, typed varK
K
?
) J-> typeK
?
: NEWLINE INDENT func body DEDENT
func body ::= Jglobal decl | nonlocal decl | var def | func def K
stmt
+
typed var ::= ID : type
type ::= ID | IDSTRING | [ type ]
global decl ::= global ID NEWLINE
nonlocal decl ::= nonlocal ID NEWLINE
var def ::= typed var = literal NEWLINE
stmt ::= simple stmt NEWLINE
| if expr : block Jelif expr : block K
Jelse : block K
?
| while expr : block
| for ID in expr : block
simple stmt ::= pass
| expr
| return Jexpr K
?
| J target = K
+
expr
block ::= NEWLINE INDENT stmt
+
DEDENT
literal ::= None
| True
| False
| INTEGER
| IDSTRING | STRING
expr ::= cexpr
| not expr
| expr Jand | orK expr
| expr if expr else expr
cexpr ::= ID
| literal
| [ Jexpr J, expr K
K
?
]
| ( expr )
| member expr
| index expr
| member expr ( Jexpr J, expr K
K
?
)
| ID ( Jexpr J, exprK
K
?
)
| cexpr bin op cexpr
| - cexpr
bin op ::= + | - | * | // | % | == | != | <= | >= | < | > | is
member expr ::= cexpr . ID
index expr ::= cexpr [ expr ]
target ::= ID
| member expr
| index expr
Figure 3: Grammar describing the syntax of the ChocoPy language.
15
5 Type rules
This section formally defines the type rules of ChocoPy. The type rules define the type of every ChocoPy
expression in a given context. The context is the type environment, which describes the type of every
unbound identifier appearing in an expression. The type environment is described in Section 5.1. Section
5.2 gives the type rules.
5.1 Type environments
To a first approximation, type checking in ChocoPy can be thought of as a bottom-up algorithm: the type
of an expression e is inferred from the (previously inferred) types of e’s sub-expression. For example, an
integer 1 has type int; there are no sub-expression in this case. As another example, if the types of e
1
and
e
2
are int, then the expression e
1
> e
2
has type bool.
A complication arises in the case of an expression v, where v is a variable or a function. It is not possible
to say what the type of v is in a strictly bottom-up algorithm; we need to know the type declared for v in the
larger expression. Such a declaration must exist for every variable and function in valid ChocoPy programs.
To capture information about the types of identifiers, we use a type environment. The type environment
consists of four parts: O a local environment, M a method/attribute environment M, C the name of the
current class in which the expression or statement appears, and R the return type of the function or method in
which the expression or statement appears. C is when the expression or statement appears outside a class,
i.e. as a statement or expression in the top level. Similarly, R is when the expression or statement appears
outside a function or method, i.e. as a statement or expression in the top level. The local environment and
the method/attribute environment are both maps. The local environment is a function of the form:
O(v) = T
which assigns the type T to a variable v. The same environment also holds information about function
signatures. For example,
O(f) = {T
1
× · · · × T
n
T
0
; x
1
, . . . , x
n
; v
1
: T
0
1
, . . . , v
m
: T
0
m
}
gives the type of f and denotes that identifier f has formal parameters x
1
, . . . x
n
of types T
1
, . . . , T
n
, respec-
tively, and has return type T
0
. The identifiers v
1
, . . . , v
m
are the variables and nested functions declared
in the body of f and their types are T
0
1
, . . . , T
0
m
, respectively. The method/attribute environment similarly
maps a class and its attributes and methods to their types. For example,
M(C, a) = T
maps the attribute a in the class C to the type T . Similarly,
M(C, m) = {T
1
× · · · × T
n
T
0
; x
1
, . . . , x
n
; v
1
: T
0
1
, . . . , v
k
: T
0
k
}
maps the method m of class C to it type. Specifically, it denotes that method m in class C has formal
parameters x
1
, . . . x
n
of types T
1
, . . . , T
n
, respectively, and has return type T
0
. The identifiers v
1
, . . . , v
k
are
the variables and nested functions declared in the body of m and their types are T
0
1
, . . . , T
0
k
, respectively.
The third component of the type environment is the name of the class containing the expression or
statement to be type checked. The fourth component of the type environment is the return type R of the
function or method containing the expression or statement to be type checked.
When type checking function and method definitions, we need to propagate the typing environment
from an outer scope to the function scope, where the binding of any identifier is inherited unless the
function declares a formal parameter, variable, or a nested function with the same name. Let O be the
current local environment and f be a function with type definition {T
1
× · · · × T
n
T
0
; x
1
, . . . , x
n
; v
1
:
T
0
1
, . . . , v
m
: T
0
m
}. When type checking the definition of f, we type check its body using the local environ-
ment O[T
1
/x
1
][T
2
/x
2
] . . . [T
n
/x
n
][T
0
1
/v
1
] . . . [T
0
m
/v
m
], where the notation O[T/c] is used to construct a new
mapping as follows:
16
O[T/c](c) = T
O[T/c](d) = O(d) if d 6= c
5.2 Type checking rules
The general form of a type checking rule is:
.
.
.
O, M, C, R ` e : T
The rule should be read: in the type environment with local environment O, method/attribute environment
M, containing class C, and return type R, the expression e has type T . The line below the bar is a typing
judgment: the turnstyle ` separates context (O, M, C, R) from a proposition e : T . The dots above the
horizontal bar stand for other judgments about the types of sub-expressions of e. These other judgments are
hypotheses of the rule; if the hypotheses are satisfied, then the judgment below the bar is true.
Variables. The rule for variables is simply that if the environment assigns an identifier id a type T , then
the expression id has type T .
O(id) = T, where T is not a function type.
O, M, C, R ` id : T
[var-read]
We must, however, prohibit identifiers with function types when reading values (that is, when identifiers
are used as expressions in the syntax). This simply reflects the fact that ChocoPy does not treat functions
as first-class (assignable, storable) values.
Variable Definitions and Assignments. This assignment rule—as well as others—uses the relation
a
(ref. Section 2.4). The rule says that the assigned expression e
1
must have a type T
1
that is assignment
compatible with the type T of the identifier id in the type environment.
O(id) = T
O, M, C, R ` e
1
: T
1
T
1
a
T
O, M, C, R ` id = e
1
[var-assign-stmt]
Variable definitions obey a similar rule:
O(id) = T
O, M, C, R ` e
1
: T
1
T
1
a
T
O, M, C, R ` id: T = e
1
[var-init]
The colon used below the line in the rule for var-init is the colon in the syntax for type annotations.
Statement and Definition Lists. These type check if all the component definitions and statements type
check.
17
O, M, C, R ` s
1
O, M, C, R ` s
2
.
.
.
O, M, C, R ` s
n
n 1
O, M, C, R ` s
1
NEWLINE s
2
NEWLINE . . . s
n
NEWLINE
[stmt-def-list]
Pass Statements.
O, M, C, R ` pass
[pass]
Expression Statements.
O, M, C, R ` e : T
O, M, C, R ` e
[expr-stmt]
Literals.
O, M, C, R ` False : bool
[bool-false]
O, M, C, R ` True : bool
[bool-true]
i is an integer literal
O, M, C, R ` i : int
[int]
s is a string literal
O, M, C, R ` s : str
[str]
The None literal is assigned the (unmentionable) type <None>:
O, M, C, R ` None : <None>
[none]
Arithmetic and Numerical Relational Operators.
O, M, C, R ` e : int
O, M, C, R ` - e : int
[negate]
O, M, C, R ` e
1
: int
O, M, C, R ` e
2
: int
op {+, , , //, %}
O, M, C, R ` e
1
op e
2
: int
[arith]
O, M, C, R ` e
1
: int
O, M, C, R ` e
2
: int
{<, <=, >, >=, ==, !=}
O, M, C, R ` e
1
e
2
: bool
[int-compare]
18
Logical Operators.
O, M, C, R ` e
1
: bool
O, M, C, R ` e
2
: bool
{==, !=}
O, M, C, R ` e
1
e
2
: bool
[bool-compare]
O, M, C, R ` e
1
: bool
O, M, C, R ` e
2
: bool
O, M, C, R ` e
1
and e
2
: bool
[and]
O, M, C, R ` e
1
: bool
O, M, C, R ` e
2
: bool
O, M, C, R ` e
1
or e
2
: bool
[or]
O, M, C, R ` e : bool
O, M, C, R ` not e : bool
[not]
Conditional Expressions.
O, M, C, R ` e
0
: bool
O, M, C, R ` e
1
: T
1
O, M, C, R ` e
2
: T
2
O, M, C, R ` e
1
if e
0
else e
2
: T
1
t T
2
)
[cond]
String Operations.
O, M, C, R ` e
1
: str
O, M, C, R ` e
2
: str
{==, !=}
O, M, C, R ` e
1
e
2
: bool
[str-compare]
O, M, C, R ` e
1
: str
O, M, C, R ` e
2
: str
O, M, C, R ` e
1
+ e
2
: str
[str-concat]
O, M, C, R ` e
1
: str
O, M, C, R ` e
2
: int
O, M, C, R ` e
1
[e
2
] : str
[str-select]
The ‘is’ Operator.
O, M, C, R ` e
1
: T
1
O, M, C, R ` e
2
: T
2
T
1
, T
2
are not one of int, str, bool
O, M, C, R ` e
1
is e
2
: bool
[is]
19
Object Construction. If T is the name of a class, then object-construction expressions of that class can
be typed as follows:
T is a class
O, M, C, R ` T () : T
[new]
List Displays.
n 1
O, M, C, R ` e
1
: T
1
O, M, C, R ` e
2
: T
2
.
.
.
O, M, C, R ` e
n
: T
n
T = T
1
t T
2
t . . . t T
n
O, M, C, R ` [e
1
, e
2
, . . . , e
n
] : [T ]
[list-display]
The empty list is a special case:
O, M, C, R ` [] : <Empty>
[nil]
List Operators.
O, M, C, R ` e
1
: [T
1
]
O, M, C, R ` e
2
: [T
2
]
T = T
1
t T
2
O, M, C, R ` e
1
+ e
2
: [T ]
[list-concat]
O, M, C, R ` e
1
: [T ]
O, M, C, R ` e
2
: int
O, M, C, R ` e
1
[e
2
] : T
[list-select]
O, M, C, R ` e
1
: [T
1
]
O, M, C, R ` e
2
: int
O, M, C, R ` e
3
: T
3
T
3
a
T
1
O, M, C, R ` e
1
[e
2
] = e
3
[list-assign-stmt]
Attribute Access, Assignment, and Initialization. For attribute access, we use the class-member
environment M:
O, M, C, R ` e
0
: T
0
M(T
0
, id) = T
O, M, C, R ` e
0
.id : T
[attr-read]
O, M, C, R ` e
0
: T
0
O, M, C, R ` e
1
: T
1
M(T
0
, id) = T
T
1
a
T
O, M, C, R ` e
0
.id = e
1
[attr-assign-stmt]
20
M(C, id) = T
O, M, C, R ` e
1
: T
1
T
1
a
T
O, M, C, R ` id: T = e
1
[attr-init]
Multiple Assignments. Multiple assignment is type-checked by decomposing into individual single as-
signments, as follows.
n > 1
O, M, C, R ` e
0
: T
0
O, M, C, R ` e
1
= e
0
.
.
.
O, M, C, R ` e
n
= e
0
T
0
6= [<None>]
O, M, C, R ` e
1
= e
2
= · · · = e
n
= e
0
[multi-assign-stmt]
The restriction that T
0
6= [<None>] avoids a subtle type-safety issue. It is dangerous to allow there to
be two different views of a list with differing element types. The type [<None>] can only arise from list
displays. As long as the value of such a display is immediately consumed by assignment to a single variable,
parameter, or operand (+ for lists), there will be only be one opinion as to its type subsequently. But multiple
assignment opens the possibility of programs like this:
x: [A] = None
y: [[int]] = None
x = y = [None] # Trouble ahead!
x[0] = A()
print(y[0][0]) # ???
Java, for example, gets itself into this bind and therefore needs a runtime ArrayStoreException. To avoid
it in ChocoPy, we conservatively forbid values of the type [<None>] to be multiply assigned.
There is another very subtle point lurking here in the case where e
0
has type <Empty> (the type of the empty
list). In this case, however, we need no special rule because in ChocoPy, there is no .append method to
allow elements to be added to an empty list. Were that not the case, we could get this situation:
A: [int] = None
B: [str] = None
A = B = []
A.append(3)
and we’d subsequently have B[0] returning the value 3, which is certainly not a string.
Function Applications.
O, M, C, R ` e
1
: T
00
1
O, M, C, R ` e
2
: T
00
2
.
.
.
O, M, C, R ` e
n
: T
00
n
n 0
O(f) = {T
1
× · · · × T
n
T
0
; x
1
, . . . , x
n
; v
1
: T
0
1
, . . . , v
m
: T
0
m
}
1 i n : T
00
i
a
T
i
O, M, C, R ` f(e
1
, e
2
, . . . , e
n
) : T
0
[invoke]
21
To type check a function invocation, each of the arguments to the function must be first type checked. The
type of each argument must conform to the types of the corresponding formal parameter of the function.
The invocation expression is assigned the function’s declared return type.
Method dispatch expressions are type checked in a similar fashion. The key difference from the function
invocation expression is that the target method is determined by consulting the method/attribute environ-
ment using the type of the receiver object expression:
O, M, C, R ` e
1
: T
00
1
O, M, C, R ` e
2
: T
00
2
.
.
.
O, M, C, R ` e
n
: T
00
n
n 1
M(T
00
1
, f) = {T
1
× · · · × T
n
T
0
; x
1
, . . . , x
n
; v
1
: T
0
1
, . . . , v
m
: T
0
m
}
T
00
1
a
T
1
i. 2 i n : T
00
i
a
T
i
O, M, C, R ` e
1
.f(e
2
, . . . , e
n
) : T
0
[dispatch]
Return Statements. This is where the return-type environment comes into play:
O, M, C, R ` e : T
T
a
R
O, M, C, R ` return e
[return-e]
<None>
a
R
O, M, C, R ` return
[return]
Conditional Statements.
O, M, C, R ` e
0
: bool
O, M, C, R ` b
0
O, M, C, R ` e
1
: bool
O, M, C, R ` b
1
.
.
.
O, M, C, R ` e
n
: bool
O, M, C, R ` b
n
n 0
O, M, C, R ` b
n+1
O, M, C, R ` if e
0
:b
0
elif e
1
:b
1
. . . elif e
n
:b
n
else:b
n+1
[if-elif-else]
While Statements.
O, M, C, R ` e : bool
O, M, C, R ` b
O, M, C, R ` while e:b
[while]
For Statements.
O, M, C, R ` e : str
O(id) = T
str
a
T
O, M, C, R ` b
O, M, C, R ` for id in e:b
[for-str]
22
O, M, C, R ` e : [T
1
]
O(id) = T
T
1
a
T
O, M, C, R ` b
O, M, C, R ` for id in e:b
[for-list]
Function Definitions. To type a function definition for f, we check the body of the function f in an
environment where O is extended with bindings for the names explicitly declared by f.
T =
T
0
, if -> is present,
<None>, otherwise.
O(f) = {T
1
× · · · × T
n
T ; x
1
, . . . , x
n
; v
1
: T
0
1
, . . . , v
m
: T
0
m
}
n 0 m 0
O[T
1
/x
1
] . . . [T
n
/x
n
][T
0
1
/v
1
] . . . [T
0
m
/v
m
], M, C, T ` b
O, M, C, R ` def f(x
1
:T
1
, . . . , x
n
:T
n
) J-> T
0
K
?
:b
[func-def]
Likewise for methods:
T =
T
0
, if -> is present,
<None>, otherwise.
M(C, f) = {T
1
× · · · × T
n
T ; x
1
, . . . , x
n
; v
1
: T
0
1
, . . . , v
m
: T
0
m
}
n 1 m 0
C = T
1
O[T
1
/x
1
] . . . [T
n
/x
n
][T
0
1
/v
1
] . . . [T
0
m
/v
m
], M, C, T ` b
O, M, C, R ` def f(x
1
:T
1
, . . . , x
n
:T
n
) J-> T
0
K
?
:b
[method-def]
Class Definitions. Class definitions are type checked by propagating the appropriate typing environment:
O, M, C, R ` b
O, M, , R ` class C(S):b
[class-def]
The Global Typing Environment. The following functions and class methods are predefined globally:
O(len) = {object int; arg}
O(print) = {object <None>; arg}
O(input) = {→ str}
M(object, init ) = {object <None>; self}
M(str, init ) = {object <None>; self}
M(int, init ) = {object <None>; self }
M(bool, init ) = {object <None>; self}
6 Operational semantics
This section contains the formal operational semantics for the ChocoPy language. The operational semantics
define how every definition, statement, or expression in a ChocoPy program should be evaluated in a given
context. The context has four components: a global environment, a local environment, a store, and a return
23
value. Section 6.1 describes these components. Section 6.2 defines the syntax used to refer to ChocoPy
values, and Section 6.3 defines the syntax used to refer to class definitions.
Keep in mind that a formal semantics is a specification only—it does not describe an implementation.
The purpose of presenting the formal semantics is to make clear all the details of the behavior of a ChocoPy
program. How this behavior is implemented is another matter.
6.1 Evaluation context
The value of a ChocoPy expression depends on the context in which it is evaluated. The context comprises
of an environment, which maps variable identifiers to locations. Intuitively, an environment tells us for
a given identifier the address of the memory location where that identifier’s value is stored. For a given
expression, the environment must assign a location to all identifiers to which the expression may refer. For
the expression a + b, we need an environment that maps a to some location and b to some location. We’ll
use the following syntax to describe environments, which is very similar to the syntax of type environments
defined in Section 5.1.
E = [a : l
1
, b : l
2
]
This environment maps variable a to location l
1
and variable b to location l
2
.
The second component of the evaluation context is the store (memory). The store maps locations to
values, where values in ChocoPy are objects or functions. Intuitively, a store tells us what value is stored in a
given memory location. For the moment, assume all values are integers. A store is similar to an environment:
S = [l
1
7→ 42, l
2
7→ 7]
This store maps location l
1
to the value 42 and the location l
2
to the value 7. Given an environment and a
store, the value of a variable can be retrieved by first looking up the location of a variable in the environment
E, and then looking up the value stored at this location in the store S. For example, the value of variable a
can be looked up in the following way:
E(a) = l
1
S(l
1
) = 42
Together, the environment and the store define the execution state at a particular step of the evaluation
of a ChocoPy program. The double indirection from identifiers to locations to values allows us to model
variables. Consider what happens if the value 11 is assigned to variable a in the environment and store
defined above. Assigning to a variable means changing the value to which it refers but not its location. To
perform the assignment, we look up the location for a in the environment E and then change the mapping
for the obtained location to the new value, giving a new store S
0
.
E(a) = l
1
S
0
= S[11/l
1
]
Where the syntax S
0
= S[v/l] denotes a new store S
0
that is identical to the store S, except that S
0
maps
location l to value v. It is formally defined as follows:
S[v/l](l) = v
S[v/l](l
0
) = S(l
0
) if l
0
6= l
There are also situations in which the environment is modified. Consider the definition of a variable in
ChocoPy:
x : int = 36
24
When evaluating this definition, we must introduce a new identifier x into the environment, which will be
valid for the rest of the scope in which definition occurs. If the current environment and store are E and S
respectively, then the new environment and store after evaluating this definition are E
0
and S
0
respectively,
which are defined by:
l
x
= newloc(S)
E
0
= E[l
x
/x]
S
0
= S[36/l
x
]
The first step is to allocate a location for the variable x. The location should be fresh, meaning that the
current store should not have a mapping for it. The function newloc applied to a store gives us an unused
location in that store. We then create a new environment E
0
that maps x to l
x
but also contains all of the
mappings of E for identifiers other than x. If x already has a mapping in E, the new environment E
0
hides
this old mapping. We must also update the store to map the new location to a value. In this case l
x
maps
to the value 36, which is the initial value for x specified in the variable’s definition.
When we need several distinct new locations at once, as when creating a new list, we’ll extend newloc
to do so: newloc(S, n) produces n distinct new locations in S.
The example in this subsection oversimplifies ChocoPy environments and stores a bit, because simple
integers are not ChocoPy values. Even integers are full-fledged objects in ChocoPy.
In ChocoPy, the evaluation context consists of an additional component: a global environment G. G is an
environment similar to E, but it always contains mappings for variables and functions defined at the global
scope. This environment is useful for handling cases where a nested function references a global variable x
via the global x declaration, bypassing any inherited mappings for variable x from the enclosing function.z
6.2 Syntax for values
To describe ChocoPy semantics, we use the following categories of values: objects that are instances of
classes, list objects, the None value, and functions (or methods).
6.2.1 Class instances
Let v be a value corresponding to a object belonging to class X. The value v is represented by the syntax:
v = X(a
1
= l
1
, a
2
= l
2
, . . . , a
n
= l
n
)
where each a
i
for 1 i n is an attribute or method (including those inherited) of class X, and each l
i
is
the location where the value of attribute or method a
i
is stored. Each a
i
is distinct in a semantically valid
ChocoPy program.
For base objects of ChocoPy (i.e., int, str, bool), we use a special case of the above syntax. Base
objects have a class name, but their attributes are not like attributes of normal classes, because they cannot
be modified. Therefore, we describe base objects using the following syntax:
int(5)
bool(T rue)
str(7, “ChocoPy”)
Integers and booleans contain just one component: the integer or boolean value respectively. Strings
contain two components: a length and the actual sequence of ASCII characters.
25
6.2.2 List objects
A list object v of length n is represented by the syntax:
v = [l
0
, l
2
, . . . , l
n1
]
where each l
i
is the element at index i.
6.2.3 None
The special value None is represented simply as None.
6.2.4 Functions
Functions are not first-class values in ChocoPy; that is, ChocoPy variables cannot store references to functions
and ChocoPy expressions cannot evaluate to a function. However, functions are represented as values in the
formal semantics and are the result of evaluating function and method definitions. Global functions, nested
functions, and methods of classes are all represented by the same syntax, which is as follows:
v = (x
1
, . . . , x
n
, y
1
= e
1
, . . . , y
k
= e
k
, b
body
, E
f
)
Here, v is a function having n formal parameters x
1
, . . . , x
n
and k local definitions y
1
, . . . , y
k
. Local definitions
include local variables and nested functions. The term e
i
represents the definition corresponding to the
identifier y
i
. For variable definitions, e
i
is the literal expression that gives its initial value. For nested
functions, e
i
is the function definition itself. The term b
body
is the function’s body, which is a sequence of
statements. The term E
f
is the environment in which the function is defined, with appropriate substitutions
for global declarations within the function.
Certain functions are predefined in ChocoPy: len, print, and input. They don’t have bodies—at least
not bodies written in ChocoPy—but if we are going to denote their values, we’ll need some way of denoting
these bodies. So, for the predefined functions, we’ll use the “bodies” native print, native len, and native
input, suggestive of the syntax used for native methods in Java.
6.3 Syntax for class definitions
When referring to class definitions, we need to use an additional notation. The mapping class is used to get
the attributes and methods defined in a particular class. The syntax is as follows:
class(A) = (a
1
= e
1
, . . . , a
m
= e
m
)
where each a
i
is an attribute or method belonging to the class A (including inherited members). If a
i
is an
attribute, then the corresponding e
i
is the literal expression that specifies the attribute’s initial value. If a
i
is a method name, then e
i
is the corresponding method definition (which has the same syntax as a function
definition).
For classes that inherit the method init from its definition in class object, we assume that the
method body contains a single statement: pass.
6.4 Operational rules
Equipped with the notation for environments, stores, and the syntax for values and class definitions, we can
now present the formal operational semantics rules for ChocoPy. The general form of a rule is:
.
.
.
G, E, S ` e : v, S
0
, R
0
This rule should be read as follows: in a context with global environment G, local environment E, and store
S, the program fragment e evaluates to value v, after which the new store is S
0
and the returned value is R
0
.
26
Program fragments include expressions, definitions, statements, and sequences of statements. The value v
will only be meaningful when e is an expression or function/method definition; in other cases, we represent
the value v by the (underscore) symbol to denote the absence of any value.
Literals.
G, E, S ` None : None, S ,
[none]
G, E, S ` False : bool(false), S,
[bool-false]
G, E, S ` True : bool(true), S,
[bool-true]
i is an integer literal
G, E, S ` i : int(i), S,
[int]
s is a string literal
n is the length of the string s
G, E, S ` s : str(n, s), S,
[str]
Pass Statements.
G, E, S ` pass : , S,
[pass]
Expression Statements. Here, we evaluate an expression, which may modify the store, and then discard
its value:
G, E, S ` e : v, S
0
,
G, E, S ` e : , S
0
,
[expr-stmt]
Variable Accesses. The values of variables involve finding their location in the environment and then
reading or writing their values in the store.
E(id) = l
id
S(l
id
) = v
G, E, S ` id : v, S,
[var-read]
G, E, S ` e : v, S
1
,
E(id) = l
id
S
2
= S
1
[v/l
id
]
G, E, S ` id = e : , S
2
,
[var-assign-stmt]
The last rule should be read as follows: the right-hand side of the assignment is first evaluated with the
context G, E, S, to produce value v; this evaluation may result in a modified store S
1
. After evaluating the
variable assignment, the new store is S
2
, which is a modification of S
1
where the location l
id
maps to the
evaluated value of the right-hand-side: v.
27
Numerical Operations. These relate ChocoPy operators to corresponding mathematical operators.
G, E, S ` e : int(i
1
), S
1
,
v = int(i
1
)
G, E, S ` - e : v, S
1
,
[negate]
G, E, S ` e
1
: int(i
1
), S
1
,
G, E, S
1
` e
2
: int(i
2
), S
2
,
op {+, -, *, //, %}
op {//, %} i
2
6= 0
v = int(i
1
op i
2
)
G, E, S ` e
1
op e
2
: v, S
2
,
[arith]
G, E, S ` e
1
: int(i
1
), S
1
,
G, E, S
1
` e
2
: int(i
2
), S
2
,
{<, <=, >, >=, ==, !=}
v =
(
bool(true) if i
1
i
2
bool(f alse) otherwise
G, E, S ` e
1
e
2
: v, S
2
,
[int-compare]
G, E, S ` e
1
: bool(b
1
), S
1
,
G, E, S
1
` e
2
: bool(b
2
), S
2
,
{==, !=}
v =
(
bool(true) if b
1
b
2
bool(f alse) otherwise
G, E, S ` e
1
e
2
: v, S
2
,
[bool-compare]
String Operations.
G, E, S ` e
1
: str(n
1
, s
1
), S
1
,
G, E, S
1
` e
2
: str(n
2
, s
2
), S
2
,
{==, !=}
v =
bool(true) if is = and s
1
= s
2
bool(true) if is != and s
1
6= s
2
bool(f alse) otherwise
G, E, S ` e
1
e
2
: v, S
2
,
[str-compare]
G, E, S ` e
1
: str(n
1
, s
1
), S
1
,
G, E, S
1
` e
2
: str(n
2
, s
2
), S
2
,
v = str(n
1
+ n
2
, s
1
.s
2
) where s
1
.s
2
is the concatenated string
G, E, S ` e
1
+ e
2
: v, S
2
,
[str-concat]
G, E, S ` e
1
: str(n, c
1
.c
2
. . .c
n
), S
1
,
G, E, S
1
` e
2
: int(i), S
2
,
0 i < n
v = str(1, c
i+1
)
G, E, S ` e
1
[e
2
] : v, S
2
,
[str-select]
28
Object Identity.
G, E, S ` e
1
: v
1
, S
1
,
G, E, S
1
` e
2
: v
2
, S
2
,
v =
bool(true) if v
1
= None and v
2
= None
bool(true) if v
1
and v
2
are the same object in memory
bool(f alse) otherwise
G, E, S ` e
1
is e
2
: v, S
2
,
[is]
Logical Operators. The rule for the unary logical operator not is simple: the operand’s value is negated.
The binary logical operators perform short-circuit evaluation: if the result of logical conjunction or dis-
junction is apparent from evaluating the first operand (because the operand corresponds to False or True
respectively), then the second operand is not evaluated at all. Otherwise, the result of the conjunction or
disjunction is equal to the value of the second operand.
G, E, S ` e : bool(b), S
1
,
v =
(
bool(f alse) if b = true
bool(true) otherwise
G, E, S ` not e : v, S
1
,
[not]
G, E, S ` e
1
: bool(f alse), S
1
,
G, E, S ` e
1
and e
2
: bool(f alse), S
1
,
[and-1]
G, E, S ` e
1
: bool(true), S
1
,
G, E, S
1
` e
2
: v, S
2
,
G, E, S ` e
1
and e
2
: v, S
2
,
[and-2]
G, E, S ` e
1
: bool(true), S
1
,
G, E, S ` e
1
or e
2
: bool(true), S
1
,
[or-1]
G, E, S ` e
1
: bool(f alse), S
1
,
G, E, S
1
` e
2
: v, S
2
,
G, E, S ` e
1
or e
2
: v, S
2
,
[or-2]
Conditional Statements and Expressions. conditional (if-else and if-elif-else) statements and
conditional expressions are evaluated by first evaluating the condition, and then deciding which branch to
evaluate. The body of a conditional statement may contain a return statement, which is propagated by the
enclosing if statement as well.
G, E, S ` e : bool(true), S
1
,
G, E, S
1
` b
1
: , S
2
, R
G, E, S ` if e: b
1
else: b
2
: , S
2
, R
[if-else-true]
G, E, S ` e : bool(false), S
1
,
G, E, S
1
` b
2
: , S
2
, R
G, E, S ` if e: b
1
else: b
2
: , S
2
, R
[if-else-false]
29
G, E, S ` e
0
: bool(true), S
1
,
G, E, S
1
` b
0
: , S
2
, R
G, E, S ` if e
0
:b
0
elif e
1
:b
1
. . . elif e
n
:b
n
else:b
n+1
: , S
2
, R
[if-elif-true]
G, E, S ` e
0
: bool(f alse), S
1
,
G, E, S
1
` if e
1
:b
1
. . . elif e
n
:b
n
else:b
n+1
: , S
2
, R
G, E, S ` if e
0
:b
0
elif e
1
:b
1
. . . elif e
n
:b
n
else:b
n+1
: , S
2
, R
[if-elif-false]
G, E, S ` if e: b
1
else: pass : , S
1
, R
G, E, S ` if e: b
1
: , S
1
, R
[if-no-else]
G, E, S ` e : bool(true), S
1
,
G, E, S
1
` b
1
: v, S
2
,
G, E, S ` b
1
if e else b
2
: v, S
2
,
[if-else-expr-true]
G, E, S ` e : bool(false), S
1
,
G, E, S
1
` b
2
: v, S
2
,
G, E, S ` b
1
if e else b
2
: v, S
2
,
[if-else-expr-false]
While Loops. Evaluating a while loop first evaluates the condition. If the condition is false, the loop
terminates. If the condition is true, then the loop body is executed and the while loop is then evaluated
again unless the loop body returns a value.
G, E, S ` e : bool(false), S
1
,
G, E, S ` while e: b : , S
1
,
[while-false]
G, E, S ` e : bool(true), S
1
,
G, E, S
1
` b : , S
2
,
G, E, S
2
` while e: b : , S
3
, R
G, E, S ` while e: b : , S
3
, R
[while-true-loop]
G, E, S ` e : bool(true), S
1
,
G, E, S
1
` b : , S
2
, R
R is not
G, E, S ` while e: b : , S
2
, R
[while-true-return]
Return Statements. return statements explicitly or implicitly set the return value R, which then prop-
agates upward, preventing further execution until the nearest enclosing function call is reached.
G, E, S ` e : v, S
1
,
G, E, S ` return e : , S
1
, v
[return-e]
G, E, S ` return : , S, None
[return]
30
Statement Sequences. A sequence of statements is evaluated by evaluating each statement in turn, either
until some statement returns a value or until the last statement in the sequence is evaluated.
n 0
G, E, S
0
` s
1
: , S
1
,
G, E, S
1
` s
2
: , S
2
,
.
.
.
G, E, S
n1
` s
n
: , S
n
,
G, E, S
0
` s
1
NEWLINE s
2
NEWLINE . . . s
n
NEWLINE : , S
n
,
[stmt-seq]
n 0
G, E, S
0
` s
1
: , S
1
,
G, E, S
1
` s
2
: , S
2
,
.
.
.
G, E, S
k1
` s
k
: , S
k
, R
k n, R is not
G, E, S
0
` s
1
NEWLINE s
2
NEWLINE . . . s
n
NEWLINE : , S
k
, R
[stmt-seq-return]
Function Invocation and Method Dispatching. These are two of the three most complex operational
rules (the other being object creation, [new]). First, simple function invocation.
S
0
(E(f)) = (x
1
, . . . , x
n
, y
1
= e
0
1
, . . . , y
k
= e
0
k
, b
body
, E
f
)
n, k 0
G, E, S
0
` e
1
: v
1
, S
1
,
.
.
.
G, E, S
n1
` e
n
: v
n
, S
n
,
l
x1
, . . . , l
xn
, l
y 1
, . . . , l
y k
= newloc(S
n
, n + k)
E
0
= E
f
[l
x1
/x
1
] . . . [l
xn
/x
n
][l
y 1
/y
1
] . . . [l
y k
/y
k
]
G, E
0
, S
n
` e
0
1
: v
0
1
, S
n
,
.
.
.
G, E
0
, S
n
` e
0
k
: v
0
k
, S
n
,
S
n+1
= S
n
[v
1
/l
x1
] . . . [v
n
/l
xn
][v
0
1
/l
y1
] . . . [v
0
k
/l
y k
]
G, E
0
, S
n+1
` b
body
: , S
n+2
, R
R
0
=
(
None, if R is
R, otherwise
G, E, S
0
` f(e
1
, . . . , e
n
) : R
0
, S
n+2
,
[invoke]
First, the function’s value is fetched from the current store. Second, the arguments to the function call are
evaluated in left-to-right order. Then, new locations are allocated for the function’s formal parameters, local
variables and nested functions. A new environment E
0
is created for the function call, which maps the formal
parameters, local variables, and the names of nested functions to their corresponding locations. The store
S
n+1
maps these locations to their corresponding arguments, initial values, and function values respectively.
Finally, the body of the function is evaluated with this new environment E
0
and initial state S
n+1
. The
function invocation expression evaluates to the value returned by the function body, or the value None if the
function body was completely evaluated without encountering a return statement.
31
G, E, S ` e
0
: v
0
, S
0
,
v
0
= X(a
1
= l
1
, . . . , f = l
f
, . . . , a
m
= l
m
)
S
0
(l
f
) = (x
0
, x
1
, . . . , x
n
, y
1
= e
0
1
, . . . , y
k
= e
0
k
, b
body
, E
f
)
n, k 0
G, E, S
0
` e
1
: v
1
, S
1
,
.
.
.
G, E, S
n1
` e
n
: v
n
, S
n
,
l
x1
, . . . , l
xn
, l
y 1
, . . . , l
y k
= newloc(S
n
, n + k)
E
0
= E
f
[l
x0
/x
0
] . . . [l
xn
/x
n
][l
y 1
/y
1
] . . . [l
y k
/y
k
]
G, E
0
, S
n
` e
0
1
: v
0
1
, S
n
,
.
.
.
G, E
0
, S
n
` e
0
k
: v
0
k
, S
n
,
S
n+1
= S
n
[v
0
/l
x0
] . . . [v
n
/l
xn
][v
0
1
/l
y 1
] . . . [v
0
k
/l
y k
]
G, E
0
, S
n+1
` b
body
: , S
n+2
, R
R
0
=
(
None, if R is
R, otherwise
G, E, S ` e
0
.f(e
1
, . . . , e
n
) : R
0
, S
n+2
,
[dispatch]
The rule for dynamic dispatch is similar to the rule for function invocation, with two main differences: (1)
the target method is determined by first evaluating the object expression and then retrieving the function
value that the method slot in the resulting object maps to; (2) the object itself is passed as the first argument
to the method, before any of the arguments in the method-call expression.
Function Definitions. Function definitions result in the creation of function values. The rule is exactly
the same for the definition of a globally defined function, a nested function, or a method defined in a class:
g
1
, . . . , g
L
are the variables explicitly declared as global in f
y
1
= e
1
, . . . , y
k
= e
k
are the local variables and nested functions defined in f
E
f
= E[G(g
1
)/g
1
] . . . [G(g
L
)/g
L
]
v = (x
1
, . . . , x
n
, y
1
= e
1
, . . . , y
k
= e
k
, b
body
, E
f
)
G, E, S ` def f(x
1
:T
1
, . . . , x
n
:T
n
) J-> T
0
K
?
:b : v, S,
[func-method-def]
The function value captures the environment E in which it is defined. This allows a nested function to refer
to local variables and functions defined in enclosing scopes. The captured environment E is slightly modified
as E
f
, which overrides the mapping for variables explicitly declared as global within the function’s body
using the global declaration. This modification allows a nested function to reference a global variable even
if there exists a local variable defined with the same name in an enclosing scope.
Accessing Class Attributes. The rules for accessing class attributes are relatively simple. Here, we use
the syntax for class definitions described in Section 6.3 to state the necessary assumptions about the object
value being selected from (v
1
):
G, E, S
0
` e : v
1
, S
1
,
v
1
= X(a
1
= l
1
, . . . , id = l
id
, . . . , a
m
= l
m
)
v
2
= S
1
(l
id
)
G, E, S
0
` e.id : v
2
, S
1
,
[attr-read]
32
G, E, S
0
` e
2
: v
r
, S
1
,
G, E, S
1
` e
1
: v
l
, S
2
,
v
l
= X(a
1
= l
1
, . . . , id = l
id
, . . . , a
m
= l
m
)
S
3
= S
2
[v
r
/l
id
]
G, E, S
0
` e
1
.id = e
2
: , S
3
,
[attr-assign-stmt]
In the rule for attribute assignment, the expression on the right-hand side is evaluated before the expression
on the left-hand side.
Object Instantiation. Instead of treating classes as values stored in the state, as for functions, the
reference chooses to rely on the class(·) mapping from Section 6.3.
class(T ) = (a
1
= e
1
, . . . , a
m
= e
m
) m 1
l
a1
, . . . , l
am
= newloc(S, m)
v
0
= T (a
1
= l
ai
, . . . , a
m
= l
am
)
G, G, S ` e
1
: v
1
, S,
G, G, S ` e
2
: v
2
, S,
.
.
.
G, G, S ` e
m
: v
m
, S,
S
1
= S[v
1
/l
a1
] . . . [v
m
/l
am
]
l
init
= l
ai
such that a
i
= init
S
1
(l
init
) = (x
0
, y
1
= e
0
1
, . . . , y
k
= e
0
k
, b
body
, E
f
) k 0
l
x0
, l
y1
, . . . , l
y k
= newloc(S
1
, k + 1)
E
0
= E
f
[l
x0
/x
0
][l
y1
/y
1
] . . . [l
y k
/y
k
]
G, E, S
1
` e
0
1
: v
0
1
, S
1
,
.
.
.
G, E, S
1
` e
0
k
: v
0
k
, S
1
,
S
2
= S
1
[v
0
/l
x0
][v
0
1
/l
y1
] . . . [v
0
k
/l
y k
]
G, E
0
, S
2
` b
body
: , S
3
,
G, E, S ` T () : v
0
, S
3
,
[new]
This rule performs the following operations. First, a new object v
0
with class T is created by allocating
locations for each attribute and method that is defined or inherited by class T . Second, the attribute
initializers and method definitions are evaluated using the global environment; this distinction is important
since method definitions do not capture the environment E in which the object is being constructed. Third,
a new store S
1
is created by modifying the current store S with mappings for the newly allocated attributes
and methods of v
0
. Finally, the init method of the object v
0
is invoked via dynamic dispatch. The steps
required to invoke this method are similar to that of general dynamic dispatch, with the exception that the
init method does not accept any arguments apart from the object on which it is invoked.
List Displays. A list display creates a sequence of new locations in the store, which house the values of
the list elements.
33
n 0
G, E, S
0
` e
1
: v
1
, S
1
,
G, E, S
1
` e
2
: v
2
, S
2
,
.
.
.
G, E, S
n1
` e
n
: v
n
, S
n
,
l
1
, . . . , l
n
= newloc(S
n
, n)
v = [l
1
, l
2
, . . . , l
n
]
S
n+1
= S
n
[v
1
/l
1
][v
2
/l
2
] . . . [v
n
/l
n
]
G, E, S
0
` [ e
1
, e
2
, . . . , e
n
] : v, S
n+1
,
[list-display]
Operations on Lists. The rules for list selection, concatenation, and element update all use the locations
corresponding to list elements and the values they map to in the store.
G, E, S
0
` e
1
: v
1
, S
1
,
G, E, S
1
` e
2
: int(i), S
2
,
v
1
= [l
1
, l
2
, . . . , l
n
]
0 i < n
v
2
= S
2
(l
i+1
)
G, E, S
0
` e
1
[e
2
] : v
2
, S
2
,
[list-select]
G, E, S
0
` e
1
: v
1
, S
1
,
G, E, S
1
` e
2
: v
2
, S
2
,
v
1
= [l
1
, l
2
, . . . , l
n
]
v
2
= [l
0
1
, l
0
2
, . . . , l
0
m
]
n, m 0
l
00
1
, . . . , l
00
m+n
= newloc(S
2
, m + n)
v
3
= [l
00
1
, l
00
2
, . . . , l
00
n+m
]
S
3
= S
2
[S
2
(l
1
)/l
00
1
] . . . [S
2
(l
n
)/l
00
n
][S
2
(l
0
1
)/l
00
n+1
] . . . [S
2
(l
0
m
)/l
00
n+m
]
G, E, S
0
` e
1
+ e
2
: v
3
, S
3
,
[list-concat]
G, E, S
0
` e
3
: v
r
, S
1
,
G, E, S
1
` e
1
: v
l
, S
2
,
G, E, S
2
` e
2
: int(i), S
3
,
v
l
= [l
1
, l
2
, . . . , l
n
]
0 i < n
S
4
= S
3
[v
r
/l
i+1
]
G, E, S
0
` e
1
[e
2
] = e
3
: , S
4
,
[list-assign-stmt]
In the last rule, the expression on the right-hand side of the assignment operator is evaluated before the
expressions on the left-hand side.
Multiple Assignment. We can describe multiple assignments by a kind of rewriting, breaking them down
into a sequence of simple assignments and taking care to evaluate the right-hand side exactly once.
n > 1
l = newloc(S)
Xis a unique identifier not in the program
G, E, S ` e
0
: v, S
0
,
G, E[l/X], S
0
[v/l] ` e
1
= X NEWLINE e
2
= X NEWLINE . . . e
n
= X : , S
00
,
G, E, S ` e
1
= e
2
= · · · = e
n
= e
0
: , S
00
,
[multi-assign-stmt]
34
Predefined Functions. As mentioned in Section 6.2.4, the built-in functions have special bodies that do
not occur in value ChocoPy programs. The rule for function invocation will give rise to assertions involving
these bodies. For those that take an argument, the argument name will be val’. The following rules tell
what the body does once any function argument has been evaluated and assigned to the variable val. The
complete behavior of a call such as len(L) then follows from the rules below plus [invoke].
The predefined functions print and input perform IO.
S(E(val)) = v
v = int(i) or v = bool(b) or v = str(n, s)
G, E, S ` native print : , S, None
[print]
s is the next user-provided input string of length n
v = str(n, s)
G, E, S ` native input : , S, v
[input]
The predefined function len retrieves the length of a list or a string.
S(E(val)) = v
v = [l
1
, l
2
, . . . , l
n
]
n 0
G, E, S ` native len : , S, int(n)
[len-list]
S(E(val)) = v
v = str(n, s)
G, E, S ` native len : , S, int(n)
[len-str]
For Loops. The rules for for loops on lists and strings update the store after each iteration to map the
loop variable’s location to a value corresponding to a list element or substring respectively. In each case, we
have a special version for early termination due to a return from the loop body.
G, E, S
0
` e : v, S
0
0
,
v = [l
1
, l
2
, . . . , l
n
]
n 0
l
id
= E(id)
S
1
= S
0
0
[S
0
0
(l
1
)/l
id
]
G, E, S
1
` b : , S
0
1
,
S
2
= S
0
1
[S
0
1
(l
2
)/l
id
]
G, E, S
2
` b : , S
0
2
,
.
.
.
S
n
= S
0
n1
[S
0
n1
(l
n
)/l
id
]
G, E, S
n
` b : , S
0
n
,
G, E, S
0
` for id in e : b : , S
0
n
,
[for-list]
35
G, E, S
0
` e : v
l
, S
0
0
,
v
l
= [l
1
, l
2
, . . . , l
n
]
n 0
l
id
= E(id)
S
1
= S
0
0
[S
0
0
(l
1
)/l
id
]
G, E, S
1
` b : , S
0
1
,
S
2
= S
0
1
[S
0
1
(l
2
)/l
id
]
G, E, S
2
` b : , S
0
2
,
.
.
.
S
k
= S
0
k1
[S
0
k1
(l
k
)/l
id
]
G, E, S
k
` b :
, S
0
k
, R
1 k n R is not
G, E, S
0
` for id in e : b : , S
0
k
, R
[for-list-return]
G, E, S
0
` e : v, S
0
0
,
v = str(n, c
1
.c
2
. . .c
n
)
n 0
l
id
= E(id)
S
1
= S
0
0
[str(1, c
1
)/l
id
]
G, E, S
1
` b : , S
0
1
,
S
2
= S
0
1
[str(1, c
2
)/l
id
]
G, E, S
2
` b : , S
0
2
,
.
.
.
S
n
= S
0
n1
[str(1, c
n
)/l
id
]
G, E, S
n
` b : , S
0
n
,
G, E, S
0
` for id in e : b : , S
0
n
,
[for-str]
G, E, S
0
` e : v
s
, S
0
0
,
v
s
= str(n, c
1
.c
2
. . .c
n
)
n 0
l
id
= E(id)
S
1
= S
0
0
[str(1, c
1
)/l
id
]
G, E, S
1
` b : , S
0
1
,
S
2
= S
0
1
[str(1, c
2
)/l
id
]
G, E, S
2
` b : , S
0
2
,
.
.
.
S
k
= S
0
k1
[str(1, c
k
)/l
id
]
G, E, S
k
` b : , S
0
k
, R
1 k n R is not
G, E, S
0
` for id in e : b : , S
0
k
, R
[for-str-return]
Programs. Finally, the rule for evaluating a ChocoPy program involves first initializing a global environ-
ment and the initial store with globally defined variables and functions, and then evaluating the sequence of
top-level statements. Let represent an empty mapping. Then, the top-level rule is:
36
g
1
= e
1
, . . . , g
k
= e
k
are the global variable and function definitions in the program
P is the sequence of statements in the program
l
g 1
, . . . , l
g k
= newloc(S
init
, k)
G = G
init
[l
g 1
/g
1
] . . . [l
g k
/g
k
]
G, G, S
init
` e
1
: v
1
, S
init
,
.
.
.
G, G, ` e
k
: v
k
, S
init
,
S = S
init
[v
k
/l
g 1
] . . . [v
k
/l
g k
]
G, G, S ` P : , S
0
,
, , ` P : , S
0
,
[program]
The initial and store G
init
and S
init
hold the predefined functions:
G
init
= [l
len
/len][l
print
/print][l
input
/input]
S
init
(l
print
) = (val, native print, )
S
init
(l
len
) = (val, native len, )
S
init
(l
input
) = (native print, )
When no valid rule can be applied to a given expression, the program aborts after printing an appropriate
error message. Due to the myriad of semantic checks and typing rules enforced at compile-time, the set of
errors that can occur at run-time is limited. The following is the standard set of run-time errors that can
occur during the execution of a ChocoPy program:
1. Invalid argument (during invocation of print or len)
2. Division by zero
3. Index out of bounds (during string selection or list element access)
4. Operation on None (during method dispatch, attribute access, or list operations)
5. Out of memory (when allocating a new object)
The operational semantics do not specify what happens in the event of arithmetic integer overflow. This
manual only specifies semantics for signed integer arithmetic that fits within 32 bits. Overflow is considered
undefined behavior, and implementations may handle such situations in any manner of their choosing.
7 Acknowledgements
ChocoPy is a dialect of Python, version 3.6. The set of Python language features to include in ChocoPy were
influenced by Cool (Classroom Object-Oriented Language), which itself is based on Sather164, a dialect of
the Sather language. This language manual is largely based off the Cool reference manual.
Several language design choices in ChocoPy were refined through discussions with Rohan Bavishi and
Kevin Laeufer. Grant Posner helped review this document and improve its clarity. Countless typos were
identified by the students taking CS164 at UC Berkeley in Fall 2018.
A Known incompatibilities with Python
The following are the known cases where a valid ChocoPy program either does not correspond to a valid
Python program, or has different semantics:
37
Python 3 does not allow forward references to classes that have not yet been defined. For example,
the variable definition x:A = None is invalid in Python if this line appears before the class A has been
completely defined. In ChocoPy, this is valid as long as the class A is defined anywhere in the program.
A compromise is to use quoted type annotations: the syntax x:"A" = None is valid in both ChocoPy
and Python, and has exactly the same meaning. In Python 3.7, forward references can be enabled by
running from
future import annotations
2
.
Since integer overflow leads to undefined behavior in ChocoPy, there is no compatibility with Python
when dealing with integer values less than 2
31
or at least as large as 2
31
.
2
PEP 563 mentions that forward references will be allowed by default in Python 4.0.
38