Writing a code analyzer in TypeScript
The automated mentoring support on exercism is driven by code analyzers. Explore how you can build an analyzer from scratch.
Written by
Derk-Jan KarrenbeldThis is the main author for this article.Published on
updated:Languages
We've used these programming languages.- TypeScript
License
CC BY-NC-SA 4.0This license applies to the textual content of the article. The images might have their own license.
Exercism is an online platform designed to help you improve your coding skills through practice and mentorship.
Exercism provides you with thousands of exercises spread across numerous language tracks. Once you start a language track you are presented with a core set of exercises to complete. Each one is a fun and interesting challenge designed to teach you a little more about the features of a language.
At the moment of writing, I'm a maintainer of the JavaScript and TypeScript track and recently we've been working on automating a part of the experience.
This article will introduce you to AST parsing and walking using ESTree compatible tools. It specifically looks at certain token types, most commonly found in JavaScript and TypeScript code.
It teaches you how to explore these trees yourself and refers to code samples and actual production implementations.
When reading this article, think about your own JavaScript and TypeScript code.
Once you understand how the browser (and tooling, like eslint
)
parses your code, you might better understand how the language is defined and
constructed.
π§ The links to GitHub are all
master
links, meaning the contents may change between writing this article and you clicking them. However, in order to make sure the code samples make sense, the analyzer repository links refer to a specific commit (9ff332b
). This means that the code you see might not be the same as what is used today.
Table of Contents
- Table of Contents
- π The exercise
- π― Optimal solutions
- π©π½βπ» Analysing the code
- The algorithm
- β Automated Mentoring
- π Testing variations
- Walking TypeScript Trees
- Conclusion
- Reference
π The exercise
For this article, I'm going to write the analyzer for the gigasecond
exercise, for both the TypeScript Γ‘nd JavaScript track. The description is a
mere two lines:
Given a moment, determine the moment that would be after a gigasecond has passed.
A gigasecond is
10^9
(1,000,000,000) seconds.
The canonical data hints at the code I need to write, but luckily the exercise is implemented in both the JavaScript and TypeScript tracks.
JavaScript implementation
The JavaScript implementation expects us to write a named export gigasecond
which returns a Date
that is a gigasecond past the input Date
.
// Test case from the JavaScript track
test('tells a gigasecond anniversary since midnight', () => {
const gs = gigasecond(new Date(Date.UTC(2015, 8, 14)));
const expectedDate = new Date(Date.UTC(2047, 4, 23, 1, 46, 40));
expect(gs).toEqual(expectedDate);
});
TypeScript implementation
The TypeScript implementation expects us to write a default export Gigasecond
which is a class that has a date()
function which returns a Date
that is a
gigasecond past the constructor Date
.
// Test case from the TypeScript track
it('tells a gigasecond anniversary since midnight', () => {
const gs = new Gigasecond(new Date(Date.UTC(2015, 8, 14)));
const expectedDate = new Date(Date.UTC(2047, 4, 23, 1, 46, 40));
expect(gs.date()).toEqual(expectedDate);
});
π― Optimal solutions
Before tackling how to write an analyzer for these two implementations, I first have to establish what the optimal solutions are. If I know the intended code result, I can try to recognise that and work from there.
JavaScript solution
The implementation in JavaScript is straightforward. It uses the Date
constructor together with Date#getTime
and a constant to
generate a the appropiate result.
const GIGASECOND_IN_MS = 10 ** 9 * 1000;
export function gigasecond(date) {
return new Date(date.getTime() + GIGASECOND_IN_MS);
}
It is vital to note the perculiarities here:
- Optimal is extracting the
GIGASECOND_IN_MS
value as a top-level constant - The constant's value (
(10 ** 9) * 1000
) can optimally be written in many equally valid forms. Writing the number out, however, is deemed a smell. All the following SHOULD be recognised as optimal:10 ** 12
1e9 * 1e3
1e12
,Math.pow(10, 9) * 1000
Math.pow(10, 12)
Date#valueOf
is not optimal. It is marked as "This method is usually called internally by JavaScript and not explicitly in code.", even though it's functionally equivalent.- Finally,
Date.parse(date)
is not a good candidate as it's supposed to work with strings only. The reason it returns the same value asgetTime
when given a date, is because that date is coerced to a string and then parsed.
TypeScript solution
The TypeScript implementation expects a class
as default export
, and has a
method date()
. The algorithm is exactly the same as in the JavaScript
solution, but it requires type annotations.
const GIGASECOND_IN_MS = 10 ** 9 * 1000;
export default class Gigasecond {
private readonly futureDate: Date;
constructor(date: Readonly<Date>) {
this.futureDate = new Date(date.getTime() + GIGASECOND_IN_MS);
}
date(): Date {
return this.futureDate;
}
}
Apart from the variations and rules as described earlier for JavaScript, the
calculation may be done either in the constructor
(as shown above) or in
the date
function. In that last case, it will look as follows:
const GIGASECOND_IN_MS = 10 ** 9 * 1000;
export default class Gigasecond {
constructor(private readonly date: Date) {}
date(): Date {
return new Date(date.getTime() + GIGASECOND_IN_MS);
}
}
π©π½βπ» Analysing the code
Now it's time to actually write the analyzer. We'll focus on the JavaScript
implementation first. Because there are already JavaScript analyzers running in
the wild and that work is open source, this example will use the utilities and
base class from the javascript-analyzer
repository.
π¬ Abstract Syntax Trees
The JavaScript Analyzer will be working the Abstract Syntax Tree (AST) of the code solutions. There are other ways to write an analyzer, but for the sake of this article, AST parsing is the way to go.
yarn add @typescript-eslint/parser @typescript-eslint/typescript-estree
The TypeScript ESLint team has built a great parser
that outputs an ESTree, a specced format of tokens and information
about the input code. It can work with both JavaScript
and TypeScript
and
is therefore great for our use case. I prefer working with this type of tree,
because of the spec which allows for interoptability with other tools.
The parser
deals with the eslint configuration and then invokes the
typescript-estree
package which compiles the code using TypeScript and
transforms the result to match the ESTree. You can head on over to
AST Explorer and try this out yourself by pasting the
example code from above into the input field and selecting
@typescript-eslint/parser
. β Note: The version of the parser here is
usually not up-to-date with the latest parser.
ππ½βπ¨ Running the parser
Now that the packages are in place, let's parse the input code.
import { parse, TSESTreeOptions } from '@typescript-eslint/typescript-estree';
const options: TSESTreeOptions = {
comment: false,
jsx: false,
};
const program = parse(source, options);
// => Program({ body: [...], sourceType: "module", tokens: [...] })
This gives us the same output as you see in AST Explorer,
with at the root a Program
and its body
. We won't need the other fields, but
tokens
is interesting. It lists the parsed tokens from the source, as it was
building the tree.
π You can find this in
parsers/AstParser.ts
π Finding the main entrypoint
We're looking for a function called gigasecond
. We know the following things:
- It must be
export
ed - Its name must be
gigasecond
The input code declares a function, like in the optimal solution above, so the
tree holds a FunctionDeclaration
with an Identifier
:
export function gigasecond(date) {
// ...
}
{
type: "FunctionDeclaration",
id: {
type: "Identifier",
name: "gigasecond"
},
generator: false,
expression: false,
async: false,
params: [ ... ],
// The body is a block { ... }
body: {
type: "BlockStatement",
body: [ ... ]
}
}
This is something it can search for. The most common way to search in an AST is by walking that tree. You start at some node (usually the root / program) and visit each item.
We know that our parsed EStree
is compatible with eslint
and that eslint
, just like prettier
can
recognise (and transform) code.
# Not a dev dependency!
yarn add eslint
import { TSESTree } from '@typescript-eslint/typescript-estree';
import { traverse } from 'eslint/lib/util/traverser';
traverse(program, {
enter(node: TSESTree.Node) {
// ...
},
});
When writing this code, TypeScript will complain that there are no types for
this library, which is unfortunately still true at moment of writing. However,
you can copy this declarations.d.ts
I wrote in order to
get type completion.
The enter
method will be called for each node in the program. Inside the
enter
block, your in the "TraverserContext" which exposes two methods:
this.skip()
: skips the node from further traversal, meaning it will not visit any other keys (and therefore children) of the current node;this.break()
: completely stop traversing.
Finding the entrypoint is now straightforward.
import { TSESTree, AST_NODE_TYPES } from '@typescript-eslint/typescript-estree';
let entry: TSESTree.FunctionDeclaration | undefined = undefined;
traverse(program, {
enter(node: TSESTree.Node) {
switch (node.type) {
// function name() {}
case AST_NODE_TYPES.FunctionDeclaration:
if (node.id && node.id.name === 'gigasecond') {
entry = node;
this.break();
}
break;
}
},
});
entry;
// => FunctionDeclaration({
// id: { type: "Identifier", name: "gigasecond" }, ...
// })
Unfortunately the walker above only finds FunctionDeclaration
and fails on
equivalent code usign an ArrowFunctionExpression
or FunctionExpression
. More
on that later.
π You can find this in
analyzers/utils/extract_main_method.ts
π Finding the top-level constant
The code finds the first of the two components. Now it also needs to find the
second. A top-level const
, but the name is not known.
const GIGASECOND_IN_MS = 10 ** 9 * 1000;
{
type: "VariableDeclaration",
declarations: [
{
type: "VariableDeclarator",
id: {
type: "Identifier",
name: "GIGASECOND_IN_MS"
},
init: {
type: "BinaryExpression",
operator: "*",
// Left is another BinaryExpression with **
left: { ... },
// Right is a Literal
right: { ... }
}
}
],
kind: "const"
}
Nothing here is particularly helpful. Given the variations of data it needs to
accept, I can't rely on init
being a certain type. The name is also not fixed
as it's not export
ed and therefore not tested.
However, there are a few constraints that will help here:
- It must be a top-level constant
- It can not be named
gigasecond
- In an optimal solution, there is really only one top-level constant that is
not the
entry
,
type FoundConst = { kind: TSESTree.VariableDeclaration['kind'] }
& TSESTree.VariableDeclarator
let bigNumber: FoundConst | undefined = undefined
traverse(program, {
enter(node: TSESTree.Node) {
switch (node.type) {
// const NAME = ...
case AST_NODE_TYPES.VariableDeclaration:
const declaration = node.declarations.find(
(declaration) => declaration.id && declaration.id.name !== 'gigasecond')
)
if (declaration) {
bigNumber = { kind: node.kind, ...declaration }
this.break()
}
break;
default:
// This doesn't declare a variable, so skip the node
this.skip()
}
}
})
Later, I can check bigNumber['kind']
and make sure it's const
, or otherwise
attach a message that const
is preferred.
π You can find this in
analyzers/utils/find_top_level_constants.ts
The algorithm
Now that I found the entry
point, I can figure out what the name of the
argument is (date
). Because I also know the top-level constant, I know what
the constant name is GIGASECOND_IN_MS
.
new Date(...)
Nothing too fancy here. It's a new
expression with an expression as the first
argument.
{
type: "NewExpression",
callee: {
type: "Identifier",
name: "Date"
},
arguments: [ ... ]
}
let newDate: TSESTree.NewExpression | undefined = undefined;
traverse(program, {
enter(node: TSESTree.Node) {
switch (node.type) {
// new Date(...)
case AST_NODE_TYPES.NewExpression:
if (
node.callee.type === AST_NODE_TYPES.Identifier &&
node.callee.name === 'Date'
) {
newDate = node;
this.break();
}
break;
default:
// This doesn't declare a variable, so skip the node
this.skip();
}
},
});
π You can find this in
analyzers/utils/find_new_expression.ts
The inner expression is of type BinaryExpression
. In EStree compatible output,
and operator with two components (such as +
, -
, *
) is a binary
expression, whereas those with one component (such as ~
and !
) are unary
expressions.
date.getTime() + GIGASECOND_IN_MS;
{
type: "BinaryExpression",
operator: "+",
left: {
type: "CallExpression",
callee: {
type: "MemberExpression",
object: {
type: "Identifier",
name: "date"
},
property: {
type: "Identifier",
name: "getTime"
}
},
arguments: []
},
right: {
type: "Identifier",
name: "GIGASECOND_IN_MS"
}
}
Quite a few things we've already seen and also a few new types. Let's look at those.
Properties of objects
When the parser encounters a object property accessor (object.property
), it is
parsed as a MemberExpression
. Depending on the way of writing, the property is
either an Identifier
or a Literal
.
date.getTime;
// ^ ^
// object property
// | |
// | identifier (name = getTime)
// identifier (name = date)
date['getTime'];
// ^ ^
// object property
// | |
// | literal (value = getTime)
// identifier (name = date)
"Executing" a property
If there are parentheses behind the MemberExpression
, the entire expression
is parsed as a child of a CallExpression
. This is also the case for
parentheses following an identifier.
date.getTime();
// ---------| ^ |
// ^ | argument(s) of call expression
// member expression
// |
// -------------|
// call expression
gigasecond(INPUT);
// ------| ^ |
// ^ | argument of call expression
// identifier |
// |
// --------------|
// call expression
Matching the identifiers
There are two identifiers provided by the source code that I need to find and match against:
- the gigasecond first argument (used in
arg.getTime()
) - the top-level constant (used in
time + CONSTANT
)
const argumentName = entry.id.name;
// => "gigasecond"
const constantName = bigNumber.id.name;
// => "GIGASECOND_IN_MS"
let optimalExpression: boolean = false;
// NOTE: passing in the newDate as root, so this is a subtree traversal!
traverse(newDate, {
enter(node: TSESTree.Node) {
switch (node.type) {
// new Date(x.z() + y)
case AST_NODE_TYPES.BinaryExpression:
this.break();
if (node.operator !== '+') {
optimalExpression = false;
return;
}
// This allows the order to be reversed
const leftType = node.left.type;
const constSide =
leftType === AST_NODE_TYPES.Identifier ? node.left : node.right;
const expressionSide =
leftType === AST_NODE_TYPES.CallExpression ? node.left : node.right;
if (constSide === expressionSide) {
// throw new Error("not optimal! this is not x.z() + y")
optimalExpression = false;
return;
}
if (constSide.id.name !== constantName) {
optimalExpression = false;
return;
}
const { object, property } = expressionSide.callee;
optimalExpression =
object.type === AST_NODE_TYPES.Identifier &&
object.name === argumentName &&
((property.type === AST_NODE_TYPES.Identifier &&
property.name === 'getTime') ||
(property.type === AST_NODE_TYPES.Literal &&
property.value === 'getTime'));
break;
}
},
});
π You can find this in:
β Automated Mentoring
When all these pieces are put together, it is the analyzer for gigasecond. There are a few more things to check:
- Is the
bigNumber.kind
equal to"const"
? If not, add a comment - Is the value of
GIGASECOND_IN_MS
using one of the comprehensions? If not, add a comment. - Is there only one argument to
gigasecond
? Make sure it's not a...splat
argument, and that it has novalue = "default"
. - Is
gigasecond
actually exported? Is theexport
inline? if not, add a comment.
Since the first one was mentioned (kind
equality check) and the second one is
very similar to the inner expression of the new Date(...)
call, I've left
out how to implement these. You can check the
gigasecond analyzer source code if you
need some inspiration. The third one is testing the entry
for parameters
.
As for export
s, these are handled by π extract_export
,
but I'll show you the gist of it.
π¦ Testing exports
Exports in JavaScript
and TypeScript
basically come in three types. Those
using a core language feature (read: use a keyword) are the easiest:
export const inline = {}
export default class InlineClass {}
export default defaultSpecifier
export { specifier }
export { specifier as renamed }
The default
export
have their own token type ExportDefaultDeclaration
.
{
type: "ExportDefaultDeclaration",
declaration: {
// ...
}
}
Those without a default
modifier are of type ExportNamedDeclaration
.
{
type: "ExportNamedDeclaration",
declaration: {
// ...
}
}
The declaration
property is where it gets slightly tricky. The inline export
statements, regardless if they're default or not, are followed by the same token
types as if they did not have the export
keyword similarly to how writing
parentheses wraps an expression in a CallExpression
.
Inline exports
This means that the first example is a VariableDeclaration
with a single
VariableDeclaractor
: the id
is an Identifier
with name = "inline"
and the
init
is an ObjectExpression
. Similarly the second example is a
ClassDeclaration
with as id
an Identifier
with name = "InlineClass"
.
Specifier exports
The third has a declaration
of type Identifier
with
name = "defaultSpecifier"
. This is similar to inline
exports
The fourth and fifth however, do not have a declaration
property.
Instead, they have a specifiers
property with, in this case, one item:
{
type: "ExportSpecifier",
local: {
type: "Identifier",
name: "specifier"
}
exported: {
type: "Identifier",
name: "specifier" // or "renamed"
}
}
Use the local
property to determine what is exported (what the internal name
is) and the exported
property how it's imported (what the exported name is).
CommonJS exports
Finally there are those exports that don't use a keyword but instead use the
(as far as I'm concerned) defunct module.exports
.
module.exports = singleExport;
module.exports = { specifier };
module.exports.default = defaultExport;
module.exports.renamed = specifier;
Since these don't use keywords, they're interpreted as ExpressionStatement
s,
as they're AssignmentExpression
s. Here is a quick overview table of the
important properties and representations:
expression | type | prop | value |
---|---|---|---|
module.exports.renamed = specifier |
AssignmentExpression |
||
operator |
"=" |
||
Identifier |
right |
"specifier" |
|
MemberExpression |
left |
π½ module.exports.renamed π½ |
|
module.exports.renamed |
MemberExpression |
||
Identifier |
property |
"renamed" |
|
MemberExpression |
object |
π½ module.exports π½ |
|
module.exports |
MemberExpression |
||
Identifier |
property |
"exports" |
|
Identifier |
object |
"module" |
There is also the variant of using the object['accessor']
, where accessor
is not an Identifier
but a Literal
, but otherwise that is the same.
π Testing variations
As mentioned before, there are many ways to write functions in JavaScript and
TypeScript. In the source code for the analyzer there
is a utility method π extract_main_method
. It can
detect the following variations:
function name() {}
// FunctionDeclaration
const name = () => {};
// ArrowFunctionExpression
const name = function () {};
// FunctionExpression
export default {
name: () => {},
};
// ExportDefaultDeclaration + ObjectExpression + (Arrow)FunctionExpression
And the TypeScript specific variants (but they work on both)
class Foo {
name() {}
}
// MethodDefinition + FunctionExpression
class Foo {
static name = () => {};
static name = function name() {};
}
// ClassProperty + (Arrow)FunctionExpression
Walking TypeScript Trees
As you've noticed, all we did so far is checking the JavaScript code and shown how that is parsed and walked. In order to adapt the solution so that TypeScript code can be parsed with it, there is one thing to change in the walker and a few extra properties to test for.
π Visitor Keys
When the traverser
is walking the tree, it decides which nodes to "walk πΆπ½β"
based on a set of keys called visitor keys
. Since TypeScript
is a superset
of JavaScript
it has the same keys and then some.
import { visitorKeys } from '@typescript-eslint/parser/dist/visitor-keys';
traverse(root, {
enter(node: Node) {
// ...
},
visitorKeys,
});
If you look at the source of that file, you'll see that it
actually imports the eslint
visitor keys (in order
to visit all the JavaScript keys) and adds the specific TypeScript keys.
π Type annotations
These are interesting.
class Gigasecond {
constructor(private readonly date: Date) {}
}
{
type: "ClassDeclaration",
id: {
type: "Identifier",
name: "Gigasecond"
},
// everything inside the class { body }
body: {
type: "ClassBody",
body: [
{
// a constructor is a regular method definition...
type: "MethodDefinition",
key: {
type: "Identifier",
name: "constructor"
},
value: {
type: "FunctionExpression",
params: [{ /*...*/ }],
generator: false,
expression: false,
async: false,
body: { /*...*/ }
}
computed: false,
static: false, // (typescript static keyword)
// ... but with a special kind
kind: 'constructor'
}
]
}
}
The above has no special properties over the JavaScript equivalent, but that's
because there are no type annotations in the source except for in the params
of the constructor
:
{
type: "Identifier",
name: "date",
// : ...
typeAnnotation: {
type: "TSTypeAnnotation",
typeAnnotation: {
// Readonly
type: "TSTypeReference",
typeName: {
type: "Identifier"
name: "Readonly"
},
// <...>
typeParameters: {
type: "TSTypeParameterInstantiation",
// Each type between the < brackets >
params: [
{
type: "TSTypeReference",
typeName: {
type: "Identifier",
name: "Date"
}
}
]
}
}
}
}
A few key observations:
- The type annotation has its own visitor key
typeAnnotation
, - All the TypeScript nodes start with
TS
, - A generic type is just a
TSTypeReference
with both atypeName
and one or multipletypeParameters
. - Stripping types is almost as easy as removing the
typeAnnotation
keys, which is almost exactly what babel'spreset-typescript
does.
Class properties
In TypeScript you can annotate class properties with keywords such as private
and readonly
. Additionally, they can have a type (which is Date
in this
example).
class Gigasecond {
private readonly futureDate: Date;
}
{
type: "ClassProperty",
key: {
type: "Identifier",
name: "futureDate"
},
computed: false,
static: false, // static keyword
readonly: true, // readonly keyword
accessibility: "private", // private, protected, public keywords
typeAnnotation: {
type: "TSTypeAnnotation",
typeAnnotation: {
type: "TSTypeReference",
typeName: {
type: "Identifier",
name: "Date"
}
}
}
}
The TypeScript keywords private
and readonly
modify the ClassProperty
directly, but the type is again on typeAnnotation
. If the type annotation is
left out the source code (read: implicit any
), the typeAnnotation
key is
not present on the AST.
β© Return types
The final type we'll look at for now are function return
types. Most other
type annotations are just variations on this and the ones mentioned before.
class Gigasecond {
date(): Date {
// ...
}
}
{
type: "MethodDefinition",
key: {
type: "Identifier",
name: "date"
},
value: {
type: "FunctionExpression",
generator: false,
expression: false,
async: false,
body: { /*...*/ },
params: [],
returnType: {
type: "TSTypeAnnotation",
typeAnnotation: {
type: "TSTypeReference",
typeName: {
type: "Identifier",
name: "Date"
}
}
}
},
computed: false,
static: false, // static keyword
kind: "method"
}
As you might have noticed, the typeAnnotation
is not on the
MethodDefinition
. That is because a method definition on a class is actually
binding a function expression (): Date { ... }
to the identifier date
.
On the FunctionExpression
you can find the previously not encountered type
annotation returnType
. Its structure is the same as typeAnnotation
s for
ClassProperty
.
Conclusion
Interpreting code as an Abstract Syntax Tree and seeking out certain properties is a lot of tree walking; because there is a specification for the format of the output of certain AST Parsers, you can write tooling yourself.
The contents of this article, in a different format, is being used to
automatically approve the gigasecond
exercise, given that the student has
provided an exact variation of the optimal solution. There is enough surface
to hook into the findings of the analyzer to provide meaningful commentary,
should the student not have provided an optimal solution.
Reference
Analyzer reference
- AstParser
#parse
extractExport
extractMainMethod
findMemberCall
findNewExpression
findTopLevelConstants
isBinaryExpression
isIdentifier
isLiteral
Exercism repositories
problem-specifications/gigasecond
| canonical-datajavascript/gigasecond
typescript/gigasecond
javascript-analyzer/gigasecond
javascript-analyzer