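JS Syntax Tree Study Notes (Complete)
Introduction
The Mozilla JS Parser API began as the specification, written by Mozilla engineers, for the JavaScript AST output by the SpiderMonkey engine in Firefox. As more syntax was added to JavaScript, The ESTree Spec emerged as a community standard for the people who build and use these tools. The difference between the two is that the Parser API describes some behavior specific to the SpiderMonkey engine, whereas ESTree is a community specification that stays backward compatible with the SpiderMonkey format.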
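Parser
A parser generally works in two steps: lexical analysis and syntax analysis. This article uses Acorn@7.2.0 as the JavaScript parser and the following JS code as the running example: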
const href = 'https://vincent0700.com'
Lexical analysis
Lexical analysis converts the code into a stream of tokens. For the example above, the result looks roughly like this:
[
  Token {
    type: TokenType { label: 'const', keyword: 'const', ... },
    value: 'const', ...
  },
  Token {
    type: TokenType { label: 'name', keyword: undefined, ... },
    value: 'href', ...
  },
  Token {
    type: TokenType { label: '=', keyword: undefined, ... },
    value: '=', ...
  },
  Token {
    type: TokenType { label: 'string', keyword: undefined, ... },
    value: 'https://vincent0700.com', ...
  },
  Token {
    type: TokenType { label: 'eof', keyword: undefined, ... },
    value: undefined, ...
  }
]
The Token data structure:
class Token {
  type: TokenType
  value: any
  start: number
  end: number
  loc?: SourceLocation
  range?: [number, number]
}
The TokenType data structure:
class TokenType {
  label: string
  keyword: string
  beforeExpr: boolean
  startsExpr: boolean
  isLoop: boolean
  isAssign: boolean
  prefix: boolean
  postfix: boolean
  binop: number
  updateContext?: (prevType: TokenType) => void
}
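To make the lexical step concrete, here is a minimal tokenizer sketch. It is not Acorn's implementation: the `tokenize` function and the flat `{ label, value, start, end }` token shape are assumptions made for illustration, and it only understands a few keywords, identifiers, single-quoted strings without escapes, and `=`.

```javascript
// Simplified tokenizer sketch (hypothetical, NOT how Acorn is implemented).
const KEYWORDS = new Set(['const', 'let', 'var']);

function tokenize(input) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    const ch = input[pos];
    if (/\s/.test(ch)) { pos++; continue; } // skip whitespace
    const start = pos;
    if (/[A-Za-z_$]/.test(ch)) {
      // Keyword or identifier: consume word characters.
      while (pos < input.length && /[\w$]/.test(input[pos])) pos++;
      const word = input.slice(start, pos);
      tokens.push({ label: KEYWORDS.has(word) ? word : 'name', value: word, start, end: pos });
    } else if (ch === "'") {
      // Single-quoted string; escape sequences are not handled here.
      pos++;
      while (pos < input.length && input[pos] !== "'") pos++;
      if (pos >= input.length) throw new SyntaxError('Unterminated string');
      pos++; // consume the closing quote
      tokens.push({ label: 'string', value: input.slice(start + 1, pos - 1), start, end: pos });
    } else if (ch === '=') {
      pos++;
      tokens.push({ label: '=', value: '=', start, end: pos });
    } else {
      throw new SyntaxError(`Unexpected character "${ch}" at ${pos}`);
    }
  }
  tokens.push({ label: 'eof', value: undefined, start: pos, end: pos });
  return tokens;
}

const tokens = tokenize("const href = 'https://vincent0700.com'");
console.log(tokens.map((t) => t.label)); // [ 'const', 'name', '=', 'string', 'eof' ]
```

The labels match the token stream listed above; a real tokenizer additionally tracks context (for example, whether `/` starts a regular expression), which is what fields such as `beforeExpr` and `updateContext` are for.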
Syntax analysis
Syntax analysis converts the token stream produced by lexical analysis into an AST. The result looks roughly like this:
Node {
  type: 'Program',
  sourceType: 'script',
  body: [
    Node {
      type: 'VariableDeclaration',
      kind: 'const',
      declarations: [
        Node {
          type: 'VariableDeclarator',
          id: Node { type: 'Identifier', name: 'href' },
          init: Node { type: 'Literal', value: 'https://vincent0700.com' }
        }
      ]
    }
  ]
}
Every node in the AST is an instance of Node, whose data structure is:
class Node {
  type: string
  start: number
  end: number
  loc?: SourceLocation
  sourceFile?: string
  range?: [number, number]
}
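Because every node carries a string `type`, a generic traversal needs no knowledge of individual node kinds. The sketch below walks a hand-written copy of the AST for the example declaration (positions omitted); `isNode` and `walk` are hypothetical helper names, not part of Acorn.

```javascript
// Hand-written copy of the example AST (start/end positions omitted).
const ast = {
  type: 'Program',
  sourceType: 'script',
  body: [{
    type: 'VariableDeclaration',
    kind: 'const',
    declarations: [{
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name: 'href' },
      init: { type: 'Literal', value: 'https://vincent0700.com' },
    }],
  }],
};

// Treat any object with a string `type` field as an AST node.
function isNode(value) {
  return value !== null && typeof value === 'object' && typeof value.type === 'string';
}

// Depth-first, pre-order traversal over every child node.
function walk(node, visit) {
  visit(node);
  for (const key of Object.keys(node)) {
    const child = node[key];
    if (Array.isArray(child)) {
      child.filter(isNode).forEach((c) => walk(c, visit));
    } else if (isNode(child)) {
      walk(child, visit);
    }
  }
}

const visited = [];
walk(ast, (node) => visited.push(node.type));
console.log(visited.join(' > '));
// Program > VariableDeclaration > VariableDeclarator > Identifier > Literal
```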
ES5
Nodes fall broadly into the following categories:
Program (root node)
interface Program <: Node {
  type: "Program";
  body: [ Statement ];
}
The top of the AST. body contains multiple Statement nodes.
Identifier
interface Identifier <: Expression, Pattern {
  type: "Identifier";
  name: string;
}
A user-defined name, such as a variable name, function name, or property name.
Literal
interface Literal <: Expression {
  type: "Literal";
  value: string | boolean | null | number | RegExp;
}
As the type of value shows, a literal is simply a value: a string, boolean, number, null, or regular expression.
Statement
interface Statement <: Node { }
As the root node shows, an AST is built from an array of Statements. I consider Statement the broadest concept in the AST after Program, and the various JS syntaxes all branch out from it:
- Empty statement ";"
interface EmptyStatement <: Statement { type: "EmptyStatement"; }
- Debugger statement "debugger;"
interface DebuggerStatement <: Statement { type: "DebuggerStatement"; }
- Expression statement "1 + 1;"
interface ExpressionStatement <: Statement { type: "ExpressionStatement"; expression: Expression; }
- Block statement "{[body]}"
interface BlockStatement <: Statement { type: "BlockStatement"; body: [ Statement ]; }
- With statement "with ([object]) {[body]}"
interface WithStatement <: Statement { type: "WithStatement"; object: Expression; body: Statement; }
- Control flow statements
  - Return statement "return [argument]"
  interface ReturnStatement <: Statement { type: "ReturnStatement"; argument: Expression | null; }
  - Labeled statement "loop: ... break loop;"
  interface LabeledStatement <: Statement { type: "LabeledStatement"; label: Identifier; body: Statement; }
  - Break statement "break [label?];"
  interface BreakStatement <: Statement { type: "BreakStatement"; label: Identifier | null; }
  - Continue statement "continue [label?];"
  interface ContinueStatement <: Statement { type: "ContinueStatement"; label: Identifier | null; }
- Conditional statements
  - If statement "if ([test]) {[consequent]} else {[alternate]}"
  interface IfStatement <: Statement { type: "IfStatement"; test: Expression; consequent: Statement; alternate: Statement | null; }
  - Switch statement "switch ([discriminant]) {[cases]}"
  interface SwitchStatement <: Statement { type: "SwitchStatement"; discriminant: Expression; cases: [ SwitchCase ]; }
    - SwitchCase node "case [test]: [consequent]" (test is null for the default clause)
    interface SwitchCase <: Node { type: "SwitchCase"; test: Expression | null; consequent: [ Statement ]; }
- Exception statements
  - Throw statement "throw [argument]"
  interface ThrowStatement <: Statement { type: "ThrowStatement"; argument: Expression; }
  - Try statement "try {[block]} catch {[handler]} finally {[finalizer]}"
  interface TryStatement <: Statement { type: "TryStatement"; block: BlockStatement; handler: CatchClause | null; finalizer: BlockStatement | null; }
    - Catch node
    interface CatchClause <: Node { type: "CatchClause"; param: Pattern; body: BlockStatement; }
- Loop statements
  - While statement "while ([test]) {[body]}"
  interface WhileStatement <: Statement { type: "WhileStatement"; test: Expression; body: Statement; }
  - DoWhile statement "do {[body]} while ([test])"
  interface DoWhileStatement <: Statement { type: "DoWhileStatement"; body: Statement; test: Expression; }
  - For statement "for ([init];[test];[update]) {[body]}"
  interface ForStatement <: Statement { type: "ForStatement"; init: VariableDeclaration | Expression | null; test: Expression | null; update: Expression | null; body: Statement; }
  - ForIn statement "for ([left] in [right]) {[body]}"
  interface ForInStatement <: Statement { type: "ForInStatement"; left: VariableDeclaration | Pattern; right: Expression; body: Statement; }
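One way to check your reading of the statement interfaces above is to turn a node back into source text. The following is a minimal sketch of such a printer for a handful of statement types; `printStatement` and `printExpression` are hypothetical helpers, and only Identifier and non-RegExp Literal expressions are supported.

```javascript
// Expressions: this sketch only handles Identifier and simple Literal nodes.
function printExpression(node) {
  switch (node.type) {
    case 'Identifier': return node.name;
    case 'Literal': return JSON.stringify(node.value); // RegExp literals are not handled
    default: throw new Error(`Unsupported expression: ${node.type}`);
  }
}

// Statements: each case reads exactly the fields named in the interfaces above.
function printStatement(node) {
  switch (node.type) {
    case 'EmptyStatement': return ';';
    case 'ExpressionStatement': return `${printExpression(node.expression)};`;
    case 'BlockStatement': return `{ ${node.body.map(printStatement).join(' ')} }`;
    case 'ReturnStatement':
      return node.argument ? `return ${printExpression(node.argument)};` : 'return;';
    case 'BreakStatement':
      return node.label ? `break ${node.label.name};` : 'break;';
    case 'IfStatement': {
      const head = `if (${printExpression(node.test)}) ${printStatement(node.consequent)}`;
      return node.alternate ? `${head} else ${printStatement(node.alternate)}` : head;
    }
    case 'WhileStatement':
      return `while (${printExpression(node.test)}) ${printStatement(node.body)}`;
    default: throw new Error(`Unsupported statement: ${node.type}`);
  }
}

// Hand-written AST for: if (ok) { return 1; } else return;
const ifNode = {
  type: 'IfStatement',
  test: { type: 'Identifier', name: 'ok' },
  consequent: {
    type: 'BlockStatement',
    body: [{ type: 'ReturnStatement', argument: { type: 'Literal', value: 1 } }],
  },
  alternate: { type: 'ReturnStatement', argument: null },
};

console.log(printStatement(ifNode)); // if (ok) { return 1; } else return;
```

Note how the nullable fields (`argument`, `label`, `alternate`) each need an explicit branch; that is the practical meaning of `| null` in the interfaces.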
Declaration
interface Declaration <: Statement { }
A declaration node is also a statement; it is simply a more specific subtype.
- Function declaration "function [id] ([params]) {[body]}"
interface FunctionDeclaration <: Function, Declaration { type: "FunctionDeclaration"; id: Identifier; }
  - Function
  interface Function <: Node { id: Identifier | null; params: [ Pattern ]; body: FunctionBody; }
- Variable declaration "var a = 10;"
interface VariableDeclaration <: Declaration { type: "VariableDeclaration"; declarations: [ VariableDeclarator ]; kind: "var"; }
  - Variable declarator
  interface VariableDeclarator <: Node { type: "VariableDeclarator"; id: Pattern; init: Expression | null; }
Expression
interface Expression <: Node { }
- This expression "this"
interface ThisExpression <: Expression { type: "ThisExpression"; }
- Array expression "[1, 2, 3]"
interface ArrayExpression <: Expression { type: "ArrayExpression"; elements: [ Expression | null ]; }
- Object expression "{ a: 1 }"
interface ObjectExpression <: Expression { type: "ObjectExpression"; properties: [ Property ]; }
  - Property node
  interface Property <: Node { type: "Property"; key: Literal | Identifier; value: Expression; kind: "init" | "get" | "set"; }
- Function expression "function ([params]) {[body]}"
interface FunctionExpression <: Function, Expression { type: "FunctionExpression"; }
- Unary operations
  - Unary expression
  interface UnaryExpression <: Expression { type: "UnaryExpression"; operator: UnaryOperator; prefix: boolean; argument: Expression; }
    - Unary operator "typeof a"
    enum UnaryOperator { "-" | "+" | "!" | "~" | "typeof" | "void" | "delete" }
  - Update expression "a++" "--a"
  interface UpdateExpression <: Expression { type: "UpdateExpression"; operator: UpdateOperator; argument: Expression; prefix: boolean; }
    - Update operator
    enum UpdateOperator { "++" | "--" }
- Binary operations
  - Binary expression "a > b"
  interface BinaryExpression <: Expression { type: "BinaryExpression"; operator: BinaryOperator; left: Expression; right: Expression; }
    - Binary operator
    enum BinaryOperator { "==" | "!=" | "===" | "!==" | "<" | "<=" | ">" | ">=" | "<<" | ">>" | ">>>" | "+" | "-" | "*" | "/" | "%" | "|" | "^" | "&" | "in" | "instanceof" }
- Assignment expression "a = 1"
interface AssignmentExpression <: Expression { type: "AssignmentExpression"; operator: AssignmentOperator; left: Pattern | Expression; right: Expression; }
  - Assignment operator
  enum AssignmentOperator { "=" | "+=" | "-=" | "*=" | "/=" | "%=" | "<<=" | ">>=" | ">>>=" | "|=" | "^=" | "&=" }
- Logical expression "a && b"
interface LogicalExpression <: Expression { type: "LogicalExpression"; operator: LogicalOperator; left: Expression; right: Expression; }
  - Logical operator
  enum LogicalOperator { "||" | "&&" }
- Member expression "a.b"
interface MemberExpression <: Expression, Pattern { type: "MemberExpression"; object: Expression; property: Expression; computed: boolean; }
- Conditional expression "a > b ? c : d"
interface ConditionalExpression <: Expression { type: "ConditionalExpression"; test: Expression; alternate: Expression; consequent: Expression; }
- Call expression "func(1, 2)"
interface CallExpression <: Expression { type: "CallExpression"; callee: Expression; arguments: [ Expression ]; }
- New expression "new Date()"
interface NewExpression <: Expression { type: "NewExpression"; callee: Expression; arguments: [ Expression ]; }
- Sequence expression "1,2,3"
interface SequenceExpression <: Expression { type: "SequenceExpression"; expressions: [ Expression ]; }
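The expression interfaces map naturally onto a recursive evaluator. Below is a minimal sketch covering only a few node types and operators; `evaluate`, the `scope` object, and the `id`/`bin` builders are hypothetical names introduced for this example.

```javascript
// Evaluate a small subset of ES5 expression nodes against a plain-object scope.
function evaluate(node, scope = {}) {
  switch (node.type) {
    case 'Literal':
      return node.value;
    case 'Identifier':
      if (!Object.prototype.hasOwnProperty.call(scope, node.name)) {
        throw new ReferenceError(`${node.name} is not defined`);
      }
      return scope[node.name];
    case 'UnaryExpression': {
      const value = evaluate(node.argument, scope);
      switch (node.operator) {
        case '-': return -value;
        case '+': return +value;
        case '!': return !value;
        default: throw new Error(`Unsupported unary operator: ${node.operator}`);
      }
    }
    case 'BinaryExpression': {
      const left = evaluate(node.left, scope);
      const right = evaluate(node.right, scope);
      switch (node.operator) {
        case '+': return left + right;
        case '-': return left - right;
        case '*': return left * right;
        case '/': return left / right;
        case '>': return left > right;
        case '<': return left < right;
        case '===': return left === right;
        default: throw new Error(`Unsupported binary operator: ${node.operator}`);
      }
    }
    case 'LogicalExpression': {
      // Logical operators short-circuit, so the right side is evaluated lazily.
      const left = evaluate(node.left, scope);
      if (node.operator === '&&') return left ? evaluate(node.right, scope) : left;
      if (node.operator === '||') return left ? left : evaluate(node.right, scope);
      throw new Error(`Unsupported logical operator: ${node.operator}`);
    }
    case 'ConditionalExpression':
      return evaluate(node.test, scope)
        ? evaluate(node.consequent, scope)
        : evaluate(node.alternate, scope);
    default:
      throw new Error(`Unsupported expression: ${node.type}`);
  }
}

// Hand-written AST for: a > b ? a - b : b - a
const id = (name) => ({ type: 'Identifier', name });
const bin = (operator, left, right) => ({ type: 'BinaryExpression', operator, left, right });
const expr = {
  type: 'ConditionalExpression',
  test: bin('>', id('a'), id('b')),
  consequent: bin('-', id('a'), id('b')),
  alternate: bin('-', id('b'), id('a')),
};

console.log(evaluate(expr, { a: 3, b: 10 })); // 7
```

The split between BinaryExpression and LogicalExpression is visible here: binary operators always evaluate both operands, while logical operators may skip the right one.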
Patterns
interface Pattern <: Node { }
Patterns matter mainly for ES6 destructuring assignment; in ES5, a Pattern can be thought of as roughly the same thing as an Identifier.
ES2015
Program (root node)
extend interface Program {
  sourceType: "script" | "module";
  body: [ Statement | ModuleDeclaration ];
}
If the source is an ES6 module, sourceType must be "module"; otherwise it must be "script".
Function
extend interface Function {
  generator: boolean;
}
Adds support for generator functions.
Statement
- ForOf statement "for (let [left] of [right])"
interface ForOfStatement <: ForInStatement { type: "ForOfStatement"; }
Declaration
- Variable declaration
extend interface VariableDeclaration { kind: "var" | "let" | "const"; }
Expression
- Super expression "super([arguments])"
interface Super <: Node { type: "Super"; }
extend interface CallExpression { callee: Expression | Super; }
extend interface MemberExpression { object: Expression | Super; }
- Spread expression "[head, ...iter]"
interface SpreadElement <: Node { type: "SpreadElement"; argument: Expression; }
extend interface ArrayExpression { elements: [ Expression | SpreadElement | null ]; }
extend interface CallExpression { arguments: [ Expression | SpreadElement ]; }
extend interface NewExpression { arguments: [ Expression | SpreadElement ]; }
- Arrow function expression "() => {[body]}"
interface ArrowFunctionExpression <: Function, Expression { type: "ArrowFunctionExpression"; body: FunctionBody | Expression; expression: boolean; }
- Yield expression "yield [argument]"
interface YieldExpression <: Expression { type: "YieldExpression"; argument: Expression | null; delegate: boolean; }
- Template literal "`Hello ${name}`"
interface TemplateLiteral <: Expression { type: "TemplateLiteral"; quasis: [ TemplateElement ]; expressions: [ Expression ]; }
  - Template element
  interface TemplateElement <: Node { type: "TemplateElement"; tail: boolean; value: { cooked: string; raw: string; }; }
- Tagged template expression (see MDN)
interface TaggedTemplateExpression <: Expression { type: "TaggedTemplateExpression"; tag: Expression; quasi: TemplateLiteral; }
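A TemplateLiteral stores its static text in `quasis` and its placeholders in `expressions`, with one more quasi than expression, so the two arrays interleave. The sketch below rebuilds the string for a hand-written AST of `Hello ${name}`; `renderTemplate` is a hypothetical helper that only resolves Identifier placeholders from a plain-object scope.

```javascript
// Hand-written AST for `Hello ${name}` (positions omitted).
const template = {
  type: 'TemplateLiteral',
  quasis: [
    { type: 'TemplateElement', tail: false, value: { cooked: 'Hello ', raw: 'Hello ' } },
    { type: 'TemplateElement', tail: true, value: { cooked: '', raw: '' } },
  ],
  expressions: [{ type: 'Identifier', name: 'name' }],
};

// Interleave quasis[i] with expressions[i]; the tail quasi has no expression after it.
function renderTemplate(node, scope) {
  let output = '';
  node.quasis.forEach((quasi, index) => {
    output += quasi.value.cooked;
    if (!quasi.tail) {
      const expression = node.expressions[index];
      if (expression.type !== 'Identifier') {
        throw new Error('This sketch only handles Identifier placeholders');
      }
      output += String(scope[expression.name]);
    }
  });
  return output;
}

console.log(renderTemplate(template, { name: 'AST' })); // Hello AST
```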
Pattern
Mainly related to destructuring assignment.
- ObjectPattern "const { a, b: c } = { a: 1, b: { c: 2 } }"
interface AssignmentProperty <: Property {
  type: "Property"; // inherited
  value: Pattern;
  kind: "init";
  method: false;
}
interface ObjectPattern <: Pattern { type: "ObjectPattern"; properties: [ AssignmentProperty ]; }
- ArrayPattern "[a, b] = [1, 2]"
interface ArrayPattern <: Pattern { type: "ArrayPattern"; elements: [ Pattern | null ]; }
- RestElement "fun(...args){}"
interface RestElement <: Pattern { type: "RestElement"; argument: Pattern; }
- AssignmentPattern "fun(a=10){}"
interface AssignmentPattern <: Pattern { type: "AssignmentPattern"; left: Pattern; right: Expression; }
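Since patterns nest arbitrarily, tools that need the variables introduced by a declaration usually recurse over them. Here is a minimal sketch of that idea; `boundNames` and the `id`/`prop` builders are hypothetical helpers, and MemberExpression targets are deliberately left unsupported.

```javascript
// Collect every identifier name bound by a (possibly nested) pattern.
function boundNames(pattern, names = []) {
  switch (pattern.type) {
    case 'Identifier':
      names.push(pattern.name);
      break;
    case 'ObjectPattern':
      for (const property of pattern.properties) {
        // From ES2018 a property may be a RestElement; otherwise use its value.
        boundNames(property.type === 'RestElement' ? property : property.value, names);
      }
      break;
    case 'ArrayPattern':
      for (const element of pattern.elements) {
        if (element !== null) boundNames(element, names); // holes are null
      }
      break;
    case 'RestElement':
      boundNames(pattern.argument, names);
      break;
    case 'AssignmentPattern':
      boundNames(pattern.left, names); // the default value binds nothing
      break;
    default:
      throw new Error(`Unsupported pattern: ${pattern.type}`);
  }
  return names;
}

// Hand-written pattern for: const { a, b: [c, ...d], e = 1 } = obj
const id = (name) => ({ type: 'Identifier', name });
const prop = (key, value) => ({ type: 'Property', key: id(key), value, kind: 'init', method: false });
const pattern = {
  type: 'ObjectPattern',
  properties: [
    prop('a', id('a')),
    prop('b', { type: 'ArrayPattern', elements: [id('c'), { type: 'RestElement', argument: id('d') }] }),
    prop('e', { type: 'AssignmentPattern', left: id('e'), right: { type: 'Literal', value: 1 } }),
  ],
};

console.log(boundNames(pattern)); // [ 'a', 'c', 'd', 'e' ]
```

Note that `b` does not appear in the result: in `b: [c, ...d]` it is only the property key, not a binding.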
Class
interface Class <: Node {
  id: Identifier | null;
  superClass: Expression | null;
  body: ClassBody;
}
- Class body
interface ClassBody <: Node { type: "ClassBody"; body: [ MethodDefinition ]; }
- Method definition
interface MethodDefinition <: Node { type: "MethodDefinition"; key: Expression; value: FunctionExpression; kind: "constructor" | "method" | "get" | "set"; computed: boolean; static: boolean; }
- Class declaration "class [name] [extends] {[body]}"
interface ClassDeclaration <: Class, Declaration { type: "ClassDeclaration"; id: Identifier; }
- Class expression "const A = class [name] [extends] {[body]}"
interface ClassExpression <: Class, Expression { type: "ClassExpression"; }
- Meta property "new.target"
interface MetaProperty <: Expression { type: "MetaProperty"; meta: Identifier; property: Identifier; }
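To see how `kind`, `static`, and `computed` combine on MethodDefinition, the sketch below summarizes the members of a hand-written class AST. `describeClass` and the `method` builder are hypothetical helpers for illustration only.

```javascript
// Hand-written AST for:
//   class Point { constructor() {} static origin() {} get length() {} }
const emptyFunction = {
  type: 'FunctionExpression',
  id: null,
  params: [],
  body: { type: 'BlockStatement', body: [] },
};
const method = (name, kind, isStatic = false) => ({
  type: 'MethodDefinition',
  key: { type: 'Identifier', name },
  value: emptyFunction,
  kind,
  computed: false,
  static: isStatic,
});
const classNode = {
  type: 'ClassDeclaration',
  id: { type: 'Identifier', name: 'Point' },
  superClass: null,
  body: {
    type: 'ClassBody',
    body: [method('constructor', 'constructor'), method('origin', 'method', true), method('length', 'get')],
  },
};

// Produce one short description per member of the class body.
function describeClass(node) {
  return node.body.body.map((member) => {
    if (member.computed) return `${member.kind} [computed key]`;
    // A non-computed key is either an Identifier or a Literal.
    const name = member.key.type === 'Identifier' ? member.key.name : String(member.key.value);
    return `${member.static ? 'static ' : ''}${member.kind} ${name}`;
  });
}

console.log(describeClass(classNode).join('; '));
// constructor constructor; static method origin; get length
```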
Module
- Module declaration
interface ModuleDeclaration <: Node { }
- Module specifier
interface ModuleSpecifier <: Node { local: Identifier; }
- Import
  - Import declaration "import foo from 'mod'"
  interface ImportDeclaration <: ModuleDeclaration { type: "ImportDeclaration"; specifiers: [ ImportSpecifier | ImportDefaultSpecifier | ImportNamespaceSpecifier ]; source: Literal; }
  - Import specifier "import { foo as a } from 'mod'"
  interface ImportSpecifier <: ModuleSpecifier { type: "ImportSpecifier"; imported: Identifier; }
  - Default import specifier "import foo from 'mod'"
  interface ImportDefaultSpecifier <: ModuleSpecifier { type: "ImportDefaultSpecifier"; }
  - Namespace import specifier "import * as foo from 'mod'"
  interface ImportNamespaceSpecifier <: ModuleSpecifier { type: "ImportNamespaceSpecifier"; }
- Exports
  - Named export declaration "export { foo, bar }" "export var foo = 1"
  interface ExportNamedDeclaration <: ModuleDeclaration { type: "ExportNamedDeclaration"; declaration: Declaration | null; specifiers: [ ExportSpecifier ]; source: Literal | null; }
  - Export specifier "export { foo }" "export { foo as bar }"
  interface ExportSpecifier <: ModuleSpecifier { type: "ExportSpecifier"; exported: Identifier; }
  - Default export declaration "export default foo"
  interface AnonymousDefaultExportedFunctionDeclaration <: Function { type: "FunctionDeclaration"; id: null; }
  interface AnonymousDefaultExportedClassDeclaration <: Class { type: "ClassDeclaration"; id: null; }
  interface ExportDefaultDeclaration <: ModuleDeclaration { type: "ExportDefaultDeclaration"; declaration: AnonymousDefaultExportedFunctionDeclaration | FunctionDeclaration | AnonymousDefaultExportedClassDeclaration | ClassDeclaration | Expression; }
  - Export-all declaration "export * from 'mod'"
  interface ExportAllDeclaration <: ModuleDeclaration { type: "ExportAllDeclaration"; source: Literal; }
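The three import specifier types differ only in where the imported name comes from, while `local` is always the binding created in the importing module. The following sketch normalizes them for a hand-written ImportDeclaration; `importedBindings` is a hypothetical helper name.

```javascript
// Hand-written AST for: import def, { foo as a, bar } from 'mod'
const id = (name) => ({ type: 'Identifier', name });
const importNode = {
  type: 'ImportDeclaration',
  specifiers: [
    { type: 'ImportDefaultSpecifier', local: id('def') },
    { type: 'ImportSpecifier', local: id('a'), imported: id('foo') },
    { type: 'ImportSpecifier', local: id('bar'), imported: id('bar') },
  ],
  source: { type: 'Literal', value: 'mod' },
};

// Normalize every specifier to { imported, local }.
function importedBindings(declaration) {
  return declaration.specifiers.map((specifier) => {
    switch (specifier.type) {
      case 'ImportDefaultSpecifier':
        return { imported: 'default', local: specifier.local.name };
      case 'ImportNamespaceSpecifier':
        return { imported: '*', local: specifier.local.name };
      case 'ImportSpecifier':
        return { imported: specifier.imported.name, local: specifier.local.name };
      default:
        throw new Error(`Unsupported specifier: ${specifier.type}`);
    }
  });
}

const summary = importedBindings(importNode)
  .map((binding) => `${binding.imported} as ${binding.local}`)
  .join(', ');
console.log(`${summary} from ${importNode.source.value}`);
// default as def, foo as a, bar as bar from mod
```

Even the shorthand `{ bar }` carries both `imported` and `local`; they simply hold the same name.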
ES2016
Adds the binary operator **
extend enum BinaryOperator {
  "**"
}
Adds the assignment operator **=
extend enum AssignmentOperator {
  "**="
}
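Unlike the other arithmetic operators, `**` is right-associative, so in the AST for `2 ** 3 ** 2` the nested BinaryExpression sits in `right`. A small sketch, with hypothetical `lit`, `bin`, and `evaluatePower` helpers, shows the shape:

```javascript
const lit = (value) => ({ type: 'Literal', value });
const bin = (operator, left, right) => ({ type: 'BinaryExpression', operator, left, right });

// Hand-written AST for 2 ** 3 ** 2, which groups as 2 ** (3 ** 2).
const powerAst = bin('**', lit(2), bin('**', lit(3), lit(2)));

// Evaluate a tree made only of Literal and "**" BinaryExpression nodes.
function evaluatePower(node) {
  if (node.type === 'Literal') return node.value;
  if (node.type === 'BinaryExpression' && node.operator === '**') {
    return evaluatePower(node.left) ** evaluatePower(node.right);
  }
  throw new Error(`Unsupported node: ${node.type}`);
}

console.log(evaluatePower(powerAst), 2 ** 3 ** 2); // 512 512
```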
ES2017
async/await
extend interface Function {
  async: boolean;
}
interface AwaitExpression <: Expression {
  type: "AwaitExpression";
  argument: Expression;
}
ES2018
Asynchronous iteration: for-await-of
extend interface ForOfStatement {
  await: boolean;
}
for await (const x of xs) {}
Rest/Spread for objects
extend interface ObjectExpression {
  properties: [ Property | SpreadElement ];
}
extend interface ObjectPattern {
  properties: [ AssignmentProperty | RestElement ];
}
ES2015 introduced rest and spread syntax, but only for arrays and function arguments; ES2018 added support for objects.
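Invalid escape sequences
extend interface TemplateElement {
  value: {
    cooked: string | null;
    raw: string;
  };
}
ES2018 lifts the syntactic restriction on escape sequences in tagged template strings. When a template element of a tagged template contains an invalid escape, cooked is null and only raw is available.
Previously, \u always started a Unicode escape, \x a hexadecimal escape, and \ followed by a digit an octal escape, so any text that merely resembled one of these was a syntax error. This made it impossible to create certain strings; see MDN for more details.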
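The runtime behavior behind this change can be observed directly with a tag function. The sketch below assumes an engine with ES2018 support; `inspect` is a hypothetical tag. Note that the runtime reports the missing cooked value as `undefined`, whereas ESTree represents it as `null`.

```javascript
// A tag function receives the cooked strings, with the raw text on `strings.raw`.
function inspect(strings) {
  return { cooked: strings[0], raw: strings.raw[0] };
}

// "\u" followed by "nicode" is not a valid Unicode escape.
const invalid = inspect`\unicode`;
console.log(invalid.cooked); // undefined
console.log(invalid.raw);    // \unicode

// A valid escape still yields both forms.
const valid = inspect`\u0041`;
console.log(valid.cooked, valid.raw); // A \u0041
```

The relaxation applies only to tagged templates; an untagged template literal containing the same text is still a syntax error.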
ES2019
The catch binding may be omitted
extend interface CatchClause {
  param: Pattern | null;
}
try { } catch { }
ES2020
BigInt literals
extend interface Literal <: Expression {
  type: "Literal";
  value: string | boolean | null | number | RegExp | bigint;
}
interface BigIntLiteral <: Literal {
  bigint: string;
}
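One practical use of the string `bigint` field is serialization: `JSON.stringify` throws a TypeError when it meets a BigInt value, so the digits have to travel as text. The sketch below shows one possible approach under that assumption; `serializeNode` is a hypothetical helper and the node is hand-written.

```javascript
// Hand-written node for the literal 9007199254740993n.
const bigIntNode = {
  type: 'Literal',
  value: 9007199254740993n,
  bigint: '9007199254740993',
};

// Replace BigInt values with null so the node can be serialized;
// the exact digits survive in the `bigint` string.
function serializeNode(node) {
  return JSON.stringify(node, (key, value) => (typeof value === 'bigint' ? null : value));
}

const restored = JSON.parse(serializeNode(bigIntNode));
console.log(restored.value, BigInt(restored.bigint) === bigIntNode.value); // null true
```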
Nullish coalescing operator (??)
extend enum LogicalOperator {
  "||" | "&&" | "??"
}
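Because `??` is a LogicalOperator, it shares the LogicalExpression node with `||` and `&&` and differs only in which left-hand values fall through to the right. A minimal sketch with Literal operands only (`evaluateLogical` and `logical` are hypothetical helpers):

```javascript
// Build a LogicalExpression whose operands are both Literals.
const logical = (operator, left, right) => ({
  type: 'LogicalExpression',
  operator,
  left: { type: 'Literal', value: left },
  right: { type: 'Literal', value: right },
});

function evaluateLogical(node) {
  const left = node.left.value;
  switch (node.operator) {
    case '||': return left || node.right.value;
    case '&&': return left && node.right.value;
    // "??" falls through to the right side only for null and undefined.
    case '??': return left !== null && left !== undefined ? left : node.right.value;
    default: throw new Error(`Unsupported logical operator: ${node.operator}`);
  }
}

console.log(evaluateLogical(logical('||', 0, 5)));    // 5
console.log(evaluateLogical(logical('??', 0, 5)));    // 0
console.log(evaluateLogical(logical('??', null, 5))); // 5
```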
The export * as syntax
extend interface ExportAllDeclaration {
  exported: Identifier | null;
}