ES2018

Tags: regular expressions, Unicode · Published 2018-12-01 23:12:59


I. Overview

Two major features:

Asynchronous Iteration
Rest/Spread Properties

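Four smaller regex-related features:

s (dotAll) flag for regular expressions
RegExp Lookbehind Assertions
RegExp named capture groups
RegExp Unicode Property Escapes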

Others:

Promise.prototype.finally
Lifting template literal restriction

II. Asynchronous Iteration

Ordinary (synchronous) iteration looks like this:

let arr = [1, 2, 3];
let iter = arr[Symbol.iterator]();
// Iterate manually
while (true) {
  let step = iter.next();
  if (step.done) break;
  console.log(step.value); // 1, 2, 3
}

Or iterate with a for...of loop:

for (let value of arr) {
  console.log(value); // 1, 2, 3
}

But if the data source is asynchronous, a for...of loop only gets a bunch of Promises instead of the values we want:

// Asynchronous data source
let arr = [1, 2, 3].map(n => Promise.resolve(n));
for (let value of arr) {
  console.log(value); // Promise {<resolved>: 1}...
}

This is because the Iterator interface introduced in ES2015 only works for synchronous data sources:

Iterators are only suitable for representing synchronous data sources

To support asynchronous data sources, ES2018 adds three things:

The async iterator interface: AsyncIterator
The asynchronous iteration statement: for-await-of
Generators of async iterators: async generator functions

An async generator is a quick way to implement the AsyncIterator interface, and anything that implements AsyncIterator can conveniently be iterated with for-await-of.

AsyncIterator

Similar to the synchronous Iterator interface:

const { value, done } = syncIterator.next();

the asynchronous AsyncIterator interface requires next() to return a Promise that resolves to { value, done }:

asyncIterator.next().then(({ value, done }) => /* ... */);

The method key for this interface is Symbol.asyncIterator, for example:

let myObj = {/* ... */};
// Implementing Symbol.asyncIterator marks the object as async iterable
myObj[Symbol.asyncIterator] = () => {
  return {
    next() {
      return Promise.resolve({ value: "more and more...", done: false });
    }
  };
};

Try it out:

let asyncIter = myObj[Symbol.asyncIterator]();
(async () => {
  while (true) {
    let step = await asyncIter.next();
    if (step.done) break;
    console.log(step.value); // more and more... forever: it is an infinite sequence
  }
})();

P.S. The method key for the synchronous Iterator interface is Symbol.iterator; for details see for…of loop_ES6 Notes 1 | 2. Cannot iterate over objects
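The myObj example above never ends, because next() never reports done: true. For contrast, here is a minimal sketch of a finite async iterable written by hand. It assumes a hypothetical countdown(n) helper (not part of any library) that produces n, n-1, ..., 1 and then signals completion, plus an illustrative drain() that drives it with the same while (true) loop:

```javascript
// Hypothetical helper: an async iterable that counts down from n to 1.
// Symbol.asyncIterator is implemented by hand, without an async generator.
function countdown(n) {
  return {
    [Symbol.asyncIterator]() {
      let current = n;
      return {
        next() {
          // Every call returns a Promise of { value, done }
          if (current <= 0) {
            return Promise.resolve({ value: undefined, done: true });
          }
          return Promise.resolve({ value: current--, done: false });
        }
      };
    }
  };
}

// Illustrative consumer: drives the iterator manually and collects values
async function drain(iterable) {
  const it = iterable[Symbol.asyncIterator]();
  const seen = [];
  while (true) {
    const step = await it.next();
    if (step.done) break;
    seen.push(step.value);
  }
  return seen;
}

drain(countdown(3)).then(values => console.log(values)); // [ 3, 2, 1 ]
```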

for-await-of

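Correspondingly, an object that implements Symbol.asyncIterator (returning an AsyncIterator) is called an async iterable, and it earns the privilege of being iterated with for-await-of: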

// Asynchronous data source
let arr = [1, 2, 3].map(n => Promise.resolve(n));
// Implement the AsyncIterator interface
arr[Symbol.asyncIterator] = () => {
  let i = 0;
  return {
    next() {
      let done = i === arr.length;
      return !done ?
        arr[i++].then(value => ({ value, done })) :
        Promise.resolve({ value: void 0, done: true });
    }
  };
};

(async () => {
  for await (const n of arr) {
    console.log(n); // 1, 2, 3
  }
})();

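In use it differs little from the synchronous for...of; only implementing the AsyncIterator interface is somewhat tedious, so a more convenient way is badly needed.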

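P.S. Likewise, the await keyword may only appear inside an async function, and the await in for-await-of is no exception (top-level await in modules only arrived later, in ES2022).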
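Worth knowing: when an object has no Symbol.asyncIterator, for-await-of falls back to its ordinary Symbol.iterator and awaits each value, so for a plain array of promises the hand-written interface above is not strictly required. A minimal sketch (sumAll is a hypothetical helper):

```javascript
// Hypothetical helper: sums the values of any iterable of promises or plain values.
// No Symbol.asyncIterator is defined; for-await-of falls back to the
// array's synchronous iterator and awaits each element.
async function sumAll(items) {
  let total = 0;
  for await (const n of items) {
    total += n;
  }
  return total;
}

const promised = [1, 2, 3].map(n => Promise.resolve(n));
sumAll(promised).then(total => console.log(total)); // 6
// Plain (non-Promise) values are accepted as well
sumAll([4, 5]).then(total => console.log(total)); // 9
```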

async generator

The async generator is exactly the generator of async iterators we were longing for:

// Asynchronous data source
let arr = [1, 2, 3].map(n => Promise.resolve(n));
// Implement the AsyncIterator interface
arr[Symbol.asyncIterator] = async function*() {
  for (let value of arr) {
    yield value;
  }
};

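Much more convenient. Going further, the object returned by an async generator is already an async iterable (it implements the AsyncIterator interface implicitly), so there is no need to implement the interface by hand: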

let asyncIterable = async function*() {
  let arr = [1, 2, 3].map(n => Promise.resolve(n));
  for (let value of arr) {
    yield value;
  }
}();

Compare the synchronous version:

let iterable = function*() {
  let arr = [1, 2, 3];
  for (let value of arr) {
    yield value;
  }
}();

In terms of syntax, async generators have three characteristics:

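They return an object that is both an async iterator and an async iterable: its next, throw and return methods all return Promises instead of { value, done } directly, and it implements Symbol.asyncIterator by default (which is why an async generator returns an async iterable)
await and for-await-of statements are allowed in the function body
yield* is also supported, for delegating to another iterator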

For example:

let asyncIterable = async function*() {
  let arr = [1, 2, 3].map(n => Promise.resolve(n));
  for (let value of arr) {
    yield value;
  }
  // yield* delegates to another async iterator
  yield* (async function*() {
    for (let v of [4, 5, 6]) {
      yield v;
    }
  }());
  // await is allowed
  let seven = await Promise.resolve(7);
  yield seven;
  // for-await-of is allowed
  for await (let x of [8, 9]) {
    yield x;
  }
}();

// test
(async () => {
  for await (const n of asyncIterable) {
    console.log(n); // 1, 2, 3...9
  }
})();
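A typical real use of these three characteristics is paging through a remote API. The sketch below assumes a hypothetical fetchPage(page) that resolves after a short delay with { items, next }; a real version would call fetch or a similar client, and the names PAGES, allItems and collectItems are illustrative:

```javascript
// Hypothetical data source standing in for a paginated HTTP API:
// each "page" resolves after 10 ms with its items and the next page index.
const PAGES = [['a', 'b'], ['c'], ['d', 'e']];
function fetchPage(page) {
  return new Promise(resolve => setTimeout(() => resolve({
    items: PAGES[page],
    next: page + 1 < PAGES.length ? page + 1 : null
  }), 10));
}

// Async generator: awaits each page, then yields its items one by one
async function* allItems() {
  let page = 0;
  while (page !== null) {
    const { items, next } = await fetchPage(page);
    yield* items; // delegate to the page array's (synchronous) iterator
    page = next;
  }
}

// Consume it with for-await-of and collect everything into an array
async function collectItems() {
  const seen = [];
  for await (const item of allItems()) {
    seen.push(item);
  }
  return seen;
}

collectItems().then(items => console.log(items)); // [ 'a', 'b', 'c', 'd', 'e' ]
```

The consumer never sees page boundaries; it just receives a flat stream of items.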

P.S. Note one detail: just like await nonPromise, for-await-of also accepts non-Promise (synchronous) values

P.S. Also, yield inside an async generator is equivalent to yield await; for details see Suggestion: Make yield Promise.reject(...) uncatchable
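One consequence can be checked directly: because yield awaits its operand, yielding a rejected promise throws at the yield itself, where the generator's own try/catch can handle it. A small sketch (guarded and firstValue are illustrative names):

```javascript
async function* guarded() {
  try {
    // yield behaves like yield await: the rejection is thrown right here
    yield Promise.reject(new Error('boom'));
  } catch (e) {
    yield 'caught: ' + e.message;
  }
}

// Read the first value the generator produces
function firstValue() {
  return guarded().next().then(step => step.value);
}

firstValue().then(v => console.log(v)); // caught: boom
```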

How it works

Implicit in the concept of the async iterator is the concept of a request queue. Since iterator methods may be called many times before the result of a prior request is resolved, each method call must be queued internally until all previous request operations have completed.

An async iterator keeps an internal request queue to guarantee iteration order, for example:

const sleep = (ts) => new Promise((resolve) => setTimeout(resolve, ts));
let asyncIterable = async function*() {
  yield sleep(3000);
  yield sleep(1000);
}();
const now = Date.now();
const time = () => Date.now() - now;
asyncIterable.next().then(() => console.log('first then fired at ' + time()));
asyncIterable.next().then(() => console.log('second then fired at ' + time()));

Output:

first then fired at 3002
second then fired at 4005

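The first next() has not completed when the second next() is issued right after it, so the second call is recorded in the queue and only actually processed once all earlier next() calls have completed.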

The example above is roughly equivalent to:

let iterable = function*() {
  let first;
  yield first = sleep(3000);
  // Queued: starts only after every earlier yielded promise has completed
  yield first.then(() => sleep(1000));
}();

iterable.next().value.then(() => console.log('first then fired at ' + time()));
iterable.next().value.then(() => console.log('second then fired at ' + time()));

P.S. For more on the request queue mechanism, see ES2018: asynchronous iteration | await in async generators
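To see the queue without reading timestamps, the sketch below issues three next() calls at once on a generator whose first step is the slowest, and records the order in which the results arrive (delay, slowFirst and settleOrder are illustrative names):

```javascript
const delay = (ms, value) => new Promise(resolve => setTimeout(() => resolve(value), ms));

async function* slowFirst() {
  yield delay(30, 'a'); // slowest step first
  yield delay(10, 'b');
  yield delay(0, 'c');
}

function settleOrder() {
  const it = slowFirst();
  const order = [];
  // All three requests are issued before any of them resolves;
  // the generator's internal queue still serves them one at a time.
  const requests = [it.next(), it.next(), it.next()].map(p =>
    p.then(step => order.push(step.value))
  );
  return Promise.all(requests).then(() => order);
}

settleOrder().then(order => console.log(order)); // [ 'a', 'b', 'c' ]
```

Even though 'c' needs no waiting at all, it cannot overtake 'a': the generator body is only resumed for a request after the previous one has settled.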

III. Rest/Spread Properties

ES2015 introduced three uses of the ... syntax:

Rest parameters
Rest elements
Spread elements

For example:

// Rest parameters
function f(first, second, ...rest) {
  console.log(rest);
}
// Rest elements
const iterable = [1, 2, 3, 4];
const [first, second, ...rest] = iterable;
// Spread elements
f(...iterable);

ES2018 adds two more: rest properties and spread properties.

Rest properties

Basic usage:

let { x, y, ...z } = { x: 1, y: 2, a: 3, b: 4 };
z;// { a: 3, b: 4 }

Nested patterns work too:

let complex = {
  x: { a: 1, b: 2, c: 3 }
};
let {
  x: { a: xa, ...xbc }
} = complex;

Common use cases:

// Shallow copy (own properties only, prototype not included)
let { ...aClone } = a;

// Extending an options parameter
function baseFunction({ a, b }) {
  // ...
}
function wrapperFunction({ x, y, ...restConfig }) {
  // do something with x and y
  // pass the rest to the base function
  return baseFunction(restConfig);
}
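Rest properties also work with computed keys, which gives a compact way to drop a single property without touching the source object. A sketch; omit is a hypothetical helper, not a built-in:

```javascript
// Hypothetical helper: returns a shallow copy of obj without `key`
function omit(obj, key) {
  // The computed key pulls `key` out into `_removed`; rest collects the others
  const { [key]: _removed, ...rest } = obj;
  return rest;
}

const config = { url: '/api', retries: 3, verbose: true };
const withoutVerbose = omit(config, 'verbose');
console.log(withoutVerbose); // { url: '/api', retries: 3 }
console.log(config.verbose); // true (the original object is untouched)
```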

Pay special attention to the difference between plain destructuring and rest properties:

let { x, y, ...z } = a;
// is not equivalent to
let { x, ...n } = a;
let { y, ...z } = n;

The two look equivalent, but they are not:

let a = Object.create({x: 1, y: 2});
a.z = 3;

void (() => {
  let { x, y, ...z } = a;
  console.log(x, y, z); // 1 2 {z: 3}
})();
void (() => {
  let { x, ...n } = a;
  let { y, ...z } = n;
  console.log(x, y, z); // 1 undefined {z: 3}
})();

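The key difference is that rest properties copy only own properties, while plain destructuring reads properties from the object and its prototype chain. That is why y becomes undefined in the second version: n cannot see the prototype property y and only receives the own property z.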

Spread properties

Basic usage:

let n = { x, y, ...z };
n;// { x: 1, y: 2, a: 3, b: 4 }

Common use cases:

// Shallow copy (own properties only, prototype not included)
let aClone = { ...a };
// equivalent to
let aClone = Object.assign({}, a);

// Merge multiple objects
let ab = { ...a, ...b };
// equivalent to
let ab = Object.assign({}, a, b);

// Override properties
let aWithOverrides = { ...a, x: 1, y: 2 };
// or
let aWithOverrides = { ...a, ...{ x: 1, y: 2 } };
// equivalent to
let aWithOverrides = Object.assign({}, a, { x: 1, y: 2 });

// Default properties
let aWithDefaults = { x: 1, y: 2, ...a };
// equivalent to
let aWithDefaults = Object.assign({ x: 1, y: 2 }, a);

// Pack and restore
let assembled = { x: 1, y: 2, a: 3, b: 4 };
let { x, y, ...z } = assembled;
let reassembled = { x, y, ...z };

P.S. For a practical use of pack-and-restore, see react-redux source code walkthrough | default parameters and object destructuring

There are also two details:

Spreading only triggers getters (on the object being spread), never setters (on the target object)
Spreading null or undefined does not throw; it is simply ignored

For example:

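// Copying x invokes its getter, so this throws
let runtimeError = { ...{a: 1}, ...{ get x() { throw new Error('thrown') } } };
// Overwriting x does not invoke the setter
let z = { set x(v) { throw new Error('never thrown'); }, ...{ x: 1 } }; // No error
// null and undefined are skipped
let empty = { ...null, ...undefined }; // {}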
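The setter rule is exactly where spread and Object.assign differ: Object.assign writes each property with ordinary assignment, so a setter on the target runs, whereas spread defines brand-new own properties. A minimal sketch with a hypothetical logging setter:

```javascript
const log = [];
const target = {
  set x(v) { log.push('setter called with ' + v); }
};

// Object.assign assigns into target, so target's setter runs
Object.assign(target, { x: 1 });

// Spread defines `x` as a data property on the new object,
// replacing the accessor declared earlier in the same literal instead of calling it
const spread = { set x(v) { log.push('never'); }, ...{ x: 2 } };

console.log(log);      // [ 'setter called with 1' ]
console.log(spread.x); // 2
```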

IV. Regular expression enhancements

It's a long story: ES3 added regular expression support in 1999, and ES2015 gave it a round of enhancements in 2015:

Unicode mode (the u flag): for a practical application see JavaScript emoji utils | Unicode in regular expressions
sticky mode (the y flag): matching must start exactly at the position given by lastIndex
the RegExp.prototype.flags getter: returns the flags enabled on a regex object (gimuy, in alphabetical order, meaning global, ignoreCase, multiline, unicode and sticky)
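For the sticky flag, a small sketch showing that y refuses to search ahead from lastIndex while g does (matchesAt is an illustrative helper):

```javascript
// Illustrative helper: set lastIndex, then test
function matchesAt(re, input, index) {
  re.lastIndex = index;
  return re.test(input);
}

console.log(matchesAt(/b/y, 'a-b', 0)); // false: input[0] is 'a', no searching ahead
console.log(matchesAt(/b/y, 'a-b', 2)); // true: the match starts exactly at index 2
console.log(matchesAt(/b/g, 'a-b', 0)); // true: g scans forward from lastIndex
console.log(/b/gimuy.flags);            // gimuy (alphabetical order)
```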

ES2018, published in June 2018, enhances it further:

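s (dotAll) flag for regular expressions: dotAll mode, in which the dot matches any character (by default it matches any character except line terminators)
RegExp Lookbehind Assertions: positive and negative lookbehind, i.e. the ability to look backwards
RegExp named capture groups: named capturing groups
RegExp Unicode Property Escapes: escapes that match characters by their Unicode properties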

s (dotAll) flag for regular expressions

Without the s flag, . (dot) matches any character except line terminators, of which there are four:

U+000A LINE FEED (LF)(\n)
U+000D CARRIAGE RETURN (CR)(\r)
U+2028 LINE SEPARATOR
U+2029 PARAGRAPH SEPARATOR (like the line separator, an invisible character)

For example:

/a.c/.test('abc') === true
/a.c/.test('a\nc') === false
/a.c/.test('a\rc') === false
/a.c/.test('a\u2028c') === false
/a.c/.test('a\u2029c') === false

Previously, matching truly any character required workarounds such as:

// [^] matches one character, excluding nothing
/a[^]c/.test('a\nc') === true
// [\s\S] matches one character that is either whitespace or non-whitespace
/a[\s\S]c/.test('a\nc') === true

With dotAll mode, all of these line terminators are matched by the dot (as in other languages' regex engines):

const regex = /a.c/s;
regex.test('a\nc') === true

Two properties also report whether the mode is on:

regex.dotAll === true
regex.flags === 's'
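The four terminators listed earlier can be checked in one go; this sketch runs each of them through the same pattern with and without s:

```javascript
// The four line terminators
const terminators = ['\n', '\r', '\u2028', '\u2029'];

// Without s: . rejects every line terminator
const withoutS = terminators.map(t => /^a.c$/.test('a' + t + 'c'));
// With s: . matches all of them
const withS = terminators.map(t => /^a.c$/s.test('a' + t + 'c'));

console.log(withoutS); // [ false, false, false, false ]
console.log(withS);    // [ true, true, true, true ]

const re = /a.c/gs;
console.log(re.dotAll, re.flags); // true gs
```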

Note that dotAll mode (s) does not affect multiline mode (m); the two are completely independent:

s: affects only how . (dot) matches
m: affects only how ^ and $ match

They can be used together without interfering with each other:

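// Without m, ^ and $ match only at the start and end of the whole string
/^c$/.test('a\nc') === false
// With m, they also match at the start and end of each line
/^c$/m.test('a\nc') === true
// With both s and m, each does its own job
/^b./sm.test('a\nb\nc') === true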

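P.S. The technical name for m mode is enhanced line-anchor mode (see Regular expression study notes | IX. Appendix [metacharacter table] [mode modifier table] [special metacharacter table]):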

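Enhanced line-anchor mode splits the text into logical lines, so that ^ and $ match at the corresponding positions of each line rather than only at the start and end of the whole string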

RegExp Lookbehind Assertions

This is a lookaround-related feature. The defining trait of a lookaround is that it matches no characters, only specific positions in the text:

Lookarounds are zero-width assertions that match a string without consuming anything.

ES2018 introduces lookbehind:

(?<=...): positive lookbehind assertion; succeeds only if the subexpression matches the text to the left
(?<!...): negative lookbehind assertion; succeeds only if the subexpression cannot match the text to the left

This gives regexes the ability to look backwards. Typical uses:

// Extract 10.53 from '$10.53', i.e. capture a number preceded by $
'$10.53'.match(/(?<=\$)\d+(\.\d*)?/)[0] === '10.53'
// Extract the positive value 0.53 from '$-10.53 $-10 $0.53', i.e. a number not preceded by a minus sign
'$-10.53 $-10 $0.53'.match(/(?<=\$)(?<!-)\d+(\.\d*)?/g)[0] === '0.53'
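Negative lookbehind is handy when a prefix should rule a match out. A small sketch on a made-up string: mask every standalone number except dollar amounts, then use positive lookbehind to collect only the dollar amounts:

```javascript
const text = 'price: $42, qty: 42';

// Replace numbers with N unless they are directly preceded by '$'
const masked = text.replace(/(?<!\$)\b\d+/g, 'N');
console.log(masked); // price: $42, qty: N

// Positive lookbehind: only the digits that follow '$'
const amounts = text.match(/(?<=\$)\d+/g);
console.log(amounts); // [ '42' ]
```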

Looking ahead has always been possible, for example:

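// (?=…) positive lookahead: the subexpression must match the text to the right
'baaabac'.match(/(?=(a+))a*b\1/)[0] === 'aba'
// (?!…) negative lookahead: the subexpression must not match the text to the right
'testRegexp test-feature tesla'.match(/(?<=\s)(?!test-?)\w+/g)[0] === 'tesla'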

See NOTE 2 and NOTE 3 under 15.10.2.8 Atom in the ES5 specification

Lookbehind and backreferences

In terms of implementation, the contents of a lookbehind are matched from right to left, for example:

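// Inside the lookbehind the pattern is matched right to left, so the right-hand group $2 greedily takes 053
'1053'.replace(/(?<=(\d+)(\d+))$/, '[$1,$2]') === '1053[1,053]'
// Ordinary matching runs left to right, so $1 greedily takes 105
'1053'.replace(/^(\d+)(\d+)$/, '[$1,$2]') === '[105,3]'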

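This example also reveals another detail: although the scan direction is reversed, capture groups are still numbered from left to right.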

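In addition, scanning backwards inside a lookbehind affects backreferences, since a backreference can only refer to something that has already been matched: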

Within a backreference, it is only possible to refer to captured groups that have already been evaluated.

So to match a reduplicated character (such as 哈哈), write it like this:

/(?<=\1(.))/.test('哈哈') === true

rather than:

/(?<=(.)\1)/.test('哈8') === true

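In fact, the \1 here can never match anything and is always the empty string (scanning right to left, the group has not captured anything yet, so there is nothing to refer back to); deleting it changes nothing (/(?<=(.))/)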

P.S. For more on backreferences inside lookbehind, see Greediness proceeds from right to left

RegExp named capture groups

A common date-format conversion:

'2017-01-25'.replace(/(\d{4})-(\d{2})-(\d{2})/, '$1/$2/$3') === '2017/01/25'

We refer to the captured content via $n, which has two problems:

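Readability: $n only says which capture group it is, with no other meaning
Flexibility: as soon as the order of parentheses in the regex changes, the replacement ($1/$2/$3) has to change too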

Named capture groups solve both problems nicely:

const reDate = /(?<yyyy>\d{4})-(?<mm>\d{2})-(?<dd>\d{2})/;
'2017-01-25'.replace(reDate, '$<yyyy>/$<mm>/$<dd>') === '2017/01/25'

Both the capture groups in the regex and the references in the replacement now carry meaning.

In addition, the match result object carries the named captures too:

let result = reDate.exec('2017-01-25');
const { yyyy, mm, dd } = result.groups;
// or
// const { groups: {yyyy, mm, dd} } = result;
`${yyyy}/${mm}/${dd}` === '2017/01/25'

Syntactically, three new things were introduced:

(?<name>...): a named capturing group
\k<name>: a named backreference
$<name>: a named reference in the replacement string; a replacement function receives groups as its last argument, see Replacement targets

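The syntax list above can be exercised in one short sketch: \k<name> finds a doubled word, and a replacement function reads the groups object from its last argument (the sample strings and the names reDouble and reIso are made up for illustration):

```javascript
// \k<name>: named backreference, here matching a doubled word
const reDouble = /\b(?<word>\w+)\s+\k<word>\b/;
console.log(reDouble.test('this is is a typo'));  // true
console.log(reDouble.test('this is a sentence')); // false

// Function replacement: when the regex has named groups,
// the groups object is passed as the last argument
const reIso = /(?<yyyy>\d{4})-(?<mm>\d{2})-(?<dd>\d{2})/;
const us = '2017-01-25'.replace(reIso, (...args) => {
  const { yyyy, mm, dd } = args[args.length - 1];
  return `${mm}/${dd}/${yyyy}`;
});
console.log(us); // 01/25/2017
```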

P.S. A nice feature, though the syntax is a bit long: compare (...) with (?<name>...). Then again, that is for the sake of backward compatibility.

RegExp Unicode Property Escapes

Unicode characters have properties; for example, π belongs to the Greek script, which in Unicode is the property Script=Greek.

To support matching characters by their Unicode properties, two syntaxes are provided:

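\p{UnicodePropertyName=UnicodePropertyValue}: matches a character whose given Unicode property has the given value
\p{LoneUnicodePropertyNameOrValue}: matches a character for which that binary property is true (a lone General_Category value such as Number is also accepted)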

P.S. The corresponding \P matches the complement

Note that both require the u flag; without it they are not recognized

The former is for non-binary properties, the latter for binary (boolean) properties, for example:

const regexGreekSymbol = /\p{Script=Greek}/u;
regexGreekSymbol.test('π') === true
// Unicode numbers (any Number category)
/\p{Number}{2}/u.test('Roman numeral and circled number: Ⅵ㉜') === true
// A Unicode-aware \d
/^\p{Decimal_Number}+$/u.test('１２３٤٥٦') === true
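A short sketch of \P as the complement of \p, plus a General_Category escape that spans scripts (the sample strings are arbitrary):

```javascript
// \p{...} matches a property; \P{...} matches its complement
const greek = /\p{Script=Greek}/u;
const noGreek = /^\P{Script=Greek}+$/u;

console.log(greek.test('π'));     // true
console.log(noGreek.test('abc')); // true: no Greek letters at all
console.log(noGreek.test('aπc')); // false

// \p{Letter} (General_Category L) covers letters from every script
const letters = 'αβ-abc-漢字'.match(/\p{Letter}+/gu);
console.log(letters); // [ 'αβ', 'abc', '漢字' ]
```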

P.S. Supported property names and values follow the Unicode standard, as defined in PropertyAliases.txt and PropertyValueAliases.txt; binary properties are listed in UTS18 RL1.2

Good news: emoji finally have a definitive solution too:

const reEmoji = /\p{Emoji_Modifier_Base}\p{Emoji_Modifier}?|\p{Emoji_Presentation}|\p{Emoji}\uFE0F/gu;
reEmoji.test('\u{1F469}') === true

P.S. For why "binary" refers to boolean properties, see 1.2 Properties

P.S. For a Unicode character table, see

V. Other small features

Promise.prototype.finally

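A Pending Promise ends up either Resolved or Rejected, but sometimes what you need is Resolved || Rejected: for example, you only want to wait until an async operation has finished, regardless of success or failure. Promise.prototype.finally is the best fit for this: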

fetch('http://www.example.com').finally(() => {
  // The request has finished (success or failure), so hide the loading indicator
  document.querySelector('#loading').classList.add('hide');
});

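The finally callback is the place for cleanup work (like the finally of try-catch-finally), such as hiding a loading indicator, closing file descriptors, or logging that an operation has completed.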

Similar situations used to be handled with then(f, f), but finally differs in that it:

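takes no arguments (it exists purely for cleanup and does not care about the result)
fires whether the promise is Resolved or Rejected
does not change the state or result of the promise chain (whereas then(() => {}, () => {}) yields Resolved undefined), unless the finally callback throws or returns a rejected promise, in which case the chain becomes Rejected with that error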

For example:

Promise.resolve(1)
  .finally(() => 2)
  // finally callbacks receive no argument, so x is undefined here
  .finally((x) => new Promise((resolve) => {
    setTimeout(() => {
      resolve(x + 1);
    }, 3000);
  }))
  .then(
    // after 3 seconds, logs 1
    res => console.log(res)
  );

Resolved 1 is never changed, because by design finally should not affect the result:

Syntactic finally can only modify the return value with an “abrupt completion”: either throwing an exception, or returning a value early. Promise#finally will not be able to modify the return value, except by creating an abrupt completion by throwing an exception (ie, rejecting the promise)

For Promise#finally, the counterpart of "returning a value early" is returning a rejected Promise, for example:

Promise.resolve(1)
  // returning a value early
  .finally(() => Promise.reject(2))
  .catch(ex => console.log(ex)) // 2
  .finally(() => {
    // throwing an exception
    throw 3;
  })
  .catch(ex => console.log(ex)); // 3
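For intuition, the three properties listed earlier can be approximated with then. This is a simplified sketch, not the spec algorithm: promiseFinally is a hypothetical name, and the real method also looks up the constructor through species, which is omitted here:

```javascript
// Hypothetical helper approximating Promise.prototype.finally with then
function promiseFinally(promise, onFinally) {
  return promise.then(
    // Success: run the callback with no arguments, then pass the original value through
    value => Promise.resolve(onFinally()).then(() => value),
    // Failure: run the callback, then re-throw the original reason
    reason => Promise.resolve(onFinally()).then(() => { throw reason; })
  );
}

promiseFinally(Promise.resolve(1), () => 2)
  .then(value => console.log(value)); // 1

promiseFinally(Promise.reject(new Error('x')), () => 'ignored')
  .catch(err => console.log(err.message)); // x

// A throwing callback does replace the outcome with a rejection
promiseFinally(Promise.resolve(1), () => { throw new Error('cleanup failed'); })
  .catch(err => console.log(err.message)); // cleanup failed
```

If the callback returns a rejected promise, Promise.resolve adopts it and the chain rejects with that reason, matching the "return rejectedPromise" case above.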

Lifting template literal restriction

By default, template literals recognize (try to interpret) the escape sequences inside them:

\u: a Unicode escape sequence, e.g. \u00FF or \u{42}
\x: a hexadecimal escape, e.g. \xFF
\ followed by digits: an octal escape, e.g. \101; see Octal escape sequences

P.S. In fact, octal escape sequences are illegal in both template literals and strict-mode string literals:

Octal escapes are forbidden in template literals and strict mode string literals.

Invalid escape sequences cause an error:

// Uncaught SyntaxError: Invalid Unicode escape sequence
`\uZZZ`
// Uncaught SyntaxError: Invalid hexadecimal escape sequence
`\xxyz`
// Uncaught SyntaxError: Octal escape sequences are not allowed in template strings.
`\0999`
// An accident that is far easier to run into
`windowsPath = c:\usrs\xxx\projects`

Yet template literals are among the most open-ended features of ES2015:

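Tagged templates open the door for library designers to create powerful domain-specific languages. These languages may look nothing like JS, but they can still embed seamlessly in JS and interact intelligently with the rest of the language. I don't know where this feature will take us, but it holds endless possibilities, and that excites me enormously!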

This eager default parsing actually limits what template literals can contain, for example LaTeX:

let latexDocument = `
\newcommand{\fun}{\textbf{Fun!}}// works just fine
\newcommand{\unicode}{\textbf{Unicode!}} // Illegal token!
\newcommand{\xerxes}{\textbf{King!}} // Illegal token!

Breve over the h goes \u{h}ere // Illegal token!
`

This is valid LaTeX source, but \unicode, \xerxes and \u{h}ere all cause errors.

To address this, ES2018 drops the default parsing for tagged templates and leaves the handling of illegal escape sequences to the tag function:

Remove the restriction on escape sequences.
Lifting the restriction raises the question of how to handle cooked template values that contain illegal escape sequences.

For example:

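function tag(strs) {
  // Cooked version: undefined when it contains an illegal escape sequence
  console.log(strs[0] === undefined); // true
  // Raw version: exactly as written in the source
  console.log(strs.raw[0] === "\\unicode and \\u{55}"); // true
}
tag`\unicode and \u{55}`;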

P.S. For more on tagged templates, see Template strings_ES6 Notes 3

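Note that this change applies only to tagged templates; ordinary template literals keep the old behavior (an illegal escape sequence is still an error):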

let bad = `bad escape sequence: \unicode`; // throws early error
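To see both views side by side, here is a sketch of a tag that reports the cooked and raw form of each literal segment (segments is a hypothetical name), followed by String.raw, which reads only the raw strings:

```javascript
// Hypothetical tag: returns { cooked, raw } for each literal segment
function segments(strings) {
  return strings.map((cooked, i) => ({ cooked, raw: strings.raw[i] }));
}

console.log(segments`\unicode`); // cooked is undefined, raw is the source text \unicode
console.log(segments`ok\x41`);   // cooked is 'okA', raw is ok\x41

// String.raw uses only the raw strings, so LaTeX source survives intact
const latex = String.raw`\newcommand{\unicode}{\textbf{Unicode!}}`;
console.log(latex); // \newcommand{\unicode}{\textbf{Unicode!}}
```

For LaTeX the raw form is what you want anyway: even a valid escape such as the \n at the start of \newcommand would otherwise be cooked into a newline.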

VI. Summary

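The most practical features are the regex enhancements. Beyond that, the Promise task model keeps maturing, generators and async functions have finally been combined, the already widely used spread operator has been standardized for objects, and some of the restrictions on what template literals accept have been lifted, bringing them in line with their original design intent.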

In short, a somewhat hurried JS language is moving in a good direction.

P.S. For ES2019, see Finished Proposals

References

ECMAScript regular expressions are getting better!
Template Literal Revision