Inside the JDK: String Concatenation

JDK · Published 2019-04-11 23:21:50

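This is the last post in the String series, and the topic is string concatenation. Concatenating strings is one of the most common operations in day-to-day development, and there are four usual ways to do it:

Using + directly
Using String's concat() method
Using StringBuilder's append() method
Using StringBuffer's append() method

How do these approaches differ, and how do they perform? Here is a simple performance test: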

public class StringTest {

    public static void main(String[] args) {
        int count = 1000; // number of concatenations per test; raise it for the larger runs
        String word = "Hello, ";
        StringBuilder builder = new StringBuilder("Hello, ");
        StringBuffer buffer = new StringBuffer("Hello, ");
        long start, end;

        // 1. the + operator
        start = System.currentTimeMillis();
        for (int i = 0; i < count; i++) {
            word += "java";
        }
        end = System.currentTimeMillis();
        System.out.println("String + : " + (end - start));

        // 2. String.concat()
        word = "Hello, ";
        start = System.currentTimeMillis();
        for (int i = 0; i < count; i++) {
            word = word.concat("java");
        }
        end = System.currentTimeMillis();
        System.out.println("String.concat() : " + (end - start));

        // 3. StringBuilder.append()
        word = "Hello, ";
        start = System.currentTimeMillis();
        for (int i = 0; i < count; i++) {
            builder.append("java");
        }
        word = builder.toString();
        end = System.currentTimeMillis();
        System.out.println("StringBuilder : " + (end - start));

        // 4. StringBuffer.append()
        word = "Hello, ";
        start = System.currentTimeMillis();
        for (int i = 0; i < count; i++) {
            buffer.append("java");
        }
        word = buffer.toString();
        end = System.currentTimeMillis();
        System.out.println("StringBuffer : " + (end - start));
    }
}

The results (elapsed time in milliseconds, by loop count) are as follows:

Method	1,000	10,000	100,000	1,000,000
+	11	397	20191	720286
concat	3	72	5671	763612
StringBuilder	0	0	3	17
StringBuffer	1	1	4	36

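Each figure comes from a single run, so the numbers are not rigorous, but they still show the trend. The more iterations, the wider the gap: StringBuilder > StringBuffer > concat > +, although at one million iterations + and concat land in the same range (concat was even slightly slower in this run). Most readers probably already know the reason. Let's look at each approach from the source-code angle.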
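Because each figure above comes from a single run timed with System.currentTimeMillis(), a slightly more careful measurement could warm the code up first and take the median of several runs. The sketch below is one way to do that. TimingSketch, medianNanos and the warm-up and run counts are illustrative names and values, not part of the original test, and a serious benchmark would use a dedicated harness.

```java
import java.util.Arrays;

class TimingSketch {
    // Runs the task a few times untimed (warm-up), then returns the median of the timed runs.
    static long medianNanos(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        int count = 10_000; // hypothetical loop count
        long plus = medianNanos(() -> {
            String word = "Hello, ";
            for (int i = 0; i < count; i++) {
                word += "java";
            }
        }, 3, 5);
        long builder = medianNanos(() -> {
            StringBuilder sb = new StringBuilder("Hello, ");
            for (int i = 0; i < count; i++) {
                sb.append("java");
            }
            sb.toString();
        }, 3, 5);
        System.out.println("+ : " + plus / 1_000_000 + " ms");
        System.out.println("StringBuilder : " + builder / 1_000_000 + " ms");
    }
}
```

The median is used instead of the mean so that a single slow run (for example one hit by garbage collection) does not dominate the result.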

+

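Is + really the least efficient way to concatenate strings? To answer that, we first need to know what + actually does. When the underlying mechanism is unclear, javap is a good tool to reach for. Start with the simplest possible line: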

String str = "a" + "b";

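This does not actually tell us anything: both operands are compile-time constants, so the compiler folds "a" + "b" into String str = "ab" during compilation. A small change avoids that: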

String a = "a";
String str = a + "b";
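The difference between the two snippets can also be observed without javap. The following sketch compares references on purpose: a folded constant is the same interned object as the literal "ab", while a string built at run time is not. ConstantFoldingDemo and its method names are made up for this illustration.

```java
class ConstantFoldingDemo {
    // Both operands are compile-time constants, so javac folds them into "ab".
    static boolean foldedIsSameObject() {
        String folded = "a" + "b";
        return folded == "ab"; // reference comparison on purpose
    }

    // 'a' is an ordinary (non-final) local variable, so the concatenation happens at run time.
    static boolean runtimeIsSameObject() {
        String a = "a";
        String joined = a + "b";
        return joined == "ab"; // reference comparison on purpose
    }

    public static void main(String[] args) {
        System.out.println(foldedIsSameObject());  // true
        System.out.println(runtimeIsSameObject()); // false
    }
}
```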

Running javap on the modified snippet (String a = "a"; String str = a + "b";) gives this bytecode:

 0: ldc           #2  // String a
 2: astore_1
 3: new           #3  // class java/lang/StringBuilder
 6: dup
 7: invokespecial #4  // Method java/lang/StringBuilder."<init>":()V
10: aload_1
11: invokevirtual #5  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
14: ldc           #6  // String b
16: invokevirtual #5  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
19: invokevirtual #7  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
22: astore_2
23: return

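As you can see, the compiler turned + into StringBuilder.append() calls and then called StringBuilder.toString() to produce the result. (This is what javac emits for Java 8 and earlier; since JDK 9 it emits an invokedynamic call to StringConcatFactory instead, but a new String is still built for every + expression.) In that case, shouldn't + be just as fast as StringBuilder? Remember that the test code uses a for loop to simulate frequent concatenation. With +, every iteration repeats the following steps:

Create a new StringBuilder object
Call StringBuilder.append()
Call StringBuilder.toString(), which creates the result via new String()

After tens of thousands of iterations, think how many intermediate objects have been created; no wonder it is slow. Other approaches trade space for time or time for space, whereas this one wastes both. So avoid + when concatenating repeatedly, for example inside a loop. Why does it exist, then? Sometimes all we need is to join two strings, and + is the most direct way to say so.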
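The per-iteration steps listed above can be written out by hand. This is a sketch of roughly what the loop body `word += "java"` expands to under the Java 8 style bytecode shown earlier, next to the single-builder version. PlusInLoopDemo and its method names are illustrative, not taken from the JDK.

```java
class PlusInLoopDemo {
    // Hand-written equivalent of `word += "java"` inside a loop.
    static String plusStyle(int count) {
        String word = "Hello, ";
        for (int i = 0; i < count; i++) {
            // a fresh builder, two appends and a new String on every iteration
            word = new StringBuilder().append(word).append("java").toString();
        }
        return word;
    }

    // One builder for the whole loop, one String at the end.
    static String singleBuilder(int count) {
        StringBuilder builder = new StringBuilder("Hello, ");
        for (int i = 0; i < count; i++) {
            builder.append("java");
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        // Same result, very different number of intermediate objects.
        System.out.println(plusStyle(1000).equals(singleBuilder(1000)));
    }
}
```

Both methods return the same text; the first one also copies the whole accumulated string into a new builder on each pass, which is why its cost grows so quickly with the loop count.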

String.concat()

public String concat(String str) {
    int otherLen = str.length();
    if (otherLen == 0) {
        return this; // str is empty: return this directly
    }
    int len = value.length;
    char buf[] = Arrays.copyOf(value, len + otherLen);
    str.getChars(buf, len);
    return new String(buf, true);
}

void getChars(char dst[], int dstBegin) {
    System.arraycopy(value, 0, dst, dstBegin, value.length);
}

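concat() first builds a new character array buf[] with Arrays.copyOf(), copies the argument's characters into it with System.arraycopy(), and finally constructs the result with new String(). Compared with +, it skips creating a StringBuilder, but every iteration still allocates a new character array and a new String object, so it is still far from ideal for frequent concatenation.

One more detail: when the str argument has length 0, concat() returns this directly. It is one of the few places in String that return this; substring(0) and a trim() call that has nothing to remove are others.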
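The empty-argument shortcut is easy to check with a reference comparison. A minimal sketch, with ConcatDemo and its method names invented for the example:

```java
class ConcatDemo {
    // concat("") hands back the receiver itself, no copy is made.
    static boolean emptyReturnsSame(String s) {
        return s.concat("") == s; // reference comparison on purpose
    }

    // Any non-empty argument produces a new String object.
    static boolean nonEmptyReturnsNew(String s) {
        return s.concat("!") != s;
    }

    public static void main(String[] args) {
        System.out.println(emptyReturnsSame("Hello"));   // true
        System.out.println(nonEmptyReturnsNew("Hello")); // true
    }
}
```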

append()

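StringBuilder and StringBuffer are very much alike, and both are far faster than + and concat for repeated concatenation. At 100,000 iterations they took 3 ms and 4 ms respectively, so StringBuilder is still slightly faster than StringBuffer. As for why: read the source code!

Start with StringBuilder.append():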

@Override
public StringBuilder append(String str) {
    super.append(str);
    return this;
}

There is no real logic here; it simply calls the parent class's append() method. Here is the class declaration of StringBuilder:

public final class StringBuilder
    extends AbstractStringBuilder
    implements java.io.Serializable, CharSequence {}

StringBuilder extends AbstractStringBuilder, and so does StringBuffer. Both of them therefore end up calling AbstractStringBuilder.append():

public AbstractStringBuilder append(String str) {
    if (str == null)
        return appendNull(); // 1
    int len = str.length();
    ensureCapacityInternal(count + len); // 2
    str.getChars(0, len, value, count); // 3
    count += len;
    return this;
}

Two fields appear in this code, value and count. Let's see what they are for.

/**
 * The value is used for character storage.
 */
char[] value;

/**
 * The count is the number of characters used.
 */
int count;

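value is a character array that stores the characters. It grows automatically, as the code below will show. count is the number of characters actually in use; note that it is not the length of value[]. Back to append(), which we will go through in three parts.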

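appendNull()

This is called when the argument to append() is null. It does not skip the append; as the method name suggests, it appends the four characters of the string "null".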

private AbstractStringBuilder appendNull() {
    int c = count;
    ensureCapacityInternal(c + 4);
    final char[] value = this.value;
    value[c++] = 'n';
    value[c++] = 'u';
    value[c++] = 'l';
    value[c++] = 'l';
    count = c;
    return this;
}
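The effect of appendNull() is visible from the public API: appending a null String reference adds the text "null" instead of throwing or doing nothing. A small sketch, where AppendNullDemo and the "value: " prefix are made up for the example:

```java
class AppendNullDemo {
    // Appends a null String reference to a builder and returns the result.
    static String appendNullString() {
        String missing = null;
        return new StringBuilder("value: ").append(missing).toString();
    }

    public static void main(String[] args) {
        System.out.println(appendNullString()); // value: null
    }
}
```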

ensureCapacityInternal(int)

private void ensureCapacityInternal(int minimumCapacity) {
    // overflow-conscious code
    if (minimumCapacity - value.length > 0) {
        value = Arrays.copyOf(value, newCapacity(minimumCapacity));
    }
}

ensureCapacityInternal() makes sure value[] has enough room for the string being appended. If it does not, Arrays.copyOf(value, newCapacity(minimumCapacity)) is called to grow value[], and newCapacity(minimumCapacity) is the length of the new character array.

private int newCapacity(int minCapacity) {
    // overflow-conscious code
    // new capacity = old capacity * 2 + 2
    int newCapacity = (value.length << 1) + 2;
    if (newCapacity - minCapacity < 0) {
        newCapacity = minCapacity;
    }
    return (newCapacity <= 0 || MAX_ARRAY_SIZE - newCapacity < 0)
        ? hugeCapacity(minCapacity)
        : newCapacity;
}

private int hugeCapacity(int minCapacity) {
    if (Integer.MAX_VALUE - minCapacity < 0) { // overflow
        // the requested capacity exceeds Integer.MAX_VALUE (it overflowed int): throw OOM
        throw new OutOfMemoryError();
    }
    return (minCapacity > MAX_ARRAY_SIZE)
        ? minCapacity : MAX_ARRAY_SIZE;
}

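The basic growth rule is that the new array is twice the old size plus 2, or the requested minimum if that is still not enough. There is an upper bound, MAX_ARRAY_SIZE, whose value is Integer.MAX_VALUE - 8; the 8 is subtracted because some virtual machines reserve header words in an array. Programs rarely get anywhere near that limit. If we ask the virtual machine for a new array of size Integer.MAX_VALUE directly, it may simply throw an OutOfMemoryError.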
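To make the growth rule concrete, here is a small sketch that restates the doubling formula and prints the capacity reported by a real StringBuilder before and after an append that no longer fits. CapacityDemo and nextCapacity are illustrative names; the helper deliberately leaves out the MAX_ARRAY_SIZE and overflow handling shown above, and the capacity a given JDK actually reports is an implementation detail.

```java
class CapacityDemo {
    // Simplified restatement of newCapacity(): double plus 2, or the minimum if that is larger.
    // Assumption: values are small, so int overflow is ignored here.
    static int nextCapacity(int oldCapacity, int minCapacity) {
        int grown = (oldCapacity << 1) + 2;
        return Math.max(grown, minCapacity);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder(); // documented initial capacity: 16
        System.out.println("before: " + sb.capacity());
        sb.append("12345678901234567");         // 17 characters, one more than fits
        System.out.println("after:  " + sb.capacity());
        System.out.println("formula: " + nextCapacity(16, 17));
    }
}
```

With an old capacity of 16 and a required minimum of 17, the formula gives (16 << 1) + 2 = 34; when the required minimum is larger than that, for example 100, the minimum wins.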

getChars()

With the array now large enough, all that remains is to append the string.

str.getChars(0, len, value, count);
count += len;

str is the string being appended. Does getChars() look familiar? If you have read the String source carefully, you should remember it.

public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin) {
    if (srcBegin < 0) {
        throw new StringIndexOutOfBoundsException(srcBegin);
    }
    if (srcEnd > value.length) {
        throw new StringIndexOutOfBoundsException(srcEnd);
    }
    if (srcBegin > srcEnd) {
        throw new StringIndexOutOfBoundsException(srcEnd - srcBegin);
    }
    System.arraycopy(value, srcBegin, dst, dstBegin, srcEnd - srcBegin);
}

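After a few bounds checks, System.arraycopy() copies the characters into the target array.

Those three parts make up one append. Looking back, where does append() spend its time when concatenating a lot of strings? On growing the array and on System.arraycopy(), which is indeed much cheaper than the constant object creation of + and concat().

Remember that StringBuffer was fast too, but a little slower than StringBuilder? Here is StringBuffer's implementation: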
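The copy step can be tried in isolation with the public String.getChars() method. The sketch below mimics what append() does with value and count: it copies a string into a caller-supplied buffer starting at the number of characters already used. GetCharsDemo, appendInto and the parameter names are hypothetical stand-ins, not JDK code.

```java
class GetCharsDemo {
    // Copies str into buffer starting at index 'used' and returns the combined text.
    // Assumption: the caller has already made sure the buffer is large enough.
    static String appendInto(char[] buffer, int used, String str) {
        str.getChars(0, str.length(), buffer, used);
        return new String(buffer, 0, used + str.length());
    }

    public static void main(String[] args) {
        char[] buffer = new char[16];
        "Hello, ".getChars(0, 7, buffer, 0);               // 7 characters already in use
        System.out.println(appendInto(buffer, 7, "java")); // Hello, java
    }
}
```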

@Override
public synchronized StringBuffer append(String str) {
    toStringCache = null;
    super.append(str);
    return this;
}

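The logic is the same, except for the synchronized keyword, which guarantees thread safety, and the reset of toStringCache, a cached copy of the last toString() result. That is why it takes slightly longer than StringBuilder. Apart from this synchronization, StringBuilder and StringBuffer behave the same way.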
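To see what synchronized buys, the sketch below lets several threads append to one shared StringBuffer and checks the final length. SharedAppendDemo, appendConcurrently and the thread and loop counts are illustrative choices. Because every append() holds the buffer's lock, no append is lost; a StringBuilder shared in the same way carries no such guarantee.

```java
class SharedAppendDemo {
    // Starts 'threads' workers that each append "java" 'perThread' times, then returns the length.
    static int appendConcurrently(StringBuffer buffer, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    buffer.append("java");
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join(); // wait until every worker has finished
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return buffer.length();
    }

    public static void main(String[] args) {
        // 4 threads * 1000 appends * 4 characters
        System.out.println(appendConcurrently(new StringBuffer(), 4, 1000)); // 16000
    }
}
```

When the builder never leaves a single thread, which is the usual case for local string building, this locking is pure overhead, and StringBuilder is the natural choice.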