Collections 原始碼閱讀

Python · 發表 2018-11-18 13:37:57

摘要： deque deque並不是普通的教科書式的雙鏈表實現。經典實現是： struct node { struct node *prev; struct node *next; void *data; } node; struct list { struc...

deque

deque並不是普通的教科書式的雙鏈表實現。經典實現是：

struct node {
struct node *prev;
struct node *next;
void *data;
} node;

struct list {
struct node *leftnode;
struct node *rightnode;
} list;

這樣每個節點都儲存了前一個和後一個節點的指標，64位機器上每個節點空間使用量為（不計算記憶體對齊）：8 + 8 + 8 = 24(bytes) 。

而deque的實現為：

#define BLOCKLEN 64

typedef struct BLOCK {
struct BLOCK *leftlink;
PyObject *data[BLOCKLEN];
struct BLOCK *rightlink;
} block;

typedef struct {
PyObject_VAR_HEAD
block *leftblock;
block *rightblock;
Py_ssize_t leftindex;/* 0 <= leftindex < BLOCKLEN */
Py_ssize_t rightindex;/* 0 <= rightindex < BLOCKLEN */
size_t state;/* incremented whenever the indices move */
Py_ssize_t maxlen;/* maxlen is -1 for unbounded deques */
PyObject *weakreflist;
} dequeobject;

一個塊的記憶體容量為：8 + 8 * 64 + 8 = 528(bytes) ，平均到data陣列中的每一個成員，記憶體使用量為：528/64 = 8.25 。是不是大大的節省了記憶體空間？不過對於dequeobject 我沒搞懂的是裡面的state ，每次操作他都會+1 但是卻沒有看到具體用途。

對於deque，pop popleft append appendleft 操作都是O(1) 的時間效率，因為只要一動一下指標就行了。而insert 時間複雜度為O(N) ，其底層實現在_deque_rotate 函式中（_deque_rotate 函式的具體實現方式是每次移動一個塊，直到移動完n個數據為止），insert操作的實現方式是，先把insert(index, object) index左邊的資料rotate到右邊，然後插入，然後再把剛才的資料rotate回來。

ordereddict

ordereddict實現方式為繼承dict，然後底層用一個雙向連結串列儲存順序。而雙向連結串列的實現比較有趣，是：

class _Link(object):
__slots__ = 'prev', 'next', 'key', '__weakref__'

class OrderedDict(dict):
def __init__(self):
# ...
try:
self.__root
except AttributeError:
self.__hardroot = _Link()
self.__root = root = _proxy(self.__hardroot)
root.prev = root.next = root
self.__map = {}
# ...

其中_proxy 返回的弱引用。我們再看一下__setitem__ 操作：

def __setitem__(self, key, value, dict_setitem=dict.__setitem__, proxy=_proxy, Link=_Link):
if key not in self:
self.__map[key] = link = Link()
root = self.__root
last = root.prev
link.prev, link.next, link.key = last, root, key
last.next = link
root.prev = proxy(link)
dict_setitem(self, key, value)

self.__map 裡用key-value 形式儲存每個key的前後節點。

namedtuple

namedtuple 返回的是class tuple 的子類。所以使用層面上和tuple一致，包括支援index等。實現原理是，通過上面定義的模板，把typename傳進去當做namedtuple 的名字，然後exec生成，放到__name__ 名稱空間下。

In [1]: from collections import namedtuple

In [2]: a = namedtuple("Point", ['x', 'y'])

In [3]: a(1, 2)
Out[3]: Point(x=1, y=2)

In [4]: print(a(1, 2)._source)
from builtins import property as _property, tuple as _tuple
from operator import itemgetter as _itemgetter
from collections import OrderedDict

class Point(tuple):
'Point(x, y)'

__slots__ = ()

_fields = ('x', 'y')

def __new__(_cls, x, y):
'Create new instance of Point(x, y)'
return _tuple.__new__(_cls, (x, y))

@classmethod
def _make(cls, iterable, new=tuple.__new__, len=len):
'Make a new Point object from a sequence or iterable'
result = new(cls, iterable)
if len(result) != 2:
raise TypeError('Expected 2 arguments, got %d' % len(result))
return result

def _replace(_self, **kwds):
'Return a new Point object replacing specified fields with new values'
result = _self._make(map(kwds.pop, ('x', 'y'), _self))
if kwds:
raise ValueError('Got unexpected field names: %r' % list(kwds))
return result

def __repr__(self):
'Return a nicely formatted representation string'
return self.__class__.__name__ + '(x=%r, y=%r)' % self

def _asdict(self):
'Return a new OrderedDict which maps field names to their values.'
return OrderedDict(zip(self._fields, self))

def __getnewargs__(self):
'Return self as a plain tuple.Used by copy and pickle.'
return tuple(self)

x = _property(_itemgetter(0), doc='Alias for field number 0')

y = _property(_itemgetter(1), doc='Alias for field number 1')



In [5]:

counter

counter也是繼承自dict，主要實現原理是這麼一句話：

for elem, count in iterable.items():
self[elem] = count + self_get(elem, 0)

chainmap

chainmap實現其實就是一個proxy，我們看看__init__ 和__getitem__ 就知道了：

class ChainMap(MutableMapping):
def __init__(self, *maps):
self.maps = list(maps) or [{}]

def __getitem__(self, key):
for mapping in self.maps:
try:
return mapping[key]
except KeyError:
pass
return self.__missing__(key)

其他的幾個例如UserDict ，UserString 就不多說了。