Collections 原始碼閱讀
deque
deque並不是普通的教科書式的雙鏈表實現。經典實現是:
struct node { struct node *prev; struct node *next; void *data; } node; struct list { struct node *leftnode; struct node *rightnode; } list;
這樣每個節點都儲存了前一個和後一個節點的指標,64位機器上每個節點空間使用量
為(不計算記憶體對齊):8 + 8 + 8 = 24(bytes)
。
而deque的實現為:
#define BLOCKLEN 64 typedef struct BLOCK { struct BLOCK *leftlink; PyObject *data[BLOCKLEN]; struct BLOCK *rightlink; } block; typedef struct { PyObject_VAR_HEAD block *leftblock; block *rightblock; Py_ssize_t leftindex;/* 0 <= leftindex < BLOCKLEN */ Py_ssize_t rightindex;/* 0 <= rightindex < BLOCKLEN */ size_t state;/* incremented whenever the indices move */ Py_ssize_t maxlen;/* maxlen is -1 for unbounded deques */ PyObject *weakreflist; } dequeobject;
一個塊的記憶體容量為:8 + 8 * 64 + 8 = 528(bytes)
,平均到data陣列中的每一個
成員,記憶體使用量為:528/64 = 8.25
。是不是大大的節省了記憶體空間?不過對於dequeobject
我沒搞懂的是裡面的state
,每次操作他都會+1
但是卻沒有看到
具體用途。
對於deque,pop popleft append appendleft
操作都是O(1)
的時間效率,因為
只要一動一下指標就行了。而insert
時間複雜度為O(N)
,其底層實現在_deque_rotate
函式中(_deque_rotate
函式的具體實現方式是每次移動一個塊,
直到移動完n個數據為止),insert操作的實現方式是,先把insert(index, object)
index左邊的資料rotate到右邊,然後插入,然後再把剛才的資料rotate回來。
ordereddict
ordereddict實現方式為繼承dict,然後底層用一個雙向連結串列儲存順序。而雙向連結串列的 實現比較有趣,是:
class _Link(object): __slots__ = 'prev', 'next', 'key', '__weakref__' class OrderedDict(dict): def __init__(self): # ... try: self.__root except AttributeError: self.__hardroot = _Link() self.__root = root = _proxy(self.__hardroot) root.prev = root.next = root self.__map = {} # ...
其中_proxy
返回的弱引用。我們再看一下__setitem__
操作:
def __setitem__(self, key, value, dict_setitem=dict.__setitem__, proxy=_proxy, Link=_Link): if key not in self: self.__map[key] = link = Link() root = self.__root last = root.prev link.prev, link.next, link.key = last, root, key last.next = link root.prev = proxy(link) dict_setitem(self, key, value)
self.__map
裡用key-value
形式儲存每個key的前後節點。
namedtuple
namedtuple
返回的是class tuple
的子類。所以使用層面上和tuple一致,包括
支援index等。實現原理是,通過上面定義的模板,把typename傳進去當做namedtuple
的名字,然後exec生成,放到__name__
名稱空間下。
In [1]: from collections import namedtuple In [2]: a = namedtuple("Point", ['x', 'y']) In [3]: a(1, 2) Out[3]: Point(x=1, y=2) In [4]: print(a(1, 2)._source) from builtins import property as _property, tuple as _tuple from operator import itemgetter as _itemgetter from collections import OrderedDict class Point(tuple): 'Point(x, y)' __slots__ = () _fields = ('x', 'y') def __new__(_cls, x, y): 'Create new instance of Point(x, y)' return _tuple.__new__(_cls, (x, y)) @classmethod def _make(cls, iterable, new=tuple.__new__, len=len): 'Make a new Point object from a sequence or iterable' result = new(cls, iterable) if len(result) != 2: raise TypeError('Expected 2 arguments, got %d' % len(result)) return result def _replace(_self, **kwds): 'Return a new Point object replacing specified fields with new values' result = _self._make(map(kwds.pop, ('x', 'y'), _self)) if kwds: raise ValueError('Got unexpected field names: %r' % list(kwds)) return result def __repr__(self): 'Return a nicely formatted representation string' return self.__class__.__name__ + '(x=%r, y=%r)' % self def _asdict(self): 'Return a new OrderedDict which maps field names to their values.' return OrderedDict(zip(self._fields, self)) def __getnewargs__(self): 'Return self as a plain tuple.Used by copy and pickle.' return tuple(self) x = _property(_itemgetter(0), doc='Alias for field number 0') y = _property(_itemgetter(1), doc='Alias for field number 1') In [5]:
counter
counter也是繼承自dict,主要實現原理是這麼一句話:
for elem, count in iterable.items(): self[elem] = count + self_get(elem, 0)
chainmap
chainmap實現其實就是一個proxy,我們看看__init__
和__getitem__
就知道了:
class ChainMap(MutableMapping): def __init__(self, *maps): self.maps = list(maps) or [{}] def __getitem__(self, key): for mapping in self.maps: try: return mapping[key] except KeyError: pass return self.__missing__(key)
其他的幾個例如UserDict
,UserString
就不多說了。