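PostgreSQL Source Code Walkthrough (96) - Partitioned Tables #3 (Insert Tuple Routing #3 - Getting the Partition Key Values)
This section covers the call path ExecPrepareTupleRouting->ExecFindPartition->FormPartitionKeyDatum. FormPartitionKeyDatum extracts the partition key values of a tuple.
1. Data Structures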
ModifyTable
Applies the rows produced by the subplan(s) to the result table(s) by inserting, updating, or deleting.
/* ----------------
 *   ModifyTable node -
 *      Apply rows produced by subplan(s) to result table(s),
 *      by inserting, updating, or deleting.
 *
 * If the originally named target table is a partitioned table, both
 * nominalRelation and rootRelation contain the RT index of the partition
 * root, which is not otherwise mentioned in the plan.  Otherwise rootRelation
 * is zero.  However, nominalRelation will always be set, as it's the rel that
 * EXPLAIN should claim is the INSERT/UPDATE/DELETE target.
 *
 * Note that rowMarks and epqParam are presumed to be valid for all the
 * subplan(s); they can't contain any info that varies across subplans.
 * ----------------
 */
typedef struct ModifyTable
{
    Plan        plan;
    CmdType     operation;          /* INSERT, UPDATE, or DELETE */
    bool        canSetTag;          /* do we set the command tag/es_processed? */
    Index       nominalRelation;    /* Parent RT index for use of EXPLAIN */
    Index       rootRelation;       /* Root RT index, if target is partitioned */
    bool        partColsUpdated;    /* some part key in hierarchy updated */
    List       *resultRelations;    /* integer list of RT indexes */
    int         resultRelIndex;     /* index of first resultRel in plan's list */
    int         rootResultRelIndex; /* index of the partitioned table root */
    List       *plans;              /* plan(s) producing source data */
    List       *withCheckOptionLists;   /* per-target-table WCO lists */
    List       *returningLists;     /* per-target-table RETURNING tlists */
    List       *fdwPrivLists;       /* per-target-table FDW private data lists */
    Bitmapset  *fdwDirectModifyPlans;   /* indices of FDW DM plans */
    List       *rowMarks;           /* PlanRowMarks (non-locking only) */
    int         epqParam;           /* ID of Param for EvalPlanQual re-eval */
    OnConflictAction onConflictAction;  /* ON CONFLICT action */
    List       *arbiterIndexes;     /* List of ON CONFLICT arbiter index OIDs */
    List       *onConflictSet;      /* SET for INSERT ON CONFLICT DO UPDATE */
    Node       *onConflictWhere;    /* WHERE for ON CONFLICT UPDATE */
    Index       exclRelRTI;         /* RTI of the EXCLUDED pseudo relation */
    List       *exclRelTlist;       /* tlist of the EXCLUDED pseudo relation */
} ModifyTable;
ResultRelInfo
The ResultRelInfo struct.
Whenever an existing relation is updated, the indexes on that relation must be updated as well, and triggers may need to fire. ResultRelInfo holds all the information needed about a result relation, including its indexes.
/*
 * ResultRelInfo
 *
 * Whenever we update an existing relation, we have to update indexes on the
 * relation, and perhaps also fire triggers.  ResultRelInfo holds all the
 * information needed about a result relation, including indexes.
 *
 * Normally, a ResultRelInfo refers to a table that is in the query's
 * range table; then ri_RangeTableIndex is the RT index and ri_RelationDesc
 * is just a copy of the relevant es_relations[] entry.  But sometimes,
 * in ResultRelInfos used only for triggers, ri_RangeTableIndex is zero
 * and ri_RelationDesc is a separately-opened relcache pointer that needs
 * to be separately closed.  See ExecGetTriggerResultRel.
 */
typedef struct ResultRelInfo
{
    NodeTag     type;

    /* result relation's range table index, or 0 if not in range table */
    Index       ri_RangeTableIndex;

    /* relation descriptor for result relation */
    Relation    ri_RelationDesc;

    /* # of indices existing on result relation */
    int         ri_NumIndices;

    /* array of relation descriptors for indices (each index is a relation) */
    RelationPtr ri_IndexRelationDescs;

    /* array of key/attr info for indices */
    IndexInfo **ri_IndexRelationInfo;

    /* triggers to be fired, if any */
    TriggerDesc *ri_TrigDesc;

    /* cached lookup info for trigger functions */
    FmgrInfo   *ri_TrigFunctions;

    /* array of trigger WHEN expr states */
    ExprState **ri_TrigWhenExprs;

    /* optional runtime measurements for triggers */
    Instrumentation *ri_TrigInstrument;

    /* FDW callback functions, if foreign table */
    struct FdwRoutine *ri_FdwRoutine;

    /* available to save private state of FDW */
    void       *ri_FdwState;

    /* true when modifying foreign table directly */
    bool        ri_usesFdwDirectModify;

    /* list of WithCheckOption's to be checked */
    List       *ri_WithCheckOptions;

    /* list of WithCheckOption expr states */
    List       *ri_WithCheckOptionExprs;

    /* array of constraint-checking expr states */
    ExprState **ri_ConstraintExprs;

    /* for removing junk attributes from tuples */
    JunkFilter *ri_junkFilter;

    /* list of RETURNING expressions */
    List       *ri_returningList;

    /* for computing a RETURNING list */
    ProjectionInfo *ri_projectReturning;

    /* list of arbiter indexes to use to check conflicts */
    List       *ri_onConflictArbiterIndexes;

    /* ON CONFLICT evaluation state */
    OnConflictSetState *ri_onConflict;

    /* partition check expression */
    List       *ri_PartitionCheck;

    /* partition check expression state */
    ExprState  *ri_PartitionCheckExpr;

    /* relation descriptor for root partitioned table */
    Relation    ri_PartitionRoot;

    /* Additional information specific to partition tuple routing */
    struct PartitionRoutingInfo *ri_PartitionInfo;
} ResultRelInfo;
PartitionRoutingInfo
The PartitionRoutingInfo struct.
Partition routing information: additional result relation information specific to routing tuples to a table partition.
/*
 * PartitionRoutingInfo
 *
 * Additional result relation information specific to routing tuples to a
 * table partition.
 */
typedef struct PartitionRoutingInfo
{
    /*
     * Map for converting tuples in root partitioned table format into
     * partition format, or NULL if no conversion is required.
     */
    TupleConversionMap *pi_RootToPartitionMap;

    /*
     * Map for converting tuples in partition format into the root partitioned
     * table format, or NULL if no conversion is required.
     */
    TupleConversionMap *pi_PartitionToRootMap;

    /*
     * Slot to store tuples in partition format, or NULL when no translation
     * is required between root and partition.
     */
    TupleTableSlot *pi_PartitionTupleSlot;
} PartitionRoutingInfo;
TupleConversionMap
The TupleConversionMap struct holds the mapping information used to convert tuples between row types.
typedef struct TupleConversionMap
{
    TupleDesc   indesc;         /* tupdesc for source rowtype */
    TupleDesc   outdesc;        /* tupdesc for result rowtype */
    AttrNumber *attrMap;        /* indexes of input fields, or 0 for null */
    Datum      *invalues;       /* workspace for deconstructing source */
    bool       *inisnull;       /* null flags for the source values */
    Datum      *outvalues;      /* workspace for constructing result */
    bool       *outisnull;      /* null flags for the result values */
} TupleConversionMap;
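To make the role of attrMap concrete, here is a minimal sketch of a map-driven row conversion. It is not the PostgreSQL implementation: SimpleConversionMap and convert_row are hypothetical names, column values are plain long instead of Datum, and the layout of the workspace arrays is simplified. The only idea carried over from the struct above is that each output column records the 1-based input column it comes from, with 0 meaning "no source column, emit NULL".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for TupleConversionMap. */
typedef struct SimpleConversionMap
{
    int         out_natts;  /* number of columns in the result row type */
    const int  *attr_map;   /* per output column: 1-based input column, 0 = NULL */
} SimpleConversionMap;

/*
 * Fill outvalues[]/outisnull[] from invalues[]/inisnull[] as directed by
 * the map. Output columns with no source column become NULL.
 */
void
convert_row(const SimpleConversionMap *map,
            const long *invalues, const bool *inisnull,
            long *outvalues, bool *outisnull)
{
    assert(map != NULL && map->attr_map != NULL);

    for (int i = 0; i < map->out_natts; i++)
    {
        int         src = map->attr_map[i];

        if (src == 0)
        {
            /* No matching input column: produce a NULL. */
            outvalues[i] = 0;
            outisnull[i] = true;
        }
        else
        {
            /* Copy the value and its null flag from the input column. */
            outvalues[i] = invalues[src - 1];
            outisnull[i] = inisnull[src - 1];
        }
    }
}
```

With attr_map = {2, 0, 1}, a two-column input row (10, 20) becomes the three-column row (20, NULL, 10). This is the kind of reordering a partition needs when its physical column order differs from that of the root table.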
2. Source Code Walkthrough
FormPartitionKeyDatum extracts the partition key values of a tuple. It fills two output arrays: values[] with the key values and isnull[] with the corresponding null flags.
/* ----------------
 *      FormPartitionKeyDatum
 *          Construct values[] and isnull[] arrays for the partition key
 *          of a tuple.
 *
 *  pd              Partition dispatch object of the partitioned table
 *  slot            Heap tuple from which to extract partition key
 *  estate          executor state for evaluating any partition key
 *                  expressions (must be non-NULL)
 *  values          Array of partition key Datums (output area)
 *  isnull          Array of is-null indicators (output area)
 *
 * the ecxt_scantuple slot of estate's per-tuple expr context must point to
 * the heap tuple passed in.
 * ----------------
 */
static void
FormPartitionKeyDatum(PartitionDispatch pd,
                      TupleTableSlot *slot,
                      EState *estate,
                      Datum *values,
                      bool *isnull)
{
    ListCell   *partexpr_item;
    int         i;

    if (pd->key->partexprs != NIL && pd->keystate == NIL)
    {
        /* Check caller has set up context correctly */
        Assert(estate != NULL &&
               GetPerTupleExprContext(estate)->ecxt_scantuple == slot);

        /* First time through, set up expression evaluation state */
        pd->keystate = ExecPrepareExprList(pd->key->partexprs, estate);
    }

    /* start at the first partition key expression state */
    partexpr_item = list_head(pd->keystate);
    /* loop over the partition key columns */
    for (i = 0; i < pd->key->partnatts; i++)
    {
        /* attribute number of this key column; 0 means an expression */
        AttrNumber  keycol = pd->key->partattrs[i];
        /* typedef uintptr_t Datum; sizeof(Datum) == sizeof(void *) == 4 or 8 */
        Datum       datum;
        bool        isNull;

        if (keycol != 0)
        {
            /* Plain column; get the value directly from the heap tuple */
            datum = slot_getattr(slot, keycol, &isNull);
        }
        else
        {
            /* Expression; need to evaluate it */
            if (partexpr_item == NULL)  /* ran out of expression states */
                elog(ERROR, "wrong number of partition key expressions");
            /* evaluate the expression in the per-tuple context */
            datum = ExecEvalExprSwitchContext((ExprState *) lfirst(partexpr_item),
                                              GetPerTupleExprContext(estate),
                                              &isNull);
            /* advance to the next expression state */
            partexpr_item = lnext(partexpr_item);
        }
        values[i] = datum;
        isnull[i] = isNull;
    }

    /* leftover expression states mean the key definition is inconsistent */
    if (partexpr_item != NULL)
        elog(ERROR, "wrong number of partition key expressions");
}

/*
 * slot_getattr - fetch one attribute of the slot's contents.
 */
static inline Datum
slot_getattr(TupleTableSlot *slot, int attnum,
             bool *isnull)
{
    AssertArg(attnum > 0);

    if (attnum > slot->tts_nvalid)
        slot_getsomeattrs(slot, attnum);

    *isnull = slot->tts_isnull[attnum - 1];

    return slot->tts_values[attnum - 1];
}

/*
 * This function forces the entries of the slot's Datum/isnull arrays to be
 * valid at least up through the attnum'th entry.
 */
static inline void
slot_getsomeattrs(TupleTableSlot *slot, int attnum)
{
    if (slot->tts_nvalid < attnum)
        slot_getsomeattrs_int(slot, attnum);
}

/*
 * slot_getsomeattrs_int - workhorse for slot_getsomeattrs()
 */
void
slot_getsomeattrs_int(TupleTableSlot *slot, int attnum)
{
    /* Check for caller errors */
    Assert(slot->tts_nvalid < attnum);  /* slot_getsomeattr checked */
    Assert(attnum > 0);

    /* reject attribute numbers beyond the tuple descriptor */
    if (unlikely(attnum > slot->tts_tupleDescriptor->natts))
        elog(ERROR, "invalid attribute number %d", attnum);

    /* Fetch as many attributes as possible from the underlying tuple. */
    slot->tts_ops->getsomeattrs(slot, attnum);

    /*
     * If the underlying tuple doesn't have enough attributes, tuple descriptor
     * must have the missing attributes.
     */
    if (unlikely(slot->tts_nvalid < attnum))
    {
        slot_getmissingattrs(slot, slot->tts_nvalid, attnum);
        slot->tts_nvalid = attnum;
    }
}
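The control flow of FormPartitionKeyDatum can be modeled outside the executor. The sketch below is an illustration under simplifying assumptions, not PostgreSQL code: SimpleRow, SimplePartitionKey, KeyExprFn, form_partition_key and sum_first_two are hypothetical names, values are plain long instead of Datum, key expressions are ordinary function pointers instead of ExprState nodes, and errors are reported through a return code rather than elog. What it keeps is the key idea: a nonzero attribute number means "read the column directly", zero means "consume the next expression", and the number of expressions must match exactly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical row: column values plus null flags. */
typedef struct SimpleRow
{
    int         natts;
    const long *values;
    const bool *isnull;
} SimpleRow;

/* Hypothetical key expression: computes one value from the whole row. */
typedef long (*KeyExprFn) (const SimpleRow *row, bool *isnull);

/* Hypothetical, simplified stand-in for the partition key description. */
typedef struct SimplePartitionKey
{
    int         partnatts;      /* number of key columns */
    const int  *partattrs;      /* 1-based column number, or 0 = expression */
    const KeyExprFn *partexprs; /* expressions, in key order */
    int         nexprs;         /* number of entries in partexprs */
} SimplePartitionKey;

/*
 * Fill values[]/isnull[] with the partition key of the row.
 * Returns 0 on success, -1 if the number of expressions does not match
 * the number of expression key columns.
 */
int
form_partition_key(const SimplePartitionKey *key, const SimpleRow *row,
                   long *values, bool *isnull)
{
    int         next_expr = 0;

    assert(key != NULL && row != NULL);

    for (int i = 0; i < key->partnatts; i++)
    {
        int         keycol = key->partattrs[i];

        if (keycol != 0)
        {
            /* Plain column: read it straight from the row. */
            assert(keycol > 0 && keycol <= row->natts);
            values[i] = row->values[keycol - 1];
            isnull[i] = row->isnull[keycol - 1];
        }
        else
        {
            /* Expression column: consume the next expression. */
            if (next_expr >= key->nexprs)
                return -1;
            values[i] = key->partexprs[next_expr++] (row, &isnull[i]);
        }
    }

    /* Leftover expressions mean the key definition is inconsistent. */
    return (next_expr == key->nexprs) ? 0 : -1;
}

/* Example expression for illustration: first column plus second column. */
long
sum_first_two(const SimpleRow *row, bool *isnull)
{
    *isnull = row->isnull[0] || row->isnull[1];
    return *isnull ? 0 : row->values[0] + row->values[1];
}
```

For a row (20, 5) and a key defined as (column 1, sum_first_two), the function fills values[] with 20 and 25. The two failure paths correspond to the two "wrong number of partition key expressions" checks in the real function.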
3. Tracing
The test script is as follows:
-- Hash Partition
drop table if exists t_hash_partition;
create table t_hash_partition (c1 int not null, c2 varchar(40), c3 varchar(40)) partition by hash(c1);
create table t_hash_partition_1 partition of t_hash_partition for values with (modulus 6, remainder 0);
create table t_hash_partition_2 partition of t_hash_partition for values with (modulus 6, remainder 1);
create table t_hash_partition_3 partition of t_hash_partition for values with (modulus 6, remainder 2);
create table t_hash_partition_4 partition of t_hash_partition for values with (modulus 6, remainder 3);
create table t_hash_partition_5 partition of t_hash_partition for values with (modulus 6, remainder 4);
create table t_hash_partition_6 partition of t_hash_partition for values with (modulus 6, remainder 5);

insert into t_hash_partition(c1,c2,c3) VALUES(20,'HASH0','HAHS0');
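The (modulus, remainder) pairs in the script determine which partition accepts a given row hash. The sketch below shows only that matching step, with hypothetical names (HashBound, find_hash_partition). The row_hash argument is a stand-in: PostgreSQL computes it from the key values with type-specific hash functions, so the key value 20 does not hash to 20, and this example does not predict which partition the test row lands in.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one hash partition's bound. */
typedef struct HashBound
{
    uint32_t    modulus;
    uint32_t    remainder;
} HashBound;

/*
 * Return the index of the first partition whose (modulus, remainder)
 * pair accepts row_hash, or -1 if no partition accepts it.
 */
int
find_hash_partition(uint64_t row_hash, const HashBound *bounds, int nparts)
{
    for (int i = 0; i < nparts; i++)
    {
        assert(bounds[i].modulus > 0);
        if (row_hash % bounds[i].modulus == bounds[i].remainder)
            return i;
    }
    return -1;
}
```

With the six bounds from the script, a stand-in hash of 20 matches index 2 (remainder 2), because 20 % 6 == 2. A linear scan keeps the sketch short; it is not meant to reflect how the server looks up the partition.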
Start gdb and set the breakpoints:
(gdb) b FormPartitionKeyDatum
Breakpoint 5 at 0x6e30d2: file execPartition.c, line 1087.
(gdb) b slot_getattr
Breakpoint 6 at 0x489d9b: file heaptuple.c, line 1510.
(gdb) c
Continuing.

Breakpoint 5, FormPartitionKeyDatum (pd=0x2e1bfa0, slot=0x2e1b8a0, estate=0x2e1aeb8, values=0x7fff4e2407a0, isnull=0x7fff4e240780) at execPartition.c:1087
1087        if (pd->key->partexprs != NIL && pd->keystate == NIL)
Enter the loop, which fetches the value for each partition key column:
1087        if (pd->key->partexprs != NIL && pd->keystate == NIL)
(gdb) n
1097        partexpr_item = list_head(pd->keystate);
(gdb) 
1098        for (i = 0; i < pd->key->partnatts; i++)
(gdb) 
1100            AttrNumber  keycol = pd->key->partattrs[i];
(gdb) 
1104            if (keycol != 0)
(gdb) 
1107                datum = slot_getattr(slot, keycol, &isNull);
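Step into slot_getattr. Note that in the traced build this is the version in heaptuple.c, which starts from slot->tts_tuple, rather than the inline version listed above:
(gdb) step

Breakpoint 6, slot_getattr (slot=0x2e1b8a0, attnum=1, isnull=0x7fff4e240735) at heaptuple.c:1510
1510        HeapTuple   tuple = slot->tts_tuple;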
Inspect the result: the partition key value is 20.
...
(gdb) p *isnull
$31 = false
(gdb) p slot->tts_values[attnum - 1]
$32 = 20
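The values printed above come from the slot's cached arrays. The following sketch models the caching behaviour of the slot_getattr listing in section 2: attributes are extracted only up to the highest attribute number requested so far, and later requests for already-extracted attributes are served from the arrays. SimpleSlot, simple_getsomeattrs and simple_getattr are hypothetical names, the "raw tuple" is just an array of long, and NULL handling is omitted, so this shows the idea and is not PostgreSQL's tuple deforming code.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ATTS 8

/* Hypothetical, simplified stand-in for TupleTableSlot. */
typedef struct SimpleSlot
{
    int         natts;          /* columns in the stored row */
    const long *raw;            /* stand-in for the not-yet-extracted tuple */
    int         nvalid;         /* entries of values[]/isnull[] that are valid */
    long        values[MAX_ATTS];
    bool        isnull[MAX_ATTS];
    int         deform_calls;   /* extraction passes so far, for illustration */
} SimpleSlot;

/* Make values[]/isnull[] valid at least up through the attnum'th entry. */
void
simple_getsomeattrs(SimpleSlot *slot, int attnum)
{
    assert(attnum > 0 && attnum <= slot->natts && attnum <= MAX_ATTS);

    /* Extract only the columns that are not cached yet. */
    for (int i = slot->nvalid; i < attnum; i++)
    {
        slot->values[i] = slot->raw[i];
        slot->isnull[i] = false;    /* this model has no NULLs */
    }
    slot->nvalid = attnum;
    slot->deform_calls++;
}

/* Fetch one column (1-based), extracting more columns only if needed. */
long
simple_getattr(SimpleSlot *slot, int attnum, bool *isnull)
{
    assert(attnum > 0);

    if (attnum > slot->nvalid)
        simple_getsomeattrs(slot, attnum);

    *isnull = slot->isnull[attnum - 1];
    return slot->values[attnum - 1];
}
```

Fetching column 1 twice triggers a single extraction pass. A later fetch of column 3 triggers one more pass that fills columns 2 and 3.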
Return to FormPartitionKeyDatum:
(gdb) n
1593    }
(gdb) 
FormPartitionKeyDatum (pd=0x2e1bfa0, slot=0x2e1b8a0, estate=0x2e1aeb8, values=0x7fff4e2407a0, isnull=0x7fff4e240780) at execPartition.c:1119
1119            values[i] = datum;
Finish the call and return to ExecFindPartition:
1119            values[i] = datum;
(gdb) n
1120            isnull[i] = isNull;
(gdb) 
1098        for (i = 0; i < pd->key->partnatts; i++)
(gdb) 
1123        if (partexpr_item != NULL)
(gdb) 
1125    }
(gdb) 
ExecFindPartition (resultRelInfo=0x2e1b108, pd=0x2e1c5b8, slot=0x2e1b8a0, estate=0x2e1aeb8) at execPartition.c:282
282         if (partdesc->nparts == 0)
DONE!
4. References
PG 11.1 Source Code.
Note: the source code shown on doxygen is not identical to the PG 11.1 source code. The analysis in this section is based on 11.1.