Python 資料結構：Hash Map（雜湊映射）(06)

Hash Map 是以「鍵—值（key–value）」方式儲存資料、並能在平均 O(1) 時間完成查找/插入/更新的資料結構。它以陣列為底、靠 Hash Function（雜湊函數） 將鍵轉為索引，進而定位到資料所在位置。本文帶你從概念到實作，完整掌握 Hash Map 的運作與 Python 實作要點。

內容目錄

為何需要映射（Mapping）

許多資料更適合以「關聯」而非「順序」來表達：

例如「州 → 州花」、「使用者帳號 → 使用者資訊」。
在映射中，每個「鍵」對應唯一的「值」（一對一關係）。
查找的重點在「給定鍵，快速得到值」，而非依序遍歷。

Hash Function 的角色

Hash Map 的核心在於：把鍵轉成陣列索引。流程如下：

將鍵（如字串）餵給 hash function，得到一個整數 hash code。
將 hash code 經由 壓縮函數（compression） 取模至 [0, array_size)，得到合法索引。

設計考量

輸入：通常為字串或任意型別（會先轉為位元組/整數序列）。
輸出：陣列索引（整數）。
效率：計算必須簡單快速（否則會拖累整體效能）。
不可逆：雜湊輸出資訊量少於輸入，本質不可逆。

小提醒：壓縮函數常用「hash_code % array_size」。良好的 hash function 能讓鍵分布均勻，減少碰撞。

碰撞（Collision）與解法

碰撞：不同鍵可能被映射到相同索引。這是無可避免的，因此需要策略處理。

1) Separate Chaining（分離鏈結）

陣列每格放的不是單一值，而是容器（常見：鏈結串列）。
同索引的多個鍵值對，串在同一條鏈上。
優點：插入簡單，陣列不用搬動。
缺點：碰撞多時，鏈會變長，查找成本接近 O(n)。

2) Open Addressing（開放定址）

仍只用陣列；若索引衝突，就**探測（Probing）**下一個位置直到找到空位。
Linear Probing：每次往後 +1（或固定步長）。
Quadratic Probing：依序 +1、+4、+9…（平方數），較能減緩群聚（clustering）。

多數教材都會介紹以上兩種。工業界實作還會考慮負載因子（load factor）、動態擴容（rehash & resize）等。

Python 與 Hash Map

實務上，Python 內建的 dict 就是高度最佳化的 Hash Table。
為了理解原理，下面示範Hash Map（以 list 模擬陣列，採 open addressing / linear probing）。

建構子：配置陣列

class HashMap:
    def __init__(self, array_size):
        self.array_size = array_size
        self.array = [None for _ in range(array_size)]

Hash 與 Compression

class HashMap:
    def __init__(self, array_size):
        self.array_size = array_size
        self.array = [None for _ in range(array_size)]

    def hash(self, key, count_collisions=0):
        key_bytes = key.encode()
        hash_code = sum(key_bytes)
        # 加上碰撞次數，支援 probing
        return hash_code + count_collisions

    def compressor(self, hash_code):
        return hash_code % self.array_size

指派（Setter）：處理碰撞（Linear Probing）

陣列每格存 [key, value]，以利檢查是否同鍵或不同鍵。

    def assign(self, key, value):
        array_index = self.compressor(self.hash(key))
        current = self.array[array_index]

        # 空位：直接放入
        if current is None:
            self.array[array_index] = [key, value]
            return

        # 同鍵：覆蓋
        if current[0] == key:
            self.array[array_index] = [key, value]
            return

        # 不同鍵：開放定址（線性探測）
        collisions = 1
        while True:
            new_index = self.compressor(self.hash(key, collisions))
            current = self.array[new_index]

            if current is None:
                self.array[new_index] = [key, value]
                return

            if current[0] == key:
                self.array[new_index] = [key, value]
                return

            collisions += 1

取值（Getter）：同樣需探測

    def retrieve(self, key):
        array_index = self.compressor(self.hash(key))
        current = self.array[array_index]

        if current is None:
            return None
        if current[0] == key:
            return current[1]

        # 不同鍵：沿相同探測序列繼續找
        collisions = 1
        while True:
            new_index = self.compressor(self.hash(key, collisions))
            current = self.array[new_index]

            if current is None:
                return None
            if current[0] == key:
                return current[1]

            collisions += 1

提醒：若你要支援刪除，在 open addressing 中一般會使用 lazy deletion（標記刪除），避免破壞探測序列。

結語

在理想情況下，它能以近乎 O(1) 的時間完成插入、查找與刪除，成為快取、索引、會話管理等場景的首選結構。真正影響效能與穩定度的，往往不是語法，而是工程細節：雜湊函數的分布品質、碰撞處理策略（separate chaining / open addressing）、負載因子控制與動態擴容（rehash）等。理解這些設計抉擇，能讓你在實務系統中做出「快且穩」的資料結構決策。

Python 資料結構：Hash Map（雜湊映射）(06)

為何需要映射（Mapping）

Hash Function 的角色

設計考量

碰撞（Collision）與解法

1) Separate Chaining（分離鏈結）

2) Open Addressing（開放定址）

Python 與 Hash Map

建構子：配置陣列

Hash 與 Compression

指派（Setter）：處理碰撞（Linear Probing）

取值（Getter）：同樣需探測

結語

發佈留言取消回覆

資料庫 SQL 入門（04）：GROUP BY 分組與聚合——COUNT/AVG/SUM 到 HAVING 的完整心法

Python 函數 (Functions) (10)入門：把重複變成「一次寫、到處用」——定義、呼叫、參數與回傳

Nunc Velrisus Commodo Maecenas Accumsan

n8n 移除重複項（Remove Duplicates）（10）：一次搞懂「當次去重」與「跨執行去重」

為何需要映射（Mapping）

Hash Function 的角色

設計考量

碰撞（Collision）與解法

1) Separate Chaining（分離鏈結）

2) Open Addressing（開放定址）

Python 與 Hash Map

建構子：配置陣列

Hash 與 Compression

指派（Setter）：處理碰撞（Linear Probing）

取值（Getter）：同樣需探測

結語

相關文章

Python 資料結構 : 堆疊 (Stack)(05)

Python 資料結構：佇列（Queue）(04)

Python 資料結構：雙向鏈結串列（Doubly Linked List）(03)

發佈留言取消回覆

Python 函數 (Functions) (10)入門：把重複變成「一次寫、到處用」——定義、呼叫、參數與回傳

Nunc Velrisus Commodo Maecenas Accumsan

n8n 移除重複項（Remove Duplicates）（10）：一次搞懂「當次去重」與「跨執行去重」