C++ 中的 map, unordered_map, cc_hash_table, gp_hash

做题时，常常会用到查重操作，可以使用 STL 中的 map 与 unordered_map ，也可以使用 “平板电视” 中的 cc_hash_table 和 gp_hash_table 实现。

\(\texttt{map}\)

map 的内部实现是红黑树，插入、查找元素的时间复杂度都是 \(O(\log n)\)。

map<int,bool>m;
int n;
cin>>n;
for(int i=1;i<=n;i++)
{
	int t;
	cin>>t;
	m[t]=1;
}
cin>>n;
for(int i=1;i<=n;i++)
{
	int t;
	cin>>t;
	if(m.count(t))
		cout<<"have\n";
	else cout<<"nohave\n";
}

Input

Output

have
have
nohave
have
have

count(x) 返回的是以 x 为键值的元素个数。显然这个值只可能为 1 或者 0．我们用它检验元素是否存在。用 find(x)!=end() 也可以达到同样效果。

重点来了：map 易错点

如果将上述代码的 if(m.count(t)) 换成 if(m[t]==1)，你会发现输出的答案仍然是正确的，但是，如果在查询量大的题目中，后者的时间复杂度会异常地大，有时可能导致 TLE。这种情况非常坑，有时是一直 TLE 而调不出来代码导致心态崩溃的罪魁祸首。

就比如这两份记录：

https://atcoder.jp/contests/abc265/submissions/34250287

https://atcoder.jp/contests/abc265/submissions/34250846

什么原因呢？让我们每次查询后输出 map 的每一个元素试试看

map<int,bool>m;
int n;
cin>>n;
for(int i=1;i<=n;i++)
{
	int t;
	cin>>t;
	m[t]=1;
}
cin>>n;
for(int i=1;i<=n;i++)
{
	int t;
	cin>>t;
	if(m.count(t))cout<<"have\n";
	else cout<<"nohave\n";
	cout<<"Case "<<i<<":\n";
	for(auto i:m)
	cout<<i.first<<' '<<i.second<<'\n';
	cout<<'\n';
}

Input

Output

have
Case 1:
1 1
2 1
3 1

nohave
Case 2:
1 1
2 1
3 1

nohave
Case 3:
1 1
2 1
3 1

使用 count() 的情况下，一切正常，符合预期。

把上面的 m.count(t) 改成 m[t]==1：

Input

同上。

Output

have
Case 1:
1 1
2 1
3 1

nohave
Case 2:
1 1
2 1
3 1
4 0

nohave
Case 3:
1 1
2 1
3 1
4 0
5 0

竟然莫名多了 \(val=0\) 的元素！

是的，当 map 使用 [] 访问不存在的元素时，它不会报错，而会插入一个新的 \(key=访问的key，val=0\) 的元素。这样，就导致我们查询的复杂度一下子从 \(O\big(m\log n\big)\) 变成 \(O\big(m\log (n+m)\big)\)。

有的题目只需要访问元素而本身不需要判断某个元素是否存在，这时我们一定不要忘了要加上判断其是否存在的这句话。对于 \(val=0\) 元素本身有意义的题目，没判存在的情况用 [] 还会导致 WA。

map 的遍历

map 的每个元素被表示成一个 pair，其中 first 为 key，second 为 value。

他的遍历可以这么写：

map<string,int>m;
for(pair<string,int>c:m)
{
	cout<<c.first<<' ';
	cout<<c.second<<'\n';
}

或者直接

map<string,int>m;
for(auto c:m)
{
	cout<<c.first<<' ';
	cout<<c.second<<'\n';
}

当然也可以用迭代器，但是太麻烦了。

\(\texttt{unordered_map}\)

unordered_map 和 map 使用方法和特点类似，只是由于无序，它的插入和查询都是均摊 \(O(1)\) 的，最坏情况由于刻意制造的哈希冲突，会变成 \(O(n)\)。

遍历与 map 同。

\(\texttt{pb_ds}\)

俗称平板电视，是扩展库，封装字典树、堆、平衡树、哈希等数据结构。这里主要讲 pd_ds 中的哈希表（即 cc_hash_table 和 gp_hash_table）。使用 pb_ds 的哈希表之前，我们要有这样的头文件和命名空间声明：

#include <ext/pb_ds/hash_policy.hpp>
using namespace __gnu_pbds;
/*
鉴于其中某些成员名称和标准命名空间的冲突，
有时避免错误可以不进行命名空间声明，
而在每次使用时用 __gnu_pbds::xxx; 的方式访问
*/

cc_hash_table 和 gp_hash_table 的主要区别为处理哈希冲突的方式。

gp_hash_table 使用查探法，cc_hash_table 使用拉链法。cc 常常跑的更快些。

和 unordered_map 类似，也是无序的。插入查询均摊 \(O(1)\)。

值得注意的是，这两个都没有 count() 函数，我们只能用 find(x)!=end() 做查询是否存在。

代码：

#include <ext/pb_ds/hash_policy.hpp>
using namespace __gnu_pbds;
/*etc*/

gp_hash_table <int,int> g;
cc_hash_table <int,int> c;
inline void solve(){
	int n;
	cin>>n;
	for(int i=1;i<=n;i++)
	{
		if(i&1) g[i]=1;
		else c[i]=1;
	}
	cin>>n;
	for(int i=1;i<=n;i++)
	{
		int t;
		cin>>t;
		
		if(g.find(t)!=g.end()) cout<<"have g\n";
		else cout<<"nohave g\n";

		if(c.find(t)!=c.end()) cout<<"have c\n";
		else cout<<"nohave c\n";
	}
}

他们也会在 [] 访问不存在的元素时，插入一个它，服了。

遍历与 map 同。做题时，常常会用到哈希，哈希的实现可以使用 STL 中的 map 与 unordered_map ，也可以使用 “平板电视” 中的 cc_hash_table 和 gp_hash_table。