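Built-in Data Types

Numeric Types

The int type

The int type represents integers. In Python 3, int has arbitrary precision (limited only by available memory), so it also covers what other languages call long.

i.bit_length() returns the number of bits needed to represent the int i in binary, excluding the sign and leading zeros.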
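As a small illustration of arbitrary precision and bit_length(), the sketch below uses values chosen only for demonstration:

```python
# Python ints grow as needed, so large powers do not overflow
big = 2 ** 100
print(big)               # 1267650600228229401496703205376
print(big.bit_length())  # 101

# bit_length() ignores the sign and leading zeros
print((255).bit_length())   # 8
print((-255).bit_length())  # 8
print((0).bit_length())     # 0
```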

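The float type

The float type represents finite-precision floating-point numbers. Python has no separate double type: in CPython, float is implemented with the C double, which on virtually all platforms is an IEEE 754 64-bit value, although the exact precision is formally platform-dependent (see sys.float_info).

Compare floating-point numbers with math.isclose rather than ==, because rounding error makes exact equality unreliable.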
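A minimal sketch of why == is unreliable for floats and how math.isclose behaves. The tolerance values are illustrative choices, not recommendations:

```python
import math

total = 0.1 + 0.2
print(total == 0.3)              # False: binary rounding error
print(math.isclose(total, 0.3))  # True: equal within the default rel_tol of 1e-09

# Near zero a relative tolerance is not enough, so pass abs_tol explicitly
near_zero_default = math.isclose(1e-12, 0.0)
near_zero_abs = math.isclose(1e-12, 0.0, abs_tol=1e-9)
print(near_zero_default)  # False
print(near_zero_abs)      # True
```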

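The complex type

The complex type represents complex numbers, which consist of a real part and an imaginary part; the imaginary part is written with the suffix j.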

a = 1 + 2j
print(a.real) # 1.0
print(a.imag) # 2.0

print(a.conjugate()) # (1-2j), the complex conjugate

The bool type

The bool type represents Boolean values; it has exactly two values, True and False.

Operator precedence: not > and > or
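The precedence rule can be checked with a few expressions. This is only a sketch; the variable names are arbitrary:

```python
a = True or False and False       # parsed as True or (False and False)
b = (True or False) and False     # parentheses force or to run first
print(a, b)  # True False

c = not True or True              # parsed as (not True) or True
d = not (True or True)
print(c, d)  # True False

# bool is a subclass of int: True == 1 and False == 0
print(isinstance(True, int), True + True)  # True 2
```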

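Sequence Types

The str type

Single-quoted and double-quoted strings are equivalent. If a string contains one kind of quote, delimit it with the other kind, or escape the quote with a backslash, e.g. "I'm fine" or 'He said "I\'m fine"'.

Triple quotes denote multi-line strings, for example: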

s = '''hello
world'''
print(s) # hello
        # world

String formatting

print('hello, %s' % 'world') # hello, world
print('hello, {}'.format('world')) # hello, world
print('hello, {0}'.format('world')) # hello, world
print('hello, {name}'.format(name='world')) # hello, world

# Number formatting
print('{:.2f}'.format(3.1415926)) # 3.14
print('num%03d' % 1) # num001

# Padding and alignment
print('1'.center(20)) #         1
print("1".ljust(20)) # 1
print("1".rjust(20,"*")) # *******************1
print(format("121", ">20")) #                 121 # width 20, right-aligned
print(format("121", "<20")) # 121                 # width 20, left-aligned
print(format("121", "^20")) #         121         # width 20, centered
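The same format specifications also work inside formatted string literals (f-strings, available since Python 3.6). The text above does not cover f-strings, so treat this as a supplementary sketch with illustrative variable names:

```python
name = 'world'
pi = 3.1415926

greeting = f'hello, {name}'
rounded = f'{pi:.2f}'        # same spec as '{:.2f}'.format(pi)
padded = f'num{1:03d}'       # same result as 'num%03d' % 1
centered = f'{"121":^20}'    # same spec as format("121", "^20")

print(greeting)  # hello, world
print(rounded)   # 3.14
print(padded)    # num001
print(repr(centered))
```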

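Tuples and lists

A tuple is an immutable sequence type. Its elements may be of any type and are separated by commas.

A list is a mutable sequence type. Its elements may be of any type and are separated by commas.

# Tuples
a = (1, 2, 3)
t = tuple([1, 2, 3]) # build a tuple from any iterable with tuple() # (1, 2, 3)
t = tuple('hello') # ('h', 'e', 'l', 'l', 'o')

# Lists
b = [1, 2, 3]
l = list((1, 2, 3)) # build a list with list() # [1, 2, 3]

List comprehensions

Iterate over every element of a sequence and apply an operation to each: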

a = [1, 2, 3, 4, 5]
b = [i * 2 for i in a] # [2, 4, 6, 8, 10]

Iterate over only the elements that satisfy a condition and apply an operation to each:

a = [1, 2, 3, 4, 5]
b = [i * 2 for i in a if i > 2] # [6, 8, 10]

Basic sequence operations

len(s), max(s), min(s), sum(s)
Indexing: s[i]; slicing: s[i:j:k]

s='hello world'

print(s[0]) # h
print(s[-1]) # d
print(s[0:5]) # hello
print(s[0:5:2]) # hlo # step of 2
print(s[::2]) # hlowrd # step of 2
print(s[::-1]) # dlrow olleh # reverses the string

The in and not in operators

s='hello world'

print('hello' in s) # True
print('hello' not in s) # False

The + and * operators

s='hello world'

print(s + '!!!') # hello world!!!
print(s * 3) # hello worldhello worldhello world

The all(s) and any(s) functions

s=(1, 2, 0) # a tuple

print(all(s)) # False
print(any(s)) # True

Sorting

s='hello world'

print(sorted(s)) # [' ', 'd', 'e', 'h', 'l', 'l', 'l', 'o', 'o', 'r', 'w']
print(sorted(s, reverse=True)) # ['w', 'r', 'o', 'o', 'l', 'l', 'l', 'h', 'e', 'd', ' ']

Sequence unpacking

When the number of variables equals the number of elements:

a, b, c = [1, 2, 3]
print(a, b, c) # 1 2 3

When the counts differ, use * to collect the remaining elements into a list:

a, *b, c = [1, 2, 3, 4, 5]
print(a, b, c) # 1 [2, 3, 4] 5

Iterating over sequences

The map function

map(func, *iterables): applies func to each element of the iterable(s) and returns an iterator over the results.

def func(x):
    return x ** 2

for i in map(func, [1, 2, 3, 4, 5]):
    print(i) # 1 4 9 16 25

itertools.starmap(func, iterable): like map, but each element of the iterable is itself an iterable whose items are unpacked as the arguments to func.

import itertools

def func(x, y):
    return x ** y

for i in itertools.starmap(func, [(1, 2), (2, 3), (3, 4)]):
    print(i) # 1 8 81

The filter function

filter(func, iterable): applies func to each element and returns an iterator over only those elements for which func returns a true value.

def func(x):
    return x % 2 == 0

for i in filter(func, [1, 2, 3, 4, 5]):
    print(i) # 2 4

itertools.filterfalse(func, iterable): like filter, but returns the elements for which func returns a false value.

def func(x):
    return x % 2 == 0

for i in itertools.filterfalse(func, [1, 2, 3, 4, 5]):
    print(i) # 1 3 5

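The zip function

zip(*iterables): packs the corresponding elements of the iterables into tuples and returns an iterator; it stops when the shortest iterable is exhausted.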

for i in zip([1, 2, 3], [4, 5, 6]):
    print(i) # (1, 4) (2, 5) (3, 6)

itertools.zip_longest(*iterables, fillvalue=None): like zip, but continues until the longest iterable is exhausted; missing values from the shorter iterables are filled with fillvalue.

for i in itertools.zip_longest([1, 2, 3], [4, 5, 6, 7], fillvalue=0):
    print(i) # (1, 4) (2, 5) (3, 6) (0, 7)

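Iterables

An object that implements the __iter__ method is iterable. An iterator is an object that additionally implements __next__ and keeps track of its position.

__iter__: returns an iterator
__next__: returns the next element of the iterator and raises StopIteration when it is exhausted
__reversed__: returns a reverse iterator

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __iter__(self):
        # Return a fresh iterator on every call, which makes Person iterable
        return iter([self.name, self.age])

    def __reversed__(self):
        return reversed([self.name, self.age])

p = Person('Tom', 18)
for i in p:
    print(i) # Tom 18

# next() needs an iterator, so obtain one from the iterable first;
# the iterator, not Person, remembers the current position
it = iter(p)
print(next(it)) # Tom
print(next(it)) # 18

for i in reversed(p):
    print(i) # 18 Tom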

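Iterators

itertools.count(start=0, step=1): counts from start in increments of step, forever
itertools.cycle(iterable): repeats the elements of iterable endlessly
itertools.repeat(object, times=None): repeats object times times, or endlessly if times is None
itertools.accumulate(iterable, func=None): yields the running results of applying the two-argument function func cumulatively; if func is None, yields running sums
itertools.chain(*iterables): joins several iterables end to end and returns an iterator
itertools.islice(iterable, start, stop, step=None): slices an iterable lazily and returns an iterator
itertools.groupby(iterable, key=None): groups consecutive elements that share the same key; returns an iterator of (key, group) tuples, where group is itself an iterator over the elements of that group
itertools.tee(iterable, n=2): splits one iterable into n independent iterators, returned as a tuple
itertools.combinations(iterable, r): returns an iterator over all r-length combinations of the elements, each as a tuple, with no element repeated
itertools.combinations_with_replacement(iterable, r): like combinations, but an element may appear more than once in a tuple
itertools.permutations(iterable, r=None): returns an iterator over all r-length permutations (orderings) of the elements; r defaults to the length of the iterable
itertools.product(*iterables, repeat=1): computes the Cartesian product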

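import itertools

# count() and cycle() never terminate, so take a finite slice with islice
for i in itertools.islice(itertools.count(1, 2), 5):
    print(i) # 1 3 5 7 9

for i in itertools.islice(itertools.cycle([1, 2, 3]), 6):
    print(i) # 1 2 3 1 2 3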

for i in itertools.repeat(1, 3):
    print(i) # 1 1 1

for i in itertools.accumulate([1, 2, 3, 4, 5]):
    print(i) # 1 3 6 10 15

for i in itertools.chain([1, 2, 3], [4, 5, 6]):
    print(i) # 1 2 3 4 5 6

for i in itertools.islice([1, 2, 3, 4, 5], 1, 4, 2):
    print(i) # 2 4

for key, group in itertools.groupby('AAABBBCCAAA'):
    print(key, list(group)) # A ['A', 'A', 'A'] B ['B', 'B', 'B'] C ['C', 'C'] A ['A', 'A', 'A']

for i in itertools.tee([1, 2, 3, 4, 5], 2):
    print(list(i)) # [1, 2, 3, 4, 5] [1, 2, 3, 4, 5]

for i in itertools.combinations([1, 2, 3, 4, 5], 3):
    print(i) # (1, 2, 3) (1, 2, 4) (1, 2, 5) (1, 3, 4) (1, 3, 5) (1, 4, 5) (2, 3, 4) (2, 3, 5) (2, 4, 5) (3, 4, 5)

for i in itertools.combinations_with_replacement([1, 2, 3], 2):
    print(i) # (1, 1) (1, 2) (1, 3) (2, 2) (2, 3) (3, 3)

for i in itertools.permutations([1, 2, 3],2):
    print(i) # (1, 2) (1, 3) (2, 1) (2, 3) (3, 1) (3, 2)

for i in itertools.product([1, 2, 3], [4, 5, 6]):
    print(i) # (1, 4) (1, 5) (1, 6) (2, 4) (2, 5) (2, 6) (3, 4) (3, 5) (3, 6)
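Because groupby only merges adjacent elements, input is usually sorted by the same key first. The sketch below uses a made-up word list to show the difference:

```python
import itertools

words = ['apple', 'bob', 'cat', 'dance', 'egg']

# groupby only merges adjacent items, so sort by the same key first
by_length = sorted(words, key=len)
groups = {k: list(g) for k, g in itertools.groupby(by_length, key=len)}
print(groups)  # {3: ['bob', 'cat', 'egg'], 5: ['apple', 'dance']}

# Without sorting, equal keys that are not adjacent form separate groups
unsorted_keys = [k for k, _ in itertools.groupby(words, key=len)]
print(unsorted_keys)  # [5, 3, 5, 3]
```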

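Complex Data Structures

The array module

Arrays in Python are provided by the array class in the array module. All elements of an array must have the same type, which is fixed by a type code when the array is created. Only basic values such as integers, floating-point numbers, and characters are supported, so an array cannot hold lists, dicts, or arbitrary objects.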

import array

a = array.array('i', [1, 2, 3, 4, 5]) # create an array of signed ints
print(a) # array('i', [1, 2, 3, 4, 5])

# A type code is required as the first argument; omitting it raises an error
a = array.array([1, 2, 3, 4, 5]) # TypeError: array() argument 1 must be a unicode character, not list

Type code	C type	Python type	Minimum size in bytes
'b'	signed char	int	1
'B'	unsigned char	int	1
'i'	signed int	int	2
'I'	unsigned int	int	2
'l'	signed long	int	4
'L'	unsigned long	int	4
'f'	float	float	4
'd'	double	float	8
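The sizes in the table are minimums; the actual size depends on the platform. A short sketch for inspecting it and for seeing the type restriction in action:

```python
import array

ints = array.array('i', [1, 2, 3])
doubles = array.array('d', [1.0, 2.5])

# itemsize reports the actual size in bytes on this platform
print(ints.typecode, ints.itemsize)        # 'i' is at least 2 bytes, commonly 4
print(doubles.typecode, doubles.itemsize)  # 'd' is at least 8 bytes

# Values of the wrong type are rejected
rejected = False
try:
    ints.append(1.5)
except TypeError as exc:
    rejected = True
    print('TypeError:', exc)

print(ints.tolist())  # [1, 2, 3]
```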

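The collections module

The collections module provides specialized container types to choose from as needed.

deque objects

A deque is a double-ended queue that supports adding and removing elements at both ends. Appends and pops from either end are thread-safe.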

from collections import deque

d = deque('hello')

d.append('w') # append at the right end
d.appendleft('w') # append at the left end
print(d) # deque(['w', 'h', 'e', 'l', 'l', 'o', 'w'])
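Beyond append and appendleft, a deque also removes elements from both ends, rotates, and can be bounded. A minimal sketch with illustrative values:

```python
from collections import deque

d = deque([1, 2, 3])
d.append(4)          # deque([1, 2, 3, 4])
d.appendleft(0)      # deque([0, 1, 2, 3, 4])
right = d.pop()      # 4
left = d.popleft()   # 0
print(left, right, d)  # 0 4 deque([1, 2, 3])

d.rotate(1)          # rotate one step to the right
print(d)             # deque([3, 1, 2])

# With maxlen set, appending to a full deque drops items from the other end
recent = deque(maxlen=3)
for n in range(5):
    recent.append(n)
print(recent)        # deque([2, 3, 4], maxlen=3)
```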

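defaultdict objects

A defaultdict is a dict subclass that takes a default factory. When a missing key is accessed, the factory is called to create a default value, which is stored under that key and returned.

from collections import defaultdict

d = defaultdict(int) # int is the default factory, so missing keys get int() == 0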
d['a'] = 1
print(d['a']) # 1
print(d['b']) # 0
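A common use is grouping values without checking whether the key already exists. The sketch below uses a made-up word list and also shows that reading a missing key inserts it:

```python
from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

groups = defaultdict(list)   # missing keys start as an empty list
for word in words:
    groups[word[0]].append(word)
print(dict(groups))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# Reading a missing key inserts it; use "in" or get() to look without inserting
before = 'z' in groups       # False
looked_up = groups.get('z')  # None, and nothing is inserted
created = groups['z']        # [] is created and stored
after = 'z' in groups        # True
print(before, looked_up, created, after)  # False None [] True
```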

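OrderedDict objects

An OrderedDict is a dict subclass that remembers the order in which keys were inserted. Since Python 3.7 the built-in dict also preserves insertion order, but OrderedDict still offers extras such as move_to_end() and order-sensitive equality.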

from collections import OrderedDict

d = OrderedDict()

d['b'] = 2
d['c'] = 3
d['a'] = 1

print(d) # OrderedDict([('b', 2), ('c', 3), ('a', 1)])

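ChainMap objects

A ChainMap groups several dicts into a single, logically merged view. Lookups search the underlying mappings in order and return the first match.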

from collections import ChainMap

m1 = {'a': 1, 'b': 2}
m2 = {'a': 3, 'x': 4, 'y': 5}
m = ChainMap(m1, m2)

print(m['a']) # 1
print(m.maps) # [{'a': 1, 'b': 2}, {'a': 3, 'x': 4, 'y': 5}]

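# parents returns a new ChainMap containing every mapping except the first
print(m.parents) # ChainMap({'a': 3, 'x': 4, 'y': 5})

# new_child() returns a new ChainMap with a new empty dict in front of all current mappings
print(m.new_child()) # ChainMap({}, {'a': 1, 'b': 2}, {'a': 3, 'x': 4, 'y': 5})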

Using ChainMap

For example, letting user-specified command-line arguments take precedence over environment variables, which in turn take precedence over defaults:

import os, argparse
from collections import ChainMap

# Default values:
defaults = {'color':'red','user':'guest'}

parser = argparse.ArgumentParser()
parser.add_argument('-u','--user')
parser.add_argument('-c','--color')

namespace = parser.parse_args()

# Keep only the command-line arguments that were actually supplied:
command_line_args = { k: v for k, v in vars(namespace).items() if v }
combined = ChainMap(command_line_args, os.environ, defaults)

# Print the resolved values:
print('color=%s' % combined['color']) # taken from the command line first
print('user=%s' % combined['user'])

Counter objects

A Counter is a dict subclass for counting hashable items, for example the occurrences of each character in a string.

from collections import Counter

c = Counter('hello world')
print(c) # Counter({'l': 3, 'o': 2, 'h': 1, 'e': 1, ' ': 1, 'w': 1, 'r': 1, 'd': 1})

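elements() returns an iterator that repeats each element as many times as its count
most_common([n]) returns the n most common elements and their counts
subtract([iterable-or-mapping]) subtracts the counts of the elements in an iterable or mapping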

from collections import Counter

c = Counter('hello world')
print(list(c.elements())) # ['h', 'e', 'l', 'l', 'l', 'o', 'o', ' ', 'w', 'r', 'd']
print(c.most_common(2)) # [('l', 3), ('o', 2)]
c.subtract('hello')
print(c) # Counter({'l': 1, 'o': 1, ' ': 1, 'w': 1, 'r': 1, 'd': 1, 'h': 0, 'e': 0})

namedtuple objects

A namedtuple is a tuple subclass whose elements can be accessed by name as well as by index.

from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)
print(p.x) # 1
print(p.y) # 2
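A namedtuple still behaves like a tuple and adds a few helper methods. This is a brief sketch; the printed form of _asdict() assumes Python 3.8 or later, where it returns a plain dict:

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)

x, y = p                  # unpacks like a regular tuple
print(p[0], x, y)         # 1 1 2

moved = p._replace(x=10)  # namedtuples are immutable; _replace returns a new one
print(moved)              # Point(x=10, y=2)
print(p._asdict())        # {'x': 1, 'y': 2} on Python 3.8+
print(Point._fields)      # ('x', 'y')
```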

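Sets

A set is an unordered collection of unique elements (it is not a sequence). Create a set with braces {} or with the set() function; note that an empty {} creates a dict, so use set() for an empty set.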

s = {1, 2, 3, 4, 5}
s = set([1, 2, 3, 4, 5])

# Duplicate elements are discarded
s = {1, 2, 3, 4, 5, 5, 5}
print(s) # {1, 2, 3, 4, 5}

# Set elements must be hashable (in practice, immutable)
s = {[1, 2], 3, 4, 5} # TypeError: unhashable type: 'list'

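Basic set operations

Set comprehensions
Membership tests with the in and not in operators
Intersection, union, difference, and symmetric difference: &, |, -, ^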

s1 = {1, 2, 3, 4, 5}
s2 = {4, 5, 6, 7, 8}

print(s1 & s2) # {4, 5}
print(s1 | s2) # {1, 2, 3, 4, 5, 6, 7, 8}
print(s1 - s2) # {1, 2, 3}

# Symmetric difference, equivalent to (s1 - s2) | (s2 - s1)
print(s1 ^ s2) # {1, 2, 3, 6, 7, 8}
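Set comprehensions and membership tests, listed above, can be sketched as follows with an illustrative word list:

```python
words = ['apple', 'bob', 'cat', 'apple']

lengths = {len(w) for w in words}   # set comprehension; duplicates collapse
print(lengths == {3, 5})            # True
print(3 in lengths)                 # True
print(4 not in lengths)             # True

empty = set()                       # {} would create an empty dict, not a set
print(type(empty).__name__, type({}).__name__)  # set dict
```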

isdisjoint、issubset、issuperset、union、intersection、difference、symmetric_difference

s1 = {1, 2, 3, 4, 5}
s2 = {4, 5, 6, 7, 8}

# Do the sets have no elements in common?
print(s1.isdisjoint(s2)) # False

# Is s1 a subset of s2?
print(s1.issubset(s2)) # False

# Is s1 a superset of s2?
print(s1.issuperset(s2)) # False

# Union
print(s1.union(s2)) # {1, 2, 3, 4, 5, 6, 7, 8}

# Intersection
print(s1.intersection(s2)) # {4, 5}

# Difference
print(s1.difference(s2)) # {1, 2, 3}

# Symmetric difference
print(s1.symmetric_difference(s2)) # {1, 2, 3, 6, 7, 8}

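The <, <=, >, >= operators

s1 = {1, 2, 3, 4, 5}
s2 = {4, 5, 6, 7, 8}

# Is s1 a proper subset of s2?
print(s1 < s2) # False

# Is s1 a proper superset of s2?
print(s1 > s2) # False

# Is s1 a subset of s2 (equal sets allowed)?
print(s1 <= s2) # False

# Is s1 a superset of s2 (equal sets allowed)?
print(s1 >= s2) # False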

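Dictionaries / Mappings

A dict is a mapping type whose items are key-value pairs. Keys must be hashable (in practice, immutable); values may be of any type.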

d = {'name': 'Tom', 'age': 18}
d = dict(name='Tom', age=18)

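Creating dicts

dict(**kwargs) creates a dict from keyword arguments
dict(mapping, **kwargs) creates a dict from a mapping plus keyword arguments
dict(iterable, **kwargs) creates a dict from an iterable of key-value pairs plus keyword arguments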

d = dict(name='Tom', age=18) # {'name': 'Tom', 'age': 18}
d = dict([('name', 'Tom'), ('age', 18)]) # {'name': 'Tom', 'age': 18}
d = dict({'name': 'Tom', 'age': 18}) # {'name': 'Tom', 'age': 18}

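Basic dict operations

len(d) returns the number of items in the dict
d[key] returns the value stored under key (raises KeyError if key is missing)
d[key] = value sets the value stored under key to value
del d[key] removes the item whose key is key
Iterating over a dict
d.items(), d.keys(), and d.values() return view objects, all of which are iterable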
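The operations listed above can be sketched as follows. The city and email values are hypothetical and used only for illustration:

```python
d = {'name': 'Tom', 'age': 18}

size = len(d)
print(size)          # 2
print(d['name'])     # Tom
d['city'] = 'Paris'  # add a new key (hypothetical value)
d['age'] = 19        # overwrite an existing key
del d['city']        # remove a key; raises KeyError if it is missing

for key, value in d.items():
    print(key, value)   # name Tom, then age 19

keys = d.keys()      # a view: it reflects later changes to d
d['email'] = 'tom@example.com'  # hypothetical value
print(list(keys))    # ['name', 'age', 'email']
print(d.get('phone', 'n/a'))    # n/a: get() returns a default instead of raising
```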

Dict comprehensions

Iterate over all key-value pairs of a dict:

d = {'name': 'Tom', 'age': 18}
d = {k: v for k, v in d.items()} # {'name': 'Tom', 'age': 18}

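Iterate over only the key-value pairs that satisfy a condition:

d = {'name': 'Tom', 'age': 18}
# Check the type first: comparing the str 'Tom' with an int would raise TypeError
d = {k: v for k, v in d.items() if isinstance(v, int) and v >= 18} # {'age': 18}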


[Python Basics] 2. Python Data Structures