python generator相关

发布时间 2023-03-28 14:52:08作者: saaspeter

本文的重点介绍python中的yield用法及这样的表达式:

comp_list = [x * 2 for x in range(10)]   -- List Comprehensions

(x ** 2 for x in range(10))  -- Generator Expressions

摘抄自: https://realpython.com/introduction-to-python-generators/  和 List Comprehensions in Python and Generator Expressions | Django Stars , 给自己看的,所以格式较乱

一、python generator

先看两个例子:

Example 1: Reading Large Files

A common use case of generators is to work with data streams or large files, like CSV files. These text files separate data into columns by using commas. This format is a common way to share data. Now, what if you want to count the number of rows in a CSV file? The code block below shows one way of counting those rows:

csv_gen = csv_reader("some_csv.txt")
row_count = 0

for row in csv_gen:
    row_count += 1

print(f"Row count is {row_count}")

Looking at this example, you might expect csv_gen to be a list. To populate this list, csv_reader() opens a file and loads its contents into csv_gen. Then, the program iterates over the list and increments row_count for each row.

This is a reasonable explanation, but would this design still work if the file is very large? What if the file is larger than the memory you have available? To answer this question, let’s assume that csv_reader() just opens the file and reads it into an array:

def csv_reader(file_name):
    file = open(file_name)
    result = file.read().split("\n")
    return result

This function opens a given file and uses file.read() along with .split() to add each line as a separate element to a list. If you were to use this version of csv_reader() in the row counting code block you saw further up, then you’d get the following output:

>>>
Traceback (most recent call last):
  File "ex1_naive.py", line 22, in <module>
    main()
  File "ex1_naive.py", line 13, in main
    csv_gen = csv_reader("file.txt")
  File "ex1_naive.py", line 6, in csv_reader
    result = file.read().split("\n")
MemoryError

In this case, open() returns a generator object that you can lazily iterate through line by line. However,  file.read().split() loads everything into memory at once, causing the MemoryError.

Before that happens, you’ll probably notice your computer slow to a crawl. You might even need to kill the program with a KeyboardInterrupt. So, how can you handle these huge data files? Take a look at a new definition of csv_reader():

def csv_reader(file_name):
    for row in open(file_name, "r"):
        yield row

In this version, you open the file, iterate through it, and yield a row. This code should produce the following output, with no memory errors:

Row count is 64186394

What’s happening here? Well, you’ve essentially turned csv_reader() into a generator function. This version opens a file, loops through each line, and yields each row, instead of returning it.

You can also define a generator expression (also called a generator comprehension), which has a very similar syntax to list comprehensions. In this way, you can use the generator without calling a function:

csv_gen = (row for row in open(file_name))

This is a more succinct way to create the list csv_gen. You’ll learn more about the Python yield statement soon. For now, just remember this key difference:

  • Using yield will result in a generator object.
  • Using return will result in the first line of the file only.

Example 2: Generating an Infinite Sequence

Let’s switch gears and look at infinite sequence generation. In Python, to get a finite sequence, you call range() and evaluate it in a list context:

>>>
>>> a = range(5)
>>> list(a)
[0, 1, 2, 3, 4]

Generating an infinite sequence, however, will require the use of a generator, since your computer memory is finite:

def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

This code block is short and sweet. First, you initialize the variable num and start an infinite loop. Then, you immediately yield num so that you can capture the initial state. This mimics the action of range().

After yield, you increment num by 1. If you try this with a for loop, then you’ll see that it really does seem infinite:

>>>
>>> for i in infinite_sequence():
...     print(i, end=" ")
...
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39 40 41 42
[...]
6157818 6157819 6157820 6157821 6157822 6157823 6157824 6157825 6157826 6157827
6157828 6157829 6157830 6157831 6157832 6157833 6157834 6157835 6157836 6157837
6157838 6157839 6157840 6157841 6157842
KeyboardInterrupt
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>

The program will continue to execute until you stop it manually.

Instead of using a for loop, you can also call next() on the generator object directly. This is especially useful for testing a generator in the console:

>>>
>>> gen = infinite_sequence()
>>> next(gen)
0
>>> next(gen)
1
>>> next(gen)
2
>>> next(gen)
3

Here, you have a generator called gen, which you manually iterate over by repeatedly calling next(). This works as a great sanity check to make sure your generators are producing the output you expect.

Understanding Generators

So far, you’ve learned about the two primary ways of creating generators: by using generator functions and generator expressions. You might even have an intuitive understanding of how generators work. Let’s take a moment to make that knowledge a little more explicit.

Generator functions look and act just like regular functions, but with one defining characteristic. Generator functions use the Python yield keyword instead of return. Recall the generator function you wrote earlier:

def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

This looks like a typical function definition, except for the Python yield statement and the code that follows it. yield indicates where a value is sent back to the caller, but unlike return, you don’t exit the function afterward.

Instead, the state of the function is remembered. That way, when next() is called on a generator object (either explicitly or implicitly within a for loop), the previously yielded variable num is incremented, and then yielded again. Since generator functions look like other functions and act very similarly to them, you can assume that generator expressions are very similar to other comprehensions available in Python.

 

二、List Comprehensions 

[x * 2 for x in range(10)], 返回的是list

三、 Generator Expressions
>>> gen_exp = (x ** 2 for x in range(10) if x % 2 == 0)
像这样的形式,没有用yield,但是返回的是generator