A Small PySpark Example

Published: 2023-08-12 17:39:23  Author: steve.z

#
#   py_pyspark_demo.py
#   py_learn
#
#   Created by Z. Steve on 2023/8/12 15:33.
#


# Count how many times each word occurs in a file

# 1. Import the library
from pyspark import SparkConf, SparkContext

# 2. Create the SparkConf and SparkContext objects
conf = SparkConf().setMaster("local[*]").setAppName("spark_demo")
sc = SparkContext(conf=conf)

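# 3. Read the text file and split each line into words
rdd = sc.textFile("/Users/stevexhz/PycharmProjects/py_learn/content.txt")
# split() with no argument splits on any whitespace and skips blank lines,
# so empty strings are not counted as words
word_list_rdd = rdd.flatMap(lambda line: line.split())

# Pair each word with 1, then sum the 1s per word
word_group_rdd = word_list_rdd.map(lambda word: (word, 1))
result_rdd = word_group_rdd.reduceByKey(lambda a, b: a + b)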

# 4. Output the (word, count) pairs, then release the SparkContext
print(result_rdd.collect())
sc.stop()
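The flatMap, map and reduceByKey steps in the script above can be traced by hand on a small in-memory list. The following is only a plain-Python sketch of what those three transformations compute, not how Spark executes them (Spark distributes the work across partitions and does not guarantee the order of the result). The names `word_count` and `sample_lines` are made up for this illustration. The sketch uses `split()` with no argument, so blank lines contribute no tokens; `split(" ")` would yield one empty string for each blank line.

```python
from collections import defaultdict
from itertools import chain


def word_count(lines):
    """Mimic flatMap -> map -> reduceByKey on an in-memory list of lines."""
    # flatMap: split every line, then flatten all pieces into one sequence
    words = chain.from_iterable(line.split() for line in lines)
    # map: pair every word with an initial count of 1
    pairs = ((word, 1) for word in words)
    # reduceByKey: combine the values that share a key using a + b
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] = counts[word] + n
    return dict(counts)


# Hypothetical sample input, loosely based on the first lines of content.txt
sample_lines = ["hello", "welcome", "", "hello", "today is a great day."]
result = word_count(sample_lines)
print(result)
```

Note that "day." keeps its trailing period here, just as it would in the Spark version, because neither one strips punctuation.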


'''
content.txt

hello
welcome
to
our
country
significant
vulnerable
hurl
hello
welcome
to
our
country
significant
vulnerable
hurl
today is a great day.
and everybody should be here.
Vice President Kamala Harris is scheduled to visit Seattle on Tuesday to attend a political fundraiser and deliver a speech on the Biden administration’s actions to address climate change.

It’s expected that Harris will address provisions of the Inflation Reduction Act, signed into law by President Joe Biden after the vice president cast her tie-breaking vote last August in a divided Senate.

The legislation allocates nearly $375 billion over the next decade for climate-changing measures, including tax credits for clean energy manufacturing and production, and for consumer investments in electric vehicles and wind and solar power.

The act also aimed to lower prescription drug costs, provide more funding for the Internal Revenue Service, and impose a new corporate minimum tax while – say supporters – paying down the federal deficit over time. During the Senate vote, only Democrats favored the bill; Republicans were equally opposed.

During multiple appearances across the nation this month, when Congress is in recess, Harris and the president have been touting the benefits of the legislation and the Bipartisan Infrastructure Law.

Second Gentleman Doug Emhoff is expected to join the vice president on Tuesday’s visit. The Seattle Times reported that Harris will also headline a high-priced political fundraising luncheon co-hosted by Microsoft president Brad Smith and his wife, Kathy Surace-Smith, along with other Microsoft executives and community, business, and civic leaders.


'''
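The sample text mixes capital letters and punctuation, so splitting on whitespace alone counts tokens such as "day." and "Tuesday." with the punctuation attached and treats "The" and "the" as different words. One possible way to tidy this up, sketched below, is a small tokenizer that lowercases each line and keeps only runs of letters and digits, allowing an apostrophe or hyphen inside a word. The function name `tokenize`, the regular expression, and the sample lines are assumptions for this illustration and are not part of the original script.

```python
import re
from collections import Counter

# A word is a run of letters/digits, optionally joined by ', ’ or - to more runs
WORD_RE = re.compile(r"[a-z0-9]+(?:['’-][a-z0-9]+)*")


def tokenize(line):
    """Return the lowercase words of a line with surrounding punctuation removed."""
    return WORD_RE.findall(line.lower())


# Hypothetical sample lines for a quick local check
sample = ["today is a great day.", "Today is Tuesday."]
counts = Counter(token for line in sample for token in tokenize(line))
print(counts)
```

Since `flatMap` accepts any function that returns an iterable, a helper like this could be passed in place of the lambda, for example `rdd.flatMap(tokenize)`.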