day07 SparkSQL: Generating the Physical Plan

Published 2023-04-03 22:54:30, author: 一y样

1. Example SQL

select A,B  from testdata2 where A>2

The corresponding execution plans:

== Analyzed Logical Plan ==
Project [A#23, B#24]
+- Filter (A#23 > 2)
   +- SubqueryAlias testdata2
      +- View (`testData2`, [a#23,b#24])
         +- SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#23, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#24]
            +- ExternalRDD [obj#22]

== Optimized Logical Plan ==
Project [A#23, B#24]
+- Filter (A#23 > 2)
   +- SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#23, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#24]
      +- ExternalRDD [obj#22]

== Physical Plan ==
Project [A#23, B#24]
+- Filter (A#23 > 2)
   +- SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#23, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#24]
      +- Scan[obj#22]

== executedPlan ==
*(1) Project [A#23, B#24]
+- *(1) Filter (A#23 > 2)
   +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#23, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#24]
      +- Scan[obj#22]
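
All four plans above can be read directly off the query's QueryExecution. A minimal sketch, assuming an active SparkSession named spark and that testdata2 is already registered as a temporary view (as in Spark's SQLTestData):

    val df = spark.sql("select A,B from testdata2 where A>2")
    val qe = df.queryExecution
    println(qe.analyzed)       // == Analyzed Logical Plan ==
    println(qe.optimizedPlan)  // == Optimized Logical Plan ==
    println(qe.sparkPlan)      // == Physical Plan ==: the first plan picked by the planner
    println(qe.executedPlan)   // == executedPlan ==: after preparation rules; the *(1) prefix marks whole-stage codegen stage 1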

 

1.1 Strategies applied to this SQL and the physical plans produced after each strategy runs

======================== Applied strategy: class org.apache.spark.sql.execution.SparkStrategies$SpecialLimits$ ============================
List(PlanLater Project [A#23, B#24])

======================== Applied strategy: class org.apache.spark.sql.execution.SparkStrategies$BasicOperators$ ============================
List(Project [A#23, B#24]
+- PlanLater Filter (A#23 > 2)
)

======================== Applied strategy: class org.apache.spark.sql.execution.SparkStrategies$BasicOperators$ ============================
List(Filter (A#23 > 2)
+- PlanLater SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#23, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#24]
)

======================== Applied strategy: class org.apache.spark.sql.execution.SparkStrategies$BasicOperators$ ============================
List(SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#23, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#24]
+- PlanLater ExternalRDD [obj#22]
)

======================== Applied strategy: class org.apache.spark.sql.execution.SparkStrategies$BasicOperators$ ============================
List(Scan[obj#22]
)
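
Reading the trace top-down: each strategy converts only the operator it recognizes and wraps the still-unplanned children in PlanLater placeholders, which the planner then plans recursively. SpecialLimits handles the ReturnAnswer wrapper at the root, and BasicOperators then plans Project, Filter, SerializeFromObject and finally the ExternalRDD scan. The sketch below is a simplified paraphrase, not the verbatim Spark source, of how QueryPlanner.plan resolves those placeholders; strategies and collectPlaceholders are the members QueryPlanner already provides:

    // Simplified paraphrase of QueryPlanner.plan: apply every strategy to the current
    // logical node, then recursively plan each PlanLater placeholder and splice the
    // resulting physical plan back into the candidate.
    def planNode(logical: LogicalPlan): Iterator[SparkPlan] = {
      val candidates = strategies.iterator.flatMap(_(logical))
      candidates.flatMap { candidate =>
        val placeholders = collectPlaceholders(candidate)  // pairs of (PlanLater node, its logical plan)
        if (placeholders.isEmpty) {
          Iterator(candidate)                              // no placeholder left: a complete physical plan
        } else {
          placeholders.iterator.foldLeft(Iterator(candidate)) {
            case (plansSoFar, (placeholder, logicalChild)) =>
              plansSoFar.flatMap { p =>
                planNode(logicalChild).map { childPhysical =>
                  p.transformUp { case node if node.eq(placeholder) => childPhysical }
                }
              }
          }
        }
      }
    }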

The code that prints each strategy that takes effect, together with the result it produces, is as follows:

    // Try every strategy on the current logical plan; print the strategies that
    // actually return physical plan candidates, then pass the candidates along unchanged.
    val candidates = strategies.iterator.flatMap { strategy =>
      val physicalPlans = strategy(plan)
      if (physicalPlans.nonEmpty) {
        println("======================== Applied strategy: " + strategy.getClass + " ============================")
        println(physicalPlans)
        println()
      }
      physicalPlans
    }
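
This hook wraps the candidate-collection step in QueryPlanner.plan (originally val candidates = strategies.iterator.flatMap(_(plan))), so the banners appear whenever a physical plan is generated. A small sketch of how to trigger it; sessionState and planner are internal, unstable APIs, so this assumes the code runs somewhere they are accessible, for example inside Spark's own test sources, as the SQLTestData references above suggest:

    // Forcing the physical plan is enough to fire the hook:
    val df = spark.sql("select A,B from testdata2 where A>2")
    df.queryExecution.sparkPlan   // planning runs here and the "Applied strategy" banners are printed

    // Equivalently, invoke the planner directly on the optimized logical plan, the way
    // QueryExecution does internally; ReturnAnswer wraps the root so that strategies
    // such as SpecialLimits can recognize it (hence the first entry in the trace).
    import org.apache.spark.sql.catalyst.plans.logical.ReturnAnswer
    val physical = spark.sessionState.planner.plan(ReturnAnswer(df.queryExecution.optimizedPlan)).next()
    println(physical)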

 

2. Details of how the code works:

2.1 Code entry point: