Spark SQL DataFrame Self Join and Example - DWgeek.com
Nov 16, 2019 · A self join is a join in which a DataFrame is joined to itself. It is used to identify child/parent relationships stored in the same data. In Spark, you can perform a self join in two ways: use the DataFrame API to join the DataFrame to itself, or write a Hive-style self join query and execute it with Spark SQL. Let us look at these two methods in detail. Spark SQL DataFrame Self Join: in this method, we use the DataFrame API to perform the self join, i.e. join the DataFrame to itself.
pyspark.sql.DataFrame.join — PySpark 3.3.1 documentation
DataFrame.join(other: pyspark.sql.dataframe.DataFrame, on: Union[str, List[str], pyspark.sql.column.Column, List[pyspark.sql.column.Column], None] = None, how: Optional[str] = None) → pyspark.sql.dataframe.DataFrame
Joins with another DataFrame, using the given join expression. New in version 1.3.0.
Parameters: other: DataFrame