python - PySpark ReduceByKey - Stack Overflow
stackoverflow.com › questions › 32038685
Then pass this function to the reduceByKey method: object.reduceByKey(func). Per the comments, the OP actually has a list of RDD objects (not a single RDD object). In that case, you can convert each RDD object to a Python list by calling .collect(), apply the reduce logic on the driver, and then decide whether you want the result as a Python dictionary or as an RDD object.
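A minimal sketch of the approach the answer describes, assuming a hypothetical list rdds of small (key, value) pair RDDs whose contents fit in driver memory:

# `rdds` is a hypothetical list of small (key, value) pair RDDs.
from collections import defaultdict
from operator import add
from pyspark import SparkContext

sc = SparkContext.getOrCreate()
rdds = [sc.parallelize([("a", 1), ("b", 2)]), sc.parallelize([("a", 3)])]

# Option 1: pull each RDD to the driver with .collect() and merge locally
# (assumes numeric values), ending with a plain Python dictionary.
merged = defaultdict(int)
for rdd in rdds:
    for key, value in rdd.collect():
        merged[key] += value
result_dict = dict(merged)                    # {'a': 4, 'b': 2}

# Option 2: stay distributed by unioning the RDDs first, ending with an RDD.
result_rdd = sc.union(rdds).reduceByKey(add)

Collecting is only safe when each RDD is small; the union route keeps the merge on the cluster.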
pyspark.RDD.reduceByKey — PySpark 3.3.1 documentation
spark.apache.org › pyspark
RDD.reduceByKey(func: Callable[[V, V], V], numPartitions: Optional[int] = None, partitionFunc: Callable[[K], int] = <function portable_hash>) → pyspark.rdd.RDD[Tuple[K, V]]
Merge the values for each key using an associative and commutative reduce function. This will also perform the merging locally on each mapper before sending results to a reducer, similarly to a “combiner” in MapReduce.
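A short word-count style example of the documented signature; the input pairs are illustrative:

from operator import add
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# Values for each key are merged with `add`, which is associative and
# commutative; Spark pre-aggregates on each partition before the shuffle.
pairs = sc.parallelize([("a", 1), ("b", 1), ("a", 1)])
print(sorted(pairs.reduceByKey(add).collect()))   # [('a', 2), ('b', 1)]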