Spark - Print contents of RDD - TutorialKart
www.tutorialkart.com › spark-print-contents-of-rdd
An RDD (Resilient Distributed Dataset) is a fault-tolerant collection of elements that can be operated on in parallel. To print the contents of an RDD, use either the collect action or the foreach action. RDD.collect() returns all the elements of the dataset as an array to the driver program, and a for loop over that array then prints each element.
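The two approaches named in this snippet can be sketched in Scala roughly as follows. This is a minimal illustration, not code from the TutorialKart page: the object name, the application name, the local master setting, and the sample data are all assumptions made for the example.

```scala
import org.apache.spark.sql.SparkSession

object PrintRddContents {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("PrintRddContents")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical sample data.
    val rdd = sc.parallelize(Seq("spark", "rdd", "collect", "foreach"))

    // Option 1: collect() returns an Array on the driver; loop over it to print.
    val elements: Array[String] = rdd.collect()
    for (element <- elements) {
      println(element)
    }

    // Option 2: the foreach action runs the function on the executors.
    // In local mode the output shows up in this console; on a cluster it goes
    // to each executor's stdout rather than the driver's, and the order in
    // which elements are printed is not guaranteed.
    rdd.foreach(element => println(element))

    spark.stop()
  }
}
```

Because the foreach action executes on the executors, the collect-then-loop form is usually the one to reach for when the output has to appear on the driver.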
RDD Programming Guide - Spark 3.3.1 Documentation
spark.apache.org › docs › latest
To print all elements on the driver, one can use the collect() method to first bring the RDD to the driver node thus: rdd.collect().foreach(println). This can cause the driver to run out of memory, though, because collect() fetches the entire RDD to a single machine. If you only need to print a few elements of the RDD, a safer approach is to use take(): rdd.take(100).foreach(println).
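As a rough sketch of the take() approach described above, the following Scala program prints only a bounded number of elements on the driver. The object name, the `printSample` helper, the application name, the local master, and the size of the sample RDD are hypothetical choices for illustration and do not come from the Spark documentation.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object PrintRddSample {
  // Hypothetical helper: fetch at most `n` elements to the driver and print them.
  def printSample[T](rdd: RDD[T], n: Int): Unit =
    rdd.take(n).foreach(println)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("PrintRddSample")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical large RDD.
    val numbers = spark.sparkContext.parallelize(1 to 1000000)

    // Only the requested 100 elements are brought to the driver.
    printSample(numbers, 100)

    // By contrast, numbers.collect().foreach(println) would first ship all
    // 1,000,000 elements to the driver before printing anything.

    spark.stop()
  }
}
```

Bounding the number of fetched elements keeps driver memory use proportional to `n` rather than to the size of the whole RDD.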