A Spark application has a 128 GB DataFrame A and a 1 GB DataFrame B. If a broadcast join were to be performed on these two DataFrames, which of the following describes which DataFrame should be broadcasted and why?
A. Either DataFrame can be broadcasted. The results will be identical in both output and efficiency.
B. DataFrame B should be broadcasted because it is smaller and will eliminate the need for the shuffling of itself.
C. DataFrame A should be broadcasted because it is larger and will eliminate the need for the shuffling of DataFrame B.
D. DataFrame B should be broadcasted because it is smaller and will eliminate the need for the shuffling of DataFrame A.
E. DataFrame A should be broadcasted because it is smaller and will eliminate the need for the shuffling of itself.
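To make the mechanics behind the options concrete, here is a minimal sketch of a broadcast hash join in plain Python. It is a toy model, not Spark code: the function name `broadcast_hash_join`, the list-of-lists stand-in for partitions, and the sample rows are all hypothetical. It only illustrates the general idea that one side is copied in full to wherever the other side's partitions already live, so the rows of the partitioned side never move.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_rows, key):
    """Toy model of a broadcast hash join (plain Python, not the Spark API)."""
    # Build a hash table from the small side once. In a real cluster this
    # table is what would be shipped in full to every executor.
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    joined_partitions = []
    for partition in large_partitions:
        # Each partition of the large side is joined in place: its rows are
        # matched against the local copy of the small table and are never
        # redistributed across partitions (no shuffle of the large side).
        joined = [
            {**left, **right}
            for left in partition
            for right in lookup.get(left[key], [])
        ]
        joined_partitions.append(joined)
    return joined_partitions


# Hypothetical data: two partitions standing in for the large DataFrame A,
# and a small list of rows standing in for DataFrame B.
a_partitions = [
    [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}],
    [{"id": 1, "amount": 30}, {"id": 3, "amount": 40}],
]
b_rows = [{"id": 1, "name": "x"}, {"id": 2, "name": "y"}]

result = broadcast_hash_join(a_partitions, b_rows, "id")
```

The output keeps the same number of partitions as the large input, and each output partition contains only rows that originated in the matching input partition. In PySpark itself, the hint that requests this strategy is `broadcast(df)` from `pyspark.sql.functions`, applied to the DataFrame that is to be copied to the executors.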