Rename PySpark Result File

Due to the distributed nature of Apache Spark, we can't specify a name for the result file when writing results. Spark writes a directory of part files with generated names, which makes the output file name hard to predict, and I need a predictable name for my process orchestration. In my case, I need to write the result to S3, and I finally found a way to do this within a reasonable amount of time by using AWS Wrangler (awswrangler), pandas, and optionally Arrow. I basically feed the Spark DataFrame, by way of pandas, to AWS Wrangler and have it write to S3 under a specific name.

Here's a link to my sample: https://github.com/nik-yo/PySparkFilename
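
For a rough idea of the shape of this approach, here is a minimal sketch. It is not copied from the linked sample. The helper names `build_s3_path` and `write_named_result`, the CSV output format, and the bucket and key values are all assumptions made for illustration. It also assumes the result is small enough to collect onto the driver, since `toPandas()` pulls the whole DataFrame into driver memory.

```python
def build_s3_path(bucket: str, prefix: str, filename: str) -> str:
    """Join bucket, prefix, and filename into one S3 URI (hypothetical helper)."""
    parts = [p.strip("/") for p in (prefix, filename) if p.strip("/")]
    return f"s3://{bucket.strip('/')}/" + "/".join(parts)


def write_named_result(spark_df, bucket: str, prefix: str, filename: str) -> str:
    """Write a Spark DataFrame to S3 as a single CSV object with a chosen name.

    Hypothetical helper: assumes awswrangler is installed and AWS credentials
    are already configured in the environment.
    """
    # Imported here so build_s3_path can be used without awswrangler installed.
    import awswrangler as wr

    # Collects the full result onto the driver as a pandas DataFrame.
    pandas_df = spark_df.toPandas()

    path = build_s3_path(bucket, prefix, filename)

    # Without dataset mode, awswrangler writes one object at exactly this path.
    wr.s3.to_csv(df=pandas_df, path=path, index=False)
    return path
```

Calling `write_named_result(df, "my-bucket", "results", "report.csv")` would target `s3://my-bucket/results/report.csv`, where the bucket and key are placeholders. Arrow is the optional part: on Spark 3.x, setting `spark.sql.execution.arrow.pyspark.enabled` to `true` on the session before calling `toPandas()` can speed up the conversion to pandas.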
