
#codecs

Page 1 of 1

Reading compressed data with Spark using unknown file extensions

#spark, #codecs, #data | 2 Oct 2015

This post could also be called "Reading .gz.tmp files with Spark". At Socialmetrix we have several pipelines writing logs to AWS S3. Sometimes Apache Flume fails in its last phase, renaming the final archive from .gz.tmp to .gz, so those files cannot be read with the SparkContext.textFile API, which picks the decompression codec from the file extension. This post presents our workaround for processing those files.
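To make the problem concrete, here is a minimal PySpark sketch of one possible way around it. It is not necessarily the workaround the post itself describes. It assumes an existing SparkContext passed in as `sc`, files small enough to fit in executor memory, and UTF-8 log text. The helper names `decompress_lines` and `read_gz_tmp` and the bucket path in the docstring are hypothetical.

```python
import gzip


def decompress_lines(payload: bytes, encoding: str = "utf-8") -> list:
    """Gunzip one file's raw bytes and split the decoded text into lines."""
    return gzip.decompress(payload).decode(encoding).splitlines()


def read_gz_tmp(sc, path_glob):
    """Return an RDD of text lines from the gzip files matched by path_glob.

    sc is an existing pyspark SparkContext (assumed to be created elsewhere);
    path_glob is a hypothetical pattern such as "s3a://my-bucket/logs/*.gz.tmp".
    """
    # binaryFiles yields one (file_path, file_bytes) pair per file, so the
    # file extension plays no part in how the content is decoded.
    return sc.binaryFiles(path_glob).flatMap(lambda kv: decompress_lines(kv[1]))
```

Because each file is loaded whole, this only suits modest file sizes, and an archive that was cut off mid-write would make `gzip.decompress` raise an error rather than return partial lines.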

Mutable Ideas 2012-2020