Articles for tag hadoop
Ignoring Corrupted gzip files in Hadoop
Handle corrupted gzipped log files in Hadoop gracefully. Normally, a corrupted hourly log file causes an `EOFException` that aborts the whole MapReduce job. This post presents an improved `FileInputFormat` that extends Hadoop's default one and uses a custom `LineRecordReader` to handle the exception, so one bad file no longer stops the job from processing the rest of its input.
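
To make the idea concrete without pulling in Hadoop, here is a minimal stand-alone sketch that uses only `java.util.zip`. It is not the post's `LineRecordReader`. The class and method names (`TolerantGzipLines`, `readLines`, `gzip`) are hypothetical and exist only for this illustration. It reads a gzip stream, treats an `EOFException` as the end of the input rather than a fatal error, and keeps the complete lines decoded up to that point. A custom record reader could plausibly apply the same pattern by reporting "no more records" when the exception occurs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration class; not part of Hadoop or of the original post.
public class TolerantGzipLines {

    // Returns every complete line that can be decompressed before the stream
    // ends, whether it ends cleanly or is cut short.
    static List<String> readLines(InputStream compressed) {
        ByteArrayOutputStream decoded = new ByteArrayOutputStream();
        boolean truncated = false;
        try (GZIPInputStream in = new GZIPInputStream(compressed)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                decoded.write(buf, 0, n);
            }
        } catch (EOFException e) {
            // The gzip data ended early: stop reading instead of failing.
            truncated = true;
        } catch (IOException e) {
            // Any other I/O problem is still treated as an error.
            throw new UncheckedIOException(e);
        }

        String text = new String(decoded.toByteArray(), StandardCharsets.UTF_8);
        if (truncated) {
            // Text after the last newline may be a partial record; discard it.
            text = text.substring(0, text.lastIndexOf('\n') + 1);
        }
        if (text.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(text.split("\n"));
    }

    // Helper for the demo: gzip a string in memory.
    static byte[] gzip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            log.append("record-").append(i).append('\n');
        }
        byte[] full = gzip(log.toString());
        // Simulate a corrupted file by cutting the compressed bytes in half.
        byte[] cut = Arrays.copyOf(full, full.length / 2);

        System.out.println("intact file:    " + readLines(new ByteArrayInputStream(full)).size() + " lines");
        System.out.println("truncated file: " + readLines(new ByteArrayInputStream(cut)).size() + " lines");
    }
}
```

The sketch deliberately catches only `EOFException`, the failure named above. Whether other decompression errors should also be swallowed is a separate design decision, because silently dropping data can hide real problems. Logging the affected file name when this happens is a sensible precaution.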