Read / Extract TAR File - Java Example

In this tutorial, we will explain how to extract the contents of a TAR file with a Java program. To read the TAR file, we will use the Apache Commons Compress library, so make sure commons-compress-1.4.1.jar is on your classpath. You will also need the Apache Commons IO library, commons-io-2.4.jar, on your classpath, as we will use it to write every extracted file from the TAR archive to disk. The high-level steps involved in reading a TAR file from a Java program are captured below:

Steps to Read / Extract a TAR File in Java



Note: Shapes with a glow are to be repeated for every entry in the TAR file.

This example is aimed at beginners who want to understand how to read the contents of a TAR file. Feel free to modify the program to handle directories and exceptions. Let us now get started with the step-by-step guide.


1. Read TAR File into TarArchiveInputStream


You use TarArchiveInputStream to read a TAR file as an InputStream. Once you have the TAR file as an object of this type, you can start processing its entries. The code segment to read a File into TarArchiveInputStream [org.apache.commons.compress.archivers.tar.TarArchiveInputStream] is provided below:
                /* Step - 1: Read File into TarArchiveInputStream */
                TarArchiveInputStream myTarFile=new TarArchiveInputStream(new FileInputStream(new File("tar_ball.tar")));


Note: tar_ball.tar is our sample TAR file. It contains two files, test_file_1.xml and ZipFiles.txt, as shown in the screenshot below:

Input TAR File for Extraction - Sample


2. Get Every File Name from TAR File


TarArchiveInputStream has a method, getNextTarEntry, with which you can read every entry in the file as an object of type TarArchiveEntry [org.apache.commons.compress.archivers.tar.TarArchiveEntry]. When no more entries are available inside the TAR file, this method returns null. So, we declare a while loop and read every entry inside our TAR file. To get the name of each individual file, use the getName method of the TarArchiveEntry class. We need the file name because we have to write the same file back to disk. The code segment is provided after we complete the other related steps.


3. Get Individual File Size


Here, we use the getSize method of the TarArchiveEntry class to get the size of the file. This is returned as a long value, and we have to create a byte array so that we can extract the required number of bytes (matching the file size inside the TAR ball) from the archive. Because Java array lengths are int values, we cast the long to int when creating the byte array. Note that this cast silently truncates for entries larger than Integer.MAX_VALUE bytes (about 2 GB), so this approach only suits small files. The code segment for steps (2) and (3) is shown below:
                while ((entry = myTarFile.getNextTarEntry()) != null) {
                        /* Get the name of the file */
                        individualFiles = entry.getName();
                        byte[] content = new byte[(int) entry.getSize()];
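
The (int) cast above wraps around without any warning when an entry is larger than Integer.MAX_VALUE bytes. As a minimal sketch of one way to guard against that, the helper below checks the size before casting. The class name EntrySizeCheck and the method toArraySize are hypothetical names chosen for illustration; they are not part of Apache Commons Compress.

```java
public class EntrySizeCheck {

    /**
     * Hypothetical helper: converts an entry size (a long) into an array
     * length, failing loudly instead of silently truncating.
     */
    static int toArraySize(long entrySize) {
        if (entrySize < 0 || entrySize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                    "Entry cannot be buffered in a byte array: " + entrySize + " bytes");
        }
        return (int) entrySize;
    }

    public static void main(String[] args) {
        // Size of the first sample file used in this tutorial
        byte[] content = new byte[toArraySize(11357L)];
        System.out.println("Byte Array length: " + content.length);

        // A plain cast wraps around for a 3 GB entry
        long threeGigabytes = 3L * 1024 * 1024 * 1024;
        System.out.println("Plain cast gives: " + (int) threeGigabytes); // -1073741824
    }
}
```

Inside the loop, the array could then be created with new byte[toArraySize(entry.getSize())], so an oversized entry stops the program with a clear message rather than producing a wrong array length.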



4. Extract File Based on Size Offset


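We now know the length in bytes of the current file inside the archive. We can extract the file into the byte array we created by using the read method of TarArchiveInputStream. We pass the byte array, the offset in the array at which to start storing data, and the maximum number of bytes to read. This call reads the entry's data from the archive into the byte array. Keep in mind that read returns the number of bytes actually read, which can be fewer than requested, so robust code repeats the call until the array is full. The basic call is shown below: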

                        myTarFile.read(content, offset, content.length - offset);
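
A single call to read is allowed to return fewer bytes than requested, which would leave the tail of the array filled with zeros. The sketch below shows one way to repeat the call until the array is full. It works on any java.io.InputStream, so it uses a small in-memory stream as a stand-in for the TAR stream. The class name ReadFullyExample and the method readFully are hypothetical names for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ReadFullyExample {

    /** Keeps calling read until the array is full; fails if the stream ends early. */
    static void readFully(InputStream in, byte[] content) throws IOException {
        int offset = 0;
        while (offset < content.length) {
            int bytesRead = in.read(content, offset, content.length - offset);
            if (bytesRead == -1) {
                throw new EOFException(
                        "Stream ended after " + offset + " of " + content.length + " bytes");
            }
            offset += bytesRead;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] source = "<xml>sample entry</xml>".getBytes(StandardCharsets.UTF_8);

        // Stand-in stream that hands out at most 4 bytes per read call
        InputStream trickle = new ByteArrayInputStream(source) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 4));
            }
        };

        byte[] content = new byte[source.length];
        readFully(trickle, content);
        System.out.println(new String(content, StandardCharsets.UTF_8));
    }
}
```

In the tutorial's program, the same loop can be applied to myTarFile, since TarArchiveInputStream is an InputStream.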



5. Write Every Extracted File to Disk


We now have a byte array, and we know the file name obtained in step 2. We need to write the byte array to a physical file on disk. To do this, we use the IOUtils class (org.apache.commons.io.IOUtils). It has a write method that takes a byte array and an OutputStream and writes the contents of the byte array to that stream. Easy.
                        outputFile=new FileOutputStream(new File(individualFiles));
                        IOUtils.write(content,outputFile);              
                        outputFile.close();
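
The snippet above writes every entry as a file in the current folder, so directory entries and files inside sub-folders are not handled. As a rough sketch of one way to prepare the output location first, the helper below creates the folders an entry needs and rejects names that would end up outside the target folder. The class name EntryTargetExample, the method prepareTarget, and the destDir parameter are hypothetical and only serve this illustration.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class EntryTargetExample {

    /**
     * Hypothetical helper: maps an entry name to a file under destDir.
     * Directory entries are created on disk; for file entries only the
     * parent folders are created and the file itself is left to the caller.
     */
    static File prepareTarget(File destDir, String entryName, boolean isDirectory)
            throws IOException {
        File target = new File(destDir, entryName);

        // Reject names such as "../evil.txt" that would land outside destDir
        String destPath = destDir.getCanonicalPath() + File.separator;
        if (!(target.getCanonicalPath() + File.separator).startsWith(destPath)) {
            throw new IOException("Entry is outside the target folder: " + entryName);
        }

        File folder = isDirectory ? target : target.getParentFile();
        if (!folder.isDirectory() && !folder.mkdirs()) {
            throw new IOException("Could not create folder: " + folder);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        File destDir = Files.createTempDirectory("untar-demo").toFile();

        File folder = prepareTarget(destDir, "docs/", true);
        File file = prepareTarget(destDir, "docs/xml/test_file_1.xml", false);

        System.out.println("Folder created: " + folder.isDirectory());
        System.out.println("Parent folder ready: " + file.getParentFile().isDirectory());
    }
}
```

In the extraction loop, such a helper could be called with entry.getName() and entry.isDirectory(), skipping the write step when the entry is a directory and opening the FileOutputStream on the returned File otherwise.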

Finally, you close all the output streams and files you opened, and that completes the program. It is a really simple one that makes extracting TAR files an easy job. Let us now see the full Java program.


Extract TAR File – Complete Java Program


The complete Java program to extract / read the contents of a TAR file is provided below:

import java.io.*;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.io.IOUtils;
public class unTar {  
        public static void main(String[] args) throws Exception{
                /* Read TAR File into TarArchiveInputStream */
                TarArchiveInputStream myTarFile=new TarArchiveInputStream(new FileInputStream(new File("tar_ball.tar")));
                /* To read individual TAR file */
                TarArchiveEntry entry = null;
                String individualFiles;
                int offset;
                FileOutputStream outputFile=null;
                /* Create a loop to read every single entry in TAR file */
                while ((entry = myTarFile.getNextTarEntry()) != null) {
                        /* Get the name of the file */
                        individualFiles = entry.getName();
                        /* Get Size of the file and create a byte array for the size */
                        byte[] content = new byte[(int) entry.getSize()];
                        offset=0;
                        /* Some SOP statements to check progress */
                        System.out.println("File Name in TAR File is: " + individualFiles);
                        System.out.println("Size of the File is: " + entry.getSize());                  
                        System.out.println("Byte Array length: " + content.length);
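                        /* Read file from the archive into byte array; read may return fewer bytes than requested, so loop until full */
                        while (offset < content.length) {
                                int bytesRead = myTarFile.read(content, offset, content.length - offset);
                                if (bytesRead == -1) break; /* end of entry reached early */
                                offset += bytesRead;
                        }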
                        /* Define OutputStream for writing the file */
                        outputFile=new FileOutputStream(new File(individualFiles));
                        /* Use IOUtils to write content of byte array to physical file */
                        IOUtils.write(content,outputFile);              
                        /* Close Output Stream */
                        outputFile.close();
                }               
                /* Close TarArchiveInputStream */
                myTarFile.close();
        }
}


The output of the program is shown below:

java -classpath .;commons-compress-1.4.1.jar;commons-io-2.4.jar unTar
File Name in TAR File is: test_file_1.xml
Size of the File is: 11357
Byte Array length: 11357
File Name in TAR File is: ZipFiles.txt
Size of the File is: 1439
Byte Array length: 1439

The program produces two files in the output which can be seen from the screenshot below:

Extracted Files from TAR Archive  - Java Program Output

There is good scope for improving this code. If you have a comment, you can post it in the comments section of this blog. Otherwise, see you in the next tutorial.

2 comments:

  1. Your example extracts folders inside tar as files. Please update it to handle both.

  2. byte[] content = new byte[(int) entry.getSize()];
    => Bad idea, big problems ahead if you have large files.
    I prefer something like
    while ((entry = myTarFile.getNextTarEntry()) != null) {
    /* Get the name of the file */
    individualFiles = entry.getName();

    /* Some SOP statements to check progress */
    System.out.println("File Name in TAR File is: " + individualFiles);
    System.out.println("Size of the File is: " + entry.getSize());

    /* Define OutputStream for writing the file */
    outputFile=new FileOutputStream(new File(individualFiles));
    /* Use IOUtils to write content to physical file */
    IOUtils.copyLarge(myTarFile, outputFile, 0, entry.getSize());

    /* Close Output Stream */
    outputFile.close();
    }
