Compare two files by content - Java Example

Compare two Files - Introduction


In this tutorial, we will show a simple way to compare two files by content in Java. We will use the Apache Commons IO library to perform the comparison and run it against some test files to see the results. Make sure you download commons-io-2.4 and have the file commons-io-2.4.jar in your classpath. We will be using the class org.apache.commons.io.FileUtils to compare the files.

Java Program to Compare Files by Content


The Java program to compare two files by content using Apache Commons IO is provided below:

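import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;

public class compareFiles {
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: java compareFiles <file1> <file2>");
            return;
        }
        /* Get the files to be compared first */
        File testFile1 = new File(args[0]);
        File testFile2 = new File(args[1]);
        // Byte-by-byte comparison; true when both files have identical content
        boolean compareResult = FileUtils.contentEquals(testFile1, testFile2);
        System.out.println("Are the files same? " + compareResult);
    }
}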

As you can see, the program above is very simple. You just invoke the static method contentEquals, which takes two File objects, compares them, and returns a boolean. You can print the result with System.out.println, as done here, or use it for further processing. According to the API docs, contentEquals also returns true when neither file exists, and throws an IOException if either argument is a directory. Let us run a few tests to see the code in action.
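If adding Commons IO to the classpath is not an option, Java 12 and later ship a similar check in the standard library: java.nio.file.Files.mismatch returns -1 when two files have identical bytes. The sketch below assumes Java 12+; the class name MismatchCompare and the helper sameContent are hypothetical names chosen for illustration, not part of the original program.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class MismatchCompare {
    // Hypothetical helper: true when both files contain exactly the same bytes (Java 12+)
    static boolean sameContent(Path a, Path b) throws IOException {
        // Files.mismatch returns -1 when no byte differs
        return Files.mismatch(a, b) == -1L;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: java MismatchCompare <file1> <file2>");
            return;
        }
        boolean result = sameContent(Path.of(args[0]), Path.of(args[1]));
        System.out.println("Are the files same? " + result);
    }
}
```

It runs the same way as the Commons IO version, for example `java MismatchCompare test.txt test2.txt`, but needs no extra JAR.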

Compare Two Text Files - Output


A sample test screenshot of both files is shown below:

[Screenshot: Compare Two Text Files by Content - Java Program Example - Input]

The output of the program is shown below:

java -classpath .;commons-io-2.4.jar compareFiles test.txt test2.txt
Are the files same? true

Next, we add a special character to one of the files (test2.txt): a newline at the end. The program now prints false. Let us now compare two Word documents by content.
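To see why a single trailing newline flips the result, the sketch below contrasts a byte-level check with a line-level one. It assumes Java 12+ (for Files.mismatch) and creates two hypothetical temporary files whose text ("Hello World") is made up for illustration; the class TrailingNewlineDemo and its helpers are likewise hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TrailingNewlineDemo {
    // Byte level: position of the first differing byte, or -1 if identical
    static long firstDifference(Path a, Path b) throws IOException {
        return Files.mismatch(a, b);
    }

    // Line level: readAllLines strips line terminators, so a lone
    // trailing newline does not add an extra line
    static boolean sameLines(Path a, Path b) throws IOException {
        return Files.readAllLines(a).equals(Files.readAllLines(b));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample files created in the temp directory
        Path test = Files.createTempFile("test", ".txt");
        Path test2 = Files.createTempFile("test2", ".txt");
        Files.writeString(test, "Hello World");
        Files.writeString(test2, "Hello World\n"); // same text plus a trailing newline

        System.out.println("First differing byte: " + firstDifference(test, test2));
        System.out.println("Same lines? " + sameLines(test, test2));

        Files.delete(test);
        Files.delete(test2);
    }
}
```

This prints "First differing byte: 11" (the shorter file is a prefix of the longer one, so the mismatch is at its length) and "Same lines? true". If you want the Commons IO equivalent of the line-level check, FileUtils.contentEqualsIgnoreEOL (available since Commons IO 2.2, as far as we know) is worth checking in the Javadoc for your version.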

Compare two Word Documents - Output


We tried the same code with two Microsoft Word documents, and it worked like a charm. Here is a screenshot of the content used in the Word documents.

[Screenshot: Compare Two Word Documents by Content - Java Program Example - Input]

The difference is highlighted with a line: one document has an additional space. The output was false. When the extra space was removed, the output was true.

The comparison works because, according to the API docs, contentEquals compares the two files byte by byte. You can therefore compare any two files with this code, whatever their format. One drawback of this approach is that if one of the files contains additional metadata (for example, document properties stored inside a Word file) but the same visible content, the comparison still returns false. Whether that is acceptable depends on your requirements, so use this simplified approach when a strict byte-level match is what you need.
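To make the byte-by-byte idea concrete, here is a simplified sketch of such a comparison in plain Java. It is not the Commons IO source; it only illustrates the general technique of checking lengths first and then reading both files in lockstep. The class ByteByByteCompare is a hypothetical name, and the edge cases that contentEquals handles (missing files, directories) are left out.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ByteByByteCompare {
    // Simplified sketch of a byte-by-byte content comparison
    static boolean contentEquals(File f1, File f2) throws IOException {
        // Files of different lengths cannot have identical content
        if (f1.length() != f2.length()) {
            return false;
        }
        try (InputStream in1 = new BufferedInputStream(new FileInputStream(f1));
             InputStream in2 = new BufferedInputStream(new FileInputStream(f2))) {
            int b1;
            // Read both streams in lockstep until a byte differs or both reach end of file
            do {
                b1 = in1.read();
                if (b1 != in2.read()) {
                    return false;
                }
            } while (b1 != -1);
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: java ByteByByteCompare <file1> <file2>");
            return;
        }
        boolean result = contentEquals(new File(args[0]), new File(args[1]));
        System.out.println("Are the files same? " + result);
    }
}
```

Because every byte counts, any embedded metadata that differs between two otherwise identical documents makes this check return false, which is the drawback described above.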
