Read XLS / XLSX Files in Java Using Apache POI

We are going to commence a new set of tutorials on processing Excel spreadsheets in Java using Apache POI. We had earlier provided a tutorial on creatingexcel documents in JExcel / Java. This set of examples will utilize Apache POI library to create MS Office documents, specifically around Excel Workbooks. I will not dwell more into the library itself, but will focus more on providing working examples that you can easily integrate into your Java application. We are going to present a set of 10 examples (tested working) using POI / Java, and at the bottom of the post you can find links to each of these. If you are keen to look at a particular post, you can jump directly into this.

JAR Files - Examples:


 To work with these examples,  you need the following JAR files to be loaded into your CLASSPATH


  • Java version 1.5 / above
  • Apache POI library v3.8 (You can still use a higher version and make this example to work)
The first tutorial of this series would explain how to read Excel files in Java using Apache POI. We will use some of the powerful features of POI to get this done. Before we start, let us define a sample XLSX file that we want to read



Read XLS files using Apache POI


We have a sample XLS file with the content as per the image below:

Input XLS / XLSX file to Read in Apache POI
Input XLS / XLSX file to Read in Apache POI
The worksheet name is “Sheet1”. I have saved this document with the name Read_Excel_Example.xls. I also have a XLSX version of the same sheet for reasons that will be explained later. We will see how to read this workbook and print the output on the screen in POI.

Read XLS Documents using Apache POI / Java


The Java code to read XLS files in POI is provided below. You can use this code as a step by step guide as it is fully commented:
import java.io.FileInputStream;
import java.io.*;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.ss.usermodel.*;
import java.util.Iterator;
public class readxls {  
        public static void main(String[] args) throws Exception{
                FileInputStream input_document = new FileInputStream(new File("C:\\Read_Excel_Example.xls")); //Read XLS document - Office 97 -2003 format     
                HSSFWorkbook my_xls_workbook = new HSSFWorkbook(input_document); //Read the Excel Workbook in a instance object    
                HSSFSheet my_worksheet = my_xls_workbook.getSheetAt(0); //This will read the sheet for us into another object
                Iterator<Row> rowIterator = my_worksheet.iterator(); // Create iterator object
                while(rowIterator.hasNext()) {
                        Row row = rowIterator.next(); //Read Rows from Excel document       
                        Iterator<Cell> cellIterator = row.cellIterator();//Read every column for every row that is READ
                                while(cellIterator.hasNext()) {
                                        Cell cell = cellIterator.next(); //Fetch CELL
                                        switch(cell.getCellType()) { //Identify CELL type
                                        case Cell.CELL_TYPE_NUMERIC:
                                                System.out.print(cell.getNumericCellValue() + "\t\t"); //print numeric value
                                                break;
                                        case Cell.CELL_TYPE_STRING:
                                                System.out.print(cell.getStringCellValue() + "\t\t"); //print string value
                                                break;
                                        }
                                }
                System.out.println(""); // To iterate over to the next row
                }
                input_document.close(); //Close the XLS file opened for printing
        }
}



When you compile and run this code, it produces the following output in the console,

Student ID              Subject         Score
101.0           English         67.0
101.0           Science         87.0
102.0           English         69.0
102.0           Science         75.0
103.0           English         37.0
103.0           Maths           90.0

If you try to read XLSX file with the same code, you will be get the following exception


Exception in thread "main" org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)

        at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:131)
        at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:104)
        at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:138)
        at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:327)
        at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:308)
        at readxlsx.main(readxlsx.java:15)

This means, the code we used can only read XLS format and not XLSX. For the same example input, file being XLSX this time, we will explain how to read XLSX documents.



Read XLSX Documents using POI / Java


XLSX documents differ in format when compared with XLS and hence you need to use different class files inside the POI library to read spreadsheets in this format.
import java.io.FileInputStream;
import java.io.*;
import org.apache.poi.xssf.usermodel.XSSFWorkbook; //New imports to read XLSX format
import org.apache.poi.xssf.usermodel.XSSFSheet; //New imports to read XLSX format
import org.apache.poi.ss.usermodel.*;
import java.util.Iterator;
public class readxlsx {  
        public static void main(String[] args) throws Exception{
                FileInputStream input_document = new FileInputStream(new File("C:\\Read_Excel_Example.xlsx")); //Read XLSX document - Office 2007, 2010 format     
                XSSFWorkbook my_xlsx_workbook = new XSSFWorkbook(input_document); //Read the Excel Workbook in a instance object    
                XSSFSheet my_worksheet = my_xlsx_workbook.getSheetAt(0); //This will read the sheet for us into another object
                Iterator<Row> rowIterator = my_worksheet.iterator(); // Create iterator object
                while(rowIterator.hasNext()) {
                        Row row = rowIterator.next(); //Read Rows from Excel document       
                        Iterator<Cell> cellIterator = row.cellIterator();//Read every column for every row that is READ
                                while(cellIterator.hasNext()) {
                                        Cell cell = cellIterator.next(); //Fetch CELL
                                        switch(cell.getCellType()) { //Identify CELL type
                                        case Cell.CELL_TYPE_NUMERIC:
                                                System.out.print(cell.getNumericCellValue() + "\t\t"); //print numeric value
                                                break;
                                        case Cell.CELL_TYPE_STRING:
                                                System.out.print(cell.getStringCellValue() + "\t\t"); //print string value
                                                break;
                                        }
                                }
                System.out.println(""); // To iterate over to the next row
                }
                input_document.close(); //Close the XLS file opened for printing
        }
}

This code produces the same output described earlier, but you have ended up reading XLSX workbook in this case.


Common Exceptions / Solution


Listed below are the common exceptions that can get thrown on the screen when working with the examples in POI and the solution to them.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/xmlbeans/XmlException
        at readxlsx.main(readxlsx.java:10)
Caused by: java.lang.ClassNotFoundException: org.apache.xmlbeans.XmlException
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        ... 1 more


You can solve this error by adding xmlbeans-2.3.0.jar to the classpath. (or equivalent for your version)
Exception in thread "main" java.lang.NoClassDefFoundError: org/dom4j/DocumentException
        at org.apache.poi.openxml4j.opc.OPCPackage.init(OPCPackage.java:154)
        at org.apache.poi.openxml4j.opc.OPCPackage.<init>(OPCPackage.java:141)
        at org.apache.poi.openxml4j.opc.Package.<init>(Package.java:54)
        at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:82)
        at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:228)
        at org.apache.poi.util.PackageHelper.open(PackageHelper.java:39)
        at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:1
87)
        at readxlsx.main(readxlsx.java:10)
Caused by: java.lang.ClassNotFoundException: org.dom4j.DocumentException
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        ... 8 more

This exception can be solved by adding dom4j-1.6.1.jar to the classpath.
Exception in thread "main" java.lang.NoClassDefFoundError: org/openxmlformats/schemas/spreadsheetml/x2006/main/CTSheet
        at readxlsx.main(readxlsx.java:10)
Caused by: java.lang.ClassNotFoundException: org.openxmlformats.schemas.spreadsh
eetml.x2006.main.CTSheet
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        ... 1 more

You can solve this by copying poi-ooxml-schemas-3.8.jar to the classpath.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/poi/xssf/usermodel/XSSFWorkbook
        at readxlsx.main(readxlsx.java:10)
Caused by: java.lang.ClassNotFoundException: org.apache.poi.xssf.usermodel.XSSFWorkbook
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        ... 1 more
Solved by adding poi-ooxml-3.8.jar to the classpath
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/poi/ss/usermodel/Workbook
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(Unknown Source)
        at java.security.SecureClassLoader.defineClass(Unknown Source)
        at java.net.URLClassLoader.defineClass(Unknown Source)
        at java.net.URLClassLoader.access$100(Unknown Source)
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at readxlsx.main(readxlsx.java:10)
Caused by: java.lang.ClassNotFoundException: org.apache.poi.ss.usermodel.Workbook
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        ... 13 more

Solved by adding poi-3.8.jar to the classpath.
Exception in thread "main" org.apache.poi.POIXMLException: org.apache.poi.openxml4j.exceptions.InvalidFormatException: Package should contain a content type part [M1.13]
        at org.apache.poi.util.PackageHelper.open(PackageHelper.java:41)
        at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:187)
        at readxlsx.main(readxlsx.java:10)
Caused by: org.apache.poi.openxml4j.exceptions.InvalidFormatException: Package should contain a content type part [M1.13]
        at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:148)
        at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:623)
        at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:230)
        at org.apache.poi.util.PackageHelper.open(PackageHelper.java:39)
        ... 2 more

You get this exception when you are trying to read XLS file by using the classes specific for XLSX. You need to use the right class files in this case. (HSSF classes)


That completes our tutorial -1 on Apache POI to read XLS / XLSX documents in Java. Try our examples and if you get any other exception, post it in the comments section. Keep watching this space, as we are going to add more examples on POI shortly.


This Tutorial: How to Read XLS / XLSX SpreadSheets in Java using Apache POI

Keywords: Java, Excel 2007, Excel 97 - 2003, Apache, Apache POI, Java Excel Read, Example Program

Next Tutorial: How to Create New XLS Document in Apache POI - 97 - 2003 Format


8 comments:

  1. Exception in thread "main" java.io.IOException: Invalid header signature; read 0x4353414E2023233C, expected 0xE11AB1A1E011CFD0
    at org.apache.poi.poifs.storage.HeaderBlock.(HeaderBlock.java:140)
    at org.apache.poi.poifs.storage.HeaderBlock.(HeaderBlock.java:104)
    at org.apache.poi.poifs.filesystem.POIFSFileSystem.(POIFSFileSystem.java:138)
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.(HSSFWorkbook.java:322)
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.(HSSFWorkbook.java:303)

    ReplyDelete
  2. @dinkar,

    Are you sure the file you are trying to read is a .XLS file? You will get this exception if the file you are reading is not a .XLS file and is something else. For example, if you have renamed a .CSV file to .XLS and pass this to the program, you will get the exception. Can you confirm if the file you are using is REALLY a proper excel file?

    ReplyDelete
  3. Thanks a lot !
    Ur article...wid types of errors possible and their solutions is Very helpful
    :)

    ReplyDelete
  4. @ Venkat Sunil M
    I hope many people are looking for a solution for the "InvalidFormatException", here we go.

    fileInputStream = new FileInputStream(fileName);
    OPCPackage opcPackage = OPCPackage.open(fileName);
    XSSFWorkbook xssfWorkBook = new XSSFWorkbook(opcPackage);

    If you getting "InvalidFormatException", create an instance for OPCPackage like shown above and pass it.

    ReplyDelete
  5. stax-api-1.0.1.jar file is also required

    ReplyDelete
  6. stax-api-1.0.1.jar

    ReplyDelete
  7. Thanks for article.. it helped me a lot..
    @nakul

    ReplyDelete