We are going to commence a new set of
tutorials on processing Excel spreadsheets in Java using Apache POI. We had
earlier provided a tutorial on creatingexcel documents in JExcel / Java. This set of examples will utilize Apache POI library to create MS Office
documents, specifically around Excel Workbooks. I will not dwell more into the
library itself, but will focus more on providing working examples that you can
easily integrate into your Java application. We are going to present a set of
10 examples (tested working) using POI / Java, and at the bottom of the post
you can find links to each of these. If you are keen to look at a particular
post, you can jump directly into this.
- Java version 1.5 / above
- Apache POI library v3.8 (You can still use a higher version and make this example to work)
Read XLS files using Apache POI
We have a sample XLS file with the content as per the image below:
Input XLS / XLSX file to Read in Apache POI |
The worksheet name is “Sheet1”. I have saved this document with the name Read_Excel_Example.xls. I also have a XLSX version of the same sheet for reasons that will be explained later. We will see how to read this workbook and print the output on the screen in POI.
Read XLS Documents using Apache POI / Java
The Java code to read XLS files in POI is provided below. You can use this code as a step by step guide as it is fully commented:
import java.io.FileInputStream;
import java.io.*;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.ss.usermodel.*;
import java.util.Iterator;
public class readxls {
public static void main(String[] args) throws Exception{
FileInputStream input_document = new FileInputStream(new File("C:\\Read_Excel_Example.xls")); //Read XLS document - Office 97 -2003 format
HSSFWorkbook my_xls_workbook = new HSSFWorkbook(input_document); //Read the Excel Workbook in a instance object
HSSFSheet my_worksheet = my_xls_workbook.getSheetAt(0); //This will read the sheet for us into another object
Iterator<Row> rowIterator = my_worksheet.iterator(); // Create iterator object
while(rowIterator.hasNext()) {
Row row = rowIterator.next(); //Read Rows from Excel document
Iterator<Cell> cellIterator = row.cellIterator();//Read every column for every row that is READ
while(cellIterator.hasNext()) {
Cell cell = cellIterator.next(); //Fetch CELL
switch(cell.getCellType()) { //Identify CELL type
case Cell.CELL_TYPE_NUMERIC:
System.out.print(cell.getNumericCellValue() + "\t\t"); //print numeric value
break;
case Cell.CELL_TYPE_STRING:
System.out.print(cell.getStringCellValue() + "\t\t"); //print string value
break;
}
}
System.out.println(""); // To iterate over to the next row
}
input_document.close(); //Close the XLS file opened for printing
}
}
When you compile and run this code, it
produces the following output in the console,
Student ID Subject Score
101.0 English 67.0
101.0 Science 87.0
102.0 English 69.0
102.0 Science 75.0
103.0 English 37.0
103.0 Maths 90.0
If you try to read XLSX file with the same code, you will be get the following exception
Exception in thread "main" org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:131)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:104)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:138)
at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:327)
at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:308)
at readxlsx.main(readxlsx.java:15)
This means, the code we used can only read XLS format and not XLSX. For the same example input, file being XLSX this time, we will explain how to read XLSX documents.
XLSX documents differ in format when compared with XLS and hence you need to use different class files inside the POI library to read spreadsheets in this format.
This code produces the same output described earlier, but you have ended up reading XLSX workbook in this case.
Listed below are the common exceptions that can get thrown on the screen when working with the examples in POI and the solution to them.
Read XLSX Documents using POI / Java
XLSX documents differ in format when compared with XLS and hence you need to use different class files inside the POI library to read spreadsheets in this format.
import java.io.FileInputStream;
import java.io.*;
import org.apache.poi.xssf.usermodel.XSSFWorkbook; //New imports to read XLSX format
import org.apache.poi.xssf.usermodel.XSSFSheet; //New imports to read XLSX format
import org.apache.poi.ss.usermodel.*;
import java.util.Iterator;
public class readxlsx {
public static void main(String[] args) throws Exception{
FileInputStream input_document = new FileInputStream(new File("C:\\Read_Excel_Example.xlsx")); //Read XLSX document - Office 2007, 2010 format
XSSFWorkbook my_xlsx_workbook = new XSSFWorkbook(input_document); //Read the Excel Workbook in a instance object
XSSFSheet my_worksheet = my_xlsx_workbook.getSheetAt(0); //This will read the sheet for us into another object
Iterator<Row> rowIterator = my_worksheet.iterator(); // Create iterator object
while(rowIterator.hasNext()) {
Row row = rowIterator.next(); //Read Rows from Excel document
Iterator<Cell> cellIterator = row.cellIterator();//Read every column for every row that is READ
while(cellIterator.hasNext()) {
Cell cell = cellIterator.next(); //Fetch CELL
switch(cell.getCellType()) { //Identify CELL type
case Cell.CELL_TYPE_NUMERIC:
System.out.print(cell.getNumericCellValue() + "\t\t"); //print numeric value
break;
case Cell.CELL_TYPE_STRING:
System.out.print(cell.getStringCellValue() + "\t\t"); //print string value
break;
}
}
System.out.println(""); // To iterate over to the next row
}
input_document.close(); //Close the XLS file opened for printing
}
}
This code produces the same output described earlier, but you have ended up reading XLSX workbook in this case.
Common Exceptions / Solution
Listed below are the common exceptions that can get thrown on the screen when working with the examples in POI and the solution to them.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/xmlbeans/XmlException
at readxlsx.main(readxlsx.java:10)
Caused by: java.lang.ClassNotFoundException: org.apache.xmlbeans.XmlException
at java.net.URLClassLoader$1.run(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 1 more
You can solve this error by adding xmlbeans-2.3.0.jar to the classpath. (or equivalent for your version)
Exception in thread "main" java.lang.NoClassDefFoundError: org/dom4j/DocumentException
at org.apache.poi.openxml4j.opc.OPCPackage.init(OPCPackage.java:154)
at org.apache.poi.openxml4j.opc.OPCPackage.<init>(OPCPackage.java:141)
at org.apache.poi.openxml4j.opc.Package.<init>(Package.java:54)
at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:82)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:228)
at org.apache.poi.util.PackageHelper.open(PackageHelper.java:39)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:1
87)
at readxlsx.main(readxlsx.java:10)
Caused by: java.lang.ClassNotFoundException: org.dom4j.DocumentException
at java.net.URLClassLoader$1.run(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 8 more
This exception can be solved by adding dom4j-1.6.1.jar to the classpath.
Exception in thread "main" java.lang.NoClassDefFoundError: org/openxmlformats/schemas/spreadsheetml/x2006/main/CTSheet
at readxlsx.main(readxlsx.java:10)
Caused by: java.lang.ClassNotFoundException: org.openxmlformats.schemas.spreadsh
eetml.x2006.main.CTSheet
at java.net.URLClassLoader$1.run(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 1 more
You can solve this by copying poi-ooxml-schemas-3.8.jar to the classpath.Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/poi/xssf/usermodel/XSSFWorkbook
at readxlsx.main(readxlsx.java:10)
Caused by: java.lang.ClassNotFoundException: org.apache.poi.xssf.usermodel.XSSFWorkbook
at java.net.URLClassLoader$1.run(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 1 more
Solved by adding poi-ooxml-3.8.jar to the classpathException in thread "main" java.lang.NoClassDefFoundError: org/apache/poi/ss/usermodel/Workbook
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(Unknown Source)
at java.security.SecureClassLoader.defineClass(Unknown Source)
at java.net.URLClassLoader.defineClass(Unknown Source)
at java.net.URLClassLoader.access$100(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at readxlsx.main(readxlsx.java:10)
Caused by: java.lang.ClassNotFoundException: org.apache.poi.ss.usermodel.Workbook
at java.net.URLClassLoader$1.run(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 13 more
Exception in thread "main" org.apache.poi.POIXMLException: org.apache.poi.openxml4j.exceptions.InvalidFormatException: Package should contain a content type part [M1.13]
at org.apache.poi.util.PackageHelper.open(PackageHelper.java:41)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:187)
at readxlsx.main(readxlsx.java:10)
Caused by: org.apache.poi.openxml4j.exceptions.InvalidFormatException: Package should contain a content type part [M1.13]
at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:148)
at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:623)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:230)
at org.apache.poi.util.PackageHelper.open(PackageHelper.java:39)
... 2 more
You get this exception when you are trying to read XLS file by using the classes specific for XLSX. You need to use the right class files in this case. (HSSF classes)
Exception in thread "main" java.io.IOException: Invalid header signature; read 0x4353414E2023233C, expected 0xE11AB1A1E011CFD0
ReplyDeleteat org.apache.poi.poifs.storage.HeaderBlock.(HeaderBlock.java:140)
at org.apache.poi.poifs.storage.HeaderBlock.(HeaderBlock.java:104)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.(POIFSFileSystem.java:138)
at org.apache.poi.hssf.usermodel.HSSFWorkbook.(HSSFWorkbook.java:322)
at org.apache.poi.hssf.usermodel.HSSFWorkbook.(HSSFWorkbook.java:303)
@dinkar,
ReplyDeleteAre you sure the file you are trying to read is a .XLS file? You will get this exception if the file you are reading is not a .XLS file and is something else. For example, if you have renamed a .CSV file to .XLS and pass this to the program, you will get the exception. Can you confirm if the file you are using is REALLY a proper excel file?
Thanks a lot !
ReplyDeleteUr article...wid types of errors possible and their solutions is Very helpful
:)
@ Venkat Sunil M
ReplyDeleteI hope many people are looking for a solution for the "InvalidFormatException", here we go.
fileInputStream = new FileInputStream(fileName);
OPCPackage opcPackage = OPCPackage.open(fileName);
XSSFWorkbook xssfWorkBook = new XSSFWorkbook(opcPackage);
If you getting "InvalidFormatException", create an instance for OPCPackage like shown above and pass it.
stax-api-1.0.1.jar file is also required
ReplyDeletestax-api-1.0.1.jar
ReplyDeleteThanks for article.. it helped me a lot..
ReplyDelete@nakul
Great stuff, thanks a lot!
ReplyDeleteHi,
ReplyDeleteHow can we read column wise in xlsx in java
Sir , Can u please give me solution how to compare 2 different sheet (Values) data from .xlsx file using java.
ReplyDeleteRegards
Sagar C
hi,while i am trying to fetch the data from excel which contains graphs, it is throwing me the following exception
ReplyDeleteException in thread "main" org.apache.poi.hssf.record.RecordFormatException: WINDOW2 was not found
please help me out with this issue
[Utils] [ERROR] [Error] java.lang.NoClassDefFoundError: org/dom4j/DocumentException
ReplyDelete