We have seen so much on working with actual data in POI so far. In this post, I would like to touch a bit on the metadata. A document is never complete without adding suitable metadata and Apache POI offers a plethora of options to set metadata against a document. We will discuss how to read and write metadata into Excel workbooks using Apache POI in Java with simple working examples in this post. We will focus on XLSX document types. For Office 97 – 2003 formats, we will provide another post that would explain how metadata can be extracted / written.
Metadata in Excel spreadsheets can be classified into three major categories;
Core Properties: These are defined in
org.apache.poi.POIXMLProperties.CoreProperties
and include the following properties that you can read / set against:- Category – Category of the document
- Content Status – Status of the content
- Content Type – Type of Content
- Document Created TimeStamp
- Description of the Document
- Identifiers
- Keywords in the document
- Last printed date
- Modified Date
- Revision
- Subject of the document
- Title of the document
Extended Properties: Extended properties are applicable across all Office files and defined in
org.openxmlformats.schemas.officeDocument.x2006.extendedProperties.CTProperties
class. Some of the properties that you can set through this class in Apache POI is provided below:- Company
- Manager
- Hyperlink Base
Custom Properties: On top of core and extended properties, you can also add custom properties as name value pairs. You can add your own or choose from a list that is already provided as a part of Office.
These include the following – Checked By, Client, Date Completed, Department, Destination, Disposition, Division, Document Number, Editor, Forward to, Group, Language, Mailstop, Matter, Office, Owner, Project, Publisher, Purpose, Received From, Recorded By, Reference, Source, Status, Telephone Number and Typist. You can set a type against such custom properties which can be Text, Date, Number or Yes /No. Finally you can also provide a value for such properties.
With this bit of theory, let us now write a simple Java program that will help us to set all these properties against an Excel Spreadsheet.
Excel File: Write Metadata Using Apache POI – Java Example Program
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.apache.poi.xssf.usermodel.*;
import org.apache.poi.POIXMLProperties;
import org.apache.poi.POIXMLProperties.*;
import org.openxmlformats.schemas.officeDocument.x2006.extendedProperties.CTProperties;
public class setMetadata
{
public static void main(String[] args)
throws IOException
{
/* Start with a workbook object to write comments into */
XSSFWorkbook xlsxSetMetadata = new XSSFWorkbook();
xlsxSetMetadata.createSheet("Write Metadata");
POIXMLProperties props = xlsxSetMetadata.getProperties();
/* Let us set some core properties now*/
POIXMLProperties.CoreProperties coreProp=props.getCoreProperties();
coreProp.setCreator("Thinktibits"); //set document creator
coreProp.setDescription("set Metadata using Apache POI / Java"); //set Description
coreProp.setKeywords("Apache POI, Metadata, Java, Example Program, XLSX "); //set keywords
coreProp.setTitle("XLSX Metadata Example Apache POI "); //Title of the document
coreProp.setSubjectProperty("Code"); //Subject
coreProp.setCategory("Programming"); //category
/* We can now set some extended properties */
POIXMLProperties.ExtendedProperties extProp=props.getExtendedProperties();
extProp.getUnderlyingProperties().setCompany("Apache Software Foundation");
extProp.getUnderlyingProperties().setTemplate("XSSF");
extProp.getUnderlyingProperties().setManager("Manager Name");
/* Finally, we can set some custom Properies */
POIXMLProperties.CustomProperties custProp = props.getCustomProperties();
custProp.addProperty("Author", "Thinktibits");// String
custProp.addProperty("Year", 2014); // Number Property
custProp.addProperty("Published", true); //Yes No Property
custProp.addProperty("Typist", "tika");
/* Write all these metadata into workbook and Close Document */
String fname = "Metadata.xlsx";
FileOutputStream out = new FileOutputStream(fname);
xlsxSetMetadata.write(out);
out.close();
}
}
The output of this program is shown below (2 screens : One for Core/ Extended Metadata, Another for Custom Metadata)
Apache POI - Add Custom Metadata in Excel - Example Output |
Java: Add Core / Extended Metadata to Excel Using POI - Output |
Excel File: Read Metadata with Apache POI – Example Code
import java.io.FileInputStream;
import java.io.*;
import java.util.List;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.apache.poi.xssf.usermodel.*;
import org.apache.poi.POIXMLProperties;
import org.apache.poi.POIXMLProperties.*;
import org.openxmlformats.schemas.officeDocument.x2006.customProperties.*;
public class getMetadata
{
public static void main(String[] args)
throws IOException
{
FileInputStream input_document = new FileInputStream(new File("Metadata.xlsx")); //Read XLSX document - Office 2007, 2010 format
XSSFWorkbook readMetadata = new XSSFWorkbook(input_document); //Read the Excel Workbook in a instance object
POIXMLProperties props = readMetadata.getProperties();
POIXMLProperties.CoreProperties coreProp=props.getCoreProperties();
/* Read and print core properties as SOP */
System.out.println("Document Creator :" + coreProp.getCreator());
System.out.println("Description :" + coreProp.getDescription());
System.out.println("Keywords :" + coreProp.getKeywords());
System.out.println("Title :" + coreProp.getTitle());
System.out.println("Subject :" + coreProp.getSubject());
System.out.println("Category :" + coreProp.getCategory());
/* Read and print extended properties */
POIXMLProperties.ExtendedProperties extProp=props.getExtendedProperties();
System.out.println("Company :" + extProp.getUnderlyingProperties().getCompany());
System.out.println("Template :" + extProp.getUnderlyingProperties().getTemplate());
System.out.println("Manager Name :" + extProp.getUnderlyingProperties().getManager());
/* Finally, we can retrieve some custom Properies */
POIXMLProperties.CustomProperties custProp = props.getCustomProperties();
List<CTProperty> my1=custProp.getUnderlyingProperties().getPropertyList();
System.out.println("Size :" + my1.size());
for (int i = 0; i < my1.size(); i++) {
CTProperty pItem = my1.get(i);
System.out.println(""+pItem.getPid());
System.out.println(""+pItem.getFmtid());
System.out.println(""+pItem.getName());
System.out.println(""+pItem.getLpwstr());
}
}
}
This programs prints the metadata information and the output is shown below:
Document Creator :Thinktibits
Description :set Metadata using Apache POI / Java
Keywords :Apache POI, Metadata, Java, Example Program, XLSX
Title :XLSX Metadata Example Apache POI
Subject :Code
Category :Programming
Company :Apache Software Foundation
Template :XSSF
Manager Name :Manager Name
Size :4
2
{D5CDD505-2E9C-101B-9397-08002B2CF9AE}
Author
Thinktibits
3
{D5CDD505-2E9C-101B-9397-08002B2CF9AE}
Year
null
4
{D5CDD505-2E9C-101B-9397-08002B2CF9AE}
Published
null
5
{D5CDD505-2E9C-101B-9397-08002B2CF9AE}
Typist
tika
If you notice the custom properties section output, you are able to get the output for properties that are of type “string”. You need to use different methods to get if you use a Yes/No or number type of custom property. I would like leave this for you to give a try – post a comment here if you are stuck. The examples presented in this post are tested with POIv3.8. You can modify the code provided above to update metadata on existing Excel documents, if you have such a requirement.
Can custom properties on worksheet level (in VBA accessed as Worksheet.CustomProperties) also be accessed via POI?
ReplyDelete