Split PDF by Size Java Example iText Tutorial

In this iText tutorial, we will explain how to split a PDF file by size, with an example. We earlier presented an example that slices a PDF document based on the number of pages in Java. Quite often, however, you need to divide a PDF document by size rather than by page count, so it is handy to have an approach for this kind of size-based split. For this demonstration, I have an input PDF file of about 2 MB, and I would like to split it into smaller files of roughly 500 KB each. We will provide a working Java code example that shows how to split this PDF into smaller PDF files based on size. Let us get started with the step-by-step guide.

Step-1: Begin by creating a PdfReader object for the incoming PDF file that needs to be split by size. We also create a Document and a PdfCopy object at this stage, as the output will contain at least one PDF file. The Java code to achieve this is provided below:
          PdfReader Split_PDF_By_Size = new PdfReader("CombinedPDFDocument.pdf");
          Document document = new Document();
          PdfCopy copy = new PdfCopy(document, new FileOutputStream("File1.pdf"));
          document.open();
Step-2: Even though the split is based on size, we still need the number of pages in the incoming PDF file. We use the getNumberOfPages method of the PdfReader class to get this. I also declare an integer variable to name the new files dynamically, and a variable that holds the size in bytes of the new PDF as we import pages into it from the original PDF file. Declaring the size variable as long is a safe choice whether your iText version reports the size as an int or as a long. Finally, I declare a float variable that holds the size in kilobytes, which is checked to decide when to start a new file. The variable declarations are provided below:
          int number_of_pages = Split_PDF_By_Size.getNumberOfPages();
          int pagenumber = 1;
          long Find_PDF_Size;
          float combinedsize = 0;
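The reason for using a float becomes clearer with a small experiment. The sketch below uses a made-up byte count to show that dividing two integers drops the fraction, whereas casting to float first keeps it. The class and method names are hypothetical and not part of iText.

```java
// Shows why the byte count is cast to float before dividing by 1024.
class KilobyteConversionSketch {

    // Converts a byte count to kilobytes, keeping the fractional part.
    static float toKilobytes(long bytes) {
        return (float) bytes / 1024;
    }

    public static void main(String[] args) {
        int sizeInBytes = 508_416;                     // hypothetical value: 496.5 * 1024
        System.out.println(sizeInBytes / 1024);        // integer division, prints 496
        System.out.println(toKilobytes(sizeInBytes));  // prints 496.5
    }
}
```

With integer division, a file of 496.5 KB would not register as larger than 496, so the float keeps the threshold check exact.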
Step-3: We now declare a for loop over the pages of the input PDF. Page numbers in iText start at 1, so the loop runs from 1 up to and including the number of pages. Inside the loop, we import each page from the original document into the new PDF document using the getImportedPage method. We then use the getCurrentDocumentSize method of the PdfCopy object to inspect the size of the document after the page import. This size is returned in bytes, and I divide it by 1024 to convert it to kilobytes. If the size exceeds 496 KB (I want to reserve 4 KB for EOF information), or if the page being imported is the last page of the input PDF file, we close the Document object and reset combinedsize to 0. Because the check runs after a page has been added, an output file can end up somewhat larger than the threshold. When the size limit has been exceeded, the next pass of the loop creates a new Document object and adds pages to the new PDF file. The code for this loop is provided below:
          for (int i = 1; i <= number_of_pages; i++) {
                  if (combinedsize == 0 && i != 1) {
                          document = new Document();
                          pagenumber++;
                          String FileName = "File" + pagenumber + ".pdf";
                          copy = new PdfCopy(document, new FileOutputStream(FileName));
                          document.open();
                  }
                  copy.addPage(copy.getImportedPage(Split_PDF_By_Size, i));
                  Find_PDF_Size = copy.getCurrentDocumentSize();
                  combinedsize = (float) Find_PDF_Size / 1024;
                  if (combinedsize > 496 || i == number_of_pages) {
                          document.close();
                          combinedsize = 0;
                  }
          }
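To follow the control flow of such a loop without iText on the classpath, it can be simulated on made-up page sizes. The sketch below is only an illustration: the class SplitLoopSketch, the method assignPages and the per-page sizes are hypothetical, and adding up page sizes only approximates what getCurrentDocumentSize reports for a real file. It visits every page from 1 through the page count and closes the last file when the final page is reached.

```java
import java.util.ArrayList;
import java.util.List;

// Simulates a size-based split on hypothetical per-page sizes (in KB).
// No iText calls are made; finishing a list stands in for document.close().
class SplitLoopSketch {

    // Returns, for each output file, the 1-based page numbers it receives.
    static List<List<Integer>> assignPages(float[] pageSizesKb, float limitKb) {
        List<List<Integer>> files = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        float combinedSize = 0;
        int numberOfPages = pageSizesKb.length;
        for (int i = 1; i <= numberOfPages; i++) {  // <= so the last page is visited
            current.add(i);                         // stands in for copy.addPage(...)
            combinedSize += pageSizesKb[i - 1];
            if (combinedSize > limitKb || i == numberOfPages) {
                files.add(current);                 // stands in for document.close()
                current = new ArrayList<>();
                combinedSize = 0;
            }
        }
        return files;
    }

    public static void main(String[] args) {
        float[] sizes = {200, 200, 200, 200, 200};
        System.out.println(assignPages(sizes, 496));
    }
}
```

With five pages of 200 KB each and a 496 KB threshold, this prints [[1, 2, 3], [4, 5]]. The first file is closed only once it has passed the threshold, at 600 KB, and the last file is closed because the final page was reached.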
Step-4: The for loop in Step 3 does all of the size-based splitting of the PDF file for us. At the end of the loop, we can add a simple print statement to report the number of new files that were created.

The complete commented version of the Java code to split a PDF file based on size using the iText library is provided below:
import java.io.*;
import com.itextpdf.text.*;
import com.itextpdf.text.pdf.*;

public class SplitPDFBySize {
    public static void main(String[] args) {
        try {
            PdfReader Split_PDF_By_Size = new PdfReader("CombinedPDFDocument.pdf");
            Document document = new Document();
            PdfCopy copy = new PdfCopy(document, new FileOutputStream("File1.pdf"));
            document.open();
            int number_of_pages = Split_PDF_By_Size.getNumberOfPages();
            int pagenumber = 1; /* To generate file name dynamically */
            long Find_PDF_Size; /* To get PDF size in bytes */
            float combinedsize = 0; /* To convert this to kilobytes and estimate new PDF size */
            /* Page numbers start at 1, so use <= to include the last page */
            for (int i = 1; i <= number_of_pages; i++) {
                /* Start a new file only after the previous one was closed;
                   the first file is already open when the loop counter is 1 */
                if (combinedsize == 0 && i != 1) {
                    document = new Document();
                    pagenumber++;
                    String FileName = "File" + pagenumber + ".pdf"; /* Dynamic file name */
                    copy = new PdfCopy(document, new FileOutputStream(FileName));
                    document.open();
                }
                copy.addPage(copy.getImportedPage(Split_PDF_By_Size, i)); /* Import page from original document */
                Find_PDF_Size = copy.getCurrentDocumentSize(); /* Estimate PDF size in bytes */
                combinedsize = (float) Find_PDF_Size / 1024; /* Convert bytes to kilobytes */
                /* Close the document if this is the last page or the limit is exceeded */
                if (combinedsize > 496 || i == number_of_pages) {
                    document.close();
                    combinedsize = 0; /* Reset variable to generate next file, if required */
                }
            }
            Split_PDF_By_Size.close(); /* Release the input file */
            System.out.println("PDF Split By Size Completed. Number of Documents Created:" + pagenumber);
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}
In my case, I got 7 PDF documents, each of about 500 KB. The pieces do not add up to the size of the original file, because header information has to be generated for every output PDF. We will dwell more on this in upcoming tutorials.
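As a rough cross-check on that file count, one can compute how many pieces would be needed if the outputs added up exactly to the input size. The sketch below assumes the ~2 MB input is 2048 KB and uses the 496 KB threshold from the tutorial; the class and method names are hypothetical.

```java
// Estimates how many output files a size-based split would need
// if the pieces added up exactly to the size of the input.
class ChunkCountSketch {

    // Ceiling division: smallest number of chunks of at most limitKb that hold totalKb.
    static int minimumChunks(int totalKb, int limitKb) {
        return (totalKb + limitKb - 1) / limitKb;
    }

    public static void main(String[] args) {
        // Assumed input size of 2048 KB and the 496 KB threshold.
        System.out.println(minimumChunks(2048, 496)); // prints 5
    }
}
```

Getting 7 files rather than 5 suggests that the output files together are larger than the input, which fits the point about every output file needing its own header information.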
Do you know a better way to chunk PDF files by Size? Share your thoughts on this code with us.

14 comments:

  1. There is an error in the for loop at - if (combinedsize > 496 || i==number_of_pages) {
    document.close();
    combinedsize=0;
    }

    The second condition will never be satisfied since the execution exits the loop when i == number_of_pages. So, the last file will not be written properly, and will show an error when opened in a pdf reader.

    A document.close() has to be included after the for loop to close the last document.

  2. Thanx men !!
    Both of you have saved my life :)

  3. Hey, can anyone help me? Once the files are split, I want to store them in an array and return that array. So basically I want the whole process in a function whose return type is an array!
    Please I really need that help...
    Thank you

  4. @Anonymous,

    Quite possible. You can have an array of Document objects. Can you throw more light on your requirement here? What are you going to do with this array? Which version of iText are you using?

  5. This comment has been removed by the author.

  6. Hey I am using the 5.3.4 version of iText.

    Actually, my requirement is to create one function whose return type is an array of strings. I pass the file name, the file path and the split size to that function.
    This function is used to split a PDF file.
    Once the splitting is done, I have to store all the split file names (in our case File 1, File 2, File 3, ...) in an array and return this array.


    Now can you please help me for that
    Thank You sooo much

  7. @Rucha, please create an array of document objects and use the same code.

  8. I did that way but not working !!!
    Can you please show me how you want me to do it !!!!!

    Thanx

  9. Please Post your code , exception..so that we can see how to fix it

  10. I am doing this way but not working can you please please help !!!
    I am new to java too ..... :(


    import com.itextpdf.text.Document;
    import com.itextpdf.text.pdf.PdfCopy;
    import com.itextpdf.text.pdf.PdfReader;
    import java.io.FileOutputStream;
    import java.text.DateFormat;
    import java.text.SimpleDateFormat;
    import java.util.ArrayList;
    import java.util.Date;

    /*
    * To change this template, choose Tools | Templates
    * and open the template in the editor.
    */
    /**
    *
    * @author rucha
    */
    public class FTP_PDFSplit {

    public String[] SplitBySize(String fileName, String path, double size) {
    ArrayList outputFileNames = new ArrayList();
    int pagenumber = 1;
    DateFormat dateFormat = new SimpleDateFormat("yyyyMMdd.HH.mm.ss.ms");
    Date date = new Date();
    try {
    PdfReader inputPdf = new PdfReader(path + fileName);

    Document[] document = new Document;

    int nop = inputPdf.getNumberOfPages();

    PdfCopy copy = new PdfCopy(document[0], new FileOutputStream(path + "File_11_" +dateFormat.format(date) + ".pdf"));

    document[0].open();

    int number_of_pages = inputPdf.getNumberOfPages();
    /* To generate file name dynamically */
    float combinedsize = 0; /* To convert this to Kilobytes and estimate new PDF size */
    int temp = 11;
    for (int i = 1; i < number_of_pages; i++) {
    if (combinedsize == 0 && i != 1) { /* Generate new file only for second time when first document size
    exceeds limit and incoming loop counter is not 1 */

    document[i] = new Document();
    pagenumber++;
    temp++;
    String FileName = path + "File_" + temp + "_" + dateFormat.format(date) +".pdf" ; /* Dynamic file name */

    copy = new PdfCopy(document[i], new FileOutputStream(FileName));
    document[i].open();

    }
    copy.addPage(copy.getImportedPage(inputPdf, i)); /* Import pages from original document */
    float Find_PDF_Size = copy.getCurrentDocumentSize(); /* Estimate PDF size in bytes */
    combinedsize = (float) Find_PDF_Size / 1024; /* Convert bytes to kilobytes */
    if (combinedsize > size || i == number_of_pages) { /* Close document if the page is the last page or if limit reaches */

    document[i].close();
    combinedsize = 0; /* reset variable to generate next file, if required */
    }

    }
    document[0].close();

    System.out.println("PDF Split By Size Completed. Number of Documents Created:"+pagenumber);

    } catch (Exception i) {
    System.out.println(i);
    }
    String[] result = null;
    if (outputFileNames.size() > 0) {
    result = ((String[]) outputFileNames.toArray(new String[0]));
    for (int i = 0; i < result.length; i++) {



    }

    }
    else {
    result = new String[0];
    }


    return result;
    }
    }

  11. Hello, what is your error when you run this program? Where does it fail?

  12. Hey, I am able to get the file names in the array, but one more problem I am facing is that here we are splitting the first file separately and the rest separately in the for loop.

    Here is my latest code :


    public class FTP_PDFSplit {

    public String[] SplitBySize(String fileName, String path, double size) {
    ArrayList outputFileNames = new ArrayList();
    String[] result = null;
    int pagenumber = 1;
    DateFormat dateFormat = new SimpleDateFormat("yyyyMMdd.HH.mm.ss.ms");
    Date date = new Date();
    try {
    PdfReader inputPdf = new PdfReader(path + fileName);

    Document document = new Document();

    int nop = inputPdf.getNumberOfPages();

    PdfCopy copy = new PdfCopy(document, new FileOutputStream(path + "File_11_" +dateFormat.format(date) + ".pdf"));
    String tempfn = path + "File_11_" +dateFormat.format(date) + ".pdf";
    document.open();

    int number_of_pages = inputPdf.getNumberOfPages();
    /* To generate file name dynamically */
    float combinedsize = 0; /* To convert this to Kilobytes and estimate new PDF size */
    int temp = 11;
    for (int i = 1; i < number_of_pages; i++) {
    if (combinedsize == 0 && i != 1) { /* Generate new file only for second time when first document size
    exceeds limit and incoming loop counter is not 1 */

    document = new Document();
    pagenumber++;
    temp++;
    String FileName = path + "File_" + temp + "_" + dateFormat.format(date) +".pdf" ; /* Dynamic file name */

    outputFileNames.add(FileName);
    copy = new PdfCopy(document, new FileOutputStream(FileName));
    document.open();

    }
    copy.addPage(copy.getImportedPage(inputPdf, i)); /* Import pages from original document */
    float Find_PDF_Size = copy.getCurrentDocumentSize(); /* Estimate PDF size in bytes */
    combinedsize = (float) Find_PDF_Size / 1024; /* Convert bytes to kilobytes */
    if (combinedsize > size || i == number_of_pages) { /* Close document if the page is the last page or if limit reaches */

    document.close();
    combinedsize = 0; /* reset variable to generate next file, if required */
    }

    }
    document.close();

    System.out.println("PDF Split By Size Completed. Number of Documents Created:"+pagenumber);

    } catch (Exception i) {
    System.out.println(i);
    }

    if (outputFileNames.size() > 0) {

    System.out.println ( "\n\nOS: " + outputFileNames.size() + "\n\n" );

    result = (String[])outputFileNames.toArray(new String[0]);
    for (int i = 0; i < result.length; i++) {
    //System.out.println("PDF Split By Size Completed. Number of Documents Created:" + pagenumber);
    System.out.println("File " + (i + 1) + " " + result[i]);

    }

    }
    else {
    result = new String[0];
    }


    return result;
    }
    }




    Output :---

    PDF Split By Size Completed. Number of Documents Created:5


    File 1 C:\Temp_Workspace\TestSplit\SplitOutput\File_12_20121213.09.12.31.1231.pdf

    File 2 C:\Temp_Workspace\TestSplit\SplitOutput\File_13_20121213.09.12.31.1231.pdf

    File 3 C:\Temp_Workspace\TestSplit\SplitOutput\File_14_20121213.09.12.31.1231.pdf

    File 4 C:\Temp_Workspace\TestSplit\SplitOutput\File_15_20121213.09.12.31.1231.pdf

    Here you see that the total number of files split is 5, but I am only getting four files in my array.
    Can anyone please help me to get all the files?
    I am almost there for what I want, but stuck here :(

    Thank You men for your help ,

  13. When I try to split the file, I am able to split it into multiple files. The only problem I am facing is that the last page in the PDF is not included in the split files. Can anyone please help me with this?

  14. @Anonymous,

    Did you try printing the number-of-pages variable? Does it match the number of pages in your PDF document? You may want to alter the condition in the for loop to match suitably.
