Search ZIP File - Glob Pattern - Java NIO Example

In this example, we will describe how to search a ZIP archive with a glob (wildcard) pattern using a Java NIO program. We will read the ZIP file using the Zip File System Provider, and use java.nio.file.PathMatcher to match the pattern we want to search for against the contents of the ZIP file. This article builds on the search ZIP Entries by File Name tutorial. You may wish to refer to that tutorial for the basics of this approach.

Create ZIP File System / Scan ZIP Folders

We have discussed this step many times on this blog. We create a ZIP file system using the methods available in Java NIO, and mount the ZIP file that we would like to search as a file system. Then, we get the root directories of the ZIP file using the getRootDirectories method. We iterate over each of them and call the walkFileTree method on it. This scans every file in the archive, and each scanned file is passed to the visitFile method that we implement from the FileVisitor interface. Inside this method, we can use a PathMatcher object to match the scanned file against the glob pattern.

        /* Define ZIP File System Properties in HashMap */
        Map<String, String> zip_properties = new HashMap<>();
        zip_properties.put("create", "false");
        URI zip_disk = URI.create("jar:file:/sp.zip");
        FileSystem zipfs = FileSystems.newFileSystem(zip_disk, zip_properties);
        
        Iterable<Path> dirs = zipfs.getRootDirectories();
        
        for (Path root : dirs) {                
                Files.walkFileTree(root, walk);            
        }        
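The mount-and-walk step above can also be tried on its own, without a prepared /sp.zip. The following is only a sketch under a few assumptions: the class ZipMountSketch, the helpers createSampleZip and listEntries, and the sample entry names are hypothetical and not part of the original program. It builds a small archive in a temporary folder, derives the jar: URI from the file path, and uses try-with-resources so that the ZIP file system is closed after the walk.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.*;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ZipMountSketch {

    /* Hypothetical helper: writes a small sample archive into a temporary folder */
    static Path createSampleZip() throws IOException {
        Path zip = Files.createTempDirectory("zipdemo").resolve("sp.zip");
        String[] names = {"dest.sql", "sp/dest.sql", "sp/sp2/mira.sql", "sp/notes.txt"};
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
            for (String entry : names) {
                out.putNextEntry(new ZipEntry(entry));
                out.write("-- sample".getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
        return zip;
    }

    /* Mounts the ZIP file as a file system and returns the path of every file in it */
    static List<String> listEntries(Path zipFile) throws IOException {
        Map<String, String> zip_properties = new HashMap<>();
        zip_properties.put("create", "false");
        /* Build the jar: URI from the real location instead of hard-coding it */
        URI zip_disk = URI.create("jar:" + zipFile.toUri());
        List<String> entries = new ArrayList<>();
        try (FileSystem zipfs = FileSystems.newFileSystem(zip_disk, zip_properties)) {
            for (Path root : zipfs.getRootDirectories()) {
                Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                    @Override
                    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                        entries.add(file.toString());
                        return FileVisitResult.CONTINUE;
                    }
                });
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        /* Print every file found inside the sample archive */
        for (String entry : listEntries(createSampleZip())) {
            System.out.println(entry);
        }
    }
}
```

SimpleFileVisitor is used here only to keep the sketch short; it supplies default implementations for the callbacks we do not need.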


Searching ZIP Archives Using java.nio.file.PathMatcher

PathMatcher is an interface in Java NIO. A PathMatcher object is created from a search pattern, compares that pattern with a Path object, and returns true to the calling program if there is a match. To create a PathMatcher, we use the getPathMatcher method available in java.nio.file.FileSystem and pass the search pattern to this method, prefixed with the syntax name (glob: in this example). Note that it is also possible to pass a regular expression to the getPathMatcher method by using the regex: prefix. Using this, we will be able to search a ZIP archive for files that match the regular expression. We will cover that in detail in another post. For now, we will focus on glob based ZIP entry searches alone. The code snippet to create a PathMatcher object is shown below:

       matcher = FileSystems.getDefault().getPathMatcher("glob:" + searchPattern);

For this example, we set the string searchPattern to *.sql, which finds all .sql files inside the ZIP archive.
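To see how getPathMatcher behaves before wiring it into the ZIP scan, a small stand-alone check can help. This is a minimal sketch, not part of the original program: the class MatcherSketch and the helper matches are hypothetical names, and the file names are made up for illustration.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

class MatcherSketch {

    /* Returns true when the path matches a "glob:..." or "regex:..." expression */
    static boolean matches(String syntaxAndPattern, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher(syntaxAndPattern);
        Path candidate = Paths.get(path);
        return matcher.matches(candidate);
    }

    public static void main(String[] args) {
        /* A glob matched against a bare file name */
        System.out.println(matches("glob:*.sql", "dest.sql"));
        System.out.println(matches("glob:*.sql", "notes.txt"));
        /* The same search written as a regular expression */
        System.out.println(matches("regex:.*\\.sql", "dest.sql"));
        /* A single * does not cross folder boundaries */
        System.out.println(matches("glob:*.sql", "sp/dest.sql"));
    }
}
```

The last call returns false because * in a glob does not match the folder separator. This is why the program in this tutorial matches the pattern against the file name rather than against the full path of the entry.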


Match ZIP File Entries with GLOB Pattern

While scanning each ZIP entry in the archive, we get the Path of the file inside the ZIP archive, and from it the name of the file. The file name can then be matched against the PathMatcher object created earlier, using the matches method. If this method returns true, we have found a match in the ZIP archive. Otherwise, we move on to the next file. This has to be repeated recursively until all the files in the ZIP file are scanned. The recursion is managed automatically by the walkFileTree method, which we discussed in the previous tutorial. Because only the file name is matched, a pattern such as *.sql finds entries in every folder, but a pattern that includes folder names will not match.

            if (name != null && matcher.matches(name)) {
                    System.out.println("Searched file was found: " + name + " in " + my_file.toRealPath().toString());
            }
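The matching step can be sketched in a form that returns the hits instead of printing them, and that can optionally match the full path of the entry instead of the file name alone. Treat this as an illustration under assumptions rather than the method used in this tutorial: the class ZipGlobSketch, the helpers createSampleZip and search, the matchFullPath flag, and the sample entry names are all hypothetical. The sketch also asks the ZIP file system itself for the PathMatcher, where the tutorial uses FileSystems.getDefault().

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.*;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ZipGlobSketch {

    /* Hypothetical helper: writes a small sample archive into a temporary folder */
    static Path createSampleZip() throws IOException {
        Path zip = Files.createTempDirectory("zipglob").resolve("sp.zip");
        String[] names = {"dest.sql", "sp/dest.sql", "sp/sp2/mira.sql", "sp/notes.txt"};
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
            for (String entry : names) {
                out.putNextEntry(new ZipEntry(entry));
                out.write("-- sample".getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
        return zip;
    }

    /* Returns the sorted entries whose file name (or full path) matches the glob */
    static List<String> search(Path zipFile, String glob, boolean matchFullPath)
    throws IOException {
        List<String> found = new ArrayList<>();
        URI zip_disk = URI.create("jar:" + zipFile.toUri());
        try (FileSystem zipfs = FileSystems.newFileSystem(zip_disk, new HashMap<String, String>())) {
            /* Ask the ZIP file system for the matcher, since the paths belong to it */
            PathMatcher matcher = zipfs.getPathMatcher("glob:" + glob);
            for (Path root : zipfs.getRootDirectories()) {
                Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                    @Override
                    public FileVisitResult visitFile(Path my_file, BasicFileAttributes attrs) {
                        Path candidate = matchFullPath ? my_file : my_file.getFileName();
                        if (candidate != null && matcher.matches(candidate)) {
                            found.add(my_file.toString());
                        }
                        return FileVisitResult.CONTINUE;
                    }
                });
            }
        }
        Collections.sort(found);
        return found;
    }

    public static void main(String[] args) throws IOException {
        Path zip = createSampleZip();
        /* Match on the file name only, as the tutorial does */
        System.out.println(search(zip, "*.sql", false));
        /* Match on the full path, so the pattern can name a folder */
        System.out.println(search(zip, "**/sp2/*.sql", true));
    }
}
```

With the sample archive built by this sketch, the first search should return the three .sql entries and the second only /sp/sp2/mira.sql, since ** is allowed to span folder boundaries while * is not.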


Complete Java NIO Program to Search ZIP File Entries with GLOB Pattern

The complete Java program for this tutorial is given below:

import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.*;
import java.io.IOException;
import java.util.*;
import java.net.URI;

class globSearch implements FileVisitor<Path> {

    private final PathMatcher matcher;

    public globSearch(String searchPattern) {
       matcher = FileSystems.getDefault().getPathMatcher("glob:" + searchPattern);
    }

    @Override
    public FileVisitResult visitFile(Path my_file, BasicFileAttributes attrs)
    throws IOException {
            /* Match on the file name only, not on the full path inside the ZIP */
            Path name = my_file.getFileName();
            if (name != null && matcher.matches(name)) {
                    System.out.println("Searched file was found: " + name + " in " + my_file.toRealPath().toString());
            }
            return FileVisitResult.CONTINUE;
    }
    /* We don't use these, so just override them */
    @Override
    public FileVisitResult postVisitDirectory(Path dir, IOException exc)
    throws IOException {
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
    throws IOException {
        return FileVisitResult.CONTINUE;
    }
    @Override
    public FileVisitResult visitFileFailed(Path file, IOException exc)
    throws IOException {
        return FileVisitResult.CONTINUE;
    }
    
    
    public static void main(String args[]) throws IOException {
        
        /* We want to find out all .SQL files inside the ZIP file */
        String searchPattern="*.sql";
        globSearch walk = new globSearch(searchPattern);        
        
        /* Define ZIP File System Properties in HashMap */
        Map<String, String> zip_properties = new HashMap<>();
        zip_properties.put("create", "false");
        URI zip_disk = URI.create("jar:file:/sp.zip");
        /* try-with-resources closes the ZIP file system when the search is done */
        try (FileSystem zipfs = FileSystems.newFileSystem(zip_disk, zip_properties)) {
            Iterable<Path> dirs = zipfs.getRootDirectories();
            /* For every root folder inside the ZIP archive */
            for (Path root : dirs) {
                Files.walkFileTree(root, walk);
            }
        }
    }    
}

For a ZIP File structure as shown in the screenshot below:

[Screenshot: Search ZIP File using Wild Cards / Glob - Input File]

The output of this program is shown below:

Searched file was found: dest.sql in /dest.sql
Searched file was found: mira.sql in /sp/sp2/mira.sql
Searched file was found: final.sql in /sp/sp2/final.sql
Searched file was found: dest.sql in /sp/dest.sql

You can also use other glob patterns to search the ZIP file for a wide variety of results. Some of these are documented in the table below:

Pattern        Search Output
-------        -------------
*.sql          Matches all .sql files inside the ZIP archive
*.{sql,log}    Matches all files with extension .sql or .log
fin*.sql       Finds all .sql files whose file name starts with “fin”
fin.?          Finds all files named “fin” that have a single-character extension
*fi*           Finds all files that have “fi” in any part of their file name
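The patterns in the table can be tried against a handful of file names without opening a ZIP file at all. The following is a small sketch for experimenting: the class GlobTableSketch, the helper globMatches, and the sample file names are hypothetical and chosen only for illustration.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

class GlobTableSketch {

    /* Returns true when the bare file name matches the glob pattern */
    static boolean globMatches(String glob, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        String[] patterns = {"*.sql", "*.{sql,log}", "fin*.sql", "fin.?", "*fi*"};
        String[] names = {"final.sql", "dest.sql", "run.log", "fin.c", "config.txt"};
        /* For every pattern, list the sample names that it matches */
        for (String pattern : patterns) {
            StringBuilder hits = new StringBuilder();
            for (String name : names) {
                if (globMatches(pattern, name)) {
                    hits.append(name).append(' ');
                }
            }
            System.out.println(pattern + " -> " + hits.toString().trim());
        }
    }
}
```

Note that the braces in a pattern such as *.{sql,log} must not contain spaces, because a space is treated as a literal character of the file name.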


That completes our tutorial to search ZIP file with a GLOB pattern. In the next example, we will discuss how to use regular expressions to search for ZIP file entries in Java NIO. Send us a comment to tell us how we are doing / if you have any questions on this tutorial.

1 comment:

  1. You are effectively only matching on the actual file name, not the full path within the zip. This means that you cannot do any matching on the folder within the zip.

    Say you want to do this glob : "**/pictures/*.png" which is of course perfectly legal but you've effectively excluded yourself from that because you match only on the last part of the Path, the file name.
