Sunday, March 18, 2007

File IO

Input/output operations route data from the keyboard, files or programs into your program and from there to the screen, printer, files or other programs. For simple tasks, such as dumping output to a file or printer or using a text file as an input script, the redirection provided by the operating system is all that is required. Our previous tutorials have used the System.in and System.out objects to read from standard input and write to standard output.

For more complex IO activities, modern languages use the concept of streams. Java uses file streams, data streams, pipe streams and object streams to manipulate I/O. To use them, import the I/O class library with the statement import java.io.*;

File Management

File management is the manner in which files are monitored and controlled for standard I/O access. The File class provides constructors that create a file handle. Creating the handle does not create or open anything on disk. The handle is then used by File class methods to access the properties of a specific file, or by file stream constructors to open files. The File class also provides the platform-dependent directory and file separator symbol (slash or backslash) through File.separator (a String) and File.separatorChar (a char). Simple File constructors either use hardcoded names or take a value from the command line, such as:

File hard = new File(File.separator + "sample.dat"); // in root dir
File soft = new File(args[0]); // entered as part of command line

Accessor methods: getAbsolutePath(), getCanonicalPath(), getName(), getPath(), getParent(), lastModified(), length(), list() {returns array of String}, listFiles() {returns array of File objects}.

Mutator methods: delete(), deleteOnExit(), mkdir(), mkdirs(), renameTo(), setLastModified(), setReadOnly().

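Boolean methods: canRead(), canWrite(), exists(), isAbsolute(), isDirectory(), isFile(), isHidden(). Note that compareTo() orders two pathnames and returns an int, not a boolean; use equals() to test two File objects for equality.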
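As a rough illustration of the accessor and boolean methods listed above, the sketch below builds a short report for one path. The class name FileInfo and the helper describe() are made up for this example and are not part of java.io.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper class, used only for illustration.
class FileInfo {
    // Collects a few File properties into a printable report.
    static String describe(File f) throws IOException {
        StringBuilder sb = new StringBuilder();
        sb.append("name: ").append(f.getName()).append('\n');
        sb.append("canonical path: ").append(f.getCanonicalPath()).append('\n');
        sb.append("exists: ").append(f.exists()).append('\n');
        sb.append("is directory: ").append(f.isDirectory()).append('\n');
        sb.append("length: ").append(f.length()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Describe the path given on the command line, or the current directory.
        File f = new File(args.length > 0 ? args[0] : ".");
        System.out.print(describe(f));
    }
}
```

Note that getCanonicalPath() is the one accessor here that can throw an IOException, which is why describe() declares it.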

Here is a very useful routine to establish the current directory path:

import java.io.File;
import java.io.IOException;

public class CurrentDir {
    public static void main (String args[]) {
        String dir="user.dir"; // name of the property that holds the current directory
        try {dir=new File(System.getProperty(dir)).getCanonicalPath();}
        catch (IOException e1) { /* handler required but empty; dir keeps the property name */ }
        System.out.println ("Current dir : " + dir);
    }
}

To limit what is returned by the list() method, apply a filter using the FilenameFilter interface. accept() is the only method the interface declares. A program showing the use of FilenameFilter is:

import java.io.File;
import java.io.FilenameFilter;

class OnlyExt implements FilenameFilter
{
    String ext;
    public OnlyExt(String ext)
    { this.ext = "." + ext; }
    public boolean accept(File dir, String name)
    { return name.endsWith(ext); }
}
class DirListOnly // *.htm for example
{
    public static void main(String args[])
    {
        String dirname="./website"; // select directory
        File f1 = new File(dirname);
        FilenameFilter only = new OnlyExt("htm"); // and ext
        String files[] = f1.list(only); // array of names, or null if dirname is not a directory
        // print each matching name (reconstructed loop body)
        for (int i=0; files != null && i<files.length; i++)
            System.out.println(files[i]);
    }
}

Note: More sophisticated GUI techniques for selecting files include the Swing JFileChooser class and its AWT cousin FileDialog. These classes also include file filters and checks for existence and the ability to access the data.

File Streams

File streams are primitive streams whose sources or destinations are files. Both byte [8-bit] (FileInputStream / FileOutputStream) and character [16-bit] (FileReader / FileWriter) quantities can be used. Some sample constructors are:

FileInputStream(fileObj)                   FileReader(fileObj)
FileInputStream(filePath)                  FileReader(filePath)
FileOutputStream(fileObj[,append_flag])    FileWriter(fileObj[,append_flag])
FileOutputStream(filePath[,append_flag])   FileWriter(filePath[,append_flag])

The append flag applies only to the output classes; input streams and readers do not accept one.

Note 1: Use character streams for new code that handles text! This allows Unicode material to be processed correctly. Keep byte streams for binary data such as images or archives.

Note 2: The read() method returns an int even when character streams are used, so that the value -1 can signal the end of the stream. Java still has some quirks to keep you thinking!!

copybyte.java is a working file IO system that uses primitive file streams to read and copy bytes to a new file. You can use copybyte as a starting point for other utilities by placing code between the read() and write() methods. However, there are more efficient ways of handling most types of data. File streams should be wrapped and buffered in data streams.
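The copybyte.java listing itself is not reproduced here. A minimal sketch of the same idea, assuming nothing about the original beyond "read a byte, write a byte", might look like this (the class name CopyBytesSketch is invented for the example):

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Illustrative sketch only; not the original copybyte.java.
class CopyBytesSketch {
    // Copies a file one byte at a time and returns the number of bytes copied.
    static long copy(String from, String to) throws IOException {
        long count = 0;
        try (FileInputStream in = new FileInputStream(from);
             FileOutputStream out = new FileOutputStream(to)) {
            int b; // int rather than byte, so that -1 can mean end of file
            while ((b = in.read()) != -1) {
                // per-byte processing for your own utility would go here
                out.write(b);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java CopyBytesSketch source target");
            return;
        }
        System.out.println(copy(args[0], args[1]) + " bytes copied");
    }
}
```

The try-with-resources form closes both streams even if an exception is thrown. Unbuffered single-byte reads are slow on large files, which is the motivation for the buffered wrappers in the next section.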

Data Streams

Data streams are streams whose sources and destinations are other streams. They are known as wrappers because they wrap the primitive file stream object mechanism inside a more powerful one. Data streams are buffered so that more than a single 8/16 bit quantity is processed at a time. The basic buffered streams are BufferedInputStream(), BufferedOutputStream(), BufferedReader() and BufferedWriter().

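Java does not provide an EOF() method as other languages do! read() returns the integer -1 at end of file and readLine() returns null, so test for those values in the read loop. The DataInputStream methods that read primitives (readInt(), readDouble() and so on) have no spare value to return, so they throw an EOFException instead; use an exception handler to catch it. Check the Exceptions tutorial. Always close a stream when you are finished with it so that buffered output is flushed and the file handle is released.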

copyline.java is a working IO system that uses buffered data streams to read and copy lines to a new file. Many utilities rely on text files, which are often best handled one line at a time. You can use this sample as a starting point for your own utility by adding code between the readLine() and write() methods.
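A line-by-line copy along the lines described above could be sketched as follows. This is an assumption about the general shape of such a program, not the original copyline.java, and the class name CopyLinesSketch is invented.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

// Illustrative sketch only; not the original copyline.java.
class CopyLinesSketch {
    // Copies a text file line by line and returns the number of lines copied.
    static int copy(String from, String to) throws IOException {
        int lines = 0;
        try (BufferedReader in = new BufferedReader(new FileReader(from));
             BufferedWriter out = new BufferedWriter(new FileWriter(to))) {
            String line;
            while ((line = in.readLine()) != null) { // null signals end of file
                // per-line processing for your own utility would go here
                out.write(line);
                out.newLine(); // readLine() strips the line terminator
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java CopyLinesSketch source target");
            return;
        }
        System.out.println(copy(args[0], args[1]) + " lines copied");
    }
}
```

Because newLine() writes the platform line separator, the copy may differ from the source in its line endings even when the text is identical.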

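DataInputStream and DataOutputStream can also be used to read and write one primitive quantity at a time. Some of the useful methods are: read(), readBoolean(), readByte(), readDouble(), readFloat(), readInt(), readLong(), readShort(), write(), etc. DataInputStream also has a readLine() method, but it is deprecated because it does not convert bytes to characters properly; use BufferedReader.readLine() for text.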
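The sketch below shows one way to round-trip primitive values through DataOutputStream and DataInputStream, ending the read loop with EOFException. The class and method names (DataStreamSketch, writeInts, sumInts) are invented for the example.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Illustrative sketch; names are hypothetical.
class DataStreamSketch {
    // Writes each int as four bytes in binary form.
    static void writeInts(String path, int[] values) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(path)))) {
            for (int v : values) {
                out.writeInt(v);
            }
        }
    }

    // Reads ints until the file runs out and returns their sum.
    static long sumInts(String path) throws IOException {
        long sum = 0;
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(path)))) {
            while (true) {
                sum += in.readInt(); // throws EOFException at end of file
            }
        } catch (EOFException end) {
            // expected: this is how the loop terminates
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "ints.dat";
        writeInts(path, new int[] {1, 2, 3});
        System.out.println("sum = " + sumInts(path));
    }
}
```

The file produced is binary, not text, so it should be read back with the matching readInt() calls rather than opened in an editor.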

Tokens (or words) are strings broken at whitespace. Adding the StringTokenizer class makes some utilities more efficient because they can work with individual tokens rather than whole lines of text (i.e. the line is already parsed for analysis). The StreamTokenizer class can also be used to read directly from a file stream! copytoken.java is a working file IO system that uses a token stream to read and copy tokens to a new file.

Note: Whitespace is minimized, which makes this format great for compressing HTML source files into a server copy. You can also use copytoken as a starting point and add your own utility between the read and write operations. One easy project to start with is wc (the Unix word count utility).
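As a possible starting point for the wc project mentioned above, the following sketch counts whitespace-separated tokens with StringTokenizer. It is one approach among several, and the class name WordCountSketch is invented here.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.StringTokenizer;

// Illustrative sketch of a word counter; names are hypothetical.
class WordCountSketch {
    // Returns the number of whitespace-separated tokens in a text file.
    static int countTokens(String path) throws IOException {
        int total = 0;
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = in.readLine()) != null) {
                // default delimiters are space, tab, newline, return and form feed
                total += new StringTokenizer(line).countTokens();
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java WordCountSketch file");
            return;
        }
        System.out.println(countTokens(args[0]) + " words");
    }
}
```

To copy tokens instead of counting them, the inner step would loop with hasMoreTokens() and nextToken() and write each token followed by a single space.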

Random Access Files

Random access files allow a file to be read or written at a specific position. They can also be opened in read/write mode, which allows an existing file to be updated in place. The constructor is RandomAccessFile(File fileObject, String accessMode) where the access mode is typically "r" or "rw". The seek(long position) method moves the file position pointer, which then advances automatically with every read or write. The getFilePointer() method returns the current file position pointer. The file size can be adjusted with setLength(). Normal I/O methods are used for access.

copyrandom.java is a working random access file IO system that uses bit streams to read and copy binary data to a new file. Note that it illustrates file creation but demonstrates neither access at specific points in the file nor in-place update of a single file, which are the real strengths of random access methods.

mirror.java shows a very simple use of the seek() method. The source file is read backwards and each byte written to a new file. Reversing a file this way is about the simplest scrambling there is; it obscures the content but offers no real security.
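The idea behind mirror.java can be sketched as below: seek to the last byte, read it, and step backwards. This is a guess at the technique from the description, not the original source, and MirrorSketch is an invented name.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

// Illustrative sketch only; not the original mirror.java.
class MirrorSketch {
    // Writes the bytes of the source file to the target file in reverse order.
    static void mirror(String from, String to) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(from, "r");
             BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(to))) {
            for (long pos = in.length() - 1; pos >= 0; pos--) {
                in.seek(pos);          // jump to the byte we want
                out.write(in.read());  // read() advances the pointer by one
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java MirrorSketch source target");
            return;
        }
        mirror(args[0], args[1]);
    }
}
```

Running the program twice, feeding the output back in as input, restores the original file.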

RandomAccess.java is a more complete example that uses user input to alter file contents. Since it extends GenericApplication, that file must be compiled first. Once both are compiled, test with java RandomAccess xxx where xxx is the filename.

To fully implement a random access io system, you must structure your own record format and then use this structure to determine record size for the seek method. You must also provide your own indexing method based on one or more fields of the record. The java.io library provides only the primitive functions necessary. At this point you may want to seek out a random access toolbox provided by another implementer!!
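To make the record-size arithmetic concrete, here is a small sketch that assumes a made-up fixed-length record of one int id followed by one double balance (12 bytes in total). The layout, the class name RecordFileSketch and the default file name accounts.dat are all assumptions for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Illustrative sketch; the record layout is hypothetical.
class RecordFileSketch {
    // int id (4 bytes) + double balance (8 bytes)
    static final int RECORD_SIZE = 4 + 8;

    // Writes (or overwrites) the record at the given zero-based index.
    static void write(RandomAccessFile raf, int index, int id, double balance) throws IOException {
        raf.seek((long) index * RECORD_SIZE);
        raf.writeInt(id);
        raf.writeDouble(balance);
    }

    // Reads only the balance field of the record at the given index.
    static double readBalance(RandomAccessFile raf, int index) throws IOException {
        raf.seek((long) index * RECORD_SIZE + 4); // skip the 4-byte id
        return raf.readDouble();
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "accounts.dat";
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw")) {
            write(raf, 0, 100, 10.0);
            write(raf, 1, 101, 20.0);
            write(raf, 1, 101, 25.5); // update record 1 in place
            System.out.println("balance of record 1: " + readBalance(raf, 1));
        }
    }
}
```

Fixed-length records keep the seek arithmetic trivial. Variable-length fields such as names would need padding to a fixed width or a separate index of record offsets.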

Project: Text File IO Class

This project will test your knowledge of basic file IO as well as how to encapsulate an object. The task is to make a TextIO object that has methods to open data streams and to read and write text files on a line by line basis. The required GUI will be added as part of a case study. To have the program do something, write a worker class that removes comment lines from a file. This will require the indexOf() and substring() methods from the String class.
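As a hint for the worker class, the fragment below shows how indexOf() and substring() combine to cut a comment off one line. It assumes comments start with "//", which the project text does not specify, and it is deliberately naive: a "//" inside a string literal would also be cut. The name CommentStripper is invented.

```java
// Illustrative sketch; assumes "//" marks the start of a comment.
class CommentStripper {
    // Returns the part of the line before the first "//", or the whole line if there is none.
    static String strip(String line) {
        int at = line.indexOf("//");
        return at < 0 ? line : line.substring(0, at);
    }

    public static void main(String[] args) {
        System.out.println(strip("int x = 1; // counter"));
    }
}
```

A line that becomes empty after stripping was a pure comment line and can be skipped when writing the output file.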

A simple enhancement that makes use of this TextIO class is a pager utility which takes a file and writes each group of 55 lines to its own single-page file. This overcomes the problem of the Java \f form feed not working with many printers. Possible command line switches are:

  • /lines=n sets the # of lines in each file [default is 55]
  • /head=n sets the # of lines used as heading [default is 0]
  • /title="string" sets the first line title [default of none]
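One way to read these switches is sketched below. The class PagerOptions and its field names are invented for the example, and the parsing assumes the shell has already removed the quotation marks around the title string.

```java
// Illustrative sketch; class and field names are hypothetical.
class PagerOptions {
    int lines = 55;      // default lines per output file
    int head = 0;        // default number of heading lines
    String title = null; // default: no title

    // Scans the argument list for the three switches and ignores anything else.
    static PagerOptions parse(String[] args) {
        PagerOptions o = new PagerOptions();
        for (String a : args) {
            if (a.startsWith("/lines=")) {
                o.lines = Integer.parseInt(a.substring("/lines=".length()));
            } else if (a.startsWith("/head=")) {
                o.head = Integer.parseInt(a.substring("/head=".length()));
            } else if (a.startsWith("/title=")) {
                o.title = a.substring("/title=".length());
            }
        }
        return o;
    }

    public static void main(String[] args) {
        PagerOptions o = parse(args);
        System.out.println("lines=" + o.lines + " head=" + o.head + " title=" + o.title);
    }
}
```

Integer.parseInt() throws NumberFormatException on a malformed number, so a finished utility would catch that and print a usage message.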

Project: Archive Packaging

This project will extend the use of the TextIO class you just constructed as well as begin to replicate an old Unix utility called archive. The task is to make an archive object that has methods to both merge text files into a block and to extract them.

The first version uses a trivial token file marker (*** ) and extracts to named or numbered files. A loop is needed to continue the extracting process. The command line syntax is archive.java filename.

The first project enhancement is to incorporate switches for pack/unpack/update, etc and to add required functionality.

A final extension would be to use a whitelist to add files or deblock files. This project can keep one busy for a week at least!

Batch File Shell

Many utilities can benefit from a shell code routine that allows running over a complete directory or a file subset (batch mode).

touch.java updates the last_modified field of all files in the current directory to the current date. This overcomes a flaw in XP that sometimes deletes older files without informing the user. Programmers can also use touch to make sure libraries are recompiled with the latest version of the compiler.
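The core of such a utility can be sketched with listFiles() and setLastModified(). This is an assumption about how touch.java might work, not its actual source, and TouchSketch is an invented name.

```java
import java.io.File;

// Illustrative sketch only; not the original touch.java.
class TouchSketch {
    // Sets the last-modified time of every regular file in dir to now.
    // Returns the number of files that were updated.
    static int touchAll(File dir) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return 0; // dir does not exist or is not a directory
        }
        long now = System.currentTimeMillis();
        int touched = 0;
        for (File f : entries) {
            if (f.isFile() && f.setLastModified(now)) {
                touched++;
            }
        }
        return touched;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println(touchAll(dir) + " files touched");
    }
}
```

setLastModified() returns false rather than throwing when the update fails, for example on a read-only file system, so the return value is worth checking.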

longest.java [ext] uses a command line approach and a FilenameFilter to execute on all files with a given extension. The batch operation wraps around a method that identifies the longest line in each file.
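A batch shell of this kind might be structured as in the sketch below: a per-file worker method wrapped by a loop over the filtered directory listing. It is a reconstruction from the description rather than the original longest.java, and the default extension "txt" is an arbitrary choice.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FilenameFilter;
import java.io.IOException;

// Illustrative sketch only; not the original longest.java.
class LongestSketch {
    // Worker method: returns the length of the longest line in one file.
    static int longestLine(File f) throws IOException {
        int longest = 0;
        try (BufferedReader in = new BufferedReader(new FileReader(f))) {
            String line;
            while ((line = in.readLine()) != null) {
                longest = Math.max(longest, line.length());
            }
        }
        return longest;
    }

    // Batch shell: runs the worker over every file with the given extension.
    public static void main(String[] args) throws IOException {
        final String ext = "." + (args.length > 0 ? args[0] : "txt");
        File[] files = new File(".").listFiles(new FilenameFilter() {
            public boolean accept(File dir, String name) {
                return name.endsWith(ext);
            }
        });
        if (files == null) {
            return; // current directory could not be listed
        }
        for (File f : files) {
            if (f.isFile()) {
                System.out.println(f.getName() + ": " + longestLine(f));
            }
        }
    }
}
```

Swapping longestLine() for another per-file method turns the same shell into a different batch utility.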

Command line wildcard file constructs are automatically expanded at runtime and passed to programs as command line arguments. Careful attention must be given here! wildcard.java shows how to handle command line wildcard expansion.

Ongoing Projects

WordCount2.java adds basic file IO to our previous project in word counting. Reuse the WordCount class and add the text io class to the workspace class. Use fixed or argument filenames for now. The required GUI will be added as part of a case study.

Prep2.java adds basic file IO to our previous project in cipher text preparation. Reuse the Prep class and add the text io class to the workspace class. Use fixed or argument filenames for now. The required GUI will be added as part of a case study.

TagScan2.java adds basic file IO to our previous project in HTML analysis. Reuse the TagScan class and add the text io class to the workspace class. Use fixed or argument filenames for now. The required GUI will be added as part of a case study.
