Introduction to Java Input/Output Streams

This document introduces the basic concepts of Java input/output streams (as implemented by the java.io package of the Java class libraries).

Streams

A stream transports data from some point A to some other point B. It may modify the data on its way. A stream is not active on its own. Some other object (let's call it a "pump") must push data into a stream or pull data from a stream.

Pull vs. Push Streams

There are two different kinds of streams:

Stream Type    Java Classes           Key Method
Pull Streams   Reader, InputStream    read()
Push Streams   Writer, OutputStream   write()

You use a Pull Stream to pull data out of something. Java provides many pull streams, so your job is usually to write a pump object that pulls data out of a stream by calling its read() methods.

A Push Stream is used to push data into something. Java provides many push streams; you just have to write a pump object that pushes data into a stream by calling its write() methods.
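The pump is the only active part. A minimal sketch, assuming in-memory streams stand in for real data sources and sinks; the class name Pump and its pump() method are hypothetical, chosen for illustration. The loop pulls bytes out of a pull stream with read() and pushes them into a push stream with write():

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, for illustration only.
public class Pump {

    // The pump: pulls bytes out of a pull stream and pushes them into a push stream.
    // Returns the number of bytes moved.
    static long pump(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int n;
        // read() returns -1 once the pull stream has no more data
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("hello, streams".getBytes(StandardCharsets.US_ASCII));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long moved = pump(in, out);
        // prints "14 bytes: hello, streams"
        System.out.println(moved + " bytes: " + new String(out.toByteArray(), StandardCharsets.US_ASCII));
    }
}
```

Since Java 9, InputStream.transferTo(OutputStream) runs an equivalent loop, so in practice this pump rarely needs to be written by hand.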

Source and Sink Streams

Source Streams and Sink Streams are special types of pull and push streams, which terminate a pipeline composed of streams by reading from, or writing to, a "data store".

Stream Type      Special Kind Of   Java Classes           Key Method
Source Streams   Pull Streams      Reader, InputStream    read()
Sink Streams     Push Streams      Writer, OutputStream   write()

A Source Stream starts a pipeline of pull streams. You connect the source stream to some data source. At the end of a pipeline of pull streams, some pump object has to pull (read) the data out of the pipeline.
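A sketch of a pull pipeline, assuming a byte array as the data source. A ByteArrayInputStream acts as the source stream; UpperCaseInputStream is a hypothetical filter written here for illustration (it is not part of java.io) that modifies ASCII letters on their way through; pullAll() is the pump at the end:

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, for illustration only.
public class PullPipeline {

    // Hypothetical intermediate pull stream: upper-cases ASCII letters as they pass through.
    static class UpperCaseInputStream extends FilterInputStream {
        UpperCaseInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            return b == -1 ? -1 : toUpper(b);
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            // when n is -1 (end of stream) the loop body never runs
            for (int i = off; i < off + n; i++) {
                buf[i] = (byte) toUpper(buf[i] & 0xFF);
            }
            return n;
        }

        private static int toUpper(int b) {
            return (b >= 'a' && b <= 'z') ? b - ('a' - 'A') : b;
        }
    }

    // The pump at the end of the pipeline: pulls until the pipeline reports end of stream.
    static String pullAll(InputStream pipeline) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = pipeline.read()) != -1) {
            sb.append((char) b); // adequate for ASCII data only
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] dataStore = "pull me".getBytes(StandardCharsets.US_ASCII); // the data source
        InputStream source = new ByteArrayInputStream(dataStore);         // source stream
        InputStream pipeline = new UpperCaseInputStream(source);          // filtering pull stream
        System.out.println(pullAll(pipeline)); // prints "PULL ME"
    }
}
```

With files, a FileInputStream would take the place of the ByteArrayInputStream; the rest of the pipeline stays the same.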

A Sink Stream ends a pipeline of push streams. You connect the sink stream to some data sink. At the beginning of such a pipeline of push streams, some pump object has to push (write) the data into the pipeline.
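The mirror image for push streams, again a sketch assuming in-memory storage: the pump pushes bytes into a BufferedOutputStream, which passes them on to a ByteArrayOutputStream acting as the sink stream. The class and method names are hypothetical:

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, for illustration only.
public class PushPipeline {

    // The pump at the start of the pipeline: pushes every byte with write().
    static void pushAll(byte[] data, OutputStream pipeline) throws IOException {
        for (byte b : data) {
            pipeline.write(b);
        }
        pipeline.flush(); // pass any buffered bytes on to the sink
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // sink stream; its internal array is the data sink
        OutputStream pipeline = new BufferedOutputStream(sink);   // intermediate push stream
        pushAll("push me".getBytes(StandardCharsets.US_ASCII), pipeline);
        System.out.println(new String(sink.toByteArray(), StandardCharsets.US_ASCII)); // prints "push me"
    }
}
```

The flush() call matters: without it, the bytes could stay in the BufferedOutputStream's buffer and never reach the sink. Writing to a file would use a FileOutputStream as the sink stream instead.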

Pipes

You cannot directly connect a pull stream to a push stream. In such a configuration, there would be no pump, so no data would flow. There always has to be some pump object driving the flow of data (reading from a pull stream and/or writing to a push stream).

There is a way, though, to connect a pull stream directly to a push stream: you connect a PipedInputStream, a special kind of pull stream, to a PipedOutputStream, a special kind of push stream. In such a configuration you need two pump objects, one to push data into the PipedOutputStream and another to pull it out of the PipedInputStream. If you pump data into such a pipe, it is buffered (the buffer actually lives in the PipedInputStream) until the second pump reads it. The two pumps should run in separate threads; using both ends of a pipe from a single thread may deadlock it.
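A minimal sketch of a pipe with its two pumps, assuming a short text message. The names PipeDemo and sendThroughPipe are hypothetical. A separate thread pushes into the PipedOutputStream while the calling thread pulls from the connected PipedInputStream; closing the push end is what makes read() eventually return -1:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, for illustration only.
public class PipeDemo {

    // Sends `message` through a pipe: one thread pushes, the calling thread pulls.
    static String sendThroughPipe(String message) throws IOException, InterruptedException {
        PipedInputStream in = new PipedInputStream();      // pull end of the pipe (holds the buffer)
        PipedOutputStream out = new PipedOutputStream(in); // push end, connected to `in`

        // First pump: pushes the data into the pipe from its own thread.
        Thread writer = new Thread(() -> {
            try (PipedOutputStream o = out) { // closing signals end of stream to the reader
                o.write(message.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        writer.start();

        // Second pump: pulls the data out of the pipe until the writer has closed its end.
        ByteArrayOutputStream received = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        int n;
        while ((n = in.read(buffer)) != -1) {
            received.write(buffer, 0, n);
        }
        in.close();
        writer.join();
        return new String(received.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        System.out.println(sendThroughPipe("data through a pipe")); // prints "data through a pipe"
    }
}
```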

Byte vs. Character Streams

In Java (since version 1.1), you have the choice between two different types of streams. The difference lies in the type of data they transport.

Stream Type         Data Element Type   Java Classes
Byte Streams        byte                InputStream, OutputStream
Character Streams   char                Reader, Writer

Byte Streams are dumb. They know absolutely nothing about the data flowing through them; they only know that a data element is a byte.

Character Streams are quite dumb, too. But instead of a single byte, their smallest unit of information is a Unicode character.

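So if you use a byte stream's write() method, you have to supply a byte as its argument (the method actually takes an int and writes its eight low-order bits). The read() method returns a byte, too, at least conceptually: it really returns an int that holds the byte's value (0 to 255) in its least significant byte, or -1 when the end of the stream has been reached. Errors are reported by throwing an IOException.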
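A small check of the int return value, as a sketch using an in-memory stream (ByteReadDemo and readAllValues are hypothetical names). The byte 0xFF comes back as 255, not -1, which is why read() can use -1 to signal the end of the stream:

```java
import java.io.ByteArrayInputStream;
import java.util.Arrays;

// Hypothetical example class, for illustration only.
public class ByteReadDemo {

    // Calls read() once per byte plus once more, collecting the raw int results.
    static int[] readAllValues(byte[] data) {
        // ByteArrayInputStream.read() does not declare IOException, so no try/catch is needed
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int[] values = new int[data.length + 1];
        for (int i = 0; i < values.length; i++) {
            values[i] = in.read();
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] data = {65, (byte) 0xFF}; // 'A' and the byte with all bits set
        int[] values = readAllValues(data);
        System.out.println(Arrays.toString(values)); // prints [65, 255, -1]
        // Cast back to byte only after checking for -1: as a byte, 255 is also -1.
        System.out.println((byte) values[1]); // prints -1
    }
}
```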

If you use a character stream's write() method instead, you supply a char as the argument. The corresponding read() method returns a char: again, it really returns an int, which holds the char's value (0 to 65535) in its two low-order bytes, or -1 when the end of the stream has been reached.
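The same idea for a character stream, sketched with a StringReader (CharReadDemo and readAllValues are hypothetical names). Each read() returns the char's numeric value, and -1 at the end:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical example class, for illustration only.
public class CharReadDemo {

    // Calls read() once per char plus once more, collecting the raw int results.
    static int[] readAllValues(String text) {
        try (Reader in = new StringReader(text)) {
            int[] values = new int[text.length() + 1];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.read();
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an open in-memory reader
        }
    }

    public static void main(String[] args) {
        // 'A', e with acute accent (U+00E9), euro sign (U+20AC)
        int[] values = readAllValues("A\u00e9\u20ac");
        System.out.println(Arrays.toString(values)); // prints [65, 233, 8364, -1]
    }
}
```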

Character Encoding

In our digital world, there is ultimately only one way to represent all kinds of data: as bits or bytes. For a computer, there is no such thing as "text" or "image" or "audio"; it only knows plain bytes.

A program has to tell the computer about the meaning, the interpretation, of a specific byte. If you want to store text on a computer, this is relatively easy: you assign a number to every character in your alphabet. This mechanism is called encoding.

In the world of computers, there are hundreds of such encodings. One of the best known is the ASCII Code. It defines the encoding of 128 characters as numbers from 0 to 127, so it uses 7 bits per character.

The ASCII Code lacks some important characters, for example the accented letters and umlauts used in many European alphabets. To be able to store these characters too, ASCII has been extended in different ways. One of these extensions is the ISO-8859-1 encoding, also called ISO-Latin-1. It provides 256 characters (stored as numbers 0 to 255, using 8 bits) and contains most of the letters used in Western European languages. Solaris uses ISO-8859-1 for encoding text. Graphical Windows programs use a similar encoding called Cp1252 (also called Windows-Latin-1). DOS text-mode programs use other encodings, e.g. Cp850 (also called MS-DOS Latin-1).
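To see that an encoding is just a mapping from characters to numbers, here is a sketch (EncodingDemo and encode() are hypothetical names) that encodes the same string with two of the encodings mentioned above. US-ASCII has no code for the accented e, so String.getBytes substitutes its replacement byte, '?' (63). Only US-ASCII and ISO-8859-1 are used because StandardCharsets guarantees them; Cp1252 is usually available through Charset.forName("windows-1252"), but a Java platform is not required to support it:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical example class, for illustration only.
public class EncodingDemo {

    // Returns the bytes of `text` in the given encoding as unsigned values (0..255).
    static int[] encode(String text, Charset encoding) {
        byte[] bytes = text.getBytes(encoding); // unmappable chars become the charset's replacement
        int[] unsigned = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            unsigned[i] = bytes[i] & 0xFF;
        }
        return unsigned;
    }

    public static void main(String[] args) {
        String text = "Caf\u00e9"; // "Cafe" ending in an e with acute accent
        System.out.println(Arrays.toString(encode(text, StandardCharsets.US_ASCII)));   // prints [67, 97, 102, 63]
        System.out.println(Arrays.toString(encode(text, StandardCharsets.ISO_8859_1))); // prints [67, 97, 102, 233]
    }
}
```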

But even 256 different values are not enough to represent all characters of all existing alphabets, so encodings using more than one byte per character have been defined. The best known and most widely used is Unicode, which Java uses internally: a Java char holds a 16-bit Unicode value (more precisely, a UTF-16 code unit). So what Java character streams really transport are 16-bit values representing characters encoded in Unicode.
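A sketch (UnicodeSizes and sizes() are hypothetical names) showing how many chars and how many bytes a few characters take. UTF_16BE is used rather than UTF_16 because the latter prepends a two-byte byte-order mark when encoding. The last sample lies outside the 16-bit range, so Java stores it as two chars (a surrogate pair):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical example class, for illustration only.
public class UnicodeSizes {

    // Returns {number of Java chars, UTF-8 byte count, UTF-16 byte count} for `s`.
    static int[] sizes(String s) {
        return new int[] {
            s.length(),
            s.getBytes(StandardCharsets.UTF_8).length,
            s.getBytes(StandardCharsets.UTF_16BE).length // BE variant: no byte-order mark added
        };
    }

    public static void main(String[] args) {
        // A, e with acute accent, euro sign, and an emoji (U+1F600) outside the 16-bit range
        String[] samples = {"A", "\u00e9", "\u20ac", "\uD83D\uDE00"};
        for (String s : samples) {
            int[] n = sizes(s);
            System.out.printf("U+%04X: %d char(s), %d UTF-8 byte(s), %d UTF-16 byte(s)%n",
                    s.codePointAt(0), n[0], n[1], n[2]);
        }
    }
}
```

Run as shown, it reports 1/1/2 for A, 1/2/2 for é, 1/3/2 for € and 2/4/4 for the emoji (chars / UTF-8 bytes / UTF-16 bytes).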

But what if you want your characters to leave the Java world? Or what if you want to bring text into the Java world that was stored using some other encoding, such as the common ISO-Latin-1?

Conversion between Bytes and Characters

Java provides two stream classes that allow you to convert bytes known to represent text in a specific encoding into Java characters or vice versa.

Input Data Element Type                              Output Data Element Type                             Stream Type   Java Classes
byte (holding characters in a specified encoding)    char                                                 Pull Stream   InputStreamReader
char                                                 byte (holding characters in a specified encoding)    Push Stream   OutputStreamWriter

Each of these streams contains a converter between bytes and characters: an encoder in the OutputStreamWriter, a decoder in the InputStreamReader. You may specify the encoding as an argument to the constructor of the InputStreamReader or OutputStreamWriter class. When you use a constructor without specifying the encoding, the reader or writer uses the default encoding for your platform. On Solaris, the ISO-8859-1 encoding is chosen; since Java 18, the default is UTF-8 on all platforms.
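A sketch of why the constructor argument matters (EncodingRoundTrip and roundTrip() are hypothetical names): text written through an OutputStreamWriter in one encoding is read back through an InputStreamReader in the same or a different encoding. Passing an explicit Charset, as here, avoids depending on the platform default:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, for illustration only.
public class EncodingRoundTrip {

    // Writes `text` with one encoding, then reads the resulting bytes back with another.
    static String roundTrip(String text, Charset writeEncoding, Charset readEncoding) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (Writer writer = new OutputStreamWriter(bytes, writeEncoding)) {
                writer.write(text);
            } // closing the writer flushes the encoder's buffered bytes
            StringBuilder result = new StringBuilder();
            try (Reader reader = new InputStreamReader(
                    new ByteArrayInputStream(bytes.toByteArray()), readEncoding)) {
                int c;
                while ((c = reader.read()) != -1) {
                    result.append((char) c);
                }
            }
            return result.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for in-memory streams
        }
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // e with acute accent
        // Same encoding on both ends: the text survives.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(text)); // prints true
        // Mismatched encodings: the two UTF-8 bytes are decoded as two ISO-8859-1 characters.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).length()); // prints 2
        System.out.println("Default encoding here: " + Charset.defaultCharset());
    }
}
```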

An OutputStreamWriter produces one or more byte(s) representing a written character in the given encoding.
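A sketch (hypothetical names WriterByteCount and encodeChar) counting the bytes an OutputStreamWriter produces for a single character. Closing the writer flushes any bytes still held by its encoder:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, for illustration only.
public class WriterByteCount {

    // Returns the bytes an OutputStreamWriter produces for one character.
    static byte[] encodeChar(char c, Charset encoding) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, encoding)) {
            writer.write(c);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // 'A', e with acute accent, euro sign
        char[] samples = {'A', '\u00e9', '\u20ac'};
        for (char c : samples) {
            System.out.println((int) c + " -> " + encodeChar(c, StandardCharsets.UTF_8).length + " UTF-8 byte(s)");
        }
        System.out.println(encodeChar('\u00e9', StandardCharsets.ISO_8859_1).length + " ISO-8859-1 byte(s)");
    }
}
```

In UTF-8, 'A' yields one byte, 'é' two and '€' three; in ISO-8859-1, 'é' yields the single byte 0xE9.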

An InputStreamReader consumes one or more byte(s) representing a character to be read in the given encoding.
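The reverse direction, sketched with hypothetical names ReaderCharCount and decode(): an InputStreamReader given six UTF-8 bytes hands out only three characters, because 'é' and '€' each occupy more than one byte:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, for illustration only.
public class ReaderCharCount {

    // Reads all characters from `bytes` through an InputStreamReader using the given encoding.
    static String decode(byte[] bytes, Charset encoding) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), encoding)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'A' (1 byte), e with acute accent (2 bytes), euro sign (3 bytes), all in UTF-8
        byte[] utf8 = {0x41, (byte) 0xC3, (byte) 0xA9, (byte) 0xE2, (byte) 0x82, (byte) 0xAC};
        String text = decode(utf8, StandardCharsets.UTF_8);
        System.out.println(utf8.length + " bytes -> " + text.length() + " chars"); // prints "6 bytes -> 3 chars"
    }
}
```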


Matthias Hauswirth