Java network programming and distributed computing

Java™ Network Programming and Distributed Computing
By David Reilly, Michael Reilly
Publisher : Addison Wesley
Pub Date : March 25, 2002
ISBN : 0-201-71037-4
Pages : 496
Java(TM) Network Programming and Distributed Computing is an accessible
introduction to the changing face of networking theory, Java(TM) technology, and the
fundamental elements of the Java networking API. With the explosive growth of the
Internet, Web applications, and Web services, the majority of today's programs and
applications require some form of networking. Because it was created with extensive
networking features, the Java programming language is uniquely suited for network
programming and distributed computing.
Whether you are a Java devotee who needs a solid working knowledge of network
programming or a network programmer needing to apply your existing skills to Java, this
how-to guide is the one book you will want to keep close at hand. You will learn the
basic concepts involved with networking and the practical application of the skills
necessary to be an effective Java network programmer. An accelerated guide to the
networking API, Java(TM) Network Programming and Distributed Computing also
serves as a comprehensive, example-rich reference.
You will learn to maximize the API structure through in-depth coverage of:
• The architecture of the Internet and TCP/IP
• Java's input/output system
• How to write clients and servers using the User Datagram Protocol (UDP) and
TCP
• The advantages of multi-threaded applications
• How to implement network protocols and see examples of client/server
implementations
• HTTP and how to write server-side Java applications for the Web
• Distributed computing technologies such as Remote Method Invocation (RMI) and CORBA
• How to access e-mail using the extensive and powerful JavaMail(TM) API
This book's coverage of advanced topics such as input/output streaming and multi-
threading allows even the most experienced Java developers to sharpen their skills.
Java(TM) Network Programming and Distributed Computing will get you up to speed
with network programming today, helping you employ innovative techniques in your
own software development projects.
Table of Contents
Copyright
Dedication
PREFACE
What You'll Learn
What You'll Need
Companion Web Site
Contacting the Authors
ACKNOWLEDGMENTS
Chapter 1. Networking Theory
1.1 What Is a Network?
1.2 How Do Networks Communicate?
1.3 Communication across Layers
1.4 Advantages of Layering
1.5 Internet Architecture
1.6 Internet Application Protocols
1.7 TCP/IP Protocol Suite Layers
1.8 Security Issues: Firewalls and Proxy Servers
1.9 Summary
Chapter 2. Java Overview
2.1 What Is Java?
2.2 The Java Programming Language
2.3 The Java Platform
2.4 The Java Application Program Interface
2.5 Java Networking Considerations
2.6 Applications of Java Network Programming
2.7 Java Language Issues
2.8 System Properties
2.9 Development Tools
2.10 Summary
Chapter 3. Internet Addressing
3.1 Local Area Network Addresses
3.2 Internet Protocol Addresses
3.3 Beyond IP Addresses: The Domain Name System
3.4 Internet Addressing with Java
3.5 Summary
Chapter 4. Data Streams
4.1 Overview
4.2 How Streams Work
4.3 Filter Streams
4.4 Readers and Writers
4.5 Object Persistence and Object Serialization
4.6 Summary
Chapter 5. User Datagram Protocol
5.1 Overview
5.2 DatagramPacket Class
5.3 DatagramSocket Class
5.4 Listening for UDP Packets
5.5 Sending UDP Packets
5.6 User Datagram Protocol Example
5.7 Building a UDP Client/Server
5.8 Additional Information on UDP
5.9 Summary
Chapter 6. Transmission Control Protocol
6.1 Overview
6.2 TCP and the Client/Server Paradigm
6.3 TCP Sockets and Java
6.4 Socket Class
6.5 Creating a TCP Client
6.6 ServerSocket Class
6.7 Creating a TCP Server
6.8 Exception Handling: Socket-Specific Exceptions
6.9 Summary
Chapter 7. Multi-threaded Applications
7.1 Overview
7.2 Multi-threading in Java
7.3 Synchronization
7.4 Interthread Communication
7.5 Thread Groups
7.6 Thread Priorities
7.7 Summary
Chapter 8. Implementing Application Protocols
8.1 Overview
8.2 Application Protocol Specifications
8.3 Application Protocol Implementation
8.4 Summary
Chapter 9. HyperText Transfer Protocol
9.1 Overview
9.2 HTTP and Java
9.3 Common Gateway Interface (CGI)
9.4 Summary
Chapter 10. Java Servlets
10.1 Overview
10.2 How Servlets Work
10.3 Using Servlets
10.4 Running Servlets
10.5 Writing a Simple Servlet
10.6 SingleThreadModel
10.7 ServletRequest and HttpServletRequest
10.8 ServletResponse and HttpResponse
10.9 ServletConfig
10.10 ServletContext
10.11 Servlet Exceptions
10.12 Cookies
10.13 HTTP Session Management in Servlets
10.14 Summary
Chapter 11. Remote Method Invocation (RMI)
11.1 Overview
11.2 How Does Remote Method Invocation Work?
11.3 Defining an RMI Service Interface
11.4 Implementing an RMI Service Interface
11.5 Creating Stub and Skeleton Classes
11.6 Creating an RMI Server
11.7 Creating an RMI Client
11.8 Running the RMI System
11.9 Remote Method Invocation Packages and Classes
11.10 Remote Method Invocation Deployment Issues
11.11 Using Remote Method Invocation to Implement Callbacks
11.12 Remote Object Activation
11.13 Summary
Chapter 12. Java IDL and CORBA
12.1 Overview
12.2 Architectural View of CORBA
12.3 Interface Definition Language (IDL)
12.4 From IDL to Java
12.5 Summary
Chapter 13. JavaMail
13.1 Overview
13.2 Installing the JavaMail API
13.3 Testing the JavaMail Installation
13.4 Working with the JavaMail API
13.5 Advanced Messaging with JavaMail
13.6 Summary
Copyright
Many of the designations used by manufacturers and sellers to distinguish their products are
claimed as trademarks. Where those designations appear in this book and Addison-Wesley was
aware of a trademark claim, the designations have been printed in initial caps or all caps.
The authors and publisher have taken care in the preparation of this book, but make no expressed
or implied warranty of any kind and assume no responsibility for errors or omissions. No liability
is assumed for incidental or consequential damages in connection with or arising out of the use of
the information or programs contained herein.
The publisher offers discounts on this book when ordered in quantity for special sales. For more
information, please contact:
Pearson Education Corporate Sales Division
201 W. 103rd Street
Indianapolis, IN 46290
(800) 428-5331
[email protected]
Visit Addison-Wesley on the Web: www.awl.com/cseng/
Library of Congress Control Number:
2002101206
Copyright © 2002 by Pearson Education, Inc.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or
transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording, or
otherwise, without the prior consent of the publisher. Printed in the United States of America.
Published simultaneously in Canada.
For information on obtaining permission for use of material from this work, please submit a
written request to:
Pearson Education, Inc.
Rights and Contracts Department
75 Arlington Street, Suite 300
Boston, MA 02116
Fax: (617) 848-7047
Text printed on recycled paper
12345678910—CRS—0605040302
First printing, March 2002
Dedication
To the memory of Countess Ada Lovelace, the world's first computer programmer, and to Myrtle
Irene Daley, my beloved grandmother. A gracious thanks goes out to two former instructors, Mr.
Terry Bell and Dr. Zheng da Wu, whose encouragement and faith in writing and networking,
respectively, guided me to what I am today.
—David Reilly
PREFACE
Welcome to Java Network Programming and Distributed Computing. The goal of this book is to
introduce and explain the basic concepts of networking and discuss the practical aspects of Java
network programming.
This book will help readers get up to speed with network programming and employ the techniques
learned in software development. If you've had some networking experience in another language
and want to apply your existing skills to Java, you'll find the book to be an accelerated guide and a
comprehensive reference to the networking API. This book does not require you to be a
networking guru, however, as Chapters 1–4 provide a gentle introduction to networking theory,
Java, and the most basic elements of the Java networking API. In later chapters, the Java API is
covered in greater detail, with a discussion supplementing the documentation that Sun
Microsystems provides as a reference.
What You'll Learn
In this book, readers will learn how to write applications in Java that make use of network
programming. The Java API provides many ways to communicate over the Internet, from sending
packets and streams of data to employing higher-level application protocols such as HTTP and
distributed computing mechanisms.
Along the way, you'll read about:
• How the Internet works, its architecture and the TCP/IP protocol stack
• The Java programming language, including a refresher course on topics such as exception
handling
• Java's input/output system and how it works
• How to write clients and servers using the User Datagram Protocol (UDP) and the
Transmission Control Protocol (TCP)
• The advantages of multi-threaded applications, which allow network applications to
perform multiple tasks concurrently
• How to implement network protocols, including examples of client/server
implementations
• The HyperText Transfer Protocol (HTTP) and how to access the World Wide Web using
Java
• How to write server-side Java applications for the WWW
• Distributed computing technologies including remote method invocation (RMI) and
CORBA
• How to access e-mail using the extensive JavaMail API
What You'll Need
A reasonable familiarity with Java programming is required to get the most out of this book.
You'll need to be able to compile and run Java applications and to understand basic concepts such
as classes, objects, and the Java API. However, you don't need to be an expert with respect to the
more advanced topics covered herein, such as I/O streams and multi-threading. All examples use a
text interface, so there's no need to have GUI experience.
You'll also need to install the Java SDK, available for free from Sun Microsystems
(http://java.sun.com/j2se/). Java programmers will no doubt already have access to the SDK, but
readers should be aware that some examples in this text will require JDK 1.1, and the advanced
sections on servlets, RMI and CORBA, and JavaMail will require Java 2.
A minimal amount of additional software is required, and most of the tools for Java programming
are available for free and downloadable via the WWW. Chapter 2 includes an overview of Java
development tools, but readers can also use their existing code editor. Readers will be advised
when examples feature additional Sun Microsystems software.
Companion Web Site
As a companion to the material covered in this book, the book's Web site offers the source code in
downloadable form (no need to wear out your fingers!), as well as a list of Frequently Asked
Questions about Java Networking, links to networking resources, and additional information about
the book. The site can be found at
http://www.davidreilly.com/jnpbook/.
Contacting the Authors
We welcome feedback from readers, be it comments on specific chapters or sections or an
evaluation of the book as a whole. In particular, reader input about whether topics were clearly
conveyed and sufficiently comprehensive would be appreciated. While we'd love to receive only
praise, honest opinions are valued (as well as suggestions about coverage of new networking
topics).
Feel free to contact us directly. While we can't guarantee an individual reply, we'll do our best to
respond to your query. Please send questions and feedback via e-mail to:
[email protected].
David Reilly and Michael Reilly
September 2001
ACKNOWLEDGMENTS
This book would not have been possible without the assistance of our peer reviewers, who
contributed greatly to improving its quality and allowing us to deliver a guide to Java network
programming that is both clear and comprehensive. Our thanks go to Michael Brundage, Elisabeth
Freeman, Bob Kitzberge, Lak Ming Lam, Ian Lance Taylor, and John J. Wegis.
We'd like to make special mention of two reviewers who contributed detailed reviews and offered
insightful recommendations: Howard Lee Harkness and D. Jay Newman. Most of all, we would
like to thank Amy Fong, whose thoroughness and invaluable suggestions, including questions that
the inquisitive reader might have about TCP/IP and Java, helped shape the book that you are
reading today.
We'd also like to thank our editorial team at Addison-Wesley, including Karen Gettman, whose
initial encouragement and persistence convinced us to take on the project, Mary Hart, Marcy
Barnes-Henrie, Melissa Dobson, and Emily Frey. Their support throughout the process of writing,
editing, and preparing this book for publication is most heartily appreciated.
Chapter 1. Networking Theory
This chapter provides an overview of the basic concepts of networking and discusses essential
topics of networking theory. Readers experienced with networking may choose to skip over some
of these preliminary sections, although a refresher course on basic networking concepts will be
useful, as later chapters presume a knowledge of this theory on the part of the reader. A solid
understanding of the relationship between the various protocols that make up the TCP/IP suite is
required for network programming.
1.1 What Is a Network?
Put simply, a network is a collection of devices that share a common communication protocol and
a common communication medium (such as network cables, dial-up connections, and wireless
links). We use the term devices in this definition rather than computers, even though most people
think of a network as being a collection of computers; certainly the basic concept of a network in
most people's minds is of an assembly of network servers and desktop machines.
However, to say that networks are merely a collection of computers is to limit the range of
hardware that can use them. For example, printers may be shared across a network, allowing more
than one machine to gain access to their services. Other types of devices can also be connected to
a network; these devices can provide access to information, or offer services that may be
controlled remotely. Indeed, there is a growing movement toward connecting noncomputing
devices to networks. While the technology is still evolving, we're moving toward a network-
centric as opposed to a computing-centric model. Services and devices can be distributed across a
network rather than being bound to individual machines. In the same way, users can move from
machine to machine, logging on as if they were sitting at their own familiar terminal.
One fun and popular example from very early on in the history of networking is the soda machine
connected to the Internet, allowing people around the world to see how many cans of a certain
flavor of drink were available. While a trivial application, it served to demonstrate the power of
networking devices. Indeed, as home networks become easier to use and more affordable, we may
even see regular household appliances such as telephones, televisions, and home stereo systems
connected to local networks or even to the Internet.
Network and software standards such as Sun's Jini already exist to help devices and hardware talk
to each other over networks and to allow instant plug-and-play functionality. Devices and services
can be added and removed from the network (as, for example, when you unplug your printer and
take it to the next room) without the need for complex administration and configuration. It is
anticipated that over the course of the next few years, users will become just as comfortable and
familiar with network-centric computing as they are with the Internet.
In addition to devices that provide services, there are devices that keep the network going. Depending on
the complexity of a network and its physical architecture, elements forming it may include
network cards, routers, hubs, and gateways. These terms are defined below.
• Network cards are hardware devices added to a computer to allow it to talk to a network.
The most common network card in use today is the Ethernet card. Network cards usually
connect to a network cable, which is the link to the network and the medium through
which data is transmitted. However, other media exist, such as dial-up connections
through a phone line, and wireless links.
• Routers are machines that act as switches. These machines direct packets of data to the
next "hop" in their journey across a network.
• Hubs provide connections that allow multiple computers to access a network (for example,
allowing two desktop machines to access a local area network).
• Gateways connect one network to another—for example, a local area network to the
Internet. While routers and gateways are similar, a router does not have to bridge multiple
networks. In some cases, routers are also gateways.
While it is useful to understand such networking terminology as it is widely used in networking
texts and protocol specifications, programmers do not generally need to be concerned with the
implementation details of a network and its underlying architecture. However, it is important for
programmers to be aware of the various elements making up the network.
1.2 How Do Networks Communicate?
Networks consist of connections between computers and devices. These connections are most
commonly physical connections, such as wires and cables, through which electricity is sent.
However, many other media exist. For example, it is possible to use infrared and radio as a
communication medium for transmitting data wirelessly, or fiber-optic cables that use light rather
than electricity.
Such connections carry data between one point in the network and another. This data is
represented as bits of information (either "on" or "off," a "zero" or a "one"). Whether through a
physical medium such as a cable, through the air, or using light, this raw data is passed across
various points in the network called nodes; a node could represent a computer, another type of
hardware device such as a printer, or a piece of networking equipment that relays this information
onward to other nodes in the network or to an entirely different network. Of course, for data to be
successfully delivered to individual nodes, these nodes must be clearly identifiable.
1.2.1 Addressing
Each node in a network is typically represented by an address, just as a street name and number,
town or city, and zip code identify individual homes and offices. The manufacturer of the
network interface card (NIC) installed in such devices is responsible for ensuring that no two card
addresses are alike, and chooses a suitable addressing scheme. Each card will have this address
stored permanently, so that it remains fixed—it cannot be manually assigned or modified,
although some operating systems will allow these addresses to be faked in the event of an
accidental conflict with another card's address.
Because of the wide variety of NICs, many addressing schemes are used. For example, Ethernet
network cards are assigned a unique 48-bit number to distinguish one card from another. Usually,
a number is assigned to each card, and manufacturers are allocated batches of numbers.
This system must be strictly regulated by industry, of course—two cards with the same address
would cause headaches for network administrators. The physical address is referred to by many
names (some of which are specific to a certain type of card, while others are general terms),
including:
• Hardware address
• Ethernet address
• Media Access Control (MAC) address
• NIC address
These addresses are used to send information to the appropriate node. If two nodes shared the
same address, they would be competing for the same information and one would inevitably lose
out, or both would receive the same data. Often, machines are known by more than one type of
address. A network server may have a physical Ethernet address as well as an Internet Protocol (IP)
address that distinguishes it from other hosts on the Internet, or it may have more than one
network card.
Within a local area network, machines can use physical addresses to communicate. However,
since there are many types of these addresses, they are not appropriate for internetwork
communication. As discussed later in this chapter, the IP address is used for this purpose.
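To make the 48-bit Ethernet address concrete, the following minimal sketch prints such a number in the colon-separated hexadecimal form commonly used for hardware addresses. The class name HardwareAddressDemo and the sample address are hypothetical, chosen only for illustration.

```java
class HardwareAddressDemo {
    /** Formats the low 48 bits of the given value as six colon-separated hex octets. */
    static String format(long address) {
        StringBuilder sb = new StringBuilder(17);
        for (int shift = 40; shift >= 0; shift -= 8) {
            if (sb.length() > 0) {
                sb.append(':');
            }
            sb.append(String.format("%02x", (address >> shift) & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical card address, used purely as an example.
        long card = 0x00A0C914C829L;
        System.out.println(format(card)); // prints 00:a0:c9:14:c8:29
    }
}
```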
1.2.2 Data Transmission Using Packets
Sending data from node to node one byte at a time is not cost effective, because the necessary
address information would have to be relayed every time a byte of data is
transmitted. Most networks, instead, group data into packets. Packets consist of a header and data
segment, as shown in Figure 1-1. The header contains addressing information (such as the sender
and the recipient), checksums to ensure that a packet has not been corrupted, as well as other
useful information that is needed for transmission across the network. The data segment contains
sequences of bytes, comprising the actual data being sent from one node to another. Since the
header information is needed only for transmission, applications are interested only in the data
segment. Ideally, as much data as possible would be combined into a packet, in order to minimize
the overhead of the headers. However, if information needs to be sent quickly, packets may be
dispatched when nearly empty. Depending on the type of packet and protocol being used, packets
may also be padded out to fit a fixed length of bytes.
Figure 1-1. Pictorial representation of a packet header
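The split between header and data segment can be sketched in a few lines of Java. The header layout used here (a 4-byte sender, a 4-byte recipient, and a 2-byte length) is an assumption made for illustration and does not correspond to any real protocol; the class name SimplePacket is likewise hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class SimplePacket {
    // Hypothetical header: 4-byte sender, 4-byte recipient, 2-byte payload length.
    static final int HEADER_BYTES = 10;

    /** Prepends the illustrative header to the payload. */
    static byte[] encode(int sender, int recipient, byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES + payload.length);
        buf.putInt(sender);
        buf.putInt(recipient);
        buf.putShort((short) payload.length);
        buf.put(payload);
        return buf.array();
    }

    /** Strips the header and returns only the data segment. */
    static byte[] payloadOf(byte[] packet) {
        ByteBuffer buf = ByteBuffer.wrap(packet);
        buf.position(8);                      // skip sender and recipient
        int length = buf.getShort() & 0xFFFF; // read the length as an unsigned value
        byte[] payload = new byte[length];
        buf.get(payload);
        return payload;
    }

    /** Fraction of the whole packet that is taken up by the header. */
    static double overhead(int payloadBytes) {
        return (double) HEADER_BYTES / (HEADER_BYTES + payloadBytes);
    }

    public static void main(String[] args) {
        byte[] packet = encode(1, 2, "hello".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(payloadOf(packet), StandardCharsets.US_ASCII));
        System.out.println(overhead(1));    // a single byte of data: the header dominates
        System.out.println(overhead(1000)); // a fuller packet: the header is a small fraction
    }
}
```

The overhead method shows why fuller packets are preferred: the fixed header cost is spread over more data bytes.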
When a node on the network is ready to transmit a packet, a direct connection to the destination
node is usually not available. Instead, intermediary nodes carry packets from one location to
another, and this process is repeated until the packet reaches its destination. Due to
network conditions (such as congestion or network failures), packets may take arbitrary routes,
and sometimes they may be lost in transit or arrive out of sequence. This may seem like a chaotic
way of communicating, but as will be seen in later chapters, there are ways to guarantee delivery
and sequencing. Indeed, the properties of guaranteed delivery and sequential order are often
irrelevant to certain types of applications (such as streaming video and audio, where it is more
important to present current video frames and audio segments than to retransmit lost ones). When
these properties are necessary, networking software can keep track of lost packets and out-of-
sequence data for applications.
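As a rough illustration of how software might restore ordering, the sketch below buffers packets that arrive early and releases them only when every earlier packet has been seen. It assumes each packet carries a sequence number starting at zero; the class name Resequencer and its methods are hypothetical, and real protocols handle many more cases (timeouts, retransmission requests, wraparound).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class Resequencer {
    private final TreeMap<Integer, String> pending = new TreeMap<>();
    private int nextExpected = 0;

    /** Accepts a packet in any order and returns whatever can now be delivered in sequence. */
    List<String> receive(int sequenceNumber, String data) {
        List<String> deliverable = new ArrayList<>();
        if (sequenceNumber < nextExpected) {
            return deliverable; // duplicate of a packet that was already delivered
        }
        pending.put(sequenceNumber, data);
        while (pending.containsKey(nextExpected)) {
            deliverable.add(pending.remove(nextExpected));
            nextExpected++;
        }
        return deliverable;
    }

    /** The lowest sequence number that has not yet been delivered. */
    int firstMissing() {
        return nextExpected;
    }

    public static void main(String[] args) {
        Resequencer r = new Resequencer();
        System.out.println(r.receive(1, "b")); // held back, packet 0 is still missing
        System.out.println(r.receive(0, "a")); // both packets are released in order
        System.out.println(r.firstMissing());
    }
}
```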
Packet transmission and transmission of raw bits of information are low-level processes, while
most network programming deals with high-level transmission of data. Rather than simultaneously
covering the gamut of transmission from raw bytes to packets and then to actual program data, it is
helpful to conceive of these different types of communication as comprising individual layers.
1.3 Communication across Layers
The concept of layers was introduced to acknowledge and address the complexity of networking
theory. The most popular approach to network layering is the Open Systems Interconnection (OSI)
model, created by the International Organization for Standardization (ISO). This model groups network
operations into seven parts, from the most basic physical layer through to the application layer,
where software applications such as Web clients and e-mail servers communicate.
Under the OSI model, each of the seven layers into which communication is grouped can be
referred to by a number or by a descriptive name. Generally, when network programmers refer to
a particular layer (e.g., Layer n), they are referring to the nth layer of the OSI model. Each of the
seven layers is illustrated in Figure 1-2.
Figure 1-2. Seven layers of the OSI Reference Model
Each of the layers is responsible for some form of communication task, but each task is narrowly
defined and usually relies on the services of one or more layers beneath it. In some systems, one or
more layers may be absent, while in other systems all layers are used. Frequently, though, only a
subset of the seven layers is employed by an operating system. Generally, programmers limit
themselves to working with one layer at a time; details of the layers below are thus hidden from
view. When writing software for one layer—say, for communicating across the Internet—we as
programmers don't need to concern ourselves with issues such as initiating a modem connection
and sending data to and from the communications port to the modem. Breaking the network into
layers leads to a much simpler system.
1.3.1 Layer 1—Physical Layer
The physical layer is networking communication at its most basic level. The physical layer
governs the very lowest form of communication between network nodes. At this level,
networking hardware, such as cards and cables, transmits a sequence of bits between two nodes.
Java programmers do not work at this level—it is the domain of hardware driver developers and
electrical engineers. At this layer, no real attempt is made to ensure error-free data transmission.
Errors can occur for a variety of reasons, such as a spike in voltage due to interference from an
outside source, or line noise in networks that use analog transmission media.
1.3.2 Layer 2—Data Link Layer
The data link layer is responsible for providing a more reliable transfer of data, and for grouping
data together into frames. Frames are similar to data packets, but are blocks of data specific to a
single type of hardware architecture (whereas data packets are used at a higher level and can move
from one type of network to another). Frames have checksums to detect errors in transmission,
and typically a "start" and "end" marker to alert hardware to the division between one frame and
another. Sequences of frames are transmitted between network nodes, and if a frame is corrupted
it will be discarded. The data link layer helps to ensure that garbled data frames will not be passed
to higher layers, confusing applications. However, the data link layer does not normally guarantee
retransmission of corrupted frames; higher layers normally handle this behavior.
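The discard-on-corruption behavior can be sketched with the CRC-32 class from the standard library. This is only an approximation of what link hardware does: the frame layout (data followed by a 4-byte checksum, with no start or end markers) and the class name FrameCheck are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

class FrameCheck {
    /** Builds an illustrative frame: the data followed by a 4-byte CRC-32 of that data. */
    static byte[] frame(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return ByteBuffer.allocate(data.length + 4)
                .put(data)
                .putInt((int) crc.getValue())
                .array();
    }

    /** Returns the data if the checksum matches, or null if the frame should be discarded. */
    static byte[] accept(byte[] frame) {
        if (frame.length < 4) {
            return null;
        }
        byte[] data = new byte[frame.length - 4];
        ByteBuffer buf = ByteBuffer.wrap(frame);
        buf.get(data);
        int received = buf.getInt();
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return received == (int) crc.getValue() ? data : null;
    }

    public static void main(String[] args) {
        byte[] f = frame(new byte[] {1, 2, 3});
        System.out.println(accept(f) != null); // intact frame is passed upward
        f[1] ^= 0x01;                          // simulate a single flipped bit in transit
        System.out.println(accept(f) != null); // corrupted frame is dropped
    }
}
```

Note that accept only drops the damaged frame; as in the text, nothing here asks the sender to transmit it again.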
1.3.3 Layer 3—Network Layer
Moving up from the data link layer, which sends frames over a network, we reach the network
layer. The network layer deals with data packets, rather than frames, and introduces several
important concepts, such as the network address and routing. Packets are sent across the network,
and in the case of the Internet, all around the world. Unless traveling to a node in an adjacent
network where there is only one choice, these packets will often take alternative routes (the route
is determined by routers). Communication at this level is still very low-level; network
programmers are rarely required to write software services for this layer.
1.3.4 Layer 4—Transport Layer
The fourth layer, the transport layer, is concerned with controlling how data is transmitted. This
layer deals with issues such as automatic error detection and correction, and flow control (limiting
the amount of data sent to prevent overload).
1.3.5 Layer 5—Session Layer
The purpose of the session layer is to facilitate application-to-application data exchange, and the
establishment and termination of communication sessions. Session management involves a variety
of tasks, including establishing a session, synchronizing a session, and reestablishing a session that
has been abruptly terminated. Not every type of application will require this type of service, as the
additional overhead of connection-oriented communication can increase network delays and
bandwidth consumption. Some applications will instead choose to use a connectionless form of
communication.
1.3.6 Layer 6—Presentation Layer
The sixth layer deals with data representation and data conversion. Different machines use
different types of data representation (an integer might be represented by 8 bits on one system and
16 bits on another). Some protocols may want to compress data, or encrypt it. Whenever data
types are being converted from one format to another, the presentation layer handles these types of
tasks.
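One common representation difference, offered here as an additional illustration rather than something the text above discusses, is byte order: the same 32-bit integer may be laid out with its most significant byte first or last. The sketch below uses java.nio.ByteBuffer to show both layouts; the class name ByteOrderDemo is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteOrderDemo {
    /** Encodes a 32-bit integer using the given byte order. */
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    /** Decodes four bytes as a 32-bit integer using the given byte order. */
    static int decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] big = encode(0x01020304, ByteOrder.BIG_ENDIAN);       // bytes 1, 2, 3, 4
        byte[] little = encode(0x01020304, ByteOrder.LITTLE_ENDIAN); // bytes 4, 3, 2, 1
        // Reading with the wrong byte order silently yields a different number.
        System.out.println(Integer.toHexString(decode(big, ByteOrder.LITTLE_ENDIAN)));
        System.out.println(Integer.toHexString(decode(little, ByteOrder.LITTLE_ENDIAN)));
    }
}
```

Both ends of a connection must agree on one representation, which is the kind of conversion the presentation layer is responsible for.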
1.3.7 Layer 7—Application Layer
The final OSI layer is the application layer, which is where the vast majority of programmers
write code. Application layer protocols dictate the semantics of how requests for services are
made, such as requesting a file or checking for e-mail. In Java, almost all network software written
will be for the application layer, although the services of some lower layers may also be called
upon.
1.4 Advantages of Layering
The division of network protocols and services into layers not only helps simplify networking
protocols by breaking them into smaller, more manageable units, but also offers greater flexibility.
By dividing protocols into layers, protocols can be designed for interoperability. Software that
uses Layer n can communicate with software running on another machine that supports Layer n,
regardless of the details of Layer n-1, Layer n-2, and so on. Lower-level layers, for example, can
be substituted and replaced without having to modify or redesign higher-level layers, or recompile
application software. For example, a network layer protocol can work with an Ethernet network
and a token ring network, even though at the physical and data link layers, two different protocols
and hardware devices are being used. In a world of heterogeneous networks, this is an important
quality, as it makes networks interoperable.
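The substitution of lower layers can be mimicked in Java with an interface. In this sketch, which is an analogy rather than real networking code, a hypothetical NetworkLayer class is written against a LinkLayer interface and works unchanged whether an Ethernet or a token ring implementation sits beneath it. All of the type names and the string "frames" are invented for illustration.

```java
interface LinkLayer {
    /** Wraps a packet in a link-specific frame and returns a description of it. */
    String send(String packet);
}

class EthernetLink implements LinkLayer {
    public String send(String packet) {
        return "ethernet-frame[" + packet + "]";
    }
}

class TokenRingLink implements LinkLayer {
    public String send(String packet) {
        return "token-ring-frame[" + packet + "]";
    }
}

class NetworkLayer {
    private final LinkLayer link;

    NetworkLayer(LinkLayer link) {
        this.link = link;
    }

    /** Builds a packet and hands it to whichever link layer was supplied. */
    String deliver(String destination, String data) {
        return link.send("to=" + destination + ";" + data);
    }
}

class LayeringDemo {
    public static void main(String[] args) {
        // The same network-layer code runs over two different link layers.
        System.out.println(new NetworkLayer(new EthernetLink()).deliver("10.0.0.2", "hi"));
        System.out.println(new NetworkLayer(new TokenRingLink()).deliver("10.0.0.2", "hi"));
    }
}
```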
1.5 Internet Architecture
The most important revolution in networking history has been the evolution of the Internet, a
worldwide collection of smaller networks that share a common communication suite (TCP/IP).
The term evolution rather than creation is used here, as the Internet did not simply come into
existence one day and start running. Over the years, the Internet has been extended to include what
we have today; it has evolved from a defense communications project called ARPANET into a
worldwide collection of networks that spans both the commercial and noncommercial domains.
Contributions to the design of the Internet came from both the original ARPANET developers and
from academic and commercial researchers who offered suggestions and improvements that
helped shape what it is today.
The Internet is an open system, built on common network, transport, and application layer
protocols, while granting the flexibility to connect a variety of computers, devices, and operating
systems to it. Whether an individual is running a PC, Unix, Macintosh, or Palm handheld
computer, the complexities of communication and translation are handled transparently for users
by the TCP/IP suite of protocols.
NOTE
The history of the Internet is a fascinating topic, but one that some readers
will find rather dry. Those interested in learning more about the history of
the Internet and the people involved in its evolution can consult a variety
of resources online. One of the best resources is from the Internet Society,
at http://www.isoc.org/internet/history/.
1.5.1 Design of the Internet
The Internet as we know it today is the result of many decades of innovation and experimentation.
The protocols that make up the TCP/IP suite have been carefully designed, tested, and improved
upon over the years. Some of the major goals (expressed in RFC 871[1]) were to achieve:
[1] Request for Comments (RFC) specifications are described in more detail in Chapter 8, Section 8.2.
• Resource sharing between networks, by creating network protocols that support
internetwork communication or "internetting." The various protocols that make up the
Internet must support a variety of networking gateways.
• Hardware and software independence, by creating network protocols that would be
interoperable with any CPU architecture, operating system, and networking card.
• Reliability and robustness, by creating network protocols that would be fault tolerant, so
that regardless of the state of intermediary networks, data could be rerouted if necessary
in order to reach its destination. Because the Internet started as a defense research project,
robustness in the event of catastrophic network failure was extremely important.
Damaged networks can be circumvented so that the Internet at large remains accessible.
• "Good" protocols that are efficient and simple, by creating network protocols that
exhibited quality design principles, such as the concepts of communication sockets,
network ports, and so on. Though such a design goal seems intuitive now, designers had
to make a conscious effort to develop TCP/IP for long-term and high-volume use, and to
make it as simple as possible to use.
The ease of interconnection between computers and networks connected to the Internet has been
brought about by common protocols that are independent of specific hardware and software
architectures, are robust and fault tolerant, and are efficient and simple to learn. As a result, we
have the TCP/IP protocol suite. Each of the major protocols involved is detailed below.
1.5.1.1 Internet Protocol (IP)
The Internet Protocol (IP) is a Layer 3 protocol (network layer) that is used to transmit data
packets over the Internet. It is undoubtedly the most widely used networking protocol in the world,
and has spread prolifically. Regardless of what type of networking hardware is used, it will almost
certainly support IP networking. IP acts as a bridge between networks of different types, forming a
worldwide network of computers and smaller subnetworks (see Figure 1-3). Indeed, many
organizations use IP and related protocols within their local area networks, as they work
equally well internally and externally.
Figure 1-3. Support for IP networking among various physical networks
The Internet Protocol is a packet-switching network protocol. Information is exchanged between
two hosts in the form of IP packets, also known as IP datagrams. Each datagram is treated as a
discrete unit, unrelated to any other previously sent packet—there are no "connections" between
machines at the network layer. Instead, a series of datagrams are sent and higher-level protocols at
the transport layer provide connection services.
IP Datagram Format
The IP datagram carries with it essential information for controlling how it will be delivered. This
information is stored inside the datagram header, which is followed by the actual data being sent.
The various header fields, and their sizes, are shown in Figure 1-4.
Figure 1-4. Format of an IPv4 datagram packet
NOTE
Full coverage of the design and implementation details of the Internet Protocol
would require extremely complex theory, well beyond the scope of this book.
For those readers interested in learning more, full details of the Internet
Protocol version 4 are available in RFC 791. Chapter 8 outlines how to
retrieve RFCs.
A thorough knowledge of each individual IP datagram header field is not required for everyday
programming. Nonetheless, a rough understanding of how IP datagrams work will assist readers in
understanding how Internet communication takes place; therefore a brief description of these
header fields is offered.
The version field describes which version of the Internet Protocol is being used. Currently,
Internet Protocol version 4 (referred to as IPv4) is in common use, but the next generation of the
Internet Protocol is already in testing. Future versions of the Internet Protocol will feature
additional security, and include an expanded IP address space (greater than the current 32-bit
address range) to allow more devices to have their own addresses.
The header length field specifies the length of the header, in multiples of 32 bits. When no
datagram options are specified, the minimum value for this will be 5 (leaving a minimum header
length of 160 bits). However, when additional options are used, this value can be greater.
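In the IPv4 format defined by RFC 791, the version and header length fields are each four bits wide and share the first byte of the header. A minimal sketch of extracting them follows; the class and method names are hypothetical.

```java
class Ipv4FirstByte {
    /** The version is carried in the upper four bits of the first header byte. */
    static int version(byte first) {
        return (first >> 4) & 0x0F;
    }

    /** The header length, in 32-bit words, is carried in the lower four bits. */
    static int headerLengthWords(byte first) {
        return first & 0x0F;
    }

    /** Converts the header length from 32-bit words to bytes. */
    static int headerLengthBytes(byte first) {
        return headerLengthWords(first) * 4;
    }

    public static void main(String[] args) {
        byte first = 0x45; // typical first byte: version 4, no options
        System.out.println(version(first));           // prints 4
        System.out.println(headerLengthWords(first)); // prints 5
        System.out.println(headerLengthBytes(first)); // prints 20 (that is, 160 bits)
    }
}
```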
The type of service field requests that a specific level of service be offered to the datagram. Some
applications may require quick responses to reduce network delays, greater reliability, or higher
throughput.
The total length field states the total length of the datagram (including both header and data). Because
the field is 16 bits wide, the maximum value is 65,535 bytes, but many networks support only smaller
sizes. Every host must be prepared to accept datagrams of at least 576 bytes.
The identification field is assigned by the sender and allows the fragments of a single datagram to be
uniquely identified. When a datagram is split into smaller pieces, every piece carries the same
identification value, so the receiver knows which pieces belong together when reassembling them.
Sometimes when packets are sent between network gateways, one gateway will support only
smaller packets. The flags field controls whether these datagrams may be fragmented (sent as
smaller pieces and later reassembled). Datagrams marked "do not fragment" that are too large to
forward are discarded as undeliverable.
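The decision a gateway faces can be sketched as follows. This is a deliberately simplified model: real IP fragmentation also copies the header into every fragment and aligns fragment offsets on 8-byte boundaries, neither of which is shown. The class name FragmentationDemo and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FragmentationDemo {
    /**
     * Returns the pieces that would be forwarded over a link that carries at most
     * maxPieceBytes (assumed to be positive) of data at a time. An empty list means
     * the datagram was discarded.
     */
    static List<byte[]> forward(byte[] payload, int maxPieceBytes, boolean dontFragment) {
        List<byte[]> pieces = new ArrayList<>();
        if (payload.length <= maxPieceBytes) {
            pieces.add(payload); // small enough to pass through unchanged
            return pieces;
        }
        if (dontFragment) {
            return pieces;       // too large and may not be split: discard
        }
        for (int offset = 0; offset < payload.length; offset += maxPieceBytes) {
            int end = Math.min(offset + maxPieceBytes, payload.length);
            pieces.add(Arrays.copyOfRange(payload, offset, end));
        }
        return pieces;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[10];
        System.out.println(forward(payload, 4, false).size()); // prints 3
        System.out.println(forward(payload, 4, true).size());  // prints 0
    }
}
```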
As datagrams are routed across the Internet, congestion throughout the network or faults in
intermediate gateways may cause a datagram to be routed through long and winding paths. So that
datagrams don't get caught in infinite loops and congest the network even further, the time-to-live
counter (TTL) field is included. The value of this field is decremented every time the datagram is routed by a
gateway, and when it reaches zero the datagram is discarded. It can be thought of as a self-destruct
mechanism to prevent network overload.
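The countdown can be modeled in a few lines. The sketch below assumes that every gateway on the path decrements the counter by exactly one and discards the datagram as soon as the counter reaches zero; the class and method names are hypothetical.

```java
class TimeToLiveDemo {
    /** Returns true if a datagram with the given initial TTL survives every gateway on the path. */
    static boolean reachesDestination(int initialTtl, int gatewaysOnPath) {
        int ttl = initialTtl;
        for (int gateway = 1; gateway <= gatewaysOnPath; gateway++) {
            ttl--;              // each gateway decrements the counter
            if (ttl <= 0) {
                return false;   // counter exhausted: the datagram is discarded here
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(reachesDestination(3, 2)); // prints true
        System.out.println(reachesDestination(3, 3)); // prints false
    }
}
```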
The protocol type field identifies the transport level protocol that is using a datagram for
information transmission. Higher-level transport protocols rely on IP for sending messages across
a network. Each transport protocol has a unique protocol number, defined in RFC 790. For
example, if TCP is used, the protocol field will have a value of 6.
To safeguard against incorrect transmission of a datagram, a header checksum is used to detect
whether data has been scrambled. If any of the bits within the header have been modified in transit,
the checksum is designed to detect this, and the datagram is discarded. Not only can datagrams
become lost if their TTL reaches zero, they can also fail to reach their destination if an error
occurs in transmission.
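RFC 791 defines the header checksum as the 16-bit one's complement of the one's complement sum of all 16-bit words in the header, with the checksum field itself treated as zero while computing. The sketch below implements that calculation; the class name, the helper methods, and the sample header bytes are chosen for illustration only.

```java
class Ipv4HeaderChecksum {
    /** One's complement sum of the header's 16-bit words, folded to 16 bits and complemented. */
    static int compute(byte[] header) {
        long sum = 0;
        for (int i = 0; i + 1 < header.length; i += 2) {
            sum += ((header[i] & 0xFF) << 8) | (header[i + 1] & 0xFF);
        }
        while ((sum >> 16) != 0) {
            sum = (sum & 0xFFFF) + (sum >> 16); // add the carries back into the low 16 bits
        }
        return (int) (~sum & 0xFFFF);
    }

    /** A header that already contains a correct checksum yields zero. */
    static boolean isValid(byte[] header) {
        return compute(header) == 0;
    }

    /** Convenience helper for writing byte arrays with values above 0x7F. */
    static byte[] bytes(int... values) {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (byte) values[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample 20-byte header with the checksum bytes (offsets 10 and 11) set to zero.
        byte[] header = bytes(0x45, 0x00, 0x00, 0x73, 0x00, 0x00, 0x40, 0x00,
                              0x40, 0x11, 0x00, 0x00, 0xC0, 0xA8, 0x00, 0x01,
                              0xC0, 0xA8, 0x00, 0xC7);
        int checksum = compute(header);
        header[10] = (byte) (checksum >> 8);
        header[11] = (byte) checksum;
        System.out.println(Integer.toHexString(checksum)); // prints b861
        System.out.println(isValid(header));               // prints true
        header[8]--;                                       // alter one byte without fixing the checksum
        System.out.println(isValid(header));               // prints false
    }
}
```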
The next two fields contain addressing information. The source IP address and destination
IP address fields are stored as two separate 32-bit values. Note that there is no authentication
mechanism to prove that a datagram originated from the specified source address. Though not
common, it is possible to use the technique of "IP spoofing" to make it appear that a datagram
originated from a specific address, such as a trusted host.
The final field within the datagram header is an optional field that is not always present. The
datagram options field is of variable length, and contains flags to control security settings, routing
information, and time stamping of individual datagrams. The length of the options field must be a
multiple of 32 bits; if it is not, extra bits are added as padding.
IP Address
The addressing of IP datagrams is an important issue, as applications require a way to deliver
packets to specific machines and to identify the sender. Each host machine under the Internet
Protocol has a unique address, the IP address.
The IP address is a four-byte (32-bit) address, which is usually expressed in dotted decimal format
(e.g., 192.168.0.6). Although a physical address will normally be issued to a machine, once
outside the local network in which it resides, the physical address is not very useful. Even if
somehow every machine could be located by its physical address, if the address changed for any
reason (such as installation of a new networking connection, or reassignment of the network
interface by the administrator), then the machine would no longer be locatable.
Instead, a new type of address is introduced, that is not bound to a particular physical location.
The details of this address format are described in more detail in Chapter 3, but for the moment,
think of the IP address as a number that uniquely identifies a machine on the Internet.
Typically, one machine has a single IP address, but it can have multiple addresses. A machine
could, for example, have more than one network card, or could be assigned multiple IP addresses
(known as virtual addresses) so that it can appear to the outside world as many different machines.
Machines connected to the Internet can send data to that IP address, and routers and gateways
ensure delivery of the message. To map between a physical network address and an IP address,
host machines and routers on a local network can use the Address Resolution Protocol (ARP) and
Reverse Address Resolution Protocol (RARP). Such details, however, are more the domain of
network administrators than of programmers. In normal programming, only the IP address is
needed; the physical address is rarely useful to a Java application.
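The relationship between the 32-bit value and its dotted decimal form can be shown with a short sketch. It uses the standard java.net.InetAddress class for comparison; the class name DottedDecimalDemo and its helper methods are hypothetical, and the conversion assumes a four-byte (IPv4) address.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class DottedDecimalDemo {
    /** Renders a 32-bit address as four decimal numbers separated by dots. */
    static String toDottedDecimal(int address) {
        return ((address >>> 24) & 0xFF) + "." + ((address >>> 16) & 0xFF) + "."
                + ((address >>> 8) & 0xFF) + "." + (address & 0xFF);
    }

    /** Packs four address bytes (most significant first) into a single 32-bit value. */
    static int toInt(byte[] b) {
        return ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16) | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
    }

    public static void main(String[] args) throws UnknownHostException {
        // Built directly from raw bytes, so no name lookup takes place.
        InetAddress addr = InetAddress.getByAddress(new byte[] {(byte) 192, (byte) 168, 0, 6});
        System.out.println(addr.getHostAddress());                     // prints 192.168.0.6
        System.out.println(toDottedDecimal(toInt(addr.getAddress()))); // prints 192.168.0.6
    }
}
```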
Host Name
While numerical address values serve the purposes of computers, they are not designed with
people in mind. Users who can remember thousands of 32-bit IP addresses in dotted decimal
format and store them in their head are few and far between. A much simpler addressing
mechanism is to associate an easy-to-remember textual name with an IP address. This text name is
known as the hostname. For example, companies on the Internet usually choose a .com address,
such as www.microsoft.com, or java.sun.com. The details of this addressing scheme are
covered further in Chapter 3.
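In Java, the mapping from a hostname to an IP address is available through java.net.InetAddress. The following is a minimal sketch; the class name HostLookupDemo and the method addressOf are hypothetical, and the result for any real hostname depends on the name service available to the machine running it.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class HostLookupDemo {
    /** Returns the dotted decimal address for a hostname, or null if it cannot be resolved. */
    static String addressOf(String hostName) {
        try {
            return InetAddress.getByName(hostName).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Defaults to "localhost" so that the example does not depend on Internet access.
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(host + " -> " + addressOf(host));
    }
}
```

When the argument is already a dotted decimal address, getByName simply parses it and no lookup is performed.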
1.5.1.2 Internet Control Message Protocol (ICMP)
Though IP might seem to be an ineffectual means of transmitting information, it is actually
highly efficient (leaving the provision of an error-control mechanism to other protocols if they
require it). Since the Internet Protocol provides absolutely no guarantee of datagram delivery,
there is an obvious need for error-control mechanisms in many situations. One such mechanism is
the Internet Control Message Protocol (ICMP), which is used in conjunction with the Internet
Protocol to report errors when and if they occur.
The relationship between these two protocols is strong. When IP must notify another host of an
error, it uses ICMP. ICMP, on the other hand, uses IP to send the error message. When minor
errors occur, such as a corrupt header in a datagram, the datagram will be discarded without
warning since the sender address in the header cannot be trusted. Therefore a host cannot rely
solely upon ICMP to guarantee delivery—the services of ICMP are more informational, to prevent
wasted bandwidth if errors are likely to be repeated. No guarantee is offered that ICMP messages
will be sent, or that they will reach their intended destination.
ICMP defines five error messages:
1. Destination Unreachable. As datagrams are passed from gateway to gateway, they will (it
is hoped!) travel closer and closer to their final destination. If a fault in the network
occurs, a gateway may be unable to pass the datagram on to its destination. In this case,
the "destination unreachable" ICMP message is sent back to the original host.
2. Parameter Problem. When a gateway determines that there is a problem with any of the
header parameters of an IP datagram and is unable to process them, the datagram is
discarded and the sending host may be notified via a "parameter problem" ICMP message.
3. Redirect. When a shorter path, or alternate route, is available, a gateway may send a
"redirect" ICMP message to the host that sent the datagram.