Introduction

Data Structures

A data structure is an encapsulation of data together with the relevant operations on that data. It is referred to as an Abstract Data Type (ADT), signifying that the actual implementation is left open, subject to satisfying the intended meaning of the operations. For the most part the exact data representation is not exposed; rather, all access to the data is through the operations.

The Java class is a perfect model for an ADT. In addition to the encapsulation, with accessibility via operations only, the presentation of operations is in the modern object-oriented fashion, in which the data takes precedence over the operations.
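As an illustration (a minimal sketch of my own, not an example from these notes), an integer stack ADT can be presented as a Java class whose array representation is private and reachable only through the operations:

```java
// A sketch of a stack ADT: the data (an array) is private;
// clients interact only through push/pop/isEmpty.
public class IntStack {
    private int[] data = new int[8];   // hidden representation
    private int size = 0;

    public boolean isEmpty() { return size == 0; }

    public void push(int x) {
        if (size == data.length) {     // grow when full
            data = java.util.Arrays.copyOf(data, 2 * data.length);
        }
        data[size++] = x;
    }

    public int pop() {
        if (isEmpty()) throw new java.util.NoSuchElementException();
        return data[--size];
    }
}
```

Because callers never see the array, the representation could be swapped (say, for a linked list) without changing any client code; this is the sense in which the implementation is "left open."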

According to this definition, every Java class is a data structure. Classically, however, we have particular interest in the classes that make up the Java collections. In Java terms these are contained in the java.util package. In particular, they are aggregate data structures, typically representing unbounded structures whose size can grow as needed.
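For example (a small demonstration of my own, using standard JDK classes), the java.util collections can be used entirely through their operations, without knowing their representations:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CollectionsDemo {
    public static void main(String[] args) {
        // A List is an unbounded sequence; its size grows as needed.
        List<String> words = new ArrayList<>();
        words.add("data");
        words.add("structure");
        words.add("data");

        // A Set discards duplicates, so it ends up smaller.
        Set<String> unique = new HashSet<>(words);

        System.out.println(words.size());   // 3
        System.out.println(unique.size());  // 2
    }
}
```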

The focus of interest in this course is:
  1. Building data structures.
    • what data to use
    • what operations are of interest
    • how to implement the operations
  2. Comparing and contrasting the efficiency of different implementations
  3. Using data structures to solve difficult algorithmic problems.
There are many factors involved in these pursuits, but we can often pin them down to three things:

Simplicity

This really should be the first consideration. We say time is money, but the "time" in question may be the amount of time you spend developing something, not the time it takes to execute.

Speed

This is the second consideration: try to improve the overall speed of the operations.

Space efficiency

Advanced data structures usually require extra storage to hold information about the structure, which can be used for efficient processing. Usually this feature is considered last, but in data-intensive structures, say, for a database, it is crucial.
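A small sketch of this trade-off (my illustration, not an example from the notes): each node of a doubly-linked list carries two extra references beyond its payload, and that overhead is exactly what buys constant-time removal at a node we already hold:

```java
// Each node stores two bookkeeping references (prev/next) beyond its item.
// The extra space buys O(1) unlinking at any node already in hand.
public class Node<E> {
    E item;
    Node<E> prev, next;

    Node(E item) { this.item = item; }

    // Splice this node out of the list in O(1), with no traversal.
    void remove() {
        if (prev != null) prev.next = next;
        if (next != null) next.prev = prev;
        prev = next = null;
    }
}
```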

Algorithm Analysis

Algorithm analysis is usually about the "speed" of an algorithm. How do we measure speed? By abstract time: we decide on some programming event to count (for example, element comparisons or assignments) and count its occurrences, ignoring all other details.
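For instance (a sketch of the idea, with names of my own choosing), we might count element comparisons in a linear search and ignore everything else:

```java
public class LinearSearch {
    static long comparisons = 0;  // the one abstract "event" we count

    static int search(int[] a, int key) {
        for (int i = 0; i < a.length; i++) {
            comparisons++;              // count each element comparison
            if (a[i] == key) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] a = {4, 8, 15, 16, 23, 42};
        search(a, 42);
        System.out.println(comparisons);  // 6: the key is in the last slot
    }
}
```

The count depends only on the input, not on the machine, which is precisely what makes abstract time a useful measure.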

The other issue when measuring speed is that the speed may depend on the input, in which case there are three cases of interest to us: the best case, the worst case, and the average case. The best case is generally not of interest; its primary value is to ascertain the minimum amount of work an algorithm must do regardless of its input.

The worst case is often the thing we analyze first, because it represents an upper bound on the time. This, however, is not the complete story, because it represents the behavior of a single operation. In some cases, a sequence of n operations has a much better worst-case total than n times the worst case of a single operation. Worst-case analysis averaged over such a sequence is referred to as the amortized worst-case behavior.
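The classic example (sketched here with my own names, not taken from the notes) is appending to a growable array. A single append that triggers a doubling copies every element, yet n appends together copy fewer than n elements twice over, so each append is O(1) amortized:

```java
public class GrowableArray {
    private int[] data = new int[1];
    private int size = 0;
    long copies = 0;  // total element copies caused by growing

    void append(int x) {
        if (size == data.length) {     // worst case for one call: copy all
            int[] bigger = new int[2 * data.length];
            System.arraycopy(data, 0, bigger, 0, size);
            copies += size;
            data = bigger;
        }
        data[size++] = x;
    }

    public static void main(String[] args) {
        GrowableArray g = new GrowableArray();
        int n = 1024;
        for (int i = 0; i < n; i++) g.append(i);
        // Total copies: 1 + 2 + 4 + ... + 512 = 1023 < 2n
        System.out.println(g.copies);  // 1023
    }
}
```

Although the single append at size 512 copies 512 elements, the total over all 1024 appends stays below 2n, which is the amortized bound.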

The average case is often a good judge of what to expect in a "generic" situation. This analysis is usually the most challenging and often summons deep layers of mathematics. The other issue with averaging is how to characterize a generic situation. Usually the assumption is that events are "equally likely," but that is often not the case. For example, in a text search, say, to count the occurrences of certain words, one typically takes advantage of the well-known fact that certain words occur much more frequently than others.
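A word count along those lines (a sketch; the sample text and names are invented for illustration) might use a java.util.HashMap, and its output makes the non-uniform distribution visible:

```java
import java.util.HashMap;
import java.util.Map;

public class WordCount {
    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog the end";
        Map<String, Integer> counts = new HashMap<>();
        for (String w : text.split("\\s+")) {
            // merge inserts 1 for a new word, otherwise adds 1 to the count
            counts.merge(w, 1, Integer::sum);
        }
        // Frequencies are far from "equally likely": "the" dominates.
        System.out.println(counts.get("the"));  // 3
        System.out.println(counts.get("fox"));  // 1
    }
}
```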


© Robert M. Kline