Abstraction in object-oriented programming

In object-oriented programming theory, abstraction is the facility to define objects that represent abstract "actors" that can perform work, report on and change their state, and "communicate" with other objects in the system.

Various object-oriented programming languages offer similar facilities for abstraction, all to support a general strategy of polymorphism in object-oriented programming: the substitution of one type for another in the same or similar role. Although it is not as generally supported, a configuration or image or package may predetermine a great many of these bindings at compile-time, link-time, or load-time, leaving only a minimum of such bindings to change at run-time.
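As a minimal illustration of substitution, here is a Java sketch in which two hypothetical classes, Pig and Cow, fill the same role, expressed as a Speaker interface that is also invented for this example. The describe method is written once against the role, and which implementation runs is decided at run-time by the object passed in.

```java
import java.util.List;

// The role: anything that can report its name and make a sound.
// (Hypothetical interface, invented for this illustration.)
interface Speaker {
    String name();
    String sound();
}

// Two hypothetical types that can be substituted for each other in that role.
class Pig implements Speaker {
    public String name() { return "pig"; }
    public String sound() { return "oink"; }
}

class Cow implements Speaker {
    public String name() { return "cow"; }
    public String sound() { return "moo"; }
}

class SubstitutionDemo {
    // Written once against the role; the sound() actually called
    // is chosen at run-time from the object's concrete class.
    static String describe(Speaker s) {
        return "The " + s.name() + " says " + s.sound();
    }

    public static void main(String[] args) {
        List<Speaker> barnyard = List.of(new Pig(), new Cow());
        for (Speaker s : barnyard) {
            System.out.println(describe(s)); // "The pig says oink", then "The cow says moo"
        }
    }
}
```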

In Java, for instance, abstraction refers to the extension of the concept of data type from earlier programming languages to include not only state information (i.e., data) but also behavior (i.e., procedures). This extended data type is called a class, and objects of that type are called instances of that class.

In other languages, such as Self and CLOS, there is less of a class-instance distinction, and more use of generics, overloading, delegation, and prototypes is encouraged. Although these all serve the same data/class/type/role binding and enable more flexible polymorphism, they carry certain disadvantages in complexity. At the other extreme is C++, which relies heavily on templates, overloading, and other static bindings at compile-time, which in turn brings certain problems of flexibility.
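Java has no prototypes, but the delegation idea can be roughly approximated. In the sketch below, the ProtoObject class and its methods are hypothetical and this is not how Self is actually implemented: an object is created from another object rather than from a class, and any slot it does not define itself is looked up in its parent. Only data slots are stored here, for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// A toy prototype-style object (hypothetical): it holds its own slots
// and delegates any lookup it cannot answer to a parent object.
class ProtoObject {
    private final ProtoObject parent;                 // null for a root object
    private final Map<String, Object> slots = new HashMap<>();

    ProtoObject(ProtoObject parent) { this.parent = parent; }

    void set(String slot, Object value) { slots.put(slot, value); }

    // Look the slot up locally first, then walk the delegation chain.
    Object get(String slot) {
        for (ProtoObject o = this; o != null; o = o.parent) {
            if (o.slots.containsKey(slot)) return o.slots.get(slot);
        }
        throw new IllegalArgumentException("no slot: " + slot);
    }

    // New objects are derived from an existing object, not from a class.
    ProtoObject extend() { return new ProtoObject(this); }
}

class PrototypeDemo {
    public static void main(String[] args) {
        ProtoObject animal = new ProtoObject(null);
        animal.set("legs", 4);
        animal.set("sound", "...");

        ProtoObject cow = animal.extend();    // delegates to animal
        cow.set("sound", "moo");              // overrides one slot locally

        System.out.println(cow.get("sound")); // moo (own slot)
        System.out.println(cow.get("legs"));  // 4 (delegated to animal)
    }
}
```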

Although these are alternate strategies for achieving the same abstraction, they do not fundamentally alter the need to support abstract nouns in code: all programming relies on an ability to abstract verbs as functions, nouns as data structures, and either as processes.

For example, a Java program may need to represent, say, animals, so it would define an Animal class to represent both the state of the animal and its functions:

 class Animal extends LivingThing {
   Location m_loc;            // current position
   double m_energy_reserves;  // stored energy
   
   boolean is_hungry() {
     // Hungry whenever reserves fall below a fixed threshold
     return m_energy_reserves < 2.5;
   }
   void eat(Food f) {
     // Consume food
     m_energy_reserves += f.getCalories();
   }
   void moveto(Location l) {
     // Move to new location
     m_loc = l;
   }
 }

With the above definition, one could create objects of type Animal and call their methods like this:

 Animal thePig = new Animal();
 Animal theCow = new Animal();
 if (thePig.is_hungry()) { thePig.eat(table_scraps); }
 if (theCow.is_hungry()) { theCow.eat(grass); }
 theCow.moveto(theBarn);
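The fragments above do not compile on their own, because LivingThing, Location, Food, table_scraps, grass and theBarn are never defined. A minimal runnable sketch follows; the stub classes and the calorie values are hypothetical, chosen only to make the example complete.

```java
// Hypothetical stand-ins for the types the example assumes but does not define.
class LivingThing { }

class Location {
    final String name;
    Location(String name) { this.name = name; }
}

class Food {
    private final double calories;
    Food(double calories) { this.calories = calories; }
    double getCalories() { return calories; }
}

// The Animal class from the text.
class Animal extends LivingThing {
    Location m_loc;
    double m_energy_reserves;   // starts at 0.0

    boolean is_hungry() { return m_energy_reserves < 2.5; }
    void eat(Food f) { m_energy_reserves += f.getCalories(); }
    void moveto(Location l) { m_loc = l; }
}

class Barnyard {
    public static void main(String[] args) {
        Food table_scraps = new Food(1.0);   // hypothetical calorie values
        Food grass = new Food(3.0);
        Location theBarn = new Location("barn");

        Animal thePig = new Animal();
        Animal theCow = new Animal();
        if (thePig.is_hungry()) { thePig.eat(table_scraps); }
        if (theCow.is_hungry()) { theCow.eat(grass); }
        theCow.moveto(theBarn);

        System.out.println(thePig.is_hungry()); // true: 1.0 < 2.5
        System.out.println(theCow.is_hungry()); // false: 3.0 >= 2.5
        System.out.println(theCow.m_loc.name);  // barn
    }
}
```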

If a more differentiated hierarchy of animals is required, say to distinguish those that provide milk from those that provide nothing except meat at the end of their lives, that calls for an intermediate level of abstraction: probably DairyAnimal (cows, goats), which would eat foods suited to producing good milk, and Animal (pigs, steers), which would eat foods that give the best meat quality.
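One way to sketch that intermediate level in Java, assuming a hypothetical Food type and made-up food kinds and calorie values: DairyAnimal extends Animal and overrides a single method that chooses the food, so callers of feed() no longer pass a food argument at all. Inheritance is only one option here; the two classes could equally stand alone.

```java
// Hypothetical minimal Food type for this sketch.
class Food {
    final String kind;
    final double calories;
    Food(String kind, double calories) { this.kind = kind; this.calories = calories; }
}

// Meat animals (pigs, steers): the base level of the hierarchy.
class Animal {
    double energy;

    // The food this kind of animal should get; subclasses may refine it.
    // "corn" and its calorie value are made up for illustration.
    Food preferredFood() { return new Food("corn", 3.0); }

    // The caller no longer chooses the food; the class does.
    void feed() { energy += preferredFood().calories; }
}

// The intermediate level: animals kept for milk (cows, goats).
class DairyAnimal extends Animal {
    @Override
    Food preferredFood() { return new Food("alfalfa hay", 2.5); }
}

class HierarchyDemo {
    public static void main(String[] args) {
        Animal[] herd = { new Animal(), new DairyAnimal() };   // a steer and a cow
        for (Animal a : herd) {
            a.feed();   // each animal gets the food its class prescribes
            System.out.println(a.getClass().getSimpleName() + " ate " + a.preferredFood().kind);
        }
        // Prints "Animal ate corn", then "DairyAnimal ate alfalfa hay"
    }
}
```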

Such an abstraction could remove the need for the application coder to specify the type of food, so they could concentrate instead on the feeding schedule. The two classes could be related using inheritance or stand alone, and varying degrees of polymorphism between the two types could be defined. These facilities vary drastically between languages, but in general each can achieve anything that is possible with any of the others. A great many operation overloads, data type by data type, can have the same effect at compile-time as any degree of inheritance or other means of achieving polymorphism. The class notation is simply a coder's convenience.
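The point that overloads can stand in for inheritance when types are known at compile-time can be sketched as two versions of the same feeding rule. The class names and rations below are hypothetical. In the first version the compiler selects a ration overload from the declared type of the argument; in the second, a common supertype lets the method be selected at run-time from the object's actual class.

```java
// Version 1: overloading on unrelated types. The compiler chooses the
// method from the declared type of the argument.
class Overloaded {
    static class Pig { }
    static class Cow { }

    static String ration(Pig p) { return "table scraps"; }
    static String ration(Cow c) { return "grass"; }
}

// Version 2: a common supertype with overriding. The method is chosen
// at run-time from the object's actual class.
class Inherited {
    static abstract class FarmAnimal {
        abstract String ration();
    }
    static class Pig extends FarmAnimal {
        @Override String ration() { return "table scraps"; }
    }
    static class Cow extends FarmAnimal {
        @Override String ration() { return "grass"; }
    }
}

class OverloadVsInheritance {
    public static void main(String[] args) {
        // Same result when the concrete type is known at compile-time...
        System.out.println(Overloaded.ration(new Overloaded.Pig())); // table scraps
        System.out.println(new Inherited.Pig().ration());            // table scraps

        // ...but only the inherited version works through a supertype reference.
        Inherited.FarmAnimal a = new Inherited.Cow();
        System.out.println(a.ration());                              // grass
    }
}
```

The results agree whenever the concrete type is visible at compile-time; the overridden version additionally works through a FarmAnimal reference, which is where the two approaches begin to differ.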

Decisions regarding what to abstract and what to keep under the control of the coder are the major concern of object-oriented design and domain analysis; actually determining the relevant relationships in the real world is the concern of object-oriented analysis or legacy analysis.

In general, to determine the appropriate abstraction, one must make many small decisions about scope (domain analysis) and about what other systems one must cooperate with (legacy analysis), and then perform a detailed object-oriented analysis, which is expressed within project time and budget constraints as an object-oriented design. In our simple example, the domain is the barnyard; the live pigs and cows and their eating habits are the legacy constraints; the detailed analysis is that coders must have the flexibility to feed the animals whatever is available, so there is no reason to code the type of food into the class itself; and the design is a single simple Animal class of which pigs and cows are instances with the same functions. A decision to differentiate DairyAnimal would change the detailed analysis, but the domain and legacy analysis would be unchanged. Because such a decision is entirely under the control of the programmer, we refer to abstraction in object-oriented programming as distinct from abstraction in domain or legacy analysis.
