July 12, 2005


Software Development Meta-Model

Isn't it great when you take time off work and all thoughts of software engineering empty from your mind? I just had a week off in the gloriously beautiful Scottish Highlands, and by the second day I found myself not thinking about my work at all. UML who? Agile what? Faced with that kind of scenery I defy anybody to sit there and think about refactoring or design patterns.

The benefit of emptying your mind of your work is that when you get back to the grindstone, there's lots of room to fill up with brand new ideas. My brand new idea today is about the process of designing and developing software itself.

A meta-model is a model of a process that raises the level of abstraction of our understanding of that process. A Software Process Engineering Language (SPEL) is one kind of process meta-model, but the ones I've seen arguably don't abstract my understanding of software development in a way that provides any deep insights. Process actors, working in process roles, performing process steps that consume process artifacts and create new ones (or transform existing ones) is all well and good, but if we used an SPEL to describe sex, would it provide us with a deeper understanding of human sexuality?

As well as abstracting workflow, role and project structure, we also need an abstraction of the things that the software development process creates or transforms.

The primary artifact in software development is - er - software. Well, actually it's source code. Source code is an executable specification that describes what we want the computer to do, and what we want it to do it to. An executable specification is a formal model, inasmuch as it is expressed mathematically (using a mathematically precise programming language like Java or C++) and can have only one possible interpretation.

Source code is not the end product, of course. The end product is an executable computer program, written in a language the machine will understand, which is 1's and 0's. Except that even 1's and 0's are abstractions of the actual physical end product, which could be high and low voltages, for example. So every representation between the source code and the high and low voltages is some kind of abstraction of the final product. Java bytecode is an abstraction of assembly code, which is an abstraction of binary code, which is an abstraction of the electrical signals that will actually be sent to the processor to be executed. Abstraction is very much at the heart of building and executing software, it seems.

In the opposite direction, we have a process called refinement. Java bytecode is a refinement of Java source code, for example. A refinement is the transformation of a model at one level of abstraction to another model at a lower level of abstraction. It is the journey from goal to strategy. The Java source describes the things we want the computer to do, and the Java compiler describes how this will be achieved in bytecode for the JVM. The JVM in turn refines the bytecode specification into machine-executable instructions that are at an even lower level of abstraction.
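
To make refinement a little more tangible, here is a minimal sketch. It uses Python rather than Java, purely because Python can show its own bytecode from inside the language; that substitution, the one-line "specification" and the variable names are all assumptions of this example. The exact instructions printed will differ between Python versions.

```python
import dis

# A hypothetical one-line "specification" written in source code.
source = "total = price * quantity"

# Refinement step: the compiler turns source text into a lower-level code object.
code = compile(source, "<example>", "exec")

# List the names of the bytecode instructions the compiler produced.
opnames = [instruction.opname for instruction in dis.get_instructions(code)]
print(opnames)

# The virtual machine then executes the refined form, not the source text.
namespace = {"price": 3, "quantity": 4}
exec(code, namespace)
print(namespace["total"])  # 12
```

The source says what we want (a total); the bytecode says how the virtual machine will get there, one small step at a time.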

And that got me thinking: is there a neater way of describing the software development process - at least from source code downwards to begin with - that utilises these simple ideas and that makes it easier to understand what software development is all about? I think there is, and this is my initial stab in the dark to begin developing the idea.

The ultimate goal of software development is to have your instructions executed by a computer so that it can do useful stuff. In my process meta-model, the thing that executes instructions is called a Processor. A processor only responds to specific types of stimulus. Physical processors require a physical stimulus. For example, I can ask Fred to make me a cup of tea. Fred is the processor, and I transmit my instructions through the medium of sound in the form of spoken English.

Pentium processors don't speak English, unfortunately. They respond to electrical stimuli in the form of sequences of machine instructions represented as sets of high and low voltages. Getting from English-like written instructions (source code) to machine-executable instructions requires a sequence of refinements which are performed by Transformers. The Java compiler is a transformer that transforms Java source code into Java bytecode, for example. The Java Virtual Machine is also a transformer, as is the Windows operating system. Each transformer speaks at least two languages, and knows how to translate the meaning of a specification written in one language into another language.
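
A rough sketch of that chain of transformers might look like the following. The `Model` and `Transformer` classes, the language names and the two toy translations are all hypothetical; real compilers and virtual machines obviously do far more than upper-case text and print bit patterns.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Model:
    language: str
    text: str


@dataclass(frozen=True)
class Transformer:
    source_language: str
    target_language: str
    translate: Callable[[str], str]

    def refine(self, model: Model) -> Model:
        # A transformer only understands models written in its source language.
        if model.language != self.source_language:
            raise ValueError(
                f"expected {self.source_language}, got {model.language}"
            )
        return Model(self.target_language, self.translate(model.text))


def refine_through(model: Model, chain: List[Transformer]) -> Model:
    # Apply each transformer in turn, lowering the level of abstraction each time.
    for transformer in chain:
        model = transformer.refine(model)
    return model


# Toy stand-ins for a compiler and a virtual machine.
compiler = Transformer("source", "bytecode", lambda text: text.upper())
vm = Transformer(
    "bytecode",
    "machine",
    lambda text: " ".join(format(ord(char), "08b") for char in text),
)

result = refine_through(Model("source", "hi"), [compiler, vm])
print(result.language, result.text)  # machine 01001000 01001001
```

Each transformer speaks exactly two languages here, and handing it a model in the wrong language fails straight away rather than producing nonsense.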

Code transformers are not very smart. They don't need to be, because the code can only have one possible interpretation. The languages they speak are said to be deterministic. They just follow very simple rules which are applied to the input specification, in exactly the same way that XSLT applies simple rules to input XML documents to create an output in a different form.
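
As an illustration of how little intelligence this takes, here is a toy rule-based transformer. The input language (postfix arithmetic), the instruction names and the `RULES` table are invented for this sketch and do not correspond to any real compiler.

```python
from typing import List

# One simple rule per operator; numbers get a rule of their own below.
RULES = {"+": "ADD", "-": "SUB", "*": "MUL"}


def transform(postfix: str) -> List[str]:
    """Translate postfix arithmetic into instructions for an imaginary stack machine."""
    instructions = []
    for token in postfix.split():
        if token in RULES:
            instructions.append(RULES[token])
        elif token.isdigit():
            instructions.append(f"PUSH {token}")
        else:
            # No rule means no translation; the transformer never guesses.
            raise ValueError(f"no rule for token {token!r}")
    return instructions


print(transform("2 3 + 4 *"))
# ['PUSH 2', 'PUSH 3', 'ADD', 'PUSH 4', 'MUL']
```

The same input always produces the same output, because every token has at most one rule that applies to it.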

So, from source code down to the end product, the development process is conceptually very straightforward. You start with a formal model you can understand (source code) and that goes through a sequence of simple rule-based transformations until it's in a form the processor can accept as input. All of this is easily automated, so you don't actually need people for this part of the development process. It can all be done by software.

But what about from source code upwards? How do we know what source code to write in the first place? Well, I like consistency, so I prefer to continue the meta-model of models, languages, processors, transformers/transformations, and refinements.

So is a requirements document a model? Of course it is. It may be written in English, but it's still a model - an abstract representation of some thing (in this case, the desired properties of the software under development). Let's take a use case as an example. How does a use case differ from source code? Written and spoken English are not like programming languages, in that sentences written in English (and therefore, models written in English) are often - actually, are usually - ambiguous. They have multiple possible interpretations, and are therefore said to be non-deterministic. Transforming models described in one ambiguous language to another ambiguous language is perfectly do-able. You can translate the ambiguity so that it is carried into the refinement. But where's the value in that?

We need to end up with machine-executable instructions, which must be unambiguous. If a requirement can have 10 possible meanings, how can we tell the computer which one we actually want it to do? Again, it's actually very simple - we must formalize the requirements model. Formalizing a model means taking a set of possible meanings and choosing just one, which can then be represented formally (for example, as Java source code).

This is the really hard part of software development. It cannot be achieved using simple rules, and cannot currently be automated using software. This type of refinement must be done by intelligent beings - either people, or perhaps highly-trained donkeys, but intelligent nevertheless.

So we have two types of model in the software development process, formal and informal (deterministic and non-deterministic), and two kinds of model refinement. Transformations (or translations) are applied to a model of one type to produce a model of the same type (e.g., formal-to-formal). Formalizations refine an informal model into a formal model that can then be transformed automatically into machine-executable instructions.
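
The same classification can be written down in a few lines. This is only a sketch; the `Kind` enum and the `classify_refinement` function are names made up for the example.

```python
from enum import Enum


class Kind(Enum):
    FORMAL = "formal"
    INFORMAL = "informal"


def classify_refinement(source: Kind, target: Kind) -> str:
    """Name the kind of refinement that takes a source model to a target model."""
    if source is target:
        # Same type in, same type out: formal-to-formal or informal-to-informal.
        return "transformation"
    if source is Kind.INFORMAL and target is Kind.FORMAL:
        return "formalization"
    # Going from formal to informal is not one of the two refinements described here.
    raise ValueError("not a refinement in this meta-model")


print(classify_refinement(Kind.FORMAL, Kind.FORMAL))      # transformation
print(classify_refinement(Kind.INFORMAL, Kind.INFORMAL))  # transformation
print(classify_refinement(Kind.INFORMAL, Kind.FORMAL))    # formalization
```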

Up until now, the software industry has focused on solving the very easy problem of automated transformations of formal models. More recently, natural language translators have been developed that do a fair-to-middling job of automatically transforming informal models (e.g. "How are you today?") into other informal models, complete with the original ambiguities (e.g. "Comment allez-vous aujourd'hui?"). But I'm not aware of any software product that has cracked the nut of turning wishy-washy, hand-wavy specifications into cold, hard machine code.

Indeed, not a great deal of research has been done into the mechanics of formalization of models, so I'm possibly breaking new ground here (email me and pour scorn if you know otherwise). As I said before, an informal model has multiple possible interpretations, and formalization is essentially the process of choosing the correct one. Each interpretation of a model is a formal model, so formalization is the process of selecting the correct formal model from the set of available formal models that fit within the range of interpretations that the informal model allows.
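
Here is a small sketch of what "a set of formal models that fit an informal one" can look like. The requirement, the two interpretations and the `disagreements` helper are all hypothetical, chosen only to show that two perfectly reasonable readings of one English sentence can behave differently.

```python
# Hypothetical informal requirement: "Orders over 100 get a discount."
# Each interpretation is a formal model: precise, with exactly one meaning.
interpretations = {
    "strictly more than 100": lambda total: total > 100,
    "100 or more": lambda total: total >= 100,
}


def disagreements(candidates, inputs):
    """Return the inputs on which the candidate formal models give different answers."""
    return [
        value
        for value in inputs
        if len({rule(value) for rule in candidates.values()}) > 1
    ]


print(disagreements(interpretations, [99, 100, 101]))  # [100]

# Formalization: a person picks exactly one interpretation.
# Nothing in this code can decide which one the customer actually meant.
chosen = interpretations["strictly more than 100"]
print(chosen(100))  # False
```

The code can point out where the interpretations part company, but the choice between them still has to come from somebody who knows what was meant.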

Selecting the correct formal model is what stumps most projects. It's a hard problem. Hopefully, if time allows, I will post a simple UML process meta-model that describes the concepts so far, as well as discuss the mechanics of formalization in more depth.

Posted on July 12, 2005