The Scientific Problem and its Novelty
In the following we give the background information needed in order to position the Centre scientifically. First we give a brief overview of relevant approaches to the modelling of systems and the specification of system properties. We then describe the current trends in system verification within each of the areas of discrete, stochastic, and continuous models and properties. In order to position the Centre we clarify our scientific aims for each of these contexts.
1. Models of IT Systems
As in all engineering disciplines modelling is an indispensable activity in the design and planning of complicated IT systems. Such models may serve both as conceptual guidelines in the later implementation phase and, more importantly, as the basis of formal reasoning about systems.
The actual construction of IT systems spans a large number of abstraction levels ranging from low-level, hardware-oriented programming languages (e.g. VHDL), high-level programming languages (e.g. C++ and Java) to object-oriented development notations (e.g. UML).
To facilitate the reasoning about systems, MT-LAB will be based on formal modelling formalisms that are able to unambiguously describe systems.
In the following we point out several cornerstones of IT systems modeling, and allow ourselves to postpone the discussion of stochastic and continuous aspects of IT systems to later sections.
State of the art
The Unified Modeling Language (UML) [RJB05] has been designed for describing the use, structure and behaviour of IT systems. It has been standardized by the Object Management Group [OMG]. UML has been embraced by the Industry and has attracted a lot of research attention. UML models are often used in early design phases, but can also be used for automatic code generation. However, as UML is relatively detailed and complex, efficient analysis and code generation techniques often rely on extraction of more abstract descriptions from UML models, such as automata or process algebraic descriptions (for example [BMPS05]).
UML is a universal language focusing on the wide domain of IT systems. A recent strong trend in software engineering is the use of domain specific languages [CJW07,VSBHH06], where a language is defined around a narrow class of systems. Examples of domain specific modeling languages include the Business Process Execution Language (BPEL) [BIMSS03], and Business Process Modeling Notation (BPMN) [OMG06], the Web Service Definition Language (WSDL) [CMRW06]. Even though these languages are much better scoped than UML, similarly to UML they are often translated to more abstract formalisms like automata, process algebra [BGHM05,VDR07] or Petri Nets [St05], to facilitate automatic analysis or synthesis of code.
In a similar direction, the UML Profile for Scheduling, Performance and Time (SPT) defines a set of concepts useful for modeling real-time systems. Its purpose is to integrate notation used by existing real-time analysis techniques into UML. However its lack of formal semantics has been criticized [GOO06] and the European project Omega has been aiming at developing a methodology in UML for embedded and real-time systems based on formal techniques.
Two of the best known process algebras are Hoare's CSP [Hoa85] and Milner's CCS [Mil82]. Both have been developed to increase understanding of the construction of distributed systems and especially CSP has gained a relatively large acceptance in engineering textbooks, for example in descriptions of communication protocols for distributed systems. Later research has developed a wide range of process algebras [AILS07], which focus on more specialized properties of distributed systems, for example mobile systems (e.g. the pi-calculus [Mil99, SW01] and the Ambient Calculus [CG98], and more recently Bigraphs [JM03] and Bigraphical reactive systems [M01]) and service oriented architectures [BBC+06,LPT07] (that continue to lead to new variations within the ongoing EU project SENSORIA [SEN]) as well as business processes in general [BFN05,YCH05].
Scientific Aims
To ensure applicability at all levels and independence of concrete programming languages, MT-LAB will focus on modelling system behaviour by means of process models expressed using process algebras. These models are able to model a wide variety of discrete and stochastic features whereas further development is needed to fully account for the continuous ones.
2. Specifications of Properties
In a number of areas the quality of software is of utmost importance. In order to meet this requirement it is often necessary to formally verify the presence of desirable properties before deployment. The Safety Instrumented Systems standard [SIS] (pervasive in the area of embedded systems) and the Common Criteria standard [CC] (widely used in the larger NATO countries) are examples of international standards that emphasize the need for validating that systems are
- functionally correct (react as expected),
- dependable (do not cause damage on environment or users),
- highly efficient while demanding few resources,
- secure (against hackers and viruses),
- stable (do not crash),
- fault tolerant (offers vital functionality even when partially crashed).
To enable a uniform approach to the validation of the above important properties MT-LAB will be based on logical specification formalisms that are able to express seemingly dissimilar properties within the same formalism.
In the following we point out several cornerstones of specification formalisms, and allow ourselves to postpone the discussion of stochastic and continuous aspects of specification formalisms to later sections.
State of the art
Logical specification formalisms permit desired properties of process models to be expressed. In particular, the formalisms are equipped with precise and unambiguous definitions of when a process model may be classified as satisfying or being correct with respect to a given specification. A logical specification may thus be identified with its set of satisfying process models, providing the basis for refinement and Boolean combinations of logical specifications. Pioneered by Ed Clarke, Allan Emerson and Joseph Sifakis, the term model checking relates to the act of determining whether a given process model satisfies a given logical specification [CGP00] – for which they were jointly awarded the prestigeous A. M. Turing award in 2007.
For process models, various types of temporal and modal logics provide favourite specification formalisms. Since the introduction of temporal logics to Computer Science by Pnueli [Pnu77], two main families of logical specification formalisms have emerged: linear time temporal logic (e.g. LTL [Pnu77]), with the so-called automata-theoretic approach [VW84] and partial order reduction techniques [CGP00] providing the basis for several efficient analysis methods, and branching time temporal logic (e.g. CTL [CE81]), with so-called symbolic methods [Bry92] providing the basis for efficient analysis. However, the research on model checking is a very active research area, with continuous development of new efficient techniques for specific types of process models. For example, extensions of the automata theoretic approach to branching time logics has been provided [KVW00] as well as combinations of symbolic and partial order techniques [ABH+01].
Closely related to specification formalisms are refinement pre-orders [Gla93] between process models according to which one process model may be considered more abstract than another model. Typically, during development, successively refined process models are generated, constantly introducing new aspects and limiting choices. Ideally the refinement pre-order should preserve properties expressed within the given specification formalisms, i.e. if a property has already been established of an abstract process model, then the property should be guaranteed also to hold for any refined process model.
Formal specifications of requirements are only seldom applied by industry. Rather requirements are mostly specified in terms of natural language description of a set of design requirements. However, this form of specification leads to ambiguous and unverifiable specifications. Thus, given the increasing complexity of designs, there is a growing awareness that easy-to-use machine-checkable and unambiguous notations for specifications should be developed.
- [PSL04] The Accellera Property Specification Language (PSL) was developed in order to enable for hardware designers to make mathematically precise, as well as simple, specifications of design. Formally PSL may be seen as a combination of LTL and automata.
- [LSC01] Live Sequence Charts provide a highly expressive and rigorously defined graphical language for expressing requirements in terms of for use-cases and scenarios. Formally LSC can be embedded into CTL*.
Scientific Aims
MT-LAB will pursue the identification of logical specification formalisms and property preserving refinement pre-orders between process models to accommodate not only discrete properties (e.g. functional correctness) but also quantitative properties (e.g. performance and robustness) with explicit requirements to the stochastic and continuous behaviour of the systems. A central goal will be the construction of efficient automatic validation engines.
3. Pursuing the Challenge
While more work is needed to ensure that models and specifications can express the phenomena of interest, the main challenge is in providing the powerful methods, algorithms, and prototypes that allow the models of IT systems to be verified against the specifications of properties and to test them on well chosen scenarios.
We now take a closer look at state of the art and challenges within each of the discrete, stochastic and continuous dimensions.
4. Discrete Models and Properties
The computer scientific approach to IT systems is traditionally based on discrete models that consciously abstract away from less relevant physical phenomena, e.g., the discrete model of a finite automaton [HMU00] is often a useful abstraction of an electronic device [SAH+00]. Static analysis [NNH99] and model checking [CGP00] are two of the most prominent techniques for discrete systems analysis. In many ways they are complementary and largely developed by independent research communities. They are used by some of the largest international companies (e.g. IBM, Intel, Microsoft).
State of the art
State of the art discrete model checking techniques fall in two categories: Automata theoretic and symbolic. Automata theoretic model checking analyzes models of software by explicit inspection of all its possible executions [Hol03]. Symbolic verification analysis inspects many model executions simultaneously, by having mathematical descriptions of numerous executions. It has been extremely successful in scaling model checking to much larger models than previously possible [BCM+90].
Symbolic techniques has been expanded to include compositionality [SAH+00, LAH+01], so that fragments of models can be analyzed in isolation, enabling automatic analysis of industrial models [LHN+02]. Most tools based on model checking employ a top-down scenario, where models are first manually constructed and then verified. Only afterwards an implementation of the system is automatically generated in order to avoid the introduction of human errors [Was05, Was04]. A similar development scenario is widely and successfully applied in design, verification and synthesis of electronic circuits.
Static analysis techniques [NNH99] are approximation techniques often used to extract and analyse models of software for otherwise undecidable properties. In modern compilers, e.g., type systems [DDH72] are used to identify potential run-time errors. The framework of Abstract Interpretation [CC77] has also been used to identify more subtle run-time errors in safety critical software, e.g., the tools ASTREE [BCC+ 03] and PAG [Mar98] are used by AirBus to verify flight control systems. Another tool, the PolySpace Verifier [PSV], has recently been acquired by MathWorks to be incorporated into the widely used Simulink [Sim] verification software. The Flow Logic approach to static analysis [NN02] has been very successful in the verification of security related properties on the basis of process algebraic models - a prime example is the LySa-tool [BBD+03], which efficiently analyses the secrecy, integrity, and authenticity properties of security protocols. Recently, the classical approach of Monotone Frameworks [KU77] has been used to analyse the causal structure of process algebraic models, thereby deriving Finite Automata abstractions of potentially infinite process behaviours [NN06]. In the area of Computational Biology several of these techniques have been used to identify pathways of transport and general metabolism in process algebraic models of cellular systems [Pil07].
Static analysis and model checking have only to a small extent been combined, with notable exceptions being the tools Bandera [Ban], Microsoft Static Driver Verifier [SDV], and Blast [BLA]. In these tools static analysis is used to automatically extract models from existing implementations, which can then be used for verification via model checking and theorem proving. This opens up for many promising applications, which fit the usual software development processes better than the top-down approach described above.
Scientific Aims
The scientists involved in MT-LAB are internationally reputed for their contributions within static analysis and model checking, and will exploit this unique position in creating new combined analysis techniques more powerful than seen before.
5. Stochastic Models and Properties
The quantitative properties of the environment of an IT system are often accompanied by uncertainties best described using stochastic or probabilistic models in the form of discrete time or continuous time Markov chains [Nor98]. In recent years a substantial amount of effort has been invested in developing process algebras supporting the construction of the Markov chains and in inventing logics and model checking tools for reasoning about the processes.
State of the art
The early work of Hillston [Hil96] focused on performance evaluation and introduced the process algebra PEPA as a language for expressing discrete state continuous time Markov chains (CTMC) in a compositional manner – the accompanying PEPA tool allows for simple numerical analyses as well as simulation of the specified models. Around the same time Priami [Pri95] proposed the stochastic pi-calculus as an extension of Milner’s pi-calculus; similar techniques have later been used for extending many other process algebras with stochastic information. Modern state of the art process algebras extend this line of work in various ways; e.g. Brinksma and Hermanns have proposed a process algebra based on continuous time Markov decision processes (CTMDP) [BH01].
Classical model checking techniques have been extended to handle stochastic information. This has involved developing appropriate logics for expressing the desired properties; Continuous Stochastic Logic (CSL), e.g., has been suggested for CTMCs and is implemented as a key component in the model checker PRISM [KNP07]. In combination with a reward structure this allows for the modelling and analysis of a wide variety of systems in the tool, including embedded systems, biological systems and network systems with cryptographic primitives. Systems are specified as simple transition systems annotated with stochastic information; however, PRISM also interacts with PEPA. Other stochastic model checking tools include E|-MC2 [HKM+03] and Rapture [JDL02].
Within static analysis very little work has been done for stochastic settings; we shall only mention probabilistic extensions of the abstract interpretation framework suggested by [Mon00] and [PHW05].
Scientific Aims
Within a number of European projects the scientists involved in MT-LAB have contributed significantly to model checking methods for different types of Markovian models (Discrete Time Markov Chains, Markov Decision Processes). MT-LAB will focus on extensions to models combining stochastic and continuous aspects and will develop – so far non-existing – static analysis techniques applicable to stochastic models.
From the point of view of “traditional” mathematical modelling the approach of MT-LAB offers a unique chance to integrate and further develop recent advances in the area of stochastic models. This involves techniques for dealing with the joint distribution of distinct properties of interest. An important example would be the joint distribution of the time an power required to complete a certain task.
6. Continuous Models and Properties
Within classical Control Theory [FPE01] the predominant way of modelling an IT system is through a set of differential equations [Har02] describing the evolution of physical phenomena in the environment when regulated by a given control program. A serious restriction is that only systems with completely deterministic behaviour may be modelled, making it impossible to deal with the future use of distributed control systems. To overcome this restriction, the new area of Hybrid Systems has emerged in the intersection between Computer Science and Control Theory.
The models studied within Hybrid Systems combine in different ways the continuous models of Control Theory with the discrete models of Computer Science, always with a chosen trade-off between expressivity and analysability.
State of the art
A control system can be viewed as a family of dynamical systems parameterized by control that influences the evolution of the system’s trajectory to satisfy desired specifications. Virtually all control systems are nonlinear in nature. However, if a system does not deviate too much from the nominal conditions it is possible to describe its operation by a linear model, that is, a set of linear ordinary differential equations [HSD04]. This is the reason why linear systems have occupied so much attention in control theory. A great number of linear control theories have been successfully developed over the past decades, to mention a few: optimal [AM71], robust [ZDG96], adaptive control [AW94]. Each of these provides a software tool that is capable of computing a control algorithm for a given class of differential equations.
Now, facing nonlinear systems, the paradigm of Hybrid Systems has recently attracted considerable attention emerging in the intersection between Control Theory and Computer Science. Here a nonlinear system is approximated by a number of linear subsystems each defined only in a small domain, e.g. a simplex or a cube. Formally, a (piecewise affine) Hybrid System [Son81] is a control system of the following form. The discrete dynamics (gluing of the linear subsystems) is specified by a finite-state automaton. A piecewise affine control system is associated to each discrete location. The design of a controller for a nonlinear system is now equivalent to finding a discrete supervisor that switches between discrete locations and a family of linear controllers controlling the system only locally [HCS06].
The restricted class of Linear Hybrid Automata, introduced by Henzinger et al. [HHW97], allows for the modelling of systems, where the flow of continuous variables in a given mode can be characterized by simple interval constraints on rates and mode-switches being triggered by linear constraints on the continuous variables. Though generally undecidable, several practical models within this class may be analysed using a validation engine based on polyhedra.
Finally, the modelling formalism of Timed Automata, introduced by Alur and Dill [AD90], may be seen as an even more restricted class of Hybrid Systems, where the only continuous variables are clocks with switching between modes essentially only triggered by bounds (upper and lower, strict and non-strict) on clocks. Despite its relatively limited expressive power, Timed Automata has established itself as a favourite formalism for modelling real-time systems and timing issues. A main reason for this success has been the development of very efficient data structures and algorithms for the analysis of Timed Automata as well as their realisation in tools such as the widely distributed tool UPPAAL [UPA].
Priced (or weighted) extensions of Timed Automata have been introduced [BFH+01] allowing to model and efficiently analyse [LBB+01, RLS04] not just timing properties but also consumption of resources (e.g. energy, money, memory etc.).
Scientific Aim
The scientists involved in MT-LAB will concentrate on the challenge of defining property preserving transformations from hybrid system models to timed or discrete models in order that the powerful analysis techniques for timed automata and discrete systems may be applied. Also we will devote effort to defining – so far mainly absent – logical specification formalisms and analysis methods directly related to hybrid systems. The formalism of Priced Timed Automata is subject to very active foundational research internationally, and MT-LAB intends to play a leading role in this.
From the point of view of “traditional” mathematical modelling the approach of MT-LAB offers a unique chance to develop tractable ways of dealing with important control system properties, such as reachability (ability of a system to attain a specified goal) and stability (ability to keep the goal against disturbances), which is currently beyond state of the art.
Crosscutting Combinations
A challenging crosscutting combination of stochastic and mathematical models is the study of stochastic differential equations, an area known as Stochastic Control Theory. Here the challenge is to find an adequate stochastic process, which realistically models the uncertainty inherent in the given problem, and to develop the corresponding algorithms. An interesting development is taken in PEPA where stochastic analyses, solutions of differential equations, and simulations are integrated into the same palette. Processes are described in the process algebra of choice, and the initial modelling is performed using Markov chains whose "steady-state" distribution can be calculated using existing methods and tools. When the complexity of the model increases and the computation of the "steady-state" distribution become prohibitive, the same process algebraic model is given an interpretation using linear differential equations, which in practice can be solved much faster. To achieve additional insight into the dynamic aspects of the system, e.g. the speed with which it stabilizes, or if the numerical algorithms are prohibitive even in the case of differential equations, one may perform sequences of stochastic simulations.
|