Model validation is actually a misnomer

Mathematical modelling is at the core of systems biology. Models are usually developed from our understanding of a biological system and described by a set of differential equations. Model behavior is characterized by parameters, which are generally identified from appropriate experimental data. Normally, parameter estimation and optimization are carried out using a finite number of experimental data sets. Even under the same protocol, no two experiments will ever produce exactly the same data, which means that in principle a model optimized over a very large data set will be more valid than a model identified using a small one. This makes model validation a tedious task, and a new experimental data set will often reveal discrepancies between the observations and the model's behavior. From time to time it has been suggested that model validation is an unnecessary practice, and that the validity of a model cannot be established unless the model is optimized over an infinite number of experimental data sets. In a recent BMC Bioinformatics paper, Anderson and Papachristodoulou suggest that
In principle the only statement that one can make about a system model is that it is incorrect, i.e., invalid, a fact which can be established given appropriate experimental data.

and

In fact, in order to understand biological function one should try to invalidate models that are incompatible with available data.

Model invalidation is a sound approach for finding loopholes in a model, and it helps identify where the parameters and/or system structure should be refined. To evaluate how well a model structure represents the experimental observations, the authors first establish that there is a strong interplay between system identification and model invalidation.

We begin this investigation by highlighting the link between system identification and model invalidation. The question we set out to answer is “Given experimental data, what is the least error one can expect between the data and predictions from a model with the best parameter choice within the allowable parameter range?” Most system identification questions try to find the best parameters in order to minimize an objective function of the error between model predictions and data, while the question we are asking here is dual to that: ‘How bad is the best model, for all allowable parameters?’ If the error is large, then this could indicate that the model structure may be inappropriate and one may want to invalidate the model and repeat the system identification cycle.
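
The question posed in this passage can be illustrated with a toy example. The following is a minimal sketch and not the authors' method (they use convex optimization rather than search): it assumes a hypothetical one-parameter decay model x(t) = x0·exp(−k·t), made-up data, and a brute-force grid over the allowable range of k. All names and numbers are illustrative assumptions.

```python
import math

def model(t, k, x0=1.0):
    """Hypothetical one-parameter model: exponential decay from x0."""
    return x0 * math.exp(-k * t)

def sse(k, times, data):
    """Sum of squared errors between model predictions and data."""
    return sum((model(t, k) - y) ** 2 for t, y in zip(times, data))

def least_error(times, data, k_min, k_max, steps=2000):
    """Smallest error found on a grid over the allowable range [k_min, k_max]."""
    best_k, best_err = k_min, sse(k_min, times, data)
    for i in range(1, steps + 1):
        k = k_min + (k_max - k_min) * i / steps
        err = sse(k, times, data)
        if err < best_err:
            best_k, best_err = k, err
    return best_k, best_err

times = [0.0, 1.0, 2.0, 3.0]
decaying = [model(t, 0.5) for t in times]  # data the model can reproduce
growing = [1.0, 1.5, 2.2, 3.3]             # made-up data that grows over time

k_fit, err_fit = least_error(times, decaying, 0.0, 2.0)
k_bad, err_bad = least_error(times, growing, 0.0, 2.0)
```

For the decaying data the best-case error is essentially zero, while for the growing data it stays large however k is chosen, which points at the model structure rather than the parameter value. Note that a grid search only samples the error; it cannot certify the true minimum.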

Further they describe a simulation free framework for invalidating both continuous and discrete-time ODE models based on convex optimization techniques.

We then provide a methodology for discrete-time and continuous-time model invalidation using ideas for Real Algebraic Geometry and Semidefinite Programming. The aim is to construct functions/certificates that provide proof of the fact that the model can never represent an experimental data set. We stress that simulation cannot be used for this purpose, unless the data is certain, the model size is small and its structure, initial conditions and parameters are fixed. The reason for this is that as models become more complex (containing more states and parameters) exhaustive simulation for model invalidation becomes computationally prohibitive – as well as being inconclusive.
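
The distinction between continuous-time and discrete-time models mentioned here can be made concrete with a scalar linear example. This sketch is an illustration only and is not taken from the paper: for dx/dt = a·x + b sampled every h time units, the exact discrete-time map is x_{n+1} = e^{a·h}·x_n + (e^{a·h} − 1)·b/a (for a ≠ 0). The values of a, b, h and x0 are arbitrary assumptions.

```python
import math

def discretize(a, b, h):
    """Exact discrete-time map for dx/dt = a*x + b sampled with step h (a != 0)."""
    ad = math.exp(a * h)
    bd = (ad - 1.0) * b / a
    return ad, bd

def continuous_solution(a, b, x0, t):
    """Closed-form solution of dx/dt = a*x + b with x(0) = x0 (a != 0)."""
    return (x0 + b / a) * math.exp(a * t) - b / a

a, b, h, x0 = -0.8, 2.0, 0.25, 0.0  # arbitrary illustrative values
ad, bd = discretize(a, b, h)

# Iterate the difference equation x_{n+1} = ad * x_n + bd.
x = x0
trajectory = [x]
for n in range(8):
    x = ad * x + bd
    trajectory.append(x)
```

The iterated map agrees with the continuous solution at the sampling instants t = n·h, which is what makes a discrete-time representation convenient for comparison with data collected at discrete time points.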

Compared with simulation-based approaches, for which invalidating complex, high-dimensional nonlinear models is computationally intractable and inconclusive, the proposed method has the advantage that it does not require any simulation of the candidate models.
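
To give a rough sense of why exhaustive simulation becomes prohibitive, here is a back-of-the-envelope count. It assumes, purely for illustration, that each of n uncertain quantities (parameters or initial conditions) is sampled at the same number of grid points, so a full sweep needs that number raised to the power n simulations. The function name and the figures are hypothetical.

```python
def grid_simulations(points_per_axis, n_uncertain):
    """Number of simulations needed for a full grid sweep."""
    return points_per_axis ** n_uncertain

# Ten grid points per uncertain quantity, for models of growing size.
for n in (2, 5, 10, 20):
    print(n, grid_simulations(10, n))
```

Even a completed sweep only shows that no sampled point fits; a parameter set lying between grid points could still reproduce the data, which is why such a sweep remains inconclusive.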

Reference:

James Anderson, Antonis Papachristodoulou (2009). On validation and invalidation of biological models. BMC Bioinformatics, 10 (132)


9 Responses to “Model validation is actually a misnomer”
  1. JMG
    05.07.2009

    “Further they describe a simulation free framework for invalidating both continuous and discrete-time ODE models”

    Erm, is this a typo? ODE (Ordinary Differential Equation) models are by definition always continuous. There’s no such thing as a discrete-time ODE model. I hope this isn’t too nit-picky, but I thought I should point this out.

  2. Steve
    05.07.2009

    Discrete time ODE: x_{n+1} = A x_n + b is the discrete-time equivalent of xdot = A x + b
    http://en.wikipedia.org/wiki/Dynamical_system#Maps

  3. JMG
    05.07.2009

    So, that would be a discrete-time dynamical system, in this case a difference equation, NOT a differential equation. They’re related (dynamical systems are great!) but calling the equation you’ve written a discrete time ODE is not valid.

  4. Model validation is actually a misnomer http://tinyurl.com/ddaxut

  5. 05.07.2009

    RT @ResearchBlogs Model validation is actually a misnomer http://tinyurl.com/ddaxut

  6. 05.07.2009

    Lot of heat out there in comment section :-)

  7. 05.07.2009

    Model validation is actually a misnomer: Mathematical modelling is in core of systems biology. Models are usuall.. http://tinyurl.com/d9qx3c

  8. Tom W
    05.07.2009

    Of course it’s valid! A difference equation or difference-differential equation (check math-mathworld) is a sub class of differential equations. Granted it’s the discrete equivalent of a continuous differential equation. But it’s still an ODE.

  9. 05.07.2009

    @Tom @Steve I could not agree more. A discrete-time ODE is used to link a continuous-time ODE with discrete-time experimental data, and this is very common for biochemical models, where different discretization techniques are used to formulate a discrete-time representation of continuous-time ODE systems. Even numerical integration relies on some form of discretization.