Architecture

Over the last week, rapid progress has been made towards an initial test of the system on an artificial dataset.

This has involved integrating the sub-systems written so far with the external libraries adopted for the project, as well as writing some new code.

The current architectural layout is roughly as follows.

  1. Graph Language
  2. CPT Learner
  3. Model Generation
  4. Observer
  5. Sampler

Graph Language

This is a small, simple plain-text representation of the graph structure. It serves as a standard description of the graph that can be transcribed into the format required by each of the subsystems. At a later date, this representation will be generated automatically using machine learning techniques.
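The post does not show the syntax of the graph language, so the following is only a sketch under an assumed format: one node per line written as `Node: Parent1, Parent2`, with `Node:` for a node that has no parents and `#` for comments. The format and the `parse_graph` function are hypothetical; the point is that such a file reduces to a `{node: [parents]}` mapping that every subsystem can translate into its own format.

```python
from typing import Dict, List


def parse_graph(text: str) -> Dict[str, List[str]]:
    """Parse a hypothetical plain-text graph description into {node: [parents]}.

    Assumed format (illustrative only): 'Node: Parent1, Parent2' per line,
    'Node:' for a root node; blank lines and '#' comments are ignored.
    """
    graph: Dict[str, List[str]] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if ":" not in line:
            raise ValueError(f"line {lineno}: expected 'Node: parents', got {raw!r}")
        node, _, rest = line.partition(":")
        graph[node.strip()] = [p.strip() for p in rest.split(",") if p.strip()]
    # Every parent must itself be declared as a node somewhere in the file.
    for node, parents in graph.items():
        for parent in parents:
            if parent not in graph:
                raise ValueError(f"node {node!r} has undeclared parent {parent!r}")
    return graph


example = """
# Toy network
Rain:
Sprinkler: Rain
WetGrass: Rain, Sprinkler
"""
graph = parse_graph(example)
print(graph)
# {'Rain': [], 'Sprinkler': ['Rain'], 'WetGrass': ['Rain', 'Sprinkler']}
```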

CPT Learner

This subsystem has evolved from the code discussed previously on this blog. It loads the graph from the graph language and represents it as a network. Training data is then parsed with this graph structure in mind, and a conditional probability table (CPT) is generated for each node. At the end of the training process the CPTs are serialised and stored using Python's 'pickle' library.
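A minimal sketch of this step, assuming complete training rows (a dict of node values per row) and CPTs estimated by simple counting with add-one smoothing. The post does not say how its learner estimates probabilities, so `learn_cpts`, the smoothing parameter `alpha`, and the `{node: {parent_values: {value: probability}}}` layout are all illustrative assumptions; only the use of `pickle` for storage comes from the text.

```python
import os
import pickle
import tempfile
from collections import Counter, defaultdict
from typing import Dict, Hashable, List


def learn_cpts(graph: Dict[str, List[str]], rows: List[Dict[str, Hashable]],
               alpha: float = 1.0) -> Dict[str, Dict[tuple, Dict[Hashable, float]]]:
    """Estimate one CPT per node by counting (hypothetical helper).

    Returns {node: {parent_values: {value: probability}}}. alpha is an
    add-alpha pseudocount; parent combinations never seen in the data
    simply get no entry.
    """
    # Each node's domain is the (sortable) set of values it takes in the data.
    domains = {node: sorted({row[node] for row in rows}) for node in graph}
    cpts = {}
    for node, parents in graph.items():
        counts: Dict[tuple, Counter] = defaultdict(Counter)
        for row in rows:
            counts[tuple(row[p] for p in parents)][row[node]] += 1
        table = {}
        for key, ctr in counts.items():
            total = sum(ctr.values()) + alpha * len(domains[node])
            table[key] = {v: (ctr[v] + alpha) / total for v in domains[node]}
        cpts[node] = table
    return cpts


def save_cpts(cpts, path: str) -> None:
    """Serialise the learnt CPTs with pickle."""
    with open(path, "wb") as f:
        pickle.dump(cpts, f)


def load_cpts(path: str):
    """Load CPTs previously written by save_cpts."""
    with open(path, "rb") as f:
        return pickle.load(f)


graph = {"Rain": [], "Sprinkler": ["Rain"]}
rows = [
    {"Rain": True, "Sprinkler": False},
    {"Rain": True, "Sprinkler": False},
    {"Rain": False, "Sprinkler": True},
    {"Rain": False, "Sprinkler": False},
]
cpts = learn_cpts(graph, rows)
path = os.path.join(tempfile.gettempdir(), "cpts.pkl")
save_cpts(cpts, path)
print(load_cpts(path)["Sprinkler"][(True,)])  # {False: 0.75, True: 0.25}
```

Smoothing keeps a value that was never observed under some parent combination from receiving a probability of exactly zero, which would otherwise rule it out entirely during sampling.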

Model Generation

PYMC performs its calculations on models (see Adventures with PYMC for more details). To make this possible, the model generation system combines the graph structure from the graph language with the learnt CPTs stored in the pickle files, and generates a Python file that defines the model for PYMC to use. Observed variables can also be passed into the model.
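To illustrate the code-generation idea, here is a sketch that turns a graph plus CPTs into Python source. It assumes every node is binary, node names are valid Python identifiers, and the PyMC 2.x API, in which `pymc.Bernoulli(name, p, value=..., observed=True)` declares a (possibly observed) binary variable and `pymc.Lambda` builds a deterministic node whose parents are taken from the lambda's default arguments. The generator itself, `generate_model_source`, and the 0.5 fallback for unseen parent combinations are hypothetical choices, not the system's actual output format.

```python
from typing import Dict, Hashable, List, Optional

Graph = Dict[str, List[str]]
CPTs = Dict[str, Dict[tuple, Dict[Hashable, float]]]


def topological_order(graph: Graph) -> List[str]:
    """Order nodes so that every parent appears before its children."""
    order: List[str] = []
    done = set()

    def visit(node: str, path: set) -> None:
        if node in done:
            return
        if node in path:
            raise ValueError(f"graph has a cycle through {node!r}")
        path.add(node)
        for parent in graph[node]:
            visit(parent, path)
        path.discard(node)
        done.add(node)
        order.append(node)

    for node in graph:
        visit(node, set())
    return order


def generate_model_source(graph: Graph, cpts: CPTs,
                          observed: Optional[Dict[str, bool]] = None) -> str:
    """Return Python source for a PyMC 2-style model of binary nodes (sketch)."""
    observed = observed or {}
    lines = ["import pymc", ""]
    for node in topological_order(graph):
        parents = graph[node]
        # Probability that the node is True for each parent combination.
        p_true = {key: dist.get(True, 0.0) for key, dist in cpts[node].items()}
        obs = f", value={observed[node]!r}, observed=True" if node in observed else ""
        if not parents:
            lines.append(f"{node} = pymc.Bernoulli({node!r}, p={p_true[()]!r}{obs})")
            continue
        # pymc.Lambda takes its parents from the lambda's default arguments;
        # unseen parent combinations fall back to p=0.5 (hypothetical choice).
        args = ", ".join(f"{p}={p}" for p in parents)
        key = "(" + "".join(f"bool({p}), " for p in parents) + ")"
        lines.append(f"p_{node} = pymc.Lambda('p_{node}', "
                     f"lambda {args}: {p_true!r}.get({key}, 0.5))")
        lines.append(f"{node} = pymc.Bernoulli({node!r}, p=p_{node}{obs})")
    return "\n".join(lines) + "\n"


graph = {"Rain": [], "Sprinkler": ["Rain"]}
cpts = {
    "Rain": {(): {False: 0.5, True: 0.5}},
    "Sprinkler": {(False,): {False: 0.5, True: 0.5},
                  (True,): {False: 0.75, True: 0.25}},
}
source = generate_model_source(graph, cpts, observed={"Rain": True})
print(source)
```

The returned string would then be written out as a .py module for PYMC to import. Emitting nodes in topological order matters because each generated line refers to its parents by variable name, so parents must be defined first.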

Observer

The observer looks through a dataset and identifies variables in the data that correspond to nodes in the graph structure. These observed nodes are then passed into the model generation system described above.
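A sketch of the observer, assuming the dataset is a CSV file whose column headers match node names and in which a blank cell means the node is unobserved. The CSV layout, the `observe` and `parse_value` helpers, and the true/false value convention are assumptions made for illustration. Columns that are not graph nodes (here `Temperature`) are ignored.

```python
import csv
import io
from typing import Dict, List


def parse_value(raw: str):
    """Map 'true'/'1' and 'false'/'0' (any case) to bools; leave other values as strings."""
    lowered = raw.strip().lower()
    if lowered in ("true", "1"):
        return True
    if lowered in ("false", "0"):
        return False
    return raw.strip()


def observe(record: Dict[str, str], graph: Dict[str, List[str]]) -> Dict[str, object]:
    """Return {node: value} for every graph node that has a value in the record."""
    observed = {}
    for node in graph:
        raw = record.get(node)
        if raw is None or raw.strip() == "":
            continue  # column absent or blank: the node stays unobserved
        observed[node] = parse_value(raw)
    return observed


graph = {"Rain": [], "Sprinkler": ["Rain"], "WetGrass": ["Rain", "Sprinkler"]}
data = "Rain,Temperature,WetGrass\nTrue,12.5,\nFalse,18.0,True\n"
observations = [observe(row, graph) for row in csv.DictReader(io.StringIO(data))]
print(observations)
# [{'Rain': True}, {'Rain': False, 'WetGrass': True}]
```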

Sampler

Once the model has been created, it can be sampled using the PYMC library. The results of the sampling allow the probability of each node to be estimated.
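The system itself relies on PYMC's sampler for this step. Purely to show how node probabilities are read off a set of samples, the sketch below swaps in plain rejection sampling on binary nodes: draw joint samples from the CPTs, discard those that contradict the observations, and report the fraction of surviving samples in which each node is True. The CPT layout, the 0.5 fallback for unseen parent combinations, and the function names are assumptions carried over for illustration.

```python
import random
from typing import Dict, Hashable, List

Graph = Dict[str, List[str]]
CPTs = Dict[str, Dict[tuple, Dict[Hashable, float]]]


def forward_sample(graph: Graph, cpts: CPTs, rng: random.Random) -> Dict[str, bool]:
    """Draw one joint sample of all binary nodes (graph must be acyclic)."""
    sample: Dict[str, bool] = {}

    def draw(node: str) -> bool:
        if node not in sample:
            key = tuple(draw(parent) for parent in graph[node])
            dist = cpts[node].get(key)
            # Unseen parent combinations fall back to 0.5 (hypothetical choice).
            p_true = 0.5 if dist is None else dist.get(True, 0.0)
            sample[node] = rng.random() < p_true
        return sample[node]

    for node in graph:
        draw(node)
    return sample


def estimate_marginals(graph: Graph, cpts: CPTs, observed: Dict[str, bool],
                       n_samples: int = 20000, seed: int = 0) -> Dict[str, float]:
    """Estimate P(node is True | observed) by rejection sampling."""
    rng = random.Random(seed)
    counts = {node: 0 for node in graph}
    accepted = 0
    for _ in range(n_samples):
        s = forward_sample(graph, cpts, rng)
        if any(s[n] != v for n, v in observed.items()):
            continue  # reject samples that contradict the evidence
        accepted += 1
        for node, value in s.items():
            counts[node] += value
    if accepted == 0:
        raise RuntimeError("no samples matched the evidence; increase n_samples")
    return {node: c / accepted for node, c in counts.items()}


graph = {"Rain": [], "Sprinkler": ["Rain"]}
cpts = {
    "Rain": {(): {False: 0.5, True: 0.5}},
    "Sprinkler": {(False,): {False: 0.5, True: 0.5},
                  (True,): {False: 0.75, True: 0.25}},
}
marginals = estimate_marginals(graph, cpts, observed={"Sprinkler": False})
print(marginals)
```

For this toy network the exact answer is P(Rain | Sprinkler = False) = (0.5 × 0.75) / (0.5 × 0.75 + 0.5 × 0.5) = 0.6, so the printed estimate for Rain should land close to 0.6. Rejection sampling becomes wasteful when the evidence is unlikely, which is one reason to use an MCMC library such as PYMC instead.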

Further details of some of these stages will be discussed in subsequent blog posts.
