'Reference' RDFLib Implementation of RETE

Just a collection of my thoughts about the possiblity of developing (from scratch) an implementation of the RETE algorithm that integrates much more cleanly with RDFLib than Pychinko. Mostly as a way to avoid the heavy software-dependency burden associated with using Pychinko for forward-chaining. The simplicity of the RETE algorithm was the main motivation.

Rete Network Diagram from 'Comparison of the Rete and Treat Production Matchers for Soar'

Alpha Node

An alpha node filters RDF triples against a triple pattern. The triple pattern corresponds to a quoted statement (from the antecedent) of a rule. The triple is passed on by the alpha node if it matches the triple pattern.

class TriplePattern:
  def __init__(self,(self,predicate,object)):
    self.subject = subject
    self.predicate = predicate
    self.object = object

class ReteFact:
  def __init__(self,(self,predicate,object)):
    self.subject = subject
    self.predicate = predicate
    self.object = object
    self.alphaNode = None #The alpha node that passed this Fact on (used for testing variable consistency)

class AlphaNode:
  def __init__(self,triplePattern):
    self.triplePattern = triplePattern
    self.alphaMemory = Set() #The set of facts that passed the test
    self.betaNode = None #The BetaNode fed by this AlphaNode

  def test(self,aReteFact):
    .. if fact passes attach this AlphaNode and add it to self.alphaMemory ..

The AlphaNode class could support a mechanism for registering certain built-in predicates with functions to invoke instead of test. This would be similar to the way PyChinko relies on CWM built-ins for implementing specialized Fact tests with the exception that arbitrary extensions can be built to work with the core algorithm.

Beta Nodes

A beta node is associated with two incoming sets of facts. The beta node is also associated with an outgoing set of facts which are updated with only those incoming facts that have consistent variable bindings. That is, for each of the variables in common, they must be bound to the same value. The outgoing set is used for the left incoming set of the beta node below it.

class BetaNode:      
  def __init__(self,leftNode,rightNode,terminalNode=False,consequent=None):
    self.terminalNode = terminalNode #Boolean value indicating if this is the bottom of the RETE network or not
    self.consequent = consequent is None and [] or consequent #The consequent is a list of TriplePatterns that will be fired if this is a terminal node and propagation is 'successful'
    self.leftNode = leftNode
    self.rightNode = rightNode #The incoming right input of a BetaNode is always an AlphaNode
    self.betaMemory = Set() #The set of facts (from the memory of both incoming nodes) with consistent bindings 
    self.betaNode = None #The BetaNode fed by this BetaNode

  def propagate(self):
    .. checks for consistent variable bindings between memory of incoming nodes ..


class ReteNetwork:      
  """
  """
  def __init__(self,ruleGraphs):
    self.topLevelAlphaNodes = []
    self.inferredFacts = []
  def serialize(self):
    "Serialize network in N3 - for persistence"
    pass
  def reset(self):
    "Reset the network by emptying the memory associated with all contained nodes"
    pass
  def buildNetw
  def feedFact(self,aFact):
    "Passes a fact to the top level alpha nodes, propagates through the network and adds all inferred facts to self.inferredFacts (if any)"
    pass

The BetaNode is either a terminal (i.e., any facts that 'pass' will trigger the entire rule) or it isn't. If it's a terminal, it's associated with the consequent of the rule (a set of TriplePatterns). If it isn't a terminal, it is associated with a betaMemory which will hold all facts that pass the propagation test.

The propagation test checks for variables that are common between the memory of both incoming nodes. It propagates facts where these variables are bound to the same value. If this is a terminal node, then for each propagated fact, the consequent is filled in with the values for each variable and the triplePatterns should be 'fired.'

An Vocabulary for Persisting a RETE Network

For the sake of reuse, documentation, and/or unit testing, a compiled RETE network can be expressed in RDF using a vocabulary specifically tailored for the components of the RETE alghorithm.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix : <http://metacognition.info/ontologies/ReteVocabulary.owl#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix log: <http://www.w3.org/2000/10/swap/log#>.

:AlphaNode a owl:Class.
:BetaNode a owl:Class.
:TerminalNode a owl:Class.
              rdfs:subClassOf :BetaNode.

:alphaNodeTest a owl:ObjectProperty;
               rdfs:comment "The alpha node test is a triple pattern in the rule antecedent";
               rdfs:domain :AlphaNode;
               rdfs:range rdf:Statement.

:leftInputOf a owl:ObjectProperty;
               rdfs:domain owl:unionOf (:AlphaNode :BetaNode);
               rdfs:range :BetaNode;
               owl:inverseOf :leftInput.

:rightInputOf a owl:ObjectProperty;
               rdfs:domain owl:unionOf (:AlphaNode :BetaNode);
               rdfs:range :BetaNode;
               owl:inverseOf :rightInput.

:leftInput a owl:ObjectProperty;
           owl:inverseOf :leftInputOf;
           rdfs:domain :BetaNode;
           rdfs:range owl:unionOf (:AlphaNode :BetaNode).

:rightInput a owl:ObjectProperty;
           owl:inverseOf :rightInputOf;
           rdfs:domain :BetaNode;
           rdfs:range owl:unionOf (:AlphaNode :BetaNode).

:ruleConsequent a owl:ObjectProperty;
                rdfs:comment "The set of consequents (associated with the rule) that will be 'fired' when facts propagate past this Terminal Node";
                rdfs:domain :TerminalNode;
                rdfs:range log:Formula;

Python Heuristic for Compiling Alpha Nodes - First pass

for subject,predicate,object in antecedentFormula:
    if not AlphaNode(TriplePattern((subject,predicate,object))) in network:
        .. add to network if there was no existing multi input pattern that had matched so far..
        .. otherwise connect to existing multi input pattern (or alpha node)..
    else:
        if ..previous iteration was also in the network..:
          ..continue interation..
        else:
          ..begin match for multi input pattern..

Diagrams of Compiled Networks

Compiled networks can be processed into the Graphviz .dot format, from which diagrams of the compiled networks can be generated into the various output formats supported by Graphviz.

Test-driven Development

Considering the very simple and specific behavior associated with the components of the algorithm, it would seem that fully regressive unit-tests could be easily written for the AlphaNode,BetaNode,ReteNetwork classes to facilitate test-driven development.

To be continued

References