Josh Munn's Website
microkanren, cons, and Python [2022-04-27 Wed]

Home

Table of Contents

I've been implementing µKanren1, 2 in Python. One challenge so far has been creating a sequence abstraction with semantics similar to Lisp lists. This is important as it's necessary to be able to decompose sequences into parts for unification, such that:

  1. I can get the first element of a sequence;
  2. I can get the rest of the sequence; and
  3. the "rest" of the sequence might be a sequence or a non-iterable value (including logic variables).

Cons

The ubiquitous data structure in Lisp-like languages is a two part cell known as a cons or a pair. It can be thought of as a struct with two fields, instances of which are created using the function cons. Conceptually, (cons 1 2) yields the pair containing 1 in its first part and 2 in its second part.

┌───┬───┐
│ 1 ┊ 2 │
└───┴───┘
(cons 1 2)

Lists are composed of pairs - a proper list is a pair in which the second part contains either a proper list or the empty list.

(cons 1 (cons 2 (cons 3 '()))) creates the list containing 1, 2, and 3 3, 4:

┌───┬───┐  ┌───┬───┐  ┌───┬───┐  
│ 1 ┊ ╾────┤ 2 ┊ ╾────┤ 3 ┊ ∅ │
└───┴───┘  └───┴───┘  └───┴───┘  
A proper list

┌───┬───┐  ┌───┬───┐
│ 1 ┊ ╾────┤ 2 ┊ 3 │
└───┴───┘  └───┴───┘
An improper list (not terminated by the empty list)

Python

Getting back to Python: first you might try to emulate the behaviour of Lisp lists using the builtin list (or tuple), which meets our first two criteria and can be conveniently decomposed with unpacking assignments:

the_list = [1, 2, 3, 4]
first, *rest = the_list  # also the_list[0], the_list[1:]
first, rest
(1, [2, 3, 4])

This unfortunately fails the third criteria - I want to be able to get the "rest" of the list, allowing the "rest" to be a list or something else.

I would like to be able to construct a sequence such that the terminating element is some entity that represents that the sequence is not yet fully instantiated - I don't know what the rest of the sequence is, yet - but it may become a proper sequence at some point in time. That last sentence seems a little "spooky" in the Fox Mulder sense, so here's an example:

  1. (1, 2, 3 ? x) represents a sequence in which the suffix is unknown;
  2. if at some point in time x is found to represent the sequence (4, 5), the original sequence is now known to be (1, 2, 3, 4, 5);
  3. if x is found to represent the single element 4, the sequence is now known to be (1, 2, 3, 4); and
  4. if x is found to represent the empty sequence, the sequence is now known to be (1, 2, 3).

A new data type

One could try to conditionally handle the retrieval of the "rest" part of a sequence everywhere it is needed, but I think a cleaner approach is to encapsulate the desired behaviour in a new data type. My first attempt was a class that encapsulated the semantics of unpacking assignments, UnpackingAssignment. It looked something like this:

class UnpackingAssignment:
    def __init__(self, *heads, glob=None, rest=None):
        self.heads = heads
        self.glob = glob
        self.rest = rest

pack = UnpackingAssignment  # conflicting naming, but it's a convenient shorthand

pack(1, 2, glob=[3, 4], rest=[5])  # a, b, *c, d = [1,2,3,4,5]

pack(glob=[3, 4], rest=[5, 6])  # *a, b, c = [3,4,5,6]

pack(1, glob=[2, 3])  # a, *b = [1,2,3]

The variadic nature of the fields makes unification tricky, particularly once you add logic variables and arbitrarily deeply nested structures into the mix. One area of difficulty is unifying UnpackingAssignment instances that contain "overlapping" fields. In the following examples, variable names represent logic variables, represents unification, and { u₀ ↦ v₀, u₁ ↦ v₁, ... } represents the resulting set of mappings.

Some simple cases:

  • pack(1, glob=x) ≡ [1, 2, 3]
    • { x ↦ [2, 3] }
  • pack(x, glob=[2, 3]) ≡ [1, 2, 3]
    • { x ↦ 1 }
  • pack(glob=x, heads=[3]) ≡ [1, 2, 3]
    • { x ↦ [1, 2]}
  • pack(x, glob=[2, 3], rest=[y]) ≡ [1, 2, 3, 4]
    • { x ↦ 1, y ↦ 4 }

And some more interesting ones:

  • pack(1, glob=x, rest=y) ≡? [1, 2, 3, 4]
    • intuition says that the set of results should be:
      • { x ↦ [], y ↦ [1, 2, 3, 4] }
      • { x ↦ [1], y ↦ [2, 3, 4] }
      • { x ↦ [1, 2], y ↦ [3, 4] } (and so on)
  • pack(1, 2, glob=x, rest=[4, 5]) ≡? pack(1, glob=y, pack(glob=x, rest=[4, 5]))
    • intuitively these terms should unify with { x ↦ [3], y ↦ [2] }
  • pack(x, glob=[2, 3, 4], rest=[5, 6, 7]) ≡? pack(1, 2, glob=[3], rest=y)
    • again, there is an intuitive set of results ({ x ↦ 1, y ↦ [4, 5, 6, 7] }), but translating these intuitions into an implementation feels a bit like solving the Towers of Hanoi in the dark.

Footnotes: