Cypher

This page lists my prompts and the LLM's replies. It is simply a dialog between me and the machine as I start to form a design in an area I am quite unfamiliar with. Rather than stating the obvious each time ("I prompted the LLM"), I'll just list my prompts and its replies.

list the CREATE CONSTRAINT commands I will need to express the labels needed for the model objects

CREATE CONSTRAINT FOR (n:Document)  REQUIRE n.id IS UNIQUE;
CREATE CONSTRAINT FOR (n:Chapter)   REQUIRE n.id IS UNIQUE;
CREATE CONSTRAINT FOR (n:Paragraph) REQUIRE n.id IS UNIQUE;
CREATE CONSTRAINT FOR (n:Sentence)  REQUIRE n.id IS UNIQUE;
CREATE CONSTRAINT FOR (n:Token)     REQUIRE n.id IS UNIQUE;

That effectively creates the 5 labels: Document, Chapter, Paragraph, Sentence, and Token.
Although Cypher does not have the equivalent of CREATE TABLE, as does SQL, I would think that these
5 commands would need to be executed once, and never again, in the database, similarly to how
CREATE TABLE commands are typically run in a database migration task.
I don't want to run these commands every time the application starts up. I only want to run them once.
What is the standard way to achieve this using Python and Neo4J?

It replied with a fairly wordy three-option response. Option 1 was to use IF NOT EXISTS. Option 2 was to check in Python first whether the constraint exists, which is functionally equivalent and, in my opinion, pointlessly complex. Option 3 was to use a standalone migration script. I think I will need a working migration scaffold later on anyway, in case I need to change the topology or do a data migration. Big Pickle advised option 1, but I'm not taking its advice. I'm going to set up migrations.
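For the record, option 1 would amount to something like the sketch below. It only builds the statement strings; the helper name constraint_statements is mine, and I'm assuming the IF NOT EXISTS form of CREATE CONSTRAINT available in recent Neo4j versions.

```python
# The five labels from the CREATE CONSTRAINT commands above.
LABELS = ["Document", "Chapter", "Paragraph", "Sentence", "Token"]


def constraint_statements(labels: list[str]) -> list[str]:
    """Build one idempotent uniqueness-constraint statement per label."""
    return [
        f"CREATE CONSTRAINT IF NOT EXISTS FOR (n:{label}) REQUIRE n.id IS UNIQUE"
        for label in labels
    ]


statements = constraint_statements(LABELS)
for statement in statements:
    print(statement)
```

Statements like these could be run on every startup without error, which is exactly the convenience I'm giving up in favor of migrations.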

What Python-based migration runners are there that include versioning support, so that
if I make a change at any point and re-run the migrations, they pick up at the last-run
migration, find the migrations that follow it and have not yet been run, and run only those
(not the preceding, already-run migrations) in order (using their order number, date stamp,
or some other ordering mechanism)?

It gave me three options: custom Python code, Liquibase, or neo4j-migrations (a CLI).
If I implemented a migration runner in Python, I'd effectively be reinventing the wheel, so why not just use the neo4j-migrations CLI? I installed the migration runner:

brew install michael-simons/homebrew-neo4j-migrations/neo4j-migrations

I added some commands to the Makefile, and then:

Make a cypher file in @neo4j/migrations called V001__create_constraints.cypher
(V001 meaning the first migration with version number 1). The rest of the name is entirely
descriptive except for the ".cypher" extension. This should create the constraints for the models.
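To make the versioning idea concrete, here is a minimal sketch of how a runner might pick out the migrations that still need to run. This is not how neo4j-migrations is implemented; the regex, the Migration tuple, the integer version comparison, and the second file name (V002__add_indexes.cypher) are all my own assumptions for illustration.

```python
import re
from typing import NamedTuple

# Assumed pattern for names like "V001__create_constraints.cypher".
MIGRATION_NAME = re.compile(r"^V(\d+)__(.+)\.cypher$")


class Migration(NamedTuple):
    version: int
    description: str
    filename: str


def parse_migration(filename: str) -> Migration:
    """Split a migration filename into its version number and description."""
    match = MIGRATION_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a versioned migration file: {filename!r}")
    version, description = match.groups()
    return Migration(int(version), description.replace("_", " "), filename)


def pending_migrations(filenames: list[str], last_applied: int) -> list[Migration]:
    """Return the migrations newer than last_applied, ordered by version."""
    migrations = sorted(parse_migration(name) for name in filenames)
    return [m for m in migrations if m.version > last_applied]


# Hypothetical directory listing, deliberately out of order.
files = ["V002__add_indexes.cypher", "V001__create_constraints.cypher"]
print([m.filename for m in pending_migrations(files, last_applied=1)])
```

With version 1 already applied, only the hypothetical V002 file is selected, which is the "pick up where you left off" behavior I was asking about.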

I then tested a migration:

make migrate       
neo4j-migrations --database=test migrate
[2026-06-08T17:57:40.652825000] Applied migration 001 ("create constraints").
Database migrated to version 001.

That worked perfectly, but I was still unsure how to write the models to the database or how to retrieve them. I wrote a description of how the models and the repository (Neo4j) would interact. See this file: README.md
I was scratching around for ideas, so:

Read the file @src/dockb/repositories/README.md and discuss with me how to write/read
to/from the Neo4j database, specifically how to represent the parent-child relationships and
how ordering is maintained.

I won't bore you with the details, but what followed was a discussion about how to represent the models in Neo4j; in effect, I was learning Cypher. After a few prompts, I thought this made sense as an example of storing a Sentence in the database:

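MERGE (s:Sentence {id: $sentence_id})

// WITH is required between MERGE and a following MATCH.
WITH s
// OPTIONAL MATCH keeps the row alive when there are no orphans;
// a plain MATCH would yield zero rows and skip the UNWIND below.
OPTIONAL MATCH (s)-[:HAS_TOKEN]->(orphan:Token)
WHERE NOT orphan.id IN $token_ids
// DETACH DELETE removes the orphan and its relationships.
DETACH DELETE orphan

// Collapse back to one row per sentence before writing tokens.
WITH DISTINCT s
UNWIND $tokens AS t
  MERGE (tok:Token {id: t.id})
  SET tok.text       = t.text,
      tok.type       = t.type,
      tok.trailing_ws = t.trailing_ws,
      tok.is_digit   = t.is_digit,
      tok.like_num   = t.like_num,
      tok.is_alpha   = t.is_alpha,
      tok.is_stop    = t.is_stop,
      tok.lemma      = t.lemma,
      tok.pos        = t.pos
  MERGE (s)-[r:HAS_TOKEN]->(tok)
  SET r.position = t.position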

Passed as parameters:
{
  "sentence_id": "<uuid>",
  "sentence_text": "The cat sat on the mat",
  "token_ids": ["<uuid-1>", "<uuid-2>", ...],
  "tokens": [
    {"id": "<uuid-1>", "text": "The", "type": "word",  "trailing_ws": " ",  "lemma": "the", "pos": "DET",  "position": 0},
    {"id": "<uuid-2>", "text": "cat", "type": "word",  "trailing_ws": " ",  "lemma": "cat", "pos": "NOUN", "position": 1},
    {"id": "<uuid-3>", "text": "sat", "type": "word",  "trailing_ws": " ",  "lemma": "sit", "pos": "VERB", "position": 2},
    {"id": "<uuid-4>", "text": "on",  "type": "word",  "trailing_ws": " ",  "lemma": "on",  "pos": "ADP",  "position": 3},
    {"id": "<uuid-5>", "text": "the", "type": "word",  "trailing_ws": " ",  "lemma": "the", "pos": "DET",  "position": 4},
    {"id": "<uuid-6>", "text": "mat", "type": "word",  "trailing_ws": "",   "lemma": "mat", "pos": "NOUN", "position": 5}
  ]
}

Key points:
- MERGE is idempotent — creates if new, matches if existing — so this single query handles both NEW and
CHANGED states for the Sentence and each Token.
- position on the HAS_TOKEN relationship preserves order; query with ORDER BY r.position when reading.
- Orphan cleanup: the MATCH ... WHERE NOT deletes tokens that were removed from the sentence
(handles DELETED state).
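On the Python side, the parameter map for the save query could be assembled roughly like this. It is only a sketch: the Token and Sentence dataclasses and the sentence_params helper are hypothetical stand-ins for my real models, and I've left out most token fields (lemma, pos, and so on) for brevity.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Token:
    # Hypothetical, trimmed-down model: the real one carries more fields.
    id: str
    text: str
    trailing_ws: str = ""


@dataclass
class Sentence:
    id: str
    tokens: list[Token] = field(default_factory=list)


def sentence_params(sentence: Sentence) -> dict[str, Any]:
    """Build the parameter map; each token's position is its list index."""
    tokens = [
        {"id": tok.id, "text": tok.text, "trailing_ws": tok.trailing_ws, "position": i}
        for i, tok in enumerate(sentence.tokens)
    ]
    return {
        "sentence_id": sentence.id,
        "token_ids": [tok.id for tok in sentence.tokens],
        "tokens": tokens,
    }


sentence = Sentence("s-1", [Token("t-1", "The", " "), Token("t-2", "cat")])
params = sentence_params(sentence)
print(params["token_ids"])
```

Deriving position from the list index means the in-memory order of the tokens is the single source of truth for ordering, and token_ids always matches the tokens being written, so the orphan cleanup can't accidentally delete a token that is about to be saved.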

Then to retrieve a Sentence and its Tokens:

MATCH (s:Sentence {id: $sentence_id})-[r:HAS_TOKEN]->(t:Token)
RETURN s.id AS sentence_id,
       t.id            AS id,
       t.text          AS text,
       t.type          AS type,
       t.trailing_ws   AS trailing_ws,
       t.is_digit      AS is_digit,
       t.like_num      AS like_num,
       t.is_alpha      AS is_alpha,
       t.is_stop       AS is_stop,
       t.lemma         AS lemma,
       t.pos           AS pos
ORDER BY r.position

Pass a parameter to replace $sentence_id with the actual UUID:
{
  "sentence_id": "<uuid>"
}

The ORDER BY r.position on the relationship ensures tokens come back in the correct order to reconstruct
the sentence text.
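Reconstructing the text from the ordered rows is then a simple join. A minimal sketch, assuming each row has been converted to a plain dict keyed by the RETURN aliases (the sentence_text helper is my own name):

```python
def sentence_text(rows: list[dict]) -> str:
    """Join token rows (already ordered by position) back into sentence text."""
    # A missing trailing_ws property comes back as None, so treat it as "".
    return "".join(row["text"] + (row["trailing_ws"] or "") for row in rows)


rows = [
    {"text": "The", "trailing_ws": " "},
    {"text": "cat", "trailing_ws": " "},
    {"text": "sat", "trailing_ws": " "},
    {"text": "on", "trailing_ws": " "},
    {"text": "the", "trailing_ws": " "},
    {"text": "mat", "trailing_ws": ""},
]
print(sentence_text(rows))  # The cat sat on the mat
```

This is why trailing_ws is stored per token: the original spacing survives the round trip without any guessing about where spaces belong.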

To get back a Chapter, I'd want the paragraph IDs rather than each full paragraph:

MATCH (c:Chapter {id: $chapter_id})-[r:HAS_PARAGRAPH]->(p:Paragraph)
WITH c, p ORDER BY r.position
RETURN c.id AS chapter_id,
       collect(p.id) AS paragraph_ids
Returns a single row:
{
  "chapter_id": "<uuid>",
  "paragraph_ids": ["<uuid-1>", "<uuid-2>", "<uuid-3>", ...]
}

Then your Python code creates a Chapter with those IDs, and you lazy-load each Paragraph on demand (or batch-load the ones you need).
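The lazy-loading idea could look something like the sketch below. The Chapter and Paragraph classes and the load_paragraph callback are hypothetical; in the real repository the callback would run a query for a single Paragraph, whereas here a fake loader stands in so the example runs without a database.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Paragraph:
    id: str
    text: str = ""


@dataclass
class Chapter:
    id: str
    paragraph_ids: list[str]
    # Hypothetical loader: given a paragraph ID, returns the Paragraph.
    load_paragraph: Callable[[str], Paragraph]
    _cache: dict[str, Paragraph] = field(default_factory=dict)

    def paragraph(self, index: int) -> Paragraph:
        """Fetch the paragraph at index on first access, then reuse it."""
        pid = self.paragraph_ids[index]
        if pid not in self._cache:
            self._cache[pid] = self.load_paragraph(pid)
        return self._cache[pid]


calls: list[str] = []


def fake_loader(pid: str) -> Paragraph:
    """Stand-in for a database read; records which IDs were requested."""
    calls.append(pid)
    return Paragraph(pid, f"text of {pid}")


chapter = Chapter("c-1", ["p-1", "p-2"], fake_loader)
chapter.paragraph(1)
chapter.paragraph(1)  # second access is served from the cache
print(calls)  # ['p-2']
```

Only the paragraph that was actually requested gets loaded, and it is loaded once.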

At this point, I was ready to start creating the repository classes. I admit, I could mess up badly, given that I am such a novice at Cypher, but I'll learn as I muddle along.