Restructure
When I started this, I left a comment in the Document model class:
# The entire document is stored in the text parameter. (This may change later.)The question is, "Which model class will be the point at which the text is no longer stored, and where the semantic structure alone stores the text?"
I had a few options:
- Only the Sentence model
- Only the Token model
- Only permanently in the Token, temporarily everywhere else.
I chose 3. I.e. that during an apply_... function call, the text is first rebuilt from the children, edited, and then removed again after the semantic representation has been rebuilt.
For the purpose of NLP, the tokens are important (and they will be expanded with more properties.) For the purpose of NLP, the only hierarchy of the document I need to store is Document -> Chapter -> Paragraph -> Sentence -> Token. Phrase is unnecessary. So I will remove it.
The other thing I mentioned above, about making the text temporary in everything except the Tokens, will be addressed when I add the queue workers for handling the rebuilding of the semantics.
I noticed a bug in the models, I'm not initialising the id's correctly.
Services
I started to write 2_services.md
TL;DR
Services do the hard work, and they do all the tokenisation and NLP semantics via jobs added to a queue.
A gripe I have about OpenCode: it is very easy to accidentally answer it incorrectly. When you push enter for a newline, it assumes you are answering, and it is too easy to accidentally select the wrong numbered optional answer.
But it did a pretty good job of creating the queue, but it added a sleep after catching a cancel in the worker. I had to prompt it to fix it.
It fixed it very quickly.