MANUAL.md update

2025-06-06 12:24:40 +02:00 · 2020-10-29 13:41:52 +01:00 · 2020-10-29 13:41:52 +01:00 · df73401a86
commit df73401a86
parent ec14abcaff
2 changed files with 59 additions and 12 deletions
--- a/MANUAL.md
+++ b/MANUAL.md
@@ -1,6 +1,24 @@
 # Velascobot: Manual

-**OUTDATED: REVISION PENDING**
+Some notes:
+
+- Scriptorium version: Velasco v4.X (from the "Big Overhaul Update" on 27 Mar, 2019 until the 2nd Overhaul)
+  - Recognizable because, among other things, Readers are called Scribes and are stored in a big dictionary called the Scriptorium
+- Overhaul 2 version: starting with Velasco v5.0
+
+# Updating to Overhaul 2
+
+If you have a Velasco clone or fork from the Scriptorium version, you should follow these steps:
+
+1. First of all, update all your chat files to CARD=v4 format. You can do this by making a script that imports the Archivist, and then loading and saving all files.
+2. Then, pull the update.
+3. To convert files to the new unescaped UTF-16 encoding (the previous default was escaped UTF-8), edit the `get_reader(...)` function in the Archivist so it uses `load_reader_old(...)` instead of `load_reader(...)`.
+4. Make a script that imports the Archivist and calls the `update(...)` function (it loads and saves all files).
+5. Revert the `get_reader(...)` edit.
+
+And voilà! You're up to date. Unless you want to switch to the `mongodb` branch (WIP).
+
+# Mechanisms

 ## Markov chains

@@ -12,17 +30,29 @@ The actual messages aren't stored. After they're processed and all the words hav

 The storing action is made sometimes when a configuration value is changed, and whenever the bot sends a message. If the bot crashes, all the words processed from the messages since the last one from Velascobot will be lost. For high `period` values, this could be a considerable amount, but for small ones this is negligible. Still, the bot is not expected to crash often.

+## Speaker's Memory
+
+The memory of a `Speaker` is a small cache of the `C` most recently modified `Readers` (where `C` is set through a flag; the default is `20`). A modified `Reader` is one whose metadata was changed through a command, or one that has read a new message. When modifying a new `Reader` pushes the cache over the memory limit, the least recently modified `Reader` is pushed out and saved to its file.
+
+## Reader's Short Term and Long Term Memory
+
+When a message is read, it gets stored in a temporary cache. It is only processed into the vocabulary `Generator` when the `Reader` is asked to generate a new message, or whenever the `Reader` gets saved to a file. This allows the bot to reply to other recent messages, and not just the last one, when the periodic message is a reply.
+
 ## File hierarchy

-For those who are interested in cloning or forking:
+- `Generator` is the object class that holds a vocabulary dictionary and can generate new messages.
+- `Metadata` is the object class that holds one chat's configuration flags and other miscellaneous information.
+  - Sometimes the file where the metadata is saved is called a `card`.
+- `Reader` is the object class that holds a `Metadata` instance and a `Generator` instance, and is associated with a specific chat.
+- `Archivist` is the object class that handles persistence: saving to and loading from files.
+- `Speaker` is the object class that handles all (or most of) the functions for the commands that Velasco has.
+  - It holds a limited set of `Readers` that it loads and saves through some `Archivist` functions (borrowed during `Speaker` initialization).
+- `velasco.py` is the main file, in charge of starting up the Telegram bot itself.

- `velasco.py` is the file in charge of starting up the telegram bot itself
- `speaker.py` is the file with all the functions for the commands that Velasco has
- A *Speaker* is then the entity that receives the messages, and has 1 *Parrot* and 1 *Scriptorium*
- The *Scriptorium* is a collection of *Scribes*. Each *Scribe* contains the metadata of a chat (title, ID number, the `period`, etc) and the Markov dictionary associated to it
- *Scribes* are defined in `scribe.py`
- A *Parrot* is an entity that contains a Markov dictionary, and the *Speaker's Parrot* corresponds to the last chat that prompted a Velasco message. Whenever that happens, the *Parrot* for that chat is loaded, the corresponding *Scribe* teaches the *Parrot* the latest messages, and then the *Scribe* is stored along with the updated dictionary
- A Markov dictionary is defined in `markov.py`
- The *Archivist* (defined in `archivist.py`) is in charge of doing all file saves and loads
+### TODO

-**Warning:** This hierarchy is pending an overhaul.
+After managing to get Velasco back to being somewhat usable, I've already stated in the [News channel](t.me/velascobotnews) that I will focus on rewriting the code in a different language. Thus, I will add no improvements to the Python version from that point onwards. If you're interested in picking this project up and continuing development in Python, here are a few suggestions:
+
+- `speaker.py` is too big. It would be useful to split it into 2 files: one that handles the surface commands, and another that does all the speech handling (checking the `restricted` and `silenced` flags, the `period`, the random chances, ...).
+- For a while now, Telegram has allowed downloading a full chat history as a compressed file. Being able to send the compressed file to the bot, making sure that it *is* a Telegram chat history compressed file, and then unpacking and loading it into the chat's `Generator` would be cool.
+- The most active chats have files that are too massive to keep in the process' memory. I will probably add a local database in MongoDB to solve that, but it will be a simple local one. Expanding it could be a good idea.
--- a/README.md
+++ b/README.md
@@ -38,7 +38,7 @@ Sending the command on its own (e.g. `/period`) tells you the current value. Sen

 ### Answer

-This value is the chance of the bot to answer to a message that is in turn a reply to one of its own messages, or to a message that mentions the bot (see above: [Summon](###Summon)). The default value is `0.5` (50% chance). The maximum is `1` (100% chance) and to disable it you must set it to 0 (0% chance).
+This value is the chance of the bot to answer to a message that is in turn a reply to one of its own messages, or to a message that mentions the bot (see above: [Summon](#summon)). The default value is `0.5` (50% chance). The maximum is `1` (100% chance) and to disable it you must set it to 0 (0% chance).

 Sending the command on its own (e.g. `/answer`) tells you the current value. Sending a positive decimal number between `0` and `1` inclusive (e.g. `/answer 0.95`) will set it as the new value.

@@ -49,3 +49,20 @@ This toggles the chat's *restriction* (off by default). Having the chat *restric
 ### Silenced

 This toggles the chat's *silence* (off by default). Having the chat *silenced* means that possible user mentions that may appear in randomly generated messages, will be disabled by enveloping the '@' between parentheses. This will avoid Telegram mention notifications, specially useful for those who have the group chat muted.
+
+## When does the bot send a message?
+
+The bot will send a message, guaranteed:
+
+- If someone sends the `/speak` command, and has permission to do so.
+- If `period` messages have been read by the bot since the last time it sent a message.
+
+In addition, the bot will have a random chance to:
+
+- Reply to a message that mentions it (be it the username, like "@velascobot", or a name from a list of given nicknames, like "Velasco").
+  - The chance of this is the answer probability configured with the `/answer` command.
+  - This does not affect the `period` countdown.
+- Send a guaranteed message as a reply to a random recently read message (see [below](#readers-short-term-and-long-term-memory)) instead of sending it normally.
+  - The chance of this is the `reply` variable in `Speaker`, and the default is `1`.
+- Send a second message just after sending one (never a third one).
+  - The chance of this is the `repeat` variable in `Speaker`, and the default is `0.05`.
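
The rules this commit adds to README.md under "When does the bot send a message?" can be sketched as a single decision function. This is only an illustration, not code from the repository: `plan_messages`, its parameters, and the returned labels are hypothetical names, and it assumes the `repeat` chance applies only to guaranteed messages, which the README text does not specify.

```python
import random


def plan_messages(read_since_last, period, speak_command=False, allowed=True,
                  mentioned=False, answer=0.5, reply=1.0, repeat=0.05,
                  rng=random.random):
    """Return the sends planned for one incoming message (hypothetical helper).

    read_since_last: messages read since the bot last sent one.
    rng: callable returning a float in [0, 1); injectable for testing.
    """
    # Guaranteed message: /speak from a permitted user, or the period elapsed.
    guaranteed = (speak_command and allowed) or read_since_last >= period
    if guaranteed:
        # With chance `reply`, send it as a reply to a recently read message.
        plan = ["reply_to_recent" if rng() < reply else "message"]
        # With chance `repeat`, send one extra message (never a third).
        if rng() < repeat:
            plan.append("message")
        return plan
    # Otherwise, a mention gets an answer with chance `answer`.
    # This branch leaves the period countdown untouched.
    if mentioned and rng() < answer:
        return ["answer_mention"]
    return []
```

Passing a fixed `rng` (for example `lambda: 0.99`) makes the outcome deterministic, which is convenient for checking each branch separately.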