MANUAL.md update

vylion 2020-10-29 13:41:52 +01:00
parent ec14abcaff
commit df73401a86
2 changed files with 59 additions and 12 deletions


@ -1,6 +1,24 @@
# Velascobot: Manual
**OUTDATED: REVISION PENDING**
Some notes:
- Scriptorium version: Velasco v4.X (from the "Big Overhaul Update" on 27 Mar, 2019 until the 2nd Overhaul)
- Recognizable because, among other things, Readers are called Scribes and are stored in a big dictionary called the Scriptorium
- Overhaul 2 version: starting with Velasco v5.0
# Updating to Overhaul 2
If you have a Velasco clone or fork from the Scriptorium version, you should follow these steps:
1. First of all, update all your chat files to CARD=v4 format. You can do this by making a script that imports the Archivist, and then loading and saving all files.
2. Then, pull the update.
3. To convert files to the new unescaped UTF-16 encoding (files previously used the default, escaped UTF-8), edit the `get_reader(...)` function in the Archivist so it uses `load_reader_old(...)` instead of `load_reader(...)`.
4. Make a script that imports the Archivist and calls the `update(...)` function (it loads and saves all files).
5. Revert the `get_reader(...)` edit.
And voilà! You're up to date. Unless you want to switch to the `mongodb` branch (WIP).
# Mechanisms
## Markov chains
@ -12,17 +30,29 @@ The actual messages aren't stored. After they're processed and all the words hav
The vocabulary is saved to its file whenever the bot sends a message, and in some cases when a configuration value is changed. If the bot crashes, all the words processed from messages received since Velascobot's last message will be lost. For high `period` values this could be a considerable amount, but for small ones it is negligible. Still, the bot is not expected to crash often.
## Speaker's Memory
The memory of a `Speaker` is a small cache of the `C` most recently modified `Readers` (where `C` is set through a flag; default is `20`). A modified `Reader` is one whose metadata was changed through a command, or one that has read a new message. When modifying a `Reader` pushes the cache over its limit, the least recently modified `Reader` is pushed out and saved into its file.
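As a rough illustration of the eviction rule described above, here is a minimal sketch built on `collections.OrderedDict`. The class name `ReaderCache`, the `touch` method and the `save` callback are hypothetical and are not taken from the Velasco code; only the limit of `20` comes from the text.

```python
from collections import OrderedDict


class ReaderCache:
    """Keeps the `capacity` most recently modified readers in memory (sketch)."""

    def __init__(self, save, capacity=20):
        self.save = save            # hypothetical callback: save(chat_id, reader)
        self.capacity = capacity    # the `C` value from the text
        self.readers = OrderedDict()

    def touch(self, chat_id, reader):
        """Record that a reader was modified, evicting the oldest one if needed."""
        self.readers[chat_id] = reader
        self.readers.move_to_end(chat_id)  # mark as most recently modified
        while len(self.readers) > self.capacity:
            old_id, old_reader = self.readers.popitem(last=False)
            self.save(old_id, old_reader)  # persist the evicted reader
```

Calling `touch` for a chat that is already cached only moves it to the "most recent" end, so a busy chat never gets evicted in favour of a quiet one.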
## Reader's Short Term and Long Term Memory
When a message is read, it gets stored in a temporary cache. It is only processed into the vocabulary `Generator` when the `Reader` is asked to generate a new message, or whenever the `Reader` gets saved into a file. This allows the bot to reply to other recent messages, and not just the last one, when the periodic message is a reply.
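The short-term and long-term split can be pictured with the sketch below. It is an assumption-laden toy, not the actual `Reader`: the class `ReaderSketch`, its method names, and the word-count `Counter` standing in for the `Generator` vocabulary are all invented for illustration.

```python
import random
from collections import Counter


class ReaderSketch:
    """Toy model of a Reader with a short-term cache and a long-term vocabulary."""

    def __init__(self):
        self.short_term = []         # messages read but not yet processed
        self.vocabulary = Counter()  # stand-in for the Generator's vocabulary

    def read(self, message):
        """Store a new message in the short-term cache only."""
        self.short_term.append(message)

    def flush(self):
        """Process the cached messages into the vocabulary, then clear the cache."""
        for message in self.short_term:
            self.vocabulary.update(message.split())
        self.short_term.clear()

    def prepare_to_generate(self, rng=random):
        """Pick a recent message to reply to (if any), then flush the cache."""
        target = rng.choice(self.short_term) if self.short_term else None
        self.flush()
        return target

    def save(self):
        """Saving also flushes; writing to a file is left out of this sketch."""
        self.flush()
```

The reply target is chosen before the flush, which is what lets the bot reply to any recent message rather than only the last one.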
## File hierarchy
For those who are interested in cloning or forking:
- `Generator` is the object class that holds a vocabulary dictionary and can generate new messages
- `Metadata` is the object class that holds one chat's configuration flags and other miscellaneous information.
- Sometimes the file where the metadata is saved is called a `card`.
- `Reader` is an object class that holds a `Metadata` instance and a `Generator` instance, and is associated with a specific chat.
- `Archivist` is the object class that handles persistence: saving to and loading from files.
- `Speaker` is the object class that handles all (or most of) the functions for the commands that Velasco has
- Holds a limited set of `Readers` that it loads and saves through some `Archivist` functions (borrowed during `Speaker` initialization).
- `velasco.py` is the main file, in charge of starting up the telegram bot itself.
- `velasco.py` is the file in charge of starting up the telegram bot itself
- `speaker.py` is the file with all the functions for the commands that Velasco has
- A *Speaker* is then the entity that receives the messages, and has 1 *Parrot* and 1 *Scriptorium*
- The *Scriptorium* is a collection of *Scribes*. Each *Scribe* contains the metadata of a chat (title, ID number, the `period`, etc) and the Markov dictionary associated to it
- *Scribes* are defined in `scribe.py`
- A *Parrot* is an entity that contains a Markov dictionary, and the *Speaker's Parrot* corresponds to the last chat that prompted a Velasco message. Whenever that happens, the *Parrot* for that chat is loaded, the corresponding *Scribe* teaches the *Parrot* the latest messages, and then the *Scribe* is stored along with the updated dictionary
- A Markov dictionary is defined in `markov.py`
- The *Archivist* (defined in `archivist.py`) is in charge of doing all file saves and loads
### TODO
**Warning:** This hierarchy is pending an overhaul.
After managing to get Velasco back to being somewhat usable, I've already stated in the [News channel](t.me/velascobotnews) that I will focus on rewriting the code in a different language. Thus, I will add no improvements to the Python version from that point onwards. If you're interested in picking this project up and continuing development in Python, here are a few suggestions:
- `speaker.py` is too big. It would be useful to split it into 2 files: one that does surface command handling, and another one that does all the speech handling (checking the `restricted` and `silenced` flags, the `period`, the random chances, ...).
- For a while now, Telegram has allowed users to download a full chat history as a compressed file. Being able to send the compressed file, making sure that it *is* a Telegram chat history compressed file, and then unpacking and loading it into the chat's `Generator` would be cool.
- The most active chats have files that are too massive to keep in the process' memory. I will probably add a local database in MongoDB to solve that, but it will be a simple local one. Expanding it could be a good idea.


@ -38,7 +38,7 @@ Sending the command on its own (e.g. `/period`) tells you the current value. Sen
### Answer
This value is the chance of the bot to answer to a message that is in turn a reply to one of its own messages, or to a message that mentions the bot (see above: [Summon](###Summon)). The default value is `0.5` (50% chance). The maximum is `1` (100% chance) and to disable it you must set it to 0 (0% chance).
This value is the chance of the bot to answer to a message that is in turn a reply to one of its own messages, or to a message that mentions the bot (see above: [Summon](#summon)). The default value is `0.5` (50% chance). The maximum is `1` (100% chance) and to disable it you must set it to 0 (0% chance).
Sending the command on its own (e.g. `/answer`) tells you the current value. Sending a positive decimal number between `0` and `1` inclusive (e.g. `/answer 0.95`) will set it as the new value.
@ -49,3 +49,20 @@ This toggles the chat's *restriction* (off by default). Having the chat *restric
### Silenced
This toggles the chat's *silence* (off by default). Having the chat *silenced* means that any user mentions that appear in randomly generated messages are disabled by enclosing the '@' in parentheses. This avoids Telegram mention notifications, which is especially useful for those who have the group chat muted.
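A minimal sketch of that substitution, assuming a mention is any '@' immediately followed by a word character. The function name `silence_mentions` and the regular expression are illustrative guesses, not the bot's actual code.

```python
import re


def silence_mentions(text):
    """Wrap the '@' of anything that looks like a mention in parentheses."""
    # Only an '@' directly followed by a letter, digit or underscore is touched.
    return re.sub(r"@(?=\w)", "(@)", text)


print(silence_mentions("ask @alice or @bob_1"))  # ask (@)alice or (@)bob_1
```

A lone '@' surrounded by spaces is left untouched under this assumption.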
## When does the bot send a message?
The bot will send a message, guaranteed:
- If someone sends the `/speak` command and has permission to do so.
- If `period` messages have been read by the bot since the last time it sent a message.
In addition, the bot will have a random chance to:
- Reply to a message that mentions it (be it the username, like "@velascobot", or a name from a list of given nicknames, like "Velasco").
- The chance of this is the answer probability configured with the `/answer` command.
- This does not affect the `period` countdown.
- Send a guaranteed message as a reply to a random recently read message (see [below](#readers-short-term-and-long-term-memory)) instead of sending it normally.
- The chance of this is the `reply` variable in `Speaker`, and the default is `1`.
- Send a second message just after sending one (never a third one).
- The chance of this is the `repeat` variable in `Speaker`, and the default is `0.05`.
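The rules above can be summarized in a small decision function. This is only a sketch under several assumptions: the function `plan_messages`, its parameters and the labels `"reply"`, `"plain"` and `"answer"` are hypothetical, and it assumes that the `reply` and `repeat` chances apply to every message the bot decides to send, which the manual does not spell out.

```python
import random


def plan_messages(read_since_last, period, speak_requested=False,
                  mentioned=False, answer=0.5, reply=1.0, repeat=0.05,
                  rng=random):
    """Return the planned sends, e.g. ["reply"], ["answer", "plain"] or []."""
    guaranteed = speak_requested or read_since_last >= period
    if guaranteed:
        # A guaranteed message may go out as a reply to a recently read message.
        plan = ["reply" if rng.random() < reply else "plain"]
    elif mentioned and rng.random() < answer:
        # Answering a mention is a matter of chance, set with /answer.
        plan = ["answer"]
    else:
        return []
    # At most one extra message right after the first one, never a third.
    if rng.random() < repeat:
        plan.append("plain")
    return plan
```

Because `rng.random()` returns a value in `[0, 1)`, a chance of `1` always succeeds and a chance of `0` never does, which matches the documented range of `/answer`.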