From df73401a86e04b4d187d83693276834e8b800da5 Mon Sep 17 00:00:00 2001
From: vylion
Date: Thu, 29 Oct 2020 13:41:52 +0100
Subject: [PATCH] MANUAL.md update

---
 MANUAL.md | 52 +++++++++++++++++++++++++++++++++++++++++----------
 README.md | 19 ++++++++++++++++++-
 2 files changed, 59 insertions(+), 12 deletions(-)

diff --git a/MANUAL.md b/MANUAL.md
index ecd9be5..de3ebb3 100644
--- a/MANUAL.md
+++ b/MANUAL.md
@@ -1,6 +1,24 @@
 # Velascobot: Manual
 
-**OUTDATED: REVISION PENDING**
+Some notes:
+
+- Scriptorium version: Velasco v4.X (from the "Big Overhaul Update" on 27 Mar, 2019 until the 2nd Overhaul)
+    - Recognizable because Readers are called Scribes and are stored in a big dictionary called the Scriptorium, among other things
+- Overhaul 2 version: starting with Velasco v5.0
+
+# Updating to Overhaul 2
+
+If you have a Velasco clone or fork from the Scriptorium version, you should follow these steps:
+
+1. First of all, update all your chat files to CARD=v4 format. You can do this by making a script that imports the Archivist, and then loading and saving all files.
+2. Then, pull the update.
+3. To convert files to the new unescaped UTF-16 encoding (previously the default, escaped UTF-8, was used), edit the `get_reader(...)` function in the Archivist so it uses `load_reader_old(...)` instead of `load_reader(...)`.
+4. Make a script that imports the Archivist and calls the `update(...)` function (it loads and saves all files).
+5. Revert the `get_reader(...)` edit.
+
+And voilà! You're up to date. Unless you want to switch to the `mongodb` branch (WIP).
+
+# Mechanisms
 
 ## Markov chains
 
@@ -12,17 +30,29 @@ The actual messages aren't stored. After they're processed and all the words hav
 
 The storing action is made sometimes when a configuration value is changed, and whenever the bot sends a message. If the bot crashes, all the words processed from the messages since the last one from Velascobot will be lost. For high `period` values, this could be a considerable amount, but for small ones this is negligible. Still, the bot is not expected to crash often.
 
+## Speaker's Memory
+
+The memory of a `Speaker` is a small cache of the `C` most recently modified `Readers` (where `C` is set through a flag; default is `20`). A modified `Reader` is one where the metadata was changed through a command, or a new message has been read. When a newly modified `Reader` pushes the cache over the memory limit, the least recently modified `Reader` is pushed out and saved into its file.
+
+## Reader's Short Term and Long Term Memory
+
+When a message is read, it gets stored in a temporary cache. It will only be processed into the vocabulary `Generator` when the `Reader` is asked to generate a new message, or whenever the `Reader` gets saved into a file. This allows the bot to reply to other recent messages, and not just the last one, when the periodic message is a reply.
+
 ## File hierarchy
 
-For those who are interested in cloning or forking:
+- `Generator` is the object class that holds a vocabulary dictionary and can generate new messages
+- `Metadata` is the object class that holds one chat's configuration flags and other miscellaneous information.
+    - Sometimes the file where the metadata is saved is called a `card`.
+- `Reader` is an object class that holds a `Metadata` instance and a `Generator` instance, and is associated with a specific chat.
+- `Archivist` is the object class that handles persistence: saving to and loading from files.
+- `Speaker` is the object class that handles all (or most of) the functions for the commands that Velasco has
+    - Holds a limited set of `Readers` that it loads and saves through some `Archivist` functions (borrowed during `Speaker` initialization).
+- `velasco.py` is the main file, in charge of starting up the telegram bot itself.
 
-- `velasco.py` is the file in charge of starting up the telegram bot itself
-- `speaker.py` is the file with all the functions for the commands that Velasco has
-- A *Speaker* is then the entity that receives the messages, and has 1 *Parrot* and 1 *Scriptorium*
-- The *Scriptorium* is a collection of *Scribes*. Each *Scribe* contains the metadata of a chat (title, ID number, the `period`, etc) and the Markov dictionary associated to it
-- *Scribes* are defined in `scribe.py`
-- A *Parrot* is an entity that contains a Markov dictionary, and the *Speaker's Parrot* corresponds to the last chat that prompted a Velasco message. Whenever that happens, the *Parrot* for that chat is loaded, the corresponding *Scribe* teaches the *Parrot* the latest messages, and then the *Scribe* is stored along with the updated dictionary
-- A Markov dictionary is defined in `markov.py`
-- The *Archivist* (defined in `archivist.py`) is in charge of doing all file saves and loads
+### TODO
 
-**Warning:** This hierarchy is pending an overhaul.
\ No newline at end of file
+After managing to get Velasco back to being somewhat usable, I've already stated in the [News channel](t.me/velascobotnews) that I will focus on rewriting the code in a different language. Thus, I will add no improvements to the Python version from that point onwards. If you're interested in picking this project up and continuing development in Python, here are a few suggestions:
+
+- The `speaker.py` file is too big. It would be useful to separate it into 2 files, one that has surface command handling, and another one that does all the speech handling (doing checks for `restricted` and `silenced` flags, the `period`, the random chances, ...).
+- For a while now, Telegram has allowed downloading a full chat history as a compressed file. Being able to send the compressed file, making sure that it *is* a Telegram chat history compressed file, and then unpacking and loading it into the chat's `Generator` would be cool.
+- The most active chats have files that are too massive to keep in the process' memory. I will probably add a local database in MongoDB to solve that, but it will be a simple local one. Expanding it could be a good idea.
diff --git a/README.md b/README.md
index 01c05ee..62f5b38 100644
--- a/README.md
+++ b/README.md
@@ -38,7 +38,7 @@ Sending the command on its own (e.g. `/period`) tells you the current value. Sen
 
 ### Answer
 
-This value is the chance of the bot to answer to a message that is in turn a reply to one of its own messages, or to a message that mentions the bot (see above: [Summon](###Summon)). The default value is `0.5` (50% chance). The maximum is `1` (100% chance) and to disable it you must set it to 0 (0% chance).
+This value is the chance of the bot to answer to a message that is in turn a reply to one of its own messages, or to a message that mentions the bot (see above: [Summon](#summon)). The default value is `0.5` (50% chance). The maximum is `1` (100% chance) and to disable it you must set it to 0 (0% chance).
 
 Sending the command on its own (e.g. `/answer`) tells you the current value. Sending a positive decimal number between `0` and `1` inclusive (e.g. `/answer 0.95`) will set it as the new value.
 
@@ -49,3 +49,20 @@ This toggles the chat's *restriction* (off by default). Having the chat *restric
 ### Silenced
 
 This toggles the chat's *silence* (off by default). Having the chat *silenced* means that possible user mentions that may appear in randomly generated messages, will be disabled by enveloping the '@' between parentheses. This will avoid Telegram mention notifications, specially useful for those who have the group chat muted.
+
+## When does the bot send a message?
+
+The bot will send a message, guaranteed:
+
+- If someone sends the `/speak` command, and has permission to do so.
+- If `period` messages have been read by the bot since the last time it sent a message.
+
+In addition, the bot will have a random chance to:
+
+- Reply to a message that mentions it (be it the username, like "@velascobot", or a name from a list of given nicknames, like "Velasco").
+    - The chance of this is the answer probability configured with the `/answer` command.
+    - This does not affect the `period` countdown.
+- Send a guaranteed message as a reply to a random recently read message (see [below](#readers-short-term-and-long-term-memory)) instead of sending it normally.
+    - The chance of this is the `reply` variable in `Speaker`, and the default is `1`.
+- Send a second message just after sending one (never a third one).
+    - The chance of this is the `repeat` variable in `Speaker`, and the default is `0.05`.
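
The sending rules added to the README in this patch can be read as a small decision procedure. The sketch below is only an illustration of that reading, not Velasco's actual code: `ChatState`, `decide`, the `unread` counter and the action names are all hypothetical, and it assumes that `/speak` permission was already checked by the caller, that commands are not counted as read messages, and that any guaranteed message resets the `period` count. Only the `reply` and `repeat` defaults (`1` and `0.05`) and the `/answer` default (`0.5`) come from the text.

```python
import random
from dataclasses import dataclass

REPLY_CHANCE = 1.0    # default of the `reply` variable, per the README
REPEAT_CHANCE = 0.05  # default of the `repeat` variable, per the README


@dataclass
class ChatState:
    """Hypothetical per-chat state; names are illustrative, not Velasco's."""
    period: int = 10     # messages read between guaranteed messages
    answer: float = 0.5  # probability set with `/answer`
    unread: int = 0      # messages read since the last guaranteed message


def decide(state, speak_command=False, mentioned=False, roll=random.random):
    """Return the actions taken for one incoming message, in order."""
    actions = []
    if not speak_command:
        state.unread += 1  # assumption: commands are not counted as read
    if speak_command or state.unread >= state.period:
        # Guaranteed message: sent as a reply to a recent message with
        # probability `reply`, otherwise sent normally.
        actions.append("reply_to_recent" if roll() < REPLY_CHANCE else "send")
        state.unread = 0  # assumption: a guaranteed message resets the count
    elif mentioned and roll() < state.answer:
        # Answering a mention does not reset the `period` countdown.
        actions.append("answer_mention")
    if actions and roll() < REPEAT_CHANCE:
        actions.append("send_again")  # a second message, never a third
    return actions
```

Passing `roll` as a parameter keeps the random draws injectable, so the rules can be checked deterministically, for example with `decide(ChatState(period=2), roll=lambda: 0.5)`.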