From df73401a86e04b4d187d83693276834e8b800da5 Mon Sep 17 00:00:00 2001
From: vylion <volfaria@gmail.com>
Date: Thu, 29 Oct 2020 13:41:52 +0100
Subject: [PATCH] MANUAL.md update

---
 MANUAL.md | 52 +++++++++++++++++++++++++++++++++++++++++-----------
 README.md | 19 ++++++++++++++++++-
 2 files changed, 59 insertions(+), 12 deletions(-)

diff --git a/MANUAL.md b/MANUAL.md
index ecd9be5..de3ebb3 100644
--- a/MANUAL.md
+++ b/MANUAL.md
@@ -1,6 +1,24 @@
 # Velascobot: Manual
 
-**OUTDATED: REVISION PENDING**
+Some notes:
+
+- Scriptorium version: Velasco v4.X (from the "Big Overhaul Update" on 27 Mar, 2019 until the 2nd Overhaul)
+  - Recognizable because, among other differences, Readers are called Scribes and are stored in a big dictionary called the Scriptorium
+- Overhaul 2 version: starting with Velasco v5.0
+
+# Updating to Overhaul 2
+
+If you have a Velasco clone or fork from the Scriptorium version, you should follow these steps:
+
+1. First of all, update all your chat files to the CARD=v4 format. You can do this by writing a script that imports the Archivist and then loads and saves all files.
+2. Then, pull the update.
+3. To convert files to the new unescaped UTF-16 encoding (previously, the default escaped UTF-8 encoding was used), edit the `get_reader(...)` function in the Archivist so that it uses `load_reader_old(...)` instead of `load_reader(...)`.
+4. Write a script that imports the Archivist and calls the `update(...)` function (which loads and saves all files).
+5. Revert the `get_reader(...)` edit.
+
+And voilà! You're up to date, unless you want to switch to the `mongodb` branch (WIP).
+
+# Mechanisms
 
 ## Markov chains
 
@@ -12,17 +30,29 @@ The actual messages aren't stored. After they're processed and all the words hav
 
 The storing action is made sometimes when a configuration value is changed, and whenever the bot sends a message. If the bot crashes, all the words processed from the messages since the last one from Velascobot will be lost. For high `period` values, this could be a considerable amount, but for small ones this is negligible. Still, the bot is not expected to crash often.
 
+## Speaker's Memory
+
+The memory of a `Speaker` is a small cache of the `C` most recently modified `Readers` (where `C` is set through a flag; the default is `20`). A `Reader` counts as modified when its metadata has been changed through a command or when it has read a new message. When modifying a `Reader` would push the memory over its limit, the least recently modified `Reader` is pushed out and saved to its file.
+
+## Reader's Short Term and Long Term Memory
+
+When a message is read, it gets stored in a temporary cache. It is only processed into the `Generator`'s vocabulary when the `Reader` is asked to generate a new message, or whenever the `Reader` gets saved to a file. This allows the bot to reply to other recent messages, not just the last one, when the periodic message is sent as a reply.
+
 ## File hierarchy
 
-For those who are interested in cloning or forking:
+- `Generator` is the object class that holds a vocabulary dictionary and can generate new messages.
+- `Metadata` is the object class that holds one chat's configuration flags and other miscellaneous information.
+  - Sometimes the file where the metadata is saved is called a `card`.
+- `Reader` is an object class that holds a `Metadata` instance and a `Generator` instance, and is associated with a specific chat.
+- `Archivist` is the object class that handles persistence: reading from and saving to files.
+- `Speaker` is the object class that handles all (or most of) the functions for the commands that Velasco has.
+  - Holds a limited set of `Readers` that it loads and saves through some `Archivist` functions (borrowed during `Speaker` initialization).
+- `velasco.py` is the main file, in charge of starting up the Telegram bot itself.
 
-- `velasco.py` is the file in charge of starting up the telegram bot itself
-- `speaker.py` is the file with all the functions for the commands that Velasco has
-- A *Speaker* is then the entity that receives the messages, and has 1 *Parrot* and 1 *Scriptorium*
-- The *Scriptorium* is a collection of *Scribes*. Each *Scribe* contains the metadata of a chat (title, ID number, the `period`, etc) and the Markov dictionary associated to it
-- *Scribes* are defined in `scribe.py`
-- A *Parrot* is an entity that contains a Markov dictionary, and the *Speaker's Parrot* corresponds to the last chat that prompted a Velasco message. Whenever that happens, the *Parrot* for that chat is loaded, the corresponding *Scribe* teaches the *Parrot* the latest messages, and then the *Scribe* is stored along with the updated dictionary
-- A Markov dictionary is defined in `markov.py`
-- The *Archivist* (defined in `archivist.py`) is in charge of doing all file saves and loads
+# TODO
 
-**Warning:** This hierarchy is pending an overhaul.
\ No newline at end of file
+After managing to get Velasco back to being somewhat usable, I've already stated in the [News channel](https://t.me/velascobotnews) that I will focus on rewriting the code in a different language. Thus, I will add no further improvements to the Python version from that point onwards. If you're interested in picking this project up and continuing development in Python, here are a few suggestions:
+
+- `speaker.py` is too big. It would be useful to split it into 2 files: one for surface command handling, and another for all the speech handling (checking the `restricted` and `silenced` flags, the `period`, the random chances, ...).
+- For a while now, Telegram has allowed users to download a full chat history as a compressed file. Being able to send that file to the bot, verify that it *is* a compressed Telegram chat history, and then unpack it and load it into the chat's `Generator` would be cool.
+- The most active chats have files that are too large to keep in the process's memory. I will probably add a local MongoDB database to solve that, but it will be a simple one. Expanding it could be a good idea.
diff --git a/README.md b/README.md
index 01c05ee..62f5b38 100644
--- a/README.md
+++ b/README.md
@@ -38,7 +38,7 @@ Sending the command on its own (e.g. `/period`) tells you the current value. Sen
 
 ### Answer
 
-This value is the chance of the bot to answer to a message that is in turn a reply to one of its own messages, or to a message that mentions the bot (see above: [Summon](###Summon)). The default value is `0.5` (50% chance). The maximum is `1` (100% chance) and to disable it you must set it to 0 (0% chance).
+This value is the chance that the bot answers a message that is itself a reply to one of the bot's own messages, or that mentions the bot (see above: [Summon](#summon)). The default value is `0.5` (50% chance). The maximum is `1` (100% chance); to disable it, set it to `0` (0% chance).
 
 Sending the command on its own (e.g. `/answer`) tells you the current value. Sending a positive decimal number between `0` and `1` inclusive (e.g. `/answer 0.95`) will set it as the new value.
 
@@ -49,3 +49,20 @@ This toggles the chat's *restriction* (off by default). Having the chat *restric
 ### Silenced
 
 This toggles the chat's *silence* (off by default). Having the chat *silenced* means that possible user mentions that may appear in randomly generated messages, will be disabled by enveloping the '@' between parentheses. This will avoid Telegram mention notifications, specially useful for those who have the group chat muted.
+
+## When does the bot send a message?
+
+The bot is guaranteed to send a message:
+
+- If someone sends the `/speak` command and has permission to do so.
+- If the bot has read `period` messages since the last time it sent a message.
+
+In addition, the bot has a random chance to:
+
+- Reply to a message that mentions it (either by username, like "@velascobot", or by a name from a list of given nicknames, like "Velasco").
+  - The chance of this is the answer probability configured with the `/answer` command.
+  - This does not affect the `period` countdown.
+- Send a guaranteed message as a reply to a randomly chosen, recently read message (see the [Manual](MANUAL.md#readers-short-term-and-long-term-memory)) instead of sending it normally.
+  - The chance of this is the `reply` variable in `Speaker`, and the default is `1`.
+- Send a second message just after sending one (never a third one).
+  - The chance of this is the `repeat` variable in `Speaker`, and the default is `0.05`.