# Velascobot: Manual Some notes: - Scriptorium version: Velasco v4.X (from the "Big Overhaul Update" on 27 Mar, 2019 until the 2nd Overhaul) - Recognizable because Readers are Scribes and stored in a big dictionary called the Scriptorium, among others - Overhaul 2 version: starting with Velasco v5.0 # Updating to Overhaul 2 If you have a Velasco clone or fork from the Scriptorium version, you should follow these steps: 1. First of all, update all your chat files to CARD=v4 format. You can do this by making a script that imports the Archivist, and then loading and saving all files. 2. Then, pull the update. 3. To convert files to the new unescaped UTF-16 encoding (previously the default, escaped UTF-8, was used), edit the `get_reader(...)` function in the Archivist so it uses `load_reader_old(...)` instead of `load_reader(...)`. 4. Make a script that imports the Archivist and calls the `update(...)` function (it loads and saves all files). 5. Revert the `get_reader(...)` edit. And voilĂ ! You're up to date. Unless you want to switch to the `mongodb` branch (WIP). # Mechanisms ## Markov chains This bot uses Markov chains of 3 words for message generation. For each 3 consecutive words read, it will store the 3rd one as the word that follows the first 2 combined. This way, whenever it is generating a new sentence, it will always pick at random one of the stored words that follow the last 2 words of the message generated so far, combined. ## Storing The actual messages aren't stored. After they're processed and all the words have been assigned to lists under combinations of 2 words, the message is discarded, and only the dictionary with the lists of "following words" is stored. The words said in a chat may be visible, but from a certain point onwards its impossible to recreate with accuracy the exact messages said in a chat. The storing action is made sometimes when a configuration value is changed, and whenever the bot sends a message. If the bot crashes, all the words processed from the messages since the last one from Velascobot will be lost. For high `period` values, this could be a considerable amount, but for small ones this is negligible. Still, the bot is not expected to crash often. ## Speaker's Memory The memory of a `Speaker` is a small cache of the `C` most recently modified `Readers` (where `C` is set through a flag; default is `20`). A modified `Reader` is one where the metadata was changed through a command, or a new message has been read. When a new `Reader`is modified that goes over the memory limit, the oldest modified `Reader` is pushed out and saved into its file. ## Reader's Short Term and Long Term Memory When a message is read, it gets stored in a temporal cache. It will only be processed into the vocabulary `Generator` when the `Reader` is asked to generate a new message, or whenever the `Reader` gets saved into a file. This allows the bot to answer to other recent messages, and not just the last one, when the periodic message is a reply. ## File hierarchy - `Generator` is the object class that holds a vocabulary dictionary and can generate new messages - `Metadata` is the object class that holds one chat's configuration flags and other miscellaneous information. - Some times the file where the metadata is saved is called a `card`. - `Reader`is an object class that holds a `Metadata`instance and a `Generator` instance, and is associated with a specific chat. - `Archivist`is the object class that handles persistence: reading and loading from files. - `Speaker` is the object class that handles all (or most of) the functions for the commands that Velasco has - Holds a limited set of `Readers` that it loads and saves through some `Archivist` functions (borrowed during `Speaker` initialization). - `velasco.py` is the main file, in charge of starting up the telegram bot itself. ### TODO After managing to get Velasco back to being somewhat usable, I've already stated in the [News channel](t.me/velascobotnews) that I will focus on rewriting the code into a different language. Thus, I will add no improvements to the Python version from that point onwards. If you're interested of picking this project up and continue development for Python, here's a few suggestions: - The `speaker.py` is too big. It would be useful to separate it into 2 files, one that has surface command handling, and another one that does all the speech handling (doing checks for `restricted` and `silenced` flags, the `period`, the random chances, ...). - For a while now, Telegram allows to download a full chat history in a compressed file. Being able to send the compressed file, making sure that it *is* a Telegram chat history compressed file, and then unpacking and loading it into the chat's `Generator` would be cool. - The most active chats have files that are too massive to keep in the process' memory. I will probably add a local database in MongoDB to solve that, but it will be a simple local one. Expanding it could be a good idea.