Velasco Overhaul 2: Merging Boogaloo

- Merge branch 'overhaul2' into 'master'
- Changed file, class, function and variable names to more self-explanatory ones, instead of all that medieval monk metaphor
- Changed class, function and variable names to comply with Python style
- Updated to python-telegram-bot v. 12 callbacks
- Added mandatory wakeup message sent privately to bot admin
- Changed file encoding from UTF-8 to UTF-16
- Added crude memory/cache system for C last accessed chats
- Added new arguments:
  - Optional whitelist filtering chat IDs
  - Bot nicknames
  - Mute time
  - Save time
  - Chats folder path
  - Minimum period
  - Maximum period
Vylion 2020-10-29 12:44:57 +00:00
commit f028b9ab73
13 changed files with 1234 additions and 757 deletions

4
.gitignore vendored

@ -1,3 +1,7 @@
chatlogs/* chatlogs/*
__pycache__/* __pycache__/*
misc/* misc/*
bkp/*
test/*
*log*

58
MANUAL.md Normal file

@ -0,0 +1,58 @@
# Velascobot: Manual
Some notes:
- Scriptorium version: Velasco v4.X (from the "Big Overhaul Update" on 27 Mar, 2019 until the 2nd Overhaul)
- Recognizable because Readers are Scribes and stored in a big dictionary called the Scriptorium, among others
- Overhaul 2 version: starting with Velasco v5.0
# Updating to Overhaul 2
If you have a Velasco clone or fork from the Scriptorium version, you should follow these steps:
1. First of all, update all your chat files to CARD=v4 format. You can do this by making a script that imports the Archivist, and then loading and saving all files.
2. Then, pull the update.
3. To convert files to the new unescaped UTF-16 encoding (previously, escaped UTF-8 was used by default), edit the `get_reader(...)` function in the Archivist so it uses `load_vocab_old(...)` instead of `load_vocab(...)`.
4. Make a script that imports the Archivist and calls the `update(...)` function (it loads and saves all files).
5. Revert the `get_reader(...)` edit.
And voilà! You're up to date. Unless you want to switch to the `mongodb` branch (WIP).
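As a rough sketch of the script that steps 1 and 4 ask for: in this commit `update()` is written as a generator, so it only does its work while something iterates over it, and the chat IDs it yields appear to be the ones that could not be stored. The helper name `run_update` and the placeholder values in the comments are assumptions for illustration, not part of the repository.

```python
def run_update(archivist):
    """Drive `archivist.update()` to completion.

    `update()` is a generator, so calling it without iterating does nothing.
    Returns the list of chat IDs it yielded (assumed here to be the chats
    that had no vocabulary or failed to store).
    """
    return list(archivist.update())


# Hypothetical usage from the repository root (folder and extension are
# placeholders; use the values of your own instance):
#
#   import logging
#   from archivist import Archivist
#
#   logging.basicConfig(level=logging.INFO)
#   archivist = Archivist(logging.getLogger("update"),
#                         chatdir="chatlogs", chatext=".vls")
#   print("Not updated:", run_update(archivist))
```

Collecting the yielded IDs into a list keeps the script short while still reporting which chats need a second look.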
# Mechanisms
## Markov chains
This bot uses Markov chains of 3 words for message generation. For each 3 consecutive words read, it will store the 3rd one as the word that follows the first 2 combined. This way, whenever it is generating a new sentence, it will always pick at random one of the stored words that follow the last 2 words of the message generated so far, combined.
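A minimal sketch of this idea, separate from the bot's actual `Generator` class: the names `learn`, `babble`, `START` and `END` are made up for illustration, and the real code uses string keys and its own message separators.

```python
import random
from collections import defaultdict

START = "<start>"  # hypothetical marker for the beginning of a message
END = "<end>"      # hypothetical marker for the end of a message


def learn(chain, message):
    """For every 3 consecutive words, store the 3rd under the first 2."""
    words = [START, START] + message.split() + [END]
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        chain[(w1, w2)].append(w3)


def babble(chain, max_words=50, rng=random):
    """Build a message by picking, at random, a word that followed the last 2."""
    w1, w2 = START, START
    out = []
    for _ in range(max_words):
        followers = chain.get((w1, w2))
        if not followers:
            break
        w3 = rng.choice(followers)
        if w3 == END:
            break
        out.append(w3)
        w1, w2 = w2, w3
    return " ".join(out)


chain = defaultdict(list)
learn(chain, "what a lovely day")
learn(chain, "what a lovely idea")
print(babble(chain))  # one of the two learned sentences
```

With so little input the output can only repeat what it read; new sentences appear once the same pair of words has been seen with several different followers.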
## Storing
The actual messages aren't stored. After they're processed and all the words have been assigned to lists under combinations of 2 words, the message is discarded, and only the dictionary with the lists of "following words" is stored. The words said in a chat may be visible, but from a certain point onwards it's impossible to accurately recreate the exact messages said in a chat.
The bot saves to file whenever it sends a message, and sometimes when a configuration value is changed. If the bot crashes, all the words processed from the messages read since Velascobot's last message will be lost. For high `period` values, this could be a considerable amount, but for small ones it is negligible. Still, the bot is not expected to crash often.
## Speaker's Memory
The memory of a `Speaker` is a small cache of the `C` most recently modified `Readers` (where `C` is set through a flag; default is `20`). A modified `Reader` is one whose metadata was changed through a command, or one that has read a new message. When modifying a `Reader` pushes the cache over the memory limit, the least recently modified `Reader` is pushed out and saved into its file.
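The eviction rule can be sketched as follows. This is an illustration built on `collections.OrderedDict`, not the repository's own memory class; the class name `ReaderMemory`, the `touch` method and the `save` callback are assumptions.

```python
from collections import OrderedDict


class ReaderMemory:
    """Keeps the `capacity` most recently modified items; evicted ones are saved."""

    def __init__(self, capacity=20, save=print):
        self.capacity = capacity
        self.save = save            # hypothetical callback that writes a Reader to its file
        self.items = OrderedDict()  # chat ID -> Reader, least recently modified first

    def touch(self, cid, reader):
        """Mark `reader` as just modified, evicting the oldest entry if needed."""
        self.items[cid] = reader
        self.items.move_to_end(cid)
        while len(self.items) > self.capacity:
            _, oldest = self.items.popitem(last=False)
            self.save(oldest)


saved = []
memory = ReaderMemory(capacity=2, save=saved.append)
memory.touch("a", "reader A")
memory.touch("b", "reader B")
memory.touch("a", "reader A")  # "a" becomes the most recent again
memory.touch("c", "reader C")  # over the limit: "b" is the oldest, so it is saved
print(saved)                   # ['reader B']
```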
## Reader's Short Term and Long Term Memory
When a message is read, it gets stored in a temporary cache. It will only be processed into the vocabulary `Generator` when the `Reader` is asked to generate a new message, or whenever the `Reader` gets saved into a file. This allows the bot to reply to other recent messages, and not just the last one, when the periodic message is a reply.
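A small sketch of that two-stage memory, under the assumption that "processing" means feeding each buffered message to the vocabulary. The class name `ShortTermMemory` and the `learn` callback are hypothetical and do not mirror the real `Reader` API.

```python
import random


class ShortTermMemory:
    """Buffers raw messages and only learns them when flushed."""

    def __init__(self, learn):
        self.learn = learn    # hypothetical callback feeding text to the vocabulary
        self.short_term = []  # messages read but not yet processed

    def read(self, message):
        self.short_term.append(message)

    def pick_reply_target(self, rng=random):
        """Choose one of the recent messages to reply to, if there is any."""
        return rng.choice(self.short_term) if self.short_term else None

    def flush(self):
        """Process every buffered message, then empty the buffer."""
        for message in self.short_term:
            self.learn(message)
        self.short_term.clear()


learned = []
memory = ShortTermMemory(learn=learned.append)
memory.read("hello there")
memory.read("general kenobi")
target = memory.pick_reply_target()  # one of the two buffered messages
memory.flush()                       # done before generating a message or saving
```

Picking the reply target before flushing is what lets a periodic message answer any recent message rather than only the latest one.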
## File hierarchy
- `Generator` is the object class that holds a vocabulary dictionary and can generate new messages
- `Metadata` is the object class that holds one chat's configuration flags and other miscellaneous information.
- Sometimes the file where the metadata is saved is called a `card`.
- `Reader` is an object class that holds a `Metadata` instance and a `Generator` instance, and is associated with a specific chat.
- `Archivist` is the object class that handles persistence: saving to and loading from files.
- `Speaker` is the object class that handles all (or most of) the functions for the commands that Velasco has
- Holds a limited set of `Readers` that it loads and saves through some `Archivist` functions (borrowed during `Speaker` initialization).
- `velasco.py` is the main file, in charge of starting up the telegram bot itself.
### TODO
After managing to get Velasco back to being somewhat usable, I've already stated in the [News channel](t.me/velascobotnews) that I will focus on rewriting the code in a different language. Thus, I will add no improvements to the Python version from that point onwards. If you're interested in picking this project up and continuing development in Python, here are a few suggestions:
- `speaker.py` is too big. It would be useful to separate it into 2 files, one that has surface command handling, and another one that does all the speech handling (doing checks for `restricted` and `silenced` flags, the `period`, the random chances, ...).
- For a while now, Telegram has allowed downloading a full chat history as a compressed file. Being able to send the compressed file, making sure that it *is* a Telegram chat history compressed file, and then unpacking and loading it into the chat's `Generator` would be cool.
- The most active chats have files that are too massive to keep in the process' memory. I will probably add a local database in MongoDB to solve that, but it will be a simple local one. Expanding it could be a good idea.


@ -1,48 +1,68 @@
# Velascobot # Velascobot
This is yet another Markov chain-based chatbot, based on the Twitterbot fad consisting of creating a bot account that would try to generate new random tweets, using your own as a template. However, instead of reading the messages from a Twitter account, this bot is made to read the messages in a group chat, and try to blend in by generating new messages that fit the patterns seen in that specific group chat. At the beginning that will mean a lot of parroting, but eventually the bot starts coming up with sentences of itself. This is yet another Markov chain-based chatbot, based on the Twitterbot fad consisting of creating a bot account that would try to generate new random tweets (usually having `_ebooks` or `.txt` in their names to indicate that an account was one of such, or just a plain `bot` suffix), using your own as a template. However, instead of reading the messages from a Twitter account, this bot is made to read the messages in a group chat, and try to blend in by generating new messages that fit the patterns seen in that specific group chat. At the beginning that will mean a lot of parroting, but eventually the bot starts coming up with sentences of its own.
This bot also works on private chats between a user and itself, but of course the training is much lower and it will feel like talking to a parrot for a longer time, unless you feed it a lot of messages quickly. This bot also works on private chats between a user and itself, but of course the training is much lower and it will feel like talking to a parrot for a longer time, unless you feed it a lot of messages quickly.
## Markov chains ## How to use it
This bot uses Markov chains of 3 words for message generation. For each 3 consecutive words read, it will store the 3rd one as the word that follows the first 2 combined. This way, whenever it is generating a new sentence, it will always pick at random one of the stored words that follow the last 2 words of the message generated so far, combined. You have to add the bot to a chat group, or speak to it privately, letting it read and send messages. Maybe set some configuration commands too.
## Storing If you want to clone or fork this repo and host your own instance of Velasco, see [MANUAL.md](MANUAL.md).
The actual messages aren't stored. After they're processed and all the words have been assigned to lists under combinations of 2 words, the message is discarded, and only the dictionary with the lists of "following words" is stored. The words said in a chat may be visible, but from a certain point onwards its impossible to recreate with accuracy the exact messages said in a chat. ## Commands & usage
The storing action is made sometimes when a configuration value is changed, and whenever the bot sends a message. If the bot crashes, all the words processed from the messages since the last one from Velascobot will be lost. For high `freq` values, this could be a considerable amount, but for small ones this is negligible. Still, the bot is not expected to crash often. ### Help, About and Explain
## Configuration commands The `/help` command lists the most useful available commands for the bot. The `/about` command has a short explanation on the purpose of this bot, and the `/explain` command goes a little further in detail.
### Speak
This will make the bot send a message, aside from the periodic messages. If the command message is a reply to a different message M, the bot's message will be a reply to M as well; otherwise, the bot will reply to the message with the command.
### Summon
This isn't a command per se, but mentioning the username (in this case, '@velascobot') or any of the configured nicknames (like 'velasco') will prompt a chance for the bot to answer.
A summon of 3 or fewer words will not be processed, so you can call Velasco's name to your heart's content without having to worry about the bot learning to repeat a lot of short 'Velasco!' messages.
### Count ### Count
This is the amount of messages that the bot remembers, this is, the amount of messages processed. The messages themselves aren't stored but there is a counter that increases each time a message is processed. This tells you the amount of messages that the bot has read so far. The messages themselves aren't stored, but there is a counter that increases each time a message is processed.
### Freq ### Period
It comes from "frequency", and at the beginning it was, but now it's actually the opposite, the "period". This is the amount of messages that the bot waits for before sending a message of its own. Increase it to make it talk less often, and decrease it to make it talk more often. This is the amount of messages that the bot waits for before sending a message of its own. Increase it to make it talk less often, and decrease it to make it talk more often.
Sending the command on its own tells you the current value. Sending a positive number with the command will set that as the new value. Sending the command on its own (e.g. `/period`) tells you the current value. Sending a positive number with the command (e.g. `/period 85`) will set that as the new value.
### Answer ### Answer
This value is the chance of the bot to answer to a message that is in turn a reply to one of its own messages, or (to be implemented:) to a message that mentions it. The default value is 0.5 (50% chance). The maximum is 1 (100% chance) and to disable it you must set it to 0 (0% chance). This value is the chance of the bot to answer to a message that is in turn a reply to one of its own messages, or to a message that mentions the bot (see above: [Summon](#summon)). The default value is `0.5` (50% chance). The maximum is `1` (100% chance) and to disable it you must set it to 0 (0% chance).
Sending the command on its own tells you the current value. Sending a positive decimal number between 0 and 1 inclusive will set it as the new value. Sending the command on its own (e.g. `/answer`) tells you the current value. Sending a positive decimal number between `0` and `1` inclusive (e.g. `/answer 0.95`) will set it as the new value.
## File hierarchy ### Restricted
For those who are interested in cloning or forking: This toggles the chat's *restriction* (off by default). Having the chat *restricted* means that only the administrators of a chat can send configuration commands, like `/period n` or `/answer n`, only they can force the bot to speak with the `/speak` command, and only they can summon the bot. The bot will still read all users' messages and will still send periodic messages for all to enjoy.
- `velasco.py` is the file in charge of starting up the telegram bot itself ### Silenced
- `speaker.py` is the file with all the functions for the commands that Velasco has
- A *Speaker* is then the entity that receives the messages, and has 1 *Parrot* and 1 *Scriptorium*
- The *Scriptorium* is a collection of *Scribes*. Each *Scribe* contains the metadata of a chat (title, ID number, the `freq`, etc) and the Markov dictionary associated to it
- *Scribes* are defined in `scribe.py`
- A *Parrot* is an entity that contains a Markov dictionary, and the *Speaker's Parrot* corresponds to the last chat that prompted a Velasco message. Whenever that happens, the *Parrot* for that chat is loaded, the corresponding *Scribe* teaches the *Parrot* the latest messages, and then the *Scribe* is stored along with the updated dictionary
- A Markov dictionary is defined in `markov.py`
- The *Archivist* (defined in `archivist.py`) is in charge of doing all file saves and loads
**Warning:** This hierarchy is pending an overhaul. This toggles the chat's *silence* (off by default). Having the chat *silenced* means that any user mentions that appear in randomly generated messages will be disabled by wrapping the '@' in parentheses. This avoids Telegram mention notifications, which is especially useful for those who have the group chat muted.
## When does the bot send a message?
The bot will send a message, guaranteed:
- If someone sends the `/speak` command, and has permission to do so.
- If `period` messages have been read by the bot since the last time it sent a message.
In addition, the bot will have a random chance to:
- Reply to a message that mentions it (be it the username, like "@velascobot", or a name from a list of given nicknames, like "Velasco").
- The chance of this is the answer probability configured with the `/answer` command.
- This does not affect the `period` countdown.
- Send a guaranteed message as a reply to a random recent read message (see [below](#readers-short-term-and-long-term-memory)) instead of sending it normally.
- The chance of this is the `reply` variable in `Speaker`, and the default is `1`.
- Send a second message just after sending one (never a third one).
- The chance of this is the `repeat` variable in `Speaker`, and the default is `0.05`.
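The periodic part of these rules can be sketched like this. It is only an illustration of the documented chances: the function name `plan_periodic_messages` is made up, the real logic lives in `Speaker`, and whether the second message is itself a reply is not stated here, so the sketch sends it plainly.

```python
import random


def plan_periodic_messages(unread, period, reply=1.0, repeat=0.05, rng=random):
    """Return the planned sends, each either "reply" or "plain".

    `unread` is the number of messages read since the bot last spoke.
    """
    if unread < period:
        return []  # the period countdown has not finished yet
    # The guaranteed message becomes a reply with probability `reply`.
    plan = ["reply" if rng.random() < reply else "plain"]
    # With probability `repeat`, send one extra message (never a third).
    if rng.random() < repeat:
        plan.append("plain")
    return plan


print(plan_periodic_messages(unread=3, period=5))               # []
print(plan_periodic_messages(unread=5, period=5, repeat=0.0))   # ['reply']
```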


@ -1,164 +1,153 @@
import os, errno, random, pickle import os
from scribe import Scribe from reader import Reader
from markov import Markov from generator import Generator
class Archivist(object): class Archivist(object):
def __init__(self, logger, chatdir=None, chatext=None, admin=0, def __init__(self, logger, chatdir=None, chatext=None, admin=0,
freqIncrement=5, saveCount=15, maxFreq=100000, maxLen=50, period_inc=5, save_count=15, min_period=1,
readOnly=False, filterCids=None, bypass=False max_period=100000, read_only=False
): ):
if chatdir is None or len(chatdir) == 0: if chatdir is None or len(chatdir) == 0:
raise ValueError("Chatlog directory name is empty") chatdir = "./"
elif chatext is None: # Can be len(chatext) == 0 elif chatext is None: # Can be len(chatext) == 0
raise ValueError("Chatlog file extension is invalid") raise ValueError("Chatlog file extension is invalid")
self.logger = logger self.logger = logger
self.chatdir = chatdir self.chatdir = chatdir
self.chatext = chatext self.chatext = chatext
self.admin = admin self.period_inc = period_inc
self.freqIncrement = freqIncrement self.save_count = save_count
self.saveCount = saveCount self.min_period = min_period
self.maxFreq = maxFreq self.max_period = max_period
self.maxLen = maxLen self.read_only = read_only
self.readOnly = readOnly
self.filterCids = filterCids
self.bypass = bypass
self.scribeFolder = chatdir + "chat_{tag}"
self.scribePath = chatdir + "chat_{tag}/{file}{ext}"
def store(self, tag, log, gen): # Formats and returns a chat folder path
scribefolder = self.scribeFolder.format(tag=tag) def chat_folder(self, *formatting, **key_format):
cardfile = self.scribePath.format(tag=tag, file="card", ext=".txt") return (self.chatdir + "/chat_{tag}").format(*formatting, **key_format)
if self.readOnly:
# Formats and returns a chat file path
def chat_file(self, *formatting, **key_format):
return (self.chatdir + "/chat_{tag}/{file}{ext}").format(*formatting, **key_format)
# Stores a Reader/Generator file pair
def store(self, tag, data, vocab):
chat_folder = self.chat_folder(tag=tag)
chat_card = self.chat_file(tag=tag, file="card", ext=".txt")
if self.read_only:
return return
try: try:
if not os.path.exists(scribefolder): if not os.path.exists(chat_folder):
os.makedirs(scribefolder, exist_ok=True) os.makedirs(chat_folder, exist_ok=True)
self.logger.info("Storing a new chat. Folder {} created.".format(scribefolder)) self.logger.info("Storing a new chat. Folder {} created.".format(chat_folder))
except: except Exception:
self.logger.error("Failed creating {} folder.".format(scribefolder)) self.logger.error("Failed creating {} folder.".format(chat_folder))
return return
file = open(cardfile, 'w') file = open(chat_card, 'w')
file.write(log) file.write(data)
file.close() file.close()
if gen is not None:
recordfile = self.scribePath.format(tag=tag, file="record", ext=self.chatext) if vocab is not None:
file = open(recordfile, 'w') chat_record = self.chat_file(tag=tag, file="record", ext=self.chatext)
file.write(gen) file = open(chat_record, 'w', encoding="utf-16")
file.write(vocab)
file.close() file.close()
def recall(self, filename): # Loads a Generator's vocabulary file dump
#print("Loading chat: " + path) def load_vocab(self, tag):
file = open(self.chatdir + filename, 'rb') filepath = self.chat_file(tag=tag, file="record", ext=self.chatext)
scribe = None
try: try:
scribe = Scribe.Recall(pickle.load(file), self) file = open(filepath, 'r', encoding="utf-16")
self.logger.info("Unpickled {}{}".format(self.chatdir, filename))
except pickle.UnpicklingError:
file.close()
file = open(self.chatdir + filename, 'r')
try:
scribe = Scribe.Recall(file.read(), self)
self.logger.info("Read {}{} text file".format(self.chatdir, filename))
except Exception as e:
self.logger.error("Failed reading {}{}".format(self.chatdir, filename))
self.logger.exception(e)
raise e
file.close()
return scribe
def wakeScribe(self, filepath):
file = open(filepath.format(filename="card", ext=".txt"), 'r')
card = file.read()
file.close()
return Scribe.FromFile(card, self)
def wakeParrot(self, tag):
filepath = self.scribePath.format(tag=tag, file="record", ext=self.chatext)
try:
file = open(filepath, 'r')
#print("\nOPening " + filepath + "\n")
record = file.read() record = file.read()
file.close() file.close()
return Markov.loads(record) return record
except: except Exception as e:
self.logger.error("Parrot file {} not found.".format(filepath)) self.logger.error("Vocabulary file {} not found.".format(filepath))
self.logger.exception(e)
return None return None
def wakeScriptorium(self): # Loads a Generator's vocabulary file dump in the old UTF-8 encoding
scriptorium = {} def load_vocab_old(self, tag):
filepath = self.chat_file(tag=tag, file="record", ext=self.chatext)
try:
file = open(filepath, 'r')
record = file.read().encode().decode('utf-8')
file.close()
return record
except Exception as e:
self.logger.error("Vocabulary file {} not found.".format(filepath))
self.logger.exception(e)
return None
# Loads a Metadata card file dump
def load_card(self, tag):
filepath = self.chat_file(tag=tag, file="card", ext=".txt")
try:
reader_file = open(filepath, 'r')
reader = reader_file.read()
reader_file.close()
return reader
except OSError:
self.logger.error("Metadata file {} not found.".format(filepath))
return None
# Returns a Reader for a given ID with an already working vocabulary - be it
# new or loaded from file
def get_reader(self, tag):
card = self.load_card(tag)
if card:
vocab_dump = self.load_vocab(tag)
if vocab_dump:
vocab = Generator.loads(vocab_dump)
else:
vocab = Generator()
return Reader.FromCard(card, vocab, self.min_period, self.max_period, self.logger)
else:
return None
# Count the stored chats
def chat_count(self):
count = 0
directory = os.fsencode(self.chatdir)
for subdir in os.scandir(directory):
dirname = subdir.name.decode("utf-8")
if dirname.startswith("chat_"):
count += 1
return count
# Crawl through all the stored Readers
def readers_pass(self):
directory = os.fsencode(self.chatdir) directory = os.fsencode(self.chatdir)
for subdir in os.scandir(directory): for subdir in os.scandir(directory):
dirname = subdir.name.decode("utf-8") dirname = subdir.name.decode("utf-8")
if dirname.startswith("chat_"): if dirname.startswith("chat_"):
cid = dirname[5:] cid = dirname[5:]
try: try:
filepath = self.chatdir + dirname + "/{filename}{ext}" reader = self.get_reader(cid)
scriptorium[cid] = self.wakeScribe(filepath) # self.logger.info("Chat {} contents:\n{}".format(cid, reader.card.dumps()))
self.logger.info("Chat {} contents:\n".format(cid) + scriptorium[cid].chat.dumps()) self.logger.info("Successfully passed through {} ({}) chat.\n".format(cid, reader.title()))
if self.bypass: if reader.period() > self.max_period:
scriptorium[cid].setFreq(random.randint(self.maxFreq//2, self.maxFreq)) reader.set_period(self.max_period)
elif scriptorium[cid].freq() > self.maxFreq: self.store(*reader.archive())
scriptorium[cid].setFreq(self.maxFreq) elif reader.period() < self.min_period:
reader.set_period(self.min_period)
self.store(*reader.archive())
yield reader
except Exception as e: except Exception as e:
self.logger.error("Failed reading {}".format(dirname)) self.logger.error("Failed passing through {}".format(dirname))
self.logger.exception(e) self.logger.exception(e)
raise e raise e
return scriptorium
""" # Load and immediately store every Reader
def wake_old(self): def update(self):
scriptorium = {} for reader in self.readers_pass():
if reader.vocab is None:
directory = os.fsencode(self.chatdir) yield reader.cid()
for file in os.listdir(directory):
filename = os.fsdecode(file)
if filename.endswith(self.chatext):
cid = filename[:-(len(self.chatext))]
if self.filterCids is not None:
#self.logger.info("CID " + cid)
if not cid in self.filterCids:
continue
scriptorium[cid] = self.recall(filename)
scribe = scriptorium[cid]
if scribe is not None:
if self.bypass:
scribe.setFreq(random.randint(self.maxFreq//2, self.maxFreq))
elif scribe.freq() > self.maxFreq:
scribe.setFreq(self.maxFreq)
self.logger.info("Loaded chat " + scribe.title() + " [" + scribe.cid() + "]"
"\n" + "\n".join(scribe.chat.dumps()))
else: else:
continue
return scriptorium
"""
def update(self, oldext=None):
failed = []
remove = False
if not oldext:
oldext = self.chatext
remove = True
directory = os.fsencode(self.chatdir)
for file in os.listdir(directory):
filename = os.fsdecode(file)
if filename.endswith(oldext):
try: try:
self.logger.info("Updating chat " + filename) self.store(*reader.archive())
scribe = self.recall(filename)
if scribe is not None:
scribe.store(scribe.parrot.dumps())
self.wakeParrot(scribe.cid())
self.logger.info("--- Update done: " + scribe.title())
if remove:
os.remove(filename)
except Exception as e: except Exception as e:
failed.append(filename)
self.logger.error("Found the following error when trying to update:")
self.logger.exception(e) self.logger.exception(e)
else: yield reader.cid()
continue
return failed


@ -1,106 +0,0 @@
#!/usr/bin/env python3
def parse(l):
s = l.split('=', 1)
if len(s) < 2:
return ""
else:
return s[1]
class Chatlog(object):
def __init__(self, cid, ctype, title, count=0, freq=None, answer=0.5, restricted=False, silenced=False):
self.id = str(cid)
self.type = ctype
self.title = title
if freq is None:
if "group" in ctype:
freq = 10
#elif ctype is "private":
else:
freq = 2
self.count = count
self.freq = freq
self.answer = answer
self.restricted = restricted
self.silenced = silenced
def add_msg(self, message):
self.gen.add_text(message)
self.count += 1
def set_freq(self, freq):
if freq < 1:
raise ValueError('Tried to set freq a value less than 1.')
else:
self.freq = freq
return self.freq
def set_answer(self, afreq):
if afreq > 1:
raise ValueError('Tried to set answer probability higher than 1.')
elif afreq < 0:
raise ValueError('Tried to set answer probability lower than 0.')
else:
self.answer = afreq
return self.answer
def dumps(self):
lines = ["LOG=v4"]
lines.append("CHAT_ID=" + self.id)
lines.append("CHAT_TYPE=" + self.type)
lines.append("CHAT_NAME=" + self.title)
lines.append("WORD_COUNT=" + str(self.count))
lines.append("MESSAGE_FREQ=" + str(self.freq))
lines.append("ANSWER_FREQ=" + str(self.answer))
lines.append("RESTRICTED=" + str(self.restricted))
lines.append("SILENCED=" + str(self.silenced))
#lines.append("WORD_DICT=")
return '\n'.join(lines)
def loads(text):
lines = text.splitlines()
return Chatlog.loadl(lines)
def loadl(lines):
version = parse(lines[0]).strip()
version = version if len(version.strip()) > 1 else (lines[4] if len(lines) > 4 else "LOG_ZERO")
if version == "v4":
return Chatlog(cid=parse(lines[1]),
ctype=parse(lines[2]),
title=parse(lines[3]),
count=int(parse(lines[4])),
freq=int(parse(lines[5])),
answer=float(parse(lines[6])),
restricted=(parse(lines[7]) == 'True'),
silenced=(parse(lines[8]) == 'True')
)
elif version == "v3":
return Chatlog(cid=parse(lines[1]),
ctype=parse(lines[2]),
title=parse(lines[3]),
count=int(parse(lines[7])),
freq=int(parse(lines[4])),
answer=float(parse(lines[5])),
restricted=(parse(lines[6]) == 'True')
)
elif version == "v2":
return Chatlog(cid=parse(lines[1]),
ctype=parse(lines[2]),
title=parse(lines[3]),
count=int(parse(lines[6])),
freq=int(parse(lines[4])),
answer=float(parse(lines[5]))
)
elif version == "dict:":
return Chatlog(cid=lines[0],
ctype=lines[1],
title=lines[2],
count=int(lines[5]),
freq=int(lines[3])
)
else:
return Chatlog(cid=lines[0],
ctype=lines[1],
title=lines[2],
freq=int(lines[3])
)

177
generator.py Normal file

@ -0,0 +1,177 @@
#!/usr/bin/env python3
import random
import json
# This splits strings into lists of words delimited by spaces.
# Other whitespace characters get a space appended so that they are
# included as their own Markov chain element, so as not to pollute the
# chain with "different" words that would only differ in having a
# whitespace character attached or not
def rewrite(text):
words = text.replace('\n', '\n ').split(' ')
i = 0
while i < len(words):
w = words[i].strip(' \t')
if len(w) > 0:
words[i] = w
else:
del words[i]
i -= 1
i += 1
return words
# This gives a dictionary key from 2 words, ignoring case
def getkey(w1, w2):
key = (w1.strip().casefold(), w2.strip().casefold())
return str(key)
# This turns a dictionary key back into 2 separate words
def getwords(key):
words = key.strip('()').split(', ')
for i in range(len(words)):
words[i] = words[i].strip('\'')
return words
# Generates triplets of words from the given data string. So if our string
# were "What a lovely day", we'd generate (What, a, lovely) and then
# (a, lovely, day).
def triplets(wordlist):
if len(wordlist) < 3:
return
for i in range(len(wordlist) - 2):
yield (wordlist[i], wordlist[i+1], wordlist[i+2])
class Generator(object):
# Marks when we want to create a Generator object from a given JSON
MODE_JSON = "MODE_JSON"
# Marks when we want to create a Generator object from a given list of words
MODE_LIST = "MODE_LIST"
# Marks when we want to create a Generator object from a given dictionary
MODE_DICT = "MODE_DICT"
# Marks when we want to create a Generator object from a whole Chat history (WIP)
MODE_HIST = "MODE_HIST"
# Marks the beginning of a message
HEAD = "\n^MESSAGE_SEPARATOR^"
# Marks the end of a message
TAIL = " ^MESSAGE_SEPARATOR^"
def __init__(self, load=None, mode=None):
if mode is not None:
if mode == Generator.MODE_JSON:
self.cache = json.loads(load)
elif mode == Generator.MODE_LIST:
self.cache = {}
self.load_list(load)
elif mode == Generator.MODE_DICT:
self.cache = load
# TODO: Chat History mode
else:
self.cache = {}
# Loads a text divided into a list of lines
def load_list(self, many):
for one in many:
self.add(one)
# Dumps the cache dictionary into a JSON-formatted string
def dumps(self):
return json.dumps(self.cache, ensure_ascii=False)
# Dumps the cache dictionary into a file, formatted as JSON
def dump(self, f):
json.dump(self.cache, f, ensure_ascii=False)
# Loads the cache dictionary from a JSON-formatted string
def loads(dump):
if len(dump) == 0:
# faulty dump gives default Generator
return Generator()
# otherwise
return Generator(load=dump, mode=Generator.MODE_JSON)
# Loads the cache dictionary from a file, formatted as JSON
def load(f):
return Generator(load=json.load(f), mode=Generator.MODE_DICT)
def add(self, text):
words = [Generator.HEAD]
text = rewrite(text + Generator.TAIL)
words.extend(text)
self.database(words)
# This takes a list of words and stores it in the cache, adding
# a special entry for the first word (the HEAD marker)
def database(self, words):
for w1, w2, w3 in triplets(words):
if w1 == Generator.HEAD:
if w1 in self.cache:
self.cache[Generator.HEAD].append(w2)
else:
self.cache[Generator.HEAD] = [w2]
key = getkey(w1, w2)
if key in self.cache:
# if the key exists, add the new word to the end of the chain
self.cache[key].append(w3)
else:
# otherwise, create a new entry for the new key starting with
# the new end of chain
self.cache[key] = [w3]
# This generates the Markov text/word chain
# silence=True disables Telegram user mentions
def generate(self, size=50, silence=False):
if len(self.cache) == 0:
# If there is nothing in the cache we cannot generate anything
return ""
# Start with a message HEAD and a random message starting word
w1 = random.choice(self.cache[Generator.HEAD])
w2 = random.choice(self.cache[getkey(Generator.HEAD, w1)])
gen_words = []
# As long as we don't go over the max. message length (in n. of words)...
for i in range(size):
if silence and w1.startswith("@") and len(w1) > 1:
# ...append word 1, disabling any possible Telegram mention
gen_words.append(w1.replace("@", "(@)"))
else:
# ..append word 1
gen_words.append(w1)
if w2 == Generator.TAIL or not getkey(w1, w2) in self.cache:
# When there's no key from the last 2 words to follow the chain,
# or we reached a separation between messages, stop
break
else:
# Get a random third word that follows the chain of words 1
# and 2, then make words 2 and 3 to be the new words 1 and 2
w1, w2 = w2, random.choice(self.cache[getkey(w1, w2)])
return ' '.join(gen_words)
# Cross a second Generator into this one
def cross(self, gen):
for key in gen.cache:
if key in self.cache:
self.cache[key].extend(gen.cache[key])
else:
self.cache[key] = list(gen.cache[key])
# Count again the number of messages
# (for whenever the count number is unreliable)
def new_count(self):
count = 0
for key in self.cache:
for word in self.cache[key]:
if word == Generator.TAIL:
# ...by just counting message separators
count += 1
return count

105
markov.py

@ -1,105 +0,0 @@
#!/usr/bin/env python3
import random
import json
def getkey(w1, w2):
key = (w1.strip().casefold(), w2.strip().casefold())
return str(key)
def getwords(key):
words = key.strip('()').split(', ')
for i in range(len(words)):
words[i].strip('\'')
return words
def triples(wordlist):
# Generates triples from the given data string. So if our string were
# "What a lovely day", we'd generate (What, a, lovely) and then
# (a, lovely, day).
if len(wordlist) < 3:
return
for i in range(len(wordlist) - 2):
yield (wordlist[i], wordlist[i+1], wordlist[i+2])
class Markov(object):
ModeJson = "MODE_JSON"
ModeList = "MODE_LIST"
ModeChatData = "MODE_CHAT_DATA"
Head = "\n^MESSAGE_SEPARATOR^"
Tail = "^MESSAGE_SEPARATOR^"
def __init__(self, load=None, mode=None):
if mode is not None:
if mode == Markov.ModeJson:
self.cache = json.loads(load)
elif mode == Markov.ModeList:
self.cache = {}
self.loadList(load)
else:
self.cache = {}
def loadList(self, lines):
for line in lines:
words = [Markov.Head]
words.extend(line.split())
self.learn_words(words)
def dumps(self):
return json.dumps(self.cache)
def loads(dump):
if len(dump) == 0:
return Markov()
return Markov(load=dump, mode=Markov.ModeJson)
def learn_words(self, words):
self.database(words)
def database(self, wordlist):
for w1, w2, w3 in triples(wordlist):
if w1 == Markov.Head:
if w1 in self.cache:
self.cache[Markov.Head].append(w2)
else:
self.cache[Markov.Head] = [w2]
key = getkey(w1, w2)
if key in self.cache:
self.cache[key].append(w3)
else:
self.cache[key] = [w3]
def generate_markov_text(self, size=50, silence=False):
if len(self.cache) == 0:
return ""
w1 = random.choice(self.cache[Markov.Head])
w2 = random.choice(self.cache[getkey(Markov.Head, w1)])
gen_words = []
for i in range(size):
if silence and w1.startswith("@") and len(w1) > 1:
gen_words.append(w1.replace("@", "(@)"))
else:
gen_words.append(w1)
if w2 == Markov.Tail or not getkey(w1, w2) in self.cache:
# print("Generated text")
break
else:
w1, w2 = w2, random.choice(self.cache[getkey(w1, w2)])
return ' '.join(gen_words)
def cross(self, gen):
for key in gen.cache:
if key in self.cache:
self.cache[key].extend(d[key])
else:
self.cache[key] = list(d[key])
def new_count(self):
count = 0
for key in self.cache:
for word in self.cache[key]:
if word == Markov.Tail:
count += 1
return count

65
memorylist.py Normal file
View file

@ -0,0 +1,65 @@
#!/usr/bin/env python3
from collections.abc import Sequence
class MemoryList(Sequence):
"""Special "memory list" class that:
- Whenever an item is added that was already in the list,
it gets moved to the back instead
- Whenever an item is looked for, it gets moved to the
back
- If a new item is added that goes over a given capacity
limit, the item at the front (oldest accessed item)
is removed (and returned)"""
def __init__(self, capacity, data=None):
super(MemoryList, self).__init__()
self._capacity = capacity
if (data is not None):
self._list = list(data)
else:
self._list = list()
def __repr__(self):
return "<{0} {1}, capacity {2}>".format(self.__class__.__name__, self._list, self._capacity)
def __str__(self):
return "{0}, {1}/{2}".format(self._list, len(self._list), self._capacity)
def __len__(self):
return len(self._list)
def capacity(self):
return self._capacity
def __getitem__(self, ii):
return self._list[ii]
def __contains__(self, val):
return val in self._list
def __iter__(self):
return self._list.__iter__()
def add(self, val):
if val in self._list:
self._list.remove(val)
self._list.append(val)
if len(self._list) > self._capacity:
x = self._list[0]
del self._list[0]
return x
else:
return None
def search(self, cond, *args, **kwargs):
val = next((v for v in self._list if cond(v)), *args, **kwargs)
if val is not None:
self._list.remove(val)
self._list.append(val)
return val
def remove(self, val):
self._list.remove(val)
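
The behaviour described in the MemoryList docstring can be illustrated with a trimmed-down sketch. `TinyMemoryList` is a hypothetical stand-in written for this example, not the class from the diff, and it assumes the docstring's reading of the capacity rule (evict only once the list goes over capacity).

```python
class TinyMemoryList:
    """Hypothetical, simplified stand-in for the MemoryList in this diff."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._items = []

    def add(self, val):
        # Re-adding an item moves it to the back (most recently used)
        if val in self._items:
            self._items.remove(val)
        self._items.append(val)
        # Evict and return the oldest item once the list exceeds capacity
        if len(self._items) > self._capacity:
            return self._items.pop(0)
        return None

    def search(self, cond, default=None):
        # A successful lookup also counts as an access
        for v in self._items:
            if cond(v):
                self._items.remove(v)
                self._items.append(v)
                return v
        return default


mem = TinyMemoryList(2)
evicted = [mem.add("a"), mem.add("b")]
found = mem.search(lambda v: v == "a")  # "a" becomes the most recent item
evicted.append(mem.add("c"))            # "b" is now the oldest, so it is evicted
print(evicted, found, mem._items)
```

Returning the evicted item from `add` is what lets a caller such as the Speaker save a chat to disk at the moment it falls out of memory.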

160
metadata.py Normal file
View file

@ -0,0 +1,160 @@
#!/usr/bin/env python3
# This reads a line in the format 'VARIABLE=value' and gives me the value.
# See Metadata.loadl(...) for more details
def parse_card_line(line):
s = line.split('=', 1)
if len(s) < 2:
return ""
else:
return s[1]
# This is a chat's Metadata, holding different configuration values for
# Velasco and other miscellaneous information about the chat
class Metadata(object):
def __init__(self, cid, ctype, title, count=0, period=None, answer=0.5, restricted=False, silenced=False):
# The Telegram chat's ID
self.id = str(cid)
# The type of chat
self.type = ctype
# The title of the chat
self.title = title
if period is None:
if "group" in ctype:
# Default period for groups and supergroups
period = 10
else:
# Default period for private or channel chats
period = 2
# The number of messages read in a chat
self.count = count
# This chat's configured period
self.period = period
# This chat's configured answer probability
self.answer = answer
# Whether some interactions are restricted to admins only
self.restricted = restricted
# Whether messages should silence user mentions
self.silenced = silenced
# Sets the period for a chat
# It has to be at least 1
# Returns the new value
def set_period(self, period):
if period < 1:
raise ValueError('Tried to set period to a value less than 1.')
else:
self.period = period
return self.period
# Sets the answer probability
# It's a percentage represented as a decimal between 0 and 1
# Returns the new value
def set_answer(self, prob):
if prob > 1:
raise ValueError('Tried to set answer probability higher than 1.')
elif prob < 0:
raise ValueError('Tried to set answer probability lower than 0.')
else:
self.answer = prob
return self.answer
# Dumps the metadata into a list of lines, then joined together in a string,
# ready to be written into a file
def dumps(self):
lines = ["CARD=v5"]
lines.append("CHAT_ID=" + self.id)
lines.append("CHAT_TYPE=" + self.type)
lines.append("CHAT_NAME=" + self.title)
lines.append("WORD_COUNT=" + str(self.count))
lines.append("MESSAGE_PERIOD=" + str(self.period))
lines.append("ANSWER_PROB=" + str(self.answer))
lines.append("RESTRICTED=" + str(self.restricted))
lines.append("SILENCED=" + str(self.silenced))
# lines.append("WORD_DICT=")
return ('\n'.join(lines)) + '\n'
# Creates a Metadata object from a previous text dump
def loads(text):
lines = text.splitlines()
return Metadata.loadl(lines)
# Creates a Metadata object from a list of metadata lines
def loadl(lines):
# In a perfect world, I would get both the variable name and its corresponding value
# from each side of the lines. But I know the order in which the lines are written in
# the file, because I hardcoded it. So I can afford also hardcoding reading it back in
# the same order, and nobody can stop me
version = parse_card_line(lines[0]).strip()
version = version if len(version.strip()) > 1 else (lines[4] if len(lines) > 4 else "LOG_ZERO")
if version == "v4" or version == "v5":
return Metadata(cid=parse_card_line(lines[1]),
ctype=parse_card_line(lines[2]),
title=parse_card_line(lines[3]),
count=int(parse_card_line(lines[4])),
period=int(parse_card_line(lines[5])),
answer=float(parse_card_line(lines[6])),
restricted=(parse_card_line(lines[7]) == 'True'),
silenced=(parse_card_line(lines[8]) == 'True')
)
elif version == "v3":
# Deprecated: this elif block will be removed in a new version
print("Warning! This Card format ({}) is deprecated. Update all".format(version),
"your files in case that there are still some left in old formats before",
"downloading the next update.")
# This is kept for retrocompatibility purposes, in case someone did a fork
# of this repo and still has some chat files that haven't been updated in
# a long while -- but I already converted all my files to v5
return Metadata(cid=parse_card_line(lines[1]),
ctype=parse_card_line(lines[2]),
title=parse_card_line(lines[3]),
count=int(parse_card_line(lines[7])),
period=int(parse_card_line(lines[4])),
answer=float(parse_card_line(lines[5])),
restricted=(parse_card_line(lines[6]) == 'True')
)
elif version == "v2":
# Deprecated: this elif block will be removed in a new version
print("Warning! This Card format ({}) is deprecated. Update all".format(version),
"your files in case that there are still some left in old formats before",
"downloading the next update.")
# Also kept for retrocompatibility purposes
return Metadata(cid=parse_card_line(lines[1]),
ctype=parse_card_line(lines[2]),
title=parse_card_line(lines[3]),
count=int(parse_card_line(lines[6])),
period=int(parse_card_line(lines[4])),
answer=float(parse_card_line(lines[5]))
)
elif version == "dict:":
# Deprecated: this elif block will be removed in a new version
print("Warning! This Card format ('dict') is deprecated. Update all",
"your files in case that there are still some left in old formats before",
"downloading the next update.")
# Also kept for retrocompatibility purposes
# At some point I decided to number the versions of each dictionary format,
# but this was not always the case. This is what you get if you try to read
# whatever there is in very old files where the version should be
return Metadata(cid=lines[0],
ctype=lines[1],
title=lines[2],
count=int(lines[5]),
period=int(lines[3])
)
else:
# Deprecated: this else block will be removed in a new version
print("Warning! This ancient Card format is deprecated. Update all",
"your files in case that there are still some left in old formats before",
"downloading the next update.")
# Also kept for retrocompatibility purposes
# This is for the oldest of file formats
return Metadata(cid=lines[0],
ctype=lines[1],
title=lines[2],
period=int(lines[3])
)
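
The "perfect world" alternative mentioned in the `loadl` comment, reading each value by its variable name instead of by line position, could look roughly like the sketch below. `parse_card` is a hypothetical helper introduced only for illustration; the field names are the ones written by `Metadata.dumps()`, and the fallback values are assumptions.

```python
def parse_card(text):
    # Map each 'VARIABLE=value' line to a dict entry; lines without '=' are skipped
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            fields[name.strip()] = value
    return fields


card = (
    "CARD=v5\n"
    "CHAT_ID=-1001\n"
    "CHAT_TYPE=group\n"
    "CHAT_NAME=a=b chat\n"
    "MESSAGE_PERIOD=10\n"
    "SILENCED=True\n"
)
fields = parse_card(card)
# Missing keys fall back to a default instead of shifting every later line
period = int(fields.get("MESSAGE_PERIOD", 10))
restricted = fields.get("RESTRICTED") == "True"
silenced = fields.get("SILENCED") == "True"
print(fields["CHAT_NAME"], period, restricted, silenced)
```

Like `parse_card_line`, this splits only on the first `=`, so chat titles containing `=` survive intact. The trade-off is that very old files without `VARIABLE=` prefixes would still need the positional branches.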

229
reader.py Normal file
View file

@ -0,0 +1,229 @@
#!/usr/bin/env python3
import random
from metadata import Metadata, parse_card_line
from generator import Generator
# This gives me the chat title, or the first and maybe last
# name of the user as fallback if it's a private chat
def get_chat_title(chat):
if chat.title is not None:
return chat.title
elif chat.first_name is not None:
if chat.last_name is not None:
return chat.first_name + " " + chat.last_name
else:
return chat.first_name
else:
return ""
class Memory(object):
def __init__(self, mid, content):
self.id = mid
self.content = content
# This is a chat Reader object, in charge of managing the parsing of messages
# for a specific chat, and holding said chat's metadata
class Reader(object):
# Media tagging variables
TAG_PREFIX = "^IS_"
STICKER_TAG = "^IS_STICKER^"
ANIM_TAG = "^IS_ANIMATION^"
VIDEO_TAG = "^IS_VIDEO^"
def __init__(self, metadata, vocab, min_period, max_period, logger, names=[]):
# The Metadata object holding a chat's specific bot parameters
self.meta = metadata
# The Generator object holding the vocabulary learned so far
self.vocab = vocab
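# The minimum and maximum periods allowed for this bot
# (set_period clamps to this range, so both have to be stored)
self.min_period = min_period
self.max_period = max_period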
# The short term memory, for recently read messages (see below)
self.short_term_mem = []
# The countdown until the period ends and it's time to talk
self.countdown = self.meta.period
# The logger object shared program-wide
self.logger = logger
# The bot's nicknames + username
self.names = names
# Create a new Reader from a Chat object
def FromChat(chat, min_period, max_period, logger):
meta = Metadata(chat.id, chat.type, get_chat_title(chat))
vocab = Generator()
return Reader(meta, vocab, min_period, max_period, logger)
# TODO: Create a new Reader from a whole Chat history
def FromHistory(history, vocab, min_period, max_period, logger):
return None
# Create a new Reader from a meta's file dump
def FromCard(card, vocab, min_period, max_period, logger):
meta = Metadata.loads(card)
return Reader(meta, vocab, min_period, max_period, logger)
# Deprecated: this method will be removed in a new version
def FromFile(text, min_period, max_period, logger, vocab=None):
print("Warning! This method of loading a Reader from file (Reader.FromFile(...))",
"is deprecated, and will be removed from the next update. Use FromCard instead.")
# Load a Reader from a file's text string
lines = text.splitlines()
version = parse_card_line(lines[0]).strip()
version = version if len(version.strip()) > 1 else lines[4]
logger.info("Dictionary version: {} ({} lines)".format(version, len(lines)))
if version == "v4" or version == "v5":
return Reader.FromCard(text, vocab, min_period, max_period, logger)
# I stopped saving the chat metadata and the cache together
elif version == "v3":
meta = Metadata.loadl(lines[0:8])
cache = '\n'.join(lines[9:])
vocab = Generator.loads(cache)
elif version == "v2":
meta = Metadata.loadl(lines[0:7])
cache = '\n'.join(lines[8:])
vocab = Generator.loads(cache)
elif version == "dict:":
meta = Metadata.loadl(lines[0:6])
cache = '\n'.join(lines[6:])
vocab = Generator.loads(cache)
else:
meta = Metadata.loadl(lines[0:4])
cache = lines[4:]
vocab = Generator(load=cache, mode=Generator.MODE_LIST)
# raise SyntaxError("Reader: Metadata format unrecognized.")
r = Reader(meta, vocab, min_period, max_period, logger)
return r
# Returns a nice little tuple package for the archivist to save to file.
# Also commits to long term memory any pending short term memories
def archive(self):
self.commit_memory()
return (self.meta.id, self.meta.dumps(), self.vocab.dumps())
# Checks type. Returns True for "group" even if it's a supergroup
def check_type(self, t):
return t in self.meta.type
# Hard check
def exactly_type(self, t):
return t == self.meta.type
def set_title(self, title):
self.meta.title = title
# Sets a new period in the Metadata
def set_period(self, period):
# The period has to be in the range [min..max_period]; otherwise, clamp to said range
new_period = max(self.min_period, min(period, self.max_period))
set_period = self.meta.set_period(new_period)
if new_period == set_period and new_period < self.countdown:
# If successfully changed and the new period is less than the current
# remaining countdown, reduce the countdown to the new period
self.countdown = new_period
return new_period
def set_answer(self, prob):
return self.meta.set_answer(prob)
def cid(self):
return str(self.meta.id)
def count(self):
return self.meta.count
def period(self):
return self.meta.period
def title(self):
return self.meta.title
def answer(self):
return self.meta.answer
def ctype(self):
return self.meta.type
def is_restricted(self):
return self.meta.restricted
def toggle_restrict(self):
self.meta.restricted = (not self.meta.restricted)
def is_silenced(self):
return self.meta.silenced
def toggle_silence(self):
self.meta.silenced = (not self.meta.silenced)
# Rolls the chance for answering in this specific chat,
# according to the answer probability
def is_answering(self):
rand = random.random()
chance = self.answer()
if chance == 1:
return True
elif chance == 0:
return False
return rand <= chance
# Adds a new message to the short term memory
def add_memory(self, mid, content):
mem = Memory(mid, content)
self.short_term_mem.append(mem)
# Returns a random message ID from the short memory,
# when answering to a random comment
def random_memory(self):
if len(self.short_term_mem) == 0:
return None
mem = random.choice(self.short_term_mem)
return mem.id
def reset_countdown(self):
self.countdown = self.meta.period
# Reads a message
# This process will determine which kind of message it is (Sticker, Anim,
# Video, or actual text) and pre-process it accordingly for the Generator,
# then store it in the short term memory
def read(self, message):
mid = str(message.message_id)
if message.text is not None:
self.learn(mid, message.text)
elif message.sticker is not None:
self.learn_drawing(mid, Reader.STICKER_TAG, message.sticker.file_id)
elif message.animation is not None:
self.learn_drawing(mid, Reader.ANIM_TAG, message.animation.file_id)
elif message.video is not None:
self.learn_drawing(mid, Reader.VIDEO_TAG, message.video.file_id)
self.meta.count += 1
# Stores a multimedia message in the short term memory as a text with
# TAG + the media file ID
def learn_drawing(self, mid, tag, drawing):
self.learn(mid, tag + " " + drawing)
# Stores a text message in the short term memory
def learn(self, mid, text):
for name in self.names:
if name.casefold() in text.casefold() and len(text.split()) <= 3:
# If it's 3 words or fewer and one of the bot's names is in
# the message, ignore it as it's most probably just a summon
return
self.add_memory(mid, text)
# Commits the short term memory messages into the "long term memory"
# aka the vocabulary Generator's cache
def commit_memory(self):
for mem in self.short_term_mem:
self.vocab.add(mem.content)
self.short_term_mem = []
def generate_message(self, max_len):
return self.vocab.generate(size=max_len, silence=self.is_silenced())
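
The clamping done in `Reader.set_period` can be checked on its own with a small sketch. The two helpers and the default bounds of 1 and 100 are assumptions made for this example; in the bot the bounds come from the minimum and maximum period arguments.

```python
def clamp_period(period, min_period, max_period):
    # Keep the requested period inside [min_period, max_period]
    return max(min_period, min(period, max_period))


def apply_period(requested, countdown, min_period=1, max_period=100):
    new_period = clamp_period(requested, min_period, max_period)
    # A shorter period also shortens a countdown that is already running
    return new_period, min(countdown, new_period)


print(apply_period(500, 7))  # request above the maximum
print(apply_period(0, 7))    # request below the minimum
print(apply_period(5, 7))    # valid request shorter than the countdown
```

Clamping before calling `Metadata.set_period` means the `ValueError` for periods below 1 should not trigger from this path, provided the configured minimum is itself at least 1.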

194
scribe.py
View file

@ -1,194 +0,0 @@
#!/usr/bin/env python3
import random
from chatlog import *
from markov import Markov
def getTitle(chat):
if chat.title is not None:
return chat.title
elif chat.first_name is not None:
if chat.last_name is not None:
return chat.first_name + " " + chat.last_name
else:
return chat.first_name
else:
return ""
def rewrite(text):
words = text.replace('\n', '\n ').split(' ')
i = 0
while i < len(words):
w = words[i].strip(' \t')
if len(w) > 0:
words[i] = w
else:
del words[i]
i -= 1
i += 1
return words
class Page(object):
def __init__(self, mid, content):
self.id = mid
self.content = content
class Scribe(object):
TagPrefix = "^IS_"
StickerTag = "^IS_STICKER^"
AnimTag = "^IS_ANIMATION^"
VideoTag = "^IS_VIDEO^"
def __init__(self, chatlog, archivist):
self.chat = chatlog
self.archivist = archivist
self.pages = []
self.countdown = self.chat.freq
self.logger = self.archivist.logger
def FromChat(chat, archivist, newchat=False):
chatlog = Chatlog(chat.id, chat.type, getTitle(chat))
scribe = Scribe(chatlog, archivist)
return scribe
def FromData(data, archivist):
return None
def FromFile(log, archivist):
chatlog = Chatlog.loads(log)
return Scribe(chatlog, archivist)
def Recall(text, archivist):
lines = text.splitlines()
version = parse(lines[0]).strip()
version = version if len(version.strip()) > 1 else lines[4]
archivist.logger.info( "Dictionary version: {} ({} lines)".format(version, len(lines)) )
if version == "v4":
chatlog = Chatlog.loadl(lines[0:9])
cache = '\n'.join(lines[10:])
parrot = Markov.loads(cache)
elif version == "v3":
chatlog = Chatlog.loadl(lines[0:8])
cache = '\n'.join(lines[9:])
parrot = Markov.loads(cache)
elif version == "v2":
chatlog = Chatlog.loadl(lines[0:7])
cache = '\n'.join(lines[8:])
parrot = Markov.loads(cache)
elif version == "dict:":
chatlog = Chatlog.loadl(lines[0:6])
cache = '\n'.join(lines[6:])
parrot = Markov.loads(cache)
else:
chatlog = Chatlog.loadl(lines[0:4])
cache = lines[4:]
parrot = Markov(load=cache, mode=Markov.ModeList)
#raise SyntaxError("Scribe: Chatlog format unrecognized.")
s = Scribe(chatlog, archivist)
s.parrot = parrot
return s
def store(self, parrot):
self.archivist.store(self.chat.id, self.chat.dumps(), parrot)
def checkType(self, t):
return t in self.chat.type
def compareType(self, t):
return t == self.chat.type
def setTitle(self, title):
self.chat.title = title
def setFreq(self, freq):
if freq < self.countdown:
self.countdown = max(freq, 1)
return self.chat.set_freq(min(freq, self.archivist.maxFreq))
def setAnswer(self, afreq):
return self.chat.set_answer(afreq)
def cid(self):
return str(self.chat.id)
def count(self):
return self.chat.count
def freq(self):
return self.chat.freq
def title(self):
return self.chat.title
def answer(self):
return self.chat.answer
def type(self):
return self.chat.type
def isRestricted(self):
return self.chat.restricted
def restrict(self):
self.chat.restricted = (not self.chat.restricted)
def isSilenced(self):
return self.chat.silenced
def silence(self):
self.chat.silenced = (not self.chat.silenced)
def isAnswering(self):
rand = random.random()
chance = self.answer()
if chance == 1:
return True
elif chance == 0:
return False
return rand <= chance
def addPage(self, mid, content):
page = Page(mid, content)
self.pages.append(page)
def getReference(self):
page = random.choice(self.pages)
return page.id
def resetCountdown(self):
self.countdown = self.chat.freq
def learn(self, message):
mid = str(message.message_id)
if message.text is not None:
self.read(mid, message.text)
elif message.sticker is not None:
self.learnDrawing(mid, Scribe.StickerTag, message.sticker.file_id)
elif message.animation is not None:
self.learnDrawing(mid, Scribe.AnimTag, message.animation.file_id)
elif message.video is not None:
self.learnDrawing(mid, Scribe.VideoTag, message.video.file_id)
self.chat.count += 1
def learnDrawing(self, mid, tag, drawing):
self.read(mid, tag + " " + drawing)
def read(self, mid, text):
if "velasco" in text.casefold() and len(text.split()) <= 3:
return
words = [Markov.Head]
text = text + " " + Markov.Tail
words.extend(rewrite(text))
self.addPage(mid, words)
def teachParrot(self, parrot):
for page in self.pages:
parrot.learn_words(page.content)
self.pages = []
"""
def learnFrom(self, scribe):
self.chat.count += scribe.chat.count
self.parrot.cross(scribe.parrot)
"""

View file

@ -1,297 +1,435 @@
#!/usr/bin/env python3 #!/usr/bin/env python3
import random import random
from scribe import Scribe import time
from markov import Markov from sys import stderr
from telegram.error import * from memorylist import MemoryList
from reader import Reader, get_chat_title
from telegram.error import NetworkError
def send(bot, cid, text, replying=None, format=None, logger=None, **kwargs):
kwargs["parse_mode"] = format
kwargs["reply_to_message_id"] = replying
if text.startswith(Scribe.TagPrefix): # Auxiliary function to print to stderr (alongside logger messages)
def eprint(*args, **kwargs):
print(*args, end=' ', file=stderr, **kwargs)
# Auxiliary function to send a text to a chat through a bot
def send(bot, cid, text, replying=None, formatting=None, logger=None, **kwargs):
# Markdown or HTML formatting (both argument names are valid)
kwargs["parse_mode"] = formatting or kwargs.get("parse_mode")
# ID of the message it's replying to (both argument names are valid)
kwargs["reply_to_message_id"] = replying or kwargs.get("reply_to_message_id")
# Reminder that dict.get(key) defaults to None if the key isn't found
if text.startswith(Reader.TAG_PREFIX):
# We're sending a media file ID
words = text.split(maxsplit=1) words = text.split(maxsplit=1)
if logger: if logger:
logger.info('Sending {} "{}" to {}'.format(words[0][4:-1], words[1], cid)) logger.info('Sending {} "{}" to {}'.format(words[0][4:-1], words[1], cid))
# Logs something like 'Sending VIDEO "VIDEO_ID" to CHAT_ID'
if words[0] == Scribe.StickerTag: if words[0] == Reader.STICKER_TAG:
return bot.send_sticker(cid, words[1], **kwargs) return bot.send_sticker(cid, words[1], **kwargs)
elif words[0] == Scribe.AnimTag: elif words[0] == Reader.ANIM_TAG:
return bot.send_animation(cid, words[1], **kwargs) return bot.send_animation(cid, words[1], **kwargs)
elif words[0] == Scribe.VideoTag: elif words[0] == Reader.VIDEO_TAG:
return bot.send_video(cid, words[1], **kwargs) return bot.send_video(cid, words[1], **kwargs)
else: else:
text # It's text
if logger: if logger:
mtype = "reply" if replying else "message" mtype = "reply" if (kwargs.get("reply_to_message_id")) else "message"
logger.info("Sending a {} to {}: '{}'".format(mtype, cid, text)) logger.info("Sending a {} to {}: '{}'".format(mtype, cid, text))
# eprint('.')
return bot.send_message(cid, text, **kwargs) return bot.send_message(cid, text, **kwargs)
def getTitle(chat):
if chat.title:
return chat.title
else:
last = chat.last_name if chat.last_name else ""
first = chat.first_name if chat.first_name else ""
name = " ".join([first, last]).strip()
if len(name) == 0:
return "Unknown"
else:
return name
class Speaker(object): class Speaker(object):
# Marks if the period is a fixed time when to send a new message
ModeFixed = "FIXED_MODE" ModeFixed = "FIXED_MODE"
ModeChance = "MODE_CHANCE" # Marks if the "periodic" messages have a weighted random chance to be sent, depending on the period
ModeChance = "CHANCE_MODE"
def __init__(self, name, username, archivist, logger, def __init__(self, username, archivist, logger, admin=0, nicknames=[],
reply=0.1, repeat=0.05, wakeup=False, mode=ModeFixed reply=0.1, repeat=0.05, wakeup=False, mode=ModeFixed,
memory=20, mute_time=60, save_time=3600, bypass=False,
cid_whitelist=None, max_len=50
): ):
self.name = name # List of nicknames other than the username that the bot can be called as
self.names = nicknames
# Mute time for Telegram network errors
self.mute_time = mute_time
# Last mute timestamp
self.mute_timer = None
# The bot's username, "@" included
self.username = username self.username = username
self.archivist = archivist # The minimum and maximum chat period for this bot
self.scriptorium = archivist.wakeScriptorium() self.min_period = archivist.min_period
self.max_period = archivist.max_period
# The Archivist functions to load and save from and to files
self.get_reader_file = archivist.get_reader
self.store_file = archivist.store
# Archivist function to crawl all stored Readers
self.readers_pass = archivist.readers_pass
# Legacy load logging messages
logger.info("----") logger.info("----")
logger.info("Finished loading.") logger.info("Finished loading.")
logger.info("Loaded {} chats.".format(len(self.scriptorium))) logger.info("Loaded {} chats.".format(archivist.chat_count()))
logger.info("----") logger.info("----")
self.wakeup = wakeup
self.logger = logger
self.reply = reply
self.repeat = repeat
self.filterCids = archivist.filterCids
self.bypass=archivist.bypass
def announce(self, announcement, check=(lambda _: True)): # Wakeup flag that determines if it should send a wakeup message to stored groupchats
for scribe in self.scriptorium: self.wakeup = wakeup
# The logger shared program-wide
self.logger = logger
# Chance of sending messages as replies
self.reply = reply
# Chance of sending 2 messages in a row
self.repeat = repeat
# If not empty, whitelist of chat IDs to only respond to
self.cid_whitelist = cid_whitelist
# Memory list/cache for the last accessed chats
self.memory = MemoryList(memory)
# Minimum time to wait between memory saves (triggered at the next message from any chat)
self.save_time = save_time
# Last save timestamp
self.memory_timer = int(time.perf_counter())
# Admin user ID
self.admin = admin
# For testing purposes
self.bypass = bypass
# Max word length for a message
self.max_len = max_len
# Sends an announcement to all chats that pass the check
def announce(self, bot, announcement, check=(lambda _: True)):
for reader in self.readers_pass():
try: try:
if check(scribe): if check(reader):
send(bot, scribe.cid(), announcement) send(bot, reader.cid(), announcement)
logger.info("Waking up on chat {}".format(scribe.cid())) self.logger.info("Sending announcement to chat {}".format(reader.cid()))
except: except Exception:
pass pass
# If wakeup flag is set, sends a wake-up message as announcement to all chats that
# are groups. Also, always sends a wakeup message to the 'bot admin'
def wake(self, bot, wake): def wake(self, bot, wake):
send(bot, self.admin, wake)
if self.wakeup: if self.wakeup:
def check(scribe): def group_check(reader):
return scribe.checkType("group") return reader.check_type("group")
self.announce(wake, check) self.announce(bot, wake, group_check)
def getScribe(self, chat): # Looks up a reader in the memory list
def get_reader(self, cid):
return self.memory.search(lambda r: r.cid() == cid, None)
# Looks up and returns a reader if it's in memory, or loads up a reader from
# file, adds it to memory, and returns it. Any other reader pushed out of
# memory is saved to file
def load_reader(self, chat):
cid = str(chat.id) cid = str(chat.id)
if not cid in self.scriptorium: reader = self.get_reader(cid)
scribe = Scribe.FromChat(chat, self.archivist, newchat=True) if reader is not None:
self.scriptorium[cid] = scribe return reader
return scribe
else:
return self.scriptorium[cid]
def shouldReply(self, message, scribe): reader = self.get_reader_file(cid)
if not self.bypass and scribe.isRestricted(): if not reader:
reader = Reader.FromChat(chat, self.min_period, self.max_period, self.logger)
old_reader = self.memory.add(reader)
if old_reader is not None:
old_reader.commit_memory()
self.store(old_reader)
return reader
# Returns a reader if it's in memory, or loads it up from a file and returns
# it otherwise. Does NOT add the Reader to memory
# This is useful for command prompts that do not require the Reader to be cached
def access_reader(self, cid):
reader = self.get_reader(cid)
if reader is None:
return self.get_reader_file(cid)
return reader
# Returns True if the bot's username is called, or if one of the nicknames is
# mentioned and they're not another user's username
def mentioned(self, text):
if self.username in text:
return True
for name in self.names:
if name in text and "@{}".format(name) not in text:
return True
return False
# Returns True if not enough time has passed since the last mute timestamp
def is_mute(self):
current_time = int(time.perf_counter())
return self.mute_timer is not None and (current_time - self.mute_timer) < self.mute_time
# Series of checks to determine if the bot should reply to a specific message, aside
# from the usual periodic messages
def should_reply(self, message, reader):
if self.is_mute():
# Not if mute time hasn't finished
return False
if not self.bypass and reader.is_restricted():
# If we're not in testing mode and the chat is restricted
user = message.chat.get_member(message.from_user.id) user = message.chat.get_member(message.from_user.id)
if not self.userIsAdmin(user): if not self.user_is_admin(user):
# update.message.reply_text("You do not have permissions to do that.") # ...And the user has no permissions, should not reply
return False return False
# otherwise (testing mode, or the chat is unrestricted, or the user has permissions)
replied = message.reply_to_message replied = message.reply_to_message
text = message.text.casefold() if message.text else "" text = message.text.casefold() if message.text else ""
return ( ((replied is not None) and (replied.from_user.name == self.username)) or # Only if it's a reply to a message of ours or the bot is mentioned in the message
(self.username in text) or return (((replied is not None) and (replied.from_user.name == self.username))
(self.name in text and "@{}".format(self.name) not in text) or (self.mentioned(text)))
)
def store(self, scribe): def store(self, reader):
if self.parrot is None: if reader is None:
raise ValueError("Tried to store a Parrot that is None.") raise ValueError("Tried to store a None Reader.")
else: else:
scribe.store(self.parrot.dumps()) self.store_file(*reader.archive())
def loadParrot(self, scribe): # Check if enough time for saving memory has passed
newParrot = False def should_save(self):
self.parrot = self.archivist.wakeParrot(scribe.cid()) current_time = int(time.perf_counter())
if self.parrot is None: elapsed = (current_time - self.memory_timer)
newParrot = True self.logger.debug("Save check: {}".format(elapsed))
self.parrot = Markov() return elapsed >= self.save_time
scribe.teachParrot(self.parrot)
self.store(scribe)
return newParrot
def read(self, bot, update): # Save all Readers in memory to files if it's save time
chat = update.message.chat def save(self):
scribe = self.getScribe(chat) if self.should_save():
scribe.learn(update.message) self.logger.info("Saving chats in memory...")
for reader in self.memory:
self.store(reader)
self.memory_timer = time.perf_counter()
self.logger.info("Chats saved.")
if self.shouldReply(update.message, scribe) and scribe.isAnswering(): # Reads a non-command message
self.say(bot, scribe, replying=update.message.message_id) def read(self, update, context):
# Check for save time
self.save()
# Ignore non-message updates
if update.message is None:
return return
title = getTitle(update.message.chat) chat = update.message.chat
if title != scribe.title(): reader = self.load_reader(chat)
scribe.setTitle(title) reader.read(update.message)
scribe.countdown -= 1 # Check if it's a "replyable" message & roll the chance to do so
if scribe.countdown < 0: if self.should_reply(update.message, reader) and reader.is_answering():
scribe.resetCountdown() self.say(context.bot, reader, replying=update.message.message_id)
rid = scribe.getReference() if random.random() <= self.reply else None return
self.say(bot, scribe, replying=rid)
elif (scribe.freq() - scribe.countdown) % self.archivist.saveCount == 0:
self.loadParrot(scribe)
def speak(self, bot, update): # Update the Reader's title if it has changed since the last message read
title = get_chat_title(update.message.chat)
if title != reader.title():
reader.set_title(title)
# Decrease the countdown for the chat, and send a message once it drops below 0
reader.countdown -= 1
if reader.countdown < 0:
reader.reset_countdown()
# Random chance to reply to a recent message
rid = reader.random_memory() if random.random() <= self.reply else None
self.say(context.bot, reader, replying=rid)
# Handles /speak command
def speak(self, update, context):
chat = (update.message.chat) chat = (update.message.chat)
scribe = self.getScribe(chat) reader = self.load_reader(chat)
if not self.bypass and scribe.isRestricted(): if not self.bypass and reader.is_restricted():
user = update.message.chat.get_member(update.message.from_user.id) user = update.message.chat.get_member(update.message.from_user.id)
if not self.userIsAdmin(user): if not self.user_is_admin(user):
# update.message.reply_text("You do not have permissions to do that.") # update.message.reply_text("You do not have permissions to do that.")
return return
mid = str(update.message.message_id) mid = str(update.message.message_id)
replied = update.message.reply_to_message replied = update.message.reply_to_message
# Reply to the message that the command replies to, otherwise to the command itself
rid = replied.message_id if replied else mid rid = replied.message_id if replied else mid
words = update.message.text.split() words = update.message.text.split()
if len(words) > 1: if len(words) > 1:
scribe.learn(' '.join(words[1:])) reader.read(' '.join(words[1:]))
self.say(bot, scribe, replying=rid) self.say(context.bot, reader, replying=rid)
def userIsAdmin(self, member): # Checks user permissions. Bot admin is always considered as having full permissions
def user_is_admin(self, member):
self.logger.info("user {} ({}) requesting a restricted action".format(str(member.user.id), member.user.name)) self.logger.info("user {} ({}) requesting a restricted action".format(str(member.user.id), member.user.name))
# self.logger.info("Bot Creator ID is {}".format(str(self.archivist.admin))) # eprint('!')
return ((member.status == 'creator') or # self.logger.info("Bot Creator ID is {}".format(str(self.admin)))
(member.status == 'administrator') or return ((member.status == 'creator')
(member.user.id == self.archivist.admin)) or (member.status == 'administrator')
or (member.user.id == self.admin))
def speech(self, scribe): # Generate speech (message)
return self.parrot.generate_markov_text(size=self.archivist.maxLen, silence=scribe.isSilenced()) def speech(self, reader):
return reader.generate_message(self.max_len)
def say(self, bot, scribe, replying=None, **kwargs): # Say a newly generated message
if self.filterCids is not None and not scribe.cid() in self.filterCids: def say(self, bot, reader, replying=None, **kwargs):
cid = reader.cid()
if self.cid_whitelist is not None and cid not in self.cid_whitelist:
# Don't, if there's a whitelist and this chat is not in it
return
if self.is_mute():
# Don't, if mute time isn't over
return return
self.loadParrot(scribe)
try: try:
send(bot, scribe.cid(), self.speech(scribe), replying, logger=self.logger, **kwargs) send(bot, cid, self.speech(reader), replying, logger=self.logger, **kwargs)
if self.bypass: if self.bypass:
maxFreq = self.archivist.maxFreq # Testing mode, force a reasonable period (to not have the bot spam one specific chat with a low period)
scribe.setFreq(random.randint(maxFreq//4, maxFreq)) minp = self.min_period
maxp = self.max_period
rangep = maxp - minp
reader.set_period(random.randint(rangep // 4, rangep) + minp)
if random.random() <= self.repeat: if random.random() <= self.repeat:
send(bot, scribe.cid(), self.speech(scribe), logger=self.logger, **kwargs) send(bot, cid, self.speech(reader), logger=self.logger, **kwargs)
except TimedOut: # Consider any Network Error as a Telegram temporary ban, as I couldn't find
scribe.setFreq(scribe.freq() + self.archivist.freqIncrement) # out in the documentation how error 429 is handled by python-telegram-bot
self.logger.warning("Increased period for chat {} [{}]".format(scribe.title(), scribe.cid())) except NetworkError as e:
self.logger.error("Sending a message caused network error:")
self.logger.exception(e)
self.logger.error("Going mute for {} seconds.".format(self.mute_time))
self.mute_timer = int(time.perf_counter())
except Exception as e: except Exception as e:
self.logger.error("Sending a message caused error:") self.logger.error("Sending a message caused exception:")
self.logger.error(e) self.logger.exception(e)
def getCount(self, bot, update): # Handling /count command
def get_count(self, update, context):
cid = str(update.message.chat.id) cid = str(update.message.chat.id)
scribe = self.scriptorium[cid] reader = self.load_reader(cid)
num = str(scribe.count()) if self.scriptorium[cid] else "no"
num = str(reader.count()) if reader else "no"
update.message.reply_text("I remember {} messages.".format(num)) update.message.reply_text("I remember {} messages.".format(num))
def getChats(self, bot, update): # Handling /get_chats command (exclusive for bot admin)
lines = ["[{}]: {}".format(cid, self.scriptorium[cid].title()) for cid in self.scriptorium] def get_chats(self, update, context):
list = "\n".join(lines) lines = ["[{}]: {}".format(reader.cid(), reader.title()) for reader in self.readers_pass()]
update.message.reply_text( "\n\n".join(["I have the following chats:", list]) ) chat_list = "\n".join(lines)
update.message.reply_text("I have the following chats:\n\n" + chat_list)
def freq(self, bot, update): # Handling /period command
# Print the current period or set a new one if one is given
def period(self, update, context):
chat = update.message.chat chat = update.message.chat
scribe = self.getScribe(chat) reader = self.load_reader(str(chat.id))
words = update.message.text.split() words = update.message.text.split()
if len(words) <= 1: if len(words) <= 1:
update.message.reply_text("The current speech period is {}".format(scribe.freq())) update.message.reply_text("The current speech period is {}".format(reader.period()))
return return
if scribe.isRestricted(): if reader.is_restricted():
user = update.message.chat.get_member(update.message.from_user.id) user = update.message.chat.get_member(update.message.from_user.id)
if not self.userIsAdmin(user): if not self.user_is_admin(user):
update.message.reply_text("You do not have permissions to do that.") update.message.reply_text("You do not have permissions to do that.")
return return
try: try:
freq = int(words[1]) period = int(words[1])
freq = scribe.setFreq(freq) period = reader.set_period(period)
update.message.reply_text("Period of speaking set to {}.".format(freq)) update.message.reply_text("Period of speaking set to {}.".format(period))
scribe.store(None) except Exception:
except: update.message.reply_text("Format was confusing; period unchanged from {}.".format(reader.period()))
update.message.reply_text("Format was confusing; period unchanged from {}.".format(scribe.freq()))
def answer(self, bot, update): # Handling /answer command
# Print the current answer probability or set a new one if one is given
def answer(self, update, context):
chat = update.message.chat chat = update.message.chat
scribe = self.getScribe(chat) reader = self.load_reader(str(chat.id))
words = update.message.text.split() words = update.message.text.split()
if len(words) <= 1: if len(words) <= 1:
update.message.reply_text("The current answer probability is {}".format(scribe.answer())) update.message.reply_text("The current answer probability is {}".format(reader.answer()))
return return
if scribe.isRestricted(): if reader.is_restricted():
user = update.message.chat.get_member(update.message.from_user.id) user = update.message.chat.get_member(update.message.from_user.id)
if not self.userIsAdmin(user): if not self.user_is_admin(user):
update.message.reply_text("You do not have permissions to do that.") update.message.reply_text("You do not have permissions to do that.")
return return
try: try:
answ = float(words[1]) answer = float(words[1])
answ = scribe.setAnswer(answ) answer = reader.set_answer(answer)
update.message.reply_text("Answer probability set to {}.".format(answ)) update.message.reply_text("Answer probability set to {}.".format(answer))
scribe.store(None) except Exception:
except: update.message.reply_text("Format was confusing; answer probability unchanged from {}.".format(reader.answer()))
update.message.reply_text("Format was confusing; answer probability unchanged from {}.".format(scribe.answer()))
def restrict(self, bot, update): # Handling /restrict command
# Toggle the restriction value if it's a group chat and the user has permissions to do so
def restrict(self, update, context):
if "group" not in update.message.chat.type: if "group" not in update.message.chat.type:
update.message.reply_text("That only works in groups.") update.message.reply_text("That only works in groups.")
return return
chat = update.message.chat chat = update.message.chat
user = chat.get_member(update.message.from_user.id) user = chat.get_member(update.message.from_user.id)
scribe = self.getScribe(chat) reader = self.load_reader(str(chat.id))
if scribe.isRestricted():
if not self.userIsAdmin(user): if reader.is_restricted():
if not self.user_is_admin(user):
update.message.reply_text("You do not have permissions to do that.") update.message.reply_text("You do not have permissions to do that.")
return return
scribe.restrict() reader.toggle_restrict()
allowed = "let only admins" if scribe.isRestricted() else "let everyone" allowed = "let only admins" if reader.is_restricted() else "let everyone"
update.message.reply_text("I will {} configure me now.".format(allowed)) update.message.reply_text("I will {} configure me now.".format(allowed))
def silence(self, bot, update): # Handling /silence command
# Toggle the silence value if it's a group chat and the user has permissions to do so
def silence(self, update, context):
if "group" not in update.message.chat.type: if "group" not in update.message.chat.type:
update.message.reply_text("That only works in groups.") update.message.reply_text("That only works in groups.")
return return
chat = update.message.chat chat = update.message.chat
user = chat.get_member(update.message.from_user.id) user = chat.get_member(update.message.from_user.id)
scribe = self.getScribe(chat) reader = self.load_reader(str(chat.id))
if scribe.isRestricted():
if not self.userIsAdmin(user): if reader.is_restricted():
if not self.user_is_admin(user):
update.message.reply_text("You do not have permissions to do that.") update.message.reply_text("You do not have permissions to do that.")
return return
scribe.silence() reader.toggle_silence()
allowed = "avoid mentioning" if scribe.isSilenced() else "mention" allowed = "avoid mentioning" if reader.is_silenced() else "mention"
update.message.reply_text("I will {} people now.".format(allowed)) update.message.reply_text("I will {} people now.".format(allowed))
def who(self, bot, update): # Handling /who command
def who(self, update, context):
msg = update.message msg = update.message
usr = msg.from_user usr = msg.from_user
cht = msg.chat cht = msg.chat
chtname = cht.title if cht.title else cht.first_name chtname = cht.title if cht.title else cht.first_name
rdr = self.access_reader(str(cht.id))
answer = ("You're **{name}**, with username `{username}`, and " answer = ("You're **{name}**, with username `{username}`, and "
"id `{uid}`.\nYou're messaging in the chat named __{cname}__," "id `{uid}`.\nYou're messaging in the chat named __{cname}__,"
" of type {ctype}, with id `{cid}`, and timestamp `{tstamp}`." " of type {ctype}, with id `{cid}`, and timestamp `{tstamp}`."
).format(name=usr.full_name, username=usr.username, ).format(name=usr.full_name, username=usr.username,
uid=usr.id, cname=chtname, cid=cht.id, uid=usr.id, cname=chtname, cid=cht.id,
ctype=scribe.type(), tstamp=str(msg.date)) ctype=rdr.ctype(), tstamp=str(msg.date))
msg.reply_markdown(answer) msg.reply_markdown(answer)
def where(self, bot, update): # Handling /where command
print("THEY'RE ASKING WHERE") def where(self, update, context):
msg = update.message msg = update.message
chat = msg.chat chat = msg.chat
scribe = self.getScribe(chat) reader = self.access_reader(str(chat.id))
if scribe.isRestricted() and scribe.isSilenced(): if reader.is_restricted() and reader.is_silenced():
permissions = "restricted and silenced" permissions = "restricted and silenced"
elif scribe.isRestricted(): elif reader.is_restricted():
permissions = "restricted but not silenced" permissions = "restricted but not silenced"
elif scribe.isSilenced(): elif reader.is_silenced():
permissions = "not restricted but silenced" permissions = "not restricted but silenced"
else: else:
permissions = "neither restricted nor silenced" permissions = "neither restricted nor silenced"
@ -299,8 +437,8 @@ class Speaker(object):
answer = ("You're messaging in the chat of saved title __{cname}__," answer = ("You're messaging in the chat of saved title __{cname}__,"
" with id `{cid}`, message count {c}, period {p}, and answer " " with id `{cid}`, message count {c}, period {p}, and answer "
"probability {a}.\n\nThis chat is {perm}." "probability {a}.\n\nThis chat is {perm}."
).format(cname=scribe.title(), cid=scribe.cid(), ).format(cname=reader.title(), cid=reader.cid(),
c=scribe.count(), p=scribe.freq(), a=scribe.answer(), c=reader.count(), p=reader.period(),
perm=permissions) a=reader.answer(), perm=permissions)
msg.reply_markdown(answer) msg.reply_markdown(answer)
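
Note on the timers introduced in this file: `should_save()` and the `NetworkError` handler both rely on `time.perf_counter()` readings truncated to whole seconds. The following is a minimal sketch of that pattern in isolation, not the code from the commit. The `Timers` class, its `clock` parameter, and `mark_saved()`/`go_mute()` are hypothetical names added for illustration, and the body of `is_mute()` is an assumption, since that method is called in `say(...)` but its definition is not shown in this diff.

```python
import time


class Timers:
    """Hypothetical helper (not in the commit) isolating the save and mute timers."""

    def __init__(self, save_time=3600, mute_time=60, clock=time.perf_counter):
        self.save_time = save_time      # seconds between periodic saves
        self.mute_time = mute_time      # seconds to stay quiet after a network error
        self.clock = clock              # injectable so the logic can be tested
        self.memory_timer = int(clock())
        self.mute_timer = None          # None means "never muted"

    def should_save(self):
        # Same comparison as should_save() in the diff: elapsed >= save_time
        elapsed = int(self.clock()) - self.memory_timer
        return elapsed >= self.save_time

    def mark_saved(self):
        # Restart the save timer once the chats have been written out
        self.memory_timer = int(self.clock())

    def go_mute(self):
        # Mirrors what the NetworkError branch does with self.mute_timer
        self.mute_timer = int(self.clock())

    def is_mute(self):
        # Assumed behaviour: muted while fewer than mute_time seconds have passed
        if self.mute_timer is None:
            return False
        return int(self.clock()) - self.mute_timer < self.mute_time
```

Passing the clock in as a parameter is only a convenience for testing; with the default argument the sketch reads `time.perf_counter()` just as the diff does.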


@ -1,8 +1,9 @@
#!/usr/bin/env python3 #!/usr/bin/env python3
import logging, argparse import logging
import argparse
from telegram.ext import Updater, CommandHandler, MessageHandler, Filters from telegram.ext import Updater, CommandHandler, MessageHandler, Filters
from telegram.error import * # from telegram.error import *
from archivist import Archivist from archivist import Archivist
from speaker import Speaker from speaker import Speaker
@ -18,7 +19,7 @@ speakerbot = None
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
# Enable logging # Enable logging
log_format="[{}][%(asctime)s]%(name)s::%(levelname)s: %(message)s".format(username.upper()) log_format = "[{}][%(asctime)s]%(name)s::%(levelname)s: %(message)s".format(username.upper())
if coloredlogsError: if coloredlogsError:
logging.basicConfig(format=log_format, level=logging.INFO) logging.basicConfig(format=log_format, level=logging.INFO)
@ -38,69 +39,109 @@ help_msg = """I answer to the following commands:
/explain - I explain how I work. /explain - I explain how I work.
/help - I send this message. /help - I send this message.
/count - I tell you how many messages from this chat I remember. /count - I tell you how many messages from this chat I remember.
/freq - Change the frequency of my messages. (Maximum of 100000) /period - Change the period of my messages. (Maximum of 100000)
/speak - Forces me to speak. /speak - Forces me to speak.
/answer - Change the probability to answer to a reply. (Decimal between 0 and 1). /answer - Change the probability to answer to a reply. (Decimal between 0 and 1).
/restrict - Toggle restriction of configuration commands to admins only. /restrict - Toggle restriction of configuration commands to admins only.
/silence - Toggle restriction on mentions by the bot. /silence - Toggle restriction on mentions by the bot.
/who - Tell general information about you and your message. For debugging purposes.
/where - Tell my configuration for this chat.
""" """
about_msg = "I am yet another Markov Bot experiment. I read everything you type to me and then spit back nonsensical messages that look like yours.\n\nYou can send /explain if you want further explanation." about_msg = "I am yet another Markov Bot experiment. I read everything you type to me and then spit back nonsensical messages that look like yours.\n\nYou can send /explain if you want further explanation."
explanation = "I decompose every message I read in groups of 3 consecutive words, so for each consecutive pair I save the word that can follow them. I then use this to make my own messages. At first I will only repeat your messages because for each 2 words I will have very few possible following words.\n\nI also separate my vocabulary by chats, so anything I learn in one chat I will only say in that chat. For privacy, you know. Also, I save my vocabulary in the form of a json dictionary, so no logs are kept.\n\nMy default frequency in private chats is one message of mine from each 2 messages received, and in group chats it\'s 10 messages I read for each message I send." explanation = "I decompose every message I read in groups of 3 consecutive words, so for each consecutive pair I save the word that can follow them. I then use this to make my own messages. At first I will only repeat your messages because for each 2 words I will have very few possible following words.\n\nI also separate my vocabulary by chats, so anything I learn in one chat I will only say in that chat. For privacy, you know. Also, I save my vocabulary in the form of a json dictionary, so no logs are kept.\n\nMy default period in private chats is one message of mine from each 2 messages received, and in group chats it\'s 10 messages I read for each message I send."
def static_reply(text, format=None): def static_reply(text, format=None):
def reply(bot, update): def reply(update, context):
update.message.reply_text(text, parse_mode=format) update.message.reply_text(text, parse_mode=format)
return reply return reply
def error(bot, update, error):
logger.warning('Update "{}" caused error "{}"'.format(update, error))
def stop(bot, update): def error(update, context):
scribe = speakerbot.getScribe(update.message.chat.id) logger.warning('The following update:\n"{}"\n\nCaused the following error:\n'.format(update))
#del chatlogs[chatlog.id] logger.exception(context.error)
#os.remove(LOG_DIR + chatlog.id + LOG_EXT) # raise error
logger.warning("I got blocked by user {} [{}]".format(scribe.title(), scribe.cid()))
def stop(update, context):
reader = speakerbot.get_reader(str(update.message.chat.id))
# del chatlogs[chatlog.id]
# os.remove(LOG_DIR + chatlog.id + LOG_EXT)
logger.warning("I got blocked by user {} [{}]".format(reader.title(), reader.cid()))
def main(): def main():
global speakerbot global speakerbot
parser = argparse.ArgumentParser(description='A Telegram markov bot.') parser = argparse.ArgumentParser(description='A Telegram markov bot.')
parser.add_argument('token', metavar='TOKEN', help='The Bot Token to work with the Telegram Bot API') parser.add_argument('token', metavar='TOKEN',
parser.add_argument('admin_id', metavar='ADMIN_ID', type=int, help='The ID of the Telegram user that manages this bot') help='The Bot Token to work with the Telegram Bot API')
parser.add_argument('-w', '--wakeup', action='store_true', help='Flag that makes the bot send a first message to all chats during wake up.') parser.add_argument('admin_id', metavar='ADMIN_ID', type=int, default=0,
help='The ID of the Telegram user that manages this bot')
parser.add_argument('-w', '--wakeup', action='store_true',
help='Flag that makes the bot send a first message to all chats during wake up.')
parser.add_argument('-f', '--filter', nargs='*', default=None, metavar='cid',
help='Zero or more chat IDs to add in a filter whitelist (default is empty, all chats allowed)')
parser.add_argument('-n', '--nicknames', nargs='*', default=[], metavar='name',
help='Any possible nicknames that the bot could answer to.')
parser.add_argument('-d', '--directory', metavar='CHATLOG_DIR', default='./chatlogs',
help='The chat logs directory path (default: "./chatlogs").')
parser.add_argument('-c', '--capacity', metavar='C', type=int, default=20,
help='The memory capacity for the last C updated chats. (default: 20).')
parser.add_argument('-m', '--mute_time', metavar='T', type=int, default=60,
help='The time (in s) for the muting period when Telegram limits the bot. (default: 60).')
parser.add_argument('-s', '--save_time', metavar='T', type=int, default=3600,
help='The time (in s) for periodic saves. (default: 3600)')
parser.add_argument('-p', '--min_period', metavar='MIN_P', type=int, default=1,
help='The minimum value for a chat\'s period. (default: 1)')
parser.add_argument('-P', '--max_period', metavar='MAX_P', type=int, default=100000,
help='The maximum value for a chat\'s period. (default: 100000)')
args = parser.parse_args() args = parser.parse_args()
# Create the EventHandler and pass it your bot's token. assert args.max_period >= args.min_period
updater = Updater(args.token)
#filterCids=["-1001036575277", "-1001040087584", str(args.admin_id)] # Create the EventHandler and pass it your bot's token.
filterCids=None updater = Updater(args.token, use_context=True)
filter_cids = args.filter
if filter_cids:
filter_cids.append(str(args.admin_id))
archivist = Archivist(logger, archivist = Archivist(logger,
chatdir="chatlogs/", chatdir=args.directory,
chatext=".vls", chatext=".vls",
admin=args.admin_id, min_period=args.min_period,
filterCids=filterCids, max_period=args.max_period,
readOnly=False read_only=False
) )
speakerbot = Speaker("velasco", "@" + username, archivist, logger, wakeup=args.wakeup) username = updater.bot.get_me().username
speakerbot = Speaker("@" + username,
archivist,
logger,
admin=args.admin_id,
cid_whitelist=filter_cids,
nicknames=args.nicknames,
wakeup=args.wakeup,
memory=args.capacity,
mute_time=args.mute_time,
save_time=args.save_time)
# Get the dispatcher to register handlers # Get the dispatcher to register handlers
dp = updater.dispatcher dp = updater.dispatcher
# on different commands - answer in Telegram # on different commands - answer in Telegram
dp.add_handler(CommandHandler("start", static_reply(start_msg) )) dp.add_handler(CommandHandler("start", static_reply(start_msg)))
dp.add_handler(CommandHandler("about", static_reply(about_msg) )) dp.add_handler(CommandHandler("about", static_reply(about_msg)))
dp.add_handler(CommandHandler("explain", static_reply(explanation) )) dp.add_handler(CommandHandler("explain", static_reply(explanation)))
dp.add_handler(CommandHandler("help", static_reply(help_msg) )) dp.add_handler(CommandHandler("help", static_reply(help_msg)))
dp.add_handler(CommandHandler("count", speakerbot.getCount)) dp.add_handler(CommandHandler("count", speakerbot.get_count))
dp.add_handler(CommandHandler("period", speakerbot.freq)) dp.add_handler(CommandHandler("period", speakerbot.period))
dp.add_handler(CommandHandler("list", speakerbot.getChats, Filters.chat(chat_id=archivist.admin))) dp.add_handler(CommandHandler("list", speakerbot.get_chats, filters=Filters.chat(chat_id=speakerbot.admin)))
#dp.add_handler(CommandHandler("user", get_name, Filters.chat(chat_id=archivist.admin))) # dp.add_handler(CommandHandler("user", get_name, Filters.chat(chat_id=archivist.admin)))
#dp.add_handler(CommandHandler("id", get_id)) # dp.add_handler(CommandHandler("id", get_id))
dp.add_handler(CommandHandler("stop", stop)) dp.add_handler(CommandHandler("stop", stop))
dp.add_handler(CommandHandler("speak", speakerbot.speak)) dp.add_handler(CommandHandler("speak", speakerbot.speak))
dp.add_handler(CommandHandler("answer", speakerbot.answer)) dp.add_handler(CommandHandler("answer", speakerbot.answer))
@ -126,5 +167,6 @@ def main():
# start_polling() is non-blocking and will stop the bot gracefully. # start_polling() is non-blocking and will stop the bot gracefully.
updater.idle() updater.idle()
if __name__ == '__main__': if __name__ == '__main__':
main() main()
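
Note on the `explanation` string above: it describes splitting each message into groups of 3 consecutive words and storing, for each consecutive pair, the words that can follow it. A rough sketch of that idea follows. It is a simplification for illustration and not the repository's own Markov implementation; the function names `learn` and `generate`, the plain whitespace tokenisation, and the `max_len` default are all assumptions.

```python
import random
from collections import defaultdict


def learn(chain, text):
    """Record, for every consecutive word pair in text, the word that follows it."""
    words = text.split()
    for first, second, third in zip(words, words[1:], words[2:]):
        chain[(first, second)].append(third)


def generate(chain, seed, max_len=20, rng=random):
    """Extend a two-word seed by repeatedly picking a known follower of the last pair."""
    out = list(seed)
    while len(out) < max_len:
        followers = chain.get((out[-2], out[-1]))
        if not followers:
            break  # no word has been seen after this pair, so stop here
        out.append(rng.choice(followers))
    return " ".join(out)


chain = defaultdict(list)
learn(chain, "the cat sat on the mat")
print(generate(chain, ("the", "cat")))
```

With a single learned message every pair has exactly one follower, so the output simply repeats the input. This matches the remark in `explanation` that the bot at first only repeats what it has read.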