Move Mail into Monthly IMAP4 Folders with Python
Posted on December 6, 2017  (Last modified on December 13, 2022 )
Recently, I had to cope with a very large mail archive in a company. The email account collects all kinds of automatic messages and status reports, which have to be archived for legal reasons. Until now, an employee moved about 40,000 mails by hand each month. It is impressive that Firefox has no problems with such numbers, but it takes quite a while, and it is pretty boring to watch the program do this when there are better things to do in the meantime.
So I searched the web for a few hints and wrote a small Python script that will log in via IMAP and move mails to the correct monthly folder.
Configuration File
First, create a configuration file, e.g. my_domain.com.ini. It should contain something along these lines:
[server]
hostname: imap.server.com
[account]
username: login
password: password
Fill in the hostname of your IMAP server, and include the username and password for the account. The script below will log in via TLS/SSL. If you want to use an insecure connection, you will have to change a few lines.
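As a minimal sketch of how such a file can be read, the following parses the same layout from a string instead of a file. The `ssl` option and the `read_settings` helper are hypothetical additions for illustration; the script below does not know them. Interpolation is switched off here on the assumption that a password may contain a `%` character, which configparser would otherwise try to expand.

```python
import configparser

# Sample configuration in the layout shown above, plus a hypothetical 'ssl' option
SAMPLE = """
[server]
hostname: imap.server.com
ssl: no

[account]
username: login
password: password
"""


def read_settings(text):
    """Parse INI text and return the connection settings as a dict."""
    # interpolation=None keeps '%' characters in passwords untouched
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(text)
    return {
        'hostname': config.get('server', 'hostname'),
        # 'ssl' is a hypothetical option; it defaults to True when absent
        'use_ssl': config.getboolean('server', 'ssl', fallback=True),
        'username': config.get('account', 'username'),
        'password': config.get('account', 'password'),
    }


settings = read_settings(SAMPLE)
# With use_ssl True one would connect via imaplib.IMAP4_SSL, otherwise imaplib.IMAP4
print(settings['hostname'], settings['use_ssl'])
```

Making the secure connection the default means that a configuration file without the extra option behaves exactly like the one shown above.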
The Script
The script imap_folder_per_month.py is shown here:
#!/usr/bin/python3
import configparser
import datetime
import email.utils
import imaplib
import os
import re
import sys

list_response_pattern = re.compile(r'\((?P<flags>.*?)\) "(?P<delimiter>.*)" (?P<name>.*)')


def parse_list_response(line):
    flags, delimiter, mailbox_name = list_response_pattern.match(line.decode()).groups()
    mailbox_name = mailbox_name.strip('"')
    return (flags, delimiter, mailbox_name)


def open_connection(config_file, verbose=False):
    # Read the config file
    config = configparser.ConfigParser()
    config.read([os.path.expanduser(config_file)])
    # Connect to the server
    hostname = config.get('server', 'hostname')
    if verbose:
        print('Connecting to', hostname)
    connection = imaplib.IMAP4_SSL(hostname)
    # Log in to our account
    username = config.get('account', 'username')
    password = config.get('account', 'password')
    if verbose:
        print('Logging in as', username)
    connection.login(username, password)
    return connection


if __name__ == '__main__':
    if len(sys.argv) < 2:
        print('Pass the name of the config file as first parameter, please!')
        sys.exit(1)
    c = open_connection(sys.argv[1], verbose=True)
    try:
        # Collect the names of all existing mailboxes
        mailboxes = []
        typ, data = c.list()
        for line in data:
            flags, delimiter, mailbox_name = parse_list_response(line)
            mailboxes.append(mailbox_name)
        print(mailboxes)
        # Select the inbox and fetch the numbers of all messages
        typ, data = c.select('INBOX')
        num_msgs = int(data[0])
        print('There are %d messages in INBOX' % num_msgs)
        typ, msg_ids = c.search(None, 'ALL')
        # Highest number first; split() returns [] for an empty inbox
        msg_ids = msg_ids[0].decode().split()[::-1]
        for msg_id in msg_ids:
            typ, msg_data = c.fetch(msg_id, '(RFC822)')
            # Parse the raw bytes; decoding to str first fails on non-UTF-8 mails
            msg = email.message_from_bytes(msg_data[0][1])
            date_tuple = email.utils.parsedate_tz(msg['Date'])
            if date_tuple is None:
                print('Skipping #' + msg_id + ': missing or unparsable Date header')
                continue
            # Convert to a local datetime
            local_date = datetime.datetime.fromtimestamp(email.utils.mktime_tz(date_tuple))
            to_mailbox = 'INBOX/' + local_date.strftime("%Y-%m")
            print('Parsing #' + msg_id + ' ' + local_date.strftime("%Y-%m-%d"))
            if to_mailbox not in mailboxes:
                print(to_mailbox + ' created')
                typ, create_response = c.create(to_mailbox)
                mailboxes.append(to_mailbox)
            # Move message to mailbox: copy first, delete only if the copy succeeded
            typ, copy_response = c.copy(msg_id, to_mailbox)
            if typ == 'OK':
                c.store(msg_id, '+FLAGS', r'(\Deleted)')
            else:
                print('Could not copy #' + msg_id + ' to ' + to_mailbox)
        # Remove the messages flagged as deleted
        c.expunge()
    finally:
        c.close()
        c.logout()
The script reads the name of the configuration file as first parameter of the command line and fetches the login data. It will then attempt to log in (the script will fail in an ugly way if something goes wrong).
In the main part, the script first fetches the list of all folders of the account, including those below INBOX. We do this first because we want to know which monthly folders have already been created (we need subfolders named 2017-10, 2017-11, 2017-12, etc.). Then the mails in the INBOX are read, highest message number first (to cope with incoming mails while we are at it).
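To see what these two steps produce without a server, here is a small offline sketch. The sample lines are made up in the shape imaplib returns from list() and search(); a real server may quote names differently or use another hierarchy delimiter than "/".

```python
import re

# Same pattern as in the script above
list_response_pattern = re.compile(r'\((?P<flags>.*?)\) "(?P<delimiter>.*)" (?P<name>.*)')


def parse_list_response(line):
    """Split one raw LIST line into (flags, delimiter, mailbox_name)."""
    flags, delimiter, name = list_response_pattern.match(line.decode()).groups()
    return flags, delimiter, name.strip('"')


# Made-up sample data, not taken from a real server
sample_list = [
    b'(\\HasChildren) "/" "INBOX"',
    b'(\\HasNoChildren) "/" "INBOX/2017-10"',
    b'(\\HasNoChildren) "/" "INBOX/2017-11"',
]
sample_search = [b'1 2 3 4']

# Keep only the mailbox names, as the script does
mailboxes = [parse_list_response(line)[2] for line in sample_list]
# Reverse the message numbers so that the highest one comes first
newest_first = sample_search[0].decode().split()[::-1]

print(mailboxes)
print(newest_first)
```

The second element of each parsed tuple is the hierarchy delimiter reported by the server, which could be used instead of a hard-coded "/" when building folder names.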
For each mail, we parse the Date header, take its time zone offset into account, and convert the result to the local time of the machine running the script (instead of local time, you could use UTC, of course). So we end up with a local date for each mail and can move it to the right folder.
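The UTC variant mentioned above could look like the following sketch. The helper name `monthly_folder` and its `delimiter` parameter are assumptions for illustration and not part of the script; the sketch also shows that a mail sent shortly after midnight in another time zone can land in the previous month's folder.

```python
import datetime
import email.utils


def monthly_folder(date_header, delimiter='/'):
    """Return the UTC-based monthly folder for a Date header, or None."""
    date_tuple = email.utils.parsedate_tz(date_header)
    if date_tuple is None:
        return None  # header missing or not parsable
    # mktime_tz applies the time zone offset and yields a UTC timestamp
    timestamp = email.utils.mktime_tz(date_tuple)
    utc_date = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)
    return 'INBOX' + delimiter + utc_date.strftime('%Y-%m')


print(monthly_folder('Wed, 06 Dec 2017 10:15:00 +0100'))  # INBOX/2017-12
# 00:30 at +0100 is still 23:30 on December 31 in UTC
print(monthly_folder('Mon, 01 Jan 2018 00:30:00 +0100'))  # INBOX/2017-12
```

Using UTC makes the result independent of the time zone of the machine that runs the script, at the price of folder boundaries that may not match the local calendar.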
The script checks for existing folders and creates missing ones. Finally, each mail is copied to its monthly folder and marked as deleted in the INBOX. After all mails have been processed, the mailbox is emptied of the deleted mails (expunge in IMAP).
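Copying and deleting are two separate IMAP commands, and imaplib reports a refused copy through a status such as 'NO' rather than an exception. The sketch below shows one cautious way to combine them; `move_message` and `FakeConnection` are hypothetical names, and the fake class only imitates the two imaplib methods used here so that the logic can be tried offline.

```python
def move_message(connection, msg_id, to_mailbox):
    """Copy a message and flag the original as deleted only if the copy worked."""
    typ, _ = connection.copy(msg_id, to_mailbox)
    if typ != 'OK':
        return False
    connection.store(msg_id, '+FLAGS', r'(\Deleted)')
    return True


class FakeConnection:
    """Hypothetical stand-in for imaplib.IMAP4_SSL, for offline experiments."""

    def __init__(self, existing):
        self.existing = set(existing)
        self.deleted = []

    def copy(self, msg_id, mailbox):
        # Refuse the copy when the target mailbox does not exist
        ok = mailbox in self.existing
        return ('OK' if ok else 'NO'), [b'']

    def store(self, msg_id, command, flags):
        self.deleted.append(msg_id)
        return 'OK', [b'']


conn = FakeConnection(['INBOX/2017-12'])
print(move_message(conn, '7', 'INBOX/2017-12'))
print(move_message(conn, '8', 'INBOX/2018-01'))
print(conn.deleted)
```

With this check, a mail whose copy failed stays in the INBOX and is picked up again on the next run instead of being expunged.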
Running python3 imap_folder_per_month.py my_domain.com.ini should be sufficient to start the script. All modules it imports (configparser, datetime, email, imaplib, os, re, sys) are part of the Python standard library, so nothing has to be installed via pip. If Python 3 itself is missing on your system, please check the documentation of your operating system and Python installer to see how to install it.
Automatic Movement of Mails with Cron
Once you have checked that everything works, you can create a cron job to move your mails regularly. Depending on how many mails the account receives, it makes sense to do this hourly or even every minute. On a Unix-like system, adding a line like the following with crontab -e should work:
*/15 * * * * /path/to/imap_folder_per_month.py /path/to/my_domain.com.ini > /dev/null
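In this example, the script is called every 15 minutes. You have to adjust the path names, naturally. Because cron starts the script directly, the file must be executable (chmod +x imap_folder_per_month.py). Note that > /dev/null discards only the normal output; error messages are not redirected.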
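With short intervals, a run may still be busy when cron starts the next one. The script above does not guard against this; the following is a minimal sketch of one possible guard for Unix-like systems, using a lock file. The helper `acquire_lock` and the lock file location are assumptions for illustration.

```python
import fcntl
import os
import tempfile


def acquire_lock(path):
    """Return an open file holding an exclusive lock, or None if already locked."""
    handle = open(path, 'w')
    try:
        # LOCK_NB makes flock fail immediately instead of waiting
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


# Hypothetical lock file location
lock_path = os.path.join(tempfile.gettempdir(), 'imap_folder_per_month.lock')
lock = acquire_lock(lock_path)
if lock is None:
    print('Another run is still active, exiting')
else:
    print('Lock acquired, safe to start moving mails')
    # The mail-moving code would run here; the lock is released when the
    # file is closed or the process exits
    lock.close()
```

Because the operating system drops the lock when the process ends, a crashed run does not block later ones.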
Finally, no more mail monster movements each month!