Weird dovecot Error: dovecot/doveadm "failed: cache message size"

This one stumped me for a while and I only realized what was wrong after getting ready to submit a bug report to dovecot. I am putting it here since it’s not a common case and it might help some people. I found no help online when it happened to me.

Here’s what the error message I encountered while using doveadm to read a message looked like.

doveadm fetch -u user@DOMAINNAME hdr mailbox SpamCheck/Spam uid 10
doveadm(user@DOMAINNAME): Error: Mailbox SpamCheck/Spam: UID=10: 
 read(/var/spool/mail/virtual/DOMAINNAME/public/.Spam/cur/1583888147.M606102P10872.MAILDOMAIN,S=5367,W=5455:2,S) failed: 
 Cached message size larger than expected (5367 > 2476, box=SpamCheck/Spam, UID=10) (read reason=mail stream)
doveadm(user@DOMAINNAME): Error: Corrupted record in index cache file /var/spool/mail/virtual/DOMAINNAME/public/.Spam/dovecot.index.cache: 
 UID 10: Broken physical size in mailbox SpamCheck/Spam: [...]

dovecot has a nice feature where its imap module as well as all of its components (lmtp,doveadm etc) can read and write compressed email files. You would use this if you are running low on space and want to preserve some until you can upgrade. Although you can compress old email files too, it is not strictly necessary (and is a complicated process anyway) since dovecot is able to distinguish and read plain emails along with those that are compressed, regardless of the compression algorithm.

With one caveat. And this is what got me.

You have to enable the compression methods that you use for all subsystems that need it. doveadm is NOT excluded from this.

I changed all occurences of my actual domain name to DOMAINNAME in the following.

Let’s step back a bit and start over. I first search the mailbox to get the uid of a message. We’re looking for this in that particular user’s SpamCheck/Spam mailbox. (We’re using this public shared mailbox for bayes operations. I will cover public shared mailboxes with dovecot later.)

doveadm search -u user@DOMAINNAME mailbox SpamCheck/Spam
d1adce0f1722685e4a010000233c2ca8 10

For the rest of this we’ll concentrate on this one mail message with the uid discovered above. Let’s try to read its headers.

doveadm fetch -u user@DOMAINNAME hdr mailbox SpamCheck/Spam uid 10

which results in a couple of error messages that repeat a few times.

doveadm(user@DOMAINNAME): Error: Mailbox SpamCheck/Spam: UID=10: 
read(/var/spool/mail/virtual/DOMAINNAME/public/.Spam/cur/1583888147.M606102P10872.MAILDOMAIN,S=5367,W=5455:2,S) failed: 
Cached message size larger than expected (5367 > 2476, box=SpamCheck/Spam, UID=10) (read reason=mail stream)

What’s significant here is that doveadm is looking at the message size first and comparing it to what it has in the cache file for the same message. Size is denoated by the S= and is 5367 in this case. However dovecot claims that the size should have been 2476 according to its cache.

This particular message was compressed with bz2 via zlib by dovecot when it was delivered (I used first lda, later lmtp for local delivery rather than the mechanism provided by my MTA postfix in order to have a single auth database provided by dovecot and backed by LDAP).

The second error message this generates is as follows.

doveadm(user@DOMAINNAME): Error: Corrupted record in index cache file
 /var/spool/mail/virtual/DOMAINNAME/public/.Spam/dovecot.index.cache: UID 10: Broken physical size in mailbox SpamCheck/Spam:

Because the size of file does not match the size for it in the cache, dovecot concludes that the message must be corrupt and it doesn’t try to read it. But is the file actually corrupt? Let’s decompress it manually. While we’re at it we’ll actually check it’s size too.

 bzcat 1583888147.M606102P10872.MAILDOMAIN,S=5367,W=5455:2,S > /tmp/email.txt
 ls -alF /tmp/email.txt
-rw-r--r-- 1 root root 5367 Mar 11 02:08 /tmp/email.txt

As you can see the file decompresses without an error and if you look at the file size, it is the same as the S= field in the filename.

When I actually inspected this file with cat I saw that it was a regular email header. It wasn’t incorrect, or corrupted. I won’t paste that whole thing because there’s nothing interesting in it. Just a boring email header with From’s and Reply To’s and everything.

The real kicker is, this particular email message opened and displayed fine when using an IMAP client.

At one point I did have the inkling that compression support was missing somewhere but a cursory glance revealed that I had a bunch of zlib settings enabled in the config file. Besides I could actually read the emails so everything should have been configured correctly right?

Wrong!! My zlib settings where inside particular imap, lmpt protocol blocks. These only apply to said protocols' subsystems. In order for doveadm to have support for various plugins, zlib in my case, those have to be declared in the global section. i.e outside of protocol blocks.

I think this was pretty dumb of me. However I haven’t found any documentation about this particular error message where the headers were NOT actually corrupted.

It would be nice if doveadm documentation mentioned something at least in passing.

Well. Now you (I) know.


790 Words

2020-03-26 00:00 +0000

How to upgrade portainer running as a docker container

I thought it would be kind of cool if portainer could upgrade itself using its gui. Alas it falls into a chicken and egg situation and you need to step in and use the console. Here’s how to upgrade portainer safely without losing configuration data.

Update: The docker package name changed from portainer/portainer to portainer/portainer-ce. This has not been updated below.

Short Version: make sure the portainer data directory is mounted as a persistent volume and recreate the container using the latest docker image. Here’s how it’s done.

I’m going to assume you used docker without swarm when you installed your current version of portainer since I haven’t tried swarms yet.

We have to make note of the original docker command we’ve used while initially setting up the container portainer lives in. Currently the default way to do that based on the official documentation is:

   docker run -d -p 9000:9000 -p 8000:8000 --name portainer --restart always \
      -v /var/run/docker.sock:/var/run/docker.sock -v portainer_data:/data portainer/portainer

You’ll notice that I am running the docker commands as root. My environment is within an lxc (lxd managed) container and does not expose itself to the Internet. Safe enough for my tinkering purposes.

Here’s our roadmap. We’re going to stop the container. Remove it. Remove the stale image we have for it.

portainer is still running. Find it’s name if you don’t know and stop it.

   root@portainer:~# docker ps
   CONTAINER ID        IMAGE                   COMMAND                  CREATED             STATUS              PORTS                                            NAMES
   f8299de001cc        dnknth/ldap-ui:latest   "/usr/bin/hypercorn …"   11 minutes ago      Up 5 minutes        0.0.0.0:32773->5000/tcp                          ldap-ui
   cb5affb59e4d        portainer/portainer     "/portainer"             7 days ago          Up 2 days           0.0.0.0:8000->8000/tcp, 0.0.0.0:9000->9000/tcp   portainer
   root@portainer:~# docker stop portainer
   portainer
   root@portainer:~# docker ps
   CONTAINER ID        IMAGE                   COMMAND                  CREATED             STATUS              PORTS                     NAMES
   f8299de001cc        dnknth/ldap-ui:latest   "/usr/bin/hypercorn …"   11 minutes ago      Up 6 minutes        0.0.0.0:32773->5000/tcp   ldap-ui

Now that it’s stopped it’s time to remove the instance and it’s current old image.

root@portainer:~# docker rm portainer
portainer
root@portainer:~# docker rmi portainer/portainer
Untagged: portainer/portainer:latest
Untagged: portainer/portainer@sha256:026381c60682b82a863f0c3737a9b4a414beaddd4cf050477a7749ff5ac61189
Deleted: sha256:10383f5b5720d7e1f5f824137034c69b7f6d82cc8aa33afcc4e9d508b561af77
Deleted: sha256:01b8db7b5a6e256e37d9a57a6cdcdd07c33fe5051b5d21117ad4842723f68083
Deleted: sha256:dd4969f97241b9aefe2a70f560ce399ee9fa0354301c9aef841082ad52161ec5
root@portainer:~#

Good riddance. Now it’s time to get the new fresh goodies.

root@portainer:~# docker pull portainer/portainer:latest
latest: Pulling from portainer/portainer
d1e017099d17: Pull complete
a7dca5b5a9e8: Pull complete
Digest: sha256:4ae7f14330b56ffc8728e63d355bc4bc7381417fa45ba0597e5dd32682901080
Status: Downloaded newer image for portainer/portainer:latest
docker.io/portainer/portainer:latest

Now we have to start the new portainer with the same storage settings as before so it will keep the configuration details and data. Use the same command you used for the old version to launch it. In my case we’ll use the command I’ve listed at the top which is:

root@portainer:~# docker run -d -p 9000:9000 -p 8000:8000 --name portainer --restart always \
-v /var/run/docker.sock:/var/run/docker.sock -v portainer_data:/data portainer/portainer

67c671cea3b3408e76c4abe7cda0fb34ad5f733feb52283e4b22b6fb6f3f5f65

Looks good enough. Let’s see if it’s actually running.

root@portainer:~# docker ps
CONTAINER ID        IMAGE                   COMMAND                  CREATED             STATUS              PORTS                                            NAMES
67c671cea3b3        portainer/portainer     "/portainer"             15 minutes ago      Up 15 minutes       0.0.0.0:8000->8000/tcp, 0.0.0.0:9000->9000/tcp   portainer
f8299de001cc        dnknth/ldap-ui:latest   "/usr/bin/hypercorn …"   28 minutes ago      Up 22 minutes       0.0.0.0:32773->5000/tcp                          ldap-ui

It is indeed. In my case when I went to the web inteface everything was up and running with the same config as before using the latest version of portainer. But I’m generally a lucky person. I wish you the same kind of luck.