Purge files in the filestore of deleted attachments


#1

Hello.

Once in a while, I would like to erase files from the filestore whose attachment record has been deleted.

No such job seems to exist (I inspected the trytond-con code). Did I miss something, or do I have to write a script comparing ir_attachment with the filestore content?

Thank you.


(Cédric Krier) #2

There is no such tool in standard because such a tool would not be transactional.
Indeed, you should be careful with such cleaning and think twice about whether you really need it, because the filestore is used not only by attachments but also by various Binary fields in modules.


#3

Thank you for this warning.

I still think such a feature would be useful: one of my users often updates or replaces a lot of big files in attachments. The filestore backup archive will soon be ridiculously huge.

A quick and dirty solution for my case was to add this method to ir.attachment:

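import os

from trytond.config import config
from trytond.filestore import filestore

# Inside the class extending ir.attachment (named Attachment here):
@classmethod
def delete(cls, records):
    # [attachment] purge_deleted = True -> remove the file from the filestore
    if config.getboolean('attachment', 'purge_deleted', default=False):
        prefix = config.get('attachment', 'store_prefix', default=None)
        for record in records:
            if record.file_id is not None:
                # _filename is a private helper of the standard FileStore
                path = filestore._filename(record.file_id, prefix)
                os.remove(path)
    super(Attachment, cls).delete(records)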

There is still room for improvement, as it doesn’t delete empty directories.

It would be nice to implement it for any deleted or updated Binary field, but I’m not sure yet how to intercept modification or deletion on the Binary object…


(Sebastien Marie) #4

The code makes several assumptions about the class used for the FileStore. If you are using a non-standard FileStore (to store data remotely on S3, for example), it will not work.


(albert) #5

Note that the transaction may roll back after you removed the file, so the rollback would not be complete because the file would no longer be in the filestore.

Maybe a more robust solution would be to move the file to a special directory in the filestore named “removed” or something like that. At the end of the transaction, delete the moved files if it succeeds and move them back if it fails. This method has a small probability of misbehaving: if another transaction concurrently created a file whose name collides with the one moved to “removed”, moving it back would overwrite the new file.
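A minimal sketch of this "move aside, then delete or restore" idea, using only the Python standard library. The class name `PendingRemoval` and its methods are hypothetical, nothing here is Tryton API, and the caller is assumed to invoke `commit()` or `rollback()` once the outcome of the transaction is known. On rollback the sketch skips the restore instead of overwriting when a file with the same name has reappeared, which is one possible way to handle the collision described above.

```python
import os
import shutil


class PendingRemoval:
    """Park files in a 'removed' directory until the transaction ends."""

    def __init__(self, store_root, removed_dirname="removed"):
        self.removed_dir = os.path.join(store_root, removed_dirname)
        self._moved = []  # (original_path, parked_path) pairs

    def remove(self, path):
        # Move the file aside instead of deleting it right away.
        os.makedirs(self.removed_dir, exist_ok=True)
        parked = os.path.join(self.removed_dir, os.path.basename(path))
        shutil.move(path, parked)
        self._moved.append((path, parked))

    def commit(self):
        # The transaction succeeded: the parked files can really go.
        for _original, parked in self._moved:
            os.remove(parked)
        self._moved.clear()

    def rollback(self):
        # The transaction failed: put the files back where they were.
        skipped = []
        for original, parked in reversed(self._moved):
            if os.path.exists(original):
                # A concurrent transaction recreated the name; keep the
                # parked copy rather than overwrite the new file.
                skipped.append(original)
                continue
            shutil.move(parked, original)
        self._moved.clear()
        return skipped
```

The list returned by `rollback()` names the files that could not be restored, so that they can be inspected by hand.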

Another option is to create a model of “files to be removed” and have a cron process remove them. Since the cron job runs in another transaction, it only sees the records once they are committed, which means the files can be removed.
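The queue idea can be illustrated without Tryton. In the sketch below, `sqlite3` stands in for the real database, the table `file_to_remove` stands in for the model, and `purge()` stands in for the cron method; all of these names are hypothetical. The point it tries to show is that a removal scheduled in a transaction that rolls back never becomes visible to the purge job.

```python
import os
import sqlite3


def open_db(path):
    """Open a connection and make sure the queue table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_to_remove (path TEXT PRIMARY KEY)")
    conn.commit()
    return conn


def schedule_removal(conn, path):
    """Called inside the business transaction instead of os.remove()."""
    conn.execute(
        "INSERT OR IGNORE INTO file_to_remove (path) VALUES (?)", (path,))


def purge(conn):
    """Cron-style job: it only sees rows from committed transactions."""
    rows = conn.execute("SELECT path FROM file_to_remove").fetchall()
    removed = []
    for (path,) in rows:
        if os.path.exists(path):
            os.remove(path)
        conn.execute("DELETE FROM file_to_remove WHERE path = ?", (path,))
        removed.append(path)
    conn.commit()
    return removed
```

A real job would also have to check that no other field still references the file before deleting it.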

For me, it is worth having a robust method for removing attachments.


#6

OK, this solution was pretty lame… I now better understand Cédric’s comment.
Sorry.


(Cédric Krier) #7

Yes, such FS operations need to be performed inside a data manager to be transactional.

There is another point that makes filestore removal more complicated. If the same file is used by different fields, it is stored only once in the FS. So before removing a file you must check that no field is still using it.
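A rough sketch of what such a data manager could look like. The method names (`abort`, `tpc_begin`, `commit`, `tpc_vote`, `tpc_finish`, `tpc_abort`) follow the two-phase protocol as I understand trytond calls it on objects registered with `Transaction().join(...)`; treat that as an assumption and check it against your trytond version. The class name `FileRemovalDataManager` and its `remove` method are hypothetical. The files are only deleted in `tpc_finish`, that is after the database commit went through.

```python
import os


class FileRemovalDataManager:
    """Defer file deletion until the transaction is really committed."""

    def __init__(self):
        self._pending = []

    def remove(self, path):
        # Only record the request; nothing touches the disk yet.
        self._pending.append(path)

    def abort(self, trans):
        self._pending.clear()

    def tpc_begin(self, trans):
        pass

    def commit(self, trans):
        pass

    def tpc_vote(self, trans):
        # Last chance to veto the commit by raising; nothing to check here.
        pass

    def tpc_finish(self, trans):
        # The database commit succeeded: now the files can be deleted.
        for path in self._pending:
            try:
                os.remove(path)
            except FileNotFoundError:
                pass
        self._pending.clear()

    def tpc_abort(self, trans):
        # Rollback: forget the requests, the files were never touched.
        self._pending.clear()


# Assumed usage inside a Tryton transaction (not executed here):
#   dm = Transaction().join(FileRemovalDataManager())
#   dm.remove(path)
```

Note that joining a new instance on every call would need an `__eq__` so that the transaction reuses a single manager; that part is left out of the sketch.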
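To illustrate why the check is needed: if the store derives the file id from the content, two fields holding the same bytes end up with the same id. In the sketch below the helper names are hypothetical and SHA-256 is an arbitrary choice for the illustration; the hash actually used by your FileStore may differ.

```python
import hashlib


def file_id(data):
    """Content-derived id: identical bytes always give the same id."""
    return hashlib.sha256(data).hexdigest()


def removable_ids(released_ids, ids_still_in_use):
    """Keep only the released ids that no remaining field references."""
    in_use = set(ids_still_in_use)
    return [fid for fid in released_ids if fid not in in_use]
```

Here `released_ids` would be the ids of the deleted records and `ids_still_in_use` the ids collected from every remaining Binary field.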


#8

Still thinking about a solution…

I created a wizard performing the following steps:

  1. Retrieve every model/field of binary type, e.g. Pool().get('ir.model.field').search([('ttype', '=', 'binary')])
  2. Go through every record of each model/field that has the file_id attribute and collect the file ids in a set
  3. Go through every file in the FileStore directory. If its id is not in the set, delete it.
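Step 3 can be sketched with the standard library alone. This assumes that the set of referenced ids from steps 1 and 2 is already available and that each file in the store is named after its file id, which is how I understand the standard FileStore lays out its files. The function names and the `dry_run` flag are hypothetical.

```python
import os


def find_orphans(store_root, referenced_ids):
    """Return the paths of stored files whose name is not a referenced id."""
    referenced = set(referenced_ids)
    orphans = []
    for dirpath, _dirnames, filenames in os.walk(store_root):
        for name in filenames:
            if name not in referenced:
                orphans.append(os.path.join(dirpath, name))
    return sorted(orphans)


def purge_orphans(store_root, referenced_ids, dry_run=True):
    """List the orphans and delete them only when dry_run is False."""
    orphans = find_orphans(store_root, referenced_ids)
    if not dry_run:
        for path in orphans:
            os.remove(path)
    return orphans
```

Running it with `dry_run=True` first and reviewing the list seems a sensible precaution, since a file written between the collection of the ids and the scan would look like an orphan.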

I’m still uneasy about the transactional pitfalls, but if my procedure is triggered when no user is connected, would it be fine?
Also, I’m aware this solution only works for the standard FileStore. Maybe file operations (listing, deleting) should be implemented at the level of this object…

Any chance I’m still misunderstanding something and will screw up my files?