Import multiple CAMT-Files in one step

mstma · January 21, 2025, 3:43pm

I get a bunch of CAMT files from the bank (one per day), and importing each file manually is tedious. So I asked ChatGPT, which delivered the script below. It takes several CAMT files combined in a ZIP archive and merges them into one single CAMT file. It also strips the “.0” at the end of some timestamp entries, which Tryton cannot deal with.

What do you think? Would the best approach be

- to create a separate module,
- to integrate such code into the account_statement_sepa module,
- or is there yet another, better solution?

Greetings,
Michael

Here is the script generated by ChatGPT:

#! /usr/bin/env python3

import os
import zipfile
import xml.dom.minidom as minidom
from xml.dom.minidom import Document
from datetime import datetime
import re

def extract_zip_files(input_folder):
    for filename in os.listdir(input_folder):
        if filename.endswith(".zip"):
            zip_path = os.path.join(input_folder, filename)
            with zipfile.ZipFile(zip_path, 'r') as zip_ref:
                zip_ref.extractall(input_folder)

def merge_camt053_files(input_folder, output_file):
    doc = Document()
    root = doc.createElement("Document")
    root.setAttribute("xmlns", "urn:iso:std:iso:20022:tech:xsd:camt.053.001.02")
    doc.appendChild(root)

    bk_to_cstmr_stmt = doc.createElement("BkToCstmrStmt")
    root.appendChild(bk_to_cstmr_stmt)

    for filename in os.listdir(input_folder):
        if filename.endswith(".xml"):
            filepath = os.path.join(input_folder, filename)
            parse_and_merge(filepath, bk_to_cstmr_stmt, doc)

    with open(output_file, "w", encoding="utf-8") as f:
        f.write(doc.toprettyxml(indent="  "))

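def parse_and_merge(filepath, target_node, doc):
    dom = minidom.parse(filepath)
    statements = dom.getElementsByTagName("BkToCstmrStmt")

    for stmt in statements:
        # Copy every element child (GrpHdr as well as Stmt) into the merged
        # document, so the result holds one GrpHdr per input file.
        for child in stmt.childNodes:
            if child.nodeType == Document.ELEMENT_NODE:
                clean_time_elements(child)
                # deep=True imports the element together with its subtree.
                imported_stmt = doc.importNode(child, True)
                target_node.appendChild(imported_stmt)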

def clean_time_elements(node):
    for element in node.getElementsByTagName("FrDtTm"):
        clean_time_format(element)
    for element in node.getElementsByTagName("ToDtTm"):
        clean_time_format(element)
    for element in node.getElementsByTagName("CreDtTm"):
        clean_time_format(element)

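def clean_time_format(element):
    # Only handle elements whose first child is the timestamp text itself.
    if element.firstChild and element.firstChild.nodeType == element.TEXT_NODE:
        original_value = element.firstChild.nodeValue
        try:
            # Drop fractional seconds such as ".0" but keep an optional
            # UTC offset so that fromisoformat() can parse the value.
            cleaned_value = re.sub(r"(\.\d+)([+-]\d+:\d+)?", r"\2", original_value)
            dt = datetime.fromisoformat(cleaned_value)
            # The output format has no offset, so a UTC offset in the input
            # is discarded (not converted).
            element.firstChild.nodeValue = dt.strftime("%Y-%m-%dT%H:%M:%S")
        except ValueError:
            # Leave values that are not ISO timestamps unchanged.
            pass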

if __name__ == "__main__":
    input_folder = "./input_files"
    output_file = "merged_camt053.xml"

    if not os.path.exists(input_folder):
        print(f"Folder {input_folder} not found.")
    else:
        # Extract ZIP files before processing XML files
        extract_zip_files(input_folder)
        merge_camt053_files(input_folder, output_file)
        print(f"Finished merging. Result saved in {output_file}")
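
To make the timestamp cleanup easier to check in isolation, here is a minimal sketch that applies the same regular expression as clean_time_format() to plain strings. The helper name clean_timestamp and the sample values are made up for illustration; they are not taken from a real bank file.

```python
import re
from datetime import datetime

# Same pattern as in clean_time_format(): drop fractional seconds, keep an
# optional UTC offset so that fromisoformat() can still parse the value.
FRACTION = re.compile(r"(\.\d+)([+-]\d+:\d+)?")


def clean_timestamp(value):
    """Return value without fractional seconds and without UTC offset."""
    cleaned = FRACTION.sub(r"\2", value)
    try:
        parsed = datetime.fromisoformat(cleaned)
    except ValueError:
        # Not an ISO timestamp: return it untouched, as the script does.
        return value
    return parsed.strftime("%Y-%m-%dT%H:%M:%S")


# Hypothetical sample values, for illustration only.
samples = [
    "2025-01-20T13:45:10.0",
    "2025-01-20T13:45:10.0+01:00",
    "2025-01-20T13:45:10",
    "not a timestamp",
]
for sample in samples:
    print(sample, "->", clean_timestamp(sample))
```

Note that an offset such as +01:00 is simply dropped rather than converted, so the merged file loses the timezone information of the original timestamps.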

ced · January 21, 2025, 5:11pm

Could they not generate a single file? The SEPA standard says that a file should contain the transactions since the last requested file.

Could you please file an issue at https://bugs.tryton.org/ with an example of the failing format?

mstma · January 22, 2025, 7:20pm

The issue occurred some time ago. The import now works without errors, so please disregard it.

Eulengesicht · January 28, 2025, 3:08pm

Hello, I came across the same issue. I asked my bank (Sparkasse) and another one (Volksbank); both only provide ZIP archives containing a number of daily statement XML files.

For that reason:
Thanks to @mstma for this useful script, which I’ll try; but also a kind request to the Tryton makers to implement batch processing for this. At least the Germans will be grateful.

ced · January 28, 2025, 3:45pm

I think a MultiBinary field could be developed, to be used only in wizards, which would behave like an <input type="file" multiple="true"/>.

Eulengesicht · February 2, 2025, 2:04pm

Thank you for your reply. Not being a programmer, I do not know what it means, but never mind.

“Sparkasse” and “Volksbank” are two (of three) major players in the German market, with roughly a 65% share of the business account market. IMHO, for their customers (I don’t know about the situation with private banks), Tryton’s CAMT import feature is currently of limited use.

So: Should I file an issue on this?

EG

skadlec · February 27, 2025, 10:12am

Hello, I am giving Tryton another try these days. It is getting better and better, which is very nice to see. Thanks a lot to everyone contributing.
I came across the same issue: files from the German bank Sparkasse are zipped CAMT XML or CSV files.
The fastest way for me is this patch for the account_statement_sepa module:

6d5
< from zipfile import is_zipfile, ZipFile
33,58c32,47
<
<         if not is_zipfile(file_):
<             file_.close()
<             file_ = BytesIO()
<             with ZipFile(file_, 'a') as camtzip:
<                 camtzip.writestr('camt.xml', self.start.file_)
<
<         with ZipFile(file_) as camtzip:
<             for camtfile in camtzip.namelist():
<                 with camtzip.open(camtfile) as camtxml:
<                     tree = etree.parse(camtxml)
<                     root = tree.getroot()
<                     namespaces = dict(root.nsmap)
<                     namespaces['ns'] = namespaces.pop(None)
<                     for camt_statement in root.xpath(
<                             './ns:BkToCstmrStmt/ns:Stmt | '
<                             './ns:BkToCstmrAcctRpt/ns:Rpt | '
<                             './ns:BkToCstmrDbtCdtNtfctn/ns:Ntfctn', namespaces=namespaces):
<                         statement = self.camt_statement(camt_statement)
<                         origins = []
<                         for entry in camt_statement.iterfind('./{*}Ntry'):
<                             origins.extend(self.camt_origin(camt_statement, entry))
<                         if origins:
<                             statement.number_of_lines = len(origins)
<                             statement.origins = origins
<                             yield statement
---
>         tree = etree.parse(file_)
>         root = tree.getroot()
>         namespaces = dict(root.nsmap)
>         namespaces['ns'] = namespaces.pop(None)
>         for camt_statement in root.xpath(
>                 './ns:BkToCstmrStmt/ns:Stmt | '
>                 './ns:BkToCstmrAcctRpt/ns:Rpt | '
>                 './ns:BkToCstmrDbtCdtNtfctn/ns:Ntfctn', namespaces=namespaces):
>             statement = self.camt_statement(camt_statement)
>             origins = []
>             for entry in camt_statement.iterfind('./{*}Ntry'):
>                 origins.extend(self.camt_origin(camt_statement, entry))
>             if origins:
>                 statement.number_of_lines = len(origins)
>                 statement.origins = origins
>                 yield statement

I would like to implement this feature for German users: importing the ZIP file downloaded from the bank without further processing.
What is the suggested way to integrate this functionality in Tryton?
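
The idea behind the patch (accept either a ZIP archive or a single CAMT file and walk every XML document in it) can be tried outside Tryton with a minimal sketch like the one below. It uses only the standard library instead of lxml, and the names iter_camt_documents, count_entries and upload.xml are hypothetical, chosen for illustration. It is not the module's code and it does no size checking at all.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def iter_camt_documents(data):
    """Yield (name, root element) for every XML document in data.

    data is the raw upload as bytes: a ZIP archive or a single CAMT file.
    """
    buffer = io.BytesIO(data)
    if zipfile.is_zipfile(buffer):
        buffer.seek(0)
        with zipfile.ZipFile(buffer) as archive:
            for name in sorted(archive.namelist()):
                if not name.lower().endswith(".xml"):
                    continue  # skip CSV files, directories, etc.
                with archive.open(name) as member:
                    yield name, ET.parse(member).getroot()
    else:
        # Placeholder name for a plain, unzipped upload.
        yield "upload.xml", ET.fromstring(data)


def count_entries(data):
    """Return {document name: number of Ntry elements} for an upload."""
    # The "{*}" namespace wildcard needs Python 3.8 or later.
    return {
        name: len(root.findall(".//{*}Ntry"))
        for name, root in iter_camt_documents(data)
    }


# Tiny made-up document, only to exercise both code paths.
NS = "urn:iso:std:iso:20022:tech:xsd:camt.053.001.02"
SAMPLE = (
    f'<Document xmlns="{NS}"><BkToCstmrStmt><Stmt>'
    "<Ntry/><Ntry/></Stmt></BkToCstmrStmt></Document>"
).encode("utf-8")

archive_bytes = io.BytesIO()
with zipfile.ZipFile(archive_bytes, "w") as archive:
    archive.writestr("day1.xml", SAMPLE)
    archive.writestr("day2.xml", SAMPLE)

print(count_entries(SAMPLE))
print(count_entries(archive_bytes.getvalue()))
```

Unlike the patch, this sketch does not wrap a plain file into a temporary archive; it branches on is_zipfile() instead, which keeps the single-file path unchanged.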

ced · February 27, 2025, 10:15am

The main problem is that zipfile is not safe against ZIP bombs.

So the proper way is to implement the MultiBinary field.
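
To illustrate the concern, here is a rough sketch of the kind of limits any code that accepts uploaded ZIP archives would have to enforce by itself, because zipfile does not do it. The limits, the exception class and the function name are arbitrary assumptions for illustration, not Tryton values or APIs, and the sketch does not claim to cover every attack (nested archives, for example, are ignored).

```python
import io
import zipfile

# Arbitrary illustrative limits, not values used by Tryton.
MAX_MEMBERS = 100
MAX_MEMBER_SIZE = 10 * 1024 * 1024  # 10 MiB per extracted file
MAX_TOTAL_SIZE = 50 * 1024 * 1024   # 50 MiB per archive


class ArchiveTooLarge(Exception):
    """Raised when an archive exceeds one of the configured limits."""


def read_members_bounded(data, max_members=MAX_MEMBERS,
                         max_member_size=MAX_MEMBER_SIZE,
                         max_total_size=MAX_TOTAL_SIZE):
    """Return {member name: bytes}, refusing oversized archives."""
    contents = {}
    total = 0
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        infos = archive.infolist()
        if len(infos) > max_members:
            raise ArchiveTooLarge("too many members")
        # First check: the uncompressed sizes declared in the archive.
        if sum(info.file_size for info in infos) > max_total_size:
            raise ArchiveTooLarge("declared size exceeds limit")
        for info in infos:
            if info.is_dir():
                continue
            with archive.open(info) as member:
                # Second check: never decompress more than limit + 1 bytes,
                # whatever the header claims.
                chunk = member.read(max_member_size + 1)
            if len(chunk) > max_member_size:
                raise ArchiveTooLarge(f"{info.filename} exceeds limit")
            total += len(chunk)
            if total > max_total_size:
                raise ArchiveTooLarge("extracted size exceeds limit")
            contents[info.filename] = chunk
    return contents


# Usage: a highly compressible member trips the per-member limit.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("small.xml", b"<a/>")
    archive.writestr("big.xml", b"0" * 1_000_000)

try:
    read_members_bounded(buffer.getvalue(), max_member_size=1000)
except ArchiveTooLarge as error:
    print("rejected:", error)
```

With a MultiBinary field the server would receive each XML file separately and would not need to decompress anything, which avoids this class of checks altogether.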