Extract information from a PDF invoice

The AI-powered Invoice Extract product converts a PDF invoice into a FatturaPA XML invoice.

This is useful in Italy, where any cross-border invoice received from a foreign supplier must be transformed into a valid FatturaPA XML self-invoice. The document type in this case must be one of:

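TD17		integrazione/autofattura per acquisto servizi dall'estero (integration/self-invoice for the purchase of services from abroad)
TD18		integrazione per acquisto di beni intracomunitari (integration for the purchase of intra-Community goods)
TD19		integrazione/autofattura per acquisto di beni ex art.17 c.2 DPR 633/72 (integration/self-invoice for the purchase of goods under art. 17 c.2 DPR 633/72)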

Ask to get access

Write to business@a-cube.io to activate the product in the sandbox and to request a quote.

Note: the sandbox environment is limited by default to 10 document conversions.

Operations flow

  1. POST /invoice-extract to provide the PDF file. Example:
    curl --location 'https://api-sandbox.acubeapi.com/invoice-extract' \
    --header 'Authorization: Bearer YOUR TOKEN HERE' \
    --form 'file=@"/path/to/file.pdf"'

    You will get a JSON response with the job details:

    {
      "uuid": "unique job identifier",
      "acquisition_date": "date time",
      "filename": "string",
      "job_status": "waiting|success|error",
      "pages": null
    }
  2. GET /invoice-extract/{uuid} to get the status of the job. When the job_status field switches to success, you can proceed to obtain the XML result.
  3. GET /invoice-extract/{uuid}/result to obtain the XML file. Note: you can get the invoice in either XML or JSON format. The Accept header must be set to application/xml or application/json.
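
The polling part of the flow above can be sketched in Python as follows. This is a minimal illustration, not an official client: the helper names (fetch_job_status, wait_for_job, build_result_request), the polling interval, and the attempt limit are all assumptions. Only the endpoints, the job_status values, and the Accept header come from the description above.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_job_status(base_url: str, token: str, uuid: str) -> str:
    """GET /invoice-extract/{uuid} and return the job_status field."""
    req = urllib.request.Request(
        f"{base_url}/invoice-extract/{uuid}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["job_status"]


def wait_for_job(
    fetch_status: Callable[[], str],
    interval: float = 2.0,      # assumed polling interval, in seconds
    max_attempts: int = 30,     # assumed upper bound on polls
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until job_status is 'success'; raise on 'error' or timeout."""
    for attempt in range(max_attempts):
        status = fetch_status()
        if status == "success":
            return status
        if status == "error":
            raise RuntimeError("conversion job failed")
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError("job did not finish within the allowed attempts")


def build_result_request(
    base_url: str, token: str, uuid: str, accept: str = "application/xml"
) -> urllib.request.Request:
    """Prepare GET /invoice-extract/{uuid}/result with the chosen Accept header."""
    if accept not in ("application/xml", "application/json"):
        raise ValueError("Accept must be application/xml or application/json")
    return urllib.request.Request(
        f"{base_url}/invoice-extract/{uuid}/result",
        headers={"Authorization": f"Bearer {token}", "Accept": accept},
    )
```

A caller would typically run wait_for_job(lambda: fetch_job_status(base_url, token, uuid)) and then pass the request from build_result_request to urllib.request.urlopen. Passing the status fetcher as a callable keeps the polling logic independent of the HTTP library in use.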


The POST /invoice-extract API optionally accepts a configuration JSON, sent in the form field conversion_configuration. The configuration JSON can contain:

  • default_vat_rate: if not provided, a 22% VAT rate is applied in the XML by default
  • convert_amounts: if set to false, prices are kept in the original currency of the invoice


curl --location 'https://api-sandbox.acubeapi.com/invoice-extract' \
--header 'Authorization: Bearer YOUR TOKEN HERE' \
--form 'file=@"/path/to/file.pdf"' \
--form 'conversion_configuration="{\"default_vat_rate\":0,\"convert_amounts\":false}"'
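
When the request is built from code rather than the shell, serializing the configuration with a JSON library avoids manual quote escaping. The sketch below is only an illustration: the function name build_conversion_configuration is hypothetical, and it assumes that omitting a key lets the service apply its default.

```python
import json
from typing import Optional


def build_conversion_configuration(
    default_vat_rate: Optional[float] = None,
    convert_amounts: Optional[bool] = None,
) -> str:
    """Return the JSON string for the conversion_configuration form field.

    Keys left as None are omitted (assumption: the service then falls
    back to its defaults, e.g. the 22% VAT rate).
    """
    config = {}
    if default_vat_rate is not None:
        config["default_vat_rate"] = default_vat_rate
    if convert_amounts is not None:
        config["convert_amounts"] = convert_amounts
    # Compact separators reproduce the form used in the curl example.
    return json.dumps(config, separators=(",", ":"))


print(build_conversion_configuration(default_vat_rate=0, convert_amounts=False))
```

The returned string is the value to send in the conversion_configuration form field alongside the file field.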

What information the converted XML will contain

  • The AI algorithm tries to extract supplier and customer information. If the VAT number is found, the BusinessRegistry database is used as a data fallback.
  • Geocoding functionalities split the addresses into the required fields.
  • Invoice number and invoice date
  • The AI algorithm extracts the invoice detail lines, which are converted into detail lines of the XML invoice.
  • The DatiRiepilogo fields are generated automatically from the detail lines that were found.
  • Prices are converted into EUR using the exchange rate published by the Bank of Italy web services for the invoice date.
  • Description texts are cleaned up by converting special characters to Latin-1 entities.
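
To make the relationship between detail lines and DatiRiepilogo concrete, the sketch below groups lines by VAT rate and sums them into one summary entry per rate. It is not the service's actual algorithm, which is not documented here; it is an assumed illustration using the standard FatturaPA field names (AliquotaIVA, PrezzoTotale, ImponibileImporto, Imposta) and half-up rounding to cents.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def summarize_by_vat_rate(lines):
    """Build one DatiRiepilogo-like entry per VAT rate.

    Each input line is a dict with string values for 'AliquotaIVA'
    (VAT rate in percent) and 'PrezzoTotale' (line total, net of VAT).
    """
    taxable_by_rate = defaultdict(lambda: Decimal("0"))
    for line in lines:
        rate = Decimal(line["AliquotaIVA"])
        taxable_by_rate[rate] += Decimal(line["PrezzoTotale"])

    summary = []
    for rate, taxable in sorted(taxable_by_rate.items()):
        summary.append({
            "AliquotaIVA": rate,
            "ImponibileImporto": taxable.quantize(CENT, rounding=ROUND_HALF_UP),
            # Tax is computed on the per-rate total, not line by line.
            "Imposta": (taxable * rate / 100).quantize(CENT, rounding=ROUND_HALF_UP),
        })
    return summary


example = [
    {"AliquotaIVA": "22.00", "PrezzoTotale": "100.00"},
    {"AliquotaIVA": "22.00", "PrezzoTotale": "50.00"},
    {"AliquotaIVA": "0.00", "PrezzoTotale": "10.00"},
]
for entry in summarize_by_vat_rate(example):
    print(entry)
```

Using Decimal rather than float keeps the cent rounding exact, which matters when the summary totals have to match the detail lines.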

What information the converted XML will not contain

The document type (TipoDocumento): for example, for a self-invoice it can be one of TD17, TD18, or TD19, depending on the type of items purchased (goods or services). Choosing the correct document type is not currently something the Invoice Extract algorithms can do.
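
Since the document type has to be chosen by the caller, one option is to set it on the XML result before further processing. The sketch below is a hypothetical helper (set_document_type is not part of the API). It assumes the result follows the standard FatturaPA layout, where TipoDocumento is the first child of DatiGeneraliDocumento, and it handles the element being either present or absent.

```python
import xml.etree.ElementTree as ET

# Self-invoice document types listed at the top of this page.
SELF_INVOICE_TYPES = {"TD17", "TD18", "TD19"}


def _local_name(tag: str) -> str:
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.split("}")[-1]


def set_document_type(xml_text: str, document_type: str) -> str:
    """Return the XML with TipoDocumento set to the given code."""
    if document_type not in SELF_INVOICE_TYPES:
        raise ValueError(f"unsupported document type: {document_type}")

    root = ET.fromstring(xml_text)
    general = next(
        (el for el in root.iter() if _local_name(el.tag) == "DatiGeneraliDocumento"),
        None,
    )
    if general is None:
        raise ValueError("DatiGeneraliDocumento element not found")

    node = next((c for c in general if _local_name(c.tag) == "TipoDocumento"), None)
    if node is None:
        # Create the element as the first child if it is missing.
        node = ET.Element("TipoDocumento")
        general.insert(0, node)
    node.text = document_type
    return ET.tostring(root, encoding="unicode")
```

Which of the three codes applies remains a decision for the caller, based on the nature of the purchase; the helper only writes the chosen value.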