Hyperscalers’ PaaS platforms versus SAP’s BTP (formerly SCP) PaaS platform: which to choose?
Recent years have seen the emergence of generalist PaaS platforms such as AWS, Azure and Google and, more recently, of PaaS platforms from major software publishers such as SAP.
In addition, there is an abundant supply of open-source building blocks.
This gives companies a very wide choice for meeting their specific application needs, but it also complicates the decision. Under pressure from suppliers promoting their own solutions, IT departments find it difficult to see clearly.
Using one particular service offered by these platforms, optical character recognition (OCR), this article highlights the main characteristics of the 3 approaches and shows that the choice between them is ultimately a “make or buy” decision.
Generalist PaaS (cloud OCR for raw data extraction)
This is a remote tool that requires a connection to the cloud: the scanned document is sent to the cloud OCR service, which responds with a JSON file. A JSON file carries more information than a plain text file, because it contains not only the characters but also the position of the text on the page in the form of bounding boxes, which makes it possible to work with the coordinates. Some versions provide another useful piece of information: a character detection confidence index, which indicates how likely an error is. The layout of the elements on the page is arranged hierarchically at different levels of precision, for example paragraph, line, word and character. Cloud OCR is much more accurate than local OCR and performs well even on blurry or badly scanned documents.
Several APIs implement this approach, such as Google Cloud Vision and Microsoft Azure. Both offer a good level of performance, with data extraction in less than 5 seconds. Other APIs, such as Amazon Rekognition, perform less well.
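As an illustration of how such a JSON response can be used, here is a minimal Python sketch. The response shape (`paragraphs`, `lines`, `words`, `confidence`, `box`) is a hypothetical one invented for this example: each real service uses its own field names, so the keys would need to be adapted. The sketch walks the hierarchy down to word level, flags words with a low confidence index, and selects words by bounding box coordinates.

```python
import json

# Hypothetical response shape: real cloud OCR services use their own field
# names, so every key below is an illustrative assumption.
SAMPLE_RESPONSE = json.loads("""
{
  "paragraphs": [
    {
      "lines": [
        {
          "words": [
            {"text": "Total", "confidence": 0.98, "box": [40, 700, 95, 718]},
            {"text": "1O4.50", "confidence": 0.61, "box": [480, 700, 545, 718]}
          ]
        }
      ]
    }
  ]
}
""")


def iter_words(response):
    """Yield every word dict, walking paragraph -> line -> word."""
    for paragraph in response["paragraphs"]:
        for line in paragraph["lines"]:
            yield from line["words"]


def low_confidence_words(response, threshold=0.8):
    """Return the text of words whose confidence is below the threshold."""
    return [w["text"] for w in iter_words(response) if w["confidence"] < threshold]


def words_in_region(response, x_min, y_min, x_max, y_max):
    """Return words whose box (left, top, right, bottom) lies inside a region."""
    found = []
    for w in iter_words(response):
        left, top, right, bottom = w["box"]
        if left >= x_min and top >= y_min and right <= x_max and bottom <= y_max:
            found.append(w["text"])
    return found
```

In this made-up sample, the amount was misread with a letter O instead of a zero, and its low confidence value is what allows the error to be caught and sent for manual review.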
PaaS platforms of software publishers (turn-key cloud OCR)
Turn-key cloud OCR services do not require this first step. They take care of the whole processing chain and directly return the values of the requested fields.
2) Semantic analysis of the generated file to identify relevant fields
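A very simplified sketch of what such a semantic analysis step can look like when it is developed in-house is given below in Python. It assumes the raw extraction has already been reduced to a list of text lines, and the two field patterns (`invoice_number`, `total`) are hypothetical examples for a supplier invoice, not a description of what any publisher's platform actually does.

```python
import re

# Hypothetical field patterns for a supplier invoice; a real template would be
# tuned to the documents actually being processed.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"\bInvoice\s*(?:No\.?|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "total": re.compile(r"\bTotal\s*:?\s*([0-9]+(?:[.,][0-9]{2})?)", re.IGNORECASE),
}


def extract_fields(text_lines):
    """Return the first match for each known field, or None when not found."""
    fields = {name: None for name in FIELD_PATTERNS}
    for line in text_lines:
        for name, pattern in FIELD_PATTERNS.items():
            if fields[name] is None:
                match = pattern.search(line)
                if match:
                    fields[name] = match.group(1)
    return fields


# Usage with made-up OCR output lines
lines = ["ACME Supplies", "Invoice No: INV-2041", "Total: 104.50"]
result = extract_fields(lines)
```

Writing and maintaining this kind of template for each document type is the integration effort that a turn-key service spares the company.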
- Open Source OCR:
With this solution, performance is lower and the only cost is a fixed one: a non-negligible integration cost for writing the template file that retrieves the desired data. This solution is the least efficient, but once the volume is high enough that its integration cost falls below the amount invoiced for the software publishers’ PaaS platform, it is the least expensive of all.
- Generalist PaaS platforms (cloud OCR for raw data extraction):
With this solution, there is a non-negligible fixed integration cost for developing the in-house script that retrieves the desired data, plus a variable cost invoiced for the cloud service, which depends on the number of documents processed (relatively affordable: around $1.50 per 1,000 documents scanned).
Thus, when the volume of documents is significant, it is less expensive than the software publishers’ PaaS platforms, for equivalent performance.
- PaaS platforms of software publishers (“turn-key” cloud OCR):
With this solution, there is no integration effort, so the only cost is the invoicing, which depends on the number of documents processed.
Thus, for a low volume, the PaaS platform from a software publisher is both the least expensive and the most efficient solution. For a high volume, on the other hand, it is the most expensive of all.
It should also be noted that if the intended use falls outside the field for which the platform was designed, the added value of the publisher’s semantic analysis becomes irrelevant and the platform loses its advantage over generalist PaaS platforms. For example, a platform designed for recognising supplier invoices can save a company a lot of time on that task, but it loses its advantage if the company wants to use it to recognise cash register receipts.
What to choose?
FOR A LOW VOLUME or a fairly standard requirement
The volume is low enough that the amount invoiced for the software publishers’ PaaS solution stays below the higher integration cost of the generalist PaaS solution, or even of the Open Source OCR solution. In this case, the software publishers’ PaaS platform is the least expensive choice.
FOR A HIGH VOLUME or very specific needs
The volume is high enough that the amount invoiced for the software publishers’ PaaS solution exceeds the integration cost of the generalist PaaS solution and of the Open Source OCR solution. In this case, one of these two solutions is the less expensive choice.
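The “low enough” and “high enough” thresholds can be made concrete with a simple break-even calculation. The Python sketch below assumes each option costs a one-off integration amount plus a price per document. Only the generalist price (around $1.50 per 1,000 documents) comes from this article; the integration costs and the turn-key price are made-up placeholders to be replaced with real quotes.

```python
def total_cost(fixed_integration, price_per_document, volume):
    """Total cost = one-off integration cost + per-document invoicing."""
    return fixed_integration + price_per_document * volume


def break_even_volume(fixed_a, price_a, fixed_b, price_b):
    """Volume at which option A and option B cost the same.

    Returns None when the per-document prices are equal (the lines never cross).
    """
    if price_a == price_b:
        return None
    return (fixed_b - fixed_a) / (price_a - price_b)


def cheapest(options, volume):
    """Return the name of the option with the lowest total cost at this volume."""
    return min(
        options,
        key=lambda name: total_cost(
            options[name]["fixed"], options[name]["per_doc"], volume
        ),
    )


# Only the generalist per-document price comes from the article; every other
# figure here is a hypothetical placeholder.
OPTIONS = {
    "turnkey": {"fixed": 0.0, "per_doc": 0.05},
    "generalist": {"fixed": 20_000.0, "per_doc": 1.50 / 1000},
    "open_source": {"fixed": 30_000.0, "per_doc": 0.0},
}

# Volume above which the generalist PaaS becomes cheaper than turn-key
turnkey_vs_generalist = break_even_volume(
    OPTIONS["turnkey"]["fixed"], OPTIONS["turnkey"]["per_doc"],
    OPTIONS["generalist"]["fixed"], OPTIONS["generalist"]["per_doc"],
)
```

With these placeholder figures, `cheapest(OPTIONS, volume)` moves from the turn-key platform to the generalist PaaS and finally to open source as the volume grows, which is the pattern described above. The actual thresholds depend entirely on the real integration and invoicing figures.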