adesso Blog

In this blog post I describe my journey from a Python novice to a (junior) REST service developer. I built a REST service, based on FastAPI, PyMuPDF and other components, that searches PDF files for text and adds highlight annotations to the matches. The application runs as a Docker container.

The blog post starts by exploring the steps in my learning process. I then go into the technical details of the solution, challenges I encountered along my journey as well as a number of use cases. Lastly, I describe possibilities for further upgrades and wrap it up with a few final notes.

I would like to thank my esteemed colleague Marc Fabian Metzger for inspiring me to write this blog post and for his feedback over the course of the project.

A look into my learning process

My background: I am an IT management consultant, though I currently work as an IT consultant and project manager. Away from the job, I work with a Docker-based private cloud and occasionally develop scripts for the cloud and/or my home automation system. Before this project, I had already done a few smaller jobs using Python.

I got the idea for this project at a training seminar I attended on how to use the Aleph Alpha Luminous API and its role in generating references to information found using ‘explanations’. A customer of mine at the moment, a statutory health insurance company, is planning to use AI-supported file searches in the future, for example, during the appeals process. The traceability of AI results is critical in order to gain acceptance for these procedures amongst employees and in-house administrative staff. The files to be searched are available as PDFs, so it makes complete sense to flag information found in these files.

Since I had found Python very easy to learn on my own so far, Python 3 was the obvious choice. First, I implemented the core function of the application as a prototype. Searching for and testing suitable Python PDF libraries took the most time. In the end, I opted for PyMuPDF. By the end of a long train trip from Munich to Hamburg, I had produced a script that could search a PDF for a particular text and add annotations to it.

I then created an initial REST service using Flask. A colleague of mine happened to mention that he uses the FastAPI framework instead of Flask. Both FastAPI and Flask are web frameworks used to create APIs in Python 3.x. Because I could go to my colleague for help whenever I ran into a problem, I decided to switch to FastAPI, which proved to be the right choice. Because Swagger UI is integrated into FastAPI, the endpoints could be tested right away in the browser. Beyond that, the documentation available for FastAPI is excellent in my opinion. While working with FastAPI, I also gained rudimentary knowledge of Pydantic, which I used for the data objects, as well as of other Python libraries.

In test-driven development, the automated tests are written first and the actual code afterwards. In this project, the automated tests were created at the end. I chose this strategy because I had developed the actual workflow in the application iteratively. Plus, the development process seemed sufficiently complex even without testing.

The technical details

Application components

The application consists of the REST interface, a class file and a module to generate the annotations. Because annotation generation is encapsulated in its own module, a library other than PyMuPDF could be used in the future with a few minor adjustments.
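
The separation described here can be pictured as a small interface that the REST layer depends on, with the PDF library hidden behind it. The following is only a sketch of that idea and not the actual code of the application: the names `Annotator`, `FakeAnnotator` and `run_annotation` are hypothetical, and the fake backend merely counts matches in a plain string instead of touching a PDF.

```python
from typing import Protocol


class Annotator(Protocol):
    """Hypothetical interface the REST layer would depend on."""

    def annotate(self, text: str, explanations: list[str]) -> int:
        """Return how many explanations were found (and would be highlighted)."""
        ...


class FakeAnnotator:
    """Stand-in backend; a PyMuPDF-based class could offer the same method."""

    def annotate(self, text: str, explanations: list[str]) -> int:
        return sum(1 for e in explanations if e in text)


def run_annotation(backend: Annotator, text: str, explanations: list[str]) -> int:
    # The caller only knows the interface, so the backend can be swapped.
    return backend.annotate(text, explanations)


hits = run_annotation(FakeAnnotator(), "Claim approved on appeal.", ["approved", "denied"])
```

Swapping the PDF library would then mean writing another class with the same `annotate` method, without changing the REST interface.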

Steps in the annotation process

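The API provides endpoints to create a job, upload documents, query the status, download annotated documents and delete a job. The steps below walk through them in order.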

To create an annotation job, send a POST request to /annotationsjobs and pass the texts you wish to search for as JSON in the request body:

	
	{
	  "explanations": [
	    "string1", "string2" 
	  ]
	}
	

The response contains the ID of the job that has just been generated:

	
	{
	  "id": "aed32b65-e142-4bdb-9a9d-fb91e09ffc83",
	  "explanations": [
	    "string1", "string2"
	  ],
	  "documentdetails": null,
	  "status": 1
	}
	
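
As an illustration of how such a job object might be modelled, here is a minimal sketch that uses standard-library dataclasses as a stand-in for the Pydantic models mentioned earlier. The class name `AnnotationJob` and the helper `create_job` are assumptions made for this example, and the meaning of status 1 is not specified in this post.

```python
import uuid
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class AnnotationJob:
    """Hypothetical job model mirroring the fields of the sample response."""

    explanations: list[str]
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    documentdetails: Optional[list] = None
    status: int = 1  # 1 as in the sample response; its meaning is assumed


jobs: list[AnnotationJob] = []  # jobs are kept in memory only


def create_job(explanations: list[str]) -> dict:
    """Create a job, remember it and return it as a JSON-serialisable dict."""
    job = AnnotationJob(explanations=explanations)
    jobs.append(job)
    return asdict(job)


response = create_job(["string1", "string2"])
```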

Once this is complete, one or more PDF files are uploaded with a POST to /annotationsjobs/{job_id}/documents.

For control purposes, some metadata from each document is stored in documentdetails. The annotation itself runs asynchronously as a background task.

The two GET methods /annotationsjobs and /annotationsjobs/{job_id}/documents can be used to retrieve the status and metadata of all jobs or of a single job. Once a document has the status ‘done_annotated’, one or more of the submitted texts have been found in it. The annotated document can then be downloaded using /annotationsjobs/{job_id}/documents/{document_id}. To distinguish it from the original, the suffix ‘_anno’ is added to the original file name.
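
The file naming can be sketched in a few lines with `pathlib`. This is an assumption about how the suffix is applied: the post only says that ‘_anno’ is added, so placing it before the file extension, as well as the function name `annotated_name`, are choices made for this example.

```python
from pathlib import Path


def annotated_name(original: str, suffix: str = "_anno") -> str:
    """Insert the suffix before the file extension (placement is an assumption)."""
    path = Path(original)
    return f"{path.stem}{suffix}{path.suffix}"


name = annotated_name("appeal_2023.pdf")
```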

Once all the files have been downloaded, DELETE /annotationsjobs/{job_id} can be used to delete the job and any temporarily stored files.
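
The clean-up step could look roughly like the following sketch, which uses only the standard library. The registry `job_dirs` and the functions `register_job` and `delete_job` are hypothetical names; how the real service tracks its temporary directories is not described in this post.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical in-memory registry: job ID -> temporary working directory
job_dirs: dict[str, Path] = {}


def register_job(job_id: str) -> Path:
    """Create a temporary directory for the job's uploaded and annotated files."""
    tmp_dir = Path(tempfile.mkdtemp(prefix=f"job_{job_id}_"))
    job_dirs[job_id] = tmp_dir
    return tmp_dir


def delete_job(job_id: str) -> bool:
    """Remove the job and its files; return False if the ID is unknown."""
    tmp_dir = job_dirs.pop(job_id, None)
    if tmp_dir is None:
        return False
    shutil.rmtree(tmp_dir, ignore_errors=True)
    return True
```

Returning False for an unknown ID lets the calling endpoint decide how to report the error, for example with a 404 response.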

The challenges encountered while developing the app

Before I did my research, there were several occasions when I thought that a particular task I was working on was very difficult. However, I learnt that Python and the libraries I used almost always offer elegant and easy-to-use solutions. Here are a few examples:

1. How can you find an object with specific properties in a list?

The following one-liner returns the first job whose ID matches and falls back to None if the ID is not found, so the result is easy to work with and the code remains clear and simple:

	
	currentjob = next((x for x in jobs if x.id == job_id), None)
	
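
To see both outcomes of this idiom, here is a small self-contained sketch. The `Job` class and the sample IDs are made up for the example; only the `next(...)` expression is taken from the line above.

```python
from dataclasses import dataclass


@dataclass
class Job:
    id: str


jobs = [Job(id="a1"), Job(id="b2")]


def find_job(job_id: str):
    # next() returns the first match, or the default (None) if nothing matches
    return next((x for x in jobs if x.id == job_id), None)


found = find_job("b2")    # the second Job object
missing = find_job("zz")  # None, because no job has this ID
```

In a REST endpoint, the None case would typically be translated into a "not found" response.
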
2. How can you run a task in the background so that the API can quickly send a response to the caller?

There is also a simple solution for this:

	
	background_tasks.add_task(search_and_annotate_allpages, job, tmp_dir)
	

3. While searching for suitable PDF libraries, I discovered that annotations in PDFs are positioned using coordinates. Several libraries I tested could search for text, but converting the matches into coordinates proved to be more of a challenge. With PyMuPDF, the text search returns the coordinates directly, and a match that spans several lines is returned as one set of coordinates per line:

	
	# quads=True returns the matches as quads instead of rectangles
	rl = page.search_for(explanation, quads=True)
	for result in rl:
	    annot = page.add_highlight_annot(result)
	

Possible use case

As originally conceived, this application is designed to annotate the information that a large language model has found in a text, in order to foster acceptance and build trust amongst users. To meet this goal, the application is run as a Docker container alongside other containers. This setup allows you to create a showcase in which PDF documents can be uploaded and the user can write questions addressed to the Aleph Alpha API. The questions are then answered by the Luminous language model based on the PDF documents. In addition, users are shown the explanations and offered a copy of the original PDF for download in which the explanations are annotated.

Outlook

One possible extension: in addition to the highlight annotations, a comment or similar note could be added that indicates which question was answered using the relevant text passage. I will meet with the customer to see if they are receptive to the idea.

Jobs currently exist only at runtime and are not persisted. For live use beyond showcases, persisting the job data would make sense.

Conclusion

From my perspective, developing the application and expanding my Python skills was an exciting challenge that I enjoyed a lot. Python is very easy to learn in my opinion. You can make excellent progress with the language in a short time and achieve results quickly.

Author Alexander Zielinski

Alexander Zielinski advises companies in the areas of digitalisation and IT service management and has more than 10 years of experience in these areas. His industry focus is on statutory health insurance.
