Detect risk and requirements in legal documents

abstract

Our customer creates almost 30,000 tenders every year. Sometimes they miss a simple requirement and face certain rejection. That's the problem we tackled in this project.

background

about the project

Our customer is a large consulting company that acquires many of its projects through public procurements. They create almost 30,000 tenders every year, so they have a lot to gain from handling them more efficiently. It's also common for employees working on bids to overlook a specific requirement or miss a legal detail, which leads to certain rejection.

In this project, our task was to develop machine learning algorithms that identify requirements and risks in the legal sections of project descriptions and reduce the number of mistakes made during the tender process.

challenge

no training data

The main challenge was that we didn't have any data for training. There was a process in place, but the data it produced wasn't stored systematically enough to train an algorithm.

solution

application for annotating data

To create training data for our algorithms, we developed an application called Lably to speed up the annotation process. You upload the data to annotate and start working. In parallel, a machine learning algorithm is trained automatically as new annotations appear. The algorithm prioritizes which data point to annotate next based on how certain its current predictions are.
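One common way to prioritize by certainty is uncertainty sampling: annotate the item the current model is least sure about. The write-up doesn't say which strategy Lably uses, so treat the following as a minimal sketch under that assumption. The function name, the three classes, and the probabilities are all made up for illustration.

```python
from typing import Sequence


def least_confident_index(probabilities: Sequence[Sequence[float]]) -> int:
    """Return the index of the item whose top predicted class has the lowest probability."""
    if not probabilities:
        raise ValueError("no unlabeled items to choose from")
    # Confidence of an item = probability of its most likely class.
    confidences = [max(p) for p in probabilities]
    return min(range(len(confidences)), key=confidences.__getitem__)


# Hypothetical model output for four unlabeled sentences;
# columns are (no flag, requirement, risk). Not real project data.
predicted = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.10, 0.85, 0.05],
    [0.55, 0.30, 0.15],
]

next_to_annotate = least_confident_index(predicted)
print(next_to_annotate)
```

In a loop, the model would be retrained after each new annotation and the remaining items scored again, so the queue keeps reordering as the model learns.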

results

training data

We uploaded sentences and paragraphs from relevant documents, and our customer started annotating them. Once the algorithm reached the desired performance (after around 5,000 data points), we went straight to implementation. Here's how the algorithm improved over time, with accuracy on the y-axis and the number of data points on the x-axis.
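The stopping rule described above, annotating until accuracy reaches a target, can be sketched as a lookup on the learning curve. This is only an illustration: the function name, the target, and the curve values are invented and are not the project's measurements.

```python
from typing import Iterable, Optional, Tuple


def first_point_reaching(curve: Iterable[Tuple[int, float]], target: float) -> Optional[int]:
    """Return the smallest annotation count whose held-out accuracy meets the target."""
    for n_annotated, accuracy in sorted(curve):
        if accuracy >= target:
            return n_annotated
    return None  # target not reached yet, keep annotating


# Made-up learning curve: (annotated data points, held-out accuracy).
illustrative_curve = [(100, 0.62), (200, 0.71), (400, 0.78), (800, 0.84)]

stop_at = first_point_reaching(illustrative_curve, target=0.80)
print(stop_at)
```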

web application

To make the algorithm accessible to our customer's employees, we created a simple web application. Users upload their PDFs to the algorithm, which returns a list of requirements and potentially risky paragraphs.
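The core of such an application can be sketched as: split the document text into paragraphs, classify each one, and return the flagged ones grouped by label. The sketch below assumes the PDF text has already been extracted, and it uses a keyword rule as a hypothetical stand-in for the trained model. All function names, labels, and the sample text are illustrative.

```python
from typing import Callable, Dict, List


def split_paragraphs(text: str) -> List[str]:
    """Split extracted document text on blank lines and drop empty chunks."""
    parts = [" ".join(chunk.split()) for chunk in text.split("\n\n")]
    return [p for p in parts if p]


def flag_paragraphs(text: str, classify: Callable[[str], str]) -> Dict[str, List[str]]:
    """Group paragraphs by predicted label, keeping only the flagged ones."""
    flagged: Dict[str, List[str]] = {"requirement": [], "risk": []}
    for paragraph in split_paragraphs(text):
        label = classify(paragraph)
        if label in flagged:
            flagged[label].append(paragraph)
    return flagged


def keyword_classifier(paragraph: str) -> str:
    """Hypothetical stand-in for the trained model: simple keyword rules."""
    lowered = paragraph.lower()
    if "penalty" in lowered or "liable" in lowered:
        return "risk"
    if "must" in lowered or "shall" in lowered:
        return "requirement"
    return "other"


sample = """The bidder must submit three reference projects.

The contracting authority is located in the city centre.

The supplier is liable for a penalty of 1% per week of delay."""

result = flag_paragraphs(sample, keyword_classifier)
for label, paragraphs in result.items():
    print(label, paragraphs)
```

Passing the classifier in as a function keeps the pipeline independent of the model, so the keyword rule can be swapped for the trained classifier without touching the rest.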

Similar applications

You can use the same approach to find other kinds of passages in other types of documents. Just replace the dataset with the text you want to annotate.

If you have something similar in mind, schedule a meeting, and let’s talk!
