Extracting Funding Amounts from State Funding Programs
In this challenge, you have the opportunity to join a Civic Data Lab project aimed at increasing transparency around the German government’s funding policies for organizations and initiatives that strengthen democracy. We have prepared a scraped dataset from www.foerderdatenbank.de and invite you to participate in the next step: developing a method to automatically extract funding amounts. This can be treated as a Named Entity Recognition (NER) or information extraction task. The ultimate goal is to quantify how much the German government spends on promoting democracy.
CDL @ Hack4SocialGood 2025: Extracting Funding Amounts from State Funding Programs
- Development of a method to automatically extract funding amounts from funding programs.
- The problem can be framed as a Named Entity Recognition (NER) or information extraction task.
- The data basis consists of funding programs that are part of the German federal government's "Förderdatenbank" (funding database). These have been scraped and published here.
- The background is an attempt to quantify how much the German government spends on promoting democracy. As a first step, a classifier has already been developed to identify democracy funding programs. The next step is the extraction of funding amounts. You can find an article on the project (in German!) here.
Data
- The data originates from the website: www.foerderdatenbank.de
- A description of the scraped dataset, as well as the link to the data, can be found here.
- An example of how to read the data using Python is available here.
Possible Approaches
- NER using the Python package spaCy.
- Fine-tuning language models like BERT.
- In-context learning with generative LLMs.
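Before investing in any of the approaches above, it can help to have a simple rule-based baseline to compare against. The following is a minimal sketch of such a baseline using regular expressions, which is not one of the listed approaches. It assumes that amounts appear in German number formatting (for example "50.000 Euro" or "1,5 Millionen Euro"); the function names, the pattern, and the example sentences are hypothetical and not taken from the dataset.

```python
import re

# Multipliers for German scale words (assumed spellings, not exhaustive).
_SCALE = {
    "million": 1_000_000,
    "millionen": 1_000_000,
    "mio": 1_000_000,
    "milliarde": 1_000_000_000,
    "milliarden": 1_000_000_000,
    "mrd": 1_000_000_000,
}

# A number in German formatting, an optional scale word, then a currency marker.
_AMOUNT_RE = re.compile(
    r"(?<![\d.,])"
    r"(?P<number>\d{1,3}(?:\.\d{3})+(?:,\d+)?|\d+(?:,\d+)?)"
    r"\s*(?P<scale>Millionen|Million|Mio\.?|Milliarden|Milliarde|Mrd\.?)?"
    r"\s*(?:Euro|EUR|€)",
    re.IGNORECASE,
)


def parse_german_number(text: str) -> float:
    """Convert '1.500,75' (German formatting) to 1500.75."""
    return float(text.replace(".", "").replace(",", "."))


def extract_amounts(text: str) -> list[float]:
    """Return all euro amounts found in `text`, in order of appearance."""
    amounts = []
    for match in _AMOUNT_RE.finditer(text):
        value = parse_german_number(match.group("number"))
        scale = (match.group("scale") or "").lower().rstrip(".")
        amounts.append(value * _SCALE.get(scale, 1))
    return amounts


if __name__ == "__main__":
    sample = (
        "Die Förderung beträgt bis zu 50.000 Euro pro Vorhaben. "
        "Insgesamt stehen 1,5 Millionen Euro bereit."
    )
    print(extract_amounts(sample))
```

A baseline like this will miss amounts written out in words, percentages of eligible costs, and amounts given in other currencies, which is where the NER and LLM-based approaches are likely to add value.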
Important Considerations
- The method should be evaluated using suitable metrics such as the F1 score or accuracy.
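One way to compute an F1 score for this task is sketched below. It assumes that each funding program has a hand-labelled list of gold amounts and a list of predicted amounts, and that a prediction counts as correct only if it matches a gold amount exactly. The function name `micro_f1` and this exact-match criterion are illustrative assumptions, not requirements of the challenge.

```python
def micro_f1(predictions, references):
    """Micro-averaged precision, recall and F1 over per-document amount lists.

    `predictions` and `references` are parallel lists; each element is the
    collection of amounts predicted (or labelled) for one funding program.
    Amounts are compared by exact equality, and duplicates within a document
    are ignored.
    """
    true_pos = pred_total = gold_total = 0
    for predicted, gold in zip(predictions, references):
        predicted_set, gold_set = set(predicted), set(gold)
        true_pos += len(predicted_set & gold_set)
        pred_total += len(predicted_set)
        gold_total += len(gold_set)

    precision = true_pos / pred_total if pred_total else 0.0
    recall = true_pos / gold_total if gold_total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical example: two programs, one spurious and one missed amount.
    predicted = [[50000.0, 1000.0], [1500000.0]]
    labelled = [[50000.0], [1500000.0, 200.0]]
    print(micro_f1(predicted, labelled))
```

Micro-averaging weights every extracted amount equally, so programs that list many amounts influence the score more than programs that list one. A per-program (macro) average is a reasonable alternative if each program should count the same.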