
Helvetas

Auto data collection translation for international surveys

Readme

In this project, we studied the technical architecture of a solution designed to improve data flows from the global field offices of Helvetas, an independent Swiss development organization building capacity in Africa, Asia, Latin America and Eastern Europe. We focused on enabling automatic and manual translation and on suggesting improvements to data quality.

To make the data as accessible and useful as possible, we aim to incorporate a translation feature. This will enable us to have all the data available in both the native language of the country in which the survey was conducted, and in English for the head office to use. This will be a crucial feature as we strive to ensure that our data is accessible and useful to all stakeholders involved in our projects.

(From the challenge description)

After reviewing the language requirements, we compiled a matrix of translation engines. We then prototyped code to load the sample datasets into a machine learning environment, where standardization and data quality control can take place. We ran two experiments in automatic translation of texts (see Screenshots), each using a different API.
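The standardization and data quality step could start as simply as the sketch below. It is not the code from the project notebooks: it assumes submissions arrive as Kobo JSON records whose keys use Kobo's `group/question` paths, and the helper names `standardize_record` and `missing_report` are hypothetical.

```python
from collections import Counter


def standardize_record(record):
    """Shorten Kobo group paths in keys and turn blank strings into None."""
    clean = {}
    for key, value in record.items():
        # Assumed key format "group_name/question_name"; keep only the question name.
        short_key = key.rsplit("/", 1)[-1]
        if short_key in clean:
            # Two groups reuse a question name; refuse to overwrite silently.
            raise ValueError(f"two questions map to the same name: {short_key!r}")
        if isinstance(value, str):
            value = value.strip() or None  # "" and "   " both become None
        clean[short_key] = value
    return clean


def missing_report(records):
    """Count, per field, how many records lack a value (absent or None)."""
    fields = sorted({k for r in records for k in r})
    counts = Counter()
    for r in records:
        for f in fields:
            if r.get(f) is None:
                counts[f] += 1
    return {f: counts[f] for f in fields}


raw = [
    {"grp/name": " Asha ", "grp/comment": ""},
    {"grp/name": "Ram", "grp/comment": "Good training"},
]
clean = [standardize_record(r) for r in raw]
print(missing_report(clean))
```

The cleaned list of flat dicts can be passed to `pandas.DataFrame(clean)` for further analysis in Pandas.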

Since the Helvetas team already has an Azure cloud-based architecture, we deployed an instance of the Azure Cognitive Services API for automatic translation, which could be extended with custom ML models for languages that are not yet supported out of the box. Ever Traduora, an open source collaborative translation application, was installed on a server to evaluate combining automatic translation with manual moderation.
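A minimal sketch of calling the Translator v3 REST endpoint of Azure Cognitive Services from Python, using only the standard library. The endpoint URL, headers and JSON shapes follow Microsoft's public Translator v3 documentation as we understand it; the key, region, function names and the batch size of 100 are placeholders and assumptions, not values from the deployed instance. Leaving `from_lang` unset asks the service to detect the source language, which it reports in `detectedLanguage`.

```python
import json
import urllib.parse
import urllib.request
import uuid

# Global Translator v3 endpoint (assumption: no custom-domain endpoint is used).
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"


def build_request(texts, to_lang, key, region, from_lang=None):
    """Build a POST request with one JSON array element per input text."""
    params = {"api-version": "3.0", "to": to_lang}
    if from_lang:  # omit to let the service auto-detect the source language
        params["from"] = from_lang
    url = ENDPOINT + "?" + urllib.parse.urlencode(params)
    headers = {
        "Ocp-Apim-Subscription-Key": key,      # placeholder resource key
        "Ocp-Apim-Subscription-Region": region,  # needed for regional resources
        "Content-Type": "application/json",
        "X-ClientTraceId": str(uuid.uuid4()),
    }
    body = json.dumps([{"Text": t} for t in texts]).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def parse_response(payload):
    """Return (detected source language or None, translated text) per input."""
    return [
        (item.get("detectedLanguage", {}).get("language"),
         item["translations"][0]["text"])
        for item in payload
    ]


def translate(texts, key, region, to_lang="en", from_lang=None, batch_size=100):
    """Translate texts in batches; batch_size is an assumed, conservative value."""
    results = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        req = build_request(batch, to_lang, key, region, from_lang)
        with urllib.request.urlopen(req) as resp:
            results.extend(parse_response(json.load(resp)))
    return results
```

Splitting request building and response parsing from the network call keeps both testable without an Azure key.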

We looked at the Kobo API and started working on data analysis and translation features in Python and Pandas. We spent approximately 20 hours on this challenge (4 people × 5 hours) and gave a brief verbal presentation of the results at the end of Hack4SocialGood. Our exchanges led to some knowledge sharing, and everything we worked on can be found in our open source repository:
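Pulling submissions from the Kobo API could look like the sketch below. It assumes the KoboToolbox v2 data endpoint (`/api/v2/assets/<asset_uid>/data/`), token authentication, and pagination via the `next` link in each response page; the server URL, asset UID, token, page size and function names are hypothetical placeholders, not taken from the project notebooks.

```python
import json
import urllib.parse
import urllib.request


def kobo_data_url(server, asset_uid, limit=1000):
    """URL of the first page of submissions for one form (asset)."""
    query = urllib.parse.urlencode({"format": "json", "limit": limit})
    return f"{server.rstrip('/')}/api/v2/assets/{asset_uid}/data/?{query}"


def fetch_submissions(server, asset_uid, token):
    """Follow the paginated 'next' links and return all submissions as dicts."""
    url = kobo_data_url(server, asset_uid)
    rows = []
    while url:
        req = urllib.request.Request(url, headers={"Authorization": f"Token {token}"})
        with urllib.request.urlopen(req) as resp:
            page = json.load(resp)
        rows.extend(page["results"])
        url = page.get("next")  # assumed to be null on the last page
    return rows
```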

Kanban project board
Language table with translation capabilities
Notebook for data loading and parsing (Jupyter/Python)
Notebook for language detection (Jupyter/Python)
Notebook for translation with Azure Cognitive Services API (Jupyter/Python)
Demo server for Traduora
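The approach used in the language detection notebook is not described here. As a cheap, hypothetical pre-filter, the share of letters in each Unicode script can flag answers that do not match the survey's language, such as English text inside Nepali data. Script alone cannot separate languages that share an alphabet (French and Spanish are both Latin; Russian, Macedonian and Tajik are all Cyrillic), so this would only complement a real language detector. The function names and the 0.2 threshold are assumptions.

```python
import unicodedata
from collections import Counter


def script_shares(text):
    """Fraction of letters per Unicode script (first word of the character name)."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():  # skips digits, spaces, punctuation and combining marks
            # e.g. "DEVANAGARI LETTER NA" -> "DEVANAGARI", "LATIN SMALL LETTER A" -> "LATIN"
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    total = sum(counts.values())
    return {script: n / total for script, n in counts.items()} if total else {}


def dominant_script(text):
    """Script with the most letters, or None if the text has no letters."""
    shares = script_shares(text)
    return max(shares, key=shares.get) if shares else None


def has_mixed_script(text, expected, threshold=0.2):
    """True if letters outside the expected script exceed the threshold share."""
    shares = script_shares(text)
    return bool(shares) and 1 - shares.get(expected, 0.0) > threshold


print(has_mixed_script("नमस्ते school", "DEVANAGARI"))
```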

Screenshots

Open Traduora

Issue #379

Screenshots of Traduora service

Screenshot of deployment of Azure service

Screenshot of automatic translations in a Jupyter notebook

Challenge


Automatic translation of data collection in international surveys

Helvetas is a development organization with projects in 30 low- and middle-income countries, focusing on vocational training, clean water and sanitation, small businesses and community empowerment. For data collection we use KoboToolbox, a free online survey tool that can be used to create surveys in more than 50 languages.

We are developing a user interface that allows project staff to clean their data and submit it to the data pool. Staff will then be able to build PowerBI dashboards. However, the head office in Switzerland also needs to understand the data and build aggregated PowerBI dashboards. The problem is that the data has to be translated from several languages into English. This translation could be embedded in the user interface or take place as part of the submission.

Skills: Data, AI/translation, data dashboards, development

Auto data collection translation for international surveys

Helvetas is a development organization with projects in 30 middle- and low-income countries that focus on vocational training, clean water and sanitation, small businesses, community empowerment and more. We collect data via KoboToolbox in our 30 country offices for projects such as vocational training, clean water, migration and government services. The Kobo surveys are in 50+ languages. How can we translate the data so that it can be used for aggregate (worldwide) performance indicator dashboards in PowerBI? We are developing a user interface (UI) that will enable project staff to clean their Kobo data and submit it to the Data Lake. This translation could be embedded in the UI or done as part of the submission process from the UI to the Data Lake.

Skills: Data, AI/translation, data dashboards, development
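Keeping both the original answer and its English translation can be handled in the submission step by adding an English copy of each free-text field. In this sketch the `_en` suffix, the function name `add_english_fields` and the `translate` callable (for example, a wrapper around the Azure Translator API) are all assumptions.

```python
def add_english_fields(records, text_fields, translate):
    """Return copies of records with a '<field>_en' value next to each original.

    `translate` is any callable mapping a list of texts to a list of English
    texts of the same length (hypothetical interface).
    """
    out = [dict(r) for r in records]  # leave the caller's data untouched
    for field in text_fields:
        # Only non-empty answers are sent for translation.
        idx = [i for i, r in enumerate(out) if r.get(field)]
        english = translate([out[i][field] for i in idx]) if idx else []
        if len(english) != len(idx):
            raise ValueError("translator returned a different number of texts")
        for r in out:
            r[field + "_en"] = None  # same columns in every record
        for i, text in zip(idx, english):
            out[i][field + "_en"] = text
    return out
```

Because the original field is never overwritten, the country office keeps its native-language data while the head office dashboards can read the `_en` columns.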

Resources

KoboToolbox website - test server
Collection of Data samples (Sharepoint)

Translation availability by language and cloud provider (+ supported, - not supported)

Code	Language	NLP Cloud	Azure
en	English	+	+
sq	Albanian	+	+
am	Amharic	+	+
ar	Arabic	+	+
bn	Bangla	-	+
dz	Dzongkha	+	-
bs	Bosnian	+	+
my	Burmese	+	-
fr	French	+	+
ht	Haitian Creole	+	+
lo	Laotian	+	-
mk	Macedonian	+	+
ro	Romanian	+	+
ne	Nepali (has some English text in Nepali data)	+	+
pt	Portuguese	+	+
ru	Russian	+	+
si	Sinhalese	-	-
es	Spanish	+	+
tg	Tajik	+	-
ur	Urdu	+	+
uz	Uzbek	+	+
vi	Vietnamese	+	+
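The table can drive engine selection per language. The sketch below transcribes the +/- columns into a dictionary keyed by language name and prefers Azure, since Helvetas already runs on Azure, falling back to NLP Cloud and then to manual translation (for example in Traduora). The routing policy and names are our assumption, not a decision recorded in the project.

```python
from collections import Counter

# (NLP Cloud, Azure) support transcribed from the table: "+" -> True, "-" -> False.
SUPPORT = {
    "English": (True, True), "Albanian": (True, True), "Amharic": (True, True),
    "Arabic": (True, True), "Bangla": (False, True), "Dzongkha": (True, False),
    "Bosnian": (True, True), "Burmese": (True, False), "French": (True, True),
    "Haitian Creole": (True, True), "Laotian": (True, False),
    "Macedonian": (True, True), "Romanian": (True, True), "Nepali": (True, True),
    "Portuguese": (True, True), "Russian": (True, True),
    "Sinhalese": (False, False), "Spanish": (True, True), "Tajik": (True, False),
    "Urdu": (True, True), "Uzbek": (True, True), "Vietnamese": (True, True),
}


def choose_engine(language):
    """Prefer Azure, then NLP Cloud, else fall back to manual translation."""
    if language == "English":
        return "none"  # already in the target language
    nlp_cloud, azure = SUPPORT[language]
    if azure:
        return "azure"
    if nlp_cloud:
        return "nlpcloud"
    return "manual"


routing = Counter(choose_engine(lang) for lang in SUPPORT)
```

With the table as given, 16 languages route to Azure, 4 to NLP Cloud (Dzongkha, Burmese, Laotian, Tajik), English needs no translation, and Sinhalese has no machine option.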




Unless otherwise noted, the content of this website is licensed under a Creative Commons Attribution 4.0 International License.

Hack4SocialGood 2023