Challenge Project

Helvetas

Auto data collection translation for international surveys

📂 Demo

In this project, we studied the technical architecture of a solution designed to improve data flows from global field offices of Helvetas - an independent Swiss development organization building capacity in Africa, Asia, Latin America and Eastern Europe. Our focus was enabling automatic and manual translation and suggesting improvements to data quality.

To make the data as accessible and useful as possible, we aim to incorporate a translation feature. This will enable us to have all the data available in both the native language of the country in which the survey was conducted, and in English for the head office to use. This will be a crucial feature as we strive to ensure that our data is accessible and useful to all stakeholders involved in our projects.

(From challenge detail)

After going through the language requirements, we compiled a matrix of translation engines. We then prototyped code to load the Helvetas sample datasets into a Machine Learning environment where standardization and data quality control can take place. We ran two experiments in automatic translation of texts using different APIs (see the Jupyter notebook screenshot).
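As a rough illustration of the loading and quality-control step, here is a minimal sketch using only the Python standard library. The column names and sample rows are hypothetical (they are not taken from the Helvetas datasets), and the free-text heuristic is an assumption that would need tuning against real exports.

```python
import csv
import io

# Hypothetical sample export; real Kobo column names and values will differ.
SAMPLE_CSV = """respondent_id,country,age,feedback
1,HT,34,  L'accès à l'eau s'est amélioré 
2,NP,,Training was useful
3,MK,29,
"""


def load_rows(handle):
    """Read a CSV export into a list of dicts, trimming stray whitespace."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append(
            {
                key: (value or "").strip()
                for key, value in row.items()
                if key is not None  # ignore cells beyond the header width
            }
        )
    return rows


def missing_counts(rows):
    """Count empty cells per column as a basic data-quality indicator."""
    counts = {}
    for row in rows:
        for key, value in row.items():
            counts[key] = counts.get(key, 0) + (1 if value == "" else 0)
    return counts


def free_text_columns(rows):
    """Guess which columns hold free text (assumed heuristic: any value with a space)."""
    if not rows:
        return []
    return [
        column
        for column in rows[0]
        if any(" " in row.get(column, "") for row in rows)
    ]


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(missing_counts(rows))
print(free_text_columns(rows))
```

The columns returned by `free_text_columns` are the candidates to send to a translation engine; numeric and coded columns can be aggregated without translation.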

Since the Helvetas team already has an Azure cloud-based architecture, we deployed an instance of the Azure Cognitive Services API for automatic translation, which could be extended with custom ML models for languages that are not yet supported out of the box. We also installed the Ever Traduora service, an open source collaborative translation application, on a server to evaluate combining automatic translation with manual moderation.
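For orientation, a call to the Azure Translator REST API (version 3.0) could look like the sketch below. This is not the code from our notebooks. The environment variable names `AZURE_TRANSLATOR_KEY` and `AZURE_TRANSLATOR_REGION` are hypothetical, and the global endpoint is assumed; a deployment may use a different one.

```python
import json
import os
import urllib.parse
import urllib.request

# Global Translator endpoint (assumption: the deployed resource uses it).
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"


def build_translate_request(texts, to_lang, key, region):
    """Build the POST request for a batch of texts without sending it."""
    query = urllib.parse.urlencode({"api-version": "3.0", "to": to_lang})
    body = json.dumps([{"Text": text} for text in texts]).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        # The region header is needed for regional or multi-service resources.
        "Ocp-Apim-Subscription-Region": region,
        "Content-Type": "application/json; charset=UTF-8",
    }
    return urllib.request.Request(
        f"{ENDPOINT}?{query}", data=body, headers=headers, method="POST"
    )


def parse_translations(payload):
    """Extract the first translation of each input from the JSON response."""
    return [item["translations"][0]["text"] for item in payload]


def translate(texts, to_lang="en"):
    """Translate texts, reading credentials from the environment.

    The variable names are hypothetical; the point is to keep the key out
    of notebooks and out of the repository.
    """
    key = os.environ["AZURE_TRANSLATOR_KEY"]
    region = os.environ["AZURE_TRANSLATOR_REGION"]
    request = build_translate_request(texts, to_lang, key, region)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_translations(json.load(response))
```

With both variables set, `translate(["Bonjour le monde"])` would return a list with one English string. When the source language is not specified, the service detects it, which suits survey data that mixes languages.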

We looked at the Kobo API and started working on data analysis and translation features in Python and Pandas. We worked for approximately 20 hrs (4 people x 5 hrs) on this challenge and gave a brief verbal presentation of the results at the end of Hack4SocialGood. Our exchanges led to some knowledge sharing, and everything we worked on can be found in our open source repository:

Screenshots

Open Traduora

Issue #379

Screenshots of Traduora service

Screenshot of deployment of Azure service

Screenshot of automatic translations in a Jupyter notebook


Challenge

Automatic translation of data collection in international surveys

Helvetas is a development organization with projects in 30 middle- and low-income countries that focus on vocational training, clean water and sanitation, small businesses and community empowerment. To collect data we use KoboToolbox, a free online survey tool that can be used to create surveys in more than 50 languages.

We are developing a user interface that enables project staff to clean their data and submit it to the data pool. Staff will then be able to create PowerBI dashboards. But the head office in Switzerland also needs to understand the data and create aggregated PowerBI dashboards. The problem is that the data has to be translated from several languages into English. This translation could be embedded in the user interface or done as part of the submission.

Skills: data, AI/translation, data dashboards, development


Auto data collection translation for international surveys

Helvetas is a development organization with projects in 30 middle- and low-income countries that focus on vocational training, clean water and sanitation, small businesses, community empowerment and more. We collect data via KoboToolbox in our 30 country offices for projects such as vocational training, clean water, migration and government services. The Kobo surveys are in 50+ languages. How can we translate the data so that it can be used for aggregate (worldwide) performance indicator dashboards in PowerBI? We are developing a user interface (UI) that will enable project staff to clean their Kobo data and submit it to the Data Lake. The translation could be embedded in the UI or done as part of the submission process from the UI to the Data Lake.

Skills: Data, AI/translation, data dashboards, development
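One way to picture translation as part of the submission step is sketched below: each free-text field keeps its original value and gains an English companion column, so both the country office and the head office can work with the same record. The record shape, the `_en` suffix, and the `demo_translate` stand-in are assumptions for illustration; any real engine call could be passed in instead.

```python
def add_english_columns(records, text_fields, translate, suffix="_en"):
    """Return copies of the records with an English column per text field.

    `translate` is any callable taking one string and returning one string.
    Repeated answers are translated once and reused from a cache.
    """
    cache = {}
    enriched = []
    for record in records:
        row = dict(record)  # keep the original values untouched
        for field in text_fields:
            original = (record.get(field) or "").strip()
            if not original:
                row[field + suffix] = ""
                continue
            if original not in cache:
                cache[original] = translate(original)
            row[field + suffix] = cache[original]
        enriched.append(row)
    return enriched


def demo_translate(text):
    """Stand-in for a real engine call (hypothetical lookup table)."""
    lookup = {"Eau potable": "Drinking water"}
    return lookup.get(text, text)


submission = add_english_columns(
    [{"id": 1, "topic": "Eau potable"}, {"id": 2, "topic": ""}],
    text_fields=["topic"],
    translate=demo_translate,
)
print(submission)
```

Keeping the original column next to the translation also leaves room for manual correction later, since a moderator can compare both values.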

Resources

Languages translation availability by cloud

(+ = available, - = not available)

Key  Language        NLP Cloud  Azure
en   English         +          +
al   Albanian        +          +
am   Amharic         +          +
ar   Arabic          +          +
bn   Bangla          -          +
dz   Dzongkha        +          -
bs   Bosnian         +          +
mi   Burmese         +          -
fr   French          +          +
ht   Haitian Creole  +          +
lo   Laotian         +          -
mk   Macedonian      +          +
ro   Romanian        +          +
ne   Nepali          +          +
pt   Portuguese      +          +
ru   Russian         +          +
si   Sinhalese       -          -
es   Spanish         +          +
tg   Tajik           +          -
ur   Urdu            +          +
uz   Uzbek           +          +
vi   Vietnamese      +          +

Note: the Nepali data has some English text in it.
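The availability table can also be expressed as a small lookup for choosing an engine per language. The sketch below copies the values from the table and uses the same language keys. Preferring Azure when both engines are available is an assumption, based on the team's existing Azure setup, and the engine labels are hypothetical.

```python
# (NLP Cloud, Azure) availability per language key, copied from the table.
AVAILABILITY = {
    "en": (True, True),
    "al": (True, True),
    "am": (True, True),
    "ar": (True, True),
    "bn": (False, True),
    "dz": (True, False),
    "bs": (True, True),
    "mi": (True, False),
    "fr": (True, True),
    "ht": (True, True),
    "lo": (True, False),
    "mk": (True, True),
    "ro": (True, True),
    "ne": (True, True),
    "pt": (True, True),
    "ru": (True, True),
    "si": (False, False),
    "es": (True, True),
    "tg": (True, False),
    "ur": (True, True),
    "uz": (True, True),
    "vi": (True, True),
}


def pick_engine(lang_key):
    """Return "azure", "nlpcloud", or None for a language key.

    Assumption: Azure is preferred whenever it covers the language.
    Unknown keys are treated as unsupported.
    """
    nlp_cloud, azure = AVAILABILITY.get(lang_key, (False, False))
    if azure:
        return "azure"
    if nlp_cloud:
        return "nlpcloud"
    return None


def unsupported_languages():
    """List the keys that neither engine covers."""
    return sorted(key for key in AVAILABILITY if pick_engine(key) is None)


print(pick_engine("bn"), pick_engine("dz"), unsupported_languages())
```

Languages for which `pick_engine` returns `None` would need manual translation or a custom model.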

Event finished

Hi guys, I just updated the translatepoc notebook with an example of the RapidAPI NLP translation tool. It's free for 300 tokens, and I implemented a function to translate an input. I also uploaded a detectlanguage notebook from my former lectures that could be helpful. Dragan

01.04.2023 07:31 ~ dragan_stoll

Research

7 com (@dragstoll)

6 com (@dragstoll)

31.03.2023 20:30 ~ oleg

Research

Dear Oleg, there was another place where your Azure key was in the notebook. I deleted it and updated the GitHub repository.

31.03.2023 19:43 ~ dragan_stoll

5 com (@dragstoll)

We have a demo project up where we are reading the CSV files and (separately for now) auto-translating the strings.

31.03.2023 19:12 ~ oleg

Research

Cognitive Service demo

Language table updated

31.03.2023 18:56 ~ dragan_stoll

Research

Can I has more RAM plz?

31.03.2023 18:55 ~ oleg

4 com (@dragstoll)

Repository updated

31.03.2023 18:20 ~ oleg

Project

We have started a GitHub Project with a simple kanban plan of our work, and are setting up some Jupyter notebooks and VMs at the moment.

31.03.2023 18:20 ~ oleg

3 com (@dragstoll)

2 com (@dragstoll)

Merge pull request #1 from dragstoll/vladimir/readme_contr

readme

Edited (version 22)

31.03.2023 17:29 ~ oleg

qaStaHvIS ram law' Hoch DaSovbe'chu' (That's MS Klingon for "A glorious valiant death by a thousand hackathons")

31.03.2023 17:28 ~ oleg

Joined the team

31.03.2023 17:26 ~ oleg

first commit (@dragstoll)

Joined the team

31.03.2023 17:16 ~ stajilov

Event started

Edited (version 9)

30.03.2023 12:17 ~ dhevenstone

Edited (version 6)

30.03.2023 11:59 ~ rebeccalk

Joined the team

28.03.2023 13:00 ~ rebeccalk

Challenge

 
All participants, sponsors, partners, volunteers and staff of our hackathon are required to agree to the Hack Code of Conduct. The organizers will enforce this code throughout the event. We expect all participants to cooperate in ensuring a safe environment for everyone. Further details on how the event is run can be found under Guidelines in our wiki.

Creative Commons Licence: Unless otherwise stated, the contents of this website are licensed under a Creative Commons Attribution 4.0 International License.