Hi guys, I just updated the translatepoc notebook with an example of the RapidAPI NLP translation tool. It's free for 300 tokens, and I implemented a function to translate an input. I also uploaded a detectlanguage notebook from my former lectures that could be helpful. Dragan
Helvetas
Auto data collection translation for international surveys
In this project, we studied the technical architecture of a solution designed to improve data flows from global field offices of Helvetas - an independent Swiss development organization building capacity in Africa, Asia, Latin America and Eastern Europe. Our focus was enabling automatic and manual translation and suggesting improvements to data quality.
To make the data as accessible and useful as possible, we aim to incorporate a translation feature. This will enable us to have all the data available in both the native language of the country in which the survey was conducted, and in English for the head office to use. This will be a crucial feature as we strive to ensure that our data is accessible and useful to all stakeholders involved in our projects.
(From challenge detail)
After going through the language requirements, we compiled a matrix of translation engines. We then prototyped code for loading the Helvetas sample datasets into a machine learning environment where standardization and data quality control can take place. Two experiments in automatic translation of texts (see the screenshots) were done using different APIs.
Since the Helvetas team already has an Azure cloud-based architecture, we deployed an instance of the Azure Cognitive Services API for automatic translation, which could be extended with custom ML models for languages that are not yet supported out of the box. The Ever Traduora service, an open source collaborative translation application, was installed on a server for evaluation as a way of combining automatic translation with manual moderation.
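To make the automatic translation step more concrete, here is a minimal sketch of a call to the Translator v3 REST endpoint of Azure Cognitive Services, using only the Python standard library. The function names (`build_request`, `parse_response`, `translate`) are hypothetical and this is not necessarily how our notebook does it. The key and region are assumptions that depend on the Azure resource you deploy.

```python
import json
import urllib.request

# Public global endpoint of the Translator v3 REST API.
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"


def build_request(texts, key, region, to_lang="en"):
    """Prepare (but do not send) a Translator v3 request for a batch of strings."""
    url = f"{ENDPOINT}?api-version=3.0&to={to_lang}"
    body = json.dumps([{"Text": t} for t in texts]).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,        # read from the environment, never commit it
        "Ocp-Apim-Subscription-Region": region,  # needed for regional resources
        "Content-Type": "application/json; charset=UTF-8",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def parse_response(payload):
    """Return (detected_language, translated_text) for each input string."""
    results = []
    for item in payload:
        # The source language is auto-detected because no "from" parameter is sent.
        detected = item.get("detectedLanguage", {}).get("language")
        results.append((detected, item["translations"][0]["text"]))
    return results


def translate(texts, key, region, to_lang="en"):
    """Send the request and parse the answer (needs network access and a valid key)."""
    request = build_request(texts, key, region, to_lang)
    with urllib.request.urlopen(request) as response:
        return parse_response(json.load(response))
```

Building the request and parsing the response are kept separate so that both can be checked offline. Only `translate` touches the network.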
We looked at the Kobo API, and started working on data analysis and translation features in Python and Pandas. We worked for approximately 20 hrs (4 people x 5 hrs) on this challenge, and gave a brief verbal presentation of the results at the end of Hack4SocialGood. Our exchanges led to some knowledge sharing, and everything we worked on can be found in our open source repository:
- Kanban project board
- Language table with translation capabilities
- Notebook for data loading and parsing (Jupyter/Python)
- Notebook for language detection (Jupyter/Python)
- Notebook for translation with Azure Cognitive Services API (Jupyter/Python)
- Demo server for Traduora
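As a rough illustration of how data loading and translation could fit together, the sketch below reads a survey export and adds an English column next to each free-text column, so the original language is kept. It is a standard-library stand-in for what the notebooks do with Pandas. The sample data, the column names and the `fake_translate` function are all hypothetical, and a real run would plug in a call to a translation API instead.

```python
import csv
import io


def add_english_columns(rows, text_columns, translate):
    """Add a '<column>_en' field next to each free-text column.

    `translate` is any callable mapping a string to its English version;
    every distinct string is sent to it only once.
    """
    cache = {}
    out = []
    for row in rows:
        new_row = dict(row)  # keep the original values untouched
        for col in text_columns:
            text = (row.get(col) or "").strip()
            if not text:
                new_row[f"{col}_en"] = ""
                continue
            if text not in cache:
                cache[text] = translate(text)
            new_row[f"{col}_en"] = cache[text]
        out.append(new_row)
    return out


# Hypothetical survey export; a real Kobo export has many more columns.
SAMPLE_CSV = """id,district,comment
1,Nord,L'eau est propre
2,Sud,L'eau est propre
3,Sud,
"""

# Stand-in for a real translation call, so the example runs offline.
FAKE_TRANSLATIONS = {"L'eau est propre": "The water is clean"}
calls = []


def fake_translate(text):
    calls.append(text)
    return FAKE_TRANSLATIONS.get(text, text)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
translated = add_english_columns(rows, ["comment"], fake_translate)
```

Caching distinct strings matters here because survey answers repeat a lot and the free tiers of the translation services are limited.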
Screenshots
Screenshots of Traduora service
Screenshot of deployment of Azure service
Screenshot of automatic translations in a Jupyter notebook
Challenge
Automatic translation of data collection for international surveys
Helvetas is a development organization with projects in 30 middle- and low-income countries that focus on vocational training, clean water and sanitation, small businesses, and community empowerment. To collect data we use KoboToolbox, a free online survey tool that can be used to create surveys in more than 50 languages.
We are developing a user interface that enables project staff to clean their data and submit it to the data pool. Staff will then be able to create PowerBI dashboards. But the head office in Switzerland also needs to understand the data and create aggregated PowerBI dashboards. The problem is that the data must be translated from several languages into English. This translation could be embedded in the user interface or done as part of the submission.
Skills: Data, AI/translation, data dashboards, development
Auto data collection translation for international surveys
Helvetas is a development organization; we have projects in 30 middle- and low-income countries that focus on vocational training, clean water and sanitation, small businesses, community empowerment and more. We collect data via KoboToolbox in our 30 country offices for projects such as vocational training, clean water, migration and government services. The Kobo surveys are in 50+ languages. How can we translate the data so that it can be used for aggregate (worldwide) performance indicator dashboards in PowerBI? We are developing a user interface (UI) that will enable project staff to clean their Kobo data and submit it to the Data Lake. This translation could be embedded in the UI or done as part of the submission process from the UI to the Data Lake.
Skills: Data, AI/translation, data dashboards, development
Resources
- KoboToolbox website - test server
- Collection of Data samples (Sharepoint)
Language translation availability by cloud service (+ supported, - not supported)
Key | Languages | NLP Cloud | Azure |
---|---|---|---|
en | English | + | + |
al | Albanian | + | + |
am | Amharic | + | + |
ar | Arabic | + | + |
bn | Bangla | - | + |
dz | Dzongkha | + | - |
bs | Bosnian | + | + |
mi | Burmese | + | - |
fr | French | + | + |
ht | Haitian Creole | + | + |
lo | Laotian | + | - |
mk | Macedonian | + | + |
ro | Romanian | + | + |
ne | Nepali (has some English text in Nepali data) | + | + |
pt | Portuguese | + | + |
ru | Russian | + | + |
si | Sinhalese | - | - |
es | Spanish | + | + |
tg | Tajik | + | - |
ur | Urdu | + | + |
uz | Uzbek | + | + |
vi | Vietnamese | + | + |
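The table above can be turned into a small lookup that decides which engine to call for a given language key. This is only a sketch: the `choose_engine` helper is hypothetical, and preferring Azure when both engines are available is an assumption based on the existing Azure architecture at Helvetas.

```python
# Support matrix copied from the table above: key -> (NLP Cloud, Azure).
SUPPORT = {
    "en": (True, True), "al": (True, True), "am": (True, True),
    "ar": (True, True), "bn": (False, True), "dz": (True, False),
    "bs": (True, True), "mi": (True, False), "fr": (True, True),
    "ht": (True, True), "lo": (True, False), "mk": (True, True),
    "ro": (True, True), "ne": (True, True), "pt": (True, True),
    "ru": (True, True), "si": (False, False), "es": (True, True),
    "tg": (True, False), "ur": (True, True), "uz": (True, True),
    "vi": (True, True),
}
ENGINES = ("NLP Cloud", "Azure")


def choose_engine(key, preferred="Azure"):
    """Return the engine to use for a language key, or None if no engine supports it."""
    if key not in SUPPORT:
        return None
    available = [name for name, ok in zip(ENGINES, SUPPORT[key]) if ok]
    if preferred in available:
        return preferred
    # Fall back to whichever engine is left, if any.
    return available[0] if available else None
```

For example, `choose_engine("bn")` returns `"Azure"`, `choose_engine("dz")` returns `"NLP Cloud"`, and `choose_engine("si")` returns `None`. A `None` result marks the languages that would need a custom model or manual translation, for instance through Traduora.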
Event finish
Research
7 com (@dragstoll)
6 com (@dragstoll)
Research
Dear Oleg, there was another place where your Azure key was in the notebook. I deleted it and updated the GitHub repository.
5 com (@dragstoll)
We have a demo project up where we are reading the CSV files and (separately for now) auto-translating the strings.
Research
Cognitive Service demo
Language table updated
Research
Can I has more RAM plz?
It works!
4 com (@dragstoll)
Repository updated
Project
We have started a GitHub Project with a simple kanban plan of our work, and are setting up some Jupyter notebooks and VMs at the moment.
3 com (@dragstoll)
Merge branch 'main' of https://github.com/dragstoll/helvetas_translation (@dragstoll)
2 com (@dragstoll)
Merge pull request #1 from dragstoll/vladimir/readme_contr
readme
qaStaHvIS ram law' Hoch DaSovbe'chu' (That's MS Klingon for "A glorious valiant death by a thousand hackathons")
Joined the team
first commit (@dragstoll)
Joined the team
Start
Joined the team