Lux-ASR: Speech to Text System

The University of Luxembourg provides an Automatic Speech Recognition (ASR) system for Luxembourgish and several other languages (English, French, German, Portuguese and Spanish). Four output formats are available: plain text (txt), SubRip subtitles (srt), JSON (with or without time codes for words) and Praat TextGrid. As an experimental feature, translation of the recognized Luxembourgish text has been added; it outputs the text in English, French, German, Portuguese, and Spanish.

The speech to transcribe can be recorded from a microphone or uploaded as an audio or video file. If the recording contains more than one speaker, setting diarization to “On” separates the text of each speaker and adds time codes for their turns.

API Access

Lux-ASR can also be accessed through an API, for example with curl:

curl -X POST "https://luxasr.uni.lu/v2/asr?diarization=Enabled&outfmt=text" \
  -H "accept: application/json" \
  -F "audio_file=@PATH/TO/AUDIO_FILE;type=audio/wav"

The API returns the transcription in the specified output format.
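When outfmt=srt is requested, the response can be post-processed like any other subtitle file. The following is a minimal sketch that assumes the response follows the usual SubRip layout (a counter line, a time code line of the form HH:MM:SS,mmm --> HH:MM:SS,mmm, then the text). The names Cue and parse_srt are hypothetical helpers, and the sample text is made up for illustration; it is not real Lux-ASR output.

```python
import re
from dataclasses import dataclass

# Matches a SubRip time code line such as "00:00:02,500 --> 00:00:04,000".
TIMECODE = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str


def _seconds(hours, minutes, seconds, millis):
    """Convert the four time code fields to seconds."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_srt(srt_text):
    """Parse SubRip text into a list of Cue objects (assumes the standard layout)."""
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMECODE.search(line)
            if match:
                fields = match.groups()
                # Everything after the time code line is the cue text.
                text = " ".join(part.strip() for part in lines[i + 1:]).strip()
                cues.append(Cue(_seconds(*fields[:4]), _seconds(*fields[4:]), text))
                break
    return cues


# Made-up sample, for illustration only.
sample = """1
00:00:00,000 --> 00:00:02,500
Hello, how are you?

2
00:00:02,500 --> 00:00:04,000
I am fine, thank you.
"""

cues = parse_srt(sample)
for cue in cues:
    print(f"{cue.start:.1f}s to {cue.end:.1f}s: {cue.text}")
```

How speaker labels appear in the subtitle text when diarization is enabled is not covered here; the sketch keeps whatever text follows the time code line unchanged.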

Query Parameters

diarization: Can be set to Enabled (default) or Disabled to include or exclude speaker diarization.
outfmt: Specifies the output format. Supported values are:
- text – plain text transcript (default)
- json – detailed JSON output
- srt – SubRip subtitle format
- textgrid – Praat TextGrid format

Accepted audio formats are .wav, .mp3, and .m4a.
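The parameter values and file types listed above can be checked on the client side before a request is sent. The sketch below is one possible way to do this; build_request_url and is_accepted_audio are hypothetical helper names, and treating file extensions as case-insensitive is an assumption.

```python
from pathlib import Path
from urllib.parse import urlencode

# Values taken from the parameter list above.
BASE_URL = "https://luxasr.uni.lu/v2/asr"
DIARIZATION_VALUES = ("Enabled", "Disabled")
OUTPUT_FORMATS = ("text", "json", "srt", "textgrid")
AUDIO_EXTENSIONS = (".wav", ".mp3", ".m4a")


def build_request_url(diarization="Enabled", outfmt="text"):
    """Return the endpoint URL with validated query parameters (hypothetical helper)."""
    if diarization not in DIARIZATION_VALUES:
        raise ValueError(f"diarization must be one of {DIARIZATION_VALUES}")
    if outfmt not in OUTPUT_FORMATS:
        raise ValueError(f"outfmt must be one of {OUTPUT_FORMATS}")
    query = urlencode({"diarization": diarization, "outfmt": outfmt})
    return f"{BASE_URL}?{query}"


def is_accepted_audio(path):
    """Check the file extension against the accepted formats (assumed case-insensitive)."""
    return Path(path).suffix.lower() in AUDIO_EXTENSIONS


print(build_request_url("Disabled", "srt"))
print(is_accepted_audio("interview.MP3"))
```

Rejecting an unsupported value locally avoids uploading a large audio file only to have the request fail.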

Python Script

Below is a basic Python script that replicates the functionality of the curl command with added flexibility. It requires the third-party requests package. You can specify the audio file and optionally choose whether to enable diarization and which output format to use.

import requests
import argparse
import os
import sys

def main():
    parser = argparse.ArgumentParser(
        description="Send an audio file to the LuxASR API for transcription."
    )
    parser.add_argument(
        "audio_file",
        type=str,
        help="Path to the audio file (.wav, .mp3, .m4a)"
    )
    parser.add_argument(
        "--diarization",
        choices=["Enabled", "Disabled"],
        default="Enabled",
        help="Enable or disable speaker diarization (default: Enabled)"
    )
    parser.add_argument(
        "--outfmt",
        choices=["text", "json", "srt", "textgrid"],
        default="text",
        help="Output format: text, json, srt, or textgrid (default: text)"
    )

    args = parser.parse_args()

    if not os.path.isfile(args.audio_file):
        print(f"Error: File '{args.audio_file}' not found.", file=sys.stderr)
        sys.exit(1)

    url = f"https://luxasr.uni.lu/v2/asr?diarization={args.diarization}&outfmt={args.outfmt}"
    headers = {
        "accept": "application/json"
    }

    # Determine the MIME type from the file extension
    ext = os.path.splitext(args.audio_file)[1].lower()
    mime_types = {
        ".wav": "audio/wav",
        ".mp3": "audio/mpeg",
        ".m4a": "audio/mp4",
    }
    # Fall back to a generic binary type for unknown extensions
    mime_type = mime_types.get(ext, "application/octet-stream")

    with open(args.audio_file, "rb") as audio:
        files = {
            "audio_file": (os.path.basename(args.audio_file), audio, mime_type)
        }
        response = requests.post(url, headers=headers, files=files)

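    # Report HTTP errors instead of printing an error body as if it were a transcript
    if not response.ok:
        print(f"Error: request failed with HTTP {response.status_code}: {response.text}", file=sys.stderr)
        sys.exit(1)

    print(response.text)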

if __name__ == "__main__":
    main()

Usage

python luxasr_transcribe.py path/to/your_audio.wav --diarization Enabled --outfmt json

Replace path/to/your_audio.wav with the path to your actual audio file. The --diarization and --outfmt options are optional and default to Enabled and text, respectively.
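The script prints the transcription to the terminal. To keep the result, it can be written to a file next to the audio file. The following is a minimal sketch; output_path_for and save_transcript are hypothetical helpers, and the mapping from outfmt values to file extensions is an assumption based on the formats listed above.

```python
from pathlib import Path

# Assumed mapping from outfmt values to conventional file extensions.
EXTENSIONS = {
    "text": ".txt",
    "json": ".json",
    "srt": ".srt",
    "textgrid": ".TextGrid",
}


def output_path_for(audio_path, outfmt):
    """Return the audio file's path with the extension that matches outfmt."""
    return Path(audio_path).with_suffix(EXTENSIONS[outfmt])


def save_transcript(transcript, audio_path, outfmt):
    """Write the transcript next to the audio file and return the new path."""
    target = output_path_for(audio_path, outfmt)
    target.write_text(transcript, encoding="utf-8")
    return target


print(output_path_for("recordings/interview.wav", "srt"))
```

In the script above, calling save_transcript(response.text, args.audio_file, args.outfmt) after the request would store the result instead of, or in addition to, printing it.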

Lux-ASR is under constant development by Peter Gilles, Nina Hosseini-Kivanani, and Léopold Hillah at the University of Luxembourg and is supported by the Chambre des Députés du Grand-Duché de Luxembourg.

Disclaimer

Note that the transcription and the translation are run on a dedicated server at the University of Luxembourg. All data thus stays within Luxembourg and the University’s network. Nobody has access to the uploaded audio or the text output. The audio data is streamed to this server and no files are stored on this server or in the network. No data is used to further train the model and no data is transferred to third parties.