Lost in Translation on Luxembourg’s Web

By crossing language detection with topic modeling on 81,000 website-years, I examine whether essential services — government, healthcare, childcare — are as linguistically accessible as market-driven sectors like real estate and restaurants.
Published: March 27, 2026

In my first post, I found that 75% of Luxembourg websites offer French but only 2.4% offer Portuguese — despite Portuguese speakers being 14.5% of the population. In the second, I discovered that real estate dominates the web while construction splits into 11 subgroups. But neither post could answer the following question: if you only speak Portuguese, can you find a doctor online? Can you navigate your commune’s website? Can you find a crèche for your child?

By joining the two datasets — language availability and topic (hereafter sector) classification — I can now map which services are accessible in which languages.

Note: The Dataset

In 2024, 4,629 classified websites have both a sector label and detected languages. The full panel spans 16 sectors over 9 years (2016 to 2024).

The Language Map of Luxembourg’s Sectors

Which languages does each sector make available? The heatmap below shows language availability across all sectors in 2024. Darker cells mean more websites in that sector offer that language.

Language availability by sector (2024)

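import plotly.graph_objects as go

# sector_lang and lang_names are assumed to be loaded by an earlier setup cell.
# Build heatmap data (exclude "Other": it's a catch-all, not a real sector)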
sector_lang_filtered = [s for s in sector_lang if s['sector'] != 'Other']
sectors_sorted = [s['sector'] for s in sector_lang_filtered]
langs = ['fr', 'de', 'en', 'lb', 'pt']
lang_labels = [lang_names[l] for l in langs]

z_data = []
hover_data = []
for s in sector_lang_filtered:
    row = [s[f'{l}_pct'] for l in langs]
    z_data.append(row)
    hover_row = [
        f"<b>{s['sector']}</b><br>"
        f"{lang_names[l]}: {s[f'{l}_pct']}%<br>"
        f"({s['n_websites']} websites)"
        for l in langs
    ]
    hover_data.append(hover_row)

fig = go.Figure(go.Heatmap(
    z=z_data,
    x=lang_labels,
    y=sectors_sorted,
    colorscale=[
        [0, '#f7f7f7'],
        [0.15, '#d1e5f0'],
        [0.35, '#92c5de'],
        [0.55, '#4393c3'],
        [0.75, '#2166ac'],
        [1.0, '#053061']
    ],
    text=[[f'{v}%' for v in row] for row in z_data],
    texttemplate='%{text}',
    textfont=dict(size=11),
    hovertext=hover_data,
    hovertemplate='%{hovertext}<extra></extra>',
    colorbar=dict(
        title=dict(text='% of websites', side='right'),
        ticksuffix='%',
        len=0.7
    ),
    zmin=0,
    zmax=100,
))

fig.update_layout(
    height=max(450, len(sectors_sorted) * 32),
    margin=dict(t=20, r=100, b=50, l=180),
    plot_bgcolor='white',
    paper_bgcolor='white',
    xaxis=dict(side='top', tickangle=0),
    yaxis=dict(autorange='reversed', tickfont=dict(size=11)),
)

fig.show(config={'displayModeBar': False})

Note: Each cell shows the percentage of websites in that sector offering that language. Sectors sorted by number of websites (largest at top). Only sectors with 5+ websites shown.

Tip: Key Findings

The heatmap reveals strikingly different language worlds. Finance & Law is the only sector where English (65%) beats French (62%), in line with English being the language of international capital markets. Childcare is the opposite extreme: 97% French and just 18% English, reflecting that crèches primarily communicate in French.

Community organizations (scouts, churches, fire services) have the highest Luxembourgish availability at 33%; these seem to be the most culturally rooted institutions. Culture & Tourism is the most multilingual sector overall, with 58% of its websites classed as multilingual, which fits a sector that serves international visitors.

Now look at the Portuguese column: it’s near zero almost everywhere. Public Services show 0% — not a single municipal or government website in the sample offers Portuguese.

Essential Services vs. the Market

Does the market do a better job at multilingualism than the public sector? I grouped sectors into two categories: essential services (government, healthcare, childcare) that residents need, and market-driven sectors (real estate, restaurants, retail, finance) that compete for customers.

Language availability: essential services vs. market-driven sectors (2024)

ess = ess_vs_mkt['essential']
mkt = ess_vs_mkt['market']

langs_compare = ['fr', 'de', 'en', 'lb', 'pt']
labels = [lang_names[l] for l in langs_compare] + ['Multilingual']
ess_vals = [ess.get(f'{l}_pct', 0) for l in langs_compare] + [ess.get('multilingual_pct', 0)]
mkt_vals = [mkt.get(f'{l}_pct', 0) for l in langs_compare] + [mkt.get('multilingual_pct', 0)]

fig = go.Figure()

fig.add_trace(go.Bar(
    name=f"Essential Services (N={ess['n_websites']})",
    x=labels,
    y=ess_vals,
    marker_color='#2166ac',
    text=[f'{v}%' for v in ess_vals],
    textposition='outside',
    hovertemplate='<b>Essential Services</b><br>%{x}: %{y:.1f}%<extra></extra>',
))

fig.add_trace(go.Bar(
    name=f"Market-Driven (N={mkt['n_websites']})",
    x=labels,
    y=mkt_vals,
    marker_color='#b2182b',
    text=[f'{v}%' for v in mkt_vals],
    textposition='outside',
    hovertemplate='<b>Market-Driven Sectors</b><br>%{x}: %{y:.1f}%<extra></extra>',
))

fig.update_layout(
    height=450,
    margin=dict(t=20, r=20, b=100, l=60),
    plot_bgcolor='white',
    paper_bgcolor='white',
    barmode='group',
    yaxis=dict(
        title='% of websites', ticksuffix='%',
        gridcolor='#eee', zerolinecolor='#eee',
        range=[0, max(max(ess_vals), max(mkt_vals)) * 1.2],
    ),
    xaxis=dict(gridcolor='#eee', zerolinecolor='#eee'),
    legend=dict(orientation='h', y=-0.22, x=0.5, xanchor='center'),
)

fig.show(config={'displayModeBar': False})

Note: Essential services = Public Services, Healthcare, Childcare. Market-driven = Real Estate, Restaurants, Retail, Finance & Law.
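The grouping in this note can be sketched as a pooled share: websites from the member sectors are combined and the percentage is taken over the pooled total, so larger sectors weigh more. This is a minimal sketch under assumptions. The per-sector counts are made-up placeholders rather than the post's data, `pooled_pct` is a hypothetical helper, and the post does not say whether it pools websites or averages sector percentages.

```python
# Illustrative only: the counts below are placeholders, not the post's data.
GROUPS = {
    "essential": ["Public Services", "Healthcare", "Childcare"],
    "market": ["Real Estate", "Restaurants", "Retail", "Finance & Law"],
}

# Hypothetical per-sector counts: total websites and websites offering English.
sector_counts = {
    "Public Services": {"n_websites": 200, "en": 50},
    "Healthcare": {"n_websites": 100, "en": 40},
    "Childcare": {"n_websites": 100, "en": 20},
    "Real Estate": {"n_websites": 400, "en": 200},
    "Restaurants": {"n_websites": 300, "en": 90},
    "Retail": {"n_websites": 200, "en": 80},
    "Finance & Law": {"n_websites": 100, "en": 65},
}


def pooled_pct(sectors, counts, lang):
    """Share of websites offering `lang`, pooled over all listed sectors."""
    total = sum(counts[s]["n_websites"] for s in sectors)
    offering = sum(counts[s][lang] for s in sectors)
    return round(100 * offering / total, 1) if total else None


group_pct = {
    name: pooled_pct(members, sector_counts, "en")
    for name, members in GROUPS.items()
}
print(group_pct)
```

Pooling and averaging can give different answers when sector sizes differ a lot, so it is worth stating which one a chart uses.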

Tip: Key Findings

The gap is clear in English: market-driven sectors offer it on 46% of websites, while essential services manage only 28%. This matters for the 48% of Luxembourg’s residents who are foreign-born — many of whom speak English but not necessarily French or German.

Luxembourgish shows the reverse pattern: 14% in essential services vs. 4% in the market — government and healthcare websites are more likely to be offered in Luxembourgish.

Portuguese availability is equally low in both groups (2.4%) — neither the market nor the public sector serves this community online.

The Portuguese Gap

Portuguese speakers make up 14.5% of Luxembourg’s population — the largest immigrant community. Yet the previous analysis found Portuguese on just 2.4% of websites overall. How does this gap look across sectors that matter most for daily life?

Portuguese availability by sector vs. population share (14.5%)

# Filter to named sectors with enough websites
pt_sectors = [s for s in pt_gap if s['n_websites'] >= 10 and s['sector'] != 'Other'][:15]

sectors_list = [s['sector'] for s in reversed(pt_sectors)]
pt_pcts = [s['pt_pct'] for s in reversed(pt_sectors)]

fig = go.Figure()

# Portuguese availability bars
fig.add_trace(go.Bar(
    y=sectors_list,
    x=pt_pcts,
    orientation='h',
    name='Portuguese availability',
    marker_color='#7c3aed',
    text=[f'{v}%' for v in pt_pcts],
    textposition='outside',
    hovertemplate='<b>%{y}</b><br>Portuguese: %{x:.1f}%<extra></extra>',
))

# Population reference line
fig.add_vline(
    x=14.5, line_dash='dash', line_color='#ef4444', line_width=2,
    annotation_text='Population share (14.5%)',
    annotation_position='top',
    annotation_font_color='#ef4444',
)

fig.update_layout(
    height=max(400, len(pt_sectors) * 30),
    margin=dict(t=40, r=80, b=50, l=180),
    plot_bgcolor='white',
    paper_bgcolor='white',
    xaxis=dict(
        title='% of websites offering Portuguese',
        ticksuffix='%', gridcolor='#eee', zerolinecolor='#eee',
        range=[0, max(max(pt_pcts) * 1.3, 18)],
    ),
    yaxis=dict(tickfont=dict(size=11)),
    showlegend=False,
)

fig.show(config={'displayModeBar': False})

Note: The dashed red line marks the share of Portuguese speakers in Luxembourg’s population (14.5%, source). Sectors sorted by gap size (largest gap at top).

Tip: Key Findings

Not a single sector comes close to the 14.5% population benchmark. The closest is Automotive at 8.7%.

Public Services sit at exactly 0% — the most striking finding. Of 187 government and municipal websites, none offers Portuguese. Healthcare (2.9%) and Childcare (5.3%) fare marginally better, but the gap with the population share remains large.

Even in Construction, a sector where Portuguese workers make up a large share of the workforce, only 1.7% of websites offer Portuguese.
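The size of the gap for the sectors quoted above can be worked out directly as the population share minus the availability, in percentage points. The sketch below uses only the figures cited in this section; the variable names are illustrative, and the ranking covers just these five sectors, not the full chart.

```python
POPULATION_SHARE = 14.5  # % of Portuguese speakers, as cited in the post

# Sector percentages quoted in the Key Findings.
pt_availability = {
    "Automotive": 8.7,
    "Childcare": 5.3,
    "Healthcare": 2.9,
    "Construction": 1.7,
    "Public Services": 0.0,
}

# Gap in percentage points; rounding guards against floating-point noise.
pt_gaps = {
    sector: round(POPULATION_SHARE - pct, 1)
    for sector, pct in pt_availability.items()
}

# Largest gap first.
ranked = sorted(pt_gaps.items(), key=lambda kv: kv[1], reverse=True)
for sector, gap in ranked:
    print(f"{sector:<16} {gap:>5.1f} pp below population share")
```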

Is the Gap Closing?

Has multilingualism in essential services improved over time? The chart below tracks English availability in key sectors from 2016 to 2024.

English availability over time in selected sectors

years = evolution['years']
focus_sectors = [
    'Public Services', 'Healthcare', 'Childcare',
    'Real Estate', 'Finance & Law', 'Restaurants'
]

sector_colors = {
    'Public Services': '#2166ac',
    'Healthcare': '#469990',
    'Childcare': '#fabed4',
    'Real Estate': '#4363d8',
    'Finance & Law': '#1e3a5f',
    'Restaurants': '#e6194b',
}

fig = go.Figure()

for sector_data in evolution['sectors']:
    name = sector_data['sector']
    if name not in focus_sectors:
        continue
    en_pcts = sector_data['en_pct']
    # Filter out None values for clean lines
    valid = [(y, v) for y, v in zip(years, en_pcts) if v is not None]
    if not valid:
        continue
    xs, ys = zip(*valid)

    fig.add_trace(go.Scatter(
        x=list(xs),
        y=list(ys),
        name=name,
        mode='lines+markers',
        line=dict(
            color=sector_colors.get(name, '#999'),
            width=3, shape='spline', smoothing=0.8,
        ),
        marker=dict(size=6),
        hovertemplate=f'<b>{name}</b><br>English: %{{y:.1f}}%<br>Year: %{{x}}<extra></extra>',
    ))

fig.update_layout(
    height=450,
    margin=dict(t=20, r=20, b=100, l=60),
    plot_bgcolor='white',
    paper_bgcolor='white',
    xaxis=dict(title='Year', dtick=1, gridcolor='#eee', zerolinecolor='#eee'),
    yaxis=dict(
        title='% of websites offering English', ticksuffix='%',
        gridcolor='#eee', zerolinecolor='#eee',
    ),
    legend=dict(orientation='h', y=-0.28, x=0.5, xanchor='center'),
    hovermode='x unified',
)

fig.show(config={'displayModeBar': False})

Note: A sector appears in a given year only if it has 10 or more websites that year. Sector-years below that threshold are left out of the line as missing data points.
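The threshold in this note could be applied when the yearly series is built, by emitting `None` for small samples, which the plotting loop above already filters out. This is a minimal sketch: `en_pct_series`, the `(n_websites, n_offering_english)` layout, and the counts are all assumptions made for illustration.

```python
MIN_WEBSITES = 10  # threshold stated in the note


def en_pct_series(counts_by_year, years, min_n=MIN_WEBSITES):
    """Return one English share per year, or None where the sample is too small.

    counts_by_year maps year -> (n_websites, n_offering_english); this layout
    is an assumption made for the example.
    """
    series = []
    for year in years:
        n, n_en = counts_by_year.get(year, (0, 0))
        series.append(round(100 * n_en / n, 1) if n >= min_n else None)
    return series


# Hypothetical counts for a single sector.
years = [2016, 2017, 2018]
counts = {2016: (12, 4), 2017: (8, 3), 2018: (20, 5)}
print(en_pct_series(counts, years))
```

Keeping the gap as `None` instead of dropping the year keeps every sector's list aligned with the shared `years` list.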

Tip: Key Findings

Finance & Law has consistently led in English availability, hovering around 60-65% throughout the period. Real Estate trends upward, reflecting international demand for Luxembourg property.

The most concerning trend is Childcare: English availability dropped from 33% (2016) to 18% (2024). Crèche websites are becoming more monolingual over time, not less — moving in the opposite direction of what digital inclusion would require. In a country where a growing share of parents are international workers, this narrowing of linguistic access is worth attention.
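The Childcare figures can be read two ways, as a percentage-point change or as a relative change, and the two are easy to confuse. A small worked example using the 33% and 18% quoted above (the `change` helper is hypothetical):

```python
def change(start_pct, end_pct):
    """Return (percentage-point change, relative change in %)."""
    pp = end_pct - start_pct
    rel = 100 * pp / start_pct
    return round(pp, 1), round(rel, 1)


# Childcare English availability: 33% in 2016, 18% in 2024.
pp, rel = change(33, 18)
print(f"{pp:+.1f} pp, {rel:+.1f}% relative")
```

So the 15-point drop means the share of crèche websites offering English fell by a little under half.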

Methodology

This analysis joins two existing datasets at the website-year level, combining language detection from the first post with topic classification from the second. No new machine learning models are trained — the value is in the cross-referencing.

1. Language Data
Language availability per website-year from a three-tier detection pipeline: hreflang extraction (regex), LLM-based language switcher detection (Magistral), and FastText fallback for monolingual sites. Covers 83,728 website-years with flags for French, German, English, Luxembourgish, Portuguese, Dutch, and Other.
2. Topic Data
Sector classifications from a global BERTopic model run on 81,585 website-years. Each website-year is assigned one of 191 topics (or marked as outlier). Topics are manually grouped into 15 sectors (Real Estate, Healthcare, Public Services, etc.) via a sector mapping.
3. Join & Aggregate
An inner join on (website_url, year) produces the analysis dataset. Outlier websites (42%), which are too distinctive to fall into any cluster, are excluded because they have no sector label. For each sector and year, I compute the share of websites offering each language and the overall multilingual rate.
4. Policy Groupings
Sectors are grouped into essential services (Public Services, Healthcare, Childcare) and market-driven sectors (Real Estate, Restaurants, Retail, Finance & Law) for the comparative analysis. The Portuguese gap analysis compares web availability against the 14.5% population share of Portuguese speakers in Luxembourg (source).
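The join and aggregation steps above can be sketched in plain Python as follows. This is an illustration, not the post's actual pipeline. The rows are invented, the dictionary layout is an assumption, and "multilingual" is assumed to mean two or more detected languages, which the post does not define explicitly. The output keys (`n_websites`, `fr_pct`, `multilingual_pct`, and so on) mirror the field names used by the plotting code.

```python
from collections import defaultdict

LANGS = ["fr", "de", "en", "lb", "pt"]

# Hypothetical rows keyed by (website_url, year); the real tables are far larger.
languages = {
    ("a.lu", 2024): {"fr", "en"},
    ("b.lu", 2024): {"fr"},
    ("c.lu", 2024): {"fr", "de", "pt"},
    ("d.lu", 2024): {"en"},
}
# A sector of None stands in for topic-model outliers, which have no label.
sectors = {
    ("a.lu", 2024): "Real Estate",
    ("b.lu", 2024): "Real Estate",
    ("c.lu", 2024): "Childcare",
    ("d.lu", 2024): None,
    ("e.lu", 2024): "Healthcare",  # no language row, so the inner join drops it
}

# Inner join on the shared key, skipping outliers.
joined = [
    (key[1], sectors[key], langs)
    for key, langs in languages.items()
    if sectors.get(key) is not None
]

# Group by (sector, year), then compute the share offering each language.
groups = defaultdict(list)
for year, sector, langs in joined:
    groups[(sector, year)].append(langs)

summary = {}
for key, sites in groups.items():
    n = len(sites)
    row = {"n_websites": n}
    for lang in LANGS:
        row[f"{lang}_pct"] = round(100 * sum(lang in s for s in sites) / n, 1)
    # Assumption: "multilingual" means two or more detected languages.
    row["multilingual_pct"] = round(100 * sum(len(s) >= 2 for s in sites) / n, 1)
    summary[key] = row

print(summary[("Real Estate", 2024)])
```

With tabular data, the same steps map onto a DataFrame merge followed by a group-by; the dictionary version is used here only to keep the example dependency-free.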

Citation

For attribution, please cite this work as:

Garbers, J. (2026, March). Lost in Translation on Luxembourg's Web.
Retrieved from https://github.com/julio-garbers/blog/tree/main/language_sector_lux

BibTeX:

@misc{garbers2026translation,
  author = {Garbers, Julio},
  title = {Lost in Translation on Luxembourg's Web},
  url = {https://github.com/julio-garbers/blog/tree/main/language_sector_lux},
  year = {2026}
}