The Linguistic Web of Luxembourg

Using CommonCrawl data, I analyzed 83,728 website-year observations of Luxembourg (.lu) websites from 2013 to 2024, detecting which languages each website offers. Below, I explore language trends over time, multilingual patterns, and the most common language configurations.
Published: January 17, 2026

Note: Website Languages in 2024

Language         Share of websites
French           75%
English          39%
German           33%
Luxembourgish    9%
Dutch            4%
Portuguese       2%
Other            4%
Multilingual     41%

The Language Landscape

Which languages do Luxembourg websites make available to visitors?

Language availability over time (% of websites offering each language)

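import plotly.graph_objects as go

# Assumes `yearly`, `lang_names`, and `lang_colors` are defined beforehand.
fig = go.Figure()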

for lang in ['fr', 'de', 'en', 'lb', 'pt', 'nl', 'other']:
    fig.add_trace(go.Scatter(
        x=yearly['year'].to_list(),
        y=yearly[f'{lang}_pct'].to_list(),
        name=lang_names[lang],
        mode='lines+markers',
        line=dict(color=lang_colors[lang], width=3, shape='spline', smoothing=0.8),
        marker=dict(size=6),
        hovertemplate=f"<b>{lang_names[lang]}</b><br>%{{y:.1f}}% of websites<br>Year: %{{x}}<extra></extra>"
    ))

fig.update_layout(
    height=450,
    margin=dict(t=20, r=20, b=100, l=60),
    plot_bgcolor="white",
    paper_bgcolor="white",
    xaxis=dict(title="Year", dtick=1, gridcolor="#eee", zerolinecolor="#eee"),
    yaxis=dict(title="% of websites", ticksuffix="%", gridcolor="#eee", zerolinecolor="#eee"),
    legend=dict(orientation="h", y=-0.28, x=0.5, xanchor="center"),
    hovermode="x unified"
)

fig.show(config={'displayModeBar': False})

Note: Percentages indicate share of websites offering each language. Sites can offer multiple languages, so percentages do not sum to 100%.
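To make the note above concrete, here is a minimal sketch of how per-language shares could be computed. The `sites` mapping and the `language_shares` helper are hypothetical, with made-up values; they illustrate the arithmetic only and are not the pipeline behind the chart.

```python
from collections import Counter

# Hypothetical toy data: each website maps to the set of languages it offers.
sites = {
    "a.lu": {"fr"},
    "b.lu": {"fr", "de"},
    "c.lu": {"fr", "en", "de"},
    "d.lu": {"en"},
}

def language_shares(sites):
    """Return the percentage of websites that offer each language."""
    counts = Counter(lang for langs in sites.values() for lang in langs)
    total = len(sites)
    return {lang: 100 * n / total for lang, n in counts.items()}

shares = language_shares(sites)
```

Because a multilingual site is counted once for every language it offers, the shares in this toy example add up to more than 100%.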

Key Findings

French dominates at 75% and has remained stable across the whole period. I don’t have a good explanation for the sharp 2019 drop in the share of websites offering German, English, and Luxembourgish.

The increase in websites in Luxembourgish from 2016–2018 coincides with the government’s “Strategie fir d’Promotioun vun der Lëtzebuerger Sprooch”, launched in March 2017. This plan aimed to increase digital presence and standardization of Luxembourgish, culminating in the July 2018 law establishing a Language Commissioner and the Centre for the Luxembourgish Language. (source)

Portuguese speakers make up 14.5% of Luxembourg’s population — the largest immigrant community — but only 2.4% of websites offer Portuguese content. (source)

Beyond Monolingual

Is Luxembourg’s official trilingualism (French, German, Luxembourgish) reflected online?

Multilingual breakdown over time

colors = ['rgba(59, 130, 246, 0.5)', 'rgba(16, 185, 129, 0.5)', 'rgba(245, 158, 11, 0.5)', 'rgba(239, 68, 68, 0.5)']
line_colors = ['#3b82f6', '#10b981', '#f59e0b', '#ef4444']
names = ['1 Language', '2 Languages', '3 Languages', '4+ Languages']
keys = ['monolingual', 'bilingual', 'trilingual', 'quadlingual_plus']

fig = go.Figure()

for key, name, color, line_color in zip(keys, names, colors, line_colors):
    fig.add_trace(go.Scatter(
        x=multilingual['year'].to_list(),
        y=multilingual[key].to_list(),
        name=name,
        mode='lines',
        stackgroup='one',
        fillcolor=color,
        line=dict(color=line_color, width=0),
        hovertemplate=f"<b>{name}</b><br>%{{y:.1f}}%<extra></extra>"
    ))

fig.update_layout(
    height=450,
    margin=dict(t=20, r=20, b=100, l=60),
    plot_bgcolor="white",
    paper_bgcolor="white",
    xaxis=dict(title="Year", dtick=1, gridcolor="#eee", zerolinecolor="#eee"),
    yaxis=dict(title="% of websites", ticksuffix="%", range=[0, 100], gridcolor="#eee", zerolinecolor="#eee", tickvals=[20, 40, 60, 80, 100], ticktext=["20%", "40%", "60%", "80%", "100%"]),
    legend=dict(orientation="h", y=-0.28, x=0.5, xanchor="center"),
    hovermode="x unified"
)

fig.show(config={'displayModeBar': False})

Note: Stacked area chart shows the percentage of websites offering 1, 2, 3, or 4+ languages.
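As a rough illustration of how the four stacked categories could be derived, the sketch below buckets sites by the number of languages they offer. The `bucket` and `breakdown` helpers and the sample counts are hypothetical; only the category keys are taken from the plotting code above.

```python
from collections import Counter

CATEGORIES = ("monolingual", "bilingual", "trilingual", "quadlingual_plus")

def bucket(n_languages):
    """Map a language count to one of the four chart categories."""
    if n_languages < 1:
        raise ValueError("a website must offer at least one language")
    # Index 0..2 for one to three languages; anything larger falls in the last bucket.
    return CATEGORIES[min(n_languages, 4) - 1]

def breakdown(language_counts):
    """Return the percentage of websites in each category."""
    counts = Counter(bucket(n) for n in language_counts)
    total = len(language_counts)
    return {key: 100 * counts[key] / total for key in CATEGORIES}

# Hypothetical example: five sites offering 1, 1, 2, 3, and 5 languages.
result = breakdown([1, 1, 2, 3, 5])
```

Each site falls into exactly one category, so these percentages sum to 100% within a year, which is what lets them be stacked.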

Key Findings

Contrary to what one might expect, the share of multilingual websites has slightly decreased — from 46% in 2013 to 41% in 2024.

The majority of Luxembourg websites (59%) remain monolingual, with French-only sites being the most common configuration.

Language Configurations

What are the most common language configurations?

Most common language configurations (2024)

# Reverse the row order: Plotly draws the first row of a horizontal bar chart at the bottom
combos_reversed = combinations.reverse()

# Create abbreviated labels for mobile (single letter, no spaces)
abbrev_map = {
    'French': 'F', 'German': 'D', 'English': 'E',
    'Luxembourgish': 'L', 'Portuguese': 'P', 'Dutch': 'N'
}

def abbreviate(combo):
    result = combo
    for full, short in abbrev_map.items():
        result = result.replace(full, short)
    return result.replace(' + ', '+')

full_labels = combos_reversed['combo'].to_list()
short_labels = [abbreviate(c) for c in full_labels]

colors_gradient = [
    f'hsl({200 + i * 10}, 60%, {45 + i * 3}%)'
    for i in range(len(combos_reversed))
]

fig = go.Figure()

fig.add_trace(go.Bar(
    y=full_labels,
    x=combos_reversed['pct'].to_list(),
    orientation='h',
    marker=dict(color=colors_gradient),
    text=[f"{p}%" for p in combos_reversed['pct'].to_list()],
    textposition='outside',
    hovertemplate="<b>%{y}</b><br>%{x:.1f}% of websites<extra></extra>"
))

fig.update_layout(
    height=500,
    margin=dict(t=20, r=80, b=50, l=200),
    plot_bgcolor="white",
    paper_bgcolor="white",
    xaxis=dict(title="% of websites", ticksuffix="%", range=[0, 48], gridcolor="#eee", zerolinecolor="#eee"),
    yaxis=dict(tickfont=dict(size=10), gridcolor="#eee"),
    showlegend=False
)

fig.show(config={'displayModeBar': False})

Note: Based on 8,061 websites in 2024. Only the four main languages (French, German, English, Luxembourgish) are shown.
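The sketch below shows one way such configuration shares could be tallied when only the four main languages are kept. The helper names, the language codes, and the choice to divide by the number of sites offering at least one main language are all assumptions made for illustration; the denominator used for the chart may differ.

```python
from collections import Counter

# Label order and names for the four main languages (illustrative choice).
MAIN = ("fr", "de", "en", "lb")
NAMES = {"fr": "French", "de": "German", "en": "English", "lb": "Luxembourgish"}

def combo_label(langs):
    """Build a label such as 'French + English', ignoring other languages."""
    return " + ".join(NAMES[code] for code in MAIN if code in langs)

def top_configurations(site_languages):
    """Return (label, percentage) pairs, most common configuration first."""
    labels = [combo_label(langs) for langs in site_languages]
    labels = [label for label in labels if label]  # drop sites with no main language
    counts = Counter(labels)
    total = len(labels)
    return [(label, round(100 * n / total, 1)) for label, n in counts.most_common()]

# Hypothetical example: the Portuguese on the last site is ignored.
configs = top_configurations([{"fr"}, {"fr"}, {"fr", "en"}, {"en", "pt"}])
```

Note that under this scheme a site offering English and Portuguese is counted as English-only, which is one consequence of restricting the chart to four languages.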

Key Findings

French-only sites dominate at 38.1%, followed by English-only at 12.9%, even though English isn’t an official language.

The French + German pairing (8.6%) has been overtaken by French + English (10.8%).

The full quadrilingual setup (French + German + English + Luxembourgish) remains rare at just 2.4%.

Notably, Luxembourgish-only sites sit at the bottom (1.2%) — the national language rarely stands alone online, almost always appearing alongside French or German.

Methodology

This analysis detects language availability — which languages a website offers to visitors — rather than just language content (what language a page happens to be written in). This distinction matters because analyzing page content alone only reveals the language of that specific page, not the full set of languages a site offers. By detecting language switchers and hreflang tags, I can identify multilingual sites even when only one language version was archived.

After data collection (step 1), I use a three-tier detection pipeline (steps 2 to 4), prioritizing the most reliable signals first:

1. Data Collection
I extracted 83,728 website-year observations from CommonCrawl archives (2013–2024), filtering for .lu domains. CommonCrawl provides free, publicly available web archives — enabling reproducible research without requiring custom crawling infrastructure.
2. Hreflang Extraction
I scanned HTML for hreflang annotations, the standard HTML attribute for declaring language alternatives of a page. When present, these annotations explicitly list all language versions a site offers, making them the most reliable signal. Found in 15,808 website-years (19%).
3. LLM Detection
For sites without hreflang tags, I used Mistral’s Magistral model to detect language switchers in HTML navigation elements. LLMs can identify patterns like “FR | DE | EN” menus that rule-based approaches miss. Applied to 67,774 website-years (81%).
4. FastText Fallback
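Sites with no detected language switcher are assumed monolingual. I classified their primary language using FastText, a lightweight model trained on Wikipedia. Applied to 4,995 website-years (6%). Because this step only runs after the LLM step finds no switcher, these website-years are also counted in the 81% above.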
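As an illustration of step 2, here is a minimal sketch of hreflang extraction using only Python's standard library. The `HreflangParser` class and `extract_hreflang` function are hypothetical names, and the choices to keep only the primary language subtag and to skip `x-default` are assumptions, not necessarily what the analysis does.

```python
from html.parser import HTMLParser

class HreflangParser(HTMLParser):
    """Collect language codes from <link rel="alternate" hreflang="..."> elements."""

    def __init__(self):
        super().__init__()
        self.languages = set()

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        code = (attr.get("hreflang") or "").lower()
        # Skip non-alternate links and the language-neutral "x-default" entry.
        if "alternate" in rel and code and code != "x-default":
            # Keep only the primary subtag, so "fr-LU" is counted as "fr".
            self.languages.add(code.split("-")[0])

def extract_hreflang(html):
    """Return the set of languages declared via hreflang in an HTML document."""
    parser = HreflangParser()
    parser.feed(html)
    return parser.languages

sample = (
    '<head>'
    '<link rel="alternate" hreflang="fr-LU" href="https://example.lu/fr/">'
    '<link rel="alternate" hreflang="de" href="https://example.lu/de/" />'
    '<link rel="alternate" hreflang="x-default" href="https://example.lu/">'
    '<link rel="stylesheet" href="style.css">'
    '</head>'
)
found = extract_hreflang(sample)
```

A page with no hreflang annotations yields an empty set here, which is the case that would be handed to the next tier.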

Note: 1,134 website-years (1.4%) had insufficient text for language classification and were excluded from the analysis.
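The ordering of the three detection tiers can be sketched as a simple fallback chain. Everything in this example is hypothetical: `detect_languages` and its three callable arguments are stand-ins for the hreflang scan, the LLM switcher detection, and the FastText classifier, and the stubs in the usage lines return fixed values rather than calling any real model.

```python
from typing import Callable, Optional, Set, Tuple

def detect_languages(
    html: str,
    hreflang: Callable[[str], Set[str]],
    switcher: Callable[[str], Set[str]],
    fallback: Callable[[str], Optional[str]],
) -> Tuple[Set[str], str]:
    """Try each tier in order of reliability; return (languages, tier used)."""
    langs = hreflang(html)
    if langs:
        return langs, "hreflang"
    langs = switcher(html)
    if langs:
        return langs, "llm"
    # No switcher found: assume the site is monolingual and classify its text.
    primary = fallback(html)
    if primary is None:
        # Too little text to classify, so the observation is excluded.
        return set(), "excluded"
    return {primary}, "fasttext"

# Usage with stub detectors standing in for the real tiers.
no_result = lambda html: set()
tier_llm = detect_languages("<nav>FR | DE | EN</nav>", no_result,
                            lambda html: {"fr", "de", "en"}, lambda html: "fr")
tier_fallback = detect_languages("<p>Bonjour</p>", no_result, no_result,
                                 lambda html: "fr")
tier_excluded = detect_languages("", no_result, no_result, lambda html: None)
```

Passing the detectors in as arguments keeps the sketch independent of any particular model or library and makes the fallback order easy to test.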

Citation

For attribution, please cite this work as:

Garbers, J. (2026, January). The Linguistic Web of Luxembourg.
Retrieved from https://github.com/julio-garbers/blog/tree/main/website_languages_lux

BibTeX:

@misc{garbers2026linguistic,
  author = {Garbers, Julio},
  title = {The Linguistic Web of Luxembourg},
  url = {https://github.com/julio-garbers/blog/tree/main/website_languages_lux},
  year = {2026}
}