By crossing language detection with topic modeling on 81,000 website-years, I examine whether essential services — government, healthcare, childcare — are as linguistically accessible as market-driven sectors like real estate and restaurants.
Using BERTopic on CommonCrawl archives, I applied unsupervised topic modeling to Luxembourg websites. The results reveal what .lu domains talk about — from property listings and investment funds to sushi menus and scout camps.
Using CommonCrawl data, I analyzed 83,728 pages from Luxembourg websites spanning 2013 to 2024, detecting which languages each website offers. I explore language trends over time, multilingual patterns, and the most common language configurations.
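To make "language configuration" concrete, here is a minimal sketch of how page-level detections could be rolled up into per-website language sets and then counted. It is not the pipeline used for the analysis: the record layout `(domain, year, language)`, the function name `language_configurations`, and the sample rows are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical input: one (domain, year, language) record per crawled page,
# as a language detector might emit. These rows are invented for illustration.
pages = [
    ("example-a.lu", 2020, "fr"),
    ("example-a.lu", 2020, "de"),
    ("example-a.lu", 2020, "fr"),
    ("example-b.lu", 2020, "fr"),
    ("example-c.lu", 2021, "de"),
    ("example-c.lu", 2021, "fr"),
]


def language_configurations(records):
    """Count how often each set of languages occurs across website-years."""
    langs_by_site_year = defaultdict(set)
    for domain, year, lang in records:
        # Collect the distinct languages seen for each (domain, year) pair.
        langs_by_site_year[(domain, year)].add(lang)
    # Sort the codes so that {"fr", "de"} and {"de", "fr"} map to the same key.
    return Counter("+".join(sorted(langs)) for langs in langs_by_site_year.values())


config_counts = language_configurations(pages)

# Share of website-years offering more than one language.
multilingual_share = (
    sum(n for config, n in config_counts.items() if "+" in config)
    / sum(config_counts.values())
)

for config, n in config_counts.most_common():
    print(config, n)
```

Grouping by `(domain, year)` rather than by domain alone is one way to obtain the website-year unit, so that a site adding a language in a later crawl is counted under its new configuration for that year.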