Overview
Tejas recently came to me with a business idea for helping people learn to speak uncommonly spoken languages. It seemed intriguing, so I am doing this deep dive into niche languages to get some clarity around the topic.
Tejas's family and his fiance's family both speak Gujarati. Tejas wants to learn the language so that he can speak with family members in their native tongue. Gujarati is a widely spoken language in India, so he figured there would be plenty of learning resources online.
Not so. There are some buggy apps and a few independent online courses, but none of it looks trustworthy enough that I would feel comfortable committing a significant amount of my time to it.
Duolingo
Duolingo is the king. The owl is popular-culture canon. They’ve cracked gamification: people I have talked with have year-long streaks and use it even if they don’t have a strong desire to learn a language. Learning a language is really, really hard, but Duolingo has figured out how to reliably get people to a beginner level in any language it offers while keeping it fun and engaging.
Still, Duolingo supports 40 languages. That seems like a lot. But let’s dive into some data to find out if that’s true.
Let’s Gather Some Data
Ethnologue estimates that there are 7,164 languages currently spoken in the world. Here is a basic breakdown of these by their current status:
Institutional — The language has been developed to the point that it is used and sustained by institutions beyond the home and community.
These “institutional” languages are what we want to focus on.
It is surprisingly hard to find a list of more than 30 languages sorted by the total number of speakers. Ethnologue charges $250 (!) for their list, and that seems to be the consensus best source. The best I have found is this list, but it does not seem trustworthy at all. Spot-checking a few of the speaker counts in the table against the counts on the linked Wikipedia pages shows lots of inconsistencies. So I’m gonna scrape Wikipedia and make this list myself, brb…
OK done. I made a repository with the script and the resulting sqlite database and pasted a smaller (top 100 out of 810 total) table at the bottom of this article. It is sorted by the number of people who speak the language natively. I chose this because it would have been much more complicated to get the total number of speakers, and because the number of native speakers is more important for our goals here.
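For anyone curious what querying a database like this might look like, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `languages`, its two columns, and the helper `top_languages` are all assumptions made up for this illustration; the actual script and schema in the repository may look quite different. The rows are copied from the table at the bottom of this article.

```python
import sqlite3

# Hypothetical schema: one row per language with its native-speaker count.
# The real database in the repository may be laid out differently.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE languages (name TEXT PRIMARY KEY, native_speakers INTEGER)"
)

# A few rows copied from the table at the bottom of this article.
rows = [
    ("Polish", 40_000_000),
    ("Gujarati", 57_000_000),
    ("Bengali", 237_000_000),
    ("Hausa", 54_000_000),
]
conn.executemany("INSERT INTO languages VALUES (?, ?)", rows)


def top_languages(connection, limit):
    """Return (name, native_speakers) pairs, largest first, ties broken by name."""
    cursor = connection.execute(
        "SELECT name, native_speakers FROM languages "
        "ORDER BY native_speakers DESC, name ASC LIMIT ?",
        (limit,),
    )
    return cursor.fetchall()


# Print the rows in the same shape as the table at the end of the article.
for rank, (name, speakers) in enumerate(top_languages(conn, 3), start=1):
    print(f"{rank} | {name} | {speakers:,} |")
```

Breaking ties by name is a guess at how the published table orders languages with equal counts; it keeps the output stable from run to run.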
Why Learn a Language?
So, Duolingo supports 40 languages. Two of these are fictional languages (High Valyrian and Klingon) and one, Esperanto, is “an artificial language designed to be an international second language”. Here are the top 11 languages in our top 100 list that Duolingo does not support:
- Bengali (237 Million)
- Punjabi (150 Million)
- Nigerian Pidgin (116 Million)
- Marathi (83 Million)
- Telugu (83 Million)
- Wu (83 Million)
- Malay (82 Million)
- Tamil (79 Million)
- Persian (72 Million)
- Javanese (68 Million)
- Gujarati (57 Million)
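A list like the one above can be produced by walking the ranked table and skipping anything Duolingo offers. The sketch below shows the idea on four rows from the table at the bottom of this article. The function name `top_unsupported` is hypothetical, and the `supported` set is illustrative only: Polish is named as supported in this article, Italian is inferred from its absence from the list above, and the set is nowhere near Duolingo's full course list.

```python
def top_unsupported(ranked, supported, limit):
    """Return the first `limit` entries of `ranked` whose name is not in `supported`.

    `ranked` must already be sorted by native speakers, largest first.
    """
    missing = [(name, speakers) for name, speakers in ranked if name not in supported]
    return missing[:limit]


# Ranks 27, 28, 29 and 39 from the table at the bottom of this article.
ranked = [
    ("Italian", 68_000_000),
    ("Javanese", 68_000_000),
    ("Gujarati", 57_000_000),
    ("Polish", 40_000_000),
]

# Illustrative subset only, not Duolingo's actual course list.
supported = {"Italian", "Polish"}

for name, speakers in top_unsupported(ranked, supported, limit=2):
    print(f"- {name} ({speakers // 1_000_000} Million)")
```

Run against the full table and a complete list of supported courses, the same filter with `limit=11` would be one way to rebuild the list above.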
Why are these languages not supported by Duolingo? Each of these languages has more native speakers than Polish, which IS supported. What process is Duolingo using to decide which languages are worth investing in to add to their app? Why do these languages not meet those criteria? Or, more precisely:
- What are the motivations of people who use Duolingo?
- Why would anyone want to learn the languages on our top-11 list?
Here are the reasons people use Duolingo, shown pretty plainly in Duolingo’s onboarding flow:
Few of the languages on our top-11 list are the primary language spoken throughout any one country; Nigerian Pidgin is one of the exceptions. Wu is a minority language in China, Javanese is a minority language in Indonesia, and six of the others (Bengali, Punjabi, Marathi, Telugu, Tamil, and Gujarati) are minority languages in India.
Minority Languages
Even though these are minority languages in their respective countries, those who speak them natively use them as their primary language among their family, friends, and broader community. If my family or friends speak Gujarati natively and I learn it, I will be able to communicate with them in a way that allows them to fully express themselves with all of the nuances and complexity of the language they are most familiar with. The same applies if I am traveling to a region in India where Gujarati is spoken.
It makes sense to me that Duolingo would not offer minority languages on its platform. Its users want to gain beginner proficiency to help with international business, recreational travel, or a school course. Learning a country’s majority language is almost always the better use of time for these goals.
It does not make sense to me that there is not a huge offering of tools for learning Gujarati. 57 million people speak it natively. While this is only about one-tenth of the number who natively speak Hindi (500 million), it is still a huge number. A sizeable number of people travel to regions where Gujarati is spoken or have family that speaks Gujarati. Would those people not benefit from gaining even a beginner-level understanding of the language? I think they would.
Top 100 Languages by Number of Native Speakers
Rank | Name | Native Speakers |
---|---|---|
1 | Chinese | 1,350,000,000 |
2 | Mandarin | 940,000,000 |
3 | Spanish | 600,000,000 |
4 | Hindustani | 500,000,000 |
5 | Arabic | 380,000,000 |
6 | English | 380,000,000 |
7 | Russian | 255,000,000 |
8 | Bengali | 237,000,000 |
9 | Portuguese | 236,000,000 |
10 | Punjabi | 150,000,000 |
11 | Japanese | 123,000,000 |
12 | Mexican Spanish | 120,000,000 |
13 | Nigerian Pidgin (Broken) | 116,000,000 |
14 | German | 95,000,000 |
15 | Vietnamese | 85,000,000 |
16 | Turkish | 84,000,000 |
17 | Marathi | 83,000,000 |
18 | Telugu | 83,000,000 |
19 | Wu | 83,000,000 |
20 | Malay | 82,000,000 |
21 | Korean | 81,000,000 |
22 | Tamil | 79,000,000 |
23 | Egyptian Arabic | 78,000,000 |
24 | French | 74,000,000 |
25 | Indonesian | 72,000,000 |
26 | Persian | 72,000,000 |
27 | Italian | 68,000,000 |
28 | Javanese | 68,000,000 |
29 | Gujarati | 57,000,000 |
30 | Hausa | 54,000,000 |
31 | Bhojpuri | 52,200,000 |
32 | Levantine Arabic | 51,000,000 |
33 | Uzbek | 51,000,000 |
34 | Oromo | 45,500,000 |
35 | Yoruba | 45,000,000 |
36 | Hakka | 44,000,000 |
37 | Kannada | 44,000,000 |
38 | Pashto | 44,000,000 |
39 | Polish | 40,000,000 |
40 | Odia | 38,000,000 |
41 | Xiang | 38,000,000 |
42 | Malayalam | 37,000,000 |
43 | Sudanese Arabic | 37,000,000 |
44 | Algerian Arabic | 36,000,000 |
45 | Amharic | 35,000,000 |
46 | Burmese | 33,000,000 |
47 | Ukrainian | 33,000,000 |
48 | Sindhi | 32,000,000 |
49 | Sundanese | 32,000,000 |
50 | Igbo | 31,000,000 |
51 | Moroccan Arabic | 29,000,000 |
52 | Tagalog | 29,000,000 |
53 | Kurdish | 26,000,000 |
54 | Dutch | 25,000,000 |
55 | Malagasy | 25,000,000 |
56 | Romanian | 25,000,000 |
57 | Saʽīdi Arabic | 25,000,000 |
58 | Azerbaijani | 24,000,000 |
59 | Somali | 24,000,000 |
60 | Gan | 23,000,000 |
61 | Isan | 22,000,000 |
62 | Lingala | 21,000,000 |
63 | Thai | 21,000,000 |
64 | Cebuano | 20,000,000 |
65 | Najdi Arabic | 19,000,000 |
66 | Nepali | 19,000,000 |
67 | Serbo-Croatian | 18,000,000 |
68 | Gilit Mesopotamian Arabic | 17,000,000 |
69 | Khmer | 17,000,000 |
70 | Maithili | 16,800,000 |
71 | Kazakh | 16,700,000 |
72 | Chhattisgarhi | 16,200,000 |
73 | Chittagonian | 16,000,000 |
74 | Sinhala | 16,000,000 |
75 | Zhuang | 16,000,000 |
76 | Zulu | 16,000,000 |
77 | Assamese | 15,000,000 |
78 | Bavarian | 15,000,000 |
79 | Hungarian | 14,000,000 |
80 | Madurese | 13,600,000 |
81 | Greek | 13,500,000 |
82 | Haitian Creole | 13,000,000 |
83 | Sanʽani Arabic | 13,000,000 |
84 | Uyghur | 13,000,000 |
85 | Kikuyu | 12,000,000 |
86 | Serbian | 12,000,000 |
87 | Taʽizzi-Adeni Arabic | 12,000,000 |
88 | Tunisian Arabic | 12,000,000 |
89 | Gulf Arabic (Khaleeji) | 11,000,000 |
90 | Hejazi Arabic | 11,000,000 |
91 | Tausūg | 11,000,000 |
92 | Xhosa | 11,000,000 |
93 | Czech | 10,600,000 |
94 | Rangpuri | 10,000,000 |
95 | North Mesopotamian Arabic | 10,000,000 |
96 | Swedish | 10,000,000 |
97 | Tajik | 10,000,000 |
98 | Tigrinya | 9,700,000 |
99 | Kanuri | 9,600,000 |
100 | Hiligaynon | 9,100,000 |