Niche Languages

Overview

Tejas recently came to me with a business idea: helping people learn to speak uncommonly taught languages. It seemed intriguing, so I am doing this deep dive into niche languages to get some clarity on the topic.

Tejas’s family and his fiancé’s family both speak Gujarati. Tejas wants to learn the language so that he can speak with family members in their native tongue. Gujarati is widely spoken in India, so he figured there would be plenty of learning resources online.

Not so. There are some buggy apps and some independent online courses. None of it looks trustworthy enough that I would feel comfortable committing a significant amount of my time to it.

Duolingo

Duolingo is the king. The owl is popular-culture canon. They’ve cracked gamification: people I have talked with have year-long streaks and use it even if they don’t have a strong desire to learn a language. Learning a language is really, really hard, but Duolingo has figured out how to reliably get people to a beginner level in any language while keeping it fun and engaging.

Still, Duolingo supports 40 languages. That seems like a lot. But let’s dive into some data to find out if that’s true.

Let’s Gather Some Data

Ethnologue estimates that there are 7,164 languages currently spoken in the world. Here is a basic breakdown of these by their current status:

source: https://ethnologue.com
Institutional — The language has been developed to the point that it is used and sustained by institutions beyond the home and community.

These “institutional” languages are what we want to focus on.

It is surprisingly hard to find a list of more than 30 languages sorted by the total number of speakers. Ethnologue charges $250 (!) for their list and that seems to be the consensus best source. The best I have found is this list but this does not seem trustworthy at all. Quickly checking a few of the # of speakers on the table versus the # of speakers on the associated Wikipedia link shows lots of inconsistencies. So I’m gonna scrape Wikipedia and make this list myself, brb…

OK done. I made a repository with the script and the resulting SQLite database, and pasted a smaller table (the top 100 out of 810 total) at the bottom of this article. It is sorted by the number of people who speak each language natively. I chose native speakers because total speakers would have been much more complicated to collect, and because the native-speaker count matters more for our goals here.
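For anyone who wants to poke at a database like this, here is a minimal sketch of the "top N by native speakers" query. The table name `languages` and its two columns are assumptions made up for illustration; the schema in the actual repository may well differ. The rows are a handful of entries copied from the table at the bottom of this article.

```python
import sqlite3

# Hypothetical schema for illustration only; the real database may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE languages (name TEXT PRIMARY KEY, native_speakers INTEGER)"
)

# A few rows copied from the table at the bottom of this article.
rows = [
    ("Gujarati", 57_000_000),
    ("Polish", 40_000_000),
    ("Bengali", 237_000_000),
    ("Hindustani", 500_000_000),
    ("Javanese", 68_000_000),
]
conn.executemany("INSERT INTO languages VALUES (?, ?)", rows)


def top_languages(conn, limit):
    """Return (rank, name, native_speakers) tuples, largest first."""
    cursor = conn.execute(
        "SELECT name, native_speakers FROM languages "
        "ORDER BY native_speakers DESC, name ASC LIMIT ?",
        (limit,),
    )
    return [(rank, name, count) for rank, (name, count) in enumerate(cursor, start=1)]


top3 = top_languages(conn, 3)
for rank, name, count in top3:
    print(f"{rank} {name} {count:,}")
```

Ties are broken alphabetically by name here, which matches how equal counts appear to be ordered in the table below, though that is a guess about the original script rather than something I checked.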

Why Learn a Language?

So, Duolingo supports 40 languages. Two of these are fictional languages (High Valyrian and Klingon) and one, Esperanto, is “an artificial language designed to be an international second language”. Here are the top 11 languages in our top 100 list that Duolingo does not support:

  1. Bengali (237 Million)
  2. Punjabi (150 Million)
  3. Nigerian Pidgin (116 Million)
  4. Marathi (83 Million)
  5. Telugu (83 Million)
  6. Wu (83 Million)
  7. Malay (82 Million)
  8. Tamil (79 Million)
  9. Persian (72 Million)
  10. Javanese (68 Million)
  11. Gujarati (57 Million)
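As a quick sanity check on the list above, the sketch below compares each count with Polish, a language Duolingo does support, which has 40 million native speakers in the table at the bottom of this article. The numbers are copied from this article; the helper name `ratio_to_baseline` is just something made up for the example.

```python
# Native-speaker counts copied from the list above.
unsupported = {
    "Bengali": 237_000_000,
    "Punjabi": 150_000_000,
    "Nigerian Pidgin": 116_000_000,
    "Marathi": 83_000_000,
    "Telugu": 83_000_000,
    "Wu": 83_000_000,
    "Malay": 82_000_000,
    "Tamil": 79_000_000,
    "Persian": 72_000_000,
    "Javanese": 68_000_000,
    "Gujarati": 57_000_000,
}

# Polish, taken from the top-100 table at the bottom of the article.
POLISH_NATIVE_SPEAKERS = 40_000_000


def ratio_to_baseline(counts, baseline):
    """Express each count as a multiple of the baseline, to one decimal place."""
    return {name: round(count / baseline, 1) for name, count in counts.items()}


ratios = ratio_to_baseline(unsupported, POLISH_NATIVE_SPEAKERS)
all_larger = all(count > POLISH_NATIVE_SPEAKERS for count in unsupported.values())

for name, ratio in ratios.items():
    print(f"{name}: {ratio}x Polish")
print("Every language on the list is larger than Polish:", all_larger)
```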

Why are these languages not supported by Duolingo? Each of them has more native speakers than Polish, which IS supported. What process is Duolingo using to decide which languages are worth adding to their app? Why do these languages not meet those criteria? Or, more precisely:

  1. What are the motivations of people who use Duolingo?
  2. Why would anyone want to learn the languages on our top-11 list?

Here are the reasons people use Duolingo, shown pretty plainly in Duolingo’s onboarding flow:

source: https://goodux.appcues.com/blog/duolingo-user-onboarding

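Few of the languages on our top-11 list are the primary language of an entire country. Nigerian Pidgin is spoken throughout Nigeria, and Bengali, Persian, and Malay are national languages of Bangladesh, Iran, and Malaysia respectively. The rest are not: Wu is a minority language in China, Javanese is a minority language in Indonesia, Marathi, Telugu, Tamil, and Gujarati are regional languages in India, and Punjabi is split between India and Pakistan.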

Minority Languages

Even though most of these are minority languages in their respective countries, those who speak them natively use them as their primary language among their family, friends, and broader community. If I learn Gujarati and my family or friends speak it natively, they can communicate with me in a way that lets them fully express themselves, with all of the nuance and complexity of the language they are most familiar with. The same applies if I am traveling to a region in India that speaks Gujarati.

It makes sense to me that Duolingo would not offer minority languages on its platform. Its users want to gain beginner-proficiency to aid themselves in international business, recreational travel, or a school course. Learning a country’s majority language is almost always the better use of time for these goals.

What does not make sense to me is why there is not a huge offering of tools for learning Gujarati. 57 million people speak it natively. While this is roughly one-tenth of the number who natively speak Hindi (500 million), it is still a huge number. A sizeable number of people travel to regions that speak Gujarati or have family that speaks Gujarati. Would those people not benefit from gaining even a beginner-level understanding of the language? I think they would.


Top 100 Languages by Number of Native Speakers

Rank Name Native Speakers
1 Chinese 1,350,000,000
2 Mandarin 940,000,000
3 Spanish 600,000,000
4 Hindustani 500,000,000
5 Arabic 380,000,000
6 English 380,000,000
7 Russian 255,000,000
8 Bengali 237,000,000
9 Portuguese 236,000,000
10 Punjabi 150,000,000
11 Japanese 123,000,000
12 Mexican Spanish 120,000,000
13 Nigerian Pidgin 116,000,000
14 German 95,000,000
15 Vietnamese 85,000,000
16 Turkish 84,000,000
17 Marathi 83,000,000
18 Telugu 83,000,000
19 Wu 83,000,000
20 Malay 82,000,000
21 Korean 81,000,000
22 Tamil 79,000,000
23 Egyptian Arabic 78,000,000
24 French 74,000,000
25 Indonesian 72,000,000
26 Persian 72,000,000
27 Italian 68,000,000
28 Javanese 68,000,000
29 Gujarati 57,000,000
30 Hausa 54,000,000
31 Bhojpuri 52,200,000
32 Levantine Arabic 51,000,000
33 Uzbek 51,000,000
34 Oromo 45,500,000
35 Yoruba 45,000,000
36 Hakka 44,000,000
37 Kannada 44,000,000
38 Pashto 44,000,000
39 Polish 40,000,000
40 Odia 38,000,000
41 Xiang 38,000,000
42 Malayalam 37,000,000
43 Sudanese Arabic 37,000,000
44 Algerian Arabic 36,000,000
45 Amharic 35,000,000
46 Burmese 33,000,000
47 Ukrainian 33,000,000
48 Sindhi 32,000,000
49 Sundanese 32,000,000
50 Igbo 31,000,000
51 Moroccan Arabic 29,000,000
52 Tagalog 29,000,000
53 Kurdish 26,000,000
54 Dutch 25,000,000
55 Malagasy 25,000,000
56 Romanian 25,000,000
57 Saʽīdi Arabic 25,000,000
58 Azerbaijani 24,000,000
59 Somali 24,000,000
60 Gan 23,000,000
61 Isan 22,000,000
62 Lingala 21,000,000
63 Thai 21,000,000
64 Cebuano 20,000,000
65 Najdi Arabic 19,000,000
66 Nepali 19,000,000
67 Serbo-Croatian 18,000,000
68 Gilit Mesopotamian Arabic 17,000,000
69 Khmer 17,000,000
70 Maithili 16,800,000
71 Kazakh 16,700,000
72 Chhattisgarhi 16,200,000
73 Chittagonian 16,000,000
74 Sinhala 16,000,000
75 Zhuang 16,000,000
76 Zulu 16,000,000
77 Assamese 15,000,000
78 Bavarian 15,000,000
79 Hungarian 14,000,000
80 Madurese 13,600,000
81 Greek 13,500,000
82 Haitian Creole 13,000,000
83 Sanʽani Arabic 13,000,000
84 Uyghur 13,000,000
85 Kikuyu 12,000,000
86 Serbian 12,000,000
87 Taʽizzi-Adeni Arabic 12,000,000
88 Tunisian Arabic 12,000,000
89 Gulf Arabic (Khaleeji) 11,000,000
90 Hejazi Arabic 11,000,000
91 Tausūg 11,000,000
92 Xhosa 11,000,000
93 Czech 10,600,000
94 Rangpuri 10,000,000
95 North Mesopotamian Arabic 10,000,000
96 Swedish 10,000,000
97 Tajik 10,000,000
98 Tigrinya 9,700,000
99 Kanuri 9,600,000
100 Hiligaynon 9,100,000