The United Nations recently announced that international donors have pledged $1 billion to provide education to millions of children in Pakistan. Nearly 25 million children are currently out of school in Pakistan, and about seven million of them have yet to receive primary schooling, according to a recent report prepared by the Society for the Protection of the Rights of the Child (SPARC).
Education in Pakistan has long been in a state of crisis. After Musharraf’s regime, Pakistan resumed elections in 2008, and the media, the judiciary and other democratic institutions have strengthened since then. What does the narrative of education look like today, and what kind of discourse underlies it? These are the questions we explore in this inquiry.
To understand the narrative of education in Pakistan, we applied an unsupervised learning algorithm to a text corpus provided by Alif Ailaan, an education advocacy group in Pakistan. The corpus comprises education stories curated since Feb. 2013 from Pakistani media sources, including Dawn, The Express Tribune, Nation, The News and Pakistan Today. The purpose of using an unsupervised learning algorithm was to delineate the underlying topical themes present in the corpus.
We extracted five topic structures using our learning algorithm. The intuition behind the algorithm is that each document exhibits multiple topics. For instance, within a single document, ‘Malala’, ‘woman’ and ‘education’ may be grouped together as one topic, while ‘federal’, ‘funding’ and ‘government’ are grouped into another. Using this technique, we extracted the keywords associated with each of the five topics that our algorithm discovered.
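The post does not name the algorithm used. As a minimal sketch of the general idea, the example below assumes latent Dirichlet allocation (LDA) fitted by collapsed Gibbs sampling, a common topic model built on the same intuition that a document mixes several topics. The function names, the hyperparameters `alpha` and `beta`, and the four toy documents are all hypothetical and are not taken from the Alif Ailaan corpus.

```python
import random
from collections import Counter


def fit_topics(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of each word in topic k
    topic_total = [0] * num_topics                       # total tokens in topic k

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        assignments.append([rng.randrange(num_topics) for _ in doc])
        for word, k in zip(doc, assignments[d]):
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # Weight each topic by how much this document uses it
                # times how much the topic uses this word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_keywords(topic_word, n=3):
    """Return the n most frequent words assigned to each topic."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


def topic_mixture(doc_topic_row, alpha=0.1):
    """Smoothed share of each topic within one document."""
    total = sum(doc_topic_row) + alpha * len(doc_topic_row)
    return [(count + alpha) / total for count in doc_topic_row]


# Hypothetical toy documents built from keywords mentioned in the post.
docs = [
    ["malala", "woman", "education", "rights", "federal", "funding"],
    ["federal", "funding", "government", "education"],
    ["malala", "woman", "rights", "peace"],
    ["government", "federal", "funding", "administration"],
]

doc_topic, topic_word = fit_topics(docs, num_topics=2)
for k, words in enumerate(top_keywords(topic_word)):
    print(f"topic {k}: {words}")
print("topic mixture of first document:", topic_mixture(doc_topic[0]))
```

On a real corpus one would more likely use an established library implementation and set the number of topics to five, as in the analysis here. The sampler is written out only to show how word co-occurrence within documents drives the grouping.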
Below is a bubble graph of the entire topical space. Each bubble shows the proportional representation of a keyword in its topical cluster, and the clusters are differentiated by color.
Now we will look at each topic individually. We have labeled the first topic “Federal Education” because it loosely exhibits the discourse surrounding federal policies and issues on education, in the form of keywords like ‘federal’, ‘administration’, ‘CADD’ and ‘FDE’. Both the Capital Administration and Development Division (CADD) and the Federal Directorate of Education (FDE) are constitutional bodies that are responsible for federal functions on education.
We have labeled the second cluster “Higher Education” since it contains terms like ‘university’, ‘international’, ‘technology’, ‘faculty’ and ‘science’, which are characteristic of higher education in Pakistan. The Higher Education Commission of Pakistan (‘HEC’) is a constitutionally established institution that drives higher education efforts in Pakistan.
We have labeled the third cluster “Primary Education” because of terms like ‘child’, ‘primary’, ‘enrollment’, ‘school’, ‘literacy’, ‘teacher’ and ‘english’. Last year, successful primary enrollment drives took place at the provincial level in Pakistan to register out-of-school children in public schools.
The fourth cluster, which we have labeled “Malala”, is the most telling one. Malala became “the spokesperson for a generation of girls” after being shot in the head by the Taliban. Almost half of rural young women in Pakistan have never attended school, according to a 2012-2013 UNESCO report. Malala is the only personal name that appears in the topical space on education in Pakistan. This cluster of words is also marked by tension between heterogeneous discourses in Pakistan, including Talibanization, religion, security, peace, rights and gender, highlighting the disruptive power of the “Malala” narrative on the discourses around education.
Lastly, the fifth cluster includes province-related terms such as ‘sindh’, ‘punjab’, ‘local’, ‘district’ and ‘provincial’. We have labeled this topic “Provinces and Education”.
In the chart below we show a timeline of the news stories curated in the Alif Ailaan corpus. Malala gave her first speech at the United Nations in July 2013; the increase in the number of stories on education that month could be related to her speech. Similarly, the spikes in Aug. 2013 and Sept. 2013 could be explained by enrollment drives in the Punjab and Khyber Pakhtunkhwa provinces. These campaigns aimed to enroll out-of-school children in public schools. Finally, the spike in Feb. 2014 could be related to the launch of the Annual Status of Education Report (ASER), which highlighted Pakistan’s education crisis and made headlines in national newspapers. An in-depth analysis of these correlations is needed to provide more concrete insights into these trends.
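A timeline like this can be built by counting stories per month and flagging the months that stand out. The sketch below is one simple way to do that, assuming each story carries a publication date. The dates are invented placeholders, not values from the corpus, and the rule of "more than one standard deviation above the mean" is an illustrative choice rather than the method used for the chart.

```python
from collections import Counter
from datetime import date
from statistics import mean, pstdev


def monthly_counts(dates):
    """Count stories per calendar month, keyed as 'YYYY-MM'."""
    return Counter(d.strftime("%Y-%m") for d in dates)


def find_spikes(counts, threshold=1.0):
    """Return months whose count exceeds mean + threshold * std deviation."""
    values = list(counts.values())
    cutoff = mean(values) + threshold * pstdev(values)
    return sorted(month for month, n in counts.items() if n > cutoff)


# Invented placeholder dates, for illustration only.
story_dates = [
    date(2013, 6, 3),
    date(2013, 7, 12), date(2013, 7, 13), date(2013, 7, 15), date(2013, 7, 20),
    date(2013, 8, 1),
    date(2013, 9, 5),
]

counts = monthly_counts(story_dates)
print(sorted(counts.items()))
print("spike months:", find_spikes(counts))
```

Flagging a spike only shows when coverage rose. Linking it to a specific event still requires reading the stories published in that month.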
In summary, these preliminary findings suggest that the current narrative of education in the Pakistani media landscape is rich and diverse and covers the entire gamut of concerns around the education crisis. The topics we discovered suggest that media attention to education is produced by an active state of affairs.
Great bit of media ecosystem analysis! Unsupervised learning algorithms are not a normal part of the data journalist’s toolkit, but I expect they will be in the future. They will certainly play an ever larger role in the news discovery that feeds stories for journalists to write, as we see in The AOL Way approach.
I like the bubble graphs and the two-dimensionality of size and color, but the third dimension of spatial orientation is distracting. Obviously, in the first graph the spatial orientation is in part representative of clustering, but that’s already covered by color. What I don’t understand is the core, periphery, and outer periphery nature of it. Is there something about the core that is different and worth paying attention to? The layout makes me think so, but you don’t mention it.
In the subset bubble graphs, I like that I can focus more on the words (they are bigger), but I find the colors distracting. What do they mean? Can you explain or offer a legend of some kind?
The final timeline is great! I love the trendline and the labeling of the spikes. You should be consistent about either pointing to the spikes or giving a specific time for each event, like you do for Malala’s speech, so that we know exactly what each label refers to.
Overall, very interesting!