Busting HBCU myths with data

By Jeneé Osterheldt and Tyler Dukes

There’s a long-standing myth that Historically Black Colleges and Universities, or HBCUs, do a poor job graduating their black students.

According to U.S. Department of Education data, only 4 out of 10 black students at HBCUs graduate “on time,” that is, within six years of starting their freshman year.

Weighted average of six-year graduation rates (150 percent of normal time) for black students at 84 HBCUs reporting to the U.S. Department of Education as of 2014. SOURCE: Integrated Postsecondary Education Data System, PartNews analysis

At colleges and universities overall, the share of black students who graduate on time is closer to 5 in 10.

Weighted average of six-year graduation rates (150 percent of normal time) for black students at 1,671 colleges and universities, including HBCUs, reporting to the U.S. Department of Education as of 2014. SOURCE: Integrated Postsecondary Education Data System, PartNews analysis
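A quick aside on what "weighted average" means in those captions. The sketch below is only an illustration: it assumes each school's rate is weighted by the size of its entering class (the captions don't say which weights were used), and the two schools and their numbers are made up, not drawn from the federal data.

```python
def weighted_grad_rate(schools):
    """Average graduation rate, weighted by cohort size.

    `schools` is a list of (grad_rate, cohort_size) pairs. Weighting by
    cohort size is an assumption made for this illustration.
    """
    total_students = sum(size for _, size in schools)
    total_grads = sum(rate * size for rate, size in schools)
    return total_grads / total_students


# Hypothetical schools: a big one with a low rate, a small one with a high rate.
schools = [(0.30, 1000), (0.60, 200)]

weighted = weighted_grad_rate(schools)
simple = sum(rate for rate, _ in schools) / len(schools)
print(f"weighted: {weighted:.2f}, simple: {simple:.2f}")
```

The weighted figure leans toward the bigger school, so it describes the typical student rather than the typical campus.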

So what’s the deal?

Jay Z says numbers don’t lie, but they don’t exactly paint the whole Picasso either. It might seem like HBCUs have a low grad rate — but it’s just not that simple.

If you plot graduation rates for black students against the percentage of first-generation students at a college or university, it looks a little something like this.

Approximate plot of percentage of first-generation students (horizontal axis) vs. graduation rates for black students (vertical axis) in 2015 for about 1,600 colleges and universities reporting to the U.S. Department of Education. SOURCE: US DOE College Scorecard, PartNews analysis

The general trend is that the higher the percentage of first-generation students, the lower the graduation rate.

And that’s an important relationship, because when we look at where HBCUs fall on this plot, they tend to be scattered around here, toward the lower end of the graduation rates and the higher percentage of first-generation students.

Approximate locations of 85 HBCUs in a plot of percentage of first-generation students (horizontal axis) vs. graduation rates for black students (vertical axis) in 2015 for about 1,600 colleges and universities reporting to the U.S. Department of Education. SOURCE: US DOE College Scorecard, PartNews analysis

On average, about 43 percent of students enrolled in HBCUs are first-generation. Compare that to about 36 percent for colleges overall.

Another factor: Money. According to a Pell Institute study, students from families in the top income quartile (over $108,650) are eight times more likely to hold a college degree than students from the bottom quartile (under $34,160). About half of the nation’s HBCUs have a freshman class where three-quarters of the students are from low-income backgrounds.

About 50 percent of the nation’s HBCUs have a freshman class where 75 percent of students are from low-income backgrounds. SOURCE: Pell Institute

But just 1 percent of the 676 non-HBCUs serve as high a percentage of low-income students.

That bag makes a difference. Not to mention, the schools themselves see fewer resources.
According to the Thurgood Marshall College Fund, the average HBCU endowment is one-eighth the size of the average endowment at historically white colleges and universities.

And consider the open-admission policy. HBCUs are more likely than other institutions to accept students with lower grades and SAT scores. The Postsecondary National Policy Institute found that over 25 percent of HBCUs are open-admission institutions, compared with 14 percent of other colleges and universities.

Despite the odds, HBCUs still make a major difference to their student bodies. These schools, which on the surface may seem to do a poor job at graduating black students, helped create the black middle class. At least that’s what a U.S. Commission on Civil Rights report says.

Historically Black Colleges and Universities have produced 40 percent of African-American members of Congress, 40 percent of engineers, 50 percent of professors at predominantly white institutions (PWIs), 50 percent of lawyers and 80 percent of judges.

And to think, HBCUs only represent 3 percent of post-secondary institutions. Just saying: imagine what these schools could do with more funding and support.

Long live black excellence.

Tyler’s media diary: Using data to break bad habits

I had a suspicion going into this assignment that I’ve developed a few really bad media consumption habits:

  • I spend too much time on Twitter, which is skewing my perception of news coverage.
  • I “graze” way too much, opting for reading headlines instead of reading stories. This means I’m much less informed than I think.
  • My desire to be constantly “in the know” means I almost compulsively check social media first thing in the morning, throughout the day and last thing at night, so it’s more difficult to balance my media diet.

After tracking my media consumption from Feb. 15 through Feb. 20, I can confirm all three of these bad habits are real. The problems may actually be underrepresented, given that this period was a fairly atypical week (as I’ll explain).

Because I wanted to track media consumption across multiple platforms, I opted not to use RescueTime and noted everything manually in a Google Spreadsheet as the week went on. I cross-checked entries with my Google calendar, browsing history and Twitter history to make sure I wasn’t missing anything. I didn’t count reading email, unless there was some specific content there that fit the definition of “media” (a newsletter, for example).
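For anyone who wants to try the same exercise, a hand-kept log like this is easy to summarize with a few lines of Python. This is a rough sketch rather than the exact process used here: it assumes each spreadsheet row boils down to a category and a number of minutes, and the entries below are placeholders, not my actual week.

```python
from collections import defaultdict


def share_by_category(entries):
    """Return each category's share of total minutes as a fraction of 1."""
    totals = defaultdict(float)
    for category, minutes in entries:
        totals[category] += minutes
    grand_total = sum(totals.values())
    return {category: minutes / grand_total for category, minutes in totals.items()}


# Placeholder rows: (category, minutes). Not real diary data.
log = [
    ("social", 30),
    ("news site", 45),
    ("social", 15),
    ("tv", 60),
]

shares = share_by_category(log)
for category, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{category}: {share:.0%}")
```

The same function works for any grouping you log, such as platform or device, as long as each row carries a label and a duration.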

The first big finding is pretty glaring: Social media accounts for nearly a third of the time I spend consuming media. Break that down further, and you can see that within the social media category, Twitter takes a lot of the air out of the room.

Twitter is followed by Reddit, which I mostly consume at night before I go to sleep (I also need to cut down on how long I play video games, an issue I blame solely on Stardew Valley).

I’ve made a concerted effort over the last few weeks to spend less time on Facebook, which is why it appears so small in the chart.

One thing to note about the time period recorded: Over the long weekend, I took an out-of-town trip with friends to a spot without great Internet service. I suspect that if I were to repeat the media diary for the next few weeks, there’d be even more Twitter usage, although this may be offset by more media consumption in general.

If I had to guess before taking a closer look at the numbers, I would have said I spend far more time consuming media by phone than by laptop. That’s clearly untrue.

Also of note: I despise online video. Although I didn’t graph this particular breakout, almost all the video I consume is through the TV (in this case a Roku stick), and not through mobile or laptop. And although I do often listen to podcasts, NPR or other broadcast media, I didn’t really do that over this time period.

So despite some data collection problems, it’s pretty clear I’ve got some media consumption issues I want to address:

  • Spend less time on social media — specifically Twitter.
  • Seek out platforms that use more than just immediacy as the driver for news judgement: Instead of the “happening now” on Twitter, find what news editors think is important on the home pages of local and national news organizations.
  • Change the nighttime routine: Use the evening to read physical media or dive deeper into stories flagged online earlier in the day.

Two tools I think will help are Nuzzel, which alerts you to stories being shared often in your timeline, and Pocket, which allows you to save stories and other content you see through either your mobile device or laptop to read later. I’ve already signed up for these services, but I don’t use them often enough to help me consume more content, instead of just reading headlines.

Snoozing differently

By Anne, Jeneé, Michelle and Tyler

Our group discussed a common problem with wake-up apps: How often we hit the snooze button. So we came up with a feature that allows the user to set up two separate playlists — songs you love and songs you hate.

When the alarm clock rings, you can select the “hype” playlist or the “hate” playlist for the next time the alarm sounds to get you out of bed. We also discussed ways to integrate the alarm app with services like Spotify and Pandora and use them to further randomize the hate/hype based on the preferences you’ve already stored.
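The core of the feature could be sketched roughly like this. Everything here is hypothetical: the function name, the playlist structure and the song titles are placeholders, and a real app would pull songs from a music service instead of a hard-coded list.

```python
import random


def pick_alarm_song(playlists, choice, rng=random):
    """Return a random song from the chosen playlist ("hype" or "hate")."""
    if choice not in playlists:
        raise ValueError(f"unknown playlist: {choice!r}")
    return rng.choice(playlists[choice])


# Placeholder song names, not real tracks.
playlists = {
    "hype": ["song I love A", "song I love B"],
    "hate": ["song I hate A", "song I hate B"],
}

# The user hit snooze and picked "hate" for the next alarm.
song = pick_alarm_song(playlists, "hate", rng=random.Random(0))
print(song)
```

Passing in a seeded random generator is only there to make the sketch repeatable; the default picks a different song each time.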

 


Overview: Find stories faster in massive document dumps

If you were tasked with reviewing and making sense of a huge stack of documents you’ve never seen before, you would probably go about it in a pretty standard way. Skim the first page and make a quick decision about whether it’s relevant or about a specific topic, then move to page two and make that decision again. After a few pages, you might have a few separate piles describing what you’ve seen so far.

As you continue reading, the piles might get more sophisticated. In one pile, you might place emails containing specific complaints to the school board. In another, policy proposals from a public official’s top adviser. On and on you go until you get through enough of the pile to have a fairly good idea of what’s inside.

For investigative journalists reviewing massive document dumps — responses to public records requests, for example — this may be one of the very first steps in the reporting process. The faster reporters understand what they have, the faster they can decide whether there’s a story worth digging into.

Overview, a project to help journalists sift through massive document dumps

Making sense of documents as efficiently as possible is the primary purpose of Overview, an open-source tool originally developed by The Associated Press and funded by a collection of grants from the Knight Foundation and Google, among others.

Upload your documents into Overview and it will automatically process them first using optical character recognition. It then uses a word-weighting method called term frequency-inverse document frequency, or TF-IDF, to score the words in each document, and clusters documents with similar scores into a series of piles. It’s somewhat similar to the way a human reporter would sort documents if she were reading the pages one by one.

TF-IDF is built on a really basic assumption. It counts the number of times each word is used in each document — say a single email in a batch of thousands. It then compares those counts to the number of times the same words are used in the larger collection of documents. If a few of the emails have words in common that are relatively uncommon in the whole collection of emails, the assumption is that those documents are related in some way.
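To make that concrete, here is a bare-bones sketch of the scoring in Python. It is not Overview's code: it uses one common textbook formulation (a word's share of the document multiplied by the log of total documents over documents containing the word), splits on whitespace only, and runs on three made-up emails.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: score} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: how many documents contain each word.
    doc_freq = Counter()
    for words in tokenized:
        doc_freq.update(set(words))

    scores = []
    for words in tokenized:
        counts = Counter(words)
        scores.append({
            # Term frequency times inverse document frequency.
            word: (count / len(words)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


# Hypothetical emails, invented for this example.
emails = [
    "bus route complaint for the school board",
    "another complaint about the bus route",
    "budget proposal for the county adviser",
]

scores = tf_idf(emails)
for word in ("the", "bus", "school"):
    print(word, round(scores[0][word], 3))
```

A word that shows up in every email, like "the," scores zero, while words shared by only a couple of emails, like "bus," score higher. Those shared, relatively rare words are what mark two documents as probably related.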

Overview doesn’t actually derive any meaning from the words it’s counting, so the assumption the algorithm makes about documents being related might be wrong or totally unhelpful. But Overview also allows users to tag individual documents (or whole piles) with custom labels. It might, for example, help a reporter more quickly identify those complaints to the school board or the policy proposals to the public official because they’re all grouped together by the algorithm.

Overview has a few other helpful features, like fast searching and the ability to rerun the clustering algorithm with different parameters — specific terms of interest or stop words, for example. It’s also seamlessly integrated with another tool called DocumentCloud, a popular platform journalists use to annotate and publish documents online.

Tyler’s bio

I’m Tyler Dukes, and I’m a 2017 fellow at the Nieman Foundation for Journalism at Harvard. In real life, I’m an investigative reporter for the state politics team at WRAL News in Raleigh, North Carolina, where I work on longform stories and specialize in data and public records. I’m really interested in finding ways to use technology to enhance in-depth reporting and make data journalism more accessible to underserved media markets. That means (I think) developing better methods for training working journalists and educating journalism students in ways that allow them to apply these skills practically on the beat.

At WRAL, I’ve led the reporting on deep dives into the state’s mental healthcare system, deaths in the prisons and, oddly enough, the search for sunken treasure off the Carolina coast. I also built systems that allow readers to search more than a million pages of records from a major university athletic scandal and explore the campaign cash fueling each state lawmaker’s election bid. Prior to working at WRAL, I managed a research project at Duke University’s DeWitt Wallace Center for Media and Democracy called the Reporters’ Lab, aimed at finding ways to reduce the cost of investigative reporting. I also freelanced as a science and technology reporter for several newspapers and worked as an adviser to North Carolina State University’s (then-)daily student newspaper.

While my background is in reporting, writing, editing etc., I’m also proficient in Python, JavaScript and HTML/CSS (although far, far, far from being an expert). I’m also really good at prying records from the clutches of government officials.

I’m a native North Carolinian, devotee of Eastern-NC barbecue and fan of gas station coffee. I love all dogs and very few cats.

Follow me on Twitter and Instagram.
