We always hear about the benefits of analytics done correctly and used well. What we don’t hear as often are the dire consequences of analytics done poorly and used inappropriately. There are occasions where people’s lives can literally be ruined because of an analysis that is poorly designed or incorrectly interpreted.

The story I will be telling here is a true story that happened in my neighborhood a few weeks ago. While I don’t have any personal stake in the school, teachers, or students in this case, the story has eaten at me ever since I heard about it. Anyone who understands analytics will likely be equally appalled.

The local schools in my area are the best in the state, and a lot of very smart, motivated kids attend them. One of our neighbors has a very good student who is a senior in high school. We’ll call her “Sue”. Sue has always had good grades, she has never gotten into trouble, and she is taking a variety of challenging classes so that she can get into a good college next year.

Sue is taking an Advanced Placement course through the English department. This summer, her class was asked to write a paper as an extra assignment before school started. So far, so good. A new teacher in the department brought up the point that the school wasn’t using any plagiarism checking software and that they really should start using it. Again, so far, so good. Then things went horribly wrong…

The results of the plagiarism analysis showed that every single student in the class had cheated. All failed the assignment and initially the school planned to note the offense on each student’s official transcript. Goodbye, good schools! One might think that the school would question whether there were issues with either the software’s analysis or how the results were interpreted. After all, 100% of a group of “A” students with no history of trouble was flagged. The school did not question the results. The parents did, however. After some digging, what they found was quite troubling.

First, the software by default looks for any phrases of three words or more that match between two submitted papers. Each “offense” of a “copied” three-word phrase is tagged. Get tagged too many times, and you’re identified as a cheater. Let’s think about this criterion applied blindly without further thought. Assume students are writing about Tolstoy’s War and Peace. Two students start a sentence with “Tolstoy said that…” or “The meaning of…” or “The book refers to…”. They are now guilty of plagiarism. The software assumes nobody could have such phrases in common without copying from one another. Such tags are useful as a starting point. But, to be applied correctly, someone needs to review the papers and validate whether any of the phrases really appear to be copied or are just innocent matches like the examples. Nobody did that.
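To make the three-word rule concrete, here is a minimal sketch in Python. It is not the vendor's algorithm, which I have not seen; the function names, the tokenization, and the two sample passages are all invented for illustration. It only shows how a bare "shared three-word phrase" check behaves on two independently written sentences.

```python
import re

def word_ngrams(text, n=3):
    """Return the set of n-word phrases in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_phrases(paper_a, paper_b, n=3):
    """Phrases of n words that appear in both papers."""
    return word_ngrams(paper_a, n) & word_ngrams(paper_b, n)

# Invented sample text: two students making different points about the same book.
paper_a = ("Tolstoy said that history is shaped by ordinary people. "
           "The meaning of the war changes for Pierre.")
paper_b = ("Tolstoy said that great men do not control events. "
           "The meaning of the title is debated.")

matches = shared_phrases(paper_a, paper_b)          # default: 3-word phrases
longer_matches = shared_phrases(paper_a, paper_b, n=6)

print(sorted(matches))
print(sorted(longer_matches))
```

With the three-word setting, these two unrelated passages already produce several "copied" phrases. Raising the phrase length to six words produces none. Neither setting is "right" on its own, which is exactly why a person has to look at what was actually matched.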

It gets worse. Part of the assignment was to provide definitions for a number of terms. Students were told that using dictionaries was allowed. Sue pasted definitions from a dictionary into her paper. It turns out that another student she barely knows had many of the same definitions. That, the parents were told, was the smoking gun. Clearly, the two girls had copied from each other. What wasn’t considered is that there are perhaps a handful of well-known dictionaries and dozens of kids in the class. Isn’t it reasonable that a couple of them may have used the same dictionary? Not according to the English department. The only way the girls could both have a definition from Dictionary.com is if they copied from one another.
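One way to account for a permitted common source is to set aside any matching phrase that also appears in that source before counting matches between two papers. The sketch below is my own illustration under that assumption, not a feature I know the school's software to have; `unexplained_overlap` is a hypothetical helper and the "definition" is made-up text standing in for a real dictionary entry.

```python
import re

def word_ngrams(text, n=3):
    """Return the set of n-word phrases in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def unexplained_overlap(paper_a, paper_b, allowed_sources, n=3):
    """Shared n-word phrases that do NOT also appear in a permitted source."""
    shared = word_ngrams(paper_a, n) & word_ngrams(paper_b, n)
    allowed = set()
    for source in allowed_sources:
        allowed |= word_ngrams(source, n)
    return shared - allowed

# Made-up definition standing in for an entry both students were allowed to paste.
definition = ("Irony: the use of words to convey a meaning "
              "opposite to their literal sense.")
sue = definition + " Pierre's search for purpose drives the novel."
other = definition + " Natasha grows up over the course of the story."

raw = word_ngrams(sue) & word_ngrams(other)              # what a naive check counts
remaining = unexplained_overlap(sue, other, [definition])  # after excluding the source

print(len(raw), "raw matches;", len(remaining), "left after excluding the dictionary")
```

In this toy case every single match comes from the shared definition, so nothing is left once the permitted source is excluded. A real review would still need a person to confirm where each flagged passage came from.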

Sue’s parents managed to get the offense removed from her transcript, but not the 0% grade for the assignment. The teachers in the department are now unwilling to provide college recommendations for any of the students since they are all now viewed as cheaters. Some of these kids could quite literally have their futures stolen due to the misuse of analytics by a well-meaning school.

The point of this story is that analytics can’t be blindly followed. Results must be put in context. Unexpected results should be further examined. The English department’s personnel don’t understand what the plagiarism software is doing or why. They just assume that if the software says “potential cheater”, then it must be true. In this case, a horrible injustice has been done to some smart, honest kids.

As analytics become more pervasive and more and more tools make complex analytics “easy” and “push button”, we’ll continue to see examples like this. Those of us in the field have a responsibility to educate others about the limits and appropriate uses of analytics. In this case, the right course of action was to look at each student’s paper and interview the students to get their explanations of the flagged passages. Also, the settings on the software should have been validated. To any fair-minded person willing to think logically, Sue did not cheat. To an English department using analytics software they don’t understand, she did. After all, the software found some red flags. What else do they need to know? The kids in the class may not get into a top college after all their years of work, for they now have the brand of “Cheater” unfairly stamped on their chests.

Just like any other tool, analytics can be very powerful and helpful when used in the correct way. Analytics can also do major damage when used by those who don’t understand it and don’t use it correctly. In business, we’ve all seen cases where statistics or figures are shown to an executive without full context or needed caveats. After seeing the data, the executive takes some drastic actions that in some cases are not productive or required. It is imperative to ensure that analysis results are used correctly. Let’s review a few practices that should be standard when producing and using analytics. If these principles had been applied, this story would not have happened.

Make sure that someone in the process fully understands the analysis being done, its strengths, and its weaknesses. Not everyone has to understand the gory details, but someone must.
Make sure that the various settings and options utilized in a process have been chosen with good reason. Don’t just assume the default settings are best for your situation.
When unexpected results are found, investigate further and ask critical questions before jumping to conclusions. No algorithm or software package is omniscient. “Because the software said so” is not an acceptable answer.
When provided with additional facts or data that contradict an analysis, give them consideration. The goal should be the right answer, not your initial answer.
Be careful to enable people to execute only the types of analysis they are prepared to execute correctly. It is easy for people to be in over their head and not even realize it. The results, as we’ve seen, can be devastating.

Analytics Gone Wrong: Dire Consequences For Kids