Saturday, June 11, 2011

"Dirty Data" and its Consequences

The Reading School District has a total of 26 schools, including elementary schools, middle schools, gateway schools, vo-tech schools, and high schools, so trying to understand the various potential sources of data entry for our SIS is a challenge. Posing the questions for this discussion board prompt (e.g., “How is data quality compromised? How can we improve data quality?”) is like asking someone to describe an elephant. My view of the world of data entry at Reading School District is largely limited to what I see as a math teacher in one of the 26 school buildings in our district, and I do not even know all of the details of the data entry done by the various secretarial and administrative personnel within my building. With this caveat, I will attempt to answer the questions to the best of my knowledge.

With so many schools in our system, data quality is most easily compromised by the multiple points of data entry. When a new student comes into the school district or transfers from one school to another, who “owns” the data entry piece? When does ownership transfer from one party to another? Having no single data owner, and having more than one party believe it is the data owner, are two potential sources of compromised data.

There are many consequences of dirty data, and I will give one example that I have seen in my school. Study Island is a web-based program used in our school for improvement and practice in Math, English, and Science based on our state standards. Each student has an individual login for access to the system. As a teacher, I can create classes and assignments for any of my math classes. The standard for a student ID in Study Island is the student's first name followed by the last name and the suffix @rhs, such as dougsnyder@rhs. However, I have seen student IDs such as snyderdoug@rhs or dougsnyder (without “@rhs”). This inconsistency makes it difficult for me as a teacher to create a class. And if one account is created incorrectly and a corrected account is created later, a student sometimes ends up with two Study Island accounts. Another area of potential conflict arises when two students have the same name; I had two students named “Anthony Ramos” this school year. Under a name-based ID format, both students would map to the same ID, which invites data entry confusion.
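
The kinds of ID problems described above could be caught by a simple automated check. The following is only a minimal sketch in Python, not anything Study Island or our district actually provides: the function names, the roster layout, and the student numbers are all hypothetical, and the only thing taken from my experience is the firstlast@rhs format.

```python
from collections import defaultdict

SUFFIX = "@rhs"  # suffix used in the standard ID format, e.g. dougsnyder@rhs


def check_id(actual, first, last):
    """Return a list of problems found in one stored ID (hypothetical helper)."""
    problems = []
    name_part = actual.lower()
    if name_part.endswith(SUFFIX):
        name_part = name_part[: -len(SUFFIX)]
    else:
        problems.append("missing @rhs suffix")
    if name_part == (last + first).lower():
        problems.append("last name before first name")
    elif name_part != (first + last).lower():
        problems.append("name does not match roster")
    return problems


def find_shared_names(roster):
    """Group hypothetical student numbers by name to flag students who share one.

    `roster` is assumed to be a list of (student_no, first, last) tuples.
    """
    by_name = defaultdict(list)
    for student_no, first, last in roster:
        by_name[(first.lower(), last.lower())].append(student_no)
    return {name: nums for name, nums in by_name.items() if len(nums) > 1}
```

For example, check_id("snyderdoug@rhs", "Doug", "Snyder") reports that the last name comes before the first name, and find_shared_names would flag my two students named Anthony Ramos so that someone can decide how to tell their IDs apart before the accounts are created.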

To improve data quality, it is important to make sure, at a district level, that each piece of data is identified and assigned a data owner. Procedures need to be put in place for transferring ownership of a piece of data from one owner to the next. Procedures also need to be put in place to correct data entry errors like the Study Island example that I mentioned above. Having the ability to merge and/or delete existing records is an important part of ensuring overall data integrity.
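
To make the ideas of a single data owner, an ownership transfer, and a record merge more concrete, here is a rough sketch in Python. It assumes each record carries a unique student number and the name of its current owner; the class, its fields, and the function names are hypothetical illustrations and do not describe how our district's systems actually work.

```python
from dataclasses import dataclass, field


@dataclass
class StudentRecord:
    student_no: int  # hypothetical unique number for the student
    login: str       # e.g. a Study Island ID
    owner: str       # the one party currently responsible for this record
    history: list = field(default_factory=list)  # audit trail of changes


def transfer_ownership(record, new_owner):
    """Hand the record to a new owner and log the change."""
    record.history.append(f"owner: {record.owner} -> {new_owner}")
    record.owner = new_owner


def merge_duplicates(keep, duplicate):
    """Fold a duplicate account into the record being kept.

    Refuses to merge records that belong to different students, which
    guards against combining two students who share a name.
    """
    if keep.student_no != duplicate.student_no:
        raise ValueError("records belong to different students")
    keep.history.append(f"merged duplicate login: {duplicate.login}")
    return keep
```

The design choice worth noting is that every record has exactly one owner at any time and every change is logged, so a question like “who entered this?” always has an answer.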

