Saturday, June 18, 2011

Data Warehouse Needs Analysis

There are many considerations to be made when researching and choosing a data warehouse solution for an organization. To complete a detailed needs analysis would take a number of days and meetings with key personnel. With that said, it is difficult to add much to the information detailed by Bernhardt's book (Bernhardt, pp. 57-72). I will be addressing these needs in a more generic way, since Reading School District already has a data warehouse in place.

As I thought through the process and choices of key committee members, I came up with a similar list to Bernhardt: IT staff, project manager, principals, vice principals, other administrators, teachers, finance office personnel, and school board members. The roles of the members vary depending on the phase of the project, which would be laid out by the project manager. Initially, the project team should meet and determine the priorities for choosing a system – eg, speed of implementation, maximizing system features, bottom line cost, etc. Timing and cost are most often considerations, as it makes sense for systems to be installed at logical timeframes – the beginning of a school year, the beginning of a financial fiscal year, etc. Cost is always a consideration, and much more so in the current economic climate.

The scope of the data warehouse will vary from school to school and will be based on the priorities of the school with regards to its mission and vision. Coming to a common agreement on the purposes, users, uses, and data requirements is important and putting these in writing is also important for project direction and for future reflection on the system which has been implemented. These requirements should continually be in front of the project team to ensure that they go in the right direction and do not attempt to implement features not in the original project scope.

Determining data readiness entails all of the things to which Bernhardt refers – evaluation of the current network, currently existing data, and the amount of money available for a new system. The network is the underlying foundation of any successful software solution. An organization will not want to lose any current data and implementation of a new system will likely involve mapping / converting current data to a new solution. And, ultimately, financial concerns will constrain the scope of any software solution. One thing to keep in mind regarding software costs is that they can be negotiable. A simple example comes to mind – tax preparation software is much more valuable on April 1st than it is on April 16th, at which time one would be hard pressed to find anyone willing to pay $29 for tax preparation software!

Once the network groundwork has been laid, identifying the personnel available to do the work is important. Notice that all of this groundwork – counting the cost both in terms of dollars spent on software and personnel – is done before even approaching a vendor. An organization that invests due diligence in this part of the software selection process is well on its way to implementation of a successful solution for all constituents involved.




Bernhardt, V.L. (2008). Translating data into information to improve teaching and learning. Larchmont, NY: Eye On Education, Inc.

Wednesday, June 15, 2011

Data Warehouse Planning and Selection Process

Chapter 5 of Bernhardt's book details the steps for planning a data warehouse for an organization. The purpose of this planning is to connect the various types of data – demographic, student learning, school process, and perceptions – in order to provide comprehensive data analysis (Bernhardt, p. 57). Bernhardt details seven steps necessary in the planning: 1) establishing a project team; 2) defining the scope of the data warehouse; 3) determining data readiness; 4) determining the desired data; 5) determining who is going to do the work; 6) determining the levels of access; and, 7) selecting data warehouse vendor (Bernhardt, p. 61-72). I view this process as laying the foundation for the construction of a skyscraper – there are so many unseen things going on underneath the surface which result in the successful construction of a really tall building.

Chapter 6 discusses data discovery and data mapping. At the lowest level, this process includes defining each piece of data needed in the data warehouse, its characteristics, and potential values. For example, birthdate will be entered in the format mm/dd/yyyy. The National Center for Education Statistics (NCES) refers to the entity (description), attribute (characteristics), and instance (attribute value) of the data (Bernhardt, p. 74). From a larger perspective, data discovery results in determining the reports needed from the system and cleaning up dirty data (Bernhardt, p. 73).

Chapter 7 discusses selection of analytical tools which will be used to make effective use of the data mapped through the data discovery and mapping process. Bernhardt contends that data warehouses and analytical tools must make the data accessible, consistent, and secure (Bernhardt, p. 85). The data warehouse must continuously adapt to meet users' needs. The various capabilities of a system – things such as levels of data access, intuitive use, and easy report generation – as well as vendor considerations – implementation timeframe, vendor support, and forming consortiums with like-minded districts – are also discussed (Bernhardt, p. 86-93).

Chapter 8 discusses whether a data warehouse solution should be built in-house or purchased from a vendor. Bernhardt quickly concludes that purchasing a system from a vendor is the better option, but the implementation of the system will not be successful without the support of in-house staff (Bernhardt, p. 95-96). She gives three different scenarios resulting in various levels of success or failure and emphasizes the need for IT staff and district leadership to work hand-in-hand for a successful solution.

My personal experience is consistent with Bernhardt's thoughts. There were some of my IT staff members who felt as if we could build a system ourselves and save the organization tons of money. With an extremely capable, but small development staff, this was not realistic. As a Project Manager, one of my primary, unwritten tasks was to be a liaison between the users, the management, and software vendors. The biggest asset I brought to the table was not my technical knowledge, but my ability to relate and communicate with people. This relational approach helped to smooth the waters when system implementation became challenging and ultimately resulted in a successful implementation to a new system.


Bernhardt, V.L. (2008). Translating data into information to improve teaching and learning. Larchmont, NY: Eye On Education, Inc.

Saturday, June 11, 2011

"Dirty Data" and its Consequences

The Reading School District has a total of 26 schools, including elementary schools, middle schools, gateway schools, vo-tech schools, and high schools. So, trying to understand the various potential sources of data entry for SIS systems is a challenge. The questions posed for this discussion board prompt – eg, “how is data quality compromised? how can we improve data quality?” - is like asking someone to describe an elephant. My view of the world of data entry at Reading School District is largely limited to my view as a math teacher in one of the 26 school buildings in our district. And I do not even know all of the details of data entry done by the various secretarial and administrative personnel within my building. With this caveat, I will attempt to answer the questions posed to the best of my knowledge.

With the number of schools in our system, data quality could most easily be compromised by the multiple points of data entry. When a new student comes into the school district or transfers from one school to another, who “owns” the data entry piece? When does ownership transfer from one party to another? By not having a single data owner or by having more than one party believe they are the data owner are two potential sources of data compromise.

There are many consequences of dirty data and I will give one example that I have seen in my school. Study Island is a web-based program used in our school for improvement and practice in Math, English, and Science based on our state standards. Each student has their own individual login for access to the system. As a teacher, I can create classes and assignments for any of my math classes. The standard for creating a student ID in Study Island is using the student's name in the format first_last@rhs, such as dougsnyder@rhs. However, I have seen student names such as snyderdoug@rhs or dougsnyder (without “@rhs”). This inconsistency makes it difficult for me as a teacher to create a class. Or, if one account is created and later corrected, a student sometimes has “two” Study Island accounts to use. Another are of potential conflict is if we have two students with the same name – I had two students named “Anthony Ramos” this school year. Having the same name can cause data entry confusion.

In order to improve data quality, it is important to make sure, at a district level, that each piece of data is identified and a data owner is assigned to each piece of data. Procedures need to be put in place for transferring ownership of a piece of data from one owner to a new owner. Procedures also need to be put in place to correct data entry errors like the Study Island example that I mentioned above. Having the ability to merge and/or delete existing records is an important component to ensure overall data integrity.

Thursday, June 9, 2011

Reading School District PSSA Data Analysis

Reading School District publishes a District Report Card each year (“Reading PSSA Data”). District results are broken down by gender, ethnicity, and other aggregated groups (eg, IEP, English Language Learners, Migrant students). Results are also broken down based on participation and performance.

With respect to NCLB and Pennsylvania state standards, Reading is far behind most of the schools in Pennsylvania. For PSSA Mathematics in the 2009-10 school year, Reading School District is at 73% proficiency (3rd grade), 62% (8th grade), and 24% (11th grade). These numbers are significantly below the state percentages of 84%, 75%, and 59%, respectively. For PSSA Reading in the 2009-10 school year, Reading School District is at 53% (3rd grade), 61% (8th grade), and 42% (11th grade). The state percentages are 75%, 82%, and 66%, respectively (“Reading PSSA Data”). However, in comparison to numbers from the 2008-09 school year, four of these six percentages are an increased percentage of proficiency and the two decreases were by 1% each.

Within various student groups in 11th grade, the Black student group is the group closest to percentages matching the state averages for both Reading and Mathematics.

To some degree, this paints a realistic picture of our school district. While the data is measured by various ethnic groups, I believe it is difficult to measure progress with relation to a baseline number for individual students or student groups. For example, for some migrant students, it is possible that they did not begin any formal education until coming to the USA at 10 years old. This means that they may have missed the basic building blocks of reading and mathematics and are starting their school career behind many of their classmates.

For someone moving into the district or already living in the district, this data shows the reality – Reading School District is below the state averages, but incremental improvements are being made. In some grade levels, average proficiency has increased by as much as 7% in one school year. For administrators and teachers, more specific data is needed than these overall aggregate numbers. For example, it would be helpful for me as a teacher to know if only 15% of 11th grade math students were proficient in solving a quadratic equation using the quadratic formula, for example. For the general public, it is helpful to know the aggregate numbers and see incremental progress from year to year.

In comparison to Daniel Boone School District (my home district) and Muhlenberg School District (the closest neighboring school district), the county results show a significant gap in proficiency in both Reading proficiency (minimum 20% difference) and Mathematics proficiency (16% difference). Certainly, if I was a parent considering moving into Berks County, I would not consider moving my children into Reading School District based on these test scores (“Berks County PSSA Data”). While it is a positive thing to see the incremental improvements from year to year, such a large gap between Reading School District and the other county districts would cause me great concern as a parent.

Reading PSSA Data. (n.d.). Retrieved from http://www.readingsd.net/www/readingsd/site/hosting/districtnews/RSD%2009%2010%20Report%20Card.pdf

Berks County PSSA Data. (n.d.). Retrieved from

http://projects.mcall.com/PSSA-results/county-districts/BERKS/

Tuesday, June 7, 2011

SIS / CIA Interview Results

Today I interviewed Craig Dilks, the head of Information Services for Reading School District, with the interview questions previously posed. Here are Craig's answers to the questions:

  1. What is/are the name(s) of the SIS / CIA system(s) used in Reading School District?

    A: Reading is using eSchoolPlus for their SIS and PerformancePathways for their CIA. Craig's responsibility is for the SIS, so the rest of our conversation focused on his knowledge of the SIS.

  2. When were the current systems installed?

    A: Current systems were installed in 2005.

  3. Were the systems installed in multiple phases?

    A: There was conversion from a previous AS/400 system to the current SQL system. In terms of phases, most of the system was installed simultaneously, but the Gradebook feature is the last one to be phased into use.

  4. What system was the first priority when the systems were originally installed?

    A: Priority was determined by involving users within certain parts of the organization - clerks, administrators, teachers, etc. Parents' access to their children's information was also important and more than 12,000 letters have been sent out to parents giving them information needed to log into the system and access their children's information.

  5. Which of the following student information tracked in the SIS systems? class attendance, tardies, referrals, schedules, grades.

    A: "Yes" to all the information. Referrals was the last piece added.

  6. Is the system integrated so that a student’s history can be tracked from year to year and school to school?

    A: Yes.

  7. Is there any overlap of data among different systems?

    A: There is some overlap between various systems which require the two systems to talk with one another. One example is a Special Education system which interacts with the eSchoolPlus system for information sharing.

  8. How does Reading School District use data warehousing? How much history is warehoused?

    A: There are different ways to define warehousing, but Reading School District does use data warehousing. The best example of this is Performance Tracker.

  9. Was there a team of people involved in the selection process? How was the team chosen and what different groups were represented in the selection process?

    A: Various levels of staff members were involved in the selection process and were given the opportunity to use the software and participate in vendor demos. A core team of "superusers" was used throughout the process to be the voice of the various end users' of the software.

  10. What kind of security measures are part of the system?

    A: System security is at the user level. Users are only given access to the information they need. For example, teachers have access to all of the students in their own classes, but cannot view all information on all students. Since it is easier to give "more" access than to take away access already granted, Reading's IS Department errs on the side of limiting access to information.

  11. What is the maximum capacity of the system in terms of concurrent users / students?

    A: Capacity is limited to available disk space, so it is virtually unlimited. Currently, there are 18,400 users - ie, parents and staff - who have access to the system. Concurrent users are not explicitly tracked, but is estimated to peak around 500-600 concurrent users.

  12. What types of reporting is available in the SIS / CIA? What kind of reports are currently being used and by whom?

    A: There are a number of canned reports as well as a report writer (Cognos Impromptu) for creating customized reports. Data on the customized reports is only related to the creativity of the user and their knowledge of the tables within the SQL database.

  13. What kinds of reports are currently being used and by whom?

    A: (See answer to 12 above)

  14. What are future plans for system expansion?

    A: No immediate plans for system expansion, based on current budget constraints and today's economy. Future expansion will be based on demands from parents and other district needs.

  15. If you were choosing a new system today, would you choose the same system we are currently using? Why or why not?

    A: Yes. There are advantages / disadvantages to every system. The system implemented by Reading School District meets our current needs and has other features which we have not yet implemented. This system is being used successfully by other school districts in Berks County and beyond.

Monday, June 6, 2011

Obtaining and Ensuring Good, Quality Data

Chapters 3 and 4 of Bernhardt’s book focus on the data being collected and the quality of the data being collected in the systems introduced in Chapter 2.


In chapter 3, Bernhardt talks about the specific types of data and focuses on demographic data, as well as the processes which occur within the school, and the perceptions of students, parents, staff, and others with regards to the learning environment. She then goes into a discussion of Input / Process / Output (IPO) and notes that there is a finite number of each of these elements within a learning organization (Bernhardt, 2008; p. 33). In order to understand the relationships between each of these three elements, data from all three groups must be gathered and analyzed. Bernhardt then launches into a discussion of the different types of analyses which can be done: Classroom Analyses, School Analyses, School District Analyses, and State / Federal Analyses.


Chapter 4 focuses on improving the quality of the data being entered into the systems. Bill Gates' quote in the beginning of this chapter sums up the use of data systems succinctly: The first rule of any technology used in business is that automation applied to an efficient operation will magnify the efficiency. The second is that automation applied to an inefficient operation will magnify the inefficiency” (Bernhardt, 2008; p. 43). So while often times the focus is on automation, the more important focus is on internal organizational processes to collect and enter data into a system. Bernhardt then focuses on the common ways in which data quality can be compromised – often we think of data being compromised by input errors, but data can also be compromised by flawed processes. Bernhardt suggests that ensuring data quality can be done in a number of ways – by setting clear data definitions, but implementing proper procedures for data entry, and by assigning a data manager who “owns” the data and has responsibility to ensure that the data is “clean” (Bernardt, 2008; p. 46).


There is a tried and true phrase used in computer circles - “Garbage in / Garbage Out”. Effectively, this is what Bernhardt speaks about in Chapter 4 rang especially true for me as a project manager. The most difficult part of new system implementation was meeting with users and understanding the processes in place and trying to determine if any of the processes could be automated and/or improved. Breaking the system down into smaller pieces and assigning a “data czar” in each area was an important part of the process. Internal procedures are part of the process and automated systems can help by enforcing the defined rules. For example, if it is important to enter “M” or “F” for male or female, an automated system can either check input or provide a drop down box of options to enforce consistent data entry. There are many more examples similar to this one. Bernhardt's last piece of advice in this chapter is appropriate: “The data tools you use and data analyses you generate can only be as good as the data in them.” (Bernhardt, 2008; p. 55)

Bernhardt, V.L. (2008). Translating data into information to improve teaching and learning. Larchmont, NY: Eye On Education, Inc.

Sunday, June 5, 2011

SIS / CIA Management Tools

PerformancePathways is an assessment and curriculum management system which is part of the PLUS 360 suite of software products offered by Sungard Systems. The student information system part of this suite is eSchoolPLUS, which is the SIS system used by the Reading School District. PerformancePathways is a suite of products which includes PerformanceTRACKER (for tracking student performance), AssessmentBUILDER (for building and scoring local benchmarks), and CurriculumCONNECTOR (for comprehensive, standards-based curriculum development and review). This is a web-based solution and provides support for Common Core Standards ("Performanceplus" )

PowerSchool is a web-based student information system which provides a number of different modules based on the needs of the individual school district. Some sample modules include Immunization and Health Screenings, Classroom Management, Scheduling, and Student / Parent modules ("Powerschool" ). There are a number of different individual modules and online demos to help the potential user understand the features of each software module.

My Learning Plan bills itself as “the leading PDMES (Professional Development Management and Evaluation System)” (“My Learning Plan”). This system is also web based and seems to be designed at more of a district level in order to manage professional development programs at the district, regional, and state levels. It seems to be more focused on professional growth than on student information systems that PerformancePathways and PowerSchool offer.

I find it interesting that there are probably a handful of companies who offer similar types of systems for school districts' use. I found the same thing when attempting to select a donor management system for the non-profit for which I worked. There were a minimal number of options with strong opinions about each one and what they could do. What I found was that there were multiple solutions that would work for our organization. There were not many features that made one solution better or worse than another. Of course, pricing became a factor, as it will for most any school district. And, the real key to having a successful implementation of software had more to do with the internal staff understanding their own organization's needs and working hand in hand with the software vendor to implement the solution.

Performanceplus. (n.d.). Retrieved from http://www.sungard.com/en/sitecore/content/campaigns/corporate/plus360/products/performanceplus.aspx

Powerschool. (n.d.). Retrieved from http://www.pearsonschoolsystems.com/products/powerschool

My Learning Plan. (n.d.). Retrieved from http://www.mylearningplan.com/Content/Home/mlp.html