This concept, underlying the Potential Outlier Report and the recent Alerts associated with it, has inspired many productive conversations about assessment and the meaning and use of its results. When we have predominantly good data to work with, assessment results can be used to provide instructional recommendations, group students, judge progress to date, plan for the next few weeks, and even set SEP goals and plans to attain them. When we allow bad data to sneak in, all of these decisions, whether at the individual student level or higher, will be negatively impacted, sometimes seriously. This leads to the question: what is bad data?
Bad assessment data is an outcome that does not represent a student’s true skills and abilities and therefore leads to bad educational decisions. We are already familiar with the concept of an assessment score being “unusual” for a particular student.
| Scenario | Fall Last Year | Mid Last Year | Winter Last Year | Spring Last Year | Fall This Year | Mid This Year | Winter This Year |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 45 | 48 | 46 | 50 | 48 | 52 | 18 |
| 2 | 50 | 55 | 52 | 51 | 1 | 1 | 65 |
Scenario 1 is the typical example of a drop outlier. A student typically scoring in the high 40s, and showing signs of improvement, suddenly scores far below their historical average. We can immediately detect that something is wrong with the Winter score and need to find out why. Was it a function of the assessment conditions, or of something happening to the student (illness, strife inside or outside of school, loss of motivation, etc.)? We need to find the cause, deactivate the score, help the student resolve the cause, and then re-test.
Scenario 2 is a typical example of an increase outlier. The Winter score of 65 looks like a sudden jump, but the true outliers are the two consecutive scores of 1 that dragged down the recent average. Those scores should be deactivated.
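To make the two scenarios concrete, here is a minimal sketch of this kind of outlier check. It simply compares a new score against the mean and spread of the student’s prior scores; the z-score cutoff and the function name are illustrative assumptions, not the actual logic behind the Potential Outlier Report.

```python
from statistics import mean, stdev

def flag_potential_outlier(history, new_score, z_cutoff=2.5):
    """Flag new_score when it sits far outside the student's history.

    Illustrative only: the cutoff and function name are assumptions,
    not the actual Potential Outlier Report algorithm.
    """
    if len(history) < 3:
        return False  # not enough history to judge
    mu = mean(history)
    sd = stdev(history) or 1.0  # avoid dividing by zero for flat histories
    return abs(new_score - mu) / sd > z_cutoff

# Scenario 1: the Winter score of 18 is a drop outlier.
print(flag_potential_outlier([45, 48, 46, 50, 48, 52], 18))  # True

# Scenario 2: each score of 1 is flagged against the prior history.
print(flag_potential_outlier([50, 55, 52, 51], 1))           # True
```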
This brings us to low scores and/or short testing times. An assessment completed in less than eight minutes and/or a score of 1 or 6.7 (a 6.7 is usually a 1 with a lucky keystroke) is usually an indicator of bad data. The student may have “blown off” the test, had personal issues, or was not ready for a STAR test.
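That rule of thumb could be encoded like this; the eight-minute and score thresholds come straight from the text above, while the function and parameter names are assumptions for illustration.

```python
def looks_like_bad_data(duration_minutes, scaled_score):
    """Rule of thumb from the text: a session under eight minutes and/or
    a floor-level score (1, or 6.7 from a lucky keystroke) is suspect.

    Function and parameter names are illustrative assumptions.
    """
    too_fast = duration_minutes < 8
    floor_score = scaled_score in (1, 6.7)
    return too_fast or floor_score

print(looks_like_bad_data(5, 42))    # True: finished in under eight minutes
print(looks_like_bad_data(22, 6.7))  # True: floor-level score
print(looks_like_bad_data(22, 42))   # False: nothing suspicious
```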
Bear with me for a thought experiment.
Let’s say that a yardstick is nailed to the wall with its bottom exactly 36” above the floor. The true height of a person measured with this instrument is therefore (indicated value + 36”). A person with an indicated value of 12” would be recorded as 48” tall. A person who does not come up to the bottom of the stick would be recorded as 36” (0 + 36). This is called the “floor effect.” The CAT (computer-adaptive testing) nature of STAR has removed the ceiling effect, but there is still a floor to its measuring capability. For example, STAR Reading requires at least a 100-word vocabulary before it can be appropriately administered.
When I review the values recorded with this instrument, I realize that a value of 36” is useless (or bad) data. It might be the true height, but the better bet is that the true height is below the reach of my instrument.
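A quick simulation of the thought experiment, assuming only the arithmetic described above: any true height at or below the bottom of the stick collapses to a recorded 36”, which is exactly why a 36” reading is uninformative.

```python
YARDSTICK_BOTTOM = 36  # inches from the floor to the stick's zero mark

def recorded_height(true_height):
    """Simulate the nailed-up yardstick: readings cannot go below 36 inches."""
    indicated = max(0, true_height - YARDSTICK_BOTTOM)
    return indicated + YARDSTICK_BOTTOM

for true_height in (48, 40, 36, 30):
    reading = recorded_height(true_height)
    note = "  <- floor effect: true height unknown" if reading == YARDSTICK_BOTTOM else ""
    print(f'true {true_height}" -> recorded {reading}"{note}')
```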
The same is true of STAR assessments. Values of 1 (or 6.7) do not tell us much about where a student really is. This is bad data. The important thing is to find out why this value occurs. Some possibilities:
- Temporary personal issues with the student, ranging from not caring to emotional trauma.
- Serious issues with the testing environment either for that student or the group.
- The fact that STAR is inappropriate for use with this student at this time:
  - No fluency in English (if the student’s native language is Spanish, STAR Spanish may be appropriate).
  - Administering SR when SEL is the appropriate assessment.
  - Such serious educational deficits that the student’s “true” score is below the floor.
- Exceptional student issues that might require an IEP specifying alternative assessments.
In all cases of bad data, the score(s) should be deactivated, and the issues should be investigated and resolved prior to re-testing.