Boston Youth Arts Evaluation Project
Creating an Evaluation Plan & Designing Tools
Before we launched into developing our own tools, we researched many others, hoping that the appropriate tools had already been developed. Although we found many helpful, none spoke specifically to the three desired outcome areas for our current participants (I Create, I Am, and We Connect) and to the six intermediate and long-term outcome areas we identified for our alumni (Able to Navigate, Able to Engage and be Productive, Able to Make Connections and to do so with Resiliency, Self-Efficacy/Personal Fulfillment, and Community Engagement). It was clear we needed to create our own tools, but, knowing that this was going to be a daunting task, we first needed a plan.
Creating an Evaluation Plan
The following ten questions, inspired by the W.K. Kellogg Foundation’s Evaluation Handbook (1998, pp. 47-99), helped us design both a plan and the tools we needed. We believe these questions can help any organization that is designing a system of evaluation.
- WHO IS ON OUR TEAM? Identify stakeholders and your evaluation team, including staff, early in the process. Getting input from all of your staff members on the design of the evaluation tools is very important. Collaborators regularly asked for feedback from their team, and we held all-staff training for all five sites to help design and pilot our tools.
- WHAT DO WE VALUE? Define the “sacred bundle” (the creative soul of the work that you do). Develop a strong logic model and clear theory of change. Do this with your team (not in isolation) in order to get buy-in from a diverse and rich knowledge base. We worked with five different disciplines and populations, and while this was very challenging at times, we were closely aligned in our values.
- WHAT DO WE ASK? Define the indicators/outcomes in your logic model and then develop evaluation questions that align with your logic model. Indicators should be Specific, Measurable, Action-oriented, Realistic, and Timed (SMART). Make sure, too, that the questions connect with your “sacred bundle.” The toughest part of our task was formulating measurable questions that were aligned with our indicators. Writing them in a language accessible to both youth and funders proved quite challenging.
- WHAT WILL IT COST? Budget between 5% and 10% of your project’s total budget for evaluation. Know that evaluation is time-intensive and that the next six steps require significant effort and time. Although RAW received funding to help manage and lead this project, none of the collaborating organizations received funding to offset the additional resource demands of BYAEP. The staff time devoted to this project exceeded our budget, and we often underestimated how much time it takes to formulate, implement, and analyze evaluations. Creating the BYAEP Handbook is partly an attempt to minimize the time investment for others. That being said, the process was deeply rewarding, and wrestling with the questions, our values, and the analysis enhanced our ability to understand and convey our missions.
- WHO OWNS THIS? Find out who will take on the evaluations. Will this be handled by staff on hand and/or external evaluators or consultants? This time-intensive process requires ownership and a clear assessment of the staff and outside skills and resources (especially time) needed. We received a lot of advice and help on this project. We also needed to contact experts in the field to help with the pilot design. Suzanne Bouffard from Harvard, Steve Seidel from Project Zero, Michael Sikes from Arts Education Partnership, Dennie Palmer Wolf, and Julia Gittleman all helped in the formulation of our pilot evaluations along with BYAEP collaborators, who contributed countless hours. Individual staff members engaged in all components of the evaluation process, with RAW’s Käthe Swaback managing the flow, guidance, and details of reporting.
- WHAT CAN WE GATHER? Plan how you will collect the data as you assess the resources and skills available. Determine what data you need to collect, and be careful not to collect data that is merely “interesting,” since doing so can easily lead to “data burn-out.” We found that we were collecting far too much data during the first year of our pilot. Although all this information was informative, we simply did not have the staff resources to work with all the results. We cut the Self-Evaluation from six pages in the first year to four pages, completed online, in year two. We decided to include optional worksheets for program staff to complete with youth in order to gain other information that would be valuable for the leaders but not necessarily for the organization as a whole (see the Workbook for examples).
- HOW WILL WE GATHER IT? Collect both qualitative (descriptive information) and quantitative (information that can be counted) data. Determine what information you need and how you will obtain it in order to best assess your outcomes. Did we want to use pre- and post-tests, focus groups, interviews, observations, or other creative tools we could invent? We found that collecting stories, numbers, and images (photos and other visuals) was important in capturing the vibrant makeup of our programs. When we could, we offered multiple-choice answers in order to derive percentages that we could rate and compare. Although we saw many drawbacks to pre- and post-tests, we used them in order to assess change, and they produced some important findings. It was also important to assess things creatively. We piloted the Drawing Evaluations; their results can be viewed in the Appendix.
- WHAT DOES IT ALL MEAN? Analyze and understand your findings. Determine what you can assess yourselves and where you may need technical assistance and statistical analysis. We were challenged by some of the technical aspects of Excel and by the fact that none of us were well-versed in statistics and analysis. Learning to work with SurveyMonkey proved important, allowing us to download reports in a usable format. In our third year we formulated an Excel template to populate all the results from SurveyMonkey into a system that presented comparisons and enabled us to delete duplicated and unmatched evaluations.
- WHAT AND WHO CAN WE TELL? Communicate findings to participants, staff, and stakeholders. Report on what you wanted to do, what you did, how you did it, what you learned, and what you might want to change going forward. This was an important part of the process. Many evaluation efforts end in the data-gathering stage, and we were determined to see ours through to the reporting stage. With BYAEP we had the unusual opportunity to share our data (both the strengths and weaknesses of our findings) with each other. This afforded us a new lens for viewing ourselves, our organizations, and our field as a whole.
- HOW CAN WE IMPROVE? Make practical use of the results by reflecting them back to your programs. Use what you have learned to inform program improvements and to better assess and meet the needs of youth, staff, and community. Although it was rewarding to see our high scores in several areas, discovering where our low scores fell and discussing how we might work to improve these outcome areas was most beneficial. This was instrumental in setting goals for the year and designing a curriculum and initiatives that would better address these areas.
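As a rough illustration of the clean-up described under WHAT DOES IT ALL MEAN?, the following Python sketch shows one way to drop duplicated and unmatched evaluations before comparing results. It is not the Excel template we built. The field names (participant_id, wave, score) and the rule that a later duplicate replaces an earlier one are assumptions made for this example.

```python
def match_pre_post(responses):
    """Return {participant_id: (pre_score, post_score)} for matched pairs only.

    Assumption: each response is a dict with the hypothetical keys
    "participant_id", "wave" ("pre" or "post"), and "score".
    """
    latest = {}
    for r in responses:
        # A repeated (participant, wave) key is a duplicate submission;
        # by assumption, the later row replaces the earlier one.
        latest[(r["participant_id"], r["wave"])] = r["score"]

    ids = {pid for pid, _ in latest}
    # Keep only participants who have both a pre and a post evaluation.
    return {
        pid: (latest[(pid, "pre")], latest[(pid, "post")])
        for pid in sorted(ids)
        if (pid, "pre") in latest and (pid, "post") in latest
    }


responses = [
    {"participant_id": "A", "wave": "pre", "score": 3},
    {"participant_id": "A", "wave": "pre", "score": 4},   # duplicate submission
    {"participant_id": "A", "wave": "post", "score": 5},
    {"participant_id": "B", "wave": "pre", "score": 2},   # no post: unmatched
]
matched = match_pre_post(responses)
print(matched)
```

Here participant A's duplicate pre-test collapses to a single entry, and participant B is dropped because no post-test exists to compare against.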
Researching Designs and Tools
Our greatest challenge was to try to create a reliable, valid, and practical evaluation plan and tools that would address the indicators of our outcomes and provide us with usable data to improve our programs. There is great diversity in the type of evaluation models developed and used by the social sciences. The following approaches are some that were recommended for us to consider.
Experimental Designs: These evaluations are considered the “gold standard” in research because they look not only at the outcomes of program participants but also at the outcomes of a control group, made up of people who are assigned at random and do not take part in the program. The outcomes of the control group are then compared to the program outcomes to understand the direct effect of the program.
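To make random assignment concrete, here is a minimal Python sketch. It is only an illustration and not a procedure BYAEP used; the function names, the even split, and the simple difference in average outcomes are all assumptions chosen for the example.

```python
import random
from statistics import mean


def assign_groups(participant_ids, seed=None):
    """Randomly split participants into (program_group, control_group).

    Hypothetical helper: shuffles the IDs and divides the list in half.
    """
    rng = random.Random(seed)  # a seed makes the assignment repeatable
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def estimated_effect(program_outcomes, control_outcomes):
    """Difference between the average program and average control outcome."""
    return mean(program_outcomes) - mean(control_outcomes)


program_group, control_group = assign_groups(range(10), seed=1)
print(len(program_group), len(control_group))
print(estimated_effect([4, 5, 3], [3, 3, 3]))
```

A real study would also need to judge whether a difference of this size could be due to chance, which is the kind of statistical analysis where outside technical assistance may be needed.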
Quasi-Experimental Design: This design follows the same approach as an experimental design except that participants are not randomly assigned to a control group; instead, the assignment may be based on factors such as convenience.
Non-Experimental Impact Evaluations: These types of evaluations look at changes in the indicators of outcomes among program participants or groups but do not include comparison groups who are not part of the program(s).
Pre- and Post-Participation Surveys: These surveys relate to before and after comparisons and look at outcomes for participants before the program’s start and at its conclusion.
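A before-and-after comparison can be reduced to a simple calculation. The Python sketch below assumes each participant has one numeric score from the start of the program and one from its conclusion; the function name and the sample scores are hypothetical.

```python
from statistics import mean


def average_change(pairs):
    """Average of (post - pre) across participants.

    pairs: iterable of (pre_score, post_score) tuples, one per participant.
    Raises statistics.StatisticsError if no pairs are given.
    """
    changes = [post - pre for pre, post in pairs]
    return mean(changes)


# Hypothetical scores for three participants.
scores = [(2, 4), (3, 3), (1, 5)]
print(average_change(scores))
```

A positive result means scores rose on average between the two surveys. On its own, it does not show that the program caused the change.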
Retrospective Evaluation: This kind of evaluation asks youth to compare how they are “now” to how they were before they started the program. Retrospective evaluations are seen as less reliable and valid than pre-post assessments because one’s “recall of information through reflection may be subject to problems of insufficient recall as well as offer the potential for fabricated or biased responses” (Lamb, 2005, p. 18). However, other studies have shown little difference between the results of traditional pre-tests/post-tests and those of retrospective evaluations.
Utilization-Focused Evaluation: The utilization-focused approach is one in which evaluations are designed, used, and judged by their utility so that the whole process is designed for and by the intended subjects for a specific use. These evaluations are personal, situational, and implemented in a way that makes a significant difference to improving programs and improving decisions about programs.
Participatory Evaluation: Participatory evaluation design is the process of designing evaluations with the people involved in the organization, programs, and/or community (including funders) in order to make the findings more relevant and meaningful to all stakeholders.