Autograder for Arizona State University

AI Application Helping Universities Assess Student Work At Scale

Scope

From manual to automated processes with custom software

It’s safe to say Arizona State University (ASU) is one of the most innovative universities in the world. They push the boundaries of education technology on a daily basis. Their offering keeps growing along with their user base. That kind of success brings new challenges.

Grading students’ work at scale becomes time-consuming and heavy on management and scheduling. With a manual approach, costs and timelines skyrocket quickly, and in the end the student experience suffers. With limited internal resources available to tackle the problem, you can quickly find yourself in a stalemate: to handle the growth you need more resources, and without them your limited capacity stunts that growth. Not to mention, you may start losing users. It is a very tough cycle.

What helped Arizona State University break the cycle was the combined talent of GogoApps and Flod. We’ve worked with ASU multiple times, but this opportunity was different: more complex, open-ended, and simply exciting. This time, we faced a project start on short notice, measured in days, a tight deadline of 8 weeks, and the ASU team’s vision of what the end goal could be. This product vision proved to be strong and, ultimately, everything we needed.

The goal was to make life easier for graders and provide faster feedback to students. Ideally, feedback should be available in hours (or faster) instead of days. The solution should not aim to replace human graders. It should help them streamline their work, cut time-to-feedback, and cut down on manual activities. The consistency of feedback should improve, too. Lastly, there was one more important thing to keep in mind: student privacy.

The main requirement was to research all the possible options. Would a custom software product be better than existing solutions? What are the costs? What about maintainability? What about vendor lock-in?

Our assignment, in this case, was uncharted territory. There had been nothing like it before. The work was highly exploratory, especially in the early stages of the project. Nobody could be 100% sure what it was going to be. Guided mostly by a product vision, we started our work, paying attention to all the angles that the ASU team presented to us.

Company Name: Arizona State University

Industry: Education

Country: United States of America

Project: AI Grader / Autograder

Goals: Create a proof of concept (custom software) and a landscape analysis

Requirements:

Cut time-to-feedback
Help graders (not substitute them)
Needs to be cost-efficient
PoC will be further developed in-house

Scope:

Custom software (backend to handle data and integrations)
ML Engineering (LLM layer to provide automated feedback for graders)
UX research (Landscape analysis, process mapping)

Streamlining processes with automatic data sharing and integrated AI application

The latest advances in LLM technology, with ChatGPT at the forefront, are a game changer. They allowed ASU to explore new ways of addressing their pain points. Yet, before we started coming up with clever prompts, our project needed a team. Together we gathered the expected outputs, which gave us a better understanding of what kind of talent would be needed.

A cross-functional team with a backend engineer, machine learning (ML) specialists, and a user experience (UX) designer seemed like the right choice. On top of that, a delivery manager was assigned to make sure we were in the right place, risks were monitored, progress was transparent, and everyone had a point of contact to address their needs.

One of the main requirements was a landscape analysis, which included a current snapshot of AI-based learning solutions. Our UX expert assessed aspects such as grading features, integration and compatibility, security and privacy, UI and ease of use, scalability, reporting and analytics, communication, support, and costs. The goal was to figure out whether the need could be addressed by products already available on the market.

We are still in the early stages of the LLM-based product market. It turned out that the quickest, cheapest, most flexible, and safest way to create the AI Grader was to outsource the work to a custom software agency.

Once we understood the direction we needed to go, we started mapping the user flows and the grading process. This allowed us to visualise the end-to-end process and design the software solution accordingly.

First, our grading backend used an API to receive payloads from the LMS (Learning Management System). This was the starting point of the grading journey. After a request was received, the JWT token’s signature was verified and the payload extracted. Based on that information, the backend made additional calls to the LMS to obtain more data (such as submission info, assignment info, and grading templates). Then an appropriate prompt was sent to the LLM API for processing. The response, containing a grade draft, was directed to a ticketing system for a human grader to do their part. All communication used a secure, encrypted protocol (HTTPS). JWT tokens were verified using public keys, which separated the signing and verification processes while ensuring scalability and ease of key rotation. Finally, any attachment conversion was done inside the Docker container, and the converted files were discarded immediately after use.
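For illustration, the request flow described above can be sketched roughly as follows. This is a minimal sketch under assumptions, not the code delivered to ASU: the function name `handle_grading_request`, the `GradeDraft` type, the payload keys (`submission_id`, `assignment_id`), and the field names `text` and `grading_template` are all hypothetical. The JWT check, the LMS client, the LLM call, and the ticketing system are passed in as plain callables so that each can be swapped for a real integration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class GradeDraft:
    """Hypothetical container for the LLM-generated draft sent to a human grader."""
    submission_id: str
    feedback: str


def handle_grading_request(
    token: str,
    verify_token: Callable[[str], Dict[str, Any]],
    fetch_from_lms: Callable[[str, str], Dict[str, Any]],
    call_llm: Callable[[str], str],
    create_ticket: Callable[[GradeDraft], None],
) -> GradeDraft:
    # 1. Verify the JWT signature (with a public key) and extract the payload.
    #    Assumption: verify_token raises an exception if the signature is invalid.
    payload = verify_token(token)
    submission_id = payload["submission_id"]
    assignment_id = payload["assignment_id"]

    # 2. Make additional LMS calls for the submission and the assignment data.
    submission = fetch_from_lms("submission", submission_id)
    assignment = fetch_from_lms("assignment", assignment_id)

    # 3. Build the prompt from the grading template and the submission text.
    prompt = (
        f"Grading instructions:\n{assignment['grading_template']}\n\n"
        f"Student submission:\n{submission['text']}"
    )

    # 4. Ask the LLM for a grade draft.
    feedback = call_llm(prompt)

    # 5. Hand the draft to the ticketing system; a human grader makes the final call.
    draft = GradeDraft(submission_id=submission_id, feedback=feedback)
    create_ticket(draft)
    return draft
```

Because verification happens first and raises on failure, an unverified request never reaches the LMS, the LLM, or the ticketing system.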

Next we have the core of the application, which produces automated grading feedback based on the set prerequisites. The inputs digested by the application consisted of the student submission (submission ID, assignment ID, text body, attachments). The submission’s text was combined into a single string divided into distinct sections for clarity. Then the instructions went into the system, stripped of everything not related to the submission. Finally, the rules of the specific assignment were applied. To produce the grading feedback, the application ran between 1 and 10 threads. The first one always generated the assignment completion feedback. The middle ones generated optional generic feedback, and the last thread was optional. Each thread resulted in a container object, and these objects were later aggregated. The aggregated objects were then handled by a text processor, and the final output was an HTML string. For the LLM processing thread flow, we decided to use Tree of Thought (ToT) processing. We could go into more detail, but most of it was related to the specific needs of ASU’s team.
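To make the thread-and-aggregate step more concrete, here is a small sketch under stated assumptions. It only shows the general shape: sections joined into one string, 1 to 10 parallel feedback threads, one container object per thread, and an HTML string at the end. It does not attempt to reproduce the Tree of Thought processing, and the names `FeedbackContainer`, `build_submission_text`, `run_feedback_threads`, and `render_html`, as well as the `###` section markers, are hypothetical. The `generate` callable stands in for the actual LLM call.

```python
import html
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FeedbackContainer:
    """Hypothetical result object produced by one feedback thread."""
    kind: str
    text: str


def build_submission_text(sections: Dict[str, str]) -> str:
    # Combine the submission parts into one string with clearly separated sections.
    return "\n\n".join(
        f"### {title}\n{body.strip()}" for title, body in sections.items()
    )


def run_feedback_threads(
    submission_text: str,
    kinds: List[str],
    generate: Callable[[str, str], str],
) -> List[FeedbackContainer]:
    # The text describes between 1 and 10 threads per submission.
    if not 1 <= len(kinds) <= 10:
        raise ValueError("expected between 1 and 10 feedback threads")
    with ThreadPoolExecutor(max_workers=len(kinds)) as pool:
        # pool.map returns results in input order, so the first kind stays first.
        texts = list(pool.map(lambda kind: generate(kind, submission_text), kinds))
    return [FeedbackContainer(kind, text) for kind, text in zip(kinds, texts)]


def render_html(containers: List[FeedbackContainer]) -> str:
    # Aggregate the containers into a single HTML string of bullet points.
    items = "".join(
        f"<li><strong>{html.escape(c.kind)}</strong>: {html.escape(c.text)}</li>"
        for c in containers
    )
    return f"<ul>{items}</ul>"
```

Escaping the generated text before it is placed in the HTML keeps any markup in the LLM output or the student submission from being rendered in the grader’s view.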

It’s important to mention that, in the long term, the ASU team was supposed to take over further development of the PoC, so the product needed to be easy to hand off and take over. We therefore deployed it to ASU’s infrastructure and handed over the code with detailed documentation, including the architecture decisions that were made.

One last question: how was the project managed? We tailored the meeting scheme to address the timezone difference. There were at least two meetings with the ASU team: one for the sprint review, the other for our Delivery Manager to talk about progress from a high-level perspective. We also had a Slack channel for asynchronous communication. Some of our team members preferred to work during US business hours, so Slack worked perfectly for everyone. That simple setup, combined with proactive communication, did the trick. On top of that, our Delivery Manager provided the ASU team with regular reports on progress, product state, budget, and monitored risks.

It was amazing to see the vision come to life as something real that ticks all the boxes, on time and on budget. But don’t get us wrong, there were challenges…

Team:

Backend Engineer
ML Engineer
AI Researcher (from Carnegie Mellon University)
UX designer
Delivery Manager

Technology:

Python
JWT
Docker
LangChain
OpenAI ChatGPT API
AWS

Design team output:

Landscape Analysis Report
Mapped user flows in Figma

Delivery Manager output:

Cost and progress trackers
Roadmap
Project process management
Communication management

Software output:

Integration of ASU tools (3 platforms) and systems via custom API.
ML layer on top of ChatGPT to produce the expected feedback according to the set rubric.
Dockerized apps, deployable on any cloud.
Deployment on a chosen infrastructure, along with CI/CD.
Git repository with robust documentation.

LLM integration into the daily grading process greatly improves feedback speed and consistency

Our AI Grader PoC proved to greatly speed up the entire grading process. Instead of graders jumping between tools to assign people and change statuses, most of this happened automatically and quickly. The output provided by the system consisted of suggestions in bullet points, aiming to help the grader provide the final score and feedback. The delivered documentation on the LLM model, the backend, and the process, along with the code, gave the ASU team all the tools to continue working on our innovative application. From the first moments, it looked like most, if not all, of the requirements would be checked off.

Now that the ASU team has taken over the project, we hope the work will bring them as much fun as it did us. It opens the door to vast new possibilities, and it feels like the beginning of something even bigger.

There’s a long list of project highlights:

We can start with ASU’s strong vision of the product. It was the fire fuelling the work up to the final delivery. It confirms that vision is all you need when you have the right team for product discovery and software implementation.
The ASU team’s leadership was a key success factor that allowed us to turn the idea into code in a truly Agile fashion. We focused on interactions, collaboration, responding to change and, lastly, working software.
Good communication management, with multiple channels (async and regular meetings), rendered the timezone difference irrelevant.
This list could go on for a while, so let’s close it off with a classic: the work was delivered on time, as agreed with the ASU team.

There’s also quite a list of challenges that we had to face:

A short lead time for the project start and the delivery. With limited resources, we had to be smart about planning our work. Transparent and proactive communication was key in addressing this.
In the early stages, there was low visibility into what the end product would look like and where it would be deployed. The dynamic project tempo and changing details impacted some delivery aspects (like access to infrastructure and some tools). In the end, the final deployment to ASU infrastructure took a bit more time than initially estimated.

Challenges like these are something you can’t avoid, but they need to be actively monitored and mitigated.

Challenges:

initial uncertainty regarding final infrastructure and product
getting access without knowing the exact final outcome; some aspects had to be resolved at later stages

Highlights:

ASU vision and leadership
truly Agile approach
delivery checked off all the requirements

Result:

LLM-based autograding application PoC met the requirements and set goals
Code and documentation open the product for further in-house development

The best way to get optimal results when scaling up is to address repeatable business processes. You don’t have to know exactly what the solution is going to be. We can work this out together. It’s likely there are existing solutions that can help you out of the box, but they won’t be tailored to your specific needs. Your own custom solution gives you better control over costs and minimizes the risk of vendor lock-in.

GogoApps can help you with that. We’ll take care of everything. All you need to do is contact us and…

Duration:

6 years, ongoing

Team:

3x Frontend
3x Design
2x Delivery
1x Quality

Software Stack:

TypeScript
React
MobX
Tailwind
HTML
CSS
React Testing Library

Design:

Illustrations
Animations
Motion
