Computer Vision for the Visually Impaired

Team: Rylan Schubkegel, Grant Walker, Nathaniel Russell, Luke Havener, Evan Cox
Advisor: Dr. Brian Snider
School year: 2020–2021

Design Challenge

Our team explored the design challenge of helping visually impaired individuals navigate the world around them. As part of our human-centered design approach to tackling this challenge, we developed a high-level user story to frame our design efforts: "As a visually impaired person, I want to identify common objects and read text and symbols on signs as I navigate my environment."

Through our work on this challenge, our team (Fig. 9) developed a passion for designing tools to help visually impaired individuals navigate and interact with the world around them.

Design Solution

Because the user is visually impaired, our solution must augment visual ability by translating visible information into auditory cues. We constructed a low-fidelity prototype device featuring a head-mounted camera, a tactile control strip, headphones, and an external battery/processing unit (Fig. 1). Upon tapping the control strip, live video or still images are processed by a computer vision algorithm running on the processing unit, and the resulting information is read aloud to the user through the headphones.

Fig. 1: Low-fidelity prototype of the device.
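
As a rough illustration of this pipeline, the sketch below shows one capture-to-speech cycle in Python, assuming OpenCV for camera capture and pyttsx3 for text-to-speech; detect_objects is a hypothetical stand-in for the vision model, not our actual implementation.

    import cv2      # camera capture
    import pyttsx3  # offline text-to-speech

    def detect_objects(frame):
        # Hypothetical stand-in for the computer vision model; it would
        # return the labels of objects found in the frame.
        return ["example object"]

    def on_control_strip_tap():
        camera = cv2.VideoCapture(0)   # head-mounted camera
        speaker = pyttsx3.init()       # text-to-speech engine

        ok, frame = camera.read()      # capture one still image
        if ok:
            for label in detect_objects(frame):
                speaker.say(label)     # queue each label for speech
            speaker.runAndWait()       # read the results aloud

        camera.release()

    if __name__ == "__main__":
        on_control_strip_tap()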

User Interaction

To explore how users might use our device during day-to-day activities, we tested our prototype in various scenarios and refined our interaction design accordingly (Fig. 2). The diagram let us visualize how many steps the user must take to achieve each of their goals.

Fig. 2: Flowchart and state diagram hybrid showing device interaction.

We quickly realized that fully implementing all aspects of our interaction design would be an immense task, ultimately leading us to constrain our solution to address one of our specific user goals: reading a hand of playing cards during a game with friends.

Machine Learning

To optimize object detection, we decided to train our own computer vision model. Our project required 1) detection of custom object classes, 2) bounding-box output, and 3) support for live visual input. After evaluating several Python options, including YOLO-based libraries, we chose Detecto for its simple API and its fulfillment of all three requirements.
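
For reference, training a custom Detecto model follows the pattern sketched below; the folder names, file names, and class list are illustrative, not our exact configuration. Detecto reads images alongside XML label files (the Pascal VOC format produced by labeling tools such as labelImg).

    from detecto import core, utils

    # Folder of labeled training images (images plus XML label files)
    dataset = core.Dataset('training_data/')

    # One class per card; abbreviated here, 52 classes in full
    model = core.Model(['ace_of_spades', 'two_of_spades'])

    model.fit(dataset, epochs=10, verbose=True)
    model.save('whole_card_model.pth')

    # Predict on a single image: parallel lists of labels, boxes, scores
    image = utils.read_image('test_card.jpg')
    labels, boxes, scores = model.predict(image)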

To train our model, we took over 400 pictures of playing cards and manually labeled them. To expand the data set further, we wrote a Python script that creates rotated copies of each photo and its corresponding labeled bounding boxes, effectively quadrupling our training data.
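
The core of such an augmentation script could resemble the sketch below, assuming Pillow for image handling; for 90-degree rotations, the new bounding box can be computed directly from the old corners.

    from PIL import Image

    def rotated_copies(image, box):
        # Yield 90-, 180-, and 270-degree rotated copies of an image and
        # its bounding box; together with the original, this quadruples
        # the data set. Box format: (xmin, ymin, xmax, ymax).
        w, h = image.size
        xmin, ymin, xmax, ymax = box

        # 90 degrees counterclockwise: (x, y) maps to (y, w - x)
        yield image.rotate(90, expand=True), (ymin, w - xmax, ymax, w - xmin)
        # 180 degrees: (x, y) maps to (w - x, h - y)
        yield image.rotate(180), (w - xmax, h - ymax, w - xmin, h - ymin)
        # 270 degrees counterclockwise: (x, y) maps to (h - y, x)
        yield image.rotate(270, expand=True), (h - ymax, xmin, h - ymin, xmax)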

In order to detect cards accurately, we combined two separate neural networks. The first detects whole playing cards (Fig. 5), which is ideal for more distant, unobstructed cards, such as cards on a table. The second detects the corners of playing cards (Fig. 6), which is ideal for closer, partially obstructed cards, such as the fan of cards in a player’s hand.
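
At inference time, the two networks can be run side by side along the lines of the sketch below; the model file names are hypothetical, and the class list is abbreviated.

    from detecto import core

    CARDS = ['ace_of_spades', 'two_of_spades']  # abbreviated; 52 in full

    whole_model = core.Model.load('whole_card_model.pth', CARDS)
    corner_model = core.Model.load('corner_model.pth', CARDS)

    def detect_cards(image, threshold=0.5):
        # Run both networks and keep every confident detection from each
        detections = []
        for model in (whole_model, corner_model):
            labels, boxes, scores = model.predict(image)
            for label, box, score in zip(labels, boxes, scores):
                if score >= threshold:
                    detections.append((label, box.tolist(), float(score)))
        return detections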

After training, we found it just as important to test the neural networks. We evaluated each with a confusion matrix, where the x-axis shows the predicted class and the y-axis the actual class. The ideal result is a solid diagonal across the matrix, showing that every predicted value maps to its actual value.
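
A confusion matrix of this kind can be produced with scikit-learn and matplotlib, as in the sketch below; the label lists here are illustrative stand-ins for the real test results.

    import matplotlib.pyplot as plt
    from sklearn.metrics import confusion_matrix

    # Illustrative stand-ins for the actual and predicted test labels
    y_true = ['ace_of_spades', 'two_of_hearts', 'two_of_hearts']
    y_pred = ['ace_of_spades', 'two_of_hearts', 'two_of_diamonds']

    classes = sorted(set(y_true) | set(y_pred))
    cm = confusion_matrix(y_true, y_pred, labels=classes)

    # A perfect model yields a solid diagonal (predicted == actual)
    plt.imshow(cm, cmap='Blues')
    plt.xticks(range(len(classes)), classes, rotation=90)
    plt.yticks(range(len(classes)), classes)
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.tight_layout()
    plt.show()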

Fig. 3: Bounding box detection.
Fig. 4: Bounding box detection.
Fig. 5: Optimal whole-card detection data.
Fig. 6: Optimal card corner detection data.
Fig. 7: Confusion matrix of whole-card detection model.
Fig. 8: Confusion matrix of card corner detection model.

Results

We tested each neural network’s confidence on all 52 cards and visualized the results using confusion matrices. We found that detection on whole cards (Fig. 7) had higher precision than detection on card corners (Fig. 8).

Our whole-card model detected the correct playing card with an average confidence of 78.9%. Interestingly, on a test set of single-card images, the highest-probability prediction was always the correct card.

The symmetry in the corner confusion matrix indicates that the corner model confuses cards of the same color. Because of this confusion, its accuracy is not nearly as high as the whole-card model’s. We plan to combine both models’ predictions to establish better confidence in the detected cards.
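
One simple way to do that combination, sketched below under the assumption that each model’s detections are reduced to a per-card confidence dictionary: weight the whole-card model more heavily (the 0.7 here is illustrative, not a tuned value) and take the highest combined score.

    def combine_confidences(whole_scores, corner_scores, whole_weight=0.7):
        # Merge per-card confidence dicts from the two models, weighting
        # the whole-card model more heavily since it proved more precise.
        combined = {}
        for card in set(whole_scores) | set(corner_scores):
            combined[card] = (whole_weight * whole_scores.get(card, 0.0)
                              + (1 - whole_weight) * corner_scores.get(card, 0.0))
        return max(combined, key=combined.get)

    # The corner model confuses same-color cards; the whole-card model
    # breaks the tie
    best = combine_confidences(
        {'two_of_hearts': 0.80, 'two_of_diamonds': 0.10},
        {'two_of_hearts': 0.45, 'two_of_diamonds': 0.44})
    print(best)  # two_of_hearts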

Future Work

The next step for our team would be to train the model to recognize objects for additional user goals, such as bathroom signs and classroom numbers. We would also add the “live detect” and “navigation” modes from our flowchart and state diagram (Fig. 2).

Additionally, we would want to make the system portable. Although building the system into glasses is our end goal, we would start by miniaturizing it onto a Raspberry Pi or another small computing platform.

Fig. 9: From left to right: Luke Havener, Evan Cox, Rylan Schubkegel, Grant Walker, Nathaniel Russell.