What Am I Looking At?

WAILA

Android App Overview and Examples

The goal of the project was to create a functioning Android app integrated with the TensorFlow API.

The app performs the following functions:

  • Detects objects from a video feed.
  • Draws bounding boxes around detected objects that the user can tap.
  • Uses the Wikipedia API to obtain more information about the object.
  • Saves a detected object as a memory along with its location.
  • Displays all memories that are stored in the cloud database.

Authentication

Authentication is controlled through the Firebase Authentication service.

The authentication fragment supports the following cases:

  • User login for existing account.
  • Anonymous user login.
  • Create new user account.

Screenshot of authentication fragment
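The three cases map directly onto FirebaseAuth calls. Below is a minimal sketch of that mapping; the helper class, method names, and logging are illustrative, not the app's actual code.

import android.util.Log;
import com.google.firebase.auth.FirebaseAuth;

public class AuthHelper {
    private final FirebaseAuth auth = FirebaseAuth.getInstance();

    // Existing account: sign in with email and password.
    void signIn(String email, String password) {
        auth.signInWithEmailAndPassword(email, password)
            .addOnCompleteListener(task ->
                Log.d("WAILA", "signIn success: " + task.isSuccessful()));
    }

    // Anonymous session: no credentials required.
    void signInAnonymously() {
        auth.signInAnonymously()
            .addOnCompleteListener(task ->
                Log.d("WAILA", "anonymous success: " + task.isSuccessful()));
    }

    // New account: create a user with email and password.
    void createAccount(String email, String password) {
        auth.createUserWithEmailAndPassword(email, password)
            .addOnCompleteListener(task ->
                Log.d("WAILA", "createUser success: " + task.isSuccessful()));
    }
}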

Detection from Video Feed

Once camera permissions are granted, the TensorFlow Object Detection intent is started. The user sees a live video feed from the phone's camera and can point it at any object. If it's an object the model recognizes, a bounding box is drawn around it and a prediction is given. The user can learn more about the object by tapping anywhere within its bounding box.

Model detecting a laptop with 97% confidence
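For reference, here is a hedged sketch of on-device detection using the TensorFlow Lite Task Library; the model file name and the options chosen are assumptions and not necessarily the exact detector wiring WAILA uses.

import java.io.IOException;
import java.util.List;
import android.content.Context;
import android.graphics.Bitmap;
import org.tensorflow.lite.support.image.TensorImage;
import org.tensorflow.lite.task.vision.detector.Detection;
import org.tensorflow.lite.task.vision.detector.ObjectDetector;

public class FrameDetector {
    private final ObjectDetector detector;

    FrameDetector(Context context) throws IOException {
        ObjectDetector.ObjectDetectorOptions options =
                ObjectDetector.ObjectDetectorOptions.builder()
                        .setMaxResults(5)        // keep at most five boxes per frame
                        .setScoreThreshold(0.5f) // drop low-confidence predictions
                        .build();
        // "detect.tflite" is a placeholder asset name.
        detector = ObjectDetector.createFromFileAndOptions(context, "detect.tflite", options);
    }

    // Returns labeled bounding boxes for a single camera frame.
    List<Detection> detect(Bitmap frame) {
        return detector.detect(TensorImage.fromBitmap(frame));
    }
}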

Wikipedia Info and Memory Saving

Once the object's bounding box is clicked, WAILA uses Wikipedia's API to return details about the detected object. It also returns an image from Wikipedia so the user can relate the object they have seen to the picture.
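One way to fetch that information is Wikipedia's public REST summary endpoint. The sketch below is illustrative, assumes the detected label (e.g. "Laptop") maps directly to a Wikipedia page title, and may differ from the app's actual request and parsing code.

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Scanner;
import org.json.JSONObject;

public class WikiClient {
    // Fetches a short description and a thumbnail URL for a detected label,
    // e.g. "Laptop". Must be called off the Android main thread.
    static String[] fetchSummary(String title) throws Exception {
        String page = title.trim().replace(' ', '_'); // Wikipedia titles use underscores
        URL url = new URL("https://en.wikipedia.org/api/rest_v1/page/summary/" + page);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (InputStream in = conn.getInputStream();
             Scanner scanner = new Scanner(in).useDelimiter("\\A")) {
            JSONObject json = new JSONObject(scanner.next());
            String extract = json.optString("extract");
            String imageUrl = json.has("thumbnail")
                    ? json.getJSONObject("thumbnail").optString("source")
                    : null;
            return new String[] { extract, imageUrl };
        } finally {
            conn.disconnect();
        }
    }
}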

Clicking New Memory allows the user to store this memory. If the user logged in with their credentials, the memory is stored in their private database; if they logged in anonymously, it is stored in the open public database.
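A minimal sketch of how that routing could look with Firebase, assuming node names like "users" and "public"; the app's actual database layout may differ.

import java.util.Map;
import com.google.firebase.auth.FirebaseAuth;
import com.google.firebase.auth.FirebaseUser;
import com.google.firebase.database.DatabaseReference;
import com.google.firebase.database.FirebaseDatabase;

public class MemoryWriter {
    // Writes the new memory under the signed-in user's private node, or under a
    // shared public node when the session is anonymous.
    static void saveMemory(Map<String, Object> memory) {
        FirebaseUser user = FirebaseAuth.getInstance().getCurrentUser();
        DatabaseReference root = FirebaseDatabase.getInstance().getReference();
        DatabaseReference target = (user != null && !user.isAnonymous())
                ? root.child("users").child(user.getUid()) // private per-user storage
                : root.child("public");                    // open public storage
        target.push().setValue(memory);                    // push() creates a unique key
    }
}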

Clicking Past Memories takes the user to a page where they can view their saved memories.

Screen after detected object bounding box is clicked

Saved Memories

The image, location (if available), timestamp, and object title are all saved in a database hosted on the Firebase Database service. The memories are displayed in a custom RecyclerView. Clicking a saved memory brings up the additional saved information along with a map of the location using the Google Maps API.
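A hypothetical data model for one saved memory, with field names assumed from the stored attributes listed above; the app's actual class may be structured differently.

public class Memory {
    public String title;    // detected object label, e.g. "Laptop"
    public String imageUrl; // link to the captured photo in cloud storage
    public Double latitude; // null when no location was available
    public Double longitude;
    public long timestamp;  // epoch millis when the memory was saved

    public Memory() {}      // empty constructor required by Firebase deserialization
}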

Example of interactions with saved memories

Backend Processing

The most noteworthy part of the backend is the way we deal with our database across our five different application activities. We have a Java class called PhotoManager that controls every interaction with our database. Since we want the same database reference across all the activities, we used the singleton design pattern to ensure that we always get the same reference. PhotoManager has a number of useful functions, such as updating the current user, uploading a photo object to Firebase, helper functions that convert to and from different data types, and a function that gets all the photo objects for a given user. Additionally, PhotoManager defines an interface, getDataListener, that serves as a callback for when the RecyclerView queries the database.
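A condensed sketch of that pattern follows. PhotoManager and getDataListener are named above, but the method bodies and node names here are illustrative.

import java.util.List;
import com.google.firebase.database.DatabaseReference;
import com.google.firebase.database.FirebaseDatabase;

public class PhotoManager {
    private static PhotoManager instance;
    private final DatabaseReference database;

    private PhotoManager() {
        // One shared database reference, reused by all five activities.
        database = FirebaseDatabase.getInstance().getReference();
    }

    // Singleton accessor: every activity gets the same PhotoManager instance.
    public static synchronized PhotoManager getInstance() {
        if (instance == null) {
            instance = new PhotoManager();
        }
        return instance;
    }

    // Callback invoked when the RecyclerView's query against the database completes.
    public interface getDataListener {
        void onDataLoaded(List<Object> photos);
    }

    // Asynchronously loads all photo objects for a user and hands them back
    // through the listener; the actual query and type conversion are omitted here.
    public void getPhotosForUser(String uid, getDataListener listener) {
        // e.g. database.child("users").child(uid).addListenerForSingleValueEvent(...)
    }
}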

Nihal Dhamani
Software Engineer

I like things