Aeye: an open-source AI-as-a-Service for the blind


Aeye aspires to be an open-source AI-as-a-Service platform for hobbyists and tinkerers to develop apps and services for the visually challenged. We want to provide the best tools and services so that you can go out and build cool stuff for our less privileged brothers and sisters, empowering them to overcome their challenges and fully participate in and contribute to our society.


Thanks to the latest advancements in computer vision driven by deep learning, we are now able to augment human vision with models that detect the objects around us, detect faces and identify who they are, describe the environment you are in, and much more. This can be of great use to visually challenged people, helping them perform their day-to-day tasks more efficiently.

The service offers simple-to-use API endpoints that expose basic deep learning modules such as Object Detection and Image Captioning.


Currently we have the following modules planned. Each module is an individual Python module that can perform at least inference when provided with the pretrained models and other artifacts. These modules are imported into Aeye and the necessary endpoints are created.
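To make the idea concrete, here is a minimal sketch of that module contract: load pretrained artifacts, then run inference, with each loaded module mapped to an endpoint. The class names, method signatures, and endpoint paths below are illustrative assumptions, not Aeye's actual API.

```python
class InferenceModule:
    """Hypothetical base contract: load pretrained artifacts, then infer."""

    def load(self, artifacts_dir):
        # A real module would load model weights from artifacts_dir here.
        self.artifacts_dir = artifacts_dir
        return self

    def predict(self, image_bytes):
        raise NotImplementedError


class ImageCaptioning(InferenceModule):
    def predict(self, image_bytes):
        # Placeholder output standing in for a real captioning model.
        return {"caption": "a person standing in a room"}


# Aeye would import each module and create an endpoint for it.
ENDPOINTS = {"/caption": ImageCaptioning().load("artifacts/")}
```

The point of the shared `load`/`predict` shape is that the server can treat every module uniformly when wiring up endpoints.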

The modules we support are:

  1. Image Captioning - Generates a description of the image passed to it. The current implementation is based on the Show and Tell paper. This can be used to describe the surroundings and give the user some idea of his/her environment. Please refer to the Module Repo or the docs for more information.

  2. Object Detection - Generates bounding boxes for a range of objects and is based on the Single Shot MultiBox Detector. It takes an image and returns the objects in the image along with their bounding boxes.

  3. Face Detection and Recognition - We can both detect and recognise the people in photos using this API. Face detection is implemented using the Multi-task Cascaded Convolutional Network (MTCNN), which gives very reliable results.

    We can also use this to recognise people. This is essentially just taking the faces extracted by the detection API, running them through a pretrained InceptionNet, computing similarities between the resulting embeddings, and outputting the scores. This only works for a few people currently.

    More work is required on this front, such as recognition tailored to each user, and is planned for the upcoming version.
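The recognition step described above, comparing embeddings from a pretrained network, can be sketched as a simple cosine-similarity match. The function names and the 0.7 threshold below are illustrative assumptions, not Aeye's actual implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def match_face(query, known_faces, threshold=0.7):
    """Return the best-matching name, or None if no score beats the threshold.

    known_faces maps a person's name to their stored embedding.
    """
    best_name, best_score = None, threshold
    for name, embedding in known_faces.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

In practice the embeddings would come from the InceptionNet, and per-user recognition would mean each user maintaining their own `known_faces` store.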


The quickest way to use the API is our hosted solution. It is hosted on GCP using the free tier, so reliability can be an issue. If the project gets some traction we can upgrade to a more stable setup. Open the Swagger UI of the API in your browser. Alternatively, if you want to use it from your scripts, try the cURL or requests method:


  $ curl -X POST -F "image=@path/to/image.jpg"


import requests

url = ''  # the API endpoint URL
with open('path/to/image.jpg', 'rb') as f:
    r = requests.post(url, files={'image': f})
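A client app would then turn the JSON response into something screen-reader friendly. The response shape assumed below (`{"objects": [{"label": ...}]}`) is an illustrative example, not Aeye's documented schema.

```python
def summarize_detections(response_json):
    """Turn a detection response into a screen-reader-friendly sentence.

    The response shape assumed here ({"objects": [{"label": ...}]}) is
    illustrative, not the documented schema.
    """
    objects = response_json.get("objects", [])
    if not objects:
        return "No objects detected."
    labels = [obj["label"] for obj in objects]
    return "Detected: " + ", ".join(labels) + "."
```

Feeding a sentence like this to a text-to-speech engine is the kind of app the API is meant to enable.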

But if the service is down, feel free to run it locally. This was built using PyTorch 1.5 and BentoML 0.7.7 (BentoML is used to create the server and serve the trained models). To set it up locally, follow these steps:

  1. Clone the repo to your machine.

  2. Install all the dependencies:

$ pip install -r requirements.txt

  3. Download the pretrained weights and keep them in the artifacts folder.

  4. Run:

$ python

This will pack the dependencies and models into a Docker container which is ready to be hosted on any system.

  5. Serve using BentoML:

$ bentoml serve AeyeService:latest

This will launch the server, and the service will be running on your system.


Contributions would be awesome! This is just an experiment, so there are a lot of places where improvements are required. If you like the idea and would like to contribute, send a mail to


We would like to thank the Future Technologies Lab under KSUM for providing us with the GPU to build the prototype.

I know this sounds cliche, but without your support none of this would have been possible.

To check out more, visit the GitHub repo or the docs. If you have any comments, share them here. I'd love to hear your feedback.