Unearthing a 2-Year-Old Product – GmailExtractor

GmailExtractor was my friend’s project. He worked on it two years ago, a few months before I joined the company. The project went through two iterations: the first by my ex-colleague and the second by our former intern. Unfortunately, neither version had any documentation on setup or usage (no requirements file, no README).

What is GmailExtractor?

I don’t know exactly why GmailExtractor was created. Recently, we were working with another team in the company and exchanging specs and project details over email, and we thought it would be nice to keep all of that communication in one wiki.

So let’s just say GmailExtractor extracts the Gmail messages belonging to a particular context and creates a document in the required format.

Why am I working on it now?

Maybe I don’t have a life, or this is just how my life is :P.

Goal 1 – Setting Up and Running the Project

I got the repo from GitHub. It was a Django project, and I remember us using Django 1.9-ish at that time, so I installed that. But the project also uses Google’s oauth2client, which is deprecated, and to add fuel to the fire, the last supported version threw a bunch of errors.

I tracked down their changelog (good thing they had one) and figured out that oauth2client had released a newer major version than the one used in the project. I had to step down through every release from 3.x to 2.x before it finally worked at 1.5.11.

The same drill went on for the other packages: djangorestframework, apiclient and stripogram.

sgmllib was the final straw. It turned out to be a core Python 2 module, deprecated in Python 2.6 and removed in Python 3, so I dropped the idea of using Python 3 and switched to Python 2.7.
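
Because sgmllib shipped only with Python 2, anything that imports it, directly or through a dependency, fails under Python 3 before any project code runs. A small check like the one below (a sketch, not something from the project) shows up front which modules the running interpreter can import. The module names are only examples, and the code avoids annotations and f-strings so it runs under both Python 2.7 and Python 3.

```python
import importlib
import sys


def module_available(name):
    """Return True if `name` can be imported by the running interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:  # ModuleNotFoundError on Python 3 is a subclass
        return False
    return True


if __name__ == "__main__":
    print("Python " + sys.version.split()[0])
    # Example modules: sgmllib is a Python 2 stdlib module, json exists in both.
    for name in ("sgmllib", "json"):
        status = "available" if module_available(name) else "MISSING"
        print("%s: %s" % (name, status))
```

Running it under each interpreter you are considering is a cheap way to decide between Python 2.7 and Python 3 before installing anything else.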

Next, djangorestframework wouldn’t work with the installed Django version. So I went back to the changelog, scrolled down to 2015 (remember, the project is two years old and 2018 had just started), installed a more or less random version from that time, and yay, it worked.
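
Stepping down through oauth2client releases, and later through djangorestframework ones, is a linear search for the newest version that still works. Below is a minimal sketch of that loop. The `works(version)` check is hypothetical: in practice it would install the version into a virtualenv and try to start the project. Versions are compared numerically because plain string sorting would rank 1.5.2 above 1.5.11. The key only handles purely numeric versions such as `1.5.11`, not pre-releases like `2.0.0rc1`.

```python
def version_key(version):
    """Turn '1.5.11' into (1, 5, 11) so versions sort numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def newest_working_version(versions, works):
    """Try versions from newest to oldest and return the first one that works.

    `works` is a caller-supplied (hypothetical) check, e.g. "install this
    version and see whether the project starts".
    """
    for version in sorted(versions, key=version_key, reverse=True):
        if works(version):
            return version
    return None  # nothing in the list worked


# Illustrative release list and a fake check where everything from 2.0 up breaks.
releases = ["3.0.0", "2.2.0", "2.0.0", "1.5.2", "1.5.11"]
print(newest_working_version(releases, lambda v: version_key(v) < (2, 0)))
# prints 1.5.11
```

If newer versions always break and older ones always work, bisecting the sorted list would need fewer installs than walking it one release at a time.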

Finally, after setting up the database, I got the project up and running.

Goal 2 – Getting the Project Working

Next stop: redirect_uri_error

1. The default config pointed to some other Gmail account, so I had to create new credentials. Good thing Google lets you configure localhost as a redirect URI.
2. Even after I changed the config file, it was still using the old client ID. That was because of data stored in the DB by my earlier attempts, so I deleted all `Flow` objects.

Don’t ask me what a `Flow` object is yet. My primary goal is to get the project working.
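
A redirect_uri error from Google generally means that the redirect_uri sent in the authorization request does not exactly match one of the URIs registered for that client ID in the developers console. Scheme, host, port and path all have to match. The sketch below builds the authorization URL with the Python 3 standard library so you can inspect the value that is actually sent (on Python 2.7 the same helpers live in `urllib` and `urlparse`). The client ID and the `/oauth2callback` path are placeholders, not values from the project, and the `gmail.readonly` scope is an assumption about what the extractor needs.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
GMAIL_READONLY = "https://www.googleapis.com/auth/gmail.readonly"  # assumed scope


def build_auth_url(client_id, redirect_uri, scope=GMAIL_READONLY):
    """Build the URL the user is sent to in order to grant access."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,  # must match a registered URI exactly
        "response_type": "code",  # authorization-code flow
        "scope": scope,
        "access_type": "offline",  # also ask for a refresh token
    }
    return AUTH_ENDPOINT + "?" + urlencode(params)


def redirect_uri_registered(url, registered_uris):
    """Exact-string check of the redirect_uri inside an auth URL."""
    sent = parse_qs(urlsplit(url).query)["redirect_uri"][0]
    return sent in registered_uris


# Placeholder client ID and callback path, for illustration only.
url = build_auth_url("YOUR_CLIENT_ID.apps.googleusercontent.com",
                     "http://localhost:8000/oauth2callback")
print(redirect_uri_registered(url, {"http://localhost:8000/oauth2callback"}))  # True
print(redirect_uri_registered(url, {"http://127.0.0.1:8000/oauth2callback"}))  # False
```

The second check fails because `localhost` and `127.0.0.1` are different strings, which is exactly the kind of mismatch that produces this error.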

3. Once the user has authorised the app, the Gmail API is used to extract emails from their account. This step threw a weird exception, which turned out to be because I hadn’t enabled the Gmail API in the Google Developers Console.
4. I could now successfully sign up and set a password for my account. The dashboard listed all my Gmail labels.
5. Full of enthusiasm, I clicked on one of the labels and hit one more error. Thank god it was a tiny one this time.
6. Finally, I got all my emails listed by label, and I could download them in JSON format.
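
Steps 4 to 6 map onto two Gmail API calls: listing the user's labels (`users.labels.list`) and listing the messages that carry a given label (`users.messages.list` with `labelIds`). The project presumably goes through Google's Python client library. This sketch instead builds the equivalent REST requests with the Python 3 standard library so the shape of each call is visible. The access token and label IDs are placeholders, and the requests are only constructed here, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

GMAIL_API = "https://gmail.googleapis.com/gmail/v1/users/me"


def labels_request(access_token):
    """GET users/me/labels: the label list shown on the dashboard."""
    return Request(GMAIL_API + "/labels",
                   headers={"Authorization": "Bearer " + access_token})


def messages_request(access_token, label_id, page_token=None, max_results=100):
    """GET users/me/messages filtered by one label.

    The response holds message and thread IDs only; full messages need a
    separate users.messages.get call per ID.
    """
    params = {"labelIds": label_id, "maxResults": max_results}
    if page_token:
        params["pageToken"] = page_token  # continue from the previous page
    return Request(GMAIL_API + "/messages?" + urlencode(params),
                   headers={"Authorization": "Bearer " + access_token})


def export_json(messages, path):
    """Write already-fetched message dicts to a JSON file (the 'download' step)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(messages, fh, indent=2, ensure_ascii=False)
```

To actually send a request, pass it to `urllib.request.urlopen` and decode the JSON body. Keep calling `messages_request` with the returned `nextPageToken` until the response no longer contains one.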

Closing Thoughts


Have I figured out how the project works?

Not yet, but I will get there. I have a basic idea of how the control flows, and I think that’s a good starting point.

How long did all of the above take?

About 3 hours.

What did I learn from this session?

A lot of things:
1. Always document and package your code well so that whoever uses it later won’t curse you. (Sorry, buddy!!)
2. The importance of a requirements.txt file.
3. I liked the way my brain worked through everything, from installing the packages to debugging the errors. I realised how much my thinking has evolved: I was OK with breaking the software, like an electrical engineer connecting module after module until the application finally lights up.
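
A requirements.txt with exact `==` pins (for example, generated with `pip freeze > requirements.txt` once things work) would have turned most of those three hours into a single `pip install -r requirements.txt`. The only exact pin this post records is `oauth2client==1.5.11`. The sketch below reads pins in that format and compares them with installed versions, which are passed in as a plain dict so the example runs anywhere. The parser is deliberately minimal: it understands only `name==version` lines and comments, and it is not pip's own parser.

```python
def parse_pins(text):
    """Map lower-cased package name -> pinned version for each `name==version` line."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line or "==" not in line:
            continue  # blank or unpinned lines are ignored in this sketch
        name, version = (part.strip() for part in line.split("==", 1))
        pins[name.lower()] = version
    return pins


def pin_mismatches(pins, installed):
    """Return {name: (pinned, installed_or_None)} for every pin not satisfied."""
    installed = {name.lower(): ver for name, ver in installed.items()}
    return {
        name: (wanted, installed.get(name))
        for name, wanted in pins.items()
        if installed.get(name) != wanted
    }


requirements = """
# versions that worked for this project
oauth2client==1.5.11
djangorestframework  # exact version not recorded, so not pinned here
"""
# Example installed versions, for illustration only.
print(pin_mismatches(parse_pins(requirements), {"oauth2client": "4.1.3"}))
# prints {'oauth2client': ('1.5.11', '4.1.3')}
```

An unpinned line like the djangorestframework one is exactly what forces the changelog archaeology described above, which is why pinning everything is worth the extra minute.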

For any feedback or suggestions, tweet @geeky_bhavani.