An Introductory Guide

Many talk about the value of research and the gems it can help uncover. The topics for learning and potential discoveries are vast and are often backed by countless publications and tutorials available online. Yet it seems that not enough attention is given to the “how” of the process itself. Research can take many forms, with different objectives and timespans, and at the end of the day it can get discouraging.

Disclaimer: the shared links will have a strong flavor of AI, but some things should apply to other fields as well.

The process

For great general advice read Lessons from My First Two Years of AI Research written by Tom Silver, PhD student at MIT, 2018.

Here are my condensed highlights from it:

Carefully coding a baseline of the problem at hand can help discover bugs in your mental model early on.
Begin with code for visualization: it will make it easier to discover problems with your ideas. And keep in mind this:

“A bad visualization will require recalling in detail the code you wrote to produce it; a good one will scream an obvious conclusion.”

Use any platform that works for you to pick what to read: community (e.g. Twitter), newsletters, agendas of trusted conferences, etc. Also, check out Arxiv Sanity Preserver by Andrej Karpathy.
Recognize the motivation of the paper and its relationship with a field of AI in general, e.g. “math”, “engineering” or “cognition”. A project that effectively combines existing techniques will lure engineers but might bore cognitive scientists and vice versa.
The best explanations are often found during invited talks by the authors because that’s where clarity (and not word count) is valued most.
Take time to reflect on a paper you’ve just read; don’t rush it.
Collect tangible evidence of accomplished work by the end of each day: an idea in detail, what was ruled out and why, conversations you had, etc.
Take notes on your thinking process and on virtually anything that seems valuable at the moment; otherwise you might forget it.
When feeling stuck, don’t hesitate to step back: you’ve written everything down, so you can safely come back to it later. Quote:

“Backtracking is forward progress.”
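To make the first highlight concrete, here is a minimal sketch of what a baseline can look like: a classifier that always predicts the most common training label. It is not taken from the original article; the toy labels and the names majority_baseline and accuracy are made up for illustration. If a more elaborate model cannot beat this number, something in the mental model (or in the code) probably needs a second look.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return a predictor that always outputs the most common training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda example: most_common_label


def accuracy(predict, examples, labels):
    """Fraction of examples for which the prediction matches the label."""
    correct = sum(predict(x) == y for x, y in zip(examples, labels))
    return correct / len(labels)


# Hypothetical toy data (an assumption for illustration): an imbalanced
# two-class problem where "neg" dominates.
train_labels = ["neg"] * 80 + ["pos"] * 20
test_examples = list(range(10))  # placeholder inputs; the baseline ignores them
test_labels = ["neg"] * 8 + ["pos"] * 2

predict = majority_baseline(train_labels)
baseline_accuracy = accuracy(predict, test_examples, test_labels)
print(f"Majority-class baseline accuracy: {baseline_accuracy:.2f}")
```

On imbalanced data like this, a model that looks good at first glance may be doing no better than the baseline, which is exactly the kind of surprise worth discovering early.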

For more valuable insights and structure read the original article.

Reading a scientific paper

Now that you hopefully feel even more excited to dive into reading, watch the video tutorial by Yannic Kilcher, PhD Student at ETH Zurich, How I Read a Paper: Facebook’s DETR. He has a lot of well-curated reviews of deep learning research, but I feel like this one is worth watching first. In this video he shares his approach: which sections he looks at first, which he usually skips, and which require critical thinking or a second pass.

Small tip: Many researchers claim that a “Related Work” section is irrelevant, but I feel like sometimes it’s actually a great way to start unraveling the topic, since you can follow the references and get a bonus: the authors’ assessment of them. Alternatively, look for a good survey paper.

The tool used in the video is OneNote.

Organization

A few days into reading and papers begin to pile up. Even if you take notes, unless you organize them properly they get lost and forgotten. It’s hard to remember what was unique about each publication. Also, you might want to highlight some text inside of a paper itself. Save a tree if you can and use some software instead. There are benefits that come with it, such as an opportunity to organize your ideas into a hierarchical structure, to attach notes, collaborate with a group of people, etc.

Let’s go over a few options out there.

Zotero

Meant for academic research: Yes
Subfolders: Yes
Adding notes: Yes, both general and attached to a paper
Group search: Yes
Group membership types: Public Open, Public Closed, Private
Supported platforms: macOS, Windows, Linux, a browser
Built-in highlights inside papers: No, only through Adobe Reader commenting tools
Chrome extension rating (March 2021): 4
License: Free
Overall experience: My personal favorite. The desktop version looks a bit outdated, but Web Library has it all

Mendeley

Meant for academic research: Yes
Subfolders: Yes, but can only be created from a desktop version
Adding notes: Yes, but can only be created and viewed in a desktop version, can only be attached to a paper
Group search: Yes, but I wasn’t able to find it in the new layout
Group membership types: Public, Invite-Only, Private
Supported platforms: macOS, Windows, Linux, a browser (fewer features), Android, iOS
Built-in highlights inside papers: Yes, with a sync feature
Chrome extension rating (March 2021): 3
License: Free
Overall experience: The desktop version looks a bit outdated, while the online Library lacks some features. Shared highlights save the day

Paperpile

Meant for academic research: Yes
Subfolders: Yes
Adding notes: Yes, can only be attached to a paper, no formatting
Group search: No
Group membership types: With an email or with a link
Supported platforms: Chrome
Built-in highlights inside papers: Yes (beta)
Chrome extension rating (March 2021): 5
License: 30-day trial, then $2.99/month for academic, $9.99/month for business, billed annually
Overall experience: Didn’t enjoy the design of the library, didn’t find a reason to choose it over other free alternatives

These are just a few of the options. Invest some time in finding the one that works for you; it will definitely pay off!

Implementation

Check out how other people solve similar problems with code, then build your own thing in a fork or in a new repository. Just don’t forget to check the license and to cite the authors if you are using someone else’s work.
Papers With Code is a great place to do that. It has also recently partnered with arXiv, where you’ll now find a Code tab for each publication, so check whether the authors left a link to the official code.

Methods

Visit this page regularly to keep yourself up-to-date and to check for new methods organized by areas of Machine Learning.

Browse State-of-the-Art

This page may introduce you to three very valuable notions:

benchmarks;
leaderboards;
metrics.

You can also search for relevant leaderboards on Kaggle. But be extra mindful about leaderboards in general. As Chip Huyen writes in the Performance requirements section of her booklet Machine Learning Systems Design:

“A few percentage points might be a big deal on a leaderboard, but might not even be noticeable for users.”

And also:

“…a model can do better than the rest just by chance (AI competitions don’t produce useful models, Luke Oakden-Rayner, 2019).”
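The “just by chance” point can be illustrated with a small simulation. This is a sketch under made-up assumptions, not an analysis from either cited source: 100 hypothetical teams submit models that all have exactly the same true accuracy (80%), and each is scored on its own random draw of 1,000 test examples. The function name simulate_leaderboard and all the numbers are chosen purely for illustration.

```python
import random


def simulate_leaderboard(num_teams, test_size, true_accuracy, seed=0):
    """Score equally good models on finite test sets; return measured accuracies."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    scores = []
    for _ in range(num_teams):
        # Each test example is answered correctly with probability true_accuracy.
        correct = sum(rng.random() < true_accuracy for _ in range(test_size))
        scores.append(correct / test_size)
    return scores


# Assumed setup: every team's model is identical in quality.
scores = simulate_leaderboard(num_teams=100, test_size=1000, true_accuracy=0.80)
best, worst = max(scores), min(scores)
print(f"Best score:  {best:.3f}")
print(f"Worst score: {worst:.3f}")
print(f"Gap between first and last place: {best - worst:.3f}")
```

Even though no model is actually better than another, the leaderboard still shows a winner and a spread of several percentage points, which comes entirely from sampling noise in the test set.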

To conclude, I’d like to wish you good luck with your project. You are going to be inspired, excited, daunted, frustrated, and then go through it all over again. It’s all part of the process. Just remember: discovering new things is fun! 🎉