Overleaf’s Remote Hackathon

Roberta Cucuzza · October 8, 2020

It’s been a year since our blog post on how we started doing remote hackathons. So many things have changed globally in the meantime, but it seems that our remote-first working practices have been our best resource during this unprecedented year.

During this time, we have run other remote hackathons; the latest, held just a few weeks ago, was particularly interesting for the variety of projects that came out of it. Here we want to share a few highlights.

Before we dive into the projects, let me recap how we organise ourselves during these remote events, which may be useful if you are also thinking of moving your hackathons online.

Why we do hackathons

Timeframe

The hackathon usually starts with a few ideas on a dedicated #hackathon channel on Slack. Once we have a date and (possibly) a theme, we plan:

A short session to pitch our ideas, followed by dot voting on a Miro board.
A full hackathon day, which stretches across all the time zones where we are based.
A (very popular) 2-hour demo session for the whole company to learn about the results.

This time we tried out Discord to run our virtual rooms; we had a virtual “kitchen” for general chit chat and about 5 other video/text channels for project discussions. It was nice to set these outside of our regular work apps as it helped us keep our focus.

Some of us also shared music on Groovechat, a really fun Spotify plugin developed by a member of our team; it enables you to create a private room to play music and chat live - give it a try!

The Projects

Git Bridge Hacking (Simon)
Improve template search results (Jessica, Eric, Roberta)
Spell-Check in frontend (Jakob)
Institutional SSO authentication via EduPerson (Christopher)
SFTP Server (Shane)
Detach PDF window (Paulo & Miguel)
Frequently linked-files (Chrystal, Tim)
Real User Metrics (Ali)
Frontend & Yoga (Roberta)

Git Bridge Hacking (Simon)

The Git Bridge presently keeps its current repositories on disk on a single VM instance and archives them to S3. This is not as scalable as we would like, and it also means that if something goes wrong on disk we lose the commit history.

I wanted to see if we could improve scalability by making the process stateless and keeping its data in a database backend, and also whether we could compute history in a reproducible way from the Overleaf project-history service.

I had previously started a hobby project that extends go-git to store its data in MongoDB instead of the filesystem, so I decided to extend it to integrate with Overleaf. The advantage of go-git is that it is highly extensible, so it can be connected to other backends easily.

The main part of this project was to write a transformer that converts entries in the History service into Git commits in a reproducible way (i.e. it generates the same Git hash every time). Entries in history are stored as OT (operational transformation) updates, so it was a matter of converting these to Git diffs instead. The history service has a Swagger API, which I was able to integrate with the go-git server and then use to transform the updates.
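To illustrate what "reproducible" means here, below is a minimal Python sketch (not the Go code written during the hackathon) of how Git derives an object hash. A commit hash is a SHA-1 over the commit's content, so if the tree, parent, author, timestamp and message all come from the history entry rather than from the local clock or Git config, the same entry always yields the same hash. The function names, the placeholder author and the timestamp are assumptions made for illustration.

```python
import hashlib
from typing import Optional


def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash an object the way Git does: SHA-1 over '<type> <size>' + NUL + body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_id(tree_id: str, parent_id: Optional[str], author: str,
              timestamp: int, message: str) -> str:
    """Build a commit object from fixed inputs and return its hash.

    Every field is taken from the (hypothetical) history entry, never from
    the wall clock or local Git config, so the result is reproducible.
    """
    lines = [f"tree {tree_id}"]
    if parent_id is not None:
        lines.append(f"parent {parent_id}")
    # Use the entry's own timestamp (in UTC) for both author and committer.
    lines.append(f"author {author} {timestamp} +0000")
    lines.append(f"committer {author} {timestamp} +0000")
    body = "\n".join(lines) + "\n\n" + message + "\n"
    return git_object_id("commit", body.encode())


# The hash of an empty tree is a well-known constant in Git.
EMPTY_TREE = git_object_id("tree", b"")

AUTHOR = "Placeholder <placeholder@example.com>"  # placeholder, as in the prototype
first = commit_id(EMPTY_TREE, None, AUTHOR, 1600000000, "Import history entry 1")
again = commit_id(EMPTY_TREE, None, AUTHOR, 1600000000, "Import history entry 1")
print(first == again)  # the same inputs give the same commit hash
```

Changing any input, even the timestamp by one second, produces a different hash, which is why the transformer has to derive every field deterministically.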

The outcome was a lightweight server that stored its content in MongoDB and had a commit history that was in sync with the Overleaf project.

There were a lot of shortcuts required to get this done in time - error handling was very minimal and author names were left as a placeholder. The end result was read-only - you couldn’t push local changes. There would be a lot of extra work to get this production-ready, but it identified a good approach we could take if we were to commit resources to getting this done in future.

Improve template search results (Jessica, Eric, Roberta)

The goal of this project was to improve the template gallery search on the Overleaf website so that users can more easily find a suitable template for their project.

[Figure: Overleaf Template Gallery]

As a first step, we experimented with a different search provider.

Our current search provider is AddSearch. It scans the gallery pages and builds a search database out of the contents of these pages. This is simple and works quite well. However, in order to get finer control over what makes a search result relevant, we need the search database to contain more structured data. AddSearch provides functionality to achieve that through special markup added to the pages. We could have added that markup to get better control over the results.

However, we already use Algolia, a different search provider, for our documentation section. Instead of scanning the gallery pages, it provides us with an API for directly sending structured data. This fits our needs a bit better and allows us to consolidate our search providers.
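As a rough illustration of why structured data helps, here is a toy Python sketch. It is not Algolia's API or ranking algorithm: the record fields, the weights and the scoring rule are all hypothetical. The point is only that once each template is a record with separate fields, we can decide that a match in the title counts for more than a match buried in the description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TemplateRecord:
    """One gallery entry as structured data (field names are hypothetical)."""
    title: str
    description: str
    tags: List[str] = field(default_factory=list)


# Hypothetical weights: a hit in the title counts more than one in the description.
FIELD_WEIGHTS = {"title": 3, "tags": 2, "description": 1}


def score(record: TemplateRecord, query: str) -> int:
    """Sum the weights of every field that contains the query term."""
    term = query.lower()
    total = 0
    if term in record.title.lower():
        total += FIELD_WEIGHTS["title"]
    if any(term in tag.lower() for tag in record.tags):
        total += FIELD_WEIGHTS["tags"]
    if term in record.description.lower():
        total += FIELD_WEIGHTS["description"]
    return total


def search(records: List[TemplateRecord], query: str) -> List[TemplateRecord]:
    """Return the matching records, best score first."""
    hits = [r for r in records if score(r, query) > 0]
    return sorted(hits, key=lambda r: score(r, query), reverse=True)


RECORDS = [
    TemplateRecord("Conference poster", "A poster layout; not a thesis.", ["poster"]),
    TemplateRecord("PhD thesis", "Multi-chapter thesis template.", ["thesis", "university"]),
]

for record in search(RECORDS, "thesis"):
    print(record.title, score(record, "thesis"))
```

With page scanning alone, both entries simply "contain" the word thesis; with separate fields, the actual thesis template can be ranked above the poster that only mentions it in passing.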

We ended up reimplementing the search using Algolia and experimenting with the new ways it allowed us to tweak the search. The results were promising, but this new search needs some more work before we make it available. This work is ongoing and we hope to release it soon.

Spell-Check in frontend (Jakob)

The spell-check feature in Overleaf currently processes the contents of each document server-side. The frontend editor sends new words to the spelling service hosted in our infrastructure and receives back a list of misspellings and suggestions for those.

Using server-side processing allows the frontend code to remain fairly simple, at the cost of added latency for checks and a higher bandwidth bill for us. Unfortunately, modern browsers expose no simple interface to their built-in spell checkers, let alone a unified API across browser vendors.

As part of the hackathon I looked at a possible alternative for spell checking in the frontend. Processing the user content in the frontend would eliminate both downsides of the existing implementation.

For Overleaf v1 there was an existing prototype for using the Hunspell spell checker in the frontend. Hunspell is a popular spell checker used in LibreOffice, Google Chrome and other popular applications. Hunspell is written in C++, which cannot run directly as JavaScript in the browser. With modern tooling such as WebAssembly and Emscripten, however, we can expose a JavaScript-compatible interface to the Hunspell checker.

The Hunspell checker works with separate dictionary and suggestion files, which are lazy-loaded based on the user's language preference.

At the end of the day I had a working prototype for Overleaf v2 and a selection of languages to pick from. The complexity of interfacing with Hunspell was reasonable, I think, excluding the build pipeline for Hunspell itself.

With the backend service we are able to generate suggestions for every misspelled word up front. Generating suggestions with Hunspell in the browser is too slow for that and needs further investigation. One option is to generate suggestions only when the user opens the browser context menu on a misspelled word.
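The idea of deferring suggestions until the context menu opens can be sketched as follows. This is a Python toy, not the WebAssembly prototype: a small word set stands in for a loaded Hunspell dictionary, a simple edit-distance-1 generator stands in for Hunspell's suggestion engine, and `on_context_menu` is a hypothetical handler name. The cheap membership check runs for every word, while the expensive suggestion step runs only on demand and is cached.

```python
from functools import lru_cache
from typing import List, Tuple

# Toy word list standing in for a loaded Hunspell dictionary (assumption).
DICTIONARY = frozenset({"the", "template", "editor", "project", "compile"})
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def is_misspelled(word: str) -> bool:
    """Cheap membership check, run for every word as the user types."""
    return word.lower() not in DICTIONARY


@lru_cache(maxsize=1024)
def suggestions(word: str) -> Tuple[str, ...]:
    """Expensive step: try every edit one step away; results are cached."""
    word = word.lower()
    candidates = set()
    for i in range(len(word) + 1):
        left, right = word[:i], word[i:]
        if right:
            candidates.add(left + right[1:])  # deletion
            candidates.update(left + c + right[1:] for c in LETTERS)  # substitution
        if len(right) > 1:
            candidates.add(left + right[1] + right[0] + right[2:])  # transposition
        candidates.update(left + c + right for c in LETTERS)  # insertion
    return tuple(sorted(candidates & DICTIONARY))


def on_context_menu(word: str) -> List[str]:
    """Hypothetical handler called when the context menu opens on a word."""
    return list(suggestions(word)) if is_misspelled(word) else []


print(on_context_menu("editr"))
print(on_context_menu("editor"))
```

The trade-off is a short delay the first time a menu is opened for a given word, in exchange for not paying the suggestion cost for misspellings the user never asks about.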

Overall it was a fun project! I worked with technology I am not familiar with and touched areas of the codebase I do not look at as part of my day-to-day work.

Institutional SSO authentication via EduPerson (Christopher)

Last year Overleaf added support for institutional SSO, using the open source passport-saml library. This has been enthusiastically adopted by universities, many of which are accustomed to using the eduPersonTargetedID attribute from the eduPerson schema as a privacy-respecting unique key for identifying the account of the authenticated user. However, we were aware that there was an issue using this attribute with passport-saml. I was keen to fix this, as I am a big advocate of institutional SSO due to my prior experience working for a collegiate university.

The main challenge was working out the correct way to fix the issue. The passport-saml project correctly insists that changes to the library need to be compliant with the SAML 2.0 standard, and should be supported by tests. eduPersonTargetedID is not defined within the SAML standard, but as part of the eduPerson schema. It embeds a fragment of the SAML standard into the value of an attribute. However, the SAML standard does not assign any special meaning to a fragment of SAML appearing in an attribute.

A careful reading of the SAML standard revealed that any well-formed XML is a valid value for a SAML attribute. The passport-saml library processes scalar-valued attributes, but ignores more complex attributes. It seems to me that a more compliant passport-saml implementation would return an object for complex attributes. It would be the responsibility of the calling application to assign any special meaning to the object. This could be implemented with a one-line change, which I submitted to the project as a Pull Request, with a test using eduPersonTargetedID as an example. This was completed by the end of the hackathon. The following morning I created a branch of Overleaf utilising my modification to passport-saml, and verified that it could be used with eduPersonTargetedID, testing against the UKAMF test IdP. My PR has subsequently been merged into passport-saml.
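To make the scalar versus complex distinction concrete, here is a small Python sketch using only the standard library. It is not passport-saml's code, and the shape of the returned object is my own invention for illustration. The sample attribute follows the eduPersonTargetedID pattern described above (a SAML NameID element nested inside the attribute value); the entity IDs and the identifier are made up.

```python
import xml.etree.ElementTree as ET

SAML_NS = "urn:oasis:names:tc:SAML:2.0:assertion"

# Example attribute: the value is a nested NameID element, not plain text.
ATTRIBUTE_XML = f"""
<saml:Attribute xmlns:saml="{SAML_NS}"
    Name="urn:oid:1.3.6.1.4.1.5923.1.1.1.10" FriendlyName="eduPersonTargetedID">
  <saml:AttributeValue>
    <saml:NameID Format="urn:oasis:names:tc:SAML:2.0:nameid-format:persistent"
        NameQualifier="https://idp.example.org"
        SPNameQualifier="https://sp.example.org">opaque-user-id</saml:NameID>
  </saml:AttributeValue>
</saml:Attribute>
""".strip()


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def element_to_object(element: ET.Element) -> dict:
    """Convert an XML element into a plain dict without interpreting it."""
    return {
        "tag": local_name(element.tag),
        "attributes": dict(element.attrib),
        "text": (element.text or "").strip(),
        "children": [element_to_object(child) for child in element],
    }


def attribute_value(value: ET.Element):
    """Return a string for scalar values and a list of objects for complex ones."""
    children = list(value)
    if not children:
        return (value.text or "").strip()
    return [element_to_object(child) for child in children]


attribute = ET.fromstring(ATTRIBUTE_XML)
values = [attribute_value(v)
          for v in attribute.findall(f"{{{SAML_NS}}}AttributeValue")]
print(values[0][0]["tag"], values[0][0]["text"])
```

The parser attaches no meaning to the nested element; it is up to the calling application to recognise it as a NameID and use its text and qualifiers as the user's key.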

I was also reminded that eduPersonTargetedID is deprecated in the latest eduPerson standard, as it is superseded by the new SAML V2.0 Subject Identifier Attributes Profile Version 1.0. An interesting future project would be to add Subject Identifier Attribute support to passport-saml.

SFTP Server (Shane)

Shane prototyped an SFTP server to allow users to access their project files directly rather than going through the Overleaf website. This project turned out to be trickier than we expected, but we learned a lot along the way.

Detach PDF window (Paulo & Miguel)

Miguel and Paulo prototyped a feature that seems to be a very common request from Overleaf users: the ability to detach the PDF preview window. This allows the user to work more efficiently with a multi-monitor setup, by having the editor on one screen and the preview on the other.

There are workarounds to achieve this currently, but a) those are not officially supported/endorsed; and b) they sacrifice some of the most important efficiency features of Overleaf (namely, the ability to compile via keyboard shortcuts from the editor and SyncTeX—the ability to go to the corresponding PDF location from code and vice-versa).

[Figure: Overleaf Detach PDF Window]

Implementing a built-in detach PDF preview feature allows users to take advantage of their hardware without sacrificing Overleaf features.

We haven’t released this functionality in the product yet as it needs further work, but at least we have proved that it can be done!

Frequently linked-files (Chrystal, Tim)

Overleaf users often link to the same external files on a regular basis. We prototyped changing the ‘Add files from another project’ dialogue to have a small ‘Recently linked’ files section for easy access to those files that users reach for to include in their projects time and time again.

[Figure: Overleaf Frequently Linked Files]

It went well: by the end of the day the prototype was pre-selecting the files correctly in the form. It was a productive day spent on a feature that is useful day to day.

This was a useful proof of concept and it received positive feedback from the wider business when we demoed it. However, it will require further validation work and development before we could release it to our users.

Real User Metrics (Ali)

Currently we don’t have a good way of monitoring the “average” end-user’s experience when using Overleaf. “Real User Metrics” (RUM) is a technique for capturing performance data from the browser so that we can keep an eye on how Overleaf is performing.

I researched which metrics would be useful for us to measure, and then developed some infrastructure for capturing these metrics using PerformanceObserver.

Going in, I knew that the specifications for capturing RUM are a bit in flux, so it was useful to learn that PerformanceObserver looks like the best technique going forward. It was also useful to research which metrics are relevant to Overleaf, for example the First Contentful Paint metric, which for us is the time it takes for the editor loading screen to be shown.

[Figure: Overleaf Real User Metrics]

Frontend & Yoga (Roberta)

As a Product Manager I have started learning about front-end development to understand my team better. During the hackathon I dusted off an old book on designing websites and used it as a starting point to create a basic page in Atom. I then paired up with one of our team leads to get a bit of guidance and I was introduced to CodePen, a beginner-friendly tool that allows you to preview your HTML, CSS and JavaScript as you go.

I found this tool really useful because I could tweak things quickly. You can change the view to switch from Full Page to Editor or Debug. Sharing was also very easy and I was able to do my demo right from there and share a link afterwards.

[Figure: CodePen allows you to view, edit and preview your code on one screen.]

As for the content of the website, I filmed and hosted a “Desk Yoga” video routine for my colleagues to celebrate receiving my Yoga teacher diploma. 2 birds, 1 stone!

We hope that this has provided you with some inspiration for your next hackathon. Feel free to get in touch with us if you want any tips on working remotely; we can certainly help!
