Fetching Files from GitHub

7/22/2020

So, I went ahead and created a pair of child pages for projects to display the readme and changelog from my git repositories. Child pages in Nuxt are pretty easy. You can just use the <nuxt-child /> component and all of the essential routing is handled automatically based on the pages directory structure. That part's easy! And in fact I've already got those pages up and working, displaying both my readme and changelog for the one project, but...

Here we run into an issue with the GitHub API's rate limiting.

Without authentication, API requests are limited by IP address to only 60 per hour. Authenticated requests get up to 5,000 per hour. But - problem - what authentication options are there?

  1. Basic Auth with GitHub user credentials
  2. OAuth with GitHub login and redirect flow
  3. App secret

These all suck for a client-side interface. There's obviously no way I'll use my own user credentials to authenticate my site visitors, and publicly exposing an app secret is out of the question too.
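For keeping an eye on that 60-per-hour budget, GitHub reports the current quota on each REST response in the `x-ratelimit-limit`, `x-ratelimit-remaining` and `x-ratelimit-reset` headers. Here is a minimal sketch of reading them; the `RateLimit` type and the `readRateLimit` helper are made-up names for illustration, not part of any library.

```typescript
// Quota information as reported in GitHub's REST response headers.
interface RateLimit {
  limit: number;     // requests allowed per window (60 when unauthenticated)
  remaining: number; // requests left in the current window
  resetAt: Date;     // when the current window resets
}

// Accepts anything with a Headers-style `get`, so it works with a fetch
// Response's `headers` as well as a plain stand-in object.
function readRateLimit(headers: { get(name: string): string | null }): RateLimit | null {
  const limit = headers.get('x-ratelimit-limit');
  const remaining = headers.get('x-ratelimit-remaining');
  const reset = headers.get('x-ratelimit-reset');
  if (limit === null || remaining === null || reset === null) {
    return null; // headers missing, so nothing can be said about the quota
  }
  return {
    limit: Number(limit),
    remaining: Number(remaining),
    // The reset header is a Unix timestamp in seconds.
    resetAt: new Date(Number(reset) * 1000),
  };
}

// Possible use in the browser:
//   const res = await fetch('https://api.github.com/rate_limit');
//   console.log(readRateLimit(res.headers));
```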

I could do any of these in a server-side build step, meaning that there is no live content displayed and... updates to these pages will always lag behind until I initiate a new site build. This is far from ideal, but it is a working solution.

I could reduce requests by storing the data retrieved in session and not calling for it again while the site remains loaded (no matter what the user does in terms of page navigation). This would mean far fewer requests per visitor, and the limiting factor is then just how many projects they view where an API call is made. Realistically, this will never exceed 60 per hour for a very long time... except that it also matters how many pages I display from the repo and... at the moment it's two, but I might introduce more. I'd definitely like to display my BACKERS.md files. So...
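A rough sketch of what that per-session caching could look like, assuming "session" means the browser's `sessionStorage`. The storage and the fetch function are passed in so the idea can be tried outside a browser; `StorageLike`, `Fetcher` and `cachedFetcher` are hypothetical names.

```typescript
// The small subset of the Web Storage API this cache relies on.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

type Fetcher = (url: string) => Promise<string>;

// Wraps a fetcher so each URL is only requested once per session:
// storage is consulted first, and `fetchText` runs only on a miss.
function cachedFetcher(storage: StorageLike, fetchText: Fetcher): Fetcher {
  return async (url: string): Promise<string> => {
    const hit = storage.getItem(url);
    if (hit !== null) return hit;
    const text = await fetchText(url);
    storage.setItem(url, text); // remember it for later navigations
    return text;
  };
}

// Possible use in the browser:
//   const getText = cachedFetcher(window.sessionStorage, async (url) => {
//     const res = await fetch(url);
//     if (!res.ok) throw new Error(`GitHub responded with ${res.status}`);
//     return res.text();
//   });
```

Throwing on a non-OK response matters here: otherwise a rate-limit error body would get cached as if it were the readme.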

Perhaps we need to think more in terms of bulk retrieval.

The GitHub API may be rate limited, but the Git protocol isn't so restrictive. Fetching the entirety of a repo when I only want to display a few files might be a sizeable operation, but... it does bring those few files all at once.

So what's better:

  • N requests per project retrieving specific file contents
  • 1 git retrieval per project retrieving the full repo contents

...?

It's a lot of little operations vs a few big ones. To support the latter idea, isomorphic-git exists, which is a pure JavaScript implementation of git. It requires an in-browser file system to handle retrieved files... so obviously the storage overhead for this approach is far greater, especially if individuals go exploring multiple projects on my site.
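As a rough sketch of the "one big retrieval" option: isomorphic-git's `clone` takes a file system, an HTTP client, a target directory and a URL, and `singleBranch` plus `depth: 1` keep the download shallow. The pieces are passed in as parameters here so the block stands alone; in a real page they would presumably be `isomorphic-git`, `isomorphic-git/http/web` and a `@isomorphic-git/lightning-fs` instance. `GitLike`, `FsLike` and `fetchRepoFiles` are invented names, and the interfaces cover only what this sketch calls.

```typescript
// Only the parts of the file system and git client that the sketch uses.
interface FsLike {
  promises: {
    readFile(path: string, options: { encoding: 'utf8' }): Promise<string>;
  };
}

interface GitLike {
  clone(options: {
    fs: FsLike;
    http: unknown;
    dir: string;
    url: string;
    corsProxy?: string;
    singleBranch?: boolean;
    depth?: number;
  }): Promise<void>;
}

// One shallow clone per project, then read whichever files should be shown.
async function fetchRepoFiles(
  deps: { git: GitLike; fs: FsLike; http: unknown },
  url: string,
  names: string[],
  options: { dir?: string; corsProxy?: string } = {},
): Promise<Record<string, string>> {
  const dir = options.dir || '/repo';
  await deps.git.clone({
    fs: deps.fs,
    http: deps.http,
    dir,
    url,
    corsProxy: options.corsProxy, // browsers generally need a CORS proxy for git hosts
    singleBranch: true,
    depth: 1, // latest commit only, to keep the transfer small
  });
  const files: Record<string, string> = {};
  for (const name of names) {
    files[name] = await deps.fs.promises.readFile(`${dir}/${name}`, { encoding: 'utf8' });
  }
  return files;
}
```

With this shape, adding another file to display is just one more entry in `names` rather than one more API request.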

I am starting to dig the idea of featuring like a repo explorer though...

The issue is just, I guess, also making the browser forget these files when the user navigates away or closes the webpage. Shouldn't be a problem. Might explore this git approach.

If it turns out to be too much, too hungry for most visitors, or any particular repo of mine appears too large... then, at that point, we can maybe include a variable in each local project file that says, "hey, use the API approach for this project instead".
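That per-project switch could be as small as an optional field. A sketch, where the `Project` shape, the `fetchStrategy` field and the `strategyFor` helper are all hypothetical:

```typescript
type FetchStrategy = 'git' | 'api';

// Hypothetical shape of a local project file.
interface Project {
  name: string;
  repo: string;
  fetchStrategy?: FetchStrategy; // set to 'api' to opt out of the git approach
}

// Projects without an explicit setting use the site-wide default.
function strategyFor(project: Project, fallback: FetchStrategy = 'git'): FetchStrategy {
  return project.fetchStrategy ?? fallback;
}
```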

But I'm gonna go ahead and explore using an in-browser git client for now.

Edit: On second thoughts, I'm gonna develop the easier API approach first, backed by the Vuex store to prevent hammering of the API with numerous requests. Might explore the git approach later, but it's a big job I do not have the time for at the moment.
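For illustration, a Vuex-backed cache for this could be shaped roughly like the module below: a `load` action that returns what is already in state and only fetches on a miss. This is a sketch with invented names (`makeFilesModule`, `setContent`, `load`), not the store actually used on this site, and in Nuxt the `state`, `mutations` and `actions` would normally be separate exports of a file in the store directory rather than one object.

```typescript
interface FilesState {
  contents: Record<string, string>; // fetched file text, keyed by URL
}

// Just the parts of Vuex's action context that this sketch uses.
interface ActionContextLike {
  state: FilesState;
  commit(type: string, payload: { url: string; text: string }): void;
}

// The fetch function is injected so the module can be exercised without a network.
function makeFilesModule(fetchText: (url: string) => Promise<string>) {
  return {
    namespaced: true,
    state: (): FilesState => ({ contents: {} }),
    mutations: {
      setContent(state: FilesState, payload: { url: string; text: string }): void {
        // Replace the object so Vue 2's reactivity picks up the new key.
        state.contents = { ...state.contents, [payload.url]: payload.text };
      },
    },
    actions: {
      async load(ctx: ActionContextLike, url: string): Promise<string> {
        if (url in ctx.state.contents) return ctx.state.contents[url]; // cache hit
        const text = await fetchText(url);
        ctx.commit('setContent', { url, text });
        return text;
      },
    },
  };
}
```

State kept this way lives only in memory, so it carries across client-side route changes but starts empty again after a full page reload.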

Edit 2: Vuex store seems broken for this purpose. Not sure why. The state doesn't appear to persist between page navigations, so I wind up re-fetching over and over - the exact thing I'm trying to avoid. Maybe this is a local problem? Something about yarn dev launching Nuxt in dev mode, maybe? Maybe this solves itself in production? Worth deploying to see the result.