The Crash and Rebirth of a Six-Year-Old Open Source Project

I have been maintaining an open source project, RSSHub, for six years, and it is now on the verge of collapse.

Background

On the surface, it has nearly 30k stars, more than 900 contributors, over 300 million requests per month, countless users, monthly sponsorships of a few tens of dollars, a steady stream of issues and pull requests, and code updates almost every day. It looks healthy and vibrant. Behind the scenes, however, years of high maintenance costs, server bills of over a thousand dollars a month, and a daily pile of repetitive maintenance work have pushed it to the edge of collapse.

The project was started six years ago, and many of the trendy Node.js technologies and dependencies touted as "next generation" back then are now outdated. Many of today's popular technologies, such as JSX, TypeScript, and Serverless, cannot be adopted. The architecture is also poorly designed: information about each route is scattered across multiple places. Developing or modifying a route means touching all of them (registering the route, writing the route script, writing Radar rules, writing documentation, and so on), which adds a lot of work and invites errors. This was not a problem when there were few routes, but it has now become unbearable.

On such a shaky foundation, merely maintaining the status quo already takes everything we have, and every new feature makes future updates harder. So even when I have novel ideas, they are often difficult to implement.

The only real fix is to rewrite the core on a modern framework with a newly designed architecture. But the cost of that transformation grows with the number of routes, and each fundamental change can take months of work. So even as the problems grew worse, we kept postponing it on the grounds that the project still worked.

But this is something that must be done, so I took some time to redesign and rewrite it.

Technology Stack Update

koa -> Hono

The first and most fundamental step was to replace the old web framework, koa. Six years ago it was a popular next-generation framework, but its author has long since abandoned it. After some research, I decided to switch to Hono, which has the best support for JSX, TypeScript, and Serverless.

Their APIs differ significantly, so all middleware had to be rewritten, and every koa API used in the routes had to be replaced.
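To make the API difference concrete, here is a minimal sketch of the same handler written in each style. It imports neither framework: KoaStyleContext, HonoStyleContext, and fakeHonoContext are hypothetical stand-ins that model only the few members used. c.req.param, c.req.query, and c.json do exist in Hono, but the real c.json returns a Response object, and in koa ctx.params comes from the router middleware rather than koa itself.

```typescript
// Hypothetical, stripped-down models of the two context APIs. The real koa and
// Hono types are far richer; only the members used below are modeled here.
interface KoaStyleContext {
  params: Record<string, string>; // provided by koa's router middleware
  query: Record<string, string | undefined>;
  body?: unknown;
}

interface HonoStyleContext {
  req: {
    param(name: string): string | undefined;
    query(name: string): string | undefined;
  };
  // Real Hono returns a Response; a plain object stands in for it here.
  json(data: unknown): { status: number; body: string };
}

// Before (koa style): read ctx.params / ctx.query, then assign ctx.body.
function koaHandler(ctx: KoaStyleContext): void {
  const limit = Number(ctx.query.limit ?? 10);
  ctx.body = { user: ctx.params.user, limit };
}

// After (Hono style): read through c.req, then return the response.
function honoHandler(c: HonoStyleContext) {
  const limit = Number(c.req.query('limit') ?? 10);
  return c.json({ user: c.req.param('user'), limit });
}

// Tiny fake so the Hono-style handler can run without Hono installed.
function fakeHonoContext(
  params: Record<string, string>,
  query: Record<string, string>,
): HonoStyleContext {
  return {
    req: { param: (n) => params[n], query: (n) => query[n] },
    json: (data) => ({ status: 200, body: JSON.stringify(data) }),
  };
}
```

The key shift is from mutating a shared context (`ctx.body = ...`) to returning the response from the handler, which is why every route, not just the middleware, had to be touched.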

Main changes:
https://github.com/DIYgod/RSSHub/pull/14295

The author of Hono also liked this transformation.

Yusuke Wada (@yusukebe) · Feb 28, 2024, 11:25 AM

"Whoa, this PR is amazing. RSSHub is migrating from Koa to Hono." Files changed 2,440 +23,545 −19,773 github.com/DIYgod/RSSHub/…

JavaScript -> TypeScript

TypeScript prevents many type-related issues and careless low-level mistakes. Most importantly, it helps keep the route code written by hundreds of contributors consistent and of high quality.

Main changes:

DIŸgöd ☀️ (@DIYgod) · Mar 3, 2024, 6:43 PM

"RSSHub is now a TypeScript project."

CommonJS -> ESM

ESM is a module specification that some Node.js core developers pushed hard a few years ago. It has some advantages, but its biggest problems are the ecosystem fragmentation caused by its incompatibility with CommonJS and its reduced functionality.

After several years of development, it is now just about usable in most scenarios, and tsx supports mixing CommonJS and ESM.

Despite our best efforts, some CommonJS code is still hard to migrate for now. As a result, the project can only be run with tsx, which is incompatible with some serverless platforms such as Vercel. We hope to resolve this gradually in the future.

Main changes:

art-template -> JSX

art-template is a template engine that works with koa. Six years ago there was a more popular template engine whose name I no longer remember; I chose art-template because I couldn't make sense of that one at the time, and art-template was very simple.

Hono has built-in JSX support, and JSX needs little introduction: it is a syntax extension of JavaScript, so writing templates with it feels much like writing React components.

Main changes:

Jest -> Vitest

Jest used to be the popular testing framework, but it has fallen behind since ESM arrived: its ESM support has always been labeled "experimental". Vitest is now more popular.

Main changes:
https://github.com/DIYgod/RSSHub/commit/38e42156a0622a2cd09f328d2d60623813b8df28

Got -> ?

Got, the HTTP client we currently use, is also no longer actively maintained. I haven't found a good replacement yet, so I may switch to native Fetch or a thin wrapper around it in the future, but I haven't started on that yet.
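A thin wrapper around Fetch could be as small as the sketch below. It is purely hypothetical and not RSSHub code: createFetchWrapper, the placeholder User-Agent value, and the retry policy (retry network errors and 5xx responses, fail immediately on other non-OK responses such as 4xx) are all assumptions. The fetch implementation is injected so the wrapper can be exercised without network access.

```typescript
// Minimal structural types so the sketch does not depend on DOM/undici typings.
interface ResponseLike {
  ok: boolean;
  status: number;
  text(): Promise<string>;
}

type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<ResponseLike>;

interface WrapperOptions {
  retries?: number; // extra attempts after the first one
  userAgent?: string; // default User-Agent header (placeholder value below)
}

// Hypothetical helper: adds a default User-Agent and retries network errors
// and 5xx responses; other non-OK responses (such as 4xx) fail immediately.
function createFetchWrapper(fetchImpl: FetchLike, options: WrapperOptions = {}) {
  const retries = options.retries ?? 2;
  const userAgent = options.userAgent ?? 'rsshub-fetch-sketch';

  return async function fetchText(
    url: string,
    headers: Record<string, string> = {},
  ): Promise<string> {
    let lastError: unknown = new Error(`No attempt made for ${url}`);
    for (let attempt = 0; attempt <= retries; attempt++) {
      const res = await fetchImpl(url, {
        headers: { 'user-agent': userAgent, ...headers },
      }).catch((err: unknown) => {
        lastError = err; // network failure: remember it and try again
        return undefined;
      });
      if (!res) {
        continue;
      }
      if (res.ok) {
        return res.text();
      }
      lastError = new Error(`HTTP ${res.status} for ${url}`);
      if (res.status < 500) {
        break; // client errors will not fix themselves, so stop retrying
      }
    }
    throw lastError;
  };
}
```

With Node.js 18 or later, the built-in global fetch could be passed in directly, as in `createFetchWrapper(fetch)`. Timeouts are left out for brevity; adding them via `AbortSignal.timeout()` would require a `signal` field in the init type.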

New Routing Standard

I couldn't have done this well on my own; discussions with community developers taught me a lot and improved the design considerably. The process was very interesting: https://github.com/DIYgod/RSSHub/issues/14685

Main changes:
https://github.com/DIYgod/RSSHub/pull/14718

History

The new standard mainly aims to solve the problem of scattered route information. It is effectively the third version.

The first version dates from RSSHub's early development. I didn't foresee that there would be so many routes, so there was almost no planning: all routes were registered in a single file, and a route script and documentation were then added for each one. Over time, this file grew very large and merge conflicts became common. In addition, every route script was loaded at startup, so performance kept getting worse.

The second version came from the period when NeverBehave was maintaining the project. It introduced namespaces and split router.js and radar.js into per-namespace files, so routes in the same namespace were grouped in one folder along with one or more Markdown documents. It also implemented lazy loading, greatly improving maintainability and performance. However, route information was still scattered across multiple files, and inconsistencies between those files easily led to errors.
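Lazy loading here means registering every namespace up front but only loading its route code when a request first reaches it. A minimal sketch, assuming a hypothetical LazyRouter class and route-module shape; in a real app each loader would be a dynamic `import()` or `require()` of the namespace's router file.

```typescript
// Hypothetical route-module shape; real modules export much more.
interface RouteModule {
  handler: (params: Record<string, string>) => string;
}

type Loader = () => Promise<RouteModule>;

// Registers a loader per namespace at startup, but only runs it on the first
// request for that namespace and caches the result.
class LazyRouter {
  private loaders = new Map<string, Loader>();
  private cache = new Map<string, Promise<RouteModule>>();

  register(namespace: string, loader: Loader): void {
    this.loaders.set(namespace, loader);
  }

  async handle(namespace: string, params: Record<string, string>): Promise<string> {
    let pending = this.cache.get(namespace);
    if (!pending) {
      const loader = this.loaders.get(namespace);
      if (!loader) {
        throw new Error(`Unknown namespace: ${namespace}`);
      }
      // Cache the promise itself so concurrent first requests share one load.
      pending = loader();
      this.cache.set(namespace, pending);
    }
    const mod = await pending;
    return mod.handler(params);
  }
}

// Usage: the loader body stands in for loading a real router file.
let loads = 0;
const router = new LazyRouter();
router.register('github', async () => {
  loads++;
  return { handler: (params: Record<string, string>) => `repo:${params.repo}` };
});
```

Startup cost is now just building the loader map, no matter how many namespaces exist; each namespace pays its loading cost once, on first use.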

Now

This time, the route files are divided into two categories: namespace.ts and route files with arbitrary names.

namespace.ts defines namespace information by exporting an object named "namespace".

import type { Namespace } from '@/types';

export const namespace: Namespace = {
    // ...
};

The fields contained in the namespace are restricted by TypeScript to:

interface Namespace {
    name: string;
    url?: string;
    categories?: string[];
    description?: string;
}

This information is used to build the documentation and by RSSHub Radar.

Route files define route information by exporting an object named "route".

import { Route } from '@/types';

export const route: Route = {
    // ...
};

The fields contained in the route are restricted by TypeScript to:

interface Route {
    path: string | string[];
    name: string;
    url?: string;
    maintainers: string[];
    handler: (ctx: Context) => Promise<Data> | Data;
    example: string;
    parameters?: Record<string, string>;
    description?: string;
    categories?: string[];

    features: {
        requireConfig?: string[] | false;
        requirePuppeteer?: boolean;
        antiCrawler?: boolean;
        supportRadar?: boolean;
        supportBT?: boolean;
        supportPodcast?: boolean;
        supportScihub?: boolean;
    };
    radar?: {
        source: string[];
        target?: string;
    };
}

The information that used to live in router.js, maintainer.js, radar.js, and the documentation is now centralized in this single file, which eliminates duplicate definitions and reduces the chance of errors.
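Put together, a namespace and one route might look like the sketch below. The site example.com, the Posts route, and the maintainer ID are hypothetical, and the types are simplified local stand-ins for the ones in '@/types' (in particular, the real handler receives a Hono Context rather than a params object). In the project, the two exports live in separate files: namespace.ts and a route file such as posts.ts.

```typescript
// Simplified, hypothetical stand-ins for the types exported by '@/types'.
interface Namespace {
  name: string;
  url?: string;
  categories?: string[];
  description?: string;
}

interface DataItem {
  title: string;
  link: string;
  pubDate?: Date;
}

interface Data {
  title: string;
  link: string;
  item: DataItem[];
}

interface Route {
  path: string | string[];
  name: string;
  maintainers: string[];
  example: string;
  parameters?: Record<string, string>;
  features: {
    requireConfig?: string[] | false;
    requirePuppeteer?: boolean;
    antiCrawler?: boolean;
    supportRadar?: boolean;
  };
  radar?: { source: string[]; target?: string };
  // Simplification: the real handler receives a Hono Context, not params.
  handler: (params: Record<string, string>) => Promise<Data> | Data;
}

// namespace.ts: metadata shared by every route under /example
export const namespace: Namespace = {
  name: 'Example',
  url: 'example.com',
  categories: ['blog'],
};

// posts.ts: routing, docs, Radar rule, and handler in one object
export const route: Route = {
  path: '/posts/:category?',
  name: 'Posts',
  maintainers: ['your-github-id'],
  example: '/example/posts/news',
  parameters: { category: 'Category slug, defaults to `all`' },
  features: {
    requireConfig: false,
    requirePuppeteer: false,
    antiCrawler: false,
    supportRadar: true,
  },
  radar: {
    source: ['example.com/:category'],
    target: '/posts/:category',
  },
  handler: (params) => {
    const category = params.category ?? 'all';
    // A real handler would fetch and parse https://example.com/<category> here.
    return {
      title: `Example - ${category}`,
      link: `https://example.com/${category}`,
      item: [],
    };
  },
};
```

Because every field is checked against the interface, a missing `example` or a misspelled feature flag fails at compile time instead of surfacing later as broken documentation or a bad Radar rule.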

Implementation

In development, we traverse the entire routes folder, find every namespace.ts and route file, read their information, and register the routes. In production, we use a precompiled list of paths to avoid the traversal and any unnecessary loading. The code is here: https://github.com/DIYgod/RSSHub/blob/master/lib/registry.ts
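The core of that step, grouping loaded modules by namespace, can be sketched as a pure function that behaves the same whether its input came from a directory walk or a precompiled list. This is an illustrative assumption, not the code in lib/registry.ts: buildRegistry, fullPaths, and the "<namespace>/<file>" key format are hypothetical.

```typescript
// Hypothetical shapes of what importing a namespace.ts or route file yields.
interface NamespaceInfo {
  name: string;
  url?: string;
}

interface RouteInfo {
  path: string;
  name: string;
}

interface LoadedModule {
  namespace?: NamespaceInfo;
  route?: RouteInfo;
}

interface NamespaceEntry {
  info?: NamespaceInfo;
  routes: Record<string, RouteInfo>;
}

// Groups modules keyed by "<namespace>/<file>" into one entry per namespace.
// In development the keys would come from walking the routes folder; in
// production from a precompiled list, so the same function serves both.
function buildRegistry(modules: Record<string, LoadedModule>): Record<string, NamespaceEntry> {
  const registry: Record<string, NamespaceEntry> = {};
  for (const file of Object.keys(modules)) {
    const mod = modules[file];
    const ns = file.split('/')[0];
    if (!registry[ns]) {
      registry[ns] = { routes: {} };
    }
    if (mod.namespace) {
      registry[ns].info = mod.namespace;
    }
    if (mod.route) {
      registry[ns].routes[mod.route.path] = mod.route;
    }
  }
  return registry;
}

// Lists the full URL paths that would be mounted, e.g. "/github/issue/:user/:repo".
function fullPaths(registry: Record<string, NamespaceEntry>): string[] {
  const paths: string[] = [];
  for (const ns of Object.keys(registry)) {
    for (const path of Object.keys(registry[ns].routes)) {
      paths.push(`/${ns}${path}`);
    }
  }
  return paths.sort();
}

// Usage with an in-memory stand-in for the loaded files.
const registry = buildRegistry({
  'github/namespace.ts': { namespace: { name: 'GitHub', url: 'github.com' } },
  'github/issue.ts': { route: { path: '/issue/:user/:repo', name: 'Issues' } },
  'github/star.ts': { route: { path: '/stars/:user/:repo', name: 'Stars' } },
});
```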

The documentation is also generated by traversing the routes folder, collecting all the required information, and producing a series of Markdown files, so it no longer needs to be maintained by hand. The code is here: https://github.com/DIYgod/RSSHub/blob/master/scripts/workflow/build-routes.ts
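A minimal sketch of that documentation step, assuming hypothetical DocNamespace and DocRoute shapes and an output layout of one Markdown file per category; the actual build-routes.ts script differs in its details.

```typescript
// Hypothetical inputs; the real build script reads these from the route files.
interface DocRoute {
  path: string;
  name: string;
  example: string;
  maintainers: string[];
}

interface DocNamespace {
  key: string; // folder name, which is also the URL prefix
  name: string;
  url?: string;
  categories?: string[];
  routes: DocRoute[];
}

// Renders one namespace as a Markdown section.
function renderNamespace(ns: DocNamespace): string {
  const lines: string[] = [`## ${ns.name}${ns.url ? ` (${ns.url})` : ''}`, ''];
  for (const r of ns.routes) {
    lines.push(`### ${r.name}`, '');
    lines.push(`Route: \`/${ns.key}${r.path}\``, '');
    lines.push(`Example: \`${r.example}\``, '');
    lines.push(`Maintainers: ${r.maintainers.map((m) => `@${m}`).join(', ')}`, '');
  }
  return lines.join('\n');
}

// Builds one Markdown file per category, e.g. { 'programming.md': '...' }.
function buildDocs(namespaces: DocNamespace[]): Record<string, string> {
  const sections: Record<string, string[]> = {};
  for (const ns of namespaces) {
    for (const category of ns.categories ?? ['other']) {
      if (!sections[category]) {
        sections[category] = [];
      }
      sections[category].push(renderNamespace(ns));
    }
  }
  const files: Record<string, string> = {};
  for (const category of Object.keys(sections)) {
    files[`${category}.md`] = sections[category].join('\n');
  }
  return files;
}

const docs = buildDocs([
  {
    key: 'github',
    name: 'GitHub',
    url: 'github.com',
    categories: ['programming'],
    routes: [
      {
        path: '/issue/:user/:repo',
        name: 'Issues',
        example: '/github/issue/DIYgod/RSSHub',
        maintainers: ['your-github-id'],
      },
    ],
  },
]);
```

Since the docs are derived from the same objects the router uses, a route's documented path cannot drift from its registered path.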

Of course, routes written under the old standard had to be migrated to the new one rather than simply dropped. We did this by using scripts to extract and organize their information in bulk. However, the old documentation was messy and full of mistakes, so the extracted information contains many errors as well, which we can only fix manually over time.

Future

With these improvements, RSSHub can finally shed its historical baggage and focus on new features. Here are some ideas I have collected, in the hope of sparking more:

Since RSSHub is a data aggregator, its purpose is not limited to RSS. The JSON output could be enhanced into a general RESTful API, for example providing an interface to fetch the next page, or outputting non-feed data such as a Twitter user's followers.
A user system and user-specific configuration, generating private subscription addresses #14706
Route error notifications and health checks #14712
Integration with RSS3 nodes and cryptocurrency revenue sharing https://twitter.com/rss3_/status/1731822029199094012
AI translation and summarization
Deeper analysis of example data to infer and automatically recommend Radar rules
RSSHub instances bound to local browsers or clients, in the hope of truly solving anti-crawling challenges
...

Finally, open source is very expensive, and RSSHub could not have survived this long without the help of its many contributors and kind sponsors.

If RSSHub is helping you, I hope you can actively participate and contribute to the future of information freedom.