I’m reaching out to the community to gather your thoughts and suggestions on how we can enhance Lemmy’s search functionality, as discussed in Issue #846. Currently, the search options (new or top of a specific time) do not consistently deliver relevant or useful results, which creates difficulties for users trying to find posts based on specific keywords. While search engines like Google employ factors such as backlinks, freshness, keyword mentions, user experience, and topical authority, we need to strike a balance between improving search results and maintaining low complexity.
Please consider leaving a thumbs up on the GitHub issue.
The main issue is that currently searching hits the database hard, and database load is already the main bottleneck for Lemmy performance.
I think there need to be backend changes first, such as support for Meilisearch or similar, before search options can be extended.
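To make the "separate search backend" idea concrete, here is a minimal sketch of a toy in-memory inverted index, the kind of structure a dedicated search engine maintains so that queries never have to scan the main database. This is not Meilisearch's API or Lemmy's code; the `SearchIndex` class and its methods are hypothetical names made up for illustration.

```python
from collections import defaultdict


class SearchIndex:
    """Toy inverted index (hypothetical): maps each word to the ids of posts containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, post_id: int, text: str) -> None:
        # Index every whitespace-separated word, lowercased.
        for word in text.lower().split():
            self._postings[word].add(post_id)

    def search(self, query: str) -> set:
        words = query.lower().split()
        if not words:
            return set()
        # Intersect the posting sets so that every query word must appear.
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return result


index = SearchIndex()
index.add(1, "Lemmy search is slow")
index.add(2, "search engines")
matches = index.search("lemmy search")
```

The point is only that lookups touch the index rather than the posts table; a real backend would add tokenization, typo tolerance, ranking, and persistence on top of this.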
OP, who are you and why are you asking this? Search is one of my things (solrize :: solr.apache.org) so if you’re one of the Lemmy search devs I can discuss search stuff with you here, but I would rather not sign up on the Microsoft code hosting site for that purpose. If you’re a regular Lemmy user trying to get other users to brigade a dev ticket, that is not very nice, and I think it’s better to discuss search stuff here on Lemmy.
While I agree that Lemmy search is not up to snuff, I think it will be a while before it meaningfully improves:
- There are higher priorities that need work right now, and that work is not simple or quick. Lemmy desperately needs performance/scaling improvements like db query optimization and support for pg read replicas to weather the influx of reddit immigrants. It desperately needs improved mod tools so mods/admins can keep up with the torrent of bots and abuse. As useful as improved search would be, scaling and moderation are existential challenges that need attention first.
- Proper multi-dimensional search weighting is complicated. The search boxes we’ve become accustomed to on the commercial internet are powered by incredibly complex backends with multiple different database-ish components, multiple async analysis/weighting pipelines, plus the bits that respond to queries. While these techniques can be scaled down to work on smaller deployments, they will definitely make Lemmy more complex to run… which is a very expensive tradeoff for an ecosystem that depends on amateur sysadmins to volunteer to run instances.
- These search systems are also computationally expensive, much, much more so than “simple” storage and fetching of posts/comments. Lemmy instances are already groaning under the weight of the reddit user influx, and I don’t see devs or admins signing up immediately to add resource-hungry features to their setups.
- It’s possible to improve search from outside Lemmy. Lemmy instances are not well indexed by external search engines yet, but searching for
site:lemmy.*
does return some results, and as Lemmy instances begin to fill up with high quality content I think we’ll see the “anchor instances” climb the rankings and crawl priority relatively quickly.
All of which is to say… better search would be very useful but there are even more important features right now… and it won’t be easy when the time comes. A combination of making better use of type/community constraints and searching outside Lemmy is probably your best bet unless you’re a developer who has built multi-dimensionally weighted search tools before and can do some dev/testing to show how much better an alternative could be.
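For anyone wondering what "multi-dimensional weighting" means in practice, here is a rough sketch that blends keyword matches, vote score, and freshness into a single ranking number. Everything in it is an assumption for illustration: the `Post` fields, the weights, and the 72-hour half-life are made up, not taken from Lemmy, and a real system would need tuning and far better text matching.

```python
import math
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical fields, not Lemmy's actual schema.
    title: str
    body: str
    score: int        # net votes
    age_hours: float  # time since posting


def relevance(post, query, w_text=1.0, w_votes=0.5, w_fresh=0.5,
              half_life_hours=72.0):
    """Combine keyword match, votes, and freshness (weights are illustrative guesses)."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    title_words = post.title.lower().split()
    body_words = post.body.lower().split()
    # Title hits count double; normalise by the number of query terms.
    hits = sum(2 * title_words.count(t) + body_words.count(t) for t in terms)
    text = hits / len(terms)
    if text == 0:
        return 0.0  # no keyword match: never surface the post
    votes = math.log1p(max(post.score, 0))             # dampen very popular posts
    fresh = 0.5 ** (post.age_hours / half_life_hours)  # exponential decay with age
    return w_text * text + w_votes * votes + w_fresh * fresh


def search(posts, query, limit=10):
    """Return up to `limit` matching posts, highest relevance first."""
    scored = [(relevance(p, query), p) for p in posts]
    scored = [(s, p) for s, p in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:limit]]


posts = [
    Post("Lemmy search tips", "how to search", score=10, age_hours=1),
    Post("Cats", "no match here", score=1000, age_hours=1),
    Post("Old search thread", "search search", score=0, age_hours=10000),
]
results = search(posts, "search")
```

Even this toy version shows the tradeoff: each extra signal is another thing to compute, store, and tune per instance, which is where the complexity and resource cost come from.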