February 5, 2024

Setting Up SearXNG

SearXNG

For many, many years, I’ve been using DuckDuckGo and I’ve largely been satisfied with it. It’s better than the more famous alternatives and generally respects privacy by not tracking you. That being said, DDG has had some controversy in the past which, to their credit, they resolved in three months. Some time ago, I started noticing that results were being shuffled every time I revisited a search. Some users on Lemmy noticed this too. Although this isn’t the end of the world, it does make things a lot harder when I accidentally click on a result (instead of opening it in a new tab), because I can no longer go down the list of results in the order they were originally presented. At the end of the day, I felt like it was time to switch to something else, if something better even existed.

This is where SearXNG comes in (I pronounce it as surk-sing [1]) (GitHub, Docs):

SearXNG is a free internet metasearch engine which aggregates results from more than 70 search services. Users are neither tracked nor profiled. Additionally, SearXNG can be used over Tor for online anonymity.

Essentially, it relays your searches to other search engines, then aggregates and displays the results in an order of its own choosing. Imagine searching for “caprese sandwich recipe” across DuckDuckGo, Qwant, Google, Brave, Bing, Yahoo, etc. and getting all the results in one nice and easy sweep. That’s the gist.
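To make the "aggregate and reorder" idea concrete, here is a minimal sketch in Python. It is not SearXNG's actual ranking code: the function name `merge_results`, the engine names, and the scoring rule (each engine contributes 1/position per URL) are all assumptions chosen for illustration.

```python
from collections import defaultdict


def merge_results(results_by_engine):
    """Merge ordered URL lists from several engines into one ranked list.

    Hypothetical scoring: a URL earns 1/position from every engine that
    returned it, so results found by several engines float to the top.
    """
    scores = defaultdict(float)
    engines = defaultdict(list)
    for engine, urls in results_by_engine.items():
        for position, url in enumerate(urls, start=1):
            scores[url] += 1.0 / position
            engines[url].append(engine)
    # Highest score first; ties are broken alphabetically by URL.
    ranked = sorted(scores, key=lambda url: (-scores[url], url))
    return [(url, engines[url]) for url in ranked]


merged = merge_results({
    "engine_a": ["https://a.example/caprese", "https://b.example/sandwich"],
    "engine_b": ["https://b.example/sandwich", "https://c.example/recipe"],
})
for url, found_by in merged:
    print(url, found_by)
```

The URL returned by both engines ends up first, which is the basic appeal of a metasearch engine: agreement between sources counts as a signal.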

To test it out, you can find a public SearXNG instance here: https://searx.space/.

Here’s what a search result for “caprese sandwich recipe” looks like in SearXNG’s light and dark themes:

A comparison of two images of SearXNG one of which has a light background and the other has a dark background. Both images contain the search results for Caprese Sandwich Recipe.

Here’s what the main page (dark) looks like:

The main homepage of SearXNG using the dark theme. In the center is an italicized 'SearXNG' in blue under which is a search box with the preview text 'Search for...' inside of it.

Setting SearXNG Up

Setting it up on my VPS was pretty easy. There are three installation methods, after which the next step, running it behind a WSGI server, was ever so slightly a challenge. The good thing is that I’m used to using gunicorn & flask (see I Love Flask’s Documentation) and SearXNG is built using flask [2].

— Systemd File

I made a service file (based off of gunicorn’s systemd service):

[Unit]
Description=searxng gunicorn daemon
After=network.target

[Service]
Type=notify
User=searxng
Group=searxng
RuntimeDirectory=gunicorn
WorkingDirectory=/usr/local/searxng/searxng-src/searx/
ExecStart=/usr/local/searxng/searx-pyenv/bin/gunicorn --config=searxng-gunicorn.conf.py webapp
ExecReload=/bin/kill -s HUP $MAINPID
KillMode=mixed
TimeoutStopSec=5
PrivateTmp=true

[Install]
WantedBy=multi-user.target

— Gunicorn Config

This is a part of my go-to gunicorn config that I created many years ago that I adapted for SearXNG. I call this searxng-gunicorn.conf.py:

################################################################################
###################### Server Socket & Worker Processes ########################
################################################################################
# The socket to bind. A string of the form: HOST, HOST:PORT, unix:PATH, fd://FD.
# An IP is a valid HOST.
#
# https://docs.gunicorn.org/en/latest/settings.html#bind
bind = 'IPADDR:PORTNUMBER' # Replace IPADDR:PORTNUMBER here 

# The number of worker processes for handling requests. THESE ARE PROCESSES.
# Refer to: https://stackoverflow.com/a/41696500
#
# https://docs.gunicorn.org/en/latest/settings.html#workers
workers = 2 

# NOTE: In my experience on a server with 1-2 vCPUs and 1-4 GB of RAM, 2
# workers + 2-3 threads works well for many cases.
#
# Of course, ymmv depending on the kinds of HTTP calls your application
# handles, the architecture, the software you're dealing with, etcetera. Are
# some of your HTTP requests blocking (i.e. do they take some time to process),
# or are they HTTP Keep-Alive kind of connections? My general go-to with
# gunicorn is 2 workers + 3 threads when dealing with a high volume of requests
# with low response times (pun sort of intended), i.e. your response generation
# doesn't take too much time to figure out.

# The number of worker threads for handling requests. Run each worker with the
# specified number of threads.
#
# https://docs.gunicorn.org/en/latest/settings.html#threads
threads = 1

# The maximum number of requests a worker will process before restarting. Any
# value greater than zero will limit the number of requests a worker will
# process before automatically restarting. This is a simple method to help limit
# the damage of memory leaks. If this is set to zero (the default) then the
# automatic worker restarts are disabled.
#
# https://docs.gunicorn.org/en/latest/settings.html#max-requests
max_requests = 5000

# The maximum jitter to add to the max_requests setting. The jitter causes the
# restart per worker to be randomized by randint(0, max_requests_jitter). This
# is intended to stagger worker restarts to avoid all workers restarting at the
# same time.
#
# https://docs.gunicorn.org/en/latest/settings.html#max-requests-jitter
# max_requests_jitter = 4950

################################################################################
################################## Debugging ###################################
################################################################################
# Restart workers when code changes.
# This setting is intended for development. It will cause workers to be
# restarted whenever application code changes.
#
# https://docs.gunicorn.org/en/latest/settings.html#reload
reload = False

# Install a trace function that spews every line executed by the server.
# This is the nuclear option.
#
# https://docs.gunicorn.org/en/latest/settings.html#spew
spew = False
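The max_requests_jitter comment in the config above is easier to picture with numbers. The sketch below is not gunicorn's code. It assumes the random jitter is added on top of max_requests for each worker, and the function name `restart_thresholds` and the `seed` parameter are made up for this example.

```python
import random


def restart_thresholds(workers, max_requests, max_requests_jitter, seed=None):
    """Return one restart threshold per worker.

    Assumption: each worker restarts after
    max_requests + randint(0, max_requests_jitter) requests, so workers
    started at the same time do not all restart at the same moment.
    """
    rng = random.Random(seed)
    return [
        max_requests + rng.randint(0, max_requests_jitter)
        for _ in range(workers)
    ]


# Values taken from the config above (2 workers, 5000 requests, 4950 jitter).
print(restart_thresholds(workers=2, max_requests=5000, max_requests_jitter=4950))
```

With the jitter line commented out, as in the config above, every worker would use the same threshold of 5000 requests.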

— Caddyfile

You may notice that the binding is IPADDR:PORTNUMBER. It’s a good idea for the gunicorn service not to be exposed to the internet directly, but only through a reverse proxy. My HTTP server, caddy (which is fantastic btw), acts as a reverse proxy for the gunicorn service. A simple, sample Caddyfile:

https://searxng.example.com {
	reverse_proxy IPADDR:PORTNUM

	basicauth {
		username hashed_password
	}
}
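For context on what the basicauth block in the Caddyfile above checks: with HTTP Basic authentication, the browser sends the username and password in an Authorization header on every request. A minimal Python sketch of how that header is built follows. The function name `basic_auth_header` is hypothetical, and the credentials are the well-known example pair from the HTTP Basic specification, not real ones.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of the Authorization header for HTTP Basic auth.

    The credentials are joined with a colon and Base64-encoded. Base64 is
    an encoding, not encryption, which is why Basic auth should only be
    used over HTTPS.
    """
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"


print(basic_auth_header("Aladdin", "open sesame"))
# prints: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

On the server side, the Caddyfile only holds a hash of the password (the `hashed_password` placeholder above), so the plain-text password never has to be stored on the server.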

Basicauth ensures that a username and password are required to access any page served by that caddy site block. For now, I wish for my SearXNG instance to be private. Why do I use basicauth instead of something easier like Client TLS??? Well, I’m glad you asked! It’s time for a mini rant!

—— Cloudflare, Caddy, and ClientTLS

Before we begin, here’s 3 quick paras about Client TLS. If you don’t care, skip to The Rant below.

——— Client TLS

In TLS, the authenticity comes from certificates. How does your browser know that you’re actually viewing the webpages from my website, https://snee.la, and not some other (potentially malicious) impersonating website? Via certificates. My website has a certificate issued to it by a Certificate Authority that your browser trusts. Your browser validates the certificate and ensures that all the magic math + complex cryptography is correct. If everything works out right, you end up seeing the lock/padlock symbol in the url bar of your browser. This means that the data you see at the webpage you visit is from the url you intended to visit.

Let’s flip this around. What if my website wants to ensure that it’s talking to you and no one else? Well, this can be achieved via certificates of course! In the day-to-day scenario of you visiting a website, this isn’t a common requirement. The websites you visit on a daily basis (Wikipedia for example) don’t have to validate you. They don’t have to ensure that it’s you who’s visiting the website and not an impersonator [3]. They’re fine to serve their webpages to anyone who asks for them, just like my website served you these webpages when you requested to see them.

However, there are certain use cases where a server wants to ensure that it’s talking to a trusted client and not an impersonator. A very good example of this is an IoT ecosystem. A server collecting data from many hundreds / thousands of devices with sensors has to ensure that only trusted devices submit data to it (the server). Otherwise, a malicious device could submit deliberately bad data to the server. This is where Client TLS comes in. Every device that’s communicating with the server has to have a TLS certificate issued by a Certificate Authority that the server trusts. This certificate is presented at the beginning to prove to the server that the client is authentic / trusted.
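As a rough illustration of the server side of Client TLS, here is a sketch using Python's standard `ssl` module. It is a generic example, not how caddy implements it. The function names and the file path parameters are hypothetical.

```python
import ssl


def new_mtls_server_context():
    """Create a server-side TLS context that insists on a client certificate."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # CERT_REQUIRED makes the handshake fail unless the client presents a
    # certificate that chains to a CA the server trusts.
    context.verify_mode = ssl.CERT_REQUIRED
    return context


def load_credentials(context, server_cert, server_key, client_ca):
    """Load the server's own certificate and the CA used to verify clients.

    All three arguments are paths to PEM files (hypothetical names).
    """
    context.load_cert_chain(certfile=server_cert, keyfile=server_key)
    context.load_verify_locations(cafile=client_ca)
    return context


context = new_mtls_server_context()
print(context.verify_mode == ssl.CERT_REQUIRED)
```

The key point is that the trusted CA here is the server operator's own, which is why each device (or browser) needs a certificate issued by that specific CA.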

——— The Rant

TL;DR: Cloudflare doesn’t support Client TLS with your own CA unless you’re an enterprise customer.

Something you should know before I continue: Most of my domains are managed by Cloudflare, including the one I’m using to access my SearXNG server. Cloudflare takes care of DNS and proxying for me so that my puny servers don’t have to potentially deal with a gajillion requests.

I decided to go down the Client TLS certificate route and even made an ultra cool certificate-making set of scripts which you can find at Appendix A — Client TLS Certificate Script. The script is based on mtigas’s gist: Mini tutorial for configuring client-side SSL certificates.

I configured client TLS in caddy. This is called client_auth in the tls directive. I then imported an issued client certificate into my web browser, Firefox.

Sample Caddyfile config:

https://searxng.example.com {
	reverse_proxy IPADDR:PORTNUM

	tls {
		client_auth {
			mode require_and_verify
			trusted_ca_cert MII*********************************************XV4=
		}
	}
}

When I tried to connect to the server, I kept getting an Error 520, as shown below:

An error page of Cloudflare with the large heading text as 'Web server is returning an unknown error' and a bubble below mentioning that the error code is 520. There are three diagrams which indicate that the browser is working, the cloudflare center in Vienna is working, and that the host has an error.

After MUCH digging into Cloudflare’s “Mutual TLS” docs (this is what they call Client TLS), I found this page: Cloudflare Docs - Bring your own CA for mTLS.

In the availability section (emphasis mine):

Availability

  • Currently, you can only upload your CA via API.
  • This process is only available on Enterprise accounts.
  • Each Enterprise account can upload up to five CAs. This quota does not apply to CAs uploaded through Cloudflare Access.

UGH. An absolute waste of time. To be fair, it’s my fault for not having checked Cloudflare’s docs in advance.

Closing Thoughts

I’ve been using SearXNG for the past 1.5 days as of writing this and I’ve had very few issues. The biggest problem is with Qwant, and this GitHub issue seems to explain that the gibberish issue occurs because Qwant isn’t available in the country that my server is in. I’ve just disabled Qwant for now. So far, I’ve been pretty happy with SearXNG and I think I’ll continue to use it. If I change my mind, I’ll either update this post or write a new one.

To try SearXNG out, you can find a public SearXNG instance here: https://searx.space/.

Footnotes

1 SearXNG is a fork of Searx (GitHub, Wikipedia). Searx, according to the GitHub page, is pronounced as /sɜːks/, i.e. “surks”. Based on this, I’ve decided to call SearXNG surk-sing or (occasionally) searching.

2 As stated in footnote 1, SearXNG is a fork of Searx. Searx is built using Flask.

3 I’m not considering the authentication mechanism that usernames, passwords, and/or MFA provide.


Appendix A — Client TLS Certificate Script

Some pretty awfully coded scripts to generate Client TLS Certificates.

create-cert.sh:

#!/bin/bash
# Based On: https://gist.github.com/mtigas/952344

# This is only to create a client TLS certificate. It expects ca.key, ca.pem in
# a directory called `InputCA/`. Refer to the gist above to generate the key &
# pem files. Just make sure to use the prime256v1 curve while generating the
# key & pem if you copy this file directly.

# Also ensure that "echo 0 > client_serial.id" is run exactly once before
# running this script. I CBA to actually test if it exists.

if [[ $# -eq 0 ]]; then
    echo "expected argument 1 to be the name of client"
    echo "exiting..."
    exit 1
fi

INPUT_DIR=InputCA
DUMP_DIR=dump
OUT_DIR=Out

CLIENT_SERIAL_FILE=client_serial.id

CLIENT_SERIAL_OLD=$(cat "$CLIENT_SERIAL_FILE")
CLIENT_SERIAL=$((CLIENT_SERIAL_OLD + 1))

CLIENT_ID="$CLIENT_SERIAL-$1"

echo "[CREATE-CERT] Generating key..."
openssl ecparam -genkey -name prime256v1 | openssl ec -out $CLIENT_ID.key

echo "[CREATE-CERT] Generating CSR..."
openssl req -new -key $CLIENT_ID.key -out $CLIENT_ID.csr

echo "[CREATE-CERT] Issuing certificate..."
openssl x509 -req -days 3650 -in $CLIENT_ID.csr -CA $INPUT_DIR/ca.pem          \
	-CAkey $INPUT_DIR/ca.key -set_serial $CLIENT_SERIAL -out $CLIENT_ID.pem

# Writing serial to the file
echo $CLIENT_SERIAL > $CLIENT_SERIAL_FILE

cat $CLIENT_ID.key $CLIENT_ID.pem $INPUT_DIR/ca.pem > $CLIENT_ID.full.pem

echo "================================"
echo "[CREATE-CERT] Outputs in $OUT_DIR/$CLIENT_ID"
echo "================================"
openssl pkcs12 -export -out $CLIENT_ID.full.pfx -inkey $CLIENT_ID.key          \
	-in $CLIENT_ID.pem -certfile $INPUT_DIR/ca.pem

mkdir -p $DUMP_DIR/$CLIENT_ID
mkdir -p $OUT_DIR/$CLIENT_ID

mv $CLIENT_ID.full.pem $CLIENT_ID.full.pfx $OUT_DIR/$CLIENT_ID
mv $CLIENT_ID.key $CLIENT_ID.csr $CLIENT_ID.pem $DUMP_DIR/$CLIENT_ID

and an expect script whose argument must be the name of the client:

#!/usr/bin/expect

set clientname [lindex $argv 0];
spawn ./create-cert.sh $clientname

expect -ex {Country Name (2 letter code) [AU]:} {send "AT\r"}
expect -ex {State or Province Name (full name) [Some-State]:} {send "AStateOfPeril\r"}
expect -ex {Locality Name (eg, city) []:} {send "\r"}
expect -ex {Organization Name (eg, company) [Internet Widgits Pty Ltd]:} {send "Sneelas CA\r"}
expect -ex {Organizational Unit Name (eg, section) []:} {send "\r"}
expect -ex {Common Name (e.g. server FQDN or YOUR name) []:} {send "$clientname\r"}
expect -ex {Email Address []:} {send "\r"}
expect -ex {A challenge password []:} {send "password\r"}
expect -ex {An optional company name []:} {send "\r"}
expect -ex {Enter Export Password:} {send "password\r"}
expect -ex {Verifying - Enter Export Password:} {send "password\r"}

sleep 1 ;# fastest & hackiest way to ensure mkdir, mv of create-cert.sh run
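The expect script above exists only to answer openssl's interactive prompts. One possible alternative, sketched here in Python, is to pass the same answers on the command line: `openssl req` accepts the subject via `-subj`, and `openssl pkcs12` accepts the export password via `-passout`. The helper names `csr_command` and `pkcs12_command` are hypothetical, the subject values are copied from the expect script, and the optional challenge password is simply left out.

```python
def csr_command(client_id, client_name):
    """Build a non-interactive `openssl req` command for the client CSR."""
    # Same answers the expect script sends: country, state, organization, CN.
    subject = f"/C=AT/ST=AStateOfPeril/O=Sneelas CA/CN={client_name}"
    return [
        "openssl", "req", "-new",
        "-key", f"{client_id}.key",
        "-out", f"{client_id}.csr",
        "-subj", subject,
    ]


def pkcs12_command(client_id, export_password, ca_dir="InputCA"):
    """Build a non-interactive `openssl pkcs12 -export` command."""
    return [
        "openssl", "pkcs12", "-export",
        "-out", f"{client_id}.full.pfx",
        "-inkey", f"{client_id}.key",
        "-in", f"{client_id}.pem",
        "-certfile", f"{ca_dir}/ca.pem",
        "-passout", f"pass:{export_password}",
    ]


print(" ".join(csr_command("1-laptop", "laptop")))
```

Each returned list can be handed to `subprocess.run(command, check=True)`. Note that a password passed on the command line may be visible to other users of the machine through the process list, so this is a convenience trade-off rather than a strict improvement.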