For many, many years, I’ve been using DuckDuckGo and
I’ve largely been satisfied with it. It’s better than the more famous
alternatives and generally respects privacy by not tracking
you. That being
said, DDG has had some
controversy in the
past which, to their credit, they resolved in three months. Some time
ago, I started noticing that results were being shuffled every time I visited
the search. Some users on Lemmy started
noticing this too. Although this isn’t the
end of the world, it does make things a lot harder when I accidentally click on
something (instead of opening it in a new tab) because I can’t go down the list
of results in the order they were presented anymore. At the end of the day, I felt
like it was time to switch to something else, if something better even existed.
This is where SearXNG comes in (I pronounce it as surk-sing1)
(Github,
Docs):
SearXNG is a free internet metasearch engine which aggregates results from
more than 70 search services. Users are neither tracked nor profiled.
Additionally, SearXNG can be used over Tor for online anonymity.
Essentially, it relays your searches to other search engines and aggregates &
displays the results in the order that it wants. Imagine searching for “caprese
sandwich recipe” across DuckDuckGo, Qwant, Google, Brave, Bing, Yahoo, etc. and
getting all the results in one nice and easy sweep — that’s the gist.
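To make the relay-and-merge idea concrete, here's a toy sketch in Python. It is not how SearXNG actually scores results: the engine names, the hard-coded result lists, and the `merge_results` helper are all made up for illustration, and summing reciprocal ranks is just one simple way to combine several ranked lists.

```python
from collections import defaultdict


def merge_results(results_by_engine):
    """Merge ranked URL lists from several engines into one ranked list.

    Each URL earns 1/rank from every engine that returned it, so URLs
    that several engines rank highly float to the top.
    """
    scores = defaultdict(float)
    for urls in results_by_engine.values():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / rank
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical responses; a real metasearch engine would fetch these over HTTP.
fake_results = {
    "engine_a": ["https://example.com/caprese", "https://example.org/sandwich"],
    "engine_b": ["https://example.org/sandwich", "https://example.net/recipe"],
    "engine_c": ["https://example.org/sandwich", "https://example.com/caprese"],
}

merged = merge_results(fake_results)
print(merged)
```

Because the sort is deterministic, the same inputs always come back in the same order.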
To test it out, you can find a public SearXNG instance here:
https://searx.space/.
Here’s what a search result for “caprese sandwich recipe” looks like on a light
and dark background of SearXNG:
This is a part of my go-to gunicorn config, which I created many years ago and
adapted for SearXNG. I call this searxng-gunicorn.conf.py:
################################################################################
# Server Socket & Worker Processes
################################################################################

# The socket to bind. A string of the form: HOST, HOST:PORT, unix:PATH, fd://FD.
# An IP is a valid HOST.
#
# https://docs.gunicorn.org/en/latest/settings.html#bind
bind = 'IPADDR:PORTNUMBER'  # Replace IPADDR:PORTNUMBER here

# The number of worker processes for handling requests. THESE ARE PROCESSES.
# Refer to: https://stackoverflow.com/a/41696500
#
# https://docs.gunicorn.org/en/latest/settings.html#workers
workers = 2

# NOTE: In my experience on a server with 1-2 vCPU and 1-4Gb of RAM, 2 workers
# + 2-3 threads works well for many cases.
#
# Of course, ymmv because of the type of differences between HTTP calls and your
# application, the architecture, the software you're dealing with, etcetera. Are
# some of your HTTP requests blocking (i.e. do they take some time to process?)
# or are they HTTP Keep-Alive kind of connections. My general go-to with
# gunicorn is 2 workers + 3 threads when dealing with high volume of requests
# with low response times (pun sort of intended), i.e. your response generation
# doesn't take too much time to figure out.

# The number of worker threads for handling requests. Run each worker with the
# specified number of threads.
#
# https://docs.gunicorn.org/en/latest/settings.html#threads
threads = 1

# The maximum number of requests a worker will process before restarting. Any
# value greater than zero will limit the number of requests a worker will
# process before automatically restarting. This is a simple method to help limit
# the damage of memory leaks. If this is set to zero (the default) then the
# automatic worker restarts are disabled.
#
# https://docs.gunicorn.org/en/latest/settings.html#max-requests
max_requests = 5000

# The maximum jitter to add to the max_requests setting. The jitter causes the
# restart per worker to be randomized by randint(0, max_requests_jitter). This
# is intended to stagger worker restarts to avoid all workers restarting at the
# same time.
#
# https://docs.gunicorn.org/en/latest/settings.html#max-requests-jitter
# max_requests_jitter = 4950

################################################################################
# Debugging
################################################################################

# Restart workers when code changes.
# This setting is intended for development. It will cause workers to be
# restarted whenever application code changes.
#
# https://docs.gunicorn.org/en/latest/settings.html#reload
reload = False

# Install a trace function that spews every line executed by the server.
# This is the nuclear option.
#
# https://docs.gunicorn.org/en/latest/settings.html#spew
spew = False
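The max_requests / max_requests_jitter pair in the config above is easier to picture with a quick simulation. This is only a sketch of the behaviour the config comment describes, assuming each worker adds randint(0, max_requests_jitter) on top of max_requests. `restart_thresholds` is a made-up helper and not part of gunicorn, and 4950 is the value that is commented out in my config.

```python
import random


def restart_thresholds(workers, max_requests, max_requests_jitter, seed=None):
    """Return the request count at which each simulated worker would restart.

    Assumption: the jitter is drawn once per worker and added to max_requests.
    """
    rng = random.Random(seed)
    return [
        max_requests + rng.randint(0, max_requests_jitter)
        for _ in range(workers)
    ]


# Without jitter, every worker restarts at exactly the same request count.
no_jitter = restart_thresholds(workers=2, max_requests=5000, max_requests_jitter=0)
print(no_jitter)

# With jitter, each worker gets its own threshold somewhere in 5000..9950.
with_jitter = restart_thresholds(workers=2, max_requests=5000, max_requests_jitter=4950)
print(with_jitter)
```

With only two workers, staggering the restarts matters: if both restart at the same moment, nothing is left to answer requests for a short while.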
— Caddyfile
You may notice that the binding is IPADDR:PORTNUMBER. It’s a good idea to make
sure that the gunicorn service is not exposed to the internet directly but only
through a reverse proxy. My HTTP server,
caddy (which is fantastic btw), acts as a reverse
proxy for the gunicorn service. A simple, sample Caddyfile:
Basicauth ensures that a username and password are required to access whatever
the caddy directive serves. For now, I want my SearXNG instance to be private.
Why do I use basicauth
instead of something easier like
ClientTLS??? Well,
I’m glad you asked! It’s time for a mini rant!
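Before the rant, a quick aside on what basicauth asks of the client. With HTTP Basic authentication, the browser sends the username and password, joined by a colon and base64-encoded, in an Authorization header on every request. Here's a minimal sketch; `basic_auth_header` is a made-up helper and the credentials are placeholders.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header value for HTTP Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


# Placeholder credentials, not real ones.
header_value = basic_auth_header("alice", "not-my-real-password")
print({"Authorization": header_value})
```

Base64 is an encoding, not encryption, so this is only reasonable over HTTPS.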
—— Cloudflare, Caddy, and ClientTLS
Before we begin, here’s 3 quick paras about Client TLS. If you don’t care, skip
to The Rant below.
——— Client TLS
In TLS, the authenticity comes from certificates. How does your browser know
that you’re actually viewing the webpages from my website, https://snee.la,
and not some other (potentially malicious) impersonating website? Via
certificates. My website has a certificate issued to it by a Certificate
Authority that your browser trusts. Your browser validates the certificate and
ensures that all the magic math + complex cryptography is correct. If everything
works out right, you end up seeing the lock/padlock symbol in the url bar of
your browser. This means that the data you see at the webpage you visit is from
the url you intended to visit.
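If you'd like to poke at this yourself, Python's standard ssl module does the same kind of validation a browser does. This is a rough sketch, not what any browser actually runs; `make_client_context` and `peer_certificate` are names I made up, and the second one needs network access.

```python
import socket
import ssl


def make_client_context():
    """Client-side TLS settings: verify the certificate chain and the hostname."""
    return ssl.create_default_context()


def peer_certificate(hostname, port=443, timeout=5.0):
    """Connect to hostname and return its certificate as a dict.

    The handshake raises ssl.SSLCertVerificationError if the certificate
    isn't signed by a trusted CA or doesn't match the hostname.
    """
    context = make_client_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()


context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

Calling peer_certificate("snee.la") should return the certificate details if the handshake succeeds.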
Let’s flip this around. What if my website wants to ensure that it’s talking to
you and no one else? Well, this can be achieved via certificates of course! In
the day-to-day scenario of you visiting a website, this isn’t a common
requirement. The websites you visit on a daily basis (Wikipedia for example)
don’t have to validate you. They don’t have to ensure that it’s you who’s
visiting the website and not an impersonator3. They’re fine to serve their webpages to anyone
who asks for them, just like my website served you these webpages when you
requested to see it.
However, there are certain use cases where a server wants to ensure that it’s
talking to a trusted client and not an impersonator. A very good example of this
is an IoT ecosystem. A
server collecting data from many hundreds / thousands of devices with sensors
has to ensure that only trusted devices submit data to it (the server).
Otherwise, a malicious device could submit deliberately bad data to the server.
This is where Client TLS comes in. Every device that’s communicating with the
server has to have a TLS certificate issued by a Certificate Authority that the
server trusts. This certificate is presented at the beginning to prove to the
server that the client is authentic / trusted.
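On the server side, requiring a client certificate is mostly a matter of flipping one setting. Here's a minimal sketch using Python's ssl module, not my actual caddy setup; `make_mutual_tls_server_context` and its parameters are made up for illustration, and the file paths in the usage comment are placeholders.

```python
import ssl


def make_mutual_tls_server_context(certfile=None, keyfile=None, client_ca=None):
    """Build a server-side TLS context that insists on a client certificate."""
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # Refuse the handshake unless the client presents a certificate that
    # chains up to a CA this server trusts.
    context.verify_mode = ssl.CERT_REQUIRED
    if certfile is not None:
        # The server's own certificate and key, as in ordinary TLS.
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    if client_ca is not None:
        # The CA that issued the client certificates.
        context.load_verify_locations(cafile=client_ca)
    return context


# Placeholder paths:
# make_mutual_tls_server_context("server.pem", "server.key", "InputCA/ca.pem")
server_context = make_mutual_tls_server_context()
print(server_context.verify_mode == ssl.CERT_REQUIRED)
```

A client without a certificate from that CA never gets past the handshake, so the application behind it never sees the request.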
——— The Rant
TL;DR: Cloudflare doesn’t support Client TLS with your own CA unless you’re an
enterprise
customer.
Something you should know before I continue: Most of my domains are managed by
Cloudflare, including the one I’m using to access my SearXNG server. Cloudflare
takes care of DNS and proxying for me so that my puny servers don’t have to
potentially deal with a gajillion requests.
I decided to go down the Client TLS certificate route and even made an ultra
cool certificate-making set of scripts which you can find at Appendix A —
Client TLS Certificate Script.
The script is based on mtigas’s gist:
Mini tutorial for configuring client-side SSL certificates.
I configured client TLS in caddy. This is called
client_auth
in the tls directive.
I then imported an issued client certificate into my web browser, Firefox.
Cloudflare’s docs, however, say this about using your own CA for client certificates:
This process is only available on Enterprise accounts.
Each Enterprise account can upload up to five CAs. This quota does not apply
to CAs uploaded through Cloudflare Access.
UGH. An absolute waste of time. To be fair, it’s my fault for not having checked
with Cloudflare’s docs in advance.
Closing Thoughts
I’ve been using SearXNG for the past 1.5 days as of writing this and there have
been very few issues that I’ve had. The biggest problem is with Qwant, which
gives me gibberish; this
Github Issue seems to explain
that this happens because Qwant isn’t available in the country that my
server is in. I’ve just disabled Qwant for now. So far, I’ve been pretty happy
with SearXNG and I think I’ll be continuing to use it. If I change my mind,
I’ll either update this post or write a new one.
To try SearXNG out, you can find a public SearXNG instance here:
https://searx.space/.
Footnotes
1 SearXNG is a fork of Searx
(Github,
Wikipedia). Searx, according to the
Github, is pronounced /sɜːks/, i.e. “surks”. Based on
this, I’ve decided to call SearXNG surk-sing or (occasionally) searching.
2 As stated in footnote 1, SearXNG is a fork of
Searx. Searx is built using Flask.
3 I’m not considering the authentication mechanism
that usernames, passwords, and/or MFA provide.
Appendix A — Client TLS Certificate Script
Some pretty awfully coded scripts to generate Client TLS Certificates.
create-cert.sh:
#!/bin/bash
# Based On: https://gist.github.com/mtigas/952344
# This is only to create a client TLS certificate. It expects ca.key, ca.pem in
# a directory called `InputCA/`. Refer to the gist above to generate the key &
# pem files. Just make sure to change the cipher to prime256v1 while generating
# the key & pem if you copy this file directly.

# Also ensure that "echo 0 > client_serial.id" is run exactly once before
# running this script. I CBA to actually test if it exists.

if [[ $# -eq 0 ]]; then
    echo "expected argument 1 to be the name of client"
    echo "exiting..."
    exit 1
fi

INPUT_DIR=InputCA
DUMP_DIR=dump
OUT_DIR=Out
CLIENT_SERIAL_FILE=client_serial.id
CLIENT_SERIAL_OLD=$(cat $CLIENT_SERIAL_FILE)
CLIENT_SERIAL=$((CLIENT_SERIAL_OLD+=1))
CLIENT_ID=$CLIENT_SERIAL"-"$1

echo "[CREATE-CERT] Generating key..."
openssl ecparam -genkey -name prime256v1 | openssl ec -out $CLIENT_ID.key

echo "[CREATE-CERT] Generating CSR..."
openssl req -new -key $CLIENT_ID.key -out $CLIENT_ID.csr

echo "[CREATE-CERT] Issuing certificate..."
openssl x509 -req -days 3650 -in $CLIENT_ID.csr -CA $INPUT_DIR/ca.pem \
    -CAkey $INPUT_DIR/ca.key -set_serial $CLIENT_SERIAL -out $CLIENT_ID.pem

# Writing serial to the file
echo $CLIENT_SERIAL > $CLIENT_SERIAL_FILE

cat $CLIENT_ID.key $CLIENT_ID.pem $INPUT_DIR/ca.pem > $CLIENT_ID.full.pem

echo "================================"
echo "[CREATE-CERT] Outputs in $OUT_DIR/$CLIENT_ID"
echo "================================"

openssl pkcs12 -export -out $CLIENT_ID.full.pfx -inkey $CLIENT_ID.key \
    -in $CLIENT_ID.pem -certfile $INPUT_DIR/ca.pem

mkdir -p $DUMP_DIR/$CLIENT_ID
mkdir -p $OUT_DIR/$CLIENT_ID
mv $CLIENT_ID.full.pem $CLIENT_ID.full.pfx $OUT_DIR/$CLIENT_ID
mv $CLIENT_ID.key $CLIENT_ID.csr $CLIENT_ID.pem $DUMP_DIR/$CLIENT_ID
and an expect script whose argument must be the name of the client:
#!/usr/bin/expect
set clientname [lindex $argv 0];
spawn ./create-cert.sh $clientname
expect -ex {Country Name (2 letter code) [AU]:} {send "AT\r"}
expect -ex {State or Province Name (full name) [Some-State]:} {send "AStateOfPeril\r"}
expect -ex {Locality Name (eg, city) []:} {send "\r"}
expect -ex {Organization Name (eg, company) [Internet Widgits Pty Ltd]:} {send "Sneelas CA\r"}
expect -ex {Organizational Unit Name (eg, section) []:} {send "\r"}
expect -ex {Common Name (e.g. server FQDN or YOUR name) []:} {send "$clientname\r"}
expect -ex {Email Address []:} {send "\r"}
expect -ex {A challenge password []:} {send "password\r"}
expect -ex {An optional company name []:} {send "\r"}
expect -ex {Enter Export Password:} {send "password\r"}
expect -ex {Verifying - Enter Export Password:} {send "password\r"}
# fastest & hackiest way to ensure mkdir, mv of create-cert.sh run
sleep 1