The Ahmia search engine uses Elasticsearch to index content.
pip install -r requirements.txt
Ensure that your default Python version is Python 3:
python --version
example.env contains some default values that should work out of the box. Copy it to .env to create your own instance of the environment settings:
cp example.env .env
Review the .env file to ensure that it fits your needs. Make any modifications needed there.
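One way to review your .env against the shipped defaults is to diff the two files programmatically. The following is a minimal sketch, assuming both files use plain KEY=VALUE lines with optional # comments; the helper names `load_env` and `diff_env` are hypothetical and not part of this repository.

```python
from pathlib import Path


def load_env(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def diff_env(example_path="example.env", env_path=".env"):
    """Return (keys missing from .env, keys whose value differs from the example)."""
    example = load_env(example_path)
    mine = load_env(env_path)
    missing = sorted(set(example) - set(mine))
    changed = sorted(k for k in example if k in mine and example[k] != mine[k])
    return missing, changed
```

Calling `diff_env()` from the repository root returns the keys you have not copied over yet and the keys you have customised, which can make the review step quicker.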
The default configuration is enough to run the index in dev mode. Here is a suggestion for a more secure configuration:
elasticsearch - nofile unlimited
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
As a general rule, set -Xms and -Xmx to the same value, which should be no more than 50% of your total available RAM.
-Xms15g
-Xmx15g
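The heap rule above can be turned into a small calculation. This is a sketch only: the function names are hypothetical, and the 31 GB cap is an assumption based on common Elasticsearch guidance about keeping the heap below roughly 32 GB; it does not come from this README.

```python
def heap_size_gb(total_ram_gb, cap_gb=31):
    """Half of total RAM in whole GB, at least 1 and at most cap_gb (assumed cap)."""
    half = total_ram_gb // 2
    return max(1, min(half, cap_gb))


def jvm_heap_options(total_ram_gb):
    """Return matching -Xms/-Xmx lines so both flags use the same value."""
    size = heap_size_gb(total_ram_gb)
    return [f"-Xms{size}g", f"-Xmx{size}g"]
```

For a machine with 30 GB of RAM, `jvm_heap_options(30)` yields the two lines shown above.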
MAX_OPEN_FILES=unlimited
MAX_LOCKED_MEMORY=unlimited
bootstrap.mlockall: true
script.engine.groovy.inline.update: on
script.engine.groovy.inline.aggs: on
# systemctl start elasticsearch
curl -XPUT 'http://localhost:9200/_all/_settings?preserve_existing=true' -H 'Content-Type: application/json' -d '{
"index.max_result_window" : "30000"
}'
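If you prefer to apply the same setting from Python, the request can be built with the standard library. This is a minimal sketch that mirrors the curl call above; the function name `max_result_window_request` is hypothetical, and the host and port are assumed to match the local default.

```python
import json
import urllib.request


def max_result_window_request(window=30000, base_url="http://localhost:9200"):
    """Build (but do not send) the PUT request that raises index.max_result_window."""
    body = json.dumps({"index.max_result_window": str(window)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_all/_settings?preserve_existing=true",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# To actually send it against a running Elasticsearch instance:
# with urllib.request.urlopen(max_result_window_request()) as resp:
#     print(resp.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the URL and body before touching a live cluster.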
Run this when setting up the index for the first time:
$ bash setup_index.sh
Alternatively, you can set up the indices manually, along these lines:
$ curl -XPUT -i "localhost:9200/tor-2018-01/" -H 'Content-Type: application/json' -d "@./mappings_tor.json"
$ curl -XPUT -i "localhost:9200/i2p-2018-01/" -H 'Content-Type: application/json' -d "@./mappings_i2p.json"
$ curl -XPUT -i "localhost:9200/tor-2018-02/" -H 'Content-Type: application/json' -d "@./mappings_tor.json"
$ curl -XPUT -i "localhost:9200/i2p-2018-02/" -H 'Content-Type: application/json' -d "@./mappings_i2p.json"
...
...
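The manual commands follow a regular pattern (one tor and one i2p index per month), so they can be generated rather than typed. The sketch below only reproduces the naming scheme visible in the commands above; the helper names are hypothetical, and it prints commands instead of contacting Elasticsearch.

```python
def monthly_indices(year, month, count, networks=("tor", "i2p")):
    """List (index_name, mapping_file) pairs for `count` months starting at year-month."""
    pairs = []
    for offset in range(count):
        total = (month - 1) + offset
        y = year + total // 12   # roll over into the next year after December
        m = total % 12 + 1
        for net in networks:
            pairs.append((f"{net}-{y:04d}-{m:02d}", f"mappings_{net}.json"))
    return pairs


def curl_command(index, mapping_file, host="localhost:9200"):
    """Format the curl call that creates one index from its mapping file."""
    return (f'curl -XPUT -i "{host}/{index}/" '
            f"-H 'Content-Type: application/json' -d \"@./{mapping_file}\"")


if __name__ == "__main__":
    for index, mapping in monthly_indices(2018, 1, 3):
        print(curl_command(index, mapping))
```

Printing the commands first lets you check the index names before running them by hand.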
Point the latest-tor and latest-i2p aliases to the latest monthly indices
This needs to be done the first time you deploy and then once per month:
$ python point_to_indexes.py
$ bash call_filtering.sh
# Execute child abuse text filtering over the index every hour
30 * * * * cd /home/juha/ahmia-index && bash wrap_filtering.sh > ./crontab_filter.log 2>&1
# First of Each Month:
10 04 01 * * cd /home/juha/ahmia-index && python point_to_indexes.py --add > ./add_alias.log 2>&1
# On 16th of Each Month
10 04 16 * * cd /home/juha/ahmia-index && python point_to_indexes.py --rm > ./remove_alias.log 2>&1
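The two alias cron entries above add aliases on the 1st and remove aliases on the 16th of each month. As an illustration of what such a step could send to Elasticsearch, the sketch below builds a request body for the `_aliases` endpoint. It is not taken from point_to_indexes.py: the function name is hypothetical, and which month `--add` and `--rm` actually target is an assumption you should verify against the script.

```python
import json


def alias_actions(month, add=True, networks=("tor", "i2p")):
    """Build an _aliases body that adds or removes latest-<net> for <net>-<month>."""
    verb = "add" if add else "remove"
    return {
        "actions": [
            {verb: {"index": f"{net}-{month}", "alias": f"latest-{net}"}}
            for net in networks
        ]
    }


if __name__ == "__main__":
    # Assumed usage: attach the new month, later detach the previous one.
    print(json.dumps(alias_actions("2018-02", add=True), indent=2))
    print(json.dumps(alias_actions("2018-01", add=False), indent=2))
```

A body like this would be sent with POST to `/_aliases`. If the schedule works as assumed, both months stay reachable through the aliases during the first half of each month.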