Installation on Linux
Installation on Linux is only possible for users with a commercial Flow license.
If you want to go pro, check out our license plans, or just use our free Docker version.
A manual Linux installation offers more freedom than the pre-configured Docker image, but setup is a little more complex.
Requirements
Flow requires third-party software to run. Please note that the licenses of this software differ from the Flow license. To use Flow, you need to accept the license of each required third-party component.
List of dependencies
Flow 5.0.2 depends on the following third-party software.
Software | Version | Artifacts | License | Repo |
---|---|---|---|---|
Solr | 9.4.1 | solr-core | Apache-2.0 | |
TensorFlow | 2.10.1 | tensorflow-core-api, tensorflow-core-platform | Apache-2.0 | |
Fork of DJL | 0.22-ndarray-1 | api, tensorflow-engine | Apache-2.0 | |
Protobuf | 3.19.4 | protobuf-java | BSD-3-Clause | |
JavaCPP | 1.5.8 | javacpp | Apache-2.0 | |
NDArray | 0.4.0 | ndarray | Apache-2.0 | |
GSON | 2.8.9 | gson | Apache-2.0 | |
Architecture
Flow uses highly optimized native code for some operations, which requires x86_64 CPUs (also known as amd64). Other architectures such as ARM or PowerPC are not supported.
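The architecture requirement can be checked before installing anything. The following is a minimal sketch; the helper name is_supported_arch is hypothetical and not part of Flow, and it only accepts the two names mentioned above.

```shell
#!/usr/bin/env bash
# Succeeds only for the architecture names listed as supported above.
# is_supported_arch is a hypothetical helper, not part of Flow.
is_supported_arch() {
  case "$1" in
    x86_64|amd64) return 0 ;;
    *) return 1 ;;
  esac
}

# uname -m prints the machine hardware name of the current host.
arch="$(uname -m)"
if is_supported_arch "$arch"; then
  echo "Architecture $arch is supported."
else
  echo "Architecture $arch is not supported by Flow." >&2
fi
```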
Linux
Use a modern and stable GNU/Linux server operating system like Debian 11 or Ubuntu Server 22.04 LTS.
Java
You need the Java Runtime Environment (JRE) version 11 or higher. Check your Java version like this (your output may differ):
java -version
openjdk version "11.0.19" 2023-04-18
OpenJDK Runtime Environment Temurin-11.0.19+7 (build 11.0.19+7)
OpenJDK 64-Bit Server VM Temurin-11.0.19+7 (build 11.0.19+7, mixed mode, sharing)
Due to a bug in Java affecting Apache Lucene that can cause JVM crashes, we recommend using Java 11.0.19, 17.0.7, 19.0.2, or later.
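If you want to script the minimum version check, a sketch along the following lines may help. The helper java_major is hypothetical; it only compares the major version against 11 and does not verify the recommended patch releases listed above.

```shell
#!/usr/bin/env bash
# Extract the major version from a Java version string.
# Handles the modern scheme ("11.0.19") and the legacy one ("1.8.0_372").
# java_major is a hypothetical helper, not part of Flow or the JDK.
java_major() {
  local version="$1"
  local first="${version%%.*}"
  if [ "$first" = "1" ]; then
    # Legacy scheme: the major version is the second component.
    local rest="${version#1.}"
    first="${rest%%.*}"
  fi
  # Drop suffixes such as "-ea" so the result is a plain integer.
  echo "${first%%[!0-9]*}"
}

if command -v java >/dev/null 2>&1; then
  # java -version prints to stderr; the version is the first quoted token.
  installed="$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)"
  if [ -n "$installed" ] && [ "$(java_major "$installed")" -ge 11 ]; then
    echo "Java $installed meets the minimum requirement (11)."
  else
    echo "Java ${installed:-unknown} is older than 11 or could not be parsed." >&2
  fi
else
  echo "No java executable found on PATH." >&2
fi
```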
Machine learning framework
Flow uses the Deep Java Library (DJL) in conjunction with TensorFlow as the engine for model inference and efficient matrix operations.
Apache Solr
Flow not only provides an advanced search API, but also extends the inner search process of Apache Solr, including data prefetching and scoring. This leads to deep integration and the use of many internal Solr APIs that may change from one version to another. For this reason, the provided Flow binaries are tested and compiled against a specific Solr version and can only be used with that version.
We provide Flow binaries for the following Solr versions:
Version | Solr 6.6 | Solr 7.7 | Solr 8.11 | Solr 9.0 | Solr 9.2 | Solr 9.3 | Solr 9.4 |
---|---|---|---|---|---|---|---|
Flow 3.4.X | |||||||
Flow 4.0.0 - 4.0.2 | |||||||
Flow 4.0.4 - 4.0.5 | |||||||
Flow 5.0.X |
You can check which Solr version (SOLR_VERSION) the Flow binaries (FLOW_VERSION) are compatible with in two places:
- The Flow JAR filename: pixolution-flow-[FLOW_VERSION]-solr-[SOLR_VERSION].jar
- The META-INF/MANIFEST.MF file in this JAR file:
Solr-Version: [SOLR_VERSION]
Specification-Version: [FLOW_VERSION]
We only declare the SOLR_VERSION as major and minor version (e.g. 9.4).
You can safely upgrade your Solr instance as long as only the subminor version changes (9.4.X).
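The filename convention above can be read programmatically. The following sketch splits a JAR filename into its two versions using shell parameter expansion; parse_flow_jar is a hypothetical helper, and the filename is built from the version numbers used elsewhere on this page.

```shell
#!/usr/bin/env bash
# Split a Flow JAR filename of the documented form
# pixolution-flow-[FLOW_VERSION]-solr-[SOLR_VERSION].jar into its two versions.
# parse_flow_jar is a hypothetical helper, not shipped with Flow.
parse_flow_jar() {
  local name
  name="$(basename "$1" .jar)"      # drop directory and .jar suffix
  name="${name#pixolution-flow-}"   # e.g. 5.0.2-solr-9.4
  echo "FLOW_VERSION=${name%%-solr-*} SOLR_VERSION=${name##*-solr-}"
}

parse_flow_jar "pixolution-flow-5.0.2-solr-9.4.jar"
# prints: FLOW_VERSION=5.0.2 SOLR_VERSION=9.4
```

To read the manifest instead, unzip -p [JAR_FILE] META-INF/MANIFEST.MF prints the file to standard output without unpacking the archive.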
Installation
We assume you already have the Flow package that is compatible with your Solr version. Now, unpack the zip archive:
unzip pixolution-flow-5.0.2-solr-9.4.zip
cat QUICKSTART.txt
Java memory limits
Since we load AI models into memory at startup, we need to increase the available memory that Solr can allocate.
The limits are set in the configuration file [SOLR_INSTALL_FOLDER]/bin/solr.in.sh and default to 512MB if not set.
We recommend setting the limit to at least 2GB.
- Open [SOLR_INSTALL_FOLDER]/bin/solr.in.sh
- Append the following rule:
SOLR_JAVA_MEM="-Xmx2g"
Instead of editing the file manually, you can also run the following bash command.
- Your working directory must contain the solr-9.4.1 folder.
- Execute the following command:
sed -i 's|.*SOLR_JAVA_MEM.*|SOLR_JAVA_MEM="-Xmx2g"|g' solr-9.4.1/bin/solr.in.sh
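Note that the sed command only rewrites lines that already mention SOLR_JAVA_MEM; if the file contains no such line, nothing changes. The following sketch covers both cases. The function name set_solr_java_mem is hypothetical, the commented-out default line in the demo is an assumption about what a stock solr.in.sh looks like, and the in-place flag assumes GNU sed.

```shell
#!/usr/bin/env bash
# Set SOLR_JAVA_MEM in a solr.in.sh file: rewrite every line that mentions it
# (commented-out or active), or append the setting if there is none.
# set_solr_java_mem is a hypothetical helper, not part of Solr or Flow.
set_solr_java_mem() {
  local file="$1" value="$2"
  if grep -q 'SOLR_JAVA_MEM' "$file"; then
    sed -i "s|.*SOLR_JAVA_MEM.*|SOLR_JAVA_MEM=\"${value}\"|g" "$file"
  else
    printf 'SOLR_JAVA_MEM="%s"\n' "$value" >> "$file"
  fi
}

# Demo on a temporary file instead of a real Solr installation.
demo="$(mktemp)"
printf '%s\n' '#SOLR_JAVA_MEM="-Xms512m -Xmx512m"' > "$demo"
set_solr_java_mem "$demo" "-Xmx2g"
cat "$demo"   # prints: SOLR_JAVA_MEM="-Xmx2g"
rm -f "$demo"
```

For a real installation you would call it as set_solr_java_mem solr-9.4.1/bin/solr.in.sh "-Xmx2g".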
Java security policy
Starting with Solr 9, the Java security policies are active by default.
Flow requires the permission to load libraries at runtime.
Therefore, you have to add permission rules to the security.policy file shipped with Solr:
- Open [SOLR_INSTALL_FOLDER]/server/etc/security.policy
- Add the following rules and replace [USERNAME] with the user that is running the Solr process.
grant { permission java.lang.RuntimePermission "loadLibrary.*"; };
grant { permission java.lang.RuntimePermission "createSecurityManager"; };
grant { permission java.io.FilePermission "/tmp/.javacpp-[USERNAME]/cache", "read,execute,write"; };
grant { permission java.io.FilePermission "/tmp/.javacpp-[USERNAME]/cache/-", "read,execute,write"; };
grant { permission java.io.FilePermission "/home/[USERNAME]/.javacpp/cache", "read,write,execute"; };
grant { permission java.io.FilePermission "/home/[USERNAME]/.javacpp/cache/-", "read,write,execute"; };
Instead of manually adding the permission rules, you can also run the following bash commands.
- Your working directory must contain the solr-9.4.1 folder.
- Execute the following commands:
echo 'grant { permission java.lang.RuntimePermission "loadLibrary.*"; };' >> solr-9.4.1/server/etc/security.policy
echo 'grant { permission java.lang.RuntimePermission "createSecurityManager"; };' >> solr-9.4.1/server/etc/security.policy
echo "grant { permission java.io.FilePermission \"/tmp/.javacpp-${USER}/cache\", \"read,execute,write\"; };" >> solr-9.4.1/server/etc/security.policy
echo "grant { permission java.io.FilePermission \"/tmp/.javacpp-${USER}/cache/-\", \"read,execute,write\"; };" >> solr-9.4.1/server/etc/security.policy
echo "grant { permission java.io.FilePermission \"/home/${USER}/.javacpp/cache\", \"read,execute,write\"; };" >> solr-9.4.1/server/etc/security.policy
echo "grant { permission java.io.FilePermission \"/home/${USER}/.javacpp/cache/-\", \"read,execute,write\"; };" >> solr-9.4.1/server/etc/security.policy
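Because the echo commands append unconditionally, running them twice duplicates every rule. If that is a concern, a sketch like the following appends each rule only when it is missing. The function name add_flow_policy_rules and its optional second argument are hypothetical; the rules themselves are the ones listed above.

```shell
#!/usr/bin/env bash
# Append the Flow grant rules to a security.policy file, skipping any rule
# that is already present so the function can be run repeatedly.
# add_flow_policy_rules is a hypothetical helper, not shipped with Flow.
# $1: path to security.policy
# $2: user running Solr (optional; defaults to the current user)
add_flow_policy_rules() {
  local policy="$1"
  local user="${2:-${USER:-$(id -un)}}"
  local rule
  for rule in \
    'permission java.lang.RuntimePermission "loadLibrary.*";' \
    'permission java.lang.RuntimePermission "createSecurityManager";' \
    "permission java.io.FilePermission \"/tmp/.javacpp-${user}/cache\", \"read,execute,write\";" \
    "permission java.io.FilePermission \"/tmp/.javacpp-${user}/cache/-\", \"read,execute,write\";" \
    "permission java.io.FilePermission \"/home/${user}/.javacpp/cache\", \"read,execute,write\";" \
    "permission java.io.FilePermission \"/home/${user}/.javacpp/cache/-\", \"read,execute,write\";"
  do
    rule="grant { ${rule} };"
    # -q: quiet, -x: match the whole line, -F: treat the rule as a fixed string.
    grep -qxF -- "$rule" "$policy" 2>/dev/null || printf '%s\n' "$rule" >> "$policy"
  done
}

# Example call, using the path from the commands above:
# add_flow_policy_rules solr-9.4.1/server/etc/security.policy
```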
Configuration
Flow is configured using the solrconfig.xml and the schema.xml or managed-schema file.
solrconfig.xml
The solrconfig.xml contains all operational configuration, such as request handlers, search components, and update processors.
Components
<!-- Pixolution components -->
<queryParser name="pxlParser" class="de.pixolution.solr.search.PixolutionParserPlugin" />
<searchComponent name="pxlFlow" class="de.pixolution.solr.handler.component.PixolutionComponent" />
<queryResponseWriter name="html" default="false" class="de.pixolution.solr.response.HtmlResponseWriter" />
<!-- Required by tagging handler -->
<searchComponent name="taggingComponentPre" class="de.pixolution.solr.handler.component.TaggingComponentPre"/>
<searchComponent name="taggingComponentPost" class="de.pixolution.solr.handler.component.TaggingComponentPost"/>
Special request handler
<requestHandler name="/pixolution" class="de.pixolution.solr.handler.component.PixolutionHandler">
<!-- special defaults: only Flow global params can be set here -->
<lst name="defaults">
<str name="fieldname.prefix"></str>
<str name="fieldname.default.image">image</str>
<str name="fieldname.default.text">labels</str>
<str name="parser.name">pxlParser</str>
</lst>
</requestHandler>
<requestHandler name="/analyze" class="de.pixolution.solr.handler.component.AnalyzeHandler">
<lst name="defaults">
<str name="echoParams">none</str>
</lst>
</requestHandler>
<requestHandler name="/tag" class="de.pixolution.solr.handler.component.TaggingHandler">
<lst name="defaults">
<str name="echoParams">none</str>
<str name="q">*:*</str>
<str name="rank.threshold">0.6</str>
<str name="tagging.field">labels</str>
<!-- Optimized for suggesting a single tag (like category) -->
<str name="tagging.max">1</str>
<str name="tagging.inspect">3</str>
<str name="tagging.inspect.minterms">1</str>
</lst>
</requestHandler>
Global Flow parameters can only be set in solrconfig.xml, within the defaults list of the /pixolution request handler.
fieldname.default.image
This parameter defines the field from which to obtain the image URLs when indexing documents or returning HTML responses.
The default is image.
<str name="fieldname.default.image">image</str>
fieldname.prefix
This parameter adds the given prefix to all fields, fieldtypes and pseudo-fields which are created by Flow. The default is no prefix.
Changing fieldname.prefix requires reindexing.
<str name="fieldname.prefix"></str>
Using a prefix avoids fieldname collisions with existing fields and makes it easier to identify which fields belong to Flow.
Setting fieldname.prefix=pxl_ would change the field color_names to pxl_color_names.
Setting fieldname.prefix also changes the name of the import pseudo-field.
If fieldname.prefix=example_, then the pseudo-field would be example_import.
analysis.threads
This parameter defines the number of threads available to analyze images when using the /analyze and /update endpoints.
By default, it is set to the number of available CPUs.
<int name="analysis.threads">[number of available CPUs]</int>
downloader.fileLimitkBytes
This parameter limits the maximum file size allowed when downloading images.
The default is 20480 (20MB).
<int name="downloader.fileLimitkBytes">20480</int>
To safeguard against misuse, you may limit the allowed file size of the image to download. If the file is bigger than allowed, an exception is thrown. If you do not want to limit the file size, set this value to zero or a negative number.
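The limit semantics described above can be illustrated with a small sketch. This is not Flow's implementation; exceeds_file_limit is a hypothetical helper, and it assumes 1 kB = 1024 bytes, which is consistent with 20480 corresponding to 20MB.

```shell
#!/usr/bin/env bash
# Decide whether a file of size_bytes exceeds a limit given in kilobytes.
# Assumption: 1 kB = 1024 bytes (20480 kB = 20 MB, as in the default).
# exceeds_file_limit is a hypothetical helper that only mirrors the
# documented behaviour; it is not taken from Flow.
exceeds_file_limit() {
  local size_bytes="$1" limit_kbytes="$2"
  # Zero or a negative limit disables the check.
  if [ "$limit_kbytes" -le 0 ]; then
    return 1
  fi
  [ "$size_bytes" -gt $(( limit_kbytes * 1024 )) ]
}

if exceeds_file_limit 25000000 20480; then
  echo "25000000 bytes is above the 20480 kB limit."
fi
```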
downloader.userAgent
This parameter sets the user agent sent in the HTTP request when downloading images.
This can be useful when an image server allows access based on the user agent.
The default is Flowbot.
<str name="downloader.userAgent">Flowbot</str>
Update processor chain
<!-- Pixolution update processor configuration default="true" is mandatory.-->
<updateRequestProcessorChain name="imageloader" default="true">
<processor class="solr.DistributedUpdateProcessorFactory" />
<!-- Placed after distributed to distribute image analysis load across the cloud, if any. -->
<processor class="de.pixolution.solr.update.processor.PixolutionUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
<processor class="solr.LogUpdateProcessorFactory" />
</updateRequestProcessorChain>
Alternatively, use the following variant of the same chain, which additionally enables field type guessing for unknown fields. Configure only one of the two chains, since both use the name imageloader.
<!-- Pixolution update processor configuration default="true" is mandatory.
Enabled field type guessing for unknown fields.-->
<updateRequestProcessorChain name="imageloader" default="true">
<processor class="solr.RemoveBlankFieldUpdateProcessorFactory" />
<processor class="solr.ParseDateFieldUpdateProcessorFactory">
<arr name="format">
<str>yyyy-MM-dd['T'[HH:mm[:ss[.SSS]][z</str>
<str>yyyy-MM-dd['T'[HH:mm[:ss[,SSS]][z</str>
<str>yyyy-MM-dd HH:mm[:ss[.SSS]][z</str>
<str>yyyy-MM-dd HH:mm[:ss[,SSS]][z</str>
<str>[EEE, ]dd MMM yyyy HH:mm[:ss] z</str>
<str>EEEE, dd-MMM-yy HH:mm:ss z</str>
<str>EEE MMM ppd HH:mm:ss [z ]yyyy</str>
</arr>
</processor>
<processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
<lst name="typeMapping">
<str name="valueClass">java.lang.String</str>
<str name="fieldType">string</str>
<lst name="copyField">
<!-- copy all incoming string field in the default search field-->
<str name="dest">text</str>
</lst>
<!-- Use as default mapping instead of defaultFieldType -->
<bool name="default">true</bool>
</lst>
<lst name="typeMapping">
<str name="valueClass">java.lang.Boolean</str>
<str name="fieldType">boolean</str>
</lst>
<lst name="typeMapping">
<str name="valueClass">java.util.Date</str>
<str name="fieldType">date</str>
</lst>
<lst name="typeMapping">
<str name="valueClass">java.lang.Long</str>
<str name="valueClass">java.lang.Integer</str>
<str name="fieldType">long</str>
</lst>
<lst name="typeMapping">
<str name="valueClass">java.lang.Number</str>
<str name="fieldType">double</str>
</lst>
</processor>
<processor class="solr.DistributedUpdateProcessorFactory" />
<!-- Placed after distributed to distribute image analysis load across the cloud, if any. -->
<processor class="de.pixolution.solr.update.processor.PixolutionUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
<processor class="solr.LogUpdateProcessorFactory" />
</updateRequestProcessorChain>
Recommended search handlers and parameters
<requestHandler name="/select" class="solr.SearchHandler" default="true" initParams="flow">
<arr name="first-components">
<str>pxlFlow</str>
</arr>
</requestHandler>
<requestHandler name="/duplicate" class="solr.SearchHandler" initParams="flow">
<lst name="defaults">
<str name="rank.mode">duplicate</str>
<str name="rank.threshold">0.7</str>
</lst>
<arr name="first-components">
<str>pxlFlow</str>
</arr>
</requestHandler>
<requestHandler name="/image" class="solr.SearchHandler" initParams="flow">
<lst name="defaults">
<str name="rank.mode">content</str>
</lst>
<arr name="first-components">
<str>pxlFlow</str>
</arr>
</requestHandler>
<requestHandler name="/color" class="solr.SearchHandler" initParams="flow">
<lst name="defaults">
<str name="rank.mode">color</str>
</lst>
<arr name="first-components">
<str>pxlFlow</str>
</arr>
</requestHandler>
<initParams name="flow">
<lst name="defaults">
<str name="echoParams">none</str>
<int name="rows">10</int>
<str name="q">*:*</str>
<str name="fl">id,image,score,color_*</str>
<str name="rank.threshold">0.5</str>
<!-- automatically filled with created string fields via copy directive in update chain -->
<str name="df">text</str>
<str name="facet.mincount">1</str>
</lst>
</initParams>
Schema
The schema.xml or managed-schema file represents the storage structure by defining known fields and their fieldtypes.
<?xml version="1.0" ?>
<schema name="pixolution-example-schema" version="1.6">
<uniqueKey>id</uniqueKey>
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="image" type="string" indexed="true" stored="true" multiValued="false" />
<field name="labels" type="string" indexed="true" stored="true" multiValued="true" />
<!-- default search field with all text copied into, filled by copyField directive in field adding updatechain -->
<field name="text" type="general_text" indexed="true" stored="false" multiValued="true" />
<field name="location" type="location" indexed="true" stored="true" multiValued="false" />
<field name="_version_" type="long" indexed="false" stored="false" docValues="true"/>
<dynamicField name="random_*" type="random" />
<fieldType name="random" class="solr.RandomSortField" indexed="true" />
<fieldType name="string" class="solr.StrField" indexed="true" stored="true" />
<fieldType name="long" class="solr.LongPointField" indexed="true" stored="true" />
<fieldType name="double" class="solr.DoublePointField" indexed="true" stored="true" />
<fieldType name="boolean" class="solr.BoolField" indexed="true" stored="true" />
<fieldType name="date" class="solr.DatePointField" indexed="true" stored="true" />
<fieldType name="location" class="solr.LatLonPointSpatialField" indexed="true" stored="true" />
<fieldType name="general_text" class="solr.TextField">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
<!-- preconfigured copy fields -->
<copyField source="color_names" dest="text" />
<copyField source="labels" dest="text" />
<!-- Flow modules preconfigured fields to remove the need for calling pixolution handler on startup -->
<fieldtype name="bin" class="de.pixolution.solr.schema.DescriptorFieldType" />
<fieldtype name="int" class="org.apache.solr.schema.IntPointField" />
<fieldtype name="float" class="org.apache.solr.schema.FloatPointField" />
<field name="color_descriptor" type="bin" docValues="true" multiValued="false" indexed="false" />
<field name="color_cluster" type="int" stored="false" docValues="false" multiValued="false" indexed="true" />
<field name="color_nclusters" type="bin" docValues="true" multiValued="false" indexed="false" />
<field name="color_lopq" type="bin" docValues="true" multiValued="false" indexed="false" />
<field name="color_rerank" type="long" docValues="true" multiValued="false" indexed="false" />
<field name="color_names" type="string" stored="true" docValues="false" multiValued="true" indexed="true" />
<field name="color_isolated" type="boolean" stored="true" docValues="false" multiValued="false" indexed="true" />
<field name="color_palette_hex" type="string" stored="true" docValues="false" multiValued="true" indexed="false" />
<field name="color_palette_freq" type="float" stored="true" docValues="false" multiValued="true" indexed="false" />
<field name="copyspace" type="int" stored="true" docValues="false" multiValued="true" indexed="true" />
<field name="duplicate_descriptor" type="bin" docValues="true" multiValued="false" indexed="false" />
<field name="duplicate_cluster" type="int" stored="false" docValues="false" multiValued="false" indexed="true" />
<field name="duplicate_nclusters" type="bin" docValues="true" multiValued="false" indexed="false" />
<field name="duplicate_lopq" type="bin" docValues="true" multiValued="false" indexed="false" />
<field name="duplicate_rerank" type="long" docValues="true" multiValued="false" indexed="false" />
<field name="content_descriptor" type="bin" docValues="true" multiValued="false" indexed="false" />
<field name="content_cluster" type="int" stored="false" docValues="false" multiValued="false" indexed="true" />
<field name="content_nclusters" type="bin" docValues="true" multiValued="false" indexed="false" />
<field name="content_lopq" type="bin" docValues="true" multiValued="false" indexed="false" />
<field name="content_rerank" type="long" docValues="true" multiValued="false" indexed="false" />
</schema>
Enable image loading in HTML responses
To display images from search results directly in your browser when requesting HTML responses via wt=html, you must relax the Jetty server's content security policy (CSP) to allow your browser to load images from arbitrary domains.
In the jetty.xml config file, change the Content-Security-Policy for image resources to img-src * data:.
sed -i "s/img-src 'self'/img-src */" /opt/solr/server/etc/jetty.xml
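To preview what the substitution does before touching jetty.xml, you can run the same sed expression on a sample string. The header value below is a shortened, assumed example; the actual policy in your jetty.xml may contain more directives.

```shell
#!/usr/bin/env bash
# Sample Content-Security-Policy value. This is an assumption for the demo,
# not a copy of the policy shipped with Solr.
sample="default-src 'none'; img-src 'self' data:; script-src 'self';"

# Same expression as in the command above, applied to the sample only.
relaxed="$(printf '%s\n' "$sample" | sed "s/img-src 'self'/img-src */")"
echo "$relaxed"
# prints: default-src 'none'; img-src * data:; script-src 'self';
```

Afterwards, grep -n "img-src" /opt/solr/server/etc/jetty.xml shows whether the real file contains the relaxed directive.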
Next Steps
- Start Solr
- Index example dataset or add your own docs
- Explore the query API
- Learn how to find duplicates, search by colors or find relevant images.
The query examples and explanations in this documentation refer to the above configuration. If your configuration is different, this may also affect the examples documented here (e.g. field names, available request handlers, etc.).