/analyze endpoint
The /analyze handler analyzes an image and returns the result, which can then be used for indexing.
When adding documents to Flow, the images must be downloaded and analyzed before the processed data is actually indexed. Image analysis is a computationally intensive task. To speed up the indexing of large volumes of documents and avoid a reduction in search performance during indexing, you can separate image analysis from indexing.
Scalable data workflow
Analyzing images is computationally expensive and therefore time-consuming compared to text processing. To avoid putting heavy load on your search server and to speed up the analysis step, you can split analysis and indexing into separate processes and use multiple servers just for analyzing.
graph LR
A[Client]
subgraph Analyze Cluster
direction TB
subgraph Server 1
B{{Flow node /analyze}}
end
subgraph Server 2
B2{{Flow node /analyze}}
end
subgraph Server N
B3{{Flow node /analyze}}
end
end
D[(Database)]
subgraph Search Server
C{{Flow node /update}}
end
A <==>|1. Analyze Image| B
A <--> B2
A <--> B3
A -.->| 2. Store Json for reuse| D
A ==>|3. Index Json| C
- First, analyze images using the /analyze endpoint.
  - Use one or more servers, each with a running Flow Docker container.
  - Since these Flow nodes only analyze images and do not store data, you can add and remove servers as needed.
  - The more servers, the faster the processing.
- Optionally, store the JSON in a database to reuse that data when re-indexing is necessary.
- Then, index the JSON as the value of the pseudo-field import using the /update endpoint.
  - Indexing is now very fast because the heavy lifting has already been done.
  - Indexing no longer affects search performance.
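The first step of this workflow can be sketched as a small client that spreads image URLs round-robin over several analyze nodes and calls them concurrently. This is only an illustration under assumptions: the node host names, the function names, and the `workers_per_node` setting are hypothetical, and the standard library `urllib` is used so the sketch has no dependencies. The only Flow-specific parts are the `/analyze` path, the `input.url` parameter, and the `outputs` response field described on this page.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle
import json
import urllib.parse
import urllib.request

# Hypothetical analyze nodes; replace with your own hosts and collection.
ANALYZE_NODES = [
    "http://analyze-1:8983/api/cores/my-collection",
    "http://analyze-2:8983/api/cores/my-collection",
]


def analyze_on_node(node_url, image_url):
    """Call /analyze on one node and return the value of its "outputs" field."""
    query = urllib.parse.urlencode({"input.url": image_url})
    with urllib.request.urlopen(f"{node_url}/analyze?{query}", timeout=60) as rsp:
        return json.load(rsp)["outputs"]


def analyze_all(image_urls, nodes=ANALYZE_NODES, fetch=analyze_on_node,
                workers_per_node=2):
    """Assign image URLs to nodes round-robin and analyze them concurrently.

    Returns a dict mapping each image URL to its analysis output.
    """
    # Pair every image with a node: n1, n2, ..., n1, n2, ...
    assignments = list(zip(cycle(nodes), image_urls))
    max_workers = max(1, len(nodes) * workers_per_node)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with image_urls.
        outputs = pool.map(lambda pair: fetch(*pair), assignments)
        return dict(zip(image_urls, outputs))
```

Because the nodes keep no state, adding or removing a server only means changing the `nodes` list. The `fetch` argument makes it possible to swap in a different HTTP client or a stub for testing.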
Which workflow to choose?
Given the information above, we suggest using the scalable workflow (separate analysis and indexing) if:
- you have more than 100,000 images to index,
- you need to re-index often (your data or schema changes),
- you need to reduce load on the search server (constant search times are important).
Import pre-analyzed image
To speed up indexing, you can also import pre-analyzed image data using the outputs of the /analyze endpoint.
- Send the image to the /analyze endpoint.
- Extract the analyzed image data as a JSON string from the outputs field of the response.
- Add the JSON string as the field value of the special field import when indexing the corresponding doc.
import requests

IMG_URL = "YOUR_IMAGE_URL"
FLOW_URL = "http://localhost:8983/api/cores/my-collection"

# 1. Analyze image (params= URL-encodes the image URL)
rsp = requests.get(FLOW_URL + "/analyze", params={"input.url": IMG_URL})
rsp.raise_for_status()
pre_analyzed = rsp.json()["outputs"]

# 2. Index image, passing the analysis result in the special import field
doc = {
    "id": "1",
    "image": IMG_URL,
    "import": pre_analyzed,
}
requests.post(FLOW_URL + "/update", json=doc)
curl --request POST \
--url 'http://localhost:8983/api/cores/my-collection/update' \
--header 'Content-Type: application/json' \
--data ' {
"id" : "1",
"image" : "YOUR_IMAGE_URL",
"import" : "{\"color_cluster\":117,\"color_rerank\":5707186122717457336,\"color_isolated\":false,\"content_lopq\":\"HnjaneuchsAm\",\"content_rerank\":-4163907962861818406,\"color_palette_freq\":[0.24072231,0.22010717,0.21671125,0.18918492,0.14692917],\"color_palette_hex\":[\"#FDFDFD\",\"#FF8E3F\",\"#FF635B\",\"#FF7A4D\",\"#FF3A70\"],\"content_descriptor\":\"E61SS785+fDoUFqsGB3tBv/TTYs1lIhHAeMCPQbm6wIGpwx8k7wa8wMW3VYM38TgiKuBMT/slvPSzCX+l8/4I77tf1n32z1bDAAydPRy+E4mE1SBGdvZFGamU0EG7ALu7IHYLqY1Gh8/Bg4mwywc7IdE2DWkBwb6/ObHCyLQB8o=\",\"copyspace\":[8],\"duplicate_nclusters\":\"X1bpk08JXrXsdpoeQKw6nA==\",\"color_lopq\":\"daenygQBnoR1\",\"duplicate_lopq\":\"XzSGyemXKxkt\",\"color_names\":[\"orange\",\"red\",\"white\",\"brown\"],\"duplicate_rerank\":-7442502864752548956,\"content_nclusters\":\"HghAbA+D2XbdedHK6D1aGQ==\",\"color_descriptor\":\"fx0Bf+QD0g4QQBb8AroCJ7zwHVwOC0sH\",\"color_nclusters\":\"dQWESMEwfzHq3cZTk7FDHg==\",\"duplicate_descriptor\":\"XJc7jT9JMsxSS8T2AIGrxGxOs4GXFQztB5oK28C3V4ie7tUozMmdx9RXxqM8+Totzghkw4G16TLA9xvvIP/ZKfSB2Gd/w9UevsnoFhF/IWyB9/6B7Dv4TAt3Dq4jv+aq1/ZILw8ksA0fTb1PEDUA/gievD3qZDfnxPv0aN86JyiBgSIiFIHugQ4DURvTRIE1yQvDJYHLJfiuJYFu5Ox/nwjhNgkJLN/5CLxI3Db1egQu37fSNoUN66Admiae824g01swZPjkMfoL/dF/scGnWicqIiCTIGsEaQm6I9TsEcvlxNS/S0zBNuPSQaffTTo1vNksHDjBBLnm/ywjjxTiAn871OAd5hNJYvr1f5vof5jvYn7UOWbmDvG15BvZ2QGUjsrBKe0U1LwnsDVXFvLyoPHRgWH88Rh8l4wqADCfQrmR8y4Ba1V/wVFV/B8WYXfX3X/5DwTxq8S8ajK8kynnHir5+w8Zn2bhgR/4m3HMRsUfP+suVcrhHfL0quWB+Mb0VyLQ3Yx/Hs5g9i3pIcT7SGrs58LwgRPwDQj8A5vIfOrio4EjXeYPEyDY1y7T0bk2HLvn7ewdtT+CpMv0m6ojtfrlHpiBILT9gQT5xJJM/Q/UP/vv6ekTEH+LRBkd8Bdm5Zsf8T416BRiHLIlhkgJBkLjAwf5JNVL/evGKztLTvs=\",\"duplicate_cluster\":95,\"content_cluster\":30}"
}'
If the special field import is present in the doc, Flow uses the import field as the data source instead of the image field.
We recommend indexing the original image URL as well, to be able to display the image when using the HTML response writer or your own UI.
The value of the special import field is not indexed.
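Since the import value itself is not indexed, keeping the /analyze output somewhere else seems sensible if you expect to re-index later, as the optional "store the JSON" step of the scalable workflow suggests. The following is a minimal sketch using SQLite from the Python standard library. The table layout, file name, and function names are illustrative assumptions and not part of Flow; only the `id`, `image`, and `import` keys of the resulting documents follow the /update example on this page.

```python
import sqlite3


def open_store(path="analysis.db"):
    """Open (or create) a table that maps document ids to analysis JSON."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS analysis ("
        "id TEXT PRIMARY KEY, image TEXT NOT NULL, outputs TEXT NOT NULL)"
    )
    return conn


def save_analysis(conn, doc_id, image_url, outputs_json):
    """Insert or overwrite the stored /analyze output for one document."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO analysis (id, image, outputs) VALUES (?, ?, ?)",
            (doc_id, image_url, outputs_json),
        )


def docs_for_reindex(conn):
    """Yield documents shaped like the /update example, ready to re-index."""
    rows = conn.execute("SELECT id, image, outputs FROM analysis ORDER BY id")
    for doc_id, image_url, outputs_json in rows:
        yield {"id": doc_id, "image": image_url, "import": outputs_json}
```

When re-indexing, each document yielded by `docs_for_reindex` could be posted to the /update endpoint in the same way as the single document in the example above, without analyzing the image again.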
Upload image
The following snippet loads the image and scales it down to thumbnail size in memory before uploading it to Flow. This has two advantages:
- The Python Pillow package supports many more image formats than the Java runtime used by Flow. By converting the image to png before uploading, you can easily handle a wide range of formats.
- Downscaling the image before uploading significantly speeds up processing for high-resolution images by minimizing I/O and decoding times.
import base64
from io import BytesIO

import requests
from PIL import Image

IMG_PATH = "YOUR_IMAGE_FILE"
FLOW_URL = "http://localhost:8983/api/cores/my-collection"

# Load & scale to thumbnail size - high-res images slow down I/O tremendously
img = Image.open(IMG_PATH)
img.thumbnail((400, 400))

# Encode as in-memory PNG (named `buffer` to avoid shadowing the built-in `bytes`)
buffer = BytesIO()
img.save(buffer, format="PNG")
base64_image = base64.b64encode(buffer.getvalue()).decode("utf-8")
payload = {"input.data": base64_image}

# Set an empty logParamsList= to avoid huge log messages when uploading base64 images as parameters
rsp = requests.post(FLOW_URL + "/analyze?logParamsList=", data=payload)
pre_analyzed = rsp.json()["outputs"]
print(pre_analyzed)
The following command converts a given image file (png, jpg, and gif are supported) to base64, pipes it to curl as the value of the input.data parameter, and sends it to the /analyze endpoint.
base64 --wrap=0 YOUR_FILE_PATH | \
curl --data-urlencode "input.data@-" \
--url 'http://localhost:8983/api/cores/my-collection/analyze?logParamsList='
Java API
To analyze images directly within Java, you can use the SimpleOutputProducer class, which is included in the Flow jar file.
import java.net.URL;

import de.pixolution.api.SimpleOutputProducer;
import de.pixolution.api.json.JsonWriter;
import de.pixolution.storage.ServiceOutput;

public class JavaAnalyzeAPI {
    public static void main(String[] args) {
        // Note: Reuse the producer for best performance
        try (SimpleOutputProducer producer = new SimpleOutputProducer()) {
            URL url = new URL("https://website.com/my-image.jpg");
            // Download & analyze image
            ServiceOutput result = producer.calcOutput(url);
            JsonWriter writer = new JsonWriter();
            // jsonResult can be directly set in import field when indexing
            String jsonResult = writer.write(result);
            System.out.println(jsonResult);
        } catch (Exception e) {
            // Handle exception
        }
    }
}