elasticsearch-scroll
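I ran into a fairly painful problem: data had been lost from MySQL.
There were two ways to get it back. One was to have the DBA dig out the data from six months ago; the other was to look in Elasticsearch (ES), which held a copy of the MySQL data.
Because there were more than 10,000 records and ES was configured to return at most 10,000 documents per query, fetching the data with scroll seemed the better choice.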
1 ElasticSearch 2.x
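1.1 Query index information
Search statistics for each node:
localhost:9200/_nodes/stats/indices/search?pretty
Search the index itself; hits.total in the response is the number of matching documents:
curl -XGET 'http://127.0.0.1:9400/dev_index1_20190118/docs/_search?pretty'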
1.2 Use a scroll cursor
curl -XGET 'http://127.0.0.1:9400/dev_index1_20190118/docs/_search?scroll=10m' -d ' { "query": { "match_all": {}}, "sort" : ["_doc"], "size":10000 }'>> es_scroll_data_20190118_1w.txt
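The next request needs the _scroll_id from this first response, so it can help to pull it out of the saved output instead of copying it by hand. Below is a minimal Python sketch, assuming the text passed in is exactly one JSON response; `summarize_page` and the sample response are hypothetical and not part of any Elasticsearch client.

```python
import json


def summarize_page(response_text):
    """Return (_scroll_id, hits on this page, total matching docs) for one response."""
    page = json.loads(response_text)
    hits = page["hits"]
    # In ES 2.x and 5.x, hits.total is a plain integer.
    return page["_scroll_id"], len(hits["hits"]), hits["total"]


# A shortened, made-up response with the same shape as a real first page.
sample = '{"_scroll_id": "abc123", "hits": {"total": 25000, "hits": [{"_id": "1"}, {"_id": "2"}]}}'
print(summarize_page(sample))  # prints ('abc123', 2, 25000)
```

For the real output, the contents of es_scroll_data_20190118_1w.txt would be passed in as `response_text`, provided the file holds only that one response.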
1.3 Keep fetching the next page
Each follow-up request goes to the _search/scroll endpoint and passes the _scroll_id returned by the previous response:
curl -XGET 'http://127.0.0.1:9400/_search/scroll?scroll=10m' -d ' { "scroll": "10m", "scroll_id" : "DnF1ZXJ5VGhlbkZldGNoBQAAAAAANKLTFjdfVno3Y2hGU1BpOUYtMW54OTV3TEEAAAAAADSi1BY3X1Z6N2NoRlNQaTlGLTFueDk1d0xBAAAAAAA0otYWN19WejdjaEZTUGk5Ri0xbng5NXdMQQAAAAAANKLVFjdfVno3Y2hGU1BpOUYtMW54OTV3TEEAAAAAADvJpxZzcU9YSExnLVRTNk5RY3JfMlNuWU9n" }' >> es_scroll_data_20190118_2w.txt
curl -XGET 'http://127.0.0.1:9400/_search/scroll?scroll=10m' -d ' { "scroll": "10m", "scroll_id" : "DnF1ZXJ5VGhlbkZldGNoBQAAAAAANKLTFjdfVno3Y2hGU1BpOUYtMW54OTV3TEEAAAAAADSi1BY3X1Z6N2NoRlNQaTlGLTFueDk1d0xBAAAAAAA0otYWN19WejdjaEZTUGk5Ri0xbng5NXdMQQAAAAAANKLVFjdfVno3Y2hGU1BpOUYtMW54OTV3TEEAAAAAADvJpxZzcU9YSExnLVRTNk5RY3JfMlNuWU9n" }' >> es_scroll_data_20190118_3w.txt
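Repeating the request by hand gets tedious once there are more than a few pages. The loop can be sketched in Python roughly as follows. This is only a sketch under assumptions: `http_post` and `scroll_pages` are hypothetical helper names, follow-up requests are assumed to go to the `/_search/scroll` endpoint, and the HTTP call is passed in as a function so the loop can be tried without a cluster.

```python
import json
import urllib.request


def http_post(base_url):
    """Return a post(path, body) function that sends JSON to base_url."""
    def post(path, body):
        req = urllib.request.Request(
            base_url + path,
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))
    return post


def scroll_pages(post, index_path, keep_alive="10m", size=10000):
    """Yield each page of hits until a page comes back empty."""
    page = post(
        index_path + "/_search?scroll=" + keep_alive,
        {"query": {"match_all": {}}, "sort": ["_doc"], "size": size},
    )
    while page["hits"]["hits"]:
        yield page["hits"]["hits"]
        # Always pass on the most recent _scroll_id; it may change between pages.
        page = post(
            "/_search/scroll",
            {"scroll": keep_alive, "scroll_id": page["_scroll_id"]},
        )
```

Against a live cluster, `post = http_post("http://127.0.0.1:9400")` followed by `for hits in scroll_pages(post, "/dev_index1_20190118/docs")` would hand back one page at a time, which can then be appended to a file as the curl commands do.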
2 ElasticSearch 5.6.x
2.1 Query index information
localhost:9200/_nodes/stats/indices/search?pretty
curl -XGET 'http://127.0.0.1:9400/dev_index1_20190118/docs/_search?pretty'
2.2 Use a scroll cursor
curl -XGET 'http://127.0.0.1:9400/dev_index1_20190118/docs/_search?scroll=10m' -d ' { "query": { "match_all": {}}, "sort" : ["_doc"], "size":10000 }'>> es_scroll_data_20190118_1w.txt
2.3 Keep fetching the next page
curl -XGET 'http://127.0.0.1:9400/_search/scroll?scroll=10m' -d ' { "scroll_id" : "DnF1ZXJ5VGhlbkZldGNoBQAAAAAANKLTFjdfVno3Y2hGU1BpOUYtMW54OTV3TEEAAAAAADSi1BY3X1Z6N2NoRlNQaTlGLTFueDk1d0xBAAAAAAA0otYWN19WejdjaEZTUGk5Ri0xbng5NXdMQQAAAAAANKLVFjdfVno3Y2hGU1BpOUYtMW54OTV3TEEAAAAAADvJpxZzcU9YSExnLVRTNk5RY3JfMlNuWU9n" }' >> es_scroll_data_20190118_2w.txt
curl -XGET 'http://127.0.0.1:9400/_search/scroll?scroll=10m' -d ' { "scroll_id" : "DnF1ZXJ5VGhlbkZldGNoBQAAAAAANKLTFjdfVno3Y2hGU1BpOUYtMW54OTV3TEEAAAAAADSi1BY3X1Z6N2NoRlNQaTlGLTFueDk1d0xBAAAAAAA0otYWN19WejdjaEZTUGk5Ri0xbng5NXdMQQAAAAAANKLVFjdfVno3Y2hGU1BpOUYtMW54OTV3TEEAAAAAADvJpxZzcU9YSExnLVRTNk5RY3JfMlNuWU9n" }' >> es_scroll_data_20190118_3w.txt
3 Problems encountered
3.1 Unknown key for a VALUE_STRING in [scroll_id].
{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "Unknown key for a VALUE_STRING in [scroll_id].",
        "line": 3,
        "col": 19
      }
    ],
    "type": "parsing_exception",
    "reason": "Unknown key for a VALUE_STRING in [scroll_id].",
    "line": 3,
    "col": 19
  },
  "status": 400
}
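I traced this to the scroll_id in the second request not matching the scroll_id returned by the first. The same parsing_exception is also what the _search endpoint returns when its request body contains a scroll_id key, so it is worth checking that follow-up requests go to _search/scroll.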
3.2 Unknown key for a VALUE_STRING in [scroll]
{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "Unknown key for a VALUE_STRING in [scroll].",
        "line": 3,
        "col": 15
      }
    ],
    "type": "parsing_exception",
    "reason": "Unknown key for a VALUE_STRING in [scroll].",
    "line": 3,
    "col": 15
  },
  "status": 400
}
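The second request had an extra scroll parameter in its request body. The _search body does not accept a scroll key: there the keep-alive belongs in the query string (?scroll=10m), and it may appear in the body only when calling _search/scroll.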
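Both parsing errors above come down to which keys are sent to which endpoint. The sketch below spells out the two request shapes in Python. The function names are hypothetical, the scroll id is made up, and the split shown (query keys for `_search`, `scroll` and `scroll_id` for `_search/scroll`) is my reading of the scroll API rather than something taken from the original commands.

```python
def first_request(index_path, keep_alive, size):
    """(path, body) that opens the scroll: keep-alive in the URL, query in the body."""
    path = index_path + "/_search?scroll=" + keep_alive
    body = {"query": {"match_all": {}}, "sort": ["_doc"], "size": size}
    return path, body


def next_request(previous_page, keep_alive):
    """(path, body) for the following page, built from the previous response."""
    # scroll and scroll_id belong in the _search/scroll body, not the _search body.
    body = {"scroll": keep_alive, "scroll_id": previous_page["_scroll_id"]}
    return "/_search/scroll", body


path, body = first_request("/dev_index1_20190118/docs", "10m", 10000)
assert "scroll" not in body and "scroll_id" not in body

path, body = next_request({"_scroll_id": "abc123"}, "10m")  # made-up id
assert path == "/_search/scroll" and body["scroll_id"] == "abc123"
```

Building the second request from the previous response, rather than from a pasted string, also rules out a stale or mistyped scroll_id.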
3.3 Batch size is too large, size must be less than or equal to: [10000] but was [1000000]. Scroll batch sizes cost as much memory as result windows so they are controlled by the [index.max_result_window] index level setting.
{
  "error": {
    "root_cause": [
      {
        "type": "query_phase_execution_exception",
        "reason": "Batch size is too large, size must be less than or equal to: [10000] but was [1000000]. Scroll batch sizes cost as much memory as result windows so they are controlled by the [index.max_result_window] index level setting."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "dev_index1_20190118",
        "node": "8XqKY198S823M78QA43F8g",
        "reason": {
          "type": "query_phase_execution_exception",
          "reason": "Batch size is too large, size must be less than or equal to: [10000] but was [1000000]. Scroll batch sizes cost as much memory as result windows so they are controlled by the [index.max_result_window] index level setting."
        }
      }
    ]
  },
  "status": 500
}
The size I set was too large. It exceeded index.max_result_window, which is 10000 in this configuration (also the default value).
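Rather than raising the limit, the batch size can be capped at the window and the data fetched over several scroll requests. A small Python sketch of that arithmetic follows; `plan_batches` is a hypothetical helper, and the total of 25000 documents is a made-up figure for illustration.

```python
import math


def plan_batches(total_docs, requested_size, max_result_window=10000):
    """Clamp the batch size to the window and count the requests that return data."""
    size = min(requested_size, max_result_window)
    requests_needed = math.ceil(total_docs / size)
    return size, requests_needed


# 1000000 is the size from the error message; 25000 documents is a made-up total.
print(plan_batches(25000, 1000000))  # prints (10000, 3)
```

A loop that stops on an empty page issues one more request than this count, since the final empty response is what signals the end.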
3.4 search_context_missing_exception
{
  "error": {
    "root_cause": [
      { "type": "search_context_missing_exception", "reason": "No search context found for id [3540965]" },
      { "type": "search_context_missing_exception", "reason": "No search context found for id [3922089]" },
      { "type": "search_context_missing_exception", "reason": "No search context found for id [3454995]" },
      { "type": "search_context_missing_exception", "reason": "No search context found for id [3454996]" },
      { "type": "search_context_missing_exception", "reason": "No search context found for id [3454994]" }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      { "shard": -1, "index": null, "reason": { "type": "search_context_missing_exception", "reason": "No search context found for id [3540965]" } },
      { "shard": -1, "index": null, "reason": { "type": "search_context_missing_exception", "reason": "No search context found for id [3922089]" } },
      { "shard": -1, "index": null, "reason": { "type": "search_context_missing_exception", "reason": "No search context found for id [3454995]" } },
      { "shard": -1, "index": null, "reason": { "type": "search_context_missing_exception", "reason": "No search context found for id [3454996]" } },
      { "shard": -1, "index": null, "reason": { "type": "search_context_missing_exception", "reason": "No search context found for id [3454994]" } }
    ],
    "caused_by": {
      "type": "search_context_missing_exception",
      "reason": "No search context found for id [3454994]"
    }
  },
  "status": 404
}
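The scroll had in fact timed out: once the keep-alive expired, Elasticsearch removed the search context automatically.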
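A script that drives the scroll can recognize this case and start over instead of retrying the dead scroll_id. Here is a minimal Python sketch, assuming the error response has already been parsed into a dict; `is_expired_scroll` is a hypothetical helper name.

```python
def is_expired_scroll(response):
    """True if an error response says the scroll's search context is gone."""
    error = response.get("error")
    if not isinstance(error, dict):
        return False
    causes = error.get("root_cause", [])
    return any(c.get("type") == "search_context_missing_exception" for c in causes)


# Trimmed copy of the error shown above.
expired = {
    "error": {
        "root_cause": [
            {"type": "search_context_missing_exception",
             "reason": "No search context found for id [3540965]"}
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
    },
    "status": 404,
}
print(is_expired_scroll(expired))  # prints True
```

When this returns True, the scroll has to be reopened with a fresh first request. A keep-alive longer than the time spent handling one page avoids the expiry in the first place.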