Kafka: Application vs. OS Flush Management
Editor: Admin, 21 views, 2022.07.25
Kafka always writes all data to the filesystem immediately and supports a configurable flush policy that controls when data is forced out of the OS cache and onto disk. The policy can force data to disk after a period of time or after a certain number of messages have been written. There are several choices in this configuration.
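The two triggers described above (elapsed time and message count) can be sketched as a toy append-only log. This is only an illustration under stated assumptions, not Kafka's implementation: the class `FlushingLog` and its parameters `flush_interval_messages` and `flush_interval_s` are hypothetical names, loosely modeled on the broker settings `log.flush.interval.messages` and `log.flush.interval.ms`. The time check here runs only when a message is appended; there is no background thread.

```python
import os
import time


class FlushingLog:
    """Toy append-only log that fsyncs after N messages or T seconds.

    Hypothetical sketch: names and behavior are assumptions, not Kafka code.
    """

    def __init__(self, path, flush_interval_messages=None, flush_interval_s=None):
        # buffering=0: each write() goes straight to the OS page cache,
        # mirroring "writes all data to the filesystem immediately".
        self._file = open(path, "ab", buffering=0)
        self._max_messages = flush_interval_messages
        self._max_age_s = flush_interval_s
        self._unflushed = 0
        self._last_flush = time.monotonic()
        self.fsync_calls = 0

    def append(self, payload: bytes) -> None:
        self._file.write(payload)
        self._unflushed += 1
        if self._should_flush():
            self.flush()

    def _should_flush(self) -> bool:
        # Trigger 1: enough messages have accumulated since the last fsync.
        if self._max_messages is not None and self._unflushed >= self._max_messages:
            return True
        # Trigger 2: enough time has passed since the last fsync.
        if self._max_age_s is not None:
            return time.monotonic() - self._last_flush >= self._max_age_s
        return False

    def flush(self) -> None:
        # Force the data out of the OS cache and onto disk.
        os.fsync(self._file.fileno())
        self._unflushed = 0
        self._last_flush = time.monotonic()
        self.fsync_calls += 1

    def close(self) -> None:
        self._file.close()
```

Leaving both parameters as `None` means the sketch never calls fsync itself and relies entirely on the operating system to write the data back.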
Kafka must eventually call fsync to know that data was flushed. When recovering from a crash, for any log segment not known to be fsync'd, Kafka checks the integrity of each message by verifying its CRC and rebuilds the accompanying offset index file as part of the recovery process executed on startup.
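The recovery step can be illustrated with a toy segment format. This is a sketch under stated assumptions: the record layout (4-byte length, 4-byte CRC32, payload), the helper names `encode_record` and `recover_segment`, and the choice to stop at the first invalid record are all hypothetical. Kafka's real on-disk format and recovery logic differ.

```python
import struct
import zlib

# Toy record header: payload length and CRC32 of the payload (assumed layout).
HEADER = struct.Struct(">II")


def encode_record(payload: bytes) -> bytes:
    """Serialize one record as header + payload."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def recover_segment(data: bytes):
    """Scan a possibly torn segment and rebuild its offset index.

    Returns (valid_length, index), where index is a list of
    (relative_offset, byte_position) pairs for every record that passed
    the CRC check. Scanning stops at the first truncated or corrupt record.
    """
    index = []
    pos = 0
    while pos + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, pos)
        end = pos + HEADER.size + length
        if end > len(data):
            break  # record was cut off by the crash
        payload = data[pos + HEADER.size:end]
        if zlib.crc32(payload) != crc:
            break  # stored CRC does not match the payload
        index.append((len(index), pos))
        pos = end
    return pos, index
```

A caller could truncate the segment file to `valid_length` and write `index` out as the new offset index. Both steps are left out here to keep the sketch short.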
Note that durability in Kafka does not require syncing data to disk, as a failed node will always recover from its replicas.
We recommend using the default flush settings, which disable application fsync entirely. This means relying on the background flush done by the OS and Kafka's own background flush. This provides the best of all worlds for most uses: no knobs to tune, great throughput and latency, and full recovery guarantees. We generally feel that the guarantees provided by replication are stronger than syncing to local disk. However, the paranoid may still prefer having both, and application-level fsync policies are still supported.
The drawback of using application-level flush settings is that they are less efficient in their disk usage pattern (they give the OS less leeway to reorder writes), and they can introduce latency: fsync in most Linux filesystems blocks writes to the file, whereas background flushing does much more granular page-level locking.
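To get a rough feel for the cost, the following sketch times the same sequence of writes with and without an fsync after every message. The function name `write_messages` and the message sizes are arbitrary assumptions. The measurement only captures the added time per write, not the blocking of concurrent writers described above, and the numbers depend heavily on the filesystem and hardware.

```python
import os
import tempfile
import time


def write_messages(path, messages, fsync_each):
    """Write messages to path, optionally calling fsync after each one.

    Returns the elapsed wall-clock time in seconds.
    """
    start = time.perf_counter()
    with open(path, "wb", buffering=0) as f:
        for message in messages:
            f.write(message)
            if fsync_each:
                # Application-level flush: wait for the disk on every message.
                os.fsync(f.fileno())
    return time.perf_counter() - start


if __name__ == "__main__":
    msgs = [b"x" * 1024] * 200
    with tempfile.TemporaryDirectory() as d:
        t_os = write_messages(os.path.join(d, "os.log"), msgs, fsync_each=False)
        t_app = write_messages(os.path.join(d, "app.log"), msgs, fsync_each=True)
    print(f"left to OS: {t_os:.4f}s, fsync per message: {t_app:.4f}s")
```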
In general, you don't need to do any low-level tuning of the filesystem, but the next few sections go over some of it in case it is useful.