Tech Field Day

The Independent IT Influencer Event


Improving Deduplication via Mathematics with Richard Lary



Storage Field Day 13

Richard Lary presented for X-IO at SFD13




This video is part of the appearance “X-IO Technologies Presents at Storage Field Day 13”. It was recorded during Storage Field Day 13, from 10:30 to 12:30 on June 16, 2017.



In his presentation at Storage Field Day 13, Richard Lary, Chief Scientist at X-IO Technologies, examines the challenges of deduplication and the role mathematics can play in improving its performance. Lary begins with how resource-intensive deduplication is: it consumes substantial memory, CPU cycles, and disk accesses. Deduplication typically computes a signature for each piece of incoming data and compares it against the signatures of data already stored, which requires a robust, persistent database of signatures. In a petascale system that database holds billions of entries, must survive power and controller failures, and must sustain high write throughput. Because hash-based signatures are effectively random, lookups and updates are scattered across the database with no locality, which is why Lary considers traditional caching ineffective here and argues for a high-throughput, persistent mapping database that can keep up with constantly changing data.
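The signature lookup described above can be illustrated with a minimal Python sketch. It is a toy and not X-IO's implementation: the class name `DedupStore`, the in-memory dictionary standing in for the persistent signature database, the 16-byte BLAKE2b signature, and the byte-for-byte check on a signature match are all assumptions made for illustration.

```python
import hashlib


class DedupStore:
    """Toy block store that keeps one copy of each distinct block."""

    def __init__(self):
        self.index = {}   # signature -> position in self.blocks (assumed in-memory)
        self.blocks = []  # unique blocks, in arrival order

    def write(self, block: bytes) -> int:
        """Store a block and return the position of its single stored copy."""
        # Stand-in signature: BLAKE2b truncated to 16 bytes.
        sig = hashlib.blake2b(block, digest_size=16).digest()
        pos = self.index.get(sig)
        # Assumption: confirm a signature match by comparing the actual bytes.
        if pos is not None and self.blocks[pos] == block:
            return pos
        # New data (or a signature collision): store it and point the index at it.
        self.blocks.append(block)
        pos = len(self.blocks) - 1
        self.index[sig] = pos
        return pos


store = DedupStore()
first = store.write(b"x" * 4096)
other = store.write(b"y" * 4096)
again = store.write(b"x" * 4096)  # duplicate of the first block
print(first == again, len(store.blocks))
```

In a real system the dictionary would be the persistent database the talk is concerned with, and every write would touch an essentially random entry in it, which is the access pattern that defeats conventional caching.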

Lary then proposes using non-cryptographic signatures, specifically MetroHash, to improve deduplication performance. He explains that cryptographic hashes such as SHA-1 and SHA-256 are designed for security but are computationally expensive, leaving the system CPU-bound and making them a poor fit for high-performance storage. Non-cryptographic hashes such as MetroHash are significantly faster and can keep up with the throughput demands of modern storage systems. Lary also describes a “bouquet filter,” a variation on the Bloom filter that uses multiple small Bloom filters to reduce computational overhead. Finally, he hints at a proprietary method developed by X-IO that further optimizes deduplication by treating unique data differently, so that fewer resources are wasted on data that will never be matched. This method, under patent consideration at the time of the presentation, is said to significantly improve deduplication performance while maintaining data integrity.
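The summary does not describe how the bouquet filter works internally, so the following Python sketch shows only the general idea of replacing one large Bloom filter with many small ones. It is a generic partitioned Bloom filter, not X-IO's design. The class names `SmallBloom` and `PartitionedBloom`, the filter sizes, the routing rule, and the use of BLAKE2b in place of MetroHash (which is not in the Python standard library) are all assumptions.

```python
import hashlib


class SmallBloom:
    """A small Bloom filter over precomputed 16-byte digests."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, digest: bytes):
        # Double hashing: derive all bit positions from two 64-bit halves.
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, digest: bytes) -> None:
        for p in self._positions(digest):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, digest: bytes) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(digest))


class PartitionedBloom:
    """Routes each key to exactly one of many small Bloom filters."""

    def __init__(self, num_filters: int = 64, bits_per_filter: int = 1024):
        self.filters = [SmallBloom(bits_per_filter) for _ in range(num_filters)]

    def _route(self, key: bytes):
        # One hash per key: the last 8 bytes pick the filter, the first 16 set bits.
        digest = hashlib.blake2b(key, digest_size=24).digest()
        which = int.from_bytes(digest[16:], "little") % len(self.filters)
        return self.filters[which], digest[:16]

    def add(self, key: bytes) -> None:
        small, digest = self._route(key)
        small.add(digest)

    def might_contain(self, key: bytes) -> bool:
        small, digest = self._route(key)
        return small.might_contain(digest)


seen = PartitionedBloom()
seen.add(b"block-1")
print(seen.might_contain(b"block-1"))  # an added key is never reported absent
```

Because each key touches only one small filter, all of its bit probes fall within a single small region of memory. A negative answer means the data is definitely new, so the expensive signature database lookup can be skipped for it; a positive answer can be a false positive and still needs to be checked.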

Personnel: Richard Lary
