|
This video is part of the appearance, “Infineta Presents at Networking Field Day 3”. It was recorded as part of Networking Field Day 3 at 16:00-18:00 on March 29, 2012.
Watch on YouTube
Watch on Vimeo
Dr. K. V. S. Ramarao’s presentation at Tech Field Day on March 29, 2012, delved into the evolution of data deduplication solutions and introduced Infineta’s unique approach to the problem. He began by explaining the basic concept of data deduplication, which involves maintaining a dynamic dictionary of previously seen data to avoid redundant transmissions. He traced the history of deduplication algorithms back to the Rabin algorithm from 1981, which addressed the string matching problem by using rolling hashes to efficiently find substrings within larger strings. This method reduced computational complexity but introduced the challenge of false positives, which Rabin mitigated using random irreducible polynomials to minimize the probability of hash collisions.
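The rolling-hash idea behind Rabin's method can be illustrated with a short sketch. For simplicity, this uses a polynomial hash modulo a large prime rather than Rabin's random irreducible polynomials over GF(2); the constants and function name are illustrative, not from the talk. Note how a hash match is verified byte-for-byte, since hash equality alone can be a false positive.

```python
# A minimal rolling-hash substring search in the spirit of Rabin's method.
# Illustrative sketch: a polynomial hash mod a large prime stands in for
# Rabin's irreducible-polynomial fingerprints.

BASE = 256           # alphabet size for byte strings
MOD = 1_000_000_007  # large prime keeps collision probability low

def rolling_hash_find(pattern: bytes, text: bytes) -> int:
    """Return the first index of `pattern` in `text`, or -1."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return -1 if m else 0
    high = pow(BASE, m - 1, MOD)  # weight of the byte leaving the window
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * BASE + pattern[i]) % MOD
        t_hash = (t_hash * BASE + text[i]) % MOD
    for i in range(n - m + 1):
        # A hash match may be a false positive, so verify the bytes directly.
        if p_hash == t_hash and text[i:i + m] == pattern:
            return i
        if i < n - m:
            # Slide the window: drop text[i], append text[i + m] in O(1).
            t_hash = ((t_hash - text[i] * high) * BASE + text[i + m]) % MOD
    return -1
```

Recomputing the hash from scratch at each position would cost O(m) per shift; the sliding update is what makes the whole scan roughly linear in the text length.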
Ramarao then discussed subsequent advancements in deduplication algorithms, including Manber’s work in 1993 on file similarity and Broder’s application of these ideas to web page similarity in 1997. By 1999, Ross Williams had patented a method that applied these principles to deduplication, focusing on identifying identical parts between similar strings. Traditional deduplication methods, as Ramarao explained, involve partitioning data into chunks and using rolling hashes to find breakpoints, which are then used to identify and eliminate redundant data. However, these methods face significant scalability issues, particularly when dealing with large volumes of data at high speeds, due to the heavy computational and memory demands.
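The breakpoint-based chunking Ramarao described can be sketched as follows: a rolling hash over a small window declares a chunk boundary whenever the hash's low bits are all zero, so boundaries depend on content rather than position and survive insertions earlier in the stream. The window size, mask, length bounds, and hash are arbitrary illustrative choices, not the parameters of any particular product.

```python
# Content-defined chunking sketch: rolling hash over a WINDOW-byte window;
# a breakpoint fires when the hash's low bits are zero. All constants are
# illustrative.

PRIME = 31
WINDOW = 16
POW_W = PRIME ** WINDOW   # weight of the byte leaving the window
MASK = (1 << 6) - 1       # ~64-byte average chunks in this toy example

def chunk(data: bytes, min_len: int = 32, max_len: int = 256) -> list[bytes]:
    """Split data into variable-length chunks at content-defined breakpoints."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = h * PRIME + b
        if i - start >= WINDOW:
            h -= data[i - WINDOW] * POW_W  # drop the byte leaving the window
        length = i - start + 1
        # Cut when the hash hits the breakpoint pattern (respecting a minimum
        # chunk size), or force a cut at the maximum chunk size.
        if (length >= min_len and (h & MASK) == 0) or length >= max_len:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```

Identical chunks can then be replaced by references to a dictionary of previously seen chunks. The scalability problem Ramarao pointed to is visible even here: the dictionary of chunk fingerprints grows with the data volume, and the per-byte hashing is inherently sequential.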
Infineta’s “secret sauce,” as Ramarao described it, lies in its innovative approach to deduplication, which avoids the pitfalls of traditional methods. Instead of relying on large, variable-length chunks and sequential processing, Infineta uses a massively parallelizable algorithm that processes fixed-length data segments. This method involves selecting random positions within data packets and comparing small, fixed-size segments against a dictionary, allowing for partial matches and reducing the need for extensive memory and CPU resources. By implementing this in hardware, specifically FPGAs, Infineta achieves high throughput with minimal latency, making it suitable for high-speed data transfer scenarios. This approach not only improves deduplication efficiency but also ensures scalability, addressing the limitations of traditional deduplication solutions.
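As a rough software illustration of the fixed-length-segment idea only: the actual algorithm is proprietary and runs in FPGAs, so everything below is a hypothetical sketch. Fixed-stride sampling stands in for the random position selection Ramarao mentioned (a deterministic choice keeps the toy reproducible), and a hash table of segment digests stands in for the hardware dictionary. Because each sampled segment is looked up independently, the lookups could proceed in parallel, which is the property the talk emphasized.

```python
import hashlib

SEG = 32     # fixed segment size in bytes (illustrative)
STRIDE = 64  # sample every STRIDE bytes (stands in for random sampling)

class SegmentDictionary:
    """Toy dictionary of fixed-size segment fingerprints. A loose sketch of
    the scheme described in the talk; the real data structures and hardware
    pipeline are Infineta's and are not public."""

    def __init__(self) -> None:
        self.index: dict[bytes, bytes] = {}  # digest -> segment bytes

    @staticmethod
    def _digest(seg: bytes) -> bytes:
        return hashlib.blake2b(seg, digest_size=8).digest()

    def dedupe(self, packet: bytes) -> list[tuple[int, bool]]:
        """For each sampled fixed-size segment, report (offset, already_seen),
        indexing unseen segments as we go. Comparing the raw bytes on a hit
        guards against hash-collision false positives."""
        results = []
        for off in range(0, len(packet) - SEG + 1, STRIDE):
            seg = packet[off:off + SEG]
            key = self._digest(seg)
            seen = self.index.get(key) == seg
            if not seen:
                self.index[key] = seg
            results.append((off, seen))
        return results
```

Because segments are matched individually, a packet that only partially overlaps earlier traffic still yields hits on the overlapping segments, which is the "partial match" behavior the presentation called out as an advantage over whole-chunk matching.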
Personnel: K. V. S. Ramarao