In an ideal world, we would like to use a Hash Table to store a set of elements because of its O(1) average-case time complexity for find, insert, and remove operations. In practice, however, we may face datasets that are too large to fit into memory at once. In these cases, we can use the Bloom filter, which is a probabilistic data structure that never has false negatives (i.e., if it tells us x doesn’t exist, x definitely doesn’t exist), but that may have false positives (i.e., if it tells us that x exists, x might not actually exist).
In this part of the assignment, we have provided a file called BloomFilter.cpp that contains initial steps towards implementing a Bloom filter. Function headers (with usage details) are included in BloomFilter.h, and you need to fill in the insert() and find() functions of the BloomFilter class. Be sure to only modify BloomFilter.cpp: do not modify BloomFilter.h.
We will not be checking for memory leaks, as you should not be dynamically allocating any memory. We have provided a tester program, BloomFilterTest, that will help test your code.
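To make the "no false negatives, possible false positives" behavior concrete, here is a minimal, self-contained sketch of the general technique. It is not the provided BloomFilter class: the names `DemoBloomFilter`, `StringHash`, `demo_hash_a`, and `demo_hash_b` are hypothetical, the two hash functions are simple illustrations rather than the ones in HashFunctions.cpp, and the sketch assumes m > 0.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in type for a string hash function.
using StringHash = std::function<unsigned int(const std::string&)>;

// Illustrative hash functions (assumptions, not the course-provided ones).
inline unsigned int demo_hash_a(const std::string& s) {
    unsigned int h = 5381u;                    // djb2-style
    for (unsigned char c : s) h = h * 33u + c;
    return h;
}

inline unsigned int demo_hash_b(const std::string& s) {
    unsigned int h = 2166136261u;              // FNV-1a-style
    for (unsigned char c : s) { h ^= c; h *= 16777619u; }
    return h;
}

class DemoBloomFilter {
public:
    // m is the number of slots (assumed > 0); hashes holds the k hash functions.
    DemoBloomFilter(std::size_t m, std::vector<StringHash> hashes)
        : bits_(m, false), hashes_(std::move(hashes)) {}

    // Set the slot chosen by each of the k hash functions.
    void insert(const std::string& s) {
        for (const auto& h : hashes_) bits_[h(s) % bits_.size()] = true;
    }

    // If any of the k slots is unset, s was definitely never inserted.
    // If all are set, s was probably inserted (other elements may have set them).
    bool find(const std::string& s) const {
        for (const auto& h : hashes_) {
            if (!bits_[h(s) % bits_.size()]) return false;
        }
        return true;
    }

private:
    std::vector<bool> bits_;
    std::vector<StringHash> hashes_;
};
```

Because insert() only ever sets bits, every inserted element passes find(), which is why false negatives cannot occur. A false positive happens when other elements have already set all k slots of an element that was never inserted.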
Task: Edit CountMinSketch.cpp
In an ideal world, we would like to use a Hash Map to store the counts of elements because of its O(1) average-case time complexity for find, insert, and remove operations. In practice, however, we may face datasets that are too large to fit into memory at once. In these cases, we can use the Count-Min Sketch, which is a probabilistic data structure that may overestimate our count, but that will never underestimate our count.
In this part of the assignment, we have provided a file called CountMinSketch.cpp that contains initial steps towards implementing a Count-Min Sketch. Function headers (with usage details) are included in CountMinSketch.h, and you need to fill in the increment() and find() functions of the CountMinSketch class. Be sure to only modify CountMinSketch.cpp: do not modify CountMinSketch.h.
We will not be checking for memory leaks, as you should not be dynamically allocating any memory. We have provided a tester program, CountMinSketchTest, that will help test your code.
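The "may overestimate, never underestimates" property can be illustrated with a minimal sketch of the general technique. This is not the provided CountMinSketch class: `DemoCountMinSketch`, `SketchHash`, and `make_seeded_hash` are hypothetical names, the seeded hash is only an illustration, and the sketch assumes a width greater than zero and one hash function per row.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <limits>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in type for a string hash function.
using SketchHash = std::function<unsigned int(const std::string&)>;

// Illustrative FNV-1a-style hash whose starting value depends on a seed
// (an assumption, not one of the course-provided hash functions).
inline SketchHash make_seeded_hash(unsigned int seed) {
    return [seed](const std::string& s) -> unsigned int {
        unsigned int h = 2166136261u ^ seed;
        for (unsigned char c : s) { h ^= c; h *= 16777619u; }
        return h;
    };
}

class DemoCountMinSketch {
public:
    // One row of `width` counters per hash function (width assumed > 0).
    DemoCountMinSketch(std::size_t width, std::vector<SketchHash> hashes)
        : width_(width),
          hashes_(std::move(hashes)),
          counts_(hashes_.size(), std::vector<unsigned int>(width, 0u)) {}

    // Add one to the counter selected in every row.
    void increment(const std::string& s) {
        for (std::size_t i = 0; i < hashes_.size(); ++i) {
            ++counts_[i][hashes_[i](s) % width_];
        }
    }

    // Each row's counter is at least the true count (collisions only add),
    // so the minimum across rows is the tightest available estimate.
    unsigned int find(const std::string& s) const {
        if (hashes_.empty()) return 0u;
        unsigned int best = std::numeric_limits<unsigned int>::max();
        for (std::size_t i = 0; i < hashes_.size(); ++i) {
            best = std::min(best, counts_[i][hashes_[i](s) % width_]);
        }
        return best;
    }

private:
    std::size_t width_;
    std::vector<SketchHash> hashes_;
    std::vector<std::vector<unsigned int>> counts_;
};
```

Taking the minimum matters: any single row may be inflated by colliding elements, but the estimate is only wrong by as much as the least-contaminated row.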
Hash Functions
You will want to make use of the hash functions we have declared in HashFunctions.h and defined in HashFunctions.cpp. You must not edit these files! Instead, in your code, you will have access to the hash_functions global variable, which is a vector containing multiple string hash functions.
Compiling and Running
You can compile your code using the provided Makefile via the make command:
$ make
g++ -Wall -pedantic -g -O0 -std=c++11 -o BloomFilterTest BloomFilterTest.cpp BloomFilter.cpp HashFunctions.cpp
g++ -Wall -pedantic -g -O0 -std=c++11 -o CountMinSketchTest CountMinSketchTest.cpp CountMinSketch.cpp HashFunctions.cpp
If you want to clean up your environment by deleting all the compiled executables, you can simply run make clean:
$ make clean
rm -f BloomFilterTest CountMinSketchTest *.o
Running BloomFilterTest
Here’s an example of what BloomFilterTest should look like when it’s run from the command line:
$ ./BloomFilterTest
1.978
Given a Bloom Filter with m slots that uses k hash functions that is storing n elements, the probability of a False Positive is the following:
$$\epsilon \approx \left(1 - e^{-kn/m}\right)^{k}$$
The number that gets printed by BloomFilterTest is the empirical False Positive probability we compute by running your code divided by this theoretical False Positive probability we compute using k, m, and n. Given an unmodified BloomFilterTest.cpp, we expect this number to be at most 8. If yours exceeds 8, you will receive 0 points.
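As a rough illustration of how this ratio could be computed, the sketch below evaluates the theoretical False Positive probability from m, k, and n and divides an empirical rate by it. The function names are hypothetical and the numbers in the usage note are made up for illustration; the actual logic in BloomFilterTest.cpp may differ.

```cpp
#include <cmath>

// Theoretical False Positive probability for m slots, k hash functions,
// and n stored elements: (1 - e^(-k*n/m))^k.
inline double bloom_theoretical_fp(double m, double k, double n) {
    return std::pow(1.0 - std::exp(-k * n / m), k);
}

// Hypothetical helper: empirical rate divided by the theoretical rate.
inline double bloom_fp_ratio(double empirical_fp, double m, double k, double n) {
    return empirical_fp / bloom_theoretical_fp(m, k, n);
}
```

For example, with m = 1000, k = 3, and n = 100, the theoretical probability is about 0.0174, so an empirical rate of 0.03 would give a ratio of roughly 1.7, well under the limit of 8.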
Running CountMinSketchTest
Here’s an example of what CountMinSketchTest should look like when it’s run from the command line:
$ ./CountMinSketchTest
0.864665
We can design a Count-Min Sketch by selecting parameters ϵ and δ such that, given one of our n elements, the estimated count of the element is at most ϵn larger than the true count with probability at least 1−δ:
$$\hat{c}_x \le c_x + \epsilon n \quad \text{with probability at least } 1 - \delta$$
The number that gets printed by CountMinSketchTest is this theoretical probability we compute using the parameters of the Count-Min Sketch design divided by the empirical probability we compute by running your code. Given an unmodified CountMinSketchTest.cpp, we expect this number to be at most 1. If yours exceeds 1, you will receive 0 points. A value of infinity means that your code never returns an estimated count within this range.
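The ratio described above can be sketched as follows. The helper names are hypothetical, and the sizing helpers use a commonly cited choice of parameters (width ⌈e/ϵ⌉ and depth ⌈ln(1/δ)⌉), which is an assumption here: CountMinSketchTest.cpp may pick its dimensions differently.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical helper: theoretical probability (1 - delta) divided by the
// empirical fraction of estimates that fell within epsilon * n of the truth.
// An empirical fraction of zero yields infinity.
inline double cms_ratio(double delta, double empirical) {
    if (empirical <= 0.0) return std::numeric_limits<double>::infinity();
    return (1.0 - delta) / empirical;
}

// Commonly cited sizing (an assumption about the design, not taken from the tester).
inline std::size_t cms_width(double epsilon) {
    return static_cast<std::size_t>(std::ceil(std::exp(1.0) / epsilon));
}

inline std::size_t cms_depth(double delta) {
    return static_cast<std::size_t>(std::ceil(std::log(1.0 / delta)));
}
```

As a point of reference, 1 − e^(−2) ≈ 0.864665, so the sample output shown earlier would be consistent with δ = e^(−2) and an empirical probability of 1. This is only an observation about the numbers, not a statement about the tester's actual parameters.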