Recipe 4.7. Finding Elements in One Array but Not Another (Perl Cookbook)

4.7. Finding Elements in One Array but Not Another

Problem

You want to find elements that are in one array but not another.

Solution

You want to find elements in @A that aren't in @B. Build a hash of the keys of @B to use as a lookup table. Then check each element in @A to see if it is in @B.

Straightforward implementation

# assume @A and @B are already loaded
%seen = ();                  # lookup table to test membership of B
@aonly = ();                 # answer

# build lookup table
foreach $item (@B) { $seen{$item} = 1 }

# find only elements in @A and not in @B
foreach $item (@A) {
    unless ($seen{$item}) {
        # it's not in %seen, so add to @aonly
        push(@aonly, $item);
    }
}

More idiomatic version

my %seen; # lookup table
my @aonly;# answer

# build lookup table
@seen{@B} = ();

foreach $item (@A) {
    push(@aonly, $item) unless exists $seen{$item};
}

As with nearly any problem in Perl that asks whether a scalar is in one list or another, this one uses a hash. First, process @B so that the %seen hash records each element from @B by setting its value to 1. Then process @A one element at a time, checking whether that particular element had been in @B by consulting the %seen hash.

The given code retains duplicate elements in @A . This can be easily fixed by adding the elements of @A to %seen as they are processed:

foreach $item (@A) {
    push(@aonly, $item) unless $seen{$item};
    $seen{$item} = 1;                       # mark as seen
}

The two solutions differ mainly in how they build the hash. The first iterates through @B. The second uses a hash slice to initialize the hash. A hash slice is easiest illustrated by example:

$hash{"key1"} = 1;
$hash{"key2"} = 2;

is equivalent to:

@hash{"key1", "key2"} = (1,2);

The list in the curly braces holds the keys; the list on the right holds the values. We initialize %seen in the first solution by looping over each element in @B and setting the appropriate value of %seen to 1. In the second, we simply say:

@seen{@B} = ();

This uses items in @B as keys for %seen, setting each corresponding value to undef, because there are fewer values on the right than places to put them. This works out here because we check for existence of the key, not logical truth or definedness of the value. If we needed true values, a slice could still shorten our code:

@seen{@B} = (1) x @B;

4.7. Finding Elements in One Array but Not Another

Problem

Solution

Straightforward implementation

More idiomatic version

Discussion

See Also


4.6. Extracting Unique Elements from a List		4.8. Computing Union, Intersection, or Difference of Unique Lists