XEROF

 

xlsgen 4.0.0.64 : Fuzzy string matching (II)


Build 4.0.0.64 of xlsgen advances the previous build by adding even more capability to string matching in xlsgen.

What if we could match strings that differ not only in spaces and punctuation but also in their letters, through substitution, insertion or deletion? Let's look at an example:

        
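[Screenshot: the same sample data in four columns, "Sample data", "Excel match", "xlsgen Fuzzy punct" and "xlsgen Fuzzy letters", showing which rows each mode highlights]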


In this example, we start with a number of variations of the string "abcdef" that contain extra spaces and punctuation as well as swapped letters, missing letters and additional letters.

If we create a conditional formatting in Excel meant to highlight strings matching "abcdef", only the first two rows get highlighted. There is a reason for that: they differ only in case. Add any space or punctuation, let alone letters, and Excel sees all of these as different strings.

If you are using xlsgen with the punctuation mode for fuzzy string matching, then xlsgen highlights many rows, but far from all of them. Specifically, xlsgen does not highlight rows where at least one letter has been swapped, inserted or deleted.
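To make the idea behind a punctuation-tolerant comparison concrete, here is a minimal sketch. It is not xlsgen's implementation, which this post does not describe. The helper names `normalize` and `matchesIgnoringPunctuation` are hypothetical, and the rule "keep only letters and digits, ignore case" is an assumption made for illustration.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of xlsgen): keep only letters and digits,
// converted to lowercase, so spaces, punctuation and case no longer matter.
std::string normalize(const std::string& s)
{
    std::string out;
    for (unsigned char ch : s)
    {
        if (std::isalnum(ch))
            out.push_back(static_cast<char>(std::tolower(ch)));
    }
    return out;
}

// Hypothetical helper: two strings match when their normalized forms are equal.
bool matchesIgnoringPunctuation(const std::string& a, const std::string& b)
{
    return normalize(a) == normalize(b);
}
```

With this sketch, "a-b c.DEF" matches "abcdef", while "abdcef" does not, because swapped letters survive the normalization.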

And if you are using xlsgen with the letters mode for fuzzy string matching, then xlsgen all of a sudden highlights almost all rows. In fact, the only row that is not highlighted is one in which a letter appears three times, which is regarded as too different.
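Tolerance to swapped, inserted and deleted letters is commonly modeled with an edit distance. The sketch below uses the optimal string alignment distance (a restricted Damerau-Levenshtein variant) and accepts a match when the distance stays under a limit. This is an illustration only: the post does not say which algorithm or threshold xlsgen uses. The names `editDistance` and `fuzzyLetterMatch` and the default limit of one edit are assumptions, and the inputs are assumed to be already stripped of spaces, punctuation and case differences.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Optimal string alignment distance: the minimum number of substitutions,
// insertions, deletions and swaps of two adjacent letters turning a into b.
std::size_t editDistance(const std::string& a, const std::string& b)
{
    const std::size_t n = a.size();
    const std::size_t m = b.size();
    std::vector<std::vector<std::size_t>> d(n + 1, std::vector<std::size_t>(m + 1));

    for (std::size_t i = 0; i <= n; ++i) d[i][0] = i; // delete all of a's prefix
    for (std::size_t j = 0; j <= m; ++j) d[0][j] = j; // insert all of b's prefix

    for (std::size_t i = 1; i <= n; ++i)
    {
        for (std::size_t j = 1; j <= m; ++j)
        {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            d[i][j] = std::min({ d[i - 1][j] + 1,          // deletion
                                 d[i][j - 1] + 1,          // insertion
                                 d[i - 1][j - 1] + cost }); // substitution or match
            // Two adjacent letters swapped count as a single edit.
            if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1])
                d[i][j] = std::min(d[i][j], d[i - 2][j - 2] + 1);
        }
    }
    return d[n][m];
}

// Hypothetical matcher: maxEdits is an assumed threshold, not xlsgen's.
bool fuzzyLetterMatch(const std::string& a, const std::string& b, std::size_t maxEdits = 1)
{
    return editDistance(a, b) <= maxEdits;
}
```

Under the assumed limit of one edit, "abdcef" (one swap), "abdef" (one missing letter) and "abcxdef" (one extra letter) all match "abcdef", whereas "abbbcdef" is two insertions away and is rejected.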

Here is how you do this in xlsgen:

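// turn on the letters mode of fuzzy string matching for the workbook
workbook->FuzzyStringMatch = xlsgen::stringmatch_letters;


// conditional formatting on R3C3:R16C3 : highlight cells equal to "abcdef"
xlsgen::IXlsConditionalFormattingPtr cf001s0 = worksheet->NewRange(L"R3C3:R16C3")->NewConditionalFormatting();
cf001s0->CellCondition->EqualTo(L"\"abcdef\"");
// style applied to matching cells : solid background fill
xlsgen::IXlsStylePtr style001s0 = worksheet->NewStyle();
style001s0->Pattern->Pattern = xlsgen::pattern_solid;
style001s0->Pattern->BackgroundColor = 0xFFFF00;
cf001s0->Style = style001s0;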


What is this good for in practice? Such versatile fuzzy string matching turns out to be really useful because letter errors are exactly what happens with manually entered data in the real world. Indeed, letters are often simply swapped, missing or typed once too many. Combined with support for spaces, punctuation and case, this helps you get your job done faster and more accurately.

This is an additional tool in your arsenal, and we look forward to seeing customers take advantage of it to improve how they handle real-world data, which is often full of errors.

Posted on 16-December-2016 09:55 | Category: xlsgen, Excel generator