Michael Wolraich

    I Sorted Hillary’s Email

    When Hillary Clinton released emails from her personal account last week, many assumed that her attorneys had personally reviewed the messages before sending them to the State Department, but that’s not what happened. As detailed in her press statement, the review team used keyword searches to automatically filter over 60,000 messages, flagging about half as work related.

    “I have absolute confidence that everything that could be in any way connected to work is now in the possession of the State Department,” Clinton declared.

    I’m afraid that I don’t share her confidence, and I speak from experience. Twenty years ago, I used the same method to sort the Clinton administration’s email communications, including those of First Lady Hillary Clinton. It failed miserably.

    Read the full story at New York Magazine


    Comments

    So cool!  And to think I knew you when. . .

    Congrats.  Really.


    You still know me when


    What a headline! Great job.


    Michael that's just fabulous.  Can't wait to read it!  Nice work.

     


    Excellent piece. Here's my 2¢:

    On one hand, we've made a lot of progress in natural language processing in the last 20 years. Skype will now automatically translate speech from English to Spanish, relying on context to know how to perform this translation. IBM's Watson is another notable success. That was nearly 10 years ago, and the field continues to make great progress.

    On the other hand, I doubt the IT team that failed to ensure that Clinton used work e-mail for official business will be using anything more sophisticated than the keyword search you mention in your article.


    Your 2 cents = my 2 cents


    Good article, Michael, and an interesting perspective on what may or may not have been done with her emails. But the simple fact is that no one will ever be sure if she deleted something she shouldn't have - including Clinton herself. Whatever the sorting method, she didn't go through each one deciding what was to be turned over. The question of pertinence is subjective. Had she given the server to the State Dept. before any sorting or deletions and some bipartisan folks tediously examined each one, would they have ever reached a consensus?

    Whether anyone likes it or not (because, frankly, there's little reason to), at some point we just have to trust that she didn't do anything that will harm the country or the citizens of it. If we can't do that much, the problem isn't email.


    This is the problem... her detractors and opponents will now insist that she was specifically covering up emails that would have shown how the Muslim Brotherhood controls her chief of staff, foreign governments bribe her through the Clinton Foundation and she still laughs about getting away with killing Vince Foster, who was a spy. All malarkey, but that's where this is heading.


    All of that and more would be waiting in the wings anyway, but I agree that this gave them something to shoot besides blanks. She's Hillary Clinton. Her wealth of political experience that makes her such a viable candidate carries with it heavy baggage - a clean slate she's not.

    As an aside, whose bright idea was it to dig up James Carville?


    Wouldn't laughing about getting away with killing Vince Foster properly fall under private e-mail? I mean, I doubt she was doing that as an official function of her position as Secretary of State.


    My thinking about this is a little different. It's not about whether there's some smoking gun hidden on her email server. The point is that she broke the rules, rules that have been in place since Bill was president. Given the violation, it's not sufficient for her to wave her hands and say "we have a great process, all good now." She is obligated to demonstrate that her process is effective. And if she can't, she should turn over the emails to the State Department. Human decisions involve subjectivity, sure, but ambiguity is not the same as outright error.


    There's no doubt that she will be called upon to get into the weeds regarding how the sorting was done and by whom. I absolutely agree that her ambiguity on that count cannot be accepted at face value. That would be true even if she was just the former SOS and not a presidential candidate.

    But the deed having been done, what would be accomplished by turning over the server now? Unless you think the deleted emails can be recovered? When it all comes out in the wash, no matter the transparency, unanswerable questions will remain. Will that hurt her? I guess that depends on who you ask.


    Okay, so you 'read' through 55,000 emails.

    How many were associated with pizza or Chinese?

    Must be one hell of a staff.


    I hear that one of the key words is "General Tso."


    We went to war without absolute confidence it was legal or necessary, tortured without absolute confidence it was legal or worked, and pretend to have confidence that our Supreme Court is an apolitical arbiter of the law.

    And we are supposed to worry about 'absolute confidence' we have all the Hillaremail?

    ........I'm waiting for Hillaremail #4 and #5. Somebody get on it.


    10. Developments in Litigation Strategy: Keep an eye out for emerging strategic nuances in litigation strategy. ... The court found that the defendants’ review process was defensible but plaintiffs’ argument is one to watch for in the future. In the world of keyword searches, counsel could negotiate a list of words that connect directly to documents that are being produced. The list of keyword search terms was critical to defining the agreed-upon scope of discoverable evidence. When complex linguistic algorithms in predictive coding are replacing the deployment of simple keyword searches, the nature of litigation strategy in discovery will invariably be impacted.

    Source: Association of Corporate Counsel (PDF)


    Sorry, I don't share your pessimism. 1) Even AltaVista and Dogpile in 1998 were horrible; Google came along a year or so later and blew them all away. 2) Daily I use Outlook searches that auto-index, getting much, much better results than Thunderbird; you do get what you pay for. 3) I've worked with speech-to-text where the results were quite good for indexing/metadata and awful for full verbatim transcription. 4) For a task like indexing, archiving, and triaging private from public, we probably don't need to parse humor and other advanced qualities. Unless it's tweets from Hillary, most emails will be tagged accurately based on recipients, an extended, trained list of relevant keywords (finding synonyms and related categories automatically), and anything in those threads. That process is simply far more advanced now than the approaches and processing power of 1994. Sure, someone might have done a dumb search, but anyone computer-trained and seriously trying probably got most of them.

    Here's the improvement over a year and a half in Google's voice recognition/voice search accuracy (a Siri competitor): from 61% to 84% for "ability to answer questions," and from 81% to 93% for "heard correctly."

    These kinds of improvements have been going on continually for Google search (along with features designed to make them more money). Presumably Hillary's IT aide could just license a Google appliance or make some other obvious IT selection.

     


    That was interesting. It gives an insight into why the government expects its officials to delete their personal emails themselves: the technology to do that automatically isn't in place yet.


    The technology can do it, but the robot fingers kept busting the "Del" key.

    Government worker fingers like the rest of their bodies are softer, flabbier and well-adapted to meaningless repetition.


    People, my dreams have come true. I've been quoted at Breitbart.


    Right on!  Michael 


    Don't be so modest ... the entire article is about you! ;-)


    "Government data processing veteran..."

    Our lives are rewritten on the web, clip job by clip job. Ah, journamalism.


    That's Big Journalism to you sir!


    Sorry, but it's even worse. Breitbart? No, people don't write business emails abstractly. I'm sure Hillary's memos are very to-the-point with little room for ambiguity - that's how successful organizers do it, and that's her top skill. But Breitbart runs with the scare possibilities, things that might've happened. Sorry, but you just fed a troll big time.

    We should all share our experiences no matter what Breitbart might do with them.


    Hear, hear 

    What's next, someone will claim a letter undermined a favored position, giving aid and comfort to the opponents and the writer(s) should be charged with.... treasonous activities? 


    It's not like any Senators have been hanged over it.


    "We should all share our experiences no matter what..."

    That's a laugh. No.

    With our current cut throat political climate, and the eternal Republican search for non-crimes or 'serious Constitutional abuses' to distract the nation from their latest nefarious war/budget/kill SS/Medicare schemes, with a sole goal, not to seek the truth, but to impeach Democrat Presidents or attack Democrat leaders, some discretion would be called for before feeding the scandal and rumor mongers of the right with 'stuff you experienced' a decade or more ago.

    And MW gave his opinion on the reliability of the current email search, an opinion which frankly was irrelevant because it had no basis or connection to the current endeavor.


    You haven't read "The Unbearable Lightness of Being," have you? Kundera deals with this PoV in depth.


    Obviously, any published article can be snagged and reinterpreted by anyone - blaming the author is misdirected indignation. I wonder, though, how many other sites (if any) will run with it, and if Michael will be contacted for comment.


    I only blame him for overstating the relevance of a 20-year-old attempt at search.

    As noted, even my Outlook email search these days, which searches PDF and Word attachments as well, is much more powerful than a planned consultant project of 1994. Breitbart will conveniently ignore this and push straight on to the innuendo.

    [there's a chance Hillary hired a 17-year-old computer newbie to archive her emails, or she might have hired a professional or used one from State - I'm not willing to guess, or if I did, I'd err on the side of the latter]


    As would I. And as Michael pointed out, she would be well served (no pun intended) by clarifying just that. Maybe take a bit of air out of the innuendo.


    Well, I guess I'm with Hillary, and I'm surprised that the State IT department is so incompetent.

    But State Department spokeswoman Jen Psaki said Friday that simply including other State Department officials on an email is by no means a guarantee that these emails would be preserved. Instead, she said that the vast majority of State Department officials are expected to voluntarily archive their work-related emails.

    Psaki at one point said this was “voluntary” for State Department officials, and later said it was their “responsibility” to do this. But she said clearly that there was no automatic way to capture these emails, and it would depend on the actions of the officials involved.

    Psaki said that since 2013, after Clinton had left, the secretary’s emails have been automatically archived. She said this same system has only been in place for dozens of other senior officials since last month.

    WTF?  It's been how many years since this dictum to learn how to archive emails?  That they only did the Secretary's for the last 2 years, but now learned how to archive a few dozen more *last month*? Let's outsource this to the Chinese, who certainly know how to hack US email and archive it. Too bizarre.

    Another article thinks that a misspelled "Benhgazi" (instead of "Benghazi") won't be caught by a keyword search. These reporters don't understand fuzzy search, which has only been around for a couple of decades.
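To illustrate the fuzzy-search point, here is a minimal Python sketch, not any particular product's algorithm: it treats a keyword as matching a word when their Levenshtein edit distance is at most a small threshold. The function names and the `max_edits=2` default are illustrative assumptions. Note that plain Levenshtein counts the swapped "hg" in "Benhgazi" as two edits; the Damerau variant would count a transposition as one.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def fuzzy_contains(text: str, keyword: str, max_edits: int = 2) -> bool:
    """True if any word in text is within max_edits of keyword (case-insensitive).

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    kw = keyword.lower()
    words = re.findall(r"[a-z]+", text.lower())
    return any(levenshtein(word, kw) <= max_edits for word in words)
```

Here `fuzzy_contains("Talking points on Benhgazi attached.", "Benghazi")` returns `True`, whereas an exact substring test for "benghazi" misses it. The threshold is a trade-off: two edits is generous for short words like "state", which would then also match "stale" or "slate", so a real system would likely scale the threshold to keyword length.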

    More on screwy State archiving - this isn't a Hillary scandal, it's a US government scandal - we're too stupid to chew gum, much less anything else along with it. 20 years to figure out digital medical records? Hell, we can't even save a few thousand emails. I used to process thousands of news releases - it was pretty damn simple, even 20 years ago - and even made them searchable with a bit of HTML and MySQL (yes, the same crappy 1994 approach that Michael used; it was good enough for that task at that time, not for 2015). What's up with the inventors of the internet?

     


    What's up with the inventors of the internet?

    Maybe they all went to work for the NSA?

    Go to court and request the millions of metadata records the NSA has accumulated and stored on Hillary's conversations, texts, etc. (What was her location during the Benghazi attack?)

    Credit card and cell phone records: were encryption methods used to avoid detection?

    If they can track Merkel, why not Hillary? Or is she so powerful that she's excluded?


    "And as Michael pointed out, she would be well served (no pun intended) by clarifying just that. " - yes, and she should clarify that she wasn't logging on to a BBS system using a 1200 baud modem on a Commodore 64.

    She had just finished a presidential campaign where presumably she had to coordinate communications via email lists/Facebook/Twitter with millions of donors, the press, and event coordinators, along with her normal Senate work and private life. Yet when it comes to sorting a few thousand emails from her stint at State, we have to check that she wasn't using 20-year-old sorting algorithms with basic AND/OR/XOR/NOT logic and no fuzzy-logic/near-miss capability? As someone pointed out, the legal search systems used every day in law offices are far past this in capability and accuracy. Perhaps she didn't use one, but if someone at State or the Inspector General cares, they can ask for it and it can be done in a morning.

    I regularly get documents in Chinese or German

    Here's a nice comment from the New Yorker site

    beihai 20 hours ago

    @DevilsPrinciple @beihai again, this is nonsense. Do you even use email at all? All of my email for the past 18 years is sorted by category. It just took me 2 seconds to find every email I sent one friend and received from one friend and found that they were all sorted under the category of friends. If one were misfiled it would take me a second to re-sort it to the proper category.

    50 thousand emails are really not that much.

    I must have around that many, if not more, over 18 years.

    The assumption that Hillary was sitting around doing email all day is ludicrous as well. Most correspondence was probably handled by her staff; I imagine even her personal emails were mostly dictated. You seem to think it was just her clickity-clacking away on her keyboard day in and day out.

    Pre computer I handled filing at my old printing company. One of my jobs was to make sure everything was where it belonged. I could find one invoice for one company from 5 years previously if I needed to. And I am talking about well over 50,000 invoices. Later on when everything was computerized thousands and thousands of invoices were available in a second, just type in the company name or the invoice number and there it was.

    I highly doubt she was writing to the undersecretary for Asian affairs about her daughter's wedding, would you agree? You don't think she knows the email addresses of her personal friends? Do you really think her email to her cousin Helen was going to get confused with an email about the Indian PM?


    I have to say that although you're right that there are exceptional natural language processing tools out there, very few people use them, and the statements she's issued so far do not lead me to believe she's part of the exception and not the rule.

    I don't want to crucify her. She's not alone in her mistakes, and many of the people attacking her now no doubt received e-mails from her and had ample opportunity to correct the mistake when it was still correctable. However, I think there's an important object lesson to be learned. If she and her team were really as tech savvy as you're giving her credit for, she wouldn't be in this position in the first place.


    NLP is a red herring - you just need decent search with indexing and near misses. This is all quite standard.

    Decent search requires NLP. Otherwise, you get either a lot of false positives or a lot of false negatives.


    Please give an example of what you think in a State Department email requires heavy NLP.

    Sure, the word "state" probably requires context, but Libya/Benghazi/Baghdad/Pakistan do not, nor do "global warming" or "sanctions." And neither does anything to an embassy/.gov address or to a list of X foreign-country domains.

    Good chance "we wuz jammin w Lady Gaga for Bill's birthday" isn't going to be picked up as work-related.
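The rules described in the comment above (flag anything to or from a .gov or foreign-embassy domain, plus a short list of unambiguous terms) can be sketched in a few lines of Python. This is a hypothetical illustration, not the method Clinton's team used: the `Email` type, the domain lists, and the term list are all assumptions, and `example-embassy.org` is a placeholder for the comment's "list of X foreign-country domains."

```python
import re
from dataclasses import dataclass


@dataclass
class Email:
    sender: str
    recipients: list[str]
    body: str


# Hypothetical rule set, drawn from the examples in the comment; not State's actual rules.
GOV_SUFFIXES = (".gov",)                    # e.g. state.gov
FOREIGN_DOMAINS = {"example-embassy.org"}   # placeholder for a foreign-domain list
WORK_TERMS = ["libya", "benghazi", "baghdad", "pakistan", "sanctions", "global warming"]


def domain(address: str) -> str:
    """Lower-cased part of an address after the last '@'."""
    return address.rsplit("@", 1)[-1].lower()


def is_work_related(msg: Email) -> bool:
    """Flag by counterpart domain first, then by whole-word term match in the body."""
    for addr in [msg.sender, *msg.recipients]:
        d = domain(addr)
        if d.endswith(GOV_SUFFIXES) or d in FOREIGN_DOMAINS:
            return True
    text = msg.body.lower()
    return any(re.search(rf"\b{re.escape(term)}\b", text) for term in WORK_TERMS)
```

With these rules, a note from an aide at a .gov address is flagged regardless of content, while "we wuz jammin w Lady Gaga for Bill's birthday" between two personal addresses is not. Checking addresses before content means the rule never has to interpret the body of a message sent to or from an official domain.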


    What keywords would you use to separate private from official e-mail? Is "operation" going to be about Bill's hernia operation, or about a military operation? "Expense" and "costs" are other interesting words. There are so many ways to do this wrong, and I've witnessed several of them. You seem to think this would be simple, yet I don't see that you have any experience in exhaustively sorting e-mails (or other documents) into two piles. Genghis gave his experience - what exactly do you think has changed since his experience and now, if not for NLP? Are keywords somehow substantially different?
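The ambiguity problem can be made concrete with a toy evaluation in Python. The keyword list uses the words named above ("operation," "expense," "costs"); the four labeled sample messages are invented for illustration, not drawn from any real archive. A whole-word keyword filter produces one false positive (the hernia operation) and one false negative (a work email that contains none of the keywords).

```python
import re

# Ambiguous keywords named in the comment; the list itself is illustrative.
KEYWORDS = ["operation", "expense", "costs"]

# Toy, hand-labeled examples invented for illustration: (body, is_work).
SAMPLES = [
    ("Bill's hernia operation went fine; the costs were covered.", False),
    ("The operation in the region is on schedule.", True),
    ("Travel expense report for the Cairo trip attached.", True),
    ("Draft remarks for the UN session are ready.", True),
]


def is_flagged(body: str) -> bool:
    """True if any keyword appears as a whole word (case-insensitive)."""
    text = body.lower()
    return any(re.search(rf"\b{re.escape(k)}\b", text) for k in KEYWORDS)


def error_counts(samples: list[tuple[str, bool]]) -> tuple[int, int]:
    """Return (false positives, false negatives) against the hand labels."""
    fp = sum(1 for body, work in samples if is_flagged(body) and not work)
    fn = sum(1 for body, work in samples if not is_flagged(body) and work)
    return fp, fn


print(error_counts(SAMPLES))  # (1, 1)
```

Adding more keywords to catch the missed work email tends to pull in more private messages, which is the trade-off between false positives and false negatives discussed in this thread.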


    What changed is that the software substantially includes different models on its own, without Michael and his boss having to discover the AND operator. It ain't that hard with 2015 software, presuming someone took it an eensy-weensy bit seriously, but we can wring our hands and just guess that the clock stopped in 1994 and wait for Hillary to deliver something that will gratify us. Of course, having been through Rose/Whitewater and the Benghazi hearings, I wouldn't be surprised if her lesson learned is that people will never be satisfied until she simply leaps off a cliff, and that's a pleasure she's not about to give them.


    First of all, how do you think those different models work, if they're not using NLP? Secondly, have you actually used the default models on e-mails to do extensive sorting? Yes, they're better, but they're still going to get lots of false positives and false negatives. It requires a certain amount of expertise to use tools like those correctly, and even then there will be false positives and false negatives. I'm surprised you're discounting the need for intelligence in the process. And, again, don't get all hyperbolic. I'm not asking for her to self-immolate. I'm just saying that this is not an easy process (which is why I've not been calling for any persecution of her personally), and the lesson I would like to be learned is that good IT should be put into place. Why is that at all controversial?


    Yikes.. that's a bummer..



    We Americans loved the salaciousness of "Deep Throat." I wonder if such a name would have been entered in the keyword search and allowed to slip by as personal email?


    You can certainly flag unknown names that repeat in a regular search.

    Additionally there would be ties to words like "meeting" that might conceivably be entered for gov business searches.


    Maybe she uses an Enigma machine, that changes code every 3 hours?

    We need to find the key.


    Great article, who knew! 

    I used Outlook just yesterday at work to find some info I needed. Of course, I got all kinds of unrelated emails as well. I don't believe this hurts her in the long term; there have been several other cases of officials doing exactly what she did.

    I guess it's that at this point Clinton is an undeclared candidate and there isn't much to talk about yet. Rest assured, this issue will disappear and we will move on to the next thing, whatever that is, to slam Clinton. However, she will survive, and she will be President if that is what she wants to be.


    Very cool article.  Congrats!  And getting a mention on Breitbart is the end all be all. :)

     


    (OT for a sec ...)

    Synch, what happened to your post on your Kickstarter project? I was going to forward it to someone.


    We closed our Kickstarter.  In hindsight I should have just edited that post to let people know.  I was just going around taking things down everywhere and rushed through it.

    We decided that we are going to print the calendar next year for 2017.  We had several things working against us such as my daughter getting hit by a car the day before we launched.  She was not seriously injured thankfully but she was out of commission as far as contributing and being involved which created ripples with the other people working on the project and this seems to be the best course of action.  We are still going to create the calendar one way or another next year.  

    It's kind of mandatory having a daughter who is graduating with a degree in Environmental Science having been confronted with the stark realities we face over the past few years. It just makes it impossible for me not to do as much as I can.  I am for example tonight meeting with a networking group where I will be promoting a community recycle/composting party where we invite an expert to review with us what can be recycled, how we can go about getting composting services in our area, and take questions.  I will use my time for that and  just hand out business cards at the end.  My daughter is planning to put in some effort this summer on recycling in Denver because it came to her attention that most apartment buildings in the city are not recycling.  

    Thanks for the support. 

