cgss15:students:jiayil:polyanalyst:weeklymeetings:2_26_2016 [2016/02/26 17:54] (current)
jiayil created
 +====== Megaputer Intelligence Weekly Meeting 2/26/2016 ======
  
 +===== Entity Resolution -- Sergei =====
 +Goal: to identify different manifestations of the same real-world object:
 +  * different records for the same person in a database
 +  * different ways of addressing (name, email) the same person in text
 +  * different accounts of the same person in a social network
 +  * different photos of the same object
 +  * ...
 +
 +Motivating examples:
 +  * Purging customer databases
 +  * Linking internal data to external sources
 +  * Fraud detection
 +  * Counterterrorism
 +  * Spam detection
 +  * Named entity extraction
 +  * ...
 +
 +Challenges in ER:
 +  * Name ambiguity ("Hilton Paris": a hotel or the celebrity?)
 +  * Errors due to data entry (check contextual patterns)
 +  * Missing values (check different data sources)
 +  * Changing attributes
 +  * Evolving values
 +  * Data formatting (different ways of writing dates)
 +  * Abbreviations
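
As an illustration of the data formatting challenge, here is a minimal sketch of normalizing dates written in different ways to a single form before records are compared. The function name `normalize_date` and the format list `KNOWN_FORMATS` are assumptions made for this example and were not discussed in the meeting; a real dataset would need its own list of formats.

```python
from datetime import datetime

# Hypothetical list of formats seen in the data; extend as needed.
# Assumes US-style month/day order for slash-separated dates.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %B %Y", "%B %d, %Y")


def normalize_date(text):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None
```

Two records whose date fields normalize to the same string can then be compared directly; values that return `None` are best treated as missing rather than as mismatches.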
 +
 +De-duplication and Record Linkage tasks:
 +  * Cluster records that correspond to the same entity
 +  * Select a suitable cluster representative
 +  * Link records that match from different data sources
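
The clustering step can be sketched as follows, assuming that some pairwise matcher has already produced a list of record pairs judged to refer to the same entity. Union-find is one common way to turn such pairs into clusters; it is a choice made for this sketch, not necessarily the method presented in the meeting. The name `cluster_records` and its parameters are hypothetical.

```python
def cluster_records(n_records, matching_pairs):
    """Group record indices 0..n_records-1 into clusters via union-find.

    matching_pairs is an iterable of (i, j) index pairs that some
    matcher (assumed to exist elsewhere) judged to be the same entity.
    """
    parent = list(range(n_records))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in matching_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for i in range(n_records):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Note that this makes matching transitive: if record 0 matches 1 and 1 matches 2, all three land in one cluster even if 0 and 2 were never compared directly.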
 +
 +Levenshtein (edit distance) -> check!!!
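
For reference, a minimal sketch of the Levenshtein (edit) distance: the smallest number of single-character insertions, deletions, and substitutions needed to turn one string into another. This is the textbook dynamic-programming version that keeps only two rows of the table; the function name `levenshtein` is chosen for this example.

```python
def levenshtein(a, b):
    """Return the edit distance between strings a and b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]
```

A small distance relative to string length suggests a data entry error rather than a different entity, for example `levenshtein("Jon Smith", "John Smith")` is 1.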
 +
 +Data enrichment: merge info from duplicate mentions to construct a cluster representative with maximal information
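
A minimal sketch of data enrichment over one cluster of duplicates, assuming each record is a dict of field names to values. The rule used here (skip empty values, prefer the longer string form) is only an assumed heuristic for "maximal information"; the function name `merge_cluster` is hypothetical.

```python
def merge_cluster(records):
    """Build one representative dict from a list of duplicate record dicts."""
    representative = {}
    for record in records:
        for field, value in record.items():
            if value in (None, ""):
                continue  # missing values contribute nothing
            current = representative.get(field)
            # Assumed heuristic: the longer string form carries more information.
            if current is None or len(str(value)) > len(str(current)):
                representative[field] = value
    return representative
```

For example, merging `{"name": "J. Smith", "email": None}` with `{"name": "John Smith", "email": "js@example.com"}` keeps the fuller name and fills in the email that the first record lacked.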
 +
 +Relational & Multi-entity ER
 +
 +Measure similarities (what is the probability?):
 +(a,b) = /log(frequency)
 +
 +Some nodes in less than a month?!
cgss15/students/jiayil/polyanalyst/weeklymeetings/2_26_2016.txt · Last modified: 2016/02/26 17:54 by jiayil
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International