Big Data Deduplication and Data Matching Using Python
Project: ABS Census Data Enhancement
Andrew Rowe will present the lessons learnt and techniques used to process very large amounts of data from the ABS Census. The Australian Bureau of Statistics (ABS) used Python to investigate data from the 2006 Australian Census. Python is an integral part of the ABS systems that identify duplicate entries and link people in the Census to other ABS collections. You will learn about:
* Handling large data.
* Dealing with confidentiality.
* Multiprocessing techniques.
* Performance tips and tricks.
* Difference between if( 1 < 2 ) and if 1 < 2.
Andrew Rowe is an old senior developer with the Australian Bureau of Statistics with more than 20 years of computing experience and has presented before at PyCon 2012.