IEEE Hackathon on Big Data Governance and Metadata Management (BDGMM Summer 2018)
July 23, 2018
In conjunction with IEEE COMPSAC 2018
Registration for the Hackathon
(FREE for full-time COMPSAC registrants, $200 for the Hackathon event only)
Access to the Competition Datasets
A subset sample dataset brain_sample.zip (under Documentation: one subject, 330MB, freely available)
Full Datasets (49 subjects, 17GB, simple registration is required via 'Login')
This hackathon is co-located with the 42nd IEEE International Conference on Computers, Software & Applications (COMPSAC 2018): Staying Smarter in a Smartening World. The IEEE Big Data Governance and Metadata Management (BDGMM) effort is sponsored by the IEEE Big Data Technical Community (http://bigdata.ieee.org) and the IEEE Standards Association (http://standards.ieee.org/index.html), with the goal of studying where there is a need and an opportunity for developing IEEE Standards for Big Data governance and metadata management. The BDGMM Summer 2018 workshop follows the successful BDGMM Spring 2018 workshop (https://bigdatawg.nist.gov/bdgmm2018.html) and BDGMM 2017 workshop (https://bigdatawg.nist.gov/bdmm2017.html).
Big Data is a collection of data that is so large, so complex, so distributed, and growing so fast that it is commonly characterized by the 5Vs: volume, variety, velocity, veracity, and value. It is known for unlocking new sources of economic value, providing fresh insights into the sciences, and assisting in policy making. However, Big Data is not practically consumable until it can be aggregated and integrated in a manner that a computer system can process. For instance, in the Internet of Things (IoT) environment, there is a great deal of variation in the hardware, software, coding methods, terminologies, and nomenclatures used among the data generation systems. Given the variety of data locations, formats, structures, and access policies, data aggregation has been extremely complex and difficult.
As a concrete example, suppose a health researcher is interested in finding answers to a series of questions, such as “How is the gene ‘myosin light chain 2’ associated with the chamber type hypertrophic cardiomyopathy? What is the similarity to a subset of the genes’ features? What are the potential connections among pairs of genes?” To answer these questions, the researcher may retrieve information from the databases they already know, such as the NCBI Gene database or the PubMed database. In the Big Data era, it is highly likely that other repositories also store relevant data. Thus, we ask:
- Is there an approach to managing such big data so that a single search engine can obtain all relevant information drawn from a variety of data sources and act as a whole?
- How do we know if the data provided is related to the information contained in our study?
Hackathon: 24 hours on Data Mashup (Varieties Problem) Big Data Analytics
Governance and metadata management pose unique challenges with regard to the Big Data paradigm shift. It is critical to develop an interoperable data infrastructure for Big Data Governance and Metadata Management that is scalable and enables Findability, Accessibility, Interoperability, and Reusability across heterogeneous datasets from various domains, regardless of data source and structure.
Hackathon: Brain Data Bank on Video Gaming Enhances Cognitive Skills
Submitted by Subject Matter Expert Dr. David Ziegler
Director of Technology Program, Multimodal Biosensing, Neuroscape, University of California San Francisco
[Reference: Nature. 2013 Sep 5; 501(7465): 97–101, doi: 10.1038/nature12486]
Cognitive control is defined by a set of neural processes that allow us to interact with our complex environment in a goal-directed manner. Humans regularly challenge these control processes when attempting to simultaneously accomplish multiple goals (multitasking), generating interference as the result of fundamental information processing limitations. It is clear that multitasking behaviour has become ubiquitous in today’s technologically dense world, and substantial evidence has accrued regarding multitasking difficulties and cognitive control deficits in our ageing population.
Here we show that multitasking performance, as assessed with a custom-designed three-dimensional video game (NeuroRacer), exhibits a linear age-related decline from 20 to 79 years of age. By playing an adaptive version of NeuroRacer in multitasking training mode, older adults (60 to 85 years old) reduced multitasking costs compared to both an active control group and a no-contact control group, attaining levels beyond those achieved by untrained 20-year-old participants, with gains persisting for 6 months.
These findings highlight the robust plasticity of the prefrontal cognitive control system in the ageing brain, and provide the first evidence, to our knowledge, of how a custom-designed videogame can be used to assess cognitive abilities across the lifespan, evaluate underlying neural mechanisms, and serve as a powerful tool for cognitive enhancement.
Tutorial and Hands-on (No neuroscience background is needed, but willingness to work within a team is preferred)
Subject Matter Experts: Provide a neuroscience overview and a hands-on exercise on the given datasets
- Dr. David Ziegler (Tutorial), Director of Technology Program, Multimodal Biosensing, UCSF, USA
- Dr. Seth Elkin-Frankston (Hands-on), Scientist, Cognitive Systems, Charles River Analytics Inc., USA
Develop a data mashup scheme based on use cases to cross-reference different datasets, and apply statistical analysis, visualization, and machine learning tools to analyze and build predictive models of what changed between the “shoot only” (single-tasking) and “drive & shoot” (multi-tasking) conditions in the EEG (electroencephalography) signals. Think outside the box and come up with innovative ideas that bring more value out of the data, or choose one or more of the following challenge questions:
Beginner Challenge Questions
- What are the strengths vs. limitations of the EEG technology? How do consumer-level EEG headsets compare to laboratory-grade equipment?
- What are the realistic EEG applications in daily life (automatic driving, interactive games, Internet Marketing, etc.)? Provide convincing prototypes (virtual or real).
- Try to conduct an event-related potential (ERP) analysis of the data in one or more conditions. How does this approach compare to that used in the Nature paper (i.e., ERSP, Event-Related Spectral Perturbation, or time-frequency analysis)? Hint: check out the EEGLAB and FieldTrip tutorials.
- Try conducting an ICA decomposition analysis of the data (Hint: this is best done in EEGLAB). How does this approach compare to that used in the Nature paper or the ERP analysis suggested above? What new information can we learn using this approach?
- Would a micro-state analysis be appropriate for the data? What new knowledge might we learn from such an approach?
- What advanced methods (e.g., deep learning, but also others) are available that would help predict post-game performance? Specifically by what mechanisms and by how much?
- Languages: MATLAB, Python, C++.
- EEG Analysis & Visualization tools (all three are free and have excellent tutorials):
  - MATLAB toolbox for M/EEG analysis
    - Largely command-line functionality
    - Particular emphasis on analyses in the time-frequency domain
  - Interactive MATLAB toolbox for processing M/EEG data
    - GUI and command line options
    - Particular emphasis on ICA methods for decomposing data and extracting meaningful components
  - Open-source Python software for visualizing and analyzing M/EEG data
    - Particularly good for source-localization analysis
  - Stand-alone (C++) EEG analysis software
    - Option to conduct micro-state analysis
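For teams starting with the ERP challenge question, the core idea can be sketched without any EEG toolbox: cut fixed-length epochs around each event, subtract the pre-stimulus baseline, and average. The snippet below is a minimal illustration on synthetic data using NumPy only. The sampling rate, event positions, and waveform are hypothetical and are not taken from the competition datasets; reading the actual .bdf files and their event markers is left to the tools listed above.

```python
import numpy as np


def erp_average(signal, onsets, pre, post):
    """Average baseline-corrected epochs of a single-channel signal.

    signal    : 1-D sequence of samples (one EEG channel)
    onsets    : sample indices of the events to time-lock to
    pre, post : number of samples kept before / after each onset
    """
    signal = np.asarray(signal, dtype=float)
    epochs = []
    for onset in onsets:
        start, stop = onset - pre, onset + post
        if start < 0 or stop > signal.size:
            continue  # skip events too close to the recording edges
        epoch = signal[start:stop]
        # Baseline correction: subtract the mean of the pre-stimulus window.
        epochs.append(epoch - epoch[:pre].mean())
    if not epochs:
        raise ValueError("no complete epochs found")
    return np.mean(epochs, axis=0)


# Synthetic demonstration: a fixed evoked response buried in noise.
rng = np.random.default_rng(0)
sfreq = 256                            # assumed sampling rate in Hz (hypothetical)
pre, post = 51, 205                    # about 200 ms before, 800 ms after each event
template = np.hanning(60)              # hypothetical evoked waveform
onsets = np.arange(500, 60000, 600)    # hypothetical event markers
signal = rng.normal(0.0, 2.0, 61000)   # background noise
for onset in onsets:
    signal[onset + 40:onset + 100] += template

erp = erp_average(signal, onsets, pre, post)
peak_latency_ms = (np.argmax(erp) - pre) / sfreq * 1000
print(f"ERP peak at about {peak_latency_ms:.0f} ms after event onset")
```

Averaging suppresses activity that is not time-locked to the events, which is why the evoked waveform emerges even though it is smaller than the noise in any single epoch. A time-frequency (ERSP) analysis would instead compute spectral power per epoch before averaging.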
A subset sample dataset brain_sample.zip (under Documentation: one subject, 330MB, freely available) and the Full Datasets (49 subjects, 17GB, simple registration is required via 'Login') can be downloaded from the IEEE DataPort. The datasets contain paired “single tasking” and “multi-tasking” recordings with the following sets of files:
- Dataset 1 (group of): xxxx_DS_n.bdf where DS = “drive and shoot” or multi-tasking and n = 1,2,3, etc.
- Dataset 2 (group of): xxxx_SO_n.bdf where SO = “shoot only” or single-tasking and n = 1,2,3, etc.
- Dataset 3 (group of): xxxxB_DS_n.bdf where DS = “drive and shoot” or multi-tasking and B = POST training and n = 1,2,3, etc.
- Dataset 4 (group of): xxxxB_SO_n.bdf where SO = “shoot only” or single-tasking and B = POST training and n = 1,2,3, etc.
All data were recorded with BioSemi 64 (files have the .bdf extension), and each .bdf file is about 40MB. Files with the same “xxxx” (subject name) that have the ending ‘B’ are the POST training EEG files for a given subject. Note that not all participants have both a PRE and a POST recording (but most do).
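The naming convention above can be turned into a small helper for a data mashup pipeline. The following is a sketch using only the Python standard library. It assumes that the subject name never ends in an uppercase "B" of its own, and the example file names are hypothetical placeholders, not actual names from the datasets.

```python
import re
from collections import defaultdict

# Pattern for names such as "xxxx_DS_1.bdf" or "xxxxB_SO_2.bdf".
# Assumption: the subject name itself never ends in an uppercase "B".
BDF_NAME = re.compile(
    r"^(?P<subject>.+?)(?P<post>B)?_(?P<task>DS|SO)_(?P<run>\d+)\.bdf$"
)

TASK_LABELS = {"DS": "multi-tasking", "SO": "single-tasking"}


def parse_bdf_name(filename):
    """Return subject, session, task, and run number, or None if no match."""
    match = BDF_NAME.match(filename)
    if match is None:
        return None
    return {
        "subject": match["subject"],
        "session": "POST" if match["post"] else "PRE",
        "task": TASK_LABELS[match["task"]],
        "run": int(match["run"]),
    }


def subjects_with_pre_and_post(filenames):
    """List subjects that have at least one PRE and one POST recording."""
    sessions = defaultdict(set)
    for name in filenames:
        info = parse_bdf_name(name)
        if info is not None:
            sessions[info["subject"]].add(info["session"])
    return sorted(s for s, seen in sessions.items() if seen == {"PRE", "POST"})


# Hypothetical file names that follow the convention described above.
example_files = [
    "s001_DS_1.bdf", "s001_SO_1.bdf", "s001B_DS_1.bdf", "s001B_SO_1.bdf",
    "s002_DS_1.bdf", "s002_SO_1.bdf", "notes.txt",
]
paired = subjects_with_pre_and_post(example_files)
print(paired)
```

Filtering to subjects with both sessions first is useful because any PRE versus POST training comparison is only defined for those participants.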
Hackathon Team, Computing Environment, and Implementation White Paper
All participants must register via the IEEE COMPSAC 2018 Registration and attend in person. You may register as a team (up to four members per team) or as an individual (we will place you on a team). Each participant must bring their own laptop with all the necessary computing tools. No remote computing resources are allowed. All implementations must be original work. Participating teams are encouraged to submit their implementation approach as a white paper, which will be published as part of the IEEE Big Data Governance and Metadata Management publication three months after the hackathon event.
- David Belanger, Chair of IEEE Big Data Technical Community, Stevens Institute of Technology
- Mahmoud Daneshmand, Vice-Chair of BDGMM, Stevens Institute of Technology
- Kathy Grise, Senior Program Director, Future Directions, IEEE
- Joan Woolery, Senior Project Manager, Industry Connections, IEEE Standards Association, IEEE
- Cherry Tom, Emerging Technologies Initiatives Manager, IEEE Standards Association, IEEE
- David Ziegler, Director of Technology Program, Multimodal Biosensing, Neuroscape, University of California San Francisco
- Seth Elkin-Frankston, Scientist, Cognitive Systems, Charles River Analytics Inc
Evaluation Criteria
Technical Approach (40 pts)
- Data mashup (20)
- Big Data analytics (20)
Novelty (40 pts)
- Creativity (20)
- Efficiency (20)
Results (20 pts)
- Output content (10)
- Output format (10)
- 1st Place: $400*
- 2nd Place: $200*
- 3rd Place: $100*
- All team members win a t-shirt
* At the discretion of the Evaluation Team
Jul 16, 2018: Deadline for hackathon sign-up
Jul 23, 2018: Hackathon
Oct 23, 2018: Due date for Hackathon White Paper
July 23, 2018
| Time | Event |
|---|---|
| 08:00 – 08:10 | Welcome, Wo Chang, Chair of IEEE BDGMM, NIST, USA |
| 08:10 – 08:20 | Opening Remarks, David Belanger, Chair of IEEE Big Data Technical Community, Stevens Institute of Technology |
| 08:20 – 10:00 | Hackathon Briefing on use case, datasets, challenges, Q/As: Dr. David Ziegler (Tutorial), Director of Technology Program, Multimodal Biosensing, UCSF, USA; Dr. Seth Elkin-Frankston (Hands-on), Scientist, Cognitive Systems, Charles River Analytics Inc., USA |
| 10:00 – next day 08:00 | Solving hackathon challenges |
| Next day 09:00 – 10:30 | Hackathon Presentation and Evaluation, see Team & Criteria |
| Day 2, late afternoon | Award Ceremony |
General Co-Chairs
Wo Chang
Digital Data Advisor
National Institute of Standards and Technology, USA
Convenor, ISO/IEC JTC 1/WG 9 Working Group on Big Data
Chair, IEEE Big Data Governance and Metadata Management
Chair, IEEE BDGMM Workshop Subcommittee
Newcastle University, UK
Mahmoud Daneshmand (PhD)
Professor, Stevens Institute of Technology, USA
Co-Chair, IEEE Big Data Governance and Metadata Management
Co-founder, IEEE BDIs
Program Co-Chairs
Kathy Grise
Senior Program Director, Future Directions, IEEE Technical Activities, USA
Yinglong Xia (PhD)
Huawei Research America, USA
Co-chair, IEEE BDI - Big Data Management Standardization
Publicity Chairs
Cherry Tom
Emerging Technologies Intelligence Manager
IEEE Standards Association
445 Hoes Lane, Piscataway, NJ 08854-4141