IEEE Workshop on Big Data Governance and Metadata Management (BDGMM 2018)

 


Berlin, Germany

March 19 - 20, 2018

March 19 -- Hackathon

March 20 -- Workshop

In conjunction with
RDA Plenary #11 Meeting
#11 RDA Plenary - Industry Side Meeting

Sponsored by
IEEE Big Data Initiative (BDI)
IEEE Standards Association (IEEE-SA)

Hackathon/Workshop Registration: https://rd-alliance.org/rda-11th-plenary-registration

 

Motivations

This workshop is co-located with the Research Data Alliance (RDA) Plenary #11 meeting. The IEEE Big Data Governance and Metadata Management (BDGMM) is sponsored by the IEEE Big Data Technical Community (http://bigdata.ieee.org) and the IEEE Standards Association (http://standards.ieee.org/index.html), with the goal of studying where there is a need and an opportunity to develop IEEE standards for Big Data governance and metadata management. BDGMM 2018 follows the successful BDGMM 2017 workshop (https://bigdatawg.nist.gov/bdmm2017.html) held at IEEE Big Data 2017 (http://cci.drexel.edu/bigdata/bigdata2017).

Big Data refers to collections of data that are so large, so complex, so distributed, and growing so fast that they are commonly characterized by the 5Vs: volume, variety, velocity, veracity, and value. Big Data is known for unlocking new sources of economic value, providing fresh insights into the sciences, and assisting in policy making. However, Big Data is not practically consumable until it can be aggregated and integrated in a manner that a computer system can process. For instance, in the Internet of Things (IoT) environment, the hardware, software, coding methods, terminologies, and nomenclatures vary widely among data generation systems. Given the variety of data locations, formats, structures, and access policies, data aggregation has been extremely complex and difficult.

For example, suppose a health researcher is interested in answering a series of questions such as: “How is the gene ‘myosin light chain 2’ associated with the chamber type hypertrophic cardiomyopathy? What is the similarity to a subset of the genes’ features? What are the potential connections among pairs of genes?” To answer these questions, the researcher may retrieve information from familiar databases, such as the NCBI Gene database or the PubMed database. In the Big Data era, however, other repositories are very likely to store relevant data as well. We are therefore left wondering:

  • Is there an approach to managing such Big Data so that a single search engine can obtain all relevant information drawn from a variety of data sources, which then act as a whole?
  • How do we know if the data provided is related to the information contained in our study?
To achieve this objective, we need a mechanism that describes a digital resource well enough for it to be understood by both humans and machines. Metadata is "data about data": descriptive information about a particular dataset, object, or resource, including how it is formatted and when and by whom it was collected. With this information, finding and working with particular instances of Big Data becomes easier. Beyond description, Big Data must also be managed effectively, a need partially reflected in the data models known as “NoSQL”. The goal of this multidisciplinary workshop is to bring together researchers and practitioners to discuss methodological, technical, and standards aspects of Big Data management. Papers describing original research on both theoretical and practical aspects of metadata for Big Data management are solicited.

 

Topics

Topics include, but are not limited to:
  • Metadata standard(s) development for Big Data management
  • Methodologies, architecture and tools for metadata annotation, discovery, and interpretation
  • Case study on metadata standard development and application
  • Metadata interoperability (crosswalk)
  • Metadata and Data Privacy
  • Metadata for the Semantic Web
  • Human Factors in Metadata
  • Innovations in Big Data management
  • Opportunities in standardizing Big Data management
  • Digital object architectures and infrastructures for Big Data management
  • Best practices and standards-based persistent identifiers, data type registry structures, and representations for Big Data management
  • Query languages and ontology in Big Data
  • NoSQL databases and Schema-less data modeling
  • Multimodal resource and workload management
  • Availability, reliability, and fault tolerance
  • Frameworks for parallel and distributed information retrieval
  • Domain standardization for Big Data management
  • Big Data governance for data integrity, quality, provenance, retention, asset management, and business intelligence
In addition to the accepted papers, the workshop will have an industry focus through a keynote speaker and hackathon challenges. The hackathon session will explore scalable, interoperable data infrastructure for Big Data governance and metadata management that enables Findability, Accessibility, Interoperability, and Reusability across heterogeneous datasets from various domains, regardless of data source and structure.

 

Paper submission instructions 

This workshop accepts for review only original papers that have not been previously published. Papers should be formatted in the IEEE Transactions journals and conferences style; the maximum camera-ready paper length is ten (10) pages. Submissions must use the following formatting instructions:
8.5" x 11" x 2 (DOC, PDF, LaTeX Formatting Macros)

Please submit your paper(s) to Wo Chang (wchang@nist.gov); all submissions will go through a peer-review process.

Accepted papers will be published on the IEEE BDGMM website, and the organizers are exploring publication in IEEE Xplore. For further questions, please contact Wo Chang (wchang@nist.gov).

Review procedure 

Each submitted paper will be reviewed by three members of the international Technical Program Committee.

 

Hackathon: 24 Hours on Data Mashup (Variety Problem) and Big Data Analytics

Governance and metadata management pose unique challenges with regard to the Big Data paradigm shift. It is critical to develop scalable, interoperable data infrastructure for Big Data governance and metadata management that enables Findability, Accessibility, Interoperability, and Reusability across heterogeneous datasets from various domains, regardless of data source and structure.

Hackathon Track #1: FIESTA-IoT: Experimentation-As-A-Service for Big IoT Testbed Data
Organized and supported by the FIESTA-IoT EU H2020 Project

Problem Statement

FIESTA-IoT provides a platform for IoT testbeds to federate using a common semantic model that builds upon ontologies for sensors and observations, IoT concepts, and spatial and temporal contexts. The model also adopts a taxonomy for sensing devices, quantity kinds, and units, which grows as new testbeds are federated. Data integrity is ensured by a Resource and Observation Validation process, and quality is enforced by a certification suite that testbed providers must pass. In terms of business intelligence, the platform provides a dashboard for testbed monitoring and threshold alerting. Metadata management is a core feature of the FIESTA-IoT platform. All testbeds and their sensor devices must register with the platform by submitting descriptions that define the properties of a “Resource”. All descriptions must comply with the FIESTA-IoT Ontology; once validated, they are stored in the IoT Registry. The IoT Registry is also the point of contact for experimenters to discover “Resources” of interest and retrieve the datasets those Resources generate. Through the IoT Registry’s SPARQL endpoint, datasets can be retrieved in various formats and then reused with other datasets, based on the structure and format of the query response.
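As a rough illustration of the retrieval step, the sketch below builds (without sending) a SPARQL 1.1 Protocol query-via-GET request in Python and uses the HTTP `Accept` header to select a standard SPARQL result format (JSON, XML, CSV, or TSV). It is a minimal sketch under stated assumptions: `ENDPOINT` is a hypothetical placeholder rather than the real FIESTA-IoT registry address, and the query uses the W3C SOSA vocabulary as a stand-in for the actual FIESTA-IoT Ontology terms. The FIESTA-IoT API documentation remains the authority on the real endpoint, any authentication requirements, and the ontology.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholder endpoint; NOT the actual FIESTA-IoT registry address.
ENDPOINT = "https://registry.example.org/sparql"

# Illustrative query using the W3C SOSA vocabulary as a stand-in;
# real queries must use the terms of the FIESTA-IoT Ontology.
QUERY = """PREFIX sosa: <http://www.w3.org/ns/sosa/>
SELECT ?sensor ?property WHERE {
  ?sensor a sosa:Sensor ;
          sosa:observes ?property .
} LIMIT 10"""

# Standard SPARQL 1.1 result media types, chosen via HTTP content negotiation.
RESULT_FORMATS = {
    "json": "application/sparql-results+json",
    "xml": "application/sparql-results+xml",
    "csv": "text/csv",
    "tsv": "text/tab-separated-values",
}


def build_query_request(endpoint: str, query: str, fmt: str = "json") -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request.

    The query text is URL-encoded into the 'query' parameter, and the
    requested result format is expressed through the Accept header.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": RESULT_FORMATS[fmt]}, method="GET")


if __name__ == "__main__":
    req = build_query_request(ENDPOINT, QUERY, fmt="csv")
    print(req.full_url)
    print(req.get_header("Accept"))
    # Sending it would be: urllib.request.urlopen(req).read()
```

Passing the request to `urllib.request.urlopen` would send it. Switching `fmt` among the keys of `RESULT_FORMATS` shows how one query can yield differently formatted responses, assuming the endpoint supports content negotiation for those media types.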

Challenges

Develop a data mashup scheme, based on use cases, that cross-references different datasets from a range of IoT testbeds producing data on smart cities, smart buildings, the environment, maritime, wireless networks, data centers, etc. Apply statistical analysis, visualization, and machine learning tools to analyze the data and develop predictive models for the design and deployment of advanced (experimental) applications, notably applications that leverage data and resources from multiple administratively and geographically dispersed IoT testbeds. Think outside the box and come up with innovative ideas that bring more value out of the data, or choose one or more of the following:

  1. Identify relationships between different Resources, or trends within Resources, using inquisitive analytics.
  2. Assess Resource/Testbed performance and data governance aspects using descriptive analytics techniques.
  3. Forecast and statistically model Resources and their observations to determine future states using predictive analytics.
  4. Infer how datasets that may be providing redundant data could be optimized, using prescriptive analytics techniques.

Datasets 

Datasets are accessible through APIs or the portal, with links to API/portal docs. Available datasets can be found at http://fiesta-iot.eu/index.php/fiesta-testbeds.

Hackathon Track #2: ** CANCELLED ** due to insufficient participation from CLARIN community members. This activity will be deferred to a future event.

Hackathon Team, Computing Environment, and Implementation White Paper

All participants must register via the RDA Plenary #11 main conference website and attend in person. You may register as a team (up to four members per team) or as an individual (we will place you on a team). Each participant must bring their own laptop with all the necessary computing tools; no remote computing resources are allowed. All implementations must be original work. Participating teams are encouraged to submit their implementation approach as a white paper, which will be published as part of the IEEE Big Data Governance and Metadata Management publication three months after the hackathon event.

Evaluation Team 

  • David Belanger, Chair of IEEE Big Data Technical Community, Stevens Institute of Technology
  • Mahmoud Daneshmand, Vice-Chair of BDGMM, Stevens Institute of Technology
  • Kathy Grise, Senior Program Director, Future Directions, IEEE
  • Joan Woolery, Senior Project Manager, Industry Connections, IEEE Standards Association, IEEE
  • Cherry Tom, Emerging Technologies Initiatives Manager, IEEE Standards Association, IEEE
  • Tarek Elsaleh, Systems Developer, University of Surrey
  • Elias Tragos, Project Manager, National University of Ireland Galway
  • Luis Sanchez, Associate Professor, Universidad de Cantabria
  • Ronald Steinke, Senior Researcher, Fraunhofer FOKUS

Evaluation Criteria 

Technical Approach (40 pts)
- Data mashup (20)
- Big Data analytics (20)

Novelty (40 pts)
- Creativity (20)
- Efficiency (20)

Results (20 pts)
- Output content (10)
- Output format (10)

Winners:
- Plaques/trophies for 1st, 2nd, 3rd only
- Additional cash awards* provided by FIESTA-IoT for the FIESTA-IoT Hackathon: 1st: 2000 EUR, 2nd: 1000 EUR, 3rd: 500 EUR
'*' - at the discretion of the Evaluation Team

 

Important Dates

Mar 7, 2018: Due date for full workshop paper submission (Extended) 

Mar 8, 2018: Notification of paper acceptance with copyright forms to authors 

Mar 12, 2018: Deadline for hackathon sign-up

Mar 15, 2018: Send accepted papers and copyright forms to IEEE-SA for review and approval

Mar 19, 2018: Hackathon

Mar 20, 2018: Workshop

Jun 20, 2018: Due date for Hackathon White Paper

 

Program Schedules 

Day-1: March 19, 2018

Time | Topic
08:00 – 08:10 | Welcome, Wo Chang, Chair of IEEE BDGMM, NIST
08:10 – 08:20 | Opening Remarks, David Belanger, Chair of IEEE Big Data Technical Community, Stevens Institute of Technology
08:20 – 10:00 | Hackathon Briefing on use case, datasets, challenges, Q/As
Tarek Elsaleh, University of Surrey (FIESTA-IoT CONSORTIUM), UK
10:00 – 08:00 (next day) | Solving hackathon challenges
Next day 09:00 – 10:30 | Hackathon Presentation and Evaluation (see Evaluation Team and Evaluation Criteria)
Day-2 late afternoon | Award Ceremony

Day-2: March 20, 2018

Time | Topic
09:00 – 10:30 | Hackathon Presentation and Evaluation
10:30 – 11:00 | Coffee Break
11:00 – 11:10 | Opening Remarks, David Belanger, Chair of IEEE Big Data Technical Community, Stevens Institute of Technology
11:10 – 11:40 | Keynote Speaker: Making Open Science Work for Science and Society

Dr. Beth A. Plale, On loan to the National Science Foundation as Science Advisor for Public Access; Professor, Informatics and Computing, Indiana University, US

11:40 – 12:05 | Invited Speaker: Big Data Governance Management

Dr. Ismael Caballero Muñoz-Reja, Associate Professor at UCLM and Training Head of DQTeam, Spain

12:05 – 12:30 | Invited Speaker: Metadata Solutions and Data Sharing Licensing for Big Data

Dr. Jane Greenberg, Alice B. Kroeger Professor; Director, Metadata Research Center; and Associate Department Head, Graduate Affairs, Information Science, College of Computing and Informatics, Drexel University, US

12:30 – 13:30 | Lunch
13:30 – 15:00 | Paper Presentations

Authors...

15:00 – 15:30 | Coffee Break
15:30 – 16:30 | Paper Presentation-10

Authors

16:30 – 16:50 | Hackathon Ceremony

David Belanger and Kathy Grise

16:50 – 17:00 | Announcement of the next BDGMM event, Wo Chang

Keynote Speaker 

Dr. Beth A. Plale, On loan to the National Science Foundation as Science Advisor for Public Access; Professor, Informatics and Computing, Indiana University, US

Making Open Science work for science and society
Open science is built on the premise that research data should be shared and shareable. The technical and social foundations are beginning to be in place to realize open science on a grand scale. Guiding principles such as the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) give solid form to what it means for research data to be shareable. International organizations such as the Research Data Alliance (RDA), now in its 5th year, are forums where consensus solutions necessary for bridging heterogeneity have already emerged. Most funding organizations have open access plans in place, and fundamental research in key data-sharing areas such as data provenance, naming, and reproducibility has had years to decades of work. Training in data sharing is picking up pace.

What will it take to reach the tipping point to broad sharing? In this talk I take stock, and identify social and technical barriers and opportunities in open science. I take short dives into areas of personal expertise in data provenance, persistent IDs, and barriers to data sharing. Through continued commitment and collective effort, open science can prevail as an economically viable contribution to society.

Dr. Beth A. Plale is currently on assignment at the National Science Foundation, where she is serving as Science Advisor for Public Access. In this capacity Dr. Plale works with the scientific community, across the directorates of NSF, and nationally and internationally to advance access to data from public research. Dr. Plale is a Full Professor of Informatics and Computing at Indiana University. She has authored over 150 peer-reviewed publications and has been PI or co-PI on over $50 million in extramural funding from industry, federal, and non-profit organizations.

Professor Plale completed her postdoctoral studies at the Georgia Institute of Technology and received her PhD in computer science from the State University of New York at Binghamton. She holds an MBA, an MSc in Computer and Information Science, and a BSc in computer science with a minor in math. Dr. Plale is the past chair of the Research Data Alliance (RDA) Technical Advisory Board (TAB) and the founding director of the Data To Insight Center and the HathiTrust Research Center, both in the United States.

Invited Speaker 

Dr. Ismael Caballero Muñoz-Reja, Associate Professor at UCLM and Training Head of DQTeam, Spain
Abstract: Forthcoming...

Ismael Caballero holds a PhD in Computer Science from the University of Castilla-La Mancha, where he works as an associate professor teaching Software Engineering and Data Quality Management foundations. His main research interests are Data Quality Management, Data Governance, and Big Data. He has published in several international forums, such as the International Conference on Information Quality (ICIQ), and was one of the General Chairs of ICIQ 2016. He is a cofounder and the Training Head of DQTeam. He was nominated as a national expert by AENOR to work in ISO TC184/SC4/WG13 and WG23 on the development of international standards for data quality, as well as in ISO/IEC JTC 1/WG 9, where he is in charge of the mandate to develop the part on Data Governance for Big Data.

Dr. Jane Greenberg, Alice B. Kroeger Professor; Director, Metadata Research Center; and Associate Department Head, Graduate Affairs, Information Science, College of Computing and Informatics, Drexel University, US

Presentation: Metadata Solutions and Data Sharing Licensing for Big Data
Sharing data can provide tremendous mutual benefits for industry, researchers, and nonprofit organizations. Even so, legal conditions, local policies, and privacy concerns present significant challenges when seeking to share sensitive data. In fact, well-intentioned data sharing plans start out optimistically but frequently fail due to prohibitive legal costs and, ultimately, protracted negotiations. Researchers at Drexel University, Massachusetts Institute of Technology (MIT), and Brown University are addressing these challenges through an NSF Spoke Initiative, “A Licensing Model and Ecosystem for Data Sharing,” connected with the Northeast Big Data Innovation Hub (NEBDIH). This presentation will focus on the project’s licensing model development, specifically the underlying metadata infrastructure. Attention will be given to the system ontology and to the metadata-supported life-cycle features for sharing closed and not necessarily free data. I will also discuss the DataHub prototype software platform, which enforces data sharing conditions and restrictions, mints licenses, and generates relevant metadata to ensure data access and interpretation.

Jane Greenberg is the Alice B. Kroeger Professor and Director of the Metadata Research Center (http://cci.drexel.edu/mrc/) at the College of Computing & Informatics, Drexel University. Her research activities focus on metadata, knowledge organization/semantics, linked data, data science, and information economics. She serves on the advisory board of the Dublin Core Metadata Initiative (DCMI) and the steering committee for the NSF Northeast Big Data Innovation Hub (NEBDIH). She is a principal investigator (PI) on the NSF Spoke initiative, 'A Licensing Model and Ecosystem for Data Sharing,' and the lead PI of the Metadata Capital Initiative (MetaDataCAPT'L) and the Helping Interdisciplinary Vocabulary Engineering (HIVE) linked data project. She is also a co-PI for Drexel's NSF Industry/University Collaborative Research Center (NSF-I/UCRC), Center for Visualization and Decision Informatics (CVDI). Her research has been funded by the NSF, NIH, IMLS, Microsoft Research, National Library of Medicine, Library of Congress, and OCLC Online Computer Library Center, among other organizational and private sponsors. She has received numerous awards and honors for her research and leadership. She is a 2016 ELATE at Drexel® Fellow and, in 2014, she was among the first cohort of Data Science Fellows at the National Consortium for Data Science, Chapel Hill, North Carolina.

 

Workshop Organizers 

General Co-Chairs

Wo Chang
Digital Data Advisor
National Institute of Standards and Technology, USA
Convenor, ISO/IEC JTC 1/WG 9 Working Group on Big Data
Chair, IEEE Big Data Governance and Metadata Management
Email: chang@nist.gov

Priyaa Thavasimani
Chair, IEEE BDGMM Workshop Subcommittee
Newcastle University, UK
Email: P.Thavasimani2@newcastle.ac.uk

Mahmoud Daneshmand (PhD)
Professor, Stevens Institute of Technology, USA
Co-Chair, IEEE Big Data Governance and Metadata Management
Co-founder, IEEE BDIs
Email: mahmoud.daneshmand@gmail.com

Program Co-Chairs

Kathy Grise
Senior Program Director, Future Directions, IEEE Technical Activities, USA
Email: k.l.grise@ieee.org

Yinglong Xia (PhD)
Huawei Research America, USA
Co-chair, IEEE BDI - Big Data Management Standardization
Email: yinglong.xia.2010@ieee.org

Publicity Chairs

Cherry Tom
Emerging Technologies Intelligence Manager
IEEE Standards Association
445 Hoes Lane, Piscataway, NJ 08854-4141
Email: c.tom@ieee.org

 

Technical Program Committee

Name | Organization | Country
Frederic Andres | National Institute of Informatics | Japan
Paventhan Arumugam | ERNET | India
Claire Austin | S&T Strategies, Environment & Climate Change Canada | Canada
Ismael Caballero | UCLM | Spain
Yue-Shan Chang | National Central University | Taiwan
Periklis Chatzimisios | Department of Informatics, Alexander TEI of Thessaloniki | Greece
Hung-Ming Chen | National Taichung University of Science and Technology | Taiwan
Miyuru Dayarathna | WSO2 Inc. | Sri Lanka
Jacob Dilles | Acuant Corp. | USA
Robert Hsu | Chung Hua University | Taiwan
Wei Hu | Nanjing University | China
Carson Leung | University of Manitoba | Canada
Sian Lun Lau | Sunway University | Malaysia
Christian Camilo Urcuqui López | Icesi University | Colombia
Neil Miller | The bioinformatics for Children's Mercy Hospital | USA
Jinghua Min | China Electronic Cyberspace Great Wall Co., Ltd. | China
Carlos Monroy | Rice University | USA
Huansheng Ning | USTB | China
Arindam Pal | TCS Research | India
Lijun Qian | Prairie View A&M University | USA
Weining Qian | East China Normal University | China
Yufei Ren | IBM | USA
Robby Robson | Eduworks Corporation | USA
Angelo Simone Scotto | European Food Safety Authority | Italy
Priyaa Thavasimani | Newcastle University | UK
Alex Thomo | University of Victoria | Canada
Chonggang Wang | InterDigital Communications | USA
Jianwu Wang | University of Maryland, Baltimore County | USA
Shu-Lin Wang | National Taichung University of Science and Technology | Taiwan
Jens Weber | University of Victoria | Canada
Lingfei Wu | IBM Research | USA
Hao Xu | University of North Carolina at Chapel Hill | USA
Godwin Yeboah | University of Warwick | UK
Tim Zimmerlin | Automation Technologies | USA