Quality Crowdsourcing

Crowdsourcing is a distributed sourcing and computing model in which individuals or organizations leverage the intelligence of the crowd to solve problems or obtain new services and ideas. Although this is a quick, inexpensive and effective way to get answers to tasks that are difficult for computers but easy for human beings, the quality of responses from a crowd is always a concern.

Poor quality in crowdsourcing results can arise from various situations:

  • Misalignment of interests between contributing participants and system stakeholders.
  • Unreliable results from cheap but scalable online crowdsourcing sites.
  • Heterogeneous quality of results owing to contributors' dissimilar levels of expertise and experience.
  • The abundance of low-quality work, and the fact that verifying the correctness of this work is often far more expensive than performing the task itself.
  • Carelessness, misunderstanding of the questionnaire, or even malicious intent (cheating, fraud, etc.)
  • Limited evaluation of the questionnaire by the stakeholders.

Enhancing sustainability and robustness, and maintaining the quality of responses, is especially important today, since critical organizations such as health organizations and relief agencies now use crowdsourcing to find solutions to their problems. Expert and peer review, majority voting, machine learning, and game theory are just some of the practices used to control quality in crowdsourcing today. These measures, however, have not achieved the desired quality of results, according to literature surveys conducted by experts.
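To make one of the practices above concrete, majority voting simply takes the most frequent answer from the crowd as the accepted one. A minimal sketch (the task labels and vote data below are hypothetical, not from the research described here):

```python
from collections import Counter

def majority_vote(labels):
    """Return the most common label among crowd answers (ties broken arbitrarily)."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical crowd answers classifying one security report
answers = ["phishing", "phishing", "spam", "phishing", "spam"]
print(majority_vote(answers))  # -> phishing
```

The weakness the article points at is visible even here: a careless or colluding majority wins regardless of actual correctness, which is why richer measures are being explored.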

Lalit Mohan Sanagavarapu, a senior research member in the Software Engineering Research Center, IIIT-H, is currently researching a Requirements Engineering-based approach of Completeness, Consistency and Correctness (3Cs) to achieve the desired quality control in Information Security-related responses.

The 3Cs in any response are measured (CSQuaRE) in terms of the degree of coverage with reference to the knowledge base, the contributor's credibility in the domain, and alignment with an evolving domain ontology.

  • Measuring Completeness: Since obtaining complete information is a never-ending problem in real-world scenarios, the completeness of a response is always measured with reference to an extracted knowledge base (KB), and is termed Adequate Completeness (ACP).
  • Measuring Consistency: Consistency is the measure of conflict-free sentences in the response and their determinism with respect to the objective (the question). Based on an ontology that evolves as domain content grows, the consistency measure (ACN) identifies conflict-free tuples (Concept + Relationship + Concept) and factors in the credibility of the contributor, based on the history of their past contributions.
  • Measuring Correctness: Correctness is the degree to which the response contains the conditions and limitations for the desired capability (the question). Hence, response correctness is not necessarily binary (Yes/No or True/False) but a degree of match or similarity. The Adequate Correctness (ACR) of a crowdsourced response is measured from its semantic similarity to the extracted knowledge base and its fit to the question type (What, Why, When, Where, Who and How).
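The article does not give formulas for the three measures, but their intent can be sketched under simplifying assumptions: token coverage for completeness, ontology-tuple lookup weighted by credibility for consistency, and set similarity (here Jaccard, standing in for the semantic similarity the research actually uses) for correctness. All function names and inputs below are hypothetical illustrations, not the CSQuaRE implementation:

```python
def acp(response_tokens, kb_tokens):
    """Adequate Completeness: fraction of KB terms covered by the response."""
    kb = set(kb_tokens)
    return len(kb & set(response_tokens)) / len(kb) if kb else 0.0

def acn(response_tuples, ontology_tuples, credibility):
    """Adequate Consistency: share of (concept, relation, concept) tuples
    found in the evolving ontology, weighted by contributor credibility (0..1)."""
    if not response_tuples:
        return 0.0
    consistent = sum(1 for t in response_tuples if t in set(ontology_tuples))
    return credibility * consistent / len(response_tuples)

def acr(response_tokens, kb_answer_tokens):
    """Adequate Correctness: Jaccard similarity between response and a KB answer."""
    a, b = set(response_tokens), set(kb_answer_tokens)
    return len(a & b) / len(a | b) if (a | b) else 0.0
```

Each measure lands in [0, 1], so a fully covered, conflict-free, well-matched response from a credible contributor scores near 1 on all three.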

The approach involves assigning a metric so that response seekers and viewers can see the most relevant responses first. The proposed approach will then be demonstrated on Information Security-related questions and answers on a crowdsourcing platform, and its results will be compared with existing quality-control techniques and with feedback from security experts.
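The article does not specify how the three measures are combined into the ranking metric; one plausible, purely hypothetical scheme is a weighted sum used to sort candidate responses (the equal weights and sample scores below are assumptions):

```python
def quality_score(acp, acn, acr, weights=(1/3, 1/3, 1/3)):
    """Combine the 3C measures into one score; equal weighting is an assumption."""
    w_p, w_n, w_r = weights
    return w_p * acp + w_n * acn + w_r * acr

# Hypothetical responses with precomputed (ACP, ACN, ACR) measures
responses = {"r1": (0.8, 0.6, 0.7), "r2": (0.5, 0.9, 0.4)}
ranked = sorted(responses, key=lambda r: quality_score(*responses[r]), reverse=True)
print(ranked)  # -> ['r1', 'r2']
```

In practice the weights themselves could be tuned against expert feedback, which is the comparison the research proposes.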

