Recently, CodeSignal and HackerRank published articles (CodeSignal post, HackerRank post) arguing that ChatGPT-based cheating is either ineffective on their assessments or easily detected through proctoring features or identifiable traces in ChatGPT outputs.
However, anyone who has tried GPT (particularly GPT-4) can see that platforms like CodeSignal and HackerRank may not effectively detect cheating via ChatGPT. Their primary defense against cheating revolves around stringent proctoring tools, which, as we will discuss later, might not be the most effective solution. Rather than addressing the core issue – the ease with which candidates can generate answers using ChatGPT – these platforms focus on detecting cheaters.
Generally, coding platforms are likely to adopt one of two extreme approaches in response to ChatGPT:
- Strictly proctor assessments and attempt to detect ChatGPT usage.
- Embrace ChatGPT and refine assessments to emphasize skills that will be relevant for future developers.
By focusing on proctoring and limiting ChatGPT usage, these platforms will become even more outdated and impractical, emphasizing obsolete skills rather than acknowledging the changing landscape of developer skills. It seems that CodeSignal and HackerRank are leaning towards the first approach, expressing confidence in their ability to prevent ChatGPT-assisted cheating and detect its usage.
Is It Easy to Get an Answer for a CodeSignal or HackerRank Problem?
The reality is that it is quite easy. Even if platforms follow HackerRank's suggestion to remove questions that can be solved in only a few lines of code, GPT-4's impressive token limit of ~32,000 tokens (approximately 25,000 words) makes it hard to imagine what the alternative would be. This is particularly true for HackerRank-style questions, which typically involve a prompt and require coding from scratch, without exploring a codebase. It's difficult to envision a feasible approach where candidates are asked to write a 25,000-word program.
CodeSignal cites a statement from StackOverflow in December 2022 that it will not accept answers generated by GPT. However, this is a misleading comparison: StackOverflow focuses on solving recent, unique problems, whereas CodeSignal assessments are more akin to LeetCode-style questions that don't require up-to-date knowledge. Furthermore, GPT-4 has significantly improved since December 2022. As I mentioned in my recent blog post, the knowledge cutoff of GPT is a major limitation that is relevant to StackOverflow but not so much to LeetCode-style questions.
Why Detection is Not Feasible
CodeSignal claims that ChatGPT leaves identifiable traces, such as over-commenting, unnecessary statements, and code incompatible with their IDE. However, those who have tried GPT-4 know that these examples are inaccurate and already outdated. Additionally, it is possible to create clever prompts that make it difficult to detect whether GPT-4 generated the code. One such prompt provides a sample of the user's own code and instructs ChatGPT to generate its solution in a similar style; I generally use a prompt along these lines to elicit unique solutions from ChatGPT.
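A style-matching prompt of this kind can be assembled programmatically. The sketch below is hypothetical wording, not my exact prompt; the function name and phrasing are illustrative assumptions:

```python
# Hypothetical sketch of a style-matching prompt (illustrative wording,
# not the exact prompt referenced above). The candidate's own code sample
# is embedded so the model imitates its naming, spacing, and commenting
# habits instead of producing "ChatGPT-looking" code.

def build_style_matching_prompt(sample_code: str, problem: str) -> str:
    """Assemble a prompt asking the model to solve `problem` while
    mimicking the style of `sample_code`."""
    return (
        "Below is a sample of my code. Study its naming conventions, "
        "commenting habits, and formatting, then solve the problem in "
        "exactly that style. Do not add extra comments or boilerplate.\n\n"
        f"My code sample:\n{sample_code}\n\n"
        f"Problem:\n{problem}"
    )

if __name__ == "__main__":
    sample = "def addNums(a,b):\n    return a+b  # quick add"
    print(build_style_matching_prompt(sample, "Reverse a linked list."))
```

Because the model anchors on the provided sample, the traces CodeSignal describes (over-commenting, boilerplate) largely disappear from the output.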
This approach results in a more personalized and unique output, making detection even more challenging. Different prompts for the same LeetCode question can produce outputs with drastically different coding styles.
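As an illustration (a reconstruction of the kind of variation involved, not the original screenshots), here are two valid solutions to the classic Two Sum problem in markedly different styles, such as different prompts can elicit:

```python
# Illustrative only: two stylistically different solutions to the classic
# "Two Sum" problem, of the kind different prompts can produce.

# Style 1: terse, camelCase, no comments, one-pass dictionary lookup.
def twoSum(nums, target):
    seen = {}
    for i, n in enumerate(nums):
        if target - n in seen:
            return [seen[target - n], i]
        seen[n] = i

# Style 2: verbose snake_case names, docstring, heavy commenting.
def find_pair_indices(numbers, desired_sum):
    """Return the indices of the two entries that add up to desired_sum."""
    # Map each value we have seen so far to its index.
    value_to_index = {}
    for current_index, current_value in enumerate(numbers):
        complement = desired_sum - current_value
        # If the complement was seen earlier, we have found the pair.
        if complement in value_to_index:
            return [value_to_index[complement], current_index]
        value_to_index[current_value] = current_index
    return []
```

Both functions implement the same O(n) algorithm, yet a style-based detector would struggle to attribute them to the same source.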
Moreover, ChatGPT not only provides code solutions but also explains them, allowing candidates to gain insights and improve their understanding.
As new language models emerge, they may not carry the same identifiable markers as GPT-4. The only reliable detection method that CodeSignal and HackerRank mention in their articles is catching someone who directly copies and pastes a solution. However, this problem is not unique to ChatGPT, and candidates can easily circumvent it by typing out the solution themselves or by using the provided explanation to craft their own answer.
The Cost of Detection
Implementing strict proctoring measures may result in heightened anxiety for candidates, as they become more conscious of being monitored during the assessment. This stress can negatively impact their performance, leading to false negatives and undermining the purpose of these tests. Furthermore, by concentrating on detecting and preventing ChatGPT usage, coding platforms will continue to focus on assessing outdated skills, rather than evaluating candidates based on their ability to adapt and thrive in real-world situations where tools like ChatGPT are readily available. As a result, these platforms may inadvertently penalize talented developers who effectively use ChatGPT as a tool to help improve their abilities and would be successful on the job. If you want to learn about how these companies implement proctoring tools and detect cheating more in general, check out this article.
In summary, the rise of ChatGPT presents both challenges and opportunities for coding platforms. Instead of focusing on strict proctoring and attempting to detect ChatGPT usage, these platforms should embrace the changing landscape of developer skills and adapt their assessments accordingly. As ChatGPT becomes more sophisticated, detecting its usage will become increasingly difficult, and relying on proctoring tools is not the most effective approach. Rather than penalizing candidates who utilize these advanced tools, platforms should consider how they can better prepare developers for the future by emphasizing the new skills that will be crucial for success.
At Hatchways, our mission is to shape the future of engineering assessments. Following the guidelines outlined in this previous blog post, rather than concentrating on limiting ChatGPT access and implementing stringent proctoring, we embrace this innovative AI technology and craft assessments that evaluate the emerging skills engineers will need in the years to come. Want to try it out? Schedule a call with our team.
If you are a job seeker looking for ways to sharpen your technical skills through practical exercises, check out our developer upskilling tool here.