Common Classifiers ASL

From static classifiers to reasoning engines: OpenAI’s new model rethinks content moderation

Enterprises, eager to ensure any AI models they use adhere to safety and safe-use policies, fine-tune LLMs so they do not respond to unwanted queries. However, much of the safeguarding and red teaming ...

Dark Reading

'Constitutional Classifiers' Technique Mitigates GenAI Jailbreaks

Researchers at Anthropic, the company behind the Claude AI assistant, have developed an approach they believe provides a practical, scalable method to make it harder for malicious actors to jailbreak ...
