CMC develops a large Vietnamese legal language model

CMC has launched a large legal language model and an evaluation benchmark, with the goal of building legal virtual assistants for Vietnamese people.

CMC OpenAI Company (C-OpenAI) has announced CMC-AI-Legal-32B, a large language model for Vietnamese law, and introduced VLegal-Bench, an evaluation benchmark developed by the company itself. These are two foundational components for developing legal virtual assistants that serve Vietnamese users.

CMC-AI-Legal-32B was built and fine-tuned in depth on Vietnamese law. When evaluated on VLegal-Bench, the model leads in overall performance and ranks first in 6 of 22 tasks, including problems that require multi-layer legal reasoning and inference.

The research team said that, because it was trained on Vietnamese legal context and on the conventions for citing Vietnamese law, CMC-AI-Legal-32B has an advantage over major foreign language models such as GPT-4o, Claude or Gemini when processing and reasoning about this specialized content.

The engineering team that developed the VLegal-Bench evaluation benchmark. Image: CMC

To train the model, the team built a specialized benchmark for legal assessment. International benchmarks can hardly replace it because of differences in language and legal systems. VLegal-Bench includes 10,450 data samples with reference answers, divided into 22 tasks and organized into five levels of increasing reasoning difficulty. The benchmark is designed around the specific characteristics of Vietnam's legal system, and each data sample is tied to a central-level legal document so that it can be verified.

Mr. Nguyen Tien Dong, Technical Director of CMC OpenAI, said that building an evaluation benchmark is a “difficult problem” because it must meet both technical and legal requirements. According to him, the biggest challenge is ensuring legal correctness, verifiability and compatibility with the major evaluation benchmarks for language models used around the world.

“We are pursuing the development of a large Vietnamese language model and specialized AI for each field,” said C-OpenAI General Director Dang Van Tu. The company will also publish the source code, data and evaluation process, and will call on domestic and foreign experts to take part in standardization so that the toolkit keeps improving.

AI is currently one of 11 technology groups in the Strategic Technology List signed by the Prime Minister in June, with product groups such as large Vietnamese language models, virtual assistants and specialized AI.

At the AI Forum in the Digital Age held in late August, Minister of Science and Technology Nguyen Manh Hung said that AI is gradually becoming a type of national infrastructure, similar to electricity, telecommunications or the Internet. According to him, developing specialized AI applications not only helps solve national problems but also gives Vietnamese businesses the opportunity to refine and strengthen their technological capabilities.

Many other Vietnamese businesses are also developing large Vietnamese language models, but as general-purpose models rather than specialized ones. Zalo AI's large Vietnamese language model currently has 13 billion parameters and is deployed in many practical applications. In September, VNPT proposed that the Government assign the task of developing a large Vietnamese language model as the foundation for AI applications built by Vietnamese people.

Taking a different approach, ViGen is building an open-source Vietnamese dataset to promote AI applications in Vietnam. The project is jointly run by the National Innovation Center (NIC) and many organizations, including AI for Vietnam and Meta.

By Editor
