It turns out you can train AI models without copyrighted material


AI companies claim their tools couldn't exist without training on copyrighted material. It turns out they could; it's just really hard. To prove it, AI researchers trained a new model that is less powerful but more ethical, because the LLM's dataset uses only public domain and openly licensed material.

The paper (via The Washington Post) was a collaboration among 14 different institutions. The authors represent universities such as MIT, Carnegie Mellon and the University of Toronto. Nonprofits such as the Vector Institute and the Allen Institute for AI also contributed.

The group built an 8 TB dataset. Among the data was a set of 130,000 books from the Library of Congress. After assembling the material, they trained a seven-billion-parameter large language model (LLM) on it. The result? It performed about as well as Meta's similarly sized Llama 2-7B from 2023. The team did not publish benchmarks comparing its results with today's top models.

Performance comparable to a two-year-old model wasn't the only downside. The process of putting it all together was also a grind. Much of the data couldn't be read by machines, so people had to go through it by hand. "We use automated tools, but all of our stuff was manually annotated at the end of the day and checked by people," the team said. "And that's just really hard." Sorting out the legal details was difficult too: the team had to determine which license applied to each website they scanned.

So what do you do with a less powerful LLM that is much harder to train? If nothing else, it can serve as a counterpoint.

In 2024, OpenAI told a British parliamentary committee that such a model essentially couldn't exist. The company claimed it would be "impossible to train today's leading AI models without using copyrighted materials." Last year, an Anthropic expert witness added that "LLMs would likely not exist if AI firms were required to license the works in their training datasets."

Of course, this work won't change the trajectory of AI companies. After all, doing more work to create less powerful tools doesn't jibe with their interests. But it does at least puncture one of the industry's common arguments. Don't be surprised if you hear about this work again in legal cases and regulatory debates.
