We are looking to hire resources for three types of tasks: conversational email, SMS-style text, and paragraph-level prose.
Conversational email:
In total, we need a minimum of 10,000 emails containing 700,000 words in total, with the following length distribution:
* 10-30 words: 20%
* 30-100 words: 50%
* 100+ words: 30%
Each email thread should contain at least 2 emails (2 turns: the first email plus a reply) and at least 10 words in total.
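The two email totals imply an average of 70 words per email (700,000 / 10,000). As a rough self-check before submission, a batch could be tested against the length buckets, the overall totals, and the per-thread minimums. The Python below is a minimal sketch, not part of the OneForma platform: the function names are hypothetical, and it ASSUMES whitespace-separated words, which will not give meaningful counts for Chinese or Japanese text. Because the brief's bucket edges overlap at 30 and 100, it also ASSUMES half-open ranges (10-29, 30-99, 100+).

```python
from collections import Counter
from typing import Optional

# Target shares from the brief (same buckets for emails and paragraph-level prose).
# ASSUMPTION: half-open ranges 10-29, 30-99, 100+ because the brief's edges overlap.
TARGET_SHARES = {"10-30": 0.20, "30-100": 0.50, "100+": 0.30}


def word_count(text: str) -> int:
    # ASSUMPTION: words are whitespace-separated; not valid for Chinese/Japanese.
    return len(text.split())


def bucket(count: int) -> Optional[str]:
    """Map a word count to its length bucket, or None if under 10 words."""
    if count < 10:
        return None
    if count < 30:
        return "10-30"
    if count < 100:
        return "30-100"
    return "100+"


def length_shares(texts: list[str]) -> dict[str, float]:
    """Share of texts in each bucket (texts under 10 words count toward the total only)."""
    counts = Counter(bucket(word_count(t)) for t in texts)
    total = len(texts) or 1  # avoid division by zero on an empty batch
    return {name: counts[name] / total for name in TARGET_SHARES}


def meets_totals(texts: list[str], min_items: int = 10_000, min_words: int = 700_000) -> bool:
    """Check the minimum item count and minimum total word count."""
    total_words = sum(word_count(t) for t in texts)
    return len(texts) >= min_items and total_words >= min_words


def thread_ok(messages: list[str], min_turns: int = 2, min_words: int = 10) -> bool:
    """Check a thread has enough turns and enough words overall.

    min_turns=2 matches the email spec (first email + reply);
    pass min_turns=4 for SMS threads.
    """
    total_words = sum(word_count(m) for m in messages)
    return len(messages) >= min_turns and total_words >= min_words
```

For SMS threads, the same `thread_ok` check would be called with `min_turns=4`, and `meets_totals` with `min_words=200_000`. Comparing `length_shares(batch)` against `TARGET_SHARES` with some tolerance is left open, since the brief does not state one.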
SMS-style text:
In total, we need a minimum of 10,000 conversation threads containing 200,000 words in total. Each thread should contain at least 4 turns (4 messages) and at least 10 words in total.
Paragraph-level prose data:
In total, we need a minimum of 10,000 samples containing 700,000 words in total, with the following length distribution:
* 10-30 words: 20%
* 30-100 words: 50%
* 100+ words: 30%
Notes:
For paragraph-level prose, an example is any text written in “notes”.
We need as many samples as possible, including formal, friendly, and informal content.
Do not include illegal, criminal, pornographic, or violent content, or any information that would compromise personal privacy.
Content should be topically balanced and culturally representative of the target locale, and written by fluent or native speakers. Resources can be located anywhere as long as the text is in the required locale.
AI-generated content and content copied and pasted from the Internet will NOT be accepted.
Profile requirements:
Contributors should be balanced across age and gender, but no specific distribution is required. Content must be written by native or fluent speakers; their location does not matter.
Locales: French (France), French (Canada), Chinese, Spanish (Spain, Mexico, US), German, Italian, Japanese, Korean, Portuguese (Brazil)
Productivity desired:
At least 80-100 assets per person. There is no upper limit on the number of unique submissions from each participant.
Start date: We are preparing to kick off as soon as possible.
For all content types, the data must not contain any real PII (personally identifiable information). Fictional PII is acceptable. You can replace real PII with fictional details, or delete it outright if doing so does not affect the content.
The project will be carried out in the WebApp (through the OneForma agency platform).
Data characteristics:
* Topically balanced and culturally representative of the target locale
* Fluent/native speaker text (some natural distribution of mistakes and typos is acceptable)
* Balanced in terms of perceived age and gender of speakers
* Include formal, friendly, and informal content
* Mirror the distribution of dialects of the target region